
Study and Prototype Development of Chinese Medicine Knowledge AI Assistant (CMKA) System Based on Large Language Models and Knowledge Graph

Benny Lam, Director, Right Station ([email protected])
Michael Ng, Dean, Science Faculty, HKBU ([email protected])
Bernie Kwok, Executive Director, Kinth Technology ([email protected])

Abstract

This report presents the study and prototype development of a Chinese Medicine Knowledge AI Assistant (CMKA) system based on two AI enabling technologies, Large Language Models (LLMs) of Generative AI and Knowledge Graphs, for exploration and learning of Chinese Medicine knowledge.

We surveyed some of the latest LLM chatbots for Chinese Medicine and Chinese Medicine knowledge datasets; their functionalities are discussed briefly in this report. We also developed a standard program to benchmark several recent open-source foundation LLMs from China and overseas, with OpenAI's closed-source ChatGPT 4-Turbo used as a performance reference. The correct rates are tabulated below:
No.  LLM                     TCM Questions  Correct Answers  Correct Rate
1    Internlm2.5-1.8B-chat   32             19               59%
2    OpenAI ChatGPT4 Turbo   32             17               53%
3    Internlm2.5-1.8B-chat   32             17               53%
4    OpenAI ChatGPT4 Turbo   32             16               50%
5    Alibaba Qwen 1.5-7B     32             16               50%
6    Alibaba Qwen 1.5-7B     32             14               44%
7    Meta LLaMa3-8B          32             13               41%
8    Google Gemma2-latest    32             13               41%
9    Google Gemma2-latest    32             13               41%
10   Meta LLaMa3-8B          32             10               31%
The analysis content of the LLM responses was not exactly the same as the analysis content of the correct answers because of Chinese Medicine's implicit and metaphorical way of explanation. For the multiple-choice questions answered correctly, however, most LLM analysis responses are in line with the correct analysis content. For example, the analysis response of the Internlm2.5-1.8B-chat LLM is compared below with the analysis content of the correct answer; the LLM analysis is long, but it captures the key points of the correct answer.
Question: 治疗肺痈的要药是( )。
Choices: A.野菊花 B.金银花 C.蒲公英 D.鱼腥草 E.败酱草
Correct Answer: D
Correct Answer Analysis Content: 鱼腥草能治疗肺痈吐脓、湿热淋证、湿热泻痢,为治肺痈之要药。
LLM Analysis Response: 治疗肺部长期脓肿及肺肉质变等疾病时,常常需要清热解毒、抗感染的药物。鱼腥草常被认为是一种较好的选择,因为它的味道清香、药性微寒,并具有较强的抗菌能力,可以抵抗包括金葡菌在内的多种病原菌的作用,特别适合对其他内用或外用药物耐受不良的患者使用,并且其利尿发汗功效使得病灶能较快地得到清除。因此,在治疗肺痈(一种呼吸道炎症)的选择中,鱼腥草是恰当的选择。其他选项如野菊花、金银花、蒲公英和败酱草虽然都有清热解毒等作用,但它们并不适合长期或者作为主要药物使用。故此答案为 D。

The above LLM benchmark result is just part of the summary of the full records in an Excel file, which contains all Q&A and analysis responses from each LLM for the 32 questions and the comparison of the LLM answers and analysis with the correct answers of the sample dataset. The full benchmark records can be provided upon request and approval.

Using the best-performing open-source LLM from our benchmark, we developed an LLM chatbot prototype for the CMKA system. In addition, we implemented Retrieval Augmented Generation (RAG) by injecting a classic Traditional Chinese Medicine (TCM) book for prototype testing.

A Knowledge Graph (KG) built from Chinese Medicine triple datasets can greatly improve the understanding of complex concepts and relationships among Chinese Medicine properties, symptoms and treatments. By setting up a community edition of the neo4j Knowledge Graph system, we explored a KG with a triple dataset consisting of 123K Chinese Medicine records. Within the constraints of the existing hardware, we evaluated the potential of using a KG system to help learners grasp these complex concepts and relationships through a few illustrative examples.

Going forward, we will pursue the following targets when more resources and support are available:
1. Enrich the Traditional Chinese Medicine (TCM) knowledgebase in the LLM and KG with help from TCM professionals and experts.
2. Release the CMKA prototype online for public trial access.
3. Integrate the TCM KG knowledgebase with the LLM once sufficient hardware resources are available.
4. Enhance CMKA with an interactive, user-friendly interface.
5. Promote public interest in learning TCM knowledge.

Last but not least, we would also like to thank David SU, Assistant Librarian I (Chinese Medicine Library) of Hong Kong Baptist University Library, for his advice on the classical literature of Traditional Chinese Medicine and his comments on this study and prototype development.

Introduction

A Chinese Medicine Knowledge AI Assistant (CMKA) system is proposed to meet the challenges faced by modern learners in understanding Traditional Chinese Medicine (TCM) from classical texts, which arise from the cryptic and metaphorical language, diverse diagnostic methods, varied treatment approaches, and interconnected concepts presented without systematic organization. To tackle these challenges, we propose a solution combining a Generative Pre-Trained (GPT) AI assistant with a Knowledge Graph knowledgebase featuring visualization tools. The GPT AI assistant would be combined with classical Chinese Medicine texts to interpret the complex language and provide simple, easy-to-understand explanations, while the Knowledge Graph knowledgebase would structure and visualize the relationships between different aspects of Chinese Medicine. This combination would facilitate interactive exploration and personalized learning, allowing learners to query the system in natural language and receive contextually relevant information. This innovative approach aims to make the extensive knowledge of Chinese Medicine more accessible and understandable, bridging the gap between ancient texts and modern educational needs, and thus fostering a deeper understanding of Traditional Chinese Medicine (TCM) knowledge.

In a three-month exploratory study and prototype development of the CMKA system, we carried out the following survey, study and prototype development work:
• Study the latest LLM chatbots for Chinese Medicine and Chinese Medicine knowledge datasets currently available to the public.
• Explore their functionalities and limitations.
• Develop a standard program to benchmark several recent open-source foundation LLMs from China and overseas.
• Use OpenAI's closed-source ChatGPT 4-Turbo as a performance reference.
• Explore a knowledge graph system with a triple dataset developed for Chinese Medicine knowledge and evaluate its capabilities and advantages.

This report presents the results and findings of the survey, study and prototype development, and proposes the way forward for the future development of the CMKA system.

Existing LLM Chatbots and Datasets for Chinese Medicine knowledge

There are many open-source LLMs fine-tuned for Chinese Medicine knowledge. We have surveyed a few well-known ones and provide brief introductions below:
1. 神农 ShenNong-TCM-LM [1]
Base model: Chinese-Alpaca-Plus-7B
Data: the ShenNong_TCM_Dataset, a TCM instruction dataset built on an open-source TCM knowledge graph, with 110K+ TCM-related instruction examples generated by calling ChatGPT.
2. HuaTuoGPT (2023-05-24) [2]
BenTsao 本草 (original name: HuaTuo 华驼): Instruction-tuning Large Language Models with Chinese Medical Knowledge
Code: HuatuoGPT
License: Apache 2.0
Base model: BLOOMZ-7b1
Data: an instruction dataset built with ChatGPT (61K), real doctor instructions (70K), ChatGPT role-played doctor-patient dialogues (68K), and real doctor-patient dialogues (26K)
Compute: 8×NVIDIA A100 GPUs, 3 epochs, 16,000 steps
Institution: The Chinese University of Hong Kong (Shenzhen)

3. CMLM-ZhongJing (中医大语言模型-仲景) [3]
A Traditional Chinese Medicine large language model, inspired by the wisdom of Zhang Zhongjing, an eminent representative of ancient Chinese medical scholars. This model aims to illuminate the profound knowledge of Traditional Chinese Medicine, bridging the gap between ancient wisdom and modern technology, and providing a reliable and professional tool for the Traditional Chinese Medicine field. However, all generated results are for reference only; diagnosis and treatment results and suggestions should be provided by experienced professionals.

Besides the above three LLMs, there are many other LLM chatbots developed for Chinese Medicine. Most of them are built by fine-tuning foundation LLMs with Chinese Medicine knowledge data. With this approach, the LLMs can be used directly as chatbots for Chinese Medicine applications such as patient diagnosis and treatment. However, updating and enhancing such an LLM with new or additional knowledge is difficult and limited, because changing and fine-tuning a foundation LLM is very expensive and time-consuming.

Standard Program for Benchmarking Opensource Foundation LLMs

Since training a Large Language Model (LLM) for Chinese Medicine knowledge from scratch on a huge volume of Traditional Chinese Medicine literature is out of reach for our project in terms of hardware and human resources, we proposed a more feasible approach: starting from a foundation LLM available in the AI open-source community. Once a good foundation LLM is selected, we can improve its Chinese Medicine knowledge by using Retrieval Augmented Generation (RAG) and a Knowledge Graph in our CMKA system.

In order to evaluate and select a suitable open-source foundation LLM for the CMKA system, a standard Python program was developed to benchmark several recently released open-source foundation LLMs from China and overseas.

Four open-source LLMs were selected for our evaluation: Meta LLaMa 3-8B, Google Gemma 2, Alibaba (通义千问) Qwen 1.5-7B, and Shanghai AI Laboratory (书生) Internlm 2.5-1.8B-chat. All four were downloaded and set up with Ollama [4] as the LLM serving engine, running on a local notebook PC with the following configuration:
• CPU: Intel i7-10750H CPU @ 2.60GHz
• RAM: 16 GB
• OS: Ubuntu ver. 22.04
• GPU: NVIDIA GeForce RTX 3060 Laptop GPU with 6 GB VRAM

For benchmarking reference, OpenAI's ChatGPT4-Turbo was also evaluated with the same standard program and the same testing data. Since ChatGPT4-Turbo ranks highly in many public evaluation benchmarks, it is a good reference measure for our open-source foundation LLM benchmarking.

Since ChatGPT4-Turbo is a closed-source LLM that cannot be downloaded to our local PC, we had to run the evaluation program locally with API calls to the OpenAI ChatGPT service. With the support of Hong Kong Baptist University (HKBU), we were able to connect to ChatGPT4-Turbo via the HKBU ChatGPT API Service, which is in turn provided by the Microsoft Azure OpenAI Service.

Evaluation Test Datasets

For the evaluation test dataset to benchmark the selected LLMs, we made use of a small sample dataset from a Chinese Medicine competition provided on the Alibaba Cloud Tianchi platform (阿里云天池平台), https://tianchi.aliyun.com/competition/entrance/532199/introduction, in Aug 2024. The following is information about the competition:
CCKS2024-TCMBench 中医知识理解与推理能力评测——不微调赛道 [5]
With the breakthrough progress of large language models (LLMs) such as ChatGPT and GPT-4 in the field of natural language processing (NLP), their capabilities approaching artificial general intelligence (AGI) have opened new possibilities for the inheritance and innovation of traditional Chinese medicine. As an important part of traditional Chinese culture, traditional Chinese medicine has a profound cultural heritage and unique diagnostic and treatment characteristics. It differs significantly from evidence-based Western medicine in its theoretical basis, diagnostic methods, treatment methods, prevention concepts and holistic view. At the same time, language expressions in the field of traditional Chinese medicine are often rich in philosophical and symbolic meaning, which places higher requirements on language models. Therefore, directly applying existing Western medicine assessment benchmarks to TCM cannot fully evaluate the potential and practical utility of language models in TCM knowledge.

In order to promote the development and implementation of LLM in the field of
traditional Chinese medicine, the team of Professor Wang Xiaoling of East China
Normal University teamed up with experts and scholars such as Zhou Yaqian of Fudan
University and Zhou Yiping of Shanghai University of Traditional Chinese Medicine to
launch the TCMBench evaluation benchmark. The TCMBench evaluation benchmark
relies on the rich question bank of the Traditional Chinese Medicine Practitioner
Qualification Examination, which comprehensively covers the three major examination
areas, including basic theories of Traditional Chinese Medicine, clinical medicine of
Traditional Chinese Medicine, and comprehensive content of Western medicine and
clinical medicine, as well as 16 core examination subjects, with a total of 9,788 real
questions and 5,473 practice questions. TCMBench will be one of the evaluation tasks
of CCKS-2024 and will be evaluated based on the Alibaba Cloud Tianchi platform.

The small sample dataset contains 32 multiple-choice Questions and Answers (Q&A) covering different aspects of Chinese Medicine knowledge, diagnosis, treatment and practice. The reason why only the 32-question sample dataset was used is that only this small dataset includes the correct answers to the questions, while the other datasets contain questions for the competition only. Nevertheless, our standard benchmarking program can test Chinese Medicine Q&A datasets with an unlimited number of questions, and hence we can easily expand our benchmarking scope by adding hundreds, thousands or more Q&A evaluation records, as long as correct answers are included in the dataset.

Though the sample dataset is small, it was well designed by the TCMBench team to cover a comprehensive range of Chinese Medicine knowledge. A few representative Q&A from the testing dataset are shown below (Table 1):
Question: 治疗肺痈的要药是( )。
Choices: A.野菊花 B.金银花 C.蒲公英 D.鱼腥草 E.败酱草
Ans: D
Analysis: 鱼腥草能治疗肺痈吐脓、湿热淋证、湿热泻痢,为治肺痈之要药。

Question: 办理执业医师资格证和注册在什么机构?( )
Choices: A.所在地县级以上人民政府卫生行政部门申请注册 B.所在地区级以上人民政府卫生行政部门申请注册 C.所在地市级以上人民政府卫生行政部门申请注册 D.所在地省级以上人民政府卫生行政部门申请注册 E.以上均非
Ans: A
Analysis: 《中华人民共和国执业医师法》第十三条:国家实行医师执业注册制度。取得医师资格的,可以向所在地县级以上人民政府卫生行政部门申请注册。

Question: 患者,男性,43 岁。身热,微恶风,汗少,头晕重胀而痛,心烦口渴,胸闷恶心,小便短赤,舌苔薄黄腻,脉濡数。此患者应诊为( )。
Choices: A.感冒风寒 B.感冒风热 C.感冒暑湿 D.时行感冒 E.体虚感冒
Ans: C
Analysis: 根据患者临床表现,诊断为暑湿伤表证。暑湿伤表,腠理不固,故恶风;暑为阳邪,故身热、汗出;暑湿夹杂,伤津液,故心烦口渴,小便短赤;湿邪内盛,困阻清阳,故头晕重胀等;舌苔薄黄腻,脉濡数均为暑湿伤表之象。

Question: 推动人体生长发育,激发各脏腑经络等组织生理功能是气的( )。
Choices: A.推动与调控作用 B.温煦与凉润作用 C.防御作用 D.固摄作用 E.中介作用
Ans: A
Analysis: {Not provided}
Table 1

Benchmarking Program

A Python program has been developed to test the 5 LLMs (including ChatGPT4 Turbo) on the same set of 32 questions. The program reads each question, combines it with the standard instruction prompt below, queries the LLM, and asks it to select the correct answer and provide its own analysis. Based on a few-shot prompting approach, the standard instruction prompt is designed as follows:
===============================================
您是中醫專家。這是一個全新的問題,與之前的任何對話無關。請僅基於當前提供的
問題作答,不要參考或引用任何先前的內容。請嚴格按照以下格式回答用戶的中醫問
題:
1. 回答模板:
answer: <ANSWER>
analysis: <ANALYSIS>
其中 <ANSWER> 必須是一個大寫字母(A、B、C、D 或 E),<ANALYSIS> 是不超
過 80 個字的分析。
2. 規則:
- 嚴格遵守上述模板格式!!!!
- 不允許在 “answer:” 或 “analysis:” 之前添加任何文字
- 不允許添加任何其他無關的文字、評論或額外信息
- 不允許評論或重複題目或選項
- 不要使用引號或其他標點符號包裹答案或分析
請務必嚴格遵守此格式。請完全按照以下 3. 例子的 llm reply: 格式進行操作
3. 例子:
-------------------
user:
肛管直肠癌的早期症状除便血外,还可见( )。
A.大便变形
B.腹胀肠鸣
C.脱出不纳
D.排便习惯改变
E.肛门潮湿
llm reply:
answer: D
analysis: 肛管直肠癌的早期症状通常包括排便习惯的改变。例如,患者可能会发现大
便频率增加、减少,或者粪便形状发生了变化(如变细或有凹陷)。
这些改变往往比明显的血便更早出现。选项 A、B、C、E 在某些情况下也可能伴随肛
管直肠癌早期症状,但排便习惯的改变是最常见的信号。
-------------------
user:
临终关怀的伦理意义表现在( )。
A.它有利于建立和谐社会
B.它体现生命神圣、质量和价值的统一
C.它理解临终病人的需求
D.它维护临终病人的生命尊严
E.它同情和关心临终病人的家属

llm reply:
answer: B
analysis: 临终关怀在伦理意义上的表现主要体现在以下几个方面:
1. 生命神圣、质量和价值的统一:临终关怀尊重每一个生命的尊严,通过提供全面的
照护服务来提高生命质量。
2. 维护临终病人的生命尊严:临终关怀关注病人的尊严需求,通过无微不至的照顾,
帮助他们有尊严地面对死亡。
3. 同情和关心临终病人的家属:临终关怀不仅关注病人本身,还非常重视其家庭成员
的感受。提供心理支持和建议,帮助家属度过难关。综上所述,B、D、E 选项体现了
临终关怀的伦理意义。
------------------
只可如上例子格式用指定的 “answer:” 和 “analysis:” 在英文关键字後回復你的答案。请
勿将这些关键字翻译成中文。
除關鍵字 “answer:” 和 “analysis:” 外,請全部使用中文回覆。
===============================================

Since most generative AI LLMs generate the next token based on the probability of the most likely token in a massively pre-trained language latent space, an LLM may sometimes produce a reply that does not exactly follow our prompt instructions and may even occasionally give hallucinated responses [6]. Therefore, considerable programming and debugging effort was spent on ensuring that each LLM followed our standard instruction prompt so that its answer and analysis content could be properly extracted and captured in our benchmark testing records.

To test the LLMs' understanding and their ability to answer the Chinese Medicine questions correctly without any hints, the correct answers and analysis content of the Q&A dataset were never provided to the LLMs. The responses from each LLM were extracted and compared with the known correct answers by the program itself, without any further interaction with the LLM. The answers and analysis from the LLM responses, together with the known correct answers and analysis, were recorded automatically in a text file and an Excel file for all questions.
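For reference, the core of this benchmark loop can be sketched as follows. This is a minimal sketch only, assuming the Ollama REST endpoint on its default port and hypothetical file names (questions.json for the 32 Q&A records and results.xlsx for the output); the actual program, prompt handling and record layout may differ.

import json
import re

import pandas as pd
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama serving endpoint
MODEL = "internlm2.5:1.8b-chat"                      # model tag is illustrative
PROMPT_TEMPLATE = "..."                              # the standard instruction prompt shown above

def ask_llm(question_block: str) -> str:
    # Send one multiple-choice question (with its choices) to the local LLM and return the raw reply.
    payload = {"model": MODEL,
               "prompt": PROMPT_TEMPLATE + "\nuser:\n" + question_block,
               "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

def parse_reply(reply: str):
    # Extract the "answer:" letter (A-E) and the "analysis:" text; fall back gracefully if the format is not followed.
    answer = re.search(r"answer:\s*([A-E])", reply)
    analysis = re.search(r"analysis:\s*(.*)", reply, re.S)
    return (answer.group(1) if answer else None,
            analysis.group(1).strip() if analysis else reply.strip())

with open("questions.json", encoding="utf-8") as f:   # hypothetical Q&A file with correct answers included
    questions = json.load(f)

records = []
for q in questions:
    reply = ask_llm(q["question"] + "\n" + q["choices"])
    ans, analysis = parse_reply(reply)
    records.append({"question": q["question"],
                    "llm_answer": ans,
                    "correct_answer": q["answer"],
                    "is_correct": ans == q["answer"],
                    "llm_analysis": analysis,
                    "correct_analysis": q.get("analysis", "")})

df = pd.DataFrame(records)
print(f"Correct rate: {df['is_correct'].mean():.0%}")
df.to_excel("results.xlsx", index=False)              # full record kept for later comparison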

Benchmark Result

After many rounds of trial and error running the benchmark program with different instruction prompt variations and extraction procedures, the responses and benchmark results from the 5 LLMs stabilized. In the final round, we ran the benchmarking evaluation twice for each LLM and tabulated the results in descending order of Correct Rate (accuracy) as follows (Table 2):
No.  LLM                     TCM Questions  Correct Answers  Correct Rate
1    Internlm2.5-1.8B-chat   32             19               59%
2    OpenAI ChatGPT4 Turbo   32             17               53%
3    Internlm2.5-1.8B-chat   32             17               53%
4    OpenAI ChatGPT4 Turbo   32             16               50%
5    Alibaba Qwen 1.5-7B     32             16               50%
6    Alibaba Qwen 1.5-7B     32             14               44%
7    Meta LLaMa3-8B          32             13               41%
8    Google Gemma2-latest    32             13               41%
9    Google Gemma2-latest    32             13               41%
10   Meta LLaMa3-8B          32             10               31%
Table 2

The above LLM benchmark result is just part of the summary of the full records in an Excel file, which contains all Q&A and analysis responses from each LLM for the 32 questions and the comparison of the LLM answers and analysis with the correct answers of the sample dataset. The full benchmark records can be provided upon request and approval.

The top Correct Rate was achieved by the LLM (书生) Internlm 2.5-1.8B-chat from Shanghai AI Laboratory, which gave 19 correct answers out of 32 questions, an accuracy of 59%, in one of its two test runs. This is better than OpenAI ChatGPT4 Turbo with an accuracy of 53%.

For comparison and illustration, some answers given by Internlm 2.5-1.8B-chat are shown below (Table 3). The analysis replies from Internlm are rather long compared with the analysis in the correct Q&A dataset shown above in Table 1.
Question: 治疗肺痈的要药是( )。
Choices: A.野菊花 B.金银花 C.蒲公英 D.鱼腥草 E.败酱草
Internlm2.5 LLM Ans: D
Internlm2.5 LLM Analysis: 治疗肺部长期脓肿及肺肉质变等疾病时,常常需要清热解毒、抗感染的药物。鱼腥草常被认为是一种较好的选择,因为它的味道清香、药性微寒,并具有较强的抗菌能力,可以抵抗包括金葡菌在内的多种病原菌的作用,特别适合对其他内用或外用药物耐受不良的患者使用,并且其利尿发汗功效使得病灶能较快地得到清除。因此,在治疗肺痈(一种呼吸道炎症)的选择中,鱼腥草是恰当的选择。其他选项如野菊花、金银花、蒲公英和败酱草虽然都有清热解毒等作用,但它们并不适合长期或者作为主要药物使用。故此答案为 D。

Question: 办理执业医师资格证和注册在什么机构?( )
Choices: A.所在地县级以上人民政府卫生行政部门申请注册 B.所在地区级以上人民政府卫生行政部门申请注册 C.所在地市级以上人民政府卫生行政部门申请注册 D.所在地省级以上人民政府卫生行政部门申请注册 E.以上均非
Internlm2.5 LLM Ans: A
Internlm2.5 LLM Analysis: 根据我国的医师资格考试和执业资格的相关规定,要想办理执业医师资格证,必须参加在全国范围内定期举行的医师资格考试并通过后。这一过程通常由所在地的县级以上的政府卫生行政部门负责办理的注册事项,这确保了注册的有效性,并符合中国相关法律、法规。因此,正确选项是 A:“所在地县级以上人民政府卫生行政部门申请注册”,以此回答了你的问题。

Question: 患者,男性,43 岁。身热,微恶风,汗少,头晕重胀而痛,心烦口渴,胸闷恶心,小便短赤,舌苔薄黄腻,脉濡数。此患者应诊为( )。
Choices: A.感冒风寒 B.感冒风热 C.感冒暑湿 D.时行感冒 E.体虚感冒
Internlm2.5 LLM Ans: C
Internlm2.5 LLM Analysis: 本题考查的是中医临床辨证论治的知识,该患者症状包括身热、微恶风、汗少、头晕重胀痛、心烦口渴、胸闷恶心以及舌苔薄黄腻(多为湿邪表现)等症状。“面色苍白舌质淡”提示“体虚感冒”,综合以上症状,更符合暑湿感冒的诊断;若无此症状或体质虚弱者,需从其他两方面来鉴别:A.感冒风寒可能并无胸浊胀满和心烦口渴的症状,B.感冒风热可能与头晕重胀痛相对应较好,但“微恶风可资辨。D.时行感冒一般没有湿邪表现。E.不伴有舌质淡表现;因此最合理的鉴别诊断是 C. 答案是:C

Question: 推动人体生长发育,激发各脏腑经络等组织生理功能是气的( )。
Choices: A.推动与调控作用 B.温煦与凉润作用 C.防御作用 D.固摄作用 E.中介作用
Internlm2.5 LLM Ans: A
Internlm2.5 LLM Analysis: 推动与调控作用是指气对人体生长发育、脏腑经络等组织生理功能的影响和调节作用。具体表现为通过推动作用促进机体各系统的功能和器官的健康;同时,气还具有潜移默化的调控作用,可帮助调节体内的各种功能活动以及预防疾病的发生和发展。所以正确答案是 A,即气的推动与调控作用是使人体生长发育,激发各脏腑经络等组织生理功能的主要环节。综上分析可知答案是 A。这个选择项不仅正确描述了问题的核心内容,还结合了气对人体各个方面的积极影响,提供了一个全面准确的回答。
Table 3

CMKA Chatbot Prototype

We have developed a Chinese Medicine Knowledge AI Assistant (CMKA) chatbot prototype specialized in Chinese Medicine knowledge, using Internlm 2.5-1.8B-chat, the top-performing foundation LLM in our benchmark.

The following figure shows a few replies from the CMKA running on our local PC for some
Chinese Medicine related questions.

Retrieval Augmented Generation (RAG)

Large Language Models (LLMs) have shown remarkable capabilities, but they face
challenges such as hallucinations, outdated knowledge and non-transparent reasoning
processes. Retrieval Augmented Generation (RAG) has emerged as a promising solution to
these issues by incorporating real-time data from external documents and knowledgebases
into LLM responses. This enhances the accuracy and credibility of the models, particularly
for knowledge-intensive tasks, and allows for continuous knowledge updates and integration
of domain-specific information (e.g. Chinese Medicine). RAG merges LLMs' intrinsic
knowledge with the vast, dynamic repositories of external documents and databases.
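As a concrete illustration of these steps, the sketch below shows one minimal way such a pipeline could be put together with the same local Ollama server used elsewhere in this study: document chunks are embedded and kept in memory, the chunks most similar to a query are retrieved by cosine similarity, and they are prepended to the prompt. The embedding model name, the chat model tag and the fixed-size chunking are illustrative assumptions, not the exact configuration of our prototype.

import numpy as np
import requests

OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"      # assumed embedding model served by Ollama
CHAT_MODEL = "internlm2.5:1.8b-chat"  # illustrative chat model tag

def embed(text: str) -> np.ndarray:
    # Embed one piece of text with Ollama's embeddings endpoint.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    return np.array(r.json()["embedding"])

def build_index(document: str, chunk_size: int = 500):
    # Naive fixed-size chunking; a real system would split on sections or sentences.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    vectors = np.vstack([embed(c) for c in chunks])
    return chunks, vectors

def retrieve(query: str, chunks, vectors, k: int = 3):
    # Return the k chunks most similar to the query by cosine similarity.
    q = embed(query)
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def rag_answer(query: str, chunks, vectors) -> str:
    # Augment the prompt with the retrieved context before calling the LLM.
    context = "\n\n".join(retrieve(query, chunks, vectors))
    prompt = f"根據以下參考內容回答問題:\n{context}\n\n問題:{query}"
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": CHAT_MODEL, "prompt": prompt, "stream": False})
    return r.json()["response"]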
The following diagram shows the processes of a typical RAG system:

(Diagram source: "Retrieval-Augmented Generation for Large Language Models: A Survey" [7])

Our CMKA prototype chatbot includes the RAG function. Due to our limited hardware resources, we only injected one classic Traditional Chinese Medicine book, a 474-page PDF of 黄帝内经原文白话全集, into the vector knowledgebase for our prototype testing as shown below:

As an illustration of the RAG function of the CMKA prototype chatbot, we asked the chatbot the following question without invoking the RAG function:
==============
Bennylam: 黃帝說:人年紀老的時候,不能生育子女,是由於精力衰竭了呢,还是受
自然規律的限定呢?
岐伯如何說?
Internlm/internlm2.5:1.8b-chat: 对不起,我不太清楚您所說的內容是指什么。期待您的
新问题,我会尽我最大的努力帮助您。
==============
As seen from the above dialog, without any reference to the imported document 黄帝内经原文白话全集 via RAG, the LLM did not know how to answer our question. However, after we explicitly told the LLM to refer to the RAG-imported document as follows:
==============
在黃帝內经素问白活解內文,黃帝說:人年紀老的時候,不能生育子女,是由於精力
衰竭了呢,还是受自然規律的限定呢?
你知道岐伯如何說吗?
==============
The chatbot could then answer correctly with the RAG function, as shown in the highlighted box below:

Though RAG can help the LLM chatbot produce a more accurate answer by referencing the related document content, it may sometimes miss the most relevant content if the document is too long. Most open-source foundation LLMs are based on Transformer technology, which has an inherent problem with long contexts (e.g., 100K to 10M tokens) [8]: processing a long-context query with a Transformer-based model is prohibitively expensive compared to a short context (e.g., 4K tokens). Much new research and architectural design work is being conducted to address this problem, and we will further explore and evaluate the best state-of-the-art solutions in the future.

Knowledge Graph

Retrieval-augmented generation (RAG) applications are good at answering simple questions by using external data with AI language models. However, they struggle with complex questions that require connecting different pieces of information [9]. This is because RAG applications just use a vector database to store the chunked information and hence cannot find relational information simply by ranking similar record chunks.

Knowledge graphs are a better solution for handling complex questions. They store
information as a relationship network of connected nodes. This setup allows RAG
applications to easily move from one piece of information to another, finding all the related
details quickly.

Using a Knowledge Graph in a RAG application makes searching for information more
efficient, especially when dealing with connected data. The best part is that you can add any
new type of information to the graph without needing to change its overall structure.

In simple terms, Knowledge Graphs help RAG applications become smarter at answering
tricky questions by organizing information in a way that shows how everything is connected
by known factual relationships. [10]

A triple in a Knowledge Graph consists of three parts: a subject, a predicate (relationship), and an object, for example "Alice (subject) loves (predicate) ice cream (object)". This structure helps to represent relationships and facts about entities, enabling machines to understand and navigate complex information.
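For illustration, the sketch below shows roughly how one such TCM triple, 硼砂 (subject), 功能 (predicate), 清热消痰 (object), could be written into neo4j with the official Python driver. The connection details, the generic Entity label and the RELATES_TO relationship type are assumptions for illustration only; the actual dataset described below uses its own Chinese labels and relationship types.

from neo4j import GraphDatabase

# Connection details are illustrative; adjust to the local neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_triple(tx, subject: str, predicate: str, obj: str):
    # Store one (subject, predicate, object) triple as two nodes plus a relationship.
    # Cypher cannot parameterise a relationship type, so a generic RELATES_TO type
    # carrying a predicate property is used here for simplicity.
    tx.run(
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        "MERGE (s)-[:RELATES_TO {predicate: $predicate}]->(o)",
        subject=subject, predicate=predicate, object=obj,
    )

with driver.session() as session:
    # Example: Borax (硼砂) has the function (功能) of clearing heat and eliminating phlegm (清热消痰).
    session.execute_write(add_triple, "硼砂", "功能", "清热消痰")
driver.close()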

To explore how a Knowledge Graph can help organize Chinese Medicine information, we set up a neo4j Community Edition 5.17 [11] knowledge graph system on our local PC. To test and evaluate building a Knowledge Graph (KG) for Traditional Chinese Medicine (TCM) data, we made use of a publicly available TCM triple dataset with over 123K entity records, sourced from the open-source project "中医 TCM-neo4j 知识图谱" by ACNgeeeekoi [12].

Since there are more than ten thousand herbs of Traditional Chinese Medicine (TCM) in the world, we could not build a TCM Knowledge Graph (KG) at such a scale with our limited hardware, human resources and timeframe. For testing purposes, we built our TCM-KG with 938 commonly known TCM herbs (中藥).

The neo4j KG system allows us to use Cypher, a declarative graph query language [13]. It is the SQL equivalent for graph knowledgebases. Like SQL, Cypher lets users focus on what to retrieve from the graph rather than how to retrieve it, allowing for efficient and expressive queries that reveal previously unknown data connections and entity clusters. The neo4j KG system can output the result of a Cypher query as a visual graph, table, text or programming code.

The following figure shows that the Cypher query [MATCH (n:`中葯`) RETURN n] returns a visual graph of the 938 Chinese Herb (中葯) nodes from the KG knowledgebase.

From the node labels and relationship types of the injected TCM triple entity dataset shown in the above figure, we can see the node names: TS_MS, chemical_MM, symmap_chemical, 中药, 功能, 归经, 毒性, 治法, 注意, 用法, 用量, 药味, 药性, 证候, 贮藏, 部位. Most of the Chinese names are easy to recognise and understand, but the English terms may need some explanation. Before explaining the English terms, we need to introduce SymMap [14], a TCM and Modern Medicine (MM) integration project:
SymMap integrates traditional Chinese medicine (TCM) with modern medicine (MM)
through both internal molecular mechanism and external symptom mapping, thus
provides massive information on herbs/ingredients, targets, as well as the clinical
symptoms and diseases they are used to treat for drug screening efforts.

Now, we update the SymMap knowledgebase into version 2.0 to embrace the recent
data update. First, we invited medical experts to manually extract and proofread
newly recorded herbs, symptoms, syndromes and the relationships among them based
on new Chinese Pharmacopoeia. Second, we obtained lots of high-quality
relationships among herbs, ingredients and proteins and evidentiary literatures and
texts that supports these relationships through the combination of information
extraction and strict manual review by medical experts. Third, according to the data
of mass spectrum experiments from Chinese Pharmacopoeia (2020 edition) and
biomedical literatures, we further annotated the TCM ingredients into 4 categories,
including QC ingredients, blood ingredients, metabolic ingredients, as well as other
ingredients. Finally, for users’ convenience, the detailed relationships among
symptoms and ingredients are shown as network figures. And a convenient function of
GO/pathway enrichment for gene targets is presented in a user-defined way.

To explain the English terms TS_MS, chemical_MM and symmap_chemical, we select the following sample triple relationship records from our triple dataset for illustration:
1. 黄芪 -> SMIT03044 -> symmap_chemical
2. 腰痛 -> Left Flank Pain -> TS_MS
3. SMIT11390 -> Bilateral Headache -> chemical_MM

Analysis of the sample records


1. 黄芪 -> SMIT03044 -> symmap_chemical
In this TCM triple relationship, 黄芪 (Astragalus membranaceus) is linked to the
SymMap ingredient ID SMIT03044. The "symmap_chemical" term indicates that
SMIT03044 represents a chemical compound or ingredient of Molecule name: 1,7-
Dihydroxy-3,9-Dimethoxy Pterocarpene [15] related to the herb 黄芪. SymMap
integrates information about herbs and their chemical constituents, allowing
researchers to explore the potential bioactive components in TCM herbs.

2. 腰痛 -> Flank Pain -> TS_MS


This relationship maps the TCM symptom 腰痛 to the modern medicine symptom
"Flank Pain". The "TS_MS" abbreviation stands for "Traditional Symptom to
Modern Symptom", which is a key function of SymMap. By establishing these
mappings, SymMap enables the integration of TCM and modern medical concepts,
facilitating the translation between the two systems and potentially identifying
common underlying mechanisms.

3. SMIT11390 -> Bilateral Headache -> chemical_MM


In this case, the SymMap ingredient ID SMIT11390 is linked to the modern medicine
symptom Bilateral Headache (双侧头痛). The "chemical_MM" abbreviation
suggests that SMIT11390 represents a chemical compound that may be associated
with or have an effect on the modern medical symptom of bilateral headache. This
type of relationship can help identify potential bioactive compounds in TCM that may
have applications in modern medicine for treating specific symptoms or conditions.

In summary, the terms "symmap_chemical", "TS_MS", and "chemical_MM" are part of SymMap's framework for integrating TCM and Modern Medicine. They represent, respectively, the relationships between herbs and their chemical constituents (symmap_chemical), between TCM symptoms and Modern Medicine symptoms (TS_MS), and between chemical compounds and Modern Medicine conditions (chemical_MM). These connections enable researchers to explore the potential applications of TCM in modern medicine and gain insights into the underlying mechanisms of action of TCM remedies.

It is difficult to visualize 938 Chinese Herb nodes in one chart. Therefore, we limited our focus to 25 nodes for illustration, as shown below:

In our KG system, you can double click on any node to expand its connections with other nodes or other relationship entities (e.g. 功能, 归经, 毒性, 治法, 注意, 用法, etc.).

For example, if we double click on Borax (硼砂), the relationships connected to Borax are displayed in the following KG chart. We can see that Borax (硼砂) is a kind of Chinese herb (中葯) used for sores on the mouth and tongue (口舌生疮). We can also find that one of the medicinal properties (葯性) of Borax is Cool (凉). This relationship is aligned with the common-sense concept of taking some "Cool" medicine to treat mouth and tongue pain. From the KG chart, one of the functions (功能) of Borax is to clear away heat and eliminate phlegm (清热消痰). Moreover, we can see many other properties of Borax, such as its medicinal taste (葯味), which is salty (咸).
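The same exploration can also be scripted instead of clicked. The following is a minimal sketch using the neo4j Python driver; the node label `中药`, the `name` property and the connection details are assumptions about the dataset schema and may need to be adjusted to the actual triple dataset.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Equivalent of double clicking the 硼砂 node: list every relationship it participates in.
query = """
MATCH (h:`中药` {name: $herb})-[r]-(x)
RETURN type(r) AS relation, labels(x) AS target_labels, x.name AS target
"""

with driver.session() as session:
    for record in session.run(query, herb="硼砂"):
        print(record["relation"], record["target_labels"], record["target"])
driver.close()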

If we further explore the relationships of connected nodes in the KG, the KG chart becomes complex and busy. When we double click on the node for sores on mouth and tongue (口舌生疮), which is connected to Borax (硼砂), more entities and relationships are shown, as in the chart below.

For some nodes, we can also see the symmap_chemical entries related to the Chinese herb. For example, when we double click on flaxseed (亞麻籽), we can see more than twenty symmap_chemical IDs associated with flaxseed, which is useful for Chinese Medicine researchers exploring the chemical components of Chinese herbs.

Using Cypher, we can also explore the properties of Chinese herbs from other angles, e.g. medicinal properties (葯性). In the following chart, we show 25 types of medicinal properties (葯性), such as severe heat (大熱), severe cold (大寒) and dryness (燥).

When we double click on severe heat (大熱) and severe cold (大寒), we can see that the medicinal property of Cinnamon (肉桂) belongs to severe heat (大熱), while the medicinal properties of Gypsum (石膏) and Tinglizi (葶苈子) are severe cold (大寒), as shown in the following chart. As an interesting finding, when we double click on dryness (燥), a related Chinese herb found is Tibetan Calamus (藏菖蒲). After further expanding the Tibetan Calamus node, a special Chinese medicinal property of Sharp (銳), which means a fast and strong effect, is also found related to Tibetan Calamus.

The KG system is very flexible in how information is retrieved from the triple entity data, and the Cypher language is easy to use with the different relationship entities (e.g. 功能, 归经, 毒性, 治法, 注意, 用法, etc.). By just changing `葯性` to `功能` in the previous Cypher statement:
MATCH (n:`功能`) RETURN n LIMIT 25
25 function (功能) nodes of Chinese Medicine are retrieved. After double clicking on any one node, such as "Promoting blood circulation and removing blood stasis (活血化瘀)", all connections with that node are expanded, as shown in the chart below.

Applying Knowledge Graphs to Traditional Chinese Medicine (TCM) study and learning offers significant potential for enhancing understanding and research in this complex domain. Knowledge Graphs can structure the vast wisdom of TCM, connecting elements such as herbs, symptoms, treatments and theories into a comprehensive network of information.

In TCM study, entity linking can help connect specific herbs or treatments to their corresponding nodes in the knowledge graph, allowing students and practitioners to quickly access relevant information, for example linking the Chinese herb Borax (硼砂) to its properties, e.g. Cool (凉); its uses, e.g. healing sores on the mouth and tongue (口舌生疮); and its relationships with other herbs, e.g. Chinese Waxgourd Peel (冬瓜皮), or conditions. This kind of exploration is related to path mining research in Knowledge Graphs [16].

Path mining in a knowledge graph involves exploring and identifying hidden relationships and patterns between entities, revealing insights that are not immediately obvious. It helps uncover complex connections within the data. Path mining in TCM Knowledge Graphs can uncover hidden relationships between symptoms, treatments and underlying principles of Chinese Medicine. This can lead to new insights into treatment combinations or reveal unexpected connections between seemingly unrelated aspects of TCM. Reasoning and inference capabilities can be particularly valuable in TCM learning. By integrating information from various sources, Knowledge Graphs can help learners understand complex TCM concepts, predict potential treatment outcomes, or even suggest novel approaches based on traditional principles.

Furthermore, Knowledge Graphs can facilitate the integration of modern scientific research
with traditional TCM knowledge, creating a bridge between ancient wisdom and modern
medical understanding. This integration can enhance TCM education, support evidence-based
practice, and potentially lead to new discoveries in both traditional and modern medicine.
As mentioned before, SymMap is a project making use of a Knowledge Graph to integrate TCM with Modern Medicine (MM). For example, when we search for the Chinese herb Waxgourd Peel (冬瓜皮) on the SymMap site [17], we find that its Latin name is Benincasae Exocarpium, a common name used in Modern Medicine literature, and that it falls under a class of modern medicine drugs, Diuretic Dampness Excreting Drugs (利尿祛濕藥), as shown below.

Conclusion and Way Forward
During the 3-month period of study and prototype development of the CMKA system, we evaluated 5 LLMs on their accuracy in answering Chinese Medicine multiple-choice questions and selected the best-performing LLM, (书生) Internlm 2.5-1.8B-chat from Shanghai AI Laboratory [18], as the foundation LLM for the development of the CMKA prototype system. To enhance the LLM's understanding of Traditional Chinese Medicine (TCM) knowledge, we added the Retrieval Augmented Generation (RAG) function to our prototype. Within the constraints of our existing hardware and timeframe, we imported one 474-page classic TCM PDF book (黄帝内经原文白话全集) into the vector database of our prototype. We found that the foundation LLM's replies to TCM queries were improved by the RAG function and the imported TCM book.

Large language models (LLMs) have shown impressive performance on a wide range of tasks.
However, they can struggle with understanding and applying domain-specific knowledge,
such as traditional Chinese medicine (TCM). This is because TCM has its own unique
terminology, concepts, and reasoning patterns that are not well captured by general-purpose
language models. RAG involves retrieving relevant passages from a vector database and
using them to improve the reply from the LLM. However, the vector databases used for RAG
are often general-purpose and lack the depth and specificity required for TCM. Additionally,
TCM knowledge is often expressed in a more implicit, metaphorical way compared to
Western medicine [19]. LLMs trained on more literal language may have difficulty capturing
these nuances.

Knowledge graphs (KGs) can help address these shortcomings by providing a structured representation of TCM knowledge. A KG models entities (such as symptoms, herbs, functions and treatments) and the relationships between them. This allows for more precise retrieval and reasoning compared to unstructured text.

The following are some key areas in which a KG can improve LLM performance on TCM tasks:
1. Capturing domain-specific terminology and concepts: a KG can model the unique vocabulary and conceptual framework of TCM, which general-purpose knowledge bases often lack.
2. Representing complex relationships: a KG can model the intricate relationships between TCM concepts, like the associations between symptoms, syndromes and treatments. This allows for more sophisticated reasoning.
3. Enabling multi-hop inference: KGs support traversing multiple relationships to uncover indirect connections. This is useful for answering complex queries that require chaining together multiple pieces of knowledge (see the sketch after this list).
4. Providing structured data for training: the structured nature of a KG allows for targeted training of LLMs on specific aspects of TCM knowledge. This can help the model learn the appropriate reasoning patterns for TCM.
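As an illustration of the multi-hop inference in point 3, a single Cypher pattern can chain two hops, for example from a symptom through the herbs connected to it and on to the functions of those herbs. The sketch below is hedged: the labels, the undirected pattern and the `name` property are assumptions about the triple dataset schema rather than its confirmed structure.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Two hops in one query (schema assumed): symptom -> herbs linked to it -> functions of those herbs.
query = """
MATCH (s {name: $symptom})--(h:`中药`)--(f:`功能`)
RETURN DISTINCT h.name AS herb, f.name AS func
LIMIT 20
"""

with driver.session() as session:
    for record in session.run(query, symptom="口舌生疮"):
        print(record["herb"], "->", record["func"])
driver.close()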

By integrating KGs, LLMs can leverage the strengths of both unstructured language
understanding and structured knowledge representation to better comprehend and apply TCM
knowledge. This can lead to more accurate and reliable answers to TCM-related queries and better support for learning.
We set up the neo4j KG system on our local notebook PC with a publicly available TCM triple entity dataset of over 123K records, and accessed the structured TCM data in the KG through some Python programs.

However, we have not been able to integrate and run the LLM CMKA prototype together with the KG system because of limited hardware memory and processing power.

Looking forward, we are going to seek support from stakeholders to turn our CMKA prototype into a full-scale functional system available for public access through a Software-as-a-Service (SaaS) model on the Internet. We believe students, teachers, and Chinese Medicine researchers and manufacturers who are studying, working or doing research on Chinese Medicine will be our stakeholders and customers. Our final released CMKA cloud service will contribute great value and benefits to society by promoting public interest in and awareness of learning Chinese Medicine.

Specifically, we hope to achieve the following targets in future stages of CMKA development:
1. Enrich the Traditional Chinese Medicine (TCM) knowledgebase in the LLM and KG with help from TCM professionals and experts.
2. Release the CMKA prototype online for public trial access.
3. Integrate the TCM KG knowledgebase with the LLM once sufficient hardware resources are available.
4. Enhance CMKA with an interactive, user-friendly interface.
5. Promote public interest in learning TCM knowledge.

--- End ---

References:

[1] 神农 ShenNong-TCM-LM. Source: https://github.com/michael-wzhu/ShenNong-TCM-LLM

[2] HuaTuoGPT (2023-05-24), BenTsao 本草 (original name: HuaTuo 华驼): Instruction-tuning Large Language Models With Chinese Medical Knowledge. Paper: arXiv:2305.15075. Citation: @misc{wang2023huatuo, title={HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge}, author={Haochun Wang and Chi Liu and Nuwa Xi and Zewen Qiang and Sendong Zhao and Bing Qin and Ting Liu}, year={2023}, eprint={2304.06975}, archivePrefix={arXiv}, primaryClass={cs.CL}}

[3] CMLM-ZhongJing (中医大语言模型-仲景). Source: https://github.com/pariskang/CMLM-ZhongJing/tree/main. Citation: @misc{CMLM-ZhongJing, author = {Kang, Yanlan and Chang, Yang and Fu, Jiyuan and Wang, Yan and Wang, Haofen and Zhang, Wenqiang}, title = {CMLM-ZhongJing: Large Language Model is Good Story Listener}, year = {2023}, publisher = {GitHub}, journal = {GitHub Repository}, howpublished = {\url{https://github.com/pariskang/CMLM-ZhongJing}}}

[4] Ollama, a local inference framework designed for deploying and running large language models (LLMs) on personal machines. Website: https://ollama.com/ Source: https://github.com/ollama/ollama

[5] CCKS2024-TCMBench 中医知识理解与推理能力评测——不微调赛道. Source: https://tianchi.aliyun.com/competition/entrance/532199/information

[6] "The Beginner's Guide to Hallucinations in Large Language Models" by Deval Shah. Source: https://www.lakera.ai/blog/guide-to-hallucinations-in-large-language-models

[7] "Retrieval-Augmented Generation for Large Language Models: A Survey" by Yunfan Gao, et al. Source: arXiv:2312.10997v2 [cs.CL], 29 Dec 2023

[8] "Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis" by Yao Fu. arXiv:2405.08944

[9] "Why your RAG is not working?" by Saurabh Singh. Source: https://medium.com/@saurabhgssingh/why-your-rag-is-not-working-96053b4d5305

[10] "Enhancing the Accuracy of RAG Applications With Knowledge Graphs" by Tomaz Bratanic. Source: https://medium.com/neo4j/enhancing-the-accuracy-of-rag-applications-with-knowledge-graphs-ad5e2ffab663

[11] "Neo4j: Graphs for Everyone" by Neo4j. Neo4j Community Edition is an open-source product licensed under GPLv3. Source: https://github.com/neo4j/neo4j

[12] "中医 TCM-neo4j 知识图谱" by ACNgeeeekoi. Source: https://github.com/ywjawmw/TCM_KG

[13] Cypher, Neo4j's declarative query language, allowing users to unlock the full potential of property graph databases. Source: https://neo4j.com/docs/cypher-manual/current/introduction/

[14] "SymMap: an integrative database of traditional Chinese medicine enhanced by symptom mapping" by Wu Y, Zhang F, Yang K, Fang S, Bu D, Li H, Sun L, Hu H, Gao K, Wang W, Zhou X, Zhao Y, Chen J. Nucleic Acids Research 2018, 47(D1): D1110-D1117. Website: http://www.symmap.org/

[15] Summary of the ingredient SMIT03044, Molecule name: "1,7-Dihydroxy-3,9-Dimethoxy Pterocarpene". Source: http://www.symmap.org/detail/SMIT03044

[16] "Mining Simple Path Traversal Patterns in Knowledge Graph" by Feng Xiong and Hongzhi Wang. Journal of Web Engineering, Vol. 21, No. 2, 307-336. doi:10.13052/jwe1540-9589.2128, 2022, River Publishers

[17] Search (Herb) in SymMap v2: http://www.symmap.org/search/

[18] 书生·浦语 open-source large language model Internlm 2.5-1.8B-chat from Shanghai AI Laboratory. Source: https://github.com/InternLM/InternLM

[19] "Investigating the Translation of Metaphors Used in Diagnosis and Treatment in Chinese Medicine Classics Neijing and Shanghan Lun" by Yanmei Liu, 2022 PhD thesis, School of Humanities and Communication Arts, Western Sydney University