A Topic Modeling Approach For Traditional Chinese Medicine Prescriptions
Abstract—In traditional Chinese medicine (TCM), prescriptions are the daughters of doctors’ clinical experiences, which have been the
main way to cure diseases in China for several thousand years. In the long Chinese history, a large number of prescriptions have been
invented based on TCM theories. Regularities in the prescriptions are important for both clinical practice and novel prescription
development. Previous works used many methods to discover regularities in prescriptions, but rarely described how a prescription is
generated using TCM theories. In this work, we propose a topic model which characterizes the generative process of prescriptions in TCM
theories and further incorporate domain knowledge into the topic model. Using 33,765 prescriptions in TCM prescription books, the
model can reflect the prescribing patterns in TCM. Our method can outperform several previous topic models and group recommendation
methods on generalization performance, herbs recommendation, symptoms suggestion, and prescribing patterns discovery.
1 INTRODUCTION
in many health care and biomedicine tasks. For instance, in population genetics, one can treat each individual’s genotype as a “document” and genetic patterns are “topics” of those documents [28]. Chen et al. [29] showed that the configuration of functional groups in meta-genome samples can be inferred by probabilistic topic modeling. Van Esbroeck et al. [30] explored the application of topic models on heart rate time series to identify functional sets of heart rate sequences and to concisely describe patients. Recently, latent treatment patterns for clinical pathways [31] were discovered with topic modeling.

Some knowledge-based topic models [23], [24], [25] have been proposed. These models mainly use different forms of external linguistic knowledge for better text mining, but knowledge-based topic models have not been extensively explored for other kinds of data, especially for medical data.

2.2 TCM Knowledge Discovery
Knowledge discovering and data mining have become hot topics in health care and biomedicine [32], [33]. Compared with data mining research in modern biomedicine, TCM data mining just becomes popular in recent years. The efforts of TCM data mining have been reviewed by Feng et al. [34], Lukman et al. [35], Liu et al. [36] and Li and Liu [37].

A number of works have been devoted to studying the component patterns in TCM prescriptions. For example, Li et al. [6] constructed herb network using a method called Distance-based Mutual Information Model to identify useful relationships among herbs in numerous prescriptions. Zhang et al. [8] discovered interesting regularities using latent tree models [38], these regularities are of interest to students of TCM as well as pharmaceutical companies that manufacture medicine using Chinese herbs. He et al. [9] proposed an approach that could discover herbal functional groups from a large set of prescriptions recorded in TCM books. Poon et al. [7] proposed an approach that could systematically generate combinations of interacting herbs that might lead to good outcome. Zheng et al. [11] constructed prescription associated networks by mining literature data sets. Yao et al. [10] introduced a system which mines the evolutionary relationship among TCM prescriptions from prescription books.

The closest works to ours are [12], [13], [14], [15] which have explored topic modeling on TCM clinical data. Zhang et al. [12] proposed a hierarchical symptom-herb topic model which uses Link latent Dirichlet allocation (LinkLDA) [39] model and nested Chinese restaurant process to automatically extract hierarchical latent topic structures with both symptoms and their corresponding herbs in TCM clinical records. The number of hierarchical topics is automatically determined. Zhang et al. [13] proposed the Symptom-Herb-Diagnosis topic model which uses Author-topic model (ATM) [40] and diagnoses information to discover the common relationships among symptoms, herb combinations and diagnoses in clinical cases. Jiang et al. [14] applied LinkLDA directly to the same problem. Our model is an extension to LinkLDA model. In our previous work [15], we presented a framework to mine medicine usage patterns in clinical cases. We first mapped symptoms to treatment methods defined in TCM domain ontology, then viewed treatment methods as labels of a prescription and employed a supervised topic model to learn herb usage patterns under each topic (label). The method could reflect treatment methods-herbs relations. However, it could not learn direct symptom-herb relations and perform recommendation or suggestion tasks in this study because a label could correspond to different combinations of symptoms.

Although these topic models described the prescribing process, they failed to characterize the two important principles jun-chen-zuo-shi and herb compatibility, or could not utilize domain knowledge well, while our topic model is more consistent with TCM theories and domain knowledge.

3 DATA
We collect 98,334 prescriptions from Dictionary of Traditional Chinese Medicine Prescriptions [3] which contains almost all (about 100,000) prescriptions recorded in China. We focus on herbs and symptoms in this work.

We filter indication symptoms by using 603 standard symptoms in Traditional Chinese Medicine Symptoms differential diagnosis [41], and filter herbs by using 970 herbs in Traditional Chinese Medical Subject Headings (TCM MeSH) [42] (footnote 2) which is compatible with Medical Subject Headings (MeSH). Each symptom has a syndrome category and each herb has efficacy description text. Among all 98,334 prescriptions, 33,765 of them have both symptoms and herbs in two filters. $S = 390$ symptoms and $H = 811$ herbs appear in $P = 33{,}765$ prescriptions. We run our experiments on the 33,765 prescriptions. We randomly divided the $P = 33{,}765$ prescriptions into a training set of 28,746 prescriptions and a test set of 5,019 prescriptions.

4 PRESCRIPTION TOPIC MODEL (PTM)
Guided by li-fa-fang-yao, TCM practitioners usually synthesise disease manifestations (symptoms) and determine syndromes of a patient first. Then treatment methods are easily determined according to syndromes. In general, a particular treatment method corresponds to a syndrome. For example, in Fig. 1, TCM practitioners first determine the syndrome “depressed nutrient and defense” which means the nutrient in blood is not well absorbed and immunity is weak and the syndrome “failure of lung qi in dispersion” which means respiratory movement is depressed, then the treatment methods “inducing sweating to releasing exterior” (which means inducing sweating and move qi (the fundamental substance which constitutes the human body) to skin) corresponding to “depressed nutrient and defense” and “diffuse the lung to calm panting” (which means regulating respiratory movement to calm panting) corresponding to “failure of lung qi in dispersion” are decided. Finally, practitioners form a prescription based on the treatment methods. In the prescription, each treatment method is implemented by some herbs (e.g., the two treatment methods mentioned above are mainly implemented by Ephedra), and each herb has a jun-chen-zuo-shi role (e.g., Ephedra is the jun herb).

Based on this process, here we introduce the details of our Prescription Topic Model (PTM). Let $P$ be the number of prescriptions where each prescription $p$ has $N^h_p$ herbs and $N^s_p$ symptoms, $h_{pn}$ is the $n$th herb in $p$ and $s_{pm}$ is the $m$th symptom in $p$. The prescription in Fig. 1 has $N^h_p = 4$ herbs and $N^s_p = 7$ symptoms. $z_{pn}$ is the latent treatment method assignment for $h_{pn}$, $z'_{pm}$ is the latent syndrome assignment for $s_{pm}$, $x_{pn}$ is the latent jun-chen-zuo-shi role assignment for $h_{pn}$ (The prescriptions with known jun-chen-

2. Available at http://zcy.ckcest.cn/tcm/dic/home
Authorized licensed use limited to: Francis Xavier Engineering College. Downloaded on February 21,2024 at 10:34:14 UTC from IEEE Xplore. Restrictions apply.
1010 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 30, NO. 6, JUNE 2018
Fig. 3. The probabilistic graphical models representation of PTM. (a) PTM(a): The prescription topic model with herb role only. (b) PTM(b): The prescription topic model with herb role and herb compatibility.
$N^l_p = C^2_{N^h_p} = N^h_p (N^h_p - 1)/2$ herb pairs in $p$ when $N^h_p > 1$, if there is only one herb in $p$, we assume that $p$ has one herb pair, but the pair consists of two identical herbs. As shown in the left part of Fig. 3b, the generative story of the link set of prescription $p$ is as follows:

(1) For each herb pair $\tilde{h}_{pl}$ of the $N^l_p$ herb pairs in prescription $p$:
    a) Draw a treatment method $z_{pl} \sim \mathrm{Mult}(\theta_p)$.
    b) Draw two roles $x_{pl1}, x_{pl2} \sim \mathrm{Mult}(\pi_{p z_{pl}})$.
    c) Draw two herbs $h_{pl1} \sim \mathrm{Mult}(\phi_{z_{pl} x_{pl1}})$, $h_{pl2} \sim \mathrm{Mult}(\phi_{z_{pl} x_{pl2}})$.

We name this model with herb compatibility PTM(b). The inference equation for $z_{pl}$, $x_{pl1}$ and $x_{pl2}$ is defined as

$$p(z_{pl} = k, x_{pl1} = x_1, x_{pl2} = x_2 \mid \tilde{h}_{pl}, z_{\neg pl}, x_{\neg pl}, \alpha, \beta, \eta) \propto \frac{n''_{pk} + \alpha}{N^s_p + N^l_p + K\alpha} \cdot \frac{n_{pkx_1} + \eta}{n'_{pk} + X\eta} \cdot \frac{n_{kx_1 h_{pl1}} + \beta}{n_{kx_1} + H\beta} \cdot \frac{n_{pkx_2} + \eta}{n'_{pk} + X\eta} \cdot \frac{n_{kx_2 h_{pl2}} + \beta}{n_{kx_2} + H\beta}, \tag{7}$$

where $\tilde{h}_{pl}$ is the $l$th herb pair in prescription $p$, $z_{\neg pl}$ are treatment method assignments for all herb pairs except $\tilde{h}_{pl}$, $x_{\neg pl}$ are role assignments for all herb pairs except $\tilde{h}_{pl}$, $n''_{pk}$ is the number of times any herb pair or symptom in $p$ is assigned to topic $k$. The inference equation for $z'_{pm}$ in PTM(b) is similar to Equation (1), but we need to replace $n_{pk}$ and $N^h_p$ with $n''_{pk}$ and $N^l_p$

$$p(z'_{pm} = k \mid s_{pm}, s_{\neg pm}, z'_{\neg pm}, z, \alpha, \beta') \propto \frac{n''_{pk} + \alpha}{N^s_p + N^l_p + K\alpha} \cdot \frac{n_{k s_{pm}} + \beta'}{n_k + S\beta'} \tag{8}$$

The parameter estimation equations for $\phi'_{k s_{pm}}$, $\pi_{pkx}$ and $\phi_{kxh}$ in PTM(b) are the same as in PTM(a), the only difference is

$$\theta_{pk} = \frac{n''_{pk} + \alpha}{N^s_p + N^l_p + K\alpha} \tag{9}$$

4.3 Incorporating Herb Efficacy Knowledge
In this section, we use TCM prior knowledge to improve the prescription topic model. We extract the symptom-herb correspondences from the training prescriptions. Specifically, we use the 390 symptoms to filter efficacy descriptions of 811 herbs in TCM MeSH, and obtain the symptom-herb correspondences, then for each prescription in the training set, if an herb $h$ in a prescription $p$ can treat a symptom $s$ of $p$’s indication, we add $h$ and $s$ (e.g., the herb Ephedra and the symptom aversion to cold with fever in Fig. 1) to the symptom-herb corresponding set of prescription $p$. Since their correspondence in TCM knowledge, we assume a symptom $s$ in the corresponding set can only be assigned to the topics of $s$’s corresponding herbs in prescription $p$.

We name the prescription topic model with herb role and herb efficacy knowledge PTM(c) which is illustrated in Fig. 4a. If $s_{pm}$ has no corresponding herb in prescription $p$, the inference equation for $z'_{pm}$ is the same as Equation (1); otherwise, $z'_{pm}$ can only be sampled from the topic assignment set $\{z_{pn} \mid h_{pn} \text{ treats } s_{pm}\}$ of $s_{pm}$’s corresponding herbs $\{h_{pn} \mid h_{pn} \text{ treats } s_{pm}\}$ in $p$, the inference equation for $z'_{pm}$ is

$$p(z'_{pm} = k \mid s_{pm}, s_{\neg pm}, z'_{\neg pm}, z, \alpha, \beta') \propto I[k \in \{z_{pn} \mid h_{pn} \text{ treats } s_{pm}\}] \cdot \frac{n_{pk} + \alpha}{N^s_p + N^h_p + K\alpha} \cdot \frac{n_{k s_{pm}} + \beta'}{n_k + S\beta'} \tag{10}$$

where $I[y] = 1$ when $y$ is true and $I[y] = 0$ when $y$ is false. The inference equation for $z_{pn}$ and $x_{pn}$ in PTM(c) is the same as Equation (2). The parameter estimation equations for PTM(c) are the same as PTM(a).

We name our prescription topic model with herb role, herb compatibility and herb efficacy knowledge PTM(d) which is shown in Fig. 4b. If $s_{pm}$ has no corresponding herb in prescription $p$, the inference equation for $z'_{pm}$ is the same as Equation (8); otherwise, $z'_{pm}$ can only be sampled from the topic assignment set $\{z_{pl} \mid h_{pl1} \text{ treats } s_{pm} \text{ or } h_{pl2} \text{ treats } s_{pm}\}$ of $s_{pm}$’s corresponding herbs in $p$, the inference equation for $z'_{pm}$ is

$$p(z'_{pm} = k \mid s_{pm}, s_{\neg pm}, z'_{\neg pm}, z, \alpha, \beta') \propto I[k \in \{z_{pl} \mid h_{pl1} \text{ treats } s_{pm} \text{ or } h_{pl2} \text{ treats } s_{pm}\}] \cdot \frac{n''_{pk} + \alpha}{N^s_p + N^l_p + K\alpha} \cdot \frac{n_{k s_{pm}} + \beta'}{n_k + S\beta'} \tag{11}$$
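The restricted sampling step of Equations (10) and (11) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `symptom_topic_distribution` and its argument names are hypothetical, the count arrays are assumed to already exclude the symptom being resampled, and the denominator $N^s_p + N^h_p + K\alpha$ (or $N^s_p + N^l_p + K\alpha$) is dropped because it does not depend on $k$ and cancels on normalisation.

```python
def symptom_topic_distribution(n_pk, n_ks, n_k, allowed_topics,
                               alpha, beta_s, num_symptoms):
    """Normalised conditional over topics for one symptom (hypothetical helper).

    n_pk[k]  : items of prescription p assigned to topic k (n_pk or n''_pk)
    n_ks[k]  : times this symptom is assigned to topic k in the corpus
    n_k[k]   : total symptom assignments to topic k
    allowed_topics : topics currently assigned to the herbs (or herb pairs)
                     in p that treat this symptom; an empty set is taken to
                     mean "no corresponding herb", i.e. no restriction.
    All counts are assumed to exclude the symptom being resampled.
    """
    weights = []
    for k in range(len(n_pk)):
        # Indicator I[k in allowed set]; always 1 when there is no restriction.
        indicator = 1.0 if (not allowed_topics or k in allowed_topics) else 0.0
        # The prescription-level denominator is constant in k, so it is omitted.
        weight = (indicator * (n_pk[k] + alpha)
                  * (n_ks[k] + beta_s) / (n_k[k] + num_symptoms * beta_s))
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


# Toy counts for K = 3 topics; only topics 0 and 2 belong to treating herbs.
probs = symptom_topic_distribution(
    n_pk=[2, 1, 0], n_ks=[3, 0, 1], n_k=[10, 5, 8],
    allowed_topics={0, 2}, alpha=1.0, beta_s=0.1, num_symptoms=390)
```

A topic index could then be drawn with `random.choices(range(len(probs)), weights=probs)`. Topics outside the allowed set receive probability zero, which is how the herb efficacy knowledge ties a symptom to the topics of the herbs that treat it.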
Fig. 4. The probabilistic graphical models representation of PTM with herb efficacy knowledge. (a) PTM(c): The prescription topic model with herb
role and herb efficacy knowledge. (b) PTM(d): The prescription topic model with herb role, herb compatibility and herb efficacy knowledge.
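For the variants with herb compatibility, PTM(b) and PTM(d), each prescription is represented by a link set of herb pairs: $N^h_p (N^h_p - 1)/2$ pairs when the prescription has more than one herb, and a single pair of two identical herbs otherwise. A minimal Python sketch of that construction follows; the function name `herb_pairs` and the placeholder herb names other than Ephedra are hypothetical.

```python
from itertools import combinations


def herb_pairs(herbs):
    """Build the link set of a prescription from its list of herbs.

    With more than one herb, every unordered pair is a link, giving
    N * (N - 1) / 2 pairs. With exactly one herb, the prescription is
    assumed to have one pair made of two identical herbs.
    """
    if len(herbs) == 1:
        return [(herbs[0], herbs[0])]
    return list(combinations(herbs, 2))


# A prescription with 4 herbs (placeholder names except Ephedra).
pairs = herb_pairs(["Ephedra", "herb_b", "herb_c", "herb_d"])
single = herb_pairs(["Ephedra"])
```

For a prescription with four herbs, as in Fig. 1, this gives 4 · 3 / 2 = 6 herb pairs.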
The inference equation for $z_{pl}$, $x_{pl1}$ and $x_{pl2}$ in PTM(d) is the same as Equation (7). The parameter estimation equations for PTM(d) are the same as PTM(b).

5 EXPERIMENT
In this section we evaluate our prescription topic model on four experimental tasks. Specifically we want to determine:

• Can our model achieve better generalization performance than other topic models?
• Can our model recommend herbs for a list of symptoms?
• Can our model suggest symptoms for a list of herbs?
• Can our model reflect the prescribing patterns in TCM?

We compare our prescription topic model (PTM) with eight baselines. Among them, six baselines are topic models, three baselines are group recommendation methods. We compare our model with group recommendation methods because recommending herbs (symptoms) for a list of symptoms (herbs) is analogous to recommending items to a group of users.

• Author-topic model (ATM) [40] employed by previous work [13] which treats herbs as authors and symptoms as words.
• LinkLDA [39] used in previous works [12] and [14] which views herbs and symptoms as words and references.
• Block-LDA [43], a topic model that extends Link-LDA. It can model links between certain type of entities. We treat herb-pairs set extracted from all training prescriptions as the external links.
• Link-PLSA-LDA [44], a topic model that extends Link-LDA. It can model links between different types of entities. We treat symptom-herb correspondence set extracted from all training prescriptions as the external links.
• User-based collaborative filtering with averaging strategy (CF-AVG) [45], a widely used group recommendation method. CF-AVG first estimates the rating score of each user in a user group by user-based collaborative filtering, then uses the average of these scores as the recommendation score for the group. We compute the conditional probability of items (herbs/symptoms) given users (symptoms/herbs) in training prescriptions as the rating score.
• User-based collaborative filtering with least-misery strategy (CF-LM) [45], a widely used group recommendation method which uses the smallest rating score of group users as the recommendation score for the group. We also compute the conditional probability of items given users in training prescriptions as the rating score.
• COnsensus Model (COM) [46], a group recommendation method which simulates the generative process of group events and make recommendations for a group of users. We treat herbs in a prescription as a group of users when recommending symptoms, and view symptoms as a group of users when recommending herbs.
• Bilingual Biterm Topic Model (BiBTM) [47], a topic model describing the generation process of a paired bilingual document corpus. We treat herbs in a prescription as the words in a document and symptoms as the words in the translated version of the document.

We set the following hyperparameters: for PTM: $\alpha = 1$, $\beta = 0.1$, $\beta' = 0.1$, $\eta = 1$; for LinkLDA: $\alpha = 1$, $\beta = 0.1$, $\beta' = 0.1$; for Block-LDA: $\alpha_D = \alpha_L = 1$, $\gamma = 0.1$; for Link-PLSA-LDA: $\alpha_\theta = \alpha_L$ (hyperparameter of $\pi$) $= 1$, $\beta'$ (hyperparameter of $\Omega$) $= \gamma$ (hyperparameter of $\beta$) $= 0.1$; for BiBTM: $\alpha = 1$, $\beta = 0.1$; for ATM: $\alpha = 50/K$, $\beta = 0.01$ as suggested in [40]; for COM: $\alpha = 50/K$, $\beta = \eta = 0.01$, $\gamma = \gamma_t = 0.5$ and $\rho = 0.01$ as suggested in [46]. For CF-AVG and CF-LM, we use Pearson correlation similarity and top 10 similar users. We find that small changes of hyperparameters do not change the results much. All topic models are trained using 1,000 Gibbs iterations.

5.1 Generalization Performance
5.1.1 Herbs Predictive Perplexity
We use the predictive perplexity to evaluate the herbs predictive power of topic models. Perplexity is a standard
YAO ET AL.: A TOPIC MODELING APPROACH FOR TRADITIONAL CHINESE MEDICINE PRESCRIPTIONS 1013
shows the correctness of modeling herbs and symptoms as two parts of a prescription. Block-LDA performs better than LinkLDA, which demonstrates using herb links can improve herb predictive capabilities. Link-PLSA-LDA outperforms LinkLDA, which shows extracting symptom-herb correspondences from prescriptions can help herb prediction. PTM(a) performs better than LinkLDA and similarly to Link-PLSA-LDA, because considering herb roles can highlight most relevant herbs (jun (emperor) and chen (minister) herbs) of given symptoms and ignore less relevant herbs. PTM(b) has lower perplexity scores than PTM(a) and Link-LDA ($p < 10^{-3}$), which means considering herb compatibility in each prescription can significantly improve the herb predictive power. This is intuitive because when seeing a symptom, practitioners not only use an herb that can treat the symptom, but also use a compatible herb to augment the effect or counteract the toxic [16]. PTM(c) also significantly outperforms PTM(a) ($p < 10^{-6}$), which demonstrates restricting symptom topic assignments using herb efficacy knowledge is also an efficient way to help herbs prediction, this is also intuitive because the knowledge makes an herb and its indication symptoms tend to be under the same topic. PTM(d) has the lowest perplexity scores, and significantly outperforms PTM(c) ($p < 10^{-6}$), which means considering both herb compatibility and herb efficacy knowledge leads to the best herb predictive power. However, compared to PTM(b), PTM(d) only improves a little, as connecting herb

Fig. 6 gives the symptoms predictive perplexity of each model with different number of topics. From Table 4, we can see that ATM also does not perform well on symptoms prediction, and LinkLDA performs better than ATM again, which shows modelling herbs and symptoms as two types of words of a document is a better choice. Block-LDA performs similarly to LinkLDA, which means using extracted herb pairs as external links outside the training prescriptions could not help symptom prediction much. Link-PLSA-LDA significantly outperforms Link-LDA ($p < 10^{-4}$), which means herb-symptom links can also help symptom prediction. PTM(a) has lower perplexity than LinkLDA ($p < 0.01$), which means considering herb roles can significantly improve the symptoms predictive power. This is because when seeing a list of herbs, the jun-chen-zuo-shi labels can highlight jun (emperor) herbs and chen (minister) herbs, and the corresponding symptoms are mainly treated by jun herbs and chen herbs. PTM(b) performs slightly better than PTM(a), which shows considering compatible herb may highlight chen (minister) herbs or zuo (assistant) herbs, which are also used to treat the corresponding symptoms. PTM(c) also slightly outperforms PTM(a), which shows restricting symptom topic assignment can also improve symptom predictive capability, but the improvement is not obvious as the improvement in herb prediction task, the reason could be that corresponding symptoms are fewer than
Fig. 6. Symptoms predictive perplexity of the topic models with different number of topics K. A lower perplexity means the predictive power is better. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(a), PTM(b), PTM(c), and PTM(d) over LinkLDA are all significant ($p < 0.01$) based on 2-tailed paired t-test.

Fig. 7. Prescription predictive perplexity of the topic models with different number of topics K. A lower perplexity means the predictive power is better. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(b) and PTM(d) over LinkLDA are significant ($p < 10^{-10}$) based on 2-tailed paired t-test.
TABLE 2
Herbs Precision@N (P@N) of Each Model with Different K (the Number of Topics) and N
Model | K = 20 | K = 30 | K = 40
ATM | 0.0088 ± 0.0021, 0.0091 ± 0.0026, 0.0087 ± 0.0005 | 0.0089 ± 0.0023, 0.0093 ± 0.0016, 0.0092 ± 0.0004 | 0.0081 ± 0.0023, 0.0089 ± 0.0014, 0.0083 ± 0.0006
LinkLDA | 0.2301 ± 0.0067, 0.1851 ± 0.0015, 0.1336 ± 0.0010 | 0.2277 ± 0.0036, 0.1789 ± 0.0023, 0.1298 ± 0.0014 | 0.2188 ± 0.0031, 0.1786 ± 0.0014, 0.1276 ± 0.0005
Block-LDA | 0.2269 ± 0.0030, 0.1817 ± 0.0015, 0.1321 ± 0.0016 | 0.2286 ± 0.0029, 0.1803 ± 0.0020, 0.1300 ± 0.0014 | 0.2192 ± 0.0052, 0.1770 ± 0.0019, 0.1283 ± 0.0006
Link-PLSA-LDA | 0.2320 ± 0.0037, 0.1858 ± 0.0016, 0.1356 ± 0.0013 | 0.2284 ± 0.0036, 0.1813 ± 0.0016, 0.1392 ± 0.0010 | 0.2236 ± 0.0034, 0.1793 ± 0.0021, 0.1297 ± 0.0015
BiBTM | 0.2143 ± 0.0000, 0.1604 ± 0.0000, 0.1216 ± 0.0000 | 0.2143 ± 0.0000, 0.1604 ± 0.0000, 0.1216 ± 0.0000 | 0.2143 ± 0.0000, 0.1604 ± 0.0000, 0.1216 ± 0.0000
CF-AVG | 0.2324 ± 0.0000, 0.1933 ± 0.0000, 0.1476 ± 0.0000 | 0.2324 ± 0.0000, 0.1933 ± 0.0000, 0.1476 ± 0.0000 | 0.2324 ± 0.0000, 0.1933 ± 0.0000, 0.1476 ± 0.0000
CF-LM | 0.2320 ± 0.0000, 0.1936 ± 0.0000, 0.1481 ± 0.0000 | 0.2320 ± 0.0000, 0.1936 ± 0.0000, 0.1481 ± 0.0000 | 0.2320 ± 0.0000, 0.1936 ± 0.0000, 0.1481 ± 0.0000
COM | 0.2197 ± 0.0008, 0.1731 ± 0.0010, 0.1289 ± 0.0008 | 0.2194 ± 0.0011, 0.1746 ± 0.0012, 0.1295 ± 0.0007 | 0.2197 ± 0.0011, 0.1745 ± 0.0008, 0.1316 ± 0.0005
PTM(a) | 0.2320 ± 0.0032, 0.1835 ± 0.0027, 0.1346 ± 0.0013 | 0.2299 ± 0.0039, 0.1819 ± 0.0019, 0.1348 ± 0.0006 | 0.2241 ± 0.0032, 0.1810 ± 0.0033, 0.1326 ± 0.0007
PTM(b) | 0.2475 ± 0.0029, 0.1998 ± 0.0027, 0.1497 ± 0.0009 | 0.2507 ± 0.0029, 0.2039 ± 0.0020, 0.1525 ± 0.0008 | 0.2533 ± 0.0024, 0.2056 ± 0.0011, 0.1528 ± 0.0009
PTM(c) | 0.2385 ± 0.0041, 0.1920 ± 0.0016, 0.1414 ± 0.0008 | 0.2376 ± 0.0037, 0.1880 ± 0.0020, 0.1326 ± 0.0007 | 0.2313 ± 0.0039, 0.1846 ± 0.0024, 0.1398 ± 0.0006
PTM(d) | 0.2486 ± 0.0023, 0.2009 ± 0.0019, 0.1497 ± 0.0006 | 0.2522 ± 0.0029, 0.2040 ± 0.0022, 0.1512 ± 0.0016 | 0.2528 ± 0.0027, 0.2053 ± 0.0011, 0.1531 ± 0.0008
Each K column lists three entries, one per value of N. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(b), PTM(c), and PTM(d) over LinkLDA are all significant ($p < 0.01$) based on 2-tailed paired t-test.
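The CF-AVG and CF-LM rows differ only in how per-user rating scores are combined into a group score: CF-AVG takes the average and CF-LM takes the smallest score. The following Python sketch illustrates just that aggregation step. It is a simplification under stated assumptions: the rating scores are taken to be already-estimated conditional probabilities stored in a nested dictionary, the user-based collaborative filtering step (Pearson similarity, top 10 similar users) is not reproduced, and the names `group_scores` and `cond_prob` and all toy values are hypothetical.

```python
def group_scores(cond_prob, group, strategy="avg"):
    """Aggregate per-user scores into one score per item for a user group.

    cond_prob[user][item] : rating score of `item` for `user`, here assumed
                            to be a conditional probability p(item | user)
    group                 : list of users (e.g. the symptoms of a prescription
                            when recommending herbs)
    strategy              : "avg" for averaging, "lm" for least misery
    """
    items = set()
    for user in group:
        items.update(cond_prob[user])
    scores = {}
    for item in items:
        # Items a user never co-occurred with get a score of 0 for that user.
        values = [cond_prob[user].get(item, 0.0) for user in group]
        if strategy == "avg":
            scores[item] = sum(values) / len(values)
        else:
            scores[item] = min(values)
    return scores


# Toy scores: two symptoms act as the "users", herbs are the "items".
toy = {
    "fever": {"Ephedra": 0.6, "herb_b": 0.2},
    "cough": {"Ephedra": 0.4, "herb_c": 0.5},
}
avg = group_scores(toy, ["fever", "cough"], strategy="avg")
lm = group_scores(toy, ["fever", "cough"], strategy="lm")
```

Under least misery, an herb that is missing for any one symptom of the group drops to zero, whereas averaging still credits it for the symptoms where it scores well.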
TABLE 3
Symptoms Precision@N (P@N) of Each Model with Different K (the Number of Topics) and N
Model | K = 20 | K = 30 | K = 40
ATM | 0.0742 ± 0.0006, 0.0522 ± 0.0004, 0.0375 ± 0.0002 | 0.0738 ± 0.0009, 0.0520 ± 0.0005, 0.0367 ± 0.0001 | 0.0738 ± 0.0008, 0.0519 ± 0.0006, 0.0363 ± 0.0001
LinkLDA | 0.1062 ± 0.0009, 0.0719 ± 0.0006, 0.0464 ± 0.0002 | 0.1063 ± 0.0008, 0.0715 ± 0.0004, 0.0460 ± 0.0003 | 0.1063 ± 0.0011, 0.0709 ± 0.0006, 0.0458 ± 0.0001
Block-LDA | 0.1027 ± 0.0019, 0.0692 ± 0.0009, 0.0453 ± 0.0002 | 0.1040 ± 0.0008, 0.0693 ± 0.0008, 0.0452 ± 0.0003 | 0.1038 ± 0.0014, 0.0694 ± 0.0006, 0.0456 ± 0.0005
Link-PLSA-LDA | 0.1081 ± 0.0008, 0.0723 ± 0.0005, 0.0468 ± 0.0003 | 0.1080 ± 0.0015, 0.0728 ± 0.0005, 0.0469 ± 0.0002 | 0.1085 ± 0.0010, 0.0725 ± 0.0005, 0.0469 ± 0.0002
BiBTM | 0.0750 ± 0.0000, 0.0528 ± 0.0000, 0.0371 ± 0.0000 | 0.0749 ± 0.0000, 0.0528 ± 0.0000, 0.0371 ± 0.0000 | 0.0749 ± 0.0000, 0.0528 ± 0.0000, 0.0371 ± 0.0000
CF-AVG | 0.1050 ± 0.0000, 0.0769 ± 0.0000, 0.0514 ± 0.0000 | 0.1050 ± 0.0000, 0.0769 ± 0.0000, 0.0514 ± 0.0000 | 0.1050 ± 0.0000, 0.0769 ± 0.0000, 0.0514 ± 0.0000
CF-LM | 0.0977 ± 0.0000, 0.0716 ± 0.0000, 0.0478 ± 0.0000 | 0.0977 ± 0.0000, 0.0716 ± 0.0000, 0.0478 ± 0.0000 | 0.0977 ± 0.0000, 0.0716 ± 0.0000, 0.0478 ± 0.0000
COM | 0.0775 ± 0.0009, 0.0597 ± 0.0005, 0.0413 ± 0.0003 | 0.0849 ± 0.0013, 0.0649 ± 0.0008, 0.0437 ± 0.0004 | 0.0918 ± 0.0013, 0.0681 ± 0.0008, 0.0449 ± 0.0001
PTM(a) | 0.1064 ± 0.0010, 0.0717 ± 0.0006, 0.0459 ± 0.0003 | 0.1071 ± 0.0016, 0.0714 ± 0.0006, 0.0463 ± 0.0003 | 0.1078 ± 0.0008, 0.0717 ± 0.0006, 0.0469 ± 0.0002
PTM(b) | 0.0996 ± 0.0016, 0.0697 ± 0.0006, 0.0460 ± 0.0002 | 0.1026 ± 0.0011, 0.0713 ± 0.0008, 0.0471 ± 0.0004 | 0.1036 ± 0.0011, 0.0722 ± 0.0008, 0.0475 ± 0.0002
PTM(c) | 0.1018 ± 0.0015, 0.0705 ± 0.0005, 0.0464 ± 0.0001 | 0.1038 ± 0.0011, 0.0705 ± 0.0005, 0.0467 ± 0.0003 | 0.1029 ± 0.0008, 0.0707 ± 0.0003, 0.0467 ± 0.0002
PTM(d) | 0.0981 ± 0.0012, 0.0694 ± 0.0005, 0.0453 ± 0.0003 | 0.1005 ± 0.0013, 0.0709 ± 0.0009, 0.0460 ± 0.0003 | 0.1011 ± 0.0008, 0.0718 ± 0.0007, 0.0469 ± 0.0002
Each K column lists three entries, one per value of N. We run all models 10 times and report the mean ± standard deviation.
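As an illustration of how a single symptoms P@N value of this kind can be obtained for one test prescription, the following Python sketch scores each candidate symptom by averaging $p(s \mid h_{pn})$ over the herbs of the prescription, as in Equation (19), and then computes Precision@N as in Equation (20). It is a minimal sketch under stated assumptions: the per-herb probabilities $p(s \mid h)$ are taken as given in a nested dictionary (how a model produces them is not shown), ties are broken alphabetically, and the function names and toy values are hypothetical.

```python
def symptom_scores(herbs, p_s_given_h):
    """Score symptoms for a herb list: the mean of p(s | h) over the herbs."""
    candidates = set()
    for h in herbs:
        candidates.update(p_s_given_h.get(h, {}))
    return {
        s: sum(p_s_given_h.get(h, {}).get(s, 0.0) for h in herbs) / len(herbs)
        for s in candidates
    }


def precision_at_n(scores, true_symptoms, n):
    """|top-N symptoms ∩ true symptoms| / |top-N symptoms|."""
    # Sort by descending score; ties are broken by name for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    top = ranked[:n]
    if not top:
        return 0.0
    return len(set(top) & set(true_symptoms)) / len(top)


# Toy conditional probabilities for two herbs (values are made up).
toy = {
    "Ephedra": {"fever": 0.5, "cough": 0.3, "headache": 0.2},
    "herb_b": {"cough": 0.6, "fever": 0.1, "nausea": 0.3},
}
scores = symptom_scores(["Ephedra", "herb_b"], toy)
p_at_2 = precision_at_n(scores, {"cough", "headache"}, n=2)
```

In this toy case the two top-ranked symptoms are cough and fever, of which only cough is a true symptom, so the Precision@2 is 0.5. The reported P@N is the average of this quantity over all testing prescriptions.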
5.3 Symptoms Suggestion
We compute the following conditional probability of a symptom given a set of test herbs

$$p(s \mid \tilde{h}_p) = \frac{1}{N^h_p} \sum_{h_{pn} \in \tilde{h}_p} p(s \mid h_{pn}) \tag{19}$$

The Precision@N for symptom recommendation is defined as

$$\text{Precision@}N = \frac{|\{\text{top } N \text{ symptoms}\} \cap \{\text{true symptoms}\}|}{|\{\text{top } N \text{ symptoms}\}|} \tag{20}$$

We also average the precision@N of all testing prescriptions as the final P@N.

Table 3 presents symptoms Precision@N of each model with different K and N values. We note that ATM and BiBTM do not perform well as in herbs recommendation, the reasons are also similar. COM also neglects symptoms correlations, so it cannot produce satisfactory results. CF-AVG and CF-LM perform well when N is large, the conditional probability $p(s \mid h)$ can also highlight most relevant symptoms of an herb. Link-PLSA-LDA performs better than LinkLDA which shows the effect of extracted herb-symptom correspondences using herb efficacy knowledge. PTM(a) can perform better than LinkLDA when K increases ($p < 0.002$ at $K = 40$ and $N = 5$), which shows herbs roles are more helpful for larger topic number in recommendation tasks. We notice that PTM(b), PTM(c) and PTM(d) have slightly lower perplexity than Link-LDA but cannot achieve higher symptoms Precision@5. But they can achieve higher Precision@N than LinkLDA when N increases, which means they can rank the true symptoms higher on average, but may not rank true symptoms to top 5. Moreover, the Precision@N scores are low because symptoms are often few in a prescription.

5.4 Prescribing Patterns Discovery
We now evaluate topics learned from all 33,765 prescriptions by our model. We first qualitatively show some topics. Then we quantitatively evaluate learned topics by comparing to TCM prior knowledge.

5.4.1 Qualitative Results
Table 4 presents three topics generated by several topic models with $K = 25$. We show top 10 symptoms on the left
TABLE 4
Example Topics Learned by Several Topic Models with K = 25
Blood-regulating (symptom | herb) | Nourishing heart and tranquilizing mind (symptom | herb) | Harmonizing intestines and stomach (symptom | herb)
ATM
oppression in the chest | Semen Trichosanthis | abdominal pain | Longtube Groundivy Herb | spontaneous sweating | Folium Hibisci Mutabilis
aversion to cold | Mercury Oxidum | amnesia | Fluorite | abdominal fullness | Amber
stomachache | Calculus Equi | measles | Nardostachys Root | chronic shank ulcer | Ardisia Japonica
profuse spittle | Terminalia chebula Retz | palpitation | Bamboo Shavings | bloody stool | Snakegourd Root
hyperopia | Folium Phyllostach Lophatheri | vomiting | Emblic Leafflower Fruit | stomach reflux | Coffea Arabica
waggling tongue | Serissa Serissoides | infantile malnutrition | Radix Boehmeriae | borborygmus | Officinal Magnolia Flower
hypermenorrhea | Pharbitis Seed | arthralgia | Motherwort Herb | dizziness | Chives
palpitations below the heart | Hibiscus Mutabilis | metrorrhagia | Lotus Leaf | retention of the lochia | Fermented Soybean
postpartum metrorrhagia | Air Potato | indigestion | Radix Aconiti Kusnezoffii | greenish complexion | Rumex Japonicus
vomiting | Pilose Antler | tremor of feet | Foeniculum Vulgare | rigidity of limbs | Fruit of Sharpleaf Calangal
LinkLDA
epistaxis | Chinese Angelica | palpitation | Common Yam Rhizome | vomiting | Common Aucklandia Root
bloody stool | Paeonia Veitchii | amnesia | Dodder Seed | nausea | Clove
hemafecia | Red Peony Root | deafness | Eucommia Bark | borborygmus | Fructus Amomi Rotundus
hemoptysis | Liquorice Root | lumbago | Chinese Magnoliavine Fruit | stomach reflux | Chinese Eaglewood Wood
hematuria | Paeonia Suffruticosa | frequent urination | Asiatic Cornelian Cherry Fruit | acid swallow | Foeniculum Vulgare
dizziness | Unprocessed Rehmannia Root | night sweating | Achyranthes Bidentata | tenesmus | Nutmeg
heaviness of head | Debark Peony Root | enuresis | Desertliving Cistanche | abdomen cold | Medicine Terminalia Fruit
hypermenorrhea | Tree Peony Root Bark | dreamfulness | Prepared Rehmannia Root | dysphagia | Villous Amomum Fruit
hematemesis | Cattail Pollen | infertility | Barbary Wolfberry Fruit | abdominal pain | Cablin Patchouli Herb
infertility | Colla Corii Asini | dizziness | Pilose Antler | spasm | Cardamon Fruit
Block-LDA
hematemesis | Paeonia Veitchii | amnesia | Milkwort Root | vomiting | Dried Tangerine Peel
bloody stool | Chinese Angelica | lumbago | Achyranthes Bidentata | acid swallow | Officinal Magnolia Bark
epistaxis | Red Peony Root | dizziness | Eucommia Bark | nausea | Villous Amomum Fruit
limbs pain | Unprocessed Rehmannia Root | palpitation | Common Yam Rhizome | epigastric upset | Massa Medicata Fermentata
hematuria | Liquorice Root | night sweating | Chinese Magnoliavine Fruit | belching | Atractylodes Rhizome
hemoptysis | Debark Peony Root | frequent urination | Dodder Seed | dysphagia | Nutgrass Galingale Rhizome
retention of the lochia | Paeonia Suffruticosa | enuresis | Prepared Rehmannia Root | diarrhea | Cablin Patchouli Herb
hemafecia | Tree Peony Root Bark | dreamfulness | Asiatic Cornelian Cherry Fruit | anorexia | Hawthorn Fruit
retention of placenta | Cattail Pollen | fatigue | Desertliving Cistanche | hiccup | Green Tangerine peel
yellow sweat | Sichuan Lovage Rhizome | deafness | Dendrobium | stomach reflux | Pinellia Tuber
Link-PLSA-LDA
white vaginal discharge | Chinese Angelica | dizziness | Dwarf Lilyturf Tuber | vomiting | Common Aucklandia Root
red and white vaginal discharge | Debark Peony Root | palpitation | Milkwort Root | abdominal pain | Clove
hematemesis | Sichuan Lovage Rhizome | amnesia | Common Yam Rhizome | nausea | Fructus Amomi Rotundus
threatened abortion | Paeonia Veitchii | dreaminess | Salvia Root | borborygmus | Chinese Eaglewood Wood
tidal fever | Paeonia Suffruticosa | vertigo | Tangshen | regurgitation | Nutmeg
infertility | Tree Peony Root Bark | oppression in chest | Chinese Angelica | acid regurgitation | Villous Amomum Fruit
vaginal bleeding during pregnancy | Nutgrass Galingale Rhizome | vexation | Chinese Magnoliavine Fruit | dysphagia | Cablin Patchouli Herb
hypochondriac pain | Unprocessed Rehmannia Root | insomnia | Grassleaf Sweetflag Rhizome | hiccup | Foeniculum Vulgare
flooding and spotting | Prepared Rehmannia Root | fatigue | Spine Date Seed | abdomen cold | Medicine Terminalia Fruit
bloody stool | Colla Corii Asini | night sweating | Debark Peony Root | stomachache | Cardamon Fruit
PTM(a)
hematemesis | Chinese Angelica | dizziness | Milkwort Root | abdominal pain | Common Aucklandia Root
epistaxis | Paeonia Veitchii | palpitation | Chinese Magnoliavine Fruit | vomiting | Clove
hemafecia | Liquorice Root | amnesia | Common Yam Rhizome | nausea | Fructus Amomi Rotundus
hematuria | Paeonia Suffruticosa | lumbago | Eucommia Bark | borborygmus | Chinese Eaglewood Wood
bloody stool | Unprocessed Rehmannia Root | deafness | Achyranthes Bidentata | spasm | Cablin Patchouli Herb
hemoptysis | Debark Peony Root | dreaminess | Dodder Seed | diarrhea | Foeniculum Vulgare
flooding and spotting | Colla Corii Asini | anorexia | Cornus Officinalis | vomiting and diarrhea | Nutmeg
menorrhagia | Radix Ophiopogonis | fatigue | Grassleaf Sweetflag Rhizome | regurgitation | Villous Amomum Fruit
shortage of qi | Eriobotrya Japonica | vertigo | Desertliving Cistanche | abdomen cold | Officinal Magnolia Bark
glossorrhagia | Tree Peony Root Bark | frequent urination | Chinese Arborvitae kernel | acid regurgitation | Common Floweringqince Fruit
PTM(b)
hematemesis | Golden Thread | amnesia | Milkwort Root | vomiting | Fructus Amomi Rotundus
tidal fever | Liquorice Root | blurred vision | Lightyellow Sophora Root | abdominal pain | Clove
night sweating | Radix Bupleuri | dizziness | Liquorice Root | nausea | Common Aucklandia Root
bloody stool | Turtle Carapace | palpitation | Poria | borborygmus | Liquorice Root
infantile malnutrition | Figwortflower Picrorhiza Rhizome | vexation | Chinese Angelica | acid regurgitation | Ginseng
epistaxis | Chinese Angelica | insomnia | Divaricate Saposhnikovia Root | spasm | Officinal Magnolia Bark
TABLE 4
(Continued )
Topics: Blood-regulating | Nourishing heart and tranquilizing mind | Harmonizing intestines and stomach
Columns: symptom | herb | symptom | herb | symptom | herb
emaciation | Areca Seed | dreaminess | Ginseng | abdomen cold | White Atractylodes Rhizome
abdominal pain | Rangooncreeper Fruit | dysphoria | Spine Date Seed | regurgitation | Fresh Ginger
indigestion | Common Aucklandia Root | weep | Fleeceflower Root | abdominal fullness | Radix Aconiti Lateralis Preparata
flooding and spotting | Massa Medicata Fermentata | headache | Chrysanthemum Flower | bitter taste in mouth | Dried Ginger
PTM(c)
abdominal pain | Chinese Angelica | lumbago | Chinese Magnoliavine Fruit | abdominal pain | Common Aucklandia Root
hematemesis | Debark Peony Root | deafness | Milkwort Root | borborygmus | Fructus Amomi Rotundus
red and white vaginal discharge | Sichuan Lovage Rhizome | amnesia | Eucommia Bark | vomiting | Clove
flooding and spotting | Colla Corii Asini | night sweating | Achyranthes Bidentata | nausea | Foeniculum Vulgare
dystocia | Nutgrass Galingale Rhizome | shortness of breath | Dodder Seed | lumbago | Common Buried Tuber
metrostaxis | Paeonia Veitchii | dizziness | Common Yam Rhizome | abdomen cold | Nutmeg
white vaginal discharge | Argy Wormwood Leaf | blurred vision | Asiatic Cornelian Cherry Fruit | hiccup | Zedoary Rhizome
metrorrhagia | Prepared Rehmannia Root | frequent urination | Grassleaf Sweetflag Rhizome | tenesmus | Chinese Eaglewood Wood
threatened abortion | Cattail Pollen | spontaneous sweating | Desertliving Cistanche | acid regurgitation | Areca Seed
vaginal bleeding during pregnancy | Motherwort Herb | infertility | Dendrobium | halitosis | Green Tangerine peel
PTM(d)
hematemesis | Oyster Shell | amnesia | Dodder Seed | vomiting | Fructus Amomi Rotundus
abdominal pain | Chinese Angelica | lumbago | Poria | abdominal pain | Ginseng
bloody stool | Bone Fossil of Big Mammals | night sweating | Milkwort Root | borborygmus | Dried Ginger
white vaginal discharge | Garden Burnet Root | dizziness | Achyranthes Bidentata | nausea | White Atractylodes Rhizome
night sweating | Liquorice Root | deafness | Chinese Magnoliavine Fruit | acid regurgitation | Liquorice Root
metrorrhagia | Red Halloysite | palpitation | Desertliving Cistanche | reversal cold of hands and feet | Radix Aconiti Lateralis Preparata
red and white vaginal discharge | Colla Corii Asini | white vaginal discharge | Chinese Angelica | spasm | Officinal Magnolia Bark
metrostaxis | Golden Thread | blurred vision | Pilose Antler | abdomen cold | Fresh Ginger
tenesmus | Dried Ginger | infertility | Eucommia Bark | abdominal fullness | Common Aucklandia Root
epistaxis | Debark Peony Root | frequent urination | Ginseng | hiccup | Poria
We show top 10 symptoms (left) and top 10 herbs (right). Symptoms italicized and marked in red do not appear in other symptoms’ syndrome categories. Herbs
italicized and marked in red could not treat the top 10 symptoms. We manually labeled the topic names.
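The notion of a "correct" herb used in this table can be made concrete with a small sketch. It assumes that a herb counts as correct when it treats at least one of the topic's top symptoms, and that herb efficacy is available as a lookup from herb to treatable symptoms. The function name `topic_herb_precision` and the `treats` dictionary are hypothetical; the toy entries are a small illustrative subset drawn from efficacy statements in the text, not the knowledge base used in the experiments.

```python
from typing import Dict, List, Set


def topic_herb_precision(top_symptoms: List[str],
                         top_herbs: List[str],
                         treats: Dict[str, Set[str]]) -> float:
    """Fraction of a topic's top herbs that treat at least one top symptom.

    `treats` maps a herb name to the set of symptoms it can treat
    (an assumed representation of herb efficacy knowledge).
    """
    if not top_herbs:
        return 0.0
    symptom_set = set(top_symptoms)
    # A herb is counted as correct if its efficacy set overlaps the symptoms.
    correct = sum(1 for herb in top_herbs
                  if treats.get(herb, set()) & symptom_set)
    return correct / len(top_herbs)


# Toy topic for illustration only (not the full top-10 lists of the table).
symptoms = ["vomiting", "nausea", "abdominal pain"]
herbs = ["Clove", "Poria", "Dodder Seed", "Fresh Ginger"]
treats = {
    "Clove": {"vomiting", "abdomen cold"},
    "Poria": {"vomiting"},
    "Dodder Seed": {"enuresis"},
    "Fresh Ginger": {"vomiting"},
}

precision = topic_herb_precision(symptoms, herbs, treats)
print(f"{precision:.2f}")  # 0.75: three of the four herbs treat a listed symptom
```

Averaging this value over all topics of a model would give one per-model score; whether the reported metric is computed exactly this way is an assumption.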
TABLE 5
Example Topic Roles Learned by PTM(a) with K = 25
Blood-regulating
Symptoms | Role 0 | Role 1 | Role 2 | Role 3
hematemesis | Chinese Angelica | Paeonia Suffruticosa | Liquorice Root | Colla Corii Asini
epistaxis | Paeonia Veitchii | Tree Peony Root Bark | Eriobotrya Japonica | Chinese Angelica
hemafecia | Liquorice Root | Chinese Angelica | Loquat Leaf | Unprocessed Rehmannia Root
hematuria | Debark Peony Root | Paeonia Veitchii | Radix Ophiopogonis | Cattail Pollen
bloody stool | Red Peony Root | Unprocessed Rehmannia Root | Ginseng | Debark Peony Root
hemoptysis | Caulis Akebiae | Prepared Rehmannia Root | Bamboo Shavings | Garden Burnet Root
flooding and spotting | Radix Ophiopogonis | Golden Thread | Reed Rhizome | Sophora Flower
menorrhagia | Beautiful Sweetgum Resin | Radix Ophiopogonis | Unprocessed Rehmannia Root | Panax Notoginseng
shortage of qi | Lotus Rhizome Node | Red Peony Root | Pyrus Bretschneideri | Chinese Arborvitae Twig and Leaf
glossorrhagia | Orange Fruit | Liquorice Root | Egg | India Madder Root
Nourishing heart and tranquilizing mind
Symptoms | Role 0 | Role 1 | Role 2 | Role 3
dizziness | Desertliving Cistanche | Common Yam Rhizome | Milkwort Root | Milkvetch Root
palpitation | Dodder Seed | Eucommia Bark | Chinese Magnoliavine Fruit | Deer Horn
amnesia | Achyranthes Bidentata | Asiatic Cornelian Cherry Fruit | Grassleaf Sweetflag Rhizome | Fleeceflower Root
lumbago | Pilose Antler | Achyranthes Bidentata | Chinese Arborvitae kernel | Tangshen
deafness | Dendrobium | Oriental Waterplantain Rhizome | Spine Date Seed | Prepared Rehmannia Root
dreaminess | Eucommia Bark | Chinese Magnoliavine Fruit | Salvia Root | Barbary Wolfberry Fruit
anorexia | Morinda Root | Prepared Rehmannia Root | Dwarf Lilyturf Tuber | Dodder Seed
fatigue | Asiatic Cornelian Cherry Fruit | Gordon Euryale Seed | Poria | Ligustrum Lucidum
vertigo | Chinese Magnoliavine Fruit | Radix Codonopsis | Dimocarpus Longan | Deer-Horn Glue
frequent urination | Palmleaf Raspberry Fruit | Malaytea Scurfpea Fruit | Arillus Longan | Glossy Privet Fruit
We show top 10 symptoms (left) and top 10 herbs of each role (right). Herbs italicized and marked in red could not treat the top 10 symptoms.
again. On the left, nine of the ten symptoms are mental symptoms; the exception is infertility. On the right, Dodder Seed can treat enuresis. Eucommia Bark can treat dizziness and lumbago. Chinese Magnoliavine Fruit can treat palpitation, night sweating and enuresis. Asiatic Cornelian Cherry Fruit can treat dizziness, deafness, frequent urination and enuresis. Desertliving Cistanche can treat lumbago and infertility. Prepared Rehmannia Root can treat palpitation, deafness, night sweating and infertility. Barbary Wolfberry Fruit can treat dizziness. Pilose Antler can treat deafness and infertility. (3). Block-LDA finds coherent symptoms and seven correct herbs. Milkwort Root can treat amnesia and dreamfulness. (4). Link-PLSA-LDA finds a good topic. Dwarf Lilyturf Tuber and Salvia Root can treat vexation. Tangshen and Chinese Angelica can treat palpitation. Milkwort Root can treat amnesia and dreaminess. Grassleaf Sweetflag Rhizome can treat amnesia. Spine Date Seed can treat dreaminess and night sweating. Debark Peony Root can treat night sweating. (5). PTM(a) finds coherent symptoms and seven correct herbs. Cornus Officinalis can treat dizziness, deafness and frequent urination. Chinese Arborvitae kernel can treat palpitation and amnesia. (6). PTM(b) finds nine correct herbs. Liquorice Root and Fleeceflower Root can treat palpitation. Poria can treat amnesia and palpitation. Divaricate Saposhnikovia Root can treat headache. Ginseng can treat dysphoria. Chrysanthemum Flower can treat blurred vision and headache. (7). All ten symptoms found by PTM(c) are mental symptoms and seven herbs are correct. (8). PTM(d) finds ten mental symptoms and eight correct herbs. Pilose Antler can treat deafness and infertility.
The third topic presents intestines and stomach-related symptoms and herbs for “Harmonizing intestines and stomach”. We can note that: (1). ATM still finds a poor topic. On the left, only abdominal fullness, stomach reflux and borborygmus are intestines and stomach-related symptoms. On the right, none of the ten herbs can treat the ten symptoms on the left. (2). LinkLDA still shows its superiority to ATM. On the left, eight symptoms are intestines and stomach-related symptoms. On the right, Common Aucklandia Root can treat abdominal pain, vomiting, borborygmus and tenesmus. Clove, Fructus Amomi Rotundus and Chinese Eaglewood Wood can treat vomiting and abdomen cold. Foeniculum Vulgare can treat abdominal pain, abdomen cold and vomiting. Nutmeg and Cardamon Fruit can treat vomiting. Villous Amomum Fruit can treat abdominal pain, vomiting and nausea. Cablin Patchouli Herb can treat abdominal pain and vomiting. (3). Block-LDA finds coherent symptoms and six correct herbs. Officinal Magnolia Bark and Atractylodes Rhizome can treat anorexia. Nutgrass Galingale Rhizome can treat acid regurgitation and belching. Pinellia Tuber can treat vomiting and stomach reflux. (4). Link-PLSA-LDA performs very well on symptoms and makes one mistake on herbs. (5). PTM(a) finds nine correct symptoms and nine correct herbs. Common Floweringqince Fruit can treat spasm. (6). PTM(b) also performs well on symptoms. On the right, Fresh Ginger can treat vomiting. Radix Aconiti Lateralis Preparata can treat spasm. Dried Ginger can treat vomiting and abdomen cold. (7). PTM(c) finds ten intestines and stomach-related symptoms and eight correct herbs. Green Tangerine peel can treat abdominal pain. (8). PTM(d) finds nine intestines and stomach-related symptoms and eight correct herbs. Poria can treat vomiting.
From the three topics, we observe that our prescription topic model could find topics that reflect TCM prescribing patterns well. After incorporating herb compatibility and herb efficacy knowledge, the pattern discovery capability can be improved, as shown in the PTM(b), PTM(c), PTM(d) and Link-PLSA-LDA topics.
Table 5 shows four roles’ top herbs of two topics generated by PTM(a). In the “Blood-regulating” topic, we can see that all ten herbs of Role 3 can treat at least one of the symptoms, and we find that seven of the ten herbs can treat at least 3 of the top ten symptoms. Because Role 3 treats the main symptoms of the syndrome, we can label it as jun
TABLE 6
Average Topic Herb Precision of Several Topic Models with Different K (the Number of Topics) and Top 10 Symptoms/Herbs
Model / K | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40
ATM | 0.236 ± 0.058 | 0.252 ± 0.042 | 0.234 ± 0.032 | 0.228 ± 0.024 | 0.217 ± 0.025 | 0.233 ± 0.015 | 0.237 ± 0.014 | 0.231 ± 0.034
LinkLDA | 0.750 ± 0.044 | 0.693 ± 0.211 | 0.660 ± 0.032 | 0.647 ± 0.025 | 0.622 ± 0.031 | 0.639 ± 0.023 | 0.624 ± 0.026 | 0.606 ± 0.025
Block-LDA | 0.710 ± 0.050 | 0.652 ± 0.034 | 0.621 ± 0.034 | 0.575 ± 0.019 | 0.589 ± 0.045 | 0.582 ± 0.027 | 0.595 ± 0.027 | 0.584 ± 0.027
Link-PLSA-LDA | 0.778 ± 0.048 | 0.711 ± 0.031 | 0.677 ± 0.029 | 0.690 ± 0.027 | 0.696 ± 0.019 | 0.701 ± 0.021 | 0.693 ± 0.018 | 0.678 ± 0.013
COM | 0.574 ± 0.038 | 0.461 ± 0.042 | 0.419 ± 0.028 | 0.421 ± 0.032 | 0.406 ± 0.015 | 0.393 ± 0.023 | 0.378 ± 0.022 | 0.382 ± 0.017
PTM(a) | 0.774 ± 0.040 | 0.710 ± 0.049 | 0.647 ± 0.026 | 0.618 ± 0.030 | 0.597 ± 0.026 | 0.615 ± 0.025 | 0.593 ± 0.019 | 0.579 ± 0.024
PTM(b) | 0.836 ± 0.042 | 0.781 ± 0.034 | 0.749 ± 0.031 | 0.713 ± 0.009 | 0.699 ± 0.026 | 0.701 ± 0.022 | 0.684 ± 0.019 | 0.670 ± 0.017
PTM(c) | 0.864 ± 0.034 | 0.817 ± 0.026 | 0.807 ± 0.025 | 0.820 ± 0.013 | 0.808 ± 0.023 | 0.808 ± 0.012 | 0.801 ± 0.010 | 0.800 ± 0.013
PTM(d) | 0.866 ± 0.044 | 0.817 ± 0.028 | 0.803 ± 0.018 | 0.770 ± 0.015 | 0.790 ± 0.021 | 0.780 ± 0.025 | 0.763 ± 0.011 | 0.770 ± 0.022
We run all models 10 times and report the mean ± standard deviation of all topics’ average precision for each model. PTM(b), PTM(c), and PTM(d) significantly outperform others (p < 0.01) based on 2-tailed paired t-test.
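The reporting protocol in the note above (mean and standard deviation over 10 runs, then a 2-tailed paired t-test) can be sketched as follows. This is a minimal illustration under stated assumptions: the per-run scores are made-up numbers, the pairing is assumed to be run-by-run, the sample standard deviation is assumed, and the helper names `mean_std` and `paired_t_statistic` are hypothetical. The constant 3.250 is the standard two-tailed critical value of Student's t distribution for a significance level of 0.01 with 9 degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-run scores."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: Sequence[float],
                       b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run average precision of two models (10 runs each).
model_a = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.82, 0.81, 0.79, 0.83]
model_b = [0.62, 0.60, 0.63, 0.61, 0.64, 0.59, 0.62, 0.63, 0.60, 0.61]

mean_a, std_a = mean_std(model_a)
t_stat, dof = paired_t_statistic(model_a, model_b)

# Two-tailed critical value for alpha = 0.01 with 9 degrees of freedom.
T_CRIT_ALPHA_001_DOF_9 = 3.250
significant = dof == 9 and abs(t_stat) > T_CRIT_ALPHA_001_DOF_9
print(f"{mean_a:.3f} ± {std_a:.3f}, t = {t_stat:.2f}, significant: {significant}")
```

Comparing against a tabulated critical value keeps the sketch free of third-party dependencies; a statistics package would normally be used to obtain the exact p-value.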
1020 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 30, NO. 6, JUNE 2018
treatment patterns. LinkLDA performs relatively well on all four tasks. Block-LDA and BiBTM generally do not improve on LinkLDA because they model herb/symptom pairs outside training prescriptions and may ignore the original prescription structures. By considering herb roles, PTM(a) can obtain better generalization and herbs/symptoms recommendation performance, but the treatment pattern discovery capabilities are not improved. Nevertheless, PTM(a) could infer herb roles in a prescription, and herb role inference in each prescription is another interesting problem to explore. By incorporating herb compatibility, PTM(b) further gains better performances on all four tasks. By incorporating herb efficacy knowledge, Link-PLSA-LDA and PTM(c) also gain better performances on all four tasks. PTM(d) generally achieves the best results on all tasks because it considers both herb compatibility and herb efficacy knowledge. These results demonstrate that it is necessary to consider TCM background in TCM data analysis, and this work can be a promising start for incorporating domain knowledge into prescription topic modeling.
6 CONCLUSION AND FUTURE WORK
This paper presented a novel topic model for TCM prescriptions. It characterizes the generative process of prescriptions in TCM theories. Using 33,765 prescriptions, this model can discover the prescribing patterns in TCM. Furthermore, it can outperform several previous methods on recommending herbs for a list of symptoms and predicting symptoms for a prescription. The method is helpful for clinical research and practice.
In future work, we plan to incorporate more prescription information such as usage, form and herbal dosage, and more domain knowledge such as symptoms’ syndrome category as prior knowledge into our model. Evaluating the herb roles inferred by our model is another interesting problem we are going to investigate.
ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China (No. 61572434), the China Knowledge Centre for Engineering Sciences and Technology (No. CKCEST-2017-1-3), the Zhejiang Provincial Natural Science Foundation of China (No. LY14F020027), and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (20130101110136).
REFERENCES
[1] F. Cheung, “TCM: Made in China,” Nature, vol. 480, no. 7378, pp. S82–S83, 2011.
[2] J. Qiu, “Traditional medicine: A culture in the balance,” Nature, vol. 448, no. 7150, pp. 126–128, 2007.
[3] H. Peng, Dictionary of Traditional Chinese Medicine Prescriptions. Beijing: People Health Press, 1996, [in Chinese, ISBN: 7117018879].
[4] X. Zhou, et al., “Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support,” Artif. Intell. Med., vol. 48, no. 2, pp. 139–152, 2010.
[5] H. Yang, et al., “New drug R&D of traditional Chinese medicine: Role of data mining approaches,” J. Biol. Syst., vol. 17, no. 03, pp. 329–347, 2009.
[6] S. Li, B. Zhang, D. Jiang, Y. Wei, and N. Zhang, “Herb network construction and co-module analysis for uncovering the combination rule of traditional Chinese herbal formulae,” BMC Bioinf., vol. 11, no. 11, 2010, Art. no. 1.
[7] S. K. Poon, et al., “A novel approach in discovering significant interactions from TCM patient prescription data,” Int. J. Data Mining Bioinf., vol. 5, no. 4, pp. 353–368, 2011.
[8] N. L. Zhang, R. Zhang, and T. Chen, “Discovery of regularities in the use of herbs in traditional Chinese medicine prescriptions,” in New Frontiers in Applied Data Mining. Berlin, Germany: Springer, 2012, pp. 353–360.
[9] P. He, K. Deng, Z. Liu, D. Liu, J. S. Liu, and Z. Geng, “Discovering herbal functional groups of traditional Chinese medicine,” Statist. Med., vol. 31, no. 7, pp. 636–642, 2012.
[10] L. Yao, Y. Zhang, and B. Wei, “An evolution system for traditional Chinese medicine prescription,” in Knowledge Engineering and Management. Berlin, Germany: Springer, 2014, pp. 95–106.
[11] G. Zheng, M. Jiang, C. Lu, and A. Lu, “Prescription analysis and mining,” in Data Analytics for Traditional Chinese Medicine Research. Berlin, Germany: Springer, 2014, pp. 97–109.
[12] X. Zhang, X. Zhou, H. Huang, S. Chen, and B. Liu, “A hierarchical symptom-herb topic model for analyzing traditional Chinese medicine clinical diabetic data,” in Proc. 3rd Int. Conf. Biomed. Eng. Inf., 2010, pp. 2246–2249.
[13] X. Zhang, X. Zhou, H. Huang, Q. Feng, S. Chen, and B. Liu, “Topic model for Chinese medicine diagnosis and prescription regularities analysis: Case on diabetes,” Chin. J. Integrative Med., vol. 17, pp. 307–313, 2011.
[14] Z. Jiang, X. Zhou, X. Zhang, and S. Chen, “Using link topic model to analyze traditional Chinese medicine clinical symptom-herb regularities,” in Proc. IEEE 14th Int. Conf. E-Health Netw., Appl. Serv., 2012, pp. 15–18.
[15] L. Yao, et al., “Discovering treatment pattern in traditional Chinese medicine clinical cases by exploiting supervised topic model and domain knowledge,” J. Biomed. Inf., vol. 58, pp. 260–267, 2015.
[16] Z. Deng, Formulae of Chinese Medicine. Hong Kong: China Press of Traditional Chinese Medicine, 2008, [in Chinese, ISBN: 9787532384761].
[17] Y. Huang, et al., “Exploring the rules of li-fa-fang-yao on diabetes mellitus within traditional Chinese medicine through text mining,” in Proc. 7th Int. Conf. Comput. Convergence Technol., 2012, pp. 1369–1373.
[18] S. Wang, et al., “Compatibility art of traditional Chinese medicine: From the perspective of herb pairs,” J. Ethnopharmacology, vol. 143, no. 2, pp. 412–423, 2012.
[19] D. M. Blei, “Probabilistic topic models,” Commun. ACM, vol. 55, no. 4, pp. 77–84, 2012.
[20] L. Fei-Fei and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2005, pp. 524–531.
[21] E. Bart, M. Welling, and P. Perona, “Unsupervised organization of image collections: Taxonomies and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2302–2315, Nov. 2011.
[22] Z. Huang, W. Dong, P. Bath, L. Ji, and H. Duan, “On mining latent treatment patterns from electronic medical records,” Data Mining Knowl. Discovery, vol. 29, no. 4, pp. 914–949, 2015.
[23] D. Andrzejewski, X. Zhu, and M. Craven, “Incorporating domain knowledge into topic modeling via Dirichlet forest priors,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 25–32.
[24] D. Andrzejewski, X. Zhu, M. Craven, and B. Recht, “A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic,” in Proc. 22nd Int. Joint Conf. Artif. Intell., vol. 22, no. 1, 2011, Art. no. 1171.
[25] R. Balasubramanyan, B. Dalvi, and W. W. Cohen, “From topic models to semi-supervised learning: Biasing mixed-membership models to exploit topic-indicative features in entity clustering,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2013, pp. 628–642.
[26] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[27] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[28] J. K. Pritchard, M. Stephens, and P. Donnelly, “Inference of population structure using multilocus genotype data,” Genetics, vol. 155, no. 2, pp. 945–959, 2000.
[29] X. Chen, T. He, X. Hu, Y. An, and X. Wu, “Inferring functional groups from microbial gene catalogue with probabilistic topic models,” in Proc. IEEE Int. Conf. Bioinf. Biomed., 2011, pp. 3–9.
[30] A. Van Esbroeck, C.-C. Chia, and Z. Syed, “Heart rate topic models,” in Proc. 26th AAAI Conf. Artif. Intell., 2012, pp. 1635–1641.
[31] Z. Huang, W. Dong, L. Ji, C. He, and H. Duan, “Incorporating comorbidities into latent treatment pattern mining for clinical pathways,” J. Biomed. Inform., vol. 59, pp. 227–239, 2016.
[32] I. Yoo, et al., “Data mining in healthcare and biomedicine: A survey of the literature,” J. Med. Syst., vol. 36, no. 4, pp. 2431–2448, 2012.
[33] N. Esfandiari, M. R. Babavalian, A.-M. E. Moghadam, and V. K. Tabar, “Knowledge discovery in medicine: Current issue and future trend,” Expert Syst. Appl., vol. 41, no. 9, pp. 4434–4463, 2014.
[34] Y. Feng, Z. Wu, X. Zhou, Z. Zhou, and W. Fan, “Knowledge discovery in traditional Chinese medicine: State of the art and perspectives,” Artif. Intell. Med., vol. 38, no. 3, pp. 219–236, 2006.
[35] S. Lukman, Y. He, and S.-C. Hui, “Computational methods for traditional Chinese medicine: A survey,” Comput. Methods Programs Biomed., vol. 88, no. 3, pp. 283–294, 2007.
[36] B. Liu, et al., “Data processing and analysis in real-world traditional Chinese medicine clinical data: Challenges and approaches,” Statist. Med., vol. 31, no. 7, pp. 653–660, 2012.
[37] G.-Z. Li and B.-Y. Liu, “Big data is essential for further development of integrative medicine,” Chin. J. Integrative Med., vol. 21, pp. 323–331, 2015.
[38] N. L. Zhang, S. Yuan, T. Chen, and Y. Wang, “Latent tree models and diagnosis in traditional Chinese medicine,” Artif. Intell. Med., vol. 42, no. 3, pp. 229–245, 2008.
[39] E. Erosheva, S. Fienberg, and J. Lafferty, “Mixed-membership models of scientific publications,” Proc. Nat. Acad. Sci. USA, vol. 101, no. suppl 1, pp. 5220–5227, 2004.
[40] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proc. 20th Conf. Uncertainty Artif. Intell., 2004, pp. 487–494.
[41] N. Yao, J. Zhu, and R. Gao, Traditional Chinese Medicine Symptoms Differential Diagnosis. Beijing: People Health Press, 2013, [in Chinese, ISBN: 9787117036139].
[42] L. Wu, Chin. Traditional Medicine and Materia Medical Subject Headings. Beijing: Chinese Medical Ancient Books Publishing, 1996, [in Chinese, ISBN: 9787801743596].
[43] R. Balasubramanyan and W. W. Cohen, “Block-LDA: Jointly modeling entity-annotated text and entity-entity links,” in Proc. SIAM Int. Conf. Data Mining, 2011, pp. 450–461.
[44] R. Nallapati and W. W. Cohen, “Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs,” in Proc. Int. Conf. Weblogs Social Media, 2008, pp. 84–92.
[45] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, and C. Yu, “Group recommendation: Semantics and efficiency,” Proc. VLDB Endowment, vol. 2, no. 1, pp. 754–765, 2009.
[46] Q. Yuan, G. Cong, and C.-Y. Lin, “COM: A generative model for group recommendation,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 163–172.
[47] T. Wu, G. Qi, H. Wang, K. Xu, and X. Cui, “Cross-lingual taxonomy alignment with bilingual biterm topic model,” in Proc. AAAI Conf. Artif. Intell., 2016, pp. 287–293.
[48] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, “Evaluation methods for topic models,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 1105–1112.
[49] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh, “On smoothing and inference for topic models,” in Proc. 25th Conf. Uncertainty Artif. Intell., 2009, pp. 27–34.
[50] C. Archambeau, B. Lakshminarayanan, and G. Bouchard, “Latent IBP compound Dirichlet allocation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 2, pp. 321–333, Feb. 2015.
[51] J. Chang, S. Gerrish, C. Wang, J. L. Boyd-Graber, and D. M. Blei, “Reading tea leaves: How humans interpret topic models,” in Advances Neural Inf. Process. Syst., 2009, pp. 288–296.
Liang Yao received the BE degree from the College of Computer Science, Sichuan University, Chengdu, China, in 2012. He is currently working toward the PhD degree at the College of Computer Science and Technology, Zhejiang University. His current research interests include data mining, medical informatics, natural language processing, topic models, and probabilistic graphical models.
Yin Zhang received the PhD degree in computer science from Zhejiang University, in 1999. Currently, she is an associate professor with the College of Computer Science, Zhejiang University. Her research interests mainly include data mining and knowledge discovery, medical informatics, multimedia information processing, pattern recognition, and knowledge engineering.
Baogang Wei received the PhD degree from Northwestern Polytechnical University, China, in 1997. He is currently a professor with Zhejiang University, China. His main research interests include artificial intelligence, pattern recognition, image processing, machine learning, digital library, and information & knowledge management. Since 1999, he has been a member of the Chinese Association for Artificial Intelligence. So far, he has published more than 50 papers in international journals including the IEEE Transactions on Knowledge and Data Engineering and IEEE Transactions on Visualization and Computer Graphics, and conference proceedings including AAAI, CIKM, and PAKDD.
Wenjin Zhang received the graduate degree from the School of Medicine, Zhejiang University, in 1990 and received the PhD degree in 2005. Now, he is an associate chief physician in the Hepatopancreatobiliary Surgery Department of the First Affiliated Hospital of Zhejiang University. In addition to clinical work, he is also engaged in medical research involving experimental science and data analysis by IT technology.
Zhe Jin received the BE degree from the College of Computer Science and Technology, Zhejiang University, Hangzhou, China, in 2015, where he is currently a PhD candidate at the same college. His current research interests include text mining and semantic network.