IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 30, NO. 6, JUNE 2018 1007

A Topic Modeling Approach for Traditional Chinese Medicine Prescriptions
Liang Yao, Yin Zhang, Baogang Wei, Wenjin Zhang, and Zhe Jin

Abstract—In traditional Chinese medicine (TCM), prescriptions are the products of doctors’ clinical experience and have been the
main way to cure diseases in China for several thousand years. In the long Chinese history, a large number of prescriptions have been
invented based on TCM theories. Regularities in the prescriptions are important for both clinical practice and novel prescription
development. Previous works used many methods to discover regularities in prescriptions, but rarely described how a prescription is
generated using TCM theories. In this work, we propose a topic model which characterizes the generative process of prescriptions in TCM
theories and further incorporate domain knowledge into the topic model. Using 33,765 prescriptions in TCM prescription books, the
model can reflect the prescribing patterns in TCM. Our method can outperform several previous topic models and group recommendation
methods on generalization performance, herbs recommendation, symptoms suggestion, and prescribing patterns discovery.

Index Terms—Traditional Chinese medicine, prescriptions, topic model, domain knowledge

• L. Yao, Y. Zhang, B. Wei, and Z. Jin are with the College of Computer Science and Technology, Zhejiang University, No. 38, Zheda Road, Hangzhou, Zhejiang 310027, China. E-mail: {yaoliang, yinzh, wbg, shrineshine}@zju.edu.cn.
• W. Zhang is with the First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310027, P. R. China. E-mail: [email protected].
Manuscript received 26 July 2016; revised 2 Nov. 2017; accepted 11 Dec. 2017. Date of publication 1 Jan. 2018; date of current version 27 Apr. 2018.
(Corresponding author: Yin Zhang.)
Recommended for acceptance by Q. He.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TKDE.2017.2787158
1041-4347 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

As a system of ancient medical practice that differs in substance, methodology and philosophy from modern medicine, traditional Chinese medicine (TCM) has played an indispensable role in health care for Chinese people for several thousand years, and is becoming more frequently used in countries in the West [1].

In TCM, a prescription is a group of herbal medicines (mineral medicines and animal medicines are also used; we use the word “herb” to refer to all medicinal materials in prescriptions). Prescriptions have been the main way to cure diseases for thousands of years. In the long Chinese history, many prescriptions have been invented to treat diseases and more than 100,000 have been recorded [2]. An example TCM prescription from the Dictionary of Traditional Chinese Medicine Prescriptions [3] is given in Fig. 1. It has a source book, composition herbs, usage and indication symptoms.

Regularities in the herb composition of prescriptions and the corresponding symptoms play a significant role in clinical treatment and novel prescription development. For instance, common herb combinations are important for efficient clinical prescriptions [4], and the necessity of prescription pattern discovery for new drug research and development in TCM has been shown in [5].

Previous works proposed many methods to discover regularities in prescriptions [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], but they failed to comprehensively describe how a prescription is generated using TCM theories, or did not utilize TCM domain knowledge well. These works are discussed in detail in Section 2.2.

The therapeutic process in traditional Chinese medicine is called li-fa-fang-yao, which is of critical importance in clinical practice [11], [16], [17]. Li-fa-fang-yao means principles, methods, prescriptions and Chinese herbs, respectively. It indicates the four basic steps of diagnosis and treatment: determining the cause and mechanism (syndromes) of the disease according to symptoms, then deciding the treatment methods based on the mechanism, and finally selecting a prescription as well as proper herbs. Fig. 2 shows the general process of li-fa-fang-yao. We refer readers to Fig. 1 in [17], which shows an intuitive example of the li-fa-fang-yao process when TCM practitioners treat diabetes mellitus.

Regarding the basic composition of TCM prescriptions, one of the most influential theories is the principle of jun-chen-zuo-shi [2], [16] (also known as “emperor-minister-assistant-courier”). It means that different herbs play different roles in a prescription. The jun (emperor) herbs treat the main cause or primary symptoms of a disease. The chen (minister) herbs serve to augment or broaden the effects of jun, and relieve secondary symptoms. The zuo (assistant) herbs are used to improve the effects of jun and chen, and to counteract the toxic or side effects of these herbs. The shi (courier) herbs are included in many prescriptions to ensure that all components in the prescription cooperate well, or to help deliver or guide them to the target organs. Taking the famous prescription “Ephedra Decoction” in Fig. 1 as an example, Ephedra (marked red) is the jun herb, which is used to induce sweating and treat the main symptoms, aversion to cold with fever and asthma without sweat. Cassia Twig (marked blue) is the chen herb, which helps Ephedra to induce sweating and treats the secondary symptoms, headache and body pain. Apricot Seed (marked green) is the zuo herb, which helps Ephedra to treat asthma. Liquorice Root (marked orange) is the shi herb, which makes the other three herbs work well together. The same
herb can have distinct roles in different prescriptions. Thus, jun and chen herbs in one prescription may serve as zuo and shi herbs in another prescription.

Fig. 1. An example TCM prescription. The herbs marked red, blue, green, and orange are the jun (emperor) herb, chen (minister) herb, zuo (assistant) herb, and shi (courier) herb, respectively. They play different roles in the prescription.

Fig. 2. The general process of li-fa-fang-yao.

Another important concept for prescribing is herb compatibility [16], [18], which means the combination of two or more herbs based on the clinical setting and the properties of the herbs. Combining herbs can improve the treatment and avoid adverse reactions. Herb pairs, the unique combinations of two relatively fixed herbs, are the most fundamental and the simplest form of herb compatibility. In the procedure of forming a prescription, herb pairs are always used as the basic units. For instance, in Fig. 1, the jun (emperor) herb Ephedra and the chen (minister) herb Cassia Twig cooperate to induce sweating; if only one of them is used, the sweat-inducing effect is much weaker. Similarly, Radix Aconiti Lateralis Preparata and Dried Ginger are used together in many prescriptions for dispelling cold [16].

To model the complex TCM domain, we resort to topic models [19], which are widely used in exploratory data analysis. Topic models are mainly used to uncover latent “topics” in a collection of documents. The topics are distributions over words which show semantic patterns in the documents. Each document exhibits those topics to different degrees (topic proportions). One advantage is that topic models can be adapted to other kinds of data when we make a direct analogy from that kind of data to documents [19]. For instance, in computer vision, researchers have made a direct analogy from images to documents [20], [21]. They assume each image is a group of “visual words” and shows a combination of visual patterns (topics). In the medical domain, one can regard a medical record as a “document”, and view treatment activities and patient features as “words” and treatment patterns as “topics” [22]. Similarly, we can view a prescription as a “document” (a group of “herb words” or “symptom words”) and treatment patterns in prescriptions as “topics”. Another advantage of topic models is that they can easily express relations among elements of a complex domain, explain how the modeled data is generated, and incorporate domain knowledge. Taking Fig. 1 as an example, we can only see the herbs and symptoms but cannot see the other elements in Fig. 2. Topic models can characterize this by regarding herbs and symptoms as observed variables, and treating syndromes and treatment methods as hidden variables. The relations among the herbs, symptoms, syndromes, treatment methods and herb roles are complicated. Topic models can easily express relations among these elements by putting edges among closely related variables. The directions of the edges show how a variable is generated given other variables, so topic models can tell how a prescription is generated by tracking these edges among variables. Topic models can be readily extended if we have prior knowledge about specific elements. In text mining tasks, a number of works have incorporated linguistic knowledge about words into topic modeling [23], [24], [25]. Similarly, we can incorporate knowledge about herbs and symptoms into prescription topic modeling.

In this work, we propose a topic model which characterizes the generative process of prescriptions in TCM theories and further incorporate domain knowledge into the topic model. Using 33,765 prescriptions in TCM prescription books, this model can reflect the prescribing patterns in TCM. The method can help TCM practitioners prescribe and help pharmaceutical companies decide what combinations of herbs to test.

The contributions of the paper are summarized as follows:

• It proposes a novel prescription topic model which characterizes the generative process of prescriptions based on TCM theories.
• To the best of our knowledge, our work is one of the earliest works to study the problem of herb recommendation and symptom suggestion. Our work is also among the earliest works to introduce a knowledge-based topic model in medical data mining.
• Our experimental results demonstrate that our method outperforms several baselines on generalization performance, herb recommendation, symptom suggestion and prescribing pattern discovery.
• We provide a benchmark TCM prescription dataset.¹

¹ We released the data set, source code, and domain knowledge files of this paper at the first author’s GitHub: https://github.com/yao8839836/PTM.

2 RELATED WORK
2.1 Topic Models
Topic models lie in a more general framework called probabilistic graphical models [26], which provide an elegant and principled approach to developing novel methods for data analysis and knowledge discovery. Probabilistic graphical models give us a visual language for expressing assumptions about data and its hidden structure.

Probabilistic topic models [19] such as Latent Dirichlet Allocation (LDA) [27] are commonly used machine learning methods that can find latent topics in documents. Topic models can be adapted to model many other data forms as long as we can treat target data samples as documents (groups of words). Apart from the medical record example in the introduction section, topic models could also be used
in many health care and biomedicine tasks. For instance, in population genetics, one can treat each individual’s genotype as a “document” and genetic patterns as “topics” of those documents [28]. Chen et al. [29] showed that the configuration of functional groups in meta-genome samples can be inferred by probabilistic topic modeling. Van Esbroeck et al. [30] explored the application of topic models on heart rate time series to identify functional sets of heart rate sequences and to concisely describe patients. Recently, latent treatment patterns for clinical pathways [31] were discovered with topic modeling.

Some knowledge-based topic models [23], [24], [25] have been proposed. These models mainly use different forms of external linguistic knowledge for better text mining, but knowledge-based topic models have not been extensively explored for other kinds of data, especially for medical data.

2.2 TCM Knowledge Discovery
Knowledge discovery and data mining have become hot topics in health care and biomedicine [32], [33]. Compared with data mining research in modern biomedicine, TCM data mining has become popular only in recent years. The efforts of TCM data mining have been reviewed by Feng et al. [34], Lukman et al. [35], Liu et al. [36] and Li and Liu [37].

A number of works have been devoted to studying the component patterns in TCM prescriptions. For example, Li et al. [6] constructed a herb network using a method called Distance-based Mutual Information Model to identify useful relationships among herbs in numerous prescriptions. Zhang et al. [8] discovered interesting regularities using latent tree models [38]; these regularities are of interest to students of TCM as well as pharmaceutical companies that manufacture medicine using Chinese herbs. He et al. [9] proposed an approach that could discover herbal functional groups from a large set of prescriptions recorded in TCM books. Poon et al. [7] proposed an approach that could systematically generate combinations of interacting herbs that might lead to good outcomes. Zheng et al. [11] constructed prescription associated networks by mining literature data sets. Yao et al. [10] introduced a system which mines the evolutionary relationship among TCM prescriptions from prescription books.

The closest works to ours are [12], [13], [14], [15], which have explored topic modeling on TCM clinical data. Zhang et al. [12] proposed a hierarchical symptom-herb topic model which uses the Link latent Dirichlet allocation (LinkLDA) [39] model and the nested Chinese restaurant process to automatically extract hierarchical latent topic structures with both symptoms and their corresponding herbs in TCM clinical records. The number of hierarchical topics is automatically determined. Zhang et al. [13] proposed the Symptom-Herb-Diagnosis topic model, which uses the Author-topic model (ATM) [40] and diagnosis information to discover the common relationships among symptoms, herb combinations and diagnoses in clinical cases. Jiang et al. [14] applied LinkLDA directly to the same problem. Our model is an extension of the LinkLDA model. In our previous work [15], we presented a framework to mine medicine usage patterns in clinical cases. We first mapped symptoms to treatment methods defined in a TCM domain ontology, then viewed treatment methods as labels of a prescription and employed a supervised topic model to learn herb usage patterns under each topic (label). The method could reflect relations between treatment methods and herbs. However, it could not learn direct symptom-herb relations and perform the recommendation or suggestion tasks in this study, because a label could correspond to different combinations of symptoms.

Although these topic models described the prescribing process, they failed to characterize the two important principles jun-chen-zuo-shi and herb compatibility, or could not utilize domain knowledge well, while our topic model is more consistent with TCM theories and domain knowledge.

3 DATA
We collected 98,334 prescriptions from the Dictionary of Traditional Chinese Medicine Prescriptions [3], which contains almost all (about 100,000) prescriptions recorded in China. We focus on herbs and symptoms in this work.

We filter indication symptoms by using 603 standard symptoms in Traditional Chinese Medicine Symptoms Differential Diagnosis [41], and filter herbs by using 970 herbs in Traditional Chinese Medical Subject Headings (TCM MeSH) [42],² which is compatible with Medical Subject Headings (MeSH). Each symptom has a syndrome category and each herb has an efficacy description text. Among all 98,334 prescriptions, 33,765 have both symptoms and herbs that pass the two filters. $S = 390$ symptoms and $H = 811$ herbs appear in these $P = 33{,}765$ prescriptions, on which we run our experiments. We randomly divided the $P = 33{,}765$ prescriptions into a training set of 28,746 prescriptions and a test set of 5,019 prescriptions.

² Available at http://zcy.ckcest.cn/tcm/dic/home

4 PRESCRIPTION TOPIC MODEL (PTM)
Guided by li-fa-fang-yao, TCM practitioners usually first synthesise disease manifestations (symptoms) and determine the syndromes of a patient. Treatment methods are then easily determined according to the syndromes; in general, a particular treatment method corresponds to a syndrome. For example, in Fig. 1, TCM practitioners first determine the syndrome “depressed nutrient and defense”, which means the nutrient in blood is not well absorbed and immunity is weak, and the syndrome “failure of lung qi in dispersion”, which means respiratory movement is depressed. They then decide the treatment methods: “inducing sweating to releasing exterior” (which means inducing sweating and moving qi, the fundamental substance which constitutes the human body, to the skin), corresponding to “depressed nutrient and defense”, and “diffuse the lung to calm panting” (which means regulating respiratory movement to calm panting), corresponding to “failure of lung qi in dispersion”. Finally, practitioners form a prescription based on the treatment methods. In the prescription, each treatment method is implemented by some herbs (e.g., the two treatment methods mentioned above are mainly implemented by Ephedra), and each herb has a jun-chen-zuo-shi role (e.g., Ephedra is the jun herb).

Based on this process, we introduce the details of our Prescription Topic Model (PTM). Let $P$ be the number of prescriptions, where each prescription $p$ has $N_{hp}$ herbs and $N_{sp}$ symptoms, $h_{pn}$ is the $n$th herb in $p$ and $s_{pm}$ is the $m$th symptom in $p$. The prescription in Fig. 1 has $N_{hp} = 4$ herbs and $N_{sp} = 7$ symptoms. $z_{pn}$ is the latent treatment method assignment for $h_{pn}$, $z'_{pm}$ is the latent syndrome assignment for $s_{pm}$, and $x_{pn}$ is the latent jun-chen-zuo-shi role assignment for $h_{pn}$ (The prescriptions with known jun-chen-
zuo-shi herb roles are limited; to our knowledge, there are only several hundred such prescriptions in textbooks like [16]). In Fig. 1, the latent treatment method assignment for Ephedra should be “inducing sweating to releasing exterior” or “diffuse the lung to calm panting”, and the latent syndrome assignment for the symptom “asthma without sweat” should be “depressed nutrient and defense”. Let $K$ be the number of topics. A topic $k \in \{1, \ldots, K\}$ is a syndrome together with the syndrome’s corresponding treatment method (e.g., “depressed nutrient and defense” and its corresponding “inducing sweating to releasing exterior”). $\phi'_k$ is the $S$-dimensional syndrome-symptom multinomial for syndrome $k \in \{1, \ldots, K\}$, where $S$ is the number of unique symptoms. $\phi_{kx}$ is the $H$-dimensional treatment method-role-herb multinomial for treatment method $k$ and jun-chen-zuo-shi role $x$, where $H$ is the number of unique herbs. $\theta_p$ is the $K$-dimensional prescription-topic multinomial for $p$. $\pi_{pk}$ is the $X$-dimensional prescription-treatment method-role multinomial for prescription $p$ and treatment method $k$, with $X = 4$, which means an herb is a jun, chen, zuo or shi herb. $\alpha$, $\beta$, $\beta'$ and $\eta$ are the hyperparameters of the Dirichlet priors on $\theta_p$, $\phi_{kx}$, $\phi'_k$ and $\pi_{pk}$, respectively. We list the mathematical notations in Table 1.

TABLE 1
Mathematical Notations

Symbol             Description
$P$                The number of prescriptions.
$K$                The number of topics (syndromes/treatment methods).
$H$                The number of unique herbs.
$S$                The number of unique symptoms.
$X$                The number of unique jun-chen-zuo-shi roles, $X = 4$.
$N_{hp}$           The number of herbs in prescription $p$.
$N_{sp}$           The number of symptoms for prescription $p$.
$N_{lp}$           The number of herb pairs for prescription $p$.
$h_{pn}$           The $n$th herb in prescription $p$.
$\tilde{h}_{pl}$   The $l$th herb pair in prescription $p$.
$h_{pl1}$          The first herb of the $l$th herb pair in prescription $p$.
$h_{pl2}$          The second herb of the $l$th herb pair in prescription $p$.
$s_{pm}$           The $m$th symptom for prescription $p$.
$z_{pn}$           The latent treatment method assignment for $h_{pn}$.
$z_{pl}$           The latent treatment method assignment for $\tilde{h}_{pl}$.
$x_{pn}$           The latent jun-chen-zuo-shi role assignment for $h_{pn}$.
$x_{pl1}$          The latent jun-chen-zuo-shi role assignment for $h_{pl1}$.
$x_{pl2}$          The latent jun-chen-zuo-shi role assignment for $h_{pl2}$.
$z'_{pm}$          The latent syndrome assignment for $s_{pm}$.
$\theta_p$         The prescription-topic multinomial for prescription $p$.
$\pi_{pk}$         The prescription-treatment method-role multinomial for prescription $p$ and treatment method $k$.
$\phi_{kx}$        The treatment method-role-herb multinomial for treatment method $k$ and role $x$.
$\phi'_k$          The syndrome-symptom multinomial for syndrome $k$.
$\alpha$           Hyperparameter of the Dirichlet prior on $\theta_p$.
$\beta$            Hyperparameter of the Dirichlet prior on $\phi_{kx}$.
$\beta'$           Hyperparameter of the Dirichlet prior on $\phi'_k$.
$\eta$             Hyperparameter of the Dirichlet prior on $\pi_{pk}$.

The generative story of our prescription topic model is given as follows:

(1) For each prescription $p$, draw $\theta_p \sim \mathrm{Dir}(\alpha)$.
(2) For each syndrome $k$ in $1 \ldots K$, draw $\phi'_k \sim \mathrm{Dir}(\beta')$.
(3) For each prescription $p$ and treatment method $k$ in $1 \ldots K$, draw $\pi_{pk} \sim \mathrm{Dir}(\eta)$.
(4) For each treatment method $k$ in $1 \ldots K$ and jun-chen-zuo-shi role $x$ in $1 \ldots X$, draw $\phi_{kx} \sim \mathrm{Dir}(\beta)$.
(5) For each of the $N_{sp}$ symptoms in prescription $p$:
    a) Draw a syndrome $z'_{pm} \sim \mathrm{Mult}(\theta_p)$.
    b) Draw a symptom $s_{pm} \sim \mathrm{Mult}(\phi'_{z'_{pm}})$.
(6) For each of the $N_{hp}$ herbs in prescription $p$:
    a) Draw a treatment method $z_{pn} \sim \mathrm{Mult}(\theta_p)$.
    b) Draw a role $x_{pn} \sim \mathrm{Mult}(\pi_{p z_{pn}})$.
    c) Draw an herb $h_{pn} \sim \mathrm{Mult}(\phi_{z_{pn} x_{pn}})$.

This generative story is shown in the probabilistic graphical model representation of Fig. 3a. It is similar to the Link latent Dirichlet allocation (LinkLDA) model [39]. The distinction is that we encode the herb role $x$ into our model. We name it PTM(a).

4.1 Model Inference and Learning
We use Gibbs sampling to infer the latent assignments $z'_{pm}$, $z_{pn}$ and $x_{pn}$. The Gibbs sampling equation for $z'_{pm}$ is defined as

$$p(z'_{pm} = k \mid s_{pm}, \mathbf{s}_{\neg pm}, \mathbf{z}'_{\neg pm}, \mathbf{z}, \alpha, \beta') \propto \frac{n_{pk} + \alpha}{N_{sp} + N_{hp} + K\alpha} \times \frac{n_{k s_{pm}} + \beta'}{n_{k} + S\beta'}, \tag{1}$$

where $k$ is a syndrome, $\mathbf{s}_{\neg pm}$ are all symptoms except $s_{pm}$, $\mathbf{z}'_{\neg pm}$ are the syndrome assignments for all symptoms except $s_{pm}$, and $\mathbf{z}$ are the treatment method assignments for all herbs. $n_{pk}$ is the number of times topic (syndrome or treatment method) $k$ is assigned to a symptom or an herb in prescription $p$, $n_{k s_{pm}}$ is the number of times $s_{pm}$ is assigned to syndrome $k$, and $n_{k}$ is the number of times any symptom is assigned to syndrome $k$.

The sampling equation for $z_{pn}$ and $x_{pn}$ is defined as

$$p(z_{pn} = k, x_{pn} = x \mid h_{pn}, \mathbf{z}_{\neg pn}, \mathbf{x}_{\neg pn}, \mathbf{h}_{\neg pn}, \mathbf{z}', \alpha, \beta, \eta) \propto \frac{n_{pk} + \alpha}{N_{sp} + N_{hp} + K\alpha} \times \frac{n_{pkx} + \eta}{n'_{pk} + X\eta} \times \frac{n_{kx h_{pn}} + \beta}{n_{kx} + H\beta}, \tag{2}$$

where $k$ is a treatment method, $x$ is a jun-chen-zuo-shi role, $\mathbf{z}_{\neg pn}$ are the treatment method assignments for all herbs except $h_{pn}$, $\mathbf{x}_{\neg pn}$ are the role assignments for all herbs except $h_{pn}$, $\mathbf{h}_{\neg pn}$ are all herbs except $h_{pn}$, and $\mathbf{z}'$ are the syndrome assignments for all symptoms. $n_{pkx}$ is the number of times treatment method $k$ and role $x$ are assigned to an herb in prescription $p$, $n'_{pk}$ is the number of times treatment method $k$ is assigned to an herb in prescription $p$, and $n_{kx h_{pn}}$ is the number of times $k$ and $x$ are assigned to $h_{pn}$.

With Gibbs sampling, we can make the following parameter estimation:

$$\theta_{pk} = \frac{n_{pk} + \alpha}{N_{sp} + N_{hp} + K\alpha} \tag{3}$$

$$\phi'_{k s_{pm}} = \frac{n_{k s_{pm}} + \beta'}{n_{k} + S\beta'} \tag{4}$$

$$\pi_{pkx} = \frac{n_{pkx} + \eta}{n'_{pk} + X\eta} \tag{5}$$

$$\phi_{kx h_{pn}} = \frac{n_{kx h_{pn}} + \beta}{n_{kx} + H\beta} \tag{6}$$

4.2 Herb Compatibility
Since herb pairs are always used as the basic units, and each herb pair often implements a certain treatment method [16], we extract herb pairs from each training prescription, i.e., if any two herbs co-occur in a prescription $p$ of the training set (e.g., the two herbs Ephedra and Cassia Twig in Fig. 1), we add the herb pair to the herb pair set of $p$.
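As a minimal sketch of the herb-pair extraction used in this section, the following Python snippet enumerates all unordered pairs of herbs in one prescription. The function name `herb_pairs` and the list-of-strings input format are illustrative assumptions and are not taken from the released code. A prescription with a single herb yields one pair made of the same herb twice, which is the convention this section adopts for one-herb prescriptions.

```python
from itertools import combinations


def herb_pairs(herbs):
    """Return the herb pairs of one prescription (hypothetical helper).

    Every unordered pair of distinct herbs is one pair, which gives
    N * (N - 1) / 2 pairs for N > 1 herbs. A single-herb prescription
    yields one pair consisting of the same herb twice.
    """
    unique = list(dict.fromkeys(herbs))  # drop duplicates, keep order
    if len(unique) == 1:
        return [(unique[0], unique[0])]
    return list(combinations(unique, 2))


# The four herbs of "Ephedra Decoction" (Fig. 1).
pairs = herb_pairs(["Ephedra", "Cassia Twig", "Apricot Seed", "Liquorice Root"])
print(len(pairs))  # 6, i.e. 4 * 3 / 2
print(pairs[0])    # ('Ephedra', 'Cassia Twig')
```

Because every co-occurring pair is kept, the number of pairs grows quadratically with the number of herbs in a prescription.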
There are $N_{lp} = C_{N_{hp}}^{2} = N_{hp}(N_{hp} - 1)/2$ herb pairs in $p$ when $N_{hp} > 1$. If there is only one herb in $p$, we assume that $p$ has one herb pair which consists of two identical herbs. As shown in the left part of Fig. 3b, the generative story of the link set of prescription $p$ is as follows:

(1) For each herb pair $\tilde{h}_{pl}$ of the $N_{lp}$ herb pairs in prescription $p$:
    a) Draw a treatment method $z_{pl} \sim \mathrm{Mult}(\theta_p)$.
    b) Draw two roles $x_{pl1}, x_{pl2} \sim \mathrm{Mult}(\pi_{p z_{pl}})$.
    c) Draw two herbs $h_{pl1} \sim \mathrm{Mult}(\phi_{z_{pl} x_{pl1}})$, $h_{pl2} \sim \mathrm{Mult}(\phi_{z_{pl} x_{pl2}})$.

We name this model with herb compatibility PTM(b).

Fig. 3. The probabilistic graphical model representation of PTM. (a) PTM(a): The prescription topic model with herb role only. (b) PTM(b): The prescription topic model with herb role and herb compatibility.

The inference equation for $z_{pl}$, $x_{pl1}$ and $x_{pl2}$ is defined as

$$p(z_{pl} = k, x_{pl1} = x_1, x_{pl2} = x_2 \mid \tilde{h}_{pl}, \mathbf{z}_{\neg pl}, \mathbf{x}_{\neg pl}, \alpha, \beta, \eta) \propto \frac{n''_{pk} + \alpha}{N_{sp} + N_{lp} + K\alpha} \times \frac{n_{pkx_1} + \eta}{n'_{pk} + X\eta} \times \frac{n_{kx_1 h_{pl1}} + \beta}{n_{kx_1} + H\beta} \times \frac{n_{pkx_2} + \eta}{n'_{pk} + X\eta} \times \frac{n_{kx_2 h_{pl2}} + \beta}{n_{kx_2} + H\beta}, \tag{7}$$

where $\tilde{h}_{pl}$ is the $l$th herb pair in prescription $p$, $\mathbf{z}_{\neg pl}$ are the treatment method assignments for all herb pairs except $\tilde{h}_{pl}$, $\mathbf{x}_{\neg pl}$ are the role assignments for all herb pairs except $\tilde{h}_{pl}$, and $n''_{pk}$ is the number of times any herb pair or symptom in $p$ is assigned to topic $k$. The inference equation for $z'_{pm}$ in PTM(b) is similar to Equation (1), but we need to replace $n_{pk}$ and $N_{hp}$ with $n''_{pk}$ and $N_{lp}$:

$$p(z'_{pm} = k \mid s_{pm}, \mathbf{s}_{\neg pm}, \mathbf{z}'_{\neg pm}, \mathbf{z}, \alpha, \beta') \propto \frac{n''_{pk} + \alpha}{N_{sp} + N_{lp} + K\alpha} \times \frac{n_{k s_{pm}} + \beta'}{n_{k} + S\beta'} \tag{8}$$

The parameter estimation equations for $\phi'_{k s_{pm}}$, $\pi_{pkx}$ and $\phi_{kxh}$ in PTM(b) are the same as in PTM(a); the only difference is

$$\theta_{pk} = \frac{n''_{pk} + \alpha}{N_{sp} + N_{lp} + K\alpha} \tag{9}$$

4.3 Incorporating Herb Efficacy Knowledge
In this section, we use TCM prior knowledge to improve the prescription topic model. We extract the symptom-herb correspondences from the training prescriptions. Specifically, we use the 390 symptoms to filter the efficacy descriptions of the 811 herbs in TCM MeSH and obtain the symptom-herb correspondences. Then, for each prescription in the training set, if an herb $h$ in a prescription $p$ can treat a symptom $s$ of $p$’s indication, we add $h$ and $s$ (e.g., the herb Ephedra and the symptom aversion to cold with fever in Fig. 1) to the symptom-herb corresponding set of prescription $p$. Because of their correspondence in TCM knowledge, we assume that a symptom $s$ in the corresponding set can only be assigned to the topics of $s$’s corresponding herbs in prescription $p$.

We name the prescription topic model with herb role and herb efficacy knowledge PTM(c), which is illustrated in Fig. 4a. If $s_{pm}$ has no corresponding herb in prescription $p$, the inference equation for $z'_{pm}$ is the same as Equation (1); otherwise, $z'_{pm}$ can only be sampled from the topic assignment set $\{z_{pn} \mid h_{pn} \text{ treats } s_{pm}\}$ of $s_{pm}$’s corresponding herbs $\{h_{pn} \mid h_{pn} \text{ treats } s_{pm}\}$ in $p$, and the inference equation for $z'_{pm}$ is

$$p(z'_{pm} = k \mid s_{pm}, \mathbf{s}_{\neg pm}, \mathbf{z}'_{\neg pm}, \mathbf{z}, \alpha, \beta') \propto I[k \in \{z_{pn} \mid h_{pn} \text{ treats } s_{pm}\}] \times \frac{n_{pk} + \alpha}{N_{sp} + N_{hp} + K\alpha} \times \frac{n_{k s_{pm}} + \beta'}{n_{k} + S\beta'}, \tag{10}$$

where $I[y] = 1$ when $y$ is true and $I[y] = 0$ when $y$ is false. The inference equation for $z_{pn}$ and $x_{pn}$ in PTM(c) is the same as Equation (2). The parameter estimation equations for PTM(c) are the same as for PTM(a).

We name our prescription topic model with herb role, herb compatibility and herb efficacy knowledge PTM(d), which is shown in Fig. 4b. If $s_{pm}$ has no corresponding herb in prescription $p$, the inference equation for $z'_{pm}$ is the same as Equation (8); otherwise, $z'_{pm}$ can only be sampled from the topic assignment set $\{z_{pl} \mid h_{pl1} \text{ treats } s_{pm} \text{ or } h_{pl2} \text{ treats } s_{pm}\}$ of $s_{pm}$’s corresponding herbs in $p$, and the inference equation for $z'_{pm}$ is

$$p(z'_{pm} = k \mid s_{pm}, \mathbf{s}_{\neg pm}, \mathbf{z}'_{\neg pm}, \mathbf{z}, \alpha, \beta') \propto I[k \in \{z_{pl} \mid h_{pl1} \text{ treats } s_{pm} \text{ or } h_{pl2} \text{ treats } s_{pm}\}] \times \frac{n''_{pk} + \alpha}{N_{sp} + N_{lp} + K\alpha} \times \frac{n_{k s_{pm}} + \beta'}{n_{k} + S\beta'} \tag{11}$$
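The knowledge-constrained update for a symptom's syndrome can be sketched in a few lines of Python. The snippet follows the PTM(c) form in Equation (10): the unconstrained weight of Equation (1) is multiplied by an indicator that keeps only the topics currently assigned to herbs that treat the symptom. The function names, argument layout and toy counts are illustrative assumptions, not the authors' released implementation. The sketch also assumes that the counts passed in already exclude the symptom being resampled, as is usual in collapsed Gibbs sampling.

```python
import random


def syndrome_weights(k_allowed, n_pk, n_ks, n_k, N_sp, N_hp, S, alpha, beta_p):
    """Unnormalised weights of Equation (10) for one symptom (a sketch).

    n_pk[k]: times topic k is assigned in this prescription.
    n_ks[k]: times this symptom is assigned to syndrome k.
    n_k[k]:  times any symptom is assigned to syndrome k.
    k_allowed: topics of the herbs that treat the symptom; an empty set
               means "no corresponding herb", which reduces to Equation (1).
    All counts are assumed to exclude the symptom being resampled.
    """
    K = len(n_pk)
    weights = []
    for k in range(K):
        indicator = 1.0 if (not k_allowed or k in k_allowed) else 0.0
        doc_term = (n_pk[k] + alpha) / (N_sp + N_hp + K * alpha)
        sym_term = (n_ks[k] + beta_p) / (n_k[k] + S * beta_p)
        weights.append(indicator * doc_term * sym_term)
    return weights


def sample_syndrome(weights, rng=random):
    """Draw a topic index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Toy counts for K = 3 topics (made-up numbers, for illustration only).
w = syndrome_weights({0, 2}, n_pk=[3, 5, 1], n_ks=[2, 0, 4], n_k=[10, 8, 12],
                     N_sp=4, N_hp=5, S=390, alpha=1.0, beta_p=0.1)
print(w[1])  # 0.0, because topic 1 is not assigned to any treating herb
k_new = sample_syndrome(w, random.Random(0))  # always 0 or 2 here
```

For the PTM(d) form in Equation (11), the same sketch applies after replacing `n_pk` with the pair-based counts $n''_{pk}$ and `N_hp` with $N_{lp}$, and building `k_allowed` from the topics of the herb pairs that contain a treating herb.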
Authorized licensed use limited to: Francis Xavier Engineering College. Downloaded on February 21,2024 at 10:34:14 UTC from IEEE Xplore. Restrictions apply.
1012 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 30, NO. 6, JUNE 2018

Fig. 4. The probabilistic graphical models representation of PTM with herb efficacy knowledge. (a) PTM(c): The prescription topic model with herb
role and herb efficacy knowledge. (b) PTM(d): The prescription topic model with herb role, herb compatibility and herb efficacy knowledge.

The inference equation for $z_{pl}$, $x_{pl1}$ and $x_{pl2}$ in PTM(d) is the same as Equation (7). The parameter estimation equations for PTM(d) are the same as those for PTM(b).

5 EXPERIMENT
In this section we evaluate our prescription topic model on four experimental tasks. Specifically, we want to determine:

• Can our model achieve better generalization performance than other topic models?
• Can our model recommend herbs for a list of symptoms?
• Can our model suggest symptoms for a list of herbs?
• Can our model reflect the prescribing patterns in TCM?

We compare our prescription topic model (PTM) with eight baselines: five are topic models and three are group recommendation methods. We compare our model with group recommendation methods because recommending herbs (symptoms) for a list of symptoms (herbs) is analogous to recommending items to a group of users.

• Author-topic model (ATM) [40], employed by previous work [13], which treats herbs as authors and symptoms as words.
• LinkLDA [39], used in previous works [12] and [14], which views herbs and symptoms as words and references.
• Block-LDA [43], a topic model that extends LinkLDA. It can model links between entities of a certain type. We treat the herb-pair set extracted from all training prescriptions as the external links.
• Link-PLSA-LDA [44], a topic model that extends LinkLDA. It can model links between different types of entities. We treat the symptom-herb correspondence set extracted from all training prescriptions as the external links.
• User-based collaborative filtering with averaging strategy (CF-AVG) [45], a widely used group recommendation method. CF-AVG first estimates the rating score of each user in a user group by user-based collaborative filtering, then uses the average of these scores as the recommendation score for the group. We compute the conditional probability of items (herbs/symptoms) given users (symptoms/herbs) in training prescriptions as the rating score.
• User-based collaborative filtering with least-misery strategy (CF-LM) [45], a widely used group recommendation method which uses the smallest rating score of the group users as the recommendation score for the group. We also compute the conditional probability of items given users in training prescriptions as the rating score.
• COnsensus Model (COM) [46], a group recommendation method which simulates the generative process of group events and makes recommendations for a group of users. We treat herbs in a prescription as a group of users when recommending symptoms, and view symptoms as a group of users when recommending herbs.
• Bilingual Biterm Topic Model (BiBTM) [47], a topic model describing the generation process of a paired bilingual document corpus. We treat herbs in a prescription as the words in a document and symptoms as the words in the translated version of the document.

We set the following hyperparameters. For PTM: $\alpha = 1$, $\beta = 0.1$, $\beta' = 0.1$, $\eta = 1$; for LinkLDA: $\alpha = 1$, $\beta = 0.1$, $\beta' = 0.1$; for Block-LDA: $\alpha_D = \alpha_L = 1$, $\gamma = 0.1$; for Link-PLSA-LDA: $\alpha_\theta = \alpha_L$ (hyperparameter of $\pi$) $= 1$, $\beta'$ (hyperparameter of $\Omega$) $= \gamma$ (hyperparameter of $\beta$) $= 0.1$; for BiBTM: $\alpha = 1$, $\beta = 0.1$; for ATM: $\alpha = 50/K$, $\beta = 0.01$ as suggested in [40]; for COM: $\alpha = 50/K$, $\beta = \eta = 0.01$, $\gamma = \gamma_t = 0.5$ and $\rho = 0.01$ as suggested in [46]. For CF-AVG and CF-LM, we use Pearson correlation similarity and the top 10 similar users. We find that small changes of hyperparameters do not change the results much. All topic models are trained using 1,000 Gibbs iterations.

5.1 Generalization Performance
5.1.1 Herbs Predictive Perplexity
We use the predictive perplexity to evaluate the herbs predictive power of topic models. Perplexity is a standard

measure for estimating the performance of a probabilistic model, and it has been used to evaluate the predictive capability of topic models in previous works [13], [14], [40]. The predictive perplexity of a set of test herbs given symptoms is

$$\mathrm{perplexity}(h_{test} \mid s_{test}) = \exp\left(-\frac{\sum_{p=1}^{P_{test}} \log p(\vec{h}_p \mid \vec{s}_p)}{\sum_{p=1}^{P_{test}} N_{h_p}}\right),$$
$$p(\vec{h}_p \mid \vec{s}_p) = \prod_{h_{pn} \in \vec{h}_p} p(h_{pn} \mid \vec{s}_p) = \prod_{h_{pn} \in \vec{h}_p} \frac{1}{N_{s_p}} \sum_{s_{pm} \in \vec{s}_p} p(h_{pn} \mid s_{pm}), \tag{12}$$

where $s_{test}$ are the symptoms in test prescriptions, $h_{test}$ are the herbs in test prescriptions, $\vec{s}_p$ are the symptoms in prescription $p$ of the test set, $\vec{h}_p$ are the herbs in prescription $p$ of the test set, and $P_{test} = 5{,}019$ is the number of prescriptions in the test set. Better predictive performance is indicated by a lower perplexity over test prescriptions.

The probability of an herb $h$ given a symptom $s$ for PTM is

$$p(h \mid s) = \sum_{p,k,x} p(h \mid k, x)\, p(x \mid p, k)\, p(p \mid k)\, p(k \mid s)$$
$$= \sum_{p,k,x} p(h \mid k, x)\, p(x \mid p, k)\, \frac{p(k \mid p)}{\sum_{p'} p(k \mid p')}\, \frac{p(s \mid k)}{\sum_{k'} p(s \mid k')}$$
$$= \sum_{p,k,x} \phi_{kxh}\, \pi_{pkx}\, \frac{\theta_{pk}}{\sum_{p'} \theta_{p'k}}\, \frac{\phi'_{ks}}{\sum_{k'} \phi'_{k's}}. \tag{13}$$

Fig. 5 shows the herbs predictive perplexity of several topic models with different numbers of topics. We do not compute predictive perplexity for COM and BiBTM because they only describe the generative story of group events (a group of symptoms and a selected herb for herbs recommendations) or of herb/symptom pair sets, without modeling prescriptions explicitly. We can see that ATM does not perform well, which implies that treating herbs as authors and symptoms as words is not consistent with the generative story of prescriptions. LinkLDA performs better than ATM, which shows the correctness of modeling herbs and symptoms as two parts of a prescription. Block-LDA performs better than LinkLDA, which demonstrates that using herb links can improve herb predictive capability. Link-PLSA-LDA outperforms LinkLDA, which shows that extracting symptom-herb correspondences from prescriptions can help herb prediction. PTM(a) performs better than LinkLDA and similarly to Link-PLSA-LDA, because considering herb roles can highlight the most relevant herbs (jun (emperor) and chen (minister) herbs) of given symptoms and ignore less relevant herbs. PTM(b) has lower perplexity scores than PTM(a) and LinkLDA ($p < 10^{-3}$), which means considering herb compatibility in each prescription can significantly improve the herb predictive power. This is intuitive because when seeing a symptom, practitioners not only use an herb that can treat the symptom, but also use a compatible herb to augment the effect or counteract toxicity [16]. PTM(c) also significantly outperforms PTM(a) ($p < 10^{-6}$), which demonstrates that restricting symptom topic assignments using herb efficacy knowledge is also an efficient way to help herbs prediction. This is also intuitive because the knowledge makes an herb and its indication symptoms tend to be under the same topic. PTM(d) has the lowest perplexity scores and significantly outperforms PTM(c) ($p < 10^{-6}$), which means considering both herb compatibility and herb efficacy knowledge leads to the best herb predictive power. However, compared to PTM(b), PTM(d) only improves a little, as connecting herb pairs and symptoms can make a symptom and its corresponding herb appear in the same topic, but meanwhile highlight some less related herbs for the symptom.

Fig. 5. Herbs predictive perplexity of each model with different number of topics K. A lower perplexity means the predictive power is better. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(a), PTM(b), PTM(c), and PTM(d) over LinkLDA are all significant ($p < 10^{-3}$) based on 2-tailed paired t-test.

5.1.2 Symptoms Predictive Perplexity
The predictive perplexity of a set of test symptoms given herbs is

$$\mathrm{perplexity}(s_{test} \mid h_{test}) = \exp\left(-\frac{\sum_{p=1}^{P_{test}} \log p(\vec{s}_p \mid \vec{h}_p)}{\sum_{p=1}^{P_{test}} N_{s_p}}\right),$$
$$p(\vec{s}_p \mid \vec{h}_p) = \prod_{s_{pm} \in \vec{s}_p} p(s_{pm} \mid \vec{h}_p) = \prod_{s_{pm} \in \vec{s}_p} \frac{1}{N_{h_p}} \sum_{h_{pn} \in \vec{h}_p} p(s_{pm} \mid h_{pn}). \tag{14}$$

The probability of a symptom $s$ given an herb $h$ for PTM is

$$p(s \mid h) = \sum_{k} p(s \mid k) \sum_{x} p(k, x \mid h) = \sum_{k} \phi'_{ks} \sum_{x} \frac{\phi_{kxh}}{\sum_{k',x'} \phi_{k'x'h}}. \tag{15}$$

Fig. 6 gives the symptoms predictive perplexity of each model with different numbers of topics. From Fig. 6, we can see that ATM also does not perform well on symptoms prediction, and LinkLDA performs better than ATM again, which shows that modeling herbs and symptoms as two types of words of a document is a better choice. Block-LDA performs similarly to LinkLDA, which means using extracted herb pairs as external links outside the training prescriptions could not help symptom prediction much. Link-PLSA-LDA significantly outperforms LinkLDA ($p < 10^{-4}$), which means herb-symptom links can also help symptom prediction. PTM(a) has lower perplexity than LinkLDA ($p < 0.01$), which means considering herb roles can significantly improve the symptoms predictive power. This is because when seeing a list of herbs, the jun-chen-zuo-shi labels can highlight jun (emperor) herbs and chen (minister) herbs, and the corresponding symptoms are mainly treated by jun herbs and chen herbs. PTM(b) performs slightly better than PTM(a), which shows that considering compatible herbs may highlight chen (minister) herbs or zuo (assistant) herbs, which are also used to treat the corresponding symptoms. PTM(c) also slightly outperforms PTM(a), which shows that restricting symptom topic assignment can also improve symptom predictive capability, but the improvement is not as obvious as the improvement in the herb prediction task. The reason could be that corresponding symptoms are fewer than

herbs in a prescription, which makes the data more sufficient for the symptom prediction task, but incorporating external knowledge is still useful. PTM(d) performs similarly to PTM(b), because connecting herb pairs and symptoms can highlight some less related herbs for a symptom and weaken the effect of herb-symptom correspondences.

Fig. 6. Symptoms predictive perplexity of the topic models with different number of topics K. A lower perplexity means the predictive power is better. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(a), PTM(b), PTM(c), and PTM(d) over LinkLDA are all significant ($p < 0.01$) based on 2-tailed paired t-test.

5.1.3 Prescription Predictive Perplexity
We also compute prescription predictive perplexity to evaluate the generalization performance on the whole prescription, including both herbs and symptoms. Following [48], [49], [50], we add the first half of each test prescription to the training data, while retaining the second half for evaluation. We estimate the prescription-level parameters $\theta_p$ and $\pi_p$ on the first half of the test prescriptions, then use the learned parameters to calculate the perplexity. We randomly split the test prescriptions (including both herbs and symptoms) into the first half and the second half. The prescription predictive perplexity is defined as

$$\mathrm{perplexity}(pre_{test} \mid pre_{train}) = \exp\left(-\frac{\sum_{p=1}^{P_{test}} \log p(\vec{s}_p, \vec{h}_p)}{\sum_{p=1}^{P_{test}} (N_{s_p} + N_{h_p})}\right), \tag{16}$$

where $pre_{test}$ are the test prescriptions and $pre_{train}$ are the training prescriptions.

Fig. 7 presents the prescription predictive perplexity of several topic models. We do not compute prescription predictive perplexity for ATM because we could not compute the probability of herbs (authors) in the test set given parameters learned on the training set. We can note that Block-LDA performs worse than LinkLDA, and that Link-PLSA-LDA, PTM(a) and PTM(c) also do not improve on LinkLDA. This is because the symptoms in a prescription are usually fewer than the herbs in the same prescription, so most "words" in the first half and the second half of test prescriptions are herbs. Thus an outside herb-pair set, herb-symptom correspondences and herb roles may not help the prediction of herbs given the other herbs in the same prescription. PTM(b) and PTM(d) significantly outperform LinkLDA. The reason is that extracting all herb pairs in each prescription can highlight herb co-occurrence and help the prediction of herbs given other herbs.

Fig. 7. Prescription predictive perplexity of the topic models with different number of topics K. A lower perplexity means the predictive power is better. We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(b) and PTM(d) over LinkLDA are significant ($p < 10^{-10}$) based on 2-tailed paired t-test.

5.2 Herbs Recommendation
We compute the following conditional probability of an herb given a set of test symptoms

$$p(h \mid \vec{s}_p) = \frac{1}{N_{s_p}} \sum_{s_{pm} \in \vec{s}_p} p(h \mid s_{pm}). \tag{17}$$

We then use the top N herbs with the largest probabilities as the recommended herbs of our PTM model. Following previous work [46], we use average Precision@N (P@N) as the recommendation effectiveness metric. Precision@N is the proportion of the top N herb recommendations that are in the real prescription $p$. Formally, Precision@N is defined as

$$\mathrm{Precision@}N = \frac{|\{\text{top } N \text{ herbs}\} \cap \{\text{true herbs}\}|}{|\{\text{top } N \text{ herbs}\}|}. \tag{18}$$

We average the Precision@N of all testing prescriptions as the final P@N.

Table 2 presents herbs Precision@N of each model with different K and N values. Note that the CF-AVG and CF-LM results are the same for every K because these methods do not have this parameter. We can observe that for topic models, the Precision@N scores are generally consistent with perplexity scores: a topic model with lower perplexity scores also tends to achieve higher Precision@N. PTM(a) performs slightly better than LinkLDA on average, and it can significantly outperform LinkLDA when K increases ($p < 0.01$ at $K = 40$), which shows that using herb roles makes sense and that distinguishing herb roles is more suitable when there are more treatment methods. All topic models tend to perform better when K increases, and the highest Precision@N values are generally achieved by PTM(b) and PTM(d). CF-AVG and CF-LM perform well, because computing symptom (user) similarities using the conditional probability $p(h \mid s)$ can highlight the most relevant herbs of a symptom and filter some noise, while topic models use a bag-of-words (herbs/symptoms) representation and may consider less related herbs. We can further consider using a weighted representation of prescriptions for topic modeling. COM could not produce satisfactory results. It treats symptoms in a prescription as a group of users, and views a group of users and an herb selected by the users as an independent group event, which can neglect relations among herbs. BiBTM performs worse than LinkLDA; considering herb pairs, symptom pairs and symptom-herb pairs separately may ignore the structure information of the original prescriptions.
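Equations (12) and (17) rest on the same quantity: the probability of an herb given a prescription's symptoms, taken as the average of the per-symptom probabilities $p(h \mid s)$. The following Python sketch illustrates how that average can drive both top-N herb recommendation and the herbs predictive perplexity. It assumes the per-symptom probabilities are already available as a nested dictionary. The function names, the toy probabilities, and the small floor `eps` that guards against log(0) are illustrative assumptions, not part of the paper's implementation.

```python
import math


def herb_given_symptoms(herb, symptoms, p_h_given_s):
    """Average p(h | s) over the symptoms of one prescription (Equation 17)."""
    total = sum(p_h_given_s.get(s, {}).get(herb, 0.0) for s in symptoms)
    return total / len(symptoms)


def recommend_herbs(symptoms, p_h_given_s, candidate_herbs, top_n):
    """Rank candidate herbs by the averaged probability and keep the top_n."""
    ranked = sorted(
        candidate_herbs,
        key=lambda h: (-herb_given_symptoms(h, symptoms, p_h_given_s), h),
    )
    return ranked[:top_n]


def herbs_perplexity(test_prescriptions, p_h_given_s, eps=1e-12):
    """Herbs predictive perplexity (Equation 12).

    test_prescriptions is a list of (symptoms, herbs) pairs. eps is an
    assumed floor so that an unseen herb does not produce log(0).
    """
    log_likelihood = 0.0
    herb_count = 0
    for symptoms, herbs in test_prescriptions:
        for herb in herbs:
            prob = herb_given_symptoms(herb, symptoms, p_h_given_s)
            log_likelihood += math.log(max(prob, eps))
        herb_count += len(herbs)
    return math.exp(-log_likelihood / herb_count)


# Toy, made-up probabilities p(h | s); these are not values from the paper.
TOY_P_H_GIVEN_S = {
    "hematemesis": {"Chinese Angelica": 0.6, "Liquorice Root": 0.3, "Clove": 0.1},
    "vomiting": {"Chinese Angelica": 0.2, "Liquorice Root": 0.2, "Clove": 0.6},
}
TOY_HERBS = ["Chinese Angelica", "Liquorice Root", "Clove"]

if __name__ == "__main__":
    symptoms = ["hematemesis", "vomiting"]
    print(recommend_herbs(symptoms, TOY_P_H_GIVEN_S, TOY_HERBS, top_n=2))
    print(herbs_perplexity([(symptoms, ["Chinese Angelica", "Clove"])], TOY_P_H_GIVEN_S))
```

In a real run the dictionary would be filled from Equation (13) for PTM, or from the corresponding expression of each baseline, so the same two functions could score every topic model on an equal footing.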

TABLE 2
Herbs Precision@N (P@N) of Each Model with Different K (the Number of Topics) and N

| Model | K=20, P@5 | K=20, P@10 | K=20, P@20 | K=30, P@5 | K=30, P@10 | K=30, P@20 | K=40, P@5 | K=40, P@10 | K=40, P@20 |
|---|---|---|---|---|---|---|---|---|---|
| ATM | 0.0088 ± 0.0021 | 0.0091 ± 0.0026 | 0.0087 ± 0.0005 | 0.0089 ± 0.0023 | 0.0093 ± 0.0016 | 0.0092 ± 0.0004 | 0.0081 ± 0.0023 | 0.0089 ± 0.0014 | 0.0083 ± 0.0006 |
| LinkLDA | 0.2301 ± 0.0067 | 0.1851 ± 0.0015 | 0.1336 ± 0.0010 | 0.2277 ± 0.0036 | 0.1789 ± 0.0023 | 0.1298 ± 0.0014 | 0.2188 ± 0.0031 | 0.1786 ± 0.0014 | 0.1276 ± 0.0005 |
| Block-LDA | 0.2269 ± 0.0030 | 0.1817 ± 0.0015 | 0.1321 ± 0.0016 | 0.2286 ± 0.0029 | 0.1803 ± 0.0020 | 0.1300 ± 0.0014 | 0.2192 ± 0.0052 | 0.1770 ± 0.0019 | 0.1283 ± 0.0006 |
| Link-PLSA-LDA | 0.2320 ± 0.0037 | 0.1858 ± 0.0016 | 0.1356 ± 0.0013 | 0.2284 ± 0.0036 | 0.1813 ± 0.0016 | 0.1392 ± 0.0010 | 0.2236 ± 0.0034 | 0.1793 ± 0.0021 | 0.1297 ± 0.0015 |
| BiBTM | 0.2143 ± 0.0000 | 0.1604 ± 0.0000 | 0.1216 ± 0.0000 | 0.2143 ± 0.0000 | 0.1604 ± 0.0000 | 0.1216 ± 0.0000 | 0.2143 ± 0.0000 | 0.1604 ± 0.0000 | 0.1216 ± 0.0000 |
| CF-AVG | 0.2324 ± 0.0000 | 0.1933 ± 0.0000 | 0.1476 ± 0.0000 | 0.2324 ± 0.0000 | 0.1933 ± 0.0000 | 0.1476 ± 0.0000 | 0.2324 ± 0.0000 | 0.1933 ± 0.0000 | 0.1476 ± 0.0000 |
| CF-LM | 0.2320 ± 0.0000 | 0.1936 ± 0.0000 | 0.1481 ± 0.0000 | 0.2320 ± 0.0000 | 0.1936 ± 0.0000 | 0.1481 ± 0.0000 | 0.2320 ± 0.0000 | 0.1936 ± 0.0000 | 0.1481 ± 0.0000 |
| COM | 0.2197 ± 0.0008 | 0.1731 ± 0.0010 | 0.1289 ± 0.0008 | 0.2194 ± 0.0011 | 0.1746 ± 0.0012 | 0.1295 ± 0.0007 | 0.2197 ± 0.0011 | 0.1745 ± 0.0008 | 0.1316 ± 0.0005 |
| PTM(a) | 0.2320 ± 0.0032 | 0.1835 ± 0.0027 | 0.1346 ± 0.0013 | 0.2299 ± 0.0039 | 0.1819 ± 0.0019 | 0.1348 ± 0.0006 | 0.2241 ± 0.0032 | 0.1810 ± 0.0033 | 0.1326 ± 0.0007 |
| PTM(b) | 0.2475 ± 0.0029 | 0.1998 ± 0.0027 | 0.1497 ± 0.0009 | 0.2507 ± 0.0029 | 0.2039 ± 0.0020 | 0.1525 ± 0.0008 | 0.2533 ± 0.0024 | 0.2056 ± 0.0011 | 0.1528 ± 0.0009 |
| PTM(c) | 0.2385 ± 0.0041 | 0.1920 ± 0.0016 | 0.1414 ± 0.0008 | 0.2376 ± 0.0037 | 0.1880 ± 0.0020 | 0.1326 ± 0.0007 | 0.2313 ± 0.0039 | 0.1846 ± 0.0024 | 0.1398 ± 0.0006 |
| PTM(d) | 0.2486 ± 0.0023 | 0.2009 ± 0.0019 | 0.1497 ± 0.0006 | 0.2522 ± 0.0029 | 0.2040 ± 0.0022 | 0.1512 ± 0.0016 | 0.2528 ± 0.0027 | 0.2053 ± 0.0011 | 0.1531 ± 0.0008 |

We run all models 10 times and report the mean ± standard deviation. Improvements of PTM(b), PTM(c), and PTM(d) over LinkLDA are all significant ($p < 0.01$) based on 2-tailed paired t-test.
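The P@N values reported in the tables follow Equation (18), averaged over all test prescriptions. A minimal Python sketch of that computation is given below. The function names and the toy rankings are illustrative assumptions, and the handling of an empty ranking (returning 0.0) is a convention chosen here, not something the paper specifies.

```python
def precision_at_n(ranked_items, true_items, n):
    """Share of the top-n ranked items that occur in the true prescription."""
    top = ranked_items[:n]
    if not top:
        return 0.0  # assumed convention for an empty recommendation list
    return len(set(top) & set(true_items)) / len(top)


def average_precision_at_n(cases, n):
    """Mean Precision@N over (ranked_items, true_items) pairs, one per prescription."""
    return sum(precision_at_n(ranked, truth, n) for ranked, truth in cases) / len(cases)


if __name__ == "__main__":
    # Toy rankings with herb names from the paper; the rankings are made up.
    cases = [
        (["Chinese Angelica", "Clove", "Poria"], {"Chinese Angelica", "Ginseng"}),
        (["Nutmeg", "Ginseng", "Clove"], {"Nutmeg", "Ginseng"}),
    ]
    print(average_precision_at_n(cases, n=2))
```

Because the denominator is the size of the recommendation list rather than the size of the true prescription, P@N is bounded by the number of true items divided by N, which is one reason scores fall as N grows.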

TABLE 3
Symptoms Precision@N (P@N) of Each Model with Different K (the Number of Topics) and N

| Model | K=20, P@5 | K=20, P@10 | K=20, P@20 | K=30, P@5 | K=30, P@10 | K=30, P@20 | K=40, P@5 | K=40, P@10 | K=40, P@20 |
|---|---|---|---|---|---|---|---|---|---|
| ATM | 0.0742 ± 0.0006 | 0.0522 ± 0.0004 | 0.0375 ± 0.0002 | 0.0738 ± 0.0009 | 0.0520 ± 0.0005 | 0.0367 ± 0.0001 | 0.0738 ± 0.0008 | 0.0519 ± 0.0006 | 0.0363 ± 0.0001 |
| LinkLDA | 0.1062 ± 0.0009 | 0.0719 ± 0.0006 | 0.0464 ± 0.0002 | 0.1063 ± 0.0008 | 0.0715 ± 0.0004 | 0.0460 ± 0.0003 | 0.1063 ± 0.0011 | 0.0709 ± 0.0006 | 0.0458 ± 0.0001 |
| Block-LDA | 0.1027 ± 0.0019 | 0.0692 ± 0.0009 | 0.0453 ± 0.0002 | 0.1040 ± 0.0008 | 0.0693 ± 0.0008 | 0.0452 ± 0.0003 | 0.1038 ± 0.0014 | 0.0694 ± 0.0006 | 0.0456 ± 0.0005 |
| Link-PLSA-LDA | 0.1081 ± 0.0008 | 0.0723 ± 0.0005 | 0.0468 ± 0.0003 | 0.1080 ± 0.0015 | 0.0728 ± 0.0005 | 0.0469 ± 0.0002 | 0.1085 ± 0.0010 | 0.0725 ± 0.0005 | 0.0469 ± 0.0002 |
| BiBTM | 0.0750 ± 0.0000 | 0.0528 ± 0.0000 | 0.0371 ± 0.0000 | 0.0749 ± 0.0000 | 0.0528 ± 0.0000 | 0.0371 ± 0.0000 | 0.0749 ± 0.0000 | 0.0528 ± 0.0000 | 0.0371 ± 0.0000 |
| CF-AVG | 0.1050 ± 0.0000 | 0.0769 ± 0.0000 | 0.0514 ± 0.0000 | 0.1050 ± 0.0000 | 0.0769 ± 0.0000 | 0.0514 ± 0.0000 | 0.1050 ± 0.0000 | 0.0769 ± 0.0000 | 0.0514 ± 0.0000 |
| CF-LM | 0.0977 ± 0.0000 | 0.0716 ± 0.0000 | 0.0478 ± 0.0000 | 0.0977 ± 0.0000 | 0.0716 ± 0.0000 | 0.0478 ± 0.0000 | 0.0977 ± 0.0000 | 0.0716 ± 0.0000 | 0.0478 ± 0.0000 |
| COM | 0.0775 ± 0.0009 | 0.0597 ± 0.0005 | 0.0413 ± 0.0003 | 0.0849 ± 0.0013 | 0.0649 ± 0.0008 | 0.0437 ± 0.0004 | 0.0918 ± 0.0013 | 0.0681 ± 0.0008 | 0.0449 ± 0.0001 |
| PTM(a) | 0.1064 ± 0.0010 | 0.0717 ± 0.0006 | 0.0459 ± 0.0003 | 0.1071 ± 0.0016 | 0.0714 ± 0.0006 | 0.0463 ± 0.0003 | 0.1078 ± 0.0008 | 0.0717 ± 0.0006 | 0.0469 ± 0.0002 |
| PTM(b) | 0.0996 ± 0.0016 | 0.0697 ± 0.0006 | 0.0460 ± 0.0002 | 0.1026 ± 0.0011 | 0.0713 ± 0.0008 | 0.0471 ± 0.0004 | 0.1036 ± 0.0011 | 0.0722 ± 0.0008 | 0.0475 ± 0.0002 |
| PTM(c) | 0.1018 ± 0.0015 | 0.0705 ± 0.0005 | 0.0464 ± 0.0001 | 0.1038 ± 0.0011 | 0.0705 ± 0.0005 | 0.0467 ± 0.0003 | 0.1029 ± 0.0008 | 0.0707 ± 0.0003 | 0.0467 ± 0.0002 |
| PTM(d) | 0.0981 ± 0.0012 | 0.0694 ± 0.0005 | 0.0453 ± 0.0003 | 0.1005 ± 0.0013 | 0.0709 ± 0.0009 | 0.0460 ± 0.0003 | 0.1011 ± 0.0008 | 0.0718 ± 0.0007 | 0.0469 ± 0.0002 |

We run all models 10 times and report the mean ± standard deviation.
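The CF-AVG and CF-LM rows differ only in how per-member scores are combined into one group score: the average of the members' scores versus the smallest member score (least misery). The Python sketch below shows only that aggregation step, under the assumption that each group member's predicted scores have already been estimated. The user-based collaborative filtering step with Pearson similarity is not reproduced, and the function name, the treatment of a missing item as score 0.0, and the toy scores are illustrative assumptions.

```python
def aggregate_group_scores(member_scores, strategy="avg"):
    """Combine per-member item scores into one group score per item.

    member_scores: list of dicts mapping item -> predicted score, one dict
    per group member. An item a member has no score for counts as 0.0
    (an assumption made for this sketch).
    strategy: "avg" for averaging, "lm" for least misery (minimum).
    """
    if strategy not in ("avg", "lm"):
        raise ValueError("strategy must be 'avg' or 'lm'")
    items = set().union(*(scores.keys() for scores in member_scores))
    group = {}
    for item in items:
        values = [scores.get(item, 0.0) for scores in member_scores]
        group[item] = sum(values) / len(values) if strategy == "avg" else min(values)
    return group


def rank_items(group_scores):
    """Items sorted by descending group score, ties broken by name."""
    return sorted(group_scores, key=lambda item: (-group_scores[item], item))


if __name__ == "__main__":
    # Two "users" (e.g. symptoms) scoring two items (e.g. herbs); toy numbers.
    members = [{"A": 0.9, "B": 0.4}, {"A": 0.1, "B": 0.5}]
    print(rank_items(aggregate_group_scores(members, "avg")))
    print(rank_items(aggregate_group_scores(members, "lm")))
```

In the toy example the two strategies disagree: averaging prefers the item one member rates very highly, while least misery prefers the item no member rates poorly.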

5.3 Symptoms Suggestion
We compute the following conditional probability of a symptom given a set of test herbs

$$p(s \mid \vec{h}_p) = \frac{1}{N_{h_p}} \sum_{h_{pn} \in \vec{h}_p} p(s \mid h_{pn}). \tag{19}$$

The Precision@N for symptom recommendation is defined as

$$\mathrm{Precision@}N = \frac{|\{\text{top } N \text{ symptoms}\} \cap \{\text{true symptoms}\}|}{|\{\text{top } N \text{ symptoms}\}|}. \tag{20}$$

We also average the Precision@N of all testing prescriptions as the final P@N.

Table 3 presents symptoms Precision@N of each model with different K and N values. As in herbs recommendation, ATM and BiBTM do not perform well, and the reasons are similar. COM also neglects symptom correlations, so it cannot produce satisfactory results. CF-AVG and CF-LM perform well when N is large, since the conditional probability $p(s \mid h)$ can also highlight the most relevant symptoms of an herb. Link-PLSA-LDA performs better than LinkLDA, which shows the effect of extracted herb-symptom correspondences using herb efficacy knowledge. PTM(a) can perform better than LinkLDA when K increases ($p < 0.002$ at $K = 40$ and $N = 5$), which shows that herb roles are more helpful for a larger topic number in recommendation tasks. We notice that PTM(b), PTM(c) and PTM(d) have slightly lower perplexity than LinkLDA but cannot achieve higher symptoms Precision@5. They can, however, achieve higher Precision@N than LinkLDA when N increases, which means they can rank the true symptoms higher on average, but may not rank true symptoms in the top 5. Moreover, the Precision@N scores are low because symptoms are often few in a prescription.

5.4 Prescribing Patterns Discovery
We now evaluate topics learned from all 33,765 prescriptions by our model. We first qualitatively show some topics. Then we quantitatively evaluate learned topics by comparing them to TCM prior knowledge.

5.4.1 Qualitative Results
Table 4 presents three topics generated by several topic models with $K = 25$. We show top 10 symptoms on the left

TABLE 4
Example Topics Learned by Several Topic Models with K = 25

| Model | Blood-regulating: symptom | Blood-regulating: herb | Nourishing heart and tranquilizing mind: symptom | Nourishing heart and tranquilizing mind: herb | Harmonizing intestines and stomach: symptom | Harmonizing intestines and stomach: herb |
|---|---|---|---|---|---|---|
| ATM | oppression in the chest | Semen Trichosanthis | abdominal pain | Longtube Groundivy Herb | spontaneous sweating | Folium Hibisci Mutabilis |
| | aversion to cold | Mercury Oxidum | amnesia | Fluorite | abdominal fullness | Amber |
| | stomachache | Calculus Equi | measles | Nardostachys Root | chronic shank ulcer | Ardisia Japonica |
| | profuse spittle | Terminalia chebula Retz | palpitation | Bamboo Shavings | bloody stool | Snakegourd Root |
| | hyperopia | Folium Phyllostach Lophatheri | vomiting | Emblic Leafflower Fruit | stomach reflux | Coffea Arabica |
| | waggling tongue | Serissa Serissoides | infantile malnutrition | Radix Boehmeriae | borborigmus | Officinal Magnolia Flower |
| | hypermenorrhea | Pharbitis Seed | arthralgia | Motherwort Herb | dizziness | Chives |
| | palpitations below the heart | Hibiscus Mutabilis | metrorrhagia | Lotus Leaf | retention of the lochia | Fermented Soybean |
| | postpartum metrorrhagia | Air Potato | indigestion | Radix Aconiti Kusnezoffii | greenish complexion | Rumex Japonicus |
| | vomiting | Pilose Antler | tremor of feet | Foeniculum Vulgare | rigidity of limbs | Fruit of Sharpleaf Calangal |
| LinkLDA | epistaxis | Chinese Angelica | palpitation | Common Yam Rhizome | vomiting | Common Aucklandia Root |
| | bloody stool | Paeonia Veitchii | amnesia | Dodder Seed | nausea | Clove |
| | hemafecia | Red Peony Root | deafness | Eucommia Bark | borborigmus | Fructus Amomi Rotundus |
| | hemoptysis | Liquorice Root | lumbago | Chinese Magnoliavine Fruit | stomach reflux | Chinese Eaglewood Wood |
| | hematuria | Paeonia Suffruticosa | frequent urination | Asiatic Cornelian Cherry Fruit | acid swallow | Foeniculum Vulgare |
| | dizziness | Unprocessed Rehmannia Root | night sweating | Achyranthes Bidentata | tenesmus | Nutmeg |
| | heaviness of head | Debark Peony Root | enuresis | Desertliving Cistanche | abdomen cold | Medicine Terminalia Fruit |
| | hypermenorrhea | Tree Peony Root Bark | dreamfulness | Prepared Rehmannia Root | dysphagia | Villous Amomum Fruit |
| | hematemesis | Cattail Pollen | infertility | Barbary Wolfberry Fruit | abdominal pain | Cablin Patchouli Herb |
| | infertility | Colla Corii Asini | dizziness | Pilose Antler | spasm | Cardamon Fruit |
| Block-LDA | hematemesis | Paeonia Veitchii | amnesia | Milkwort Root | vomiting | Dried Tangerine Peel |
| | bloody stool | Chinese Angelica | lumbago | Achyranthes Bidentata | acid swallow | Officinal Magnolia Bark |
| | epistaxis | Red Peony Root | dizziness | Eucommia Bark | nausea | Villous Amomum Fruit |
| | limbs pain | Unprocessed Rehmannia Root | palpitation | Common Yam Rhizome | epigastric upset | Massa Medicata Fermentata |
| | hematuria | Liquorice Root | night sweating | Chinese Magnoliavine Fruit | belching | Atractylodes Rhizome |
| | hemoptysis | Debark Peony Root | frequent urination | Dodder Seed | dysphagia | Nutgrass Galingale Rhizome |
| | retention of the lochia | Paeonia Suffruticosa | enuresis | Prepared Rehmannia Root | diarrhea | Cablin Patchouli Herb |
| | hemafecia | Tree Peony Root Bark | dreamfulness | Asiatic Cornelian Cherry Fruit | anorexia | Hawthorn Fruit |
| | retention of placenta | Cattail Pollen | fatigue | Desertliving Cistanche | hiccup | Green Tangerine peel |
| | yellow sweat | Sichuan Lovage Rhizome | deafness | Dendrobium | stomach reflux | Pinellia Tuber |
| Link-PLSA-LDA | white vaginal discharge | Chinese Angelica | dizziness | Dwarf Lilyturf Tuber | vomiting | Common Aucklandia Root |
| | red and white vaginal discharge | Debark Peony Root | palpitation | Milkwort Root | abdominal pain | Clove |
| | hematemesis | Sichuan Lovage Rhizome | amnesia | Common Yam Rhizome | nausea | Fructus Amomi Rotundus |
| | threatened abortion | Paeonia Veitchii | dreaminess | Salvia Root | borborygmus | Chinese Eaglewood Wood |
| | tidal fever | Paeonia Suffruticosa | vertigo | Tangshen | regurgitation | Nutmeg |
| | infertility | Tree Peony Root Bark | oppression in chest | Chinese Angelica | acid regurgitation | Villous Amomum Fruit |
| | vaginal bleeding during pregnancy | Nutgrass Galingale Rhizome | vexation | Chinese Magnoliavine Fruit | dysphagia | Cablin Patchouli Herb |
| | hypochondriac pain | Unprocessed Rehmannia Root | insomnia | Grassleaf Sweetflag Rhizome | hiccup | Foeniculum Vulgare |
| | flooding and spotting | Prepared Rehmannia Root | fatigue | Spine Date Seed | abdomen cold | Medicine Terminalia Fruit |
| | bloody stool | Colla Corii Asini | night sweating | Debark Peony Root | stomachache | Cardamon Fruit |
| PTM(a) | hematemesis | Chinese Angelica | dizziness | Milkwort Root | abdominal pain | Common Aucklandia Root |
| | epistaxis | Paeonia Veitchii | palpitation | Chinese Magnoliavine Fruit | vomiting | Clove |
| | hemafecia | Liquorice Root | amnesia | Common Yam Rhizome | nausea | Fructus Amomi Rotundus |
| | hematuria | Paeonia Suffruticosa | lumbago | Eucommia Bark | borborygmus | Chinese Eaglewood Wood |
| | bloody stool | Unprocessed Rehmannia Root | deafness | Achyranthes Bidentata | spasm | Cablin Patchouli Herb |
| | hemoptysis | Debark Peony Root | dreaminess | Dodder Seed | diarrhea | Foeniculum Vulgare |
| | flooding and spotting | Colla Corii Asini | anorexia | Cornus Officinalis | vomiting and diarrhea | Nutmeg |
| | menorrhagia | Radix Ophiopogonis | fatigue | Grassleaf Sweetflag Rhizome | regurgitation | Villous Amomum Fruit |
| | shortage of qi | Eriobotrya Japonica | vertigo | Desertliving Cistanche | abdomen cold | Officinal Magnolia Bark |
| | glossorrhagia | Tree Peony Root Bark | frequent urination | Chinese Arborvitae kernel | acid regurgitation | Common Floweringqince Fruit |
| PTM(b) | hematemesis | Golden Thread | amnesia | Milkwort Root | vomiting | Fructus Amomi Rotundus |
| | tidal fever | Liquorice Root | blurred vision | Lightyellow Sophora Root | abdominal pain | Clove |
| | night sweating | Radix Bupleuri | dizziness | Liquorice Root | nausea | Common Aucklandia Root |
| | bloody stool | Turtle Carapace | palpitation | Poria | borborigmus | Liquorice Root |
| | infantile malnutrition | Figwortflower Picrorhiza Rhizome | vexation | Chinese Angelica | acid regurgitation | Ginseng |
| | epistaxis | Chinese Angelica | insomnia | Divaricate Saposhnikovia Root | spasm | Officinal Magnolia Bark |
| | emaciation | Areca Seed | dreaminess | Ginseng | abdomen cold | White Atractylodes Rhizome |
| | abdominal pain | Rangooncreeper Fruit | dysphoria | Spine Date Seed | regurgitation | Fresh Ginger |
| | indigestion | Common Aucklandia Root | weep | Fleeceflower Root | abdominal fullness | Radix Aconiti Lateralis Preparata |
| | flooding and spotting | Massa Medicata Fermentata | headache | Chrysanthemum Flower | bitter taste in mouth | Dried Ginger |
| PTM(c) | abdominal pain | Chinese Angelica | lumbago | Chinese Magnoliavine Fruit | abdominal pain | Common Aucklandia Root |
| | hematemesis | Debark Peony Root | deafness | Milkwort Root | borborigmus | Fructus Amomi Rotundus |
| | red and white vaginal discharge | Sichuan Lovage Rhizome | amnesia | Eucommia Bark | vomiting | Clove |
| | flooding and spotting | Colla Corii Asini | night sweating | Achyranthes Bidentata | nausea | Foeniculum Vulgare |
| | dystocia | Nutgrass Galingale Rhizome | shortness of breath | Dodder Seed | lumbago | Common Buried Tuber |
| | metrostaxis | Paeonia Veitchii | dizziness | Common Yam Rhizome | abdomen cold | Nutmeg |
| | white vaginal discharge | Argy Wormwood Leaf | blurred vision | Asiatic Cornelian Cherry Fruit | hiccup | Zedoray Rhizome |
| | metrorrhagia | Prepared Rehmannia Root | frequent urination | Grassleaf Sweetflag Rhizome | tenesmus | Chinese Eaglewood Wood |
| | threatened abortion | Cattail Pollen | spontaneous sweating | Desertliving Cistanche | acid regurgitation | Areca Seed |
| | vaginal bleeding during pregnancy | Motherwort Herb | infertility | Dendrobium | halitosis | Green Tangerine peel |
| PTM(d) | hematemesis | Oyster Shell | amnesia | Dodder Seed | vomiting | Fructus Amomi Rotundus |
| | abdominal pain | Chinese Angelica | lumbago | Poria | abdominal pain | Ginseng |
| | bloody stool | Bone Fossil of Big Mammals | night sweating | Milkwort Root | borborygmus | Dried Ginger |
| | white vaginal discharge | Garden Burnet Root | dizziness | Achyranthes Bidentata | nausea | White Atractylodes Rhizome |
| | night sweating | Liquorice Root | deafness | Chinese Magnoliavine Fruit | acid regurgitation | Liquorice Root |
| | metrorrhagia | Red Halloysite | palpitation | Desertliving Cistanche | reversal cold of hands and feet | Radix Aconiti Lateralis Preparata |
| | red and white vaginal discharge | Colla Corii Asini | white vaginal discharge | Chinese Angelica | spasm | Officinal Magnolia Bark |
| | metrostaxis | Golden Thread | blurred vision | Pilose Antler | abdomen cold | Fresh Ginger |
| | tenesmus | Dried Ginger | infertility | Eucommia Bark | abdominal fullness | Common Aucklandia Root |
| | epistaxis | Debark Peony Root | frequent urination | Ginseng | hiccup | Poria |

We show top 10 symptoms (left) and top 10 herbs (right). Symptoms italicized and marked in red do not appear in other symptoms' syndrome categories. Herbs italicized and marked in red could not treat the top 10 symptoms. We manually labeled the topic names.
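The herb lists for the PTM variants are ranked by the topic-level herb probability $p(h \mid k) = \sum_{p,x} \phi_{kxh}\, \pi_{pkx}\, \theta_{pk} / \sum_{p'} \theta_{p'k}$. A minimal Python sketch of that marginalization follows, reading $\theta_{pk}$ as the topic weight of prescription $p$, $\pi_{pkx}$ as its role distribution under topic $k$, and $\phi_{kxh}$ as the herb distribution of topic $k$ and role $x$. The nested-list layout, the function names, and the toy parameter values are illustrative assumptions and do not come from the paper's code.

```python
def herb_given_topic(k, phi, pi, theta):
    """Return p(h | k) for every herb h as a list.

    phi[k][x][h]: herb distribution of topic k and role x.
    pi[p][k][x]:  role distribution of prescription p under topic k.
    theta[p][k]:  topic weight of prescription p.
    """
    topic_mass = sum(theta_p[k] for theta_p in theta)  # sum over p' of theta[p'][k]
    n_herbs = len(phi[k][0])
    probs = [0.0] * n_herbs
    for p, theta_p in enumerate(theta):
        weight_p = theta_p[k] / topic_mass  # p(p | k)
        for x, role_prob in enumerate(pi[p][k]):
            for h in range(n_herbs):
                probs[h] += phi[k][x][h] * role_prob * weight_p
    return probs


def top_herbs(k, phi, pi, theta, herb_names, n):
    """Names of the n most probable herbs under topic k."""
    probs = herb_given_topic(k, phi, pi, theta)
    order = sorted(range(len(probs)), key=lambda h: (-probs[h], h))
    return [herb_names[h] for h in order[:n]]


if __name__ == "__main__":
    # One topic, two roles, three herbs, two prescriptions; toy numbers only.
    phi = [[[0.8, 0.2, 0.0], [0.0, 0.4, 0.6]]]
    pi = [[[1.0, 0.0]], [[0.5, 0.5]]]
    theta = [[1.0], [1.0]]
    names = ["Chinese Angelica", "Liquorice Root", "Colla Corii Asini"]
    print(top_herbs(0, phi, pi, theta, names, n=2))
```

Sorting a single role's row `phi[k][x]` instead of the marginal would give per-role herb lists rather than one list per topic.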

and top 10 herbs on the right (the probability of an herb given a topic for PTM is $p(h \mid k) = \sum_{p,x} p(h \mid k, x)\, p(x \mid p, k)\, p(p \mid k) = \sum_{p,x} \phi_{kxh}\, \pi_{pkx}\, \theta_{pk} / \sum_{p'} \theta_{p'k}$). We do not present topics of COM and BiBTM because all the K topics in each model are basically the same. Symptoms italicized and marked in red do not appear in other topical symptoms' syndrome categories in [41]. Herbs italicized and marked in red could not treat the top 10 symptoms (validated by TCM MeSH symptom-herb correspondences in Section 4.3). Note that for PTM's topics that are not discovered by the baseline models, we try to find the best possible matches from the topics of the baseline models.

The first topic is about blood-related symptoms and their corresponding blood-regulating herbs. We can see that: (1) ATM could not find a good topic. On the left, only hypermenorrhea and postpartum metrorrhagia are blood-related symptoms. On the right, none of the ten herbs can treat the ten symptoms on the left. (2) LinkLDA finds a much better topic. On the left, seven symptoms are blood-related symptoms. On the right, Chinese Angelica can treat hypermenorrhea. Red Peony Root, Paeonia Suffruticosa, Unprocessed Rehmannia Root and Tree Peony Root Bark can treat hematemesis. Cattail Pollen can treat hematemesis and bloody stool. Colla Corii Asini can treat hematemesis, bloody stool and hematuria. (3) Block-LDA finds six blood-related symptoms and five correct herbs. (4) Link-PLSA-LDA can find coherent symptoms and accurate herbs. Debark Peony Root can treat hypochondriac pain and flooding and spotting. Nutgrass Galingale Rhizome can treat threatened abortion and flooding and spotting. Prepared Rehmannia Root can treat tidal fever, infertility and flooding and spotting. (5) PTM(a) finds coherent symptoms; Eriobotrya Japonica can treat hematemesis and hemoptysis. (6) PTM(b) finds coherent symptoms and nine correct herbs. Golden Thread and Figwortflower Picrorhiza Rhizome can treat hematemesis, tidal fever and night sweating. Liquorice Root and Common Aucklandia Root can treat abdominal pain. Radix Bupleuri can treat infantile malnutrition. Turtle Carapace can treat tidal fever. Areca Seed can treat abdominal pain and indigestion. Rangooncreeper Fruit can treat infantile malnutrition and abdominal pain. (7) PTM(c) finds nine blood-related symptoms, and nine herbs are correct. Sichuan Lovage Rhizome can treat abdominal pain. Argy Wormwood Leaf can treat flooding and spotting, hematemesis and threatened abortion. Cattail Pollen can treat flooding and spotting, hematemesis and abdominal pain. Motherwort Herb can treat vaginal bleeding during pregnancy, dystocia and abdominal pain. (8) PTM(d) finds nine correct herbs. Oyster Shell and Bone Fossil of Big Mammals can treat night sweating. Garden Burnet Root can treat red and white vaginal discharge, white vaginal discharge, hematemesis and bloody stool. Red Halloysite can treat bloody stool.

The second topic is about "nourishing heart and tranquilizing mind". We can find that: (1) ATM again could not find a good topic. On the left, only amnesia and palpitation are mental symptoms. On the right, only Fluorite can treat the mental symptom palpitation. (2) LinkLDA finds a much better topic

TABLE 5
Example Topic Roles Learned by PTM(a) with K = 25

Blood-regulating
Symptoms | Role 0 | Role 1 | Role 2 | Role 3
hematemesis | Chinese Angelica | Paeonia Suffruticosa | Liquorice Root | Colla Corii Asini
epistaxis | Paeonia Veitchii | Tree Peony Root Bark | Eriobotrya Japonica | Chinese Angelica
hemafecia | Liquorice Root | Chinese Angelica | Loquat Leaf | Unprocessed Rehmannia Root
hematuria | Debark Peony Root | Paeonia Veitchii | Radix Ophiopogonis | Cattail Pollen
bloody stool | Red Peony Root | Unprocessed Rehmannia Root | Ginseng | Debark Peony Root
hemoptysis | Caulis Akebiae | Prepared Rehmannia Root | Bamboo Shavings | Garden Burnet Root
flooding and spotting | Radix Ophiopogonis | Golden Thread | Reed Rhizome | Sophora Flower
menorrhagia | Beautiful Sweetgum Resin | Radix Ophiopogonis | Unprocessed Rehmannia Root | Panax Notoginseng
shortage of qi | Lotus Rhizome Node | Red Peony Root | Pyrus Bretschneideri | Chinese Arborvitae Twig and Leaf
glossorrhagia | Orange Fruit | Liquorice Root | Egg | India Madder Root

Nourishing heart and tranquilizing mind
Symptoms | Role 0 | Role 1 | Role 2 | Role 3
dizziness | Desertliving Cistanche | Common Yam Rhizome | Milkwort Root | Milkvetch Root
palpitation | Dodder Seed | Eucommia Bark | Chinese Magnoliavine Fruit | Deer Horn
amnesia | Achyranthes Bidentata | Asiatic Cornelian Cherry Fruit | Grassleaf Sweetflag Rhizome | Fleeceflower Root
lumbago | Pilose Antler | Achyranthes Bidentata | Chinese Arborvitae Kernel | Tangshen
deafness | Dendrobium | Oriental Waterplantain Rhizome | Spine Date Seed | Prepared Rehmannia Root
dreaminess | Eucommia Bark | Chinese Magnoliavine Fruit | Salvia Root | Barbary Wolfberry Fruit
anorexia | Morinda Root | Prepared Rehmannia Root | Dwarf Lilyturf Tuber | Dodder Seed
fatigue | Asiatic Cornelian Cherry Fruit | Gordon Euryale Seed | Poria | Ligustrum Lucidum
vertigo | Chinese Magnoliavine Fruit | Radix Codonopsis | Dimocarpus Longan | Deer-Horn Glue
frequent urination | Palmleaf Raspberry Fruit | Malaytea Scurfpea Fruit | Arillus Longan | Glossy Privet Fruit

We show top 10 symptoms (left) and top 10 herbs of each role (right). Herbs italicized and marked in red could not treat the top 10 symptoms.
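
The role labels discussed for Table 5 rest on a simple count: for each herb in a role's top list, how many of the topic's top symptoms it can treat. The following is a minimal sketch of that count, not the authors' code. The function names are hypothetical, and the `TREATS` mapping is a deliberately partial stand-in for the TCM MeSH symptom-herb correspondences, filled only with herb-symptom statements that appear in the qualitative discussion of the blood-related topic.

```python
from typing import Dict, List, Set

# Top 10 symptoms of the "Blood-regulating" topic, copied from Table 5.
TOP_SYMPTOMS: List[str] = [
    "hematemesis", "epistaxis", "hemafecia", "hematuria", "bloody stool",
    "hemoptysis", "flooding and spotting", "menorrhagia", "shortage of qi",
    "glossorrhagia",
]

# Partial herb -> treatable-symptoms mapping (assumption: a real evaluation
# would load the complete TCM MeSH correspondences instead).
TREATS: Dict[str, Set[str]] = {
    "Colla Corii Asini": {"hematemesis", "bloody stool", "hematuria"},
    "Cattail Pollen": {"hematemesis", "bloody stool"},
    "Garden Burnet Root": {
        "red and white vaginal discharge", "white vaginal discharge",
        "hematemesis", "bloody stool",
    },
}


def symptoms_treated(herb: str, top_symptoms: List[str],
                     treats: Dict[str, Set[str]]) -> int:
    """Number of top symptoms that `herb` can treat (0 for unknown herbs)."""
    return len(treats.get(herb, set()) & set(top_symptoms))


def role_coverage(role_herbs: List[str], top_symptoms: List[str],
                  treats: Dict[str, Set[str]], min_symptoms: int = 1) -> float:
    """Fraction of a role's herbs that treat at least `min_symptoms` top symptoms."""
    hits = sum(
        1 for herb in role_herbs
        if symptoms_treated(herb, top_symptoms, treats) >= min_symptoms
    )
    return hits / len(role_herbs)


# Three of the ten Role 3 herbs in Table 5, chosen only because the text
# states which symptoms they treat.
ROLE_3_SUBSET = ["Colla Corii Asini", "Cattail Pollen", "Garden Burnet Root"]

if __name__ == "__main__":
    for herb in ROLE_3_SUBSET:
        print(herb, symptoms_treated(herb, TOP_SYMPTOMS, TREATS))
    print(role_coverage(ROLE_3_SUBSET, TOP_SYMPTOMS, TREATS, min_symptoms=1))
    print(role_coverage(ROLE_3_SUBSET, TOP_SYMPTOMS, TREATS, min_symptoms=3))
```

Because the mapping is partial, the resulting fractions illustrate the procedure only and are not the values reported in the paper. A role whose herbs cover most of the top symptoms is a natural candidate for the jun (emperor) label.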

again. On the left, nine symptoms are mental symptoms except infertility. On the right, Dodder Seed can treat enuresis. Eucommia Bark can treat dizziness and lumbago. Chinese Magnoliavine Fruit can treat palpitation, night sweating and enuresis. Asiatic Cornelian Cherry Fruit can treat dizziness, deafness, frequent urination and enuresis. Desertliving Cistanche can treat lumbago and infertility. Prepared Rehmannia Root can treat palpitation, deafness, night sweating and infertility. Barbary Wolfberry Fruit can treat dizziness. Pilose Antler can treat deafness and infertility. (3). Block-LDA finds coherent symptoms and seven correct herbs. Milkwort Root can treat amnesia and dreaminess. (4). Link-PLSA-LDA finds a good topic. Dwarf Lilyturf Tuber and Salvia Root can treat vexation. Tangshen and Chinese Angelica can treat palpitation. Milkwort Root can treat amnesia and dreaminess. Grassleaf Sweetflag Rhizome can treat amnesia. Spine Date Seed can treat dreaminess and night sweating. Debark Peony Root can treat night sweating. (5). PTM(a) finds coherent symptoms and seven correct herbs. Cornus Officinalis can treat dizziness, deafness and frequent urination. Chinese Arborvitae Kernel can treat palpitation and amnesia. (6). PTM(b) finds nine correct herbs. Liquorice Root and Fleeceflower Root can treat palpitation. Poria can treat amnesia and palpitation. Divaricate Saposhnikovia Root can treat headache. Ginseng can treat dysphoria. Chrysanthemum Flower can treat blurred vision and headache. (7). All ten symptoms found by PTM(c) are mental symptoms and seven herbs are correct. (8). PTM(d) finds ten mental symptoms and eight correct herbs. Pilose Antler can treat deafness and infertility.

The third topic presents intestines and stomach-related symptoms and herbs for “Harmonizing intestines and stomach”. We can note that: (1). ATM still finds a poor topic. On the left, only abdominal fullness, stomach reflux and borborygmus are intestines and stomach-related symptoms. On the right, none of the ten herbs can treat the ten symptoms on the left. (2). LinkLDA still shows its superiority to ATM. On the left, eight symptoms are intestines and stomach-related symptoms. On the right, Common Aucklandia Root can treat abdominal pain, vomiting, borborygmus and tenesmus. Clove, Fructus Amomi Rotundus and Chinese Eaglewood Wood can treat vomiting and abdomen cold. Foeniculum Vulgare can treat abdominal pain, abdomen cold and vomiting. Nutmeg and Cardamon Fruit can treat vomiting. Villous Amomum Fruit can treat abdominal pain, vomiting and nausea. Cablin Patchouli Herb can treat abdominal pain and vomiting. (3). Block-LDA finds coherent symptoms and six correct herbs. Officinal Magnolia Bark and Atractylodes Rhizome can treat anorexia. Nutgrass Galingale Rhizome can treat acid regurgitation and belching. Pinellia Tuber can treat vomiting and stomach reflux. (4). Link-PLSA-LDA performs very well on symptoms and makes one mistake on herbs. (5). PTM(a) finds nine correct symptoms and nine correct herbs. Common Floweringquince Fruit can treat spasm. (6). PTM(b) also performs well on symptoms. On the right, Fresh Ginger can treat vomiting. Radix Aconiti Lateralis Preparata can treat spasm. Dried Ginger can treat vomiting and abdomen cold. (7). PTM(c) finds ten intestines and stomach-related symptoms and eight correct herbs. Green Tangerine Peel can treat abdominal pain. (8). PTM(d) finds nine intestines and stomach-related symptoms and eight correct herbs. Poria can treat vomiting.

From the three topics, we observe that our prescription topic model could find topics that reflect TCM prescribing patterns well. After incorporating herb compatibility and herb efficacy knowledge, the patterns discovery capability can be improved as shown in PTM(b), PTM(c), PTM(d) and Link-PLSA-LDA topics.

Table 5 shows four roles’ top herbs of two topics generated by PTM(a). In the “Blood-regulating” topic, we can see that all ten herbs of Role 3 can treat at least one of the symptoms, and we find seven of the ten herbs can treat at least 3 symptoms of the top ten symptoms. Because Role 3 treats main symptoms of the syndrome, we can label it as jun

(emperor). We may also label Role 1 as chen (minister) and Role 0 as zuo (assistant). Because Role 2 has the fewest corresponding herbs and Liquorice Root is the most widely used shi (courier) herb, we can label Role 2 as shi (courier). For the “Nourishing heart and tranquilizing mind” topic, we may label Role 2 as jun (emperor), Role 3 as chen (minister), Role 0 as zuo (assistant) and Role 1 as shi (courier) in a similar way.

5.4.2 Quantitative Results
We now quantitatively evaluate learned topics by comparing to TCM prior knowledge. We want to determine: (1) whether the symptoms under a topic are closely related and can represent a certain syndrome, and (2) how many herbs in a topic can treat the corresponding symptoms.

Topic Symptom Coherence. We define topic symptom coherence to measure topics’ symptom quality. Let $N$ be the number of top symptoms of a topic $k$, $s_{ki}$ be a symptom in the top $N$ symptoms, $c$ be any syndrome category in [41], and $S_c$ be the symptoms in $c$. For each pair of top symptoms $s_{ki}$ and $s_{kj}$, we check if they co-occur in any category $c$. Formally, the symptom coherence of topic $k$ is given as

$$\mathrm{coherence}(k) = \sum_{j=2}^{N} \sum_{i=1}^{j-1} \mathbb{1}\{\exists c : s_{ki} \in S_c \wedge s_{kj} \in S_c\} \qquad (21)$$

Fig. 8 shows average topic symptom coherence of several topic models with different number of topics and $N = 10$. For COM, we treat symptoms as users and herbs as items. We can observe that ATM could not find coherent symptoms, and the coherence scores are much lower than others. Block-LDA and COM perform similarly to LinkLDA, and they can find reasonable symptom topics. PTM(a) also performs similarly to LinkLDA; the difference is not obvious. Despite the significant difference in predictive perplexity, the symptoms extracted by the two models remain almost the same, at least for the top 10 symptoms. This result is similar to the discovery in [51] that lower perplexity may not enhance the interpretability of inferred topics. Link-PLSA-LDA, PTM(b), PTM(c) and PTM(d) find more coherent symptoms than others, which shows that using herb-herb links and herb efficacy knowledge can improve the coherence of symptoms.

Fig. 8. Average topic symptom coherence of several topic models with different K (the number of topics) and top 10 symptoms. We run all models 10 times and report the mean ± standard deviation of all topics’ average coherence for each model. PTM(d) significantly outperforms LinkLDA (p < 0.05) based on 2-tailed paired t-test.

Topic Herb Precision. We compute the herb precision to measure topics’ herb quality. The herb precision is defined as follows: if an herb $h_{ki}$ in the top $N$ herbs of topic $k$ can treat a symptom $s_{kj}$ in the top $N$ symptoms $S_k$ of $k$ (validated by TCM MeSH symptom-herb correspondences in Section 4.3), we label the herb $h_{ki}$ as a correct herb, and the herb precision is the proportion of correct herbs in the top $N$ herbs. Formally, the herb precision of topic $k$ is given as

$$\mathrm{precision}(k) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{\exists s_{kj} : s_{kj} \in S_k \wedge h_{ki} \text{ treats } s_{kj}\} \qquad (22)$$

Table 6 presents average topic herb precision of several topic models with different number of topics and $N = 10$. We can see that most herbs learned by LinkLDA, Block-LDA, Link-PLSA-LDA and PTM can be validated, while ATM and COM could not find accurate herbs. We think the reasons are similar to those in the predictive tasks, and we also note that COM tends to find the most commonly used herbs in all topics because common herbs are selected by most symptom groups. PTM(a) performs similarly to LinkLDA, which is also similar to the discovery in [51]. PTM(b), PTM(c) and PTM(d) find more accurate herbs than others and the improvements are significant (p < 0.01), which shows that using herb compatibility and herb efficacy knowledge can also improve the precision of herbs.

5.5 Discussion
From experimental results, we can conclude that ATM is not suitable for TCM prescriptions modeling, and it shows the worst performances on generalization performance, herbs/symptoms recommendation and treatment patterns discovery. Group recommendation methods such as CF-AVG, CF-LM and COM perform well on herbs/symptoms recommendation tasks, but could not discover meaningful

TABLE 6
Average Topic Herb Precision of Several Topic Models with Different K (the Number of Topics) and Top 10 Symptoms/Herbs

Model | K = 5 | K = 10 | K = 15 | K = 20 | K = 25 | K = 30 | K = 35 | K = 40
ATM | 0.236 ± 0.058 | 0.252 ± 0.042 | 0.234 ± 0.032 | 0.228 ± 0.024 | 0.217 ± 0.025 | 0.233 ± 0.015 | 0.237 ± 0.014 | 0.231 ± 0.034
LinkLDA | 0.750 ± 0.044 | 0.693 ± 0.211 | 0.660 ± 0.032 | 0.647 ± 0.025 | 0.622 ± 0.031 | 0.639 ± 0.023 | 0.624 ± 0.026 | 0.606 ± 0.025
Block-LDA | 0.710 ± 0.050 | 0.652 ± 0.034 | 0.621 ± 0.034 | 0.575 ± 0.019 | 0.589 ± 0.045 | 0.582 ± 0.027 | 0.595 ± 0.027 | 0.584 ± 0.027
Link-PLSA-LDA | 0.778 ± 0.048 | 0.711 ± 0.031 | 0.677 ± 0.029 | 0.690 ± 0.027 | 0.696 ± 0.019 | 0.701 ± 0.021 | 0.693 ± 0.018 | 0.678 ± 0.013
COM | 0.574 ± 0.038 | 0.461 ± 0.042 | 0.419 ± 0.028 | 0.421 ± 0.032 | 0.406 ± 0.015 | 0.393 ± 0.023 | 0.378 ± 0.022 | 0.382 ± 0.017
PTM(a) | 0.774 ± 0.040 | 0.710 ± 0.049 | 0.647 ± 0.026 | 0.618 ± 0.030 | 0.597 ± 0.026 | 0.615 ± 0.025 | 0.593 ± 0.019 | 0.579 ± 0.024
PTM(b) | 0.836 ± 0.042 | 0.781 ± 0.034 | 0.749 ± 0.031 | 0.713 ± 0.009 | 0.699 ± 0.026 | 0.701 ± 0.022 | 0.684 ± 0.019 | 0.670 ± 0.017
PTM(c) | 0.864 ± 0.034 | 0.817 ± 0.026 | 0.807 ± 0.025 | 0.820 ± 0.013 | 0.808 ± 0.023 | 0.808 ± 0.012 | 0.801 ± 0.010 | 0.800 ± 0.013
PTM(d) | 0.866 ± 0.044 | 0.817 ± 0.028 | 0.803 ± 0.018 | 0.770 ± 0.015 | 0.790 ± 0.021 | 0.780 ± 0.025 | 0.763 ± 0.011 | 0.770 ± 0.022

We run all models 10 times and report the mean ± standard deviation of all topics’ average precision for each model. PTM(b), PTM(c), and PTM(d) significantly outperform others (p < 0.01) based on 2-tailed paired t-test.
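
The two topic-quality measures of Section 5.4.2 and the mean ± standard deviation reporting used for Table 6 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, `categories` is a toy stand-in for the syndrome categories of [41], `treats` is a partial stand-in for the TCM MeSH correspondences (using three herb-symptom statements from the qualitative discussion), the per-run scores are made up, and the use of the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List, Sequence, Set, Tuple


def symptom_coherence(top_symptoms: Sequence[str],
                      categories: Sequence[Set[str]]) -> int:
    """Eq. (21): count top-symptom pairs that co-occur in at least one category."""
    return sum(
        1
        for s_i, s_j in combinations(top_symptoms, 2)
        if any(s_i in cat and s_j in cat for cat in categories)
    )


def herb_precision(top_herbs: Sequence[str], top_symptoms: Sequence[str],
                   treats: Dict[str, Set[str]]) -> float:
    """Eq. (22): share of top herbs that treat at least one top symptom."""
    symptoms = set(top_symptoms)
    correct = sum(1 for herb in top_herbs if treats.get(herb, set()) & symptoms)
    return correct / len(top_herbs)


def summarize(run_scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs (assumed estimator)."""
    return mean(run_scores), stdev(run_scores)


# Toy inputs (hypothetical categories; herb facts taken from the discussion).
categories = [
    {"hematemesis", "epistaxis", "bloody stool"},
    {"dizziness", "palpitation"},
]
top_symptoms = ["hematemesis", "bloody stool", "dizziness", "palpitation"]
treats = {
    "Cattail Pollen": {"hematemesis", "bloody stool"},
    "Eucommia Bark": {"dizziness", "lumbago"},
    "Divaricate Saposhnikovia Root": {"headache"},
}
top_herbs = ["Cattail Pollen", "Eucommia Bark", "Divaricate Saposhnikovia Root"]

if __name__ == "__main__":
    print(symptom_coherence(top_symptoms, categories))
    print(herb_precision(top_herbs, top_symptoms, treats))
    print(summarize([0.8, 0.7, 0.9]))  # made-up scores from three runs
```

Since Eq. (21) is a sum over unordered pairs, the coherence of a topic ranges from 0 to N(N − 1)/2, which is 45 for N = 10, whereas the precision of Eq. (22) always lies in [0, 1].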

treatment patterns. LinkLDA performs relatively well on all four tasks. Block-LDA and BiBTM generally do not improve on LinkLDA because they model herb/symptoms pairs outside training prescriptions and may ignore the original prescription structures. By considering herb roles, PTM(a) can obtain better generalization and herbs/symptoms recommendation performance, but the treatment patterns discovery capabilities are not improved. Nevertheless, PTM(a) could infer herb roles in a prescription, and herb roles inference in each prescription is another interesting problem to explore. By incorporating herb compatibility, PTM(b) further gains better performances on all four tasks. By incorporating herb efficacy knowledge, Link-PLSA-LDA and PTM(c) also gain better performances on all four tasks. PTM(d) generally achieves the best results on all tasks because it considers both herb compatibility and herb efficacy knowledge. These results demonstrate that it is necessary to consider TCM background in TCM data analysis, and this work can be a promising start for incorporating domain knowledge into prescription topic modeling.

6 CONCLUSION AND FUTURE WORK
This paper presented a novel topic model for TCM prescriptions. It characterizes the generative process of prescriptions in TCM theories. Using 33,765 prescriptions, this model can discover the prescribing patterns in TCM. Furthermore, it can outperform several previous methods on recommending herbs for a list of symptoms and predicting symptoms for a prescription. The method is helpful for clinical research and practice.

In future work, we plan to incorporate more prescription information such as usage, form and herbal dosage, and more domain knowledge such as symptoms’ syndrome category as prior knowledge into our model. Evaluating the herb roles inferred by our model is another interesting problem that we are going to investigate.

ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China (No. 61572434), the China Knowledge Centre for Engineering Sciences and Technology (No. CKCEST-2017-1-3), the Zhejiang Provincial Natural Science Foundation of China (No. LY14F020027), and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (20130101110136).

REFERENCES
[1] F. Cheung, “TCM: Made in China,” Nature, vol. 480, no. 7378, pp. S82–S83, 2011.
[2] J. Qiu, “Traditional medicine: A culture in the balance,” Nature, vol. 448, no. 7150, pp. 126–128, 2007.
[3] H. Peng, Dictionary of Traditional Chinese Medicine Prescriptions. Beijing: People Health Press, 1996, [in Chinese, ISBN: 7117018879].
[4] X. Zhou, et al., “Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support,” Artif. Intell. Med., vol. 48, no. 2, pp. 139–152, 2010.
[5] H. Yang, et al., “New drug R&D of traditional Chinese medicine: Role of data mining approaches,” J. Biol. Syst., vol. 17, no. 03, pp. 329–347, 2009.
[6] S. Li, B. Zhang, D. Jiang, Y. Wei, and N. Zhang, “Herb network construction and co-module analysis for uncovering the combination rule of traditional Chinese herbal formulae,” BMC Bioinf., vol. 11, no. 11, 2010, Art. no. 1.
[7] S. K. Poon, et al., “A novel approach in discovering significant interactions from TCM patient prescription data,” Int. J. Data Mining Bioinf., vol. 5, no. 4, pp. 353–368, 2011.
[8] N. L. Zhang, R. Zhang, and T. Chen, “Discovery of regularities in the use of herbs in traditional Chinese medicine prescriptions,” in New Frontiers in Applied Data Mining. Berlin, Germany: Springer, 2012, pp. 353–360.
[9] P. He, K. Deng, Z. Liu, D. Liu, J. S. Liu, and Z. Geng, “Discovering herbal functional groups of traditional Chinese medicine,” Statist. Med., vol. 31, no. 7, pp. 636–642, 2012.
[10] L. Yao, Y. Zhang, and B. Wei, “An evolution system for traditional Chinese medicine prescription,” in Knowledge Engineering and Management. Berlin, Germany: Springer, 2014, pp. 95–106.
[11] G. Zheng, M. Jiang, C. Lu, and A. Lu, “Prescription analysis and mining,” in Data Analytics for Traditional Chinese Medicine Research. Berlin, Germany: Springer, 2014, pp. 97–109.
[12] X. Zhang, X. Zhou, H. Huang, S. Chen, and B. Liu, “A hierarchical symptom-herb topic model for analyzing traditional Chinese medicine clinical diabetic data,” in Proc. 3rd Int. Conf. Biomed. Eng. Inf., 2010, pp. 2246–2249.
[13] X. Zhang, X. Zhou, H. Huang, Q. Feng, S. Chen, and B. Liu, “Topic model for Chinese medicine diagnosis and prescription regularities analysis: Case on diabetes,” Chin. J. Integrative Med., vol. 17, pp. 307–313, 2011.
[14] Z. Jiang, X. Zhou, X. Zhang, and S. Chen, “Using link topic model to analyze traditional Chinese medicine clinical symptom-herb regularities,” in Proc. IEEE 14th Int. Conf. E-Health Netw., Appl. Serv., 2012, pp. 15–18.
[15] L. Yao, et al., “Discovering treatment pattern in traditional Chinese medicine clinical cases by exploiting supervised topic model and domain knowledge,” J. Biomed. Inf., vol. 58, pp. 260–267, 2015.
[16] Z. Deng, Formulae of Chinese Medicine. Hong Kong: China Press of Traditional Chinese Medicine, 2008, [in Chinese, ISBN: 9787532384761].
[17] Y. Huang, et al., “Exploring the rules of li-fa-fang-yao on diabetes mellitus within traditional Chinese medicine through text mining,” in Proc. 7th Int. Conf. Comput. Convergence Technol., 2012, pp. 1369–1373.
[18] S. Wang, et al., “Compatibility art of traditional Chinese medicine: From the perspective of herb pairs,” J. Ethnopharmacology, vol. 143, no. 2, pp. 412–423, 2012.
[19] D. M. Blei, “Probabilistic topic models,” Commun. ACM, vol. 55, no. 4, pp. 77–84, 2012.
[20] L. Fei-Fei and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2005, pp. 524–531.
[21] E. Bart, M. Welling, and P. Perona, “Unsupervised organization of image collections: Taxonomies and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2302–2315, Nov. 2011.
[22] Z. Huang, W. Dong, P. Bath, L. Ji, and H. Duan, “On mining latent treatment patterns from electronic medical records,” Data Mining Knowl. Discovery, vol. 29, no. 4, pp. 914–949, 2015.
[23] D. Andrzejewski, X. Zhu, and M. Craven, “Incorporating domain knowledge into topic modeling via Dirichlet forest priors,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 25–32.
[24] D. Andrzejewski, X. Zhu, M. Craven, and B. Recht, “A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic,” in Proc. 22nd Int. Joint Conf. Artif. Intell., vol. 22, no. 1, 2011, Art. no. 1171.
[25] R. Balasubramanyan, B. Dalvi, and W. W. Cohen, “From topic models to semi-supervised learning: Biasing mixed-membership models to exploit topic-indicative features in entity clustering,” in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2013, pp. 628–642.
[26] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[27] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[28] J. K. Pritchard, M. Stephens, and P. Donnelly, “Inference of population structure using multilocus genotype data,” Genetics, vol. 155, no. 2, pp. 945–959, 2000.
[29] X. Chen, T. He, X. Hu, Y. An, and X. Wu, “Inferring functional groups from microbial gene catalogue with probabilistic topic models,” in Proc. IEEE Int. Conf. Bioinf. Biomed., 2011, pp. 3–9.
[30] A. Van Esbroeck, C.-C. Chia, and Z. Syed, “Heart rate topic models,” in Proc. 26th AAAI Conf. Artif. Intell., 2012, pp. 1635–1641.

[31] Z. Huang, W. Dong, L. Ji, C. He, and H. Duan, “Incorporating comorbidities into latent treatment pattern mining for clinical pathways,” J. Biomed. Inform., vol. 59, pp. 227–239, 2016.
[32] I. Yoo, et al., “Data mining in healthcare and biomedicine: A survey of the literature,” J. Med. Syst., vol. 36, no. 4, pp. 2431–2448, 2012.
[33] N. Esfandiari, M. R. Babavalian, A.-M. E. Moghadam, and V. K. Tabar, “Knowledge discovery in medicine: Current issue and future trend,” Expert Syst. Appl., vol. 41, no. 9, pp. 4434–4463, 2014.
[34] Y. Feng, Z. Wu, X. Zhou, Z. Zhou, and W. Fan, “Knowledge discovery in traditional Chinese medicine: State of the art and perspectives,” Artif. Intell. Med., vol. 38, no. 3, pp. 219–236, 2006.
[35] S. Lukman, Y. He, and S.-C. Hui, “Computational methods for traditional Chinese medicine: A survey,” Comput. Methods Programs Biomed., vol. 88, no. 3, pp. 283–294, 2007.
[36] B. Liu, et al., “Data processing and analysis in real-world traditional Chinese medicine clinical data: Challenges and approaches,” Statist. Med., vol. 31, no. 7, pp. 653–660, 2012.
[37] G.-Z. Li and B.-Y. Liu, “Big data is essential for further development of integrative medicine,” Chin. J. Integrative Med., vol. 21, pp. 323–331, 2015.
[38] N. L. Zhang, S. Yuan, T. Chen, and Y. Wang, “Latent tree models and diagnosis in traditional Chinese medicine,” Artif. Intell. Med., vol. 42, no. 3, pp. 229–245, 2008.
[39] E. Erosheva, S. Fienberg, and J. Lafferty, “Mixed-membership models of scientific publications,” Proc. Nat. Acad. Sci. USA, vol. 101, no. suppl 1, pp. 5220–5227, 2004.
[40] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proc. 20th Conf. Uncertainty Artif. Intell., 2004, pp. 487–494.
[41] N. Yao, J. Zhu, and R. Gao, Traditional Chinese Medicine Symptoms Differential Diagnosis. Beijing: People Health Press, 2013, [in Chinese, ISBN: 9787117036139].
[42] L. Wu, Chin. Traditional Medicine and Materia Medical Subject Headings. Beijing: Chinese Medical Ancient Books Publishing, 1996, [in Chinese, ISBN: 9787801743596].
[43] R. Balasubramanyan and W. W. Cohen, “Block-LDA: Jointly modeling entity-annotated text and entity-entity links,” in Proc. SIAM Int. Conf. Data Mining, 2011, pp. 450–461.
[44] R. Nallapati and W. W. Cohen, “Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs,” in Proc. Int. Conf. Weblogs Social Media, 2008, pp. 84–92.
[45] S. Amer-Yahia, S. B. Roy, A. Chawla, G. Das, and C. Yu, “Group recommendation: Semantics and efficiency,” Proc. VLDB Endowment, vol. 2, no. 1, pp. 754–765, 2009.
[46] Q. Yuan, G. Cong, and C.-Y. Lin, “COM: A generative model for group recommendation,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 163–172.
[47] T. Wu, G. Qi, H. Wang, K. Xu, and X. Cui, “Cross-lingual taxonomy alignment with bilingual biterm topic model,” in Proc. AAAI Conf. Artif. Intell., 2016, pp. 287–293.
[48] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, “Evaluation methods for topic models,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 1105–1112.
[49] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh, “On smoothing and inference for topic models,” in Proc. 25th Conf. Uncertainty Artif. Intell., 2009, pp. 27–34.
[50] C. Archambeau, B. Lakshminarayanan, and G. Bouchard, “Latent IBP compound Dirichlet allocation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 2, pp. 321–333, Feb. 2015.
[51] J. Chang, S. Gerrish, C. Wang, J. L. Boyd-Graber, and D. M. Blei, “Reading tea leaves: How humans interpret topic models,” in Advances Neural Inf. Process. Syst., 2009, pp. 288–296.

Liang Yao received the BE degree from the College of Computer Science, Sichuan University, Chengdu, China, in 2012. He is currently working toward the PhD degree at the College of Computer Science and Technology, Zhejiang University. His current research interests include data mining, medical informatics, natural language processing, topic models, and probabilistic graphical models.

Yin Zhang received the PhD degree in computer science from Zhejiang University, in 1999. Currently, she is an associate professor with the College of Computer Science, Zhejiang University. Her research interests mainly include data mining and knowledge discovery, medical informatics, multimedia information processing, pattern recognition, and knowledge engineering.

Baogang Wei received the PhD degree from Northwestern Polytechnical University, China, in 1997. He is currently a professor with Zhejiang University, China. His main research interests include artificial intelligence, pattern recognition, image processing, machine learning, digital library, and information & knowledge management. Since 1999, he has been a member of the Chinese Association for Artificial Intelligence. So far, he has published more than 50 papers in international journals including the IEEE Transactions on Knowledge and Data Engineering and IEEE Transactions on Visualization and Computer Graphics, and conference proceedings including AAAI, CIKM, and PAKDD.

Wenjin Zhang received the graduate degree from the School of Medicine, Zhejiang University, in 1990 and received the PhD degree in 2005. He is now an associate chief physician in the Hepatopancreatobiliary Surgery Department of the First Affiliated Hospital of Zhejiang University. In addition to clinical work, he is also engaged in medical research involving experimental science and data analysis by IT technology.

Zhe Jin received the BE degree from the College of Computer Science and Technology, Zhejiang University, Hangzhou, China, in 2015, where he is currently a PhD candidate at the same college. His current research interests include text mining and semantic networks.
