Research trends in text mining: Semantic network and main path analysis of
selected journals
Hoon Jung a, Bong Gyou Lee b,*
a Hana Institute of Finance, Seoul 07321, South Korea
b Yonsei University, Seoul 03722, South Korea
ARTICLE INFO

Keywords:
Text mining
Research trends
Keywords
Semantic network analysis
Main path analysis

ABSTRACT

In this study, network and main path analyses were conducted on 1856 studies related to text mining, by extracting keywords and citation information from the text of each paper. Our findings indicate that research papers on text mining have been published in 45 academic disciplines in the 1980s and 1990s, 105 disciplines in the 2000s, and 171 disciplines in the 2010s. The results show that using text mining as a research topic and method has rapidly increased. We also demonstrate that the main theme of text mining research is discourse and content analysis in the 1980s and 1990s, biology and data mining in the 2000s, and medicine and advanced text mining in the 2010s. Moreover, we examined the main citation path for text mining studies and suggest that the main focus of text mining studies has evolved from information science to information systems and technology management. Additionally, influential papers have been recently published in fields such as architecture and social ecology revealing the wide scope of text mining. This article presents an understanding of previously unexplored research trends in text mining and how these trends shed light on the most influential academic papers in the field.
1. Introduction

Text mining is a technique for extracting meaningful information from data in text form. The targets of text mining range from academic literature to social networking sites, posts and comments about the news, voice of the customer, speech to text (STT) data, and more. Text mining is also actively used as a means of analyzing research trends in various fields of study, such as information systems, technology management, education, library and information science, psychology, sociology, and others. However, there has not been enough study on the research trends of text mining itself, in spite of the fact that text mining is being utilized in a variety of fields of study. Therefore, when researchers need to find papers with academic importance and contribution on text mining, they primarily rely on the reputation of the journal in which the paper is published and the number of citations the paper has received. The purpose of this study is to identify which papers make a significant academic contribution and what the main research path of current text mining studies is, and to predict future research trends in text mining. To answer these questions, we analyzed 1856 papers about text mining stored in the international academic citation databases, Scopus and Web of Science. To find current trends in text mining, semantic network analysis and main path analysis are implemented as text mining methods in this paper.

2. Theoretical background

2.1. Text mining

“Text” in text mining is defined as a symbol stored in digital form. Images and video files are also defined as objects of text mining (Grimmer & Stewart, 2013). However, in this study, we consider text as data expressed in characters only. Text mining finds new information in human character-based data by extracting context and meaning using natural language and document processing techniques. The typical process of text mining analysis begins with pre-processing the collected text data. Usually at this stage a morphological analysis is performed to sort sentences into parts of speech. The main keywords are extracted based on key topics and words that appear simultaneously in the same paragraphs or sentences. Then the characteristics and frequency of the words are defined and analyzed through a variety of text mining techniques, such as keyword network analysis, association analysis, opinion mining, topic modeling, emotion analysis, and others. Text mining can
* Corresponding author.
E-mail address: [email protected] (B.G. Lee).
https://doi.org/10.1016/j.eswa.2020.113851
Received 29 January 2020; Received in revised form 14 July 2020; Accepted 4 August 2020
Available online 11 August 2020
0957-4174/© 2020 Elsevier Ltd. All rights reserved.
H. Jung and B.G. Lee Expert Systems With Applications 162 (2020) 113851
focus an analysis on areas of interest to the researcher, but the interpretation by the researcher can be arbitrary. Hence, the generalization of the results of text mining should be done sparingly. Conclusions drawn in certain studies may not be repeated in other studies (DiMaggio, Nag, & Blei, 2013). However, despite some of these limitations, text mining has been actively used as an analysis technique in many academic studies due to the advantages of analyzing large amounts of unstructured data that cannot be processed during traditional data classification and analysis.

2.2. Network analysis

Network analysis, often called “keyword network analysis” or “semantic network analysis”, interprets phenomena through the use of networks built by linking words that appear in the text to create a map of relationships. Network analysis describes the structured results from uncategorized data rather than fully categorized data. In other words, from a relational perspective, it is not an independent object but the relationships between the objects that help a situation or phenomenon to be understood more clearly. The network consists of nodes corresponding to keywords and lines or links indicating the relationship between them. In this study, we utilize Gephi 0.9.2 and VOSviewer 1.6.9 as the software for network analysis. Several measures are used to determine the centrality of a node: degree centrality, which measures how many nodes are directly connected; closeness centrality, which scores each node based on its closeness to all other nodes within the network; and betweenness centrality, which measures the number of times a node lies on the shortest path between other nodes. In this study, network nodes are visualized based on eigenvector centrality, which values the relative importance of nodes.

2.3. Main path analysis

Main path analysis establishes the important route among the citation relationships of papers. It is a technique that sheds light on the academic, behind-the-scenes relationship and the broadening path of knowledge by visualizing the use of citations between institutions. In this study, the main path is derived based on the search path count (SPC). The SPC is the total number of times that a link is traversed from the source to the end of the path in which a paper is cited. We used four types of main path analysis: forward local main path, backward local main path, global main path, and key-route main path analysis. Forward local main path and backward local main path analyses select and connect the link with the largest SPC at each contact point. The global main path analysis selects the path where the total sum of the SPCs is the largest, and the key-route main path analysis selects the path where the largest link is first and combines the important path forward and backward. The key-route main path search solves the problem of missing some routes and includes all important connections. We used the software Pajek 64–5.07a as the main path analysis tool.

2.4. Literature analysis

There have been several studies on research trends of academic fields such as information science (Lee, Kim, & Kim, 2010), education (Hung, 2012), machine learning (Sharma, Kumar, & Chand, 2018), biomedicine (Zhai et al., 2015), business intelligence (Moro, Carneiro, Cortez, & Rita, 2015), and medical informatics (Kim & Delen, 2018) using quantitative analysis on peer reviewed papers. All of these studies are based on text-mining techniques that examine papers in journal databases to reveal the research trends. Text mining-based research trend analysis has been conducted not only for specific fields of study but also for cross-disciplinary studies. Calero-Medina and Noyons (2008) analyzed all the studies having “absorptive capacity” as a keyword by text mining and revealed which academic fields used the concept of absorptive capacity. In the present study, we collected and analyzed the papers that contain “text mining” or “text analysis” in the title or author keywords to identify the types and trends of the academic fields in which text mining was utilized, the major research topics, and key papers concerning the research on text mining. As mentioned, even though text mining is being utilized in a variety of fields of study, few scholars have examined research trends of text mining itself. This study is apparently the first study to comprehensively analyze the research trends in text mining.

3. Analysis and classification of text mining studies

3.1. Scope of analysis

The data sources used in this study were Web of Science and Scopus. Moro et al. (2015) conducted searches with keywords such as “banking” and “Business Intelligence” to study the research trend of Business Intelligence in banking. If the authors judge subjectively whether a paper is a study on “Business Intelligence in banking” or not, it will weaken the reliability of the results. Moreover, if there are thousands of papers on Business Intelligence, it is almost impossible to analyze the relevance of each one. Hence, this study collected all the papers that contain the words “text mining” or “text analysis” in the title or author keywords through text mining techniques to select relevant journals, targeting all the English journals listed in Scopus for the past 40 years.

However, when searching for two consecutive words like “text mining” or “text analysis”, papers such as “Affect analysis of text using fuzzy semantic typing” (Subasic & Huettner, 2001), “A machine learning approach to sentiment analysis in multilingual web texts” (Boiy & Moens, 2009), and “Text and structural data mining of influenza mentions in web and social media” (Corley, Cook, Mikler, & Singh, 2010) would not be found. Therefore, we searched for studies which included “text” and “mining” or “text” and “analysis” in the title. We also considered searching for “content analysis” as a keyword phrase, which is typically used in the same way as “text analysis”, but when “content analysis” was used, the search results included a rhetorical or contextual study on lyrics, “A Content-based Analysis of Shahriar’s Azerbaijani Turkish Poem Getmə Tərsa Balası in Terms of Religious Images and Interpretations” (Mozaheb, Shahiditabar, Monfared, & Mirzapour, 2016), and qualitative research papers on specific topics, “The Rap on Chicano and Black Masculinity: A Content Analysis of Gender Images in Rap Lyrics” (Baker-Kimmons & McFarland, 2011) and “Experiences of living with dementia: Qualitative Content Analysis of semi-structured interviews” (Mazaheri et al., 2013). Hence, when both “text” and “mining” are included in the titles of literature, or both “text” and “analysis” are included, we considered them as meeting the criteria for this study. We also decided to analyze papers published since 1980 that received more than one citation by the end of December 2019. We categorized the selected journals into (1) the 1980s and 1990s, when text mining was rarely used; (2) the 2000s, when the internet and digital literature were spreading; and (3) the 2010s, when text mining was expanding in various disciplines and smart phones and social networking services became popular. The purpose of this analysis is to
and 1990s, have been pushed out of the top 10 since the 2000s.
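The selection rule from Section 3.1 (a title containing “text” together with “mining” or “analysis”, publication since 1980, and more than one citation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Paper` record, the function names, the substring-based matching, the 2019 upper bound on the publication year, and the citation counts in the usage lines are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for one bibliographic entry."""
    title: str
    year: int
    citations: int


def title_matches(title: str) -> bool:
    """True if the title contains "text" together with "mining" or "analysis".

    Substring matching is an assumption: it accepts forms such as "texts",
    but it would also accept unrelated words such as "context".
    """
    t = title.lower()
    return "text" in t and ("mining" in t or "analysis" in t)


def is_selected(paper: Paper) -> bool:
    """Title rule plus the year and citation thresholds described in Section 3.1."""
    return (
        title_matches(paper.title)
        and 1980 <= paper.year <= 2019   # upper bound assumed from the December 2019 cut-off
        and paper.citations > 1          # "more than one citation"
    )


def period_of(year: int) -> str:
    """Map a publication year to one of the three analysis periods."""
    if year < 2000:
        return "1980s-1990s"
    if year < 2010:
        return "2000s"
    return "2010s"


# Usage with titles quoted in Section 3.1 (citation counts are made up).
kept = Paper("Affect analysis of text using fuzzy semantic typing", 2001, 10)
dropped = Paper("The Rap on Chicano and Black Masculinity: "
                "A Content Analysis of Gender Images in Rap Lyrics", 2011, 10)
print(is_selected(kept), period_of(kept.year))
print(is_selected(dropped))
```

Matching on the two words separately, rather than on the phrase “text mining”, is what lets titles such as “Affect analysis of text using fuzzy semantic typing” through while still excluding the “content analysis” titles.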
network analysis for the keywords chosen by the author are limited in some ways, it is necessary to analyze the data at the paragraph level rather than the unstructured set of words. Therefore, we conducted an additional keyword network analysis based on the abstract for each paper. In addition, to exclude words that appear frequently in most papers regardless of their area of focus, we applied the “stop word” function of Python. Excluded words were: “article”, “result”, “information”, “system”, “study”, “research”, “knowledge”, “paper”, “data”, “document”, “approach”, “method”, “problem”, “system”, “tool”, “literature”, “task”, “technique”, “feature”, “design”, “language”, “structure”, “program”, “process”, “case”, “model”, “process”, “representation”, “word”, “time”, “year”, “work”, “use”, “term”, “search”, “author”, “field”, “level”, “area”, “development”, “source”, “concept”, “context”, “review”, “conclusion”, “group”, “help”, “quality”, “value”, “number”, “performance”, “student”, “science”, “test”, “aim”, “keyword”, “form”, “measure”, “report”, “type”, “processing”, “factor”, “effect”, “difference”, “content”, “amount”, “strategy”, “way”, “management”, “background”, “show”, “technology”, “question”, “measure”, “theme”, “challenge”, “interest”, “question”, “order”, “evaluation”, “service”, “world”.

We analyzed the text of abstracts for 1,856 papers and used Python to extract the frequency of keyword pairs appearing together within the
Fig. 8. Density visualization for abstract-based keywords in the 1980s and 1990s.
Fig. 9. Density visualization for abstract-based keywords in 2000s.
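The abstract-based extraction described above (excluding frequent words, then counting keyword pairs that appear together) could look roughly like the following Python sketch. It is an assumption that pairs are counted once per abstract. The tokenizer, the function names, and the three toy abstracts are hypothetical, and `EXCLUDED` holds only a few of the excluded words listed above. A real pipeline would also need part-of-speech filtering and a general stop word list, which are omitted here.

```python
import re
from collections import Counter
from itertools import combinations

# A small subset of the excluded words listed in the text.
EXCLUDED = {"article", "result", "information", "system", "study",
            "research", "knowledge", "paper", "data", "method"}


def keywords(abstract):
    """Return the set of lowercase word tokens in one abstract, minus excluded words."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return {t for t in tokens if t not in EXCLUDED}


def pair_counts(abstracts):
    """Count, for each unordered keyword pair, the number of abstracts containing both."""
    counts = Counter()
    for abstract in abstracts:
        # Sorting makes ("expression", "gene") and ("gene", "expression") the same key.
        for pair in combinations(sorted(keywords(abstract)), 2):
            counts[pair] += 1
    return counts


# Toy abstracts, invented for illustration only.
abstracts = [
    "Gene expression data",
    "Gene expression study",
    "Sentiment data",
]
counts = pair_counts(abstracts)
print(counts.most_common(3))
```

The resulting pair frequencies are the kind of weighted edge list that network tools take as input, with keywords as nodes and co-occurrence counts as link weights.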
Fig. 11. Year-by-year visualizations of the keyword network (from 1980s to 2010s).
Table 1
Key thesis of text mining based on main path analysis.
Author | Title | Academic Field
Tanabe (1999) | MedMiner: An Internet text-mining tool for biomedical information, with application to gene expression profiling | Biology
Kostoff et al. (2001) | Text mining using database tomography and bibliometrics: A review | Technology Assessment and Forecasting
Fattori, Pedrazzi, and Turra (2003) | Text mining applied to patent mapping: A practical business case | Intellectual Property Information
Yoon and Park (2004) | A text-mining-based patent network: Analytical tool for high-technology trend | High Technology Management
Tseng et al. (2007) | Text mining techniques for patent analysis | Information Management
Choudhary, Oluikpe, Harding, and Carrillo (2009) | The needs and benefits of text mining applications on post-project reviews | Information and Communication Technology
Ur-Rahman and Harding (2012) | Textual data mining for industrial knowledge management and text classification | Information Systems
He (2013) | Improving user experience with case-based reasoning systems using text mining and Web 2.0 | Information Systems
He et al. (2013) | Social media competitive analysis and text mining: A case study in the pizza industry | Information Management
Moro et al. (2015) | Business intelligence in banking: A literature analysis from 2002 to 2013 using text mining and latent Dirichlet allocation | Information Systems
Shravan Kumar and Ravi (2016) | A survey of the applications of text mining in financial domain | Information Systems
Aureli (2016) | Sustainability disclosure after a crisis: A text mining approach | Social Ecology
Kim, Park, Yun, and Yun (2017) | What makes tourists feel negatively about tourism destinations? Application of hybrid text mining methodology to smart destination management | Technology Assessment and Forecasting
Shen et al. (2017) | An integrated system of text mining technique and case-based reasoning for supporting green building design | Building Science
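As a rough illustration of the SPC weighting and the global main path described in Section 2.3, the sketch below counts, for each citation link, how many source-to-sink paths traverse it, and then picks the path whose SPC values sum to the largest total. The five-paper network and all names are hypothetical. This is not Pajek's implementation, and it assumes the citation network is acyclic.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical citation network: an edge (u, v) means paper v cites paper u,
# so knowledge flows from the older paper u to the newer paper v.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("C", "E"), ("D", "E")]

succ = defaultdict(list)   # paper -> papers that cite it
pred = defaultdict(list)   # paper -> papers it cites
for u, v in EDGES:
    succ[u].append(v)
    pred[v].append(u)
nodes = sorted({n for edge in EDGES for n in edge})
sources = [n for n in nodes if not pred[n]]   # papers that cite nothing in the set


@lru_cache(maxsize=None)
def paths_from_sources(node):
    """Number of distinct paths from any source paper to `node`."""
    if not pred[node]:
        return 1
    return sum(paths_from_sources(p) for p in pred[node])


@lru_cache(maxsize=None)
def paths_to_sinks(node):
    """Number of distinct paths from `node` to any sink paper."""
    if not succ[node]:
        return 1
    return sum(paths_to_sinks(s) for s in succ[node])


# SPC of a link = number of source-to-sink paths that traverse it.
spc = {(u, v): paths_from_sources(u) * paths_to_sinks(v) for u, v in EDGES}


@lru_cache(maxsize=None)
def best_from(node):
    """Return (total SPC, path) for the heaviest path from `node` to a sink."""
    if not succ[node]:
        return 0, (node,)
    return max(
        (spc[(node, s)] + best_from(s)[0], (node,) + best_from(s)[1])
        for s in succ[node]
    )


total, global_main_path = max(best_from(s) for s in sources)
print(spc)
print(total, global_main_path)
```

In this toy network the link from A to B lies on three of the five source-to-sink paths, so its SPC is 3. The local variants instead choose the largest-SPC link step by step from each node, which can produce ties that a full tool has to resolve or report.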
item as one.

In Figs. 8–10, we show the density visualization of keywords through VOSviewer, but no specific differentiated findings are identified compared to network visualization through Gephi. However, the overlay visualization feature of VOSviewer shows keyword trends by year and it can identify additional effects and characteristics of research trends. The overlay visualization visualizes word pairs that appeared together in text mining studies from 1980 to 2019. The keywords with a minimum number of concurrent appearances (more than 2) are visualized and the results are derived as shown in Figs. 11–13. In Figs. 11–13, the color of the node indicates the age of the study. The closer to yellow the node is, the more recent it is. On the other hand, the closer to blue the node is, the older it is, which means close to 1980. Fig. 11 is the result of an analysis of all the papers we targeted. Figs. 12 and 13 are also visualizations that highlight the 2000s and 2010s, respectively. In the 2000s, as we see in Fig. 12, the network is formed around “information retrieval”, “natural language processing”, and “document classification”. In Fig. 13, which highlights keywords in relatively recent papers in the 2010s, the latest research trends such as “big data”, “twitter”, “topic modeling”, and “sentiment analysis” are emphasized.

5. Main path analysis of text mining studies

We conducted a main path analysis on two citation databases. The targeted set of studies of the main path analysis is the same as the set used in the previous network analysis. The software used for the main path analysis was VOSviewer, Gephi, and Pajek. First, we used VOSviewer and Gephi to create citation network relationships between studies on text mining and utilized Pajek to calculate the SPC and visualize the main path. Since the paper’s publication year is displayed in the visualization results, we implemented the main path analysis both by separating the time periods and by combining the time periods (Fig. 14).

In this study, four types of main path analysis and visualization were performed: forward local main path, backward local main path, key-route local main path, and standard global main path. The final results of the main path analyses are shown in Figs. 15–18. From the analysis, it is possible to identify seminal papers on the citation path and to find academic fields in which text mining-related papers were published for each year. Moreover, we can establish the order and direction of the development of the main route. In Fig. 15, the forward local main path analysis, “Social media competitive analysis and text mining” (He, Zha, & Li, 2013), a study on social media and text mining published in the International Journal of Information Management, is followed by a paper on architecture, “An integrated system of text mining technique and case-based reasoning for supporting green building design” (Shen, HangYan, Ya, & Zhang, 2017). In Fig. 16, unlike the forward local main path, the path starting with Tanabe (1999) and Nasukawa (2001) is shown as an additional main path. Fig. 18, which shows the key-route local main path analysis that minimizes the omission of certain routes, indicates that He et al. (2013) has influenced various academic fields. Also, Moro, Cortez, and Rita (2015), which uses the latent Dirichlet allocation (LDA) technique to analyze the literature related to the banking industry’s business intelligence system, has been cited in various academic fields. In particular, “Text mining using database tomography and bibliometrics: A review” (Kostoff, Toothman, Eberhart, & Humenik, 2001) is a study of academic literature databases using text mining. This paper appears in the early stages of all four main path analysis results. Two papers which are studies on patent analysis, “A text-mining-based patent network: Analytical tool for high-technology trend” (Yoon & Park, 2004) and “Text mining techniques for patent analysis” (Tseng, Lin, & Lin, 2007), are also present in all four main path analysis results. Comparing the results of the main path analysis, significant studies on text mining were produced in the field of literature information (Kostoff et al., 2001) in the early 2000s, but later appeared in a number of studies related to technology management, patents, and information systems. Recently, the scope of text mining studies has expanded to the analysis of social media and other social phenomena. Other current seminal studies in the main path can be found in social ecology and architecture. We also show evidence that He et al. (2013) and Moro et al. (2015) authored important papers that have contributed to the broad spread of knowledge and played an important role in each field. In each of the above main path analyses, the papers occurring more than twice are shown in the order of the year of publication in Table 1.
These outcomes suggest that these papers are important studies in the main path of text mining studies. In most cases, the academic journals in which these papers are published are related to information systems or information management, technical management or technological management. The results reveal that papers with highly significant contributions in the field of text mining have been published in domains which were rarely related to text mining in the past. This effect corresponds to the overall expansion of the domain in which the paper was written. The results also show that most important papers use text mining as a tool of research rather than focusing on text mining itself, and this trend has recently increased.

6. Conclusion

We have analyzed studies on text mining from different time periods and derived research trends from the databases of peer-reviewed literature, Web of Science and Scopus. The results reveal that the number of academic fields where text mining is utilized has increased significantly and specifically identify in which areas of study text mining is being actively applied. In addition, we have extracted keywords occurring simultaneously from the abstracts of text mining papers to analyze network paths and identify the major keywords for each time period based on eigenvector centrality. Our findings indicate that conversational and speech-related keywords such as “discourse” and “speech” were prominent in the 1980s and 1990s, biomedical words like “gene” in the 2000s, and medical-related keywords such as “cancer” and keywords related to advanced analytical techniques such as “topic” and “algorithm” in the 2010s. In addition, we specifically demonstrate the changes in keywords by year, suggesting that research on big data, social media analysis, and emotion analysis (“big data,” “twitter,” “sentiment analysis”) is emerging as the latest research trend. Based on the results of keyword analysis of the abstracts of papers, we can expect that the academic fields publishing papers related to text mining will steadily expand in the future and new analysis techniques will also continue to be developed.

We also examined the main path of citation networks among 1,856 studies on text mining and presented evidence regarding influential authors and important contributing papers to the advancement of knowledge and development in the field of text mining. To sum up the results of the four main path analyses, the papers contributing to the academic development of text mining were produced in the information science literature until the early 2000s and in the information systems and technology management literature in the 2010s. In addition, recent important research concerning text mining has been published, surprisingly, in social ecology and architecture, and we observed the widespread use of text mining throughout various academic fields. We also highlight which studies have had an extensive impact on various academic fields. Recently it has been shown that text mining is increasingly used as a means of research rather than as the purpose of research. Based on the international databases of academic literature, we extracted and preprocessed citation and cited data between the studies. The contribution of this study is that it unearthed research trends on text mining from 1980 to the present and derives the implications of these trends by analyzing semantic networks and main paths within these networks. A future extension of this research would be to analyze research trends of text mining comparing “text mining as a means of study” with “text mining as a subject to study”.

CRediT authorship contribution statement

Hoon Jung: Visualization, Conceptualization. Bong Gyou Lee: Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eswa.2020.113851.

Appendix B

See Tables B1–B3
Table B1
Academic fields of journals that published papers about text mining in the 1980s and 1990s.
Academic Fields, Number of Papers, Percentage of Papers % Academic Fields, Number of Papers, Percentage of Papers %
Computer Science Artificial Intelligence 22 26.2 Audiology Speech Language Pathology 1 1.2
Computer Science Information Systems 20 23.8 Biochemistry Molecular Biology 1 1.2
Information Science Library Science 14 16.7 Biology 1 1.2
Computer Science Interdisciplinary Applications 13 15.5 Biotechnology Applied Microbiology 1 1.2
Engineering Electrical Electronic 12 14.3 Chemistry Analytical 1 1.2
Computer Science Theory Methods 9 10.7 Chemistry Multidisciplinary 1 1.2
Computer Science Software Engineering 8 9.5 Education Special 1 1.2
Medical Informatics 6 7.1 Engineering Industrial 1 1.2
Computer Science Hardware Architecture 5 6.0 Genetics Heredity 1 1.2
Health Care Sciences Services 5 6.0 Linguistics 1 1.2
Telecommunications 4 4.8 Management 1 1.2
Computer Science Cybernetics 3 3.6 Mathematical Computational Biology 1 1.2
Engineering Biomedical 3 3.6 Mathematics Applied 1 1.2
Operations Research Management Science 3 3.6 Mathematics Interdisciplinary Applications 1 1.2
Statistics Probability 3 3.6 Medicine General Internal 1 1.2
Biochemical Research Methods 2 2.4 Medicine Legal 1 1.2
Ergonomics 2 2.4 Physics Fluids Plasmas 1 1.2
History Philosophy of Science 2 2.4 Psychology 1 1.2
Multidisciplinary Sciences 2 2.4 Psychology Applied 1 1.2
Neurosciences 2 2.4 Psychology Experimental 1 1.2
Optics 2 2.4 Psychology Multidisciplinary 1 1.2
Social Sciences Mathematical Methods 2 2.4 Rehabilitation 1 1.2
Social Sciences Interdisciplinary 1 1.2
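Table B2
Academic fields of journals that published papers on text mining in the 2000s.
Academic Fields, Number of Papers, Percentage of Papers % Academic Fields, Number of Papers, Percentage of Papers %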
Computer Science Artificial Intelligence 138 30.2 Integrative Complementary Medicine 2 0.4
Computer Science Information Systems 104 22.8 Literary Theory Criticism 2 0.4
Computer Science Theory Methods 65 14.2 Literature Romance 2 0.4
Computer Science Interdisciplinary Applications 43 9.4 Medieval Renaissance Studies 2 0.4
Mathematical Computational Biology 37 8.1 Ophthalmology 2 0.4
Biochemical Research Methods 36 7.9 Pharmacology Pharmacy 2 0.4
Biotechnology Applied Microbiology 36 7.9 Philosophy 2 0.4
Information Science Library Science 31 6.8 Psychology Applied 2 0.4
Computer Science Software Engineering 30 6.6 Transportation 2 0.4
Language Linguistics 23 5.0 Agriculture Multidisciplinary 1 0.2
Linguistics 20 4.4 Asian Studies 1 0.2
Engineering Electrical Electronic 19 4.2 Chemistry Analytical 1 0.2
Religion 18 3.9 Chemistry Multidisciplinary 1 0.2
Statistics Probability 18 3.9 Developmental Biology 1 0.2
Social Sciences Interdisciplinary 15 3.3 Ecology 1 0.2
Genetics Heredity 12 2.6 Education Special 1 0.2
Biochemistry Molecular Biology 11 2.4 Electrochemistry 1 0.2
Literature 11 2.4 Emergency Medicine 1 0.2
Operations Research Management Science 11 2.4 Endocrinology Metabolism 1 0.2
Psychology Experimental 11 2.4 Energy Fuels 1 0.2
Medical Informatics 10 2.2 Engineering Aerospace 1 0.2
Communication 8 1.8 Engineering Civil 1 0.2
Health Care Sciences Services 8 1.8 Ethics 1 0.2
Sociology 8 1.8 Geriatrics Gerontology 1 0.2
Management 7 1.5 History Philosophy of Science 1 0.2
Education Educational Research 6 1.3 Horticulture 1 0.2
Psychology Educational 6 1.3 Humanities Multidisciplinary 1 0.2
Table B3
Academic fields of journals that published papers on text mining in the 2010s.
Academic Field, Number of Papers, Percentage of Papers % Academic Field, Number of Papers, Percentage of Papers %
Computer Science Artificial Intelligence 139 10.1 Nuclear Science Technology 4 0.3
Computer Science Information Systems 135 9.8 Nutrition Dietetics 4 0.3
Computer Science Interdisciplinary Applications 112 8.1 Obstetrics Gynecology 4 0.3
Mathematical Computational Biology 94 6.8 Optics 4 0.3
Education Educational Research 93 6.7 Psychology 4 0.3
Information Science Library Science 91 6.6 Psychology Developmental 4 0.3
Engineering Electrical Electronic 88 6.4 Public Administration 4 0.3
Linguistics 85 6.2 Transportation 4 0.3
Language Linguistics 83 6.0 Anthropology 3 0.2
Multidisciplinary Sciences 57 4.1 Archaeology 3 0.2
Medical Informatics 52 3.8 Audiology Speech Language Pathology 3 0.2
Biochemical Research Methods 51 3.7 Biodiversity Conservation 3 0.2
Computer Science Software Engineering 50 3.6 Engineering Environmental 3 0.2
Operations Research Management Science 48 3.5 Food Science Technology 3 0.2
Communication 47 3.4 Green Sustainable Science Technology 3 0.2
Biotechnology Applied Microbiology 45 3.3 Imaging Science Photographic Technology 3 0.2
Management 44 3.2 Literature Slavic 3 0.2
Computer Science Theory Methods 40 2.9 Physics Applied 3 0.2
Health Care Sciences Services 37 2.7 Plant Sciences 3 0.2
Public Environmental Occupational Health 36 2.6 Psychology Psychoanalysis 3 0.2
Humanities Multidisciplinary 34 2.5 Remote Sensing 3 0.2
Statistics Probability 34 2.5 Social Sciences Biomedical 3 0.2
Business 31 2.2 Surgery 3 0.2
Psychology Multidisciplinary 25 1.8 Toxicology 3 0.2
Engineering Multidisciplinary 24 1.7 Cardiac Cardiovascular Systems 2 0.1
Social Sciences Interdisciplinary 23 1.7 Chemistry Analytical 2 0.1
Political Science 20 1.4 Classics 2 0.1
Biochemistry Molecular Biology 19 1.4 Criminology Penology 2 0.1
Medicine General Internal 19 1.4 Dentistry Oral Surgery Medicine 2 0.1
Psychology Experimental 16 1.2 Education Special 2 0.1
Telecommunications 14 1.0 Energy Fuels 2 0.1
Computer Science Hardware Architecture 13 0.9 Ethics 2 0.1
Engineering Industrial 13 0.9 Ethnic Studies 2 0.1
Literature 13 0.9 Geography 2 0.1
Medicine Research Experimental 13 0.9 Geosciences Multidisciplinary 2 0.1
Chemistry Multidisciplinary 12 0.9 History of Social Sciences 2 0.1
Environmental Sciences 12 0.9 Infectious Diseases 2 0.1
Mathematics Interdisciplinary Applications 12 0.9 Mathematics 2 0.1
Psychology Educational 12 0.9 Meteorology Atmospheric Sciences 2 0.1
Automation Control Systems 11 0.8 Orthopedics 2 0.1
Economics 11 0.8 Philosophy 2 0.1
Health Policy Services 11 0.8 Physics Fluids Plasmas 2 0.1
Physics Multidisciplinary 11 0.8 Physiology 2 0.1
Education Scientific Disciplines 10 0.7 Psychology Mathematical 2 0.1
Genetics Heredity 10 0.7 Radiology Nuclear Medicine Medical Imaging 2 0.1
History 10 0.7 Rheumatology 2 0.1
Planning Development 10 0.7 Social Work 2 0.1
Psychiatry 10 0.7 Theater 2 0.1
Psychology Clinical 10 0.7 Veterinary Sciences 2 0.1
Engineering Civil 9 0.7 Water Resources 2 0.1
Integrative Complementary Medicine 9 0.7 Agriculture Dairy Animal Science 1 0.1
Pharmacology Pharmacy 9 0.7 Anatomy Morphology 1 0.1
Neurosciences 8 0.6 Area Studies 1 0.1
Acoustics 7 0.5 Behavioral Sciences 1 0.1
Biology 7 0.5 Chemistry Applied 1 0.1
Business Finance 7 0.5 Clinical Neurology 1 0.1
Computer Science Cybernetics 7 0.5 Demography 1 0.1
Environmental Studies 7 0.5 Ecology 1 0.1
Law 7 0.5 Electrochemistry 1 0.1
Literature Romance 7 0.5 Endocrinology Metabolism 1 0.1
Nursing 7 0.5 Engineering Aerospace 1 0.1
Oncology 7 0.5 Engineering Mechanical 1 0.1
Psychology Applied 7 0.5 Family Studies 1 0.1
Psychology Social 7 0.5 Film Radio Television 1 0.1
Religion 7 0.5 Folklore 1 0.1
Sociology 7 0.5 Forestry 1 0.1
Chemistry Medicinal 6 0.4 Geography Physical 1 0.1
Construction Building Technology 6 0.4 Literature German Dutch Scandinavian 1 0.1
Ergonomics 6 0.4 Mathematics Applied 1 0.1
Instruments Instrumentation 6 0.4 Medical Laboratory Technology 1 0.1
International Relations 6 0.4 Medieval Renaissance Studies 1 0.1
Primary Health Care 6 0.4 Metallurgy Metallurgical Engineering 1 0.1
Rehabilitation 6 0.4 Microbiology 1 0.1
Respiratory System 6 0.4 Music 1 0.1
H. Jung and B.G. Lee Expert Systems With Applications 162 (2020) 113851
Table B3 (continued)
Academic Field, Number of Papers, Percentage of Papers % Academic Field, Number of Papers, Percentage of Papers %
References

Aureli, S. (2016). Sustainability disclosure after a crisis: A text mining approach. International Journal of Social Ecology and Sustainable Development, 7(1), 35–49.
Baker-Kimmons, L., & McFarland, P. (2011). The rap on Chicano and black masculinity: A content analysis of gender images in rap lyrics. Race, Gender & Class, 18(1/2), 331–344.
Boiy, E., & Moens, M.-F. (2009). A machine learning approach to sentiment analysis in multilingual Web texts. Information Retrieval, 12(5), 526–558.
Calero-Medina, C., & Noyons, E. C. M. (2008). Combining mapping and citation network analysis for a better understanding of the scientific development: The case of the absorptive capacity field. Journal of Informetrics, 2(4), 272–279.
Choudhary, A. K., Oluikpe, P. I., Harding, J. A., & Carrillo, P. M. (2009). The needs and benefits of Text Mining applications on Post-Project Reviews. Computers in Industry, 60(9), 728–740.
Corley, C. D., Cook, D. J., Mikler, A. R., & Singh, K. P. (2010). Text and structural data mining of influenza mentions in web and social media. International Journal of Environmental Research and Public Health, 7, 596–615.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics, 41(6), 570–606.
Fattori, M., Pedrazzi, G., & Turra, R. (2003). Text mining applied to patent mapping: A practical business case. World Patent Information, 25(4), 335–342.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21, 267–297.
He, W. (2013). Improving user experience with case-based reasoning systems using text mining and Web 2.0. Expert Systems with Applications, 40(2,1), 500–507.
He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464–472.
Hung, J. (2012). Trends of e-learning research from 2000 to 2008: Use of text mining and bibliometrics. British Journal of Educational Technology, 43(1), 5–16.
Kim, Y. M., & Delen, D. (2018). Medical informatics research trend analysis: A text mining approach. Health Informatics Journal, 24(4), 432–452.
Kim, K., Park, O.-J., Yun, S., & Yun, H. (2017). What makes tourists feel negatively about tourism destinations? Application of hybrid text mining methodology to smart destination management. Technological Forecasting and Social Change, 123, 362–369.
Kostoff, R. N., Toothman, D. R., Eberhart, H. J., & Humenik, J. A. (2001). Text mining using database tomography and bibliometrics: A review. Technological Forecasting and Social Change, 68(3), 223–253.
Lee, J. Y., Kim, H., & Kim, P. J. (2010). Domain analysis with text mining: Analysis of digital library research trends using profiling methods. Journal of Information Science, 36, 144–161.
Mazaheri, M., Eriksson, L. E., Heikkilä, K., Nasrabadi, A. N., Ekman, S.-L., & Sunvisson, H. (2013). Experiences of living with dementia: Qualitative content analysis of semi-structured interviews. Journal of Clinical Nursing, 22, 3032–3041.
Moro, S. M. C., Cortez, P. A. R., & Rita, P. M. R. F. (2015). Business intelligence in banking: A literature analysis from 2002 to 2013 using text mining and latent Dirichlet allocation. Expert Systems with Applications, 42(3), 1314–1324.
Mozaheb, M. A., Shahiditabar, M., Monfared, A., & Mirzapour, F. (2016). A content-based analysis of Shahriar’s Azerbaijani Turkish poem Getmə Tərsa Balası (A Christian Child) in terms of religious images and interpretations. International Journal of Applied Linguistics & English Literature, 5(2), 159–163.
Sharma, D., Kumar, B., & Chand, S. (2018). Trend analysis in machine learning research using text mining. International Conference on Advances in Computing, Communication Control and Networking, 136–141.
Shen, L., HangYan, H. F., Ya, W.u., & Zhang, Y.u. (2017). An integrated system of text mining technique and case-based reasoning (TM-CBR) for supporting green building design. Building and Environment, 124(1), 388–401.
Shravan Kumar, B., & Ravi, V. (2016). A survey of the applications of text mining in financial domain. Knowledge-Based Systems, 114(15), 128–147.
Subasic, P., & Huettner, A. (2001). Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, 9(4), 483–496.
Tanabe, L. (1999). MedMiner: An Internet text-mining tool for biomedical information, with application to gene expression profiling. BioTechniques, 27(6), 1210–1217.
Tseng, Y.-H., Lin, C.-J., & Lin, Y.-I. (2007). Text mining techniques for patent analysis. Information Processing & Management, 43(5), 1216–1247.
Ur-Rahman, N., & Harding, J. A. (2012). Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Systems with Applications, 39(5), 4729–4739.
Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. The Journal of High Technology Management Research, 15(1), 37–50.
Zhai, X., Li, Z., Gao, K., Huang, Y., Lin, L., & Wang, L. (2015). Research status and trend analysis of global biomedical text mining studies in recent 10 years. Scientometrics, 105, 509–523.