Scientometrics (2015) 105:403–417

DOI 10.1007/s11192-015-1675-6

Analysis of research papers on E-commerce (2000–2013): based on a text mining approach

Bei-Ni Yan1,2 • Tian-Shyug Lee2 • Tsung-Pei Lee2

Received: 16 July 2015 / Published online: 13 August 2015


© Akadémiai Kiadó, Budapest, Hungary 2015

Abstract E-commerce (EC) is sweeping across the globe and has become one of the most
important commercial activities. Accordingly, EC has also attracted considerable academic
research interest, and many research results have been produced in recent years. This paper
takes these results as its research object and collects 8488 research papers published in
academic journals during 2000–2013 and indexed in the Web of Science database. Using text
mining techniques, 68 terms are identified as the main keywords of the EC field. The
scientific structure of EC is then mapped through multidimensional scaling, based upon the
co-occurrence of the main terms in the academic journals. The results show that the EC
domain is composed of three main fields: technology, management and customer.
Furthermore, a knowledge graph based on the EC research network is visualized, and it
shows that EC research papers cover seven important subnets: internet, consumer
behaviour, customer satisfaction, online shopping, reputation, Taiwan and knowledge
management.

Keywords E-commerce · Research papers · Co-word · Text mining

Introduction

E-commerce is transforming the way the world shops and is expanding at a high speed.
Starting from virtually zero in 1995, global E-commerce sales reached $750 million in
1997; only 1 year later, the figure went up to $2.3 billion (Robn 1998). US Department of
Commerce pointed out that global online retail sales rose by 26 % from the previous year
to $55 billion in 2003. The business-to-consumer (B2C) revenue of 2005 had been

& Bei-Ni Yan


[email protected]
1 College of Management, Anhui University, Hefei 230039, Anhui Province, China
2 College of Management, Fu Jen Catholic University, Taipei 24205, Taiwan


estimated to be $156 billion (Wolfinbarger and Gilly 2003). Nowadays, there is a growing
interest in the use of electronic commerce as a means to perform business transactions. For
many businesses, it has become a priority (Sharp 1998). According to a forecast by
Goldman Sachs, global E-commerce sales would grow at more than 19 % a year from 2010
to 2013 (Davis 2011). Worldwide retail web sales reached nearly $1 trillion by 2013.
E-commerce has also attracted great research interest from scholars in different areas,
who have published a large body of literature on the subject. This literature spans many
disciplines, including computer science, business, management, information and library
science, telecommunications and economics. It takes different forms, including articles,
reviews, monographs, newsletters, proceedings papers and editorial material. The research
topics include internet technology, privacy, consumer behaviour and supply chain
management. The literature is so large that the core areas of current studies are hard to
identify, which suggests that it needs to be reviewed and summarized.

Related studies

E-commerce review

Reviews on E-commerce can, on the whole, be divided into two types according to the
methods used: quantitative reviews and qualitative reviews. Qualitative reviews can in
turn be divided into two types. The first type focuses on different aspects of E-commerce
development. The second type focuses on E-commerce in different countries or regions.
Examples of qualitative studies include the following. Bond and Whiteley (1998)
reviewed a number of critical legal aspects in the area of secure E-commerce and found out
key issues including data protection laws, web site ownership and hypertext linking. Visser
and Lanzendorf (2004) explored the mobility and accessibility effects of B2C E-commerce
by means of a literature review. Goi (2007) studied the existing web site models for
E-commerce and presented three main ways of classifying web sites, including digital
business model, stages of development model and scoring systems. Liao et al. (2008)
pointed out that content preparation was an important stage in E-commerce website
development. After examining relevant literature, they proposed a conceptual model on
content preparation for cross-cultural E-commerce. The model explored cultural effects on
information processing of consumers by taking into account both normative effects and
psychological effects. Yi and Thomas (2007) attempted to provide a review of the current
state of the art of how e-business/ICT affects the environment. The work reviewed was
in various forms including peer-reviewed journal papers and theses, as well
as other resources such as projects and project reports, conferences and symposia, and
websites. It was claimed that the research examined had captured the most important work
to date, either for a general knowledge of this new area or for background study by experts
carrying out future research. Boritz et al. (2008) reviewed the research on Internet privacy
in E-commerce that had been conducted in the fields of information systems, business, and
marketing and developed a framework for classifying the studies. The framework included
seven aspects: customer perspective, company perspective, government regulation,
customer–company interaction, customer–government interaction, company–government
interaction, customer–company–government interaction. Liu (2008) reviewed the


arguments both criticising and defending business method patents and analyzed the
underlying reasons for the controversy. Han and Jin (2009) studied the origin and
development of the technology acceptance model (TAM) in E-commerce, and summarized,
contrasted and classified TAM variants into the following three categories: simple
modified TAM, models combining TAM with related theories, and acceptance models from
other perspectives. Gupta et al. (2009) reviewed, classified, consolidated, and synthesized the
contributions to the expanding field of e-business that had been published in Production
and Operations Management. They classified e-business research into the following four
categories: (1) E-auctions, (2) radio frequency identification, (3) e-business system design,
and (4) competition, conflict, collaboration, and coordination (C4 in e-business). Wu and
Liu (2010) did a systematic literature review of e-business capability research in order to
develop an extensive E-business capability (EBC) model for better examining how
e-business technology creates business value. Wang et al. (2011) reviewed the concept and
formation mechanisms of initial trust as well as various factors influencing it. Goi (2012)
reviewed web evaluation criteria for E-commerce web sites and drew a conclusion that the
most common web site criteria to be applied were quality, function, credibility, reliability,
attractiveness, systematic structure and navigation. Trad and Kalpi (2013) used a
literature review to study the selection and training framework for managers in
e-business innovation transformation projects. Zhao (2013) reviewed web mining in
E-commerce.
Other researchers paid attention to E-commerce in different countries or regions, for
example E-commerce issues in the Russian Federation, driving factors for E-commerce in
the Gulf region and the existing status of E-commerce in India (Iatsyk and Szymczyk
2001; Rawi et al. 2008; Vaithianathan 2010).
Quantitative reviews on E-commerce are relatively few. Chen et al. (2012) collected
keywords and abstracts from 995 articles in four primary EC journals. For exploring
significant and latent EC topics, they analyzed the differences and similarities between
international and Taiwanese sources. Shiau and Dwivedi (2013) chose the top six E-commerce
journals from 2006 to 2010. A total of 1064 electronic commerce related articles and
33,173 references were identified. There were 48 high value research articles identified
using a citation and co-citation analysis. Using statistical analysis including factor analysis,
multidimensional scaling, and cluster analysis, they identified five research areas: trust,
technology acceptance and technology application, E-commerce task-related application,
e-markets, identity and evaluation.

Co-word analysis of text-mining

Text mining is a method that offers great opportunities to information specialists to reveal
unknown knowledge from the bibliographic information of scientific publications (Wormell
2000). Text mining can be defined from different perspectives such as information retrieval,
knowledge discovery and artificial intelligence, but here it means the process of
identifying novel, interesting and understandable patterns from a collection of texts
(Blake 2011). One text mining technique is co-word analysis, which was developed by Callon et al.
(1983). Co-word analysis is based on counting and analyzing the co-occurrences of words
in different parts of articles of a specific domain (Callon et al. 1991). Most of the previous
research used co-word analysis to depict structures of different scientific domains such as
biotechnology (Rip and Courtial 1984), acidification research (Law and Whittaker 1992),
biological safety (Cambrosio et al. 1993), patents (Courtial et al. 1993), scientometrics
(Courtial 1994), drug industry (Rikken et al. 1995), software engineering (Coulter et al.


1998), information retrieval research (Ding et al. 2001), document retrieval (Hui and Fong
2004), R&D domain of robot technology (Lee and Jeong 2008), stem cells field (An and
Wu 2011), visualization methods (Yang et al. 2012), doctoral dissertation (Zong et al.
2013), intellectual structure (Cho 2014).
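As a minimal sketch of the counting step that co-word analysis relies on, the following Python fragment tallies how often two keywords appear in the same paper. The function name `cooccurrence_counts` and the toy records are hypothetical illustrations, not the data or tooling used in this study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how often each unordered keyword pair appears in the same paper."""
    counts = Counter()
    for keywords in papers:
        # Deduplicate within a paper and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy records, not data from the study.
papers = [
    ["internet", "trust", "privacy"],
    ["internet", "trust"],
    ["trust", "online shopping"],
]
pairs = cooccurrence_counts(papers)
print(pairs[("internet", "trust")])
```

The resulting pair counts can be arranged into a symmetric keyword-by-keyword matrix, which is the kind of input that network measures and scaling methods operate on.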
In view of this, this study uses co-word analysis, a text mining technique, to visualize the
main content and knowledge structure of electronic commerce research. Through a
systematic and quantitative summary of research in this field, this paper aims to lay the
foundation for future studies on E-commerce.

Methods

Method and core indexes

In order to visualize the whole e-commerce research network by the text mining method,
four core indexes have to be calculated: density, degree centrality, betweenness
centrality and closeness centrality.

Density

The density of a network may give us insights into such phenomena as the speed at which
information diffuses among the nodes, and the extent to which nodes have high levels of
social capital and/or social constraint (Hanneman and Riddle 2006). The density D of a
network is defined as the ratio of the number of edges E to the number of possible edges,
given by the binomial coefficient $\binom{N}{2}$, giving $D = \frac{2E}{N(N-1)}$ (Kumar
et al. 2012). Another possible equation is $D = \frac{T}{N(N-1)}$, where the ties T are
unidirectional (Wasserman and Faust 1994). This gives a better overview over the network
density, because unidirectional relationships can be measured.
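The two density formulas can be checked with a short sketch. The function names are hypothetical, and the numbers are toy values rather than results from this study.

```python
def density_undirected(num_nodes, num_edges):
    """D = 2E / (N(N-1)): share of possible undirected edges that are present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def density_directed(num_nodes, num_ties):
    """D = T / (N(N-1)): share of possible directed ties that are present."""
    return num_ties / (num_nodes * (num_nodes - 1))


# Toy example: 4 nodes joined by 3 undirected edges (for instance a path).
print(density_undirected(4, 3))
# The same network stored as 6 directed ties gives the same density.
print(density_directed(4, 6))
```

Both calls give 0.5, since each undirected edge corresponds to two unidirectional ties.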

Degree centrality

Degree is the simplest of the node centrality measures, as it uses only the local structure
around nodes (Opsahl et al. 2010). In a binary network, the degree is the number of ties a
node has. Degree centrality can be measured in two ways: the first is the absolute
degree centrality, which is the number of points directly connected to the node; the other
is the relative degree centrality (Ma et al. 2013). The expressions are as follows:

$$C_{D0}(n_i) = \sum_{j=1}^{n} X_{ij}, \qquad C_{DI}(n_i) = \sum_{j=1}^{n} X_{ij}$$

where $n_i$ denotes node i and $X_{ij}$ is the adjacency matrix built from the
relationships between nodes; if node i points to node j, $X_{ij}$ is 1, otherwise 0.
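A minimal sketch of the degree calculation sums each row of the binary adjacency matrix. The relative (normalized) value is computed here as a percentage of the n − 1 other nodes; that normalization is an assumption about how the relative degree is reported, although it does reproduce the NrmDegree column of Table 2 (for example, 25 ties among 68 keywords gives 100 × 25 / 67 ≈ 37.313). The function name is hypothetical.

```python
def degree_centrality(adjacency):
    """Return (absolute degree, relative degree in percent) for each node.

    adjacency is a square 0/1 matrix given as a list of rows.
    The relative degree divides by the n - 1 other nodes (an assumption).
    """
    n = len(adjacency)
    result = []
    for row in adjacency:
        degree = sum(row)
        result.append((degree, 100.0 * degree / (n - 1)))
    return result


# Toy star network: node 0 is tied to nodes 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
for node, (absolute, relative) in enumerate(degree_centrality(star)):
    print(node, absolute, round(relative, 3))
```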

Betweenness centrality

Betweenness represents the number of shortest paths in a network that traverse a given
node. Betweenness centrality determines the relative importance of a node by measuring
the amount of traffic flowing through that node to other nodes in the network. This is done
by measuring the fraction of paths connecting all pairs of nodes and containing the node of
interest (Brandes 2001). Betweenness centrality, according to Borgatti (2005), focuses on
“the share of times that a node i needs a node k (whose centrality is being measured) in
order to reach j via the shortest path”. The expression is as follows:

$$C_B(n_i) = \frac{\sum_{j<k} g_{jk}(n_i) / g_{jk}}{(g-1)(g-2)}$$

where $g_{jk}$ is the number of shortest paths from node j to node k, $g_{jk}(n_i)$ is the
number of those shortest paths that pass through node $n_i$, and g is the number of nodes
in the network.
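The betweenness expression can be sketched for a small unweighted, undirected network by counting shortest paths with breadth-first search. This is a brute-force illustration under those assumptions, not the faster algorithm of Brandes (2001) and not the routine used by Ucinet; the function names are hypothetical, and the denominator (g − 1)(g − 2) is taken as written in the expression above.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to v.
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj, node):
    """Sum g_jk(node) / g_jk over pairs j < k, divided by (g - 1)(g - 2)."""
    nodes = sorted(adj)
    g = len(nodes)
    info = {s: bfs_counts(adj, s) for s in nodes}
    dist_node, sigma_node = info[node]
    total = 0.0
    for a in range(g):
        for b in range(a + 1, g):
            j, k = nodes[a], nodes[b]
            if node in (j, k):
                continue
            dist_j, sigma_j = info[j]
            if k not in dist_j or node not in dist_j or k not in dist_node:
                continue  # unreachable pair contributes nothing
            if dist_j[node] + dist_node[k] == dist_j[k]:
                total += sigma_j[node] * sigma_node[k] / sigma_j[k]
    return total / ((g - 1) * (g - 2))


# Toy path network 0 - 1 - 2: node 1 lies on the only shortest path from 0 to 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path, 1))
```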

Closeness centrality

Closeness is defined as the inverse of farness, which in turn is the sum of distances to all
other nodes (Freeman 1978). Closeness has been generalised to weighted networks by
Newman (2001), who used Dijkstra’s (1959) algorithm. To quickly reiterate Dijkstra’s
(1959) and Newman’s (2001) work here: closeness centrality describes the extent of
influence of a node on the network. The expression is as follows:

$$C_c(n_i) = \left[ \sum_{j=1}^{g} d(n_i, n_j) \right]^{-1}$$

where $d(n_i, n_j)$ is the distance between $n_i$ and $n_j$. The sum in brackets is the
total distance from node $n_i$ to all other nodes. The greater this distance, the closer the
node lies to the edge of the network and the less important it is.

Fig. 1 Data process flow
STEP1: Keywords Selection (Tool: Bibexcel)
STEP2: Data Preprocessing: Invalid Words Elimination, Synonyms Combination, High Frequency Words Selection (Tool: Bibexcel)
STEP3: Density Analysis (Tool: Ucinet)
STEP4: Centrality Analysis: Degree Centrality, Betweenness Centrality, Closeness Centrality (Tool: Ucinet)
STEP5: Cohesive Subgroup Analysis: Component Analysis, Multidimensional Scaling (Tool: Ucinet, SPSS)
STEP6: Knowledge Graph Visualization (Tool: Ucinet, Netdraw)
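The farness and closeness definitions from the closeness centrality subsection can be sketched for an unweighted, connected network, where distance is the number of hops found by breadth-first search. The function names are hypothetical, and the sketch assumes every node is reachable; an isolated node would have a farness of zero and no defined closeness.

```python
from collections import deque


def farness(adj, source):
    """Sum of hop distances from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())


def closeness(adj, source):
    """Inverse of farness; assumes the network is connected."""
    return 1.0 / farness(adj, source)


# Toy path network 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(farness(path, 1), closeness(path, 1))
print(farness(path, 0), closeness(path, 0))
```

The middle node has the smaller farness and therefore the larger closeness, which matches the reading that nodes far from the rest of the network are less important.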

Data retrieval strategy and data process flow

The Science Citation Index Expanded (SCI-Expanded) and Social Sciences Citation Index
(SSCI) of the Institute for Scientific Information (ISI) Web of Science have been used to
retrieve the data for the study. The data were collected in March 2014. Six query words
(i.e., E-commerce, ecommerce, e-business, ebusiness, electronic commerce and electronic
business) were used to search topics of E-commerce relevant articles. Article language was
limited to English, and document type was limited to scholarly journal articles. The period
2000–2013 was used as the restriction for the publication date, since the data entry for
2014 had not been completed by the time of analysis. We subsequently combined all the
records and deleted duplicate records. Finally, 8488 records were selected as the sample,
and these records constituted the database for further analysis. The data process flow is shown
in Fig. 1.

Results

High frequency keywords

Bibexcel is a useful tool for bibliometric analysis, and citation studies in
particular (Pilkington 2006). It is a tool-box developed by Olle Persson, Inforsk, Umeå
University, Sweden. BibExcel is designed to assist a user in analysing bibliographic data, or
any data of a textual nature formatted in a similar manner. The idea is to generate data files
that can be imported to Excel, or any program that takes tabbed data records, for further
processing (Persson et al. 2009). This paper uses BibExcel to extract all keywords from the
8488 papers. To ensure the validity of the data, we preprocess the data to obtain high
frequency words.
Usually, there are four kinds of methods to pick out high frequency words. The first
method is the subjective judgment of the researchers, which is inevitably subjective. The
second method follows Zipf’s law, the supposedly straight-line relation between the
occurrence frequency of words in a language and their rank when both are plotted
logarithmically. However, at extremely high and extremely low frequencies, Zipf’s law, the
basic regularity governing word frequencies, is known to fail (Kornai 2002). The third
method was therefore proposed by Donohue. Donohue (1973) believed that there was a
critical value separating high- and low-frequency terms used in articles, and proposed the
following formula for calculating it:


$$T = \frac{-1 + \sqrt{1 + 8 I_1}}{2}$$

where $I_1$ is the number of keywords that occur only once.

The fourth method is called the frequency g-index. The g-index was first used to quantify
scientific productivity based on publication records and is calculated from the
distribution of citations received by a given researcher’s publications (Egghe 2006). Later,
many researchers used the g-index to choose high frequency keywords. If keywords are
ranked in decreasing order of frequency, the g-index is the (unique) largest number g such
that the top g keywords together have a total frequency of at least $g^2$. A comparison of
the g-index method with Donohue’s method concluded that the g-index is better (Zhang et
al. 2013).
After eliminating invalid keywords and combining synonymous keywords, we obtain a
keyword database of 11,063 keywords. Applying Donohue’s method to choose high
frequency words, we find that 7291 keywords occur only once, which gives T ≈ 120. Only
seven keywords have a frequency above 120, so we use the g-index instead and obtain
g = 68. The top 30 keywords are displayed in Table 1.
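Both thresholds can be reproduced with a few lines of Python. The sketch below assumes Donohue's formula in the form T = (−1 + √(1 + 8·I₁)) / 2, where I₁ is the number of keywords that occur once, and the g-index definition given above. The function names and the toy frequency list are hypothetical, so the toy g-index is not the g = 68 reported for the real keyword data.

```python
import math


def donohue_threshold(singletons):
    """High/low frequency boundary T from the number of keywords occurring once."""
    return (-1 + math.sqrt(1 + 8 * singletons)) / 2


def g_index(frequencies):
    """Largest g such that the top g frequencies sum to at least g squared."""
    ranked = sorted(frequencies, reverse=True)
    total = 0
    g = 0
    for rank, freq in enumerate(ranked, start=1):
        total += freq
        if total >= rank * rank:
            g = rank
    return g


# 7291 keywords occur once in the study's keyword database.
print(round(donohue_threshold(7291), 2))

# Hypothetical toy frequency list, not the study's data.
print(g_index([10, 5, 3, 1, 1]))
```

With 7291 single-occurrence keywords the threshold comes out at about 120.26, consistent with the T ≈ 120 reported in the text.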

Core indexes results

Density analysis

After preprocessing, we obtain the 68 keywords. Calculated with Ucinet, the density
(matrix average) of the E-commerce research network is 0.0865, with SD = 0.2811. The
density is very low, which means that the relationships among the knowledge points are
not close. The research subfields show low correlation and low clustering. This suggests
that the scope of E-commerce research is very broad and that it lacks a common focus and
a clear context.

Table 1 Top 30 high frequency words distribution of E-commerce research

| Rank | Keywords | Frequency | Rank | Keywords | Frequency |
|---|---|---|---|---|---|
| 1 | Internet | 397 | 16 | Data mining | 67 |
| 2 | Trust | 390 | 17 | B2C E-commerce | 67 |
| 3 | Supply chain management | 189 | 18 | Ontology | 66 |
| 4 | Consumer behaviour | 133 | 19 | Information systems | 65 |
| 5 | Small-medium enterprises | 132 | 20 | Structural equation model | 64 |
| 6 | Recommender systems | 128 | 21 | Personalization | 62 |
| 7 | Security | 127 | 22 | Innovation | 61 |
| 8 | Privacy | 117 | 23 | Case study | 60 |
| 9 | Information technology | 108 | 24 | Supply chain | 59 |
| 10 | Web services | 98 | 25 | Intelligent agents | 58 |
| 11 | Technology acceptance model | 94 | 26 | Performance | 57 |
| 12 | Customer satisfaction | 87 | 27 | Knowledge management | 57 |
| 13 | Online shopping | 84 | 28 | Information and communication technology | 57 |
| 14 | Collaborative filtering | 70 | 29 | Adoption | 56 |
| 15 | World Wide Web | 67 | 30 | Survey methods | 55 |


Centrality analysis

The degree centrality, betweenness centrality and closeness centrality of the E-commerce
keywords are shown in Table 2. Due to space limitations, only the top 15 of each kind of
centrality are shown.
Degree centrality reflects keyword co-occurrence relationships: the greater the value, the
more often the keyword co-occurs with other keywords and the nearer it is to the center of
the network (Li et al. 2010). As Table 2 shows, the keywords Internet, consumer behaviour,
customer satisfaction, online shopping and reputation are the research hotspots of
E-commerce. These subfields have attracted widespread attention from researchers.
Table 2 Centrality of E-commerce research

| No. | Keywords (degree) | Degree | NrmDegree | Keywords (betweenness) | Betweenness | Keywords (closeness) | Farness |
|---|---|---|---|---|---|---|---|
| 1 | Internet | 25.000 | 37.313 | Internet | 439.169 | Internet | 322.000 |
| 2 | Consumer behaviour | 15.000 | 22.388 | Customer satisfaction | 177.286 | Reputation | 331.000 |
| 3 | Customer satisfaction | 14.000 | 20.896 | Reputation | 170.594 | Consumer behaviour | 336.000 |
| 4 | Online shopping | 12.000 | 17.910 | Knowledge management | 164.578 | Customer satisfaction | 336.000 |
| 5 | Reputation | 12.000 | 17.910 | Consumer behaviour | 138.675 | Online shopping | 338.000 |
| 6 | Taiwan | 11.000 | 16.418 | Online shopping | 128.339 | XML | 339.000 |
| 7 | Knowledge management | 10.000 | 14.925 | Simulation | 121.629 | Knowledge management | 340.000 |
| 8 | Internet marketing | 10.000 | 14.925 | Information quality | 101.505 | Performance | 341.000 |
| 9 | Internet shopping | 9.000 | 13.433 | Data mining | 93.871 | Technology adoption | 342.000 |
| 10 | Data mining | 9.000 | 13.433 | Information technology | 93.039 | Taiwan | 344.000 |
| 11 | XML | 9.000 | 13.433 | Technology adoption | 91.775 | Internet shopping | 345.000 |
| 12 | Retailing | 8.000 | 11.940 | Collaborative filtering | 90.908 | Risk | 345.000 |
| 13 | Information technology | 8.000 | 11.940 | Negotiation | 86.179 | Internet marketing | 345.000 |
| 14 | Technology adoption | 8.000 | 11.940 | Internet marketing | 83.427 | Privacy | 347.000 |
| 15 | Innovation | 8.000 | 11.940 | Retailing | 74.997 | Intelligent agents | 349.000 |

Betweenness centrality measures a keyword’s power to affect the co-occurrence of other
keywords in a paper. The greater the value, the more intermediary power the keyword has
(Wu 2009). As Table 2 shows, the three keywords with the highest betweenness centrality
are Internet, customer satisfaction and reputation. These keywords have greater power to
control the network. In the E-commerce research network, they are the most important
intermediaries of the knowledge network and have the most influence on the co-occurrence
of other keywords.
Closeness centrality reflects how close a node is to the other nodes. Table 2 reports
farness, the inverse of closeness: the higher the farness, the less the keyword co-occurs
with other keywords in a paper (Kas et al. 2013). As Table 2 shows, the keywords Internet,
reputation, consumer behaviour and customer satisfaction have the lowest farness values,
which means that they co-occur more easily with other keywords and have more power
than the others. Keywords such as privacy and intelligent agents have higher farness
values, which means that they have lower probabilities of co-occurring with other
keywords.

Cohesive subgroup analysis

The calculation of density and centrality reflects the status and influence of individual
keywords in E-commerce research. To disclose the relationships among the keywords, the
main components of E-commerce research have to be determined. We therefore conduct
component analysis and multidimensional scaling analysis of the keywords.

Components analysis

Components represent the simplest form of cohesive subgroup analysis (Robinson et al.
2007). Component analysis makes clear whether the network contains several separate
subgroups. Subgroups are called components when there is no connection between them.
Components are of two types: strongly connected components and weakly connected
components (Fleischer et al. 2000). With Ucinet, we find one strongly and three weakly
connected components, as Table 3 shows.
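As a rough sketch of what a component search does on an undirected co-word network, the fragment below groups nodes that can reach each other and reports each group's share of the network. It is an illustration with hypothetical names and toy data, not the Ucinet routine; for comparison, the proportions in Table 3 correspond to 65/68 ≈ 0.956 and 1/68 ≈ 0.015.

```python
def connected_components(adj):
    """Return the components of an undirected network, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical toy network: three linked keywords and two isolated ones.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": [], "e": []}
parts = connected_components(toy)
for part in parts:
    print(len(part), round(len(part) / len(toy), 3))
```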

MDS analysis

Multidimensional scaling (MDS) is a collection of statistical techniques that attempt to
embed a set of patterns described by means of a dissimilarity matrix into a
low-dimensional display plane in a way that preserves their original pairwise
interrelationships as closely as possible (Agrafiotis et al. 2000). MDS is also a main method
of co-word network analysis. It discloses the structure behind the data. In this paper, the
Ochiai similarity coefficient is used to standardize and normalize the keyword matrix.
Furthermore, the Euclidean distance model is adopted for the two-dimensional scaling
analysis. We conduct nonmetric MDS with SPSS, and the output map is shown in Fig. 2.
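The normalization step can be illustrated with the usual form of the Ochiai coefficient for co-word data, which divides the co-occurrence count of two keywords by the geometric mean of their individual frequencies. This is a sketch under that assumption; the function names are hypothetical, and converting similarity to dissimilarity as 1 − similarity is a common convention rather than a step stated in the paper.

```python
import math


def ochiai(cooccurrence, freq_a, freq_b):
    """Ochiai similarity: co-occurrence count over sqrt(freq_a * freq_b)."""
    if freq_a == 0 or freq_b == 0:
        return 0.0
    return cooccurrence / math.sqrt(freq_a * freq_b)


def dissimilarity(cooccurrence, freq_a, freq_b):
    """Assumed conversion for MDS input: 1 - similarity."""
    return 1.0 - ochiai(cooccurrence, freq_a, freq_b)


# Toy counts: two keywords each used in 4 papers, together in 2 of them.
print(ochiai(2, 4, 4))
print(dissimilarity(2, 4, 4))
```

The coefficient ranges from 0 (the keywords never co-occur) to 1 (they always appear together), which removes the effect of raw keyword frequency before scaling.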

Table 3 Component analysis results

| Component | Nodes | Proportion |
|---|---|---|
| 1 | 65 | 0.956 |
| 2 | 1 | 0.015 |
| 3 | 1 | 0.015 |
| 4 | 1 | 0.015 |


Nonmetric MDS is often preferred because it tends to provide a better “goodness-of-fit”
(stress) statistic, where lower stress is better (0 = perfect fit) (Everton 2004). Generally,
stress levels below 0.1 are considered excellent, while levels above 0.2 are considered
unacceptable. Similarly, a higher RSQ (r-squared) value (1 = perfect fit) is better, and RSQ
values exceeding 0.6 are usually considered excellent (Borgatti et al. 2002).
The stress value, which measures reliability, is 0.13866, below the 0.2 threshold, and the
RSQ value, which measures validity, is 0.93805, well above 0.60; together these indicate a
good fit. The map plots each variable, thus permitting us to examine similarity according
to the variables’ proximity to each other (Liu et al. 2013). We label three categories in
Fig. 2, with each category representing a main research field of E-commerce.
Category 1 focuses on technology. Technology is the core of E-commerce. It is not only
the platform on which E-commerce exists but also the driving force of its sustainable
development. Category 2 focuses on management. It covers knowledge management,
supply chain management, privacy, trust, etc. Management of E-commerce involves
prioritizing buy-side and sell-side activities and putting in place the plans and resources to
deliver the identified benefits. These plans need to focus on management of the many risks
to success, some of which you may have experienced when using E-commerce sites, from
technical problems such as transactions that fail, sites that are difficult to use or are too
slow, through to problems with customer service or fulfilment, which also indicate failure
of management (Chaffey 2009). Management is therefore also an important research field
of E-commerce. Category 3 focuses on the customer. It covers customer satisfaction,
consumer behaviour, web services, etc. Category 3 overlaps to some extent with category 1
and also pays attention to technology. For example, information technology and
information systems are in category 3. This means that customer studies on E-commerce
are based on the construction and development of technology. From the MDS map, we can
see that technology is the foundation, management is the means and the customer is the
object. The three categories are intermingled.

Fig. 2 Results of multidimensional scaling (MDS); the map labels three clusters: Category 1, Category 2 and Category 3


Fig. 3 E-commerce network knowledge graph

Knowledge graph visualization

According to the nature of the network nodes, we choose the keywords that have higher
degree centrality and stronger resource control ability as the key points. For each key
point we build a subnet centered on it, reflecting how knowledge gathers around it. To a
certain extent, the subnets represent the current research topic groups of E-commerce
research. Through the analysis of the keyword co-occurrence matrix and by using the
UCINET drawing function, we draw the knowledge graph of the E-commerce research
network in Fig. 3. This paper picks out 7 key nodes that have stronger power and stronger
network control ability: internet, consumer behaviour, customer satisfaction, online
shopping, reputation, Taiwan and knowledge management.
The subnet based on internet covers 26 nodes, including trust, supply chain
management, privacy, online shopping, world wide web, supply chain, etc. It covers 38 %
of the whole network and is the most important of all the subnets. At the same time, the
internet subnet has the most power and control ability and is the core node. In other
words, the internet is the basis of E-commerce research and its most central research
area.
Consumer behavior subnet has 16 nodes, including internet shopping, Taiwan, online
shopping, customer satisfaction, internet marketing and retailing etc. Customer satisfaction
subnet has 15 nodes, including service quality, consumer behavior, internet shopping,
retailing, satisfaction and information quality etc. Online shopping subnet has 13 nodes,
including internet, consumer behavior, satisfaction, perceived risk, personalization and
Taiwan etc. Reputation subnet has 13 nodes, including internet and collaborative filtering
etc. Taiwan subnet has 12 nodes, including consumer behavior, innovation, supply chain
management etc. Knowledge management subnet has 12 nodes, including information
technology, data mining and intelligent agents etc.
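One way to read these subnets is as ego networks: a key node together with the keywords directly tied to it. Under that reading, which is an assumption rather than a procedure stated in the paper, a subnet has one more node than the degree of its center; the internet subnet's 26 nodes match the degree of 25 reported in Table 2. The function name and toy network below are hypothetical.

```python
def ego_network(adj, center):
    """Return the sub-network made of center, its neighbours and their mutual ties."""
    members = {center} | set(adj[center])
    return {u: [v for v in adj[u] if v in members] for u in members}


# Hypothetical toy network, not the study's keyword data.
toy = {
    "internet": ["trust", "privacy"],
    "trust": ["internet", "privacy", "reputation"],
    "privacy": ["internet", "trust"],
    "reputation": ["trust"],
}
subnet = ego_network(toy, "internet")
print(sorted(subnet))
print(len(subnet) == len(toy["internet"]) + 1)
```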


Conclusions

In this research, we classify E-commerce research into three categories and seven subnets.
The three categories are technology, management and customer. The seven subnets
encompass internet, consumer behaviour, customer satisfaction, online shopping, reputa-
tion, Taiwan and knowledge management. According to the results, we can draw three
conclusions.

E-commerce research has accumulated a lot of theoretical achievements

We searched the Web of Science database and found 8488 papers on E-commerce
research subjects from 2000 to 2013, an average of about 606 papers per year. This shows
that many researchers paid attention to the field and studied it in depth and detail. At the
same time, in the face of such a massive literature, new researchers have some difficulty in
quickly grasping the research situation and key points. This paper therefore studies the
co-word network of the research papers by a text mining approach and gives a
visualization of the E-commerce knowledge graph.

E-commerce research has three categories according to multidimensional scaling analysis

Technology, management and customer are the three levers of E-commerce. Technology
can be seen as the guarantee of E-commerce. Laudon and Traver (2012) emphasized that
technology, business development and social issues are the three major driving forces
behind E-commerce. Management involvement is imperative for the success of E-commerce
implementation (Chan and Swatmann 1999). Customer topics such as customer
satisfaction, consumer behaviour and web services have also attracted research interest.

E-commerce research’s most popular subject is internet

We find that the internet is the most popular subject. E-commerce refers to various online
commercial activities focusing on commodity exchanges by electronic means, the Internet
in particular. The Internet is the foundation of E-commerce and the carrier of commercial
business information (Zheng 2009). The internet therefore has the closest relationship
with many other subjects in E-commerce. Besides that, consumer behavior, customer
satisfaction, online shopping, reputation, Taiwan and knowledge management are hot
research topics.
In this research, we propose a knowledge graph of E-commerce by using a text mining
technique, grouping terms of E-commerce research papers by the similarity of their
patterns based on the co-occurrence of the terms in papers. We hope to provide new
insights for researchers in the field of E-commerce, so that they can apply the results of
this study to the development of research and educational programs. However, this pattern
is limited to the term co-occurrence method, and further characterization of the field by
other bibliometric techniques such as co-citation and co-coupling is needed. Additionally,
as data gathering in this research was restricted to WOS, a study of databases such as
Medline, US patent and INSPEC is recommended in order to cover more publications.
Moreover, institutional level analysis and author analysis could help to complete the
insight in this regard. Also, detailed comparative scientometric studies from different
perspectives and with different national E-commerce research would be helpful and
fruitful.

References
Agrafiotis, D. K., Rassokhin, D. N., & Lobanov, V. S. (2000). Visualization of large molecular similarity
tables. Journal of Computational Chemistry, 22(5), 488–500.
An, X. Y., & Wu, Q. Q. (2011). Co-word analysis of the trends in stem cells field based on subject heading
weighting. Scientometrics, 88(1), 133–144.
Blake, C. L. (2011). Text mining. Annual Review of Information Science and Technology, 45, 121–155.
Bond, R., & Whiteley, C. (1998). Untangling the Web: A review of certain secure e-commerce legal issues.
International Review of Law Computers and Technology, 12(2), 349–370.
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27(1), 55–71.
Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). Ucinet for Windows: Software for Social Network
Analysis. Collegeville: Analytic Technologies, Inc. http://pages.uoregon.edu/vburris/hc431/Ucinet_Guide.pdf.
Boritz, J. E., No, W. G., & Sundarraj, R. P. (2008). Internet privacy in E-commerce: Framework, review,
and opportunities for future research. In Proceedings of the 41st Hawaii international conference on
system sciences (p. 204).
Brandes, U. (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology,
25(2), 163–177.
Callon, M., Courtial, J. P., & Laville, F. (1991). Co-word analysis as a tool for describing the network of
interactions between basic and technological research: The case of polymer chemistry. Scientometrics,
22(1), 155–205.
Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks:
An introduction to coword analysis. Social Science Information Sur Les Sciences Sociales, 22(2),
191–235.
Cambrosio, A., Limoges, C., Courtial, J. P., & Laville, F. (1993). Historical scientometrics? Mapping over
70 years of biological safety research with coword analysis. Scientometrics, 27(2), 119–143.
Chaffey, D. (2009). E-business and E-commerce management: Strategy, implementation and practice (p.
14). Essex: Prentice Hall.
Chan, C., & Swatmann, P. M. C. (1999). B2B e-commerce implementation: The case of BHP steel. In
Conference paper. Melbourne, Australia: RMIT University.
Chen, L. C., Yu, T. J., & Hsieh, C. J. (2012). KeyGraph-based chance discovery for exploring the
development of e-commerce topics. Scientometrics, 95(4), 257–275.
Cho, J. (2014). Intellectual structure of the institutional repository field: A co-word analysis. Journal of
Information Science, 40(3), 386–397.
Coulter, N., Monarch, I., & Konda, S. (1998). Software engineering as seen through its research literature: A
study in co-word analysis. Journal of the American Society for Information Science, 49(13),
1206–1223.
Courtial, J. P. (1994). A coword analysis of scientometrics. Scientometrics, 31(3), 251–260.
Courtial, J. P., Callon, M., & Sigogneau, A. (1993). The use of patent titles for identifying the topics of
invention and forecasting trends. Scientometrics, 26(2), 231–242.
Davis, D. (2011). Global e-commerce sales head for the $1 trillion mark.
http://www.internetretailer.com/2011/01/04/global-e-commerce-sales-head-1-trillion-mark.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1,
269–271.
Ding, Y., Chowdhury, G. G., & Foo, S. (2001). Bibliometric cartography of information retrieval research
by using co-word analysis. Information Processing and Management, 37(6), 817–842.
Donohue, J. C. (1973). Understanding scientific literature: A bibliometric approach (p. 101). Cambridge,
MA: MIT Press.
Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131–152.
Everton, S. F. (2004). A guide for the visually perplexed: Visually representing social networks. Stanford:
Stanford University Press.
Fleischer, L. K., Hendrickson, B., & Pinar, A. (2000). On identifying strongly connected components in
parallel. In Parallel and distributed processing, IPDPS workshops, lecture notes in computer science
(pp. 505–511).

Freeman, L. C. (1978). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Goi, C. L. (2007). A review of existing web site models for e-commerce. Journal of Internet Banking and
Commerce, 12(1), 1–17.
Goi, C. L. (2012). A review of web evaluation criteria for e-commerce web sites. Journal of Internet
Banking and Commerce, 17(3), 1–10.
Gupta, S., Koulamas, C., & Kyparisis, G. J. (2009). E-business: A review of research published in
production and operations management (1992–2008). Production and Operations Management, 18(6),
604–620.
Han, L., & Jin, Y. S. (2009). A review of technology acceptance model in the e-commerce environment. In
IEEE international conference on management of e-commerce and e-government (pp. 28–31).
Hanneman, R. A., & Riddle, M. (2006). Introduction to social network methods.
http://faculty.ucr.edu/~hanneman/nettext/C7_Connection.html.
Hui, S. C., & Fong, A. C. M. (2004). Document retrieval from a citation database using conceptual
clustering and co-word analysis. Online Information Review, 28(1), 22–32.
Iatsyk, O., & Szymczyk, K. (2001). A review of the main e-commerce issues in the Russian federation.
Journal of Internet Law, 9, 14–17.
Kas, M., Carley, K. M., & Carley, L. R. (2013). Incremental closeness centrality for dynamically changing
social networks. In IEEE/ACM international conference on advances in social networks analysis and
mining (pp. 1250–1258).
Kornai, A. (2002). How many words are there? Glottometrics, 4, 61–86.
Kumar, P. S., Krishnendu, S., & Aditi, D. (2012). Appliance of graph models to ascertain social network in
pri system. International Journal of Engineering and Management Science, 3(4), 490–499.
Laudon, K., & Traver, C. G. (2012). E-commerce 2013: Business, technology, society. London: Prentice
Hall.
Law, J., & Whittaker, J. (1992). Mapping acidification research: A test of the co-word method.
Scientometrics, 23(3), 417–461.
Lee, B., & Jeong, Y. I. (2008). Mapping Korea’s national R&D domain of robot technology by using the co-
word analysis. Scientometrics, 77(1), 3–19.
Li, W. H., Zhang, W. D., & Chen, Z. B. (2010). China’s urbanization development research: Based on
bibliometrics and social network analysis. Library and Information Network Journal, 12, 1–9.
Liao, H., Proctor, R. W., & Salvendy, G. (2008). Content preparation for cross-cultural e-commerce: A
review and a model. Behaviour and Information Technology, 27(1), 43–61.
Liu, H. X. (2008). Review on the patentability of business methods in E-commerce. In IEEE international
conference on management of e-commerce and e-government (pp. 320–323).
Liu, C. L., Xu, Y. Q., Wu, H., Chen, S. S., & Guo, J. J. (2013). Correlation and interaction visualization of
altmetric indicators extracted from scholarly social network activities: Dimensions and structure.
Journal of Medical Internet Research, 15(11), 259–261.
Ma, X. J., Wu, J., & Zhang, Y. J. (2013). Research on industry alliance knowledge transfer network
modeling and simulation based on complex networks. Management Science and Engineering, 7(3),
13–21.
Newman, M. E. J. (2001). Scientific collaboration networks. II. Shortest paths, weighted networks, and
centrality. Physical Review, 64(1), 1–7.
Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing
degree and shortest paths. Social Networks, 32, 245–251.
Persson, O., Danell, R., & Schneider, J. W. (2009). How to use Bibexcel for various types of bibliometric
analysis. In Åström, F. (Ed.), Celebrating scholarly communication studies: A Festschrift for Olle
Persson at his 60th Birthday (pp. 9–24). International Society for Scientometrics and Informetrics.
Pilkington, A. (2006). Quick start guide to bibliometrics and citation analysis.
http://yunus.hacettepe.edu.tr/~tonta/courses/spring2011/bby704/bibexcel-primer.pdf.
Rawi, K. A., Sabry, K., & Nakeeb, A. A. (2008). Driving factors for e-commerce: Gulf region review.
Academy of Information and Management Sciences Journal, 11(2), 19–32.
Rikken, F., Kiers, H. A. L., & Vos, R. (1995). Mapping the dynamics of adverse drug-reactions in
subsequent time periods using indscal. Scientometrics, 33(3), 367–380.
Rip, A., & Courtial, J. P. (1984). Co-word maps of biotechnology: An example of cognitive scientometrics.
Scientometrics, 6(6), 381–400.
Robinson, S. E., Everett, M. G., & Christley, R. M. (2007). Recent network evolution increases the potential
for large epidemics in the British cattle population. Journal of the Royal Society, 4(15), 669–674.
Robn, J. A. (1998). Creating usable E-commerce sites. Standard View, 6(3), 110–115.
Sharp, B. (1998). Creating an e-commerce architecture. Unix Review, 47(2), 45–51.

Shiau, W. L., & Dwivedi, Y. K. (2013). Citation and co-citation analysis to identify core and emerging
knowledge in electronic commerce research. Scientometrics, 94(3), 1317–1337.
Trad, A., & Kalpi, D. (2013). The selection and training framework for managers in e-business innovation
transformation projects—The literature review. Procedia Technology, 9, 411–420.
Vaithianathan, S. (2010). A review of e-commerce literature on India and research agenda for the future.
Electronic Commerce Research, 10, 83–97.
Visser, E. J., & Lanzendorf, M. (2004). Mobility and accessibility effects of B2C E-commerce: A literature
review. Tijdschrift voor Economische en Sociale Geografie, 95(2), 189–205.
Wang, B. H., Guo, X., Niu, H. J., & Li, H. Y. (2011). A review and prospects of initial trust in E-commerce.
In IEEE international conference on management and service science (pp. 1–4).
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge:
Cambridge University Press.
Wolfinbarger, M. F., & Gilly, M. C. (2003). Consumer behavior. In H. Bidgoli (Ed.), The Internet
Encyclopedia (Vol. 1, pp. 272–283). New Jersey: Wiley.
Wormell, I. (2000). Critical aspects of the Danish welfare state-as revealed by issue tracking.
Scientometrics, 48(2), 237–250.
Wu, R. B. (2009). An empirical study of keywords network analysis using social network analysis. Journal
of Intelligence, 28(9), 50–53.
Wu, J. N., & Liu, L. (2010). E-business capability research: A systematic literature review. In IEEE 3rd
international conference on information management, innovation management and industrial
engineering (pp. 142–147).
Yang, Y., Wu, M. Z., & Cui, L. (2012). Integration of three visualization methods based on co-word
analysis. Scientometrics, 90(2), 659–673.
Yi, L., & Thomas, R. H. (2007). A review of research on the environmental impact of e-business and ICT.
Environment International, 33(6), 841–849.
Zhang, S., Liu, C. X., & Chang, Y. (2013). Selection research of keywords in co-word clustered based on the
G-index of word frequency. Modern Educational Technology, 23(10), 54–57.
Zhao, Y. D. (2013). The review of web mining in e-commerce. In IEEE international conference on
computational and information sciences (pp. 21–23).
Zheng, Q. (2009). Introduction to E-commerce (pp. 7–8). Beijing: Tsinghua University Press.
Zong, Q. J., Shen, H. Z., Yuan, Q. J., Hu, X. W., Hou, Z. P., & Deng, S. G. (2013). Doctoral dissertations of
library and information science in China: A co-word analysis. Scientometrics, 94(2), 781–799.
