Big Data Review
Abstract— The paradigm of Big Data has been established as a solid field of studies in many areas such as healthcare, science, transport,
education, and government services, among others. Although the paradigm is widely discussed, it has no agreed definition, despite the
many concepts proposed by academia and industry. This work aims to provide an analytical view of the studies conducted and published
regarding the Big Data paradigm. The approach used is a systematic mapping of the literature, combining bibliometric analysis and content
analysis to depict the panorama of research works, identifying patterns, trends, and gaps. The results indicate that there is still a long way to
go, both in research and in concepts, such as building and defining adequate infrastructures and standards, to meet future challenges and for
the paradigm to become effective and bring the expected benefits.
http://ijses.com/
All rights reserved
International Journal of Scientific Engineering and Science
Volume 3, Issue 11, pp. 25-32, 2019. ISSN (Online): 2456-7361
Bibliometric analysis enables a quantitative analysis of the citations between articles. This analysis can be made by counting the individual citations of each paper, as well as by analyzing the references used by the most cited articles, allowing the researcher to identify bibliometric clustering phenomena and relationships between two articles based on the number of common references [13].
The scientific database "Web of Science" was used to collect the sample, combining the terms ("bigdata" OR "big data") to search the titles of the publications, which resulted in a set of 2,718 articles.
This database was selected because it provides an interface to simultaneously search across different sources using a common set of search fields for obtaining comprehensive results. It includes studies from 1985 to the current date, covering the Science Citation Index Expanded, Social Sciences Citation Index, Arts & Humanities Citation Index, and Emerging Sources Citation Index, which comprise studies from ACM, EBSCOhost, Elsevier, Emerald, IEEE, INFORMS, ProQuest, SAGE, Springer, Taylor & Francis, and Wiley, among many other publishers. This database is also the source for computing the "Journal Citation Reports" index, which is one of the most used mechanisms for evaluating journals based on citation data.
From this initial set, only the works in the "Article" category were selected. This filtering was performed because, according to [17], these works go through a peer review process before being published and they also contain the information necessary for performing the bibliometric analysis.
Additionally, some articles were excluded from the sample. The criteria for selection were: the articles had to be published in "English" and classified in the categories related to "Computer Science", "Business Economics", "Engineering", "Telecommunication", "Governmental Law", "Information Science Library Science", "Science Technology Other Topics", "Health Care Services Sciences", "Social Sciences Other Topics", "Communication", "Mathematics", "Operation Research Management Science", "Medical Informatics" and "Public Administration". After this screening, a 391-article subset was obtained from the initial sample.
Next, the titles of the articles were read and classified separately by each evaluator, and the individual evaluation results were then consolidated. The items on which the evaluators diverged, as to keeping them or not for further analysis, were discussed until a consensus emerged. From this analysis, 103 items were excluded for not being associated with the theme, reducing the sample to 288 items.
Finally, the papers were analyzed in two stages: 1) bibliometric analysis and 2) content analysis and codification. Fig. 1 shows the workflow carried out for analyzing the selected sample, adapted from [17].
The Sci2Tool version 1.1 software [18] was used for the bibliometric analysis and for constructing the article citation network and the keyword networks. For analyzing the articles that most impacted the area, the article-to-reference network was built, because it provides an overview not only of the most cited articles but also of their most cited references, making it possible to identify those which contributed to articulating the theoretical foundations of the area.

Fig. 1. Workflow conducted for analyzing the sample selected

Content analysis allows analytical flexibility for defining the codification scheme, which is then used in the code occurrence statistics and their relationships, as well as in the qualitative interpretation analysis [15]. Content analysis was used to identify the key issues and gaps in the literature, in addition to the type of research, the nature of the data and the affiliations of the authors.
For content analysis and codification, 11 articles were selected from the original sample of 288 items. The selection criterion was to extract the articles that received two or more citations from other articles in the sample. These articles were read and classified according to the coding scheme presented in Table I.
First, the articles were divided into two groups, according to the type of study: conceptual (theoretical/conceptual, modeling, literature review, and simulation) or empirical (survey, case study, experimental, action research, and technology). Then they were classified according to the affiliation of the authors; finally, they were evaluated as to the nature of the variables used in the research (quantitative and qualitative).

III. RESULTS

A. Bibliometric Analysis

TABLE I. Code scheme used to classify selected works (adapted from [19]).
C1 - Kind of study               | C2 - Authors affiliation   | C3 - Approach
TE1 - Modeling                   | A1 - University            | ND1 - Quantitative
TE2 - Theoretical / Conceptual   | A2 - Research institution  | ND2 - Qualitative
TE3 - Literature review          | A3 - Company               |
TE4 - Simulation                 |                            |
TE5 - Survey                     |                            |
TE6 - Case study                 |                            |
TE7 - Action research            |                            |
TE8 - Experimental               |                            |
TE9 - Technology                 |                            |
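The coding scheme of Table I can be written down as a small data structure, which makes the conceptual/empirical grouping described in the text explicit. The following is only a minimal sketch: the names `KIND_OF_STUDY`, `study_group`, and `CodedArticle`, as well as the example record, are hypothetical and are not taken from the original study or from [19].

```python
from dataclasses import dataclass, field

# Codes transcribed from Table I.
KIND_OF_STUDY = {
    "TE1": "Modeling",
    "TE2": "Theoretical / Conceptual",
    "TE3": "Literature review",
    "TE4": "Simulation",
    "TE5": "Survey",
    "TE6": "Case study",
    "TE7": "Action research",
    "TE8": "Experimental",
    "TE9": "Technology",
}
AFFILIATION = {"A1": "University", "A2": "Research institution", "A3": "Company"}
APPROACH = {"ND1": "Quantitative", "ND2": "Qualitative"}

# Grouping stated in the text: conceptual studies are theoretical/conceptual,
# modeling, literature review, and simulation; the remaining codes are empirical.
CONCEPTUAL = {"TE1", "TE2", "TE3", "TE4"}


def study_group(code: str) -> str:
    """Return 'conceptual' or 'empirical' for a C1 (kind of study) code."""
    if code not in KIND_OF_STUDY:
        raise ValueError(f"unknown kind-of-study code: {code}")
    return "conceptual" if code in CONCEPTUAL else "empirical"


@dataclass
class CodedArticle:
    """One classified article; an article may carry several codes."""
    ref: str
    kinds: set = field(default_factory=set)
    affiliations: set = field(default_factory=set)
    approaches: set = field(default_factory=set)

    def groups(self) -> set:
        """Study groups (conceptual and/or empirical) covered by the article."""
        return {study_group(kind) for kind in self.kinds}


# Hypothetical record, for illustration only (not a row of Table IV).
example = CodedArticle(
    ref="[x]", kinds={"TE3", "TE5"}, affiliations={"A1"}, approaches={"ND2"}
)
print(sorted(example.groups()))
```

Keeping each dimension (C1, C2, C3) as a set reflects the fact that a single article can be associated with more than one code.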
The 288 articles selected were published in 152 journals. Table II shows the number of publications per year, from 2009 to 2015, of the journals with at least four publications.
The first publications appeared in 2009, which shows that this is a recent research topic. Eight journals published approximately 19.5% of the sample: Computer, IBM Journal of Research and Development, Accounting Horizons, International Journal of Production Economics, Software Practice & Experience, IEEE Network, International Journal of Communication and Communications of the ACM. However, the vast majority of the selected articles, corresponding to approximately 67%, appeared in 143 different journals with three or fewer publications.
Among the journals shown in Table II, three focus on areas beyond technology: Accounting Horizons (targeting academic and professional audiences and addressing issues related to accounting); International Journal of Production Economics (interdisciplinary, focusing on the interface between engineering and management, and also covering issues related to manufacturing, industrial processes and production); and MIT Sloan Management Review (which discusses advances in management practices that are transforming how people lead and innovate).
From the analysis of Table II, it is also possible to identify a growing trend in the number of publications over the years, ranging from 1 publication in 2009 to 116 in 2015.

TABLE II. Journal and period publication distribution
(year columns in the original: 2009, 2011, 2012, 2013, 2014, 2015; yearly counts are listed in column order, empty cells omitted)
JOURNAL                                                                     | JCR   | YEARLY COUNTS | TOTAL
Computer                                                                    | 1.443 | 5, 3, 3       | 11
IBM Journal of Research and Development                                     | 0.688 | 6, 4, 1       | 11
Accounting Horizons                                                         | 0.881 | 8             | 8
International Journal of Production Economics                               | 2.752 | 1, 5          | 6
Software Practice & Experience                                              | 0.897 | 1, 4          | 5
IEEE Network                                                                | 2.540 | 4, 1          | 5
International Journal of Communication                                      | 0.618 | 1, 4          | 5
Communications of the ACM                                                   | 3.621 | 1, 1, 2, 1    | 5
Information Sciences                                                        | 4.038 | 3, 1          | 4
International Journal of Distributed Sensor Networks                        | 0.665 | 2, 2          | 4
Cluster Computing - The Journal of Networks Software Tools and Applications | 1.510 | 4             | 4

... articles was represented as a link. After building the paper citation network, the articles with at least two citations were selected; they are shown in Table III.

TABLE III. Most cited articles from the sample
#  | Title | Authors | # Citations
1  | Business Intelligence and Analytics - From Big Data to Big Impact | [1] | 17
2  | Critical Questions for Big Data | [20] | 13
3  | The pathologies of big data | [21] | 10
4  | Data Science, Predictive Analytics, and Big Data - A Revolution That Will Transform Supply Chain Design and Management | [22] | 5
5  | Data-intensive applications, challenges, techniques and technologies - A survey on Big Data | [23] | 5
6  | Data mining with big data | [24] | 4
7  | The meaningful use of big data - four perspectives four challenges | [25] | 3
8  | Data quality for data science, predictive analytics, and big data in supply chain management - An introduction to the problem and suggestions for research and applications | [26] | 2
9  | A cubic framework for the chief data officer - Succeeding in a world of big data | [27] | 2
10 | Hazy - making it easier to build and maintain big-data analytics | [28] | 2
11 | Data quality management, data usage experience and acquisition intention of big data analytics | [29] | 2

B. Content Analysis and Coding
The results shown in Table IV were obtained from the coding scheme (Table I) used to classify the selected works (Table III). The cells of Table IV indicate the presence of terms associated with the respective code in each of the 11 selected articles.
The following sub-sections discuss the relevant points of each of the 11 works presented, based on the coding scheme.

University/Research Institution/Enterprise (A1-A2-A3)
The articles selected include a representative set of technical works related to Big Data applications. Most articles are from universities and research institutions (A1-A2), with little association with enterprise works, which suggests that the Big Data paradigm is still new.

TABLE IV. Code schema associated with selected works
# | Article/Author | TE1 | TE2 | TE3 | TE4 | TE5 | TE6 | TE7 | TE8 | A1 | A2 | A3 | ND1 | ND2
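The selection rule described for Table III (articles that received at least two citations from other articles in the sample) amounts to counting incoming links in the citation network. The sketch below illustrates that step under stated assumptions: the function name `most_cited`, the article identifiers, and the toy link list are hypothetical, and this is not the procedure implemented in Sci2Tool.

```python
from collections import Counter


def most_cited(citations, min_citations=2):
    """Rank articles by in-sample citations and keep those above a threshold.

    citations: iterable of (citing_id, cited_id) pairs, where both articles
    belong to the sample. Duplicate links and self-citations are ignored.
    """
    unique_links = {(src, dst) for src, dst in citations if src != dst}
    counts = Counter(dst for _, dst in unique_links)
    selected = [(art, n) for art, n in counts.items() if n >= min_citations]
    # Most cited first; ties are broken by identifier for a stable order.
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# Toy citation network (hypothetical identifiers, not the real sample).
links = [
    ("p1", "p3"), ("p2", "p3"), ("p4", "p3"),
    ("p1", "p2"), ("p4", "p2"),
    ("p3", "p5"),
    ("p1", "p2"),  # duplicate link, counted once
]
ranking = most_cited(links)
print(ranking)
```

In this toy network only `p3` and `p2` reach the threshold of two citations, so `p5`, cited once, is left out, mirroring how the 288-article sample was reduced to the eleven most cited works.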
Quantitative (ND1)
It is also possible to notice that Analytics has been used since 2000 and has recently evolved into Big Data Analytics (ND1).
For example, [1] presented a trend of Business Intelligence and Analytics (BI&A) works between 2000 and 2011, with 3,602 works classified in Business Intelligence (3,146), Business Analytics (213), and Big Data (243). Business Analytics and Big Data started to emerge in 2007; 2011 was the most expressive year, with publications related to Business Intelligence (338), Business Analytics (126), and Big Data (95).

Qualitative (ND2)
As a qualitative approach (ND2), [1] presented an overview of the impact that Big Data had on some Business Intelligence applications. For example, in "E-Commerce and Market Intelligence", the user data logs, records, and contents generated by customers can affect their loyalty and satisfaction.
Reference [20] presented six issues raised by the emergence of Big Data: new knowledge interpretation, subjective data analysis, the importance of a systematic and rigorous approach to data collection and analysis, Big Data contextualization, data accessibility versus ethics, and public versus private access.
Reference [21] presented the pathologies of Big Data that emerge from the increasing quantity of data, despite the appearance of high-performance devices.
Reference [22] presented the causes that affect the Big Data attributes of volume, velocity and variety. For sales data, for example, volume grows with more detail about each operation, including prices, dates, and customers; velocity with the move from monthly and weekly to daily or hourly frequency; and variety with direct, distributor, internet and competitor sales.
Reference [23] discussed opportunities and challenges for US government sectors, which believe in the usefulness of Big Data for data-intensive decision making. The challenges regard the analysis of Big Data, taking into account inconsistencies, incompleteness, scalability, timeliness, and data security.
Reference [24] treated Big Data as large-volume, heterogeneous, autonomous sources with distributed and decentralized control, which seeks to explore complex and evolving relationships among data. The challenges of Data Mining with Big Data span three layers: data access and computation; privacy and domain knowledge; and Big Data mining algorithms.
Reference [26] presented the data quality problem in the Supply Chain Management (SCM) context and proposed methods of monitoring and controlling data quality based on Statistical Process Control (SPC).

Modeling (TE1)
Among the studies conducted, only one of the eleven selected articles presented a framework for Big Data processing [24] (TE1). The framework consists of three layers: 1) Big Data Mining Platform, which accesses and computes low-level data; 2) Information Sharing, Data Privacy, Big Data Applications and Knowledge, which deals with user privacy issues, high-level semantics and knowledge of the application domain; and 3) Big Data Mining Algorithms, which performs the data mining.

Theory (TE2)
The selected works presented technologies and correlated research areas, issues that still require more definitions and research on using Big Data, a framework for processing Big Data, and methods of monitoring and controlling data quality (TE2).
Reference [1] presented the foundational technologies, which are mature and well evolved, and some emerging research being developed in Analytics. The Analytics types highlighted are (Big) Data Analytics, Text Analytics, Web Analytics, Network Analytics, and Mobile Analytics.
Reference [20] presented critical issues about Big Data that require more research and definitions for expanding its usage: Big Data changes the definition of knowledge, which requires new functionalities for searching and archiving; data interpretation requires objectivity and accuracy; and data must be collected and analyzed in a rigorous and systematic way, independently of their size.
Reference [24] presented the HACE theorem, characterizing Big Data by its Heterogeneous, Autonomous, Complex and Evolving features, and proposed a framework for Big Data processing from a Data Mining perspective in three layers (data accessing and computing, data privacy and domain knowledge, Big Data mining algorithms).

Literature Review (TE3)
Articles presenting a literature review (TE3) showed a general view of concepts but, according to [1], because interest in Big Data is recent, there is still no significant amount of work discussing the many trends. In this sense, the work of [1] aimed to analyze the impact that Business Intelligence & Analytics (BI&A) had on the currently growing importance of data in several critical areas, including commerce, government, science and technology, health, and public security and safety.

Survey (TE5)
Some authors presented research on predictions, challenges and opportunities of Big Data (TE5). For example, [1] presented some predictions about the future impacts of Big Data on some BI&A applications. For E-Commerce and Market Intelligence, the authors highlighted that long-tail marketing, personalized and targeted recommendation, sales increase, and customer satisfaction are important issues.
Reference [23] presented the opportunities and challenges of Big Data. US government sectors believe in the usefulness of Big Data for data-intensive decision making.
Reference [29] presented research based on the theoretical perspective of data management among 306 selected industries with experience in using internal and external data, in which the effort in data quality revealed a positive effect on Big Data Analytics adoption.
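Data quality recurs in the works above: [26] proposes monitoring it with Statistical Process Control, and [29] relates it to Big Data Analytics adoption. As a minimal sketch of the SPC idea, the code below computes three-sigma limits for a proportion chart (p-chart) over batches of records. The choice of a p-chart, the function names, and the batch figures are assumptions made for illustration; the text does not state which control charts [26] uses.

```python
import math


def p_chart_limits(defect_counts, sample_size):
    """Three-sigma control limits for the fraction of defective records.

    defect_counts: number of defective records found in each batch.
    sample_size: number of records inspected per batch (assumed constant).
    Returns (lower_limit, center_line, upper_limit).
    """
    if sample_size <= 0 or not defect_counts:
        raise ValueError("need at least one batch and a positive sample size")
    p_bar = sum(defect_counts) / (len(defect_counts) * sample_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lower = max(0.0, p_bar - 3 * sigma)
    upper = min(1.0, p_bar + 3 * sigma)
    return lower, p_bar, upper


def out_of_control(defect_counts, sample_size):
    """Indices of batches whose defect fraction falls outside the limits."""
    lower, _, upper = p_chart_limits(defect_counts, sample_size)
    return [
        i for i, d in enumerate(defect_counts)
        if not lower <= d / sample_size <= upper
    ]


# Hypothetical data: defective records per batch of 200 inspected records.
batches = [4, 6, 5, 3, 5, 7, 4, 30]
print(p_chart_limits(batches, 200))
print(out_of_control(batches, 200))
```

The limits here are estimated from the same batches that are being checked, which is the usual first pass when no historical baseline exists; with a trusted baseline, the center line would be fixed in advance instead.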
Case Study (TE6)
Reference [21] presented a case study (TE6) about some pathologies that originate from the large amount of data, which brings difficulties in using it. Other difficulties involve storage media able to deal with huge datasets, as well as the size of the data manipulated, considering analytical applications as the last stage of the data analysis. To deal with these difficulties, the authors proposed Distributed Computing as a strategy for Big Data.

Experimental (TE8)
Some authors discuss applications. For example, [1] discussed some applications of BI&A. In E-Commerce and Market Intelligence, examples are Recommender Systems, Social Media Monitoring and Analysis, and Social and Virtual Games. In E-Government and Politics 2.0, Ubiquitous Government Services, Citizen Engagement and Participation, and Political Campaign and e-Polling are the most important.
Reference [22] presented some potential applications of Big Data in logistics and SCM. In manufacturing, examples are fast response to positive and negative customer sentiment, stocks managed by retailers, efficient response to customers, improvement of the delivery tracking system regarding time and availability, and more effective monitoring of productivity. In retail, examples are mobile devices and customer sentiment data in stores, improvement of stock accuracy, linking of local traffic and weather, and work reduction due to fewer stock errors.
Reference [28] described GeoDeepDive, a demonstration project that illustrates the Hazy approach to the development of trained systems. The GeoDeepDive project had the support of geologists to do the linguistics and statistics analysis on 10-ton of journal articles in geology.

IV. DISCUSSION
As part of the results, this section addresses some concepts based on the most cited keywords presented in the eleven most cited articles.
From the extensive set of keywords identified, those with a frequency index greater than three were selected for the analysis, as shown in Table V, which also presents their relationship with the eleven most cited articles listed in Table III.
The first keyword, i.e., the one most mentioned across all articles, is Big Data. The considerations related to the other keywords are presented in an analytical and conceptual view relative to this most cited keyword.
As can be seen in Table V, the keyword Big Data Analytics was mentioned three times within the set of eleven articles, making it the most mentioned keyword in that set. All the others were cited only once.
With the set of keywords identified, it is possible to verify that there is a close relationship between some of them, promoting an integrated approach to specific concepts.
Big Data Analytics, Analytics, and Predictive Analytics are frequently cited keywords that give prominence to a term that has been associated with Big Data, which refers to Analytics or also to Big Data Analytics. The terms Big Data and Analytics have different definitions, sometimes interchangeable and sometimes generating conceptual conflict, as can be seen in [30], which describes that some executives are questioning whether Big Data is not just another way of saying Analytics.

TABLE V. Keywords with frequency index greater than 3 identified in the eleven most cited articles
Keywords                     | Frequency index
Big Data                     | 115
Cloud Computing              | 17
MapReduce                    | 12
Data Mining                  | 9
Hadoop                       | 9
Big Data Analytics           | 8
Analytics                    | 7
Social Media                 | 6
Data Analytics               | 5
Predictive Analytics         | 5
Machine Learning             | 5
Text Mining                  | 5
Internet of Things           | 4
Business Intelligence        | 4
Computational Social Science | 4

The notion of Analytics is quite variable, and the definitions found for this term generally refer to the extraction of knowledge from information. For [1], Analytics associated with Big Data is a term that has been used to describe the data sets and techniques in large and complex applications demanding advanced storage, management, analysis, and visualization technologies.
In more specific cases, the term Analytics is associated with Big Data, creating the term Big Data Analytics. Some definitions can also be identified for this term, which [29] describes as the technologies and techniques that a company can employ to analyze large-scale, complex data, for various applications intended to augment firm performance in various dimensions.
For [22], Predictive Analytics is a subset of data science with a number of related disciplines, such as Statistics, Forecasting, Optimization, Applied Probability, Data Mining, and Analytical Mathematical Modeling. Predictive Analytics comprises a variety of techniques that predict future outcomes based on historical and current data.
In the current scenario involving analytics and Big Data, it is relevant to present the classification proposed by [1] for emerging research on analytics associated with Big Data. The authors proposed five technical categories for analytics and Big Data: 1) (Big) Data Analytics, 2) Text Analytics, 3) Web Analytics, 4) Network Analytics, and 5) Mobile Analytics. It is also possible to verify in [5] other categories that were highlighted for analytics related to Big Data: 1) Structured
Analytics, 2) Text Analytics, 3) Web Analytics, 4) Multimedia Analytics, 5) Network Analytics, and 6) Mobile Analytics.
The concept of analytics has been presenting significant challenges when it comes to Big Data. Perhaps it is one of the most important challenges for this paradigm, as it refers to the semantic capacity of Big Data.
Although different organizations that need to integrate Analytics with Big Data face different challenges, [31] commented that the main barriers faced by organizations that adopt analytics with Big Data are managerial and cultural, and are not necessarily linked to data and technologies. For [32], the value of any analytics is largely related to the capacity that it offers to decision makers.
Following the integrated presentation of the keywords related to the analytics concept, three other keywords have high relevance and are also highlighted by researchers and the industry involved in Big Data: Machine Learning, Data Mining, and Business Intelligence.
These three keywords, mentioned in several articles, refer to broad concepts, so this article relates them by addressing them conceptually and by analyzing the integration that can be verified among these concepts.
The analysis of these three keywords has the objective of presenting their main features, concepts, and especially their relationship with Big Data.
The principle of Data Mining is to extract knowledge from information using algorithms, models, platforms, and specific technologies to meet this objective related to data management.
According to [1] and [33], some of the most influential Data Mining algorithms are C4.5, k-means, SVM, Apriori, EM (Expectation Maximization), PageRank, AdaBoost, kNN (k-Nearest Neighbors), Naive Bayes, and CART, covering problems of classification, clustering, regression, association analysis, and network analysis. However, these do not meet all the needs of Big Data, representing a clear technical barrier for this paradigm.
Mining operations in Big Data, as cited by [1], culminate in studies that turn to analytics to meet the demands of classification, prediction, regression, association rule, and clustering analysis, where machine learning and genetic algorithms have been contributing to the success of different data mining applications.
For [23], there is an integration of data analysis techniques involving Data Mining, Neural Networks, Machine Learning, Signal Processing, and Visualization Methods. It is possible to note that Machine Learning practices, Statistical Theories and Models, and Multivariate Analysis represent relevant data analysis techniques that culminate in Data Mining. Therefore, it can be concluded that Statistical Models contribute to the application of Machine Learning Methods in Data Mining tasks, favoring the activities of knowledge discovery in Big Data.
As examples, some technologies address the integration of practices and methods of Machine Learning and Data Mining applied to Big Data, such as: Hazy Project [28]; MLBase [34]; Weka [35]; and Stratosphere [36].
As mentioned by Chen et al. [1], Business Intelligence with Analytics corresponds to Big Data as an important area, both in the academic and in the corporate environments.
Reference [1] proposes the unification of Business Intelligence and Analytics, addressing questions about Big Data Analytics as a way to offer new directions to Business Intelligence and Analytics. The authors describe the evolution of Business Intelligence and Analytics as follows: BI&A 1.0, with a focus on structured data (DBMS-based, structured content); BI&A 2.0, considering the Internet as a new possibility for data collection and the advent of web analytics tools such as Google Analytics (Web-based, unstructured content); and BI&A 3.0, which covers the era of large-scale use of mobile phones and the beginning of the actions related to the new paradigm of the Internet of Things (Mobile and Sensor-based content).
Another keyword highlighted in Table V is Cloud Computing, in which virtualization technologies culminate in one of the most robust technologies for Big Data. Together with other technologies, it has become a mechanism capable of offering an effective response to the needs of petabyte-scale data, as stated in [23]. For [37], Cloud Computing is closely linked to Big Data and should be used for the management of huge computing and storage resources, providing computing capacity for Big Data applications. As complemented by [38], the cloud maintains more than the hardware; it gives customers a set of virtual machines for Big Data management.
As another form of high-volume data generation, Social Media is linked to Big Data, becoming an important instrument for decision-making in many sectors. Considering text messages exchanged by people, e-mail messages, and mechanisms for sharing images, Social Media has reached more and more users. According to [20], Social Media interaction is part of behavioral networks.
Another relevant point related to Social Media, presented by [1], is the creation of many techniques for the analysis of user opinion, text, and sentiment. These types of analysis of online community behavior and social networks culminate in what the authors call Network Analytics, which includes new techniques and computational models.
Two relevant keywords related to Big Data are Hadoop and MapReduce. Both have a frequency index greater than three, but they were not mentioned by the authors of the eleven most cited articles; since they have high impact and relevance for the Big Data paradigm, they are treated as follows.
Hadoop is an open-source framework that supports massive data storage and offers high scalability and distributed processing power to deal with huge amounts of data. Used with NoSQL databases, it can provide flexibility for Big Data applications [39]. Hadoop is also regarded as the best established software platform to support high data volumes, implementing a distributed computing paradigm called MapReduce, as stated in [23].
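The MapReduce paradigm named above can be illustrated with the classic word-count task. The following is a minimal single-process sketch, assuming three toy documents; the function names are illustrative and no Hadoop API is used. In a real Hadoop deployment the map and reduce calls would run in parallel on different nodes, and the grouping step would be performed by the framework.

```python
from collections import defaultdict


def map_step(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    mapped = (pair for doc in documents for pair in map_step(doc))
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


docs = ["big data analytics", "big data mining", "data quality"]
counts = map_reduce(docs)
print(counts)
```

Each document is an independent sub-problem for the map step, and each word is an independent sub-problem for the reduce step, which is what allows the work to be spread over many machines.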
Based on the divide and conquer method, the MapReduce computational model, introduced by Google and adopted by Yahoo! and other web companies, treats a complex problem as several sub-problems using the map step and the reduce step. The solution is obtained by combining the solutions of the sub-problems, which has led the Hadoop/MapReduce model to wide use for Big Data.
MapReduce as a programming model operates with two basic functions, map and reduce. Moreover, as part of Apache Hadoop, it has emerged as a very efficient model for Big Data processing, as stated in [40].
Thus, the concepts associated with the set of keywords involved with Big Data were covered, allowing a better understanding of this paradigm. Similarly, this approach can foster future research able to define and structure the concepts and applications related to Big Data to offer better results.

V. CONCLUSION
This work aimed to depict the panorama of published studies regarding Big Data, evaluating its maturity by following a consistent method of scientific research. For analyzing the published scientific articles, the "Web of Science" (WoS) database was used along with bibliometric analysis, following a specific workflow to favor the observation of the production and expansion of scientific articles on Big Data.
The bibliometric analysis yielded detailed results presented in two formats: the first from the perspective of the categories presented in Table IV, and the second based on the concepts observed from the keywords with the highest frequency index in the eleven most cited articles, as shown in Table V.
A detailed review and analysis of these studies showed that Big Data is a term involved in many research areas and presented at a variety of complexity levels.
Big Data scientific studies concern areas such as economics, business, healthcare, and information systems, among others. Some research works are related to new technologies, processes and models addressing Big Data architecture and infrastructure, as verified in [28] and [36]; others are linked to expectations about the responses Big Data can provide.
Thus, since the works have varying levels of complexity, it was possible to find articles with high scientific formalism alongside works that present a certain superficiality regarding results. Basic concepts observable in most studies are greatly repeated, which leads to a sense of solidification of this initial stage of research.
... improvements in sectors such as healthcare, science, transport, education, government services, among others.

ACKNOWLEDGMENT
We thank Programa de Educação Continuada em Engenharia (PECE) of the Polytechnic School of the University of São Paulo for supporting this work.

REFERENCES
[1] H. Chen, R. H. L. Chiang, V. C. Storey, “Business intelligence and analytics: from big data to big impact,” MIS Quarterly, vol. 36, pp. 1165-1188, 2012.
[2] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. H. Byers, “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute, 2011. http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation.
[3] P. Russom, “Managing Big Data,” 2013. http://tdwi.org/webcasts/2013/10/managing-big-data.aspx.
[4] M. Schroeck, R. Shockley, J. Smart, D. Romero-Morales, P. Tufano, “Analytics: The real-world use of big data,” IBM Global Business Services, vol. 12, pp. 1-20, 2012.
[5] H. Hu, Y. Wen, T. Chua, X. Li, “Toward Scalable Systems for Big Data Analytics: A Technology Tutorial,” IEEE Access, vol. 2, pp. 652-687, 2014. doi:10.1109/ACCESS.2014.2332453.
[6] R. T. Bedeley, L. S. Iyer, “Big Data Opportunities and Challenges: The Case of Banking Industry,” in Proceedings of the Southern Association for Information Systems Conference, Macon, GA, USA, pp. 1-6, 2014.
[7] NIST, Draft NIST Big Data Interoperability Framework: Volume 1, 2015.
[8] Y. Demchenko, C. de Laat, P. Membrey, “Defining architecture components of the Big Data Ecosystem,” in 2014 International Conference on Collaboration Technologies and Systems (CTS), IEEE, 2014. doi:10.1109/CTS.2014.6867550.
[9] IDC, “Worldwide Big Data Technology and Services forecast 2016-2020,” 2016. https://www.idc.com/getdoc.jsp?containerId=US40803116.
[10] Gartner, “Gartner Survey Shows More Than 75 Percent of Companies Are Investing or Planning to Invest in Big Data in the Next Two Years,” 2015. http://www.gartner.com/newsroom/id/3130817.
[11] A. Neely, “The evolution of performance measurement research: Developments in the last decade and a research agenda for the next,” International Journal of Operations & Production Management, vol. 25, no. 12, pp. 1264-1277, 2005. doi:10.1108/01443570510633648.
[12] S. Prasad, J. Tata, “Publication patterns concerning the role of teams/groups in the information systems literature from 1990 to 1999,” Information & Management, vol. 42, no. 8, pp. 1137-1148, 2005. doi:10.1016/j.im.2005.01.003.
[13] M. M. Kessler, “Bibliographic coupling between scientific papers,” American Documentation, vol. 14, no. 1, pp. 10-25, 1963. doi:10.1002/asi.5090140103.
[14] L. Ikpaadhindi, “An overview of bibliometric: its measurements, laws and their application,” Libri, vol. 35, 1985.
[15] V. J. Duriau, R. K. Reger, M. D. Pfarrer, “A content analysis of the content analysis literature in organization studies: Research themes, data sources, and methodological refinements,” Organizational Research Methods, vol. 10, no. 1, pp. 5-34, 2007.
There are some surveys on Big Data, many specific works doi:10.1177/1094428106289252.
related to a particular subject or area, and others that involve [16] M. M. Carvalho, A. Fleury, A. P. Lopes, “An overview of the literature
technologies to support Big Data. However, a slow on technology roadmapping (TRM): Contributions and trends,”
convergence of these studies or surveys was observed; again, Technological Forecasting and Social Change, vol. 80, no. 7, pp.1418–
1437, 2013. doi:10.1016/j.techfore.2012.11.008.
it is possible to note that although linearly the research works [17] M. M. Carvalho, P. Lopes, D. Marzagão, “Gestão de portfólio de
present some advance, in its integrative way, they do not projetos: contribuições e tendências da literatura,” Gestão & Produção,
converge. vol. 20, pp. 433–453, 2013.
Embracing challenges and powerful results, the Big Data [18] Sci2 Team, Science of Science (Sci2) Tool, 2009. https://sci2.cns.iu.edu.
[19] J. A. Carnevalli, P. C. Miguel, “Review, analysis and classification of
paradigm has been established in many areas, despite not the literature on QFD-Types of research, difficulties and benefits,”
presenting concrete and replicable results yet, but as a solid International Journal of Production Economics, vol. 114, no. 2, pp.737–
element of studies and research that may provide significant 754, 2008. doi:10.1016/j.ijpe.2008.03.006.
[20] D. Boyd, K. Crawford, “Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon,” Information, Communication and Society, vol. 15, no. 5, pp. 662–679, 2012. doi:10.1080/1369118X.2012.678878.
[21] A. Jacobs, “The pathologies of big data,” Communications of the ACM, vol. 52, no. 8, pp. 36–44, 2009. doi:10.1145/1536616.1536632.
[22] M. A. Waller, S. E. Fawcett, “Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management,” Journal of Business Logistics, vol. 34, pp. 77–84, 2013. doi:10.1111/jbl.12010.
[23] C. P. Chen, C. Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data,” Information Sciences, vol. 275, pp. 314–347, 2014. doi:10.1016/j.ins.2014.01.015.
[24] X. Wu, X. Zhu, G. Q. Wu, W. Ding, “Data mining with big data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97–107, 2014. doi:10.1109/TKDE.2013.109.
[25] C. Bizer, P. A. Boncz, E. L. Brodie, O. Erling, “The meaningful use of big data,” ACM SIGMOD Record, vol. 40, pp. 56–60, 2011. doi:10.1145/2094114.2094129.
[26] B. T. Hazen, C. A. Boone, J. D. Ezell, L. A. Jones-Farmer, “Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications,” International Journal of Production Economics, vol. 154, pp. 72–80, 2014. doi:10.1016/j.ijpe.2014.04.018.
[27] Y. Lee, S. Madnick, R. Wang, F. Wang, H. Zhang, “A cubic framework for the chief data officer: Succeeding in a world of big data,” MIS Quarterly Executive, vol. 13, pp. 1–13, 2014.
[28] A. Kumar, F. Niu, C. Ré, “Hazy: making it easier to build and maintain big-data analytics,” Communications of the ACM, vol. 56, no. 3, pp. 40–49, 2013. doi:10.1145/2428556.2428570.
[29] O. Kwon, N. Lee, B. Shin, “Data quality management, data usage experience and acquisition intention of big data analytics,” International Journal of Information Management, vol. 34, no. 3, pp. 387–394, 2014. doi:10.1016/j.ijinfomgt.2014.02.002.
[30] A. McAfee, E. Brynjolfsson, “Big data: The management revolution,” Harvard Business Review, vol. 90, no. 10, pp. 60–68, 2012.
[31] S. LaValle, M. S. Hopkins, E. Lesser, R. Shockley, N. Kruschwitz, “Analytics: The new path to value,” MIT Sloan Management Review, vol. 52, pp. 1–25, 2014.
[32] D. Kiron, P. K. Prentice, R. B. Ferguson, “Raising the bar with Analytics,” MIT Sloan Management Review, vol. 55, no. 2, 2014.
[33] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z. H. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008. doi:10.1007/s10115-007-0114-2.
[34] T. Kraska, A. Talwalkar, J. Duchi, R. Griffith, M. J. Franklin, M. Jordan, “MLbase: A distributed machine-learning system,” in 6th Biennial Conference on Innovative Data Systems Research (CIDR’13), Asilomar, California, USA, 2013.
[35] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, “The WEKA data mining software,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009. doi:10.1145/1656274.1656278.
[36] A. Alexandrov, R. Bergmann, S. Ewen, J. C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, F. Naumann, M. Peters, A. Rheinländer, M. J. Sax, S. Schelter, M. Höger, K. Tzoumas, D. Warneke, “The Stratosphere platform for big data analytics,” VLDB Journal, vol. 23, pp. 939–964, 2014. doi:10.1007/s00778-014-0357-y.
[37] M. Chen, S. Mao, Y. Liu, “Big Data: A Survey,” Mobile Networks and Applications, vol. 19, no. 2, pp. 171–209, 2014. doi:10.1007/s11036-013-0489-0.
[38] H. Demirkan, D. Delen, “Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud,” Decision Support Systems, vol. 55, no. 1, pp. 412–421, 2013. doi:10.1016/j.dss.2012.05.048.
[39] J. O. Chan, “An architecture for big data analytics,” Communications of IIMA, vol. 13, no. 2, 2013.
[40] S. Sharma, U. S. Tim, J. Wong, S. Gadia, S. Sharma, “A brief review on leading big data models,” Data Science Journal, vol. 13, pp. 138–157, 2014. doi:10.2481/dsj.14-041.