
A Systematic Review on Big Data Analytics Frameworks for Higher Education - Tools and Algorithms


David Otoo-Arthur
School of Computer Science and Applied Mathematics, University of the Witwatersrand
Private Bag 3, Wits, 2050, Johannesburg, South Africa
david.otoo- [email protected]

Terence Van Zyl
School of Computer Science and Applied Mathematics, University of the Witwatersrand
Private Bag 3, Wits, 2050, Johannesburg, South Africa
[email protected]

ABSTRACT
The development of Big Data applications in education has drawn much attention in the last few years because of the benefits it brings to teaching and learning. The integration of these Big Data applications in education generates massive data that places new demands on the technologies available for processing data and extracting useful information. Several higher educational institutions depend on the knowledge mined from these vast volumes of data to optimise the teaching and learning environment. However, Big Data in the higher education context has relied on traditional data techniques and platforms that are less efficient. This paper, therefore, conducts a Systematic Literature Review (SLR) that examines Big Data framework technologies in higher education, outlining gaps that need a solution in Big Educational Data Analytics. We achieved this by summarising the current knowledge on the topic and recommending areas where educational institutions could focus on exploring the potential of Big Data Analytics. To this end, we reviewed 55 related articles out of 1543 selected from six (6) accessible Computer Science databases between 2007 and 2018, focusing on the development of the Big Data framework and its applicability in education for academic purposes. Our results show that very few researchers have tried to address the integrative use of Big Data frameworks and learning analytics in higher education. The review further suggests that there is an emerging best practice in applying Big Data Analytics to improve teaching and learning. However, this information does not appear to have been thoroughly examined in higher education. Hence, there is a need for a complete investigation to come up with comprehensive Big Data frameworks that build effective learning systems for instructors, learners, course designers and educational administrators.

CCS Concepts
•Information systems ~ Database management system ~ Storage architectures ~ Applied computing ~ E-learning.

KEYWORDS
Big Data, Big Educational Data, Higher Education, Data Mining, Learning Analytics, MS

ACM Reference format:
David Otoo-Arthur, Terence Van Zyl. 2019. A Systematic Review on Big Data Analytics Frameworks for Higher Education - Tools and Algorithms. In Proceedings of 2019 International Conference on E-Business, Information Management and Computer Science (EBIMCS2019). December, 2019, Kuala Lumpur, Malaysia. ACM, New York, NY, USA. 9 pages. https://doi.org/10.1145/ 3377817.3377836.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
EBIMCS2019, December, 2019, Kuala Lumpur, Malaysia
© 2019 Copyright held by the owner/author(s). 978-1-4503-6649-6/19/12..$15.00
https://doi.org/10.1145/ 3377817.3377836

INTRODUCTION
The term Big Data is widely used to describe the massive volume and diversity of information. Many explanations given for Big Data focus on volume, variety, velocity, veracity and value. More precisely, De Mauro et al. [1] define Big Data as “the Information asset characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value.” The over-reliance on digital applications has spread the term across various sectors, including healthcare, banking, stock markets, customer trade, insurance, manufacturing, and communications and media. Figure 1 shows the Big Data Ecosystem from data creation to decision-making and depicts its major processes. Higher education is one of the fields where volume, velocity, variety and value of data coexist. The volumes of data captured and generated daily from various applications in higher educational institutions keep growing exponentially. With the fast pace of changing trends in higher education due to the inclusion of these digital technologies, there is a need to deal with the large volumes of data regularly [2].

Romero and Ventura [3] argue that these massive amounts of generated data can be examined to gain insight into students' learning, to improve educational outcomes and to explain educational phenomena. However, identifying a systematic survey that focuses on a Big Data framework for education and how it could influence educational reforms and enhance digital education governance remains a challenge. Presently, identifying an integrative Big Data framework for higher education is emerging as a problem [4]. Many frameworks proposed for Big Data analytics in higher education focus on predictive analytics. Building a data-sharing network and managing data generated from multiple sources is also a problem [5].

For this purpose, this paper undertakes a systematic literature review (SLR) to establish a general overview of the existing body of knowledge and to outline the gaps that need to be addressed in Big Data analytics in higher education. We intend to answer the following questions to provide direction to researchers about the unexplored areas.

RQ 1: Do higher educational institutions include any Big Data and related frameworks in addressing concerns and challenges of improving teaching and learning?
RQ 2: Which tools and algorithms are used to model academic performance in the field of education?
RQ 3: What is the educational community forecasting?

The rest of the paper is organised as follows. We present the methodology in section 2. Section 3 discusses the results, and finally, we conclude the study in section 4.

Figure 1: Big Data Ecosystem (From Data Acquisition to Decision-Making)

RESEARCH METHODOLOGY
For this review, we followed the methodology of Kofod-Petersen [6], which emphasises getting a snapshot of the research conducted in a particular field of study. We focus on works that combine Educational Data Mining, Learning Analytics, Big Data frameworks in education, and tools and algorithms for modelling students’ academic performance.

Inclusion and Quality Standards
We selected research works from 2007 to 2018 that deal with Big Data frameworks, Educational Data Mining frameworks for academic performance prediction and frameworks for Learning Analytics. Our focus was to get relevant works from reputable databases that address issues of Big Data and higher education concerning frameworks, tools and algorithms.

Search Process
We covered six (6) primary data sources: ACM, IEEE, Springer, Elsevier, SAGE and ISI Web. The search for research work on Big Data frameworks for modelling students’ academic performance returned 286 studies from ACM, 428 from IEEE, 193 from Springer, 203 from Elsevier, 137 from SAGE and 296 from ISI Web, a total of 1543. Based on the title, we rejected 1061 papers. We further rejected 155 studies whose abstracts were not thematically related to our central idea. Again, we rejected 92 studies based on the general study of the paper. Based on the primary study, we rejected 88 studies. We considered a further 92 studies not detailed enough and rejected them. These steps narrowed the studies down to 55, which we finally used for the SLR. Figure 2 shows an overview of the selection and rejection procedure adopted for this study. Table 1 provides the search criteria that produced the results for the study.

Table 1: Search Criteria

S/No. | Related String | Structure of the Search Strings
1 | Big Data Framework | Big Data framework AND Education OR Big Data AND Higher Education OR Big Data AND E-learning AND Blended Learning
2 | Education | Big Data AND (Students OR Students Performance OR Pedagogy OR Schools OR Educational Data Mining OR Learning Analytics)

Quality of Assessment

To ensure the quality of the SLR, we examined the most recent and standard papers from the six selected databases. These papers were of two types, journal papers and conference papers, as shown in Table 2.

Table 2: Research Database and Type of Publication

Database | Type | References | Total
ACM | Conference | [7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17] | 11
ACM | Journal | - |
IEEE | Conference | [18],[19],[20],[21],[22],[23],[24],[25],[26],[4],[27],[28],[29] | 13
IEEE | Journal | - |
Springer | Conference | [30],[31],[32] | 7
Springer | Journal | [33],[34],[35],[36] |
Elsevier | Conference | - | 11
Elsevier | Journal | [37],[38],[39],[40],[41],[42],[43],[44],[45],[46],[47] |
SAGE | Conference | - | 9
SAGE | Journal | [48],[49],[50],[51],[52],[53],[54],[55],[56] |
ISI Web | Conference | [57] | 4
ISI Web | Journal | [58],[59],[60] |

Extraction and Synthesis of Data
Data abstraction and synthesis involve selecting the results from the pool of studies relevant to the review questions. Table 3 shows the collection of evidence from the 55 selected studies.

Table 3: Data Extraction and Synthesis Specimen

Narration | Aspect
Meta Data Statistics | Type of Paper (Conference or Journal), Title, Author(s), Publication Year, Volume, Issue, Pages
Data Extraction |
Abstract | Fundamental objective espoused by the study
Case Studies | Different Case Studies most selected papers used
Results | Selected studies results
Assessment | Various criteria measures are used to evaluate the results
Data Synthesis |
Research | Research Study data sources are outlined in Table 2
Frameworks | Identified BD frameworks are outlined in Table 4
Tools and Algorithms | Tools and algorithms used are detailed in Tables 5 and 6
Focus | Identified Areas of Predictions in Education are detailed in Table 7

Figure 2: An Overview of the Selection Process

RESULTS AND ANALYSIS
The motivation of this study was to get an idea of the Big Data frameworks, tools and algorithms used broadly in higher education. The results are presented in the following sequence: 1) Big Data frameworks, 2) tools and algorithms, and 3) what the education community is forecasting.

Figure 3: Year-Wise Distribution of Research Papers from 2007 to 2018. We see a sharply increasing interest in Big Data in education in recent years.

RQ 1: Do higher educational institutions include any Big Data and related frameworks in addressing concerns and challenges of improving teaching and learning?
To address RQ1, we focus on the frameworks that integrate Big Data and Educational Data Mining, Learning Analytics and related fields. Mainly, we highlight Big Data tools and their applications to education, and frameworks that provide a detailed description and have

a strong theoretical basis. From Figure 3, we note that the integration of Big Data into education is a new and evolving endeavour. Most of the studies selected were published between 2015 and 2018, with 2017 recording the highest number (20) of studies that focused on Big Data and Education. The upsurge in 2017 is mainly due to the quest to build effective learning systems based on Big Data to support rapid and timely learning analytics. A notable number of higher educational institutions were utilising Big Data technologies to handle the massive data generated within their settings and to gain a competitive advantage in the education space, as highlighted by the authors in [29], [34], [61], [62]. Our comprehensive checks further revealed ten (10) Big Data frameworks related to educational settings, as shown in Table 4.

Table 4: Identified Big Data Frameworks for Education

S/No | Big Data Framework | Work
1 | A Big Data Framework for Early Identification of Dropout Students in Massive Open Online Course (MOOC) | [30]
2 | Big Data Integration for Transition from e-Learning to Smart Learning Framework | [20]
3 | Processing of Big Educational Data in the Cloud using Apache Hadoop | [24]
4 | Five-Sided Educational Data Mining Framework (5S-EDMF) | [28]
5 | Data Science in Education: Big Data and Learning Analytics | [4]
6 | Implementation of Learning Analytics Framework for MOOCs using state-of-the-art In-Memory Computing | [23]
7 | A Big Data Analytic Framework for investigating Streaming Educational Data | [15]
8 | Big Data for Online Learning Systems | [34]
9 | A Novel Adaptive e-Learning Model Based on Big Data by using competence-based Knowledge and Social Learner Activities | [43]
10 | Harnessing the power of Big Data Analytics in the Cloud to support Learning Analytics in Mobile Learning Environment | [46]

Frameworks Review

3.1.1 Framework by Tang et al. (2015)
Tang et al. [31] investigated the application of Big Data methods to identify students who are likely to drop out of a MOOC. To this end, they proposed an automatic framework that used historical data to construct a classification model to identify potential dropout students. Their framework presented six (6) modules: data collection, data processing, model construction, online classification and result presentation. Although their framework achieved an average accuracy of 94.9% on a real dataset, the inclusion of user roles, storage infrastructure, data types and visualisations in the result would have led to a better understanding of the usage and impact of this framework.

3.1.2 Framework by Udupi, Malali and Noronha (2016)
Udupi, Malali and Noronha [20] presented a Smart Learning framework that integrates E-Learning, Big Data and Smart Technology. The e-Learning framework (layer 1) focused on pedagogical activities and educational technologies comprising three sub-layers that enabled data processing. Information from the e-learning systems is synthesised from its sources into three forms (contents, user information and data evaluation) and then passed on to the Big Data framework (layer 2) for data extraction, exploration and analytics. At the top layer (layer 3) are the smart learning systems used by learners. Although their framework integrates Big Data into teaching and learning, detailed explanations of analytics techniques, data storage mechanisms and report visualisation are lacking. Their proposed concept also failed to capture administrative data, demographic data and unstructured data.

3.1.3 Framework by Machova, Komarkova and Lnenicka (2016)
Machova, Komarkova and Lnenicka [25] recommended an Apache Hadoop cluster to examine complex student interaction data from a Moodle system on the cloud. Apache Hadoop, MapReduce, OpenStack, and Ubuntu Servers anchored their framework. Although their framework raises the issue of data security, it suffers from several limitations; in particular, the authors restricted their work to historical data and did not discuss related aspects such as analytic techniques, user roles and data security.

3.1.4 Framework by Zeng (2017)
Zeng [28], [29] proposed a five-sided educational data-mining framework (5S-EDMF) to analyse college students' diligence and effectiveness of study and to recommend learning resources accordingly. Their framework discusses the sources of Big Data in education and how these captured data support teaching and learning. The framework comprised four main modules:
1. Data acquisition module outlining the various sources of data from the national level to the individual learner.
2. Data processing module where the framework processes structured, semi-structured, and unstructured historical data.
3. Data mining analysis module which invokes the mining algorithm in Mahout to carry out parallel computation on data to obtain hidden knowledge pattern(s), and
4. Data application module that deals with the presentation and visualisation of data analysis results for decision-making.
Their framework, however, failed to consider streaming data. While their framework highlights significant issues such as analytics techniques and visualisation, these are not detailed.

3.1.5 Framework by Klašnja-Milicevic, Ivanovic and Budimac (2017)
Klašnja-Milicevic, Ivanovic and Budimac [4] also suggested a framework for analysing and processing multi-structured data sets for higher learning institutions. Their work proposed five modules that anchored the framework: Data Capture and Collection, ETL (Extractions, Transformations, Loading), Hadoop platform,

Analysis Engine and Presentation Layer. The framework aims at improving teaching and learning by identifying the roles of actors, namely learners, teachers/researchers and administrators/data scientists. Their framework captures various sources of data in education, manages these data within the Hadoop platform and stores them in a compatible repository. The analysis engine executes predefined and standard procedures and other complex statistical analyses. The presentation layer provides a user-friendly graphical visualisation interface for the users of the system. Even though their framework is comprehensive and elaborate, the authors failed to delve deep into the analytics techniques. How their framework would secure data and optimise this platform with other learning services is also not mentioned. Besides, no real data was used to test the framework.

3.1.6 Framework by Laveti et al. (2017)
Laveti et al. [24] proposed a learning analytics workflow for MOOCs that applied the Spark framework. The framework advanced three main components: Data Store, Processing and Visualisation, and Reporting and Visualisation. Their framework primarily focused on machine learning algorithms and their application to analytics in education. Although they tested their framework with a real dataset, the type of data was not made explicit. The data captured were historical and excluded streaming data. Moreover, the authors failed to explain how the framework would handle the data storage process and its related security issues.

3.1.7 Framework by Yang et al. (2017)
Yang et al. [15] proposed a framework for learners' streaming data using term frequency-inverse document frequency (TF-IDF) and fuzzy representation techniques to uncover significant patterns from usage data. Their work suggested three critical stages:
1. Data collection stage where students' metadata and behavioural data were collected,
2. Data persistence stage which comprises Kafka and storage systems for streaming and storing data, and
3. Data mining stage where they extracted knowledge from data.
Besides, the framework adopted a unified storage mechanism, which combined traditional RDBMS, HDFS and NoSQL, intending to handle small to large amounts of structured and semi-structured data, and streaming data. However, their framework could not handle unstructured data and left out features such as reporting and visualisation.

3.1.8 Framework by Dahdouh et al. (2018)
Drawing on a broad range of Big Data technologies, Dahdouh et al. [35] proposed an architecture framework that integrates traditional e-Learning systems with cloud computing. The architecture provided a detailed description of how Hadoop and other Big Data technologies (Spark and NoSQL) could work together. Four layers characterised their framework: cloud infrastructure, Big Data, e-Learning and user. At the infrastructure layer are the hardware resources, built by virtualising compute, storage and network resources. This layer provides abstraction and flexibility when managing various hardware resources. The second layer, Big Data, provides various Big Data tools for data storage, processing, analysing, optimisation, and visualisation. User profiles, enrolment facts and other relevant contents are synthesised at the third layer (e-Learning) and presented to the actors of the framework at layer four. Despite proposing an extensive and detailed framework, the authors failed to discuss the pedagogical design of online learning tasks for the various users of the architecture. Again, their framework did not look at analytics methods and how learners’ behaviour could be monitored in a real-time environment, as suggested by the authors for future work.

3.1.9 Framework by Birjali, Beni-Hssane and Erritali (2018)
Birjali, Beni-Hssane and Erritali [44] offered an adaptive e-learning model architecture based on Big Data technology and optimisation algorithms. The model presented two levels of adaptation. The first level employed a MapReduce-based genetic algorithm (GA) to retrieve the necessary future educational objectives (FEO) based on the learner's prerequisites, obtained through a learner e-assessment method. Using the FEO, a MapReduce-based Ant Colony Optimisation (ACO) generates an adaptive personalised learning path (PLP) that contains the learning content that the learner needs. The second level of their framework determined social indicators from social networks that correlate with learning activities through social network analysis (SNA). Despite producing a framework that blends Big Data technologies with an e-learning platform, the framework failed to look into types of data such as administrative and demographic data. Besides, the framework did not include any mechanisms for securing data and visualisation.

3.1.10 Framework by Shorfuzzaman et al. (2018)
Shorfuzzaman et al. [47] proposed a cloud-based mobile learning framework that uses Big Data analytics techniques to extract value from a vast volume of mobile learners' data. Their framework provided on-demand scalable computing and data storage resources. However, the framework failed to consider administrative and demographic data, real-time learning analytics and result visualisation, as suggested by the authors for future work.

Big Data Frameworks Correlation
In general, we consider the framework in this study as a conceptual or real-time architectural model intended to guide or support Big Data analytics in higher education.
Our frameworks review revealed several similarities and gaps within the ten (10) identified frameworks. Specifically, the frameworks present four key thematic modules. These modules include data capture and collection, data processing and visualisation, model construction and data mining, and result presentation and visualisation. We found that the authors failed to address thoroughly the issues of data security, privacy, ownership and the theories that support the whole data cycle in education.
Demchenko, de Laat and Membrey [64] suggest that Big Data infrastructure should support the whole data lifecycle, data security and data ownership protection. Similarly, the authors in [11], [65], [66] argue that the impact of Big Data thrives not only on building effective models, but also on informed theories which serve as the basis for analysing large-scale data. Therefore, having an educational framework that integrates these pertinent issues is imperative.
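As an illustration only, the four thematic modules identified above can be read as stages of a single pipeline. The sketch below is a toy, single-process Python version and is not taken from any of the reviewed frameworks. The `Event` schema, the function names, the sample rows and the low-activity rule are all hypothetical stand-ins chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One learner interaction record (hypothetical schema)."""
    student_id: str
    action: str


def capture(raw_rows):
    """Module 1, data capture and collection: parse raw rows into events."""
    return [Event(student_id=s, action=a) for s, a in raw_rows]


def process(events):
    """Module 2, data processing: drop records with missing fields."""
    return [e for e in events if e.student_id and e.action]


def mine(events, min_actions=2):
    """Module 3, model construction and data mining.

    A stand-in rule, not a trained model: flag students whose activity
    count falls below min_actions.
    """
    counts = Counter(e.student_id for e in events)
    return {sid: n < min_actions for sid, n in counts.items()}


def present(flags):
    """Module 4, result presentation: one report line per student."""
    return [f"{sid}: {'low activity' if low else 'active'}"
            for sid, low in sorted(flags.items())]


# Hypothetical raw rows: (student_id, action); the last row has no student id.
raw = [("s1", "view"), ("s1", "quiz"), ("s2", "view"), ("", "view")]
report = present(mine(process(capture(raw))))
```

Keeping each module as a separate function mirrors the separation of concerns that the reviewed frameworks share. The gaps noted above (security, privacy and ownership) would cut across all four stages rather than fit into any single one.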

RQ 2: Which tools and algorithms are used to model academic performance in the field of education?
In the comparison of algorithms shown in Table 6, we observed nineteen (19) key algorithms, alongside thirty (30) Big Data tools recorded in Table 5, from the 55 selected studies. Our findings show that MapReduce, regression, SVM and decision trees are the most commonly used algorithms in education. In general, most related studies focused on supervised learning techniques. Although we found several techniques, there is still a gap in finding efficient and effective models for predictions in higher education [67].

Table 5: Big Data Tools for Big Education Data

Category | Big Data Technologies | Works
Middleware | Storm, Spark, Hadoop¹, Kafka, Mahout, Flume | [57],[17],[39],[18],[41],[15],[23],[53],[34],[28],[24],[31],[43],[46]
Programming Language | Java, Python, Sqoop, R | [17],[24],[53],[23],[33],[52],[16],[34]
Databases | MongoDB, SQL, MySQL, NoSQL, Cassandra | [38],[17],[18],[41],[15],[53],[34]
Mining Tools | STATA, SPSS, Vertica, WEKA, KNIME, KEEL, RapidMiner | [53]
ETL Tools | Scriptella, Pentaho, KETL, OpenRefine | [33],[46]
Visualisation Tools | Kibana, Elastic Search, Tableau | [53],[46]

¹ The category of Hadoop includes HDFS, YARN and Common.

Figures 4 - 9 illustrate the usage trends of Big Data tools in higher education. We observed six (6) categories of Big Data tools that educational institutions are using within their space. The Hadoop framework and Python were noted to be the most widely used tools within the educational community. Besides, fewer related works used mining, ETL and visualisation tools. Collecting, transforming and assembling data in an automated manner is essential in the era of Big Data, hence the need for ETL tools.
Notably, traditional ETL tools have some limitations, such as depriving the user of a full view of the data and of the ability to decide which parts are useful. Only one of the reviewed studies, Dahdouh et al. [35], used ELT, which has the potential to eliminate the shortfall of ETL. Integrating these two approaches could yield high-quality data for making reliable decisions in education, since each leverages the other's strengths.

Figure 10: Year-Wise Distribution of Big Data Tools Used in Education²

² The Other category includes SPSS, STATA, Vertica, WEKA, KNIME, KEEL, Tableau, Cassandra, Kibana, Elastic Search, OpenRefine, Scriptella, Pentaho, KETL. Each tool within this category appeared once.

While Spark remains a more suitable middleware option in terms of performance and parallelisation [68], we recognised that Hadoop and MapReduce have been widely deployed tools in education. This is seen clearly in Figure 10, where we note the consistent dominance of Hadoop along with an ever-increasing variety of other tools in recent years. However, Grolinger et al. [69] and Gupta et al. [70] point out that Hadoop comes with some challenges, such as difficulty in programming (thus requiring abstraction), ineffectiveness in processing streaming data and an inability to execute in-memory jobs efficiently because of the slow nature of Hadoop MapReduce.
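To make the programming model discussed above concrete, the following is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It does not use the Hadoop API and is not drawn from any reviewed study. The activity log, the `(student_id, action)` record layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (student_id, 1) pair per activity-log record."""
    student_id, _action = record
    yield student_id, 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the values collected for one key."""
    return key, sum(values)


# Hypothetical activity log: (student_id, action)
log = [("s1", "login"), ("s2", "login"), ("s1", "quiz"), ("s1", "forum")]

pairs = [pair for record in log for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Even a simple per-student count has to be expressed as separate map and reduce functions joined by a grouping step, which suggests why the programming difficulty noted above is usually addressed with higher-level abstractions.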
Table 6: Comparison of Algorithms used for Modelling Students' Performance

Category | Technique | Work
 | Decision Tree³ | [9],[30],[23],[27]
Supervised Learning | SVM⁴ | [28],[23],[27],[44],[46]
 | Random Forest | [23],[27],[46]
 | Neural Networks⁵ | [27],[46]
 | Association Rule Mining | [38],[15]
Unsupervised Learning | Association Rule Mining | [38],[15]
 | Regression⁶ | [17],[37],[10],[23],[42],[27]
 | Bayesian Knowledge Tracing | [11],[53]
Statistics-Based Learning | Bayesian Estimation (Latent Dirichlet Allocation) | [45]
 | Latent Profile Analysis (LPA) | [37]
 | TF-IDF | [15]
 | Naïve Bayes | [23]
 | KC Model | [11]
 | Fuzzy Logic | [15]
Other techniques | Page Ranking | [16]
 | Hadoop MapReduce | [38],[57],[24],[41],[31],[16],[43],[34]
 | Genetic Algorithm | [43]
 | Gradient Boosting (XGBoost) | [23],[46]
 | Stacked Ensemble | [23]

³ The category of Decision Trees includes Classification and Regression Trees, C4.5/5.0.
⁴ The category of SVM includes one-against-all.
⁵ The category of neural network includes Feedforward Neural Network, Deep Neural Network and Backpropagation Neural Network.
⁶ The category of regression includes Logistic Regression, Cox Proportional Hazard Regression, Time Dependent Cox, Linear Regression, Multiple Linear Regression, Multinomial Logistic Regression and Multivariate

Table 7: Domains of Forecasting in Education

S/No | What Education Community is Forecasting | Works
1 | Student Performance Prediction | [15],[26]
2 | Planning and Management of Stakeholders | [12],[27],[18],[33],[55]
3 | Security and Risk Mitigation | [58],[55]
4 | Behaviour Visualization and Analysis | [58]
5 | Social Network Analysis and Students Collaboration | [13],[7],[6],[43],[52]
6 | Student Enrolment Management | [55],[58]
7 | Student Dropout, Retention and At-Risk Detection | [10],[27],[23],[30]
8 | Student Skill Estimation | [11],[14]

RQ 3: What is the educational community forecasting?
From the 55 studies selected, we identified eight (8) areas of forecasting focus in education. Table 7 lists related works on the areas where researchers focus on forecasting in higher education. Among the eight (8) domains observed, planning and management of stakeholders, social network analysis and students’ collaboration, and student dropout, retention and at-risk detection were found to be researchers’ priorities. Fewer studies were concerned with learners’ behaviour, performance prediction, and security and risk mitigation. The reviewed studies conducted their research mostly around MOOC, Moodle and Sakai LMS platforms. Investigating the feelings of the learner and the deployment designs of these systems remains unexplored.

CONCLUSION
In this study, we conducted an SLR on Big Data Analytics in Higher Education based on three key research questions. To answer these questions, we selected 55 research works published from 2007 to 2018 from six (6) computer science databases. We constrained our search to Big Data and Education. This study identified ten (10) Big Data frameworks, thirty (30) Big Data tools, nineteen (19) key algorithms and eight (8) areas where the higher education community focuses on predictions.
Many researchers have focused on predictions, with less attention to data management and security. Maintaining provenance, privacy and confidentiality whilst performing analytics remains a challenge in educational Big Data analytics. In future, we expect to conduct a thorough examination of the tools and algorithms identified to provide more in-depth information in the context of higher education.

ACKNOWLEDGMENTS
We want to acknowledge the Wits Institute of Data Science for their support.

REFERENCES
[1] A. De Mauro, M. Greco, and M. Grimaldi, “A formal definition of big data based on its essential features,” Library Review, vol. 65, no. 3, pp. 122–135, 2016.
[2] C. Vaitsis, V. Hervatis, and N. Zary, “Introduction to big data in education and its contribution to the quality improvement processes,” 2016.
[3] C. Romero and S. Ventura, “Data mining in education,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 3, no. 1, pp. 12–27, 2013.
[4] S. M. Muthukrishnan, N. B. M. Yasin, and M. Govindasamy, “Big data framework for students’ academic performance prediction: A systematic literature review,” in 2018 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE). IEEE, 2018, pp. 376–382.
[5] M. R. Jang, T. Miller, M. Squires, and D. Cargile, “Managing data received from multiple sources for generating a contact profile for synchronizing with the multiple sources,” Mar. 14 2013, US Patent App. 13/609,216.
[6] A. Kofod-Petersen, “How to do a structured literature review in computer science,” Ver. 0.1. October, vol. 1, 2012.
[7] M. Cubric, “Wiki-based process framework for blended learning,” in Proceedings of the 2007 international symposium on Wikis. ACM, 2007, pp. 11–24.
[8] R.-A. Lee and J. DePue, “Using baldrige method frameworks, excellence in higher education standards, and the sakai cle for the self assessment process,” in Proceedings of the 38th annual ACM SIGUCCS fall conference: navigation and discovery. ACM, 2010, pp. 165–170.
[9] E. Aljenaa, F. Al-Anzi, and M. Alshayeji, “Towards an efficient e-learning system based on cloud computing,” in Proceedings of the Second Kuwait Conference on e-Services and e-Systems. ACM, 2011, p. 13.
[10] T. Zarouchas, I. Perikos, M. Paraskevas, and T. Pegiazis, “A hybrid training framework oriented to computer engineering educators,” in Proceedings of the 19th Panhellenic Conference on Informatics. ACM, 2015, pp. 33–37.
[11] S. Ameri, M. J. Fard, R. B. Chinnam, and C. K. Reddy, “Survival analysis based framework for early prediction of student dropouts,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 903–912.
[12] Y. Huang, M. Yudelson, S. Han, D. He, and P. Brusilovsky, “A framework for dynamic knowledge modeling in textbook-based learning,” in Proceedings of the 2016 conference on user modeling adaptation and personalization. ACM, 2016, pp. 141–150.
[13] S.-K. Haw, S.-T. Ong, C.-O. Wong, and M. S. Wong, “Conceptualize the e-learning framework for the secondary school curriculum,” in Proceedings of the International Conference on Digital Technology in Education. ACM, 2017, pp. 18–22.
[14] J. S. He, S. Ji, and P. O. Bobbie, “Internet of things (iot)-based learning framework to facilitate stem undergraduate education,” in Proceedings of the
Another issue that emerged from the study is identifying a SouthEast Conference. ACM, 2017, pp. 88–94.
[15] S. Grover, M. Bienkowski, S. Basu, M. Eagle, N. Diana, and J. Stamper, “A
general framework standard for educational Big Data analytics with framework for hypothesis-driven approaches to support data-driven learning
theoretical underpinning. Hence, there is a critical need for a analytics in measuring computational thinking in block-based programming,” in
Proceedings of the Seventh International Learning Analytics & Knowledge
general standard that allows for portability of Big Data frameworks Conference. ACM, 2017, pp. 530–531.
and mining models. [16] J. Yang, J. Ma, S. K. Howard, M. Ciao, and R. Srikhanta, “A big data analytic
framework for investigating streaming educational data,” in Proceedings of the
Furthermore, few researchers focused on behavioural analysis. Australasian Computer Science Week Multiconference. ACM, 2017, p. 55.
None of these researchers put a spotlight on learner emotions [17] R. Sooriamurthi, “Introducing big data analytics in high school and college,” in
(emotion-aware systems) and instructors' effectiveness in Proceedings of the 23rd Annual ACM Conference on Innovation and Technology
in Computer Science Education. ACM, 2018, pp. 373–374.
supporting the learners. Understanding the emotions of the learner [18] M. M. Mohan, S. K. Augustin, and V. K. Roshni, “A bigdata approach for
and the sentiment of learners about teacher effectiveness could classification and prediction of student result using mapreduce,” in 2015 IEEE
Recent Advances in Intelligent Computational Systems (RAICS). IEEE, 2015,
allow an institution to take corrective action on time. pp. 145–150.
As education shifts from the traditional learning approach to a [19] A. Z. Bhat and I. Ahmed, “Big data for institutional planning, decision support
more digitised learning system, massive data is likely to be and academic excellence,” in 2016 3rd MEC International Conference on Big
Data and Smart City (ICBDSC). IEEE, 2016, pp. 1–5.
produced. This data will provide an enormous potential for [20] M. Riffai, P. Duncan, D. Edgar, and A. H. Al-Bulushi, “The potential for big data
researchers and managers of education to improve teaching and to enhance the higher education sector in oman,” in 2016 3rd MEC International
Conference on Big Data and Smart City (ICBDSC). IEEE, 2016, pp. 1–6.
learning.

EBIMCS2019, December, 2019, Kuala Lumpur, Malaysia David Otoo-Arthur, Terence Van Zyl.

[21] P. K. Udupi, P. Malali, and H. Noronha, “Big data integration for transition from e-learning to smart learning framework,” in 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC). IEEE, 2016, pp. 1–4.
[22] B. Liu, X. Li, Y. Wang, H. Wang, and F. Xu, “The system framework of data mining and learning analysis for smart classroom,” in 2018 International Joint Conference on Information, Media and Engineering (ICIME). IEEE, 2018, pp. 331–336.
[23] S. T. Konstantinidis, A. Fecowycz, K. Coolin, H. Wharrad, G. Konstantinidis, and P. D. Bamidis, “A proposed learner activity taxonomy and a framework for analysing learner engagement versus performance using big educational data,” in 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2017, pp. 429–434.
[24] R. N. Laveti, S. Kuppili, J. Ch, S. N. Pal, and N. S. C. Babu, “Implementation of learning analytics framework for moocs using state-of-the-art in-memory computing,” in 2017 5th National Conference on E-Learning & E-Learning Technologies (ELELTECH). IEEE, 2017, pp. 1–6.
[25] R. Machova, J. Komarkova, and M. Lnenicka, “Processing of big educational data in the cloud using apache hadoop,” in 2016 International Conference on Information Society (i-Society). IEEE, 2016, pp. 46–49.
[26] F. Matsebula and E. Mnkandla, “A big data architecture for learning analytics in higher education,” in 2017 IEEE AFRICON. IEEE, 2017, pp. 951–956.
[27] R. Soltanpoor and A. Yavari, “Coala: contextualization framework for smart learning analytics,” in 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW). IEEE, 2017, pp. 226–231.
[28] K. Stefanova and D. Kabakchieva, “Educational data mining perspectives within university big data environment,” in 2017 International Conference on Engineering, Technology and Innovation (ICE/ITMC). IEEE, 2017, pp. 264–270.
[29] T. Zeng, “The research and practice of a five-sided educational data mining framework,” in 2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC). IEEE, 2017, pp. 1050–1053.
[30] J. Lam, K. K. Ng, S. K. Cheung, T. L. Wong, K. C. Li, and F. L. Wang, Technology in education. Technology-mediated proactive learning: Second international conference, ICTE 2015, Hong Kong, China, July 2-4, 2015, revised selected papers. Springer, 2015, vol. 559.
[31] J. K. Tang, H. Xie, and T.-L. Wong, “A big data framework for early identification of dropout students in mooc,” in International Conference on Technology in Education. Springer, 2015, pp. 127–132.
[32] S. Velampalli and V. M. Jonnalagedda, “Intelligent computing for skill-set analytics in a big data framework—a practical approach,” in Proceedings of the First International Conference on Intelligent Computing and Communication. Springer, 2017, pp. 267–275.
[33] J. A. Reyes, “The skinny on big data in education: Learning analytics simplified,” TechTrends, vol. 59, no. 2, pp. 75–80, 2015.
[34] C. Laux, N. Li, C. Seliger, and J. Springer, “Impacting big data analytics in higher education through six sigma techniques,” International Journal of Productivity and Performance Management, vol. 66, no. 5, pp. 662–679, 2017.
[35] K. Dahdouh, A. Dakkak, L. Oughdir, and F. Messaoudi, “Big data for online learning systems,” Education and Information Technologies, vol. 23, no. 6, pp. 2783–2800, 2018.
[36] B. Williamson, “The hidden architecture of higher education: building a big data infrastructure for the ‘smarter university’,” International Journal of Educational Technology in Higher Education, vol. 15, no. 1, p. 12, 2018.
[37] T. De Feyter, R. Caers, C. Vigna, and D. Berings, “Unraveling the impact of the big five personality traits on academic performance: The moderating and mediating effects of self-efficacy and academic motivation,” Learning and Individual Differences, vol. 22, no. 4, pp. 439–448, 2012.
[38] W. Smidt, G. Kammermeyer, and S. Roux, “Relations between the big five personality traits of prospective early childhood pedagogues and their beliefs about the education of preschool children: Evidence from a german study,” Learning and Individual Differences, vol. 37, pp. 96–106, 2015.
[39] J. T. Wassan, “Discovering big data modelling for educational world,” Procedia-Social and Behavioral Sciences, vol. 176, pp. 642–649, 2015.
[40] G. Bello-Orgaz, J. J. Jung, and D. Camacho, “Social big data: Recent achievements and new challenges,” Information Fusion, vol. 28, pp. 45–59, 2016.
[41] J. Zhang and M. Ziegler, “How do the big five influence scholastic performance? A big five-narrow traits model or a double mediation model,” Learning and Individual Differences, vol. 50, pp. 93–102, 2016.
[42] O. Bent, P. Dey, K. Weldemariam, and M. K. Mohania, “Modeling user behavior data in systems of engagement,” Future Generation Computer Systems, vol. 68, pp. 456–464, 2017.
[43] S. Singh, R. Misra, and S. Srivastava, “An empirical investigation of student’s motivation towards learning quantitative courses,” The International Journal of Management Education, vol. 15, no. 2, pp. 47–59, 2017.
[44] M. Birjali, A. Beni-Hssane, and M. Erritali, “A novel adaptive e-learning model based on big data by using competence-based knowledge and social learner activities,” Applied Soft Computing, vol. 69, pp. 14–32, 2018.
[45] K. T. Chui, D. C. L. Fung, M. D. Lytras, and T. M. Lam, “Predicting at-risk university students in a virtual learning environment via a machine learning algorithm,” Computers in Human Behavior, 2018.
[46] A. De Mauro, M. Greco, M. Grimaldi, and P. Ritala, “Human resources for big data professions: A systematic classification of job roles and required skill sets,” Information Processing & Management, vol. 54, no. 5, pp. 807–817, 2018.
[47] M. Shorfuzzaman, M. S. Hossain, A. Nazir, G. Muhammad, and A. Alamri, “Harnessing the power of big data analytics in the cloud to support learning analytics in mobile learning environment,” Computers in Human Behavior, vol. 92, pp. 578–588, 2018.
[48] B. Cope and M. Kalantzis, “Big data comes to school: Implications for learning, assessment, and research,” AERA Open, vol. 2, no. 2, p. 2332858416641907, 2016.
[49] G. Veletsianos, J. Reich, and L. A. Pasquini, “The life between big data log events: Learners’ strategies to overcome challenges in moocs,” AERA Open, vol. 2, no. 3, p. 2332858416657002, 2016.
[50] M. Clayton and D. Halliday, “Big data and the liberal conception of education,” Theory and Research in Education, vol. 15, no. 3, pp. 290–305, 2017.
[51] G. Dishon, “New data, old tensions: Big data, personalized learning, and the challenges of progressive education,” Theory and Research in Education, vol. 15, no. 3, pp. 272–289, 2017.
[52] C. F. Lynch, “Who prophets from big data in education? New insights and new challenges,” Theory and Research in Education, vol. 15, no. 3, pp. 249–271, 2017.
[53] J. Scott and T. P. Nichols, “Learning analytics as assemblage: Criticality and contingency in online education,” Research in Education, vol. 98, no. 1, pp. 83–105, 2017.
[54] S. Slater, S. Joksimović, V. Kovanovic, R. S. Baker, and D. Gasevic, “Tools for educational data mining: A review,” Journal of Educational and Behavioral Statistics, vol. 42, no. 1, pp. 85–106, 2017.
[55] B. Williamson, “Who owns educational theory? Big data, algorithms and the expert power of education data science,” E-Learning and Digital Media, vol. 14, no. 3, pp. 105–122, 2017.
[56] M. Attaran, J. Stark, and D. Stotler, “Opportunities and challenges for big data analytics in us higher education: A conceptual model for implementation,” Industry and Higher Education, vol. 32, no. 3, pp. 169–182, 2018.
[57] Y.-F. Zhao, Z.-G. Fu, and F. Chen, “Research on big data preprocessing technology of thermal system,” in 2nd Annual International Conference on Energy, Environmental & Sustainable Ecosystem Development (EESED 2016). Atlantis Press, 2016.
[58] G. Zhang, J. Li, and L. Hao, “Research on cloud computing and its application in big data processing of distance higher education,” International Journal of Emerging Technologies in Learning, vol. 10, no. 8, 2015.
[59] S. S. Chaurasia and A. Frieda Rosin, “From big data to big impact: analytics for teaching and learning in higher education,” Industrial and Commercial Training, vol. 49, no. 7/8, pp. 321–328, 2017.
[60] S. S. Chaurasia, D. Kodwani, H. Lachhwani, and M. A. Ketkar, “Big data academic and learning analytics: Connecting the dots for academic excellence in higher education,” International Journal of Educational Management, vol. 32, no. 6, pp. 1099–1117, 2018.
[61] J. McCarthy, “Enhancing feedback in higher education: Students’ attitudes towards online and in-class formative assessment feedback models,” Active Learning in Higher Education, vol. 18, no. 2, pp. 127–141, 2017.
[62] M. Núñez-del Prado and R. Goméz, “Learning data analytics through a problem based learning course,” in 2017 IEEE World Engineering Education Conference (EDUNINE). IEEE, 2017, pp. 52–56.
[63] A. Klašnja-Milićević, M. Ivanović, and Z. Budimac, “Data science in education: Big data and learning analytics,” Computer Applications in Engineering Education, vol. 25, no. 6, pp. 1066–1078, 2017.
[64] Y. Demchenko, C. De Laat, and P. Membrey, “Defining architecture components of the big data ecosystem,” in 2014 International Conference on Collaboration Technologies and Systems (CTS). IEEE, 2014, pp. 104–112.
[65] P. Nilsen, “Making sense of implementation theories, models and frameworks,” Implementation Science, vol. 10, no. 1, p. 53, 2015.
[66] P. V. Coveney, E. R. Dougherty, and R. R. Highfield, “Big data need big theory too,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2080, p. 20160153, 2016.
[67] L. P. Macfadyen and S. Dawson, “Mining lms data to develop an “early warning system” for educators: A proof of concept,” Computers & Education, vol. 54, no. 2, pp. 588–599, 2010.
[68] M. Franklin, “Making sense of big data with the berkeley data analytics stack,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 2015, pp. 1–2.
[69] K. Grolinger, M. Hayes, W. A. Higashino, A. L’Heureux, D. S. Allison, and M. A. Capretz, “Challenges for mapreduce in big data,” in Services (SERVICES), 2014 IEEE World Congress on. IEEE, 2014, pp. 182–189.
[70] A. Gupta, H. K. Thakur, R. Shrivastava, P. Kumar, and S. Nag, “A big data analysis framework using apache spark and deep learning,” in Data Mining Workshops (ICDMW), 2017 IEEE International Conference on. IEEE, 2017, pp. 9–16.
