Journal paper 3 new-Revolutionizing Online Education Integrating Machine Learning and Data
Research Paper
Revolutionizing Online Education: Integrating Machine Learning and Data Analysis into LMS
Parveen Singh1, Meenakshi Handa2, Anwal Ul Haq3*
1,2 Department of Computer Sciences, Govt SPMR College of Commerce, Jammu
3 Department of Management, Govt SPMR College of Commerce, Jammu
educational data analysis tools in LMS [8], but they do not prioritize enhancing the learning experience. Some papers suggest using data mining algorithms to uncover student weaknesses in a particular course [6]; the individuals in charge of learning quality then use these findings to address the weaknesses. Other studies propose more complex models that integrate business intelligence (BI) frameworks [7], which combine multiple data sources to provide a more detailed analysis and so enable the identification of factors contributing to student attrition in virtual or online educational models.

Several studies focus on enhancing the teacher's ability to create better learning models and methodologies using AI in LMS. Specialized AI techniques have been shown to improve user interaction and to learn continuously from interactions [8]. The proposed work is distinct from previous studies because it integrates both AI and data analysis in a centralized system, creating a virtual assistant that manages student information and performs personalized monitoring. Whereas other studies analyze only data within the LMS, this work integrates multiple data sources through big data technology, which gives the assistant a comprehensive understanding of each student's needs and expectations and results in more adaptable decision-making by the AI [1]. This integration enables swift and efficient evaluations of student performance, ultimately enhancing the learning experience.

Preliminary Concepts
The primary goal of data analysis is to extract insights and knowledge from a dataset by applying statistical and computational methods. This process involves several steps, including data cleaning, processing, modeling, and interpretation. Data analysis has broad applications across industries, including finance, healthcare, and government, among others. It helps organizations make data-driven decisions, optimize their operations, and gain a competitive advantage. In healthcare, data analysis is used to identify patterns and trends in patient data, improve diagnoses, and develop personalized treatment plans. In finance, it is used to assess risk, detect fraud, and forecast market trends. Similarly, in the education sector, data analysis is used to measure student progress, identify areas of improvement, and evaluate the effectiveness of educational programs.

was created to accommodate individuals with work, family, or geographic constraints who cannot attend traditional classrooms. Interactive models that facilitate student engagement with content, instructors, and peers, easy accessibility from any location with internet access, both synchronous and asynchronous options, and access to online resources at any time are all key features of online education.

3. Method

The study involved a university in Ecuador that offers two modes of study: a traditional face-to-face mode and online education [30]. The traditional mode is teacher-centered: the teacher determines the learning content and evaluation criteria, while the student is passive and must adhere to pre-set schedules [9]. This approach may not be ideal for a comprehensive learning model, as it can be subjective. The online education mode, on the other hand, has evolved over ten years with the integration of Information Technology (IT) and the establishment of a technological foundation [30]. This mode caters to individuals who cannot attend in-person classes due to their schedules and obligations, and it operates on a Learning Management System (LMS) platform, where courses are designed for each degree program. Students are required to complete three mandatory activities: tasks, evaluations, and forum participation, all of which are accessed through a module that provides all relevant resources [30].

The university's technology infrastructure is an advantage for the study, as it can support a high volume of transactions and services, with information security maintained by the IT department. Figure 1 shows the technology architecture, including a layer for data analysis, which allows for the integration of AI, data analysis, and the learning management system without compromising data [30]. However, the online education mode faces challenges such as a high dropout rate and low academic effectiveness, which may be due to students lacking proper study methods and the discipline to adhere to their own schedules. To address these challenges, the AI system is refined to enhance the online education experience, and the data analysis model is established based on the variables and inquiries to be addressed [30].
The layer in charge of data entry is responsible for gathering information from all systems and tools used by students [10]. This layer also takes into account data generated by students on social networks related to their student experience.

The knowledge layer is dedicated to data analysis and employs a big data architecture. It drives the online education model by processing data from various sources, analyzing it with data mining algorithms, and passing the information to the AI system. This AI system then generates insights and interacts with students and learning authorities.

The LMS service layer serves as the integration point for all the systems and layers discussed earlier, consolidating and presenting the information to users of the online education mode. The way the information is displayed can be customized and presented through various systems that are related to the educational model.

Analysis of Data
Data analysis is central to this study because of the volume and variety of data that needs to be processed. Big data technology is well-suited to meet these requirements, as it is designed to analyze data from various sources. Student-generated data from interactions with the Learning Management System (LMS) is stored in a structured manner in a separate database. However, relying solely on this data would limit the granularity of analysis and only provide segmented scores for each activity. To gain a more complete understanding of students' learning processes, additional information, such as socio-economic background and past academic performance, should also be integrated into the analysis. This includes information typically stored by universities and information obtained from students through sources such as social networks, both structured and unstructured. The objective is to collect information from all available sources to gain deeper insights into students' learning behaviors.

Hadoop Operation
The MapReduce computational process is executed within a cluster. This process involves assigning tasks to different servers within the cluster. Hadoop, which is a popular tool for MapReduce processing, manages the distribution of data between nodes in the cluster to reduce network traffic.

The MapReduce process consists of two main phases: the Map phase and the Reduce phase. During the Map phase, the initial data processing occurs, and data is mapped into key-value pairs. This phase is responsible for filtering and sorting the data, as well as grouping it by key.

The Reduce phase, on the other hand, further processes the data and is divided into two sub-phases: shuffling and reduction. The shuffling phase involves transferring data between nodes based on their key values. During the reduction phase, the data is aggregated and processed to produce the final output.

Once the MapReduce process is complete, the user receives the result generated by the cluster. This result could be in the form of a report, a graph, or any other output that was specified by the user. The MapReduce process is widely used in big data processing, machine learning, and data analytics applications.

Phases for the Implementation of Machine Learning
Before considering the technical approach, it is essential to identify the business objective that the machine learning tool aims to achieve. The goals can range from boosting conversions and lowering customer churn to enhancing user satisfaction [6]. The key is to have a clear understanding of the aspect to be improved, so that efforts and resources are concentrated in that direction and the implemented solution does not exceed the original goal [12].

The illustration in Figure 3 depicts the various stages of the machine learning process and how they are interconnected.

Figure 2. Phases for the implementation of a machine learning model.

To effectively build a machine learning model, it is crucial to establish a clear evaluation criterion. This involves choosing an appropriate error measure, depending on the type of problem being solved. For regression problems, root-mean-square error is commonly used, while cross-entropy is typically used for classification problems. In binary classification, accuracy and completeness measures are commonly utilized. Choosing the appropriate evaluation criterion is crucial for accurately assessing the performance of the model.

Before building the model, it is also important to perform an exploratory analysis of the data. This involves using descriptive statistics, correlations, and graphs to gain a better understanding of the data and the story it is trying to tell. This analysis can help determine if the data is sufficient and relevant enough to build a model.

In some cases, the problem may originate from a field in which the team has limited knowledge. In such cases, establishing collaborative relationships with individuals who have a deep understanding of the problem can be crucial in gaining insight into the data. Overall, the process of building a machine learning model involves multiple steps, including data preparation, feature engineering, model selection, and evaluation, among others.
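The evaluation criteria named above can be written out directly. The following is a minimal Python sketch rather than the evaluation code used in the study: the function names and sample values are hypothetical, and "completeness" is read here as recall (the share of actual positives that the model finds), which is an assumption about the intended measure.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error for regression targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def accuracy_and_recall(y_true, y_pred):
    """Accuracy and recall for 0/1 labels, with 1 as the positive class.

    Recall stands in for "completeness" here (an assumption).
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical labels: 1 = student at risk, 0 = not at risk.
acc, rec = accuracy_and_recall([1, 0, 1, 1], [1, 0, 0, 1])
print(acc, rec)
```

Accuracy alone can look good when at-risk students are rare, which is one reason to report a recall-type measure alongside it.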
Evaluating the current solution: Determine if there is an existing solution and measure its performance. Compare it to the machine learning model to determine feasibility. If there is no existing solution, create a simple solution and compare its performance to the machine learning model. Stick with the simple solution if it is comparable to the machine learning solution.

Preparing the data: Data preparation involves dealing with incomplete data. Possible actions include deletion, imputation with a reasonable value, imputation using a machine learning model, or using a machine learning technique that handles incomplete data. Harmonize data from multiple sources, calculate relevant features, and express data in intuitive ways to enhance performance.

Developing the Model: Select the machine learning technique and algorithm; the algorithm then learns automatically from prepared historical data to produce correct outcomes.

Error Analysis: Identify areas of improvement for the machine learning results. Verify the model's ability to generalize and achieve accurate results when presented with new data. Repeating the previous phases several times can substantially improve the results.

Model Integration into System: Integrate the model into the system, automate the data preparation process, and monitor errors automatically. Create interfaces for the data so that the model can obtain data automatically and the system can use its predictions automatically.

Integration of Big Data, Machine Learning, and LMS
A model, similar to the one depicted in Figure 4, is utilized for the integration of systems and new technology. The LMS (Learning Management System) has a vast amount of data regarding all student activities and interactions with the platform. Although the interaction is indirect, the LMS database often records the length of time each student spends actively using the platform, as well as the regular schedule in which each student logs in [10]. Data from administrative and other academic databases are also added to this information. This leads to a comprehensive analysis that involves a greater number of variables, processed by the big data architecture.

The big data architecture in its initial stage focuses on gathering data from various sources, which can be both structured and unstructured [9]. The collected data is then processed and made usable by the AI for machine learning. The machine learning algorithm is tasked with identifying patterns in the data, which it then uses to classify individuals based on certain characteristics. The goal is for the system to understand the needs of each group, allowing it to suggest strategies to improve the way activities are presented [5].

In addition to providing students with personalized learning experiences, learning management systems can also recommend activities based on individual student needs. After activities have been recommended, the system enters an analysis phase in which it evaluates the effectiveness of the activities by analyzing the grades that students receive. If the results indicate improvement, the process is repeated. However, if the system detects that the results fall below the university's average mark, feedback is integrated into the analysis phase to further improve the learning process, and the process is repeated until satisfactory results are achieved.

4. Discussion and Results

The integration of technology in education has revolutionized the way student performance is monitored. The traditional method of relying on teacher evaluation has been replaced by a new model that utilizes continuous analysis by the system and machine learning. The implementation of this model enables the system to identify students at risk of poor academic performance and provide early warnings, thus preventing further decline in academic performance. Unlike the conventional method, the new model uses the student's previous history to generate projections, which enables the system to detect potential difficulties in subjects related to a previously failed one, as well as possible course repetition. The machine learning model used in this study recommends further learning activities based on the student's performance in each activity. The model uses the best results a student has achieved to decide which activities would be most beneficial. It also identifies groups of students whose needs are not met by certain activities, such as rapid evaluations using true-or-false questions, and suggests alternative activities that promote active learning. This approach prioritizes the development of active learning, offering students a range of activities tailored to their individual needs and abilities.
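The recommendation and feedback steps described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the model used in the study: the function names, the per-activity score dictionary, and the reference averages are all hypothetical, and "best results" is approximated by the highest mean score per activity type.

```python
from statistics import mean


def recommend_activity(scores_by_activity):
    """Return the activity type in which the student has the highest mean score."""
    return max(scores_by_activity, key=lambda a: mean(scores_by_activity[a]))


def needs_feedback(scores_by_activity, average_mark):
    """Return True when the student's overall mean falls below the reference average."""
    all_scores = [s for scores in scores_by_activity.values() for s in scores]
    return mean(all_scores) < average_mark


# Hypothetical scores for one student, grouped by activity type.
student = {
    "forum": [8.0, 9.0],
    "task": [7.0, 7.5],
    "quiz": [4.0, 5.0],
}

print(recommend_activity(student))
print(needs_feedback(student, average_mark=7.0))
```

In a setup like this, the below-average flag would trigger the feedback step, and the loop would run again once new grades are recorded.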
final evaluation. The courses, created and registered within the LMS (Moodle in this case), were divided into modules corresponding to each week of the term and comprised a main module providing information on the study type, subject matter, and assigned tutor, as well as a syllabus and study guide outlining the topics and activities. Each weekly module was further divided into sections containing resources, activities, and information for asynchronous meetings with the tutor.

In the "resources section," the tutor is responsible for uploading all the material related to the week's topic. This material must be aligned with the subject's learning outcomes, and the tutor usually provides their own material such as presentations, exercise solutions, and readings. The material may also include supporting resources such as videos and articles.

The "activities section" contains tasks for the students to complete each week. This includes an opinion forum where students critically discuss a topic raised by the tutor. Additionally, students must complete tasks based on Bloom's taxonomy, which aims to ensure students acquire new skills and knowledge. The levels of Bloom's taxonomy are knowledge, understanding, application, analysis, evaluation, and creation. Lastly, students must take a questionnaire-style evaluation to encourage them to read the resources.

The final section holds information pertaining to the asynchronous meeting with the instructor. The purpose of this meeting is for students to have the opportunity to ask the tutor any questions they may have or receive feedback on topics discussed. Each meeting is 60 minutes long, and participation is not mandatory. If desired, students can review the recorded meeting as many times as they see fit.

To determine the factors contributing to student dropout, variables are established once the integrated model is defined. These variables include: the university degree equivalent to the average score a student received during their secondary studies, the number of successful course completions, the number of times enrolled within a specified time frame, the specific courses taken (with a range of 1 to 20, coded based on average course load), student gender, and age (between 19 and 30 years old). The objective is to identify the causes of university dropout, as previous studies have considered abandonment as a failure to continue enrollment in consecutive semesters.

The first exercise requires big data access to the logs of teacher and student activities. These logs are typically stored in MySQL. The data collected from various sources undergoes processing and transformation to obtain clean data, which is then analyzed using Hadoop to uncover patterns in student behavior.

Most students in the forums demonstrate a high level of learning, but some receive low grades due to either not registering their participation or submitting non-objective contributions. While the tasks based on Bloom's taxonomy generally result in mean values that meet the activity requirements, the lowest-performing activity is the questionnaire-type evaluation. These evaluations consist of 10 questions to be completed in 20 minutes, with an expected response time of two minutes per question. Unfortunately, these evaluations yield extremely low results that do not contribute to the learning process.

To address this issue, the big data result is fed into an AI for machine learning and decision-making purposes. The integrated AI model includes an analysis of data from the LMS, which records the time students spend reading the teacher's materials, as well as data from a survey of students that covers the amount of time they took to answer each question. The data from this analysis is subjected to a naive Bayes data mining algorithm, which conducts a detailed analysis on 51 instances to determine the reason for the below-expected scores in evaluations. Of the 51 instances, 48 were classified correctly, an accuracy of 94.1176% (48/51), which was deemed reliable enough to form the analysis decision.

The analysis revealed that the allotted time of 2 minutes per question hinders the success of the evaluation. Findings were compared to the number of evaluations that were closed due to time expiration, with 18 instances identified as affected and one instance mistakenly detected as a difficulty with the evaluation. Additionally, 15 instances were observed where students did not dedicate enough time to reading the teacher's materials. The evaluation difficulty was also considered, with 15 instances recorded as being a cause and two instances recorded as being due to the time assigned per question. Based on these results, the machine learning model recommended that the tutor increase the response time per question to 2 minutes and 30 seconds, resulting in a total evaluation time of 25 minutes.

The model incorporates additional variables into the analysis and makes decisions based on the outcomes. Data was gathered from the LMS and a student survey, and the model allows weights to be adjusted to ensure effective decision-making, which becomes crucial when results need to be adjusted. The model's quick action allows prompt corrections to be made before a situation escalates into a problem, particularly when evaluations are conducted concurrently.

5. Conclusions

The education sector has undergone significant changes with the advent of virtual, online, or hybrid learning models, which are now at the forefront of learning and research. These models have integrated technology as a crucial aspect to enhance education, with the adoption of student-centric educational models holding the potential to improve learning outcomes and address issues such as high dropout rates and low academic performance.

The integration of technology in education has allowed for the development of systems that serve as valuable aids to both students and teachers. These systems facilitate students'
calendar management, generating reminders, notifications, and events to keep them informed of their required activities. The system also provides continuous support to help students improve their performance, while enabling teachers to monitor each student's learning progress through a thorough analysis of the system's data.

References
[1]. Li, H., Liu, S.M., Yu, X.H., Tang, S.L. & Tang, C.K. (2020). Coronavirus disease 2019 (COVID-19): Current status and future perspectives. International Journal of Antimicrobial Agents, 55, 105951. doi: 10.1016/j.ijantimicag.2020.105951
[2]. Riofrio, G., Encalada, E., Guaman, D. & Aguilar, J. (2015, October). Business intelligence applied to learning analytics in student-centered learning processes. In Proceedings of the Latin American Computing Conference. Arequipa, Peru. pp. 1-10.
[3]. Beldarrain, Y. (2006). Distance education trends: Integrating new technologies to foster student interaction and collaboration. Distance Education, 27, 139-153. doi: 10.1080/01587910600789498
[4]. Hssina, B., Bouikhalene, B. & Merbouha, A. (2017). Europe and MENA Cooperation Advances in Information and Communication Technologies (Vol. 520). In A. Rocha, S. Mohammed & C. Felgueiras (Eds.), Springer International Publishing. doi: 10.1007/978-3-319-46568-5_43
[5]. Villegas-Ch, W., Lujan-Mora, S. & Buenano-Fernandez, D. (2017, November). Application of a Data Mining Method in to LMS for the Improvement of Engineering Courses in Networks. In Proceedings of the 10th International Conference of Education, Research and Innovation. Seville, Spain. pp. 6374-6381.
[6]. Comendador, B.E.V., Rabago, L.W. & Tanguilig, B.T. (2016, March). An educational model based on Knowledge Discovery in Databases (KDD) to predict learner's behavior using classification techniques. In Proceedings of the IEEE International Conference on Signal Processing, Communications and Computing, Shanghai, China. pp. 1-6.
[7]. Kim, T. & Lim, J. (2019). Designing an Efficient Cloud Management Architecture for Sustainable Online Lifelong Education. Sustainability, 11, 1523. doi: 10.3390/su11061523
[8]. Ferguson, R. (2013). Learning analytics: Drivers, developments and challenges. International Journal of Technology Enhanced Learning, 4, 304-317. doi: 10.1504/IJTEL.2013.057405
[9]. Lee, S.J., Lee, H. & Kim, T.T. (2018). A study on the instructor role in dealing with mixed contents: How it affects learner satisfaction and retention in e-learning. Sustainability, 10, 850. doi: 10.3390/su10030850
[10]. Lee, J., Song, H.D. & Hong, A.J. (2019). Exploring factors, and indicators for measuring students' sustainable engagement in e-learning. Sustainability, 11, 985. doi: 10.3390/su11040985
[11]. Villegas-Ch, W., Palacios-Pacheco, X., Buenaño-Fernandez, D. & Luján-Mora, S. (2019). Comprehensive learning system based on the analysis of data and the recommendation of activities in a distance education environment. International Journal of Engineering Education, 35, 1316-1325.
[12]. Darcy, A.M., Louie, A.K. & Roberts, L.W. (2016). Machine learning and the profession of medicine. JAMA, 315.