International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 311
STUDENT PASS PERCENTAGE DETECTION USING ENSEMBLE
LEARNING
P Kiran Rao1, K Giri Kumar2, T Bala Krishna3
1Assistant Professor, GPCET (affiliated to JNTUA, Anantapur), Kurnool, India
2B.Tech Student, CSE Department, GPCET (affiliated to JNTUA, Anantapur), Kurnool, India
3B.Tech Student, CSE Department, GPCET (affiliated to JNTUA, Anantapur), Kurnool, India
------------------------------------------------------------------------***-------------------------------------------------------------------
ABSTRACT
Ensemble learning is the process by which multiple
models, such as classifiers or experts, are strategically
generated and combined to solve a particular
computational intelligence problem. Ensemble learning
is primarily used to improve the (classification,
prediction, function approximation, etc.) performance of
a model, or reduce the likelihood of an unfortunate
selection of a poor one. Other applications of ensemble
learning include assigning a confidence to the decision
made by the model, selecting optimal (or near optimal)
features, data fusion, incremental learning,
nonstationary learning and error correction. This article
focuses on classification-related applications of ensemble
learning; however, all the principal ideas described below
can be readily generalized to function approximation or
prediction problems as well.
1. INTRODUCTION
Strengthening the scientific workforce has been and
continues to be of importance for every country in the
world. Preparing an educated workforce to enter
Science, Technology, Engineering and Mathematics
(STEM) careers is important for scientific innovations
and technological advancements, as well as economic
development and competitiveness. In addition to
expanding the nation’s workforce capacity in STEM,
broadening participation and success in STEM is also
imperative for women, given their historical
underrepresentation and the occupational opportunities
associated with these fields.
Prediction modeling lies at the core of many educational
data mining (EDM) applications, whose success depends
critically on the quality of the classifier. There has been
substantial research into developing sophisticated
prediction models and algorithms with the goal of improving
classification accuracy, and there is now a rich body of
such classifiers. However, although the explanation and
prediction of enrollment is widely researched, predicting
student enrollment remains among the most topical debates
in higher education institutions.
The rest of the paper is organized as follows: Section
2 describes ensemble classification, covering base
classifiers and ensemble classifiers. Section 3 reviews
related empirical studies on educational data mining using
ensemble methods. Section 4 describes the methodology used
in this study and the experiment conducted. Section 5
presents results and discussion. Finally, Section 6
presents the conclusions of the study.
2. ENSEMBLE CLASSIFICATION
Ensemble modeling has been among the most influential
developments in Data Mining and Machine Learning in
the past decade. The approach combines multiple
analytical models and synthesizes their results into a
single prediction that is usually more accurate than that
of its best component.
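As a rough illustration of why combining models can help, the sketch below implements plain majority voting and computes the error of the combined vote under the simplifying assumption that the base classifiers make independent errors. The function names and the 30% error rates are illustrative assumptions, not values from this study, and real classifiers are rarely fully independent.

```python
from collections import Counter
from itertools import product


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def majority_error(error_rates):
    """Probability that a majority of binary classifiers is wrong.

    Assumes an odd number of classifiers whose errors are independent.
    """
    n = len(error_rates)
    total = 0.0
    # Enumerate every pattern of correct (False) / wrong (True) classifiers.
    for wrong in product([False, True], repeat=n):
        if sum(wrong) > n / 2:
            p = 1.0
            for is_wrong, e in zip(wrong, error_rates):
                p *= e if is_wrong else 1.0 - e
            total += p
    return total


print(majority_vote(["STEM", "non-STEM", "STEM"]))
print(round(majority_error([0.3, 0.3, 0.3]), 3))
```

Under these assumptions, three classifiers that are each wrong 30% of the time are jointly wrong about 21.6% of the time, which is the intuition behind the claim that an ensemble usually beats its best component.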
The following subsections detail the different base
classifiers and the ensemble classifiers.
2.1 Base Classifiers
Rahman and Tasnim describe base classifiers as
individual classifiers used to construct the ensemble
classifiers. The following are the common base
classifiers:
(1) Decision Tree Induction – Classification via a divide
and conquer approach that creates structured nodes and
leaves from the dataset.
(2) Logistic Regression – Classification via extension of
the idea of linear regression to situations where
outcome variables are categorical.
(3) Nearest Neighbor – Classification of an object via a
majority vote of its nearest neighbors, with the object
being assigned to the class most common among them.
(4) Neural Networks – Classification by use of artificial
neural networks.
(5) Naïve Bayes Methods – Probabilistic methods of
classification based on Bayes' Theorem.
(6) Support Vector Machines – Use of hyperplanes to
separate different instances into their respective classes.
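To make one of these base classifiers concrete, here is a minimal sketch of nearest-neighbor classification by majority vote. The two numeric features and the training points are hypothetical, hand-made values chosen only to show the mechanism; they are not taken from the study's data.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: two numeric features per student.
train = [
    ((1.0, 1.0), "STEM"),
    ((1.2, 0.8), "STEM"),
    ((0.9, 1.1), "STEM"),
    ((4.0, 4.2), "non-STEM"),
    ((4.1, 3.9), "non-STEM"),
]
print(knn_predict(train, (1.1, 1.0)))
```

Euclidean distance and k = 3 are arbitrary choices here; in practice both would be tuned to the data.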
2.2 Ensemble Classifiers
Many methods for constructing ensembles have been
developed. Rahman and Verma argued that ensemble
classifier generation methods can be broadly classified
into six groups that are based on (i)
manipulation of the training parameters, (ii)
manipulation of the error function, (iii) manipulation
of the feature space, (iv) manipulation of the output
labels, (v) clustering, and (vi) manipulation of the
training patterns.
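Bagging is a familiar example of group (vi), manipulation of the training patterns: each base model is trained on a bootstrap resample of the data and the models then vote. The sketch below assumes a binary problem with a single numeric feature and uses a simple threshold rule as a stand-in base learner; the data, function names and the choice of 11 models are illustrative assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Fit a one-feature threshold rule at the midpoint between class means."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    means = sorted((sum(xs) / len(xs), label) for label, xs in by_label.items())
    if len(means) == 1:  # the resample happened to contain one class only
        only = means[0][1]
        return lambda x: only
    (low_mean, low_label), (high_mean, high_label) = means[0], means[-1]
    threshold = (low_mean + high_mean) / 2
    return lambda x: low_label if x < threshold else high_label


def bagging_predict(data, x, n_models=11, seed=0):
    """Train n_models stumps on bootstrap resamples and take a majority vote."""
    rng = random.Random(seed)
    models = [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated data: (feature value, class label).
data = [(1.0, "STEM"), (1.5, "STEM"), (2.0, "STEM"), (2.5, "STEM"), (3.0, "STEM"),
        (8.0, "non-STEM"), (8.5, "non-STEM"), (9.0, "non-STEM"),
        (9.5, "non-STEM"), (10.0, "non-STEM")]
print(bagging_predict(data, 2.0), bagging_predict(data, 9.0))
```

An odd number of models avoids tied votes in the binary case.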
3. RELATED EMPIRICAL STUDIES
Stapel, Zheng, and Pinkwart investigated an
approach that decomposes the math content structure
underlying an online math learning platform, trains
specialized classifiers on the resulting activity scopes
and uses those classifiers in an ensemble to predict
student performance on learning objectives.
The study used J48, Decision Table and Naïve Bayes
as base classifiers, together with a bagging ensemble
model. It concluded that the J48 algorithm performed
better than Naïve Bayes, and that the bagging ensemble
technique provided accuracy comparable
to J48. Hence, this approach could help an institution
find ways to improve its students'
performance.
4. METHODOLOGY
4.1 Study Design
This study adapted the Cross Industry Standard
Process for Data Mining (CRISP-DM) process model
suggested by Nisbet, Elder and Miner as a guiding
framework. The framework breaks a data mining project
down into phases that allow a data mining model to be
built and deployed in a real environment, helping to
support business decisions. Figure I gives an overview of
the key stages in the adapted methodology.
Stage 1: Business Understanding
Stage 2: Data Understanding
Stage 3: Data preparation
Stage 4: Modeling
Stage 5: Evaluation
4.2 Experiment
4.2.1 Data Collection
Data was collected from sampled students through a
personally administered structured questionnaire at
Murang’a University of Technology, Kenya, for the
academic year 2016-2017. The target population was
divided into two mutually exclusive groups, namely
STEM (Science, Technology, Engineering and
Mathematics) and non-STEM majors.
4.2.2 Data Transformation
The collected data attributes were transformed into
numerical values by assigning a distinct numerical value
to each attribute value. The data was then converted
into a format accepted by the WEKA data mining
software.
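A transformation of this kind can be sketched as follows: each categorical answer is mapped to an integer code, and the coded rows are written out in WEKA's ARFF text format. The attribute names (`gender`, `math_grade`, `major`), the sample rows and the helper functions are hypothetical; the study's actual attributes and coding scheme are not given in the text. Quoting of values that contain spaces is not handled.

```python
def encode_column(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


def to_arff(relation, rows, feature_names, class_name):
    """Render rows (a list of dicts) as ARFF text: numeric features, nominal class."""
    encoders = {name: encode_column(r[name] for r in rows) for name in feature_names}
    classes = list(dict.fromkeys(r[class_name] for r in rows))
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines.append(f"@attribute {class_name} {{{','.join(classes)}}}")
    lines += ["", "@data"]
    for r in rows:
        coded = [str(encoders[name][r[name]]) for name in feature_names]
        lines.append(",".join(coded + [r[class_name]]))
    return "\n".join(lines) + "\n"


# Hypothetical questionnaire rows.
rows = [
    {"gender": "F", "math_grade": "A", "major": "STEM"},
    {"gender": "M", "math_grade": "B", "major": "non-STEM"},
    {"gender": "F", "math_grade": "B", "major": "STEM"},
]
print(to_arff("students", rows, ["gender", "math_grade"], "major"))
```

Coding categories as integers imposes an arbitrary order on them, which tree learners tolerate better than distance-based ones; declaring the features as nominal ARFF attributes would be an alternative.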
4.2.3 Data Modeling
To find the main factors that affect students’
choice to enroll in STEM courses, the study used three
base classification algorithms together with an
ensemble method, so that the factors affecting
students’ enrollment in STEM could be identified accurately.
5. RESULTS AND DISCUSSION
We collected students’ information by distributing a
structured questionnaire to 220 students; 209
responses were received. The data was preprocessed
and recorded in a Microsoft Excel file, which was then
converted with an online conversion tool into an
.arff file, the format supported by the WEKA software tool.
We used Weka 3.6 for our analysis. Table II
shows the results obtained from the experiment.
Table II compares the algorithms that were used in our
analysis. When we compared the models, we found that
the J48 algorithm classified 84% of the instances
correctly and 16% incorrectly. Its classification error
is lower than that of the other two baseline
classification algorithms, CART (23% incorrectly
classified instances) and Naïve Bayes (28% incorrectly
classified instances).
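The comparison can be restated as accuracies, since accuracy is one minus the fraction of incorrectly classified instances. The short sketch below uses only the three error rates quoted in the text; the variable names are illustrative.

```python
# Fraction of incorrectly classified instances, as reported in the text.
error_rate = {"J48": 0.16, "CART": 0.23, "Naive Bayes": 0.28}

# Accuracy is the complement of the error rate.
accuracy = {name: 1.0 - err for name, err in error_rate.items()}

# Rank the algorithms from most to least accurate.
ranking = sorted(accuracy, key=accuracy.get, reverse=True)
for name in ranking:
    print(f"{name}: {accuracy[name]:.0%} correct")
```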
6. CONCLUSION
There are many factors that may affect students'
choice to enroll and pursue a career in STEM in
higher education institutions. These factors can be used
during the admission process to ensure that students are
admitted to the courses that best fit them. To
categorize students based on the association
between their choice to enroll in a STEM major and
their attributes, a good classifier is needed. In addition,
rather than depending on the outcome of a single
technique, an ensemble model could do better. In our
analysis, we found that the J48 algorithm performs better
than the Naïve Bayes and CART algorithms.
8. REFERENCES
[1] https://www.researchgate.net/publication/326241925_Using_Machine_Learning_Algorithm_to_Predict_Student_Pass_Rates_In_Online_Education
[2] https://www.hindawi.com/journals/scn/2018/5264526/
[3] https://www.emerald.com/insight/content/doi/10.1108/JARHE-09-2017-0113/full/html?fullSc=1
[4] https://www.researchgate.net/publication/326241925_Using_Machine_Learning_Algorithm_to_Predict_Student_Pass_Rates_In_Online_Education
[5] https://dl.acm.org/citation.cfm?id=3018896.3065830
