Swagat Ranjan Behera
What Is It All About?
• A client business that provides technical training to fresh graduates wants to understand which factors distinguish deployable from non-deployable students.
• Non-deployable students hurt the business on two fronts: lost prospective revenue and added operational cost. The client therefore wants to use a data science process to identify student attributes that flag likely non-deployable students upfront, so that these losses can be avoided.
• This study applies machine learning techniques to provide actionable insights to the business. The client supplied three years of historical data covering students' basic demographics, training details and deployment details.
Objective: “A model that identifies non-deployable students for all future intakes.”
Data Understanding & Preparation
• The dataset has 3,470 rows and 44 columns covering student demographics, academic background, training and assessment details, deployment details, etc.
• Roughly ten to fifteen of these columns measure each student's training and deployment in terms of difficulty, attendance and repetition.
• Columns are available in both numeric and character formats.
• All columns were converted to the appropriate types, namely dates, binary responses and numeric encodings of nominal variables.
• Five columns with more than 50% missing values were excluded.
• Identifier columns were ignored for the analysis.
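The preparation steps above can be sketched in plain Python. This is a minimal illustration, not the author's pipeline: it assumes each row is a dict with `None` marking a missing value, applies the 50% missing-value cut-off from the slide, and the column names (`student_id`, `stream`, `deployed`, `notes`) and the yes/no encoding are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set

Row = Dict[str, Any]


def drop_sparse_columns(rows: List[Row], threshold: float = 0.5) -> List[Row]:
    """Drop columns whose share of missing (None) values is greater than `threshold`."""
    # Preserve first-seen column order across all rows.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    n = len(rows)
    keep = [c for c in columns if sum(row.get(c) is None for row in rows) / n <= threshold]
    return [{c: row.get(c) for c in keep} for row in rows]


def drop_id_columns(rows: List[Row], id_columns: Set[str]) -> List[Row]:
    """Remove identifier columns, which carry no predictive signal."""
    return [{k: v for k, v in row.items() if k not in id_columns} for row in rows]


def to_binary(value: Optional[str], positive: str = "yes") -> Optional[int]:
    """Map a yes/no style string to 1/0; missing values stay None."""
    if value is None:
        return None
    return 1 if value.strip().lower() == positive else 0


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace a nominal column with one 0/1 dummy column per observed level."""
    levels = sorted({row[column] for row in rows if row.get(column) is not None})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = int(row.get(column) == level)
        encoded.append(new_row)
    return encoded


if __name__ == "__main__":
    # Toy rows with hypothetical column names, not the client's real schema.
    rows = [
        {"student_id": 1, "stream": "ECE", "deployed": "yes", "notes": None},
        {"student_id": 2, "stream": "IT", "deployed": "no", "notes": None},
        {"student_id": 3, "stream": "ECE", "deployed": "no", "notes": "late"},
    ]
    clean = drop_id_columns(drop_sparse_columns(rows), {"student_id"})
    for row in clean:
        row["deployed"] = to_binary(row["deployed"])
    print(one_hot(clean, "stream"))
```

Dummy (one-hot) encoding is only one way to turn nominal columns into numbers; it is used here because it matches the slide's later feature list, which names individual levels such as the ECE and IT streams.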
Feature Selection & Training
• After applying two feature selection methods, information gain (infogain) and glmnet, the following variables were used to train the models:
• Education stream (ECE and IT), technical skill (Java and Testing), number of training days completed, gender, quarter of the year, and hiring mode.
• The data was then partitioned into training and test sets in a 70:30 ratio.
• Because the dependent (target) variable is binary, three classification models were used: binary logistic regression (BLR) as the baseline for comparison, and two deep learning models built with two different libraries.
• Below is a glimpse of the independent variables (features) and their p-values:
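Information gain, one of the two selectors named above, scores a categorical feature by how much it reduces the entropy of the deployable/non-deployable target. The sketch below assumes categorical features and computes entropy in bits; the tool the author used may use a different log base, which rescales the scores without changing their order. The glmnet (penalised regression) step is not reproduced. The split helper is a hypothetical stratified 70:30 split: the slide gives only the ratio, so stratifying by class and the fixed `seed` are assumptions.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Drop in target entropy after splitting the rows on a categorical feature."""
    n = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(target) - conditional


def rank_features(columns: Dict[str, Sequence[Hashable]],
                  target: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Score every candidate feature and sort from most to least informative."""
    scores = [(name, information_gain(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def stratified_split(target: Sequence[Hashable], train_share: float = 0.7,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train, test) row indices, splitting each class separately at train_share."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(target):
        by_class.setdefault(label, []).append(index)
    train: List[int] = []
    test: List[int] = []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_share)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)
```

Stratifying keeps the share of non-deployable students roughly equal in both partitions, which matters when one class is much rarer than the other.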
Validation
• The table below reports the AUC of each model on the test set.
• BLR, the baseline, reached an AUC of only 0.54, barely above the 0.5 expected from random ranking, whereas both deep learning classifiers scored clearly higher (0.86 with TensorFlow and 0.73 with H2O).
• Thus, on this data the deep learning classifiers separated deployable from non-deployable students better than the traditional binary logistic regression model.
Model                               Test AUC
Binary Logistic Regression (BLR)    0.54
Deep Learning (TensorFlow)          0.86
Deep Learning (H2O)                 0.73
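The comparison above rests on AUC. A minimal sketch of the baseline side of that comparison, assuming plain Python rather than the tooling the author used: a binary logistic regression fitted by batch gradient descent, and AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. Under that definition random scoring gives 0.5, which is why 0.54 for BLR is close to chance. The deep learning models are not reproduced, and the toy data and parameter values (`lr`, `epochs`) are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], x: Sequence[float]) -> float:
    """P(y = 1 | x) for weights laid out as [bias, w1, ..., wk]."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Fit binary logistic regression by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            error = predict_proba(weights, xi) - yi  # log-loss derivative w.r.t. the logit
            grad[0] += error
            for j, xij in enumerate(xi):
                grad[j + 1] += error * xij
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical one-feature toy data (e.g. scaled training days).
    X_train = [[0.0], [1.0], [2.0], [3.0]]
    y_train = [0, 0, 1, 1]
    w = fit_logistic(X_train, y_train)
    scores = [predict_proba(w, x) for x in X_train]
    print(roc_auc(y_train, scores))
```

For brevity the toy example scores its own training rows; the deck's figures are test-set AUCs. The pairwise loop costs O(positives × negatives), which is acceptable for a test set of roughly a thousand rows (30% of 3,470); rank-based formulas give the same value faster.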
Ml application on_student_non_deployment