Project 3 - Income Qualification - Source Code

Uploaded by sneha fabey


Project2: Income Qualification

Import os and Warnings

Problem Statement Scenario:

Many social programs have a hard time making sure the right people are given enough aid. It’s tricky when a
program focuses on the poorest segment of the population. This segment of the population can’t provide the
necessary income and expense records to prove that they qualify.

In Latin America, a popular method called Proxy Means Test (PMT) uses an algorithm to verify income
qualification. With PMT, agencies use a model that considers a family’s observable household attributes like
the material of their walls and ceiling or the assets found in their homes to classify them and predict their
level of need. While this is an improvement, accuracy remains a problem as the region’s population grows
and poverty declines.

The Inter-American Development Bank (IDB) believes that new methods beyond traditional econometrics,
based on a dataset of Costa Rican household characteristics, might help improve PMT’s performance.

Let us explore our dataset before moving further.

Let us identify our target variable.
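A minimal pandas sketch of these two steps. It assumes the data sits in a DataFrame and that the target column is named `Target` (the name used in the public Kaggle release of this Costa Rican dataset; the text above does not name it). The rows are made-up stand-ins, and the CSV file name in the comment is hypothetical.

```python
import pandas as pd

# Made-up stand-in rows; the real project would load its own CSV,
# e.g. pd.read_csv("train.csv")  (hypothetical file name)
df = pd.DataFrame({
    "Id": ["ID_1", "ID_2", "ID_3", "ID_4"],
    "idhogar": ["h1", "h1", "h2", "h3"],
    "v2a1": [190000.0, 190000.0, None, 130000.0],
    "Target": [4, 4, 2, 1],   # assumed target column name
})

# Basic shape and a first look at the rows
print(df.shape)
print(df.head())

# The target is the column we want to predict; inspect its class balance
target_counts = df["Target"].value_counts().sort_index()
print(target_counts)
```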

Let's understand the data types.

We have mixed data types, as listed below:

• float64: 8 variables
• int64: 130 variables
• object: 5 variables
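The dtype summary above can be produced with `DataFrame.dtypes`; this sketch uses a small made-up frame rather than the project's data, so the counts differ from the ones reported.

```python
import pandas as pd

# Made-up stand-in frame mixing the three dtypes reported above
df = pd.DataFrame({
    "v2a1": [190000.0, 135000.0],   # float64
    "rooms": [3, 4],                # int64
    "Target": [4, 2],               # int64
    "dependency": ["yes", "0.5"],   # object (mixed strings)
})

# Count columns per dtype, as in the summary above
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)

# List the object columns that still need converting
object_cols = df.select_dtypes(include="object").columns.tolist()
print(object_cols)
```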

Below is the data dictionary for the object variables above:

• ID = Unique ID
• idhogar: household-level identifier
• dependency: dependency rate, calculated as (number of household members younger than 19 or older
than 64)/(number of household members between 19 and 64)
• edjefe: years of education of the male head of household, based on the interaction of escolari (years of
education), head of household and gender; yes=1 and no=0
• edjefa: years of education of the female head of household, based on the interaction of escolari (years of
education), head of household and gender; yes=1 and no=0

Let's convert the object variables into numerical data.
Now all data is in numerical form.
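One possible way to do this conversion for the three mixed columns, assuming "yes" maps to 1 and "no" maps to 0 as the dictionary notes; the exact method used in the project is not shown, and the identifier columns (ID, idhogar) are left out of this sketch.

```python
import pandas as pd

# Made-up stand-in rows: these columns mix numbers with "yes"/"no" strings
df = pd.DataFrame({
    "dependency": ["yes", "no", "0.5", "2"],
    "edjefe": ["no", "6", "yes", "11"],
    "edjefa": ["no", "no", "9", "yes"],
})

# Assumption: "yes" -> 1 and "no" -> 0; other values are numeric strings
mapping = {"yes": 1, "no": 0}
for col in ["dependency", "edjefe", "edjefa"]:
    df[col] = df[col].map(lambda v: mapping.get(v, v)).astype(float)

print(df.dtypes)
```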

Let's identify variables with zero variance.

elimbasu5: 1 if rubbish disposal is mainly by throwing it in a river, creek or sea.

Interpretation: All values of elimbasu5 are the same, so this variable has no variability in the dataset;
therefore we will drop it.
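A short sketch of the zero-variance check on made-up data: a column with a single unique value carries no information, so it is dropped.

```python
import pandas as pd

# Made-up stand-in rows; elimbasu5 is constant, as reported above
df = pd.DataFrame({
    "elimbasu5": [0, 0, 0, 0],
    "elimbasu1": [1, 0, 1, 1],
    "rooms": [3, 4, 2, 5],
})

# A column with a single unique value has zero variance
zero_var_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
print(zero_var_cols)

# Drop the constant columns
df = df.drop(columns=zero_var_cols)
```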

Check if there are any biases in your dataset.

The variables ('r4t3', 'hogar_total') are related to each other; for a good result we can keep either one
of them.
The variables ('tipovivi3', 'v2a1') are related to each other; for a good result we can keep either one
of them.
The variables ('v18q', 'v18q1') are related to each other; for a good result we can keep either one
of them.

Conclusion: Therefore, there is bias (redundant, related variables) in our dataset.
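A sketch of how such related pairs can be flagged with pairwise correlation. The data is made up and the 0.8 threshold is an arbitrary assumption. Correlation catches numeric pairs like r4t3/hogar_total and v18q/v18q1; a link such as tipovivi3/v2a1, where rent only applies to rented homes, may need a different check such as a cross tab.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows mimicking the related pairs named above
df = pd.DataFrame({
    "r4t3":        [1, 2, 3, 4, 5, 6],
    "hogar_total": [1, 2, 3, 4, 5, 6],
    "v18q":        [0, 1, 1, 0, 1, 1],
    "v18q1":       [0, 1, 1, 0, 1, 2],
    "rooms":       [4, 2, 5, 3, 6, 1],
})

# Absolute pairwise Pearson correlations; keep the upper triangle only
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Report pairs above the threshold (NaN cells compare as False)
threshold = 0.8
pairs = [
    (row, col)
    for row in upper.index
    for col in upper.columns
    if upper.loc[row, col] > threshold
]
print(pairs)
```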

Check if there is a house without a family head.

"parentesco1" = 1 if household head
Interpretation: The cross tab above shows 0 for both male head and female head in 435 families, which
implies that there are 435 families with no family head.
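The cross tab itself is not shown, so here is an alternative way to count headless households, not necessarily the author's method: sum the parentesco1 flag within each idhogar and keep households whose sum is 0. The rows are made up.

```python
import pandas as pd

# Made-up stand-in rows: household "h3" has no member with parentesco1 == 1
df = pd.DataFrame({
    "idhogar":     ["h1", "h1", "h2", "h3", "h3"],
    "parentesco1": [1, 0, 1, 0, 0],
})

# Sum the head-of-household flag within each household
heads_per_household = df.groupby("idhogar")["parentesco1"].sum()

# Households whose sum is 0 have no family head
no_head = heads_per_household[heads_per_household == 0].index.tolist()
print(len(no_head), no_head)
```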

Count how many null values exist in each column.
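A sketch of the null count, expressed both as absolute numbers and as a percentage of rows so the "more than 50%" rule below can be read off directly. The rows are made up.

```python
import pandas as pd

# Made-up stand-in rows with gaps in the columns discussed below
df = pd.DataFrame({
    "v2a1":    [190000.0, None, None, 130000.0],
    "v18q1":   [1.0, None, None, None],
    "rez_esc": [None, None, 0.0, None],
    "Target":  [4, 2, 2, 1],
})

# Absolute and percentage null counts per column, worst first
null_counts = df.isnull().sum()
null_pct = (null_counts / len(df) * 100).sort_values(ascending=False)
print(null_pct[null_pct > 0])
```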

Interpretation: There are no null values in the Target variable. Now let's identify and fill the null values
in the other variables.
Interpretation and action: 'v2a1', 'v18q1' and 'rez_esc' have more than 50% null values. For v2a1, there are
families who own their house, so they don't pay rent and the value should be 0; similarly, for v18q1, there
can be families with 0 tablets.

Instead, we can drop the columns tipovivi3 and v18q:

• tipovivi3: =1 rented
• v18q: owns a tablet

since v2a1 alone can show whether a household rents (and how much it pays), and v18q1 alone can show
whether the respondent owns a tablet.
Interpretation: Now there are no null values in our dataset.
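A minimal sketch of the fill-and-drop step on made-up rows. Filling v2a1 and v18q1 with 0 follows the reasoning above; filling rez_esc with 0 is an assumption, since the text does not say how that column was handled.

```python
import pandas as pd

# Made-up stand-in rows
df = pd.DataFrame({
    "v2a1":      [190000.0, None, None],
    "tipovivi3": [1, 0, 0],
    "v18q":      [1, 0, 0],
    "v18q1":     [2.0, None, None],
    "rez_esc":   [None, 0.0, None],
})

# No rent recorded -> the household pays no rent; no tablet count -> 0 tablets
df[["v2a1", "v18q1"]] = df[["v2a1", "v18q1"]].fillna(0)

# Assumption: rez_esc is also filled with 0 (not stated in the text)
df["rez_esc"] = df["rez_esc"].fillna(0)

# Drop the flags that v2a1 and v18q1 already encode
df = df.drop(columns=["tipovivi3", "v18q"])
print(df.isnull().sum().sum())
```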

Set the poverty level of the members and the head of the house to be the same within a family.
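A sketch of this step, assuming exactly one head (parentesco1 == 1) per household and the `Target` column name used earlier; households without a head keep their own values here, which is an assumption since the text does not say how they were treated.

```python
import pandas as pd

# Made-up stand-in rows: one member of h1 disagrees with the head's level
df = pd.DataFrame({
    "idhogar":     ["h1", "h1", "h1", "h2", "h2"],
    "parentesco1": [1, 0, 0, 1, 0],
    "Target":      [2, 3, 2, 4, 4],
})

# Poverty level recorded for each household head (assumes one head per household)
head_target = df.loc[df["parentesco1"] == 1].set_index("idhogar")["Target"]

# Give every member the head's level; headless households keep their own values
df["Target"] = df["idhogar"].map(head_target).fillna(df["Target"]).astype(int)
print(df)
```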
People below the poverty level are likely to be those who pay less rent and don't own a house; this
also depends on whether the house is in an urban or a rural area.

• Rural areas: people paying rent of less than 8000 are under the poverty level.
• Urban areas: people paying rent of less than 140000 are under the poverty level.
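A sketch of the area-dependent rent rule. The column `area1` (1 = urban, 0 = rural) is an assumption borrowed from the public data dictionary; the thresholds come from the text. The house-ownership condition mentioned above is not modeled here.

```python
import pandas as pd

# Made-up stand-in rows
df = pd.DataFrame({
    "v2a1":  [5000.0, 9000.0, 120000.0, 150000.0],
    "area1": [0, 0, 1, 1],   # assumed: 1 = urban, 0 = rural
})

RURAL_LIMIT = 8000     # rent threshold for rural households (from the text)
URBAN_LIMIT = 140000   # rent threshold for urban households (from the text)

# Pick the threshold that matches each household's area
limit = df["area1"].map({1: URBAN_LIMIT, 0: RURAL_LIMIT})

# Note: the house-ownership condition in the text is not modeled here
df["below_poverty"] = (df["v2a1"] < limit).astype(int)
print(df)
```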
Interpretation:

• There are 1242 people above the poverty level regardless of area (rural or urban).
• For the remaining 1111 people, the poverty level depends on their area.

Rural:

Above poverty level = 445

Urban:

Above poverty level = 1103

Below poverty level = 1081

Applying standard scaling to the dataset
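A sketch using scikit-learn's `StandardScaler`, which rescales each column to mean 0 and standard deviation 1. The features are made up; in practice the scaler is fit on training data and reused on test data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up stand-in features
X = pd.DataFrame({
    "v2a1":  [0.0, 50000.0, 100000.0, 150000.0],
    "rooms": [2, 3, 4, 5],
})

# Fit on training features only, then reuse the same scaler for test data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column now has mean 0 and (population) standard deviation 1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```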

Now we will proceed to model fitting.
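A sketch of fitting a random forest (the model named in the conclusion) with a held-out validation split. The features and labels are synthetic, so the accuracy printed says nothing about the project's 90% figure; the split size and `n_estimators` are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in features/labels; the project would use its scaled data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + 1   # two synthetic classes: 1 and 2

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.2f}")
```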

Let's identify the best parameters for our model using GridSearchCV.
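A minimal `GridSearchCV` sketch over a random forest. The parameter grid is illustrative only, since the values searched in the project are not given, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)

# Illustrative grid (assumed values, not the project's)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# 3-fold cross-validation for every combination in the grid
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```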

Let's apply the same cleaning to the test data and then generate predictions for it.
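One way to keep train and test cleaning identical is to bundle the steps in a single function. `clean` below is a hypothetical helper covering only a few of the steps above, and the rows are made up. The scaler is fit on the training rows and reused, unchanged, on the test rows.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def clean(df):
    """Hypothetical helper bundling a few cleaning steps from above."""
    out = df.copy()
    mapping = {"yes": 1, "no": 0}
    out["dependency"] = out["dependency"].map(lambda v: mapping.get(v, v)).astype(float)
    out[["v2a1", "v18q1"]] = out[["v2a1", "v18q1"]].fillna(0)
    return out.drop(columns=["tipovivi3", "v18q"], errors="ignore")

# Made-up stand-in rows
train = pd.DataFrame({
    "v2a1": [100000.0, None, 50000.0, None, 120000.0, None],
    "v18q1": [1.0, None, None, 2.0, None, None],
    "tipovivi3": [1, 0, 1, 0, 1, 0],
    "v18q": [1, 0, 0, 1, 0, 0],
    "dependency": ["yes", "no", "0.5", "2", "no", "yes"],
    "Target": [4, 2, 3, 4, 3, 1],
})
test = train.drop(columns=["Target"]).iloc[:3]

X_train = clean(train.drop(columns=["Target"]))
X_test = clean(test)

# Fit the scaler on training data only and reuse it for the test rows
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X_train), train["Target"])

predictions = model.predict(scaler.transform(X_test))
print(predictions)
```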

Interpretation: Above are our predictions for the test data.

Conclusion:
Using a Random Forest classifier, we can predict the test data with an accuracy of 90%.
