Project 3 - Income Qualification - Source Code
In Latin America, a popular method called Proxy Means Test (PMT) uses an algorithm to verify income
qualification. With PMT, agencies use a model that considers a family’s observable household attributes like
the material of their walls and ceiling or the assets found in their homes to classify them and predict their
level of need. While this is an improvement, accuracy remains a problem as the region’s population grows
and poverty declines.
The Inter-American Development Bank (IDB) believes that new methods beyond traditional econometrics,
based on a dataset of Costa Rican household characteristics, might help improve PMT’s performance.
float64: 8 variables
int64: 130 variables
object: 5 variables
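A count of variables per data type like the one above can be produced with pandas. The sketch below is a minimal illustration on a toy frame; the frame `df` and its values are assumptions made for the example, not the project's data.

```python
import pandas as pd

# Toy frame standing in for the training data (hypothetical values).
df = pd.DataFrame({
    "Id": ["ID_1", "ID_2"],       # text column
    "v2a1": [190000.0, None],     # float64 column
    "rooms": [3, 4],              # int64 column
})

# Count how many columns share each dtype, keyed by the dtype's name.
dtype_counts = {str(dtype): int(n) for dtype, n in df.dtypes.value_counts().items()}
print(dtype_counts)
```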
ID = Unique ID
idhogar, Household level identifier
dependency, Dependency rate, calculated = (number of members of the household younger than 19
or older than 64)/(number of members of the household between 19 and 64)
edjefe, years of education of male head of household, based on the interaction of escolari (years of
education), head of household and gender, yes=1 and no=0
edjefa, years of education of female head of household, based on the interaction of escolari (years of
education), head of household and gender, yes=1 and no=0
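The dependency rate defined above can be worked through on a small example. This is a sketch only: the function name `dependency_rate` and the choice to return `None` when a household has no members aged 19 to 64 are assumptions for illustration.

```python
def dependency_rate(ages):
    """Dependents (younger than 19 or older than 64) per member aged 19 to 64."""
    dependents = sum(1 for age in ages if age < 19 or age > 64)
    working_age = sum(1 for age in ages if 19 <= age <= 64)
    if working_age == 0:
        # Assumption: the ratio is undefined without working-age members.
        return None
    return dependents / working_age

# Two adults, one child and one elderly member: 2 dependents / 2 adults.
print(dependency_rate([30, 35, 5, 70]))
```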
Let's convert the object variables into numerical data.
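One way to do this conversion, assuming the object columns mix numeric strings with the labels yes and no (mapped to 1 and 0 as in the variable descriptions), is sketched below. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: numeric strings mixed with "yes"/"no" labels.
df = pd.DataFrame({
    "dependency": ["no", "8", "yes", ".5"],
    "edjefe": ["10", "no", "yes", "6"],
    "edjefa": ["no", "11", "no", "yes"],
})

# Replace the labels with numeric strings (yes=1, no=0), then parse as floats.
labels = {"yes": "1", "no": "0"}
for col in ["dependency", "edjefe", "edjefa"]:
    df[col] = pd.to_numeric(df[col].replace(labels)).astype(float)

print(df.dtypes)
```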
Now all the data is in numerical form.
Interpretation: The output above shows that every value of elimbasu5 is the same, so the variable has no
variability; we will therefore drop it.
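A generic way to find and drop columns with a single repeated value is sketched here. The toy frame is an assumption; only the column name elimbasu5 comes from the text.

```python
import pandas as pd

# Toy frame: elimbasu5 holds one repeated value, rooms varies (hypothetical values).
df = pd.DataFrame({"elimbasu5": [0, 0, 0, 0], "rooms": [3, 4, 2, 5]})

# A column with at most one distinct value carries no information.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
df = df.drop(columns=constant_cols)

print(constant_cols)
```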
tipovivi3, =1 rented
v18q, owns a tablet
tipovivi3 is redundant, as v2a1 alone can show both; likewise v18q is redundant, as v18q1 alone can show whether the respondent owns a tablet.
Interpretation: There are now no null values in our dataset.
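A check of this kind can be written as a small helper that lists the columns still containing missing values. The helper name `columns_with_nulls` and both sample frames are hypothetical; how the missing values were filled is not shown here.

```python
import pandas as pd

def columns_with_nulls(df):
    """Return the names of columns that still contain missing values."""
    null_counts = df.isnull().sum()
    return null_counts[null_counts > 0].index.tolist()

# Hypothetical frames: one with a missing rent value, one without.
before = pd.DataFrame({"v2a1": [None, 120000.0], "v18q1": [1.0, 2.0]})
after = pd.DataFrame({"v2a1": [0.0, 120000.0], "v18q1": [1.0, 2.0]})

print(columns_with_nulls(before))
print(columns_with_nulls(after))
```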
Set the poverty level of every family member to match that of the head of the household.
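A possible way to align the labels within each household is sketched below. It assumes that the poverty level is stored in a column named `Target` and that `parentesco1` equals 1 for the head of the household; neither name appears in the text above, so treat both as assumptions. Households without a head keep their original labels.

```python
import pandas as pd

# Hypothetical sample: household "a" has a member whose label differs from the head's.
df = pd.DataFrame({
    "idhogar": ["a", "a", "a", "b", "b"],
    "parentesco1": [1, 0, 0, 1, 0],   # assumed flag: 1 = head of household
    "Target": [2, 3, 2, 4, 4],        # assumed name of the poverty-level column
})

# Look up the head's label per household (keep the first head if duplicated).
head_target = df.loc[df["parentesco1"] == 1].set_index("idhogar")["Target"]
head_target = head_target[~head_target.index.duplicated()]

# Give every member the head's label; fall back to the member's own label.
df["Target"] = df["idhogar"].map(head_target).fillna(df["Target"]).astype(int)

print(df["Target"].tolist())
```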
People below the poverty level may be those who pay low rent and do not own a
house; this also depends on whether the house is in an urban or a rural area.
In rural areas, people paying rent of less than 8000 are below the poverty level.
In urban areas, people paying rent of less than 140000 are below the poverty level.
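The rent rule above can be expressed as a small function. This is a sketch of the stated thresholds only; the function name, its parameters, and the treatment of home owners as not covered by the rule are assumptions for illustration.

```python
RURAL_RENT_THRESHOLD = 8000     # from the rule stated for rural areas
URBAN_RENT_THRESHOLD = 140000   # from the rule stated for urban areas

def below_poverty_by_rent(rent, is_urban, owns_house=False):
    """Apply the rent rule: renters paying less than the area threshold qualify."""
    if owns_house:
        # Assumption: the rule only applies to people who do not own a house.
        return False
    threshold = URBAN_RENT_THRESHOLD if is_urban else RURAL_RENT_THRESHOLD
    return rent < threshold

# The same rent can fall on different sides of the rule depending on the area.
print(below_poverty_by_rent(100000, is_urban=False))
print(below_poverty_by_rent(100000, is_urban=True))
```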
Interpretation:
A total of 1242 people are above the poverty level regardless of whether their area is rural or urban.
The level of the remaining 1111 people depends on their area.
Rural :
Urban :
Conclusion:
Using a random forest classifier, we can predict test_data with an accuracy of 90%.
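For reference, a minimal sketch of training and scoring a random forest with scikit-learn follows. It runs on synthetic data generated for the example, so its score says nothing about the accuracy reported above; the feature matrix, labels, and hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the household features and labels (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"held-out accuracy: {accuracy:.2f}")
```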