ML - Expt 7
Apply Dimensionality Reduction on Adult Census Income
Dataset and analyze the performance of the model
Date of Performance: 04/09/2023
Date of Submission: 11/09/2023
Aim: Apply dimensionality reduction to the Adult Census Income dataset and analyze the
performance of the model.
Theory:
In machine learning classification problems, the final classification often depends on a
large number of input variables, called features. The more features there are, the harder it
becomes to visualize the training set and to work with it. Often, many of these features are
correlated with one another and are therefore redundant. This is where dimensionality
reduction algorithms come into play. Dimensionality reduction is the process of reducing the
number of variables under consideration by obtaining a smaller set of principal variables.
It can be divided into feature selection, which keeps a subset of the original features, and
feature extraction, which derives new features from combinations of the original ones.
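The difference between feature selection and feature extraction can be illustrated with a
minimal sketch. The data here is synthetic and purely an assumption for illustration (three
features, one of which is nearly a copy of another); SelectKBest and PCA from scikit-learn
stand in for the two families of methods and are not necessarily the ones used in this
experiment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic data (assumption, for illustration only): x2 is almost a copy
# of x1, so it is redundant; x3 is independent noise.
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = (x1 > 0).astype(int)  # the label depends only on x1

# Feature selection: keep 2 of the original columns, unchanged.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
selected_columns = selector.get_support(indices=True)

# Feature extraction: build 2 new features as linear combinations of all 3.
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

print("Selected original columns:", selected_columns)
print("Variance retained by 2 principal components: {:.3f}".format(explained))
```

Because x1 and x2 carry almost the same information, two principal components retain nearly
all of the variance of the three original features, which is the sense in which correlated
features are redundant.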
Dataset:
The task is to predict whether a person's income exceeds $50K/yr based on census data. The
dataset is also known as the "Adult" dataset.
Attribute Information:
Listing of attributes:
Class label (income): >50K, <=50K.
age: continuous.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc,
9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
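Before dimensionality reduction can be applied, categorical attributes such as education
have to be converted to numbers, and continuous attributes are usually standardized. The
following is a minimal sketch of that step, assuming LabelEncoder and StandardScaler are used
as the imports in the code listing suggest. The four rows are made up for illustration and
are not records from the dataset, and the column name "income" for the class label is an
assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows for illustration only; these are NOT records from the dataset.
df = pd.DataFrame({
    "age": [39, 50, 28, 44],
    "education": ["Bachelors", "HS-grad", "Masters", "HS-grad"],
    "hours-per-week": [40, 13, 40, 60],
    "income": ["<=50K", "<=50K", ">50K", ">50K"],  # assumed column name
})

# Categorical columns: map each distinct string to an integer code.
for col in ["education", "income"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Continuous columns: standardize to zero mean and unit variance.
# PCA is sensitive to feature scale, so this step matters before reduction.
continuous = ["age", "hours-per-week"]
df[continuous] = StandardScaler().fit_transform(df[continuous])

print(df)
```

Label encoding assigns integer codes in sorted order of the category names, so the codes
carry no meaningful ranking. One-hot encoding is a common alternative when that implied
ordering is a concern.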
Code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
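# NOTE: the data loading, preprocessing, PCA, and model training steps are not shown in
# this listing. accuracy, precision, recall, and f1 are assumed to have been computed on
# the test set with accuracy_score, precision_score, recall_score, and f1_score.
print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1))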
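The listing above contains only the imports and the final metric printout. As a hedged
sketch of what the steps in between might look like, the example below runs a
scale, reduce, train, and evaluate pipeline and compares a random forest trained on all
features with one trained on PCA components. It uses synthetic data from make_classification
as a stand-in for the census data. The number of features, the 95% variance threshold, the
split ratio, and the helper name evaluate are all assumptions and are not taken from the
original experiment.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the census data (assumption): 14 features, some redundant.
X, y = make_classification(n_samples=1000, n_features=14, n_informative=5,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)


def evaluate(train, test):
    """Train a random forest and return (accuracy, F1) on the test split."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(train, y_train)
    pred = model.predict(test)
    return accuracy_score(y_test, pred), f1_score(y_test, pred)


# Baseline: all 14 scaled features.
acc_full, f1_full = evaluate(X_train_s, X_test_s)

# Reduced: keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95).fit(X_train_s)
acc_pca, f1_pca = evaluate(pca.transform(X_train_s), pca.transform(X_test_s))

print("Components kept: {} of {}".format(pca.n_components_, X.shape[1]))
print("All features - Accuracy: {:.2f}, F1: {:.2f}".format(acc_full, f1_full))
print("PCA features - Accuracy: {:.2f}, F1: {:.2f}".format(acc_pca, f1_pca))
```

Fitting the scaler and PCA on the training split alone keeps information from the test set
out of the transformation, so the comparison between the two models reflects performance on
unseen data.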