ML - Expt 7


Experiment No. 7
Apply Dimensionality Reduction on Adult Census Income Dataset and analyze the performance of the model
Date of Performance: 04/09/2023
Date of Submission: 11/09/2023
Aim: Apply Dimensionality Reduction on Adult Census Income Dataset and analyze the
performance of the model.

Objective: To perform various feature engineering tasks, apply dimensionality reduction to the given dataset, and maximize accuracy, precision, recall, and F1 score.

Theory:

In machine learning classification problems, there are often many factors on which the final classification is based. These factors are variables called features. The higher the number of features, the harder it becomes to visualize the training set and work with it. Often, many of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
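
As an illustration of the difference, the short sketch below uses scikit-learn on synthetic data (the synthetic dataset and the choice of 5 retained features are assumptions for illustration only): feature selection keeps a subset of the original columns, while feature extraction such as PCA builds new components from combinations of all of them.

# Illustrative sketch only: feature selection vs. feature extraction (synthetic data)
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

# Synthetic stand-in for a preprocessed feature matrix and labels
X_demo, y_demo = make_classification(n_samples=200, n_features=14, random_state=0)

# Feature selection: keep the 5 original features with the best ANOVA F-scores
X_selected = SelectKBest(score_func=f_classif, k=5).fit_transform(X_demo, y_demo)

# Feature extraction: build 5 new principal components from combinations of all features
X_extracted = PCA(n_components=5).fit_transform(X_demo)

print(X_selected.shape, X_extracted.shape)  # both (200, 5), but the columns mean different things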

Dataset:

The task is to predict whether income exceeds $50K/yr based on census data. This is also known as the "Adult" dataset.

Attribute Information:

Listing of attributes:

income (class label): >50K, <=50K.

age: continuous.

workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.

fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc,
9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.

education-num: continuous.

marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.

occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.

relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.

race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.

sex: Female, Male.

capital-gain: continuous.

capital-loss: continuous.

hours-per-week: continuous.

native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad & Tobago, Peru, Hong, Holand-Netherlands.
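
For reference, a minimal loading sketch is given below. It assumes the raw UCI adult.data file, which has no header row, so the attribute names listed above are supplied explicitly; the file path and the use of '?' as the missing-value marker are assumptions to adjust to your copy of the data.

# Minimal loading sketch (assumes the raw UCI adult.data file with no header row)
import pandas as pd

columns = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country", "income"]

adult = pd.read_csv("adult.data", header=None, names=columns,
                    na_values="?", skipinitialspace=True)
print(adult.shape)                      # number of rows x 15 attributes
print(adult["income"].value_counts())   # class balance of >50K vs <=50K

Depending on how the file is exported to CSV, the income column may end up with a different name (the code in the next section refers to it as '>50K'), so it is worth checking the target column name before splitting features and labels.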

Code:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset (adjust the file name/path to your copy of the Adult Census Income data)
data = pd.read_csv("adult.csv")

# Encode categorical features
categorical_features = data.select_dtypes(include=['object']).columns
for feature in categorical_features:
    data[feature] = LabelEncoder().fit_transform(data[feature])

# Split the data into features (X) and target (y)
X = data.drop('>50K', axis=1)
y = data['>50K']

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform dimensionality reduction using PCA
n_components = 10  # Adjust the number of components as needed
pca = PCA(n_components=n_components)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Train a classifier (Random Forest, for example)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_pca, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test_pca)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1 Score: {:.2f}".format(f1))

# Optional: Inspect the explained variance ratio of each principal component
explained_variance_ratio = pca.explained_variance_ratio_
print("Explained Variance Ratio for Each Principal Component:")
print(explained_variance_ratio)
Output:
Accuracy: 0.85
Precision: 0.72
Recall: 0.62
F1 Score: 0.66
Explained Variance Ratio for Each Principal Component:
[0.15518513 0.10236402 0.09369864 0.08605513 0.08026009 0.07491667
0.07026711 0.06332068 0.06128732 0.04822278]
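
The ten ratios above sum to roughly 0.84, i.e. 10 components retain about 84% of the variance in the standardized features. A common way to choose n_components is to look at the cumulative explained variance; the sketch below (reusing X_train from the code above, with matplotlib as an extra dependency) is one way to do that.

# Sketch: cumulative explained variance as a guide for choosing n_components
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca_full = PCA().fit(X_train)                      # keep all components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

plt.plot(range(1, len(cumulative) + 1), cumulative, marker="o")
plt.axhline(0.95, linestyle="--", color="grey")    # e.g. aim for 95% of the variance
plt.xlabel("Number of components")
plt.ylabel("Cumulative explained variance")
plt.show()

# Or compute the smallest number of components reaching the target directly
n_95 = int(np.argmax(cumulative >= 0.95)) + 1
print("Components needed for 95% of the variance:", n_95)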
Conclusion:
Dimensionality reduction can produce a range of outcomes. On the positive side, it can help prevent overfitting, improve the model's ability to generalize to new data, and make computation more efficient. On the downside, it can discard valuable information, which may reduce the model's precision and recall, and it may struggle with noisy data. In this experiment, reducing the standardized features to 10 principal components (retaining roughly 84% of the variance) still yielded an accuracy of 0.85, with a precision of 0.72, recall of 0.62, and F1 score of 0.66.
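
One way to quantify this trade-off on this dataset is to train the same random forest on the unreduced, standardized features and compare the metrics with the PCA results above; a minimal sketch (reusing X_train, X_test, y_train and y_test from the code above) follows.

# Sketch: baseline on the full standardized feature set, for comparison with the PCA pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

baseline = RandomForestClassifier(n_estimators=100, random_state=42)
baseline.fit(X_train, y_train)                     # all features, no PCA
y_pred_base = baseline.predict(X_test)

print("Baseline Accuracy: {:.2f}".format(accuracy_score(y_test, y_pred_base)))
print("Baseline F1 Score: {:.2f}".format(f1_score(y_test, y_pred_base)))
# The gap (if any) between these numbers and the PCA results above indicates
# how much predictive performance the 10-component reduction gives up.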
