Subset Selection Class Assignment

The Python program:
1) Imports the necessary libraries and reads in a diamond price data set
2) Handles missing data through imputation of interval and categorical variables
3) Encodes categorical variables and scales interval variables
4) Splits the data into training and test sets and fits a linear regression model
5) Evaluates model performance and prints the minimum, maximum, and mean of the predicted and actual values
6) Prints the first 15 observations of predicted values
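The steps listed above can be sketched end to end with scikit-learn's `Pipeline` and `ColumnTransformer`. This is only a minimal sketch under stated assumptions, not the program submitted for the assignment: the data frame `toy` and its values are synthetic stand-ins for the diamond file, and only three of the predictors are imitated.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the diamond data (hypothetical values, not the real file).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Carat": rng.uniform(0.2, 2.0, n),
    "depth": rng.uniform(55, 70, n),
    "cut": rng.choice(["Fair", "Good", "Ideal"], n),
})
toy["price"] = 4000 * toy["Carat"] + rng.normal(0, 100, n)

# Knock out a few predictor values to imitate missing data.
toy.loc[toy.index[::10], "Carat"] = np.nan
toy.loc[toy.index[5::20], "cut"] = np.nan

interval_cols = ["Carat", "depth"]
nominal_cols = ["cut"]

# Interval columns: mean imputation, then standardization.
# Nominal columns: most-frequent imputation, then one-hot encoding.
preprocess = ColumnTransformer([
    ("interval", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ]), interval_cols),
    ("nominal", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), nominal_cols),
])

model = Pipeline([("prep", preprocess), ("lr", LinearRegression())])

X = toy.drop(columns="price")
y = toy["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model.fit(X_train, y_train)
y_hat = model.predict(X_test)

print("predicted min/max/mean:", y_hat.min(), y_hat.max(), y_hat.mean())
print("actual    min/max/mean:", y_test.min(), y_test.max(), y_test.mean())
print("first 15 predictions:", y_hat[:15])
```

Wrapping the imputers, scaler and encoder in one pipeline means they are fitted on the training rows only and then applied unchanged to the test rows.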

Uploaded by

Aashu Nema

Part 1 SAS:

1) A screen shot of your project window

2) A listing or screen shot of the min, max and mean of the predicted and actual amounts in
your data.
3) A listing of the first 15 observations after imputation and prediction.

Part 2: Python
1) A copy of your Python program
#Importing Library
import pandas as pd
import numpy as np
from AdvancedAnalytics import ReplaceImputeEncode

#Reading Data
df = pd.read_excel("diamondswmissing.xlsx")

#Missing Values
# Valid range or categories for each attribute
data_map = {
    'obs': [4,(1,53940)],
    'Carat':[0,(0.2,5.5)],
    'cut':[2,('Fair','Good','Ideal','Premium','Very Good')],
    'color':[2,('D','E','F','G','H','I','J')],
    'clarity':[2,('I1','IF','SI1','SI2','VS1','VS2','VVS1','VVS2')],
    'depth':[0,(40,80)],
    'table':[0,(40,100)],
    'x':[0,(0,11)],
    'y':[0,(0,60)],
    'z':[0,(0,32)],
    'price':[0,(300,20000)]
}
rie = ReplaceImputeEncode(data_map=data_map,display=True)
df_rie = rie.fit_transform(df)

#Imputing Missing Values

from sklearn import preprocessing
from sklearn.impute import SimpleImputer
interval_attributes = ['Carat','depth','table','x','y','z']
# DataFrame.as_matrix() and preprocessing.Imputer no longer exist in current pandas and scikit-learn
interval_data = df[interval_attributes].to_numpy()
interval_imputer = SimpleImputer(strategy = 'mean')
imputed_interval_data = interval_imputer.fit_transform(interval_data)

print("Imputed Interval Data:\n", imputed_interval_data)

# Convert String Categorical Attribute to Numbers for further assessment

# Mapping of categories to numbers for attribute 'cut'
cut_map = {'Ideal':0, 'Premium':1, 'Good':2, 'Very Good':3, 'Fair':4}
df['cut'] = df['cut'].map(cut_map)
# Mapping of categories to numbers for attribute 'color'
color_map = {'E':0,'I':1,'J':2,'H':3,'F':4,'G':5,'D':6}
df['color'] = df['color'].map(color_map)
# Mapping of categories to numbers for attribute 'clarity'
# The key must be 'SI1'; a category missing from the map becomes NaN
clarity_map = {'SI2':0,'SI1':1,'VS1':2,'VS2':3,'VVS2':4,'VVS1':5,'I1':6,'IF':7}
df['clarity'] = df['clarity'].map(clarity_map)
print(df)

# Converting nominal data from the dataframe into a numpy array

from sklearn.impute import SimpleImputer
nominal_attributes = ['cut','color','clarity']
nominal_data = df[nominal_attributes].to_numpy()
# Create Imputer for Categorical Data
cat_imputer = SimpleImputer(strategy='most_frequent')
# Imputing missing values in the Categorical Data
imputed_nominal_data = cat_imputer.fit_transform(nominal_data)
# Inserting imputed data in the data frame
df[['cut','color','clarity']] = imputed_nominal_data
df[['Carat','depth','table','x','y','z']] = imputed_interval_data
print(df.head())

#Scaling interval variables
scaler = preprocessing.StandardScaler() # Create an instance of StandardScaler()
scaler.fit(imputed_interval_data)
scaled_interval_data = scaler.transform(imputed_interval_data)
print("Imputed & Scaled Interval Data\n", scaled_interval_data)

# Create an instance of the OneHotEncoder & Selecting Attributes

onehot = preprocessing.OneHotEncoder()
hot_array = onehot.fit_transform(imputed_nominal_data).toarray()
print(hot_array)

from AdvancedAnalytics import linreg

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

y = df['price']
x = df.drop('price',axis=1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)

lr=LinearRegression()

col=[]
for i in range(x_train.shape[1]):
    col.append('X'+str(i))

lr.fit(x_train,y_train)
print("\n*** LINEAR REGRESSION ***")
linreg.display_coef(lr, x_train, y_train, col)
linreg.display_metrics(lr, x_train, y_train)

y_hat= lr.predict(x_test)
xtestarr = np.asanyarray(x_test)
ytestarr = np.asanyarray(y_test)

# The mean of the squared residuals is the mean squared error, not the residual sum of squares
print("Mean squared error: %.2f" % np.mean((y_hat - y_test) ** 2))

#Maximum, minimum and mean of predicted value

prediction_min = y_hat.min()
print("\nPredicted minimum\n",prediction_min)
prediction_max = y_hat.max()
print("\nPredicted maximum\n",prediction_max)
prediction_mean = y_hat.mean()
print("\nPredicted mean\n",prediction_mean)

#Maximum, minimum and mean of actual value
# min() and max() return the prices; idxmin() and idxmax() return row index labels

actual_min = y_test.min()
print("\nActual minimum\n",actual_min)
actual_max = y_test.max()
print("\nActual maximum\n",actual_max)
actual_mean = y_test.mean()
print("\nActual mean\n",actual_mean)

# Printing first 15 predicted values (the slice end is exclusive, so 0:15 gives 15 values)

print("\nFirst 15 predicted values\n",y_hat[0:15])

# Printing table
final_table = df.head(15)
# to_excel() writes and closes the file; ExcelWriter.save() was removed in pandas 2.0
final_table.to_excel('PythonHW.xlsx')
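The program builds `scaled_interval_data` and `hot_array`, but the regression is fitted on the columns of `df`. If the scaled and one-hot encoded arrays were meant to be the model inputs, they could be combined roughly as follows. This is a sketch on toy arrays with made-up values and a single categorical variable, not part of the submitted program.

```python
import numpy as np

# Toy stand-ins (hypothetical values) for the arrays produced by the program.
scaled_interval_data = np.array([[-1.0,  0.5],
                                 [ 0.0, -1.5],
                                 [ 1.0,  1.0]])
hot_array = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

# Drop the first one-hot column so the remaining indicator columns are not
# perfectly collinear with the regression intercept.
design = np.hstack([scaled_interval_data, hot_array[:, 1:]])
print(design.shape)  # (3, 4)
```

With several categorical variables, one indicator column per variable would be dropped; `OneHotEncoder(drop='first')` does this at encoding time.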

2) A listing or screen shot of the min, max and mean of the predicted and actual amounts in
your data.
Predicted minimum
-8484.009878847311

Predicted maximum
27244.01984476174

Predicted mean
3950.5366225183566

Actual minimum
2

Actual maximum
27746

Actual mean
3900.195464095909
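
The actual minimum of 2 and maximum of 27746 lie outside the price range of 300 to 20000 given in the data map, so they appear to be row labels rather than prices. That is what `idxmin()` and `idxmax()` return. The difference can be seen on a toy series; the prices below are made-up values, and only the two labels are borrowed from the listing.

```python
import pandas as pd

# Toy price series (hypothetical values) with non-sequential row labels,
# similar to what train_test_split leaves on y_test.
y_test = pd.Series([950.0, 326.0, 18823.0, 2401.0],
                   index=[17, 2, 27746, 104], name="price")

# idxmin()/idxmax() give the row label; min()/max() give the value itself.
print(y_test.idxmin(), y_test.min())   # 2 326.0
print(y_test.idxmax(), y_test.max())   # 27746 18823.0

# All three summary statistics in one call.
summary = y_test.agg(["min", "max", "mean"])
print(summary)
```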

3) A listing of the first 15 observations after imputation and prediction.

First 15 predicted values
[ 131.20374427 7045.54015631 3251.63967449 -152.36320968 6716.05896494
1077.15222802 6289.70190779 -116.45970468 9951.5798897 823.26302334
-242.72890304 3120.9538463 3851.97449455 430.84315957]
