Subset Selection Class Assignment
2) A listing or screen shot of the min, max and mean of the predicted and actual amounts in
your data.
3) A listing of the first 15 observations after imputation and prediction.
Part 2: Python
1) A copy of your Python program
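# Import libraries
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# linreg is assumed to be exposed at the package top level, like ReplaceImputeEncode
from AdvancedAnalytics import ReplaceImputeEncode, linreg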
#Reading Data
df = pd.read_excel("diamondswmissing.xlsx")
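# Missing values: data map of attribute -> [type code, (limits or allowed categories)]
# Note: 'Ideal' is not listed under 'cut'; add it if that cut occurs in the file.
data_map = {
    'obs':     [4, (1, 53940)],
    'Carat':   [0, (0.2, 5.5)],
    'cut':     [2, ('Fair', 'Good', 'Premium', 'Very Good')],
    'color':   [2, ('D', 'E', 'F', 'G', 'H', 'I', 'J')],
    'clarity': [2, ('I1', 'IF', 'SI1', 'SI2', 'VS1', 'VS2', 'VVS1', 'VVS2')],
    'depth':   [0, (40, 80)],
    'table':   [0, (40, 100)],
    'x':       [0, (0, 11)],
    'y':       [0, (0, 60)],
    'z':       [0, (0, 32)],
    'price':   [0, (300, 20000)]
}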
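# Impute missing values and one-hot encode the nominal attributes,
# so that LinearRegression receives numeric inputs only
rie = ReplaceImputeEncode(data_map=data_map, nominal_encoding='one-hot', display=True)
df = rie.fit_transform(df)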
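# Scaling: standardize the interval attributes (type code 0), excluding the target
interval_cols = [name for name, spec in data_map.items() if spec[0] == 0 and name != 'price']
imputed_interval_data = df[interval_cols]
scaler = preprocessing.StandardScaler()  # create an instance of StandardScaler
scaled_interval_data = scaler.fit_transform(imputed_interval_data)
print("Imputed & Scaled Interval Data\n", scaled_interval_data)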
# Split into predictors and target, then into training and test sets
y = df['price']
x = df.drop('price', axis=1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
# Fit the linear regression, labelling the coefficients X0, X1, ...
lr = LinearRegression()
col = ['X' + str(i) for i in range(x_train.shape[1])]
lr.fit(x_train, y_train)
print("\n*** LINEAR REGRESSION ***")
linreg.display_coef(lr, x_train, y_train, col)
linreg.display_metrics(lr, x_train, y_train)
# Predict on the test set and keep array copies of the test data
y_hat = lr.predict(x_test)
xtestarr = np.asanyarray(x_test)
ytestarr = np.asanyarray(y_test)
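# Write the first 15 observations, with their predicted prices, to Excel
final_table = df.head(15).copy()
final_table['predicted_price'] = lr.predict(x.head(15))
with pd.ExcelWriter('PythonHW.xlsx') as writer:  # the file is saved when the block exits
    final_table.to_excel(writer)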
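For readers without AdvancedAnalytics installed, roughly the same steps (impute, scale, encode, fit) can be sketched with scikit-learn alone. This is a swapped-in approach, not the ReplaceImputeEncode method used in the program: the mean and most-frequent imputation strategies are assumptions, the helper name `build_model` is hypothetical, and the small `toy` frame is made-up illustration data, not the diamonds file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_model(interval_cols, nominal_cols):
    """Hypothetical helper: impute, scale and encode, then fit a linear regression."""
    # Interval attributes: fill missing values with the column mean, then standardize.
    interval = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("scale", StandardScaler()),
    ])
    # Nominal attributes: fill missing values with the most frequent category, then one-hot encode.
    nominal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    prep = ColumnTransformer([
        ("interval", interval, interval_cols),
        ("nominal", nominal, nominal_cols),
    ])
    return Pipeline([("prep", prep), ("ols", LinearRegression())])


# Made-up rows for illustration only; np.nan marks the missing values.
toy = pd.DataFrame({
    "carat": [0.3, 0.5, np.nan, 1.0, 1.2, 0.7],
    "cut": ["Good", "Premium", "Good", np.nan, "Fair", "Premium"],
    "price": [500.0, 1500.0, 1200.0, 5000.0, 6000.0, 2500.0],
})

model = build_model(interval_cols=["carat"], nominal_cols=["cut"])
model.fit(toy[["carat", "cut"]], toy["price"])
predictions = model.predict(toy[["carat", "cut"]])
print(predictions)
```

Keeping the preprocessing inside the pipeline means the imputation and scaling statistics are learned only from the data passed to `fit`, which matters once a train/test split is involved.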
2) A listing or screen shot of the min, max and mean of the predicted and actual amounts in
your data.
Statistic   Predicted              Actual
Minimum     -8484.009878847311     2
Maximum     27244.01984476174      27746
Mean        3950.5366225183566     3900.195464095909
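The program listed above does not show how these summary figures were computed. A minimal sketch of one way to produce such a listing follows; the helper name `summarize` is hypothetical, and the numbers passed to it are placeholder values for illustration, not the diamonds data.

```python
import numpy as np


def summarize(predicted, actual):
    """Return the minimum, maximum and mean of the predicted and actual values."""
    summary = {}
    for label, values in (("Predicted", predicted), ("Actual", actual)):
        arr = np.asarray(values, dtype=float)
        summary[label] = {
            "minimum": arr.min(),
            "maximum": arr.max(),
            "mean": arr.mean(),
        }
    return summary


# Placeholder values for illustration only.
example = summarize(predicted=[-10.0, 20.0, 35.0], actual=[5.0, 15.0, 40.0])
for label, stats in example.items():
    for name, value in stats.items():
        print(f"{label} {name}: {value}")
```

With the program's own variables, a call such as `summarize(y_hat, y_test)` would cover the test split, while `summarize(lr.predict(x), y)` would cover all observations; which of the two the listing reflects is not stated.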