
PA Lab2

The document discusses predicting customer buying behaviour from a dataset of customer information: age, annual salary, and whether the customer purchased a product. It is titled as random forest and decision tree classification, but the model it actually fits is a logistic regression classifier. It loads the dataset, splits it into training and test sets, scales the features, and fits the classifier to the training set for prediction.

Uploaded by syedkashif047
Copyright © All Rights Reserved

PA Exp 2 Predicting buying customer_behaviour and Predict customer ... https://colab.research.google.com/drive/18ImfExgc2pyWKWnkMoEi...

# Random Forest Classification, decision tree classification

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('car_data.csv')

display(df.head(5))

   User ID Gender  Age  AnnualSalary  Purchased
0      385   Male   35         20000          0
1      681   Male   40         43500          0
2      353   Male   49         74000          0
3      895   Male   40        107500          1
4      661   Male   25         79000          0
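Accuracy is used as the evaluation metric later on, so it helps to know first how balanced the `Purchased` label is. The sketch below is minimal and rebuilds only the five rows displayed above, because the full `car_data.csv` is not reproduced here. Its proportions therefore describe those five rows only. On the real data, run the same `value_counts` call on the loaded `df`.

```python
import pandas as pd

# The five rows shown by df.head(5) above, rebuilt by hand for illustration
head = pd.DataFrame({
    "User ID": [385, 681, 353, 895, 661],
    "Gender": ["Male"] * 5,
    "Age": [35, 40, 49, 40, 25],
    "AnnualSalary": [20000, 43500, 74000, 107500, 79000],
    "Purchased": [0, 0, 0, 1, 0],
})

# Share of each class (0 = did not buy, 1 = bought)
balance = head["Purchased"].value_counts(normalize=True)
print(balance)
```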

X = df.iloc[:400, [2,3]].values  # first 400 rows; columns 2 and 3 are Age and AnnualSalary
y = df.iloc[:400, -1].values     # last column: Purchased (0 = no, 1 = yes)

1 of 11 21-09-2023, 13:01
print(X)

[[ 35 20000]
[ 40 43500]
[ 49 74000]
[ 40 107500]
[ 25 79000]
[ 47 33500]
[ 46 132500]
[ 42 64000]
[ 30 84500]
[ 41 52000]
[ 42 80000]
[ 47 23000]
[ 32 72500]
[ 27 57000]
[ 42 108000]
[ 33 149000]
[ 35 75000]
[ 35 53000]
[ 46 79000]
[ 39 134000]
[ 39 51500]
[ 49 39000]
[ 54 25500]
[ 41 61500]
[ 31 117500]
[ 24 58000]
[ 40 107000]
[ 40 97500]
[ 48 29000]
[ 38 147500]
[ 45 26000]
[ 32 67500]
[ 37 62000]
[ 41 79500]
[ 44 113500]
[ 47 41500]
[ 38 55000]
[ 39 114500]
[ 42 73000]
[ 26 15000]
[ 21 37500]
[ 59 39500]
[ 39 66500]
[ 43 80500]
[ 49 86000]
[ 37 75000]
[ 49 76500]
[ 28 123000]
[ 59 48500]
[ 40 60500]
[ 38 99500]
[ 51 35500]
[ 55 130000]
[ 23 56500]
[ 49 43500]
[ 49 36000]
[ 48 21500]
[ 49 98500]
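The cell above picks its features by position (`iloc[:, [2, 3]]`), and that silently breaks if the CSV's column order ever changes. Selecting by column name gives the same arrays. This is a minimal sketch that uses a hypothetical stand-in frame built from the five displayed rows, with the same column layout as `car_data.csv`.

```python
import pandas as pd

# Hypothetical stand-in for car_data.csv (the five rows displayed above)
df = pd.DataFrame({
    "User ID": [385, 681, 353, 895, 661],
    "Gender": ["Male"] * 5,
    "Age": [35, 40, 49, 40, 25],
    "AnnualSalary": [20000, 43500, 74000, 107500, 79000],
    "Purchased": [0, 0, 0, 1, 0],
})

# Name-based selection: independent of where the columns sit in the file
X = df[["Age", "AnnualSalary"]].to_numpy()
y = df["Purchased"].to_numpy()

# Identical to the positional version used in the notebook
assert (X == df.iloc[:, [2, 3]].to_numpy()).all()
assert (y == df.iloc[:, -1].to_numpy()).all()
print(X.shape, y.shape)  # (5, 2) (5,)
```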

sampled_df = df.sample(n=400)  # no random_state: a different 400 rows are drawn on every run
X = sampled_df.iloc[:, [2,3]].values
y = sampled_df.iloc[:, -1].values
print(X)

[[ 22 55000]
[ 26 23500]
[ 46 23000]
[ 26 35000]
[ 43 63500]
[ 30 62500]
[ 38 63500]
[ 36 21500]
[ 50 37500]
[ 38 72500]
[ 33 69000]
[ 32 86000]
[ 32 35500]
[ 51 45500]
[ 20 74000]
[ 38 74500]
[ 29 60500]
[ 35 22000]
[ 44 73500]
[ 35 47000]
[ 26 80500]
[ 35 57000]
[ 35 79000]
[ 57 61500]
[ 40 123500]
[ 45 82500]
[ 28 138500]
[ 32 72500]
[ 30 76500]
[ 38 65000]
[ 49 36500]
[ 20 86500]
[ 40 97500]
[ 36 51500]
[ 39 52500]
[ 25 90000]
[ 25 56500]
[ 26 17000]
[ 59 135500]
[ 49 119500]
[ 34 72000]
[ 28 59000]
[ 52 67500]
[ 50 45500]
[ 40 148500]
[ 48 114500]
[ 20 77500]
[ 29 45500]
[ 26 118000]
[ 31 108500]
[ 41 60000]
[ 39 42000]
[ 59 42000]
[ 40 82500]
[ 60 124500]
[ 31 90500]
[ 51 35500]
[ 50 107500]
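Because `df.sample(n=400)` is called without `random_state`, every run draws different rows. The split, the fitted model, and every printed prediction downstream then change too. The minimal sketch below shows that a fixed seed makes the draw repeatable. It assumes a fixed seed is acceptable for the lab (42 is an arbitrary choice) and uses a hypothetical five-row frame in place of the full file.

```python
import pandas as pd

# Hypothetical stand-in frame with the same columns as car_data.csv
df = pd.DataFrame({
    "User ID": [385, 681, 353, 895, 661],
    "Gender": ["Male"] * 5,
    "Age": [35, 40, 49, 40, 25],
    "AnnualSalary": [20000, 43500, 74000, 107500, 79000],
    "Purchased": [0, 0, 0, 1, 0],
})

# Same seed, same rows: the sample can be reproduced exactly
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))  # True

# Features taken the same way as in the notebook
X = first.iloc[:, [2, 3]].values
```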

# splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = 0.25, random_state =
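With `test_size = 0.25`, 400 sampled rows split into 300 training and 100 test rows. Buyers are the minority class (see the head of the data above), so passing `stratify=y` keeps the buyer/non-buyer ratio essentially the same in both parts. This is a sketch on hypothetical synthetic data. The seed value 0 is an arbitrary assumption, because the notebook's own `random_state` value is cut off in the printout.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 400 customers (Age, AnnualSalary), about 40% buyers
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(18, 64, 400), rng.integers(15000, 152500, 400)])
y = (rng.random(400) < 0.4).astype(int)

# stratify=y preserves the class ratio in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(len(X_train), len(X_test))  # 300 100
print(round(y.mean(), 3), round(y_train.mean(), 3), round(y_test.mean(), 3))
```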

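# feature scaling, then fit a logistic regression classifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
classifier = LogisticRegression()
sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # learn mean/std from the training set only
X_test = sc.transform(X_test)        # reuse the training mean/std on the test set
y_train = y_train.ravel()            # make sure the labels are a flat 1-D array
classifier.fit(X_train, y_train)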

LogisticRegression()
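The cell above keeps the scaler `sc` and the model `classifier` separate, so every later prediction has to remember to call `sc.transform` first. One way to bundle the two steps is scikit-learn's `make_pipeline`. Its scaler is fitted on the training data, and the same scaling is applied automatically at predict time. This is a minimal sketch. The synthetic data and its labelling rule are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for (Age, AnnualSalary) -> Purchased
rng = np.random.default_rng(1)
age = rng.integers(18, 64, 300)
salary = rng.integers(15000, 152500, 300)
X_train = np.column_stack([age, salary]).astype(float)
# Made-up labelling rule: older, higher-earning customers buy
y_train = ((age - 40) / 10 + (salary - 80000) / 30000 > 0).astype(int)

# Scaler + classifier as one estimator
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Raw (unscaled) input: the pipeline scales it internally
print(model.predict([[49, 74000]]))
```

Step names in `make_pipeline` are the lowercased class names, so the fitted scaler is available as `model.named_steps["standardscaler"]` if it is needed on its own.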

print(X_train)

[[ 1.85560654 2.30625342]
[ 0.41442093 -0.17358897]
[ 0.89481613 0.13639133]
[ 0.99089517 -0.14259094]
[-0.06597427 0.18288838]
[ 0.51049997 -0.45257124]
[-1.3150018 0.3068805 ]
[ 0.89481613 -0.99503676]
[ 0.89481613 1.99627313]
[-0.3542114 -0.79354957]
[-1.3150018 -1.21202297]
[ 0.12618381 0.24488444]
[ 0.03010477 1.11282927]
[-0.45029044 -0.43707222]
[ 0.41442093 -0.5145673 ]
[ 0.79873709 -1.13452789]
[ 1.08697421 -1.53750228]
[-0.8346066 1.15932632]
[-1.41108084 0.44637163]
[ 0.31834189 0.29138148]
[-0.3542114 2.39924751]
[ 0.22226285 -0.54556533]
[ 0.03010477 0.16738936]
[-0.06597427 -1.10352986]
[ 0.31834189 -0.48356927]
[ 0.03010477 0.04339724]
[-0.45029044 0.24488444]
[ 0.60657901 -1.47550622]
[-1.12284372 -1.53750228]
[ 0.70265805 -1.16552592]
[-2.08363413 -0.09609389]
[-0.06597427 0.04339724]
[-1.21892276 0.52386671]
[ 0.03010477 1.99627313]
[ 0.22226285 -0.57656336]
[-0.3542114 -0.32857912]
[ 0.99089517 0.57036375]
[ 0.70265805 -0.32857912]
[-1.89147604 0.33787853]
[-0.45029044 -1.52200327]
[-0.93068564 0.4153736 ]
[-1.21892276 0.39987459]
[ 0.89481613 -1.07253183]
[ 1.18305325 0.16738936]
[ 0.51049997 -0.80904858]
[-0.93068564 -0.8245476 ]
[-1.3150018 0.27588247]
[ 0.79873709 -1.18102494]
[ 0.22226285 -0.18908798]
[-0.25813236 0.21388641]
[-0.06597427 -0.45257124]
[ 0.12618381 -0.39057518]
[ 2.2399227 -0.8245476 ]
[ 0.03010477 0.29138148]
[ 0.03010477 2.12026525]
[-1.50715988 -1.36701312]
[-0.3542114 0.66335784]
[-0.8346066 -0.40607419]
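Each printed value above is a z-score, z = (x - mean) / std. The mean and standard deviation are computed per column on the training set, and `StandardScaler` uses the population standard deviation (ddof = 0). The small check below reproduces the transformation by hand on the first five (Age, AnnualSalary) rows printed earlier. Because it fits on only those rows, the z-values differ from the notebook's.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five rows of X printed earlier in the notebook
X_small = np.array([[35, 20000], [40, 43500], [49, 74000], [40, 107500], [25, 79000]], dtype=float)

# Manual z-scores: per-column mean and population standard deviation
mean = X_small.mean(axis=0)
std = X_small.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
z_manual = (X_small - mean) / std

z_sklearn = StandardScaler().fit_transform(X_small)
print(np.allclose(z_manual, z_sklearn))  # True
```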

print(classifier.predict(sc.transform([[49, 74000]])))  # raw (Age, AnnualSalary) must be scaled with the fitted scaler first

[1]
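`predict` returns only the 0/1 label. For a logistic regression, `predict_proba` gives P(Purchased = 1) = 1 / (1 + exp(-(w·z + b))), where z is the scaled input, w is `classifier.coef_`, and b is `classifier.intercept_`. The label is 1 when that probability exceeds 0.5. The self-contained check below uses hypothetical synthetic training data rather than the notebook's fitted model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the notebook's training set
rng = np.random.default_rng(2)
X_raw = np.column_stack([rng.integers(18, 64, 200), rng.integers(15000, 152500, 200)]).astype(float)
y = (X_raw[:, 0] + X_raw[:, 1] / 2500 > 80).astype(int)  # made-up labelling rule

sc = StandardScaler()
classifier = LogisticRegression().fit(sc.fit_transform(X_raw), y)

z_new = sc.transform([[49, 74000]])
p_sklearn = classifier.predict_proba(z_new)[0, 1]

# Same probability from the logistic (sigmoid) formula
logit = z_new @ classifier.coef_.ravel() + classifier.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-logit[0]))

print(round(p_sklearn, 4), round(p_manual, 4))
label = int(p_sklearn > 0.5)  # matches classifier.predict(z_new)[0]
```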

y_pred = classifier.predict(X_test)
# side by side: column 0 = predicted label, column 1 = actual label
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
[0 0]
[1 1]
[0 1]
[0 0]
[1 1]
[1 1]
[1 1]
[0 0]
[0 1]
[0 0]
[1 1]
[1 1]
[1 1]
[0 0]
[1 1]
[0 1]
[0 0]
[1 1]
[0 0]
[1 1]
[1 1]
[1 1]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[0 0]
[0 0]
[1 0]
[0 0]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 1]
[1 1]
[0 0]
[0 1]
[1 0]
[1 1]
[0 1]
[1 1]
[0 0]
[0 0]
[0 0]
[0 0]
[1 0]
[0 0]
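Each printed row is a [predicted, actual] pair, and the four confusion-matrix cells are simple counts over those pairs. The printed output above is cut off, so the sketch below does not reproduce the notebook's full counts. It uses a short hypothetical list of pairs in the same layout and compares the hand counts with `sklearn.metrics.confusion_matrix`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical [predicted, actual] pairs in the same layout as the output above
pairs = np.array([[0, 0], [0, 0], [1, 1], [0, 1], [1, 0], [1, 1]])
y_pred, y_true = pairs[:, 0], pairs[:, 1]

tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # non-buyers correctly predicted
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted buyer, did not buy
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # buyers the model missed
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # buyers correctly predicted

# scikit-learn puts true labels on rows: [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print(tn, fp, fn, tp)  # 2 1 1 2
```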

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
# plot_confusion_matrix was removed in scikit-learn 1.2, so importing it raises the
# ImportError recorded below; it is not used in this cell, so the import is dropped
# (ConfusionMatrixDisplay is its replacement)
from sklearn.metrics import accuracy_score

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='g')
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()

accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(f'Accuracy: {accuracy:.3f}')

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-24-2a6c70da7013> in <module>
      2 import matplotlib.pyplot as plt
      3 from sklearn.metrics import confusion_matrix
----> 4 from sklearn.metrics import plot_confusion_matrix
      5 from sklearn.metrics import accuracy_score
      6 cm = confusion_matrix(y_test, y_pred)

ImportError: cannot import name 'plot_confusion_matrix' from 'sklearn.metrics' (/usr/local/lib/python3.9/dist-packages/sklearn/metrics/__init__.py)


from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() +
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() +
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).re
             alpha = 0.75, cmap = ListedColormap(('slategrey', 'slateblue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('slategrey
plt.title('Logistic Regression (Train set)')  # the fitted model is LogisticRegression, not a random forest
plt.xlabel('Age')
plt.ylabel('Annual Salary')
plt.legend()
plt.show()
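The plotting cells above are cut off at the right margin of the printout. The lost parts are the end of both `np.arange` calls, the end of the `.reshape(...)` call, and the scatter colour argument, so the cells cannot be run as printed. Below is a self-contained sketch of the same decision-boundary plot. The helper name `plot_decision_boundary`, the grid steps (0.5 years of age, 500 units of salary), and the synthetic training data are all assumptions and were not recovered from the notebook. A coarse salary step keeps the grid to a few tens of thousands of points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in Colab
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def plot_decision_boundary(classifier, sc, X_raw, y, title, age_step=0.5, salary_step=500):
    """Hypothetical helper: shade the predicted class over an (Age, Salary) grid."""
    X1, X2 = np.meshgrid(
        np.arange(X_raw[:, 0].min() - 10, X_raw[:, 0].max() + 10, age_step),
        np.arange(X_raw[:, 1].min() - 1000, X_raw[:, 1].max() + 1000, salary_step),
    )
    grid = np.column_stack([X1.ravel(), X2.ravel()])
    # Scale the raw grid exactly like the training data, then predict
    Z = classifier.predict(sc.transform(grid)).reshape(X1.shape)
    colors = ("slategrey", "slateblue")
    fig, ax = plt.subplots()
    ax.contourf(X1, X2, Z, alpha=0.75, cmap=ListedColormap(colors))
    for i, label in enumerate(np.unique(y)):
        ax.scatter(X_raw[y == label, 0], X_raw[y == label, 1],
                   color=colors[i], edgecolors="black", label=str(label))
    ax.set(title=title, xlabel="Age", ylabel="Annual Salary")
    ax.legend()
    return fig, Z


# Hypothetical training data and model
rng = np.random.default_rng(3)
X_raw = np.column_stack([rng.integers(18, 64, 200), rng.integers(15000, 152500, 200)]).astype(float)
y = (X_raw[:, 0] + X_raw[:, 1] / 2500 > 80).astype(int)
sc = StandardScaler()
classifier = LogisticRegression().fit(sc.fit_transform(X_raw), y)

fig, Z = plot_decision_boundary(classifier, sc, X_raw, y, "Logistic Regression (Train set)")
plt.show()
```

For the notebook, call the helper with `sc.inverse_transform(X_train), y_train` for the training plot and `sc.inverse_transform(X_test), y_test` for the test plot.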

from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() +
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() +
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).re
             alpha = 0.75, cmap = ListedColormap(('slategrey', 'slateblue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('slategrey
plt.title('Logistic Regression (Test set)')  # the fitted model is LogisticRegression, not a random forest
plt.xlabel('Age')
plt.ylabel('Annual Salary')
plt.legend()
plt.show()
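The notebook is titled random forest and decision tree classification, but every model above is `LogisticRegression`. The minimal sketch below fits the two tree-based models named in the title on the same two features and the same 25% test split. It uses hypothetical synthetic data and default hyperparameters, apart from `n_estimators=100` and fixed seeds, which are assumptions. Trees split on feature thresholds, so feature scaling is not required for them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic (Age, AnnualSalary) -> Purchased data
rng = np.random.default_rng(4)
X = np.column_stack([rng.integers(18, 64, 400), rng.integers(15000, 152500, 400)]).astype(float)
y = (X[:, 0] + X[:, 1] / 2500 > 80).astype(int)  # made-up labelling rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # unscaled features are fine for trees
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy {scores[name]:.3f}")
```

Any of these fitted models can be passed to the plotting sketch or the prediction cells in place of `classifier`. The scaler then becomes optional.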

# create a new data point to predict
new_data = [[35, 50000]]

# scale the new data using the same scaler used on the training data
new_data_scaled = sc.transform(new_data)

# make a prediction using the trained classifier
prediction = classifier.predict(new_data_scaled)

print(prediction)

[0]

new_data_2 = [[35, 50000], [42, 80700], [51, 65000], [23, 45600], [36, 42000], [41, 72000]]
new_df = pd.DataFrame(new_data_2, columns=[ 'Age', 'Annual Salary'])  # new name, so the loaded dataset in df is not overwritten
new_df.index = new_df.index + 1  # number the customers from 1
display(new_df)

   Age  Annual Salary
1   35          50000
2   42          80700
3   51          65000
4   23          45600
5   36          42000
6   41          72000

from tabulate import tabulate

results = []

for i, data in enumerate(new_data_2):
    # scale the new data using the same scaler used on the training data
    data_scaled = sc.transform([data])

    # make a prediction using the trained classifier
    prediction = classifier.predict(data_scaled)

    # add the results to the table (predict returns a one-element array)
    if prediction[0] == 1:
        prediction_text = "Yes"
    else:
        prediction_text = "No"
    results.append([i+1, data[0], data[1], prediction_text])

# print the table
headers = ["Customer", "Age", "Annual Salary", "Prediction"]
print(tabulate(results, headers=headers))

  Customer    Age    Annual Salary  Prediction
----------  -----  ---------------  ------------
         1     35            50000  No
         2     42            80700  No
         3     51            65000  Yes
         4     23            45600  No
         5     36            42000  No
         6     41            72000  No
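The loop above scales and predicts one customer at a time. `StandardScaler.transform` and `predict` both accept a whole 2-D array, so the same table can be built with one call each. The sketch below fits a hypothetical stand-in scaler and model on synthetic data; in the notebook, reuse the fitted `sc` and `classifier`. It also swaps `tabulate` for a plain pandas DataFrame, which is a presentation choice rather than the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the notebook's fitted sc and classifier
rng = np.random.default_rng(5)
X_raw = np.column_stack([rng.integers(18, 64, 300), rng.integers(15000, 152500, 300)]).astype(float)
y = (X_raw[:, 0] + X_raw[:, 1] / 2500 > 80).astype(int)
sc = StandardScaler()
classifier = LogisticRegression().fit(sc.fit_transform(X_raw), y)

# Same six customers as new_data_2 in the notebook
new_data_2 = [[35, 50000], [42, 80700], [51, 65000], [23, 45600], [36, 42000], [41, 72000]]

# One transform and one predict for all rows, instead of a Python loop
predictions = classifier.predict(sc.transform(new_data_2))

results = pd.DataFrame(new_data_2, columns=["Age", "Annual Salary"])
results.index = results.index + 1
results.index.name = "Customer"
results["Prediction"] = np.where(predictions == 1, "Yes", "No")
print(results)
```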
