0% found this document useful (0 votes)

10 views

Quality Prediction Checkpoint

Uploaded by

Pavan Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views

Quality Prediction Checkpoint

Uploaded by

Pavan Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

In [260]:

# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from sklearn import metrics

import warnings
warnings.filterwarnings('ignore')

In [215]:

# Loding the dataset

df = pd.read_csv('QualityPrediction.csv')
df

Out[215]:

free total
fixed volatile citric residual
chlorides sulfur sulfur density pH sulphates alcohol quality
acidity acidity acid sugar
dioxide dioxide

0 7.4 0.700 0.00 1.9 0.076 11.0 34.0 0.99780 3.51 0.56 9.4 5

1 7.8 0.880 0.00 2.6 0.098 25.0 67.0 0.99680 3.20 0.68 9.8 5

2 7.8 0.760 0.04 2.3 0.092 15.0 54.0 0.99700 3.26 0.65 9.8 5

3 11.2 0.280 0.56 1.9 0.075 17.0 60.0 0.99800 3.16 0.58 9.8 6

4 7.4 0.700 0.00 1.9 0.076 11.0 34.0 0.99780 3.51 0.56 9.4 5

... ... ... ... ... ... ... ... ... ... ... ... ...

1594 6.2 0.600 0.08 2.0 0.090 32.0 44.0 0.99490 3.45 0.58 10.5 5

1595 5.9 0.550 0.10 2.2 0.062 39.0 51.0 0.99512 3.52 0.76 11.2 6

1596 6.3 0.510 0.13 2.3 0.076 29.0 40.0 0.99574 3.42 0.75 11.0 6

1597 5.9 0.645 0.12 2.0 0.075 32.0 44.0 0.99547 3.57 0.71 10.2 5

1598 6.0 0.310 0.47 3.6 0.067 18.0 42.0 0.99549 3.39 0.66 11.0 6

1599 rows × 12 columns

In [216]:

# Checking for null values

df.isnull().sum()

Out[216]:

fixed acidity 0
volatile acidity 0
citric acid 0
residual sugar 0
chlorides 0
free sulfur dioxide 0
total sulfur dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64
In [217]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fixed acidity 1599 non-null float64
1 volatile acidity 1599 non-null float64
2 citric acid 1599 non-null float64
3 residual sugar 1599 non-null float64
4 chlorides 1599 non-null float64
5 free sulfur dioxide 1599 non-null float64
6 total sulfur dioxide 1599 non-null float64
7 density 1599 non-null float64
8 pH 1599 non-null float64
9 sulphates 1599 non-null float64
10 alcohol 1599 non-null float64
11 quality 1599 non-null int64
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

In [218]:

df.describe()

Out[218]:

fixed volatile residual free sulfur total sulfur

citric acid chlorides density pH sulphates
acidity acidity sugar dioxide dioxide

count 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 1599.000000 159

mean 8.319637 0.527821 0.270976 2.538806 0.087467 15.874922 46.467792 0.996747 3.311113 0.658149

std 1.741096 0.179060 0.194801 1.409928 0.047065 10.460157 32.895324 0.001887 0.154386 0.169507

min 4.600000 0.120000 0.000000 0.900000 0.012000 1.000000 6.000000 0.990070 2.740000 0.330000

25% 7.100000 0.390000 0.090000 1.900000 0.070000 7.000000 22.000000 0.995600 3.210000 0.550000

50% 7.900000 0.520000 0.260000 2.200000 0.079000 14.000000 38.000000 0.996750 3.310000 0.620000

75% 9.200000 0.640000 0.420000 2.600000 0.090000 21.000000 62.000000 0.997835 3.400000 0.730000

max 15.900000 1.580000 1.000000 15.500000 0.611000 72.000000 289.000000 1.003690 4.010000 2.000000

In [219]:

df.head(5)

Out[219]:

fixed volatile citric residual free sulfur total sulfur

chlorides density pH sulphates alcohol quality
acidity acidity acid sugar dioxide dioxide

0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5

2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5

3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6

4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

Data Preprocessing
In [220]:

df['quality'].value_counts()

Out[220]:

5 681
6 638
7 199
4 53
8 18
3 10
Name: quality, dtype: int64
In [221]:

sns.catplot(x='quality', data=df, kind='count')

Out[221]:

<seaborn.axisgrid.FacetGrid at 0x152b9301c10>

In [222]:

plot=plt.figure(figsize=(5,5))
sns.barplot(x='quality',y='volatile acidity',data=df)

Out[222]:

<AxesSubplot:xlabel='quality', ylabel='volatile acidity'>

In [223]:

plot=plt.figure(figsize=(5,5))
sns.barplot(x='quality',y='citric acid',data=df)

Out[223]:

<AxesSubplot:xlabel='quality', ylabel='citric acid'>

In [224]:

plt.bar(df['quality'], df['alcohol'])
plt.xlabel('quality')
plt.ylabel('alcohol')
plt.show()

Exploratory Data Analysis

In [225]:

df['quality'] = df['quality'].apply(lambda x: 1 if x >= 7 else 0)

df.rename(columns={'quality': 'good quality'}, inplace=True)
df.head()

Out[225]:

fixed volatile citric residual free sulfur total sulfur good

chlorides density pH sulphates alcohol
acidity acidity acid sugar dioxide dioxide quality

0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 0

1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 0

2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 0

3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 0

4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 0

In [226]:

plt.figure(figsize=(5,5))
sns.countplot(x='good quality', data=df)
plt.xlabel('good quality')
plt.ylabel('Count')
plt.title('Count of Good vs Bad Quality Wines')
plt.show()
In [227]:

plt.figure(figsize=(10,6))
sns.heatmap(df.corr(), annot=True)
plt.show()

In [228]:
fig, ax = plt.subplots(2,4,figsize=(20,20))
sns.scatterplot(x = 'fixed acidity', y = 'citric acid', hue = 'good quality', data = df, ax=ax[0,0])
sns.scatterplot(x = 'volatile acidity', y = 'citric acid', hue = 'good quality', data = df, ax=ax[0,1])
sns.scatterplot(x = 'free sulfur dioxide', y = 'total sulfur dioxide', hue = 'good quality', data = df, ax=ax[0,2
])
sns.scatterplot(x = 'fixed acidity', y = 'density', hue = 'good quality', data = df, ax=ax[0,3])
sns.scatterplot(x = 'fixed acidity', y = 'pH', hue = 'good quality', data = df, ax=ax[1,0])
sns.scatterplot(x = 'citric acid', y = 'pH', hue = 'good quality', data = df, ax=ax[1,1])
sns.scatterplot(x = 'chlorides', y = 'sulphates', hue = 'good quality', data = df, ax=ax[1,2])
sns.scatterplot(x = 'residual sugar', y = 'alcohol', hue = 'good quality', data = df, ax=ax[1,3])
Out[228]:

<AxesSubplot:xlabel='residual sugar', ylabel='alcohol'>

Train Test Split

In [229]:
X_train, X_test, y_train, y_test = train_test_split(df.drop('good quality', axis=1), df['good quality'], test_siz
e=0.3, random_state=42)

In [230]:

X_train.head()

Out[230]:

fixed volatile citric residual free sulfur total sulfur

chlorides density pH sulphates alcohol
acidity acidity acid sugar dioxide dioxide

925 8.6 0.22 0.36 1.9 0.064 53.0 77.0 0.99604 3.47 0.87 11.0

363 12.5 0.46 0.63 2.0 0.071 6.0 15.0 0.99880 2.99 0.87 10.2

906 7.2 0.54 0.27 2.6 0.084 12.0 78.0 0.99640 3.39 0.71 11.0

426 6.4 0.67 0.08 2.1 0.045 19.0 48.0 0.99490 3.49 0.49 11.4

1251 7.5 0.58 0.14 2.2 0.077 27.0 60.0 0.99630 3.28 0.59 9.8
In [231]:

X_test.head()

Out[231]:

fixed volatile citric residual free sulfur total sulfur

chlorides density pH sulphates alcohol
acidity acidity acid sugar dioxide dioxide

803 7.7 0.56 0.08 2.50 0.114 14.0 46.0 0.9971 3.24 0.66 9.6

124 7.8 0.50 0.17 1.60 0.082 21.0 102.0 0.9960 3.39 0.48 9.5

350 10.7 0.67 0.22 2.70 0.107 17.0 34.0 1.0004 3.28 0.98 9.9

682 8.5 0.46 0.31 2.25 0.078 32.0 58.0 0.9980 3.33 0.54 9.8

1326 6.7 0.46 0.24 1.70 0.077 18.0 34.0 0.9948 3.39 0.60 10.6

Model Training
Feature Scaling
In [232]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [233]:

X_train_scaled

Out[233]:

array([[ 1.69536131e-01, -1.72107140e+00, 4.59303345e-01, ...,

1.01180685e+00, 1.22661179e+00, 5.50057013e-01],
[ 2.44606730e+00, -4.01957443e-01, 1.84105501e+00, ...,
-2.10687612e+00, 1.22661179e+00, -2.05174641e-01],
[-6.47680186e-01, 3.77472102e-02, -1.28054303e-03, ...,
4.92026353e-01, 2.97270776e-01, 5.50057013e-01],
...,
[-6.47680186e-01, 4.77451864e-01, -1.07597628e+00, ...,
1.27169710e+00, -6.90154049e-01, -8.66002338e-01],
[-2.39072027e-01, -1.83099757e+00, 4.08127357e-01, ...,
3.72184202e-02, 8.20025095e-01, 1.39969262e+00],
[-1.46489650e+00, -1.33632983e+00, -5.24565306e-02, ...,
4.92026353e-01, -6.90154049e-01, 2.91015593e+00]])

In [234]:

X_test_scaled

Out[234]:

array([[-0.35581722, 0.14767337, -0.97362431, ..., -0.48256207,

0.00685171, -0.77159838],
[-0.29744462, -0.18210512, -0.51304042, ..., 0.49202635,
-1.03865693, -0.86600234],
[ 1.39536061, 0.75226727, -0.25716048, ..., -0.22267183,
1.86553373, -0.48838651],
...,
[-0.93954316, -0.40195744, -0.15480851, ..., 0.49202635,
-0.34165117, 0.17244119],
[ 1.27861542, -0.12714203, 1.892231 , ..., -1.4571505 ,
0.00685171, 1.30528867],
[ 0.92837985, -0.18210512, -0.15480851, ..., 0.16716354,
-0.80632167, -0.39398255]])

Logistic Refression
In [235]:
lr = LogisticRegression()
lr

Out[235]:

LogisticRegression()
In [261]:

#training the model

lr.fit(X_train, y_train)
lr.score(X_train, y_train)

Out[261]:

0.8838248436103664

In [237]:

# testing the model

lr_pred = lr.predict(X_test)
accuracy_score(y_test, lr_pred)

Out[237]:

0.85625

Support Vector Machine (SVM)

In [238]:

clf = svm.SVC(kernel='rbf')
clf

Out[238]:

SVC()

In [239]:

# training the model

clf.fit(X_train, y_train)
clf.score(X_train, y_train)

Out[239]:

0.8668453976764968

In [240]:

# testing the model

sv_pred = clf.predict(X_test)
accuracy_score(y_test, sv_pred)

Out[240]:

0.8625

Decision Tree
In [241]:

dtree = DecisionTreeClassifier()
dtree

Out[241]:

DecisionTreeClassifier()

In [242]:

# training the model

dtree.fit(X_train, y_train)
dtree.score(X_train, y_train)

Out[242]:

1.0

In [243]:

# testing the model

tr_pred = dtree.predict(X_test)
accuracy_score(y_test, tr_pred)

Out[243]:

0.8604166666666667
K-Nearest Neighbors (KNN)
In [244]:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn

Out[244]:

KNeighborsClassifier()

In [262]:

# training the model

knn.fit(X_train, y_train)
knn.score(X_train, y_train)

Out[262]:

0.9079535299374442

In [263]:

# testing the model

kn_pred = knn.predict(X_test)
accuracy_score(y_test, kn_pred)

Out[263]:

0.8583333333333333

Model Evaluation
Logistic Regression
In [247]:

# logistic regression model evaluation

sns.heatmap(confusion_matrix(y_test, lr_pred), annot=True, cmap='Blues')
plt.ylabel('Predicted Values')
plt.xlabel('Actual Values')
plt.title('Confusion Matrix for Logistic Regression')
plt.show()
In [248]:

print('Logistic Regression Model Accuracy: ', accuracy_score(y_test, lr_pred))

print('Logistic Regression Model f1 score: ', metrics.f1_score(y_test, lr_pred))
print('Logistic Regression Model MAE: ', metrics.mean_absolute_error(y_test, lr_pred))
print('Logistic Regression Model RMSE: ', np.sqrt(metrics.mean_squared_error(y_test, lr_pred)))

Logistic Regression Model Accuracy: 0.85625

Logistic Regression Model f1 score: 0.28865979381443296
Logistic Regression Model MAE: 0.14375
Logistic Regression Model RMSE: 0.3791437722025775

Support Vector Machine (SVM)

In [249]:

sns.heatmap(confusion_matrix(y_test, sv_pred), annot=True, cmap='Reds')

plt.ylabel('Predicted Values')
plt.xlabel('Actual Values')
plt.title('Confusion Matrix for Support Vector Machine')
plt.show()

In [250]:

print('Support Vector Machine Model Accuracy: ', accuracy_score(y_test, sv_pred))

print('Support Vector Machine Model f1 score: ', metrics.f1_score(y_test, sv_pred))
print('Support Vector Machine Model MAE: ', metrics.mean_absolute_error(y_test, sv_pred))
print('Support Vector Machine Model RMSE: ', np.sqrt(metrics.mean_squared_error(y_test, sv_pred)))

Support Vector Machine Model Accuracy: 0.8625

Support Vector Machine Model f1 score: 0.029411764705882353
Support Vector Machine Model MAE: 0.1375
Support Vector Machine Model RMSE: 0.37080992435478316

Decision Tree
In [251]:

sns.heatmap(confusion_matrix(y_test, tr_pred), annot=True, cmap='Greens')

plt.ylabel('Predicted Values')
plt.xlabel('Actual Values')
plt.title('Confusion Matrix for Decision Tree')
plt.show()

In [252]:
print('Decision Tree Model Accuracy: ', accuracy_score(y_test, tr_pred))
print('Decision Tree Model f1 score: ', metrics.f1_score(y_test, tr_pred))
print('Decision Tree Model MAE: ', metrics.mean_absolute_error(y_test, tr_pred))
print('Decision Tree Model RMSE: ', np.sqrt(metrics.mean_squared_error(y_test, tr_pred)))

Decision Tree Model Accuracy: 0.8604166666666667

Decision Tree Model f1 score: 0.5677419354838709
Decision Tree Model MAE: 0.13958333333333334
Decision Tree Model RMSE: 0.3736085295243316

K-Nearest Neighbors (KNN)

In [253]:

sns.heatmap(confusion_matrix(y_test, kn_pred), annot=True, cmap='Purples')

plt.ylabel('Predicted Values')
plt.xlabel('Actual Values')
plt.title('Confusion Matrix for K-Nearest Neighbors')
plt.show()

In [254]:
print('K-Nearest Neighbors Model Accuracy: ', accuracy_score(y_test, kn_pred))
print('K-Nearest Neighbors Model f1 score: ', metrics.f1_score(y_test, kn_pred))
print('K-Nearest Neighbors Model MAE: ', metrics.mean_absolute_error(y_test, kn_pred))
print('K-Nearest Neighbors Model RMSE: ', np.sqrt(metrics.mean_squared_error(y_test, kn_pred)))

K-Nearest Neighbors Model Accuracy: 0.8583333333333333

K-Nearest Neighbors Model f1 score: 0.276595744680851
K-Nearest Neighbors Model MAE: 0.14166666666666666
K-Nearest Neighbors Model RMSE: 0.3763863263545405

Model Comparison
In [264]:

models = ['Logistic Regression', 'Support Vector Machine', 'Decision Tree', 'K-Nearest Neighbors']
accuracy = [accuracy_score(y_test, lr_pred), accuracy_score(y_test, sv_pred), accuracy_score(y_test, tr_pred), ac
curacy_score(y_test, kn_pred)]
plt.figure(figsize=(10,6))
sns.barplot(x=models, y=accuracy)
plt.title('Model Accuracy Comparison')
plt.xlabel('Model')
plt.ylabel('Accuracy')
plt.ylim(0.5, 1.0)
plt.show()

Conclusion

It is observed that the Logistic Regression model performs the best on the test set with an accuracy of 86%. The model
can predict the quality of the wine based on the given features with an accuracy of 86%.

In [ ]:

Wine DS
No ratings yet
Wine DS
14 pages
Practical04.ipynb - Colab
No ratings yet
Practical04.ipynb - Colab
2 pages
EDA RED WINE
No ratings yet
EDA RED WINE
16 pages
m1
No ratings yet
m1
10 pages
water_qualit… (2) - JupyterLab
No ratings yet
water_qualit… (2) - JupyterLab
10 pages
Quality Prediction
No ratings yet
Quality Prediction
20 pages
Assignment4 VidulGarg
No ratings yet
Assignment4 VidulGarg
14 pages
Decision Trees
No ratings yet
Decision Trees
2 pages
learning-concepts-hackers-realm
No ratings yet
learning-concepts-hackers-realm
78 pages
Empirical Crop Suitability Model 1694688954
No ratings yet
Empirical Crop Suitability Model 1694688954
24 pages
Name: Reg. No.: Lab Exercise:: Shivam Batra 19BPS1131
100% (1)
Name: Reg. No.: Lab Exercise:: Shivam Batra 19BPS1131
10 pages
Mini Projects 1-3-Satyaki Mitra
No ratings yet
Mini Projects 1-3-Satyaki Mitra
33 pages
AS Notebook - PCA - Wine Data-4
100% (1)
AS Notebook - PCA - Wine Data-4
1 page
Wine
No ratings yet
Wine
22 pages
ML LAB 12 - Jupyter Notebook
No ratings yet
ML LAB 12 - Jupyter Notebook
11 pages
Code
No ratings yet
Code
5 pages
Data Science Libraries
No ratings yet
Data Science Libraries
4 pages
Machine Learning Lab Manual (1)
No ratings yet
Machine Learning Lab Manual (1)
42 pages
AM19 EDA Assignment5
No ratings yet
AM19 EDA Assignment5
19 pages
Scikit Learn1
No ratings yet
Scikit Learn1
4 pages
datamining_exp5_datanormalisation
No ratings yet
datamining_exp5_datanormalisation
14 pages
21brs1715 Lab3
No ratings yet
21brs1715 Lab3
4 pages
Ensemble D'apprentissage1 - Jupyter Notebook
No ratings yet
Ensemble D'apprentissage1 - Jupyter Notebook
11 pages
air-quality-randomforest
No ratings yet
air-quality-randomforest
5 pages
devesh
No ratings yet
devesh
11 pages
14-May - Jupyter Notebook
No ratings yet
14-May - Jupyter Notebook
15 pages
Mini Project Report
No ratings yet
Mini Project Report
12 pages
Mini Project With Output
No ratings yet
Mini Project With Output
8 pages
vertopal.com_model_training
No ratings yet
vertopal.com_model_training
6 pages
Wine Quality Prediction
No ratings yet
Wine Quality Prediction
6 pages
som
No ratings yet
som
19 pages
Basic Python Analysis
No ratings yet
Basic Python Analysis
33 pages
EDS - Python Cheat Sheet
0% (1)
EDS - Python Cheat Sheet
3 pages
RANDOM FOREST (Binary Classification)
No ratings yet
RANDOM FOREST (Binary Classification)
5 pages
Wine Quality Prediction Using Machine Learning
No ratings yet
Wine Quality Prediction Using Machine Learning
10 pages
Refactor Wine Quality - Ipynb
No ratings yet
Refactor Wine Quality - Ipynb
4 pages
Water Quality 1673157384
No ratings yet
Water Quality 1673157384
30 pages
Pyspark_MLlib
No ratings yet
Pyspark_MLlib
4 pages
Kaggle Course Notes
No ratings yet
Kaggle Course Notes
87 pages
compte rendu tp 2 pandas
No ratings yet
compte rendu tp 2 pandas
2 pages
Mlp Slides Merged
No ratings yet
Mlp Slides Merged
480 pages
IMPLEMENTATION
No ratings yet
IMPLEMENTATION
6 pages
unit 6 Pyspark_MLlib
No ratings yet
unit 6 Pyspark_MLlib
6 pages
Mini Project
No ratings yet
Mini Project
8 pages
Coding An
No ratings yet
Coding An
19 pages
45B AIML Practical07 Clustering
No ratings yet
45B AIML Practical07 Clustering
8 pages
ModuleAr Merged
No ratings yet
ModuleAr Merged
42 pages
Proyecto Final Model
No ratings yet
Proyecto Final Model
13 pages
Program 5
No ratings yet
Program 5
3 pages
ML Shristi File
No ratings yet
ML Shristi File
49 pages
CatBoost - An In-Depth Guide Python
No ratings yet
CatBoost - An In-Depth Guide Python
33 pages
Dav Lab Manual
No ratings yet
Dav Lab Manual
28 pages
Melbourne Ia
No ratings yet
Melbourne Ia
16 pages
Fds Practical Slips Solutions
No ratings yet
Fds Practical Slips Solutions
32 pages
batch1 ds
No ratings yet
batch1 ds
15 pages
Machine Learning Practice
No ratings yet
Machine Learning Practice
17 pages
FDS Solved Slips
100% (1)
FDS Solved Slips
63 pages
ML Lab Codes
No ratings yet
ML Lab Codes
14 pages
A List of Factorial Math Constants
From Everand
A List of Factorial Math Constants
StreetLib
No ratings yet
The Fibonacci Number Series
From Everand
The Fibonacci Number Series
Michael Husted
5/5 (1)
Rocker Bogie
No ratings yet
Rocker Bogie
31 pages
Mainprojectedtd 151114185045 Lva1 App6892
No ratings yet
Mainprojectedtd 151114185045 Lva1 App6892
14 pages
Smart Helmet For Driver
No ratings yet
Smart Helmet For Driver
15 pages
Floating Windmill Powerpoint
100% (1)
Floating Windmill Powerpoint
25 pages
K-Nearest Neighbor (KNN) Algorithm For Machine Learning
No ratings yet
K-Nearest Neighbor (KNN) Algorithm For Machine Learning
17 pages
189 Cheat Sheet Nominicards PDF
No ratings yet
189 Cheat Sheet Nominicards PDF
2 pages
F# For Machine Learning Essentials - Sample Chapter
No ratings yet
F# For Machine Learning Essentials - Sample Chapter
29 pages
Automatic Modulation Classification Principles Algorithms and Applications 1st Edition Zhechen Zhu download
100% (2)
Automatic Modulation Classification Principles Algorithms and Applications 1st Edition Zhechen Zhu download
48 pages
(IJCST-V11I6P2) :ms. Madhuri P. Narkhede, Dr. Harshali B Patil
No ratings yet
(IJCST-V11I6P2) :ms. Madhuri P. Narkhede, Dr. Harshali B Patil
5 pages
Fake Profile Detection
100% (1)
Fake Profile Detection
69 pages
Skill Oriented Program: Shaik Akram
No ratings yet
Skill Oriented Program: Shaik Akram
121 pages
ELC Data Science Curriculum
No ratings yet
ELC Data Science Curriculum
18 pages
Mulberry Leaf Disease Detection
No ratings yet
Mulberry Leaf Disease Detection
7 pages
Equitability, Mutual Information, and The Maximal Information Coefficient
No ratings yet
Equitability, Mutual Information, and The Maximal Information Coefficient
6 pages
Enhanced Bagging EBagging A Novel Approach For Ens
No ratings yet
Enhanced Bagging EBagging A Novel Approach For Ens
15 pages
Radiomic Prediction of Tumor Grade and Overall Survival From The BraTS Glioma Dataset SI PDF
No ratings yet
Radiomic Prediction of Tumor Grade and Overall Survival From The BraTS Glioma Dataset SI PDF
28 pages
Applying K-Nearest Neighbour in Diagnosing Heart Disease Patient
No ratings yet
Applying K-Nearest Neighbour in Diagnosing Heart Disease Patient
4 pages
Music Genre Classification with ResNet and
No ratings yet
Music Genre Classification with ResNet and
17 pages
Predicting Mental Health Illness Using Machine Learning Algorithms
No ratings yet
Predicting Mental Health Illness Using Machine Learning Algorithms
8 pages
Ppt_intrusion Detection System Using Machine Learning
No ratings yet
Ppt_intrusion Detection System Using Machine Learning
12 pages
Data Science and Gen AI
No ratings yet
Data Science and Gen AI
27 pages
Broadly, There Are 3 Types of Machine Learning Algorithms.
No ratings yet
Broadly, There Are 3 Types of Machine Learning Algorithms.
33 pages
Disease Prediction From Various Symptoms Using Machine Learning
No ratings yet
Disease Prediction From Various Symptoms Using Machine Learning
7 pages
CS550 Lec7-ClassificationIntro
No ratings yet
CS550 Lec7-ClassificationIntro
49 pages
Detecting and Classifying DNS Tunneling Through Novel Machine Learning Approach
No ratings yet
Detecting and Classifying DNS Tunneling Through Novel Machine Learning Approach
7 pages
Complete Big Data Analytics Tools and Technology For Effective Planning 1st Edition Arun K. Somani PDF For All Chapters
100% (3)
Complete Big Data Analytics Tools and Technology For Effective Planning 1st Edition Arun K. Somani PDF For All Chapters
52 pages
2yrs Mca Sem4
No ratings yet
2yrs Mca Sem4
10 pages
Improving Sporadic Demand Forecasting Using A Modified K-Nearest Neighbor Framework
No ratings yet
Improving Sporadic Demand Forecasting Using A Modified K-Nearest Neighbor Framework
11 pages
Masters in Data Science Brochure
No ratings yet
Masters in Data Science Brochure
20 pages
Project_Report_Template_AICTE_Internship_2025
No ratings yet
Project_Report_Template_AICTE_Internship_2025
21 pages
A Review of Supervised Object-Based Land-Cover Image Classification
No ratings yet
A Review of Supervised Object-Based Land-Cover Image Classification
17 pages
Sapthagiri College of Engineering Information Science and Engineering
0% (1)
Sapthagiri College of Engineering Information Science and Engineering
14 pages
Machine Learning
No ratings yet
Machine Learning
33 pages

Quality Prediction Checkpoint

Uploaded by

Quality Prediction Checkpoint

Uploaded by

In [260]:

# Loding the dataset

1599 rows × 12 columns

# Checking for null values

fixed volatile residual free sulfur total sulfur

fixed volatile citric residual free sulfur total sulfur

sns.catplot(x='quality', data=df, kind='count')

<AxesSubplot:xlabel='quality', ylabel='volatile acidity'>

<AxesSubplot:xlabel='quality', ylabel='citric acid'>

Exploratory Data Analysis

df['quality'] = df['quality'].apply(lambda x: 1 if x >= 7 else 0)

fixed volatile citric residual free sulfur total sulfur good

<AxesSubplot:xlabel='residual sugar', ylabel='alcohol'>

Train Test Split

fixed volatile citric residual free sulfur total sulfur

fixed volatile citric residual free sulfur total sulfur

array([[ 1.69536131e-01, -1.72107140e+00, 4.59303345e-01, ...,

array([[-0.35581722, 0.14767337, -0.97362431, ..., -0.48256207,

#training the model

# testing the model

Support Vector Machine (SVM)

# training the model

# testing the model

# training the model

# testing the model

from sklearn.neighbors import KNeighborsClassifier

# training the model

# testing the model

# logistic regression model evaluation

print('Logistic Regression Model Accuracy: ', accuracy_score(y_test, lr_pred))

Logistic Regression Model Accuracy: 0.85625

Support Vector Machine (SVM)

sns.heatmap(confusion_matrix(y_test, sv_pred), annot=True, cmap='Reds')

print('Support Vector Machine Model Accuracy: ', accuracy_score(y_test, sv_pred))

Support Vector Machine Model Accuracy: 0.8625

sns.heatmap(confusion_matrix(y_test, tr_pred), annot=True, cmap='Greens')

Decision Tree Model Accuracy: 0.8604166666666667

K-Nearest Neighbors (KNN)

sns.heatmap(confusion_matrix(y_test, kn_pred), annot=True, cmap='Purples')

K-Nearest Neighbors Model Accuracy: 0.8583333333333333

You might also like