
Lab5: Data classification using Bayes Classifier with Gaussian Mixture Model (GMM);

Regression using Linear Regression and Polynomial Regression


Deadline for submission: 19th October 2021, 10:00 PM
PART A:
You are given the Steel Plates Faults Data Set as a csv file (SteelPlateFaults-2class.csv) in
Assignment 4 (Lab 4). The dataset contains features extracted from steel plates of types A300 and
A400, used to predict which of two types of faults, Z_Scratch or K_Scratch, the image of the plate
surface contains. It consists of 1119 tuples, each having 27 attributes that describe the geometric
shape of the fault. The last (28th) attribute of every tuple is the class label (0 for K_Scratch fault
and 1 for Z_Scratch fault). It is a two-class problem. Use the same train data file
(SteelPlateFaults-train.csv) and test data file (SteelPlateFaults-test.csv) from Assignment 4 in this
assignment as well.
1. Build a Bayes classifier with a multi-modal Gaussian distribution (GMM) with Q Gaussian
components (modes) as the class-conditional density for each class on the training data
SteelPlateFaults-train.csv: build one GMM with Q components for class 1 and another
GMM with Q components for class 2. Classify every test tuple using the Bayes classifier with
GMM for the values Q = 2, 4, 8, and 16. Perform the following analysis:
a. Find the confusion matrix for each Q.
b. Find the classification accuracy for each Q. Note the value of Q for which the accuracy
is highest.
Note: Remove the attributes X_Minimum, Y_Minimum, TypeOfSteel_A300 and
TypeOfSteel_A400 from both the training and test data sets. The correlation of X_Minimum with
X_Maximum is 1, the correlation of Y_Minimum with Y_Maximum is also 1, and the correlation
between TypeOfSteel_A300 and TypeOfSteel_A400 is -1. Such perfectly correlated attribute pairs
make the rows and columns of the covariance matrix linearly dependent, so the covariance matrix
becomes singular. Its inverse then cannot be computed and, hence, neither can the likelihood. To
avoid this issue, remove the above-mentioned attributes; the Bayes classifier is then built using the
data with only 23 attributes.
2. Tabulate and compare the best result of the KNN classifier, the best result of the KNN classifier
on normalized data, the result of the Bayes classifier using a unimodal Gaussian density (all from
Assignment 4), and the result of the Bayes classifier using GMM.

Note:
Use mixture.GaussianMixture from scikit-learn to build each GMM:
GMM = mixture.GaussianMixture(n_components=Q, covariance_type='full')
GMM.fit(x)
Compute the weighted log probabilities for each sample using GMM.score_samples(x).
Compute accuracy using metrics.accuracy_score.
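A minimal end-to-end sketch of the above steps is given below. It assumes pandas is used to read the csv files, that the class label is the last column of the train and test files, and that the prior of each class is taken as its fraction of the training tuples; these choices are illustrative, not prescribed by the assignment.

import numpy as np
import pandas as pd
from sklearn import mixture, metrics

# Drop the perfectly correlated attributes mentioned in the note above
drop_cols = ['X_Minimum', 'Y_Minimum', 'TypeOfSteel_A300', 'TypeOfSteel_A400']
train = pd.read_csv('SteelPlateFaults-train.csv').drop(columns=drop_cols)
test = pd.read_csv('SteelPlateFaults-test.csv').drop(columns=drop_cols)
label = train.columns[-1]                 # assumed: class label is the last column
X_test, y_test = test.drop(columns=[label]), test[label]

for Q in [2, 4, 8, 16]:
    scores = []
    for c in [0, 1]:
        X_c = train[train[label] == c].drop(columns=[label])
        gmm = mixture.GaussianMixture(n_components=Q, covariance_type='full').fit(X_c)
        # log p(x | class c) + log P(class c); the prior is the class fraction
        scores.append(gmm.score_samples(X_test) + np.log(len(X_c) / len(train)))
    y_pred = np.argmax(np.column_stack(scores), axis=1)   # Bayes decision: larger posterior score
    print(Q, metrics.confusion_matrix(y_test, y_pred))
    print('accuracy:', metrics.accuracy_score(y_test, y_pred))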


PART B:
You are given a data file abalone.csv. Abalones are marine snails. The dataset has been prepared
with the aim of making age prediction easier. Customarily, the age of an abalone is determined by
cutting the shell through the cone, staining it, and counting the number of rings through a
microscope, but this is a tedious and time-consuming task. Therefore, other measurements, which
are easier to obtain, are used to predict the age.
Attribute information:
Given is the attribute name, attribute type, the measurement unit, and a brief description. The number
of rings is the value to predict: either as a continuous value or as a classification problem.

Name / Data Type / Measurement Unit / Description

1. Length / continuous / mm / Longest shell measurement


2. Diameter / continuous / mm / Diameter of the shell calculated as perpendicular to length
3. Height / continuous / mm / Height of the shell with meat in shell
4. Whole weight / continuous / grams / Weight of whole abalone
5. Shucked weight / continuous / grams / Weight of meat
6. Viscera weight / continuous / grams / Gut-weight (after bleeding)
7. Shell weight / continuous / grams / Weight of the shell after being dried
8. Rings / integer / -- / Number of rings in a shell. (Adding 1.5 to the number of rings gives the
age of abalone in years)

Write a python program to split the data from abalone.csv into train data and test data. The train
data should contain 70% of the tuples and the test data the remaining 30%. Save the train data as
abalone-train.csv and the test data as abalone-test.csv.
Note: Use the function train_test_split from scikit-learn as shown below to split the data (keep
random_state=42 so that every student gets the same split).
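A minimal sketch of the split is shown here, assuming pandas is used to read and write the csv files:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('abalone.csv')
# 70% train / 30% test, with the fixed random_state required by the assignment
train, test = train_test_split(df, test_size=0.3, random_state=42)
train.to_csv('abalone-train.csv', index=False)
test.to_csv('abalone-test.csv', index=False)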

1. Use the attribute that has the highest Pearson correlation coefficient with the target attribute
Rings as the input variable and build a simple linear (straight-line) regression model to predict
Rings. (Prerequisite: calculate the Pearson correlation coefficient of every attribute with the target
attribute Rings. A minimal sketch of this step is given after this list.)
a. Plot the best fit line on the training data where the x-axis represents the chosen attribute
value and the y-axis represents Rings.
b. Find the prediction accuracy on the training data using root mean squared error.
c. Find the prediction accuracy on the test data using root mean squared error.
d. Plot the scatter plot of actual Rings (x-axis) vs predicted Rings (y-axis) on the test data.
Draw inferences from the scatter plot.

2. Build a multivariate (multiple) linear regression model to predict Rings. All the attributes
other than the target attribute should be used as input to the model.
a. Find the prediction accuracy on the training data using root mean squared error.
b. Find the prediction accuracy on the test data using root mean squared error.
c. Plot the scatter plot of actual Rings (x-axis) vs predicted Rings (y-axis) on the test
data. Draw inferences from the scatter plot.
3. Use the attribute which has the highest Pearson correlation coefficient with the target attribute
Rings as input and build a simple nonlinear regression model using polynomial curve fitting
to predict Rings.
a. Find the prediction accuracy on the training data for the different values of degree of
the polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar
graph of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
b. Find the prediction accuracy on the test data for the different values of degree of the
polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar graph
of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
c. Plot the best fit curve using the best fit model on the training data, where the x-axis
represents the chosen attribute value and the y-axis represents Rings.
(Note: the best fit model is the one whose degree p gives the minimum test RMSE; a
sketch of the degree loop is given after the hints section.)
d. Plot the scatter plot of the actual number of Rings (x-axis) vs the predicted number of
Rings (y-axis) on the test data for the best degree of the polynomial (p). Comment on
the scatter plot and compare it with that in 1(d).

4. Build a multivariate nonlinear regression model using polynomial regression to predict Rings.
All the attributes other than the target attribute should be used as input to the model.
a. Find the prediction accuracy on the training data for the different values of degree of
the polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar
graph of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
b. Find the prediction accuracy on the test data for the different values of degree of the
polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar graph
of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
(Note: the best fit model is the one whose degree p gives the minimum test RMSE.)
c. Plot the scatter plot of the actual number of Rings (x-axis) vs the predicted number of
Rings (y-axis) on the test data for the best degree of the polynomial (p). Comment on
the scatter plot and compare it with that in 1(d).
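The following is a minimal sketch for tasks 1 and 2, assuming abalone-train.csv and abalone-test.csv contain only the eight numeric attributes listed above; it uses LinearRegression together with mean_squared_error (see also the hints below), and the variable names are illustrative.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

train = pd.read_csv('abalone-train.csv')
test = pd.read_csv('abalone-test.csv')

# Pearson correlation of every attribute with Rings; pick the strongest one
corr = train.corr()['Rings'].drop('Rings')
best_attr = corr.abs().idxmax()

# Task 1: simple (straight-line) regression on the chosen attribute
reg = LinearRegression().fit(train[[best_attr]], train['Rings'])
rmse_train = np.sqrt(mean_squared_error(train['Rings'], reg.predict(train[[best_attr]])))
rmse_test = np.sqrt(mean_squared_error(test['Rings'], reg.predict(test[[best_attr]])))

# Task 2: multivariate regression on all attributes other than the target
X_cols = [c for c in train.columns if c != 'Rings']
reg_multi = LinearRegression().fit(train[X_cols], train['Rings'])
rmse_multi_test = np.sqrt(mean_squared_error(test['Rings'], reg_multi.predict(test[X_cols])))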

Hints and code snippets:


● For linear regression use:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(X, y)
# Input arguments: X is the input variable(s) of the training data, y is the target values
y_pred = reg.predict(X)  # predict using the linear model
● For polynomial curve fitting and regression use:
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(p)  # p is the degree
x_poly = poly_features.fit_transform(X)
regressor = LinearRegression()
regressor.fit(x_poly, y)
# Input arguments: x_poly is the polynomial expansion of the input variable(s) of the training data, y is the target values
y_pred = regressor.predict(x_poly)  # expand new data with poly_features.transform() before predicting
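Building on the polynomial hint above, the following is a minimal sketch of the degree loop needed for tasks 3 and 4; x_train, x_test, y_train, y_test are illustrative names for the chosen input column(s) (one attribute for task 3, all seven predictor attributes for task 4) and the Rings targets.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

def poly_rmse(x_train, y_train, x_test, y_test, p):
    # Fit a degree-p polynomial regression and return (train RMSE, test RMSE)
    poly = PolynomialFeatures(p)
    regressor = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    # transform (not fit_transform) the test data with the same expansion
    pred_train = regressor.predict(poly.transform(x_train))
    pred_test = regressor.predict(poly.transform(x_test))
    return (np.sqrt(mean_squared_error(y_train, pred_train)),
            np.sqrt(mean_squared_error(y_test, pred_test)))

# Example usage:
# for p in [2, 3, 4, 5]:
#     print(p, poly_rmse(x_train, y_train, x_test, y_test, p))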
Instructions:
● Your python program(s) should be well commented. The comment section at
the beginning of the program(s) should include your name, registration number,
and mobile number.
● The python program(s) should have the file extension .py.
● The report should be strictly in PDF form. Write the report in Word or LaTeX
and then convert it to PDF. The template for the report (in Word and LaTeX) is
uploaded.
● The first page of your report must include your name, registration number, and
mobile number. Use the template of the report given in the assignment.
● Upload your program(s) and report in a single zip file. Name it
<roll_number>_Assignment5.zip, for example b20001_Assignment5.zip.
● Upload the zip file in the link corresponding to your group only.
If a program is found to be copied from others, both the person who copied and the
person who helped to copy will get zero as a penalty.
