DS3-Lab5-v3
Note:
Use the function “mixture.GaussianMixture” from scikit-learn to build GMM.
Write a Python program to split the data from abalone.csv into train data and test data. The train data
should contain 70% of the tuples and the test data the remaining 30%. Save the train data as
abalone-train.csv and the test data as abalone-test.csv.
Note: Use train_test_split from scikit-learn to split the data (keep random_state=42 so that
every student gets the same split).
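A minimal sketch of the split described above. The helper name split_and_save and its path parameters are hypothetical; only train_test_split, the 70/30 ratio, random_state=42 and the file names come from the text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_save(df, train_path="abalone-train.csv", test_path="abalone-test.csv"):
    """Split df into 70% train / 30% test and write both parts to CSV.

    split_and_save is a hypothetical helper name used for illustration.
    """
    # random_state=42 makes the shuffle reproducible across runs and students.
    train, test = train_test_split(df, test_size=0.3, random_state=42, shuffle=True)
    # index=False keeps the original row index out of the saved files.
    train.to_csv(train_path, index=False)
    test.to_csv(test_path, index=False)
    return train, test
```

Assuming abalone.csv is in the working directory, it could be called as split_and_save(pd.read_csv("abalone.csv")).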
1. Use the attribute that has the highest Pearson correlation coefficient with the target attribute
Rings as the input variable and build a simple linear (straight-line) regression model to predict
Rings. (Prerequisite: calculate the Pearson correlation coefficient of every attribute with the target
attribute Rings.)
a. Plot the best fit line on the training data where the x-axis represents the chosen attribute
value and the y-axis represents Rings.
b. Find the prediction accuracy on the training data using root mean squared error.
c. Find the prediction accuracy on the test data using root mean squared error.
d. Plot the scatter plot of actual Rings (x-axis) vs predicted Rings (y-axis) on the test data.
Draw inferences from the scatter plot.
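One possible way to approach question 1, sketched under assumptions: the helper names (best_attribute, rmse, fit_simple_linear) are hypothetical, non-numeric columns are skipped when computing correlations, and "highest" is taken literally as the largest signed coefficient.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def best_attribute(train, target="Rings"):
    """Name of the numeric attribute with the highest Pearson correlation with target."""
    # DataFrame.corr() computes Pearson correlation by default.
    corr = train.select_dtypes(include="number").corr()[target].drop(target)
    return corr.idxmax()


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_simple_linear(train, test, target="Rings"):
    """Fit Rings ~ one attribute; return (attribute, model, train RMSE, test RMSE)."""
    attr = best_attribute(train, target)
    # scikit-learn expects a 2-D input, hence the double brackets.
    model = LinearRegression().fit(train[[attr]], train[target])
    train_rmse = rmse(train[target], model.predict(train[[attr]]))
    test_rmse = rmse(test[target], model.predict(test[[attr]]))
    return attr, model, train_rmse, test_rmse
```

The best fit line for part (a) can then be drawn by plotting model.predict(train[[attr]]) against train[attr], and part (d) by a scatter of test[target] against model.predict(test[[attr]]), for example with matplotlib.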
2. Build a multivariate (multiple) linear regression model to predict Rings. All the attributes
other than the target attribute should be used as input to the model.
a. Find the prediction accuracy on the training data using root mean squared error.
b. Find the prediction accuracy on the test data using root mean squared error.
c. Plot the scatter plot of actual Rings (x-axis) vs predicted Rings (y-axis) on the test
data. Draw inferences from the scatter plot.
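A hedged sketch for question 2. It assumes every column other than Rings is numeric (a categorical column would need encoding first); fit_multivariate_linear and rmse are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_multivariate_linear(train, test, target="Rings"):
    """Fit Rings on all other attributes; return (model, train RMSE, test RMSE)."""
    # Assumption: all non-target columns are numeric inputs.
    X_train, y_train = train.drop(columns=[target]), train[target]
    X_test, y_test = test.drop(columns=[target]), test[target]
    model = LinearRegression().fit(X_train, y_train)
    return (
        model,
        rmse(y_train, model.predict(X_train)),
        rmse(y_test, model.predict(X_test)),
    )
```

On the scatter plot of actual vs predicted Rings, points from a perfect model would lie on the line y = x, so the spread around that diagonal is one thing to comment on.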
3. Use the attribute which has the highest Pearson correlation coefficient with the target attribute
Rings as input and build a simple nonlinear regression model using polynomial curve fitting
to predict Rings.
a. Find the prediction accuracy on the training data for the different values of degree of
the polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar
graph of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
b. Find the prediction accuracy on the test data for the different values of degree of the
polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar graph
of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
c. Plot the best fit curve using the best fit model on the training data where the x-axis
represents the chosen attribute value and the y-axis is Rings.
(Note: The best fit model is the one whose degree p gives the minimum test RMSE.)
d. Plot the scatter plot of the actual number of Rings (x-axis) vs the predicted number of
Rings (y-axis) on the test data for the best degree of the polynomial (p). Comment on
the scatter plot and compare it with the one in 1(d).
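Polynomial curve fitting for question 3 can be sketched as a linear regression on polynomial terms of the chosen attribute. This is one possible approach, not a prescribed one: the pipeline of PolynomialFeatures and LinearRegression and the helper names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def poly_rmse_by_degree(x_train, y_train, x_test, y_test, degrees=(2, 3, 4, 5)):
    """Fit one polynomial model per degree p.

    x_train and x_test must be 2-D with a single column (the chosen attribute).
    Returns ({p: (train RMSE, test RMSE)}, best p by minimum test RMSE).
    """
    results = {}
    for p in degrees:
        # PolynomialFeatures expands x into 1, x, x^2, ..., x^p.
        model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
        model.fit(x_train, y_train)
        results[p] = (
            rmse(y_train, model.predict(x_train)),
            rmse(y_test, model.predict(x_test)),
        )
    best_p = min(results, key=lambda p: results[p][1])
    return results, best_p
```

The two RMSE values per degree are what the bar graphs in (a) and (b) display. To draw a smooth best fit curve for (c), refit the model at the best p and predict on attribute values sorted in increasing order.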
4. Build a multivariate nonlinear regression model using polynomial regression to predict Rings.
All the attributes other than the target attribute should be used as input to the model.
a. Find the prediction accuracy on the training data for the different values of degree of
the polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar
graph of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
b. Find the prediction accuracy on the test data for the different values of degree of the
polynomial (p = 2, 3, 4, 5) using root mean squared error (RMSE). Plot the bar graph
of RMSE (y-axis) vs different values of degree of the polynomial (x-axis).
(Note: The best fit model is the one whose degree p gives the minimum test RMSE.)
c. Plot the scatter plot of the actual number of Rings (x-axis) vs the predicted number of
Rings (y-axis) on the test data for the best degree of the polynomial (p). Comment on
the scatter plot and compare it with the one in 1(d).
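For question 4, a hedged sketch that also records how many polynomial terms each degree generates, since that count grows quickly when several attributes are used. The function name multivariate_poly_report is hypothetical, and all inputs are assumed to be numeric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def multivariate_poly_report(X_train, y_train, X_test, y_test, degrees=(2, 3, 4, 5)):
    """Fit a polynomial regression on all attributes for each degree p.

    Returns {p: {"n_terms": ..., "train_rmse": ..., "test_rmse": ...}}.
    """
    def rmse(y_true, y_pred):
        diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean(diff ** 2)))

    report = {}
    for p in degrees:
        model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
        model.fit(X_train, y_train)
        # Number of generated terms (including the constant): C(d + p, p) for d inputs.
        n_terms = model.named_steps["polynomialfeatures"].n_output_features_
        report[p] = {
            "n_terms": int(n_terms),
            "train_rmse": rmse(y_train, model.predict(X_train)),
            "test_rmse": rmse(y_test, model.predict(X_test)),
        }
    return report
```

The best degree is the key with the smallest test_rmse, for example min(report, key=lambda p: report[p]["test_rmse"]). A training RMSE that keeps falling while the test RMSE rises at higher p is the usual sign of overfitting.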