ML Regression Documentation


Here's comprehensive documentation for Section 3 of the machine learning material, complete with introductions, code snippets, explanations, and real-world examples.

Section 3: Linear and Polynomial Regression in Machine Learning

Introduction

In this section, we explore regression techniques, focusing on Linear Regression and Polynomial
Regression. These are fundamental methods in machine learning for predicting continuous outcomes
based on input features. We will utilize the `scikit-learn` library for linear regression and `numpy` for
polynomial regression, along with `pandas` for data manipulation.

1. Linear Regression

1.1 Overview

Linear Regression is a statistical method to model the relationship between a dependent variable
(target) and one or more independent variables (predictors). The model assumes that the relationship
can be represented as a linear function.

1.2 Example Code

python

import pandas as pd
from sklearn import linear_model

# Load the dataset
df = pd.read_csv("data.csv")

# Define the independent variables (features) and dependent variable (target)
X = df[['Weight', 'Volume']]
y = df['CO2']

# Create a Linear Regression model
regr = linear_model.LinearRegression()

# Fit the model to the data
regr.fit(X, y)

# Predict CO2 emission for a car with weight 2300 kg and volume 1300.
# Wrapping the input in a DataFrame with matching column names avoids
# scikit-learn's feature-name warning for models fitted on a DataFrame.
predictedCO2 = regr.predict(pd.DataFrame([[2300, 1300]], columns=['Weight', 'Volume']))

# Output the prediction
print(predictedCO2)

1.3 Explanation

Imports: We start by importing necessary libraries: `pandas` for data handling and `sklearn` for
building the linear regression model.
Loading Data: The data is read from a CSV file into a DataFrame (`df`).
Defining Variables: We define our features `X` (Weight and Volume) and our target `y` (CO2
emissions).
Model Creation and Fitting: A Linear Regression model is instantiated and fitted with the data.
Prediction: We predict CO2 emissions for a specific car's weight and volume.

1.4 Coefficient Values

To understand the influence of each feature, we can print the coefficients of the linear regression
model.

python

# Print the coefficient values of the regression object
print(regr.coef_)
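
The intercept is available the same way; a short addition, reusing the fitted `regr` object from above:

python

# Print the intercept (the model's predicted CO2 when all features are zero)
print(regr.intercept_)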

1.5 Modifying Predictions

We can easily modify the input parameters to see how the prediction changes. For instance, if we change
the weight from 2300 kg to 3300 kg:

python

# Change the weight to 3300 kg and predict CO2 emissions again
predictedCO2 = regr.predict(pd.DataFrame([[3300, 1300]], columns=['Weight', 'Volume']))
print(predictedCO2)

2. Polynomial Regression

2.1 Overview

Polynomial Regression is an extension of linear regression that can model non-linear relationships
between the dependent and independent variables. By adding polynomial terms, we can fit a curve to
the data.

2.2 Scatter Plot Visualization

To visualize our data points, we can create a scatter plot:

python

import matplotlib.pyplot as plt

# Sample data: x = hour of the day, y = speed of a passing car
x = [1,2,3,5,6,7,8,9,10,12,13,14,15,16,18,19,21,22]
y = [100,90,80,60,60,55,60,65,70,70,75,76,78,79,90,99,99,100]

# Create scatter plot
plt.scatter(x, y)
plt.show()

2.3 Polynomial Regression Line

Next, we fit a polynomial regression line to the data and visualize it:

python

import numpy as np

# Fit a polynomial of degree 3 to the data
mymodel = np.poly1d(np.polyfit(x, y, 3))

# Create a range of values for the x-axis
myline = np.linspace(1, 22, 100)

# Create scatter plot with polynomial regression line
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()

2.4 Prediction Example

We can also make predictions using our polynomial model:

python

# Predict the speed of a car passing at 17:00 (x = 17)
speed = mymodel(17)
print(speed)

3. Evaluating Fit with R-squared

To evaluate the quality of our polynomial regression model, we calculate the R-squared value, which
indicates how well our model fits the data.

python

from sklearn.metrics import r2_score

# Values that lead to a poor fit
x_bad = [89,43,36,36,95,10,66,34,38,20,26,29,48,64,6,5,36,66,72,40]
y_bad = [21,46,3,35,67,95,53,72,58,10,26,34,90,33,38,20,56,2,47,15]

# Fit a degree-3 polynomial regression
mymodel_bad = np.poly1d(np.polyfit(x_bad, y_bad, 3))

# Print the R-squared value
print(r2_score(y_bad, mymodel_bad(x_bad)))
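
For contrast, applying the same metric to the degree-3 model from section 2.3 (reusing `x`, `y`, and `mymodel` from above) shows a much stronger fit:

python

# R-squared for the well-fitting model from section 2.3
print(r2_score(y, mymodel(x)))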

3.1 Poor Fit Visualization

Visualizing data with a poor fit helps us understand the limitations of our polynomial model:

python

# Plot the poorly fitting model
myline_bad = np.linspace(2, 95, 100)

plt.scatter(x_bad, y_bad)
plt.plot(myline_bad, mymodel_bad(myline_bad))
plt.show()

Conclusion

In this section, we've covered both linear and polynomial regression techniques, demonstrated
through code examples and visualizations. Understanding these methods is crucial for predictive
modeling in machine learning.

Real-World Example

Consider a car manufacturer that wants to estimate CO2 emissions based on a vehicle's weight and
volume. Using the above regression techniques, they can develop a predictive model to help guide
their design choices towards more environmentally friendly options.

This documentation should give you a thorough understanding of the concepts and practical
applications of linear and polynomial regression in machine learning. If you need further clarification
on any point, feel free to ask!

Let's dive deeper into each aspect of linear and polynomial regression, including additional
explanations, visualizations, and real-world applications.

Section 3: Linear and Polynomial Regression in Machine Learning (Continued)

4. Linear Regression: Detailed Explanation

4.1 Understanding the Model

Linear regression seeks to establish a relationship between the dependent variable y and the
independent variables X. The model can be mathematically represented as:

$$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$

Where:

$b_0$ is the intercept.
$b_1, b_2, \dots, b_n$ are the coefficients for each independent variable $X_1, X_2, \dots, X_n$.

The goal is to find the values of b that minimize the difference between the actual and predicted
values of y.
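
As a quick sanity check, the fitted model can be evaluated by hand; a minimal sketch, assuming the `regr` object from section 1.2 is still in scope:

python

import numpy as np

# b0 + b1*Weight + b2*Volume, computed manually from the fitted parameters
manual = regr.intercept_ + np.dot([2300, 1300], regr.coef_)
print(manual)  # should match regr.predict(...) for the same input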

4.2 Cost Function

The cost function, also known as the loss function, quantifies how well the model's predictions match
the actual data. For linear regression, we typically use the Mean Squared Error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:

$y_i$ is the actual value.
$\hat{y}_i$ is the predicted value.
$n$ is the number of data points.

Minimizing this cost function helps us find the best-fit line.
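
In practice, scikit-learn computes this metric directly; a short example, assuming the `X`, `y`, and `regr` from section 1.2:

python

from sklearn.metrics import mean_squared_error

# Mean squared error of the fitted model on the training data
mse = mean_squared_error(y, regr.predict(X))
print(mse)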

4.3 Visualization of Fit

Visualizing the regression line along with the data points can provide insights into how well the model
fits the data.

python

import matplotlib.pyplot as plt

# Plot the actual data points
plt.scatter(df['Weight'], df['CO2'], color='blue', label='Actual Data')

# Plot the model's predictions against weight. Because the model also uses
# Volume, the predicted values do not form a perfectly straight line in this
# two-dimensional projection.
plt.plot(df['Weight'], regr.predict(X), color='red', label='Regression Line')
plt.xlabel('Weight (kg)')
plt.ylabel('CO2 Emissions')
plt.title('Weight vs CO2 Emissions')
plt.legend()
plt.show()

5. Polynomial Regression: Detailed Explanation

5.1 Polynomial Features

In polynomial regression, we extend our linear model by adding powers of the independent variables.
This allows us to capture non-linear relationships.

For instance, if we have a single feature X, the polynomial regression model might look like this:

$$y = b_0 + b_1 X + b_2 X^2 + b_3 X^3$$

The degree of the polynomial (here, 3) can significantly affect the model's ability to fit the
data.
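
scikit-learn can also generate these polynomial terms automatically; a minimal sketch, assuming the `x` and `y` lists from section 2.2 are still in scope:

python

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Expand the single feature into columns [1, X, X^2, X^3]
X_poly = PolynomialFeatures(degree=3).fit_transform(np.array(x).reshape(-1, 1))

# An ordinary linear model fitted on the expanded features is a polynomial model
poly_model = LinearRegression().fit(X_poly, y)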

5.2 Overfitting vs Underfitting

Underfitting occurs when the model is too simple to capture the underlying trend in the data,
leading to poor performance on both training and testing datasets.
Overfitting happens when the model is too complex, capturing noise in the training data, which
can result in high accuracy on the training set but poor generalization to new data.

Visualizing the fit of a polynomial regression model can help identify whether the model is overfitting
or underfitting:

python

# Create a polynomial of degree 4 (overfitting example)
mymodel_overfit = np.poly1d(np.polyfit(x, y, 4))

# Plot the data points and the overfitted polynomial
plt.scatter(x, y, color='blue', label='Actual Data')
plt.plot(myline, mymodel_overfit(myline), color='green', label='Overfitted Model')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Polynomial Regression (Overfitting)')
plt.legend()
plt.show()
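
For contrast, a degree-1 fit on the same data illustrates underfitting (reusing `x`, `y`, and `myline` from above):

python

# A degree-1 polynomial is a straight line and cannot follow the curved trend
mymodel_underfit = np.poly1d(np.polyfit(x, y, 1))

plt.scatter(x, y, color='blue', label='Actual Data')
plt.plot(myline, mymodel_underfit(myline), color='orange', label='Underfitted Model')
plt.legend()
plt.show()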

6. Performance Metrics

6.1 R-squared

The R-squared value indicates how well the independent variables explain the variability of the
dependent variable. It typically ranges from 0 to 1, where 1 means the model explains all of the
variability; it can even be negative for models that fit worse than simply predicting the mean.

To calculate R-squared:

python

r_squared = r2_score(y, regr.predict(X))
print(f'R-squared: {r_squared}')

6.2 Adjusted R-squared

Adjusted R-squared modifies the R-squared value to account for the number of predictors in the model.
It is especially useful when comparing models with different numbers of predictors.

The formula for adjusted R-squared is:

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

Where:

$n$ is the number of observations.
$p$ is the number of predictors.
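
Translating the formula directly into code (a small sketch, assuming the `r_squared`, `X`, and `y` from section 6.1 are still in scope):

python

# Adjusted R-squared, computed from the formula above
n = len(y)        # number of observations
p = X.shape[1]    # number of predictors (Weight and Volume)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(f'Adjusted R-squared: {adj_r_squared}')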

7. Real-World Applications

7.1 Predicting CO2 Emissions

Using the regression techniques demonstrated, a car manufacturer can predict CO2 emissions based
on various vehicle attributes (weight, volume, etc.). This information can guide design decisions,
enabling the creation of more fuel-efficient vehicles.

7.2 Real Estate Price Prediction

Linear and polynomial regression can also be applied in real estate to predict property prices based on
features such as size, location, number of rooms, etc. By analyzing historical data, realtors can
estimate property values and identify trends.

7.3 Sales Forecasting

Businesses often utilize regression analysis to forecast sales based on advertising spend, seasonality,
and economic indicators. By modeling these relationships, companies can make informed decisions
about marketing strategies and resource allocation.

Conclusion

This detailed overview of linear and polynomial regression equips you with the knowledge to apply
these techniques in various real-world scenarios. Understanding how to visualize and evaluate models
is crucial for ensuring that your predictions are reliable and valid.

Further Reading and Exploration

Books: Look for textbooks on machine learning and statistics for deeper insights.
Online Courses: Platforms like Coursera and edX offer courses that cover regression analysis in
depth.
Kaggle Datasets: Experiment with various datasets available on Kaggle to practice your
regression skills.

Feel free to reach out if you have questions about specific topics, need further examples, or want to
explore more advanced techniques in regression analysis!