Group_1_Practical
Objectives:
● Intuition
○ Understanding the basics of linear regression
○ Conceptualizing how linear regression works
● Dataset Specification
○ Describing the dataset used in the practical
○ Identifying the variables and their significance
● Data Pre-processing
○ Cleaning and handling missing data
○ Feature scaling and normalization
○ Encoding categorical variables
● Data Splitting
○ Dividing the dataset into training and testing sets
○ Determining the split ratio
● Model Selection
○ Choosing linear regression as the predictive model
○ Justification for selecting linear regression
● Model Training
○ Implementing a simple linear regression model
○ Training the model using the training dataset
Linear regression is a supervised learning algorithm that is used to predict continuous values.
It is one of the most fundamental and widely used machine learning algorithms.
Linear regression works by finding the best-fit line through a set of data points. The best-fit line is the one that minimizes the sum of squared vertical distances between the data points and the line.
Once the best-fit line has been found, it can be used to predict the value of the dependent
variable for new input values.
Let's assume there is a linear relationship between X and Y; then the value of Y can be predicted using:

ŷ = θ1 + θ2 * x

Here,
● y values are the labels of the data (supervised learning)
● x values are the input independent training data (univariate: one input variable/parameter)
● ŷ values are the predicted values
A linear regression model can be trained with the optimization algorithm gradient descent, which iteratively modifies the model's parameters to reduce the mean squared error (MSE) of the model on a training dataset. To update the θ1 and θ2 values so as to reduce the cost function (minimizing the MSE) and achieve the best-fit line, the model uses gradient descent: start with random θ1 and θ2 values and iteratively update them in the direction that decreases the cost, until it converges to a minimum.
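As an illustration, here is a minimal NumPy sketch of this procedure; the toy data, learning rate, and iteration count are illustrative assumptions, not values from this practical:

```python
import numpy as np

# Toy univariate data (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

theta1, theta2 = 0.0, 0.0  # intercept and slope, starting values
lr = 0.01                  # learning rate

for _ in range(5000):
    y_hat = theta1 + theta2 * x  # current predictions
    error = y_hat - y
    # Gradients of the MSE cost with respect to theta1 and theta2
    grad_theta1 = 2 * error.mean()
    grad_theta2 = 2 * (error * x).mean()
    theta1 -= lr * grad_theta1
    theta2 -= lr * grad_theta2

print(theta1, theta2)  # approaches the least-squares intercept and slope
```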
For its results to be reliable, linear regression relies on several assumptions:
● The relationship between the dependent variable and the independent variables is linear.
● The variance of the residuals is constant across all values of the independent
variables.
● The errors are independent of each other.
If these assumptions are not met, the results of the linear regression analysis may not be
reliable.
Conceptualizing how linear regression works
Linear regression is a statistical method used to predict the value of a dependent variable
based on the values of one or more independent variables. It is a supervised learning
algorithm, which means that it is trained on a dataset of known input-output pairs.
There are several ways to conceptualize how linear regression works. One way is to think of
it as a line that best fits a set of data points. The line is fitted to the data using a least squares
approach, which minimizes the sum of the squared distances between the data points and the
line.
Once the linear regression model has been fitted to the data, it can be used to predict the
value of the dependent variable for new input values. For example, if the linear regression
model is used to predict the weight of a person based on their height, the model can be used
to predict the weight of a person of any height.
Linear regression is a powerful tool that can be used to predict the value of a dependent
variable based on the values of one or more independent variables. It is a versatile algorithm
that can be used for a wide variety of tasks, including machine learning.
1. Data Points :
- Linear regression starts with a dataset containing observations or data points. Each data point
consists of pairs of values: one or more independent variables (features) and the corresponding
dependent variable (the target).
2. Scatter Plot :
- To visualize the relationship between the independent variable(s) and the dependent variable,
you can create a scatter plot. Each data point is plotted, with the independent variable(s) on the
x-axis and the dependent variable on the y-axis.
3. Linear Equation :
- Linear regression assumes that there is a linear relationship between the independent
variable(s) and the dependent variable. This relationship is represented by a linear equation of
the form:
y = b0 + b1 * x
where b0 is the intercept and b1 is the slope of the line.
4. Best-Fit Line :
- The goal of linear regression is to find the best-fit line that minimizes the sum of the squared
differences between the actual data points and the predicted values along this line. This line
represents the model's estimate of the linear relationship.
5. Predictions :
- Once the model is trained and you have the values of `b0` and `b1`, you can use the linear equation to make predictions for new, unseen data points. Simply plug in the values of the independent variable(s) to estimate the dependent variable (see the sketch after this list).
6. Interpretation :
- The coefficients `b0` and `b1` have interpretive significance. `b1` indicates how much the
dependent variable changes for a one-unit change in the independent variable. A positive `b1`
suggests a positive relationship, and a negative `b1` suggests a negative relationship.
7. Model Evaluation :
- Linear regression models are evaluated using various metrics such as R-squared, MSE, or
RMSE. These metrics assess how well the model fits the data and makes accurate predictions.
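A short sketch, using hypothetical height/weight values and NumPy's least-squares fit, ties these steps together:

```python
import numpy as np

# 1. Data points: hypothetical heights (cm) and weights (kg)
x = np.array([150, 160, 165, 170, 180, 185], dtype=float)
y = np.array([50, 56, 61, 66, 74, 79], dtype=float)

# 3-4. Fit the best-fit line y = b0 + b1 * x by least squares
b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the slope first

# 5. Prediction for a new input value
predicted_weight = b0 + b1 * 175.0

# 6. Interpretation: b1 is the change in weight per extra cm of height
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, prediction = {predicted_weight:.2f}")
```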
About the Dataset:
● R&D Spend: The amount spent annually by a startup in Research and Development
for their product/service.
● Administration: Amount spent annually in managing workforce, including salaries,
machine costs, etc.
● Marketing Spend: Amount spent annually for promoting the product/service both
online and offline.
● State: The name of the state where the organization is located or operating from.
● Profit: The net profit amount of the startup company annually.
The significance of each column is summarized below:
1. R&D Spend:
- Significance : Spending on research and development often drives product innovation and differentiation, and for startups it is frequently a strong determinant of revenue and, ultimately, profit.
2. Administration:
- Significance : It can include costs such as salaries for administrative staff, office rent,
utilities, and other overhead expenses. The significance of this column depends on how
efficiently these administrative expenses are managed. High administrative expenses relative to
revenue could negatively impact profitability.
3. Marketing Spend:
- Significance : It's important because marketing is essential for promoting products or
services, expanding the customer base, and increasing sales. Effective marketing can lead to
higher revenue and, ultimately, higher profit. The significance of this column depends on the
effectiveness of the marketing efforts.
4. State:
- Significance : The significance of this categorical variable depends on various factors, such
as state-specific economic conditions, market size, regulatory environment, and consumer
behavior. Different states may offer different business opportunities and challenges, and the
choice of state can impact profitability.
5. Profit:
- Significance : This is the target variable you want to predict. It represents the company's
financial performance, and it's the primary measure of success. The goal is to predict and
maximize profit, so understanding the significance of the other columns in relation to "Profit" is
essential for making informed business decisions.
Data Pre-processing
Data preprocessing is a crucial phase in our startup profit prediction project using linear
regression. This phase involves several key steps to ensure that our dataset is prepared for
effective model training and evaluation. Additionally, data splitting helps assess the model's performance accurately.
Missing data is a common issue in datasets. It can lead to inaccurate results and cause problems for machine learning models. Start by addressing missing data in the dataset, particularly in essential columns such as 'Profit', 'Marketing Spend', and 'Administration'. Utilize techniques like mean imputation for numerical features and mode imputation for categorical attributes. Since the 'State' column is categorical, it must also be encoded into numerical form (for example, with one-hot encoding) before it can be used by the model. Clean data ensures that the linear regression model receives high-quality inputs.
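A sketch of these preprocessing steps with pandas; the file name `50_Startups.csv` is an assumption, and the column names follow the dataset description above:

```python
import pandas as pd

# Load the dataset (the file name is an assumption)
df = pd.read_csv("50_Startups.csv")

# Mean imputation for the numerical columns
for col in ["R&D Spend", "Administration", "Marketing Spend", "Profit"]:
    df[col] = df[col].fillna(df[col].mean())

# Mode imputation for the categorical column
df["State"] = df["State"].fillna(df["State"].mode()[0])

# One-hot encode 'State' so the model receives numerical inputs
df = pd.get_dummies(df, columns=["State"], drop_first=True)

# Separate features (X) from the target (y)
X = df.drop(columns=["Profit"])
y = df["Profit"]
```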
It is essential to split the dataset into two subsets: a training set and a testing set. In the context of
our startup profit prediction project, this division plays a vital role. The training set is where our
machine learning model learns patterns and relationships within the data, such as the impact of features like 'R&D Spend', 'Administration', and 'Marketing Spend' on profit.
The testing set, on the other hand, serves as a means to evaluate how well our model performs in
predicting the profit earned by the startup when presented with new, unseen data. This division
ensures that our model not only learns from the data but also generalizes effectively, making
reliable predictions for new, profit-seeking startups.
The split ratio is a critical decision in our startup profit prediction project. While common ratios
like 70/30 or 80/20 are often used, the choice depends on the size of our dataset and the specific
goals of our project. In our case, a larger training set allows our model to learn more
comprehensively from historical startup data, enabling it to capture complex profit determinants. However, we must balance this with the need for a sufficiently large testing set. This is vital for evaluating our model's performance accurately and ensuring that it can handle diverse spending patterns across categories such as salaries, marketing, and research. The choice of the split ratio is a strategic decision, and it's essential to find the right
balance between model learning and evaluation.
Splitting the dataset into train and test sets with an 80:20 ratio:
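A minimal sketch using scikit-learn's train_test_split, assuming X and y were prepared as in the preprocessing sketch above:

```python
from sklearn.model_selection import train_test_split

# 80% of the rows for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```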
test_size = 0.2 specifies that 20% of the dataset should be included in the test set.
random_state is used to control the randomness of the data split. Setting it to a fixed value (e.g.,
0) ensures that you get the same random split every time you run the code. If you don't set it, the
split will be different each time you run the code.
You can use X_train and y_train to train your machine learning model, and then use X_test to
make predictions, which you can compare to y_test to evaluate the model's performance. This
splitting ensures that you have a separate dataset for testing the model's performance, helping to
assess how well it generalizes to new, unseen data.
Model Selection
We choose linear regression as the predictive model because the target variable, 'Profit', is continuous, and the spending features ('R&D Spend', 'Administration', 'Marketing Spend') can reasonably be assumed to have a roughly linear relationship with it. Linear regression is also simple, fast to train, and easy to interpret: each coefficient directly shows how much predicted profit changes per unit of spend in that category.
Model Training
We will use the LinearRegression() class from Python's scikit-learn library. The following code trains the model:
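A minimal sketch, assuming X_train and y_train from the split above:

```python
from sklearn.linear_model import LinearRegression

# Fit the linear regression model on the training data
regressor = LinearRegression()
regressor.fit(X_train, y_train)

print(regressor.intercept_)  # b0
print(regressor.coef_)       # one coefficient per feature
```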
Now, store the predicted values in the y_pred variable, and print the predicted and test-set values to compare them.
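A sketch of this step, continuing from the trained regressor above:

```python
import pandas as pd

# Store the predictions for the test set in y_pred
y_pred = regressor.predict(X_test)

# Print predicted and actual profits side by side for comparison
print(pd.DataFrame({"Predicted": y_pred, "Actual": y_test.values}))
```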
Model Evaluation
Computing the R-squared metric to measure how well the model fits the data.
Unexplained Variation, or SSR, is the Sum of Squares of Residuals (also known as the Sum of Squared Errors, SSE): it measures the total squared differences between the observed values (the actual target values) and the predicted values from the model. A lower SSR indicates a better fit.
Total Variation, or SST, is the Total Sum of Squares: it measures the total squared differences between the observed values and the mean of the observed values. It represents the total variability in the dependent variable and depends only on the data, not on the model.
R-squared compares the two:
R2 = 1 − (SSR / SST)
● If R2 = 1, it means that the model perfectly fits the data, explaining all the variability in
the dependent variable.
● If R2 = 0, it means that the model doesn't explain any of the variability, and it's no better
than a horizontal line (the mean of the dependent variable).
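The following sketch computes R-squared both from its definition and with scikit-learn's r2_score, continuing with y_test and y_pred from above; the two results should match:

```python
import numpy as np
from sklearn.metrics import r2_score

ssr = np.sum((y_test - y_pred) ** 2)         # unexplained variation (SSR / SSE)
sst = np.sum((y_test - y_test.mean()) ** 2)  # total variation (SST)
r2 = 1 - ssr / sst

print(r2)                        # R-squared from the definition
print(r2_score(y_test, y_pred))  # R-squared from scikit-learn
```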
Computing the Mean Squared Error (MSE) metric to evaluate the model's prediction error.
The Mean Squared Error (MSE) is a commonly used metric for evaluating the performance of
regression models. It measures the average squared difference between the predicted values and
the actual (observed) values of the dependent variable (target). A lower MSE indicates a better fit
of the model to the data.
● The MSE is a quadratic function of the model parameters, so its cost surface is convex with a single global minimum and no local minima; gradient descent can therefore converge to it reliably.
● MSE penalizes the model for having large errors by squaring them.
● Because squaring puts more weight on large errors, MSE makes the model sensitive to outliers: a few points with large errors can dominate the metric.
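A sketch computing MSE (and its square root, RMSE) with scikit-learn, continuing from the snippets above:

```python
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5  # RMSE is in the same units as 'Profit'
print(mse, rmse)
```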
Plotting the actual vs. predicted profit values, together with a reference line for a perfect linear fit:
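A sketch of such a plot with matplotlib, assuming y_test and y_pred from the earlier snippets; the reference line marks where predictions would equal actual values:

```python
import matplotlib.pyplot as plt

plt.scatter(y_test, y_pred, label="Test-set points")
# A perfect model would place every point on the line y = x
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, color="red", label="Perfect fit (y = x)")
plt.xlabel("Actual profit")
plt.ylabel("Predicted profit")
plt.title("Actual vs. predicted profit")
plt.legend()
plt.show()
```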
Applications
Linear regression is a very versatile algorithm and can be used for a wide variety of tasks,
including:
● Predicting the price of a house based on its square footage and number of bedrooms.
● Predicting the risk of a customer churning based on their past purchase history.
● Predicting the demand for a product based on historical sales data.
● Predicting the performance of a student on a test based on their past test scores.