Practical # 10

Department of Software Engineering

Mehran University of Engineering and Technology, Jamshoro

Course: SWE – Data Analytics and Business Intelligence


Instructor: Ms Sana Faiz        Practical/Lab No.: 10
Date:                           CLOs: 04
Signature:                      Assessment Score:

Topic: To understand the basics of Python


Objectives: To become familiar with Linear Regression using scikit-learn

Lab Discussion: Theoretical concepts and Procedural steps

Linear regression
 Linear regression is a basic and commonly used type of predictive analysis. The overall
idea of regression is to examine two things:
(1) Does a set of predictor variables do a good job of predicting an outcome
(dependent) variable?
(2) Which variables in particular are significant predictors of the outcome
variable?
Simple linear regression
 1 dependent variable (interval or ratio), 1 independent variable
 These regression estimates are used to explain the relationship between one dependent
variable and one independent variable. The simplest form of the regression equation,
with one dependent and one independent variable, is y = c + b*x, where y is the estimated
dependent variable score, c is the constant (intercept), b is the regression coefficient
(slope), and x is the score on the independent variable.
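As a minimal sketch of this formula (using made-up values for c and b, not estimates fitted from the lab's dataset), the equation can be evaluated directly in Python:

# Illustrative values only: c and b are NOT fitted from salaryData.csv
c = 26000   # constant (intercept): estimated salary at 0 years of experience
b = 9500    # regression coefficient (slope): salary increase per extra year of experience

def predict_salary(x):
    # Estimated dependent variable score for an independent variable score x
    return c + b * x

print(predict_salary(5))   # 26000 + 9500*5 = 73500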
Regression variables
 Naming the variables: there are many names for a regression's dependent variable. It may
be called an outcome variable, criterion variable, endogenous variable, or regressand.
 The independent variables can be called exogenous variables, predictor variables, or
regressors.
Import libraries and read data from csv files

# Import the necessary libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Import the dataset
dataset = pd.read_csv('salaryData.csv')
X = dataset.iloc[:, :-1].values # Assuming the feature is in the first column
y = dataset.iloc[:, -1].values # Assuming the target is in the second column

Train the regressor and predict outcomes

# Split the dataset into the training set and test set.
# We're using a 1/3 test split, so out of 30 rows, 20 rows go into the training set
# and 10 rows go into the test set.
xTrain, xTest, yTrain, yTest = train_test_split(X, y, test_size=1/3, random_state=0)

# Optional: check the split data (this step depends on your needs)
show_data = pd.DataFrame({'Training Set': xTrain.flatten(), 'Training Target': yTrain})
print(show_data)

# Creating a LinearRegression object and fitting it on our training set.


linearRegressor = LinearRegression()
linearRegressor.fit(xTrain.reshape(-1, 1), yTrain)

# Predicting the test set results


yPrediction = linearRegressor.predict(xTest.reshape(-1, 1))

# Flattening the prediction to match original test set format (if needed)
yPrediction = yPrediction.flatten()
print(yPrediction)
Visualizing the training set and training targets

Showing actual and predicted data and visualizing it

# Showing test set and predicted values side by side


results = pd.DataFrame({
'Test Set': xTest.flatten(),
'Actual Value': yTest,
'Predicted Values': yPrediction
})
print(results)

# Visualising the training set results


plt.scatter(xTrain, yTrain, color='red')
plt.plot(xTrain, linearRegressor.predict(xTrain.reshape(-1, 1)), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Actual and predicted values


Plotting test set data

# Visualising the test set results


plt.scatter(xTest, yTest, color='red')
# Use the training data to plot the regression line
plt.plot(xTrain, linearRegressor.predict(xTrain.reshape(-1, 1)), color='blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Regression metrics

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Calculating and printing the performance metrics


print("Mean Absolute Error:", mean_absolute_error(yTest, yPrediction))
print("Mean Squared Error:", mean_squared_error(yTest, yPrediction))
print("Variance Score (R^2):", r2_score(yTest, yPrediction))

Regression metrics
 Mean Absolute Error: the mean absolute error (MAE) is a measure of the difference between two
continuous variables; here, it is the average absolute difference between the predicted and
actual values.
 Mean Squared Error: the mean squared error (MSE), or mean squared deviation, of an estimator
measures the average of the squares of the errors, that is, the average squared difference
between the estimated values and what is estimated. MSE is a risk function, corresponding to
the expected value of the squared error loss.
 Variance Score (R²): the coefficient of determination, which indicates the proportion of the
variance in the dependent variable that is explained by the model.
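As a minimal sketch (using made-up actual and predicted values, not the lab's salary data), both metrics can be computed by hand with NumPy and checked against scikit-learn's functions:

# Illustrative values only, not taken from salaryData.csv
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = np.array([40000.0, 50000.0, 60000.0])
predicted = np.array([42000.0, 48000.0, 61000.0])

errors = actual - predicted
print(np.mean(np.abs(errors)))                 # MAE = (2000 + 2000 + 1000) / 3 ≈ 1666.67
print(np.mean(errors ** 2))                    # MSE = (4e6 + 4e6 + 1e6) / 3 = 3e6
print(mean_absolute_error(actual, predicted))  # matches the manual MAE
print(mean_squared_error(actual, predicted))   # matches the manual MSE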
Class Tasks
Submission Date: --

 Perform linear regression on the student dataset uploaded on the drive.


 Perform linear regression on the dataset of your own choice.
