Practical # 10
Practical # 10
Linear regression
Linear regression is a basic and commonly used type of predictive analysis. The overall
idea of regression is to examine two things:
(1) does a set of predictor variables do a good job in predicting an outcome
(dependent) variable?
(2) Which variables in particular are significant predictors of the outcome
variable?
Simple linear regression
1 dependent variable (interval or ratio), 1 independent variable
These regression estimates are used to explain the relationship between one dependent
variable and one or more independent variables. The simplest form of the regression
equation with one dependent and one independent variable is defined by the formula y =
c + b*x, where y = estimated dependent variable score, c = constant, b = regression
coefficient, and x = score on the independent variable.
Regression variables
Naming the Variables. There are many names for a regression’s dependent variable. It may
be called an outcome variable, criterion variable, endogenous variable, or regress
The independent variables can be called exogenous variables, predictor variables, or
regressors.
Import libraries and read data from csv files
# Split the dataset into the training set and test set
# We're splitting the data in 1/3, so out of 30 rows, 20 rows will go into the training
set,
# and 10 rows will go into the testing set.
xTrain, xTest, yTrain, yTest = train_test_split(X, y, test_size=1/3, random_state=0)
# Optional: Check the split data (this step depends on your needs)
import pandas as pd
show_data = pd.DataFrame({'Training Set': xTrain.flatten(), 'Training Target':
yTrain})
print(show_data)
# Flattening the prediction to match original test set format (if needed)
yPrediction = yPrediction.flatten()
print(yPrediction)
Visualizing training and target training
Regression metrics
Regression metrics
Mean Absolute Error: mean absolute error is a measure of difference between two
continuous variables.
Mean Squared Error: the mean squared error or mean squared deviation of an estimator
measures the average of the squares of the errors—that is, the average squared difference
between the estimated values and what is estimated. MSE is a risk function, corresponding to
the expected value of the squared error loss.
Visualizing data
Class Tasks
Submission Date: --