
Linear Regression Hands-On

Linear Regression

Uploaded by

Nishant Randev

Linear Regression
Insurance Charges Prediction

Copyright Intellipaat. All rights reserved.


Agenda
01 Business Problem
02 Solution Approach
03 EDA
04 Feature Selection and Feature Scaling
05 Train and Test Split
06 Model Fitting
07 Performance of the Model



Business Problem - Background

The objective of the proposed work is to predict a person's insurance charges and to identify, from patients' health insurance policies and medical details, whether they have any health issues.

The level of treatment in the emergency department varies drastically depending on the type of health insurance a person has; on this basis we predict a person's insurance charges.



Solution Approach

A linear regression model is proposed for predicting health insurance charges. Factors such as age, gender, BMI, smoker status, number of children, and number of past consultations are the inputs used to develop the model.

This kind of model is useful for insurance companies to determine the yearly insurance premium charges for a person.



EDA- Loading The Data

We have imported all the libraries necessary to perform the EDA, and have imported the dataset using read_csv() from the pandas module.
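A minimal sketch of the loading step. The course CSV file is not shown on the slide, so the rows below are made up and the column names are assumptions; with a real file you would pass its path to read_csv() instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the insurance CSV; rows and column names are assumed.
csv_text = """age,sex,bmi,smoker,children,charges
19,female,27.9,yes,0,16884.92
18,male,33.8,no,1,1725.55
28,male,33.0,no,3,4449.46
33,male,22.7,no,0,21984.47
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```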



EDA- Printing the First 5 Rows

To print the first five rows of the imported data, we are using the head() method for the pandas dataframe.
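As a small illustration, head() returns the first five rows by default. The data frame here is a made-up stand-in, not the course dataset.

```python
import pandas as pd

# Hypothetical sample with seven rows; values are illustrative only.
df = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4],
})

first_rows = df.head()  # the default is n=5; df.head(3) would return three rows
print(first_rows)
```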



EDA- Label Encoding

To make the dataset usable by the model, we have transformed the columns ‘sex’ and ‘smoker’ using the label encoder, which converts their string values to the integer data type.
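One possible way to do this with scikit-learn's LabelEncoder is sketched below. The string values ("female", "male", "yes", "no") are assumptions about how the dataset stores these columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; the category strings are assumed.
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
})

for col in ["sex", "smoker"]:
    # LabelEncoder assigns integer codes to the sorted class labels,
    # so "female" -> 0, "male" -> 1 and "no" -> 0, "yes" -> 1.
    df[col] = LabelEncoder().fit_transform(df[col]).astype(int)

print(df)
```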



EDA- Label Encoding

The first five rows of the dataset now show the smoker and sex columns with integer values instead of strings.



EDA- Descriptive Analysis

The info() function gives information about the entire dataset: the table columns, how many non-null entries each column has, and the data type of each column.
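A short sketch on made-up data: info() prints the summary, and notna().sum() gives the same non-null counts as a Series that can be inspected in code.

```python
import pandas as pd

# Hypothetical sample with one missing age.
df = pd.DataFrame({
    "age": [19, 18, None],
    "sex": ["female", "male", "male"],
})

df.info()                    # prints columns, non-null counts and dtypes
non_null = df.notna().sum()  # the same non-null counts, as a Series
print(non_null)
```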



EDA- Descriptive Analysis

For further information about the data, we can get the column names, the shape of the data and the number of dimensions of the data.
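These three pieces of information are available as attributes of a pandas data frame; a brief sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.8, 33.0],
})

column_names = df.columns.tolist()  # list of column names
n_rows, n_cols = df.shape           # (rows, columns)
n_dims = df.ndim                    # 2 for a DataFrame
print(column_names, n_rows, n_cols, n_dims)
```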



EDA- Descriptive Analysis

The describe() method gives summary statistics of the data: the count, mean, standard deviation, minimum, 25th percentile, 50th percentile (median), 75th percentile and maximum of each numeric column. The minimum, the three percentiles and the maximum together form the five point summary.
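A small sketch on made-up data, showing that describe() skips the non-numeric column by default and returns the eight statistics as rows:

```python
import pandas as pd

# Hypothetical sample with one numeric and one string column.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "sex": ["female", "male", "male", "female"],
})

stats = df.describe()  # numeric columns only by default
print(stats)
```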



EDA- Descriptive Analysis

We can plot the distribution plots for the columns to get more clarity on the distribution of the data.
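The plotting code is not shown on the slide; the sketch below uses plain matplotlib histograms as one possible way to draw the distributions. The data and column names are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; values are illustrative only.
df = pd.DataFrame({
    "bmi": [22.1, 24.5, 26.0, 27.3, 28.8, 30.2, 33.5],
    "charges": [1700, 2100, 2500, 3400, 4500, 16000, 42000],
})

# One histogram per column, side by side.
fig, axes = plt.subplots(1, len(df.columns), figsize=(8, 3))
for ax, col in zip(axes, df.columns):
    ax.hist(df[col], bins=5)
    ax.set_title(col)
fig.tight_layout()
```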



EDA- Descriptive Analysis

The distribution plots show the data to be skewed for some of the features and close to a normal distribution for others.



EDA- Descriptive Analysis

To get a better idea about the spread of the data and the presence of outliers, we can make use of boxplots.
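A boxplot with default settings draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks points outside them as outliers. The same rule can be sketched numerically; the values below are made up.

```python
import pandas as pd

# Hypothetical charges with one extreme value.
charges = pd.Series([1700, 2100, 2500, 3000, 3400, 4000, 4500, 48000], name="charges")

q1 = charges.quantile(0.25)
q3 = charges.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = charges[(charges < lower) | (charges > upper)]
print(outliers.tolist())
```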



EDA- Descriptive Analysis

The plots show the presence of outliers in some of the features, but we can still use the data, since outliers do not always have to be dealt with. Sometimes, keeping the outliers can still yield good results.



EDA- Descriptive Analysis

The pair plot showcases the relationship between each of the features and the target variable.



EDA- Handling Null Values

Null values in the dataset can degrade the model, so they have to be dealt with: either we drop the rows with null values, or we fill the null values with the mean, median or mode of the column.



EDA- Handling Null Values

For the features whose distribution was skewed, the null values are replaced with the median, and for the rest they are replaced with the mean. If there were object type features, we could have replaced their null values with the mode.
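The rule above can be sketched as follows. The skewness cut-off of 1.0 and the "region" column are assumptions made for illustration; the slides judge skewness from the plots rather than from a number.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value per column.
df = pd.DataFrame({
    "bmi": [22.0, 24.0, np.nan, 26.0, 28.0],               # roughly symmetric
    "charges": [1000.0, 1200.0, 1500.0, np.nan, 40000.0],  # right-skewed
    "region": ["north", "south", "south", None, "east"],   # non-numeric
})

SKEW_THRESHOLD = 1.0  # assumed cut-off for calling a feature "skewed"

for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        fill_value = df[col].mode()[0]   # most frequent category
    elif abs(df[col].skew()) > SKEW_THRESHOLD:
        fill_value = df[col].median()    # robust to the long tail
    else:
        fill_value = df[col].mean()
    df[col] = df[col].fillna(fill_value)

print(df)
```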



Feature Selection Using Correlation

The columns that show good correlation with the target variable are going to be used for the prediction.
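A minimal sketch of correlation-based selection. The slides do not say what counts as "good" correlation, so the 0.5 threshold on the absolute Pearson correlation is an assumption, and the data is made up.

```python
import pandas as pd

# Hypothetical, already-encoded sample; "charges" is the target.
df = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "smoker": [0, 1, 0, 1, 1],
    "children": [2, 0, 3, 1, 2],
    "charges": [2000, 15000, 4500, 21000, 26000],
})

THRESHOLD = 0.5  # assumed cut-off on |correlation|

# Pearson correlation of every feature with the target.
corr_with_target = df.corr()["charges"].drop("charges")
selected = corr_with_target[corr_with_target.abs() >= THRESHOLD].index.tolist()
print(corr_with_target.round(2).to_dict(), selected)
```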



Splitting the Data into Train And Test

Splitting the dataset into training and testing data in the ratio 80:20.
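With scikit-learn, an 80:20 split can be sketched as below. The arrays are placeholders and the random_state value is an assumption, used only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 keeps 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```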



Feature Scaling

Feature scaling through standardization is a common practice that rescales each feature to have a mean of zero and a standard deviation of one, as a standard normal distribution does. It changes the scale of a feature, not the shape of its distribution.
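A sketch using scikit-learn's StandardScaler on made-up numbers. Fitting the scaler on the training data only and reusing it on the test data is an assumption about the workflow; the slide does not show this detail.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder features (for example age and bmi); values are illustrative only.
X_train = np.array([[20.0, 22.0], [30.0, 26.0], [40.0, 30.0], [50.0, 34.0]])
X_test = np.array([[35.0, 28.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```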



Linear Regression Model

We can create a linear regression model and fit the training data using the fit() method, and make predictions on the test or new data using the predict() method.
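A minimal sketch with scikit-learn's LinearRegression. The training data is a toy example that follows y = 2x + 1 exactly, so the fitted coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data generated from y = 2x + 1.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)                # estimate slope and intercept

X_new = np.array([[5.0]])
y_pred = model.predict(X_new)              # prediction for unseen data
print(model.coef_, model.intercept_, y_pred)
```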



Model Evaluation – r2 and adjusted r2 score

R squared tells us the goodness of fit. The larger the r2 score, the better the regression model fits the observations.

The adjusted R squared statistic takes into account the number of predictor variables, so it helps us judge whether the goodness of fit really improves when new predictor variables are added.
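Both scores can be sketched as below. r2_score comes from scikit-learn; adjusted_r2 is a hypothetical helper written here with the usual formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The values are made up.

```python
from sklearn.metrics import r2_score


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R squared: penalises R squared for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Illustrative actual and predicted values for a model with one predictor.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 9.5, 10.5]

r2 = r2_score(y_true, y_pred)
adj_r2 = adjusted_r2(r2, n_samples=len(y_true), n_features=1)
print(round(r2, 4), round(adj_r2, 4))
```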



Model Evaluation – Mean Squared Error

Mean squared error is the average of the squares of the errors, that is, of the differences between the actual and the predicted values.
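A short sketch that computes the mean squared error by hand and with scikit-learn's mean_squared_error, on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

# Squared errors are 1, 0, 1 and 4, so the mean is 6 / 4.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)
```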



Model Evaluation – Mean Absolute Percentage Error

Mean absolute percentage error gives you an estimate of the percentage error between the actual
and predicted values.
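The slide does not show how the metric is computed; a minimal NumPy sketch of the usual definition, the mean of |actual - predicted| / |actual| expressed as a percentage, on made-up values:

```python
import numpy as np

# Illustrative actual and predicted values; actual values must be non-zero.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

# Relative errors are 10%, 10% and 0%.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(round(mape, 2))
```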



Model Evaluation – Plotting the best fit line

Using the actual and predicted values to plot the best fit line and understand the error.
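The plotting code is not shown on the slide. One possible sketch, assuming the plot is a scatter of actual against predicted values with a straight line fitted through the points by np.polyfit; the values and axis labels are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative actual and predicted charges.
y_test = np.array([1700.0, 4500.0, 9000.0, 16000.0, 21000.0])
y_pred = np.array([2500.0, 4000.0, 10500.0, 14000.0, 22500.0])

# Least-squares line through the (actual, predicted) points.
slope, intercept = np.polyfit(y_test, y_pred, deg=1)

fig, ax = plt.subplots()
ax.scatter(y_test, y_pred, label="Predictions")
xs = np.linspace(y_test.min(), y_test.max(), 50)
ax.plot(xs, slope * xs + intercept, color="red", label="Best fit line")
ax.set_xlabel("Actual charges")
ax.set_ylabel("Predicted charges")
ax.legend()
```

The vertical distance of each point from the line gives a visual sense of the error for that prediction.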



Model Evaluation – Plotting the best fit line

Plotting the best fit line and the actual and predicted values.
