Linear Regression Hands-On
Regression: Insurance Charges Prediction
The objective of the proposed work is to predict a person's insurance charges and,
from patients' health insurance policies and medical details, to identify whether
they have any health issues.
The level of treatment in the emergency department varies drastically depending on the
type of health insurance a person has, which is why we predict a person's insurance charges.
A linear regression model is proposed for health insurance charge prediction. Factors such as age, gender,
BMI, smoker status, number of children, and number of past consultations are the inputs to the model.
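As a rough illustration of how such a model is fitted, the sketch below trains scikit-learn's LinearRegression on synthetic data. The data, the coefficients, and the choice of only three of the listed factors (age, BMI, smoker) are assumptions made for the example; they are not taken from the insurance dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the insurance data (assumption: not the real dataset).
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 65, size=n)
bmi = rng.uniform(18.0, 40.0, size=n)
smoker = rng.integers(0, 2, size=n)  # already encoded as 0/1

# Feature matrix: one row per person, one column per factor.
X = np.column_stack([age, bmi, smoker])

# Made-up, noise-free linear relationship used only to show the mechanics.
y = 250.0 * age + 300.0 * bmi + 20000.0 * smoker + 1000.0

# Ordinary least squares fit; coef_ holds one weight per input column.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```

Because the synthetic target is exactly linear, the fitted weights match the made-up ones; with real data they would only be estimates.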
To print the first five rows of the imported data, we use the head() method of the pandas DataFrame.
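A minimal sketch of this step follows. The small DataFrame is a hypothetical stand-in with made-up values, since the imported file itself is not shown here; the column names follow the factors mentioned in this material.

```python
import pandas as pd

# Hypothetical toy rows (made-up values) standing in for the imported data.
df = pd.DataFrame({
    "age": [20, 25, 30, 35, 40, 45],
    "sex": ["female", "male", "male", "female", "male", "female"],
    "bmi": [22.0, 27.5, 31.0, 24.5, 29.0, 26.0],
    "children": [0, 1, 2, 0, 3, 1],
    "smoker": ["yes", "no", "no", "no", "yes", "no"],
    "charges": [15000.0, 3000.0, 4500.0, 5000.0, 30000.0, 8000.0],
})

# head() returns the first n rows; n defaults to 5.
first_rows = df.head()
print(first_rows)
```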
To prepare the dataset for the model, we transformed the ‘sex’ and ‘smoker’ columns using the label
encoder and changed their data type to integer.
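The encoding step might look like the sketch below, assuming scikit-learn's LabelEncoder is the label encoder in use. The four rows are made-up values for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows; only the two string columns being encoded are shown.
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
})

for col in ["sex", "smoker"]:
    encoder = LabelEncoder()  # a fresh encoder per column
    # Classes are sorted alphabetically, then mapped to 0, 1, ...
    df[col] = encoder.fit_transform(df[col]).astype(int)

print(df)
print(df.dtypes)
```

LabelEncoder assigns codes in sorted order of the class labels, so here "female" and "no" map to 0 while "male" and "yes" map to 1.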
The first five rows of the dataset now show the ‘smoker’ and ‘sex’ columns with
integer values in place of the original strings.
The describe() method gives summary statistics of the data: the count, mean,
standard deviation, minimum, 25th percentile, 50th percentile (median), 75th percentile, and maximum of each of
the numeric columns in the data.
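The sketch below shows what describe() returns on a small made-up DataFrame. The "region" column is a hypothetical string column, added only to show that non-numeric columns are left out by default.

```python
import pandas as pd

# Made-up values; "region" is a hypothetical non-numeric column.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "bmi": [22.0, 26.0, 30.0, 34.0],
    "region": ["a", "b", "c", "d"],
})

# One row per statistic, one column per numeric column.
summary = df.describe()
print(summary)
```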
Null values in the dataset can degrade the model, so they have to be dealt with: either
we can drop the null values, or we can replace (fill) them with the mean, median, or mode of the
column.
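Both options can be sketched as follows, on made-up values with one missing entry per column. Filling age with the median and BMI with the mean is an arbitrary choice made for the example.

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry in each column.
df = pd.DataFrame({
    "age": [20, 30, np.nan, 50],
    "bmi": [22.0, np.nan, 30.0, 34.0],
})

# Count the null values per column.
null_counts = df.isnull().sum()

# Option 1: drop every row that contains a null value.
dropped = df.dropna()

# Option 2: fill nulls with a per-column statistic.
filled = df.fillna({"age": df["age"].median(), "bmi": df["bmi"].mean()})

print(null_counts)
print(dropped)
print(filled)
```

Dropping rows is the simplest option but discards data; filling keeps every row at the cost of introducing estimated values.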
Splitting the dataset into training and testing data in the ratio 80:20.
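An 80:20 split can be sketched with scikit-learn's train_test_split, assuming that is the function used. The arrays are placeholders, and the random_state value is an assumption added only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and target: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.2 holds out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```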
The adjusted R-squared statistic takes into account the number of predictor
variables and helps us determine the goodness of fit when new
predictor variables are added.
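The usual definition is adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors. The helper below is a small sketch of it; the function name and the example numbers are made up for illustration.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared from plain R-squared, sample count, and predictor count."""
    dof = n_samples - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same R-squared, more predictors: the adjusted value drops.
few = adjusted_r2(0.75, n_samples=101, n_predictors=5)
many = adjusted_r2(0.75, n_samples=101, n_predictors=20)
print(few, many)
```

Unlike plain R-squared, the adjusted value can decrease when a new predictor does not improve the fit enough to justify the extra parameter.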
Mean absolute percentage error (MAPE) gives an estimate of the average percentage error between the actual
and predicted values.
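A minimal sketch of the calculation is shown below: the mean of |actual - predicted| / |actual|, scaled to a percentage. The function name and sample values are made up, and the check for zero actual values is an added safeguard.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a percentage."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up values: per-sample errors of 10%, 10%, and 0%.
error = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
print(error)
```

scikit-learn also provides mean_absolute_percentage_error in sklearn.metrics, which returns the value as a fraction rather than a percentage.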