Simple Liner REgression
Simple Liner REgression
Simple linear regression aims to find a linear relationship (between two variables) to describe the
correlation between an independent and possibly dependent variable.
To do linear regression analysis, first, we need to add excel add-ins by following steps.
Linear regression is a statistical modeling method that examines the relationship between
a dependent variable and one or more independent variables. Establishing a linear
equation constructs a best-fit line to represent the data.
Linear regression is a commonly employed technique for predictive analysis and plays a
vital role in statistical modeling.
When performing linear regression analysis, the focus is on determining the equation of a
straight line that effectively represents the average relationship between variables in the
dataset.
Click on File – Options (This will open an Excel Options pop-up for you).
Click on Add-ins – Select “Excel Add-ins” from the Manage Drop Down in excel, then Click on
"Go".
It will open the "Add-ins" pop-up. Select Analysis ToolPak, then click "OK."
The "Data Analysis" add-in will appear under the "Insert" tab.
Suppose we have monthly sales and spent on marketing for last year. Now, we need to predict
future sales based on last year’s sales and marketing spending.
Click "Data Analysis" under the "Data" tab to open the "Data Analysis" pop-up for you.
Select "Regression" from the list and click "OK."
Checkmark the "Labels" box if you have selected headers in the data. Else, it will give you the
error.
Next, select "Output Range" if you want to get the value on the specific range on the worksheet.
Otherwise, select "New Worksheet Ply," which will add a new worksheet and give you the
result.
Then, check the "Residuals" box and click "OK."
It will add worksheets and give you the following result.
Summary Output
Multiple R: This represents the correlation coefficient. The value 1 shows a positive
relationship, and the value 0 shows no relationship.
R Square: R Square represents the coefficient of determination. It tells you the percentage of
points that fall on the regression line. 0.49 means that 49% of values fit the model
Adjusted R square: This is adjusted R square, which requires when you have more than one X
variable.
Standard Error: This represents an estimate of the standard deviation of error. It is the
precision of the regression coefficient that is measured.
Coefficient: The coefficient gives you the estimate of the least squares.
Lower 95% and Upper 95%: These are the lower boundary and the upper boundary for
the confidence interval.
Residuals Output: We have 12 observations based on the data. The second column represents
"Predicted" sales, and the third is "Residuals." Residuals are the difference in predicted sales
from the actual ones.
Example#2
Go to the chart group under the "Insert" tab. Next, select the "Scatter" chart icon.
It will insert the scatter plot in excel. See the image below.
Right-click on any point, then select Add Trendline in excel. It will add a Trendline to your
chart.
You can format the trendline by right-clicking anywhere on the trendline and then
selecting the format trendline.
You can make more improvements to the chart. i.e., formatting the trendline, color and
changing title, etc.
You can also show the formula on the graph by checking the formula on the chart and
displaying the R-squared value.
Some More Examples of Linear Regression Analysis:
Example #3
Suppose we have nine students with their IQ level and the number they scored on Test.
.Shyam 97 140
Kul 93 130
Kappu 91 125
Raju 89 115
Vishal 86 110
Vivek 82 100
Vinay 78 95
Kumar 75 90
Step 1: First, find out the dependent and independent variables. Here, the test score is the
dependent variable, and IQ is the independent variable, as the test score varies as IQ changes.
Step 2: Go to Data Tab – Click on Data Analysis – Select regression – click "OK."
You will get the summary output shown in the below Image.
Step 4: Analysing the regression by summary output.
Summary Output
Multiple R: Here, the correlation coefficient is 0.99, which is very near 1, which means the
linear relationship is very positive.
R Square: R-Square value is 0.983, which means that 98.3% of values fit the model.
P-value: Here, P-value is 1.86881E-07, which is very less than .1, which means IQ has
significant predictive values.
Example #4
We need to predict sales of AC based on the sales and temperature for a different month.
Jan 25 38893
Feb 28 42254
Mar 31 42845
Apr 33 47917
May 37 51243
Jun 40 69588
Jul 38 56570
Aug 37 50000
Follow the below steps to get the regression result.
Step 1: First, find out the dependent and independent variables. Sales are the dependent variable,
and temperature is an independent variable as sales vary as Temp changes.
Step 2: Go to the "Data" tab – Click on "Data Analysis" – Select "Regression," – click "OK."
Multiple R: Here, the correlation coefficient is 0.877, near 1, which means the linear
relationship is positive.
R Square: R-Square value is 0.770, which means that 77% of values fit the model.
P-Value: Here, P-value is 1.86881E-07, which is very less than .1, which means IQ has
significant predictive values.
Example #5
Step 1. First, find out the dependent and independent variables. Here, sales are the dependent
variable, as quantity and population. Both are independent variables as sales vary with the
country's quantity and population.
Step 2. Go to the "Data" tab – Click on "Data Analysis" – Select “Regression” – click "OK."
It will open the regression window for you.
Step 3. Input sales in the "Input Y Range" box and select quantity and population in the "Input X
Range" box. (Check on "Labels" if you have headers in your data range. Select output options,
then check on the desired residuals. Click "OK."
Run the regression using data analysis under the "Data" tab. It will give you the below result.
Summary Output
Multiple R: Here, the correlation coefficient is 0.93, which is very near 1, which means the
Linear relationship is very positive.
R Square: R-Square value is 0.866, which means that 86.7% of values fit the model.
Significance F: Significance F is less than .1, which means that the regression equation has a
significant predictive value.
P-Value: If you look at P-value for quantity and population, you can see that values are less than
.1, which means quantity and population have significant predictive values. The fewer P-values
mean that a variable has more significant predictive values.
However, quantity and population have significant predictive value. Still, If you look at P-value
for quantity and population, you can see that quantity has a lesser P-value in excel than
population. It means quantity has a more significant predictive value than population.
Things to Remember
Whenever one selects data, one must always check the dependent and independent
variables.
Linear regression analysis considers the relationship between the mean of the variables.
Sometimes, it is not the best fit for a real-world problem. For example: (age and wages).
Most of the time, wages increase as age increases. However, after retirement, age
increases but wages decrease.