Predictive Analytics - Business Predictions Using Multiple Linear Regression

Predictive analytics uses statistics and modeling to predict future outcomes. Multiple linear regression analyzes relationships between one dependent variable and several independent variables to predict values. This document outlines the assumptions and steps for conducting a multiple linear regression to predict student achievement from factors such as interest, anxiety, goals, and gender identity. Key outputs include goodness-of-fit tests, the predictive equation, and interpretations of the coefficients.

Uploaded by

Sakshi Garg

PREDICTIVE ANALYTICS - BUSINESS PREDICTIONS USING MULTIPLE LINEAR REGRESSION
What is Predictive Analytics?
• The term predictive analytics refers to the use of statistics and
modeling techniques to make predictions about future outcomes
and performance.
• Predictive analytics looks at current and historical data patterns to
determine if those patterns are likely to emerge again.
• This allows businesses and investors to adjust where they use their
resources to take advantage of possible future events.
• Predictive modeling is often used to clean and optimize the quality
of data used for such forecasts.
• Modeling allows more data to be ingested by the system,
including from customer-facing operations, which supports a more
accurate forecast.
What is Multiple Linear Regression?
• A multiple linear regression analysis is carried out to predict the values of a
dependent variable, Y, given a set of k predictor variables (X1, X2, …, Xk).
• Multiple linear regression is used to estimate the relationship between two or
more independent variables and one dependent variable.
• We also use it when we want to determine which variables are better predictors
than others (variable selection).
• The objective of multiple regression analysis is to use the independent variables
whose values are known to predict the value of the single dependent variable.
• For example, if you're doing a multiple regression to try to predict blood
pressure (the dependent variable) from independent variables such as height,
weight, age, and hours of exercise per week, you'd also want to include sex as
one of your independent variables.
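As a rough illustration of the idea above, the following Python sketch fits a multiple linear regression by ordinary least squares with NumPy. The data are synthetic and the coefficient values are invented for the example; they are not taken from the student data set used later in these slides.

```python
import numpy as np

# Synthetic data for illustration only; the "true" coefficients are invented.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                  # three predictors X1..X3
true_beta = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(scale=0.1, size=n)

# Add a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimise the sum of squared residuals.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

# Predict Y for a new observation whose predictor values are known.
x_new = np.array([0.2, -1.0, 0.7])
y_hat = intercept + x_new @ slopes
print(intercept, slopes, y_hat)
```

The estimated intercept and slopes should land close to the invented values because the noise is small; with real data there is no "true" vector to compare against.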
ASSUMPTION 1

• The dependent variable should be measured on a continuous scale (i.e., it is either an
interval or ratio variable).
• Examples of variables that meet this criterion include Achieve (measured from 0
to 100).
ASSUMPTION 2

• You have two or more independent variables, which can be either continuous (i.e., an interval
or ratio variable) or categorical (i.e., an ordinal or nominal variable).
• Examples of variables that meet this criterion include the continuous predictors Perfgoal,
Mastery, Interest and Anxiety, and the categorical predictor Genderid (dummy coded 0 or 1).
ASSUMPTION 3
• You should have independence of observations (i.e., independence of residuals), which you can
easily check using the Durbin-Watson statistic, a simple test to run using SPSS Statistics.
ASSUMPTION 4
• There needs to be a linear relationship between (a) the dependent variable and each of your
independent variables, and (b) the dependent variable and the independent variables collectively.
Whilst there are a number of ways to check for these linear relationships, we suggest creating
scatterplots and partial regression plots using SPSS Statistics, and then visually inspecting
these plots to check for linearity.
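For the independence check in Assumption 3, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it by hand on simulated residuals (the series are invented for illustration); values near 2 suggest little first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Simulated residuals, for illustration only.
rng = np.random.default_rng(1)
independent = rng.normal(size=500)

# A positively autocorrelated series: each value keeps 90% of the previous one.
correlated = np.empty(500)
correlated[0] = independent[0]
for t in range(1, 500):
    correlated[t] = 0.9 * correlated[t - 1] + independent[t]

dw_independent = durbin_watson(independent)
dw_correlated = durbin_watson(correlated)
print(dw_independent, dw_correlated)
```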
ASSUMPTION 5

• There should be no significant outliers, high leverage points or highly influential
points.
• Outliers, leverage and influential points are different terms used to
represent observations in your data set that are in some way unusual when you
wish to perform a multiple regression analysis.
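One common way to quantify these three ideas is with studentized residuals (outliers), the diagonal of the hat matrix (leverage) and Cook's distance (influence). The sketch below computes all three with NumPy on synthetic data in which one unusual point has been planted on purpose; the data and the planted values are assumptions made for the example.

```python
import numpy as np

# Synthetic data with one deliberately unusual observation at index 0.
rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 8.0, -10.0

X = np.column_stack([np.ones(n), x])
p = X.shape[1]                                   # number of fitted coefficients

# Hat matrix: fitted values are H @ y; its diagonal holds the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

resid = y - H @ y
s2 = resid @ resid / (n - p)                     # residual variance estimate

# Internally studentized residuals and Cook's distance.
studentized = resid / np.sqrt(s2 * (1.0 - leverage))
cooks_d = studentized ** 2 * leverage / (p * (1.0 - leverage))

print(np.argmax(leverage), np.argmax(cooks_d))
```

The planted point should stand out on both leverage and Cook's distance, which is the pattern to look for when screening real data.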
ASSUMPTION 6

• One of the assumptions of linear regression is that the residuals are
normally distributed.
• A histogram of the standardized residuals is used to check this. Note: the unstandardized and
standardized residuals have the same mean (of zero) and shape (in terms of skewness
and kurtosis); they differ only in their standard deviations. Here, the residuals
exhibit only a minor departure from normality.
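The note about unstandardized and standardized residuals can be checked numerically. In the sketch below, standardizing is taken to mean dividing the residuals by an estimate of their standard deviation (an assumption for this example), and the residuals themselves are simulated. Skewness and kurtosis are unchanged by the rescaling; only the standard deviation differs.

```python
import numpy as np

def skewness(v):
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

def excess_kurtosis(v):
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 4) - 3.0

# Simulated residuals, for illustration only.
rng = np.random.default_rng(3)
raw = rng.normal(scale=7.5, size=1000)
raw = raw - raw.mean()                 # residuals from a model with an intercept average zero

standardized = raw / raw.std(ddof=1)   # rescale to unit standard deviation

print(skewness(raw), skewness(standardized))
print(excess_kurtosis(raw), excess_kurtosis(standardized))
print(raw.std(ddof=1), standardized.std(ddof=1))
```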
SCENARIO
We will be carrying out a multiple regression with student (a) interest, (b)
anxiety, (c) mastery goals, (d) performance goals and (e) gender identification as
predictors (IVs) of student achievement. Interest, anxiety, mastery goals,
performance goals, and achievement are all assumed to be continuous variables.
Gender identification is a binary IV (which is permissible in OLS regression)
which has been dummy coded (0 = identified male, 1 = identified female).
Dummy coding is a form of coding of binary variables that facilitates
interpretation of the intercept when included in a regression model.
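To see why dummy coding makes the intercept easy to interpret, consider a regression with the dummy variable as the only predictor. The intercept is then the mean of the group coded 0 and the slope is the difference between the two group means. The scores below are invented for illustration and are not from the study data.

```python
import numpy as np

# Hypothetical scores: 0 = identified male, 1 = identified female.
genderid = np.array([0, 0, 0, 1, 1, 1])
achieve = np.array([2.0, 3.0, 4.0, 4.0, 5.0, 6.0])

design = np.column_stack([np.ones(len(genderid)), genderid])
(b0, b1), *_ = np.linalg.lstsq(design, achieve, rcond=None)

mean_group0 = achieve[genderid == 0].mean()
mean_group1 = achieve[genderid == 1].mean()

# b0 matches the mean of the group coded 0; b1 matches the gap between groups.
print(b0, mean_group0)
print(b1, mean_group1 - mean_group0)
```

With additional predictors in the model, the intercept is instead the predicted value for the group coded 0 when every other predictor equals zero.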
STEPS TO PERFORM MULTIPLE LINEAR REGRESSION

STEP 1:
To open your Excel file in SPSS:
1. Choose File > Open > Data from the SPSS menu.
2. Select the type of file you want to open: Excel (*.xls, *.xlsx, *.xlsm).
3. Select the file name.
4. Check 'Read variable names' if the first row of the
spreadsheet contains column headings.
5. Click Open.
STEP 2:
Analyze > Regression > Linear > Move Achieve to Dependent, and all
other variables (Perfgoal, Mastery, Interest, Anxiety, Genderid)
to Independent(s).
STEP 3:
Now we will fill out the sub-dialogs.
In Statistics, we will check the following:
• Estimates
• Confidence intervals, Level (95%)
• Model fit
• Descriptives
• Part and partial correlations
• Collinearity diagnostics
• Under Residuals: Casewise diagnostics, Outliers outside 3 standard deviations
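The Collinearity diagnostics option reports a tolerance and a variance inflation factor (VIF) for each predictor. As a sketch of what the VIF measures, the code below regresses each predictor on the others and computes 1 / (1 - R²). The three predictors are synthetic, and the third is built as a near copy of the first so that both show a large VIF.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        ss_res = np.sum((target - others @ beta) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors, for illustration only.
rng = np.random.default_rng(4)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)     # nearly collinear with a

vifs = vif(np.column_stack([a, b, c]))
tolerance = 1.0 / vifs                 # tolerance is the reciprocal of VIF
print(vifs, tolerance)
```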
STEP 4:
In Plots > put SRESID in Y > put ZPRED in X > under Standardized Residual Plots,
check Histogram and Normal probability plot > check Produce all partial plots.
STEP 5:
In Save Option
INTERPRETATION OF THE
DATA
• Descriptive analysis is the type of
data analysis that helps describe,
show or summarize data points in a
constructive way so that patterns
in the data can emerge.
• It gives you a summary of the
distribution of your data, helps you
detect typos and outliers, and
enables you to identify similarities
among variables, thus preparing you
for further statistical analyses.
INTERPRETATION OF THE DATA
• The first table of interest is the Model
Summary table. This table provides R,
R², adjusted R², and the standard error of
the estimate, which can be used to determine
how well a regression model fits the data.
• The "R" column represents the value of R,
the multiple correlation coefficient. R can be
considered one measure of the quality of the
prediction of the dependent variable.
• In this example, R = 0.642, which indicates
a good level of prediction.
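The quantities in the Model Summary table can all be derived from the observed and predicted values. The sketch below computes them on synthetic data (the sample size, predictors and coefficients are invented, not the study data): R is the correlation between observed and predicted values, R² is its square, and adjusted R² and the standard error of the estimate correct for the number of predictors.

```python
import numpy as np

# Synthetic data, for illustration only.
rng = np.random.default_rng(5)
n, k = 100, 5
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.4]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef
resid = y - y_hat

# R squared: share of the variance in y explained by the model.
r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
# Multiple R: correlation between observed and predicted values.
multiple_r = np.corrcoef(y, y_hat)[0, 1]
# Adjusted R squared penalises the number of predictors k.
adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)
# Standard error of the estimate: typical size of a residual.
std_error_estimate = np.sqrt((resid @ resid) / (n - k - 1))

print(multiple_r, r_squared, adj_r_squared, std_error_estimate)
```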
INTERPRETATION OF THE DATA
• The F-ratio in the ANOVA table tests whether the overall regression
model is a good fit for the data. The table shows that the independent variables
statistically significantly predict the dependent variable, F(5, 134) = 18.770, p
< .0005 (i.e., the regression model is a good fit of the data).
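The F-ratio is tied to R² through the degrees of freedom. As a quick consistency check, the sketch below recomputes F from the reported R² of .412, assuming five predictors (so 5 regression degrees of freedom) and 134 residual degrees of freedom. The result should be close to the reported 18.770; a small gap is expected because R² is rounded to three decimals.

```python
# Values reported in the slides; k = 5 assumes one degree of freedom per predictor.
r_squared = 0.412
k = 5
df_residual = 134

# F = (explained variance per predictor) / (unexplained variance per residual df)
f_ratio = (r_squared / k) / ((1.0 - r_squared) / df_residual)

# Implied sample size, since df_residual = n - k - 1.
n = df_residual + k + 1

print(round(f_ratio, 2), n)
```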
INTERPRETATION OF THE DATA
• The general form of the equation to predict Achieve from
Perfgoal, Mastery, Interest, Anxiety and Genderid is:
predicted Achieve = 2.357 – (0.010 x Perfgoal) – (0.325 x
Mastery) – (0.198 x Interest) – (0.023 x Anxiety) – (0.235 x
Genderid)
• This is obtained from the Coefficients table.
Unstandardized coefficients indicate how much
the dependent variable varies with an independent
variable when all other independent variables are held
constant.
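The equation can be applied directly to new cases. The sketch below encodes the coefficients exactly as printed in the slides and evaluates two hypothetical students who differ only in Genderid; the predictor values are invented for illustration. The gap between the two predictions equals the Genderid coefficient, which is what "holding all other variables constant" means in practice.

```python
# Coefficients as printed in the slides.
INTERCEPT = 2.357
COEFFICIENTS = {
    "Perfgoal": -0.010,
    "Mastery": -0.325,
    "Interest": -0.198,
    "Anxiety": -0.023,
    "Genderid": -0.235,
}

def predict_achieve(**predictors):
    """Apply the regression equation to one case given as keyword arguments."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in predictors.items())

# Hypothetical students with identical scores except for Genderid.
student_a = dict(Perfgoal=1.0, Mastery=1.0, Interest=1.0, Anxiety=1.0, Genderid=0)
student_b = dict(student_a, Genderid=1)

pred_a = predict_achieve(**student_a)
pred_b = predict_achieve(**student_b)
print(pred_a, pred_b, pred_b - pred_a)
```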
CONCLUSION
• A multiple regression was run to predict Achieve from Perfgoal,
Mastery, Interest, Anxiety and Genderid.
• These variables statistically significantly predicted Achieve,
F(5, 134) = 18.770, p < .0005 (i.e., the regression model is a
good fit of the data).

• R² = .412. The predictors added statistically significantly to
the prediction, p < .05.
