
Unit 3

Contents

Unit – III: Regression

Concepts, BLUE property assumptions, Linear regression, Logistic regression, Least Square Estimation, Variable
Rationalization, Model Building, etc.

Logistic Regression: Binary, Multinomial regression, Model Theory, Model Fit Statistics, Maximum Likelihood
Estimation (MLE), Model Construction, Analytics applications to various Business Domains: Finance, Marketing,
Credit Card companies
Regression Analysis

• Regression analysis is a statistical method for modeling the relationship between a
dependent (target) variable and one or more independent (predictor) variables.
• More specifically, regression analysis helps us understand how the value of the
dependent variable changes with respect to one independent variable while the other
independent variables are held fixed.
• It predicts continuous/real values such as temperature, age, salary, price, etc.

Example: Suppose there is a marketing company A, which runs various advertisements
every year and earns sales from them. The list below shows the company's advertising spend
over the last 5 years and the corresponding sales:

Now the company wants to spend $200 on advertising in 2023 and wants to predict the sales
for that year. To solve such prediction problems in machine learning, we need regression
analysis.
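A minimal sketch of this prediction task in Python; the spend/sales figures below are made up for illustration, since the original table is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical advertising spend ($) and sales ($) for 5 years;
# the actual figures from the slide's table would go here.
ad_spend = np.array([[90], [120], [150], [100], [130]])
sales = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(ad_spend, sales)

# Predict sales for a planned $200 advertising spend.
predicted = model.predict([[200]])
print(f"Predicted sales for $200 spend: {predicted[0]:.0f}")
```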
• Regression is a supervised learning technique that helps in finding the correlation between
variables and enables us to predict a continuous output variable based on one or more
predictor variables. It is mainly used for prediction, forecasting, time series modeling, and
determining cause-and-effect relationships between variables.
• In regression, we fit a line or curve to the given data points; using this fit, the machine
learning model can make predictions about the data. In simple words, "Regression shows a
line or curve through the data points on the target-predictor graph in such a way that the
vertical distance between the data points and the regression line is minimum." The distance
between the data points and the line tells whether the model has captured a strong
relationship or not.
Some examples of regression are:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.

Terminologies
• Dependent Variable: The main factor in Regression analysis that we want to predict or
understand is called the dependent variable. It is also called the target variable.
• Independent Variable: The factors that affect the dependent variable, or that are used to
predict its values, are called independent variables, also called predictors.
• Outliers: An outlier is an observation with either a very low or a very high value in
comparison to the other observed values. Outliers may distort the results, so they should be
handled carefully.
• Multicollinearity: If the independent variables are highly correlated with each other, the
condition is called multicollinearity. It should not be present in the dataset, because it
creates problems when ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well on the training dataset but not
on the test dataset, the problem is called overfitting. If our algorithm does not perform
well even on the training dataset, the problem is called underfitting.
Why do we use regression analysis?
• Regression analysis helps in the prediction of a continuous variable. There are various
real-world scenarios where we need future predictions, such as weather conditions,
sales, and marketing trends; for such cases we need a technique that can make
predictions accurately.
• Regression analysis is such a technique: a statistical method used in machine learning
and data science.
Below are some other reasons for using Regression analysis:
• Regression estimates the relationship between the target and the independent variable.
• It is used to find the trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can determine the most important factors, the least
important factors, and how the factors influence each other.
• There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core,
all regression methods analyze the effect of the independent variables on the
dependent variable.

Linear Regression
• Linear regression is a statistical regression method that is used for predictive
analysis.
• It is one of the simplest and easiest regression algorithms, showing the relationship
between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent variable
(X-axis) and the dependent variable (Y-axis), hence called linear regression.
• If there is only one input variable (x), then such linear regression is called simple
linear regression. If there is more than one input variable, then such linear
regression is called multiple linear regression.
• The relationship between variables in a linear regression model can be illustrated
with an example of predicting the salary of an employee based on years of
experience.
Below is the mathematical equation for linear regression:

Y = aX + b

Here,
Y = dependent variable (target variable), X = independent variable (predictor variable),
a and b are the linear coefficients (slope and intercept).

Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic
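A minimal sketch of fitting Y = aX + b with NumPy's least-squares polyfit; the experience/salary numbers are made up for illustration:

```python
import numpy as np

# Made-up illustrative data: years of experience vs. salary (in $1000s).
X = np.array([1, 2, 3, 4, 5, 6])
Y = np.array([35, 40, 48, 52, 61, 65])

# Fit Y = aX + b; polyfit with degree 1 returns [a, b].
a, b = np.polyfit(X, Y, 1)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}")

# Predict the salary for 7 years of experience.
print(f"prediction for X=7: {a * 7 + b:.1f}")
```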

Logistic Regression

• Logistic regression is another supervised learning algorithm, used to solve
classification problems. In classification problems, the dependent variable is in a
binary or discrete format, such as 0 or 1.
• The logistic regression algorithm works with categorical target variables such as 0 or 1,
Yes or No, True or False, Spam or Not Spam, etc.
• It is a predictive analysis algorithm that works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression
algorithm in how it is used.
• Logistic regression uses a sigmoid function (logistic function), a non-linear function that
maps predictions to probabilities. This sigmoid function is used to model the data in
logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

where:
f(x) = output between 0 and 1
x = input to the function
e = base of the natural logarithm

When we provide the input values (data) to the function, it gives an S-shaped curve.

Logistic regression uses the concept of a threshold: values above the threshold are
rounded up to 1, and values below the threshold are rounded down to 0.

There are three types of logistic regression:
• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
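A minimal Python sketch of the sigmoid and the 0.5 threshold described above:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(scores)

# Threshold at 0.5: probabilities at or above become class 1, below become class 0.
labels = (probs >= 0.5).astype(int)
print(probs)   # approximately [0.047 0.378 0.5 0.622 0.953]
print(labels)  # [0 0 1 1 1]
```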

Linear Regression

• Linear regression is a statistical method for modeling the relationship between a
dependent variable and a given set of independent variables.

Note: for simplicity, we refer to dependent variables as responses and independent
variables as features.

Simple linear regression

• Simple linear regression is an approach for predicting a response using a single feature.
• It is assumed that the two variables are linearly related. Hence, we try to find a linear
function that predicts the response value (y) as accurately as possible as a function of the
feature or independent variable (x).
Let us consider a dataset where we have a value of the response y for every feature x:

• For generality, we define:
x as the feature vector, i.e., x = [x_1, x_2, ..., x_n],
y as the response vector, i.e., y = [y_1, y_2, ..., y_n],
for n observations (in the above example, n = 10).
A scatter plot can then be drawn with the feature values on the x-axis and the response
values on the y-axis.

• Now, the task is to find the line that best fits the above scatter plot so that we can
predict the response for any new feature value (i.e., a value of x not present in the
dataset).
This line is called a regression line.
The equation of the regression line can be written as:

y = b_0 + b_1 x

where b_0 is the intercept and b_1 is the slope of the regression line.
For example, suppose we have the regression equation y = 2 + 0.5x. For every 1-unit
increase in the independent variable (x), there is a 0.5-unit increase in the dependent
variable (y).
Calculating How Well The Regression Line Fits

• To determine how well our regression line fits the data, we want to calculate the
correlation coefficient, commonly referred to just as R, and the coefficient of
determination, otherwise known as R² (R squared).

• Coefficient of Determination (R²) — the proportion of variance in y explained by the
independent variable (x), with values between 0 and 1. It cannot be negative because it is
a squared value. For example, if R² = 0.81, then x explains 81% of the variance in y.
R² is otherwise known as the "goodness of fit".

• Correlation Coefficient (R) — The degree of relationship or correlation between two
variables (x and y in this case). R can range from -1 to 1 with values equal to 1 meaning a
perfect positive correlation and values equal to -1 meaning a perfect negative
correlation.
• Below is the formula for Pearson's correlation coefficient:

r = Σ(x_i − x̅)(y_i − ȳ) / √( Σ(x_i − x̅)² · Σ(y_i − ȳ)² )
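A minimal sketch computing R and R² directly from the formula above, on made-up data:

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

dx, dy = x - x.mean(), y - y.mean()

# Pearson's correlation coefficient R.
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# For simple linear regression, R squared is simply r**2.
print(f"R  = {r:.3f}")
print(f"R2 = {r**2:.3f}")
```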
Least Square Method

The least squares estimates of the slope and intercept are:

b_1 = Σ(x_i − x̅)(y_i − ȳ) / Σ(x_i − x̅)²
b_0 = ȳ − b_1 · x̅

Here x̅ is the mean of all the values in the input X and ȳ is the mean of all the
values in the desired output Y. This is the Least Squares method.
Summary

• The least-squares method is used to predict the behavior of the dependent variable with
respect to the independent variable.
• The sum of the squares of the errors is called the residual sum of squares (SSE).
• The main aim of the least-squares method is to minimize this sum of squared errors.
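A minimal from-scratch sketch of these least squares formulas (the data is illustrative):

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared errors."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9])

b0, b1 = least_squares_fit(x, y)
print(f"y = {b0:.2f} + {b1:.2f}x")

# Sum of squared errors (SSE) for the fitted line.
sse = np.sum((y - (b0 + b1 * x)) ** 2)
print(f"SSE = {sse:.3f}")
```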
Model Building Life Cycle in Data Analytics:

• When we come across a business analytics problem, it is tempting to proceed straight to
execution without acknowledging the stumbling blocks, and to implement and predict
outcomes before realizing what can go wrong.
• A disciplined alternative is to follow the problem-solving steps of the data science
model-building life cycle. Let's understand each model-building step in depth. The
following are the steps to follow to build a data model:
1. Problem Definition

• The first step in constructing a model is to understand the industrial problem in a more
comprehensive way. To identify the purpose of the problem and the prediction target, we
must define the project objectives appropriately.
• Therefore, to proceed with an analytical approach, we have to recognize the obstacles
first. Remember, excellent results depend on a thorough understanding of the
problem.
2. Hypothesis Generation

• Hypothesis generation is an educated-guessing approach through which we derive essential
data parameters that have a significant correlation with the prediction target.
• Your hypothesis research must be in-depth, taking the perspectives of all
stakeholders into account. We search for every suitable factor that could influence the
outcome.
• Hypothesis generation focuses on what you could create rather than only on what is
available in the dataset.
3. Data Collection

• Data collection means gathering data from relevant sources for the analytical
problem; we then extract meaningful insights from the data for prediction.
The data gathered must have:

• Proficiency in answering hypothesis questions.


• Capacity to elaborate on every data parameter.
• Effectiveness to justify your research.
• Competency to predict outcomes accurately.
4. Data Exploration/Transformation

• The data you collect may come in unfamiliar shapes and sizes. It may contain unnecessary
features, null values, or unexpectedly small or large values. So, before applying
any algorithmic model to the data, we have to explore it first.
• By inspecting the data, we get to understand its explicit and hidden trends. We
find the relationships between the data features and the target variable.
• Usually, a data scientist invests 60-70% of project time in data exploration
alone.
• There are several sub-steps involved in data exploration:
o Feature Identification:

• You need to analyze which data features are available and which ones are not.
• Identify independent and target variables.
• Identify data types and categories of these variables.
o Univariate Analysis:

• We inspect each variable one by one. This kind of analysis depends on whether the
variable type is categorical or continuous.
• Continuous variable: We mainly look for statistics such as the mean, median, standard
deviation, and skewness of the variable.
• Categorical variable: We use a frequency table to understand the spread of data across
categories. We can measure the counts and frequency of occurrence of each value.
o Multi-variate Analysis:

• Bi-variate and multi-variate analysis help to discover the relationships between two or
more variables.
• We can compute the correlation for continuous variables; for categorical variables, we
look for association and dissociation between them.
o Filling Null Values:

• Usually, a dataset contains null values, which lower the potential of the
model.
• For a continuous variable, we fill these null values using the mean or median of
that specific column.
• For the null values present in a categorical column, we replace them with the
most frequently occurring category.
• Remember, don't simply delete those rows, because you may lose information.
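A minimal pandas sketch of this imputation strategy; the column names and values below are hypothetical:

```python
import pandas as pd
import numpy as np

# Made-up dataset with nulls in a continuous and a categorical column.
df = pd.DataFrame({
    "income": [52000, np.nan, 61000, 58000, np.nan],
    "city": ["Hyderabad", "Delhi", None, "Hyderabad", "Hyderabad"],
})

# Continuous column: fill nulls with the column median.
df["income"] = df["income"].fillna(df["income"].median())

# Categorical column: fill nulls with the most frequent category (mode).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```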
5. Predictive Modeling

• Predictive modeling is a mathematical approach to creating a statistical model that
forecasts future behavior from input data. Steps involved in predictive modeling:

• Algorithm Selection:
o When we have a structured dataset and want to estimate a continuous or
categorical outcome, we use supervised machine learning methodologies like
regression and classification techniques. When we have unlabeled data and want to
predict the cluster to which a particular input sample belongs, we use
unsupervised algorithms. In practice, a data scientist applies multiple algorithms to get a
more accurate model.
Train Model:
After choosing the algorithm and getting the data ready, we train our model on
the input data using the preferred algorithm. Training determines the
correspondence between the independent variables and the prediction targets.

Model Prediction
• We make predictions by giving the input test data to the trained model. We
measure performance on the test data using strategies such as cross-validation
or the ROC curve.
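A minimal scikit-learn sketch of the train/predict/evaluate loop described above; a synthetic dataset stands in for real business data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary-classification data standing in for real business data.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Held-out test accuracy and 5-fold cross-validation on the training set.
print("test accuracy:", model.score(X_test, y_test))
print("5-fold CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
```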
6. Model Deployment

• There is nothing better than deploying the model in a real-time environment. It helps
us gain analytical insight into the decision-making procedure. The model constantly
needs to be updated with additional features for customer satisfaction.
• To predict business decisions, plan market strategies, and personalize customer
experiences, we integrate the machine learning model into the existing
production environment.
• When you go through the Amazon website, you notice product recommendations based
entirely on your interests. You can see the increase in customer engagement from
these services. That's how a deployed model changes the mindset of the customer and
convinces them to purchase the product.
BLUE Property Assumptions
In data analytics, the term "BLUE" stands for "Best Linear Unbiased Estimator." BLUE properties
refer to the desirable characteristics of an estimator that make it optimal for estimating parameters
in a linear regression model. These properties are fundamental for statistical inference and
regression analysis. Here's a breakdown of what each component means:
• Best: The estimator is considered "best" because it has the smallest variance among all unbiased
estimators. In other words, it minimizes the spread or uncertainty of the estimated parameter
values.
• Linear: The estimator is a linear function of the observed data. This means that it can be
expressed as a linear combination of the independent variables.

• Unbiased: The estimator is unbiased if, on average, it produces estimates that are equal to the
true parameter values. In other words, there is no systematic overestimation or underestimation.

• Estimator: An estimator is a statistical method or procedure used to estimate a population
parameter (e.g., the slope or intercept in linear regression) based on sample data.
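A small simulation sketch of the "unbiased" property: averaged over many resamples, the OLS slope estimate stays close to the true slope. The true parameters and noise level below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_a, true_b = 0.5, 2.0   # assumed true slope and intercept
slopes = []

# Repeatedly sample noisy data and fit OLS; collect the slope estimates.
for _ in range(2000):
    x = rng.uniform(0, 10, size=50)
    y = true_b + true_a * x + rng.normal(0, 1, size=50)
    a_hat, _ = np.polyfit(x, y, 1)
    slopes.append(a_hat)

# Unbiasedness: the mean estimate is close to the true slope.
print(f"mean slope estimate: {np.mean(slopes):.3f} (true: {true_a})")
```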
Variable Rationalization
• "Variable rationalization" in data analytics typically refers to the process of evaluating and justifying
the inclusion or exclusion of variables in a statistical model. It involves assessing the relevance,
importance, and impact of different variables on the outcome of the analysis. Here's a more detailed
explanation of variable rationalization:
1. Variable Selection: In many datasets, there can be numerous variables available for analysis. Variable
rationalization involves selecting the subset of variables that are most relevant to the analysis at hand.
This can be based on domain knowledge, statistical techniques such as feature selection algorithms,
or a combination of both.
2. Feature Engineering: Sometimes, variable rationalization involves creating new variables through
feature engineering. This could include transforming existing variables, combining them to create new
features, or deriving additional variables that may capture important information.
3. Multicollinearity Assessment: Multicollinearity occurs when two or more predictor variables in a
regression model are highly correlated. Variable rationalization includes assessing multicollinearity to
identify and potentially remove redundant variables to avoid issues with interpretation and
estimation in regression analysis.
4. Validation and Iteration: Variable rationalization is often an iterative process that involves validating
the chosen variables and model through techniques such as cross-validation or out-of-sample testing.
This helps ensure that the selected variables are robust and generalize well to unseen data.
Variable Rationalization:
• A data set may have a large number of attributes, but some of those attributes can be
irrelevant or redundant. The goal of variable rationalization is to improve data processing
by optimal attribute subset selection.
• The aim is to find a minimum set of attributes such that dropping the irrelevant
attributes does not greatly affect the utility of the data, while the cost of data analysis is
reduced.
• Mining on a reduced data set also makes the discovered patterns easier to understand. As
part of data processing, we use the below methods of attribute subset selection:

1. Stepwise Forward Selection
2. Stepwise Backward Elimination
3. Combination of Forward Selection and Backward Elimination
4. Decision Tree Induction
• Stepwise Forward Selection: This procedure starts with an empty set of attributes as the
minimal set. The most relevant attributes are chosen (having minimum p-value) and are
added to the minimal set. In each iteration, one attribute is added to a reduced set.
• Stepwise Backward Elimination: Here all the attributes are considered in the initial set of
attributes. In each iteration, one attribute is eliminated from the set of attributes whose
p-value is higher than the significance level.
• Combination of Forward Selection and Backward Elimination: The stepwise forward
selection and backward elimination are combined to select the relevant attributes most
efficiently. This is the most common technique which is generally used for attribute
selection.
• Decision Tree Induction: This approach uses a decision tree for attribute selection. It
constructs a flowchart-like structure with nodes denoting tests on attributes. Each
branch corresponds to the outcome of a test, and leaf nodes denote a class prediction.
Attributes that are not part of the tree are considered irrelevant and hence discarded.
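A minimal sketch of stepwise forward selection by p-value, assuming a pandas DataFrame df of candidate features and a target y; the helper name forward_select is ours, not from the slides:

```python
import statsmodels.api as sm

def forward_select(df, y, alpha=0.05):
    """Greedily add the candidate feature with the smallest p-value below alpha."""
    selected, remaining = [], list(df.columns)
    while remaining:
        # p-value of each candidate when added to the current model.
        pvals = {}
        for col in remaining:
            X = sm.add_constant(df[selected + [col]])
            pvals[col] = sm.OLS(y, X).fit().pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break  # no remaining attribute is significant
        selected.append(best)
        remaining.remove(best)
    return selected
```

Backward elimination is the mirror image: start from all attributes and repeatedly drop the one whose p-value exceeds the significance level.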
Logistic regression

• Logistic regression is the appropriate regression analysis to conduct when the dependent
variable is dichotomous (binary). Like all regression analyses, logistic regression is a
predictive analysis. It is used to describe data and to explain the relationship between
one dependent binary variable and one or more nominal, ordinal, interval or ratio-level
independent variables.

• Logistic regression is another statistical analysis method borrowed from machine
learning. It is used when our dependent variable is dichotomous or binary, meaning
a variable that has only 2 outputs.
• For example, whether a person will survive an accident or not, or whether a student will
pass an exam or not: the outcome can be either yes or no (2 outputs). This regression
technique is similar to linear regression and can be used to predict probabilities for
classification problems.
Logistic regression
Types of Logistic Regression

• Here are the three main types of logistic regression:


Binary logistic regression
• Binary logistic regression is used to predict the probability of a binary outcome, such as
yes or no, true or false, or 0 or 1. For example, it could be used to predict whether a
patient has a disease or not, or whether a loan will be repaid or not.
Multinomial logistic regression
• Multinomial logistic regression is used to predict the probability of one of three or more
possible outcomes, such as the type of product a customer will buy, the rating a customer
will give a product, or the political party a person will vote for.
Ordinal logistic regression
• Ordinal logistic regression is used to predict the probability of an outcome that falls into
a predetermined order, such as the level of customer satisfaction, the severity of a
disease, or the stage of a cancer.
How does Logistic Regression work?

Logistic regression works in the following steps:

1. Prepare the data: The data should be in a format where each row represents a single
observation and each column represents a different variable. The target variable (the
variable you want to predict) should be binary (yes/no, true/false, 0/1).
2. Train the model: We teach the model by showing it the training data. This involves
finding the values of the model parameters that minimize the error on the training data.
3. Evaluate the model: The model is evaluated on held-out test data to assess its
performance on unseen data.
4. Use the model to make predictions: After the model has been trained and assessed, it
can be used to forecast outcomes on new data.
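A minimal sketch of these four steps with scikit-learn; the synthetic data below stands in for a real binary-outcome dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Prepare: rows = observations, columns = variables, binary target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 2. Train.
clf = LogisticRegression().fit(X_train, y_train)

# 3. Evaluate on held-out data.
print("test accuracy:", clf.score(X_test, y_test))

# 4. Predict probabilities and classes for new data.
new_obs = [[0.2, -1.0, 0.5]]
print("P(y=1):", clf.predict_proba(new_obs)[0, 1])
print("class:", clf.predict(new_obs)[0])
```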
Logistic Regression: Model Theory

• Logistic regression is a technique used when the dependent variable is categorical (or
nominal). Examples: 1) Consumers make a decision to buy or not to buy, 2) a product may
pass or fail quality control, 3) there are good or poor credit risks, and 4) an employee may be
promoted or not.
• Binary logistic regression - determines the impact of multiple independent variables
presented simultaneously to predict membership of one or other of the two dependent
variable categories.
• Since the dependent variable is dichotomous, we cannot predict a numerical value for it
using logistic regression, so the usual least squares criterion of minimizing squared
deviations around a line of best fit is inappropriate (it is impossible to calculate such
deviations meaningfully for binary variables!).
• Instead, logistic regression employs binomial probability theory, in which there are only
two values to predict: the probability (p) that the outcome is 1 rather than 0, i.e., that
the event/person belongs to one group rather than the other.
• Logistic regression forms a best fitting equation or function using the maximum
likelihood (ML) method, which maximizes the probability of classifying the observed data
into the appropriate category given the regression coefficients.
• Like multiple regression, logistic regression provides a coefficient ‘b’, which measures
each independent variable’s partial contribution to variations in the dependent variable.
• The goal is to correctly predict the category of outcome for individual cases using the
most parsimonious model.
• To accomplish this goal, a model (i.e. an equation) is created that includes all predictor
variables that are useful in predicting the response variable.
The Purpose of Binary Logistic Regression

1. The logistic regression predicts group membership
• Since logistic regression calculates the probability of success over the probability of
failure, the results of the analysis are in the form of an odds ratio.
• Logistic regression determines the impact of multiple independent variables presented
simultaneously to predict membership of one or other of the two dependent variable
categories.
2. The logistic regression also provides the relationships and strengths among the variables
Assumptions of (Binary) Logistic Regression
• Logistic regression does not assume a linear relationship between the dependent and independent variables.
• Logistic regression assumes linearity of independent variables and log odds of dependent variables.
• The independent variables need not be interval, nor normally distributed, nor linearly related, nor of equal
variance within each group
• Homoscedasticity is not required. The error terms (residuals) do not need to be normally distributed.
• The dependent variable in logistic regression is not measured on an interval or ratio scale.
• The dependent variable must be dichotomous ( 2 categories) for the binary logistic regression.
• The categories (groups) as a dependent variable must be mutually exclusive and exhaustive; a case can only be
in one group and every case must be a member of one of the groups.
• Larger samples are needed than for linear regression because maximum likelihood coefficient estimates are
large-sample estimates. A minimum of 50 cases per predictor is recommended (Field, 2013).
• Hosmer, Lemeshow, and Sturdivant (2013) suggest a minimum sample of 10 observations per independent
variable in the model but caution that 20 observations per variable should be sought if possible.
• Leblanc and Fitzgerald (2000) suggest a minimum of 30 observations per independent variable.
Log Transformation
Equation
• Logistic regression models the log odds (logit) of the outcome as a linear function of the
predictors:

log( p / (1 − p) ) = b_0 + b_1x_1 + ... + b_kx_k

Logistic Function (Sigmoid Function):

• The sigmoid function is a mathematical function that maps the predicted values to
probabilities.
• The sigmoid function maps any real value into another value within the range of 0 to 1,
forming an S-shaped curve.
• The output of the logistic regression must be between 0 and 1, and cannot go beyond
this limit, so it forms a curve like the "S" form.
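A tiny sketch showing that the logit and the sigmoid are inverses, which is why the linear model lives on the log-odds scale; the score value 1.2 is arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

# A linear score of 1.2 on the log-odds scale ...
z = 1.2
p = sigmoid(z)
print(f"probability: {p:.3f}")           # ~0.769
print(f"log-odds back: {logit(p):.3f}")  # 1.2
```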
Hypothesis Test

• In logistic regression, two hypotheses are of interest:
• the null hypothesis, which is that all the coefficients in the regression equation take the
value zero, and
• the alternate hypothesis, which is that the model under consideration is accurate and
differs significantly from the null, i.e., it gives a significantly better than chance or
random prediction level compared to the null hypothesis.
• The null hypothesis in logistic regression states that there is no relationship between the
independent variables and the outcome variable. In other words, it suggests that the
coefficients of the independent variables in the regression equation are all equal to zero.
This implies that the independent variables do not have any effect on predicting the
outcome variable, and any observed relationship is due to random chance.
• On the other hand, the alternate hypothesis in logistic regression asserts that the model
being evaluated is accurate and significantly different from the null hypothesis. This
means that the independent variables included in the model are meaningful predictors of
the outcome variable, and the model provides a better fit to the data than would be
expected by chance alone. In essence, the alternate hypothesis suggests that the
regression model has predictive power beyond random chance and is capable of making
meaningful predictions about the outcome variable based on the values of the
independent variables.
Model Statistics

Likelihood Ratio Test: This test compares how well our full model with all predictors fits the
data compared to a simpler model with fewer predictors. It helps us see if adding more
predictors significantly improves our model's fit.

Example: Let's say we have a logistic regression model predicting whether customers will
purchase a product based on their age, income, and education level. We compare this full
model to a simpler model that only includes age and income. If the likelihood ratio test
shows a significant improvement in fit for the full model over the reduced model, it
suggests that including education level improves our ability to predict purchases.
Model Statistics

Deviance: Deviance tells us how far our model's predictions are from a perfect fit. A lower
deviance means our model fits the data better.
Example: Suppose we have a logistic regression model predicting whether patients will
develop a certain disease based on their health indicators. Lower deviance indicates that
our model's predicted probabilities are closer to the actual outcomes. For instance, a
deviance value of 1000 suggests better fit compared to a deviance value of 1500.
Model Statistics

AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion): These are
measures that balance how well our model fits the data with how complex it is. Smaller AIC
and BIC values suggest better models.
Example: Continuing with the disease prediction example, if we have two competing
logistic regression models, one with five predictors and another with ten predictors, we can
compare their AIC and BIC values. If the model with five predictors has lower AIC and BIC
values compared to the ten-predictor model, it suggests that the simpler model is
preferable as it balances goodness of fit with model complexity.
Model Statistics

Pseudo R-squared: Instead of using the traditional R-squared from linear regression,
logistic regression uses pseudo R-squared values like McFadden's R-squared or
Nagelkerke's R-squared. These values help us understand how much of the variability in the
data our model explains.

Example: In a study investigating factors influencing student performance, a logistic
regression model predicts whether students will pass an exam based on variables like study
hours, attendance, and prior GPA. McFadden's R-squared might indicate that our model
explains 30% of the variation in exam pass rates. This helps us gauge how well our model
captures the relationship between predictors and the outcome compared to a null model.
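A minimal statsmodels sketch printing these fit statistics for a logistic model; the study-hours/GPA data is synthetic, echoing the example above:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic pass/fail data driven by study hours and prior GPA.
rng = np.random.default_rng(2)
hours = rng.uniform(0, 10, 200)
gpa = rng.uniform(2, 4, 200)
p = 1 / (1 + np.exp(-(-4 + 0.4 * hours + 0.8 * gpa)))
passed = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([hours, gpa]))
result = sm.Logit(passed, X).fit(disp=0)

print("log-likelihood:", result.llf)
print("deviance (-2LL):", -2 * result.llf)
print("AIC:", result.aic, "BIC:", result.bic)
print("McFadden pseudo R2:", result.prsquared)
```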
Model Construction:

1. Data Collection and Preparation: Gather and preprocess your data, ensuring that
independent and dependent variables are correctly formatted.
2. Model Specification: Choose the appropriate independent variables based on domain
knowledge and exploratory data analysis.
3. Model Estimation: Use an algorithm (often maximum likelihood estimation) to estimate
the coefficients of the logistic regression model.
4. Model Evaluation: Evaluate the model using fit statistics, cross-validation, or other
techniques to assess its performance and generalization ability.
5. Model Interpretation: Interpret the coefficients of the model to understand the
relationship between the independent variables and the log odds of the outcome.
6. Model Deployment: Deploy the model for making predictions on new data or for use in
decision-making processes.
Analytics applications to various Business Domains:

• Applications of data modeling to business problems can be termed business analytics.
• Business analytics involves collecting, sorting, processing, and studying
business-related data using statistical models and iterative methodologies. The
goal of BA is to narrow down which datasets are useful and how they can increase revenue,
productivity, and efficiency.
• Business analytics (BA) is the combination of skills, technologies, and practices
used to examine an organization's data and performance as a way to gain insights
and make data-driven decisions in the future using statistical analysis.
Although business analytics is being leveraged in most commercial sectors and industries,
the following applications are the most common.

1. Credit Card Companies
Credit and debit cards are an everyday part of consumer spending, and they are an ideal
way of gathering information about a purchaser's spending habits, financial situation,
behavior trends, demographics, and lifestyle preferences.

2. Customer Relationship Management (CRM)
Excellent customer relations are critical for any company that wants to retain
customer loyalty and stay in business for the long haul. CRM systems analyze
important performance indicators such as demographics, buying patterns, socio-
economic information, and lifestyle.
3. Finance
The financial world is a volatile place, and business analytics helps to extract
insights that help organizations maneuver their way through tricky terrain.
Corporations turn to business analysts to optimize budgeting, banking, financial
planning, forecasting, and portfolio management.
4. Human Resources
Business analysts help the process by poring over data that characterizes high-
performing candidates, such as educational background, attrition rate, and average
length of employment. By working with this information, business analysts help
HR by forecasting the best fits between the company and candidates.
5. Manufacturing:
Business analysts work with data to help stakeholders understand the things that
affect operations and the bottom line. Identifying things like equipment downtime,
inventory levels, and maintenance costs helps companies streamline inventory
management, risks, and supply-chain management to create maximum efficiency.
6. Marketing
Business analysts help answer key marketing questions by measuring
marketing and advertising metrics, identifying consumer behavior and the target
audience, and analyzing market trends.
Multinomial Logistic Regression

• A multinomial logistic regression (or multinomial regression for short) is used when the
outcome variable being predicted is nominal and has more than two categories that do
not have a given rank or order.
• This model can be used with any number of independent variables that are categorical or
continuous.
What is MNLR?

• Multinomial logistic regression is a classification algorithm, just like
logistic regression for binary classification. In logistic regression for
binary classification, the task is to predict a target class of
binary type, like Yes/No, 0/1, Male/Female.

• In multinomial logistic regression, the idea is to use the logistic
regression technique to predict the target class when there are more than 2 target classes.
• The underlying technique is the same as logistic regression for binary classification up to
the point of calculating the probabilities for each target. Once the probabilities are
calculated, we need to transform the targets into a one-hot encoding and use the
cross-entropy method during training to learn the proper weights.
Example

• Using multinomial logistic regression, we can address different types of
classification problems where the trained model is used to predict the target
class from more than 2 target classes. Below are a few examples of the kinds of
problems we can solve using multinomial logistic regression.
• Predicting the Iris flower species type
• Targets: different species
• Predicting the animal category using the given animal features
• Targets: Dog, Cat, Tiger, Lion
Assumptions

1. Independence of observations
2. Categories of the outcome variable must be mutually exclusive and exhaustive
3. No multicollinearity between independent variables
4. Linear relationship between continuous variables and the logit transformation of
the outcome variable
5. No outliers or highly influential points
1.Mutually Exclusive: The categories or groups into which the variable is divided
should not overlap. In other words, each observation should only fall into one
and only one category. There should be no ambiguity or possibility of an
observation belonging to multiple categories simultaneously.
For example, if you're categorizing people by their education level into "High School
Graduate," "College Graduate," and "Postgraduate," these categories should be
mutually exclusive. A person cannot be both a "College Graduate" and a
"Postgraduate" simultaneously; they should fit into only one category.
2. Exhaustive: This implies that all possible outcomes or scenarios should be
covered by the categories defined within the variable. No residual category should
be needed to account for observations that do not fit into any defined categories.
For instance, if you're categorizing people by their employment status into
"Employed," "Unemployed," and "Student," these categories should be exhaustive.
Every person's employment status should fit into one of these categories, leaving
no one unaccounted for. There should not be any other potential employment
status that is not covered by these categories.
Multinomial Logistic Regression Workflow/ Stages:

• Inputs
• Linear model
• Logits
• Softmax Function
• Cross Entropy
• One-Hot-Encoding
Inputs

• The inputs to the multinomial logistic regression are the features we have in the
dataset. Suppose we are going to predict the Iris flower species type; then the flower's
sepal length and width and petal length and width will be our features. These features
are treated as the inputs to the multinomial logistic regression.
• The key point to remember here is that feature values must always be numerical. If the
features are not numerical, we need to convert them into numerical values using proper
categorical data-handling techniques, as sketched below.
• A simple example: if the feature is color, with attribute values RED, BLUE, YELLOW, and
ORANGE, then we can assign an integer value to each attribute value: for RED we can
assign 1, for BLUE we can assign the value 2, and likewise for the other attribute values
of the color feature. Later we can use the numerically converted values as the inputs
for the classifier.
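A minimal sketch of that integer encoding with pandas; the color column is the slide's own example:

```python
import pandas as pd

colors = pd.Series(["RED", "BLUE", "YELLOW", "ORANGE", "BLUE"])

# Map each category to an integer code, e.g. RED -> 1, BLUE -> 2, ...
mapping = {"RED": 1, "BLUE": 2, "YELLOW": 3, "ORANGE": 4}
encoded = colors.map(mapping)
print(encoded.tolist())  # [1, 2, 3, 4, 2]
```

Note that integer codes impose an artificial ordering on the categories; for nominal features, one-hot encoding (covered later in this unit) is often preferred.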
Linear Model

• The linear model equation is the same as the linear equation in the linear regression
model: z = w1*x1 + w2*x2 + w3*x3 (plus a bias term). Here X is the set of inputs: a matrix
containing all the feature (numerical) values, X = [x1, x2, x3], and W is another matrix
containing the same number of weights, W = [w1, w2, w3].
• In this example, the linear model output is the weighted sum w1*x1 + w2*x2 + w3*x3.
• The weights w1, w2, w3 are updated in the training phase.
Logits

• The logits, also called scores, are simply the outputs of the linear
model. The logits change as the calculated weights change.
Softmax function

• The softmax function is a probabilistic function that calculates probabilities from the
given scores. The softmax function returns a high probability for a high
score and lower probabilities for the remaining scores. For the logits 0.5, 1.5,
and 0.1, the probabilities calculated using the softmax function are approximately
0.23, 0.62, and 0.15.
• For the logit 1.5, we get a high probability value of about 0.62, and much lower
probability values for the remaining logits 0.5 and 0.1.
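A minimal NumPy softmax sketch reproducing those numbers:

```python
import numpy as np

def softmax(logits):
    """Exponentiate and normalize so the outputs sum to 1."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

probs = softmax(np.array([0.5, 1.5, 0.1]))
print(np.round(probs, 2))  # [0.23 0.62 0.15]
print(probs.sum())         # 1.0
```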
Cross Entropy

• Cross entropy is the last stage of multinomial logistic regression. The cross-entropy
function measures the similarity distance between the probabilities calculated by the
softmax function and the target one-hot-encoding matrix.
• Before we learn more about cross entropy, let's understand what is meant by a one-hot-
encoding matrix.
One hot encoding

• One-hot encoding is a method to represent the target values or categorical attributes in a
binary representation. For example, with a dog image as input and 3 possible target
classes like bird, dog, and cat, the one-hot-encoding vector for the dog class is [0, 1, 0].
• The one-hot-encoding matrix is simple to create. For each observation, the one-hot-
encoding vector has one entry per target class, with the value 1 for the true class and 0
everywhere else. The length of the one-hot-encoding vector equals the number of unique
target classes.
• Suppose we have 3 input features x1, x2, and x3, and one target variable with 3 target
classes. Then each one-hot-encoding vector has 3 values: one value is 1 and all the others
are 0s.
• You know where to place the 1 and where to place the 0 values from the training dataset.
Take one observation from the training dataset, with its values for x1, x2, x3 and its
target class: the one-hot-encoding vector has 1 at the position of that target class and 0s
elsewhere.
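A minimal sketch of building those one-hot vectors with NumPy; the class order bird/dog/cat follows the example above, and the training labels are illustrative:

```python
import numpy as np

classes = ["bird", "dog", "cat"]
targets = ["dog", "cat", "dog", "bird"]  # illustrative training labels

# One row per observation, one column per class: 1 at the true class.
one_hot = np.zeros((len(targets), len(classes)), dtype=int)
for row, label in enumerate(targets):
    one_hot[row, classes.index(label)] = 1

print(one_hot)
# [[0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```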
Cross-entropy

• Cross-entropy is a distance-calculation function which takes the probabilities calculated
by the softmax function and the created one-hot-encoding matrix and computes the
distance between them. The distance is small when the model assigns high probability to
the right target class, and large when it assigns high probability to a wrong target class.
• With this, we have discussed each stage of multinomial logistic regression. All these
stages are computed for each observation in the training set.
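A minimal sketch of the cross-entropy distance between a softmax output and a one-hot target; the probability vectors are illustrative:

```python
import numpy as np

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy: -sum over classes of target * log(predicted)."""
    return -np.sum(one_hot * np.log(probs + eps))

target = np.array([0, 1, 0])         # true class is "dog"
good = np.array([0.1, 0.8, 0.1])     # confident, correct prediction
bad = np.array([0.7, 0.2, 0.1])      # confident, wrong prediction

print(f"loss (correct): {cross_entropy(target, good):.3f}")  # ~0.223
print(f"loss (wrong):   {cross_entropy(target, bad):.3f}")   # ~1.609
```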
