0% found this document useful (0 votes)
8 views17 pages

Logistic Regression

Uploaded by

researcherniaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views17 pages

Logistic Regression

Uploaded by

researcherniaz
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

LOGISTIC

REGRESSION

BY
EDWIN JOBISH
WHAT IS LOGISTIC REGRESSION?

• Logistic regression, along with discriminant analysis, is the appropriate statistical


technique when the dependent variable is a categorical (nominal or nonmetric) variable
and the independent variables are metric or nonmetric variables.
• Logistic regression is a statistical method used for modeling the probability of a binary
outcome.
• It's commonly employed in classification tasks where the dependent variable is
dichotomous.
• Despite its name, logistic regression is used for classification, not regression.
• Unlike linear regression, which predicts continuous outcomes, logistic regression
predicts the probability of a categorical outcome.
• In a practical sense, logistic regression may be preferred for two reasons.

• First, discriminant analysis relies on strictly meeting the assumptions of multivariate


normality and equal variance–covariance matrices across groups -Assumptions that are
not met in many social science research situations.
• Second, even if the assumptions are met, many researchers prefer logistic regression
because it is more similar to multiple regression. It has straightforward statistical tests,
similar approaches to incorporating metric and nonmetric variables and nonlinear effects,
and a wide range of diagnostics.
HOW LOGISTIC REGRESSION WORKS?

• - Logistic regression predicts probabilities for binary outcomes.


• - It uses the logistic function to map inputs to probabilities between 0 and 1.
• - During training, the model adjusts parameters to minimize the difference between
predicted probabilities and actual class labels.
• - Predictions are made by applying a threshold to these probabilities to determine class
assignment.
• Formula:
• represents the probability of the event Y occurring given the input X.
• z is the linear combination of the input variables.
• The logistic function maps any real-valued number to the range [0,1].
THE DECISION PROCESS FOR LOGISTIC
REGRESSION
• The application of logistic regression can be viewed from the
• Stage 1: Objectives of Logistic Regression
• Stage 2: Research Design for Logistic Regression
• Stage 3: Assumptions of Logistic Regression
• Stage 4: Estimation of the Logistic Regression Model and Assessing overall Fit
• Stage 5: Interpretation of the Results
• Stage 6: Validation of the Results
STAGE 1: OBJECTIVES OF LOGISTIC REGRESSION

• The first objective focuses on identifying the independent variables that impact group
membership represented by the dependent variable. Here the focus is on the variate in
terms of
(a) specifying which object characteristics should be included as independent variables
(b) estimating the importance of each independent variable in explaining group membership
• The second objective involves establishing a classification system based on the logistic
model for determining group membership. Here the ultimate goal of prediction is not a
specific metric value, like in multiple regression, but instead a method of placing each
observation into a distinct category/group
STAGE 2: RESEARCH DESIGN FOR LOGISTIC
REGRESSION
1. Binary Dependent Variable Representation : Logistic regression codes binary outcomes as 0 and
1, aligning with the research question for proper interpretation.
2. Logistic Curve Use : The logistic curve captures the nonlinear relationship between variables,
ensuring predicted probabilities remain within 0 and 1.
3. Unique Dependent Variable Nature : Binary outcomes violate multiple regression assumptions,
requiring specialized treatment in logistic regression due to binomial distribution and non-constant
variance.
4. Sample Size Importance : Logistic regression requires larger samples for maximum likelihood
estimation, with a minimum of 10 observations per estimated parameter, particularly in rare event
scenarios.
5. Handling Low Occurrence Frequency : Approaches like exact logistic regression or penalized
estimation mitigate biases associated with small sample sizes and low event frequencies.
6. Impact of Nonmetric Variables : Nonmetric independent variables affect sample size
considerations, requiring adequate cell sizes to prevent model instability.
7. Aggregated Data Analysis : Logistic regression can analyze aggregated data patterns, providing
insights at a higher level of aggregation.
8. Research Design Essentials : Proper logistic regression research design includes careful
consideration of sample size, variable coding, and model specification for valid results.
STAGE 3: ASSUMPTIONS OF LOGISTIC REGRESSION

• The advantages of logistic regression compared to discriminant analysis and even


multiple regression stem in large degree to the general lack of assumptions required in a
logistic regression analysis.
• Assumptions
● Logistic regression does not require the assumptions of normality and homoscedasticity
seen in both multiple regression and discriminant analysis.
● The primary assumption is the independence of observations, which if violated requires
some form of hierarchical/nested model approach.
● An inherent assumption that should be addressed with the Box–Tidwell test is the
linearity of the independent variables, especially continuous variables, with the outcome.
STAGE 4: ESTIMATION OF THE LOGISTIC
REGRESSION MODEL AND ASSESSING OVERALL FIT
• Logistic regression utilizes the logistic relationship to estimate the model and establish
the relationship between dependent and independent variables.
• The dependent variable is transformed using the logit transformation to ensure
predictions stay within the range of 0 to 1.
• Predictions are made by assigning probability values to observations based on
independent variables and estimated coefficients.
• A cutoff value is selected to classify observations into binary outcomes, with
misclassifications minimized by choosing an appropriate cutoff.
• Predictive accuracy is assessed using a classification matrix, similar to approaches used in
discriminant analysis and multiple regression.
STAGE 5: INTERPRETATION OF THE RESULTS

• Logistic regression coefficients differ from multiple regression due to the transformed
nature of the dependent variable.
• Original coefficients reflect changes in log odds, while exponentiated coefficients
represent odds changes directly.
• Positive coefficients increase the predicted probability, negative coefficients decrease it,
reflecting changes in log odds.
• For metric variables, original coefficients are less intuitive, while exponentiated
coefficients indicate odds changes for each unit change.
• Exponentiated coefficients for dummy variables show the relative odds compared to the
reference category.
• Variable importance measures like relative importance and information value assess the
significance of independent variables.
• High multicollinearity reduces the unique impact of independent variables, affecting
coefficient interpretation.
• Coefficients allow estimation of probabilities for specific values of the independent
variable, considering the nonlinear relationship between variables.
STAGE 6: VALIDATION OF THE RESULTS

• 1. Ensure external and internal validity in the final stage of logistic regression analysis.
• 2. Validation is essential, especially with smaller samples, to prevent overfitting.
• 3. External validity is typically assessed through hit ratios using holdout samples or
cross-validation methods.
• 4. Holdout samples, separate from the estimation sample, help evaluate the
generalizability of the logistic model.
• 5. Cross-validation methods like the jackknife approach utilize multiple subsets of the
total sample for testing.
• 6. Large values or associated standard errors of original logistic coefficients may indicate
quasi-complete separation issues.
• 7. Coefficients can be expressed in original and exponentiated forms for easier
interpretation.
• 8. Logistic regression offers advantages over discriminant analysis, especially with
categorical variables, and resembles multiple regression results
CONCLUSION

• In conclusion, logistic regression is a powerful statistical tool for analyzing relationships


between a binary outcome and one or more independent variables. Its advantages over
discriminant analysis include robustness with categorical variables and similarities to
multiple regression interpretation. However, ensuring both internal and external validity
through validation techniques remains crucial, especially in cases of small sample sizes. By
understanding the nuances of coefficient interpretation and employing appropriate
validation methods, logistic regression offers valuable insights into predictive modeling
and decision-making processes.
THANK YOU

You might also like