Logistic Regression
BY EDWIN JOBISH
WHAT IS LOGISTIC REGRESSION?
• The first objective is to identify the independent variables that affect group
membership, which the dependent variable represents. Here the focus is on the variate, in
terms of
(a) specifying which object characteristics should be included as independent variables, and
(b) estimating the importance of each independent variable in explaining group membership.
• The second objective is to establish a classification system, based on the logistic
model, for determining group membership. The goal of prediction is not a specific metric
value, as in multiple regression, but a method of placing each observation into a
distinct category or group.
STAGE 2: RESEARCH DESIGN FOR LOGISTIC REGRESSION
1. Binary Dependent Variable Representation: Logistic regression codes the binary outcome as 0 and
1; which group is coded 1 should follow the research question so that results are interpreted in
the intended direction.
2. Logistic Curve Use: The logistic curve captures the nonlinear relationship between the
independent variables and the probability of the outcome, keeping predicted probabilities between
0 and 1.
3. Unique Dependent Variable Nature: A binary outcome violates multiple regression assumptions,
because its error term follows a binomial rather than a normal distribution and its variance is not
constant. Logistic regression is designed to handle both.
4. Sample Size Importance: Because it relies on maximum likelihood estimation, logistic regression
requires larger samples than multiple regression, with a minimum of 10 observations per estimated
parameter. This matters most in rare-event scenarios.
5. Handling Low Occurrence Frequency: Approaches such as exact logistic regression or penalized
estimation reduce the bias associated with small samples and low event frequencies.
6. Impact of Nonmetric Variables: Nonmetric independent variables add to sample size
requirements, because adequate cell sizes are needed to prevent model instability.
7. Aggregated Data Analysis: Logistic regression can also be applied to aggregated data, giving
insight into patterns at a higher level of aggregation.
8. Research Design Essentials: A sound logistic regression research design gives careful
consideration to sample size, variable coding, and model specification to obtain valid results.
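Points 2 to 4 above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the input values are hypothetical, and counting observations in the smaller outcome group is one possible reading of the 10-per-parameter guideline, not something the slides specify.

```python
import math

def logistic(z):
    """Map a linear predictor z (the log odds) to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)

def bernoulli_variance(p):
    """Variance of a 0/1 outcome with event probability p: p * (1 - p)."""
    return p * (1.0 - p)

def observations_per_parameter(n_events, n_nonevents, n_parameters):
    """Observations per estimated parameter in the smaller outcome group.

    Assumption for illustration: the guideline is checked against the
    smaller group, which is the binding constraint when events are rare.
    """
    return min(n_events, n_nonevents) / n_parameters

# Hypothetical linear-predictor values, chosen only for illustration.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    p = logistic(z)
    # p stays strictly between 0 and 1, and the variance changes with p.
    print(f"z={z:+.1f}  p={p:.3f}  variance={bernoulli_variance(p):.3f}")

# Hypothetical sample: 40 events, 160 non-events, 4 estimated parameters.
print(observations_per_parameter(40, 160, 4))
```

The variance is largest at p = 0.5 and shrinks toward either extreme, which is the non-constant variance that rules out ordinary multiple regression for a binary outcome.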
STAGE 3: ASSUMPTIONS OF LOGISTIC REGRESSION
• Logistic regression coefficients differ from multiple regression coefficients because
the dependent variable is transformed.
• Original coefficients reflect changes in the log odds, while exponentiated coefficients
represent changes in the odds themselves.
• A positive coefficient raises the log odds and therefore the predicted probability; a
negative coefficient lowers them.
• For metric variables, original coefficients are less intuitive, while exponentiated
coefficients give the multiplicative change in the odds for each one-unit change.
• Exponentiated coefficients for dummy variables show the relative odds compared to the
reference category.
• Variable importance measures such as relative importance and information value assess
how much each independent variable contributes to the model.
• High multicollinearity makes it harder to isolate the unique impact of each independent
variable, which complicates coefficient interpretation.
• Coefficients allow probabilities to be estimated for specific values of an independent
variable, taking into account the nonlinear relationship between the variables.
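The interpretation rules above can be checked numerically. The following Python sketch uses invented coefficients (an intercept, one metric variable called age, and one dummy variable called member); none of these names or values come from the slides.

```python
import math

# Hypothetical fitted coefficients, invented for illustration only.
INTERCEPT = -3.0
B_AGE = 0.05     # metric variable: change in log odds per one-unit increase
B_MEMBER = 0.8   # dummy variable: 1 = member, 0 = reference category

def log_odds(age, member):
    """Linear predictor: original coefficients act on the log-odds scale."""
    return INTERCEPT + B_AGE * age + B_MEMBER * member

def probability(age, member):
    """Predicted probability for specific values of the independent variables."""
    return 1.0 / (1.0 + math.exp(-log_odds(age, member)))

def odds(age, member):
    """Odds of the event: p / (1 - p)."""
    p = probability(age, member)
    return p / (1.0 - p)

# Exponentiated coefficients are multiplicative changes in the odds.
odds_ratio_age = math.exp(B_AGE)        # per one-unit change in age
odds_ratio_member = math.exp(B_MEMBER)  # member relative to reference category

# The odds ratio is the same wherever it is evaluated...
print(odds(41, 0) / odds(40, 0), odds_ratio_age)
print(odds(40, 1) / odds(40, 0), odds_ratio_member)

# ...but the change in probability depends on the starting point,
# because the relationship is nonlinear.
print(probability(41, 0) - probability(40, 0))
print(probability(61, 0) - probability(60, 0))
```

The same one-unit change multiplies the odds by a constant factor, yet it shifts the probability by different amounts at different starting values, which is why probabilities have to be computed for specific values rather than read off a coefficient.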
STAGE 6: VALIDATION OF THE RESULTS
1. Ensure both internal and external validity in the final stage of logistic regression analysis.
2. Validation is essential, especially with smaller samples, to guard against overfitting.
3. External validity is typically assessed through hit ratios (the share of observations
classified correctly), computed on a holdout sample or by cross-validation.
4. A holdout sample, kept separate from the estimation sample, helps evaluate how well the
logistic model generalizes.
5. Cross-validation methods such as the jackknife approach use multiple subsets of the
total sample for testing.
6. Very large original logistic coefficients, or very large standard errors for them, may
indicate quasi-complete separation.
7. Coefficients can be reported in both original and exponentiated form for easier
interpretation.
8. Logistic regression offers advantages over discriminant analysis, especially with
categorical independent variables, and its results resemble those of multiple regression.
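A minimal Python sketch of the holdout and jackknife hit ratios described above follows. Everything in it is an assumption made for illustration: the data are invented, the model has a single predictor, the 0.5 cutoff is a common default rather than a rule from the slides, and the plain gradient-ascent fit stands in for whatever maximum likelihood routine a statistics package would use.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Estimate intercept b0 and slope b1 by gradient ascent on the
    average log-likelihood (a simple stand-in for a packaged ML routine)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

def hit_ratio(xs, ys, b0, b1, cutoff=0.5):
    """Share of observations placed in the correct group at the cutoff."""
    hits = sum((sigmoid(b0 + b1 * x) >= cutoff) == (y == 1)
               for x, y in zip(xs, ys))
    return hits / len(xs)

def jackknife_hit_ratio(xs, ys):
    """Leave-one-out: refit without each observation, then classify it."""
    hits = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += hit_ratio([xs[i]], [ys[i]], b0, b1)
    return hits / len(xs)

# Hypothetical estimation and holdout samples, invented for illustration.
X_EST = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
Y_EST = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
X_HOLD = [0.8, 1.8, 2.8, 3.8, 4.8]
Y_HOLD = [0, 0, 1, 1, 1]

b0, b1 = fit_logistic(X_EST, Y_EST)
print("holdout hit ratio:  ", hit_ratio(X_HOLD, Y_HOLD, b0, b1))
print("jackknife hit ratio:", jackknife_hit_ratio(X_EST, Y_EST))
```

The holdout version estimates the model once and scores unseen cases, while the jackknife version re-estimates the model once per observation, which costs more computation but uses the whole sample when data are scarce.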
CONCLUSION