Sta302 Final Project - Poster
1. Methods of Analysis
Predictor Variables:
Numerical: Total Bathrooms, Living Area, Garage Size, Total Basement Area, Year Built.
Categorical: Central Air.
Model Development:
Model A: Included all six predictors.
Multicollinearity check: Variance Inflation Factor (VIF) = 1.31, below the threshold of 5.
Key predictors identified: Living Area and Garage Size.
Model B: Subset of the significant predictors from Model A, compared to Model A using ANOVA (partial F-test).
Model C: Further refinement via stepwise selection.
Validation and Metrics:
Dataset split into training and testing subsets.
Metrics: Adjusted R², AIC, AICc, BIC.
Models validated on the testing subset for performance and robustness.
Model Diagnostics:
Linearity: Residual plots showed no curvature.
Constant Variance: Residuals showed no systematic pattern in their spread (homoscedasticity).
Independence: Residuals showed no clustering patterns.
Normality: QQ plot indicated approximately normal residuals.
Leverage & Influence: High-leverage points flagged using h_ii > 2(p+1)/n, where p is the number of predictors and n the number of observations. Influential points assessed with Cook’s distance; none required adjustments.

2. Results/Findings
Data Preparation & Initial Modeling:
The Ames Housing dataset was cleaned and split into training and testing subsets.
Response variable: Sale Price (log-transformed).
Predictors: Living Area, Garage Size, Year Built, Total Basement Area, Central Air, Full Bath.
Initial Model (Model A):
Included all six predictors.
Full Bath was not significant (p-value > 0.05) and was removed.
Adjusted R² = 0.6115.
Model Refinement (Model B):
Removed non-significant predictors.
Conducted Variance Inflation Factor (VIF) analysis; all predictors had VIF < 5, indicating no serious multicollinearity.
Adjusted R² improved to 0.6826.
Stepwise Selection & Model Validation (Model C):
Stepwise AIC selection refined the model further.
Adjusted R² = 0.6826, consistent with Model B.
Model validated using the training and testing datasets, showing no overfitting.
Model Diagnostics:
Residual plots confirmed linear relationships and homoscedasticity.
QQ plot showed approximately normal residuals.
Cook’s distance analysis found no significant influential points.

3. Conclusion
Final Model Equations:
Complete Model:
Subset Model:
Limitations:
Influential Observations: Identified through Cook’s distance; these could bias model estimates.
Multicollinearity: Some predictors showed collinearity, which complicates interpretation of individual coefficients but does not affect overall fit.
High Leverage Points: These influence model fit and require cautious interpretation.
Dataset Limitations: No cross-validation was performed, which limits confidence in the model's robustness.
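All of the model-comparison metrics named under Validation and Metrics can be computed from a fitted model's residual sum of squares (RSS). The Python sketch below is illustrative rather than the project's own code. The function names are hypothetical, errors are assumed Gaussian, and k counts the estimated regression coefficients including the intercept. AIC and BIC are written without their additive constants (the same convention as R's `extractAIC` for linear models), so only differences between models fitted to the same data are meaningful.

```python
import math


def adjusted_r2(rss: float, tss: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors plus an intercept."""
    # Residual variance vs. total variance, each scaled by its degrees of freedom.
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def aic(rss: float, n: int, k: int) -> float:
    """AIC without additive constants: n*ln(RSS/n) + 2k (k = number of coefficients)."""
    return n * math.log(rss / n) + 2 * k


def aicc(rss: float, n: int, k: int) -> float:
    """AIC plus the small-sample correction 2k(k+1)/(n-k-1)."""
    return aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(rss: float, n: int, k: int) -> float:
    """BIC without additive constants: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)
```

Lower AIC, AICc, and BIC values indicate a better trade-off between fit and complexity. BIC's per-coefficient penalty of ln(n) exceeds AIC's penalty of 2 once n ≥ 8, so BIC tends to favor smaller models.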
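The multicollinearity check and the Model A versus Model B comparison each rest on a short formula. The first is VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The second is the partial F statistic for nested models. Below is a minimal Python sketch with hypothetical function names. It computes only the statistics; the partial F p-value would still come from an F distribution with (q, df_full) degrees of freedom. In R, `anova(reduced, full)` reports both the statistic and the p-value.

```python
def vif(r2_j: float) -> float:
    """Variance inflation factor for predictor j.

    r2_j is the R^2 from regressing predictor j on all other predictors.
    """
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def partial_f(rss_reduced: float, rss_full: float,
              df_reduced: int, df_full: int) -> float:
    """Partial F statistic for a reduced model nested inside a full model.

    df_reduced and df_full are the residual degrees of freedom
    (n minus the number of estimated coefficients) of each model.
    """
    q = df_reduced - df_full  # number of coefficients dropped from the full model
    if q <= 0:
        raise ValueError("the reduced model must have fewer coefficients")
    # Increase in RSS per dropped coefficient, relative to the full model's MSE.
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)
```

For example, the reported VIF of 1.31 corresponds to R_j² = 1 - 1/1.31 ≈ 0.24. In other words, roughly a quarter of that predictor's variance is explained by the other predictors.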
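The leverage and influence checks under Model Diagnostics can be written out directly. Below is a minimal Python sketch. It assumes the hat values h_ii and raw residuals have already been extracted from the fitted model (for instance with `hatvalues()` and `residuals()` in R). The function names are hypothetical. No particular Cook's distance cutoff is implied, since the poster does not state which one was applied.

```python
def leverage_cutoff(n: int, p: int) -> float:
    """Rule-of-thumb cutoff 2(p+1)/n for hat values (p predictors plus an intercept)."""
    return 2 * (p + 1) / n


def high_leverage(hat_values: list[float], p: int) -> list[int]:
    """Indices of observations whose hat value h_ii exceeds 2(p+1)/n."""
    cutoff = leverage_cutoff(len(hat_values), p)
    return [i for i, h in enumerate(hat_values) if h > cutoff]


def cooks_distance(residual: float, h: float, p: int, mse: float) -> float:
    """Cook's distance for a single observation.

    residual: raw residual e_i
    h:        hat value h_ii of that observation (0 <= h < 1)
    p:        number of predictors, so the model has p + 1 coefficients
    mse:      residual mean square RSS / (n - p - 1)
    """
    return (residual ** 2 / ((p + 1) * mse)) * h / (1.0 - h) ** 2
```

Cook's distance grows with both the size of the residual and the leverage. An observation therefore needs to be unusual in both respects to be strongly influential.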