
Sta302 Final Project - Poster

Uploaded by hype4everxxx
Copyright © All Rights Reserved

Factors Influencing Housing Prices in Ames, Iowa: A Multiple Linear Regression Analysis

Research Question: What characteristics of houses and neighborhoods determine house pricing in Ames, Iowa?

MOTIVATION
Understanding housing prices is crucial for economic stability, real estate appraisal, and informed decision-making for investors and policymakers. Housing values significantly impact individual wealth and broader economic conditions, and analyzing the factors that influence these prices can provide key insights for stakeholders.

DATA COLLECTION
Data Source: Ames Housing Dataset (Kaggle, 2020)
The data were originally collected by the Ames City Assessor's Office from 2006 to 2010 and cover 2,930 property sales. The dataset contains detailed variables (e.g., lot size, year built) that directly align with the research question, making it suitable for the analysis.

1. Methods of Analysis

Predictor Variables:
Numerical: Total Bathrooms, Living Area, Garage Size, Basement Area, Year Built.
Categorical: Central Air.

Model Development:
Model A: Included all six predictors. Multicollinearity check: Variance Inflation Factor (VIF) = 1.31 (< threshold of 5). Key predictors identified: Living Area and Garage Size.
Model B: Subset with the significant predictors from Model A; compared to Model A using ANOVA (partial F-test).
Model C: Further refinement via stepwise selection.

Validation and Metrics:
Dataset split into training and testing subsets.
Metrics: Adjusted R², AIC, AICc, BIC.
Models validated for performance and robustness.

Model Diagnostics:
Linearity: Residual plots confirmed no curvature.
Constant Variance: Residuals showed no systematic dispersion (homoscedasticity).
Independence: Residuals showed no clustering patterns.
Normality: QQ plot confirmed a near-normal distribution of residuals.
Leverage & Influence: Cook’s distance analysis found no significant influential points.

2. Results/Findings

Data Preparation & Initial Modeling:
The Ames Housing dataset was cleaned and split into training and testing subsets. Response variable: Sale Price (log-transformed). Predictors: Living Area, Garage Size, Year Built, Total Basement Area, Central Air, Full Bath.

Initial Model (Model A):
Included six predictors. Full Bath was removed due to a p-value > 0.05. Adjusted R² = 0.6115.

Model Refinement (Model B):
Removed non-significant predictors. Conducted Variance Inflation Factor (VIF) analysis; all predictors had VIF < 5, indicating no severe multicollinearity. Adjusted R² improved to 0.6826.

Stepwise Selection & Model Validation (Model C):
Stepwise AIC selection refined the model further. Adjusted R² = 0.6826, consistent with Model B. The model was validated using the training and testing datasets, showing no overfitting.

Model Diagnostics:
Residual plots confirmed linear relationships and homoscedasticity. The QQ plot showed approximately normal residuals. Leverage points were checked using $h_{ii} > 2(p+1)/n$, where p is the number of predictors and n the number of observations. Influential points were assessed with Cook’s distance; none required adjustment.

3. Conclusion

Final Model Equations:
Complete Model: [equation image not available]
Subset Model: [equation image not available]

Limitations:
Influential Observations: Identified through Cook’s distance; these could bias model estimates.
Multicollinearity: Some predictors showed collinearity, complicating interpretation but not affecting overall fit.
High Leverage Points: Influential on model fit, requiring cautious interpretation.
Dataset Limitations: The lack of cross-validation limits the assessment of robustness.
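The models above regress log-transformed sale price on house characteristics and compare fits by adjusted R², $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ for $n$ observations and $p$ predictors. Below is a minimal Python/NumPy sketch of that computation. It uses simulated data; the variable names (`living_area`, `garage_cars`, `central_air`) and coefficients are hypothetical stand-ins, not the Ames data or the poster's fitted model.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept.

    Returns (coefficients with intercept first, residuals, adjusted R^2).
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalise extra predictors
    return beta, resid, adj_r2


# Simulated stand-in data (NOT the Ames dataset); names are hypothetical.
rng = np.random.default_rng(302)
n = 500
living_area = rng.uniform(800, 3000, n)              # square feet
garage_cars = rng.integers(0, 4, n).astype(float)    # garage size in cars
central_air = rng.integers(0, 2, n).astype(float)    # 0/1 dummy for Central Air
log_price = (10.5 + 0.0004 * living_area + 0.08 * garage_cars
             + 0.10 * central_air + rng.normal(0, 0.15, n))
sale_price = np.exp(log_price)

X = np.column_stack([living_area, garage_cars, central_air])
beta, resid, adj_r2 = fit_ols(X, np.log(sale_price))  # response on the log scale
print("coefficients:", np.round(beta, 5))
print("adjusted R^2:", round(adj_r2, 4))
```

Because the response is log(price), a coefficient $b$ means that a one-unit increase in that predictor, holding the others fixed, multiplies the fitted price by about $e^{b}$.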
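The multicollinearity checks rely on the variance inflation factor, $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors. A sketch under the same assumptions (simulated data, hypothetical names); the correlation between `basement_area` and `living_area` is built in deliberately so that their VIFs exceed 1.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from regressing y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    p = X.shape[1]
    out = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)             # all predictors except j
        out[j] = 1.0 / (1.0 - r_squared(others, X[:, j]))
    return out


# Simulated predictors (hypothetical names, NOT the Ames data).
rng = np.random.default_rng(0)
n = 400
living_area = rng.normal(1500, 400, n)
basement_area = 0.6 * living_area + rng.normal(0, 200, n)  # deliberately correlated
year_built = rng.normal(1975, 25, n)                        # roughly independent
X = np.column_stack([living_area, basement_area, year_built])

vifs = vif(X)
print(dict(zip(["living_area", "basement_area", "year_built"], np.round(vifs, 2))))
print("any VIF >= 5:", bool(np.any(vifs >= 5)))              # threshold used on the poster
```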
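Model B was compared with Model A by a partial F-test. For nested models, $F = \frac{(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}})/q}{\mathrm{SSE}_{\text{full}}/(n - p - 1)}$, where $q$ predictors are dropped and the full model has $p$ predictors. The NumPy sketch below returns the statistic and its degrees of freedom; the predictors `x1`, `x2`, `x3` and the simulated data are hypothetical.

```python
import numpy as np


def sse_and_k(X, y):
    """Residual sum of squares and number of coefficients (intercept included)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), design.shape[1]


def partial_f(X_full, X_reduced, y):
    """Partial F statistic for H0: coefficients of the dropped predictors are all zero.

    X_reduced must contain a subset of the columns of X_full (nested models).
    """
    n = len(y)
    sse_full, k_full = sse_and_k(X_full, y)
    sse_red, k_red = sse_and_k(X_reduced, y)
    df1 = k_full - k_red          # number of predictors dropped (q)
    df2 = n - k_full              # residual df of the full model
    f_stat = ((sse_red - sse_full) / df1) / (sse_full / df2)
    return f_stat, df1, df2


# Simulated example: y depends on x1 and x2 but not on x3.
rng = np.random.default_rng(1)
n = 300
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 1.0, n)

full = np.column_stack([x1, x2, x3])
F_drop_x3, df1, df2 = partial_f(full, np.column_stack([x1, x2]), y)  # expect small F
F_drop_x2, _, _ = partial_f(full, np.column_stack([x1, x3]), y)      # expect large F
print(round(F_drop_x3, 3), round(F_drop_x2, 1), (df1, df2))
```

With SciPy installed, `scipy.stats.f.sf(F_drop_x3, df1, df2)` gives the p-value; a large p-value supports keeping the smaller model.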
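The information criteria listed under Validation and Metrics can be computed from the maximized Gaussian log-likelihood $\ell$. One common convention, assumed here because the poster does not state its own, counts $k$ parameters (all regression coefficients including the intercept, plus the error variance): $\mathrm{AIC} = -2\ell + 2k$, $\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$, $\mathrm{BIC} = -2\ell + k\log n$. Software and textbooks differ by additive constants, so only differences between models fitted to the same data are meaningful.

```python
import numpy as np


def info_criteria(X, y):
    """AIC, AICc and BIC for a Gaussian linear model y ~ 1 + X (ML variance estimate)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)
    k = design.shape[1] + 1                       # coefficients + error variance
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(sse / n) + 1.0)
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2.0 * loglik + k * np.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Simulated comparison of a 2-predictor model against a 1-predictor model.
rng = np.random.default_rng(3)
n = 50
X = rng.normal(size=(n, 2))
y = 0.5 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1.0, n)
both = info_criteria(X, y)
first_only = info_criteria(X[:, :1], y)
for name in ("AIC", "AICc", "BIC"):
    print(name, round(both[name], 2), round(first_only[name], 2))  # lower is better
```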
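The poster does not say which direction its stepwise AIC search took. One simple variant, backward elimination, is sketched here as an assumption: start from all predictors, repeatedly drop the predictor whose removal lowers AIC the most, and stop when no removal helps. AIC is computed as $n\log(\mathrm{SSE}/n) + 2(p+1)$, which differs from the likelihood-based form only by a constant for fixed $n$ and so ranks models identically. The predictor names, including the two pure-noise columns, are hypothetical.

```python
import numpy as np


def aic(X, y):
    """AIC up to an additive constant: n*log(SSE/n) + 2*(number of coefficients)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])    # works even when X has 0 columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return n * np.log(float(resid @ resid) / n) + 2 * design.shape[1]


def backward_stepwise(X, y, names):
    """Drop the predictor whose removal lowers AIC most; stop when no removal does."""
    keep = list(range(X.shape[1]))
    current = aic(X[:, keep], y)
    while keep:
        trials = [(aic(X[:, [c for c in keep if c != j]], y), j) for j in keep]
        best_aic, drop = min(trials)
        if best_aic >= current:                  # no removal improves AIC
            break
        keep.remove(drop)
        current = best_aic
    return [names[j] for j in keep], current


# Simulated data: two real effects and two pure-noise columns (hypothetical names).
rng = np.random.default_rng(11)
n = 300
X = rng.normal(size=(n, 4))
y = 12.0 + 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.3, n)
names = ["living_area", "garage_size", "noise_a", "noise_b"]
selected, final_aic = backward_stepwise(X, y, names)
print(selected, round(final_aic, 2))
```

Under AIC a pure-noise predictor can occasionally survive by chance, which is one reason the stepwise result is best treated as a candidate model rather than a final answer.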
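The train/test validation can be sketched as follows: fit on a random training subset, then compare prediction error (here RMSE on the log scale) on the training rows and the held-out rows; similar values are consistent with no overfitting. The 70/30 split, the seed, and the simulated data are assumptions for illustration only.

```python
import numpy as np


def train_test_split_idx(n, test_frac, rng):
    """Random disjoint index sets for training and testing."""
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]


def fit(X, y):
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta


def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


rng = np.random.default_rng(2024)
n = 600
X = rng.normal(size=(n, 3))
y = 12.0 + X @ np.array([0.3, 0.1, 0.05]) + rng.normal(0, 0.2, n)  # simulated log prices
train, test = train_test_split_idx(n, 0.3, rng)
beta = fit(X[train], y[train])                       # fit on training rows only
rmse_train = rmse(y[train], predict(beta, X[train]))
rmse_test = rmse(y[test], predict(beta, X[test]))    # error on unseen rows
print(round(rmse_train, 4), round(rmse_test, 4))
```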
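The leverage rule $h_{ii} > 2(p+1)/n$ and Cook's distance, $D_i = \frac{e_i^2}{(p+1)s^2}\cdot\frac{h_{ii}}{(1-h_{ii})^2}$, can both be computed from a QR decomposition of the design matrix, since the hat-matrix diagonal equals the row sums of squares of $Q$. In this sketch one simulated observation is deliberately made extreme; the Cook's cutoff $4/n$ is a common rule of thumb assumed here, because the poster does not state its cutoff.

```python
import numpy as np


def leverage_and_cooks(X, y):
    """Hat-matrix diagonal h_ii and Cook's distance D_i for y ~ 1 + X."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]                      # p + 1 coefficients
    q, _ = np.linalg.qr(design)              # reduced QR; q has orthonormal columns
    h = np.sum(q ** 2, axis=1)               # diagonal of H = Q Q^T
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ beta
    s2 = float(e @ e) / (n - k)              # residual variance estimate
    cooks = (e ** 2 / (k * s2)) * h / (1.0 - h) ** 2
    return h, cooks


rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(0, 0.5, n)
X[0] = [6.0, 6.0]        # hypothetical extreme house, far from the predictor cloud
y[0] = -5.0              # ...with a badly mispredicted (log) price
h, cooks = leverage_and_cooks(X, y)
p = X.shape[1]
high_leverage = np.where(h > 2 * (p + 1) / n)[0]
influential = np.where(cooks > 4 / n)[0]
print("high leverage:", high_leverage)
print("influential:", influential)
```

Flagged points are candidates for closer inspection, not automatic deletion; the poster's decision that none required adjustment is this kind of judgment.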
