
Predicting House Prices Grades: Business Computing - Ii

Uploaded by

Shubham Kumar

BUSINESS COMPUTING - II

Predicting House Prices and Grades
GROUP NUMBER 12
RICHA BHARDWAJ MBAA24116
SAYANTAN MBAA24122
SHIVANGI AGARWAL MBAA24124
SNEHA RAMPURIA MBAA24130
PRIYANKA S MBAA24167
DIKSHA ARYA MBAA24090
INTRODUCTION
Our aim is to analyze the King County House Prices Dataset from Kaggle
(https://www.kaggle.com/datasets/gauravduttakiit/predict-the-house-prices-king-county)
to explore key patterns, conduct statistical analysis, and build
predictive models for price estimation.

House prices and grades are influenced by a variety of factors, including
location, property characteristics, and economic conditions.
Understanding these factors through data analytics can provide valuable
insights for buyers, sellers, and real estate investors.

This presentation explores different statistical modeling techniques
used for both continuous and categorical output variables. It covers
linear regression, logistic regression, Linear Discriminant Analysis
(LDA), and Random Forest, along with key statistical measures such as
t-tests, Z-scores, and p-values. It also covers evaluation metrics such
as the confusion matrix for classification models.
DATA UNDERSTANDING

Numerical Features: Price, Square Footage (living, lot, basement),
Number of Bedrooms/Bathrooms, Year Built, Grade, Condition, Floors.
Categorical Features: Waterfront Presence, Renovation Status, Zip Code,
View Rating.
Target Variable: Price, Grade.
DATA PREPROCESSING

Handling Missing Values: Identify and impute missing data using median
or mode.
Encoding Categorical Variables: Convert categorical features into
numerical values (e.g., one-hot encoding for zipcode and waterfront).
Feature Scaling: Normalize numerical variables such as square footage
and price using Min-Max Scaling.
Outlier Detection: Use IQR and Z-score methods to detect and remove
extreme values.
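
The imputation, outlier, and scaling steps above can be sketched in a few lines of Python using only the standard library. The deck mentions lm and glm(), which suggests the original analysis was done in R, so this is an illustrative sketch under that assumption and not the group's code. The sample square-footage values and all function names are hypothetical.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical living-area values in square feet; None marks a missing entry.
sqft_living = [1180, 2570, None, 1960, 1680, 13540]

filled = impute_median(sqft_living)
lower, upper = iqr_fences(filled)
kept = [v for v in filled if lower <= v <= upper]  # drop values outside the fences
scaled = min_max_scale(kept)
```

In this sketch the outliers are dropped before scaling, because a single extreme value would otherwise compress every other scaled value toward 0. That ordering is a design choice of the example; the slides do not state the order used.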
EXPLORATORY DATA ANALYSIS (EDA)
Univariate Analysis
Price Distribution: Analyze the skewness and identify potential outliers.
Square Footage Trends: Understand how living space size impacts pricing.
Grade and Condition Analysis: Determine how house quality affects
valuation.

Bivariate and Multivariate Analysis
Correlation Matrix: Identify relationships between numerical variables
(e.g., square footage vs. price).
Price Trends Across Zip Codes: Investigate how location influences pricing.
Impact of Renovation: Compare the prices of renovated vs. non-renovated
houses.
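
As a rough illustration of the correlation matrix mentioned above, the following Python sketch computes pairwise Pearson correlations with the standard library. The column names and the five toy rows are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {
        a: {b: pearson_r(col_a, col_b) for b, col_b in columns.items()}
        for a, col_a in columns.items()
    }


# Hypothetical toy rows, for illustration only.
data = {
    "sqft_living": [1180, 2570, 770, 1960, 1680],
    "bedrooms": [3, 3, 2, 4, 3],
    "price": [221900, 538000, 180000, 604000, 510000],
}
matrix = correlation_matrix(data)
```

Each entry lies between -1 and 1. Values near 1 (for example, between living area and price in this toy data) indicate a strong positive linear relationship.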
EXPLORATORY DATA ANALYSIS (EDA)
Visualization Techniques
Histograms & Box Plots: Visualize data distribution and detect outliers.
Scatter Plots: Identify linear relationships between price and numerical
features.
Heatmaps: Explore feature correlations.
STATISTICAL ANALYSIS
Two-Sample T-Test

Hypothesis: Waterfront properties do not have a significantly higher mean price.
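
A two-sample t statistic for the hypothesis above could be computed as in the sketch below. The slide does not say which variant of the test was run, so Welch's unequal-variance form is an assumption here, and the two price samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical sale prices in dollars, for illustration only.
waterfront = [1200000, 1500000, 1800000, 2100000]
non_waterfront = [400000, 500000, 600000, 700000]

t_stat, dof = welch_t(waterfront, non_waterfront)
```

Converting t and the degrees of freedom into a p-value requires the t-distribution, which the standard library does not provide. SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option if that package is available.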

ANOVA

Hypothesis: House condition does not significantly impact the average sale price.
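
The one-way ANOVA behind this hypothesis compares the variation between condition groups with the variation inside them. A minimal Python sketch of the F statistic follows. The condition ratings and prices are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical prices (thousands of dollars) grouped by condition rating.
prices_by_condition = {
    3: [400, 420, 440],
    4: [500, 520, 540],
    5: [600, 620, 640],
}
f_stat, df_b, df_w = one_way_anova_f(list(prices_by_condition.values()))
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom is evidence against the hypothesis that condition has no effect on the mean price.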
LINEAR REGRESSION (Price Prediction)

Linear Regression

Linear Regression: Establishes a fundamental relationship between
predictors and price for a continuous output.

Model Evaluation Metrics

Linear Regression R-squared: 0.6616279632052
Adjusted R-squared: 0.6605
Linear Regression RMSE: 556267.069909821
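
To show how the three metrics above are defined, here is a small Python sketch that fits a one-predictor least-squares line and computes RMSE, R-squared, and adjusted R-squared. The slides do not list the predictors that were used, so the single predictor and the toy numbers are assumptions for illustration only.

```python
from math import sqrt


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_metrics(actual, predicted, n_predictors):
    """Return (rmse, r2, adjusted_r2) for a fitted regression."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in actual)  # total sum of squares
    rmse = sqrt(sse / n)
    r2 = 1 - sse / sst
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return rmse, r2, adjusted_r2


# Hypothetical toy data: living area (thousand sq ft) vs. price ($100k).
sqft = [1, 2, 3, 4]
price = [2, 4, 5, 7]

intercept, slope = fit_simple_ols(sqft, price)
predicted = [intercept + slope * x for x in sqft]
rmse, r2, adj_r2 = regression_metrics(price, predicted, n_predictors=1)
```

RMSE is expressed in the units of the target, so the reported value of about 556,267 would be in dollars if price was left unscaled. Adjusted R-squared penalizes R-squared for the number of predictors, which is why it is slightly lower than R-squared in the reported results.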
RANDOM FOREST
Regression Model

RANDOM FOREST: Statistical model used to enhance accuracy with
ensemble learning.

Model Evaluation Metrics

Random Forest RMSE: 556714.576601593
CLASSIFICATION MODELS (Grade Prediction)

Conversion of Grade to Binary 0 and 1

Classification for Categorical Output (Grade Prediction)

Classification Models
Linear Discriminant Analysis (LDA) for Non-Categorical Inputs
Logistic Regression (glm()) for Categorical Inputs

Model Evaluation
Confusion Matrix for Classification Models
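
The grade conversion and the confusion-matrix evaluation can be sketched in Python as follows. The cutoff of 7 follows the grade split described in the conclusion. The sample grades, the predicted labels, and the function names are hypothetical.

```python
def binarize_grade(grade, cutoff=7):
    """Map a grade to 1 ('Good') if it is at least the cutoff, else 0 ('Bad')."""
    return 1 if grade >= cutoff else 0


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # actual is 1 but predicted is 0
            counts["FN"] += 1
    return counts


def hit_ratio(counts):
    """Share of correct predictions: (TP + TN) / total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical grades and model predictions, for illustration only.
grades = [5, 6, 7, 8, 9, 6, 7, 10]
actual = [binarize_grade(g) for g in grades]
predicted = [0, 1, 1, 1, 1, 0, 0, 1]

counts = confusion_matrix(actual, predicted)
accuracy = hit_ratio(counts)
```

The hit ratio is the diagonal of the confusion matrix divided by the total number of cases. It hides the split between false positives and false negatives, so the full matrix is worth reporting alongside it.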
LDA

Classification Model

LDA: A classification algorithm used when the input features are continuous (not
categorical). Here it predicts the binary grade.

Model Evaluation Metrics

As per the hit ratio, the accuracy of the model is 90.41%.
LOGISTIC REGRESSION

Classification Model

Logistic Regression: Statistical model used for binary or multi-class
classification. Here the input features are categorical and the model
predicts the binary grade.

Model Evaluation Metrics

As per the hit ratio, the accuracy of the model is 87.92%.
CONCLUSION

We used machine learning models to predict house prices (continuous) and property
grades (categorical) using key input factors like location, size, number of rooms, and
amenities.

For price prediction, we applied Linear Regression (lm) and statistical tools like t-tests,
ANOVA, and Z-scores, along with Random Forest, an ensemble method aimed at improving accuracy.

For grade classification, we categorized grades as ‘0 or Bad’ (<7) or ‘1 or Good’ (≥7) and used
Logistic Regression (glm) and Linear Discriminant Analysis (LDA) for classification.

We evaluated model performance using RMSE and R-squared for regression, and the confusion
matrix and accuracy scores for classification, to check that the predictions are reliable.

Feature importance analysis and correlation visualizations helped identify key factors
influencing house prices and grades, improving prediction accuracy and aiding real estate
decision-making.
Thank you
