Prediction of House Price using Multiple Regression By Vinod Kumar Shanmugam MATH 661 – APPLIED STATISTICS PROFESSOR ARIDAMAN JAIN FALL 2009
ABSTRACT
This project predicts the selling price of a house from parameters such as year built, square footage, lot size, number of bedrooms and bathrooms, features, and Walk Score. The data is taken from www.zillow.com.
What is zillow.com?
-Zillow is an online real estate service that aims to give people an edge in real estate by providing valuable tools and information.
PROJECT OBJECTIVE:
-This project aims to construct a mathematical model, using multiple regression, that estimates the selling price of a house from a set of predictor variables.
Analysis software used: SAS (Statistical Analysis System)
VARIABLES USED FOR ANALYSIS
LIST OF DEPENDENT AND INDEPENDENT VARIABLES
-We have 8 independent variables and 1 dependent variable. We screen variables based on their correlation coefficient with price and on the amount of variability explained by the model (R-square).
STATISTICAL APPROACH
The statistical approach used here is multiple regression.
What is multiple regression?
-Multiple regression uses more than one independent variable to predict a dependent variable.
EQUATION FOR MULTIPLE REGRESSION:
-> Y = b0 + b1*X1 + b2*X2 + ... + bp*Xp
-> X1, X2, ..., Xp are the independent variables, and Y is the housing price, the dependent variable that is being predicted or explained.
-> b0 is the constant, or intercept.
-> b1 is the slope (beta coefficient) for X1, b2 is the beta coefficient for X2, and so on.
The coefficients of this equation are estimated using the least-squares method.
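To make the least-squares step concrete, here is a minimal sketch in plain Python. It is not the SAS analysis used in the project: the function names `fit_least_squares` and `predict` and the toy data are hypothetical, and the normal-equations approach shown is chosen for brevity, not numerical robustness.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bp] minimising the sum of squared residuals."""
    # Prepend a column of ones so that b0 acts as the intercept.
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    # Solve by Gaussian elimination with partial pivoting.
    M = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        s = sum(M[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (M[i][k] - s) / M[i][i]
    return b


def predict(b, row):
    """Evaluate Y = b0 + b1*X1 + ... + bp*Xp for one case."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


# Toy data (made up): y was generated exactly as 2 + 3*X1 - 1*X2.
X_demo = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_demo = [3, 7, 7, 11, 12]
b_demo = fit_least_squares(X_demo, y_demo)
```

Because the toy response is an exact linear function of the two predictors, the fitted coefficients recover the generating values up to rounding; with real housing data the fit would leave nonzero residuals.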
EXPLORATORY DATA ANALYSIS
The exploratory data analysis consists of scatter plots of house price against each predictor variable, drawn both with and without a natural log transform of price. The log transform is applied to price to make its relationship with the independent variables closer to linear, and thereby to improve prediction accuracy.
DISTRIBUTION OF HOUSING PRICE VARIABLE WITHOUT NATURAL LOG TRANSFORM
[Figure: distribution of the untransformed housing price]
DISTRIBUTION OF HOUSING PRICE VARIABLE WITH NATURAL LOG TRANSFORM
[Figure: distribution of ln(price): 1) normal probability plot, 2) histogram]
After the natural log transform, the housing price appears very close to normally distributed and is much less skewed than before the transformation. This helps make the relationship between housing price and the predictor variables closer to linear.
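The effect of the log transform on skewness can be checked numerically. The sketch below uses made-up prices (not the Zillow data) and a hypothetical helper `skewness` that computes the moment-based sample skewness; a value near 0 indicates a roughly symmetric distribution.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, right-skewed selling prices in dollars (illustrative only).
prices = [150_000, 165_000, 180_000, 200_000, 230_000,
          275_000, 350_000, 520_000, 900_000]
log_prices = [math.log(p) for p in prices]

skew_raw = skewness(prices)
skew_log = skewness(log_prices)
print(f"skewness before log: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

For right-skewed data such as these, the skewness of ln(price) comes out smaller than that of price, which is the pattern the histogram and normal probability plot are meant to show.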
CORRELATION AND REGRESSION ANALYSIS:
What is correlation?
-Correlation is a statistical relation between two or more variables such that systematic changes in the value of one variable are accompanied by systematic changes in the other. It is represented by r and ranges from -1 to +1.
Pearson correlation coefficient: measures the linear association between the dependent variable, price, and other features of the house such as age, sqft, appliances_cnt, etc.
The highlighted correlations are greater than 0.5 in absolute value, which indicates a strong positive or negative correlation; these variables should explain the variation of house price in the regression model better than the other variables. Automatic variable selection is done in SAS based on the amount of variability explained by the model.
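The screening rule described above (keep predictors whose correlation with price exceeds 0.5 in absolute value) can be sketched as follows. The helper names `pearson_r` and `screen_by_correlation`, the `walkscore` column name, and all data values are hypothetical; the project itself used SAS for this step.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_by_correlation(target, predictors, threshold=0.5):
    """Return {name: r} for predictors with |r| above the threshold."""
    selected = {}
    for name, values in predictors.items():
        r = pearson_r(values, target)
        if abs(r) > threshold:
            selected[name] = r
    return selected


# Made-up values for five houses (illustrative only).
ln_price = [11.8, 12.0, 12.2, 12.4, 12.6]
predictors = {
    "sqft": [1200, 1500, 1800, 2100, 2600],   # rises with price
    "age": [40, 30, 25, 12, 5],               # falls as price rises
    "walkscore": [60, 50, 70, 50, 60],        # unrelated to price here
}
selected = screen_by_correlation(ln_price, predictors)
```

In this toy example `sqft` is kept with a strong positive r, `age` is kept with a strong negative r, and `walkscore` is dropped.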
MULTIPLE REGRESSION ANALYSIS:
Multiple regression was run on the data set using PROC REG in SAS.
ANOVA TABLE:
Main Points from SAS output:
-The F-value is 37.32 and its p-value is < 0.05, so the regression model is statistically significant.
-The p-values for the t-statistics of the selected variables are all <= 0.05, so all the variables are significant in the model.
-The R-square is 0.8092, which means 80.92% of the total variability is explained by the age, lotsizesqft, bedrooms, appliances_cnt and numfloors variables.
The regression equation to predict the house price is
Identifying Outliers using residuals
After identifying influential observations, the outliers were removed from the data: the top 3 and bottom 3 cases were removed to see whether this improves the variability explained by the model. The R-square value increased from 0.8092 to 0.8322, so we retain the model refit without the outliers.
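One way to read "top 3 and bottom 3 cases" is as the three largest and three smallest residuals (actual minus predicted). The slides do not state the ranking criterion explicitly, so that reading is an assumption, as are the function name `trim_by_residual` and the toy numbers below.

```python
def trim_by_residual(actual, predicted, k=3):
    """Return the indices kept after dropping the k most negative and
    k most positive residuals (actual - predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    # Indices ordered from the most negative to the most positive residual.
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    # Drop k cases from each end, then restore the original case order.
    return sorted(order[k:len(order) - k])


# Made-up example: ten cases with residuals of varying size.
actual_demo = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
resid_demo = [50, -40, 1, -1, 30, -30, 2, -2, 20, -20]
predicted_demo = [a - r for a, r in zip(actual_demo, resid_demo)]
kept = trim_by_residual(actual_demo, predicted_demo, k=3)
```

The model would then be refit on the kept cases only. Note that R-square values before and after trimming are computed on different data sets, so the increase shows a better fit to the retained cases, not necessarily better predictions for houses like the removed ones.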
Main Points from SAS output
-The F-value is 37.70 and its p-value is < 0.05, so the regression model is statistically significant.
-The p-values for the t-statistics of the selected variables are all <= 0.05, so they are significant in the model, except for numfloors, which could be removed from the model if desired.
-The R-square is 0.8322, which means 83.22% of the total variability is explained by the age, lotsizesqft, bedrooms, appliances_cnt and numfloors variables after removing the outliers.
FINAL MODEL
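The reported F-values and R-square values are linked by a standard identity for a regression with an intercept and p predictors fitted to n cases: F = (R²/p) / ((1 - R²)/(n - p - 1)). The sketch below applies it with p = 5. The sample sizes are not stated in the slides; n = 50 before and n = 44 after removing six cases are assumptions, chosen because they reproduce the reported F-values. The function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of total variability explained: 1 - SSE / SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def f_statistic_from_r2(r2, n, p):
    """Overall F-statistic implied by R-square, n cases and p predictors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Assumed sample sizes (not given in the source): 50 cases, then 44.
f_before = f_statistic_from_r2(0.8092, n=50, p=5)
f_after = f_statistic_from_r2(0.8322, n=44, p=5)
print(f"F before: {f_before:.2f}, F after: {f_after:.2f}")
```

Under these assumed sample sizes the identity gives values that agree with the reported 37.32 and 37.70 to within rounding of R-square.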
Explaining the effect of each independent variable selected by the regression model
Interpreting regression coefficients
-Each regression coefficient measures the average change in Y per unit change in the corresponding independent variable, with the other variables held constant.
Starting point for comparing two houses: same input values, same output value, so no change here:
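As a worked illustration of this interpretation, consider two houses that differ by one unit in a single predictor X_j and are identical otherwise. The first line follows directly from the regression equation Y = b0 + b1*X1 + ... + bp*Xp. The second line is an assumption: it applies only if the fitted response is ln(price) instead of price, which the exploratory slides suggest but the final-model slide does not state.

```latex
\begin{align*}
Y(X_j + 1) - Y(X_j) &= b_j \\
\text{if } Y = \ln(\text{price}):\quad
\frac{\text{price}(X_j + 1)}{\text{price}(X_j)} &= e^{b_j}
\end{align*}
```

In the log-price case, a coefficient of 0.05 (a made-up value) would correspond to a price about 5% higher per unit increase in X_j, since e^{0.05} is approximately 1.051.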
Explaining the effect of each independent variable selected by the regression model (Cont.)
Explaining age coefficient
Explaining lot coefficient
Explaining the effect of each independent variable selected by the regression model (Cont.)
Predicting: New Case 1
Predicting: New Case 2
This can help a house seller or realtor suggest modifications to an existing house in order to obtain a good selling price in the neighborhood. The X (independent) variables of a new case should lie within the minimum and maximum of the data set that was used to fit the regression model, because predictions outside that range (extrapolation) are unreliable.
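The range restriction can be enforced with a simple check before predicting. This is a sketch only: the helper names `predictor_ranges` and `out_of_range` are hypothetical, and the training values are made up (only the variable names come from the model above).

```python
def predictor_ranges(training_rows):
    """Return {name: (min, max)} over the rows used to fit the model."""
    names = training_rows[0].keys()
    return {
        name: (min(row[name] for row in training_rows),
               max(row[name] for row in training_rows))
        for name in names
    }


def out_of_range(new_case, ranges):
    """List the predictors of new_case that fall outside the fitted range."""
    return [name for name, (low, high) in ranges.items()
            if not low <= new_case[name] <= high]


# Made-up training rows (illustrative values only).
training = [
    {"age": 5, "lotsizesqft": 4000, "bedrooms": 2},
    {"age": 22, "lotsizesqft": 7500, "bedrooms": 3},
    {"age": 40, "lotsizesqft": 12000, "bedrooms": 5},
]
ranges = predictor_ranges(training)
flagged = out_of_range({"age": 60, "lotsizesqft": 8000, "bedrooms": 3}, ranges)
```

Here the new case is flagged on `age`, because 60 lies outside the fitted range of 5 to 40; a prediction for it would be an extrapolation.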
PLOT OF ACTUAL VS PREDICTED VALUE
[Figure panels: BEFORE REMOVING OUTLIERS, AFTER REMOVING OUTLIERS]
In the plot of actual vs predicted values after removing the outliers, the association between actual and predicted values looks quite linear.
CUMULATIVE DISTRIBUTION OF PREDICTION ERROR %
The formula is abs(actual - predicted) * 100 / actual. The cumulative chart shows that:
-70% (0.7 on the y-axis) of cases have less than 9% prediction error compared with the actual selling price
-80% of cases have less than 10% prediction error
-90% of cases have less than 12% prediction error
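The error formula and the cumulative shares read off the chart can be computed as in the sketch below. The function names `abs_pct_error` and `share_within` and the four example houses are hypothetical, not taken from the project data.

```python
def abs_pct_error(actual, predicted):
    """Absolute percentage error per case: abs(actual - predicted) * 100 / actual."""
    return [abs(a - p) * 100 / a for a, p in zip(actual, predicted)]


def share_within(errors, threshold):
    """Fraction of cases whose error is strictly below the threshold (in %)."""
    return sum(e < threshold for e in errors) / len(errors)


# Made-up actual and predicted selling prices (in thousands of dollars).
actual_prices = [100, 200, 400, 500]
predicted_prices = [95, 210, 400, 600]

errors = abs_pct_error(actual_prices, predicted_prices)
for threshold in (9, 10, 12):
    share = share_within(errors, threshold)
    print(f"{share:.0%} of cases have less than {threshold}% error")
```

Evaluating `share_within` over a grid of thresholds gives the points of the cumulative distribution curve. If the model is fitted on ln(price), predictions need to be transformed back to the price scale before applying the formula.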
CONCLUSION
We are able to predict the house price with around 90% accuracy (less than 10% prediction error) for most cases, and the model has a good R-square of 0.83, which means 83% of the variability is explained by the model. We are also able to interpret the estimates of the model.
SCOPE OF THE PROJECT:
In future, the latitude, longitude and elevation of the house could be included in the model to predict the house price more accurately. Future work could also include demographic variables such as income, number of children, education, and age of the family group, to explain more of the variability in house pricing and to predict it more effectively.
