
HOUSE PRICE PREDICTION
Shreya Majumder

This research paper focuses on predicting housing prices using various machine learning algorithms, including linear regression, support vector machines, gradient boosting, and random forest. The study emphasizes the importance of accurate predictions for buyers and sellers in the real estate market and demonstrates that a combination of these techniques yields better results than individual algorithms. The findings suggest that the proposed methodology can be further developed to enhance property price predictions across different regions.


Contents

Abstract

Chapter 1
1.1 Introduction

Chapter 2
2.1 Background Studies
2.2 Literature Survey

Chapter 3
3.1 Proposed Methodology

Chapter 4
4.1 Experimental Dataset

Chapter 5
5.1 Results and Output
5.2 Findings and Discussion

Chapter 6
6.1 Conclusions
6.2 Future Work

Abstract

Real estate is the least transparent industry in our ecosystem. Housing prices
change from day to day and are sometimes driven by hype rather than valuation.
Predicting housing prices from real factors is the crux of our research project.
We aim to base our evaluation on every basic parameter that is considered when
determining a price. We use various regression techniques, and our result is not
determined by a single technique; rather, it is the weighted mean of several
techniques, intended to give the most accurate result. The results showed that
this approach yields lower error and higher accuracy than the individual
algorithms applied alone. We also propose using real-time neighborhood details
from Google Maps to obtain an exact real-world valuation.

Chapter 1

1.1 Introduction
The trend of house prices is always a contested topic, because changes in it
have a large effect on the entire economy. A rise in house prices means growth in
non-financial assets, which eventually increases personal wealth, stimulating
consumption and boosting the economy; a drop in house prices, however, limits an
individual's borrowing capacity and crowds out investment as the value of
collateral evaporates. The purpose of house price prediction is to provide a
basis for pricing between buyers and sellers. By viewing sale records, buyers can
judge whether they have been offered a fair price for a house, and sellers can
estimate the price at which they can sell a house along a specific road section.
Consequently, our final research goal is to probe the features that affect
pricing results. We employ an attention mechanism and import heterogeneous data
into our advanced attention model. These data and the introduced methods not only
improve house price prediction performance but also reveal the factors that
affect house prices. In this research paper, we use different machine learning
algorithms to train better machine learning models. Trends in housing costs
reflect the current economic situation and are of direct concern to buyers and
sellers. The actual cost of a house depends on many factors. In this paper, the
sale price of a house is predicted using several algorithms: linear regression,
support vector machines (SVM), gradient boosting, and random forest. We then use
RMSE as the performance metric, apply these algorithms to different datasets, and
identify the most accurate model, that is, the one that predicts best.

Institute of Engineering & Management

Chapter 2
2.1 Background Studies

This research paper formulates a working system for house price prediction.
During data processing, we examined several evaluation points and other errors
and eliminated the irregularities we found.
The model was accurate and achieved a high R-squared score, especially on the
test set. Housing is one of the most essential requirements of human life. As
people's living standards improved, the demand for houses rose over time. With
the increase in population, house prices tend to stay on an upward trend. A few
people view their house as an investment or an asset. On the other hand, the
remaining 90% of people around the world buy their house as a place to live.
House prices are viewed as an indicator of a country's economic status and
political structure; nevertheless, reports note that changes in house prices
have always been an issue for the housing market. Construction and real estate
reports state that in several countries housing has become unaffordable because
of substantial price growth. Quality of life, as well as the national economy,
is affected by increases in house prices. Eventually, this issue will affect
real estate investors who buy houses as an investment for appreciation. Each
rise in housing demand is accompanied by a rise in house prices.
After comparison and contrast with other papers, we find that this paper
conforms to real-life conditions and formulates a model that fits better than
the preceding studies of house price prediction.



2.2 Literature Survey
In this paper, we analyze different machine learning algorithms in order to
train better machine learning models. The recent worldwide financial crisis
renewed sharp interest, in both academic and policy circles, in the role of asset
prices, and housing prices in particular, in economic activity. Many factors
determine the actual price of a house: it depends on the number of rooms, such as
bedrooms, bathrooms, and kitchens, and on whether there is a balcony at the front
or back. It also depends on the area where the house is situated. Until a few
years ago, real estate companies tried to predict property prices manually, with
a special management team in each company responsible for estimating the cost of
any real estate property. This led to losses for both buyers and sellers. A few
years ago the loss was about 25% compared with the present day. The main
objective is therefore to predict house prices not manually but with an AI system
that gives the predicted result without such loss. Researchers in different
states have done similar work, and some examples taken from their research are
given below. They mostly obtained results above 75% accuracy. The following are
some real-world experimental studies done by researchers in other countries.

Recently, several authors have pointed to empirical findings that house prices
can be instrumental in forecasting output (Forni et al., 2003; Stock and
Watson, 2003; Gupta and Das, 2010; Das et al., 2009; 2010; 2011; Gupta
and Hartley, 2013). Housing construction also accounts for a considerable share
of economic activity as expressed in GDP, so house prices can serve as an
indicator of it. Overall, such forecasts are relevant both to business
participants and to fiscal policy. These studies report accuracy above 76%.

There is a large literature on U.S. house prices. Rapach and Strauss (2007) use
an autoregressive distributed lag (ARDL) model framework, containing 25
determinants, to forecast real housing price growth for the individual states of
the Federal Reserve's Eighth District. The autoregressive approach gave 78%
accuracy. They find that ARDL models tend to beat a benchmark AR model.



Gupta and Das (2010) also forecast the recent downturn in real house price
growth rates for the twenty largest U.S. states. Their research gave more
than 80% accuracy. The authors used a basic AI technique, based only on
monthly real house price growth rates, to forecast the downturn over the period
2007:01 to 2008:01.

These are some of the examples we found in our survey for the research carried
out in this paper. Our own research is presented in the chapters that follow.

Chapter 3
3.1 Proposed Methodology
Now that the data has been cleaned of noise and pre-processed, we can use it for
prediction. In this section, we discuss the algorithms used to predict the
property price and how the model is trained and tested.
Algorithms used:
Supervised algorithms of different types are the best choice for house price
prediction.
Supervised algorithm:
Supervised learning, also known as supervised machine learning, is a subcategory
of machine learning and artificial intelligence. It is defined by its use of
labeled datasets to train algorithms to classify data or predict outcomes
accurately.
Here we use different regression techniques to find the most accurate prediction:

1) Linear regression

2) Vector regression (support vector regression, SVR)

3) Gradient boosting regression

4) Random forest regression

1) Linear regression: The first algorithm we selected is linear regression,
where the value of the dependent variable is calculated from multiple
independent variables. The value of the variable to be predicted depends on the
strength of its relationship with the independent variables. This strength is
called correlation.

y = mx + c [y is the dependent variable, x is the independent variable, m is the
slope, and c is the intercept]

With several independent variables this generalizes to
y = m1 x1 + m2 x2 + ... + mn xn + c.

In linear regression, the relationships are modeled using linear predictor
functions whose unknown model parameters are estimated from the data. Such
models are called linear models.
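
As a minimal sketch of how m and c in y = mx + c can be estimated from data, the following uses ordinary least squares on a single independent variable. The function name fit_line and the area/price figures are assumptions made for illustration; they are not taken from the report's dataset or code.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data: floor area (sq ft) and price. Not from the report.
areas = [500.0, 750.0, 1000.0, 1250.0, 1500.0]
prices = [55000.0, 80000.0, 105000.0, 130000.0, 155000.0]

m, c = fit_line(areas, prices)
predicted = m * 1100.0 + c  # price estimate for a 1100 sq ft house
print(m, c, predicted)
```

With more than one independent variable, the same idea is usually solved in matrix form by a library rather than by hand.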

2) Vector regression: Support Vector Machines (SVM) are popularly and widely
used for classification problems in machine learning. The problem of regression
is to find a function that approximates the mapping from an input domain to real
numbers on the basis of a training sample. Support vector regression (SVR)
applies the SVM idea to this problem: it considers the points that lie within
the decision boundary lines, a margin of tolerance around the fit. The best-fit
line is the hyperplane that contains the maximum number of points within that
margin.
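
The idea of points lying within the boundary lines can be illustrated with the epsilon-insensitive loss commonly associated with SVR: errors smaller than a tolerance epsilon cost nothing, and only points outside the tube are penalized. This is a sketch under assumptions: the value of epsilon and the sample numbers are hypothetical, and this is not the report's own code.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero loss inside the tube |error| <= epsilon, linear loss outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def points_inside_tube(y_true, y_pred, epsilon):
    """Count the points whose error falls within the tolerance margin."""
    return sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= epsilon)


# Hypothetical actual and predicted prices (in thousands).
y_true = [100.0, 150.0, 200.0, 260.0]
y_pred = [105.0, 148.0, 230.0, 250.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=10.0)
inside = points_inside_tube(y_true, y_pred, epsilon=10.0)
print(losses, inside)
```

Only the third point, whose error of 30 exceeds the tolerance of 10, contributes to the loss here.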

3) Gradient boosting regression: Gradient boosting works by combining multiple
residual models. At each stage, the error (the difference between the actual and
predicted values) and the X values are taken as input to a residual model. After
the prediction is calculated, its error is fed into the next residual model.
This process continues until the error is minimized. Gradient boosting is a
machine learning technique used in regression and classification tasks, among
others. It gives a prediction model in the form of an ensemble of weak
prediction models, which are typically decision trees. When a decision tree is
the weak learner, the resulting algorithm is called gradient-boosted trees; it
usually outperforms random forest. A gradient-boosted trees model is built in a
stagewise fashion as in other boosting methods, but it generalizes the other
methods by allowing optimization of an arbitrary differentiable loss function.
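
The stagewise fitting of residuals described above can be sketched as follows for squared-error loss. The sketch assumes a single feature, depth-one trees (stumps) as the weak learners, a fixed number of stages in place of a stopping rule, and a learning rate of 0.1; all names and data are hypothetical and not drawn from the report.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_stages):
        # Residual = actual minus current prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Hypothetical training data with at least two distinct x values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 15.0, 30.0, 33.0, 35.0]
model = gradient_boost(xs, ys)

baseline = sum(ys) / len(ys)
baseline_mse = sum((y - baseline) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(baseline_mse, boosted_mse)
```

Each stage can only lower the training error for this loss, which is why the boosted error ends up below that of the mean-only baseline.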

4) Random forest regression: The random forest algorithm works by building many
different decision trees and combining them. A prediction is made by each tree,
and the predictions are averaged. The final prediction is based on this
average. The following flowchart describes the working of the random forest
algorithm.
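
A rough sketch of the averaging step follows. It assumes one feature, bootstrap resampling of the rows, and depth-one trees standing in for full decision trees; with a single feature, the per-split feature sampling that real random forests also use is omitted. Function names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Depth-one regression tree; falls back to the mean if no split exists."""
    mean_all = sum(ys) / len(ys)
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:
        return lambda x: mean_all
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Average the predictions of all trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 12.0, 15.0, 30.0, 33.0, 35.0]
trees = fit_forest(xs, ys)
low = predict_forest(trees, 1.0)
high = predict_forest(trees, 6.0)
print(low, high)
```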

Chapter 4
4.1 Experimental Dataset
In machine learning, for regression algorithms, the data in a dataset must pass
through several steps to obtain accurate predictions. They are:
⮚ Data collection
⮚ Data formation
⮚ Model selection
⮚ Training
⮚ Testing

The chosen dataset is shown below.

[Figure: the chosen dataset]

This dataset was chosen because it has 23 attributes that are strongly
predictive, giving the highest assurance of accuracy. It is a cleaned dataset;
to obtain it, we went through the steps listed above.

Data collection: We collected datasets of house prices and all other applicable
attributes from an online website. We downloaded the .csv files in which the
information was present.

Data formation: The collected data is formatted into suitable datasets. We check
each attribute's collinearity against the mean value. Attributes with
collinearity close to 1.0 are selected (a correlation coefficient cannot exceed
1.0).
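
One way to carry out such a check is to compute the Pearson correlation of each attribute with the price and keep the attributes above a cutoff. This is a sketch under assumptions: the report does not state the exact quantity it correlates against or the cutoff it used, so the choice of price as the reference, the threshold of 0.5, and the column names and values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, always between -1.0 and 1.0."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical attribute columns and prices.
columns = {
    "area": [500.0, 750.0, 1000.0, 1250.0, 1500.0],
    "bedrooms": [1.0, 2.0, 2.0, 3.0, 3.0],
    "house_id": [3.0, 1.0, 5.0, 2.0, 4.0],
}
price = [55000.0, 80000.0, 105000.0, 130000.0, 155000.0]

threshold = 0.5  # assumed cutoff, not from the report
selected = [name for name, values in columns.items()
            if abs(pearson(values, price)) >= threshold]
print(selected)
```

In this made-up data the identifier column has a weak correlation with price and is dropped, while area and bedrooms are kept.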

Model selection: We selected different models in order to minimize the error of
the predicted value. The models used are linear regression, random forest
regression, gradient boosting regression, and vector regression.

Training: The dataset was divided so that x_train and the corresponding y_train
values are used to train the model, while x_test and y_test are held in reserve
for testing.

Testing: The model was run on x_test and its predictions were stored in
y_predict. y_predict was then compared with y_test.
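
The split and the comparison step can be sketched as below. The helper split_rows is a hand-written stand-in for a library split function, the 20% test fraction is an assumption, and the "model" is deliberately trivial (it always predicts the mean training price) so that the data flow between x_train, y_train, x_test, y_test, and y_predict is easy to follow. The rows are made up.

```python
import random


def split_rows(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [X[i] for i in train_idx]
    x_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical rows: [area in sq ft, bedrooms] with made-up prices.
X = [[500 + 100 * i, 1 + i % 4] for i in range(10)]
y = [50000.0 + 10000.0 * i for i in range(10)]
x_train, x_test, y_train, y_test = split_rows(X, y)

# Stand-in model: always predict the mean training price.
mean_price = sum(y_train) / len(y_train)
y_predict = [mean_price for _ in x_test]

# Compare the predictions with the held-out targets.
mean_abs_error = sum(abs(t - p) for t, p in zip(y_test, y_predict)) / len(y_test)
print(len(x_train), len(x_test), mean_abs_error)
```

The important point is that the rows used for the comparison are never seen during training.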

All these steps are followed each time we run a different model on the same
dataset.

Chapter 5
5.1 Results and Output
After data cleaning and testing the models' predictions, we obtained our
predicted results.

The actual code and outputs for house price prediction follow.

We applied the dataset to each regression model, and the predicted results are:

1) Linear regression:

[Code listing: importing libraries and reading files]

[Code listing: data cleaning]

[Code listing: designating the values to be used as training and testing data]

[Code listing: linear regression model]

This is the predicted output of the model.

Output-

Mean squared error- 877146956.1425225

Root mean squared error- 29616.666864158135

R2 value- 0.87
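
For reference, the three quantities reported above can be computed as in the following sketch. The function names and the small sample arrays are illustrative assumptions; only the two reported figures (the mean squared error and the root mean squared error) come from the output above. RMSE is the square root of MSE, so it is non-negative by definition, and the last lines check that the two reported figures agree.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Small made-up example to exercise the functions.
sample_true = [3.0, 5.0, 7.0]
sample_pred = [2.0, 5.0, 9.0]
sample_mse = mse(sample_true, sample_pred)
sample_r2 = r2_score(sample_true, sample_pred)

# Consistency check on the figures reported for linear regression.
reported_mse = 877146956.1425225
reported_rmse = 29616.666864158135
consistent = math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-9)
print(sample_mse, sample_r2, consistent)
```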

2) Vector regression:

We now load the vector regression model and run the code, aiming for a more
accurate result.

R2 value- -0.052043992698062036

Adjusted R2- -0.07455855285307877

RMSE- -80992.68532803077
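
Adjusted R2 corrects R2 for the number of features used. A minimal sketch of the standard formula follows; the sample count and feature count in the example are hypothetical, since the number of rows in the dataset is not stated here. A negative R2 means the model fits worse than always predicting the mean price, and the adjustment pushes the score lower still.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical sizes: 101 rows and 10 features.
good_fit = adjusted_r2(0.9, n_samples=101, n_features=10)
poor_fit = adjusted_r2(-0.05, n_samples=101, n_features=10)
print(good_fit, poor_fit)
```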



[Figure: SVM chart]

Output for this model:

R2 value- -0.04491082465181462

Adjusted R2- -0.1406943169115642

RMSE- -77630.93962507547

3) Random forest:

[Code listing and output for the model]

R2 value- 0.9770202518773589

Adjusted R2- 0.9765284673844619

[Figure: random forest charts, price vs predicted price]

Output prediction:

R2 value- -0.04491082465181462

Adjusted R2- -0.1406943169115642

RMSE- -77630.93962507547

4) Gradient boosting:

[Code listing and accuracy result of the prediction]

Accuracy- 86.88557238296252



5.2 Findings and Discussion
After applying all the regression models to a single dataset, we obtained a
different prediction result from each. The highest result came from the linear
regression model, at 87% accuracy. Random forest gave 77% accuracy, vector
regression gave 80%, and the gradient boosting regression model gave 86%.

Regression model                       Predicted accuracy
1) Linear regression model             87%
2) Vector regression model             80%
3) Random forest regression model      77%
4) Gradient boosting regression model  86%



Comparison graph for the different regression models:

[Figure: bar chart with one bar each for gradient boosting, random forest, the
vector regression model, and the linear regression model; the axis is labelled
RMSE and runs from 72 to 88]

Chapter 6
6.1 Conclusions
In this project report, our team developed a method to predict property rates
over a particular region. We selected a dataset from Kaggle and Magicbricks and
worked with 23 attributes and 4 algorithms. We processed the dataset to remove
any irregularities, then fit the models on a portion of the dataset in order to
analyze them. Thereafter, we calculated the R2 score for our prediction models.
This system can be taken further to predict property prices in more cities and
rural areas all over the world. We can also go live by turning the project into
a website, after building a good web interface and adding more features to the
system. A system like ours could also be developed to predict the appreciation
in a property's price.

6.2 Future Work


We have carried out our research project in such a way that it leaves
considerable scope for future extension. In this project we have only discussed
how different regression models in AI/ML can give different predictions when the
same dataset is used for all of them. In future, the project can be extended so
that all the predictions are merged into a single prediction. Using Google Maps,
buyers could then easily find the right room or the ideal home, since the price
prediction for every area will be accurate.
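
One simple way to merge the predictions of several models into a single prediction is a weighted mean. The sketch below weights each model by the inverse of its RMSE, so that models with lower error count for more. That weighting rule is an assumption chosen for illustration, not a method specified in this report, and the model names, predictions, and RMSE values are hypothetical.

```python
def inverse_rmse_weights(rmse_by_model):
    """Give each model a weight proportional to 1 / RMSE."""
    return {name: 1.0 / value for name, value in rmse_by_model.items()}


def weighted_mean_prediction(predictions, weights):
    """Combine per-model predictions for one house into a single price."""
    total = sum(weights[name] for name in predictions)
    return sum(predictions[name] * weights[name] for name in predictions) / total


# Hypothetical per-model predictions for one house and validation RMSEs.
predictions = {"linear": 100000.0, "gradient_boosting": 110000.0}
rmse_by_model = {"linear": 20000.0, "gradient_boosting": 10000.0}

weights = inverse_rmse_weights(rmse_by_model)
combined = weighted_mean_prediction(predictions, weights)
print(combined)
```

Here the gradient boosting prediction receives twice the weight of the linear one, so the combined price lies closer to it.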
