house price prediction

The document presents a mini project report on 'House Price Prediction' by a group of students from SNS College of Technology, focusing on using machine learning algorithms to provide accurate and unbiased property valuations. It emphasizes the integration of various factors such as location, property features, and real-time market data to enhance prediction accuracy and promote transparency in real estate transactions. The project aims to create a fair platform for buyers and sellers, leveraging data-driven insights to foster trust and informed decision-making in the housing market.

Uploaded by

selvaa
Copyright
© All Rights Reserved

HOUSE PRICE PREDICTION

BY

SANTHOSH S (22CS138)
SHRIYANS S (22CS149)
SURUTHIKA C (22CS158)
VIJAY A (22CS166)

MINI PROJECT REPORT

Submitted to the

FACULTY OF COMPUTER SCIENCE & ENGINEERING

In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

SNS COLLEGE OF TECHNOLOGY, COIMBATORE-35

(AN AUTONOMOUS INSTITUTION)

Department of Computer Science & Engineering

NOVEMBER 2024

BONAFIDE CERTIFICATE

Certified that this Project Report titled “HOUSE PRICE PREDICTION” is the bonafide record
of “SANTHOSH S, SHRIYANS S, SURUTHIKA C, VIJAY A”, who carried out the Project
Work under our supervision. Certified further that, to the best of my knowledge, the work reported
herein does not form part of any other project report or dissertation on the basis of which a degree
or award was conferred on an earlier occasion on this or any other candidate.

PROJECT GUIDE
Mr SELVAKUMAR N AP/CSE
Assistant Professor,
Department of Computer Science & Engg.,
SNS College of Technology,
Coimbatore-641 035.

HEAD OF THE DEPARTMENT
Dr.K.SANGEETHA ASP/CSE
Head of the Department,
Department of Computer Science & Engg.,
SNS College of Technology,
Coimbatore-641 035.

Submitted for the Viva-Voce examination held at SNS COLLEGE OF TECHNOLOGY
on ……………………………..

Examiner 1 Examiner 2

ACKNOWLEDGMENT

We extend our heartfelt gratitude to the management of SNS College of Technology for providing
every kind of support towards the completion of this mini project.

We record our indebtedness to our Director, Dr.V.P.Arunachalam, and our Principal,
Dr.S.Chenthur Pandian, for their guidance and sustained encouragement towards the successful
completion of this mini project.

We are highly grateful to Dr.L.M.Nithya, Professor & Dean/CSE, IT & AIML, for her valuable
suggestions and guidance throughout the course of this project. Her positive approach offered
constant help in every possible way from the beginning.

We are profoundly grateful to Dr.K.Sangeetha, Associate Professor & Head, Department of
Computer Science & Engineering, for her consistent encouragement and direction, which helped us
improve our mini project and complete the project work on time.

Words are inadequate to express our thanks to the Project Coordinator, Mrs. Subhashree P,
Assistant Professor, Department of Computer Science & Engineering, for her encouragement and
cooperation in carrying out the mini project work.

We take immense pleasure in expressing our gratitude to our project guide, Mr. Selvakumar N,
Assistant Professor, Department of Computer Science & Engineering, for his remarkable guidance
and useful suggestions, which helped us complete the project work on time.

We also extend our thanks to the other faculty members, our parents, and our friends for their
moral support in helping us complete this mini project successfully.

ABSTRACT

This project introduces a cutting-edge solution for predicting house prices using machine
learning algorithms, designed to transform the way real estate transactions are conducted. By
incorporating key features such as location, BHK (bedroom-hall-kitchen) configuration,
neighborhood area, and land area, the model provides reliable price predictions tailored to
individual property characteristics. In addition to these primary features, advanced algorithms are
implemented to evaluate critical secondary factors, including the quality of the neighborhood,
proximity to essential services like schools, hospitals, and public transport, and current market
trends. These factors enable the model to deliver more precise predictions that reflect the true
market value of a property.

A unique strength of the system lies in its ability to ensure fairness and transparency. The
model provides unbiased, data-driven price estimates, effectively addressing the issue of price
manipulation, which is common in traditional real estate practices. By relying on objective data
and statistical modeling, it helps buyers gain confidence in their decisions while offering sellers a
credible platform to showcase their properties. The project is further enhanced with real-time data
integration, allowing it to adapt to the ever-changing dynamics of the housing market. Real-time
updates ensure that predictions remain current, capturing trends like rising demand in a particular
area or seasonal fluctuations in property prices. This feature ensures that buyers and sellers can
make informed decisions based on the latest market insights.

The ultimate goal of this project is to provide a transparent and equitable platform for all
stakeholders in the real estate ecosystem. Buyers benefit from trustworthy price insights that guide
their investment decisions, while sellers gain a reliable estimation of their property’s value, helping
them set competitive prices. For real estate agents, the platform serves as a valuable tool for
negotiating deals that are fair to all parties involved. By promoting data-driven decision-making,
this machine learning-based system paves the way for a more efficient, informed, and trustworthy
real estate industry. It bridges the gap between buyers, sellers, and market realities, fostering
transactions built on trust and mutual benefit.

TABLE OF CONTENTS

CHAPTER NO    TITLE

              ABSTRACT
              LIST OF FIGURES
              LIST OF ABBREVIATIONS

1             INTRODUCTION

2             LITERATURE SURVEY

3             PROJECT ANALYSIS
              3.1 Existing System
              3.1.1 Drawbacks
              3.2 Problem Statement
              3.3 Proposed System
              3.3.1 Advantages

4             SYSTEM SPECIFICATION
              4.1 Software Specification
              4.1.1 Python
              4.1.2 User Interface
              4.1.2.1 Streamlit
              4.1.3 Algorithm
              4.1.3.1 Random Forest
              4.1.3.2 Scikit-learn
              4.1.4 Libraries
              4.1.4.1 NumPy
              4.1.4.2 Pandas
              4.1.5 Dataset
              4.1.5.1 Kaggle

5             PROJECT DESCRIPTION
              5.1 System Design
              5.2 Module Description
              5.2.1 Data Collection and Preprocessing
              5.2.2 Feature Selection and Engineering
              5.2.3 Machine Learning Model Development
              5.2.4 Prediction and Decision Support
              5.2.5 Transparency and Ethical Considerations
              5.2.6 User Interface and Accessibility
              5.2.7 Continuous Learning and Adaptation

6             IMPLEMENTATIONS
              6.1 System Model

7             CONCLUSION AND FUTURE WORKS

              APPENDIX I - SOURCE CODE
              APPENDIX II - SCREENSHOT
              REFERENCES

LIST OF FIGURES

FIG. NO    TITLE

1.1        ESTIMATION OF HOUSE PRICE
3.1        WORKFLOW OF THE PROJECT
3.2        HOUSE PRICE PREDICTION
4.1        PYTHON
4.2        STREAMLIT
4.3        SCIKIT-LEARN
4.4        NUMPY
4.5        PANDAS
5.1.1      FLOW CHART
5.1.2      FLOW CHART
6.1        PREDICTION OF HOUSE PRICE
A.2.1      HOME PAGE
A.2.2      HOUSE PRICE PREDICTION
A.2.3      SOURCE CODE
A.2.4      SOURCE CODE

LIST OF ABBREVIATIONS

ABBREVIATION    EXPANSION

LSTM            Long Short-Term Memory
SVM             Support Vector Machine
CMA             Comparative Market Analysis
AVM             Automated Valuation Model
AI              Artificial Intelligence
NumPy           Numerical Python

CHAPTER 1
INTRODUCTION

The real estate market is one of the most dynamic and complex sectors globally, with
property prices influenced by many factors such as location, market demand,
amenities, and neighborhood quality. Traditional methods of property valuation often rely
on subjective assessments or limited data, which can lead to inconsistencies, inaccuracies,
and even manipulation in pricing. This creates challenges for buyers, sellers, and real estate
agents, resulting in a lack of trust and transparency in transactions.

To address these challenges, this project introduces a Machine Learning-Based House
Price Prediction System that uses data analytics to offer precise
and unbiased property valuations. By incorporating critical features such as location, BHK
configuration, neighborhood area, and land size, the system provides predictions that are not
only accurate but also reflective of real market conditions. The inclusion of additional
factors such as proximity to services, neighborhood quality, and current market trends
further enhances the model’s reliability.

One of the standout features of this project is its focus on real-time data integration,
which keeps the model adaptable to the fast-changing dynamics of the real
estate market. Unlike static valuation methods, this approach captures fluctuations in
property prices due to seasonal demand shifts, economic conditions, and emerging market
trends, providing stakeholders with relevant and up-to-date information.

This system is not merely a tool for prediction; it is a step toward creating a more
transparent and equitable real estate ecosystem. By offering data-driven insights, the model
limits subjective bias, reduces the risk of price manipulation, and promotes fair transactions. For
buyers, it serves as a trustworthy guide to assess a property's true value. For sellers, it
provides a credible platform to price their properties competitively.

Figure 1.1 Estimation of House Price

The goal of this project is to transform real estate transactions by fostering trust,
reducing uncertainties, and ensuring that decisions are based on accurate and comprehensive
data. As technology continues to reshape industries, this solution is an
example of how machine learning can improve traditional practices, making them more
efficient, transparent, and fair.

Another advantage of this approach is its ability to handle large datasets
encompassing diverse variables. For instance, the system can process information on
historical sales data, population growth in a region, infrastructure development, and even
environmental factors like air quality or natural disaster risks. By analyzing this rich data
pool, the model can uncover patterns and relationships that are often overlooked, providing
a more nuanced and accurate prediction of property values.

Transparency and fairness are at the heart of this initiative. In many real estate
transactions, buyers and sellers rely heavily on third-party valuations, which may not always
be impartial. By utilizing a data-driven approach, this system reduces the
influence of subjective opinions or market manipulation. Buyers gain confidence in making
investments based on reliable insights, while sellers can trust the system to reflect the true
value of their property without external biases.

Moreover, the integration of such a system can foster trust and collaboration among
all stakeholders in the real estate ecosystem. Real estate agencies, brokers, and developers
can use this platform to provide consistent and transparent pricing strategies, enhancing their
credibility in the market. Governments and policymakers can also leverage these insights to
monitor market trends, identify housing affordability issues, and implement targeted
interventions to support sustainable development.

CHAPTER 2
LITERATURE SURVEY

[1] Machine Learning for Real Estate Price Prediction

Smith et al. (2020) conducted a comprehensive study on the application of machine
learning models for predicting real estate prices. The researchers utilized supervised learning
algorithms such as Random Forest, Support Vector Machines (SVM), and Gradient
Boosting to predict house prices based on features like location, property size, and number
of rooms (BHK). The study focused on datasets collected from metropolitan cities with
diverse economic conditions. Among the models tested, Random Forest emerged as the
most accurate due to its ability to handle both categorical and numerical data while
effectively managing missing values and outliers. The research highlighted the importance
of hyperparameter tuning in enhancing prediction accuracy and demonstrated the scalability
of these models for large datasets.
ADVANTAGES:
• Comprehensive Analysis: The study utilized multiple supervised learning algorithms
(Random Forest, SVM, Gradient Boosting), enabling a thorough comparison and
validation of their effectiveness in predicting real estate prices.
• Practical Insights: Highlighting Random Forest's ability to handle categorical and
numerical data, missing values, and outliers provided valuable insights into its
robustness and suitability for real-world applications.
DISADVANTAGES:
• Limited Scope of Data Diversity: The study focused only on datasets from
metropolitan cities, which might not generalize well to rural or less economically
diverse regions.
• Algorithmic Bias: While Random Forest was deemed the most accurate, the reliance
on hyperparameter tuning for all algorithms could introduce biases or mask other
potentially impactful variables.
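The hyperparameter tuning step that the study highlights can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available and uses synthetic data; the feature names, the price formula, and the grid values are illustrative assumptions and do not reproduce the setup of the cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): price depends on area and BHK plus noise.
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(500, 3000, n)   # square feet
bhk = rng.integers(1, 5, n)        # 1 to 4 bedrooms
price = 4000 * area + 300000 * bhk + rng.normal(0, 50000, n)
X = np.column_stack([area, bhk])

# Small hypothetical grid; a real study would search a wider range.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation for each combination
    scoring="r2",
)
search.fit(X, price)

best_params = search.best_params_   # combination with the best mean CV score
best_score = search.best_score_     # mean cross-validated R^2 of that combination
```

Each grid point is scored by cross-validation rather than on the training data, so the selected parameters are less likely to reflect overfitting.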

[2] Impact of Geographic and Socioeconomic Factors on Property Valuation

Chau and Wong (2019) explored the influence of geographic and socioeconomic
variables, including proximity to public services such as schools, hospitals, and
transportation hubs, on real estate pricing. Their study combined geospatial analysis with
machine learning models, such as regression trees and clustering algorithms, to capture the
nuanced impact of these variables. The findings indicated that properties located near
essential amenities and in low-crime neighborhoods were valued significantly higher. The
authors emphasized the importance of integrating socioeconomic factors into predictive
models to provide a holistic understanding of property valuation, improving the accuracy
and relevance of price predictions.
ADVANTAGES:
• Holistic Understanding of Property Valuation: The study emphasizes integrating
socioeconomic factors, such as proximity to public services and neighborhood safety,
which provides a comprehensive approach to property pricing, leading to more accurate
and relevant price predictions.
• Use of Advanced Analytical Methods: By combining geospatial analysis with
machine learning models like regression trees and clustering algorithms, the study
effectively captures complex relationships between variables, enhancing the predictive
power of the model.
DISADVANTAGES:
• Potential Data Limitations: The study's reliance on geographic and socioeconomic
data may face limitations in data availability or accuracy, particularly for regions where
such information is not easily accessible or reliable, affecting the robustness of the
predictions.
• Model Complexity and Interpretability: The use of complex machine learning
models such as regression trees and clustering algorithms may make the results less
interpretable for stakeholders without technical expertise, limiting the practical
application of the findings for non-experts.
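One common way to turn "proximity to public services" into a numeric model input is the great-circle distance from a property to its nearest amenity. The sketch below uses the haversine formula with the Python standard library only. The function names and coordinates are hypothetical, and this is not claimed to be the geospatial method used in the cited study.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_amenity_km(house, amenities):
    """Distance from a house to the closest amenity in a list of (lat, lon) pairs."""
    return min(haversine_km(*house, *amenity) for amenity in amenities)

# Hypothetical coordinates, for illustration only.
house = (11.02, 76.96)
schools = [(11.03, 76.95), (11.10, 77.02)]
km_to_school = nearest_amenity_km(house, schools)
```

The resulting distance can be added as one more column alongside property-specific features such as area and BHK.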

[3] Real-Time Integration of Market Trends in Price Prediction Models


Kumar and Gupta (2021) introduced a novel approach to incorporating real-time
market data into house price prediction systems. The study employed time-series forecasting
models, such as ARIMA and Long Short-Term Memory (LSTM) networks, to analyze
fluctuations in property prices over time. These predictions were then combined with
traditional machine learning models to provide comprehensive price forecasts. The
integration of market indicators, including demand-supply ratios, housing inventory levels,
and economic conditions like interest rates, allowed the system to adapt dynamically to
changing trends. The authors demonstrated that real-time data integration improved the
responsiveness of prediction models, making them suitable for fast-evolving markets.
ADVANTAGES:
• Dynamic Adaptation to Market Changes: By incorporating real-time market data
such as demand-supply ratios, housing inventory levels, and economic conditions, the
approach offers better responsiveness to fluctuations in property prices. This makes the
system more reliable and adaptable to rapidly changing market trends.
• Enhanced Prediction Accuracy: The use of both time-series forecasting models
(ARIMA, LSTM) and traditional machine learning models allows for a more
comprehensive analysis, potentially leading to more accurate and robust house price
predictions.
DISADVANTAGES:
• Complexity in Data Integration: The integration of real-time data with traditional
models requires sophisticated data processing techniques, which could make the
system more complex to implement and maintain, especially for users with limited
technical expertise.
• Dependence on Data Quality: The accuracy of the predictions heavily depends on the
quality and timeliness of the market data. Inaccurate or outdated data could lead to poor
forecasting, undermining the effectiveness of the model.
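The idea of feeding a market trend signal into a conventional model can be sketched without a full forecasting model. The example below is not ARIMA or LSTM. A trailing moving average is swapped in as the simplest possible trend feature, and the monthly prices are hypothetical.

```python
def trailing_mean(series, window):
    """Mean of the previous `window` values at each position.

    Only past values are used, so the feature for month i never sees
    the price of month i itself. Positions without enough history get None.
    """
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)
        else:
            past = series[i - window:i]
            out.append(sum(past) / window)
    return out

# Hypothetical monthly median prices for one locality, in lakhs.
monthly_median_price = [50.0, 52.0, 51.0, 55.0, 58.0, 60.0]
trend = trailing_mean(monthly_median_price, window=3)
```

Restricting the window to earlier months matters: a trend feature that includes the current month would leak the value being predicted.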

[4] Predictive Accuracy of Ensemble Methods for Housing Prices


Lee et al. (2020) investigated the effectiveness of ensemble methods, such as Random
Forest, XGBoost, and LightGBM, in predicting housing prices. The study compared these
advanced methods to simpler models like Multiple Linear Regression and Decision Trees.
It was observed that ensemble techniques consistently outperformed traditional models due
to their ability to handle complex, nonlinear relationships and interactions between features.
The authors highlighted how XGBoost provided the best balance of speed and accuracy,
particularly when combined with effective feature selection and engineering. The study
emphasized the use of cross-validation techniques to reduce overfitting and ensure the
robustness of predictions.
ADVANTAGES:
• Comparison of Models: The study provides a comprehensive comparison of advanced
ensemble methods (Random Forest, XGBoost, and LightGBM) against traditional
models (Multiple Linear Regression and Decision Trees), allowing readers to
understand the strengths and weaknesses of each approach for housing price prediction.
• Focus on Practical Techniques: The emphasis on feature selection, engineering, and
cross-validation techniques provides practical insights for improving model
performance and reducing overfitting, which is valuable for real-world applications.
DISADVANTAGES:
• Lack of Specifics on Data or Domain Context: The study does not provide
detailed information about the dataset or the domain context of the housing price
prediction, making it difficult to assess how applicable the findings are to different
types of data or geographical regions.
• Limited Exploration of Hyperparameter Tuning: While the paper highlights
XGBoost’s performance, it does not examine the impact of hyperparameter
tuning in depth, which can significantly affect model outcomes in ensemble methods.
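The kind of comparison described in this study can be sketched with k-fold cross-validation. XGBoost and LightGBM are separate packages and are not used here. As an assumption, scikit-learn's RandomForestRegressor stands in for the ensemble family, and the nonlinear synthetic data is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
n = 300
area = rng.uniform(500, 3000, n)   # square feet
dist = rng.uniform(0, 20, n)       # km to city centre
# Nonlinear synthetic target: the value of each square foot decays with distance.
price = area * 5000 * np.exp(-dist / 10) + rng.normal(0, 100000, n)
X = np.column_stack([area, dist])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
# Mean R^2 over the five held-out folds for each model.
scores = {
    name: cross_val_score(model, X, price, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}
```

Because the target contains an interaction between area and distance, the tree ensemble is expected to score higher than the linear model on this data. On a dataset with purely linear effects the ranking could differ.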

[5] Role of Neighborhood Quality and Amenities in Predicting House Prices


Patel et al. (2022) conducted an in-depth analysis of neighborhood attributes such as
green spaces, crime rates, and proximity to essential services like supermarkets, schools,
and recreational areas. These variables were integrated into machine learning models,
alongside property-specific features, to assess their impact on house price predictions. The
study found that incorporating neighborhood quality improved the model’s accuracy by 15%
compared to models relying solely on traditional features like location and property size.
The researchers demonstrated that clustering similar neighborhoods using techniques such
as k-means clustering before applying predictive models improved the interpretability and
performance of their predictions.
ADVANTAGES:
• Improved Prediction Accuracy: The incorporation of neighborhood attributes (e.g.,
green spaces, crime rates, proximity to essential services) alongside property-specific
features led to a 15% improvement in model accuracy. This demonstrates the added
value of including additional contextual factors in house price prediction.
• Enhanced Interpretability: By using clustering techniques like k-means to group
similar neighborhoods, the researchers improved the interpretability of the model,
making it easier to understand the relationship between neighborhood quality and
house prices.
DISADVANTAGES:
• Complexity and Computation: The addition of neighborhood attributes and the use
of clustering techniques may increase the complexity of the model, requiring more data
preprocessing and computational resources, potentially making the model harder to
deploy in real-world scenarios.
• Data Availability and Quality: Neighborhood-specific data, such as crime rates and
proximity to services, may not always be available or accurate for all regions, which
could limit the applicability or generalization of the model across different
geographical areas.
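Grouping similar neighborhoods before prediction can be sketched with k-means. The attribute values below are hypothetical, and the choice of two clusters is an assumption made for the example. The resulting cluster label could then be passed to a price model as an additional categorical feature.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical neighbourhood attributes:
# [green space %, crimes per 1000 residents, km to nearest school]
neighbourhoods = np.array([
    [30.0, 2.0, 0.5],
    [28.0, 3.0, 0.8],
    [32.0, 2.5, 0.6],
    [5.0, 12.0, 3.0],
    [7.0, 14.0, 2.5],
    [6.0, 11.0, 3.5],
])

# Scale first so that no single attribute dominates the distance computation.
scaled = StandardScaler().fit_transform(neighbourhoods)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_id = kmeans.fit_predict(scaled)  # one cluster label per neighbourhood
```

The numeric value of a label carries no meaning on its own; only membership in the same group does.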

[6] Data Preprocessing and Feature Engineering in Real Estate Price Prediction
Garcia et al. (2021) explored the importance of preprocessing real estate data to
enhance the performance of machine learning models. Key preprocessing techniques
discussed included handling missing values, removing outliers, and scaling numerical
features to ensure uniformity. The study also delved into feature engineering, creating new
variables such as "price per square foot" or "distance to city center," which added significant
predictive power to the models. The researchers compared various preprocessing methods
and found that rigorous data cleaning and transformation processes contributed to better
model stability and accuracy, particularly for ensemble and deep learning-based methods
like neural networks.

ADVANTAGES:
• Enhanced Model Performance: The study emphasizes the importance of
preprocessing techniques like handling missing values, removing outliers, and scaling
features, which help improve the accuracy and stability of machine learning models,
particularly ensemble and deep learning methods.
• Effective Feature Engineering: By introducing new features such as "price per square
foot" and "distance to city center," the study demonstrates how feature engineering can
significantly boost the predictive power of the models.

DISADVANTAGES:
• Data Dependency: The effectiveness of the preprocessing techniques and feature
engineering relies heavily on the quality and characteristics of the real estate data,
so the reported gains may not carry over to all datasets or markets.
• Time-Consuming Process: Implementing rigorous data cleaning and transformation
steps can be resource-intensive and time-consuming, especially for large datasets,
which may increase the overall project timeline.
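The preprocessing steps listed in this study (handling missing values, removing outliers, scaling, and deriving "price per square foot") can be sketched with pandas. The column names, the data, and the 1.5 × IQR outlier rule are assumptions chosen for illustration; the study does not state which specific rules it used.

```python
import pandas as pd

# Hypothetical raw listings; one missing area and one implausible price.
raw = pd.DataFrame({
    "area_sqft": [1000.0, 1200.0, None, 1500.0, 1100.0, 1300.0],
    "price": [5_000_000, 6_200_000, 5_900_000, 7_400_000, 90_000_000, 6_400_000],
})

def preprocess(df):
    df = df.copy()
    # 1. Fill missing areas with the column median.
    df["area_sqft"] = df["area_sqft"].fillna(df["area_sqft"].median())
    # 2. Engineer a new variable.
    df["price_per_sqft"] = df["price"] / df["area_sqft"]
    # 3. Drop rows outside 1.5 * IQR on the engineered variable.
    q1, q3 = df["price_per_sqft"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["price_per_sqft"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df[keep].reset_index(drop=True)
    # 4. Min-max scale area to the range [0, 1].
    lo, hi = df["area_sqft"].min(), df["area_sqft"].max()
    df["area_scaled"] = (df["area_sqft"] - lo) / (hi - lo)
    return df

clean = preprocess(raw)
```

One caution: price per square foot is computed from the target itself, so here it is used only to detect outliers. Using it directly as a model input for the same property would leak the answer.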

CHAPTER 3
PROJECT ANALYSIS

3.1 EXISTING SYSTEM


Existing systems for house price estimation typically rely on traditional methods, such
as manual appraisals by real estate agents or simple comparative market analysis (CMA),
which involve comparing the target property to similar properties recently sold in the same
area. These methods often depend on subjective judgment and limited data, which can lead
to inaccuracies or biases in pricing. Many systems also focus on basic factors like location
and square footage but fail to integrate more nuanced features, such as the condition of the
property, neighborhood trends, or macroeconomic factors that could influence price
fluctuations. Some automated valuation models (AVMs) exist, but they often rely on a
limited dataset or out-of-date market information, which may not fully capture the current
real estate dynamics. Additionally, these systems may not be transparent about how they
arrive at price estimates, leading to trust issues with users. In contrast, the proposed system
offers a more comprehensive, data-driven approach that ensures accuracy, objectivity, and
fairness by leveraging broader datasets and machine learning techniques to make real-time
price predictions.
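For reference, the comparative market analysis described above reduces to simple arithmetic in its most basic form: average the price per square foot of comparable recent sales and multiply by the target property's area. The sketch below makes that assumption explicit. The function name and figures are hypothetical, and a practising appraiser would apply further manual adjustments.

```python
from statistics import mean

def cma_estimate(target_area_sqft, comparables):
    """Estimate a price from comparable sales given as (price, area_sqft) pairs."""
    rates = [price / area for price, area in comparables]  # price per sq ft
    return mean(rates) * target_area_sqft

# Hypothetical recent sales in the same locality.
comps = [(5_000_000, 1000), (6_000_000, 1200), (4_500_000, 900)]
estimate = cma_estimate(1100, comps)
```

The estimate depends entirely on which comparables are chosen, which is where the subjectivity discussed in this section enters.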

3.1.1 DRAWBACKS

• Subjectivity and Bias

Traditional methods like manual appraisals and Comparative Market Analysis
(CMA) are highly dependent on the expertise and judgment of real estate agents or
appraisers, which introduces subjectivity and the potential for bias. This can lead to
inconsistent and inaccurate price estimates, as the valuation may vary depending on the
appraiser's experience or personal perspective.

• Limited Data and Features

Many existing systems rely on basic factors such as location, square footage, and
recent sales data. However, they often fail to account for other important features such
as property condition, neighborhood trends, and macroeconomic factors, which can
significantly influence house prices. As a result, the price predictions may be incomplete
or outdated.

• Inaccuracy in Dynamic Markets

Traditional systems often struggle to adapt quickly to changes in the real estate
market, especially in volatile conditions. They may use historical data that doesn’t reflect
recent shifts in market trends, leading to overvalued or undervalued property estimates.

• Lack of Transparency

Many existing automated valuation models (AVMs) do not provide clear explanations
of how their estimates are derived. This lack of transparency can erode user trust,
as users may not understand the logic behind the price prediction or feel
confident in its accuracy.

• Vulnerability to Manipulation

Real estate agents and sellers may manipulate or adjust their pricing strategies based
on their own goals or interests, which can lead to inflated or misleading house price
estimates that do not reflect the true market value.

• Inability to Handle Complex Variables

Current systems often fail to process complex or non-linear relationships between
various factors, such as the interaction between market conditions and property features.
Advanced factors like infrastructure developments, zoning laws, or neighborhood
gentrification are typically not incorporated into the predictions, which can make these
systems less reliable.

3.2 PROBLEM STATEMENT

Real estate markets often suffer from a lack of accurate and accessible methods for
predicting house prices, which can result in overvaluation or undervaluation of properties.
These inaccuracies can negatively impact both buyers and sellers, leading to poor
investment decisions, financial losses, and an overall lack of confidence in the market.

House prices are influenced by a complex combination of factors, including location,
property size, neighborhood features, local amenities, and market trends. To accurately
predict house prices, it is essential to consider these variables collectively, which requires a
more advanced, data-driven approach rather than relying on traditional methods.

Traditional methods of house price estimation often depend on intuition or limited
historical data, which can fail to capture the nuances of a dynamic market. In contrast, a
data-driven prediction model can provide a more accurate and reliable estimate by
analyzing large amounts of historical data alongside current market trends. By integrating
machine learning algorithms and statistical techniques, such a model can account for a wide
range of factors that influence house prices. This enables real estate stakeholders, including
buyers, sellers, and investors, to make more informed decisions based on empirical evidence
and projections for future market conditions. The ability to forecast price trends with higher
accuracy can also reduce stakeholders' exposure to market volatility and support stability in the real
estate industry. This analytical approach offers a clear advantage over traditional
methods, driving both transparency and efficiency in property transactions.

3.3 PROPOSED SYSTEM

The proposed system aims to revolutionize the real estate market by offering an
automated and unbiased approach to estimating house prices. Traditional methods of pricing
properties often rely on subjective assessments, which can lead to inaccuracies and potential
disputes. This system addresses those challenges by analyzing key features that significantly
influence property values, such as location, the number of bedrooms, hall size, land area,
and other relevant factors like proximity to amenities, schools, and transport links. By
incorporating these attributes, the system ensures precise and data-backed price predictions,
serving as a reliable tool for all stakeholders in the housing market.

Figure 3.1 Workflow of the project

A significant advantage of this system is its role as a data-driven decision-support
tool. It leverages extensive historical housing data and current market trends to provide
accurate predictions of property values. The use of advanced machine learning algorithms
enables the system to identify patterns and correlations in the data, which are then used to
make predictions. This allows potential buyers to evaluate whether a property is priced fairly
and empowers sellers to set competitive and realistic prices. Moreover, the system's ability
to adapt to evolving market conditions ensures its predictions remain relevant and useful
over time.
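A minimal sketch of such a model is given below, assuming scikit-learn and pandas and using the Random Forest algorithm named in the system specification. The column names, the handful of training rows, and the prices are hypothetical and serve only to show the shape of the workflow. This is not the project's actual source code.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training rows; a real model would need far more data.
data = pd.DataFrame({
    "location": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "bhk": [1, 2, 1, 2, 3, 3, 2, 1],
    "area_sqft": [600, 1000, 650, 1100, 1500, 1600, 950, 700],
    "price_lakhs": [30, 52, 45, 80, 78, 118, 50, 49],
})
features = ["location", "bhk", "area_sqft"]

model = Pipeline([
    # One-hot encode the categorical location; pass numeric columns through.
    ("encode", ColumnTransformer(
        [("loc", OneHotEncoder(handle_unknown="ignore"), ["location"])],
        remainder="passthrough",
    )),
    ("forest", RandomForestRegressor(n_estimators=100, random_state=0)),
])
model.fit(data[features], data["price_lakhs"])

# Price estimate for a new, unseen property.
query = pd.DataFrame([{"location": "B", "bhk": 2, "area_sqft": 1050}])
predicted = float(model.predict(query)[0])
```

Keeping the encoder and the regressor in one pipeline ensures that a new query is transformed in exactly the same way as the training data.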

Figure 3.2 House Price Prediction

An essential feature of this system is its emphasis on transparency and fairness, which
are often lacking in traditional real estate dealings. Sellers or agents can sometimes
manipulate property prices for personal gain, creating an imbalance in the market. This
automated system reduces such biases by basing its estimates solely on data, supporting
objectivity. Buyers can rely on these estimates to make informed decisions,
fostering trust and promoting ethical practices in real estate transactions. This system not
only empowers individuals but also contributes to a more balanced and equitable housing
market.

To enhance usability, the system can be designed with a user-friendly interface,
allowing even non-technical users to access and interpret its insights easily. Features like
interactive dashboards, customizable inputs, and visual data representations can make the
system accessible to a wide audience, including first-time buyers, seasoned investors, and
real estate professionals. Additionally, by providing real-time insights, the system can
significantly reduce the time spent on negotiations and decision-making processes, making
transactions smoother and more efficient.

In conclusion, the proposed system has the potential to transform the real estate
market by promoting informed decision-making, fostering fairness, and improving
transparency. Its use of data and machine learning algorithms can address
longstanding challenges in property pricing, making it a valuable tool for buyers, sellers,
and industry professionals alike. By reducing biases and streamlining processes, this
system paves the way for a more ethical and efficient housing market.

3.3.1 ADVANTAGES

• The data-driven house price estimation model offers several key advantages over
traditional methods of property valuation. By leveraging historical data, market trends,
and property features such as location, size, and neighborhood, the model provides
more accurate and objective predictions of house prices.
• Unlike traditional approaches that often rely on intuition or limited data, the model
accounts for a comprehensive set of variables, reducing the risk of overvaluation or
undervaluation. This helps both buyers and sellers make informed decisions,
minimizing financial risks and ensuring fair pricing in the market.
• Because the model can be retrained on new data, it can adapt to changing market
conditions and offer up-to-date insights that traditional methods cannot provide.
• The transparency offered by a data-driven approach further enhances its appeal. By
providing a clear breakdown of the factors influencing the price estimation, stakeholders
can better understand the rationale behind the predictions. This builds trust among buyers,
sellers, and real estate agents, fostering confidence in the property market.
• In addition to accuracy and transparency, the model contributes to a more efficient
housing market. Automated valuation systems reduce the time and effort required for
property appraisals, streamlining transactions and enabling faster decision-making. For
real estate agencies and financial institutions, these models can also improve operational
efficiency by providing instant property valuations, aiding in mortgage approvals, or
guiding investment decisions.
• Furthermore, the integration of advanced machine learning techniques allows these
models to uncover emerging trends in the housing market. For example, identifying
upcoming hotspots for property development or neighborhoods with declining values
helps investors and policymakers anticipate market shifts and make proactive decisions.
• Ultimately, the data-driven approach enhances transparency, fairness, and trust in
property transactions. It benefits all stakeholders, including buyers, sellers, real estate
agents, and financial institutions, while contributing to a more stable, equitable, and
efficient housing market. As technology continues to advance, these models are likely to
play an increasingly central role in real estate valuation.

CHAPTER 4
SYSTEM SPECIFICATION

4.1 SOFTWARE SPECIFICATION

4.1.1 PYTHON
4.1.2 USER INTERFACE
4.1.2.1 STREAMLIT
4.1.3 ALGORITHM
4.1.3.1 RANDOM FOREST
4.1.3.2 SCIKIT-LEARN
4.1.4 LIBRARIES
4.1.4.1 NUMPY
4.1.4.2 PANDAS
4.1.5 DATASET
4.1.5.1 KAGGLE

4.1.1 PYTHON

Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic typing
and dynamic binding, make it attractive for rapid application development and for use as
a scripting or glue language that connects existing components. Python's simple,
easy-to-learn syntax emphasizes readability and therefore reduces the cost of program
maintenance. Its support for modules and packages encourages program modularity and
code reuse. The Python interpreter and the extensive standard library are available in
source or binary form without charge for all major platforms and can be freely
distributed. Python is used for server-side web development, software development,
mathematics, and system scripting. Because it is an open-source community language,
independent programmers continually build libraries and functionality for it.

Figure 4.1 Python

4.1.2 USER INTERFACE


4.1.2.1 STREAMLIT

Streamlit is an open-source Python library that simplifies the creation of interactive
web applications for machine learning, data visualization, and data science projects. It
enables developers to build intuitive and responsive apps with minimal effort, as it
eliminates the need for HTML, CSS, or JavaScript knowledge. Streamlit supports a range
of interactive widgets like sliders, buttons, and file uploaders, making it easy to add user
inputs and dynamically update visualizations or outputs. The library integrates seamlessly
with popular Python tools like Pandas, NumPy, and Matplotlib, allowing users to showcase
data insights and machine learning models interactively. With its real-time hot-reloading
feature, Streamlit ensures quick iterations and development. It is an excellent choice for
building dashboards, prototypes, or tools that enable non-technical users to interact with
complex models and datasets.

Figure 4.2 Streamlit

4.1.3 ALGORITHM
4.1.3.1 RANDOM FOREST
Random Forest is a versatile and powerful machine learning algorithm used for both
classification and regression tasks. It operates by constructing multiple decision trees during
training and combining their outputs for improved accuracy and robustness. Each tree is
trained on a random sample of the data drawn with replacement, and only a random subset of
features is considered at each split, which introduces diversity and reduces overfitting. The
final prediction is made by averaging the outputs of all trees in regression tasks or by taking
a majority vote in classification tasks. This ensemble approach enhances the model's
performance and stability compared to a single decision tree. Random Forest is also less
sensitive to noise than a single tree, making it a popular choice for complex datasets.
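The averaging step described above can be checked directly. The following is a minimal sketch, assuming scikit-learn's RandomForestRegressor and a small synthetic dataset; the feature names, coefficients, and sample values are made up for illustration and are not taken from the project dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical): price grows with area and bedroom count.
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=200)
bedrooms = rng.integers(1, 6, size=200)
X = np.column_stack([area, bedrooms])
y = 50 * area + 20_000 * bedrooms + rng.normal(0, 5_000, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X, y)

# Each fitted tree makes its own prediction; the forest reports their mean.
sample = np.array([[1800.0, 3.0]])
per_tree = np.array([tree.predict(sample)[0] for tree in forest.estimators_])
forest_prediction = forest.predict(sample)[0]
```

Comparing per_tree.mean() with forest_prediction shows that, for regression, the ensemble output is the average of the individual tree outputs.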

4.1.3.2 SCIKIT-LEARN
Scikit-learn is a popular Python library for machine learning and data analysis, built
on top of NumPy, SciPy, and Matplotlib. It provides simple and efficient tools for data
preprocessing, model training, evaluation, and deployment. Scikit-learn supports a wide
range of supervised and unsupervised learning algorithms, including classification,
regression, clustering, dimensionality reduction, and ensemble methods. It offers intuitive
APIs for implementing machine learning pipelines, making it accessible for both beginners
and experts. The library is highly optimized and widely used in academia and industry for
prototyping and production. Its comprehensive documentation and active community make
it an essential tool for data science projects.
Figure 4.3 Scikit Learn

4.1.4 LIBRARIES
4.1.4.1 NUMPY

NumPy, short for Numerical Python, is a Python library for working with arrays. It
also provides functions for linear algebra, Fourier transforms, and matrices. NumPy was
created in 2005 by Travis Oliphant. It is an open-source project and can be used freely.
NumPy aims to provide an array object that is up to 50x faster than traditional Python
lists, which makes it useful for performing logical and mathematical calculations on arrays
and matrices. NumPy arrays also use less memory than Python lists. The library consists
of multidimensional array objects and a collection of routines for processing those arrays.
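As a small illustration of array-based calculation, the sketch below computes a price per square foot for several plots in one expression. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical plot sizes in square feet and prices in rupees.
area_sqft = np.array([1200.0, 1500.0, 2000.0, 3500.0])
price = np.array([3_000_000.0, 4_500_000.0, 5_000_000.0, 10_500_000.0])

# One vectorized expression replaces an explicit Python loop.
price_per_sqft = price / area_sqft

# Aggregations and boolean masks also operate on the whole array at once.
mean_rate = price_per_sqft.mean()
large_plots = area_sqft[area_sqft > 1800]
```

Here price_per_sqft holds one rate per plot, mean_rate is their average, and large_plots keeps only the areas above 1800 square feet.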

Figure 4.4 NumPy

4.1.4.2 PANDAS
pandas is an open-source Python library whose name is derived from "panel data", an
econometrics term for multidimensional structured datasets. It is mainly used for data
analysis and manipulation of tabular data in DataFrames. pandas can import data from
various file formats such as comma-separated values, JSON, Parquet, SQL database tables
or queries, and Microsoft Excel. It supports data manipulation operations such as merging,
reshaping, and selecting, as well as data cleaning and data wrangling. The development
of pandas brought to Python many features for working with DataFrames that were
established in the R programming language. The pandas library is built on another
library, NumPy, which is oriented toward working efficiently with arrays rather than
DataFrames.
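A brief sketch of typical pandas usage follows. It reads a small table from CSV text, selects rows, and aggregates by group. The column names and values are hypothetical and only resemble the kind of housing data discussed in this report.

```python
import io
import pandas as pd

# Hypothetical housing records in CSV form.
csv_text = """area,bedrooms,district,price
1200,2,erode,3000000
1500,3,coimbatore,4500000
2000,3,erode,5000000
3500,4,tiruppur,10500000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Row selection with a boolean condition.
three_bed = df[df["bedrooms"] == 3]

# Group-wise aggregation: average price per district.
mean_price_by_district = df.groupby("district")["price"].mean()
```

The same read_csv call accepts a file path, so a dataset downloaded from Kaggle could be loaded in the same way.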

Figure 4.5 Pandas

4.1.5 DATASET
4.1.5.1 KAGGLE
Kaggle is a global online community and platform that focuses on data science,
machine learning, and artificial intelligence. It provides access to an extensive repository of
datasets across various domains, enabling users to explore, analyze, and use them for
projects or competitions. Kaggle is best known for its competitive environment, where data
scientists and machine learning practitioners participate in challenges to solve real-world
problems while competing for prizes and recognition. It also offers a cloud-based workspace
with tools like Jupyter Notebooks, Python, and R, allowing users to perform data analysis
and modeling directly on the platform. Additionally, Kaggle hosts tutorials, courses, and
discussion forums, making it a great place to learn and collaborate. It’s widely used for
developing skills, building portfolios, and networking within the data science community.

CHAPTER 5
PROJECT DESCRIPTION

5.1 SYSTEM DESIGN

Start → Data Collection → Data Preprocessing → Random Forest → Prediction → Stop

FIGURE 5.1.1: FLOW CHART

FLOWCHART DESCRIPTION:

Start

The starting point of the project workflow. This step marks the initiation of the
process of building a machine learning model for house price prediction.

Data Collection

This step involves gathering all the necessary data for the project. The data may come
from various sources such as publicly available datasets, APIs, or manual input.

Example: For this house price prediction project, the data could include house
attributes like location, area, number of bedrooms, and their respective prices.

Data Preprocessing

Data preprocessing is crucial to clean and prepare the raw data for analysis and model
training.

Common tasks include:

Handling missing values: Filling in or removing incomplete data points.

Feature scaling: Normalizing numeric data for uniformity.

Encoding categorical variables: Converting categories (e.g., city names) into
numerical values.

Data splitting: Dividing the data into training and testing sets.

The goal is to ensure the dataset is consistent, accurate, and usable for the Random
Forest algorithm.
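Two of the preprocessing tasks listed above, handling missing values and encoding categorical variables, can be sketched with pandas as follows. This is an illustrative example under assumed column names and values, not the project's actual preprocessing script.

```python
import pandas as pd

# Hypothetical raw records with one missing area value.
raw = pd.DataFrame({
    "area": [1200.0, None, 2000.0, 3500.0],
    "district": ["erode", "coimbatore", "tiruppur", "erode"],
    "price": [3_000_000, 4_500_000, 5_000_000, 10_500_000],
})

# Handle missing values: fill the missing area with the column median.
clean = raw.copy()
clean["area"] = clean["area"].fillna(clean["area"].median())

# Encode the categorical column as 0/1 indicator columns.
# drop_first=True drops the alphabetically first category ("coimbatore"),
# which then acts as the baseline with all indicators equal to 0.
encoded = pd.get_dummies(clean, columns=["district"], drop_first=True, dtype=int)
```

With this encoding a district outside the listed indicators is represented by zeros in every indicator column. Feature scaling is omitted here because tree-based models such as Random Forest split on thresholds and do not depend on the scale of the inputs.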

Random Forest (Model Training)

This is the model training phase, where the Random Forest algorithm is used. Random
Forest is an ensemble machine learning algorithm that builds multiple decision trees and
combines their outputs to improve accuracy and reduce overfitting. The algorithm uses
the preprocessed data to learn patterns and relationships between the input features and
the target variable (e.g., house price).

Prediction

Once the model is trained, it can make predictions on new data inputs.

Example: Given new features like the number of rooms, land area, and location, the model
predicts the expected house price. This is the final goal of the machine learning model.

Stop

The workflow ends after predictions are made. The output (predictions) can be evaluated,
visualized, or used in real-world applications.

Example: A user interface could display predicted house prices based on user input.

FIGURE 5.1.2: FLOW CHART
FLOWCHART DESCRIPTION:

Data Collection and Preprocessing


Collect housing data from reliable sources such as real estate websites, government
databases, or public datasets, and ensure the data is relevant and up to date. Handle missing
values, outliers, and inconsistencies in the data; this may involve imputation, removal of
outliers, or normalization. Create new features that might improve model performance, for
example by combining existing features, extracting information from text or image data, or
transforming numerical features.
Data Splitting
• Train-Test Split: Divide the cleaned dataset into two subsets:
Training Set: Used to train the machine learning model.
Testing Set: Used to evaluate the model's performance on unseen data.
• Stratified Sampling: If the price distribution is skewed, group prices into bands and use
stratified sampling so that the training and testing sets each reflect the price distribution
of the original dataset.
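Because house price is a continuous target, one way to apply stratified sampling is to group prices into bands and sample the test set from each band. The sketch below uses only the standard library; the function name stratified_split, the band count, and the price values are assumptions made for illustration.

```python
import random

def stratified_split(prices, n_bins=3, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), drawing test samples from each price band."""
    rng = random.Random(seed)
    # Sort sample indices by price, then cut them into equally sized bands.
    order = sorted(range(len(prices)), key=lambda i: prices[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    train_idx, test_idx = [], []
    for start in range(0, len(order), bin_size):
        band = order[start:start + bin_size]
        rng.shuffle(band)
        n_test = max(1, round(len(band) * test_fraction))
        test_idx.extend(band[:n_test])
        train_idx.extend(band[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical prices in lakh rupees: low, middle, and high bands.
prices = [30, 32, 35, 40, 55, 58, 60, 62, 90, 95, 110, 120]
train_idx, test_idx = stratified_split(prices)
```

With twelve samples and three bands, each band contributes one test sample and three training samples, so both subsets cover the low, middle, and high price ranges.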
Model Selection and Training
Select a machine learning algorithm appropriate for regression tasks. Train the selected
model on the training dataset. This involves iteratively adjusting the model's parameters to
minimize the error between predicted and actual house prices.
Model Evaluation
Evaluate the model's performance on the testing set using metrics such as Mean Squared
Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared.
Compare the performance of different models and select the one with the best performance
metrics.
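The four metrics named above can be computed from their definitions in a few lines. The following sketch uses only the standard library; the function name regression_metrics and the sample prices are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MSE, MAE, RMSE, and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    # R-squared: 1 minus residual sum of squares over total sum of squares.
    r2 = 1 - (mse * n) / total_ss
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical prices in lakh rupees.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 48.0, 63.0, 69.0]
metrics = regression_metrics(actual, predicted)
```

For these values the errors are -2, 2, -3, and 1, giving an MSE of 4.5, an MAE of 2.0, an RMSE of about 2.12, and an R-squared of 0.964. MSE, MAE, and RMSE are better when lower, while R-squared is better when closer to 1.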
Price Prediction:
Deploy the trained model to a production environment, such as a web application, API, or
cloud-based platform. Create a user-friendly interface that allows users to input house
features and receive predicted prices.

5.2 MODULE DESCRIPTION
5.2.1 DATA COLLECTION AND PREPROCESSING

This module focuses on gathering and organizing the data required to train the
prediction model. Key data sources include historical housing prices, location details,
property features (e.g., number of bedrooms, hall size, and land area), and proximity to
amenities like schools, transport links, and recreational facilities. Data preprocessing
techniques such as handling missing values, outlier detection, normalization, and encoding
categorical data are applied to ensure the dataset is clean and ready for analysis. This step
is essential for creating a reliable foundation for subsequent predictive modeling, ensuring
accuracy and consistency across the entire system.

5.2.2 FEATURE SELECTION AND ENGINEERING

The system leverages this module to identify and prioritize the most influential
factors affecting house prices. By analyzing correlations and using techniques like
principal component analysis (PCA) and mutual information, the module highlights
features like location, neighborhood amenities, and market trends. Feature engineering
methods create new meaningful variables, such as walkability scores or the property’s
proximity to key amenities, enhancing the predictive power of the model. This module
ensures that the input data aligns closely with real-world dynamics, improving the
accuracy of price estimates.
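As a simple instance of the correlation analysis mentioned above, the sketch below ranks candidate features by the absolute value of their Pearson correlation with price. It illustrates correlation ranking only, not PCA or mutual information, and the feature values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical feature columns and prices (lakh rupees).
features = {
    "area": [1200, 1500, 2000, 2500, 3500],
    "bedrooms": [2, 3, 3, 4, 4],
    "parking": [1, 0, 2, 0, 1],
}
price = [30, 38, 50, 62, 88]

# Strongest linear relationship with price first.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], price)),
    reverse=True,
)
```

In this toy data, area tracks price almost linearly, bedrooms less so, and parking hardly at all. Correlation only captures linear relationships, so a low score does not by itself prove that a feature is useless to a nonlinear model such as Random Forest.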

5.2.3 MACHINE LEARNING MODEL DEVELOPMENT


The core of the system lies in this module, where machine learning algorithms such as
Random Forest, Gradient Boosting, or Neural Networks are employed. The model is
trained using historical data to capture patterns and relationships among the selected
features. Advanced techniques like hyperparameter tuning and cross-validation are used
to optimize the model’s performance. The focus is on ensuring the system learns
effectively from the data, balancing precision and generalizability to produce accurate
house price predictions. This module enables the system to handle complex, nonlinear
interactions between features, offering robust estimates even in dynamic markets.
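Cross-validation, mentioned above, partitions the data into k folds so that every sample is used for validation exactly once. The sketch below generates the fold indices with the standard library; the function name k_fold_indices is hypothetical, and in practice the samples would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```

A model is then trained k times, once per fold, and the validation scores are averaged. Hyperparameter tuning repeats this procedure for each candidate setting and keeps the setting with the best average score.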

5.2.4 PREDICTION AND DECISION SUPPORT


This module applies the trained model to property details supplied by the user, such as
area, number of bedrooms, and location, and returns an estimated price. Buyers can use
the estimate to judge whether a listed property is priced fairly, sellers can use it to set
competitive and realistic prices, and investors can compare properties before committing
funds. By presenting a data-based estimate at the point of decision, the module reduces
the time spent on negotiation and supports informed decision-making.

5.2.5 TRANSPARENCY AND ETHICAL CONSIDERATIONS


Addressing the challenge of bias and manipulation in traditional pricing methods, this
module ensures that predictions are based solely on data. It incorporates audit logs,
enabling users to trace how predictions are derived, thus fostering trust in the system.
Ethical algorithms minimize the risk of skewed results, promoting fairness and equity in
real estate transactions. This module’s emphasis on transparency reassures all
stakeholders, including buyers, sellers, and investors, creating a more balanced and reliable
market environment.
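One possible shape for the audit log described above is a single JSON record per prediction that stores the inputs, the model version, and the output. This is a sketch under assumptions: the function name audit_record, the field names, and the version label are hypothetical and not part of the implemented system.

```python
import json
from datetime import datetime, timezone

def audit_record(features, predicted_price, model_version):
    """Build one JSON line describing how a prediction was produced."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "predicted_price": round(float(predicted_price), 2),
    }
    return json.dumps(record, sort_keys=True)

# Hypothetical prediction being logged.
line = audit_record({"area": 3500, "bedrooms": 3}, 8_250_000.456, "rf-v1")
```

Appending such lines to a file gives a trace of which inputs and which model version produced each estimate.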

5.2.6 USER INTERFACE AND ACCESSIBILITY


Designed with inclusivity in mind, this module focuses on creating a user-friendly
interface. Features like intuitive dashboards, customizable inputs, and visual
representations make the system accessible even to non-technical users. Real-time insights
and simplified outputs ensure that users can make decisions efficiently without needing
extensive technical expertise. This module bridges the gap between complex machine
learning algorithms and everyday users, ensuring widespread adoption and usability of the
system across different demographics.
5.2.7 CONTINUOUS LEARNING AND ADAPTATION
To remain relevant in fluctuating markets, this module ensures the system continuously
learns from new data. It employs techniques like incremental learning and model retraining
to adapt to emerging trends, such as shifts in economic conditions or changes in buyer
preferences. This adaptive capability allows the system to provide up-to-date, reliable
predictions, ensuring long-term value for users. Regular performance monitoring and
feedback mechanisms also improve the system’s accuracy and robustness over time.

CHAPTER 6
IMPLEMENTATION

6.1 SYSTEM MODEL

Figure 6.1 Prediction of House Price

6.1.1 SYSTEM MODEL
The system model of the proposed house price prediction system is meticulously
designed to ensure precision, scalability, and accessibility for users. At its core, the model
operates in three interconnected stages: data preprocessing, model training, and prediction
generation. This architecture ensures that each step contributes to a seamless and accurate
prediction process, addressing the multifaceted factors influencing house prices. By
systematically analyzing key features and integrating them into a machine learning
framework, the system model offers an innovative and reliable solution for the real estate
industry.
The foundation of the system model lies in the preprocessing of raw data. Datasets
sourced from Kaggle often include extensive information about properties, such as size,
number of rooms, location, and market trends. This raw data is cleaned, standardized, and
transformed into a format suitable for analysis using Python libraries like Pandas and
NumPy. The preprocessing step includes handling missing values, removing outliers, and
encoding categorical variables such as neighborhood types or proximity to landmarks. By
preparing the data meticulously, the system ensures that the machine learning model can
focus on relevant patterns and correlations, resulting in more accurate predictions.
The implementation of the proposed system for house price prediction integrates
advanced technologies to address the challenges faced in the real estate market, offering
significant utility to stakeholders such as buyers, sellers, and investors. By utilizing Python
as the programming language, the system ensures flexibility and efficiency in handling
complex algorithms and large datasets. The simplicity and versatility of Python make it ideal
for developing a robust model that can analyze multiple features influencing house prices,
such as location, property size, and proximity to amenities. This implementation
demonstrates how modern tools can empower users with actionable insights, creating a more
equitable and efficient housing market.

The user interface is built using Streamlit, a powerful framework that simplifies the
development of interactive web applications. This ensures the system is accessible even to
non-technical users. The interface allows stakeholders to input property details easily,
explore predictions, and visualize data in an intuitive manner. By presenting insights
through interactive dashboards and graphs, the system demystifies complex data analysis,
enabling users to interpret results effectively. This accessibility not only broadens the
potential audience but also democratizes access to reliable information, making it easier for
first-time buyers and small-scale investors to make informed decisions.
At the core of the system lies the Random Forest algorithm, implemented using the
Scikit-learn library. This algorithm excels in predictive accuracy and adaptability, making
it well-suited for a dynamic market like real estate. Random Forest works by creating a
multitude of decision trees and combining their outputs to produce precise predictions. Its
ability to identify complex patterns and relationships among features ensures that the model
captures even subtle influences on property prices, such as seasonal trends or the impact of
nearby infrastructure projects. This data-driven approach eliminates guesswork, reducing
the risks of overvaluation or undervaluation and enhancing trust in market transactions.
The implementation also leverages libraries such as NumPy and Pandas to handle data
preprocessing and analysis efficiently. These libraries allow for seamless manipulation of
large datasets, ensuring the system can handle diverse and extensive historical data sourced
from platforms like Kaggle. Historical data forms the backbone of the model, as it helps
identify trends and correlations that influence pricing. By analyzing past transactions and
market conditions, the system provides predictions rooted in evidence, enabling users to
anticipate future market behavior with confidence.
This implementation brings tangible benefits to buyers, sellers, and investors by
offering transparent and unbiased property evaluations. Buyers gain a reliable tool to assess
whether a property is priced fairly, helping them avoid overpaying. Sellers, on the other
hand, can use the system to set competitive prices that attract genuine buyers, thereby
reducing the time a property spends on the market. For investors, the system offers an
opportunity to evaluate properties based on projected returns, minimizing the risk of
financial loss. This comprehensive approach fosters trust and stability in the real estate
market, addressing a longstanding gap in the industry.
Moreover, the system’s ability to adapt to real-time market conditions ensures its
relevance and accuracy over time. As the housing market evolves due to factors like
economic fluctuations or policy changes, the model can incorporate new data to update its
predictions. This adaptability makes it a valuable tool for long-term use, ensuring users stay
ahead in a competitive market. By offering timely insights, the system also streamlines
decision-making processes, reducing the time and effort required to evaluate properties or
negotiate deals.
The implementation of this system redefines how house prices are estimated by shifting
from subjective assessments to data-backed predictions. Its integration of advanced machine
learning algorithms, efficient data handling libraries, and a user-friendly interface
exemplifies how technology can address complex real-world problems. By promoting
fairness, transparency, and efficiency, the system benefits all participants in the real estate
market, contributing to a more equitable and stable housing industry.

CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
CONCLUSION:
The proposed House Price Prediction System leverages modern machine learning
techniques and data analytics to provide an unbiased, accurate, and transparent method of
predicting property prices. This system addresses several longstanding challenges in the real
estate market, particularly the subjectivity and potential biases that often accompany
traditional house valuation methods. By considering multiple factors such as location,
property size, neighbourhood amenities, and market trends, the system offers a
comprehensive and data-driven approach to predicting house prices with high accuracy.
One of the most important features of the system is its transparency and fairness. Unlike
traditional pricing methods that are susceptible to manipulation or bias, the system relies
entirely on data, ensuring that the price estimates are objective and grounded in factual
information. This approach eliminates potential conflicts and provides a level of confidence
for both buyers and sellers. The result is a more equitable real estate market, where prices
reflect the true value of properties based on measurable, verifiable factors rather than
subjective opinions.
In conclusion, the House Price Prediction System provides a much-needed solution to
the challenges faced by the real estate market today. By combining advanced data analytics,
machine learning, and transparent, objective pricing, this system significantly improves the
accuracy and fairness of property valuations. It not only enhances the decision-making
process for buyers and sellers but also contributes to the overall stability and trustworthiness
of the real estate market. The system’s ability to adapt to market changes and its focus on
fairness makes it an indispensable tool in modern real estate transactions, ensuring a more
efficient, transparent, and reliable market for all stakeholders involved.

FUTURE ENHANCEMENTS:
The future enhancement of the House Price Prediction System could involve several
improvements to further increase its accuracy, functionality, and accessibility. One potential
enhancement is the integration of additional data sources, such as social media trends, real-
time economic indicators, and geospatial data, which could provide deeper insights into the
property market, especially in fast-changing or emerging neighborhoods. Incorporating
natural language processing (NLP) techniques could allow the system to analyze textual
data, such as property descriptions or user reviews, to extract sentiment and other features
that influence pricing. Additionally, the system could be expanded to provide predictive
analytics on future property values by analyzing long-term market trends, making it a
powerful tool for investors looking to anticipate price changes. Another enhancement could
be the inclusion of personalized recommendations for buyers and sellers, suggesting
properties or pricing strategies based on individual preferences, budget, and goals.
Integrating the system with augmented reality (AR) tools could allow users to visualize
properties or view potential renovation impacts, further enhancing the decision-making
process. Lastly, improving the model’s interpretability with features like explainable AI
(XAI) would make the system even more user-friendly, providing transparent reasons
behind pricing estimates, which could increase trust and adoption. These enhancements
would ensure that the system remains at the forefront of real estate technology, offering
valuable insights and tools for all stakeholders in the housing market.

APPENDIX I
SOURCE CODE
import pickle

import pandas as pd
import streamlit as st

# Load the trained model
with open('house_price_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

# Define the expected feature columns, in the order used during training
expected_columns = [
    'area', 'bedrooms', 'bathrooms', 'stories', 'parking',
    'location_suburban', 'location_urban', 'mainroad_yes', 'guestroom_yes',
    'basement_yes', 'hotwaterheating_yes', 'airconditioning_yes',
    'prefarea_yes', 'furnishingstatus_semi-furnished',
    'furnishingstatus_unfurnished', 'district_erode', 'district_tiruppur'
]

st.sidebar.header('Please fill this section: ')


def user_report():
    # Numeric inputs (number_input returns a number, not a string)
    area = st.sidebar.number_input('Area SqFt', min_value=1, value=3500)
    bedrooms = st.sidebar.slider('Bedrooms', 1, 5, 1)
    bathrooms = st.sidebar.slider('Bathrooms', 1, 5, 1)
    stories = st.sidebar.slider('Floors', 1, 5, 1)
    parking = st.sidebar.slider('Parking', 0, 3, 1)

    # Location: 'Rural' is the baseline (both indicators 0)
    location_index = st.sidebar.selectbox('Location', options=['Suburban', 'Urban', 'Rural'])
    location_suburban = 1 if location_index == 'Suburban' else 0
    location_urban = 1 if location_index == 'Urban' else 0

    # Yes/No features encoded as 1/0
    mainroad_index = st.sidebar.selectbox('Main Road', options=['Yes', 'No'])
    mainroad_yes = 1 if mainroad_index == 'Yes' else 0

    guestroom_index = st.sidebar.selectbox('Guestroom', options=['No', 'Yes'])
    guestroom_yes = 1 if guestroom_index == 'Yes' else 0

    basement_index = st.sidebar.selectbox('Basement', options=['No', 'Yes'])
    basement_yes = 1 if basement_index == 'Yes' else 0

    waterheating_index = st.sidebar.selectbox('Hot Water Heating', options=['No', 'Yes'])
    hotwaterheating_yes = 1 if waterheating_index == 'Yes' else 0

    airconditioning_index = st.sidebar.selectbox('Air Conditioning', options=['No', 'Yes'])
    airconditioning_yes = 1 if airconditioning_index == 'Yes' else 0

    prefarea_index = st.sidebar.selectbox('Water connection', options=['Yes', 'No'])
    prefarea_yes = 1 if prefarea_index == 'Yes' else 0

    # Furnishing status: 'Furnished' is the baseline (both indicators 0)
    furnishing_index = st.sidebar.selectbox(
        'Furnishing Status', options=['Unfurnished', 'Semi-Furnished', 'Furnished'])
    furnishingstatus_semi_furnished = 1 if furnishing_index == 'Semi-Furnished' else 0
    furnishingstatus_unfurnished = 1 if furnishing_index == 'Unfurnished' else 0

    # District: 'coimbatore' is the baseline (both indicators 0)
    district_index = st.sidebar.selectbox('District', options=['erode', 'tiruppur', 'coimbatore'])
    district_erode = 1 if district_index == 'erode' else 0
    district_tiruppur = 1 if district_index == 'tiruppur' else 0

    user_report_data = {
        'area': area,
        'bedrooms': bedrooms,
        'bathrooms': bathrooms,
        'stories': stories,
        'parking': parking,
        'location_suburban': location_suburban,
        'location_urban': location_urban,
        'mainroad_yes': mainroad_yes,
        'guestroom_yes': guestroom_yes,
        'basement_yes': basement_yes,
        'hotwaterheating_yes': hotwaterheating_yes,
        'airconditioning_yes': airconditioning_yes,
        'prefarea_yes': prefarea_yes,
        'furnishingstatus_semi-furnished': furnishingstatus_semi_furnished,
        'furnishingstatus_unfurnished': furnishingstatus_unfurnished,
        'district_erode': district_erode,
        'district_tiruppur': district_tiruppur
    }

    report_data = pd.DataFrame(user_report_data, index=[0])
    return report_data


user_data = user_report()

st.write('''
## House Price Prediction

This web app predicts the price of your dream house! :D
''')

if st.button('Predict'):
    # Pass the columns in the order the model expects
    house_price = model.predict(user_data[expected_columns])

    # Show the price in rupees with two decimals and no thousands separators
    formatted_price = '₹{:.2f}'.format(house_price[0])
    st.write("## :green[*Your Predicted House Price is :*]", formatted_price)
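The listing above loads house_price_model.pkl but the training step that produces this file is not shown in the report. The following is a minimal sketch of how such a model might be trained and serialized, assuming scikit-learn's RandomForestRegressor, a reduced hypothetical feature set, and synthetic data. It is not the authors' training script.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Reduced, hypothetical subset of the app's feature columns.
feature_columns = ["area", "bedrooms", "district_erode", "district_tiruppur"]

# Synthetic training data (not the project dataset).
rng = np.random.default_rng(42)
train = pd.DataFrame({
    "area": rng.uniform(500, 4000, 100),
    "bedrooms": rng.integers(1, 6, 100),
    "district_erode": rng.integers(0, 2, 100),
    "district_tiruppur": 0,
})
target = 2500 * train["area"] + 300_000 * train["bedrooms"]

model = RandomForestRegressor(n_estimators=20, random_state=42)
model.fit(train[feature_columns], target)

# Round-trip through pickle, as happens when the app loads the saved model.
restored = pickle.loads(pickle.dumps(model))

# Query with the same columns, in the same order, as used for training.
query = pd.DataFrame(
    [{"area": 1800, "bedrooms": 3, "district_erode": 1, "district_tiruppur": 0}]
)
prediction = restored.predict(query[feature_columns])[0]
```

To write the model to disk, pickle.dump(model, file) with a file opened in 'wb' mode would replace the in-memory round trip. Keeping the training and prediction column lists identical avoids silent feature mismatches.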

APPENDIX II
SCREENSHOTS
OUTPUT SCREENSHOT

PREDICTING HOUSE PRICE

Figure A.2.1 Home Page

Figure A.2.2 Predicting the House Price

SOURCE CODE SCREENSHOTS

Figure A.2.3 Source Code Screenshot

Figure A.2.4 Source Code Screenshot

44