ADS_LAB8

The document outlines the data science lifecycle for house price prediction, detailing steps from problem definition to model deployment and monitoring. It includes coding examples using Python for data processing, model training, and evaluation, demonstrating the relationship between house features and prices. The conclusion highlights the effectiveness of the regression model in predicting house prices based on area and furnishing status.

Uploaded by

abhijaysingh66


EXPERIMENT NO. 08
AIM: Illustrate the data science lifecycle for the selected case study. (Prepare case study
document for the selected case study)
THEORY:
House price prediction is the process of using data analysis and statistical techniques to forecast
the selling or buying price of a house. This prediction is typically based on various factors such
as location, size, number of bedrooms and bathrooms, neighborhood amenities, historical sales
data, economic indicators, and more.
The data science life cycle, in the context of house price prediction, typically follows these
steps:
1. Problem Definition: This phase involves understanding the objective of the prediction task.
For house price prediction, the goal is to estimate the selling or buying price of a house
accurately.
2. Data Acquisition: Data acquisition involves gathering relevant datasets that contain
information about houses such as features (e.g., square footage, number of bedrooms, location)
and their corresponding sale prices. Data can be obtained from various sources like real estate
websites, government databases, or through web scraping.
3. Data Cleaning and Preprocessing: Raw data often contains errors, missing values, or
inconsistencies that need to be addressed before analysis. Data cleaning involves removing
duplicates, handling missing values, and standardizing formats. Preprocessing involves
transforming the data into a suitable format for analysis, which may include feature scaling,
normalization, or encoding categorical variables.
4. Exploratory Data Analysis (EDA): EDA involves analyzing and visualizing the data to
understand patterns, relationships, and distributions. In the context of house price prediction,
EDA might include creating histograms, scatter plots, or correlation matrices to explore the
relationship between house features and prices.
5. Feature Engineering: Feature engineering involves selecting, creating, or transforming
features that are most relevant for the prediction task. This might include techniques such as
feature selection, dimensionality reduction, or creating new features based on domain
knowledge.
6. Model Selection and Training: In this phase, various machine learning algorithms are
evaluated and trained on the prepared dataset. Common algorithms for house price prediction
include linear regression, decision trees, random forests, and neural networks. The dataset is
typically split into training and testing sets to evaluate the performance of the models.
7. Model Evaluation: Models are evaluated using appropriate metrics such as mean squared
error (MSE), root mean squared error (RMSE), or mean absolute error (MAE). The
performance of different models is compared to select the one that provides the best prediction
accuracy.
8. Model Deployment: Once a satisfactory model is selected, it can be deployed into production
for making predictions on new, unseen data. This might involve creating an application
interface, API, or integrating the model into existing systems.
9. Monitoring and Maintenance: After deployment, it's important to monitor the model's
performance over time and update it as needed. This might involve retraining the model with
new data or adjusting its parameters to adapt to changing conditions.
Throughout this life cycle, data scientists apply their expertise in statistics, machine learning,
and domain knowledge to build accurate and reliable house price prediction models.
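As an illustration of step 3 (data cleaning and preprocessing), the following is a minimal sketch, not part of the original experiment. The small DataFrame and the helper name clean_listings are hypothetical and invented for this example; filling missing areas with the median is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical raw listings; the values are invented purely for illustration.
raw = pd.DataFrame({
    "area": [7420, 7420, 8960, None, 6000],
    "bedrooms": [4, 4, 4, 3, 2],
    "furnishingstatus": ["furnished", "furnished", " Semi-Furnished",
                         "unfurnished", "UNFURNISHED "],
    "price": [13300000, 13300000, 12250000, 9100000, 4500000],
})


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize text formats, drop duplicate rows, fill missing areas."""
    out = df.copy()
    # Standardize the text format so equivalent labels compare equal.
    out["furnishingstatus"] = out["furnishingstatus"].str.strip().str.lower()
    # Remove exact duplicate rows and renumber the remaining ones.
    out = out.drop_duplicates().reset_index(drop=True)
    # Fill missing numeric values with the column median (an assumed strategy).
    out["area"] = out["area"].fillna(out["area"].median())
    return out


cleaned = clean_listings(raw)
print(cleaned)
```

Categorical columns such as furnishingstatus still need to be encoded before model training; the pipelines in the CODING section do this with OneHotEncoder.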
CODING:
import pandas as pd
file_path = '/content/Housing.csv'
df = pd.read_csv(file_path, encoding='latin1')
# print(df)
# df.info()
df.head()

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error

data = pd.read_csv('/content/Housing.csv')
# Features: number of bedrooms and main-road access; target: price
X = data[['bedrooms', 'mainroad']]
y = data['price']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# One-hot encode the categorical column; pass the numeric column through unchanged
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(drop='first'), ['mainroad'])
    ],
    remainder='passthrough'
)
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
results_df = pd.DataFrame({'Actual Price': y_test, 'Predicted Price': y_pred})
plt.figure(figsize=(12, 8))
# Average predicted price for each bedroom count in the test set
avg_predicted_prices = (
    results_df.groupby(X_test['bedrooms'])['Predicted Price']
    .mean()
    .reset_index()
)
sns.barplot(x='bedrooms', y='Predicted Price', data=avg_predicted_prices)
plt.xlabel('Number of Bedrooms')
plt.ylabel('Average Predicted Price')
plt.title('Average Predicted Prices based on Number of Bedrooms')
plt.show()
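The code above reports only the mean squared error, which is expressed in squared price units and is therefore hard to interpret directly. The theory section also names RMSE and MAE; the sketch below shows how these, together with the R^2 score, follow from the same actual and predicted values. The function name regression_metrics and the sample numbers are hypothetical and are not results from Housing.csv. In scikit-learn, mean_absolute_error and r2_score from sklearn.metrics cover the same quantities.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n      # mean squared error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    mean_actual = sum(actual) / n
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot               # 1 - SS_res / SS_tot
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Hypothetical prices, invented for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are in the same units as the price, so they can be compared directly with typical house prices in the dataset.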

X = data[['area']]
y = data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
plt.figure(figsize=(12, 8))
plt.scatter(X_test, y_test, color='blue', label='Actual Price')
plt.scatter(X_test, y_pred, color='red', label='Predicted Price')
plt.xlabel('Area')
plt.ylabel('House Price')
plt.title('Actual vs Predicted House Prices based on Area')
plt.legend()
plt.show()
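With a single feature, LinearRegression fits a straight line, price = intercept + slope * area, by ordinary least squares. The sketch below computes that line in closed form to show what the fitted model represents. The function name fit_simple_linear and the sample data are hypothetical; on the real model the corresponding values are available as model.coef_ and model.intercept_.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on price = 50 + 0.1 * area.
areas = [1000.0, 2000.0, 3000.0, 4000.0]
prices = [150.0, 250.0, 350.0, 450.0]
slope, intercept = fit_simple_linear(areas, prices)
print("slope:", slope, "intercept:", intercept)
```

The slope is the estimated change in price per unit of area, which is the quantity the red points in the scatter plot trace out.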

X = data[['area', 'furnishingstatus']]  # Features: area, furnishing status
y = data['price']  # Target variable: house price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# One-hot encode furnishing status; pass area through unchanged
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(drop='first'), ['furnishingstatus'])
    ],
    remainder='passthrough'
)

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)
results_df = pd.DataFrame({'Area': X_test['area'], 'Furnishing Status':
X_test['furnishingstatus'], 'Actual Price': y_test, 'Predicted Price':
y_pred})
plt.figure(figsize=(12, 8))
sns.scatterplot(data=results_df, x='Area', y='Actual Price',
hue='Furnishing Status', palette='deep')
plt.scatter(results_df['Area'], results_df['Predicted Price'],
color='black', marker='x', label='Predicted Price')
plt.xlabel('Area')
plt.ylabel('House Price')
plt.title('Actual vs Predicted House Prices based on Area and Furnishing Status')
plt.legend()
plt.show()
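The pipeline above encodes furnishingstatus with OneHotEncoder(drop='first'). The sketch below shows what that encoding produces for the three category names used in the experiment; the four sample rows are assumed for illustration and are not taken from the dataset. Categories are sorted alphabetically, so the first one, 'furnished', is dropped and becomes the reference level.

```python
from sklearn.preprocessing import OneHotEncoder

# One column of category labels; sample rows assumed for illustration.
statuses = [["furnished"], ["semi-furnished"], ["unfurnished"], ["furnished"]]

encoder = OneHotEncoder(drop="first")
# fit_transform returns a sparse matrix by default; convert it for display.
encoded = encoder.fit_transform(statuses).toarray()

categories = list(encoder.categories_[0])  # sorted category labels
kept = categories[1:]                      # columns left after dropping the first

print("kept columns:", kept)
for row, vec in zip(statuses, encoded):
    print(row[0], vec)
```

A furnished house is encoded as all zeros, so its effect is absorbed into the intercept, and the two remaining coefficients measure the price difference relative to furnished houses. Dropping one level avoids perfectly collinear columns in the linear regression.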

CONCLUSION: The scatter plots suggest a positive correlation between house price and area,
with larger houses generally commanding higher prices. Furnishing status also appears to
influence prices, with furnished houses tending to have higher prices than semi-furnished or
unfurnished ones. In the plots, the regression model's predicted prices broadly follow the
actual prices, indicating that it captures the relationship between area, furnishing status,
and house price; the mean squared error printed for each model gives a quantitative check on
this fit. These insights can inform pricing strategies and marketing approaches in the real
estate market.
