Experiment No.2 Title: Predicting Missing Data Using Regression Modeling
Missing data (or missing values) are data values that are not stored for a variable in
the observation of interest. The problem of missing data is relatively common in data sets and
can have a significant effect on the conclusions that can be drawn from the data. Various
techniques have been proposed for handling missing values, such as deletion of
records/attributes, filling with a random value or with a measure of central tendency, and
imputation using regression. Regression imputation estimates a missing value using regression
when we know there is a correlation between the variable with missing values and other
variables. Scatterplots can be used to identify correlation between variables.
Figure 1: A scatter plot showing correlation between attributes pain and tampascale.
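Before fitting a regression line, the strength of the correlation can also be checked numerically with Pearson's correlation coefficient. A minimal sketch, using made-up pain/tampascale readings (the actual values behind Figure 1 are not reproduced here):

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical pain / tampascale readings, for illustration only
pain = [2, 3, 5, 6, 8]
tampascale = [30, 34, 41, 45, 52]
print("r =", pearson_r(pain, tampascale))
```

A value of r close to +1 or −1 suggests the attribute pair is a good candidate for regression imputation.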
Once correlation is identified, either linear regression or multiple regression can be used for
imputation. Linear regression involves finding the "best" line, as shown in Fig. 1, to fit two
attributes (or variables) so that one attribute can be used to predict the other.
Figure 2: Example of simple linear regression, which has one independent variable
Multiple linear regression is an extension of linear regression, where more than two
attributes are involved and the data are fit to a multidimensional surface.
Prediction means estimating continuous or ordered values for a given input, i.e. numeric
prediction; for example, predicting the salary of an employee with 10 years of experience.
Straight-line regression analysis involves a response variable y and a single predictor
variable x, modeling y as a linear function of x as given in equation 1:

y = w0 + w1·x …………………………………………………………………..(1)

where
w0 = y-intercept
w1 = slope of the line
Calculate w0 and w1 by the method of least squares, which estimates the best-fitting
straight line. The regression coefficients are

w1 = Σ_{i=1}^{|D|} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{|D|} (x_i − x̄)² ………………..(2)

w0 = ȳ − w1·x̄ …………………………………………………..…………………….(3)

where |D| is the number of training tuples and x̄, ȳ are the mean values of x and y
respectively.
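Equations (2) and (3) can be implemented directly, with no built-in prediction package. A minimal sketch, using illustrative years-of-experience/salary values (not from a real dataset):

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept w0 and slope w1, eqs. (2)-(3)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Equation (2): covariance term over variance term
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    # Equation (3)
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Illustrative data: salary (in $1000s) vs. years of experience
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
w0, w1 = fit_line(years, salary)
print(f"y = {w0:.2f} + {w1:.2f} x")
print(f"predicted salary at 10 years: {w0 + w1 * 10:.2f}")
```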
To handle the complications of multiple regression, we will use matrix algebra. The least
squares normal equations can be derived from the model

Y = XB

Multiplying both sides by Xᵀ, the transpose of matrix X, gives

XᵀY = XᵀXB, i.e. XᵀXB = XᵀY

To solve for the regression coefficients, simply pre-multiply by the inverse of XᵀX:

(XᵀX)⁻¹XᵀXB = (XᵀX)⁻¹XᵀY …........since (XᵀX)⁻¹XᵀX = I, the identity matrix, we get
the coefficient vector B as

B = (XᵀX)⁻¹XᵀY
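The closed-form solution B = (XᵀX)⁻¹XᵀY can be sketched with NumPy used only for the matrix algebra, not as a prediction package. The data below are made up so that the model fits exactly:

```python
import numpy as np

def multiple_regression(X, y):
    """Solve the normal equations (X^T X) B = X^T y for B.

    X : (n, p) matrix of predictor values; a column of ones is prepended
        so that B[0] is the intercept.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    XtX = X.T @ X
    Xty = X.T @ np.asarray(y, dtype=float)
    # Solving the linear system is numerically safer than forming the inverse
    return np.linalg.solve(XtX, Xty)

# Illustrative data generated from y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]
print("B =", multiple_regression(X, y))
```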
___________________________________________________________________________
Evaluate the accuracy of the prediction (use of a built-in package for prediction is not
expected).
___________________________________________________________________________
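Accuracy of the prediction can also be evaluated without built-in packages, for example with root mean squared error and the coefficient of determination R². A minimal sketch; the actual/predicted values below are illustrative:

```python
def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [30, 57, 64, 72, 36]
predicted = [34.1, 51.6, 55.1, 69.1, 34.1]
print("RMSE:", rmse(actual, predicted))
print("R^2 :", r_squared(actual, predicted))
```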
Results: (Program printout with output / Document printout as per the format)
Questions:
1. How will you choose between linear regression and non-linear regression?
Ans: The general guideline is to use linear regression first to determine whether it can fit the
particular type of curve in your data. If you can’t obtain an adequate fit using linear
regression, that’s when you might need to choose nonlinear regression. Linear regression is
easier to use, simpler to interpret, and you obtain more statistics that help you assess the
model. While linear regression can model curves, it is relatively restricted in the shapes of the
curves that it can fit. Sometimes it can’t fit the specific curve in your data.
Nonlinear regression can fit many more types of curves, but it can require more effort both to
find the best fit and to interpret the role of the independent variables. Additionally, R-squared
is not valid for nonlinear regression, and p-values for the parameter estimates cannot be
computed in the usual way.
A simple guess of a missing value is the mean, median, or mode (most frequently
appearing value) of that variable.
Mean, median or mode imputation only looks at the distribution of values of
the variable with missing entries. If we know there is a correlation between the
missing value and other variables, we can often get better guesses by regressing
the missing variable on other variables.
As we can see, in our example data, tip and total_bill have the highest correlation.
Thus, we can use a simple linear model regressing total_bill on tip to fill the
missing values in total_bill.
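The imputation described above can be sketched as follows, assuming a pandas DataFrame with tip and total_bill columns; the toy values below stand in for the real tips data, and the helper name is ours:

```python
import pandas as pd

def impute_by_regression(df, target, predictor):
    """Fill NaNs in df[target] using a least-squares line fitted on the
    complete rows, regressing target on predictor."""
    complete = df.dropna(subset=[target, predictor])
    x, y = complete[predictor], complete[target]
    w1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    w0 = y.mean() - w1 * x.mean()
    # fillna with a Series aligns on the index, so only missing rows change
    filled = df[target].fillna(w0 + w1 * df[predictor])
    return df.assign(**{target: filled})

# Toy stand-in for the tips data; None marks a missing total_bill
df = pd.DataFrame({
    "tip":        [1.0, 2.0, 3.0, 4.0, 2.5],
    "total_bill": [10.0, 20.0, 30.0, 40.0, None],
})
df = impute_by_regression(df, target="total_bill", predictor="tip")
print(df)
```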
When we replace the missing data with some common value, we might under- or
over-estimate it. In other words, we add some bias to our estimation.
There are many missing data imputation methods to avoid these troublesome
cases and Regression Imputation is one such method in which we estimate the
missing values by Regression using other variables as the parameters.
To add uncertainty back to the imputed values, we can add some normally
distributed noise with a mean of zero and a standard deviation equal to the
residual standard error of the regression estimates. This method is called Random
Imputation or Stochastic Regression Imputation.
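Stochastic regression imputation can be sketched by adding Gaussian noise whose standard deviation equals the residual standard error of the fit on the complete cases. The helper name and toy tip/total_bill values below are illustrative:

```python
import numpy as np

def stochastic_impute(x, y, rng):
    """Fill NaNs in y by regressing y on x, then adding N(0, s^2) noise,
    where s is the residual standard error of the fit on complete cases."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    xo, yo = x[obs], y[obs]
    w1 = np.sum((xo - xo.mean()) * (yo - yo.mean())) / np.sum((xo - xo.mean()) ** 2)
    w0 = yo.mean() - w1 * xo.mean()
    resid = yo - (w0 + w1 * xo)
    s = np.sqrt(np.sum(resid ** 2) / (len(yo) - 2))  # residual standard error
    miss = ~obs
    y[miss] = w0 + w1 * x[miss] + rng.normal(0.0, s, size=miss.sum())
    return y

rng = np.random.default_rng(42)
tip = np.array([1.0, 2.0, 3.0, 4.0, 2.5, 3.5])
total_bill = np.array([11.0, 19.0, 31.0, 39.0, np.nan, np.nan])
filled = stochastic_impute(tip, total_bill, rng)
print(filled)
```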
In this experiment we learnt about the descriptive and proximity measures of data. We also
learnt about regression algorithms, how to implement them, and how to use them to predict
values, which further helps in analysing the dataset based on the outcome of the regression
model.
Grade: AA / AB / BB / BC / CC / CD /DD