Introduction To Data Science

This document introduces data science.

Uploaded by

rakesh rocking


PREDICTIVE ANALYTICS

UNIT – VI

Topics of Unit-6:
PREDICTIVE ANALYTICS

Data Interfaces: Introduction, CSV Files: Syntax, Importing a CSV File

Statistical Applications: Introduction, Basic Statistical Operations,
Linear Regression Analysis, Chi-Squared Goodness of Fit Test, Chi-
Squared Test of Independence, Multiple Regression.

CSV file in R
• R can read and write many file formats, such as CSV, Excel, JSON, and XML.
• A CSV (comma-separated values) file is a text file in which the values in each row are separated by commas.
• The read.csv() function reads a CSV file into a data frame; a relative file name is looked up in the working directory. Similarly, the write.csv() function writes a data frame to a CSV file.
How to import a .CSV file in R
• Using the clipboard is impractical for larger data sets; instead, the read.csv() command can be used to read such files:
• dataset = read.csv(file = "c:/samplefile.csv", header = TRUE, sep = ",")
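A minimal sketch of the write/read round trip described above. The data frame and the temporary file are illustrative assumptions standing in for a real file such as c:/samplefile.csv; they are not part of the original slides.

```r
# Hypothetical sample data; the column names and values are illustrative only
students <- data.frame(name = c("A", "B", "C"), marks = c(72, 85, 90))

# Write to a temporary CSV file (stands in for a real path on disk)
path <- tempfile(fileext = ".csv")
write.csv(students, file = path, row.names = FALSE)

# Read it back; header = TRUE treats the first line as column names
dataset <- read.csv(file = path, header = TRUE, sep = ",")
print(dataset)
str(dataset)
```

Passing row.names = FALSE keeps write.csv() from adding an extra column of row labels, so the file read back has the same two columns that were written.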
Mean
Mean is calculated as the sum of all the values in the data series divided by the number of values.

Following is the function and syntax definition for calculating the mean:
mean(V, trim=0.0, na.rm= FALSE, …)

• V is the input vector;

• trim is the fraction (0 to 0.5) of values dropped from each end of the sorted vector before the mean is computed;
• na.rm is a logical flag; when TRUE, missing values (NA) are removed from the input vector before the computation.
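The effect of the two optional arguments can be sketched as follows. The vector is an illustrative assumption, chosen so that one outlier makes the trimmed mean visibly different.

```r
v <- c(2, 4, 6, 8, 100)            # illustrative values with one outlier

m_plain <- mean(v)                 # (2 + 4 + 6 + 8 + 100) / 5 = 24
m_trim  <- mean(v, trim = 0.2)     # drops 20% (one value) from each end: mean of 4, 6, 8

w <- c(v, NA)                      # same data plus a missing value
m_na   <- mean(w)                  # NA, because a missing value is present
m_narm <- mean(w, na.rm = TRUE)    # missing value removed first

print(c(m_plain, m_trim, m_na, m_narm))
```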

Median
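Median is the middle value of a sorted data series; when the number of values is even, it is the average of the two middle values. The R function for calculating the median is:

median(V, na.rm=FALSE)

The function takes V as the input vector; setting na.rm = TRUE removes the missing values from V before the computation, since otherwise the result is NA.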
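A short sketch of median() on odd-length, even-length, and incomplete data. The input values are illustrative assumptions.

```r
med_odd  <- median(c(5, 1, 3))                   # sorted: 1 3 5, middle value is 3
med_even <- median(c(5, 1, 3, 7))                # sorted: 1 3 5 7, average of 3 and 5
med_na   <- median(c(5, 1, NA))                  # NA, because a value is missing
med_narm <- median(c(5, 1, NA), na.rm = TRUE)    # median of 1 and 5

print(c(med_odd, med_even, med_na, med_narm))
```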

Mode
• Mode is defined as a value that has the maximum number of occurrences in the
data series.

• There may be more than one value with equal number of occurrences.

• Mode can be calculated for both numerical data and character data unlike mean
and median, which are applicable only for numerical data.

Mode: Example
# Create a function for the mode
getmode <- function(v) {
  uniqv <- unique(v)   # distinct values, in order of first appearance
  # Count the occurrences of each distinct value and return the most frequent one
  # (on a tie, only the first such value is returned)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create a vector with some numbers
v <- c(2,1,2,3,1,2,4,5,3,2,1,2,3,4,5)
result <- getmode(v)
# result is 2

v <- c(3,2,1,2,3,1,2,4,5,3,2,1,2,3,4,5,3,3)
result <- getmode(v)
# result is 3
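Because which.max() returns only the first maximum, getmode() reports a single value even when several values share the highest count. A possible variant that returns every mode is sketched below; the name getmodes and the test vectors are assumptions for illustration, not part of the original slides.

```r
# Hypothetical variant of getmode() that keeps all tied values
getmodes <- function(v) {
  uniqv  <- unique(v)                     # distinct values
  counts <- tabulate(match(v, uniqv))     # occurrences of each distinct value
  uniqv[counts == max(counts)]            # every value that reaches the top count
}

num_modes  <- getmodes(c(1, 1, 2, 2, 3))             # two values tie with 2 occurrences
char_modes <- getmodes(c("a", "b", "b", "a", "c"))   # also works on character data

print(num_modes)
print(char_modes)
```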
Standard Deviation
• Apart from the mean, median, and mode, another important measure in any statistical analysis of data is the standard deviation.

• The R function for calculating it, with its syntax definition, is:
sd(V, na.rm=FALSE)

The function takes a numerical vector V as input; setting na.rm = TRUE removes missing values first. Note that sd() computes the sample standard deviation, dividing by n − 1 rather than n.

SD & Variance
• Standard Deviation
• The Standard Deviation is a measure of how spread out numbers are.
• Its symbol is σ (the Greek letter sigma).
• It is the square root of the Variance.
• Variance
• The Variance is defined as:
• “The average of the squared differences from the Mean”
• To calculate the variance follow these steps:
o Work out the Mean (the simple average of the numbers)
o Then for each number: subtract the Mean and square the result (the squared difference).
o Then work out the average of those squared differences.
Example:
• The heights of five dogs (at the shoulders) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.
• Find the Mean, the Variance, and the Standard Deviation.
• The first step is to find the Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
• Variance σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5
• = (42436 + 5776 + 50176 + 1296 + 8836) / 5
• = 108520 / 5 = 21704
• So the Variance is 21,704
• Standard Deviation σ = √21704 = 147.32... = 147 (to the nearest mm)
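The worked example can be checked in R with the sketch below. One caveat worth noting: the example divides by the number of values (the population variance), whereas R's built-in var() and sd() divide by n − 1 (the sample versions), so they give larger numbers for the same data. The variable names are illustrative.

```r
heights <- c(600, 470, 170, 430, 300)   # dog heights in mm, from the example

m <- mean(heights)                      # 1970 / 5 = 394

# Population variance and standard deviation, as in the worked example
pop_var <- mean((heights - m)^2)        # 108520 / 5 = 21704
pop_sd  <- sqrt(pop_var)                # about 147.32

# Sample versions, as computed by R's built-in functions (divide by n - 1)
samp_var <- var(heights)                # 108520 / 4 = 27130
samp_sd  <- sd(heights)                 # about 164.7

print(c(mean = m, pop_var = pop_var, pop_sd = pop_sd,
        samp_var = samp_var, samp_sd = samp_sd))
```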

10.7 Illustration of chi-squared goodness of fit test
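The illustration itself did not survive in this copy of the slides. As a stand-in, here is a minimal sketch of a goodness of fit test using R's chisq.test(); the die-roll counts are made-up data, assumed only for illustration.

```r
# Hypothetical data: counts of each face in 60 rolls of a die
observed <- c(8, 12, 9, 11, 10, 10)

# Null hypothesis: the die is fair, so each face has probability 1/6
gof <- chisq.test(observed, p = rep(1/6, 6))

# Test statistic: sum((observed - expected)^2 / expected), with expected = 10 per face
print(gof$statistic)
print(gof$parameter)   # degrees of freedom = number of categories - 1
print(gof$p.value)     # a large p-value gives no evidence against fairness
```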

10.8 Chi-squared Test of Independence
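The slide content for this topic is also missing here. A minimal sketch of a test of independence on a 2 × 2 contingency table follows; the table values and the row and column labels are made-up assumptions for illustration.

```r
# Hypothetical contingency table: group membership versus a yes/no outcome
tbl <- matrix(c(20, 30, 30, 20), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# correct = FALSE turns off the continuity correction that R applies
# to 2 x 2 tables by default, giving the plain Pearson statistic
res <- chisq.test(tbl, correct = FALSE)

print(res$expected)    # expected counts under independence (25 in every cell)
print(res$statistic)   # sum((observed - expected)^2 / expected)
print(res$parameter)   # degrees of freedom = (rows - 1) * (columns - 1)
print(res$p.value)
```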

Linear Regression Analysis
• Linear regression analysis is one of the most widely used methods of statistical analysis; its result describes the relationship between the two variables of the model.

• The first variable is the independent variable, whose values are typically obtained from experimental results.
• The second variable is the dependent variable, whose values are modeled as depending on the independent variable.

• Linear regression analysis in R builds a model relating the two variables through the following equation:
Y_i = a + b·X_i + e_i, i = 1, …, h
• Here a is the intercept, b is the slope, e_i is the error term of the i-th observation, and h is the number of observations.

• Linear regression is a basic and commonly used type of predictive analysis.
• The overall idea of regression is to examine two things:
• (1) does a set of predictor variables do a good job in predicting an outcome
(dependent) variable? (2) Which variables in particular are significant
predictors of the outcome variable, and in what way do they–indicated by the
magnitude and sign of the beta estimates–impact the outcome variable?
• Three major uses for regression analysis are (1) determining the strength of
predictors, (2) forecasting an effect, and (3) trend forecasting.
• First, the regression might be used to identify the strength of the effect that the
independent variable(s) have on a dependent variable. Typical questions are
what is the strength of relationship between dose and effect, sales and
marketing spending, or age and income.

• Second, it can be used to forecast effects or impact of changes. That is, the
regression analysis helps us to understand how much the dependent variable
changes with a change in one or more independent variables. A typical question
is, “how much additional sales income do I get for each additional $1000 spent
on marketing?”
• Third, regression analysis predicts trends and future values. The regression
analysis can be used to get point estimates. A typical question is, “what will the
price of gold be in 6 months?”
Linear Regression
• Linear regression is an algorithm that models a linear relationship between an independent variable and a dependent variable in order to predict the outcome of future events.
• It is a statistical method used in data science and machine learning for predictive analysis.
• The independent variable, also called the predictor or explanatory variable, is not affected by changes in the other variables.
• The dependent variable, however, changes as the independent variable fluctuates.
• The regression model predicts the value of the dependent variable,
which is the response or outcome variable being analyzed or studied.
Linear Regression Equation

• The strength of the relationship between two variables is measured by the correlation coefficient. The coefficient ranges from -1 to +1 and shows the strength of the association between the two variables in the observed data.

• Linear Regression Equation is given below:

• Y=a+bX
• where X is the independent variable and it is plotted along the x-axis
• Y is the dependent variable and it is plotted along the y-axis
• Here, the slope of the line is b, and a is the intercept (the value of y when x = 0).
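The correlation coefficient and the least-squares values of a and b can be computed directly, as sketched below. The five data points are made-up values for illustration, and the formulas used for the slope and intercept are the standard ordinary least-squares ones, which the slides do not spell out.

```r
# Hypothetical data points
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r <- cor(x, y)   # correlation coefficient, always between -1 and +1

# Ordinary least-squares estimates for Y = a + bX
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # slope
a <- mean(y) - b * mean(x)                                       # intercept

print(c(r = r, a = a, b = b))
```

For this data the sums work out to 6 / 10, so the slope is 0.6 and the intercept is 4 − 0.6 × 3 = 2.2.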
Linear Model Function
R has the lm() function, which stands for ‘linear model’
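A minimal sketch of fitting a simple linear regression with lm(). The data frame, its column names (spend, sales), and its values are made-up assumptions for illustration.

```r
# Hypothetical data: marketing spend (in $1000) and resulting sales
ads <- data.frame(spend = c(1, 2, 3, 4, 5),
                  sales = c(2, 4, 5, 4, 5))

# Fit sales = a + b * spend by ordinary least squares
fit <- lm(sales ~ spend, data = ads)

coefs <- coef(fit)   # "(Intercept)" is a, "spend" is b
print(coefs)

# Point estimate for a new, unseen spend value
pred <- predict(fit, newdata = data.frame(spend = 6))
print(pred)
```

The formula sales ~ spend reads as "sales modeled by spend"; the slope coefficient is the estimated change in sales for each additional unit of spend.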

Multiple Linear Regression
• Multiple linear regression (MLR), also known simply as multiple
regression, is a statistical technique that uses several explanatory
variables to predict the outcome of a response variable.
• The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and the response (dependent) variable.
• In essence, multiple regression is the extension of ordinary least-
squares (OLS) regression because it involves more than one
explanatory variable.
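Multiple regression uses the same lm() function, with the explanatory variables joined by + in the formula. The sketch below uses R's built-in mtcars data set as an assumed example (the slides do not name a data set), predicting fuel economy from weight and horsepower.

```r
# mtcars ships with R; mpg = miles per gallon, wt = weight (1000 lbs), hp = horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- coef(fit)    # intercept plus one coefficient per explanatory variable
print(coefs)
print(summary(fit))   # standard errors, t-tests, and R-squared

# Prediction for a hypothetical car (values chosen only for illustration)
new_car <- data.frame(wt = 3, hp = 110)
pred <- predict(fit, newdata = new_car)
print(pred)
```

Each coefficient estimates the change in the response for a one-unit change in that variable while the other explanatory variable is held fixed.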
