SHARETHIS
DATA ANALYSIS with R
Hassan Namarvar
WHAT IS R?
• R is a free programming language and software environment for
statistical computing and graphics.
• It is similar to the S language, developed at AT&T Bell Labs by Rick
Becker, John Chambers, and Allan Wilks.
• R was initially developed by Ross Ihaka and Robert Gentleman
(1996) at the University of Auckland, New Zealand.
• R source code is written in C, Fortran, and R.
R PARADIGMS
Multi-paradigm:
– Array
– Object-oriented
– Imperative
– Functional
– Procedural
– Reflective
STATISTICAL FEATURES
• Graphical Techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification
• Clustering
• Machine learning
PROGRAMMING FEATURES
• R is an interpreted language
• Access R through a command-line interpreter
• Like MATLAB, R supports matrix arithmetic
• Data structures:
– Vectors
– Matrices
– Arrays
– Data Frames
– Lists
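The data structures listed above can be sketched in a few lines. This is a minimal illustration only; the variable names and values are made up for the example.

```r
# Vector: a one-dimensional sequence of values of a single type
v <- c(4, 7, 23.5)

# Matrix: two-dimensional, here 2 rows by 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: the same idea in any number of dimensions (here 2 x 3 x 2)
a <- array(1:12, dim = c(2, 3, 2))

# Data frame: a table whose columns may have different types
df <- data.frame(id = 1:3, name = c("aa", "bb", "cc"))

# List: an ordered collection of arbitrary objects
l <- list(numbers = v, table = df)

# Matrix arithmetic: t() transposes, %*% is the matrix product
p <- m %*% t(m)

print(dim(m))
print(dim(a))
print(dim(p))
str(l)
```

The `%*%` operator is the matrix product; a plain `*` between two matrices multiplies element by element.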
ADVANTAGES OF R
• One of the most comprehensive statistical analysis packages
available.
• Outstanding graphical capabilities
• Open source software – reviewed by experts
• R is free and licensed under the GNU General Public License (GPL).
• R has over 5,578 packages as of May 31, 2014!
• R is cross-platform: GNU/Linux, Mac, Windows.
• R plays well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL,
and SQLite.
HOW TO INSTALL R?
• Download and install the latest version from:
– http://cran.r-project.org
• Install packages from R Console:
– > install.packages('package_name')
• R has its own LaTeX-like documentation:
– > help()
STARTING WITH R
• In R console:
– > x <- 2
– > x
– > y <- x^2
– > y
– > ls()
– > rm(y)
• Vectors:
– > v <- c(4, 7, 23.5, 76.2, 80)
– > summary(v)
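A short sketch of what can be done with the vector above. R is case-sensitive, so the summary function is spelled `summary` in lower case. The operations shown are illustrative choices, not part of the original slide.

```r
v <- c(4, 7, 23.5, 76.2, 80)

v * 2            # arithmetic is applied element by element
big <- v[v > 20] # logical indexing keeps the elements that satisfy the condition
big

mean(v)
summary(v)       # minimum, quartiles, median, mean, maximum
```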
STARTING WITH R
• Histogram:
– > r <- rnorm(100)
– > summary(r)
– > plot(r)
– > hist(r)
• QQ-Plot (Quantile):
– > qqplot(r, rnorm(1000))
STARTING WITH R
• Factors:
– > g <- c('f', 'm', 'm', 'm', 'f', 'm', 'f', 'm')
– > h <- factor(g)
– > table(g)
• Matrices:
– > r <- rnorm(100)
– > dim(r) <- c(50,2)
– > r
– > summary(r)
– > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow=TRUE)
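The following sketch shows what a factor stores internally and how `byrow` affects the layout of a matrix. It reuses the values from the slide; the inspection calls are added for illustration.

```r
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)

levels(h)             # the distinct categories, sorted alphabetically
as.integer(h)         # the internal integer code of each element
counts <- table(h)    # number of observations per level
counts

# byrow = TRUE fills the matrix one row at a time
M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
M
M[2, 1]               # element in row 2, column 1
dim(t(M))             # the transpose has 3 rows and 2 columns
```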
STARTING WITH R
• Data Frames:
– > n = c(2, 3, 5)
– > s = c("aa", "bb", "cc")
– > b = c(TRUE, FALSE, TRUE)
– > df = data.frame(n, s, b)
• Built-in Data Set:
– > state.x77
– > st = as.data.frame(state.x77)
– > st$Density = st$Population * 1000 / st$Area
– > summary(st)
– > cor(st)
– > pairs(st)
STARTING WITH R
[Figure: scatterplot matrix produced by pairs(st), with panels for Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area, and Density]
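As a sketch of how the derived column can be inspected, the example below rebuilds the data frame from the built-in `state.x77` matrix and looks at the densest states and at the correlations with life expectancy. The choice of which rows and columns to print is an assumption made for illustration.

```r
st <- as.data.frame(state.x77)

# Population is recorded in thousands, so multiply by 1000 before dividing
st$Density <- st$Population * 1000 / st$Area

# The three states with the highest density
top <- head(st[order(-st$Density), c("Population", "Area", "Density")], 3)
top

# Correlation of life expectancy with every column
life_cor <- round(cor(st)[, "Life Exp"], 2)
life_cor
```

The column name `Life Exp` contains a space, which is why it is quoted here and renamed before being used in a model formula.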
LINEAR REGRESSION MODEL IN R
• Linear Regression Model:
– > x <- 1:100
– > y <- x^3
– Model: y = a + b * x
– > lm(y ~ x)
– > model <- lm(y ~ x)
– > summary(model)
– > par(mfrow=c(2,2))
– > plot(model)
LM MODEL
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-129827 -103680  -29649   85058  292030

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
x              9150.4      400.6  22.844  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 115600 on 98 degrees of freedom
Multiple R-squared: 0.8419, Adjusted R-squared: 0.8403
F-statistic: 521.9 on 1 and 98 DF, p-value: < 2.2e-16
LM MODEL
[Figure: plot titled "y=x^3", y against x for x from 0 to 100]
DIAGNOSIS PLOT
[Figure: the four diagnostic panels from plot(model): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance); observations 98, 99, and 100 are labeled in each panel]
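The example fits a straight line to data that follow a cubic curve, which is why the fit is imperfect and the residuals show a clear pattern. A minimal sketch of the comparison follows; the cubic model and the `r2` helper are additions for illustration, not part of the original slides.

```r
x <- 1:100
y <- x^3

linear <- lm(y ~ x)         # the straight-line model from the slide
cubic  <- lm(y ~ I(x^3))    # I() makes ^ mean arithmetic power inside a formula

# R-squared computed directly from the residuals (hypothetical helper)
r2 <- function(m) 1 - sum(residuals(m)^2) / sum((y - mean(y))^2)
r2(linear)
r2(cubic)

# Sign of the linear model's residuals at the start, middle, and end of x
res_sign <- sign(unname(residuals(linear))[c(1, 50, 100)])
res_sign
```

Residuals that are positive at both ends and negative in the middle indicate curvature that the linear model cannot capture.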
LINEAR REGRESSION MODEL IN R
• Model Built-in Data:
– > colnames(st)[4] = "Life.Exp"
– > colnames(st)[6] = "HS.Grad"
– > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data=st)
– > summary(model1)
– > model2 <- step(model1)
– > model3 = update(model2, .~.-Population)
– > summary(model3)
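A self-contained sketch of the same workflow, assuming only the built-in columns of `state.x77` (no Density column) and using `make.names()` to produce the syntactic column names. Which terms `step()` keeps depends on the data, so the `update()` call is illustrative only.

```r
st <- as.data.frame(state.x77)
names(st) <- make.names(names(st))    # "Life Exp" -> "Life.Exp", "HS Grad" -> "HS.Grad"

full <- lm(Life.Exp ~ ., data = st)   # every other column as a predictor
reduced <- step(full, trace = 0)      # stepwise selection by AIC, progress output suppressed

formula(reduced)                      # the terms that survived
AIC(full, reduced)                    # lower AIC is better

# update() edits an existing model; ". ~ . - Term" drops one term
smaller <- update(reduced, . ~ . - Population)
formula(smaller)
```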
LINEAR REGRESSION MODEL IN R
• Confidence limits on Estimated Coefficients:
– > confint(model3)
– > predict(model3, list(Murder=10.5, HS.Grad=48, Frost=100))
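The sketch below shows coefficient intervals and the two kinds of interval that `predict()` can return. The model formula is an assumption chosen to match the predictors in the call above; the model actually selected by `step()` may differ.

```r
st <- as.data.frame(state.x77)
names(st) <- make.names(names(st))

# Assumed model for illustration
fit <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

confint(fit, level = 0.95)    # one row per coefficient: lower and upper limit

new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)

# Interval for the mean response at these predictor values
ci <- predict(fit, new_state, interval = "confidence")
# Interval for a single new observation; wider because it includes residual noise
pr <- predict(fit, new_state, interval = "prediction")
ci
pr
```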
OUTLIERS
• Boxplot:
– > v <- rnorm(100)
– > v = c(v,10)
– > boxplot(v)
– > rug(jitter(v), side=2)
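The points that a boxplot draws as outliers can also be extracted as numbers with `boxplot.stats()`. This is a minimal sketch; the fixed seed is an addition so that the run is reproducible.

```r
set.seed(1)                  # fixed seed, for reproducibility only
v <- c(rnorm(100), 10)       # 100 standard-normal draws plus one planted outlier

stats <- boxplot.stats(v)    # the statistics that boxplot() draws
stats$stats                  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out                    # points more than 1.5 * IQR beyond the hinges

idx <- which(v %in% stats$out)   # positions of the flagged points in v
idx
```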
PROBABILITY DENSITY FUNCTION
• PDF:
– > r <- rnorm(1000)
– > hist(r, prob=TRUE)
– > lines(density(r), col="red")
[Figure: "Histogram of r" with the density estimate overlaid; x-axis r, y-axis Density]
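To see why the histogram and the density curve share a scale, note that both are density estimates whose area is about 1. The sketch below checks this numerically without drawing anything; the seed and the comparison with `dnorm()` are additions for illustration.

```r
set.seed(1)
r <- rnorm(1000)

h <- hist(r, plot = FALSE)   # histogram computed without drawing
d <- density(r)              # kernel density estimate on an evenly spaced grid

# Area under the histogram when bars are scaled as densities
area_hist <- sum(h$density * diff(h$breaks))
# Riemann-sum approximation of the area under the density curve
area_kde <- sum(d$y) * diff(d$x[1:2])
area_hist
area_kde

# Estimated density at 0 compared with the true standard normal density
at_zero <- approx(d$x, d$y, xout = 0)$y
c(estimate = at_zero, true = dnorm(0))
```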
CASE STUDY: SHARETHIS EXAMPLE
• Relationship of clicks with winning price and impressions on ADX:
• Data
– Analyzed ADX Hourly Impression Logs
• Method
– Detected outliers
– Predicted clicks using a regression tree model
CASE STUDY: SHARETHIS EXAMPLE
• Outlier Detection:
[Figure: outlier-detection plots for Clicks and Impressions]
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
– One of the most powerful classification/regression methods
– > library(rpart)
– > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data=x)
– > plot(fit)
– > text(fit)
– > plot(predict(fit), log(x$CLK))
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
[Figure: fitted regression tree. The root split is log(IMP) < 9.33; further splits use log(IMP), SD_PRICE, and AVG_PRICE; the leaf predictions range from 0.751 to 4.753.]
CASE STUDY: SHARETHIS EXAMPLE
• Predict Log of Clicks
[Figure: scatter plot of predict(fit) against log(x$CLK)]
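The hourly log data behind this case study are not included, so the sketch below runs the same `rpart` call on synthetic data. The column names follow the slide; every value, and the assumed relationship between clicks and impressions, is invented for illustration. It assumes the `rpart` package is installed, as it is with standard R distributions.

```r
library(rpart)

set.seed(1)
n <- 500
# Synthetic stand-in for the hourly logs (hypothetical values)
x <- data.frame(
  IMP       = round(exp(runif(n, 7, 13))),   # impressions
  AVG_PRICE = runif(n, 0.5, 2.5),            # mean winning price
  SD_PRICE  = runif(n, 0.05, 0.6)            # spread of the winning price
)
# Assumed relationship: clicks grow with impressions, plus noise
x$CLK <- pmax(1, round(exp(0.6 * log(x$IMP) - 3 + rnorm(n, sd = 0.4))))

fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
printcp(fit)                       # complexity table: error after each split

# Agreement between fitted and observed log-clicks
fit_cor <- cor(predict(fit), log(x$CLK))
fit_cor
```

Taking logs of clicks and impressions compresses their wide range, so the tree's splits are not dominated by a few very large hours.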
CASE STUDY: COLOR DETECTION
• Detect color from product image:
[Figure: three plots, each with both axes running from -1.0 to 1.0]
RESOURCES
• Books:
– An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013
– The Art of R Programming: A Tour of Statistical Software Design, N. Matloff, 2011
– R Cookbook (O'Reilly Cookbooks), P. Teetor, 2011
• R Blog:
– http://www.r-bloggers.com
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 

Data analysis with R

  • 1. SHARETHIS DATA ANALYSIS with R Hassan Namarvar
  • 2. WHAT IS R? • R is a free programming language and software environment for statistical computing and graphics. • It is similar to the S language, developed at AT&T Bell Labs by Rick Becker, John Chambers and Allan Wilks. • R was initially developed by Ross Ihaka and Robert Gentleman (1996) at the University of Auckland, New Zealand. • R source code is written in C, Fortran, and R.
  • 3. R PARADIGMS Multi-paradigm: – Array – Object-oriented – Imperative – Functional – Procedural – Reflective
  • 4. STATISTICAL FEATURES • Graphical techniques • Linear and nonlinear modeling • Classical statistical tests • Time-series analysis • Classification • Clustering • Machine learning
  • 5. PROGRAMMING FEATURES • R is an interpreted language • Access R through a command-line interpreter • Like MATLAB, R supports matrix arithmetic • Data structures: – Vectors – Matrices – Arrays – Data Frames – Lists
  • 6. ADVANTAGES OF R • One of the most comprehensive statistical analysis packages available. • Outstanding graphical capabilities • Open source software, reviewed by experts • R is free and licensed under the GNU General Public License (GPL). • R had 5,578 packages available as of May 31, 2014. • R is cross-platform: GNU/Linux, Mac, Windows. • R plays well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL, and SQLite.
  • 7. HOW TO INSTALL R? • Download and install the latest version from: – http://cran.r-project.org • Install packages from the R console: – > install.packages('package_name') • R has its own LaTeX-like documentation: – > help()
  • 8. STARTING WITH R • In R console: – > x <- 2 – > x – > y <- x^2 – > y – > ls() – > rm(y) • Vectors: – > v <- c(4, 7, 23.5, 76.2, 80) – > summary(v)
  • 9. STARTING WITH R • Histogram: – > r <- rnorm(100) – > summary(r) – > plot(r) – > hist(r) • QQ-Plot (Quantile): – > qqplot(r, rnorm(1000))
  • 10. STARTING WITH R • Factors: – > g <- c('f', 'm', 'm', 'm', 'f', 'm', 'f', 'm') – > h <- factor(g) – > table(g) • Matrices: – > r <- rnorm(100) – > dim(r) <- c(50,2) – > r – > summary(r) – > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow=TRUE)
  • 11. STARTING WITH R • Data Frames: – > n = c(2, 3, 5) – > s = c("aa", "bb", "cc") – > b = c(TRUE, FALSE, TRUE) – > df = data.frame(n, s, b) • Built-in Data Set: – > state.x77 – > st = as.data.frame(state.x77) – > st$Density = st$Population * 1000 / st$Area – > summary(st) – > cor(st) – > pairs(st)
  • 12. STARTING WITH R [Figure: scatterplot matrix from pairs(st), with panels Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area and Density]
  • 13. LINEAR REGRESSION MODEL IN R • Linear Regression Model: – > x <- 1:100 – > y <- x^3 – Model: y = a + b·x – > lm(y ~ x) – > model <- lm(y ~ x) – > summary(model) – > par(mfrow=c(2,2)) – > plot(model)
  • 14. LM MODEL – Call: – lm(formula = y ~ x) – Residuals: – Min 1Q Median 3Q Max – -129827 -103680 -29649 85058 292030 – Coefficients: – Estimate Std. Error t value Pr(>|t|) – (Intercept) -207070.2 23299.3 -8.887 3.14e-14 *** – x 9150.4 400.6 22.844 < 2e-16 *** – --- – Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 – Residual standard error: 115600 on 98 degrees of freedom – Multiple R-squared: 0.8419, Adjusted R-squared: 0.8403 – F-statistic: 521.9 on 1 and 98 DF, p-value: < 2.2e-16
  • 15. LM MODEL [Figure: plot of y against x for x from 0 to 100, titled "y=x^3"]
  • 16. DIAGNOSIS PLOT [Figure: the four diagnostic panels from plot(model): Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; observations 98, 99 and 100 are labeled in each panel]
  • 17. LINEAR REGRESSION MODEL IN R • Model Built-in Data: – > colnames(st)[4] = "Life.Exp" – > colnames(st)[6] = "HS.Grad" – > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data=st) – > summary(model1) – > model2 <- step(model1) – > model3 = update(model2, .~.-Population) – > summary(model3)
  • 18. LINEAR REGRESSION MODEL IN R • Confidence limits on estimated coefficients: – > confint(model3) – > predict(model3, list(Murder=10.5, HS.Grad=48, Frost=100))
  • 19. OUTLIERS • Boxplot: – > v <- rnorm(100) – > v = c(v,10) – > boxplot(v) – > rug(jitter(v), side=2) [Figure: boxplot of v with a rug along the vertical axis, which runs from -2 to 10]
  • 20. PROBABILITY DENSITY FUNCTION • PDF: – > r <- rnorm(1000) – > hist(r, prob=TRUE) – > lines(density(r), col="red") [Figure: "Histogram of r" on a density scale, with the kernel density estimate overlaid in red]
  • 21. CASE STUDY: SHARETHIS EXAMPLE • Relationship of clicks to winning price and impressions on ADX: • Data – Analyzed ADX hourly impression logs • Method – Detected outliers – Predicted clicks using a regression tree model
  • 22. CASE STUDY: SHARETHIS EXAMPLE • Outlier Detection: [Figure: plot with axes Clicks and Impressions]
  • 23. CASE STUDY: SHARETHIS EXAMPLE • Regression Tree – One of the most powerful classification/regression methods – > library(rpart) – > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data=x) – > plot(fit) – > text(fit) – > plot(predict(fit), log(x$CLK))
  • 24. CASE STUDY: SHARETHIS EXAMPLE • Regression Tree [Figure: fitted regression tree. Split rules: log(IMP)< 9.33, log(IMP)< 8.349, log(IMP)< 11.28, SD_PRICE< 0.2604, log(IMP)>=10.04, log(IMP)< 10.39, AVG_PRICE>=1.713, AVG_PRICE>=1.247, AVG_PRICE< 0.8555, log(IMP)< 12.49. Leaf values: 0.751, 1.387, 1.541, 2.869, 1.959, 2.729, 3.003, 3.104, 4.331, 3.577, 4.753]
  • 25. CASE STUDY: SHARETHIS EXAMPLE • Predict Log of Clicks [Figure: scatterplot of predict(fit) and log(x$CLK)]
  • 26. CASE STUDY: COLOR DETECTION • Detect color from product image: [Figure: three panels, each with both axes running from -1.0 to 1.0]
  • 27. RESOURCES • Books: – An Introduction to Statistical Learning: with Applications in R, G. James, D. Witten, T. Hastie, R. Tibshirani, 2013 – The Art of R Programming: A Tour of Statistical Software Design, N. Matloff, 2011 – R Cookbook (O'Reilly Cookbooks), P. Teetor, 2011 • R Blog: – http://www.r-bloggers.com
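
The state.x77 regression on slides 17 and 18 can be sketched end to end as below. This is a minimal sketch rather than the author's exact session: it fits the three-predictor model (Murder, HS.Grad, Frost) directly instead of arriving at it through step() and update(), and it renames the columns with make.names(), which is an assumption not found in the slides. The name new_state is hypothetical.

```r
# state.x77 ships with base R (datasets package); convert the matrix to a data frame.
st <- as.data.frame(state.x77)

# Assumption: make.names() turns "Life Exp" and "HS Grad" into "Life.Exp" and
# "HS.Grad", the same names the slides assign by column index.
names(st) <- make.names(names(st))

# Fit the three-predictor model that slide 18 predicts from.
model3 <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

# 95% confidence limits on the estimated coefficients.
ci <- confint(model3)
print(ci)

# Predicted life expectancy for one hypothetical state, with a prediction interval.
new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)
pred <- predict(model3, newdata = new_state, interval = "prediction")
print(pred)
```

Passing a data frame as newdata, rather than a bare list, makes the column names explicit, and interval = "prediction" adds lower and upper bounds to the point estimate.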

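The outlier that boxplot() draws on slide 19 can also be pulled out numerically. The following is a minimal sketch using boxplot.stats() from base R, which the slides do not use; the seed is an arbitrary choice made only so the run is reproducible.

```r
set.seed(42)              # arbitrary seed, only for reproducibility
v <- c(rnorm(100), 10)    # 100 standard-normal draws plus one planted outlier

# boxplot.stats() applies the same 1.5 * IQR whisker rule that boxplot() draws
# and returns the points lying beyond the whiskers in $out.
out <- boxplot.stats(v)$out
print(out)
```

The planted value 10 appears in out; a few ordinary draws from the normal tails may be flagged as well, since the whisker rule is a heuristic rather than a formal test.
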