Exploratory Data Analysis 
Wesley GOI
In today’s session 
• Principles behind exploratory analyses 
• Plotting data out on to popular exploratory graphs 
• Plotting Systems in R 
• Base (Week1) 
• Lattice (Week2) 
• GGPLOT2 (Week2) 
• Choosing and using graphics devices (the output formats)
Scripts can be downloaded at: 
https://www.dropbox.com/s/ii1yj8f650d4l1q/lesson1.r?dl=0
https://www.dropbox.com/s/eme44h6lrhn775l/final.r?dl=0
Principles behind exploratory analyses 
• Show comparisons 
• Show causality, mechanism, explanation 
• Show multivariate data 
• Integrate multiple modes of evidence 
• Describe and document the evidence 
• Content is king 
• SPEED
Dimensionality
One dimension
• Five-number summary
• Boxplots
• Histograms
• Density plots
• Barplots
Two dimensions
• Multiple overlaid 1D plots
• Scatter plots
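A minimal sketch of the one-dimensional summaries listed above, using simulated rainfall values (the vector `rain` is hypothetical, not the Silwood data). `fivenum()` returns Tukey's five numbers; `summary()` reports quartiles, which can differ slightly from the hinges, plus the mean.

```r
# Hypothetical daily rainfall values (mm), standing in for a real data column
set.seed(1)
rain <- round(rexp(31, rate = 0.5), 1)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(rain)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# summary() uses quantile() for the quartiles and also reports the mean
print(summary(rain))
```

The five numbers are exactly what a boxplot draws, which is why the two usually appear together in exploratory work.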
Downloading our dataset
R code
dir.create("exploring_data")
setwd("exploring_data")
# mode = "wb" keeps the zip archive intact on Windows
download.file("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/therbook.zip",
  destfile = "data.zip", mode = "wb")
unzip("data.zip")
Boxplots
R code
# header = TRUE: the first row of the file holds the column names
weather = read.table("SilwoodWeather.txt", header = TRUE)
# Keep only January 2004
onemonth = subset(weather, month == 1 & yr == 2004)
boxplot(onemonth$rain)
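To see which numbers a boxplot actually draws, `boxplot.stats()` can be called directly. This sketch uses a simulated sample `x` (hypothetical) with one deliberately extreme value; with the default `coef = 1.5`, points more than 1.5 times the box length beyond the hinges are returned in `$out` and drawn individually.

```r
set.seed(2)
x <- c(rnorm(30, mean = 10, sd = 2), 25)  # simulated sample plus one extreme value

# $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
# $out:   values beyond the whiskers, plotted as separate points
bs <- boxplot.stats(x, coef = 1.5)
print(bs$stats)
print(bs$out)

# Draw to a temporary PDF so the example also runs non-interactively
pdf(tempfile(fileext = ".pdf"))
boxplot(x, main = "Simulated sample with one outlier")
invisible(dev.off())
```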
Histograms
R code
hist(weather$upper)
# rug() adds a tick mark along the x-axis for each individual value
rug(weather$upper)
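The density plot from the earlier list can be overlaid on a histogram once the histogram is drawn on the density scale. A minimal sketch, assuming a simulated vector `upper` (hypothetical, standing in for `weather$upper`); `density()` uses a Gaussian kernel with R's default bandwidth.

```r
set.seed(3)
upper <- rnorm(500, mean = 15, sd = 6)  # simulated stand-in for weather$upper

pdf(tempfile(fileext = ".pdf"))
# freq = FALSE plots densities instead of counts, so the bar areas sum to 1
h <- hist(upper, freq = FALSE, breaks = 20, main = "Histogram with density overlay")
d <- density(upper)          # kernel density estimate
lines(d, col = "red", lwd = 2)
rug(upper)                   # one tick per observation
invisible(dev.off())
```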
Barplot
R code
barplot(
  table(weather$month),
  col = "wheat",
  main = "Number of Observations in Months")
Graphics devices (grDevices)
               PNG (raster)   PDF (vector)   SVG (vector)
File size      Small          Medium         Medium
Scalable       No             Yes            Yes
Web friendly   Yes            No             Yes
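Every device in the table follows the same pattern: open the device, draw, then close it with `dev.off()`, which writes the file. A minimal sketch; the file names are hypothetical and everything is written to a temporary directory. The PNG and SVG devices depend on the platform's graphics support, so they are only tried when `capabilities()` reports them as available.

```r
out_dir <- tempdir()
pdf_file <- file.path(out_dir, "example.pdf")

# Vector output: width and height are in inches
pdf(pdf_file, width = 7, height = 5)
plot(1:10, main = "Written to PDF")
invisible(dev.off())          # closing the device writes the file

# Raster output: width and height are in pixels
if (capabilities("png")) {
  png(file.path(out_dir, "example.png"), width = 800, height = 600)
  plot(1:10, main = "Written to PNG")
  invisible(dev.off())
}

# SVG output is cairo-based
if (capabilities("cairo")) {
  svg(file.path(out_dir, "example.svg"), width = 7, height = 5)
  plot(1:10, main = "Written to SVG")
  invisible(dev.off())
}
```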
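Plotting Systems
Base
• Libraries: built in (graphics)
• Example functions: hist, barplot, boxplot (covered above), plot
• Faceted plots: Yes
• Grammar of graphics: No
• Interface with statistical functions: Yes
Lattice
• Libraries: lattice
• Example functions: xyplot (scatterplots), bwplot (boxplots), levelplot
• Faceted plots: Yes
• Grammar of graphics: No
• Interface with statistical functions: Partial
Grid
• Libraries: grid, gridExtra, ggplot2
• Example functions: qplot, ggplot, geoms
• Faceted plots: Yes
• Grammar of graphics: Yes
• Interface with statistical functions: Partial, plus workarounds
Plots from different systems cannot be mixed.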
Base plots: Scatterplot
R code
data1 = read.table("scatter1.txt", header = TRUE)
data2 = read.table("scatter2.txt", header = TRUE)
# Color: draw data1 as red points
with(data1, plot(xv, ys, col = "red"))
# Regression line: add the least-squares fit of ys on xv
with(data1, abline(lm(ys ~ xv)))
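A self-contained version of the scatterplot with a regression line, for readers without `scatter1.txt`. The columns `xv` and `ys` are simulated here (hypothetical values with a true slope of 2 and intercept of 3); fitting the model once and passing the `lm` object to `abline()` draws the fitted line.

```r
set.seed(4)
xv <- runif(50, 0, 10)                 # simulated predictor
ys <- 3 + 2 * xv + rnorm(50, sd = 1)   # response: intercept 3, slope 2, plus noise

fit <- lm(ys ~ xv)                     # least-squares fit
pdf(tempfile(fileext = ".pdf"))
plot(xv, ys, col = "red", xlab = "xv", ylab = "ys")
abline(fit)                            # abline() accepts an lm object
invisible(dev.off())
print(coef(fit))                       # estimates should be close to 3 and 2
```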
Base plots: Scatterplot
Set the symbol used to represent each data point
Base plots: Scatterplot
R code
data1 = read.table("scatter1.txt", header = TRUE)
data2 = read.table("scatter2.txt", header = TRUE)
# Color
with(data1, plot(xv, ys, col = "red"))
with(data1, abline(lm(ys ~ xv)))
# Shape: add data2 as blue points; pch = 11 sets the plotting symbol
with(data2,
  points(xv2, ys2, col = "blue", pch = 11))
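One caveat when adding a second series with `points()`: the axis ranges are fixed by the first `plot()` call, so points from the second series that fall outside them are clipped. A minimal sketch with two simulated series (hypothetical data) that sets `xlim`/`ylim` from both series up front and adds a legend matching the colors and `pch` symbols.

```r
set.seed(5)
# Two hypothetical series with different ranges
x1 <- runif(30, 0, 10); y1 <- 1 + x1 + rnorm(30)
x2 <- runif(30, 0, 10); y2 <- 4 + 0.5 * x2 + rnorm(30)

out <- tempfile(fileext = ".pdf")
pdf(out)
# Axis limits cover both series so points() cannot fall off the plot
plot(x1, y1, col = "red", pch = 16,
     xlim = range(x1, x2), ylim = range(y1, y2), xlab = "x", ylab = "y")
points(x2, y2, col = "blue", pch = 11)
legend("topleft", legend = c("series 1", "series 2"),
       col = c("red", "blue"), pch = c(16, 11))
invisible(dev.off())
```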
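Base plots: Using par for multiple plots
R code
# One row, two columns; oma reserves a top outer margin for the shared title
par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))
with(data1, plot(xv, ys, col = "red"))
with(data1, abline(lm(ys ~ xv)))
# Plot 2
with(data2,
  plot(xv2, ys2, col = "blue", pch = 11))
title("My Title", outer = TRUE)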
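Par: to set global settings
R code
par(
  mfrow = c(1, 2),               # panel layout: rows, columns
  mar = c(5.1, 4.1, 4.1, 2.1),   # inner margins in lines: bottom, left, top, right (R's default)
  oma = c(2, 2, 2, 2)            # outer margins around the whole figure, in lines
)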
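Because `par()` settings are global, they persist for every later plot on the same device. `par()` returns the previous values of whatever it changes, so a common pattern is to save them and restore them afterwards. A minimal sketch, drawing to a temporary PDF; the panel contents are placeholders.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)

# Save the old values of the settings being changed
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))
layout_during <- par("mfrow")

plot(1:10, main = "Panel 1")
plot(10:1, main = "Panel 2")
title("Shared title", outer = TRUE)   # drawn in the outer margin set by oma

# Restore the earlier settings so later plots are unaffected
par(old)
layout_after <- par("mfrow")
invisible(dev.off())
```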
Lattice
R code
productivity = read.table("productivity.txt", header = TRUE)
# Number of mammal species in forests of differing productivity
library(lattice)
# Plotting: xyplot(formula, data frame, ...)
xyplot(x ~ y, productivity,
  xlab = list(label = "Productivity"),
  ylab = list(label = "Mammal Species"))
Exploratory Analysis Part1 Coursera DataScience Specialisation
Lattice: conditioning
R code
# "| f" reads "given f": one panel is drawn for each level of f
xyplot(x ~ y | f, productivity,
  xlab = list(label = "Productivity"),
  ylab = list(label = "Mammal Species"))
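A self-contained sketch of a conditioned lattice plot, using a simulated data frame `sim` (hypothetical columns `x`, `y` and a three-level factor `f`). In a lattice formula the left-hand side goes on the vertical axis, and lattice functions return a "trellis" object that is only drawn when printed, which matters inside scripts and loops.

```r
library(lattice)  # ships with R as a recommended package

set.seed(6)
# Hypothetical data: x, a count y, and a grouping factor f
sim <- data.frame(
  x = runif(90, 0, 100),
  f = factor(rep(c("A", "B", "C"), each = 30))
)
sim$y <- round(5 + 0.2 * sim$x + rnorm(90, sd = 3))

# y on the vertical axis, x on the horizontal, one panel per level of f
p <- xyplot(y ~ x | f, data = sim, layout = c(3, 1),
            xlab = "x", ylab = "y")

# Nothing is drawn until the trellis object is printed
pdf(tempfile(fileext = ".pdf"))
print(p)
invisible(dev.off())
```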
ggplot2 
• Grammar of graphics (gg) 
• Based on GRID plotting system, cannot be 
mixed with base 
ggplot2.org
ggplot 
Components 
• Data & relationship 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting
ggplot: anatomy of a plot
[Figure: data and the aesthetic mapping feed the geometric objects (geoms); the coordinate system, scales (log, sqrt, log ratio), title and theme are layered on top]
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object 
• Statistical transformation 
• Scales 
• Coordinate system 
• Facetting 
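R code
library(ggplot2)
# Remember to convert month into a factor, so each month gets its own box
weather$month = factor(weather$month)
# ggplot(data frame, aes(...)): aes() maps variables to plot aesthetics
ggplot(weather, aes(x = month, y = upper)) +
  geom_boxplot()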
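Why the factor conversion matters: with a numeric `month`, `geom_boxplot()` typically draws a single box for all rows (with a warning about a continuous x aesthetic), while a factor gives one box per month on a discrete axis. A minimal sketch with a simulated data frame `weather_sim` (hypothetical, standing in for the Silwood data); `ggplot_build()` is used only to count the boxes that would be drawn.

```r
library(ggplot2)

set.seed(7)
# Hypothetical stand-in for the weather data: 12 months x 30 days
weather_sim <- data.frame(month = rep(1:12, each = 30))
weather_sim$upper <- 12 + 8 * sin((weather_sim$month - 4) * pi / 6) +
  rnorm(nrow(weather_sim), sd = 3)

# Convert month to a factor with readable labels: one box per month
weather_sim$month <- factor(weather_sim$month, levels = 1:12, labels = month.abb)

p <- ggplot(weather_sim, aes(x = month, y = upper)) +
  geom_boxplot()

# One row of computed box statistics per month
n_boxes <- nrow(ggplot_build(p)$data[[1]])
```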
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales 
• Coordinate system 
• Facetting 
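R code
library(dplyr)
# Average upper value per month
weather2 = weather %>%
  group_by(month) %>%
  summarise(average.upper = mean(upper))
# stat = "identity": bar heights are the values themselves, not counts
ggplot(weather2, aes(month, average.upper)) +
  geom_bar(stat = "identity")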
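The same monthly summary can be produced without dplyr. A minimal base-R sketch using `aggregate()` on a simulated data frame `weather_sim` (hypothetical values); the resulting column is renamed to match `average.upper` above.

```r
set.seed(8)
# Hypothetical stand-in: 30 daily values for each of 12 months
weather_sim <- data.frame(month = factor(rep(1:12, each = 30)),
                          upper = rnorm(360, mean = 15, sd = 5))

# Base-R counterpart of group_by(month) %>% summarise(average.upper = mean(upper))
weather2 <- aggregate(upper ~ month, data = weather_sim, FUN = mean)
names(weather2)[names(weather2) == "upper"] <- "average.upper"
print(head(weather2))
```

Either summary table can then be drawn with `geom_col()`, which is ggplot2's shorthand for `geom_bar(stat = "identity")`.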
ggplot 
Components 
• Data & relationship ✔ 
• GEOMetric Object ✔ 
• Statistical transformation✔ 
• Scales✔ 
• Coordinate system 
• Facetting 
R code
plot2 = ggplot(weather2,
    aes(month, average.upper)) +
  geom_bar(aes(fill = month), stat = "identity") +
  scale_fill_brewer(palette = "Set3") +   # Set3 has 12 colours, one per month
  xlab("Months") +
  ylab("Mean upper temperature") + theme_bw()
plot2   # print the stored plot to draw it
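qplot
A separate function that wraps ggplot for simpler syntax (formally deprecated since ggplot2 3.4.0)
R code
# Current ggplot2 no longer honours stat = "identity" in qplot(); geom = "col" draws bars from the y values
qplot(month, upper, fill = month, data = weather, facets = ~yr, geom = "col")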
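Since `qplot()` is deprecated, the same faceted bar chart can be written with `ggplot()` directly: `geom_col()` replaces `geom = "bar", stat = "identity"`, and `facet_wrap(~ yr)` replaces `facets = ~yr`. A minimal sketch on a simulated data frame `weather_sim` (hypothetical values for two years).

```r
library(ggplot2)

set.seed(9)
# Hypothetical data: one value per month for two years
weather_sim <- expand.grid(month = factor(1:12), yr = 2003:2004)
weather_sim$upper <- round(runif(nrow(weather_sim), 5, 25), 1)

p <- ggplot(weather_sim, aes(x = month, y = upper, fill = month)) +
  geom_col() +            # bar heights taken from y
  facet_wrap(~ yr)        # one panel per year

# Number of panels the built plot contains
n_panels <- length(unique(ggplot_build(p)$data[[1]]$PANEL))
```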
Ethos behind visualization
http://keylines.com/network-visualization
Final Challenge
R code
library(ggplot2)
library(dplyr)
# Reads in data
data = read.csv("final.csv")
# Preparing for the rectangle background: one row per area, ordered by region
areas = unique(subset(data, select = c(Planning_Area, Planning_Region)))
areas = areas[order(areas$Planning_Region), ]
areas$rectid = 1:nrow(areas)
# Each region's rectangle spans its areas, padded by half a slot on each side
rectdata = areas %>%
  group_by(Planning_Region) %>%
  summarise(xstart = min(rectid) - 0.5, xend = max(rectid) + 0.5)
# Order the levels so areas in the same region sit next to each other
data$Planning_Area = factor(data$Planning_Area,
  levels = as.character(areas[order(areas$Planning_Region), ]$Planning_Area))
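The rectangle bounds are easier to check on a tiny example. This sketch repeats the same steps in base R on a hypothetical toy data frame (four made-up areas in two regions, not the real `final.csv`): areas are numbered left to right after sorting by region, and each region's rectangle runs from its first area minus 0.5 to its last area plus 0.5, so it covers whole boxplot slots.

```r
# Hypothetical toy data: four areas in two regions
toy <- data.frame(
  Planning_Area   = c("A", "B", "C", "A", "D", "B"),
  Planning_Region = c("East", "East", "West", "East", "West", "East")
)

# One row per area, sorted by region, numbered along the x axis
areas <- unique(toy[, c("Planning_Area", "Planning_Region")])
areas <- areas[order(areas$Planning_Region), ]
areas$rectid <- seq_len(nrow(areas))

# Per-region bounds: first slot - 0.5 to last slot + 0.5
rect_lo <- aggregate(rectid ~ Planning_Region, data = areas, FUN = min)
rect_hi <- aggregate(rectid ~ Planning_Region, data = areas, FUN = max)
rectdata <- data.frame(Planning_Region = rect_lo$Planning_Region,
                       xstart = rect_lo$rectid - 0.5,
                       xend   = rect_hi$rectid + 0.5)

# Factor levels in the same order, so boxes line up with the rectangles
toy$Planning_Area <- factor(toy$Planning_Area, levels = areas$Planning_Area)
print(rectdata)
```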
Final challenge
R code
# Plot: the boxplot layer comes first so the x scale is discrete
# before geom_rect adds numeric positions
p0 =
  ggplot(data, aes(Planning_Area, Unit_Price____psm_)) +
  geom_boxplot(outlier.colour = NA) +   # hide outliers; geom_jitter shows every point
  geom_rect(data = rectdata,
    aes(xmin = xstart, xmax = xend, ymin = -Inf, ymax = Inf,
        fill = Planning_Region, group = Planning_Region),
    alpha = 0.4, inherit.aes = FALSE) +
  geom_jitter(alpha = 0.40, aes(color = as.factor(Year))) +
  scale_color_brewer("Year", palette = "RdBu") +
  scale_fill_brewer(palette = "Set1", name = "Region") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  xlab("Planning Area") + ylab("Unit Price (PSM)")
# Save plot (PDF is a vector format, so no dpi setting is needed)
ggsave("areaboxplots.pdf", plot = p0, width = 20, height = 10, units = "in")
“Above all else show the data.” 
― Edward R. Tufte, The Visual Display of Quantitative Information 
Thank you for your time
