data.table talk
January 21, 2015
The data.table package
author: Pete Dodd
date: 4 November, 2014
dataframes in R
What is a dataframe?
default R objects for holding data
can mix numeric and text data
ordered/unordered factors
many statistical functions require dataframe inputs
dataframes in R
Problems:
print!
slow searching
verbose syntax
no built-in methods for aggregation
Which is most annoying depends on who you are...
Constructing data.tables
myDT <- data.table(
number=1:3,
letter=c('a','b','c')
) # like data.frame constructor
myDT2 <- as.data.table(myDF) #conversion from an existing data.frame myDF
The data.table class inherits from data.frame, so data.tables can (mostly)
be used exactly like dataframes, and should not break existing code.
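A minimal sketch of the conversion and the class relationship, assuming a made-up data.frame called myDF (it is not part of the talk's data):

```r
library(data.table)

# Hypothetical toy data.frame, for illustration only
myDF <- data.frame(number = 1:3, letter = c("a", "b", "c"))

# as.data.table() returns a converted copy; myDF itself is left untouched
myDT2 <- as.data.table(myDF)

# A data.table carries both classes, which is why data.frame code mostly works
cls    <- class(myDT2)                   # "data.table" "data.frame"
is_df  <- inherits(myDT2, "data.frame")  # TRUE
n_rows <- nrow(myDT2)                    # base functions such as nrow() still apply
```

Functions that only check for a data.frame should therefore accept myDT2 unchanged.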
Examples
WHO TB data:
D <- read.csv('TB_burden_countries_2014-09-10.csv')
names(D)[1:10]
## [1] "country" "iso2" "iso3"
## [5] "g_whoregion" "year" "e_pop_num"
## [9] "e_prev_100k_lo" "e_prev_100k_hi"
Examples
WHO TB data:
head(D[,c(1,6,8)])
## country year e_prev_100k
## 1 Afghanistan 1990 327
## 2 Afghanistan 1991 359
## 3 Afghanistan 1992 387
## 4 Afghanistan 1993 412
## 5 Afghanistan 1994 431
## 6 Afghanistan 1995 447
Examples
Mean TB in Afghanistan
mean(D[D$country=='Afghanistan','e_prev_100k'])
## [1] 397.6087
As data.table:
library(data.table)
E <- as.data.table(D) #convert
E[country=='Afghanistan',mean(e_prev_100k)]
## [1] 397.6087
Examples
dataframe multi-column access:
D[D$country=='Afghanistan',
c('e_prev_100k','e_prev_100k_lo',
'e_prev_100k_hi')]
data.table multi-column means, renamed:
E[country=='Afghanistan',
list(mid=mean(e_prev_100k),
lo=mean(e_prev_100k_lo),
hi=mean(e_prev_100k_hi))]
## mid lo hi
## 1: 397.6087 187.913 684.7391
Examples
Means for each country? data.table solution:
E[,list(mid=mean(e_prev_100k)),by=country]
## country mid
## 1: Afghanistan 397.60870
## 2: Albania 29.52174
## 3: Algeria 133.95652
## 4: American Samoa 15.09130
## 5: Andorra 30.71304
## ---
## 215: Wallis and Futuna Islands 117.86957
## 216: West Bank and Gaza Strip 11.14783
## 217: Yemen 180.30435
## 218: Zambia 501.39130
## 219: Zimbabwe 386.30435
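The general pattern behind these calls is DT[i, j, by]: pick rows with i, compute j, and repeat per group in by. A small sketch on invented data (the table toy and its column prev are stand-ins, not the WHO file):

```r
library(data.table)

# Hypothetical toy table standing in for the WHO data
toy <- data.table(
  country = c("A", "A", "B", "B"),
  year    = c(1999, 2000, 1999, 2000),
  prev    = c(10, 20, 30, 50)
)

# i only: filter rows, then compute one summary in j
only_a <- toy[country == "A", mean(prev)]

# j with by: one row of output per country, with a renamed column
means <- toy[, list(mid = mean(prev)), by = country]
```

Here only_a is a single number and means is a two-row data.table with columns country and mid.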
Examples
A more complicated example:
E[,
list(lo=mean(e_prev_100k_lo),
hi=mean(e_prev_100k_hi)),
by=list(country,
century=factor(year<2000)
)]
Examples
Output:
## country century lo hi
## 1: Afghanistan TRUE 189.20000 749.80000
## 2: Afghanistan FALSE 186.92308 634.69231
## 3: Albania TRUE 13.20000 65.40000
## 4: Albania FALSE 10.59231 47.53846
## 5: Algeria TRUE 49.40000 212.80000
## ---
## 427: Yemen FALSE 62.69231 218.38462
## 428: Zambia TRUE 291.60000 1024.90000
## 429: Zambia FALSE 197.00000 733.76923
## 430: Zimbabwe TRUE 14.81000 1074.60000
## 431: Zimbabwe FALSE 56.07692 1219.61538
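Grouping by an expression, as in the century example, can be sketched on invented data; the table toy and the column name pre2000 are assumptions made for illustration:

```r
library(data.table)

# Hypothetical toy table, not the WHO data
toy <- data.table(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1998, 1999, 2001, 1999, 2002),
  lo      = c(1, 3, 5, 10, 20)
)

# by accepts named expressions: pre2000 is computed on the fly and
# becomes a grouping column in the result; .N counts rows per group
res <- toy[,
  list(lo = mean(lo), n = .N),
  by = list(country, pre2000 = year < 2000)
]
```

Groups appear in order of first occurrence, so the result has one row per (country, pre2000) combination present in the data.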
Examples
eo <- E[,plot(sort(e_prev_100k))]
[Plot: sort(e_prev_100k) against Index]
(1-line combination with aggregations)
Fast insertion
A new column can be inserted by:
E[,country_t := paste0(country,year)]
head(E[,country_t])
## [1] "Afghanistan1990" "Afghanistan1991" "Afghanistan1992
## [5] "Afghanistan1994" "Afghanistan1995"
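A short sketch of := on invented data, assuming a toy table rather than E. The point is that the table is modified in place, so no reassignment is needed:

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(country = c("A", "B"), year = c(1990L, 1991L))

# := adds the column to toy itself; there is no `toy <- ...`
toy[, country_t := paste0(country, year)]

# Several columns can be added at once from a list of values
toy[, c("prev_year", "next_year") := list(year - 1L, year + 1L)]

# Assigning NULL removes a column, again in place
toy[, prev_year := NULL]
```

After these three calls toy holds the columns country, year, country_t and next_year.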
Keys: fast row retrieval
The key must be set before the lookup (the setkey line):
setkey(E,country) #sorts E by country and marks it as the key
E['Afghanistan',e_inc_100k]
## country e_inc_100k
## 1: Afghanistan 189
## 2: Afghanistan 189
## 3: Afghanistan 189
## 4: Afghanistan 189
## 5: Afghanistan 189
## 6: Afghanistan 189
## 7: Afghanistan 189
## 8: Afghanistan 189
## 9: Afghanistan 189
## 10: Afghanistan 189
## 11: Afghanistan 189
## 12: Afghanistan 189
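A minimal sketch of keyed lookup on invented data (the table toy and its column inc are assumptions). setkey() reorders the table by reference, after which a character value in i is matched against the key:

```r
library(data.table)

# Hypothetical toy table, deliberately unsorted
toy <- data.table(country = c("B", "A", "B", "A"), inc = c(4, 1, 3, 2))

# Sorts toy by country in place and records country as the key
setkey(toy, country)
the_key <- key(toy)

# With a key set, a character i is looked up in the key column
a_rows <- toy["A"]

# The unkeyed equivalent, a scan-style filter on the column
a_scan <- toy[country == "A"]
```

Both forms return the same rows; the keyed form is the one that benefits from the pre-sorted table.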
Gotchas: column access
E[,1] # in data.table versions before 1.9.8, j is evaluated as the constant 1
## [1] 1
E[,1,with=FALSE]
## country
## 1: Afghanistan
## 2: Afghanistan
## 3: Afghanistan
## 4: Afghanistan
## 5: Afghanistan
## ---
## 4899: Zimbabwe
## 4900: Zimbabwe
## 4901: Zimbabwe
## 4902: Zimbabwe
## 4903: Zimbabwe
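Some ways of selecting columns that do not depend on how a bare number in j is interpreted, sketched on an invented table toy:

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(country = c("A", "B"), year = c(1990L, 1991L))

# An unquoted column name in j returns the column as a plain vector
as_vec <- toy[, country]

# Wrapping it in list() returns a one-column data.table instead
as_dt <- toy[, list(country)]

# with = FALSE makes j behave like data.frame column indexing,
# by position or by a character vector of names
by_pos  <- toy[, 1, with = FALSE]
cols    <- c("country", "year")
by_name <- toy[, cols, with = FALSE]
```

The with = FALSE forms are the closest match to D[, 1] and D[, cols] on a data.frame.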
Gotchas: copying
E2 <- E
E[,foo:='bar']
head(E2[,foo])
## [1] "bar" "bar" "bar" "bar" "bar" "bar"
Gotchas: copying
This is because E2 <- E does not copy the data.table: both names refer to the same object, and := modifies it by reference.
Use:
E2 <- copy(E)
instead.
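The difference between plain assignment and copy() can be sketched on an invented table; the names alias and snap are illustrative only:

```r
library(data.table)

# Hypothetical toy table
toy <- data.table(x = 1:3)

alias <- toy        # a second name for the same object
snap  <- copy(toy)  # an independent copy

# Modify toy by reference
toy[, foo := "bar"]

alias_has_foo <- "foo" %in% names(alias)  # the alias sees the new column
snap_has_foo  <- "foo" %in% names(snap)   # the copy does not
```

Taking a copy() before any := call is the safe habit when the original table must stay unchanged.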
Summary
more compact
faster (sometimes lots)
less memory
great for aggregation/exploratory data crunching
But:
a few traps for the unwary
Good package vignettes & FAQ
Related
aggregate in base R
plyr: use of ddply
sqldf: good if you know SQL
RSQLite: ditto
other:
RODBC etc: talk to databases
dplyr: nascent, by Hadley, internal & external
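For comparison, a per-group mean with base R's aggregate() might look like the sketch below, using an invented data.frame toy_df rather than the WHO data:

```r
# Hypothetical toy data.frame; no extra packages needed
toy_df <- data.frame(
  country = c("A", "A", "B", "B"),
  prev    = c(10, 20, 30, 50)
)

# Formula interface: mean of prev within each country
agg <- aggregate(prev ~ country, data = toy_df, FUN = mean)
```

The result is an ordinary data.frame with one row per country, the same summary that DT[, mean(prev), by = country] would give on a data.table.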

Introduction to data.table in R

  • 2. The data.table package author: Pete Dodd date: 4 November, 2014
  • 3. dataframes in R
      What is a dataframe?
      - default R objects for holding data
      - can mix numeric and text data
      - ordered/unordered factors
      - many statistical functions require dataframe inputs
  • 4. dataframes in R
      Problems:
      - print!
      - slow searching
      - verbose syntax
      - no built-in methods for aggregation
      Which is most annoying depends on who you are...
  • 5. Constructing data.tables
      myDT <- data.table(
        number=1:3,
        letter=c('a','b','c')
      ) # like data.frame constructor
      myDT2 <- as.data.table(myDF) # conversion
      The data.table class inherits from data.frame, so data.tables can (mostly) be used exactly like dataframes and should not break existing code.
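A minimal sketch of the two construction routes and of the inheritance claim. `myDF` is a made-up data.frame introduced only for illustration; it is not part of the talk's data.

```r
library(data.table)

# Construct directly, exactly like data.frame()
myDT <- data.table(number = 1:3, letter = c('a', 'b', 'c'))

# Convert an existing data.frame (myDF is a hypothetical example)
myDF <- data.frame(number = 4:6, letter = c('d', 'e', 'f'))
myDT2 <- as.data.table(myDF)

class(myDT2)          # lists both "data.table" and "data.frame"
is.data.frame(myDT2)  # data.frame-based code still accepts it
nrow(myDT2)           # base functions work as usual
```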
  • 6. Examples
      WHO TB data:
      D <- read.csv('TB_burden_countries_2014-09-10.csv')
      names(D)[1:10]
      ## [1] "country" "iso2" "iso3"
      ## [5] "g_whoregion" "year" "e_pop_num"
      ## [9] "e_prev_100k_lo" "e_prev_100k_hi"
  • 7. Examples
      WHO TB data:
      head(D[,c(1,6,8)])
      ##       country year e_prev_100k
      ## 1 Afghanistan 1990         327
      ## 2 Afghanistan 1991         359
      ## 3 Afghanistan 1992         387
      ## 4 Afghanistan 1993         412
      ## 5 Afghanistan 1994         431
      ## 6 Afghanistan 1995         447
  • 8. Examples
      Mean TB in Afghanistan:
      mean(D[D$country=='Afghanistan','e_prev_100k'])
      ## [1] 397.6087
      As data.table:
      library(data.table)
      E <- as.data.table(D) # convert
      E[country=='Afghanistan',mean(e_prev_100k)]
      ## [1] 397.6087
  • 9. Examples
      dataframe multi-column access:
      D[D$country=='Afghanistan',
        c('e_prev_100k','e_prev_100k_lo',
          'e_prev_100k_hi')]
      data.table multi-column means, renamed:
      E[country=='Afghanistan',
        list(mid=mean(e_prev_100k),
             lo=mean(e_prev_100k_lo),
             hi=mean(e_prev_100k_hi))]
      ##         mid      lo       hi
      ## 1: 397.6087 187.913 684.7391
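The row-filter and summary forms above can be tried without the WHO file. The sketch below uses a small made-up table (`toy`, with invented columns `prev` and `prev_lo`, not real TB figures) to show the same query in data.frame and data.table style.

```r
library(data.table)

# Made-up numbers standing in for the WHO table (not real data)
toy <- data.table(
  country = rep(c('A', 'B'), each = 3),
  year    = rep(c(1998, 1999, 2000), times = 2),
  prev    = c(10, 20, 30, 1, 2, 3),
  prev_lo = c(5, 10, 15, 0, 1, 2)
)
df <- as.data.frame(toy)

# data.frame style: repeat the table name and quote the column
base_mean <- mean(df[df$country == 'A', 'prev'])

# data.table style: i filters rows, j is evaluated inside the table
dt_mean <- toy[country == 'A', mean(prev)]

# A list in j returns a one-row data.table with renamed columns
res <- toy[country == 'A', list(mid = mean(prev), lo = mean(prev_lo))]
res
```

Column names are written unquoted inside `[ ]` because `i` and `j` are evaluated with the table's columns in scope.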
  • 10. Examples
      Means for each country? data.table solution:
      E[,list(mid=mean(e_prev_100k)),by=country]
      ##                        country       mid
      ##   1:               Afghanistan 397.60870
      ##   2:                   Albania  29.52174
      ##   3:                   Algeria 133.95652
      ##   4:            American Samoa  15.09130
      ##   5:                   Andorra  30.71304
      ##  ---
      ## 215: Wallis and Futuna Islands 117.86957
      ## 216:  West Bank and Gaza Strip  11.14783
      ## 217:                     Yemen 180.30435
      ## 218:                    Zambia 501.39130
      ## 219:                  Zimbabwe 386.30435
  • 11. Examples
      A more complicated example:
      E[,
        list(lo=mean(e_prev_100k_lo),
             hi=mean(e_prev_100k_hi)),
        by=list(country,
                century=factor(year<2000))]
  • 12. Examples
      Output:
      ##          country century        lo         hi
      ##   1: Afghanistan    TRUE 189.20000  749.80000
      ##   2: Afghanistan   FALSE 186.92308  634.69231
      ##   3:     Albania    TRUE  13.20000   65.40000
      ##   4:     Albania   FALSE  10.59231   47.53846
      ##   5:     Algeria    TRUE  49.40000  212.80000
      ##  ---
      ## 427:       Yemen   FALSE  62.69231  218.38462
      ## 428:      Zambia    TRUE 291.60000 1024.90000
      ## 429:      Zambia   FALSE 197.00000  733.76923
      ## 430:    Zimbabwe    TRUE  14.81000 1074.60000
      ## 431:    Zimbabwe   FALSE  56.07692 1219.61538
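A small sketch of grouping with `by=`, first by a plain column and then by a column plus a value computed on the fly, as in the two slides above. The table `toy` and the label `pre2000` are invented for illustration; the slide names its flag `century` and wraps it in `factor()`.

```r
library(data.table)

# Made-up numbers (not real TB figures)
toy <- data.table(
  country = rep(c('A', 'B'), each = 4),
  year    = rep(c(1998, 1999, 2000, 2001), times = 2),
  prev    = c(10, 20, 30, 50, 1, 3, 5, 7)
)

# One row per country
by_country <- toy[, list(mid = mean(prev)), by = country]

# One row per (country, before-2000 flag); the flag is computed inside by=
by_period <- toy[, list(mid = mean(prev)),
                 by = list(country, pre2000 = year < 2000)]
by_period
```

Groups come back in order of first appearance, which is why each country's `TRUE` row precedes its `FALSE` row in the slide output.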
  • 13. Examples
      eo <- E[,plot(sort(e_prev_100k))]
      [Figure: sort(e_prev_100k) plotted against Index]
      (1-line combination with aggregations)
  • 14. Fast insertion
      A new column can be inserted by:
      E[,country_t := paste0(country,year)]
      head(E[,country_t])
      ## [1] "Afghanistan1990" "Afghanistan1991" "Afghanistan1992
      ## [5] "Afghanistan1994" "Afghanistan1995"
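A short sketch of `:=` beyond the single-column case, on a made-up table. The column names `decade` and `flag` are invented for illustration.

```r
library(data.table)

toy <- data.table(country = c('A', 'A', 'B'), year = c(1990, 1991, 1990))

# := adds the column in place; no reassignment with <- is needed
toy[, country_t := paste0(country, year)]

# Several columns at once with the functional form of :=
toy[, `:=`(decade = 10 * (year %/% 10), flag = year > 1990)]

# Assigning NULL removes a column, again in place
toy[, flag := NULL]

names(toy)
```

Because the table is modified in place, nothing is printed by the `:=` calls themselves; inspect `toy` afterwards to see the result.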
  • 15. Keys: fast row retrieval
      Need to pre-compute (setkey line):
      setkey(E,country) # must be sorted
      E['Afghanistan',e_inc_100k]
      ##         country e_inc_100k
      ##  1: Afghanistan        189
      ##  2: Afghanistan        189
      ##  3: Afghanistan        189
      ##  4: Afghanistan        189
      ##  5: Afghanistan        189
      ##  6: Afghanistan        189
      ##  7: Afghanistan        189
      ##  8: Afghanistan        189
      ##  9: Afghanistan        189
      ## 10: Afghanistan        189
      ## 11: Afghanistan        189
      ## 12: Afghanistan        189
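A minimal sketch of keyed lookup on a made-up table, including a two-column key, which the slide does not show. The column `inc` and its values are invented.

```r
library(data.table)

toy <- data.table(
  country = c('B', 'A', 'B', 'A'),
  year    = c(1991, 1990, 1990, 1991),
  inc     = c(7, 189, 5, 191)
)

setkey(toy, country, year)  # sorts toy in place by country, then year
key(toy)                    # the key columns, in order

# Character i is matched against the first key column
a_rows <- toy['A']

# Lookup on both key columns; .() is an alias for list()
a_1991 <- toy[.('A', 1991)]
a_1991$inc
```

`setkey()` reorders the rows physically, so any code relying on the original row order should run before the key is set.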
  • 16. Gotchas: column access
      E[,1]
      ## [1] 1
      E[,1,with=FALSE]
      ##        country
      ##    1: Afghanistan
      ##    2: Afghanistan
      ##    3: Afghanistan
      ##    4: Afghanistan
      ##    5: Afghanistan
      ##   ---
      ## 4899:    Zimbabwe
      ## 4900:    Zimbabwe
      ## 4901:    Zimbabwe
      ## 4902:    Zimbabwe
      ## 4903:    Zimbabwe
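The output above reflects the data.table release used for the talk; newer releases changed this default, so `E[,1]` may return the first column instead of the number 1. The sketch below, on a made-up table, sticks to spellings that should not depend on that default.

```r
library(data.table)

toy <- data.table(country = c('A', 'B'), year = c(1990, 1991), prev = c(10, 20))

# Unquoted names in j are always treated as columns
toy[, country]              # a character vector
toy[, list(country, year)]  # a two-column data.table

# Column names held in a variable: use with = FALSE ...
cols <- c('country', 'prev')
sub1 <- toy[, cols, with = FALSE]

# ... or .SDcols, which selects the same columns
sub2 <- toy[, .SD, .SDcols = cols]

# [[ ]] extracts a single column by position, as for any list
first <- toy[[1]]
```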
  • 17. Gotchas: copying
      E2 <- E
      E[,foo:='bar']
      head(E2[,foo])
      ## [1] "bar" "bar" "bar" "bar" "bar" "bar"
  • 18. Gotchas: copying
      This is because E2 <- E does not copy the table: both names refer to the same object, and := modifies it by reference.
      Use:
      E2 <- copy(E)
      instead.
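A small sketch contrasting a plain assignment with `copy()`. The names `alias` and `snap` are invented for illustration.

```r
library(data.table)

toy <- data.table(x = 1:3)

alias <- toy        # no copy: both names point at the same table
snap  <- copy(toy)  # an independent copy

toy[, foo := 'bar'] # modifies the shared table in place

'foo' %in% names(alias)  # TRUE: the alias sees the new column
'foo' %in% names(snap)   # FALSE: the copy is unaffected
```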
  • 19. Summary
      - more compact
      - faster (sometimes lots)
      - less memory
      - great for aggregation/exploratory data crunching
      But:
      - a few traps for the unwary
      Good package vignettes & FAQ.
  • 20. Related
      - aggregate in base R
      - plyr: use of ddply
      - sqldf: good if you know SQL
      - RSQLite: ditto
      other:
      - RODBC etc: talk to databases
      - dplyr: nascent, by Hadley, internal & external
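For comparison with the first item in the list above, a sketch of the same per-group mean written with base R `aggregate()` and with data.table. The data are made up.

```r
library(data.table)

# Made-up numbers (not real TB figures)
df <- data.frame(country = c('A', 'A', 'B', 'B'), prev = c(10, 30, 1, 3))

# Base R: formula interface of aggregate()
base_res <- aggregate(prev ~ country, data = df, FUN = mean)

# data.table equivalent
dt_res <- as.data.table(df)[, list(prev = mean(prev)), by = country]
```

Both return one row per country; the data.table form extends to several summaries by adding entries to the `list()`.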