Week3 Cheat Sheet Exploratory Data Analysis

This document is a cheat sheet for Exploratory Data Analysis (EDA) that provides a summary of various R functions and their syntax, including 'summarize', 'group_by', 'cor', 'cor.test', 'aov', 'count', 'ggplot', and others. Each function is accompanied by a brief description and an example of its usage. The document also includes a changelog detailing updates made by different authors.

Uploaded by

moonb4115

Cheat Sheet: Exploratory Data Analysis

Command Syntax Description Example

Command: summarize()

Syntax:
    summarize(.data, ...)

Description:
    The summarize function reduces a data frame to a summary of just one vector or value.

Arguments:
    .data  A data frame, data frame extension (e.g. a tibble), or a lazy data frame.
    ...    Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value, like min(x), n(), or sum(is.na(y)).

Example:
    avg_delays <- sub_airline %>%
      group_by(Reporting_Airline, DayOfWeek) %>%
      summarize(mean_delays = mean(ArrDelayMinutes), .groups = 'keep')

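
As a minimal sketch of the single-value expressions mentioned above, the following summarizes a small made-up data frame without any grouping. The `toy_flights` data and its values are assumptions for illustration only; they stand in for `sub_airline`, which is not reproduced here.

```r
library(dplyr)

# Hypothetical toy data standing in for sub_airline (not from the source)
toy_flights <- data.frame(
  Reporting_Airline = c("AA", "AA", "AS", "AS", "DL"),
  ArrDelayMinutes   = c(10, NA, 0, 30, 5)
)

# Without group_by(), the whole data frame collapses to a single row.
# Each name-value pair becomes one column of the result.
overall <- toy_flights %>%
  summarize(
    n_flights = n(),                               # number of rows
    min_delay = min(ArrDelayMinutes, na.rm = TRUE), # smallest non-missing delay
    n_missing = sum(is.na(ArrDelayMinutes))         # count of missing delays
  )

print(overall)
```

Note that `mean()` and `min()` return `NA` when the column contains missing values unless `na.rm = TRUE` is supplied.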
Command: group_by()

Syntax:
    group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Description:
    The group_by function takes an existing table and converts it into a grouped table where operations are performed "by group".

Arguments:
    .data  A data frame, data frame extension (e.g. a tibble), or a lazy data frame.
    .add   When FALSE, the default, group_by() will override existing groups.
    .drop  Drop groups formed by factor levels that don't appear in the data.

Example:
    sub_airline %>%
      group_by(Reporting_Airline) %>%
      summarize(mean_delays = mean(ArrDelayMinutes))

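
The effect of `.add` can be sketched on a small made-up data frame. The `toy_flights` data below is an assumption for illustration; `group_vars()` is used only to inspect which grouping is active.

```r
library(dplyr)

# Hypothetical toy data (not from the source)
toy_flights <- data.frame(
  Reporting_Airline = c("AA", "AA", "AS", "AS"),
  DayOfWeek         = c(1, 2, 1, 1),
  ArrDelayMinutes   = c(10, 20, 0, 30)
)

by_airline <- toy_flights %>% group_by(Reporting_Airline)

# Default (.add = FALSE): the new grouping replaces the existing one
replaced <- by_airline %>% group_by(DayOfWeek)

# .add = TRUE: the new variable is appended to the existing grouping
added <- by_airline %>% group_by(DayOfWeek, .add = TRUE)

group_vars(replaced)  # "DayOfWeek"
group_vars(added)     # "Reporting_Airline" "DayOfWeek"

# One output row per (airline, day) combination present in the data
mean_delays <- added %>%
  summarize(mean_delays = mean(ArrDelayMinutes), .groups = "drop")
print(mean_delays)
```

Passing both variables in one call, `group_by(Reporting_Airline, DayOfWeek)`, gives the same grouping as the `.add = TRUE` variant.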
Command: cor()

Syntax:
    cor(x, use = , method = )

Description:
    The cor function computes the correlation coefficient.

Arguments:
    x       Matrix or data frame.
    use     Specifies the handling of missing data.
    method  Specifies the type of correlation. Options are pearson, spearman or kendall.

Example:
    sub_airline %>%
      select(DepDelayMinutes, ArrDelayMinutes) %>%
      cor(method = "pearson")

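
The `use` and `method` arguments can be illustrated with a small sketch. The `toy_delays` values are made up, and `use = "complete.obs"` is just one of the documented options for handling missing data.

```r
# Hypothetical toy data with one missing departure delay (not from the source)
toy_delays <- data.frame(
  DepDelayMinutes = c(0, 5, 10, 20, NA, 40),
  ArrDelayMinutes = c(2, 4, 12, 18, 25, 45)
)

# With the default use = "everything", a missing value propagates:
# the correlation between the two columns is NA
cor_default <- cor(toy_delays)

# use = "complete.obs" drops rows that contain an NA before computing
cor_pearson  <- cor(toy_delays, use = "complete.obs", method = "pearson")
cor_spearman <- cor(toy_delays, use = "complete.obs", method = "spearman")

print(cor_default)
print(cor_pearson)
print(cor_spearman)
```

Spearman works on ranks, so it reports a perfect correlation whenever one column increases strictly with the other, even if the relationship is not linear.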
Command: cor.test()

Syntax:
    cor.test(x, y, alternative = c("two.sided", "less", "greater"),
             method = c("pearson", "kendall", "spearman"), exact = NULL,
             conf.level = 0.95, continuity = FALSE, ...)

Description:
    The cor.test function is a test for association/correlation between paired samples. It returns both the correlation coefficient and the significance level (or p-value) of the correlation.

Arguments:
    x, y  Numeric vectors of data values. x and y must have the same length.

Example:
    sub_airline %>%
      cor.test(~DepDelayMinutes + ArrDelayMinutes, data = .)

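
A short sketch of how the pieces of a `cor.test()` result can be pulled out individually. The `toy_delays` values are invented for illustration; the formula interface mirrors the example above.

```r
# Hypothetical paired delays (not from the source)
toy_delays <- data.frame(
  DepDelayMinutes = c(0, 5, 10, 20, 30, 40, 60, 90),
  ArrDelayMinutes = c(3, 2, 14, 17, 35, 38, 55, 95)
)

res <- cor.test(~ DepDelayMinutes + ArrDelayMinutes,
                data = toy_delays,
                method = "pearson",
                alternative = "two.sided")

res$estimate  # the correlation coefficient
res$p.value   # significance level of the test
res$conf.int  # confidence interval (Pearson only), at conf.level = 0.95
```

The result is an object of class `htest`, so printing `res` shows all of these values in one formatted report.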
Command: aov()

Syntax:
    aov(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, ...)

Description:
    The aov function (Analysis of Variance (ANOVA)) is a statistical method used to test whether there are significant differences between the means of two or more groups.

Arguments:
    formula  A formula specifying the model.
    data     A data frame in which the variables specified in the formula will be found. If missing, the variables are searched for in the standard way.

Example:
    aa_as_subset <- sub_airline %>%
      select(ArrDelay, Reporting_Airline) %>%
      filter(Reporting_Airline == 'AA' | Reporting_Airline == 'AS')

    ad_aov <- aov(ArrDelay ~ Reporting_Airline, data = aa_as_subset)

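
Fitting the model is only the first step; the F statistic and p-value come from `summary()`. The following is a minimal sketch on made-up delays for two airlines, where `toy_delays` and its values are assumptions for illustration.

```r
# Hypothetical arrival delays for two airlines (not from the source)
toy_delays <- data.frame(
  Reporting_Airline = rep(c("AA", "AS"), each = 5),
  ArrDelay          = c(12, 15, 9, 20, 14,  1, -3, 4, 0, 2)
)

toy_aov <- aov(ArrDelay ~ Reporting_Airline, data = toy_delays)

# summary() returns a list; its first element is the ANOVA table
aov_table <- summary(toy_aov)[[1]]
print(aov_table)

# The first row belongs to the grouping factor, the second to the residuals
f_value <- aov_table[["F value"]][1]
p_value <- aov_table[["Pr(>F)"]][1]
```

A small p-value suggests that at least one group mean differs from the others; with only two groups this is equivalent to a two-sample t-test with equal variances.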
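Command: count()

Syntax:
    count(df, vars = NULL, wt_var = NULL)

Description:
    The count function lets you quickly count the unique values of one or more variables.

Arguments:
    df    Data frame to be processed.
    vars  Variables to count unique values of.

Note:
    The syntax above matches the count function in the plyr package. The example below uses dplyr-style unquoted column names, where the signature is count(x, ..., wt = NULL, sort = FALSE, name = NULL).

Example:
    sub_airline %>%
      count(Reporting_Airline)
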
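
A brief sketch of counting with dplyr's `count()`, the variant that matches the piped example above. The `toy_flights` data is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data (not from the source)
toy_flights <- data.frame(
  Reporting_Airline = c("AA", "AS", "AA", "DL", "AA", "AS"),
  DayOfWeek         = c(1, 1, 2, 2, 1, 1)
)

# One variable: one row per airline, counts in column n, largest first
by_airline <- toy_flights %>% count(Reporting_Airline, sort = TRUE)

# Two variables: one row per combination that occurs in the data
by_airline_day <- toy_flights %>% count(Reporting_Airline, DayOfWeek)

print(by_airline)
print(by_airline_day)
```

`count(x)` is shorthand for `group_by(x) %>% summarize(n = n())`.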
Command: ggplot()

Syntax:
    ggplot(data = NULL, mapping = aes(), ..., environment = parent.frame())

Description:
    The ggplot function initializes a ggplot object. It can be used to declare the input data frame for a graphic and to specify the set of plot aesthetics intended to be common throughout all subsequent layers unless specifically overridden.

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline, y = DayOfWeek, fill = mean_delays))

Command: corrplot()

Syntax:
    corrplot(corr, method = , type = , ...)

Description:
    The corrplot function provides a visual exploratory tool on a correlation matrix that supports automatic variable reordering to help detect hidden patterns among variables.

Arguments:
    method  There are seven visualization methods (parameter method) in the corrplot package, named 'circle', 'square', 'ellipse', 'number', 'shade', 'color', 'pie'.
    type    There are three layout types (parameter type): 'full', 'upper' and 'lower'.

Example:
    # airlines_cor is a correlation matrix; col is a palette function.
    # Neither definition is shown in this cheat sheet.
    corrplot(airlines_cor, method = "color", col = col(200),
             type = "upper", order = "hclust",
             addCoef.col = "black",           # Add coefficient of correlation
             tl.col = "black", tl.srt = 45)   # Text label color and rotation

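
The corrplot example relies on a correlation matrix and a palette function that the cheat sheet does not define. The sketch below fills those gaps with assumptions: `toy_numeric`, its column values, and the `palette_fn` colors are all hypothetical, chosen only so the call runs end to end.

```r
library(corrplot)

# Hypothetical numeric columns (not from the source); arrival delay is
# constructed to track departure delay so that one strong correlation appears
set.seed(1)
dep <- rnorm(50, mean = 10, sd = 5)
toy_numeric <- data.frame(
  DepDelayMinutes = dep,
  ArrDelayMinutes = dep + rnorm(50, sd = 2),
  CarrierDelay    = rnorm(50, mean = 5, sd = 2),
  WeatherDelay    = rnorm(50, mean = 1, sd = 1)
)

# corrplot() expects a correlation matrix, such as the output of cor()
toy_cor <- cor(toy_numeric)

# colorRampPalette() returns a function; calling it with 200 yields 200 colors
palette_fn <- colorRampPalette(c("#BB4444", "#FFFFFF", "#4477AA"))

corrplot(toy_cor,
         method = "color", col = palette_fn(200),
         type = "upper", order = "hclust",
         addCoef.col = "black",          # print the coefficients
         tl.col = "black", tl.srt = 45)  # label color and rotation
```

With `order = "hclust"`, variables are reordered by hierarchical clustering, so strongly correlated variables end up next to each other.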
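Command: geom_bar()

Syntax:
    geom_bar(mapping = NULL, data = NULL, stat = "count", position = "stack", ...)

Description:
    The geom_bar function is used to produce bar charts for a categorical x. With the default stat = "count", the bar height is the number of rows in each category; with stat = "identity", the bar height is the value mapped to y. Older ggplot2 releases used stat = "bin" as the default and also drew histograms with this geom; current releases provide geom_histogram() for continuous variables.

Example:
    # The data frame (with an Average_Delays column) is not shown in the source;
    # pass it as the first argument of ggplot() or pipe it in.
    ggplot(aes(x = Reporting_Airline, y = Average_Delays)) +
      geom_bar(stat = "identity") +
      ggtitle("Average Arrival Delays by Airline")
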
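
The difference between counting rows and plotting precomputed values can be sketched as follows. Both data frames are made up for illustration, and `layer_data()` is used only to inspect the bar heights that ggplot2 computes.

```r
library(ggplot2)

# Hypothetical raw data: one row per flight (not from the source)
toy_flights <- data.frame(
  Reporting_Airline = c("AA", "AA", "AS", "DL", "DL", "DL")
)

# Hypothetical precomputed averages: one row per airline (not from the source)
toy_avg <- data.frame(
  Reporting_Airline = c("AA", "AS", "DL"),
  Average_Delays    = c(12.5, 4.0, 8.2)
)

# In current ggplot2 releases the default stat counts rows per category
p_counts <- ggplot(toy_flights, aes(x = Reporting_Airline)) +
  geom_bar()

# stat = "identity" uses the y values as bar heights
p_values <- ggplot(toy_avg, aes(x = Reporting_Airline, y = Average_Delays)) +
  geom_bar(stat = "identity") +
  ggtitle("Average Arrival Delays by Airline")

counts_layer <- layer_data(p_counts)  # computed counts per bar
values_layer <- layer_data(p_values)  # y values passed through unchanged
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")` and may read more clearly when the heights are already computed.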
Command: geom_tile()

Syntax:
    geom_tile(mapping = NULL, data = NULL, stat = "identity", position = "identity", ...)

Description:
    The geom_tile function tiles the plane with rectangles.

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      geom_tile(colour = "white", size = 0.2)

Command: geom_text()

Syntax:
    geom_text(mapping = NULL, data = NULL, stat = "identity", position = "identity", parse = FALSE, ...)

Description:
    geom_text is used for text annotation.

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      geom_tile(colour = "white", size = 0.2) +
      geom_text(aes(label = round(mean_delays, 3)))

Command: labs()

Syntax:
    labs(...)

Description:
    labs changes axis labels and legend titles.

Arguments:
    ...  A list of new names in the form aesthetic = "new name".

Example:
    ggplot(avg_delays, aes(x = Reporting_Airline,
                           y = lubridate::wday(DayOfWeek, label = TRUE),
                           fill = bins)) +
      labs(x = "Reporting Airline", y = "Day of Week",
           title = "Average Arrival Delays")

Command: scale_fill_manual()

Syntax:
    scale_fill_manual(..., values)

Description:
    The scale_fill_manual function lets you specify your own mapping from the levels of a discrete variable to fill colors.

Arguments:
    ...     Common discrete scale parameters: name, breaks, labels, na.value, limits and guide. See discrete_scale for more details.
    values  A set of aesthetic values to map data values to.

Example:
    scale_fill_manual(values = c("#d53e4f", "#f46d43", "#fdae61",
                                 "#fee08b", "#e6f598", "#abdda4"))

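
The geom_tile, geom_text, labs and scale_fill_manual entries above combine into a single heatmap. The sketch below is one way to assemble them, under several assumptions: `toy_avg_delays` and its values are made up, the `bins` column is created here with `cut()` because the cheat sheet does not show how it was built, and the day of week is shown with `factor()` rather than `lubridate::wday()` to keep the dependencies to two packages.

```r
library(dplyr)
library(ggplot2)

# Hypothetical mean delays for 3 airlines on 3 days (not from the source)
toy_avg_delays <- expand.grid(
  Reporting_Airline = c("AA", "AS", "DL"),
  DayOfWeek         = 1:3,
  stringsAsFactors  = FALSE
) %>%
  mutate(
    mean_delays = c(2.5, 7.25, 12, 18.4, 4.1, 9.9, 14.75, 21, 0.5),
    # Discretize the delays so that a manual (discrete) fill scale applies
    bins = cut(mean_delays,
               breaks = c(0, 5, 10, 15, 25),
               labels = c("0-5", "5-10", "10-15", "15-25"))
  )

heatmap <- ggplot(toy_avg_delays,
                  aes(x = Reporting_Airline, y = factor(DayOfWeek), fill = bins)) +
  geom_tile(colour = "white") +                        # one rectangle per cell
  geom_text(aes(label = round(mean_delays, 3))) +      # value printed in the cell
  labs(x = "Reporting Airline", y = "Day of Week",
       title = "Average Arrival Delays", fill = "Delay (min)") +
  scale_fill_manual(values = c("#d53e4f", "#f46d43", "#fdae61", "#fee08b"))

print(heatmap)
```

An unnamed `values` vector is matched to the factor levels in order, so it needs at least as many colors as there are levels in `bins`.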

Author(s)
Lakshmi Holla

Changelog
Date Version Changed by Change Description
2023-05-11 1.1 Eric Hao & Vladislav Boyko Updated Page Frames
2021-08-09 1.0 Lakshmi Holla Initial Version
