Visualization Techniques

The document outlines various visualization techniques and tools used for data analysis, including histograms, density plots, box plots, bar graphs, pie charts, line charts, and scatterplots. It emphasizes the importance of these techniques in transforming complex data into intuitive graphical representations for better interpretation and decision-making. Additionally, it provides examples using the Titanic dataset and R programming for implementing these visualization methods.

Uploaded by

hoeofjimin

Visualization Techniques

Visualization Tools

1. HISTOGRAM
2. DENSITY PLOT
3. BOX PLOT
4. BAR GRAPH
5. PIE CHART
6. LINE CHART
7. SCATTERPLOT
8. JOINT BAR GRAPH
Visualization Techniques

• We will utilize various datasets to explore and understand different visualization techniques.
• Visualization techniques play a crucial role in transforming complex data into intuitive graphical representations, allowing for better interpretation and decision-making.
• By utilizing diverse visualization methods, we can enhance data analysis, uncover patterns, and communicate findings more effectively.
Datasets
Here, we have used a combination of in-built datasets in R and external datasets.

• titanic: This dataset has to be imported, either with R code or directly from the 'Import Dataset' tab in RStudio (look at attached snapshot).
titanic <- read.csv("path/titanic.csv", header = TRUE, sep = ",")
• header = TRUE tells R that the first row of the CSV file contains column names (headers) instead of data.
• In-built datasets used: mtcars, airmiles
Variables of 'titanic' dataset
Key variables in the Titanic dataset include:

PassengerId: Unique identifier for each passenger
Survived: 1 for survival and 0 for not surviving
Pclass: Passenger class (1st, 2nd, or 3rd)
Name: Passenger's name
Gender: Passenger's gender (male or female)
Age: Passenger's age
SibSp: Number of siblings/spouses on board
Parch: Number of parents/children on board
Ticket: Ticket number
Fare: Ticket fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
Histogram: using simple method
• A histogram is used to see what the distribution of a numerical variable looks like.
• From the visualization, it can be concluded that a significant portion of passengers were aged between 20 and 30.

hist(titanic$Age, main = "Histogram of Age", xlab = "Age")

Explanation
• hist: function to draw a histogram
• titanic$Age: extracts the values of the Age column/variable from the titanic dataset
• main: title of the histogram
ggplot2 Terminology

Some of the terminologies used in ggplot2:

• data - what we want to visualize; consists of variables
• geoms - the type of plot: geometric objects that are drawn to represent the data, such as bars, lines, and points (geom_*())
• aesthetics - define how variables in the data are mapped to visual properties such as x and y position, line color, point shapes, etc.
• There are mappings from data values to aesthetics
• scales - adjust axes, colors, and sizes in the plot
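The following is a minimal sketch of how these pieces fit together. It assumes the ggplot2 package is installed and uses the built-in mtcars dataset instead of titanic so that it runs without an external file; the columns and colors chosen are only illustrative.

```r
library(ggplot2)

# data: mtcars; aesthetics: wt -> x position, mpg -> y position, cyl -> point color
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2) +  # geom: one point per car
  scale_color_manual(     # scale: choose the color used for each cyl value
    values = c("4" = "steelblue", "6" = "orange", "8" = "darkgreen")
  ) +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon", color = "Cylinders") +
  theme_minimal()

print(p)
```

Each `+` adds one layer or setting, so swapping geom_point() for another geom changes the plot type while the data and aesthetic mappings stay the same.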
Histogram: using ggplot

ggplot(titanic, aes(x = Age)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Ages of Titanic Passengers", x = "Age", y = "Count") +
  theme_minimal()

Explanation
• aes(x = Age): defines the aesthetics of the plot, mapping Age to the x-axis
• geom_histogram() creates a histogram
• binwidth: each bar represents a range of values (here, 5 years)
• title: name of the histogram
• y = "Count" labels the y-axis as Count.

Density Plot: using simple method
• A density plot shows the probability density function of a continuous variable, providing insights into the data's distribution, such as its center, the presence of multiple peaks, and the spread.
• The plot is smoothed, which helps in visualizing patterns without the noise of individual data points.

plot(density(na.omit(titanic$Age)), main = "Density Plot of Age")

Explanation
1. density(na.omit(titanic$Age)): This calculates the kernel density estimate of the Age variable in the titanic dataset, after na.omit() drops the missing values.
2. Kernel density estimation is a non-parametric way to estimate the probability density function of a continuous random variable.
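To make the idea of kernel density estimation concrete, here is a small sketch that computes a Gaussian kernel density estimate by hand. The sample `x` is simulated (hypothetical data standing in for a column such as Age), and the bandwidth comes from bw.nrd0(), the rule density() uses by default.

```r
# Simulated two-peaked sample; a stand-in for a numeric column such as Age
set.seed(1)
x <- c(rnorm(100, mean = 25, sd = 5), rnorm(50, mean = 50, sd = 8))

h <- bw.nrd0(x)  # default bandwidth rule of density()
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)

# Gaussian KDE: at each grid point, average the normal curves
# centred on the observations, each with standard deviation h
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)))

# A density should integrate to about 1 (trapezoid rule)
area <- sum(diff(grid) * (head(kde_manual, -1) + tail(kde_manual, -1)) / 2)
area

plot(grid, kde_manual, type = "l", main = "Hand-computed density", xlab = "x", ylab = "Density")
lines(density(x), col = "red", lty = 2)  # built-in estimate for comparison
```

A larger bandwidth `h` gives a smoother curve and can hide peaks, while a smaller one follows individual points more closely.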
Density Plot: using ggplot

ggplot(titanic, aes(x = Age)) +
  geom_density(fill = "blue", alpha = 0.5, na.rm = TRUE) +
  labs(title = "Density Plot of Age", x = "Age", y = "Density") +
  theme_minimal()

Explanation
1. ggplot(titanic, aes(x = Age)): This initializes a ggplot object with the dataset titanic and specifies that the variable Age will be mapped to the x-axis.
2. geom_density(fill = "blue", alpha = 0.5, na.rm = TRUE): This adds a density plot to the graph. The fill = "blue" argument sets the color of the area under the density curve to blue, and alpha = 0.5 makes the blue color semi-transparent (the value of alpha ranges from 0 for fully transparent to 1 for fully opaque). na.rm = TRUE drops missing ages without a warning.
Boxplot: using simple method
• Useful for visualizing the distribution, median, and possible outliers of a numerical variable.
• The plot shows outliers (extreme values) in the dataset. Such extreme values can distort summary statistics and lead to biased results.

boxplot(titanic$Age, main = "Boxplot of Age", ylab = "Age", col = "lightblue", notch = TRUE)

Explanation
notch = TRUE: Adds notches to the boxplot. Notches help compare medians: if the notches of two boxplots do not overlap, this is strong evidence that their medians differ.
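The numbers behind a boxplot can be inspected with boxplot.stats(), which boxplot() uses internally. The sketch below uses a small made-up vector of ages (hypothetical data, not taken from titanic) with two deliberately extreme values.

```r
# Hypothetical ages; 74 and 80 are deliberately extreme
ages <- c(22, 25, 27, 28, 30, 31, 33, 35, 38, 40, 74, 80)

s <- boxplot.stats(ages)
s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points more than 1.5 * IQR beyond the hinges: 74 and 80
s$conf   # notch limits: median +/- 1.58 * IQR / sqrt(n)

boxplot(ages, main = "Boxplot of Ages", ylab = "Age", col = "lightblue")
```

Here the hinges are 27.5 and 39, so any value above 39 + 1.5 * 11.5 = 56.25 is drawn as an individual outlier point.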
Boxplot: using ggplot

ggplot(titanic, aes(y = Age)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Boxplot of Age in Titanic Dataset", y = "Age") +
  theme_minimal()
The box plot can provide answers to the
following questions:

• Is a factor significant?
• Does the location differ between
subgroups?
• Does the variation differ between
subgroups?
• Are there any outliers?
Boxplot: using ggplot
ggplot(titanic, aes(x = factor(Survived), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(title = "Titanic Survivals by Age", x = "Survival (0 = No, 1 = Yes)", y = "Age") +
  scale_fill_manual(values = c("red", "pink"), labels = c("Did not Survive", "Survived")) +
  theme_minimal()
Plotting Age against Survival (0 = No, 1 = Yes).
Bar graph: using ggplot
A bar graph showing the count of passengers by gender in the Titanic dataset, with "Gender" on the x-axis and "Count" on the y-axis.

ggplot(titanic, aes(x = Gender, fill = Gender)) +
  geom_bar() +
  labs(title = "Bar Graph of Gender", x = "Gender", y = "Count")

Explanation
fill = Gender: Bars are colored based on gender.
geom_bar() counts the number of occurrences of each gender and plots the frequency.

Do it Yourself
Q. Count the number of occurrences of each unique value in the Embarked column and create bars for each one.
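As a rough check of what geom_bar() does, the sketch below compares the counts it computes with those from table(). It assumes ggplot2 is installed and uses the built-in mtcars dataset, with cyl standing in for a categorical column such as Gender; the data frame name `cars` is made up for the example.

```r
library(ggplot2)

# mtcars stands in for titanic; cyl plays the role of the categorical column
cars <- transform(mtcars, cyl = factor(cyl))

p <- ggplot(cars, aes(x = cyl, fill = cyl)) +
  geom_bar() +
  labs(title = "Bar Graph of Cylinders", x = "Cylinders", y = "Count")

counts_table <- table(cars$cyl)      # counts computed directly
counts_plot  <- layer_data(p)$count  # counts computed by geom_bar()

counts_table
counts_plot
```

Both give one count per category, which is why geom_bar() needs only an x aesthetic and no y.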
Percentage Bar Graph: using ggplot
• A percentage bar graph visually represents proportions of different categories as parts of a whole.
• Each bar represents 100% and is divided into segments corresponding to different categories, showing their relative percentages.
• This is a stacked percentage bar chart showing the survival rate across different passenger classes.

Step 1
titanic$Survived <- factor(titanic$Survived, labels = c("No", "Yes"))
titanic$Pclass <- factor(titanic$Pclass)

Explanation
Converted 'Survived' and 'Pclass' to factors.
Joint Bar Graph: using ggplot
This allows you to easily compare how the distribution of gears varies across different cylinder categories.
Dataset: mtcars

ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "dodge") +
  labs(title = "Joint Bar Graph: Number of Cylinders vs Number of Gears",
       x = "Number of Cylinders", y = "Count of Cars", fill = "Number of Gears") +
  theme_minimal()

Explanation:
geom_bar(position = "dodge") displays the bars side by side. Dodge means the bars for different groups are placed next to each other instead of being stacked.
factor(cyl) and factor(gear) treat the numeric cyl and gear columns as categories, so each gear value gets its own bar and color.
Percentage Bar Graph: Step 2
ggplot(titanic, aes(x = Pclass, fill = Survived)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +  # Convert y-axis to percentage
  labs(title = "Survival Percentage by Passenger Class",
       x = "Passenger Class", y = "Percentage", fill = "Survived") +
  theme_minimal()

Explanation
geom_bar(position = "fill") → Converts counts to proportions, so the stacked bars are scaled to 100%.
scale_y_continuous(labels = scales::percent_format()) → Shows the y-axis labels as percentages.
fill = Survived → Colors bars based on survival
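The proportions that position = "fill" draws can also be computed directly with prop.table(). This sketch uses the built-in mtcars dataset as a stand-in for titanic (cyl in place of Pclass, am in place of Survived), so the variable names are illustrative only.

```r
# mtcars stands in for titanic: cyl is the x variable, am is the fill variable
counts <- table(Transmission = mtcars$am, Cylinders = mtcars$cyl)

# Share of each transmission type within each cylinder group;
# margin = 2 normalises within columns, so every column sums to 1
shares <- prop.table(counts, margin = 2)

round(shares * 100, 1)  # the same shares, shown as percentages
```

Each column of `shares` corresponds to one 100% bar in the percentage bar graph.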
Stack bar chart: showing exact count

ggplot(titanic, aes(x = as.factor(Pclass), fill = as.factor(Survived))) +
  geom_bar(position = "stack") +
  labs(x = "Passenger Class", y = "Count", fill = "Survived") +
  theme_minimal() +
  scale_fill_manual(values = c("red", "green"))

Explanation
Pclass is on the x-axis, and the bars are filled based on Survived.
geom_bar(position = "stack"): This tells ggplot2 to stack the bars.
Pie Chart: using simple method
• A pie chart is used to see the proportion of each category of a particular categorical variable.
• It produces a pie chart showing the distribution of the different classes of passengers (e.g., 1st class, 2nd class, 3rd class) in the dataset, with each group represented in a different color.

type_counts <- table(titanic$Pclass)
colors <- c("skyblue", "orange", "green")
pie(type_counts, col = colors)

Explanation
1. type_counts <- table(titanic$Pclass)
This line creates a frequency table of the Pclass column in the titanic dataset. The table() function counts how many times each unique group appears. Here the groups are the passenger classes (1st, 2nd, and 3rd class).
Pie Chart which shows percentage
type_counts <- table(titanic$Pclass)
colors <- c("skyblue", "orange", "green")
percent_labels <- round(type_counts / sum(type_counts) * 100, 1)
labels <- paste(names(type_counts), "\n", percent_labels, "%")
pie(type_counts, col = colors, main = "Pie Chart of Pclass", labels = labels)

Explanation
type_counts <- table(titanic$Pclass) - counts occurrences of each Pclass.
percent_labels <- round(type_counts / sum(type_counts) * 100, 1) - converts counts to percentages.
labels <- paste(names(type_counts), "\n", percent_labels, "%") - adds percentages to class labels.
Pie Chart: using ggplot
• 1st step,
type_counts <- as.data.frame(table(titanic$Pclass))
colnames(type_counts) <- c("Pclass", "Count")
• 2nd step,
colors <- c("skyblue", "orange", "green")
• 3rd step,
ggplot(type_counts, aes(x = "", y = Count, fill = Pclass)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  theme_void() +
  labs(title = "Titanic Passenger Class") +
  scale_fill_manual(values = colors)

Explanation
1. x = "": This sets the x-axis to a constant value (empty string), which is necessary for creating a pie chart.
2. y = Count: This maps the Count variable from the data to the size of the slices in the pie chart.
3. geom_bar(stat = "identity"): This tells ggplot to create a bar chart where the heights (or values) of the bars are directly taken from the data (Count), rather than being counted.
4. width = 1: This sets the width of the bars to 1, meaning no space between the slices of the pie chart.
5. coord_polar(theta = "y"): This transforms the bar chart into a pie chart by applying a polar coordinate system. theta = "y" ensures that the y-axis values are used to create the angles of the pie slices.
6. scale_fill_manual(values = colors): This specifies the colors to be used for each group. colors is the vector of color values defined in the 2nd step, one for each group in the Pclass column.
Pie Chart: using ggplot
1. type_counts <- as.data.frame(table(titanic$Pclass))
2. colnames(type_counts) <- c("Pclass", "Count")
   type_counts$Percentage <- round(type_counts$Count / sum(type_counts$Count) * 100, 1)
3. type_counts$Label <- paste0(type_counts$Pclass, " (", type_counts$Percentage, "%)")
4. colors <- c("skyblue", "orange", "green")
5. ggplot(type_counts, aes(x = "", y = Count, fill = as.factor(Pclass))) +
     geom_bar(stat = "identity", width = 1, color = "black") +
     coord_polar("y") +
     theme_void()

Explanation
1. Counts occurrences of each class.
2. Converts proportions into percentages, rounded to 1 decimal place.
3. paste0() joins text together without spaces.
4. Fills slices with different colors based on Pclass.
5. Uses actual values instead of counting occurrences automatically.
Line chart: using simple method
A line chart shows how a variable changes over an ordered variable such as time, and is useful for comparing different categories across the same variables.
From visualizing Passenger Miles against Year, it can be concluded that the former increases with time.
Dataset used: airmiles (inbuilt in R)

Step 1:
plot(airmiles, type = "o", col = "blue", lwd = 2,
     xlab = "Year", ylab = "Passenger Miles (millions)",
     main = "Airline Passenger Miles (1937-1960)")

Explanation
1. type = "o": Plot both points and lines, with points overlaid on the lines.
Line Chart: using ggplot
Step 1:
airmiles1 <- data.frame(Year = 1937:1960, Miles = as.numeric(airmiles))
Step 2:
ggplot(airmiles1, aes(x = Year, y = Miles)) +
  geom_line(color = "blue", linewidth = 1.2) +
  geom_point(color = "blue", size = 2) +
  labs(x = "Year", y = "Passenger Miles (millions)",
       title = "Airline Passenger Miles (1937-1960)") +
  theme_minimal()

Explanation
Step 1 converts 'airmiles' to a dataframe.
geom_line() adds a line connecting the data points.
geom_point() adds a point at each observation.
size = 2 increases the size of the points.
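As an alternative to typing the years by hand, the time index can be read from the series itself with time(). This is only a sketch; the name `airmiles_df` is made up for the example.

```r
# airmiles is a built-in yearly time series; time() returns its time index
airmiles_df <- data.frame(
  Year  = as.numeric(time(airmiles)),
  Miles = as.numeric(airmiles)
)

head(airmiles_df)
range(airmiles_df$Year)  # first and last year covered by the series
```

This avoids a mismatch between a hard-coded range such as 1937:1960 and the actual length of the series.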
Scatterplot: using simple method
A scatterplot is used to show the relationship between two variables.
You can easily see that there is a negative correlation between MPG and weight.
Dataset used: mtcars

plot(mtcars$mpg, mtcars$wt, main = "Scatter plot of MPG vs Weight",
     xlab = "Miles per Gallon (MPG)",
     ylab = "Weight (wt)",
     pch = 19, col = "blue")

Explanation
The plot function automatically creates a scatter plot by pairing each value of mpg with its corresponding value of wt.
pch stands for "plot character" and the number 19 specifies that the points should be filled circles.
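The negative relationship seen in the plot can be backed up with a number. The sketch below computes the Pearson correlation and fits a straight line with lm(); both are standard base R functions, and the variable names `r` and `fit` are just for illustration.

```r
# Pearson correlation between miles per gallon and weight
r <- cor(mtcars$mpg, mtcars$wt)
r  # about -0.87, a strong negative correlation

# Straight-line fit of weight on mpg, to overlay on the scatter plot
fit <- lm(wt ~ mpg, data = mtcars)

plot(mtcars$mpg, mtcars$wt, pch = 19, col = "blue",
     xlab = "Miles per Gallon (MPG)", ylab = "Weight (wt)")
abline(fit, col = "red", lwd = 2)  # downward-sloping trend line
```

A value of r close to -1 means heavier cars tend to have lower mileage, which matches the downward slope of the fitted line.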
Scatter plot: using ggplot

ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  labs(title = "Scatterplot of mpg vs weight", x = "mpg", y = "weight") +
  theme_minimal()

Explanation
aes(x = mpg, y = wt) maps mpg to the x-axis and wt to the y-axis.
geom_point() draws one point for each car.
