
22MSM40206 Data Visualisation

This document discusses different types of layers in ggplot2 for data visualization in R. It contains 7 key layers: 1. Data layer - Defines the data source 2. Aesthetic layer - Maps variables to visual properties like color, size etc. 3. Geometric layer - Controls visual geometry like points, lines, bars 4. Facet layer - Creates small multiples by grouping data by variables 5. Statistical layer - Transforms data using binning, smoothing etc. 6. Coordinate layer - Controls axes scales and limits 7. Theme layer - Controls display properties like fonts and background The document demonstrates these layers using the mtcars dataset for creating scatter plots, histograms, bar plots and

Uploaded by

hejoj76652

“Different types of layers in ggplot”

SUBMITTED TO
CHANDIGARH UNIVERSITY, MOHALI PUNJAB
IN PARTIAL FULFILMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
M.SC DATA SCIENCE

Under the guidance of:
Dr. Rohit Thakur
Assistant Professor, Chandigarh University

Presented by:
Aadarsh & Emran
22MSM40226, 22MSM40206
M.Sc. Data Science
Data visualization with R and ggplot2

The ggplot2 package for the R programming language, an implementation of the grammar of graphics, is a

• Free
• Open source
• Easy-to-use visualization package.

A ggplot2 plot is built from seven layers:

1. Data: the dataset to be visualized.
2. Aesthetics: attributes such as x axis, y axis, color, fill, size, labels, alpha, shape, line width and line type.
3. Geometrics: how the data are displayed, e.g. point, line, histogram, bar, boxplot.
4. Facets: display subsets of the data in columns and rows.
5. Statistics: binning, smoothing, descriptive, intermediate.
6. Coordinates: the space between data and display, e.g. cartesian, fixed, polar, limits.
7. Themes: non-data ink (display elements not tied to the data, such as fonts and background).
Dataset Used
mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. It comes pre-installed with R in the built-in datasets package, so no extra package is needed to load it.
dplyr: dplyr is a data manipulation package that is part of the tidyverse, a collection of packages that aims to make working with data in R faster, simpler and easier.

install.packages("dplyr")  # one-time installation
library(dplyr)
summary(mtcars)  # data set used

Description of the dataset (as given by ?mtcars):
A data frame with 32 observations on 11 (numeric) variables.

[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors
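
The description above can be checked directly in an R session. This is a minimal sketch that assumes only a standard R installation, since mtcars ships with the built-in datasets package.

```r
# mtcars is available in every standard R session (datasets package)
data(mtcars)

dim(mtcars)      # number of rows (cars) and columns (variables)
str(mtcars)      # column names and types
head(mtcars, 3)  # first three cars
```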
Data Layer: here we define the source of the information to be visualized. Let us use the mtcars dataset.
library(ggplot2)
library(dplyr)
ggplot(data = mtcars)
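
As a rough illustration of what the data layer does on its own, the sketch below stores the plot in a variable (the name p is only an assumption for this example) and inspects it. With no aesthetics or geoms added yet, printing it draws an empty panel.

```r
library(ggplot2)

# Data layer only: no aesthetics or geoms yet, so the plot is an empty canvas
p <- ggplot(data = mtcars)

length(p$layers)  # number of geometric layers added so far
p                 # draws a blank panel
```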

• Aesthetic layer: here we map variables of the dataset to aesthetics (visual attributes) of the plot.
• Geometric layer: it controls the essential elements, i.e. how the data are displayed, using point, line, histogram, bar or boxplot geometries.

+ geom_point() is the ggplot2 function used for creating scatterplots.
It adds a layer of points to the plot.

col = disp: col (a parameter) means color, and disp (a variable) represents the engine displacement of a car.
col = disp maps the values of the "disp" variable to the color of the points in the plot. This means that points with different "disp" values are shown in different colors. Because disp is numeric, the colors form a continuous gradient, which makes it easier to see how displacement varies across the cars.
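
A minimal sketch of the aesthetic and geometric layers together is given here. The choice of hp for the x axis and mpg for the y axis is an assumption made for illustration; the text above only fixes col = disp.

```r
library(ggplot2)

# Aesthetic layer: map hp to x, mpg to y and disp to point colour.
# Geometric layer: draw the mapped data as points (a scatter plot).
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point()
p
```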
• Geometric Layer: adding size, color and shape, and then plotting a histogram.

factor(): this function converts a vector (for example a character vector or a numeric variable that encodes categories) into a factor variable.

In mtcars, cyl is the number of cylinders in the engine of a car.
$ is used to extract a single column from a data frame, for example mtcars$cyl.
In mtcars, am is the type of transmission in a car (0 = automatic, 1 = manual).
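
The sketch below shows one way factor() and $ might be used with the geometric layer. The mapping of size to disp, and of hp and mpg to the axes, is assumed for illustration and is not stated in the text above.

```r
library(ggplot2)

# cyl and am are stored as numbers; factor() makes ggplot2 treat them
# as categories, so each level gets its own colour or shape
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
  geom_point(aes(col = factor(cyl), shape = factor(am)))
p

# $ extracts one column; factor() shows its distinct levels
cyl_levels <- levels(factor(mtcars$cyl))
cyl_levels  # "4" "6" "8"
```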
• Q. Why is no y variable specified for a histogram?
• Ans: In a histogram, the y-axis represents the frequency or count of observations that fall within each bin or interval on the x-axis. ggplot2 computes these counts itself from the x variable, so only x is mapped, and the y-axis is labeled "count" by default.
• The primary focus of a histogram is to display the distribution of a single variable.
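
To make the answer concrete, the following sketch builds a histogram of hp and then looks at the counts ggplot2 computed for the y-axis. The binwidth of 5 is an assumed value chosen only for this example.

```r
library(ggplot2)

# Only x is mapped; the binning statistic counts the observations per bin
p <- ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5)
p

# The computed counts become the bar heights on the y-axis
bins <- layer_data(p)
sum(bins$count)  # every car falls in exactly one bin, so this is 32
```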

• Facet Layer: this allows you to create multiple plots arranged in a grid, where each plot represents a subset of your data based on a categorical variable.
• To create a facet plot, the facet_grid() or facet_wrap() functions are used.
• facet_grid() creates a grid of plots with one variable along the rows and another variable along the columns.
• facet_wrap() creates a series of plots, each representing a level of a single variable.
• Statistics Layer: we transform our data using binning, smoothing, descriptive, intermediate.
• You can use geom_point() to create a scatter plot, and then use stat_smooth() to add a smooth curve to the plot.
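
The facet and statistics layers described above can be sketched as follows. The variable names base, p_facet and p_smooth, the hp and mpg axes, and the explicit loess method are assumptions for illustration.

```r
library(ggplot2)

base <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Facet layer: one panel per level of cyl
p_facet <- base + facet_wrap(~ cyl)
p_facet

# Statistics layer: add a smoothed trend curve on top of the points
p_smooth <- base + stat_smooth(method = "loess", formula = y ~ x)
p_smooth
```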

Coordinate Layer: controls the coordinate system in which the data points are placed. It is often adjusted in scatter plots and other types of plots where the x and y axes represent numeric values.
The functions we are going to use are coord_*().
1. coord_flip(): flips the x and y axes, useful for horizontal bar charts and other horizontal plots.
2. coord_polar(): useful for creating pie charts or radial plots.
3. coord_sf(): draws simple features (sf package) data in a map projection, useful for visualizing maps.
4. coord_cartesian(): sets the limits of the x and y axes, zooming in or out on the plot without changing the underlying data or scales.
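
Two of the coordinate functions listed above can be tried on a simple scatter plot. This is only a sketch: the wt and mpg axes and the limits c(3, 6) are assumed values.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Zoom in on heavier cars; points outside the limits are hidden, not dropped
p_zoom <- p + coord_cartesian(xlim = c(3, 6))
p_zoom

# Swap the axes: wt is now drawn on the vertical axis
p_flip <- p + coord_flip()
p_flip
```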
The geom_smooth() function adds a layer to the plot with a fitted regression line.
The method = "lm" argument specifies that a linear regression line should be used.
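
A short sketch of geom_smooth() with method = "lm" follows. The wt and mpg variables and the red line colour are assumptions for illustration; the call to lm() is included only to show the straight-line model that the smoothing layer fits.

```r
library(ggplot2)

p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, col = "red")
p

# The same straight line, fitted directly with lm()
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope of the regression line
```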

geom_bar(): this function adds the bars to the plot, with stat = "identity" to plot the values directly as heights, and width = 5 to set the width of each bar.
The coord_polar() function switches the plot to polar coordinates, which creates a circular chart.
The geom_text() function adds labels to each bar showing the percentage of each fruit's amount, using the percent column as the label text. vjust = 0.5 (vertical justification) controls the vertical positioning of the text or labels relative to a reference point.
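
The pie chart steps above can be sketched as below. The fruits data frame is hypothetical and its amounts are invented for illustration. The sketch uses width = 1 rather than 5, and centres the labels with position_stack(vjust = 0.5), which is one possible way to apply the vertical justification.

```r
library(ggplot2)

# Hypothetical data: the amounts are invented for illustration only
fruits <- data.frame(
  fruit  = c("Apple", "Banana", "Cherry"),
  amount = c(50, 30, 20)
)
fruits$percent <- round(100 * fruits$amount / sum(fruits$amount))

pie <- ggplot(fruits, aes(x = "", y = amount, fill = fruit)) +
  geom_bar(stat = "identity", width = 1) +       # one stacked bar
  coord_polar(theta = "y") +                     # bend the bar into a circle
  geom_text(aes(label = paste0(percent, "%")),   # label from the percent column
            position = position_stack(vjust = 0.5))
pie
```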
• Theme layer: this layer controls the finer points of display, such as font size and background color.
• facet_grid() creates small multiples, or facets, of the same scatter plot based on the cyl variable. The data are split into subsets by cyl, and a separate scatter plot is drawn for each subset.
• ~ separates the row variable (left) from the column variable (right) used for faceting.
• . is a placeholder meaning "no faceting variable on this side". With . ~ cyl all panels share one row, and a separate panel is created for each level of cyl.
• am (also a categorical variable) can be used in place of the dot, as in am ~ cyl, to create separate panels for each combination of am and cyl.
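
The faceting formulas and the theme layer can be sketched together as follows. The hp and mpg axes, the font size and the background colour are assumed values, and the variable names are chosen only for this example.

```r
library(ggplot2)

base <- ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# . ~ cyl : one column of panels per cyl level, no row faceting
p_cols <- base + facet_grid(. ~ cyl)

# am ~ cyl : rows by am, columns by cyl (one panel per combination)
p_grid <- base + facet_grid(am ~ cyl)

# Theme layer: change non-data elements such as font size and background
p_theme <- p_cols +
  theme(text = element_text(size = 14),
        panel.background = element_rect(fill = "grey95"))
p_theme
```

With two levels of am and three of cyl, p_grid has six panels.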
