5 – R Visualizing Data
install.packages("nycflights13")   # one-time installation
library(nycflights13)               # load the package in each session
Scatter Plot
Creating Scatter Plot
The plot() function is the basic R function for visualizing data.
By passing a numeric or integer vector to plot(), we can produce a
scatter plot of the values against their index.
For example, we can plot 10 points in increasing order as
follows:
plot(1:10)
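Passing a single vector is shorthand for plotting the values against their positions. A minimal sketch, assuming an arbitrary example vector v (the values are illustrative only), showing two calls that draw the same points:

```r
# An arbitrary example vector (illustrative values only)
v <- c(3, 7, 1, 8, 5)

# With one argument, plot() uses the index of each value as the x-coordinate
plot(v)

# Equivalent call with the index made explicit
idx <- seq_along(v)
plot(idx, v, xlab = "Index", ylab = "v")
```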
Scatter Plot – 2 Vectors
We can generate two linearly correlated random numeric vectors to
create a more realistic scatter plot.
x <- rnorm(200)
y <- 2*x + rnorm(200)
plot(x,y)
plot(x, y,
main = "Correlated Random numbers",
xlab = "x", ylab = "2x + noise",
xlim = c(-3, 3), ylim = c(-6, 6))
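To check numerically that the two vectors are linearly correlated, we can compute their Pearson correlation with cor(). This sketch fixes the random seed so the result is reproducible; the seed value 42 is an arbitrary assumption.

```r
# Fix the random seed so the example is reproducible (42 is arbitrary)
set.seed(42)
x <- rnorm(200)
y <- 2 * x + rnorm(200)

# Pearson correlation; for y = 2x + noise with unit-variance noise,
# the theoretical value is 2 / sqrt(5), about 0.89
r <- cor(x, y)
r
```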
Customize Chart Elements
We can specify the chart title either with the main argument or with a separate
title() call. The following code plots the same chart as the one
above.
plot(x, y,
xlab = "x", ylab = "2x + noise",
xlim = c(-3, 3), ylim = c(-6, 6))
title("Correlated Random numbers")
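title() can also set the subtitle and the axis labels. A sketch, assuming we want all annotations to come from one title() call: setting ann = FALSE in plot() suppresses the default title and axis labels so they are not drawn twice. The subtitle text is illustrative.

```r
x <- rnorm(200)
y <- 2 * x + rnorm(200)

# ann = FALSE suppresses the default title and axis labels ...
plot(x, y, ann = FALSE, xlim = c(-3, 3), ylim = c(-6, 6))

# ... so that title() can add all annotations in one call
title(main = "Correlated Random numbers",
      sub  = "200 simulated points",
      xlab = "x", ylab = "2x + noise")
```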
Custom Point Style
For a scatter plot, the default point style is an open circle. We can set the
pch (plotting character) argument to change the point style. R provides 26
standard point styles, pch = 0 to 25.
x <- rnorm(200)
y <- 2*x + rnorm(200)
plot(x,y, pch = 17,
main = "Scatter plot with pch = 17")
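To preview all 26 standard styles at once, we can draw one point per pch code in a single row, since pch accepts a vector. A minimal sketch; the layout choices (row at y = 1, labels just above each point) are assumptions for readability.

```r
# The 26 standard point styles are pch = 0 to 25
pch_values <- 0:25

# Draw each style once, in a single row, labelled with its pch code
plot(pch_values, rep(1, 26), pch = pch_values, cex = 1.5,
     ylim = c(0.5, 1.5), axes = FALSE, xlab = "", ylab = "",
     main = "Point styles pch = 0 to 25")
text(pch_values, rep(1.2, 26), labels = pch_values, cex = 0.7)
```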
Scatter Plot – Logical Condition
We can also distinguish two groups of points by a logical condition. Because
pch is vectorized, we can use ifelse() to choose the point style of each
observation based on a condition.
The following example applies pch = 17 to the points satisfying x * y > 1 and
pch = 1 to all other points:
x <- rnorm(200)
y <- 2*x + rnorm(200)
plot(x,y,
pch = ifelse(x * y > 1, 17, 1),
main = "Scatter plot with conditional pch")
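It can help to look at the vector that ifelse() produces. This sketch uses small hand-picked vectors (illustrative values, not from the original example) so the result can be checked by hand: ifelse() returns one pch code per observation.

```r
# Small hand-picked vectors (illustrative values only)
x <- c(-2, -1, 0.5, 1, 2)
y <- c(-1, 2, 3, 0.5, 1)

# ifelse() returns one pch code per observation
pch_vec <- ifelse(x * y > 1, 17, 1)
pch_vec
# x * y is 2, -2, 1.5, 0.5, 2, so pch_vec is 17 1 17 1 17

plot(x, y, pch = pch_vec, main = "pch chosen per point")
```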
Scatter Plot – 2 Data Sets
A plot containing two separate datasets that share the same x-axis can be drawn using plot() and points().
In the previous example, we generated a normally distributed vector x and a random vector y that is linearly
correlated with x.
For this example, we also generate a random vector z that has a non-linear relationship with x. We then
plot both y and z against x, using a different point style for each:
x <- rnorm(75)
y <- 1.5*x + rnorm(75)
z <- sqrt(1 + x ^ 2) + rnorm(75)
plot(x, y, pch = 1,
xlim = range(x), ylim = range(y, z),
xlab = "x", ylab = "value")
points(x, z, pch = 17)
title("Scatter plot with two datasets")
• In the preceding example, we first created the datasets x, y, and z.
• We then plotted y against x and added z as a second group of points with
a different pch.
• We specified ylim = range(y, z) so that the y-axis covers the values of both y and z.
• points() does not extend the axes created by plot(), so any point outside the
existing axis range is not shown.
• Specifying ylim = range(y, z) therefore ensures that all the points in y and z appear
in the plot area.
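A small sketch of why range(y, z) matters, using hand-picked values (an assumption for illustration) where z extends beyond y on both sides. range() accepts several vectors and returns the overall minimum and maximum.

```r
# Illustrative values: z extends beyond the range of y on both sides
y <- c(1, 4, 2)
z <- c(-3, 2, 6)

range(y)      # 1 4  : the data range plot() would base the y-axis on
range(y, z)   # -3 6 : limits covering both series

# Points of z outside range(y) would be clipped without ylim
outside <- z < min(y) | z > max(y)
z[outside]    # -3 6

plot(seq_along(y), y, ylim = range(y, z), xlab = "index", ylab = "value")
points(seq_along(z), z, pch = 17)
```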
Customizing Point Colors
We can specify the point color by setting the col argument of plot():
x <- rnorm(75)
y <- 1.5*x + rnorm(75)
plot(x, y, pch = 15, col = "blue", main = "Blue Color Scatter Plot")
Different colors can also be applied to points that belong to different
categories, for example points that satisfy a certain condition.
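A minimal sketch of condition-based coloring: like pch, col is vectorized, so ifelse() can return one color per point. The condition y > x and the two colors are assumptions chosen for illustration.

```r
x <- rnorm(75)
y <- 1.5 * x + rnorm(75)

# col is vectorized like pch: one color per point, chosen by a condition
point_cols <- ifelse(y > x, "red", "blue")

plot(x, y, pch = 16, col = point_cols,
     main = "Points colored by the condition y > x")
abline(a = 0, b = 1, lty = 2)  # reference line y = x
```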
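Line Plot
Setting type = "l" in plot() connects the points with lines instead of drawing them as separate symbols:
t <- 1:50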
y <- 2.5 * sin(t * pi / 60) + rnorm(t)
plot(t, y, type = "l", main = "Line plot")
Line Type and Width
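For a line plot, we can use the lty argument to specify the line type, in the same way that
pch specifies the point style of a scatter plot. R supports six line types: lty = 1 (solid),
2 (dashed), 3 (dotted), 4 (dotdash), 5 (longdash) and 6 (twodash); lty = 0 draws no line.
The lwd argument sets the line width.
The following example draws the series up to t = 40 with a solid line and the rest with a
dashed line (lty = 2):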
p <- 40
plot(t[t <= p], y[t <= p], type = "l",
xlim = range(t), xlab = "t", ylab = "y")
lines(t[t >= p], y[t >= p], lty = 2)
title("Two period Line Plot")
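To preview the six line types side by side, we can draw one horizontal line per lty value on an empty plot. A sketch; the empty plot(NULL, ...) canvas and lwd = 2 are layout choices, not requirements.

```r
line_types <- 1:6

# Empty plotting region with room for six horizontal lines
plot(NULL, xlim = c(0, 1), ylim = c(0.5, 6.5),
     xlab = "", ylab = "lty", yaxt = "n",
     main = "Line types lty = 1 to 6")
axis(2, at = line_types, las = 1)

# One line per type; lwd = 2 doubles the default line width
for (i in line_types) {
  abline(h = i, lty = i, lwd = 2)
}
```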
Line Plot with Points
We can plot both lines and points in the same chart. This is done by first drawing a line chart and then
adding the same data to the plot with points().
t <- 1:30
y1 <- 1.5 * t + 6 * rnorm(30)
y2 <- 2.5 * sqrt(t) + 8 * rnorm(30)
plot(t, y1, type = "l", col = "black",
ylim = range(y1, y2), ylab = "y1, y2")
points(t, y1, pch = 15)
lines(t, y2, col = "blue", lty = 2)
points(t, y2, col = "blue", pch = 16)
title("Plot of two series")
legend("topleft",
legend = c("y1", "y2"),
col = c("black", "blue"),
lty = c(1, 2), pch = c(15, 16),
cex = 0.8, x.intersp = 0.5, y.intersp = 0.8)
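As an alternative to separate lines() and points() calls, plot() can draw both at once through its type argument. A minimal sketch reusing the y1 series from above:

```r
t  <- 1:30
y1 <- 1.5 * t + 6 * rnorm(30)

# type = "o" overplots points and lines in one call;
# type = "b" would leave small gaps in the line around each point
plot(t, y1, type = "o", pch = 15, main = "type = \"o\"")
```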
Multi Series Chart with Legend
• In the above example, we added a legend() in the top-left corner.
• It shows the line and point styles of y1 and y2 respectively.
• We also used cex to scale the text size of the legend, and x.intersp and y.intersp
to adjust the horizontal and vertical spacing of the legend entries.
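The legend position is not limited to "topleft". This sketch places it with another keyword, uses inset to move it slightly away from the plot border, and bty = "n" to drop the surrounding box; the specific position and inset value are arbitrary choices.

```r
t  <- 1:30
y1 <- 1.5 * t + 6 * rnorm(30)
y2 <- 2.5 * sqrt(t) + 8 * rnorm(30)

plot(t, y1, type = "o", pch = 15, ylim = range(y1, y2), ylab = "y1, y2")
lines(t, y2, type = "o", col = "blue", lty = 2, pch = 16)

# Keyword position plus an inset (fraction of the plot region) and
# bty = "n" to draw the legend without a surrounding box
legend("bottomright", inset = 0.02, bty = "n",
       legend = c("y1", "y2"), col = c("black", "blue"),
       lty = c(1, 2), pch = c(15, 16))
```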
Bar Chart
Bar charts are among the most commonly used charts. We use them to
visualize qualitative (categorical) data by the frequency of each category.
To draw a bar chart, we use the barplot() function instead of plot().
The function draws either vertical or horizontal bars separated by white
space.
Although bar charts usually display raw frequencies, we can also use barplot() to
visualize other quantities, such as means or proportions.
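A minimal sketch of a bar chart of means rather than frequencies, using the built-in mtcars dataset (an illustrative choice, not part of the flights example): tapply() computes one mean per group, and barplot() takes the bar labels from the names of the result.

```r
# Mean miles per gallon for each number of cylinders in the built-in mtcars data
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# barplot() takes the group names from the names of the vector
barplot(mpg_by_cyl, xlab = "Cylinders", ylab = "Mean mpg",
        main = "Mean mpg by number of cylinders")

# horiz = TRUE draws the same bars horizontally
barplot(mpg_by_cyl, horiz = TRUE, xlab = "Mean mpg")
```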
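First, we use table() to count the number of flights for each carrier in the flights dataset:
library(nycflights13)
carriers <- table(flights$carrier)
carriers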
   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US    VX    WN
18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536  5162 12275
   YV
  601
In the previous code, we used table() to count the number of flights for each carrier. Next, we sort the carriers in decreasing order of flight count:
carriers_sort <- sort(carriers, decreasing = TRUE)
carriers_sort
   UA    B6    EV    DL    AA    MQ    US    9E    WN    VX    FL    AS    F9    YV    HA
58665 54635 54173 48110 32729 26397 20536 18460 12275  5162  3260   714   685   601   342
   OO
   32
Project NYCflights – Part 1
Now we can take the first 8 elements of the sorted table and draw a bar plot:
barplot(head(carriers_sort, 8),
ylim = c(0, max(carriers_sort) * 1.1),
xlab = "Carrier", ylab = "Flights",
main ="Top 8 carriers ordered by number of flights")
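The same counts can be shown as proportions. So that the sketch runs without nycflights13, it copies the top 8 counts from the sorted table above into a named vector; note that the shares are relative to these eight carriers only, not to all flights. prop.table() divides each value by the total.

```r
# Flight counts of the top 8 carriers, copied from the sorted table above
top8 <- c(UA = 58665, B6 = 54635, EV = 54173, DL = 48110,
          AA = 32729, MQ = 26397, US = 20536, "9E" = 18460)

# Share of each carrier among these eight (not among all flights)
top8_prop <- prop.table(top8)

# Horizontal bars, largest at the top; las = 1 keeps the labels horizontal
barplot(rev(top8_prop), horiz = TRUE, las = 1,
        xlab = "Share of flights (top 8 carriers)",
        main = "Top 8 carriers by share of flights")
```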
Pie Chart
Pie charts are also useful for data analysis. We use the pie() function to create a pie
chart, which represents values as slices of a circle, each filled with a different color. The main arguments of pie() are:
x: a vector of non-negative numeric values, displayed as the areas of the slices
labels: the descriptions of the slices
radius: the radius of the pie; the pie is drawn in a square box whose sides range from -1 to 1 (default 0.8)
main: the title of the chart
col: the colors used to fill the slices
clockwise: whether the slices are drawn clockwise (TRUE) or anti-clockwise (FALSE, the default)
The following code is an example of using the pie() function.
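A minimal sketch of a pie() call that uses the arguments listed above. The category names, counts, and colors are hypothetical values chosen for illustration; the labels add each slice's percentage, rounded to whole numbers.

```r
# Hypothetical category counts (illustrative values only)
counts <- c(Apples = 30, Bananas = 20, Cherries = 10)

# Percentages for the slice labels
pct <- round(100 * counts / sum(counts))
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts,
    labels = slice_labels,
    radius = 0.9,
    col = c("tomato", "gold", "darkred"),
    clockwise = TRUE,
    main = "Pie chart of hypothetical counts")
```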
Histogram
Now we can make a histogram of flight speed from the flights dataset in nycflights13.
We can calculate the speed of each flight by dividing the distance of the trip
(distance, in miles) by the air time (air_time, in minutes), which gives the speed in miles per minute.
The distribution differs from a normal distribution. So, we use the density() function to compute
a kernel density estimate of the speed and draw it as a smooth curve over the histogram. We also
add a vertical line at the mean speed of all flights.
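library(nycflights13)
ft_speed <- flights$distance / flights$air_time   # miles per minute; NA where air_time is missing
hist(ft_speed,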
probability = TRUE, ylim = c(0, 0.5),
main ="Histogram & distribution of flight speed",
xlab = "Flight Speed",
border ="gray", col = "lightgray")
lines(density(ft_speed, from = 2, na.rm = TRUE),
col ="darkgray", lwd = 2)
abline(v = mean(ft_speed, na.rm = TRUE),
col ="blue", lty =2)
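A self-contained sketch of what probability = TRUE does, using simulated exponential data instead of the flight speeds (an assumption so the code runs without nycflights13). With this setting the bar heights are densities, so the bar areas add up to 1, which is why a density() curve can be drawn on the same scale.

```r
set.seed(1)
# Simulated right-skewed data (illustrative only)
x <- rexp(1000, rate = 1)

# With probability = TRUE the bar heights are densities, so their areas sum to 1
h <- hist(x, probability = TRUE, col = "lightgray", border = "gray",
          main = "Histogram with density estimate")
area <- sum(h$density * diff(h$breaks))

# Kernel density estimate and the mean, as in the flight-speed example
lines(density(x), lwd = 2)
abline(v = mean(x), col = "blue", lty = 2)
```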
Thanks
Samatrix Consulting Pvt Ltd