Lab 6 Data Visualization

The document provides instructions for visualizing the economics and diamonds data sets using R's base graphics and the ggplot2 package. Key points from the visualizations include: 1) A line graph showing population rising steadily from about 295,000 in 2005 to around 320,000 in 2015 (values in thousands). 2) A bar graph showing that "Fair" cut diamonds have the lowest count, while "Ideal" cuts have the highest, at around 22,000. 3) Histograms showing that most diamonds weigh between 0 and 1 carat and that counts fall as carat increases. 4) Scatterplots showing that price rises with carat, with most points below 2 carats and few at higher carat values.

Uploaded by

Roaster Guru
#Lab 6 Data Visualization

1. The economics data set in the ggplot2 package was produced from US
economic time series data. Using this data set, do the following tasks with R’s base
graphics and/or the ggplot2 package:
a. Plot a line for the total population (pop), where the x-axis in the plot displays the
month of data collection (date).
Code
library(ggplot2)
data(economics)
# Plot a line for population (pop) against month of data collection (date)
pop_plot <- ggplot(data = economics, aes(x = date, y = pop)) + geom_line()
pop_plot

Output

The line graph shows total population against date: the population rises throughout the
period covered by the data. The increase is steady and close to linear, with no visible
declines. Note that pop is recorded in thousands, so the y-axis values are thousands of
people.
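Because pop is stored in thousands, the raw axis values are easy to misread. The sketch below is one possible way to label the same plot in millions; the object name pop_plot_labelled and the label text are illustrative choices, not part of the lab.

```r
library(ggplot2)

# pop is in thousands (per the ggplot2 documentation for `economics`),
# so dividing by 1000 expresses it in millions of people.
pop_plot_labelled <- ggplot(economics, aes(x = date, y = pop / 1000)) +
  geom_line() +
  labs(x = "Month of data collection",
       y = "Total population (millions)")
pop_plot_labelled
```

The shape of the curve is unchanged; only the y-axis scale and labels differ from the plot above.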
b. Plot a line for the total population (pop) after 2005-01-01. Hint: use filter
function in dplyr package.
Code
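# install.packages("dplyr")  # run once if dplyr is not already installed
library(dplyr)
# Keep only the observations after 2005-01-01, then plot population against date
pop_plot2005 <- ggplot(data = filter(economics, date > as.Date("2005-01-01")),
                       aes(x = date, y = pop)) + geom_line()
pop_plot2005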
Output

The line graph shows the population growing throughout the depicted period. Starting at
about 295,000 at the beginning of 2005 (pop is in thousands, so roughly 295 million
people), the population rises steadily until 2010. Around 2010 the curve appears to
flatten briefly, and then the upward trend resumes, reaching around 320,000 by 2015,
where the series ends.
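The start and end values quoted for this plot can be checked against the filtered data rather than read off the axis. A minimal sketch, assuming the same filter as in the code above; the names recent and endpoints are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Same filter as in the plot: strictly after 2005-01-01,
# so the 2005-01-01 observation itself is excluded.
recent <- filter(economics, date > as.Date("2005-01-01"))

# First and last observation of the filtered series (pop is in thousands).
endpoints <- recent %>%
  summarise(first_date = min(date),
            first_pop  = pop[which.min(date)],
            last_date  = max(date),
            last_pop   = pop[which.max(date)])
endpoints
```

Using `>=` instead of `>` would keep the 2005-01-01 row, if the intent is to include that month.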
2. The diamonds data set in the ggplot2 package contains the prices and
other attributes of almost 54,000 diamonds. Using this data set, do the following tasks
with R’s base graphics and/or the ggplot2 package:
a. Plot a bar chart for quality of the cut (cut).
Code
diamonds
ggplot(data = diamonds,aes(x = cut)) + geom_bar()
Output
The bar graph shows the number of diamonds in each cut category. The categories are
ordered by quality, not by time, and the counts rise with each step up in quality. "Fair"
has the lowest count, below 2,500, which is roughly a third of the count in the "Good"
category (just under 5,000). The count then jumps to about 12,000 for the "Very Good"
category. The count for the "Premium" category exceeds this by roughly 1,700. The
"Ideal" category has the highest count, around 22,000.
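The counts behind the bars can be tabulated directly, which avoids estimating them from the axis. A short sketch using dplyr's count(); the name cut_counts is illustrative.

```r
library(ggplot2)
library(dplyr)

# One row per cut level, with the number of diamonds in column n.
cut_counts <- count(diamonds, cut)
cut_counts
```

geom_bar() computes the same tally internally, so the bar heights match column n.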

b. Plot a histogram for weight of the diamond (carat).


Code
ggplot(data = diamonds,aes(x = carat)) + geom_histogram()
Output

The histogram illustrates the distribution of carat values, that is, diamond weights, in
the data set. (Carat here is a unit of weight for diamonds, not the purity measure used
for gold.) A large portion of the diamonds fall within the 0-1 carat range. As the carat
value increases beyond one, the count decreases, and very few diamonds weigh 3 carats or
more. The distribution is therefore strongly right-skewed: small diamonds are common and
large ones are rare. The histogram shows counts only, so it says nothing about price.
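Called without arguments, geom_histogram() falls back to 30 bins and prints a message suggesting a better value. The sketch below sets an explicit bin width and computes the share of diamonds under 1 carat; the bin width of 0.1 and the names carat_hist, share_under_1 and max_carat are illustrative assumptions, not values from the lab.

```r
library(ggplot2)

# binwidth = 0.1 carat is an illustrative choice; other widths are equally valid.
carat_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "Weight (carat)", y = "Number of diamonds")
carat_hist

# Proportion of diamonds lighter than 1 carat, and the heaviest diamond.
share_under_1 <- mean(diamonds$carat < 1)
max_carat <- max(diamonds$carat)
share_under_1
max_carat
```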
c. Plot a histogram for weight of the diamond (carat) grouped by diamond clarity
(clarity).
Code
ggplot(data = diamonds,aes(x = carat)) + geom_histogram() + facet_wrap(~clarity)

Output

The histogram displays the weight of diamonds (carat), with one panel per clarity level.
In every panel the highest counts fall in the 0-1 carat range, so light diamonds are the
most common at each clarity. Conversely, the 2-3 carat range has the lowest counts.
Because the histograms show counts rather than prices, they cannot tell us which carat
group is more expensive; they show only that heavier diamonds are rarer.
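The clarity levels contain very different numbers of diamonds, so with a shared y-axis the smaller panels can look almost empty. One option, sketched here, is to let each panel have its own y-axis; the bin width and the names p_clarity and clarity_counts are illustrative.

```r
library(ggplot2)

# scales = "free_y" gives each clarity panel its own count axis,
# which makes the shape of the smaller groups easier to see.
p_clarity <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~clarity, scales = "free_y")
p_clarity

# Number of diamonds per clarity level, for reference.
clarity_counts <- table(diamonds$clarity)
clarity_counts
```

With free scales the panels are no longer directly comparable in height, so the table of counts is worth keeping alongside the plot.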
d. Plot a scatterplot to display values for weight of the diamond (carat) and the
price (price).
Code
ggplot(data = diamonds, aes(x = carat, y = price)) + geom_point()
Output

The scatterplot visualizes the relationship between two variables: carat and price.
Price rises with carat, so the two are positively associated. Most points lie below
about 2 carats, and the points thin out gradually as carat increases; the region between
4 and 5 carats contains very few diamonds. Point density reflects how many diamonds have
a given weight, not how strong the correlation is, so the sparse upper range should not
be read as a weaker relationship.
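The strength of the linear association can be measured rather than judged from point density. A minimal sketch using the Pearson correlation from base R, plus a semi-transparent version of the plot to reduce overplotting; alpha = 0.1 and the names r_all and p_alpha are illustrative choices.

```r
library(ggplot2)

# Pearson correlation between weight and price over the whole data set.
r_all <- cor(diamonds$carat, diamonds$price)
r_all

# Semi-transparent points make dense regions visibly darker
# instead of merging into a solid block.
p_alpha <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)
p_alpha
```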
e. Plot a scatterplot to display values for weight of the diamond (carat) and the
price (price) grouped by diamond color (color).
Code
ggplot(data = diamonds, aes(x = carat, y = price, color = color)) + geom_point() +
facet_wrap(~color)
Output

The scatterplots show the relationship between carat and price separately for each
diamond color (D to J). In every panel price rises with carat. Points are densest at
lower carat values and thin out as carat increases, with the fewest points at 3 carats
and above. Sparse points at high carat values mean that such diamonds are rare in the
data; they do not by themselves indicate a diminishing relationship between carat and
price.
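To compare the carat-price relationship across colors numerically, the correlation can be computed within each color group. A sketch with dplyr; the name cor_by_color and the column names n and r are illustrative.

```r
library(ggplot2)
library(dplyr)

# Pearson correlation between carat and price within each color level,
# together with the number of diamonds in that level.
cor_by_color <- diamonds %>%
  group_by(color) %>%
  summarise(n = n(), r = cor(carat, price))
cor_by_color
```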
f. Plot a boxplot for weight of the diamond (carat) grouped by the quality of the cut
(cut).
Code
ggplot(data = diamonds, aes(x= cut, y = carat)) + geom_boxplot()
Output

The boxplot serves as a visual tool to identify the median, quartiles, interquartile
range, and any outliers. According to the plot, the Premium cut exhibits the widest
spread of values, whereas the Good cut displays the narrowest. The quartile values for
every cut fall within the range of 0 to 2 carats. The Fair cut has the largest outlier
among all the cuts, while the Good cut has the smallest maximum outlier.
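The quantities a boxplot draws can also be tabulated, which makes comparisons between cuts easier to verify. A minimal sketch with dplyr; the name carat_by_cut and the column names are illustrative.

```r
library(ggplot2)
library(dplyr)

# Quartiles, interquartile range and maximum carat for each cut level.
carat_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(q1        = quantile(carat, 0.25),
            med       = median(carat),
            q3        = quantile(carat, 0.75),
            iqr       = IQR(carat),
            max_carat = max(carat))
carat_by_cut
```

The iqr column corresponds to the height of each box, and max_carat to the most extreme point drawn above it.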
