Week10 Slides Updated
Yuting Huang
AY24/25
Programming and data visualization
Two of the most popular programming languages for data science
are Python and R.
▶ Developmental milestones over the years.
▶ Easy-to-use functions for data visualization in an efficient and
reproducible manner.
Introduction
We start by loading the required package. Note that ggplot2 is
included in tidyverse.
library(tidyverse)
The mpg data set
data(mpg)
head(mpg, 2)
## # A tibble: 2 x 11
## manufacturer model displ year cyl trans drv cty hwy f
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p
[Figure: scatter plot of hwy against displ for the mpg data]
Breaking down the syntax
ggplot() template
1. A data frame to plot.
2. At least one layer which describes how to render each observation.
Layers are usually created with a geom function.
3. A set of aesthetic mappings between variables in the data and
visual elements in the geom function.
In ggplot2, we create graphs by adding (+) layers.
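Because + returns a new plot object, a graph can be built up one layer at a time and stored in a variable. A minimal sketch with the mpg data; the object names base and p are illustrative, and the lm smoother is added only to show a second layer:

```r
library(ggplot2)

# Data and global aesthetic mappings only: this object has no layers yet
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Adding a geom creates the first layer
p <- base + geom_point()

# Each further + appends another layer on top of the previous ones
p <- p + geom_smooth(method = "lm", formula = y ~ x)

length(p$layers)  # 2
p
```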
Source: Adapted from Tanya Shapiro.
Choosing the right plot
Outline
2. Miscellaneous tasks
▶ Themes
▶ Layouts
▶ Common layers
Geometrical objects
Aesthetic mappings
Scatterplot: geom_point()
How to map an aesthetic
To map an aesthetic to a variable, associate the aesthetic with the name
of the variable inside aes().
▶ ggplot2 will automatically assign a unique value of the aesthetic
to each unique value of the variable.
▶ A scatter plot made with ggplot2:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
[Figure: scatter plot of hwy against displ]
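To see this one-to-one assignment concretely, the sketch below maps drv (three values: 4, f, r) to the shape aesthetic and uses ggplot_build() to inspect the shape ggplot2 assigned to each row. The names p and built are illustrative:

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))

# ggplot_build() computes the data actually drawn, including the
# shape value assigned to every observation (rows keep the order of mpg)
built <- ggplot_build(p)$data[[1]]

# Rows: values of drv; columns: shapes used. Each row has its counts
# in a single column, i.e. one shape per drive type.
table(mpg$drv, built$shape)
```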
A basic scatterplot
# 2: data passed by position, mapping inside the geom
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))
# 3: mapping inside ggplot(), inherited by every layer
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
Global aesthetic mapping
# A mapping in ggplot() is global: every layer inherits x = displ, y = hwy
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
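A mapping given in ggplot() is inherited by every layer, while a mapping inside a geom applies to that layer only. A sketch contrasting the two with color = drv; geom_smooth() is added only to make the difference visible, and the names p_global and p_local are illustrative:

```r
library(ggplot2)

# color = drv is global: both layers inherit it, so geom_smooth()
# fits one line per drive type
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# color = drv is local to geom_point(): the smoother sees only x and y,
# so it fits a single line through all the points
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of groups in the smoothing layer (layer 2) of each plot
length(unique(ggplot_build(p_global)$data[[2]]$group))  # 3
length(unique(ggplot_build(p_local)$data[[2]]$group))   # 1
```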
Mapping color
We can further visualize the class (type) of each car.
▶ The class variable groups cars into categories such as compact, midsize, and suv.
▶ Below, we map class to the color aesthetic.
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))
[Figure: scatter plot of hwy against displ, with points colored by class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]
Braking distance and speed
In Week 2, we created a scatter plot of the relationship between
braking distance and speed using base R plotting functions.
data(cars)
new_red <- rgb(1, 0, 0, alpha = 0.4)
plot(cars, col = new_red, pch = 19, cex = 1.8, bty = "n",
xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
main = "Relationship between Speed and Braking")
[Figure: "Relationship between Speed and Braking": stopping distance (ft) against speed (mph), semi-transparent red points]
We can recreate the plot with ggplot2:
▶ Set the point size with size.
▶ Set the point color to red with color (a fixed value, not mapped to a variable).
▶ Add transparency to the points with alpha.
ggplot(cars, aes(x = speed, y = dist)) +
geom_point(size = 4, color = "red", alpha = 0.5)
[Figure: scatter plot of dist against speed with large, semi-transparent red points]
Issues with over-plotting
Solutions:
▶ Adjust the opacity (alpha) of the point.
▶ Add a small random variation to the location of each point
(jitter). This helps to separate the overlapping points.
▶ Opacity: alpha
▶ Jittering: position = "jitter"
ggplot(cars, aes(x = speed, y = dist)) +
geom_point(size = 4, color = "red", alpha = 0.5, position = "jitter")
[Figure: the same scatter plot with jittered points]
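position = "jitter" is shorthand for position_jitter() with its default settings, which jitter by up to 40% of the resolution of the data in each direction and give different random offsets on every render. A sketch with explicit jitter amounts and a fixed seed so the plot is reproducible; the values 0.3, 1, and 42 are arbitrary choices:

```r
library(ggplot2)

p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(size = 4, color = "red", alpha = 0.5,
             # jitter by at most 0.3 units horizontally and 1 unit vertically;
             # a fixed seed makes the random offsets identical on every render
             position = position_jitter(width = 0.3, height = 1, seed = 42))

# Building the plot twice yields the same jittered coordinates
x1 <- ggplot_build(p)$data[[1]]$x
x2 <- ggplot_build(p)$data[[1]]$x
identical(x1, x2)                 # TRUE
max(abs(x1 - cars$speed)) <= 0.3  # TRUE
```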
Over-plotting: Comparison
[Figure: three side-by-side scatter plots of hwy against displ comparing treatments of over-plotting]
1. Anything wrong with this plot?
ggplot(mpg, aes(x = displ, y = hwy, color = "steelblue")) +
geom_point(size = 4, alpha = 0.7, position = "jitter")
[Figure: jittered scatter plot drawn in the default hue rather than steelblue, with a legend titled "colour" containing the single entry "steelblue"]
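▶ Inside aes(), "steelblue" is treated as a constant variable with a single value, so ggplot2 maps it to the default palette and adds a legend. To set a fixed color, pass it outside aes(). The following code produces the expected result.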
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(color = "steelblue",
size = 4, alpha = 0.7, position = "jitter")
[Figure: jittered scatter plot of hwy against displ in steelblue, with no legend]
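The difference between the two versions can be checked directly, since ggplot_build() returns the colors ggplot2 actually draws. A sketch (the names p_mapped and p_set are illustrative):

```r
library(ggplot2)

# Inside aes(): "steelblue" is treated as data (a one-value variable),
# so it goes through the default discrete palette and gets a legend
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = "steelblue")) +
  geom_point()

# Outside aes(): "steelblue" is a fixed parameter applied to every point
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue")

unique(ggplot_build(p_mapped)$data[[1]]$colour)  # the palette's first hue, not steelblue
unique(ggplot_build(p_set)$data[[1]]$colour)     # "steelblue"
```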
2. The color aesthetic can be mapped to logical expressions.
▶ In this example, the expression class == "suv" evaluates to TRUE or
FALSE for each car.
▶ Each value will be mapped to a unique color.
mpg %>%
filter(manufacturer == "chevrolet") %>%
ggplot(aes(x = displ, y = hwy, color = class == "suv")) +
geom_point(size = 4, alpha = 0.7, position = "jitter")
[Figure: Chevrolet models, hwy against displ, colored by class == "suv" (FALSE, TRUE)]
Title and labels
mpg %>%
filter(manufacturer == "chevrolet") %>%
ggplot(aes(x = displ, y = hwy, color = class == "suv")) +
geom_point(size = 4, alpha = 0.7, position = "jitter") +
labs(title = "Fuel efficiency and engine size",
subtitle = "... for Chevrolet",
x = "Engine size (litres)",
y = "Highway fuel efficiency (mpg)",
caption = "Source: Environmental Protection Agency")
[Figure: the Chevrolet scatter plot with the title, subtitle, axis labels, and caption set by labs()]
Smooth line: geom_smooth()
There are several types of smoothers, each using a different criterion to fit
a line of best fit. We briefly study two of them:
▶ Linear regression models.
▶ Loess smoother.
Aesthetics
▶ Let us add a linear regression (lm) line of best fit to the data.
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(position = "jitter") +
geom_smooth(method = "lm", formula = y ~ x) +
labs(x = "Engine Displacement (l)", y = "Highway Miles per Gallon")
[Figure: jittered scatter plot with a fitted linear regression line and its confidence band]
Linear regression smoother
mpg data, higher order polynomials
[Figure: mpg scatter plot with a higher-order polynomial regression fit]
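The code for this fit is not shown on the slide. One way to produce a polynomial smoother is to keep method = "lm" and make the formula a polynomial in x with poly(); the degree 2 below is an assumption, and higher degrees follow the same pattern:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter") +
  # Quadratic regression of y on x (poly() uses an orthogonal polynomial basis)
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  labs(x = "Engine Displacement (l)", y = "Highway Miles per Gallon")
p
```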
method = "loess" fits a smooth curve to a scatter plot that helps us see the
overall trend.
▶ It is a locally estimated regression fit.
▶ For every $x_i$ in the data, loess takes the fraction span (between 0 and 1)
of observations closest to $x_i$ and fits a weighted local regression to them
(stats::loess() fits a local quadratic by default).
▶ The fitted value at $x_i$ becomes the estimate $\hat{f}(x_i)$.
▶ Connecting all the estimates $\hat{f}(x_i)$ forms a smooth curve.
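To connect the description to code, the sketch below fits the same kind of curve directly with stats::loess(), which is what geom_smooth(method = "loess") uses, with span = 0.75 (the default in both). The grid of 80 x values and the names fit, grid, and hwy_hat are illustrative choices:

```r
library(ggplot2)

# Local regression of hwy on displ; span = 0.75 means each local fit
# uses the 75% of observations nearest to the point being estimated
fit <- loess(hwy ~ displ, data = mpg, span = 0.75)

# Estimate f-hat at a grid of x values covering the observed range
grid <- data.frame(displ = seq(min(mpg$displ), max(mpg$displ), length.out = 80))
grid$hwy_hat <- predict(fit, newdata = grid)

# Connecting the estimates gives the smooth curve
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_line(data = grid, aes(y = hwy_hat), color = "blue", linewidth = 1)
p
```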
mpg data, loess smoother
[Figure: mpg scatter plot with a loess smoother]
loess smoother with different span
[Figure: three panels showing loess smoothers fitted with different span values]
mpg data, loess smoother by group
The new curve reflects the presence of several cars that have large
engines and efficient highway mileage.
Now suppose that we want to study how this relationship varies with
the drive type:
▶ Front-wheel drive
▶ Rear-wheel drive, and
▶ Four-wheel drive
Loess smoother by group
[Figure: mpg scatter plot with a separate loess smoother for each drive type]
Loess smoother by group with colors
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(position = "jitter") +
geom_smooth(formula = y ~ x, method = "loess")+
labs(x = "Engine Displacement (l)", y = "Highway Miles per Gallon")+
scale_color_discrete(name = "Drive type",
labels = c("4-wheel", "Front-wheel", "Rear-wheel")) +
theme(legend.position = "top")
[Figure: points and loess smoothers colored by drive type, with the legend "Drive type" (4-wheel, Front-wheel, Rear-wheel) at the top]
Other considerations
Histogram: geom_histogram()
Aesthetics
Apart from the aesthetics, we also need to consider the following issues:
▶ The width of the bins.
▶ The number of bins.
▶ The location of the bins.
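These three choices correspond to the binwidth, bins, and boundary arguments of geom_histogram(). A sketch with the hwy variable from mpg (range 12 to 44 mpg); the specific values and object names are arbitrary:

```r
library(ggplot2)

# Width of the bins: each bin spans 5 mpg.
# Location of the bins: boundary = 10 puts bin edges at 10, 15, 20, ...
p_width <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5, boundary = 10)

# Number of bins: ggplot2 chooses the width so that 10 bins cover the data
p_bins <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 10)

bins_width <- ggplot_build(p_width)$data[[1]]
bins_count <- ggplot_build(p_bins)$data[[1]]

head(bins_width[, c("xmin", "xmax", "count")], 3)
nrow(bins_count)  # 10
```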
Distribution of earnings
Let us revisit the histogram of earnings using base R.
heights <- read.csv("../data/heights.csv",
header = TRUE, stringsAsFactors = TRUE)
hist(heights$earn/1000, freq = FALSE,
main = "Histogram of Earnings",
xlab = "Earnings Per Annum (in Thousands)",
col = "maroon", border = "white",
breaks = seq(0, 200, by = 10))
[Figure: base R "Histogram of Earnings" on the density scale, with bins of width 10 (thousand)]
[Figure: histogram of earnings drawn with ggplot2 default settings (y-axis: count)]
Distribution of earnings (revised)
▶ Adjust the bin width, interior fill, titles, labels, and so on.
▶ Set boundary = 0 so the bin edges fall at 0, 10, 20, ..., starting at the
lower limit of the data.
ggplot(heights, aes(x = earn/1000)) +
geom_histogram(binwidth = 10, fill = "maroon", boundary = 0) +
labs(title = "Histogram of Earnings",
x = "Earnings Per Annum (in Thousands)", y = "Frequency")
[Figure: maroon histogram of earnings, y-axis "Frequency"]
In Week 3, we used the density for each bin instead of the count. This
makes the histogram closer in spirit to a probability density function.
▶ We can tell ggplot2 to use density instead of count with
y = after_stat(density).
ggplot(heights, aes(x = earn/1000, y = after_stat(density))) +
geom_histogram(binwidth = 10, fill = "maroon", boundary = 0) +
labs(title = "Histogram of Earnings",
x = "Earnings Per Annum (in Thousands)", y = "Density")
[Figure: the same histogram with y-axis "Density"]
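With y = after_stat(density), each bar's height is count / (n × binwidth), so the bar areas sum to 1. A quick check on the hwy variable from mpg, used here only because the earnings file is not bundled with R:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, y = after_stat(density))) +
  geom_histogram(binwidth = 5, boundary = 10)

bins <- ggplot_build(p)$data[[1]]

# Area of each bar = height (density) x width; the areas add up to 1
sum(bins$density * (bins$xmax - bins$xmin))
```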
In Week 3, we found that there was a stark difference between males
and females in terms of income earned.
▶ We can present this information by mapping the variable sex to
the fill aesthetic.
ggplot(heights,
aes(x = earn/1000, y = after_stat(density), fill = sex)) +
geom_histogram(binwidth = 10, boundary = 0) +
labs(title = "Histogram of Earnings",
x = "Earnings Per Annum (in Thousands)", y = "Density")
[Figure: stacked density histogram of earnings, filled by sex (female, male)]
▶ To place the bars of each group side by side instead of stacking them, use position = "dodge".
ggplot(heights,
aes(x = earn/1000, y = after_stat(density), fill = sex)) +
geom_histogram(binwidth = 10, boundary = 0, position = "dodge") +
labs(title = "Histogram of Earnings",
x = "Earnings Per Annum (in Thousands)", y = "Density")
[Figure: density histogram of earnings with female and male bars side by side]
Earnings, smooth density
[Figure: smooth density plots of earnings]
Compare densities using the geom_density() function:
ggplot(heights, aes(x = earn/1000, fill = sex)) +
geom_density(alpha = 0.2) +
labs(title = "Smooth Density Plots of Earnings",
x = "Earnings Per Annum (in Thousands)", y = "Density") +
scale_fill_discrete(name = "Gender", labels = c("Female", "Male"))
[Figure: "Smooth Density Plots of Earnings": overlaid semi-transparent density curves for females and males, legend "Gender"]
Smoothness is relative, and we can control it through an argument of
the geom_density() function.
▶ The argument that controls the smoothing bandwidth is bw.
▶ We should select a degree of smoothness that we can defend as
being representative of the underlying data.
[Figure: three density plots of earnings: "Default (optimal)", "Oversmoothing", and "Undersmoothing"]
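The bandwidth can be set directly with bw (a number, or a rule such as "nrd0", the default) or scaled relative to the default with adjust. A sketch on hwy from mpg, overlaying three curves; the values 3 and 0.5 are arbitrary:

```r
library(ggplot2)

# Default rule-of-thumb bandwidth used by geom_density() (bw = "nrd0")
bw.nrd0(mpg$hwy)

p <- ggplot(mpg, aes(x = hwy)) +
  geom_density(linetype = "solid") +               # default bandwidth
  geom_density(adjust = 3, linetype = "dashed") +  # 3 x default: oversmoothed
  geom_density(bw = 0.5, linetype = "dotted")      # small fixed bandwidth: undersmoothed
p
```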
Line: geom_line()
The line geom connects observations in the order of the variable on the
x-axis (usually date and time).
▶ Suitable for plotting time-series data
▶ The line geom understands the following aesthetics:
▶ x (required)
▶ y (required)
▶ alpha
▶ color
▶ linetype or lty
▶ linewidth or lwd
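A minimal time-series sketch using the economics data set that ships with ggplot2 (one row per month; unemploy is the number of unemployed in thousands):

```r
library(ggplot2)

# Points are connected in order of date, the variable on the x-axis
p <- ggplot(economics, aes(x = date, y = unemploy / 1000)) +
  geom_line(color = "steelblue", linewidth = 0.8, linetype = "solid") +
  labs(x = "Year", y = "Unemployed (millions)")
p
```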
Resale flat price trends
Let’s visualize the price trends in resale flats using the data set
resales2024.csv from Week 7.
▶ First, we compute the median resale price per month for
selected flat types.
## Rows: 132
## Columns: 3
## $ flat_type <chr> "3 ROOM", "3 ROOM", "3 ROOM", "3 ROOM", "3 R
## $ month <date> 2021-01-01, 2021-02-01, 2021-03-01, 2021-04
## $ med_resale_price <dbl> 320000, 325000, 318000, 307000, 323000, 3280
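The code for this step is not shown. A sketch with dplyr, assuming the raw data have one row per transaction with columns flat_type, month, and resale_price; the column name resale_price and the tiny data frame below are hypothetical stand-ins for resales2024.csv:

```r
library(dplyr)

# Hypothetical stand-in for the Week 7 data: one row per resale transaction
resales <- tibble(
  flat_type    = c("3 ROOM", "3 ROOM", "3 ROOM", "4 ROOM", "4 ROOM", "2 ROOM"),
  month        = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01",
                           "2021-01-01", "2021-01-01", "2021-01-01")),
  resale_price = c(300000, 320000, 330000, 440000, 460000, 250000)
)

resale <- resales %>%
  # keep only the flat types of interest
  filter(flat_type %in% c("3 ROOM", "4 ROOM", "5 ROOM")) %>%
  # one median per flat type and month
  group_by(flat_type, month) %>%
  summarise(med_resale_price = median(resale_price), .groups = "drop")

resale
```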
Resale flat price trends
ggplot(resale, aes(x = month, y = med_resale_price/1000,
color = flat_type)) +
geom_line(lwd = 1) +
labs(title = "Resale flat price trends, 2021 - 2024",
x = "Year", y = "Median resale price (thousands)",
color = "Flat type")
[Figure: "Resale flat price trends, 2021 - 2024": median resale price (thousands) by year for 3 ROOM, 4 ROOM, and 5 ROOM flats, with a "Flat type" legend]
Resale flat price trends (revised)
Let’s prepare the data for geom_label(), which places a label at the end of each line.
▶ geom_label() requires three aesthetics: x, y, and label.
resale_text <- filter(resale, month == "2024-08-01")
resale_text
## # A tibble: 3 x 3
## flat_type month med_resale_price
## <chr> <date> <dbl>
## 1 3 ROOM 2024-08-01 420000
## 2 4 ROOM 2024-08-01 612500
## 3 5 ROOM 2024-08-01 673000
▶ Notice that the geom_label() layer uses its own data (resale_text) and
adds a label mapping; it still inherits x, y, and color from the global mapping.
ggplot(resale, aes(x = month, y = med_resale_price/1000,
color = flat_type)) +
geom_line(lwd = 1, show.legend = FALSE) +
geom_label(data = resale_text, aes(label = flat_type),
show.legend = FALSE, size = 2.7,
vjust = "top", hjust = "middle",
nudge_y = -10, nudge_x = 15) +
labs(title = "Resale flat price trends, 2021 - 2024",
x = "Year", y = "Median resale price (thousands)")
[Figure: the line chart without a legend, each line labelled (3 ROOM, 4 ROOM, 5 ROOM) at its right end]
Reference lines
Line types
ggplot(resale, aes(x = month, y = med_resale_price/1000,
color = flat_type)) +
geom_line(lwd = 1, show.legend = FALSE) +
geom_label(data = resale_text, aes(label = flat_type),
show.legend = FALSE, size = 2.5,
vjust = "top", hjust = "middle",
nudge_y = -10, nudge_x = 15) +
geom_hline(yintercept = 600, lty = 2, lwd = 0.3) + # constant outside aes(): one line, not one per row
labs(title = "Resale flat price trends, 2021 - 2024",
x = "Year", y = "Median resale price (thousands)")
[Figure: the labelled line chart with a dashed horizontal reference line at 600 (thousand)]
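A constant such as a reference value belongs outside aes(); geom_hline(), geom_vline(), and geom_abline() then draw exactly one line each. A sketch on the mpg data; the mean highway mileage and the 4-litre cut-off are arbitrary illustrations:

```r
library(ggplot2)

mean_hwy <- mean(mpg$hwy)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Reference lines are added first so they sit beneath the points
  geom_hline(yintercept = mean_hwy, linetype = "dashed") +  # horizontal: average hwy
  geom_vline(xintercept = 4, linetype = "dotted") +         # vertical: 4-litre engines
  geom_point()
p
```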
First summary on ggplot2
Variations
By combining geom functions, we can create variations of the basic
plots, such as annotated line charts, lollipop and dumbbell charts,
and slope graphs.
Common problems
As you start to use ggplot(), you are likely to run into problems. It
happens to everyone.
R is extremely picky. A misplaced character can make all the
difference.
▶ Make sure that every opening parenthesis ( is matched with a closing
parenthesis ), and every " is paired with another ".
▶ Check that the + comes at the end of the line, not the start.
▶ If you are stuck, read the error message carefully, then read the
function documentation (for example, ?geom_point).
▶ You can also search online for the error message: it is highly likely
that someone else has encountered the same issue and found help online.
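For example, the first version below fails because R treats the first line as a complete expression and then meets a new line starting with +. Wrapping it in tryCatch() lets us print the error instead of stopping; the exact wording of the message depends on the ggplot2 version, and the names res and p are illustrative:

```r
library(ggplot2)

# Wrong: the + starts a new line, so ggplot(...) is evaluated on its own
# and "+ geom_point()" is then evaluated as a separate expression
res <- tryCatch({
  ggplot(mpg, aes(x = displ, y = hwy))
  + geom_point()
}, error = function(e) conditionMessage(e))
res

# Right: the + ends the line, so R keeps reading the expression
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
p
```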
Case study: US gun murders
▶ Last week, we examined the components of a graph on US gun
murders.
▶ We now construct this plot layer-by-layer in ggplot2.
US gun murders
## Rows: 51
## Columns: 5
## $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Calif
## $ abb <chr> "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "D
## $ region <chr> "South", "West", "West", "South", "West", "West",
## $ population <int> 4779736, 710231, 6392017, 2915918, 37253956, 50291
## $ total <int> 135, 19, 232, 93, 1257, 65, 97, 38, 99, 669, 376,
[Figure: scatter plot of total murders against population/1e+06]
2. A second layer of the plot.
▶ Add a label to each point to identify the state.
[Figure: the scatter plot with each point labelled by its state abbreviation]
3. Tweaking the arguments to make the plot easier to read.
▶ Adjust the point size using the size argument of geom_point().
▶ Shift the labels slightly to the right (positive values) or to the left
(negative values) using nudge_x in geom_text().
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_point(size = 3) +
geom_text(aes(label = abb), nudge_x = 1.5)
[Figure: larger points with state labels shifted to the right]
4. Transformation and scales.
▶ Both variables are highly right-skewed.
▶ We can use the scale_*_log10() functions to apply a log10
transformation to each axis.
▶ Because the axes are now on a log scale, nudge_x is measured in log10
units, so the nudge must be made much smaller.
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_point(size = 3) +
geom_text(aes(label = abb), nudge_x = 0.06)+
scale_x_log10() + scale_y_log10()
[Figure: the labelled scatter plot on log10 scales; x breaks 1, 3, 10, 30 and y breaks 10, 100, 1000]
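On a log10 axis, nudge_x is added to log10(x), so a nudge of d multiplies the x position by 10^d instead of shifting it by d. A quick check of the two nudges used on these slides:

```r
# A nudge of 0.06 on a log10 scale places each label at 10^0.06 times
# the point's x value, i.e. about 15% further right, at every population size
10^0.06   # 1.148154

# The earlier linear-scale nudge of 1.5 would become a factor of
# 10^1.5 (about 31.6) on the log scale: far too large
10^1.5
```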
5. Next, add descriptive labels and a title.
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_point(size = 3) +
geom_text(aes(label = abb), nudge_x = 0.06)+
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region")
[Figure: "US Gun Murders in 2010" with axis titles "Population in millions (log scale)" and "Total number of murders (log scale)"]
6. Categories as colors.
▶ Map the region variable to the color aesthetic in geom_point().
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_point(aes(color = region), size = 3) +
geom_text(aes(label = abb), nudge_x = 0.06) +
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region")
[Figure: the same plot with points colored by region (North Central, Northeast, South, West)]
7. Reference line.
▶ Next, we want to add a reference line that represents the average
murder rate for the entire country.
▶ The line is $y = rx$, where $r$ is the national murder rate (murders
per million people) and $x$ is the population in millions.
▶ On a log10 scale, this line becomes $\log_{10}(y) = \log_{10}(r) + \log_{10}(x)$.
▶ So in our plot, it is a line with slope 1 and intercept $\log_{10}(r)$.
## [1] 1.482095
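The value printed above is log10_r, the intercept used in the next step. A sketch of how it can be computed, assuming the murders data from the dslabs package (whose columns match the data shown earlier); it should reproduce 1.482095:

```r
library(dplyr)
library(dslabs)   # assumption: the murders data come from dslabs
data(murders)

# National rate r: murders per million people, matching the x-axis
# (population in millions)
r <- murders %>%
  summarize(rate = sum(total) / sum(population) * 10^6) %>%
  pull(rate)

log10_r <- log10(r)
log10_r
```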
▶ To add the line, we use the geom_abline() function.
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_point(aes(color = region), size = 3) +
geom_text(aes(label = abb), nudge_x = 0.06) +
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region") +
geom_abline(slope = 1, intercept = log10_r, linetype = 2)
[Figure: the plot with a dashed reference line of slope 1 and intercept log10_r drawn over the points]
▶ Next, we adjust the order of the layers. Layers are drawn in the order
they are added, so drawing the dashed line first keeps it beneath the points.
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_abline(slope = 1, intercept = log10_r, linetype = 2) +
geom_point(aes(color = region), size = 3) +
geom_text(aes(label = abb), nudge_x = 0.06) +
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region")
[Figure: the same plot with the dashed reference line drawn beneath the points]
8. ggplot2 extensions:
Extension packages add even more capabilities to ggplot2.
▶ ggthemes contains many popular themes such as
theme_economist() and theme_wsj().
▶ ggrepel repels overlapping text labels. It provides geoms such as
geom_text_repel() and geom_label_repel() that add labels while keeping
them from falling on top of each other.
# install.packages(c("ggthemes", "ggrepel"))
library(ggthemes)
library(ggrepel)
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_abline(slope = 1, intercept = log10_r, linetype = 2) +
geom_point(aes(color = region), size = 3) +
geom_text(aes(label = abb), nudge_x = 0.07) +
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region") +
theme_economist()
[Figure: the plot restyled with theme_economist()]
Final touch:
▶ Replace geom_text() with geom_text_repel().
▶ Save the plot to a file with ggsave().
ggplot(murders, aes(x = population/1e6, y = total)) +
geom_abline(slope = 1, intercept = log10_r, linetype = 2) +
geom_point(aes(color = region), size = 3) +
geom_text_repel(aes(label = abb), color = "black") +
scale_x_log10() + scale_y_log10() +
labs(title = "US Gun Murders in 2010",
x = "Population in millions (log scale)",
y = "Total number of murders (log scale)", color = "Region") +
theme_economist()
US gun murders (final plot)
[Figure: the final log-log scatter plot with non-overlapping state labels, points colored by region, the dashed reference line, and theme_economist() styling]
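The murders code above does not yet call ggsave(). A sketch of saving a plot, shown with a simple mpg plot so it runs on its own; the file name and the 7 x 5 inch, 300 dpi size are arbitrary choices, and for the final murders plot you would pass that plot object instead:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# ggsave() saves the given plot (by default, the last plot displayed);
# the file type is inferred from the extension
out_file <- file.path(tempdir(), "mpg_scatter.png")
ggsave(out_file, plot = p, width = 7, height = 5, units = "in", dpi = 300)

file.exists(out_file)  # TRUE
```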