HW 4
Enter your name and EID here: Jiaqi Guo (jg76446)
For all questions, include the R commands/functions that you used to find your answer (show R chunk). Answers
without supporting code will not receive credit. Write full sentences to describe your findings.
Part 1
This part uses the world_bank_pop dataset from the tidyverse.
Question 1: (2 pts)
world_bank_pop
# Pivot years 2000 to 2017 into a 'year' variable, and population values into 'indicator_value'
world_bank_pop_tidy <- world_bank_pop %>%
  pivot_longer(
    cols = `2000`:`2017`,          # Columns for each year from 2000 to 2017
    names_to = "year",             # New column to hold year values
    values_to = "indicator_value"  # New column to hold values for each year
  ) %>%
  mutate(year = as.numeric(year))  # Ensure 'year' is a numeric variable
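The code above pivots world_bank_pop so that the year columns become a single year variable, with their values stored in indicator_value. The data is then pivoted again so that each indicator has its own column, and the result is saved as myworld.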
## # A tibble: 4,788 × 6
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABW 2000 41625 1.66 89101 2.54
## 2 ABW 2001 42025 0.956 90691 1.77
## 3 ABW 2002 42194 0.401 91781 1.19
## 4 ABW 2003 42277 0.197 92701 0.997
## 5 ABW 2004 42317 0.0946 93540 0.901
## 6 ABW 2005 42399 0.194 94483 1.00
## 7 ABW 2006 42555 0.367 95606 1.18
## 8 ABW 2007 42729 0.408 96787 1.23
## 9 ABW 2008 42906 0.413 97996 1.24
## 10 ABW 2009 43079 0.402 99212 1.23
## # ℹ 4,778 more rows
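The printed myworld has one column per indicator, which the pivot_longer() step alone does not produce. Below is a minimal sketch of the second pivot. It assumes the indicator codes sit in a column named indicator (as in tidyr's world_bank_pop), and it uses a small made-up tibble (toy_long is a hypothetical name) in place of the real data.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for world_bank_pop_tidy (values are illustrative, not real data)
toy_long <- tibble(
  country = c("AAA", "AAA", "AAA", "AAA"),
  indicator = c("SP.POP.TOTL", "SP.POP.GROW", "SP.POP.TOTL", "SP.POP.GROW"),
  year = c(2000, 2000, 2001, 2001),
  indicator_value = c(100, 1.5, 102, 2.0)
)

# Spread each indicator into its own column, leaving one row per country-year
toy_wide <- toy_long %>%
  pivot_wider(names_from = indicator, values_from = indicator_value)

toy_wide
```

Applying the same pivot_wider() call to world_bank_pop_tidy should give a table shaped like the myworld output shown above.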
Question 2: (2 pts)
ggplot
Note: the country code WLD represents the entire world.
# Filter the data to include only the world population growth data (country code "WLD")
world_growth <- myworld %>%
  filter(country == "WLD")
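The plotting step for this question is not shown. One possible sketch is a line plot of SP.POP.GROW against year. The tibble toy_world_growth and its values are made up for illustration. The geom and the labels are also assumptions, not necessarily what was submitted.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for world_growth (values are illustrative, not real data)
toy_world_growth <- tibble(
  country = "WLD",
  year = 2000:2004,
  SP.POP.GROW = c(1.3, 1.3, 1.2, 1.2, 1.2)
)

# Line plot of population growth over time
growth_plot <- toy_world_growth %>%
  ggplot(aes(x = year, y = SP.POP.GROW)) +
  geom_line() +
  labs(title = "World population growth over time",
       x = "Year",
       y = "Population growth (annual %)")

growth_plot
```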
myworld
# Filter data for the year 2017 and find the country with the highest population growth
highest_growth_2017 <- myworld %>%
  filter(year == 2017) %>%
  filter(SP.POP.GROW == max(SP.POP.GROW, na.rm = TRUE))
highest_growth_2017
## # A tibble: 1 × 6
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 QAT 2017 2686753 4.46 2711755 4.39
Question 3: (2 pts)
countrycode
codelist
continent wb
country.name.en wb
continent mycodes
# Create `mycodes` by selecting necessary columns and removing rows with missing data
mycodes <- codelist %>%
  select(continent, wb, country.name.en) %>%
  filter(!is.na(wb) & !is.na(continent))
mycodes
## # A tibble: 216 × 3
## continent wb country.name.en
## <chr> <chr> <chr>
## 1 Asia AFG Afghanistan
## 2 Europe ALB Albania
## 3 Africa DZA Algeria
## 4 Oceania ASM American Samoa
## 5 Europe AND Andorra
## 6 Africa AGO Angola
## 7 Americas ATG Antigua & Barbuda
## 8 Americas ARG Argentina
## 9 Asia ARM Armenia
## 10 Americas ABW Aruba
## # ℹ 206 more rows
mycodes
num_country_codes
## # A tibble: 1 × 1
## distinct_codes
## <int>
## 1 216
Question 4: (2 pts)
myworld mycodes
# Count the number of distinct country codes in myworld
num_country_codes <- myworld %>%
  summarise(distinct_codes = n_distinct(country))
num_country_codes
## # A tibble: 1 × 1
## distinct_codes
## <int>
## 1 266
inner_join() myworld
mycountries
## # A tibble: 3,870 × 8
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW continent
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 ABW 2000 41625 1.66 89101 2.54 Americas
## 2 ABW 2001 42025 0.956 90691 1.77 Americas
## 3 ABW 2002 42194 0.401 91781 1.19 Americas
## 4 ABW 2003 42277 0.197 92701 0.997 Americas
## 5 ABW 2004 42317 0.0946 93540 0.901 Americas
## 6 ABW 2005 42399 0.194 94483 1.00 Americas
## 7 ABW 2006 42555 0.367 95606 1.18 Americas
## 8 ABW 2007 42729 0.408 96787 1.23 Americas
## 9 ABW 2008 42906 0.413 97996 1.24 Americas
## 10 ABW 2009 43079 0.402 99212 1.23 Americas
## # ℹ 3,860 more rows
## # ℹ 1 more variable: country.name.en <chr>
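The join that produces mycountries is not shown. The output above keeps country and adds continent and country.name.en, which is consistent with matching country in myworld to wb in mycodes. Here is a minimal sketch under that assumption, with made-up tables (toy_world and toy_codes are hypothetical names):

```r
library(dplyr)

# Toy stand-ins for myworld and mycodes (values are illustrative, not real data)
toy_world <- tibble(
  country = c("AAA", "AAA", "WLD"),
  year = c(2016, 2017, 2017),
  SP.POP.GROW = c(1.0, 1.1, 1.2)
)
toy_codes <- tibble(
  continent = "Asia",
  wb = "AAA",
  country.name.en = "Country A"
)

# Keep only rows whose country code appears in both tables;
# aggregate codes such as WLD have no match and are dropped
toy_countries <- toy_world %>%
  inner_join(toy_codes, by = c("country" = "wb"))

toy_countries
```

Because inner_join() drops unmatched rows, this kind of join would explain why mycountries has fewer rows than myworld (3,870 versus 4,788).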
mycountries
# Find the country code with the highest population growth in 2017
highest_growth_country <- mycountries %>%
  filter(year == 2017) %>%
  slice_max(order_by = SP.POP.GROW)
highest_growth_country
## # A tibble: 1 × 8
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW continent
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 QAT 2017 2686753 4.46 2711755 4.39 Asia
## # ℹ 1 more variable: country.name.en <chr>
Question 5: (2 pts)
continent mycountries
# Identify the continent with the highest and lowest average growth
highest_growth_continent <- average_growth_by_continent %>%
  filter(average_growth == max(average_growth, na.rm = TRUE))
## # A tibble: 1 × 2
## continent average_growth
## <chr> <dbl>
## 1 Africa 3.59
lowest_growth_continent
## # A tibble: 1 × 2
## continent average_growth
## <chr> <dbl>
## 1 Europe 0.499
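The code above relies on average_growth_by_continent, whose definition is not shown. It could be built with group_by() and summarise(), as in the sketch below. Which growth column was averaged is not stated, so using SP.URB.GROW here is an assumption, and the toy data is made up.

```r
library(dplyr)

# Toy stand-in for mycountries (values are illustrative, not real data)
toy_countries <- tibble(
  country = c("AAA", "BBB", "CCC", "DDD"),
  continent = c("X", "X", "Y", "Y"),
  SP.URB.GROW = c(3, 4, 1, NA)
)

# Average growth per continent, ignoring missing values
average_growth_by_continent <- toy_countries %>%
  group_by(continent) %>%
  summarise(average_growth = mean(SP.URB.GROW, na.rm = TRUE))

# Continents with the highest and lowest average growth
highest_growth_continent <- average_growth_by_continent %>%
  slice_max(average_growth)
lowest_growth_continent <- average_growth_by_continent %>%
  slice_min(average_growth)
```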
myafrica2017
# Create a new dataset containing only African countries for the year 2017
myafrica2017 <- mycountries %>%
  filter(year == 2017, continent == "Africa")
myafrica2017
## # A tibble: 54 × 8
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW continent
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 AGO 2017 19586972 4.62 30208628 3.55 Africa
## 2 BDI 2017 1417430 4.82 11155593 2.29 Africa
## 3 BEN 2017 5423582 4.11 11596779 2.95 Africa
## 4 BFA 2017 5701421 5.01 19835858 2.87 Africa
## 5 BWA 2017 1650064 3.20 2401840 2.08 Africa
## 6 CAF 2017 2047664 2.76 4996741 1.87 Africa
## 7 CIV 2017 12505013 3.47 24848016 2.59 Africa
## 8 CMR 2017 13605785 3.91 24393181 2.83 Africa
## 9 COD 2017 36983500 4.76 84283273 3.44 Africa
## 10 COG 2017 3530528 3.08 5312340 2.39 Africa
## # ℹ 44 more rows
## # ℹ 1 more variable: country.name.en <chr>
Question 6: (2 pts)
map_data()
maps
mapWorld
myafrica2017
## # A tibble: 54 × 8
## country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW continent
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 AGO 2017 19586972 4.62 30208628 3.55 Africa
## 2 BDI 2017 1417430 4.82 11155593 2.29 Africa
## 3 BEN 2017 5423582 4.11 11596779 2.95 Africa
## 4 BFA 2017 5701421 5.01 19835858 2.87 Africa
## 5 BWA 2017 1650064 3.20 2401840 2.08 Africa
## 6 CAF 2017 2047664 2.76 4996741 1.87 Africa
## 7 CIV 2017 12505013 3.47 24848016 2.59 Africa
## 8 CMR 2017 13605785 3.91 24393181 2.83 Africa
## 9 COD 2017 36983500 4.76 84283273 3.44 Africa
## 10 COG 2017 3530528 3.08 5312340 2.39 Africa
## # ℹ 44 more rows
## # ℹ 1 more variable: country.name.en <chr>
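The map code in Question 7 uses a table named mymap with long, lat, group, and SP.URB.GROW columns, but the step that builds it is not shown. One plausible construction joins the map polygons to the growth data by country name. The sketch below uses a made-up toy_map with the same columns that map_data("world") returns, so it runs without the maps package. The choice of left_join() is an assumption.

```r
library(dplyr)

# Toy stand-in with the columns returned by map_data("world"):
# one row per polygon vertex
toy_map <- tibble(
  long = c(0, 1, 1, 0),
  lat = c(0, 0, 1, 1),
  group = 1,
  order = 1:4,
  region = "Country A",
  subregion = NA_character_
)

# Toy stand-in for myafrica2017 (values are illustrative, not real data)
toy_africa <- tibble(
  country.name.en = "Country A",
  SP.URB.GROW = 4.2
)

# Attach the growth value to every vertex of the matching region
toy_mymap <- toy_map %>%
  left_join(toy_africa, by = c("region" = "country.name.en"))

toy_mymap
```

With the real data, toy_map would be replaced by mapWorld <- map_data("world") and toy_africa by myafrica2017.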
Question 7: (2 pts)
ggmap
#
Note: it would be a good idea to run the code piece by piece to see what each layer adds to
the plot. eval=FALSE
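# Build a map!
mymap |>
  # Map longitude and latitude to x and y, group points by polygon, and fill by urban growth
  ggplot(aes(x = long, y = lat, group = group, fill = SP.URB.GROW)) +
  # Draw each country as a polygon with a black outline
  geom_polygon(colour = "black") +
  # Color the fill on a gradient from red (low growth) to blue (high growth)
  scale_fill_gradient(low = "red", high = "blue") +
  # Label the legend, the plot title, and both axes
  labs(fill = "Urban Growth",
       title = "Urban Growth in Africa in 2017",
       x = "Longitude",
       y = "Latitude")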
Question 8: (1 pt)
myafrica2017 mapWorld
mapWorld myafrica2017
myafrica2017 |>
  anti_join(mapWorld, by = c("country.name.en" = "region")) |>
  select(country.name.en)
## # A tibble: 5 × 1
## country.name.en
## <chr>
## 1 Côte d’Ivoire
## 2 Congo - Kinshasa
## 3 Congo - Brazzaville
## 4 São Tomé & Príncipe
## 5 Eswatini
str_detect() mapWorld
myafrica2017
## region
## 1 Ivory Coast
## 2 Democratic Republic of the Congo
## 3 Republic of Congo
## 4 Sao Tome and Principe
## 5 Swaziland
myafrica2017
Hint: use recode() inside mutate() as described in our WS10 or in this article: https://www.statology.org/recode-dplyr/.
mapWorld myafrica2017
# your code goes below (replace this comment with something meaningful)
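No code was supplied for this step. Following the hint, the five unmatched names listed above can be recoded to the mapWorld spellings with recode() inside mutate(). This sketch runs on a made-up tibble (toy_africa is a hypothetical name) that contains the five names plus one name that already matches.

```r
library(dplyr)

# Toy stand-in for myafrica2017 holding only the name column
toy_africa <- tibble(
  country.name.en = c("Côte d’Ivoire", "Congo - Kinshasa", "Congo - Brazzaville",
                      "São Tomé & Príncipe", "Eswatini", "Angola")
)

# Replace the five mismatched names; all other names are left unchanged
toy_recoded <- toy_africa %>%
  mutate(country.name.en = recode(country.name.en,
    "Côte d’Ivoire" = "Ivory Coast",
    "Congo - Kinshasa" = "Democratic Republic of the Congo",
    "Congo - Brazzaville" = "Republic of Congo",
    "São Tomé & Príncipe" = "Sao Tome and Principe",
    "Eswatini" = "Swaziland"
  ))

toy_recoded
```

Applied to myafrica2017, the same mutate() call should leave the anti_join() check against mapWorld with no unmatched rows, assuming these five names are the only mismatches.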
Part 2
Question 9: (2 pts)
Formatting: (1 pt)