03 - Demographic Data Analysis

Datasets are provided in CSV and R vector formats for demographic data analysis. Scatter plots are to be produced depicting relationships between variables like birth rate, internet usage, life expectancy, and fertility rate categorized by country income, region, and year (1960 vs 2013). Insights are also requested on changes between the two time periods.

3.

Demographic Data Analysis


As a Data Scientist you are analyzing the World’s Demographic trends.

1. You are required to produce a scatterplot illustrating Birth Rate and Internet Usage statistics by Country. The scatterplot
must also be categorized by Countries’ Income Groups.
2. Produce a second scatterplot, also illustrating Birth Rate and Internet Usage statistics by Country. This scatterplot must
be categorized by Countries’ Regions. The additional data required is provided in R vectors.
3. You are required to produce a scatterplot depicting Life Expectancy (y-axis) and Fertility Rate (x-axis) statistics by
Country. The scatterplot must also be categorized by Countries’ Regions.
4. You are supplied with data for two years, 1960 and 2013, and are required to produce a visualization for each of these
years. Some data has been provided in a CSV file and some in R vectors. The CSV file contains combined data for
both years.
5. All data manipulation has to be done in R, as the project may be audited at a later stage.
6. You have also been requested to provide insights into how the two periods compare.
Hint #1: After you import the CSV file into R, the first step is to split the data frame into two, data1960 and data2013, using data
frame filtering.

Solution:

Datasets are provided as .CSV files. Additional Data is given as vectors in demographic_data.r

demographic_data.r

rm(list=ls())

#Execute below code to generate three new vectors

Countries_2012_Dataset <- c("Aruba","Afghanistan","Angola","Albania","United Arab Emirates", "Argentina", "Armenia",


"Antigua and Barbuda","Australia","Austria","Azerbaijan","Burundi","Belgium","Benin","Burkina Faso", "Bangladesh",
"Bulgaria","Bahrain","Bahamas, The","Bosnia and Herzegovina", "Belarus", "Belize", "Bermuda", "Bolivia", "Brazil",
"Barbados","Brunei Darussalam","Bhutan","Botswana","Central African Republic", "Canada", "Switzerland", "Chile", "China",
"Cote d'Ivoire","Cameroon","Congo, Rep.","Colombia","Comoros","Cabo Verde","Costa Rica","Cuba","Cayman Islands",
"Cyprus","Czech Republic","Germany","Djibouti","Denmark","Dominican Republic","Algeria","Ecuador","Egypt, Arab Rep.",
"Eritrea","Spain","Estonia","Ethiopia","Finland","Fiji","France","Micronesia, Fed. Sts.","Gabon","United Kingdom", "Georgia",
"Ghana","Guinea","Gambia, The","Guinea-Bissau","Equatorial Guinea", "Greece", "Grenada", "Greenland", "Guatemala",
"Guam","Guyana","Hong Kong SAR, China","Honduras","Croatia","Haiti","Hungary","Indonesia","India","Ireland",
"Iran, Islamic Rep.","Iraq","Iceland","Israel","Italy","Jamaica","Jordan","Japan","Kazakhstan","Kenya","Kyrgyz Republic",
"Cambodia","Kiribati","Korea, Rep.","Kuwait","Lao PDR","Lebanon","Liberia","Libya","St. Lucia","Liechtenstein",
"Sri Lanka","Lesotho","Lithuania","Luxembourg","Latvia","Macao SAR, China", "Morocco", "Moldova", "Madagascar",
"Maldives","Mexico","Macedonia, FYR", "Mali", "Malta", "Myanmar", "Montenegro", "Mongolia", "Mozambique",
"Mauritania","Mauritius","Malawi","Malaysia","Namibia","New Caledonia", "Niger", "Nigeria", "Nicaragua", "Netherlands",
"Norway","Nepal","New Zealand","Oman","Pakistan","Panama","Peru","Philippines","Papua New Guinea","Poland",
"Puerto Rico","Portugal","Paraguay","French Polynesia","Qatar","Romania","Russian Federation","Rwanda","Saudi Arabia", "Sudan",
"Senegal","Singapore","Solomon Islands","Sierra Leone","El Salvador","Somalia","Serbia","South Sudan",
"Sao Tome and Principe","Suriname","Slovak Republic","Slovenia","Sweden","Swaziland","Seychelles","Syrian Arab Republic", "Chad",
"Togo","Thailand","Tajikistan","Turkmenistan","Timor-Leste","Tonga","Trinidad and Tobago", "Tunisia", "Turkey",
"Tanzania","Uganda","Ukraine","Uruguay","United States","Uzbekistan","St. Vincent and the Grenadines","Venezuela, RB",
"Virgin Islands (U.S.)","Vietnam","Vanuatu","West Bank and Gaza","Samoa","Yemen, Rep.","South Africa",
"Congo, Dem. Rep.","Zambia","Zimbabwe")

Codes_2012_Dataset<-
c("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","BGR",
"BHR","BHS","BIH","BLR","BLZ","BMU","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL",
"CHN","CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYM","CYP","CZE","DEU","DJI","DNK","DOM","DZA",
"ECU","EGY","ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB",
"GNQ","GRC","GRD","GRL","GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ",
"ISL","ISR","ITA","JAM","JOR","JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY",
"LCA","LIE","LKA","LSO","LTU","LUX","LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MMR",
"MNE","MNG","MOZ","MRT","MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL",
"OMN","PAK","PAN","PER","PHL","PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN",
"SEN","SGP","SLB","SLE","SLV","SOM","SRB","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYC","SYR","TCD",
"TGO","THA","TJK","TKM","TLS","TON","TTO","TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN",
"VIR","VNM","VUT","PSE","WSM","YEM","ZAF","COD","ZMB","ZWE")

Regions_2012_Dataset <- c("The Americas","Asia","Africa","Europe","Middle East","The Americas","Asia","The Americas",


"Oceania","Europe","Asia","Africa","Europe","Africa","Africa","Asia","Europe","Middle East","The Americas", "Europe",
"Europe","The Americas","The Americas","The Americas","The Americas","The Americas", "Asia", "Asia", "Africa", "Africa",
"The Americas","Europe","The Americas","Asia","Africa","Africa","Africa","The Americas","Africa","Africa","The Americas",
"The Americas","The Americas","Europe","Europe","Europe","Africa","Europe","The Americas","Africa","The Americas",
"Africa","Africa","Europe","Europe","Africa","Europe","Oceania","Europe","Oceania","Africa","Europe","Asia","Africa",
"Africa","Africa","Africa","Africa","Europe","The Americas","The Americas","The Americas","Oceania","The Americas","Asia",
"The Americas","Europe","The Americas","Europe","Asia","Asia","Europe","Middle East","Middle East","Europe","Middle East",
"Europe","The Americas","Middle East","Asia","Asia","Africa","Asia","Asia","Oceania","Asia","Middle East","Asia",
"Middle East","Africa","Africa","The Americas", "Europe", "Asia", "Africa", "Europe", "Europe", "Europe", "Asia", "Africa", "Europe",
"Africa","Asia","The Americas", "Europe", "Africa", "Europe", "Asia", "Europe", "Asia", "Africa","Africa", "Africa", "Africa",
"Asia","Africa","Oceania","Africa","Africa","The Americas","Europe","Europe","Asia","Oceania","Middle East","Asia",
"The Americas","The Americas","Asia","Oceania","Europe","The Americas","Europe","The Americas","Oceania","Middle East",
"Europe","Europe","Africa","Middle East","Africa","Africa","Asia","Oceania","Africa","The Americas", "Africa", "Europe",
"Africa","Africa","The Americas","Europe","Europe","Europe","Africa","Africa","Middle East", "Africa", "Africa", "Asia",
"Asia","Asia","Asia","Oceania","The Americas","Africa","Europe","Africa","Africa","Europe","The Americas","The Americas",
"Asia","The Americas","The Americas","The Americas","Asia","Oceania","Middle East","Oceania","Middle East", "Africa",
"Africa","Africa","Africa")

Country_Code <-
c("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","BGR",
"BHR","BHS","BIH","BLR","BLZ","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL","CHN",
"CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYP","CZE","DEU","DJI","DNK","DOM","DZA","ECU","EGY",
"ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB","GNQ","GRC","GRD",
"GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ","ISL","ITA","JAM","JOR",
"JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY","LCA","LKA","LSO","LTU","LUX",
"LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MMR","MNE","MNG","MOZ","MRT",
"MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL","OMN","PAK","PAN","PER","PHL",
"PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN","SEN","SGP","SLB","SLE","SLV",
"SOM","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYR","TCD","TGO","THA","TJK","TKM","TLS","TON","TTO",
"TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN","VIR","VNM","VUT","WSM","YEM","ZAF","COD",
"ZMB","ZWE")

Life_Expectancy_At_Birth_1960 <-
c(65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659,65.8634634146342,
61.7827317073171,70.8170731707317,68.5856097560976,60.836243902439,41.2360487804878,69.7019512195122,
37.2782682926829,34.4779024390244,45.8293170731707,69.2475609756098,52.0893658536585,62.7290487804878,60.2762195121951,
67.7080975609756,59.9613658536585,42.1183170731707,54.2054634146342,60.7380487804878,62.5003658536585,
32.3593658536585,50.5477317073171,36.4826341463415,71.1331707317073,71.3134146341463,57.4582926829268,43.4658048780488,
36.8724146341463,41.523756097561,48.5816341463415,56.716756097561,41.4424390243903,48.8564146341463,60.5761951219512,
63.9046585365854,69.5939268292683,70.3487804878049,69.3129512195122,44.0212682926829,72.1765853658537,
51.8452682926829,46.1351219512195,53.215,48.0137073170732,37.3629024390244,69.1092682926829,67.9059756097561,
38.4057073170732,68.819756097561,55.9584878048781,69.8682926829268,57.5865853658537,39.5701219512195,71.1268292682927,
63.4318536585366,45.8314634146342,34.8863902439024,32.0422195121951,37.8404390243902,36.7330487804878,
68.1639024390244,59.8159268292683,45.5316341463415,61.2263414634146,60.2787317073171,66.9997073170732,46.2883170731707,
64.6086585365854,42.1000975609756,68.0031707317073,48.6403170731707,41.1719512195122,69.691756097561,
44.945512195122,48.0306829268293,73.4286585365854,69.1239024390244,64.1918292682927,52.6852682926829,67.6660975609756,
58.3675853658537,46.3624146341463,56.1280731707317,41.2320243902439,49.2159756097561,53.0013170731707,
60.3479512195122,43.2044634146342,63.2801219512195,34.7831707317073,42.6411951219512,57.303756097561,59.7471463414634,
46.5107073170732,69.8473170731707,68.4463902439024,69.7868292682927,64.6609268292683,48.4466341463415,
61.8127804878049,39.9746829268293,37.2686341463415,57.0656341463415,60.6228048780488,28.2116097560976,67.6017804878049,
42.7363902439024,63.7056097560976,48.3688048780488,35.0037073170732,43.4830975609756,58.7452195121951,
37.7736341463415,59.4753414634146,46.8803902439024,58.6390243902439,35.5150487804878,37.1829512195122,46.9988292682927,
73.3926829268293,73.549756097561,35.1708292682927,71.2365853658537,42.6670731707317,45.2904634146342,
60.8817073170732,47.6915853658537,57.8119268292683,38.462243902439,67.6804878048781,68.7196097560976,62.8089268292683,
63.7937073170732,56.3570487804878,61.2060731707317,65.6424390243903,66.0552926829268,42.2492926829268,
45.6662682926829,48.1876341463415,38.206,65.6598292682927,49.3817073170732,30.3315365853659,49.9479268292683,36.9658780487805,
31.6767073170732,50.4513658536585,59.6801219512195,69.9759268292683,68.9780487804878,73.0056097560976,
44.2337804878049,52.768243902439,38.0161219512195,40.2728292682927,54.6993170731707,56.1535365853659,54.4586829268293,
33.7271219512195,61.3645365853659,62.6575853658537,42.009756097561,45.3844146341463,43.6538780487805,
43.9835609756098,68.2995365853659,67.8963902439025,69.7707317073171,58.8855365853659,57.7238780487805,59.2851219512195,
63.7302195121951,59.0670243902439,46.4874878048781,49.969512195122,34.3638048780488,49.0362926829268,
41.0180487804878,45.1098048780488,51.5424634146342)

Life_Expectancy_At_Birth_2013 <-
c(75.3286585365854,60.0282682926829,51.8661707317073,77.537243902439,77.1956341463415,75.9860975609756,74.5613658536585,
75.7786585365854,82.1975609756098,80.890243902439,70.6931463414634,56.2516097560976,80.3853658536585,
59.3120243902439,58.2406341463415,71.245243902439,74.4658536585366,76.5459512195122,75.0735365853659,76.2769268292683,
72.4707317073171,69.9820487804878,67.9134390243903,74.1224390243903,75.3339512195122,78.5466585365854,
69.1029268292683,64.3608048780488,49.8798780487805,81.4011219512195,82.7487804878049,81.1979268292683,75.3530243902439,
51.2084634146342,55.0418048780488,61.6663902439024,73.8097317073171,62.9321707317073,72.9723658536585,
79.2252195121951,79.2563902439025,79.9497804878049,78.2780487804878,81.0439024390244,61.6864634146342,80.3024390243903,
73.3199024390244,74.5689512195122,75.648512195122,70.9257804878049,63.1778780487805,82.4268292682927,
76.4243902439025,63.4421951219512,80.8317073170732,69.9179268292683,81.9682926829268,68.9733902439024,63.8435853658537,
80.9560975609756,74.079512195122,61.1420731707317,58.216487804878,59.9992682926829,54.8384146341464,57.2908292682927,
80.6341463414634,73.1935609756098,71.4863902439024,78.872512195122,66.3100243902439,83.8317073170732,
72.9428536585366,77.1268292682927,62.4011463414634,75.2682926829268,68.7046097560976,67.6604146341463,81.0439024390244,
75.1259756097561,69.4716829268293,83.1170731707317,82.290243902439,73.4689268292683,73.9014146341463,
83.3319512195122,70.45,60.9537804878049,70.2024390243902,67.7720487804878,65.7665853658537,81.459756097561,
74.462756097561,65.687243902439,80.1288780487805,60.5203902439024,71.6576829268293,74.9127073170732,74.2402926829268,
49.3314634146342,74.1634146341464,81.7975609756098,73.9804878048781,80.3391463414634,73.7090487804878,
68.811512195122,64.6739024390244,76.6026097560976,76.5326585365854,75.1870487804878,57.5351951219512,80.7463414634146,
65.6540975609756,74.7583658536585,69.0618048780488,54.641512195122,62.8027073170732,74.46,61.466,74.567512195122,
64.3438780487805,77.1219512195122,60.8281463414634,52.4421463414634,74.514756097561,81.1048780487805,
81.4512195121951,69.222,81.4073170731707,76.8410487804878,65.9636829268293,77.4192195121951,74.2838536585366,68.1315609756097,
62.4491707317073,76.8487804878049,78.7111951219512,80.3731707317073,72.7991707317073,76.3340731707317,
78.4184878048781,74.4634146341463,71.0731707317073,63.3948292682927,74.1776341463415,63.1670487804878,65.878756097561,
82.3463414634146,67.7189268292683,50.3631219512195,72.4981463414634,55.0230243902439,55.2209024390244,
66.259512195122,70.99,76.2609756097561,80.2780487804878,81.7048780487805,48.9379268292683,74.7157804878049,
51.1914878048781,59.1323658536585,74.2469268292683,69.4001707317073,65.4565609756098,67.5223658536585,72.6403414634147,
70.3052926829268,73.6463414634147,75.1759512195122,64.2918292682927,57.7676829268293,71.159512195122,
76.8361951219512,78.8414634146341,68.2275853658537,72.8108780487805,74.0744146341464,79.6243902439024,75.756487804878,
71.669243902439,73.2503902439024,63.583512195122,56.7365853658537,58.2719268292683,59.2373658536585,55.633)

demographic_data_analysis.r

library(ggplot2)
#Use any one of the methods to read the CSV file
#Method 1.: dataset<-read.csv(file.choose())
#Method 2 : setwd('C:/Users/Praahas/Projects/R-Lab/Demographic-Data')
#dataset<-read.csv('P2-Demographic-Data.csv')
dataset = read.csv('C:/Users/Praahas/Projects/R-Lab/Demographic-Data/P2-Demographic-Data.csv')
#View the Dataset after reading it
View(dataset)

#Scatterplot illustrating Birth Rate and Internet Usage statistics by Country categorizing by Countries’ Income Groups.

qplot(data=dataset,x=Internet.users,y=Birth.rate,size=I(4),colour=Income.Group)

Fig.1 – Birth Rate vs Internet Users categorized based on Income Group


Observation: Fig.1 shows that Birth Rate is lower in the High Income and Upper Middle Income groups and higher in the
Low Income group. Birth Rate also correlates with the percentage of Internet users. In the High Income group, where
Internet users mostly exceed 50%, Birth Rate decreases as Internet usage increases. In the Low Income group, where
Internet users are mostly below 25%, Birth Rate increases as Internet usage decreases.
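The trend described in this observation can also be checked numerically. The following is a minimal sketch, assuming the column names Internet.users, Birth.rate and Income.Group used in the qplot() call. The data frame toy and all of its values are invented for illustration and are not taken from the CSV file.

```r
# Toy stand-in for the real dataset (values invented for illustration only)
toy <- data.frame(
  Income.Group   = c("High income", "High income", "High income",
                     "Low income", "Low income", "Low income"),
  Internet.users = c(70, 80, 90, 5, 10, 20),
  Birth.rate     = c(12, 11, 10, 45, 40, 35)
)

# Overall Pearson correlation between internet usage and birth rate
overall_cor <- cor(toy$Internet.users, toy$Birth.rate)

# Mean birth rate per income group
mean_birth <- aggregate(Birth.rate ~ Income.Group, data = toy, FUN = mean)

print(overall_cor)
print(mean_birth)
```

On the real data, toy would be replaced by dataset. A negative correlation would be consistent with the observation, although correlation alone does not show that one variable causes the other.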

#Scatterplot illustrating Birth Rate and Internet Usage statistics by Country categorizing by Countries’ Regions. Additional data required is present in Vectors

#Building DataFrame from Vectors and setting column names to Country, Code and Region
df<-data.frame(Country=Countries_2012_Dataset,Code=Codes_2012_Dataset,Region=Regions_2012_Dataset)

#Merge the DataFrame df with original dataset and remove any columns with repeated information
merged_df<-merge(dataset,df,by.x="Country.Code",by.y="Code")
merged_df$Country<-NULL

#Setting Transparency to 50% so that overlapping datapoints are visible. shape=I(17) draws filled triangles
qplot(data=merged_df,x=Internet.users,y=Birth.rate,colour=Region,size=I(5),shape=I(17),alpha=I(0.5),main="Birth Rate vs Internet Users")

Fig.2 – Birth Rate vs Internet Users categorized based on Region
Observation: Fig.2 shows that Birth Rate is lower in the Europe region, where the percentage of Internet users is greater
than 50%. In the Africa region, the percentage of Internet users is less than 25% and the Birth Rate is significantly higher
than in Europe. The same trend is observed in Asia and the Middle East: Birth Rate is higher where the percentage of
Internet users is lower, and vice versa.
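By default, merge() keeps only the rows whose key appears in both data frames, so a country whose code is missing from either side silently drops out of the plot. The sketch below shows one way to list such codes before plotting. The data frames csv_part and vec_part are hypothetical stand-ins with invented values; on the real data they would correspond to dataset and df.

```r
# Toy stand-ins (invented for illustration): 'csv_part' mimics the CSV data,
# 'vec_part' mimics the data frame built from the vectors
csv_part <- data.frame(Country.Code = c("AAA", "BBB", "CCC"),
                       Birth.rate   = c(10, 20, 30))
vec_part <- data.frame(Code   = c("AAA", "BBB", "DDD"),
                       Region = c("Europe", "Asia", "Africa"))

# Inner join, as in the solution: only codes present on both sides survive
merged_toy <- merge(csv_part, vec_part, by.x = "Country.Code", by.y = "Code")

# Codes that were dropped from each side
missing_in_vectors <- setdiff(csv_part$Country.Code, vec_part$Code)
missing_in_csv     <- setdiff(vec_part$Code, csv_part$Country.Code)

print(nrow(merged_toy))
print(missing_in_vectors)
print(missing_in_csv)
```

If either list is non-empty on the real data, the number of plotted countries will be smaller than the number of rows in the CSV file, which is worth noting alongside the observations.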

# Scatterplot depicting Life Expectancy and Fertility Rate statistics by Country categorizing by Countries’ Regions.

dataset = read.csv('C:/Users/Praahas/Projects/R-Lab/Demographic-Data/P2-Demographic-Data_2.csv')
#View the Dataset after reading it
View(dataset)
# Splitting the data frame into two: data1960 and data2013.
data1960<-dataset[dataset$Year==1960,]
data2013<-dataset[dataset$Year==2013,]
#Build DataFrame from Data Vectors
add1960<-data.frame(Code=Country_Code,Life.Exp=Life_Expectancy_At_Birth_1960)
add2013<-data.frame(Code=Country_Code,Life.Exp=Life_Expectancy_At_Birth_2013)
#The following lines can be used to view the datasets in tabular form. Take a screenshot of the tables for your record instead of
#writing out demographic_data.r

#View(head(add1960))
#View(head(add2013))

#Merge the DataFrames & remove repeated columns


merged1960<-merge(data1960,add1960,by.x="Country.Code",by.y="Code")
merged2013<-merge(data2013,add2013,by.x="Country.Code",by.y="Code")
merged1960$Year<-NULL
merged2013$Year<-NULL
qplot(data=merged1960,x=Fertility.Rate,y=Life.Exp,colour=Region,size=I(5),shape=I(17),alpha=I(0.5),main="Fertility Rate vs Life Expectancy (1960)")
qplot(data=merged2013,x=Fertility.Rate,y=Life.Exp,colour=Region,size=I(5),shape=I(17),alpha=I(0.5),main="Fertility Rate vs Life Expectancy (2013)")

Fig.3 – Life Expectancy vs Fertility Rate (1960) categorized based on Region

Fig.4 – Life Expectancy vs Fertility Rate (2013) categorized based on Region
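qplot() has been deprecated since ggplot2 3.4.0, so recent versions print a deprecation warning when it is called. The same kind of plot can be built with ggplot() and geom_point(). This is a sketch only, assuming the column names Fertility.Rate, Life.Exp and Region used in the qplot() calls. The data frame toy is invented for illustration.

```r
library(ggplot2)

# Toy stand-in for merged1960 / merged2013 (values invented for illustration)
toy <- data.frame(
  Fertility.Rate = c(6.5, 2.1, 3.4),
  Life.Exp       = c(45, 72, 60),
  Region         = c("Africa", "Europe", "Asia")
)

# Same aesthetics as the qplot() calls: colour by region, fixed size,
# triangle shape and 50% transparency
p <- ggplot(toy, aes(x = Fertility.Rate, y = Life.Exp, colour = Region)) +
  geom_point(size = 5, shape = 17, alpha = 0.5) +
  ggtitle("Fertility Rate vs Life Expectancy (toy data)")

print(p)
```

Replacing toy with merged1960 or merged2013 and adjusting the title should produce plots comparable to Fig.3 and Fig.4.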

Note down your observations for Life Expectancy vs. Fertility Rate and how 2013 compares with 1960.
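One possible starting point for the comparison is to average both variables per region for each year and look at the change. The sketch below assumes the columns Region, Fertility.Rate and Life.Exp of merged1960 and merged2013. The data frames toy1960 and toy2013 and their values are invented for illustration.

```r
# Toy stand-ins for merged1960 and merged2013 (values invented for illustration)
toy1960 <- data.frame(Region = c("Africa", "Africa", "Europe", "Europe"),
                      Fertility.Rate = c(7, 6, 3, 2),
                      Life.Exp = c(40, 42, 68, 70))
toy2013 <- data.frame(Region = c("Africa", "Africa", "Europe", "Europe"),
                      Fertility.Rate = c(5, 4, 2, 1),
                      Life.Exp = c(58, 60, 79, 81))

# Mean fertility rate and life expectancy per region, for each year
avg1960 <- aggregate(cbind(Fertility.Rate, Life.Exp) ~ Region, data = toy1960, FUN = mean)
avg2013 <- aggregate(cbind(Fertility.Rate, Life.Exp) ~ Region, data = toy2013, FUN = mean)

# Put both years side by side and compute the change per region
comparison <- merge(avg1960, avg2013, by = "Region", suffixes = c(".1960", ".2013"))
comparison$Life.Exp.change  <- comparison$Life.Exp.2013 - comparison$Life.Exp.1960
comparison$Fertility.change <- comparison$Fertility.Rate.2013 - comparison$Fertility.Rate.1960

print(comparison)
```

On the real data, a positive Life.Exp.change together with a negative Fertility.change for a region would indicate longer lives and fewer births per woman in 2013 than in 1960.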
