03 - Demographic Data Analysis
1. Produce a scatterplot illustrating Birth Rate and Internet Usage statistics by Country. The scatterplot must also be categorized by Countries’ Income Groups.
2. Produce a second scatterplot, also illustrating Birth Rate and Internet Usage statistics by Country, categorized by Countries’ Regions. The additional data required is provided in R vectors.
3. Produce a scatterplot depicting Life Expectancy (y-axis) and Fertility Rate (x-axis) statistics by Country. The scatterplot must also be categorized by Countries’ Regions.
4. You are supplied with data for two years, 1960 and 2013, and are required to produce a visualization for each of these years. Some data has been provided in a CSV file and some in R vectors. The CSV file contains combined data for both years.
5. All data manipulation must be done in R, as the project may be audited at a later stage.
6. You have also been requested to provide insights into how the two periods compare.
Hint #1: After importing the CSV file into R, the first step is to split the data frame into two, data1960 and data2013, using data frame filtering.
Solution:
Datasets are provided as .CSV files. Additional Data is given as vectors in demographic_data.r
demographic_data.r
rm(list=ls())
Codes_2012_Dataset<-
c("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","B
GR","BHR","BHS","BIH","BLR","BLZ","BMU","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL","
CHN","CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYM","CYP","CZE","DEU","DJI","DNK","DOM","DZA"
,"ECU","EGY","ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB","
GNQ","GRC","GRD","GRL","GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ",
"ISL","ISR","ITA","JAM","JOR","JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY","L
CA","LIE","LKA","LSO","LTU","LUX","LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MM
R","MNE","MNG","MOZ","MRT","MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL"
,"OMN","PAK","PAN","PER","PHL","PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN",
"SEN","SGP","SLB","SLE","SLV","SOM","SRB","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYC","SYR","TCD","
TGO","THA","TJK","TKM","TLS","TON","TTO","TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN",
"VIR","VNM","VUT","PSE","WSM","YEM","ZAF","COD","ZMB","ZWE")
Country_Code <-
c("ABW","AFG","AGO","ALB","ARE","ARG","ARM","ATG","AUS","AUT","AZE","BDI","BEL","BEN","BFA","BGD","B
GR","BHR","BHS","BIH","BLR","BLZ","BOL","BRA","BRB","BRN","BTN","BWA","CAF","CAN","CHE","CHL","CHN","
CIV","CMR","COG","COL","COM","CPV","CRI","CUB","CYP","CZE","DEU","DJI","DNK","DOM","DZA","ECU","EGY",
"ERI","ESP","EST","ETH","FIN","FJI","FRA","FSM","GAB","GBR","GEO","GHA","GIN","GMB","GNB","GNQ","GRC","G
RD","GTM","GUM","GUY","HKG","HND","HRV","HTI","HUN","IDN","IND","IRL","IRN","IRQ","ISL","ITA","JAM","JO
R","JPN","KAZ","KEN","KGZ","KHM","KIR","KOR","KWT","LAO","LBN","LBR","LBY","LCA","LKA","LSO","LTU","L
UX","LVA","MAC","MAR","MDA","MDG","MDV","MEX","MKD","MLI","MLT","MMR","MNE","MNG","MOZ","MRT",
"MUS","MWI","MYS","NAM","NCL","NER","NGA","NIC","NLD","NOR","NPL","NZL","OMN","PAK","PAN","PER","PH
L","PNG","POL","PRI","PRT","PRY","PYF","QAT","ROU","RUS","RWA","SAU","SDN","SEN","SGP","SLB","SLE","SLV"
,"SOM","SSD","STP","SUR","SVK","SVN","SWE","SWZ","SYR","TCD","TGO","THA","TJK","TKM","TLS","TON","TTO"
,"TUN","TUR","TZA","UGA","UKR","URY","USA","UZB","VCT","VEN","VIR","VNM","VUT","WSM","YEM","ZAF","C
OD","ZMB","ZWE")
Life_Expectancy_At_Birth_1960 <-
c(65.5693658536586,32.328512195122,32.9848292682927,62.2543658536585,52.2432195121951,65.2155365853659,65.8634
634146342,61.7827317073171,70.8170731707317,68.5856097560976,60.836243902439,41.2360487804878,69.7019512195122
,37.2782682926829,34.4779024390244,45.8293170731707,69.2475609756098,52.0893658536585,62.7290487804878,60.27621
95121951,67.7080975609756,59.9613658536585,42.1183170731707,54.2054634146342,60.7380487804878,62.5003658536585
,32.3593658536585,50.5477317073171,36.4826341463415,71.1331707317073,71.3134146341463,57.4582926829268,43.46580
48780488,36.8724146341463,41.523756097561,48.5816341463415,56.716756097561,41.4424390243903,48.8564146341463,6
0.5761951219512,63.9046585365854,69.5939268292683,70.3487804878049,69.3129512195122,44.0212682926829,72.176585
3658537,51.8452682926829,46.1351219512195,53.215,48.0137073170732,37.3629024390244,69.1092682926829,67.90597560
97561,38.4057073170732,68.819756097561,55.9584878048781,69.8682926829268,57.5865853658537,39.5701219512195,71.1
268292682927,63.4318536585366,45.8314634146342,34.8863902439024,32.0422195121951,37.8404390243902,36.733048780
4878,68.1639024390244,59.8159268292683,45.5316341463415,61.2263414634146,60.2787317073171,66.9997073170732,46.2
883170731707,64.6086585365854,42.1000975609756,68.0031707317073,48.6403170731707,41.1719512195122,69.691756097
561,44.945512195122,48.0306829268293,73.4286585365854,69.1239024390244,64.1918292682927,52.6852682926829,67.666
0975609756,58.3675853658537,46.3624146341463,56.1280731707317,41.2320243902439,49.2159756097561,53.00131707317
07,60.3479512195122,43.2044634146342,63.2801219512195,34.7831707317073,42.6411951219512,57.303756097561,59.7471
463414634,46.5107073170732,69.8473170731707,68.4463902439024,69.7868292682927,64.6609268292683,48.446634146341
5,61.8127804878049,39.9746829268293,37.2686341463415,57.0656341463415,60.6228048780488,28.2116097560976,67.6017
804878049,42.7363902439024,63.7056097560976,48.3688048780488,35.0037073170732,43.4830975609756,58.745219512195
1,37.7736341463415,59.4753414634146,46.8803902439024,58.6390243902439,35.5150487804878,37.1829512195122,46.9988
292682927,73.3926829268293,73.549756097561,35.1708292682927,71.2365853658537,42.6670731707317,45.2904634146342
,60.8817073170732,47.6915853658537,57.8119268292683,38.462243902439,67.6804878048781,68.7196097560976,62.808926
8292683,63.7937073170732,56.3570487804878,61.2060731707317,65.6424390243903,66.0552926829268,42.2492926829268,
45.6662682926829,48.1876341463415,38.206,65.6598292682927,49.3817073170732,30.3315365853659,49.9479268292683,36
.9658780487805,31.6767073170732,50.4513658536585,59.6801219512195,69.9759268292683,68.9780487804878,73.0056097
560976,44.2337804878049,52.768243902439,38.0161219512195,40.2728292682927,54.6993170731707,56.1535365853659,54.
4586829268293,33.7271219512195,61.3645365853659,62.6575853658537,42.009756097561,45.3844146341463,43.653878048
7805,43.9835609756098,68.2995365853659,67.8963902439025,69.7707317073171,58.8855365853659,57.7238780487805,59.2
851219512195,63.7302195121951,59.0670243902439,46.4874878048781,49.969512195122,34.3638048780488,49.0362926829
268,41.0180487804878,45.1098048780488,51.5424634146342)
Life_Expectancy_At_Birth_2013 <-
c(75.3286585365854,60.0282682926829,51.8661707317073,77.537243902439,77.1956341463415,75.9860975609756,74.5613
658536585,75.7786585365854,82.1975609756098,80.890243902439,70.6931463414634,56.2516097560976,80.3853658536585
,59.3120243902439,58.2406341463415,71.245243902439,74.4658536585366,76.5459512195122,75.0735365853659,76.276926
8292683,72.4707317073171,69.9820487804878,67.9134390243903,74.1224390243903,75.3339512195122,78.5466585365854,
69.1029268292683,64.3608048780488,49.8798780487805,81.4011219512195,82.7487804878049,81.1979268292683,75.35302
43902439,51.2084634146342,55.0418048780488,61.6663902439024,73.8097317073171,62.9321707317073,72.9723658536585
,79.2252195121951,79.2563902439025,79.9497804878049,78.2780487804878,81.0439024390244,61.6864634146342,80.30243
90243903,73.3199024390244,74.5689512195122,75.648512195122,70.9257804878049,63.1778780487805,82.4268292682927,
76.4243902439025,63.4421951219512,80.8317073170732,69.9179268292683,81.9682926829268,68.9733902439024,63.84358
53658537,80.9560975609756,74.079512195122,61.1420731707317,58.216487804878,59.9992682926829,54.8384146341464,5
7.2908292682927,80.6341463414634,73.1935609756098,71.4863902439024,78.872512195122,66.3100243902439,83.8317073
170732,72.9428536585366,77.1268292682927,62.4011463414634,75.2682926829268,68.7046097560976,67.6604146341463,8
1.0439024390244,75.1259756097561,69.4716829268293,83.1170731707317,82.290243902439,73.4689268292683,73.9014146
341463,83.3319512195122,70.45,60.9537804878049,70.2024390243902,67.7720487804878,65.7665853658537,81.4597560975
61,74.462756097561,65.687243902439,80.1288780487805,60.5203902439024,71.6576829268293,74.9127073170732,74.24029
26829268,49.3314634146342,74.1634146341464,81.7975609756098,73.9804878048781,80.3391463414634,73.7090487804878
,68.811512195122,64.6739024390244,76.6026097560976,76.5326585365854,75.1870487804878,57.5351951219512,80.746341
4634146,65.6540975609756,74.7583658536585,69.0618048780488,54.641512195122,62.8027073170732,74.46,61.466,74.5675
12195122,64.3438780487805,77.1219512195122,60.8281463414634,52.4421463414634,74.514756097561,81.1048780487805,
81.4512195121951,69.222,81.4073170731707,76.8410487804878,65.9636829268293,77.4192195121951,74.2838536585366,68
.1315609756097,62.4491707317073,76.8487804878049,78.7111951219512,80.3731707317073,72.7991707317073,76.3340731
707317,78.4184878048781,74.4634146341463,71.0731707317073,63.3948292682927,74.1776341463415,63.1670487804878,6
5.878756097561,82.3463414634146,67.7189268292683,50.3631219512195,72.4981463414634,55.0230243902439,55.2209024
390244,66.259512195122,70.99,76.2609756097561,80.2780487804878,81.7048780487805,48.9379268292683,74.71578048780
49,51.1914878048781,59.1323658536585,74.2469268292683,69.4001707317073,65.4565609756098,67.5223658536585,72.640
3414634147,70.3052926829268,73.6463414634147,75.1759512195122,64.2918292682927,57.7676829268293,71.15951219512
2,76.8361951219512,78.8414634146341,68.2275853658537,72.8108780487805,74.0744146341464,79.6243902439024,75.7564
87804878,71.669243902439,73.2503902439024,63.583512195122,56.7365853658537,58.2719268292683,59.2373658536585,5
5.633)
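Before these vectors are combined into data frames, it can help to confirm that they line up: data.frame() either stops with an error or silently recycles the shorter vector when column lengths differ. A minimal sketch of such a check; the short vectors below are made-up stand-ins for Country_Code and the two life expectancy vectors so that the snippet runs on its own:

```r
# Made-up stand-ins for Country_Code and the two life expectancy vectors
codes    <- c("AAA", "BBB", "CCC")
life1960 <- c(65.6, 32.3, 33.0)
life2013 <- c(75.3, 60.0, 51.9)

# All three vectors must have exactly one entry per country
lengths_ok <- length(codes) == length(life1960) &&
  length(codes) == length(life2013)

# Country codes should also be unique, otherwise a later merge duplicates rows
codes_unique <- !any(duplicated(codes))

# Stops with an error if either condition is FALSE
stopifnot(lengths_ok, codes_unique)
```

With the real data, the same two conditions would be evaluated on Country_Code, Life_Expectancy_At_Birth_1960 and Life_Expectancy_At_Birth_2013 after sourcing demographic_data.r.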
demographic_data_analysis.r
library(ggplot2)
#Use any one of the methods to read the CSV file
#Method 1.: dataset<-read.csv(file.choose())
#Method 2 : setwd('C:/Users/Praahas/Projects/R-Lab/Demographic-Data')
#dataset<-read.csv('P2-Demographic-Data.csv')
dataset = read.csv('C:/Users/Praahas/Projects/R-Lab/Demographic-Data/P2-Demographic-Data.csv')
#View the Dataset after reading it
View(dataset)
#Scatterplot illustrating Birth Rate and Internet Usage statistics by Country categorizing by Countries’ Income Groups.
qplot(data=dataset,x=Internet.users,y=Birth.rate,size=I(4),colour=Income.Group)
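qplot() is deprecated as of ggplot2 3.4.0, so the same plot may also be written with ggplot(). This is a sketch, assuming the columns are named Internet.users, Birth.rate and Income.Group as in the qplot() call above; the four rows of data are invented purely so the example runs without the CSV file:

```r
library(ggplot2)

# Invented rows standing in for P2-Demographic-Data.csv
toy <- data.frame(
  Internet.users = c(5.0, 20.0, 55.0, 85.0),
  Birth.rate     = c(40.1, 28.4, 15.2, 10.3),
  Income.Group   = c("Low income", "Lower middle income",
                     "Upper middle income", "High income")
)

# Same mapping as the qplot() call: x, y, and colour by income group
p <- ggplot(toy, aes(x = Internet.users, y = Birth.rate,
                     colour = Income.Group)) +
  geom_point(size = 4)

# In a script the plot must be printed explicitly to be drawn
print(p)
```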
#Scatterplot illustrating Birth Rate and Internet Usage statistics by Country categorizing by Countries’ Regions. Additional data required is present in vectors.
#Build a data frame from the vectors with columns Country, Code and Region
df<-data.frame(Country=Countries_2012_Dataset,Code=Codes_2012_Dataset,Region=Regions_2012_Dataset)
#Merge the data frame df with the original dataset and remove the column with repeated information
merged_df<-merge(dataset,df,by.x="Country.Code",by.y="Code")
merged_df$Country<-NULL
#Set transparency to 50% (alpha=0.5) so that overlapping data points are visible; shape=I(17) plots filled triangles
qplot(data=merged_df,x=Internet.users,y=Birth.rate,colour=Region,size=I(5),shape=I(17),alpha=I(0.5),main="Birth Rate vs Internet Users")
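By default, merge() keeps only rows whose key appears in both data frames, so any country code missing from one side is dropped without a warning. A small sketch of how this might be checked; the two data frames here are invented stand-ins for dataset and df:

```r
# Invented stand-ins for the CSV data and the vector-based data frame
dataset_toy <- data.frame(Country.Code = c("AAA", "BBB", "CCC"),
                          Birth.rate   = c(10.2, 35.3, 20.1))
df_toy <- data.frame(Code   = c("AAA", "BBB", "DDD"),
                     Region = c("Europe", "Africa", "Asia"))

# Codes present on one side only; these rows vanish in the default merge
only_in_dataset <- setdiff(dataset_toy$Country.Code, df_toy$Code)
only_in_df      <- setdiff(df_toy$Code, dataset_toy$Country.Code)

merged_toy <- merge(dataset_toy, df_toy,
                    by.x = "Country.Code", by.y = "Code")

print(only_in_dataset)   # "CCC"
print(only_in_df)        # "DDD"
print(nrow(merged_toy))  # 2
```

If unmatched rows should be kept, merge() accepts all.x=TRUE or all.y=TRUE; the missing values are then filled with NA.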
Fig.2 – Birth Rate vs Internet Users categorized based on Region
Observation: Fig.2 shows that Birth Rate is lower in the Europe Region, where the percentage of Internet Users is greater than 50%. In the Africa Region the percentage of Internet Users is less than 25% and the Birth Rate is significantly higher than in Europe. The same trend is observed in Asia and the Middle East: the birth rate is higher where the percentage of internet users is lower, and vice versa.
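The visual impression of an inverse relationship can be backed with a number. A sketch using cor() on invented values; with the real data the call would be made on the Internet.users and Birth.rate columns of merged_df, and the result would differ:

```r
# Invented values; replace with merged_df$Internet.users and merged_df$Birth.rate
internet_users <- c(5, 15, 30, 50, 70, 90)
birth_rate     <- c(42, 36, 27, 18, 13, 10)

# Pearson correlation; a value near -1 indicates a strong inverse relationship
r <- cor(internet_users, birth_rate)

print(r < 0)  # TRUE for these invented values
```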
# Scatterplot depicting Life Expectancy and Fertility Rate statistics by Country categorizing by Countries’ Regions.
dataset = read.csv('C:/Users/Praahas/Projects/R-Lab/Demographic-Data/P2-Demographic-Data_2.csv')
#View the Dataset after reading it
View(dataset)
# Splitting the data frame into two: data1960 and data2013.
data1960<-dataset[dataset$Year==1960,]
data2013<-dataset[dataset$Year==2013,]
#Build DataFrame from Data Vectors
add1960<-data.frame(Code=Country_Code,Life.Exp=Life_Expectancy_At_Birth_1960)
add2013<-data.frame(Code=Country_Code,Life.Exp=Life_Expectancy_At_Birth_2013)
#The following lines can be used to view the data in tabular form. Take a screenshot of the tables for your record instead of writing out demographic_data.r
#View(head(add1960))
#View(head(add2013))
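The listing stops before the vector-based data frames are merged with the yearly subsets and plotted. Below is a sketch of how those remaining steps might look for 1960. The column names Country.Code, Region and Fertility.Rate are assumptions about the CSV file, not confirmed by the listing, and the rows are invented so that the example is self-contained:

```r
library(ggplot2)

# Invented stand-ins for data1960 and add1960
data1960_toy <- data.frame(
  Country.Code   = c("AAA", "BBB", "CCC"),
  Region         = c("Europe", "Africa", "Asia"),
  Fertility.Rate = c(2.5, 6.8, 5.9)
)
add1960_toy <- data.frame(Code     = c("AAA", "BBB", "CCC"),
                          Life.Exp = c(69.1, 37.3, 45.8))

# Attach life expectancy to each country by its code
merged1960_toy <- merge(data1960_toy, add1960_toy,
                        by.x = "Country.Code", by.y = "Code")

# Fertility Rate on the x-axis, Life Expectancy on the y-axis, coloured by Region
p1960 <- ggplot(merged1960_toy,
                aes(x = Fertility.Rate, y = Life.Exp, colour = Region)) +
  geom_point(size = 5, alpha = 0.5) +
  ggtitle("Life Expectancy vs Fertility Rate (1960)")

print(p1960)
```

The 2013 plot would follow the same pattern with data2013 and add2013.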
Fig.3 – Life Expectancy vs Fertility Rate (1960) categorized based on Region
Note down your observations for Life Expectancy vs. Fertility Rate and how 2013 compares with 1960.
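A per-region summary table can support the comparison of the two periods. The following is only a sketch: the data frame both and its values are invented, and the column names Fertility.Rate and Life.Exp are assumed to match the merged data for each year:

```r
# Invented merged rows covering both years
both <- data.frame(
  Year           = c(1960, 1960, 2013, 2013),
  Region         = c("Europe", "Africa", "Europe", "Africa"),
  Fertility.Rate = c(2.6, 6.6, 1.6, 4.8),
  Life.Exp       = c(69.0, 40.0, 80.0, 60.0)
)

# Mean fertility rate and life expectancy per region and year
summary_tbl <- aggregate(cbind(Fertility.Rate, Life.Exp) ~ Region + Year,
                         data = both, FUN = mean)

print(summary_tbl)
```

Reading the rows for the same region side by side shows how far each region moved between the two years.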