Case Study Final - Group 05 - Lokesh R M
Submitted by:
LOKESH R M BU22CSEN0100145
INDLA ABHISHEK BU22CSEN0101761
VISHNUPRIYA T BU22CSEN0100996
DERANGULA DIVYA SANDHYA BU22CSEN0101184
YAKKANTI PAVAN BU22CSEN0101230
5-6-2023
Analyzing the COVID-19 Pandemic with Pandas and Matplotlib: A Case Study
Introduction:
The COVID-19 pandemic has affected the entire world in
unprecedented ways. With the continuous spread of the virus,
understanding its impact and tracking its growth has become
crucial. One way to achieve this is through data analysis, which
can help identify patterns and trends that aid in predicting the
spread of the virus and making informed decisions.
This case study explores a COVID-19 dataset from Kaggle with one
row per country or region (187 in total). Each row holds the
cumulative confirmed cases, deaths, recoveries and active cases,
together with the latest daily and weekly changes. We will use the
Python data analysis library Pandas to clean, manipulate, and
analyze the dataset. Additionally, we will use the data
visualization library Matplotlib to create visualizations that
help us interpret and communicate our findings effectively. By
the end of this case study, you will have a deeper understanding
of how to use Pandas and Matplotlib to analyze and visualize
COVID-19 data.
Pandas:
Pandas is a Python library for working with datasets. It has
functions for analyzing, cleaning, exploring, and manipulating
data. The name "Pandas" refers to both "Panel Data" and
"Python Data Analysis"; the library was created by Wes McKinney in
2008.
To read the data from the CSV file, we will use the read_csv()
function. The read_csv() function is part of the Pandas library,
and will read the CSV file into a DataFrame.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df)
Country/Region Confirmed Deaths Recovered Active New cases
0 Afghanistan 36263 1269 25198 9796 106 \
1 Albania 4880 144 2745 1991 117
2 Algeria 27973 1163 18837 7973 616
3 Andorra 907 52 803 52 10
4 Angola 950 41 242 667 18
.. ... ... ... ... ... ...
182 West Bank and Gaza 10621 78 3752 6791 152
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192
New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
0 10 18 3.50 69.49 \
1 6 63 2.95 56.25
2 8 749 4.16 67.34
3 0 0 5.73 88.53
4 1 0 4.32 25.47
.. ... ... ... ...
182 2 0 0.73 35.33
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04
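Because pd.read_csv() already returns a DataFrame, the extra df=pd.DataFrame(data) line in these cells is not required. The minimal sketch below shows this without the original file: it assumes a small in-memory CSV, read through io.StringIO, holding five rows and six columns copied from the output above in place of 'COVID Dataset.csv'.

```python
import io

import pandas as pd

# Five rows copied from the output above, standing in for 'COVID Dataset.csv'
csv_text = """Country/Region,Confirmed,Deaths,Recovered,Active,WHO Region
Afghanistan,36263,1269,25198,9796,Eastern Mediterranean
Albania,4880,144,2745,1991,Europe
Algeria,27973,1163,18837,7973,Africa
Andorra,907,52,803,52,Europe
Angola,950,41,242,667,Africa
"""

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
df = pd.read_csv(io.StringIO(csv_text))
print(type(df).__name__)
print(df.shape)
```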
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 187 non-null object
1 Confirmed 187 non-null int64
2 Deaths 187 non-null int64
3 Recovered 187 non-null int64
4 Active 187 non-null int64
5 New cases 187 non-null int64
6 New deaths 187 non-null int64
7 New recovered 187 non-null int64
8 Deaths / 100 Cases 187 non-null float64
9 Recovered / 100 Cases 187 non-null float64
10 Deaths / 100 Recovered 187 non-null float64
11 Confirmed last week 187 non-null int64
12 1 week change 187 non-null int64
13 1 week % increase 187 non-null float64
14 WHO Region 187 non-null object
dtypes: float64(4), int64(9), object(2)
memory usage: 22.0+ KB
Using the shape attribute, we can find the number of rows and
columns in the dataset.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The shape of the given data is:")
print(df.shape)
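The shape attribute returns a (rows, columns) tuple, while size returns the total number of cells, rows × columns. For the full dataset, df.info() above reports 187 entries and 15 columns, so df.shape would be (187, 15) and df.size 2805. A small sketch on three rows taken from the output above:

```python
import pandas as pd

# A small frame built from three rows shown earlier in this case study
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "Deaths": [1269, 144, 1163],
})

rows, cols = df.shape  # shape is a (rows, columns) tuple
print(rows, cols)
print(df.size)         # size is rows * columns, the total number of cells
```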
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The Columns index attributes are:")
print(df.columns)
The Columns index attributes are:
Index(['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases',
'Recovered / 100 Cases', 'Deaths / 100 Recovered',
'Confirmed last week', '1 week change', '1 week % increase',
'WHO Region'],
dtype='object')
Using the head() function, we can get the first five rows of the
dataset.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The first 5 rows of the given data are:")
print(df.head())
The first 5 rows of the given data are:
Country/Region Confirmed Deaths Recovered Active New cases New deaths
0 Afghanistan 36263 1269 25198 9796 106 10 \
1 Albania 4880 144 2745 1991 117 6
2 Algeria 27973 1163 18837 7973 616 8
3 Andorra 907 52 803 52 10 0
4 Angola 950 41 242 667 18 1
Using the tail() function, we can get the last five rows of the
dataset.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The last 5 rows of the given data are:")
print(df.tail())
The last 5 rows of the given data are:
Country/Region Confirmed Deaths Recovered Active New cases
182 West Bank and Gaza 10621 78 3752 6791 152 \
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192
New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
182 2 0 0.73 35.33 \
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The 5 random rows of the given data are:")
print(df.sample(5))
The 5 random rows of the given data are:
Country/Region Confirmed Deaths Recovered Active New cases
95 Latvia 1219 31 1045 143 0 \
0 Afghanistan 36263 1269 25198 9796 106
105 Malaysia 8904 124 8601 179 7
116 Morocco 20887 316 16553 4018 609
163 Syria 674 40 0 634 24
New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
95 0 0 2.54 85.73 \
0 10 18 3.50 69.49
105 0 1 1.39 96.60
116 3 115 1.51 79.25
163 2 0 5.93 0.00
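df.sample() draws different rows on each run, so the five rows shown above will not match a re-run. Passing random_state makes the draw reproducible (the seed 42 below is an arbitrary choice), and head() and tail() likewise accept a row count other than the default 5. A sketch on five rows copied from the first output:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
})

# sample() picks different rows on each call unless random_state is fixed
first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)
print(first.equals(second))  # same seed, same rows

# head(n) and tail(n) accept a row count too (default 5)
print(df.head(2))
```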
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
numeric_cols = df.select_dtypes(include='number')
col_min = numeric_cols.min()
col_max = numeric_cols.max()
col_mean = numeric_cols.mean()
print("The Minimum values of the numeric columns are:")
print(col_min)
print("---------------------------------------------------------------")
print("The Maximum values of the numeric columns are:")
print(col_max)
print("---------------------------------------------------------------")
print("The Mean values of the numeric columns are:")
print(col_mean)
print("---------------------------------------------------------------")
print("The Sum values of the numeric columns are:")
print(numeric_cols.sum())
print("---------------------------------------------------------------")
print("The Average values of the numeric columns are:")
print(numeric_cols.mean())
print("---------------------------------------------------------------")
print("The Count values of the numeric columns are:")
print(numeric_cols.count())
print("---------------------------------------------------------------")
The Minimum values of the numeric columns are:
Confirmed 10.00
Deaths 0.00
Recovered 0.00
Active 0.00
New cases 0.00
New deaths 0.00
New recovered 0.00
Deaths / 100 Cases 0.00
Recovered / 100 Cases 0.00
Deaths / 100 Recovered 0.00
Confirmed last week 10.00
1 week change -47.00
1 week % increase -3.84
dtype: float64
---------------------------------------------------------------
The Maximum values of the numeric columns are:
Confirmed 4290259.00
Deaths 148011.00
Recovered 1846641.00
Active 2816444.00
New cases 56336.00
New deaths 1076.00
New recovered 33728.00
Deaths / 100 Cases 28.56
Recovered / 100 Cases 100.00
Deaths / 100 Recovered inf
Confirmed last week 3834677.00
1 week change 455582.00
1 week % increase 226.32
dtype: float64
---------------------------------------------------------------
The Mean values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Sum values of the numeric columns are:
Confirmed 16480485.00
Deaths 654036.00
Recovered 9468087.00
Active 6358362.00
New cases 228693.00
New deaths 5415.00
New recovered 174623.00
Deaths / 100 Cases 564.65
Recovered / 100 Cases 12121.44
Deaths / 100 Recovered inf
Confirmed last week 14713623.00
1 week change 1766862.00
1 week % increase 2544.36
dtype: float64
---------------------------------------------------------------
The Average values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Count values of the numeric columns are:
Confirmed 187
Deaths 187
Recovered 187
Active 187
New cases 187
New deaths 187
New recovered 187
Deaths / 100 Cases 187
Recovered / 100 Cases 187
Deaths / 100 Recovered 187
Confirmed last week 187
1 week change 187
1 week % increase 187
dtype: int64
---------------------------------------------------------------
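The inf entries for 'Deaths / 100 Recovered' come from countries that report 0 recoveries (Syria, in the random sample above, is one): dividing by zero yields inf, and inf then dominates sum() and mean(). Note also that the "Average" block simply repeats mean(). A minimal sketch of the problem and one common remedy, replacing inf with NaN so pandas skips it, on two rows from this dataset; it assumes the ratio is Deaths / Recovered × 100 and recomputes it here:

```python
import numpy as np
import pandas as pd

# Two rows from this dataset; Syria reports 0 recoveries
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Syria"],
    "Deaths": [1269, 40],
    "Recovered": [25198, 0],
})

# Assumed definition of the ratio; dividing by 0 recoveries gives inf
df["Deaths / 100 Recovered"] = df["Deaths"] / df["Recovered"] * 100
print(df["Deaths / 100 Recovered"].mean())  # inf propagates into the mean

# Treat inf as missing so that aggregations skip it
clean = df["Deaths / 100 Recovered"].replace([np.inf, -np.inf], np.nan)
print(clean.mean())  # mean over the finite values only
```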
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
dup=df.duplicated()
print(dup)
print("----------------------------------")
print("The number of duplicate rows are:")
print(sum(dup))
0 False
1 False
2 False
3 False
4 False
...
182 False
183 False
184 False
185 False
186 False
Length: 187, dtype: bool
----------------------------------
The number of duplicate rows are:
0
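df.duplicated() marks every row that repeats an earlier one, and summing the boolean Series counts them. This dataset has none, so the sketch below uses a hypothetical frame with one repeated row to show what a non-zero count looks like and how drop_duplicates() removes it:

```python
import pandas as pd

# Hypothetical frame with one repeated row, to show what duplicated() flags
df = pd.DataFrame({
    "Country/Region": ["Albania", "Andorra", "Albania"],
    "Confirmed": [4880, 907, 4880],
})

dup = df.duplicated()           # True for every repeat after the first occurrence
print(dup.sum())                # number of duplicate rows
deduped = df.drop_duplicates()  # keeps the first occurrence by default
print(len(deduped))
```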
3. Basic Analysis
In the code below, we use the dtypes attribute to find the data
type of each column in the dataset.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The data types of the columns are:")
print(df.dtypes)
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The number of unique countries/regions is:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The number of unique WHO Regions is:")
print(df['WHO Region'].nunique())
3. Printing the names of the countries with their WHO Region
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df[['Country/Region','WHO Region']])
4. Adding a column "Total Cases", the sum of the "Deaths", "Active" and
"Recovered" columns, and comparing it with the "Confirmed" column.
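The code for this step is not included above. A minimal sketch of the computation, assuming only the four count columns and five rows copied from the first output; the comparison column name 'Matches Confirmed' is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
    "Active": [9796, 1991, 7973, 52, 667],
})

# Total Cases = Deaths + Active + Recovered, computed row by row
df["Total Cases"] = df["Deaths"] + df["Active"] + df["Recovered"]

# Hypothetical flag column: True where the computed total equals Confirmed
df["Matches Confirmed"] = df["Total Cases"] == df["Confirmed"]
print(df[["Country/Region", "Confirmed", "Total Cases", "Matches Confirmed"]])
print(df["Matches Confirmed"].all())
```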
4. Sorting
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.sort_values(by=['Confirmed'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The Country/Region with the lowest number of confirmed cases is:")
print(df.iloc[0,0])
Country/Region Confirmed
183 Western Sahara 10
75 Holy See 12
68 Greenland 14
140 Saint Kitts and Nevis 17
49 Dominica 18
.. ... ...
154 South Africa 452529
138 Russia 816680
79 India 1480073
23 Brazil 2442375
173 US 4290259
2. Sorting the data alphabetically by country, in ascending order.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.sort_values(['Country/Region'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
.. ... ...
182 West Bank and Gaza 10621
183 Western Sahara 10
184 Yemen 1691
185 Zambia 4552
186 Zimbabwe 2704
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.sort_values(['Country/Region','Confirmed'],ascending=[True,False])
# Country names are unique, so 'Confirmed' only breaks ties and the order stays alphabetical
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The first Country/Region in this sort order is:")
print(df.iloc[0,0])
print("---------------------------------------------------------------")
print("The last Country/Region in this sort order is:")
print(df.iloc[-1,0])
Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
.. ... ...
182 West Bank and Gaza 10621
183 Western Sahara 10
184 Yemen 1691
185 Zambia 4552
186 Zimbabwe 2704
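Because every country name is unique, sorting by 'Country/Region' first leaves the rows in alphabetical order, so the first and last rows are not the highest and lowest Confirmed counts. To find those, sort on 'Confirmed' alone or use nlargest()/idxmin(). A sketch on five rows copied from the first output (on the full data, the ascending sort earlier puts Western Sahara lowest and US highest):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
})

# Sort by the numeric column alone, highest first
by_confirmed = df.sort_values("Confirmed", ascending=False)
highest = by_confirmed.iloc[0]["Country/Region"]
lowest = by_confirmed.iloc[-1]["Country/Region"]
print(highest, lowest)

# Equivalent shortcuts: nlargest(), or idxmin() combined with .loc
print(df.nlargest(1, "Confirmed")["Country/Region"].iloc[0])
print(df.loc[df["Confirmed"].idxmin(), "Country/Region"])
```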
4. Extracting the rows for countries whose names start with 'I'
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['Country/Region'].str.startswith('I')]
print(df[['Country/Region','Deaths','Recovered']])
print("---------------------------------------------------------------")
print("The number of countries starting with 'I' is:")
print(df['Country/Region'].nunique())
Country/Region Deaths Recovered
78 Iceland 10 1823
79 India 33408 951166
80 Indonesia 4838 58173
81 Iran 15912 255144
82 Iraq 4458 77144
83 Ireland 1764 23364
84 Israel 474 27133
85 Italy 35112 198593
---------------------------------------------------------------
The number of countries starting with 'I' is:
8
In the code below, we used a boolean condition with .loc to select
the row whose 'Country/Region' is 'India'.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.loc[df['Country/Region']=='India']
print(df)
New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
79 637 33598 2.26 64.26 \
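An alternative to the boolean filter is to make the country name the index with set_index() and then look rows up by label with .loc. A sketch on three rows copied from the output for countries starting with 'I':

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Iceland", "India", "Indonesia"],
    "Deaths": [10, 33408, 4838],
    "Recovered": [1823, 951166, 58173],
})

# Make the country name the row label, then look a country up by that label
by_country = df.set_index("Country/Region")
india = by_country.loc["India"]  # a Series holding India's row
print(india["Deaths"], india["Recovered"])
```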
Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
5 Antigua and Barbuda 86
6 Argentina 167416
7 Armenia 37390
8 Australia 15303
9 Austria 20558
8. Creating a new column called 'Review' that gives feedback by comparing the
Deaths and Recovered cases.
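The code for this step is not shown in full, and the rule behind the feedback is not stated. The sketch below assumes a hypothetical rule ('Good' when Recovered exceeds Deaths, otherwise 'Bad') and applies it with numpy.where(), on three rows copied from earlier outputs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Syria", "Western Sahara"],
    "Deaths": [1269, 40, 1],
    "Recovered": [25198, 0, 8],
})

# Hypothetical rule: 'Good' when recoveries exceed deaths, otherwise 'Bad'
df["Review"] = np.where(df["Recovered"] > df["Deaths"], "Good", "Bad")
print(df)
```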
In [ ]: import pandas as pd
32 Canada
117 Mozambique
120 Netherlands
147 Serbia
161 Sweden
163 Syria
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['WHO Region'].str.endswith('s')]
print(df[['Country/Region','WHO Region']])
Country/Region WHO Region
5 Antigua and Barbuda Americas
6 Argentina Americas
11 Bahamas Americas
14 Barbados Americas
17 Belize Americas
20 Bolivia Americas
23 Brazil Americas
32 Canada Americas
35 Chile Americas
37 Colombia Americas
41 Costa Rica Americas
44 Cuba Americas
49 Dominica Americas
50 Dominican Republic Americas
51 Ecuador Americas
53 El Salvador Americas
69 Grenada Americas
70 Guatemala Americas
73 Guyana Americas
74 Haiti Americas
76 Honduras Americas
86 Jamaica Americas
111 Mexico Americas
122 Nicaragua Americas
129 Panama Americas
131 Paraguay Americas
132 Peru Americas
140 Saint Kitts and Nevis Americas
141 Saint Lucia Americas
142 Saint Vincent and the Grenadines Americas
160 Suriname Americas
170 Trinidad and Tobago Americas
173 US Americas
178 Uruguay Americas
180 Venezuela Americas
10. Grouping countries by their "WHO Region", summing the Confirmed, Deaths,
Recovered and Active cases individually for each column, and printing the data
in ascending order of "WHO Region".
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The sum of Confirmed cases in each WHO Region are:")
print(df['Confirmed'].sum())
print("---------------------------------------------------------------")
print("The sum of Deaths in each WHO Region are:")
print(df['Deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of Recovered cases in each WHO Region are:")
print(df['Recovered'].sum())
print("---------------------------------------------------------------")
print("The sum of Active cases in each WHO Region are:")
print(df['Active'].sum())
print("---------------------------------------------------------------")
print("The sum of New cases in each WHO Region are:")
print(df['New cases'].sum())
print("---------------------------------------------------------------")
print("The sum of New deaths in each WHO Region are:")
print(df['New deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of New recovered cases in each WHO Region are:")
print(df['New recovered'].sum())
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58, 62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139, 144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186],
 'Americas': [5, 6, 11, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70, 73, 74, 76, 86, 111, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180],
 'Eastern Mediterranean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128, 136, 145, 153, 159, 163, 171, 176, 182, 184],
 'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 46, 47, 56, 60, 61, 64, 65, 67, 68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 101, 102, 108, 112, 113, 115, 120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 152, 157, 161, 162, 165, 172, 175, 177, 179],
 'South-East Asia': [13, 19, 27, 79, 80, 106, 119, 158, 167, 168],
 'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105, 114, 121, 130, 133, 150, 155, 164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
---------------------------------------------------------------
The sum of Confirmed cases in each WHO Region are:
WHO Region
Africa 723207
Americas 8839286
Eastern Mediterranean 1490744
Europe 3299523
South-East Asia 1835297
Western Pacific 292428
Name: Confirmed, dtype: int64
---------------------------------------------------------------
The sum of Deaths in each WHO Region are:
WHO Region
Africa 12223
Americas 342732
Eastern Mediterranean 38339
Europe 211144
South-East Asia 41349
Western Pacific 8249
Name: Deaths, dtype: int64
---------------------------------------------------------------
The sum of Recovered cases in each WHO Region are:
WHO Region
Africa 440645
Americas 4468616
Eastern Mediterranean 1201400
Europe 1993723
South-East Asia 1156933
Western Pacific 206770
Name: Recovered, dtype: int64
---------------------------------------------------------------
The sum of Active cases in each WHO Region are:
WHO Region
Africa 270339
Americas 4027938
Eastern Mediterranean 251005
Europe 1094656
South-East Asia 637015
Western Pacific 77409
Name: Active, dtype: int64
---------------------------------------------------------------
The sum of New cases in each WHO Region are:
WHO Region
Africa 12176
Americas 129531
Eastern Mediterranean 12410
Europe 22294
South-East Asia 48993
Western Pacific 3289
Name: New cases, dtype: int64
---------------------------------------------------------------
The sum of New deaths in each WHO Region are:
WHO Region
Africa 353
Americas 3555
Eastern Mediterranean 445
Europe 304
South-East Asia 734
Western Pacific 24
Name: New deaths, dtype: int64
---------------------------------------------------------------
The sum of New recovered cases in each WHO Region are:
WHO Region
Africa 14563
Americas 94776
Eastern Mediterranean 14843
Europe 11732
South-East Asia 37582
Western Pacific 1127
Name: New recovered, dtype: int64
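The per-column sums above can also be produced by a single groupby call that returns one table. A sketch on five rows copied from the first output, with the region labels taken from the groups listing above:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
})

# One groupby call sums several columns; sort_index() orders regions alphabetically
totals = df.groupby("WHO Region")[["Confirmed", "Deaths"]].sum().sort_index()
# The per-region country count aligns on the same index
totals["Countries"] = df.groupby("WHO Region")["Country/Region"].nunique()
print(totals)
```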
Matplotlib
Matplotlib is a Python library for creating static, animated and interactive
visualizations. In this case study we use it to plot the data of the
COVID-19 dataset.
1. Plotting a Line graph to compare the Confirmed Cases in all the countries
In [ ]: import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
plt.figure(figsize=(30,25))
plt.plot(df['Country/Region'],df['Confirmed'],marker='h')
plt.xlabel('Country/Region',fontsize=25,color='b')
plt.xticks(rotation=45, ha='right', size=5)
plt.ylabel('Confirmed',fontsize=25,color='b')
plt.title('Confirmed Cases', fontsize=30, fontweight='bold', style='italic')
plt.show()
plt.figure(figsize=(80,60))
plt.bar(df['Country/Region'],df['Confirmed'],color='red',label='Confirmed')
plt.bar(df['Country/Region'],df['Recovered'],color='green',label='Recovered')
plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)
plt.yticks(size=30)
# Label each pair above the taller Confirmed bar, offset upward in data units (cases)
for i, (confirmed, recover) in enumerate(zip(df['Confirmed'], df['Recovered'])):
    plt.annotate(str(confirmed), xy=(i, confirmed+150000), va='center', ha='center',
                 fontsize=13, fontweight='bold', rotation=90)
    plt.annotate(str(recover), xy=(i, confirmed+50000), va='center', ha='center',
                 fontsize=13, fontweight='bold', rotation=90)
plt.ylabel('Confirmed',fontsize=50,color='b')
plt.title('Confirmed and Recovered Cases', fontsize=100, fontweight='bold')
plt.legend()
plt.show()
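Matplotlib 3.4 and later provide Axes.bar_label(), which writes each bar's value at its top with a padding measured in points, instead of hand-tuned offsets such as +150000 cases. A minimal sketch, assuming Matplotlib 3.4 or newer and five rows copied from the first output (the Agg backend is used only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
})

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(df["Country/Region"], df["Confirmed"], color="red", label="Confirmed")
# bar_label places each bar's height above it; padding is in points, not data units
labels = ax.bar_label(bars, padding=3, rotation=90)
ax.set_ylabel("Confirmed")
ax.legend()
plt.close(fig)
```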
3. Plotting a Horizontal bar graph of the Death and Recovered cases of each Country
In [ ]: import matplotlib.pyplot as plt
plt.figure(figsize=(75,150))
plt.barh(df['Country/Region'], df['Deaths'], color='red', label='Deaths')
plt.barh(df['Country/Region'], df['Recovered'], color='green', label='Recovered')
plt.yticks(rotation=45, ha='right', size=23)
plt.xticks(size=20)
plt.ylabel('Country/Region', fontsize=50, color='b')
plt.title('Death and Recovery Cases', fontsize=100, fontweight='bold', style='italic')
plt.xlabel('Cases', fontsize=50, color='b')
plt.legend()
# Label each pair to the right of the bars, offset in data units (cases)
for i, (death, recover) in enumerate(zip(df['Deaths'], df['Recovered'])):
    plt.annotate(str(death), xy=(death+recover+30000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')
    plt.annotate(str(recover), xy=(death+recover+90000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')
plt.show()
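In the horizontal chart above both barh() calls start at zero and the Recovered bars are drawn after the Deaths bars, so wherever recoveries exceed deaths the red bar is hidden. One alternative, sketched here on three rows from the first output, stacks the two series with the left parameter:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Deaths": [1269, 144, 1163],
    "Recovered": [25198, 2745, 18837],
})

fig, ax = plt.subplots(figsize=(8, 3))
death_bars = ax.barh(df["Country/Region"], df["Deaths"], color="red", label="Deaths")
# left= starts each Recovered bar where its Deaths bar ends, so neither hides the other
recovered_bars = ax.barh(df["Country/Region"], df["Recovered"], left=df["Deaths"],
                         color="green", label="Recovered")
ax.set_xlabel("Cases")
ax.legend()
plt.close(fig)
```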
4. Vertical stacked bar graph of countries vs. Deaths / 100 Cases, Recovered /
100 Cases and Deaths / 100 Recovered
import numpy as np
# 'Deaths / 100 Recovered' is inf wherever Recovered is 0; treat those values as missing
rates = df[['Deaths / 100 Cases', 'Recovered / 100 Cases',
            'Deaths / 100 Recovered']].replace([np.inf, -np.inf], np.nan)
plt.figure(figsize=(80,75))
plt.bar(df['Country/Region'], rates['Deaths / 100 Cases'], color='red',
        label='Deaths for every 100 Cases')
plt.bar(df['Country/Region'], rates['Recovered / 100 Cases'], color='green',
        label='Recovered for every 100 Cases', bottom=rates['Deaths / 100 Cases'])
plt.bar(df['Country/Region'], rates['Deaths / 100 Recovered'], color='blue',
        label='Deaths for every 100 Recovered',
        bottom=rates['Deaths / 100 Cases']+rates['Recovered / 100 Cases'])
plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)
# Label each stack above its top, offset upward in data units
for i, (death, recover, death_rec) in enumerate(zip(rates['Deaths / 100 Cases'],
                                                    rates['Recovered / 100 Cases'],
                                                    rates['Deaths / 100 Recovered'])):
    plt.annotate(str(death), xy=(i, death_rec+150), va='center', ha='center',
                 color='red', fontsize=15, fontweight='bold', rotation=90)
    plt.annotate(str(recover), xy=(i, death_rec+200), va='center', ha='center',
                 color='green', fontsize=15, fontweight='bold', rotation=90)
    plt.annotate(str(death_rec), xy=(i, death_rec+250), va='center', ha='center',
                 color='blue', fontsize=13, fontweight='bold', rotation=90)
plt.legend()
plt.show()
counts=df['WHO Region'].value_counts()
plt.figure(figsize=(15,10))
# value_counts() is sorted by count, so its own index supplies the matching labels
plt.pie(counts,labels=counts.index,
        autopct='%5.1f%%',shadow=True,explode=(0.0,0.0,0.0,0.0,0.0,0.2))
plt.title('WHO Region')
plt.show()
5. Subplots of WHO Region vs. Confirmed, Deaths, Recovered and Active cases, each
represented by a scatter plot
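The code for these subplots is not included above. A minimal sketch of one way to build them, assuming a 2 × 2 grid created with plt.subplots() and five rows copied from the first output:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
    "Active": [9796, 1991, 7973, 52, 667],
})

columns = ["Confirmed", "Deaths", "Recovered", "Active"]
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
# axes is a 2x2 array; .flat pairs each panel with one column in turn
for ax, column in zip(axes.flat, columns):
    ax.scatter(df["WHO Region"], df[column])  # string x values give a categorical axis
    ax.set_title(f"WHO Region vs {column}")
    ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
plt.close(fig)
```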
plt.figure(figsize=(15,10))
# hist() on a categorical column counts the countries in each WHO Region
plt.hist(df['WHO Region'], bins=6, color='red')
plt.title('Number of Countries per WHO Region')
plt.xlabel('WHO Region', fontsize=20, color='b')
plt.ylabel('Number of countries', fontsize=20, color='b')
plt.show()
7. Plotting a grouped bar chart: grouping countries by their "WHO Region", summing
the Confirmed, Deaths, Recovered and Active cases individually for each column, and
printing the data in ascending order of "WHO Region".
In [ ]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The sum of Confirmed cases in each WHO Region are:")
print(df['Confirmed'].sum())
print("---------------------------------------------------------------")
print("The sum of Deaths in each WHO Region are:")
print(df['Deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of Recovered cases in each WHO Region are:")
print(df['Recovered'].sum())
print("---------------------------------------------------------------")
print("The sum of Active cases in each WHO Region are:")
print(df['Active'].sum())
print("---------------------------------------------------------------")
print("The sum of New cases in each WHO Region are:")
print(df['New cases'].sum())
print("---------------------------------------------------------------")
print("The sum of New deaths in each WHO Region are:")
print(df['New deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of New recovered cases in each WHO Region are:")
print(df['New recovered'].sum())
plt.figure(figsize=(15,10))
Confirmed_bar=np.arange(len(df['WHO Region'].unique()))
Death_bar=[i+0.1 for i in Confirmed_bar]
Recovered_bar=[i+0.1 for i in Death_bar]
Active_bar=[i+0.1 for i in Recovered_bar]
New_cases_bar=[i+0.1 for i in Active_bar]
New_deaths_bar=[i+0.1 for i in New_cases_bar]
New_recovered_bar=[i+0.1 for i in New_deaths_bar]
print(Confirmed_bar)
print(Death_bar)
print(Recovered_bar)
print(Active_bar)
print(New_cases_bar)
print(New_deaths_bar)
print(New_recovered_bar)
plt.bar(Confirmed_bar,df['Confirmed'].sum(),color='red',width=0.1,label='Confirmed')
plt.bar(Death_bar,df['Deaths'].sum(),color='green',width=0.1,label='Deaths')
plt.bar(Recovered_bar,df['Recovered'].sum(),color='blue',width=0.1,label='Recovered')
plt.bar(Active_bar,df['Active'].sum(),color='yellow',width=0.1,label='Active')
plt.bar(New_cases_bar,df['New cases'].sum(),color='black',width=0.1,label='New cases')
plt.bar(New_deaths_bar,df['New deaths'].sum(),color='orange',width=0.1,label='New deaths')
plt.bar(New_recovered_bar,df['New recovered'].sum(),color='pink',width=0.1,label='New recovered')
# Seven bars of width 0.1 sit at offsets 0 to 0.6, so each group is centred at +0.3;
# the index of the grouped sums holds the region names in the same order as the bars
plt.xticks(Confirmed_bar+0.3,df['Confirmed'].sum().index)
plt.xlabel('WHO Region',fontsize=20,color='b')
plt.ylabel('Cases',fontsize=20,color='b')
plt.title('WHO Region vs Cases',fontsize=30,fontweight='bold',style='italic')
# Add each region's total just above its bar (3 points higher, rotated vertically)
bar_groups=[(Confirmed_bar,'Confirmed'),(Death_bar,'Deaths'),(Recovered_bar,'Recovered'),
            (Active_bar,'Active'),(New_cases_bar,'New cases'),
            (New_deaths_bar,'New deaths'),(New_recovered_bar,'New recovered')]
for positions,column in bar_groups:
    totals=df[column].sum()
    for x,value in zip(positions,totals):
        plt.annotate(text=str(value),xy=(x,value),xytext=(0,3),
                     textcoords="offset points",ha='center',va='bottom',rotation=90)
plt.legend()
plt.show()
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58, 62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139, 144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186],
 'Americas': [5, 6, 11, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70, 73, 74, 76, 86, 111, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180],
 'Eastern Mediterranean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128, 136, 145, 153, 159, 163, 171, 176, 182, 184],
 'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 46, 47, 56, 60, 61, 64, 65, 67, 68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 101, 102, 108, 112, 113, 115, 120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 152, 157, 161, 162, 165, 172, 175, 177, 179],
 'South-East Asia': [13, 19, 27, 79, 80, 106, 119, 158, 167, 168],
 'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105, 114, 121, 130, 133, 150, 155, 164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
---------------------------------------------------------------
The sum of Confirmed cases in each WHO Region are:
WHO Region
Africa 723207
Americas 8839286
Eastern Mediterranean 1490744
Europe 3299523
South-East Asia 1835297
Western Pacific 292428
Name: Confirmed, dtype: int64
---------------------------------------------------------------
The sum of Deaths in each WHO Region are:
WHO Region
Africa 12223
Americas 342732
Eastern Mediterranean 38339
Europe 211144
South-East Asia 41349
Western Pacific 8249
Name: Deaths, dtype: int64
---------------------------------------------------------------
The sum of Recovered cases in each WHO Region are:
WHO Region
Africa 440645
Americas 4468616
Eastern Mediterranean 1201400
Europe 1993723
South-East Asia 1156933
Western Pacific 206770
Name: Recovered, dtype: int64
---------------------------------------------------------------
The sum of Active cases in each WHO Region are:
WHO Region
Africa 270339
Americas 4027938
Eastern Mediterranean 251005
Europe 1094656
South-East Asia 637015
Western Pacific 77409
Name: Active, dtype: int64
---------------------------------------------------------------
The sum of New cases in each WHO Region are:
WHO Region
Africa 12176
Americas 129531
Eastern Mediterranean 12410
Europe 22294
South-East Asia 48993
Western Pacific 3289
Name: New cases, dtype: int64
---------------------------------------------------------------
The sum of New deaths in each WHO Region are:
WHO Region
Africa 353
Americas 3555
Eastern Mediterranean 445
Europe 304
South-East Asia 734
Western Pacific 24
Name: New deaths, dtype: int64
---------------------------------------------------------------
The sum of New recovered cases in each WHO Region are:
WHO Region
Africa 14563
Americas 94776
Eastern Mediterranean 14843
Europe 11732
South-East Asia 37582
Western Pacific 1127
Name: New recovered, dtype: int64
[0 1 2 3 4 5]
[0.1, 1.1, 2.1, 3.1, 4.1, 5.1]
[0.2, 1.2000000000000002, 2.2, 3.2, 4.199999999999999, 5.199999999999999]
[0.30000000000000004, 1.3000000000000003, 2.3000000000000003, 3.3000000000000003, 4.299999999999999, 5.299999999999999]
[0.4, 1.4000000000000004, 2.4000000000000004, 3.4000000000000004, 4.399999999999999, 5.399999999999999]
[0.5, 1.5000000000000004, 2.5000000000000004, 3.5000000000000004, 4.499999999999998, 5.499999999999998]
[0.6, 1.6000000000000005, 2.6000000000000005, 3.6000000000000005, 4.599999999999998, 5.599999999999998]
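The printed positions show floating-point drift (for example 1.2000000000000002) because each list adds 0.1 to the previous one, and seven bars of width 0.1 placed at offsets 0 to 0.6 are centred at +0.3 rather than +0.4. A sketch that computes all offsets at once around each group centre, so the tick can sit at the group position itself; the variable names here are illustrative, not from the original cell:

```python
import numpy as np

series = ["Confirmed", "Deaths", "Recovered", "Active",
          "New cases", "New deaths", "New recovered"]
n_groups = 6   # one group per WHO Region
width = 0.1    # bar width, as in the cell above

x = np.arange(n_groups)  # group centres, where the tick labels go
# Symmetric offsets (k - 3) * 0.1 for k = 0..6, computed in one vectorised step
offsets = (np.arange(len(series)) - (len(series) - 1) / 2) * width
positions = {name: x + off for name, off in zip(series, offsets)}

# Each series would then be drawn with plt.bar(positions[name], heights, width=width),
# and plt.xticks(x, region_names) puts each label under the middle of its group.
print(positions["Deaths"])
```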