
Case Study Final - Group 05 - Lokesh R M


GITAM University, Bangalore

Analyzing the COVID-19 Pandemic with Pandas and Matplotlib: A Case Study
Programming with Python

Submitted by:
LOKESH R M BU22CSEN0100145
INDLA ABHISHEK BU22CSEN0101761
VISHNUPRIYA T BU22CSEN0100996
DERANGULA DIVYA SANDHYA BU22CSEN0101184
YAKKANTI PAVAN BU22CSEN0101230

5-6-2023

Introduction:
The COVID-19 pandemic has affected the entire world in
unprecedented ways. With the continuous spread of the virus,
understanding its impact and tracking its growth has become
crucial. One way to achieve this is through data analysis, which
can help identify patterns and trends that aid in predicting the
spread of the virus and making informed decisions.

This case study explores the COVID-19 dataset from Kaggle, which
contains daily reports of confirmed cases, deaths, and
recoveries across various countries and regions. We will use the
Python data analysis library Pandas to clean, manipulate, and
analyze the dataset. Additionally, we will use the data
visualization library Matplotlib to create visualizations that
help us interpret and communicate our findings effectively. By
the end of this case study, you will have a deeper understanding
of how to use Pandas and Matplotlib to analyze and visualize
COVID-19 data.

Aim of the Case Study:


The aim of this case study is to analyze COVID-19 data using
Pandas and Matplotlib. We will use the Pandas data analysis
library to clean, manipulate, and analyze the data, and then we
will use the Matplotlib data visualization library to create
visualizations that help us interpret the data.

Pandas:
Pandas is a Python library for working with data sets. It has
functions for analyzing, cleaning, exploring, and manipulating
data. The name "Pandas" refers to both "Panel Data" and "Python
Data Analysis". The library was created by Wes McKinney in 2008.

In this case study we will use Pandas to analyze COVID-19 data.

1. Reading the data from the CSV file

To read the data from the CSV file, we will use the read_csv()
function. The read_csv() function is part of the Pandas library,
and will read the CSV file into a DataFrame.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df)
Country/Region Confirmed Deaths Recovered Active New cases
0 Afghanistan 36263 1269 25198 9796 106 \
1 Albania 4880 144 2745 1991 117
2 Algeria 27973 1163 18837 7973 616
3 Andorra 907 52 803 52 10
4 Angola 950 41 242 667 18
.. ... ... ... ... ... ...
182 West Bank and Gaza 10621 78 3752 6791 152
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
0 10 18 3.50 69.49 \
1 6 63 2.95 56.25
2 8 749 4.16 67.34
3 0 0 5.73 88.53
4 1 0 4.32 25.47
.. ... ... ... ...
182 2 0 0.73 35.33
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04

Deaths / 100 Recovered Confirmed last week 1 week change


0 5.04 35526 737 \
1 5.25 4171 709
2 6.17 23691 4282
3 6.48 884 23
4 16.94 749 201
.. ... ... ...
182 2.08 8916 1705
183 12.50 10 0
184 57.98 1619 72
185 4.97 3326 1226
186 6.64 1713 991

1 week % increase WHO Region


0 2.07 Eastern Mediterranean
1 17.00 Europe
2 18.07 Africa
3 2.60 Europe
4 26.84 Africa
.. ... ...
182 19.12 Eastern Mediterranean
183 0.00 Africa
184 4.45 Eastern Mediterranean
185 36.86 Africa
186 57.85 Africa

[187 rows x 15 columns]
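Note that read_csv() already returns a DataFrame, so the extra
pd.DataFrame(data) call is optional. The following is a minimal sketch of
this, assuming a small in-memory CSV (three rows copied from the output
above, with only five of the columns) in place of 'COVID Dataset.csv':

```python
import io
import pandas as pd

# Illustrative stand-in for 'COVID Dataset.csv' (an assumption for this sketch).
csv_text = """Country/Region,Confirmed,Deaths,Recovered,Active
Afghanistan,36263,1269,25198,9796
Albania,4880,144,2745,1991
Algeria,27973,1163,18837,7973
"""

# read_csv() accepts a file path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```

With the real file, pd.read_csv('COVID Dataset.csv') can be assigned to df
directly in the same way.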

2. Knowing the basic information about the Dataset

It is important to know the basic information about the dataset
before starting the analysis. Pandas provides a function called
info() that is used to print this information.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 187 non-null object
1 Confirmed 187 non-null int64
2 Deaths 187 non-null int64
3 Recovered 187 non-null int64
4 Active 187 non-null int64
5 New cases 187 non-null int64
6 New deaths 187 non-null int64
7 New recovered 187 non-null int64
8 Deaths / 100 Cases 187 non-null float64
9 Recovered / 100 Cases 187 non-null float64
10 Deaths / 100 Recovered 187 non-null float64
11 Confirmed last week 187 non-null int64
12 1 week change 187 non-null int64
13 1 week % increase 187 non-null float64
14 WHO Region 187 non-null object
dtypes: float64(4), int64(9), object(2)
memory usage: 22.0+ KB

Using the shape attribute, we can find the number of rows and
columns in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The shape of the given data is:")
print(df.shape)

The shape of the given data is:


(187, 15)
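The shape attribute is easy to confuse with size and len(). The sketch
below shows the difference on a small, illustrative DataFrame (the three
rows are taken from the output above; the reduced column set is an
assumption made to keep the example short):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
})

rows, cols = df.shape   # tuple: (number of rows, number of columns)
print(df.shape)         # (3, 2)
print(df.size)          # 6, the total number of cells (rows * columns)
print(len(df))          # 3, the number of rows only
```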

Using the columns attribute, we can find the list of column
names in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The Columns index attributes are:")
print(df.columns)
The Columns index attributes are:
Index(['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases',
'Recovered / 100 Cases', 'Deaths / 100 Recovered',
'Confirmed last week', '1 week change', '1 week % increase',
'WHO Region'],
dtype='object')

Using the describe() function, we can get the descriptive
statistics of each column in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

# Print the summary statistics of every column, one column at a time
for col in df.columns:
    print(f"The Summary Statistics of {col} is:")
    print(df[col].describe())
    print("-----------------------------------------")
The Summary Statistics of Country/Region is:
count 187
unique 187
top Afghanistan
freq 1
Name: Country/Region, dtype: object
-----------------------------------------
The Summary Statistics of Confirmed is:
count 1.870000e+02
mean 8.813094e+04
std 3.833187e+05
min 1.000000e+01
25% 1.114000e+03
50% 5.059000e+03
75% 4.046050e+04
max 4.290259e+06
Name: Confirmed, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths is:
count 187.000000
mean 3497.518717
std 14100.002482
min 0.000000
25% 18.500000
50% 108.000000
75% 734.000000
max 148011.000000
Name: Deaths, dtype: float64
-----------------------------------------
The Summary Statistics of Recovered is:
count 1.870000e+02
mean 5.063148e+04
std 1.901882e+05
min 0.000000e+00
25% 6.265000e+02
50% 2.815000e+03
75% 2.260600e+04
max 1.846641e+06
Name: Recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Active is:
count 1.870000e+02
mean 3.400194e+04
std 2.133262e+05
min 0.000000e+00
25% 1.415000e+02
50% 1.600000e+03
75% 9.149000e+03
max 2.816444e+06
Name: Active, dtype: float64
-----------------------------------------
The Summary Statistics of New cases is:
count 187.000000
mean 1222.957219
std 5710.374790
min 0.000000
25% 4.000000
50% 49.000000
75% 419.500000
max 56336.000000
Name: New cases, dtype: float64
-----------------------------------------
The Summary Statistics of New deaths is:
count 187.000000
mean 28.957219
std 120.037173
min 0.000000
25% 0.000000
50% 1.000000
75% 6.000000
max 1076.000000
Name: New deaths, dtype: float64
-----------------------------------------
The Summary Statistics of New recovered is:
count 187.000000
mean 933.812834
std 4197.719635
min 0.000000
25% 0.000000
50% 22.000000
75% 221.000000
max 33728.000000
Name: New recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths / 100 Cases is:
count 187.000000
mean 3.019519
std 3.454302
min 0.000000
25% 0.945000
50% 2.150000
75% 3.875000
max 28.560000
Name: Deaths / 100 Cases, dtype: float64
-----------------------------------------
The Summary Statistics of Recovered / 100 Cases is:
count 187.000000
mean 64.820535
std 26.287694
min 0.000000
25% 48.770000
50% 71.320000
75% 86.885000
max 100.000000
Name: Recovered / 100 Cases, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths / 100 Recovered is:
count 187.00
mean inf
std NaN
min 0.00
25% 1.45
50% 3.62
75% 6.44
max inf
Name: Deaths / 100 Recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Confirmed last week is:
count 1.870000e+02
mean 7.868248e+04
std 3.382737e+05
min 1.000000e+01
25% 1.051500e+03
50% 5.020000e+03
75% 3.708050e+04
max 3.834677e+06
Name: Confirmed last week, dtype: float64
-----------------------------------------
The Summary Statistics of 1 week change is:
count 187.000000
mean 9448.459893
std 47491.127684
min -47.000000
25% 49.000000
50% 432.000000
75% 3172.000000
max 455582.000000
Name: 1 week change, dtype: float64
-----------------------------------------
The Summary Statistics of 1 week % increase is:
count 187.000000
mean 13.606203
std 24.509838
min -3.840000
25% 2.775000
50% 6.890000
75% 16.855000
max 226.320000
Name: 1 week % increase, dtype: float64
-----------------------------------------
The Summary Statistics of WHO Region is:
count 187
unique 6
top Europe
freq 56
Name: WHO Region, dtype: object
-----------------------------------------
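In the output above, the mean and max of "Deaths / 100 Recovered" are inf
and the std is NaN. This happens because some rows have 0 recoveries (for
example Syria, shown later in the sample() output). One possible way to
get finite statistics, sketched here on three illustrative values, is to
treat infinite ratios as missing before calling describe(). Whether that
is the right treatment for the analysis is an assumption, not something
the dataset prescribes.

```python
import numpy as np
import pandas as pd

# Illustrative values: two finite ratios and one infinite ratio (Recovered == 0).
ratio = pd.Series(
    [5.04, 5.25, np.inf],
    index=["Afghanistan", "Albania", "Syria"],
    name="Deaths / 100 Recovered",
)

# Replace +/- infinity with NaN; describe() skips NaN values.
clean = ratio.replace([np.inf, -np.inf], np.nan)
print(clean.describe())
```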

Using the head() function, we can get the first five rows of the
dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The first 5 rows of the given data is:")
print(df.head())
The first 5 rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases New deaths
0 Afghanistan 36263 1269 25198 9796 106 10 \
1 Albania 4880 144 2745 1991 117 6
2 Algeria 27973 1163 18837 7973 616 8
3 Andorra 907 52 803 52 10 0
4 Angola 950 41 242 667 18 1

New recovered Deaths / 100 Cases Recovered / 100 Cases


0 18 3.50 69.49 \
1 63 2.95 56.25
2 749 4.16 67.34
3 0 5.73 88.53
4 0 4.32 25.47

Deaths / 100 Recovered Confirmed last week 1 week change


0 5.04 35526 737 \
1 5.25 4171 709
2 6.17 23691 4282
3 6.48 884 23
4 16.94 749 201

1 week % increase WHO Region


0 2.07 Eastern Mediterranean
1 17.00 Europe
2 18.07 Africa
3 2.60 Europe
4 26.84 Africa

Using the tail() function, we can get the last five rows of the
dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The last 5 rows of the given data is:")
print(df.tail())
The last 5 rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases
182 West Bank and Gaza 10621 78 3752 6791 152 \
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
182 2 0 0.73 35.33 \
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04

Deaths / 100 Recovered Confirmed last week 1 week change


182 2.08 8916 1705 \
183 12.50 10 0
184 57.98 1619 72
185 4.97 3326 1226
186 6.64 1713 991

1 week % increase WHO Region


182 19.12 Eastern Mediterranean
183 0.00 Africa
184 4.45 Eastern Mediterranean
185 36.86 Africa
186 57.85 Africa

Using the sample() function, we can get a random sample of rows
from the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The 5 random rows of the given data is:")
print(df.sample(5))
The 5 random rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases
95 Latvia 1219 31 1045 143 0 \
0 Afghanistan 36263 1269 25198 9796 106
105 Malaysia 8904 124 8601 179 7
116 Morocco 20887 316 16553 4018 609
163 Syria 674 40 0 634 24

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
95 0 0 2.54 85.73 \
0 10 18 3.50 69.49
105 0 1 1.39 96.60
116 3 115 1.51 79.25
163 2 0 5.93 0.00

Deaths / 100 Recovered Confirmed last week 1 week change


95 2.97 1192 27 \
0 5.04 35526 737
105 1.44 8800 104
116 1.91 17562 3325
163 inf 522 152

1 week % increase WHO Region


95 2.27 Europe
0 2.07 Eastern Mediterranean
105 1.18 Western Pacific
116 18.93 Eastern Mediterranean
163 29.12 Eastern Mediterranean

Finding the Minimum, Maximum, Sum, Average, and Count of a Column

It is important to know the minimum, maximum, sum, average, and
count of a column in the dataset. Pandas provides functions to
find these values.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
numeric_cols = df.select_dtypes(include='number')
col_min = numeric_cols.min()
col_max = numeric_cols.max()
col_mean = numeric_cols.mean()
print("The Minimum values of the numeric columns are:")
print(col_min)
print("---------------------------------------------------------------")
print("The Maximum values of the numeric columns are:")
print(col_max)
print("---------------------------------------------------------------")
print("The Mean values of the numeric columns are:")
print(col_mean)
print("---------------------------------------------------------------")
print("The Sum values of the numeric columns are:")
print(numeric_cols.sum())
print("---------------------------------------------------------------")
print("The Average values of the numeric columns are:")
print(numeric_cols.mean())
print("---------------------------------------------------------------")
print("The Count values of the numeric columns are:")
print(numeric_cols.count())
print("---------------------------------------------------------------")
The Minimum values of the numeric columns are:
Confirmed 10.00
Deaths 0.00
Recovered 0.00
Active 0.00
New cases 0.00
New deaths 0.00
New recovered 0.00
Deaths / 100 Cases 0.00
Recovered / 100 Cases 0.00
Deaths / 100 Recovered 0.00
Confirmed last week 10.00
1 week change -47.00
1 week % increase -3.84
dtype: float64
---------------------------------------------------------------
The Maximum values of the numeric columns are:
Confirmed 4290259.00
Deaths 148011.00
Recovered 1846641.00
Active 2816444.00
New cases 56336.00
New deaths 1076.00
New recovered 33728.00
Deaths / 100 Cases 28.56
Recovered / 100 Cases 100.00
Deaths / 100 Recovered inf
Confirmed last week 3834677.00
1 week change 455582.00
1 week % increase 226.32
dtype: float64
---------------------------------------------------------------
The Mean values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Sum values of the numeric columns are:
Confirmed 16480485.00
Deaths 654036.00
Recovered 9468087.00
Active 6358362.00
New cases 228693.00
New deaths 5415.00
New recovered 174623.00
Deaths / 100 Cases 564.65
Recovered / 100 Cases 12121.44
Deaths / 100 Recovered inf
Confirmed last week 14713623.00
1 week change 1766862.00
1 week % increase 2544.36
dtype: float64
---------------------------------------------------------------
The Average values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Count values of the numeric columns are:
Confirmed 187
Deaths 187
Recovered 187
Active 187
New cases 187
New deaths 187
New recovered 187
Deaths / 100 Cases 187
Recovered / 100 Cases 187
Deaths / 100 Recovered 187
Confirmed last week 187
1 week change 187
1 week % increase 187
dtype: int64
---------------------------------------------------------------
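The same statistics can also be collected in a single table with agg().
This is a minimal sketch on a small, illustrative DataFrame (three rows
from the dataset, with a reduced column set assumed for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "Deaths": [1269, 144, 1163],
})

# Keep only the numeric columns, then compute several statistics at once.
numeric_cols = df.select_dtypes(include="number")
summary = numeric_cols.agg(["min", "max", "sum", "mean", "count"])
print(summary)
```

Each requested statistic becomes one row of the result and each numeric
column becomes one column, which avoids printing the mean twice.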

3. Cleaning the data

Cleaning the data is the process of converting the data into a
usable format. In this case study, we will clean the data by:

1. Dropping columns that are not needed for the analysis.
2. Renaming columns.
3. Checking for missing values and replacing them.
4. Checking for duplicates and removing them.

Checking for duplicate data and removing it

It is important to check for duplicate data and remove it.
Pandas provides a function called duplicated() that is used to
check for duplicate data. The duplicated() function returns a
Boolean value for each row:

True means the row is a duplicate of an earlier row.
False means the row is not a duplicate.

We can use the sum() function to count the number of duplicate
rows in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
dup=df.duplicated()
print(dup)
print("----------------------------------")
print("The number of duplicate rows are:")
print(sum(dup))

0 False
1 False
2 False
3 False
4 False
...
182 False
183 False
184 False
185 False
186 False
Length: 187, dtype: bool
----------------------------------
The number of duplicate rows are:
0
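The dataset has no duplicate rows, so nothing needs to be removed here.
For completeness, the sketch below shows how removal could look with
drop_duplicates(), using illustrative data in which one row is repeated on
purpose:

```python
import pandas as pd

# Illustrative data: the 'Andorra' row is deliberately repeated.
df = pd.DataFrame({
    "Country/Region": ["Andorra", "Angola", "Andorra"],
    "Confirmed": [907, 950, 907],
})

print("Duplicate rows:", df.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```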

Checking for missing values in the dataset is an important
step in the data cleaning process. Pandas provides the isnull()
function to check for missing values. The isnull() function
returns a Boolean value for each cell in the dataset. If a cell
contains a missing value, it will be marked as True, otherwise
it will be marked as False.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

print("The number of null values in each column is:")


print(df.isnull().sum())
The number of null values in each column is:
Country/Region 0
Confirmed 0
Deaths 0
Recovered 0
Active 0
New cases 0
New deaths 0
New recovered 0
Deaths / 100 Cases 0
Recovered / 100 Cases 0
Deaths / 100 Recovered 0
Confirmed last week 0
1 week change 0
1 week % increase 0
WHO Region 0
dtype: int64
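Since every column reports 0 missing values, no replacement is needed for
this dataset. As a sketch of what replacing or dropping missing values
could look like, the example below uses illustrative data with one missing
value inserted on purpose; the fill value of 0 is an assumption, not a
recommendation.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value (the real dataset has none).
df = pd.DataFrame({
    "Country/Region": ["Andorra", "Angola", "Yemen"],
    "Recovered": [803, np.nan, 833],
})

print(df.isnull().sum())

# Option 1: replace missing values in a chosen column.
filled = df.fillna({"Recovered": 0})

# Option 2: drop every row that contains a missing value.
dropped = df.dropna()

print(filled)
print(dropped)
```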

3. Basic Analysis

Basic analysis is the process of examining the data to find
patterns and trends. In this case study, we will perform basic
analysis by answering the following questions:

1. What are the data types of each column?

In the below code we used the dtypes attribute to find the data
types of each column in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The data types of each column is:")
print(df.dtypes)

The data types of each column is:


Country/Region object
Confirmed int64
Deaths int64
Recovered int64
Active int64
New cases int64
New deaths int64
New recovered int64
Deaths / 100 Cases float64
Recovered / 100 Cases float64
Deaths / 100 Recovered float64
Confirmed last week int64
1 week change int64
1 week % increase float64
WHO Region object
dtype: object

2. How many unique countries/regions are included in the data?


In the below code we used the nunique() function to find the
number of unique countries/regions in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The number of unique countries/regions are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The number of unique WHO Regions are:")
print(df['WHO Region'].nunique())

The number of unique countries/regions are:


187
---------------------------------------------------------------
The number of unique WHO Regions are:
6

3. Printing the names of the countries with their WHO Region names

In the below code we passed a list of column names to df[] to
select the 'Country/Region' and 'WHO Region' columns together.
The 'Country/Region' column holds the names of the countries
and the 'WHO Region' column holds the names of the WHO Regions.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df[['Country/Region','WHO Region']])

Country/Region WHO Region


0 Afghanistan Eastern Mediterranean
1 Albania Europe
2 Algeria Africa
3 Andorra Europe
4 Angola Africa
.. ... ...
182 West Bank and Gaza Eastern Mediterranean
183 Western Sahara Africa
184 Yemen Eastern Mediterranean
185 Zambia Africa
186 Zimbabwe Africa

[187 rows x 2 columns]

4. Adding the column "Total Cases", which is the sum of the "Deaths", "Active" and
"Recovered" columns, and comparing it with the "Confirmed" column.

In the below code we added the column "Total Cases", which is
the sum of the "Deaths", "Active" and "Recovered" columns, and
compared it with the "Confirmed" column.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df['Total Cases']=df['Deaths']+df['Active']+df['Recovered']
print(df[['Deaths','Active','Recovered','Total Cases','Confirmed']])
print("---------------------------------------------------------------")
print("The number of rows where the value of 'Confirmed' column is equal to the value of 'Total Cases' column are:")
print(df[df['Confirmed']==df['Total Cases']].shape[0])

Deaths Active Recovered Total Cases Confirmed


0 1269 9796 25198 36263 36263
1 144 1991 2745 4880 4880
2 1163 7973 18837 27973 27973
3 52 52 803 907 907
4 41 667 242 950 950
.. ... ... ... ... ...
182 78 6791 3752 10621 10621
183 1 1 8 10 10
184 483 375 833 1691 1691
185 140 1597 2815 4552 4552
186 36 2126 542 2704 2704

[187 rows x 5 columns]


---------------------------------------------------------------
The number of rows where the value of 'Confirmed' column is equal to the value of 'Total Cases' column are:
187
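The count of 187 equals the number of rows, so the two columns agree in
every row. The sketch below shows the same check on three rows from the
dataset, using all() to state the result directly:

```python
import pandas as pd

df = pd.DataFrame({
    "Deaths": [1269, 144, 1163],
    "Active": [9796, 1991, 7973],
    "Recovered": [25198, 2745, 18837],
    "Confirmed": [36263, 4880, 27973],
})

df["Total Cases"] = df["Deaths"] + df["Active"] + df["Recovered"]

# Element-wise comparison gives one Boolean per row.
matches = df["Confirmed"] == df["Total Cases"]

print(matches.sum())   # number of rows where the two columns agree
print(matches.all())   # True only if every row agrees
```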

4. Sorting

Sorting is the process of arranging the data in a particular
order. In this case study, we will sort the data by:

1. Sorting the data by the number of confirmed cases in ascending order.

In the below code we used the sort_values() function to sort the
data by the number of confirmed cases in ascending order.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.sort_values(by=['Confirmed'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The Country/Region with the lowest number of confirmed cases is:")
print(df.iloc[0,0])
Country/Region Confirmed
183 Western Sahara 10
75 Holy See 12
68 Greenland 14
140 Saint Kitts and Nevis 17
49 Dominica 18
.. ... ...
154 South Africa 452529
138 Russia 816680
79 India 1480073
23 Brazil 2442375
173 US 4290259

[187 rows x 2 columns]


---------------------------------------------------------------
The Country/Region with the lowest number of confirmed cases is:
Western Sahara

2. Sorting the data by the alphabetical order of the countries in ascending order.

In the below code we used the sort_values() function to sort the
data by the alphabetical order of the countries in ascending
order.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.sort_values(['Country/Region'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")

Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
.. ... ...
182 West Bank and Gaza 10621
183 Western Sahara 10
184 Yemen 1691
185 Zambia 4552
186 Zimbabwe 2704

[187 rows x 2 columns]


---------------------------------------------------------------

3. Finding the Lowest and the Highest number of confirmed cases.

In the below code we used the sort_values() function to sort the
data by the number of confirmed cases in descending order, so
that the first row holds the highest number of confirmed cases
and the last row holds the lowest.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

# Sort by 'Confirmed' only. Sorting by 'Country/Region' first would
# order the rows alphabetically instead of by case count.
df=df.sort_values(by=['Confirmed'],ascending=False)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The Country/Region with the highest number of confirmed cases is:")
print(df.iloc[0,0])
print("---------------------------------------------------------------")
print("The Country/Region with the lowest number of confirmed cases is:")
print(df.iloc[-1,0])

Country/Region Confirmed
173 US 4290259
23 Brazil 2442375
79 India 1480073
138 Russia 816680
154 South Africa 452529
.. ... ...
49 Dominica 18
140 Saint Kitts and Nevis 17
68 Greenland 14
75 Holy See 12
183 Western Sahara 10

[187 rows x 2 columns]

---------------------------------------------------------------
The Country/Region with the highest number of confirmed cases is:
US
---------------------------------------------------------------
The Country/Region with the lowest number of confirmed cases is:
Western Sahara
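The highest and lowest values can also be located without sorting the
whole DataFrame. The following is a minimal sketch using idxmax(),
idxmin() and nlargest() on four rows taken from the dataset (the reduced
data is an assumption made to keep the example short):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Western Sahara", "India", "Brazil", "US"],
    "Confirmed": [10, 1480073, 2442375, 4290259],
})

# idxmax()/idxmin() return the index label of the largest/smallest value.
highest = df.loc[df["Confirmed"].idxmax(), "Country/Region"]
lowest = df.loc[df["Confirmed"].idxmin(), "Country/Region"]
print("Highest:", highest)
print("Lowest:", lowest)

# nlargest() returns the top rows ordered by the given column.
top2 = df.nlargest(2, "Confirmed")
print(top2)
```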

4. Extracting the rows where the name of the 'Country' starts with 'I'

In the below code we used the .str.startswith() function to
extract the rows where the name of the 'Country' starts with
'I'.

Here .str is the string accessor of the column, and
.startswith() returns True for each value that begins with the
given prefix. The resulting Boolean values select the matching
rows.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['Country/Region'].str.startswith('I')]
print(df[['Country/Region','Deaths','Recovered']])
print("---------------------------------------------------------------")
print("The number of countries starting with 'I' are:")
print(df['Country/Region'].nunique())
Country/Region Deaths Recovered
78 Iceland 10 1823
79 India 33408 951166
80 Indonesia 4838 58173
81 Iran 15912 255144
82 Iraq 4458 77144
83 Ireland 1764 23364
84 Israel 474 27133
85 Italy 35112 198593
---------------------------------------------------------------
The number of countries starting with 'I' are:
8

5. Extracting the data with the indexing

In the below code we used the loc[] indexer to extract the data
with indexing.

loc[] selects rows by label or by a Boolean condition.

In the code we compared the 'Country/Region' column with 'India'
and passed the resulting Boolean condition to loc[] to extract
the data of the 'Country' 'India'.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.loc[df['Country/Region']=='India']
print(df)

Country/Region Confirmed Deaths Recovered Active New cases


79 India 1480073 33408 951166 495499 44457 \

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
79 637 33598 2.26 64.26 \

Deaths / 100 Recovered Confirmed last week 1 week change


79 3.51 1155338 324735 \

1 week % increase WHO Region


79 28.11 South-East Asia
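To look a country up by its name as an index label, the
'Country/Region' column first has to become the index. The sketch below
contrasts the two approaches on three rows from the dataset (the reduced
column set is an assumption for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Iceland", "India", "Indonesia"],
    "Deaths": [10, 33408, 4838],
    "Recovered": [1823, 951166, 58173],
})

# Boolean condition: returns a DataFrame with the matching row(s).
by_mask = df.loc[df["Country/Region"] == "India"]

# Label lookup: make the country name the index, then select by label.
indexed = df.set_index("Country/Region")
by_label = indexed.loc["India"]   # a Series holding India's row

print(by_mask)
print(by_label["Deaths"])
```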

6. Extracting the range of data with the indexing using iloc[]

In the below code we used the iloc[] indexer to extract a range
of data by integer position.

In 0:10,0:2, the slice 0:10 is the range of rows and 0:2 is the
range of columns. The end of each slice is excluded, so this
selects rows 0 to 9 and columns 0 to 1.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.iloc[0:10,0:2]
print(df)

Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
5 Antigua and Barbuda 86
6 Argentina 167416
7 Armenia 37390
8 Australia 15303
9 Austria 20558
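One detail worth seeing in code is that iloc[] slices exclude the end
position, while loc[] slices include the end label. The following sketch
uses four rows from the dataset to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra"],
    "Confirmed": [36263, 4880, 27973, 907],
    "Deaths": [1269, 144, 1163, 52],
})

# Position-based: rows 0 and 1, columns 0 and 1 (end excluded).
by_position = df.iloc[0:2, 0:2]

# Label-based: rows labelled 0, 1 and 2 (end included).
by_label = df.loc[0:2, ["Country/Region", "Confirmed"]]

print(by_position.shape)   # (2, 2)
print(by_label.shape)      # (3, 2)
```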

7. Extracting the data based on the 'WHO Region'

In the below code we used the groupby() function to group the
rows by 'WHO Region'. We then used the nunique() function to
find the number of unique countries/regions in each group.

In [ ]: # Group the rows by the 'WHO Region' column
import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58,
62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139,
144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186],
'Americas': [5, 6, 11, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70,
73, 74, 76, 86, 111, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180],
'Eastern Mediterranean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128,
136, 145, 153, 159, 163, 171, 176, 182, 184],
'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 46, 47, 56, 60, 61, 64, 65, 67,
68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 101, 102, 108, 112, 113, 115,
120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 152, 157, 161, 162, 165, 172,
175, 177, 179],
'South-East Asia': [13, 19, 27, 79, 80, 106, 119, 158, 167, 168],
'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105, 114, 121, 130, 133, 150, 155,
164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
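A grouped object can be reused for several summaries. The sketch below
counts the countries and sums the confirmed cases per region on five rows
from the dataset (the reduced data is an assumption, so the totals differ
from those of the full file):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Algeria", "Angola", "Albania", "Andorra", "India"],
    "WHO Region": ["Africa", "Africa", "Europe", "Europe", "South-East Asia"],
    "Confirmed": [27973, 950, 4880, 907, 1480073],
})

grouped = df.groupby("WHO Region")

# Number of distinct countries in each region.
counts = grouped["Country/Region"].nunique()

# Total confirmed cases in each region.
totals = grouped["Confirmed"].sum()

print(counts)
print(totals)
```

By default groupby() sorts the group keys, so the regions appear in
ascending alphabetical order in both results.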

8. Creating a new column called 'Review' and giving feedback by comparing the
Deaths and Recovered cases.

In [ ]: import pandas as pd

data = pd.read_csv('COVID Dataset.csv')


df = pd.DataFrame(data)

# Create a new column called Review


df['Review'] = 'Not Reviewed'

# Set Review based on Death/Recovered values


df.loc[df['Deaths']>df['Recovered'],'Review'] = 'High Risk'
df.loc[df['Deaths']<=df['Recovered'],'Review'] = 'Low Risk'

# Get count of High/Low risk and total count


high_risk_count = df.loc[df['Review'] == 'High Risk', 'Review'].count()
low_risk_count = df.loc[df['Review'] == 'Low Risk', 'Review'].count()
total_count = df['Review'].count()

print(f"High Risk Count: {high_risk_count}")


print(f"Low Risk Count: {low_risk_count}")
print(f"Total Count: {total_count}")
print("---------------------------------------------------------------")
print(df[['Country/Region','Review']])
print("---------------------------------------------------------------")
print("The number of countries with High Risk are:")
df.loc[df['Review']=='High Risk',['Country/Region']]
High Risk Count: 7
Low Risk Count: 180
Total Count: 187
---------------------------------------------------------------
Country/Region Review
0 Afghanistan Low Risk
1 Albania Low Risk
2 Algeria Low Risk
3 Andorra Low Risk
4 Angola Low Risk
.. ... ...
182 West Bank and Gaza Low Risk
183 Western Sahara Low Risk
184 Yemen Low Risk
185 Zambia Low Risk
186 Zimbabwe Low Risk

[187 rows x 2 columns]


---------------------------------------------------------------
The number of countries with High Risk are:
Out[ ]: Country/Region
32 Canada
117 Mozambique
120 Netherlands
147 Serbia
161 Sweden
163 Syria
177 United Kingdom
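The two loc[] assignments can also be written as a single vectorized
expression. This is a sketch of one alternative using numpy.where() on
three rows from the dataset; it is not the method used in the cell above,
only an equivalent formulation under the same rule (Deaths greater than
Recovered means 'High Risk').

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Syria", "Albania"],
    "Deaths": [1269, 40, 144],
    "Recovered": [25198, 0, 2745],
})

# np.where(condition, value_if_true, value_if_false) works element-wise.
df["Review"] = np.where(df["Deaths"] > df["Recovered"], "High Risk", "Low Risk")

print(df[["Country/Region", "Review"]])
print(df["Review"].value_counts())
```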

9. Printing the data of the 'WHO Regions' ending with 's'

In the below code we used the .str.endswith() function to
extract the rows where the name of the 'WHO Region' ends with
's'.

Here .str is the string accessor of the column, and .endswith()
returns True for each value that ends with the given suffix. In
this dataset only 'Americas' ends with 's'.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['WHO Region'].str.endswith('s')]
print(df[['Country/Region','WHO Region']])
Country/Region WHO Region
5 Antigua and Barbuda Americas
6 Argentina Americas
11 Bahamas Americas
14 Barbados Americas
17 Belize Americas
20 Bolivia Americas
23 Brazil Americas
32 Canada Americas
35 Chile Americas
37 Colombia Americas
41 Costa Rica Americas
44 Cuba Americas
49 Dominica Americas
50 Dominican Republic Americas
51 Ecuador Americas
53 El Salvador Americas
69 Grenada Americas
70 Guatemala Americas
73 Guyana Americas
74 Haiti Americas
76 Honduras Americas
86 Jamaica Americas
111 Mexico Americas
122 Nicaragua Americas
129 Panama Americas
131 Paraguay Americas
132 Peru Americas
140 Saint Kitts and Nevis Americas
141 Saint Lucia Americas
142 Saint Vincent and the Grenadines Americas
160 Suriname Americas
170 Trinidad and Tobago Americas
173 US Americas
178 Uruguay Americas
180 Venezuela Americas
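
As a small illustration of the filter above, the sketch below applies .str.endswith() to a hand-made frame. The three rows and the na=False argument are assumptions added for the example; they are not taken from the case-study dataset.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Country/Region": ["Brazil", "India", "Unknown"],
    "WHO Region": ["Americas", "South-East Asia", None],
})

# .str.endswith() returns a boolean Series; na=False treats a missing
# region name as "no match" instead of leaving a missing value in the mask.
mask = toy["WHO Region"].str.endswith("s", na=False)
ends_with_s = toy.loc[mask, ["Country/Region", "WHO Region"]]
print(ends_with_s)
```

Only the row whose region ends with 's' is kept; the row with a missing region is dropped rather than raising an error.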

10. Grouping countries by their "WHO Region", then summing the Confirmed, Deaths,
Recovered and Active cases of each column individually and printing the data in
ascending order of the "WHO Region".

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The sum of Confirmed cases in each WHO Region are:")
print(df['Confirmed'].sum())
print("---------------------------------------------------------------")
print("The sum of Deaths in each WHO Region are:")
print(df['Deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of Recovered cases in each WHO Region are:")
print(df['Recovered'].sum())
print("---------------------------------------------------------------")
print("The sum of Active cases in each WHO Region are:")
print(df['Active'].sum())
print("---------------------------------------------------------------")
print("The sum of New cases in each WHO Region are:")
print(df['New cases'].sum())
print("---------------------------------------------------------------")
print("The sum of New deaths in each WHO Region are:")
print(df['New deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of New recovered cases in each WHO Region are:")
print(df['New recovered'].sum())
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58,
62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139,
144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186], 'Americas': [5, 6, 1
1, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70, 73, 74, 76, 86, 11
1, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180], 'Eastern Mediterr
anean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128, 136, 145, 153, 15
9, 163, 171, 176, 182, 184], 'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 4
6, 47, 56, 60, 61, 64, 65, 67, 68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 1
01, 102, 108, 112, 113, 115, 120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 15
2, 157, 161, 162, 165, 172, 175, 177, 179], 'South-East Asia': [13, 19, 27, 79, 8
0, 106, 119, 158, 167, 168], 'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105,
114, 121, 130, 133, 150, 155, 164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
---------------------------------------------------------------
The sum of Confirmed cases in each WHO Region are:
WHO Region
Africa 723207
Americas 8839286
Eastern Mediterranean 1490744
Europe 3299523
South-East Asia 1835297
Western Pacific 292428
Name: Confirmed, dtype: int64
---------------------------------------------------------------
The sum of Deaths in each WHO Region are:
WHO Region
Africa 12223
Americas 342732
Eastern Mediterranean 38339
Europe 211144
South-East Asia 41349
Western Pacific 8249
Name: Deaths, dtype: int64
---------------------------------------------------------------
The sum of Recovered cases in each WHO Region are:
WHO Region
Africa 440645
Americas 4468616
Eastern Mediterranean 1201400
Europe 1993723
South-East Asia 1156933
Western Pacific 206770
Name: Recovered, dtype: int64
---------------------------------------------------------------
The sum of Active cases in each WHO Region are:
WHO Region
Africa 270339
Americas 4027938
Eastern Mediterranean 251005
Europe 1094656
South-East Asia 637015
Western Pacific 77409
Name: Active, dtype: int64
---------------------------------------------------------------
The sum of New cases in each WHO Region are:
WHO Region
Africa 12176
Americas 129531
Eastern Mediterranean 12410
Europe 22294
South-East Asia 48993
Western Pacific 3289
Name: New cases, dtype: int64
---------------------------------------------------------------
The sum of New deaths in each WHO Region are:
WHO Region
Africa 353
Americas 3555
Eastern Mediterranean 445
Europe 304
South-East Asia 734
Western Pacific 24
Name: New deaths, dtype: int64
---------------------------------------------------------------
The sum of New recovered cases in each WHO Region are:
WHO Region
Africa 14563
Americas 94776
Eastern Mediterranean 14843
Europe 11732
South-East Asia 37582
Western Pacific 1127
Name: New recovered, dtype: int64
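
The per-column sums above can also be collected in a single table. The sketch below shows one possible way on a hand-made frame; the four rows are invented for illustration and the column list is shortened to two columns.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "WHO Region": ["Europe", "Africa", "Europe", "Africa"],
    "Confirmed": [100, 40, 60, 10],
    "Deaths": [5, 1, 3, 0],
})

# Selecting several columns before .sum() yields one DataFrame with a row
# per region. groupby sorts the group keys by default, so the regions
# come out in ascending order.
totals = toy.groupby("WHO Region")[["Confirmed", "Deaths"]].sum()
print(totals)
```

With the real dataset, the same pattern would take the full list of columns that are summed one by one above.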

Matplotlib

Matplotlib is a Python library used for plotting. It can create
various types of plots, such as line plots, histograms, and
scatter plots.

In this case study we will use it to plot the data of the COVID-19
dataset.

1. Plotting a Line graph to compare the Confirmed Cases in all the countries

In [ ]: import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(30,25))
plt.plot(df['Country/Region'],df['Confirmed'],marker='h')
plt.xlabel('Country/Region',fontsize=25,color='b')
plt.xticks(rotation=45, ha='right', size=5)
plt.ylabel('Confirmed',fontsize=25,color='b')
plt.title('Confirmed Cases', fontsize=30, fontweight='bold',style='italic',color
plt.show()

The above graph visualizes the confirmed cases of every country
in the dataset.

In the above code:

plt is the alias for the matplotlib.pyplot module.
.figure() is used to create a new figure.
figsize is used to set the size of the figure in inches.
.plot() is used to plot the data.
df['Country/Region'] supplies the x-values from the
'Country/Region' column.
df['Confirmed'] supplies the y-values from the
'Confirmed' column.
marker is used to set the marker for the data points ('h' is a hexagon).
.xlabel() is used to set the label for the x-axis.
.xticks() is used to set the ticks for the x-axis.
rotation is used to rotate the tick labels.
ha is used to set the horizontal alignment of the tick labels.
size is used to set the font size of the tick labels.
.ylabel() is used to set the label for the y-axis.
.title() is used to set the title of the plot.
.show() is used to display the plot.
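
With 187 countries on one axis the tick labels are hard to read. As a hedged variation on the calls listed above, the sketch below plots only the largest few rows. The toy frame and the choice of three rows are assumptions made for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E"],
    "Confirmed": [50, 400, 120, 10, 900],
})

# Keep only the three largest rows so the x-axis labels stay readable.
top = toy.nlargest(3, "Confirmed")

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(top["Country/Region"], top["Confirmed"], marker="h")
ax.set_xlabel("Country/Region")
ax.set_ylabel("Confirmed")
ax.set_title("Confirmed Cases (top 3)")
ax.tick_params(axis="x", labelrotation=45)
```

Calling plt.show() afterwards displays the figure, as in the cell above.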
2. Plotting a Stacked bar graph of "Confirmed" and "Recovered" cases of each Country

In [ ]: import matplotlib.pyplot as plt


import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(80,60))
plt.bar(df['Country/Region'],df['Confirmed'],color='red',label='Confirmed')
plt.bar(df['Country/Region'],df['Recovered'],color='green',label='Recovered')
plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)
plt.yticks(size=30)

# Add values above each bar, offset upward in data units
for i, (confirmed, recover) in enumerate(zip(df['Confirmed'], df['Recovered'])):
    plt.annotate(str(confirmed), xy=(i, confirmed+150000), va='center', ha='center',
                 fontsize=13, fontweight='bold',rotation=90)
    plt.annotate(str(recover), xy=(i, confirmed+50000), va='center', ha='center',
                 fontsize=13, fontweight='bold',rotation=90)

plt.ylabel('Confirmed',fontsize=50,color='b')
plt.title('Confirmed and Recovered Cases', fontsize=100, fontweight='bold',style
plt.legend()
plt.show()
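
In the cell above both plt.bar() calls start at zero, so the green bars are drawn over the red ones. For bars that are stacked on top of each other, Matplotlib offers the bottom argument. The sketch below is a minimal example on invented numbers; stacking 'Recovered' and 'Active' is our assumption for the illustration, not a step of the case study.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "Recovered": [30, 10, 50],
    "Active": [20, 5, 15],
})

fig, ax = plt.subplots(figsize=(6, 4))
lower = ax.bar(toy["Country/Region"], toy["Recovered"],
               color="green", label="Recovered")
# bottom= starts the second series where the first one ends, which is
# what makes the bars stacked rather than overlaid.
upper = ax.bar(toy["Country/Region"], toy["Active"],
               bottom=toy["Recovered"], color="red", label="Active")
ax.legend()
```

Each red segment now begins at the top of its green segment, so the total bar height is the sum of the two columns.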

3. Plotting a Horizontal bar graph of the Death and Recovered cases of each Country
In [ ]: import matplotlib.pyplot as plt

plt.figure(figsize=(75,150))
plt.barh(df['Country/Region'], df['Deaths'], color='red', label='Deaths')
plt.barh(df['Country/Region'], df['Recovered'], color='green', label='Recovered'
plt.xlabel('Cases')
plt.yticks(rotation=45, ha='right', size=23)
plt.xticks(size=20)
plt.ylabel('Country/Region', fontsize=50, color='b')
plt.title('Death and Recovery Cases', fontsize=100, fontweight='bold',style='ita
plt.xlabel('Cases', fontsize=50, color='b')
plt.legend()

# Add values to the right of each bar, offset in data units
for i, (death, recover) in enumerate(zip(df['Deaths'], df['Recovered'])):
    plt.annotate(str(death), xy=(death+recover+30000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')
    plt.annotate(str(recover), xy=(death+recover+90000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')

plt.show()
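
The value labels above are placed by hand with fixed offsets. As a possible alternative, the sketch below uses Axes.bar_label(), which we assume is available (it was added in Matplotlib 3.4). The toy numbers and the use of left= to stack the horizontal bars are assumptions for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "Deaths": [5, 1, 8],
    "Recovered": [30, 10, 50],
})

fig, ax = plt.subplots(figsize=(6, 3))
deaths = ax.barh(toy["Country/Region"], toy["Deaths"],
                 color="red", label="Deaths")
# left= plays the role that bottom= plays for vertical bars.
recovered = ax.barh(toy["Country/Region"], toy["Recovered"],
                    left=toy["Deaths"], color="green", label="Recovered")
# label_type="center" writes each segment's own length inside the segment.
labels = ax.bar_label(recovered, label_type="center")
ax.legend()
```

Because the labels are attached to the bars themselves, they stay in place when the figure size or the data range changes.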
3. Vertical Stacked bar graph of the Countries vs Deaths / 100 Cases, Recovered /
100 Cases, Deaths / 100 Recovered

In [ ]: import matplotlib.pyplot as plt


import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(80,75))
plt.bar(df['Country/Region'],df['Deaths / 100 Cases'],color='red',
        label='Deaths for every 100 Cases')
plt.bar(df['Country/Region'],df['Recovered / 100 Cases'],color='green',
        label='Recovered for every 100 Cases',bottom=df['Deaths / 100 Cases'])
plt.bar(df['Country/Region'],df['Deaths / 100 Recovered'],color='blue',
        label='Deaths for every 100 Recovered',
        bottom=df['Deaths / 100 Cases']+df['Recovered / 100 Cases'])

plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)

# Add values above each bar, offset upward in data units
for i, (death, recover, death_rec) in enumerate(zip(df['Deaths / 100 Cases'],
                                                    df['Recovered / 100 Cases'],
                                                    df['Deaths / 100 Recovered'])):
    plt.annotate(str(death), xy=(i, death_rec+150), va='center', ha='center',
                 color='red', fontsize=15, fontweight='bold',rotation=90)
    plt.annotate(str(recover), xy=(i, death_rec+200), va='center', ha='center',
                 color='green', fontsize=15, fontweight='bold',rotation=90)
    plt.annotate(str(death_rec), xy=(i, death_rec+250), va='center', ha='center',
                 color='blue', fontsize=13, fontweight='bold',rotation=90)

plt.ylabel('Cases', fontsize=50, color='b')


plt.title('Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered',
fontsize=30, fontweight='bold',style='italic',color='r')
plt.legend()
plt.show()
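
The three columns plotted above read like simple ratios. The sketch below assumes each one is 100 times one count divided by another; this definition is our assumption, since the dataset's own formula is not shown here. The helper per_100 and the toy numbers are hypothetical.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "Confirmed": [200, 50, 0],
    "Deaths": [10, 1, 0],
    "Recovered": [100, 0, 0],
})

def per_100(numerator, denominator):
    """Return 100 * numerator / denominator, with NaN where denominator is 0."""
    # .where() keeps non-zero denominators and turns zeros into NaN,
    # so the division never produces infinity.
    safe = denominator.where(denominator != 0)
    return (100 * numerator / safe).round(2)

toy["Deaths / 100 Cases"] = per_100(toy["Deaths"], toy["Confirmed"])
toy["Recovered / 100 Cases"] = per_100(toy["Recovered"], toy["Confirmed"])
toy["Deaths / 100 Recovered"] = per_100(toy["Deaths"], toy["Recovered"])
print(toy)
```

Rows with a zero denominator end up as missing values, which is worth checking for before stacking such columns in a bar chart.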
4. Pie chart of the number of countries in each "WHO Region"

In [ ]: import matplotlib.pyplot as plt


import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(15,10))
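# value_counts() orders regions by frequency, so take the labels from its
# index; unique() returns first-seen order and would mislabel the wedges.
counts=df['WHO Region'].value_counts()
plt.pie(counts,labels=counts.index,
        autopct='%5.1f%%',shadow=True,explode=(0.0,0.0,0.0,0.0,0.0,0.2))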
plt.title('WHO Region')
plt.show()
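
When a pie chart is built from value_counts(), the wedge sizes and the labels have to come from the same object. The sketch below shows on invented rows how the order of value_counts() can differ from the order of unique().

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "WHO Region": ["Europe", "Africa", "Africa", "Africa", "Europe", "Americas"],
})

# value_counts() sorts by frequency, while unique() keeps first-seen order.
counts = toy["WHO Region"].value_counts()
first_seen = toy["WHO Region"].unique().tolist()

fig, ax = plt.subplots(figsize=(5, 5))
# Taking the labels from counts.index keeps every label on its own wedge.
ax.pie(counts, labels=counts.index, autopct="%.1f%%")
ax.set_title("Countries per WHO Region")
```

Here the counts start with 'Africa' while the first-seen order starts with 'Europe', so pairing the two would swap the labels.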
5. Subplots of WHO Region vs Confirmed, Deaths, Recovered, Active, New cases and
New deaths, drawn as scatter plots

In [ ]: import matplotlib.pyplot as plt


import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

fig, ax = plt.subplots(2, 3, figsize=(20, 10))


ax[0, 0].scatter(df['WHO Region'], df['Confirmed'], color='red')
ax[0, 0].set_title('WHO Region vs Confirmed')
ax[0, 1].scatter(df['WHO Region'], df['Deaths'], color='green')
ax[0, 1].set_title('WHO Region vs Deaths')
ax[0, 2].scatter(df['WHO Region'], df['Recovered'], color='blue')
ax[0, 2].set_title('WHO Region vs Recovered')
ax[1, 0].scatter(df['WHO Region'], df['Active'], color='yellow')
ax[1, 0].set_title('WHO Region vs Active')
ax[1, 1].scatter(df['WHO Region'], df['New cases'], color='black')
ax[1, 1].set_title('WHO Region vs New cases')
ax[1, 2].scatter(df['WHO Region'], df['New deaths'], color='orange')
ax[1, 2].set_title('WHO Region vs New deaths')

# Increasing the distance between subplots


plt.subplots_adjust(left=0.2, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace
# Values of the x-axis overlap, so we rotate them
for axis in fig.axes:
    plt.sca(axis)
    plt.xticks(rotation=45, ha='right', size=8)

plt.show()
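
The six panels above repeat the same two calls with a different column each time. A loop over the axes is one way to shorten this; the sketch below assumes a 2 x 2 grid and a toy frame, both chosen only for the illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "WHO Region": ["Europe", "Africa", "Europe", "Americas"],
    "Confirmed": [100, 40, 60, 300],
    "Deaths": [5, 1, 3, 20],
    "Recovered": [80, 30, 50, 200],
    "Active": [15, 9, 7, 80],
})
columns = ["Confirmed", "Deaths", "Recovered", "Active"]

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
# axes.flat walks the grid row by row, pairing each panel with a column.
for panel, column in zip(axes.flat, columns):
    panel.scatter(toy["WHO Region"], toy[column])
    panel.set_title(f"WHO Region vs {column}")
    panel.tick_params(axis="x", labelrotation=45)
fig.tight_layout()
```

Adding another panel then only needs a larger grid and one more entry in the column list.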

6. Histogram of the WHO Region vs Confirmed last week

In [ ]: import matplotlib.pyplot as plt


import pandas as pd

data = pd.read_csv('COVID Dataset.csv')


df = pd.DataFrame(data)

plt.figure(figsize=(15,10))
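# plt.hist() on a text column counts rows, so each bar is the number of
# countries in that region, not a sum of confirmed cases.
plt.hist(df['WHO Region'], bins=6, color='red')
plt.title('Countries per WHO Region')
plt.xlabel('WHO Region', fontsize=20, color='b')
plt.ylabel('Number of countries', fontsize=20, color='b')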

plt.show()
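
A histogram of a text column counts how many rows fall into each category. To show case totals per region instead, the values have to be summed first. The sketch below contrasts the two on invented rows; plotting the results as plain bar charts is our choice for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data standing in for the real CSV (values are illustrative only).
toy = pd.DataFrame({
    "WHO Region": ["Europe", "Africa", "Europe", "Americas"],
    "Confirmed": [100, 40, 60, 300],
})

# Rows per region: what a histogram of the region column shows.
countries_per_region = toy["WHO Region"].value_counts().sort_index()
# Cases per region: needs an explicit sum over the numeric column.
cases_per_region = toy.groupby("WHO Region")["Confirmed"].sum()

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.bar(countries_per_region.index, countries_per_region.values)
left.set_ylabel("Number of countries")
right.bar(cases_per_region.index, cases_per_region.values)
right.set_ylabel("Confirmed")
```

The left panel has the same bar heights a histogram of the region column would give, while the right panel shows the summed cases.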
7. Plotting a grouped bar chart: grouping countries by their "WHO Region", then
summing the Confirmed, Deaths, Recovered and Active cases of each column individually
and printing the data in ascending order of the "WHO Region".

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The sum of Confirmed cases in each WHO Region are:")
print(df['Confirmed'].sum())
print("---------------------------------------------------------------")
print("The sum of Deaths in each WHO Region are:")
print(df['Deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of Recovered cases in each WHO Region are:")
print(df['Recovered'].sum())
print("---------------------------------------------------------------")
print("The sum of Active cases in each WHO Region are:")
print(df['Active'].sum())
print("---------------------------------------------------------------")
print("The sum of New cases in each WHO Region are:")
print(df['New cases'].sum())
print("---------------------------------------------------------------")
print("The sum of New deaths in each WHO Region are:")
print(df['New deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of New recovered cases in each WHO Region are:")
print(df['New recovered'].sum())

#Plotting Grouped Bar Graph for the above data


import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(15,10))
# df is a GroupBy object here, so ngroups gives the number of WHO Regions
Confirmed_bar=np.arange(df.ngroups)
Death_bar=[i+0.1 for i in Confirmed_bar]
Recovered_bar=[i+0.1 for i in Death_bar]
Active_bar=[i+0.1 for i in Recovered_bar]
New_cases_bar=[i+0.1 for i in Active_bar]
New_deaths_bar=[i+0.1 for i in New_cases_bar]
New_recovered_bar=[i+0.1 for i in New_deaths_bar]

print(Confirmed_bar)
print(Death_bar)
print(Recovered_bar)
print(Active_bar)
print(New_cases_bar)
print(New_deaths_bar)
print(New_recovered_bar)

plt.bar(Confirmed_bar,df['Confirmed'].sum(),color='red',width=0.1,label='Confirmed')
plt.bar(Death_bar,df['Deaths'].sum(),color='green',width=0.1,label='Deaths')
plt.bar(Recovered_bar,df['Recovered'].sum(),color='blue',width=0.1,label='Recovered')
plt.bar(Active_bar,df['Active'].sum(),color='yellow',width=0.1,label='Active')
plt.bar(New_cases_bar,df['New cases'].sum(),color='black',width=0.1,label='New cases')
plt.bar(New_deaths_bar,df['New deaths'].sum(),color='orange',width=0.1,
        label='New deaths')
plt.bar(New_recovered_bar,df['New recovered'].sum(),color='pink',width=0.1,
        label='New recovered')

# Take the tick labels from the index of the summed result so they match the bars
plt.xticks(Confirmed_bar+0.4,df['Confirmed'].sum().index)
plt.xlabel('WHO Region',fontsize=20,color='b')
plt.ylabel('Cases',fontsize=20,color='b')
plt.title('WHO Region vs Cases',fontsize=30,fontweight='bold',style='italic',co

# Adding the value of the sum of cases on the top of each bar with the help of f
for i in range(len(Confirmed_bar)):
plt.annotate(text=df['Confirmed'].sum()[i], xy=(Confirmed_bar[i],df['Confirm
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['Deaths'].sum()[i], xy=(Death_bar[i],df['Deaths'].sum()
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['Recovered'].sum()[i], xy=(Recovered_bar[i],df['Recover
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['Active'].sum()[i], xy=(Active_bar[i],df['Active'].sum(
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['New cases'].sum()[i], xy=(New_cases_bar[i],df['New cas
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['New deaths'].sum()[i], xy=(New_deaths_bar[i],df['New d
textcoords="offset points", ha='center', va='bottom', rotation=
plt.annotate(text=df['New recovered'].sum()[i], xy=(New_recovered_bar[i],df[
textcoords="offset points", ha='center', va='bottom', rotation=

# .annotate() is used to add the text on the top of the bar


# text is the value of the sum of cases
# xy is the position of the text
# xytext is the position of the text with respect to the xy
# ha is the horizontal alignment
# va is the vertical alignment
# rotation is the angle of the text
# color is the color of the text
# textcoords is the coordinate of the text = offset points means the text will b

plt.legend()
plt.show()
[0 1 2 3 4 5]
[0.1, 1.1, 2.1, 3.1, 4.1, 5.1]
[0.2, 1.2000000000000002, 2.2, 3.2, 4.199999999999999, 5.199999999999999]
[0.30000000000000004, 1.3000000000000003, 2.3000000000000003, 3.3000000000000003,
4.299999999999999, 5.299999999999999]
[0.4, 1.4000000000000004, 2.4000000000000004, 3.4000000000000004, 4.3999999999999
99, 5.399999999999999]
[0.5, 1.5000000000000004, 2.5000000000000004, 3.5000000000000004, 4.4999999999999
98, 5.499999999999998]
[0.6, 1.6000000000000005, 2.6000000000000005, 3.6000000000000005, 4.5999999999999
98, 5.599999999999998]
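
The bar positions printed above are built by adding 0.1 to the previous list each time. As a hedged alternative, the sketch below computes every offset from the bar index and a single width value. The totals table is invented, and three columns are used instead of seven to keep the example short.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy per-region totals standing in for the grouped sums (illustrative only).
totals = pd.DataFrame(
    {"Confirmed": [50, 160], "Deaths": [1, 8], "Recovered": [40, 130]},
    index=["Africa", "Europe"],
)

width = 0.25
x = np.arange(len(totals))  # one slot per region

fig, ax = plt.subplots(figsize=(6, 4))
for i, column in enumerate(totals.columns):
    # Series i is shifted right by i bar widths within each slot.
    ax.bar(x + i * width, totals[column], width=width, label=column)

# Centre the tick under each group of bars.
ax.set_xticks(x + width * (len(totals.columns) - 1) / 2)
ax.set_xticklabels(totals.index)
ax.legend()
```

Changing the number of columns or the bar width then needs no further edits to the position code.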
