
Case Study Final - Group 05 - Lokesh R M

Uploaded by Marthala Jagruthi

GITAM University, Bangalore

Analyzing the COVID-19 Pandemic with Pandas and Matplotlib: A Case Study
Programming with Python

Submitted by:
LOKESH R M BU22CSEN0100145
INDLA ABHISHEK BU22CSEN0101761
VISHNUPRIYA T BU22CSEN0100996
DERANGULA DIVYA SANDHYA BU22CSEN0101184
YAKKANTI PAVAN BU22CSEN0101230

5-6-2023

Introduction:
The COVID-19 pandemic has affected the entire world in
unprecedented ways. With the continuous spread of the virus,
understanding its impact and tracking its growth has become
crucial. One way to achieve this is through data analysis, which
can help identify patterns and trends that aid in predicting the
spread of the virus and making informed decisions.

This case study explores a COVID-19 dataset from Kaggle, which
contains one row per country or region (187 in total) with the
cumulative confirmed cases, deaths, recoveries, and active cases,
together with daily and one-week changes. We will use the
Python data analysis library Pandas to clean, manipulate, and
analyze the dataset. Additionally, we will use the data
visualization library Matplotlib to create visualizations that
help us interpret and communicate our findings effectively. By
the end of this case study, you will have a deeper understanding
of how to use Pandas and Matplotlib to analyze and visualize
COVID-19 data.

Aim of the Case Study

The aim of this case study is to analyze COVID-19 data using
Pandas and Matplotlib. We will use the Pandas data analysis
library to clean, manipulate, and analyze the data, and then we
will use the Matplotlib data visualization library to create
visualizations that help us interpret the data.

Pandas:
Pandas is a Python library used for working with data sets. It has
functions for analyzing, cleaning, exploring, and manipulating
data. The name "Pandas" refers to both "Panel Data" and
"Python Data Analysis"; the library was created by Wes McKinney in
2008.

In this case study, we will use Pandas to analyze the COVID-19
data.

1. Reading the data from the CSV file

To read the data from the CSV file, we will use the read_csv()
function. The read_csv() function is part of the Pandas library,
and will read the CSV file into a DataFrame.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df)
Country/Region Confirmed Deaths Recovered Active New cases
0 Afghanistan 36263 1269 25198 9796 106 \
1 Albania 4880 144 2745 1991 117
2 Algeria 27973 1163 18837 7973 616
3 Andorra 907 52 803 52 10
4 Angola 950 41 242 667 18
.. ... ... ... ... ... ...
182 West Bank and Gaza 10621 78 3752 6791 152
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
0 10 18 3.50 69.49 \
1 6 63 2.95 56.25
2 8 749 4.16 67.34
3 0 0 5.73 88.53
4 1 0 4.32 25.47
.. ... ... ... ...
182 2 0 0.73 35.33
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04

Deaths / 100 Recovered Confirmed last week 1 week change

0 5.04 35526 737 \
1 5.25 4171 709
2 6.17 23691 4282
3 6.48 884 23
4 16.94 749 201
.. ... ... ...
182 2.08 8916 1705
183 12.50 10 0
184 57.98 1619 72
185 4.97 3326 1226
186 6.64 1713 991

1 week % increase WHO Region

0 2.07 Eastern Mediterranean
1 17.00 Europe
2 18.07 Africa
3 2.60 Europe
4 26.84 Africa
.. ... ...
182 19.12 Eastern Mediterranean
183 0.00 Africa
184 4.45 Eastern Mediterranean
185 36.86 Africa
186 57.85 Africa

[187 rows x 15 columns]
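Since read_csv() already returns a DataFrame, wrapping its result in pd.DataFrame() (as the cells in this case study do) is not required. The minimal sketch below shows this on a small in-memory CSV. It assumes the first three rows printed above, trimmed to four of the 15 columns, as a stand-in for 'COVID Dataset.csv'.

```python
import io

import pandas as pd

# Stand-in for 'COVID Dataset.csv': the first three rows shown above,
# limited to four columns for brevity.
csv_text = """Country/Region,Confirmed,Deaths,WHO Region
Afghanistan,36263,1269,Eastern Mediterranean
Albania,4880,144,Europe
Algeria,27973,1163,Africa
"""

# read_csv() returns a DataFrame directly; no pd.DataFrame(...) wrapper is needed.
df = pd.read_csv(io.StringIO(csv_text))
print(type(df).__name__)  # DataFrame
print(df)
```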

2. Knowing the basic information about the dataset
It is important to know the basic information about the dataset
before starting the analysis. Pandas provides a function called
info() that is used to print this information.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Country/Region 187 non-null object
1 Confirmed 187 non-null int64
2 Deaths 187 non-null int64
3 Recovered 187 non-null int64
4 Active 187 non-null int64
5 New cases 187 non-null int64
6 New deaths 187 non-null int64
7 New recovered 187 non-null int64
8 Deaths / 100 Cases 187 non-null float64
9 Recovered / 100 Cases 187 non-null float64
10 Deaths / 100 Recovered 187 non-null float64
11 Confirmed last week 187 non-null int64
12 1 week change 187 non-null int64
13 1 week % increase 187 non-null float64
14 WHO Region 187 non-null object
dtypes: float64(4), int64(9), object(2)
memory usage: 22.0+ KB

Using the shape attribute, we can find the number of rows and
columns in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The shape of the given data is:")
print(df.shape)

The shape of the given data is:

(187, 15)
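The shape and size attributes are easy to confuse: shape is a (rows, columns) tuple, while size is the total number of cells, so for the 187 × 15 dataset above size would be 2805. A small sketch using the first three rows and two columns of the data:

```python
import pandas as pd

# First three rows of the dataset, two columns only.
df = pd.DataFrame({"Country/Region": ["Afghanistan", "Albania", "Algeria"],
                   "Confirmed": [36263, 4880, 27973]})

rows, cols = df.shape    # tuple (rows, columns)
print(df.shape)          # (3, 2)
print(df.size)           # 6, i.e. rows * columns
print(len(df))           # 3, the number of rows only
```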

Using the columns attribute, we can find the list of column
names in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The Columns index attributes are:")
print(df.columns)
The Columns index attributes are:
Index(['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'New cases', 'New deaths', 'New recovered', 'Deaths / 100 Cases',
'Recovered / 100 Cases', 'Deaths / 100 Recovered',
'Confirmed last week', '1 week change', '1 week % increase',
'WHO Region'],
dtype='object')

Using the describe() function, we can get the descriptive
statistics of each column in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

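# describe() adapts to the column type: count/unique/top/freq for text
# columns, count/mean/std/min/quartiles/max for numeric columns
for col in df.columns:
    print(f"The Summary Statistics of {col} is:")
    print(df[col].describe())
    print("-----------------------------------------")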
The Summary Statistics of Country/Region is:
count 187
unique 187
top Afghanistan
freq 1
Name: Country/Region, dtype: object
-----------------------------------------
The Summary Statistics of Confirmed is:
count 1.870000e+02
mean 8.813094e+04
std 3.833187e+05
min 1.000000e+01
25% 1.114000e+03
50% 5.059000e+03
75% 4.046050e+04
max 4.290259e+06
Name: Confirmed, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths is:
count 187.000000
mean 3497.518717
std 14100.002482
min 0.000000
25% 18.500000
50% 108.000000
75% 734.000000
max 148011.000000
Name: Deaths, dtype: float64
-----------------------------------------
The Summary Statistics of Recovered is:
count 1.870000e+02
mean 5.063148e+04
std 1.901882e+05
min 0.000000e+00
25% 6.265000e+02
50% 2.815000e+03
75% 2.260600e+04
max 1.846641e+06
Name: Recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Active is:
count 1.870000e+02
mean 3.400194e+04
std 2.133262e+05
min 0.000000e+00
25% 1.415000e+02
50% 1.600000e+03
75% 9.149000e+03
max 2.816444e+06
Name: Active, dtype: float64
-----------------------------------------
The Summary Statistics of New cases is:
count 187.000000
mean 1222.957219
std 5710.374790
min 0.000000
25% 4.000000
50% 49.000000
75% 419.500000
max 56336.000000
Name: New cases, dtype: float64
-----------------------------------------
The Summary Statistics of New deaths is:
count 187.000000
mean 28.957219
std 120.037173
min 0.000000
25% 0.000000
50% 1.000000
75% 6.000000
max 1076.000000
Name: New deaths, dtype: float64
-----------------------------------------
The Summary Statistics of New recovered is:
count 187.000000
mean 933.812834
std 4197.719635
min 0.000000
25% 0.000000
50% 22.000000
75% 221.000000
max 33728.000000
Name: New recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths / 100 Cases is:
count 187.000000
mean 3.019519
std 3.454302
min 0.000000
25% 0.945000
50% 2.150000
75% 3.875000
max 28.560000
Name: Deaths / 100 Cases, dtype: float64
-----------------------------------------
The Summary Statistics of Recovered / 100 Cases is:
count 187.000000
mean 64.820535
std 26.287694
min 0.000000
25% 48.770000
50% 71.320000
75% 86.885000
max 100.000000
Name: Recovered / 100 Cases, dtype: float64
-----------------------------------------
The Summary Statistics of Deaths / 100 Recovered is:
count 187.00
mean inf
std NaN
min 0.00
25% 1.45
50% 3.62
75% 6.44
max inf
Name: Deaths / 100 Recovered, dtype: float64
-----------------------------------------
The Summary Statistics of Confirmed last week is:
count 1.870000e+02
mean 7.868248e+04
std 3.382737e+05
min 1.000000e+01
25% 1.051500e+03
50% 5.020000e+03
75% 3.708050e+04
max 3.834677e+06
Name: Confirmed last week, dtype: float64
-----------------------------------------
The Summary Statistics of 1 week change is:
count 187.000000
mean 9448.459893
std 47491.127684
min -47.000000
25% 49.000000
50% 432.000000
75% 3172.000000
max 455582.000000
Name: 1 week change, dtype: float64
-----------------------------------------
The Summary Statistics of 1 week % increase is:
count 187.000000
mean 13.606203
std 24.509838
min -3.840000
25% 2.775000
50% 6.890000
75% 16.855000
max 226.320000
Name: 1 week % increase, dtype: float64
-----------------------------------------
The Summary Statistics of WHO Region is:
count 187
unique 6
top Europe
freq 56
Name: WHO Region, dtype: object
-----------------------------------------
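The Deaths / 100 Recovered column reports a mean of inf and a std of NaN because some countries have zero recovered cases (for example Syria, with 40 deaths and 0 recovered), so the ratio divides by zero. One possible treatment, sketched below, is to turn infinite values into NaN so that describe() skips them. The two rows are taken from the dataset, and recomputing the ratio from Deaths and Recovered is an assumption about how the column was derived.

```python
import numpy as np
import pandas as pd

# Two rows from the dataset: Syria has 0 recovered cases.
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Syria"],
    "Deaths": [1269, 40],
    "Recovered": [25198, 0],
})

# Recompute the ratio; pandas yields inf for division by zero instead of raising.
df["Deaths / 100 Recovered"] = df["Deaths"] / df["Recovered"] * 100
print(df["Deaths / 100 Recovered"].describe())  # mean is inf

# Treat infinite ratios as missing so summary statistics ignore them.
ratio = df["Deaths / 100 Recovered"].replace([np.inf, -np.inf], np.nan)
print(ratio.describe())  # count drops to 1 and the mean is finite
```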

Using the head() function, we can get the first five rows of the
dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The first 5 rows of the given data is:")
print(df.head())
The first 5 rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases New deaths
0 Afghanistan 36263 1269 25198 9796 106 10 \
1 Albania 4880 144 2745 1991 117 6
2 Algeria 27973 1163 18837 7973 616 8
3 Andorra 907 52 803 52 10 0
4 Angola 950 41 242 667 18 1

New recovered Deaths / 100 Cases Recovered / 100 Cases

0 18 3.50 69.49 \
1 63 2.95 56.25
2 749 4.16 67.34
3 0 5.73 88.53
4 0 4.32 25.47

Deaths / 100 Recovered Confirmed last week 1 week change

0 5.04 35526 737 \
1 5.25 4171 709
2 6.17 23691 4282
3 6.48 884 23
4 16.94 749 201

1 week % increase WHO Region

0 2.07 Eastern Mediterranean
1 17.00 Europe
2 18.07 Africa
3 2.60 Europe
4 26.84 Africa

Using the tail() function, we can get the last five rows of the
dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The last 5 rows of the given data is:")
print(df.tail())
The last 5 rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases
182 West Bank and Gaza 10621 78 3752 6791 152 \
183 Western Sahara 10 1 8 1 0
184 Yemen 1691 483 833 375 10
185 Zambia 4552 140 2815 1597 71
186 Zimbabwe 2704 36 542 2126 192

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
182 2 0 0.73 35.33 \
183 0 0 10.00 80.00
184 4 36 28.56 49.26
185 1 465 3.08 61.84
186 2 24 1.33 20.04

Deaths / 100 Recovered Confirmed last week 1 week change

182 2.08 8916 1705 \
183 12.50 10 0
184 57.98 1619 72
185 4.97 3326 1226
186 6.64 1713 991

1 week % increase WHO Region

182 19.12 Eastern Mediterranean
183 0.00 Africa
184 4.45 Eastern Mediterranean
185 36.86 Africa
186 57.85 Africa

Using the sample() function, we can get a random sample of rows
from the dataset. Because no random_state is passed, a different
set of rows is returned each time the cell is run.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The 5 random rows of the given data is:")
print(df.sample(5))
The 5 random rows of the given data is:
Country/Region Confirmed Deaths Recovered Active New cases
95 Latvia 1219 31 1045 143 0 \
0 Afghanistan 36263 1269 25198 9796 106
105 Malaysia 8904 124 8601 179 7
116 Morocco 20887 316 16553 4018 609
163 Syria 674 40 0 634 24

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
95 0 0 2.54 85.73 \
0 10 18 3.50 69.49
105 0 1 1.39 96.60
116 3 115 1.51 79.25
163 2 0 5.93 0.00

Deaths / 100 Recovered Confirmed last week 1 week change

95 2.97 1192 27 \
0 5.04 35526 737
105 1.44 8800 104
116 1.91 17562 3325
163 inf 522 152

1 week % increase WHO Region

95 2.27 Europe
0 2.07 Eastern Mediterranean
105 1.18 Western Pacific
116 18.93 Eastern Mediterranean
163 29.12 Eastern Mediterranean

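Finding the Minimum, Maximum, Sum, Average, and Count of a Column

It is important to know the minimum, maximum, sum, average, and
count of each column in the dataset. Pandas provides a method for
each of these: min(), max(), sum(), mean(), and count() (the number
of non-null values). The "average" printed below is the same value
as the mean.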

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
numeric_cols = df.select_dtypes(include='number')
col_min = numeric_cols.min()
col_max = numeric_cols.max()
col_mean = numeric_cols.mean()
print("The Minimum values of the numeric columns are:")
print(col_min)
print("---------------------------------------------------------------")
print("The Maximum values of the numeric columns are:")
print(col_max)
print("---------------------------------------------------------------")
print("The Mean values of the numeric columns are:")
print(col_mean)
print("---------------------------------------------------------------")
print("The Sum values of the numeric columns are:")
print(numeric_cols.sum())
print("---------------------------------------------------------------")
print("The Average values of the numeric columns are:")
print(numeric_cols.mean())
print("---------------------------------------------------------------")
print("The Count values of the numeric columns are:")
print(numeric_cols.count())
print("---------------------------------------------------------------")
The Minimum values of the numeric columns are:
Confirmed 10.00
Deaths 0.00
Recovered 0.00
Active 0.00
New cases 0.00
New deaths 0.00
New recovered 0.00
Deaths / 100 Cases 0.00
Recovered / 100 Cases 0.00
Deaths / 100 Recovered 0.00
Confirmed last week 10.00
1 week change -47.00
1 week % increase -3.84
dtype: float64
---------------------------------------------------------------
The Maximum values of the numeric columns are:
Confirmed 4290259.00
Deaths 148011.00
Recovered 1846641.00
Active 2816444.00
New cases 56336.00
New deaths 1076.00
New recovered 33728.00
Deaths / 100 Cases 28.56
Recovered / 100 Cases 100.00
Deaths / 100 Recovered inf
Confirmed last week 3834677.00
1 week change 455582.00
1 week % increase 226.32
dtype: float64
---------------------------------------------------------------
The Mean values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Sum values of the numeric columns are:
Confirmed 16480485.00
Deaths 654036.00
Recovered 9468087.00
Active 6358362.00
New cases 228693.00
New deaths 5415.00
New recovered 174623.00
Deaths / 100 Cases 564.65
Recovered / 100 Cases 12121.44
Deaths / 100 Recovered inf
Confirmed last week 14713623.00
1 week change 1766862.00
1 week % increase 2544.36
dtype: float64
---------------------------------------------------------------
The Average values of the numeric columns are:
Confirmed 8.813094e+04
Deaths 3.497519e+03
Recovered 5.063148e+04
Active 3.400194e+04
New cases 1.222957e+03
New deaths 2.895722e+01
New recovered 9.338128e+02
Deaths / 100 Cases 3.019519e+00
Recovered / 100 Cases 6.482053e+01
Deaths / 100 Recovered inf
Confirmed last week 7.868248e+04
1 week change 9.448460e+03
1 week % increase 1.360620e+01
dtype: float64
---------------------------------------------------------------
The Count values of the numeric columns are:
Confirmed 187
Deaths 187
Recovered 187
Active 187
New cases 187
New deaths 187
New recovered 187
Deaths / 100 Cases 187
Recovered / 100 Cases 187
Deaths / 100 Recovered 187
Confirmed last week 187
1 week change 187
1 week % increase 187
dtype: int64
---------------------------------------------------------------
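The separate calls above can also be combined into a single agg() call that returns one table with a row per statistic. A sketch on the first three rows of the dataset (Confirmed and Deaths only):

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Confirmed": [36263, 4880, 27973],
    "Deaths": [1269, 144, 1163],
})

numeric_cols = df.select_dtypes(include="number")

# One row per statistic, one column per numeric column.
summary = numeric_cols.agg(["min", "max", "sum", "mean", "count"])
print(summary)
```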

3. Cleaning the data

Cleaning the data is the process of converting the data into a
usable format. In this case study, we will clean the data by:

1. Dropping columns that are not needed for the analysis.
2. Renaming columns.
3. Checking for missing values and replacing them.
4. Checking for duplicates and removing them.

Checking for duplicate data and removing it

It is important to check for duplicate data and remove it.
Pandas provides a function called duplicated() that is used to
check for duplicate data. The duplicated() function returns a
Boolean value for each row:

True means the row is a duplicate of an earlier row.
False means the row is not a duplicate.

We can use the sum() function to count the number of duplicate
rows in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
dup=df.duplicated()
print(dup)
print("----------------------------------")
print("The number of duplicate rows are:")
print(sum(dup))

0 False
1 False
2 False
3 False
4 False
...
182 False
183 False
184 False
185 False
186 False
Length: 187, dtype: bool
----------------------------------
The number of duplicate rows are:
0
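The dataset has no duplicate rows, so nothing needs to be removed here. If it did, drop_duplicates() would remove them. A sketch on a hypothetical frame in which the Albania row appears twice:

```python
import pandas as pd

# Hypothetical data: the Albania row is repeated.
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Albania"],
    "Confirmed": [36263, 4880, 4880],
})

print(df.duplicated().sum())  # 1 duplicate row

# keep="first" keeps the first occurrence; ignore_index renumbers rows 0..n-1.
deduped = df.drop_duplicates(keep="first", ignore_index=True)
print(deduped)
```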

Checking for the missing values in the dataset is an important
step in the data cleaning process. Pandas provides the isnull()
function to check for missing values. The isnull() function
returns a boolean value for each cell in the dataset. If a cell
contains a missing value, it will be marked as True, otherwise
it will be marked as False.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

print("The number of null values in each column is:")

print(df.isnull().sum())
The number of null values in each column is:
Country/Region 0
Confirmed 0
Deaths 0
Recovered 0
Active 0
New cases 0
New deaths 0
New recovered 0
Deaths / 100 Cases 0
Recovered / 100 Cases 0
Deaths / 100 Recovered 0
Confirmed last week 0
1 week change 0
1 week % increase 0
WHO Region 0
dtype: int64
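Every column has zero missing values, so no replacement is needed for this dataset. If values were missing, fillna() could replace them. The sketch below uses a hypothetical frame with one missing Recovered count (filled with 0) and one missing WHO Region (filled with the placeholder label "Unknown"); both fill values are illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two gaps.
df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria"],
    "Recovered": [25198, np.nan, 18837],
    "WHO Region": ["Eastern Mediterranean", "Europe", None],
})

print(df.isnull().sum())

# Per-column fill values: 0 for the count, "Unknown" for the label.
filled = df.fillna({"Recovered": 0, "WHO Region": "Unknown"})
print(filled.isnull().sum().sum())  # 0
```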

3. Basic Analysis

Basic analysis is the process of examining the data to find
patterns and trends. In this case study, we will perform basic
analysis by answering the following questions.

1. What are the data types of each column?

In the code below, we use the dtypes attribute to find the data
type of each column in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The data types of each column is:")
print(df.dtypes)

The data types of each column is:

Country/Region object
Confirmed int64
Deaths int64
Recovered int64
Active int64
New cases int64
New deaths int64
New recovered int64
Deaths / 100 Cases float64
Recovered / 100 Cases float64
Deaths / 100 Recovered float64
Confirmed last week int64
1 week change int64
1 week % increase float64
WHO Region object
dtype: object

2. How many unique countries/regions are included in the data?

In the code below, we use the nunique() function to count the
unique countries/regions and the unique WHO Regions in the dataset.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print("The number of unique countries/regions are:")
print(df['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The number of unique WHO Regions are:")
print(df['WHO Region'].nunique())

The number of unique countries/regions are:

187
---------------------------------------------------------------
The number of unique WHO Regions are:
6

3. Printing the Name of Each 'Country' with Its 'WHO Region'

In the code below, we select two columns at once by passing a list
of column names to df[].

df['Country/Region'] holds the name of each country, and
df['WHO Region'] holds the name of its WHO Region.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
print(df[['Country/Region','WHO Region']])

Country/Region WHO Region

0 Afghanistan Eastern Mediterranean
1 Albania Europe
2 Algeria Africa
3 Andorra Europe
4 Angola Africa
.. ... ...
182 West Bank and Gaza Eastern Mediterranean
183 Western Sahara Africa
184 Yemen Eastern Mediterranean
185 Zambia Africa
186 Zimbabwe Africa

[187 rows x 2 columns]

4. Adding a column "Total Cases", which is the sum of the "Deaths", "Active", and
"Recovered" columns, and comparing it with the "Confirmed" column.

In the code below, we add a column "Total Cases" that is the sum
of the "Deaths", "Active", and "Recovered" columns and compare it
with the "Confirmed" column.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df['Total Cases']=df['Deaths']+df['Active']+df['Recovered']
print(df[['Deaths','Active','Recovered','Total Cases','Confirmed']])
print("---------------------------------------------------------------")
print("The number of rows where 'Confirmed' equals 'Total Cases' is:")
print(df[df['Confirmed']==df['Total Cases']].shape[0])

Deaths Active Recovered Total Cases Confirmed
0 1269 9796 25198 36263 36263
1 144 1991 2745 4880 4880
2 1163 7973 18837 27973 27973
3 52 52 803 907 907
4 41 667 242 950 950
.. ... ... ... ... ...
182 78 6791 3752 10621 10621
183 1 1 8 10 10
184 483 375 833 1691 1691
185 140 1597 2815 4552 4552
186 36 2126 542 2704 2704

[187 rows x 5 columns]

---------------------------------------------------------------
The number of rows where 'Confirmed' equals 'Total Cases' is:
187
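Because Confirmed should equal Deaths + Active + Recovered, the comparison can be reduced to a single Boolean check, and any rows that break the identity can be listed. The sketch uses the first three rows of the dataset plus one hypothetical row, "Example", whose numbers deliberately do not add up.

```python
import pandas as pd

df = pd.DataFrame({
    # "Example" is a hypothetical, inconsistent row added for illustration.
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Example"],
    "Confirmed": [36263, 4880, 27973, 100],
    "Deaths": [1269, 144, 1163, 5],
    "Active": [9796, 1991, 7973, 20],
    "Recovered": [25198, 2745, 18837, 70],
})

total = df["Deaths"] + df["Active"] + df["Recovered"]
matches = df["Confirmed"] == total

print(matches.all())                                  # False: one row does not add up
print(df.loc[~matches, "Country/Region"].tolist())    # ['Example']
```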

4. Sorting

Sorting is the process of arranging the data in a particular
order. In this case study, we will sort the data as follows.

1. Sorting the data by the number of confirmed cases in ascending order.

In the code below, we use the sort_values() function to sort the
data by the number of confirmed cases in ascending order.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.sort_values(by=['Confirmed'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The Country/Region with the lowest number of confirmed cases is:")
print(df.iloc[0,0])
Country/Region Confirmed
183 Western Sahara 10
75 Holy See 12
68 Greenland 14
140 Saint Kitts and Nevis 17
49 Dominica 18
.. ... ...
154 South Africa 452529
138 Russia 816680
79 India 1480073
23 Brazil 2442375
173 US 4290259

[187 rows x 2 columns]

---------------------------------------------------------------
The Country/Region with the lowest number of confirmed cases is:
Western Sahara
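To find a single extreme value, sorting the whole frame is not required: idxmax() and idxmin() return the index label of the largest and smallest value, and nlargest() returns the top rows. A sketch on five rows taken from the sorted output above:

```python
import pandas as pd

# Rows taken from the sorted output above.
df = pd.DataFrame({
    "Country/Region": ["Western Sahara", "Holy See", "India", "Brazil", "US"],
    "Confirmed": [10, 12, 1480073, 2442375, 4290259],
})

highest = df.loc[df["Confirmed"].idxmax(), "Country/Region"]
lowest = df.loc[df["Confirmed"].idxmin(), "Country/Region"]
print(highest, lowest)  # US Western Sahara

# Top three countries by confirmed cases.
print(df.nlargest(3, "Confirmed"))
```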

2. Sorting the data by the alphabetical order of the countries in ascending order.

In the code below, we use the sort_values() function to sort the
data alphabetically by country name in ascending order. The file
is already in alphabetical order, so the output keeps the original
row order.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

df=df.sort_values(['Country/Region'],ascending=True)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")

Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
.. ... ...
182 West Bank and Gaza 10621
183 Western Sahara 10
184 Yemen 1691
185 Zambia 4552
186 Zimbabwe 2704

[187 rows x 2 columns]

---------------------------------------------------------------

3. Finding the Lowest and the Highest number of confirmed cases.

In the code below, we sort the data by the number of confirmed
cases in descending order, so the first row holds the highest
count and the last row holds the lowest.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

# Sort by 'Confirmed' alone, largest first
df=df.sort_values(by='Confirmed',ascending=False)
print(df[['Country/Region','Confirmed']])
print("---------------------------------------------------------------")
print("The Country/Region with the highest number of confirmed cases is:")
print(df.iloc[0,0])
print("---------------------------------------------------------------")
print("The Country/Region with the lowest number of confirmed cases is:")
print(df.iloc[-1,0])

[187 rows x 2 columns]

---------------------------------------------------------------
The Country/Region with the highest number of confirmed cases is:
US
---------------------------------------------------------------
The Country/Region with the lowest number of confirmed cases is:
Western Sahara

4. Extracting the rows for countries whose names start with 'I'

In the code below, we use the .str.startswith() method to select
the rows whose 'Country/Region' name starts with 'I'.

Here .str is the string accessor of a pandas Series, and
.startswith('I') returns a Boolean Series that is True for each
name beginning with 'I' (the match is case-sensitive). Indexing df
with that Series keeps only the matching rows.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['Country/Region'].str.startswith('I')]
print(df[['Country/Region','Deaths','Recovered']])
print("---------------------------------------------------------------")
print("The number of countries starting with 'I' are:")
print(df['Country/Region'].nunique())
Country/Region Deaths Recovered
78 Iceland 10 1823
79 India 33408 951166
80 Indonesia 4838 58173
81 Iran 15912 255144
82 Iraq 4458 77144
83 Ireland 1764 23364
84 Israel 474 27133
85 Italy 35112 198593
---------------------------------------------------------------
The number of countries starting with 'I' are:
8

5. Extracting the data with loc[]

In the code below, we use the loc[] indexer to extract the data.
loc[] selects rows (and optionally columns) by label or by a
Boolean mask.

Here we pass a Boolean mask that is True only where 'Country/Region'
equals 'India', so loc[] returns the row for India. The index itself
is still the default integer index, which is why the row is labelled 79.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.loc[df['Country/Region']=='India']
print(df)

Country/Region Confirmed Deaths Recovered Active New cases

79 India 1480073 33408 951166 495499 44457 \

New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases
79 637 33598 2.26 64.26 \

Deaths / 100 Recovered Confirmed last week 1 week change

79 3.51 1155338 324735 \

1 week % increase WHO Region

79 28.11 South-East Asia
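An alternative is to make Country/Region the index, after which loc[] can look a country up by its name directly. A sketch, assuming a three-row excerpt with the Deaths and Recovered values shown earlier for these countries:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Iceland", "India", "Indonesia"],
    "Deaths": [10, 33408, 4838],
    "Recovered": [1823, 951166, 58173],
})

by_country = df.set_index("Country/Region")

# A single label returns that row as a Series.
india = by_country.loc["India"]
print(india)

# A row label plus a column name returns a single value.
print(by_country.loc["India", "Deaths"])  # 33408
```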

6. Extracting a range of data by position with iloc[]

In the code below, we use the iloc[] indexer, which selects rows
and columns by their integer position.

In iloc[0:10, 0:2], the first slice 0:10 selects rows 0 to 9 and the
second slice 0:2 selects columns 0 and 1; as with Python slices, the
end position is excluded.
In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.iloc[0:10,0:2]
print(df)

Country/Region Confirmed
0 Afghanistan 36263
1 Albania 4880
2 Algeria 27973
3 Andorra 907
4 Angola 950
5 Antigua and Barbuda 86
6 Argentina 167416
7 Armenia 37390
8 Australia 15303
9 Austria 20558
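A common pitfall is that iloc[] slices exclude the end position, while loc[] slices work on labels and include the end label. A sketch on the first five rows of the dataset, which still use the default integer index:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
})

by_position = df.iloc[0:3, 0:2]                           # rows 0-2, columns 0-1
by_label = df.loc[0:3, ["Country/Region", "Confirmed"]]  # rows 0-3, end label included

print(by_position.shape)  # (3, 2)
print(by_label.shape)     # (4, 2)
```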

7. Extracting the data based on the 'WHO Region'

In the code below, we use the groupby() function to split the rows
into groups by 'WHO Region'. The groups attribute maps each region
to the row indices that belong to it.

We then apply nunique() to the 'Country/Region' column of each group
to count the number of unique countries in each WHO Region.

In [ ]: # groupby() function by the 'WHO Region' column

import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df.groupby('WHO Region')
print(df.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(df['Country/Region'].nunique())
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58, 62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139, 144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186],
'Americas': [5, 6, 11, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70, 73, 74, 76, 86, 111, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180],
'Eastern Mediterranean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128, 136, 145, 153, 159, 163, 171, 176, 182, 184],
'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 46, 47, 56, 60, 61, 64, 65, 67, 68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 101, 102, 108, 112, 113, 115, 120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 152, 157, 161, 162, 165, 172, 175, 177, 179],
'South-East Asia': [13, 19, 27, 79, 80, 106, 119, 158, 167, 168],
'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105, 114, 121, 130, 133, 150, 155, 164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
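Instead of one call per statistic on the grouped object, pandas' named aggregation can compute several per-region figures in one step. A sketch on six rows from the dataset; the output column names countries, confirmed, and deaths are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola", "Yemen"],
    "Confirmed": [36263, 4880, 27973, 907, 950, 1691],
    "Deaths": [1269, 144, 1163, 52, 41, 483],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa",
                   "Europe", "Africa", "Eastern Mediterranean"],
})

# Named aggregation: new_column=(input_column, function).
per_region = df.groupby("WHO Region").agg(
    countries=("Country/Region", "nunique"),
    confirmed=("Confirmed", "sum"),
    deaths=("Deaths", "sum"),
)
print(per_region)
```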

8. Creating a new column called 'Review' that rates each country by comparing its
Deaths and Recovered cases.

In [ ]: import pandas as pd

data = pd.read_csv('COVID Dataset.csv')

df = pd.DataFrame(data)

# Create a new column called Review

df['Review'] = 'Not Reviewed'

# Set Review based on Death/Recovered values

df.loc[df['Deaths']>df['Recovered'],'Review'] = 'High Risk'
df.loc[df['Deaths']<=df['Recovered'],'Review'] = 'Low Risk'

# Get count of High/Low risk and total count

high_risk_count = df.loc[df['Review'] == 'High Risk', 'Review'].count()
low_risk_count = df.loc[df['Review'] == 'Low Risk', 'Review'].count()
total_count = df['Review'].count()

print(f"High Risk Count: {high_risk_count}")

print(f"Low Risk Count: {low_risk_count}")
print(f"Total Count: {total_count}")
print("---------------------------------------------------------------")
print(df[['Country/Region','Review']])
print("---------------------------------------------------------------")
print("The countries with High Risk are:")
df.loc[df['Review']=='High Risk',['Country/Region']]
High Risk Count: 7
Low Risk Count: 180
Total Count: 187
---------------------------------------------------------------
Country/Region Review
0 Afghanistan Low Risk
1 Albania Low Risk
2 Algeria Low Risk
3 Andorra Low Risk
4 Angola Low Risk
.. ... ...
182 West Bank and Gaza Low Risk
183 Western Sahara Low Risk
184 Yemen Low Risk
185 Zambia Low Risk
186 Zimbabwe Low Risk

[187 rows x 2 columns]

---------------------------------------------------------------
The countries with High Risk are:
Out[ ]: Country/Region
32 Canada
117 Mozambique
120 Netherlands
147 Serbia
161 Sweden
163 Syria
177 United Kingdom
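The two loc[] assignments above can be written as one vectorized np.where() call, and value_counts() then returns both counts at once. A sketch using three rows from the dataset (Syria has 40 deaths and 0 recovered cases), applying the same Deaths > Recovered rule:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Syria"],
    "Deaths": [1269, 144, 40],
    "Recovered": [25198, 2745, 0],
})

# Same rule as above: more deaths than recoveries -> 'High Risk'.
df["Review"] = np.where(df["Deaths"] > df["Recovered"], "High Risk", "Low Risk")

counts = df["Review"].value_counts()
print(counts)
print(df.loc[df["Review"] == "High Risk", "Country/Region"].tolist())  # ['Syria']
```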

9. Printing the data of the 'WHO Regions' ending with 's'

In the code below, we use the .str.endswith() method to select
the rows whose 'WHO Region' name ends with 's'.

Here .str is the string accessor of a pandas Series, and
.endswith('s') returns a Boolean Series that is True for each
region name ending in 's'. Of the six WHO Regions, only 'Americas'
ends with 's', so the result lists the countries in the Americas.

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)
df=df[df['WHO Region'].str.endswith('s')]
print(df[['Country/Region','WHO Region']])
Country/Region WHO Region
5 Antigua and Barbuda Americas
6 Argentina Americas
11 Bahamas Americas
14 Barbados Americas
17 Belize Americas
20 Bolivia Americas
23 Brazil Americas
32 Canada Americas
35 Chile Americas
37 Colombia Americas
41 Costa Rica Americas
44 Cuba Americas
49 Dominica Americas
50 Dominican Republic Americas
51 Ecuador Americas
53 El Salvador Americas
69 Grenada Americas
70 Guatemala Americas
73 Guyana Americas
74 Haiti Americas
76 Honduras Americas
86 Jamaica Americas
111 Mexico Americas
122 Nicaragua Americas
129 Panama Americas
131 Paraguay Americas
132 Peru Americas
140 Saint Kitts and Nevis Americas
141 Saint Lucia Americas
142 Saint Vincent and the Grenadines Americas
160 Suriname Americas
170 Trinidad and Tobago Americas
173 US Americas
178 Uruguay Americas
180 Venezuela Americas
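The same filter can be tried on a few made-up rows. This sketch also passes na=False, an optional argument of Series.str.endswith, so that a missing region (None here, a hypothetical gap not present in the original output) counts as "no match" instead of producing a missing value in the mask.

```python
import pandas as pd

# Made-up toy rows; the last one has no WHO Region on purpose
df = pd.DataFrame({
    'Country/Region': ['Brazil', 'Canada', 'France', 'India', 'Chad'],
    'WHO Region': ['Americas', 'Americas', 'Europe', 'South-East Asia', None],
})

# Boolean Series: True where the region name ends with 's'; missing values become False
mask = df['WHO Region'].str.endswith('s', na=False)

# Indexing with the mask keeps only the matching rows
americas = df[mask]
print(americas[['Country/Region', 'WHO Region']])
```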

10. Grouping countries by their "WHO Region", then adding up the Confirmed, Deaths, Recovered and Active cases individually for each column, and printing the data in ascending order of "WHO Region".

In [ ]: import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

# Group the rows by region; groupby sorts the region names in ascending order
grouped=df.groupby('WHO Region')
print(grouped.groups)
print("---------------------------------------------------------------")
print("The number of countries in each WHO Region are:")
print(grouped['Country/Region'].nunique())
print("---------------------------------------------------------------")
print("The sum of Confirmed cases in each WHO Region are:")
print(grouped['Confirmed'].sum())
print("---------------------------------------------------------------")
print("The sum of Deaths in each WHO Region are:")
print(grouped['Deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of Recovered cases in each WHO Region are:")
print(grouped['Recovered'].sum())
print("---------------------------------------------------------------")
print("The sum of Active cases in each WHO Region are:")
print(grouped['Active'].sum())
print("---------------------------------------------------------------")
print("The sum of New cases in each WHO Region are:")
print(grouped['New cases'].sum())
print("---------------------------------------------------------------")
print("The sum of New deaths in each WHO Region are:")
print(grouped['New deaths'].sum())
print("---------------------------------------------------------------")
print("The sum of New recovered cases in each WHO Region are:")
print(grouped['New recovered'].sum())
{'Africa': [2, 4, 18, 22, 26, 28, 29, 31, 33, 34, 38, 39, 40, 42, 54, 55, 57, 58, 62, 63, 66, 71, 72, 90, 97, 98, 103, 104, 107, 109, 110, 117, 118, 123, 124, 139, 144, 146, 148, 149, 154, 156, 166, 169, 174, 183, 185, 186],
 'Americas': [5, 6, 11, 14, 17, 20, 23, 32, 35, 37, 41, 44, 49, 50, 51, 53, 69, 70, 73, 74, 76, 86, 111, 122, 129, 131, 132, 140, 141, 142, 160, 170, 173, 178, 180],
 'Eastern Mediterranean': [0, 12, 48, 52, 81, 82, 88, 92, 96, 99, 116, 127, 128, 136, 145, 153, 159, 163, 171, 176, 182, 184],
 'Europe': [1, 3, 7, 9, 10, 15, 16, 21, 25, 43, 45, 46, 47, 56, 60, 61, 64, 65, 67, 68, 75, 77, 78, 83, 84, 85, 89, 91, 93, 95, 100, 101, 102, 108, 112, 113, 115, 120, 125, 126, 134, 135, 137, 138, 143, 147, 151, 152, 157, 161, 162, 165, 172, 175, 177, 179],
 'South-East Asia': [13, 19, 27, 79, 80, 106, 119, 158, 167, 168],
 'Western Pacific': [8, 24, 30, 36, 59, 87, 94, 105, 114, 121, 130, 133, 150, 155, 164, 181]}
---------------------------------------------------------------
The number of countries in each WHO Region are:
WHO Region
Africa 48
Americas 35
Eastern Mediterranean 22
Europe 56
South-East Asia 10
Western Pacific 16
Name: Country/Region, dtype: int64
---------------------------------------------------------------
The sum of Confirmed cases in each WHO Region are:
WHO Region
Africa 723207
Americas 8839286
Eastern Mediterranean 1490744
Europe 3299523
South-East Asia 1835297
Western Pacific 292428
Name: Confirmed, dtype: int64
---------------------------------------------------------------
The sum of Deaths in each WHO Region are:
WHO Region
Africa 12223
Americas 342732
Eastern Mediterranean 38339
Europe 211144
South-East Asia 41349
Western Pacific 8249
Name: Deaths, dtype: int64
---------------------------------------------------------------
The sum of Recovered cases in each WHO Region are:
WHO Region
Africa 440645
Americas 4468616
Eastern Mediterranean 1201400
Europe 1993723
South-East Asia 1156933
Western Pacific 206770
Name: Recovered, dtype: int64
---------------------------------------------------------------
The sum of Active cases in each WHO Region are:
WHO Region
Africa 270339
Americas 4027938
Eastern Mediterranean 251005
Europe 1094656
South-East Asia 637015
Western Pacific 77409
Name: Active, dtype: int64
---------------------------------------------------------------
The sum of New cases in each WHO Region are:
WHO Region
Africa 12176
Americas 129531
Eastern Mediterranean 12410
Europe 22294
South-East Asia 48993
Western Pacific 3289
Name: New cases, dtype: int64
---------------------------------------------------------------
The sum of New deaths in each WHO Region are:
WHO Region
Africa 353
Americas 3555
Eastern Mediterranean 445
Europe 304
South-East Asia 734
Western Pacific 24
Name: New deaths, dtype: int64
---------------------------------------------------------------
The sum of New recovered cases in each WHO Region are:
WHO Region
Africa 14563
Americas 94776
Eastern Mediterranean 14843
Europe 11732
South-East Asia 37582
Western Pacific 1127
Name: New recovered, dtype: int64
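The eight separate printouts above can also be collected into a single table with pandas named aggregation in DataFrameGroupBy.agg. The sketch below uses made-up toy rows with only a few of the dataset's columns; the output names (countries, confirmed, deaths, recovered) are illustrative choices, not names from the original notebook.

```python
import pandas as pd

# Made-up toy rows with a subset of the dataset's columns
df = pd.DataFrame({
    'Country/Region': ['Brazil', 'Canada', 'France', 'Spain', 'India'],
    'WHO Region': ['Americas', 'Americas', 'Europe', 'Europe', 'South-East Asia'],
    'Confirmed': [100, 50, 30, 20, 80],
    'Deaths': [5, 3, 2, 1, 4],
    'Recovered': [60, 40, 20, 15, 50],
})

# Named aggregation: new_column=(source_column, function).
# The result has one row per region, sorted alphabetically by default.
summary = df.groupby('WHO Region').agg(
    countries=('Country/Region', 'nunique'),
    confirmed=('Confirmed', 'sum'),
    deaths=('Deaths', 'sum'),
    recovered=('Recovered', 'sum'),
)
print(summary)
```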

Matplotlib

Matplotlib is a Python library used for plotting. It can create various types of plots, such as line plots, histograms, and scatter plots.

In this case study we will use it to plot the data of the COVID-19 dataset.
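A minimal sketch of the three plot types named above, drawn on made-up numbers with Matplotlib's object-oriented interface (one Axes object per panel). The non-interactive "Agg" backend is selected only so the sketch runs without a display; in a notebook you would omit that line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
cases = [10, 14, 9, 20, 25]  # made-up numbers for illustration

fig, (ax_line, ax_hist, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

ax_line.plot(days, cases, marker='o')   # line plot: points joined in x order
ax_line.set_title('Line plot')

ax_hist.hist(cases, bins=3)             # histogram: how many values fall in each bin
ax_hist.set_title('Histogram')

ax_scatter.scatter(days, cases)         # scatter plot: unconnected points
ax_scatter.set_title('Scatter plot')

fig.tight_layout()
```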

1. Plotting a Line graph to compare the Confirmed Cases in all the countries

In [ ]: import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(30,25))
plt.plot(df['Country/Region'],df['Confirmed'],marker='h')
plt.xlabel('Country/Region',fontsize=25,color='b')
plt.xticks(rotation=45, ha='right', size=5)
plt.ylabel('Confirmed',fontsize=25,color='b')
plt.title('Confirmed Cases', fontsize=30, fontweight='bold', style='italic')
plt.show()

The graph above plots the Confirmed cases of every country in the dataset.

In the above code:

plt is the alias for the matplotlib.pyplot module.
.figure() creates a new figure.
figsize sets the size of the figure as (width, height) in inches.
.plot() draws the data as a line.
df['Country/Region'] supplies the x-values from the 'Country/Region' column.
df['Confirmed'] supplies the y-values from the 'Confirmed' column.
marker sets the marker drawn at each data point ('h' is a hexagon).
.xlabel() sets the label of the x-axis.
.xticks() customises the x-axis tick labels.
rotation rotates the tick labels.
ha sets the horizontal alignment of the tick labels.
size sets the font size of the tick labels.
.ylabel() sets the label of the y-axis.
.title() sets the title of the plot.
.show() displays the plot.
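Because the x-axis of the line graph holds country names in alphabetical order, the line segments between neighbouring countries do not represent any trend. A sketch of one alternative, on made-up numbers: sort by 'Confirmed' with DataFrame.nlargest and draw only the largest countries as bars. The cut-off of 3 countries is an arbitrary illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy numbers, not values from the dataset
df = pd.DataFrame({
    'Country/Region': ['Brazil', 'India', 'US', 'Peru', 'Chile', 'Italy'],
    'Confirmed': [900, 700, 1200, 300, 250, 100],
})

# nlargest returns the 3 rows with the highest 'Confirmed', sorted descending
top = df.nlargest(3, 'Confirmed')

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(top['Country/Region'], top['Confirmed'], color='tab:blue')
ax.set_xlabel('Country/Region')
ax.set_ylabel('Confirmed')
ax.set_title('Top 3 countries by Confirmed cases')
```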
2. Plotting overlaid bar graphs of the "Confirmed" and "Recovered" cases of each Country

In [ ]: import matplotlib.pyplot as plt

import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(80,60))
plt.bar(df['Country/Region'],df['Confirmed'],color='red',label='Confirmed')
plt.bar(df['Country/Region'],df['Recovered'],color='green',label='Recovered')
plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)
plt.yticks(size=30)

# Add each value above its bar (the offsets are in data units, not pixels)
for i, (confirmed, recover) in enumerate(zip(df['Confirmed'], df['Recovered'])):
    plt.annotate(str(confirmed), xy=(i, confirmed+150000), va='center', ha='center',
                 fontsize=13, fontweight='bold', rotation=90)
    plt.annotate(str(recover), xy=(i, confirmed+50000), va='center', ha='center',
                 fontsize=13, fontweight='bold', rotation=90)

plt.ylabel('Confirmed',fontsize=50,color='b')
plt.title('Confirmed and Recovered Cases', fontsize=100, fontweight='bold', style='italic')
plt.legend()
plt.show()
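Because both plt.bar() calls above start at zero, the green Recovered bars are drawn on top of the red Confirmed bars rather than stacked on them. A sketch of a genuinely stacked version, on made-up rows with hypothetical country names: the bottom= argument starts the second bar where the first one ends, so each stack is exactly as tall as Confirmed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows with hypothetical country names
df = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C'],
    'Confirmed': [100, 80, 60],
    'Recovered': [70, 20, 50],
})
not_recovered = df['Confirmed'] - df['Recovered']

fig, ax = plt.subplots()
recovered_bars = ax.bar(df['Country/Region'], df['Recovered'],
                        color='green', label='Recovered')
# bottom= places each red segment on top of the green one
rest_bars = ax.bar(df['Country/Region'], not_recovered, bottom=df['Recovered'],
                   color='red', label='Confirmed but not recovered')
ax.legend()

# Top edge of every stack: bottom of the red segment plus its height
stack_tops = [bar.get_y() + bar.get_height() for bar in rest_bars]
```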

3. Plotting a Horizontal bar graph of the Death and Recovered cases of each Country
In [ ]: import matplotlib.pyplot as plt
import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(75,150))
plt.barh(df['Country/Region'], df['Deaths'], color='red', label='Deaths')
# left=df['Deaths'] starts each Recovered bar where the Deaths bar ends, so the
# shorter Deaths bar is not hidden underneath the Recovered bar
plt.barh(df['Country/Region'], df['Recovered'], left=df['Deaths'], color='green', label='Recovered')
plt.yticks(rotation=45, ha='right', size=23)
plt.xticks(size=20)
plt.ylabel('Country/Region', fontsize=50, color='b')
plt.title('Death and Recovery Cases', fontsize=100, fontweight='bold', style='italic')
plt.xlabel('Cases', fontsize=50, color='b')
plt.legend()

# Add both values past the end of each stacked bar (offsets are in data units)
for i, (death, recover) in enumerate(zip(df['Deaths'], df['Recovered'])):
    plt.annotate(str(death), xy=(death+recover+30000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')
    plt.annotate(str(recover), xy=(death+recover+90000, i), va='center', ha='center',
                 fontsize=20, fontweight='bold')

plt.show()
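The manual offsets above (+30000, +90000) depend on the size of the numbers. A sketch of an alternative, assuming Matplotlib 3.4 or later where Axes.bar_label exists: with label_type='center' each segment shows its own length, and with label_type='edge' on the outer segment the label shows where the stack ends, i.e. Deaths + Recovered. The rows and country names are made-up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows with hypothetical country names
df = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C'],
    'Deaths': [10, 25, 5],
    'Recovered': [300, 120, 90],
})

fig, ax = plt.subplots()
deaths = ax.barh(df['Country/Region'], df['Deaths'], color='red', label='Deaths')
recovered = ax.barh(df['Country/Region'], df['Recovered'], left=df['Deaths'],
                    color='green', label='Recovered')

# 'center': label inside each segment, showing that segment's length
ax.bar_label(deaths, fmt='%d', label_type='center', color='white')
recovered_labels = ax.bar_label(recovered, fmt='%d', label_type='center')
# 'edge': label just past the segment end (padding is in points), showing the end position
totals = ax.bar_label(recovered, fmt='%d', label_type='edge', padding=3)
ax.legend()
```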
3. Vertical stacked bar graph of the countries vs Deaths / 100 Cases, Recovered / 100 Cases and Deaths / 100 Recovered

In [ ]: import matplotlib.pyplot as plt

import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

plt.figure(figsize=(80,75))
plt.bar(df['Country/Region'],df['Deaths / 100 Cases'],color='red',label='Deaths for every 100 Cases')
plt.bar(df['Country/Region'],df['Recovered / 100 Cases'],color='green',
        label='Recovered for every 100 Cases',bottom=df['Deaths / 100 Cases'])
plt.bar(df['Country/Region'],df['Deaths / 100 Recovered'],color='blue',
        label='Deaths for every 100 Recovered',
        bottom=df['Deaths / 100 Cases']+df['Recovered / 100 Cases'])

plt.xlabel('Country/Region',fontsize=50,color='b')
plt.xticks(rotation=45, ha='right', size=10)

# Add the three values above each stack (offsets are in data units, not pixels)
for i, (death, recover, death_rec) in enumerate(zip(df['Deaths / 100 Cases'],
                                                    df['Recovered / 100 Cases'],
                                                    df['Deaths / 100 Recovered'])):
    plt.annotate(str(death), xy=(i, death_rec+150), va='center', ha='center',
                 color='red', fontsize=15, fontweight='bold', rotation=90)
    plt.annotate(str(recover), xy=(i, death_rec+200), va='center', ha='center',
                 color='green', fontsize=15, fontweight='bold', rotation=90)
    plt.annotate(str(death_rec), xy=(i, death_rec+250), va='center', ha='center',
                 color='blue', fontsize=13, fontweight='bold', rotation=90)

plt.ylabel('Cases', fontsize=50, color='b')

plt.title('Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered',
fontsize=30, fontweight='bold',style='italic',color='r')
plt.legend()
plt.show()
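A country that reports zero recoveries makes 'Deaths / 100 Recovered' a division by zero, and an infinite bar height cannot be drawn sensibly. A sketch of recomputing the three ratios and replacing infinities with NaN, on made-up rows. The formulas (for example Deaths / Confirmed * 100, rounded to 2 decimals) are an assumption about how the dataset's ratio columns are defined, not something this notebook states.

```python
import numpy as np
import pandas as pd

# Made-up rows: 'B' has no recoveries, 'C' has no cases at all
df = pd.DataFrame({
    'Country/Region': ['A', 'B', 'C'],
    'Confirmed': [200, 50, 0],
    'Deaths': [10, 5, 0],
    'Recovered': [150, 0, 0],
})

# Assumed definitions of the three ratio columns, rounded to 2 decimals
ratios = pd.DataFrame({
    'Deaths / 100 Cases': df['Deaths'] / df['Confirmed'] * 100,
    'Recovered / 100 Cases': df['Recovered'] / df['Confirmed'] * 100,
    'Deaths / 100 Recovered': df['Deaths'] / df['Recovered'] * 100,
}).round(2)

# In pandas, x / 0 gives inf and 0 / 0 gives NaN; map inf to NaN as well
ratios = ratios.replace([np.inf, -np.inf], np.nan)
print(ratios)
```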
4. Pie chart of the number of countries in each "WHO Region"

In [ ]: import matplotlib.pyplot as plt

import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

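plt.figure(figsize=(15,10))
# value_counts() orders the regions by count, so the labels must come from its own
# index; df['WHO Region'].unique() lists the regions in a different order
region_counts=df['WHO Region'].value_counts()
plt.pie(region_counts,labels=region_counts.index,
        autopct='%5.1f%%',shadow=True,explode=(0.0,0.0,0.0,0.0,0.0,0.2))
plt.title('WHO Region')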
plt.show()
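The reason the pie labels should come from the counts themselves can be seen on a few made-up region names: Series.unique() keeps the order of first appearance, while value_counts() sorts by count, so pairing one with the other attaches labels to the wrong slices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up region column (no ties in the counts)
regions = pd.Series(['Eastern Mediterranean', 'Europe', 'Africa',
                     'Europe', 'Africa', 'Europe'])

print(regions.unique())        # order of first appearance
counts = regions.value_counts()
print(counts.index.tolist())   # order by count, largest first

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = ax.pie(counts, labels=counts.index, autopct='%1.1f%%')
slice_labels = [t.get_text() for t in texts]
```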
5. Subplots of WHO Region vs Confirmed, Deaths, Recovered, Active, New cases and New deaths, represented by scatter plots

In [ ]: import matplotlib.pyplot as plt

import pandas as pd
data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

fig, ax = plt.subplots(2, 3, figsize=(20, 10))

ax[0, 0].scatter(df['WHO Region'], df['Confirmed'], color='red')
ax[0, 0].set_title('WHO Region vs Confirmed')
ax[0, 1].scatter(df['WHO Region'], df['Deaths'], color='green')
ax[0, 1].set_title('WHO Region vs Deaths')
ax[0, 2].scatter(df['WHO Region'], df['Recovered'], color='blue')
ax[0, 2].set_title('WHO Region vs Recovered')
ax[1, 0].scatter(df['WHO Region'], df['Active'], color='yellow')
ax[1, 0].set_title('WHO Region vs Active')
ax[1, 1].scatter(df['WHO Region'], df['New cases'], color='black')
ax[1, 1].set_title('WHO Region vs New cases')
ax[1, 2].scatter(df['WHO Region'], df['New deaths'], color='orange')
ax[1, 2].set_title('WHO Region vs New deaths')

# Increasing the distance between subplots
plt.subplots_adjust(left=0.2, bottom=0.1, right=0.9, top=0.9, wspace=0.4)

# The x-axis labels overlap, so rotate them on every subplot
for axis in fig.axes:
    plt.sca(axis)
    plt.xticks(rotation=45, ha='right', size=8)

plt.show()
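The six nearly identical scatter calls can also be written as one loop over the grid. In this sketch, on made-up rows, axes.flat walks the 2x3 grid row by row, ax.tick_params rotates the tick labels without switching the current axes, and fig.tight_layout() spaces the panels; this is one possible layout choice, not the notebook's own approach.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows
df = pd.DataFrame({
    'WHO Region': ['Africa', 'Americas', 'Europe', 'Africa'],
    'Confirmed': [100, 900, 400, 50],
    'Deaths': [2, 30, 25, 1],
    'Recovered': [60, 500, 300, 40],
    'Active': [38, 370, 75, 9],
    'New cases': [5, 40, 10, 2],
    'New deaths': [0, 3, 1, 0],
})
columns = ['Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 'New deaths']
colors = ['red', 'green', 'blue', 'yellow', 'black', 'orange']

fig, axes = plt.subplots(2, 3, figsize=(20, 10))
# Pair each panel (row by row) with one column and one colour
for ax, column, color in zip(axes.flat, columns, colors):
    ax.scatter(df['WHO Region'], df[column], color=color)
    ax.set_title(f'WHO Region vs {column}')
    ax.tick_params(axis='x', labelrotation=45, labelsize=8)

fig.tight_layout()  # adjust spacing so titles and tick labels do not collide
```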

6. Histogram of the number of countries in each WHO Region

In [ ]: import matplotlib.pyplot as plt

import pandas as pd

data = pd.read_csv('COVID Dataset.csv')

df = pd.DataFrame(data)

plt.figure(figsize=(15,10))
# Each row adds 1 to the bin of its region, so the bar heights count countries
plt.hist(df['WHO Region'], bins=6, color='red')
plt.title('Number of countries in each WHO Region')
plt.xlabel('WHO Region', fontsize=20, color='b')
plt.ylabel('Number of countries', fontsize=20, color='b')

plt.show()
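A histogram of the 'WHO Region' column counts rows, so it cannot show case numbers. If the intent is to compare confirmed cases from the previous week across regions, a sketch could sum that column per region and draw one bar per region. It assumes the dataset has a column named 'Confirmed last week'; the rows below are made-up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows; the column name 'Confirmed last week' is an assumption
df = pd.DataFrame({
    'WHO Region': ['Africa', 'Americas', 'Americas', 'Europe', 'Europe'],
    'Confirmed last week': [120, 800, 300, 250, 150],
})

# One total per region; groupby sorts the region names alphabetically
weekly = df.groupby('WHO Region')['Confirmed last week'].sum()

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(weekly.index, weekly.values, color='red')
ax.set_xlabel('WHO Region')
ax.set_ylabel('Confirmed last week')
ax.set_title('Confirmed last week by WHO Region')
```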
7. Plotting a grouped bar chart after grouping countries by their "WHO Region", adding up the Confirmed, Deaths, Recovered and Active cases (and the New cases, New deaths and New recovered columns) individually for each region, with the regions in ascending order.

In [ ]: import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data=pd.read_csv('COVID Dataset.csv')
df=pd.DataFrame(data)

# Sum every case column per WHO Region; groupby sorts the regions alphabetically
columns=['Confirmed','Deaths','Recovered','Active','New cases','New deaths','New recovered']
colors=['red','green','blue','yellow','black','orange','pink']
region_sums=df.groupby('WHO Region')[columns].sum()

#Plotting Grouped Bar Graph for the above data
plt.figure(figsize=(15,10))
Confirmed_bar=np.arange(len(region_sums))
Death_bar=[i+0.1 for i in Confirmed_bar]
Recovered_bar=[i+0.1 for i in Death_bar]
Active_bar=[i+0.1 for i in Recovered_bar]
New_cases_bar=[i+0.1 for i in Active_bar]
New_deaths_bar=[i+0.1 for i in New_cases_bar]
New_recovered_bar=[i+0.1 for i in New_deaths_bar]

print(Confirmed_bar)
print(Death_bar)
print(Recovered_bar)
print(Active_bar)
print(New_cases_bar)
print(New_deaths_bar)
print(New_recovered_bar)

positions=[Confirmed_bar,Death_bar,Recovered_bar,Active_bar,
           New_cases_bar,New_deaths_bar,New_recovered_bar]
for bar_pos,column,color in zip(positions,columns,colors):
    plt.bar(bar_pos,region_sums[column],color=color,width=0.1,label=column)
    # Adding the value of the sum of cases on the top of each bar
    for x,value in zip(bar_pos,region_sums[column]):
        plt.annotate(text=str(value),xy=(x,value),xytext=(0,3),
                     textcoords="offset points",ha='center',va='bottom',rotation=90)

# .annotate() is used to add the text on the top of the bar
# text is the value of the sum of cases
# xy is the point being annotated (the top of the bar)
# xytext is the position of the text relative to xy
# ha is the horizontal alignment
# va is the vertical alignment
# rotation is the angle of the text
# textcoords="offset points" means xytext is measured in points from xy

# The seven bars of a group sit at offsets 0.0 to 0.6, so the group centre is at +0.3
plt.xticks(Confirmed_bar+0.3,region_sums.index)
plt.xlabel('WHO Region',fontsize=20,color='b')
plt.ylabel('Cases',fontsize=20,color='b')
plt.title('WHO Region vs Cases',fontsize=30,fontweight='bold',style='italic')

plt.legend()
plt.show()
[0 1 2 3 4 5]
[0.1, 1.1, 2.1, 3.1, 4.1, 5.1]
[0.2, 1.2000000000000002, 2.2, 3.2, 4.199999999999999, 5.199999999999999]
[0.30000000000000004, 1.3000000000000003, 2.3000000000000003, 3.3000000000000003, 4.299999999999999, 5.299999999999999]
[0.4, 1.4000000000000004, 2.4000000000000004, 3.4000000000000004, 4.399999999999999, 5.399999999999999]
[0.5, 1.5000000000000004, 2.5000000000000004, 3.5000000000000004, 4.499999999999998, 5.499999999999998]
[0.6, 1.6000000000000005, 2.6000000000000005, 3.6000000000000005, 4.599999999999998, 5.599999999999998]
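The printed positions above show small floating-point drift from adding 0.1 repeatedly. pandas can lay out a grouped bar chart directly from a DataFrame whose rows are the groups: DataFrame.plot.bar draws one cluster per row and one bar per column, with a legend. A sketch on made-up rows with three of the case columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows
df = pd.DataFrame({
    'WHO Region': ['Africa', 'Americas', 'Americas', 'Europe'],
    'Confirmed': [100, 900, 300, 400],
    'Deaths': [2, 30, 10, 25],
    'Recovered': [60, 500, 200, 300],
})

# One row per region (sorted alphabetically), one column per case type
region_sums = df.groupby('WHO Region')[['Confirmed', 'Deaths', 'Recovered']].sum()

fig, ax = plt.subplots(figsize=(10, 6))
# Each column becomes one set of bars, offset automatically within every region group
region_sums.plot.bar(ax=ax, rot=0)
ax.set_xlabel('WHO Region')
ax.set_ylabel('Cases')
ax.set_title('WHO Region vs Cases')
```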
