
Solutions To Pandas Basic Questions

The document describes operations performed on a pandas DataFrame called 'birds' created from dictionary data and list labels. The DataFrame contains bird observation data with columns for bird name, age, number of visits, and observation priority. A number of data selection, sorting, and aggregation operations are demonstrated including filtering rows by criteria, calculating group means, counting unique bird types, and sorting.

Uploaded by

Jason Shax

Consider the following Python dictionary data and Python list labels:

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills',
'spoonbills'], 'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4], 'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2], 'priority': ['yes',
'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

1. Create a DataFrame birds from this dictionary data which has the index labels.

In [1]: import pandas as pd
import numpy as np

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)
df
Out[1]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no

2. Display a summary of the basic information about birds DataFrame and its data.

In [2]: df.describe()
Out[2]:
age visits
count 8.000000 10.000000
mean 4.437500 2.900000
std 2.007797 0.875595
min 1.500000 2.000000
25% 3.375000 2.000000
50% 4.000000 3.000000
75% 5.625000 3.750000
max 8.000000 4.000000
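
describe() reports statistics for the numeric columns only. If "basic information" is read as column names, dtypes, and non-null counts, DataFrame.info() is the closer match. The sketch below rebuilds the same DataFrame so it runs on its own; capturing the report in a buffer (the names buf and info_text are just illustrative) is optional, and a plain df.info() prints the same text.

```python
import io

import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# info() writes to stdout by default; pass a buffer to keep the text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)
```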

3. Print the first 2 rows of the birds dataframe.

In [3]: df.head(2)
Out[3]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes

4. Print all the rows with only the 'birds' and 'age' columns from the dataframe.

In [4]: df[['birds','age']]
Out[4]:
birds age
a Cranes 3.5
b Cranes 4.0
c plovers 1.5
d spoonbills NaN
e spoonbills 6.0
f Cranes 3.0
g plovers 5.5
h Cranes NaN
i spoonbills 8.0
j spoonbills 4.0

5. Select the rows at positions [2, 3, 7] and the columns ['birds', 'age', 'visits'].

In [5]: df.iloc[[2,3,7], :3]


Out[5]:
birds age visits
c plovers 1.5 3
d spoonbills NaN 4
h Cranes NaN 2
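
The slice :3 works only because 'birds', 'age', and 'visits' happen to be the first three columns. A variant that names the columns explicitly is sketched below; it assumes the same data as above, and the variable names (cols, by_position, by_name) are illustrative only.

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

cols = ['birds', 'age', 'visits']

# Purely positional selection, as in the cell above.
by_position = df.iloc[[2, 3, 7], :3]

# Rows by position, columns by name: still correct if the column order changes.
by_name = df.loc[df.index[[2, 3, 7]], cols]
print(by_name)
```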

6. Select the rows where the number of visits is less than 4.

In [6]: df[df['visits'] < 4]


Out[6]:
birds age visits priority
a Cranes 3.5 2 yes
c plovers 1.5 3 no
e spoonbills 6.0 3 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no

7. Select the rows with columns ['birds', 'visits'] where the age is missing, i.e. NaN.

In [7]: df.loc[df['age'].isnull(), ['birds', 'visits']]
Out[7]:
birds visits
d spoonbills 4
h Cranes 2

8. Select the rows where the bird is 'Cranes' and the age is less than 4.

In [8]: df[(df['birds']=='Cranes') & (df['age'] < 4 )]


Out[8]:
birds age visits priority
a Cranes 3.5 2 yes
f Cranes 3.0 4 no
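
The same filter can also be written with DataFrame.query, which some readers find easier to scan than chained boolean masks. This is a sketch on the same data, not a replacement for the answer above; young_cranes is an illustrative name.

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Column names are used directly inside the expression string.
# Rows with a NaN age fail the comparison and are left out.
young_cranes = df.query("birds == 'Cranes' and age < 4")
print(young_cranes)
```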

9. Select the rows where the age is between 2 and 4 (inclusive).

In [9]: df[(df['age'] >= 2) & (df['age'] <= 4 )]


Out[9]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
f Cranes 3.0 4 no
j spoonbills 4.0 2 no
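
For an inclusive range, Series.between expresses the two comparisons in one call; both endpoints are included by default. A short sketch on the same data (in_range is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# between(2, 4) is True where 2 <= age <= 4; NaN ages evaluate to False.
in_range = df[df['age'].between(2, 4)]
print(in_range)
```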

10. Find the total number of visits of the bird Cranes

In [10]: df[df['birds']=='Cranes']['visits'].sum()
Out[10]:
12
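
The answer above filters the rows and then picks the column in two separate indexing steps. Doing both in a single .loc call gives the same total and is the form usually recommended when the selection might later be assigned to. A sketch on the same data (crane_visits is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Row mask and column label in one .loc call, then sum the selected values.
crane_visits = df.loc[df['birds'] == 'Cranes', 'visits'].sum()
print(crane_visits)
```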

11. Calculate the mean age for each type of bird in the dataframe.

In [11]: x = df[df['birds']=='Cranes']['age'].mean()
y = df[df['birds']=='plovers']['age'].mean()
z = df[df['birds']=='spoonbills']['age'].mean()

print(x,'\n', y,'\n',z)

3.5
3.5
6.0
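
Filtering once per bird name works here, but it has to be edited whenever a new bird type appears. A groupby computes all the means in one pass; the sketch below assumes the same data, and mean_age is an illustrative name. Missing ages are skipped by mean(), as in the per-bird version.

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# One group per distinct value in 'birds'; the result is a Series indexed by bird.
mean_age = df.groupby('birds')['age'].mean()
print(mean_age)
```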

12. Append a new row 'k' to dataframe with your choice of values for each column. Then delete that row
to return the original DataFrame.

In [12]: row_k = {'birds': 'egret', 'age': 4, 'visits': 2, 'priority': 'yes'}
x = pd.DataFrame(row_k, index=['k'], columns=['birds', 'age', 'visits', 'priority'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
# The result is assigned back so that row 'k' really becomes part of df.
df = pd.concat([df, x])
df
Out[12]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
k egret 4.0 2 yes

In [13]: df = df.drop('k')  # delete row 'k' by its label
df
Out[13]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no

13. Find the number of each type of bird in the dataframe (counts).

In [14]: df['birds'].value_counts()
Out[14]:
spoonbills 4
Cranes 4
plovers 2
Name: birds, dtype: int64
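
Two related counts are sometimes wanted alongside value_counts(): the number of distinct bird types, and per-type counts in a fixed alphabetical order. A sketch on the same data (n_types and counts are illustrative names):

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Number of distinct bird types.
n_types = df['birds'].nunique()

# Rows per bird type, sorted by the group key rather than by frequency.
counts = df.groupby('birds').size()

print(n_types)
print(counts)
```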

14. Sort the dataframe (birds) first by the values in 'age' in descending order, then by the values in the 'visits' column in ascending order.

In [15]: # One sort with two keys: 'age' descending, ties broken by 'visits' ascending.
df.sort_values(by=['age', 'visits'], ascending=[False, True])
Out[15]:
birds age visits priority
i spoonbills 8.0 3 no
e spoonbills 6.0 3 no
g plovers 5.5 2 no
j spoonbills 4.0 2 no
b Cranes 4.0 4 yes
a Cranes 3.5 2 yes
f Cranes 3.0 4 no
c plovers 1.5 3 no
h Cranes NaN 2 yes
d spoonbills NaN 4 yes

15. Replace the priority column values: 'yes' should become 1 and 'no' should become 0.

In [16]: x = df.replace(to_replace=['yes','no'], value=[1,0])
x
Out[16]:
birds age visits priority
a Cranes 3.5 2 1
b Cranes 4.0 4 1
c plovers 1.5 3 0
d spoonbills NaN 4 1
e spoonbills 6.0 3 0
f Cranes 3.0 4 0
g plovers 5.5 2 0
h Cranes NaN 2 1
i spoonbills 8.0 3 0
j spoonbills 4.0 2 0

16. In the 'birds' column, change the 'Cranes' entries to 'trumpeters'.

In [17]: z = df.replace(to_replace=['Cranes'], value=['trumpeters'])


z
Out[17]:
birds age visits priority
a trumpeters 3.5 2 yes
b trumpeters 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f trumpeters 3.0 4 no
g plovers 5.5 2 no
h trumpeters NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
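
Since the task names the 'birds' column specifically, the replacement can be limited to that column with Series.replace. A sketch on the same data (renamed is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
              'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
    'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Only the 'birds' column is touched; other columns are left as they are.
renamed = df.copy()
renamed['birds'] = renamed['birds'].replace('Cranes', 'trumpeters')
print(renamed)
```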
