Solutions To Pandas Basic Questions
import numpy as np
import pandas as pd
data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
1. Create a DataFrame birds from the dictionary data, using labels as the index.
In [1]: df = pd.DataFrame(data, index=labels)
df
Out[1]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
2. Display a summary of the basic information about the birds DataFrame and its data.
In [2]: df.describe()
Out[2]:
age visits
count 8.000000 10.000000
mean 4.437500 2.900000
std 2.007797 0.875595
min 1.500000 2.000000
25% 3.375000 2.000000
50% 4.000000 3.000000
75% 5.625000 3.750000
max 8.000000 4.000000
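describe() summarises only the numeric columns, so 'birds' and 'priority' do not appear above. As a complementary sketch, assuming the same data and labels as at the top of this notebook, df.info() reports the index, the dtype of every column, and the non-null counts, which is closer to "basic information" about the frame. The variable name non_null is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Prints the index range, each column's dtype, and its non-null count
df.info()

# The same non-null counts as a Series (illustrative name);
# 'age' has two missing values, so its count is lower than the others
non_null = df.count()
print(non_null)
```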
3. Print the first 2 rows of the birds dataframe.
In [3]: df.head(2)
Out[3]:
birds age visits priority
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
4. Print all the rows with only 'birds' and 'age' columns from the dataframe
In [4]: df[['birds','age']]
Out[4]:
birds age
a Cranes 3.5
b Cranes 4.0
c plovers 1.5
d spoonbills NaN
e spoonbills 6.0
f Cranes 3.0
g plovers 5.5
h Cranes NaN
i spoonbills 8.0
j spoonbills 4.0
7. Select the rows where the age is missing, i.e. NaN, showing only the ['birds', 'visits'] columns.
In [7]: df.loc[df['age'].isnull(), ['birds', 'visits']]
Out[7]:
birds visits
d spoonbills 4
h Cranes 2
8. Select the rows where birds is 'Cranes' and the age is less than 4.
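No solution cell for this question survives in the notebook. One possible approach, sketched here under the assumption that df is built from the same data and labels as above, combines two boolean masks with &. The name cranes_young is hypothetical. Each condition needs its own parentheses because & binds more tightly than == and <.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Element-wise AND of the two conditions; rows with NaN age compare
# as False against "< 4", so they are excluded automatically
cranes_young = df[(df['birds'] == 'Cranes') & (df['age'] < 4)]
print(cranes_young)
```

With this data the result contains rows 'a' (age 3.5) and 'f' (age 3.0); row 'b' has age exactly 4 and row 'h' has a missing age.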
10. Find the total number of visits for the bird Cranes.
In [10]: df[df['birds']=='Cranes']['visits'].sum()
Out[10]:
12
11. Calculate the mean age for each kind of bird in the dataframe.
In [11]: x = df[df['birds']=='Cranes']['age'].mean()
y = df[df['birds']=='plovers']['age'].mean()
z = df[df['birds']=='spoonbills']['age'].mean()
print(x,'\n', y,'\n',z)
3.5
3.5
6.0
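The three separate filters above can also be written as a single groupby, which does not need the bird names to be listed by hand. This is an alternative sketch, not the original solution; it assumes the same data and labels, and the name mean_age is illustrative.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Group rows by bird name and average 'age' within each group;
# mean() skips NaN values by default
mean_age = df.groupby('birds')['age'].mean()
print(mean_age)
```

The result is a Series indexed by bird name with the same three values as above: 3.5 for Cranes, 3.5 for plovers, and 6.0 for spoonbills.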
12. Append a new row 'k' to dataframe with your choice of values for each column. Then delete that row
to return the original DataFrame.
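No solution cell for this question survives in the notebook. A minimal sketch, assuming the same data and labels: assigning to df.loc['k'] adds the row, and drop('k') removes it again. The values chosen for row 'k' are arbitrary, and the names original_shape and shape_with_k are illustrative. DataFrame.append is avoided because it was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)
original_shape = df.shape

# Add row 'k'; values follow the column order birds, age, visits, priority
# (arbitrary example values)
df.loc['k'] = ['plovers', 2.5, 1, 'no']
shape_with_k = df.shape

# Remove row 'k' again; drop() returns a new DataFrame
df = df.drop('k')
print(df)
```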
13. Count the number of each type of bird in the dataframe.
In [14]: df['birds'].value_counts()
Out[14]:
spoonbills 4
Cranes 4
plovers 2
Name: birds, dtype: int64
14. Sort the dataframe (birds) first by the values in the 'age' column in descending order, then by the values in the
'visits' column in ascending order.
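No solution cell for this question survives in the notebook. One way to do it, sketched here assuming the same data and labels, is sort_values with a list of columns and a matching list of ascending flags. The name sorted_df is illustrative.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# 'age' descending first; ties in age are broken by 'visits' ascending.
# Rows with NaN age go to the end (na_position defaults to 'last').
sorted_df = df.sort_values(by=['age', 'visits'], ascending=[False, True])
print(sorted_df)
```

Rows 'b' and 'j' both have age 4.0, so the second key decides their order: 'j' (2 visits) comes before 'b' (4 visits).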
15. Replace the priority column values: 'yes' should become 1 and 'no' should become 0.
In [16]: x = df.replace(to_replace=['yes','no'], value=[1,0])
x
Out[16]:
birds age visits priority
a Cranes 3.5 2 1
b Cranes 4.0 4 1
c plovers 1.5 3 0
d spoonbills NaN 4 1
e spoonbills 6.0 3 0
f Cranes 3.0 4 0
g plovers 5.5 2 0
h Cranes NaN 2 1
i spoonbills 8.0 3 0
j spoonbills 4.0 2 0
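df.replace as used above scans every column, so it would also rewrite a 'yes' or 'no' appearing anywhere else in the frame. As an alternative sketch, assuming the same data and labels, mapping only the priority column keeps the change confined to that column.

```python
import numpy as np
import pandas as pd

data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                  'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(data, index=labels)

# Map each value through the dictionary; a value missing from the
# dictionary would become NaN, which makes unexpected labels visible
df['priority'] = df['priority'].map({'yes': 1, 'no': 0})
print(df)
```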