Pandas - 1
Question 1
What is the significance of Pandas library ?
Answer
The significance of Python Pandas library is as follows:
1. It can read or write in many different data formats (integer, float, double, etc.).
2. It can calculate in all the possible ways data is organized i.e., across rows and down
columns.
3. It can easily select subsets of data from bulky data sets and even combine multiple
datasets together. It has functionality to find and fill missing data.
4. It allows operations to be applied to independent groups within the data.
5. It supports reshaping of data into different forms.
6. It supports advanced time-series functionality.
7. It supports visualization by integrating with libraries such as matplotlib and seaborn.
Question 2
Name some common data structures of Python's Pandas library.
Answer
The common data structures of Python's Pandas library are Series and DataFrame.
Question 3
How is a Series object different from and similar to ndarrays ? Support your answer with
examples.
Answer
A Series object in Pandas is both similar to and different from ndarrays (NumPy arrays).
Similarities:
Both Series and ndarrays store homogeneous data, meaning all elements must be of the same
data type (e.g., integers, floats, strings).
Differences:
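One commonly cited difference is that a Series carries an explicit index of labels and aligns arithmetic on those labels, while an ndarray is addressed only by position. A minimal sketch (the variable names are illustrative and not taken from the original answer):

```python
import numpy as np
import pandas as pd

# ndarray: elements are addressed by integer position only
arr = np.array([10, 20, 30])

# Series: the same values, but with user-defined labels as the index
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(arr[1])      # positional access
print(ser['b'])    # label-based access

# Arithmetic on two Series aligns on index labels; unmatched labels give NaN
other = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
total = ser + other
print(total)

# Arithmetic on two ndarrays is purely positional
arr_total = arr + np.array([1, 2, 3])
print(arr_total)
```

Here only the labels 'b' and 'c' exist in both Series, so only those two positions of the sum hold numbers.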
Question 4
Write single line Pandas statement for the following. (Assuming necessary modules have
been imported) :
Declare a Pandas series named Packets having dataset as :
[125, 92, 104, 92, 85, 116, 87, 90]
Answer
Packets = pandas.Series([125, 92, 104, 92, 85, 116, 87, 90], name = 'Packets')
Question 5
Write commands to print following details of a Series object seal :
(a) if the series is empty
(b) indexes of the series
(c) The data type of underlying data
(d) if the series stores any NaN values
Answer
(a)
seal.empty
(b)
seal.index
(c)
seal.dtype
(d)
seal.hasnans
Question 6
Given the following Series S1 and S2 :
S1
A 10
B 40
C 34
D 60
S2
A 80
B 20
C 74
D 90
Output
A 90
B 60
C 108
D 150
dtype: int64
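The output shown above matches the element-wise sum of S1 and S2. A sketch, assuming that addition is the operation the question asks for:

```python
import pandas as pd

S1 = pd.Series([10, 40, 34, 60], index=['A', 'B', 'C', 'D'])
S2 = pd.Series([80, 20, 74, 90], index=['A', 'B', 'C', 'D'])

# Addition matches values by index label and then adds them
result = S1 + S2
print(result)
```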
Question 7
Consider two objects x and y. x is a list whereas y is a Series. Both have values 20, 40, 90,
110.
What will be the output of the following two statements considering that the above objects
have been created already ?
(a) print (x*2)
(b) print (y*2)
Justify your answer.
Answer
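(a)
Output
[20, 40, 90, 110, 20, 40, 90, 110]
In the first statement, x represents a list. Multiplying a list by 2 repeats its elements,
so the list is joined with a copy of itself.
(b)
Output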
0 40
1 80
2 180
3 220
dtype: int64
In the second statement, y represents a Series. When a Series is multiplied by a value, each
element of the Series is multiplied by 2, as Series supports vectorized operations.
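The contrast between the two statements can be reproduced with a short sketch (the result variable names are illustrative):

```python
import pandas as pd

x = [20, 40, 90, 110]
y = pd.Series([20, 40, 90, 110])

doubled_list = x * 2      # list repetition: the elements appear twice
doubled_series = y * 2    # vectorized: each element is multiplied by 2
print(doubled_list)
print(doubled_series)
```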
Question 8
Given a dataframe df as shown below :
A B D
0 15 17 19
1 16 18 20
2 20 21 22
Output
A B D C
0 15 17 19 NaN
1 16 18 20 NaN
2 20 21 22 NaN
(b) df['C'] = [2, 5] — This statement will result in error because the length of the list [2,
5] does not match the number of rows in the DataFrame df.
(c) df['C'] = [12, 15, 27] — This statement will add a new column 'C' to the dataframe
and assign the values from the list [12, 15, 27] to the new column. This time, all rows in the
new column will be assigned a value.
The updated dataframe will look like this:
Output
A B D C
0 15 17 19 12
1 16 18 20 15
2 20 21 22 27
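Statements (b) and (c) can be tried side by side. The sketch below rebuilds df from the table in the question and catches the error from the mismatched list so that the script keeps running:

```python
import pandas as pd

df = pd.DataFrame({'A': [15, 16, 20], 'B': [17, 18, 21], 'D': [19, 20, 22]})

# A list whose length differs from the number of rows is rejected
try:
    df['C'] = [2, 5]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)

# A list with exactly one value per row is accepted
df['C'] = [12, 15, 27]
print(df)
```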
Question 9
Write code statements to list the following, from a dataframe namely sales:
(a) List only columns 'Item' and 'Revenue'.
(b) List rows from 3 to 7.
(c) List the value of cell in 5th row, 'Item' column.
Answer
(a)
>>> sales[['Item', 'Revenue']]
(b)
>>> sales.iloc[2:7]
(c)
>>> sales.Item.iloc[4]
Question 10
Hitesh wants to display the last four rows of the dataframe df and has written the following
code :
df.tail()
But last 5 rows are being displayed. Identify the error and rewrite the correct code so that last
4 rows get displayed.
Answer
The error in Hitesh's code is that the tail() function in pandas by default returns the last 5
rows of the dataframe. To display the last 4 rows, Hitesh needs to specify the number of rows
he wants to display.
Here's the correct code:
df.tail(4)
Question 11
How would you add a new column namely 'val' to a dataframe df that has 10 rows in it and
has columns as 'Item', 'Qty', 'Price' ? You can choose to put any values of your choice.
Answer
The syntax to add a new column to a DataFrame is <DF object>[<column>] = <new value>.
Therefore, according to this syntax, the statement to add a column named 'val' to a dataframe
df with 10 rows is :
df['val'] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Question 12
Write code statements for a dataframe df for the following :
(a) delete an existing column from it.
(b) delete rows from 3 to 6 from it.
(c) Check if the dataframe has any missing values.
(d) fill all missing values with 999 in it.
Answer
(a)
>>> del df[<column_name>]
(b)
>>> df.drop(range(2, 6))
(c)
>>> df.isnull()
(d)
>>> df.fillna(999)
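Two details are worth noting for parts (c) and (d): isnull() returns a DataFrame of True/False values, so reducing it gives a single yes/no answer, and fillna() returns a new DataFrame rather than changing df. A sketch with a hypothetical two-column DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing quantity
df = pd.DataFrame({'Item': ['pen', 'ink', 'pad'], 'Qty': [5, np.nan, 3]})

has_missing = df.isnull().values.any()   # True if at least one value is missing
filled = df.fillna(999)                  # new DataFrame; df itself is unchanged
print(has_missing)
print(filled)
```

To keep the filled values, the result has to be assigned back, for example df = df.fillna(999).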
Question 13
Write statement(s) to delete a row from a DataFrame.
Answer
The statement to delete a row from a DataFrame is:
<DF>.drop(index).
For example, the statement to delete the second row from a dataframe df is df.drop(1).
Question 14
Write statement(s) to delete a column from a DataFrame.
Answer
The statements to delete a column from a DataFrame are:
del <Df object>[<column name>]
OR
df.drop([<column name>], axis = 1) .
For example, the statement to delete a column Population from a dataframe df is del
df['Population'] or df.drop('Population', axis = 1) .
Question 15
Write statement(s) to change the value at 5th row, 6th column in a DataFrame df.
Answer
Since iat counts positions from 0, the 5th row and 6th column are at positions 4 and 5. The
statement to change the value at 5th row, 6th column in a DataFrame df is:
df.iat[4, 5] = <new value>.
Question 16
Write statement(s) to change the values to 750 at 4th row to 9th row, 7th column in a
DataFrame df.
Answer
The statement to change the value to 750 at 4th row to 9th row, 7th column in a DataFrame df
is:
df.iloc[3:9, 6] = 750.
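The position arithmetic in this statement can be checked with a sketch on a hypothetical 10 x 8 DataFrame of zeros:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame, large enough to have a 9th row and a 7th column
df = pd.DataFrame(np.zeros((10, 8), dtype=int))

# Rows 4 to 9 are positions 3..8 and the 7th column is position 6;
# the end of an iloc slice (9) is excluded
df.iloc[3:9, 6] = 750
print(df[6].tolist())
```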
Question 17
What is the difference between iloc and loc with respect to a DataFrame ?
Answer
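In brief, loc selects rows and columns by their labels, while iloc selects them by integer position. A label slice in loc includes its end point, whereas a position slice in iloc excludes it. A small sketch with a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame({'Qty': [5, 8, 3], 'Price': [10.0, 2.5, 7.0]},
                  index=['pen', 'ink', 'pad'])

by_label = df.loc['pen':'ink', 'Qty']   # label slice: both ends included
by_position = df.iloc[0:2, 0]           # position slice: end excluded
print(by_label)
print(by_position)
```

Both statements select the same two values, even though the slice bounds look different.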
Question 18
What is the difference between iat and at with respect to a DataFrame ?
Answer
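In brief, at accesses a single value by row label and column label, while iat accesses a single value by integer row position and column position. A small sketch with a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame({'Qty': [5, 8, 3], 'Price': [10.0, 2.5, 7.0]},
                  index=['pen', 'ink', 'pad'])

value_by_label = df.at['ink', 'Qty']   # row label, column label
value_by_position = df.iat[1, 0]       # row position, column position
print(value_by_label, value_by_position)

# Both accessors can also assign a single value
df.at['pad', 'Qty'] = 4
df.iat[0, 1] = 11.0
print(df)
```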
Question 19
How would you delete columns from a dataframe ?
Answer
To delete columns from a dataframe, we use the del statement with the syntax:
del <Df object>[<column name>]
OR
df.drop([<column name>], axis = 1) .
For example, the statement to delete columns A, B from a dataframe df is del df['A'] and
del df['B'] or df.drop(['A', 'B'], axis = 1).
Question 20
How would you delete rows from a dataframe ?
Answer
To delete rows from a dataframe, we use the drop() function with the syntax:
<DF>.drop(sequence of indexes).
For example, the statement to delete the rows with indexes 2, 3, 4 from a
dataframe df is df.drop([2, 3, 4]).
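Note that drop() returns a new DataFrame and leaves the original untouched unless the result is assigned back. A sketch with a hypothetical single-column DataFrame:

```python
import pandas as pd

# Hypothetical data with the default index 0..5
df = pd.DataFrame({'val': [10, 20, 30, 40, 50, 60]})

# Rows are removed by index label; df itself still has all six rows
trimmed = df.drop([2, 3, 4])
print(trimmed)
```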
Question 21
Which function would you use to rename the index/column names in a dataframe ?
Answer
The rename() function in pandas is used to rename index or column names in a DataFrame.
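A short sketch of rename() with a hypothetical DataFrame, renaming one column and both index labels in a single call:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Item': ['pen', 'ink'], 'Qty': [5, 8]})

# Dictionaries map old names to new names; names not listed stay as they are.
# rename() returns a new DataFrame unless inplace=True is passed.
renamed = df.rename(columns={'Qty': 'Quantity'},
                    index={0: 'first', 1: 'second'})
print(renamed)
```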
Type B: Short Answer Questions/Conceptual Questions
Question 1
Consider following Series object namely S :
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype: float64
What will be returned by following statements ?
(a) S * 100
(b) S > 0
(c) S1 = pd.Series(S)
(d) S2 = pd.Series(S1) + 3
What will be the values of Series objects S1 and S2 created above ?
Answer
(a) S * 100
Output
0 43.0271
1 61.7328
2 -26.5421
3 -83.6113
dtype: float64
(b) S > 0
Output
0 True
1 True
2 False
3 False
dtype: bool
(c) S1 = pd.Series(S)
Output
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype: float64
(d) S2 = pd.Series(S1) + 3
Output
0 3.430271
1 3.617328
2 2.734579
3 2.163887
dtype: float64
The values of Series object S1 created above are as follows:
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
dtype: float64
The values of Series object S2 created above are as follows:
0 3.430271
1 3.617328
2 2.734579
3 2.163887
dtype: float64
Question 2
Consider the same Series object, S, given in the previous question. What output will be
produced by following code fragment ?
S.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
print(S)
print(S['AMZN'])
S['AMZN'] = 1.5
print(S['AMZN'])
print(S)
Answer
Output
AMZN 0.430271
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
dtype: float64
0.430271
1.5
AMZN 1.500000
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
dtype: float64
Explanation
The provided code fragment first changes the index labels of the Series S to ['AMZN',
'AAPL', 'MSFT', 'GOOG'], prints the modified Series S, and then proceeds to print and
modify the value corresponding to the 'AMZN' index. Specifically, it prints the value at the
'AMZN' index before and after assigning a new value of 1.5 to that index. Finally, it prints
the Series S again, showing the updated value at the 'AMZN' index.
Question 3
What will be the output produced by the following code ?
Stationery = ['pencils', 'notebooks', 'scales', 'erasers']
S = pd.Series([20, 33, 52, 10], index = Stationery)
S2 = pd.Series([17, 13, 31, 32], index = Stationery)
print(S + S2)
S = S + S2
print(S + S2)
Answer
Output
pencils 37
notebooks 46
scales 83
erasers 42
dtype: int64
pencils 54
notebooks 59
scales 114
erasers 74
dtype: int64
Explanation
The code creates two Pandas Series, S and S2. It then prints the result of adding these two
Series element-wise based on their corresponding indices. After updating S by
adding S and S2, it prints the result of adding updated S and S2 again.
Question 4
What will be the output produced by following code, considering the Series object S given
above ?
(a) print(S[1:1])
(b) print(S[0:1])
(c) print(S[0:2])
(d)
S[0:2] = 12
print(S)
(e)
print(S.index)
print(S.values)
Answer
(a)
Output
Series([], dtype: int64)
Explanation
The slice S[1:1] starts at index 1 and ends at index 1, but because the end index is exclusive,
it does not include any elements, resulting in an empty Series.
(b)
Output
pencils 20
dtype: int64
Explanation
The slice S[0:1] starts at index 0 and ends at index 1, but because the end index is exclusive,
it includes only one element i.e., the element at index 0.
(c)
Output
pencils 20
notebooks 33
dtype: int64
Explanation
The slice S[0:2] starts at index 0 and ends at index 2, but because the end index is exclusive,
it includes two elements i.e., the elements at index 0 and 1.
(d)
Output
pencils 12
notebooks 12
scales 52
erasers 10
dtype: int64
Explanation
The slice S[0:2] = 12 assigns the value 12 to indices 0 and 1 in Series S, directly modifying
those elements. The updated Series is then printed.
(e)
Output
Explanation
The code print(S.index) displays the indices of Series S, while print(S.values) displays
the values of Series.
Question 5
Write a Python program to create a series object, country using a list that stores the capital of
each country.
Note. Assume four countries to be used as index of the series object are India, UK, Denmark
and Thailand having their capitals as New Delhi, London, Copenhagen and Bangkok
respectively.
Solution
import pandas as pd
capitals = ['New Delhi', 'London', 'Copenhagen', 'Bangkok']
countries = ['India', 'UK', 'Denmark', 'Thailand']
country = pd.Series(capitals, index=countries)
print(country)
Output
Question 6(a)
Find the error in following code fragment :
S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3, 4, 5]
S2[5] = 220
print(S2)
Answer
S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3, 4, 5] #Error 1
S2[5] = 220
print(S2)
Error 1 — The Series S2 initially has four elements, so assigning a new index list of six
elements ([0, 1, 2, 3, 4, 5]) to S2.index will raise a ValueError because the new index list
length does not match the length of the Series.
The corrected code is:
S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3]
S2[5] = 220
print(S2)
Question 6(b)
Find the error in following code fragment :
S = pd.Series(2, 3, 4, 5, index = range(4))
Answer
In the above code fragment, the data values should be enclosed in square brackets [] to form a
list.
The corrected code is:
S = pd.Series([2, 3, 4, 5], index = range(4))
Question 6(c)
Find the error in following code fragment
S1 = pd.Series(1, 2, 3, 4, index = range(7))
Answer
In the above code fragment, the data values should be enclosed in square brackets to form a
list. In addition, the index range(7) supplies seven labels for only four data values
[1, 2, 3, 4]. The index must have the same length as the data.
The corrected code is:
S1 = pd.Series([1, 2, 3, 4], index = range(4))
Question 6(d)
Find the error in following code fragment :
S2 = pd.Series([1, 2, 3, 4], index = range(4))
Answer
There is no error in the above code.
Question 7
Find the Error :
data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
s = pd.Series(data, index = [100, 101, 102, 103, 104, 105])
print(s[102, 103, 104] )
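A likely source of the error, offered as a sketch rather than as the textbook's answer: s[102, 103, 104] passes the three numbers as one tuple key, which matches no label in the index. Selecting several labels requires a list inside the indexing brackets:

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
s = pd.Series(data, index=[100, 101, 102, 103, 104, 105])

# Pass a list of labels to select several elements at once
subset = s[[102, 103, 104]]
print(subset)
```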
Question 8
Why does following code cause error ?
s1 = pd.Series(range(1, 15, 3), index = list('abcd'))
Answer
The code causes an error because the length of the data (range(1, 15, 3)) and the length of
the index (list('abcd')) do not match. The range(1, 15, 3) generates the sequence [1, 4,
7, 10, 13], which has a length of 5. The list('abcd') generates the list ['a', 'b', 'c', 'd'], which
has a length of 4. When creating a pandas Series, the length of the data and the length of the
index must be the same.
Question 9
Why does following code cause error ?
s1 = pd.Series(range(1, 15, 3), index = list('ababa'))
print(s1['ab'])
Answer
The statement s1['ab'] raises a KeyError because 'ab' is not a single key in the index. The
index has individual keys 'a' and 'b', but not 'ab'.
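If the intent was to fetch the elements labelled 'a' and 'b', one option (an assumption about the intent, not part of the original answer) is to pass the labels as a list:

```python
import pandas as pd

s1 = pd.Series(range(1, 15, 3), index=list('ababa'))

# 'ab' is not a label in the index, so the lookup fails
try:
    s1['ab']
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# A list of labels selects every element carrying either label
both = s1[['a', 'b']]
print(both)
```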
Question 10
If Ser is a Series type object having 30 values, then how are statements (a), (b) and (c), (d)
similar and different ?
(a) print(Ser.head())
(b) print(Ser.head(8))
(c) print(Ser.tail())
(d) print(Ser.tail(11))
Answer
The statements (a), (b), (c) and (d) are all used to view the values from a pandas Series
object Ser. However, they differ in the number of values they display.
(a) print(Ser.head()): This statement will display the first 5 values from the Series Ser.
(b) print(Ser.head(8)): This statement will display the first 8 values from the Series Ser.
(c) print(Ser.tail()): This statement will display the last 5 values from the Series Ser.
(d) print(Ser.tail(11)): This statement will display the last 11 values from the Series Ser.
Question 11
What advantages does dataframe offer over series data structure ? If you have similar data
stored in multiple series and a single dataframe, which one would you prefer and why ?
Answer
The advantages of using a DataFrame over a Series are as follows:
1. A DataFrame can have multiple columns, whereas a Series can only have one.
2. A DataFrame can store data of different types in different columns, whereas a Series
can only store data of a single type.
3. A DataFrame allows operations across several columns at once (for example, row-wise
or column-wise aggregation), whereas a Series operates on a single column of values.
4. A DataFrame allows to index data using both row and column labels, whereas a
Series only allows to index data using a single label.
If there is similar data stored in multiple Series and a single DataFrame, I would prefer to use
the DataFrame. This is because a DataFrame allows us to store and manipulate data in a more
organized and structured way, and it allows us to perform operations on entire columns.
Additionally, a DataFrame allows us to index data using both row and column labels, which
makes it easier to access and manipulate data.
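As an illustration of this preference, the sketch below combines two hypothetical Series that share an index into one DataFrame and then derives a new column from both:

```python
import pandas as pd

# Two hypothetical Series describing the same items
qty = pd.Series([5, 8, 3], index=['pen', 'ink', 'pad'])
price = pd.Series([10.0, 2.5, 7.0], index=['pen', 'ink', 'pad'])

# In a DataFrame the related values sit side by side under one index
df = pd.DataFrame({'Qty': qty, 'Price': price})

# A column-wise operation that uses both columns at once
df['Total'] = df['Qty'] * df['Price']
print(df)
print(df.loc['ink', 'Total'])   # row label and column label together
```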
Question 12
Create a DataFrame in Python from the given list :
[['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak',
'Sales', 79000]]
Also give appropriate column headings as shown below :
Name Department Salary
0 Divya HR 95000
1 Mamta Marketing 97000
2 Payal IT 980000
3 Deepak Sales 79000
Solution
import pandas as pd
data = [['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT',
980000], ['Deepak', 'Sales', 79000]]
df = pd.DataFrame(data, columns=['Name', 'Department', 'Salary'])
print(df)
Output
Question 13
Carefully observe the following code :
import pandas as pd
Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)
print(df)
Question 14
Given :
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' :
pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df)
print(df1)
print(df2)
What will Python show the result as if you execute above code ?
Answer
Output
one two
a 1.0 1.0
b 2.0 2.0
c 3.0 3.0
d NaN 4.0
one two
d NaN 4.0
b 2.0 2.0
a 1.0 1.0
two three
d 4.0 NaN
a 1.0 NaN
Explanation
The given code creates three pandas DataFrames df, df1, and df2 using the same
dictionary d with different index and column labels. The first DataFrame df is created using
the dictionary d with index labels taken from the index of the Series objects in the dictionary.
The resulting DataFrame has two columns 'one' and 'two' with index labels 'a', 'b', 'c', and 'd'.
The values in the DataFrame are filled in accordance to the index and column labels. The
second DataFrame df1 is created with the same dictionary d but with a custom index ['d', 'b',
'a']. The third DataFrame df2 is created with a custom index ['d', 'a'] and a custom column
label ['two', 'three']. Since the dictionary d does not have a column label three, all its values
are NaN (Not a Number), indicating missing data.
Question 15(a)
From the DataFrames created in previous question, write code to display only row 'a' from
DataFrames df, df1, and df2.
Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' :
pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a',:])
print(df1.loc['a',:])
print(df2.loc['a',:])
Output
one 1.0
two 1.0
Name: a, dtype: float64
one 1.0
two 1.0
Name: a, dtype: float64
two 1.0
three NaN
Name: a, dtype: object
Question 15(b)
From the DataFrames created in previous question, write code to display only rows 0 and 1
from DataFrames df, df1, and df2.
Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' :
pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.iloc[0:2])
print(df1.iloc[0:2])
print(df2.iloc[0:2])
Output
one two
a 1.0 1.0
b 2.0 2.0
one two
d NaN 4.0
b 2.0 2.0
two three
d 4.0 NaN
a 1.0 NaN
Question 15(c)
From the DataFrames created in previous question, write code to display only rows 'a' and 'b'
for columns 1 and 2 from DataFrames df, df1 and df2.
Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' :
pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a' : 'b', :])
print(df1.loc['b' : 'a', :])
print(df2.loc['d' : 'a', :])
Output
one two
a 1.0 1.0
b 2.0 2.0
one two
b 2.0 2.0
a 1.0 1.0
two three
d 4.0 NaN
a 1.0 NaN
Question 15(d)
From the DataFrames created in previous question, write code to add an empty column 'x' to
all DataFrames.
Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' :
pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
df['x'] = None
df1['x'] = None
df2['x'] = None
print(df)
print(df1)
print(df2)
Output
one two x
a 1.0 1.0 None
b 2.0 2.0 None
c 3.0 3.0 None
d NaN 4.0 None
one two x
d NaN 4.0 None
b 2.0 2.0 None
a 1.0 1.0 None
two three x
d 4.0 NaN None
a 1.0 NaN None
Question 16
What will be the output of the following program ?
import pandas as pd
dic = {'Name' : ['Sapna', 'Anmol', 'Rishul', 'Sameep'], 'Agg' : [56, 67, 75, 76],
'Age' : [16, 18, 16, 19]}
df = pd.DataFrame(dic, columns = ['Name', 'Age'])
print(df)
(a)
Name Agg Age
101 Sapna 56 16
102 Anmol 67 18
103 Rishul 75 16
104 Sameep 76 19
(b)
Name Agg Age
0 Sapna 56 16
1 Anmol 67 18
2 Rishul 75 16
3 Sameep 76 19
(c)
Name
0 Sapna
1 Anmol
2 Rishul
3 Sameep
(d)
Name Age
0 Sapna 16
1 Anmol 18
2 Rishul 16
3 Sameep 19
Answer
(d)
Output
Name Age
0 Sapna 16
1 Anmol 18
2 Rishul 16
3 Sameep 19
Explanation
The code creates a DataFrame df with columns 'Name' and 'Age' using a dictionary. It
contains data about individual's names and ages. The DataFrame is then printed, displaying
the specified columns.
Question 17
Predict the output of following code (it uses below given dictionary my_di).
my_di = {"name" : ["Jiya", "Tim", "Rohan"],
"age" : np.array([10, 15, 20]),
"weight" : (75, 123, 239),
"height" : [4.5, 5, 6.1],
"siblings" : 1,
"gender" : "M"}
df = pd.DataFrame(my_di)
print(df)
Answer
Output
Explanation
The given code creates a dictionary my_di. Then, a DataFrame df is created using
the pd.DataFrame() constructor and passing the my_di dictionary. The print() function is
used to display the DataFrame.
Question 18
Consider the same dictionary my_di in the previous question (shown below), what will be the
output produced by following code ?
my_di = {"name" : ["Jiya", "Tim", "Rohan"],
"age" : np.array([10, 15, 20]),
"weight" : (75, 123, 239),
"height" : [4.5, 5, 6.1],
"siblings" : 1,
"gender" : "M"}
df2 = pd.DataFrame(my_di, index = my_di["name"])
print(df2)
Answer
Output
The given code creates a dictionary my_di. Then, a DataFrame df2 is created using
the pd.DataFrame() constructor and passing the my_di dictionary and the my_di["name"] list
as the index. The print() function is used to display the DataFrame.
Question 19
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been
created as per questions 17 and 18 above. Predict the output of following code fragment :
print(df2["weight"])
print(df2.weight['Tim'])
Answer
Output
Jiya 75
Tim 123
Rohan 239
Name: weight, dtype: int64
123
Explanation
The given code creates a dictionary my_di. Then, a DataFrame df2 is created using
the pd.DataFrame() constructor and passing the my_di dictionary and the my_di["name"] list
as the index. The print() function is used to display the 'weight' column of the
DataFrame df2 and the value of the 'weight' column for the row with index 'Tim'.
Question 20
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been
created as per questions 17 and 18 above. Predict the output of following code fragment :
df2["IQ"] = [130, 105, 115]
df2["Married"] = False
print(df2)
Answer
Output
Explanation
The code adds two new columns "IQ" with values [130, 105, 115] and "Married" with value
"False" for all rows to DataFrame df2, then prints the DataFrame.
Question 21
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been
created as per questions 17 and 18 above. Predict the output produced by following code
fragment :
df2["College"] = pd.Series(["IIT"], index=["Rohan"])
print(df2)
Answer
Output
Explanation
The code snippet uses the pandas and numpy libraries in Python to create a DataFrame
named df2 from a dictionary my_di. The DataFrame is indexed by names, and a new column
"College" is added with "IIT" as the value only for the index named "Rohan."
Question 22
Assume that required libraries (pandas and numpy) are imported and dataframe df2 has been
created as per questions 17 and 18 above. Predict the output produced by following code
fragment :
print(df2.loc["Jiya"])
print(df2.loc["Jiya", "IQ"])
print(df2.loc["Jiya":"Tim", "IQ":"College"])
print(df2.iloc[0])
print(df2.iloc[0, 5])
print(df2.iloc[0:2, 5:8])
Answer
Output
name Jiya
age 10
weight 75
height 4.5
siblings 1
gender M
IQ 130
College NaN
Name: Jiya, dtype: object
130
IQ College
Jiya 130 NaN
Tim 105 NaN
name Jiya
age 10
weight 75
height 4.5
siblings 1
gender M
IQ 130
College NaN
Name: Jiya, dtype: object
M
gender IQ College
Jiya M 130 NaN
Tim M 105 NaN
Explanation
1. print(df2.loc["Jiya"]) — This line prints all columns of the row with the index
"Jiya".
2. print(df2.loc["Jiya", "IQ"]) — This line prints the value of the "IQ" column for
the row with the index "Jiya".
3. print(df2.loc["Jiya":"Tim", "IQ":"College"]) — This line prints a subset of rows
and columns using labels, from "Jiya" to "Tim" for rows and from "IQ" to "College"
for columns.
4. print(df2.iloc[0]) — This line prints all columns of the first row using integer-
based indexing (position 0).
5. print(df2.iloc[0, 5]) — This line prints the value of the 6th column for the first
row using integer-based indexing.
6. print(df2.iloc[0:2, 5:8]) — This line prints a subset of rows and columns using
integer-based indexing, selecting rows from position 0 to 1 and columns from position
5 to 7.
Question 23
What is the output of the following code ?
d = {'col1': [1, 4, 3 ], 'col2': [6, 7, 8], 'col3': [9, 0, 1]}
df = pd.DataFrame(d)
print("Original DataFrame")
print(df)
print("New DataFrame :")
dfn = df.drop(df.index[[1, 2]])
print(dfn)
Answer
Output
Original DataFrame
col1 col2 col3
0 1 6 9
1 4 7 0
2 3 8 1
New DataFrame :
col1 col2 col3
0 1 6 9
Explanation
The code creates a DataFrame using the pandas library in Python, named df, with three
columns ('col1', 'col2', 'col3') and three rows of data. The DataFrame df is printed, and then a
new DataFrame named dfn is created by dropping the rows with indices 1 and 2 from the
original DataFrame using df.drop(df.index[[1, 2]]). The resulting DataFrame, dfn,
contains only the first row from the df DataFrame, removing rows 2 and 3.
Question 24
What is the output of the following code ?
data = {'age': [20, 23, 22], 'name': ['Ruhi', 'Ali', 'Sam']}
df1 = pd.DataFrame(data, index=[1, 2, 3])
print("Before")
print(df1)
df1['Edu'] = ['BA', 'BE' , 'MBA']
print('After')
print(df1)
Answer
Output
Before
age name
1 20 Ruhi
2 23 Ali
3 22 Sam
After
age name Edu
1 20 Ruhi BA
2 23 Ali BE
3 22 Sam MBA
Explanation
The code utilizes the pandas library in Python to create a DataFrame named df1 using a
dictionary data. The df1 DataFrame is printed, showing the initial data. Then, a new column
'Edu' is added to the DataFrame using df1['Edu'] = ['BA', 'BE' , 'MBA']. The updated
DataFrame is printed.
Question 25
Consider the given DataFrame 'Genre' :
No Type Code
0 Fiction F
1 Non-fiction NF
2 Drama D
3 Poetry P
(ii)
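# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
Genre = pd.concat([Genre, pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])],
ignore_index=True)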
(iii)
Genre.rename(columns = {'Code': 'Book_Code'}, inplace = True)
Question 26
Write a program in Python Pandas to create the following DataFrame batsman from a
Dictionary :
1 Sunil Pillai 90 80
2 Gaurav Sharma 65 45
3 Piyush Goel 70 90
4 Karthik Thakur 80 76
Question 27
Consider the following dataframe, and answer the questions given below:
import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],
"Quarter2": [5800, 2500, 5400, 3000, 2900],
"Quarter3": [20000, 16000, 7000, 3600, 8200],
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
(i) Write the code to find mean value from above dataframe df over the index and column
axis.
(ii) Use sum() function to find the sum of all the values over the index axis.
Answer
(i)
import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],
"Quarter2": [5800, 2500, 5400, 3000, 2900],
"Quarter3": [20000, 16000, 7000, 3600, 8200],
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
mean_over_columns = df.sum(axis=1) / df.count(axis=1)
print("Mean over columns: \n", mean_over_columns)
Output
Output
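Both parts can also be answered with the built-in mean() and sum() methods. A minimal sketch, in which axis=0 works down the index (one result per column) and axis=1 works across the columns (one result per row):

```python
import pandas as pd

df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

mean_over_index = df.mean(axis=0)     # (i) one mean per column
mean_over_columns = df.mean(axis=1)   # (i) one mean per row
sum_over_index = df.sum(axis=0)       # (ii) one total per column
print("Mean over index: \n", mean_over_index)
print("Mean over columns: \n", mean_over_columns)
print("Sum over index: \n", sum_over_index)
```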
Question 28
Write the use of the rename(mapper = <dict-like>, axis = 1) method for a Pandas Dataframe.
Can the mapper and columns parameter be used together in a rename() method ?
Answer
The rename() method in pandas DataFrame is used to alter the names of columns or rows. It
accepts various parameters, including mapper and axis, which can be used together to rename
columns and rows based on a mapping dictionary. The mapper parameter allows for a dict-
like object mapping old names to new names, while axis specifies whether the renaming
should occur along columns (axis=1) or rows (axis=0).
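A short sketch of the mapper form next to the equivalent columns keyword form, using a hypothetical two-column DataFrame:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'RollNo': [115, 236], 'Name': ['Pavni', 'Rishi']})
mapping = {'RollNo': 'Roll', 'Name': 'Student'}

via_mapper = df.rename(mapper=mapping, axis=1)   # mapper applied to the column axis
via_columns = df.rename(columns=mapping)         # same renaming via the columns keyword
print(via_mapper)
print(via_mapper.equals(via_columns))
```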
No, the mapper parameter and the columns parameter cannot be used together in
the rename() method of a pandas DataFrame. The mapper parameter, combined with axis, is
one calling style; the index and columns parameters are the alternative style, and pandas
raises a TypeError if mapper is passed along with index or columns. Like mapper,
the columns parameter takes a dict-like object (or a function) that maps old column names
to new column names; it does not take a list of new names. Thus
rename(mapper = <dict-like>, axis = 1) is equivalent to rename(columns = <dict-like>).
Question 29
Find the error in the following code ? Suggest the solution.
>>> topDf
RollNo Name Marks
Sec A 115 Pavni 97.5
Sec B 236 Rishi 98.0
Sec C 307 Preet 98.5
Sec D 422 Paula 98.0
topDf.del['Sec D']
Answer
The error in the code is that topDf.del['Sec D'] is not the correct syntax to delete a row
from a DataFrame in pandas. The correct syntax to delete a row in pandas is using
the drop() method along with specifying the index label or index position of the row to be
deleted.
The corrected code is:
>>> topDf.drop(['Sec D'])
Output
Question 30
Find the error in the following code considering the same dataframe topDf given in the
previous question.
(i) topDf.rename(index=['a', 'b', 'c', 'd'])
(ii) topDf.rename(columns = {})
Answer
(i) The line topDf.rename(index=['a', 'b', 'c', 'd']) attempts to rename the index of the
DataFrame topDf, but it doesn't assign the modified DataFrame back to topDf or use
the inplace = True parameter to modify topDf directly. Additionally, using a list of new
index labels without specifying the current index labels will result in an error.
The corrected code is:
topDf.rename(index={'Sec A': 'a', 'Sec B': 'b', 'Sec C': 'c', 'Sec D': 'd'},
inplace = True)
(ii) The line topDf.rename(columns={}) attempts to rename columns in the DataFrame topDf,
but it provides an empty dictionary {} for renaming, which will not perform any renaming.
We need to provide a mapping dictionary with old column names as keys and new column
names as values. To modify topDf directly, it should use the inplace = True parameter.
The corrected code is:
topDf.rename(columns={'RollNo': 'NewRollNo', 'Name': 'NewName', 'Marks':
'NewMarks'}, inplace = True)
Solution
import pandas as pd
temperatures = [28.0, 30.4, 26.5, 29.4, 27.0, 31.2, 25.8]
Temp1 = pd.Series(temperatures)
print(Temp1)
Output
0 28.0
1 30.4
2 26.5
3 29.4
4 27.0
5 31.2
6 25.8
dtype: float64
Question 2
Write Python code to create a Series object Temp2 storing temperatures of seven days of
week. Its indexes should be 'Sunday', 'Monday',... 'Saturday'.
Solution
import pandas as pd
temperatures = [28.9, 30.1, 26.2, 29.3, 27.5, 31.9, 25.5]
days_of_week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
'Saturday']
Temp2 = pd.Series(temperatures, index = days_of_week)
print(Temp2)
Output
Sunday 28.9
Monday 30.1
Tuesday 26.2
Wednesday 29.3
Thursday 27.5
Friday 31.9
Saturday 25.5
dtype: float64
Question 3
A series object (say T1) stores the average temperature recorded on each day of a month.
Write code to display the temperatures recorded on :
(i) first 7 days
(ii) last 7 days.
Solution
import pandas as pd
T1 = pd.Series([25.6, 26.3, 27.9, 28.2, 29.1, 30.9, 31.2, 32.4, 33.2, 34.4, 33.3,
32.5, 31.4, 30.7, 29.6, 28.9, 27.0, 26.2, 25.32, 24.34, 23.4, 22.3, 21.6, 20.9,
19.8, 18.1, 17.2, 16.34, 15.5, 14.6])
first_7_days = T1.head(7)
print("Temperatures recorded on the first 7 days:")
print(first_7_days)
last_7_days = T1.tail(7)
print("\nTemperatures recorded on the last 7 days:")
print(last_7_days)
Output
Question 4
Series objects Temp1, Temp2, Temp3, Temp4 store the temperatures of days of week1,
week2, week3, week4 respectively.
Write a script to
(a) print the average temperature per week.
(b) print average temperature of entire month.
Solution
import pandas as pd
Temp1 = pd.Series([28.0, 30.2, 26.1, 29.6, 27.7, 31.8, 25.9])
Temp2 = pd.Series([25.5, 24.5, 23.6, 22.7, 21.8, 20.3, 19.2])
Temp3 = pd.Series([32.4, 33.3, 34.1, 33.2, 32.4, 31.6, 30.9])
Temp4 = pd.Series([27.3, 28.1, 29.8, 30.6, 31.7, 32.8, 33.0])
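# Total temperature of each week
Week_1 = sum(Temp1)
Week_2 = sum(Temp2)
Week_3 = sum(Temp3)
Week_4 = sum(Temp4)
# (a) Average temperature per week = weekly total / number of days in that week
print("Week 1 average temperature:", Week_1 / len(Temp1))
print("Week 2 average temperature:", Week_2 / len(Temp2))
print("Week 3 average temperature:", Week_3 / len(Temp3))
print("Week 4 average temperature:", Week_4 / len(Temp4))
# (b) Average temperature of the entire month = monthly total / total number of days
total = Week_1 + Week_2 + Week_3 + Week_4
days = len(Temp1) + len(Temp2) + len(Temp3) + len(Temp4)
print("Average temperature of entire month:", total / days)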
Output
Question 5
Ekam, a Data Analyst with a multinational brand has designed the DataFrame df that contains
the four quarters' sales data of different stores as shown below :
Output
15
Explanation
The size attribute of a DataFrame returns the total number of elements in the DataFrame df.
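As a rough illustration of how size relates to the shape of a DataFrame, the sketch below uses a hypothetical stand-in for df with 3 rows and 5 columns. The store names and sales figures are invented and are not the data from the question.

```python
import pandas as pd

# Hypothetical stand-in for df: 3 stores x 5 columns (all values invented).
df = pd.DataFrame({
    'Store': ['Store1', 'Store2', 'Store3'],
    'Qtr1': [300, 350, 250],
    'Qtr2': [240, 340, 180],
    'Qtr3': [450, 403, 145],
    'Qtr4': [230, 210, 160],
})

rows, cols = df.shape   # shape is a (rows, columns) tuple
print(df.shape)
print(df.size)          # size is rows * columns
```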
(b) print(df[1:3])
Output
Explanation
This statement uses slicing to extract rows 1 and 2 from the DataFrame df.
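A small sketch of this kind of row slicing, assuming a DataFrame with the default integer index. The data is hypothetical and only stands in for df.

```python
import pandas as pd

# Hypothetical data with the default index 0, 1, 2, 3.
df = pd.DataFrame({'Store': ['Store1', 'Store2', 'Store3', 'Store4'],
                   'Qtr1': [300, 350, 250, 410]})

# df[1:3] slices rows by position: positions 1 and 2; the stop position 3 is excluded.
subset = df[1:3]
print(subset)

# The same rows can be selected explicitly by position with iloc.
same_rows = df.iloc[1:3]
```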
(ii)
df = df.drop(2)
Output
Output
Question 6(i)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno name UT1 UT2 UT3 UT4
1 Prerna Singh 24 24 20 22
2 Manish Arora 18 17 19 22
3 Tanish Goel 20 22 18 24
4 Falguni Jain 22 20 24 20
5 Kanika Bhatnagar 15 20 18 22
6 Ramandeep Kaur 20 15 22 24
Write down the command that will give the following output :
rollno 6
name Tanish Goel
UT1 24
UT2 24
UT3 24
UT4 24
dtype: object
(a) print(df.max)
(b) print(df.max())
(c) print(df.max(axis = 1))
(d) print(df.max, axis = 1)
Answer
print(df.max())
Explanation
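Calling df.max() returns the maximum value of each column of the DataFrame. For the text
column name, the maximum is the value that sorts last alphabetically ('Tanish Goel').
Options (a) and (d) refer to df.max without calling it, and option (c) works along each row
instead of each column.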
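The column-wise maximum can be checked with a short sketch. The DataFrame below is rebuilt by hand from the table in the question, so the construction code is an assumption about how df was created.

```python
import pandas as pd

# df rebuilt from the table in the question.
df = pd.DataFrame({
    'rollno': [1, 2, 3, 4, 5, 6],
    'name': ['Prerna Singh', 'Manish Arora', 'Tanish Goel',
             'Falguni Jain', 'Kanika Bhatnagar', 'Ramandeep Kaur'],
    'UT1': [24, 18, 20, 22, 15, 20],
    'UT2': [24, 17, 22, 20, 20, 15],
    'UT3': [20, 19, 18, 24, 18, 22],
    'UT4': [22, 22, 24, 20, 22, 24],
})

col_max = df.max()        # calling the method gives one maximum per column
print(col_max)
print(callable(df.max))   # df.max without () is only the method object
```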
Question 6(ii)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno name UT1 UT2 UT3 UT4
1 Prerna Singh 24 24 20 22
2 Manish Arora 18 17 19 22
3 Tanish Goel 20 22 18 24
4 Falguni Jain 22 20 24 20
5 Kanika Bhatnagar 15 20 18 22
6 Ramandeep Kaur 20 15 22 24
The teacher needs to know the marks scored by the student with roll number 4. Help her
identify the correct set of statement/s from the given options:
(a) df1 = df[df['rollno'] == 4]
print(df1)
(b) df1 = df[rollno == 4]
print(df1)
(c) df1 = df.[df.rollno = 4]
print(df1)
(d) df1 = df[df.rollno == 4]
print(df1)
Answer
df1 = df[df.rollno == 4]
print(df1)
Explanation
The statement df1 = df[df.rollno == 4] filters the DataFrame df using boolean indexing. The
comparison df.rollno == 4 produces a boolean mask with one True/False value per row, and
df[...] keeps only the rows where the mask is True. The resulting DataFrame df1 therefore
contains only the row for roll number 4. Option (a) builds the same mask with df['rollno']
and selects the same row, whereas option (b) refers to an undefined name rollno and
option (c) is a syntax error.
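The two steps of boolean indexing (build the mask, then select with it) can be sketched separately. The DataFrame is rebuilt by hand from the table in the question, and the names mask and df1_alt are introduced here only for illustration.

```python
import pandas as pd

# df rebuilt from the table in the question.
df = pd.DataFrame({
    'rollno': [1, 2, 3, 4, 5, 6],
    'name': ['Prerna Singh', 'Manish Arora', 'Tanish Goel',
             'Falguni Jain', 'Kanika Bhatnagar', 'Ramandeep Kaur'],
    'UT1': [24, 18, 20, 22, 15, 20],
    'UT2': [24, 17, 22, 20, 20, 15],
    'UT3': [20, 19, 18, 24, 18, 22],
    'UT4': [22, 22, 24, 20, 22, 24],
})

mask = df.rollno == 4             # boolean Series: True only where rollno is 4
print(mask.tolist())

df1 = df[mask]                    # keep only the rows where the mask is True
print(df1)

df1_alt = df[df['rollno'] == 4]   # bracket access to the column gives the same rows
```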
Question 6(iii)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno name UT1 UT2 UT3 UT4
1 Prerna Singh 24 24 20 22
2 Manish Arora 18 17 19 22
3 Tanish Goel 20 22 18 24
4 Falguni Jain 22 20 24 20
5 Kanika Bhatnagar 15 20 18 22
6 Ramandeep Kaur 20 15 22 24
Which of the following statement/s will give the exact number of values in each column of
the dataframe ?
(I) print(df.count())
(II) print(df.count(0))
(III) print(df.count)
(IV) print((df.count(axis = 'index')))
Choose the correct option :
(a) both (I) and (II)
(b) only (II)
(c) (I), (II) and (III)
(d) (I), (II) and (IV)
Answer
(I), (II) and (IV)
Explanation
In pandas, the statement df.count() and df.count(0) calculate the number of non-null values
in each column of the DataFrame df. The statement df.count(axis='index') specifies the
axis parameter as 'index', which is equivalent to specifying axis=0. This means it will count
non-null values in each column of the DataFrame df.
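To make the "non-null" part visible, the sketch below uses a shortened version of the table with one mark replaced by NaN. The missing value is an assumption added for illustration and is not part of the original data.

```python
import numpy as np
import pandas as pd

# Shortened table; the NaN in UT1 is inserted here only to show how count() treats it.
df = pd.DataFrame({
    'rollno': [1, 2, 3],
    'name': ['Prerna Singh', 'Manish Arora', 'Tanish Goel'],
    'UT1': [24, np.nan, 20],
})

per_column = df.count()                     # non-null values in each column
per_column_index = df.count(axis='index')   # 'index' means the same as axis=0
per_row = df.count(axis=1)                  # non-null values in each row

print(per_column)
print(per_row)
print(callable(df.count))                   # df.count without () is only the method object
```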
Question 6(iv)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno name UT1 UT2 UT3 UT4
1 Prerna Singh 24 24 20 22
2 Manish Arora 18 17 19 22
3 Tanish Goel 20 22 18 24
4 Falguni Jain 22 20 24 20
5 Kanika Bhatnagar 15 20 18 22
6 Ramandeep Kaur 20 15 22 24
Which of the following commands will display the column labels of the DataFrame ?
(a) print(df.columns())
(b) print(df.column())
(c) print(df.column)
(d) print(df.columns)
Answer
print(df.columns)
Explanation
The statement df.columns is used to access the column labels (names) of a DataFrame in
pandas.
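A brief sketch of why only option (d) works: columns is an attribute, not a method, and column (singular) does not exist on this DataFrame. The DataFrame here is a shortened, hand-built version of the table.

```python
import pandas as pd

# Shortened version of the table in the question.
df = pd.DataFrame({
    'rollno': [1, 2],
    'name': ['Prerna Singh', 'Manish Arora'],
    'UT1': [24, 18], 'UT2': [24, 17], 'UT3': [20, 19], 'UT4': [22, 22],
})

labels = df.columns          # an Index object holding the column labels
print(labels)
print(list(labels))

try:
    df.columns()             # option (a): the Index object cannot be called
except TypeError as err:
    print("TypeError:", err)

print(hasattr(df, 'column'))  # options (b) and (c): no attribute named 'column'
```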
Question 6(v)
Consider the following DataFrame df and answer any four questions from (i)-(v):
rollno name UT1 UT2 UT3 UT4
1 Prerna Singh 24 24 20 22
2 Manish Arora 18 17 19 22
3 Tanish Goel 20 22 18 24
4 Falguni Jain 22 20 24 20
5 Kanika Bhatnagar 15 20 18 22
6 Ramandeep Kaur 20 15 22 24
Ms. Sharma, the class teacher, wants to add a new column named Grade with the
values 'A', 'B', 'A', 'A', 'B', 'A' to the DataFrame.
Help her choose the command to do so :
(a) df.column = ['A', 'B', 'A', 'A', 'B', 'A']
(b) df['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']
(c) df.loc['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']
(d) Both (b) and (c) are correct
Answer
df['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']
Explanation
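The statement df['Grade'] = [...] creates a new column named 'Grade' in the DataFrame df,
because square brackets with a column label access or create a column. Option (a) only sets
an attribute named column on the object and does not add a column. Option (c) uses df.loc
with a row label, so it would add a new row labelled 'Grade' rather than a column.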
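The contrast between assigning through df['Grade'] and through df.loc['Grade'] can be sketched as below. The DataFrame is rebuilt by hand from the table, and the copy named as_row is introduced here only so that both assignments can be shown side by side.

```python
import pandas as pd

# df rebuilt from the table in the question.
df = pd.DataFrame({
    'rollno': [1, 2, 3, 4, 5, 6],
    'name': ['Prerna Singh', 'Manish Arora', 'Tanish Goel',
             'Falguni Jain', 'Kanika Bhatnagar', 'Ramandeep Kaur'],
    'UT1': [24, 18, 20, 22, 15, 20],
    'UT2': [24, 17, 22, 20, 20, 15],
    'UT3': [20, 19, 18, 24, 18, 22],
    'UT4': [22, 22, 24, 20, 22, 24],
})
grades = ['A', 'B', 'A', 'A', 'B', 'A']

# df.loc[label] addresses a row, so this adds a ROW labelled 'Grade' to the copy
# (the six values happen to match the six columns).
as_row = df.copy()
as_row.loc['Grade'] = grades
print(as_row.shape)

# df[label] addresses a column, so this adds a COLUMN named 'Grade'.
df['Grade'] = grades
print(df.shape)
```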
Question 7
Write a program that stores the sales of 5 fast moving items of a store for each month in 12
Series objects, i.e., S1 Series object stores sales of these 5 items in 1st month, S2 stores sales
of these 5 items in 2nd month, and so on.
The program should display the summary sales report like this :
Total Yearly Sales, item-wise (should display sum of items' sales over the months)
Maximum sales of item made : <name of item that was sold the maximum in whole
year>
Maximum sales for individual items
Maximum sales of item 1 made : <month in which that item sold the maximum>
Maximum sales of item 2 made : <month in which that item sold the maximum>
Maximum sales of item 3 made : <month in which that item sold the maximum>
Maximum sales of item 4 made : <month in which that item sold the maximum>
Maximum sales of item 5 made : <month in which that item sold the maximum>
Solution
import pandas as pd
sales_data = {
'Month_1': pd.Series([300, 250, 200, 150, 350], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_2': pd.Series([380, 210, 220, 180, 320], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_3': pd.Series([320, 270, 230, 200, 380], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_4': pd.Series([310, 260, 210, 190, 360], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_5': pd.Series([290, 240, 220, 170, 340], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_6': pd.Series([300, 250, 400, 160, 350], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_7': pd.Series([310, 260, 230, 180, 370], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_8': pd.Series([320, 270, 240, 190, 380], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_9': pd.Series([330, 280, 250, 200, 400], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_10': pd.Series([340, 290, 260, 510, 420], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_11': pd.Series([350, 300, 270, 220, 440], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5']),
'Month_12': pd.Series([360, 390, 280, 230, 260], index=['Item_1', 'Item_2',
'Item_3', 'Item_4', 'Item_5'])
}
sales_df = pd.DataFrame(sales_data)
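print("Total Yearly Sales, item-wise:")
# Rows are items and columns are months, so sum across the columns (axis=1)
total_sales = sales_df.sum(axis=1)
print(total_sales)
# Item with the highest total sales over the whole year
max_sales_item = total_sales.idxmax()
print("\nMaximum sales of item made: ", max_sales_item)
print("\nMaximum sales for individual items")
# For each item, find the month (column label) with its highest sales
for item in sales_df.index:
    print("Maximum sales of", item, "made: ", sales_df.loc[item].idxmax())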
Output
Question 8
Three Series objects store the marks of 10 students in three terms. Roll numbers of students
form the index of these Series objects. The Three Series objects have the same indexes.
Calculate the total weighted marks obtained by students as per following formula :
Final marks = 25% Term 1 + 25% Term 2 + 50% Term 3
Store the Final marks of students in another Series object.
Solution
import pandas as pd
term1 = pd.Series([80, 70, 90, 85, 75, 95, 80, 70, 85, 90], index=[1, 2, 3, 4, 5,
6, 7, 8, 9, 10])
term2 = pd.Series([85, 90, 75, 80, 95, 85, 90, 75, 80, 85], index=[1, 2, 3, 4, 5,
6, 7, 8, 9, 10])
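term3 = pd.Series([90, 85, 95, 90, 80, 85, 95, 90, 85, 90], index=[1, 2, 3, 4, 5,
6, 7, 8, 9, 10])
# Final marks = 25% Term 1 + 25% Term 2 + 50% Term 3
final_marks = (term1 * 0.25) + (term2 * 0.25) + (term3 * 0.50)
print(final_marks)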
Output
1 86.25
2 82.50
3 88.75
4 86.25
5 82.50
6 87.50
7 90.00
8 81.25
9 83.75
10 88.75
dtype: float64
Question 9
Write code to print all the information about a Series object.
Solution
import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
s.info()
Output
a 1
b 2
c 3
d 4
dtype: int64
<class 'pandas.core.series.Series'>
Index: 4 entries, a to d
Series name: None
Non-Null Count Dtype
-------------- -----
4 non-null int64
dtypes: int64(1)
memory usage: 64.0+ bytes
Question 10
Write a program to create three different Series objects from the three columns of a
DataFrame df.
Solution
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df['A']
s2 = df['B']
s3 = df['C']
print(s1)
print(s2)
print(s3)
Output
0 1
1 2
2 3
Name: A, dtype: int64
0 4
1 5
2 6
Name: B, dtype: int64
0 7
1 8
2 9
Name: C, dtype: int64
Question 11
Write a program to create three different Series objects from the three rows of a DataFrame
df.
Solution
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df.iloc[0]
s2 = df.iloc[1]
s3 = df.iloc[2]
print(s1)
print(s2)
print(s3)
Output
A 1
B 4
C 7
Name: 0, dtype: int64
A 2
B 5
C 8
Name: 1, dtype: int64
A 3
B 6
C 9
Name: 2, dtype: int64
Question 12
Write a program to create a Series object from an ndarray that stores characters from 'a' to 'g'.
Solution
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S = pd.Series(data)
print(S)
Output
0 a
1 b
2 c
3 d
4 e
5 f
6 g
dtype: object
Question 13
Write a program to create a Series object that stores the table of number 5.
Solution
import pandas as pd
import numpy as np
arr = np.arange(1, 11)
s = pd.Series(arr * 5)
print(s)
Output
0 5
1 10
2 15
3 20
4 25
5 30
6 35
7 40
8 45
9 50
dtype: int32
Question 14
Write a program to create a Dataframe that stores two columns, which store the Series objects
of the previous two questions (12 and 13).
Solution
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S1 = pd.Series(data)
arr = np.arange(1, 11)
S2 = pd.Series(arr * 5)
df = pd.DataFrame({'Characters': S1, 'Table of 5': S2})
print(df)
Output
Characters Table of 5
0 a 5
1 b 10
2 c 15
3 d 20
4 e 25
5 f 30
6 g 35
7 NaN 40
8 NaN 45
9 NaN 50
Question 15
Write a program to create a Dataframe storing salesmen details (name, zone, sales) of five
salesmen.
Solution
import pandas as pd
salesmen = {'Name': ['Jahangir', 'Janavi', 'Manik', 'Lakshmi', 'Tanisha'], 'Zone':
['North', 'South', 'East', 'West', 'Central'], 'Sales': [5000, 7000, 3000, 8000,
6000]}
df = pd.DataFrame(salesmen)
print(df)
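Output
Name Zone Sales
0 Jahangir North 5000
1 Janavi South 7000
2 Manik East 3000
3 Lakshmi West 8000
4 Tanisha Central 6000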
Question 16
Four dictionaries store the details of four employees-of-the-month as (empno, name). Write a
program to create a dataframe from these.
Solution
import pandas as pd
emp1 = {'empno': 1001, 'name': 'Ameesha'}
emp2 = {'empno': 1002, 'name': 'Akruti'}
emp3 = {'empno': 1003, 'name': 'Prithvi'}
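emp4 = {'empno': 1004, 'name': 'Rajesh'}
# Each dictionary becomes one row; the keys become the column labels
emp_list = [emp1, emp2, emp3, emp4]
df = pd.DataFrame(emp_list)
print(df)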
Output
empno name
0 1001 Ameesha
1 1002 Akruti
2 1003 Prithvi
3 1004 Rajesh
Question 17
A list stores three dictionaries each storing details, (old price, new price, change). Write a
program to create a dataframe from it.
Solution
import pandas as pd
prices = [{'old_price': 10, 'new_price': 12, 'change': 2},
{'old_price': 20, 'new_price': 18, 'change': -2},
{'old_price': 30, 'new_price': 35, 'change': 5}]
df = pd.DataFrame(prices)
print(df)
Output