DAI 101 Tutorial 2_Solution

The document is a tutorial covering basic concepts and functionalities of NumPy and Pandas, including array manipulation, data types, and DataFrame operations. It contains a series of questions and answers related to both libraries, focusing on syntax, methods, and error handling. The tutorial serves as a foundational guide for users to understand and utilize NumPy and Pandas effectively.

Uploaded by

Awaan Siddiqui

DAI 101 Tutorial 2 (NumPy, Pandas)

NumPy

Q.1 What is the output of the following program?

import numpy as np
A=np.array([[3,2,4],[3,4,5]])
print(A.T.shape)

(a) (3,3)
(b) (3,2)
(c) (1,3)
(d) None

Ans. b

Q.2 How do you get and display the data type of a NumPy array n?

(a) print(dtype(n))
(b) print(type(n))
(c) print(n.type)
(d) print(n.dtype)

Ans. d

Q.3 Which of the following lists only methods that can be applied to a boolean NumPy array?

(a) sum(), any(), np.type()
(b) sum(), any(), all(), np.type()
(c) objects(), any()
(d) sum(), any(), all()

Ans. d

Q.4 How do you search a NumPy array for a value?

(a) numpy.search()
(b) numpy.find()
(c) numpy.contains()
(d) numpy.where()

Ans. d. numpy.where() returns the indices at which a condition is true, so it can be used to search a NumPy array for a value.
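
As a minimal sketch of the answer to Q.4, the snippet below searches a small array with numpy.where(). The array contents and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 20, 50])

# With only a condition, np.where returns a tuple of index arrays
# (one per dimension) marking where the condition is True.
matches = np.where(arr == 20)
idx = matches[0]
print(idx)

# With three arguments it chooses element-wise between two alternatives.
flags = np.where(arr > 25, "big", "small")
print(flags)
```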

Q.5 Which function finds the set difference between two NumPy arrays?

(a) numpy.setdiff2d()
(b) numpy.setdiff1d()
(c) numpy.setdiff()
(d) numpy.diff()

Ans. b
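
A short sketch contrasting numpy.setdiff1d() with numpy.diff(), which is sometimes confused with it. The sample arrays are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 3])
b = np.array([2, 4, 6])

# Sorted, unique values of a that do not appear in b.
only_in_a = np.setdiff1d(a, b)
print(only_in_a)

# np.diff is unrelated: it subtracts consecutive elements of one array.
steps = np.diff(a)
print(steps)
```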

Q.6 What will be the output of the following program?

import numpy as np
a1=np.array([[14,36],[17,47]])
a2=np.array([[10,15]])
a3=np.concatenate((a1,a2),axis=0)
print(a3)
a4=a3.reshape(2,3)
print()
print(a4)

Ans.

[[14 36]
 [17 47]
 [10 15]]

[[14 36 17]
 [47 10 15]]

Q.7 What will be the output of the following program?

import numpy as np
a=np.array([4,5,6])
b=a
a[1]=3
print(b)
(a) [4,5,6]
(b) [3,5,6]
(c) [4,3,6]
(d) [4,5,3]

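Ans: c, [4 3 6]. The assignment b=a binds a second name to the same array, so the change made through a is visible through b.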
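
The behaviour in Q.7 can be sketched as follows; the names alias and independent are illustrative and not part of the original question.

```python
import numpy as np

a = np.array([4, 5, 6])
alias = a               # second name for the same array object
independent = a.copy()  # separate array with its own data

a[1] = 3

print(alias)        # reflects the change made through a
print(independent)  # unaffected by the change
```

Using a.copy() is the usual way to keep an unchanged snapshot of an array.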

Q.8 What is the output of the following program?

import numpy as np
print(np.maximum([2,3,4],[1,5,2]))

(a) [1,5,2]
(b) [1,5,4]
(c) [2,3,4]
(d) [2,5,4]

Ans: d

Q.9 Which of the following is/are correct syntax to create an array of float type?

(a) Arr = np.array([1,2,3,4], dtype='float')
(b) Arr = np.array([1,2,3,4], dtype='f')
(c) Arr = np.array([1,2,3,4], dtype=float)
(d) None of the Above

Ans: a,b,c

Q.10 Which of the following will give this output?

array([[1, 2, 3, 1, 2, 3],[4, 5, 6, 4, 5, 6]])

(a) arr = np.array([[1,2,3],[4,5,6]])
np.vstack((arr,arr))

(b) arr = np.array([[1,2,3],[4,5,6]])
np.hstack((arr,arr))

(c) arr = np.array([[1,2,3],[4,5,6]])
np.hstack(arr)

(d) arr = np.array([[1,2,3],[4,5,6]])
np.vstack(arr)

Ans. b. np.hstack stands for "horizontal stack": it concatenates arrays along the second axis (column-wise for 2-D arrays), so each row of arr is followed by a second copy of the same row.
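
To compare the four options of Q.10, the sketch below prints the shape each call produces. The variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

side_by_side = np.hstack((arr, arr))  # joins along columns
stacked = np.vstack((arr, arr))       # joins along rows
flat = np.hstack(arr)                 # treats the two rows as separate 1-D arrays
same = np.vstack(arr)                 # stacks the two rows back into a 2-D array

print(side_by_side)
print(side_by_side.shape, stacked.shape, flat.shape, same.shape)
```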

Q.11 Which of the following code snippets gives an error?

(a) a1 = np.array([1,2,3])
a2=np.array([0,4,9])
a1.dot(a2)

(b) a1 = np.array([1,2,3,3])
a2=np.array([0,4,9])
np.add(a1,a2)

(c) a = np.array([[1,3,5],[4,6,8]])
np.sum (a)

(d) All the above

Ans. B
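
A minimal sketch of why only option (b) of Q.11 fails: arrays of length 4 and 3 cannot be broadcast together, while the dot product and the sum are well defined.

```python
import numpy as np

a2 = np.array([0, 4, 9])

# (a) dot product of two length-3 vectors: 1*0 + 2*4 + 3*9
dot = np.array([1, 2, 3]).dot(a2)

# (b) shapes (4,) and (3,) are incompatible for element-wise addition
try:
    np.add(np.array([1, 2, 3, 3]), a2)
    add_failed = False
except ValueError as exc:
    add_failed = True
    print("ValueError:", exc)

# (c) sum over every element of a 2-D array
total = np.sum(np.array([[1, 3, 5], [4, 6, 8]]))

print(dot, total)
```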

Q.12 Which of the following code snippets gives an error?

(a) a = np.array([(1,2,3),(4,5,6)])
a[(0,1)]

(b) a = np.array([(1,2,3),(4,5,6)])
a.reshape(2,4)

(c) a = np.array([(1,2,3),(4,5,6)])
a[np.arange(1), :]

(d) All the above

Ans. B
ValueError: cannot reshape array of size 6 into shape (2,4)
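
The three snippets of Q.12 can be checked with a sketch like the one below. The reshape to (3, -1) is an added illustration of a shape that does work, not part of the original question.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

element = a[(0, 1)]            # same as a[0, 1]
first_row = a[np.arange(1), :] # rows selected by the index array [0]

try:
    a.reshape(2, 4)            # 6 elements cannot fill 2*4 = 8 slots
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("ValueError:", exc)

# A valid reshape keeps the element count; -1 lets NumPy infer one dimension.
ok = a.reshape(3, -1)
print(element, first_row.shape, ok.shape)
```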

Q.13 What does the itemsize attribute of an array return?

(a) It returns the size of the array
(b) It returns the number of elements in the array
(c) It returns the byte size of each element of the array
(d) None of the above

Ans. C
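
A small sketch that separates itemsize from the related size and nbytes attributes. The int16 array is an arbitrary example.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int16)

print(a.itemsize)  # bytes used by one element
print(a.size)      # number of elements
print(a.nbytes)    # total bytes: size * itemsize

# np.zeros defaults to float64, which takes 8 bytes per element.
z = np.zeros(5)
print(z.dtype, z.itemsize)
```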

Q.14 What is the output of the code below?

print(np.zeros(5).dtype)

(a) int8
(b) int16
(c) uint8
(d) float64

Ans. D

Q.15 What will be the output of the following code?

import numpy as np
a = np.array([[1,2,3],[0,1,4]])
b = np.zeros((2,3), dtype=np.int16)
c = np.ones((2,3), dtype=np.int16)
d = a+b+c
print(d[1,2])

(a) 5
(b) 7
(c) 3
(d) 4

Answer: a

Pandas

Questions 16 to 20 refer to a DataFrame df of players; the answers use its Team, Player, BidPrice and Runs columns.

Q.16 Write a command to find the most expensive player.

Ans. print(df[df['BidPrice']==df['BidPrice'].max()])

Q.17 Write a command to print the total number of players per team.

Ans. print(df.groupby('Team').Player.count())

Q.18 Write a command to find the player with the highest BidPrice in each team.

Ans. print(df.loc[df.groupby('Team')['BidPrice'].idxmax()])

Q.19 Write a command to sort all players according to BidPrice.

Ans. print(df.sort_values(by='BidPrice'))

Q.20 Write a command to find the average runs of each team.

Ans. print(df.groupby(['Team']).Runs.mean())
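
The dataset for Questions 16 to 20 is not reproduced in this text, so the sketch below runs the five answers on a small made-up DataFrame. The player names, teams, and numbers are hypothetical; only the column names come from the answers.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's player table.
df = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4", "P5"],
    "Team": ["X", "X", "Y", "Y", "Y"],
    "BidPrice": [10, 15, 7, 20, 12],
    "Runs": [300, 500, 200, 400, 900],
})

most_expensive = df[df["BidPrice"] == df["BidPrice"].max()]      # Q.16
players_per_team = df.groupby("Team").Player.count()             # Q.17
top_per_team = df.loc[df.groupby("Team")["BidPrice"].idxmax()]   # Q.18
by_price = df.sort_values(by="BidPrice")                         # Q.19
avg_runs = df.groupby(["Team"]).Runs.mean()                      # Q.20

print(most_expensive)
print(players_per_team)
print(top_per_team)
print(by_price)
print(avg_runs)
```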

Q.21 How can you handle duplicate values in a Pandas DataFrame?

(a) Use the df.drop_duplicates() method
(b) Use the df.remove_duplicates() method
(c) Use the df.drop_duplicate_rows() method
(d) Use the df.eliminate_duplicates() method

Ans: A

Q.22 Which of the following is/are not a correct way to access an individual item of the DataFrame 'df'?

(a) df.iat[2,2]
(b) df.loc[2,2]
(c) df.at[2,2]
(d) df[0,0]

Answer: d) df[0,0]

Q.23 Which Pandas method can be used to handle large datasets by reading them in chunks, and how can you specify the size of each chunk?

(a) read_csv() with the chunk_size parameter
(b) read_large_csv() with the buffer parameter
(c) read_csv() with the chunksize parameter
(d) read_csv_chunked() with the chunk_length parameter

Answer: c) read_csv() with the chunksize parameter

chunked_reader = pd.read_csv('large_file.csv', chunksize=1000)
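
A sketch of how the chunked reader might be consumed. To stay self-contained it reads from an in-memory string instead of 'large_file.csv'; the column names id and value are assumptions made for the example.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for a large file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# With chunksize set, read_csv returns an iterator of DataFrames.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

chunk_lengths = []
total = 0
for chunk in reader:
    chunk_lengths.append(len(chunk))  # rows in this chunk
    total += chunk["value"].sum()     # aggregate incrementally

print(chunk_lengths, total)
```

Processing each chunk and keeping only a running aggregate means the whole file never has to fit in memory at once.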

Q.24 What is the effect of executing the following code?

import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s['f'])
(a) KeyError
(b) IndexError
(c) ValueError
(d) None of the above mentioned

Ans: (a) KeyError: 'f'
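
As a sketch of Q.24, the snippet below reproduces the KeyError and shows two ways of avoiding it, Series.get() and a membership test on the index. These alternatives are an addition for illustration, not part of the original answer.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

try:
    s["f"]                    # label 'f' is not in the index
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

fallback = s.get("f", 0)      # returns the default instead of raising
has_f = "f" in s              # membership is tested against the index

print(fallback, has_f)
```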

Q.25 What will be the output of the following code?

import pandas as pd
pd.Series([1, 2], index= ['a', 'b', 'c'])
(a) Syntax Error
(b) Index Error
(c) Value Error
(d) None of the above mentioned

Ans: c) ValueError: Length of values (2) does not match length of index (3)

Q.26 Which of the following takes a dict of dicts or a dict of array-like sequences and
returns a DataFrame?

(a) DataFrame.from_items
(b) DataFrame.from_records
(c) DataFrame.from_dict
(d) All of the mentioned

Ans: c) DataFrame.from_dict

• DataFrame.from_items: This method was deprecated in version 0.23.0 and removed in pandas 1.0. It's not the correct answer.
• DataFrame.from_records: This method is used to create a DataFrame from a structured or record array, not from a dict of dicts or dict of arrays.
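
A minimal sketch of DataFrame.from_dict with both input kinds named in Q.26. The dictionaries are made-up sample data.

```python
import pandas as pd

# Dict of array-like sequences: keys become column labels by default.
data = {"a": [1, 2, 3], "b": [4, 5, 6]}
by_cols = pd.DataFrame.from_dict(data)

# orient="index" makes the keys row labels instead.
by_rows = pd.DataFrame.from_dict(data, orient="index")

# Dict of dicts: with orient="index" the outer keys label the rows
# and the inner keys label the columns.
nested = {"row1": {"x": 1, "y": 2}, "row2": {"x": 3, "y": 4}}
from_nested = pd.DataFrame.from_dict(nested, orient="index")

print(by_cols.shape, by_rows.shape)
print(from_nested)
```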

Q.27 How can you filter rows in a DataFrame based on a condition?

(a) df.filter(condition)
(b) df.select_rows(condition)
(c) df[condition]
(d) df.filter_rows(condition)

Ans: C
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]

Q.28 What will be the result of executing the following code snippet?

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.groupby('A').agg({'B': lambda x: x.sum(), 'A': lambda x: x.mean()})
print(result)
(a) KeyError
(b) A DataFrame with separate aggregated results for 'A' and 'B'
(c) The same as the original DataFrame
(d) An empty DataFrame

Answer: b) A DataFrame with separate aggregated results for 'A' and 'B'

Q.29 In a DataFrame df, how can you efficiently perform column-wise z-score normalization, which standardizes each column so that it has a mean of 0 and a standard deviation of 1?

(a) df.apply(lambda x: (x - x.mean()) / x.std(), axis=1)
(b) df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
(c) df.transform('zscore')
(d) None of the above

Ans: b
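
The sketch below applies option (b) to a small made-up DataFrame and checks the result against an equivalent vectorized expression, which is an added alternative rather than one of the listed options.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 60.0]})

# axis=0 passes each column to the lambda, so the mean and standard
# deviation are computed per column.
z = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)

# Equivalent vectorized form: DataFrame arithmetic aligns on column labels.
z_vec = (df - df.mean()) / df.std()

print(z)
print(z.mean(), z.std())
```

Note that Series.std() uses the sample standard deviation (ddof=1) by default.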

Q.30 How do you resolve mismatched indices during arithmetic operations between two DataFrames?

(a) Use df1.align(df2) before operations to align both DataFrames.
(b) Use df1.reindex(df2.index).add(df2) for aligning indices followed by addition.
(c) Use pd.concat([df1, df2], axis=1) to force them to match.
(d) Rename the indices manually to ensure correspondence before operations.

Answer: a,b
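
A sketch of options (a) and (b) from Q.30 on two small DataFrames with partly overlapping indices. The data, the join="inner" argument, and the fill_value variant are illustrative choices, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"v": [10, 20, 30]}, index=["b", "c", "d"])

# Plain addition aligns on the union of labels; unmatched rows become NaN.
plain = df1 + df2

# (a) align first; join="inner" keeps only labels present in both frames.
left, right = df1.align(df2, join="inner")
aligned_sum = left + right

# (b) reindex df1 to df2's labels, then add.
reindexed_sum = df1.reindex(df2.index).add(df2)

# Treat a value missing on one side as 0 instead of producing NaN.
filled = df1.add(df2, fill_value=0)

print(plain, aligned_sum, reindexed_sum, filled, sep="\n\n")
```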

Q.31 To sort the DataFrame in Pandas, use the _____ method

(a) sort()
(b) sort_values()
(c) sorted_values()
(d) sorting()

Ans: b) sort_values()

Q.32 What function does Pandas offer to calculate descriptive statistics that summarize the
central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values?
(a) describe()
(b) profiling()
(c) summary()
(d) overview()

Answer: a) describe()

Q.33 In Pandas, which method is used to apply a function that takes single values and
returns single values to each element of a DataFrame?

(a) DataFrame.transform()
(b) DataFrame.apply()
(c) DataFrame.applymap()
(d) DataFrame.aggregate()

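Answer: c) DataFrame.applymap(). Since pandas 2.1 the same operation is available as DataFrame.map(), and applymap() is deprecated.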
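
A sketch of element-wise application next to column-wise apply(). Because the element-wise method is named DataFrame.map in pandas 2.1 and later and applymap before that, the snippet picks whichever exists; this lookup is a compatibility workaround assumed for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Use DataFrame.map when available, otherwise fall back to applymap.
elementwise = getattr(df, "map", None) or df.applymap

# The function receives one scalar at a time and returns one scalar.
squared = elementwise(lambda v: v * v)

# apply, by contrast, passes a whole column (a Series) to the function.
col_sums = df.apply(lambda col: col.sum())

print(squared)
print(col_sums)
```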

Q.34 Which of the following operations is the most efficient way to create a copy of a DataFrame in Pandas that includes only the first 100 rows of the original DataFrame?

(a) df_copy = df[:100]
(b) df_copy = df.iloc[:100].copy()
(c) df_copy = df.loc[:100].copy()
(d) df_copy = df.take(range(100))

Answer: b) df.iloc[:100].copy()
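
The sketch below illustrates why option (b) of Q.34 differs from option (c): .iloc slices by position and excludes the end, while .loc slices by label and includes it. The 150-row frame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"x": range(150)})  # default integer labels 0..149

by_position = df.iloc[:100].copy()  # positions 0..99
by_label = df.loc[:100].copy()      # labels 0..100, end label included

# Because of .copy(), editing the copy leaves the original untouched.
by_position.loc[0, "x"] = -1

print(len(by_position), len(by_label), df.loc[0, "x"])
```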

Q.35 How can you create a hierarchical index (MultiIndex) DataFrame and subsequently access a subset of this DataFrame using both levels of indexing?

(a) Use pd.DataFrame.set_index(['col1', 'col2']) and access with df.loc[(value1, value2)]
(b) Use pd.MultiIndex.from_arrays() and access with df.xs((value1, value2))
(c) Use df.set_index() with a dictionary and access with df.ix[value1].ix[value2]
(d) Create a MultiIndex with pd.MultiIndex() and access directly with df['value1']['value2']

Answer: a) Use pd.DataFrame.set_index(['col1', 'col2']) and access with df.loc[(value1, value2)]

• Create MultiIndex: df = df.set_index(['col1', 'col2'])
• Access subset: df.loc[(value1, value2)]

This method builds the MultiIndex from existing columns and selects rows by passing a tuple of level values to .loc[].

Alternative methods and their issues:

(b) pd.MultiIndex.from_arrays() and df.xs((value1, value2)):

• While pd.MultiIndex.from_arrays() can create a MultiIndex, it's typically used when you have separate arrays for each level.
• df.xs() is for cross-sectional selection and is less intuitive for basic indexing.

(c) df.set_index() with a dictionary and df.ix[value1].ix[value2]:

• Using a dictionary with set_index() is uncommon and potentially confusing.
• df.ix[] was deprecated in pandas 0.20 because of its ambiguous behavior and removed in pandas 1.0.

(d) Create a MultiIndex with pd.MultiIndex() and access with df['value1']['value2']:

• Directly creating a MultiIndex with pd.MultiIndex() is less common when working with existing DataFrames.
• Accessing with df['value1']['value2'] selects columns, not index levels, so it is not the standard way to access MultiIndex DataFrames and can lead to unexpected results.
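
A minimal sketch of the approach in option (a) of Q.35, with df.xs() shown for comparison. The column values and the val column are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [1, 2, 1],
    "val": [10, 20, 30],
})

# Move two existing columns into a two-level row index.
indexed = df.set_index(["col1", "col2"])

# A tuple with one value per level selects a single row.
row = indexed.loc[("a", 2)]

# A single outer-level value selects every row under it.
all_a = indexed.loc["a"]

# xs gives the same cross-section as the tuple lookup.
same_row = indexed.xs(("a", 2))

print(row)
print(all_a)
```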

Q.36 What is the minimum number of arguments required to create a pandas Series?
(a) 2
(b) 3
(c) 4
(d) None of the above mentioned

Ans: d) None of the above mentioned. pd.Series() can be called with no arguments, which returns an empty Series.

Q.37 Point out the correct statement.

(a) If data is a list, if index is passed the values in data corresponding to the labels in the
index will be pulled out
(b) NaN is the standard missing data marker used in pandas
(c) Series acts very similarly to an array
(d) None of the mentioned

Ans: b
