DAI 101 Tutorial 2_Solution
NumPy
import numpy as np
A=np.array([[3,2,4],[3,4,5]])
print(A.T.shape)
(a) (3,3)
(b) (3,2)
(c) (1,3)
(d) None
Ans. b
(a) print(dtype(n))
(b) print(type(n))
(c) print(n.type)
(d) print(n.dtype)
Ans. d
Q.3 Which of the following methods work with boolean NumPy arrays? Choose the relevant options.
Ans. d
(a) numpy.search()
(b) numpy.find()
(c) numpy.contains()
(d) numpy.where()
Ans. D. numpy.where() returns the indices at which a condition is True, so it can be used to
search for values in a NumPy array.
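A minimal sketch of searching with numpy.where(), assuming a small made-up array `arr` (not part of the tutorial):

```python
import numpy as np

# Hypothetical sample data, not taken from the tutorial.
arr = np.array([10, 25, 7, 25, 3])

# With only a condition, np.where returns a tuple of index arrays
# (one per dimension) marking where the condition is True.
positions = np.where(arr == 25)[0]
print(positions)

# With three arguments it chooses element-wise between two alternatives:
# keep values greater than 8, replace the rest with 0.
kept = np.where(arr > 8, arr, 0)
print(kept)
```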
Q.5 Which method finds the set difference between two NumPy arrays?
(a) numpy.setdiff2d()
(b) numpy.setdiff1d()
(c) numpy.setdiff()
(d) numpy.diff()
Ans. b
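To see why (b) differs from (d), here is a short sketch on made-up arrays `a` and `b`: numpy.setdiff1d() gives the sorted unique values of the first array that are absent from the second, while numpy.diff() gives differences between consecutive elements of one array.

```python
import numpy as np

# Hypothetical sample data.
a = np.array([5, 1, 3, 3, 9])
b = np.array([3, 4])

# Set difference: sorted, unique values of a that do not occur in b.
only_in_a = np.setdiff1d(a, b)
print(only_in_a)

# np.diff is unrelated: it subtracts each element from its successor.
steps = np.diff(a)
print(steps)
```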
import numpy as np
a1=np.array([[14,36],[17,47]])
a2=np.array([[10,15]])
a3=np.concatenate((a1,a2),axis=0)
print(a3)
a4=a3.reshape(2,3)
print()
print(a4)
Ans:
[[14 36]
 [17 47]
 [10 15]]

[[14 36 17]
 [47 10 15]]
Ans: C [4 3 6]
(a) [1,5,2]
(b) [1,5,4]
(c) [2,3,4]
(d) [2,5,4]
Ans: d
Q.9 Which of the following is/are correct syntax for creating an array of float type?
Ans: a,b,c
Ans. B, This option uses NumPy's hstack function, which stands for "horizontal stack". It
concatenates arrays along the second axis (horizontally for 2D arrays).
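A small sketch of the horizontal stacking described above, using made-up arrays `left` and `right`; for 2D inputs the result should match np.concatenate along axis=1.

```python
import numpy as np

# Hypothetical 2D inputs with the same number of rows.
left = np.array([[1, 2], [3, 4]])
right = np.array([[5], [6]])

# hstack joins the arrays side by side (along the second axis).
stacked = np.hstack((left, right))
print(stacked)

# For 2D arrays this is the same as concatenating along axis=1.
same = np.array_equal(stacked, np.concatenate((left, right), axis=1))
print(same)
```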
Q.11 Which of the following codes give an error?
(a) a1 = np.array([1, 2, 3])
    a2 = np.array([0, 4, 9])
    a1.dot(a2)
(b) a1 = np.array([1, 2, 3, 3])
    a2 = np.array([0, 4, 9])
    np.add(a1, a2)
(c) a = np.array([[1, 3, 5], [4, 6, 8]])
    np.sum(a)
Ans. B
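The three options can be checked directly. The sketch below wraps option (b) in try/except so the script keeps running; the variable names `outcome`, `dot_result` and `total` are only for illustration.

```python
import numpy as np

# Option (b): shapes (4,) and (3,) cannot be broadcast together.
a1 = np.array([1, 2, 3, 3])
a2 = np.array([0, 4, 9])
try:
    np.add(a1, a2)
    outcome = "no error"
except ValueError as exc:
    outcome = "ValueError"
    print(exc)

# Option (a): dot product of two length-3 vectors, 1*0 + 2*4 + 3*9.
dot_result = np.array([1, 2, 3]).dot(np.array([0, 4, 9]))

# Option (c): sum over every element of the 2x3 array.
total = np.sum(np.array([[1, 3, 5], [4, 6, 8]]))
print(outcome, dot_result, total)
```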
Ans. C
Q.14 What is the output of the below code?
print(np.zeros(5).dtype)
(a) int8
(b) int16
(c) uint8
(d) float64
Ans. D
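A quick check of the default, plus an explicit dtype for comparison (the int8 choice is just an illustration):

```python
import numpy as np

# np.zeros uses float64 unless a dtype is requested.
default_dtype = np.zeros(5).dtype
print(default_dtype)

# Passing dtype overrides the default.
small_ints = np.zeros(5, dtype=np.int8)
print(small_ints.dtype)
```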
A. 5
B. 7
C. 3
D. 4
Answer: A
Pandas
Ans. print(df[df['BidPrice']==df['BidPrice'].max()])
Ans. print(df.groupby('Team').Player.count())
Q.18 Write a command to Find player who had highest BidPrice from each team.
Ans. print(df.loc[df.groupby('Team')['BidPrice'].idxmax()])
Ans. print(df.groupby(['Team']).Runs.mean())
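The commands above can be tried on a small stand-in table. Only the column names (Player, Team, BidPrice, Runs) come from the answers; the rows below are invented, since the tutorial's dataset is not reproduced here.

```python
import pandas as pd

# Hypothetical auction data; the values are made up for illustration.
df = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4", "P5"],
    "Team": ["X", "X", "Y", "Y", "Y"],
    "BidPrice": [12, 15, 9, 20, 11],
    "Runs": [300, 450, 120, 510, 270],
})

# Row(s) holding the overall maximum BidPrice.
top_overall = df[df["BidPrice"] == df["BidPrice"].max()]

# Number of players in each team.
players_per_team = df.groupby("Team").Player.count()

# idxmax gives the row label of the highest bid within each team;
# df.loc then pulls those full rows.
top_per_team = df.loc[df.groupby("Team")["BidPrice"].idxmax()]

# Average runs per team.
mean_runs = df.groupby("Team").Runs.mean()

print(top_overall, players_per_team, top_per_team, mean_runs, sep="\n\n")
```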
Ans: A
Q.22 Which of the following is/are not a correct way to access an individual item of the
DataFrame 'df'?
(a) df.iat[2,2]
(b) df.loc[2,2]
(c) df.at[2,2]
(d) df[0,0]
Answer: d) df[0,0]
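A sketch of the four accessors on a made-up 3x3 frame with default integer labels. Note that loc and at are label-based, so they succeed here only because the labels 2 exist; iat is purely positional.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows and columns are both labelled 0, 1, 2.
df = pd.DataFrame(np.arange(9).reshape(3, 3))

by_position = df.iat[2, 2]   # integer position
by_label_loc = df.loc[2, 2]  # row label, column label
by_label_at = df.at[2, 2]    # fast scalar access by label

# df[0, 0] is treated as a lookup of the column key (0, 0), which is absent.
try:
    df[0, 0]
    bracket_outcome = "no error"
except KeyError:
    bracket_outcome = "KeyError"

print(by_position, by_label_loc, by_label_at, bracket_outcome)
```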
Q.23 Which Pandas method can be used to handle large datasets by reading them in
chunks, and how can you specify the size of each chunk?
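The sheet lists no answer for this question. One widely used approach, offered here as an assumption about what the question intends, is pd.read_csv() with the chunksize parameter, which yields DataFrames of at most that many rows. The sketch reads from an in-memory string instead of a real file.

```python
import io

import pandas as pd

# Hypothetical CSV content with 10 data rows, built in memory.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

chunk_sizes = []
running_total = 0

# chunksize=4 makes read_csv return an iterator of DataFrames
# holding up to 4 rows each, so the whole file is never loaded at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    running_total += chunk["value"].sum()

print(chunk_sizes, running_total)
```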
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s['f'])
(a) KeyError
(b) IndexError
(c) ValueError
(d) None of the above mentioned
import pandas as pd
pd.Series([1, 2], index= ['a', 'b', 'c'])
(a) Syntax Error
(b) Index Error
(c) Value Error
(d) None of the above mentioned
Ans: c) ValueError: Length of values (2) does not match length of index (3)
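Both Series snippets above can be run safely inside try/except. In this sketch the lookup of a missing label raises KeyError and the mismatched constructor raises ValueError; the `*_outcome` names are only for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Looking up a label that is not in the index.
try:
    s["f"]
    lookup_outcome = "no error"
except KeyError:
    lookup_outcome = "KeyError"

# Two values cannot be matched to three index labels.
try:
    pd.Series([1, 2], index=["a", "b", "c"])
    construct_outcome = "no error"
except ValueError as exc:
    construct_outcome = "ValueError"
    print(exc)

print(lookup_outcome, construct_outcome)
```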
Q.26 Which of the following takes a dict of dicts or a dict of array-like sequences and
returns a DataFrame?
(a) DataFrame.from_items
(b) DataFrame.from_records
(c) DataFrame.from_dict
(d) All of the mentioned
Ans: c) DataFrame.from_dict
• DataFrame.from_items: This method was deprecated in version 0.23.0 and removed in
pandas 1.0. It's not the correct answer.
• DataFrame.from_records: This method is used to create a DataFrame from a structured or
record array, not from a dict of dicts or dict of arrays.
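A brief sketch of DataFrame.from_dict with both kinds of input named in the question; the dictionaries are made up.

```python
import pandas as pd

# Dict of array-like sequences: keys become columns by default.
data = {"col_1": [3, 2, 1], "col_2": ["a", "b", "c"]}
by_columns = pd.DataFrame.from_dict(data)

# orient="index" treats the keys as row labels instead.
by_rows = pd.DataFrame.from_dict(data, orient="index")

# Dict of dicts: outer keys are columns, inner keys are row labels.
nested = pd.DataFrame.from_dict({"x": {"r1": 1, "r2": 2}, "y": {"r1": 3, "r2": 4}})

print(by_columns, by_rows, nested, sep="\n\n")
```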
(a) df.filter(condition)
(b) df.select_rows(condition)
(c) df[condition]
(d) df.filter_rows(condition)
Ans: C
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
Q.28 What will be the result of executing the following code snippet?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.groupby('A').agg({'B': lambda x: x.sum(), 'A': lambda x: x.mean()})
print(result)
(a) KeyError
(b) A DataFrame with separate aggregated results for 'A' and 'B'
(c) The same as the original DataFrame
(d) An empty DataFrame
Answer: b) A DataFrame with separate aggregated results for 'A' and 'B'
Q.29 In a DataFrame df, how can you efficiently perform column-wise z-score
normalization, which standardizes each column so that it has a mean of 0 and a standard
deviation of 1?
Ans: b
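The option list is not reproduced here, so the following is only a sketch of one common vectorised way to do column-wise z-scoring, which may or may not be the wording of option (b). It relies on pandas broadcasting the per-column mean and standard deviation across rows; the sample frame is made up.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"height": [150.0, 160.0, 170.0], "weight": [50.0, 65.0, 80.0]})

# df.mean() and df.std() are computed per column and aligned on column labels.
# Note: DataFrame.std() uses the sample standard deviation (ddof=1) by default.
z = (df - df.mean()) / df.std()
print(z)
```

After this step each column of `z` should have mean 0 and (sample) standard deviation 1.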
Q.30 How do you resolve mismatched indices during arithmetic operations between two
DataFrames?
Answer: a,b
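The options for this question are not shown, so the sketch below only illustrates two standard pandas techniques for mismatched indices, assuming they are the kind of answer intended: an arithmetic method with fill_value, and explicit alignment with DataFrame.align(). The frames are made up.

```python
import pandas as pd

# Hypothetical frames whose indices only partly overlap.
df1 = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"v": [10, 20, 30]}, index=["b", "c", "d"])

# Plain + aligns on the union of labels and leaves NaN where one side is missing.
plain = df1 + df2

# add(..., fill_value=0) treats a label missing from one frame as 0.
filled = df1.add(df2, fill_value=0)

# align(join="inner") keeps only the labels present in both frames.
left, right = df1.align(df2, join="inner")
aligned_sum = left + right

print(plain, filled, aligned_sum, sep="\n\n")
```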
(a) sort()
(b) sort_values()
(c) sorted_values()
(d) sorting()
Ans: b) sort_values()
Q.32 What function does Pandas offer to calculate descriptive statistics that summarize the
central tendency, dispersion, and shape of a dataset’s distribution, excluding NaN values?
(a) describe()
(b) profiling()
(c) summary()
(d) overview()
Answer: a) describe()
Q.33 In Pandas, which method is used to apply a function that takes single values and
returns single values to each element of a DataFrame?
(a) DataFrame.transform()
(b) DataFrame.apply()
(c) DataFrame.applymap()
(d) DataFrame.aggregate()
Answer: c) DataFrame.applymap() (deprecated since pandas 2.1 in favor of the equivalent DataFrame.map())
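A minimal element-wise example on a made-up frame. Because newer pandas releases provide DataFrame.map for the same job, the sketch picks whichever element-wise method the installed version offers.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# The function receives one scalar at a time and returns one scalar.
squared = elementwise(lambda x: x * x)
print(squared)
```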
Q.34 Which of the following operations is the most efficient way to create a copy of a
DataFrame in Pandas that includes only the first 100 rows of the original DataFrame?
Answer: b) df.iloc[:100].copy()
Q.35 How can you create a hierarchical index (MultiIndex) DataFrame and subsequently
access a subset of this DataFrame using both levels of indexing?
• While pd.MultiIndex.from_arrays() can create a MultiIndex, it's typically used when you
have separate arrays for each level.
• df.xs() is for cross-sectional selection and is less intuitive for basic indexing.
(C). df.set_index() with a dictionary and df.ix[value1].ix[value2]:
• Directly creating a MultiIndex with pd.MultiIndex() is less common when working with
existing DataFrames.
• Accessing with df['value1']['value2'] is not the standard way to access MultiIndex
DataFrames and can lead to unexpected results.
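The correct option is not shown above, so this is only a sketch of a common pattern, assuming it is what the question has in mind: build the MultiIndex with set_index() on a list of columns, then select with df.loc and a tuple of level values. Column names and data are made up.

```python
import pandas as pd

# Hypothetical flat table.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "year": [2023, 2024, 2023, 2024],
    "sales": [10, 12, 7, 9],
})

# A list of columns turns into a two-level (hierarchical) index.
indexed = df.set_index(["region", "year"])

# A tuple addresses both levels at once.
north_2024 = indexed.loc[("north", 2024), "sales"]

# A single value selects on the first level and drops it from the result.
south_rows = indexed.loc["south"]

print(north_2024)
print(south_rows)
```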
Q.36 What will be the minimum number of arguments required to pass in a pandas series?
(a) 2
(b) 3
(c) 4
(d) None of the above mentioned
(a) If data is a list, if index is passed the values in data corresponding to the labels in the
index will be pulled out
(b) NaN is the standard missing data marker used in pandas
(c) Series acts very similarly to an array
(d) None of the mentioned
Ans: b