Top Python Questions 1735201448
Example:
import numpy as np
import pandas as pd
# NumPy Array
# Pandas Series
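The lines that followed the two comments above did not survive in this copy. A minimal sketch of the contrast they appear to introduce, assuming small illustrative values and hypothetical index labels 'a', 'b', 'c' (none of these come from the original):

```python
import numpy as np
import pandas as pd

# NumPy array: homogeneous values addressed by integer position only
arr = np.array([10, 20, 30])

# Pandas Series: the same values plus an explicit index (labels are hypothetical)
series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

second_by_position = arr[1]      # positional access on the array
second_by_label = series['b']    # label-based access on the Series
as_array = series.to_numpy()     # the Series values as a NumPy array

print(second_by_position, second_by_label)
print(as_array)
```

A Series can be thought of as a one-dimensional array with labels attached, which is why it converts back to an array without loss here.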
You can create a Pandas DataFrame by passing a dictionary, where keys represent
column names and values are lists (or other iterable types) representing the data.
Example:
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
You can use the .to_numpy() method (the recommended option) or the older .values
attribute to convert a Pandas DataFrame into a NumPy array.
Example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
np_array = df.to_numpy()
print(np_array)
Output:
[[1 4]
 [2 5]
 [3 6]]
Example:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([10])
result = arr1 + arr2  # arr2 is broadcast across every element of arr1
print(result)
Output:
[11 12 13]
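Broadcasting, which the example above relies on, also applies across dimensions. A minimal sketch, assuming a hypothetical 2x3 matrix, a length-3 row vector, and a 2x1 column vector; NumPy compares shapes starting from the trailing dimension and stretches any axis of size 1:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,), reused for both rows
column = np.array([[100], [200]])    # shape (2, 1), reused for every column

row_sum = matrix + row
column_sum = matrix + column

print(row_sum)
print(column_sum)
```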
You can use the .isnull() or .isna() method to check for missing values. The two names
are aliases: both return a DataFrame of the same shape with True for missing values and
False for non-missing values.
Example:
df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
print(df.isnull())
Output:
       A      B
0  False  False
1  False   True
2   True  False
• Fill missing values: Use .fillna() to replace missing values with a specific value, or
use .ffill() / .bfill() to propagate the previous or next valid value (forward or backward fill).
• Drop missing values: Use .dropna() to remove rows or columns with missing
values.
Example:
df_filled = df.fillna(0)
print(df_filled)
df_dropped = df.dropna()
print(df_dropped)
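The forward-fill, backward-fill, and column-wise drop options mentioned in the list above can be sketched as follows, assuming a hypothetical frame with one missing value per column. The dedicated .ffill() and .bfill() methods are used because recent pandas releases deprecate passing method= to .fillna():

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})  # hypothetical data

forward = df.ffill()                  # copy the last valid value downward
backward = df.bfill()                 # copy the next valid value upward
no_missing_cols = df.dropna(axis=1)   # drop columns that contain any missing value

print(forward)
print(backward)
print(no_missing_cols.shape)
```

Note that a backward fill cannot repair a missing value in the last row, and a forward fill cannot repair one in the first row, because there is no neighbour to copy from.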
The groupby() function in Pandas is used to group data by one or more columns and
then apply an aggregation function (such as sum, mean, etc.) on the grouped data.
Example:
4|Page WhatsApp: 91 -9143407019
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]})
grouped = df.groupby('Category').sum()
print(grouped)
Output:
          Value
Category
A            40
B            60
You can use the .merge() function to join two DataFrames on a common column. This
is similar to SQL joins (inner, left, right, outer).
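Example:
# Illustrative frames that share the key column 'ID'
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Age': [25, 30]})
merged = pd.merge(df1, df2, on='ID')  # how='inner' is the default
print(merged)
Output:
   ID   Name  Age
0   1  Alice   25
1   2    Bob   30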
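The SQL-style join types mentioned above are selected with the how parameter. A minimal sketch, assuming hypothetical frames whose keys only partly overlap so that the join types give visibly different results:

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [2, 3, 4], 'Age': [30, 35, 40]})

inner = left.merge(right, on='ID', how='inner')     # keys present in both frames
left_join = left.merge(right, on='ID', how='left')  # every key from `left`
outer = left.merge(right, on='ID', how='outer')     # union of keys from both frames

print(inner)
print(left_join)
print(outer)
```

Rows without a match on the other side are kept by the left and outer joins, with NaN filling the columns that have no data.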
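You can use boolean indexing with multiple conditions combined using the & (AND) or |
(OR) operators. Wrap each condition in parentheses, because & and | bind more tightly
than comparison operators.
Example:
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Score': [85, 90, 95]})
filtered = df[(df['Age'] > 30) & (df['Score'] > 90)]
print(filtered)
Output:
      Name  Age  Score
2  Charlie   35     95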
• .loc[]: Accesses rows and columns by label (index and column names).
• .iloc[]: Accesses rows and columns by integer-location-based indexing (positions).
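Example:
# Illustrative DataFrame with string index labels
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
print(df.loc['y', 'A'])   # row labelled 'y', column 'A'
print(df.iloc[1, 0])      # second row, first column (the same cell)
Output:
2
2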
11. How can you apply a custom function to a Pandas DataFrame using apply()?
The apply() function in Pandas allows you to apply a custom function along an axis
(rows or columns) of a DataFrame.
Example:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
def add_ten(x):
    return x + 10
df['A'] = df['A'].apply(add_ten)
print(df)
Output:
    A  B
0  11  4
1  12  5
2  13  6
Categorical data in Pandas can be handled using the Categorical type or the astype()
method to convert columns into categorical data types. Categorical data is efficient
for memory and speed when working with data that has a limited number of distinct
values.
Example:
df['Category'] = df['Category'].astype('category')
print(df.dtypes)
Output:
Category category
dtype: object
You can also use the get_dummies() function to convert categorical data into one-hot
encoding.
Example:
df_encoded = pd.get_dummies(df['Category'])
print(df_encoded)
Output:
   A  B  C
0  1  0  0
1  0  1  0
2  1  0  0
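The output above lists a column C, so the original column presumably also held a 'C' value whose row is not shown. A minimal sketch under that assumption (the four sample values are hypothetical). dtype=int is passed because pandas 2.x returns boolean True/False indicator columns by default:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})  # hypothetical values

# Convert to the memory-efficient categorical dtype
df['Category'] = df['Category'].astype('category')

# One-hot encode, requesting 0/1 integers instead of booleans
df_encoded = pd.get_dummies(df['Category'], dtype=int)

print(df['Category'].cat.categories.tolist())
print(df_encoded)
```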
Example:
# Using dropna()
df_dropped = df.dropna()
print(df_dropped)
# Using fillna()
df_filled = df.fillna(0)
print(df_filled)
Output:
# dropna()
     A    B
0  1.0  4.0
# fillna()
     A    B
0  1.0  4.0
1  2.0  0.0
2  0.0  6.0
You can use the pivot_table() function in Pandas to create a pivot table. It allows
summarization of data based on one or more categorical columns.
Example:
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]})
pivot = df.pivot_table(values='Value', index='Category', aggfunc='sum')
print(pivot)
Output:
          Value
Category
A            40
B            60
You can use the drop_duplicates() function in Pandas to remove duplicate rows from
a DataFrame.
Example:
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})
df_unique = df.drop_duplicates()
print(df_unique)
Output:
   A  B
0  1  4
1  2  5
3  3  6
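You can use functions such as numpy.add() and numpy.subtract(), or the equivalent
arithmetic operators (+, -, *, /), to perform element-wise operations on two NumPy arrays.
Example:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Element-wise addition
result = np.add(arr1, arr2)  # equivalent to arr1 + arr2
print(result)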
Output:
[5 7 9]
19. How do you perform element-wise mathematical operations on a Pandas
DataFrame?
You can perform element-wise mathematical operations on a Pandas DataFrame
directly using operators like +, -, *, /, etc.
Example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_result = df + 10 # Add 10 to each element
print(df_result)
Output:
    A   B
0  11  14
1  12  15
2  13  16
20. How can you perform data normalization using NumPy?
Data normalization can be done by scaling the data to a specific range, typically [0, 1],
using the min-max formula:
x_normalized = (x - min(x)) / (max(x) - min(x))
Example:
arr = np.array([10, 20, 30, 40, 50])
normalized_arr = (arr - arr.min()) / (arr.max() - arr.min())
print(normalized_arr)
Output:
[0. 0.25 0.5 0.75 1. ]
21. What is the purpose of the crosstab() function in Pandas?
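The crosstab() function computes a cross-tabulation of two or more columns. By default
it builds a frequency table that counts how often each combination of values occurs.
Example:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'Category': ['A', 'A', 'A', 'B', 'B', 'C'], 'Value': [10, 10, 30, 20, 20, 30]})
# Crosstab function
crosstab_result = pd.crosstab(df['Category'], df['Value'])
print(crosstab_result)
Output:
Value     10  20  30
Category
A          2   0   1
B          0   2   0
C          0   0   1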
Pandas provides powerful tools for handling time-series data, such as to_datetime() to
convert a column to datetime format, and .resample() to aggregate data over regular
time intervals.
Example:
import pandas as pd
data = {'Date': ['2024-01-01', '2024-01-02', '2024-01-03'], 'Value': [10, 20, 30]}
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
resampled_df = df.resample('D').sum()
print(resampled_df)
Output:
            Value
Date
2024-01-01     10
2024-01-02     20
2024-01-03     30
23. How would you calculate the correlation between two columns in a Pandas
DataFrame?
You can calculate the correlation between two columns using the .corr() method,
which computes the Pearson correlation coefficient by default.
Example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 2, 1]})  # illustrative values
correlation = df['A'].corr(df['B'])
print(correlation)
Output:
-1.0
24. How can you filter rows in a Pandas DataFrame where a specific column's value
is greater than a certain threshold?
You can filter rows by using boolean indexing where the column's values meet the
condition.
Example:
df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [5, 15, 25, 35]})
filtered_df = df[df['A'] > 20]
print(filtered_df)
Output:
    A   B
2  30  25
3  40  35
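Example:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Using concatenate
concatenated = np.concatenate((arr1, arr2))
print(concatenated)
# Using vstack
stacked = np.vstack((arr1, arr2))
print(stacked)
Output:
# concatenate
[1 2 3 4 5 6]
# vstack
[[1 2 3]
 [4 5 6]]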
26. How do you create a random NumPy array of integers between two values?
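You can use numpy.random.randint() to generate random integers within a specified
range. The lower bound is inclusive and the upper bound is exclusive.
Example:
random_array = np.random.randint(1, 10, size=5)  # five integers from 1 to 9 (illustrative bounds)
print(random_array)
Output (values vary between runs):
[5 2 7 4 8]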
You can calculate a rolling mean using the .rolling() method followed by .mean() to
compute the moving average.
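Example:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})
# The first two results are NaN because a full window of 3 values is not yet available
rolling_mean = df['A'].rolling(window=3).mean()
print(rolling_mean)
Output:
0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
5    5.0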
Example:
arr = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(arr)
median_value = np.median(arr)
print(mean_value, median_value)
Output:
3.0 3.0
29. How can you concatenate two DataFrames along rows and columns in Pandas?
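You can use pd.concat(): the default axis=0 stacks DataFrames along rows, and axis=1
places them side by side along columns.
Example:
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df_rows = pd.concat([df1, df2])             # along rows
df_columns = pd.concat([df1, df2], axis=1)  # along columns
print(df_rows)
print(df_columns)
Output:
# Along rows
   A  B
0  1  3
1  2  4
0  5  7
1  6  8
# Along columns
   A  B  A  B
0  1  3  5  7
1  2  4  6  8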
30. Explain how to handle outliers in a dataset using Pandas and NumPy.
Outliers can be handled by removing or replacing them. You can identify outliers by
calculating statistical measures like Z-scores or using percentile-based thresholds
(e.g., using IQR method). Once identified, you can either replace them or remove
them.
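Example:
from scipy import stats
df = pd.DataFrame({'A': [1, 2, 3, 100, 5]})  # illustrative values; 100 is the outlier
# Z-score method
z_scores = stats.zscore(df['A'])
# With only five values |z| cannot exceed 2, so a cutoff of 1.5 is used instead of the usual 3
df_no_outliers = df[np.abs(z_scores) < 1.5]
print(df_no_outliers)
Output:
   A
0  1
1  2
2  3
4  5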
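The IQR method mentioned in the answer can be sketched as follows, assuming hypothetical data with one extreme value and the conventional 1.5 x IQR fences. Both handling options from the text are shown: removing the outliers and replacing them (here by clipping to the fences, which is one possible replacement strategy among several):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 100.0, 5.0]})  # hypothetical values; 100 is the outlier

q1 = df['A'].quantile(0.25)
q3 = df['A'].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Option 1: remove rows that fall outside the fences
df_removed = df[df['A'].between(lower, upper)]

# Option 2: replace outliers by clipping them to the nearest fence
df_clipped = df.assign(A=df['A'].clip(lower, upper))

print(df_removed)
print(df_clipped)
```

Unlike the Z-score, the IQR fences are based on quartiles, so a single extreme value does not inflate the threshold used to detect it.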
You can use the .sort_values() function to sort a DataFrame by multiple columns by
passing a list of column names.
Example:
df = pd.DataFrame({'A': [3, 1, 2], 'B': [4, 5, 6]})
sorted_df = df.sort_values(by=['A', 'B'])  # sort by A, then by B to break ties
print(sorted_df)
Output:
   A  B
1  1  5
2  2  6
0  3  4
32. How can you apply vectorized operations in NumPy to improve performance?
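Vectorized operations apply an operation to whole arrays at once in compiled code,
avoiding slow Python-level loops.
Example:
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])
result = arr1 + arr2  # one vectorized addition instead of a Python loop
print(result)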
Output:
[ 6 8 10 12]
33. How would you merge two Pandas DataFrames with different column names?
You can merge DataFrames with different column names using the left_on and
right_on parameters in the merge() function.
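Example:
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['X', 'Y', 'Z']})
df2 = pd.DataFrame({'C': [1, 2, 3], 'D': ['P', 'Q', 'R']})
merged_df = pd.merge(df1, df2, left_on='A', right_on='C')  # match df1['A'] with df2['C']
print(merged_df)
Output:
   A  B  C  D
0  1  X  1  P
1  2  Y  2  Q
2  3  Z  3  R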
34. How can you perform the element-wise comparison between two NumPy
arrays?
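Example:
arr1, arr2 = np.array([1, 2, 3]), np.array([1, 5, 3])  # illustrative values
result = arr1 == arr2  # other comparison operators (<, >, <=, >=, !=) work the same way
print(result)
Output:
[ True False  True]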
You can handle missing values in NumPy arrays by using np.nan for missing values
and using functions like np.isnan(), np.nanmean(), np.nanstd() to handle these values.
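Example:
arr = np.array([1, 2, np.nan, 4, 5])
nan_values = np.isnan(arr)
print(nan_values)
mean_value = np.nanmean(arr)  # mean of the non-NaN values: 3.0
arr[np.isnan(arr)] = mean_value
print(arr)
Output:
[False False  True False False]
[1. 2. 3. 4. 5.]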
• numpy.nansum(): Computes the sum of an array, ignoring NaN values.
Example:
import numpy as np
arr = np.array([1, 2, np.nan, 4, 5])
mean_value = np.nanmean(arr)
sum_value = np.nansum(arr)
print(mean_value) # Output: 3.0
print(sum_value) # Output: 12.0
37. How can you generate a sequence of numbers in NumPy using numpy.arange()?
Example:
arr = np.arange(0, 10, 2)  # start=0, stop=10 (exclusive), step=2
print(arr) # Output: [0 2 4 6 8]
You can use the pd.read_csv() function to read a CSV file and create a Pandas
DataFrame.
Example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
• np.dot(): Computes the dot product of two arrays. It works for both 1D and 2D
arrays.
• np.matmul(): Performs matrix multiplication. It is equivalent to the @ operator
and works for 2D arrays or higher-dimensional arrays (like tensors).
Example:
# Dot product
a = np.array([1, 2])
b = np.array([3, 4])
dot_product = np.dot(a, b)
print(dot_product) # Output: 11
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
matmul_result = np.matmul(A, B)
print(matmul_result)
Output:
# [[19 22]
#  [43 50]]
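For 1D and 2D inputs the two functions agree, so the difference described above only shows up with higher-dimensional arrays. A minimal sketch, assuming hypothetical stacks of matrices filled with ones, that compares the result shapes:

```python
import numpy as np

a = np.ones((2, 3, 4))   # a stack of two 3x4 matrices
b = np.ones((2, 4, 5))   # a stack of two 4x5 matrices

# matmul treats the leading axis as a batch and multiplies matrix by matrix
matmul_shape = np.matmul(a, b).shape

# dot sums over the last axis of `a` and the second-to-last axis of `b`,
# pairing every matrix in `a` with every matrix in `b`
dot_shape = np.dot(a, b).shape

print(matmul_shape)
print(dot_shape)
```

For batched matrix products, np.matmul (or @) is therefore usually the intended choice.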
You can use the apply() function to apply a lambda function to a column in a Pandas
DataFrame.
Example:
df = pd.DataFrame({'A': [1, 2, 3, 4]})
df['B'] = df['A'].apply(lambda x: x * 2)
print(df)
Output:
   A  B
0  1  2
1  2  4
2  3  6
3  4  8
41. How do you handle and work with dates and times in Pandas?
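You can convert a column with pd.to_datetime() and then read components such as the
year or month through the .dt accessor.
Example:
df = pd.DataFrame({'Date': ['2024-01-01', '2024-01-02']})
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
print(df)
Output:
        Date  Year  Month
0 2024-01-01  2024      1
1 2024-01-02  2024      1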
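Example:
rand_values = np.random.rand(3, 2)  # 3x2 array of floats drawn uniformly from [0, 1)
print(rand_values)
Output:
A 3x2 array of random floats; the values differ on every run.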
43. How would you use NumPy to compute the dot product of two vectors?
You can use np.dot() to compute the dot product of two vectors.
Example:
a = np.array([1, 2])
b = np.array([3, 4])
dot_product = np.dot(a, b)
print(dot_product) # Output: 11
44. Explain the concept of "vectorization" in NumPy and why it's beneficial.
Example:
# With vectorization
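The example for this question is cut off after its first comment. As a minimal sketch of the idea, assuming a hypothetical array of 10,000 integers: vectorization means expressing an operation on whole arrays so that the loop runs inside NumPy's compiled code rather than in the Python interpreter, which is usually much faster and shorter to write.

```python
import numpy as np

values = np.arange(10_000)

# Without vectorization: an explicit Python loop over every element
squares_loop = np.empty_like(values)
for i in range(values.size):
    squares_loop[i] = values[i] ** 2

# With vectorization: one expression applied to the whole array
squares_vectorized = values ** 2

print(np.array_equal(squares_loop, squares_vectorized))
```

Both versions produce the same array; timing them (for example with timeit) is a simple way to see the difference on a given machine.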
45. How would you calculate the standard deviation of a column in a Pandas
DataFrame?
You can use the .std() method to calculate the standard deviation of a column.
Example:
std_dev = df['A'].std()
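The example above omits its DataFrame. A minimal sketch, assuming a hypothetical column of five values, which also shows a detail that is easy to miss: pandas' .std() defaults to the sample standard deviation (ddof=1), while NumPy's np.std() defaults to the population form (ddof=0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})  # hypothetical values

sample_std = df['A'].std()                    # ddof=1 by default in pandas
population_std = np.std(df['A'].to_numpy())   # ddof=0 by default in NumPy
matched = df['A'].std(ddof=0)                 # pandas can match NumPy explicitly

print(sample_std, population_std, matched)
```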
46. How can you count the occurrences of unique values in a Pandas DataFrame
column?
You can use the .value_counts() method to count unique values in a column.
Example:
df = pd.DataFrame({'A': ['cat', 'dog', 'cat', 'cat', 'dog']})
value_counts = df['A'].value_counts()
print(value_counts)
Output:
cat 3
dog 2
47. How can you perform matrix multiplication using NumPy arrays?
Example:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
matrix_product = np.matmul(A, B)  # equivalent to A @ B
print(matrix_product)
Output:
# [[19 22]
# [43 50]]
Example:
arr = np.array([1, 2, 2, 3, 4, 4])  # illustrative values
unique_values = np.unique(arr)  # returns the sorted unique elements
print(unique_values) # Output: [1 2 3 4]
You can use the .drop_duplicates() method and specify the subset of columns to
check for duplicates.
Example:
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})
df_no_duplicates = df.drop_duplicates(subset=['A', 'B'])
print(df_no_duplicates)
Output:
   A  B
0  1  4
1  2  5
3  3  6
50. What are some ways to optimize the performance of large datasets in Pandas?
Example:
df['Category'] = df['Category'].astype('category')
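The answer text for this question is missing apart from the category conversion above. A minimal sketch of that technique plus numeric downcasting, assuming a hypothetical frame with a low-cardinality string column; the column names and sizes are illustrative, and actual savings depend on the data:

```python
import numpy as np
import pandas as pd

n = 10_000
df = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C'], size=n),  # few distinct strings
    'Value': np.arange(n, dtype='int64'),
})

before = df.memory_usage(deep=True).sum()

# Store repeated strings as integer codes plus a small lookup table
df['Category'] = df['Category'].astype('category')
# Use the smallest integer type that can hold the values
df['Value'] = pd.to_numeric(df['Value'], downcast='integer')

after = df.memory_usage(deep=True).sum()
print(before, after)
```

Other commonly used measures, not shown here, include loading only the needed columns with the usecols parameter of pd.read_csv() and processing large files in pieces with its chunksize parameter.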