50 PYTHON INTERVIEW QUESTIONS

FOCUSED ON PANDAS AND NUMPY

FOR DATA ANALYSIS

With Examples and Python Code

1|Page WhatsApp: 91 -9143407019

1. What is the difference between a Pandas Series and a NumPy array?

• Pandas Series: A one-dimensional labelled array capable of holding any data type. Its index lets you label elements and access them by label, and operations between two Series align on those labels. It is typically backed by a NumPy array and adds functionality such as missing-data handling.
• NumPy Array: A homogeneous, multi-dimensional array (ndarray). It has no labels, only integer positions, and every element shares a single dtype (numeric, boolean, string, or object). Because it carries less overhead than a Series, it is usually faster for raw numerical operations.

Example:

import numpy as np

import pandas as pd

# NumPy Array

np_array = np.array([1, 2, 3, 4])

# Pandas Series

pd_series = pd.Series([1, 2, 3, 4])

print(type(np_array)) # <class 'numpy.ndarray'>

print(type(pd_series)) # <class 'pandas.core.series.Series'>
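
The practical difference shows up when two objects are combined. The sketch below is illustrative only (the labels and values are made up): Series addition matches elements by label, while NumPy addition matches them by position.

```python
import numpy as np
import pandas as pd

# Two Series with the same labels in a different order (illustrative data)
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])

# Series addition aligns on labels: a -> 1 + 30, b -> 2 + 20, c -> 3 + 10
series_sum = s1 + s2

# NumPy addition is purely positional: 1 + 10, 2 + 20, 3 + 30
array_sum = s1.to_numpy() + s2.to_numpy()

print(series_sum["a"])  # label-based access
print(array_sum[0])     # position-based access
```

If label alignment is not wanted, converting to arrays first (as above) is one way to force positional behaviour.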

2. How do you create a DataFrame in Pandas from a dictionary?

You can create a Pandas DataFrame by passing a dictionary, where keys represent
column names and values are lists (or other iterable types) representing the data.

Example:

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)

print(df)

Output:

Name Age

0 Alice 25

1 Bob 30

2 Charlie 35


3. How do you convert a Pandas DataFrame into a NumPy array?

You can use the .to_numpy() method (the option recommended in the pandas documentation) or the older .values attribute to convert a Pandas DataFrame into a NumPy array.

Example:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

np_array = df.to_numpy()

print(np_array)

Output:

[[1 4]

[2 5]

[3 6]]
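
One detail worth knowing is that the resulting array has a single dtype, so the column types decide what you get. A minimal sketch, using a made-up DataFrame with mixed column types:

```python
import pandas as pd

# Illustrative DataFrame: integer, float, and string columns
mixed = pd.DataFrame({"A": [1, 2], "B": [0.5, 1.5], "C": ["x", "y"]})

# Numeric columns are upcast to a common numeric dtype
numeric_only = mixed[["A", "B"]].to_numpy()

# Adding a string column forces the generic object dtype
everything = mixed.to_numpy()

print(numeric_only.dtype)
print(everything.dtype)
```

Selecting only the numeric columns before converting keeps the array suitable for fast numerical work.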

4. Explain the concept of broadcasting in NumPy and how it works.

Broadcasting is how NumPy performs element-wise operations on arrays of different shapes. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A dimension of size 1 is then virtually stretched to match the other array, without copying data.

Example:

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([10])

# Broadcasting allows addition

result = arr1 + arr2

print(result)

Output:

[11 12 13]

In this example, arr2 (shape (1,)) is broadcast across arr1 (shape (3,)).
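
The same rule applies in two dimensions. The following sketch (array values chosen only for illustration) combines a column of shape (3, 1) with a row of shape (3,) and also shows what happens when shapes are incompatible:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,), treated as (1, 3)

# (3, 1) and (1, 3) broadcast to (3, 3)
grid = col + row
print(grid.shape)
print(grid)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("cannot broadcast:", exc)
```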

5. How can you check for missing values in a Pandas DataFrame?

You can use the .isnull() or .isna() method to check for missing values; the two are aliases. Each returns a DataFrame of the same shape with True for missing values and False for non-missing values. Chaining .sum() counts the missing values per column.

Example:

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

print(df.isnull())

Output:

       A      B
0  False  False
1  False   True
2   True  False

6. What are some ways to handle missing data in Pandas?

• Fill missing values: Use .fillna() to replace missing values with a specific value or
method (e.g., forward fill or backward fill).
• Drop missing values: Use .dropna() to remove rows or columns with missing
values.

Example:

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

# Fill missing values with a specific value

df_filled = df.fillna(0)

print(df_filled)

# Drop rows with any missing values

df_dropped = df.dropna()

print(df_dropped)
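
The forward-fill and backward-fill options mentioned above can be sketched as follows, reusing the same small DataFrame. Filling with the column mean is added here as one more common choice, not something the original example covers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [4, np.nan, 6]})

# Forward fill: carry the last valid value down each column
forward = df.ffill()

# Backward fill: pull the next valid value up (the last row of A has none)
backward = df.bfill()

# Fill each column with its own mean (A -> 1.5, B -> 5.0)
col_means = df.fillna(df.mean())

print(forward)
print(backward)
print(col_means)
```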

7. What is the use of the groupby() function in Pandas?

The groupby() function in Pandas is used to group data by one or more columns and
then apply an aggregation function (such as sum, mean, etc.) on the grouped data.

Example:
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]})

grouped = df.groupby('Category').sum()

print(grouped)

Output:

          Value
Category
A            40
B            60
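
Several aggregations can be computed in one pass with .agg(). A short sketch on the same data; the output column names total and largest are made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "B"], "Value": [10, 20, 30, 40]})

# Several aggregations of one column at once
summary = df.groupby("Category")["Value"].agg(["sum", "mean", "count"])

# Named aggregation: choose the output column names yourself
named = df.groupby("Category").agg(
    total=("Value", "sum"),
    largest=("Value", "max"),
)

print(summary)
print(named)
```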

8. How would you join two Pandas DataFrames on a specific column?

You can use the .merge() function to join two DataFrames on a common column. This
is similar to SQL joins (inner, left, right, outer).

Example:

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})

df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 40]})

merged = pd.merge(df1, df2, on='ID', how='inner')

print(merged)

Output:

ID Name Age

0 1 Alice 25

1 2 Bob 30
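
The other join types keep rows that have no match. A sketch with the same two DataFrames, using indicator=True to show where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 40]})

# Left join: keep every row of df1; Age is NaN where df2 has no match
left = pd.merge(df1, df2, on="ID", how="left")

# Outer join: keep rows from both sides; _merge records the source
outer = pd.merge(df1, df2, on="ID", how="outer", indicator=True)

print(left)
print(outer)
```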

9. How do you filter a Pandas DataFrame based on multiple conditions?

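You can use boolean indexing with multiple conditions combined using the & (AND) or | (OR) operators, with ~ for NOT. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.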

Example:

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Score': [85, 90,
95]})

filtered = df[(df['Age'] > 25) & (df['Score'] > 90)]

print(filtered)


Output:

Name Age Score

2 Charlie 35 95

10. Explain the difference between .loc[] and .iloc[] in Pandas.

• .loc[]: Accesses rows and columns by label (index and column names).
• .iloc[]: Accesses rows and columns by integer-location-based indexing (positions).

Example:

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

# Using .loc[] (by label)

print(df.loc['y', 'A'])

# Using .iloc[] (by position)

print(df.iloc[1, 0])

Output:

2

2
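
Slicing is where the two indexers differ most visibly: a .loc[] slice includes its end label, while an .iloc[] slice excludes its end position. A minimal sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

# Label slice: both 'x' and 'y' are included
by_label = df.loc["x":"y", "A"]

# Position slice: position 1 is excluded, as in ordinary Python slicing
by_position = df.iloc[0:1, 0]

# .loc also accepts a boolean mask for the rows
mask_rows = df.loc[df["A"] > 1, ["B"]]

print(by_label)
print(by_position)
print(mask_rows)
```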

11. How can you apply a custom function to a Pandas DataFrame using apply()?

The apply() function in Pandas allows you to apply a custom function along an axis
(rows or columns) of a DataFrame.

Example:

import pandas as pd

# Sample DataFrame

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Define a custom function

def add_ten(x):

    return x + 10

# Apply custom function to each element of a column

df['A'] = df['A'].apply(add_ten)

print(df)
Output:

    A  B
0  11  4
1  12  5
2  13  6

12. How do you handle categorical data in Pandas?

Categorical data in Pandas can be handled using the Categorical type or the astype()
method to convert columns into categorical data types. Categorical data is efficient
for memory and speed when working with data that has a limited number of distinct
values.

Example:

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})

df['Category'] = df['Category'].astype('category')

print(df.dtypes)

Output:

Category category

dtype: object

You can also use the get_dummies() function to convert categorical data into one-hot
encoding.

Example:

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})

df_encoded = pd.get_dummies(df['Category'])

print(df_encoded)

Output:

   A  B  C
0  1  0  0
1  0  1  0
2  1  0  0
3  0  0  1

(pandas 2.0 and later print True/False here by default; pass dtype=int to get_dummies() to obtain 0/1.)

13. What is the difference between dropna() and fillna() in Pandas?

• dropna(): Removes missing values (NaN) from the DataFrame.

• fillna(): Replaces missing values (NaN) with a specified value or method (e.g.,
forward fill, backward fill).

Example:

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})

# Using dropna()

df_dropped = df.dropna()

print(df_dropped)

# Using fillna()

df_filled = df.fillna(0)

print(df_filled)

Output:

# dropna()
     A    B
0  1.0  4.0

# fillna()
     A    B
0  1.0  4.0
1  2.0  0.0
2  0.0  6.0

14. How can you create a pivot table in Pandas?

You can use the pivot_table() function in Pandas to create a pivot table. It allows
summarization of data based on one or more categorical columns.

Example:

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]})

pivot_table = df.pivot_table(values='Value', index='Category', aggfunc='sum')


print(pivot_table)

Output:

          Value
Category
A            40
B            60

15. How would you remove duplicates in a Pandas DataFrame?

You can use the drop_duplicates() function in Pandas to remove duplicate rows from
a DataFrame.

Example:

df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})

df_unique = df.drop_duplicates()

print(df_unique)

Output:

   A  B
0  1  4
1  2  5
3  3  6

16. How do you merge two NumPy arrays element-wise?

You can use the numpy.add(), numpy.subtract(), or any other arithmetic operator to
perform element-wise operations on two NumPy arrays.

Example:

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Element-wise addition

result = np.add(arr1, arr2)

print(result)
Output:

[5 7 9]

17. What is the role of numpy.reshape() in manipulating arrays?

The numpy.reshape() function changes the shape of a NumPy array without changing its data. The new shape must hold the same number of elements as the original, and the result is a view of the original data whenever possible.
Example:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = np.reshape(arr, (2, 3))
print(reshaped_arr)
Output:
[[1 2 3]
[4 5 6]]
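One dimension can be given as -1 so that NumPy infers it from the array size. The sketch below (values are illustrative) also shows that a reshaped contiguous array shares memory with the original:

```python
import numpy as np

arr = np.arange(1, 7)            # [1 2 3 4 5 6]

# -1 asks NumPy to infer the missing dimension
auto_cols = arr.reshape(2, -1)   # shape (2, 3)
column = arr.reshape(-1, 1)      # shape (6, 1)
flat = auto_cols.reshape(-1)     # back to shape (6,)

# The reshaped array is a view here, so writing to it changes arr as well
auto_cols[0, 0] = 99
print(arr[0])
```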
18. Explain how to compute the mean, median, and standard deviation using Pandas
and NumPy.
• Pandas: Use .mean(), .median(), and .std() for Series or DataFrame.
• NumPy: Use numpy.mean(), numpy.median(), and numpy.std() for arrays.
Example:
import pandas as pd
import numpy as np
# Pandas
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
mean_pandas = df['A'].mean()
median_pandas = df['A'].median()
std_pandas = df['A'].std()
print(mean_pandas, median_pandas, std_pandas)
# NumPy
arr = np.array([1, 2, 3, 4, 5])
mean_numpy = np.mean(arr)
median_numpy = np.median(arr)
std_numpy = np.std(arr)
print(mean_numpy, median_numpy, std_numpy)
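Output:
3.0 3.0 1.5811388300841898
3.0 3.0 1.4142135623730951
The standard deviations differ because the Pandas .std() method computes the sample standard deviation (ddof=1) by default, while numpy.std() computes the population standard deviation (ddof=0). Pass ddof explicitly to make them agree, e.g. np.std(arr, ddof=1).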

19. How do you perform element-wise mathematical operations on a Pandas
DataFrame?
You can perform element-wise mathematical operations on a Pandas DataFrame
directly using operators like +, -, *, /, etc.
Example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df_result = df + 10 # Add 10 to each element
print(df_result)
Output:
    A   B
0  11  14
1  12  15
2  13  16
20. How can you perform data normalization using NumPy?
Data normalization (min-max scaling) rescales the data to a specific range, typically [0, 1], using the formula:
x_normalized = (x - x_min) / (x_max - x_min)
Example:
arr = np.array([10, 20, 30, 40, 50])
normalized_arr = (arr - arr.min()) / (arr.max() - arr.min())
print(normalized_arr)
Output:
[0. 0.25 0.5 0.75 1. ]
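Two common variations are sketched below under stated assumptions: z-score standardization (zero mean, unit standard deviation), which the original answer does not cover, and min-max scaling applied column by column to a made-up 2-D array.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50], dtype=float)

# Z-score standardization: subtract the mean, divide by the standard deviation
standardized = (arr - arr.mean()) / arr.std()

# Column-wise min-max scaling of a 2-D array (illustrative data)
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 400.0]])
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)  # broadcasting over rows

print(standardized)
print(scaled)
```

Both formulas divide by a spread measure, so a constant column (max equal to min, or zero standard deviation) needs separate handling.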

21. What is the purpose of the crosstab() function in Pandas?

The crosstab() function computes a cross-tabulation of two (or more) factors. It is

useful for summarizing data by creating contingency tables that show the frequency
of different combinations of categories.

Example:

import pandas as pd

# Sample DataFrame

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'C', 'A'],

'Value': [10, 20, 10, 20, 30, 30]})

# Crosstab function

crosstab_result = pd.crosstab(df['Category'], df['Value'])

print(crosstab_result)

Output:

Value     10  20  30
Category
A          2   0   1
B          0   2   0
C          0   0   1

22. How can you handle time-series data in Pandas?

Pandas provides powerful tools for handling time-series data, such as to_datetime() to convert a column to datetime format, and .resample() to regroup the data at a new frequency and aggregate it.

Example:

import pandas as pd

# Sample time-series data

data = {'Date': ['2024-01-01', '2024-01-02', '2024-01-03'],

'Value': [10, 20, 30]}

df = pd.DataFrame(data)

# Convert the 'Date' column to datetime

df['Date'] = pd.to_datetime(df['Date'])

# Set 'Date' as the index

df.set_index('Date', inplace=True)

# Resample by day and calculate the sum

resampled_df = df.resample('D').sum()

print(resampled_df)

Output:

Value

Date

2024-01-01 10

2024-01-02 20

2024-01-03 30
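
Resampling becomes more useful when the target frequency is coarser than the data. A sketch with made-up daily values that span a month boundary, aggregated per month ("MS" labels each bin by its month start) and per two-day window:

```python
import pandas as pd

# Illustrative daily data: Jan 30, Jan 31, Feb 1, Feb 2
dates = pd.date_range("2024-01-30", periods=4, freq="D")
ts = pd.Series([10, 20, 30, 40], index=dates)

# Monthly totals: January gets 10 + 20, February gets 30 + 40
monthly_sum = ts.resample("MS").sum()

# Two-day windows, averaged
two_day_mean = ts.resample("2D").mean()

print(monthly_sum)
print(two_day_mean)
```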

23. How would you calculate the correlation between two columns in a Pandas
DataFrame?

You can calculate the correlation between two columns using the .corr() method,
which computes the Pearson correlation coefficient by default.

Example:

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 3, 2, 1]})

correlation = df['A'].corr(df['B'])

print(correlation)

Output:

-1.0

24. How can you filter rows in a Pandas DataFrame where a specific column's value
is greater than a certain threshold?

You can filter rows by using boolean indexing where the column's values meet the
condition.

Example:

df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [5, 15, 25, 35]})

filtered_df = df[df['A'] > 20]

print(filtered_df)

Output:

    A   B
2  30  25
3  40  35

25. What is the difference between numpy.concatenate() and numpy.vstack()?

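• numpy.concatenate(): Joins two or more arrays along an existing axis (specified by the axis parameter, default 0). The number of dimensions does not change.
• numpy.vstack(): Stacks arrays vertically (row-wise). 1-D arrays of length N are first treated as rows of shape (1, N), so stacking them produces a 2-D array.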

Example:

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Using concatenate

concatenated = np.concatenate([arr1, arr2])

print(concatenated)

# Using vstack

stacked = np.vstack([arr1, arr2])

print(stacked)

Output:

# concatenate

[1 2 3 4 5 6]

# vstack

[[1 2 3]

[4 5 6]]
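
For 2-D inputs the relationship is easier to see. In this sketch (arrays chosen for illustration), vstack matches concatenate along axis 0 and hstack matches concatenate along axis 1:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate([a, b], axis=0)  # shape (4, 2): b placed below a
cols = np.concatenate([a, b], axis=1)  # shape (2, 4): b placed beside a

same_as_vstack = np.array_equal(rows, np.vstack([a, b]))
same_as_hstack = np.array_equal(cols, np.hstack([a, b]))

print(rows)
print(cols)
print(same_as_vstack, same_as_hstack)
```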

26. How do you create a random NumPy array of integers between two values?

You can use numpy.random.randint(low, high, size) to generate random integers in the half-open range [low, high): low is included and high is excluded.

Example:

# Generates 5 random integers from 1 (inclusive) to 10 (exclusive)

random_array = np.random.randint(1, 10, size=5)

print(random_array)

Output (values vary from run to run):

[5 2 7 4 8]
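
For reproducible results, NumPy's newer Generator interface can be seeded. A minimal sketch, assuming a seed value of 42 chosen arbitrarily:

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)

# Like randint, integers() excludes the upper bound by default
sample = rng.integers(1, 10, size=5)

# endpoint=True makes the upper bound inclusive (1 to 10)
inclusive = rng.integers(1, 10, size=5, endpoint=True)

# A second generator with the same seed repeats the first draw
repeat = np.random.default_rng(seed=42).integers(1, 10, size=5)

print(sample)
print(inclusive)
```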

27. How do you calculate a rolling mean in Pandas?

You can calculate a rolling mean using the .rolling() method followed by .mean() to
compute the moving average.

Example:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]})

rolling_mean = df['A'].rolling(window=3).mean()

print(rolling_mean)

Output:

0 NaN

1 NaN

2 2.0

3 3.0

4 4.0

5 5.0

Name: A, dtype: float64
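
The leading NaN values appear because a full window of three values is not yet available. Two options that change this behaviour are sketched below on the same data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# min_periods=1 averages whatever is available, so no NaN at the start
partial = s.rolling(window=3, min_periods=1).mean()

# center=True aligns each window on its middle element instead of its last
centered = s.rolling(window=3, center=True).mean()

print(partial)
print(centered)
```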

28. What is the difference between np.mean() and np.median() in NumPy?

• np.mean(): Calculates the arithmetic mean (average) of the values in an array. It is sensitive to outliers.

• np.median(): Calculates the median (the middle value after sorting) of the values in an array. It is robust to outliers.

Example:

arr = np.array([1, 2, 3, 4, 5])

mean_value = np.mean(arr)

median_value = np.median(arr)

print(mean_value, median_value)

Output:

3.0 3.0

29. How can you concatenate two DataFrames along rows and columns in Pandas?

• Along rows: Use pd.concat() with axis=0.

• Along columns: Use pd.concat() with axis=1.

Example:

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate along rows

df_rows = pd.concat([df1, df2], axis=0)

print(df_rows)

# Concatenate along columns

df_columns = pd.concat([df1, df2], axis=1)

print(df_columns)

Output:

# Along rows
   A  B
0  1  3
1  2  4
0  5  7
1  6  8

# Along columns
   A  B  A  B
0  1  3  5  7
1  2  4  6  8

30. Explain how to handle outliers in a dataset using Pandas and NumPy.

Outliers can be handled by removing or replacing them. You can identify outliers by
calculating statistical measures like Z-scores or using percentile-based thresholds
(e.g., using IQR method). Once identified, you can either replace them or remove
them.

Example:

# Z-score method

from scipy import stats

df = pd.DataFrame({'A': [1, 2, 3, 100, 5]})

z_scores = stats.zscore(df['A'])

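# With only five points, no population z-score can exceed 2 (here 100 scores about 2.0),

# so the usual cut-off of 3 would keep every row. A lower threshold is used for this tiny sample.

df_no_outliers = df[np.abs(z_scores) < 1.5] # Keep rows whose absolute z-score is below the threshold

print(df_no_outliers)

Output:

   A
0  1
1  2
2  3
4  5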
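
The IQR method mentioned above can be sketched as follows on the same column. The 1.5 multiplier is the conventional choice, used here as an assumption rather than a fixed rule; the sketch shows both removing and clipping the outlier.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 100, 5]})

# Quartiles and interquartile range
q1 = df["A"].quantile(0.25)
q3 = df["A"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Option 1: remove rows outside the fences
inliers = df[df["A"].between(lower, upper)]

# Option 2: replace outliers by clipping them to the fences
clipped = df["A"].clip(lower, upper)

print(inliers)
print(clipped)
```

Unlike the z-score, the quartiles are barely affected by the extreme value itself, which makes this approach usable even on small samples.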

31. How do you sort a Pandas DataFrame based on multiple columns?

You can use the .sort_values() function to sort a DataFrame by multiple columns by
passing a list of column names.

Example:

df = pd.DataFrame({'A': [3, 1, 2], 'B': [4, 5, 6]})

sorted_df = df.sort_values(by=['A', 'B'])

print(sorted_df)

Output:

   A  B
1  1  5
2  2  6
0  3  4

32. How can you apply vectorized operations in NumPy to improve performance?

Vectorized operations in NumPy allow you to perform operations on arrays without

explicit for-loops, which is computationally efficient. These operations use
underlying C code for faster execution.

Example:

arr1 = np.array([1, 2, 3, 4])

arr2 = np.array([5, 6, 7, 8])

# Vectorized addition (avoiding loops)

result = arr1 + arr2

print(result)

Output:

[ 6 8 10 12]

33. How would you merge two Pandas DataFrames with different column names?

You can merge DataFrames with different column names using the left_on and
right_on parameters in the merge() function.

Example:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['X', 'Y', 'Z']})

df2 = pd.DataFrame({'C': [1, 2, 3], 'D': ['P', 'Q', 'R']}) # Key values in 'C' must match those in 'A'

merged_df = pd.merge(df1, df2, left_on='A', right_on='C')

print(merged_df)

Output:

   A  B  C  D
0  1  X  1  P
1  2  Y  2  Q
2  3  Z  3  R

34. How can you perform the element-wise comparison between two NumPy
arrays?

Element-wise comparison between two arrays can be performed using comparison

operators like ==, !=, <, >, etc.

Example:

arr1 = np.array([1, 2, 3])

arr2 = np.array([3, 2, 1])

result = arr1 == arr2

print(result)

Output:

[False True False]

35. How would you handle missing values in NumPy arrays?

You can handle missing values in NumPy arrays by using np.nan for missing values
and using functions like np.isnan(), np.nanmean(), np.nanstd() to handle these values.

Example:

arr = np.array([1, 2, np.nan, 4, 5])

# Checking for NaN values

nan_values = np.isnan(arr)

print(nan_values)

# Replacing NaN with mean

mean_value = np.nanmean(arr)

arr[np.isnan(arr)] = mean_value

print(arr)

Output:

[False False True False False]

[1. 2. 3. 4. 5.]

36. What is the purpose of numpy.nanmean() and numpy.nansum()?

• numpy.nanmean(): Computes the mean of an array, ignoring NaN values.

• numpy.nansum(): Computes the sum of an array, ignoring NaN values.

Example:

import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# Calculate mean ignoring NaN

mean_value = np.nanmean(arr)

print(mean_value) # Output: 3.0

# Calculate sum ignoring NaN

sum_value = np.nansum(arr)

print(sum_value) # Output: 12.0

37. How can you generate a sequence of numbers in NumPy using numpy.arange()?

numpy.arange(start, stop, step) generates an array of evenly spaced values in the half-open interval [start, stop); the stop value itself is excluded.

Example:

arr = np.arange(0, 10, 2)

print(arr) # Output: [0 2 4 6 8]

38. How do you create a Pandas DataFrame from a CSV file?

You can use the pd.read_csv() function to read a CSV file and create a Pandas
DataFrame.

Example:

import pandas as pd

# Reading a CSV file

df = pd.read_csv('data.csv')

print(df.head())

39. What is the difference between np.dot() and np.matmul()?

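• np.dot(): Computes the dot product of two arrays. For 1-D arrays it is the inner product, for 2-D arrays it is matrix multiplication, and it also accepts scalars.
• np.matmul(): Performs matrix multiplication and is equivalent to the @ operator. It rejects scalars, and for arrays with more than two dimensions it treats them as stacks of matrices and broadcasts over the leading dimensions, whereas np.dot() sums over the last axis of the first array and the second-to-last axis of the second.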

Example:

# Dot product

a = np.array([1, 2])

b = np.array([3, 4])

dot_product = np.dot(a, b)

print(dot_product) # Output: 11

# Matrix multiplication

A = np.array([[1, 2], [3, 4]])

B = np.array([[5, 6], [7, 8]])

matmul_result = np.matmul(A, B)

print(matmul_result)

Output:

# [[19 22]

# [43 50]]
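
For 1-D and 2-D inputs the two functions agree, so the difference only appears with higher-dimensional arrays or scalars. A sketch using arrays of ones, with shapes chosen purely for illustration:

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul: a stack of two (3, 4) @ (4, 5) products
stacked = np.matmul(a, b)
print(stacked.shape)

# dot: pairs every matrix in a with every matrix in b
dotted = np.dot(a, b)
print(dotted.shape)

# dot accepts a scalar, matmul does not
scaled = np.dot(a, 2)
try:
    np.matmul(a, 2)
    matmul_rejects_scalar = False
except ValueError:
    matmul_rejects_scalar = True
print(matmul_rejects_scalar)
```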

40. How do you apply a lambda function to a Pandas DataFrame column?

You can use the apply() function to apply a lambda function to a column in a Pandas
DataFrame.

Example:

df = pd.DataFrame({'A': [1, 2, 3, 4]})

df['B'] = df['A'].apply(lambda x: x * 2)

print(df)

Output:

   A  B
0  1  2
1  2  4
2  3  6
3  4  8

41. How do you handle and work with dates and times in Pandas?

Pandas provides the pd.to_datetime() function to convert strings to datetime objects. You can then extract or manipulate date and time components through the .dt accessor.

Example:

# Convert string to datetime

df = pd.DataFrame({'Date': ['2024-01-01', '2024-01-02']})

df['Date'] = pd.to_datetime(df['Date'])

# Extract year and month

df['Year'] = df['Date'].dt.year

df['Month'] = df['Date'].dt.month

print(df)

Output:

Date Year Month

0 2024-01-01 2024 1

1 2024-01-02 2024 1

42. What is the purpose of numpy.random.rand() in generating random values?

numpy.random.rand() generates random values from a uniform distribution over [0, 1). Its arguments are the dimensions of the output array.

Example:

rand_values = np.random.rand(3, 2)

print(rand_values)

Output:

A 3x2 array of random values between 0 and 1

43. How would you use NumPy to compute the dot product of two vectors?

You can use np.dot() to compute the dot product of two vectors.

Example:

a = np.array([1, 2])

b = np.array([3, 4])
dot_product = np.dot(a, b)

print(dot_product) # Output: 11

44. Explain the concept of "vectorization" in NumPy and why it's beneficial.

Vectorization refers to performing operations on entire arrays (vectors) without

explicit loops. It is more efficient because it leverages low-level optimizations in
NumPy for faster computation.

Example:

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Without vectorization (using a loop)

result = [a * b for a, b in zip(arr1, arr2)]

print(result) # Output: [4, 10, 18] (NumPy 2.x prints each item with its type, e.g. np.int64(4))

# With vectorization

result_vectorized = arr1 * arr2

print(result_vectorized) # Output: [ 4 10 18]

Vectorization is beneficial because it speeds up operations by avoiding loops and

taking advantage of optimized libraries.

45. How would you calculate the standard deviation of a column in a Pandas
DataFrame?

You can use the .std() method to calculate the standard deviation of a column.

Example:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

std_dev = df['A'].std()

print(std_dev) # Output: 1.5811388300841898

46. How can you count the occurrences of unique values in a Pandas DataFrame
column?

You can use the .value_counts() method to count unique values in a column.

Example:

df = pd.DataFrame({'A': ['cat', 'dog', 'cat', 'cat', 'dog']})

value_counts = df['A'].value_counts()

print(value_counts)

Output:

cat 3

dog 2

Name: A, dtype: int64 (pandas 2.0 and later label the result "Name: count, dtype: int64")

47. How can you perform matrix multiplication using NumPy arrays?

You can use np.matmul(), the equivalent @ operator, or np.dot() (for 2-D arrays) to perform matrix multiplication.

Example:

A = np.array([[1, 2], [3, 4]])

B = np.array([[5, 6], [7, 8]])

matrix_product = np.matmul(A, B)

print(matrix_product)

Output:

# [[19 22]

# [43 50]]

48. What is the use of numpy.unique() in NumPy?

numpy.unique() returns the sorted unique elements of an array.

Example:

arr = np.array([1, 2, 2, 3, 3, 3, 4])

unique_values = np.unique(arr)

print(unique_values) # Output: [1 2 3 4]

49. How do you handle duplicates in a Pandas DataFrame based on multiple

columns?

You can use the .drop_duplicates() method and specify the subset of columns to
check for duplicates.

Example:
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})

df_no_duplicates = df.drop_duplicates(subset=['A', 'B'])

print(df_no_duplicates)

Output:

   A  B
0  1  4
1  2  5
3  3  6

50. What are some ways to optimize the performance of large datasets in Pandas?

To optimize the performance of large datasets:

1. Use categorical data types for columns with repeating values.

2. Use efficient file formats like Parquet or Feather instead of CSV for large
datasets.
3. Load data in chunks with pd.read_csv(..., chunksize=...) when working with large
files.
4. Vectorize operations instead of using loops.
5. Reduce memory usage by selecting appropriate data types for columns (e.g.,
using float32 instead of float64).

Example:

# Use categorical data type to optimize memory usage

df['Category'] = df['Category'].astype('category')
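
Chunked reading and dtype reduction (points 3 and 5) can be sketched as follows. The CSV is a hypothetical in-memory stand-in for a large file on disk, and the column names and sizes are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a large file on disk
csv_text = "Category,Value\n" + "\n".join(
    f"{'ABC'[i % 3]},{i}" for i in range(3000)
)

# Read 1000 rows at a time and keep only a running total in memory
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    total += chunk["Value"].sum()

# Load everything once to compare memory before and after smaller dtypes
df = pd.read_csv(io.StringIO(csv_text))
before = df.memory_usage(deep=True).sum()

df["Category"] = df["Category"].astype("category")
df["Value"] = pd.to_numeric(df["Value"], downcast="integer")  # int64 -> int16 here

after = df.memory_usage(deep=True).sum()
print(total, before, after)
```

The exact byte counts depend on the pandas version and platform, so treat the comparison as indicative rather than a benchmark.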
