Pandas
df = pd.DataFrame(data)
c) Drop rows with missing values, but keep those where at least 3 values are non-missing.
○ For numeric columns, replace missing values with the mean of the column.
○ For categorical columns (strings), replace missing values with the mode of
the column.
Provide an example DataFrame and fill the missing values according to the above
strategies.
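One possible way to approach this exercise is sketched below. The column names (`age`, `score`, `city`) and their values are hypothetical and chosen only for illustration. Numeric columns are selected with `select_dtypes` and filled with their mean, and each remaining column is filled with its most frequent value.

```python
import pandas as pd

# Hypothetical example data: two numeric columns and one string column.
df = pd.DataFrame({
    "age": [25.0, None, 35.0, 40.0],
    "score": [80.0, 90.0, None, 70.0],
    "city": ["Paris", None, "Paris", "Rome"],
})

# Numeric columns: replace missing values with the column mean.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Categorical (string) columns: replace missing values with the column mode.
categorical_cols = df.select_dtypes(exclude="number").columns
for col in categorical_cols:
    # mode() can return several values on a tie; take the first one.
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

If a column has several equally frequent values, `mode()` returns all of them, so taking `.iloc[0]` is an arbitrary but deterministic tie-break.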
import pandas as pd
df = pd.DataFrame({
    'B': [None, 2, 3, 4]
})
df.isna()   # True where a value is missing
df.notna()  # True where a value is present
You can remove rows or columns with missing values using dropna().
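As a small sketch of both directions, the example below drops rows and then columns from a hypothetical three-column DataFrame (the data is made up for illustration). Note that `dropna()` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical example data with gaps in two of the three columns.
df = pd.DataFrame({
    "A": [1.0, None, 3.0],
    "B": [None, None, 6.0],
    "C": [7.0, 8.0, 9.0],
})

rows_dropped = df.dropna()        # keep only rows with no missing values
cols_dropped = df.dropna(axis=1)  # keep only columns with no missing values

print(rows_dropped)
print(cols_dropped)
```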
You can also specify a threshold, for example, keeping rows with at least 2 non-NaN values:
df.dropna(thresh=2)
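To make the effect of `thresh` concrete, here is a sketch on a hypothetical four-column DataFrame (values invented for illustration). The rows contain 3, 2, 0, and 4 non-missing values respectively, so raising the threshold removes more rows.

```python
import pandas as pd

# Hypothetical data: rows have 3, 2, 0, and 4 non-missing values.
df = pd.DataFrame({
    "A": [1.0, None, None, 4.0],
    "B": [None, 2.0, None, 4.0],
    "C": [1.0, 2.0, None, 4.0],
    "D": [1.0, None, None, 4.0],
})

at_least_two = df.dropna(thresh=2)    # keeps rows 0, 1, and 3
at_least_three = df.dropna(thresh=3)  # keeps rows 0 and 3

print(at_least_two)
print(at_least_three)
```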
If you don't want to drop missing values, you can fill them with some value using fillna().
There are various strategies for filling missing data.
df.fillna(0)
Fill with a value per column (e.g., different fill values for each column):
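A minimal sketch of a per-column fill, assuming a hypothetical DataFrame with one numeric and one string column: `fillna()` accepts a dictionary that maps column names to fill values.

```python
import pandas as pd

# Hypothetical example data: one numeric and one string column.
df = pd.DataFrame({
    "A": [1.0, None, 3.0],
    "B": [None, "x", "y"],
})

# Each key is a column name; each value fills the gaps in that column only.
filled = df.fillna({"A": 0, "B": "unknown"})

print(filled)
```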
You can use interpolation methods to estimate the missing values. This is useful for
numerical data.
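For instance, the default linear method fills each gap with a value on the straight line between its neighbors. The sketch below uses made-up values. With the default settings, a missing value at the very start of a column is left as is, because there is no earlier point to interpolate from.

```python
import pandas as pd

# Hypothetical numeric column with two interior gaps.
df = pd.DataFrame({"B": [1.0, None, 3.0, None, 5.0]})

# Linear interpolation is the default method.
interpolated = df.interpolate()

print(interpolated)
```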
You can also specify different interpolation methods, like polynomial interpolation (this method requires SciPy to be installed):
df.interpolate(method='polynomial', order=2)
You can replace missing values with statistical values like the mean, median, or mode
of the column.
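# Mean and median apply only to numeric columns; numeric_only=True skips the rest.
df.fillna(df.mean(numeric_only=True))
df.fillna(df.median(numeric_only=True))
# mode() can return several rows on a tie; iloc[0] takes the first.
df.fillna(df.mode().iloc[0])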
After you've handled missing data, you can check if any values are still missing using:
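One common way to run such a check is sketched below. The DataFrame and fill values are hypothetical: `isna().sum()` counts the remaining missing values per column, and `isna().any().any()` reduces that to a single True/False answer.

```python
import pandas as pd

# Hypothetical data, filled with a per-column strategy before checking.
df = pd.DataFrame({"A": [1.0, None, 3.0], "B": ["x", "y", None]})
filled = df.fillna({"A": df["A"].mean(), "B": "unknown"})

# Count of missing values remaining in each column.
missing_per_column = filled.isna().sum()

# Single flag: True if any value anywhere is still missing.
any_missing = bool(filled.isna().any().any())

print(missing_per_column)
print(any_missing)
```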