Avinash DA 6
CODE:
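import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The lines that build df are missing from the source; df is assumed to be
# a DataFrame with a numeric 'Salary' column that contains NaN values.

# Forward-fill missing salaries (same effect as fillna(method='pad'))
df['Salary'] = df['Salary'].ffill()
# df['Salary'] = df['Salary'].fillna(0)        # alternative: replace NaN with 0
# df['Salary'] = df['Salary'].interpolate()    # alternative: linear interpolation

# Min-Max Normalization
trans = MinMaxScaler()
df['Salary_normalized'] = trans.fit_transform(df[['Salary']])
print(df)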
THEORY:
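This code handles missing data in the 'Salary' column by forward filling: fillna() with
method='pad' (equivalent to ffill()) copies the last valid value into each gap that follows
it. Recent pandas versions deprecate the method argument of fillna() in favor of ffill().
To use another strategy, such as replacing NaN with 0 or linear interpolation, comment or
uncomment the respective lines.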
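The three fill strategies mentioned above can be compared side by side. This is a minimal sketch on a small hypothetical Series (the values are invented for illustration and are not the original dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with two gaps (not the original data).
salary = pd.Series([50000.0, np.nan, 62000.0, np.nan, 70000.0])

forward_filled = salary.ffill()                      # carry the last valid value forward
zero_filled = salary.fillna(0)                       # replace every NaN with 0
interpolated = salary.interpolate(method="linear")   # straight line between neighbours

comparison = pd.DataFrame({
    "original": salary,
    "ffill": forward_filled,
    "zero": zero_filled,
    "linear": interpolated,
})
print(comparison)
```

Note that forward filling cannot fill a NaN in the first row, because there is no earlier value to copy.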
Missing data can occur for various reasons, such as data entry errors, equipment
malfunction, or intentional omission.
It is essential to handle missing data appropriately because it can lead to biased results
and incorrect conclusions if not addressed. The main approaches are:
Imputation: Filling in missing values with estimated values. Methods include filling with
mean, median, mode, or using more sophisticated techniques like linear interpolation.
Deletion: Removing rows or columns with missing values. However, this approach can
lead to loss of valuable information if not done carefully.
Prediction: Using machine learning algorithms to predict missing values based on other
features in the dataset.
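Imputation and deletion can be sketched in a few lines of pandas. The DataFrame below is hypothetical, and prediction-based filling is left out because it depends on the model and features available:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the original assignment).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [40000.0, np.nan, 50000.0, 90000.0],
})

# Imputation: replace NaN with a summary statistic of the observed values.
mean_imputed = df["Salary"].fillna(df["Salary"].mean())
median_imputed = df["Salary"].fillna(df["Salary"].median())

# Deletion: drop every row whose 'Salary' is missing.
dropped = df.dropna(subset=["Salary"])

print(mean_imputed.tolist())
print(median_imputed.tolist())
print(len(df), "->", len(dropped), "rows after deletion")
```

The mean and median differ here because the largest salary pulls the mean upward, which is one reason the median is often preferred for skewed data.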
In Python, libraries like pandas provide convenient functions like fillna() and interpolate()
for handling missing data effectively.
For Min-Max normalization, the code initializes a MinMaxScaler object and fits it to the
'Salary' column, transforming the data and storing the normalized values in a new column
called 'Salary_normalized'.
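The fit-and-transform step described above might look like the following sketch. The salary values are hypothetical; MinMaxScaler, fit_transform, data_min_ and data_max_ are part of scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical salaries with no missing values left.
df = pd.DataFrame({"Salary": [30000.0, 45000.0, 60000.0, 90000.0]})

scaler = MinMaxScaler()                         # default feature_range is (0, 1)
scaled = scaler.fit_transform(df[["Salary"]])   # 2-D input, result has shape (4, 1)
df["Salary_normalized"] = scaled.ravel()        # flatten to one value per row

# The scaler remembers the minimum and maximum it saw during fitting.
print(scaler.data_min_, scaler.data_max_)
print(df)
```

The double brackets in df[["Salary"]] matter: the scaler expects a two-dimensional input (rows by features), so a single column is passed as a one-column DataFrame rather than as a Series.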
Min-Max normalization, a form of feature scaling, rescales numeric features to a
specific range, typically between 0 and 1.
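The normalized value is computed as
$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
Here, $X_{\min}$ is the minimum value of the feature and $X_{\max}$ is its maximum value.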
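The calculation can also be carried out by hand: subtract the column minimum from each value, then divide by the range (maximum minus minimum). A short worked example on hypothetical values:

```python
import pandas as pd

# Hypothetical salaries used only to illustrate the arithmetic.
salary = pd.Series([30000.0, 45000.0, 60000.0, 90000.0])

x_min = salary.min()   # 30000.0
x_max = salary.max()   # 90000.0

# (X - X_min) / (X_max - X_min), applied element-wise
normalized = (salary - x_min) / (x_max - x_min)
print(normalized.tolist())   # [0.0, 0.25, 0.5, 1.0]
```

If every value in the column is identical, the range is zero and this division is undefined, so a constant column needs special handling.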
Min-Max normalization ensures that all features have the same scale, which can be crucial
for algorithms sensitive to feature scales, such as gradient descent-based optimization
algorithms.
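To illustrate what "the same scale" means, the sketch below normalizes two hypothetical columns whose raw ranges differ by several orders of magnitude. Both the column names and the values are assumptions made for the example:

```python
import pandas as pd

# Two hypothetical features on very different scales.
df = pd.DataFrame({
    "Age": [25, 35, 45, 55],
    "Salary": [30000, 50000, 70000, 110000],
})

print("raw ranges:")
print(df.max() - df.min())

# Column-wise Min-Max normalization: each column uses its own min and max.
normalized = (df - df.min()) / (df.max() - df.min())

print("normalized ranges:")
print(normalized.max() - normalized.min())
print(normalized)
```

After scaling, both columns span exactly 0 to 1, so neither dominates purely because of its units.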
In Python, libraries like scikit-learn provide the MinMaxScaler class, which makes it easy
to perform Min-Max normalization on datasets.