FBA Module 3
Preparation and Analysis
Introduction to Data Preprocessing
Handling Outliers
2. Transformation Techniques: Log transformation or square root transformation can reduce the impact of outliers.
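To see why these transformations help, the sketch below compares how far an extreme value sits from the rest of the data before and after each transformation. The sample values and the `spread_ratio` helper are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical sample with one extreme value (illustrative data only).
values = [10, 12, 11, 13, 1000]

sqrt_values = [math.sqrt(v) for v in values]
log_values = [math.log(v) for v in values]  # natural log; requires positive values


def spread_ratio(xs):
    """Largest value divided by smallest: a rough gauge of how far the outlier sits."""
    return max(xs) / min(xs)


print(spread_ratio(values))       # raw data: 100.0
print(spread_ratio(sqrt_values))  # after square root: about 10
print(spread_ratio(log_values))   # after log: about 3
```

Both transformations compress large values more than small ones, so the outlier stays in the data but pulls far less weight.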
Common Normalization Techniques
• 1. Min-Max Normalization:
• Formula: $X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$
• Scales values between 0 and 1.
• Useful for algorithms like KNN and Neural Networks.
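A minimal sketch of the formula above in plain Python. The function name `min_max_scale` and the handling of a constant column are assumptions made for this example, not part of the slides.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([20, 30, 40, 60])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because the result depends on the minimum and maximum, a single extreme value squeezes all other points toward 0.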
• 2. Z-Score Standardization:
• Formula: $X_{std} = \frac{X - \mu}{\sigma}$
• Converts data to a distribution with a mean of 0 and a standard deviation of 1.
• Useful for linear regression and PCA.
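The standardization formula can be sketched with Python's `statistics` module. The function name `z_score` is hypothetical, and using the population standard deviation (`pstdev`) is an assumption; the slides do not say whether the sample or population version is intended.

```python
import statistics


def z_score(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    # Assumption: population standard deviation (divide by N).
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("Cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


standardized = z_score([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, std dev 2
print(standardized)
print(statistics.mean(standardized), statistics.pstdev(standardized))
```

Unlike min-max normalization, the result is not bounded to a fixed range; values more than a few standard deviations from 0 are easy to spot.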
• 3. Log Transformation:
• Used when data is highly skewed.
• Formula: $X_{log} = \log(X + 1)$
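A short sketch of the log transformation, assuming the natural logarithm (the slides do not state the base). `math.log1p(x)` computes log(x + 1), and `math.expm1` reverses it. The sample data is hypothetical.

```python
import math

# Hypothetical right-skewed data; zeros are allowed because of the +1.
skewed = [0, 9, 99, 999, 9999]

# log1p(x) = ln(x + 1); the natural log is an assumption here.
logged = [math.log1p(x) for x in skewed]

# expm1(y) = e**y - 1 undoes the transformation.
restored = [math.expm1(y) for y in logged]

print(logged)
print(restored)
```

Adding 1 before taking the log keeps zero values valid, since log(0) is undefined.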
• 4. Robust Scaling:
• Uses median and IQR instead of mean and standard deviation.
• Useful for handling outliers.
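The slides give no formula for robust scaling; a commonly used form is (X − median) / IQR, which the sketch below assumes. The function name `robust_scale` and the choice of the "inclusive" quartile method are also assumptions, and other quartile conventions give slightly different results.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    return [(v - median) / iqr for v in values]


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
scaled = robust_scale(data)
print(scaled)  # the first eight values fall between -1.0 and 0.75; the outlier maps to 23.75
```

The median and IQR barely move when an extreme value is added, so the bulk of the data keeps a sensible scale while the outlier remains clearly visible.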
Data Cleaning
• Data cleaning involves identifying and correcting inconsistencies, errors, and irrelevant data.
Common Data Cleaning Techniques
1. Removing Duplicates: Eliminating duplicate records to avoid redundancy.
2. Handling Inconsistent Data: Standardizing formats (e.g., date formats, text case normalization).
3. Handling Outliers and Missing Values: As discussed in previous sections.
4. Fixing Structural Errors: Correcting typos and syntax errors in categorical variables.
5. Data Validation: Checking data integrity using automated scripts or manual review.
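Several of these techniques can be sketched together in a few lines of Python. Everything in the example is hypothetical: the records, the two accepted date formats, and the helper names `normalize_date` and `clean` are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical raw records (name, signup date, city); illustrative data only.
raw_rows = [
    ("Alice", "2024-01-05", "new york"),
    ("Alice", "2024-01-05", "new york"),  # exact duplicate
    ("Bob", "05/02/2024", "New York "),   # different date format, stray space
    ("Cara", "2024-03-10", "NEW YORK"),   # inconsistent text case
]

# Assumption: only these two date formats occur in the data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(text):
    """Convert a date string in any accepted format to ISO form (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Data validation: reject anything that matches no known format.
    raise ValueError(f"Unrecognized date format: {text!r}")


def clean(rows):
    """Standardize formats, then drop duplicate records (first occurrence kept)."""
    seen = set()
    cleaned = []
    for name, date_text, city in rows:
        record = (name.strip(), normalize_date(date_text.strip()), city.strip().title())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Standardizing before removing duplicates matters: records that differ only in case or spacing would otherwise be treated as distinct.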
Data Preprocessing
and Analysis in Excel
Data Cleaning: Removing duplicate rows
• If you would like to identify duplicate rows so that you can examine
them without automatically deleting them, here's another method.
Unlike the technique described in the previous section, this method
looks at actual values, not formatted values. Create a formula to
the right of your data that concatenates each of the cells to the left.
• The formulas that follow assume that the data is in columns A:F.
Enter this formula into cell G2: =CONCAT(A2:F2)
• Add another formula in cell H2. This formula displays the number of
times a value in column G occurs: =COUNTIF(G:G,G2)
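For comparison, the same two-step idea (build one key per row, then count how often each key occurs) can be sketched in Python. The sample rows are hypothetical, and a tuple key stands in for the concatenated string in column G.

```python
from collections import Counter

# Hypothetical rows standing in for the worksheet data (illustrative only).
rows = [
    ("A100", "Widget", 3),
    ("A101", "Gadget", 5),
    ("A100", "Widget", 3),
]

# Step 1 (like column G): one key per row built from all of its cells.
keys = [tuple(row) for row in rows]

# Step 2 (like column H): how many times each row's key occurs.
counts = Counter(keys)
occurrences = [counts[key] for key in keys]

print(occurrences)  # [2, 1, 2]
```

Any row with a count above 1 is a duplicate that can be reviewed before deleting. A tuple key also avoids a pitfall of plain concatenation, where cells "AB" and "C" produce the same string as "A" and "BC".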
Removing extra spaces