

The document provides a comprehensive guide on cleaning datasets before visualization, emphasizing the importance of accuracy and reliability in data analysis. It outlines a step-by-step process including downloading datasets, inspecting data, handling missing data, removing duplicates, and standardizing formats. The guide concludes with methods for creating visualizations in Excel after data cleaning is complete.

Uploaded by

Ikenna
Copyright © All Rights Reserved

Week 5 Assignment

Cleaning up a dataset before visualization is a critical process in data analysis because it ensures the accuracy
and reliability of the insights derived from the data. Below is a comprehensive guide to cleaning data in Excel,
including how to resolve anomalies such as missing values, duplicates, and outliers. Let’s walk through the steps
using an example dataset from a platform such as Kaggle or Google Dataset Search.

Step-by-Step Process for Data Cleaning

Before You Start: Download the Dataset

First, choose a dataset from Kaggle or Google Dataset Search. For example, you might download a CSV
file containing sales data, weather data, or customer reviews.

Then open the file in Excel so you can inspect it.

1. Inspect the Data

Open your dataset in Excel. Look at the first few rows to understand what each column contains.

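Check data types. Ensure that numbers are recognized as numbers, dates as dates, and text as text. You
can see a cell's format in the Number Format box on the Home tab. The format does not always match the
stored value (a number stored as text can still be formatted as General), so =ISNUMBER(A2) or =ISTEXT(A2)
is a more reliable check.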
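If you would rather script the inspection, the following is a minimal Python sketch using only the standard library. It is an alternative to the Excel steps, not part of the original workflow. The CSV text and column names are hypothetical, and `infer_type` is a rough stand-in for Excel's own type detection that treats dates as text.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """order_id,order_date,price,region
1,2023-01-05,120.50,North
2,2023-01-06,,South
3,05/01/2023,abc,North
"""


def infer_type(value: str) -> str:
    """Classify a raw cell as 'blank', 'number', or 'text' (dates count as text here)."""
    if value.strip() == "":
        return "blank"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def inspect_csv(csv_text: str, n: int = 5) -> dict:
    """Print the first n rows and return the set of cell types seen in each column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows[:n]:
        print(row)
    types = {}
    for row in rows:
        for column, value in row.items():
            types.setdefault(column, set()).add(infer_type(value))
    return types


column_types = inspect_csv(SAMPLE_CSV)
print(column_types["price"])  # mixes number, blank and text, so it needs cleaning
```

A column whose set contains more than one type is a good candidate for the missing-data and data-type steps.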

2. Handle Missing Data

Highlight missing data: Use Excel’s Conditional Formatting to highlight blank cells.

Go to Home > Conditional Formatting > New Rule, choose "Format only cells that contain", set the condition
to Blanks, and pick a fill color.

Option 1: Remove missing rows/columns:

Select the rows or columns, right-click, and choose Delete. Note that this also discards every other value
in those rows or columns.

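Option 2: Impute missing data:

For numerical columns, fill missing values with the mean or median of the column. Both AVERAGE() and
MEDIAN() ignore blank cells, so calculate the value in a spare cell outside the data range:

=AVERAGE(B2:B100)

=MEDIAN(B2:B100)

Then select the data range in that column, go to Home > Find & Select > Go To Special > Blanks, type the
value, and press Ctrl+Enter to fill every selected blank at once. Home > Fill > Down and Fill > Series do
not impute: Fill Down copies the value from the cell above, and Fill Series extends a sequence.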
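The same mean/median imputation can be sketched in Python. Representing blank cells as `None` is an assumption of this example, and `impute` is a hypothetical helper name.

```python
import statistics
from typing import List, Optional


def impute(values: List[Optional[float]], strategy: str = "median") -> List[float]:
    """Replace None (a blank cell) with the mean or median of the non-blank values.

    Like Excel's AVERAGE() and MEDIAN(), the statistic ignores the blanks.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no values to impute from")
    if strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


prices = [10.0, None, 30.0, 20.0, None, 100.0]
print(impute(prices, "mean"))    # [10.0, 40.0, 30.0, 20.0, 40.0, 100.0]
print(impute(prices, "median"))  # [10.0, 25.0, 30.0, 20.0, 25.0, 100.0]
```

The median (25.0) is pulled up far less by the single large value than the mean (40.0), which is one reason to prefer it for skewed columns.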

3. Remove Duplicate Entries

Go to the Data tab and click Remove Duplicates.

Select the columns you want to check for duplicates and click OK. Excel keeps the first occurrence of each
duplicate and deletes the rest.
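A minimal Python sketch of the duplicate check, assuming rows are dictionaries keyed by column name (the sample orders are hypothetical). It keeps the first occurrence of each key and drops later repeats; it compares values exactly as stored, which may not match Excel's own comparison in every edge case.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each combination of key_columns; drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"order_id": "1", "region": "North", "price": "120.50"},
    {"order_id": "2", "region": "South", "price": "80.00"},
    {"order_id": "1", "region": "North", "price": "120.50"},
]
print(len(remove_duplicates(orders, ["order_id", "region", "price"])))  # 2
```

Choosing fewer key columns (for example, only `region`) makes more rows count as duplicates, just as ticking fewer columns in the Remove Duplicates dialog does.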

4. Handle Outliers

Use Conditional Formatting to highlight values that are unusually high or low:

Go to Home > Conditional Formatting > Highlight Cells Rules > Greater Than (or Less Than) and enter a
threshold that makes sense for the column.

For visualization, you can exclude extreme outliers by applying a filter to the data range (Data > Filter)
and using Number Filters to hide the extreme values.
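One common way to pick the thresholds, not prescribed by the steps above, is the 1.5 x IQR rule: flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. In Excel the quartiles come from =QUARTILE.INC(B2:B100,1) and =QUARTILE.INC(B2:B100,3). The sketch below computes the same fences in Python; the multiplier `k=1.5` is a convention rather than a rule, and the sample prices are made up.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    method="inclusive" interpolates quartiles the same way as Excel's QUARTILE.INC().
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """List the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


prices = [10, 12, 11, 13, 12, 95]
print(iqr_bounds(prices))     # (9.0, 15.0)
print(flag_outliers(prices))  # [95]
```

The two fences are the values you would type into the Less Than and Greater Than rules.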

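5. Standardize Data Formats

Date Format:

Select the column with dates, right-click, and choose Format Cells. On the Number tab, choose Date, or
choose Custom and type a format code such as yyyy-mm-dd. This changes only how real dates are displayed;
dates stored as text must be converted first (see step 6).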

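String Format:

For text, use TRIM() to remove leading and trailing spaces (it also reduces repeated spaces between words
to one) and LOWER() or UPPER() to standardize capitalization. Enter the formulas in a helper column, then
copy the results and use Paste Special > Values to replace the original text.

Example:

=TRIM(A2)

=LOWER(A2)

=LOWER(TRIM(A2))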
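For comparison, here is a Python sketch of the TRIM/LOWER clean-up. `excel_trim` is a hypothetical helper written to approximate TRIM, which removes only the ordinary space character and leaves tabs and non-breaking spaces alone; the region values are made up.

```python
import re


def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM(): drop leading/trailing spaces and collapse
    runs of spaces between words to one. Only the ordinary space character is
    touched; tabs and non-breaking spaces are left as they are."""
    return re.sub(" +", " ", text.strip(" "))


def standardize(text: str) -> str:
    """Same idea as =LOWER(TRIM(A2))."""
    return excel_trim(text).lower()


regions = ["  North ", "north", "NORTH   EAST", "South"]
print([standardize(r) for r in regions])  # ['north', 'north', 'north east', 'south']
```

After standardizing, "  North " and "north" become the same value, so they group together in charts and pivot tables.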

6. Fix Incorrect Data Types

Convert text to numbers: If numerical data is stored as text, select the cells, click the warning icon, and
choose Convert to Number.

Convert text to dates: Use Text to Columns to convert text dates into real dates.

Select the column, go to Data > Text to Columns, and follow the wizard. On the last step, set the column
data format to Date and choose the order the text uses (for example, DMY for day/month/year).
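A Python sketch of both conversions, assuming commas are thousands separators and that text dates are written day/month/year; change `fmt` for other layouts. `to_number` and `to_date` are hypothetical helper names.

```python
from datetime import date, datetime
from typing import Optional


def to_number(text: str) -> Optional[float]:
    """Convert a number stored as text to float, or None if it is not numeric.

    Assumes commas are thousands separators (e.g. "1,250.00"); adjust for
    locales that use a comma as the decimal mark.
    """
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def to_date(text: str, fmt: str = "%d/%m/%Y") -> Optional[date]:
    """Parse a text date written in a known order (day/month/year by default),
    much like picking DMY on the last step of the Text to Columns wizard."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


print(to_number(" 1,250.00"))  # 1250.0
print(to_date("05/01/2023"))   # 2023-01-05
print(to_date("2023-01-05"))   # None: does not match the day/month/year pattern
```

Returning `None` instead of raising leaves unparseable cells visible so they can be fixed by hand.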

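7. Categorize Data

Use the IF() function or VLOOKUP() to categorize data into bins or groups. With VLOOKUP(), list the lower
bound of each group in ascending order next to its label and use approximate match (TRUE as the last argument).

Example: Categorizing prices into "low," "medium," and "high" ranges.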

=IF(B2<100,"low",IF(B2<500,"medium","high"))
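The nested IF() above is a range lookup, which is also what VLOOKUP() with approximate match does. In Python, `bisect.bisect_right` performs the same lookup against a sorted list of lower bounds; the thresholds below mirror the example formula, and `categorize` is a hypothetical helper name.

```python
from bisect import bisect_right

# Lower bound of each band after the first; must stay sorted ascending.
THRESHOLDS = [100, 500]
LABELS = ["low", "medium", "high"]


def categorize(price: float) -> str:
    """Same rule as =IF(B2<100,"low",IF(B2<500,"medium","high"))."""
    return LABELS[bisect_right(THRESHOLDS, price)]


print([categorize(p) for p in [99.99, 100, 499, 500]])
# ['low', 'medium', 'medium', 'high']
```

Adding a band only means adding one threshold and one label, rather than nesting another IF().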


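8. Remove Irrelevant Data

Select the columns or rows that are unnecessary for the analysis, right-click, and choose Delete. Pressing
the Delete key only clears the cell contents and leaves empty rows or columns behind.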
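When cleaning in a script instead of by hand, dropping irrelevant columns and rows can be sketched as follows. The column names (including `internal_notes`) and the test-row rule are hypothetical.

```python
def keep_columns(rows, wanted):
    """Return copies of rows that contain only the wanted columns, in that order."""
    return [{column: row[column] for column in wanted} for row in rows]


def drop_rows(rows, is_irrelevant):
    """Return the rows for which is_irrelevant(row) is False."""
    return [row for row in rows if not is_irrelevant(row)]


raw = [{"order_id": "1", "price": "120.50", "internal_notes": "call back"}]
print(keep_columns(raw, ["order_id", "price"]))
# [{'order_id': '1', 'price': '120.50'}]
```

Listing the columns to keep, rather than the ones to drop, makes it obvious which fields the analysis depends on.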

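9. Transform Data (Optional)

Apply a log transformation or other transformations using functions such as LOG() in Excel. LOG() uses
base 10 unless you give a second argument; use LN() for the natural logarithm. Both return #NUM! for zero
or negative values.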

=LOG(B2)
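A Python sketch of the same transformation. Returning `None` where Excel would show #NUM! is a choice made for this example; when a column contains zeros, a shift such as LOG(B2+1) is sometimes used instead.

```python
import math
from typing import Optional


def log_transform(value: float, base: float = 10) -> Optional[float]:
    """Mirror Excel's LOG(), which defaults to base 10.

    Returns None where Excel would show #NUM! (zero or negative input).
    """
    if value <= 0:
        return None
    if base == 10:
        return math.log10(value)  # more exact than math.log(value, 10)
    return math.log(value, base)


print([log_transform(v) for v in [1, 10, 1000, 0]])  # [0.0, 1.0, 3.0, None]
```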

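10. Final Check

Use Data Validation to ensure data integrity:

Go to Data > Data Validation and set validation rules (e.g., allowed numerical ranges or valid dates).
Rules apply to new entries, so use Data > Data Validation > Circle Invalid Data to flag existing values
that break them.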
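Validation rules can also be expressed as code. This sketch checks existing rows against a dictionary of rules and reports each failing cell. The rules themselves (a price from 0 to 10,000 and dates in 2023) and the sample rows are hypothetical examples.

```python
from datetime import date


def find_invalid(rows, rules):
    """Return (row_index, column) pairs that break a rule, similar in spirit to
    circling invalid data in Excel."""
    problems = []
    for i, row in enumerate(rows):
        for column, is_valid in rules.items():
            if not is_valid(row[column]):
                problems.append((i, column))
    return problems


# Hypothetical rules: price must be a number from 0 to 10,000; date must be in 2023.
rules = {
    "price": lambda v: isinstance(v, (int, float)) and 0 <= v <= 10_000,
    "order_date": lambda v: isinstance(v, date) and v.year == 2023,
}
rows = [
    {"price": 120.5, "order_date": date(2023, 1, 5)},
    {"price": -3, "order_date": date(2023, 2, 1)},
    {"price": 80, "order_date": "05/01/2023"},
]
print(find_invalid(rows, rules))  # [(1, 'price'), (2, 'order_date')]
```

The third row fails because its date is still text, which points back to the data-type step.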

After cleaning your data, you can create meaningful visualizations in Excel:

Bar Charts: Go to Insert > Insert Column or Bar Chart to visualize categorical data.

Line Charts: Go to Insert > Insert Line or Area Chart to show trends over time.

Pivot Tables: Go to Insert > PivotTable to summarize and analyze data quickly.
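Before charting, a pivot table typically sums a value per category. The sketch below computes that kind of summary in Python; `pivot_sum` is a hypothetical helper, and the sample rows are made up. Its output is what you would plot as a bar chart of totals per region.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, value_field):
    """Total value_field for each distinct row_field value, like a pivot table
    with row_field in Rows and Sum of value_field in Values (sorted by label)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    return dict(sorted(totals.items()))


sales = [
    {"region": "North", "price": 120.5},
    {"region": "South", "price": 80.0},
    {"region": "North", "price": 30.0},
]
print(pivot_sum(sales, "region", "price"))  # {'North': 150.5, 'South': 80.0}
```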
