
Modulo 1

The document outlines a course on analyzing data with Excel, covering modules from basic spreadsheet introduction to advanced data analysis techniques. Key topics include data entry, cleaning, filtering, sorting, and using pivot tables, along with hands-on labs for practical application. By the end of the course, participants will be equipped to utilize Excel for data analysis tasks effectively.

Uploaded by

Ekary_Lafey

Course Analyzing Data with Excel

Module 1 - Introduction to Data Analysis Using Spreadsheets


Introduction to Spreadsheets
Spreadsheet Basics
Using Spreadsheets as a Data Analysis Tool

Module 2 - Getting Started with Using Excel Spreadsheets


Viewing, Entering, and Editing Data
Copying, Filling, and Formatting Cells and Data
The Basics of Formulas
Intro to Functions
Referencing Data in Formulas

Module 3 - Basics of Data Quality and Privacy


Overview of the Data Analyst Ecosystem
Types of Data
Understanding Different Types of File Formats
Sources of Data Using Service Bindings
Languages for Data Professionals

Module 4 - Cleaning Data


Removing Duplicated or Inaccurate Data and Empty Rows
Dealing with Inconsistencies in Data
More Excel Features for Cleaning Data

Module 5 - Data Analysis Basics, Filtering and Sorting Data


Intro to Analyzing Data Using Spreadsheets
Filtering and Sorting Data in Excel
Useful Functions for Data Analysis
Using VLOOKUP and HLOOKUP Functions

Module 6 - Using Pivot Table


Introduction to Creating Pivot Tables in Excel
Pivot Table Features

Final Project - Part 1 and Part 2


Module 1 - Introduction to Data Analysis Using Spreadsheets
In this module, you will learn about the fundamentals of spreadsheet applications, and you will be
introduced to the Excel interface and learn how to navigate your way around a worksheet and
workbook.

Video: Course Introduction (2:58)


Do you want to learn how to use spreadsheets and start analyzing data using Excel? This course
from IBM is designed to help you work with Excel and gives you a good grounding in the cleaning
and analyzing of data which are important parts of the skill set required to become a data analyst.
You will not only learn data analysis techniques using spreadsheets, but also practice using
multiple hands-on labs throughout the course.

In module one you will learn about the basics of spreadsheets, including spreadsheet terminology,
the interface and navigating around worksheets and workbooks.
In module two you will learn about selecting data, entering and editing data, copying and auto
filling data, formatting data, and using functions and formulas.
In module three you will learn about cleaning and wrangling data using a spreadsheet, including
the fundamentals of data quality and data privacy, removing duplicated and inaccurate data,
removing empty rows, removing data inconsistencies and white spaces, and using the flash fill and
text to columns features.
In module four you will learn about analyzing data using spreadsheets, including filtering data,
sorting data, using common data analysis functions, creating and using pivot tables, and creating
and using slicers and timelines.
At the end of this course in module five, you will complete a series of hands-on labs which will
guide you on how to create your first deliverable as a data analyst. This will involve you
understanding what the business scenario is, cleaning and preparing your data, and analyzing
your data. You will follow two different business scenarios throughout the course, with each using
their own data set. These different scenarios and data sets will be used in the lesson videos and in
the hands-on labs.

After completing this course, you will be able to understand how spreadsheets can be used as a
data analysis tool; understand when to use spreadsheets as a data analysis tool and their
limitations; create a spreadsheet and explain its basic functionality; perform data wrangling and
data cleaning tasks using Excel; and analyze data using the filter, sort, and pivot table features within Excel
spreadsheets. You will also perform some intermediate level data wrangling and data analysis
tasks to address a business scenario.

Video: Introduction to Spreadsheets (5:29)


In this first video of the course, we will list some of the common spreadsheet applications
available, learn about the key capabilities of spreadsheets, and discuss why spreadsheets might
be a useful tool for a Data Analyst.

There are several spreadsheet applications available in the marketplace; some of them are more
widely known and used than others, and some are free, while others need to be paid for.

By far the most commonly used spreadsheet application, and the most fully featured of them all is
Microsoft Excel.
The desktop version comes in a paid form as part of the Office suite and some Microsoft 365
subscriptions, but there is also a web-based cut-down version called Excel for the web, also known
as Excel Online. The online version is free to users with a Microsoft account, but does not offer all
the advanced features that the desktop version provides.
The next most popular is Google Sheets, which offers a lot, though not all of the features that
Excel provides, and is free with a Google account. This is a web-based application and it integrates
nicely with other Google apps, such as Google Forms, Google Analytics, and Google Data Studio.

Then there is LibreOffice Calc, a totally free and open source desktop spreadsheet application
that offers more basic functionality than Excel or Google Sheets, but still has a lot of the tools you
need for data analysis, such as charts, conditional formatting, and pivot tables.

Other spreadsheet apps include Zoho Sheet (a fully-featured web-based application that is
comparable with Google Sheets), OpenOffice Calc, Quip from Salesforce, Smartsheet (which is
predominantly for project management), and Apple Numbers (which is included with Apple devices
such as Mac computers and is also available on the App Store for other Apple devices).

So, there are many spreadsheet application options open to you, from fully-featured to basic, from
cloud-based to desktop apps, from paid-for to free versions. It’s up to you to decide which one best
fits your needs and your budget.

Spreadsheets provide several advantages over manual calculation methods. For example, once
you have your formulas correctly written, you can be assured that your calculations are accurate,
and that the calculations will be performed automatically for you.
Spreadsheets also help keep your data organized and easily accessible. Your data can be easily
formatted, filtered, and sorted to suit your needs. If you do make mistakes in your data entry or
your calculations you can easily edit them,
undo them, or use error-checking tools to help remedy those mistakes. And lastly, you can analyze
data in spreadsheets, and create charts, graphs, and reports to help visualize your data analysis.

Since spreadsheet software for personal computers first appeared on the market in the 1970s,
with VisiCalc on the Apple II PC, spreadsheets have come a long way in terms of the capabilities
and features they now offer businesses, from uncomplicated tables and relatively simple
computations to powerful tools for the analysis, management, and visualization of enormous sets
of data.

The most common business uses for spreadsheet applications include the following: Data Entry
and Storage, Comparing Large Datasets, Modelling and Planning, Charting, Identifying Trends,
Flowcharts for Business Processes, Tracking Business Sales, Financial Forecasting, Statistical
Analysis, Profit and Loss Accounting, Budgeting, Forensic Auditing, Payroll and Tax Reporting,
Invoicing, and Scheduling.

And away from the business side of things, other typical uses include Personal Expenses,
Household Budgeting, Recipe library, Fitness Tracking, Calorie Counting & Weight Monitoring,
Sports Leagues such as Fantasy Football, Cataloging Music Libraries, and even Contact Lists,
Shopping Lists and Christmas Card Lists.

As a Data Analyst, you can use spreadsheets as a tool for your data analysis tasks, including:
• Collecting and harvesting data from one or more distributed and different sources.
• Cleaning data to remove duplicates, inaccuracies, errors, and resolve missing values to
improve the quality of the data.
• Analyzing data by filtering, sorting, and interpreting it to determine what useful information
can be gleaned from it.
• And visualizing data, to help you tell a story about your data analysis findings to key
business stakeholders and any other interested parties within your organization.
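
As a rough illustration of the cleaning and analyzing tasks listed above, here is a minimal Python sketch that does by hand what a spreadsheet does with its duplicate-removal, filter, and sort tools. The sample rows, the field names (region, month, sales), and the threshold of 1000 are all made up for illustration and do not come from the course data sets.

```python
# Hypothetical sample data: one dictionary per spreadsheet row.
ROWS = [
    {"region": "North", "month": "Jan", "sales": 1200},
    {"region": "South", "month": "Jan", "sales": 950},
    {"region": "North", "month": "Jan", "sales": 1200},  # exact duplicate
    {"region": "East", "month": "Jan", "sales": None},   # missing value
    {"region": "West", "month": "Jan", "sales": 1430},
]


def clean(rows):
    """Drop exact duplicate rows and rows that contain a missing value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue  # skip a row we have already kept or rejected
        seen.add(key)
        if any(value is None for value in row.values()):
            continue  # skip rows with a missing value
        cleaned.append(row)
    return cleaned


def analyze(rows, minimum_sales):
    """Filter rows by a sales threshold, then sort from highest to lowest."""
    kept = [row for row in rows if row["sales"] >= minimum_sales]
    return sorted(kept, key=lambda row: row["sales"], reverse=True)


for row in analyze(clean(ROWS), minimum_sales=1000):
    print(row["region"], row["sales"])
```

In a spreadsheet the same steps are point-and-click operations; the sketch only makes the order of the steps explicit (clean first, then filter and sort).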

In this video, we had an introduction to spreadsheets. We learned about some common
spreadsheet applications, what the main capabilities of spreadsheets are, and why spreadsheets
might be a useful tool for a Data Analyst. In the next video, we will look at the basics of
spreadsheets, including common spreadsheet terminology.

Video: Spreadsheet Basics - Part 1 (5:29)

Now that we have a basic understanding of what spreadsheet software is available, and why
spreadsheets might be a useful tool for a Data Analyst, let’s get started on looking at some of the
basics of using a spreadsheet application.
In these videos we will be using the full ‘desktop’ version of Excel, but the majority of the tasks
that we will perform can also be done using Excel ‘on the web’, also known as Excel Online, and
other spreadsheet applications such as Google Sheets.

Let’s first cover some basic spreadsheet terminology. When you open Excel, you have the option of
creating a new blank workbook or opening an existing workbook.
We’re going to choose New, and then Blank workbook. Workbooks are the highest-level
component in Excel and are represented as a .XLSX file. So, when you open an existing workbook
or create a new workbook you are in fact working
with a .XLSX file.

The workbook contains all your data, calculations, and functions, and contains several other
underlying elements that make up a workbook. A workbook consists of one or more worksheets,
each of which is represented by a tab in Excel.
Each worksheet is given a name which is displayed on the corresponding tab for the worksheet. By
default, each tab is named Sheet1, then Sheet2, and so on.
To make these worksheet tabs more meaningful it is usual to rename them, so they make more
sense in relation to the worksheet’s purpose. For example, you might call a worksheet January
Sales, or perhaps the name of a region or store, or even an office or department. To do this, right-
click the tab and choose Rename. Instead of right-clicking to rename, you can also just double-
click the name of a worksheet tab to rename it. Essentially, worksheet tabs can be named
anything you want to fit your particular needs to make it easier to understand what that worksheet
represents.

Note that a worksheet that is highlighted, as the Tire Sales worksheet tab is here, is referred to as
the active worksheet.
If you want to order your worksheets in a different way, that is very simple to do. Either drag a
worksheet tab to the left or right and drop it in the place you want, which is represented by the
little black arrow, or if you are not comfortable with dragging and dropping, then the longer way of
doing that is to right-click the worksheet tab, select Move or Copy, and then in the list titled Before
sheet, select where you want your worksheet tab to be placed, and click OK.

Every worksheet is made up of a lot of rectangular boxes called cells. These cells will contain your
data, which may be text, numbers, formulas, or calculation results. Cells are organized in
columns, which run vertically down the screen and use a letter system; this is column B for
instance. And rows, which run horizontally across the screen and use a numeric system; this is
row 7 for example. Each cell is represented by a cell reference which is essentially just its column
letter and row number.

For example, if we click somewhere near the center of this worksheet, we now have the cell M20
selected. This is usually referred to as the ‘active cell’. This is not only indicated by the highlighted
edges of the cell but also if you look in the top left corner of the worksheet, you will see its cell
reference is noted in the little box. Here you can see it says M20. One important thing to note here
is that cells are always referenced by their column letter first then their row number; so, column M,
and row 20.
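
To make the column-first, row-second rule concrete, here is a small Python sketch that splits a cell reference such as M20 into its column and row parts. The function names are hypothetical and this is not how Excel itself is implemented; it only mirrors the naming rule described above.

```python
import re


def column_letters_to_index(letters: str) -> int:
    """Convert column letters (A, B, ..., Z, AA, AB, ...) to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell_reference(ref: str) -> tuple[int, int]:
    """Split a reference like 'M20' into (column_index, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    letters, row = match.groups()
    return column_letters_to_index(letters), int(row)


# Column M is the 13th column, so M20 is column 13, row 20.
print(parse_cell_reference("M20"))
```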

The last element of a workbook I want to mention is a cell range.
This identifies a collection of several cells selected together; that could mean a few cells in the
same row or the same column, or it could mean several rows and columns together. This can
either be done using the mouse by selecting the first cell then ‘dragging’ down or across to include
other cells; or you can use SHIFT+arrow keys. This range of cells is often referred to as an array,
and it’s most commonly used as a reference in calculations and formulas.

For example, if you wanted to add up all the values in a column between cells D9 and D19 you
would specify this cell range within a formula. Note that cell ranges are notated using a full colon
(:) between the cell references; so, in this example it would be D9:D19, or to specify a few cells in
the same row it might be D9:H9, or to select several rows and columns it might be D9:H19. We will
see this notation in use later in this course when we start looking at calculations
and formulas. These cell ranges could even be a reference point to cells contained on another
worksheet; this is usually referred to as a 3D reference.
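
The range notation can be sketched in Python as follows. This is only an illustration under simple assumptions: the worksheet is modeled as a dictionary from cell reference to value, and the helper names (expand_range, sum_range) are hypothetical, not Excel functions.

```python
import re


def _col_to_index(letters: str) -> int:
    """Convert column letters to a 1-based column index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def _index_to_col(index: int) -> str:
    """Convert a 1-based column index back to column letters."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def _split(ref: str) -> tuple[int, int]:
    """Split a reference like 'D9' into (column_index, row_number)."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", ref.upper())
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    return _col_to_index(match.group(1)), int(match.group(2))


def expand_range(cell_range: str) -> list[str]:
    """List every cell in a range such as 'D9:H19', row by row."""
    start, end = cell_range.split(":")
    (col1, row1), (col2, row2) = _split(start), _split(end)
    return [
        f"{_index_to_col(col)}{row}"
        for row in range(row1, row2 + 1)
        for col in range(col1, col2 + 1)
    ]


def sum_range(sheet: dict, cell_range: str) -> float:
    """Add up the values of all cells in the range; empty cells count as 0."""
    return sum(sheet.get(cell, 0) for cell in expand_range(cell_range))


# Hypothetical worksheet: cells D9..D19 hold the values 9..19.
sheet = {f"D{row}": row for row in range(9, 20)}
print(len(expand_range("D9:D19")), sum_range(sheet, "D9:D19"))
```

The three ranges from the text cover 11 cells (D9:D19), 5 cells (D9:H9), and 55 cells (D9:H19) respectively.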

Video: Spreadsheet Basics - Part 2 (6:56)

Now that we have a basic understanding of the main elements that make up a worksheet, let’s see
how to move around a spreadsheet, get familiar with the ribbon and menus, and learn how to
select data in a worksheet.

To open a sample file, we click File. This opens Backstage View. Here you can create a new
workbook, or open, save or print a workbook. You can also access Excel Options. Now, we want to
open our sample file. So, we click Open, and either select it from my Recent list, or click Browse to
find the data file we want.

The first thing we should do is get acquainted with the ribbon and menus. Notice that on the
ribbon at the top we have several tabs. Some of these tabs may be familiar to you from other
Office products, such as the Home, Insert, and View tabs, while others might be new to you, such
as Formulas, Data, and Power Pivot. To make a little more workspace for ourselves we can hide this
ribbon by double-clicking any tab, and to unhide it, we do the same. The other option is to use the
shortcut key CTRL+F1. The ribbon is organized into groups of buttons to make them easier to find.

So, on the Home tab we have groups for Font, Alignment, Number, Styles, and so on. Some of
these groups contain all the available buttons on the ribbon when viewing in full screen, such as
Styles and Cells, but other ribbon groups have more options, which we access by clicking the little
arrow icon in the bottom right corner of the group, as can be seen here on the Number group for
example.

The next item I want to point out is the Quick Access Toolbar at the top of the screen above the
ribbon. As the name suggests this is where you can quickly access the tools you use most often.
You can see we already have some tools in this toolbar such as Save, Undo, Redo, New, and Open.
But we can add other tools to the toolbar if we wish. So if we click the drop-down arrow in the
toolbar and then select a tool we will use a lot, such as Sort Ascending, that will be added, and we
will also add the Sort Descending button too.

Now we need to be comfortable with moving around a worksheet. You can simply use the arrow
keys to move left, right, up, and down 1 cell at a time. But you can also use Page Down and Page
Up to move around a bit faster, which is especially useful if you have lots of rows of data. And to
move even quicker up or down a large datasheet use the vertical scroll bar, and to move left or
right use the horizontal scroll bar. Again, these can be very useful when you have a large data set.

There are also some useful shortcuts you can use.


• CTRL+Home key for example takes you back to the start of the worksheet (i.e. cell A1).
• CTRL+End takes you to the cell at the end of your data in the worksheet.
• CTRL+Down arrow takes you to the end of the column you’re in, while CTRL+Up arrow
takes you back to the top of that column.

So a quick way to find out how many rows of data you have in your worksheet is to go to the first
cell in your data and press CTRL+Down arrow to see the last row of data. So here you can see we
have 160 rows. Now how do we go back to the top again? CTRL+Home will do it.
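
The row-counting trick can be sketched in Python. This is a deliberately simplified model, assuming a single column stored as a list (index 0 is row 1, None is an empty cell) and a starting cell that already contains data; how Excel behaves when the starting cell is empty or the column has gaps is not modeled here. The function name ctrl_down is made up for the example.

```python
def ctrl_down(column, start_row):
    """Return the row number of the last filled cell in the contiguous
    block of data that starts at start_row (rows are 1-based)."""
    row = start_row
    # column[row] is the cell directly below the current 1-based row.
    while row < len(column) and column[row] is not None:
        row += 1
    return row


# Hypothetical column: a header plus 159 data rows, with no gaps.
column = ["Sales"] + [f"value {n}" for n in range(1, 160)]
print(ctrl_down(column, start_row=1))
```

With a gap in the data the jump stops at the last filled cell before the gap, which is why an unexpected empty row can make a data set look shorter than it really is.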

So far, we have seen how to navigate around our worksheet and its data, now we need to look at
how we select data. This is very important because you often need to select data to move it, copy
it, or select it in a formula. The simplest selection is a single cell, usually done with a mouse or
maybe a directional arrow key.

The next step up is to select multiple cells together, and this can be done either with a mouse by
dragging from one cell to additional adjoining cells, or you can use the SHIFT key with directional
arrow keys. Next up is selecting a single column or row which is done simply by selecting the letter
at the top of a column, or the number on the left of a row.
Then we can progress to selecting multiple columns and rows, by clicking the mouse button,
holding it down and dragging across more columns.

Or if you are not comfortable with dragging you can also select the column first, then hold
SHIFT+Arrow keys to select multiple columns. The same applies to rows too.

However, if you have data in non-contiguous rows or columns (i.e. not next to each other) you can
select the first column, then use the CTRL key to select another unconnected column, such as
columns C and F here. The largest thing you might want to select is the whole worksheet which
you can do by clicking in the top left corner of the cells. However, this selects the entire
worksheet including all the empty rows and columns; so if you only want the data in your
worksheet, you can use the shortcut CTRL+A.

A word of warning when selecting data in cells, rows, and columns; there are 3 types of cross
symbols that you might see when working with selected cells. The first one is the large white cross
that you see when you select a cell as can be seen here in cell A4, this is the Select cross that we
have been using already in this video to select cells.
The second type you might see is when you hover over the bottom edge of a cell and see a thin
black cross-type symbol with arrows on each point; this is the Move symbol and would move the
cell data to another location.
The last type is the small thin black cross that is seen when you hover over the bottom right
corner of a cell; this is the Fill Handle or Copy symbol and it fills (or copies) the cell data to another
location.

Reading: Excel Keyboard Shortcuts

Hands-on Lab 1: Access to the Environment
Microsoft Excel is the most widely used spreadsheet software even three decades after its initial
release. For all these years it has been available as a standard application that needed to be
installed on your desktop; but it is not just a desktop app anymore. Now, you can even use Excel
when you’re online by using ‘Excel for the web’ - and run it right in your web browser without
installing anything on your desktop!

‘Excel for the web’ (sometimes referred to as Excel Online) can be used at no charge as part of a
free Microsoft account. Although it does not have all of the capabilities of the desktop and paid
online versions, the free web version provides many of the key features.

Software Used in this Lab


The instruction videos in this course use the full Excel Desktop version as this has all the available
product features, but for the hands-on labs we will be using the free ‘Excel for the web’ version as
this is available to everyone.

Although you can use the Excel Desktop software if you have access to this version, it is
recommended that you use Excel for the web for the hands-on labs as the lab instructions
specifically refer to this version, and there are some small differences in the interface and
available features.

Dataset Used in this Lab


The dataset used in this lab comes from the following source:
https://www.kaggle.com/sudalairajkumar/indian-startup-funding under a CC0: Public Domain
license.
Acknowledgement and thanks also go to https://trak.in who were generous enough to share the
data publicly for free.
We are using a modified subset of that dataset for the lab, so to follow the lab instructions
successfully please use the dataset provided with the lab, rather than the dataset from the original
source.

Hands-on Lab 2: Spreadsheet Basics


Objectives
After completing this lab, you will be able to:
Understand and use the basic elements of a spreadsheet.
Explore the ribbon, navigate around a worksheet and select data.

Exercise 1: Introduction to Basic Spreadsheet Elements

In this exercise, you will learn about some common spreadsheet elements.
1. Open Excel for the web. Click on New blank workbook.
2. The new blank workbook will automatically be saved in Excel for the web as Book. To
rename the workbook to something more meaningful, click File, Save As, then choose
Rename.
3. In the file name box, type Personal_Monthly_Expenditure_Lab2 and click OK.
4. In the saved workbook, you will have one worksheet opened, named Sheet1. Click + once
to add another worksheet. Then, double-click the sheet name tab for Sheet1 and rename it
to Expense - 2019. Similarly, rename Sheet2 as Expense - 2018.
5. To maintain an appropriate worksheet tab sequence, click on the worksheet tab Expense -
2018, then drag and drop it before the Expense - 2019 tab.
6. Click on the Expense - 2018 tab. Select an entire column by clicking on B in the top of the
worksheet, then select an entire row by clicking on the number 5 in the left of the
worksheet. Click cell B5, and a green outline will appear around the cell. Now check if you
have clicked the correct cell by looking at the cell name box in the top left corner, circled in
red below.

7. Select several cells in the same row, such as A1:D1 by clicking cell A1 and then drag the
cursor across to D1. Similarly, select a cell range in the same column, such as A1:A5 by
clicking A1 and dragging the cursor down to A5.
8. Now select a cell range which includes several rows and columns together, such as A1:C5
by clicking A1 and then dragging the cursor across and down to cell C5.

Viewpoints: Using Spreadsheets as a Data Analysis Tool (5:22)


In this video, we will listen to several data professionals discuss the advantages and limitations of
using spreadsheets as a tool for data analysis. Let us start with, “What are the benefits and
advantages of using spreadsheets as a tool for data analysis?”

Richie Zitomer
My experience using spreadsheets as a tool for data analysis is somewhat mixed. I think they can
be really, really useful in the right context, but using spreadsheets definitely has its limitations, so
the big pro of using spreadsheets is you can see all the data cleanly laid out in front of you in a
table. So, I think it's very clear to anyone looking at a spreadsheet exactly what the data is, what
format it comes in, all of that. You can just easily, visually inspect it.

Nikki Winston
As a CPA, I use Microsoft Excel on a daily basis and I have done so for the duration of my career.
The functionalities, the pivot, the pivot tables, the charts, etc. But also, being able to use formulas.
My personal favorite is Index Match for using a pretty simple way to take just thousands of lines of
information and sift through all of that to find specifically what you're looking for. Excel is really
that one-stop-shop where you can perform calculations, analyze financial ratios, and even export
reports out of the ERP that I spoke of earlier to customize it as you need.

Asha Barnes
My experience using spreadsheets is that they're great for simple analysis.

Joye Sistrunk
I will say spreadsheets, over the years, the process itself has just improved as systems improve, as
technology improves, spreadsheets are the way to go. Spreadsheets overall, when you do have
probably anywhere from zero to twenty-thousand lines of data, it's a good way to go, you can
really pull out the data. Whether I'm trying to see how much a client’s making per month, but they
may have, you know, a thousand transactions. All of that's helpful. I can use this spreadsheet to
whittle down what is actually going on per month or if I want to do a Sum If, or you know if this
happens, give me this number. It's really helpful to be able to dig in and wrap your hands around
it and take something that seems, on the surface, twenty-thousand lines seems almost
unmanageable, but if I take it and I massage it, put it in a spreadsheet and then sort it, filter it,
make it pretty, put in a pivot table, I can get what I need. It’s just all about not looking at it as
being this intimidating thing but making it more manageable and breaking it down into bite size
chunks.

Erin Huang
Spreadsheets are the easiest way to analyze data and present data. We don't need any fancy tools
or additional software for spreadsheets. It's like the commonly utilized language to communicate.

Thank you for that insight, but let's move on to look at the other side of the coin. What are the
drawbacks and limitations of using spreadsheets as a tool for data analysis?

Richie Zitomer
I think one of the big cons in terms of analyzing data within spreadsheets is it's really hard to
reproduce state. So, in other words, if you load in some data and you filter out some bad values, or
you impute some missing values, there's no way to tell your colleagues or your future self exactly
the different steps you took to create that data set or to modify that data set.
Nikki Winston
It's almost a dilemma because of the plethora of options available within Excel and all of the
functions that are there, supposedly to make your life easier, but it's nearly impossible to know
everything. And you can find yourself in what we accountants call analysis paralysis when you're
looking at something for too long or you're not well versed in a particular Excel function. So, you
may spend a lot more time, energy, and effort trying to figure that one thing out. And had you
done it a different way, or maybe a manual way, you probably could have gotten to the solution a
lot easier.

Asha Barnes
And the downside of using spreadsheets is that if you have complex formulas, v-lookups, or if-statements,
at times they just stop working and you have to rebuild them. So, I have found that
it's better to use Excel just for simple analysis and for a download of information.

Joye Sistrunk
I love a good spreadsheet. I love using Excel and pivot tables to get to the data, but I find that if I
start to get over ten, twenty-thousand lines of data, it gets a little tricky. And sometimes the
spreadsheets will crash. So that's when we might move to Access and some of the other tools that
we use.

Erin Huang
It is very difficult to handle extremely large data sets in spreadsheets. Besides, spreadsheets have
less flexibility for complicated analysis and presentation.

Reading: Summary and Highlights

In this lesson, you have learned:

1. There are several spreadsheet applications available in the marketplace; the most
commonly used and fully-featured spreadsheet application is Microsoft Excel.
2. Spreadsheets provide several advantages over manual calculation methods and they help
you keep data organized and easily accessible.
3. As a Data Analyst, you can use spreadsheets as a tool for your data analysis tasks.
4. There are several elements that make up a workbook in a spreadsheet application.
5. The ribbon provides access to all the features and tools required to view, enter, edit,
manipulate, clean, and analyze data in Excel.
6. There are several ways to navigate around a worksheet and workbook in Excel.

Module 2 - Getting Started with Using Excel Spreadsheets

Video: Viewing, Entering, and Editing Data (5:50)

Now that you have learned basic spreadsheet terminology and learned how to navigate your way
around worksheets and select data in Excel, it’s now time to start entering some data. First, we will
look at some of the handy viewing features provided in Excel, and then we’ll enter some data, and
then edit that data.

When you have a lot of data in your worksheet it can be useful to zoom in closer to a specific area
of the data. The Zoom Slider at the bottom right corner of the worksheet allows you to do just that.
You can either click on the plus and minus buttons or drag the slider to select your preferred zoom
value. You also have some zoom controls in the ribbon on the View tab. Zoom lets you pick a
predefined zoom level or a custom one, the 100% button zooms the worksheet back to its original
size and Zoom to Selection enables you to select an area of data and then zoom into that specific
selection only.

If you want to see several areas of your data at the same time while zoomed in, you can use the
Split button. This splits the screen into multiple sections; and you can scroll each section
separately. If you only want two sections, you can remove either the horizontal or the vertical split
by double-clicking on it.

If you have headings in your columns like a header row, then you might want those to remain on
screen while you move down the sheet. To do that you need to use Freeze Panes. You can freeze
only the top row if you wish, or if that doesn’t suit, as is the case here, then you can select the row
(or even just a cell in the row) below the row or rows you want to freeze, and then select Freeze
Panes. You can do a similar thing for columns you want to freeze too. And you can even freeze
both rows and columns at the same time. The trick here is to first select the cell that is both one
row below where you want to freeze, and one column to the right of where you want to freeze.

In this case, that is cell C4.
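
The Freeze Panes rule above (everything above and to the left of the selected cell stays on screen) can be written down as a short Python sketch. The function name frozen_area is hypothetical, and edge cases such as what Excel does when cell A1 is selected are not modeled.

```python
import re


def frozen_area(active_cell: str) -> tuple[int, int]:
    """Return (frozen_row_count, frozen_column_count) for the cell that is
    selected when Freeze Panes is applied."""
    match = re.fullmatch(r"([A-Z]+)([0-9]+)", active_cell.upper())
    if match is None:
        raise ValueError(f"not a simple cell reference: {active_cell!r}")
    letters, row = match.groups()
    column_index = 0
    for ch in letters:
        column_index = column_index * 26 + (ord(ch) - ord("A") + 1)
    # Rows above the cell and columns to its left are the frozen ones.
    return int(row) - 1, column_index - 1


# Selecting C4 freezes rows 1-3 and columns A-B.
print(frozen_area("C4"))
```

Selecting a cell in column A (for example A2) gives a frozen column count of zero in this sketch, which corresponds to freezing rows only.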


Now we can scroll down the worksheet and across the worksheet and we can still see the header
row and the Manufacturer and Model columns. Now, if you have multiple workbooks open (notice I
said workbooks and not worksheets) then you can switch between them by using View, Switch
Windows, or the faster method is to use the CTRL+F6 shortcut.

Now let’s enter some data into a blank worksheet. The easiest way to get one from
within Excel is to create a new blank workbook by clicking the New button in the Quick Access Toolbar (or CTRL+N if you prefer
keyboard shortcuts). So let’s enter some headings across the top of the worksheet; this is typically
referred to as a ‘header row’. Note, that if you press Enter after typing data into a cell the next
active cell is the one directly below, which is not what we want in this case. But, if we press Tab
after we enter data in a cell, it selects the next cell along in the row as the active cell.
Now we’ll enter some headings and press Tab after each entry. Notice that the text is slightly
longer in some of the cells and it either gets partly hidden by the next cell or overlaps it. If you
click and hold the divider line between two columns, you can drag it left and right to resize it
manually. If you want to do that automatically, you can double-click the divider line between two
columns.

As these are going to be headings for our columns, let’s make them bold. Now let’s add another
column between the parts and accessories columns. Simply select the right-hand of those two
columns, then right-click and choose Insert to put another column to the left of the selected
column. Let’s call it Servicing Sales.

To tidy up all our column widths simultaneously, we select all the columns from A to E, then
double-click any of the divider lines between columns; this automatically reduces or increases
each column’s width to fit the data in that column. OK, now we have some headings, let’s enter
some month data in column A.
So, if we type Jan in cell A2 and press Enter then it takes us to the cell below, which is what we
want in this case and we can type Feb in cell A3 and so on until we get to Dec in A13. Now, let’s
suppose you need to change a couple of your headings. You have several ways of editing existing
data in a cell. You can select the cell and then just start typing to overwrite its contents. Or you
can select the cell and press F2 on your keyboard to put the cursor at the end of the cell and make
your changes. Or you can simply double-click somewhere on the cell to put the cursor at that position in the cell
and make your changes. And you can even select the cell and then click in the formula bar to edit
your cell data. Now let’s do the same for the parts and accessories column headings.

Copying, Filling, and Formatting Cells and Data (7:37)


Now that we have learned about some of the handy viewing features provided in Excel, and
entered and edited some data, let’s discuss how to move, copy, and fill data, and how to format
cells and data to suit our needs.

The first thing we are going to discuss is how to move data. If you select a range of cells, in this
case the headings in A1 to E1, and then hover over the top or bottom edge of a selected cell,
you will see the Move pointer; you can then drag the selection to another place on the worksheet.

Alternatively, if you want to copy the data instead, you do the same thing but this time you also
hold the CTRL key as you drag the selection to another location, and you will see the Copy
pointer. If you are not comfortable with dragging, you can also use the Copy and Paste menu
commands or keyboard shortcuts.

So, select some data in column A and copy it to the clipboard. Then simply select the
new location and paste the copied data. You can also move or copy between worksheets, so let’s
create a new worksheet. Then select some data from Sheet1, and this time let’s use the CTRL+C
keyboard shortcut to copy it to the clipboard. Then choose the other worksheet and use the
CTRL+V shortcut to paste the data.

However, notice that the column widths are not the same as the original source data, so let’s undo
that and try another paste option. By default, when you paste the copied data, it uses the column
width settings of the destination cells. So, to paste it and retain the column widths of the source
data, you choose the special option under the Paste command, called Keep Source Column Widths.

As an alternative to having to enter data manually in a worksheet, you can use an Excel feature
that automatically fills cells with data when it follows a sequential series or pattern. The feature is
called AutoFill, and it can be especially useful when you need to enter lots of repetitive data into
Excel, such as date information.

For example, if you enter a month in a cell, even using a shortened version of the name, you can
use what’s called the Fill Handle to select down to the end of the series, and AutoFill will work out
what the series is, based on the selected data. Let’s try the same thing with days of the week. If
you enter Mon in a cell, then drag the fill handle to use AutoFill, it will determine that you want to
enter the days of the week sequentially. However, if you also enter Wed (for Wednesday) in the
next cell down, and select both cells in the series, i.e. A16 and A17, and then drag the fill handle
down, AutoFill determines that the sequence has changed to every other day, and fills in the data
series for you.

It’s important to select all cells that define the pattern when using AutoFill so that it can best
determine what the pattern is, in this case cells A16 and A17. A similar thing applies to numerical
patterns. Suppose you enter 5 in a cell, and then use the fill handle to fill the data down the column.
Because the data is not the name of a day or month, for example, AutoFill can’t determine what the
pattern is yet. So, in this case, it just copies the value 5 into every selected cell. However, if you
enter the value 10 in B3, select both cells, and then use the fill handle to fill the data down the
column, AutoFill determines that the pattern is incrementing by 5 each time
and it fills in the remainder of the data pattern for you.
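
To make the pattern logic concrete, here is a rough Python sketch of how a fill could be extended from one or two seed cells. It is a simplified model for illustration only and is not how Excel implements AutoFill; the function names fill_numbers and fill_days are invented for this example.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def fill_numbers(seed, count):
    """Extend a numeric seed of one or two cells to `count` cells."""
    if len(seed) == 1:
        # A single number gives no pattern to infer, so it is just copied.
        return [seed[0]] * count
    step = seed[1] - seed[0]  # the difference between the two seed cells
    return [seed[0] + step * i for i in range(count)]


def fill_days(seed, count):
    """Extend a seed of one or two day names to `count` cells."""
    start = DAYS.index(seed[0])
    # One day name implies consecutive days; two names define the step.
    step = 1 if len(seed) == 1 else (DAYS.index(seed[1]) - start) % 7
    return [DAYS[(start + step * i) % 7] for i in range(count)]


print(fill_numbers([5], 4))         # [5, 5, 5, 5]
print(fill_numbers([5, 10], 4))     # [5, 10, 15, 20]
print(fill_days(["Mon"], 3))        # ['Mon', 'Tue', 'Wed']
print(fill_days(["Mon", "Wed"], 4)) # ['Mon', 'Wed', 'Fri', 'Sun']
```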

We are now going to look at formatting our data, and there are essentially two distinct parts to
this. First, there’s formatting of the cells themselves (for example, with a fill color, a bold border,
and bold text). And then there’s formatting the data in the cells (for example, making it
text format, number format, or a specific currency or accounting format).
Let’s open the car sales worksheet we used previously. Then select the headings in cells A3 to P3
either using the mouse, or you could use the shortcut keys CTRL+SHIFT+Right Arrow. On the
Home tab, click the Styles drop-down arrow, and select a style color for your cells. Then you can
make the selected cells bold. Then you select the data in the Manufacturer column either using the
mouse, or the shortcut keys CTRL+SHIFT+Down Arrow.

In the Styles drop-down arrow, select another style color for the selected cells. Again you can
make the cells bold.
Then you select the data in the Model column again either using the mouse, or the shortcut keys
CTRL+SHIFT+Down Arrow. In the Styles drop-down arrow, select another style color for the
selected cells. This time you could make the selected cells italic. And you can also change the font
size and style.

Lastly, you can select all the other cells in the data by using the mouse or the CTRL+SHIFT+Right
Arrow then Down Arrow, and apply borders to the data cells. Now it’s time to format the cell data.
The sales figures in columns C and D can be formatted to display only two decimal places; just
select the data and click the Decrease Decimal button.
We also have an issue with a couple of the car models. If you look in cells B129 and B130, where
the model name is supposed to be displayed, you can see there are actually two dates listed
instead.

And if you look in the Number Format box, the format type is Custom. This has happened because
the model names are supposed to be the Saab 9-5 and the Saab 9-3, but when the files were
imported from CSV files these two cells must have been incorrectly interpreted as date values
rather than model names. You can fix this by formatting these two cells as Text, and then entering
the correct values of 9-5 and 9-3.

The last thing we shall do is format some data as currency. If you look at the heading in column F it
says it is Price in thousands of dollars and cell F4 is using the General format. So, let’s change the
format of this column to American currency format. We select the column, F in this case, then
select More Number Formats from the drop-down list, then we choose the Currency option, and the
correct currency symbol and format. And we’re done.

Hands-on Lab 3: Entering and Formatting Data (30 min)


Task A: Viewing Data
1. Download the file indian_startup_funding_Lab3.xlsx. Upload and open it using Excel for the
web.
2. Select F20:H26 (if required, use the vertical and horizontal scroll bars to bring the selected
cell range area to the center of the screen). Hold CTRL and + to zoom in closer to the
specific area of the data. Then hold CTRL and - to zoom the worksheet back out to its
original size. (Note: Zoom to Selection which is found under the View tab of Excel Desktop,
is not available for Excel for the web)
3. On the ribbon, click View, Freeze Panes, Freeze Top Row. Now you have headings in your
columns like a header row, which will remain static on screen while you move down the
worksheet. Next, click Unfreeze Panes, and click Freeze First Column. The Sr No column will
remain static on the screen while you move right across the worksheet. Lastly, click
Unfreeze Panes to end this step.
4. To freeze both the top row and the first column at the same time, select cell B2
and click View, Freeze Panes, Freeze Panes.
5. You can open multiple workbooks in multiple browser tabs in Excel for the web, and to
switch between them, you just click each browser tab. (In Excel Desktop you have to click
the View tab, then click Switch Windows)

Task B: Entering Data
1. Download the file Personal_Monthly_Expenditure_Lab3.xlsx. Upload and open it using Excel
for the web. Go to the Expense - 2018 worksheet.
2. In cell A1, type Month and press Tab. Then type Housing and press Tab, type Food & Dining,
and press Tab, type Personal, and press Tab, type Auto & Transport, then press Tab, type
Health & Fitness, then press Tab. You are now done with the header row.
3. To enter some data as rows in column A, in A2, type Jan and press Enter. Then type Feb,
and press Enter, type Mar, and press Enter, type Apr, and press Enter.
4. To add another column between the Housing and Food & Dining columns, select column C, then
right-click column C, and choose Insert Columns. In the top row header cell C1, type Bills &
Utilities.
5. Select columns A to G, then double-click the divider between A and B to adjust the column
widths.

The Basics of Formulas (7:07)


Now that we have learned how to move, copy, and fill data, and how to format cells and data, next
we will take a look at the basics of formulas, including some basic calculations, selecting ranges in
formulas, and how to copy formulas.
A typical formula is made of several key components:
• The equal sign starts the formula off and lets Excel know you are creating a formula in this
cell.
• The next part is the function, which performs the calculation.

For example, the SUM function adds up the values in referenced cells or cell ranges. Then comes
the reference, which is the cell or range of cells you want to include in your calculation, and these
need to be enclosed in parentheses. You also have operators, which specify what type of
calculation to perform. Common arithmetic operators include addition, subtraction, multiplication
and division. And these are represented by symbols. The plus symbol for addition, the minus
symbol for subtraction, the asterisk for multiplication and the forward slash for division.

There are other types of operators too, namely comparison, text concatenation, and reference.
You may also use constants in your formulas, which as the name suggests are numbers or values
which you can enter directly into a formula, and which don’t change. This might be a whole
number such as 5, it might be a percentage such as 10%, or it might even be a date.

So, a typical formula might be =SUM(B5*20), which would take the value in cell B5 and multiply it
by 20.
Let’s start with a few basic calculations.

Suppose you want to add up January and February sales of accessories. You would start by typing
an equal sign, which lets Excel know you are entering a formula. Then you type in the function you
wish to use, in this case the SUM function.
Note the description. Next you type an open parenthesis, then you specify your cells, which in
this case would be E2 and E3, so you could enter that as ‘E2,E3’ then a close parenthesis and press
Enter. And if you wanted to add March sales as well, then you would have to extend the list
to include E4. So you could type E2,E3,E4 as your references and it will work.

Remember, to edit a cell, you select the cell, and either edit it directly in the formula bar, or press
F2, or double-click the cell. However, it’s very cumbersome and not very flexible to do it this way,
because if you wanted to add up the entire column then you’d have to type every cell reference,
one after the other. So thankfully, there’s a better way. Instead of typing each cell to include in the
reference, you just put a colon between the first and last values in our range, so E2:E4, in this
case. And if you wanted the whole column, then you would enter E2:E13 in your formula. But
there’s another way of doing it, and that’s by using your mouse to select the range, so you still
type =sum then open parenthesis, but select the range with your mouse (or SHIFT + arrow keys)
and just press Enter. Excel will add the close parenthesis for you.
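
As a rough illustration of why the colon notation is equivalent to listing every cell, the following Python sketch expands a range such as E2:E4 into its individual cells and adds them up. The sheet values are made up, and expand_range and sum_refs are hypothetical helpers for this example, not Excel features; the sketch only handles ranges within a single column.

```python
import re


def expand_range(ref):
    """Expand a single-column range such as 'E2:E4' into ['E2', 'E3', 'E4']."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if match is None or match.group(1) != match.group(3):
        raise ValueError(f"unsupported range: {ref}")
    column = match.group(1)
    first_row, last_row = int(match.group(2)), int(match.group(4))
    return [f"{column}{row}" for row in range(first_row, last_row + 1)]


def sum_refs(sheet, refs):
    """Add up the values stored at the given cell references."""
    return sum(sheet[ref] for ref in refs)


# Hypothetical accessories sales for January to March.
sheet = {"E2": 100, "E3": 150, "E4": 125}

print(expand_range("E2:E4"))                   # ['E2', 'E3', 'E4']
print(sum_refs(sheet, ["E2", "E3", "E4"]))     # 375, like =SUM(E2,E3,E4)
print(sum_refs(sheet, expand_range("E2:E4")))  # 375, like =SUM(E2:E4)
```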

To total these columns up, and add some tax, you’d add some headings first for Subtotals, and Tax
at 20%. Then your formula will need to multiply the value in Subtotals by 20%. If you want to add
up all the column subtotals and calculate the taxes, then you could repeat the previous process for
each column, but that’s very time consuming, and you don’t need to, because Excel has some
neat tricks to do this for you. Just select the fill handle in the bottom right corner of the cell, and
drag across to the other cells to copy the formula; this is called AutoFill. Notice how the formula is
copied, but the column references change in relation to the cells’ position on the worksheet. So what
was E2:E13 has become B2:B13. These are known as relative references, but more on that later in
the course. And you can do the same thing for the tax values in row 16.

Now, you need a row for showing the totals. The calculation here is simply the subtotal value in
cell B15, added to the tax in B16. And again, you can use the fill handle to copy the formula
across. If you want to total the sales of all products by month, you’d add a column heading; notice
how the cell style is copied to the new heading automatically. Remember, to widen a column,
either drag the divider manually, or double-click the divider. Then enter the formula in cell F2 as
you’ve done before. However, Excel has another trick up its sleeve. It’s called AutoSum and is
found on the Home tab, in the Editing group. This is a great little shortcut for some simple
common functions like Sum, Average, Count, Max, and Min, but you can choose other functions
too. You want ‘Sum’ for this particular calculation. Notice that it also has a keyboard shortcut of ‘Alt
plus equals’, and then press Enter, and it’s done.
Now you can use the fill handle to copy down the remaining values. But hold on, there is one more
Excel trick to show and it’s a good one!
Suppose your column of data was very long; you might have to drag the fill handle down over
several pages, which isn’t easy to do and can easily lead to errors when selecting large lists of
data values. Rather than needing to drag down to the rest of the column, you can just double-click
the fill handle, and it will automatically copy the formula to all the remaining cells in that column.
This one is a real time-saver. Finally, let’s format all these values to use the US dollar currency
format.
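
The subtotal, tax, and total rows described in this video come down to simple arithmetic per column. As a minimal sketch with made-up sales figures (the column names and values below are assumptions for illustration, and column_totals is a hypothetical helper), the same calculation in Python looks like this:

```python
TAX_PERCENT = 20  # the "Tax at 20%" row from the text


def column_totals(monthly_sales):
    """Return (subtotal, tax, total) for one column of monthly sales."""
    subtotal = sum(monthly_sales)         # like =SUM(B2:B13)
    tax = subtotal * TAX_PERCENT / 100    # like =B15*20%
    return subtotal, tax, subtotal + tax  # total is like =B15+B16


# Hypothetical data: three months of sales for two product columns.
columns = {
    "Parts": [100, 200, 300],
    "Accessories": [50, 50, 100],
}

# Applying the same calculation to every column mirrors dragging the
# fill handle across the row.
for name, values in columns.items():
    print(name, column_totals(values))
```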

Video: Intro to Functions (5:28)


Now that you have learned about the basics of formulas, learned how to perform some basic
calculations, and how to select ranges and copy formulas, next we will have an introduction to
functions, including using some common statistical functions. And then we will learn about some
more advanced functions that a Data Analyst might also use.

First, let’s look at some common functions used for statistical calculations. So, we’ll add some row
headings for average, minimum, maximum, count, and median. Then in cell B20, let’s work out the
average of the car sales for the year, from the table above. On the Home tab, in the Editing group,
we click the AutoSum drop-down list and choose Average. Now, because AutoSum tries to add up
the values directly above it in the column, we need to modify the cell range here to B2 to B13.
Then we can use the Fill Handle as we’ve seen before to copy the formula across to column E. For
the minimum calculation in B21, we select Min from the AutoSum list. And again, we need to
modify the cell range. So this calculates the lowest value in our range.

And fill across to column E. And for the maximum calculation, we select Max from the list.
And then modify the range and once again, copy the formula across. This calculates the highest
value in our range.
In B23 we will calculate the Count, which basically just means the number of values that exist in
the selected range. So, we select Count Numbers from the list. Then modify the range.
For the median calculation, we can select ‘More Functions’ from the AutoSum list then select
‘Statistical’ as the category and scroll down to find the MEDIAN function. The ‘median’ returns the
exact middle of a range of selected values. Note that if you’re selecting an odd number of values it
will return the figure that is the middle value in your selected range, but if you have selected an
even number of values in your range, it will return the average of the two middle
values in your range. Once again, we need to change the cell range to B2 to B13. And we can
then copy this formula across to column E.
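
To see what these five statistical functions compute, here is a short Python sketch using the standard statistics module on twelve made-up monthly values (the numbers are assumptions, not the course data). Because there is an even number of values, the median is the average of the two middle ones.

```python
import statistics

# Hypothetical monthly sales for Jan to Dec (12 values, an even count).
sales = [12, 15, 11, 18, 20, 17, 14, 16, 19, 13, 21, 40]

average = statistics.mean(sales)   # like =AVERAGE(B2:B13)
lowest = min(sales)                # like =MIN(B2:B13)
highest = max(sales)               # like =MAX(B2:B13)
count = len(sales)                 # like =COUNT(B2:B13) when all cells are numbers
middle = statistics.median(sales)  # like =MEDIAN(B2:B13)

print(average, lowest, highest, count, middle)

# With an odd number of values the median is simply the middle value.
print(statistics.median([3, 1, 2]))  # 2
```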

You’ve seen AutoSum and some of the common statistical functions in Excel, but there are more
than 400 other functions available, so let’s explore just a few of those now. On the Formulas tab, in
the Function Library group, there are drop-down lists for several function categories.

i. The first is a list of ‘Recently Used’ functions, which updates automatically as you use them.
ii. Then you have functions related to ‘Financial’ calculations. If you hover over the name of a
function, you see a short description for each one; so here we have the accrued interest
function, and here is the interest rate function.
iii. The ‘Logical’ list has BOOLEAN operator functions such as AND, IF, and OR. There are
several functions related to Text, such as CONCAT, which is an updated version of a
previous function called CONCATENATE (which is still supported by the way for backwards
compatibility), FIND, and SEARCH. There are also several functions related to dates and
times, such as NETWORKDAYS, WEEKDAY, and WEEKNUM.
iv. In the ‘Lookup & Reference’ list there are functions such as AREAS, HLOOKUP, SORTBY, and
VLOOKUP.
v. In the ‘Math & Trig’ list you’ll find lots of useful mathematical functions, such as POWER,
SUMIF, and SUMPRODUCT, alongside many functions for trigonometric purposes, such as
cosine, sine and tangent.
vi. There is also a ‘More Functions’ list which provides several more function categories, such
as Statistical, Engineering, and Information. In the ‘Statistical’ list you’ll find functions such
as Average, Count, Max, Median, and Min; we saw some of these used earlier in this video.

If you’re struggling to find the function you want in these lists, you can also search for a function;
just click the ‘Insert Function’ button on the Formulas tab, and then either browse the category
lists available, or choose ‘All’ and look down the alphabetical list for the function you want.
Alternatively, type the name of a function you want to find, and click ‘Go’ to search for it, then
select the one you want from the returned search.

Video: Referencing Data in Formulas (9:22)

Now that you've had an introduction to functions, seen the use of some common statistical
functions, and learned about some of the more advanced functions that a data analyst might use,
in this video we will look at the difference between relative, absolute, and mixed references in
formulas, as well as how to use them. And we'll learn about formula errors in Excel.

It's important to understand the difference between relative and absolute references when
creating your formulas. By default, in Excel, cell references are always relative references. The
term relative is the key here, because it means that when you reference a cell, you are in fact
referencing the cell's position in relation to the cell that the formula is in.

That is why when we have been copying formulas from one cell to another so far in this course,
using either copy and paste or the fill handle, we haven't needed to modify the cell references
because Excel assumes you are using relative references. When the formulas are copied, the cell
references are changed to match the relative positions of the cells
that are being copied to.

So now we know that relative references are the default in Excel, but how do we make it so that
the cell references don't change when we copy them? For that you need to use absolute
references. In contrast to relative references, absolute references to cells stay the same when
you copy a formula containing such references.

Lastly, there may also be some instances where you only want one of the cell reference identifiers
to be absolute and the other one to be relative. For example, you might want the row identifier to
be absolute, but the column identifier to be relative, or vice versa. These are called mixed
references. An example of this would be =A$1+$A3, where A$1 has a relative column and an
absolute row, and $A3 has an absolute column and a relative row. In contrast to purely relative or
purely absolute references, when you copy a formula containing
mixed cell references, any relative cell references will change, whereas any absolute cell
references will stay the same in the copied formula.

First, let's look at an example of using relative references in a formula. For example, if we enter the
formula =A1+A3 in cell C4, note the blue and red highlighted cells A1 and A3.
These denote the cells being relatively referenced in the formula. If we copy the formula to the cell
directly below using the fill handle, we can see that the result changes, and if we look at the
copied formula, you can see that the blue and red cell references have changed relative to their
position on the worksheet. The formula has been changed to =A2+A4 in the copied
formula. That is, each cell reference has moved one cell down. And if we copy and paste the
formula to C7, you can see that the result also changes, and again we can see that the
blue and red cell references in the copied formula have changed.

Now let's look at an example of how to use absolute references in a formula. All you need to do to
make a cell reference absolute is put a dollar sign in front of the column and row identifiers in
the formula. For example, if we enter the formula =$A$1+$A$3 in
cell E4, note the blue and red highlighted cells A1 and A3. These denote the cells being
absolutely referenced in the formula. When we copy the formula using the fill handle, you can see
that the result stays the same this time, and if we look at the copied formula you can see that the
blue and red cell references haven't changed. The formula is still =$A$1+$A$3
in the copied formula. Similarly, if we
then copy and paste the formula to E7, you can again see that the result stays the same
and that the blue and red cell references haven't changed. The formula is still
=$A$1+$A$3 in the copied formula.

Lastly, we'll look at an example of how to use mixed references in a formula. If we enter the
formula =A$1+$A3 in cell G4, note the blue and red highlighted cells
A1 and A3. These denote the cells being referenced in the formula. If we copy the formula to the
cell below using the fill handle, you can see that the result changes, but it's a different result from
the previous examples. And if we look at the copied formula, you can see that the first blue cell
reference has stayed the same but the second red cell reference has changed. If we copy and
paste the formula to G7, you can see that the same thing happens. The result changes and again
we can see that the first blue cell reference has stayed the same in the copied formula, while only
the red cell reference has changed.
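
The three behaviors above can be imitated with a short Python sketch that rewrites the references in a formula as if it had been copied some rows down and columns right. This is a simplified model for illustration, not Excel's actual implementation: it assumes the formula contains only plain cell references such as A1, $A$1, A$1, or $A3, and copy_formula is a name invented for this example.

```python
import re

# Optional $, column letters, optional $, row digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert column letters to a 1-based number ('A' -> 1, 'AA' -> 27)."""
    number = 0
    for ch in col:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def num_to_col(number):
    """Convert a 1-based column number back to letters."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_formula(formula, rows_down, cols_right):
    """Shift every relative part of each reference; leave $ parts alone."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:  # relative column moves with the copy
            col = num_to_col(col_to_num(col) + cols_right)
        if not row_abs:  # relative row moves with the copy
            row = str(int(row) + rows_down)
        return f"{col_abs}{col}{row_abs}{row}"
    return REF.sub(shift, formula)


print(copy_formula("=A1+A3", 1, 0))      # =A2+A4     (relative, C4 to C5)
print(copy_formula("=$A$1+$A$3", 3, 0))  # =$A$1+$A$3 (absolute, E4 to E7)
print(copy_formula("=A$1+$A3", 1, 0))    # =A$1+$A4   (mixed, G4 to G5)
```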

Now we'll have a quick introduction to dealing with formula errors in Excel.
Because of the complexity of writing formulas, especially the more complicated ones, there are
bound to be occasions when you make a mistake in the syntax or in the data selection which will
lead to a formula error. Errors are typically denoted by an error code displayed in the cell that is
supposed to be displaying the result.

One of the error codes, multiple hash symbols in a cell, is not really an
error; it just means that the column isn't wide enough to display the whole word or value, or that it
contains a negative date or time value. So if we press CTRL+; (semicolon), then Space, then
CTRL+SHIFT+; it enters today's date and the current time. But the cell is too
narrow to display it, so what we see is multiple hash symbols. If we adjust the column
width we can now see the cell contents. So as I said, this really shouldn't be considered as an
error.

However, if we enter the formula seen in cell I7, when we press Enter we see a #NAME? error.
This error was caused by trying to use an X as a multiplication operator when in fact it should be
an asterisk. Note the small green triangle in the top left corner of the cell. Also note that when you
select the cell, an exclamation mark appears, providing you with a hint about what caused the
error. In this case it says the formula contains unrecognized text. When you click the drop-down
arrow next to the exclamation mark for an error, you see several options. The first line also gives
you a clue about the nature of the error. This one says Invalid Name Error, so it was probably a
mistyped cell reference or function name. If you click Help on this Error, a Help pane opens
with specific information related to this error. If you click Show Calculation Steps, a dialog box
opens displaying the current syntax with the error underlined, and you can try to evaluate the
error. If you are certain the formula is correct, you can choose Ignore Error, and if you want to edit
the formula, click Edit in Formula Bar and the cursor will be focused in the formula bar so that you
can try and correct the formula error.

If you click Error Checking Options, the Excel Options dialog box opens at the section related to
error checking rules, and you can modify these options to suit your needs. Each of the errors you
make which generate one of the error codes listed at the start of this video will have a different
reason and a different solution.

Hands-on Lab 4: Simple Use of Functions


Objectives
After completing this lab, you will be able to:
Understand the basics of formulas
Perform simple calculations
Select ranges in formulas and copy formulas
Understand the basics of functions
Use common functions
Understand the more advanced functions available
Reference data in formulas
Differentiate between relative and absolute references
Understand how to handle formula errors

Exercise 1: Basics of Formulas


In this exercise, you will learn the basics of formulas, how to perform simple calculations, how to
select ranges in formulas, and how to copy formulas.

1. Download the file Personal_Monthly_Expenditure_Lab4.xlsx. Upload and open it using Excel
for the web. Go to the Expense - 2018 worksheet.
2. In A14, type Totals and in B14, type =SUM( then select cells B2 to B13 with the mouse, and
press Enter.
3. Select the fill handle on cell B14 and drag to G14 to copy the formula.
4. In cell H1, type Monthly Total and double-click the divider between H and I.
5. In H2, type =SUM( then select cells B2 to G2 with the mouse, and press Enter. If necessary,
select the fill handle on cell H2 and drag to H14 to copy the formula.
6. Select columns B to H. On the Home tab, in the Number group, click the Accounting
Number Format ($) drop-down list, and select $ English (United States).

Reading: Summary and Highlights

In this lesson, you have learned:

• There are several features to modify views in Excel, and it is very straightforward to enter
and edit data in a spreadsheet.
• You can move or copy data within a worksheet or between worksheets, and you can use
AutoFill to automatically enter data that is in a series or that fits a pattern.
• You can format both cells and data in Excel.
• A formula is made up of several component parts, and formulas can perform calculations
using numbers directly or by using references to data in the worksheet.
• You can use the Fill Handle in Excel to quickly copy formulas to other cells.
• There are several different categories of function you can use for different purposes, and
you can search for a function by name, or by category.
• You can reference cells in the worksheet in your formulas by using relative, absolute, or
mixed references.
• You can make a cell reference absolute by adding a dollar symbol ($) in front of its column
and row identifiers.
• If you get errors in your formulas, you can use the error-checking capabilities of Excel to
resolve them.

Module 3 - Basics of Data Quality and Privacy


In this module, you will learn about the importance of data quality and how to import file data into
Excel from external sources. You will also learn about the fundamentals of data privacy.

Video: Introduction to Data Quality (3:31)


Data analysis can play a pivotal role in business decisions and processes. In order to use the data
to make confident decisions, we must have the right information for the project and the data must
be free from errors.
In this video we will learn how to profile data to discover inconsistencies. Whether we are working
with small sets of data or analyzing a spreadsheet with thousands of rows, one of the most difficult
parts of the data analysis is finding and keeping clean data.

To help with this process and qualify the data, look for these five traits: Accuracy, Completeness,
Reliability, Relevance and Timeliness.

Accuracy is the first and most significant aspect of data quality. A data analyst must clean the
data set by removing duplicates, correcting formatting errors, and removing blank rows.
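
As a rough sketch of what this kind of cleaning involves, the following Python example trims stray spaces (one simple kind of formatting error), drops blank rows, and removes duplicate rows from a small made-up data set. The sample rows and the clean_rows helper are assumptions for illustration; in this course the same steps are carried out with Excel's own tools.

```python
def clean_rows(rows):
    """Trim whitespace, drop blank rows, and remove duplicate rows,
    keeping the first occurrence of each row in its original order."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = tuple(cell.strip() for cell in row)
        if not any(normalized):
            continue  # every cell is empty, so this is a blank row
        if normalized in seen:
            continue  # an identical row was already kept
        seen.add(normalized)
        cleaned.append(list(normalized))
    return cleaned


# Hypothetical raw data with a blank row, a duplicate, and stray spaces.
raw = [
    ["Saab", "9-5"],
    ["", ""],
    ["Saab", "9-5"],
    ["Audi ", "A4"],
]

print(clean_rows(raw))  # [['Saab', '9-5'], ['Audi', 'A4']]
```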

Another important aspect of data quality is determining if the information required to complete
the data set is readily available. Why does this matter as a trait for quality data? Let’s say we are
given the task to calculate the revenues of all sales per region. After collecting the data, we
discover that no regions were specified. This data would then be considered incomplete and other
sources would have to be considered to obtain the data required.

Reliability is another vital factor in determining the quality of the data. For instance, let’s say we
are given the task to determine the agent revenue by customer. When gathering the data, we find
the agents keep their own records and do not always update the information in the shared
company database. With those factors in mind, we would then determine that the data in the
shared company database was unreliable and new processes would need to be established to
ensure reliable data.

Relevance is another trait of quality data. When collecting information, a data analyst must
consider if the data being assembled is really necessary for the project. For example, when
reviewing the data related to the sales revenue per customer, information such as customer
birthdays and other personal information is also included. By making the determination early to
exclude the personal information from the data set, the analyst would save themselves from
having to review unnecessary information.

The last factor in determining the quality of the data is timeliness. This trait refers to the
availability and accessibility of the selected data. Let’s say our sales report is going to be used for
weekly employee reviews, but our report is only refreshed once a month. This error in refreshing
the data would cause our report to become outdated, and would have serious consequences for
employee reviews.
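
As a rough illustration of profiling, the sketch below counts three of the problems mentioned above (blank rows, duplicate rows, and missing values per column) in a small list of records. It is written in Python rather than Excel, and the sample data, the column names, and the `profile` function are all made up for illustration; it is not the method used in the video.

```python
from collections import Counter

def profile(rows, columns):
    """Count blank rows, exact duplicate rows, and missing values per column."""
    blank_rows = 0
    missing = Counter()
    seen = Counter()
    for row in rows:
        values = [(row.get(col) or "").strip() for col in columns]
        if not any(values):
            blank_rows += 1
            continue  # a fully blank row is reported once, not once per column
        seen[tuple(values)] += 1
        for col, value in zip(columns, values):
            if not value:
                missing[col] += 1
    duplicate_rows = sum(count - 1 for count in seen.values() if count > 1)
    return {"blank_rows": blank_rows,
            "duplicate_rows": duplicate_rows,
            "missing": dict(missing)}

# Hypothetical sample data: one record has no region, one row is blank,
# and one order appears twice.
sample = [
    {"order": "1001", "region": "EMEA", "sales": "2871.00"},
    {"order": "1002", "region": "", "sales": "2765.90"},
    {"order": "", "region": "", "sales": ""},
    {"order": "1001", "region": "EMEA", "sales": "2871.00"},
]
report = profile(sample, ["order", "region", "sales"])
print(report)
```

A report like this only touches accuracy and completeness; reliability, relevance, and timeliness still need a judgment about where the data came from and how it will be used.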

Video: Importing File Data (5:35)

Now that you have learned about the importance of data quality, in this video you will learn how to
import data from a text file using the Text Import Wizard, learn how to adjust column widths, and
learn how to add and remove columns and rows.

As you know, by default Excel works with .xlsx or .xls files and opens them as workbooks. But
Excel can also use data that is in other formats, such as plain text, or data that has been comma-
separated and tab-separated. Sometimes, these source files will be saved with a .txt extension
and referred to as ‘text’ files, but others might be saved with a .CSV file extension and are
typically referred to as CSV files.
Here in Notepad, I have opened a text file that contains data about car sales, and it uses comma
separated values (or CSVs) to separate each bit of data in a record. Notice that the top line holds
headings, such as Manufacturer, Model, Engine_size, and so on, and each one is separated by a
comma. We want these to become our headers when we import the file into Excel. The line below
these headings is the first line of real data, and again you can see that each piece of data is also
separated by a comma. There are 16 headings and there are also 16 pieces of data on each of the
lines below the headings. If we scroll to the bottom, we can see that the last data record is for the
Volvo S80.

Now, to open the file in Excel, we choose File, Open, and then either select the file from the
recently used list or click Browse to find the file we want to import. When we open the file, the
Text Import Wizard launches automatically, and it will start to try and determine what your file
is.

Note that it has been detected as being a delimited file; that is, one that has its data fields
separated by a character such as a comma or a tab. As we want the headings to become headers
in Excel, we need to ensure that we select the option ‘My data has headers’. We can see a mini
preview of the data in the preview box below. Then we click Next to proceed in the wizard. In step
2 of the wizard, we need to select our delimiter; that is, which character is separating our pieces of
data; so we select Comma, and deselect any others. Note the data preview now starts to show us
what the imported data will look like. You can scroll down and across this preview window to
ensure that the data is going to look as you want and expect. It all looks OK, so we’ll continue with
the wizard. In step 3 of the wizard, we can set the data format for each column. For example, you
might want to change a column to Text or Date format. In this case we can just accept the default
General format and finish the import wizard.
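
For comparison, the same idea (a delimited file whose first line supplies the headers) can be sketched outside Excel with Python's standard `csv` module. The three columns and the values below are a hypothetical extract, not the actual car sales file, and `import_delimited` is an illustrative helper name.

```python
import csv
import io

# Hypothetical extract of a comma-separated text file; the real file has 16 columns.
text = """Manufacturer,Model,Engine_size
Acura,Integra,1.8
Volvo,S80,2.9
"""

def import_delimited(source, delimiter=","):
    """Read delimited text, using the first line as the column headers."""
    reader = csv.DictReader(io.StringIO(source), delimiter=delimiter)
    headers = reader.fieldnames  # reading fieldnames consumes the header line
    return headers, list(reader)

headers, records = import_delimited(text)
print(headers)
print(records[-1]["Model"])
```

Passing `delimiter="\t"` would handle a tab-separated file in the same way, which mirrors the delimiter choice in step 2 of the wizard.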

In Excel we can see that the headings in the text file have been imported as a header row but also
notice that some of the columns are not showing all the data; some of the headings are not
showing in full and some of the data is not shown either; all you can see are a number of hashes in
the cells. This is because the column widths are too narrow in some cases.

If you remember, we can manually adjust a column’s width by dragging the divider across but to
change them all in one go, we select all the columns first, then double-click one of the selected
column dividers. We can do a similar thing with rows by dragging to make them bigger or smaller,
or double-clicking a row divider to autosize it.

There are some columns that we have decided we don’t really need, namely Vehicle_type and
Latest_Launch, so let’s remove those. This can be done either by using the Delete drop-down menu in
the Cells group on the Home tab and selecting Delete Sheet Columns, or by selecting and right-
clicking a column and deleting it that way. To add another column, you simply select the column to
the right of where you want your new column to be, then right-click the column and choose Insert.
Let’s give the header a name, such as Year. To delete a row you don’t need, select the row, right-
click it, and choose Delete. To add a row, select the row below the place you want to add your
new row, right-click the row, and choose Insert.

If you want to save the file as an Excel file, you can either choose File, Save As, or you can click
Save As in the yellow tooltip that appeared at the top of the worksheet when we imported the file,
and then you would choose ‘Excel Workbook (*.xlsx)’ in the ‘Save as type’ box.

Video: Basics of Data Privacy (5:57)

In this video, we will learn about data privacy and the regulations that govern the collected data.
When collecting customer data, specific regulations apply to how that data can be used. By
understanding data privacy regulations and getting familiar with the following three fundamentals,
you can eliminate the risk of financial penalties and keep the trust of your customers:
Confidentiality, Collection and Use, and Compliance.

Confidentiality

Confidentiality is an important element in data privacy, and it acknowledges that the customer’s personal
information belongs to them. The types of information that can be accessed by a data analyst can
range from sales forecasts, to employee information, or even patient records. When accessing
these types of records the analyst must be able to recognize the different types of personal data.

• Personal Information or PI is any type of information that can be traced back to a specific
individual. This type of information can include anything from emails to images.
• Personally Identifiable Information or PII is specific information that could be used to
identify an individual. This type of information could include a social security number or a
driver’s license number.
• Sensitive Personal Information or SPI may not necessarily identify a specific individual, but
contains private information that needs to be protected because if made public it could
possibly be used to harm the individual. This type of information can include data about race,
sexual orientation, biometric or genetic information.

Collection and Use

By understanding personal data and the associated regulations, we can efficiently anonymize our
data by removing unnecessary information. This type of action can help build consumer
confidence and continue to develop the free flow of information.
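
A minimal sketch of removing unnecessary personal fields before analysis is shown below, in Python. The column names and the `drop_personal_columns` helper are hypothetical, and which fields count as personal depends on the dataset and the regulations that apply to it. Dropping columns is only one step; whether the remaining data can still be tied back to an individual has to be checked separately.

```python
# Hypothetical column names; adjust to the fields present in the real dataset.
PERSONAL_COLUMNS = {"customer_name", "birthday", "email"}

def drop_personal_columns(rows, personal=PERSONAL_COLUMNS):
    """Return copies of the rows without the listed personal columns."""
    return [{key: value for key, value in row.items() if key not in personal}
            for row in rows]

rows = [
    {"customer_name": "A. Example", "birthday": "1980-01-01",
     "email": "a@example.com", "region": "EMEA", "sales": 2871.0},
]
cleaned = drop_personal_columns(rows)
print(cleaned)
```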

When searching through data, the analyst must know the location of the company collecting the
data and the location of the respondent. Knowing where the data was collected is an essential
element of data privacy and what regulations must be applied.

The General Data Protection Regulation or GDPR is a regulation specific to the European Union,
and only applies to the jurisdiction of the individual. A new law created in Brazil, the LGPD, will take
effect in August 2020. These new data policy regulations apply to individuals within Brazil, and
ignore the location of the data processor. The United States does not have one country-wide
principal law for data privacy; because of this, individual states began to make their own
regulations. For instance, California created the California Consumer Privacy Act (CCPA) to better
protect customer data.

There are also industry specific regulations that govern the collection and use of sensitive and
personal data. For example, in healthcare, HIPAA privacy rules govern the collection and disclosure
of protected health information. In retail, the PCI standards govern credit card data, and failure to
safeguard cardholder information can result in hefty fines. With a basic understanding of these
policies, we are able to remain compliant when handling any sensitive information.

Unfortunately, breaches of customer data are an all too common occurrence, and understanding how
to remain compliant is essential. Understanding the data privacy regulations of the European
Union, the United States, and other countries as well as industries is key to keeping data safe.
Companies must comply with these privacy regulations at all times and also make sure policies are
readily accessible to employees. For example, let’s say a data analyst downloads a spreadsheet of
sensitive information. In order to complete the report by Monday morning, the analyst decided to
take their work laptop home for the weekend. After driving home, the analyst accidentally left the
laptop in their car. The next morning, they found their car had been stolen along with the laptop.
Because it is the responsibility of the company to keep customer data safe, this was a breach of
privacy when the data left company property.

This type of action could not only cost the company large amounts of money in fines and
penalties, but could also reduce consumer confidence causing a significant impact to revenue.
While data privacy applies to most data that is collected, there are some instances where these
regulations do not apply. In order for these laws and regulations not to apply, the particular
collection of data must be completely anonymous.

To make data anonymous means to exclude all data which ties it back to a particular individual.
While this approach might not be practical in all circumstances, collecting data with privacy in
mind could remove privacy limitations and make data collections more accessible.

Video: Viewpoints: Data Quality and Privacy (4:47)

In this video, we will listen to several data professionals discuss the importance of data quality and
data privacy as they relate to data analysis. Let us start with, “What is the importance of data
quality as it relates to data analysis?”

Kevin McFaul
Data quality is of the utmost importance in terms of data and analytics, but the reason behind this
is because as soon as what you're presenting does not align with what someone expects, that's
the first thing that they tend to go after. Where did you get the data? What's happened to the
data? How's it been transformed? Because people like to think that they know and understand
their business. And when you start to challenge that if you don't have the ground to stand on
of the data that it's quality that it's clean and then it is from a trusted source, that's when you start
to get into a lot of discussions. A lot of debate. And ultimately, the plot of what you're trying to
present gets lost.

Richie Zitomer
The backbone of any successful data analysis project is good quality data. There is a common term
in computer science called garbage in garbage out, which is essentially if you read in bad quality
data, you can expect to get bad quality results. So, there's really nothing more important when
doing a data analysis than making sure that you're working with good quality data, and it's really
important to sense-check the data yourself and really feel comfortable that the data you're using
is of a really high quality.

Erin Huang
Data accuracy is above all: garbage in garbage out. It's a waste of time to analyze data of poor
quality, and it might mislead the business direction.

Nikki Winston
The integrity of the data that you're using or providing for someone else to use is of the utmost
importance. Data is used to determine when or where to launch a product, if a division is profitable
or not, and it's easy to get things confused if you're not paying attention to the details. Using
inventory as an example, if you're looking at inventory at a SKU level and you accidentally pick the
wrong SKU to analyze and then you draw these conclusions that this particular item isn't profitable
when in fact it is. So, that's a major, major decision for a company to make obviously, so the
expectation is that there will be lots of due diligence, but in the beginning if you start off with that
data and then you build on that only to later realize that it wasn't a good idea, you've lost time,
energy, effort, and in some cases, trust.

Thank you for those viewpoints. What about the importance of data privacy as it relates to data
analysis?

Kevin McFaul
Data privacy is incredibly important, especially when you're working in industries like
pharmaceuticals or healthcare, but that's not where it stops. We have to have the ability to make
sure that the users are getting the appropriate level of data based on their roles and their
permissions. Now we can do this through a number of cuts of the data specific to each geography
or each function, or in some tools such as Cognos Analytics, we can start to build out that as part
of our model. Within there you can say who has access to what, whether it's at a granular level of
this person can see data in Canada or the US or whether it's simply this person can see this report
in its entirety or not. There's lots of different ways to handle this, but data privacy is of the utmost
importance across all industries.

Joye Sistrunk
In today's world, data privacy is a huge thing. On the tax side of our business especially, we have
what we call PII: personally identifiable information. We have to protect that, and so
we can't just send things through email. We don't send tax returns, or actually anything in our
business, just through email. If they have sensitive PII data in them, we encrypt it: we
make sure the email is encrypted, or we use certain software that will allow us to
not show the social security numbers or the names or the date of birth and what will happen is it
has a certain sequence, and we share that with the client by calling them. We don't put that in an
email and we certainly don't put that in the same email with the encrypted information because
we want to make sure that you are always safe. So, we have to make sure we're protecting it at all
costs.

Reading: Summary and Highlights

In this lesson, you have learned the following information:

The Five Traits of Data Quality:


1. Accuracy
2. Completeness
3. Reliability
4. Relevance
5. Timeliness

Importing Text:
You can use the ‘Text Import Wizard’ to import data from other formats, such as plain text, or
comma-separated value files.

The Three Fundamentals of Data Privacy:


1. Confidentiality
2. Collection and Use
3. Compliance

Module 4 - Cleaning Data
In this module, you will learn how to remove duplicate and inaccurate data, and how to remove
empty rows in your data. You will then learn how to deal with inconsistencies in your data and how
to use the Flash Fill and Text to Columns features to help you manipulate and standardize your
data.

Video: Removing Duplicated or Inaccurate Data and Empty Rows (8:48)

Now that we have learned about the importance of data quality and data privacy, in this video we
will learn how to deal with inaccurate data, how to remove empty rows, and how to remove
duplicated data.

It’s very common when collecting or importing data - whether through manual or automated
processes - to get errors and inconsistencies in your data. This can be as simple as spelling
mistakes, extra white space, or the wrong case used in text, to empty rows or missing values in
your data, to inaccurate or duplicated data. Having these errors and inconsistencies in your data
can lead to issues with formulas not working, with unsuccessful sorting and filtering operations
and therefore inadequately visualized and presented data findings.

These data errors and inconsistencies require you to carry out some form of data-cleaning routine
to improve the quality and usability of the data. Let’s start off with one of the easier of those tasks,
which is spell checking.

In Excel, this works in pretty much the same way as you may have already encountered in
applications such as Microsoft Word or other common word processing applications. I have some
data here relating to the sales of toy vehicles, and the first thing we need to do is select what data
we wish to check for spelling; in this case we will try column K which contains the product line
data. Then we click Spelling which is on the Review tab. Well that seems to be OK, so let’s try the
Country information in column T. So, we do have an error here, where a country name has been
misspelt, or more likely, mistyped. We just click Change if we are happy with the spelling
suggestion, or we could choose another suggestion from the list, or even ignore this error if we
know the data is correct but in this case we will change it. Here’s another typo for a country name
and here’s one more.

The next inconsistency we will look for is empty rows. Empty rows in your data can cause lots of
issues relating to moving around your data, working with formulas, and sorting and filtering.
Therefore, it’s very important to remove them from your data. If you remember from an earlier
lesson, when we press CTRL+DOWN ARROW, it should take us to the end of that column of data,
but notice if we do that in this dataset, the cursor keeps stopping when it gets to an empty row,
meaning that the dataset is essentially being split into multiple sections, separated by these empty
rows. That’s not good, so let’s resolve that now.

We have a couple of options; one option is to just manually scroll down the sheet looking for empty
rows and deleting each one, which is easy enough, and fine to do if you only have a small amount
of data, but imagine if you were dealing with hundreds, or thousands, or even tens of thousands of
rows? That would be a very laborious and time-consuming process. There is a much better way -
which involves selecting all our data first, either using the mouse, or the CTRL+SHIFT+END
keyboard shortcut. Then we select the Filter icon on the Data tab.
We can now see that each column has a filter icon next to the column header. If we then select the
Customer Name column-filter in column M then uncheck Select All then scroll down to the bottom
of the list, we can check the item called Blanks, and then click OK. This will now show only the
empty rows at the top of our sheet; this can be quite hard to see, but if you look in the row
numbers, you can see that rows 28, 29, 65, 73, 74, 75, and 117 are listed at the top and are
highlighted in blue text. We can now select these rows, either using the mouse or going to the first
cell in the first data row, which is A28, and then using the CTRL+SHIFT+END keyboard shortcut
then delete the offending empty rows.
We then need to clear the filter and turn it off, so we can view our data again. Now, if we go back
to the first row in the top of the datasheet and try the CTRL+DOWN shortcut again, to go to the
end of the data column, it will work.
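
The same cleanup can be sketched in Python for data held as a list of rows. Note one difference from the filter-based approach above: the video filters on blanks in a single column (Customer Name), whereas this hypothetical `remove_empty_rows` helper drops a row only when every cell in it is empty. The sample values are invented for illustration.

```python
def remove_empty_rows(rows):
    """Keep only rows in which at least one cell has non-space content."""
    return [row for row in rows
            if any(str(cell).strip() for cell in row if cell is not None)]

# Hypothetical sheet contents with two empty rows in the middle.
sheet = [
    ["10107", "Motorcycles", "Land of Toys Inc."],
    ["", "", ""],
    [None, None, None],
    ["10121", "Motorcycles", "Reims Collectables"],
]
cleaned_sheet = remove_empty_rows(sheet)
print(cleaned_sheet)
```

Checking the whole row is the more cautious choice, because a row with a missing customer name but valid order data is incomplete rather than empty.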

The next inconsistency we’ll look for is duplicated rows of data; it’s quite common for duplicate
data rows to exist in your imported data, caused either by human input error, or an error in the
import process. There are two ways of doing this in Excel; the first way includes reviewing the data
you plan to remove first, before deleting it, to ensure you are deleting the right data. This is our
preferred method as it provides an additional level of data security.
The second method, which we will also show you, is simpler, as you don’t review the data to be
removed first, but it lacks the security of the first method. It’s important to select a column of data
that you would NOT expect to have duplicate values in.

For example, if we consider the Price Each column, which is C, we would expect lots of these
values to be repeated, because the unit price of some products is the same, so this is a bad
example of a column to use to find duplicates. Instead, let’s use the Sales column in column E,
because it is far less likely that these values will be duplicated in the normal process of things, as
they are the total sales for each order.

So, we select the column…and choose Conditional Formatting, then Highlight Cells Rules, and then
Duplicate Values.
When we click OK, and scroll down the sheet, we can see that only a few values have been
identified as being duplicates. There seem to be duplicate values in rows 36 to 40 and in rows 74
to 78. Let’s zoom out so we can see both duplicate sections together. It seems like these are in fact
exact duplicate entries, and are likely to be an input error. Let’s delete the second section of
duplicate rows, as they are out of sequence: they relate to Motorcycles sales but are in the
Ships section of the sheet.

So that was the first, and recommended method of removing duplicate rows of data, which
previews the data to be removed first. Now, let’s try the second, simpler, but less secure method.
Let’s go back to 100% zoom, and go back to the top of the worksheet. This time, we select the
whole datasheet, and on the Data tab, we use the Remove Duplicates
button. We then unselect all the columns, select only the Sales column, and click OK; the duplicate
rows are deleted.
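
The two approaches (review first, or remove directly) can be sketched in Python as follows. Both helper functions and the sample orders are hypothetical; the key column plays the role of the Sales column chosen above because its values are not expected to repeat.

```python
from collections import defaultdict

def find_duplicates(rows, key):
    """Group row positions by key value and return only the repeated groups."""
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        positions[row[key]].append(index)
    return {value: idx for value, idx in positions.items() if len(idx) > 1}

def remove_duplicates(rows, key):
    """Keep the first row for each key value and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept

# Hypothetical orders: the third row repeats the sales value of the first.
orders = [
    {"order": 1, "sales": 2871.00},
    {"order": 2, "sales": 2765.90},
    {"order": 3, "sales": 2871.00},
]
flagged = find_duplicates(orders, "sales")    # method 1: review before deleting
deduped = remove_duplicates(orders, "sales")  # method 2: delete without review
print(flagged)
print(len(deduped))
```

As in the worksheet, matching on one column can flag rows that merely share a value, which is why reviewing the flagged rows first is the safer route.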

The last cleaning process we’ll look at in this video is using the Find & Replace feature to repair
some misspelt surnames in the customer contacts column.
Find and Replace tools are under Find & Select on the Home tab in Excel, and if you have used
other Office products such as Word, it should be familiar to you already. We’ve had an email from a
Swedish customer, informing us that we have his surname spelt incorrectly on his order sheets. So,
we type the misspelt surname into the ‘Find what’ box and click Find Next, then click it again to
see there are multiple incorrect entries. If we click Find All, all instances are listed,
and we can open the Replace tab to enter a name to replace the incorrect spellings. His surname
should be Larsson with a double ‘s’, so we’ll replace all instances with that corrected spelling. That
looks better, and we are finished.
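
A find-and-replace over one column can be sketched in Python as below. The transcript does not say what the misspelling was, so "Larson" is a hypothetical stand-in, and `replace_in_column` is an illustrative helper. This sketch matches whole cells only, which is an assumption chosen to avoid changing longer names that happen to contain the same letters.

```python
def replace_in_column(values, find, replace_with):
    """Replace whole-cell matches and report how many cells changed."""
    changed = 0
    result = []
    for value in values:
        if value == find:
            result.append(replace_with)
            changed += 1
        else:
            result.append(value)
    return result, changed

# "Larson" is a hypothetical misspelling used only for this example.
surnames = ["Larson", "Frick", "Larson", "Da Cunha"]
fixed, count = replace_in_column(surnames, "Larson", "Larsson")
print(fixed, count)
```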

Video: Dealing with Inconsistencies in Data
Now that we’ve learned how to deal with inaccurate data, how to remove empty rows, and how to
remove duplicated data, in this video we’ll look at changing the case of text, fixing date formatting
errors, and trimming whitespace from data.
When you collect or receive data from varying sources, it’s quite common to find that your data
contains text in mixed case; that is, some in uppercase, some in lowercase and some in capitalized
proper case (also known as title case). Some of this may be intentional; but often it’s not.
Excel doesn’t have a Change Case button like there is in Microsoft Word, so you need to use other
methods to perform this data cleaning task. Those methods are functions; namely the UPPER,
LOWER, and PROPER functions.

You can use these functions to help you change the case of text in your data. You can see that the
header row here is using all uppercase characters, so if you want to change that to use proper
case then you need to add another row to put the function in; this is referred to as a ‘helper’ row.

The PROPER function is simple to use; just type equals, then PROPER, then open parenthesis,
then the cell reference - In this case A1 - then close parenthesis, and press Enter. Here you can see
that the result in A2 is in proper case. Now you can try and drag the formula right across to column
X by using the Fill Handle on A2... but this can be very tricky when you have a lot of columns, so
let’s try another way. Instead of dragging, you can use SHIFT+RIGHT ARROW to select the columns
across to X first then press F2 to bring the cursor into focus in cell A2 then you hold down the CTRL
key while you press Enter, and it fills across for you. You might think that you could now remove
the original row; but look what happens when you do; you get a #REF! error because the formula is
referencing an invalid reference, and the header row cells now contain just the failed formula
rather than the actual header text. So, you
need to undo that, and instead, you copy the contents of the helper row to row 1 but when you
paste you need to choose the Paste Values option. Now the header row cells just contain header
text, and you can remove the helper row in row 2.

Let’s now use the UPPER function to change text from proper case to upper case. Insert a column
to the right of the column you want to change. This will be a ‘helper’ column. Then you type the
formula containing the UPPER function in the first data cell in this new helper column. Again, it’s a
simple formula; you type equals, then UPPER, then open parenthesis, then the cell reference – in
this case T2 – and then close parenthesis and press Enter. You can see the result is the country
name in upper case, and you can then copy that formula down the rest of the column by double-
clicking the Fill Handle cross symbol. As before, you then copy and paste the contents of the
helper column to the original column, but use the Paste Values option. Now you can delete the
helper column.

Next, we’ll use the LOWER function to change text from proper case to lower case. As before, you
insert a column to the right of the column you want to change. This will be another ‘helper’
column. Then you type the formula containing the LOWER function in the first data cell in the
helper column. Once again, it’s a very simple formula; you type equals, then LOWER, then open
parenthesis, then the cell reference – in this case K2 – and then close parenthesis, and press
Enter. You can see the result is the product line data in lower case, and you can now copy the
formula down to the rest of the column by double-clicking the Fill Handle once more. As before,
you then copy and paste the contents of the helper column to the original column, but ensuring
you use the Paste Values option.
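
For readers who also work outside Excel, the three case changes above have close counterparts among Python's string methods. This is only a rough analogy: `str.title()` is assumed here to behave like PROPER for simple text, and the two may differ in edge cases. The sample strings are invented.

```python
header = "PRODUCT LINE"
country = "France"
product_line = "Classic Cars"

proper_case = header.title()       # comparable to =PROPER(A1)
upper_case = country.upper()       # comparable to =UPPER(T2)
lower_case = product_line.lower()  # comparable to =LOWER(K2)

print(proper_case, upper_case, lower_case)
```

Because the results are new strings rather than formulas, there is no equivalent of the Paste Values step; the original text is simply replaced by the converted value.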

It’s quite common to receive data that has a mixture of date formats, or that uses a date format
that isn’t suitable to your region. Now let’s look at how to change the format of some dates. You
can see that this date format is currently using a 2-digit day, a 2-digit month, and a 4-digit year
value. When you open the Number format dialog box, you can see in the Locale box, that this is an
English (United Kingdom) date format.

You want to use a US date format, so you first change the locale to English (United States). In this
list, you can see there are several date options to choose from; let’s choose one which uses the full
month name, then a 2-digit day, and a 4-digit year value. You could then copy this format to the
rest of the date cells. However, if you want to format these dates using your own custom format,
you can do that too.
In the Number format list, you select Custom, and then choose an existing format that is similar to
what you want and simply modify it to create a new custom format; here we’ll have the day, then
3-letter month, then 4-digit year. To apply that new custom date format to the rest of the column
you could either use the Format Painter tool, or you can select the rest of the column and choose
the new custom format from the Custom list in the Number Format dialog box.
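
When dates arrive as text rather than as Excel date values, the same conversions can be sketched with Python's `datetime` module. The sample date and the `reformat_date` helper are hypothetical, and the hyphen separators in the custom format are an assumption, since the video only specifies day, 3-letter month, and 4-digit year. Month names come out in English under Python's default locale.

```python
from datetime import datetime

def reformat_date(text, source_format, target_format):
    """Parse a date string in one format and write it out in another."""
    return datetime.strptime(text, source_format).strftime(target_format)

uk_date = "24/02/2003"  # hypothetical sample in day/month/year order

# Full month name, 2-digit day, 4-digit year.
long_us = reformat_date(uk_date, "%d/%m/%Y", "%B %d, %Y")

# Custom format: day, 3-letter month, 4-digit year.
custom = reformat_date(uk_date, "%d/%m/%Y", "%d-%b-%Y")

print(long_us)
print(custom)
```

Unlike a number format in Excel, which changes only how a stored date is displayed, this sketch produces new text, so the source format must be known in advance.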

You might find that your data has some whitespace; that is, unwanted spaces in your data. Here
you can see that we have some spaces at the start, some spaces at the end, and some unwanted
double spaces in the middle of our data. We’ll first have a look at what you can do to clean up
these unwanted spaces in your data by using the Find & Replace feature in Excel.

So you first select all the data then on the Home tab, you click Find & Select, then Replace. To get
rid of double spaces, you enter a double space in the ‘Find what’ box, and a single space in the
‘Replace with’ box. Then you click Find Next
and choose Replace for each item you want to change. You could click Replace All to do all the
fixes in one go but unless you are absolutely sure of the changes, it’s better practice to check and
replace each one in sequence in case there are some valid reasons for these extra spaces. If you
have a very large dataset, however, you might choose Replace All to save a lot of time. So using the
Find & Replace feature got rid of most of those unwanted whitespaces, but not all of them; we
removed double spaces using that feature, but we also have some single spaces left at the start
and end of some of the cells.

You can’t use Find & Replace to remove single spaces, otherwise you would lose ALL spaces in your
data - including standard spaces between words - which you don’t want to remove. But, there is
another tool you can use to clear spaces from cells, and that’s the TRIM function.

To use the TRIM function, you once again insert a helper column. The TRIM function is simple to
use; just type equals, then TRIM, then open parenthesis, then the cell reference – in this case M2 –
then close parenthesis, and press Enter.
You can then double-click the Fill Handle symbol to copy this formula down to the remainder of the
column. Now you need to copy the contents of the new column N to column M, and remember
once again to paste using the Paste Values option. You can now see that those erroneous spaces
have been removed, or more accurately speaking, have been trimmed. And lastly, you can remove
the helper column.
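
The effect of TRIM (removing leading and trailing spaces and reducing runs of spaces between words to a single space) can be sketched in Python as follows. The helper name `excel_like_trim` and the sample text are made up, and the sketch deliberately handles only the ordinary space character.

```python
import re

def excel_like_trim(text):
    """Strip leading/trailing spaces and collapse runs of spaces to one."""
    return re.sub(r" {2,}", " ", text.strip(" "))

# Hypothetical customer name with leading, trailing, and double spaces.
messy = "  Land of  Toys Inc. "
tidy = excel_like_trim(messy)
print(repr(tidy))
```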

Video: More Excel Features for Cleaning Data (6:13)

Now that we’ve learned how to change the case of text, how to change date formatting, and how
to trim whitespace from data, in this video we’ll discuss how to use the Flash Fill and Text to
Columns features in Excel to help clean data.

We used Flash Fill briefly earlier in the course as a quick method of entering data that fits a
specific pattern, such as the names of months or days of the week, but it can also be useful as a
data cleaning tool. It can split a column of full names into two separate columns for the forename
and surname, and it can also help to modify the naming convention used in a column of names.

For example, in the vehicle toy sales worksheet there is a column containing the last names of
contacts, and another containing their first names. If you want to use the Flash Fill feature to
combine these names into one name column, you first insert a helper column; let’s call it
‘Contactname’. Then, in the first row in the new column you enter the full name of the first contact
in the format of your choice; for example you might want surname, then a comma, then the
forename, or you might want surname, and just an initial, and so on; in this case let’s just enter
the name in the standard format of forename then surname with a space between them, and then
we press Enter. Next you start typing the second contact’s name in, and you’ll see that Flash Fill
displays a preview of the remaining names for you. If you’re happy with what’s in the preview, all
you have to do is press Enter, and it fills in the remaining names for you right down the column. It
even works when there are two names in one of the columns such as Wing C here … and Da
Cunha here.

Now you can remove the original columns if you no longer need them. So, in the previous task we
saw how to combine two columns of data into one column using Flash Fill; now let’s see how to use
it to modify the naming convention in a column. Let’s switch to the customer contacts worksheet.
Then in the first data row of the next column, that is B2, we type the name of the first contact
using whatever naming convention we want. We’ll use surname, then comma, then a space, then
the forename, and press Enter. Again, when we start typing the second contact’s name in the next
row down, that is B3, Flash Fill detects the pattern and fills in the remaining names in column B
when we press Enter. You could then copy and paste the column header, and delete the original
column A.

What we haven’t done yet is take a single column with two names in it and split that into two
separate columns. One way to do that is with the ‘Text to Columns’ feature.

As the name suggests, the ‘Text to Columns’ feature can take a column
containing multi-part text and split that text into one or more other columns. This can be useful for
splitting any multi-part text, such as names or addresses, into separate component parts. Let’s
open the customer contacts worksheet again. Then we’ll add column headings for the next two
columns and copy the cell format used in the first column header. Then we’ll widen the columns. If
we then select the data in column A from A2 to A23, and on the Data tab, click Text to Columns, a
wizard is launched. On the first page of the wizard, ensure that ‘Delimited’ is selected. On the
second page, ensure that only ‘Space’ is selected as the delimiter. On the third page of the wizard,
click the little arrow next to ‘Destination’ and select cell B2 on the worksheet, then click the
little arrow again to return to the wizard. We’re now finished with this wizard, so click Finish.
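
As a rough illustration of what a space-delimited split does, here is a short Python sketch. The function name text_to_columns and the sample names are hypothetical, and the sketch assumes every single space acts as a delimiter.

```python
def text_to_columns(value: str, delimiter: str = " ") -> list:
    """Split one cell's text into parts at every occurrence of the delimiter."""
    return value.split(delimiter)


print(text_to_columns("Julie Young"))      # two parts, one per destination column
print(text_to_columns("Daniel Da Cunha"))  # three parts, so a third column is filled
```

The second call shows why names with more than two parts need checking after a split: they spill into an extra column.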

You can see that the full customer contact names in column A have now been successfully split
into two new columns in B and C, and you could now remove column A if you no longer need it.
You can also achieve the same result using functions.

This would be required if you were using ‘Excel for the web’, the online version of Excel, as this
doesn’t have the ‘Text to Columns’ feature. There’s also a bit more flexibility with functions, which
can be especially useful if you have names that are complex and mixed, such as having
hyphenated names or some names with a middle initial, some with two middle initials, and some
with no middle initial.

So, we open the customer contacts worksheet again. Then we’ll add column headings for the next
two columns, and copy the cell format used in the first column header. Then we’ll widen the
columns. Next, we enter the formula in B2 to extract the forename part of the name
[=LEFT(A2,SEARCH(" ",A2,1))]. For the name in A2, this formula extracts five characters, starting from the
left and including the space. Then, in cell C2 we enter the formula to extract the surname part of
the name [=RIGHT(A2,LEN(A2)-SEARCH(" ",A2,1))]. For the name in A2, this formula extracts seven characters from
cell A2, starting from the right. Then we double-click the Fill Handle in cell B2 to use AutoFill to
complete the column. And we do the same to the Fill Handle in cell C2 to use AutoFill to complete
that column also.
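
To make the character counting concrete, the following Python sketch mimics LEFT, RIGHT, and SEARCH. The sample name is hypothetical; it was chosen to have a four-letter forename and a seven-letter surname, so the counts match the five and seven characters described above.

```python
def left(text: str, num_chars: int) -> str:
    """Mimic Excel's LEFT: the first num_chars characters."""
    return text[:num_chars]


def right(text: str, num_chars: int) -> str:
    """Mimic Excel's RIGHT: the last num_chars characters."""
    return text[-num_chars:] if num_chars > 0 else ""


def search(find_text: str, within_text: str) -> int:
    """Mimic Excel's SEARCH: 1-based, case-insensitive position of find_text."""
    position = within_text.lower().find(find_text.lower())
    if position == -1:
        raise ValueError("#VALUE!: text not found")
    return position + 1


full_name = "John Simpson"  # hypothetical contents of cell A2
space_at = search(" ", full_name)
forename = left(full_name, space_at)                   # keeps the trailing space
surname = right(full_name, len(full_name) - space_at)  # everything after the space
```

Note that the forename keeps its trailing space here; subtracting 1 from the SEARCH result would drop it.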

Video: Viewpoints: Issues with Data Quality (4:01)


In this video, we will listen to several data professionals discuss issues around data quality. Can
you tell us your experience with poor quality data and the cleaning of that data?

Asha Barnes

A large portion of my time is spent cleaning, verifying, checking data before I run an analysis.
Working in healthcare, most of the information captured is based off of what someone's put in, so
humans can't be calibrated. Two people can have a similar situation and look at things slightly
differently. So, it's up to me to make sure that if one describes something as navy blue and the
other person describes it as dark blue that I consolidate it and make it blue. That's just an
example. We don't normally do that in healthcare, but the thought is that you always have to
check the integrity of the information before you do your analysis to make sure that your results
are accurate.

Kevin McFaul
No data is going to be perfect. That's an unfortunate reality of the world in which we live.
Databases and data are collected for the broadest possible purpose, but oftentimes there are still
things that are missing or not quite in the format that we want. Whether that's collecting date and
time as a single field, whereas when we're doing our analysis, will I be able to break it out by day,
month, and quarter? These are things that we can take into consideration. There's a lot of different
cleansing activities that can be done and can be undertaken to help you get something that's
specific and works for you and the way you want to work.

Nikki Winston
I have had experiences with poor quality data where I'm reviewing financial statements and I'm
looking at margins, calculating ratios, trying to understand if what I'm looking at is number one
directionally correct, but two, am I looking at the right thing? Are all of these calls current calls,
relevant to the period that I'm analyzing? Has all of the data been captured? Do I have all of the
revenue for a given month? Then you have to go back and look at the sources, scrub that
information to validate that what you're seeing is correct. And from an accounting perspective, if
that data is incorrect or out of period, then adjustments need to be made to the general ledger
which houses the data to properly reflect what's happening.

Kevin McFaul
Poor data quality can really come into play and cause discussions that don't need to be happening.
They can cause you to be second guessed. They can cause you to not be able to be firm and
present your case and your data reliably.
Now, if this is the case, there's several different ways we can handle this. One is to go all the way
back to the source to ensure that the source data is being pulled appropriately or simply being
able to outline and be very specific and direct in terms of what transformations or changes have
been done to the data through tracking this in something like Watson Knowledge Catalog and
being able to present that to your audience.

Nikki Winston
If you're filtering and sorting data, and you find that it's wrong, you have to go back and fix
things. That time could have been spent working on other deliverables, and again, it can call the
data integrity into question if you're constantly having to redo or reiterate certain parts of data
and quite frankly, it can be frustrating at times if you're habitually having to do that. So, paying
attention to the details and the minutiae so that you're not wasting time backtracking on
something that you could have fixed early on are just some of the many benefits of ensuring that
your data quality is good.

Hands-on Lab 5: Cleaning Data (45 min)

In this lab, first you will learn how to deal with inaccurate data, how to remove empty rows, and
how to remove duplicated data. Next, you will learn how to change the case of text, how to change
date formatting, and how to trim whitespace from data. Finally, you will learn how to use the Flash
Fill feature and functions in Excel to help clean data.

Objectives
After completing this lab, you will be able to:
• Understand how to deal with irrelevant or inaccurate data
• Remove empty rows and duplicated data
• Change text case and date formatting
• Trim whitespaces from data
• Use Flash Fill and functions to clean data

Exercise 1: Removing Duplicated, Irrelevant or Inaccurate Data

In this exercise, you will learn how to deal with inaccurate data, how to remove empty rows, and
how to remove duplicated data.

Task A: Check spelling


1. Download the file Customer_demographics_and_sales_Lab5.xlsx. Upload and open it using
Excel for the web.
2. Select column L (CREDITCARD_TYPE), then click the Review tab, and select Spelling.
3. Click the correct suggestion to change the spelling. (Note: Don’t change the spelling of ‘jcb’
during the spell check. We will need ‘jcb’ for Exercise 1, Task D.)
4. Close the Spelling pane.

Task B: Remove empty rows


1. Press CTRL+HOME, then press CTRL+SHIFT+END to select the whole datasheet.
2. On the Data tab, click Filter.
3. Press CTRL+HOME, click the filter arrow in the CUST_NAME column, and then click Filter.
4. Click the Select All checkbox to deselect all of them. Then select just Blanks, then OK.
5. Select the first row, then press CTRL+SHIFT+END to select all rows.
6. Right-click the selected rows and then click Delete Rows.
7. Finally, on the Data tab, click Clear, then click Filter.
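
The steps in Task B amount to "find the rows where CUST_NAME is blank, then delete them". A minimal Python sketch of that logic follows; the sample rows and the helper name is_blank are hypothetical.

```python
# Hypothetical sample rows; only the CUST_NAME and ORDER_ID column names come from the lab.
rows = [
    {"CUST_NAME": "Hypothetical Customer A", "ORDER_ID": 10100},
    {"CUST_NAME": "", "ORDER_ID": None},  # an empty row
    {"CUST_NAME": "Hypothetical Customer B", "ORDER_ID": 10101},
]


def is_blank(value) -> bool:
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


# Keep only the rows whose CUST_NAME is not blank.
cleaned = [row for row in rows if not is_blank(row["CUST_NAME"])]
print(f"Removed {len(rows) - len(cleaned)} empty row(s)")
```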

Task C: Remove duplicate rows


1. Select Column T (ORDER_ID), since ORDER_ID values should be unique.
2. On the Home tab, click Conditional Formatting > Highlight Cells Rules > Duplicate Values,
and then click OK.
3. Select the whole datasheet (press CTRL+HOME, then CTRL+SHIFT+END).
4. On the Data tab, click Remove Duplicates.
5. In the Remove Duplicates dialog box, ensure that Select all columns is checked and that My
data has headers is also checked, then click OK.
6. In the pop-up box informing you how many duplicate values were found and removed, click
OK.
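
With all columns selected, a row counts as a duplicate only when every value matches an earlier row. Here is a minimal Python sketch of that rule, using hypothetical order rows; it assumes values are compared exactly, which may differ from how Excel compares text.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical rows: (ORDER_ID, customer, sales).
orders = [
    (10100, "Customer A", 2871.0),
    (10101, "Customer B", 3746.7),
    (10100, "Customer A", 2871.0),  # exact duplicate of the first row
]
deduped = remove_duplicates(orders)
removed = len(orders) - len(deduped)
print(f"{removed} duplicate value(s) found and removed; {len(deduped)} unique value(s) remain")
```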

Task D: Use Find & Replace to correct misspelling.


1. On the Home tab, click Find & Select.
2. Click Find. In Find what, type jcb, and click Find All.
3. Click Replace.
4. In Replace with, type JCB, click Replace All, and then click the Close icon.
5. On the Home tab, click Conditional Formatting > Clear Rules > Clear Rules from Entire Sheet.

Reading: Summary and Highlights

In this lesson, you have learned the following information:

• It’s important to remove any duplicated or inaccurate data, and it’s important to remove
any empty rows in your dataset.
• There are several other types of data inconsistency that you may need to resolve, in order
to properly clean your data:
   • Change the case of text
   • Fix date formatting errors
   • Trim whitespace from your data
• You can use the Flash Fill and Text to Columns features in Excel to manipulate and
standardize your data, and functions can also be used to help manipulate and standardize
your data.

Module 5 - Data Analysis Basics, Filtering and Sorting Data

In this module, you will learn about the fundamentals of analyzing data using an Excel spreadsheet
and how to filter and sort your data. You will also learn how to use some of the most useful
functions for a data analyst, and how to use the VLOOKUP and HLOOKUP reference functions.

Video: Intro to Analyzing Data Using Spreadsheets

Now that we have learned how to collect and clean our data, it is time to decide the best method
for analysis. In this video, we will discuss the importance of filtering, sorting, performing
calculations, and shaping our data to provide meaningful information.

Deciding how to manipulate our data can sometimes be difficult. Before we make any changes or
adjustments, we will need to visualize the final output. Below are some questions to ask before
beginning the task. How big is the dataset? What type of filtering is required to find the necessary
information? How should the data be sorted? What type of calculations are needed?

Now that we have visualized the final output, we must decide the best approach to shape our
data. The most basic step would be to filter and sort the data. By sorting the data, we are able to
organize it based on conditions such as alphabetically or numerically. For example, if we wanted to
check for duplicate order numbers, we could sort the data and quickly see any duplicates.

After sorting and removing the duplicate row, we find that the view needs to be more specific to
meet our requirements. We now decide that we only want to see the data for the month of
November. By adding a filter, we can now choose to only see items with a ‘MONTH_ID’ that is
equal to ‘11’. By filtering our data, we are now able to only see the rows that meet the filter
criteria and it allows us to better analyze our information.

Becoming familiar with all of the tools to analyze data can seem daunting, but one key benefit of
using a spreadsheet is the ability to use functions. Functions in Excel are organized by several
categories, including mathematical, statistical, logical, financial, and date and time-based. Let’s
say we wanted to get an average of company revenue for the month of June. We realize there are
over 100 items that would need to be calculated. In normal circumstances, to get an average, we
would have to create a formula to add each row and divide by the total number of rows. This type
of calculation would not only be very long but also exposes the analyst to the risk of making a mistake
[=(B1+B2+B3+…+B160)/160]; without the parentheses, only the last cell would be divided by 160. With
the use of a function, we can simplify our calculation to one easy step [=AVERAGE(B1:B160)].
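
The difference between the long-hand formula and the function can be sketched in Python. The four revenue values are hypothetical and stand in for the 160 rows mentioned above.

```python
from statistics import mean

revenue = [1200.0, 950.5, 1830.25, 1019.25]  # hypothetical values

# Long-hand: add every value, then divide by the count (note the parentheses).
manual_average = (revenue[0] + revenue[1] + revenue[2] + revenue[3]) / 4

# Function: one call, and no risk of missing a row or mistyping the count.
function_average = mean(revenue)

print(manual_average, function_average)
```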

While sorting and filtering data on our spreadsheet can be useful on its own, first converting your
data to a table has many benefits. When we convert our data into a table we are able to filter and
calculate the data more efficiently.

One example is the ability to easily calculate columns. For the column ‘MSRP’, we choose ‘Sum’
and we’re able to quickly calculate the sum of the column. If we then look at the data, and only
want to calculate the ‘MSRP’ total based on Japan, we would filter the ‘Country’ column to only
display Japan, and the column would then only add the values in the rows that were associated
with Japan.
While all data may not work in a table, there are quite a few advantages to formatting your data as
a table:
• Automatic calculations even when filtering
• Column headings never disappear
• Banded rows to make reading easier
• Tables will automatically expand when adding new rows

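The "automatic calculations even when filtering" behavior can be pictured as a total that only adds the rows currently visible. Below is a minimal Python sketch of that idea; the rows and the function name visible_total are hypothetical, and only the ‘MSRP’ and ‘Country’ column names come from the example above.

```python
# Hypothetical table rows.
rows = [
    {"Country": "Japan", "MSRP": 95},
    {"Country": "France", "MSRP": 214},
    {"Country": "Japan", "MSRP": 118},
]


def visible_total(table, column, country=None):
    """Sum a column over the rows left visible by an optional Country filter."""
    visible = [row for row in table if country is None or row["Country"] == country]
    return sum(row[column] for row in visible)


print(visible_total(rows, "MSRP"))           # no filter: every row is added
print(visible_total(rows, "MSRP", "Japan"))  # filtered: only the Japan rows are added
```
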
Sometimes data needs to be more organized than what a basic tabular format can give us, and
creating pivot tables with charts can be a better way to analyze and display the required
information. In Excel we have the option of creating a pivot table to display and analyze our data,
and optionally, an associated pivot chart.

For example, let’s say we want to know which companies ordered products in the month of October.
From the original table of data, we create a pivot table to organize and analyze the required data,
along with a pivot chart to display the information. By then adding the month filter to the newly
created pivot table, we can see the results for the month of October not only in the table, but the
changes are automatically updated in the pivot chart.
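
Conceptually, a pivot table groups rows by one field and summarizes another, and a filter narrows which rows are included. Here is a minimal Python sketch of that grouping; the order rows and the function name pivot_sales_by_company are hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows.
orders = [
    {"company": "Company A", "month": 10, "sales": 1500.0},
    {"company": "Company B", "month": 11, "sales": 900.0},
    {"company": "Company A", "month": 10, "sales": 500.0},
    {"company": "Company C", "month": 10, "sales": 700.0},
]


def pivot_sales_by_company(rows, month=None):
    """Total sales per company, optionally restricted to a single month."""
    totals = defaultdict(float)
    for row in rows:
        if month is None or row["month"] == month:
            totals[row["company"]] += row["sales"]
    return dict(totals)


october = pivot_sales_by_company(orders, month=10)
print(october)
```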

When trying to single out specific information in a large dataset, a pivot table is a nice way to
show only the information that is required. This allows us to quickly and easily scan the essential
information. Pivot charts are a nice accessory to pivot tables, as they allow us to visually process
data, and in most cases, will let the audience grasp the information more quickly. The advantages of
selecting a pivot table and chart are:
• Manipulate data without using formulas
• Quickly summarize large data sets
• Ability to display engaging charts and graphs

Video: Filtering and Sorting Data in Excel

In the previous video we learned how to use the Flash Fill and Text to Columns features in Excel to
help clean data. In this video we will discuss how to filter and sort our data so that we can control
what information is displayed, and how it's displayed, in our worksheets. Filtering your data
enables you to gain more control over which parts of your data are displayed at any given time in
Excel. This can improve the visibility of data by narrowing it down to the rows that meet specified
criteria, and it can also help when searching for specific pieces of data.

To filter your data, the first thing you need to do is turn filtering on, which is very simple. On the
Data tab, click "Filter", and that's it; you will now see a small filter icon next to each of the column
headers. As a side note, if you want to filter on only one or more columns, select those columns
first, then click "Filter". As another side note, if you format your data as a table, the columns
automatically have filter controls added to them. So now each column has a filter that can be
applied to the data in that column. In the order date column you can filter on the years. In product
line you can filter on the different product types. And in customer name you can filter on each
customer by name. Let's first filter on the year. We'll select orders from 2004 only by deselecting
the other year, and if we wanted to we could expand the year and filter by month also, but we
won't do that for now. If you look at the status bar at the bottom of the worksheet, you can see that
only 50 out of 114 records are now displayed. If you want to clear a filter, you can either click
the "Clear Filter From" option or select the "Select All" item in the filter list. Now let's filter on the
product line column to display only the rows that hold data for sales of classic cars. And again
we'll clear the filter. Lastly, we'll filter on the customer name column to display only sales to
Mini Gifts Distributors Limited, and then clear that filter.

So far, we've only applied one filter at a time, but suppose you want to filter down to a greater
degree. We can do that too by just enabling all those filters together. And now we are only
displaying sales of classic cars to Mini Gifts Distributors Limited in 2004. Remember, if you only
want to clear one filter, then click its filter button in the column header and click the "Clear Filter
From" option, but if you want to quickly clear all filters you can use the Clear button in the Sort &
Filter group on the Data tab. So far we've used what are commonly referred to as AutoFilters, but
you can also use custom filters to specify other criteria to apply to text or numbers. For
example, if you wanted to see sales orders that are over or under a certain value, you can do that
with custom filters. For the sales column, let's add a number filter that only displays sales that are
over two thousand dollars. If you look in the status bar you can see that we are now showing 111
out of 114 records. Then let's clear that filter and filter it the other way to display the sales orders
that are below two thousand dollars. We can see that there are only three orders that are below
two thousand dollars. It's important to note that the data rows that we don't see have not been
removed. They are still there; they have just been hidden from view by the filters, and this is
indicated by the row numbers you see on the left in blue. The row numbers start at 69 and jump in
large increments, indicating that there are many more rows of data in our data set than are
currently being displayed. Let's clear those filters. If we look at a column filter for a column that
contains text, you will see that the menu item changes to Text Filters instead of Number Filters, and
you can see that there are several text filter options. And if you want to turn off filtering altogether
for a worksheet, just click the Filter button on the Data tab. Now let's take a look at the basic
sorting capabilities in Excel.
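
The filtering behavior described in this video can be sketched in Python: several filters combine so that a row is shown only if it passes all of them, and the underlying rows are hidden rather than deleted. The order rows and the function name apply_filters are hypothetical.

```python
# Hypothetical order rows.
orders = [
    {"year": 2004, "product_line": "Classic Cars",
     "customer": "Mini Gifts Distributors Ltd.", "sales": 4500.0},
    {"year": 2003, "product_line": "Classic Cars",
     "customer": "Mini Gifts Distributors Ltd.", "sales": 1800.0},
    {"year": 2004, "product_line": "Motorcycles",
     "customer": "Hypothetical Toys Inc.", "sales": 2600.0},
]


def apply_filters(rows, *conditions):
    """Return the rows for which every condition is true; the input list is unchanged."""
    return [row for row in rows if all(condition(row) for condition in conditions)]


# Three filters enabled together, as in the combined example.
visible = apply_filters(
    orders,
    lambda r: r["year"] == 2004,
    lambda r: r["product_line"] == "Classic Cars",
    lambda r: r["customer"] == "Mini Gifts Distributors Ltd.",
)

# A custom number filter: sales greater than two thousand dollars.
over_2000 = apply_filters(orders, lambda r: r["sales"] > 2000)

print(f"{len(visible)} of {len(orders)} records found")
```

Clearing a filter corresponds to simply leaving a condition out; the original list still holds every row.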

Sorting is a very important part of the role of a typical data analyst. You might need to organize
your text-based data alphabetically, your number-based data numerically or your date-based data
chronologically. When you sort data using these logical parameters it makes it easier for you to
conceptualize and visualize your data in a more meaningful way. When sorting data the first thing
you need to do is select which data to sort. For example, if you want to sort your customers
alphabetically, select a cell in the customer name column first and then sort either A to Z or
Z to A. And if you want to sort your sales figures numerically, select a cell in the sales column first
and then either sort from smallest to largest or from largest to smallest. And lastly, if you want to
sort your customers' order dates chronologically, select a cell in the order date column first, then
sort from oldest to newest or from newest to oldest.

But you can also sort your data by more than one column at a time. Simply select a cell in your
data, then on the Data tab click "Sort", then either use the sort-by column suggested or use the
drop-down list to select a different column. In this case we'll choose the order date column as our
first sorting criterion and we'll choose Oldest to Newest in the Order drop-down list. To add a further
sorting level you click "Add Level", then you choose another sort column in the "Then by" drop-
down list. In our case we'll choose sales, and for this sort level we'll choose Largest to Smallest in
the Order list. If you have a header row in your data as we do here, then ensure you select the "My
data has headers" check box, then click "OK" to sort. So, the data is now sorted to list the oldest
orders by order date first; then, where multiple rows share the same order date, the next sorting
level lists them from the largest order value down to the smallest.
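
A two-level sort like this one can be sketched in Python with a compound sort key. The order rows are hypothetical; negating the sales figure is one simple way to get largest-to-smallest within each date.

```python
from datetime import date

# Hypothetical order rows.
orders = [
    {"order_date": date(2004, 1, 15), "sales": 2000.0},
    {"order_date": date(2003, 11, 6), "sales": 3500.0},
    {"order_date": date(2004, 1, 15), "sales": 5200.0},
]

# Level 1: order date, oldest to newest. Level 2: sales, largest to smallest.
sorted_orders = sorted(orders, key=lambda o: (o["order_date"], -o["sales"]))

for order in sorted_orders:
    print(order["order_date"], order["sales"])
```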

Video: Viewpoints: Filtering and Sorting (1:55)


In this video, we will listen to several data professionals discuss the importance of filtering and
sorting your data. Why is it important to filter and sort your data?

Kevin McFaul
Filtering and sorting are very important as part of your analysis and visualization experience,
because this allows you to create one single view of the data, but then provide a function for
people to be able to do their own analysis on it. Now, just to clarify what we mean by this is sorting
tends to be highest to lowest, alphabetical, or in some cases, you may want to create some
custom sorting where you put your particular product or offering at the start and then have the
rest falling behind it. Or you may want to group a few at the start to show your direct competitors
versus others.

Joye Sistrunk
I love, love, love the filter and sort features in Microsoft Excel.
What they allow me to do is get to the heart of the data. I can drill down and see, for
example, how much revenue a client had for a specific time frame, or how much money they
made in a specific time frame, without looking through a lot of rows and a whole lot of information.
So, filtering and sorting really allow you to narrow it down and get very specific, and get the
answers that you're looking for and not just loads of data that you don't necessarily need.

Kevin McFaul
When we talk about filtering, we mean that I have a particular value by which I
want to see the data restricted. So, for example, if we had a bar chart showing our sales over
months and I want to see it in a particular geography or for a particular product line, I could have
that available and allow me to filter down so that my sales would be specific just to one geography
or one product line.

Video: Useful Functions for Data Analysis

Now that we’ve learned how to use the Filter and Sort tools in Excel to filter and sort our data to
enable us to control what information is displayed, and how it is displayed in our worksheets, in
this video we’ll discuss how to use some of the most common functions a Data Analyst might use;
namely IF, IFS, COUNTIF, and SUMIF.

First up, let’s look at how to use the IF function.


The IF function is one of the most used logical functions in Excel. The IF function enables you to
logically compare a value against criteria you set in the function, and then return a result based on
whether the result of the logical comparison is true or false; the returned values can be text or
numeric. An IF function essentially says: “if something is true, then return a value or do
something, but if it’s not true, then return a different value or do something else”.

For example, in our vehicle toy sales worksheet, if we wanted to have a column that recorded
whether the order had been shipped or not, we could add a new column to the right of the
existing status column, call it ‘Shipped?’, and then enter the formula seen in cell H2. This formula is
saying: if the text in G2 says ‘shipped’ then return ‘Yes’, and if it doesn’t then return ‘No’. You can
then use the Fill Handle to copy this formula down the column. You can see that most of the cells
do say ‘Yes’, but some don’t, as the order hasn’t been shipped for one reason or another. We could
also use the IF function to emphasize the size of an order. So, we add a new column to the right
of ‘Sales’, name it ‘3K plus or minus’, and then enter the formula seen in cell F2. This formula is
saying: if the order is over three thousand, then return the text “Over 3k”, but if it isn’t, then
return the text “Under 3k”. And we can copy the formula down the column.
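
The two formulas themselves are not reproduced in this text, so the following Python sketch is only an assumption of their logic, based on the descriptions above. The function names are hypothetical.

```python
def shipped_flag(status: str) -> str:
    """Return 'Yes' when the status text is 'shipped' (ignoring case), else 'No'."""
    return "Yes" if status.strip().lower() == "shipped" else "No"


def size_flag(sales: float) -> str:
    """Return 'Over 3k' for orders above 3,000, otherwise 'Under 3k'."""
    return "Over 3k" if sales > 3000 else "Under 3k"


print(shipped_flag("Shipped"), shipped_flag("Cancelled"))
print(size_flag(4500.0), size_flag(3000.0))
```

One detail worth noticing: with a strict "over three thousand" test, an order of exactly 3,000 is labeled "Under 3k".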

In an ideal world, you would only use the IF function to apply one or two conditions, but there may
be scenarios where you want to apply multiple conditions. In these cases, you can use the
‘nesting’ capabilities of functions to bring together several IF statements in one formula; these are
called ‘nested IF functions’.

For example, suppose we add another column here for the order size and then enter the formula seen in
cell F2. You can see that this formula contains multiple IF functions, one for each
condition (one for Large, one for Medium, and one for Small), and it requires three sets of
parentheses.

So, it’s a relatively long and complex formula, but it does work. Again, we can copy the formula
down the column. Even though Excel technically supports the nesting of up to 64 different IF
functions in a formula, it is not a recommended best practice. Having multiple IF functions in a
single formula can become extremely challenging to manage.
For example, suppose you come across a formula like this that you haven’t used for some time, or
even worse, was created by someone else; it could be quite difficult to work out how and why it is
being used. Also, if your conditions increase, then you need to add more conditions to an already
quite complex and long formula, which will only complicate matters more. To resolve this issue, a
new function was developed called IFS.

The IFS function is only supported on Excel 2019 and later, Excel for Microsoft 365, and Excel for
the web. As the name suggests, this function can replace multiple nested IF functions being used in
a single formula, to simplify matters. So, let’s add a further column for order size, but this time
we’ll use the IFS function instead.

As you can see in cell G2, this formula only has one set of parentheses instead of three, and only
uses one function instead of three. Let’s copy that formula down the column too. Now let’s have a
look at another example of using the IF function, but we’ll combine it with Conditional Formatting
too.
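
To compare the two styles side by side, here is a Python sketch of a nested IF and an IFS-style version of the order size rule. The thresholds (7,000 and 3,000) are hypothetical, since the actual formulas are not reproduced in this text.

```python
def order_size_nested(sales: float) -> str:
    """Nested-IF style: each 'else' branch contains another test."""
    return "Large" if sales >= 7000 else ("Medium" if sales >= 3000 else "Small")


# IFS style: a flat list of (condition, result) pairs, checked in order.
SIZE_RULES = [
    (lambda s: s >= 7000, "Large"),
    (lambda s: s >= 3000, "Medium"),
    (lambda s: True, "Small"),  # catch-all, like a final TRUE condition in IFS
]


def order_size_ifs(sales: float) -> str:
    """Return the result paired with the first condition that is true."""
    for condition, label in SIZE_RULES:
        if condition(sales):
            return label
    raise ValueError("no condition matched")  # IFS returns #N/A in this case


print(order_size_nested(4200.0), order_size_ifs(4200.0))
```

Adding a new size band to the flat version means adding one more pair to the list, rather than rewriting nested parentheses.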

Let’s switch to the car sales worksheet, add a new column to the right of the Year Resale Value
column, and call it ‘Retention %’. Then, we enter the formula seen in cell G2, which will divide the
‘Year Resale Value’ by the original ‘Retail Price’. We need to format this as a percentage. And then
we can copy it down the column.

Next, we’ll add a column to highlight the retention value for each car. The formula we add here in
cell H2 uses the IF function to state that if the percentage in the previous column is greater than
69%, then mark it as ‘Good’, but if it isn’t, then mark it as ‘Poor’.
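
The two calculations described here, the retention percentage and the Good or Poor rating, can be sketched in Python as follows. The prices are hypothetical and the function names are illustrative only.

```python
def retention(resale_value: float, retail_price: float) -> float:
    """Fraction of the original retail price retained at resale."""
    return resale_value / retail_price


def retention_rating(ratio: float) -> str:
    """'Good' when retention is greater than 69%, otherwise 'Poor'."""
    return "Good" if ratio > 0.69 else "Poor"


ratio = retention(14000.0, 20000.0)  # hypothetical resale value and retail price
print(f"{ratio:.0%}", retention_rating(ratio))
```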

Once again, we’ll copy the formula down the column.

We could also use Conditional Formatting to highlight the retention value percentages even
more.
We select H2, and on the Home tab, click Conditional Formatting, and make a new rule. The
condition in our rule will only format cells that contain a specific text value, and that value is the
word ‘Good’. If a cell matches that condition, it is formatted with a dark green font and a
pale green fill.

Let’s copy that conditional formatting down the rest of the column. You can see that the cells that
contain the word ‘Good’ are now formatted as we defined, but the cells containing the word ‘Poor’
are not. Let’s add another conditional format rule.

This time, we’ll select Manage Rules, because we are going to add another rule to our existing
rule. The new rule will be the same as the previous one, except that it looks for a match
with the word ‘Poor’ instead, and formats those matching cells with red text and a pink
background fill. And once again, we copy that down the column. Now all the cells that contain the
word ‘Poor’ are formatted as red text with a pink cell fill.

Let’s now have a quick look at how to use the COUNTIF function.
COUNTIF is one of the statistical functions provided in Excel. You can use it to count the number of
cells that meet a certain criterion such as the number of instances where an employee’s name
appears in a list of sales invoices, or the number of occasions a particular part number appears in
a list of purchase orders.

Let’s switch to the vehicle toy sales worksheet. Suppose you want to find out how many of the
sales orders in the list went to customers based in the United Kingdom. We enter the formula you
see in cell AD7.
Note that when we are using text as a criterion, we have to enclose the text in quotation marks.
The result shows there were 6 sales orders in the UK. If you wanted to discover the same thing for
French customers, then you would just edit the existing formula, or copy it and then edit it. You can
see there were 14 orders for French customers. Notice that this time the text entered was in
lowercase, and it still works, so text criteria in this function are not case-sensitive. And let’s do the
same for United States customers; there are 41 orders to customers based in the States.
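
The counting rule, including the case-insensitive text match, can be sketched in Python. The country list is hypothetical and the function name countif is only a stand-in for the Excel function.

```python
def countif(values, criterion: str) -> int:
    """Count the values equal to the criterion text, ignoring case."""
    target = criterion.lower()
    return sum(1 for value in values if str(value).lower() == target)


countries = ["UK", "France", "USA", "France", "USA", "USA"]  # hypothetical column
print(countif(countries, "UK"))
print(countif(countries, "france"))  # lowercase criterion still matches
```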

There is also a related function called COUNTIFS, which applies criteria to cells across multiple
ranges and counts the number of occasions where all criteria have been met. This removes the need
to combine multiple COUNTIF functions in a long and complex single formula. The COUNTIFS function is
supported in Excel 2007 and later, including Excel for Microsoft 365 and Excel for the web.
Now let’s take a quick look at how to use the SUMIF function, which is a very commonly used
mathematical function in Excel. You use the SUMIF function to sum the values within a specified
range that meet specified criteria. For example, you might want to add up only the salaries that
are over a specified salary level, or you might want to find the total of all sales of a particular
product category. We’ll enter the formula seen in cell AD10.

This formula will add up each of the sales orders that have a total of more than 3,000 dollars.
Again, notice that because we have used a comparison operator, that is the ‘greater than’
operator, we must enclose the criterion in quotes. If we specify a criterion that is only a number,
we don’t need to enclose it in quotes. So, the total sum of all orders that were over 3,000 dollars is almost
470,000 dollars.
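
To show why the operator criterion is written as quoted text while a bare number is not, here is a simplified Python sketch. The function name sumif and the sample totals are hypothetical, and only numeric comparisons are handled.

```python
import operator
import re

COMPARISONS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "=": operator.eq, "<>": operator.ne,
}


def sumif(values, criterion):
    """Sum the numbers that satisfy a criterion (a bare number or a quoted comparison)."""
    if isinstance(criterion, (int, float)):
        return sum(v for v in values if v == criterion)
    match = re.fullmatch(r"\s*(>=|<=|<>|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*", criterion)
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    compare = COMPARISONS[match.group(1)]
    threshold = float(match.group(2))
    return sum(v for v in values if compare(v, threshold))


sales = [2500.0, 3000.0, 4200.0, 5100.0]  # hypothetical order totals
print(sumif(sales, ">3000"))  # operator plus number, passed as text
print(sumif(sales, 3000))     # a bare number needs no quotes
```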

You can also use wildcards such as ‘question mark’ (?) and ‘asterisk’ (*) when searching for partial
matches, and you can also specify to extract values from a different column than the column
where you have specified the criteria. For example, if we enter the formula you can see in cell
AD13, it will sum all the car sales in column E, for only those products in the ‘productline’ column
that end in ‘cars’.
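
A wildcard criterion combined with a separate sum column can be sketched in Python as follows. The product lines, sales values, and function names are hypothetical, and this simplified version does not handle Excel's tilde (~) escape character.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate * (any run of characters) and ? (any one character) to a regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def sumif_wildcard(criteria_range, pattern, sum_range):
    """Sum sum_range values whose matching criteria_range text fits the pattern."""
    regex = wildcard_to_regex(pattern)
    return sum(
        amount
        for text, amount in zip(criteria_range, sum_range)
        if regex.fullmatch(text)
    )


product_lines = ["Classic Cars", "Motorcycles", "Vintage Cars", "Trucks and Buses"]
sales = [100.0, 200.0, 300.0, 400.0]
print(sumif_wildcard(product_lines, "*cars", sales))
```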

There is also a related function called SUMIFS that you can use to sum cells based on multiple
criteria. This removes the need to combine multiple SUMIF functions in a long and complex single
formula. The SUMIFS function is supported in Excel 2007 and later, including Excel for Microsoft 365
and Excel for the web.

Video: Using VLOOKUP and HLOOKUP Functions

Now that we’ve learned how to use the IF, IFS, COUNTIF, and SUMIF functions, in this video we’ll
look at how to use the VLOOKUP and HLOOKUP reference functions.

VLOOKUP is one of the most commonly used reference-type functions in Excel, and it enables you
to find data referenced in a lookup table.
It stands for Vertical Lookup and therefore is a useful tool to use when you want to find something
in a table or a range by row. Shortly, we will look at HLOOKUP, which stands for Horizontal Lookup,
which looks for data by column instead.
VLOOKUP works by using a common shared key between the source data and the lookup data in
the lookup table.
A typical VLOOKUP formula would look like: [=VLOOKUP(B3,A2:B12,2,FALSE)]

A. B3 is the lookup value, that is, the value or word you are looking for.
B. A2:B12 is the lookup table or range, that is, the table array or range of cells that contains
the lookup value. In a formula, Excel references this as ‘table_array’. The lookup table can
be on the same worksheet or in another separate worksheet.
C. 2 is the lookup column number, that is, the number of the column in the lookup table that
contains the value you are looking for. In a formula, Excel references this as
‘col_index_num’.
D. FALSE is an optional parameter that determines whether the match found has to be exact
(denoted by FALSE), or can be approximate (denoted by TRUE). In a formula, Excel
references this as ‘[range_lookup]’. The square brackets around this argument in the
formula signify that it is an optional argument, whereas the others are required
arguments of a VLOOKUP formula. If you don’t specify the optional FALSE or TRUE
parameter in your formula, it will default to TRUE; that is, an approximate match is used,
so specify FALSE whenever an exact match is required. You can also use the number 0
instead of FALSE, and the number 1 instead of TRUE.
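The four arguments described above can be sketched in Python as follows. This is a simplified model of the lookup logic, not Excel’s implementation: the table is a list of rows, `range_lookup` is passed explicitly on every call, text is compared case-sensitively (Excel ignores case), and a missing match is returned as the text "#N/A". Apart from the Corvette price quoted in this lesson, the sample values are hypothetical.

```python
from bisect import bisect_right

def vlookup(lookup_value, table_array, col_index_num, range_lookup):
    """Look up a value in the first column and return a cell from that row."""
    first_column = [row[0] for row in table_array]
    if not range_lookup:
        # Exact match: return from the first row whose key equals the value.
        for row in table_array:
            if row[0] == lookup_value:
                return row[col_index_num - 1]   # col_index_num counts from 1
        return "#N/A"
    # Approximate match: assumes the first column is sorted ascending and
    # picks the last row whose key is less than or equal to the value.
    position = bisect_right(first_column, lookup_value)
    if position == 0:
        return "#N/A"
    return table_array[position - 1][col_index_num - 1]

# Hypothetical price list; only the Corvette price comes from the lesson.
price_list = [["Camaro", 25000], ["Corvette", 45705], ["Mustang", 30000]]
corvette_price = vlookup("Corvette", price_list, 2, False)

# Hypothetical horsepower bands for an approximate match.
hp_bands = [[0, "Low HP"], [150, "Medium HP"], [250, "High HP"]]
band_for_200 = vlookup(200, hp_bands, 2, True)
```

Approximate matching is mainly useful for banded tables like the second one, where each row marks the lower limit of a range.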

OK, now let’s see the VLOOKUP function in action. In the car sales worksheet, suppose we wanted
a quick price list of our favorite cars. The first thing we need to do is put the column containing
the value we want to search for in the leftmost column, as VLOOKUP requires this. Then we can
delete the original column. We then enter the formula seen in cell V16, which looks for the
word ‘Corvette’ in the table array from cell A2 to G156, and then looks for the value in the fifth
column – in this case, the ‘Price’ column – that matches the row containing ‘Corvette’ and returns
an exact value of 45,705 dollars. Note that in this example, we are using a part of our existing data
table as the lookup table, or table array.

Let’s format that as US currency. Then we’ll format it to zero decimal places. In fact, rather than
use the reference A25 in the formula, it will be easier to use the reference to the word Corvette in
the mini table in this worksheet, where our list of favorite cars is. So that is V5, and the formula
still works.
Now, let’s copy that formula up to the favorite car table, above it in the worksheet. But there’s a
problem, because when we copied the formula, the cell references changed. This happened
because, as we learned earlier in this course, the default state of cell references is relative, and we
want them to be absolute in this case. So, let’s undo that copy operation. To make the cell
references absolute, we need to add dollar symbols to all the cell references in the formula.
This can either be done manually, or you can put the cursor in each cell reference in turn in the
formula and press F4 each time, to automatically add the dollar symbols.

Let’s try and copy the formula again and this time it works.

If we use the Fill Handle on cell W5 to copy it down to the rest of the cars, it doesn’t work; in fact,
we end up with the same result in every cell. Why? Because every copy references the same
lookup value cell, since we used an absolute reference.

All we need to do now is modify the formula to remove the absolute reference for just the row
part of the lookup value, by removing the dollar symbol before the row number. So in cell W5
we change $V$5 to $V5; then when we drag the Fill Handle down it will copy the formula correctly,
and all the prices will be changed to reflect their correct retail price.
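The difference between $V$5, $V5, and V5 when a formula is copied can be sketched in a few lines of Python. This is only a model of the adjustment rule: a dollar symbol locks the part of the reference that follows it, and anything unlocked moves by the copy offset. The function names are assumptions made up for this example.

```python
import re

# Optional '$', column letters, optional '$', row number.
_REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def column_to_number(letters):
    """Convert column letters to a number, for example 'A' -> 1, 'AA' -> 27."""
    number = 0
    for letter in letters:
        number = number * 26 + (ord(letter) - ord("A") + 1)
    return number

def number_to_column(number):
    """Convert a column number back to letters, for example 27 -> 'AA'."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def shift_reference(reference, rows_down, columns_right):
    """Return the reference as it would read after copying by the offset."""
    column_lock, column, row_lock, row = _REFERENCE.fullmatch(reference).groups()
    if not column_lock:      # no '$' before the column: it is relative
        column = number_to_column(column_to_number(column) + columns_right)
    if not row_lock:         # no '$' before the row: it is relative
        row = str(int(row) + rows_down)
    return f"{column_lock}{column}{row_lock}{row}"

# Copying one row down, as the Fill Handle does for each step.
fully_absolute = shift_reference("$V$5", 1, 0)   # stays $V$5
row_relative = shift_reference("$V5", 1, 0)      # becomes $V6
fully_relative = shift_reference("V5", 1, 0)     # becomes V6
```

This is why the fully absolute version repeats the same result in every row, while $V5 keeps the column fixed and lets the row follow each car in the list.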

Lastly, to show that the two tables are now connected by this VLOOKUP function, if we change the
retail price for the Chevrolet Corvette in the main data table in cell E25, the price will also change
in the favorite cars price list.

Let’s now take a quick look at the HLOOKUP function, which as we mentioned earlier, does the
same thing, and works in virtually the same way, as the VLOOKUP function, but it looks for data in
columns, rather than rows.

So, HLOOKUP looks for a word or value in the top row of a table, and then returns a value in the
same column from a row specified in the table array. Therefore, you would use HLOOKUP if your
comparison values were situated in a row along the top of a data table.

In contrast, you would use VLOOKUP if your comparison values were located in a column to the left
of the data you want to find; as they were in the previous task. Of the two functions, VLOOKUP is
used far more frequently than HLOOKUP, because of the nature of most data tables.

The syntax for HLOOKUP is identical to that of VLOOKUP except that you specify a row index
number, referenced in a formula by Excel as ‘row_index_num’. This indicates the number of the
row in the lookup table that contains the value you are looking for.

Let’s create a small lookup table on the right-hand side of our main data table; a few columns have
been hidden in this worksheet to make viewing a little easier. So we’ve now got Low HP, Medium
HP, and High HP in the top row of the lookup table. Next, we’ll add Wingdings symbols as ratings
for the 3 horsepower levels, 1 sad face for the low horsepower rating, 2 neutral faces for the
medium rating and 3 happy faces for the high horsepower rating.

Now, let’s add a new column to the right of the HP Level column, and call it HP Rating. Then in cell
L2 we’ll enter the HLOOKUP function. This function will look for the value in cell K2, which in this
case is ‘Medium HP’, and it will look for it in the cell range from Y21 to AA22, which is our little
lookup table, and it will return the answer it finds in row 2 of the table under Medium HP, using
an exact match. Note that we’ve used some absolute references in this formula too.
Notice that what is returned is the text ‘KK’, so we need to format the cell using the Wingdings
font.

Now, when we double-click the Fill Handle, the whole column shows the HP Rating symbols
relevant to each row’s HP Level value. And we’re done.
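The HLOOKUP steps above can be sketched in Python as an exact-match search along the top row of a small table. This is an illustration of the logic only. The letters in the second row are the plain text that Wingdings is assumed to render as faces: ‘KK’ comes from this lesson, while ‘L’ and ‘JJJ’ are assumptions chosen to match the one sad face and three happy faces described earlier.

```python
def hlookup(lookup_value, table_array, row_index_num):
    """Find lookup_value in the top row and return a cell from that column."""
    top_row = table_array[0]
    for column, heading in enumerate(top_row):
        if heading == lookup_value:
            # row_index_num counts from 1, with row 1 being the top row.
            return table_array[row_index_num - 1][column]
    return "#N/A"

# The lookup table: headings in row 1, rating text in row 2.
rating_table = [
    ["Low HP", "Medium HP", "High HP"],
    ["L", "KK", "JJJ"],
]

# One HP Level value per data row, as in the HP Level column (hypothetical).
hp_levels = ["Medium HP", "High HP", "Low HP"]
hp_ratings = [hlookup(level, rating_table, 2) for level in hp_levels]
```

The list comprehension plays the role of double-clicking the Fill Handle: the same lookup is repeated for every row’s HP Level value.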

Although VLOOKUP and HLOOKUP are still regularly used as the de facto functions for lookup
references in Excel, there is a newer function called XLOOKUP. This function is only supported in
Excel desktop versions from Excel for Microsoft 365 onward, in Excel for the web, and in Excel
for iPad and iPhone and Excel for Android tablets and phones.

XLOOKUP is an improved and combined version of VLOOKUP and HLOOKUP together. It can work in
any direction; vertically or horizontally. It also uses separate lookup array and return array values,
instead of a single table array and a column or row index number.
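To show what separate lookup and return arrays mean in practice, here is a minimal Python sketch of an exact-match XLOOKUP. It models only the first four arguments (`lookup_value`, `lookup_array`, `return_array`, `if_not_found`) and leaves out the match and search modes. Apart from the Corvette price quoted in this lesson, the sample values are hypothetical.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    """Return the item of return_array aligned with the first exact match."""
    for candidate, result in zip(lookup_array, return_array):
        if candidate == lookup_value:
            return result
    return if_not_found

# Two parallel ranges; they can be columns or rows, in any position.
models = ["Camaro", "Corvette", "Mustang"]
prices = [25000, 45705, 30000]

price_of_corvette = xlookup("Corvette", models, prices)
# Because the arrays are independent, the lookup can also run "backwards",
# which VLOOKUP cannot do without moving the key column to the left.
model_at_45705 = xlookup(45705, prices, models)
missing = xlookup("Viper", models, prices, "Not listed")
```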

Hands-on Lab 6: Filtering and Sorting Data using Functions for Data
Analysis

In this lab, first you will learn how to use the Filter and Sort tools in Excel to filter and sort our data
to enable us to control what information is displayed, and how it is displayed in our worksheets.
Next, you will learn how to use some of the most common functions a Data Analyst might use;
namely IF, IFS, COUNTIF, and SUMIF. Finally, you will learn how to use the VLOOKUP and HLOOKUP
functions in Excel to reference data contained in both vertical and horizontal lookup tables.

Objectives
After completing this lab, you will be able to:
Use the Filter and Sort tools
Use IF, IFS, COUNTIF, and SUMIF functions for data analysis
Use the VLOOKUP and HLOOKUP reference functions

Reading: Summary and Highlights

In this lesson, you have learned the following information:

• Before shaping your data, you need to visualize the final output, and ask yourself the
following questions: How big is the dataset? What type of filtering is required to find the
necessary information? How should the data be sorted? What type of calculations are
needed?

• There are several advantages to formatting your data as a table:
I. Automatic calculations even when filtering
II. Column headings never disappear
III. Banded rows to make reading easier
IV. Tables will automatically expand when adding new rows

• The most basic way of shaping your data is to sort and filter it:
I. Sorting data helps you to organize it by specified criteria, such as numerically,
alphabetically, or chronologically.
II. Filtering your data makes it easier to control what data is displayed and what is
hidden, based on filtered fields.

• Functions in Excel are arranged into multiple categories, including mathematical, statistical,
logical, financial, and date and time-based. Common functions for a data analyst include IF,
IFS, COUNTIF, SUMIF, VLOOKUP, and HLOOKUP.

Module 6 - Using Pivot Tables
In this module, you will learn how to create pivot tables in Excel and use several pivot table
features to analyze data.

Video: Introduction to Creating Pivot Tables in Excel

Now that we’ve learned how to use the VLOOKUP and HLOOKUP functions, in this video we’ll look
at how to create and use Pivot Tables in Excel. We’ll first look at how to format our data as a table,
then how to create Pivot Tables and use fields in a Pivot Table to analyze data, and lastly we’ll see
how to perform calculations in a Pivot Table.

Having a worksheet full of informational data is all very well, but to really get some use out of it we
need to analyze it from different perspectives to find answers to questions related to the data.
Now, we’ve already used features such as filters and formulas to draw mathematical and logical
conclusions about our data but not all questions can be answered easily using filters and formulas
alone.

In order to obtain usable and presentable insights into your data you need something else and that
something else is Pivot Tables. Pivot Tables provide a simple and quick way, in spreadsheets, to
summarize and analyze data, to observe trends and patterns in your data and to make
comparisons of your data. A Pivot Table is dynamic, so as you change and add data to the original
dataset on which the Pivot Table is based, the analysis and summary information changes too. A
Data Analyst can use Pivot Tables to draw useful and relevant conclusions about, and create
insights into, an organization’s data in order to present those insights to interested parties within
the company.

Before you start to create a Pivot Table in Excel, it can be very helpful to first format your data as a
table. The reason for this is not only to make it more organized and defined and to add table styles
to your data, but primarily because it makes it a lot easier to add records to the dataset.

In the car sales worksheet, let’s first select any cell within the data, and then on the Home tab, in
the Styles group, choose ‘Format as Table’. Then choose a style from the gallery. Note that Excel
automatically knows the boundaries of our data range, but we can change this if we need to. And
ensure you select ‘My table has headers’, if indeed it does.
After you click OK and the data has been formatted as a table, note the filter drop-downs at the
top of each column – these are automatically added when you format as a table.

If we now scroll down to the bottom of the table and start adding another row of data for another
vehicle, note that when we press Tab or Enter, the row is automatically formatted and included as
part of our table. OK, now let’s see how to create a basic Pivot Table, and how to use fields to arrange
data in a Pivot Table.
Just before we do that, there are a few things you should use as a checklist to ensure your data is
in a fit state to make a Pivot Table from, and these are:
• Format your data as a table for best results
• Ensure column headings are correct, and there is only one header row, as these column
headings become the field names in a Pivot Table
• Remove any blank rows and columns, and try to eliminate blank cells also.
• Ensure value fields are formatted as numbers, and not text
• Ensure date fields are formatted as dates, and not text
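Parts of this checklist can be expressed as simple automated checks. The Python sketch below is one possible way to test a dataset before building a Pivot Table from it, assuming the data has already been read into a list of headings and a list of rows. The function name, the field names, and the sample rows are all hypothetical.

```python
from datetime import date

def check_pivot_ready(headers, rows, value_fields, date_fields):
    """Return a list of problems that would hinder building a Pivot Table."""
    problems = []
    if any(heading is None or str(heading).strip() == "" for heading in headers):
        problems.append("blank column heading")
    if len(set(headers)) != len(headers):
        problems.append("duplicate column heading")
    for number, row in enumerate(rows, start=2):    # row 1 holds the headings
        if all(cell in (None, "") for cell in row):
            problems.append(f"row {number} is blank")
            continue
        record = dict(zip(headers, row))
        for field in value_fields:
            # Numbers stored as text cannot be summed by a Pivot Table.
            if not isinstance(record[field], (int, float)):
                problems.append(f"row {number}: {field} is not a number")
        for field in date_fields:
            if not isinstance(record[field], date):
                problems.append(f"row {number}: {field} is not a date")
    return problems

# Hypothetical sample data with one blank row and one price stored as text.
headers = ["Model", "Price", "Order Date"]
rows = [
    ["Corvette", 45705, date(2003, 7, 1)],
    ["", "", ""],
    ["Camaro", "25000", date(2003, 8, 1)],
]
problems = check_pivot_ready(headers, rows, ["Price"], ["Order Date"])
```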

In the worksheet, we can just select any cell in the table. Then, on the Insert tab, we click
PivotTable. Note that in the ‘Select a table or range’ box, the table name – Table1 – is already
entered for us. If we hadn’t just formatted this data as a table, we would specify the cell range
here instead.

Under that, we need to decide whether we want to create the Pivot Table on a separate new blank
worksheet, or on this worksheet; a new worksheet is the default and is the most commonly used
option. So, a new blank worksheet opens, displaying some basic Pivot Table instructions in the
graphic on the left of the worksheet, and a ‘PivotTable Fields’ pane on the right. You can rename
the worksheet for the Pivot Table if you wish.

To build the Pivot Table report we need to add some fields from the top of the PivotTable Fields
pane, to one or more of the sections in the bottom part of the pane. For example, if we want to
find out the total sales for each model of car, let’s drag the Manufacturer field to the Rows section
of the report and then we’ll drag the Model field there too. But this isn’t really the way we want it
to look, so we’ll drag the Manufacturer field to appear at the top of the Rows section above the
Model, which makes more sense with our data. Next, we’ll add the Price field to the Columns
section but again that really isn’t the way we want to view the data, so we’ll drag Price to the
Values section instead, which makes a lot more sense and looks a lot better. Next, we’ll add the
Unit Sales field to Values too, so now we can see both the individual price for each model and the
number of unit sales of each model. Let’s add the Vehicle-type field to Columns, but that doesn’t
seem very useful, so let’s remove that field, which we can do in two ways: either by using the
drop-down menu or, if we undo that, by simply dragging the field out of the Columns
section, either to the left over the worksheet, or to the top over the fields list above.

Let’s now look at how to perform a simple calculation in a Pivot Table. If we look in the ‘Sum of
Price’ column in our Pivot Table, we can see that the figures are formatted as General. So first, let’s
change the format for these figures to US currency. This can be done by modifying the value field
settings for the field in the relevant section of the PivotTable Fields pane. We’ll format the field as
US dollars and show no decimal places. Next, we’ll add a calculated field from the ‘PivotTable
Analyze’ tab, using the ‘Fields, Items & Sets’ button. We want this field to calculate the total sales
for each model by multiplying the price by the number of unit sales. When we create and add this
formula, it gets added to the PivotTable Fields pane, as a field called Total Model Sales. And we can
change the format to make it US dollars again. A new column called ‘Sum of Total Model Sales’ has
now appeared in the Pivot Table in our worksheet.
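The Pivot Table built above can be approximated in Python as a group-and-sum over the source rows. This is a rough sketch under stated assumptions: rows are grouped by Manufacturer and then Model, Price and Unit Sales are summed, and the calculated field multiplies the two summed values, which is how Excel calculated fields are generally understood to behave. The unit sales figures and the Camaro row are hypothetical.

```python
from collections import defaultdict

def pivot(records):
    """Group by (Manufacturer, Model), sum the values, add a calculated field."""
    totals = defaultdict(lambda: {"Price": 0, "Unit Sales": 0})
    for record in records:
        key = (record["Manufacturer"], record["Model"])   # the Rows section
        totals[key]["Price"] += record["Price"]           # Sum of Price
        totals[key]["Unit Sales"] += record["Unit Sales"] # Sum of Unit Sales
    summary = {}
    for key in sorted(totals):
        sums = totals[key]
        # Calculated field, applied to the summed values for each group.
        summary[key] = {**sums,
                        "Total Model Sales": sums["Price"] * sums["Unit Sales"]}
    return summary

# Hypothetical source rows; only the Corvette price comes from the lesson.
car_sales = [
    {"Manufacturer": "Chevrolet", "Model": "Corvette", "Price": 45705, "Unit Sales": 10},
    {"Manufacturer": "Chevrolet", "Model": "Camaro", "Price": 25000, "Unit Sales": 20},
]
summary = pivot(car_sales)
```

With one source row per model, as here, the summed price is simply the model’s price, so the calculated field gives price multiplied by unit sales.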

Video: Viewpoints: Pivot Tables (3:27)

In this video we will listen to several data professionals discuss their experience using pivot tables
to analyze data. What are your experiences using pivot tables to analyze data?

Asha Barnes
My experience using pivot tables in Excel is extensive. I can use them all the time. The thing to
keep in mind is that you can sum, average, and count easily. You can set it to group-by so people
can choose what the parameters are at the top. It's great if you've got a couple of thousand
records all the way up to whatever Excel can handle. So, a pivot table is just a real simple way of
manipulation without having to do any actual querying or development language.

Erin Huang
I once had a huge ecommerce sales dataset. I needed to analyze the KPIs, including gross merchandise
volume and take rate. However, I could only generate limited insights if I stayed at a high level. With
pivot tables I was able to group the data in terms of countries, type of stores, type of products,
which enabled me to view the data and analyze the key KPIs at different levels of granularity.

Joye Sistruck
I use pivot tables and we use pivot tables in our firm, especially during audits to assist us and help
us to kind of drill down on the data because what a pivot table does is, it helps you to take a large
set of data and whittle it down to something that's meaningful. So, in the case of audits, a client
might have, you know, $500,000 worth of maintenance and repair bills that are made up of three
hundred invoices. But we don't want to see every invoice for every dollar; we want to see the high
dollar invoices, so we're going to use that pivot table to narrow it down to the invoices that
actually are going to have the highest level of impact on the financial statement.

Kevin Mc
Much like Excel, pivot tables are a great way to understand your data quickly and effectively. Being
able to just open up an Excel sheet, put it into a pivot table, drag and drop things in to get a sense
of what the numbers look like, what the values are, really can help you get a good sense of the
data in order to then start to build out something a little bit more robust. Being able to understand
the fields, what they mean, what they look like. These are all things that can help you at the start
of a project, as you're looking to do your analysis.

Richie Zitomer
Pivot tables are incredibly useful to get a quick view of your data and to look at multiple levels of
your data in a very quick and clean way. It's just very, very easy to create a pivot table on a set of
raw data, aggregate it by some level of interest, be it country, be it you know country the user is
from, be it the year the user joined, or anything else, be it something related to time. It's really
good for quickly seeing and understanding some of the more high-level summaries that are hidden
within your data.

Video: Pivot Table Features (9:37)


Now that we’ve learned how to create and use Pivot Tables in Excel, in this video we’ll look at
some other features that we can use with Pivot Tables, including Recommended Pivot Tables,
Filters, Slicers, and Timelines.

First, let’s look at Recommended Pivot Tables, which isn’t exactly a feature as such; it’s really more
of a list of suggested different combinations of data that could be used when creating a Pivot
Table. These recommendations are based on the data we select in the worksheet, and they are a
great way to get started creating Pivot Tables if you don’t have much experience with them yet.

For example, in the vehicle toy sales worksheet, if we select column B, which contains data about
the quantity of items ordered, and then choose Recommended Pivot Tables from the Insert tab,
we are presented with a list of potential data combinations related to the order quantity
information. However, if we select column F, which contains Order Size information, then the
recommended pivot table list changes to reflect that data. Let’s select the third one down, which is
the sum of sales by territory; because that sounds like something we could get some useful insight
from, by presenting it in a pivot table.

Note that a new worksheet is opened containing the recommended pivot table, and a new pane
opens on the right, called PivotTable Fields. Let’s rename the worksheet to something more
meaningful. In the PivotTable Fields pane, you can see that some fields have already been added
to the Rows and Values areas.

Although it’s a recommended pivot table, we can still make it our own, by adding more fields for
example. So, let’s add the Productline item to the Columns area using drag and drop. Now we have
columns for each of the product lines in our pivot table, such as motorcycles, ships, and trains. In
the pivot table, we can manually expand any field we want to view its contents. Here we can see
that the order dates are located underneath the territory names in our pivot table.
Note that this matches the order of the fields in the Rows area of the PivotTable Fields pane. We
can manually collapse each of the fields too but we also have the option of expanding all the fields
at once and collapsing them all too.

The next feature we will delve into is pivot table filtering. Pivot table filters work in much the same
way as the standard filters we used earlier in the course. Note that we already have some in-built
filtering in this pivot table. For example, the Row Labels header is a filter, and we can filter on any
of the listed territories, such as Japan. Just like standard filters, it’s very simple to clear a filter in a
pivot table. We also have a Column Labels filter, allowing us to filter on any of the productline
items in this pivot table; for example we could show data only for the trains product. We also have
the option of adding the Productline field as a standard filter instead of a column heading, by
dragging it to the Filters area in the PivotTable Fields pane and we can then use it as a standard
filter, as we have done earlier in this course.

The filter also allows us to select multiple filter items. But because it is now being used as a
standard filter rather than a column header, we can’t see the split of the information on these two
product lines; we just see a combined total.
When we had the filter as a column header, the information on each product line was presented
separately in each column. Let’s display all the field totals again.

And we’ll drag the productline field back to the Columns area where it was previously, so we can
see the split of our different product lines in the pivot table. The next pivot table feature we will
look at is Slicers.

Slicers are essentially on-screen graphical filter objects that enable you to filter your data using
buttons. Slicers make it easy to perform quick filtering of your pivot table data, and they also
display the current filter state, making it easier for you to know, and see, what data is currently
being shown, and which is being hidden, by the filter.

For example, let’s remove the productline field from the pivot table by dragging it out of the
PivotTable Fields pane. Then, from the PivotTable Analyze tab, we click Insert Slicer and
choose the Territory field as our slicer. We can see that the slicer can be freely moved around
anywhere on the worksheet, and it contains buttons for each of the territory items, such as EMEA,
North America, and Japan. We can also select the Multi-Select button to filter on multiple territories
if we wish. We can click the Clear Filter button to clear all slicer filters.

Let’s add another slicer to our worksheet for the productline field. However, be sure to select a cell
in the pivot table first, because if you don’t, then the insert slicer button won’t work. Note that
slicers can also be added from the Filters group on the Insert tab as well as from the PivotTable
Analyze tab. We’ll select the Productline field this time for our slicer, and drag it near the top of the
worksheet. As before, we can select only one slicer item, or we can turn on Multi-Select and
choose several items to filter on in the slicer.

Note that when you use multi-select filtering, when you select an item, you are in fact filtering it
out; that is, you are defining which items will NOT be displayed in the pivot table. This is the
opposite behavior to when you are selecting single items in a slicer. So now we are displaying only
‘classic cars’, ‘trains’, and ‘trucks and buses’ products for the EMEA and North America territories.

Now let’s clear those slicer filters and put the productline field back in the Columns area of the
pivot table, so it’s ready for the next feature we will explore. And let’s move these slicers out of
the way, further down the worksheet. The last useful feature for pivot tables we are going to look
at is Timelines.

A Timeline is another type of filter tool that enables you to filter specifically on date-related data in
your pivot table.
This is a much quicker and more effective way of dynamically filtering by date, rather than having
to create and adjust filters on your date columns. We can add a Timeline for our pivot table either
from the PivotTable Analyze tab or from the Insert tab.

Again, ensure you select any cell in the pivot table first. We’ll select the Orderdate field as our
Timeline filter.
Then we can drag it up the worksheet and enlarge it. The default for this timeline is to display data
by month, but you can also filter by days or by quarters. You can select a single quarter; or you
can select a range of quarters. In this case, we’ll select twelve months between quarter 3 of 2003
and quarter 2 of 2004. You use the Clear Filter button to clear a timeline filter. You can also filter by
years. For example, here we have selected 2003 only. And you can combine slicers and timelines
as filters in a pivot table. For example, here we can filter the slicers to display only data for trains,
in the EMEA and North America territories, and only in the year 2003.

And if we filter on the year 2004 instead, you’ll see that there is no data being displayed; meaning
that there were no sales of train products in 2004 in either the EMEA or the North America
territories.
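Combining slicers and a timeline amounts to applying several filters to the source rows before they are summarized. The Python sketch below illustrates that idea with a product line filter, a territory filter, and a year filter. The field names follow the worksheet columns mentioned in this lesson, but the function names and every order row are hypothetical.

```python
from datetime import date

def filter_orders(orders, productlines=None, territories=None, year=None):
    """Keep only orders that pass every active filter (None means no filter)."""
    selected = []
    for order in orders:
        if productlines is not None and order["productline"] not in productlines:
            continue                       # hidden by the productline slicer
        if territories is not None and order["territory"] not in territories:
            continue                       # hidden by the territory slicer
        if year is not None and order["orderdate"].year != year:
            continue                       # hidden by the timeline
        selected.append(order)
    return selected

def sum_sales_by_territory(orders):
    """Summarize the remaining orders, as the pivot table would."""
    totals = {}
    for order in orders:
        totals[order["territory"]] = totals.get(order["territory"], 0) + order["sales"]
    return totals

# Hypothetical orders, not taken from the course workbook.
orders = [
    {"productline": "Trains", "territory": "EMEA", "orderdate": date(2003, 5, 7), "sales": 1200},
    {"productline": "Trains", "territory": "North America", "orderdate": date(2003, 9, 1), "sales": 800},
    {"productline": "Trains", "territory": "Japan", "orderdate": date(2003, 3, 2), "sales": 500},
    {"productline": "Ships", "territory": "EMEA", "orderdate": date(2004, 1, 15), "sales": 900},
]

regions = {"EMEA", "North America"}
trains_2003 = sum_sales_by_territory(filter_orders(orders, {"Trains"}, regions, 2003))
trains_2004 = sum_sales_by_territory(filter_orders(orders, {"Trains"}, regions, 2004))
```

An empty result for 2004 corresponds to the pivot table showing no data once all three filters are active.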

Timelines and Slicers have their own tabs in the ribbon when you select them, and their properties
can be modified to change how they look and how they work. For example, let’s change this
Timeline to a light green shade and let’s change this Slicer to a nice orange color.

And lastly, to remove a timeline or slicer, you can either select it and press the Delete key or right-
click it and choose Cut.

Reading: Summary and Highlights


In this lesson, you have learned the following information:

Pivot Tables:

• To obtain usable and presentable insights into your data you need to use Pivot Tables.
• Pivot tables provide a simple and quick way to summarize and analyze data, to observe
trends and patterns in your data and to make comparisons of your data.
• Pivot tables are dynamic, so as you change and add data to the original dataset on which
the pivot table is based, the analysis and summary information changes too.
• A Data Analyst can use pivot tables to draw useful and relevant conclusions about, and
create insights into, an organization’s data in order to present those insights to interested
parties within the company.

Use this Pivot Table checklist to ensure your data is in a fit state to make a Pivot Table:
• Format your data as a table for best results.
• Ensure column headings are correct, and there is only one header row, as these column
headings become the field names in a Pivot Table.
• Remove any blank rows and columns, and try to eliminate blank cells also.
• Ensure value fields are formatted as numbers, and not text, and ensure date fields are
formatted as dates, and not text.

Arranging Pivot Tables with Filters and Recommended Tables:


• You use the Pivot Table Fields pane to add and arrange data fields in your pivot table.
• Recommended Pivot Tables are a list of suggested different combinations of data that could
be used when creating a Pivot Table, based on the data selected in the worksheet.

Filters and Slicers:

• Slicers are on-screen graphical filter objects that enable you to filter your data using
buttons, which makes it easier to perform quick filtering of your pivot table data.
• Timelines are another type of filter tool that enable you to filter specifically on date-related
data in your pivot table. This is a much quicker and more effective way of dynamically
filtering by date, rather than having to create and adjust filters on your date columns.

Final Assignment: Final Project


In this lesson you will:

Apply skills learnt in the course to clean and prepare data in Excel spreadsheets.
Demonstrate how to analyze data using Pivot Tables in Excel.
Evaluate the projects of your peers to provide a project grade.

Introduction to Final Project - Part 1


Now that you are equipped with the skills to clean and prepare data using Excel, you will have the
opportunity to practice and apply your skills on a real-world data set.

In this scenario, you are a Junior Data Analyst who has recently joined a growing company that
markets classic and collector cars, motorcycles, and other vehicles to customers around the world.
You have been tasked with cleaning data that has been imported from the sales department. The
data is in comma-separated value (CSV) format and needs to be cleaned before you begin your
analysis. In Part 2, you will be tasked with running an analysis of the sales data using pivot tables.

Grading Criteria

After completing both Project - Part 1 and Project - Part 2, you will complete a Peer-graded Final
Assignment. Your grade will be based on completing the following tasks:
In addition to the tasks below, you are required to submit the following URL:

- Excel file named sales_data_sample_PART1.XLSX for Part 1 of the final assignment.

Part 1:
Task 1: Save the CSV file as an XLSX file
Task 2: Widen all columns so that all the data is visible
Task 3: Remove all the empty rows
Task 4: Remove duplicated records
Task 5: Sort column A
Task 6: Format the numbers
Task 7: Remove all double-spaces from the data
Task 8: Combine the two columns into a single column
Task 9: Save the Workbook

Yes, column A is ordered from lowest to highest value.

Yes, both were changed to "number" and the decimals adjusted as instructed.

Yes, both columns were combined into one. Former columns with only name and last name were
deleted.

Yes, the calculations are below the table in the first spreadsheet.

Yes, sorted from highest to lowest.

