IBM Excel Basics for Data Analysis

The document provides an overview of using Microsoft Excel for data analysis, covering spreadsheet basics, terminology, and common applications. It emphasizes the advantages of spreadsheets over manual calculations, including data organization and analysis capabilities. Additionally, it includes keyboard shortcuts for efficient navigation and data manipulation within Excel.

IBM - Excel Basics for Data Analysis

Company Internal
Contents
Introduction to Spreadsheets for Data Analysis
Introduction to Spreadsheets
Spreadsheet Basics
Excel Keyboard Shortcuts

Introduction to Spreadsheets for Data Analysis
Introduction to Spreadsheets

There are several spreadsheet applications available in the marketplace; some of them are more widely known and
used than others, and some are free, while others need to be paid for.
By far the most commonly used spreadsheet application, and the most fully featured of them all, is Microsoft Excel.
The desktop version comes in a paid form as part of the Office suite and some Microsoft 365 subscriptions, but there
is also a cut-down web-based version called Excel for the web, also known as Excel Online. The online version is free
to users with a Microsoft account, but does not offer all the advanced features that the desktop version provides.
The next most popular is Google Sheets, which offers many, though not all, of the features that Excel provides, and is
free with a Google account. It is a web-based application that integrates well with other Google apps, such as
Google Forms, Google Analytics, and Google Data Studio.
Then there is LibreOffice Calc, a totally free and open source desktop spreadsheet application that offers more basic
functionality than Excel or Google Sheets, but still has a lot of the tools you need for data analysis, such as charts,
conditional formatting, and pivot tables.
Other spreadsheet apps include Zoho Sheet (a fully featured web-based application that is comparable with Google
Sheets), OpenOffice Calc, Quip from Salesforce, Smartsheet (which is predominantly for project management), and
Apple Numbers (which is included with Apple devices such as Mac computers and is also available on the App Store
for other Apple devices).

Spreadsheets provide several advantages over manual calculation methods. For example, once you have your
formulas correctly written, you can be assured that your calculations are accurate, and that the calculations will be
performed automatically for you. Spreadsheets also help keep your data organized and easily accessible. Your data
can be easily formatted, filtered, and sorted to suit your needs. If you do make mistakes in your data entry or your
calculations you can easily edit them, undo them, or use error-checking tools to help remedy those mistakes. And
lastly, you can analyze data in spreadsheets, and create charts, graphs, and reports to help visualize your data
analysis.
The most common business uses for spreadsheet applications include the following: Data Entry and Storage,
Comparing Large Datasets, Modelling and Planning, Charting, Identifying Trends, Flowcharts for Business Processes,
Tracking Business Sales, Financial Forecasting, Statistical Analysis, Profit and Loss Accounting, Budgeting, Forensic
Auditing, Payroll and Tax Reporting, Invoicing, and Scheduling.

Away from the business side of things, other typical uses include Personal Expenses, Household Budgeting,
Recipe Libraries, Fitness Tracking, Calorie Counting & Weight Monitoring, Sports Leagues such as Fantasy Football,
Cataloging Music Libraries, and even Contact Lists, Shopping Lists and Christmas Card Lists.
As a Data Analyst, you can use spreadsheets as a tool for your data analysis tasks, including:
- Collecting and harvesting data from one or more distributed and different sources.
- Cleaning data to remove duplicates, inaccuracies, errors, and resolve missing values to improve the quality
of the data.
- Analyzing data by filtering, sorting, and interpreting it to determine what useful information can be gleaned
from it.
- Visualizing data, to help you tell a story about your data analysis findings to key business stakeholders and
any other interested parties within your organization.

Spreadsheet Basics
Let’s first cover some basic spreadsheet terminology.
When you open Excel, you have the option of creating a new blank workbook or opening an existing workbook.
We’re going to choose New, and then Blank workbook. Workbooks are the highest-level component in Excel, and each
one is stored as an .XLSX file. So, when you open an existing workbook or create a new workbook, you are in fact
working with an .XLSX file. The workbook contains all your data, calculations, and functions, along with the
other underlying elements described below.
A workbook consists of one or more worksheets, each of which is represented by a tab in Excel. Each worksheet is
given a name which is displayed on the corresponding tab for the worksheet. By default, each tab is named Sheet1,
then Sheet2, and so on. To make these worksheet tabs more meaningful it is usual to rename them, so they make
more sense in relation to the worksheet’s purpose. For example, you might call a worksheet January Sales, or
perhaps the name of a region or store, or even an office or department. To do this, right-click the tab and choose
Rename. Instead of right-clicking to rename, you can also just double-click the name of a worksheet tab to rename it.
Essentially, worksheet tabs can be named anything you want to fit your particular needs to make it easier to
understand what that worksheet represents. Note that a worksheet that is highlighted, as the Tire Sales worksheet
tab is here, is referred to as the active worksheet. If you want to order your worksheets in a different way, that is
very simple to do. Either drag a worksheet tab to the left or right and drop it in the place you want, which is
represented by the little black arrow, or if you are not comfortable with dragging and dropping, then the longer way
of doing that is to right-click the worksheet tab, select Move or Copy, and then in the list titled Before sheet, select
where you want your worksheet tab to be placed, and click OK.
Every worksheet is made up of many rectangular boxes called cells. These cells contain your data, which may
be text, numbers, formulas, or calculation results. Cells are organized in columns, which run vertically down the
screen and use a letter system (this is column B, for instance), and rows, which run horizontally across the screen
and use a numeric system (this is row 7, for example). Each cell is represented by a cell reference, which is
essentially just its column letter and row number. For example, if we click somewhere near the center of this
worksheet, we now have the cell M20 selected. This is usually referred to as the ‘active cell’. It is indicated not
only by the highlighted edges of the cell but also by the little box in the top left corner of the worksheet (the
Name Box), which shows its cell reference. Here you can see it says M20. One important thing to note here is that
cells are always referenced by their column letter first, then their row number; so, column M and row 20.
The last element of a workbook I want to mention is a cell range. This identifies a collection of several cells
selected together; that could mean a few cells in the same row or the same column, or it could mean several rows and
columns together. You can select a range either with the mouse, by selecting the first cell and then ‘dragging’ down
or across to include other cells, or with SHIFT+arrow keys. A range of cells is often referred to as an array, and
it’s most commonly used as a reference in calculations and formulas. For example, if you wanted to add up all the
values in a column between cells D9 and D19, you would specify this cell range within a formula. Note that cell
ranges are notated using a colon (:) between the first and last cell references; so, in this example it would be
D9:D19, or to specify a few cells in the same row it might be D9:H9, or to select several rows and columns it might
be D9:H19. We will see this notation in use later in this course when we start looking at calculations and formulas.
A range can also refer to cells on another worksheet; a reference that spans the same cells across several
worksheets is usually referred to as a 3D reference.
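The cell reference and range notation described above can be sketched in a few lines of Python. This is only an illustration of the notation, not of how Excel works internally; the function names (`column_to_index`, `index_to_column`, `split_reference`, `expand_range`) are hypothetical and chosen for this example.

```python
import re


def column_to_index(letters: str) -> int:
    """Convert a column label such as 'A', 'M', or 'AA' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based column index back to its letter label."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def split_reference(ref: str) -> tuple[int, int]:
    """Split a reference like 'M20' into (column index, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    return column_to_index(match.group(1)), int(match.group(2))


def expand_range(cell_range: str) -> list[str]:
    """List every cell in a range like 'D9:H19', row by row."""
    start, end = cell_range.split(":")
    col1, row1 = split_reference(start)
    col2, row2 = split_reference(end)
    return [
        f"{index_to_column(col)}{row}"
        for row in range(min(row1, row2), max(row1, row2) + 1)
        for col in range(min(col1, col2), max(col1, col2) + 1)
    ]


print(split_reference("M20"))       # column letter first, then row number
print(expand_range("D9:H9"))        # a few cells in the same row
print(len(expand_range("D9:H19")))  # several rows and columns together
```

Note that the column letter always comes first in the reference, which is why the sketch parses letters before digits.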

Now that we have a basic understanding of the main elements that make up a worksheet, let’s see how to move
around a spreadsheet, get familiar with the ribbon and menus, and learn how to select data in a worksheet. To open
a sample file, we click File. This opens Backstage View. Here you can create a new workbook, or open, save or print a
workbook. You can also access Excel Options. Now, we want to open our sample file. So, we click Open, and either
select it from the Recent list, or click Browse to find the data file we want. The first thing we should do is get
acquainted with the ribbon and menus. Notice that on the ribbon at the top we have several tabs. Some of these
tabs may be familiar to you from other Office products, such as the Home, Insert, and View tabs, while others might
be new to you, such as Formulas, Data, and Power Pivot. To make a little more workspace for ourselves we can hide
this ribbon by double-clicking any tab, and to unhide it, we do the same. The other option is to use the shortcut key
CTRL+F1. The ribbon is organized into groups of buttons to make them easier to find. So, on the Home tab we have
groups for Font, Alignment, Number, Styles, and so on. Some of these groups contain all the available buttons on the
ribbon when viewing in full screen, such as Styles and Cells, but other ribbon groups have more options, which we
access by clicking the little arrow icon in the bottom right corner of the group, as can be seen here on the Number
group for example.
The next item I want to point out is the Quick Access Toolbar at the top of the screen above the ribbon. As the name
suggests this is where you can quickly access the tools you use most often. You can see we already have some tools
in this toolbar such as Save, Undo, Redo, New, and Open. But we can add other tools to the toolbar if we wish. So if
we click the drop-down arrow in the toolbar and then select a tool we will use a lot, such as Sort Ascending, that will
be added, and we will also add the Sort Descending button too.
Now we need to be comfortable with moving around a worksheet. You can simply use the arrow keys to move left,
right, up, and down 1 cell at a time. But you can also use Page Down and Page Up to move around a bit faster, which
is especially useful if you have lots of rows of data. And to move even quicker up or down a large datasheet use the
vertical scroll bar, and to move left or right use the horizontal scroll bar. Again, these can be very useful when you
have a large data set.
There are also some useful shortcuts you can use. CTRL+Home, for example, takes you back to the start of the
worksheet (i.e. cell A1). CTRL+End takes you to the cell at the end of your data in the worksheet. CTRL+Down arrow
takes you to the bottom of the current block of data in the column you’re in, while CTRL+Up arrow takes you back to
the top of it. So a quick way to find out how many rows of data you have in your worksheet is to go to the first cell
in your data and press CTRL+Down arrow to see the last row of data. So here you can see we have 160 rows. Now how do
we go back to the top again? CTRL+Home will do it.
So far, we have seen how to navigate around our worksheet and its data, now we need to look at how we select
data. This is very important because you often need to select data to move it, copy it, or select it in a formula. The
simplest selection is a single cell, usually done with a mouse or maybe a directional arrow key. The next step up is to
select multiple cells together, and this can be done either with a mouse by dragging from one cell to additional
adjoining cells, or you can use the SHIFT key with directional arrow keys.
Next up is selecting a single column or row which is done simply by selecting the letter at the top of a column, or the
number on the left of a row. Then we can progress to selecting multiple columns and rows, by clicking the mouse
button, holding it down and dragging across more columns. Or if you are not comfortable with dragging you can also
select the column first, then hold SHIFT+Arrow keys to select multiple columns. The same applies to rows too.
However, if you have data in non-contiguous rows or columns (i.e. not next to each other) you can select the first
column, then use the CTRL key to select another unconnected column, such as columns C and F here.
The largest thing you might want to select is the whole worksheet which you can do by clicking in the top left corner
of the cells. However, this selects the entire worksheet including all the empty rows and columns; so if you only want
the data in your worksheet, you can use the shortcut CTRL+A.
A word of warning when selecting data in cells, rows, and columns: there are three types of cross symbols that you
might see when working with selected cells. The first is the large white cross that you see when you select a cell,
as can be seen here in cell A4; this is the Select cross that we have been using already in this video to select
cells. The second type appears when you hover over the bottom edge of a cell: a thin black cross-type symbol with
arrows on each point. This is the Move symbol, and dragging it moves the cell data to another location. The last type
is the small thin black cross that is seen when you hover over the bottom right corner of a cell; this is the Fill
Handle or Copy symbol, and dragging it fills (or copies) the cell data into the cells you drag across.

Excel Keyboard Shortcuts
The table below lists keyboard shortcuts for some of the most common Excel tasks.

Task                                                      Shortcut
Close a workbook                                          Ctrl+W
Open a workbook                                           Ctrl+O
Save a workbook                                           Ctrl+S
Copy                                                      Ctrl+C
Cut                                                       Ctrl+X
Paste                                                     Ctrl+V
Undo                                                      Ctrl+Z
Remove cell contents                                      Delete
Bold                                                      Ctrl+B
Open context menu                                         Shift+F10
Expand or collapse the ribbon                             Ctrl+F1
Move up one cell in the worksheet                         Up arrow key
Move down one cell in the worksheet                       Down arrow key
Move one cell left in the worksheet                       Left arrow key
Move one cell right in the worksheet                      Right arrow key
Move to the edge of the current data region               Ctrl+Arrow key (e.g. Ctrl+Down arrow)
  in the worksheet (e.g. end of column)
Move to the last cell on a worksheet                      Ctrl+End
Move to the beginning of a worksheet                      Ctrl+Home
Extend the selection of cells to the last used cell       Ctrl+Shift+End
  on a worksheet (lower right corner)
Move to the cell in the upper-left corner of the window   Home+Scroll Lock
  (when Scroll Lock is On)
Move one screen down in a worksheet                       Page Down
Move one screen up in a worksheet                         Page Up
Move one screen to the right in a worksheet               Alt+Page Down
Move one screen to the left in a worksheet                Alt+Page Up
Move to the next sheet in a workbook                      Ctrl+Page Down
Move to the previous sheet in a workbook                  Ctrl+Page Up
Edit the active cell and put the cursor at the end        F2
  of the cell's contents
Enter the current time                                    Ctrl+Shift+colon (:)
Enter the current date                                    Ctrl+semi-colon (;)

Cleaning & Wrangling Data Using Spreadsheet
Introduction to Data Quality
Data analysis can play a pivotal role in business decisions and processes. In order to use the data to make confident
decisions, we must have the right information for the project and the data must be free from errors. In this video we
will learn how to profile data to discover inconsistencies. Whether we are working with small sets of data or
analyzing a spreadsheet with thousands of rows, one of the most difficult parts of the data analysis is finding and
keeping clean data.
To help with this process and qualify the data, look for these five traits: Accuracy, Completeness, Reliability,
Relevance and Timeliness.
- Accuracy is the first and most significant aspect to data quality. A data analyst must clean the data set by
removing duplicates, correcting formatting errors, and removing blank rows.
- Completeness is another important aspect of data quality: determining whether the information required to
complete the data set is readily available. Why does this matter as a trait for quality data? Let’s say we are
given the task to calculate the revenues of all sales per region. After collecting the data, we discover that no
regions were specified. This data would then be considered incomplete, and other sources would have to be
considered to obtain the data required.
- Reliability is another vital factor in determining the quality of the data. For instance, let’s say we are given
the task to determine the agent revenue by customer. When gathering the data, we find the agents keep
their own records and do not always update the information in the shared company database. With those
factors in mind, we would then determine that the data in the shared company database was unreliable and
new processes would need to be established to ensure reliable data.
- Relevance is another trait of quality data. When collecting information, a data analyst must consider if the
data being assembled is really necessary for the project. For example, when reviewing the data related to
the sales revenue per customer, information such as customer birthdays and other personal information is
also included. By making the determination early to exclude the personal information from the data set, the
analyst would save themselves from having to review unnecessary information.
- The last factor in determining the quality of the data is timeliness. This trait refers to the availability and
accessibility of the selected data. Let’s say our sales report is going to be used for weekly employee reviews,
but our report is only refreshed once a month. This error in refreshing the data would cause our report to
become outdated and would have serious consequences for employee reviews.
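As a rough illustration of profiling data for the accuracy and completeness problems described above, the following Python sketch counts blank rows, duplicated rows, and missing values in required fields. It assumes the data has already been exported from the spreadsheet as a list of dictionaries; the function name `profile_rows` and the sample field names are hypothetical.

```python
from collections import Counter


def is_blank(value) -> bool:
    """Treat None and whitespace-only text as a missing value."""
    return value is None or str(value).strip() == ""


def profile_rows(rows, required_fields):
    """Count blank rows, duplicate rows, and missing required values."""
    non_blank = [row for row in rows if not all(is_blank(v) for v in row.values())]
    blank_rows = len(rows) - len(non_blank)

    # Two rows are duplicates when every field holds the same value.
    counts = Counter(tuple(sorted(row.items())) for row in non_blank)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)

    missing_values = {
        field: sum(1 for row in non_blank if is_blank(row.get(field)))
        for field in required_fields
    }
    return {
        "blank_rows": blank_rows,
        "duplicate_rows": duplicate_rows,
        "missing_values": missing_values,
    }


# Hypothetical sample: sales per region, where one region was never specified.
sample = [
    {"agent": "Ana", "region": "North", "revenue": "1200"},
    {"agent": "Ben", "region": "", "revenue": "950"},
    {"agent": "Ana", "region": "North", "revenue": "1200"},
    {"agent": "", "region": "", "revenue": ""},
]
report = profile_rows(sample, required_fields=["region"])
print(report)
```

A report like this only flags candidates for cleaning; deciding whether a duplicate or a gap is really an error still needs the analyst's judgment.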

Basics of Data Privacy


When collecting customer data, specific regulations apply to how that data can be used. By understanding data privacy
regulations and getting familiar with the following three fundamentals, you can reduce the risk of financial
penalties and keep the trust of your customers: Confidentiality, Collection and Use, and Compliance.
- Confidentiality is an important element in data privacy, and it acknowledges that the customer’s personal
information belongs to them. The types of information that can be accessed by a data analyst can range
from sales forecasts, to employee information, or even patient records. When accessing these types of
records, the analyst must be able to recognize the different types of personal data:
o Personal Information or PI is any type of information that can be traced back to a specific individual.
This type of information can include anything from emails to images.
o Personally Identifiable Information or PII is specific information that could be used to identify an
individual. This type of information could include a social security number or a driver’s license
number.
o Sensitive Personal Information or SPI may not necessarily identify a specific individual, but contains
private information that needs to be protected because, if made public, it could be used to
harm the individual. This type of information can include data about race, sexual orientation,
biometric or genetic information.
By understanding personal data and the associated regulations, we can efficiently anonymize our data by
removing unnecessary information. This type of action can help build consumer confidence and continue to
develop the free flow of information.
- Collection and Use: when searching through data, the analyst must know the location of the company collecting
the data and the location of the respondent. Knowing where the data was collected is an essential element of data
privacy and determines which regulations must be applied. The General Data Protection Regulation or GDPR is a
regulation specific to the European Union, and applies according to the jurisdiction of the individual. Brazil’s
data protection law, the LGPD, took effect in 2020. Its regulations apply to individuals within Brazil, regardless
of the location of the data processor. The United States does not have one country-wide principal law for data
privacy, so individual states began to make their own regulations. For instance, California created the California
Consumer Privacy Act (CCPA) to better protect customer data. There are also industry-specific regulations that
govern the collection and use of sensitive and personal data. For example, in healthcare, HIPAA privacy rules
govern the collection and disclosure of protected health information. In retail, the PCI standards govern credit
card data, and failure to safeguard cardholder information can result in hefty fines. With a basic understanding
of these policies, we are able to remain compliant when handling any sensitive information.
- Compliance: unfortunately, breaches of customer data are an all too common occurrence, and understanding how to
remain compliant is essential. Understanding the data privacy regulations of the European Union, the
United States, and other countries as well as industries is key to keeping data safe. Companies must comply
with these privacy regulations at all times and also make sure policies are readily accessible to employees.
For example, let’s say a data analyst downloads a spreadsheet of sensitive information. In order to complete
the report by Monday morning, the analyst decides to take their work laptop home for the weekend. After
driving home, the analyst accidentally leaves the laptop in their car. The next morning, they find their car has
been stolen along with the laptop. Because it is the responsibility of the company to keep customer data
safe, this was a breach of privacy when the data left company property. This type of action could not only
cost the company large amounts of money in fines and penalties, but could also reduce consumer
confidence, causing a significant impact to revenue. While data privacy applies to most data that is collected,
there are some instances where these regulations do not apply. In order for these laws and regulations not
to apply, the particular collection of data must be completely anonymous. To make data anonymous means
to exclude all data which ties it back to a particular individual. While this approach might not be practical in
all circumstances, collecting data with privacy in mind could remove privacy limitations and make data
collections more accessible.
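The idea of excluding data that ties a record back to an individual can be sketched as dropping the personal columns before analysis. This is a minimal Python illustration with hypothetical field names and a hypothetical helper, `drop_personal_fields`. Removing obvious identifiers is only a first step and may not by itself make a data set anonymous, since combinations of the remaining fields can sometimes still identify a person.

```python
def drop_personal_fields(rows, personal_fields):
    """Return copies of the rows without the listed personal fields."""
    blocked = set(personal_fields)
    return [
        {name: value for name, value in row.items() if name not in blocked}
        for row in rows
    ]


# Hypothetical customer records; only region and revenue are needed for the report.
customers = [
    {"name": "A. Customer", "birthday": "1990-04-12", "region": "North", "revenue": 1200},
    {"name": "B. Customer", "birthday": "1985-11-03", "region": "South", "revenue": 950},
]
report_rows = drop_personal_fields(customers, ["name", "birthday"])
print(report_rows)
```

The original records are left untouched, so the personal fields can stay in a protected source while only the reduced copy is shared.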

Removing Duplicated or Inaccurate Data and Empty Rows
It’s very common when collecting or importing data - whether through manual or automated processes - to get
errors and inconsistencies in your data. These can range from simple spelling mistakes, extra white space, or the
wrong case used in text, to empty rows or missing values, to inaccurate or duplicated data. Having these errors
and inconsistencies in your data can cause formulas to fail and sorting and filtering operations to give
misleading results, which in turn leads to poorly visualized and presented findings. These data errors and
inconsistencies require you to carry out some form of data-cleaning routine to improve the quality and usability of
the data.
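A data-cleaning routine of this kind can be sketched in Python for data that has been exported from a worksheet as a list of rows. This is an illustration under stated assumptions, not Excel's own Remove Duplicates feature: the function name `clean_rows` is hypothetical, and rows are compared after trimming surrounding whitespace, which is a design choice of this sketch.

```python
def clean_rows(rows):
    """Drop empty rows and later duplicates, keeping the original row order."""
    seen = set()
    cleaned = []
    for row in rows:
        # Compare on trimmed text so 'North ' and 'North' count as the same value.
        key = tuple(str(value).strip() for value in row)
        if not any(key):
            continue  # every cell is empty: skip the row
        if key in seen:
            continue  # an identical row was already kept
        seen.add(key)
        cleaned.append(list(row))
    return cleaned


raw = [
    ["Ana", "North", "1200"],
    ["", "", ""],
    ["Ben", "South", "950"],
    ["Ana", "North ", "1200"],
]
print(clean_rows(raw))
```

Keeping the first occurrence of each row preserves the original order, which makes the result easier to compare against the source worksheet.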

Dealing with Inconsistencies in Data


When you collect or receive data from varying sources, it’s quite common to find that your data contains text in
mixed case; that is, some in uppercase, some in lowercase, and some in proper case (where the first letter of each
word is capitalized). Some of this may be intentional, but often it’s not. Excel doesn’t have a Change Case button
like there is in Microsoft Word, so you need to use other methods to perform this data cleaning task. Those methods
are functions; namely the UPPER, LOWER, and PROPER functions. You can use these functions to help you change the
case of text in your data.
You might also find that your data has some unwanted whitespace; that is, extra spaces in your data. You can remove
them with the Find & Replace functionality or the TRIM function.
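For readers who also clean data outside Excel, the following Python sketch shows rough counterparts of these functions. The helper names `excel_trim` and `to_proper` are hypothetical, and the behavior is an approximation: `excel_trim` removes leading, trailing, and repeated space characters in the way TRIM is documented to, and `to_proper` relies on Python's `str.title()`, which may differ from PROPER in edge cases.

```python
def excel_trim(text: str) -> str:
    """Remove leading/trailing spaces and collapse repeated spaces to one."""
    return " ".join(part for part in text.split(" ") if part)


def to_proper(text: str) -> str:
    """Capitalize the first letter of each word and lowercase the rest."""
    return text.title()


value = "  tIRE   sales  "
print(excel_trim(value))             # extra spaces removed
print(excel_trim(value).upper())     # like UPPER
print(excel_trim(value).lower())     # like LOWER
print(to_proper(excel_trim(value)))  # like PROPER
```

Trimming before changing case keeps the two cleaning steps independent, so either can be applied on its own.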

More Excel Features for Cleaning Data


Flash Fill automatically fills your data when it senses a pattern. For example, you can use Flash Fill to separate first
and last names from a single column, or combine first and last names from two different columns.
Let's say column A contains first names, column B has last names, and you want to fill column C with first and last
names combined. If you establish a pattern by typing the full name in column C, Excel's Flash Fill feature will fill in the
rest for you based on the pattern you provide.
1. Enter the full name in cell C2, and press ENTER.
2. Start typing the next full name in cell C3. Excel will sense the pattern you provide, and show you a preview of
the rest of the column filled in with your combined text.
3. To accept the preview, press ENTER.

The Text to Columns wizard lets you take text in one or more cells and split it into multiple cells.
1. Select the cell or column that contains the text you want to split.
2. Select Data > Text to Columns.
3. In the Convert Text to Columns Wizard, select Delimited > Next.
4. Select the Delimiters for your data. For example, Comma and Space. You can see a preview of your data in
the Data preview window.
5. Select Next.
6. Select the Destination in your worksheet which is where you want the split data to appear.
7. Select Finish.
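The splitting performed by the steps above can be approximated in Python for a column of exported text. This is a sketch, not the wizard's actual implementation: the function name `split_text` and its parameters are hypothetical, and the `merge_consecutive` flag is meant to mimic treating consecutive delimiters as one.

```python
import re


def split_text(text: str, delimiters: str = ", ", merge_consecutive: bool = True):
    """Split text on any of the delimiter characters (comma and space by default)."""
    pattern = "[" + re.escape(delimiters) + "]"
    if merge_consecutive:
        pattern += "+"  # a run of delimiters counts as a single split point
    return re.split(pattern, text)


# Hypothetical column of 'Last, First' names to split into two columns.
column = ["Smith, John", "Rossi, Maria"]
split_rows = [split_text(cell) for cell in column]
print(split_rows)
```

With `merge_consecutive=False`, the comma and the space after it are treated as two separate delimiters, so an empty cell appears between the two names.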

Analyzing Data Using Spreadsheets
