Business Intelligence Data Analyst_Career Path

17/02/2024, 13:27 Welcome to the BI Data Analyst Career Path: Path Overview and Introduction to Data Cheatsheet | Codecademy

Cheatsheets / Welcome to the BI Data Analyst Career Path

Path Overview and Introduction to Data

Garbage In, Garbage Out

The quality of the predictions made during a predictive
analysis is deeply dependent on the quality of the data
used to generate the predictions.
For example, if a model is trained with mislabeled data,
it will produce inaccurate predictions no matter how
good the actual algorithm is. This is commonly referred
to as “garbage in, garbage out.”

Binary Categorical Variables

Categorical variables can also be binary, or
dichotomous, variables. Binary variables are nominal
categorical variables that contain only two mutually
exclusive categories. Examples of binary variables are whether
a person is pregnant, or whether a house’s price is above or
below a particular price.

Categorical Variables

Categorical variables consist of data that can be
grouped into distinct categories, and are either ordinal or
nominal. Ordinal categorical variables are groups
that contain an inherent ranking, such as ratings of
plays or responses to a survey question with a point
scale (e.g., on a scale from 1-7, how happy are you right
now?). Nominal categorical variables are made of
categories without an inherent order; examples of
nominal variables are species of ants or people’s hair
color.

https://www.codecademy.com/learn/bida-welcome-bida-career-path/modules/welcome-to-the-bi-data-analyst-career-path/cheatsheet 1/7

Quantitative Vs. Categorical Variables

Variables can be either quantitative or categorical.
Quantitative variables are amounts or counts; for
example, age, number of children, and income are all
quantitative variables. Categorical variables represent
groupings; for example, type of pet, agreement rating,
and brand of shoes are all categorical variables.

Categorical Data Defined

Categorical data refers to data represented by words
rather than numbers. Examples of categorical data are
tree species and survey responses (Agree, Neutral,
Disagree).

Ordinal and Nominal Categorical Data

Categorical variables can be either ordinal (ordered) or
nominal (unordered).
Examples of ordinal variables include places (1st, 2nd,
3rd) and survey responses (on a scale of 1 to 5, how
much do you agree with a statement).
Examples of nominal variables include tree species,
student names, and account names.
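
The practical difference between ordinal and nominal data shows up when sorting. The sketch below is only an illustration: the three-point scale, the responses, and the tree species are made-up values, not data from the course.

```python
from collections import Counter

# Hypothetical survey scale, listed from lowest to highest agreement.
SCALE = ["Disagree", "Neutral", "Agree"]
RANK = {label: position for position, label in enumerate(SCALE)}

responses = ["Agree", "Disagree", "Neutral", "Agree"]

# Ordinal data: sort by position in the scale, not alphabetically.
by_rank = sorted(responses, key=RANK.__getitem__)

# Nominal data: there is no meaningful order, so counting is the natural summary.
species = ["oak", "pine", "oak", "birch"]
counts = Counter(species)

print(by_rank)
print(counts)
```

An alphabetical sort would put "Agree" before "Disagree", which is why ordinal data needs an explicit ranking.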

Messy Data

Messy data is data that violates one of the tidy dataset
rules (1. Each variable forms a column; 2. Each
observation forms a row; 3. Each type of observational
unit forms a table).
Below is an example of messy data:

ID#   Name       ChemGrade2020   MathGrade20
1     Brown      F
B smith
3     Saito, K   A               90

Tabular Data

Tabular data is organized into rows, or observations,
and columns, also referred to as variables or features.
We can read each column “down” the table (viewing
multiple observations), and each row “across” the table
(viewing multiple variables).

Row #   Variable 1    Variable 2    Variable
1       Observation   Observation   Observat
2       Observation   Observation   Observat
3       Observation   Observation   Observat

Tidy Data Rules

A tidy dataset follows three fundamental rules:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

Below is an example of a tidy dataset:

ID#   Student   Year   Class     Grade
1     Brown     2020   Chem      F
1     Brown     2021   Chem      B
1     Brown     2021   Math      A
2     Smith     2020   Bio       C
2     Smith     2021   CompSci   B
3     Saito     2020   Chem      A
3     Saito     2021   Math      B
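
As a rough sketch of how a messy "wide" layout can be reshaped into the tidy layout above, the snippet below unpacks one column per class-and-year into one row per observation. The input records and the `to_tidy` helper are hypothetical, and the column naming pattern (`<Class>Grade<Year>`) is an assumption modeled on the messy example.

```python
# Hypothetical "wide" records: one column per class-and-year combination.
wide_rows = [
    {"ID": 1, "Student": "Brown", "ChemGrade2020": "F", "ChemGrade2021": "B"},
    {"ID": 3, "Student": "Saito", "ChemGrade2020": "A", "ChemGrade2021": None},
]

def to_tidy(rows):
    """Return one row per (student, class, year) observation."""
    tidy = []
    for row in rows:
        for column, grade in row.items():
            # Skip identifier columns and empty cells.
            if column in ("ID", "Student") or grade is None:
                continue
            # Split e.g. "ChemGrade2020" into class "Chem" and year "2020".
            class_name, year = column.split("Grade")
            tidy.append({
                "ID": row["ID"],
                "Student": row["Student"],
                "Year": int(year),
                "Class": class_name,
                "Grade": grade,
            })
    return tidy

tidy_rows = to_tidy(wide_rows)
for record in tidy_rows:
    print(record)
```

Each variable (Year, Class, Grade) now has its own column, and each grade is its own row.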

Sample Set of Data

A sample set of data is a dataset that is representative
of the entire population of interest. Random sampling is
the best way to make sure the sample is representative
of the whole population but does not guarantee a
representative sample, especially if the sample is too
small.
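
To illustrate why a small random sample may not be representative, here is a minimal sketch using Python's `random` module. The population (1,000 people, 30% of whom own a pet) and the `sample_share` helper are invented for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 people, 30% of whom own a pet.
population = [True] * 300 + [False] * 700

def sample_share(size):
    """Share of pet owners in a simple random sample of the given size."""
    sample = random.sample(population, size)
    return sum(sample) / size

small = sample_share(10)
large = sample_share(500)

print("share in sample of 10: ", small)
print("share in sample of 500:", large)
```

The larger sample will usually land much closer to the true 30% than the sample of 10, although neither is guaranteed to match it exactly.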

Structurally Missing Data

Structurally missing data is data that is expected to be
missing.
For example, there are structurally missing data in the
‘Litters’ and ‘Pups/Litter’ columns for all the male dogs
in the table below because we would not expect male
dogs to have puppies.

ID#   Name      Breed              Sex   Litters   Pups/Litter
1     Gnasher   ACD                M
2     Cassie    Collie             F     1         3
3     Pepper    French Bulldog     F     4         2
4     Jed       Golden Retriever   M
5     Henry     Spaniel            M
6     Ruby      ACD                F     1         6
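
One way to check whether gaps are structural is to test whether every missing value is explained by another column. The sketch below transcribes the table above into Python dictionaries (the field names are my own) and checks that all missing litter counts belong to male dogs.

```python
# Rows transcribed from the table above; None marks an empty cell.
dogs = [
    {"name": "Gnasher", "sex": "M", "litters": None},
    {"name": "Cassie", "sex": "F", "litters": 1},
    {"name": "Pepper", "sex": "F", "litters": 4},
    {"name": "Jed", "sex": "M", "litters": None},
    {"name": "Henry", "sex": "M", "litters": None},
    {"name": "Ruby", "sex": "F", "litters": 1},
]

missing = [dog for dog in dogs if dog["litters"] is None]

# If every gap belongs to a male dog, the gaps are expected by design.
structurally_missing = all(dog["sex"] == "M" for dog in missing)

print(len(missing), "missing values; structural:", structurally_missing)
```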

Missing at Random Data

Missing at Random (MAR) data is missing because of
some random characteristic about the person or thing
being studied. Often, this type of data is reliably missing
based on the value of another variable in the dataset.
In the table below, the bacterial cell counts for all the
stool samples are ‘NaN’. If we looked into this, we might
find that there were too many bacterial cells to count in
all those samples. Therefore, the bacterial cell counts
for stool samples would be MAR data.
Sample ID   Sample Type   Bacterial Cell Counts
1           Hand Swab     1008
2           Stool         NaN
3           Mouth Swab    7876
4           Hand Swab     657
5           Stool         NaN
6           Hand Swab     2442
7           Mouth Swab    5444
8           Stool         NaN
9           Hand Swab     4654
10          Stool         NaN
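
A simple way to spot this pattern is to compute the share of missing values per group. The sketch below uses the rows from the table above, with `None` standing in for NaN; the variable names are illustrative only.

```python
from collections import defaultdict

# (sample type, bacterial cell count) pairs from the table above.
samples = [
    ("Hand Swab", 1008), ("Stool", None), ("Mouth Swab", 7876),
    ("Hand Swab", 657), ("Stool", None), ("Hand Swab", 2442),
    ("Mouth Swab", 5444), ("Stool", None), ("Hand Swab", 4654),
    ("Stool", None),
]

totals = defaultdict(int)
missing = defaultdict(int)
for sample_type, count in samples:
    totals[sample_type] += 1
    if count is None:
        missing[sample_type] += 1

# Share of missing counts within each sample type.
missing_rate = {t: missing[t] / totals[t] for t in totals}
print(missing_rate)
```

A missing rate that is high for one sample type and zero for the others suggests the gaps depend on that variable rather than occurring by chance.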

Data Missing Completely at Random

Missing Completely at Random (MCAR) data has no
detectable underlying reason causing the values to be
missing.
The table below has MCAR data. The # of fruits is
missing for some plants, but the missing fruit data
seems unrelated to the height of the plant. Short and
tall plants are both missing fruit data. In addition, we are
missing the height for one of our plants!

Plant Height (cm) # of Fruits

1 65 10

2 87

3 987

4 44

5 105 35

6 547 74

7 876

8 55

9 875 95

17/02/2024, 13:30 Welcome to the BI Data Analyst Career Path: Introduction to Data Cheatsheet | Codecademy
Cheatsheets / Welcome to the BI Data Analyst Career Path

Introduction to Data

Data Gaps

The ability to separate good, mediocre, and poor
quality data is a crucial data literacy skill. Data-driven
conclusions are only as strong, robust, and well-
supported as the data behind them. This is also often
referred to with the phrase “garbage in, garbage out.”

Addressing Bias

Bias in data collection leads to poorer quality data.
Recognizing bias in data is a crucial data literacy skill.
Some key questions about bias include “Who made the
data?”, “Who participated in the data?” and “Who is left
out of the data?”

What is Statistics?

Statistics helps to measure whether an event happens
by chance or by a systemic factor or factors. For
example, it’s statistically more likely to see traffic during
peak rush hour than outside of peak rush hour times.

https://www.codecademy.com/learn/bida-welcome-bida-career-path/modules/introduction-to-data-cp/cheatsheet 1/2

Statistics at work

Statistics can reveal systemic patterns in a data set
rather than relying on individual experiences. This is
important in legal cases including those addressing
discrimination or class-action lawsuits.

17/02/2024, 13:36 Learn Microsoft Excel for Data Analysis: Exploring Data Cheatsheet | Codecademy
Cheatsheets / Learn Microsoft Excel for Data Analysis

Exploring Data

Pivot Tables

A pivot table restructures a dataset by grouping data
points categorically and summarizing values within each
category.

Pivot Table Labels

The columns and rows of a pivot table are labeled using
the unique values of zero or more columns of the
source dataset.

Pivot Table Values

The values of a pivot table are calculated using
standard summary statistics including maximum,
minimum, average, count, and standard deviation.
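
The grouping and summarizing that a pivot table performs can be sketched in a few lines of Python. This is an analogy, not how Excel implements pivot tables, and the sales records are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, product, amount).
sales = [
    ("North", "Tea", 10), ("North", "Coffee", 20),
    ("South", "Tea", 5), ("North", "Tea", 30),
]

# Group amounts by (row label, column label), as a pivot table does.
cells = defaultdict(list)
for region, product, amount in sales:
    cells[(region, product)].append(amount)

# Summarize each group; swap mean for max, min, len, or sum as needed.
pivot = {key: mean(values) for key, values in cells.items()}

for (region, product), value in sorted(pivot.items()):
    print(region, product, value)
```

Here the unique regions play the role of row labels, the unique products play the role of column labels, and the average is the summary statistic.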

Cells in Excel can be filled formulaically

A formula begins with the = sign followed by an
expression using built-in Excel functions and standard
mathematical operations.

https://www.codecademy.com/learn/analyze-data-with-microsoft-excel/modules/exploring-data-in-excel/cheatsheet 1/3

Cell References

Formulas in Excel can reference values in other cells by
stating the column letter followed by the row number
(e.g. A2).

Formula Recalculation

Formulas in Excel will automatically recalculate if the
data in the referenced cell is altered.

Dragging Formulas in Excel

When a formula in Excel is dragged into other cells, by
default any row and column references are
automatically incremented relative to the original cell
containing the formula.

Using dollar signs in Excel

If dollar signs are placed before the column and/or row
of a referenced cell (e.g. $A2, A$2, or $A$2), Excel will
not update the column and/or row respectively of the
cell reference when the formula is dragged into other
cells.
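
To make the effect of the dollar sign concrete, the sketch below imitates dragging a formula down by a number of rows. It is a simplified model, not Excel's actual logic: the `drag_down` helper and its regular expression are my own, it only shifts rows (not columns), and it would misread function names that end in digits, such as LOG10.

```python
import re

# Matches references such as A2, $A2, A$2, and $A$2.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def drag_down(formula, rows=1):
    """Shift row numbers that are not locked with a $, as dragging down does."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        # A $ before the row number keeps the row fixed.
        new_row = int(row) if row_lock else int(row) + rows
        return f"{col_lock}{col}{row_lock}{new_row}"
    return REF.sub(shift, formula)

print(drag_down("=A4"))        # relative row reference
print(drag_down("=A$4"))       # locked row reference
print(drag_down("=$A4+B$2"))   # mixed references
```

The column lock ($A) has no visible effect here because dragging down never changes the column; it matters when a formula is dragged sideways.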

Sorting in Excel

Excel can sort tables by date or time, numerically, and
alphabetically.

Filtering in Excel

Excel can filter tables to only show rows containing
certain values or ranges.
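
For readers who think in code, sorting and filtering a table can be sketched in Python as follows. The order records are hypothetical, and this only mirrors the idea behind Excel's sort and filter features rather than reproducing them.

```python
from datetime import date

# Hypothetical table rows: (order date, customer, amount).
orders = [
    (date(2024, 2, 17), "Saito", 90),
    (date(2024, 1, 5), "Brown", 25),
    (date(2024, 2, 1), "Smith", 60),
]

by_date = sorted(orders, key=lambda row: row[0])                  # earliest first
by_name = sorted(orders, key=lambda row: row[1])                  # alphabetical
by_amount = sorted(orders, key=lambda row: row[2], reverse=True)  # largest first

# Filter: keep only rows whose amount falls within a range.
mid_range = [row for row in orders if 50 <= row[2] <= 100]

print([row[1] for row in by_date])
print([row[1] for row in mid_range])
```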

Updating Pivot Tables

Pivot tables in Excel need to be manually refreshed
when the source data changes.

Cheatsheets / Learn Microsoft Excel for Data Analysis

Visualizing Data

Color Scales in Excel

Excel can color tables of numeric data using darker
shades for larger numbers and lighter shades for
smaller numbers (a heatmap or color scale).

Cell Rules in Excel

Excel can color tables using one color for data within an
acceptable range and another for data outside an
acceptable range.

Visualizing data in Excel

Excel can create column (or bar) charts, histograms,
scatterplots, line charts, and pie charts from a table of
data.

Scatterplots

A scatterplot visually represents connections between
two numeric variables.

Line charts and sparklines

A line chart or sparkline is useful for analyzing trends in
data over time.

Pie Chart Pitfalls

Pie charts have two common pitfalls:

1. It can be difficult for viewers to compare sector
   sizes within the chart.
2. If a pie chart contains too many sectors, it is
   difficult for a viewer to decipher any useful
   information.

If you ever run into either issue, a bar chart may be the
best solution. The picture comparing pie charts and bar
charts shows why.
With each pie chart, it is almost impossible to compare
separate sectors. However, the bar chart makes the
comparisons much easier to decipher.

Visualizing Categorical Data

Bar Charts and Pie Charts are used to visualize
categorical data. Both types of graphs contain
variations as displayed in the visual.

Cheatsheets / Learn Microsoft Excel for Data Analysis

Handling Data

Importing data into Excel

CSV files must be imported into Excel for all Excel
features to function, whereas Excel documents can be
opened directly.

CSV Delimiters

A variety of delimiters can be chosen when importing
text data into Excel. To import a CSV, select “comma”
as the delimiter.

Formatting in Excel

Excel is built to auto-format dates and numbers, and
that behavior can be controlled with custom formatting.

Trimming Whitespace in Excel

Whitespace can be trimmed in Excel using =TRIM()
to make data more uniform.

Truncating Text in Excel

Text can be truncated in Excel using =LEFT() and
specifying the cell containing the text and the number
of characters to include (from the start of the text).

Converting to Lowercase in Excel

Text can be converted to lowercase in Excel using
=LOWER().
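
For comparison, the three text-cleaning functions above have close counterparts in Python. The helpers below (`excel_trim`, `excel_left`, `excel_lower`) are hypothetical names written for this sketch; they approximate the Excel behavior for ordinary space characters and are not part of any library.

```python
def excel_trim(text):
    """Approximate =TRIM(): drop leading/trailing spaces and collapse inner runs."""
    return " ".join(part for part in text.split(" ") if part)

def excel_left(text, num_chars=1):
    """Approximate =LEFT(): keep the first num_chars characters."""
    return text[:num_chars]

def excel_lower(text):
    """Approximate =LOWER(): convert every letter to lowercase."""
    return text.lower()

raw = "  Saito,   K "
print(repr(excel_trim(raw)))
print(repr(excel_left("Test ", 2)))
print(repr(excel_lower("Test ")))
```

Note that lowercasing does not remove whitespace, so the two operations are often combined when cleaning names or categories.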

Protecting sheets in Excel

Viewing and/or editing cells, sheets, and ranges in Excel
can be controlled by protecting and hiding sheets.

Security in Excel

Protected cells, sheets, and ranges in Excel can still be
viewed by opening the file in a different program.

Comma Separated Values (CSV)

CSV (comma-separated values) files are plain-text files
that represent a spreadsheet, using commas to
separate individual values. This type of file is easy to
manage and compatible with many different platforms.
A CSV file can be imported into a database or opened in an
Integrated Development Environment (IDE) to work with
its content.
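
As a small illustration of working with CSV content outside a spreadsheet, the sketch below parses comma-separated text with Python's standard `csv` module. The CSV text is invented for the example; in practice it would be read from a file.

```python
import csv
import io

# Hypothetical CSV text; in practice this would come from a file on disk.
csv_text = "ID,Student,Grade\n1,Brown,F\n2,Smith,C\n"

# DictReader treats the first row as column names; the delimiter is a comma.
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=","))

for row in rows:
    print(row["Student"], row["Grade"])
```

Every value is read as text, so numeric columns need to be converted explicitly before doing arithmetic.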

Cheatsheets / Learn Microsoft Excel for Data Analysis

Next Steps

Google Sheets Cheat Sheet

This cheatsheet will help you get started with Google
Sheets, describing the Google versions of all the tools
we teach in Learn Microsoft Excel for Data Analysis.

Mathematical Formulas

Most of the standard mathematical formulas are the
same across spreadsheet programs, including =MAX,
=MIN, =AVERAGE, and =SUM.

Dragging Formulas

Like Microsoft Excel, Google Sheets lets you drag
formulas by selecting the bottom-right corner of the
cell you want to drag. In some tables, Google Sheets
may offer an “autofill” option before you drag the
formula down.

Dollar Signs

Google Sheets uses the same dollar sign syntax as
Microsoft Excel when writing draggable formulas. A $
before the column letter stops the column from
changing when dragged, while a $ before the row
number stops the row number from changing when
dragged.

=A4 becomes =A5 when dragged down one row
=A$4 remains =A$4 when dragged down one row

Sorting Data

To sort data in Google Sheets, select the data you want
to sort, then select
Data → Sort Range → Advanced Range Sorting Options.
If your range has headers, select the has
headers box so that those stay at the top. Then,
select the column you want to sort by and the direction of
sorting (i.e. increasing/decreasing).

Filtering Data

To filter data in Google Sheets, select the data you want
to filter including the headers, then select Data →
Create a Filter. This will add filter icons to
the header of each column of the range. Click the filter
icon on the column you want to filter by. Use Filter
by Condition to filter a range of values by
selecting an option from the dropdown.

Pivot Tables

Pivot tables can be created in Google Sheets by
selecting the table to pivot and then selecting Insert
→ Pivot Table.

Customizing Pivot Tables

To customize a pivot table in Google Sheets, click
Add next to Rows to select row labels, and click Add
next to Columns to select column labels.

Pivot Table Calculations

To customize the values in a pivot table in Google
Sheets, click Add next to Values to select the
column to use in the calculation. Use the
Summarize by dropdown to switch between
count, average, max, etc.

Refreshing Pivot Tables

Unlike Microsoft Excel, Google Sheets refreshes pivot
tables automatically.

Pivot Table Source

To change the source of a pivot table in Google Sheets,
select a cell of the pivot table to see the pivot table
menu. The current source range is listed at the very top
of the menu. Click the icon next to that range to alter
the range.

Create a Heatmap

To create a heatmap in Google Sheets, select the data.
Then select Format → Conditional
Formatting and click Add another rule.
Select the Color Scale tab. Click the color scale
under Preview to select a color scale.

Create a Cell Rule

To color cells based on a rule in Google Sheets, first
select the data. Then select Format →
Conditional Formatting and click Add
another rule. Make sure you are on the
Single Color tab. Use the Format cells
if dropdown to add the rule (e.g. if <26) and click the
preview color to select a color.

Create a Chart

To create a chart in Google Sheets, start by selecting
the entire table of data, then select Insert →
Chart. Make sure you are on the Setup tab of
the chart menu, and click the Chart Type
dropdown to select the type of chart (pie, column,
histogram, scatter, line, …).

Modify Chart Titles

To add titles to a chart in Google Sheets, first select the
chart. Then select the Customize tab of the chart
menu. Select Chart and Axis Titles to
modify the chart and axis titles (note: you’ll have to use
the Chart Title dropdown to view and modify
the axis titles).

Import a File

Select File → Import.

Text Functions

Google Sheets has the same basic text-cleaning
functions as Microsoft Excel, including =LEFT,
=LOWER, and =TRIM.

If cell A3 contains the value "Test "
=LEFT(A3,2) is "Te"
=TRIM(A3) is "Test"
=LOWER(A3) is "test "

Formatting Cells

Like Microsoft Excel, Google Sheets can apply
formatting to numeric and text cells. Select the data
you want to format, then select Format → Number
to format numbers, currency, and dates (or Format
→ Text to format text).

Protecting Sheets

Google Sheets can protect sheets from being edited.
Select Data → Protect Sheets and
ranges and then Add a sheet or range.
Select the Range or Sheet tab depending on
which you want to protect. Name the rule, and then
select the sheet or range to protect. Select
Change/Set permissions to modify what
users can do on the sheet or range.

Unprotecting Specific Cells

Google Sheets can unprotect specific cells on an
otherwise protected sheet. When you protect the
sheet, check the box Except certain cells
and then select the unprotected cells.

Hiding Sheets

Google Sheets can hide sheets like Microsoft Excel.
Select the arrow on the sheet name, and then Hide
Sheet.
