Lecture 4. Visualization(1)

Uploaded by

Berke Al
Copyright
© © All Rights Reserved

Chapter 4. Communicating Results and Visualization
• Determine the purpose of your data visualization
• Choose the best chart for your dataset
• Refine your chart to communicate efficiently and effectively
• Communicate your results in a written report
In the IMPACT cycle, we’re now going to look at Communicating Insights and Tracking Outcomes.

[Figure: the IMPACT cycle, with the steps Identify the questions, Master the data, Perform test plan, Address and refine results, Communicate insights, Track outcomes]

Exhibit 1-1 The IMPACT Cycle


Data analytics is only as important and effective as our ability to communicate the
results and make the data understandable.

What is the purpose of your data visualization?
Data with the same statistics can be
interpreted differently.

Exhibit 4-2 Anscombe's Quartet (Data)
Exhibit 4-3 Plotting the Four Datasets in Anscombe's Quartet
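To make the point concrete, here is a minimal sketch that computes the summary statistics of the four datasets. The numbers are the standard published values of Anscombe's quartet, typed in here rather than read from the exhibit, and the helper name pearson is only illustrative.

```python
from statistics import mean

# Standard published values of Anscombe's quartet (not copied from the exhibit).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# One (mean of x, mean of y, correlation) triple per dataset.
summary = {name: (mean(xs), mean(ys), pearson(xs, ys))
           for name, (xs, ys) in quartet.items()}

for name, (mx, my, r) in summary.items():
    print(f"Dataset {name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The four rows print nearly identical means and correlations, yet the plots look completely different, which is why the data should be visualized and not only summarized.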
Visualizations are preferred over text.
• People prefer visuals.
• The brain can process visuals faster.
• Visuals can summarize complex information.
Chart Types

• What type of data is being visualized?
• Are you explaining results or exploring the data?

Exhibit 4-2 The four chart types


Qualitative and quantitative data

Qualitative data are categorical data (e.g., count, group, rank)
A. Nominal data is simple. (e.g., hair color)
B. Ordinal data can be ranked. (e.g., gold, silver, bronze)
C. Proportion shows the makeup of each category. (e.g., 55% cats, 45% dogs)

Quantitative data are numerical (e.g., age, height, dollar amount)
A. Ratio data defines 0 as “absence of” something. 0 is treated as the point of origin. (e.g., cash)
B. Interval data is data where 0 is just another number. (e.g., temperature)
C. Discrete data show only whole numbers. (e.g., points in a basketball game)
D. Continuous data show numbers with decimals. (e.g., height)
E. Distributions describe the mean, median, and standard deviation of the data.
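
As a rough illustration of the two families above, the sketch below computes a proportion for a nominal variable and the mean, median, and standard deviation for a continuous one. The pet and height values are made up for the example.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical nominal data: 11 cats and 9 dogs.
pets = ["cat"] * 11 + ["dog"] * 9
counts = Counter(pets)
# Proportion: the share of each category in the whole.
proportions = {animal: n / len(pets) for animal, n in counts.items()}

# Hypothetical continuous data: heights in centimeters.
heights_cm = [160.5, 172.0, 168.2, 181.3, 175.0]
distribution = {
    "mean": mean(heights_cm),
    "median": median(heights_cm),
    "stdev": stdev(heights_cm),  # sample standard deviation
}

print(proportions)
print(distribution)
```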

Probability Distributions
• A distribution in statistics (probability distribution) is a function that
describes the likelihood for all the possible values a random variable
will assume.
• The normal distribution is a bell-shaped probability distribution that
many naturally occurring datasets in our world follow.
Normal Distribution
• Bell-shaped, symmetric family of distributions
• Classified by 2 parameters: Mean (µ) and standard deviation (σ).
• Random variables that are approximately normal have the following
properties:
• Mean is the same as the Median
• Approximately half (50%) fall above (and below) mean
• Approximately 68% fall within 1 standard deviation of mean
• Approximately 95% fall within 2 standard deviations of mean
• Virtually all fall within 3 standard deviations of mean
• Notation when Y is normally distributed with mean µ and standard
deviation σ:
$$Y \sim N(\mu, \sigma)$$
Normal Distribution

$$P(Y \ge \mu) = 0.50 \qquad P(\mu - \sigma \le Y \le \mu + \sigma) \approx 0.68 \qquad P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \approx 0.95$$
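
The 68% and 95% figures can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the function name prob_within is only illustrative.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")

# The result does not depend on which mean and standard deviation are chosen.
print(f"{prob_within(2, mu=50.0, sigma=4.0):.4f}")
```

The values come out at roughly 0.68, 0.95, and 0.997, matching the properties listed above.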


Standard Normal (Z) Distribution
• Problem: Unlimited number of possible normal distributions
(−∞ < µ < ∞, σ > 0)
• Solution: Standardize the random variable to have mean 0
and standard deviation 1

$$Y \sim N(\mu, \sigma) \;\Rightarrow\; Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$$

• Probabilities of certain ranges of values and specific percentiles of interest can be obtained
through the standard normal (Z) distribution
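
A small sketch of the standardization step, assuming a made-up variable with mean 100 and standard deviation 15: the probability computed from the original distribution matches the one computed from Z.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Standardize y: how many standard deviations it lies from the mean."""
    return (y - mu) / sigma


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0
y = 130.0

z = z_score(y, mu, sigma)
p_direct = NormalDist(mu, sigma).cdf(y)    # P(Y <= 130) on the original scale
p_standard = NormalDist(0.0, 1.0).cdf(z)   # P(Z <= z) on the standard scale

print(f"z = {z}, P(Y <= {y}) = {p_direct:.4f}, P(Z <= z) = {p_standard:.4f}")
```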
Is your visualization declarative or exploratory?
• Declarative visualization is used to declare or present your findings to
an audience. (e.g., presenting financial reports)
• Exploratory visualizations are used to gain insights while you are
interacting with data. (e.g., identifying potential customers)
Once you have defined your data and the purpose, you can find an
appropriate chart or graph.

Exhibit 4-5 The four chart types with details


Which charts are appropriate for qualitative data?
Most common charts to show proportion:
• Bar charts
• Pie charts
• Stacked bar charts

Visualize Proportion by Pie Chart vs. Bar Chart
Stacked Bar Chart
More options for showing proportion
• Tree maps and heat maps: tree maps use area and heat maps use color to
show the scale of a value.
• Symbol maps: Geographic maps with symbols on them representing values.
• Word clouds: Count the frequency of each word in the data file.
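
The counting step behind a word cloud can be sketched in a few lines. The sample sentence and the function name word_frequencies are made up; a real word cloud tool would also scale each word by its count.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical text standing in for the data file.
sample = "Data beats opinion. Good data, clearly shown, beats more data."
top = word_frequencies(sample)
print(top)
```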
Which charts are appropriate for quantitative
data (more complicated data)?
• Line charts: Show changes and trends.
• Box and whisker plots: Show the median, quartiles, maximum,
and minimum of the data.
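
The numbers a box and whisker plot draws can be computed directly. The sketch below uses made-up sales figures; the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something stated in these slides.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4)  # default "exclusive" quartile method
    return {"min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1]}


def outliers(values, k=1.5):
    """Values lying more than k * IQR beyond the quartiles (assumed convention)."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily sales with one extreme day.
sales = [12, 15, 14, 10, 13, 16, 11, 14, 95]
summary = five_number_summary(sales)
flagged = outliers(sales)
print(summary)
print(flagged)
```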
More Options
• Scatter plots

• Filled geographic maps


Here is a summary guide of when to use different visualizations.

Conceptual (Qualitative)
• Comparison: Bar chart, pie chart, stacked bar chart, tree map, heat map
• Geographic data: Symbol map
• Text data: Word cloud

Data-Driven (Quantitative)
• Outlier detection: Box and whisker plot
• Relationship between two variables: Scatter plot
• Trend over time: Line chart
• Geographic data: Filled map

Exhibit 4-8 Types of charts


Which tools are helpful for creating visualizations?
• Tableau and Microsoft Power BI are great for exploratory data
analysis.
• Tableau and Microsoft Power BI top the list of visionary leaders for
visualization tools.
• Microsoft Excel is good for basic declarative charts.

Exhibit 4-9 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms
Tableau uses a workbook and sheet file structure, much like Microsoft Excel.
• A workbook contains sheets.
• A sheet can be a worksheet, a dashboard, or a story.
• A worksheet is where you create a visualization using data. It contains a
single view along with many elements.
• A dashboard is a collection of views from multiple worksheets that you can
arrange and present in a large “board”.
• A story contains a collection of worksheets or dashboards that are shown in
sequence to convey information, like a PowerPoint presentation.
In this chart, the Daily Mail, a UK-based newspaper, tries to emphasize
an upgrade in the estimated growth of the British economy.

• How big of a change does this represent?
• Why might the creator make this chart?

Exhibit 4-12
A more appropriate scale is a good start.

Exhibit 4-12 Exhibit 4-13
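
To see why the scale matters, the sketch below compares how tall two bars look when the axis starts at zero and when it starts just below the smaller value. The growth estimates and the baseline are hypothetical and are not the figures from the exhibit.

```python
def apparent_ratio(old, new, baseline=0.0):
    """How many times taller the new bar looks than the old one for a given axis baseline."""
    if old <= baseline:
        raise ValueError("baseline must be below the old value")
    return (new - baseline) / (old - baseline)


# Hypothetical growth estimates in percent.
old_estimate, new_estimate = 1.5, 1.6

true_ratio = apparent_ratio(old_estimate, new_estimate)                      # axis starts at 0
truncated_ratio = apparent_ratio(old_estimate, new_estimate, baseline=1.45)  # axis starts at 1.45

print(f"zero baseline: new bar is {true_ratio:.2f}x the old bar")
print(f"truncated axis: new bar is {truncated_ratio:.2f}x the old bar")
```

With a zero baseline the new bar is about 7% taller. With the truncated axis it looks about three times as tall, although the underlying numbers are unchanged.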


Stacking can reveal the real increase.

Exhibit 4-12 Exhibit 4-14


This chart tries to show whose computer is attacked more.

Exhibit 4-15
If we care about individuals, an ordered bar chart is clearer.
If we care about job function, a bar chart can show the proportion more clearly.
And a stacked bar chart shows the proportion by job function.
The following four charts (A, B, C, and D) represent exactly the same data: the quantity of each beer sold.
Which do you prefer?
Consider scale and increments:
• How much data do you need to show?
• What do you do with outliers?
• What is the baseline? 0? Something else?
• Would context or reference lines make the scale more meaningful?
Think about your use of color:
• When should you use multiple colors?
• Certain colors carry cultural meanings (e.g., red meaning “stop”
and green meaning “continue,” just like with traffic lights).
• Once your chart has been created, convert it to grayscale to
check that the contrast still exists. This ensures both that your
color-blind audience can interpret your visuals and that the
contrast is stark enough.
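
The grayscale check can also be approximated in code. The sketch below converts RGB colors to a single gray level using the 0.299 / 0.587 / 0.114 luma weights, which are one common convention assumed here; the color values are made up.

```python
def to_gray(rgb):
    """Approximate gray level (0 to 255) of an RGB color using common luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_contrast(color_a, color_b):
    """Difference in gray level between two colors; small values are hard to tell apart."""
    return abs(to_gray(color_a) - to_gray(color_b))


# Hypothetical chart colors.
red, green = (200, 30, 30), (30, 110, 30)
navy, yellow = (0, 0, 128), (255, 255, 0)

print(f"red vs. green:   {gray_contrast(red, green):.1f}")
print(f"navy vs. yellow: {gray_contrast(navy, yellow):.1f}")
```

The red and green pair nearly collapses to the same gray, while navy and yellow stay far apart, so the second pair survives a grayscale printout much better.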
Remember to use plain language throughout the IMPACT model.
• I: Explain what was being researched and the purpose of the project.
• M: If appropriate, describe issues you encountered in the ETL process.
• P and A: Give an overview of your model and limitations you faced.
• C: Provide an explanation of the visual you chose. Describe any
items that stand out or that are interesting.
• T: Discuss what’s next in your analysis. How frequently will it be
updated? Are there trends or outliers that should be paid attention to?
Consider your audience and tone
• Place the focus on your audience.
• Provide the right content.
• Craft different versions for different audiences.
• Avoid too much detail.
• Use an appropriate tone.
Two Examples of Data Visualization (Tableau)
• The U.S. Department of the Treasury, with other government
agencies and the help of a consulting firm, implements the
Digital Accountability and Transparency Act (DATA). As part
of the effort, the Government Accountability Office publicizes its
budget data here:
USAspending.gov
• Visualization of the COVID situation worldwide:
https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Tableau example - Superstore
• Tableau software comes with the example “Superstore”.
• First, let’s open the Excel file that is the data source of this example.
You can open it from your own computer, where it should be stored
under “Documents/My Tableau Repository”. If you have trouble
locating it, download the Excel file from the “data and other course
materials” folder on the Blackboard course webpage.
• Once you’ve opened the data source, double-click each of the three
worksheets. Tableau generates the relationships between the sheets
automatically, and you can redefine them. The new version
of Tableau keeps the relationships without joining the tables into a flat
data file, so we don’t need to select a join type here.
Tableau example - Superstore
Now, let’s do some simple analysis of the data in Tableau.
• Show sales by date, category, and sub-category.
• Color code the sales numbers by profit.
• Show the same breakdown as above by region.
• Show the data in map view by double-clicking a location field
in the data pane.
• Create a calculated field, e.g., profit margin ratio.
• Create a dashboard by dragging and dropping worksheets, then rearrange
and format it.
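
For the calculated field, a profit margin ratio is typically total profit divided by total sales. In Tableau this would be a formula along the lines of SUM([Profit]) / SUM([Sales]); the field names are assumed here, so check them against your data source. The Python sketch below shows the same calculation on a few made-up rows.

```python
# Hypothetical order rows; the real Superstore file has many more columns.
orders = [
    {"region": "East", "sales": 200.0, "profit": 50.0},
    {"region": "East", "sales": 100.0, "profit": -10.0},
    {"region": "West", "sales": 400.0, "profit": 60.0},
]


def profit_ratio(rows):
    """Total profit divided by total sales for the given rows."""
    total_sales = sum(row["sales"] for row in rows)
    total_profit = sum(row["profit"] for row in rows)
    return total_profit / total_sales if total_sales else 0.0


def profit_ratio_by(rows, key):
    """Profit ratio computed separately for each value of the grouping key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {name: profit_ratio(group) for name, group in groups.items()}


overall = profit_ratio(orders)
by_region = profit_ratio_by(orders, "region")
print(f"overall: {overall:.3f}")
print(by_region)
```

Dividing the totals, instead of averaging each row's own ratio, keeps large orders from being outweighed by small ones, which is why the aggregation happens before the division.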
