Data Visualization Rubric
Data Visualization
The Science Behind the Art of Data Storytelling
Agenda
90%
5x 1/10
SECOND
▪ People perceive certain visual cues (variables) more readily than non-visual cues.
Perception
Measurement: Can you make a numeric observation from a change?
▪ Most quantitative analysis can be performed with charts that use only four kinds of objects.
▪ These objects (and their subsequent related charts) work well because we immediately and more precisely perceive both position and length.
Points: 2D position, example: scatter plot
Lines: 2D position, example: line chart
Bars/Columns: 2D position + length, example: bar chart
Boxes: 2D position + length (unlike bars, show distribution of an entire set of values), example: box plot
Know Your Purpose and Data
Know Your Data
Dimensions (Categorical)
Measures
▪ Measures constitute numerical data that are calculated or aggregated, like the sum of Revenue, average Cost, or Profit-per-capita, or non-numeric data that are counted.
What do measures represent?
Measures can represent observations in your data or calculated values.
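The distinction between dimensions and measures can be sketched in a few lines of Python. This is only an illustration: the rows, the field names (region, revenue) and the helper aggregate are hypothetical and are not part of any tool described here.

```python
from collections import defaultdict

# Hypothetical rows: "region" is a dimension (categorical),
# "revenue" is a measure (numeric, can be aggregated).
rows = [
    {"region": "North", "revenue": 120.0},
    {"region": "North", "revenue": 100.0},
    {"region": "South", "revenue": 90.0},
]


def aggregate(rows, dimension, measure):
    """Group rows by a dimension and return (sum, average, count) of a measure."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {
        key: (sum(values), sum(values) / len(values), len(values))
        for key, values in groups.items()
    }


revenue_by_region = aggregate(rows, "region", "revenue")
print(revenue_by_region)
```

The sum and average are calculated measures; the count shows how even non-numeric rows can yield a measure by being counted.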
Correlation: Shows whether there is a potential correlation between two measures
Distribution: Shows how a measure is spread across its domain
Overview: Shows the exact values in table format
Geographical Information and Maps: Shows the geographical distribution of measure values
Line and Area Chart
Figure: Sales revenue in 2015 - 2017
The Line Chart displays measures over a time period. Line Charts are used frequently to show trends and relationships between them. The Y-Axis always shows a measure value, and the X-Axis denotes a time dimension such as Month, Quarter, or Year.
Used for
•Trends
•Data over time
•Temporal patterns and correlation
•Period-over-period
Figure: The impact of different product lines by sales revenue in 2015 - 2017
Suggestions
1. Create a time hierarchy to allow drilling up or down to Days, Months, and Years
2. Add a moving average line to smooth the trend over time
3. Add a forecast or linear regression to emphasize current or future trends
4. Consider an Area Chart for showing cumulative totals
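As a rough illustration of the moving average suggestion, the sketch below computes a trailing moving average over a series. The function name moving_average, the window of 3 and the monthly figures are assumptions chosen for the example, not values from the charts.

```python
def moving_average(values, window):
    """Trailing moving average: one value per full window of the series."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


monthly_revenue = [10, 12, 9, 15, 18, 14]  # hypothetical data
smoothed = moving_average(monthly_revenue, window=3)
print(smoothed)
```

A larger window gives a smoother line but reacts more slowly to real changes in the trend.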
Change Over Time
Column Line Chart
Figure: Customer retention rate plays a role in increasing sales (X-Axis: Year / Quarter / Month)
Used for
•Trends
•Data over time
•Temporal patterns and correlation
Suggestions
2. Other options for showing change over time include Bar Charts or Tables.
Comparison
Bar Chart and Stacked Bar Chart
Figure: Number of ships the company possesses based on entities they transport
Bar Charts are probably the most frequently used chart type. Focus the attention of your audience on important details by:
•Ranking data from largest to smallest or vice versa
•Filtering out data that isn’t important for your message
•Grouping data by combining values in a chart: if there are too many categories, you can group less relevant categorical values together into an Other group (for example, “Other Clothing”)
Used for
Comparing different categorical values
Figure: Quantity sold by Manager and Lines
Figure: Which products are most likely to win deals?
Suggestions
1. Use data labels to improve the readability of data values.
2. Customize hierarchies to allow drilling from a high-level overview to more specific details; users easily drill up and down
3. Use Color to clearly differentiate separate categorical values in your dimension
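The ranking and grouping steps described for bar charts can be sketched as follows. The helper rank_with_other and the clothing quantities are hypothetical; a BI tool would normally do this through its own ranking and grouping features.

```python
def rank_with_other(totals, top_n, other_label="Other"):
    """Sort categories from largest to smallest and fold the tail into one group."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    head, tail = ranked[:top_n], ranked[top_n:]
    if tail:
        head.append((other_label, sum(value for _, value in tail)))
    return head


# Hypothetical quantities sold per clothing category.
quantity_by_category = {
    "Shirts": 120,
    "Hats": 15,
    "Socks": 40,
    "Jackets": 95,
    "Scarves": 10,
}
bars = rank_with_other(quantity_by_category, top_n=3, other_label="Other Clothing")
print(bars)
```

Keeping the grouped bar last, regardless of its size, makes it clear that it is a remainder rather than a ranked category.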
Waterfall Chart
Used for
•Cumulative effect
•Deviations and differences
Suggestions
2. Break down the cumulative effect of positive and negative contributions
3. Visualize a starting quantity
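A waterfall chart is driven by running totals: each bar starts where the previous one ended. The sketch below shows that calculation under assumed inputs; the function waterfall_steps, the starting quantity and the contributions are hypothetical.

```python
from itertools import accumulate


def waterfall_steps(start, contributions):
    """Return (label, value_before, value_after) for each contribution."""
    deltas = [delta for _, delta in contributions]
    # Running totals, beginning with the starting quantity (Python 3.8+).
    running = list(accumulate(deltas, initial=start))
    return [
        (label, running[i], running[i + 1])
        for i, (label, _) in enumerate(contributions)
    ]


# Hypothetical starting quantity and positive/negative contributions.
steps = waterfall_steps(
    100, [("New sales", 40), ("Returns", -15), ("Discounts", -5)]
)
final_total = steps[-1][2]
print(steps, final_total)
```

Each floating bar is drawn between value_before and value_after; a negative contribution simply has value_after below value_before.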
Comparison
Trellis Layout of Multiple Charts
Figure: Milk sold in glass containers generates the highest revenue (one panel per Milk Type, for example Regular Milk; X-Axis: Carton, Glass, Plastic; Y-Axis: revenue)
The Trellis Layout, also known as Small Multiples, contains a set of charts based on the same set of data using the same axes to allow the viewer categorical comparisons of different values within a dimension.
Used for
Comparison, identifying patterns across multiple categorical values
Suggestions
1. Used to compare values within a category (such as Trellis by Milk Type to show the Sales Values for each Milk Type in a separate chart).
Ranking
Bar Chart
Number of participants from top 10 countries in a design contest
The Ranking feature allows the user to sort and filter data
based on their importance. For example, we may want to
sort Countries based on their Number of Participants.
The Group by Selection functionality can be used in order
to group values.
Used for
Emphasizing top or bottom values in a chart
Suggestions
1. Often categorical values (in this case Countries) that contribute less to the overall measure value might be filtered out or grouped together in another category.
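Selecting the top or bottom values can be sketched with the standard library. The country names and participant counts below are placeholders, and top_n and bottom_n are hypothetical helpers rather than the Ranking feature itself.

```python
import heapq


def top_n(totals, n):
    """Return the n categories with the largest measure values, largest first."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


def bottom_n(totals, n):
    """Return the n categories with the smallest measure values, smallest first."""
    return heapq.nsmallest(n, totals.items(), key=lambda item: item[1])


# Placeholder participant counts per country.
participants = {
    "Country A": 50,
    "Country B": 20,
    "Country C": 75,
    "Country D": 5,
    "Country E": 30,
}
leaders = top_n(participants, 2)
laggards = bottom_n(participants, 2)
print(leaders, laggards)
```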
Part-to-Whole
Stacked Bar Chart
Figure: Percentage of items sold for each product line in 2015 - 2017
Used for
A Part-to-Whole relationship shows how measure values that make up the whole of something (for example, Number of containers sold) compare to one another and how they each compare to the whole.
Suggestions
1. You can use stacked or side-by-side bars to compare different hierarchy levels (Country, Region) or classifications (Men’s Clothing, Women’s Clothing).
2. You can use a 100% Stacked Bar Chart (or Marimekko Chart) to show the portion that each segment makes up in a category.
3. In addition to Stacked Bar Charts and Marimekko Charts, other charts (such as pie, ring, and funnel charts) can be used to show Part-to-Whole relationships.
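A 100% stacked bar rescales each segment to its share of the bar's total. A minimal sketch of that calculation, assuming hypothetical container counts and a helper named to_percent_shares:

```python
def to_percent_shares(segments):
    """Convert absolute segment values into percentages of their total."""
    total = sum(segments.values())
    if total == 0:
        raise ValueError("cannot compute shares of an empty or all-zero bar")
    return {name: 100.0 * value / total for name, value in segments.items()}


# Hypothetical number of containers sold, by container type.
containers_sold = {"Carton": 50, "Glass": 30, "Plastic": 20}
shares = to_percent_shares(containers_sold)
print(shares)
```

Applying the function once per category (one bar each) gives bars of equal height, so only the proportions are compared, not the absolute totals.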
Part-to-Whole
Pie, Ring, and Funnel Charts
Figure: The percentage of containers sold by container type
Pie, Ring (Donut), and Funnel Charts are used to discern part-to-whole comparisons to either highlight a portion of the data or to compare values for different categorical values. These chart types are generally not recommended if they include too many segments, as the viewer will have a difficult time differentiating between too many different colors.
Used for
Comparing percentage values in proportion to the whole
Suggestions
1. Limit use of Pie Charts to a small number of slices (no more than 5 slices). Do not use a pie chart if the slices are of similar size.
2. Consider showing data labels for ease of reading
3. Highlight only the most important slice if possible
4. Compare with using a bar chart or ring (donut) chart; the viewer is more likely to
Distribution
Histogram and Grouping
Figure: Number of prescription drugs sold in a pharmacy across different age groups
Box Plot
A Box Plot visually displays statistical distribution of a measure within a dataset. It is often used to also show the range in values for each categorical value. The lines on the box plot refer to the minimum, first quartile, median, third quartile, and maximum range of variation. The dots on the box plot are visual representations of the outliers.
Used for
Comparison, distribution of values, identifying outliers
Suggestions
1. Compare data distribution for several categorical values
2. Show distribution of medians in data
3. Include a reference line for the overall median in your data
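The five values a box plot draws, plus its outliers, can be computed as in the sketch below. Treating points beyond 1.5 times the interquartile range as outliers is a common convention assumed here; the slides do not say which rule the tool applies. The function box_plot_summary and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers for one box in a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": inside[0],    # lowest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],   # highest value that is not an outlier
        "outliers": outliers,
    }


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical measure values
summary = box_plot_summary(sample)
print(summary)
```

Running the function once per categorical value gives the side-by-side boxes used to compare distributions.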
Distribution
Heat Map and Tree Map
Figure: Sales Revenue and Quantity sold by Country
Scatter Plot
Figure: Countries with higher household income have higher population growth
Used for
Showing the correlation of two measures
Suggestions
2. Keep the aspect ratio square
3. Create a Geo Hierarchy on top of location data (for example,
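A scatter plot suggests a correlation visually; a correlation coefficient puts a number on it. The sketch below computes Pearson's r by hand. The function name pearson_r and the income and growth figures are made up for illustration and are not taken from the chart.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up measures: household income (thousands) and population growth (%).
income = [20, 35, 50, 65, 80]
growth = [0.5, 0.9, 1.1, 1.6, 1.8]
r = pearson_r(income, growth)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship; it does not by itself show that one measure causes the other.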
Table
You can find the Table as one of the visualization types in the Data view section.
Figure: Discount, Gross Margin and Sales Revenue by Lines
Used for
Show multiple measures in one or two categories or hierarchies
Suggestions
1. Best for showing exact values
2. Often charts and Tables are shown on the same page, as they emphasize different aspects
3. Highlight key information with the Conditional Formatting feature
4. Setting the correct precision (number formatting) for measures included in a Table is paramount in order to not overload the user
Geographical Information and Maps
Choropleth Map
Figure: Unemployment percentage across different states in the United States in 2014 (legend: Unemployment, 0.03 to 0.13)
A Choropleth Map uses differences in shading, coloring, or the placing of symbols within predefined regions to indicate measure values in those areas.
Used for
Supports location-based comparisons of standardized data such as Rates, Densities, or Percentages
Suggestions
1. Use the Choropleth Map for locations of similar size, as the size of the area coloured may overemphasize larger areas (for example, Canada covers a much larger area than Japan despite being much smaller in terms of population)
2. Make sure your measure values are normalized by the geographic properties, for example, by the population of a geographic area
3. Remember that the granularity of your regions (counties, for example) will impact the signal (aggregated measure values) from your data.
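Normalizing a measure by population before mapping it can be sketched as below. The region names and counts are invented for the example, and rate_per_100k is a hypothetical helper.

```python
def rate_per_100k(counts, population):
    """Convert raw counts per region into a rate per 100,000 inhabitants."""
    rates = {}
    for region, count in counts.items():
        people = population[region]
        if people <= 0:
            raise ValueError(f"population for {region!r} must be positive")
        rates[region] = count * 100_000 / people
    return rates


# Invented figures: the same raw count in a small and a large region.
unemployed = {"Region A": 5_000, "Region B": 5_000}
population = {"Region A": 100_000, "Region B": 1_000_000}
rates = rate_per_100k(unemployed, population)
print(rates)
```

Both regions have the same raw count, but the rates differ by a factor of ten, which is the difference a choropleth map should be shading.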
Geographical Information and Maps
Geo Bubble Chart
Show trends over time and forecast future values based on historical data within the chart.
Use a scatterplot to find out whether there is a correlation between two specific measures.
Often used in Stories to reflect an important key figure
Network Chart, Radar Chart, Parallel Coordinates
Tips
4. Avoid pie charts.
5. Start bar charts at zero.
6. Use bullet graphs instead of gauges to save space.
Key Data
7. Use sparklines to show trends on the X-axis.
8. Show time going from left to right on the X-axis.
9. Use color only to highlight or accentuate meaning.
▪ Features:
• A rich set of data visualizations
• Interactivity: You can create visualizations even without knowledge of SQL or
Python!
• An easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Enterprise-ready authentication with integration with major authentication
providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask
AppBuilder)
• An extensible, high-granularity security/permission model allowing intricate
rules on who can access individual features and the dataset
• A simple semantic layer, allowing users to control how data sources are
displayed in the UI by defining which fields should show up in which drop-
down and which aggregation and function metrics are made available to the
user
• Integration with most SQL-speaking RDBMS through SQLAlchemy
• Deep integration with Druid.io (high-performance real-time analytics database)
• Completely free, no user license or one-time download fee
Superset – Data Sources
▪ Required Packages
Superset – SQL Lab
▪ SQL Lab is a powerful SQL IDE that works with all SQLAlchemy
compatible databases.
▪ Queries are executed in the scope of a web request.
▪ To enable support for long-running queries that execute beyond the typical web request’s timeout (30-60 seconds), set SQLLAB_TIMEOUT = 180 in Superset’s config.py file.
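As a configuration fragment, the setting might look like the following. Placing it in a superset_config.py override module instead of editing config.py directly is an assumption about the deployment; the value 180 is the one given above. Other timeouts in front of Superset (for example, in the web server or a proxy) may also need to be raised, which this sketch does not cover.

```python
# superset_config.py
# Assumed local override module; the slides edit config.py directly.

# Timeout, in seconds, for queries run from SQL Lab within a web request.
SQLLAB_TIMEOUT = 180
```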
Superset – Creating chart/slice
▪ Configure Chart
▪ Save Chart / Save and add to an existing Dashboard / Save and add to a new Dashboard
▪ View Dashboard
Superset – Making your Dashboard Public