Data Storytelling 101

Uploaded by

priyanga
Data Storytelling 101

Things you should know about creating data stories
Data storytelling is a critical skill for data scientists, analysts & managers

Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common items like mugs, golf balls, toys, etc. The item descriptions were stories purpose-written by 200+ contributing writers.
Items that were bought for $250 sold for over $8,000: a return of over 3,000% for storytelling!

Stories are memorable and viral
People remember stories. They’ll act on them.
People share stories. That enables collective action.
We analyze data to improve people’s decision making. For this to be effective, data stories are needed more than ever before.

But analysts present their work, not their message
Data scientists present their analysis: what they did, and what they found. That’s not what the audience needs.
Audiences need a message that tells them what to do, and why. Told in an engaging way. As a story.

Share your data & analysis as data stories
Whenever you share inferences from data, whether it’s as a presentation, an email or document with your analysis, or a dashboard, craft it as a story.
This workshop will teach you the techniques of how to convert an analysis into a memorable story, even if you’ve never told a story before.

With the growth of self-service BI, 85% of companies have lost track of how many dashboards they generated

BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS
• What QUESTION does the dashboard answer?
• Is the ANSWER evident from the dashboard?
• What ACTION should the user take now?

We’ve been telling stories with data for a long time

Let’s look at 15 years of US Birth Data

This is a dataset (1975 to 1990) that has been around for several years and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.

For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?

[Heatmap: More births / Fewer births, on average, for each day of the year (from 1975 to 1990)]

• Some special days like April Fool’s day are avoided, but Valentine’s Day is quite popular
• Most people prefer not to have children on the 13th of any month, given that it’s an unlucky day
• Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
• Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season
Fraud Education
The pattern in India is quite different

This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.

For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

[Heatmap: More births / Fewer births, on average, for each day of the year (from 2007 to 2013)]

• We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, round numbered dates
• Such round numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission
• Very few children are born in the month of August, and thereafter. Most births are concentrated in the first half of the year
Fraud Education
This adversely impacts children’s marks

It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.

Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

[Heatmap: Higher marks / Lower marks, on average, for children born on a given day of the year (from 2007 to 2013)]

Children “born” on round numbered days score lower marks on average, due to a higher proportion of younger children

Fraud Education
You have data. You have analysis. Now what?

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories
Step 1: Understanding audience & intent

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories


Define your audience, they determine the story

The same data analysis can be relevant to many people. Each group is called a persona.
• The trends in sales data for an organization are relevant for a CEO, head of sales, region leads, individual sales team members & every employee.
• The analysis of polio cases in UP is relevant to the Minister of health, polio campaign manager, field workers, NGOs, journalists & general public.

DO IT: Who might be an audience for your analysis?
• Look back at your recent analytics project.
• Who do you know that can use this analysis? (Come up with real or hypothetical personas)

CHECK IT: Verify these yourself
✓ Is there a name for the individual?
✓ Was the role specific enough? (Head of sales instead of just executive)

This section will dive deeper into defining a persona


Know your audience’s needs, that helps align the message

List scenario(s) for each persona
For each persona, answer the following questions:
1. What situation are they currently in?
2. What problems do they face?
3. What is the consequence?
4. What action do they need to take using your analysis?
5. What is the impact of this action?
Combine these as a user scenario:
“As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]”
• John: As a Marketing manager, I have to create a region-wise budget for the next quarter. I don’t know which regions give the highest ROI, so my spend isn’t optimized. Solving it by prioritizing the regions will lead to maximum ROI.

DO IT: Start with your own hypothesis
• Pick one of the personas you had listed earlier.
• What problems do you think your persona is facing?
• How do you feel the persona will use the analysis?
• Frame it as a user scenario.

CHECK IT: Verify user scenario with a partner
✓ Is it framed as “As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]”
✓ Would the persona relate to this user scenario if they heard it?

Clear needs & a future scenario lead to effective communication.


Reference: SPIN Selling by Neil Rackham
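
The user scenario template can be filled in mechanically once the five questions are answered. Below is a minimal sketch, assuming a hypothetical `UserScenario` class whose field names mirror the placeholders in the template; the values paraphrase the John example and are not part of the original deck.

```python
from dataclasses import dataclass


@dataclass
class UserScenario:
    """Answers to the persona questions (class and field names are illustrative)."""
    persona: str
    situation: str
    problem: str
    consequence: str
    action: str
    impact: str

    def as_sentence(self) -> str:
        # Fill the scenario template with this persona's answers.
        return (
            f"As a {self.persona}, I'm in {self.situation} where I face "
            f"{self.problem}, leading to {self.consequence}. "
            f"Solving it by {self.action} leads to {self.impact}."
        )


# Paraphrase of the John example, used only to exercise the template.
john = UserScenario(
    persona="Marketing manager",
    situation="next quarter's region-wise budgeting",
    problem="not knowing which regions give the highest ROI",
    consequence="unoptimized spend",
    action="prioritizing regions by ROI",
    impact="maximum ROI",
)
print(john.as_sentence())
```

Writing the scenario as structured fields makes it easy to spot a missing answer before checking the sentence with a partner.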
Step 2: Finding Insights

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories


Insights must be Big, Useful, and Surprising

IS THE INSIGHT BIG
The analysis must, of course, be statistically significant.
But it should also be numerically significant.
We want a result that substantially changes the outcome.

IS THE INSIGHT USEFUL
What should the audience do after hearing the insight?
Can they take an action that improves their objective?
Even if it’s informational, what should they do next?

IS THE INSIGHT SURPRISING
Is this something they didn’t know? Is it non-obvious?
Does it overturn a domain-driven belief or a gut feel?
Or does it bring consensus to a group with divided opinion?

Filter the analyses using these as a checklist

Rate each analysis on Big, Useful and Surprising (High, Medium, Low)

Insights | Big | Useful | Surprising
Twice as many Detractors talk about our Product’s ease of use. | Low | Medium | High
Typing with capitalization in a credit application indicates creditworthiness | Low | Low | High
Almost 20% of all voice search queries are triggered by just 25 words | Low | High | Medium
More engaged employees have fewer accidents | Low | High | Low
About 50% of American small businesses do not have a website | High | Medium | Low
The recommendation system influences about 80% of content streamed on Netflix | High | Low | Low

Only those that are high or medium on all aspects are insights
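
The checklist can be applied as a simple filter. The sketch below is one way to do it, assuming ratings are recorded as the strings High, Medium and Low; the function name `is_insight` is hypothetical, three rows are copied from the table, and the last row is a made-up example added only to show one that passes.

```python
RATING = {"Low": 0, "Medium": 1, "High": 2}

# (analysis, big, useful, surprising)
analyses = [
    ("Almost 20% of all voice search queries are triggered by just 25 words",
     "Low", "High", "Medium"),
    ("More engaged employees have fewer accidents", "Low", "High", "Low"),
    ("About 50% of American small businesses do not have a website",
     "High", "Medium", "Low"),
    # Hypothetical row, not from the deck:
    ("Hypothetical: churn doubles after a second support ticket",
     "High", "High", "Medium"),
]


def is_insight(big, useful, surprising, minimum="Medium"):
    """True only when every aspect is rated at least `minimum`."""
    threshold = RATING[minimum]
    return all(RATING[r] >= threshold for r in (big, useful, surprising))


insights = [text for text, *ratings in analyses if is_insight(*ratings)]
print(insights)
```

A single Low rating on any aspect is enough to drop an analysis, which is why none of the three rows copied from the table qualify.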
Step 3: Storylining

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories


Storylines are plot outlines. They summarize the entire story

Gladiator’s storyline
• The Emperor asks General Maximus to take control of Rome and give it back to the people
• The ambitious Prince murders the Emperor.
• Maximus is sold as a gladiator slave. His family is murdered
• Maximus grows famous, fights the Prince in the arena, and wins
• He joins his family in death. Rome is in the hands of the people

A business storyline
• Our NPS improved by 6 points
• It was 34% in 4Q18. Now it’s at 40% in 2Q19
• Despite lower satisfaction with our Support, our NPS grew
• This increase in NPS was mainly due to better Product Quality & Research

Notice “characters” in red. All stories have characters, human or otherwise.

Outlines are the backbone on which you flesh out your story.
This section explains how to create storylines
1. Start with the takeaway (The elevator pitch. The moral of the story.)

Close your eyes. Think of a childhood tale. Summarize the moral of the story in one line

We easily remember these stories and their summary as a moral several years later.

Close your eyes. Think of a business presentation from last week. Can you easily summarize the message in one line?

Stories are designed around a moral. A single takeaway. An “elevator pitch”

DO IT: Write your takeaway as one sentence
What’s the one thing you want the audience to remember from your story?
What’s the one message that the audience should take away?

CHECK IT: Verify these yourself
✓ Is it a single, complete sentence?
✓ Does it deliver what you want the audience to remember?
✓ Will your audience care a lot about this?

It’s a one-sentence summary of the most important message for the audience.
2. Find analysis that supports your takeaway. Ignore irrelevant content

What supports your takeaway from “The Lion and the Mouse”?
http://www.read.gov/aesop/007.html
• The lion was an Asiatic lion
• The lion had a huge paw
• The lion spared the mouse it caught
• The lion was caught by a hunter’s net
• It was stalking its prey when it got caught
• The mouse was nibbling grass nearby
• The mouse took a few minutes to cut the net
There’s no right or wrong answer. Think about how it supports your takeaway.

DO IT: Write your supporting analysis
1. List all possible analyses
2. Re-word them as sentences
3. Strike off what’s not relevant

CHECK IT: Verify these yourself
✓ Is each necessary? Does each analysis support the takeaway?
✓ Are they sufficient? Do the analyses prove the takeaway?

Only include analysis that proves the takeaway.
Ensure that they fully prove the takeaway.
3. Convert analysis into messages by adding context

On its own, analysis doesn’t mean anything to people. When it does, it’s a message. We get there by adding context. Three ways to add context are:
1. Compare with similar numbers.
Our $15 mn sales is $3 mn more than last year, $1 mn below budget, and twice our nearest competitor’s.
2. Explain with analogies.
If we stopped producing, it’ll take 3 months to dispose of our excess inventory of $2 mn.
3. Add business interpretation.
Usage is correlated with discounts. For every $1 discount, customer LTV increases by $24.

DO IT: Add context to your analysis
1. Take each relevant analysis
2. Convert it to a message for the audience by adding context

CHECK IT: Verify these yourself
✓ Will your audience understand the messages without explanation?
✓ Will your audience understand why this message is relevant?

Frame each analysis as a message that the audience will understand and find relevant
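
The first technique, comparing with similar numbers, lends itself to a small helper. This is a rough sketch, assuming a hypothetical `add_context` function; the benchmark values of 12 and 16 are simply derived from the slide's own sentence ($3 mn more than last year, $1 mn below budget), and amounts are assumed to be in $ mn.

```python
def add_context(metric, value, benchmarks):
    """Turn a bare number into a message by comparing it with benchmarks.

    `benchmarks` maps a benchmark name to its value (all amounts in $ mn).
    """
    parts = []
    for name, reference in benchmarks.items():
        diff = value - reference
        if diff > 0:
            parts.append(f"${diff:g} mn above {name}")
        elif diff < 0:
            parts.append(f"${-diff:g} mn below {name}")
        else:
            parts.append(f"level with {name}")
    return f"Our {metric} of ${value:g} mn is " + ", ".join(parts) + "."


message = add_context("sales", 15, {"last year": 12, "budget": 16})
print(message)
```

The bare figure "$15 mn" carries no judgement; the comparisons tell the audience whether it is good or bad news.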
4. Structure the messages into a pyramid or a tree

Construct a pyramid or tree-like outline
• Start with the takeaway at the root of the tree
• Add a message that supports the takeaway
• Add further details or supporting messages
• Messages must prove the first message, and only the first message
• Strike off any message that isn’t required to prove or support the takeaway
• Add the next message that supports the takeaway
• Add details to prove the second message
• Repeat for the remaining messages that support the takeaway
• Add details as required

Example of a business tree
Launch sales were 30% less than target due to high competition
• Launch sales were projected at $20 mn in the first month, but achieved only $14 mn
  o Sales in most regions were 20-50% lower.
  o Only Philippines & Korea were on target
• Competitors discounted price by 35%, which is unsustainable for them
  o 80 store discounts increased from 15% to 35%
  o The maximum sustainable discount is 20%
• Stores that offered higher discounts saw less than 20% of our target sales

Arrange messages hierarchically to prove & support the parent message
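
A tree-like outline maps naturally onto a nested data structure. The sketch below is illustrative only: `Message` and `outline` are hypothetical names, and the tree holds an abbreviated version of the launch sales example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    text: str
    support: list = field(default_factory=list)  # child Message objects


def outline(node, depth=0):
    """Return the tree as indented lines, each parent before its support."""
    lines = ["  " * depth + node.text]
    for child in node.support:
        lines.extend(outline(child, depth + 1))
    return lines


story = Message(
    "Launch sales were 30% less than target due to high competition",
    [
        Message(
            "Launch sales were projected at $20 mn, but achieved only $14 mn",
            [Message("Only Philippines & Korea were on target")],
        ),
        Message(
            "Competitors discounted price by 35%",
            [Message("The maximum sustainable discount is 20%")],
        ),
    ],
)
print("\n".join(outline(story)))
```

Because every message hangs off exactly one parent, a message that supports nothing has nowhere to go, which is a prompt to strike it off.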


5. Re-order the messages to increase memorability and motivation

Structure your supporting messages into a memorable flow. Here are 7 flows that help:
1. Time: e.g. Past, Present, Future
Sales were $15 mn. Now they’re $18 mn. We expect them to grow to $20 mn.
2. Place: e.g. NA, EU, APAC
3. Aspects: e.g. company, competition, context
4. Benefits: e.g. better, faster, cheaper
5. Scale: e.g. local, regional, global
6. Balance: e.g. pros, cons
7. Priority or climactic: least to most important

Order messages into an emotionally contrasting, motivating sequence.
Take this aspects-based flow:
• Our profits doubled. But our sales only grew 20%. Our gross margins stayed flat.
The “emotional arc” is falling, and not motivating.
Here’s the same message re-ordered:
• Our sales grew mildly at 20%. Our margins didn’t improve at all. But our profits doubled!
This emotional arc falls before rising. This is more motivating.

Remember: Emotional contrast requires bad news. It makes the good news look better
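
The re-ordering in the profits example can be imitated with a very simple rule: keep the messages in order, but move the best news to the end. This is only a sketch; the sentiment scores are hand-assigned assumptions and `end_on_high_note` is a hypothetical helper, not a technique prescribed by the deck.

```python
# (message, sentiment score); scores are hand-assigned for illustration:
# negative = bad news, positive = good news.
messages = [
    ("Our profits doubled", 2),
    ("Our sales only grew 20%", 0),
    ("Our gross margins stayed flat", -1),
]


def end_on_high_note(items):
    """Move the most positive message to the end; keep the rest in order."""
    best = max(items, key=lambda item: item[1])
    rest = [item for item in items if item is not best]
    return rest + [best]


reordered = end_on_high_note(messages)
print(". ".join(text for text, _ in reordered) + "!")
```

The scores of the re-ordered list run 0, -1, 2, so the arc dips before it rises, as in the re-ordered example.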
Step 4: Designing data stories

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories


Visually representing data helps us to see patterns in the data quickly

• It’s hard to find patterns & derive insights from raw data
• Statistics can summarize data, but may hide patterns in how the data is spread
• We use visual encoding techniques to map data to visual attributes
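
To see how summary statistics can hide the spread, consider two samples with identical mean and standard deviation but very different shapes. The numbers below are made up for this illustration, and `text_histogram` is a hypothetical helper that draws a crude text chart.

```python
from collections import Counter
from statistics import mean, pstdev

# Two made-up samples chosen to share the same mean and standard deviation.
clustered = [2, 4, 4, 4, 5, 5, 7, 9]
split = [3, 3, 3, 3, 7, 7, 7, 7]


def text_histogram(values):
    """One line per integer value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{v:>2} {'#' * counts[v]}" for v in range(min(values), max(values) + 1)]


for name, sample in (("clustered", clustered), ("split", split)):
    print(name, "mean:", mean(sample), "std dev:", pstdev(sample))
    print("\n".join(text_histogram(sample)))
```

Both samples report the same two summary numbers, yet the histogram shows one group bunched around the middle and the other split into two camps.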

How the data should be interpreted decides the type of chart to be used

• Correlation
• Magnitude
• Part-to-Whole
• Distribution
• Flow
• Change-over-Time
• Spatial
• Deviation
• Ranking

https://gramener.github.io/visual-vocabulary-vega/

We use visual design cues to support our annotations & message

• Pre-attentive processing drives our attention towards certain elements more than others.
• We can leverage these to highlight aspects of the chart that are relevant to our story.
• For example, when listing a set of countries, if the relevant insight is for one country, we can make it stand out.

1 Position is the most powerful encoding.
The eye and brain are naturally wired to detect misalignment of the smallest order
2 Colour, when used in context, is powerful.
We can detect minuscule changes or variations in colour when comparing an element with neighbouring elements. This is what makes true colour (32-bit colour, i.e. about 4 billion values) a necessity in computer graphics
3 Size is a useful differentiator.
The eye can detect moderate size variations at moderate distances. Size also has a natural interpretation: that of priority.
4 Several other encodings are possible
Aesthetics such as angle, shadows, shapes, patterns, density, labelling, enclosures, etc. can each be used to map data.
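
Making one country stand out usually comes down to giving it an accent colour and muting everything else. A minimal sketch, assuming a hypothetical `highlight_colours` helper; the country list and the two hex colours are illustrative choices, not values from the deck.

```python
def highlight_colours(labels, focus, accent="#d62728", muted="#c7c7c7"):
    """Return one colour per label: the accent for `focus`, grey for the rest."""
    return [accent if label == focus else muted for label in labels]


countries = ["India", "Brazil", "Kenya", "Japan"]  # illustrative list
colours = highlight_colours(countries, focus="Kenya")
print(list(zip(countries, colours)))
```

The resulting list can be handed to a charting library as per-bar colours; for instance, matplotlib's `bar` accepts a list of colours through its `color` argument.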
Your audience may not understand what you meant to show

• The meaning or message behind a chart isn’t always obvious.
• The same chart can be interpreted in several ways by your audience.
• You must guide your audience to see the message you want to show.

DO IT: What can you understand from the chart shown next?
Look at the chart that will be shown next.
List everything you can understand from it as points.

CHECK IT: Verify these yourself
✓ How many aspects did you notice from the final list of observations?
[Chart: Class Xth English Marks Distribution. X-axis: marks, 0 to 100. Y-axis: number of students, 0 to 20,000]
4 types of annotations help the audience understand your intent

Summarize the chart in its title
Don’t describe the chart.
Don’t write the question to answer.
Write the answer itself. Like a headline.

Explain the chart
How should the user read it?
What do you say when you talk through it?
Explain what the visual is. Then the axes. Then its contents. Then the inference.

Highlight essential elements
What should the user focus their eyes on?
Point it out.
Interpret what they’re seeing, in words.

Recommend an action
How should I act on this?
You need to change the audience.
(Otherwise, you made no difference.)

Example:
Title: Teachers add marks to stop some students from failing
Explanation: This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.
What’s unusual:
• Large number of students score exactly 35 marks
• Few (but not 0) students score between 30-35
[Chart: # students (0 to 20,000) against Marks (0 to 99), with both highlights annotated]
Action: Only some students get this benefit. Identify a fair policy that will be applied consistently.
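
The element worth highlighting, a mark with far more students than its neighbours, can be found programmatically before it is annotated. The sketch below assumes a hypothetical `find_spikes` helper and a threshold of twice the neighbouring average; the counts are invented for illustration and are not the Tamil Nadu data.

```python
def find_spikes(counts, ratio=2.0):
    """Return marks whose count is at least `ratio` times the average
    of the counts at the marks immediately below and above."""
    spikes = []
    for mark in sorted(counts):
        left, right = counts.get(mark - 1), counts.get(mark + 1)
        if left is None or right is None:
            continue  # skip the edges, which lack a neighbour on one side
        baseline = (left + right) / 2
        if baseline > 0 and counts[mark] >= ratio * baseline:
            spikes.append(mark)
    return spikes


# Invented counts of students per mark, for illustration only.
sample = {30: 900, 31: 700, 32: 500, 33: 300, 34: 200,
          35: 9000, 36: 4000, 37: 4100, 38: 4200}
print(find_spikes(sample))
```

Once the spike is located, the annotation text ("Large number of students score exactly 35 marks") still has to be written by someone who knows why 35 matters.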
We can apply the same principles to the chart & surrounding elements

Several elements support a chart by giving context for the data & insight. Each element can be designed well to improve the comprehension of the chart.

[Diagram: an example chart with its elements labelled]
• Chart Title
• Legends
• Chart Area
• Data Labels
• Vertical Axis, Vertical Axis Label, Vertical Axis Values
• Horizontal Axis, Horizontal Axis Label, Horizontal Axis Values


Apply these principles

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories


With the growth of self-service BI, 85% of companies have lost track of how many dashboards they generated

BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS
• What QUESTION does the dashboard answer?
• Is the ANSWER evident from the dashboard?
• What ACTION should the user take now?

Today we use dashboards to expose data. But users must explore & interpret it.

BEFORE
[Dashboard with four panels: Quarterly Sales vs Target; Product-wise growth (Consumer A, B, C and Enterprise A, B, C); Country-wise revenue vs target (AP, AU, EU, MEA, NA); Country-wise product growth (%)]

Country-wise product growth (%)
Region | Cons.. | Cons.. | Cons.. | Enter.. | Enter.. | Enter..
AP | 12 | 11 | 15 | 12 | 9 | 14
AU | 15 | 10 | 22 | 17 | 13 | 18
EU | 18 | 20 | 12 | 14 | 15 | 22
MEA | 22 | 30 | 9 | 16 | 18 | 20
NA | 7 | 4 | 3 | 9 | 10 | 12
We automate data stories. So users act, rather than interpret.

AFTER
SERVICES REVENUE 5% BELOW TARGET, DESPITE 8% QOQ GROWTH

STORY GUIDE
The visual on the right shows our services revenue against target. If we’re below target, we must understand why.
The visuals below break up the revenue by product and region. Focus on the area with the weakest performance.

[Chart: Revenue vs Target by quarter, Q1 2017 to Q4 2019. Callouts: Revenue is 5% Below Target; QoQ growth is 8%]

GROWTH DRIVEN BY CONSUMER PRODUCTS, NOT ENTERPRISE
[Chart, Q4 2020: growth by product: Consumer A, B, C and Enterprise A, B, C]

TARGET IMPACTED BY NA SHORTFALL
[Chart, Q4 2020: revenue vs target by region: NA, AU, EU, MEA, AP]

NA CONSUMER PRODUCTS HAVE GROWN THE LEAST IN Q4
Q4 2020 | Cons A | Cons B | Cons C | Ent A | Ent B | Ent C
AP | 12 | 11 | 15 | 12 | 9 | 14
AU | 15 | 10 | 22 | 17 | 13 | 18
EU | 18 | 20 | 12 | 14 | 15 | 22
MEA | 22 | 30 | 9 | 16 | 18 | 20
NA | 7 | 4 | 3 | 9 | 10 | 12

Action: North America should grow consumer products. Leverage learning from other regions
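
Part of this automation is finding the weakest area so the headline can name it. The following is a rough sketch using the growth figures from the slide's table; `weakest_region` and `weakest_product` are hypothetical helpers, and picking the region by lowest average growth is an assumption about how "weakest" might be defined.

```python
# Product growth (%) by region, copied from the slide's table.
growth = {
    "AP":  {"Cons A": 12, "Cons B": 11, "Cons C": 15, "Ent A": 12, "Ent B": 9,  "Ent C": 14},
    "AU":  {"Cons A": 15, "Cons B": 10, "Cons C": 22, "Ent A": 17, "Ent B": 13, "Ent C": 18},
    "EU":  {"Cons A": 18, "Cons B": 20, "Cons C": 12, "Ent A": 14, "Ent B": 15, "Ent C": 22},
    "MEA": {"Cons A": 22, "Cons B": 30, "Cons C": 9,  "Ent A": 16, "Ent B": 18, "Ent C": 20},
    "NA":  {"Cons A": 7,  "Cons B": 4,  "Cons C": 3,  "Ent A": 9,  "Ent B": 10, "Ent C": 12},
}


def weakest_region(table):
    """Region with the lowest average growth across its products."""
    return min(table, key=lambda r: sum(table[r].values()) / len(table[r]))


def weakest_product(table, region):
    """Product with the lowest growth within the given region."""
    return min(table[region], key=table[region].get)


region = weakest_region(growth)
product = weakest_product(growth, region)
headline = f"{region} {product} has grown the least ({growth[region][product]}%)"
print(headline)
```

The generated headline states the answer directly, so the user does not have to scan thirty cells to find it.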
Here are some live data stories that apply these principles

Does access to new Technology facilitate Innovation? Does it facilitate Entrepreneurship? The Global Information Technology Report findings tell us that "innovation is increasingly based on digital technologies and business models, which can drive economic and social gains from ICTs...".

We were curious about whether the data on TCData360 could tell a story about influential factors on innovation and entrepreneurship. With over 1800 indicators, we focused on the Networked Readiness Index, as it has indicators on entrepreneurship, technology, and innovation.

Source: https://tcdata360.worldbank.org/stories/tech-entrepreneurship/
European brewery identified €15 m cost savings after consolidating vendors

A leading European brewery’s plants each purchased commodity raw materials from several vendors, and so had low volume discounts.

Plants also placed multiple orders every week, leading to higher logistics cost.

When plant managers were shown the data, they objected, saying “That’s not always the case.” Or, “That’s the only way. No one else does better.”

Gramener built a custom analytics solution that sourced their SAP order data and automatically identified which plants ordered which commodities the most from multiple vendors, and when.

It showed how each plant performed compared to peers, shaming those with poor performance.

With this, they identified savings of €15 m, which the plant managers couldn’t refute.

€15 m savings potential identified annually
40% vendor base reduction identified


You have data. You have analysis. Now, create data stories!

• Understanding the audience & intent
• Finding insights
• Storylining
• Designing data stories

Q&A
Share feedback at bit.ly/datastorymasterclass
