Data Storytelling 101
Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common items like mugs, golf balls, toys, etc. The item descriptions were stories purpose-written by 200+ contributing writers.
Items that were bought for $250 sold for over $8,000: a return of over 3,000% for storytelling!

Stories are memorable and viral
People remember stories. They’ll act on them.
People share stories. That enables collective action.
We analyze data to improve people’s decision making. For this to be effective, data stories are needed more than ever before.

But analysts present their work, not their message
Data scientists present their analysis: what they did, and what they found. That’s not what the audience needs.
Audiences need a message that tells them what to do, and why. Told in an engaging way. As a story.

Share your data & analysis as data stories
Whenever you share inferences from data, whether it’s as a presentation, or an email or document with your analysis, or as a dashboard, craft it as a story.
This workshop will teach you the techniques of how to convert an analysis into a memorable story, even if you’ve never told a story before.
With the growth of self-service BI, 85% of companies have lost track of how many dashboards they generated
BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS
• What QUESTION does the dashboard answer?
• Is the ANSWER evident from the dashboard?
• What ACTION should the user take now?
We’ve been telling stories with data for a long time
Let’s look at 15 years of US Birth Data
Legend: More births / Fewer births, on average, for each day of the year (from 1975 to 1990)
• Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.
• Most people prefer not to have children on the 13th of any month, given that it’s an unlucky day.
• Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.
• Very high births in September. But this is fairly well known. Most conceptions happen during the winter holiday season.
Fraud Education
The pattern in India is quite different
Legend: More births / Fewer births, on average, for each day of the year (from 2007 to 2013)
• We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, round-numbered dates.
• Such round-numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.
• Very few children are born in the month of August and thereafter. Most births are concentrated in the first half of the year.
This adversely impacts children’s marks
Legend: Higher marks / Lower marks, on average, for children born on a given day of the year (from 2007 to 2013)
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Understanding the audience & intent
The same data analysis can be relevant to many people. Each group is called a persona.
• The trends in sales data for an organization are relevant for a CEO, head of sales, region leads, individual sales team members & every employee.
• The analysis of polio cases in UP is relevant to the Minister of Health, polio campaign manager, field workers, NGOs, journalists & the general public.

DO IT: Who might be an audience for your analysis?
• Look back at your recent analytics project.
• Who do you know that can use this analysis? (Come up with real or hypothetical personas.)

CHECK IT: Verify these yourself
• Is there a name for the individual?
• Was the role specific enough? (Head of sales instead of just executive)
List scenario(s) for each persona
For each persona, answer the following questions:
1. What situation are they currently in?
2. What problems do they face?
3. What is the consequence?
4. What action do they need to take using your analysis?
5. What is the impact of this action?
Combine these as a user scenario:
“As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]”
• John: As a Marketing manager, I have to create a region-wise budget for the next quarter. I don’t know which regions give the highest ROI, so my spend isn’t optimized. Solving it by prioritizing the highest-ROI regions will lead to maximum ROI.

DO IT: Start with your own hypothesis
• Pick one of the personas you had listed earlier.
• What problems do you think your persona is facing?
• How do you feel the persona will use the analysis?
• Frame it as a user scenario.

CHECK IT: Verify user scenario with a partner
• Is it framed as “As a [persona], I’m in [situation] where I face [problem], leading to [consequence]. Solving it by [action] leads to [impact]”?
• Would the persona relate to this user scenario if they heard it?
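
The user scenario template above can be sketched as a small Python structure. This is only an illustration: the class name `UserScenario` and its field names are hypothetical, and the sample values paraphrase the John example rather than quote it.

```python
from dataclasses import dataclass


@dataclass
class UserScenario:
    """One persona's scenario; field names are illustrative, not from the deck."""
    persona: str
    situation: str
    problem: str
    consequence: str
    action: str
    impact: str

    def render(self) -> str:
        # Fill the template: "As a [persona], I'm in [situation] where I face ..."
        return (
            f"As a {self.persona}, I'm in {self.situation} where I face "
            f"{self.problem}, leading to {self.consequence}. "
            f"Solving it by {self.action} leads to {self.impact}."
        )


# Sample values paraphrased from the John example (assumed wording).
john = UserScenario(
    persona="Marketing manager",
    situation="quarterly region-wise budgeting",
    problem="not knowing which regions give the highest ROI",
    consequence="unoptimized spend",
    action="prioritizing the highest-ROI regions",
    impact="maximum ROI",
)
print(john.render())
```

Keeping the six answers as separate fields makes it easy to spot a scenario where one of them (often the consequence or the impact) is still missing.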
Storylining
IS THE INSIGHT USEFUL?
• What should the audience do after hearing the insight?
• Can they take an action that improves their objective?
• Even if it’s informational, what should they do next?

Example statements, with their ratings on three aspects:
• Typing with capitalization in a credit application indicates creditworthiness: Low, Low, High
• Almost 20% of all voice search queries are triggered by just 25 words: Low, High, Medium
• About 50% of American small businesses do not have a website: High, Medium, Low
• The recommendation system influences about 80% of content streamed on Netflix: Big, Low, Low

Only those that are high or medium on all aspects are insights
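
The rule above can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the function name `is_insight` is hypothetical, the ratings are copied from three of the rows above in the order shown, and the aspects are left unnamed because their names are not given here.

```python
# Ratings that pass, per the rule "high or medium on all aspects".
PASSING = {"high", "medium"}


def is_insight(ratings):
    """Return True only if every aspect is rated High or Medium."""
    return all(r.lower() in PASSING for r in ratings)


# Ratings copied from the rows above, in the order shown there.
candidates = {
    "Capitalization indicates creditworthiness": ["Low", "Low", "High"],
    "20% of voice searches use just 25 words": ["Low", "High", "Medium"],
    "50% of US small businesses lack a website": ["High", "Medium", "Low"],
}

for statement, ratings in candidates.items():
    verdict = "insight" if is_insight(ratings) else "not an insight"
    print(f"{statement}: {verdict}")
```

Each of these three rows has at least one Low rating, so under this rule none of them would count as an insight.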
Step 3
Storylining
Outlines are the backbone on which you flesh out your story.
This section explains how to create storylines
1. Start with the takeaway (The elevator pitch. The moral of the story.)
Close your eyes. Think of a childhood tale. Summarize the moral of the story in one line.
We easily remember these stories, and their summary as a moral, several years later.
Close your eyes. Think of a business presentation from last week. Can you easily summarize the message in one line?
Stories are designed around a moral. A single takeaway. An “elevator pitch”.

DO IT: Write your takeaway as one sentence
• What’s the one thing you want the audience to remember from your story?
• What’s the one message that the audience should take away?

CHECK IT: Verify these yourself
• Is it a single, complete sentence?
• Does it deliver what you want the audience to remember?
• Will your audience care a lot about this?

It’s a one-sentence summary of the most important message for the audience.
2. Find analysis that supports your takeaway. Ignore irrelevant content
What supports your takeaway from “The Lion and the Mouse”?
http://www.read.gov/aesop/007.html
• The lion was an Asiatic lion
• The lion had a huge paw
• The lion spared the mouse it caught
• The lion was caught by a hunter’s net
• It was stalking its prey when it got caught
• The mouse was nibbling grass nearby
• The mouse took a few minutes to cut the net
There’s no right or wrong answer. Think about how it supports your takeaway.

DO IT: Write your supporting analysis
1. List all possible analyses
2. Re-word them as sentences
3. Strike off what’s not relevant

CHECK IT: Verify these yourself
• Is each necessary? Does each analysis support the takeaway?
• Are they sufficient? Do the analyses prove the takeaway?
3. Add context to your analysis
Analysis doesn’t mean anything to people. When it does, it’s a message. We do this by adding context. Three ways to add context are:
1. Compare with similar numbers.
   Our $15 mn sales is $3 mn more than last year, $1 mn below budget, and twice our nearest competitor’s.

DO IT: Add context to your analysis
1. Take each relevant analysis
2. Convert it to a message for the audience by adding context

CHECK IT: Verify these yourself
Frame each analysis as a message that the audience will understand and find relevant
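
The comparison in the sales example can be checked with a few lines of arithmetic. This is a sketch: the reference figures ($12 mn last year, $16 mn budget, $7.5 mn for the nearest competitor) are not stated on the slide. They are back-calculated from the example sentence, and the variable names are hypothetical.

```python
# All figures in $ mn. Only `sales` is stated directly in the example;
# the three reference values are back-calculated assumptions.
sales = 15.0
last_year = 12.0    # from "$3 mn more than last year"
budget = 16.0       # from "$1 mn below budget"
competitor = 7.5    # from "twice our nearest competitor"

vs_last_year = sales - last_year
vs_budget = sales - budget
vs_competitor = sales / competitor

# The wording ("more than", "below") is fixed to match this example's signs.
message = (
    f"Our ${sales:g} mn sales is ${vs_last_year:g} mn more than last year, "
    f"${abs(vs_budget):g} mn below budget, and "
    f"{vs_competitor:g}x our nearest competitor."
)
print(message)
```

The bare number ($15 mn) becomes a message only once it sits next to the three comparisons.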
4. Structure the messages into a pyramid or a tree
Structure your supporting messages into a memorable flow. Here are 7 flows that help:
1. Time: e.g. Past, Present, Future
   Sales were $15 mn. Now they’re $18 mn. We expect them to grow to $20 mn.
2. Place: e.g. NA, EU, APAC
3. Aspects: e.g. company, competition, context
4. Benefits: e.g. better, faster, cheaper
5. Scale: e.g. local, regional, global
6. Balance: e.g. pros, cons
7. Priority or climactic: least to most important

Order messages into an emotionally contrasting, motivating sequence.
Take this aspects-based flow:
• Our profits doubled. But our sales only grew 20%. Our gross margins stayed flat.
The “emotional arc” is falling, and not motivating.
Here’s the same message re-ordered:
• Our sales grew mildly at 20%. Our margins didn’t improve at all. But our profits doubled!
This emotional arc falls before rising. This is more motivating.

Remember: Emotional contrast requires bad news. It makes good news look better.
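
The "falls before rising" ordering can be sketched as a small sort. This is an illustration, not a method from the deck: the sentiment scores are made-up ratings (higher means better news), and `fall_then_rise` is a hypothetical helper.

```python
def fall_then_rise(messages):
    """Order (text, score) pairs so the arc dips, then ends on the high point.

    `score` is a hypothetical sentiment rating: higher means better news.
    """
    if not messages:
        return []
    ranked = sorted(messages, key=lambda m: m[1], reverse=True)
    best, rest = ranked[0], ranked[1:]
    # `rest` is in descending order, so the story falls to the worst news
    # and then closes on the best news.
    return [text for text, _ in rest + [best]]


# Scores below are assumptions made for illustration.
messages = [
    ("Our profits doubled.", 2),
    ("Our sales grew mildly at 20%.", 1),
    ("Our margins didn't improve at all.", 0),
]
print(" ".join(fall_then_rise(messages)))
```

With these scores, the sales message comes first, the margins message second, and the profits message last, which matches the re-ordered example.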
Step 4
data stories
How the data should be interpreted decides the type of chart to be used
• Correlation
• Magnitude
• Part-to-Whole
• Distribution
• Flow
• Change-over-Time
• Spatial
• Deviation
• Ranking
https://gramener.github.io/visual-vocabulary-vega/
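
One way to make this choice explicit is a lookup from the relationship being shown to a starting chart type. This is a sketch: the nine categories come from the slide, but the chart suggested for each is a common default chosen for illustration, and `suggest_chart` is a hypothetical helper.

```python
# Category -> a commonly used chart type (illustrative defaults, not a rule).
CHART_FOR = {
    "correlation": "scatter plot",
    "magnitude": "bar chart",
    "part-to-whole": "stacked bar chart",
    "distribution": "histogram",
    "flow": "Sankey diagram",
    "change-over-time": "line chart",
    "spatial": "choropleth map",
    "deviation": "diverging bar chart",
    "ranking": "ordered bar chart",
}


def suggest_chart(relationship):
    """Return a starting chart type for the relationship to be shown."""
    key = relationship.strip().lower().replace(" ", "-")
    if key not in CHART_FOR:
        raise ValueError(f"Unknown relationship: {relationship!r}")
    return CHART_FOR[key]


print(suggest_chart("Change over time"))  # line chart
print(suggest_chart("Part-to-Whole"))     # stacked bar chart
```

The lookup is only a starting point. The message you want the audience to take away should still decide the final chart.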
We use visual design cues to support our annotations & message
Several other encodings are possible. Aesthetics such as angle, shadows, shapes, patterns, density, labelling, enclosures, etc. can each be used to map data.
Your audience may not understand what you meant to show
• The meaning or message behind a chart isn’t always obvious.
• The same chart can be interpreted in several ways by your audience.
• You must guide your audience to see the message you want to show.

DO IT: What can you understand from the chart shown next?
• Look at the chart that will be shown next.
• List everything you can understand from it as points.

CHECK IT: Verify these yourself
• How many aspects did you notice in your final list of observations?
[Chart: Class Xth English Marks Distribution. X-axis: marks from 0 to 100; Y-axis: 0 to 20,000]
4 types of annotations help the audience understand your intent

Summarize the chart in its title
Don’t describe the chart. Don’t write the question to answer. Write the answer itself. Like a headline.
Example: Teachers add marks to stop some students from failing

Explain the chart
How should the user read it? What do you say when you talk through it? Explain what the visual is. Then the axes. Then its contents. Then the inference.
Example: This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.

What’s unusual
• Large number of students score exactly 35 marks.
• Few (but not 0) students score between 30-35.
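
An annotation about what is unusual can be backed by a simple check. The sketch below flags marks whose count is far above that of nearby marks. The counts are made up for illustration (they are not the Tamil Nadu data), and `find_spikes`, `window`, and `ratio` are hypothetical names and thresholds.

```python
from statistics import median


def find_spikes(counts, window=3, ratio=3.0):
    """Return marks whose count exceeds `ratio` x the median of nearby marks.

    counts: dict mapping mark -> number of students.
    window and ratio are illustrative thresholds, not tuned values.
    """
    spikes = []
    for mark, n in counts.items():
        neighbours = [
            counts[m]
            for m in range(mark - window, mark + window + 1)
            if m != mark and m in counts
        ]
        if neighbours and n > ratio * median(neighbours):
            spikes.append(mark)
    return spikes


# Made-up counts: a steady rise from mark 20 to 45 ...
counts = {m: 2000 + 200 * (m - 20) for m in range(20, 46)}
# ... with a dip just below the pass mark and a spike exactly at 35.
counts.update({30: 500, 31: 400, 32: 300, 33: 200, 34: 100, 35: 15000})

print(find_spikes(counts))  # [35]
```

Comparing against the median of the neighbours, rather than the mean, keeps the dip below the pass mark from making ordinary marks look like spikes.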
A chart is supported by several elements that give context to the data & insight. Each element can be designed well to improve the comprehension of the chart:
• Chart Title
• Legends
• Chart Area
• Data Labels
• Horizontal Axis
• Vertical Axis
principles
BUT 3 THINGS ARE UNCLEAR ON MOST DASHBOARDS
• What QUESTION does the dashboard answer?
• Is the ANSWER evident from the dashboard?
• What ACTION should the user take now?
Today we use dashboards to expose data. But users must explore & interpret it.
BEFORE
[Dashboard panels: “Quarterly Sales vs Target” and “Product-wise growth”. Products: Consumer A, B, C; Enterprise A, B, C. Regions: AU, EU, MEA, NA.]
Values by region:
AU   15  10  22  17  13  18
EU   18  20  12  14  15  22
MEA  22  30   9  16  18  20
NA    7   4   3   9  10  12
We automate data stories. So users act, rather than interpret.
AFTER
SERVICES REVENUE 5% BELOW TARGET, DESPITE 8% QOQ GROWTH
STORY GUIDE: The visual on the right shows our services revenue against target. If we’re below target, we must understand why.
Callouts: Revenue is 5% Below Target. QoQ growth is 8%.
[Dashboard panels. Products: Consumer A, B; Enterprise A, B, C. Regions: AP, AU, EU, MEA, NA.]
Values by region:
AP   12  11  15  12   9  14
AU   15  10  22  17  13  18
EU   18  20  12  14  15  22
MEA  22  30   9  16  18  20
NA    7   4   3   9  10  12
Action: North America should grow consumer products. Leverage learning from other regions
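
A headline like the one on this slide can be generated from the underlying numbers. This is a minimal sketch, not the tooling behind the slide: the function names are hypothetical, and the revenue figures are made up so that they reproduce the 5% gap and 8% growth.

```python
def pct_change(value, base):
    """Percentage difference of value relative to base."""
    return (value - base) / base * 100


def revenue_headline(revenue, target, previous_quarter):
    """Build a one-line headline from revenue vs target and QoQ growth."""
    gap = round(pct_change(revenue, target))
    growth = round(pct_change(revenue, previous_quarter))
    side = "below" if gap < 0 else "above"
    trend = "growth" if growth >= 0 else "decline"
    # "despite" signals that the two facts point in opposite directions.
    link = "despite" if (gap < 0) == (growth >= 0) else "with"
    return (
        f"Services revenue {abs(gap)}% {side} target, "
        f"{link} {abs(growth)}% QoQ {trend}"
    )


# Hypothetical figures chosen to reproduce the slide's headline.
print(revenue_headline(revenue=950, target=1000, previous_quarter=880).upper())
```

Writing the answer into the title this way means the user reads the conclusion first and uses the chart only to confirm it.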
Here are some live data stories that apply these principles
Does access to new Technology facilitate Innovation? Does it facilitate Entrepreneurship? The Global Information Technology Report findings tell us that "innovation is increasingly based on digital technologies and business models, which can drive economic and social gains from ICTs...".
We were curious about whether the data on TCData360 could tell a story about influential factors on innovation and entrepreneurship. With over 1800 indicators, we focused on the Networked Readiness Index, as it has indicators on entrepreneurship, technology, and innovation.
Source: https://tcdata360.worldbank.org/stories/tech-entrepreneurship/
European brewery identified €15 m cost savings after consolidating vendors
A leading European brewery’s plants each purchased commodity raw materials from several vendors, and so had low volume discounts.
• €15 m savings potential identified annually
• 40% vendor base reduction identified