Berengueres J. Introduction To Data Visualization... Guide... 2020
Jose Berengueres
with
Marybeth Sandell &
Ali Fenwick
2nd Edition
Type setting by SNASK
Text Copyright © Jose Berengueres
Where otherwise stated, artworks,
cover & drawings © by Jose Berengueres
Data, value creation & thinking modes
TOC
Preface
How to use this book
Chapter 3. Knowledge
Mental frameworks
Visualizing inclusion
Winner takes all
The BRICS framework
Quiz: Visual summaries
Chapter 4. Charts to Think
Use rankings to create situational awareness
The design space
Forecasting with mean-reversion
The design space in business
The Gap Matrix
The Innovation Matrix
Wardley maps
Wheel of Life
Interactive exploration
Quiz: Supporting decisions
Chapter 5. Making your chart pop
Use arrows to unbound your thinking
Decluttering charts
Use personas to win over the audience
Visualizing big differences
Putting the world’s money into perspective
How many Solar Panels are needed to power the USA?
Storytelling age-bias with humor
Use the Golden ratio everywhere
Twists
Quiz: Ice baby
Quiz: Global warming
Quiz: Magic quadrants
Quiz: Visualizing gaps
Acknowledgement
References & Footnotes
Preface
It was May 2018 when I received an email from Kaggle.com, a data
science community where people all over the world compete in data
modelling challenges. Every year since 2017, Kaggle has surveyed its users
and this year they decided to organize a storytelling competition and offer
cash prizes for the best survey visualization. I was curious to see what
people had submitted so far and sifted through the entries. I was impressed
at how much time and effort had been put into analyzing the data. Some
charts required laborious data wrangling, others crafty SQL inner joins, and
some Python sorcery. And yet, the charts were not doing justice to the
compelling findings and the stories that could be told. No doubt, some were
creative, extensive in length, exhaustive in the exploration. However, there
were also many unimpressive charts. Why was so much IQ not producing
more enticing visualizations?
It’s about the awareness
80% of the data scientists graduating today will do so without having
received any formal education on storytelling[1], and 60% of them place data
visualization at the bottom of their priority list of skills to have. In addition,
the fact that most data science is taught at STEM schools, not Art schools,
does not help either. (Really, STEM schools should aspire to be STEAM
schools, to include Art). In order to improve the quality of data
visualization, there must be a change in how data scientists are trained
along with a mindset shift regarding the imperative of good storytelling.
However, to make effective visualizations, art sensibility is not all that is
needed.
Death by default settings
A second cause is that the gallery of default chart styles in Microsoft
Excel is unlikely to match what is required to tell the story[2] and is
laborious to adjust. For example, a single bar chart in Excel has a
whopping 129+ configurable options: color fill options, axis options, line
thickness, line types, legend positioning options, scale options, bar type,
spacing options. Each option has between 2 and 10 possible values. Even at
two values per option, that is 2^129 combinations, roughly a 7 followed by
38 zeroes. Because of the tricky laws of aesthetics[3], very
few of these combinations will produce the Aha! chart that will dazzle the
audience, and finding one is far too laborious for most people. To put
it into perspective, there are more possible combinations than stars in the
Milky Way, and if we spent just 3 seconds tweaking each of the 129
options it would take more than 6 minutes and 578 mouse clicks to adjust them
all. It does not help that most visualization software in use today (this
applies to ggplot too) was developed by CS graduates with little training on
the basics of color theory, information design or visual communication. The
exception was Steve Jobs, who took calligraphy lessons and credited that
for the Mac’s beautiful fonts. See Walter Isaacson’s bio on Jobs as a
flashback on how much the field has improved[4].
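The arithmetic behind these figures can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes the 129 options are independent of each other, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic for the Excel options example.
# Assumption: every one of the 129 options is independent of the others.
N_OPTIONS = 129          # configurable options quoted in the text
SECONDS_PER_OPTION = 3   # time spent per option, as in the text

# Lower and upper bounds on the number of distinct chart configurations,
# using the 2-to-10 values per option quoted in the text.
min_combinations = 2 ** N_OPTIONS
max_combinations = 10 ** N_OPTIONS

# Time to touch each option exactly once (not each combination).
minutes_per_pass = N_OPTIONS * SECONDS_PER_OPTION / 60

print(f"at least {min_combinations:.1e} combinations")
print(f"at most  {max_combinations:.1e} combinations")
print(f"one pass over all options: {minutes_per_pass:.2f} minutes")
```

Touching every option once takes minutes; trying every combination is out of the question.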
Start with why
A third cause is that to produce a meaningful visualization we need to
know why we are visualizing. To get to the why, it’s important to explore
the story that needs to be told and be able to make it relevant.
To succeed at transforming data into a compelling and relevant story, it
helps to connect the data to a context, metaphor or mental framework
(frameworks from Economics, Art, and Sociology are particularly useful).
In order to make these connections, a cross-functional education is
necessary. Unfortunately, this is not the norm, which results in loads of very
interesting data failing to become useful knowledge. To address this, in the
following pages you will find a set of principles by example that I wish I
had learned in grad school.
Happy visualizations!
Jose Berengueres
Stockholm, Oct 25th 2019
How to use this book
What you will learn
1. Identify the role of a narrative in a chart
2. Transform data into information
3. Synthesize knowledge
4. Apply visual thinking tools to the decision-making process
5. Select visual communication techniques to persuade
Fig 1 These three magazines publish some of the most influential charts on
the newsstand. None of them is made with Excel[5].
Once the order of the words has been agreed, we can discuss the
ordering criteria. Why did we order them the way we did? This is a great
conversation starter. To ground the conversation, it further helps to list the
attributes of the words at the extremes. What are the attributes that
distinguish data vs. wisdom?
Data vs. wisdom
Exercise
Fig 3 An exercise used to understand the arrow of value between data and
wisdom[6].
Solution
What is wisdom?
Data points are many, while wisdom usually comes in few. Data is
abundant, wisdom is scarce. Value is closely correlated with scarcity too.
This exercise is a great way to clear up the pervasive confusion between data,
information and knowledge, and their relationship to value, scarcity and wisdom.
How is wisdom made?
Jackie Chan says in one of his films, “information is not knowledge, and
knowledge is not wisdom”. But what is wisdom? Is wisdom just knowledge
in context? Is wisdom meta-knowledge, that is, knowledge about knowledge?
And, more importantly, is it knowing in which situation to apply a given
piece of knowledge? Even if the definition is not universal, what we are more
interested in here is how to transform knowledge into wisdom. Why?
Because it is a high added value activity and one of the reasons (if not the
only reason) why companies employ data scientists. One way to arrive at
wisdom is the Synthesis process — the dialectic combination of thesis and
antithesis into a higher truth.
Match the words
Exercise
Connect each keyword with its corresponding image. Time 1 minute.
Fig 6 An exercise used to understand what wisdom is. Lego image source:
LinkedIn, anonymous mem.
Solution
_______________________________________
Now, visualize it
Solution
Wrapping-up
When delivering a workshop, this exercise is a great way to bring
attention to the point of summary vs. synthesis. A summary is a mere
reduction process whereas synthesis is demonstrating an understanding of
the subject by relating it to other subjects, ultimately adding value through
connective thinking.
In Fig 10, we summarized the relationship between data and wisdom by
way of the pyramid metaphor. Elements at the top of the pyramid are
valuable, scarce and hard to carry to the top because they work against the
force of gravity. This is a great example of a visual summary of the chapter
while also a good example of synthesis.
Now that we’ve learnt the difference between data and knowledge, and
how to transform knowledge into wisdom, let’s look at the role of narratives
in charts.
Narratives & stories
Exercise
Fill in the blank with a verb. Example: “Data _____ stories”. Time 3
minutes.
Stories
To understand what a narrative is, we first need to understand what a
story is. A story is an account of events. We humans love stories. Why?
Telling and consuming stories is addictive. For example, listening to an
Aesop fable, reading a book and watching a movie all release
oxytocin, the feel-good hormone, in the brain. That is why people get
addicted to Netflix, Venezuelan soap operas and fiction books. Oral
storytelling is thought to be the earliest method for sharing narratives. From
an anthropological perspective, narratives are used during most childhoods
to teach children proper behavior and values. This is usually done
through tales.
Narratives
A narrative is a set of beliefs or values, or a worldview. Therefore, the chosen
narrative interprets the story (and consequently the underlying data /
reality). An example of a narrative popular in European culture is that kids,
especially young girls, should not trust strangers. A story that promotes that
narrative is Little Red Riding Hood, a tale from the 10th century. In
fables, the narrative is also made explicit at the end of the tale, as the moral
of the story. Another example of a narrative is FUD: Fear, Uncertainty
and Doubt. It is also known as a disinformation strategy[9] used to thwart
change to the status quo. It is said that IBM was one of the first companies
listed on the Dow Jones to use FUD openly. An example was the saying,
“Nobody gets fired for buying an IBM”.
Connection to Aristotle
Because the goal of a story is to persuade, narratives, stories and data are
related to Aristotle’s three modes of persuasion: the Ethos, the
Pathos and the Logos[10]. The narrative is related to the Ethos (to appeal to
the ethical values), the story is related to the Pathos (to appeal to the
emotions), and the data that supports the story is related to the Logos (to
appeal to logic).
Exercise
Identify the narrative, the story, the data and call to action in this photo.
Time 2 minutes.
Fig 17 Don’t let words like ‘narrative’ get in the way of a great story.
Fig 18 Unlike Al Gore in 2006, Greta Thunberg needed no charts to get her
message across [14].
Solution to Storytelling climate change
Story
(Abridged speech[16]). “You might be grown-ups but you are not
mature enough to understand this emergency. If you did, you would not
jet to the conference like you do. You could Skype, or travel like me to
reduce your carbon footprint. Hence, you (not me) are behaving like the
immature kid.”
Data
Look at the big waves behind me, I am serious, this is dangerous
A carbon neutral sail-ship = it’s possible to reduce carbon footprint
The situation is bad enough that I had to skip school classes
Air jet-set travel produces CO2 but there are alternatives: look at me
Chapter 2. Visualizing Information
– How to transform data into information –
Fig 19 Creating knowledge from data, the secret to winning the Nobel
prize?
Survey responses
Female 16.8%
Male 81.4%
NS 1.4%
ND 0.3%
Solution
Solution with Matplotlib
Fig 20 This default Matplotlib chart uses four different font sizes and six
colors (red, green, turquoise, purple, black, grey).
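A chart along the lines of Fig 20 takes only a few lines of Matplotlib. What follows is a minimal sketch, not the code behind the book's figure: the function name, axis labels and output file name are assumptions, and the labels NS and ND are copied exactly as they appear in the table.

```python
# Survey responses as listed in the table above (percent of respondents).
responses = {"Female": 16.8, "Male": 81.4, "NS": 1.4, "ND": 0.3}


def plot_default_bar(data, path="gender_default.png"):
    """Draw a plain bar chart with Matplotlib's default styling."""
    # Imported here so the data above can be inspected without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(data.keys()), list(data.values()))
    ax.set_ylabel("Share of respondents (%)")
    ax.set_title("Gender of survey respondents")
    fig.savefig(path)
    plt.close(fig)
    return path


# The shares do not add up to exactly 100 because of rounding in the source.
total_share = round(sum(responses.values()), 1)
print(total_share)
```

Calling `plot_default_bar(responses)` writes the chart to disk. Nothing in the code states why the chart exists, and the default output has the same gap.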
Exercise
Is the figure above data, information or knowledge? Why? Reason your
answer. Time 2 minutes.
Solution
It is just information. It is not knowledge because it is not significantly
more useful than the original raw data.
Reflection
Outside of a preliminary Exploratory Data Analysis (EDA), it is
not a good idea to disseminate a chart without a clear why (narrative)
for the chart. And even if you produce many charts as a part of an EDA,
resist the temptation to show all of them. In this case, we are asked to
visualize the gender distribution of the respondents to the 2018 survey of
Kaggle, one of the largest data science communities worldwide. Gender
was one of the 30+ questions on the survey, which was answered by about
30k respondents. Fig 20 is the default-settings chart produced using the
popular Python library Matplotlib. This chart is perfectly fine. It is
informative, but there is no message, there is no why. It lacks a purpose.
Why? One reason is that it is not connected to any narrative. Another
reason is that it does not increase our knowledge. Is it helping us to become
wiser? Is it facilitating the prescriptive analytics function? How would you
make this chart more useful?
Exercise
Draw at least three alternative charts to Fig 20. Time 3 minutes.
(Solution on the next page.)
The Chart-narrative fit
Fig 21 Three ways to visualize the same data.
Fig 22 Four Batmen and a Wonder woman make this chart easier to
remember.
Leveraging humor
Many charts are impersonal because we cannot relate to them. We solved
that with the superheroes. (See also user personas in Ch. 6.) However, if in
addition we want the audience to remember the chart, we can use humor or
an inside joke: data scientists are superheroes because they have to
“wrangle” with data (see the term data wrangling).
Sexism in your chart?
How do we check for bias? It is important to check for blind spots, and charts
are no different. It is prudent to ask a diverse group of people, ideally
with different backgrounds, for a bias check. See Chapter 5 for more on bias.
Gravity & charts
Fig 24 Gravity shapes everything on Earth, including how we interpret
charts.
Fig 25 Musk vs. Bezos. Two visions of space exploration. Two ways to
visualize altitude.
When to use pies
There is a fundamental difference between circular charts and bar charts.
The brain is sensitive to angular change and (by comparison) numb to linear
change[22]. This is particularly true when considering motion, and sensitivity
to small changes. If highlighting minute changes in a
variable is important for your narrative, then circular charts (pie charts,
speed gauges) are the way to go. If, on the contrary, too much attention to
change is a distraction, avoid pie charts. Compare for yourself. In the Blue
Origin cast, the altitude change is barely noticeable, whereas in the SpaceX
cast it pops throughout.
Blue Origin cast: http://bit.ly/2NHycmf
SpaceX cast: http://bit.ly/2XwXYxY
Quiz: Making useful charts
True or False? Time 10 minutes.
Exercise
How would you make the previous chart more useful? We can start by
reducing the information overload. Draw solutions. Time 2 minutes.
Solution
Exercise
In the previous section, we saw an example of meaning creation by way
of connecting information to a reference framework. Now let’s do the same
and, in addition, let’s apply a visual metaphor. Let’s look at salary data from
the same 2018 survey. Is Fig 29 data, information or knowledge?
Given an inclusion narrative, how would you create a more useful chart?
Time 6 minutes. Hint: If the chart were a building, where would highly paid
individuals own apartments?
Intermediate step
(Fig 29 rotated counter clockwise 90 degrees)
Here, we are visualizing the data science libraries the respondents use,
based on survey Q20: Of the choices that you selected in the previous
question, which ML library have you used the most? Given a winner-takes-all
narrative (so common in the software world), what visual metaphor can
we apply? This chart is an example of less is more. In this case, scikit-learn
(a popular Python machine learning library) has a 48% share, Google’s
TensorFlow has 16%, followed by Keras with 14%. Let’s see how this visual is
connected to the narrative.
Chart-narrative fit
In scenarios with strong network externalities at play, such as a social
network, a phone OS or an Olympic race, being on the podium (being first)
has a disproportionate effect on the reward. In such cases, the
winner-takes-all narrative is in place. Anthropomorphizing the ranking with a
podium conveys a memorable narrative and affordance: the glory the winner
deserves for the great utility this library provided to the community. This
narrative is also connected to other memes famous in the software world,
such as the developer’s glory. (See S. Ballmer in “Developers, developers,
developers, developers”.)
All or nothing
In the previous section, we visualized data about the most popular ML
libraries with a winner-takes-all narrative. Here we do the same with a
different narrative.
Fig 33 The house of Shiva. When colored areas occupy large areas, use
50% grey and pastel colors instead of 100% solid bold colors.
House of Shiva
Fig 33 is a combination of (i) a chart template called Marimekko with
(ii) a symbolic chart called House of Shiva. The House of Shiva is used to
emphasize all or nothing relations. The metaphor is that the roof falls if just
one column collapses.
Symbolism
The columns support the visualization efforts of the community (roof
load, common good). The width of the “columns” expresses how much
work/load each column supports. Grey columns on the right represent other
less mainstream libraries such as: D3, Shiny, Bokeh, Leaflet, Lattice.
Source: Survey Q22 Which specific data visualization library or tool have
you used the most?
Metaphor
The goal is the roof. As with a house, the integrity of it becomes clearly
impaired if one column is weak.
Narrative
The narrative here is that non-mainstream visualization libraries are
important too, but to different degrees. Note that if we had used a pie
chart we would have conveyed a win-lose scarcity narrative, not faithfully
representing the win-win ethos of the open source movement.
The BRICS framework
Fig 35 This chart was made with PowerPoint because it was faster than
tweaking the parameters of ggplot. Notice how the golden ratio is used
across the chart.
Fig 36 Two is the maximum number of colors you should use in a chart.
Grey does not count.
As Mr. Wardley would say: when you need to understand the territory,
it helps to have a map. Here we use 2D mapping by scattering the countries
along two dimensions[33]. The technique of projecting onto two dimensions
has been successfully used in famous charts such as Wardley Maps, the
BCG growth-share matrix, the Urgent-Important matrix and Gartner’s
Magic Quadrant. This map can be used to cluster countries by policy to help
elucidate success factors that influence the position on the map (see also
Gapminder).
Narrative
Porter’s Competitive Advantage of Nations.
About the Innovation Index
Every year, INSEAD, Cornell University and WIPO publish
the Global Innovation Index (GII). In 2018, the most innovative country was
Switzerland. A Spearman rank correlation between GII and user prevalence
yields 79%.
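A Spearman rank correlation is simply the Pearson correlation computed on ranks instead of raw values. The sketch below shows the mechanics in plain Python. The five data points are hypothetical and are not the survey or GII figures, so the result will not match the 79% quoted above.

```python
def ranks(values):
    """Return 1-based ranks; ties share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five countries (NOT the book's data).
innovation_index = [68.4, 59.8, 54.9, 43.0, 35.2]
users_per_million = [40.0, 55.0, 20.0, 25.0, 5.0]
rho = spearman(innovation_index, users_per_million)
print(round(rho, 2))
```

Because only the ordering matters, a single country with an extreme value cannot dominate the result, which suits a ranking such as the GII.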
Exercise
Let’s take this chart a step further. One of the most valuable skills is
prediction. Given Fig 37, can you predict where Japan will be 10 years from
now? Use a linear regression. Time 5 minutes. (Solution in the next
section).
Forecasting with mean-reversion
Here, we just added a regression line and removed the outlier Singapore.
The 95% error margin around the line is shown in grey. Some countries are
below the line and some above. Highlighted in red is Japan, an outlier that
scores high on the Innovation Index (y-axis) but low on the x-axis relative to
its peers. Let’s assume that the principle of mean reversion applies here as a
baseline predictor and a hidden hand continually pushes countries towards the
mean (dotted line). The principle of mean reversion is based on the idea that
there are no permanent competitive advantages for either companies (see the
introduction chapter of Blue Ocean Strategy) or nations. It has shown its
worth, particularly in finance. Take the composition of the Dow Jones: very
few companies have what it takes to last long in the index. Of the original
members of the index formed in 1896, only GE lasted into the 21st century,
and it was dropped in 2018.
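The regression line and the mean-reversion reading can be sketched with ordinary least squares in plain Python. The five points below are hypothetical placeholders, not the countries in the chart, and treating the largest positive residual as the most likely to drift down is the baseline assumption stated in the text, not a forecast.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical points (NOT the book's data): x = data scientist prevalence,
# y = innovation index score.
countries = ["A", "B", "C", "D", "E"]
prevalence = [1.0, 2.0, 3.0, 4.0, 5.0]
index_score = [32.0, 41.0, 60.0, 49.0, 58.0]

intercept, slope = fit_line(prevalence, index_score)

# Residual = actual score minus the score the line predicts.
residuals = {
    name: y - (intercept + slope * x)
    for name, x, y in zip(countries, prevalence, index_score)
}

# Under mean reversion, the largest positive residual is the country most
# likely to drift down towards the line.
most_above_line = max(residuals, key=residuals.get)
print(most_above_line, round(residuals[most_above_line], 1))
```

In this toy data, country C plays the role Japan plays in the chart: it sits well above the line, so the baseline expects it to move towards it.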
Reflection
What can we forecast about the 2019 GII rank position of Japan?
Applying the principle of mean reversion, it is unlikely that Japan will
increase its rank because it is already high. Even if Japan catches up in data
scientist prevalence, it is likely that it will still go down (towards the mean).
Indexes are just weights. Assuming the Data Science weight in the
innovation economy will only increase in the coming decades and that the
GII index calculation method will be updated accordingly, what countries
are more likely to improve their “nominal” ranking in 2019? When the GII
index weights are rebalanced, is it likely that countries such as Canada and
Australia will jump a few places? Source: Global Innovation Index 2018, World Bank
Population Data 2016, Q11 - Current country of residence.
A note on the origin of linear regression
The name linear regression, for the line that minimizes the sum of the
squared errors, was popularized by a paper in which the principle of
“regression to the mean” was verified in how offspring height relates to
the parents’ height. Spoiler alert! Only 60% of the offspring height is
explained by the parents’ heights. The rest is explained by the population
mean. This means that the mean reversion principle applies to height with
an influence of approximately 40%. However, the mathematical method is
completely unrelated to any concept of regression. The paper got famous
and the regression word stuck to the method. A great trick question is to ask
students to explain why linear regression is called linear regression. I am
always amazed at the inventiveness of some students[34].
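The 60/40 split can be read as a weighted blend of the parents' height and the population mean. The sketch below works through that reading. The blend formula is an assumed interpretation of the paragraph above, and the heights are hypothetical.

```python
# Assumed reading of the paragraph above: 60% of the weight goes to the
# parents' height and the remaining 40% to the population mean.
PARENT_WEIGHT = 0.6


def expected_offspring_height(parent_height_cm, population_mean_cm):
    """Blend the parents' height with the population mean."""
    return (PARENT_WEIGHT * parent_height_cm
            + (1 - PARENT_WEIGHT) * population_mean_cm)


# Hypothetical numbers chosen only for illustration.
population_mean = 170.0
tall_parents = 190.0
short_parents = 155.0

print(expected_offspring_height(tall_parents, population_mean))
print(expected_offspring_height(short_parents, population_mean))
```

Tall parents are expected to have tall children, but not quite as tall as themselves, and the same holds in reverse for short parents. That pull towards the average is the "regression" in the name.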
The design space in business
Reframe it
One of the most important roles of a data scientist is to realize when the
customer cannot articulate their own needs (see Jobs-to-be-done theory). This
skill is what distinguishes the A+ data scientist from the rest. The chart here
is adapted from the book The Accidental Investment Banker. The author, a
banker, came up with it during a business engagement. He used it to map
out the M&A strategy for a client. Once he made this chart, everybody in
the room could visualize where value was. In his book, he credits this chart
as an important moment in his career.
The Gap Matrix
Fig 40a Business Innovation is sometimes as easy as finding a white space.
Source: McKinsey Global Institute[35]
Fig 40b The most useful visualization in the history of Science? Source:
Bloomberg BusinessWeek.
3. Innovate
Now that you have a clear picture of the relationships between value,
customer needs, costs and technology, you are in a better position to
innovate using a variety of techniques such as:
Brainstorming
Planning an IDEO-style shopping cart workshop
Using Edward de Bono’s creativity tools
Finding gaps
Serving new needs with existing functions
Exercise
Groups of four. Time 20 minutes. Think about this microwave and its
components…
First, draw the eight spokes of a wheel. Each spoke represents a
different category of your life and will help you measure your satisfaction
in that area. The first one is Money (How satisfied are you with
the money you have saved or make?). Second, Career (How satisfied are you
with your path, progress and current career?). Third, Wellness (both
spiritual/mental and physical). Then, Friends & Family, Love, Fun, Physical
Environment (Do you like the country, city, house and neighborhood you are
in?), and finally, Spiritual and Personal Growth. We grade each
category by marking a dot on its spoke on a scale of 1 to 10, 1 being at the
center and 10 being farthest from the center, and then we connect the dots.
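The geometry of the wheel is simple enough to compute by hand or in code. The following sketch places one dot per spoke and closes the outline. The scores are hypothetical, and the choices of starting at 12 o'clock, going clockwise and scaling the radius linearly are assumptions, not part of the exercise.

```python
import math

# The eight categories named in the text; the scores are hypothetical.
scores = {
    "Money": 6, "Career": 7, "Wellness": 5, "Friends & Family": 8,
    "Love": 7, "Fun": 4, "Physical Environment": 9,
    "Spiritual & personal growth": 5,
}


def wheel_points(scores, max_score=10):
    """Return one (x, y) dot per spoke, first spoke pointing up."""
    n = len(scores)
    points = []
    for i, score in enumerate(scores.values()):
        angle = math.pi / 2 - 2 * math.pi * i / n  # clockwise from 12 o'clock
        radius = score / max_score  # 1 sits near the center, 10 on the rim
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


points = wheel_points(scores)
# Connect the dots: repeat the first point to close the polygon.
outline = points + points[:1]
for name, (x, y) in zip(scores, points):
    print(f"{name:28s} x={x:+.2f} y={y:+.2f}")
```

A lopsided outline is the point of the exercise: the dents show which areas of life lag behind the others.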
Connecting the dots
Icons and emoji are an underused resource in chart making. On the other
hand, emoji use is correlated with employee engagement.
Interactive exploration
Fig 49 A screenshot of a real time, visual SQL inner join operation between
three tables; Source: Square 2012.
Square’s Crossfilter
Crossfilter is a JavaScript library for exploring large multivariate
datasets in the browser. Extremely fast (<30ms), it allows “Doherty
threshold” interaction with coordinated views, even with datasets
containing a million or more records. Square built it in 2012 to power
analytics for Square Register.
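The idea of coordinated views can be illustrated without the library itself. The sketch below is plain Python, not Crossfilter's JavaScript API: each view shows the records that pass the filters set on every other dimension, while ignoring its own. The flight records, field names and helper function are hypothetical.

```python
# Hypothetical flight records (NOT the demo's dataset).
flights = [
    {"hour": 7, "delay": 0},
    {"hour": 8, "delay": 5},
    {"hour": 13, "delay": 25},
    {"hour": 18, "delay": 40},
    {"hour": 19, "delay": 10},
]

# One (low, high) range filter per dimension; None means "no filter".
filters = {"hour": (6, 12), "delay": None}


def visible(records, filters, skip=None):
    """Records passing every active filter except the one on `skip`."""
    out = []
    for record in records:
        ok = True
        for dim, bounds in filters.items():
            if dim == skip or bounds is None:
                continue
            low, high = bounds
            if not (low <= record[dim] < high):
                ok = False
                break
        if ok:
            out.append(record)
    return out


# The delay view reflects the hour filter...
morning_delays = [r["delay"] for r in visible(flights, filters, skip="delay")]
# ...while the hour view ignores its own filter so the user can still move it.
hours_shown = [r["hour"] for r in visible(flights, filters, skip="hour")]
print(morning_delays, hours_shown)
```

Selecting the morning hours immediately narrows the delay view to short delays, the kind of insight the exercise below asks for.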
Exercise
Groups of two. Time 12 minutes. Using Square Crossfilter demo, find
three insights about air travel. Example: to avoid delays fly in the morning.
Visualize your findings.
Quiz: Supporting decisions
True or False? Time 10 minutes.
Fig 52 Marie Kondo applies the principle of throwing away things in one’s
life that do not spark joy.
Fig 53 Animation on decluttering a pie chart bit.ly/2OgCLUO
Use personas to win over the audience
Fig 54 Humans are adept at recognizing faces.
Did you know that we can recognize a face faster than many other
objects in the world? Use it! In 2007 Honda used this principle when they
designed a motorbike that, from the back, looked like a human face
(anthropomorphic).
Visualizing big differences
Fig 56 The small planet fits 1720 times in the volume of the big one.
In Fig 58, note how (a) has the most dynamic range but is not intuitive;
(b) is the radius of the equivalent sphere, which would be intuitive if
expressed as volume, as in Fig 57; (c) is a linear scale, which is intuitive
but lacks dynamic range.
Why 3D spheres work so well
We humans have evolved to estimate the weight of an animal by sight. Of
course, this was a very useful skill for our ancestors in the savannah. Notice
how much easier it is to understand relative sizes when we use volume,
versus any other option. For primates, estimating the weight of a fellow
primate visually was a crucial survival skill, useful to determine how
dangerous an opponent was before contact. Given that most living
forms have a similar density, a way to do this was by estimating
volume. At the same time, we humans struggle to understand bar charts
when the bars differ by more than 2 orders of magnitude. Luckily, if shown 2D
projections of 3D objects, most humans can estimate the weight well. This
comes in handy to compare magnitudes that differ by as much as 3 or 4 orders
of magnitude on a flat surface such as this book. Using the cubic relation, a 1
to 10 change in height becomes a 1 to 1,000 change in weight, a great
dynamic range.
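The cubic relation is easy to verify. In this short sketch the function name is illustrative, and the only numbers taken from the text are the 1 to 1,000 ratio and the 1720 from the planet caption.

```python
import math


def radius_for_volume(volume):
    """Radius of a sphere with the given volume (V = 4/3 * pi * r**3)."""
    return (3 * volume / (4 * math.pi)) ** (1 / 3)


# Encode two values that differ by three orders of magnitude as volumes.
small_value, big_value = 1.0, 1000.0
radius_ratio = radius_for_volume(big_value) / radius_for_volume(small_value)
print(round(radius_ratio, 2))  # the big sphere is only about 10 times as wide

# The planet caption: a volume ratio of 1720 needs a radius ratio near 12.
planet_radius_ratio = 1720 ** (1 / 3)
print(round(planet_radius_ratio, 2))
```

Three orders of magnitude in value shrink to one order of magnitude on the page, which is why both spheres fit in the same figure.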
Log charts
Note that the log plot solves the issue of dynamic range but we humans
are not born with logarithmic intuition built-in (Fig 58). In other words, a
kid will understand the balls, but it takes hours for undergrads to become
familiar with semi-log plots.
How many Solar Panels are needed to power the
USA?
Fig 59 In 2017, Elon Musk used such a chart template to advocate for Solar
Energy. — It was a flop.
The narrative of Fig 59 is scarcity. In 2017, Elon Musk used a chart like this
one. He was advocating for Solar Energy. He said, “We just need one pixel
of the map covered in panels to power the whole USA, remember just one
pixel.” It was a flop. Why? Because it connected to a win-lose narrative. It
is also hard to trust what we cannot see (one pixel is not a great
visualization). Unfortunately, 2D charts do not have enough dynamic range
to visualize differences larger than 2 orders of magnitude. He was trying to
visualize 4 orders.
Fig 60 A chart that uses the growth mindset narrative. Adapted from Q-
Cells.
Fig 61 Of all the biases, age is one of the most pervasive and least talked
about.
Color choices
Orange: cute at first glance, annoying after 20 seconds.
Time line
The data has weekly and yearly periodicity; therefore, a weekly or yearly
periodic time scale would reduce clutter. For example, a weekly radar
chart.
Alternative charts
Radar weekly
Bar plot monthly, weekly with iso-measures of ice cream
The grey line is constant and therefore carries no meaning (remove?)
Other suggestions
Normalize per capita & per app user
Annotate Christmas weekend and other peak days
Explode into a scatter plot of temperature (x-axis) vs. sales (y-axis), with the color of each dot showing the day of the week
Calculate the % of sales due to temperature and the % of sales due to seasonality from a linear regression analysis using summer and weekends as factors
Use Crossfilter to let users explore and discover hidden relationships among the variables weekday, month, temperature, sales of ice cream and individual variance in consumption
Quiz: Global warming
Fig 68 Ed Hawkins made this spiral chart in May 2016. It went viral in
minutes.
From Pies vs. Bars we know that humans are more sensitive to circular
than linear change. If we want the chart to align with the narrative that
“climate change is an emergency”, then let’s leverage that!
Quiz: Magic quadrants
Fig 69 Most wanted Data Science skills in 2019. Source KDnuggets. Image
at original resolution (might appear blurry in some devices).
How might we visualize this data in a more meaningful way? What
design space is most appropriate given the data? Time 4 hours. Hint: See
Magic Quadrants.
Fig 69b From Most wanted Data Science skills in 2019. Source
KDnuggets. Image at original resolution (might appear blurry in some
devices).
Solution
Let’s apply what we have learned so far. (Find a why, transform data into
information, synthesise knowledge by linking to frameworks, make it useful
for decision making). Before finding a why, let’s first explore the data.
The first instinct is to do a scatter plot to identify interesting clusters.
The x-axis can be the percentage of respondents that have a given skill, and
the y-axis, percentage of respondents that would like to have that skill
(want). However, there are too many data points for a human to make sense
of them. It is a textbook case of death by information overload, and in Fig 70
we used the Jackie Chan meme to convey it.
Fig 70 A victim of information overload?
Fig 71, however, is far from ready. The y-axis is aligned with the gravity
metaphor (highly wanted, high y), but the x-axis is not aligned with
another unspoken rule (this one by Guy Kawasaki): “you want (desired
goals) to be high and to the right”. In this case, the most desired skill (Deep
Learning) is on the wrong side, so we need to flip the x-axis (Fig 72).
Fig 72 Goals should be “high and to the right” – Guy Kawasaki.
If you make a chart and no one remembers it, did it still happen? In Fig
72, we grouped the skills in four categories, but what good are they if no one
remembers them? One way to help your audience remember is personas
(memes, in Gen-Z speak). Let’s apply user personas. In Fig 73 each
quadrant means:
Unwanted skills (have but don’t want = Excel)
No-thank-you skills (don’t want and don’t have = Java)
Hot skills (want but don’t have = TensorFlow)
Loved skills (want and have = Python)
Fig 73 Use pop culture to make your chart stick.
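The four quadrants amount to a two-way split on the have and want percentages. The sketch below shows one possible way to assign the labels. The percentages and the cut-off of 30 are hypothetical and are not taken from the KDnuggets data.

```python
# Hypothetical have/want percentages (NOT the survey figures).
skills = {
    "Excel": {"have": 60, "want": 10},
    "Java": {"have": 15, "want": 8},
    "TensorFlow": {"have": 20, "want": 55},
    "Python": {"have": 70, "want": 50},
}

# Assumed cut-off separating "low" from "high" on both axes.
THRESHOLD = 30


def quadrant(have, want, threshold=THRESHOLD):
    """Name the quadrant for one skill, using the labels from the text."""
    if want >= threshold:
        return "Loved" if have >= threshold else "Hot"
    return "Unwanted" if have >= threshold else "No-thank-you"


labels = {name: quadrant(v["have"], v["want"]) for name, v in skills.items()}
for name, label in labels.items():
    print(f"{name:12s} {label}")
```

Where the cut-off goes is a design decision: moving it changes which skills become the representative of each quadrant, so it is worth trying a few values before settling on the story.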
Red pen
Don’t be afraid to red pen your canvas! Captions are an opportunity to
clarify meaning and add punch to your story (not everyone is visual). Note
how in Fig 74 we broke the symmetry by tilting the “loved” label: that is
Feng Shui for charts. We also added a twist in the Java quadrant by not
having a label for it. This ensures that the reader will go to this quadrant
after visiting all the others. The label for this quadrant is inside the meme
(No thank you).
Layering
Note how we have layered information in hierarchies (meme, quadrant
labels, quadrant representative). We have respected the seven-chunk limit in
each layer to avoid overload. Meaning was achieved by linking to an
existing framework, organizing the data into quadrants and
labelling them.
Narrative
Finally, charts should have a purpose. It is reasonable to go through the
process of knowledge creation without knowing why. Once knowledge has
been found and visualized, the why will be easy to find. My personal why
for this chart is: “I would like to see more Python and less Java in my classroom”.
Now that we created knowledge, can we use this chart as a thinking tool?
One way is to imagine contexts where this chart might be useful. Where
could this chart be used to create situational awareness? The figurative
“cloud wars” between Microsoft and Google are fought via proxies such as
Power BI, Kaggle and other cloud software lock-in levers. A similar
playbook unfolded in the 90s in the database market. Fig 75 visualizes who
sponsors which language to see where allegiances stand.
Quiz: Visualizing gaps
Fig 77 If people could unlearn things, would they? (Only 3 items are shown
due to resolution limits of the device)
This chart has now become a predictor of what would happen if people
learnt what they say they want. Have is the current (prevalence) level. Have
+ gap is the future level. In the case of negative gaps, the gap bars are
plotted on the other side of the y-axis (a glitch of the stacking function of
ggplot2, or a feature: one cannot unlearn). Note how, from the competition
narrative of Fig 71 (competing bars), we have switched to a growth
mindset narrative with the “what if you could learn anything you wanted”.
How might we use this chart to prioritize what skills to teach?
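The have-plus-gap arithmetic behind this chart can be sketched in a few lines. The numbers are hypothetical percentage points, and clamping negative gaps at zero is our own assumption, encoding the "one cannot unlearn" reading rather than reproducing what ggplot2 draws.

```python
# Hypothetical prevalence ("have") and gap values, in percentage points.
# Illustrative only; not the values plotted in Fig 77.
skills = {
    "Python": {"have": 80, "gap": 5},
    "TensorFlow": {"have": 20, "gap": 50},
    "Excel": {"have": 60, "gap": -50},
}


def future_level(have: int, gap: int) -> int:
    """Future prevalence if everyone learnt what they say they want.

    Assumption: skills cannot be unlearnt, so a negative gap leaves
    the level unchanged instead of lowering it.
    """
    return have + max(gap, 0)


future = {name: future_level(s["have"], s["gap"]) for name, s in skills.items()}

# One possible teaching priority: largest positive gap first.
ranked = sorted(skills, key=lambda name: skills[name]["gap"], reverse=True)

for name in ranked:
    print(f"{name}: have {skills[name]['have']}, future {future[name]}")
```

Ranking by gap is only one way to prioritize. Ranking by future level would favor skills that are already widespread.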
Chapter 6 Psychology of Visualization
with Ali Fenwick
You can probably guess the correct answer: they are both the same
size. But perception-wise, they are not. The blue circle looks bigger when it is
accompanied by smaller circles around it and looks smaller when we put
bigger circles around it. Cover the outer circles with your fingers and you will see
that both blue circles are exactly the same size.
This visual illusion mirrors how our mind interprets the world. It is
therefore unwise to ignore the broader impact of contextual factors on your
presentations. This effect not only holds true for charts, but also for ephemeral
things such as emotions, attitudes, and points of view. This means that our
brain uses both the external context (e.g. visuals, sounds, smells) as well as
the internal context (e.g. emotions, past experiences, desires) to interpret the
narrative as it happens.
The mental processing of contextual information, be it visual or not,
happens predominantly at the unconscious level. Less than 20% of our
awareness is conscious awareness, which means that factors such as context
can easily influence judgment and decision-making without people being
aware of it, having both intentional and unintentional effects. Let’s see with a
food example how context can influence eating behavior unconsciously.
Fig 80 Professor Wansink studied the effect that the plate’s size has on your
intake of calories. Source: Pelle Guldborg Hansen, inudgeyou.com
The same amount of food is presented on two dishes of different sizes, one
bigger than the other. Visually, the smaller dish makes the portion look
bigger, and studies have shown that people who eat from the smaller dish feel
full quicker and therefore eat less. People eating from the bigger dish tend to
eat more than the people eating from the smaller plate. Now think how this
applies to charts.
Cognitive Overload
According to Sweller, cognitive overload is a phenomenon during which
the brain is unable to process information effectively due to the sheer amount
of information presented[43]. Cognitive overload causes mental fatigue and
reliance on mental shortcuts (which are subject to heuristics and biases). It
is therefore important that design incorporates visual elements which help
overcome cognitive resource limitations and prevent faulty decision-making.
As we saw in the Magic Quadrants in Ch. 5, the main mechanisms to avoid
overload are:
1. Limit the amount of data presented on a table or slide
2. Structure your data into bite-size pieces of relevant
information, and
3. Highlight key words and phrases in sentences which help the
reader pick up the most important parts of the text
Framing
Triggering mental shortcuts in visual design can also be beneficial to
improve visual effectiveness. One way mental shortcuts can be triggered in
visualization is through framing. Framing is a cognitive bias which affects
judgment and decision-making using positive (gain) or negative (loss)
messaging (see again Kahneman’s work[44]). People tend to be more
motivated to take action when messages are framed as either positive (a
gain) or negative (a loss) depending on the situation. For example, studies
show that people are more likely to seek risks when a message is framed as
negative or a loss. Here is an example.
Fig 81 Avoiding deaths (loss thinking frame) leads to more persuasion than
phrasing a policy proposal as “saving lives” (a gain thinking frame).
Source: bi.team
In this chapter, you will learn what bias is and how it can affect a
data-driven visual. Biases can be sorted not only by their point of entry
(data, story, narrative) but also by the area of the cognition system they
exploit (optical illusions, cultural biases). It is easy to assume that bias
is intentional. However, bias can emerge for many reasons.
First, bias can be embedded in the data itself, intentionally in the way it
is gathered but also accidentally by not realizing what is missing.
Second, bias can appear as the story is crafted. Again, this can be
intentional by cherry-picking from existing data, or accidental from cases
where not enough time is spent exploring all data available (usually due to
time pressure).
Third, it can be embedded in the narrative itself. Often this is intentional,
as in propaganda. But it can also be unintentional as in cultural bias.
Types of bias
In broad terms, bias is any systematic error. In other words, it is a
systematic difference between a model and the “truth” it supposedly
represents. In social sciences bias is judged to be unethical when it is
unfair (usually towards a minority). See also ethical frameworks in Ch. 1.
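To make "systematic" concrete, here is a minimal sketch that contrasts random noise with a constant offset. The true value, the noise level and the +3 offset are all hypothetical. The point is only that averaging many measurements cancels noise but leaves bias untouched.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUTH = 100.0  # hypothetical true value being measured

# Noisy but unbiased: errors scatter symmetrically around zero.
unbiased = [TRUTH + random.gauss(0, 5) for _ in range(10_000)]

# Biased: the same kind of noise plus a constant offset of +3.
biased = [TRUTH + 3 + random.gauss(0, 5) for _ in range(10_000)]


def estimated_bias(measurements, truth):
    """Mean signed error: the systematic part of the error."""
    return statistics.fmean(measurements) - truth


print(round(estimated_bias(unbiased, TRUTH), 2))  # close to 0
print(round(estimated_bias(biased, TRUTH), 2))    # close to 3
```

In practice the truth is rarely known, which is why bias is usually detected by comparing against an independent source rather than computed directly.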
Bias can affect the producer of a visual (as in selection bias in data), but
also the consumer of the visual (as in groupthink, the hot hand fallacy,
and so on). Psychologists and behavioral economists have identified more
than 200 types of cognitive biases. These can be classified into three groups:
Belief bias
Social bias
Memory bias
In addition to the cognitive biases mentioned, visual perception biases
also apply when dealing with data visualization. Let’s see some
examples.
Bias in narrative
The broadest forms of unconscious bias are due to unawareness and are
so rooted in society they usually are cultural (moral) norms too. Note that
not all cultural norms are biased but that most norms evolve slower than
society does and thus are usually lagging behind reality. Examples of
conscious narrative bias are propaganda and disinformation. Typical
techniques used are FUD (Fear, Uncertainty and Doubt) as seen with the
tobacco industry and FLICC (Fake experts, Logical fallacies, Impossible
expectations, Cherry-picking and Conspiracy theories) as seen with
climate change denial[45]. Let’s see an example.
Bias in narrative: A balanced meal?
In the second half of the 20th century, a balanced diet was assumed to be
optimal for health. In school, many kids (myself too) were shown charts
with relatively balanced food groups. Yet, other cultures and a few
independent research papers show that perhaps that balanced food narrative
isn’t the healthiest one. For example, the Okinawan diet contains less than 5%
of animal protein and no milk derivatives. Their diet would be considered
dangerously unbalanced by Western standards. However, Okinawans
report one of the healthiest and longest lifespans in the world.
Fig 87 Logical fallacy? How could this “balanced” diet not be healthy if it
is balanced? Source: US Department of Agriculture choosemyplate.org
Solution to fires
Data was central to this story. Just how bad were the fires? There was
one main data source — the National Institute for Space Research
(INPE[49]). Initially, the news coverage presented data that encouraged the
reader to be outraged and suspect a crisis was at hand. The chart displays
data on the number of fires from 2013 to 2019. The graph leads the reader
to think that this was the highest level of fires ever. In fact, CNN on Aug.
22 wrote a story that the forest was “burning at a record rate”. The first
sentence of that CNN story said the rate of burning was a record “since
INPE began tracking fires in 2013”. Between the use of words
like ‘double’ and ‘record’ and the use of a visual with the 2019 bar looming
over all the other years shown, the narrative was set. To be sure, agencies
like CNN added to their written story lead: “...and scientists warn it could
strike a devastating blow to the fight against climate change.”
Fig 92 Good journalism?
In this case, INPE did indeed have more data regarding the number of fires
(it was just not online!). The BBC updated its chart to include more years.
Here is how it looked after digging further into the non-online archives.
Still, is this the best data we can get? Is there more? Consider what is
being measured: number of fires. Does this mean that if we light five small
fires today and one big one tomorrow, the total number of fires is declining?
Then, The New York Times published the chart below. Here, we have the
same source, INPE, but the measurement is square miles burned. The
numbers used initially weren’t wrong, but rather they were not complete or
fully reflective of the situation, a case of unconscious selection bias.
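The five-small-fires question can be checked with a toy calculation. The fire sizes below are hypothetical, not INPE data. The sketch only shows that the two metrics, count and area burned, can point in opposite directions.

```python
# Hypothetical fire sizes in square miles; illustrative only, not INPE data.
fires_by_day = {
    "today": [1, 1, 1, 1, 1],  # five small fires
    "tomorrow": [50],          # one big fire
}

# Summarize each day with both candidate metrics.
summary = {
    day: {"count": len(sizes), "area": sum(sizes)}
    for day, sizes in fires_by_day.items()
}

for day, metrics in summary.items():
    print(f"{day}: {metrics['count']} fire(s), {metrics['area']} sq mi burned")
```

By count, fires fall from five to one. By area, they rise tenfold. Which metric goes on the chart decides which story the reader takes away.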
Due diligence checklist
1. Pay attention to words
Any good set of data will offer transparency into the methodology of
how the data was gathered. This means paying particular attention to what
questions were asked in surveys, how they were asked, and what statements
were made. A red flag is any use of adverbs and adjectives. They are
usually loaded with bias.
2. Follow the money. Who paid for the research?
Big tobacco showed us that the organization that pays for the research
can control its results. For example, egg industry lobbyists pay
for research at accredited universities to promote the claim that eggs
won’t boost bad cholesterol in humans. In the 1960s, the sugar industry paid
researchers to produce data that made consumers believe fat was a bigger
health hazard than sugar. The list goes on. Next time you see research about
health or the environment, try to discover the identity of the ultimate
financial backer. Follow the money.
3. Pay attention to the statistical methods used
As we saw, sometimes the data is being selected to intentionally support
a position. After performing some statistical analysis, a good rule of thumb
is to always ask a more proficient data scientist to find flaws. It works
wonders and one can learn a lot.
4. Consider the availability of data
Just because the data isn’t publicly online doesn’t mean it doesn’t
exist. Post-millennial journalists who were never taught how to do
research before the internet existed are particularly vulnerable to this bias.
Quiz: Fire Tweets
[1]
Own estimates
[2]
Rose, T. (2016). The end of average: How to succeed in a world that values sameness. Penguin
UK.
[3]
Rand, P., 1985. Paul Rand: A designer's art. New Haven: Yale University Press.
[4]
Isaacson, Walter. Steve Jobs. 2011.
[5]
“It takes too long to style a chart like you want by using R or Excel. At the end, it is faster to use
Adobe Illustrator” – Marybeth Sandell, former bureau chief London, Bloomberg.
[6]
Berengueres, J., 2019, June. Visualization & Storytelling Workshop. In 20th Annual International
Conference on Digital Government Research (pp. 532-533). ACM.
[7]
See also “how to detect genius ideas” in Ogilvy on Advertising.
[8]
See also Tanaka, K., 1997. An introduction to fuzzy logic for practical applications.
[9]
See also the film Merchants of Doubt and Cambridge Analytica scandal.
[10]
Gallo, C. 2019. The Art of Persuasion Hasn’t Changed in 2,000 Years.
[11]
Reynolds, George. Ethics in information technology. Nelson Education, 2011.
[12]
Jefferys, S., 2003. Liberté, Egalité and Fraternité at work: changing French employment
relations and management. Springer.
[13]
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI
investigation of emotional engagement in moral judgment. Science, 293(5537), 2105-2108.
[14]
In 2006, Al Gore hired Nancy Duarte to make his famous CO2 chart presentation. A scissor lift
was used on stage to show that the CO2 is “off the charts”. Al Gore was later criticized for emitting a
lot of CO2 through his jet travel.
[15]
See also David and Goliath in the book of Samuel.
[16]
Thunberg, G., 2018. Speech at UN Climate Change COP 24 Conference, Poland 2018. Published
online on YouTube by Connect4Climate.
[17]
Hyman, J., 2006. The objective eye: color, form, and reality in the theory of art. University of
Chicago Press.
[18]
Schippers, M. C., Scheepers, A. W., & Peterson, J. B. (2015). A scalable goal-setting intervention
closes both the gender and ethnic minority achievement gap. Palgrave Communications, 1, 15014.
[19]
Dweck, C., 2015. Carol Dweck revisits the growth mindset. Education Week, 35(5), pp.20-24.
[20]
Corbett, Christianne, and Catherine Hill. Solving the Equation: The Variables for Women's
Success in Engineering and Computing. American Association of University Women. 1111 Sixteenth
Street NW, Washington, DC 20036, 2015
[21]
Norman, D., 2013. The design of everyday things: Revised and expanded edition. Basic books.
[22]
Mourant, R.R. and Rockwell, T.H., 1972. Strategies of visual search by novice and experienced
drivers. Human factors, 14(4), pp.325-335.
[23]
See also “Generations in the workforce”
[24]
Sinton, E (2011). ‘Baby boomers are very privileged human beings’
https://www.telegraph.co.uk/finance/personalfinance/pensions/8840963/Baby-boomers-are-very-privileged-human-beings.html
retrieved October 23, 2013 from www.telegraph.co.uk
[25]
Lukianoff, G. and Haidt, J., 2018. The coddling of the American mind: How good intentions
and bad ideas are setting up a generation for failure. Penguin.
[26]
Palmore, Erdman. Ageism: Negative and positive. Springer Publishing Company, 1999.
[27]
Backes‐Gellner, U. and Veen, S., 2013. Positive effects of ageing and age diversity in innovative
companies–large‐scale empirical evidence on company productivity. Human Resource Management
Journal, 23(3), pp.279-295
[28]
https://en.wikipedia.org/wiki/Affluence_in_the_United_States
[29]
https://www.epi.org/blog/top-1-0-percent-reaches-highest-wages-ever-up-157-percent-since-1979/
[30]
https://www.kaggle.com/harriken/kaggle-journey-2017-2018
[31]
Cowan, N., 2010. The magical mystery four: How is working memory capacity limited, and
why?. Current directions in psychological science, 19(1), pp.51-57.
[32]
Wynn, T., 2002. Archaeology and cognitive evolution. Behavioral and brain sciences, 25(3),
pp.389-402
[33]
[34]
Galton, F., 1886. Regression towards mediocrity in hereditary stature. The Journal of the
Anthropological Institute of Great Britain and Ireland, 15, pp.246-263.
[35]
Gandhi, P., Khanna, S. and Ramaswamy, S., 2016. Which industries are the most digital (and
why). Harvard Business Review, 1.
[36]
Stewart, P.J., 2019. Mendeleev’s predictions: success and failure. Foundations of Chemistry,
21(1), pp.3-9.
[37]
Source: Berengueres, J., 2015. The Brown Book of Design Thinking: A workshop based
approach. UAE University College.
[38]
Johnstone, Keith. Impro for storytellers. Routledge, 2014.
[39]
Norman, D., 2013. The design of everyday things: Revised and expanded edition. Basic books.
[40]
Berengueres, J., 2007. The Toyota production system re-contextualized. Lulu.com.
[41]
Tversky, A. and Kahneman, D., 1974. Judgment under uncertainty: Heuristics and biases.
Science, 185(4157), pp.1124-1131.
[42]
Thomas, A.K. and Millar, P.R., 2011. Reducing the framing effect in older and younger adults by
encouraging analytic processing. Journals of Gerontology Series B: Psychological Sciences and
Social Sciences, 67(2), pp.139-149
[43]
Sweller, J., 1988. Cognitive load during problem solving: Effects on learning. Cognitive science,
12(2), pp.257-285.
[44]
Kahneman, D. and Tversky, A., 1981. On the study of statistical intuitions (No. TR-6).
Stanford University, Department of Psychology.
[45]
Cook, J., Supran, G., Lewandowsky, S., Oreskes, N. and Maibach, E., 2019. America Misled:
How the Fossil Fuel Industry Deliberately Misled Americans about Climate Change.
[46]
Because the authors of the paper refused to retract their beliefs after the data showed otherwise,
this is also evidence of confirmation bias.
[47]
Herndon, T., Ash, M. and Pollin, R., 2014. Does high public debt consistently stifle economic
growth? A critique of Reinhart and Rogoff. Cambridge journal of economics, 38(2), pp.257-279.
[48]
Pandey, A.V., Rall, K., Satterthwaite, M.L., Nov, O. and Bertini, E., 2015, April. How deceptive
are deceptive visualizations?: An empirical analysis of common distortion techniques. In
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp.
1469-1478). ACM.
[49]
The National Institute for Space Research (Portuguese: Instituto Nacional de Pesquisas
Espaciais, INPE). It is a research unit of the Brazilian Ministry of Science, Technology and
Innovation.