
CHAPTER 1

STORYTELLING IN A
DIGITAL ERA

This chapter sets the context for the rest of this book with an introductory discussion
on data visualization and visual data
storytelling. It explores how these two
concepts are similar and different and how
both practices have been transformed in
the digital era by new technologies and
bigger, more diverse, and more dynamic
data. Lastly, the chapter explores the value
of visual data storytelling for data
communication, and establishes how data
storytelling is the perfect skill to bridge
the broad business–IT gap.
A VISUAL REVOLUTION
A data revolution is happening across the globe. From
academics to politics and everywhere in between, the
world’s stories are being told through their data points.
Although using visualization to tell stories about data isn’t
particularly new (in fact, as you’ll soon discover, we’ve
been doing it for quite some time), we are now telling them
in more influential and impactful ways than ever before.

Today, the resurgence in the power of data visualization—
alongside a virtual gold rush of bigger, more diverse, and more
dynamic data—is providing new tools and innovative
techniques to help us transform raw data into compelling
visual data narratives. Propelled by this newfound horsepower
in data visualization, we are recreating the entire analytic
process. We’re also making it increasingly more visual—from
how we explore data to discover new insights all the way to
how we curate dashboards, storyboards, and interactive
visualizations to share the fruits of our labor. We are always
looking for new ways to show off the messages hidden within
our data, and we’re getting pretty good at it, too. Charts and
graphs created five years ago in Excel do not compare to the
incredible visuals we are now producing with best-of-breed
tools like Tableau, or scripting with dynamic JavaScript
libraries like D3.js (see Figure 1.1).

Our newest breed of data visualizations is moving beyond the
classic bar, line, and pie charts of the past, and pushing beyond
the boundaries of traditional information displays to powerful
new territories of graphic representation. With determination
and a healthy spirit of curiosity and adventure, we are visually
representing our data on everything from massive, mural-sized
visualizations like the Affinity Map,1 a 250-square meter
visualization produced by the Swiss Federal Institute of
Technology in Lausanne, to interactive visualizations like
Trendalyzer,2 a statistical animation visualization developed
by the late Hans Rosling’s Gapminder Foundation, to
streaming visualizations that bring data to life with real-time
movement, to fluid, customizable dashboards that toggle
between form factors from the desktop to the smartphone with
pixel-perfect rendering. If Gene Roddenberry, creator of the
science fiction series Star Trek, had scripted today’s visual
analytics movement, he might have said we are boldly going
where no viz (visualization) has gone before—and he’d be
right.

However, all of these visualizations, from the most dynamic to
the most static, need more than just data to make the leap from
information representation to resonation. They need a story—
something to show, or, more aptly, to “tell” visually—and
finding this tale isn’t always obvious when digging through a
data set. It takes exploration, curiosity, and a shift in mindset
to move from creating a data visualization to scripting a data
narrative. They are similar, but not identical, skill sets.
Figure 1.1 An example of “old” data visualization compared
to its modern equivalent.

Scripting a data narrative might sound like a vague or even an
overwhelming process. After all, many of us would consider
ourselves analysts first, “data people” before storytellers. We
enjoy numbers and analytics and computation more so than the
artsy craft of writing stories. Nevertheless, the two are
fundamentally intertwined: We must know our data, its
context, and the results of analytics in order to extrapolate
these into meaning for an audience who doesn’t. That’s all a
story is, really—one person sharing something new and
unknown with another in a way that is easily understandable
and relatable. The good news? There’s no single way to do it.
We can use several proven narrative frameworks to design a
data storyboard, and numerous quintessential examples exist
where a data storyteller has exercised a generous amount of
creative liberty and done something entirely new. After all,
like any kind of story, data stories require a certain amount of
creativity—and although tools and technology can do much
with our data for us, creativity is a uniquely human
contribution to any narrative (see Figure 1.2). We’ll take a
look at some of these examples as we go forward.

note
Data visualization is the practice of graphically
representing data to help people see and understand
patterns, insights, and other discoveries hidden inside
information. Data storytelling translates seeing into
meaning by weaving a narrative around the data to
answer questions and support decision making.

Data visualization and data storytelling are not the same thing;
however, they are two sides of the same coin. A true data story
utilizes data visualizations as a literary endeavor would use
illustrations—proof points to support the narrative. However,
there’s a little bit of a role reversal here: whereas data
visualizations provide the “what” in the story, the narrative
itself answers the “why.” As such, the two work in
tandem to translate raw data into something meaningful for its
audience. So, to be a proper data storyteller you need to know
how to do both: curate effective data visualizations and frame
a storyboard around them. This starts with learning how to
visualize data, and more importantly, how to do so in the best
way for communication rather than purely analytical purposes.
As discussed later in this book, visualizations for analysis
versus presentation are not always the same thing in data
storytelling.

One of the most common clichés in the viz space is that “data
visualizations are only as effective as the insights they reveal.”
In this context, effectiveness is a function of careful planning.
Any meaningful visualization is a two-pronged effort. It requires
analytical perfection and correct rendering of statistical
information, as well as a well-orchestrated balance of visual
design cues (color, shape, size, and so on) to encode that data
with meaning. The two are not mutually exclusive.
Figure 1.2 A sampling of great data stories in recent headlines
by statisticians and data journalists.

Data visualization is a place where science meets art, although
the jury is still out on whether the practice is more a scientific
endeavor or an artistic one. Although experts agree that a
compelling visual requires both, it tends to be something of a
chicken and egg scenario. We haven’t quite come to a
consensus as to whether science comes before design or we
design for the science—and the decision changes depending
on whom you ask, who is creating the visualization, and who
its audience is. That said, whichever side of the argument you
land on, the result is the same. We need statistical
understanding of the data, its context, and how to measure it;
otherwise, we run the risk of faulty analysis and skewed
decision making that, eventually, leads to real harm. Likewise, our
highly visual cognitive system demands a way to encode
numbers with meaning, and so we rely on colors and shapes to
help automate these processes for us. An effective visual must
strike the right balance of both to accurately and astutely
deliver on its goal: intuitive insight at a glance.

This might sound like an easy task, but it’s not. Learning to
properly construct correct and effective data visualization isn’t
something you can accomplish overnight. It takes as much
time to master this craft as it does any other, as well as a
certain dedication to patience, practice, and keeping abreast of
changes in software. In addition, like so many other things in
data science, data visualization and storytelling tend to evolve
over time, so an inherent need exists for continuous learning
and adaptation, too. The lessons in this book will guide you as
you begin your first adventures in data storytelling using data
visualizations in Tableau.

FROM VISUALIZATION TO
VISUAL DATA STORYTELLING:
AN EVOLUTION
With all the current focus on data visualization as the best
(and sometimes only) way to see and understand today’s
biggest and most diverse data, it’s easy to think of the
practice as a relatively new way of representing data and
other statistical information. In reality, the practice of
graphing information—and communicating visually—
reaches back all the way to some of our earliest prehistoric
cave drawings where we charted minutiae of early human
life, through initial mapmaking, and into more modern
advances in graphic design and statistical graphics. Along
the way, the practice of data visualization has been aided by
both advancements in visual design and cognitive science as
well as technology and business intelligence, all of which have
led to our current state of data visualization.

In today’s data-driven business environment, an emerging
approach to storytelling attempts to combine data with
graphics and tell the world’s stories through the power of
information visualization. For as far back as we can trace the
roots of data visualization, storytelling stretches further.
Storytelling has been dubbed the world’s oldest profession.
Likewise, it is now and has always been an integral part of the
human experience. There’s even evidence of the cognitive
effects of storytelling in our neurology. It’s a central way that
we learn, remember, and communicate information—which
has important implications when the goal of a visualization or
visual data story is to prepare business decision makers to
leave a data presentation with a story in their head that helps
them both remember your message and take action on it. We’ll
discuss the cognitive and anthropological effects of stories
more in later chapters.

Graphing stories is the intersection of data visualization and
storytelling. American author Kurt Vonnegut is quoted as
having famously said, “There is no reason that the simple
shapes of stories can’t be fed into a computer—they have
beautiful shapes.” Likewise, we could restate this to say that
data stories provide the shapes to communicate information in
ways that facts and figures alone can’t. Just as much as today’s
approach to data visualization has changed the way we see and
understand our data, data storytelling has equally—if not more
—been the catalyst that has radically changed the way we talk
about our data.

Learning to present insights and deliver the results of analysis
in visual form involves working with data, employing
analytical approaches, choosing the most appropriate
visualization techniques, applying visual design principles,
and structuring a compelling data narrative. Also, although
crafting an effective and compelling visual data story is, like
traditional storytelling, a uniquely human experience, tools
and software exist that can help. Referring back to Vonnegut’s
quote, stories have shapes. In visual data storytelling, we find
the shape of the story through exploration of the data, conduct
analysis to discover the sequence of the data points, and use
annotations to layer knowledge to tell a story.

To visualize the data storytelling process, consider the graphic
shown in Figure 1.3. This is the process we’ll follow
throughout this book. It’s worthwhile to note that this process
isn’t always as straightforward or linear as it might initially
appear. In reality, this process is, like all discovery processes,
iterative. For example, as a result of analysis we might need to
revisit data wrangling (for example, if we find that we are
missing a required attribute that we need for our proposed
model). Further, as we find insights we might need to revisit
the analysis or adjust the data. Finally, as the story unfolds we
might need to revisit previous steps to support claims we did
not originally plan to make.
Figure 1.3 The storytelling process, visualized.
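As a loose illustration of that iterative loop, the process can be sketched in code. The step names, the "missing attribute" trigger, and the loop-back rule below are all hypothetical, chosen only to mirror the revisits described above; they are not an API from this book or from Tableau.

```python
# A toy sketch of the iterative storytelling workflow.
# Step names and the revisit trigger are illustrative only.

def storytelling_pipeline(raw_data):
    """Walk the storytelling steps, looping back to wrangling
    when analysis reveals a data problem."""
    steps = ["wrangle", "analyze", "visualize", "narrate"]
    log = []
    i = 0
    while i < len(steps):
        step = steps[i]
        log.append(step)
        # Discovery is iterative: finding a missing attribute during
        # analysis sends us back to wrangling, as described above.
        if step == "analyze" and "missing_attribute" in raw_data:
            raw_data.remove("missing_attribute")  # repair the data...
            i = steps.index("wrangle")            # ...and revisit wrangling
            continue
        i += 1
    return log

trace = storytelling_pipeline(["sales", "missing_attribute"])
```

The trace records the loop-back ("wrangle" and "analyze" appear twice) before the story is finished, which is the sense in which the process is iterative rather than linear.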

FROM VISUAL TO STORY:
BRIDGING THE GAP
Before we move into building skills and competencies in
visual data storytelling, let’s take a moment to pause and
think about why we are doing this. We’ve danced around
this already in previous conversations, and while we could
make a convincing argument that mastering new tools and
ways to interact with data is an inevitable result of the big
data era, that would only be half of the reason. Data will
continue to grow, technologies to adapt and innovate, and
analytical approaches to chart new territory in how we work
with and try to uncover meaning and value hidden within
our data. The real value in becoming a data storyteller is to
gain the ability to share—to communicate—about our
data.

So far, I’ve put data visualization first and communication
second, because that is the order you follow when you
structure your visual analysis—you have to explore and build
something before you can tell a story about it. However, we
shouldn’t underestimate the communication that happens
before you ever touch your data. Communication skills are a
prerequisite listed on every job description, but just how
important are these skills in data analysis and visual data
storytelling—and why?

In 2012, academic researchers with the AIS Special Interest
Group on Decision Support, Knowledge, and Data
Management Systems (SIG DSS) and Teradata University
Network (TUN) formed the Business Intelligence Congress 3
to survey and assess the state of business intelligence and
analytics. They surveyed more than 400 recruiters from
technical companies, asking what skills and competencies they
looked for in new analytic hires.

Their number one answer? Communication skills3 (see Figure 1.4).

The BI Congress survey isn’t the only piece of data to pinpoint
the importance of communication skills in analysis. A second
piece of research comes from research and advisory firm
Gartner.4 It conducted a study to determine why big data
projects fail—specifically, what percentage of big data projects
fail due to organizational problems, like communication, and
what percentage fail due to technical problems, like
programming or hardware. Only
about 1% of companies responded that technical issues alone
were the fail point of their data analytics problems. The other
99% of companies said that at least half of the reasons their
data analytics projects failed were due to poor organizational
skills, namely communication, and not technical skills.
Figure 1.4 According to the BIC3 Survey published in 2014,
communication skills outrank technical skills for getting a
business analysis job.

Of course, there isn’t a perfect correlation between
organizational skills and communication, but the reality is that
one of the most important organizational skills is the ability to
communicate—hence its inclusion in every business academic
program and on every aforementioned job posting. Although
communication skills might live on the softer side of things in
terms of skillsets, they are nonetheless critical for
success, particularly when helping others to see the story
within data. However, sharing a story isn’t enough. Anyone
can do that. If we can’t communicate, we can’t inspire change
or action. Real communication is a two-way dialogue between
a sender and a receiver, or receivers. It prompts an action,
supports a decision, or generates understanding.

When we discuss the importance of communication skills
within the context of data storytelling, we are looking at it from
an audience-first perspective. This means putting the
audience’s needs ahead of the storyteller’s. Successful
communication hinges on the ability to influence the people
who matter the most—the stakeholder to your analysis, be that
an executive, a teacher, the general public, or anyone else.
Ultimately, how data—visual or otherwise—is interpreted is
fundamentally influenced by context. Context is a multifaceted
thing. It is driven in part by your audience, but just as
important to your story is the part of the context driven by you
—your assumptions, your goals, and what you already know.

Understanding the importance of context is the focus of
Chapter 4. For now, to answer the question I posed earlier—
how important are communication skills in visual data
storytelling?—they are paramount.

A NOTE ABOUT “DESSERT CHARTS”

After more than 200 years of use (the first being
credited to William Playfair’s Statistical Breviary of
1801), what have come to be called “dessert charts”—
those circular visualizations including pie and donut
charts that “slice” data into wedges reminiscent of our
favorite sweets—have had a bit of a fall from grace.
Although still widely in use, many visualization
experts and educators preach against the use of these
types of charts, myself included. However, it should be
noted that hatred of pie charts is not merely an
opinion, and there is empirical research that provides
the basis for why these types of charts just don’t work
analytically. That said, there are ways to use them
productively—particularly as mechanisms for data
storytelling—if a few words of caution are followed.
We’ll take a deeper look at how to best curate “dessert
charts” for visual data storytelling in Chapter 7,
“Preparing Data for Storytelling.”
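Part of that empirical basis is perceptual: wedge charts ask the eye to compare angles, which people judge less accurately than lengths on a common baseline. A minimal sketch of the underlying arithmetic (the share values are invented for illustration):

```python
# Convert fractional shares to pie-wedge angles, in degrees.
# The shares below are invented, illustrative numbers.

def wedge_angle(share):
    """Degrees of a full 360-degree pie allotted to a fractional share."""
    return share * 360.0

share_a, share_b = 0.26, 0.24  # easy to rank as bar lengths on a baseline
angle_a = wedge_angle(share_a)
angle_b = wedge_angle(share_b)

# As wedges that begin at different rotations around the circle, a
# difference of only a few degrees is hard to judge by eye, which is
# the analytical complaint the experts above are making.
difference = angle_a - angle_b
```

The same two values that are trivially ranked as side-by-side bars become a small angular difference buried inside a circle, which is why the advice in Chapter 7 centers on when, not just how, to use these charts.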

DATA SCIENCE EDUCATION GETS ON THE
MAP

By now we are all in agreement: The business of data
is changing. Business users are more empowered to
work with data; IT is shifting its focus to be less about
control and more about enablement. New data science
job descriptions—like data scientist and visual data
artist—are springing up as companies look for the
right people with the right skill sets to squeeze more
value from their data. Data itself is getting bigger,
hardware more economical, and analytical software
more “self-service.” We’ve embraced the paradigm
shift from traditional BI to iterative data discovery.
We’re depending on data visualization and data
storytelling to see, understand, and share data in ways
like never before. It’s the visual imperative in action.
As you might expect, these changes have a significant
effect on how people work in data science, be they
executives, data scientists, researchers, analysts, or
even data storytellers. There are a lot of skills to learn
and a very big toolbox to choose from,
and we are all learning together. Adding to that, over
the past few years we’ve been reminded that data
workers are in high demand, and we’ve seen firsthand
how limited the current supply is. There’s the familiar
U.S. Bureau of Labor Statistics estimate that expects
1.4 million computer science jobs by 2020. Another
familiar statistic from the McKinsey Global Institute
estimates that there will be 140,000 to 180,000
unfilled data scientist positions in the market in 2018.
That’s a lot of empty seats to fill. So, we are faced
with two challenges: 1) we need more capable data
people and 2) we need them with deeper, more
dynamic skillsets. This means we have to start
thinking about cultivating talent—rather than
recruiting it—and training an incoming workforce
isn’t something that an industry can do alone, no
matter how many specialized software training
programs, MOOCs, conferences, and excellent
publications we produce. To enact lasting change and
a sustainable funnel of competent data workers suited
to the new era of the data industry, we need to move
further down the pipeline to that place where we all
discovered we wanted to be data people in the first
place: the classroom.
That’s exactly what we’re doing. The academic
community has been tasked with developing new
educational programs that can develop the skills and
education needed by new data science professionals.
These university information science programs—
called business analytics, data science, professional
business science, or one of the dozen or so other terms used by
academia—are only just beginning to be sorted out.
However, they are growing exponentially across the
country, and so far enrollment is promising.
Different universities are taking different approaches
to structuring a new kind of data science education.
Some are developing entirely new pedagogy focused
on the fluid and dynamic fields of data science. Others
are reshaping existing curricula by unifying across
academic silos to integrate disciplines of study,
particularly among business and IT domains. Others
are forming academic alliance programs to give
students learning experiences with contemporary
industry tools and creating projects that expose
students to analytical problems within real-world
business context.
Nevertheless, all universities are listening to campus
recruiters, who are clearly saying that we need people
with more data skills and knowledge, and they’re
working hard to fill that gap. More importantly, there
are a few things that these programs have in common.
They’re focused on real-world applications of data
problems. They’re doing their best to keep pace with
fluid changes in technology adoption, new
programming languages, and on-the-market software
packages. They’re also putting a premium on data
visualization and data storytelling. Vendors like
Tableau with its Tableau for Teaching program are
helping, too.
So just how big is data science education? Over the
past couple of years, the number of new business
analytics program offerings has significantly
increased. In 2010 there were a total of 131 confirmed,
full-time BI/BA university degree programs, including
47 undergraduate-level programs. Today, that number
has nearly tripled and continues to rise with new and
improved programs at the undergraduate, graduate,
and certificate levels—both on and off campus—
springing up at accredited institutions across the
country (see Figure 1.5). So, while we might not have
access to all this new data talent yet, if academia has
anything to say about it, help is on the way.

note
This dataset is regularly updated and maintained by Ryan
Swanstrom, and is available via GitHub at
https://github.com/ryanswanstrom/awesome-datascience-colleges.

Figure 1.5 Business analytics degree programs in the
United States.

SUMMARY
This chapter focused on providing an introductory
discussion on data visualization and visual data storytelling
by taking a look at how these concepts are similar and
different, and how both have been transformed in the digital
era. The next chapter takes a closer look at the power of
visual data stories to help us understand what makes them
so powerful and important in today’s data deluge.

_____________
1. https://actu.epfl.ch/news/the-world-s-largest-data-visualization/
2. https://www.gapminder.org/tag/trendalyzer/
3. Wixom, Barbara; Ariyachandra, Thilini; Douglas, David; Goul, Michael; Gupta, Babita; Iyer,
Lakshmi; Kulkarni, Uday; Mooney, John G.; Phillips-Wren, Gloria; and Turetken, Ozgur (2014).
“The Current State of Business Intelligence in Academia: The Arrival of Big Data,”
Communications of the Association for Information Systems: Vol. 34 , Article 1.
4. http://www.gartner.com/newsroom/id/2593815
CHAPTER 2

THE POWER OF VISUAL
DATA STORIES

This chapter uses quintessential and real-life
examples from the visual storytelling
canon to display the powerful ability of
data stories to communicate discoveries
and insights hidden in data. We will
ground these lessons by taking time to
understand what makes data visualization
and stories so influential to the human
brain from both a cognitive and an
anthropological perspective.
THE SCIENCE OF
STORYTELLING
The world of data is changing. So is how we tell stories
about it.

In a September 2016 interview with NPR Marketplace,1
National Geographic editor-in-chief Susan Goldberg spoke to
host Kai Ryssdal about the power of visual storytelling, which
has provided a transformative conduit for the publication in
the new digital era. Speaking to National Geographic’s
conversion from traditional print magazine to social media
heavyweight, Goldberg commented that “everything is visual
today”—especially stories. It’s worth noting that National
Geographic is dominating visual storytelling online, using
powerful imagery to captivate and educate 19 million
Snapchat users, 60 million Instagram followers, and 50 million
Facebook followers. The magazine is also throwing its hat into
the data visualization ring with its Data Points blog.

Media and journalists are not the only ones putting emphasis
on data storytelling, although they arguably have been a
particularly imaginative bunch of communicators. Today
we’ve seen the power of storytelling used to color in
conversations on just about every type of data imaginable—
from challenging astronomical principles, to visualizing the
tenure pipeline at Harvard Business School, to quantifying the
fairytale of Little Red Riding Hood. In every organization and
every industry, data stories are becoming the next script for
how we share information.

For as diverse as data stories can be, they all have one thing in
common: They give us something to connect to in a very
literal sense. Let’s delve into the power of stories, first by
looking behind the curtain at the science of storytelling and
then looking at some incredible data stories over time to see
how they have capitalized on the secret sauce of storytelling.

The Brain on Stories

In Chapter 1 I mentioned that evidence exists of the
cognitive effects of storytelling embedded within our
neurology. Here’s how: When we are presented with data,
only two parts of our brain respond. These are Wernicke’s
area—responsible for language comprehension—and
Broca’s area—responsible, again, for language processing.
For the very powerful human brain, data is easy. The brain’s
response to these stimuli is a relatively simple input-and-respond
transaction involving only these two
basic areas. Because we’re focused only on seeing and
responding to information (agree/disagree), there’s no great
need to overexert our neuro-horsepower.

Unlike simple data, stories require a substantial cognitive
boost. Here’s an easy thought exercise. Imagine that tonight
we have pasta on the menu. However, our pantry is empty, so
to prepare this meal we need to go to the market. Let’s make a
quick mental list of our ingredients: pasta, some tomato sauce,
perhaps some herbs, garlic, and Parmesan cheese—if we’re
feeling fancy we can grab a loaf of garlic bread, too. Now, let’s
pretend we get to the market, only to discover it’s closed. So,
instead of cooking, we decide to go to our favorite
Italian restaurant (it’s okay if yours is Olive Garden—mine is,
too) and order something from the menu. Suddenly the image
changes: We’re no longer looking at a bunch of individual
items on a grocery list; we’re imagining a waiter setting down
a big, beautiful dish of flavorful and delicious spaghetti in
front of us. Perhaps we also hear the buzzing backdrop of
restaurant sounds—water glasses, clinking silverware, and so
on. If we think about it long enough—or if we’re hungry
enough—we can almost taste the food.

This is the difference between visualizing data and presenting
a story: rather than itemizing a list of ingredients (data points)
we are presenting a full, sensory-engaging dining experience
(see Figure 2.1).

Figure 2.1 Visualizing versus presenting.

You can think of this storytelling experience in a more
traditional way, too, by considering the difference between
reading a novel and watching a film. When reading, you are tasked with
using your imagination—you’re reading the raw data of words
and building the story in your own mind. Conversely, when
watching a film, your imagination is off the hook. Images of
characters and settings, costumes, spoken dialogue, music, and
so on are displayed for you on the screen. When you watch a
live presentation, like a play or a 4D movie, you also get a few
extra pieces of sensory information, like the smell of a smoke
machine or carefully chosen scents to accompany the story
pumping through the air.

These extra storytelling details have a profound effect on the brain (see Figure 2.2). Beyond the two areas of the brain that
activate when presented with data, when presented with a
story, five additional areas respond. These are

The visual cortex (colors and shapes)
The olfactory cortex (scents)
The auditory cortex (sounds)
The motor cortex (movement)
The sensory cortex/cerebellum (language comprehension)
Figure 2.2 The brain on stories.

The Human on Stories


Beyond the sciences, there’s also a lot of truth to the old
saying “everyone loves a good story.”

Storytelling has been an integral part of human expression and culture throughout time. All human cultures tell stories, and
most people derive a great deal of pleasure from them—even
if they are untrue (think of fantastical stories or fables).
Beyond entertainment, stories teach us important lessons; we
learn from them. In many cases they are how we transmit
information—whether through metaphoric tales, instructions,
or legends. Stories also have the ability to transport us; we
give the author license to stretch the truth—although, in data
storytelling, this license extends only as far as it can before the
data loses its elasticity and begins to break down. Data stories,
above all, must be true. They are works of narration, but of the
non-fiction variety.

Okay, so we love stories—but why? There’s no easy answer to this question, and frankly from academe to industry, the
research is crowded with books and articles attempting to
explain the cognitive basis of all storytelling and literature
under the heading of storytelling psychology. That said, we
can distill all of these dialogues into two primary possible
contenders for why we tell stories: the need to survive (fitness)
and the need to know (closure).

Fitness
As much as we might try to argue it, human beings did not
evolve to find truth. We evolved to defend positions and
obtain resources—oftentimes regardless of the cost—to
survive. These concepts are at the heart of the Darwinian theory
of natural selection: survival of the fittest as the mechanism,
and our ability to overcome (or, biologically, to reproduce),
fitness.

Human biology aside, to survive in competitive and often unstable environments—whether wilderness or business—one
thing we’ve always had to do is understand other people. In
fact, one of our most expensive cognitive tasks where we exert
an impressive amount of energy is in trying to figure out other
people: predict what they’re going to do, understand
motivations, assess relationships, and so forth. Beyond people,
we are also driven to understand how things work. If we know
how they work, we can conquer, fix, or control. All of these
lead to winning, which equates to survival and continuation.
Stories act as guides to give us the information and confidence
we need to harness this knowledge. They increase our fitness.

Closure
Aside from being bent on survival, humans also tend to
require closure. The few philosophical exceptions
notwithstanding, in general we don’t enjoy ongoing
questions and curiosities with no resolution—we need
endings, even unhappy ones. We simply can’t abide
cliffhangers; they’re sticky in the worst of ways, bouncing
around in our brains until we can finally “finish” them and
put them to rest. There’s actually a term for this
phenomenon called the Zeigarnik effect. It was named for
Soviet psychologist Bluma Zeigarnik who demonstrated
that people have a better memory for unfinished tasks than
they do for finished ones. Today, the Zeigarnik effect is
known formally as a “psychological device that creates
dissonance and uneasiness in the target audience.”

In essence, the Zeigarnik effect speaks to our human need for endings. No matter the story’s goal—to focus, align, teach, or
inspire—we build narratives to foster imagination, excitement,
even speculation. Successful narratives are those that are able
to grab the audience’s attention, work through a message, and
then resolve our anxiety with a satisfactory ending. Thus,
stories are therapeutic—they give us closure.

THE POWER OF STORIES


We’ve established that data stories are powerful, and that
they are powerful because of their ability to communicate
information, generate understanding and knowledge, and
stick in our brains. However, as information assets, visual
data stories have a few other noteworthy qualities.

But first, let’s set the record straight. There is much to be said
about how visual data stories create meaning in a time of
digital data deluge, but it would be careless to relegate data
storytelling to the role of “a fun new way to talk about data.”
In fact, it has radically changed the way we talk about data
(though certainly not invented the concept). The traditional
charts and graphs we’ve always used to represent data are still
helpful because they help us to better visually organize and
understand information. They’ve just become a little static.
With today’s technology, fueled by today’s innovation, we’ve
moved beyond the mentality of gathering, analyzing, and
reporting data to collecting, exploring, and sharing information
—rather than simply rendering data visually we are focused on
using these mechanisms to engage, communicate, inspire, and
make data memorable. No longer resigned to the tasks of
beautifying reports or dashboards, data visualizations are
lifting out of paper, coming out of the screen, and moving into
our hearts, minds, and emotions. The ability to stir emotion is
the secret ingredient of visual data storytelling, and what sets it
apart from the aforementioned static visual data renderings.

As we’ll explore in later chapters, emotional appeal isn’t enough to complete a meaningful visual data story. Like any
good tale, a data story requires an anchor, or a goal—be it a
reveal, a call to action, or an underlying message—to pass to
its audience. This idea isn’t unique to data storytelling by any
means, but a construct applied to all varieties of stories. When
a story imprints on our memory, it requires emotion plus a
willingness to act on that emotion.

Instead of talking about the power of visual data stories, let’s see them in action. As we do, we’ll be looking for the
following key takeaways:

Sometimes the only way to see the story in data is visually.
A good story should meet its goals—and it should be
actionable.
A story should change, challenge, or confirm the way you
think.
Storytelling evolves—don’t be afraid to try something
new.

The Classic Visualization Example


One of the core tenets of a visual data story is that it uses
different forms of data visualization—charts, graphs,
infographics, and so on—to bring data to life. Perhaps one
of the most archetypal examples of the power of data
visualization to help people see and understand data in ways
they never would by looking at rows and columns of raw
black and white data comes from Anscombe’s Quartet (see
Figure 2.3). Constructed in 1973 by statistician Francis
Anscombe, these four datasets appear identical when
compared by their summary statistics. If you review the
table, you will notice that each dataset has the same mean of
both X and Y, the same standard deviation, the same
correlation, and the same linear regression equation.

Figure 2.3 Four seemingly identical datasets known as Anscombe’s Quartet.

Even though the individual variables are different, if the statistical outputs are the same, we would expect these, when
graphed, to look the same. The “story” for each of these
datasets should be the same—right? Wrong.

When graphed (see Figure 2.4), we can see beyond the limitations of basic statistical properties for describing data,
and can tell a bigger picture of the datasets and the
relationships therein.

Anscombe’s example might be classic in terms of putting some support behind visual horsepower, but it only scratches the surface of visual data storytelling. Although
we might not yet have everything we need to tell a story, we
can start to see that the sets are not so similar as they might
appear, and there is something worth talking about in these
datasets. We know there is a story there, and we know we need
to visualize it to see it, but we are still left wanting. This isn’t
quite a visual data story, but it’s definitely a first step.

Figure 2.4 Anscombe’s Quartet, visualized.
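If you'd like to verify the quartet's trick for yourself, its published values make it a quick exercise. The sketch below (plain Python, standard library only) computes each dataset's means and Pearson correlation from first principles; all four agree to two decimal places, even though the plots look nothing alike.

```python
# Anscombe's Quartet: four datasets with near-identical summary statistics.
# Values as published by Francis Anscombe (1973).
from statistics import mean, stdev

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

for name, (xs, ys) in quartet.items():
    print(f"{name:>3}: mean_x={mean(xs):.2f}  mean_y={mean(ys):.2f}  r={pearson(xs, ys):.2f}")
```

Every row prints a mean of 9.00 for X, roughly 7.50 for Y, and a correlation of about 0.82; only a graph reveals how different the four relationships really are.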

Story Takeaway

Sometimes the only way to see the story in data is visually.
Using Small Personal Data for Big Stories
When it comes to telling a story, no one knows how to do it
better than Hollywood—except maybe networks like
Netflix and AMC that are using massive amounts of
consumer-generated data as recipes to create new content.

Graphic designer Chelsea Carlson decided to take this approach to a personal level. In a 2016 experiment, Chelsea
focused on analyzing her personal Netflix viewing habits to
see what story her own data might tell about her television
binging habits, tastes and preferences, and—perhaps more
important in a streaming TV market saturated with more new
shows every day—possibly even help her predict a new
favorite by telling her exactly what to look for (this, by the
way, is not too unlike how Netflix is using its user viewing
data to curate new shows).

Like many analysts, Chelsea began her experiment by collecting and organizing her Netflix viewing data in
spreadsheets organized in Microsoft Excel. She tracked several
variables on her top 27 favorite shows, including things like
genre, language, main character gender, episode length, IMDB
rating, and more (see Figure 2.5). As a tool, a color-coded
spreadsheet helped Chelsea get a bird’s eye view of some of
the interesting patterns and trends in her data (like whether she
seemed to prefer multi-season shows or if her favor aligned
with award winners) as well as areas where her tastes were
less predictable (no preference for age and race of the lead
character or the show’s setting or length). However, this was
the extent of meaningful analysis that Chelsea could achieve
when limited to scouring rows and columns of information—
even colored ones (see the upcoming sidebar “Color Cues”).

Like Anscombe’s Quartet, when Chelsea plotted her data it transformed beyond its meager Excel boundaries and moved
into the realm of visual storytelling, this time showing a much
richer tale (see Figure 2.6).

Figure 2.5 Chelsea Carlson’s Netflix data spreadsheet, in table form.
Figure 2.6 Chelsea Carlson’s Netflix data visualized.

note
See more of Chelsea’s Netflix data story at
https://www.umbel.com/blog/data-visualization/netflix-
chill-little-data-experiment-understanding-my-own-
taste-tv/.
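To make the spreadsheet stage concrete, here is a minimal sketch of the kind of first-pass tally an analysis like Chelsea's involves. The shows and attributes below are hypothetical stand-ins, not her actual data; the point is how quickly simple counts surface a pattern worth visualizing and acting on.

```python
from collections import Counter

# Hypothetical viewing history: (show, genre, lead_gender).
# These rows are invented stand-ins, not Chelsea Carlson's spreadsheet.
shows = [
    ("Show A", "costume drama", "female"),
    ("Show B", "sci-fi", "male"),
    ("Show C", "costume drama", "female"),
    ("Show D", "comedy", "female"),
    ("Show E", "costume drama", "female"),
]

genre_counts = Counter(genre for _, genre, _ in shows)
lead_counts = Counter(lead for _, _, lead in shows)

# The most common values hint at what to look for in the next binge.
top_genre, _ = genre_counts.most_common(1)[0]
top_lead, _ = lead_counts.most_common(1)[0]
print(f"Look for a {top_lead}-led {top_genre}.")
```

A tally like this is the raw material; the storytelling work begins when those counts are turned into deliberately crafted visuals.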

As a visual storyteller, Chelsea worked through visual discovery and a variety of graph types that included scatter
plots, packed bubble charts, timelines, and even pie charts to
build her data story. She also integrated expressive visual
elements—particularly size and color—to provide visual cues
to assign meaning to the visualization and highlight certain
insights. As a result, Chelsea was able to come away with a
rich visual data story encapsulated within a series of very
deliberately crafted visualizations. There are several
interesting story points to pick out within this visualization—
including a strong bias for costume dramas and shows cut
short—and you can explore them for yourself in the URL
included. However, perhaps the most salient point is that
through this story Chelsea can take action on the goals she set
for this visual story. She can clearly see her tastes and
preferences, and when she goes scrolling through Netflix for
her next binge-worthy show, she’ll know to look for a female-
led costume drama with a genre-bending storyline.

Story Takeaway

A good story should meet its goals—and it should be actionable.

COLOR CUES

The Netflix experiment brings to mind an important learning point in the power of data visualization. One
of the most important lessons in learning how to build
data visualization is learning how to leverage what are
referred to as pre-attentive features—a limited set of
visual properties that are detected very rapidly (around
200 to 250 milliseconds) and accurately by our visual
system, and are not constrained by display size. A
good visualization—the building blocks of a visual
data story—reduces time to insight and leverages our
brain’s pre-attentive features to shave time as low as
possible.
Let’s take a look at the pre-attentive feature known as
perceptual pop-out. Perceptual pop-out is the use of
color as a beacon to pre-attentively detect items of
importance within visualization. The shape, size, or
color of the item here is less important than its ability
to “pop out” of a display. Further, these should be used
sparingly, and with intention. Too many of these
features at once negates their impact, or—worse—can
have a detrimental effect on your visual.
Consider a visit to the eye doctor, when your vision is
tested by the ability to spot a flash of color in a sea of
darkness, or take a look at Figure 2.7.

Figure 2.7 A table showing companies with respective annual gross profits, 2013–2016.
*All data gathered from www.amigobulls.com

This is a simple table with only three companies, but suppose I asked you to tell me, in each year, which
company had the highest gross profit? You are tasked
with analyzing each box of the table, line by line, to
assess each year independently and select the highest
number. You might even have to write it down or mark
it in some way to help you remember the winner. Go
ahead and give it a try. It should take you roughly one
minute to complete the exercise.
Now, take a look at Figure 2.8 and try again.

Figure 2.8 A table showing companies with respective annual gross profits, replaced by color, 2013–2016.

This time, we’ve replaced the numerical data with a visual cue. Rather than reading the table, perceptual
pop-out makes completing this exercise near instant.
We don’t have to actually “look” for answers; we
simply “see” them instead.
Because the sample we are looking at is so small, this
is a good time to remark on the special partnership
between color and counting. Essentially, the fewer
things there are to count, the quicker we can count
them—which makes sense. If I asked you which
company outperformed the others, Disney would be an
easy response as it has three out of four of the orange
squares.
Our ability to “count” visually is called numerosity. It
is a numerical intuition pattern that allows us to see an
amount without actually counting it, and it varies
among individuals although the typical counting
amount ranges between two and ten items.
As you build visualizations as part of your storyboard
framework, be sure to pay careful attention to color
and counting to help your audience easily and
intuitively experience your story.
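In computational terms, the color in Figure 2.8 simply precomputes a per-year "argmax" so your eyes no longer have to. A minimal sketch of that idea, using hypothetical profit figures rather than the actual values behind Figure 2.7 (only Disney's name is kept from the sidebar; the numbers and other company names are invented):

```python
# Which company "pops out" each year? The color in Figure 2.8 encodes
# exactly this per-year maximum. Profit figures below are hypothetical.
profits = {
    "Disney":   {2013: 21.0, 2014: 23.0, 2015: 25.0, 2016: 25.6},
    "CompanyB": {2013: 18.0, 2014: 24.0, 2015: 22.0, 2016: 23.0},
    "CompanyC": {2013: 15.0, 2014: 16.0, 2015: 17.0, 2016: 18.0},
}

years = sorted(next(iter(profits.values())))
winners = {
    year: max(profits, key=lambda company: profits[company][year])
    for year in years
}
print(winners)
```

Precomputing the highlight is exactly what the colored table does for the reader: the answer is encoded before the question is even asked, which is why "seeing" replaces "looking."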

The Two-or-Four Season Debate


In school, we’re taught that a full year includes four distinct
seasons—spring, summer, fall, and winter. Yet, some people
argue that only two true seasons exist—summer and winter
—and they’re using a form of visual data storytelling (and a
good heaping of rationality) to prove their point. My
favorite of these comes from artificial intelligence
researcher Nate Soares’ blog, Minding Our Way.2

The item up for debate in this story is a simple one: Is it fair to qualify “waxing summer” (also known as spring) and “waning summer” (also known as autumn) as full seasons? Sure, it’s
familiar and if you live in the northern hemisphere you can
likely distinguish the seasons according to their observable
natural phenomena—such as their colorful transitions—
flowers blooming or leaves changing color—rather than their
actual astronomical dates (and this doesn’t even begin to open
the conversation on astronomical versus meteorological dates
of change3).

Let’s begin to build a story around this and see where we end
up. First let’s agree on a foundation: The year follows a
seasonal cycle that starts cold and gets progressively warmer
until it peaks and begins to cool again. Repeat. Right? This is a
pretty basic assumption. More importantly, it’s one that we can
successfully chart—loosely and without requiring any more
specific data or numbers at all. Rather, we’ll use points from
the basic story premise we laid out earlier to graph a seasonal
continuum for the year, using length of daylight as our curve
(see Figure 2.9). From there, we can try to decide just how
many seasons are really in a year.

Figure 2.9 The seasonal cycle of a single year.

How many curves does the orange line trace? The answer,
obviously, is two—hence the two-season viewpoint (see
Figure 2.10).

Figure 2.10 Two seasons.
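The daylight curve in Figures 2.9 and 2.10 needs no hard data, and the two-season claim can even be checked numerically. The sketch below models day length as a single sinusoid (a simplifying assumption: a 364-day year, 12 hours of daylight on average, plus or minus 3 hours, peaking near day 172) and counts the curve's turning points: one peak and one trough.

```python
import math

# Model length of daylight over a simplified 364-day year as one
# sinusoidal cycle peaking at midsummer (around day 172).
def daylight_hours(day):
    return 12 + 3 * math.cos(2 * math.pi * (day - 172) / 364)

values = [daylight_hours(d) for d in range(364)]

# Count interior turning points (local maxima and minima) of the curve.
turning_points = sum(
    1
    for i in range(1, 363)
    if (values[i] - values[i - 1]) * (values[i + 1] - values[i]) < 0
)
print(turning_points)  # one peak (midsummer) plus one trough (midwinter)
```

Two turning points, two seasons: spring and autumn appear here only as the rising and falling flanks of a single summer-winter cycle.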

Now, we could break this down further with more information. We could add in astronomical dates or mull over geographic
differences in weather or meteorology. However, whether or
not you agree with Nate and me (and others!) on the number
of qualifying seasons that occur over the course of one year,
the preceding two graphs represent a powerful data story—and
they don’t even require the type of “hard data” (rows and
columns of numbers) that we would typically expect. This
shows us—quite literally—that to tell a great story doesn’t
necessarily require a ton of data. It just requires a few points, a
goal, and the creativity to visualize it for your audience in a
way that affects their opinion.

Story Takeaway

A story should change, challenge, or confirm the way you think.

Napoleon’s March
As I’ve mentioned, using visualizations to tell stories about
data is not a new technique. French civil engineer Charles
Joseph Minard has been credited for several significant
contributions in the field of information graphics, among
them his very unique visualizations of two military
campaigns—Hannibal’s march from Spain to Italy some
2,200 years ago and Napoleon’s invasion of Russia in 1812.
Both of these visualizations were published in 1869 when
Minard was a spry 88 years old.

Minard’s flow map of Napoleon’s invasion of Russia (see Figure 2.11)—unofficially titled “Napoleon’s March by
Minard”—tells the story of Napoleon’s army, particularly its
size (by headcount) as it made its way from France to Russia
and home again. As you read this visualization, moving left to
right, the beige ribbon thins, signaling the waning of
Napoleon’s army from 422,000 to 100,000 as it marched east to Moscow. The army then turned around and retreated through the brutal Russian winter, returning to France with a mere 10,000 men. We can move through the visualization, imagining the soldiers’ journey and peril as they hiked through increasingly inhospitable and unfamiliar territory, turned around, and came home, losing more than 400,000 comrades to war, cold, and disease along the way.

Figure 2.11 Napoleon’s 1812 March by Minard, 1869.

Obviously, this was not a successful war, and as an analytical piece, Minard’s map is not an especially successful one, either.
However, as a visual story around human drama, it has earned
the distinction of becoming known as one of the best
storytelling examples in history. You would be hard pressed to
take a data visualization class today and not experience
Napoleon’s march. It’s fair to note, too, that several analysts have tried to recreate it using more common statistical methods, but all fall short of the original’s storytelling appeal.

Minard’s second military visualization, Hannibal’s journey through the Alps (not pictured), is similar in concept to
Napoleon’s march, although it didn’t quite pull off the same
memorable story. Most stories have an inherent amount of
entropy—we need to tell them quickly and succinctly, and
many times this means we only get one chance. In fact,
numerous examples of this “once and done” effect exist in
more modern visual data stories, too. These one-hit wonders
are an expected consequence of good stories. Sometimes we
only need to tell them once—no sequels necessary.

Story Takeaway

Stories have an inherent amount of entropy, and some we tell only once.

Stories Outside of the Box


Thus far we’ve looked at visual data stories ranging from the most classic examples to more modern ones. We’ve even
looked at visual data storytelling without data in the classic
sense. Now, let’s finish our tour of the power of visual data
storytelling with one of the most quintessential instances on
the books: Nigel Holmes’ “Monstrous Costs” (see Figure
2.12).
Figure 2.12 Nigel Holmes’ Monstrous Costs.

This hand-drawn illustration does exactly what a visual data story is supposed to do: It transforms boring data into
something alive. At its core, this data visualization is little
more than a bar chart that shows rising costs on political
campaign expenditures, but it’s the storytelling detail that
gives it the flair that has made it such a powerful example. It
weaves a story around the data, anthropomorphizing these
costs from dollars and cents to a ravenous beast, replete with
jagged teeth and flying spittle. As with the Napoleon’s March
by Minard graph, we’ll take a much closer and more critical
look at this story in a later chapter, but for now the lesson is
simply that visual stories come in all shapes and sizes, some
more technical looking and some so unique and personalized
that they are barely recognizable as visualizations.

What masterful storytellers can do is straddle that balance and capitalize on the best features to tell their story. In
Monstrous Costs, these features allow the image to hook into
memory, clearly telling the story of rising campaign costs with
the intended emotion of the storyteller.
Story Takeaway

Don’t be afraid to try something new.

SUMMARY
In this chapter we discussed what makes stories so
impactful on the human brain. We then looked at a few real-
life examples of visual data storytelling in action. We could
analyze many more examples for this purpose, and more are
available online at the website companion to this book,
www.visualdatastorytelling.com.

Now, let’s get ready to put this information to work in Tableau. In the next chapter we’ll begin exploring the Tableau
ecosystem, and take a journey through its freshly redesigned
user interface. This will form the basis for later hands-on
practice in exploring and analyzing data visually as we work
toward building complete visual data stories.

_____________
1. www.marketplace.org/2016/09/26/sustainability/corner-office-marketplace/dont-call-nationalgeographic-stodgy
2. http://mindingourway.com/there-are-only-two-seasons/
3. https://www.ncdc.noaa.gov/news/meteorological-versus-astronomical-seasons
CHAPTER 3

GETTING STARTED WITH TABLEAU

The goal of this chapter is to help you get your footing with the Tableau product
ecosystem and use the basic Tableau
interface so that you are familiar enough
with the tool to begin working hands-on
with data. This chapter covers how to get
started with Tableau, reviews the tool’s
basic functionality, discusses how to
connect to data, and provides an overview
of data types in Tableau. From here, you
will be able to move on to the visual
analysis process, curating visuals, and
building stories. The version of Tableau
available at the time of this writing is
Tableau 10, and this chapter illustrates
using the Mac version of the software
(little to no difference exists between Mac
and Windows versions, although some
aesthetic differences might be apparent).
If you are already an intermediate Tableau user and familiar
with the v10 interface and Tableau terminology, you might
want to skip this chapter and move on to Chapter 4,
“Importance of Context in Storytelling.”

USING TABLEAU
Standing out against many other data visualization tools on
the market, Tableau is an industry-leading, best-of-breed
tool that delivers an approachable, intuitive environment for
self-service users of all levels to help them prepare, analyze,
and visualize their data. The software also provides delivery
channels for the fruits of its user’s visual analysis, including
dashboards and native storytelling functionality, called
“story points” in Tableau.

Tableau’s stated mission is to help everyone “see and understand” their data, and to facilitate this the company offers a suite of software products designed to suit the needs of a diverse group of clients, from enterprise-level organizations to academic users and visualization hobbyists. These products include a recently released free mobile app, called Vizable, for users who want to visualize data in a mobile-first format. All the Tableau
products excel at displaying data visually, using a drag-and-
drop canvas on top of embedded analytics to help users
explore their data. Tableau Desktop can connect to a wide
variety of data, stored in a variety of places—from local
spreadsheets, to multidimensional databases, and even some
cloud database sources, like Google Analytics, Amazon
Redshift, or Salesforce—and the number of connections is
always increasing.1 Although Tableau can mimic Excel by
providing the capability to analyze rows and columns of
numbers, its focus is on interactive, visual data exploration
through its analytic capabilities as well as dashboarding and
storytelling features, no programming required. For more
advanced users, Tableau supports a complete formula language
and robust data connections: Tableau’s live query engine
allows users to connect to more than forty different data
sources; its in-memory data engine leverages the complete
memory hierarchy from disk to L1 cache and shifts the curve
between big data and fast analysis.

WHY TABLEAU?
In his book, Communicating Data with Tableau, author Ben
Jones included a personal note to his readers on why he
chooses Tableau. I thought I might do something similar.

My choice of Tableau is part personal preference and part professional opinion. When I was an analyst in the data
science community, I had the opportunity to work hands-on
with many of the leading data visualization technologies on
the market and get to know the vendors in the space. Being
fluent on the tools available, their capabilities and limitations, and the viability of their provider was a required part of my job, as clients and the industry at large look to trusted voices to help them navigate a sea of options. Many impressive data
visualization and storytelling tools are available, but Tableau
was—at least in my opinion—always at least one step ahead
of the pack with its intuitive user interface, dynamic and ever-
expanding off-the-shelf capabilities, and dedication to building
and supporting an engaged community of visual analysts from
the workplace to the classroom and everyone in between.

Today, much like Google has outgrown its noun-based role of search engine and data collection superpower and become a
common use verb that encompasses all Internet searching,
Tableau has expanded beyond the boundaries of a software
package and become a required job skill—and one that is top
of the list for employers. We searched Labor Insight, an analytics tool powered by one of the largest and most sophisticated databases of labor market data (Burning Glass), to analyze data visualization–related IT job descriptions posted nationwide between March 2017 and February 2018, and can you guess what popped up as the second most
in-demand skill for applicants—right behind data visualization
and SQL itself? If you guessed Tableau, you are right (see
Figure 3.1). The message that this look into the job market is
sending is clear: If you’re looking for a job as an analyst—
which by the way is the number one job in this sector of IT—
then employers expect you to have a working knowledge of
Tableau.
Figure 3.1 Out of a total 30,786 jobs that listed “data
visualization” as a skill, roughly 13k (42%) listed Tableau as a
critical specialized skill for applicants.

TABLEAU IN DEMAND

From my analysis of Labor Insight, approximately 30,786 jobs listed “Data Visualization” as a desirable
skill—an increase of about 17.6% from the preceding
annual period. Among the top job titles were Data
Analyst (7%), Business Analyst (3%), and Data
Scientist (3%). Interestingly, only 3.4% (1,050) of
these jobs actually listed “Data Visualization” as the
job title itself.
In addition to “Data Visualization,” other high-ranking
skills included SQL (50%) and Tableau (44%). Excel,
Python, SAS, R, and Java also received mention.
When pitted against the intersection of other required
baseline “soft” skills, those listed were communication
(47%), research (37%), writing (32%), teamwork
(31%), problem solving (30%), and mathematics
(28%). As a testament to visual data storytelling, data
visualization and communication were the top skills in
either category.
If you’re wondering where these jobs are located,
adjusted by population, “above average” job postings
were found in the combined metropolitan areas of
New York/New Jersey (13.7%), followed by
Washington DC (8.45%), San Francisco (6.1%),
Chicago (5.11%), Boston (5%), Santa Clara (4%), and
Seattle (3.83%, and home to Tableau headquarters).

THE TABLEAU PRODUCT PORTFOLIO
Although Tableau Desktop is Tableau’s cornerstone data
visualization software product, and is the focus of the work
in this book, Tableau also offers several other software
products that incorporate essentially the same user interface
and VizQL engine that makes it such a powerful tool. The
primary differences between these core products are the types of data sources users can connect to, how visualizations can be shared with others, and, in some cases, the primary form factor intended for use.
note
VizQL is Tableau’s proprietary analysis technology. You
can read more on VizQL at
https://www.tableau.com/products/technology.

note
The Tableau pricing model is based on users and
designed to scale as your organization’s needs grow.
Free software trials are also available as well as free
licenses for students and teachers—more at
https://www.tableau.com/pricing.

Tableau Server
As you might expect, Tableau Server is best suited for
enterprise-wide deployments. It is intended to provide entire
organizations with the ability to connect to any data source
—on-premise or in the cloud—with centrally managed
governance and granular security protocols to maintain
balance between user flexibility and IT control. This
product is used in conjunction with Tableau Desktop.

Tableau Desktop
Tableau’s flagship product, Tableau Desktop is an
application that can be used on either Windows or Mac
machines. It allows connection to data on-premise or in the
cloud, and facilitates the entire visual discovery and
analytics process from connecting to data to sharing
visualizations, dashboards, or interactive stories using
Tableau Server or Tableau Online. The software also
includes a device designer to help users design and publish
dashboards optimized for various form factors.

Tableau Online
The online version of Tableau eliminates the need for a
server and is a fully cloud-hosted platform that primarily
works with cloud databases, but can also work with live on-
premises queries or scheduled extract refreshes. It provides
the ability for on-the-go users to build, explore, curate, and
share visualizations and dashboards that are accessible from
a browser or a Tableau Mobile app.

Tableau Public
One part data visualization hosting service, one part social
networking, Tableau Public is a free service that allows
users to publish interactive data visualizations online. These
visualizations can be embedded into webpages and blogs,
shared via social media or email, or made available for
download to other users.

note
You can follow me and see many of the visualizations
included in this book on Tableau Public at
https://public.tableau.com/profile/lindyryan.
GETTING STARTED
The first thing you need to do to get started with Tableau is
to get your hands on a license. If you have not done so
already, refer to the Introduction for guidance on how to get
a free trial of Tableau Desktop. You can also visit the
Tableau website to explore trial and purchase options.

CONNECTING TO DATA
When you first open Tableau Desktop, the Connect to Data
screen appears (see Figure 3.2).

Figure 3.2 The Tableau Connect to Data screen.

There are several important elements to know on this screen:

Connect: A long list of native connections to various data
sources.
Open: As you create your own workbooks, recently
opened workbooks appear here for quick access.
Sample Workbooks: These are default samples provided
by Tableau.
Discover: This pane connects you to various Tableau
training, visualization, and other resources.

note
This book focuses on the art of visual data storytelling,
and as such is not a user manual for Tableau. I
recommend you review the Training videos provided by
Tableau in the Discover pane.

Connecting to Tables

tip
I’m using the Global Superstore Excel training file
provided by Tableau. This is a simple dataset of sales
for a global retailer that sells furniture, office supplies,
and technology goods. You can download this file from
the Tableau Community to follow along.

For our purposes, connect to a very common file format—an
Excel file. You can connect to any Excel file by clicking the
Excel option under the Connect menu and navigating to the
file’s location on your machine. Once connected to your data
file, Tableau opens the data connection window (see Figure
3.3).
Figure 3.3 The Data Connections screen.

The screen provides several options to help you prepare this
file for analysis in Tableau.

Connections: You can add additional data sources by
clicking Add. You can also edit the name of the
connection or remove it as desired by clicking the drop-
down arrow to the right of the filename. (You can also
rename the connection by clicking on its title on the
canvas to the right.)
Sheets: This pane displays all the sheets in the Excel file,
corresponding to the names of individual worksheet tabs.
Sheets in Excel are treated the same as tables in a
database, and you can choose to connect to a single table
or join multiple tables. To connect to a sheet, simply click
and drag it into the data connection canvas to the right
(you will notice a “Drag sheets here” prompt) or
double-click the desired sheet. After you connect to a
sheet, three things happen (see Figure 3.4):
The sheet name appears in the data connection canvas.
The data displays in the preview pane below the data
connection canvas.
A Go To Worksheet icon displays.

Figure 3.4 You have connected to the Orders sheet of the
Excel file, populating the data preview pane. Tableau also
provides the prompt to Go To Worksheet and begin visually
exploring the data if you are ready.

Before moving on, there are a few more things to take note of
on this screen.

First, if you aren’t satisfied with any individual column name,
you can click on the drop-down arrow to the right of the name
and select Rename. Additionally, clicking on the data type
icon allows you to change the default data type for that column
(see Figure 3.5). You can also:

Adjust the default data source sort order.
Create calculated fields to populate in your worksheet.
Hide or show hidden fields.
Split fields by delimiter using an automatic or custom
split.
Pivot data fields as necessary.

Figure 3.5 Clicking the data type icon allows you to change
the default data type for that column. This determines how the
fields are displayed on your worksheet in the next step.

Live Versus Extract
You might have noticed the option for a Live or Extract
connection on the sheet canvas. The default is Live.
However, before you begin analyzing data, this might be
something you want to consider (see Table 3.1).
Table 3.1 Be sure to understand the benefits and
drawbacks of Live versus Extract connection options.

Live
  Pros: Leverage a high-performance database’s capabilities;
  see real-time changes in the data.
  Cons: Can result in a slower experience; some cloud-based
  data sources must be extracted.

Extract
  Pros: Can deter latency in a slow database; could reduce
  query load on critical systems.
  Cons: Most Online Analytical Processing (OLAP) data
  sources cannot be extracted.

Connecting to Multiple Tables with Joins
Previously I mentioned that you can connect to multiple
data sources in Tableau. You can also connect to multiple
tables in the same data source.

To do this, drag and drop or double-click the second sheet you
want to connect to (in Figure 3.6 I have selected the sheet
named People). The join icon with the blue center indicates
that Tableau has automatically joined these tables as an inner
join, making it the default join clause. Clicking on the join
icon displays the details as well as gives the option to edit the
join clause, or even create a new one.

It’s important to note that while Tableau will automatically
join your tables, it does so by guessing what your matching ID
is. You can change this by clicking on the fields, which shows
a drop-down menu of all data fields available to join.

Figure 3.6 Tableau has automatically joined these tables by
recognizing that Region is a common field between the two.

Overview of Join Types
As you prepare your data for analysis in Tableau, you might
need to “join” data by connecting a collection of tables that
are related by a specific field (or column). In a nutshell,
joining is a method for combining the data located in those
common fields into one virtual table for analysis.

Tableau provides four types of joins that you can use to
combine your data: inner, left, right, and outer. Inner and left
joins are the two most common types of joins.

Inner join: Joins records where there is a matching field
in both datasets. Using an inner join to combine tables
produces a new virtual table that contains values that
have matches in both tables.
Left join: Joins records from the left and right sides of
your equation when there is a match. Using a left join to
combine tables produces a new virtual table that contains
all values from the left table and corresponding matches
from the right table. When there is no corresponding
match from left to right, you will see a null value.
Right join: Joins all the records from the data on the right
side of your equation and any matching records from the
left side. Opposite of a left join, using a right join to
combine tables produces a table that contains all values
from the right and corresponding matches from the left.
Likewise, when a value in the right table doesn’t have a
corresponding match in the left table, you see a null
value.
Outer join: Joins all the records from each dataset
together, even when there is no match (this join type is rarely used).
Using a full outer join to combine tables produces a table
that contains all values from both tables. If a value from
either table doesn’t have a match with the other table, you
see a null value.
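Outside Tableau, the same four join behaviors can be sketched with pandas; the tables and column names below are hypothetical stand-ins for sheets like Orders and People, each sharing a Region field:

```python
import pandas as pd

# Hypothetical tables sharing a "Region" field
orders = pd.DataFrame({"Region": ["East", "West", "South"], "Sales": [100, 200, 300]})
people = pd.DataFrame({"Region": ["East", "West", "North"], "Manager": ["Ana", "Bo", "Cy"]})

inner = orders.merge(people, on="Region", how="inner")  # East and West only: matches in both
left = orders.merge(people, on="Region", how="left")    # all orders rows; South gets a null Manager
right = orders.merge(people, on="Region", how="right")  # all people rows; North gets a null Sales
outer = orders.merge(people, on="Region", how="outer")  # every region; nulls wherever there is no match

print(len(inner), len(left), len(right), len(outer))  # 2 3 3 4
```

Note how the unmatched rows (South, North) surface as nulls in the left, right, and outer results, exactly as described above.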

WHAT ARE “NULLS”?

Occasionally as you work with data, you will discover
a field name called null. What is that?
Null means that some empty cells are in your data and
Tableau is, essentially, letting you know. Checking
fields and formatting for extraneous information is
always important when doing data analysis because
you want to ensure these blank fields do not skew your
results. A null field might indicate an error in the data,
or some other inaccuracy.
In many cases, you don’t want empty fields to show
up in your data, and you’ll want to exclude null fields.
To do so, select the Null field and Ctrl-click (or right-
click), and then select Exclude (see Figure 3.7). This
excludes the null values from your analysis.
Figure 3.7 To exclude nulls from analysis, right-click
and select Exclude.
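The same idea can be sketched in pandas (the data below is hypothetical): filter out rows whose field is null before aggregating, so blanks don’t skew the results.

```python
import pandas as pd

# Hypothetical sales data with one empty (null) Region value
df = pd.DataFrame({"Region": ["East", None, "West"], "Sales": [100, 50, 200]})

# Excluding nulls, as with Tableau's Exclude option
clean = df[df["Region"].notna()]
print(clean["Sales"].sum())  # 300 -- the 50 attached to the null region is excluded
```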

Generally speaking, Tableau will do its best to automatically
determine the best join. However, if you’re unsure which type
of joins your data supports, you can check the join dialog after
you’ve connected your data. Additionally, you can adjust the
join type by selecting a different join type in the Join dialog.

join errors!
Sometimes, an issue occurs in joins. Tableau notes these
with a red exclamation point to the side of the join
wherein the error occurs (see Figure 3.8).

Figure 3.8 Tableau alerts users to join errors with a red
exclamation point.
BASIC DATA PREP WITH DATA
INTERPRETER
The preceding example shows an Excel file that is already
nicely formatted and ready to go for analysis in Tableau.
However, in reality, data files are not always so analysis
ready and might require extensive prep work before they are
ready to be brought into Tableau for analysis and
visualization work.

Tableau Desktop delivers some features to help automatically
reshape files to get them ready for analysis in Tableau.
Primary among them is Data Interpreter, Tableau’s built-in tool
for preparing data for analysis. When you connect to an Excel
sheet in Tableau, the software can recognize issues such as
missing column names, null values, and so on. To remedy
these and clean the file for use in analysis, Tableau will
suggest Data Interpreter (refer to Figure 3.4 to locate the Data
Interpreter option on the Data Source screen). While this is a
helpful feature, the tool is limited and somewhat superficial in
its ability to prep data.

To use Data Interpreter, select the check box to turn on the
tool. This executes a query to the Excel file and confirms its
automated prep tasks with a revised data preview pane
addressing the issues it has identified. To get more specifics on
what Data Interpreter has adjusted in the file, including a
before-and-after view and an explanation table, click the link
that is provided following the Interpreter’s action to “Review
the results.” This opens an Excel file describing the changes.
You can also clear the check box to undo these changes and
revert to your original sheet.
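As a rough analogue of the kind of cleanup Data Interpreter automates, here is a pandas sketch (the messy layout below is hypothetical, not Tableau’s actual logic): promote the real header row and drop fully empty rows.

```python
import pandas as pd

# Hypothetical messy sheet: title rows above the real header, plus an empty row
raw = pd.DataFrame([
    ["Global Superstore Report", None],
    [None, None],
    ["Region", "Sales"],   # the real header row
    ["East", 100],
    [None, None],          # an empty row
    ["West", 200],
])

# Promote the real header row, then drop rows that are entirely empty
df = raw.iloc[3:].copy()
df.columns = raw.iloc[2]
df = df.dropna(how="all").reset_index(drop=True)
print(list(df.columns), len(df))  # ['Region', 'Sales'] 2
```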

After verifying the data you’ll be connecting to, you can go to
your worksheet and begin exploring the Tableau interface and
your data—you are ready to begin your analysis!

NAVIGATING THE TABLEAU INTERFACE
Now that you have some data in Tableau, you can click the
prompt to Go To Worksheet and start getting to know the
Tableau user interface in a more meaningful way (see
Figure 3.9). The Tableau UI is a drag-and-drop interface
that fosters rich interactivity between sheets, dashboards,
and stories, allowing for in-depth visual exploration and
powerful visual communication. Tableau is similar to Excel
in that its files are called workbooks, and each workbook
contains individual sheets. Every Tableau workbook
contains three elements:
Sheets: For creating individual visualizations. Each
workbook can contain multiple sheets—one for each data
visualization you create.
Dashboards: For combining multiple sheets as well as
other objects like images, text, and web pages, and
adding interactions between them like filtering and
highlighting. Dashboards are great for looking at the
interactions between multiple visualizations.
Stories: These frameworks can be based on visualizations
or dashboards, or based on different views and
explorations of a single visualization, seen at different
stages, with different marks filtered and annotations
added—however is best suited to narrate the story in your
data.

Figure 3.9 The Tableau user interface, a blank canvas.

Later chapters cover Dashboards and Stories in more depth.


For now, let’s focus on sheets and take a high-level look at the
various areas of the screen. As you begin to work directly with
data to build visualizations and stories, you will take a more
detailed approach to each of these areas.

There are four basic elements to the Tableau interface:

Menus and toolbar
Data window
Shelves and cards
Legends

Menus and Toolbar
At the top of the screen are the menus that accompany
Tableau Desktop. These contain many powerful controls.

Below, at the top of your Tableau sheet, is the toolbar. It is
similar in concept to the ribbon in Microsoft Office products.
Like the menu, this toolbar contains many powerful buttons
that give you control over your Tableau experience and enable
you to navigate from the data source all the way to story
presentation mode. A few items of special consideration:

Logo: The Tableau logo button brings you back to the
original Connect to Data screen (clicking the icon from
this screen returns you to your sheet).
Undo: There is no limit to how much you can undo in
Tableau, which is an important feature for exploration
and discovery. The icon is grayed until there is an action
to undo.
Save: There is no automatic save in Tableau. Be sure to
save your work incrementally.

Another menu appears along the bottom of the sheet. This
menu, similar in concept to the sheet tabs in an Excel
workbook, enables you
to return to the Data Source screen; create new sheets,
Dashboards, or Stories; and do things like rename, rearrange,
duplicate, delete various sheets, and so on.

Data Window
The pane on the left of the sheet is called the Data window
and has two tabs: a Data tab and an Analytics tab.

Data
At the top of the Data tab is a list of all open data
connections and the fields from that data source categorized
as either dimensions or measures (discussed shortly).

Analytics
The Analytics tab enables you to bring out pieces of your
analysis—summaries, models, and more—as drag-and-drop
elements. We review these functions later.

Shelves and Cards
Shelves and cards are some of the most dynamic and useful
features of the Tableau UI.
Columns and Rows shelves: Control grouping headers
(dimensions) and axes (measures)
Pages shelf: Lets you break a view into a series of pages
so you can better analyze how a specific field affects the
rest of the data
Filters shelf: Filters visualizations by dimensions or
measures
Marks card: Controls the visual characteristics of a
visualization, including encoding of color, size, labels,
tooltip text, and shape
“Show Me” card (shown open): A collapsible card that
shows applicable visualization types for a selected
measure and dimension

Legends
Legends are created and appear automatically when you
place a field on the Color, Size, or Shape card. To change
the order (or appearance) of fields in a visualization, drag
them around in the legend. Hide legends by clicking on the
menu and selecting Hide Card. Likewise, bring them back
by selecting the Legend option on the appropriate space in
the Marks card or by using the Analysis menu.

UNDERSTANDING DIMENSIONS
AND MEASURES
When you bring a data source into Tableau, Tableau
automatically classifies each field as a dimension or a
measure. The differences between these two are important,
though they can be tricky to those new at analysis. Perhaps
the best way to differentiate these two classifications is
this: dimensions are categories, whereas measures are fields
you can do math with.

Dimensions
Dimensions are things that you can group data by or drill
down by. They are usually—but not always—categories
(such as City, Product Name, or Color), and they can be
grouped into strings, dates, or geographic fields.
Measures
Measures are generally numerical data on which you want
to perform calculations—summing, averaging, and so on.

Remember, a field’s classification as a measure or dimension
can be adjusted in the Data Source screen by clicking on the data type
icon. You can also change this directly in the sheet by either
dragging and dropping a dimension to measure, or vice versa,
or by clicking the drop-down menu by any field and selecting
the Convert to Measure (or Dimension) option.
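The dimension/measure distinction maps neatly onto a group-then-aggregate operation. A minimal pandas sketch, with hypothetical Category (dimension) and Sales (measure) columns:

```python
import pandas as pd

# Hypothetical sales rows: Category is a dimension, Sales is a measure
sales = pd.DataFrame({
    "Category": ["Furniture", "Technology", "Furniture", "Office Supplies"],
    "Sales": [250.0, 900.0, 150.0, 75.0],
})

# Dimensions group the data; measures get math done to them (sum, average, ...)
by_category = sales.groupby("Category")["Sales"].sum()
print(by_category["Furniture"])  # 400.0
```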

Continuous and Discrete
Generally, dimensions are discrete and measures are
continuous. We could break this down a little more into four
types or levels of measurement: nominal, ordinal, interval,
and ratio.
Nominal measures are discrete and categorical
(for/against, true/false, yes/no)
Ordinal measures have order but there are not distinct,
equal values (for example, rankings)
Interval measures have order and distinct, equal values (at
least we assume they are equal; for example, Likert
scales)
Ratio measures have order, distinct/equal values, and a
true zero point (length, weight, and so on)

In Tableau, continuous fields produce axes, whereas discrete
fields create headers. Continuous means “forming an unbroken
whole, without interruption.” Discrete means “individually
separate and distinct.” Be sure you understand the difference
between these mathematical terms. Text and categories
(dimensions) are inherently discrete. Numbers can be discrete
if they can only take one of a limited set of distinct, separate
values (like, for example, a rating). Numbers, including dates,
can be continuous if they can take on any value in a range.

COLORFUL PILLS

When a field is brought from the data window pane
and dropped into the Rows and Columns shelves,
Tableau creates a “pill.” These pills are color coded:
blue pills represent discrete variables whereas green
pills are continuous. The data type icons also reflect
these color codes (see Figure 3.10).

Figure 3.10 Color-coded pills reflect continuous (green)
measures and discrete (blue) dimensions.

SUMMARY
This chapter introduced the Tableau product ecosystem and
then took a high-level view of the Tableau user interface,
including connecting and preparing data and the core
functionality of the Sheets canvas. In future chapters, you
will put this knowledge into practice as you begin working
hands-on with this functionality.

The next chapter addresses the importance of context in
building a visual data story.

_____________
1. http://onlinehelp.tableau.com/current/pro/desktop/en-us/basicconnectoverview.html
