A Practitioner's Guide To Best Practices in Data Visualization Inte.2017.0916
INTERFACES
Vol. 47, No. 6, November–December 2017, pp. 473–488
http://pubsonline.informs.org/journal/inte/ ISSN 0092-2102 (print), ISSN 1526-551X (online)
Received: June 27, 2016
Revised: November 8, 2016; April 7, 2017
Accepted: April 14, 2017
https://doi.org/10.1287/inte.2017.0916
Copyright: © 2017 INFORMS
Abstract. Data visualization is the process of presenting data visually through tables, charts, graphs, maps, and other visual aids. Data visualization is often thought of as a descriptive tool, but it is also important for use in predictive and prescriptive analytics. It serves as a tool for descriptive data exploration, but also for communicating insights from data and from analytical models. In this tutorial, we provide a concise discussion of best practices for data visualization for effective communication.
Much has been written about the importance of communication in the application of operations research (OR) and in ensuring that the results from OR studies are accepted and implemented; Woolsey (1979, 1981), Levasseur (1991, 2013), and Keller and Kros (2000) provide examples. Data visualization is an important tool for data exploration; however, it is also critical for communicating the results of analyses and the recommendations from analytical studies.
With the software now available, data visualization has never been easier. A variety of excellent data-visualization software packages are available; a few of the leading specialized packages include Tableau, SAS Visual Analytics, and Spotfire. The open source language R also provides excellent data-visualization capabilities, and Microsoft has recently updated the data-visualization capabilities in its ubiquitous Excel package to allow the creation of a wider range of charts. Excel also has a recommended-chart option, which provides the user with a list of suggested visualizations based on the selected data. Comparisons of data-visualization tools can be found in recent ratings by PC Magazine (Baker 2017) and Predictive Analytics Today (2017), and in Gartner’s “Magic Quadrant for Business Intelligence and Analytics Platforms” (Sallam et al. 2017). However, software availability does not relieve the user of the responsibility of making intelligent choices, which affect how effectively the visualization communicates with the audience.
In this tutorial, we discuss best practices for effective data visualization, give examples of good and bad visualizations, and describe a variety of types of charts and graphs and the comparisons for which they are most effective.
The Goal and Three General Principles
The goal of a table or chart should be to display data in a manner that conveys the desired message to the intended audience. The input data to the visualization can be raw data, descriptive statistics from the raw data, or the output of analytical models. Hence, the principles we discuss in this paper are applicable across the full spectrum of analytics: descriptive, predictive, and prescriptive. For example, in descriptive analytics, scatter plots are useful for exploring and conveying possible relationships between pairs of variables. Overlaid line charts are often an excellent choice for the output of predictive models when the goal is to show the model results and prediction intervals to convey the uncertainty associated with those results. A network geographic map, with arcs showing distribution-center-to-customer-zone assignments, is an excellent way to convey the efficient structure of the solution from a supply-network optimization model.
Camm, Fry, and Shaffer: A Practitioner’s Guide to Best Practices in Data Visualization
474 Interfaces, 2017, vol. 47, no. 6, pp. 473–488, © 2017 INFORMS
Regardless of the type of input data (data or model output), the creation of the visualization must be guided by the intended audience, the interests of that audience, and the message that is to be conveyed. This requires an understanding of the makeup of the audience and an anticipation of the questions audience members will pose. In short, empathy for your audience should be the main driver in designing your visualization.
Before embarking on a discussion of the three guiding principles for data visualization, it is instructive to know a bit about how we process information. Every second, our brains are busy processing the information that our five senses are feeding us. The brain filters stimuli from our environment to process what is important in two main ways, unconscious and conscious, which Kahneman (2011) refers to as System 1 and System 2, respectively. System 1 represents the uncontrolled functions, such as reading people’s facial expressions or the immediate reaction we have to an event. System 2 represents the controlled functions that require conscious effort, such as solving a math problem.
In data visualization, we leverage pre-attentive attributes, which are processed by System 1. We encode data using these attributes; for example, if we highlight something with color, the reader sees and interprets it in 250 milliseconds without even thinking about it. Hence, by using methods that are attuned to System 1, data-visualization tools, when used properly, can help a reader gain a quick and correct understanding of the data being presented. The three guiding principles, which we discuss next, are all about making sure we capitalize on how we process information using System 1.
The first general principle in data visualization is that design and layout matter. The design and layout of your visualization should make your message easy to understand. The design and layout, including the type of chart or table used, should draw attention to the parts of the visualization that are important in conveying your message to your intended audience. For example, axes should be labeled clearly, and legends should be close enough to the display to facilitate easy comparisons.
The second general principle is to avoid clutter. In short, simpler is better. To make this point, Tufte (2001) introduced what he calls the data-ink ratio: the ratio of the ink required to convey the intended meaning (data-ink) to the total amount of ink used in the table or chart. According to Tufte, the data-ink ratio should be as close to one as possible; that is, we should avoid ink that does not add information.
The third general principle, consistent with the notion that simpler is better, is to use color purposely and effectively. Although color might make a data visualization seem “prettier,” it can distract the audience and hinder your message. For example, color can be used effectively to draw attention to part of a visualization or to distinguish categories in the data; however, using too much color can distract the audience. Color should be used intentionally, that is, only if it helps convey your message.
In the following sections, we provide examples and more detail on these three general principles of good data visualization. For a more detailed discussion of how the brain processes images for data visualization, we refer the reader to Few (2009), particularly Chapter 3.
An Example: Barring Pie Charts
Let us consider a simple descriptive analytics example to illustrate the use of the three general principles, as well as the iterative nature of the process for creating a good visualization. Like prototyping in the modeling process, visualization often benefits from incremental improvements over multiple iterations. Consider the following example.
We have data on the percentage of students declaring each of 12 majors (plus undesignated) in a college of business. Suppose these data are to be used as part of the recruiting process for the college; that is, the audience includes potential students and, in many cases, one or more parents of these potential students. Undoubtedly, the students and parents will have many questions about the majors; however, we expect their questions to include the following:
• What is the most popular major?
• What is the least popular major?
• What are the top three majors?
• How does the major I am considering compare to the other majors in terms of its popularity?
• What percentage of students are undesignated?
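Each of these questions reduces to a simple operation on the data, and most of them to a single sort. As a minimal sketch, assuming the data labels shown in Figure 5 are stored in a Python dictionary (Python, and the names `MAJORS`, `rank_majors`, and `rank_of`, are our own illustrative choices, not tools or code from the study), the answers can be computed directly:

```python
# Illustrative sketch only: percentages are the data labels from Figure 5;
# all names here are hypothetical.
MAJORS = {
    "Marketing": 26,
    "Accounting": 19,
    "Finance": 17,
    "Operations management": 8,
    "Undesignated": 6,
    "Information systems": 5,
    "International business": 5,
    "Entrepreneurship": 4,
    "Business economics": 4,
    "Industrial management": 2,
    "Economics": 2,
    "Real estate": 1,
    "Hospitality management": 1,
}


def rank_majors(shares):
    """Return (major, percent) pairs sorted from most to least popular.

    sorted() stays stable with reverse=True, so tied majors keep
    their original order.
    """
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


def rank_of(shares, major):
    """1-based rank of a major; tied majors share the same rank."""
    return 1 + sum(1 for pct in shares.values() if pct > shares[major])


ranked = rank_majors(MAJORS)
top_share, bottom_share = ranked[0][1], ranked[-1][1]

# Report every major at the extreme value, so ties are not hidden.
most_popular = [name for name, pct in ranked if pct == top_share]
least_popular = [name for name, pct in ranked if pct == bottom_share]
top_three = [name for name, _ in ranked[:3]]

# A part-to-whole breakdown should account for every student.
assert sum(MAJORS.values()) == 100

print("Most popular:", most_popular)    # ['Marketing']
print("Least popular:", least_popular)  # ['Real estate', 'Hospitality management']
print("Top three:", top_three)          # ['Marketing', 'Accounting', 'Finance']
print("Finance rank:", rank_of(MAJORS, "Finance"))          # 3
print(f"Undesignated share: {MAJORS['Undesignated']}%")     # 6%
```

Because two majors tie at the lowest value, reporting a single "least popular" major would overstate what the data show; the tie-aware rank likewise gives Information systems and International business (both 5%) the same position.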
The data that we have available are the percentage of the total student population in each major, which illustrates a part-to-whole relationship: the percentage breakdown of the total for the variable under consideration. Pie charts are often touted as being useful for showing this part-to-whole relationship (Figure 1).
What are the potential problems with this choice of chart? One drawback is that it is difficult to see the legend and the colors associated with each major. The layout and design do not facilitate easy comparison because the viewer must move back and forth between the legend and the pie. With 13 categories, many of the colors are very similar, which also makes comparison difficult. It might be better to label each slice of the pie with its major; however, doing so would make the chart even more cluttered. Having the percentage next to each slice is helpful.
An interesting question is, “What does the third dimension add to this chart to help us better understand the data?” The answer is nothing. In this example, adding the third dimension makes comparing categories more difficult.
Figure 1. This Three-Dimensional Pie Chart Shows the Percentage of Undergraduates by Major for a College of Business
[Figure: three-dimensional pie chart titled “Undergraduate majors,” with a percentage label next to each slice and a 13-color legend: Accounting, Business economics, Economics, Entrepreneurship, Finance, Hospitality management, Industrial management, International business, Information systems, Marketing, Operations management, Real estate, Undesignated.]
Note. The third dimension reduces the data-ink ratio and makes the chart more difficult to understand.
Consider Figure 2, in which we have eliminated the unnecessary third dimension.
Figure 2. Compared to the Three-Dimensional Pie Chart in Figure 1, the Data-Ink Ratio in This Pie Chart Is Higher and the Chart Is Easier to Understand
[Figure: the same “Undergraduate majors” pie chart drawn in two dimensions, with the same percentage labels and 13-category legend.]
Can we improve upon this chart? Studies have shown that humans are very good at measuring the length or height of an object on a common baseline. For example, when two people stand next to each other, we can easily tell which person is taller and by how much. Humans are also good at judging the position of objects, for example, how far one person is standing from another. However, they are much less adept at accurately measuring angles, arcs, and areas and comparing them against each other. We can leverage our strength at measuring length by carefully choosing the type of chart to use.
Bar charts are a good choice for comparing categories. Given the questions we believe our audience will pose, that is, comparisons of the percentages of students in each major, a bar chart would be a better design for our purposes.
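The data-ink ratio that separates Figure 1 from Figure 2 is Tufte’s (2001) ratio of data-ink to the total ink in the table or chart. Writing it with our own symbols (an assumption for illustration, not notation from the article), D for data-ink and N for all other, non-data ink, makes the effect of removing decoration explicit:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used in the table or chart}}
  = \frac{D}{D + N}
  = 1 - \frac{N}{D + N}
```

Holding D fixed, any reduction in N, such as dropping the third dimension from Figure 1, raises the ratio toward one, and the ratio reaches one only when N = 0.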
Figure 3 shows a vertical bar chart of the data on business college majors. The differences among majors are much easier to see in this figure than in either Figure 1 or Figure 2. For example, compare Accounting and Finance in Figures 3 and 2. Without the data labels, determining from Figure 2 which major has a higher percentage of undergraduates is much harder. In Figure 3, even without data labels, we can easily see that the bar for Finance is shorter than the bar for Accounting; thus, the percentage of Finance majors is smaller than the percentage of Accounting majors.
Figure 3. In This Bar Chart, Different Colors Are No Longer Necessary to Distinguish Between Majors; Thus, We Can Eliminate the Legend
[Figure: vertical bar chart titled “Undergraduate majors,” with majors in alphabetical order on the horizontal axis, % (0 to 30) on the vertical axis, a data label on each bar, and no legend.]
Can we improve upon Figure 3? If we think about how this chart will be used, sorting the data from high to low would make the chart easier to read and more useful for the reader. Figure 4 shows the same vertical bar chart sorted. Sorting allows the reader to more easily find the highest and lowest percentage majors and to compare majors that are close in percentage.
Figure 4. Sorting a Bar Chart Allows a Reader to Easily and Quickly Compare Differences Among Variables
[Figure: the same vertical bar chart, with bars sorted from highest (Marketing, 26%) to lowest (1%).]
Finally, our adeptness at comparing lengths is not impaired by the use of a horizontal display rather than a vertical one. Figure 5 displays a horizontal bar chart, which is a better design for the same data; the categorical text labels are easier to read because they are oriented horizontally. In addition, because we have labels next to each bar, we removed the horizontal axis for simplicity.
Figure 5. The Categorical Text Labels Are Horizontal; Thus, the Chart Is Easier to Read Than the Vertical Bar Chart in Figure 4
[Figure: horizontal bar chart titled “Undergraduate majors,” sorted from highest to lowest, with a label beside each bar and no horizontal axis: Marketing 26%, Accounting 19%, Finance 17%, Operations management 8%, Undesignated 6%, Information systems 5%, International business 5%, Entrepreneurship 4%, Business economics 4%, Industrial management 2%, Economics 2%, Real estate 1%, Hospitality management 1%.]
What are the three largest majors? Compare the ease of answering this question by looking at Figure 5 versus Figure 2. We can also quickly determine the bottom three majors and the second highest, or compare the third highest with the fourth highest. With this simple example in Figures 1–5, we have illustrated the following three general principles:
• Design and layout matter: Use them effectively.
We evolved from a pie chart to a bar chart to a sorted
bar chart, and we then rotated the bar chart for ease of reading.
• Avoid clutter: Too many labels, values, lines, and dimensions hinder the reader in interpreting the data. We eliminated the distracting third dimension.
• Use color purposely and effectively: Use it only when it is needed to distinguish categories or draw the attention of the reader. By choosing a bar chart, we were able to eliminate the use of different colors and the associated legend.
From this example, we can also learn the following:
• Avoid using a pie chart; use a bar chart instead.
• Sorting the data often makes relevant comparisons easier; in our example, it can help our audience more easily answer the questions we anticipate.
• Avoid rotated text. If the data include long category names, consider rotating the chart.
More generally, this example illustrates what often happens in practice. We can rarely achieve the best visual display on the first attempt. Rather, iterative prototyping, based on the intended message, leads to a narrowing down of design options followed by successive improvements in labeling, the use of color, titles, and other details that improve the communication of our message. Berinato (2016) provides an expanded discussion of prototyping in the context of chart construction.
In the next three sections, we provide more detailed discussions of the three general principles.
Design and Layout Matter: Types of Comparisons and Chart Types
A number of primary comparisons can be made when visualizing data. The type of comparison being made is critical when determining the type of data visualization or chart type.
A categorical comparison is one of the more common comparisons, and several good choices are available. As discussed previously, because humans are adept at measuring height and width, we can leverage this strength in our choice of chart type. As we saw in the first example, a bar chart, either vertical or horizontal, is a great choice for making quick and precise categorical comparisons. Figure 6 shows a simple example with two good approaches for a simple comparison (in this case, units of fruit sold). If our audience has a sales target of 16 units for each of these fruits, adding a target line helps the audience to easily compare how each fruit is doing relative to the target. Besides a target, other useful reference lines include an “average,” “estimate,” “threshold,” or “previous year” line. Which of these to use should be driven by the problem context and the questions you anticipate the audience will ask.
Figure 6. (Color online) Bar Charts for Categorical Comparisons Can Be Augmented with a Target Line
[Figure: two panels titled “Bar chart for categorical comparisons,” one vertical and one horizontal, showing units sold (Bananas 23, Apples 17, Grapes 14, Oranges 12), each with a target line at 16 units.]
In designing a bar chart, consider the following:
• Order the data to provide additional context.
• For a small number of bars, consider data labels. Position the labels such that the reader can easily see them. Notice the “data table” that is created when the labels are aligned near the vertical axis of the chart. This makes it very easy for the eye to follow.
• If the chart must include many categories, then balance the use of axis labels with data labels; using both is unnecessary. In the example in Figure 6, why use axis labels to give the reader approximate values when the data labels provide the actual values?
• Avoid rotated text. If some category names are long, consider rotating the chart.
• Mute or remove gridlines and remove chart borders.
• Use a light font color on dark colors and dark font colors on light colors (e.g., white font color on the dark-colored bars in Figure 6).
• Always start the axis at zero on a bar chart or any other chart that encodes data using length or height (e.g., bar chart, histogram, lollipop chart, area chart). Because these types of charts use height or length to encode the data for comparison, the axis should begin at zero; otherwise, the visual comparison is incorrect and can be misleading.
In addition, the part-to-whole comparison is often necessary. Unfortunately, this is often shown as a pie chart, which, for the reasons previously discussed, most data-visualization experts consider to be bad practice. A bar chart is recommended for these types of comparisons. To emphasize comparisons among categories and part-to-whole comparisons, the bar chart can be supplemented by a stacked bar chart (Figure 7).
Figure 7. (Color online) A Stacked Bar Chart Can Enhance a Bar Chart and Is Generally Preferable to a Pie Chart for Making Part-to-Whole Comparisons
[Figure: two panels, “What makes up a credit score?” and “How to improve a credit score:”]
Another very common comparison in data visualization is showing a trend over time. Time should generally be on the horizontal axis. Line charts are great for visualizing a trend, but a bar chart can also be used. Figure 8 shows two good approaches for time-series data. Note that the line chart does not show all data values because doing so would make the chart appear too cluttered. Here we simply show the highest value and the most recent value. If, for example, the audience for Figure 8 is planning candy purchases, then the actual number of Halloween visitors in each year, as shown on the bar chart, might be more useful. We could also enhance the figure by adding an average line, similar to our target line in Figure 6.
Figure 8. (Color online) Both Line Charts and Bar Charts Can Be Used Effectively for Time-Series Data
[Figure: two panels titled “Line or bar chart for time series data (on x-axis),” covering 2008 to 2014; the line chart labels only two values (869 and 454), and the bar chart labels every year, with values ranging from 391 to 869.]
Note. The line chart emphasizes the highest and most recent values to the reader; the bar chart can more effectively convey information on values for each year without appearing too cluttered.
In designing a time-series chart, consider the following:
• Time should be on the horizontal axis; time should progress from earlier to later and read from left to right.
• If using bars to represent time, do not break the axis. Bars must begin at zero. It is acceptable to break the axis on a line chart; however, doing so has a magnifying effect on the visualization, making the visual slope of the line appear much more dramatic as the magnification increases.
• When using a line chart, plot consistent time periods. If time periods are missing or unequal, then note this and deal with it in the visualization; for example, use a dotted line where years are missing.
The use of dots can be effective in various visualization methods, including making comparisons, showing time-series data when an interval is missing, plotting a distribution, or showing a correlation between variables. If possible, avoid using shapes other than dots or circles. Circles and dots have a single center point that is clear to readers. Triangles, squares, asterisks, stars, or other shapes hinder the reader in making a quick and accurate determination of the center of the point.
Figure 9. (Color online) Jitter Adds Some Randomness in the Horizontal Position of the Dots and Performance Bands Allow the Easy Designation of Different Grade Ranges
[Figure: “Final exam grades of data visualization students, University of Cincinnati, Spring 2013”; two panels, “Dot plot” and “Using jitter,” with Final exam (%) on the vertical axis.]
Figure 10. (Color online) A Scatter Diagram and Box Plots Can Display the Relationship Between Two Variables and Are Useful for Displaying the Distributions of Correlated Variables
[Figure: “Major League Baseball Players: Height and Weight”; scatter diagram with box plots, axes Height (inches) and Weight (pounds); created by Jeffrey A. Shaffer; source: SOCR data, MLB.]
thus, it helps the reader to generate insights from the chart.
Dots can also be used in a scatter plot (with or without jitter). Figure 10 combines a traditional scatter plot