Week1 Introduction Notes

Uploaded by

g.grills89
Introduction to Data Visualization

Week 1
What are my and your responsibilities?
As the teacher, I have the following responsibilities:
1. Come prepared to every class.
2. Design the class so you can accomplish the objectives listed in the syllabus.
3. Create a mutually respectful classroom environment.
4. Consider that it is not always your fault if you do not understand the material.
As a student, you have the following responsibilities:
1. Come prepared to every class.
2. Consistently participate in class discussions and activities.
3. Treat everyone with respect.
4. Consider that it is not always my fault if you do not understand the material.
What are instructional coordinators?
• Instructional coordinators (ICs) are a student’s first point of contact for all course-related questions. The
ICs and instructor work as a team to provide the highest quality learning environment for students. A
student should contact their IC regarding:
• Course content… if a student does not understand a topic and seeks clarification and/or assistance
• Course details… questions regarding due dates, extension requests, assignment clarification, grades, syllabus

• Students can contact their IC via email.


• If a student uses the Email option on Canvas to contact their IC, the IC’s response will go to the student’s email address
on file for Canvas. Students can change their email address on file at http://ramweb.colostate.edu.

• ICs help students stay on track, may provide suggestions, and touch base with students about issues they observe.
ICs also grade assessment components (e.g., assignments, quizzes).
• Do not hesitate to contact your IC if you have questions, concerns, feel off track, or experience
emergencies that affect your ability to stay engaged in the class.
An Overview of Data Visualization

What is your definition of data visualization?

Showing data in a way we can comprehend.

Examples:
Charts, plots, line charts
Visualization has a purpose
- It can be used to find patterns in data, identify outliers, and show how variables in the data relate to one another.
- Data visualization can also display findings, communicating them more clearly and helping the audience understand the message.

Data Visualization
• Data visualization: an umbrella term to cover all types of visual representations
that support the exploration, examination, and communication of data.
• Information visualization and scientific visualization: subsets of data visualization
• Information visualization: the use of computer-supported, interactive, visual representations of
abstract data to amplify cognition.
• Scientific visualization: visual representation of scientific data that are usually physical in
nature, rather than abstract. This is the main difference between scientific and information
visualization. Examples: MRI scans and X-rays, which display physical things in visual form.

• The purpose of information visualization is not to make pictures, but to help us think.
Data Visualization
• A display of data designed to enable
analysis, exploration, and discovery.
• Not intended mainly to convey messages
that are predefined by their designers
• Often considered tools that let people
extract their own conclusions from the
data.
• Example: A World of Terror (Periscopic)

Note: This kind of display can be interactive, showing different views and providing additional information.
Infographics
• An infographic is a multi-section visual
representation of information intended to
communicate one or more specific messages.
• Its designer does not show all information they
gathered, but just the portion that is relevant
for the point (or points) that they are trying to
make.
• Example: A Question of Taste (Adolfo Arranz)
News Application
• A news application is a special kind
of visualization that lets people
relate the data being presented to
their own lives.
• Main goal: be customizable
according to each person’s needs.
• Example: HealthCare.gov Explorer
(The Wall Street Journal)
Note: People can explore the data and relate it to their own lives. These displays are interactive and customizable according to each person’s needs.
Data Art
• Also called information art or
informatism.
• The objective of data art is to
create aesthetic forms and artistic
works from the digital nature of
the data generated from big
data (graphics, simulations,
worksheets, statistics, etc.).
• Source: Grugier (2016)
Note: for the pleasure of the eyes; created with artistic purposes.
Other Terms
• A chart is a display in which data are encoded with symbols that have different
shapes, colors, or proportions. (E.g., lollipop chart)
• Plot: synonym of “chart”, but commonly used to refer to a few specific charts (e.g.,
scatter plot)
• A map is a depiction of a geographical area or a representation of data that
pertains to that area.
History of Visualization
A picture conveyed the information “with more exactness, and in much less time, than it [would take] by
reading.” – Joseph Priestley
Priestley's Chart of Biography from 1765 (Wikipedia)
The time distribution of events considered milestones in the history of data visualization, shown by a rug plot and density estimate. (Source: Michael Friendly)
• Events that have contributed to the visualization of quantitative data
• Table (a simple arrangement of data into columns and rows) emerged in
Babylonia at around 2500 BCE.
• Visual encoding of quantitative data arose in the 17th century.
• In the late 18th and early 19th century, William Playfair invented the bar chart
and pie chart, and was the first to use line graphs.
• The first graphing course was offered at Iowa State University in 1913.
• In 1967, Jacques Bertin introduced the notion of visual language.
• In 1977, John Tukey introduced a new approach to statistics called exploratory
data analysis (box plot as one of his inventions)
• Other important figures:
• Edward Tufte
• William Cleveland
Playfair’s trade-balance time-series chart, published in his Commercial and Political Atlas, 1786 (Wikipedia)
Florence Nightingale’s “rose diagrams” showed deaths from disease (blue), war wounds (red) and other causes (black). (Wellcome Library, London)
John Snow’s Cholera map: an early and most worthy use of a map to chart patterns of disease
A map shows the distribution of the slave population in the Southern states of the United States, based on the 1860 census. (Library of Congress Geography and Map Division)
Minard’s French original (“…seeming to defy the pen of the historian by its brutal eloquence…”)
Why Visualization?
• Graphics reveal data
• Graphics can be more precise and revealing than conventional statistical
computations. (E.g., Anscombe’s quartet)
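The Anscombe's quartet point can be checked numerically. The sketch below uses the four data sets published by Anscombe (1973) and computes a few conventional summary statistics with the standard library only; the helper name `pearson_r` and the `summary` dictionary are illustrative choices, not part of the lecture. All four sets share a mean of x of 9, a mean of y of about 7.50, and a correlation of about 0.816, even though their scatter plots look completely different.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Summary statistics per data set: (mean of x, mean of y, correlation).
summary = {
    name: (mean(xs), mean(ys), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, r) in summary.items():
    print(f"Set {name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

Because the summaries are nearly identical, only a plot of each set reveals that one is roughly linear, one is curved, and two are dominated by a single outlying point.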
Plotting the data sets as graphs lets us quickly see the trends and patterns.
Graphical Integrity
“Graphical excellence begins with telling the truth about the data.” – Edward Tufte
Six Principles of Graphical Integrity
• The representation of numbers should be directly proportional to the numerical quantities
represented.
• Clear, detailed, and thorough labeling should be used to defeat graphical distortion and
ambiguity.
• Show data variation, not design variation.
• In time-series displays of money, deflated and standardized units of monetary measurement are
nearly always better than nominal units.
• The number of information-carrying (variable) dimensions depicted should not exceed the number
of dimensions in the data.
• Graphics must not quote data out of context.
Distortion in Data Graphic
• Lie Factors greater than 1.05 or less than 0.95 indicate substantial distortion, far beyond minor inaccuracies in plotting.

$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$
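As a minimal sketch, the Lie Factor (size of effect shown in the graphic divided by size of effect in the data) can be computed as follows. It assumes that "size of effect" means the relative change between two values, and every number below is hypothetical: a data value rising from 100 to 150 is drawn with bars 2.0 and 5.0 units tall. The function names are illustrative.

```python
def relative_change(first, second):
    """Relative change from first to second (0.5 means +50%)."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))


# Hypothetical example values (not taken from the lecture).
data_first, data_second = 100.0, 150.0   # data grows by 50%
bar_first, bar_second = 2.0, 5.0         # drawn bar grows by 150%

lf = lie_factor(bar_first, bar_second, data_first, data_second)

# Thresholds from the slide: beyond 0.95..1.05 counts as substantial distortion.
substantial_distortion = lf > 1.05 or lf < 0.95

print(f"Lie Factor = {lf:.2f}, substantial distortion: {substantial_distortion}")
```

A graphic that drew the second bar 3.0 units tall would give a Lie Factor of 1.0, meaning the visual effect matches the data.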
Design and Data Variation
• The confounding of design variation
with data variation over the surface of
a graphic leads to ambiguity and
deception, for the eye may mix up
changes in the design with changes in
the data.
Time-Series Displays of Money
• The case of skyrocketing government spending
• By not deflating, the graphic (on the previous slide) mixes up changes in the value of money with changes in the budget.
• Computing expenditures in constant (real) dollars per capita reveals a more accurate picture.
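The deflation step can be sketched in a few lines. This is an illustration under stated assumptions, not the calculation behind the slide's figure: the spending, price-index, and population numbers are hypothetical, and the names `records`, `BASE_INDEX`, and `real_per_capita` are made up for the example. Nominal spending is divided by the price level relative to a base period, then by population.

```python
# Hypothetical budget figures (not real data).
records = [
    {"year": 1, "nominal_spending": 1_000_000.0, "price_index": 100.0, "population": 10_000},
    {"year": 2, "nominal_spending": 1_500_000.0, "price_index": 125.0, "population": 12_000},
]

BASE_INDEX = 100.0  # price index of the base period (assumed)


def real_per_capita(nominal, price_index, population, base_index=BASE_INDEX):
    """Convert nominal spending to constant (real) dollars per person."""
    real = nominal * base_index / price_index  # remove the effect of inflation
    return real / population                   # standardize by population size


results = [
    real_per_capita(r["nominal_spending"], r["price_index"], r["population"])
    for r in records
]

for r, value in zip(records, results):
    print(f"Year {r['year']}: nominal = {r['nominal_spending']:,.0f}, "
          f"real per capita = {value:,.2f}")
```

In this made-up case nominal spending grows by 50%, yet real spending per person does not change at all, which is the kind of difference a nominal-dollar chart hides.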
Visual Area and Numerical Measure
• The use of two (or three) varying dimensions to show one-dimensional data is a weak and inefficient technique.
• Using areas to show one-dimensional data confuses data variation with design variation.
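A short calculation shows why areas and volumes distort one-dimensional data. The sketch assumes a symbol (for example a square or a pictogram) whose every side is scaled in proportion to the data value; the function name `apparent_size_ratio` and the doubling example are hypothetical.

```python
def apparent_size_ratio(value_ratio, dimensions_scaled):
    """Ratio of drawn sizes when every scaled dimension grows by value_ratio.

    dimensions_scaled = 1 for a bar of fixed width, 2 for an area,
    3 for a drawn volume.
    """
    return value_ratio ** dimensions_scaled


value_ratio = 2.0  # hypothetical: the data value doubles

length_ratio = apparent_size_ratio(value_ratio, 1)  # bar length
area_ratio = apparent_size_ratio(value_ratio, 2)    # width and height both scaled
volume_ratio = apparent_size_ratio(value_ratio, 3)  # width, height, depth scaled

print(f"data x{value_ratio:g}: length x{length_ratio:g}, "
      f"area x{area_ratio:g}, volume x{volume_ratio:g}")
```

Only the one-dimensional encoding grows at the same rate as the data; the area and volume encodings make the same change look far larger.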
Context is Essential for Graphical Integrity
• The “Compared to what?” question
Data-Ink
Good Statistical Graphics
Five Principles of Good Statistical Graphics
• Above all else show the data.
• Maximize the data-ink ratio.
• Erase non-data-ink.
• Erase redundant data-ink.
• Revise and edit.
Above All Else Show the Data
• The basis for a theory of data graphics
Playfair’s time-series chart, published in The Commercial and Political Atlas, 1785 (Source: Wikimedia Commons)
Playfair’s trade-balance time-series chart, published in his Commercial and Political Atlas, 1786 (Source: Wikipedia)
Data-Ink
• A large share of ink on a graphic should present data-information, the ink changing as the data change.
• The non-erasable core of a graphic: proportion of a graphic’s ink devoted to the non-redundant display of data-information
• Data-ink ratio = 1.0 - proportion of a graphic that can be erased without loss of data-information

$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}$$
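The data-ink ratio (data-ink divided by the total ink used to print the graphic) can be illustrated with a rough calculation. In practice ink is rarely measured exactly, so the amounts below are hypothetical and in arbitrary units, and which elements count as data-ink is an assumption made only for this example.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that displays non-redundant data-information."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Hypothetical ink budget: element -> (ink amount, counts as data-ink?)
ink = {
    "data points": (30.0, True),
    "axis ticks and labels": (20.0, False),  # assumed non-data-ink here
    "heavy grid": (35.0, False),
    "decorative border": (15.0, False),
}

data_ink = sum(amount for amount, is_data in ink.values() if is_data)
total_ink = sum(amount for amount, _ in ink.values())
ratio_before = data_ink_ratio(data_ink, total_ink)

# Redesign: erase the grid and the border, keep everything else.
erased = {"heavy grid", "decorative border"}
total_after = sum(amount for name, (amount, _) in ink.items() if name not in erased)
ratio_after = data_ink_ratio(data_ink, total_after)

print(f"before: {ratio_before:.2f}, after erasing grid and border: {ratio_after:.2f}")
```

Erasing ink that carries no data-information raises the ratio without touching the data themselves.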
• Maximize the data-ink ratio, within reason
• Every bit of ink on a graphic requires a reason
Data-Ink Maximization
• Redesign of the scatterplot
• Redesign of the bar chart
Erase Non-Data-Ink, within Reason
• Ink that fails to depict statistical information does not have
much interest to the viewer of a graph.
• Gratuitous decoration and reinforcement of the data
measures generate much redundant data-ink.
Erase Redundant Data-Ink, within Reason
• Redundant data-ink depicts the same number over and over.
• Bilateral symmetry of data measures creates redundancy.
• Bilateral symmetry doubles the space consumed by the design in a graphic, without adding new information.
• Example: Chernoff faces, where half faces carry the same information as full faces.
• Redundancy can be useful:
• Giving a context and order to
complexity
• Facilitating comparisons over various
parts of the data
• Creating an aesthetic balance
Revise and Edit
• A graph drawn by Roger Hayward (science illustrator) shows the periodicity of
properties of chemical elements.
Chartjunk
• Non-data-ink or redundant data-ink
• Unintentional optical art
• The grid
• Self-promoting graphics
Unintentional Optical Art
• Moiré effect: the design interacts with
the physiological tremor of the eye to
produce the distracting appearance of
vibration and movement.
• Tell the story with a table?
The Grid
• The grid should usually be muted or completely suppressed so that its presence is
only implicit – lest it compete with the data.
• When a graphic serves as a look-up table, a grid may help in reading and
interpolating but should be muted relative to the data.
Self-Promoting Graphics
• A graphic becomes self-promoting when it is taken over by decorative forms or
computer debris, when the data measures and structures
become Design Elements, and when the overall design purveys
Graphical Style rather than quantitative information.
References
• Cairo, A. (2016). The truthful art: Data, charts, and maps for communication.
• Few, S. (2009). Now you see it: simple visualization techniques for quantitative
analysis.
• Tufte, E. R. (2013). The visual display of quantitative information.
• Yau, N. (2011). Visualize This: The FlowingData Guide to Design, Visualization,
and Statistics.
