Week1 Introduction Notes
Week 1
What are my and your responsibilities?
As the teacher, I have the following responsibilities:
1. Come prepared to every class.
2. Design the class so you can accomplish the objectives listed in the syllabus.
3. Create a mutually respectful classroom environment.
4. Consider that it is not always your fault if you do not understand the material.
As a student, you have the following responsibilities:
1. Come prepared to every class.
2. Consistently participate in class discussions and activities.
3. Treat everyone with respect.
4. Consider that it is not always my fault if you do not understand the material.
What are instructional coordinators?
• Instructional coordinators (ICs) are a student’s first point of contact for all course-related questions. The
ICs and instructor work as a team to provide the highest quality learning environment for students. A
student should contact their IC regarding:
• Course content… if a student does not understand a topic and seeks clarification and/or assistance
• Course details… questions regarding due dates, extension requests, assignment clarification, grades, syllabus
• ICs help students stay on track, may provide suggestions, and check in with students about issues they have noticed.
ICs also grade assessment components (e.g., assignments, quizzes).
• Do not hesitate to contact your IC if you have questions, concerns, feel off track, or experience
emergencies that affect your ability to stay engaged in the class.
An Overview of Data Visualization
Examples: charts, plots, line charts
Visualization has a purpose:
- It supports exploration: finding patterns in data, identifying outliers, and seeing how variables relate.
- It supports communication: displaying findings so that the audience can better understand the message.
Data Visualization
• Data visualization: an umbrella term to cover all types of visual representations
that support the exploration, examination, and communication of data.
• Information visualization and scientific visualization: subsets of data visualization
• Information visualization: the use of computer-supported, interactive, visual representations of
abstract data to amplify cognition.
• Scientific visualization: visual representation of scientific data that are usually physical in
nature, rather than abstract. This is the main difference between scientific and information
visualization. Examples: MRI scans and X-rays, which display physical structures in visual form.
This kind of display can be interactive, showing different views or providing additional information.
Infographics
• An infographic is a multi-section visual
representation of information intended to
communicate one or more specific messages.
• Its designer does not show all information they
gathered, but just the portion that is relevant
for the point (or points) that they are trying to
make.
• Example: A Question of Taste (Adolfo Arranz)
News Application
• A news application is a special kind of visualization that lets people relate the data being
presented to their own lives.
• Main goal: be customizable according to each person’s needs.
• Example: HealthCare.gov Explorer (The Wall Street Journal)
Notes: News applications are interactive and customizable according to each person’s needs. People
can explore the data and relate it to their own lives.
Data Art
• Also called information art or
informatism.
• The objective of data art is to
create aesthetic forms and artistic
works from the digital nature of
the data generated from big
data (graphics, simulations,
worksheets, statistics, etc.).
• Source: Grugier (2016)
Note: created with artistic purposes, for the pleasure of the eyes.
Other Terms
• A chart is a display in which data are encoded with symbols that have different
shapes, colors, or proportions. (E.g., lollipop chart)
• A plot is a synonym of “chart”, but the term is commonly used to refer to a few specific charts
(e.g., scatter plot).
• A map is a depiction of a geographical area or a representation of data that
pertains to that area.
History of Visualization
A picture conveyed the information “with more exactness, and in much less time, than it [would take] by
reading.” – Joseph Priestley
Priestley's Chart of Biography from 1765 (Wikipedia)
The time distribution of events considered milestones in the history of data visualization, shown
by a rug plot and density estimate. (Source: Michael Friendly)
• Events that have contributed to the visualization of quantitative data
• The table (a simple arrangement of data into columns and rows) emerged in
Babylonia around 2500 BCE.
• Visual encoding of quantitative data arose in the 17th century.
• In the late 18th and early 19th century, William Playfair invented the bar chart
and pie chart, and was the first to use line graphs.
• The first graphing course was offered at Iowa State University in 1913.
• In 1967, Jacques Bertin introduced the notion of visual language.
• In 1977, John Tukey introduced a new approach to statistics called exploratory
data analysis (the box plot is one of his inventions).
• Other important figures:
• Edward Tufte
• William Cleveland
Playfair’s trade-balance time-series chart, published in his Commercial and Political Atlas, 1786 (Wikipedia)
Florence Nightingale’s “rose diagrams” showed deaths from disease (blue), war wounds (red) and other causes (black). (Wellcome Library, London)
John Snow’s cholera map: an early and most worthy use of a map to chart patterns of disease
A map showing the distribution of the slave population in the Southern states of the United States, based on the 1860 census. (Library of Congress Geography and Map Division)
Minard’s French original (“…seeming to defy the pen of the historian by its brutal eloquence…”)
Why Visualization?
• Graphics reveal data
• Graphics can be more precise and revealing than conventional statistical
computations. (E.g., Anscombe’s quartet)
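The point about Anscombe’s quartet can be checked numerically. The sketch below uses the four published data sets and only the Python standard library; the helper name `pearson_r` is chosen here for illustration and is not part of the course material. It prints nearly identical summary statistics for all four sets, even though their scatter plots look very different.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (mean of x, mean of y, correlation) for each of the four data sets.
summary = {
    name: (mean(xs), mean(ys), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, r) in summary.items():
    print(f"{name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The statistics alone cannot tell the four sets apart; only plotting them reveals the curve in set II and the outliers in sets III and IV.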
Plotting a data set on a graph lets us quickly see its trends and patterns.
Graphical Integrity
“Graphical excellence begins with telling the truth about the data.” – Edward Tufte
Six Principles of Graphical Integrity
• The representation of numbers should be directly proportional to the numerical quantities
represented.
• Clear, detailed, and thorough labeling should be used to defeat graphical distortion and
ambiguity.
• Show data variation, not design variation.
• In time-series displays of money, deflated and standardized units of monetary measurement are
nearly always better than nominal units.
• The number of information-carrying (variable) dimensions depicted should not exceed the number
of dimensions in the data.
• Graphics must not quote data out of context.
Distortion in Data Graphics
• Lie Factors greater than 1.05 or
less than 0.95 indicate substantial
distortion, far beyond minor
inaccuracies in plotting.
$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$
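As a rough illustration, the Lie Factor can be computed by taking each "size of effect" as a relative change between two values. The sketch below assumes that interpretation; the function names are hypothetical. The numbers are the fuel-economy example usually attributed to Tufte, as best recalled here: the data rise from 18 to 27.5 miles per gallon, while the lines drawn for them grow from 0.6 to 5.3 inches.

```python
def relative_change(first, second):
    """Size of an effect, expressed as (second - first) / first."""
    return (second - first) / first


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return relative_change(shown_first, shown_second) / relative_change(
        data_first, data_second
    )


# Graphic: line lengths in inches. Data: miles per gallon.
factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))  # 14.8
```

A value of 14.8 is far outside the 0.95 to 1.05 band, so this graphic would count as substantially distorted. The sketch does not handle the case where the data do not change at all (division by zero).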
Design and Data Variation
• The confounding of design variation
with data variation over the surface of
a graphic leads to ambiguity and
deception, for the eye may mix up
changes in the design with changes in
the data.
Time-Series Displays of Money
• The case of skyrocketing government spending
• By not deflating, the graphic of skyrocketing government spending mixes up changes in
the value of money with changes in the budget.
• Computing expenditures in constant (real) dollars per capita reveals a more
accurate picture.
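The adjustment described above can be sketched in a few lines. This is a minimal example with made-up numbers, not real budget data; the function name `real_per_capita` and the idea of a price index with a base value of 100 are assumptions introduced for illustration.

```python
def real_per_capita(nominal, price_index, population, base_index=100.0):
    """Convert nominal total spending to base-year dollars per person."""
    real_total = nominal * base_index / price_index  # remove inflation
    return real_total / population                   # remove population growth


# Hypothetical figures: nominal spending more than doubles between the two years.
budget = [
    {"year": 1, "nominal": 1_000_000.0, "price_index": 100.0, "population": 1_000},
    {"year": 2, "nominal": 2_200_000.0, "price_index": 200.0, "population": 1_100},
]

values = [
    real_per_capita(row["nominal"], row["price_index"], row["population"])
    for row in budget
]

for row, value in zip(budget, values):
    print(f"year {row['year']}: nominal {row['nominal']:.0f}, real per capita {value:.2f}")
```

In this invented case the nominal budget grows by 120 percent, yet real spending per person is unchanged, which is the kind of difference a nominal time series hides.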
Visual Area and Numerical Measure
• The use of two (or three)
varying dimensions to show
one-dimensional data is a
weak and inefficient
technique.
• Using areas to show one-
dimensional data confuses
data variation with design
variation.
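A short calculation shows why areas distort one-dimensional data. The sketch below assumes a symbol whose width and height are both scaled by the ratio of two data values; the function names are hypothetical.

```python
from math import sqrt


def area_ratio_when_both_sides_scaled(value_ratio):
    """Area growth if width AND height are each multiplied by the data ratio."""
    return value_ratio ** 2


def side_scale_for_proportional_area(value_ratio):
    """Factor to apply to each side so that the area grows with the data ratio."""
    return sqrt(value_ratio)


value_ratio = 2.0  # the data value doubles
naive_area = area_ratio_when_both_sides_scaled(value_ratio)
honest_side = side_scale_for_proportional_area(value_ratio)

print(naive_area)             # 4.0: the symbol looks four times as large
print(round(honest_side, 3))  # 1.414: scaling each side by this doubles the area
```

Even with the corrected scaling, readers tend to judge areas less accurately than lengths, so a one-dimensional encoding such as a bar is usually the safer choice.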
Context is Essential for Graphical Integrity
• The “Compared to what?” question
Data-Ink
Good Statistical Graphics
Five Principles of Good Statistical Graphics
• Above all else show the data.
• Maximize the data-ink ratio.
• Erase non-data-ink.
• Erase redundant data-ink.
• Revise and edit.
Data-Ink
• The basis for a theory of data graphics
$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}$$
• Maximize the data-ink ratio, within reason
• Every bit of ink on a graphic requires a reason
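Tufte defines the data-ink ratio as the data-ink divided by the total ink used to print the graphic. The sketch below applies that definition to invented "ink" amounts (for example, pixel counts); both the numbers and the function name are assumptions made for illustration.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that presents data (between 0 and 1)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical chart: 20 units of ink show data, 60 units are grid, frame, decoration.
before = data_ink_ratio(data_ink=20, total_ink=80)

# Same chart after erasing 40 units of non-data ink.
after = data_ink_ratio(data_ink=20, total_ink=40)

print(before, after)  # 0.25 0.5
```

Erasing non-data ink raises the ratio without touching the data itself, which is why the principle is stated "within reason": some non-data ink, such as axis labels, still earns its place.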
Data-Ink Maximization
• Redesign of the scatterplot
Data-Ink Maximization
• Redesign of the bar chart
Erase Non-Data-Ink, within Reason
• Ink that fails to depict statistical information does not have
much interest to the viewer of a graph.
• Gratuitous decoration and reinforcement of the data
measures generate much redundant data-ink.
Erase Redundant Data-Ink, within Reason
• Redundant data-ink depicts the same number over
and over
• Bilateral symmetry of data measures creates
redundancy.
• Bilateral symmetry doubles the space consumed by the
design in a graphic, without adding new information.