Lecture 4. Visualization(1)
Exhibit 4-2 Anscombe's Quartet (Data)
Exhibit 4-3 Plotting the Four Datasets in Anscombe's Quartet
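The point of Anscombe's Quartet can be checked numerically. The sketch below uses the published values of the quartet (Anscombe, 1973), which are not reproduced in these slides; the helper names `mean`, `pearson_r`, and `summarize` are illustrative only. It suggests that all four datasets share nearly identical means and correlation, even though their plots look very different.

```python
from math import sqrt

# Published values of Anscombe's quartet (not taken from the slides).
# Datasets I, II, and III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(quartet):
    """Return (mean of x, mean of y, correlation) for each dataset."""
    return {name: (mean(xs), mean(ys), pearson_r(xs, ys))
            for name, (xs, ys) in quartet.items()}


SUMMARY = summarize(QUARTET)
for name, (mx, my, r) in SUMMARY.items():
    print(f"Dataset {name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

Because the summary statistics agree so closely, only a plot reveals how the four datasets differ.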
Visualizations are preferred over text.
• People prefer visuals.
• The brain can process visuals faster.
• Visuals can summarize complex information.
Chart Types
Probability Distributions
• A distribution in statistics (probability distribution) is a function that
describes the likelihood for all the possible values a random variable
will assume.
• The normal distribution is a bell-shaped probability distribution that
many naturally occurring datasets in our world follow.
Normal Distribution
• Bell-shaped, symmetric family of distributions
• Classified by 2 parameters: mean (µ) and standard deviation (σ).
• Random variables that are approximately normal have the following
properties:
• Mean is the same as the Median
• Approximately half (50%) fall above (and below) mean
• Approximately 68% fall within 1 standard deviation of mean
• Approximately 95% fall within 2 standard deviations of mean
• Virtually all fall within 3 standard deviations of mean
• Notation when Y is normally distributed with mean µ and standard deviation σ:
Y ~ N(µ, σ)
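The 68% / 95% / "virtually all" figures above can be reproduced from the standard normal distribution. A minimal sketch, assuming only Python's standard library; the function name `prob_within` is illustrative. It uses the identity P(|Z| < k) = erf(k / √2) for a standard normal Z.

```python
from math import erf, sqrt


def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {prob_within(k):.4f}")
# Within 1 standard deviation(s): 0.6827
# Within 2 standard deviation(s): 0.9545
# Within 3 standard deviation(s): 0.9973
```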
Normal Distribution
Y ~ N(µ, σ) ⇒ Z = (Y - µ) / σ ~ N(0, 1)
• Probabilities of certain ranges of values and specific percentiles of interest can be obtained through the standard normal (Z) distribution.
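As a rough illustration of standardizing and then reading probabilities and percentiles off the standard normal distribution, the sketch below uses `statistics.NormalDist` from Python's standard library. The values µ = 100 and σ = 15 are hypothetical and chosen only for the example; `z_score` is an illustrative helper name.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration; not from the lecture.
MU, SIGMA = 100.0, 15.0

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(y, mu, sigma):
    """Standardize y: Z = (Y - mu) / sigma."""
    return (y - mu) / sigma


# Probability of a range: P(Y <= 130) equals P(Z <= 2).
z = z_score(130.0, MU, SIGMA)
p_below = STANDARD_NORMAL.cdf(z)

# Percentile of interest: the 95th percentile of Y, mapped back from Z.
z_95 = STANDARD_NORMAL.inv_cdf(0.95)
y_95 = MU + z_95 * SIGMA

print(f"z = {z:.2f}, P(Y <= 130) = {p_below:.4f}")
print(f"95th percentile: z = {z_95:.4f}, y = {y_95:.2f}")
```

Any normal variable can be handled this way, since only the standardization step depends on µ and σ.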
Is your visualization declarative or exploratory?
• Declarative visualization is used to declare or present your findings to an audience (e.g. presenting financial reports).
• Exploratory visualizations are used to gain insights while you are interacting with data (e.g. identifying potential customers).
Once you have defined your data and the purpose, you can find an appropriate chart or graph.
Pie charts
• Word clouds: Count the frequency of each word in the data file.
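The counting step behind a word cloud can be sketched in a few lines. This is a minimal example, assuming plain English text and a simple lowercase tokenizer; the function name `word_frequencies` and the sample sentence are made up for illustration. A word cloud tool would then size each word according to its count.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample text for illustration.
sample = "Great service, great price. Service was fast."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```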
Which charts are appropriate for quantitative data (more complicated data)?
• Line charts: Show changes and trends.
Here is a summary guide of when to use different visualizations.
• Outlier detection: Box and whisker plot
• Comparison: Bar chart, Pie chart, Stacked bar chart, Tree map, Heat map
• Relationship between two variables: Scatter plot
• Geographic data: Symbol map
• Trend over time: Line chart
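The summary guide can also be written as a small lookup table. The sketch below is one possible encoding, assuming the grouping of charts under each purpose as read from the slide layout; the names `CHART_GUIDE` and `suggest_charts` are illustrative.

```python
# Purpose -> suggested chart types, following the summary guide.
CHART_GUIDE = {
    "outlier detection": ["box and whisker plot"],
    "comparison": ["bar chart", "pie chart", "stacked bar chart", "tree map", "heat map"],
    "relationship between two variables": ["scatter plot"],
    "geographic data": ["symbol map"],
    "trend over time": ["line chart"],
}


def suggest_charts(purpose):
    """Return the chart types suggested for a purpose, or an empty list if unknown."""
    return CHART_GUIDE.get(purpose.strip().lower(), [])


print(suggest_charts("Trend over time"))
print(suggest_charts("Comparison"))
```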
Exhibit 4-12
A more appropriate scale is a good start.
Exhibit 4-15
If we care about individuals, an ordered bar chart is clearer.
If we care about job function, a bar chart can show the proportion more clearly.
And a stacked bar chart shows the proportion by job function.
The following four charts represent the exact same data: the quantity of each beer sold. Which do you prefer?
[Figure: four charts labeled A, B, C, and D]
Consider scale and increments:
• How much data do you need to show?
• What is the baseline? 0? Something else?
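To see why the baseline matters, compare how long two bars are drawn when the axis starts at 0 versus somewhere else. The sketch below uses made-up values (98 and 100) and an illustrative helper named `drawn_length_ratio`; it is a simple arithmetic demonstration, not a plotting routine.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values b and a when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Made-up values: 100 is about 2% larger than 98.
print(round(drawn_length_ratio(98, 100), 2))   # 1.02 with a baseline of 0
print(drawn_length_ratio(98, 100, 97))         # 3.0 with a baseline of 97
```

With a baseline of 97, a 2% difference is drawn as a bar three times as long, which is why the choice of baseline and increments deserves attention.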