L7 - DS304 Visualization

The document discusses various methods for visualizing time series data, emphasizing the importance of maintaining the inherent order of time in data points. It covers different visualization techniques such as line graphs, connected scatterplots, and smoothing methods like moving averages, highlighting their advantages and potential pitfalls. Additionally, it addresses the significance of window size in moving averages and introduces weighted moving averages as a way to assign importance to recent data points.

Visualization

Siddharth R
Visualizing Time Series Data
● Data points have an
inherent order in time
● What is the preferred chart type for
visualizing time series data?
● Data: Monthly submissions
to the preprint server
bioRxiv, from November
2013 until April 2018.
● Is this any different from a
scatterplot?
Visualizing Time Series Data
● Here, the dots are spaced evenly
along the x axis, and there is a
defined order among them.
● Each dot has exactly one left and
one right neighbor (except the
leftmost and rightmost points, which
have only one neighbor each).
● We can visually emphasize this
order by connecting neighboring
points with lines. The result is
called a “line graph”.
Visualizing Time Series Data
● Do we really need to draw lines
between the data points?
● Do the lines correspond to
made-up data?
● Which is more important, the
points or the lines?
Visualizing Time Series Data
● Using lines to represent time series
is nevertheless generally accepted
practice, and frequently the dots are
omitted altogether
● Without dots, the figure places more
emphasis on the overall trend in the
data and less on individual
observations
● In general, the denser the time
series, the less important it is to
show individual observations with
dots.
Visualizing Time Series Data
● We can also fill the area under the
curve with a solid color
● The fill visually separates the area
above the curve from the area below
● Filling is only valid if the y axis starts
at zero, so that the height of the
shaded area at each time point
represents the data value at that
time point
Multiple Time Series
● If we want to show the monthly submissions to multiple preprint servers, a
scatterplot is not a good idea, because the individual time courses run into
each other
Multiple Time Series
Multiple Time Series - With more than one response variable
● For example, we may be interested
in how the change in house prices
over the previous 12 months
relates to the unemployment rate.
We may expect that house prices
rise when the unemployment rate is
low, and vice versa
● We can visualize such data as two
separate line graphs stacked on top
of each other
Twelve-month change in house prices (a) and
unemployment rate (b) over time, from January 2001
through December 2017.
Multiple Time Series
● As an alternative, we can plot the two
variables against each other, drawing a
path that leads from the earliest time
point to the latest
● “Connected scatterplot”, because we
are technically making a scatterplot of the
two variables against each other and
then are connecting neighboring points.
● What is missing here?
Multiple Time Series
● When drawing a connected scatterplot, it is
important that we indicate both the direction
and the temporal scale of the data.
● A gradual darkening of the color can
indicate direction. Alternatively, one could
draw arrows along the path.
● In a connected scatterplot, you can find
correlated (positive and negative)
movement between the two variables.
● If the two variables have a somewhat cyclic
relationship, we will see circles or spirals in
the connected scatterplot.
Multiple Time Series
● Separate line graphs tend to be easier to read, but patterns such as
cyclical relationships are very hard to spot in them.
● Once people are used to connected scatterplots, they may be able to extract
certain patterns (such as cyclical behavior with some irregularity) that can be
difficult to spot in line graphs.
● Research reports that readers are more likely to confuse order and direction
in a connected scatterplot than in line graphs
● Connected scatter plots seem to result in higher engagement, and thus such
plots may be effective tools to draw readers into a story
Visualizing Trends
● When making scatter plots or time series, we are often more interested in the
overarching trend of the data than in the specific detail of where each individual
data point lies.
● There are two fundamental approaches to determining a trend:
○ Smoothing is useful for uncovering general patterns in data without
assuming a predefined mathematical relationship, making it ideal for datasets
with a lot of noise or fluctuations.
○ Curve fitting, on the other hand, assumes that a specific function can
describe the trend and provides a mathematical model that can be used for
prediction and deeper analysis.
Smoothing
● Captures key patterns in the data while removing irrelevant minor detail or
noise
● Reduces the impact of outliers or sudden changes
● Common approaches
○ Simple Moving Averages
○ Weighted Moving Averages
○ LOESS
Moving Averages
● This method relies on the notion that observations close in time are likely
to have similar values. Consequently, the averaging removes random
variation, or noise, from the data.
● Averages the values within a sliding window of fixed size
● Types:
○ One sided moving average
○ Centered moving average
Types of Moving Average: One sided vs Centered
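One-sided moving averages include the current and
previous observations in each average. For example, the
formula for a moving average (MA) of X at time t
with a length of 7 is the following:

$$\mathrm{MA}_t = \frac{X_t + X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4} + X_{t-5} + X_{t-6}}{7}$$

Centered moving averages include both the previous and
the future observations that surround time t. They are
also known as two-sided moving averages. The
formula for a centered moving average of X at time t
with a length of 7 is the following:

$$\mathrm{MA}_t = \frac{X_{t-3} + X_{t-2} + X_{t-1} + X_t + X_{t+1} + X_{t+2} + X_{t+3}}{7}$$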
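
The two definitions can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `one_sided_ma` and `centered_ma` and the sample values are made up for this example, and positions where the window does not fit are marked with `None`.

```python
def one_sided_ma(x, window=7):
    """Average of the current value and the (window - 1) values before it."""
    out = []
    for t in range(len(x)):
        if t < window - 1:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(x[t - window + 1 : t + 1]) / window)
    return out


def centered_ma(x, window=7):
    """Average of the current value and (window - 1) / 2 values on each side."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window length")
    half = window // 2
    out = []
    for t in range(len(x)):
        if t < half or t + half >= len(x):
            out.append(None)  # window would run past the start or the end
        else:
            out.append(sum(x[t - half : t + half + 1]) / window)
    return out


series = list(range(1, 11))  # made-up values 1, 2, ..., 10
print(one_sided_ma(series))  # first six positions are None
print(centered_ma(series))   # first three and last three positions are None
```

Note where the gaps fall: the one-sided average is undefined only at the start of the series, while the centered average is undefined at both ends.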
Example: Florida Covid Daily Deaths
● Here, a human-based scheduling factor
influences when causes of death
are recorded.
● Some activities must be less likely to
occur on weekends, because the
lowest day of the week is almost
always Sunday, and weekends in
general tend to be low.
● Because of this seasonal pattern, the
number of recorded deaths for a
particular day depends on the day of
the week you’re evaluating.

Source: https://statisticsbyjim.com/time-series/moving-averages-smoothing/
Example: (Contd..)
● Now we need to remove this seasonal
pattern to reveal the underlying trend
component.
● The graph displays one-sided moving
averages with a length of 7 days for these
data. Notice how the seasonal pattern is
gone and the underlying trend is visible.
● Each moving average point is the daily
average of the past seven days.
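
Why a 7-day window removes a weekly pattern can be seen in a small sketch. The numbers below are synthetic, not the Florida data: an assumed linear trend plus an assumed day-of-week effect whose seven offsets sum to zero. The helper name `trailing_mean` is also made up for this example.

```python
def trailing_mean(x, window):
    """One-sided moving average, returned only where a full window exists."""
    return [sum(x[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(x))]


# Hypothetical day-of-week offsets (Mon..Sun); they sum to zero.
weekly_effect = [10, 15, 12, 8, 5, -20, -30]

# Hypothetical underlying trend: 100, 101, 102, ... over four weeks.
days = 28
trend = [100 + d for d in range(days)]

# What gets "recorded" is the trend plus the weekday effect.
recorded = [trend[d] + weekly_effect[d % 7] for d in range(days)]

smoothed = trailing_mean(recorded, 7)

print(recorded[:7])  # jumps up and down with the weekday
print(smoothed[:7])  # rises by exactly 1 per day, like the trend
```

Every 7-day window contains each weekday exactly once, so the weekday offsets cancel and only the average of the trend over the past seven days remains. That is also why the smoothed curve lags the trend by about three days in this sketch.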
Importance of Window Size
● The 20-day moving average
removes small, short-term spikes
but otherwise follows the daily data
closely
● The 100-day moving average, on
the other hand, removes even fairly
substantial drops or spikes that
play out over a time span of
multiple weeks.
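
The effect of window size can be sketched with made-up numbers: a flat series at 100 with a three-week dip to 80. The series, the dip, and the helper name `trailing_mean` are assumptions for this example, not the data shown on the slide.

```python
def trailing_mean(x, window):
    """One-sided moving average, returned only where a full window exists."""
    return [sum(x[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(x))]


# Synthetic series: 300 days at 100.0, with a 21-day dip to 80.0.
series = [100.0] * 300
for d in range(150, 171):
    series[d] = 80.0

ma20 = trailing_mean(series, 20)
ma100 = trailing_mean(series, 100)

print(min(ma20))   # the 20-day average follows the dip all the way down
print(min(ma100))  # the 100-day average only dips slightly
```

In this sketch the 20-day window fits entirely inside the dip, so the smoothed curve still reaches 80. The 100-day window always mixes the 21 low days with at least 79 normal days, so its minimum stays near 95.8 and the dip is largely smoothed away.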
Common pitfalls in moving averages
● What is the difference between the two
charts?
● Parts of the smoothed curve are missing at
the beginning, the end, or both.
● All data points in the window are weighted
equally
Weighted Moving Average (WMA)
● WMA is a moving average where each data
point is assigned a specific weight based
on its position in the series.
● Weights are assigned linearly or according
to a specific pattern chosen by the user.
● The most recent data point typically
receives the highest weight.
● The weights should sum to
one (or 100%).

(172.38×5/15)+(171.37×4/15)+(178.67×3/15)+(176.08×2/15)+(172.72×1/15)≈173.88
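
The same calculation can be sketched in Python. The function name `weighted_moving_average` is made up for this example, and the code assumes the five values are listed from most recent to oldest, so that the most recent value gets the weight 5. The raw weights 5, 4, 3, 2, 1 are divided by their sum, 15, so that they add up to one.

```python
def weighted_moving_average(values, weights):
    """Weighted average; raw weights are normalised so that they sum to one."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Values from the worked example, assumed to be ordered most recent first.
values = [172.38, 171.37, 178.67, 176.08, 172.72]
weights = [5, 4, 3, 2, 1]  # linear weights, highest for the most recent value

wma = weighted_moving_average(values, weights)
simple = sum(values) / len(values)

print(round(wma, 2))     # weighted sum 2608.27 divided by 15
print(round(simple, 2))  # unweighted mean of the same five values
```

With these inputs the weighted sum is 2608.27, which gives a WMA of about 173.88. The simple average of the same five values is about 174.24, so the WMA is pulled toward the lower, more recent values.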
Next Class
● Parametric vs Nonparametric Curve Fitting
● Locally estimated scatterplot smoothing (LOESS)
Thank You !!!
