
2.multivariate Analysis & Visualization

Uploaded by

Shashank S
Copyright
© © All Rights Reserved

Visual Analysis of 2D Data

Recap 1D Data visualization


Visual Analysis of 2D Data
TIME-SERIES DATA VISUALIZATION
Data is in a series of particular time periods or
intervals.
Time Series Data
• Period of time
  • Start
  • End
• One or more values at each time
Time Series Data
• No quantitative relationship receives more attention than values changing over time.
• A statistic from 15 years of newspaper analysis:
  • More than 75% feature time series analysis
Time Series Analysis
• Examples:
  • Stock performance over time (day/hour)
  • Fiscal spending over time (year)
  • Effect of a drug over time (days/hours/minutes)
  • Experimental outcome as a function of time
  • Covid cases/deaths over time (days, months)
• Goal:
  • Understand the present
  • (Maybe) predict the future in light of the past.
Time Series Patterns
• Meaningful patterns that come out of time series analysis:
  • Trend
  • Variability
  • Rate of change
  • Co-variation
  • Cycles
  • Exceptions
Time Series Analysis
Trend:
• The overall tendency of a series of values to increase, decrease, or remain stable during a particular period of time.
• Line charts are used to track changes over short and long periods of time.
• When smaller changes exist, line charts are better to use than other charts.
Trend: Line Charts
Anatomy of a Line Chart
Drawn by
- first, mapping the data (time and value) to a 2D point on a Cartesian coordinate grid, and
- then connecting all of these points with a line.
Typically, the y-axis has a quantitative value, while the x-axis is a timescale or a sequence of intervals.
https://datavizcatalogue.com/methods/line_graph.html
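The two drawing steps of a line chart can be sketched in plain Python. This is only an illustration, not how any particular plotting library works internally; the helper names `linear_scale` and `to_polyline`, the canvas size, and the sample data are assumptions made for this example.

```python
def linear_scale(value, domain, pixel_range):
    """Map a data value from its domain onto a pixel range (hypothetical helper)."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def to_polyline(times, values, width=400, height=200):
    """Step 1: map each (time, value) pair to a 2D point on the canvas."""
    x_domain = (min(times), max(times))
    y_domain = (min(values), max(values))
    points = []
    for t, v in zip(times, values):
        x = linear_scale(t, x_domain, (0, width))
        # Screen y grows downward, so the largest value maps to y = 0.
        y = linear_scale(v, y_domain, (height, 0))
        points.append((x, y))
    return points


# Made-up sample data: time in days, value in arbitrary units.
times = [0, 1, 2, 4]
values = [10, 20, 15, 30]
points = to_polyline(times, values)

# Step 2: connect consecutive points with line segments.
segments = list(zip(points, points[1:]))
print(points)
```

Because the x position comes from the time value itself, the irregular gap between day 2 and day 4 shows up as a wider segment.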
Trend: Line Charts
• Line charts are one of the oldest forms of data visualization.
• Data Types:
  • Quantitative independent variable (time),
  • Quantitative dependent variable (change)
• Visual Properties:
  • Position, color-hue
• Line charts let the viewer compare changes over the same period of time for more than one group.
Time Series Pattern
Variability:
• Average degree of change from one point in time to the next throughout a particular span of time.
• Line graphs are good at showing the degree of variability.
  • Jagged line: greater variability
• Caution:
  • A narrow scale exaggerates the appearance of variability.
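One simple way to put a number on variability as defined here is the mean absolute change between consecutive points. The sketch below assumes that measure; the function name `mean_absolute_change` and both series are invented for illustration.

```python
def mean_absolute_change(values):
    """Average size of the step from one time point to the next."""
    steps = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return sum(steps) / len(steps)


# Two made-up series over the same five time points.
smooth = [10, 11, 12, 13, 14]   # steady trend, small steps
jagged = [10, 14, 9, 15, 8]     # large swings up and down

smooth_variability = mean_absolute_change(smooth)
jagged_variability = mean_absolute_change(jagged)
print(smooth_variability, jagged_variability)
```

The jagged series yields the larger number, which matches what a line graph of it would show visually.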
Time Series Pattern
Rate of Change:
• Change from one value to another
• Expressed as the percentage difference
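As a small worked example, the rate of change between consecutive values can be computed as (current - previous) / previous × 100. The function name `percent_changes` and the sales figures below are hypothetical.

```python
def percent_changes(values):
    """Percentage change between each pair of consecutive values.

    Assumes no value that serves as a 'previous' value is zero.
    """
    changes = []
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous * 100.0)
    return changes


monthly_sales = [200.0, 250.0, 225.0, 270.0]  # made-up data
rates = percent_changes(monthly_sales)
print([round(r, 1) for r in rates])  # prints [25.0, -10.0, 20.0]
```

Plotting `rates` instead of the raw values puts the emphasis on how fast the series is changing rather than on its level.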
Time Series Patterns
Cycles:
• Patterns that repeat at regular intervals (e.g., daily, weekly, monthly, etc.)
Exceptions:
• Values that fall outside the norm
Line Chart: Guidelines
• Use them to show trends or changes over time, such as price changes. Place line charts where they can get attention.
• Avoid putting numbers on the line chart itself.
• Do not place too many lines on the chart: highlight just one or two important lines in the chart but keep the others as context in the background.
https://www.eea.europa.eu/data-and-maps/daviz/learn-more/chart-dos-and-donts
Time Series Displays
• Line Graphs/Area Charts
• Bar Graphs
• Point Plots
• Radar Graphs
• Heatmaps
• Box plots
• Scatter plots
Line Graph: Often used. Better shows:
• Sequential flow/change of values over time
• The slope displays the extent and rate of change
• Easy to see the overall changes for the time period, and the ups and downs from time to time.
Time Series Displays
• Area chart: represents the change in one or more quantities over time.
  • Similar to a line graph: data points are plotted and then connected by line segments to show the value of a quantity at several different times.
  • Different from a line graph: the area between the x axis and the line is filled in with color or shading.
Anatomy of an Area Chart
https://datavizcatalogue.com/methods/area_graph.html
Area Chart
• Primary Use: Changes over time
• Data type:
  • Quantitative values
• Visual property:
  • Height, slope, area
• Area charts are a good choice when you want to show a trend over time but aren't as concerned with showing exact values.
Area Chart
• William Playfair [1786] is credited with inventing the area chart as well as the line, bar, and pie charts. [Wiki]
William Playfair
• “The Scottish Scoundrel Who Changed How We See Data: When he wasn’t blackmailing lords and being sued for libel, William Playfair invented the pie chart, the bar graph, and the line graph.” Atlas Obscura (June 28, 2016).
• Apprenticed with James Watt, of steam engine fame,
• failed at silversmithing,
• falsely claimed to have invented semaphore telegraphy,
• tried blackmailing a Scottish lord,
• sold tracts of American land he didn't actually own to French nobility, and
• died in poverty and obscurity.
http://conversableeconomist.blogspot.com/2017/08/william-playfair-inventor-of-bar-graph.html
Difference Area Chart
• William Playfair [1786] is credited with inventing line, bar, area, and pie charts along with difference area charts.
Exports and imports to and from Denmark & Norway from 1700 to 1780 [Wiki]
Overlapping Area Chart
https://en.wikipedia.org/wiki/Area_chart
Stacked Area Chart
Almost all countries are contributing to the rise, with emissions in China up 4.7%, in the US by 2.5% and in India by 6.3% in 2018. The EU’s emissions are near flat.
https://www.theguardian.com/environment/2018/dec/05/brutal-news-global-carbon-emissions-jump-to-all-time-high-in-2018
Stacked Area Chart: Anatomy
• Stacked Area Graphs work in the same way as simple Area Graphs do, except for the use of multiple data series that start each point from the point left by the previous data series.
• The entire graph represents the total of all the data plotted.
• They do not work for negative values.
https://datavizcatalogue.com/methods/stacked_area_graph.html
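The stacking rule, where each series starts from the point left by the previous one, amounts to a running sum. Here is a minimal sketch in plain Python; the function name `stack_series` and the three sample series are assumptions for illustration only.

```python
def stack_series(series_list):
    """Return a (lower, upper) boundary pair for every series.

    Each series starts where the previous one ended, so the upper
    boundary of the last series is the total of all the data.
    """
    lower = [0.0] * len(series_list[0])
    layers = []
    for series in series_list:
        if any(value < 0 for value in series):
            # Stacking is not meaningful for negative values.
            raise ValueError("stacked areas do not work for negative values")
        upper = [base + value for base, value in zip(lower, series)]
        layers.append((lower, upper))
        lower = upper
    return layers


# Made-up data: three series measured at three time points.
data = [[1, 2, 3], [2, 2, 1], [1, 0, 2]]
layers = stack_series(data)
totals = layers[-1][1]
print(totals)
```

A plotting library would fill the region between each `lower` and `upper` boundary with that series' color.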
Stacked Area Chart
• Primary Use: Changes over time
• Data type:
  • Quantitative variables
  • For multiple data series with part-to-whole relationships or for cumulative series of values.
• Visual property:
  • Height, area, color-hue
Stream Graph/Stream Chart
• A different visualization of the stacked area chart.
• Instead of plotting values against a fixed, straight axis, a Stream Graph has values displaced around a varying central baseline.
http://nvd3.org/examples/stackedArea.html
Stacked Chart vs. Stream Chart
Stream Graph/Stream Chart: Anatomy
https://datavizcatalogue.com/methods/stream_graph.html
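To make the "varying central baseline" concrete, the sketch below uses the simplest possible choice: start the stack at minus half of the total at each time point, so the whole shape is symmetric around zero. Real stream graph implementations may use more elaborate baselines; the function name `stream_layers` and the data are hypothetical.

```python
def stream_layers(series_list):
    """Stack series around a central baseline instead of a fixed axis.

    Assumption: the baseline at each time point is -total/2, which
    centers the stacked shape around y = 0.
    """
    n_points = len(series_list[0])
    totals = [sum(series[i] for series in series_list) for i in range(n_points)]
    lower = [-total / 2 for total in totals]
    layers = []
    for series in series_list:
        upper = [base + value for base, value in zip(lower, series)]
        layers.append((lower, upper))
        lower = upper
    return layers


# Made-up data: three series measured at three time points.
data = [[1, 2, 3], [2, 2, 1], [1, 0, 2]]
layers = stream_layers(data)
bottom_edge = layers[0][0]
top_edge = layers[-1][1]
print(bottom_edge, top_edge)
```

Since the bottom edge moves with the data, there is no fixed axis against which to read exact values.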
Making Sense of Stream Graph
Cons:
• Stream Graphs suffer from legibility issues, as they are often very cluttered with large datasets.
• It's impossible to read the exact values visualised in a Stream Graph, as there is no axis to use as a reference.
• http://www.visualisingdata.com/2010/08/making-sense-of-streamgraphs/
Stream Chart / Stream Graph
The overall effect is artful.
The New York Times produced this fascinating graphic showing box office receipts for films from 1986-2008.
Patterns in the amount of money films made at the box office over a 21-year period. The total takings are shown by the varying heights the shapes reach over time. The color scheme represents the gross takings and the length of each shape reveals its longevity in the cinemas.
Ref: Matthew Bloch; Lee Byron; Shan Carter; Amanda Cox (23 February 2008). "The Ebb and Flow of Movies: Box Office Receipts 1986–2007". The New York Times.
Line Graphs vs Area Charts
• Area charts work best if the total is as important as its parts. If the total (= the height of all your stacked areas) is not important, consider a line chart instead. Many readers will have an easier time understanding a line chart than an area chart.
• Area charts are good for showing the parts of the whole over time.
https://blog.datawrapper.de/area-charts/
Time Series Displays
• Bar Graphs: used to emphasize individual values at distinct points in time.
• Point Plots:
  • Go along with line plots.
  • For analyzing data at irregular time intervals.
• Radar/Spider Graphs: the circular shape can be used to show the cyclical nature of time.
• Heatmaps: display cyclical data that cannot be displayed clearly using a line plot.
• Candlestick chart: used a lot in the financial world.
  • It visualizes the change of price information against time by providing multiple price parameters at the same time.
Candlestick Chart
Symbols resemble a Box Plot, but they function differently and therefore are not to be confused with one another.
A candle shows the
• high, low,
• open, and
• close price information for a predetermined time interval.
https://en.wikipedia.org/wiki/Candlestick_chart
Sample market data and Candlestick chart
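As a rough illustration of what one candle encodes, the sketch below reduces the prices observed within one time interval to the four values a candle shows. The function name `to_candle`, the `direction` label, and the price list are assumptions for this example, not taken from any trading library.

```python
def to_candle(prices):
    """Summarize the prices seen in one interval as open/high/low/close."""
    return {
        "open": prices[0],     # first price of the interval
        "high": max(prices),   # top of the upper wick
        "low": min(prices),    # bottom of the lower wick
        "close": prices[-1],   # last price of the interval
    }


# Made-up prices observed during a single trading day, in time order.
day_prices = [101.0, 103.5, 99.8, 102.2]
candle = to_candle(day_prices)

# The candle body spans open to close; color usually encodes direction.
direction = "up" if candle["close"] >= candle["open"] else "down"
print(candle, direction)
```

Unlike a box plot, none of these four values is a statistical quartile: they are simply the first, largest, smallest, and last price in the interval.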
Time Series Analysis and Best Practices
• Visualization:
  • Use a line chart
  • Line chart with point plot: use for non-regular time intervals
  • Bar chart: use for comparing specific x-axis values.
• Aggregate to various time intervals
• Use a log scale if values as a function of time cover a very large range
• Optimize the graph aspect ratio
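The log-scale advice can be illustrated by comparing where values would land on an axis of length 1 under a linear scale and under a base-10 log scale. This is a sketch with invented data; the variable names are hypothetical.

```python
import math

# Made-up series whose values span five orders of magnitude.
values = [1, 10, 100, 1000, 100000]
largest = max(values)

# Relative axis position (0 = bottom, 1 = top) on a linear scale.
linear_positions = [v / largest for v in values]

# Relative axis position on a base-10 log scale (all values must be > 0).
log_positions = [math.log10(v) / math.log10(largest) for v in values]

for v, lin, log in zip(values, linear_positions, log_positions):
    print(f"{v:>6}  linear={lin:.5f}  log={log:.2f}")
```

On the linear scale the first four values are squeezed into the bottom 1% of the axis, while the log scale spreads them out evenly.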
Common, Effective
Data Visualization
Techniques
2D DATA
Recap
• Time series data:
  • Line plot
Time series: Aspect ratio
Time series: Try different scales for Y
Recap
• Time series data:
  • Line plot
  • Area plot:
    • Mostly used for multivariate data to show part-to-whole relationships over time.
  • Point plots:
    • Go along with line plots.
    • Mostly used for analyzing data at irregular time intervals.
  • Radar graphs:
    • Polar line plot
    • The circular shape can be used to show the cyclical nature of time.
  • Candlestick plot:
    • A chart used to describe price movements of a stock, derivative, or currency over time.
  • Bar plot:
    • Bar charts are also used to track changes over time. However, when trying to measure change over time, bar graphs are best when the changes are larger and the number of time events is smaller.
    • Also used when comparing multivariate data.
Time Series Viz: Timeline Chart
• Events (nominal data) happening as a function of time.
• A timeline shows a list of events in chronological order.
• The events are mostly placed on a linear time scale.
• Timelines are useful for telling a story of events over time.
• It's like a time-series line chart with no Y dimension.
Visual Analysis of 2D data
FOR COMPARISON, RANKING AND PART-TO-WHOLE RELATIONSHIP
Bar Chart
The US economy added a whopping 517,000 jobs in January
Source: https://www.cnn.com/2023/02/03/economy/january-jobs-report-final/index.html
Bar Chart
Frequency of Letters in English Text
• Primary Use:
  • Comparing categories, comparing changes over time
• Data Types:
  • Discrete or categorical independent variable and quantitative dependent variable
• Visual Properties:
  • Length/height, color-hue
Bar Chart
• Presents time series or categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
• The bars can be plotted vertically or horizontally.
• Used for comparing.
Anatomy of a Bar Chart
Source: https://datavizcatalogue.com/methods/bar_chart.html
Bar Chart
Source: https://www.cdc.gov/measles/cases-outbreaks.html
Effective visualization: Vertical or horizontal
• With long category names or a large number of categories:
  • A vertical bar chart can be unintuitive.
  • A horizontal bar chart is preferable.
Bar Chart: Artistic Rendition using Shapes
Span Chart
Source: Report from Center for Excellence in Education
https://www.cee.org/about-us/cee-index-excellence-stem-education
Anatomy of Span Chart
https://datavizcatalogue.com/methods/span_chart.html
Multivariate Bar Chart
BAR CHARTS CAN PRESENT AND COMPARE MORE THAN ONE GROUP OF DATA AT A TIME.
Population Pyramid
• Bivariate horizontal bar chart:
  • Independent dimension: categorical (ordinal) data
  • Dependent dimension: quantitative data
  • Population: male and female
Source: https://www.populationpyramid.net/india/2020/
Population Pyramid
Source: https://www.visualcapitalist.com/population-pyramids-compared/
Grouped Bar Chart (Clustered Bar Chart)
• Grouped bar charts are good for comparing between each element in the categories, and comparing elements across categories or changes over time.
Source: https://www.reuters.com/world/india/despite-world-beating-growth-indias-lack-jobs-threatens-its-young-2023-05-30/
Grouped Bar Chart
• Primary Use:
  • Comparing categories, comparing changes over time
• Data Types:
  • Discrete or categorical independent variable and quantitative dependent variable
• Visual Properties:
  • Length/height, color-hue
Stacked Grouped Bar Chart
• Presents more than one group of data at a time.
• Compares part-to-whole relationships over time.
My iPhone screen time screenshot
https://www.pewresearch.org/fact-tank/2023/01/03/118th-congress-has-a-record-number-of-women/ft_22-01-03_womencongress_1/
Diverging Bar Chart
• A variation on the stacked bar chart is one in which the stacks diverge from a central baseline in opposite directions.
• Used to show differences in opposing sentiments or groups.
Source: "Comparing Categories". Chapter 4. Better Data Visualization
Grouped Bar Chart
• Shows variation among multiple groups as percentages rather than absolute numbers.
Gender Diversity in US Congress by Age group
Bar Chart
• General Guideline
  • Ordering:
    • Use the bars in any order for nominal categories, but keep them in order for ordinal categories.
Source: https://exceljet.net/glossary/ordinal-data
  • Limit the number of bars.
Too crowded: Not ideal
Plotly Express bar chart Support
• Bar chart and variants
  • https://plotly.com/python/bar-charts/
  • px.bar(df, x=…, y=…, color=…)
  • https://seaborn.pydata.org/generated/seaborn.barplot.html
  • sns.barplot(data=df, x="…", y="…", hue="…")
Bar Charts are Good, but …
Multiple example collections from: https://viz.wtf/search/bar+chart
Bar Charts are Good, but …
• With bar charts it's easy to distort the proportion by changing the scale.
Source: https://tradingeconomics.com/india/enrolment-in-upper-secondary-education-both-sexes-number-wb-data.html
Misleading charts 1
see http://viz.wtf/
Bar Chart: Best Practices
• General Guideline
  • Order:
    • Use the bars in any order for nominal categories, but keep them in order for ordinal categories.
  • Limit the number of bars.
  • Use Horizontal:
    • When there is a large number of different categories and insufficient space to fit all the columns required for a vertical bar chart across the page
    • When categories have long names (difficult to put them below vertical bars)
  • Some choose a horizontal chart for nominal categories and a vertical chart for ordinal categories.
  • Some call vertical charts Column Charts and horizontal charts Bar Charts (e.g., Excel).
  • In general, do not show a bar chart without a zero baseline.
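A quick calculation shows why the zero matters: a bar's drawn height is its value minus the axis start, so moving the axis start changes the apparent ratio between bars. The function name `drawn_height_ratio` and the numbers are hypothetical.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar heights when the axis begins at axis_start."""
    return (value_a - axis_start) / (value_b - axis_start)


# Made-up values that differ by only 4%.
a, b = 52, 50

honest = drawn_height_ratio(a, b)                    # axis starts at zero
truncated = drawn_height_ratio(a, b, axis_start=48)  # axis starts at 48
print(honest, truncated)
```

With the axis starting at 48, the first bar is drawn twice as tall as the second even though the underlying values are almost equal.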
Bar Charts are Good, but …
Confusing label
https://www.idealmedicalcare.org/1103-doctor-suicides-13-reasons-why/
Bar Charts are Good, but …
What scale is used?
Source: https://viz.wtf/post/90690163613/bar-chart-table#notes
Bar Chart: Best Practices
• General Guideline
  • ...
  • In general, do not show a bar chart without a zero baseline.
  • Label the chart properly
Misleading Charts!
see http://viz.wtf/
Bar Chart: Best Practices
• General Guideline
  • ...
  • Label your scales correctly
  • Do not use multiple scales
See http://viz.wtf/
Variations to Bar Chart
Lollipop Chart
• Very similar to a normal bar chart.
• The bar is replaced by a line anchored at the x axis and a dot at the end to mark the value.
• Conveys the same information as bar charts.
https://datavizproject.com/data-type/lollipop-chart/
Lollipop Chart
• The dot at the end may be replaced by another symbol.
• As in a bar chart, coordinates may be flipped to make the segments horizontal.
Source: http://www.datarevelations.com/tag/ryan-sleeper
Radial Bar (Column) Chart
• Uses a grid of concentric circles to plot bars on. Each circle on the graph represents a value on a scale, while the radial dividers (lines spanning from the center) are used for each category.
• Same purpose as the bar chart
Ref: https://datavizcatalogue.com/methods/radial_column_chart.html
Circular Bar Chart
• A bar chart plotted on a polar coordinate system, rather than on a Cartesian one.
• Same purpose as the bar chart. The angular distance of the end point is proportional to the value that it represents.
Circular Bar Chart
• Pros: Compact
• Cons: The problem with Radial Bar Charts is that the bar lengths can be misinterpreted. Each bar on the outside gets relatively longer than the last, even if they represent the same value.
• Our visual systems are better at interpreting straight lines, so the Cartesian bar chart is a better choice for comparing values. Therefore, Radial Bar Charts are used primarily for aesthetic reasons.
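The length distortion follows from arc length = radius × angle: two bars encoding the same value sweep the same angle, but the one on an outer ring is physically longer. A small sketch, with the function name `bar_arc_length` and all numbers assumed for illustration:

```python
import math


def bar_arc_length(value, max_value, radius, full_angle=2 * math.pi):
    """Drawn length of a circular bar: the value sets the angle swept."""
    angle = value / max_value * full_angle
    return radius * angle


# The same value (50 out of 100) drawn on an inner and an outer ring.
inner = bar_arc_length(50, 100, radius=1.0)
outer = bar_arc_length(50, 100, radius=3.0)
print(inner, outer, outer / inner)
```

The outer bar comes out three times as long as the inner one although both represent the same value.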
Polar Area Diagram (Coxcomb Chart)
UK temperatures in 2012.
• Same purpose as the bar chart.
• Suitable for cyclic data
• Constant angle division
• Radius proportional to the value.
• Pros: Good for cyclical data.
• Cons:
  • The area of the sectors is proportional to the squared radius, so it amplifies the data.
http://prcweb.co.uk/radialbarchart/
Nightingale’s Rose Chart
• This chart was famously used by Florence Nightingale to communicate the avoidable deaths of soldiers during the Crimean War.
• In her chart, the area represented the data, not the radius.
https://datavizcatalogue.com/methods/nightingale_rose_chart.html
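The difference between the two encodings can be checked numerically. A sector with angle θ and radius r has area ½·r²·θ, so making the radius proportional to the value squares the visual difference, while choosing r = sqrt(2·value/θ) makes the area equal to the value. The function names and sample values below are assumptions for illustration.

```python
import math


def sector_area(radius, n_sectors):
    """Area of one sector when the circle is split into equal angles."""
    theta = 2 * math.pi / n_sectors
    return 0.5 * radius ** 2 * theta


def radius_for_value(value, n_sectors):
    """Radius that makes the sector area equal to the value."""
    theta = 2 * math.pi / n_sectors
    return math.sqrt(2 * value / theta)


small, large = 1.0, 4.0   # made-up values, one is 4x the other
n = 12                    # e.g., one sector per month

# Encoding 1: radius proportional to the value.
ratio_radius_encoding = sector_area(large, n) / sector_area(small, n)

# Encoding 2: area proportional to the value.
ratio_area_encoding = (
    sector_area(radius_for_value(large, n), n)
    / sector_area(radius_for_value(small, n), n)
)
print(ratio_radius_encoding, ratio_area_encoding)
```

With radius encoding the larger sector covers 16 times the area for a value only 4 times as large; with area encoding the ratio stays at 4.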
Funnel Chart
• A type of chart that shows progress through a series of linear and interconnected stages in which the data values typically decrease.
• A variation of the bar chart, in which bars are center aligned.
• Typical use:
  • In sales, marketing, and product teams to improve their processes.
  • Helps to visualize the bottlenecks in a process.
https://plotly.com/python/funnel-charts/
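One way to read a funnel for bottlenecks is to compute the conversion rate between consecutive stages and look for the lowest one. The stage names and counts below are invented, and this is only a sketch of the arithmetic, not of any charting API.

```python
# Made-up funnel: (stage name, number of users reaching that stage).
stages = [
    ("Visited site", 1000),
    ("Added to cart", 400),
    ("Started checkout", 200),
    ("Paid", 150),
]

# Conversion rate (in percent) from each stage to the next.
conversions = []
for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
    conversions.append((prev_name, name, count * 100 / prev_count))

# The bottleneck is the step with the lowest conversion rate.
bottleneck = min(conversions, key=lambda step: step[2])
print(conversions)
print("Bottleneck:", bottleneck[0], "->", bottleneck[1])
```

In this made-up funnel the largest relative loss happens at the first step, which is where the funnel chart would narrow most sharply.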
Ranking
ORDERED COMPARISON.
Bar Chart for Ranking
• Bar charts are great for ranking.
• Done by sorting the data.
• Some plot methods, including those in Plotly, allow you to draw the bars in an ordered fashion.
Ranking
• Patterns in Ranking
  • Uniform: all values are roughly the same.
  • Different:
    • Uniformly different: differences between adjacent data are the same
    • Non-uniformly different: differences between adjacent data vary
    • Increasingly/decreasingly different
    • Alternating difference: differences begin small, shift to large, and then shift back to small.
    • Exceptional: one or more data are exceptionally different from the next
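Sorting for a ranking chart and inspecting the gaps between adjacent values takes only a few lines of plain Python. The region names and sales figures are invented for illustration.

```python
# Made-up data: sales per region.
sales = {"North": 120, "South": 95, "East": 150, "West": 60}

# Rank categories from largest to smallest before plotting the bars.
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)

# Differences between adjacent ranks reveal the ranking pattern
# (uniform, increasing, alternating, exceptional, ...).
gaps = [upper[1] - lower[1] for upper, lower in zip(ranked, ranked[1:])]

print(ranked)
print(gaps)
```

The bars would then be drawn in the order given by `ranked`; here the gaps vary from rank to rank, which corresponds to the non-uniformly different pattern.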
Lollipop chart for ranking

Part-to-whole & Ranking
• Two of the simplest types of analysis involve
  • comparing parts with the whole, and
  • ranking the parts by value
• Frequently done as well
Pie Chart
• Pie charts are best to use when you are trying to compare parts of a whole (100 percent). Each slice represents a different piece of data.
• A pie chart does not show changes over time.
• It represents percentages at a set point in time.
Source: https://datavizcatalogue.com/methods/pie_chart.html
Part-to-whole
• How do individual parts make up the whole of something?
  • Ex: faculty and student proportions at BITS.
• Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%).
Pie Chart
• Part-to-whole relationship (simple proportions, specific percentages)
• Data types:
  • Categorical, quantitative
• Visual properties:
  • Angle, area, color-hue
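The mapping from category counts to pie slices is a ratio to the whole, expressed either as a percentage or as an angle out of 360 degrees. A minimal sketch, with the function name `pie_slices` and the counts made up for illustration:

```python
def pie_slices(counts):
    """Return {category: (percentage, angle in degrees)} for each slice."""
    total = sum(counts.values())
    return {
        category: (value * 100 / total, value * 360 / total)
        for category, value in counts.items()
    }


# Made-up category counts.
counts = {"A": 50, "B": 30, "C": 20}
slices = pie_slices(counts)
for category, (percent, angle) in slices.items():
    print(f"{category}: {percent:.1f}% -> {angle:.1f} degrees")
```

The percentages always sum to 100 and the angles to 360, which is why a pie chart only makes sense for parts of a single whole.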

Populations of English native speakers [Wiki]
Pie Chart
• The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801.
• A simple and very good way to show part-to-whole relationships
• Compares relative sizes.
https://en.wikipedia.org/wiki/William_Playfair
Doughnut (Donut) Chart
• A variant of the pie chart, with a blank center allowing for additional information about the data as a whole to be included. [Wiki]
https://www.pewresearch.org/religion/2021/06/29/religion-in-india-tolerance-and-segregation/
3D Pie Chart (Not to be used)
• Often used in media
• A 3D pie chart should never be used
• Note: the third dimension makes the data difficult to read (the distorting effect of perspective associated with the third dimension).
• The use of superfluous dimensions not used to display the data of interest is discouraged for charts in general, not only for pie charts. [Wiki]
https://developers.google.com/chart/interactive/docs/gallery/piechart
Infamous MacWorld's iPhone Pie Chart
Perspective Trick Makes 19.5% Look Bigger Than 21.2%
Waffle Chart/Square Pie Chart
Part-to-whole relationship
Mapping: area, color-hue, symbol
src: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Waffle%20Chart
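A waffle chart needs each category's share converted into a whole number of cells that add up to the grid size. The sketch below assumes a 100-cell grid and uses largest-remainder rounding, which is one reasonable choice among several; the function name `waffle_cells` and the counts are hypothetical.

```python
def waffle_cells(counts, cells=100):
    """Allocate grid cells to categories in proportion to their counts."""
    total = sum(counts.values())
    exact = {k: v * cells / total for k, v in counts.items()}
    # Start with the whole-number part of each share.
    allocation = {k: int(share) for k, share in exact.items()}
    leftover = cells - sum(allocation.values())
    # Hand the remaining cells to the largest fractional remainders.
    by_remainder = sorted(
        exact, key=lambda k: exact[k] - allocation[k], reverse=True
    )
    for k in by_remainder[:leftover]:
        allocation[k] += 1
    return allocation


# Made-up counts that do not divide evenly into 100 cells.
counts = {"A": 50, "B": 30, "C": 25}
cells = waffle_cells(counts)
print(cells)
```

Each category is then drawn as that many squares of one color (or symbol), so area carries the part-to-whole relationship.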
Pie Chart: To use or not to use
• Most used and most criticized of all charts!
• Why the criticism?
  • Difficulty in showing more than a few values
    • When slices become too small, pie charts have to rely on colors, textures or arrows so the reader can understand them. This makes them unsuitable for use with larger amounts of data.
  • Our eyes are great at comparing differences in 2-D location and differences in line length, but not 2-D areas and angles.
Note: Fictional data, shown for the purpose of illustration only.
https://en.wikipedia.org/wiki/Pie_chart
Pie Chart Problems

 Quote: [Stephen Few: “Save the Pies for Dessert”, Visual Business
Intelligence Newsletter, August 2007]
Pie Chart Problems

Save the pies for dessert


A Better Choice

save the pies for dessert


Pie Chart: To use or Not to use
 Most used and most criticized of all charts!
 Why criticism?
 difficulty in showing more than a few values
 When slices become too small, pie charts have to rely on colors, textures or arrows
so the reader can understand them. This makes them unsuitable for use with larger
amounts of data.
 Pie charts also take up a larger amount of space on the page compared to
the more flexible bar charts, which do not need to have separate legends
 Our eyes are great at comparing differences in 2-D location and differences
in line length, but not good at comparing 2-D areas and angles.
 [Edward Tufte: The Visual Display of Quantitative Information p.178]:
“A table is nearly always better than a dumb pie chart, the only thing worse than
a pie chart is several of them…Given their low data-density and failure to order
numbers along a visual dimension, pie charts should never be used.”
Pie Chart may be preferable if…
 What if, instead, the only point you want to convey is the proportion of
market captured by A compared to the remaining?

http://speakingppt.com/2013/03/18/why-tufte-is-flat-out-wrong-about-pie-charts/
Pie Chart may be preferable if…
 When you’re comparing percentages, bars are NOT more effective than
pie charts.

Which one
communicates
more quickly?

http://speakingppt.com/2013/03/18/why-tufte-is-flat-out-wrong-about-pie-charts/
Pie Charts can be good

 When you want to quickly communicate a part-to-whole relationship
 Nothing beats a pie chart for instantly communicating that the whole
represents 100%.
 When approximate values are enough to have a productive discussion
 Pie charts are also immune to the distortion caused by a change in
scale.
Pie Charts: Guidelines

 Use it for Part-to-whole relationship


 money, percentages
 Keep it simple.
 Limit the number of slices
 Work clockwise from largest to smallest "slice."
 Label carefully.
Part-to-whole and Ranking relationship
 Pie charts and bar charts are commonly used.
 The area of the pie slice, or the angle formed by the slice at the
center, is used to represent the part.
 The height of the bar is used to represent the part.
 Bar charts may be more precise for ranking and part-to-whole (when
plotting % values).
Pareto Charts: Part-to-whole and
ranking relationship
 Pareto charts are useful for
ranking relationship and to
examine the cumulative
contribution of parts to the whole.
 May be considered to be the best
compromise for showing Part-to-
whole and ranking.
Pareto Chart
 A Pareto chart: contains
both bars and a line graph, where
individual values are represented in
descending order by bars, and the
cumulative total is represented by the
line. [Wiki]
The name comes from the Pareto principle (also known as the 80/20 rule),
which states that, for many events, roughly 80% of the effects come from
20% of the causes. It is named after the Italian economist Vilfredo Pareto.
Example: roughly 80% of federal income tax is paid by 20% of earners.

https://en.wikipedia.org/wiki/Pareto_chart
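The two ingredients of a Pareto chart (bars in descending order and a cumulative line) can be sketched in a few lines. This is a minimal illustration, not a plotting recipe: the function name `pareto_table` and the defect counts are made up for the example.

```python
def pareto_table(counts):
    """Sort categories by value (descending) and attach cumulative percentages.

    Returns a list of (name, value, cumulative_percent) rows: the values are
    the bar heights and the cumulative percentages are the points of the line.
    """
    total = sum(counts.values())
    rows = []
    running = 0.0
    for name, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((name, value, 100.0 * running / total))
    return rows


# Hypothetical defect counts, invented for illustration only.
defects = {"scratch": 10, "dent": 50, "crack": 30, "stain": 10}
for name, value, cum_pct in pareto_table(defects):
    print(f"{name:8s} {value:4d} {cum_pct:6.1f}%")
```

Reading the cumulative column from the top shows how few categories are needed to reach a given share of the whole.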
Ranking Changes over Time:
Slope Charts
Slope charts:
 dot chart for ranking
 line chart for time sequence

Edward Tufte is credited with this chart (see his 1983 book The Visual
Display of Quantitative Information).
Deviation Analysis
Deviation Analysis

 How one or more sets of values


deviate from a reference set of
values.
 Two best graphs: Bar graph, Line
graph
 Displays a reference line
Deviation Analysis
 Diverging bar chart
 Diverging lollipop chart
 Requires
 Computation of the deviation
 In this figure: 100 * (x − reference) / reference
Deviation Analysis

 How one or more sets of values deviate from a reference set of values.
 Two best graphs: Bar graph, Line graph
 Displays a reference line
 Reference Line:
 Current target
 Future target
 Current forecast
 Same point in time in the past
 Immediately prior period
 Standard/Norm
Deviation plot from a non-zero
baseline

https://blogs.sas.com/content/iml/2020/03/02/deviation-plot-baseline.html
Diverging Bars: Z-score
 Deviation is sometimes computed as a standard score, also called a
Z-score or Z-value. It is computed as follows:
$z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
Diverging Bars: T-score
 A T-score is a type of normalized score, usually used for psychometric
tests, in which the mean is 50 and the standard deviation is 10.

Psychometric tests:
• Intelligence.
• Aptitudes and skills.
• Personality.
Deviation as a function of time
Deviation analysis

 Techniques and Best Practice


 Expressing deviation as percentage, Z-score (or T-score)
 Comparing Deviation to other points of reference
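The three ways of expressing deviation mentioned above can be sketched as follows. This is an illustrative sketch only: the function names are invented, the population standard deviation is used for the Z-score, and the T-score is assumed to be the usual rescaling 50 + 10·z (mean 50, standard deviation 10).

```python
import statistics


def percent_deviation(x, reference):
    """Deviation from a reference value, as a percentage of the reference."""
    return 100.0 * (x - reference) / reference


def z_scores(values):
    """Standard scores: (x - mean) / standard deviation (population form)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def t_scores(values):
    """Assumed T-score convention: rescale z so the mean is 50 and the SD is 10."""
    return [50.0 + 10.0 * z for z in z_scores(values)]


sales = [80.0, 95.0, 100.0, 105.0, 120.0]  # made-up numbers
print([round(percent_deviation(s, 100.0), 1) for s in sales])
print([round(z, 2) for z in z_scores(sales)])
print([round(t, 1) for t in t_scores(sales)])
```

Plotting any of these three lists as bars around their reference (0%, z = 0, or T = 50) gives a diverging bar chart.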
Scatter plot and Correlation Analysis
Correlation Analysis

 Like distribution analysis (for 1D data), correlation analysis is routinely
performed by statisticians, engineers, and scientists.
 It allows us:
 to see any relationship between the two dimensions
 to track down the cause of something happening
 to anticipate/predict the future
 Used in many fields, e.g. economics, psychology, medicine, and
social sciences.
Scatter plot: Visual tool for
Correlation Analysis

Ref: https://clauswilke.com/dataviz/visualizing-associations.html#associations-scatterplots
Scatter Plot

 Scatterplot: invariably the first choice for correlation analysis
 Two quantitative variables
 Mapped to position
 Either variable may be called “independent” and the other
“dependent”

https://datavizcatalogue.com/methods/scatterplot.html
Scatter Plot

 By having an axis for each variable, you can detect whether a
relationship or correlation between the two exists.
 Lets you visually assess the correlation
 Strength: to what degree one affects the other
 Direction: what sort of correlation: positive, negative or neutral
 Shape: linear or curvilinear

https://datavizcatalogue.com/methods/scatterplot.html
Strength of Correlation

 Rule of Thumb: Fat Pencil Test
 Imagine laying a fat pencil on top of the drawn line fitting the plotted
points. If the pencil body covers up the plotted points, it passes the
test, and you can conclude that the correlation between the two
characteristics is strong.
Correlation coefficient

 A numerical measure to quantify correlation.
 Several types of correlation coefficient exist, each with its own
definition, range of usability and characteristics.
 They all assume values in the range from −1 to +1, where +1 indicates
the strongest possible positive association, −1 the strongest possible
negative association, and 0 no association of the kind being measured.
Pearson Correlation Coefficient

a.k.a. Linear Correlation Coefficient (r)


 Assumes a linear association between
two variables.
 Measures the direction and strength
of the linear relationship
 Note:
 There is no assumption of causality
Correlation Coefficient (r)
 No distinction between the independent/explanatory (x) and the
dependent/response (y) variable.
 Requires both variables to be quantitative or continuous variables.
 Assumes both variables to be normally distributed.
 The closer |r| is to 1, the higher the correlation (+1 for perfect
positive and -1 for perfect negative correlation).
 Ratio of the sample covariance between x and y to the product of the
sample standard deviations of x and of y:
$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$

https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
Variance
 Measures how far a set of numbers is spread out.
 A measure of how much something changes.
 Variance describes how much a random variable differs from its
expected value (mean).
 Defined as the average of the squares of the differences between the
individual (observed) values and the expected value.
 It is never negative, and it is zero only when all values are equal.
 Note: The variance is not the average difference from the expected value.

Population Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where $\mu = \frac{\sum x_i}{N}$

Sample Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N - 1}$

https://simple.m.wikipedia.org/wiki/Variance
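Variance and standard deviation

Population Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where $\mu = \frac{\sum x_i}{N}$

Expanding the square:
$\frac{\sum (x_i - \mu)^2}{N} = \frac{\sum (x_i^2 - 2\mu x_i + \mu^2)}{N}$
$= \frac{\sum x_i^2}{N} - \frac{2\mu \sum x_i}{N} + \frac{N\mu^2}{N}$
$= \frac{\sum x_i^2}{N} - 2\mu^2 + \mu^2$
$= \frac{\sum x_i^2}{N} - \mu^2$

Sample Variance
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N - 1} = \frac{\sum x_i^2}{N - 1} - \frac{N}{N - 1}\mu^2$

standard deviation $= \sqrt{\text{variance}}$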
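A quick numeric check of the formulas above, using a small made-up data set. It is only a sketch: the variable names are invented for the example, and the results are compared against Python's `statistics` module.

```python
import math
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
n = len(x)
mu = sum(x) / n

# Definition: average squared deviation from the mean.
pop_var = sum((xi - mu) ** 2 for xi in x) / n
# Shortcut from the derivation: mean of squares minus square of the mean.
shortcut = sum(xi ** 2 for xi in x) / n - mu ** 2
# Sample variance divides by N - 1 instead of N.
sample_var = sum((xi - mu) ** 2 for xi in x) / (n - 1)
std_dev = math.sqrt(pop_var)

print(pop_var, shortcut, sample_var, std_dev)
```

Both forms of the population variance agree, while the sample variance is slightly larger because of the N − 1 divisor.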
Covariance
 A measure of the relationship between two random variables.
 Evaluates to what extent the variables change together.
 In other words, it is a measure of the variance between two
variables.
 A positive covariance means that both move together, while a
negative covariance means they move inversely.
 Note: It does not assess the dependency between variables.

Population Covariance
$\mathrm{cov}(x, y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$,
where $\mu_x = \frac{\sum x_i}{N}$ and $\mu_y = \frac{\sum y_i}{N}$

Sample Covariance
$\mathrm{cov}(x, y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N - 1}$
Correlation Coefficient r
Correlation Coefficient r

Another way of interpreting the correlation coefficient:

Let standardized x (or z-score of x): $z_x = \frac{x - \mu_x}{\sigma_x}$,
and standardized y (or z-score of y): $z_y = \frac{y - \mu_y}{\sigma_y}$

$r_{xy} = \frac{1}{N} \sum z_{x,i}\, z_{y,i}$
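The two views of r (covariance divided by the product of standard deviations, and the average product of z-scores) give the same number. The sketch below checks this on made-up data; the function names are invented, and population (divide-by-N) formulas are used throughout so that the two computations match term by term.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _pop_std(values):
    mu = _mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def r_from_covariance(x, y):
    """r = cov(x, y) / (sigma_x * sigma_y), population formulas."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (_pop_std(x) * _pop_std(y))


def r_from_z_scores(x, y):
    """r = (1/N) * sum of products of z-scores."""
    mx, my = _mean(x), _mean(y)
    sx, sy = _pop_std(x), _pop_std(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


hours = [1.0, 2.0, 3.0, 4.0, 5.0]       # made-up data
score = [52.0, 55.0, 61.0, 70.0, 68.0]  # made-up data
print(r_from_covariance(hours, score), r_from_z_scores(hours, score))
```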
Correlation r - Interpretation

 Positive r indicates a positive linear association between the x and y
variables, and negative r indicates a negative linear relationship.
 r is always between -1 and +1.
 Strength increases as r moves away from zero toward -1 or +1.
 The extreme values +1 and -1 indicate a perfect linear relationship
(points lie exactly along a straight line).
How high must a correlation be
to be considered meaningful?
It depends on the discipline. Here are some rough
guidelines:

Discipline        r meaningful if           R² meaningful if
Physics           r < -0.95 or 0.95 < r     0.9 < R²
Chemistry         r < -0.9 or 0.9 < r       0.8 < R²
Biology           r < -0.7 or 0.7 < r       0.5 < R²
Social Sciences   r < -0.6 or 0.6 < r       0.35 < R²
Use of Correlation Analysis

 Predictive modeling:
 Correlation analysis can be used to identify which variables are most
strongly related to an outcome, and this information can be used to
build predictive models.
Correlation Analysis
 Line of best fit: Trend line
 ‘Best fit’ means
 The sum of the squared differences between the actual Y values and the
predicted Y values for X is a minimum.
 Why not minimize the sum of the plain differences between the actual
and predicted Y values?
 Positive differences would offset negative ones.
 The computation is called linear regression.

Ref: https://clauswilke.com/dataviz/visualizing-trends.html
Use of Correlation Analysis (contd…)

 Predictive modeling:
 Correlation analysis can be used to identify which variables are most
strongly related to an outcome, and this information can be used to
build predictive models.
 Identifying confounding variables:
 Correlation analysis can help identify variables that may be influencing
the relationship between two other variables, thus allowing researchers
to control for these confounding variables in their analysis.
Least Squares Regression

 Goal of linear regression:
 The simplest case is to fit a line: y = a + b x
 The equation of the regression line is of the form:
 y = a + b x
 Compute a and b from a set of (x_i, y_i)
 The linear model is written as:
 y_i = a + b x_i + ϵ_i
 So ϵ_i is the difference between the observed value y_i and the
predicted value of y at x_i.
 The least squares method minimizes the sum of the squares of these
differences, i.e. minimize Q(a, b) where

$Q(a, b) = \sum \epsilon_i^2 = \sum \left(y_i - (a + b x_i)\right)^2$

See: https://towardsdatascience.com/regression-analysis-linear-regression-239df26a94ac
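For the straight-line case, minimizing the sum of squared residuals has a well-known closed-form solution: the slope is the ratio of the summed cross-deviations to the summed squared x-deviations, and the intercept follows from the means. The sketch below uses invented function names and made-up data.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for the model y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = y_mean - b * x_mean     # intercept: the line passes through the means
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Q(a, b): the quantity that least squares minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
ys = [2.2, 4.1, 6.2, 7.9, 10.1]  # made-up data
a, b = fit_line(xs, ys)
print(a, b, sum_squared_error(xs, ys, a, b))
```

Nudging a or b away from the fitted values can only increase Q, which is what "best fit" means here.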
Least Squares Regression

 Not limited to fitting a line
 May be generalized to polynomial fitting.
 The polynomial model is written as:
 y_i = a + b x_i + c x_i^2 + … + ϵ_i
 So ϵ_i is the difference between the observed value y_i and the predicted
value of y at x_i.
 The least squares method minimizes the sum of the squares of these
differences, i.e. minimize Q(a, b, c, …)
 So you compute a, b, c, … from the set of values by minimizing the sum of
squared differences with respect to a, b, c, …. The minimization gives a
linear system with as many equations as there are unknowns. You solve it
to estimate the unknowns.
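The "linear system with as many equations as unknowns" can be built and solved directly. The sketch below sets up the normal equations for a polynomial of a given degree and solves them with plain Gaussian elimination; the function names are invented, and in practice a numerical library would normally be used instead because this direct approach can become ill-conditioned for high degrees.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def polyfit_least_squares(xs, ys, degree):
    """Coefficients [a, b, c, ...] of y = a + b*x + c*x^2 + ... (least squares)."""
    m = degree + 1
    # Setting dQ/d(coeff_j) = 0 gives, for each j:
    #   sum_k coeff_k * sum_i x_i^(j+k) = sum_i y_i * x_i^j
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    return solve_linear_system(A, rhs)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]  # exact quadratic, no noise
print(polyfit_least_squares(xs, ys, 2))
```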
Least Squares Regression

http://mathworld.wolfram.com/LeastSquaresFitting.html
http://mathworld.wolfram.com/LeastSquaresFittingPerpendicularOffsets.html
Least Squares Minimization

see: http://mathworld.wolfram.com/LeastSquaresFitting.html
Best Fit line coefficients and Correlation coefficient

see: http://mathworld.wolfram.com/LeastSquaresFitting.html
Linear Regression

 Not restricted to fitting a line
 could be quadratic, cubic, …
 Not restricted to a single variable x.
 Uses the same error minimization method.
Local Regression

 LOESS (locally estimated scatterplot smoothing) and LOWESS (locally
weighted scatterplot smoothing)
 In general, LOESS uses locally quadratic fitting
 and LOWESS uses locally linear fitting
 Local fitting
 for the fit at point x, the fit is made using points in a
neighborhood of x
 the size of the neighborhood is controlled by the parameter “span”
 The resulting smooth curve is called the LOESS/LOWESS curve

https://en.wikipedia.org/wiki/Local_regression
Local Regression

 Weighted local fitting:
 minimizes $\sum w_i(x)\,[y_i - (a + b x_i)]^2$
 for “span” < 1 the tri-cube weight is used, i.e. the weight is
proportional to
$\left(1 - \left|\frac{x - x_i}{\mathit{maxDist}}\right|^3\right)^3$
 the point at the maximum distance has a weight of zero, and a point at
zero distance has the highest possible weight, one.
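Local Line Fit

 $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$, $\bar{y} = \frac{\sum w_i y_i}{\sum w_i}$

 $b = \frac{\sum w_i x_i y_i - \bar{x}\,\bar{y} \sum w_i}{\sum w_i x_i^2 - \bar{x}^2 \sum w_i}$

 $a = \bar{y} - b\,\bar{x}$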
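The weighted local line fit above can be sketched for a single evaluation point x0. This is a simplified illustration, not a full LOWESS implementation: the function names are invented, the neighborhood is taken to be the ceil(span · n) nearest points, maxDist is the distance to the farthest of those, and the robustness iterations used by real LOWESS implementations are omitted.

```python
import math


def tricube(scaled_distance):
    """Tri-cube weight (1 - |d|^3)^3 for |d| < 1, and 0 otherwise."""
    d = abs(scaled_distance)
    return (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0


def local_line_fit(xs, ys, x0, span=0.5):
    """Fitted value at x0 from a tri-cube weighted straight-line fit."""
    n = len(xs)
    k = max(3, math.ceil(span * n))             # neighborhood size (assumption)
    max_dist = sorted(abs(x - x0) for x in xs)[min(k, n) - 1]
    w = [tricube((x - x0) / max_dist) for x in xs]

    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    num = sum(wi * x * y for wi, x, y in zip(w, xs, ys)) - x_bar * y_bar * sw
    den = sum(wi * x * x for wi, x in zip(w, xs)) - x_bar ** 2 * sw
    if den == 0.0:                              # degenerate neighborhood
        return y_bar
    b = num / den
    a = y_bar - b * x_bar
    return a + b * x0


xs = [float(i) for i in range(10)]
ys = [3.0 + 2.0 * x for x in xs]   # exactly linear, so the local fit is exact
print(local_line_fit(xs, ys, 4.5))
```

Evaluating `local_line_fit` on a grid of x0 values and joining the results gives the smooth curve.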
Upcoming
Assignment
 Table Scraping from Web
Correlation Analysis
Recap: Scatter Plot

 A scatter plot allows us to visually assess the correlation
 Strength: to what degree one affects the other
 Direction: what sort of correlation: positive, negative or neutral
 Shape: linear or curvilinear

https://datavizcatalogue.com/methods/scatterplot.html
Recap: Variance, Covariance

 Variance:
 Computation: $\mathrm{var}(x) = \sigma_x^2 = \frac{\sum (x_i - \mu)^2}{N}$
 Intuitive Meaning/Importance: Spread in Data
 Standardizing Data: $z_{x,i} = \frac{x_i - \mu_x}{\sigma_x}$
 Mean: 0
 Variance: 1
 Covariance:
 Computation: $\mathrm{cov}(x, y) = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
 Intuitive Meaning/Importance: Evaluates to what extent the two
dimensions change together or are related to each other.
Recap: Correlation Coefficient r

Another way of interpreting the correlation coefficient:

Let standardized x (or z-score of x): $z_x = \frac{x - \mu_x}{\sigma_x}$,
and standardized y (or z-score of y): $z_y = \frac{y - \mu_y}{\sigma_y}$

$r_{xy} = \frac{1}{N} \sum z_{x,i}\, z_{y,i}$
Visual Analysis of 2D Data
CLUSTERING AND CLUSTER ANALYSIS
Cluster and Cluster Analysis

 Cluster analysis: a statistical tool for
 dividing the data points into a number of groups such that objects in
one group
 are more similar to each other and
 different from objects in other groups.
 Normally used for exploratory data analysis and as a method of
discovery for solving classification problems.

https://en.wikipedia.org/wiki/Cluster_analysis
Visualizing Bi-Variate Data

Example: faithful data

https://www.xarg.org/2018/04/how-to-plot-a-covariance-error-ellipse/
https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/
Cluster Analysis: Application

 Market segmentation:
 identify different segments of customers based on their behavior,
preferences, and demographics. This information can be used to develop
targeted marketing strategies for each segment.
 Image segmentation:
 group pixels in an image based on their similarity. This can be used in
image processing tasks such as object recognition, image compression,
and image enhancement.
 Anomaly detection:
 detect unusual patterns or outliers in data. This is useful in detecting
fraud, network intrusions, and other anomalies in data.
 Customer profiling:
 create customer profiles based on their behavior, preferences, and
purchase history. This information can be used to develop targeted
marketing strategies for each customer.
 Recommendation systems:
 group similar items or products together. This information can be used
to develop recommendation systems that suggest similar products to
customers based on their purchase history or browsing behavior.
 Data mining:
 identify patterns and relationships in data. This information can be
used to develop predictive models and make data-driven decisions.
Clustering Algorithm

 k-Means Clustering:
1. Start with k randomly selected points as centroids (each centroid
defines one cluster)
 either randomly generated
 or randomly selected from the data points.
2. Assign each data point to its nearest centroid, based on the squared
Euclidean distance.
3. Recompute the centroid of each cluster as the mean of the points
assigned to it.
4. Repeat steps 2 and 3 until one of the following conditions is met:
 the centroids have stabilized, i.e. their values no longer change
 or the defined number of iterations has been reached.
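The four steps above translate almost line by line into code. The following is a minimal sketch, assuming points are tuples of numbers and initial centroids are sampled from the data; the function names, the `seed` parameter and the toy points are all invented for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # step 1: pick k data points
    clusters = []
    for _ in range(max_iter):                   # step 4: iteration cap
        clusters = [[] for _ in range(k)]
        for p in points:                        # step 2: nearest centroid
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [                       # step 3: recompute the means
            mean_point(cluster) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:          # step 4: centroids stabilized
            break
        centroids = new_centroids
    return centroids, clusters


# Two well-separated toy groups, made up for illustration.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, groups = kmeans(pts, k=2)
print(sorted(centers))
```

An empty cluster simply keeps its previous centroid here; other implementations re-seed it instead.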
Choosing the value of “k”

 Run the k-means clustering algorithm for a range of k values and
compare the results
 Typically compute the average within-cluster distance to the centroid
 Choose the elbow of the plot
Elbow point graph

https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
k-Means++

 Drawback of the k-means algorithm
 Random initialization of centers. The clusters that form depend very much
on the initial positions of the centroids, so random positioning can
completely alter the clusters and produce an arbitrary result.
 Solution idea: spread out the k initial cluster centers as much as possible.
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest
center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted
probability distribution where a point x is chosen with probability
proportional to D(x)².
4. Repeat steps 2 and 3 until k centers have been chosen.
5. Proceed with standard k-means clustering.
https://en.wikipedia.org/wiki/K-means%2B%2B
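The seeding steps above can be sketched with the standard library alone. The function name `kmeans_pp_init` and the `seed` parameter are invented for the example, and the sketch assumes the data contain at least k distinct points (otherwise all weights become zero and `random.choices` raises an error).

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers, each chosen with probability proportional to D(x)^2."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]                      # step 1: uniform choice
    while len(centers) < k:                             # step 4: repeat
        # step 2: squared distance to the nearest already-chosen center
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # step 3: weighted random draw; already-chosen points have weight 0
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(kmeans_pp_init(pts, k=2))
```

The returned centers would then replace the purely random initial centroids in step 1 of standard k-means.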
Density Contour Plot
Contour plot

 A contour line of a function of two variables is a curve along which
the function has a constant value, so that the curve joins points of
equal value.
 It is a plane section of the three-dimensional graph of the function
f(x, y) parallel to the (x, y)-plane.

https://en.wikipedia.org/wiki/Contour_line
Contour Plot
Contour Plot
Density Contour Plot
Multivariate Data Visualization
Multivariate Data

Example:
Multivariate Data

Example:
Multiple 1D Data: Distribution analysis

 Multiple distribution display


 Box Plots
 Violin Plots

Source: https://blogs.sas.com/content/graphicallyspeaking/2013/03/24/custom-box-plot
Multiple 1D Data: Distribution analysis

 Multiple distribution display
 Box Plots
 Violin Plots
 https://plotly.com/javascript/violin/

https://datavizcatalogue.com/methods/violin_plot.html
Multiple Time series data
Scatter plot Matrix: for Multivariate
Correlation analysis

 Scatterplot is designed
to compare only two
variables
 Scatterplot Matrix
addresses the problem.
 Explores how several
variables interact with
each other.
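A numeric companion to the scatterplot matrix is the matrix of pairwise correlation coefficients, one value per panel. The sketch below is illustrative only: the function names and the three made-up variables are assumptions, not part of the original slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict: result[a][b] is r between variables a and b."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Made-up measurements for three variables.
data = {
    "height": [150.0, 160.0, 165.0, 172.0, 180.0],
    "weight": [50.0, 58.0, 63.0, 70.0, 82.0],
    "age": [34.0, 25.0, 41.0, 29.0, 38.0],
}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The matrix is symmetric with ones on the diagonal, mirroring the layout of the scatterplot matrix.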
Correlation Analysis for multiple
variables
Correlation Display for multiple
variables
How do we visualize Multivariate
Data?
Example:
Multivariate Analysis

 Identify similarities and differences among items, each characterized
by a common set of variables.
 Multivariate Visualizations
 Bubble chart
 Parallel Coordinates
 Spider/Star Plots
 Glyphs
3D Scatterplot

https://plotly.com/python/3d-scatter-plots/ (px.scatter_3d)
Bubble Chart

https://observablehq.com/d/005a613631862b1b
Parallel Coordinates
 ideal for comparing many
variables together and
seeing the relationships
between them.
 each variable is given its
own axis and all the axes
are placed in parallel to
each other.

Source: https://datavizcatalogue.com/methods/parallel_coordinates.html
Parallel Coordinates

 Each axis can have a different scale, as each variable works off a
different unit of measurement,
 or each variable is normalized.
 Values are plotted as a series of lines that are connected across all
the axes.
 Each line is a collection of points, one placed on each axis, that have
all been connected together.
Dataviz Catalogue
Parallel Coordinates

 The order in which the axes are arranged can affect how the reader
understands the data.
 The relationships between adjacent variables are easier to perceive
than those between non-adjacent variables.

Dataviz Catalogue
How to visually read Parallel Coordinates

 When most lines between two parallel axes are somewhat parallel to
each other, it suggests a positive relationship between these two
dimensions.
 When lines cross in a kind of superposition of X-shapes, it is a
negative relationship.
 When lines cross randomly or are parallel, it shows there is no particular
relationship.

See Inselberg, A. (1997), "Multidimensional detective", Information Visualization, 1997,
for a full review of how to visually read out the relational patterns of parallel coordinates.
Radar/Star/Spider Chart
 allows comparison of
observations with multiple
quantitative variables.
 allows the viewer to see
which variables have similar
values or if there are any
outliers amongst each
variable.
 each variable is given its own
axis.
 the axes are arranged
radially
Source: Dataviz Catalogue
https://datavizcatalogue.com/methods/radar_chart.html
Radar/Star/Spider Chart

 Each variable is provided with an axis that starts from the center.
 All axes are arranged radially, with equal distances between each other,
while maintaining the same scale between all axes.
 Grid lines that connect from axis to axis are often used as a guide.
 Each variable value is plotted along its individual axis.
 All the variables in a dataset are connected together to form a
polygon.
Radar/Star/Spider Chart

https://en.wikipedia.org/wiki/Radar_chart
Dimension Reduction
Dimension Reduction

 The process of converting a set of data having “d” dimensions into
data with “m” dimensions, where “m≪d”, while ensuring that it retains as
much information as possible.
 Dimensionality reduction is used
 to easily see patterns and clusters of similar or dissimilar data.
 The assumption is:
 if done properly, the data displayed in low dimension (say 2D or 3D)
preserves structures present in the higher (original) dimensions.
Example (recap)

 Nutrition charts are available for almost all food items.
 What if we want to categorize food based on this nutrition data? What
would be the best categorization?
 What if we want to create a 2D scatter plot and form clusters of similar
food items?
Options

1. Remove redundant dimensions
 Detect highly correlated dimensions
2. Create new dimensions
 from weighted combinations of existing dimensions
Principal Component Analysis
Principal Component Analysis (PCA)

 PCA finds a new set of dimensions such that
 all the dimensions are orthogonal (and hence linearly independent)
 they are ranked according to the variance of the data along them. This
means the more important principal axis occurs first (more important =
more variance/more spread-out data).
 The direction of maximum spread of the data is called the first principal
axis.
 This is the direction where there is the most variance, the direction
where the data is most spread out.
Principal Component Analysis (PCA)

https://towardsdatascience.com/https-medium-com-abdullatif-h-
dimensionality-reduction-for-dummies
Principal Component Analysis (PCA)

 Measures data in terms of its principal axes rather than on the normal
axes of the data variables.
 Takes linear combinations of the original variables to form a new set of
variables. These new variables are known as principal components.
 PCA is a projection-based method which transforms the data by
projecting it onto a set of orthogonal axes.
Principal Component Analysis (PCA)

 PCA finds a new set of dimensions such that
 all the dimensions are orthogonal (and hence linearly independent)
and
 they are ranked according to the variance of the data along them. This
means the more important principal axis occurs first (more important =
more variance/more spread-out data).
 The directions of maximum spread of the data are called the principal axes.
 They are the directions where there is the most variance, the directions
where the data is most spread out.
 Eigenvalue decomposition of the covariance matrix of the data gives us
eigenvectors and eigenvalues. The eigenvector with the maximum eigenvalue
is the 1st principal component.
Eigenvectors and Eigenvalues

 Eigenvectors and eigenvalues exist in pairs:
 every eigenvector has a corresponding eigenvalue.
 An eigenvector is a direction.
 An eigenvalue is a number telling you how much variance there is in
the data in that direction, in other words how spread out the
data is along that line.
 The eigenvector with the highest eigenvalue is therefore the first
principal component.
 The number of eigenvectors/eigenvalues equals the number of
dimensions of the data set.
 However, we keep only the first “k” of them and hence obtain a reduced
dimension.
Principal Axes
Data In the Principal Axes
PCA Analysis

 Calculate the covariance matrix of the (mean-centered) data points X.
 Calculate the eigenvectors and corresponding eigenvalues of the
covariance matrix.
 Sort the eigenvectors according to their eigenvalues in decreasing
order.
 Choose the first k eigenvectors; these will be the new k dimensions.
 Transform the original n-dimensional data points into k dimensions.

https://medium.com/@aptrishu/understanding-
principle-component-analysis-e32be0253ef0
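
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca`, the synthetic data, and the choice of dimensions-as-rows layout are assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Project X (dimensions as rows, observations as columns) onto k principal axes."""
    Xc = X - X.mean(axis=1, keepdims=True)      # zero-mean every dimension (row)
    C = Xc @ Xc.T / Xc.shape[1]                 # covariance matrix (1/n) X X^T
    eigvals, eigvecs = np.linalg.eigh(C)        # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :k].T                        # rows of P are the first k principal axes
    return P @ Xc, eigvals, P

# Made-up 3-dimensional data in which the first two dimensions are strongly related.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.vstack([t, 2 * t + 0.1 * rng.normal(size=200), 0.1 * rng.normal(size=200)])
Y, eigvals, P = pca(X, k=2)
```

Here `Y` holds the data expressed in the first two principal axes, and `eigvals` gives the variance along each axis in decreasing order.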
Variance, Covariance (Recap)

 Variance
 a measure of variability:
 it simply measures how spread out the data set is.
 Mathematically: the average squared deviation from the
mean.
 Covariance
 a measure of the extent to which corresponding elements from
two sets of ordered data move in the same or opposite
direction.
 The covariance matrix is symmetric.
 As discussed earlier, we want the data to be spread out,
i.e. it should have high variance along the new dimensions.
 If two dimensions are independent of each other, then their
covariance is zero.
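
As a small illustration of these two definitions, the sketch below computes variance and covariance by hand. The helper names and the three short data lists are made up for the example, and the population form (dividing by n) is used to match "average squared deviation".

```python
def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Average product of deviations from the means (divides by n)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def variance(a):
    """Variance is the covariance of a data set with itself."""
    return covariance(a, a)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # moves in the same direction as x: positive covariance
z = [8.0, 6.0, 4.0, 2.0]   # moves in the opposite direction: negative covariance
```

Since `covariance(a, b)` equals `covariance(b, a)`, a matrix built from these values is symmetric.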
Covariance Matrix

For a dataset with D dimensions


 the covariance matrix is
 a symmetric DxD matrix
 variance of dimensions as the main diagonal elements
 covariance of dimensions as the off diagonal elements.
 The goal of PCA is to transform the dataset such that
 variance is maximum
 covariance is zero
 That means we need a diagonal matrix
 So PCA works by diagonalization of the covariance
matrix.
Covariance Matrix

 Let us assume that the data is arranged so that dimensions are
rows and observations are columns.
 Covariance between two dimensions k and m:
 $\frac{1}{n}\sum_{i=1}^{n}(x_{k,i}-\bar{x}_k)(x_{m,i}-\bar{x}_m)$
 For convenience we transform the data rows to zero mean
(subtract $\bar{x}_k$ from every entry of row k) and call the result $X$.
 Covariance matrix of X:
 $\frac{1}{n} X X^T$
PCA Analysis

 Let’s say the dataset X has zero mean and its covariance matrix is
$C_X$.
 That means $C_X = \frac{1}{n} X X^T$
 We want to transform X to Y such that its covariance matrix $C_Y$ is a
diagonal matrix.
 Let $Y = PX$
 Then $C_Y = \frac{1}{n} Y Y^T = \frac{1}{n} (PX)(PX)^T = \frac{1}{n} P X X^T P^T$
 $C_Y = P C_X P^T$
 Covariance matrices are symmetric.
PCA Analysis

 $C_Y = P C_X P^T$
 Which P should we choose so that $C_Y$ is a diagonal matrix?
 In linear algebra, eigendecomposition is the factorization of a
matrix into a diagonal form, whereby the matrix is represented in
terms of its eigenvalues and eigenvectors.

https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
Eigenvector and Eigenvalue of a
matrix
 If $Av = \lambda v$, where A is a matrix, v is a non-zero vector and $\lambda$ is a
scalar, then v and $\lambda$ are respectively an eigenvector and
eigenvalue of A
 $Av = \lambda I v$
 So $(A - \lambda I)v = 0$
 That means the determinant of $(A - \lambda I)$ is 0
 The determinant is a degree-“m” polynomial in $\lambda$ and has
“m” roots (counted with multiplicity).
 Hence, for an m×m matrix A, there are “m” eigenvalues, each with a
corresponding eigenvector.
 In matrix form: $AV = V\Lambda$ or $A = V\Lambda V^{-1}$.
See: https://www.mghassany.com/MLcourse/principal-components-analysis.html
PCA Analysis

 Eigendecomposition of the symmetric matrix: $C_X = Q \Lambda Q^T$
 where $Q$ is an orthogonal matrix whose columns are the
eigenvectors
 $\Lambda$ is a diagonal matrix whose diagonal values are the eigenvalues.
 Rearranging, we get $Q^T C_X Q = \Lambda$
 Compare it with what we wanted to do for PCA
 $C_Y = P C_X P^T$
 So $P$ is $Q^T$, the transpose of the eigenvector matrix.

https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
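
A quick numerical check of this choice of P, as a sketch only: the 2×2 matrix below is a made-up symmetric covariance matrix, and `np.linalg.eigh` is used because it is NumPy's routine for symmetric matrices.

```python
import numpy as np

C_X = np.array([[4.0, 2.0],
                [2.0, 3.0]])          # made-up symmetric covariance matrix
eigvals, Q = np.linalg.eigh(C_X)      # columns of Q are orthonormal eigenvectors
P = Q.T                               # the PCA choice P = Q^T
C_Y = P @ C_X @ P.T                   # covariance after the transformation Y = PX
```

`C_Y` comes out diagonal (up to rounding), with the eigenvalues of `C_X` on its diagonal.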
PCA Demo

 https://setosa.io/ev/principal-component-analysis/
PCA Dimension Reduction of IRIS
Dataset
PCA: simplified explanation

 https://builtin.com/data-science/step-step-explanation-principal-
component-analysis
PCA Analysis (recap)

 Eigendecomposition of the symmetric matrix: $C_X = Q \Lambda Q^T$
 where the columns of $Q$ are the eigenvectors
 $\Lambda$ is a diagonal matrix whose diagonal values are the eigenvalues.
 Compute $Y = Q^T X$
 The covariance matrix of Y, $\frac{1}{n} Y Y^T$, is a diagonal matrix.
Multidimensional Scaling
Multidimensional Scaling (MDS)

 Maps data from n-D space to k-D space such that the distances
between the data points are distorted as little as possible.
 Given a distance matrix with the distances between each pair of
objects in a set, and a chosen number of dimensions, k, an
MDS algorithm places each object into k-dimensional space such
that the between-object distances are preserved as well as
possible.
 For scatter-plot visualization we can choose k = 2.

https://en.m.wikipedia.org/wiki/Multidimensional_scaling
MDS Algorithm

 Mathematically:
 MDS takes an input matrix giving dissimilarities (say, distance) between
pairs of items and outputs a coordinate matrix whose configuration
minimizes a loss function called strain.
 Compute the Euclidean distance between data item i and data item j
and minimize the stress function:
General MDS Algorithm

 Input: D matrix. It could hold the Euclidean distances in the
m-dimensional space, or the dissimilarities between every pair of items.
1. Assign points to arbitrary coordinates in k-dimensional space.
2. Compute Euclidean distances among all pairs of points, to form the D’
matrix.
3. Compare the D’ matrix with the input D matrix by evaluating the stress
function. The smaller the value, the greater the correspondence
between the two.
4. Adjust the coordinates of each point in the direction that most reduces
stress. (Optimization step)
5. Repeat steps 2 through 4 until stress won't get any lower.
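
One way to realize these five steps is plain gradient descent on a sum-of-squares stress. The sketch below assumes that particular stress function, a fixed step size `lr`, and a fixed number of iterations instead of a convergence test; the function names and the five sample points are made up for illustration.

```python
import numpy as np

def pairwise_distances(Y):
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D, Y):
    """Sum of squared differences between input and fitted distances, over pairs i < j."""
    upper = np.triu_indices(len(D), k=1)
    return ((pairwise_distances(Y) - D) ** 2)[upper].sum()

def mds(D, k=2, steps=500, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(len(D), k))            # step 1: arbitrary coordinates
    for _ in range(steps):                      # step 5: repeat
        Dp = pairwise_distances(Y)              # step 2: fitted distances D'
        np.fill_diagonal(Dp, 1.0)               # avoid dividing by zero on the diagonal
        coef = (Dp - D) / Dp                    # step 3: mismatch for every pair
        np.fill_diagonal(coef, 0.0)
        diff = Y[:, None, :] - Y[None, :, :]
        grad = 2 * (coef[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                          # step 4: move in the direction that lowers stress
    return Y

# Five made-up points in 3-D, reduced to 2-D.
points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
D = pairwise_distances(points)
Y = mds(D, k=2)
```

Calling `stress(D, Y)` before and after the optimization shows how much the fitted distances improved.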
MDS Algorithm

 A typical strain/stress function: $\sum_{i<j} \left(d_{ij} - d_{ij}^{fitted}\right)^2$ where
the sum is over all pairs of the N data points.
 or its scaled version
 or some variation of the equations.

http://www.analytictech.com/networks/mds.htm
MDS Algorithm: which “k” to choose?

 If there is an option, the elbow of


the residual stress plot (aka Scree
Plot) may guide the choice of “k”.

http://www.analytictech.com/networks/mds.htm
Scree plot
 Scree is a collection of broken rock fragments at the base
of mountain cliffs, that has accumulated through periodic rockfall from cliff
faces.

https://en.wikipedia.org/wiki/Scree
Uncertainty Visualization
Uncertainty

 Uncertainty
 The lack of certainty, a state of limited knowledge where it is impossible
to exactly describe the existing state, a future outcome, or more than
one possible outcome. [Wiki]
 ex: Who will win presidential election?
US Unemployment Rate & Uncertainty
“The unemployment rate declined by 0.2 percentage point to 5.2
percent,” reported the U.S. Bureau of Labor Statistics. [Aug 2021]
 How is this rate computed?
 From people collecting
unemployment insurance (UI)?
 Does the government count every
unemployed person every month?
 The government conducts a
monthly survey called the Current
Population Survey (CPS) to
measure the extent of
unemployment in the country.

US Bureau of Labor Statistics


Survey to Compute Uncertainty
 There are about 60,000 eligible households in the sample for this survey.
 This amounts to approximately 110,000 individuals each month, who are asked about their labor force
activities (jobholding and job seeking).
 Note: this is a large sample compared to public opinion surveys, which usually cover fewer than 2,000
people.
 The sample is representative of the entire population of the United States.
 All of the counties and independent cities in the country are first grouped into approximately
2,000 geographic areas (sampling units).
 A sample of ~800 of these geographic areas is chosen to represent each state and DC.
 Every month, one-fourth of the households in the sample are changed, so that no household
is interviewed for more than 4 consecutive months.
 After a household is interviewed for 4 consecutive months, it leaves the sample for 8 months,
and then is again interviewed for the same 4 calendar months a year later, before leaving
the sample for good.
 As a result, approximately 75 percent of the sample remains the same from month to month
and 50 percent remains the same from year to year.
US Bureau of Labor Statistics
Back to Unemployment Rate &
Uncertainty
 Sample and Population:
 Population: All the citizens of the country
 Parameter: the actual unemployment number collected by asking each
member of the population
 Sample: Subset of citizens queried
 Estimate: the number computed from the sample.
Uncertainty

 Measurement of uncertainty
 the range of possible values within which the true value lies and the
probability assigned to the possible values.
 Often standard deviation and standard error are general measures of
uncertainty around a particular single choice (or the measured value or
a mean, median, or mode).

Wiki
Uncertainty Measure

Key concepts of statistical sampling.

https://serialmentor.com/dataviz/visualizing-
uncertainty.html
Parameter estimates and their
uncertainties.
 Bayesian approach:
 We have some prior knowledge about the world, and we use the
sample to update this knowledge.
 Frequentist approach:
 We make precise statements about the world without having any prior
knowledge in hand.
Standard Deviation vs Standard
Error
 The standard deviation (SD) measures the amount of variability,
or dispersion, of a set of data around its mean.
 How much spread there is in the data around the mean.
 $SD = \sqrt{\frac{\sum_i (sample_i - mean)^2}{N-1}}$
 The standard error of the mean (SEM) measures how far the sample mean
of the data is likely to be from the true population mean.
 How accurate the sample mean is.
 The SEM is always smaller than the SD (for N > 1).
 $SEM = SD/\sqrt{N}$
 SD is a measure of volatility (spread of the measured values).
 For normally distributed data, about 68% of the values lie within 1 SD of the mean.
 SEM is the standard deviation of the sample mean, i.e. of the means obtained from repeated samples.
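
A short sketch of the two formulas using Python's standard library. The eight measurements are made up, and `statistics.stdev` is used because it divides by N − 1 as in the SD formula above.

```python
import math
import statistics

def sd_and_sem(sample):
    sd = statistics.stdev(sample)        # sample standard deviation (divides by N - 1)
    sem = sd / math.sqrt(len(sample))    # standard error of the mean
    return sd, sem

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up measurements
sd, sem = sd_and_sem(sample)
```

Collecting more measurements with the same spread leaves `sd` roughly unchanged but shrinks `sem`, since the sample mean becomes a more accurate estimate.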
Confidence Interval

 When you know the standard
deviation $\sigma$ and the sample mean $\bar{x}$:
 Confidence interval around the
mean: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

https://www.dummies.com/education/math/statistics/how-to-calculate-a-confidence-
interval-for-a-population-mean-when-you-know-its-standard-deviation/
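
A minimal sketch of this interval, with $z^*$ taken from the standard normal distribution. The function name and the sample numbers (mean 7.5, σ = 2.3, n = 100) are made up for illustration.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)                # z* times sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(sample_mean=7.5, sigma=2.3, n=100)
```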
Confidence Interval in Polling

 Let us assume that in a poll we have an
estimate of the population proportion: $\hat{p}$
 Confidence interval of the actual
population proportion around $\hat{p}$:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

See: https://www.dummies.com/education/math/statistics/how-to-determine-the-
confidence-interval-for-a-population-proportion/
For polling visualization see: https://fivethirtyeight.com/
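
The polling interval can be sketched the same way. The poll result used here (52% support among 1,000 respondents) is invented for the example.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(p_hat=0.52, n=1000)
```

With these made-up numbers the margin of error is about three percentage points, so the interval includes 0.5 and the poll alone would not settle which side is ahead.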
Error Bars for Uncertainty visualization

 Five different ways to
represent uncertainty using
error bars.
Standard Error and Sample Size
Error Bar: IRIS data example
Point Range: IRIS data example
(recap) Best fit line

 Goal of linear regression: find the line that best fits a set of points (xi, yi).
 The equation of the regression line is
of the form:
 y = a + bx
 where a is the intercept and b is the slope
 Uses the linear least squares method to
compute a and b from a set of (xi, yi)
 You can also compute a best fit
polynomial of higher degree using the
linear least squares method.
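
For a straight line, the least squares solution has a closed form: the slope is the sum of products of deviations divided by the sum of squared x-deviations. The sketch below uses that formula; the function name and the five data points are made up.

```python
def fit_line(xs, ys):
    """Least squares estimates of intercept a and slope b for y = a + b x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = y_bar - b * x_bar       # intercept: the line passes through (x_bar, y_bar)
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # made-up, roughly linear measurements
a, b = fit_line(xs, ys)
```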
Best fit line (contd…)

 Note:
 Best fit line does not mean it
is the right fit.
 See for example:
(Recap) Best Fit line coefficients
and Correlation coefficient
Measure of Error in Linear Regression

 SSE: sum of squared errors: $SSE = \sum_i \left(y_i - (a + b x_i)\right)^2$
 The smaller the value of SSE, the
better the fit.
Standard error of Linear Regression

 Standard SSE:
 SSE relative to the spread of y-values:
$\frac{\sum_i \left(y_i - (a + b x_i)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$
 The closer to zero, the better the fit.
 Max standard SSE = 1.
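
A small sketch of both error measures. The data and the line coefficients (a = 1.1, b = 1.96) are made-up values chosen for illustration, and the function names are not from the slides.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def standard_sse(xs, ys, a, b):
    """SSE divided by the spread of the y-values around their mean."""
    y_bar = sum(ys) / len(ys)
    spread = sum((y - y_bar) ** 2 for y in ys)
    return sse(xs, ys, a, b) / spread

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
good_fit = standard_sse(xs, ys, a=1.1, b=1.96)            # close to zero
flat_fit = standard_sse(xs, ys, a=sum(ys) / len(ys), b=0)  # horizontal line at the mean
```

The horizontal line through the mean of y gives a standard SSE of 1, which is the worst value a least squares line can reach.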
Coefficient of determination

 Coefficient of determination: $R^2 = 1 - \frac{\sum_i \left(y_i - (a + b x_i)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$
 Value range is 0 to 1.
 The closer to 1, the better the fit.
 Compare it with the correlation coefficient.
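 In some statistics books you may find the following equivalent equation:
$R^2 = \frac{\sum_i \left(\hat{y}_i - \bar{y}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$, where $\hat{y}_i = a + b x_i$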


(recap) Local Regression

 LOESS (locally estimated scatterplot smoothing) and


LOWESS (locally weighted scatterplot smoothing),
 In general, Loess uses locally quadratic fitting
and Lowess uses locally linear fitting
 Local fitting
 for the fit at point x, the fit is made using points in a
neighborhood of x
 the size of neighborhood is controlled by parameter
“span”
 The resulting smooth curve is called LOESS curve

See: https://clauswilke.com/dataviz/visualizing-trends.html
Source: https://en.wikipedia.org/wiki/Local_regression
(recap) Local Regression

 LOESS (locally estimated scatterplot smoothing)
 Local fitting
 for the fit at point x, the fit is made using points in a
neighborhood of x
 the size of the neighborhood is controlled by the parameter “span”
 The resulting smooth curve is called the LOESS/LOWESS curve.
 LOWESS (locally weighted scatterplot smoothing)
 Weighted local fitting:
 LOWESS using a local line fit minimizes $\sum_i w_i(x)\,[y_i - (a + b x_i)]^2$

Sample weight function

https://en.wikipedia.org/wiki/Local_regression
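
The weighted local line fit can be sketched as follows. This is a simplified illustration, not the full LOWESS procedure: it assumes the tricube weight function (a common choice), treats `span` as the fraction of points in the neighborhood, and omits the robustness iterations of the original method. All names are made up for the example.

```python
def tricube(u):
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_linear_fit(xs, ys, x0, span=0.5):
    """Fit a weighted least squares line around x0 and return its value at x0."""
    k = min(len(xs), max(2, int(round(span * len(xs)))))     # neighborhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12       # distance to k-th nearest point
    # Slightly widen the window so the k-th nearest point keeps a non-zero weight.
    w = [tricube((x - x0) / (h * 1.0001)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = y_bar - b * x_bar
    return a + b * x0

xs = [float(i) for i in range(10)]
ys = [3.0 + 2.0 * x for x in xs]            # exactly linear, made-up data
smoothed = [local_linear_fit(xs, ys, x) for x in xs]
```

Evaluating the fit at every x of interest and joining the results gives the smooth curve; a larger `span` uses more neighbors and produces a smoother curve.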
Visual Analysis of 2D Data
CLUSTERING AND CLUSTER ANALYSIS
Cluster and Cluster Analysis

 Cluster analysis: a statistical tool for
 dividing the data points into a number of
groups such that objects in one group
 are more similar to each other and
 different from objects in other groups.
 It is normally used for exploratory data analysis
and as a method of discovery for
classification problems.

https://en.wikipedia.org/wiki/Cluster_analysis
Visualizing Bi-Variate Data

Example: faithful data

https://www.xarg.org/2018/04/how-to-plot-a-covariance-error-ellipse/
https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix/
Cluster Analysis: Application

 Market segmentation / Customer profiling
 identify different segments of customers based on their behavior, preferences, and
demographics. This information can be used to develop targeted marketing strategies for
each segment.
 Recommendation systems:
 group similar items or products together. This information can be used to develop
recommendation systems that suggest similar products to customers based on their
purchase history or browsing behavior.
 Image segmentation:
 group pixels in an image based on their similarity. This can be used in image processing
tasks such as object recognition, image compression, and image enhancement.
 Anomaly detection:
 detect unusual patterns or outliers in data. This is useful in detecting fraud, network
intrusions, and other anomalies in data.
 Data mining:
 identify patterns and relationships in data. This information can be used to make
data-driven decisions.
Clustering Algorithm

 k-Means Clustering: a centroid-based algorithm and one of the simplest
unsupervised learning algorithms.
1. Start with k randomly selected points as centroids (each centroid
defines one cluster)
 either randomly generated
 or randomly selected from the data points.
2. Assign each data point to its nearest centroid, based on the squared
Euclidean distance.
3. Recalculate the centroid of each cluster
 compute the mean of the points assigned to it.
4. Perform steps 2 and 3 iteratively until one of the following conditions is met:
 the centroids have stabilized, i.e.
there is no change in their values between iterations,
 or the defined number of iterations has been reached.
https://en.wikipedia.org/wiki/Cluster_analysis
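
The four steps can be sketched in plain Python as below. The function names, the choice to keep the old centroid when a cluster ends up empty, and the six sample points are assumptions made for the example.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # step 1: k random data points
    labels = []
    for _ in range(max_iter):                         # step 4: iteration limit
        # step 2: assign every point to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # step 3: recompute each centroid as the mean of its group
        new_centroids = []
        for c in range(k):
            group = [p for p, label in zip(points, labels) if label == c]
            if not group:                             # empty cluster: keep the old centroid
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(tuple(sum(col) / len(group) for col in zip(*group)))
        if new_centroids == centroids:                # step 4: centroids have stabilized
            break
        centroids = new_centroids
    return centroids, labels

# Two made-up, well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
```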
Choosing the value of “k”

 Run the K-means


clustering algorithm for a
range of K values and
compare the results
 Typically compute average
within-cluster distance to
centroid
 Choose the elbow of the
plot
Elbow point graph

https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
k-Means++

 Drawback of the k-Means algorithm
 Random initialization of centers. The formation of clusters depends very much on the
initial positions of the centroids. Random positioning of the centroids can completely
alter the clusters and can result in an arbitrary clustering.
 Solution idea: spread out the k initial cluster centers as much as possible.
1. Choose one center uniformly at random from among the data points.
2. For each data point x, compute D(x), the distance between x and the nearest
center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted
probability distribution where a point x is chosen with probability proportional to $D(x)^2$.
4. Repeat steps 2 and 3 until k centers have been chosen.
5. Proceed with standard k-means clustering.
https://en.wikipedia.org/wiki/K-means%2B%2B
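
A sketch of the seeding steps 1 to 4 in plain Python. The function names and the sample points are made up; the sketch assumes the data contains at least k distinct points, since otherwise all weights become zero.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means_pp_init(points, k, seed=0):
    rng = random.Random(seed)
    centers = [rng.choice(points)]                                  # step 1
    while len(centers) < k:                                         # step 4
        # step 2: squared distance from each point to its nearest chosen center
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        # step 3: sample the next center with probability proportional to D(x)^2
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = k_means_pp_init(points, k=2)
```

Points that are already centers have weight zero, so they are not picked again, and points far from every chosen center are the most likely candidates.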
Cluster Ellipse Plot
Density Contour Plot
Contour plot

 A contour line of a function of two variables is a curve along which


the function has a constant value, so that the curve joins points of
equal value.
 It is a plane section of the three-dimensional graph of the function
f(x, y) parallel to the (x, y)-plane.

https://en.wikipedia.org/wiki/Contour_line
Contour Plot
Density Contour Plot
Multivariate Data Visualization
Multivariate Data

Example:
(Recap) Multiple 1D Data: Distribution
analysis
 Multiple distribution display
 Box Plots
 Violin Plots

Source: https://blogs.sas.com/content/graphicallyspeaking/2013/03/24/custom-box-plot
(Recap) Multiple 1D Data: Distribution
analysis
 Multiple distribution display
 Box Plots
 Violin Plots

https://datavizcatalogue.com/methods/violin_plot.html
(Recap)Multiple Time series data
Multivariate Data

Example:
Scatter plot Matrix: for Multivariate
Correlation analysis

 Scatterplot is designed to
compare only two
variables
 Scatterplot Matrix (SPLOM)
addresses the problem.
 Explores the correlation of
the data dimensions with
each other.
Correlation Display for multiple
variables
 Scatterplot is designed to
compare only two
variables
 Scatterplot Matrix
addresses the problem.
 Explores how several
variables interact with
each other.

https://seaborn.pydata.org/generated/seaborn.pairplot.html
Multivariate Analysis in a Single 2D plot

 Identify similarity/difference patterns among
observations/measurements/rows, each characterized by a
common set of > 2 variables.
 Multivariate Visualizations
 Bubble chart
 Parallel Coordinates
 Spider/Star Plots
 Glyphs
3D Scatterplot

https://plotly.com/python/3d-scatter-plots/ (px.scatter_3d)
Bubble Chart
An extended scatter plot

4D data in a single plot:

o 3 quantitative dimensions
o 1 categorical dimension
Parallel Coordinates

 Each axis can have a different scale, as
each variable may use a different unit of
measurement,
 or each variable is normalized.
 Values are plotted as a series of lines that
connect across all the axes.
 Each line is a collection of points, one placed on
each axis, that have all been connected
together.
Dataviz Catalogue
Parallel Coordinates

 The order in which the axes are arranged
can affect how the
reader understands the data.
 The relationships between adjacent
variables are easier to perceive
than those between non-adjacent variables.

Dataviz Catalogue
Parallel Coordinates
 ideal for comparing many
variables together and
seeing the relationships
between them.
 each variable is given its
own axis and all the axes
are placed in parallel to
each other.

Source: https://datavizcatalogue.com/methods/parallel_coordinates.html
How to visually read Parallel Coordinates

 When most lines between two
parallel axes are roughly parallel to
each other, it suggests a positive
relationship between these two
dimensions.
 When lines cross in a kind of
superposition of X-shapes, it's a
negative relationship.
 When lines cross randomly, it shows
there is no particular relationship.

See Inselberg, A. (1997), "Multidimensional detective", Information Visualization, 1997,
for a full review of how to visually read out parallel coords' relational patterns.
Radar/Star/Spider Chart

 Each variable is provided with an axis
that starts from the center.
 All axes are arranged radially, with
equal distances between each other,
while maintaining the same scale
across all axes.
 Each variable value is plotted along its
individual axis.
 The values of all the variables are then
connected together to form a
polygon.
Radar/Star/Spider Chart
 allows comparison of
observations with multiple
quantitative variables.
 allows the viewer to see
which variables have similar
values or if there are any
outliers amongst each
variable.
 each variable is given its own
axis.
 the axes are arranged
radially
Source: Dataviz Catalogue
https://datavizcatalogue.com/methods/radar_chart.html
Radar/Star/Spider Chart

https://en.wikipedia.org/wiki/Radar_chart
Dimension
Reduction
Dimension Reduction

 The process of converting a set of data having ”d” dimensions into


data with “m” dimensions where “m≪d” ensuring that it retains as
much information as possible.
 Dimensionality reduction is used
 to easily see patterns and clusters of similar or dissimilar data.
 Assumption is:
 if done properly the data displayed in low dimension (say 2D or 3D)
preserves structures present in higher (original) dimensions.
Example (recap)

 Nutrition chart for almost all food


items are available
 What if we want to categorize food
based on this nutrition data. What will
be the best categorization?
 What if we want to create 2D Scatter
plot and create clusters of similar
food items?
Options

1. Remove redundant dimensions


 Detect Highly correlated dimensions
2. Create new dimensions
 from weighted combination of existing dimensions
Principal Component Analysis
Principal Component Analysis (PCA)

 PCA finds a new set of dimensions such that
 all the dimensions are orthogonal (and hence linearly independent)
 they are ranked according to the variance of the data along them, so
the more important principal axis occurs first (more important = more
variance = more spread-out data).
 The direction of maximum spread of the data is called the first principal
axis.
 This is the direction with the most variance, i.e. the direction along
which the data is most spread out.
Principal Component Analysis (PCA)

https://towardsdatascience.com/https-medium-com-abdullatif-h-
dimensionality-reduction-for-dummies
Principal Component Analysis (PCA)

 Measures the data in terms of its principal axes rather than the
original axes of the data variables.
 Takes linear combinations of the original variables to form a new set of
variables. These new variables are known as principal
components.
 PCA is a projection-based method: it transforms the data by
projecting it onto a set of orthogonal axes.
Principal Component Analysis (PCA)

 PCA finds a new set of dimensions such that
 all the dimensions are orthogonal (and hence linearly independent)
and
 ranked according to the variance of the data along them, so the more
important principal axis occurs first (more important = more
variance = more spread-out data).
 The directions of maximum spread of the data are called the principal axes.
 They are the directions with the most variance, the directions
along which the data is most spread out.
 Eigenvalue decomposition of the data's covariance matrix gives us
eigenvectors and eigenvalues. The eigenvector with the largest eigenvalue
is the 1st principal component.
Eigenvectors and Eigenvalues

 Eigenvectors and eigenvalues exist in pairs:
 every eigenvector has a corresponding eigenvalue.
 An eigenvector is a direction.
 An eigenvalue is a number telling you how much variance there is in
the data in that direction, in other words how spread out the
data is along that line.
 The eigenvector with the highest eigenvalue is therefore the first
principal component.
 The number of eigenvectors/eigenvalues equals the number of
dimensions of the data set.
 However, we keep only the first “k” of them and hence obtain a reduced
dimension.
Principal Axes
Data In the Principal Axes
PCA Analysis

 Calculate the covariance matrix of the (mean-centered) data points X.
 Calculate the eigenvectors and corresponding eigenvalues of the
covariance matrix.
 Sort the eigenvectors according to their eigenvalues in decreasing
order.
 Choose the first k eigenvectors; these will be the new k dimensions.
 Transform the original n-dimensional data points into k dimensions.

https://medium.com/@aptrishu/understanding-
principle-component-analysis-e32be0253ef0
Variance, Covariance (Recap)

 Variance
 a measure of variability:
 it simply measures how spread out the data set is.
 Mathematically: the average squared deviation from the
mean.
 Covariance
 a measure of the extent to which corresponding elements from
two sets of ordered data move in the same or opposite
direction.
 The covariance matrix is symmetric.
 As discussed earlier, we want the data to be spread out, i.e. it
should have high variance along the new dimensions.
 If two dimensions are independent of each other, then their
covariance is zero.
Covariance Matrix

For a dataset with D dimensions


 the covariance matrix is
 a symmetric DxD matrix
 variance of dimensions as the main diagonal elements
 covariance of dimensions as the off diagonal elements.
 The goal of PCA is to transform the dataset such that
 variance is maximum
 covariance is zero
 That means we need a diagonal matrix
 So PCA works by diagonalization of the covariance
matrix.
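Covariance Matrix

 Let us assume that the data is arranged in the form
$\begin{pmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{D,1} & \cdots & x_{D,n} \end{pmatrix}$
where dimensions are rows and observations are columns.
 Covariance between two dimensions k and m:
 $\frac{1}{n}\sum_{i=1}^{n}(x_{k,i}-\bar{x}_k)(x_{m,i}-\bar{x}_m)$
 For convenience we transform the data rows to zero mean:
 $X = \begin{pmatrix} x_{1,1}-\bar{x}_1 & \cdots & x_{1,n}-\bar{x}_1 \\ \vdots & \ddots & \vdots \\ x_{D,1}-\bar{x}_D & \cdots & x_{D,n}-\bar{x}_D \end{pmatrix}$
 Covariance matrix of X:
 $\frac{1}{n} X X^T$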
PCA Analysis

 Let’s say the dataset X has zero mean and its covariance matrix is
$C_X$.
 That means $C_X = \frac{1}{n} X X^T$
 We want to transform X to Y such that its covariance matrix $C_Y$ is a
diagonal matrix.
 Let $Y = PX$
 Then $C_Y = \frac{1}{n} Y Y^T = \frac{1}{n} (PX)(PX)^T = \frac{1}{n} P X X^T P^T$
 $C_Y = P C_X P^T$
 Covariance matrices are symmetric.
PCA Analysis

 $C_Y = P C_X P^T$
 Which P should we choose so that $C_Y$ is a diagonal matrix?
 In linear algebra, eigendecomposition is the factorization of a
matrix into a diagonal form, whereby the matrix is represented in
terms of its eigenvalues and eigenvectors.

https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
Eigenvector and Eigenvalue of a
matrix
 If $Av = \lambda v$, where A is a matrix, v is a non-zero vector and $\lambda$ is a
scalar, then v and $\lambda$ are respectively an eigenvector and
eigenvalue of A
 $Av = \lambda I v$
 So $(A - \lambda I)v = 0$
 That means the determinant of $(A - \lambda I)$ is 0
 The determinant is a degree-“m” polynomial in $\lambda$ and has
“m” roots (counted with multiplicity).
 Hence, for an m×m matrix A, there are “m” eigenvalues, each with a
corresponding eigenvector.
 In matrix form: $AV = V\Lambda$ or $A = V\Lambda V^{-1}$.
See: https://www.mghassany.com/MLcourse/principal-components-analysis.html
PCA Analysis

 Eigendecomposition of the symmetric matrix: $C_X = Q \Lambda Q^T$
 where $Q$ is an orthogonal matrix whose columns are the
eigenvectors
 $\Lambda$ is a diagonal matrix whose diagonal values are the eigenvalues.
 Rearranging, we get $Q^T C_X Q = \Lambda$
 Compare it with what we wanted to do for PCA
 $C_Y = P C_X P^T$
 So $P$ is $Q^T$, the transpose of the eigenvector matrix.

https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix#Real_symmetric_matrices
PCA Demo

 https://setosa.io/ev/principal-component-analysis/
PCA Dimension Reduction of IRIS
Dataset
PCA: simplified explanation

 https://builtin.com/data-science/step-step-explanation-principal-
component-analysis
PCA Analysis (recap)

 Eigendecomposition of the symmetric matrix: $C_X = Q \Lambda Q^T$
 where the columns of $Q$ are the eigenvectors
 $\Lambda$ is a diagonal matrix whose diagonal values are the eigenvalues.
 Compute $Y = Q^T X$
 The covariance matrix of Y, $\frac{1}{n} Y Y^T$, is a diagonal matrix.


(recap) Dimension Reduction in
Data Visualization
 Reduce the number of input dimensions to 2, and
visualize the patterns and clusters using a scatter plot
(recap) PCA for dimension
reduction
 PCA (Principal Component Analysis): the most commonly used
approach to dimension reduction
 transforms high-dimensional data into lower dimensions by a linear
transformation (a linear combination of the data dimensions), retaining
as much information as possible in the first few transformed data
dimensions.
 the principal components are eigenvectors of the data's covariance
matrix. Thus, the principal components are often computed by
eigendecomposition of the data covariance matrix (or singular value
decomposition of the data matrix).

https://en.wikipedia.org/wiki/Principal_component_analysis
Multidimensional Scaling
Multidimensional Scaling (MDS)

 Maps data from n-D space to k-D space such that the distances
between the data points are distorted as little as possible.
 Given a distance matrix with the distances between each pair of
objects in a set, and a chosen number of dimensions, k, an
MDS algorithm places each object into k-dimensional space such
that the between-object distances are preserved as well as
possible.
 For scatter-plot visualization we can choose k = 2.

https://en.m.wikipedia.org/wiki/Multidimensional_scaling
MDS Algorithm

 Mathematically:
 MDS takes an input matrix giving dissimilarities (say, distance) between
pairs of items and outputs a coordinate matrix whose configuration
minimizes a loss function called strain.
 Compute the Euclidean distance between data item i and data item j
and minimize the stress function:
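Metric MDS

 Centering the data matrix $X$: $\tilde{X} = X - \frac{1}{n}\mathbf{1}\mathbf{1}^T X = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) X$
 Recap: we did eigendecomposition of the covariance matrix $\tilde{X}^T\tilde{X}$ to
compute PCA
 A similar approach may be applied here:
 Singular value decomposition of $\tilde{X}$: $\tilde{X} = U S V^T$
 Eigendecomposition of the Gram matrix $K_c = \tilde{X}\tilde{X}^T$: $\tilde{X}\tilde{X}^T = U S^2 U^T$
 Distance between two
 Double centering: $K_c = \tilde{X}\tilde{X}^T = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right) X X^T \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)$ where $XX^T$ is the
Gram matrix of the uncentered X.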

Ref: https://stats.stackexchange.com/questions/14002/whats-the-difference-between-principal-component-
analysis-and-multidimensional
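
A sketch of this eigendecomposition route when only a Euclidean distance matrix D is available. It relies on the standard identity that the double-centered Gram matrix equals −½ times the double-centered matrix of squared distances, which is not spelled out on the slide; the function name and the random test points are made up for the example.

```python
import numpy as np

def classical_mds(D, k=2):
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix I - (1/n) 1 1^T
    K = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K)         # K is symmetric
    order = np.argsort(eigvals)[::-1][:k]        # keep the k largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale             # n x k coordinates (observations as rows)

# Made-up 2-D points; their distance matrix is the only input to the method.
rng = np.random.default_rng(1)
pts = rng.normal(size=(6, 2))
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
Y = classical_mds(D, k=2)
```

The recovered coordinates `Y` generally differ from `pts` by a rotation, reflection, or shift, but the pairwise distances are reproduced.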
General MDS Algorithm

 Input: D matrix. D could hold the Euclidean distances in the m-dimensional
space, or the dissimilarity between every pair of objects.
1. Assign points to arbitrary coordinates in k-dimensional space.
2. Compute Euclidean distances among all pairs of points, to form the D’
matrix.
3. Compare D’ matrix with the input D matrix by evaluating the stress
function. The smaller the value, the greater the correspondence
between the two.
4. Adjust coordinates of each point in the direction that maximally reduces
stress. (Optimization step)
5. Repeat steps 2 through 4 until stress won't get any lower.
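The five steps above can be sketched with plain gradient descent on the raw stress (sum of squared distance differences). This is one possible optimization step, not necessarily the one used in any particular library; the learning rate, the fixed iteration count (instead of a convergence check), and the toy data are assumptions.

```python
import numpy as np

def pairwise_distances(Y):
    return np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)

def iterative_mds(D, k=2, lr=0.01, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(D.shape[0], k))            # step 1: arbitrary coordinates
    history = []
    for _ in range(n_iter):                         # step 5: repeat (fixed count here)
        Dp = pairwise_distances(Y)                  # step 2: D' matrix
        history.append(0.5 * np.sum((D - Dp) ** 2)) # step 3: stress over pairs i < j
        np.fill_diagonal(Dp, 1.0)                   # avoid division by zero on the diagonal
        coef = (Dp - D) / Dp
        np.fill_diagonal(coef, 0.0)
        diffs = Y[:, None, :] - Y[None, :, :]       # diffs[i, j] = Y_i - Y_j
        grad = 2.0 * (coef[:, :, None] * diffs).sum(axis=1)
        Y = Y - lr * grad                           # step 4: move against the gradient
    return Y, history

rng = np.random.default_rng(3)
P = rng.normal(size=(8, 3))          # hypothetical 3-D points
D = pairwise_distances(P)            # input D matrix
Y, history = iterative_mds(D, k=2)   # 2-D layout and stress per iteration
```

Plotting `history` shows the stress falling as the layout improves; a real implementation would stop once the decrease becomes negligible.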
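MDS Algorithm

 A typical strain/stress function: $\sum_{i<j} \left(d_{ij} - d'_{ij}\right)^2$, where
$d_{ij}$ and $d'_{ij}$ are the entries of the D and D’ matrices and
the sum is over all pairs of the N data points,
 or its scaled version, e.g. $\sum_{i<j} \left(d_{ij} - d'_{ij}\right)^2 / \sum_{i<j} d_{ij}^2$,
 or some variation of the equations.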

http://www.analytictech.com/networks/mds.htm
MDS Algorithm: which “k” to choose?

 If there is an option, the elbow of the residual stress plot (aka Scree
Plot) may guide the choice of “k”.

http://www.analytictech.com/networks/mds.htm
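A small sketch of how the residual stress could be tabulated for several values of k. It uses a classical MDS embedding (an assumption; any MDS variant would do) on toy data that is intrinsically 2-D but stored in 5-D, so the elbow is expected at k = 2. The helper names are illustrative.

```python
import numpy as np

def pairwise(Y):
    return np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)

def embed(D, k):
    """Classical MDS embedding (double centering + eigendecomposition)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(-0.5 * J @ (D ** 2) @ J)
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

rng = np.random.default_rng(2)
plane = rng.normal(size=(30, 2))        # intrinsically 2-D toy data
X = plane @ rng.normal(size=(2, 5))     # linearly embedded in 5-D
D = pairwise(X)

# Residual stress (sum over pairs i < j) for k = 1..4.
stress_by_k = {
    k: float(np.sum((D - pairwise(embed(D, k))) ** 2) / 2)
    for k in range(1, 5)
}
```

Plotting `stress_by_k` against k gives the scree plot; the stress drops sharply up to k = 2 and stays flat afterwards.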
Scree plot
 Scree is a collection of broken rock fragments at the base
of mountain cliffs, that has accumulated through periodic rockfall from cliff
faces.

https://en.wikipedia.org/wiki/Scree
Uncertainty Visualization
Uncertainty

 Uncertainty
 The lack of certainty, a state of limited knowledge where it is impossible
to exactly describe the existing state, a future outcome, or more than
one possible outcome. [Wiki]
 ex: Who will win presidential election?
Example
Not Verified

Source: https://www.adda247.com/upsc-exam/unemployment-rate-in-india/
Unemployment

 How is it measured?
 Does the government count every unemployed person each year?
 To do this, every home in the country would have to be contacted—just as in
the population census every 10 years.
 This procedure would cost way too much and take far too long to produce the
data.
 In addition, people would soon grow tired of having a census taker contact them
every month, year after year, to ask about job-related activities.
Unemployment Measure: In US

 The government conducts a monthly survey called the Current Population Survey
(CPS) to measure the extent of unemployment in the country.
 About 60,000 eligible households are in the sample for this survey.
 This amounts to approximately 110,000 individuals each month, who are asked about their labor force
activities (jobholding and job seeking).
 Sample is representative of the entire population of the United States.
 all of the counties and independent cities in the country first are grouped into approximately
2,000 geographic areas (sampling units).
 a sample of ~ 800 of these geographic areas are chosen to represent each state and DC.
 Every month, one-fourth of the households in the sample are changed, so that no household
is interviewed for more than 4 consecutive months.
 After a household is interviewed for 4 consecutive months, it leaves the sample for 8 months,
and then is again interviewed for the same 4 calendar months a year later, before leaving
the sample for good.
 As a result, approximately 75 percent of the sample remains the same from month to month
and 50 percent remains the same from year to year.
US Bureau of Labor Statistics
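The 75 percent and 50 percent overlap figures follow from the 4-8-4 rotation alone. A minimal check, assuming one equally sized group of households enters the sample every month (the function names are hypothetical):

```python
def months_in_sample(entry_month):
    """Months in which a household entering at `entry_month` is interviewed (4 in, 8 out, 4 in)."""
    first_spell = range(entry_month, entry_month + 4)
    second_spell = range(entry_month + 12, entry_month + 16)
    return set(first_spell) | set(second_spell)

def cohorts_in_sample(month):
    """Entry months of all rotation groups interviewed in a given calendar month."""
    return {e for e in range(month - 15, month + 1) if month in months_in_sample(e)}

def overlap(month, lag):
    """Fraction of this month's rotation groups still interviewed `lag` months later."""
    now, later = cohorts_in_sample(month), cohorts_in_sample(month + lag)
    return len(now & later) / len(now)

month_to_month = overlap(100, 1)    # 0.75
year_to_year = overlap(100, 12)     # 0.5
```

Eight rotation groups are in the sample in any month; six of them return the next month and four return in the same month a year later.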
Unemployment Measure: In India

 Measuring unemployment in India is difficult due to the informal
nature of jobs. Unlike in developed economies, individuals do not hold
one job year-round.
 An individual may be unemployed this week, but may have worked
as a casual labourer last month, and as a farmer for most of the
year. Are they to be counted as unemployed?
 The National Sample Survey Organisation (NSSO) adopts two major
measures for classifying the working status of individuals in India —
the Usual Principal and Subsidiary Status (UPSS) and the Current
Weekly Status (CWS).

https://www.thehindu.com/business/Economy/how-unemployment-is-measured/article67278546.ece
Back to Unemployment Rate &
Uncertainty
 Sample and Population:
 Population: All the citizens of the country
 Parameter: the actual unemployment number collected by asking each
member of the population
 Sample: Subset of citizens queried
 Estimate: the number computed from the sample.
Uncertainty

 Measurement of uncertainty
 the range of possible values within which the true value lies and the
probability assigned to the possible values.
 Often standard deviation and standard error are general measures of
uncertainty around a particular single choice (or the measured value or
a mean, median, or mode).

Wiki
Uncertainty Measure

Key concepts of statistical sampling.

https://serialmentor.com/dataviz/visualizing-uncertainty.html
Standard Deviation vs Standard
Error
 Standard deviation (SD) measures the amount of variability,
or dispersion, of a set of data around its mean.
 How much spread there is in the data around the mean.

 $SD = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$ (sample SD)

 Standard error of the mean (SEM) measures how far the sample mean
of the data is likely to be from the true population mean.
 How accurate the sample mean is.
 The SEM is always smaller than the SD (for N > 1).
 $SEM = SD/\sqrt{N}$
 SD is a measure of volatility (spread of the measured values).
 For normally distributed data, about 68% of the values lie within 1 SD of the mean.
 SEM is the standard deviation of the sample mean across repeated samples.
Confidence Interval

 When you know the standard deviation $\sigma$ and the sample mean $\bar{x}$
(sample size $n$):
 Confidence interval around the mean: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$

https://www.dummies.com/education/math/statistics/how-to-calculate-a-confidence-interval-for-a-population-mean-when-you-know-its-standard-deviation/
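A short sketch computing SD, SEM and a z-based 95% confidence interval with the Python standard library. The measurements are made up, and the sample SD is used in place of a known population standard deviation, which is an assumption; for a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics

sample = [4.9, 5.1, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8]  # hypothetical measurements

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample SD (divides by n - 1)
sem = sd / math.sqrt(n)            # SEM = SD / sqrt(N)

# z* for a 95% interval: the 97.5th percentile of the standard normal (about 1.96).
z_star = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z_star * sem, mean + z_star * sem
```

Increasing the sample size shrinks `sem` and therefore the interval, while `sd` stays roughly the same.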
Confidence Interval in Polling

 Let us assume in a polling we have an
estimate of the population proportion: $\hat{p}$ (from $n$ respondents)
 Confidence interval of the actual
population proportion around $\hat{p}$:
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

See: https://www.dummies.com/education/math/statistics/how-to-determine-the-confidence-interval-for-a-population-proportion/
For polling visualization see: https://fivethirtyeight.com/
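A minimal sketch of the polling interval, using only the standard library. The poll numbers (52% support among 1000 respondents) and the function name `proportion_ci` are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(p_hat=0.52, n=1000)   # hypothetical poll
```

Here the margin of error is about 3.1 percentage points, so the interval includes 50% and the poll alone does not settle who is ahead.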
Error Bars for Uncertainty visualization

 Five different ways to represent uncertainty using
error bars.
Standard Error and Sample Size
Error Bar: IRIS data example
Point Range: IRIS data example
Linear Regression revisited

$y = a + bx$

where
$a$: intercept
$b$: slope

$a$, $b$ are computed by
least squared error
minimization,
where

Error $= \sum_i \left(y_i - (a + b x_i)\right)^2$
Linear Regression revisited
Sum of Squared Residuals (SSR) of linear
regression:
$\sum_i \left(y_i - (a + b x_i)\right)^2$

or
$\sum_i \left(y_i - \hat{y}_i\right)^2$

where $\hat{y}_i$ is the
predicted value.
Error in Estimate of Slope

 Random variation in the sample data will result in variation in slope.
 The estimate of this random variation in slope is the standard error
(SE) of the slope:

 $SE_{slope} = \sqrt{\frac{SSR}{n-2} \times \frac{1}{\sum_i (x_i - \bar{x})^2}}$

 Where $SSR = \sum_i \left(y_i - \hat{y}_i\right)^2$
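Error in Estimate of intercept

 Random variation in the sample data will result in variation in
intercept.
 The estimate of this random variation in intercept is the standard
error (SE) of the intercept:

 $SE_{intercept} = \sqrt{\frac{SSR}{n-2} \times \left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}\right)}$

 Where $SSR = \sum_i \left(y_i - \hat{y}_i\right)^2$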

The standard errors of the slope and intercept are used to calculate the confidence
intervals for the slope and intercept estimates.
The $(1-\alpha) \times 100\%$ confidence interval (95% for $\alpha = 0.05$) is given by:
intercept $\pm\ t_{\alpha/2,\, n-2} \times SE_{intercept}$
slope $\pm\ t_{\alpha/2,\, n-2} \times SE_{slope}$
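A sketch of these formulas in plain Python. The data are made up, and the critical value `T_CRIT` is hard-coded from a t-table for 8 degrees of freedom (n = 10), which is an assumption tied to this toy example; the function names are illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b of y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return a, b

def standard_errors(x, y, a, b):
    """Standard errors of the intercept and slope estimates."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)                              # residual variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return se_intercept, se_slope

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]  # hypothetical data

a, b = fit_line(x, y)
se_a, se_b = standard_errors(x, y, a, b)

T_CRIT = 2.306  # t(alpha/2 = 0.025, n - 2 = 8), taken from a t-table
intercept_ci = (a - T_CRIT * se_a, a + T_CRIT * se_a)
slope_ci = (b - T_CRIT * se_b, b + T_CRIT * se_b)
```

For other sample sizes the critical value has to be looked up again (or computed with a statistics library), since it depends on n - 2.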
Seaborn support

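import seaborn as sns
df = sns.load_dataset("iris")  # assumed: the IRIS data used in the earlier examples
sns.regplot(data=df,
x="sepal_length",
y="sepal_width"
)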

Also available in
sns.lmplot and
sns.jointplot
