Big Data and Data Visualization
• Understand visualizations
• Big data supports in-depth analysis that leads to better decisions and sounder strategy for
developing the organization.
The Evolution of Big Data
Volume: scale of data
• 90% of today’s data has been created in just the last 2 years
• Every day we create 2.5 quintillion bytes of data or enough to fill 10 million Blu-ray discs
• 40 zettabytes (40 trillion gigabytes) of data will be created by 2020, an increase of 300
times from 2005 and the equivalent of 5,200 gigabytes of data for every man, woman and
child on Earth
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
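The unit conversions in these statistics can be checked with a few lines of arithmetic. A minimal sketch, assuming decimal (SI) units where 1 GB = 10^9 bytes; the helper `to_gigabytes` is an illustrative name, not part of any library.

```python
# Decimal (SI) byte units, matching the conversions quoted above.
GB = 10**9
TB = 10**12
ZB = 10**21


def to_gigabytes(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / GB


# 40 zettabytes expressed in gigabytes (40 trillion).
total_2020_gb = to_gigabytes(40 * ZB)

# 100 terabytes expressed in gigabytes (100,000).
company_store_gb = to_gigabytes(100 * TB)

# Number of people implied by sharing 40 ZB out at 5,200 GB per person.
implied_population = total_2020_gb / 5_200

print(f"40 ZB  = {total_2020_gb:,.0f} GB")
print(f"100 TB = {company_store_gb:,.0f} GB")
print(f"Implied population: {implied_population / 1e9:.2f} billion")
```

The implied figure of roughly 7.7 billion people is close to the world population around 2020, so the per-person number is consistent with the 40 ZB total.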
Categories of Big Data - I
Categories of Big Data - II
What is the importance of Big Data?
Who uses Big Data technology?
A brief explanation of how businesses use Big Data
Big Data Technologies
[Figure: “Big Data” on PubMed. Instances of “Big Data” per year, 2003-2016]
Big Data and Librarians
Types of Data
• Contracts, monitoring, public, commercial, sensor, credit, weather, industry, population,
social media, sentiment, economic, and network data
Which Big Data characteristic is the biggest issue for your organization?
[Pie chart: velocity of data 16%; the variety of data and volume of data slices are labeled 48% and 35%]
Source: Getting Value from Big Data, Gartner Webinar, May 2012
Biggest opportunity for Big Data in your organization?
• 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.
• Many companies are performing new kinds of analytics (sentiment analysis**, etc.) to
understand and respond, better and more quickly, to what customers are saying about them
and their products.
• The cloud and data appliances are being used as data stores
• Advanced analytics are growing in popularity and importance
**Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text
analysis and computational linguistics to identify and extract subjective information in source materials.
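Practical sentiment analysis relies on NLP libraries and trained models. As a much simpler illustration of the idea, the sketch below scores text against a small word list. The lexicon (`POSITIVE`, `NEGATIVE`) and the functions `sentiment_score` and `label` are hypothetical and not taken from any particular tool.

```python
import re

# Hypothetical hand-made lexicon; real systems use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken", "terrible"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "Great product, fast delivery, I love it!",
    "The app is slow and the screen arrived broken.",
    "It arrived on Tuesday.",
]
for review in reviews:
    print(label(review), "|", review)
```

Counting words this way ignores negation ("not good" still scores as positive) and sarcasm, which is one reason real tools use natural language processing rather than plain word lists.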
Analytics Models
[Figure: types of analytics arranged by increasing difficulty, from Descriptive Analytics (“What happened?”) to Diagnostic Analytics and up to analytics that answer “How can we make it happen?”]
Descriptive Analytics
• Descriptive analytics, such as reporting/OLAP, dashboards, and data visualization, have
been widely used for some time.
• They are the core of traditional BI.
• Algorithms for predictive analytics, such as regression analysis, machine learning, and
neural networks, have also been around for some time.
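To make the contrast concrete, here is a minimal sketch in plain Python: a descriptive summary of what happened, followed by an ordinary least-squares regression line used to predict the next value. The monthly sales figures are made up for illustration.

```python
from statistics import mean

# Hypothetical monthly sales (month index, units sold).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 145, 150]

# Descriptive analytics: summarize what happened.
summary = {
    "total": sum(sales),
    "mean": mean(sales),
    "best_month": months[sales.index(max(sales))],
}
print("Summary:", summary)

# Predictive analytics: fit sales = intercept + slope * month by least squares.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar

next_month = 7
forecast = intercept + slope * next_month
print(f"Forecast for month {next_month}: {forecast:.1f}")
```

On Python 3.10 and later, `statistics.linear_regression(months, sales)` returns the same slope and intercept; the explicit formula is kept here to show what the regression computes.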
• Common-sense advice
• Invented box plot
• Worked for various US
government agencies
Jacques Bertin, 1967
• Principle of expressiveness:
• Say everything you want to say: no more, no less
• Don’t mislead
• Principle of effectiveness:
• Use the best method available for showing your data
• Cartographer
Jacques Bertin
Seven Visual Variables
• Position
• Size
• Shape
• Color
• Brightness
• Orientation
• Texture
Edward Tufte, 1983
Four Types of Data Visualizations
• Conceptual, declarative: idea illustration
• Conceptual, exploratory: idea generation
• Data-driven, declarative: everyday dataviz
• Data-driven, exploratory: visual discovery
Data Visualization
condense information
What makes a good chart?
http://www.popvssoda.com/countystats/total-county.html
Some basic principles (adapted from Tufte 2009)
“If the statistics are boring, then you’ve got the wrong
numbers.” (Tufte 2009)
Principle 2: The chart should have graphical integrity
• Basically, it shouldn’t “lie” (mislead the reader)
• Tufte’s “Lie Factor”:
$$\text{Lie Factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$$
Should be ~1
$$LF = \frac{5.3/0.6}{27.5/18} = \frac{8.83}{1.53} = 5.77$$
Reprinted from Tufte (2009), p. 57 & p. 62
$$LF = \frac{4280\%\ (\text{change in volume})}{454\%\ (\text{change in price})} = 9.4$$
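The Lie Factor is a simple ratio, so both examples can be reproduced directly. A minimal sketch; `lie_factor` is an illustrative helper, and the effect sizes are passed in exactly as the slide states them (ratios of lengths and values in the first example, percentage changes in the second). The 0.95 to 1.05 band used for the verdict follows Tufte's rule of thumb for substantial distortion.

```python
def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Tufte's Lie Factor: size of effect shown in the graphic / size of effect in the data."""
    if effect_in_data == 0:
        raise ValueError("effect in data must be non-zero")
    return effect_in_graphic / effect_in_data


# First example: ratio of drawn lengths vs. ratio of data values.
lf_1 = lie_factor(5.3 / 0.6, 27.5 / 18)

# Second example: percentage change in volume vs. percentage change in price.
lf_2 = lie_factor(4280, 454)

for name, lf in [("example 1", lf_1), ("example 2", lf_2)]:
    verdict = "roughly faithful" if 0.95 <= lf <= 1.05 else "distorted"
    print(f"{name}: LF = {lf:.2f} ({verdict})")
```

Unrounded, the first example gives about 5.78; the value 5.77 comes from dividing the rounded intermediates 8.83 and 1.53.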
Principle 3: The chart should minimize graphical complexity
Key concepts
• Sometimes a table is better
• Data-ink
• Chartjunk
When a table is better than a chart
For a few data points, a table can do just as well…
• Large amount of information in a very small space
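$$\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used in the graphic}}$$
• Should be ~1
• A ratio of 1 means all ink is devoted to data
• A ratio below 1 means more of the ink goes to non-data elements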
Tufte’s principle:
Erase ink whenever possible
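The data-ink ratio can be estimated by tallying how much of a chart's ink encodes data. A minimal sketch, assuming hypothetical ink amounts (arbitrary units) for each chart element; deciding which elements count as data-ink is a judgment call, recorded here as a flag. The `Element` class and `data_ink_ratio` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    ink: float      # hypothetical amount of ink, arbitrary units
    is_data: bool   # True if the element encodes data


def data_ink_ratio(elements: list[Element]) -> float:
    """Data-ink divided by the total ink used in the graphic."""
    total = sum(e.ink for e in elements)
    data = sum(e.ink for e in elements if e.is_data)
    return data / total


chart = [
    Element("bars", 40, True),
    Element("value labels", 10, True),
    Element("heavy grid lines", 25, False),
    Element("3-D shading", 15, False),
    Element("border box", 10, False),
]
print(f"Before: {data_ink_ratio(chart):.2f}")

# "Erase ink whenever possible": drop the elements that carry no data.
erased = [e for e in chart if e.is_data]
print(f"After erasing non-data ink: {data_ink_ratio(erased):.2f}")
```

Erasing every non-data element drives the ratio to 1; in practice some non-data ink, such as light axis lines, is usually kept for readability.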
Being conscious of data ink
[Figure: two versions of a “Hypothetical City Crime” chart (thefts per 100,000 citizens); the version labeled “(worse)” has a lower data-ink ratio]
Sometimes it’s really a matter of preference.
[Figure: two charts of Sum of Extended Price by Order Date]
3-D Charts
Source: Knaflic (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Chapter 2.
Chartjunk: Data Ink “gone wild”
[Figure: “Hypothetical City Crime” chart, 2003-2010]
Example: The Grid
[Figure: two versions of the “Hypothetical City Crime” chart (thefts per 100,000 citizens, 2003-2010)]
Why are these examples of chartjunk?
What could you
Data Ink Working For Us
Evaluate this chart in terms of data ink.
Imagine it as a bar chart. Then as a table!
Review: Data principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity
http://the-digital-reader.com/2015/04/13/infographic-ebooks-on-track-to-double-dutch-ebook-market-in-2014/
Summary
• Use data visualization principles to assess a visualization
• Tell a story
• Graphical integrity (lie factor)
• Minimize graphical complexity (data ink, chartjunk)
• Explain how a visualization can be improved based on those principles
• Types of visualization
Resources…
• DataMed https://datamed.org/
• Institute for Health Metrics and Evaluation’s Global Health Data Exchange
http://ghdx.healthdata.org/
• NNLM RD3: Resources for Data-Driven Discovery https://nnlm.gov/data/
• NNLM’s YouTube Channel https://www.youtube.com/channel/UCmZqoegBFKJQF69V8d-05Bw
• OHSU’s Big Data to Knowledge https://dmice.ohsu.edu/bd2k/topics.html
• Registry of Research Data Repositories (re3data.org) http://www.re3data.org/
References
• Borgman, Christine L. Big data, little data, no data: Scholarship in the
networked world. MIT Press, 2015.
• Federer, Lisa. Beyond the SEA: Data Science 101: An introduction for
librarians https://www.youtube.com/watch?v=i78ciP1eGxo&t=3s
• Mayer-Schönberger, Viktor, and Kenneth Cukier. Big data: A
revolution that will transform how we live, work and think. Houghton
Mifflin Harcourt, 2013.
Bibliography…
• A Good Example of Misleading Visualization
• http://spatial.ly/2009/09/a-good-example-of-misleading-visualization/
• A quick guide for better data visualizations
• https://www.tableau.com/good-to-great
• The analysis of visual variables for use in the cartographic design of point symbols for mobile Augmented Reality applications
• Łukasz Halik, Adam Mickiewicz University Poznan
• http://www.iag-aig.org/attach/30dee1f85f7bd479367f1f933d48b701/V61N1_2FT.pdf
• The Benefits and Future of Data Visualization
• StatSilk founder Frank van Cappelle
• https://www.statsilk.com/blog/benefits-and-future-data-visualization
• Charting Statistics
• Mary Eleanor Spear
• https://archive.org/details/ChartingStatistics
• Color Brewer
• http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3
• Data: The language of modern business leaders
• Steve Proctor, March 17, 2017
• http://www.itbusiness.ca/sponsored/data-the-language-of-modern-business-leaders
• Data Visualization 101: How to Choose the Right Chart or Graph for Your Data
• Jami Oetting
• https://blog.hubspot.com/marketing/types-of-graphs-for-data-visualization
• Datavis.ca
• http://www.datavis.ca/index.php
• Diverging color schemes: Showing good data isn't enough; you need to show it well
• Alberto Cairo, June 26, 2016
• http://www.thefunctionalart.com/2016/06/diverging-color-schemes-showing-good_26.html
• 8 Horrible Data Visualizations That Make No Sense
• Eric Limer, September 02, 2013
• http://gizmodo.com/8-horrible-data-visualizations-that-make-no-sense-1228022038
• 55 Striking Data Visualization and Infographic Poster Designs
• Igor Ovsyannykov, May 16, 2011
• http://inspirationfeed.com/inspiration/infographics/55-striking-data-visualization-and-infographic-poster-designs/
• 4 Tips for Promoting Predictive Analytics in Your Organization
• Fern Halper, September 26, 2017
• https://tdwi.org/articles/2017/09/26/ADV-ALL-4-Tips-for-Promoting-Predictive-Analytics.aspx
THANK YOU