Lecture2_VA_Handling_Data
Rubin’s vase
What do you see in this figure?
IITK CS661: Big Data Visual Analytics: Soumya Dutta 6
Pre-attentiveness
• Also called pop-out
Visual Information Seeking Mantra
• Ben Shneiderman’s Mantra: Overview first, zoom and filter, then details-on-demand!
Overview first
Zoom
Filter
Details on demand
Another Paradigm: Focus + Context
• Focus + Context:
• One single view which shows information in direct context
• Maintains continuity and does not require the viewer to shift back and forth between views
• But: there is distortion!
Source: Google
Humans Are Imperfect
• Spot the difference: Change blindness
Source: Wikipedia
Human Limitations for Visualization
• The Magic Number Seven (7 ± 2) for visualization
• Not more than 7 ± 2 segments in a pie chart
• Not more than 7 ± 2 colors in a line chart
• and so on…
Miller, G. A. (1956). "The magical number seven, plus or minus two: Some limits on our capacity for processing information". Psychological Review, 63(2), 81–97.
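One possible way to respect the 7 ± 2 guideline before drawing a pie chart is to keep the largest categories and merge the remainder into a single segment. The sketch below is only an illustration: the function name `cap_categories`, the default of 7 segments, and the "Other" label are assumptions, not something prescribed by the lecture.

```python
from collections import Counter

def cap_categories(counts, max_segments=7, other_label="Other"):
    """Keep the largest categories and merge the rest into one segment,
    so that at most `max_segments` segments remain.
    Assumes `other_label` is not already a category name."""
    if len(counts) <= max_segments:
        return dict(counts)
    ranked = Counter(counts).most_common()        # largest first
    kept = dict(ranked[:max_segments - 1])        # leave one slot for "Other"
    kept[other_label] = sum(value for _, value in ranked[max_segments - 1:])
    return kept

# Hypothetical category counts with nine segments
counts = {"A": 40, "B": 25, "C": 12, "D": 8, "E": 5, "F": 4, "G": 3, "H": 2, "I": 1}
capped = cap_categories(counts)
print(capped)
```

The total is preserved; only the smallest categories lose their individual segments.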
Example of Visual Complexity
• Standardization
• IQR = Q3 – Q1
• Difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data
• Immune to outliers
• Relies on the median and IQR, which are robust to extreme values
• Ensures that most of the data falls within a consistent range after scaling
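The bullets above describe scaling with the median and IQR. A minimal sketch follows, assuming the usual robust-scaling formula (x − median) / IQR, which is not spelled out in the surviving slide text. Z-score standardization is included only for comparison. The function names and the sample data are hypothetical.

```python
import statistics

def robust_scale(values):
    """Robust scaling: (x - median) / IQR, with IQR = Q3 - Q1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(x - median) / iqr for x in values]

def standardize(values):
    """Z-score standardization: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 1000]  # 1000 is an outlier

robust = robust_scale(data)
zscores = standardize(data)

# Spread of the eight non-outlier values under each scaling
print(max(robust[:8]) - min(robust[:8]))
print(max(zscores[:8]) - min(zscores[:8]))
```

With robust scaling the non-outlier values stay well separated, while the z-scores of the same values are squeezed together because the single outlier inflates the mean and standard deviation.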
Imbalanced Data
• Alternatives
• Buy more storage
• Buy more computers or faster ones
• Develop more efficient algorithms
Summary Data
• Distribution-based
• Clustering
• Sampling (Later in the course)
• Systematic/Regular
• Random
• Stratified
• Adaptive/Data-driven
• Importance-driven
• Cluster-based
• Dimension Reduction (Later in the course)
AI/ML model
• AI/ML techniques (Later in the course)
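Three of the sampling strategies listed above (systematic/regular, random, stratified) can be sketched with the Python standard library. This is an illustrative sketch only: the function names, the 10% sampling fraction, and the toy data set are assumptions, and the adaptive, importance-driven, and cluster-based variants are not shown.

```python
import random
from collections import defaultdict

def systematic_sample(items, step, start=0):
    """Systematic/regular sampling: keep every `step`-th item."""
    return items[start::step]

def random_sample(items, k, seed=None):
    """Simple random sampling of k items without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, k)

def stratified_sample(items, key, fraction, seed=None):
    """Stratified sampling: draw the same fraction from each stratum (group)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(fraction * len(group)))  # at least one item per stratum
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data set: 90 records labelled "common", 10 labelled "rare"
records = [("common", i) for i in range(90)] + [("rare", i) for i in range(10)]

regular = systematic_sample(records, step=10)
simple = random_sample(records, k=10, seed=0)
stratified = stratified_sample(records, key=lambda r: r[0], fraction=0.1, seed=0)

print(len(regular), len(simple), len(stratified))
```

Stratified sampling guarantees that the small "rare" group is represented in the sample, whereas simple random sampling may miss it entirely.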