Lecture1_Introduction_bd248164-5dbb-4aa0-bf56-90f35db41208

The document outlines the course structure for CS 661: Big Data Visual Analytics at IIT Kanpur, taught by Soumya Dutta. It includes details on class timings, course topics, grading schemes, policies on attendance and academic honesty, and required resources. The course aims to provide students with a comprehensive understanding of data visualization, analytics, and the application of machine learning in visual computing.

Uploaded by

Swaraj Sonavane

Big Data Visual Analytics (CS 661)

Instructor: Soumya Dutta


Department of Computer Science and Engineering
Indian Institute of Technology Kanpur (IITK)
email: [email protected]
Logistics

IITK CS661: Big Data Visual Analytics: Soumya Dutta 2


Course Staff
• Instructor:
• Soumya Dutta ([email protected])
• https://soumyadutta-cse.github.io/
• TAs:
• TA information will be provided to you soon
• Contact your assigned TA for grading help

• We will use HelloIITK for this course



Class Timings
• Monday & Wednesday
• Time: 2:00pm – 3:15pm
• Location: Rajeev Motwani (RM) Building, Room: 101 (RM - 101)
• Office hours: By email appointment



Course Topics
Index | Module | Topics Covered
1 | Fundamentals of Data Visualization | Introduction to Visualization and Visual Analytics; Foundations of Data Visualization, Visual Abstractions, Visual Variables, Various types of Data
2 | Scientific Visualization (SciVis) | Big Data Characteristics, Data Reduction, Various Data Models; Visualization Pipeline; Scientific Visualization Software such as VTK, ParaView, etc.; Isosurface Algorithm; Volume Rendering Algorithm
3 | Information Visualization (InfoVis) | Fundamentals of Information Visualization, Software for Information Visualization; High Dimensional Data Analysis and Visualization Techniques
4 | Big Data Analysis and Visual Computing Techniques | Big Data Analytics, Statistical Modeling Techniques; Information Theory Techniques for Visualization; Time-varying Data, Ensemble Data and Uncertainty Visualization
5 | Machine Learning for Visual Computing of Large Data | Machine Learning for Visual Computing and Visual Analytics; Applications of Machine Learning to Big Data Visual Computing; Visual Analytics and Explainability of Machine Learning Models
6 | Advanced Topics | Extreme-scale Data Analytics, Parallel and High-Performance Visualization; Exascale Computing, Future Paradigms



Grading/Evaluation Scheme
Category Split
Attendance 5%
Quiz 10%
Programming Assignments 30%
Mid Sem 25%
Final Sem Project 30%

• Attendance will be taken for a subset of classes and marks will be assigned
based on them
• Assignments: Groups of 2
• Final semester project: Groups of about 7 to 8 (will be decided later)
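
As a rough illustration of how the category split above combines into a single raw score, here is a minimal Python sketch. The weights come from the grading table; the function name `weighted_total`, the 0-100 scale for each category, and the example scores are assumptions made for illustration. Since grading is relative, this raw total would not map directly to a letter grade.

```python
# Weights taken from the Grading/Evaluation Scheme table (percent of total).
WEIGHTS = {
    "attendance": 5,
    "quiz": 10,
    "programming_assignments": 30,
    "mid_sem": 25,
    "final_project": 30,
}


def weighted_total(scores):
    """Combine per-category scores (each assumed to be on a 0-100 scale)
    into a single 0-100 raw total using the weights above."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# Hypothetical scores, invented purely for illustration.
example_scores = {
    "attendance": 100,
    "quiz": 80,
    "programming_assignments": 90,
    "mid_sem": 70,
    "final_project": 85,
}

print(weighted_total(example_scores))  # 83.0
```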
Noteworthy Points
1. We might add new, drop existing, or reorder topics depending on the
progress and class feedback. Things may be changed by mutual
consent after discussion in class.
2. Lectures in the class are the best resources.
3. Grading will be relative.
4. If required, extra classes will also be conducted on weekends.



Policies
• Please be on time for the lectures.
• Attendance will be taken in class from time to time and the marks will be
awarded based on it.
• You are expected to submit your assignments on time.
• A 10% penalty applies for each day after the submission deadline, up to 20%
(2 late days). After that, you get zero. This policy will be strictly followed.
• Students caught cheating or plagiarizing will face heavy punishment: they
could automatically fail the course and will be reported to the institute.
• Please cite your sources properly in your work.
• Your assignments should be your own original work.
• If you are unwell, please follow the standard IITK procedure.
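
The late-submission rule above can be written out as a small calculation. The following Python sketch is one possible reading, not an official grading script: it assumes lateness is counted in whole days and that the penalty is taken as a percentage of the marks earned. The function names are hypothetical.

```python
def late_penalty_percent(days_late):
    """Percentage deducted for a submission that is `days_late` days late.

    Assumption: lateness is counted in whole days. Days 1 and 2 cost
    10% each (20% at most); anything later receives zero credit.
    """
    if days_late <= 0:
        return 0
    if days_late <= 2:
        return 10 * days_late
    return 100


def penalized_score(raw_score, days_late):
    """Apply the late penalty to the marks earned (assumed interpretation)."""
    return raw_score * (100 - late_penalty_percent(days_late)) / 100


# An assignment that earned 80 marks, submitted 0 to 3 days late.
for days in range(4):
    print(days, penalized_score(80, days))
```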
Academic Honesty
• Please DON’T CHEAT or Plagiarize!!
• We will run plagiarism checks
• Students caught cheating or plagiarizing may fail the course and will
be reported to the institute.
• You must cite all sources in your work, including AI tools.
• Your assignments should be your own original work.
• IITK CSE Anti-cheating policy:
https://www.cse.iitk.ac.in/pages/AntiCheatingPolicy.html
• The List of Things I Never Want To Hear Again (by Tamara Munzner)
• https://www.cs.ubc.ca/~tmm/courses/cheat.html



Plagiarism Flow Chart

Source: https://thevisualcommunicationguy.com/


Resources
• Data Visualization: Principles and
Practice by Alexandru C. Telea, CRC
Press
• Visualization Analysis and Design by
Tamara Munzner, A K Peters
Visualization Series, CRC Press
• The Visualization Handbook edited by
Charles D. Hansen and Chris R.
Johnson
• Research papers and other study
materials provided during the class to
cover selected topics



Visualization is Cool! Applications in Science

Source: https://www.youtube.com/watch?v=etUWNf2ZZpg


Visualization is Cool! Applications in ML

Source: https://www.youtube.com/watch?v=wvsE8jm1GzE&list=RDQMjKx69KjeL5I&index=4


Course Overview: What Are You Going
to Learn?

Overview

Data Visualization Analytics



Data
• Various types of data
• How to handle such data?
• How to process and analyze such data?
• How to Visualize such data?
• How to perform interactive analytics with data?
• How to find features/patterns from data?
• How to deal with big data?
• How to intelligently summarize large data?



From Data to Visualization

Visualization
in Physical Science

Visualization
in Medical Science

Source: Google

From Data to Visualization

Visualization of Covid 19 Data

Source: Google

From Data to Visualization

Visualization of correlation

Source: Google

From Data to Visualization

Visualization for finding relationships in variables

Source: Google

From Data to Visualization

Visualization for finding connections in variables

Source: Google

High Dimensional Data to Visualization Space

Visualizing High Dimensional Data

Source: http://vis.pku.edu.cn/hdvis/en/projects/interactiveCustomization.html


From Data to Transformed Representation

Sampling ML Model



From Transformed Data to Visualization

Source: Google

Visualization for Machine Learning

DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks, J. Wang et al., TVCG
https://ieeexplore.ieee.org/document/8454905

Visualization for Machine Learning

Visualizing Results of a ML Model Prediction


Source: Google

Visualization for Machine Learning: Loss Landscapes

Loss landscape of ResNet-56 without skip-connection vs. loss landscape of ResNet-56 with skip-connection

Source: https://arxiv.org/pdf/1712.09913


Visualization for Machine Learning

Source: Google

Visualization for Machine Learning
• CNNVis
• http://shixialiu.com/publications/cnnvis/demo/

• CNN Explainer
• https://poloclub.github.io/cnn-explainer/

• Understanding DNN:
• https://distill.pub/2020/grand-tour/

• GAN lab
• https://poloclub.github.io/ganlab/



Visualization and Data Analysis at Exascale
• High Performance Visualization
• In Situ Data Analysis and Visualization

Frontier: World’s First Exascale Supercomputer (https://www.olcf.ornl.gov/frontier/)


Exascale: 10^18 IEEE 754 Double Precision (64-bit) operations (multiplications
and/or additions) per second (ExaFlops)
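
To get a feel for the 10^18 operations-per-second figure, the following Python sketch does a back-of-the-envelope estimate. It is idealized: it assumes the peak rate is sustained for the whole computation, uses the standard estimate of about 2n^3 floating-point operations for a dense n x n matrix multiplication, and compares against a hypothetical machine sustaining 10^11 operations per second (a figure chosen for illustration, not taken from the slides).

```python
EXAFLOP_RATE = 10**18           # double-precision operations per second
HYPOTHETICAL_LAPTOP = 10**11    # assumed rate, for comparison only


def seconds_needed(num_operations, ops_per_second):
    """Idealized run time: total operations divided by a sustained rate."""
    return num_operations / ops_per_second


# Dense n x n matrix multiply: roughly n^3 multiplications plus n^3 additions.
n = 1_000_000
total_ops = 2 * n**3

print(seconds_needed(total_ops, EXAFLOP_RATE))                  # 2.0
print(seconds_needed(total_ops, HYPOTHETICAL_LAPTOP) / 86_400)  # time in days
```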

Source: Google

Why Should You Learn Visual Analytics?

The number of jobs mentioning “data visualization” as a skill has increased steadily, from
only 1,888 jobs in 2010 to 30,327 jobs in 2017 (about 16X growth)

Source: Teaching Data Visualization as a Skill, Ryan et al., IEEE CG&A 2019

What is Expected From You?
• Basic knowledge of Linear Algebra, Probability, and Statistics
• Strong programming background (C/C++, Python, JavaScript)
• Interest and Motivation to learn new topics (sometimes on your own!)
• Creativity and Imagination

If you do not have the above skills or are unsure, talk to me!

• Goal of the Course: Give you a comprehensive view of the Big Data Visual
Computing and Analysis Domain
• Conduct research in these topics
• Use the learned skills in industry/academia



Introduction



Acknowledgements
• Some of the following slides are adapted from the excellent course
materials made available by:
• Prof. Klaus Mueller (State University of New York at Stony Brook)
• Prof. Tamara Munzner (University of British Columbia)



Visualization
• Computer-based visualization systems provide visual representations
of datasets designed to help people carry out tasks more effectively.



Visualization: Human in the Loop… Why?
• Computer-based visualization systems provide visual representations
of datasets designed to help people carry out tasks more effectively.
• Visualization is suitable when there is a need to augment human capabilities
rather than replace people with computational decision-making methods.
• Don’t need vis when fully automatic solution exists and is trusted
• Many analysis problems ill-specified
• don’t know exactly what questions to ask in advance



Representation for Data: Why?
• Computer-based visualization systems provide visual representations
of datasets designed to help people carry out tasks more effectively.
• Replace cognition with perception



Why Depend on Vision?
• Computer-based visualization systems provide visual representations
of datasets designed to help people carry out tasks more effectively.

• Roughly 50% of our brain is dedicated to vision


• The human visual system is a high-bandwidth channel to the brain
• Vision is a massively parallel processor dedicated to:
• Detect
• Analyze
• Recognize
• Reason with



Why not Only Show the Summary Data?
• Computer-based visualization systems provide visual representations
of datasets designed to help people carry out tasks more effectively.

• Summaries can lose information; details matter!


• Confirm expected
• Find unexpected patterns
• Assess validity of data models
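
One classic illustration of this point is Anscombe's quartet (an example chosen here; the slide text does not name it): four small datasets whose means, variances, and correlations are nearly identical, yet whose scatter plots look completely different. The Python sketch below only computes the summary statistics; the `pearson` helper is written out by hand for self-containment. Plotting each pair is what reveals the differences.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / (sx * sy) ** 0.5


# The summaries agree to about two decimal places across all four sets.
for name, (xs, ys) in quartet.items():
    print(name, round(mean(ys), 2), round(variance(ys), 2), round(pearson(xs, ys), 2))
```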



Utilization of Resources
• Three different kinds of resources to think about
• Computational limits
• Computation time, system memory
• Display limits
• Limited number of pixels on screen to use
• Information density: ratio of space used to encode information vs whitespace
• Tradeoff between clutter and wasting space
• Human limits
• Human time, human memory, human attention
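
The information-density ratio mentioned under display limits can be made concrete with a toy calculation. The Python sketch below is a simplification under stated assumptions: the display is modeled as a grid of booleans where `True` marks a pixel that encodes data, and the function name `information_density` is hypothetical. Real measurements would depend on how "encoding information" is defined for a given chart.

```python
def information_density(pixels):
    """Ratio of pixels that encode information to whitespace pixels.

    `pixels` is a 2D list of booleans (True = pixel encodes data).
    Returns infinity when there is no whitespace at all.
    """
    total = sum(len(row) for row in pixels)
    used = sum(1 for row in pixels for p in row if p)
    whitespace = total - used
    if whitespace == 0:
        return float("inf")
    return used / whitespace


# A toy 4 x 4 "display" with 4 data pixels and 12 whitespace pixels.
grid = [
    [True, False, False, False],
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, True],
]

print(information_density(grid))
```

A very low ratio suggests wasted space, while a very high one suggests clutter, which is the tradeoff the slide points to.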



Datasets

What Is Needed for Visualization?
• Data (wide variety)
• Algorithms
• data mining
• data analytics, AI/ML, statistical, ….
• Computer
• run those algorithms
• data storage
• Humans
• with a purpose/need to understand their data
• endowed with cognitive faculties, creative thought, intuition
• domain expertise
• Understanding of humans
• perception, cognition, HCI issues
• we can gain it through experimentation with humans



Visual Analytics
• Data (wide variety)
• Algorithms
• data mining
• data analytics, AI/ML, statistical, ….
• Computer
• run those algorithms
• data storage
• Humans
• with a purpose/need to understand their data
• endowed with cognitive faculties, creative thought, intuition
• domain expertise
• Understanding of humans
• perception, cognition, HCI issues
• we can gain it through experimentation with humans

= Visual Analytics



Visualization Can Be Beautiful



Visualization is Fast

< ~200 ms to recognize the red dot



Visualization Can Be Interactive

Source: https://www.youtube.com/watch?v=Aht3wotrrBk&list=PLyCRt3MN8s8OJp-M5UdCQv-NDllAqJOb5&index=5


Visualization Can Be Deceptive


Which circle in the middle is larger?


Are the horizontal lines parallel or do they slope?

