CSE6242-000-Intro
io/#cse6242
CSE6242 / CX4242
Data & Visual Analytics
Duen Horng (Polo) Chau
Professor, College of Computing
Associate Director, MS Analytics
Director of Industry Relations, The Institute for Data Engineering and Science
Associate Director of Corporate Relations, The Center for Machine Learning
Georgia Tech
Course Registration
The classroom has a capacity of 305 students, and we will raise the number
of seats to 305.
If you have decided not to take this course, please free up your
seat ASAP so that other students can get in.
If you are on the waitlist, please wait for seats to open up.
Enrollment changes a lot during the first week of class.
You are welcome to connect with me
on LinkedIn!
How to address Polo?
Grammatically correct
Prof. Chau
Dr. Chau
Prof. Polo
Dr. Polo
The course focuses on
working with large datasets.
Polo Club of Data Science
poloclub.github.io
AI (ARTIFICIAL INTELLIGENCE) + HI (HUMAN INTELLIGENCE)
Scalable, interactive, interpretable tools to make sense of
complex large-scale datasets and models
HUMAN-CENTERED AI
(Image sources: www.worldwidewebsize.com, www.opte.org)
Facebook
2 Billion Users
Citation Network
250 Million Articles
Cellphone Network
Who-calls-whom (100 million users)
Protein-protein interactions
200 million possible interactions in human genome
Data
Insights
How to do that?
COMPUTATION
+
HUMAN INTUITION
Or, to ride the AI wave…
ARTIFICIAL INTELLIGENCE
+
HUMAN INTELLIGENCE
How to do that?
MACHINE LEARNING + HCI (HUMAN-COMPUTER INTERACTION)
“Computers are incredibly fast,
accurate, and stupid.
Human beings are incredibly
slow, inaccurate, and brilliant.
Together they are powerful
beyond imagination.”
Assignment Submission: Canvas/Gradescope
Course Homepage
For syllabus, schedule, projects, datasets, etc.
Important to join Ed Discussion
because…
• We will announce events related to this class and data
science in general
• Hackathons
Add your photo on Canvas and Ed Discussion to help us and your classmates recognize you!
If you need help cropping your headshot photo into a square, use
Magic Crop (https://poloclub.github.io/magic-crop/)
Course Goals
What is Data & Visual Analytics?
No formal definition!
Polo’s definition:
the interdisciplinary science of combining
computation techniques and
interactive visualization
to transform and model data to aid
discovery, decision making, etc.
What are the “ingredients”?
(Image source: http://spanning.com/blog/choosing-between-storage-based-and-unlimited-storage-for-cloud-data-backup/)
What is big data? Why care?
Many businesses are based on big data.
Search engines: rank webpages, predict what you’re going to type
Advertisement: infer what you like, based on what your friends like;
show relevant ads
Finance
…
Good news! Many jobs!
Course Schedule
(Analytics Building Blocks)
Collection
Cleaning
Integration
Analysis
Visualization
Presentation
Dissemination
Building Blocks, Not Rigid “Steps”
Collection
Cleaning
Integration
Analysis
Visualization
Presentation
Dissemination
• Can skip some
• Can go back (two-way street)
   • Data types inform visualization design
   • Data size informs choice of algorithms
   • Visualization motivates more data cleaning
   • Visualization challenges algorithm assumptions,
     e.g., a user finds that results don’t make sense
Course Goals
Grading
• [50%] 4 homework assignments
   • End-to-end analysis
   • Techniques (computation and vis)
   • “Big data” tools, e.g., Hadoop, Spark, etc.
• [50%] Group project (4 to 6 people)
• [Bonus points] Quizzes
   • 4 online quizzes in total; ~10 min each
   • Each worth 1 course grade point (1%); lowest score dropped
• No Exams 🎉 🎉 🎉
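The quiz bonus arithmetic can be made concrete with a small Python sketch. It is not an official course calculator: `course_grade` and its constants are hypothetical, and it assumes the 4 homework scores are averaged into one percentage and that each kept quiz adds its fraction correct times 1 grade point.

```python
from typing import Sequence

# Weights from the grading breakdown; quiz scaling is an assumption.
HOMEWORK_WEIGHT = 0.5
PROJECT_WEIGHT = 0.5
POINTS_PER_QUIZ = 1.0  # assumed: each kept quiz adds up to 1 course grade point
NUM_QUIZZES = 4


def course_grade(homework_avg: float, project: float, quiz_scores: Sequence[float]) -> float:
    """Estimate a course grade (0-100 base scale plus quiz bonus).

    homework_avg: average of the 4 homework scores, in percent (assumed equally weighted).
    project: group project score, in percent.
    quiz_scores: fraction correct on each quiz, in [0, 1].
    """
    if len(quiz_scores) != NUM_QUIZZES:
        raise ValueError(f"expected {NUM_QUIZZES} quiz scores")
    base = HOMEWORK_WEIGHT * homework_avg + PROJECT_WEIGHT * project
    # Drop the lowest quiz score; each of the remaining three contributes
    # up to POINTS_PER_QUIZ, scaled by its fraction correct (assumed).
    kept = sorted(quiz_scores)[1:]
    bonus = POINTS_PER_QUIZ * sum(kept)
    return base + bonus


# 90% homework average, 80% project, quiz scores 1.0, 0.5, 0.8, 0.0:
# base = 45 + 40 = 85; lowest quiz (0.0) dropped; bonus = 0.5 + 0.8 + 1.0 = 2.3
print(round(course_grade(90, 80, [1.0, 0.5, 0.8, 0.0]), 2))  # prints 87.3
```

Because the lowest of the 4 quizzes is dropped, the bonus under these assumptions tops out at 3 points.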
Policies. Very Important!
(on course website)
From Previous Classes…
Full conference paper
Short paper
“As someone with 25 years work experience, I find my self directly applying what I
am learning within days. The skill set of rapid learning that you are teaching is the
main thing I interview for.”
“…thank you for the materials taught in DVA. As it was perfectly aligned with the what
employers are looking out for. It made less challenging for me to secure this new job
[Business Intelligence engineer at Amazon] in this competitive job market.”
“I would like to say thank you for your class! Thanks to the skills I got from the class
and the project, I got the offer.”
“I feel like the concepts from your class are like a rite of passage for an aspiring
data scientist. Assignments lead to a feelings of accomplishment and truly
progressing in my area of passion.”
“I really get more intuition about how to deal with data with some powerful tools in
HW3 [uses AWS]. That feeling is beyond description for me.”
What we expect from you
• Actively participate throughout the course!
• If you need help, let us know early — the earlier you let us
know, the more help we can offer
FREE After-class Coffee ☕
• After (some) classes, we’ll have 5-7 volunteers for
FREE after-class coffee