
CSE6242-000-Intro

CSE6242/CX4242 is a course on Data & Visual Analytics taught by Professor Duen Horng (Polo) Chau at Georgia Tech, focusing on working with large datasets and combining computational techniques with interactive visualization. The course includes practical assignments, a group project, and emphasizes the importance of data science skills in various industries. Students are encouraged to actively participate, utilize resources like Ed Discussion, and engage with the course community.


poloclub.github.io/#cse6242

CSE6242 / CX4242
Data & Visual Analytics
Duen Horng (Polo) Chau
Professor, College of Computing
Associate Director, MS Analytics
Director of Industry Relations, The Institute for Data Engineering and Science
Associate Director of Corporate Relations, The Center for Machine Learning
Georgia Tech

Course Registration
Classroom has capacity for 305 students. We will raise the number
of seats to 305.

If you have decided not to take this course, please free up your
seat ASAP, so other students can get in.

If you are on the waitlist, please wait for seats to open up.
Enrollment changes a lot during the first week of class.

Section      Seats filled   Waitlist slots taken
CSE 6242 A   233/235        102
CX 4242 A    54/55          13
Course TAs
Be very nice to them!

Feyzi Can Eser
Yiwei Kuang
Ishan A Desai
Arya Mohan
Huayi Peng
Mahek Mishra
Zilu Zhu
Aniruddha Prashant Deshpande
Google “Polo Chau” (easy to find; should be the first result)

Welcome to connect
on LinkedIn!
How to address Polo?
Grammatically correct

Prof. Chau

Dr. Chau

Grammatically incorrect, but popular

Prof. Polo

Dr. Polo
The course focuses on
working with large datasets.

(Also the focus of Polo’s research group)

Polo Club of Data Science
poloclub.github.io

AI (Artificial Intelligence) + HI (Human Intelligence)
Scalable, interactive, interpretable tools to make sense of
complex large-scale datasets and models
Human-Centered AI for Everyone

Learn & Explain Interpret & Attribute Guide & Safeguard


☝ Diffusion Explainer ☝ WizMap ☝ LLM Self Defense
Transformer Explainer LLM Attributor Click Diffusion
MeMemo
Internet
50 Billion Web Pages

www.worldwidewebsize.com www.opte.org
Facebook
2 Billion Users

Citation Network
250 Million Articles

www.scirus.com/press/html/feb_2006.html#2
Modified from well-formed.eigenfactor.org
Many More
Twitter
Who-follows-whom (500 million users)

Who-buys-what (120 million users)

Cellphone network
Who-calls-whom (100 million users)

Protein-protein interactions
200 million possible interactions in human genome

Sources: www.selectscience.net www.phonedog.com www.mediabistro.com www.practicalecommerce.com/


“Big Data” Analyzed
Graph                         Nodes         Edges
YahooWeb                      1.4 Billion   6 Billion
Symantec Machine-File Graph   1 Billion     37 Billion
Twitter                       104 Million   3.7 Billion
Phone call network            30 Million    260 Million
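
To get a feel for the scale in the table above, here is a minimal Python sketch that computes the edges-per-node ratio for each graph and a rough size for a raw edge list. The node and edge counts come from the table; the 16 bytes per edge (two 8-byte integer IDs) and the helper names are illustrative assumptions, not something the slides specify.

```python
# Node and edge counts copied from the table above.
GRAPHS = {
    "YahooWeb": (1.4e9, 6e9),
    "Symantec Machine-File Graph": (1e9, 37e9),
    "Twitter": (104e6, 3.7e9),
    "Phone call network": (30e6, 260e6),
}

# Assumption (not from the slides): each edge is stored as two 8-byte integer IDs.
BYTES_PER_EDGE = 16


def edges_per_node(nodes, edges):
    """Ratio of edges to nodes, a rough indicator of graph density."""
    return edges / nodes


def edge_list_gigabytes(edges, bytes_per_edge=BYTES_PER_EDGE):
    """Approximate size in GB (10^9 bytes) of an uncompressed edge list."""
    return edges * bytes_per_edge / 1e9


def summarize(graphs=GRAPHS):
    """Map each graph name to (edges per node, edge-list size in GB)."""
    return {
        name: (round(edges_per_node(n, e), 1), round(edge_list_gigabytes(e), 1))
        for name, (n, e) in graphs.items()
    }


if __name__ == "__main__":
    for name, (ratio, gb) in summarize().items():
        print(f"{name}: {ratio} edges per node, ~{gb} GB as a raw edge list")
```

Under that storage assumption, the Symantec graph's edge list alone would take roughly 592 GB, which is one reason storage and algorithm scalability come up so often with data of this size.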

We also work with small data.
Small data also needs love.
7±2
Number of items an average human
holds in working memory
George Miller, 1956

7
Data
Insights
How to do that?

COMPUTATION
+
HUMAN INTUITION

Or, to ride the AI wave…

ARTIFICIAL INTELLIGENCE
+
HUMAN INTELLIGENCE

How to do that?

COMPUTATION                                 INTERACTIVE VIS
Automatic                                   User-driven; iterative
Summarization, clustering, classification   Interaction, visualization
>Millions of nodes                          Thousands of nodes

Both develop methods for making sense of network data
Our Approach for Big Data Analytics

MACHINE LEARNING                            HCI (Human-Computer Interaction)
Automatic                                   User-driven; iterative
Summarization, clustering, classification   Interaction, visualization
>Millions of items                          Thousands of items

Our research combines the Best of Both Worlds
Our mission & vision:

Scalable, interactive, usable tools for big data analytics
“Computers are incredibly fast,
accurate, and stupid.
Human beings are incredibly
slow, inaccurate, and brilliant.
Together they are powerful
beyond imagination.”

(Einstein might or might not have said this.)


Logistics
Course website (policies, syllabus, schedule, etc.): https://poloclub.github.io/cse6242-2024fall-campus/ (link also available on Canvas)

Discussion, Q&A, find teammates: Ed Discussion (link available on Canvas)

Assignment submission: Canvas/Gradescope
Course Homepage
For syllabus, schedule, projects, datasets, etc.

If you Google “cse6242”, you will see many matches.


Make sure you click the correct site!
Join Ed Discussion Right Away
via canvas.gatech.edu

Important to join Ed Discussion
because…
• We will announce events related to this class and data
science in general

• Distinguished lectures, seminars

• Hackathons

• Company recruitment events (with free food and swag!)

Add your photo to help us and your classmates recognize you!

Canvas Ed Discussion

If you need help cropping your headshot photo into a square shape, use
Magic Crop (https://poloclub.github.io/magic-crop/)
Course Goals

What is Data & Visual Analytics?

No formal definition!

Polo’s definition:
the interdisciplinary science of combining
computation techniques and
interactive visualization
to transform and model data to aid
discovery, decision making, etc.
What are the “ingredients”?

Need to worry (a lot) about: storage, complex system design, scalability of algorithms, visualization techniques, interaction techniques, statistical tests, etc.

It wasn’t this complex before the big data era. Why?
http://spanning.com/blog/choosing-between-storage-based-and-unlimited-storage-for-cloud-data-backup/
What is big data? Why care?
Many businesses are based on big data.
Search engines: rank webpages, predict what you’re going to type

Advertisement: infer what you like, based on what your friends like;
show relevant ads

E-commerce: recommend movies/products (e.g., Netflix, Amazon)

Health IT: patient records (EMR)

Finance


Good news! Many jobs!

Most companies are looking for “data scientists”


The data scientist role is critical for organizations looking
to extract insight from information assets for ‘big data’
initiatives and requires a broad combination of skills
that may be fulfilled better as a team
- Gartner (http://www.gartner.com/it-glossary/data-scientist)

Breadth of knowledge is important.


This course helps you learn some important skills.

Course Schedule
(Analytics Building Blocks)

Collection

Cleaning

Integration

Analysis

Visualization

Presentation

Dissemination
Building blocks. Not Rigid “Steps”.
Collection, Cleaning, Integration, Analysis, Visualization, Presentation, Dissemination

Can skip some
Can go back (two-way street)
• Data types inform visualization design
• Data size informs choice of algorithms
• Visualization motivates more data cleaning
• Visualization challenges algorithm assumptions, e.g., user finds that results don’t make sense
Course Goals

• Learn visual and computational techniques and use them in complementary ways

• Gain a breadth of knowledge

• Learn practical know-how by working on real data & problems
Grading
• [50%] 4 homework assignments
  • End-to-end analysis
  • Techniques (computation and vis)
  • “Big data” tools, e.g., Hadoop, Spark, etc.
• [50%] Group project (4 to 6 people)
• [Bonus points] Quizzes
  • 4 online quizzes in total; ~10min each
  • 1% course grade point each; lowest score dropped
• No Exams 🎉 🎉 🎉
Policies. Very Important!
(on course website)

Attendance, COVID-19, grading, plagiarism, collaboration, late submission, and the “warnings” about the difficulty of this course
From Previous Classes…

• Class projects turned into papers at top conferences

• Projects as portfolio pieces on CV

• Increased job and internship opportunities

• Former students sent me “thank you” notes
Full conference paper
Short paper
“As someone with 25 years work experience, I find my self directly applying what I
am learning within days. The skill set of rapid learning that you are teaching is the
main thing I interview for.”

“…thank you for the materials taught in DVA. As it was perfectly aligned with the what
employers are looking out for. It made less challenging for me to secure this new job
[Business Intelligence engineer at Amazon] in this competitive job market.”

“I would like to say thank you for your class! Thanks to the skills I got from the class
and the project, I got the offer.”

“I feel like the concepts from your class are like a rite of passage for an aspiring
data scientist. Assignments lead to a feelings of accomplishment and truly
progressing in my area of passion.”

“I really get more intuition about how to deal with data with some powerful tools in
HW3 [uses AWS]. That feeling is beyond description for me.”
What we expect from you
• Actively participate throughout the course!

• If you need help, let us know early. The earlier you let us know, the more help we can offer

• Help your fellow classmates, e.g., help answer questions on Ed Discussion

• Share your ideas! If you have ideas for improving the learning experience, let us know
FREE After-class Coffee ☕
• After (some) classes, we’ll have 5-7 volunteers for FREE after-class coffee

• Polo’s treat. You can order coffee, tea, pastries, whatever you want

• Very casual: you can ask me ANYTHING

• Will try doing this two weeks from now!