[#0] Introduction
Lecture 0
• Software Engineer at
• 70 academic papers
3
Instructor – Yohan Jo (second half)
4
Data Science
- GSDS View -
Lecture 0-1
6
Data Science – From Data to Insight
Big, Diverse Data → Useful insight!
Data Science
7
Data Science – From Data to Insight
Big, Diverse, Dirty Data → Useful insight!
Data Science
Storing – Data Lake
Cleansing and Pre-processing
Analysis
8
Data Science – From Data to Insight
Big, Diverse, Dirty Data → Useful insight!
9
Data Science – Four Pillars (ABC+D)
Big, Diverse, Dirty Data → Useful insight!
Domain (application)
AI model (Math, Stat, ML/DL)
Big Data
Computing
10
Drew Conway’s Venn Diagram (2010)
[The figures are from “The Battle for Data Science,” http://sites.computer.org/debull/A20june/p8.pdf]
12
Data Scientist Job …
13
Data Science, a Global Megatrend
• https://www.youtube.com/watch?v=2ZopVhw6t3Y
14
Data Science, a Global Megatrend
• https://www.youtube.com/watch?v=9Hx5FppPVso
15
Data Science, a Global Megatrend
• At UC Berkeley
▪ 6,000 undergraduate students take DS courses per year
▪ ~2,000 students and 76 TAs for a single course (Data 8, Spring 2022)
16
Graduate School of Data Science
17
GSDS – Vision and Mission
• Have students from various backgrounds (D) dive into the core principles (ABC) of data science, so that these ambidextrous graduates can lead data-driven innovation in various fields
▪ Globally unique mission… challenging of course…
• Methodologies
▪ Not an undergraduate department as a silo but a graduate school as a hub
▪ Not an MBA-style program but a hardcore one that changes students’ DNA
▪ Not only advanced but also basic courses, open to all students outside GSDS (no enrollment limit)
18
GSDS – Curriculum
• Foundation courses (+bootcamp, every semester)
▪ Foundations of Math and Statistics for Data Science (3)
▪ Foundations of Computing for Data Science (3)
• A: Machine Learning and Deep Learning for Data Science 1 (3), 2 (3)
• B: Big Data and Knowledge Management Systems 1 (3), 2 (3)
• C: Computing for Data Science 1 (3), 2 (3)
• Data Science Project (3)
19
GSDS – Growth
• (11 + 2) new faculty members have joined
▪ Google, Amazon, US faculty…
• 40 MS/15 PhD to 80 MS/30 PhD per year
• Being recognized gradually…
[Figure: bar chart by semester, from 2020 Semester 1 to 2022 Semester 2, with three series: Department of Data Science, other departments, and total. The vertical axis runs from 0 to 1,600; the largest value shown is 1,452.]
20
GSDS – Outreach
• Summer/Winter GSDS Bootcamp
▪ 4-week quick and intense program for beginners
▪ Flipped learning
▪ Computing for DS
▪ Math and Statistics for DS
21
GSDS – Outreach
• Google ExploreCSR
▪ Workshops to explore various research fields
▪ Mentoring for CS/DS careers and research
▪ Will be held again this coming summer
22
GSDS – Outreach
• Ambient AI Bootcamp & Competition
▪ Google donation and funding
▪ 3-week education
▪ 1-month project with mentoring
23
About This Course
Lecture 0-2
26
Chef or Farmer?
27
What Chefs Do…
• Making a unique recipe
▪ What ingredient to use
▪ What utensils to use
▪ Sequence of cooking
▪ Time duration of each step
• Cooking fluently according to the recipe
28
Computer Science
• Computer Science is not mainly about computer hardware
▪ Semiconductors (Moore’s law) and integrated circuits belong to electrical engineering!
▪ Just as a chef’s main job is neither farming nor fishing
29
Computer Science
• Computer Science is more about what computers can do (software)
▪ How to make computers do what we want them to do?
▪ We should provide a nice recipe for computers to follow
Manager
Various workers
31
Computing, as Part of Data Science
32
Computing Foundations for Data Science:
A Pathway toward…
33
Power of Programming
• Defining new operations
▪ “Go to work” Program
• (1) check time (2) if it is 9:00, go to the office (3) otherwise, stay at home
▪ “Go home” Program
• (1) check time (2) if it is 18:00, go home (3) otherwise, stay at the office
• Clarification: Although this course covers Python, it is NOT mainly about Python. It is about fundamental concepts, with Python used as a tool
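The two programs above can be sketched in Python roughly as follows. This is only an illustration, not course material: the function names `go_to_work` and `go_home`, the returned strings, and the exact-match comparison against 9:00 and 18:00 are assumptions made for the example.

```python
from datetime import time


def go_to_work(now: time) -> str:
    """'Go to work' program (hypothetical name): decide based on the given time."""
    # (1) check time, (2) if it is 9:00, go to the office
    if now == time(9, 0):
        return "go to the office"
    # (3) otherwise, stay at home
    return "stay at home"


def go_home(now: time) -> str:
    """'Go home' program (hypothetical name): decide based on the given time."""
    # (1) check time, (2) if it is 18:00, go home
    if now == time(18, 0):
        return "go home"
    # (3) otherwise, stay at the office
    return "stay at the office"


if __name__ == "__main__":
    print(go_to_work(time(9, 0)))   # go to the office
    print(go_home(time(17, 30)))    # stay at the office
```

Each function is a new operation built from existing ones (a comparison and a conditional), which is the sense in which programming defines new operations.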
37
This Course is NOT About…
• This course covers neither computer systems nor machine-friendly programming, both of which are important for high-performance big data computing
▪ Please check out
• Computing for Data Science 1, 2
• Big Data and Knowledge Management Systems 1, 2
38
This Course is NOT About…
• This course does not cover the various libraries and frameworks for data analysis, such as pandas, NumPy, TensorFlow, and PyTorch
39
Course Logistics
Lecture 0-3
0. Online lectures
1. Students
• Self-paced, self-disciplined learning
2. TAs/Instructor
• Practices/Q&A
3. Instructor
• Integral guidance
41
Learning Methodology
• When/How to ask questions?
▪ During lectures (Zoom chat or offline)
▪ Q&A session after each lecture – For shy ladies and gentlemen
42
Learning Methodology
• Study group (optional)
▪ Will be organized later this week (4 to 5 members per group)
• Please fill out the Google Form
▪ Small group discussion during lectures
43
So… for Each Lecture…
• Before the lecture
▪ You should watch the videos in advance
▪ While doing this, you should open Jupyter or Visual Studio and type in the code yourself
▪ Ask questions on Slack
• During the lecture
▪ Change your Zoom name to the format [study group name]-[your name], e.g., [study group name]-김형신
▪ Quiz: You will take a quiz with your group members
▪ Review: I will review the content of the videos
▪ Practice: A TA or I will lead a practice session and you will solve some problems collaboratively
• After the lecture
▪ Review and ask questions on Slack
44
TAs
• For the first half
[email protected] [email protected] [email protected]
45
TAs
• For the second half
[email protected] [email protected] [email protected]
46
Textbook
• Practical Programming (PDF is provided!)
▪ Used for the first half of this course
• Introduction to Computing Systems
▪ Used for the second half of this course
47
Grading
• Final 100% (2/14, online)
48
Overcoming Psychological Barrier
• https://www.youtube.com/watch?v=r7onWtuD92U
49
Thanks!
[email protected]
50