[#0] Introduction

The document provides an introduction to a data science course taught by instructors Hyung-Sin Kim and Yohan Jo, detailing their backgrounds and the course's objectives. It outlines the curriculum, emphasizing the importance of programming, data processing, and the methodologies used in data science education. Additionally, it describes the course structure, logistics, and grading system, while encouraging active participation and collaboration among students.


Introduction

Lecture 0

Hyung-Sin Kim and Yohan Jo


2
Instructor – Hyung-Sin Kim (first half)
• Assistant Professor in Data Science, SNU

• Software Engineer at

• Postdoctoral Scholar in Computer Science, UC Berkeley

• BS/MS/PhD in Electrical and Computer Science, SNU

• 70 academic papers

• 4 Postdoc/PhD Fellowships, 3 Best paper finalists, 2 Paper awards

3
Instructor – Yohan Jo (second half)

• Assistant Professor in Data Science, SNU

• Applied Scientist, Amazon

• MS/PhD in Language and Information Technologies, Carnegie Mellon University

• BS/MS in Computer Science, KAIST

4
Data Science
- GSDS View -
Lecture 0-1

Hyung-Sin Kim and Yohan Jo


We Want Insight
Useful insight!

6
Data Science – From Data to Insight
Big, Diverse Data → Data Science → Useful insight!
7
Data Science – From Data to Insight
Big, Diverse, Dirty Data → Data Science → Useful insight!

Data Science
• Storing – Data Lake
• Cleansing and Pre-processing
• Analysis

8
Data Science – From Data to Insight
Big, Diverse, Dirty Data → Useful insight!

• Storing and Managing – Data Lake
• Cleansing and Pre-processing
• Analysis

9
Data Science – Four Pillars (ABC+D)
Big, Diverse, Dirty Data → Useful insight!

• Domain (application)
▪ Where data comes from…
▪ Where insights are applied to…
• AI model (Math, Stat, ML/DL)
• Big Data
• Computing

• Storing and Managing – Data Lake
• Cleansing and Pre-processing
• Analysis

10
Drew Conway’s Venn Diagram (2010)

[The image is from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram]
11


Ullman’s Venn Diagram (2021)

[The figures are from “The Battle for Data Science,” http://sites.computer.org/debull/A20june/p8.pdf]
12
Data Scientist Job …

13
Data Science, a Global Megatrend
• https://www.youtube.com/watch?v=2ZopVhw6t3Y

14
Data Science, a Global Megatrend
• https://www.youtube.com/watch?v=9Hx5FppPVso

15
Data Science, a Global Megatrend
• At UC Berkeley
▪ 6,000 undergraduate students take DS courses per year
▪ ~2,000 students and 76 TAs for a single course (Data 8, Spring 2022)

16
Graduate School of Data Science

17
GSDS – Vision and Mission
• Help students from various backgrounds (D) dive into the core principles (ABC) of data science, and let these ambidextrous students lead data-driven innovation in various fields
▪ A globally unique mission… and a challenging one, of course…

• Methodologies
▪ Not an undergraduate department as a silo, but a graduate school as a hub
▪ Not an MBA-style program, but a hardcore program that changes students’ DNA
▪ Not only advanced courses but also basic courses, which are open to all students outside of GSDS (no enrollment limit)

18
GSDS – Curriculum
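Foundation courses (+bootcamp; every semester)
• Foundations of Mathematics and Statistics for Data Science (3)
• Computing Foundations for Data Science (3)

A
• Machine Learning and Deep Learning for Data Science 1 (3)
• Machine Learning and Deep Learning for Data Science 2 (3)

B
• Big Data and Knowledge Management Systems 1 (3)
• Big Data and Knowledge Management Systems 2 (3)

C
• Computing for Data Science 1 (3)
• Computing for Data Science 2 (3)

Project
• Data Science Project (3)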
19
GSDS – Growth
• (11 + 2) new faculty members have joined
▪ Google, Amazon, US faculty…
• 40 MS/15 PhD to 80 MS/30 PhD per year
• Being recognized gradually…

[Bar chart, y-axis from 0 to 1600, highest bar 1452. Series: Department of Data Science, other departments, and total. X-axis: semesters from 2020 Semester 1 to 2022 Semester 2.]

20
GSDS – Outreach
• Summer/Winter GSDS Bootcamp
▪ 4-week quick and intense program for beginners
▪ Flipped learning
▪ Computing for DS
▪ Math and Statistics for DS

21
GSDS – Outreach
• Google ExploreCSR
▪ Workshops to explore various research fields
▪ Mentoring for CS/DS careers and research
▪ Will do it again this coming summer

22
GSDS – Outreach
• Ambient AI Bootcamp & Competition
▪ Google donation and funding
▪ 3-week education
▪ 1-month project with mentoring

23
About This Course
Lecture 0-2

Hyung-Sin Kim and Yohan Jo


Computing Foundations for Data Science

26
Chef or Farmer?

27
What Chefs Do…
• Making a unique recipe
▪ Which ingredients to use
▪ Which utensils to use
▪ The sequence of cooking steps
▪ The duration of each step
• Cooking fluently according to the recipe

• They are not required to farm, fish, or feed livestock

• But if you are a top-level chef, you do take care of so many things
▪ Ingredients, chemistry, climate, history, culture…

28
Computer Science
• Computer Science is not mainly about computer hardware
▪ Semiconductors (Moore’s law), integrated circuits: that is Electrical Engineering!
▪ Just as a chef’s main job is neither farming nor fishing

29
Computer Science
• Computer Science is more about what computers can do (software)
▪ How do we make computers do what we want them to do?
▪ We should provide a nice recipe for computers to follow

▪ Algorithm: A recipe for computers to follow (logical steps)

▪ Program: A set of instructions that a computer can understand, putting an algorithm into practice
▪ https://www.youtube.com/watch?v=9otrE0SyrFE
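
To make the difference between an algorithm and a program concrete, here is a minimal sketch in Python. The task (finding the largest number in a list) and the name `find_largest` are illustrative assumptions, not something taken from the lecture. The numbered comments are the algorithm (the recipe); the code is one possible program that puts it into practice.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list of numbers.

    Hypothetical example: the task and the function name are
    assumptions chosen for illustration.
    """
    if not numbers:
        raise ValueError("numbers must not be empty")

    # Step 1: take the first number as the largest seen so far.
    largest = numbers[0]

    # Step 2: look at every remaining number, one by one.
    for value in numbers[1:]:
        # Step 3: if it beats the current largest, remember it instead.
        if value > largest:
            largest = value

    # Step 4: after the last number, the remembered value is the answer.
    return largest


if __name__ == "__main__":
    print(find_largest([3, 9, 2]))
```

The same four steps could be written in C or any other language; only the program changes, not the algorithm.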

• But if you want to be an exceptional programmer, you do need more!
▪ Hardware characteristics (not to design hardware, but to use it efficiently)
▪ Application (maybe your current major)

• This is what GSDS is heading toward!


30
Computing, as Part of Data Science

Manager
Various workers

How to make my workers most productive?

31
Computing, as Part of Data Science

Data Scientist
Various computers

All data are electronically stored and processed…

How to make my computers most productive?

32
Computing Foundations for Data Science:
A Pathway toward…

33
Power of Programming
• Defining new operations
▪ “Go to work” Program
• (1) Check the time (2) If it is 9:00, go to the office (3) Otherwise, stay at home
▪ “Go home” Program
• (1) Check the time (2) If it is 18:00, go home (3) Otherwise, stay at the office

• Combining (reusing) already-defined operations to do more complex things

▪ “Commute” Program
• Repeat steps (1) and (2) indefinitely
• (1) If you are at home, do “Go to work”
• (2) Otherwise, do “Go home”

• Repeating these two steps (defining and combining operations) enables computers to do an incredible number of things!
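
The three programs above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names, the use of whole hours for the time, and the strings "home" and "office" are made up for this sketch, and the loop runs over a given list of hours instead of repeating forever so that the example terminates.

```python
def go_to_work(hour):
    """'Go to work': at 9:00 go to the office, otherwise stay at home."""
    return "office" if hour == 9 else "home"


def go_home(hour):
    """'Go home': at 18:00 go home, otherwise stay at the office."""
    return "home" if hour == 18 else "office"


def commute(start_location, hours):
    """'Commute': reuse the two programs above, once per checked hour.

    The slide repeats the steps forever; here the repetition is bounded
    by `hours` (an assumption) so the sketch can finish and be tested.
    """
    location = start_location
    history = []
    for hour in hours:
        if location == "home":
            location = go_to_work(hour)   # step (1)
        else:
            location = go_home(hour)      # step (2)
        history.append((hour, location))
    return history


if __name__ == "__main__":
    for hour, location in commute("home", range(24)):
        print(f"{hour:02d}:00 -> {location}")
```

Note that `commute` contains no knowledge of 9:00 or 18:00; it only combines the two operations that were already defined.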


34
Programming Language
• Human (natural) languages are based on the common sense of mankind
▪ Computers do not have that common sense
▪ Human languages are difficult for computers to understand

• Programming languages have a precise structure and a precise meaning for computers to understand (zero ambiguity)
▪ Programming: Writing an algorithm in a programming language
▪ It is also called “coding,” since it originally looked like a secret code (011010101110111…)

• Programming languages have evolved so that humans can also understand them easily
▪ Less like a secret code these days ☺
35
This Course is About …
• You will learn fundamental programming concepts to handle data
▪ How to (efficiently) process data to draw conclusions: Algorithms
▪ How to store/organize data for (efficient) processing: Data types/structures
▪ How to make computers (efficiently) execute your algorithms: Programming in Python
▪ Yes, now that we are talking about big data, efficiency is important!
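
As a small illustration of why the choice of data structure matters for efficiency, the sketch below solves the same problem twice. The problem (detecting a duplicate) and both function names are assumptions made for this example. The two functions follow the same algorithm and differ only in where they store the items already seen: a list must be scanned item by item on every lookup, while a set lookup takes roughly constant time on average.

```python
def has_duplicates_list(items):
    """Duplicate check that remembers seen items in a list.

    `item in seen` scans the list, so the total work grows roughly
    with the square of the input size.
    """
    seen = []
    for item in items:
        if item in seen:
            return True
        seen.append(item)
    return False


def has_duplicates_set(items):
    """Same steps, but seen items are stored in a set.

    `item in seen` is a hash lookup, so the total work grows roughly
    linearly with the input size (items must be hashable).
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    data = list(range(5000)) + [42]
    print(has_duplicates_list(data), has_duplicates_set(data))
```

Both functions return the same answers; on a few items the difference is invisible, but it becomes noticeable as the data grows.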

• Clarification: Although this course does cover Python, it is NOT mainly about Python; it is about fundamental concepts, with Python used as a tool

• You will get a glimpse of how Silicon Valley companies conduct technical interviews to recruit software engineers ☺
36
This Course is About …
• You will learn how a computer system works
▪ How to process and store data at the hardware level: Logic gates
▪ Why/How all the information is represented in bits (010101): Data representation
▪ Several great ideas in computer architecture
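
As a preview of the first two topics, here is a minimal Python sketch. It is a software simulation under stated assumptions, not a description of real hardware or of the course material: the function names, the 8-bit width, and the choice of a full adder as the example circuit are all illustrative. `to_bits` shows a number as a bit string, and `add_bits` adds two numbers using only AND, OR, and XOR on single bits.

```python
def to_bits(value, width=8):
    """Return a non-negative integer as a fixed-width string of bits."""
    return format(value, f"0{width}b")


def and_gate(a, b):
    return a & b


def or_gate(a, b):
    return a | b


def xor_gate(a, b):
    return a ^ b


def full_adder(a, b, carry_in):
    """Add three single bits using only gates; return (sum, carry_out)."""
    partial = xor_gate(a, b)
    total = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return total, carry_out


def add_bits(x, y, width=8):
    """Add two integers bit by bit, from the lowest bit upward.

    The final carry is dropped, so the result wraps around at 2**width
    (an assumption that mimics a fixed-width register).
    """
    result = 0
    carry = 0
    for i in range(width):
        bit_sum, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit_sum << i
    return result


if __name__ == "__main__":
    print(to_bits(5), "+", to_bits(3), "=", to_bits(add_bits(5, 3)))
    print("'A' is stored as", to_bits(ord("A")))
```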

• You will learn another language called C
▪ Why it is important and how it differs from Python
▪ If you can implement an algorithm using Python, you should be able to do it using C!

37
This Course is NOT About…
• This course does not cover computer systems or machine-friendly programming, which are important for high-performance big data computing
▪ Please check out
• Computing for Data Science 1, 2
• Big Data and Knowledge Management Systems 1, 2

• This course does not cover machine/deep learning programming, which needs (some) mathematical background as well
▪ Please check out
• Computing for Data Science 1
• Machine Learning and Deep Learning for Data Science 1, 2

38
This Course is NOT About…
• This course does not cover the various libraries and frameworks for data analysis, such as pandas, NumPy, TensorFlow, and PyTorch

• This course is not easy! It will require devotion
▪ You will learn a lot (three in one) and there will be a lot of assignments
▪ If you don’t have that much time, I recommend other, slower-paced courses

39
Course Logistics
Lecture 0-3

Hyung-Sin Kim and Yohan Jo


Hybrid Learning
• Q&A, assignments, grading … (feel free to poke them ☺)

• Practices/QnA
0. Online lectures
1. Students
• Self-paced, self-disciplined learning
2. TAs/Instructor
3. Instructor
• Integral guidance

41
Learning Methodology
• When/How to ask questions?
▪ During lectures (Zoom chat or offline)
▪ Q&A session after each lecture – For shy ladies and gentlemen

• I really encourage students to try to answer any question
▪ Remember! You know something only when you can explain it clearly to others
▪ By organizing your thoughts, you will be the one who benefits the most

42
Learning Methodology
• Study group (optional)
▪ Will be organized later this week (4-5 members per group)
• Please fill out the Google Form
▪ Small group discussion during lectures

43
So… for Each Lecture…
• Before the lecture
▪ You should watch the videos in advance
▪ While doing this, you should open Jupyter or Visual Studio and type in the code yourself
▪ Ask questions on Slack
• During the lecture
▪ Change your Zoom name to the format (study group name)-(your name), like (study group name)-Hyung-Sin Kim
▪ Quiz: You will take a quiz with your group members
▪ Review: I will review the content of the videos
▪ Practice: A TA or I will lead a practice session and you will solve some problems collaboratively
• After the lecture
▪ Review and ask questions on Slack

44
TAs
• For the first half
[email protected] [email protected] [email protected]

한예규 장건민 지서진

45
TAs
• For the second half
[email protected] [email protected] [email protected]

김주희 심지훈 이유섭

46
Textbook
• Practical Programming (PDF is provided!)
▪ Used for the first half of this course
• Introduction to Computing Systems
▪ Used for the second half of this course

47
Grading
• Final 100% (2/14, online)

48
Overcoming Psychological Barrier
• https://www.youtube.com/watch?v=r7onWtuD92U

Dr. Jordan Peterson

49
Thanks!
[email protected]

50
