DATA228 Lecture Notes Week 1
Sangjin Lee
Overview
• Introduction to Big Data
• Hadoop
• HDFS
• YARN
• MapReduce
• Spark
• High-level APIs
• SparkSQL
• Future of Big Data
Instructor
• Class will start on time
• No late assignments are accepted
• Quizzes are in-class
• Exams are comprehensive
• Poll
Rise of Big Data
• Transactions
• Internet crawling
• Impressions and clicks
• Perfect “data” storm
• Data are diverse
• Data are often unstructured
Relational DBs
• “Good ol’ days” when data was neatly in relational DBs (Oracle, MySQL, …)
• But…
• Companies already had to find ways to scale beyond gigantic single-machine DBs
• 256 GB RAM
• This points to some sort of data compute framework that can model this behavior and lets
you program this in an easy and efficient way
• Horizontal scaling
• Compute on demand
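The idea of horizontal scaling can be sketched in a few lines: split the data into partitions, let each worker compute a partial result on its own partition, then merge the partial results. This is only a minimal illustration, not how any particular framework does it. The function names, the (page, event) record layout, and the sample log are all hypothetical, and threads on one machine stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_workers):
    """Split records into num_workers slices (round-robin)."""
    return [records[i::num_workers] for i in range(num_workers)]


def count_clicks(chunk):
    """Work done independently on one worker: count clicks per page."""
    return Counter(page for page, event in chunk if event == "click")


def scaled_click_counts(records, num_workers=4):
    """Fan the work out to the workers, then merge the partial counts."""
    chunks = partition(records, num_workers)
    # Assumption: threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_clicks, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical impression/click log: (page, event) pairs.
events = [
    ("home", "click"), ("home", "impression"), ("cart", "click"),
    ("home", "click"), ("cart", "impression"), ("search", "click"),
]

if __name__ == "__main__":
    print(scaled_click_counts(events, num_workers=3))
```

Adding capacity here means raising num_workers rather than buying a bigger machine. The merge step gives the same answer for any number of workers because the per-partition counts can be added in any order.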
Your “dev” environment
• Poll