Lecture 1

This document provides an overview of an advanced machine learning course taught by Prof. Jaesik Choi at UNIST. The course covers the history of artificial intelligence and machine learning through deep learning techniques like neural networks. It includes grading based on quizzes, programming projects, a midterm, and final project. The primary textbook is "Machine Learning: A Probabilistic Perspective" and supplementary materials are also provided. The instructor is Prof. Jaesik Choi and TA is Sehyun Lee. The course will cover various machine learning problems, feature selection, comparing data, and deep learning through lectures and a MOOC.

CSE544 Advanced Machine Learning

Course Introduction

Jaesik Choi
UNIST

Some slides courtesy of Prof. Kee-Eung Kim and Prof. Dan Roth
Course Overview
▪ Primary textbook
• Murphy, “Machine Learning: A Probabilistic Perspective”
▪ Grading (tentative)
• Quizzes and programming projects: 40%
• Midterm exam: 30%
• Final project: 30%
▪ Supplementary textbooks
• Bishop, “Pattern Recognition and Machine Learning”
• Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar, “Foundations of Machine Learning”, MIT Press, 2012
• Duda, Hart & Stork, “Pattern Classification”
• Mitchell, “Machine Learning”
▪ Related Courses
• CSE463: Machine Learning
Course Overview
▪ Instructor
• Jaesik Choi ([email protected])
▪ TA
• Sehyun Lee ([email protected])
Course Overview
▪ StarMOOC + Lecture
• History of Artificial Intelligence and Machine Learning
• Various Problems in Machine Learning
• Finding Good Features to Solve Machine Learning Problems
• Making Intelligence by Measuring the Similarity among Data Points
• Deep Neural Networks
• Participating in AI Challenges
• Intelligence Created by Lines and Hyperplanes
• Sparse Representation
• Ensemble of Various Machine Learning Methods
• Explaining Deep Neural Networks
• Methods to Predict Future Values
• Automatic Discovery of Causes and Effects
• Reinforcement Learning: Self-Taught Artificial Intelligence
Policies
▪ Check Blackboard regularly for Q&A and announcements
▪ No plagiarism
▪ Late submissions are allowed (up to 2 days)
▪ Python for programming projects
Little bit about myself
For some reason there is super high emphasis on Anglican, probabilistic
programming language, that is based on clojure and java approximately taught 2
weeks before exam. This is ridiculous, as 2 weeks before exam students are not in
position to learn new languages because of final projects. CSE subjects are notorious
for having assignments that are really demanding right as finals come, and professor
Choi must know this well. There is almost no resource online for learning Anglican,
which contributed to my poor performance in assignment and exam. Overall,
professor Choi does not have much time to teach but is too ambitious for students
who just want to learn programming principles but are presented programming skills
questions in exams...

However, I wish he was willing to teach his students as much as he was to attend his
conferences, extracurricular lectures and meetings. I fully respect him, I totally
understand that he is busy, but that doesn't mean that he has the right to butcher an
entire semester by canceling classes and putting make-ups to unreasonable dates
and times, asking difficult questions in the exams from the material he talked about
for only about an hour, and coming to the class unprepared. Professor, I still love you
as a respected faculty member and I don't think if anything's going to change that.
But you should seriously, seriously, and I mean it, seriously, consider your priorities if
you will decide to keep up with giving lectures. In the beginning of semester, when
you showed us the student who was speaking unpleasant of you in his feedback, I
was quite entertained and said "Man, this is too much, he can't be this bad can he?"
Boy, was I wrong. I seriously thought he exaggerated, but now regrettably I can say
that he was right. I'm sorry.
Prof. Jaesik Choi (UNIST Computer Science & Engineering)

▪ Career
➢UNIST ECE Asst. Prof. (2013~2017), Assoc. Prof. (2017~)
➢UNIST Rising Star Distinguished Professor (2018~)
➢Lawrence Berkeley National Laboratory Faculty Affiliate (2013~)
➢Director, Ministry of Sci. & ICT/UNIST Explainable AI Center (2017~)
➢Director, UNIST Industrial AI Center (2017~)
➢POSCO Steel Fellow Professor (2017~)
[Figure: The Relational Automatic Statistician (ICML, 2016)]
➢LG Electronics, Advisory Professor (2017~)
➢Samsung Future Technology Committee (2019~)
➢Samsung Advanced Institute of Technology, Advisory Professor (2018~)
➢Program Committee, ICML, AAAI, IJCAI, UAI (2012~)

▪ Achievement
➢The first linear time Kalman Filter (IJCAI, 2012)
➢The Relational Automatic Statistician (ICML, 2016)
➢The world's first deep learning system in operation to predict the changes of
a Blast Furnace (2017)
[Figure: Group CNN for Blast Furnace Operation [2017], automated 90% of operations (joint work with POSCO)]

▪ Award
➢POSCO Smart Innovation Award (2018)
➢Winner, Digital Curling Competition at Game Playing Workshop
(2017), Game AI Tournament (2018)
➢Best Paper Award, International Conference on Big Data Intelligence
and Computing (2015)
[Figure: Won digital curling competition [UEC 2017, GPW 2018, ICML 2018]]
You can leave once you have filled in the form:
https://goo.gl/forms/M3FTY8GwoOAcBy7D2

A Brief Introduction to the Instructor
Advanced Machine Learning Class Start Survey
Course Start Survey
Your major is
Your background in machine learning
Your online experience
Your programming language experience
Your programming experience
What is your motivation?
What is your preference for the final?
Any other MOOC?
Take a look into
Advanced Machine Learning
MOOC week 1 - History of AI
Lecture week 1 - Bayesian
MOOC week 2 - ML Problems
Lecture week 2 - Gaussian
MOOC week 3 - Feature Selection
Lecture week 3 - Learning in Regression
MOOC week 4 - Comparing Data
MOOC week 4 - Gaussian Processes
MOOC week 5 - Deep Neural Networks
Lecture week 5 - Support Vector Machines
MOOC week 6 - Kaggle Competition
Lecture week 6 - Exponential Family
Lecture week 6 - Expectation Maximization
MOOC week 7 - Kaggle Competition
Lecture week 7 - Dimensionality Reduction
Machine Learning?
▪ A set of methods that can
• automatically detect patterns in data, and then
• use the uncovered patterns to predict future data, or
• perform other kinds of decision making under uncertainty
(such as planning to collect more data)

▪ Learning is useful when
• Human expertise doesn’t exist (navigating on Mars)
• Humans are unable to explain their expertise (speech
recognition, autonomous driving)
• The solution changes over time (routing on a computer network)
• The solution needs to be adapted to particular cases (web search
engines that learn user interests)
Supervised Learning: Classification
▪ Credit scoring
• Differentiate between low-risk and high-risk customers from
their incomes and savings
• Learned classification rule:
• IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
▪ Other application examples
• Face recognition
• Optical character recognition
• Speech recognition
• Input is temporal
• Sensor fusion: integration of inputs
from different modalities (e.g. acoustic and visual)
• Outlier detection (fraud detection)
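
The credit-scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `classify_customer`, `accuracy` and `learn_thresholds`, the threshold names `theta1` and `theta2`, and the toy customer records are all hypothetical and not from the slides. The "learning" here is a brute-force search over observed values, one simple way to pick thresholds from labeled data.

```python
def classify_customer(income, savings, theta1, theta2):
    """Apply the IF-THEN rule: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


def accuracy(customers, theta1, theta2):
    """Fraction of labeled customers that the rule classifies correctly."""
    hits = sum(
        classify_customer(income, savings, theta1, theta2) == label
        for income, savings, label in customers
    )
    return hits / len(customers)


def learn_thresholds(customers):
    """Brute-force search: try every observed (income, savings) pair as thresholds."""
    incomes = sorted({income for income, _, _ in customers})
    savings_values = sorted({savings for _, savings, _ in customers})
    candidates = [(t1, t2) for t1 in incomes for t2 in savings_values]
    return max(candidates, key=lambda t: accuracy(customers, t[0], t[1]))


# Hypothetical toy data: (income, savings, label); units are arbitrary.
toy_customers = [
    (20, 5, "high-risk"),
    (35, 8, "high-risk"),
    (50, 2, "high-risk"),
    (45, 15, "low-risk"),
    (60, 12, "low-risk"),
    (80, 20, "low-risk"),
]

theta1, theta2 = learn_thresholds(toy_customers)
print(theta1, theta2, accuracy(toy_customers, theta1, theta2))
```

A real classifier would be evaluated on customers it has not seen during training; the toy set here only shows the mechanics of fitting the two thresholds.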
Supervised Learning: Regression
▪ Estimate the price of a used car
• x = car attributes, y = price
• y = g(x|θ)
• g(∙) is the model
• θ is the parameter
• Examples of g and θ
• g(x|w0,w1) = w1 x + w0 (linear model)
• g(x|w0,w1,w2) = w2 x^2 + w1 x + w0 (quadratic model)
• Output values are continuous,
vs. classification (discrete)
▪ Example applications
• Autonomous car navigation: Learn to steer given input (video
image, GPS, …)
• Typically the same application areas as classification
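
As a rough sketch of the linear and quadratic models above, the following pure-Python code fits the weights w0, w1 (and w2) by least squares using the normal equations. The function names `fit_polynomial` and `g`, the single attribute "car age", and the toy prices are assumptions made for illustration; the slides do not specify a fitting procedure or any data.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares weights [w0, w1, ..., w_degree] via the normal equations."""
    n = degree + 1
    # A[i][j] = sum of x^(i+j), b[i] = sum of y * x^i
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def g(x, w):
    """Evaluate the model g(x|w) = w0 + w1*x + w2*x^2 + ..."""
    return sum(wi * x ** i for i, wi in enumerate(w))


# Hypothetical toy data: car age in years vs. price (arbitrary units).
ages = [1, 2, 3, 4, 5]
prices = [18, 16, 14, 12, 10]

w_linear = fit_polynomial(ages, prices, degree=1)     # [w0, w1]
w_quadratic = fit_polynomial(ages, prices, degree=2)  # [w0, w1, w2]
print(w_linear, g(6, w_linear))
```

Because the toy prices lie exactly on a line, the quadratic fit returns a w2 close to zero; on real data the two models would generally differ.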
Unsupervised learning
▪ Learning without correct output values (i.e., without
supervisor)
• Classification & regression tasks had target labels…
• Find regularities in the input
• “knowledge discovery”, “density estimation”
• e.g. clustering (group similar instances)
▪ Example application
• Customer segmentation in CRM
• Image compression: Color quantization
• Bioinformatics: Learning motifs
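
Color quantization is a convenient way to see clustering at work: pixels are grouped by similarity and each pixel is replaced by its cluster's mean color. The sketch below uses k-means, which is one common clustering method and not necessarily the one used in the lecture. The function names, the deterministic initialization from the first k points, and the six RGB pixels are illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def kmeans(points, k, iterations=10):
    """Plain k-means; assumes the first k points are acceptable initial centers."""
    centers = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


# Hypothetical RGB pixels: three reddish, three bluish. No labels are given.
pixels = [(250, 10, 10), (10, 10, 250), (240, 20, 20),
          (20, 20, 240), (255, 0, 0), (0, 0, 255)]

centers = kmeans(pixels, k=2)
assignments = [nearest(p, centers) for p in pixels]
quantized = [centers[a] for a in assignments]  # every pixel replaced by its cluster color
print(assignments)
```

Storing one small index per pixel plus a palette of k colors is what makes this usable for image compression.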
Reinforcement Learning
▪ Learning a policy (a sequence of correct actions to
reach the goal)
• No supervisor telling the correct action: Learn from delayed
reward (critic)
▪ Game playing
▪ Multiple agents, partial observability, …
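
To make "learning from delayed reward" concrete, here is a minimal sketch of tabular Q-learning, one standard reinforcement learning algorithm (the slide itself does not name a specific method). The environment is a hypothetical five-state corridor in which only the final step into the goal is rewarded; the constants, function names, and hyperparameters are all assumptions for illustration.

```python
import random

N_STATES = 5                 # states 0..4 on a line; state 4 is the goal
ACTIONS = ["left", "right"]


def step(state, action):
    """Deterministic move; reward 1 only on reaching the goal (delayed reward)."""
    if action == "left":
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with purely random exploration (off-policy)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always terminates
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # Goal-state values are never updated, so they stay 0 (terminal state).
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q = q_learning()
# Greedy policy for the non-goal states.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

No one tells the agent that "right" is correct in state 0; that preference emerges only because the reward at the goal is propagated backward through the value estimates.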


Recent Advances

https://www.youtube.com/watch?v=Dy0hJWltsyE
The Role of Learning
▪ Learning is at the core of
• Understanding High Level Cognition
• Performing knowledge intensive inferences
• Building adaptive, intelligent systems
• Dealing with messy, real world data

▪ Learning has multiple purposes
• Knowledge Acquisition
• Integration of various knowledge sources to ensure robust behavior
• Adaptation (human, systems)

Learning = Generalization

Herbert Simon -
“Learning denotes changes in the system that are adaptive
in the sense that they enable the system to do the task
or tasks drawn from the same population more
efficiently and more effectively the next time.”

Learning = Generalization
The ability to perform a task in a situation
which has never been encountered before
▪ Classification
• Medical diagnosis; credit card applications; hand-written letters
▪ Planning and acting
• Navigation; game playing (chess, backgammon); driving a car
▪ Skills
• Balancing a pole; playing tennis
▪ Common sense reasoning
• Natural language interactions

Generalization depends on Representation as much
as it depends on the Algorithm used.
Why Study Learning?
▪ Computer systems with new capabilities.
• Develop systems that are too difficult or impossible to
construct manually.
• Develop systems that can automatically adapt and
customize themselves to the needs of the individual
users through experience.
• Discover knowledge and patterns in databases,
database mining, e.g. discovering purchasing patterns
for marketing purposes.

Some Broad ML Tasks
▪ Classification: assign a category to each item (e.g.,
document classification).
▪ Regression: predict a real value for each item (prediction of
stock values, economic variables).
▪ Ranking: order items according to some criterion (relevant
web pages returned by a search engine).
▪ Clustering: partition data into ‘homogeneous’ regions
(analysis of very large datasets).
▪ Dimensionality reduction: find a lower-dimensional manifold
preserving some properties of the data.
General Objectives of ML
▪ Theoretical questions:
• what can be learned, under what conditions?
• are there learning guarantees?
• analysis of learning algorithms.

▪ Algorithms:
• more efficient and more accurate algorithms.
• deal with large-scale problems.
• handle a variety of different learning problems.
Kaggle: Online Machine Learning Playground
▪ Kaggle provides an online platform to learn about and compete on
a range of machine learning problems.
• You can access it at https://www.kaggle.com/competitions
• A host who has a machine learning problem to solve posts it
as a competition
• E.g., GE wants to optimize flight routes given an origin
and destination and traffic and weather conditions. ($220K)
• Data scientists compete to solve the problems.
• Your submission will be evaluated immediately and posted
online. https://www.kaggle.com/c/titanic-gettingStarted/leaderboard

Kaggle: Online Machine Learning Playground
Some interesting datasets

▪ Datasets from top machine learning conferences
• KDD - Author-Paper Identification Challenge
• ICDM - Personalize Expedia Hotel Searches
• NIPS, ICML - Multi-label Bird Species Classification

▪ Datasets from companies to recruit data scientists
• Amazon - Employee Access Challenge
• Facebook - Keyword Extraction
• Yelp - How many "useful" votes will a Yelp review receive?

Kaggle: Online Machine Learning Playground
▪ Sometimes, winners post their winning strategies.
• Titanic: Random Forests
• http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r

▪ Some teams from the Machine Learning class at UNIST are ranked
• https://www.kaggle.com/users/146714/yunseong-hwang
• https://www.kaggle.com/liecos
• https://www.kaggle.com/giyoung

Kaggle: Online Machine Learning Playground
▪ Kaggle also provides links to machine learning libraries:
https://www.kaggle.com/wiki/Algorithms

Titanic: Machine Learning from Disaster
Description: The sinking of the RMS Titanic is one of the most infamous
shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic
sank after colliding with an iceberg, killing 1502 out of 2224 passengers and
crew. One of the reasons that … were not enough lifeboats for the passengers
and crew. Although there was some element of luck …, some groups of people
were more likely to survive than others, such as women, children, and the
upper-class.

Problem: In this contest, we ask you to complete the analysis of what sorts
of people were likely to survive. In particular, we ask you to apply the tools
of machine learning to predict which passengers survived the tragedy.

Titanic: Machine Learning from Disaster
Data:

Zillow Prize: Zillow’s Home Value Prediction
▪ Can you improve the algorithm that changed the world
of real estate?

Web Traffic Time Series Forecasting
▪ Forecast future traffic to Wikipedia pages

This competition focuses on the problem of forecasting the future values of
multiple time series, as it has always been one of the most challenging
problems in the field. More specifically, we aim the competition at testing
state-of-the-art methods designed by the participants, on the problem of
forecasting future web traffic for approximately 145,000 Wikipedia articles.
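
A useful first step on a forecasting task like this is a simple baseline to compare against. The sketch below shows a seasonal-naive forecast that repeats the most recent week of values for each series, scored with mean absolute error. It is an assumed baseline for illustration, not the competition's method or official metric; the page names and view counts are made up, and histories are assumed to be non-empty.

```python
def seasonal_naive_forecast(history, horizon, season=7):
    """Repeat the most recent `season` observations to cover `horizon` steps."""
    last_season = history[-season:]
    return [last_season[i % len(last_season)] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily view counts for two pages (two weeks each).
page_views = {
    "Page_A": [10, 12, 11, 13, 30, 28, 9, 11, 13, 12, 14, 31, 29, 10],
    "Page_B": [100] * 14,
}

# One forecast per series: the same rule is applied independently to every page.
forecasts = {
    page: seasonal_naive_forecast(series, horizon=3)
    for page, series in page_views.items()
}
print(forecasts["Page_A"])
```

Any learned model should at least beat a baseline of this kind on held-out days before its extra complexity is justified.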

Carvana Image Masking Challenge
▪ Automatically identify the boundaries of the car in an image

Curling Robot Curly
▪ AI Robot

Semantic Segmentation
▪ Pyramid Scene Parsing Networks

Stanford Question Answering Task

https://www.ft.com/content/8763219a-f9bc-11e7-9b32-d7d59aace167
Stanford Question Answering Dataset (SQuAD)
Title: “Super_Bowl_50”
Context:

Super Bowl 50 was an American football game to determine the champion of
the National Football League (NFL) for the 2015 season. The American
Football Conference (AFC) champion Denver Broncos defeated the National
Football Conference (NFC) champion Carolina Panthers 24–10 to earn their
third Super Bowl title. The game was played on February 7, 2016, at Levi's
Stadium in the San Francisco Bay Area at Santa Clara, California. As this was
the 50th Super Bowl, the league emphasized the "golden anniversary" with
various gold-themed initiatives, as well as temporarily suspending the
tradition of naming each Super Bowl game with Roman numerals (under which the
game would have been known as "Super Bowl L"), so that the logo could
prominently feature the Arabic numerals 50.

Question: Which NFL team represented the AFC at Super Bowl 50?

Answer: Denver Broncos


Stanford Question Answering Dataset (SQuAD)
Title: “Normans”
Context:

The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were
the people who in the 10th and 11th centuries gave their name to Normandy,
a region in France. They were descended from Norse ("Norman" comes
from "Norseman") raiders and pirates from Denmark, Iceland and Norway who,
under their leader Rollo, agreed to swear fealty to King Charles III of
West Francia. Through generations of assimilation and mixing with the native
Frankish and Roman-Gaulish populations, their descendants would gradually
merge with the Carolingian-based cultures of West Francia. The distinct
cultural and ethnic identity of the Normans emerged initially in the first
half of the 10th century, and it continued to evolve over the succeeding
centuries.

Question: In what country is Normandy located?


Answer: France
Stanford Question Answering Dataset (SQuAD)
Title: “Computational_complexity_theory”
Context:

Computational complexity theory is a branch of the theory of computation
in theoretical computer science that focuses on classifying computational
problems according to their inherent difficulty, and relating those classes to
each other. A computational problem is understood to be a task that is in
principle amenable to being solved by a computer, which is equivalent to
stating that the problem may be solved by mechanical application of
mathematical steps, such as an algorithm.

Question: What branch of theoretical computer science deals with broadly
classifying computational problems by difficulty and class of relationship?
Answer: Computational complexity theory
Stanford Question Answering Dataset (SQuAD)
Title: “Martin_Luther”
Context:

Luther's rediscovery of "Christ and His salvation" was the first of two points
that became the foundation for the Reformation. His railing against the sale
of indulgences was based on it.

Question: What was the first point of the Reformation?


Answer: [Christ, Christ and His salvation, rediscovery of "Christ and His
salvation"]
Stanford Question Answering Dataset (SQuAD)
Title: “Southern_California”
Context:

The 8- and 10-county definitions are not used for the greater Southern
California Megaregion, one of the 11 megaregions of the United States. The
megaregion's area is more expansive, extending east into Las Vegas, Nevada,
and south across the Mexican border into Tijuana.

Question: What is the name of the state that the megaregion expands to in
the east?

Answer: Mexican
Stanford Question Answering Dataset (SQuAD)
Title: “Huguenot”
Context:

After the revocation of the Edict of Nantes, the Dutch Republic received the
largest group of Huguenot refugees, an estimated total of 75,000 to 100,000
people. Amongst them were 200 clergy. Many came from the region of the
Cévennes, for instance, the village of Fraissinet-de-Lozère. This was a huge
influx as the entire population of the Dutch Republic amounted to
ca. 2 million at that time. Around 1700, it is estimated that nearly 25% of
the Amsterdam population was Huguenot.[citation needed] In 1705, Amsterdam
and the area of West Frisia were the first areas to provide full citizens
rights to Huguenot immigrants, followed by the Dutch Republic in 1715.
Huguenots intermarried with Dutch from the outset.

Question: What country initially received the largest number of Huguenot
refugees?

Answer: Dutch Republic
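
The question-answer pairs above suggest how such a task can be scored automatically: a predicted answer counts as correct if it matches any of the listed reference answers after light text normalization. The sketch below is an illustrative exact-match check in that spirit, not the official SQuAD evaluation script. The function names are hypothetical, and the reference answer "Nevada" for the Southern California question is an assumption read off the context paragraph.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference_answers):
    """True if the prediction equals any reference answer after normalization."""
    normalized = normalize_answer(prediction)
    return any(normalized == normalize_answer(ref) for ref in reference_answers)


# Reference answers as listed on the Martin Luther slide.
luther_refs = [
    "Christ",
    "Christ and His salvation",
    'rediscovery of "Christ and His salvation"',
]

luther_ok = exact_match("Christ and His salvation.", luther_refs)
broncos_ok = exact_match("The Denver Broncos", ["Denver Broncos"])
# Assumed reference "Nevada": the answer "Mexican" shown on the slide would not match.
megaregion_ok = exact_match("Mexican", ["Nevada"])
print(luther_ok, broncos_ok, megaregion_ok)
```

Accepting several reference answers, as in the Martin Luther example, keeps a system from being penalized for choosing a shorter or longer span that is still correct.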
