강의 노트 1a (Lecture Notes 1a)

This document provides an overview of the first lecture of a course on the math behind machine learning techniques. It introduces key concepts like artificial intelligence, machine learning, deep learning, and data science. It then discusses supervised and unsupervised learning as well as classical examples from Kepler, Newton, Hubble, and the expanding universe. The document concludes with introductions to linear regression, neural networks, and applications of machine learning in astronomy.

Math behind Machine Learning Techniques

Lecture 1 (a)

SSM Seoul
Oct. to Nov. 2023

Oct. 28, 2023 Math behind ML Tech :: Lecture 1 (a) :: J. Rhee 1


Agenda
• Self-Introduction
➢Separate slides
• Course Overview and Logistics
➢Course Syllabus

• Introduction to Data Science and Machine Learning


• Motivation and Overview



AI, ML, DL, & Data Science
• Artificial Intelligence (AI)
➢ Broad concept of machines being able to carry out “smart” tasks; intelligent behavior exhibited by machines.

• Machine Learning (ML)


➢ The use of statistical tools that
help computers “learn” from
data

• Deep Learning (DL)


➢ Driven primarily by Neural
Networks

• Data Science
➢ Understanding data to derive “insights”
AI, ML, DL vs Data Science

• Data Science
➢ an area, a field of study
➢ (relatively) scientific study
• Data Mining
➢ a technique
➢ (relatively) business process
➢ finding trends (in a data set) previously
unknown and using these trends to
identify future patterns
Machine Learning
• Machine Learning (or Statistical Learning) refers to a
huge set of tools for understanding data.
➢ In Supervised Learning, a statistical model is
built to predict an output based upon inputs
by training the model with a dataset having
both inputs and a labeled response (output).

➢ With Unsupervised Learning, there are


inputs but no supervising response variable.
Nonetheless, the relationships between
input variables and observations (samples)
can still be learned and discovered. That is,
unsupervised learning is intended to draw
inferences from and to find underlying
patterns in such datasets.

➢ In Reinforcement Learning, an agent learns
how to behave in an environment by
performing actions and seeing the results.
Classical Example

• Kepler (born in 1571)


➢Laws of planetary motion
➢Empirical (data-driven), no purely theoretical
foundation
vs

• Newton (born in 1642)


➢Laws of motion
➢Law of universal gravitation
➢Theoretical foundation



Hubble’s Original Graph (1929)
An example of linear regression

(axis annotation in the figure: km/h)

• (fun question) What’s wrong with this graph?


• What does this plot mean?
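The regression line in a plot like Hubble's can be recovered with ordinary least squares. A minimal closed-form sketch in plain Python; the distance–velocity pairs below are illustrative numbers, not Hubble's actual measurements:

```python
# Ordinary least squares for y = a + b*x, closed form.
# Data are illustrative (Mpc, km/s) pairs, NOT Hubble's real table.
def least_squares(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx            # slope (a stand-in for H0 here)
    a = my - b * mx          # intercept
    return a, b

dist = [0.5, 1.0, 1.5, 2.0]           # Mpc (illustrative)
vel  = [260.0, 510.0, 740.0, 1010.0]  # km/s (illustrative)
a, b = least_squares(dist, vel)
```

The fitted slope plays the role of the Hubble constant: velocity grows linearly with distance.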
“Expanding” Universe
observational facts

• Hubble discovered that:


➢ All galaxies (outside the Local
Group) are moving away from
us.
➢ The more distant the galaxy,
the faster it is racing away.

→ Hubble’s conclusion:
The universe is expanding.

interpretation



“Accelerating” Universe

• 2011 Nobel Prize in Physics: an accelerating universe can be explained with dark energy.

• Repulsive power is strong enough to overpower gravity. What we discovered!!!

• The white dwarf supernova data (dots) fit the accelerating universe model (in which the repulsive force is produced by a form of dark energy) better than the other three models.
“Decelerating” Cosmic Expansion?

https://youtu.be/BE1i20DzaAc



“Decelerating” Cosmic Expansion?

Fall 2022!



New Programming Paradigm

• However, recall!
➢Kepler’s Laws of Planetary Motion
➢Hubble’s Law

(Picture Source: Deep Learning with Python by Chollet)


Math vs ML
Model: Prediction vs Inference

• Y: dependent variable → Output
• X1, X2, …, Xp: independent variables → Features
• β1, β2, …, βp: parameters → Weights

• Other names
➢ Predictor (X): feature, input variable, independent variable, field
➢ Response (Y): output variable, dependent variable, target variable, outcome variable
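The notation above reads as a prediction rule: the output is an intercept plus a weighted sum of the features, y_hat = β0 + β1X1 + … + βpXp. A tiny sketch with made-up coefficients and inputs:

```python
# Linear model prediction: y_hat = beta0 + sum(beta_i * x_i).
# Coefficients and feature values are made-up illustrative numbers.
def predict(beta0, betas, xs):
    return beta0 + sum(b * x for b, x in zip(betas, xs))

# 1.0 + 2.0*4.0 + (-0.5)*2.0 + 3.0*1.0 = 11.0
y_hat = predict(1.0, [2.0, -0.5, 3.0], [4.0, 2.0, 1.0])
```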
Model Fit
1. Model to fit the data.

2. Objective Function (loss/cost function) which is a


metric that you will choose to quantify how well the
model fits the data (e.g., RMSE, accuracy, cross-
entropy).

3. Optimization Method which you will use to find the


best model (i.e., seeking lowest RMSE, highest
accuracy).
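The three ingredients above can be wired together in a few lines. A sketch, assuming the simplest possible setup: a one-parameter linear model, MSE as the objective, and plain gradient descent as the optimizer, on toy data:

```python
# Three ingredients of model fitting, on toy data y = 3x:
# 1) model:     y_hat = w * x
# 2) objective: mean squared error (MSE)
# 3) optimizer: gradient descent on w
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0
lr = 0.01
for _ in range(2000):
    # dMSE/dw = (2/n) * sum((w*x - y) * x)
    grad = 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
```

After training, `w` converges to the true slope 3; swapping in a different loss or optimizer changes only steps 2 and 3, not the overall recipe.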



Artificial Neural Network (ANN)

• Deep Learning (DL): Add more hidden layers


http://www.asimovinstitute.org/neural-network-zoo
Same Goal
• Same “goal”: to find the combination of β coefficients
that minimizes the test RMSE (or maximizes the test accuracy)
computed between the predicted and the actual outcomes

• Linear Regression

• Logistic Regression

• Artificial Neural Networks

• Other ML models
Data Mining: from a Process Perspective

Data Analytics
Social Network Analytics
Text Mining
Application Examples
• Astronomy
➢ Photometric redshift
▪ Neural networks, boosted decision trees
➢ Gravitational wave: detection and parameter estimation
▪ CNNs
➢ Gravitational lens: detection and parameter estimation
▪ NNs, CNNs, and ResNets (residual NNs)
➢ Fundamental cosmological parameter estimation
▪ 3D CNNs (for a fast mapping between the dark matter and visible galaxies)
• Particle Physics
➢ Particle identification
➢ Event selection and high-level physics tasks
➢ Reconstruction
➢ Jet classification
➢ Tracking
➢ Fast simulation
• Statistical Physics
• Many-Body Quantum Matter
• Quantum Computing
• AI acceleration with classical and quantum hardware



CNNs can be applied.

• Nearly all of the objects that you can see in this photograph are:
➢ Planets
➢ Stars
➢ Galaxies

HUDF (Hubble Ultra Deep Field) Image
- exposure time: ~600 hours between July 2002 and September 2012
- the size of this image:
  * horizontal length: 1/13 of the diameter of the full Moon in the sky
  * area: 1/150 of the area of the full Moon in the sky
- about 30,000 galaxies in this particular image
- various types of galaxies: spiral, elliptical, and irregular
Generative Adversarial Network (GAN)

• 2014, new AI technique


• Something from nothing…
• Successfully used in Internet security and drug development



GAN Example: Exploring Galaxy Evolution



GAN Example: Exploring Galaxy Evolution



GAN Example: Exploring Galaxy Evolution

• New approach?

• We learn about nature through


1. Observation
2. Simulation (assumption-driven approach)
3. Data-driven approach
▪ Is this GAN example a third approach, between observation and
simulation?



Example: Reinforcement Learning
• Idea: to have an agent learn (by itself) by performing
physics experiments

➢Which is heavier?

➢Towers: # of rigid bodies?



Physics-guided Neural Networks (PGNNs)

• Idea: physics-based models + data-driven models


➢A new paradigm for scientific discovery from both physics
and data

• ML algorithm with physics


1. Calculate additional features (feature engineering) using
physics theory and use them along with the
measurements.
2. Punish physically inconsistent predictions by adding a
physical inconsistency term to the loss function.
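The second idea can be sketched as a loss function with an added inconsistency term. The “physics rule” here (predictions must be non-negative) and the weight `lam` are illustrative stand-ins, not the constraint used in any particular PGNN paper:

```python
# PGNN idea 2: add a physical-inconsistency penalty to the loss.
# The "physics rule" (non-negative predictions) is an illustrative example.
def pgnn_loss(y_true, y_pred, lam=1.0):
    # ordinary data-fit term: mean squared error
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # penalty term: punish physically impossible (negative) predictions
    phys = sum(max(0.0, -p) ** 2 for p in y_pred) / len(y_pred)
    return mse + lam * phys

ok_loss  = pgnn_loss([1.0, 2.0], [1.0, 2.0])    # consistent predictions
bad_loss = pgnn_loss([1.0, 2.0], [1.0, -2.0])   # inconsistent prediction
```

The weight `lam` trades off data fit against physical consistency; training then proceeds exactly as for any other loss.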



PGNNs Example: Lake Temperature Modeling
• Goal: predict the water temperature as a function of
depth and time for a lake



Annealing (quenching/strengthening)
• Concept: heat up materials and then cool slowly
➢Slow cooling will optimize crystal formation
➢Rapid cooling will result in suboptimal crystals with worse
properties.



Simulated Annealing
• To find global optima in the presence of large numbers of local
optima.

➢ The ability to jump out of local optima is governed by a probability based on the Boltzmann distribution.
▪ Probability = Exp[ -(new f(x) - old f(x)) / T ],
where T decreases with iteration
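The acceptance rule above can be turned into a few lines of code. A sketch, in which the objective function, cooling schedule, and step size are all illustrative choices:

```python
import math, random

# Simulated annealing sketch: minimize a function with local optima.
# f, the cooling schedule, and the step size are illustrative choices.
def f(x):
    return (x * x - 4.0) ** 2 + x   # global minimum near x = -2

random.seed(0)
x = 5.0
best_x, best_f = x, f(x)
T = 10.0
for _ in range(5000):
    x_new = x + random.gauss(0.0, 0.5)       # propose a nearby point
    delta = f(x_new) - f(x)
    # always accept better moves; accept worse moves with
    # Boltzmann probability exp(-delta / T)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = x_new
    if f(x) < best_f:
        best_x, best_f = x, f(x)
    T = max(1e-3, T * 0.999)                 # cool slowly
```

Early on (large T), most uphill moves are accepted and the search roams freely; as T falls, the walk settles into a basin, mirroring the slow-cooling analogy.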
Analogy between RG and RBM
• There is a remarkable analogy between a physics-based conceptual framework called the renormalization group (RG) and a type of neural network known as a restricted Boltzmann machine (RBM).
➢ Just like RG, artificial neural networks can be viewed as an iterative coarse-graining process.



Convolutional Neural Network (CNN)



Principal Component Analysis (PCA)



Social Network Analysis



Association Rules: what goes together?

• 911 calls



DBSCAN: Density-Based Spatial Clustering
• Example of clustering process
➢https://cdn-images-1.medium.com/max/1000/1*tc8UF-h0nQqUfLC8-0uInQ.gif
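The clustering process shown in that animation can be sketched in plain Python. A minimal DBSCAN, with illustrative `eps` and `min_pts` values applied to two toy point clouds plus one outlier:

```python
# Minimal DBSCAN sketch in plain Python (no libraries).
# eps, min_pts, and the toy points are illustrative choices.
def dbscan(points, eps, min_pts):
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)   # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:      # not a core point
            labels[i] = -1
            continue
        labels[i] = cluster          # start a new cluster and expand it
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise absorbed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also a core point
                queue.extend(j_nbrs)
        cluster += 1
    return labels

pts = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),      # dense blob A
       (5, 5), (5, 5.1), (5.1, 5), (5.1, 5.1),      # dense blob B
       (10, 10)]                                     # isolated outlier
labels = dbscan(pts, eps=0.5, min_pts=3)
```

Unlike k-means, the number of clusters is not specified up front; the density parameters `eps` and `min_pts` determine it, and the isolated point is labeled noise (-1).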



Clustering Comparison



Transformative Data in a Changing World
• Using cell phones to fight infectious disease
• Slavery in 1860 vs the 2008 US presidential election
➢Chattel slavery caused political divides that still exist in the South.
➢White people living in counties where slaveholding was more
prevalent tend to be more conservative and more hostile toward
black people.



Data Science (ML/DL/AI) Competitions

• https://www.kaggle.com/competitions

• https://dacon.io/competitions



Hubble’s Original Graph (1929)
An example of linear regression

(axis annotation in the figure: km/h)

• (fun question) What’s wrong with this graph?


• What does this plot mean?
Linear Regression
(a.k.a. Method of Least Squares)

• What if there are lots of data points?


→ Resampling: random sampling, cross-validation, bootstrap;
split the data into training and test sets
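Of the resampling ideas listed above, the bootstrap is the simplest to sketch: draw n points with replacement from an n-point dataset. The data values and seed below are illustrative:

```python
import random

# Bootstrap resample: draw n items WITH replacement from a dataset.
# The data values and the seed are illustrative.
def bootstrap_sample(data, seed=0):
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]

data = [2.1, 3.4, 5.6, 1.2, 4.8]
sample = bootstrap_sample(data)
```

Refitting the model on many such resamples gives an empirical spread for any fitted quantity (e.g., a regression slope) without collecting new data.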
Overfitting: what we have to avoid

• The Problem of Overfitting
➢ Statistical models can produce highly complex explanations of relationships between input and output variables.
➢ The “fit” appears to be excellent: this model fits the data with no error. But does it really show an appropriate trend?
➢ However, when used with new data, models of great complexity do not perform well.

• Overfitting: a model follows the errors (noise) too closely. The method yields a small training MSE but a large test MSE.
• A good model should follow the signal (trend), not the errors (noise).
Data Partition and their Roles
• Training Partition
➢ Typically the largest partition.
➢ Used to build the various models.
▪ This process is called “training”.
➢ Overfitting issue might arise.
• Validation Partition
➢ Didn’t participate in the training process.
➢ Instead, this is used to assess the predictive
performance of each model.
➢ This is also used for fine-tuning the
hyperparameters of models (e.g., the number of
nearest neighbors, learning rate, etc.).
➢ Finally, we compare the results of models
using the validation data and choose the best
model.
• Test Partition
➢ Completely new data, never participate in the
training and validating processes.
➢ Used to assess the performance of the chosen
(best) model.
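The three partitions above can be produced by shuffling row indices and cutting the shuffled list. A sketch assuming a 60/20/20 split, which is a common convention rather than anything mandated here:

```python
import random

# Shuffle row indices, then cut into training / validation / test.
# The 60/20/20 proportions and the seed are illustrative conventions.
def partition(n_rows, seed=42, frac_train=0.6, frac_valid=0.2):
    idx = list(range(n_rows))
    random.seed(seed)
    random.shuffle(idx)
    n_train = int(n_rows * frac_train)
    n_valid = int(n_rows * frac_valid)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test  = idx[n_train + n_valid:]   # the remainder
    return train, valid, test

train, valid, test = partition(100)
```

Because the cuts are made on disjoint slices of one shuffled list, no row can leak from the training partition into validation or test.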



The Bias-Variance Trade-Off
Expected Test MSE: E[(y0 - f_hat(x0))^2] = Var(f_hat(x0)) + [Bias(f_hat(x0))]^2 + Var(ε)
➢ ε: irreducible error
➢ y0: true value at x0
➢ f_hat: estimated (predicted) value at x0

• Variance: the amount by which f_hat would change if we estimated it using a different training data set. In general, more flexible statistical methods have higher variance.

• Bias: the error that is introduced by approximating a real-life problem (which may be extremely complicated) by a much simpler model. Generally, more flexible methods result in less bias.

• Irreducible error: random error, which is independent of x and has mean zero.

• The challenge lies in finding a method for which both the variance and the squared
bias are low.



Best Model: Variance vs Bias

(Figure: training MSE and test MSE versus model flexibility; the expected test MSE decomposes into variance, squared bias, and irreducible error ε.)


Modeling Process
1. Define/understand the purpose
2. Obtain the data
3. Explore, clean, pre-process the data
4. Reduce the data dimension, if necessary
5. Determine the data mining task (e.g., classification,
clustering, etc.)
6. Partition the data (for supervised tasks)
7. Choose the data mining techniques (e.g., linear regression,
neural networks, etc.)
8. “Iterative” implementation and “tuning”
9. Assess/interpret the results of the algorithms (that is,
compare models)
10. Deploy the best model



Data Mining: from a Process Perspective

Data Analytics
Social Network Analytics
Text Mining
Core Ideas in Data Mining
• Classification
• Prediction
• Time Series Forecasting
• Association Rules & Recommenders
• Data & Dimension Reduction (including Cluster
Analysis)
• Data Exploration
• Visualization



Supervised Learning
• Goal: Predict a single “target” or “outcome” variable

• Training data, where target value is known

• Score (apply the model) to new data where the target value is not known

• Methods: classification, prediction, time series


forecasting



Unsupervised Learning
• Goal: Segment data into meaningful segments; detect
patterns

• There is no target (outcome) variable to predict or


classify

• Methods: association rules, clustering, data reduction


& exploration, visualization



Supervised: Classification
• Goal: Predict categorical target (outcome) variable
➢Examples: Purchase/no purchase, fraud/no fraud,
creditworthy/not creditworthy…
➢Each row is a case (customer, tax return, applicant).
➢Each column is a variable.
➢Target variable is often binary (yes/no) but can be multi-
class (e.g., red, blue, or green) as well.



Supervised: Prediction
• Goal: Predict numerical target (outcome) variable
➢Examples: sales, revenue, performance
➢Each row is a case (customer, tax return, applicant).
➢Each column is a variable.
➢Target variable takes continuous numerical values.



Unsupervised: Association Rules
• Goal: Produce rules that define “what goes with what”
➢Example: “If X was purchased, Y was also purchased”
➢Rows are transactions.
➢Used in recommender systems – “Our records show you
bought X, you may also like Y”
➢Also called “affinity analysis”
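Rules like “if X was purchased, Y was also purchased” are usually scored by support and confidence. A sketch over made-up baskets:

```python
# Support and confidence for a rule "if X then Y" over transactions.
# The baskets below are made-up illustrative data.
def support(transactions, items):
    # fraction of transactions containing ALL of the given items
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, x, y):
    # of the transactions containing x, the fraction also containing y
    return support(transactions, x | y) / support(transactions, x)

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
s = support(baskets, {"bread", "milk"})       # 2 of 4 baskets
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

A recommender would surface rules whose support and confidence both exceed chosen thresholds.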



Unsupervised: Data Reduction
• Distillation of complex/large data into simpler/smaller
data
➢Reducing the number of variables/columns (e.g., principal
components)
➢Reducing the number of records/rows (e.g., clustering)



Examples
Methods → Scenarios

• Retrospective → How many fish did each vessel catch yesterday?
• Descriptive → What is the average number of donuts served each morning?
• Data Science
➢ Supervised Learning
▪ Classification (Binary) → Will the Red Sox win the game?
▪ Classification (Multi-Class) → Will the next customer order a wine, beer, or cocktail?
▪ Prediction (Continuous) → How many patients should we expect in the urgent care tomorrow?
▪ Forecasting
➢ Unsupervised Learning
▪ Associative System → What should we place next to the cheese in the grocery cooler?
▪ Clustering → What are our customer personas and how are they similar by account attribute?
Installation of Python via Anaconda
• Anaconda is a free and open-source distribution of the Python programming language for scientific computing, so you don't need a license to use it.
➢Anaconda and Python can be installed on any operating system.

• Why Anaconda?
➢It allows Python programming in an “interactive” mode via Jupyter Notebook, which is quite useful for data science.
➢It permits installing multiple Python environments (e.g., different versions of Python and libraries).

• Python 3
➢https://www.anaconda.com/products/distribution
▪ Download and install



• Run “Jupyter Notebook” on your computer.
• Navigate and open the “tutorial_jupyter_notebook.ipynb” file.
• Follow the instructions given in the Notebook.



Installation of R and R Studio
• Both R and RStudio are open-source software, so you don't need licenses to use them.
➢R and RStudio can be installed on any operating system.

• You must install R (a free software environment for statistical computing and graphics) first and then install RStudio (an Integrated Development Environment [IDE] for R).
➢R can be downloaded from the CRAN archive at https://cran.rstudio.com .
➢The RStudio IDE can be downloaded from https://posit.co/downloads .

