Introduction to machine learning:
Wake up being a data scientist one day

by Oleksandr Khryplyvenko
m3oucat@gmail.com
My path to DS

2002 code → 2003 work → 2004 linux → 2008 master CS → 2014 ML → 2016 Ph.D. student

I use ML for (order matters): work, research
Required skills (+ 1 yr)
• programming
• basic linear algebra
• use existing models
• high-level ML frameworks (Keras)
Required skills (+ 2 yrs)
All of the above, plus:
• low-level ML frameworks (TF)
• basic mathematical analysis
• basic statistics
• basic information theory
• implement models from papers
• enhance existing models
Required skills (+ 4 yrs)
All of the above, plus:
• advanced linear algebra
• advanced statistics
• advanced mathematical analysis
• advanced information theory
• create new models
• understanding patterns & laws hidden in data
Required skills (+ life)
All of the above, plus:
• creating new theories
• Riemannian geometry
• set theory
• topology
• abstract algebra
ML in a nutshell
ML is a set of tricks on data. [figure: magic (ML)]

Approaches
[figure-only slide]
Trends
• Deep learning
• Reinforcement learning

Problems
• One-shot learning
• Imbalanced datasets
Ukraine & my experience
• production: better to take & use existing models
• in prod: if you don’t have time for analysis, just try different approaches
• research: a sample of RL digging
• in UA science is dying. But if you want to learn, you’ll find a way to learn
Cook your data
• choose models suitable for your dataset size
• check how balanced the dataset is (skewed, normal, contains all classes you want to learn)
• use public datasets and pretrained models
• if the values you gather for variable X are bad, and you can gather values of variable Y - switch to Y
• most algorithms require normalization: you can’t compare apples and oranges without it (see the scaling sketch after this list)
• properly initialize free parameters (if needed)
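
A minimal sketch of the normalization point above, assuming scikit-learn’s StandardScaler; the feature values are made up for the example:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales:
# column 0 = years of experience, column 1 = salary in USD
X = np.array([[1.0, 30000.0],
              [5.0, 52000.0],
              [10.0, 98000.0]])

scaler = StandardScaler()           # rescales each column to zero mean, unit variance
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))        # ~[0, 0]
print(X_scaled.std(axis=0))         # ~[1, 1]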
3 main questions
• Which entity do you want to teach?
• What should be taught?
• How should it be taught?
3 main questions
• model
• cost function:
  - regression
  - classification
  - clustering
• teaching algorithm:
  - backpropagation (1st-order, 2nd-order methods)
  - hypothesis checking (statistics)
  - genetic algorithms
  …
3 main questions (sample)
• model: DQN
• cost function: MSE(target Q*, agent Q*)
• teaching algorithm: backpropagation
parametric models
• depend on parameters
• parameters change models
• nonlinear models with complex logic (you are what you interact with)

[figure: input(X) → model → output]
parametric models: linear regression

trainset (input → output):
X1 = 0.7,  X2 = 1   →  Y1 = 8
X1 = -0.1, X2 = 10  →  Y1 = 3

X1 = your interest in ML, X2 = time spent on ML, Y1 = your ML level
linear regression: introducing free parameters

w1*0.7 + w2*1 + b = 8
w1*(-0.1) + w2*10 + b = 3

* generally the train set size (number of equations) > number of free parameters,
and WX + b = Y doesn’t have an exact solution

You introduce the hypothesis that if you linearly combine
unseen inputs, you’ll get plausible outputs.
What is the trick behind free parameters?
(the main trick of parametric ML)

W = known output | known input
unknown output = unknown input * W | unknown input
unknown output = unknown input * (known output | known input) | unknown input

If your model gives the right distribution,
unknown output ~ known output.
There is a similarity with Bayes; that’s why you need statistics ;)

Information obtained from seen data allows you to
use unseen data to solve tasks the way
you solved them on seen data.
3 main questions (1)
• Which entity do you want to teach:

Here’s why you need linear algebra:

XW + B = Y              (matrix equation)
(1x2)*(2x1)+(1x1)=(1x1) (dimensions)

A linear combination!
r(W) = x1*w1 + x2*w2 + b,  W = (w1, w2, b)
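
A quick sketch of those dimensions in NumPy; the input is a trainset sample from the slides above, while the parameter values are arbitrary, just for the shape check:

import numpy as np

X = np.array([[0.7, 1.0]])    # (1x2) one input vector
W = np.array([[2.0], [0.5]])  # (2x1) free parameters (arbitrary demo values)
B = np.array([[1.0]])         # (1x1) bias

Y = X @ W + B                 # (1x2)*(2x1)+(1x1) = (1x1)
print(Y.shape, Y)             # (1, 1) [[2.9]]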
3 main questions (2)
• What should be taught: you want your system’s output to be as close as
possible to the target (marked) output for EACH vector in the dataset:

1 sample (SGD):                (1/2)*(r(W) - y)^2
batch of m samples (batch GD): (1/2m)*sum_{i=1..m} (r_i(W) - y_i)^2
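
A minimal sketch computing both costs on the toy trainset, with zero-initialized parameters just to make the formulas concrete:

import numpy as np

X = np.array([[0.7, 1.0], [-0.1, 10.0]])  # trainset inputs
y = np.array([8.0, 3.0])                  # target outputs
w = np.zeros(2)                           # free parameters, zero-initialized
b = 0.0

r = X @ w + b                             # model outputs r(W) for all samples
cost_single = 0.5 * (r[0] - y[0])**2      # 1 sample: (1/2)*(r(W) - y)^2
cost_batch = np.mean((r - y)**2) / 2      # batch: (1/2m)*sum_i (r_i(W) - y_i)^2
print(cost_single, cost_batch)            # 32.0 18.25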
3 main questions (3)
• How should it be taught:
(that’s why you need some math analysis)

∂[(1/2)*(r(W) - y)^2]/∂wi = ∂[(1/2)*(x1*w1 + x2*w2 + b - y)^2]/∂wi = (r(W) - y)*xi

wi = wi - (r(W) - y)*xi   ???
3 main questions (3)
• How should it be taught:
(that’s why you need some math analysis)

[figure: the error curve (1/2)*(r(W) - y)^2 over W, with the gradient (r(W) - y)*xi shown at wi = 0.1]
3 main questions (3)
• How should it be taught:

wi = wi - (r(W) - y)*xi

But if you move your free parameters all the way in a sample’s
direction, your network will classify this sample well -
but the other samples badly!

Solution:
wi = wi - η*(r(W) - y)*xi
η = (0..1), the learning rate (a hyperparameter)
3 main questions (3)
• How should it be taught:

wi = wi - η*(r(W) - y)*xi   (SGD)

This is a simple optimizer, but there are many more.
You may use precise information about the error curvature
(2nd-order optimization) [1],
or try to approximate 2nd-order optimization for faster convergence [2].
(A training-loop sketch follows.)
Training
[figure-only slide]
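
A minimal sketch of that SGD update rule on the toy trainset, in pure NumPy; η and the epoch count are arbitrary choices for the demo:

import numpy as np

X = np.array([[0.7, 1.0], [-0.1, 10.0]])  # trainset from the slides
y = np.array([8.0, 3.0])
w = np.zeros(2)                           # free parameters
b = 0.0
eta = 0.01                                # learning rate, (0..1)

for epoch in range(1000):
    for xi, yi in zip(X, y):              # one sample at a time: SGD
        err = (xi @ w + b) - yi           # (r(W) - y)
        w -= eta * err * xi               # wi = wi - η*(r(W) - y)*xi
        b -= eta * err                    # the bias follows the same rule with x = 1

print(w, b)                               # learned free parameters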
Regularization
• there is always some noise in data
• so you do regularization
• lots of techniques (an L2 sketch follows this list):
  - L1
  - L2
  - ensembles
  - dropout
  - batch normalization
  …
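
As one concrete instance of the L2 item above, a sketch with scikit-learn’s Ridge regression on the toy trainset; alpha, the penalty strength, is an arbitrary choice here:

from sklearn.linear_model import Ridge

x_train = [[0.7, 1], [-0.1, 10]]
y_train = [8, 3]

# L2 regularization adds alpha*||w||^2 to the cost, shrinking the weights
regr = Ridge(alpha=0.5)
regr.fit(x_train, y_train)
print(regr.coef_, regr.intercept_)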
Back to linear regression

In [3]: from sklearn import linear_model
In [4]: regr = linear_model.LinearRegression()
In [5]: x_train = [[0.7, 1], [-0.1, 10]]
In [6]: x_test = [[0.8, 2]]
In [7]: y_train = [8, 3]
In [8]: y_test = [9]

In [9]: regr.fit(x_train, y_train)
Out[9]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [10]: regr.coef_
Out[10]: array([ 0.04899559, -0.55120039])

In [11]: regr.predict(x_test)
Out[11]: array([ 7.45369917])

find what’s wrong?
Linear regression is a simple linear neuron!

[diagram: x1, x2 → (w1, w2, b) → Y1]

* in the linear case, least squares may be used (it learns in 1 iteration);
see the sketch below. For non-linear models it can’t be, and SGD is the most common choice.
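
A sketch of that one-shot least-squares solve with NumPy’s np.linalg.lstsq on the toy trainset; the appended column of ones absorbs the bias b:

import numpy as np

X = np.array([[0.7, 1.0], [-0.1, 10.0]])
y = np.array([8.0, 3.0])

Xb = np.hstack([X, np.ones((2, 1))])   # bias b becomes just another weight

# Minimum-norm least-squares solution in one shot, no iterations
W, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(W)                               # [w1, w2, b]
print(Xb @ W)                          # reproduces [8, 3] on the trainset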
Neural networks?
• There are lots of types & variations of neural networks: MLP, RNN, CNN
• They use the same ideas, concepts and mathematics you’ve just been
introduced to. Just a bit more complex.
• So you can start your ML path right now!
Neural networks

[diagrams: a single linear neuron (x1, x2 with weights w1, w2 and bias b → Y1);
a layer of two neurons; a two-layer network with weights w(1)i,j, w(2)i,j and
biases b(1)i, b(2)1]
What modern frameworks do for you
• Symbolic computations
• Automatic differentiation
• Lots of ready-to-use models (neural networks)
In a few words - almost everything!

Benefits:
- simpler automatic differentiation
- easier parallelization
- differentiation of a graph produces a graph, so you can get
high-order derivatives for no extra cost (PROFIT!!!)

You say how to symbolically compute the gradient for an op when you make a new op -
in TF it’s a single method: @ops.RegisterGradient("MyOP")
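
A sketch of registering such a gradient, assuming a hypothetical op type "MyOP" that computes x**3; the op name and the math are made up for illustration:

import tensorflow as tf
from tensorflow.python.framework import ops

# TF calls this during backprop with the op and the gradient flowing from above
@ops.RegisterGradient("MyOP")
def _my_op_grad(op, grad):
    x = op.inputs[0]
    # if MyOP computes x**3, its gradient is 3*x**2 * upstream gradient
    return 3.0 * tf.square(x) * grad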
Symbolic computations
• You don’t actually compute. You just say how to compute.
• You can think of it as metaprogramming.
• Symbolic computation shows how to get the symbolic (general, analytical) solution.
• By substituting numerical values for the vars, you obtain particular numerical solutions.

Symbolic: c = a + b, given a=…, b=…
Numerical: 7 = 3 + 4
Symbolic computations. TF sample

Expression: (a - b) + cos(x)
[graph diagram: the +, -, a and b nodes placed on /gpu:0; the cos and x nodes on /cpu:0]
import tensorflow as tf
import numpy as np

with tf.device('/cpu:0'):
    x = tf.constant(np.ones((100, 100)))
    y = tf.cos(x)

with tf.device('/gpu:0'):
    a = tf.constant(np.zeros((100, 100)))
    b = tf.constant(np.ones((100, 100)))
    result = a - b + y

tf_session = tf.Session(
    config=tf.ConfigProto(
        log_device_placement=True
    )
)

writer = tf.train.SummaryWriter(
    "/tmp/trainlogs2",
    tf_session.graph
)

# then run
#   tensorboard --logdir=/tmp/trainlogs2
# in a shell, go to the location suggested by tensorboard,
# open the `graphs` tab, click on each node/leaf,
# and check where it has been placed
Automatic differentiation

Automatic differentiation is based on the chain rule:
http://colah.github.io/posts/2015-08-Backprop/

∂E(ƒ(w))/∂w = ∂E(ƒ(w))/∂ƒ(w) * ∂ƒ(w)/∂w

But ƒ may not depend on w directly - it may depend on it through g(w)…
and the chain rule applies recursively.

In TF it’s much more convenient than in Torch or Theano.
You can think of a TF op = a Torch layer (in terms of automatic differentiation).

• Allows us to compute the partial derivatives of the objective function with respect
to each free parameter in one pass.
• Efficient when the # of objective functions is small.
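
A sketch of one such backward pass with tf.gradients, on the toy cost from the earlier slides (TF 1.x-style session API, matching the era of the deck):

import tensorflow as tf

w = tf.Variable([0.0, 0.0])         # free parameters
b = tf.Variable(0.0)
x = tf.constant([0.7, 1.0])         # one trainset sample
y = tf.constant(8.0)

r = tf.reduce_sum(w * x) + b        # r(W) = x1*w1 + x2*w2 + b
cost = 0.5 * tf.square(r - y)       # (1/2)*(r(W) - y)^2

# one call, one backward pass, all partial derivatives
grads = tf.gradients(cost, [w, b])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))          # [(r-y)*x, (r-y)] = [[-5.6, -8.], -8.0]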
non-parametric models

[figure: KNN illustration]
non-parametric models
• Depend on a metric (a distance function between 2 objects): ℝ^2k → ℝ
• Depend on hyperparameters (number of classes, number of neighbors, etc.)
• If you don’t know how to define a hyperparameter value, you fail.
• If you do know how to define a hyperparameter value,
you may perform even better than with parametric models.
(A KNN sketch follows.)
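
A minimal KNN sketch with scikit-learn; the 2-D points and labels are made up, n_neighbors is the hyperparameter, and Euclidean distance is the metric:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])          # two classes of 2-D points

# n_neighbors is the hyperparameter; the metric is Euclidean distance
knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(X, y)
print(knn.predict([[0.1, 0.2]]))    # -> [0]: 2 of the 3 nearest neighbors are class 0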
ML branches (supervising slice)
• Supervised
  - use when you have labeled datasets
• Unsupervised
  - use when you have unlabeled datasets
• Reinforcement
  - use when you have to interact with an environment
Starter links
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
[1] http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/
[2] http://sebastianruder.com/optimizing-gradient-descent/index.html#momentum
