AI: Introduction and Machine Learning
Programme: B. Tech-CSE
Course: SKE-309
Faculty: Dr. S. K. Dubey
Introduction
• Machine learning is making great strides
• Large, good data sets
• Compute power
• Progress in algorithms
• Many interesting applications
• commercial
• scientific
• Links with artificial intelligence
• However, AI ≠ machine learning
2
Machine learning tasks
• Supervised learning
• regression: predict numerical values
• classification: predict categorical values, i.e., labels
• Unsupervised learning
• clustering: group data according to "distance"
• association: find frequent co-occurrences
• link prediction: discover relationships in data
• data reduction: project features to fewer features
• Reinforcement learning
3
Regression
Colorize B&W images automatically
https://tinyclouds.org/colorize/
4
Classification
Object recognition
https://ai.googleblog.com/2014/09/building-deeper-understanding-of-images.html
5
Reinforcement learning
Learning to play Break Out
https://www.youtube.com/watch?v=V1eYniJ0Rnk
6
Clustering
Crime prediction using k-means clustering
http://www.grdjournals.com/uploads/article/GRDJE/V02/I05/0176/GRDJEV02I050176.pdf
7
Applications in science
8
Machine learning algorithms
• Regression:
Ridge regression, Support Vector Machines, Random Forest,
Multilayer Neural Networks, Deep Neural Networks, ...
• Classification:
Naive Bayes, Support Vector Machines,
Random Forest, Multilayer Neural Networks,
Deep Neural Networks, ...
• Clustering:
k-Means, Hierarchical Clustering, ...
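As a minimal sketch of a few of the algorithms listed above, the snippet below (assuming scikit-learn is installed; it uses the library's small built-in toy datasets purely for illustration) fits one regressor, one classifier, and one clustering model:

# Minimal sketch: one regressor, one classifier, one clusterer (scikit-learn).
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Regression: Ridge regression on the diabetes toy dataset
X_reg, y_reg = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X_reg, y_reg)

# Classification: Random Forest on the iris toy dataset
X_clf, y_clf = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100).fit(X_clf, y_clf)

# Clustering: k-means with 3 clusters on the iris features (labels ignored)
kmeans = KMeans(n_clusters=3, n_init=10).fit(X_clf)

print(ridge.score(X_reg, y_reg), forest.score(X_clf, y_clf), kmeans.inertia_)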
9
Issues
• Many machine learning/AI projects fail
(Gartner claims 85 %)
10
Reasons for failure
• Asking the wrong question
• Trying to solve the wrong problem
• Not having enough data
• Not having the right data
• Having too much data
• Hiring the wrong people
• Using the wrong tools
• Not having the right model
• Not having the right yardstick
11
Frameworks
• Programming languages
• Python
Fast-evolving ecosystem!
• R
• C++
• ...
• Many libraries
• classic machine learning
• scikit-learn
• deep learning frameworks
• PyTorch
• TensorFlow
• Keras
• …
12
scikit-learn
• Nice end-to-end framework
• data exploration (+ pandas + holoviews)
• data preprocessing (+ pandas)
• cleaning/missing values
• normalization
• training
• testing
• application
• "Classic" machine learning only
• https://scikit-learn.org/stable/
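A hedged sketch of what such an end-to-end workflow might look like, using scikit-learn's bundled breast-cancer toy dataset as stand-in data:

# Sketch of an end-to-end scikit-learn workflow: explore, preprocess, train, test.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Data exploration with pandas
data = load_breast_cancer(as_frame=True)
print(data.frame.describe())

# Hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Preprocessing (normalization) + training, combined in a pipeline
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

# Testing
print("test accuracy:", model.score(X_test, y_test))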
13
Keras
• High-level framework for deep learning
• TensorFlow backend
• Layer types
• dense
• convolutional
• pooling
• embedding
• recurrent
• activation
• …
• https://keras.io/
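As a small illustration (not a prescribed architecture), here is a Sequential model built from a few of the layer types listed above, assuming TensorFlow 2.x with its bundled Keras:

# Minimal Keras sketch: dense and activation layers in a Sequential model.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),  # dense layer
    layers.Dense(10),                                        # dense output layer
    layers.Activation("softmax"),                            # activation layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()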
14
Data pipelines
• Data ingestion
• CSV/JSON/XML/H5 files, RDBMS, NoSQL, HTTP,...
• Data cleaning (must be done systematically)
• outliers/invalid values? → filter
• missing values? → impute
• Data transformation
• scaling/normalization
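A sketch of these pipeline steps with pandas and scikit-learn; the file name and column names (measurements.csv, temperature, humidity) are invented for illustration:

# Ingestion, cleaning, and transformation with pandas/scikit-learn.
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Data ingestion: read a CSV file
df = pd.read_csv("measurements.csv")

# Data cleaning: filter invalid values/outliers, impute missing values
df = df[df["temperature"].between(-50.0, 60.0)]                  # filter
df["humidity"] = df["humidity"].fillna(df["humidity"].median())  # impute

# Data transformation: scaling/normalization
df[["temperature", "humidity"]] = StandardScaler().fit_transform(
    df[["temperature", "humidity"]])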
15
Supervised learning: methodology
• Select model, e.g., random forest, (deep) neural network, ...
• Train model, i.e., determine parameters
• Data: input + output
• training data → determine model parameters
• validation data → yardstick to avoid overfitting
• Test model
• Data: input + output
• testing data → final scoring of the model
• Production
• Data: input → predict output
Experiment with underfitting and overfitting: 010_underfitting_overfitting.ipynb
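A minimal sketch of this methodology with scikit-learn: two successive splits produce separate training, validation, and test sets (the dataset and model are placeholders):

# Train on training data, tune against validation data, score once on test data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a test set for the final scoring only
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
# Split the rest into training and validation data
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest,
                                                  test_size=0.25,
                                                  random_state=0)

model = RandomForestClassifier().fit(X_train, y_train)     # train
print("validation accuracy:", model.score(X_val, y_val))   # monitor overfitting
print("test accuracy:", model.score(X_test, y_test))       # final score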
16
From neurons to ANNs
• Inspiration: the biological neuron
• Inputs x_1, ..., x_N with weights w_1, ..., w_N and a bias b (constant input +1) feed a weighted sum that is passed through an activation function σ:
y = σ( Σ_{i=1}^{N} w_i x_i + b )
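The formula, written out in NumPy for one neuron, with the sigmoid chosen as an example activation function:

# A single artificial neuron: y = sigma(sum_i w_i * x_i + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs  x_1 .. x_N
w = np.array([0.1,  0.4, -0.7])  # weights w_1 .. w_N
b = 0.2                          # bias

y = sigmoid(np.dot(w, x) + b)
print(y)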
17
Multilayer network
How to determine the weights?
18
Training: backpropagation
• Initialize weights "randomly"
• For all training epochs
• for all input-output in training set
• using input, compute output (forward)
• compare computed output with training output
• adapt weights (backward) to improve output
• if accuracy is good enough, stop
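The same loop, written out for a single sigmoid neuron with a squared-error loss on a toy AND-like dataset; a deliberately tiny, illustrative sketch rather than a general implementation:

# Backpropagation for one sigmoid neuron: forward pass, compare, adapt weights.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
t = np.array([0, 0, 0, 1], dtype=float)                      # target outputs

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0        # initialize weights "randomly"
lr = 0.5                              # learning rate

for epoch in range(1000):             # for all training epochs
    for x_i, t_i in zip(X, t):        # for all input-output pairs
        y_i = sigmoid(np.dot(w, x_i) + b)     # forward: compute output
        error = y_i - t_i                     # compare with training output
        grad = error * y_i * (1.0 - y_i)      # backward: loss gradient
        w -= lr * grad * x_i                  # adapt weights
        b -= lr * grad

print(np.round(sigmoid(X @ w + b), 2))  # should approach [0, 0, 0, 1]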
19
Task: handwritten digit recognition
• Input data
• grayscale image
• Output data
• digit 0, 1, ..., 9
• Training examples
• Test examples
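The data for this task can be obtained, for example, through Keras' built-in MNIST loader (assuming TensorFlow 2.x):

# 28x28 grayscale images with digit labels 0..9.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)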
20
First approach
• Data preprocessing
• Input data as 1D array, e.g., array([ 0.0, 0.0,..., 0.951, 0.533,..., 0.0, 0.0], dtype=float32)
• Output data as one-hot encoded array, e.g., the digit 5 becomes array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
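A sketch of this preprocessing with Keras utilities (assuming TensorFlow 2.x):

# Flatten each image to a 1D array, rescale to [0, 1], one-hot encode the labels.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784).astype("float32") / 255.0  # 1D arrays
x_test = x_test.reshape(10000, 784).astype("float32") / 255.0

y_train = to_categorical(y_train, 10)  # e.g. 5 -> [0 0 0 0 0 1 0 0 0 0]
y_test = to_categorical(y_test, 10)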
21
Deep neural networks
• Many layers
• Features are learned, not given
• Low-level features combined into
high-level features
22
Convolutional neural networks
[Figure: convolution of an input image with a small kernel matrix that slides over the image]
23
Convolution examples
[Figure: example kernels (a main-diagonal and an anti-diagonal matrix) and the filtered images they produce]
Convolution: 050_convolution.ipynb
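Besides the notebook, here is a self-contained sketch of a 2D convolution in SciPy; the image and the edge-detecting kernel are invented for illustration:

# Convolve a small image with a 3x3 kernel that responds to vertical edges.
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((8, 8))
image[:, 4:] = 1.0                     # left half dark, right half bright

kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

result = convolve2d(image, kernel, mode="same")
print(result)                          # strong response along the vertical edge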
24
Second approach
• Data preprocessing
• Input data as 2D array, e.g., array([[ 0.0, 0.0,..., 0.951, 0.533,..., 0.0, 0.0]], dtype=float32)
• Output data as one-hot encoded array, e.g., the digit 5 becomes array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
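A sketch of a small convolutional network for such 2D input (assuming TensorFlow 2.x; the layer sizes are illustrative, not prescriptive):

# Convolution + pooling + dense output for 28x28 single-channel images.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu",
                  input_shape=(28, 28, 1)),   # convolutional layer
    layers.MaxPooling2D(pool_size=(2, 2)),    # pooling layer
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),   # one output per digit
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()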
25
Task: sentiment classification
• Input data
• movie review (English), e.g.:
"<start> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there Robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it ..."
• Output data
• sentiment label: positive / negative
• Training examples
• Test examples
26
Word embedding
• Represent words as one-hot vectors
length = vocabulary size
Issues:
• unwieldy
• no semantics
• Word embeddings
• dense vector
• vector distance ≈ semantic distance
• Training
• use context
• discover relations with surrounding
words
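In Keras, such embeddings are a trainable layer that maps word indices to dense vectors; the vocabulary size and embedding dimension below are illustrative assumptions:

# Each word index is mapped to a dense, trainable vector.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocabulary_size = 10000   # a one-hot vector would have length 10000
embedding_dim = 32        # dense embedding vectors of length 32

embedding = keras.Sequential([
    layers.Embedding(input_dim=vocabulary_size, output_dim=embedding_dim),
])

# A "sentence" of 5 word indices becomes a 5 x 32 matrix of dense vectors
vectors = embedding.predict(np.array([[12, 7, 256, 3, 999]]))
print(vectors.shape)  # (1, 5, 32)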
27
How to remember?
Manage history, network learns
• what to remember
• what to forget
Long-term correlations!
Use, e.g.,
• LSTM (Long Short-Term Memory)
• GRU (Gated Recurrent Unit)
28
Gated Recurrent Unit
(GRU)
• Update gate
z_t = σ(W_z x_t + U_z h_{t−1})
• Reset gate
r_t = σ(W_r x_t + U_r h_{t−1})
• Current memory content
h′_t = tanh(W x_t + r_t ⊙ U h_{t−1})
• Final memory/output
h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h′_t
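The four equations, written out for a single time step in NumPy (a didactic sketch; production implementations also include bias terms and train the matrices):

# One GRU step: update gate, reset gate, candidate memory, final memory/output.
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    z_t = sigma(W_z @ x_t + U_z @ h_prev)            # update gate
    r_t = sigma(W_r @ x_t + U_r @ h_prev)            # reset gate
    h_cand = np.tanh(W @ x_t + r_t * (U @ h_prev))   # current memory content
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand        # final memory/output
    return h_t

# Toy sizes: 3-dimensional input, 4-dimensional hidden state
rng = np.random.default_rng(42)
x_t, h_prev = rng.normal(size=3), np.zeros(4)
W_z, W_r, W = (rng.normal(size=(4, 3)) for _ in range(3))
U_z, U_r, U = (rng.normal(size=(4, 4)) for _ in range(3))
print(gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U))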
29
Approach
• Data preprocessing
• Input data as padded array
• output data as 0 or 1
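A sketch of this approach with Keras, assuming TensorFlow 2.x and using the Keras IMDB reviews dataset as an example source of word-index sequences:

# Padded word-index sequences in, a single 0/1 sentiment prediction out.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_words, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_words)

# Input data as padded arrays of word indices
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Output data is already 0 (negative) or 1 (positive)
model = keras.Sequential([
    layers.Embedding(input_dim=max_words, output_dim=32),
    layers.GRU(32),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()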
30
Caveat
• InspiroBot (http://inspirobot.me/)
• "I am an artificial intelligence dedicated to generating unlimited amounts of unique inspirational quotes for endless
enrichment of pointless human existence".
31
Thank You
32