Deep Learning
optimizers, cost functions and training
regularization methods
Backpropagation
tuning hyper-parameters
GANs & Adversarial training
classification vs. regression tasks
Bayesian Deep Learning
DNN basic architectures: convolutional, recurrent, attention mechanism
Generative models
Unsupervised / Pretraining
Application example: Relation Extraction
Machine Learning
[Diagram: in the training phase, labeled data and a learning algorithm produce a learned model; in the prediction phase, the learned model maps new data to a prediction (e.g., class A)]
Example tasks: classification, anomaly detection, sequence labeling
http://mbjoseph.github.io/2013/11/27/measure.html
ML vs. Deep Learning
Most machine learning methods work well because of human-designed
representations and input features;
ML then becomes just optimizing weights to best make a final prediction
What is Deep Learning (DL) ?
A machine learning subfield of learning representations of data.
Exceptionally effective at learning patterns.
Deep learning algorithms attempt to learn (multiple levels of)
representation by using a hierarchy of multiple layers
If you provide the system tons of information, it begins to understand it
and respond in useful ways.
https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
Why is DL useful?
o Manually designed features are often over-specified, incomplete and
take a long time to design and validate
o Learned features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable
framework for representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
h = σ(W1 x + b1)
y = σ(W2 h + b2)
Activation functions
How do we train?
[Diagram: a 2-layer network with input x (3 units), hidden layer h (4 units) and output y (2 units)]
4 + 2 = 6 neurons (not counting inputs)
[3 x 4] + [4 x 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
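A minimal NumPy sketch of the 2-layer network above (the random initialization is an assumption), confirming the 26-parameter count:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 12 weights + 4 biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer: 8 weights + 2 biases

x = rng.normal(size=3)          # 3 input features
h = sigmoid(W1 @ x + b1)        # h = sigma(W1 x + b1)
y = sigmoid(W2 @ h + b2)        # y = sigma(W2 h + b2)

n_params = W1.size + b1.size + W2.size + b2.size
print(y, n_params)              # 26 learnable parameters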
Demo
Training
Sample labeled data (batch) → forward it through the network, get predictions → back-propagate the errors → update the network weights
θ_j^new = θ_j^old − α dJ(θ)/dθ_j      Update each element of θ (α is the learning rate)
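A minimal NumPy sketch of this update rule on a hypothetical quadratic cost (the cost and its gradient are made up for illustration):

import numpy as np

def J(theta):                      # hypothetical cost: J(theta) = ||theta - 3||^2
    return np.sum((theta - 3.0) ** 2)

def grad_J(theta):                 # its gradient: dJ/dtheta_j = 2 (theta_j - 3)
    return 2.0 * (theta - 3.0)

theta = np.zeros(2)
alpha = 0.1                        # learning rate
for step in range(100):
    theta = theta - alpha * grad_J(theta)   # update each element of theta
print(theta, J(theta))             # theta approaches [3, 3], J approaches 0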
[Diagram: a single neuron with weights W and bias b computes σ(x_i; W, b) from input x_i]
Activation functions
Non-linearities are needed to learn complex (non-linear) representations of
the data; otherwise the NN would be just a linear function: W1 W2 x = W x
http://cs231n.github.io/assets/nn1/layer_sizes.jpeg
http://adilmoujahid.com/images/activation.png
- Sigmoid neurons saturate and kill gradients, thus the NN will barely learn
• when the neuron's activations are 0 or 1 (saturate)
• the gradient at these regions is almost zero
• almost no signal will flow to its weights
• if the initial weights are too large then most neurons will saturate
Activation: Tanh
Takes a real-valued number and
“squashes” it into range between
-1 and 1.
R^n → [−1, 1]
http://adilmoujahid.com/images/activation.png
Activation: ReLU
Takes a real-valued number and thresholds it at zero: f(x) = max(0, x)
R^n → R_+^n
http://adilmoujahid.com/images/activation.png
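A minimal NumPy sketch of the three activations discussed (sigmoid, tanh, ReLU) and their output ranges:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes into (0, 1); saturates for large |x|

def tanh(x):
    return np.tanh(x)                  # squashes into (-1, 1); zero-centered

def relu(x):
    return np.maximum(0.0, x)          # thresholds at zero: R^n -> R_+^n

x = np.linspace(-5, 5, 11)
print(sigmoid(x))   # values near 0 or 1 at the ends: gradient there is almost zero
print(tanh(x))
print(relu(x))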
http://wiki.bethanycrane.com/overfitting-of-data
https://www.neuraldesigner.com/images/learning/selection_error.svg
Regularization
Dropout
• Randomly drop units (along with their
connections) during training
• Each unit retained with fixed probability
p, independent of other units
• Hyper-parameter p to be chosen (tuned)
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural
networks from overfitting." Journal of machine learning research (2014)
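A minimal NumPy sketch of dropout at training time ("inverted dropout"; the rescaling by the keep probability is a common implementation detail, not stated on the slide):

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_keep=0.5, training=True):
    # Randomly drop units: each unit is retained with probability p_keep,
    # independently of the other units.
    if not training:
        return h                                   # no dropout at test time
    mask = rng.random(h.shape) < p_keep            # independent Bernoulli mask per unit
    return h * mask / p_keep                       # rescale so the expected activation is unchanged

h = rng.normal(size=(4, 8))       # a batch of hidden activations
print(dropout(h, p_keep=0.8))     # about 20% of units are zeroed out each forward pass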
L2 = weight decay
• Regularization term that penalizes big weights, added to the objective: J_reg(θ) = J(θ) + λ Σ_k θ_k^2
• The weight decay value λ determines how dominant the regularization is during gradient computation
• Big weight decay coefficient → big penalty for big weights
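A minimal NumPy sketch of adding the L2 term λ Σ_k θ_k^2 to a cost and its gradient (the unregularized cost is a made-up example):

import numpy as np

lam = 1e-3                                    # weight decay coefficient (lambda)

def J(theta):                                 # hypothetical unregularized cost
    return np.sum((theta - 3.0) ** 2)

def J_reg(theta):
    return J(theta) + lam * np.sum(theta ** 2)        # J_reg = J + lambda * sum_k theta_k^2

def grad_J_reg(theta):
    return 2.0 * (theta - 3.0) + 2.0 * lam * theta    # gradient gains a 2*lambda*theta term

theta = np.ones(2)
print(J_reg(theta), grad_J_reg(theta))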
Early-stopping
• Use validation error to decide when to stop training
• Stop when monitored quantity has not improved after n subsequent epochs
• n is called patience
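A minimal sketch of early stopping with patience n, using a made-up validation-error curve in place of a real training loop:

val_errors = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.55, 0.58, 0.59]   # simulated per-epoch validation error

patience = 3                       # n: stop after n epochs without improvement
best_val, best_epoch = float("inf"), 0

for epoch, val in enumerate(val_errors):
    # (in practice: train one epoch here, then compute the validation error)
    if val < best_val:
        best_val, best_epoch = val, epoch      # improvement: remember the best epoch so far
    elif epoch - best_epoch >= patience:
        print(f"early stop at epoch {epoch}; best epoch was {best_epoch}")
        break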
Tuning hyper-parameters
"Grid and random search of nine trials for optimizing a function f(x, y) = g(x) + h(y) ≈ g(x).
With grid search, nine trials only test g(x) in three distinct places.
With random search, all nine trials explore distinct values of g."
Bergstra & Bengio, "Random search for hyper-parameter optimization" (2012)
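A minimal sketch contrasting the two strategies on a toy objective of the kind in the quote, where only x really matters (the objective itself is made up):

import random

def f(x, y):
    return -(x - 0.3) ** 2 + 0.001 * y      # y has almost no effect (low effective dimensionality)

# grid search: 9 trials, but only 3 distinct values of x are ever tested
grid = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
best_grid = max(grid, key=lambda p: f(*p))

# random search: 9 trials, all with distinct values of x
random.seed(0)
rand = [(random.random(), random.random()) for _ in range(9)]
best_rand = max(rand, key=lambda p: f(*p))

print(best_grid, f(*best_grid))
print(best_rand, f(*best_rand))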
Convolutional
Input matrix 3x3 filter
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
Convolutional Neural
Networks (CNNs)
Main CNN idea for text:
Compute vectors for n-grams and group them afterwards
max pool: 2x2 filters and stride 2
https://shafeentejani.github.io/assets/images/pooling.gif
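A minimal NumPy sketch of the two operations on the slides above: sliding a 3x3 filter over an input matrix (a valid convolution with no padding is assumed) and 2x2 max pooling with stride 2:

import numpy as np

def conv2d(x, k):
    # Valid 2D convolution of input x with a small filter k (e.g. 3x3).
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)   # element-wise product, then sum
    return out

def max_pool(x, size=2, stride=2):
    # Max pooling with 2x2 windows and stride 2.
    out = np.zeros((x.shape[0] // stride, x.shape[1] // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 input matrix
k = np.ones((3, 3)) / 9.0                        # toy 3x3 averaging filter
fmap = conv2d(x, k)                              # 4x4 feature map
print(max_pool(fmap))                            # 2x2 pooled output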
CNN for text classification
Severyn, Aliaksei, and Alessandro Moschitti. "UNITN: Training Deep Convolutional Neural Network for
Twitter Sentiment Classification." SemEval@ NAACL-HLT. 2015.
CNN with multiple filters
https://pbs.twimg.com/media/C2j-8j5UsAACgEK.jpg
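A minimal sketch of the "CNN with multiple filters" idea for text, in the spirit of Kim (2014): Conv1D filters of several widths compute vectors for n-grams, then max pooling over time groups them. TensorFlow/Keras and all dimensions here are assumptions, not the cited papers' exact setups:

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, seq_len = 10000, 128, 100   # assumed values

inputs = layers.Input(shape=(seq_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inputs)

pooled = []
for width in (3, 4, 5):                                          # n-gram widths (filter sizes)
    conv = layers.Conv1D(100, width, activation="relu")(emb)     # 100 filters per width
    pooled.append(layers.GlobalMaxPooling1D()(conv))             # max-over-time pooling

merged = layers.Concatenate()(pooled)
outputs = layers.Dense(1, activation="sigmoid")(merged)          # e.g., positive vs. negative

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()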
y_t = f(h_t)
Units with short-term dependencies often have reset gates very active
Units with long-term dependencies have active update gates z
Gated Recurrent Units
(GRUs)
Main idea:
keep around memory to capture long dependencies
Allow error messages to flow at different strengths depending on the inputs
Standard RNN computes the hidden layer at the next
time step directly: h_t = σ(W^(hh) h_{t−1} + W^(hx) x_t)
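A minimal NumPy sketch of one recurrent step: the standard RNN update above, and a GRU step with reset gate r and update gate z (weight shapes are made up; the gate convention shown is one common variant):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_h, d_x = 4, 3                          # hidden and input sizes (assumed)
rng = np.random.default_rng(0)
W_hh, W_hx = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x))

def rnn_step(h_prev, x_t):
    # standard RNN: h_t = sigma(W^(hh) h_{t-1} + W^(hx) x_t)
    return sigmoid(W_hh @ h_prev + W_hx @ x_t)

W_z, U_z = rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h))
W_r, U_r = rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h))
W_c, U_c = rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h))

def gru_step(h_prev, x_t):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)             # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)             # reset gate
    h_cand = np.tanh(W_c @ x_t + U_c @ (r * h_prev))  # candidate state (reset gate scales the memory)
    return z * h_prev + (1.0 - z) * h_cand            # interpolate old state and candidate

h, x = np.zeros(d_h), rng.normal(size=d_x)
print(rnn_step(h, x))
print(gru_step(h, x))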
Bahdanau D. et al. "Neural machine translation by jointly learning to align and translate." ICLR (2015)
score(h_t, h̄_s) = h_t^T h̄_s
Compare target and source hidden states
Attention - Normalization
a_t(s) = exp(score(h_t, h̄_s)) / Σ_{s'} exp(score(h_t, h̄_{s'}))
Convert into alignment weights
Attention - Context
c_t = Σ_s a_t(s) h̄_s
Build context vector: weighted average
Attention - Context
h_t = f(h_{t−1}, c_t, e_t)
Compute the next hidden state from the previous hidden state h_{t−1}, the context vector c_t and the current input embedding e_t
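A minimal NumPy sketch of the three attention steps above: dot-product scores, softmax normalization into alignment weights, and a context vector as the weighted average of the source states (all dimensions are made up):

import numpy as np

rng = np.random.default_rng(0)
d = 4
h_t = rng.normal(size=d)                 # current target hidden state
h_src = rng.normal(size=(6, d))          # source hidden states h_bar_s, one per source position

scores = h_src @ h_t                             # score(h_t, h_bar_s) = h_t^T h_bar_s, for every s
a_t = np.exp(scores) / np.sum(np.exp(scores))    # softmax -> alignment weights a_t(s)
c_t = a_t @ h_src                                # context vector: sum_s a_t(s) h_bar_s

print(a_t, a_t.sum())                    # weights sum to 1
print(c_t)                               # weighted average of the source states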
Application Example:
IMDB Movie reviews
sentiment classification
https://uofi.box.com/v/cs510DL
Binary Classification
Dataset of 25,000 movies reviews from IMDB,
labeled by sentiment (positive/negative)
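A minimal sketch of the binary classification task, assuming TensorFlow/Keras (which ships the IMDB reviews dataset); this simple embedding-plus-pooling baseline is an illustration, not the model in the linked demo:

import tensorflow as tf
from tensorflow.keras import layers

max_words, seq_len = 10000, 200
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=max_words)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=seq_len)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=seq_len)

model = tf.keras.Sequential([
    layers.Embedding(max_words, 64),
    layers.GlobalAveragePooling1D(),           # average word vectors over the review
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.2)
print(model.evaluate(x_test, y_test))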
Application Example:
Relation Extraction from text
http://www.mathcs.emory.edu/~dsavenk/slides/relation_extraction/img/distant.png
Useful for:
• knowledge base
completion
• social media analysis
• question answering
• …
Task: binary (or multi-class)
classification
sentence S = w1 w2 .. e1 .. wj .. e2 .. wn, where e1 and e2 are entities
“The new iPhone 7 Plus includes an improved camera to take amazing pictures”
Component-Whole(e1 , e2 ) ?
YES / NO
The new iPhone 7 Plus includes an improved camera that takes amazing pictures
Models: MLP
Component-Whole(e1 , e2 )
Sigmoid ?
YES / NO
Dense Layer n
…
Dense Layer 1
Zeng, D. et al. "Relation classification via convolutional deep neural network". COLING (2014)
Models: CNN (2)
Component-Whole(e1 , e2 )
Sigmoid ?
YES / NO
Nguyen, T.H., Grishman, R. “Relation extraction: Perspective from convolutional neural networks.” VS@ HLT-NAACL. (2015)
Models: Bi-GRU
Component-Whole(e1 , e2 )
Sigmoid ?
YES / NO
Attention or
Max Pooling
Bi-GRU
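A minimal sketch of the Bi-GRU relation classifier above, using the max-pooling variant (TensorFlow/Keras, vocabulary size and dimensions are assumptions): a bidirectional GRU over the sentence, max pooling over time, and a sigmoid output for the binary Component-Whole decision:

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, seq_len = 20000, 100, 60    # assumed values

inputs = layers.Input(shape=(seq_len,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(inputs)                        # word embeddings
states = layers.Bidirectional(layers.GRU(64, return_sequences=True))(emb)   # Bi-GRU states per token
pooled = layers.GlobalMaxPooling1D()(states)                                 # max pooling over time steps
outputs = layers.Dense(1, activation="sigmoid")(pooled)                      # YES / NO for the relation

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()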
Assumption:
when two entities co-occur in a sentence, a certain relation is expressed
knowledge base:
Relation          Entity 1           Entity 2
place of birth    Michael Jackson    Gary
place of birth    Barack Obama       Hawaii
…                 …                  …
text:
Barack Obama moved from Gary ….
Michael Jackson met … in Hawaii
For many ambiguous relations, mere co-occurrence does not guarantee the
existence of the relation → distant supervision produces false positives
Attention over Instances
s: representation of the sentence set
Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016) [code]
Sentence-level ATT results
NYT10 Dataset
Align Freebase relations with
New York Times corpus (NYT)
53 possible relationships
+NA (no relation between entities)
Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016) [code]
References
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from
overfitting." Journal of machine learning research (2014)
Bergstra, James, and Yoshua Bengio. "Random search for hyper-parameter optimization." Journal
of Machine Learning Research, Feb (2012)
Kim, Y. “Convolutional Neural Networks for Sentence Classification”, EMNLP (2014)
Severyn, Aliaksei, and Alessandro Moschitti. "UNITN: Training Deep Convolutional Neural Network
for Twitter Sentiment Classification." SemEval@ NAACL-HLT (2015)
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for
statistical machine translation." EMNLP (2014)
Ilya Sutskever et al. “Sequence to sequence learning with neural networks.” NIPS (2014)
Bahdanau et al. "Neural machine translation by jointly learning to align and translate." ICLR (2015)
Gal, Y., Islam, R., Ghahramani, Z. “Deep Bayesian Active Learning with Image Data.” ICML (2017)
Nair, V., Hinton, G.E. “Rectified linear units improve restricted boltzmann machines.” ICML (2010)
Ronan Collobert, et al. “Natural language processing (almost) from scratch.” JMLR (2011)
Kumar, Shantanu. "A Survey of Deep Learning Methods for Relation Extraction." arXiv preprint
arXiv:1705.03645 (2017)
Lin et al. “Neural Relation Extraction with Selective Attention over Instances” ACL (2016) [code]
Zeng, D. et al. "Relation classification via convolutional deep neural network". COLING (2014)
Nguyen, T.H., Grishman, R. "Relation extraction: Perspective from convolutional neural networks." VS@HLT-NAACL (2015)
Zhang, D., Wang, D. "Relation classification via recurrent neural networks." arXiv preprint arXiv:1508.01006 (2015)
Zhou, P. et al. "Attention-based bidirectional LSTM networks for relation classification." ACL (2016)
Mike Mintz et al. "Distant supervision for relation extraction without labeled data." ACL-IJCNLP (2009)
References & Resources
http://web.stanford.edu/class/cs224n
https://www.coursera.org/specializations/deep-learning
https://chrisalbon.com/#Deep-Learning
http://www.asimovinstitute.org/neural-network-zoo
http://cs231n.github.io/optimization-2
https://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-
algorithm-4106a6702d39
https://arimo.com/data-science/2016/bayesian-optimization-hyperparameter-tuning
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp
https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neural-
networks-on-the-internet-fbb8b1ad5df8
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-
grulstm-rnn-with-python-and-theano/
http://colah.github.io/posts/2015-08-Understanding-LSTMs
https://github.com/hyperopt/hyperopt
https://github.com/tensorflow/nmt