ML Group 4 Assignment
1. Abel Fentaye - 1404708
2. Habitamu Getaneh - 1406170
3. Lingere Yeshambel - 1403863
4. Tsehay Shahile - 1404382
5. Daregot Getachew - 1405874
Netflix's recommendation system uses a variety of factors, including user viewing history, ratings, and the behavior of similar users, to predict what content a user might enjoy. It leverages machine learning algorithms to analyze this data and personalize recommendations.
Here's a more detailed breakdown:
Data Sources:
User Interactions:
Netflix collects data on what users watch (including duration and frequency), how they rate
content, and their search queries.
Content Metadata:
The system uses information about the content itself, such as genre, actors, release year, and
popularity.
User Preferences:
Recommendations are tailored to individual preferences, including viewing history and ratings.
Netflix also analyzes the behavior of users with similar tastes and preferences.
Initial Recommendations:
New subscribers are asked to choose some initial titles they'd like to watch, which helps the
system understand their basic preferences.
Personalized Recommendations:
As users continue to watch and interact with Netflix, the system uses their viewing history,
ratings, and other data to personalize recommendations.
Machine Learning:
The system uses machine learning algorithms to analyze the data and predict what content a user
is likely to enjoy.
Content Grouping:
Recommendations are organized into rows on the homepage, such as "Because You Watched" or
"Trending Now," to make it easier for users to browse.
Real-Time Updates:
The system constantly learns from user interactions and updates its recommendations in real time.
Key Technologies:
Machine Learning:
Collaborative Filtering:
This algorithm analyzes user preferences and suggests content based on similar users' behavior (a small code sketch follows at the end of this section).
Content-Based Filtering:
This approach recommends content based on the content's metadata, such as genre and actors.
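As a concrete illustration, here is a minimal user-based collaborative filtering sketch in Python (the toy ratings matrix and cosine-similarity weighting are assumptions for the example, not Netflix's actual data or algorithm):

import numpy as np

# Toy user-item rating matrix (rows: users, columns: titles); 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    # cosine similarity between two users' rating vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0  # recommend titles for user 0
sims = np.array([cosine_sim(ratings[target], ratings[u]) for u in range(len(ratings))])
sims[target] = 0.0  # ignore self-similarity

# predict scores as a similarity-weighted average of other users' ratings
pred = sims @ ratings / sims.sum()
unrated = ratings[target] == 0
print("Predicted scores for user 0's unrated titles:", pred[unrated])

Users with similar rating vectors get more weight, so titles liked by "taste neighbors" score higher for the target user.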
Amazon utilizes both product recommendations and dynamic pricing to optimize customer
experience and profitability. Product recommendations suggest items based on a customer's
browsing and purchase history, while dynamic pricing adjusts product prices in real-time based
on factors like demand, competition, and inventory levels.
Product Recommendations:
How it works:
Amazon's recommendation engine analyzes vast amounts of data to identify patterns and
preferences, then suggests products that customers might be interested in.
Purpose:
To increase sales by showcasing relevant products and driving traffic to those listings,
ultimately boosting customer engagement and satisfaction.
Examples:
"Customers who bought this also bought...", "You may also like...", and personalized
product displays on the homepage.
Dynamic Pricing:
How it works:
Amazon uses algorithms and automation to adjust prices based on real-time data, including
competitor pricing, demand fluctuations, and inventory levels.
Purpose:
To maximize revenue and remain competitive by responding to market conditions in real time, balancing profitability with customer experience.
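For illustration, here is a highly simplified pricing-rule sketch in Python (the signals, weights, and price cap are invented for the example and are not Amazon's actual algorithm):

def dynamic_price(base_price, competitor_price, demand_ratio, stock_ratio):
    # demand_ratio: current demand / typical demand (>1 means high demand)
    # stock_ratio:  current stock / typical stock  (<1 means low inventory)
    price = base_price
    price *= 1 + 0.10 * (demand_ratio - 1)   # raise price when demand is high
    price *= 1 + 0.05 * (1 - stock_ratio)    # raise price when inventory is low
    # stay close to the competition: cap at 5% above the competitor's price
    return round(min(price, competitor_price * 1.05), 2)

print(dynamic_price(base_price=20.0, competitor_price=21.0,
                    demand_ratio=1.4, stock_ratio=0.5))  # -> 21.32

A real system would learn these weights from data rather than hard-coding them, but the structure, price as a function of real-time signals, is the same.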
Google Search:
How it Uses ML: Google Search uses numerous ML models to understand query intent and rank billions of web pages to provide the most relevant and useful results. Early ranking was more rule-based (like PageRank), but ML is now central.
How the Model(s) Work: Google employs complex deep learning models, particularly in
Natural Language Processing (NLP).
Assessing Page Quality & Relevance: ML models analyze hundreds of signals for each
webpage, including content quality (originality, depth), keyword relevance (not just
density but semantic relevance), user engagement signals (how users interact with search
results), page loading speed, mobile-friendliness, backlink quality and context (informed
by ML analysis, not just raw counts), and more.
Learning from User Interaction: Models learn implicitly from
aggregated and anonymized click data – if many users click on the 3rd
result for a specific query and seem satisfied (don't immediately return to
search), the models might learn that this result is highly relevant for that
query intent.
Output: The ranked list of search results (SERP - Search Engine Results Page) tailored
to the user's query, location, language, and inferred intent.
The core difference lies in how the system arrives at the solution or performs a task:
Traditional Programming:
Analogy: You give the computer a detailed recipe (the program) explaining
exactly how to bake a cake (process data) using specific ingredients (input). The
computer follows the recipe precisely.
Logic Creation: The logic, rules, and decision-making processes are entirely
defined by the programmer beforehand.
Machine Learning:
How it Works: Instead of writing explicit rules, the programmer provides the
computer with a large amount of data (examples) and an ML algorithm (a general
framework for learning). The algorithm analyzes the data to discover patterns,
correlations, and underlying structures on its own. It uses these learned patterns to
build a "model" which can then make predictions or decisions on new, unseen
data.
Analogy: You show the computer thousands of pictures labeled "cat" and
thousands labeled "not cat" (the data). You provide a learning algorithm (e.g., a
neural network structure). The computer figures out for itself what visual features
(patterns) distinguish a cat. It builds its own internal "rules" for cat identification
(the model).
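This difference can be made concrete in a few lines of Python. Below is a minimal sketch (the spam-filter task, toy data, and scikit-learn classifier are illustrative assumptions): the rule-based version encodes the programmer's logic directly, while the learned version infers its own decision rule from labeled examples.

from sklearn.tree import DecisionTreeClassifier

# Traditional programming: the rule is written by hand.
def is_spam_rule_based(num_links, num_exclamations):
    return num_links > 3 and num_exclamations > 5

# Machine learning: the rule is learned from labeled examples.
X = [[0, 1], [1, 0], [5, 8], [6, 9], [4, 7], [0, 2]]  # (links, exclamations)
y = [0, 0, 1, 1, 1, 0]                                # 1 = spam, 0 = not spam
model = DecisionTreeClassifier().fit(X, y)

print(is_spam_rule_based(5, 8))    # True: matches the hand-written rule
print(model.predict([[5, 8]])[0])  # 1: the model learned a similar rule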
In essence:
Traditional Programming: The programmer tells the computer exactly how to solve the problem.
Machine Learning: The programmer tells the computer how to learn from data to solve a problem.
Ensemble learning is a machine learning technique where multiple individual models (often
called "base learners" or "weak learners") are trained to solve the same problem, and their
predictions are combined to produce a final, overall prediction.
The core idea is based on the principle of "wisdom of the crowd": by combining the outputs of
several diverse models, the final prediction is often more accurate, robust, and stable than any
single model could achieve on its own.
Think of it like this: Instead of asking one expert for their opinion on a complex topic, you ask a
diverse group of experts. You then combine their opinions (e.g., by taking the majority vote or
averaging their scores) to arrive at a more reliable and well-rounded conclusion.
The way predictions are combined depends on the task (classification or regression) and the specific ensemble method; a short code sketch of these combination rules follows the list below:
For Classification:
Majority Voting (Hard Voting): The final prediction is the class predicted by
the majority of the base models.
Weighted/Soft Voting: Predictions from each model are weighted (often based
on their individual performance or confidence), and the class with the highest
total weighted vote is chosen. This often uses predicted probabilities.
For Regression:
Averaging: The final prediction is the average of the predictions from all base
models.
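A minimal NumPy sketch of these combination rules (the base-model predictions below are made-up values for illustration):

import numpy as np
from collections import Counter

# Hard (majority) voting: three base classifiers predict for one sample.
class_preds = ["cat", "dog", "cat"]
print("Majority vote:", Counter(class_preds).most_common(1)[0][0])  # -> cat

# Soft voting: average predicted probabilities, pick the highest.
probs = np.array([[0.7, 0.3],   # model 1: P(cat), P(dog)
                  [0.4, 0.6],   # model 2
                  [0.8, 0.2]])  # model 3
print("Soft vote class index:", np.argmax(probs.mean(axis=0)))  # -> 0 (cat)

# Regression: average the base models' numeric predictions.
reg_preds = np.array([3.1, 2.9, 3.4])
print("Averaged prediction:", reg_preds.mean())  # -> ~3.13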
While there are many variations, the most common categories include:
1. Bagging (Bootstrap Aggregating):
Trains multiple instances of the same base algorithm (e.g., Decision Trees)
independently on different random subsets of the training data (sampled with
replacement).
Combines predictions using voting or averaging.
Goal: Reduce variance and overfitting, improve stability.
Famous Example: Random Forests (an ensemble of Decision Trees; see the code sketch at the end of this section).
2. Boosting:
Trains base models sequentially, where each new model focuses on correcting the mistakes of the previous ones (for example, by giving more weight to misclassified examples).
Combines predictions using a weighted sum or weighted vote.
Goal: Reduce bias and improve accuracy.
Famous Examples: AdaBoost, Gradient Boosting (e.g., XGBoost).
3. Stacking:
Trains diverse base models and then trains a meta-model to combine their predictions.
Goal: Leverage the complementary strengths of different algorithms.
When to Use Ensemble Learning:
High Accuracy is Critical: When maximizing predictive performance is the top priority
(e.g., in machine learning competitions like Kaggle, financial modeling, medical
diagnosis, critical fraud detection). Ensembles often outperform single, highly tuned
models.
Improving Model Robustness: Combining models makes the overall system less
sensitive to noise or outliers in the data, or the specifics of the training data split. The
aggregated prediction is generally more stable.
Reducing Variance (Overfitting): Techniques like Bagging (especially Random Forests)
are very effective at reducing the risk of overfitting complex models (like deep decision
trees) to the training data.
Reducing Bias: Boosting techniques are specifically designed to iteratively reduce the
bias of the combined model by focusing on hard-to-classify examples.
Combining Different Strengths: Stacking allows you to leverage the unique ways different algorithms model the data. One model might be good at capturing linear relationships, while another excels at non-linear ones (see the code sketch after this list).
Single Models Reach Performance Limits: When optimizing hyperparameters and
feature engineering for a single model type yields diminishing returns, ensembles provide
a powerful way to push performance further.
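To ground these ideas, here is a minimal scikit-learn sketch showing both bagging (a Random Forest) and stacking (a meta-model over two different base learners); the dataset and hyperparameters are illustrative choices, not a recommendation:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: 100 decision trees, each trained on a bootstrap sample of the
# training data; predictions are combined by voting.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))

# Stacking: diverse base learners, with a logistic-regression meta-model
# that learns how to combine their predictions.
stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))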
The relationship between these two terms can be a bit confusing because a Multilayer
Perceptron (MLP) is a specific type of Feedforward Neural Network (FFNN). Often, in
practice, the terms are used interchangeably, especially when discussing standard, fully
connected networks. However, there's a technical distinction based on scope and specific
characteristics.
Here's a breakdown:
Definition: This is a broad category of Artificial Neural Networks (ANNs) where connections
between nodes do not form a cycle. Information moves in only one direction – forward – from
the input nodes, through any hidden layers, to the output nodes.
Key Characteristic: The defining feature is the absence of feedback loops or cycles. The
output of any layer does not affect that same layer or preceding layers within the current pass of
information.
Scope: It encompasses any neural network structure adhering to this unidirectional flow. This could include single-layer perceptrons, multilayer perceptrons (MLPs), and convolutional neural networks (CNNs).
Contrast: The opposite would be Recurrent Neural Networks (RNNs), where connections do
form cycles, allowing information to persist and influence future inputs (giving them a form of
memory).
Multilayer Perceptron (MLP):
Structure: An MLP consists of:
An Input Layer: Receives the initial data.
One or more Hidden Layers: Layers of nodes (neurons) between the input and
output layers. These are crucial for learning complex, non-linear patterns. The
"multilayer" aspect requires at least one hidden layer.
An Output Layer: Produces the final prediction or classification.
Neurons (Perceptrons): Each node in the hidden and output layers typically
performs a weighted sum of its inputs and then applies a non-linear activation
function (like Sigmoid, Tanh, ReLU). This non-linearity is essential for MLPs to
model complex data.
Full Connectivity (Typically): Usually, each node in one layer is connected to
every node in the subsequent layer. This is often implied when referring to a
standard MLP, making them also known as "fully connected feedforward
networks."
Key Characteristics: The presence of at least one hidden layer and the use of non-linear
activation functions within those layers are defining features.
FFNN is the general category: Defined by the direction of information flow (forward
only, no cycles).
MLP is a specific type of FFNN: Defined by its structure (input layer, >=1 hidden
layer(s), output layer, typically fully connected with non-linear activation functions).
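As a concrete sketch, here is a minimal MLP in Keras (the layer sizes are arbitrary example values): information flows strictly forward through fully connected layers with non-linear activations, which is exactly what makes it both an MLP and an FFNN.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

# An MLP: input -> hidden layer(s) with non-linear activations -> output.
# No connection feeds backward, so it is also a feedforward network.
mlp = Sequential([
    Input(shape=(4,)),              # input layer: 4 features
    Dense(16, activation="relu"),   # hidden layer 1 (non-linear)
    Dense(8, activation="relu"),    # hidden layer 2 (non-linear)
    Dense(3, activation="softmax")  # output layer: 3 classes
])
mlp.summary()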
Analogy:
Feedforward Neural Network (FFNN) is like the category "Polygon" (a closed shape made of
straight lines).
Multilayer Perceptron (MLP) is like the specific type "Rectangle" (a polygon with four sides
and four right angles).
All Rectangles are Polygons, but not all Polygons are Rectangles (e.g., triangles, pentagons).
Similarly, all MLPs are FFNNs, but not all FFNNs are MLPs (e.g., a single-layer perceptron is
an FFNN but not an MLP; a CNN is generally considered an FFNN but distinct from a classic
MLP due to its specialized layers).
Convolutional Neural Network (CNN) is an advanced version of artificial neural networks
(ANNs), primarily designed to extract features from grid-like matrix datasets. This is
particularly useful for visual datasets such as images or videos, where data patterns play a
crucial role. CNNs are widely used in computer vision applications due to their effectiveness
in processing visual data.
CNNs consist of multiple layers, such as the input layer, convolutional layers, pooling layers, and fully connected layers. Let's learn more about CNNs in detail.
Now imagine taking a small patch of an image and running a small neural network, called a filter or kernel, on it, with, say, K outputs represented vertically.
Now slide that filter across the whole image; the result is another image with a different width, height, and depth. Instead of just the R, G, and B channels, we now have more channels but a smaller width and height. This operation is called convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.
Mathematical Overview of Convolution
Now let's talk about a bit of the mathematics involved in the whole convolution process.
Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights and the same depth as the input volume (3 if the input is an RGB image).
For example, suppose we run a convolution on an image with dimensions 34×34×3. The possible filter sizes are a×a×3, where 'a' can be 3, 5, or 7, but small compared to the image dimensions.
During the forward pass, we slide each filter across the whole input volume step by step, where each step is called a stride (which can have a value of 2, 3, or even 4 for high-dimensional images), and compute the dot product between the kernel weights and the patch from the input volume.
As we slide our filters, we get a 2-D output for each filter; stacking these together gives an output volume with a depth equal to the number of filters. The network learns all the filters.
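A small NumPy sketch of this sliding dot product for a single filter (a 5×5 single-channel input, a 3×3 kernel, and stride 1 are assumed to keep the example readable):

import numpy as np

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 single-channel input
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                 # 3x3 vertical-edge kernel
stride = 1

# Output size with no padding: (W - F) / S + 1  ->  (5 - 3) / 1 + 1 = 3
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = x[i * stride:i * stride + 3, j * stride:j * stride + 3]
        out[i, j] = np.sum(patch * k)  # dot product of kernel and patch

print(out)

With multiple filters, each filter produces one such 2-D map, and the maps are stacked to form the output volume.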
Layers Used to Build ConvNets
A complete Convolutional Neural Network architecture is often called a ConvNet. A ConvNet is a sequence of layers, and every layer transforms one volume into another through a differentiable function.
Let's take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the
input will be an image or a sequence of images. This layer holds the raw input of the image
with width 32, height 32, and depth 3.
Convolutional Layers: This is the layer, which is used to extract the feature from the input
dataset. It applies a set of learnable filters known as the kernels to the input images. The
filters/kernels are smaller matrices usually 2×2, 3×3, or 5×5 shape. it slides over the input
image data and computes the dot product between kernel weight and the corresponding
input image patch. The output of this layer is referred as feature maps. Suppose we use a
total of 12 filters for this layer we’ll get an output volume of dimension 32 x 32 x 12.
Activation Layer: By applying an activation function to the output of the preceding layer, activation layers add non-linearity to the network. They apply an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU (max(0, x)), Tanh, and Leaky ReLU. The volume dimensions remain unchanged, so the output volume is 32 x 32 x 12.
Pooling Layer: This layer is periodically inserted in the ConvNet; its main function is to reduce the size of the volume, which makes computation faster, reduces memory usage, and helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.
Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.
Output Layer: For classification tasks, the output from the fully connected layers is fed into a logistic function such as sigmoid or softmax, which converts the raw output for each class into a probability score.
Example: Applying CNN to an Image
Let's consider an image and apply the convolution layer, activation layer, and pooling layer operations to extract its features. (The input image itself is omitted here.)
Steps:
1. Import the necessary libraries.
2. Set the parameters.
3. Define the kernel.
4. Load the image and plot it.
5. Reformat the image.
6. Apply the convolution layer operation and plot the output image.
7. Apply the activation layer operation and plot the output image.
8. Apply the pooling layer operation and plot the output image.
# import the necessary libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# set the parameters and define the kernel (an edge-detection kernel is
# assumed here for illustration)
kernel = tf.constant([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=tf.float32)

# load the image (the file name is a placeholder) and plot it
image = tf.io.read_file('input_image.jpg')
image = tf.io.decode_jpeg(image, channels=1)

# Reformat
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)
# convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.title('Convolution')
# activation layer
relu_fn = tf.nn.relu
# image detection
image_detect = relu_fn(image_filter)

plt.subplot(1, 3, 2)
# reformat for plotting
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.title('Activation')
# pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
                      window_shape=(2, 2),
                      pooling_type='MAX',
                      strides=(2, 2),
                      padding='SAME')

plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
Output: (figure showing the convolution, activation, and pooling results omitted)
Advantages of CNNs
1. Good at detecting patterns and features in images, videos, and audio signals.
2. Robust to translation, rotation, and scaling of the input.
3. Trained end-to-end, with no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs
1. Computationally expensive to train and require a lot of memory.
2. Prone to overfitting if there is not enough data or proper regularization.
3. Require large amounts of labeled data.
4. Limited interpretability: it's hard to understand what the network has learned.
5. Discuss the varieties of RNN deep learning algorithms and explain how each algorithm works, with examples.
Recurrent Neural Networks (RNNs) come in several varieties, each with unique strengths for
processing sequential data.
RNNs allow the network to "remember" past information by feeding the output from one step into the next step. This helps the network understand the context of what has already happened and make better predictions based on that. For example, when predicting the next word in a sentence, the RNN uses the previous words to help decide which word is most likely to come next.
1. Recurrent Neuron
(Figure omitted: the basic architecture of an RNN, showing the feedback loop mechanism where the output is passed back as input for the next time step.)
2. RNN Unfolding
RNN unfolding or unrolling is the process of expanding the recurrent structure over time steps. During unfolding, each step of the sequence is represented as a separate layer in a series, illustrating how information flows across each time step.
This unrolling enables backpropagation through time (BPTT), a learning process where errors are propagated across time steps to adjust the network's weights, enhancing the RNN's ability to learn dependencies within sequential data.
How does RNN work?
At each time step, RNNs process units with a fixed activation function. These units have an internal hidden state that acts as memory, retaining information from previous time steps. This memory allows the network to store past knowledge and adapt based on new inputs.
Updating the Hidden State in RNNs
The current hidden state h_t depends on the previous state h_{t−1} and the current input x_t, and is calculated using the following relations:
1. State Update:
h_t = f(h_{t−1}, x_t)
where:
h_t is the current state
h_{t−1} is the previous state
x_t is the input at the current time step
2. Activation Function Application:
h_t = tanh(W_hh · h_{t−1} + W_xh · x_t)
Here, W_hh is the weight matrix for the recurrent neuron, and W_xh is the weight matrix for the input neuron.
3. Output Calculation:
y_t = W_hy · h_t
where y_t is the output and W_hy is the weight matrix at the output layer.
These parameters are updated using backpropagation. However, since RNNs work on sequential data, we use an adapted version of backpropagation known as backpropagation through time.
Backpropagation Through Time (BPTT) in RNNs
Since RNNs process sequential data, Backpropagation Through Time (BPTT) is used to update the network's parameters.
1. Sequential Dependency Chain:
The loss function L(θ) depends on the final hidden state h_3, and each hidden state relies on the preceding ones, forming a sequential dependency chain:
h_3 depends on h_2, h_2 depends on h_1, …, h_1 depends on h_0.
2. Handling Dependencies in Layers:
Each hidden state is updated based on its dependencies:
h_3 = σ(W · h_2 + b)
The gradient is then calculated for each state, considering dependencies from previous hidden states.
3. Gradient Calculation with Explicit and Implicit Parts:
The gradient is broken down into explicit and implicit parts, summing up the indirect paths from each hidden state to the weights:
∂h_3/∂W = ∂⁺h_3/∂W + (∂h_3/∂h_2) · (∂h_2/∂W)
where ∂⁺h_3/∂W is the explicit (direct) contribution and the second term captures the implicit path through h_2.
4. Final Gradient Expression:
The final derivative of the loss function with respect to the weight matrix W is computed as:
∂L(θ)/∂W = (∂L(θ)/∂h_3) · Σ_{k=1}^{3} (∂h_3/∂h_k) · (∂h_k/∂W)
This iterative process is the essence of backpropagation through time.
Types Of Recurrent Neural Networks
There are four types of RNNs based on the number of inputs and outputs in the network:
1. One-to-One RNN
This is the simplest type of neural network architecture where there is a single input and a
single output. It is used for straightforward classification tasks such as binary classification
where no sequential data is involved.
2. One-to-Many RNN
The One-to-Many RNN takes a single input and generates a sequence of outputs. A common example is image captioning, where a single image is given as input and a descriptive sequence of words is produced as output.
3. Many-to-One RNN
The Many-to-One RNN receives a sequence of inputs and generates a single output. This type
is useful when the overall context of the input sequence is needed to make one prediction. In sentiment analysis, the model receives a sequence of words (like a sentence) and produces a single output such as positive, negative, or neutral.
4. Many-to-Many RNN
The Many-to-Many RNN type processes a sequence of inputs and generates a sequence of outputs. In a language translation task, a sequence of words in one language is given as input, and a corresponding sequence in another language is generated as output.
1. Vanilla RNN
This simplest form of RNN consists of a single hidden layer where weights are shared across
time steps. Vanilla RNNs are suitable for learning short-term dependencies but are limited by
the vanishing gradient problem, which hampers long-sequence learning.
2. Bidirectional RNNs
Bidirectional RNNs process inputs in both forward and backward directions, capturing both
past and future context for each time step. This architecture is ideal for tasks where the entire
sequence is available, such as named entity recognition and question answering.
3. Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks (LSTMs) introduce a memory mechanism to overcome
the vanishing gradient problem. Each LSTM cell has three gates:
Input Gate: Controls how much new information should be added to the cell state.
Forget Gate: Decides what past information should be discarded.
Output Gate: Regulates what information should be output at the current step.
This selective memory enables LSTMs to handle long-term dependencies, making them ideal for tasks where earlier context is critical.
4. Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates into a
single update gate and streamlining the output mechanism. This design is computationally
efficient, often performing similarly to LSTMs, and is useful in tasks where simplicity and
faster training are beneficial.
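In Keras these variants are drop-in replacements for one another, which makes their trade-offs easy to compare (the input shape and layer sizes below are arbitrary example values):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, LSTM, GRU, Bidirectional, Dense

def make_model(recurrent_layer):
    # same surrounding architecture; only the recurrent cell changes
    return Sequential([
        Input(shape=(10, 8)),  # 10 time steps, 8 features per step
        recurrent_layer,
        Dense(1, activation='sigmoid'),
    ])

vanilla = make_model(SimpleRNN(32))          # short-term dependencies only
lstm = make_model(LSTM(32))                  # gated long-term memory
gru = make_model(GRU(32))                    # lighter-weight gating
bidir = make_model(Bidirectional(LSTM(32)))  # past and future context

for name, m in [('SimpleRNN', vanilla), ('LSTM', lstm),
                ('GRU', gru), ('Bidirectional LSTM', bidir)]:
    print(name, 'parameters:', m.count_params())

The parameter counts make the cost visible: gated cells (LSTM, GRU) and bidirectional processing buy longer-range context at the price of more weights.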
Implementing a Text Generator Using Recurrent Neural
Networks (RNNs)
In this section, we create a character-based text generator using a Recurrent Neural Network (RNN) in TensorFlow and Keras. We'll implement an RNN that learns patterns from a text sequence to generate new text character-by-character.
Step 1: Import Necessary Libraries
We start by importing essential libraries for data handling and building the neural network.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
Step 2: Define the Input Text and Prepare Character Set
We define the input text and identify unique characters in the text which we’ll encode for our
model.
text = "This is GeeksforGeeks a software training institute"
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}
Step 3: Create Sequences and Labels
To train the RNN, we need sequences of fixed length (seq_length) and the character following
each sequence as the label.
seq_length = 3
sequences = []
labels = []
# build (sequence, next-character) training pairs
for i in range(len(text) - seq_length):
    sequences.append([char_to_index[c] for c in text[i:i + seq_length]])
    labels.append(char_to_index[text[i + seq_length]])
X = np.array(sequences)
y = np.array(labels)
Step 4: Convert Sequences and Labels to One-Hot Encoding
For training, we convert X and y into one-hot encoded tensors.
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))
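Step 5: Build the RNN Model
We build the model with a single SimpleRNN layer followed by a softmax output (this is the same code that appears in the Complete Code section below).
model = Sequential()
model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)), activation='relu'))
model.add(Dense(len(chars), activation='softmax'))
Step 6: Compile and Train the Model
We compile with categorical cross-entropy loss and train on the one-hot encoded data.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)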
Step 7: Generate New Text Using the Trained Model
After training, we use a starting sequence to generate new text character-by-character.
start_seq = "This is G"
generated_text = start_seq
for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)
Complete Code
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# define the input text and prepare the character set
text = "This is GeeksforGeeks a software training institute"
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

# create sequences and labels
seq_length = 3
sequences = []
labels = []
for i in range(len(text) - seq_length):
    sequences.append([char_to_index[c] for c in text[i:i + seq_length]])
    labels.append(char_to_index[text[i + seq_length]])
X = np.array(sequences)
y = np.array(labels)

# one-hot encode
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

# build and train the model
model = Sequential()
model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)), activation='relu'))
model.add(Dense(len(chars), activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

# generate new text character-by-character
start_seq = "This is G"
generated_text = start_seq
for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)
Advantages of Recurrent Neural Networks
Sequential Memory: RNNs retain information from previous inputs, making them ideal for time-series predictions where past data is crucial. Long Short-Term Memory (LSTM) networks extend this capability to longer sequences.
Enhanced Pixel Neighborhoods: RNNs can be combined with convolutional layers to capture extended pixel neighborhoods, improving performance in image and video data processing.
Limitations of Recurrent Neural Networks (RNNs)
While RNNs excel at handling sequential data, they face two main training challenges: the vanishing gradient and the exploding gradient problems:
Vanishing Gradient: During backpropagation, gradients diminish as they pass through
each time step, leading to minimal weight updates. This limits the RNN’s ability to learn
long-term dependencies, which is crucial for tasks like language translation.
Exploding Gradient: Sometimes, gradients grow uncontrollably, causing excessively large weight updates that destabilize training. Gradient clipping is a common technique to manage this issue (a minimal Keras example follows below).
These challenges can hinder the performance of standard RNNs on complex, long-sequence
tasks.
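As a sketch of that mitigation, Keras optimizers accept a clipnorm argument that bounds the global gradient norm (the threshold and toy model below are illustrative choices):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# toy RNN; clipnorm=1.0 caps the gradient norm during training,
# preventing exploding updates (threshold chosen for illustration)
model = Sequential([SimpleRNN(32, input_shape=(10, 4)), Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(clipnorm=1.0), loss='mse')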
Applications of Recurrent Neural Networks
RNNs are used in various applications where data is sequential or time-based:
Time-Series Prediction: RNNs excel in forecasting tasks, such as stock market predictions
and weather forecasting.
Natural Language Processing (NLP): RNNs are fundamental in NLP tasks like language
modeling, sentiment analysis, and machine translation.
Speech Recognition: RNNs capture temporal patterns in speech data, aiding in speech-to-text and other audio-related applications.
Image and Video Processing: When combined with convolutional layers, RNNs help
analyze video sequences, facial expressions, and gesture recognition.