ML Group 4 Assignment


Name ID

1. Abel Fentaye-------------------------1404708

2. Habitamu Getaneh------------------1406170

3. Lingere Yeshambel------------------1403863

4. Tsehay Shahile------------------------1404382

5. Daregot Getachew--------------------1405874

Submitted to: Mr. Ewunetu


1. List at least three companies that use machine learning models, and
discuss how their models work.
 Netflix’s Recommendation Systems

Netflix's recommendation system uses a variety of factors, including a user's viewing history,
ratings, and the behavior of similar users, to predict what content that user might enjoy. It leverages
machine learning algorithms to analyze this data and personalize recommendations.
Here's a more detailed breakdown:

Data Sources:

 User Interactions:

Netflix collects data on what users watch (including duration and frequency), how they rate
content, and their search queries.

 Content Metadata:

The system uses information about the content itself, such as genre, actors, release year, and
popularity.

 User Preferences:

Recommendations are tailored to individual preferences, including viewing history and ratings.

 Similar User Behavior:

Netflix also analyzes the behavior of users with similar tastes and preferences.

How the System Works:

 Initial Recommendations:

New subscribers are asked to choose some initial titles they'd like to watch, which helps the
system understand their basic preferences.

 Personalized Recommendations:

As users continue to watch and interact with Netflix, the system uses their viewing history,
ratings, and other data to personalize recommendations.

 Machine Learning:
The system uses machine learning algorithms to analyze the data and predict what content a user
is likely to enjoy.

 Content Grouping:
Recommendations are organized into rows on the homepage, such as "Because You Watched" or
"Trending Now," to make it easier for users to browse.

 Real-Time Updates:
The system constantly learns from user interactions and updates its recommendations in real-
time.

Key Technologies:

Machine Learning:

Netflix relies heavily on machine learning to power its recommendation engine.

Collaborative Filtering:

This algorithm analyzes user preferences and suggests content based on similar users' behavior.
Content-Based Filtering:
This approach recommends content based on the content's metadata, such as genre and actors.
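
To make the collaborative-filtering idea concrete, here is a minimal toy sketch (the ratings matrix and users are invented, and this is not Netflix's actual code): it scores unseen titles for one user from the ratings of similar users.

import numpy as np

# toy user-item ratings matrix (rows: users, cols: titles); 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],   # user A
    [4, 5, 1, 0],   # user B (similar taste to A)
    [1, 0, 5, 4],   # user C (different taste)
], dtype=float)

def cosine_sim(u, v):
    # similarity between two users' rating vectors
    return ratings[u] @ ratings[v] / (np.linalg.norm(ratings[u]) * np.linalg.norm(ratings[v]) + 1e-9)

target = 0  # recommend for user A
sims = np.array([cosine_sim(target, u) for u in range(len(ratings))])
sims[target] = 0.0  # ignore self-similarity

# predicted score for each title = similarity-weighted average of other users' ratings
pred = sims @ ratings / (sims.sum() + 1e-9)
unseen = ratings[target] == 0
print("Predicted scores for unseen titles:", np.where(unseen, pred, -np.inf))

Titles the target user has not rated but that similar users rated highly receive the highest predicted scores, which is the essence of collaborative filtering.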

 Amazon: Product Recommendations & Dynamic Pricing

Amazon utilizes both product recommendations and dynamic pricing to optimize customer
experience and profitability. Product recommendations suggest items based on a customer's
browsing and purchase history, while dynamic pricing adjusts product prices in real-time based
on factors like demand, competition, and inventory levels.
Product Recommendations:

How it works:

Amazon's recommendation engine analyzes vast amounts of data to identify patterns and
preferences, then suggests products that customers might be interested in.

Purpose:

To increase sales by showcasing relevant products and driving traffic to those listings,
ultimately boosting customer engagement and satisfaction.

Examples:

"Customers who bought this also bought...", "You may also like...", and personalized
product displays on the homepage.

Dynamic Pricing:

How it works:

Amazon uses algorithms and automation to adjust prices based on real-time data, including
competitor pricing, demand fluctuations, and inventory levels.

Purpose:

To maximize profitability and remain competitive by adjusting prices dynamically to capitalize on market conditions.
Examples:
Increasing prices during peak demand or reducing prices during promotional periods to attract
customers.
In essence:

Product recommendations personalize the shopping experience and drive sales.

Dynamic pricing optimizes revenue and competitiveness by adapting to market conditions.
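
A highly simplified sketch of a dynamic-pricing rule is shown below (purely illustrative; the thresholds and adjustment factors are invented, and real systems learn such adjustments from data rather than hard-coding them):

def adjust_price(base_price, demand_index, competitor_price, stock_level):
    """Toy dynamic-pricing heuristic: nudge price with demand, cap near competitors."""
    price = base_price
    if demand_index > 1.2:          # demand well above average
        price *= 1.10
    elif demand_index < 0.8:        # demand slump or promotional period
        price *= 0.90
    if stock_level < 10:            # scarce inventory supports a higher price
        price *= 1.05
    # stay within 5% of the competitor's price to remain competitive
    price = min(price, competitor_price * 1.05)
    return round(price, 2)

print(adjust_price(base_price=100.0, demand_index=1.5, competitor_price=108.0, stock_level=5))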

 Google: Search Ranking (e.g., BERT, MUM)

How it Uses ML: Google Search uses numerous ML models to understand query intent and rank
billions of web pages to provide the most relevant and useful results. Early ranking was more
rule-based (like PageRank), but ML is now central.

How the Model(s) Work: Google employs complex deep learning models, particularly in
Natural Language Processing (NLP).

 Understanding Query Intent (e.g., RankBrain, BERT, MUM):

 RankBrain (older but foundational): Helped Google understand ambiguous or novel queries (the ~15% it hadn't seen before) by relating them to broader concepts using word vectors (embeddings).
 BERT (Bidirectional Encoder Representations from Transformers): A
powerful model that understands the context of words in a query by
considering the words before and after it (bidirectional). This allows
Google to grasp nuances, prepositions (like "to" vs. "for"), and the overall
meaning of longer, conversational queries much better.
 MUM (Multitask Unified Model): An even more advanced model designed
to understand information across different languages and formats (text,
images) simultaneously. It aims to answer complex questions that might
require synthesizing information from multiple sources, moving beyond
simple keyword matching to deeper comprehension.

 Assessing Page Quality & Relevance: ML models analyze hundreds of signals for each
webpage, including content quality (originality, depth), keyword relevance (not just
density but semantic relevance), user engagement signals (how users interact with search
results), page loading speed, mobile-friendliness, backlink quality and context (informed
by ML analysis, not just raw counts), and more.

 Learning from User Interaction: Models learn implicitly from
aggregated and anonymized click data – if many users click on the 3rd
result for a specific query and seem satisfied (don't immediately return to
search), the models might learn that this result is highly relevant for that
query intent.

Output: The ranked list of search results (SERP - Search Engine Results Page) tailored
to the user's query, location, language, and inferred intent.
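
To make the idea of "combining many relevance signals" concrete, here is a minimal, hypothetical sketch (the feature names, values, and weights are invented for illustration; Google's actual ranking models are far more complex deep networks trained on vast data):

import numpy as np

# hypothetical per-page relevance signals, scaled to [0, 1]
features = {
    "content_quality":     0.9,
    "semantic_relevance":  0.8,
    "page_speed":          0.7,
    "mobile_friendliness": 1.0,
    "backlink_quality":    0.6,
}

# weights a learned ranking model might assign (here simply made up)
weights = np.array([0.35, 0.30, 0.10, 0.10, 0.15])
x = np.array(list(features.values()))

score = 1 / (1 + np.exp(-(weights @ x)))  # squash to a 0-1 relevance score
print(f"Toy relevance score: {score:.3f}")

In a real system the weights (and far richer feature interactions) are learned from data and user feedback, and pages are ranked by their predicted relevance for the query.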

2. How is machine learning different from general programming?

The core difference lies in how the system arrives at the solution or performs a task:

 General Programming (Explicit Instructions):

How it Works: A human programmer explicitly writes step-by-step instructions (code, algorithms, rules) that tell the computer exactly how to process input data and produce a specific output.

Analogy: You give the computer a detailed recipe (the program) explaining
exactly how to bake a cake (process data) using specific ingredients (input). The
computer follows the recipe precisely.

Logic Creation: The logic, rules, and decision-making processes are entirely
defined by the programmer beforehand.

Input/Output: Input Data -> Program (Fixed Rules) -> Output.

Example: Writing a program to calculate payroll. You define rules like if hours_worked > 40 then overtime_pay = (hours_worked - 40) * rate * 1.5. The program executes these exact rules.

Goal: To execute a predefined set of instructions reliably and predictably.
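
The payroll example above could look like the short snippet below (a minimal sketch; the rate and rules are assumed, not taken from any real payroll system). Every rule is written by hand:

def payroll(hours_worked: float, rate: float) -> float:
    """Explicitly programmed rule: hours above 40 are paid at 1.5x the rate."""
    if hours_worked > 40:
        overtime_pay = (hours_worked - 40) * rate * 1.5
        return 40 * rate + overtime_pay
    return hours_worked * rate

print(payroll(hours_worked=45, rate=20.0))  # 40*20 + 5*20*1.5 = 950.0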

 Machine Learning (Learning from Data):

How it Works: Instead of writing explicit rules, the programmer provides the
computer with a large amount of data (examples) and an ML algorithm (a general
framework for learning). The algorithm analyzes the data to discover patterns,
correlations, and underlying structures on its own. It uses these learned patterns to
build a "model" which can then make predictions or decisions on new, unseen
data.

Analogy: You show the computer thousands of pictures labeled "cat" and
thousands labeled "not cat" (the data). You provide a learning algorithm (e.g., a
neural network structure). The computer figures out for itself what visual features
(patterns) distinguish a cat. It builds its own internal "rules" for cat identification
(the model).

Logic Creation: The logic or "rules" (often complex mathematical relationships or patterns) are learned implicitly by the algorithm from the data during a "training" phase. The programmer guides the learning process but doesn't define the specific pattern-matching rules.

Input/Output (Training): Input Data + Expected Outputs (Labels) -> ML Algorithm -> Model (Learned Patterns/Rules).

Input/Output (Prediction/Inference): New Input Data -> Model -> Predicted Output.

Example: Building a spam filter. Instead of writing thousands of rules for specific spam words, you feed the ML algorithm thousands of emails labeled as "spam" or "not spam." It learns the characteristics (word frequencies, sender patterns, etc.) associated with spam and builds a model to classify new, incoming emails.

Goal: To enable the system to perform tasks (like prediction, classification, clustering) without being explicitly programmed for them, often by generalizing from examples.
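
A minimal sketch of that spam-filter idea using scikit-learn is shown below (the tiny example emails are invented; a real filter would be trained on thousands of labeled messages):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tiny invented training set: emails labeled spam (1) or not spam (0)
emails = ["win a free prize now", "cheap pills limited offer",
          "meeting agenda for Monday", "lunch with the project team"]
labels = [1, 1, 0, 0]

# the algorithm learns word patterns itself; no hand-written spam rules
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize offer"]))        # expected: [1] (spam)
print(model.predict(["agenda for the meeting"]))  # expected: [0] (not spam)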

Here's a table summarizing the key differences:

Aspect        | General Programming                        | Machine Learning
--------------|--------------------------------------------|--------------------------------------------------
Core Approach | Explicit instructions (rules first)        | Learning from data (data first)
Logic Source  | Defined by human programmer                | Learned implicitly by the algorithm
Role of Data  | Input to be processed by the rules         | Input used to create the rules/model
Process       | Code -> Execute                            | Data + Algorithm -> Train -> Predict
Output        | Deterministic, based on code               | Often probabilistic, based on patterns
Flexibility   | Rigid; requires code changes for new logic | Can adapt/generalize to new data
Problem Type  | Well-defined tasks with known rules        | Complex pattern recognition, prediction, tasks where rules are unknown/hard to define

In essence:

 General Programming: Programmer tells the computer how to solve a problem.

 Machine Learning: Programmer tells the computer how to learn from data to solve a
problem.

3. What is ensemble learning, and when is this technique applied?

What is Ensemble Learning?

Ensemble learning is a machine learning technique where multiple individual models (often
called "base learners" or "weak learners") are trained to solve the same problem, and their
predictions are combined to produce a final, overall prediction.

The core idea is based on the principle of "wisdom of the crowd": by combining the outputs of
several diverse models, the final prediction is often more accurate, robust, and stable than any
single model could achieve on its own.

Think of it like this: Instead of asking one expert for their opinion on a complex topic, you ask a
diverse group of experts. You then combine their opinions (e.g., by taking the majority vote or
averaging their scores) to arrive at a more reliable and well-rounded conclusion.

How Predictions are Combined:

The way predictions are combined depends on the task (classification or regression) and the
specific ensemble method:

For Classification:

Majority Voting (Hard Voting): The final prediction is the class predicted by
the majority of the base models.

Weighted/Soft Voting: Predictions from each model are weighted (often based
on their individual performance or confidence), and the class with the highest
total weighted vote is chosen. This often uses predicted probabilities.

For Regression:

Averaging: The final prediction is the average of the predictions from all base
models.

Weighted Averaging: Similar to weighted voting, predictions are averaged, but some models contribute more to the average based on their perceived reliability.
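
These combination styles can be sketched in a few lines of Python (a toy illustration with made-up base-model predictions, not tied to any specific library):

import numpy as np

# made-up predictions from three base models
class_votes = ["cat", "dog", "cat"]          # classification: hard voting
reg_preds   = np.array([3.1, 2.8, 3.4])      # regression: averaging
weights     = np.array([0.5, 0.2, 0.3])      # e.g., derived from validation accuracy

# majority vote for classification
values, counts = np.unique(class_votes, return_counts=True)
print("Majority vote:", values[np.argmax(counts)])           # -> cat

# plain and weighted averaging for regression
print("Average:", reg_preds.mean())
print("Weighted average:", np.average(reg_preds, weights=weights))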

Key Types of Ensemble Methods:

While there are many variations, the most common categories include:

1. Bagging (Bootstrap Aggregating):

 Trains multiple instances of the same base algorithm (e.g., Decision Trees)
independently on different random subsets of the training data (sampled with
replacement).
 Combines predictions using voting or averaging.
 Goal: Reduce variance and overfitting, improve stability.
 Famous Example: Random Forests (an ensemble of Decision Trees).

2. Boosting:

 Trains multiple instances of the same base algorithm sequentially.


 Each new model focuses on correcting the errors made by the previous
models (e.g., by giving more weight to misclassified data points).
 Combines predictions, often giving more weight to better-performing
models.
 Goal: Reduce bias and often achieve very high accuracy.
 Famous Examples: AdaBoost, Gradient Boosting Machines (GBM),
XGBoost, LightGBM, CatBoost.

3. Stacking (Stacked Generalization):

 Trains multiple different types of base models (e.g., a Decision Tree, an SVM, a K-Nearest Neighbors).
 Uses the predictions of these base models as input features for a final
"meta-model" (e.g., Logistic Regression), which learns the optimal way to
combine their predictions.
 Goal: Leverage the diverse strengths of different algorithms.
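
As a compact illustration of the three families above, the sketch below uses scikit-learn's built-in implementations (a minimal example on a synthetic dataset; the parameters are defaults, not tuned):

from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "Bagging (Random Forest)": RandomForestClassifier(random_state=0),
    "Boosting (Gradient Boosting)": GradientBoostingClassifier(random_state=0),
    "Stacking (SVM + RF -> LogReg)": StackingClassifier(
        estimators=[("svm", SVC()), ("rf", RandomForestClassifier(random_state=0))],
        final_estimator=LogisticRegression()),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # average accuracy over 5 folds
    print(f"{name}: {score:.3f}")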

When is Ensemble Learning Applied?

Ensemble techniques are applied in various situations, primarily when:

 High Accuracy is Critical: When maximizing predictive performance is the top priority
(e.g., in machine learning competitions like Kaggle, financial modeling, medical
diagnosis, critical fraud detection). Ensembles often outperform single, highly tuned
models.
 Improving Model Robustness: Combining models makes the overall system less
sensitive to noise or outliers in the data, or the specifics of the training data split. The
aggregated prediction is generally more stable.
 Reducing Variance (Overfitting): Techniques like Bagging (especially Random Forests)
are very effective at reducing the risk of overfitting complex models (like deep decision
trees) to the training data.
 Reducing Bias: Boosting techniques are specifically designed to iteratively reduce the
bias of the combined model by focusing on hard-to-classify examples.

 Combining Different Strengths: Stacking allows you to leverage the unique ways
different algorithms model the data. One model might be good at capturing linear
relationships, while another excels at non-linear ones.
 Single Models Reach Performance Limits: When optimizing hyperparameters and
feature engineering for a single model type yields diminishing returns, ensembles provide
a powerful way to push performance further.

4. Discuss the difference between a feedforward neural network and a multilayer perceptron.

The relationship between these two terms can be a bit confusing because a Multilayer
Perceptron (MLP) is a specific type of Feedforward Neural Network (FFNN). Often, in
practice, the terms are used interchangeably, especially when discussing standard, fully
connected networks. However, there's a technical distinction based on scope and specific
characteristics.

Here's a breakdown:

1. Feedforward Neural Network (FFNN):

Definition: This is a broad category of Artificial Neural Networks (ANNs) where connections
between nodes do not form a cycle. Information moves in only one direction – forward – from
the input nodes, through any hidden layers, to the output nodes.

Key Characteristic: The defining feature is the absence of feedback loops or cycles. The
output of any layer does not affect that same layer or preceding layers within the current pass of
information.

Scope: It encompasses any neural network structure adhering to this unidirectional flow. This
could include:

 Single-Layer Perceptrons (input directly connected to output, no hidden layers).
 Multilayer Perceptrons (MLPs).
 Convolutional Neural Networks (CNNs) - While they have specialized layers, the
overall flow of information is typically feedforward.
 Radial Basis Function (RBF) Networks.

Contrast: The opposite would be Recurrent Neural Networks (RNNs), where connections do
form cycles, allowing information to persist and influence future inputs (giving them a form of
memory).

2. Multilayer Perceptron (MLP):

Definition: This is a specific class of Feedforward Neural Network. It consists of:

 An Input Layer: Receives the initial data.
 One or more Hidden Layers: Layers of nodes (neurons) between the input and
output layers. These are crucial for learning complex, non-linear patterns. The
"multilayer" aspect requires at least one hidden layer.
 An Output Layer: Produces the final prediction or classification.
 Neurons (Perceptrons): Each node in the hidden and output layers typically
performs a weighted sum of its inputs and then applies a non-linear activation
function (like Sigmoid, Tanh, ReLU). This non-linearity is essential for MLPs to
model complex data.
 Full Connectivity (Typically): Usually, each node in one layer is connected to
every node in the subsequent layer. This is often implied when referring to a
standard MLP, making them also known as "fully connected feedforward
networks."

Key Characteristics: The presence of at least one hidden layer and the use of non-linear
activation functions within those layers are defining features.

Scope: It's a subset of FFNNs.

The Core Difference Summarized:

 FFNN is the general category: Defined by the direction of information flow (forward
only, no cycles).

 MLP is a specific type of FFNN: Defined by its structure (input layer, >=1 hidden
layer(s), output layer, typically fully connected with non-linear activation functions).

Analogy:

Think of it like shapes:

Feedforward Neural Network (FFNN) is like the category "Polygon" (a closed shape made of
straight lines).

Multilayer Perceptron (MLP) is like the specific type "Rectangle" (a polygon with four sides
and four right angles).

All Rectangles are Polygons, but not all Polygons are Rectangles (e.g., triangles, pentagons).
Similarly, all MLPs are FFNNs, but not all FFNNs are MLPs (e.g., a single-layer perceptron is
an FFNN but not an MLP; a CNN is generally considered an FFNN but distinct from a classic
MLP due to its specialized layers).
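
For concreteness, a minimal MLP (and therefore an FFNN) might be defined in Keras as below. This is only a sketch, with an assumed 20-feature input and a binary output rather than anything from the discussion above:

import tensorflow as tf
from tensorflow.keras import layers, models

# an MLP: input layer -> two fully connected hidden layers (non-linear) -> output layer
mlp = models.Sequential([
    layers.Input(shape=(20,)),              # input layer: 20 features (assumed)
    layers.Dense(64, activation='relu'),    # hidden layer 1
    layers.Dense(32, activation='relu'),    # hidden layer 2
    layers.Dense(1, activation='sigmoid'),  # output layer for binary classification
])
mlp.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
mlp.summary()  # information flows forward only (no cycles), hence a feedforward network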

5. Discuss in detail the architecture of Convolutional Neural Networks (CNNs).

A Convolutional Neural Network (CNN) is an advanced version of artificial neural networks
(ANNs), primarily designed to extract features from grid-like matrix datasets. This is
particularly useful for visual datasets such as images or videos, where data patterns play a
crucial role. CNNs are widely used in computer vision applications due to their effectiveness
in processing visual data.

CNNs consist of multiple layers like the input layer, Convolutional layer, pooling layer, and
fully connected layers. Let’s learn more about CNNs in detail.

How Do Convolutional Layers Work?

Convolutional Neural Networks are neural networks that share their parameters.
Imagine you have an image. It can be represented as a cuboid having a length and width (the dimensions of the image) and a height (i.e., the channels, as images generally have red, green, and blue channels).

Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, producing, say, K outputs represented vertically. Now slide that neural network across the whole image; as a result, we get another image with a different width, height, and depth. Instead of just the R, G, and B channels, we now have more channels but a smaller width and height. This operation is called convolution. If the patch size were the same as that of the image, it would be a regular neural network. Because of this small patch, we have fewer weights.

Mathematical Overview of Convolution
Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
 Convolution layers consist of a set of learnable filters (or kernels) having small widths and
heights and the same depth as that of input volume (3 if the input layer is image input).
 For example, if we have to run convolution on an image with dimensions 34x34x3, the possible filter size is a x a x 3, where 'a' can be anything like 3, 5, or 7, but smaller than the image dimension.
 During the forward pass, we slide each filter across the whole input volume step by step
where each step is called stride (which can have a value of 2, 3, or even 4 for high-
dimensional images) and compute the dot product between the kernel weights and patch
from input volume.
 As we slide our filters, we'll get a 2-D output for each filter; stacking these together, we'll get an output volume having a depth equal to the number of filters. The network will learn all the filters.
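The spatial output size of a convolution follows the standard formula output = (W − F + 2P)/S + 1, where W is the input width, F the filter size, P the padding, and S the stride. A small sketch with illustrative numbers only:

def conv_output_size(w: int, f: int, p: int, s: int) -> int:
    """Standard formula for the spatial output size of a convolution."""
    return (w - f + 2 * p) // s + 1

# e.g., a 34x34x3 image with 12 filters of size 3x3x3, stride 1, no padding
side = conv_output_size(w=34, f=3, p=0, s=1)
print(side, "x", side, "x 12 output volume")   # 32 x 32 x 12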
Layers Used to Build ConvNets
A complete Convolutional Neural Network architecture is also known as a ConvNet. A ConvNet is a sequence of layers, and every layer transforms one volume into another through a differentiable function.
Let's take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
 Input Layer: This is the layer in which we give input to our model. In a CNN, the input will generally be an image or a sequence of images. This layer holds the raw input of the image, with width 32, height 32, and depth 3.
 Convolutional Layers: This is the layer used to extract features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images. The filters/kernels are small matrices, usually of 2x2, 3x3, or 5x5 shape. Each kernel slides over the input image data and computes the dot product between the kernel weights and the corresponding input image patch. The output of this layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an output volume of dimension 32 x 32 x 12.
 Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add nonlinearity to the network. It applies an element-wise activation function to the output of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.

 Pooling Layer: This layer is periodically inserted in the ConvNet, and its main function is to reduce the size of the volume, which makes computation faster, reduces memory use, and also helps prevent overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16 x 16 x 12.

Image source: cs231n.stanford.edu

 Flattening: The resulting feature maps are flattened into a one-dimensional vector after the convolution and pooling layers so they can be passed into a fully connected layer for classification or regression.
 Fully Connected Layers: It takes the input from the previous layer and computes the final
classification or regression task.

Output Layer: The output from the fully connected layers is then fed into a logistic function for classification tasks, such as sigmoid or softmax, which converts the output for each class into a probability score.
Example: Applying a CNN to an Image
Let's consider an image and apply the convolution layer, activation layer, and pooling layer operations to extract features from it.
Input image:

Step:
 import the necessary libraries

 set the parameter
 define the kernel
 Load the image and plot it.
 Reformat the image
 Apply convolution layer operation and plot the output image.
 Apply activation layer operation and plot the output image.
 Apply pooling layer operation and plot the output image.
# import the necessary libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from itertools import product

# set the param
plt.rc('figure', autolayout=True)
plt.rc('image', cmap='magma')

# define the kernel
kernel = tf.constant([[-1, -1, -1],
                      [-1, 8, -1],
                      [-1, -1, -1]])

# load the image
image = tf.io.read_file('Ganesh.jpg')
image = tf.io.decode_jpeg(image, channels=1)
image = tf.image.resize(image, size=[300, 300])

# plot the image
img = tf.squeeze(image).numpy()
plt.figure(figsize=(5, 5))
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.title('Original Gray Scale image')
plt.show()

# Reformat
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)

# convolution layer
conv_fn = tf.nn.conv2d
image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))

# Plot the convolved image
plt.subplot(1, 3, 1)
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.title('Convolution')

# activation layer
relu_fn = tf.nn.relu
# Image detection
image_detect = relu_fn(image_filter)

plt.subplot(1, 3, 2)
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.title('Activation')

# Pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
                      window_shape=(2, 2),
                      pooling_type='MAX',
                      strides=(2, 2),
                      padding='SAME')

plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()

Output: the original grayscale image, followed by the convolution, activation, and pooling results.
Advantages of CNNs
1. Good at detecting patterns and features in images, videos, and audio signals.
2. Robust to translation, rotation, and scaling (invariance to these transformations).
3. End-to-end training, no need for manual feature extraction.
4. Can handle large amounts of data and achieve high accuracy.
Disadvantages of CNNs
1. Computationally expensive to train and require a lot of memory.
2. Can be prone to overfitting if not enough data or proper regularization is used.
3. Requires large amounts of labeled data.
4. Interpretability is limited; it's hard to understand what the network has learned.
6. Discuss the varieties of RNN deep learning algorithms, and explain how each algorithm works with examples.

Recurrent Neural Networks (RNNs) come in several varieties, each with unique strengths for
processing sequential data.

RNNs allow the network to "remember" past information by feeding the output from one step into the next step. This helps the network understand the context of what has already happened and make better predictions based on that. For example, when predicting the next word in a sentence, the RNN uses the previous words to help decide which word is most likely to come next.

This image showcases the basic architecture of RNN and the feedback loop mechanism
where the output is passed back as input for the next time step.

How Does an RNN Differ from Feedforward Neural Networks?

Feedforward Neural Networks (FNNs) process data in one direction, from input to output, without retaining information from previous inputs. This makes them suitable for tasks with independent inputs, like image classification. However, FNNs struggle with sequential data since they lack memory.
Recurrent Neural Networks (RNNs) solve this by incorporating loops that allow information from previous steps to be fed back into the network. This feedback enables RNNs to remember prior inputs, making them ideal for tasks where context is important.

Recurrent Vs Feedforward networks

Key Components of RNNs


1. Recurrent Neurons
The fundamental processing unit in RNN is a Recurrent Unit. Recurrent units hold a hidden
state that maintains information about previous inputs in a sequence. Recurrent units can
“remember” information from prior steps by feeding back their hidden state, allowing them to
capture dependencies across time.

Recurrent Neuron

2. RNN Unfolding
RNN unfolding, or unrolling, is the process of expanding the recurrent structure over time steps. During unfolding, each step of the sequence is represented as a separate layer in a series, illustrating how information flows across each time step.
This unrolling enables backpropagation through time (BPTT), a learning process in which errors are propagated across time steps to adjust the network's weights, enhancing the RNN's ability to learn dependencies within sequential data.

RNN Unfolding

Recurrent Neural Network Architecture


RNNs share similarities in input and output structures with other deep learning architectures
but differ significantly in how information flows from input to output. Unlike traditional deep
neural networks, where each dense layer has distinct weight matrices, RNNs use shared
weights across time steps, allowing them to remember information over sequences.
In RNNs, the hidden state h_i is calculated for every input x_i to retain sequential dependencies. The computations follow these core formulas:

1. Hidden State Calculation:
   h = σ(U⋅X + W⋅h_{t−1} + B)
   Here, h represents the current hidden state, U and W are weight matrices, and B is the bias.

2. Output Calculation:
   Y = O(V⋅h + C)
   The output Y is calculated by applying O, an activation function, to the weighted hidden state, where V and C represent the output weights and bias.

3. Overall Function:
   Y = f(X, h, W, U, V, B, C)
   This function defines the entire RNN operation, where the state matrix S holds each element s_i, representing the network's state at each time step i.

Recurrent Neural Architecture

How does an RNN work?
At each time step, RNNs process units with a fixed activation function. These units have an internal hidden state that acts as memory, retaining information from previous time steps. This memory allows the network to store past knowledge and adapt based on new inputs.
Updating the Hidden State in RNNs
The current hidden state h_t depends on the previous state h_{t−1} and the current input x_t, and is calculated using the following relations:

1. State Update:
   h_t = f(h_{t−1}, x_t)
   where:
    h_t is the current state
    h_{t−1} is the previous state
    x_t is the input at the current time step

2. Activation Function Application:
   h_t = tanh(W_hh⋅h_{t−1} + W_xh⋅x_t)
   Here, W_hh is the weight matrix for the recurrent neuron, and W_xh is the weight matrix for the input neuron.

3. Output Calculation:
   y_t = W_hy⋅h_t
   where y_t is the output and W_hy is the weight at the output layer.
These parameters are updated using backpropagation. However, since RNNs work on sequential data, we use an updated form of backpropagation known as backpropagation through time.
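
A minimal NumPy sketch of these update equations is shown below (toy dimensions and random weights, purely to illustrate the recurrence h_t = tanh(W_hh⋅h_{t−1} + W_xh⋅x_t) and the output y_t = W_hy⋅h_t):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2

# randomly initialised weights (toy example)
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hy = rng.normal(size=(output_size, hidden_size))

h = np.zeros(hidden_size)                    # initial hidden state h_0
sequence = rng.normal(size=(5, input_size))  # 5 time steps of input

for t, x_t in enumerate(sequence):
    h = np.tanh(W_hh @ h + W_xh @ x_t)       # state update (hidden state carries memory)
    y_t = W_hy @ h                           # output at this step
    print(f"step {t}: y_t = {np.round(y_t, 3)}")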
Backpropagation Through Time (BPTT) in RNNs
Since RNNs process sequential data, Backpropagation Through Time (BPTT) is used to update the network's parameters. The loss function L(θ) depends on the final hidden state h_3, and each hidden state relies on the preceding ones, forming a sequential dependency chain:

h_3 depends on h_2, h_2 depends on h_1, ..., h_1 depends on h_0.

Backpropagation Through Time (BPTT) In RNN


In BPTT, gradients are backpropagated through each time step. This is essential for updating
network parameters based on temporal dependencies.
1. Simplified Gradient Calculation:
   ∂L(θ)/∂W = (∂L(θ)/∂h_3) ⋅ (∂h_3/∂W)

2. Handling Dependencies in Layers:
   Each hidden state is updated based on its dependencies:
   h_3 = σ(W⋅h_2 + b)
   The gradient is then calculated for each state, considering dependencies from previous hidden states.

3. Gradient Calculation with Explicit and Implicit Parts:
   The gradient is broken down into explicit and implicit parts, summing up the indirect paths from each hidden state to the weights:
   ∂h_3/∂W = ∂⁺h_3/∂W + (∂h_3/∂h_2) ⋅ (∂⁺h_2/∂W)
   where ∂⁺ denotes the explicit (direct) derivative with respect to W.

4. Final Gradient Expression:
   The final derivative of the loss function with respect to the weight matrix W is computed as:
   ∂L(θ)/∂W = (∂L(θ)/∂h_3) ⋅ Σ_{k=1}^{3} (∂h_3/∂h_k) ⋅ (∂h_k/∂W)
This iterative process is the essence of backpropagation through time.
Types Of Recurrent Neural Networks
There are four types of RNNs based on the number of inputs and outputs in the network:
1. One-to-One RNN
This is the simplest type of neural network architecture where there is a single input and a
single output. It is used for straightforward classification tasks such as binary classification
where no sequential data is involved.

One to One RNN


2. One-to-Many RNN
In a One-to-Many RNN, the network processes a single input to produce multiple outputs over time. This is useful in tasks where one input triggers a sequence of predictions (outputs). For example, in image captioning, a single image can be used as input to generate a sequence of words as a caption.

One to Many RNN

3. Many-to-One RNN
The Many-to-One RNN receives a sequence of inputs and generates a single output. This type is useful when the overall context of the input sequence is needed to make one prediction. In sentiment analysis, the model receives a sequence of words (like a sentence) and produces a single output, such as positive, negative, or neutral.

Many to One RNN

4. Many-to-Many RNN
The Many-to-Many RNN type processes a sequence of inputs and generates a sequence of outputs. In a language translation task, a sequence of words in one language is given as input, and a corresponding sequence in another language is generated as output.

Many to Many RNN

Variants of Recurrent Neural Networks (RNNs)


There are several variations of RNNs, each designed to address specific challenges or optimize
for certain tasks:

1. Vanilla RNN
This simplest form of RNN consists of a single hidden layer where weights are shared across
time steps. Vanilla RNNs are suitable for learning short-term dependencies but are limited by
the vanishing gradient problem, which hampers long-sequence learning.
2. Bidirectional RNNs
Bidirectional RNNs process inputs in both forward and backward directions, capturing both
past and future context for each time step. This architecture is ideal for tasks where the entire
sequence is available, such as named entity recognition and question answering.
3. Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks (LSTMs) introduce a memory mechanism to overcome
the vanishing gradient problem. Each LSTM cell has three gates:
 Input Gate: Controls how much new information should be added to the cell state.
 Forget Gate: Decides what past information should be discarded.
 Output Gate: Regulates what information should be output at the current step. This
selective memory enables LSTMs to handle long-term dependencies, making them ideal for
tasks where earlier context is critical.
4. Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs) simplify LSTMs by combining the input and forget gates into a
single update gate and streamlining the output mechanism. This design is computationally
efficient, often performing similarly to LSTMs, and is useful in tasks where simplicity and
faster training are beneficial.
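
As a small illustration, the LSTM and GRU variants can be dropped into a Keras model almost interchangeably (a sketch with an assumed input shape and number of units, not a tuned architecture):

import tensorflow as tf
from tensorflow.keras import layers, models

def make_model(cell: str) -> tf.keras.Model:
    """Build a tiny sequence classifier with either an LSTM or a GRU layer."""
    rnn_layer = layers.LSTM(32) if cell == "lstm" else layers.GRU(32)
    return models.Sequential([
        layers.Input(shape=(10, 8)),   # assumed: sequences of 10 steps, 8 features each
        rnn_layer,                     # gated recurrent layer handles long-term dependencies
        layers.Dense(1, activation="sigmoid"),
    ])

make_model("lstm").summary()
make_model("gru").summary()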
Implementing a Text Generator Using Recurrent Neural
Networks (RNNs)
In this section, we create a character-based text generator using a Recurrent Neural Network (RNN) in TensorFlow and Keras. We'll implement an RNN that learns patterns from a text sequence to generate new text character by character.
Step 1: Import Necessary Libraries
We start by importing essential libraries for data handling and building the neural network.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
Step 2: Define the Input Text and Prepare Character Set
We define the input text and identify unique characters in the text which we’ll encode for our
model.
text = "This is GeeksforGeeks a software training institute"
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}
Step 3: Create Sequences and Labels
To train the RNN, we need sequences of fixed length (seq_length) and the character following
each sequence as the label.

seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i + seq_length]
    label = text[i + seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)
Step 4: Convert Sequences and Labels to One-Hot Encoding
For training, we convert X and y into one-hot encoded tensors.
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

Step 5: Build the RNN Model


We create a simple RNN model with a hidden layer of 50 units and a Dense output layer
with softmax activation.
model = Sequential()
model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)), activation='relu'))
model.add(Dense(len(chars), activation='softmax'))
Step 6: Compile and Train the Model
We compile the model using the categorical_crossentropy loss and train it for 100 epochs.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)
Output:
Epoch 1/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 4s 23ms/step – accuracy: 0.0243 – loss: 2.9043
Epoch 2/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step – accuracy: 0.0139 – loss: 2.8720
Epoch 3/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step – accuracy: 0.0243 – loss: 2.8454
...
Epoch 99/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step – accuracy: 0.8889 – loss: 0.5060
Epoch 100/100
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step – accuracy: 0.9236 – loss: 0.4934

Step 7: Generate New Text Using the Trained Model
After training, we use a starting sequence to generate new text character-by-character.
start_seq = "This is G"
generated_text = start_seq

for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)

Complete Code
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

text = "This is GeeksforGeeks a software training institute"


chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

seq_length = 3
sequences = []
labels = []

for i in range(len(text) - seq_length):
    seq = text[i:i + seq_length]
    label = text[i + seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

X_one_hot = tf.one_hot(X, len(chars))


y_one_hot = tf.one_hot(y, len(chars))

model = Sequential()
model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)), activation='relu'))
model.add(Dense(len(chars), activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

start_seq = "This is G"


generated_text = start_seq

for i in range(50):
    x = np.array([[char_to_index[char] for char in generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)
Advantages of Recurrent Neural Networks
Sequential Memory: RNNs retain information from previous inputs, making them ideal for time-series predictions where past data is crucial; LSTM variants extend this memory to longer sequences.
Enhanced Pixel Neighborhoods: RNNs can be combined with convolutional layers to capture extended pixel neighborhoods, improving performance in image and video data processing.
Limitations of Recurrent Neural Networks (RNNs)
While RNNs excel at handling sequential data, they face two main training challenges: the vanishing gradient and the exploding gradient problems.
Vanishing Gradient: During backpropagation, gradients diminish as they pass through
each time step, leading to minimal weight updates. This limits the RNN’s ability to learn
long-term dependencies, which is crucial for tasks like language translation.
Exploding Gradient: Sometimes, gradients grow uncontrollably, causing excessively large
weight updates that destabilize training. Gradient clipping is a common technique to
manage this issue.
These challenges can hinder the performance of standard RNNs on complex, long-sequence
tasks.
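In Keras, the gradient clipping mentioned above can be enabled directly on the optimizer; a minimal sketch (the clipnorm value is an arbitrary example, not a recommended setting):

import tensorflow as tf

# clip each gradient's norm to 1.0 to tame exploding gradients
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# the optimizer is then passed to model.compile(...) as usual, e.g.:
# model.compile(optimizer=optimizer, loss='categorical_crossentropy')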
Applications of Recurrent Neural Networks
RNNs are used in various applications where data is sequential or time-based:
Time-Series Prediction: RNNs excel in forecasting tasks, such as stock market predictions
and weather forecasting.
Natural Language Processing (NLP): RNNs are fundamental in NLP tasks like language
modeling, sentiment analysis, and machine translation.
Speech Recognition: RNNs capture temporal patterns in speech data, aiding in speech-to-
text and other audio-related applications.
Image and Video Processing: When combined with convolutional layers, RNNs help
analyze video sequences, facial expressions, and gesture recognition.

References

 Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.


 Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural
Computation, 9(8), 1735-1780.
 Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
 Ricci, F., Rokach, L., & Shapira, B. (2011). Introduction to Recommender Systems
Handbook. In Recommender Systems Handbook (pp. 1-35). Springer US.
 Stanford University CS231n: Convolutional Neural Networks for Visual
Recognition. Course Notes.
 http://cs231n.github.io/
 https://www.tensorflow.org/api_docs/python/tf

