
Unit 4: Fundamentals of Deep Learning

Syllabus: What is Deep Learning? Need for Deep Learning, Introduction to Artificial Neural Network (ANN), Core components of neural networks, Multi-Layer Perceptron (MLP), Activation functions, Sigmoid, Rectified Linear Unit (ReLU), Introduction to Tensors and Operations, TensorFlow framework.

What is Deep Learning?


Deep learning is a branch of machine learning based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer and one or more hidden layers connected one after the other. Each neuron receives input from the neurons in the previous layer or from the input layer. The output of one neuron becomes the input to neurons in the next layer of the network, and this process continues until the final layer produces the output of the network. The layers of the neural network transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data.
Today, deep learning has become one of the most popular and visible areas of machine learning, due to its success in a variety of applications such as computer vision, natural language processing, and reinforcement learning. Deep learning can be used for supervised, unsupervised, and reinforcement machine learning, and it processes data differently in each setting:
 Supervised Machine Learning: the neural network learns to make predictions or classify data based on labeled datasets, where both the input features and the target variables are provided. The network learns from the cost, or error, that comes from the difference between the predicted and the actual target; propagating this error backward to adjust the weights is known as backpropagation (a one-step numeric sketch follows this list). Deep learning algorithms like convolutional neural networks and recurrent neural networks are used for many supervised tasks such as image classification and recognition, sentiment analysis, and language translation.
 Unsupervised Machine Learning: the neural network learns to discover patterns or cluster the dataset based on unlabeled data. There are no target variables; the machine has to determine the hidden patterns or relationships within the dataset on its own. Deep learning algorithms like autoencoders and generative models are used for unsupervised tasks such as clustering, dimensionality reduction, and anomaly detection.
 Reinforcement Machine Learning: an agent learns to make decisions in an environment so as to maximize a reward signal. The agent interacts with the environment by taking actions and observing the resulting rewards. Deep learning can be used to learn policies, or sets of actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms like Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are used for tasks like robotics and game playing.
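
A one-step numeric sketch of the supervised idea in plain JavaScript (the weight, data point, and learning rate are made-up illustrative values): the error between the predicted and actual target is turned into a gradient, and the weight is nudged against it.

// One gradient-descent update for a single-weight model yPred = w * x,
// with squared-error cost E = (yPred - yTrue)^2, so dE/dw = 2 * (yPred - yTrue) * x.
let w = 0.5;                          // initial weight (illustrative)
const x = 2.0, yTrue = 3.0;           // one labeled training example
const learningRate = 0.1;             // step size (illustrative)

const yPred = w * x;                  // forward pass: prediction = 1.0
const grad = 2 * (yPred - yTrue) * x; // gradient of the cost w.r.t. w = -8
w -= learningRate * grad;             // update: w becomes 1.3, moving toward
console.log(w);                       // the value 1.5 that fits this example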

Artificial neural networks

Artificial neural networks are built on the principles of the structure and operation of human neurons; they are also known as neural networks or neural nets. An artificial neural network's input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer. Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted total, and then transfers it to the neurons in the next layer. These connections are weighted, meaning the impact of each input from the preceding layer is scaled up or down by giving it a distinct weight. These weights are then adjusted during the training process to enhance the performance of the model.

Fully Connected Artificial Neural Network

Artificial neurons, also known as units, are found in artificial neural networks. The whole artificial neural network is composed of these units, arranged in a series of layers. Whether a layer has a dozen units or millions, the complexity of the network depends on the complexity of the underlying patterns in the dataset. Commonly, an artificial neural network has an input layer, an output layer, and one or more hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about.
In a fully connected artificial neural network, every neuron in one layer is connected to every neuron in the next, and information flows layer by layer as described above. After passing through one or more hidden layers, the incoming data is transformed into features that are valuable to the output layer. Finally, the output layer produces an output that is the network's response to the data that comes in.
In most neural networks, units are linked to one another from one layer to the next. Each of these links has a weight that controls how much one unit influences another. The neural network learns more and more about the data as it passes from one layer to the next, ultimately producing an output from the output layer.
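
As a minimal illustration of the weighted connections just described (plain JavaScript, with made-up weights): each unit multiplies every incoming value by its connection weight, adds a bias, and applies an activation function.

// One artificial neuron: weighted sum of inputs, plus a bias, passed
// through a sigmoid activation. All values here are illustrative.
function neuron(inputs, weights, bias) {
  let z = bias;
  for (let i = 0; i < inputs.length; i++) {
    z += inputs[i] * weights[i];   // each input scaled by its weight
  }
  return 1 / (1 + Math.exp(-z));   // sigmoid squashes the sum into (0, 1)
}

console.log(neuron([0.5, -1.2], [0.4, 0.6], 0.1)); // a single unit's output
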
Difference between Machine Learning and Deep Learning:

Machine learning and deep learning are both subsets of artificial intelligence, with several similarities and differences:
 Machine learning applies statistical algorithms to learn the hidden patterns and relationships in the dataset; deep learning uses artificial neural network architectures to learn them.
 Machine learning can work on a smaller amount of data; deep learning requires a larger volume of data.
 Machine learning is better for simpler, lower-level tasks; deep learning is better for complex tasks like image processing and natural language processing.
 Machine learning takes less time to train a model; deep learning takes more.
 In machine learning, a model is built from relevant features that are manually extracted from images to detect an object; in deep learning, relevant features are extracted automatically, making it an end-to-end learning process.
 Machine learning models are less complex and their results are easy to interpret; deep learning models are more complex, work like a black box, and their results are not easy to interpret.
 Machine learning can run on a CPU and needs less computing power; deep learning requires a high-performance computer with a GPU.

Types of neural networks

Deep Learning models are able to automatically learn features from the data,
which makes them well-suited for tasks such as image recognition, speech
recognition, and natural language processing. The most widely used
architectures in deep learning are feedforward neural networks,
convolutional neural networks (CNNs), and recurrent neural networks
(RNNs).
Feedforward neural networks (FNNs) are the simplest type of ANN, with a
linear flow of information through the network. FNNs have been widely used
for tasks such as image classification, speech recognition, and natural
language processing.
Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition tasks. CNNs automatically learn features from images, which makes them well-suited for tasks such as image classification, object detection, and image segmentation.
Recurrent Neural Networks (RNNs) are a type of neural network that is able
to process sequential data, such as time series and natural language. RNNs
are able to maintain an internal state that captures information about the
previous inputs, which makes them well-suited for tasks such as speech
recognition, natural language processing, and language translation.
The core components of neural networks, particularly deep neural networks, include:

1. Neurons (Nodes): Neurons are the basic computational units of a neural network. Each
neuron receives input signals, processes them using an activation function, and
produces an output signal.

2. Connections (Edges): Connections represent the pathways through which signals are
transmitted between neurons. Each connection is associated with a weight that
determines the strength of the signal.

3. Layers: Neural networks are typically organized into layers, each consisting of multiple
neurons. The three main types of layers are:

 Input Layer: Receives input data and passes it to the subsequent layers.
 Hidden Layers: Intermediate layers between the input and output layers where
the computation occurs. Deep neural networks may have multiple hidden layers.
 Output Layer: Produces the final output of the network based on the processed
input.

4. Weights and Biases: Weights represent the strength of connections between neurons,
determining how much influence one neuron has on another. Biases are additional
parameters added to each neuron that allow the network to learn more complex
functions.

5. Activation Functions: Activation functions introduce non-linearity into the network, allowing it to learn complex patterns and relationships in the data. Common activation functions include sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax.

6. Loss Function: The loss function measures the difference between the predicted output
of the network and the actual output (ground truth). The goal of training the network is
to minimize this loss function, often using optimization algorithms such as gradient
descent.

7. Optimization Algorithm: Optimization algorithms, such as stochastic gradient descent (SGD), Adam, or RMSprop, are used to update the weights and biases of the network during training, minimizing the loss function.

8. Forward Propagation: In forward propagation, input data is passed through the network layer by layer, with each layer performing computations and passing the output to the next layer until the final output is generated.
9. Backpropagation: Backpropagation is the process of computing gradients of the loss
function with respect to the weights and biases of the network. These gradients are then
used to update the parameters during training, enabling the network to learn from the
data.

These components work together to enable neural networks to learn complex patterns
and relationships in data, making them powerful tools for tasks such as classification,
regression, and pattern recognition.
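
To make this concrete, here is a minimal plain-JavaScript sketch (illustrative weights and data, not from the source) wiring several of the components above together: weights, biases, an activation function, forward propagation, and a loss value.

// Sigmoid activation (component 5).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Forward propagation (component 8) through one hidden layer of two
// neurons and a single output neuron; all weights and biases are made up.
function forward(x) {
  const h1 = sigmoid(0.4 * x[0] + 0.6 * x[1] + 0.1);  // weights + bias
  const h2 = sigmoid(-0.3 * x[0] + 0.8 * x[1]);        // (components 2 and 4)
  return sigmoid(1.2 * h1 - 0.7 * h2 + 0.05);          // output layer
}

// Loss function (component 6): squared error between prediction and truth.
const yPred = forward([0.5, -1.2]);
const yTrue = 1.0;
const loss = (yPred - yTrue) ** 2;
console.log('prediction:', yPred, 'loss:', loss);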

A Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network (ANN) consisting of multiple layers of nodes (neurons), including an input layer, one or more hidden layers, and an output layer. It is a fundamental architecture in deep learning and is used for various tasks such as classification, regression, and pattern recognition.

Here are some key characteristics and components of an MLP:

1. Input Layer: The input layer consists of neurons that receive the initial data or features. Each
neuron represents a feature of the input data.

2. Hidden Layers: Hidden layers are intermediate layers between the input and output layers. Each
hidden layer consists of multiple neurons, and these neurons perform computations on the input
data. The number of hidden layers and the number of neurons in each layer are configurable
parameters of the network architecture.

3. Output Layer: The output layer produces the final output of the network based on the
computations performed in the hidden layers. The number of neurons in the output layer depends
on the nature of the task; for example, in binary classification, there may be one neuron
representing the probability of belonging to one class, while in multi-class classification, there
may be multiple neurons representing the probabilities of belonging to each class.

4. Weights and Biases: Each connection between neurons in adjacent layers is associated with a
weight, which determines the strength of the connection. Additionally, each neuron has an
associated bias term, which allows the network to learn more complex functions.

5. Activation Functions: Activation functions introduce non-linearity into the network, allowing it
to learn complex patterns and relationships in the data. Common activation functions used in
MLPs include sigmoid, tanh, and Rectified Linear Unit (ReLU).

6. Forward Propagation: During forward propagation, input data is passed through the network
layer by layer, with each layer performing computations using the weights and biases and
passing the output to the next layer until the final output is generated.
7. Backpropagation: Backpropagation is used to train the MLP by computing gradients of the loss
function with respect to the weights and biases of the network. These gradients are then used to
update the parameters during training, enabling the network to learn from the data.

MLPs are versatile and can learn complex relationships in data, especially when they have
multiple hidden layers. However, they may suffer from overfitting if not properly regularized,
and they may require a large amount of data for training, as well as computational resources for
training larger networks. Despite these challenges, MLPs remain a foundational model in deep
learning and are widely used in various applications.
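
As a concrete sketch, the TensorFlow.js snippet below (the same library used in the tensor examples later in this unit) defines a small MLP; the layer sizes, activations, and optimizer are illustrative assumptions, not prescribed values.

// A minimal MLP sketch in TensorFlow.js; hyperparameters are illustrative.
const model = tf.sequential();

// Hidden layer 1: 8 ReLU neurons, expecting 4 input features.
model.add(tf.layers.dense({units: 8, activation: 'relu', inputShape: [4]}));

// Hidden layer 2: another 8 ReLU neurons.
model.add(tf.layers.dense({units: 8, activation: 'relu'}));

// Output layer: a single sigmoid neuron for binary classification.
model.add(tf.layers.dense({units: 1, activation: 'sigmoid'}));

// Loss function and optimizer used during backpropagation.
model.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});
model.summary();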

Activation functions are an essential component of artificial neural networks, including deep learning models like Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). They introduce non-linearity into the network, enabling it to learn complex patterns and relationships in the data. Here are some commonly used activation functions:

1. Sigmoid: The sigmoid function, also known as the logistic function, maps input values to a range between 0 and 1. It has the mathematical form:

σ(x) = 1 / (1 + e^(-x))

Sigmoid functions were widely used historically but are less common now due to some
drawbacks, such as the vanishing gradient problem, where gradients become extremely small
during training, making learning slow.
The sigmoid activation function, also known as the logistic function, is a
type of activation function commonly used in artificial neural networks, particularly in
the past. Although less prevalent now compared to Rectified Linear Unit (ReLU) and its
variants, it still has some applications, particularly in the output layer for binary
classification problems where the output needs to be between 0 and 1.

The sigmoid function is defined mathematically as:

σ(x) = 1 / (1 + e^(-x))

The sigmoid function has the following properties:

1. Output Range: The output of the sigmoid function is bounded between 0 and 1. This
property makes it useful for tasks where the output needs to be interpreted as a
probability, such as binary classification, where values closer to 1 indicate one class and
values closer to 0 indicate the other class.

2. Smoothness: The sigmoid function produces smooth and continuous outputs, which
allows for smooth gradients during backpropagation, aiding in the training process.

3. Non-linearity: The sigmoid function introduces non-linearity into the network, enabling
it to model complex relationships in the data. This property is crucial for the expressive
power of neural networks.

However, the sigmoid function has some limitations:

1. Vanishing Gradient: Sigmoid neurons saturate when the input is very large or very
small, leading to vanishing gradients during backpropagation. This can slow down the
learning process, especially in deep networks.

2. Output Not Zero-Centered: The output of the sigmoid function is centered around 0.5 rather than 0, which can cause issues during optimization, especially if the output distribution is imbalanced.

Due to these limitations, sigmoid activation functions are less commonly used in hidden
layers of deep neural networks compared to alternatives like ReLU and its variants.
Nonetheless, they still find applications in specific scenarios, such as binary classification
tasks where the output needs to be in the range (0, 1).
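
A quick numeric sketch in plain JavaScript of the saturation behaviour described above: the sigmoid's derivative is σ'(x) = σ(x)(1 - σ(x)), which peaks at 0.25 at x = 0 and collapses toward zero for large |x|; this is the vanishing-gradient effect in miniature.

// Sigmoid and its derivative; large |x| saturates the function and
// drives the gradient toward zero.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));
const sigmoidGrad = (x) => sigmoid(x) * (1 - sigmoid(x));

for (const x of [0, 2, 5, 10]) {
  console.log(`x=${x} sigmoid=${sigmoid(x).toFixed(5)} grad=${sigmoidGrad(x).toFixed(6)}`);
}
// grad is 0.25 at x=0 but only about 0.000045 at x=10.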

The Rectified Linear Unit (ReLU) is a popular activation function used in artificial neural networks, especially in deep learning models. It is known for its simplicity and effectiveness in training deep neural networks. The ReLU function is defined mathematically as:

f(x) = max(0, x)

ReLU has several properties that make it advantageous:

1. Simplicity: The ReLU function is simple and computationally efficient. It involves a single
element-wise operation, comparing the input to zero and selecting the maximum value.

2. Non-linearity: Like other activation functions, ReLU introduces non-linearity into the network,
allowing it to learn complex relationships in the data. The non-linear nature of ReLU is crucial
for the expressive power of neural networks.

3. Sparsity: ReLU neurons are sparse, meaning they are inactive (output zero) for negative inputs.
This sparsity can help improve the efficiency of training and reduce the likelihood of overfitting
by introducing regularization effects.

4. Avoids Vanishing Gradient: Unlike sigmoid and tanh activation functions, which can saturate
and cause vanishing gradients, ReLU does not suffer from this problem for positive inputs. It
helps mitigate the vanishing gradient problem, enabling more stable and faster training of deep
neural networks.

Despite its advantages, ReLU also has some limitations:

1. Dead Neurons: During training, some ReLU neurons can become 'dead,' meaning they output
zero for any input. Once a neuron is 'dead,' it cannot recover since the gradient for negative
inputs is zero. Techniques like Leaky ReLU and Parametric ReLU have been proposed to
address this issue.

2. Unbounded Activation: ReLU does not have an upper bound on the activation value, which can
lead to exploding activations during training, especially if the learning rate is too high. However,
techniques such as gradient clipping can mitigate this problem.
Overall, ReLU is widely used in practice due to its simplicity, effectiveness, and ability to
mitigate the vanishing gradient problem, making it a cornerstone of modern deep learning
architectures.
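
A small plain-JavaScript sketch of ReLU next to Leaky ReLU, the variant mentioned above; the negative-side slope of 0.01 is a conventional but arbitrary choice.

// ReLU: pass positive inputs through, clamp negatives to zero.
const relu = (x) => Math.max(0, x);

// Leaky ReLU: keep a small slope for negative inputs so the gradient
// never becomes exactly zero and the neuron cannot go permanently 'dead'.
const leakyRelu = (x, alpha = 0.01) => (x > 0 ? x : alpha * x);

for (const x of [-3, -0.5, 0, 2]) {
  console.log(`x=${x} relu=${relu(x)} leakyRelu=${leakyRelu(x)}`);
}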

Introduction: TensorFlow.js is a library to define and run computations using tensors in JavaScript. A tensor is a generalization of vectors and matrices to higher dimensions.
Tensors: The central unit of data in TensorFlow.js is the tf.Tensor – a set of values shaped into an array of one or more dimensions. tf.Tensors are very similar to multidimensional arrays.
A tf.Tensor also contains the following properties:
 rank: Defines how many dimensions the tensor contains.
 shape: Which defines the size of each dimension of the data.
 dtype: Which defines the data type of the tensor.
Note: We will use the term “dimension” interchangeably with rank. Sometimes in machine learning, the “dimensionality” of a tensor can also refer to the size of a particular dimension (e.g., a matrix of shape [10, 5] is a rank-2 tensor, or a 2-dimensional tensor, and the dimensionality of its first dimension is 10). This can be confusing, but we note it here because you will likely come across both uses of the term.
A tf.Tensor can be created from an array with the tf.tensor() method.

// Create a rank-2 tensor (matrix) from a multidimensional array.
const a = tf.tensor([[1, 2], [3, 4]]);
console.log('shape:', a.shape);
a.print();

// Or create a tensor from a flat array and specify a shape.
const shape = [2, 2];
const b = tf.tensor([1, 2, 3, 4], shape);
console.log('shape:', b.shape);
b.print();

Output:

shape: 2,2
Tensor
[[1, 2],
[3, 4]]
shape: 2,2
Tensor
[[1, 2],
[3, 4]]
Operations: While tensors allow you to store data, operations (ops) allow
you to manipulate that data. TensorFlow.js provides a wide variety of ops
suitable for linear algebra and machine learning that can be performed on
tensors.
Example: Computing x^2 of all elements in a tf.Tensor:

const x = tf.tensor([1, 2, 3, 4]);

// Equivalent to tf.square(x)
const y = x.square();
y.print();

Output:

Tensor
    [1, 4, 9, 16]
Example: Adding the elements of two tf.Tensors element-wise:

const a = tf.tensor([1, 2, 3, 4]);
const b = tf.tensor([10, 20, 30, 40]);

// Equivalent to tf.add(a, b)
const y = a.add(b);
y.print();

Output:

Tensor
    [11, 22, 33, 44]

Because tensors are immutable, these ops do not change their values. Instead, ops always return new tf.Tensors.
Memory: When using the WebGL backend, tf.Tensor memory must be managed explicitly; it is not sufficient to let a tf.Tensor go out of scope for its memory to be released.
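
Two standard cleanup mechanisms from the TensorFlow.js API, dispose() and tf.tidy(), shown in a short sketch:

// Free a single tensor's memory explicitly.
const t = tf.tensor([1, 2, 3]);
t.dispose();

// Or wrap work in tf.tidy(), which disposes every intermediate tensor
// created inside the callback, keeping only the returned tensor.
const result = tf.tidy(() => {
  const x = tf.tensor([1, 2, 3, 4]);
  return x.square().sum(); // the intermediate square() result is cleaned up
});
result.print();
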
Introduction to TensorFlow
TensorFlow is an open-source machine learning library developed by Google. It is used to build and train deep learning models, as it facilitates the creation of computational graphs and their efficient execution on various hardware platforms.

TensorFlow
TensorFlow is basically a software library for numerical computation
using data flow graphs where:
 nodes in the graph represent mathematical operations.
 edges in the graph represent the multidimensional data arrays
(called tensors) communicated between them. (Please note
that tensor is the central unit of data in TensorFlow).

Consider a simple graph for c = a + b: here, add is a node which represents the addition operation, a and b are input tensors, and c is the resultant tensor. This flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API!
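
In TensorFlow.js code, the graph just described amounts to one add node consuming two input tensors:

const a = tf.tensor(5);  // input tensor a (an edge into the graph)
const b = tf.tensor(3);  // input tensor b
const c = tf.add(a, b);  // the 'add' node produces the resultant tensor c
c.print();               // Tensor 8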

TensorFlow is an open-source machine learning framework developed and maintained by Google. It provides a comprehensive ecosystem of tools, libraries, and resources for building and deploying machine learning models, particularly deep learning models. TensorFlow is widely used in both research and industry for a variety of tasks, including classification, regression, clustering, natural language processing, computer vision, and more. Here are some key aspects of TensorFlow:

1. TensorFlow Core: TensorFlow's core component is its computational graph framework, which allows users to define and execute complex mathematical operations using tensors (multi-dimensional arrays) as the primary data structure. The computational graph defines the flow of data through the operations, facilitating automatic differentiation and optimization during training.

2. Flexible Architecture: TensorFlow offers a flexible and scalable architecture that allows users
to deploy models across a wide range of devices, including CPUs, GPUs, TPUs (Tensor
Processing Units), and even mobile and embedded devices. This flexibility makes TensorFlow
suitable for both research and production environments.

3. High-Level APIs: TensorFlow provides high-level APIs, such as Keras (integrated into
TensorFlow as tf.keras), tf.estimator, and TensorFlow Hub, which simplify the process of
building, training, and deploying machine learning models. These APIs abstract away low-level
details and provide easy-to-use interfaces for common tasks.

4. Ecosystem and Libraries: TensorFlow has a rich ecosystem of libraries and tools built on top of
its core framework, including TensorFlow Extended (TFX) for end-to-end machine learning
pipelines, TensorFlow Probability for probabilistic modeling, TensorFlow Lite for deploying
models on mobile and embedded devices, TensorFlow.js for running models in the browser, and
more.

5. Model Serving and Deployment: TensorFlow provides tools and utilities for serving and
deploying trained models in production environments, such as TensorFlow Serving and
TensorFlow Model Optimization Toolkit. These tools enable efficient and scalable deployment
of models for inference in real-world applications.

6. Community and Support: TensorFlow has a large and active community of developers,
researchers, and enthusiasts who contribute to its development, provide support, and share
resources and best practices through forums, mailing lists, documentation, and online tutorials.

Overall, TensorFlow is a powerful and versatile framework for building and deploying machine
learning models, offering a wide range of features, tools, and resources to support various use
cases and applications in both research and production environments.
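
To tie these pieces together, here is a hedged end-to-end sketch using the TensorFlow.js Layers API (the browser counterpart of tf.keras); the synthetic data follows y = 2x - 1 and every hyperparameter is an arbitrary illustrative choice.

// Synthetic data for the line y = 2x - 1.
const xs = tf.tensor2d([-1, 0, 1, 2, 3, 4], [6, 1]);
const ys = tf.tensor2d([-3, -1, 1, 3, 5, 7], [6, 1]);

// One dense layer is enough to learn a linear function.
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [1]}));
model.compile({optimizer: 'sgd', loss: 'meanSquaredError'});

// fit() runs forward propagation, evaluates the loss, and applies
// backpropagation for the requested number of epochs.
model.fit(xs, ys, {epochs: 200}).then(() => {
  model.predict(tf.tensor2d([10], [1, 1])).print(); // prints roughly 19
});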
