
Python Keras

Keras

• Keras is an open-source deep learning library written in Python. It is a high-level neural networks API that can run on top of other popular deep learning frameworks like TensorFlow, Microsoft Cognitive Toolkit (CNTK), or Theano.
• Keras provides a user-friendly interface for building, training, and deploying deep learning models.
Keras

• The word "Keras" itself has its roots in Greek and is related to the term "horn" or
"projectile."
• In this context, it's used metaphorically to signify a focus on simplicity, modularity, and
extensibility in the design of the library.
• Keras was developed by François Chollet and first released in March 2015.
Popular Open-Source Data Repositories

• UC Irvine Machine Learning Repository
• Kaggle datasets
• Amazon's AWS datasets
• Meta portals (they list open data repositories):
• http://dataportals.org/
• http://opendatamonitor.eu/
• http://quandl.com/
• Other pages listing many popular open data repositories:
• Wikipedia's list of Machine Learning datasets
• Quora.com question
• Datasets subreddit
Activation Functions
• Sigmoid Activation Function
• Hyperbolic Tangent (tanh) Activation Function
• Rectified Linear Unit (ReLU) Activation Function
• Leaky Rectified Linear Unit (Leaky ReLU) Activation Function
• Softmax Activation Function
Sigmoid
•Range: (0, 1)
Sigmoid
• Use Case
• Often used in the output layer of binary classification problems where
the goal is to produce probabilities.
• It squashes the output to a range between 0 and 1.
Sigmoid
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 100)
y = sigmoid(x)

plt.plot(x, y, label='Sigmoid')
plt.title('Sigmoid Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()
Hyperbolic Tangent (tanh) Activation Function

• Range: (-1, 1)
• Similar to the sigmoid, but with an output range between -1 and 1. Often used in hidden layers of neural networks.
Hyperbolic Tangent (tanh) Activation Function
def tanh(x):
    return np.tanh(x)

y_tanh = tanh(x)

plt.plot(x, y_tanh, label='tanh')


plt.title('Hyperbolic Tangent (tanh) Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()
Rectified Linear Unit (ReLU) Activation Function

• Range: [0, +∞)

ReLU
• Commonly used in hidden layers.
• ReLU is computationally efficient and helps with the vanishing gradient problem, allowing models to learn faster.
ReLU
def relu(x):
    return np.maximum(0, x)

y_relu = relu(x)

plt.plot(x, y_relu, label='ReLU')


plt.title('Rectified Linear Unit (ReLU) Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()
Leaky Rectified Linear Unit (Leaky ReLU)
Activation Function
• A variant of ReLU that allows a small negative slope for the negative
values, preventing dead neurons during training.
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

y_leaky_relu = leaky_relu(x)

plt.plot(x, y_leaky_relu, label='Leaky ReLU')


plt.title('Leaky Rectified Linear Unit (Leaky ReLU) Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
plt.show()
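The list of activation functions above also names softmax, which is not plotted in these slides. Softmax is typically used in the output layer of multi-class classification problems because it converts a vector of raw scores into probabilities that sum to 1. A minimal sketch in the same style as the examples above (the shift by the maximum is a standard numerical-stability trick, not something from the slides):

def softmax(x):
    # Subtract the max before exponentiating to avoid overflow
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = np.array([1.0, 2.0, 3.0])
print(softmax(scores))  # approx. [0.09, 0.245, 0.665], sums to 1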
Why DL in PR?
Challenges of SLP vs. DL: Representation Power

• SLP: A single-layer perceptron can only learn linear decision boundaries, which
means it can only separate classes that are linearly separable.
• It cannot capture nonlinear relationships between input features and target variables.
• MLP: Multi-layer perceptrons with more layers (hidden layers) and nonlinear
activation functions have the ability to learn complex, nonlinear decision boundaries.
• They can capture intricate patterns and relationships in the data, allowing for more
accurate and flexible modeling.
Example
• In an SLP, we would directly connect the input features (age, weight,
height, BMI) to the output layer, without any hidden layers.
• The SLP would learn a linear decision boundary in the input feature space to separate individuals with and without diabetes mellitus (DM).
• If the relationship between input features and DM is linearly
separable, the SLP may achieve decent performance. However, if the
relationship is nonlinear or complex, the SLP may struggle to capture
it effectively.
• For instance, younger individuals with higher BMI and weight may be at
higher risk of DM, but this relationship may not be strictly linear.
Example
• In this case, an SLP may struggle to capture the nonlinear relationship
between input features and DM, as it can only learn linear decision
boundaries.
• It may underperform and fail to accurately classify individuals with
and without DM.
DL Representation Benefit
• In an MLP with more layers, we would add one or more hidden layers
between the input and output layers.
• Each hidden layer in the MLP would consist of multiple neurons with
nonlinear activation functions (e.g., ReLU).
• The MLP would learn hierarchical representations of features, where each
hidden layer extracts increasingly abstract and complex features from the
input data.
• With more layers and nonlinear activation functions, the MLP can capture
nonlinear relationships and complex patterns in the data, enabling it to
achieve better performance, especially on tasks with nonlinear decision
boundaries.
Challenges of SLP vs. DL: Expressiveness

• SLP: Single-layer perceptrons have limited expressiveness and may struggle to learn complex tasks that require capturing nonlinear relationships or handling high-dimensional data.
• MLP: Multi-layer perceptrons with more layers have greater
expressiveness and can handle a wide range of tasks, including
complex classification and regression problems, image and speech
recognition, natural language processing, and more.
Challenges of SLP vs. DL: Learning Capacity

• SLP: Due to their simplicity, single-layer perceptrons may underperform on complex tasks or datasets with nonlinear relationships.
• MLP: Multi-layer perceptrons with more layers have higher learning
capacity and can better adapt to complex datasets.
• They can learn intricate patterns and generalize well to unseen data,
leading to improved performance on challenging tasks.
Symmetry (Pattern) breaking
problem
The Problem
• When all neurons in a hidden layer have the same or very similar weights, they end up learning similar representations of the input data, which reduces the overall representational capacity of the network.
The Solution
• Random Initialization
• Initializing the weights randomly helps break symmetry and introduce
diversity in the network's weights from the beginning of training.
• This ensures that neurons in hidden layers start with different weight values
and learn to extract different features from the input data.
• Nonlinear Activation Functions
• Using nonlinear activation functions such as ReLU (Rectified Linear Unit) or
tanh introduces nonlinearity into the network, allowing it to learn complex
relationships between inputs and outputs.
• Nonlinear activation functions enable the network to model more complex
functions and avoid the problem of hidden neurons converging to similar
weights.
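A minimal Keras sketch combining these two ideas; the he_uniform initializer, the 8-unit hidden layer, and the 2-feature input are illustrative assumptions rather than values from the slides:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# Random (non-constant) initialization gives each hidden neuron different starting weights,
# and the nonlinear relu activation lets them learn different features
model.add(Dense(units=8, activation='relu',
                kernel_initializer='he_uniform', input_dim=2))
model.add(Dense(units=1, activation='sigmoid'))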
The Solution
• Regularization Techniques
• Techniques like L1 or L2 regularization, dropout, and batch normalization can
help prevent overfitting and improve the generalization ability of the network.
• Regularization penalizes large weights and encourages sparsity in the network,
which can mitigate the problem of neurons converging to similar weights.
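A sketch of how these techniques look in Keras; the L2 factor of 0.01 and the dropout rate of 0.5 are illustrative values, not prescribed by the slides:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.regularizers import l2

model = Sequential()
# L2 penalty discourages large weights in this layer
model.add(Dense(units=32, activation='relu', input_dim=2,
                kernel_regularizer=l2(0.01)))
# Normalize the previous layer's activations over each batch
model.add(BatchNormalization())
# Randomly zero out 50% of the units at each training update
model.add(Dropout(0.5))
model.add(Dense(units=1, activation='sigmoid'))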
The Solution
• Gradient Descent Optimization
• Using advanced optimization algorithms such as stochastic gradient descent
(SGD), Adam, or RMSprop can help the network efficiently update the
weights during training.
• These algorithms adaptively adjust the learning rate and update the weights
based on the gradients of the loss function, which helps the network navigate
the weight space more effectively.
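For example, switching between these optimizers is a one-line change when compiling the model from the sketch above; the learning rates shown are illustrative defaults:

from tensorflow.keras.optimizers import SGD, Adam

# Plain stochastic gradient descent with a fixed learning rate
model.compile(optimizer=SGD(learning_rate=0.01),
              loss='binary_crossentropy', metrics=['accuracy'])

# Adam adapts per-weight step sizes from running gradient statistics
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy', metrics=['accuracy'])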
Overfitting
• Overfitting occurs when a model learns to fit the training data too
closely, capturing noise and irrelevant patterns in addition to the
underlying relationships.
• A model that overfits may perform very well on the training data but
generalize poorly to new, unseen data.
• Overfitting can happen when the model is too complex relative to the
amount of training data available, leading to excessive
parameterization and memorization of the training examples.
Overfitting -- Example
• Small or similar weights can contribute to overfitting by reducing the
model's capacity to learn meaningful patterns in the data.
• If weights become too small or similar, neurons in the network may
fail to distinguish between different features or input patterns
effectively.
• This can result in a loss of discriminative power and a reduction in the
model's ability to learn and generalize from the data.
Underfitting
• Underfitting often occurs when the model is too simple or has
insufficient capacity to capture the complexity of the underlying data
distribution.

• Underfitting can also occur when the model is not trained for a
sufficient number of iterations (epochs) or when the training dataset is
too small or not representative of the true data distribution.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Load data from the text file
data = pd.read_csv('path/to/your/sample_data.txt')

# Assuming 'Age' and 'BMI' are the input features and 'Target' is the target variable
X = data[['Age', 'BMI']]
y = data['Target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model
model = Sequential()
model.add(Dense(units=1, activation='relu', input_dim=2))  # Assuming two input features (Age and BMI)
model.add(Dense(units=1, activation='sigmoid'))  # Assuming binary classification, use 'sigmoid' activation

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Neurons parameter
• The number of neurons in the first layer (or any hidden layer) of a
neural network is a hyperparameter that you can tune based on the
complexity of the problem you're trying to solve.
• The choice of the number of neurons depends on factors such as the
complexity of the data, the relationships between features, and the
nature of the underlying patterns.
A More Complex Network Architecture
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Assuming X_data has 10 features
input_features = 10
model = Sequential()
# First hidden layer with more neurons
model.add(Dense(units=64, activation='relu', input_dim=input_features))
# Second hidden layer
model.add(Dense(units=32, activation='relu'))
# Output layer
model.add(Dense(units=1, activation='sigmoid'))
A Complex Design
• The first hidden layer has 64 neurons, which allows the model to learn
more complex representations from the input data.
• Each neuron in this layer can learn different patterns and relationships
within the data.
• The second hidden layer with 32 neurons provides additional capacity
for the model to capture hierarchical features and relationships.
• This layer can learn to combine features learned by the first layer in a
more sophisticated manner.
• The output layer remains the same with 1 neuron and a sigmoid
activation function, suitable for binary classification.
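As a sketch of how the architecture above would be compiled and trained; X_data and y_data are hypothetical arrays (1000 samples, 10 features, binary labels), and the epoch count and batch size are illustrative:

import numpy as np

# Hypothetical data matching the assumed 10 input features
X_data = np.random.rand(1000, 10)
y_data = np.random.randint(0, 2, size=(1000,))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_data, y_data, epochs=20, batch_size=32, validation_split=0.2)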
Keras Main Functions

Deep Learning
Sequential Model
• keras.models.Sequential()
• Used to create a linear stack of layers
• Appropriate for a plain stack of layers, where each layer has exactly one input tensor and one output tensor
• Not suitable when the model has multiple inputs or multiple outputs
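For models with multiple inputs or outputs, the Keras functional API (keras.Model) is the usual alternative. A minimal sketch with two hypothetical input branches (the feature counts are illustrative):

from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Two hypothetical inputs, e.g. demographics (4 features) and lab values (6 features)
in_a = Input(shape=(4,))
in_b = Input(shape=(6,))
merged = concatenate([in_a, in_b])
out = Dense(1, activation='sigmoid')(merged)
multi_input_model = Model(inputs=[in_a, in_b], outputs=out)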
Dense Layer
• keras.layers.Dense()
• A fully connected layer
Convolutional Layer
• keras.layers.Conv2D()
• Used for convolution operations.
• Creates a convolutional layer for 2D spatial convolution over images
• Mainly used for image classification, object detection, and image
segmentation
Convolution Additional Details
• Convolution involves combining two functions to produce a third
function
• Input Image: This represents the original image or input signal that
you want to process.
• Kernel or Filter: This is a small matrix of weights (also known as a
filter or kernel) that is used for the convolution operation.
Convolution..
Input Image:        Kernel:
[1 2 3]             [0 1]
[4 5 6]             [1 0]
[7 8 9]

Output[0,0] = (1*0) + (2*1) + (4*1) + (5*0) = 6
Output[0,1] = (2*0) + (3*1) + (5*1) + (6*0) = 8
Output[1,0] = (4*0) + (5*1) + (7*1) + (8*0) = 12
Output[1,1] = (5*0) + (6*1) + (8*1) + (9*0) = 14

Some predefined kernels include: Sobel, Gaussian, Sharpening, etc.
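A short NumPy sketch that reproduces the 2x2 output above by sliding the kernel over the image; it is written as an explicit loop for clarity rather than calling a library convolution routine:

import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[0, 1],
                   [1, 0]])

# Output size: (3 - 2 + 1) x (3 - 2 + 1) = 2 x 2
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # Multiply the 2x2 patch element-wise by the kernel and sum
        out[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)

print(out)  # [[ 6.  8.]
            #  [12. 14.]]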


Pooling Layer
• keras.layers.MaxPooling2D() or keras.layers.AveragePooling2D()
• Performs downsampling operation.
• Max pooling is a downsampling operation commonly applied after
convolutional layers to reduce the spatial dimensions of the feature
maps while retaining important features
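A hedged sketch of how Conv2D and MaxPooling2D fit together in a small image classifier; the 28x28x1 input shape and the 10-class output are illustrative assumptions (e.g. MNIST-sized grayscale images):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential()
# 32 filters of size 3x3 slide over the 28x28x1 input image
cnn.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',
               input_shape=(28, 28, 1)))
# 2x2 max pooling halves each spatial dimension while keeping the strongest activations
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Flatten())
cnn.add(Dense(units=10, activation='softmax'))  # assuming 10 classes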
Recurrent Layer
• keras.layers.LSTM(), keras.layers.GRU(), etc.
• Used for recurrent neural networks.
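A minimal sketch of a recurrent layer applied to sequence data; the sequence length of 50 time steps and 8 features per step are illustrative assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

rnn = Sequential()
# Input: sequences of 50 time steps, each described by 8 features
rnn.add(LSTM(units=16, input_shape=(50, 8)))
rnn.add(Dense(units=1, activation='sigmoid'))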
Activation Functions
• keras.activations.relu(), keras.activations.sigmoid(),
keras.activations.softmax(), etc.
• Applies an activation function to an output.
Dropout
• keras.layers.Dropout()
• Used for regularization by randomly setting a fraction of input units to
0 at each update during training.
Batch Normalization
• keras.layers.BatchNormalization()
• Used to normalize the activations of the previous layer at each batch.
Optimizer
• keras.optimizers.SGD(), keras.optimizers.Adam(), etc.
• Algorithms used to update the weights of the network.
Loss Function
• keras.losses.categorical_crossentropy(),
keras.losses.mean_squared_error(), etc.
• Used to compute the quantity that a model should seek to minimize
during training.
Metrics
• keras.metrics.Accuracy(), keras.metrics.Precision(),
keras.metrics.Recall(), etc.
• Used to evaluate the performance of a model.
Model Compilation
• model.compile()
• Configures the model for training.
Model Training
• model.fit()
• Trains the model for a fixed number of epochs.
Model Evaluation
• model.evaluate()
• Evaluates the model on a testing dataset.
Model Prediction
• model.predict()
• Generates output predictions for the input samples.
Model Summary
• model.summary()
• Prints a summary representation of the model.
Model Saving/Loading
• model.save() and keras.models.load_model()
• Used to save and load model weights and architecture
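Putting the last few functions together, a hedged end-to-end sketch that reuses the Age/BMI model and test split from the earlier example; the filename 'dm_model.h5' is an illustrative choice:

# Evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test)

# Generate probability predictions for new samples
probs = model.predict(X_test)

# Print a layer-by-layer summary of the architecture
model.summary()

# Save the architecture and weights, then load them back
model.save('dm_model.h5')
from tensorflow.keras.models import load_model
restored = load_model('dm_model.h5')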
