Deep Learning Module-04 Search Creators
Module-04
Convolutional Networks
1. Definition of Convolution
• Definition: An operation that slides a small filter (kernel) over the input and computes a
weighted sum at each position, producing a feature map.
• Purpose: Captures important patterns and structures in the input data, crucial for tasks like
image recognition.
2. Mathematical Formulation
• For a 2D input I and kernel K, the convolution is defined as:
S(i, j) = (I * K)(i, j) = Σₘ Σₙ I(m, n) K(i − m, j − n)
3. Parameters of Convolution
a. Stride
• Definition: The number of pixels the filter moves over the input at each step.
• Types:
o Stride of 1: Filter moves one pixel at a time, preserving most spatial detail.
o Stride of 2: Filter moves two pixels at a time, reducing output size (downsampling).
b. Padding
• Definition: Extra pixels (usually zeros) added around the border of the input before
convolution.
• Types:
o Valid Padding: No padding; the output shrinks with each convolution.
o Same Padding: Padding applied to maintain the same output dimensions as the
input.
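As a sketch of how stride and padding interact, here is a minimal NumPy implementation (the name `conv2d` and the 4x4 example are illustrative, not from any particular library):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Cross-correlation (as used in most CNN libraries) with
    optional zero padding and stride."""
    if padding > 0:
        image = np.pad(image, padding)  # zero-pad all four borders
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1        # output height from the size formula
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((3, 3))
print(conv2d(x, k).shape)                        # valid, stride 1 -> (2, 2)
print(conv2d(x, k, stride=1, padding=1).shape)   # same padding    -> (4, 4)
print(conv2d(x, k, stride=2, padding=1).shape)   # downsampled     -> (2, 2)
```

With padding 1 and stride 1 the 3x3 filter preserves the 4x4 input size ("same" padding); raising the stride to 2 halves each spatial dimension.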
Pooling
1. Purpose of Pooling
• Downsampling: Reduces the spatial dimensions of feature maps, lowering computation and
providing a degree of translation invariance.
2. Types of Pooling
a. Max Pooling
• Definition: Selects the maximum value from each patch (sub-region) of the feature map.
• Purpose: Captures the most prominent features while reducing spatial dimensions.
b. Average Pooling
• Definition: Takes the average value from each patch of the feature map.
3. Operation of Pooling
• Feature Extraction: Reduces the size of the feature maps while retaining the most relevant
features.
• Robustness: Provides a degree of invariance to small translations in the input, making the
model more robust.
4. Convolution and Pooling Together
• Focus on Local Patterns: Emphasizes the importance of local patterns in the data (e.g.,
edges and textures) over global patterns.
• Feature Learning: Both operations prioritize local features, enabling efficient learning of
essential characteristics from input data.
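Max and average pooling can be sketched with the same sliding-window loop (the name `pool2d` is illustrative):

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, mode="max"):
    """Slide a size x size window over the feature map and keep either
    the maximum or the average of each patch."""
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    agg = np.max if mode == "max" else np.mean
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = agg(fmap[i*stride:i*stride+size,
                                 j*stride:j*stride+size])
    return out

f = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 3.],
              [1., 0., 4., 5.]])
print(pool2d(f, mode="max"))   # [[4. 8.] [1. 5.]]
print(pool2d(f, mode="avg"))   # [[2.5 6.5] [0.5 3.5]]
```

Note how max pooling keeps only the strongest activation in each 2x2 patch, while average pooling smooths over the patch; both halve each spatial dimension.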
Variants of Convolution
1. Dilated Convolutions
• Definition: Convolutions that insert gaps (a dilation rate) between kernel elements,
expanding the receptive field without adding weights.
• Wider Context: Allows the model to incorporate a wider context of the input data without
significantly increasing the number of parameters.
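A minimal 1D sketch of dilation (pure NumPy; `dilated_conv1d` is an illustrative name): spacing the kernel taps `dilation` apart widens the receptive field while the number of weights stays fixed.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """1D convolution whose kernel taps are spaced `dilation` apart,
    widening the receptive field without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1   # effective receptive field in inputs
    out = [sum(kernel[t] * x[i + t * dilation] for t in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out), span

x = np.arange(8, dtype=float)
k = np.array([1., 1., 1.])
y1, span1 = dilated_conv1d(x, k, dilation=1)  # sees 3 inputs per output
y2, span2 = dilated_conv1d(x, k, dilation=2)  # sees 5 inputs, same 3 weights
```

With dilation 2, each output sums every other input over a span of 5 positions, yet the kernel still has only 3 parameters.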
2. Depthwise Separable Convolutions
• Two-Stage Process:
o Depthwise Convolution: Applies a single spatial filter to each input channel
separately.
o Pointwise Convolution: Uses 1x1 convolutions to combine the outputs from the
depthwise convolution.
• Applications: Commonly used in lightweight models, such as MobileNets, for mobile and
edge devices.
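The parameter savings from splitting a convolution into depthwise and pointwise stages can be checked with simple arithmetic (the 64-to-128-channel sizes below are made-up examples):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution over c_in channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in           # one k x k filter per input channel
    pointwise = 1 * 1 * c_in * c_out   # 1x1 conv mixes the channels
    return depthwise + pointwise

# e.g. a 3x3 kernel mapping 64 channels to 128
standard = conv_params(3, 64, 128)                   # 73728 weights
separable = depthwise_separable_params(3, 64, 128)   # 576 + 8192 = 8768
print(standard / separable)                          # roughly 8.4x fewer
```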
Structured Outputs
1. Definition
• Structured Outputs: Refers to tasks where the output has a specific structure or spatial
arrangement, such as pixel-wise predictions in image segmentation or keypoint localization
in object detection.
2. Importance of Spatial Structure
• Maintaining Spatial Structure: For tasks like semantic segmentation, it’s crucial to
maintain the spatial relationships between pixels in predictions to ensure that the output
accurately represents the original input image.
3. Specialized Networks
• Skip Connections: Techniques like skip connections (used in U-Net and ResNet) help
preserve high-resolution features from earlier layers, improving the accuracy of the output.
4. Loss Functions
• Pixel-wise Loss: Evaluating the loss on a per-pixel basis (e.g., Cross-Entropy Loss
for segmentation).
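A per-pixel cross-entropy can be sketched in a few lines of NumPy (the function name and the 2x2, 2-class shapes are illustrative):

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels):
    """Mean negative log-likelihood over every pixel.
    probs: (H, W, C) softmax outputs; labels: (H, W) integer class ids."""
    h, w, _ = probs.shape
    # pick, at each pixel, the probability assigned to the true class
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(picked).mean())

# 2x2 "image", 2 classes; the model is fairly confident in the true class
probs = np.array([[[0.9, 0.1], [0.8, 0.2]],
                  [[0.3, 0.7], [0.1, 0.9]]])
labels = np.array([[0, 0], [1, 1]])
loss = pixelwise_cross_entropy(probs, labels)   # about 0.198
```

Averaging over pixels rather than over whole images is what keeps the loss sensitive to the spatial arrangement of the prediction.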
5. Applications
• Use Cases: Structured output networks are widely used in various applications, including:
o Object Detection: Predicting bounding boxes and class labels for objects in an
image while maintaining spatial relations.
Data Types
1. 2D Images
• Standard Input: The most common input type for CNNs, typically used in image
classification, object detection, and segmentation tasks.
• Format: Represented as height × width × channels (e.g., RGB images have three channels).
2. 3D Data
• Definition: Includes video processing and volumetric data, such as those found in medical
imaging (e.g., MRI or CT scans).
3. 1D Data
• Definition: Sequential data such as audio waveforms or time series.
• Applications: Used in tasks like speech recognition, audio classification, and analyzing
sensor data from IoT devices.
Efficient Convolution Algorithms
1. Fast Fourier Transform (FFT)
• Definition: A mathematical algorithm that computes the discrete Fourier transform (DFT)
and its inverse, converting signals between time (or spatial) domain and frequency domain.
• Use in Convolution: Convolution in the spatial domain becomes element-wise
multiplication in the frequency domain, which is faster for large inputs.
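By the convolution theorem, convolution in the signal domain equals multiplication in the frequency domain; a minimal NumPy sketch:

```python
import numpy as np

def fft_convolve(x, k):
    """Linear convolution via the convolution theorem:
    transform, multiply elementwise, transform back."""
    n = len(x) + len(k) - 1             # full output length
    return np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(k, n)))

x = np.array([1., 2., 3., 4.])
k = np.array([1., 0., -1.])
direct = np.convolve(x, k)              # O(n*m) direct convolution
fast = fft_convolve(x, k)               # O(n log n) via FFT
print(np.allclose(direct, fast))        # True
```

For the tiny arrays here the direct method is of course faster; the FFT route pays off when both signal and kernel are large.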
2. Winograd's Algorithms
• Efficiency Improvement:
o They can reduce the complexity of convolution operations, particularly for small
kernels, making them more efficient in terms of computational resources.
• Key Concepts:
o The algorithms break down the convolution operation into smaller components,
allowing for fewer multiplicative operations and leveraging addition and
subtraction instead.
Random or Unsupervised Features
1. Random Features
• Definition: A technique that uses random projections to map input data into a
higher-dimensional space, facilitating the extraction of features without the need for labels.
• Purpose: Helps to approximate kernel methods, enabling linear models to learn complex
functions.
• Advantages:
o Scalability: Suitable for large datasets as it allows for faster training times.
• Applications: Commonly used in tasks where labeled data is scarce, such as clustering and
anomaly detection.
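One common instantiation is random Fourier features in the style of Rahimi and Recht, which approximate an RBF kernel; the sketch below uses illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, dim, gamma=1.0):
    """Map X into `dim` random cosine features; inner products of the
    mapped points approximate an RBF kernel (no labels required)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, dim))  # random directions
    b = rng.uniform(0, 2 * np.pi, size=dim)                  # random phases
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)

X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, dim=200)   # (5, 200) feature matrix
# A plain linear model trained on Z can now fit kernel-like decision boundaries.
```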
2. Autoencoders
• Structure:
o Encoder: Compresses the input into a lower-dimensional latent representation.
o Decoder: Reconstructs the input from that latent representation.
• Purpose: Learns to capture important features and structures in the data without
supervision, making it effective for dimensionality reduction and feature extraction.
• Advantages:
o Robustness: Can learn from noisy data and still produce meaningful
representations.
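The encoder/decoder structure can be sketched as an untrained forward pass (weights are random and purely illustrative; 8 and 3 are made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder compresses 8-dimensional inputs to a 3-dimensional code;
# decoder maps the code back to 8 dimensions.
W_enc = rng.normal(size=(8, 3))
W_dec = rng.normal(size=(3, 8))

def encode(x):
    return np.tanh(x @ W_enc)   # bottleneck: the learned representation

def decode(z):
    return z @ W_dec            # reconstruction of the input

x = rng.normal(size=(4, 8))
z = encode(x)                   # (4, 3) compressed codes
x_hat = decode(z)               # (4, 8) reconstructions
# Training would adjust W_enc and W_dec to minimize
# the reconstruction error np.mean((x - x_hat) ** 2).
```

The bottleneck (3 < 8) is what forces the network to keep only the most informative structure of the data.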
Notable Architectures
1. LeNet-5
• Introduction:
o Proposed by Yann LeCun and colleagues (1998) for handwritten digit recognition.
o One of the first convolutional networks designed specifically for image recognition
tasks.
• Architecture Details:
o Convolutional Layer 1: 6 filters (5x5).
o Pooling Layer 1: 2x2 subsampling.
o Convolutional Layer 2: 16 filters (5x5).
o Pooling Layer 2: 2x2 subsampling.
• Significance:
o Introduced the concept of using convolutional layers for feature extraction followed
by pooling layers for dimensionality reduction.
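Assuming the standard LeNet-5 configuration (32x32 input, 5x5 convolutions with 6 then 16 filters, 2x2 pooling), the feature-map sizes follow from the usual output-size formula:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

s = 32                           # 32x32 input image
s = conv_out(s, 5)               # Conv1: 6 filters 5x5   -> 28
s = conv_out(s, 2, stride=2)     # Pool1: 2x2             -> 14
s = conv_out(s, 5)               # Conv2: 16 filters 5x5  -> 10
s = conv_out(s, 2, stride=2)     # Pool2: 2x2             -> 5
# Final stage: 16 feature maps of 5x5 = 400 values feeding the dense layers
print(s, 16 * s * s)             # 5 400
```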
2. AlexNet
• Introduction:
o Introduced by Krizhevsky, Sutskever, and Hinton; won the ImageNet (ILSVRC)
competition in 2012 by a large margin.
• Architecture Details:
o Convolutional Layer 1: 96 filters (11x11), stride 4.
o Pooling Layer 1: 3x3 max pooling, stride 2.
o Convolutional Layer 2: 256 filters (5x5).
o Pooling Layer 2: 3x3 max pooling, stride 2.
o Convolutional Layer 3: 384 filters (3x3).
o Convolutional Layer 4: 384 filters (3x3).
o Convolutional Layer 5: 256 filters (3x3).
o Pooling Layer 3: 3x3 max pooling, stride 2.
• Training Innovations:
o ReLU Activation: Non-saturating activation that sped up training considerably.
o Dropout: Randomly drops units in the fully connected layers to reduce overfitting.
o Data Augmentation: Crops, flips, and color perturbations enlarge the effective
training set.
o GPU Utilization: Trained on two GPUs in parallel, making the large model practical.
• Significance:
o Highlighted the importance of large labeled datasets and robust training techniques
in achieving state-of-the-art performance.
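Under the commonly cited AlexNet configuration (227x227 input, padding 2 on the 5x5 layer and padding 1 on the 3x3 layers), the same output-size formula traces the spatial dimensions through the stack:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

s = 227
s = conv_out(s, 11, stride=4)    # Conv1: 96 filters 11x11/4  -> 55
s = conv_out(s, 3, stride=2)     # Pool1: 3x3/2 max pool      -> 27
s = conv_out(s, 5, padding=2)    # Conv2: 256 filters 5x5     -> 27
s = conv_out(s, 3, stride=2)     # Pool2                      -> 13
s = conv_out(s, 3, padding=1)    # Conv3: 384 filters 3x3     -> 13
s = conv_out(s, 3, padding=1)    # Conv4: 384 filters 3x3     -> 13
s = conv_out(s, 3, padding=1)    # Conv5: 256 filters 3x3     -> 13
s = conv_out(s, 3, stride=2)     # Pool3                      -> 6
# 256 maps of 6x6 = 9216 values feed the fully connected layers
print(s, 256 * s * s)            # 6 9216
```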