0% found this document useful (0 votes)
4 views

cnn-notes-architecture

Convolutional Neural Networks (CNNs) are deep learning models designed for processing grid-like data, excelling in tasks such as image classification and object detection. The architecture consists of multiple layers, including convolutional, pooling, and fully connected layers, which work together to extract and transform features from input data. CNNs have numerous applications across various fields, including image classification, object detection, and medical imaging.

Uploaded by

nssurlec
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

cnn-notes-architecture

Convolutional Neural Networks (CNNs) are deep learning models designed for processing grid-like data, excelling in tasks such as image classification and object detection. The architecture consists of multiple layers, including convolutional, pooling, and fully connected layers, which work together to extract and transform features from input data. CNNs have numerous applications across various fields, including image classification, object detection, and medical imaging.

Uploaded by

nssurlec
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a class of deep learning models


specifically designed for processing grid-like data, such as images, videos, and
audio. CNNs are inspired by the biological visual cortex and are highly effective
for tasks like image classification, object detection, segmentation, and more. The
architecture of a CNN is composed of multiple layers, each serving a specific
purpose in extracting and transforming features from the input data.
Key Components of CNN Architecture
1. Input Layer:
- The input layer takes in the raw data, such as an image (e.g., a 3D tensor of
height × width × channels for RGB images).
2. Convolutional Layers:
- These layers apply convolution operations to the input using learnable filters
(kernels).
- Each filter detects specific features (e.g., edges, textures, patterns) in the input.
- Multiple filters are used to extract different features, producing a stack of
feature maps as output.
3. Activation Function:
- After convolution, an activation function (e.g., ReLU) is applied to introduce
non-linearity into the model.
- ReLU (Rectified Linear Unit) is the most commonly used activation function:

𝑅𝑒𝐿𝑈(𝑥) = max(𝑜, 𝑥)
4. Pooling Layers:
- Pooling layers down sample the feature maps, reducing their spatial
dimensions while retaining the most important information.
- Common types of pooling:
- Max Pooling: Selects the maximum value in each pooling region.
- Average Pooling: Computes the average value in each pooling region.
- Pooling reduces computational complexity and helps prevent overfitting.
5. Fully Connected (Dense) Layers:
- After multiple convolutional and pooling layers, the feature maps are flattened
into a 1D vector and passed to fully connected layers.
- These layers combine the extracted features to make predictions (e.g.,
classifying an image into categories).
6. Output Layer:
- The output layer produces the final predictions, such as class probabilities in
classification tasks.
- Common activation functions for the output layer:
- Softmax: For multi-class classification.
- Sigmoid: For binary classification.
Typical CNN Architecture
A typical CNN architecture consists of the following sequence of layers:
1. Input Layer:
- Takes in the raw image or data.
2. Convolutional Block:
- Convolutional Layer: Applies filters to extract features.
- Activation Function: Introduces non-linearity (e.g., ReLU).
- Pooling Layer: Down samples the feature maps.
3. Repeat Convolutional Blocks:
- Multiple convolutional blocks are stacked to extract hierarchical features (e.g.,
edges → textures → shapes → objects).
4. Flatten Layer:
- Converts the 3D feature maps into a 1D vector for input to fully connected
layers.
5. Fully Connected Layers:
- Combines features to make predictions.
6. Output Layer:
- Produces the final output (e.g., class probabilities).
Example: LeNet-5 (Early CNN Architecture)
LeNet-5, developed by Yann LeCun in 1998, is one of the earliest CNN
architectures used for handwritten digit recognition. Its architecture is as follows:
1. Input Layer: Grayscale image (32x32 pixels).
2. Convolutional Layer: 6 filters of size 5x5, stride 1.
3. Pooling Layer: Max pooling with 2x2 window, stride 2.
4. Convolutional Layer: 16 filters of size 5x5, stride 1.
5. Pooling Layer: Max pooling with 2x2 window, stride 2.
6. Fully Connected Layers: 120 → 84 neurons.
7. Output Layer: 10 neurons (for digit classification).
Modern CNN Architectures
Over the years, more advanced CNN architectures have been developed,
including:
1. AlexNet:
- Introduced ReLU activation, dropout, and data augmentation.
- Won the ImageNet competition in 2012.
2. VGGNet:
- Uses smaller 3x3 filters and deeper networks (e.g., VGG16, VGG19).
3. ResNet (Residual Networks):
- Introduces skip connections (residual blocks) to enable training of very deep
networks (e.g., ResNet-50, ResNet-101).
4. Inception (GoogLeNet):
- Uses multi-scale filters within the same layer (Inception modules).
5. Efficient Net:
- Balances depth, width, and resolution for efficient scaling.
Advantages of CNNs
1. Automatic Feature Extraction:
- CNNs learn relevant features directly from the data, eliminating the need for
manual feature engineering.
2. Parameter Sharing:
- Filters are shared across the input, reducing the number of parameters and
computational cost.
3. Translation Invariance:
- CNNs can detect features regardless of their position in the input.
4. Hierarchical Feature Learning:
- Early layers detect low-level features (e.g., edges), while deeper layers detect
high-level features (e.g., objects).
Applications of CNNs
1. Image Classification:
- Assigning labels to images (e.g., cat vs. dog).
2. Object Detection:
- Locating and classifying objects within an image (e.g., YOLO, Faster R-
CNN).
3. Semantic Segmentation:
- Assigning a label to each pixel in an image (e.g., identifying roads, buildings).
4. Face Recognition:
- Identifying or verifying individuals from images or videos.
5. Medical Imaging:
- Detecting diseases from X-rays, MRIs, etc.
6. Natural Language Processing (NLP):
- Text classification, sentiment analysis, etc.
In summary, CNNs are powerful and versatile models that have revolutionized
computer vision and other fields. Their ability to automatically learn hierarchical
features from data makes them a cornerstone of modern deep learning.

You might also like