VGG16 Architecture

The VGG16 architecture is a convolutional neural network model for image classification and recognition. It contains 13 convolutional layers and 3 fully connected layers, for a total of 16 layers, hence the name VGG16. The input is a 224x224x3 image that passes through the convolutional and max pooling layers to extract features before the fully connected layers for classification output. Key layers include multiple 3x3 convolutional layers with padding and max pooling layers to reduce spatial dimensions.
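The shape bookkeeping implied above can be traced in plain Python (no deep-learning library needed). The sketch below encodes the standard VGG16 configuration, where every 3×3 convolution uses padding 1 and stride 1 (so it preserves height and width) and every max pool is 2×2 with stride 2 (so it halves them):

```python
# VGG16 feature-extractor configuration:
# a number = conv layer with that many output channels, "M" = 2x2 max pool
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

def trace_shapes(h=224, w=224, c=3):
    """Return the (H, W, C) shape after each layer, starting from the input."""
    shapes = [(h, w, c)]
    for v in CFG:
        if v == "M":             # 2x2 max pool, stride 2: halves H and W
            h, w = h // 2, w // 2
        else:                    # 3x3 conv, pad 1, stride 1: H and W unchanged
            c = v
        shapes.append((h, w, c))
    return shapes

print(trace_shapes()[-1])  # final feature map before the FC layers: (7, 7, 512)
```

Running this confirms the 224×224×3 input ends up as a 7×7×512 feature map, which is then flattened and fed to the fully connected layers.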

Uploaded by Mehak Smagh

VGG16 architecture
• Input layer: 224×224×3
• CL1: 64 kernels, filter size 3×3, stride 1, pad 1
• CL2: 64 kernels, filter size 3×3, stride 1, pad 1
• Max pool (ML1): 2×2, stride 2
  224×224×64 → 112×112×64
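The size changes above follow the usual output-size formula, out = (in + 2·pad − kernel) / stride + 1. A small sketch checking both cases:

```python
def conv_out(n, k, p, s):
    """Output spatial size for input size n, kernel k, padding p, stride s."""
    return (n + 2 * p - k) // s + 1

# 3x3 conv with pad=1, stride=1 preserves the spatial size:
print(conv_out(224, k=3, p=1, s=1))  # 224
# 2x2 max pool with stride=2 halves it:
print(conv_out(224, k=2, p=0, s=2))  # 112
```

The same formula explains every stage below: convolutions never shrink the feature map, and each pooling layer halves it.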


• CL3: 128 kernels, 3×3, pad=1
• CL4: 128 kernels, 3×3, pad=1
• Max pool (ML2): 2×2, stride 2
  112×112×128 → 56×56×128


• CL5: 256 kernels, 3×3, pad=1
• CL6: 256 kernels, 3×3, pad=1
• CL7: 256 kernels, 3×3, pad=1
• Max pool (ML3): 2×2, stride 2
  56×56×256 → 28×28×256


• CL8: 512 kernels, 3×3, pad=1
• CL9: 512 kernels, 3×3, pad=1
• CL10: 512 kernels, 3×3, pad=1
• Max pool (ML4): 2×2, stride 2
  28×28×512 → 14×14×512


• CL11: 512 kernels, 3×3, pad=1
• CL12: 512 kernels, 3×3, pad=1
• CL13: 512 kernels, 3×3, pad=1
• Max pool (ML5): 2×2, stride 2
  14×14×512 → 7×7×512


• FC(1): 4096 neurons
• FC(2): 4096 neurons
• FC(3): 1000 neurons
• Output layer: softmax over the 1000 class scores
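Each fully connected layer contributes in×out weights plus one bias per output neuron, so its parameter counts can be checked by hand:

```python
def fc_params(n_in, n_out):
    # weight matrix plus one bias per output neuron
    return n_in * n_out + n_out

flat = 7 * 7 * 512            # 25088 inputs after flattening ML5's output
print(fc_params(flat, 4096))  # FC1: 102764544
print(fc_params(4096, 4096))  # FC2: 16781312
print(fc_params(4096, 1000))  # FC3: 4097000
```

FC1 alone holds over 100 million parameters, by far the largest single layer in the network.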
Layer (type)   Output Shape      Param #
CL1            (224, 224, 64)    1792       (= ((3×3×3)+1)×64)
CL2            (224, 224, 64)    36928      (= ((3×3×64)+1)×64)
ML1            (112, 112, 64)    0
CL3            (112, 112, 128)   73856
CL4            (112, 112, 128)   147584
ML2            (56, 56, 128)     0
CL5            (56, 56, 256)     295168
CL6            (56, 56, 256)     590080
CL7            (56, 56, 256)     590080
ML3            (28, 28, 256)     0
CL8            (28, 28, 512)     1180160
CL9            (28, 28, 512)     2359808
CL10           (28, 28, 512)     2359808
ML4            (14, 14, 512)     0
CL11           (14, 14, 512)     2359808
CL12           (14, 14, 512)     2359808
CL13           (14, 14, 512)     2359808
ML5            (7, 7, 512)       0          (flattened: 7×7×512 = 25088)
FC1            (4096)            102764544
FC2            (4096)            16781312
FC3/Output     (1000)            4097000
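The per-layer counts follow one formula for convolutions, (3·3·c_in + 1)·c_out, and summing them together with the FC layers reproduces VGG16's well-known total of about 138M parameters:

```python
def conv_params(c_in, c_out, k=3):
    # k*k*c_in weights plus 1 bias, per output kernel
    return (k * k * c_in + 1) * c_out

# (in_channels, out_channels) for CL1..CL13
convs = [(3, 64), (64, 64), (64, 128), (128, 128),
         (128, 256), (256, 256), (256, 256),
         (256, 512), (512, 512), (512, 512),
         (512, 512), (512, 512), (512, 512)]

total = sum(conv_params(a, b) for a, b in convs)
total += 25088 * 4096 + 4096   # FC1
total += 4096 * 4096 + 4096    # FC2
total += 4096 * 1000 + 1000    # FC3
print(total)  # 138357544
```

Note that the pooling layers contribute nothing: they have no learnable weights.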
Why the name VGG16?
• Convolutional layers = 13
• Fully connected layers = 3
• Total weighted layers = 13 + 3 = 16
(The five max pooling layers are not counted, since they have no learnable parameters.)
(Figure: VGG16 architecture diagram, from https://medium.com/mlearning-ai/an-overview-of-vgg16-and-nin-models-96e4bf398484)
References
[1] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[2] Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
[3] https://medium.com/mlearning-ai/an-overview-of-vgg16-and-nin-models-96e4bf398484
