Presentation1
2D and 3D Object Detection
Team 3
Jua
Traditional Object Detection
RCNN Overview:
• Introduced by Ross Girshick et al.
• Region Proposal: Uses selective search to
propose about 2,000 candidate regions
(bounding boxes) per image
• CNN-Based Feature Extraction: Warps each
region to a fixed size and runs a CNN on it
to extract features
• Classification: Scores the features with
class-specific linear SVMs
Limitations:
• Slow: the CNN runs once per proposed region,
and region proposal, feature extraction, and
classification are separate stages
• Too slow for real-time use
RCNN (Region-Based CNN)
Fast RCNN and Faster RCNN
Fast RCNN:
• Runs the CNN once per image, then classifies
all regions from the shared feature map in a
single forward pass
• Uses ROI Pooling to extract a fixed-size
feature for each region from that map
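ROI Pooling can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as nested lists and an ROI given in feature-map cells; the function name `roi_max_pool` and the floor/ceil bin boundaries are illustrative choices, and real implementations differ in how they quantize bins.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool one ROI of a 2D feature map into an output_size x output_size grid.

    feature_map: list of rows (H x W) of numbers.
    roi: (x0, y0, x1, y1) in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(output_size):
        # Bin boundaries along y: floor for the start, ceil for the end,
        # so every bin covers at least one cell.
        ys = y0 + (i * roi_h) // output_size
        ye = max(ys + 1, y0 + ((i + 1) * roi_h + output_size - 1) // output_size)
        row = []
        for j in range(output_size):
            xs = x0 + (j * roi_w) // output_size
            xe = max(xs + 1, x0 + ((j + 1) * roi_w + output_size - 1) // output_size)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


feature_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
# ROIs of different sizes both come out as 2 x 2 grids.
full = roi_max_pool(feature_map, (0, 0, 4, 4))
part = roi_max_pool(feature_map, (1, 1, 4, 4))
```

Whatever the ROI size, the output grid has the same shape, which is what lets one set of fully connected layers classify every region.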
Faster RCNN:
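• Introduces the Region Proposal
Network (RPN), which generates proposals
from the same convolutional features
instead of running selective search
• Significantly faster than RCNN and Fast
RCNN, because proposals no longer come
from a separate algorithm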
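As a rough sketch of what the RPN scores: at every feature-map location it considers a fixed set of anchor boxes of several scales and aspect ratios. The defaults below (stride 16, three scales, three ratios, so 9 anchors per location) follow the Faster RCNN paper; the function name `make_anchors` is hypothetical.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x0, y0, x1, y1) in image pixels.

    One anchor per (location, scale, ratio); ratio is height / width.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Keep the area at scale**2 while changing the shape.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
```

The RPN then predicts, for each anchor, an objectness score and a box refinement; only the anchor layout is shown here.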
[Figures: Fast RCNN; Faster RCNN]
YOLO (You Only Look Once)
YOLO Overview:
• Single-stage detector: Predicts bounding boxes and class
probabilities directly in one pass, with no separate
region proposal step
• Speed: Real-time object detection
• Divides the image into grid cells; each cell predicts
bounding boxes for objects whose center falls inside it
Strengths:
• Extremely fast
• Real-time detection for video processing
Weaknesses:
• Struggles to detect small objects and objects that are
close together, since each grid cell predicts only a few boxes
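The grid-cell idea can be sketched as a target-encoding step. This assumes the original YOLO (v1) convention: a 7 x 7 grid, center offsets relative to the responsible cell, and width/height normalized by the image size. The function name `encode_yolo_target` is hypothetical.

```python
def encode_yolo_target(box, img_w, img_h, grid_size=7):
    """Map a ground-truth box (x0, y0, x1, y1) in pixels to its grid cell.

    Returns (row, col, (x_off, y_off, w, h)) where x_off and y_off are the
    center offsets inside the cell and w, h are fractions of the image size.
    """
    x0, y0, x1, y1 = box
    # Box center, normalized to [0, 1].
    cx = (x0 + x1) / 2 / img_w
    cy = (y0 + y1) / 2 / img_h
    # The cell containing the center is responsible for this object.
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    x_off = cx * grid_size - col
    y_off = cy * grid_size - row
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return row, col, (x_off, y_off, w, h)


row, col, target = encode_yolo_target((100, 200, 300, 400), img_w=448, img_h=448)
```

Two objects whose centers fall in the same cell compete for that cell's few predictions, which is one way to see why crowded scenes are hard for this design.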
SSD (Single Shot MultiBox Detector)
SSD Overview:
• Keeps single-stage speed comparable to YOLO while
improving accuracy
• Predicts objects at multiple scales using feature maps
from different layers
• No need for a separate region proposal network like in
Faster RCNN
Strengths:
• Good balance between speed and accuracy
• Multi-scale detection helps with objects of different
sizes
Weaknesses:
• Often less accurate than two-stage detectors like Faster
RCNN, especially on small objects
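The multi-scale design can be made concrete with the default-box scale rule from the SSD paper, where the scale grows linearly from s_min on the earliest feature map to s_max on the last. The values s_min = 0.2 and s_max = 0.9 are the paper's; the function names are hypothetical, and released implementations adjust some layers.

```python
import math


def ssd_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Default-box scale per feature map, as a fraction of the image size."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of the default boxes at one scale.

    Each box keeps area scale**2 while its aspect ratio (w / h) changes.
    """
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]


scales = ssd_scales()
shapes_first_map = default_box_shapes(scales[0])
```

Early, high-resolution feature maps get the small scales and late, coarse maps get the large ones, so each layer specializes in a size range.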
Anchor-Free Keypoint-Based Detection
Overview:
• Anchor-Free: Does not use predefined
anchor boxes
• Detects objects through keypoints (such as
center points or object corners)
• Examples: CornerNet (corners), CenterNet
(center points)
Strengths:
• Eliminates the complexity of anchor design
• More flexible for varying object shapes
Weaknesses:
• May struggle with overlapping objects or
cluttered scenes
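A minimal sketch of the keypoint idea, in the style of CenterNet: object centers are read off a predicted heatmap as local maxima in a 3 x 3 neighbourhood. The heatmap is assumed to be a nested list of scores in [0, 1] for one class; the function name `find_peaks` and the threshold are illustrative.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for cells that are 3x3 local maxima above threshold."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Compare against the 3x3 neighbourhood, clipped at the borders.
            neighbours = [heatmap[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))]
            if score >= max(neighbours):
                peaks.append((x, y, score))
    return peaks


heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.1],
    [0.0, 0.0, 0.1, 0.7],
]
centers = find_peaks(heatmap)
```

Two objects whose centers land on the same or adjacent heatmap cells collapse into one peak, which illustrates the weakness with overlapping objects noted above.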
Anchor-Free Anchor-Point-Based Detection
Overview:
• Instead of predefined anchor boxes, treats
feature-map locations as anchor points and
predicts a box directly from each point
• Simpler, and often faster, because no predefined
set of anchor boxes has to be designed
• Examples: FCOS, CrossDet
Strengths:
• Improves the efficiency of object detection
• Reduces false positives from misaligned anchors
Weaknesses:
• May lose some localization accuracy compared
to anchor-based models
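For FCOS specifically, the per-point prediction can be sketched as follows: a location inside a ground-truth box regresses its distances to the four sides, and a centerness score down-weights points far from the box center. The formulas follow the FCOS paper; the function name `fcos_targets` is hypothetical.

```python
import math


def fcos_targets(px, py, box):
    """Regression targets for point (px, py) against box (x0, y0, x1, y1).

    Returns ((l, t, r, b), centerness), or None if the point is not
    strictly inside the box.
    """
    x0, y0, x1, y1 = box
    # Distances from the point to the left, top, right, and bottom sides.
    l, t, r, b = px - x0, py - y0, x1 - px, y1 - py
    if min(l, t, r, b) <= 0:
        return None
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return (l, t, r, b), centerness


center_point = fcos_targets(50, 50, (0, 0, 100, 100))
off_center = fcos_targets(25, 50, (0, 0, 100, 100))
outside = fcos_targets(150, 50, (0, 0, 100, 100))
```

Centerness is 1 at the exact box center and falls toward 0 near the edges, so low-quality boxes predicted from edge points are suppressed at inference.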
DETR (DEtection TRansformer)
• Overview:
• Transformer encoder-decoder on top of a CNN backbone
• Treats object detection as a set prediction problem,
matching each prediction to at most one ground-truth
object during training
• No need for non-maximum suppression or anchor
boxes
• Strengths:
• Simplified architecture
• Global attention over the whole image helps with large
objects and with reasoning about relationships between
objects
• Weaknesses:
• Requires large amounts of data and computational
resources
• Slower convergence compared to CNN-based models
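The set-prediction view can be sketched as a one-to-one matching between predictions and ground-truth objects. This is a simplified illustration: the cost here is only the negative class probability plus an L1 box distance, and the search is brute force over permutations, which is only practical for tiny sets. DETR itself uses the Hungarian algorithm and also includes a generalized IoU term; the function name `match_predictions` is hypothetical.

```python
from itertools import permutations


def match_predictions(preds, targets):
    """Find the lowest-cost one-to-one assignment of predictions to targets.

    preds: list of (class_probs, box), class_probs indexable by class id.
    targets: list of (class_id, box). Boxes are equal-length tuples.
    Returns {target_index: prediction_index}.
    """
    def cost(pred, target):
        probs, pred_box = pred
        class_id, true_box = target
        l1 = sum(abs(p - t) for p, t in zip(pred_box, true_box))
        return -probs[class_id] + l1

    best, best_cost = (), float("inf")
    # Try every way of giving each target its own distinct prediction.
    for perm in permutations(range(len(preds)), len(targets)):
        total = sum(cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if total < best_cost:
            best, best_cost = perm, total
    return {t: p for t, p in enumerate(best)}


preds = [
    ([0.1, 0.9], (0.5, 0.5, 0.2, 0.2)),
    ([0.8, 0.2], (0.2, 0.3, 0.1, 0.1)),
    ([0.5, 0.5], (0.9, 0.9, 0.1, 0.1)),
]
targets = [(0, (0.2, 0.3, 0.1, 0.1)), (1, (0.5, 0.5, 0.2, 0.2))]
assignment = match_predictions(preds, targets)
```

Predictions left unmatched (index 2 here) are trained to output "no object", which is why duplicate boxes are discouraged without non-maximum suppression.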
ViT (Vision Transformer)
• Overview:
• Vision Transformer applies the transformer
architecture (originally for NLP) to image data
• Divides images into fixed-size patches and processes
them as a sequence of tokens, like words in a sentence
• Strengths:
• Strong performance when trained on large datasets
• Captures long-range dependencies in images
• Weaknesses:
• Requires substantial training data
• Computationally expensive
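The patch step can be sketched in a few lines. This assumes a single-channel image stored as nested lists whose sides are multiples of the patch size; ViT additionally applies a learned linear projection and position embeddings to each flattened patch, which are omitted here. The function name `image_to_patches` is hypothetical.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image into flattened, non-overlapping square patches.

    Returns a list of patches in row-major order; each patch is a flat list
    of patch_size * patch_size pixel values.
    """
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patches.append([image[y][x]
                            for y in range(top, top + patch_size)
                            for x in range(left, left + patch_size)])
    return patches


image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
tokens = image_to_patches(image, patch_size=2)
```

The sequence length is (H / P) x (W / P), so halving the patch size quadruples the number of tokens, and self-attention cost grows quadratically with that number.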
Comparison of 2D Object Detection Models