IVP notes
Image Segmentation:
It’s like breaking down a digital image into smaller pieces, such as objects or groups of pixels, to simplify the analysis of the image.
● Process: Assigns a label to each pixel in the image. Pixels with the same label share common visual characteristics.
● Benefits: Simplifies the image into meaningful regions, making later tasks such as object recognition and analysis easier.
Object Detection Pipeline:
1. Image Input: The process begins with an input image or frame from a
video.
2. Feature Extraction: The image is analyzed to extract features using
techniques like convolutional neural networks (CNNs). These features
help in identifying parts of the image that might contain an object.
3. Region Proposals: Potential regions where objects might be located
are proposed. Methods such as Selective Search, Region Proposal
Networks (RPNs), or sliding windows can be used.
4. Classification and Bounding Box Regression: Each proposed
region is classified to determine if it contains an object and which class
the object belongs to. Simultaneously, the bounding box of the object is
refined to more accurately encompass the object.
5. Post-processing: Techniques like Non-Maximum Suppression (NMS)
are used to remove redundant bounding boxes and keep the best ones,
ensuring that each object is detected only once.
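A minimal sketch of NMS in Python/NumPy (hypothetical data: boxes given as [x1, y1, x2, y2] with one confidence score each):

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) float array of [x1, y1, x2, y2]; scores: (N,) confidences
    order = scores.argsort()[::-1]  # indices sorted by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        # Drop boxes that overlap the kept box too much; keep the rest
        order = order[1:][iou < iou_threshold]
    return keep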
Popular Object Detection Algorithms:
● Sliding Window: Slide a fixed-size window across the image at multiple positions and scales, classifying each window for the presence of an object.
● Grid-Based Division: Divide the image into a grid and predict objects directly for each cell (the approach used by YOLO-style detectors).
Image Representation Techniques:
● Grayscale: This represents an image using a single channel, where each pixel
holds a brightness value (often from 0 for black to 255 for white).
● RGB (Red, Green, Blue): This is the standard format for colored images, where
each pixel stores intensity values for red, green, and blue channels.
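A small sketch of how these two representations look as NumPy arrays (sizes and values are illustrative):

import numpy as np

# Grayscale: one channel; each pixel is a single brightness value (0-255)
gray = np.zeros((4, 4), dtype=np.uint8)
gray[1, 2] = 255               # one white pixel

# RGB: three channels; each pixel stores (R, G, B) intensities
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[1, 2] = [255, 0, 0]        # one pure red pixel

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)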
Global descriptors: These capture the overall properties of an image in a single
feature vector. They are useful for image classification tasks. Some examples include:
● Bag-of-words models: Treat image features like words and represent the image
by the frequency with which each feature occurs.
● Histogram of Oriented Gradients (HOG): Captures the distribution of local
gradients in an image.
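A minimal sketch computing a global HOG descriptor with OpenCV (the image path is a placeholder):

import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
img = cv2.resize(img, (64, 128))     # default HOG detection window size

hog = cv2.HOGDescriptor()            # defaults: 9 bins, 8x8 cells, 16x16 blocks
descriptor = hog.compute(img)        # one global feature vector for the image
print(descriptor.size)               # 3780 values for a 64x128 window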
Common color models utilized in image representation include RGB (Red, Green, Blue),
HSV (Hue, Saturation, Value/Brightness), and CMYK (Cyan, Magenta, Yellow, Black).
Different multimedia applications might prefer one model over another based on their
distinct requirements.
For instance, the RGB model is typically used in computer graphics, while the CMYK model is largely used for printing purposes.
● RGB: Red (0-255), Green (0-255), Blue (0-255) - Used in computer screens
● CMYK: Cyan (0-100%), Magenta (0-100%), Yellow (0-100%), Black (0-100%) -
Used in professional printing
● HSV: Hue (0-360), Saturation (0-100%), Value/Brightness (0-100%) - Used in
image editing and computer vision tasks such as color-based segmentation
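A short sketch of converting between color models with OpenCV (note that OpenCV loads images in BGR order and, for 8-bit images, stores Hue as 0-179 rather than 0-360):

import cv2

img_bgr = cv2.imread("photo.jpg")    # placeholder path; OpenCV uses BGR channel order
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

# For 8-bit images OpenCV stores Hue as 0-179 (degrees / 2) and
# Saturation/Value as 0-255 rather than percentages.
h, s, v = cv2.split(img_hsv)
print(h.max(), s.max(), v.max())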
● Images are typically stored in digital files using formats like JPEG, PNG, or
BMP. These formats encode the pixel values and additional information like
image dimensions and color space.
● There are also specialized image processing libraries like OpenCV that provide
various numerical representations of images suitable for different computer
vision algorithms.
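For instance, OpenCV decodes an image file into a NumPy array of pixel values (the file name is a placeholder):

import cv2

img = cv2.imread("picture.png")  # decodes JPEG/PNG/BMP into a NumPy array
print(img.shape)  # (height, width, 3) for a color image
print(img.dtype)  # uint8: each channel value is 0-255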
Image Augmentation Techniques:
1. Geometric Transformations:
○ Rotation: Rotating the image by a certain angle (e.g., -30 to +30 degrees)
to simulate different orientations.
○ Translation: Shifting the image horizontally or vertically to change its
position.
○ Scaling: Resizing the image to make objects appear larger or smaller.
○ Shearing: Applying a shear transformation to slant the image along the x
or y axis.
○ Flipping: Flipping the image horizontally or vertically.
3. Noise Injection:
○ Gaussian Noise: Adding random noise to the image to simulate variations
and imperfections.
○ Salt-and-Pepper Noise: Randomly setting some pixels to maximum and
minimum values.
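A minimal sketch of both noise types with NumPy (noise levels are illustrative):

import numpy as np

def add_gaussian_noise(img, sigma=15):
    # Add zero-mean Gaussian noise, then clip back to the valid 0-255 range
    noise = np.random.normal(0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.02):
    # Randomly force a fraction of pixels to the minimum or maximum value
    noisy = img.copy()
    mask = np.random.rand(*img.shape[:2])
    noisy[mask < amount / 2] = 0        # pepper
    noisy[mask > 1 - amount / 2] = 255  # salt
    return noisy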
Benefits of Image Augmentation:
1. Increased dataset size: Augmentation can significantly increase the size of the
training dataset, making it more representative of the real-world data.
2. Improved model robustness: By exposing the model to a wide range of image
variations, augmentation helps improve its robustness to different conditions and
scenarios.
3. Reduced overfitting: Augmentation can reduce overfitting by making it more
difficult for the model to memorize the training data.
4. Better generalization: Augmentation helps the model generalize better to new,
unseen data by making it more adaptable to different image styles and conditions.
Common tools for image augmentation:
1. TensorFlow: tf.image
2. PyTorch: torchvision.transforms
3. OpenCV: cv2.rotate() and cv2.flip()
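A short sketch of an augmentation pipeline built with torchvision.transforms from the list above (parameter values are illustrative):

from PIL import Image
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=30),                    # rotation within -30 to +30 degrees
    T.RandomAffine(degrees=0, translate=(0.1, 0.1),  # translation
                   scale=(0.8, 1.2), shear=10),      # scaling and shearing
    T.RandomHorizontalFlip(p=0.5),                   # flipping
])

img = Image.open("train_sample.jpg")  # placeholder path
augmented = augment(img)              # a new random variant on every call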
Image enhancement is the process of digitally manipulating a stored image to improve its quality and appearance by modifying properties such as color, contrast, and brightness. The goal is to make the image clearer and more visible, or to highlight certain features. The tools used include many kinds of software, such as filters and image editors, that change properties of an entire image or of parts of an image.
Types of Image Enhancement:
1. Histogram equalization: Redistributing the pixel intensity values so that the
histogram of the output image is approximately uniform, improving overall contrast.
Example:
Suppose we have an image with a histogram that is skewed towards the darker end of
the intensity range. After applying histogram equalization, the histogram becomes more
uniform, and the image appears more balanced, with improved contrast and visibility of
details.
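A minimal sketch with OpenCV, assuming a grayscale input image (the file name is a placeholder):

import cv2

dark = cv2.imread("dark_image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
equalized = cv2.equalizeHist(dark)  # spreads intensities toward a uniform histogram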
2. Contrast stretching: Stretching the range of pixel intensities to span the full
available range, making the image more visually appealing.
Example:
Suppose we have an image with a limited contrast range, where the brightest pixels are
not very bright and the darkest pixels are not very dark. After applying contrast
stretching, the image appears more vivid, with improved contrast and visibility of
details.
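A minimal sketch of linear contrast stretching with NumPy:

import numpy as np

def contrast_stretch(img):
    # Linearly map the input range [min, max] onto the full [0, 255] range
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:  # flat image: nothing to stretch
        return img
    stretched = (img.astype(np.float64) - lo) / (hi - lo) * 255.0
    return stretched.astype(np.uint8)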
3. Gamma correction: A nonlinear operation used to adjust the brightness of an
image. It helps correct the nonlinear intensity response of display systems and
enhances the perceptual quality of an image. (The gamma value describes the
relationship between a stored color value and its displayed brightness on a
particular device.)
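A minimal sketch of gamma correction using a lookup table with NumPy and OpenCV (the gamma value shown is illustrative):

import numpy as np
import cv2

def gamma_correct(img, gamma=0.5):
    # Lookup table for output = 255 * (input / 255) ** gamma
    # gamma < 1 brightens the image; gamma > 1 darkens it
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, table)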
4. Filtering: Applying filters such as Gaussian filters, median filters, and
Wiener filters to remove noise and improve image quality.
Filtering is a technique used to remove noise and artifacts from an image by applying a
mathematical operation to the pixel values. The goal is to suppress unwanted variations
while preserving important details such as edges.
Example:
Suppose we have an image with salt and pepper noise. After applying a median filter,
the noise is reduced, and the image appears smoother and more detailed.
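A minimal sketch with OpenCV (the file name and kernel sizes are illustrative):

import cv2

noisy = cv2.imread("noisy_image.jpg")  # placeholder path
denoised = cv2.medianBlur(noisy, 5)    # 5x5 median filter, effective on salt-and-pepper noise
smoothed = cv2.GaussianBlur(noisy, (5, 5), 0)  # Gaussian filter for general random noise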
Contour Detection:
- Contour detection is a higher-level concept that involves identifying and
extracting the boundaries of objects or regions in an image. Contours are
continuous curves that outline the boundaries of objects in an image.
- Contour detection algorithms aim to identify these continuous curves that
represent the boundaries of objects based on color, intensity, texture, or other
visual cues.
- Contour detection can involve more complex algorithms than edge detection
and may include additional processing steps such as noise reduction, smoothing,
or curve fitting.
- Contour detection is often used in tasks such as object recognition, shape
analysis, and image segmentation.
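A minimal contour-detection sketch with OpenCV (assuming OpenCV 4.x, where findContours returns two values; the file name and threshold are placeholders):

import cv2

img = cv2.imread("shapes.png")  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # noise reduction before detection
_, binary = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)

# Find continuous boundary curves of the white regions
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)  # outline each object in green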
Applications:
● Face recognition
● Computer vision
● Machine vision
● Fingerprint recognition
● Medical imaging
● Vehicle detection (traffic control)
Background subtraction is a widely used technique in computer vision and image
processing. It aims to detect moving objects in a sequence of frames from a static
camera by separating the image foreground (moving objects) from the background
(the stationary scene) for further processing, such as object recognition.
Assumption: The camera is stationary, so any significant change between frames is
caused by moving objects.
Background Modeling: A reference model of the static background is built and updated
over time; each new frame is compared against this model, and pixels that differ
significantly are marked as foreground.
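A minimal sketch using OpenCV's MOG2 background subtractor (the video file name is a placeholder):

import cv2

cap = cv2.VideoCapture("traffic.mp4")  # placeholder video from a static camera
subtractor = cv2.createBackgroundSubtractorMOG2()  # maintains a per-pixel background model

while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg_mask = subtractor.apply(frame)  # white = moving foreground, black = background
    cv2.imshow("Foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()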
Feature Extraction (comparison):
● R-CNN: Separate CNN pass for each proposed region
● Fast R-CNN: Single CNN pass for the entire image
● Faster R-CNN: Single CNN pass for the entire image
Fast R-CNN:- Runs the CNN once over the entire image and classifies each region
proposal from the shared feature map, making it much faster than R-CNN.
Faster R-CNN:- Replaces the external region proposal step with a learned Region
Proposal Network (RPN) that shares features with the detection network, making the
whole pipeline end-to-end trainable.
Tracking techniques in image processing are methods used to
detect and follow objects over time within a sequence of images or
video frames. Here’s an overview of the three common techniques:
Optical Flow, Kalman Filter, and Particle Filter.
1. Optical Flow
2. Kalman Filter