Computer Vision and Image Processing
Lecture Note
Wachemo University
Chapter One: Introduction to Digital Image Processing
Digital Image Processing
When the x- and y-coordinates and the intensity values of an image are all finite and discrete, the image is a digital image
Processing digital images by means of a digital computer is known as digital image processing
Digital images are composed of a finite number of elements, each having a particular location and value
These elements are called picture elements, image elements, pels, and pixels
Pixel is the term used most widely to denote the elements of a digital image
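As a minimal illustration of this idea (a sketch assuming NumPy is available; the array values below are made up), a grayscale digital image is simply a finite 2D array in which each element is a pixel with a row/column location and a discrete intensity value:

```python
import numpy as np

# A tiny 3x4 grayscale "image": a finite grid of discrete intensity values (0-255)
image = np.array([[ 12,  50, 200, 255],
                  [  0, 128,  90,  30],
                  [ 75, 180,  60,  10]], dtype=np.uint8)

rows, cols = image.shape          # finite number of picture elements
value = image[1, 2]               # pixel at row 1, column 2 -> intensity 90
print(rows * cols, "pixels; value at (1, 2) =", value)
```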
Humans are limited to the visual band of the EM spectrum, whereas imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio waves
Image processing can therefore operate on images generated by sources that humans are not accustomed to
There is no general agreement on where image processing stops and computer vision starts
Focus is on making the image more visually appealing or suitable for further use
It’s about transforming the raw image into a refined version of itself
Image processing is the method of enhancing the quality of an image for a specific application
Imagine you have a photograph that isn’t perfect, maybe it’s dark, or the colors are dull
Image processing is like a magic wand that transforms this photo into a better version
It involves altering or improving digital images using various methods and tools
These transformations improve aesthetics and make images more suitable for analysis, laying the groundwork for deeper interpretation, including by computer vision systems
Computer vision, in contrast, tries to make sense of an image, much like how our brain interprets what our eyes see
Computer vision's goal is to use computers to emulate human vision, including learning, making inferences, and taking actions based on visual inputs
The goal is not to change the image, but to understand what the image represents
It’s at the heart of AI and robotics, helping machines recognize faces, interpret road
scenes for autonomous vehicles, and understand human behavior
The success of computer vision often relies on the quality of image processing
Based on the preceding comments, a logical place of overlap between image processing and computer vision is the recognition of individual regions or objects in an image
Applications of Computer Vision
Some examples of computer vision and image processing applications and goals:
Image acquisition: the process of capturing an image with a sensor and converting it into digital form; it may also involve preprocessing such as scaling
Image enhancement:
The process of manipulating an image so that the result is more suitable than the original for a specific application
Enhancement techniques are problem oriented (e.g., a method useful for X-ray images may not be good for satellite images taken in the infrared band)
Image restoration: also deals with improving the appearance of an image, but unlike enhancement it is objective, based on models of image degradation
Color image processing: an area that has gained importance because of the significant increase in the use of digital images over the internet
Color is also used as the basis for extracting features of interest in an image
Feature extraction:
Uses raw pixel data, constituting either the boundary of a region (i.e., the set of pixels
separating one image region from another) or all the points in the region itself
Image pattern classification: the process that assigns a label to an object based on its feature descriptors
The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem, or an image database containing high-resolution satellite images of a region in connection with change-detection applications
Application Areas of Digital Image Processing
Other sources of energy include acoustic, ultrasonic, and electronic (electron beams
used in electron microscopy)
Synthetic images, used for modeling and visualization, are generated by computer
Images based on radiation from the EM spectrum are familiar in X-ray and visual
bands of the spectrum
In nuclear medicine, the approach is to inject a patient with a radioactive isotope that emits gamma rays as it decays
E.g., an image of a complete bone obtained using gamma-ray imaging is used to locate sites of bone pathology, such as infections or tumors
X-ray images are generated using an X-ray tube, which is a vacuum tube with a cathode and anode
In digital radiography, digital images are obtained by one of two methods: (1) by digitizing X-ray films; or (2) by having the X-rays that pass through the patient fall directly onto devices (such as a phosphor screen) that convert X-rays to light
Ultraviolet light is used in fluorescence microscopy, one of the fastest growing areas
of microscopy
The visual band of the EM spectrum is the most familiar in all our activities, and imaging in this band outweighs by far all the others in terms of breadth of application
Even in microscopy alone, the application areas are too numerous to detail here.
It is not difficult to conceptualize the types of processes one might apply to these
images, ranging from enhancement to measurements.
Chapter Two: Digital Image Fundamentals
Elements of Visual Perception
Structure of human eye
The eye is enclosed by three membranes: the cornea and sclera outer cover, the choroid, and the retina
Cornea is a tough, transparent tissue that covers the anterior surface of the eye
Sclera is an opaque membrane that encloses the remainder of the optic globe
The choroid lies directly below the sclera and contains blood vessels that serve as the major source of nutrition to the eye
Cones are located mainly in the central portion of the retina, called the fovea, and are highly sensitive to color
Humans can resolve fine details because each cone is connected to its own nerve end
Muscles rotate the eye until the image of a region of interest falls on the fovea
Rods capture an overall picture of the field of view and are not involved in color vision
E.g., objects that appear colored in daylight appear colorless in moonlight because only the rods are stimulated
In an ordinary photographic camera, varying the distance between the lens and the imaging plane is used for focusing at various distances
In human eye, the distance between the center of the lens and retina (imaging sensor) is
fixed, and the focal length of proper focus is obtained by varying the shape of the lens
The range of focal lengths is approximately 14mm to 17mm, the latter taking place when
the eye is relaxed and focused at distances greater than about 3m
For an object of height H viewed at distance d, the geometry of image formation gives H/d = h/17, where h is the height (in mm) of the image formed on the retina
E.g., suppose a person is looking at a tree 15 m high at a distance of 100 m; letting h denote the height of the object in the retinal image yields 15/100 = h/17, or h = 2.55 mm
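Written out as a worked equation (a restatement of the numbers above):

$$\frac{15\ \text{m}}{100\ \text{m}} = \frac{h}{17\ \text{mm}} \quad\Longrightarrow\quad h = 0.15 \times 17\ \text{mm} = 2.55\ \text{mm}$$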
Brightness adaptation allows the visual system to perceive light intensity from dimly lit scenes to bright daylight, maintaining visual clarity and sensitivity
It occurs in the retina and involves adjustments in the rods and cones to varying levels of light
When moving from a bright environment to a darker one, the eye gradually becomes more sensitive to low light levels
When moving from a dark environment to a brighter one, the eye adjusts to the higher light levels
Discrimination is the visual system's ability to perceive and differentiate fine details and contrasts within an image or scene
Spatial frequency (fine vs. coarse detail) and temporal frequency (changes over time) affect discrimination capability
Essential for reading text, recognizing faces, navigating environments, identifying objects
Ambient lighting, noise, color contrast, and complexity of pattern influence discrimination
When a beam of sunlight passes through a glass prism, the emerging beam consists of a continuous spectrum of colors ranging from violet at one end to red at the other
The range of colors we perceive in visible light is a small portion of the EM spectrum
Radio waves, with longer wavelengths than visible light, lie at one end of the spectrum, whereas gamma rays, with shorter wavelengths than visible light, lie at the other end
The energy of a photon is E = hν, where h is Planck's constant and ν is the frequency of the radiation
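Combining E = hν with the standard wave relation c = λν expresses photon energy in terms of wavelength, which is why the short-wavelength bands discussed below (X-rays, gamma rays) are the high-energy ones:

$$E = h\nu = \frac{hc}{\lambda}, \qquad c = \lambda\nu$$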
EM waves can be viewed as a stream of massless particles, each containing a certain amount (or bundle) of energy; each bundle is called a photon
Radio waves have the lowest frequency, whereas gamma rays have the highest frequency
High-energy EM radiation in the X-ray and gamma-ray bands is harmful to living organisms
Visible light is the portion of the EM spectrum that can be sensed by the human eye
Color spectrum is divided into six regions: violet, blue, green, yellow, orange, and red
The nature of light reflected by the object determines the colors perceived in an object
A body that reflects balanced light in all visible wavelengths appears white to the observer
A body that reflects in a limited range of the visible spectrum exhibits some shades of color
E.g. green objects reflect light with wavelengths primarily in the 500 to 570 nm range, while
absorbing most of the energy at other wavelengths
Light that is void of color is called monochromatic (or achromatic) light; its only attribute is intensity
The intensity of monochromatic light varies from black to grays and finally to white
Monochromatic light values range from black to white which is called gray scale
Monochromatic images are frequently referred to as grayscale images
Chromatic (color) light spans the EM spectrum from approximately 0.43 to 0.79 μm in wavelength
Three quantities that describe a chromatic light source: radiance, luminance, and brightness
Radiance is total amount of energy that flows from the light source, measured in watts (W)
Luminance is the amount of energy an observer perceives from a light source, in lumens (lm)
Brightness embodies the achromatic notion of intensity and is key in describing color sensation
The predominant source of energy for imaging is the EM wave, but this is not the only source
Electron beams for electron microscopy and software for generating synthetic images are other sources of digital imaging
In principle, if a sensor can detect energy in a specific band of the EM spectrum, objects can be imaged in that band
The wavelength of an EM wave required to "see" an object must be of the same size as or smaller than the object
E.g., a water molecule has a diameter on the order of 10⁻¹⁰ m. Thus, to study these molecules, we would need a source capable of emitting energy in the far (high-energy) ultraviolet or soft (low-energy) X-ray bands
Illumination may originate from a source of EM energy such as a radar, infrared, or X-ray system
Scene elements could be objects like molecules, buried rock formations, or a human brain
Illumination energy is either reflected from or transmitted through objects, depending on the source and the object
E.g., light is reflected from a planar surface, whereas X-rays pass through a patient's body for the purpose of generating a diagnostic X-ray image
E.g., a photodiode is constructed from silicon, and its output is a voltage proportional to the light intensity
To generate a 2D image with a single sensor, there must be relative displacement in both the x- and y-directions between the sensor and the area to be imaged
The sensor is mounted on a lead screw that provides motion in the perpendicular direction
A geometry used more frequently than single sensors is an in-line sensor strip
The imaging strip gives one line of the image at a time, and motion of the strip relative to the scene completes the second dimension
A lens is used to project the area to be scanned onto the sensors
Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross-sectional images of 3D objects
Individual sensors can also be arranged in a 2D array, the predominant arrangement found in digital cameras
A typical sensor for these cameras is a CCD (charge-coupled device) array, which can have 4000 × 4000 elements or more
The sensor can integrate the input light over minutes or even hours to achieve noise reduction
The key advantage is that a complete image can be obtained by focusing the energy pattern onto the surface of the array; motion is obviously not necessary
The sensor array produces outputs proportional to the integral of the light received at each sensor
Image Sampling and Quantization
The output of most sensors is a continuous voltage waveform (i.e., continuous in both coordinates and amplitude)
To create a digital image, the sensed data must be converted into digital form: digitizing the coordinate values is called sampling, and digitizing the amplitude values is called quantization
The method of sampling is determined by the sensor arrangement used to generate the image
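A minimal sketch of the two steps (assuming NumPy; the "continuous" intensity profile below is simulated along a single scan line): sampling keeps only a finite set of positions, and quantization maps each sampled amplitude to one of a finite set of intensity levels.

```python
import numpy as np

# Simulated continuous intensity profile along one scan line, values in [0, 1]
x = np.linspace(0, 1, 1000)
continuous = 0.5 + 0.5 * np.sin(2 * np.pi * 3 * x)

# Sampling: keep only a finite set of spatial positions
samples = continuous[::100]                               # 10 samples

# Quantization: map each amplitude to one of 8 discrete levels (0..7)
levels = 8
quantized = np.round(samples * (levels - 1)).astype(np.uint8)
print(quantized)
```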
One way to represent an image is as a plot with two axes (x, y) and a third value z for the intensity, i.e., the set of points (x, y, z); this is useful for visualizing grayscale images
Another way is to represent the image as an array (matrix), which is the form used for computer processing
A digital image is represented as an array of real numbers, and each element of the array is a picture element (pixel)
A pixel p at (x, y) has 4 neighbors (2 horizontal, 2 vertical) with coordinates (x+1, y), (x-1, y), (x, y+1), (x, y-1); this set of pixels is called the 4-neighbors of p, or N4(p)
The 4 diagonal neighbors of p, denoted ND(p), have coordinates (x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
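A small helper (hypothetical, for illustration only) that returns N4(p) and ND(p) for a pixel p = (x, y), dropping coordinates that fall outside the image:

```python
def neighbors(x, y, width, height):
    """Return N4(p) and ND(p) for pixel p = (x, y), clipped to the image bounds."""
    n4 = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    nd = [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]
    in_bounds = lambda c: 0 <= c[0] < width and 0 <= c[1] < height
    return list(filter(in_bounds, n4)), list(filter(in_bounds, nd))

print(neighbors(0, 0, 4, 3))   # corner pixel: only 2 of the 4-neighbors exist
```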
Adjacency is defined with respect to a set V of intensity values; in a binary image V = {1}, while in a grayscale image V typically contains more elements. E.g., for the adjacency of pixels whose values are in the range 0 to 255, V could be any subset of these 256 values
A digital path from pixel p with coordinates (x₀, y₀) to pixel q with coordinates (xₙ, yₙ) is a sequence of distinct pixels with coordinates (x₀, y₀), (x₁, y₁), …, (xₙ, yₙ), where consecutive pixels in the sequence are adjacent; n is the length of the path
Let S be a subset of pixels in an image; two pixels p and q are said to be connected in S if there exists a path between them consisting entirely of pixels in S
Let R be a subset of pixels in an image; if R is a connected set, then R is called a region of the image
Two regions, Rᵢ and Rⱼ, are said to be adjacent if their union forms a connected set
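A minimal sketch of these definitions (assuming 4-adjacency and a set S given as Python tuples; names are illustrative): two pixels are connected in S if a breadth-first search over adjacent pixels of S can reach one from the other.

```python
from collections import deque

def connected(S, p, q):
    """True if pixels p and q are connected in the set S (4-adjacency)."""
    if p not in S or q not in S:
        return False
    seen, frontier = {p}, deque([p])
    while frontier:
        x, y = frontier.popleft()
        if (x, y) == q:
            return True
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):  # N4 neighbors
            if n in S and n not in seen:
                seen.add(n)
                frontier.append(n)
    return False

S = {(0, 0), (0, 1), (1, 1), (3, 3)}
print(connected(S, (0, 0), (1, 1)))   # True: a path exists inside S
print(connected(S, (0, 0), (3, 3)))   # False: (3, 3) forms a separate region
```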
Arithmetic operations between two images f(x, y) and g(x, y) are denoted as: s(x, y) = f(x, y) + g(x, y), d(x, y) = f(x, y) − g(x, y), p(x, y) = f(x, y) × g(x, y), and v(x, y) = f(x, y) ÷ g(x, y)
The results s, d, p, and v are images of the same size as the inputs f and g
Arithmetic addition is important in DIP for noise reduction by image averaging, which is an image enhancement method
Image multiplication and division are important for shading correction and masking
Logical operations between binary images include IM1 AND IM2, IM1 OR IM2, IM1 XOR IM2, and NOT IM1
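A minimal sketch of the arithmetic and logical operations (assuming NumPy; the noisy images are simulated), including averaging K noisy images to reduce noise:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                                   # a clean "image"
noisy = [f + rng.normal(0, 20, f.shape) for _ in range(16)]    # K = 16 noisy copies of f

s = noisy[0] + noisy[1]              # addition
d = noisy[0] - noisy[1]              # subtraction
p = noisy[0] * (f / f.max())         # multiplication (e.g. shading correction / masking)
v = noisy[0] / (f + 1e-9)            # division (guard against divide-by-zero)

g = np.mean(noisy, axis=0)           # averaging K images suppresses zero-mean noise
print(np.std(noisy[0] - f), np.std(g - f))   # noise std drops roughly by sqrt(K)

mask = f > 50                        # a binary image
inverted = ~mask                     # NOT; & (AND), | (OR), ^ (XOR) work the same way
```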
Single-pixel operations: alter the intensity of individual pixels using a transformation function, e.g., image negatives
Spatial domain refers to the image plane itself, direct manipulation of pixels in an image
Two categories of spatial processing are intensity transformations and spatial filtering
Transformations operate on single pixels for contrast manipulation and image thresholding
Spatial domain processing is denoted by the expression g(x, y) = T[f(x, y)], where f(x, y) is the input image, g(x, y) is the output image, and T is an operator on f defined over a neighborhood of (x, y)
T can also operate on a set of images, e.g., the addition of K images for noise reduction
The negative of an image with intensity levels in the range [0, L−1] is obtained using the transformation s = L − 1 − r, where r is the input intensity and s is the output intensity
This transformation is simple to compute, suitable for hardware implementation, and useful for real-time image processing
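A minimal sketch of the negative transformation (assuming an 8-bit grayscale image, so L = 256; the toy image is made up):

```python
import numpy as np

L = 256                                                         # number of intensity levels
r = np.array([[0, 64, 128], [192, 255, 10]], dtype=np.uint8)    # toy input image
s = (L - 1 - r.astype(np.int32)).astype(np.uint8)               # negative: s = L - 1 - r
print(s)                                                        # dark and light values swap
```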
A histogram is present in digital cameras and is used to see the distribution of the gray levels captured
The x-axis represents the intensity level, and the y-axis represents the number of pixels at each specific intensity
Black/dark tones are presented at the left, medium gray in the middle, and white at the right end of the x-axis
Histogram Equalization
Is the process of uniformly distributing the image histogram over the entire intensity axis
by choosing a proper intensity transformation function
Histogram matching (specification) is used when comparing pictures from diverse sources or with varied intensity, e.g., for feature matching
It is possible only if the number of channels matches in the input and reference images
Using the reference histogram, the pixel intensity values in the input picture are updated so that the two histograms match
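A minimal sketch of the histogram equalization described above for an 8-bit grayscale image (assuming NumPy; OpenCV's cv2.equalizeHist performs the same operation in one call):

```python
import numpy as np

def equalize(img):
    """Spread the histogram of an 8-bit image over the full 0-255 intensity range."""
    hist = np.bincount(img.ravel(), minlength=256)       # image histogram
    cdf = hist.cumsum()                                   # cumulative distribution
    cdf_masked = np.ma.masked_equal(cdf, 0)
    cdf_scaled = (cdf_masked - cdf_masked.min()) * 255 / (cdf_masked.max() - cdf_masked.min())
    lut = np.ma.filled(cdf_scaled, 0).astype(np.uint8)    # intensity transformation (lookup table)
    return lut[img]                                       # apply the mapping pixel-wise

# A dark, low-contrast test image whose intensities cluster around 60
dark = np.clip(np.random.default_rng(1).normal(60, 15, (64, 64)), 0, 255).astype(np.uint8)
eq = equalize(dark)
print(dark.min(), dark.max(), "->", eq.min(), eq.max())   # range stretches toward 0..255
```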
Statistics obtained directly from an image histogram can be used for image enhancement
The global mean and variance are computed over the entire image and are useful for gross adjustments in overall intensity and contrast
Local mean and variance are used as the basis for making changes that depend on image
characteristics in a neighborhood about each pixel in an image
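A minimal sketch of computing the local mean and variance in a k×k neighborhood about each pixel (assuming SciPy's uniform_filter is available; an enhancement rule would then compare these local statistics to the global ones):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_stats(img, k=3):
    """Local mean and variance over a k x k neighborhood of each pixel."""
    f = img.astype(np.float64)
    mean = uniform_filter(f, size=k)                 # local mean
    var = uniform_filter(f * f, size=k) - mean ** 2  # local variance: E[x^2] - (E[x])^2
    return mean, var

img = np.random.default_rng(2).integers(0, 256, (32, 32)).astype(np.uint8)
m, v = local_stats(img)
print(img.mean(), img.var())     # global statistics
print(m[16, 16], v[16, 16])      # local statistics at one pixel
```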