DIP notes

The document discusses histograms in digital image processing, explaining their graphical representation and applications such as image analysis, brightness adjustment, and contrast enhancement. It covers techniques like histogram equalization and matching, which aim to enhance image contrast and normalize representations, respectively. Additionally, it introduces local histogram processing, homomorphic filtering for illumination compensation, and various image compression methods, including Huffman and arithmetic coding.

Histograms

In digital image processing, a histogram is a graphical representation of a
digital image: a plot of the number of pixels at each tonal value. Nowadays, an
image histogram display is built into most digital cameras.

In the graph, the horizontal axis represents the tonal variations, whereas the
vertical axis represents the number of pixels that have each particular tonal
value. Black and dark areas are represented on the left side of the horizontal
axis, medium grey in the middle, and bright and white areas on the right.
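
As a minimal sketch of this idea (assuming an 8-bit grayscale image already loaded as
a NumPy array; the function and variable names are only illustrative):

import numpy as np

def image_histogram(img, levels=256):
    """Count how many pixels take each tonal value 0..levels-1."""
    hist = np.zeros(levels, dtype=np.int64)
    for value in img.ravel():          # visit every pixel once
        hist[value] += 1
    return hist

# Equivalent one-liner using NumPy's built-in counting:
# hist = np.bincount(img.ravel(), minlength=256)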

Applications of Histograms
1. In digital image processing, histograms are used for simple calculations in
software.
2. It is used to analyze an image. Properties of an image can be predicted by the
detailed study of the histogram.
3. The brightness of the image can be adjusted by having the details of its
histogram.
4. The contrast of the image can be adjusted according to the need by examining
how the intensities are spread along the x-axis of the histogram.
5. It is used for image equalization. Gray level intensities are expanded along the
x-axis to produce a high contrast image.
6. Histograms are used in thresholding, where a threshold chosen from the
histogram separates the objects of interest from the background.

Histogram Equalization
Histogram equalization is used to redistribute the pixel values of an image. The
transformation is chosen so that an approximately uniform, flattened histogram is
produced.

Histogram equalization increases the dynamic range of the pixel values and spreads
the pixel counts more evenly across the intensity levels, which produces a nearly flat
histogram and a high-contrast image.

Histogram equalization is a technique for adjusting image intensities to enhance


contrast. Assuming initially continuous intensity values, let the variable 𝑟 denote the
intensities of an image to be processed. As usual, we assume that 𝑟 is in the range [0,
𝐿 − 1], with 𝑟 = 0 representing black and 𝑟 = 𝐿 − 1 representing white. For 𝑟 satisfying
these conditions, we focus attention on transformations (intensity mappings) of the
form :
𝑠 = 𝑇(𝑟) 0 ≤ 𝑟 ≤ 𝐿 − 1
that produce an output intensity value, 𝑠, for a given intensity value 𝑟 in the input
image. We assume that
a) 𝑇(𝑟) is a monotonic increasing function in the interval 0 ≤ 𝑟 ≤ 𝐿 − 1; and
b) 0 ≤ 𝑇(𝑟) ≤ 𝐿 − 1 for 0 ≤ 𝑟 ≤ 𝐿 − 1.

Recall that the probability of occurrence of intensity level 𝑟𝑘 in a digital image is


approximated by

        𝑝𝑟(𝑟𝑘) = 𝑛𝑘 / 𝑀𝑁

where 𝑛𝑘 is the number of pixels with intensity 𝑟𝑘 and 𝑀𝑁 is the total number of pixels
in the image. The transformation is

        𝑠𝑘 = 𝑇(𝑟𝑘) = (𝐿 − 1) ∑_{𝑗=0}^{𝑘} 𝑝𝑟(𝑟𝑗),    𝑘 = 0, 1, 2, … , 𝐿 − 1        (1)

where, as before, 𝐿 is the number of possible intensity levels in the image (e.g., 256 for
an 8-bit image). Thus, a processed (output) image is obtained by using the equation
above to map each pixel in the input image with intensity 𝑟𝑘 into a corresponding pixel
with level 𝑠𝑘 in the output image. This is called a histogram equalization or histogram
linearization transformation. It is not difficult to show that this transformation satisfies
conditions (a) and (b) stated previously in this section.
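
A minimal sketch of Eq. (1) in NumPy, assuming an 8-bit grayscale image (the function
and variable names are illustrative, not part of the original notes):

import numpy as np

def histogram_equalize(img, L=256):
    """Map each intensity r_k to s_k = (L-1) * cumulative probability."""
    hist = np.bincount(img.ravel(), minlength=L)      # n_k
    p_r = hist / img.size                             # p_r(r_k) = n_k / MN
    cdf = np.cumsum(p_r)                              # running sum of p_r(r_j)
    s = np.round((L - 1) * cdf).astype(np.uint8)      # Eq. (1), rounded
    return s[img]                                     # apply as a lookup table

# equalized = histogram_equalize(img)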

Histogram matching

Histogram matching is used for normalizing the representation of images. It can be
used for feature matching, especially when the pictures come from diverse sources or
were captured under varied conditions (different lighting, etc.). Each image has a
number of channels, and each channel is matched individually. Histogram matching is
possible only if the number of channels is the same in the input and reference images.

Histogram equalization produces a transformation function that seeks to generate an


output image with a uniform histogram. When automatic enhancement is desired, this
is a good approach to consider because the results from this technique are predictable
and the method is simple to implement. However, there are applications in which
histogram equalization is not suitable. In particular, it is useful sometimes to be able
to specify the shape of the histogram that we wish the processed image to have. The
method used to generate images that have a specified histogram is called histogram
matching or histogram specification.

Given an input image, a specified histogram,

𝑝𝑧 (𝑧𝑖 ),𝑖 = 0,1,2, … , 𝐿 − 1


and recalling that the 𝑠𝑘′𝑠 are the values resulting from Eq. (1), we may summarize the
procedure for discrete histogram specification as follows:

1. Compute the histogram, 𝑝𝑟 (𝑟), of the input image, and use it in Eq. (1) to map the
intensities in the input image to the intensities in the histogram-equalized image.
Round the resulting values, 𝑠𝑘, to the integer range [0, 𝐿 − 1].
2. Compute all values of function 𝐺(𝑧𝑞) using the Equation

        𝐺(𝑧𝑞) = (𝐿 − 1) ∑_{𝑖=0}^{𝑞} 𝑝𝑧(𝑧𝑖)


for 𝑞 = 0,1,2, … , 𝐿 − 1

where 𝑝𝑧(𝑧𝑖) are the values of the specified histogram. Round the values of 𝐺 to
integers in the range [0, 𝐿 − 1]. Store the rounded values of 𝐺 in a lookup table.

3. For every value of 𝑠𝑘, 𝑘 = 0,1,2, … , 𝐿 − 1 use the stored values of 𝐺 from Step 2 to
find the corresponding value of 𝑧𝑞 so that 𝐺(𝑧𝑞) is closest to 𝑠𝑘. Store these mappings
from 𝑠 to 𝑧. When more than one value of 𝑧𝑞 gives the same match (i.e., the mapping
is not unique), choose the smallest value by convention.
4. Form the histogram-specified image by mapping every equalized pixel with value
𝑠𝑘 to the corresponding pixel with value 𝑧𝑞 in the histogram-specified image, using
the mappings found in Step 3.
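
The four steps above can be sketched as follows, assuming 8-bit images and NumPy,
with the specified histogram p_z supplied as a length-256 array of probabilities (the
function and variable names are illustrative):

import numpy as np

def histogram_specify(img, p_z, L=256):
    """Map img so that its histogram approximates the specified p_z."""
    # Step 1: equalization transformation s_k of the input image
    p_r = np.bincount(img.ravel(), minlength=L) / img.size
    s = np.round((L - 1) * np.cumsum(p_r)).astype(int)
    # Step 2: equalization transformation G(z_q) of the specified histogram
    G = np.round((L - 1) * np.cumsum(p_z)).astype(int)
    # Step 3: for each s_k, find the smallest z_q with G(z_q) closest to s_k
    z = np.zeros(L, dtype=np.uint8)
    for k in range(L):
        z[k] = np.argmin(np.abs(G - s[k]))   # argmin returns the smallest index on ties
    # Step 4: map every input pixel r_k -> s_k -> z_q
    return z[img]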

LOCAL HISTOGRAM PROCESSING


The procedure is to define a neighborhood and move its center from pixel to pixel in
a horizontal or vertical direction. At each location, the histogram of the points in the
neighborhood is computed, and either a histogram equalization or histogram
specification transformation function is obtained. This function is used to map the
intensity of the pixel centered in the neighborhood. The center of the neighborhood
is then moved to an adjacent pixel location and the procedure is repeated. Because
only one row or column of the neighborhood changes in a one-pixel translation of the
neighborhood, updating the histogram obtained in the previous location with the new
data introduced at each motion step is possible. This approach has obvious advantages
over repeatedly computing the histogram of all pixels in the neighborhood region
each time the region is moved one pixel location. Another approach used sometimes
to reduce computation is to utilize nonoverlapping regions, but this method usually
produces an undesirable “blocky” effect.
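
A naive sketch of local histogram equalization follows (it recomputes the full
neighborhood histogram at every pixel rather than updating it incrementally as
described above; the window size and names are illustrative):

import numpy as np

def local_histogram_equalize(img, size=3, L=256):
    """Equalize each pixel using the histogram of its size x size neighborhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode='reflect')
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            window = padded[y:y + size, x:x + size]
            p = np.bincount(window.ravel(), minlength=L) / window.size
            cdf = np.cumsum(p)
            out[y, x] = np.round((L - 1) * cdf[img[y, x]])
    return out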

Figure 2(a) is an 8-bit, 512 × 512 image consisting of five black squares on a light gray
background. The image is slightly noisy, but the noise is imperceptible. There are
objects embedded in the dark squares, but they are invisible for all practical purposes.
Figure 2(b) is the result of global histogram equalization. As is often the case with
histogram equalization of smooth, noisy regions, this image shows significant
enhancement of the noise. However, other than the noise, Fig. 2(b) does not reveal any
new significant details from the original. Figure 2(c) was obtained using local histogram
equalization of Fig. 2(a) with a neighborhood of size 3 × 3. Here, we see significant
detail within all the dark squares. The intensity values of these objects are too close to
the intensity of the dark squares, and their sizes are too small, to influence global
histogram equalization significantly enough to show this level of intensity detail.

Histogram statistics for image enhancement
Let r denote a discrete random variable representing intensity values in [0, L−1], and
let p(r_i) denote the normalized histogram component corresponding to value r_i. We
may view p(r_i) as an estimate of the probability that intensity r_i occurs in the image.
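
Statistics such as the mean and variance can be estimated directly from p(r_i); a
minimal sketch, assuming an 8-bit grayscale NumPy image (names are illustrative):

import numpy as np

def histogram_mean_variance(img, L=256):
    """Estimate the intensity mean and variance from the normalized histogram."""
    p = np.bincount(img.ravel(), minlength=L) / img.size   # p(r_i)
    r = np.arange(L)
    mean = np.sum(r * p)                       # m = sum of r_i * p(r_i)
    var = np.sum((r - mean) ** 2 * p)          # variance about that mean
    return mean, var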

Homomorphic Filtering
Homomorphic filters are widely used in image processing for compensating the effect
of non-uniform illumination in an image. Pixel intensities in an image represent the
light reflected from the corresponding points on the objects.

The illumination-reflectance model can be used to develop a frequency domain


procedure for improving the appearance of an image by simultaneous gray-level range
compression and contrast enhancement. An image f(x, y) can be expressed as the
product of its illumination and reflectance components:

        f(x, y) = i(x, y) r(x, y)

The equation above cannot be used directly to operate separately on the frequency
components of illumination and reflectance, because the Fourier transform of the
product of two functions is not separable; in other words,

        ℑ{f(x, y)} ≠ ℑ{i(x, y)} ℑ{r(x, y)}

Suppose, however, that we define

        z(x, y) = ln f(x, y) = ln i(x, y) + ln r(x, y)

Then

        Z(u, v) = ℑ{z(x, y)} = F_i(u, v) + F_r(u, v)

where F_i(u, v) and F_r(u, v) are the Fourier transforms of ln i(x, y) and ln r(x, y),
respectively. If we process Z(u, v) by means of a filter function H(u, v), then

        S(u, v) = H(u, v) Z(u, v) = H(u, v) F_i(u, v) + H(u, v) F_r(u, v)

where S(u, v) is the Fourier transform of the result. In the spatial domain,

        s(x, y) = ℑ⁻¹{S(u, v)} = ℑ⁻¹{H(u, v) F_i(u, v)} + ℑ⁻¹{H(u, v) F_r(u, v)}

Now we have

        s(x, y) = i′(x, y) + r′(x, y)

where i′(x, y) = ℑ⁻¹{H(u, v) F_i(u, v)} and r′(x, y) = ℑ⁻¹{H(u, v) F_r(u, v)}.

Finally, as z(x, y) was formed by taking the logarithm of the original image f(x, y), the
inverse (exponential) operation yields the desired enhanced image, denoted by g(x, y);
that is,

        g(x, y) = e^{s(x, y)} = e^{i′(x, y)} e^{r′(x, y)} = i₀(x, y) r₀(x, y)

and

        i₀(x, y) = e^{i′(x, y)},    r₀(x, y) = e^{r′(x, y)}

are the illumination and reflectance components of the output image. The
enhancement approach using the foregoing concepts is summarized in Fig. 9.1. This
method is based on a special case of a class of systems known as homomorphic
systems. In this particular application, the key to the approach is the separation of the
illumination and reflectance components achieved. The homomorphic filter function
H(u, v) can then operate on these components separately.

The illumination component of an image generally is characterized by slow spatial


variations, while the reflectance component tends to vary abruptly, particularly at the
junctions of dissimilar objects. These characteristics lead to associating the low
frequencies of the Fourier transform of the logarithm of an image with illumination
and the high frequencies with reflectance. Although these associations are rough
approximations, they can be used to advantage in image enhancement.

A good deal of control can be gained over the illumination and reflectance
components with a homomorphic filter. This control requires specification of a filter
function H (u, v) that affects the low- and high-frequency components of the Fourier
transform in different ways. Figure 9.2 shows a cross section of such a filter. If the
parameters γL and γH are chosen so that γL < 1 and γH > 1, the filter function shown in
Fig. 9.2 tends to decrease the contribution made by the low frequencies (illumination)
and amplify the contribution made by high frequencies (reflectance). The net result is
simultaneous dynamic range compression and contrast enhancement.
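
A minimal sketch of the full pipeline (log, FFT, filter, inverse FFT, exponential), using a
Gaussian-shaped high-frequency-emphasis filter; the filter shape and the parameter
values (gamma_L, gamma_H, c, D0) are illustrative assumptions, not values from these
notes:

import numpy as np

def homomorphic_filter(img, gamma_L=0.5, gamma_H=2.0, c=1.0, D0=30.0):
    """Attenuate illumination (low frequencies), amplify reflectance (high)."""
    z = np.log1p(img.astype(np.float64))          # z = ln(f + 1), avoids log(0)
    Z = np.fft.fftshift(np.fft.fft2(z))           # centered spectrum

    rows, cols = img.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2        # squared distance from center
    H = (gamma_H - gamma_L) * (1 - np.exp(-c * D2 / (D0 ** 2))) + gamma_L

    s = np.fft.ifft2(np.fft.ifftshift(H * Z)).real
    g = np.expm1(s)                               # inverse of log1p
    return np.clip(g, 0, 255).astype(np.uint8)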

IMAGE COMPRESSION
Image compression models
An image compression system is composed of an encoder and a decoder. The encoder
performs compression, and the decoder performs decompression. A codec is a device
or program that performs both encoding and decoding.
The input image f(x, y) is fed into the encoder, which creates a compressed
representation of the input. The decoder generates a reconstructed output image
f̂(x, y). When f̂(x, y) is an exact replica of f(x, y), the compression is error free, lossless,
or information preserving. Otherwise, the reconstructed output image is distorted and
the compression system is lossy.

The encoding or compression process

In the first stage of the encoding process, a mapper transforms f(x, y) into a format
designed to reduce spatial and temporal redundancy. This is a reversible operation
(run-length coding is an example).

The quantizer reduces the accuracy of the mapper's output in accordance with a
pre-established fidelity criterion, to keep irrelevant information out of the compressed
representation. This stage is omitted when error-free compression is desired. The
symbol coder generates a fixed- or variable-length code to represent the quantizer
output and maps that output to the code. The shortest code words are assigned to the
most frequently occurring quantizer output values, thus minimizing coding
redundancy. This operation is reversible.

The decoding or decompression process

The decoder contains a symbol decoder and an inverse mapper. They perform the
inverse operations of symbol encoder & mapper.

Image formats, containers & Compression standards


An image file format specifies how the data is organized and the type of compression
used. An image container is similar to a file format but handles multiple types of image
data. Image compression standards define the procedures for compressing and
decompressing images.

IEC - International Electro-technical Commission

ITU-T - International Telecommunication Union


CCITT - Consultative Committee for International Telephony and Telegraphy.
Some Basic Compression Methods
Some basic compression methods are:

 Huffman Coding

 Arithmetic Coding
 Run-Length Coding

Huffman coding
This is one of the most popular techniques for removing coding redundancy. When
coding the symbols of an information source individually, the Huffman code yields the
smallest possible number of code symbols per source symbol. In terms of Shannon's
theorem, the resulting code is optimal for a fixed value of n, subject to the constraint
that the source symbols be coded one at a time. (The source symbols may be either
the intensities of an image or the output of an intensity mapping operation such as
pixel differences, run lengths, etc.)

 The first step in Huffman’s approach is to create a series of source reductions by ordering the
probabilities of the symbols under consideration and combining the lowest-probability symbols into a
single symbol that replaces them in the next source reduction.
 The second step in Huffman’s procedure is to code each reduced source, starting with the smallest
source and working back to the original source. The codes for a two-symbol source are 0 and 1, and
these are assigned to the two symbols of the smallest reduced source. If, say, a symbol with probability
0.6 was generated by combining two symbols in the preceding reduction, the 0 used to code it is now
assigned to both of those symbols, and a 0 and a 1 are appended to each to distinguish them. This is
repeated for each reduced source until the original source is reached.

The symbols are coded one at a time and decoding is done using a simple lookup
table. The code used here is a block code, since each source symbol is mapped into a
fixed sequence of code symbols. It is uniquely decodable.
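
A minimal sketch of the procedure using Python's heapq; the symbol probabilities in
the example call are illustrative only:

import heapq

def huffman_codes(probabilities):
    """Build Huffman code words from a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)     # the two lowest-probability groups
        p2, i, codes2 = heapq.heappop(heap)
        for sym in codes1:                      # prepend a bit to distinguish them
            codes1[sym] = '0' + codes1[sym]
        for sym in codes2:
            codes2[sym] = '1' + codes2[sym]
        heapq.heappush(heap, (p1 + p2, i, {**codes1, **codes2}))
    return heap[0][2]

# Example (illustrative probabilities):
# huffman_codes({'a1': 0.4, 'a2': 0.3, 'a3': 0.1, 'a4': 0.1, 'a5': 0.06, 'a6': 0.04})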
Arithmetic coding
This generates nonblock codes. A one-to-one correspondence between source
symbols and code words does not exist. Instead, an entire sequence of source symbols
(or message) is assigned a single arithmetic code word. The code word defines an
interval of real numbers between 0 and 1. As the number of symbols in the message
increases, the interval used to represent it becomes smaller, and the number of
information units (bits) required to represent the interval becomes larger.

For example, a five-symbol sequence (message) a1 a2 a3 a3 a4 from a four-symbol
source is coded. The interval [0, 1] is subdivided into four regions based on the
probabilities of each source symbol. Symbol a1 is associated with the subinterval
[0, 0.2], so the message interval is initially narrowed to [0, 0.2]. This subinterval is then
subdivided according to the original source symbol probabilities, and the process
continues with the next message symbol.
Symbol a2 narrows the subinterval to [0.04, 0.08], a3 to [0.056, 0.072], and so on. The
final message symbol, which is reserved as an end-of-message indicator, narrows the
range to [0.06752, 0.0688]. Any number within this subinterval, such as 0.068, can be
used to represent the message.

Here 3 decimal digits are used to represent the five symbol message i.e., 0.6 decimal
digits are used per source symbol.
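
A sketch of the interval-narrowing step; the per-symbol probability ranges below
(a1: 0.2, a2: 0.2, a3: 0.4, a4: 0.2) are inferred from the subintervals quoted above and
should be taken as an assumption:

def arithmetic_encode_interval(message, ranges):
    """Return the final [low, high) interval representing the message."""
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        sym_low, sym_high = ranges[symbol]
        # narrow the current interval to the symbol's sub-range
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

ranges = {'a1': (0.0, 0.2), 'a2': (0.2, 0.4), 'a3': (0.4, 0.8), 'a4': (0.8, 1.0)}
print(arithmetic_encode_interval(['a1', 'a2', 'a3', 'a3', 'a4'], ranges))
# -> approximately (0.06752, 0.0688); any number inside, e.g. 0.068, codes the message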

Run Length coding


Images with repeating intensities along their rows (or columns) can be compressed by
representing runs of identical intensities as run-length pairs, where each run-length
pair specifies the start of a new intensity and the number of consecutive pixels that
have that intensity. This technique is known as run-length encoding (RLE) and is used
in facsimile (FAX) coding. Compression is achieved by eliminating a simple form of
spatial redundancy, i.e., groups of identical intensities.

In the BMP file format, image data is represented in two different modes: encoded and
absolute, and either mode can occur anywhere in the image. In encoded mode, a
two-byte RLE representation is used. The first byte specifies the number of consecutive
pixels that have the color index contained in the second byte. The color index is chosen
from a table of 256 possibilities.
In absolute mode, the first byte is 0, and the second byte signals one of four possible
conditions, as shown below:

    0        end of line
    1        end of image
    2        move to a new position (delta)
    3 – 255  the number of uncompressed pixels that follow

If the second byte is between 3 and 255, that number of uncompressed pixels follows
directly in the data stream. RLE is particularly effective when compressing binary
images, because there are only two possible intensities (black and white).
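
A minimal sketch of plain run-length encoding of one image row (this illustrates the
general run-length pair idea, not the BMP-specific byte layout):

def run_length_encode(row):
    """Turn a sequence of intensities into (intensity, run length) pairs."""
    pairs = []
    run_value, run_length = row[0], 1
    for value in row[1:]:
        if value == run_value:
            run_length += 1               # extend the current run
        else:
            pairs.append((run_value, run_length))
            run_value, run_length = value, 1
    pairs.append((run_value, run_length))
    return pairs

# run_length_encode([255, 255, 255, 0, 0, 255])  ->  [(255, 3), (0, 2), (255, 1)]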

JPEG, MPEG
JPEG

JPEG was proposed in 1991 as a compression standard for digital still images. JPEG
compression is lossy , which means that some details may be lost when the image is
restored from the compressed data. JPEG compression takes advantage of the way the
human eyes work, so that people usually do not notice the lost details in the image.
With JPEG, one can adjust the amount of loss at compression time by trading image
quality for a smaller size of the compressed image.

JPEG is designed for full-color or grayscale images of natural scenes. It works very well
with photographic images. JPEG does not work as well on images with sharp edges or
artificial scenes such as graphical drawings, text documents, or cartoon pictures.

A few different file formats are used to exchange files with JPEG images. The JPEG File
Interchange Format (JFIF) defines a minimum standard necessary to support JPEG, is
widely accepted, and is used most often on the personal computer and on
the Internet. The conventional file name for JPEG images in this format usually has the
extension ~.JPG or ~.JPEG (where ~ represents a file name).
A recent development in the JPEG standard is the JPEG 2000 initiative. JPEG 2000 aims
to provide a new image coding system using the latest compression techniques, which
are based on the mathematics of wavelets. The effort is expected to find a wide range
of applications, from digital cameras to medical imaging.

MPEG
MPEG was first proposed in 1991. It is actually a family of standards for compressed
digital movies, and is still evolving. One may think of a digital movie as a sequence of
still images displayed one after another at video rate. However, this approach does not
take into consideration the extensive redundancy from frame to frame. MPEG takes
advantage of that redundancy to achieve even better compression. A movie also has
sound channels to play synchronously with the sequence of images.

The MPEG standards actually consist of three main parts: video, audio, and systems.
The MPEG systems part coordinates between the video and audio parts, as well as
coordinating external synchronization for playing.

The MPEG-1 standard was completed in 1993, and was adopted in video CDs. MP3
audio, which is popular on the Internet, is actually an MPEG-1 standard adopted to
handle music in digital audio. It is called MP3 because it is an arrangement for MPEG
Audio Layer 3. The MPEG-2 standard was proposed in 1996. It is used as the format
for many digital television broadcasting applications, and is the basis for digital
versatile disks (DVDs), a much more compact version of the video CD with the capacity
for full-length movies. The MPEG-4 standard was completed in 1998, and was adopted
in 1999. MPEG-4 is designed for transmission over channels with a low bit rate . It is
now the popular format for video scripts on the Internet.

These MPEG standards were adopted in a common MPEG format for exchange in disk
files. The conventional file name would have the extension ~.MPG or ~.MPEG (where
~ represents a file name). MPEG-7, completed in 2001, is called the Multimedia
Content Description Interface. It defines a standard for the textual description of the
various multimedia contents, to be arranged in a way that optimizes searching. MPEG-
21, called the Multimedia Framework, was started in June 2000. MPEG-21 is intended
to integrate the various parts and subsystems into a platform for digital multimedia.
Module 5
Image Edge Detection Operators in Digital Image Processing
Edges are significant local changes of intensity in a digital image. An edge can be
defined as a set of connected pixels that forms a boundary between two disjoint
regions. There are three types of edges:
• Horizontal edges
• Vertical edges
• Diagonal edges
Edge Detection is a method of segmenting an image into regions of discontinuity.
It is a widely used technique in digital image processing tasks such as

• pattern recognition
• image morphology
• feature extraction
Edge detection allows users to observe the features of an image by locating significant
changes in the gray level. Such a discontinuity indicates the end of one region in the
image and the beginning of another. Edge detection reduces the amount of data in an
image while preserving its structural properties.
Edge Detection Operators are of two types:
• Gradient-based operators, which compute first-order derivatives of a
digital image, e.g. the Sobel operator, Prewitt operator, Roberts operator
• Gaussian-based operators, which compute second-order derivatives of
a digital image, e.g. the Canny edge detector, Laplacian of Gaussian

Sobel Operator: The Sobel edge detection operator extracts all the edges of an
image, without worrying about the directions. The main advantage of the Sobel
operator is that it provides both a differencing and a smoothing effect.

The Sobel edge detection operator is implemented as a combination of two directional
derivative estimates, and the resulting image highlights the edge outline of the original
image.
The Sobel edge detection operator consists of two 3x3 convolution kernels. Gx is a
simple kernel responding to vertical edges, and Gy is Gx rotated by 90°:

        Gx = [ -1  0  +1 ]          Gy = [ +1  +2  +1 ]
             [ -2  0  +2 ]               [  0   0   0 ]
             [ -1  0  +1 ]               [ -1  -2  -1 ]

These kernels are applied separately to the input image, so that separate gradient
measurements are produced in each orientation, Gx and Gy.

The gradient magnitude is:

        |G| = sqrt(Gx² + Gy²)

Because it is much faster to compute, an approximate magnitude is often used instead:

        |G| ≈ |Gx| + |Gy|
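
A minimal sketch of applying these kernels, assuming SciPy is available (the function
name and the choice of the approximate magnitude are illustrative):

import numpy as np
from scipy.ndimage import convolve

def sobel_edges(img):
    """Return the approximate Sobel gradient magnitude |Gx| + |Gy|."""
    Gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    Gy = Gx.T                                   # the 90-degree rotated kernel
    gx = convolve(img.astype(np.float64), Gx)   # horizontal gradient estimate
    gy = convolve(img.astype(np.float64), Gy)   # vertical gradient estimate
    return np.abs(gx) + np.abs(gy)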

Advantages:

1. Simple and time efficient computation


2. Very easy at searching for smooth edges
Limitations:

1. Diagonal direction points are not always preserved


2. Highly sensitive to noise
3. Not very accurate in edge detection
4. It produces thick and rough edges, which does not give appropriate results

Prewitt Operator: Prewitt operator is a differentiation operator. Prewitt operator is


used for calculating an approximation of the gradient of the image intensity function. At
each point of the image, the Prewitt operator produces a gradient vector (or its norm).
In the Prewitt operator, the image is convolved in the horizontal and vertical directions
with small, separable, integer-valued filters, so it is inexpensive in terms of computation.
Advantages:

1. Good performance on detecting vertical and horizontal edges


2. Best operator to detect the orientation of an image
Limitations:

1. The magnitude of coefficient is fixed and cannot be changed


2. Diagonal direction points are not always preserved

Roberts Operator: The Roberts cross operator is used to perform 2-D spatial gradient
measurement on an image; it is simple and quick to compute. In the Roberts cross
operator, each output pixel value represents the estimated absolute magnitude of the
spatial gradient of the input image at that point.

The Roberts cross operator consists of two 2x2 convolution kernels. Gx is a simple
kernel and Gy is Gx rotated by 90°:

        Gx = [ +1   0 ]          Gy = [  0  +1 ]
             [  0  -1 ]               [ -1   0 ]

The gradient magnitude is:

        |G| = sqrt(Gx² + Gy²)

Because it is much faster to compute, an approximate magnitude is often used instead:

        |G| ≈ |Gx| + |Gy|

Advantages:

1. Detection of edges and orientation are very easy


2. Diagonal direction points are preserved
Limitations:

1. Very sensitive to noise


2. Not very accurate in edge detection

Marr-Hildreth Operator or Laplacian of Gaussian (LoG): It is a gaussian-based


operator which uses the Laplacian to take the second derivative of an image. This
works well when the transition of the grey level is abrupt. It works on the zero-crossing
method: where the second-order derivative crosses zero, the gradient reaches a local
maximum, and that location is taken to be an edge location. Here the Gaussian
operator reduces the noise and the Laplacian operator detects the sharp edges.
The Gaussian function is defined by the formula:

        G(x, y) = (1 / (2πσ²)) exp( −(x² + y²) / (2σ²) )
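
A minimal sketch of the LoG operator, assuming SciPy is available (the sigma value and
the simple zero-crossing test are illustrative choices):

import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(img, sigma=2.0):
    """Mark pixels where the Laplacian-of-Gaussian response crosses zero."""
    response = gaussian_laplace(img.astype(np.float64), sigma=sigma)
    edges = np.zeros(img.shape, dtype=bool)
    # a sign change between horizontal or vertical neighbours is a zero crossing
    edges[:, :-1] |= np.sign(response[:, :-1]) != np.sign(response[:, 1:])
    edges[:-1, :] |= np.sign(response[:-1, :]) != np.sign(response[1:, :])
    return edges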

Advantages:

1. Easy to detect edges and their various orientations


2. It has fixed characteristics in all directions
Limitations:

1. Very sensitive to noise


2. The localization error may be severe at curved edges
3. It generates noisy responses that do not correspond to edges, so-called
“false edges”

Canny Operator: It is a Gaussian-based operator for detecting edges. This operator
is not very susceptible to noise. It extracts image features without affecting or altering
them. The Canny edge detector has an advanced algorithm derived from the previous
work on the Laplacian of Gaussian operator. It is widely used as an optimal edge
detection technique. It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response

Advantages:

1. It has good localization


2. It extracts image features without altering them
3. Less sensitive to noise
Limitations:

1. There can be false zero crossings


2. Complex and time-consuming computation
Some Real-world Applications of Image Edge Detection:

• medical imaging, study of anatomical structure


• locate an object in satellite images
• automatic traffic controlling systems
• face recognition, and fingerprint recognition

Thresholding
Thresholding is one of the segmentation techniques that generates a binary image
(a binary image is one whose pixels have only two values – 0 and 1 and thus requires
only one bit to store pixel intensity) from a given grayscale image by separating it
into two regions based on a threshold value. Hence pixels having intensity values
greater than the said threshold will be treated as white or 1 in the output image and
the others will be black or 0.

Suppose the histogram of an image f(x,y) has one peak near gray level 40 and another
near 180. There are then two major groups of pixels – one group consisting of pixels
having a darker shade and the other having a lighter shade. So there can be an object
of interest set against the background. Using an appropriate threshold value, say 90,
will divide the entire image into two distinct regions.
In other words, if we have a threshold T, then the segmented image g(x,y) is computed
as shown below:

        g(x, y) = 1   if f(x, y) > T
        g(x, y) = 0   if f(x, y) ≤ T
So the output segmented image has only two classes of pixels – one having a value
of 1 and others having a value of 0.
If the threshold T is constant in processing over the entire image region, it is said to
be global thresholding. If T varies over the image region, we say it is variable
thresholding.
Multiple thresholding classifies the image into three regions – for example, two distinct
objects on a background. The histogram in such cases shows three peaks and two
valleys between them. The segmentation can then be completed using two
appropriate thresholds T1 and T2:

        g(x, y) = a   if f(x, y) > T2
        g(x, y) = b   if T1 < f(x, y) ≤ T2
        g(x, y) = c   if f(x, y) ≤ T1

where a, b and c are three distinct intensity values.

Otsu’s Thresholding Concept

Automatic global thresholding algorithms usually have following steps.

1. Process the input image


2. Obtain image histogram (distribution of pixels)
3. Compute the threshold value
4. Replace image pixels with white in those regions where the pixel intensity is greater
than the threshold, and with black in the opposite cases.

Usually, different algorithms differ in step 3.

Let’s understand the idea behind Otsu’s approach. The method processes the image
histogram, segmenting the objects by minimizing the variance within each of the
classes. Usually, this technique produces the appropriate results for bimodal images.
The histogram of such an image contains two clearly expressed peaks, which represent
different ranges of intensity values.

The core idea is to separate the image histogram into two clusters with a threshold
defined as the result of minimizing the weighted within-class variance of these classes,
denoted by σ_w²(t).

The whole computation can be described as:

        σ_w²(t) = ω_0(t) σ_0²(t) + ω_1(t) σ_1²(t)

where ω_0(t) and ω_1(t) are the probabilities of the two classes divided by the threshold
t, whose value is within the range from 0 to 255 inclusive.

There are two options to find the threshold. The first is to minimize the within-class
variance defined above, σ_w²(t); the second is to maximize the between-class variance
using the expression below:

        σ_b²(t) = ω_0(t) ω_1(t) [ μ_0(t) − μ_1(t) ]²

where μ_i(t) is the mean of class i.

The probability ω_i(t) is calculated for each of the two separated clusters using the
cluster probability functions expressed as:

        ω_0(t) = ∑_{i=0}^{t−1} p(i),        ω_1(t) = ∑_{i=t}^{255} p(i)

It should be noted that the image can be presented as an intensity function f(x, y)
whose values are gray levels. The quantity of pixels with a specified gray level i is
denoted by n_i, and the total number of pixels in the image is N.
Thus, the probability of gray-level occurrence is:

        p(i) = n_i / N

The pixel intensity values for class C_0 are in [0, t−1] and for class C_1 are in [t, 255],
where 255 is the maximum pixel value.

The next phase is to obtain the means of C_0 and C_1, which are denoted by μ_0(t) and
μ_1(t) respectively:

        μ_0(t) = ∑_{i=0}^{t−1} i p(i) / ω_0(t),        μ_1(t) = ∑_{i=t}^{255} i p(i) / ω_1(t)

Now let’s recall the above equation of the within-class weighted variance. We will find
the rest of its components, σ_0²(t) and σ_1²(t), by mixing all of the ingredients obtained
above:

        σ_0²(t) = ∑_{i=0}^{t−1} (i − μ_0(t))² p(i) / ω_0(t)
        σ_1²(t) = ∑_{i=t}^{255} (i − μ_1(t))² p(i) / ω_1(t)

It should be noted that if the threshold is chosen incorrectly, the variance of some
class will be large. To get the total variance we simply need to sum the within-class
and between-class variances:

        σ² = σ_w²(t) + σ_b²(t)

where σ_b²(t) = ω_0(t)(μ_0(t) − μ)² + ω_1(t)(μ_1(t) − μ)² and μ is the global mean. The
total variance of the image, σ², does not depend on the threshold.

Thus, the general algorithm pipeline for the between-class variance maximization
option can be represented in the following way:

1. calculate the histogram and the intensity level probabilities p(i)

2. initialize ω_i(0) and μ_i(0)
3. iterate over all possible thresholds t = 1, …, 255:
o update the values of ω_i(t) and μ_i(t), where ω_i is the probability and μ_i is the mean
of class i
o calculate the between-class variance value σ_b²(t)
4. the final threshold is the t for which σ_b²(t) is maximum (a minimal sketch follows
below)
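
A minimal sketch of the between-class variance maximization, assuming an 8-bit
grayscale NumPy image (for clarity it simply recomputes ω and μ at every candidate
threshold instead of updating them incrementally as in step 3):

import numpy as np

def otsu_threshold(img):
    """Return the threshold t that maximizes the between-class variance."""
    p = np.bincount(img.ravel(), minlength=256) / img.size   # p(i)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()                     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0               # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2              # sigma_b^2(t)
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# t = otsu_threshold(img); binary = (img >= t).astype(np.uint8)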

Variable Thresholding

There are broadly two different approaches to local thresholding. One approach is to
partition the image into non-overlapping rectangles. Then the techniques of global
thresholding or Otsu’s method are applied to each of the sub-images. Hence in the
image partitioning technique, the methods of global thresholding are applied to each
sub-image rectangle by assuming that each such rectangle is a separate image in itself.
This approach is justified when the sub-image histogram properties are suitable (have
two peaks with a wide valley in between) for the application of thresholding techniques
but the entire image histogram is corrupted by noise and hence is not ideal for global
thresholding.

The other approach is to compute a variable threshold at each point from the
properties of the neighborhood pixels. Let us say that we have a neighborhood S_xy of
a pixel having coordinates (x,y). If the mean and standard deviation of the pixel
intensities in this neighborhood are m_xy and σ_xy, then the threshold at each point
can be computed, for example, as:

        T_xy = a σ_xy + b m_xy

where a and b are arbitrary constants. The above definition of the variable threshold is
just an example. Other definitions can also be used according to the need.

The segmented image is computed as:

        g(x, y) = 1   if f(x, y) > T_xy
        g(x, y) = 0   if f(x, y) ≤ T_xy

Moving averages can also be used as thresholds. This technique of image thresholding
is the most general one and can be applied to widely different cases.
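
A minimal sketch of point-wise variable thresholding with local statistics, assuming
SciPy is available (the neighborhood size and the constants a and b are illustrative):

import numpy as np
from scipy.ndimage import uniform_filter

def variable_threshold(img, size=15, a=2.0, b=0.5):
    """Threshold each pixel against T_xy = a*sigma_xy + b*m_xy of its neighborhood."""
    f = img.astype(np.float64)
    m = uniform_filter(f, size)                                           # local mean
    sigma = np.sqrt(np.maximum(uniform_filter(f * f, size) - m * m, 0))   # local std
    T = a * sigma + b * m
    return (f > T).astype(np.uint8)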

Region-Based Segmentation
Region-based segmentation involves dividing an image into regions with similar
characteristics. Each region is a group of pixels, which the algorithm locates via a seed
point. Once the algorithm finds the seed points, it can grow regions by adding more
pixels or shrinking and merging them with other points.

Region Growing
– Region growing is a procedure that groups pixels or subregions into larger regions.
– The simplest of these approaches is pixel aggregation, which starts with a set of
“seed” points and from these grows regions by appending to each seed point those
neighboring pixels that have similar properties (such as gray level, texture, color, or
shape); a minimal sketch follows below.
– Region growing based techniques are better than the edge-based techniques in
noisy images where edges are difficult to detect.
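
A sketch of pixel aggregation from a single seed, assuming a grayscale NumPy image
(the similarity test, an absolute gray-level difference, and the 4-connectivity are
illustrative choices):

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    """Grow a region from `seed` (row, col) over 4-connected similar pixels."""
    region = np.zeros(img.shape, dtype=bool)
    seed_value = int(img[seed])
    queue = deque([seed])
    region[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                    and not region[ny, nx]
                    and abs(int(img[ny, nx]) - seed_value) <= tol):
                region[ny, nx] = True
                queue.append((ny, nx))
    return region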

Region Merging
Merging must start from a uniform seed region. Some work has been done in
discovering a suitable seed region. One method is to divide the image into 2x2 or 4x4
blocks and check each one. Another is to divide the image into strips, and then
subdivide the strips further. In the worst case the seed will be a single pixel. Once a
seed has been found, its neighbours are merged until no more neighbouring regions
conform to the uniformity criterion. At this point the region is extracted from the
image, and a further seed is used to merge another region.

There are some drawbacks which must be noted with this approach. The process is
inherently sequential, and if fine detail is required in the segmentation then the
computing time will be long. Moreover, since in most cases the merging of two regions
will change the value of the property being measured, the resulting area will depend
on the search strategy employed among the neighbours, and the seed chosen.

Region Splitting
These algorithms begin from the whole image, and divide it up until each sub region
is uniform. The usual criterion for stopping the splitting process is when the properties
of a newly split pair do not differ from those of the original region by more than a
threshold.

The chief problem with this type of algorithm is the difficulty of deciding where to
make the partition. Early algorithms used some regular decomposition methods, and
for some classes these are satisfactory, however, in most cases splitting is used as a
first stage of a split/merge algorithm.
