
Unit I

Part 1: CAMERAS
Pinhole Cameras.

Part 2: Radiometry: Measuring Light:


Light in Space, Light at Surfaces, Important Special Cases.

Part 3:
Sources, Shadows, And Shading
Qualitative Radiometry, Sources and Their Effects, Local Shading
Models, Application: Photometric Stereo, Interreflections: Global
Shading Models.

Part 4:
Color
The Physics of Color, Human Color Perception, Representing Color,
A Model for Image Color, Surface Color from Image Color.

What is Computer Vision? Explain its working, advantages, disadvantages, and applications.
Computer vision is the field of computer science that focuses on
creating digital systems that can process, analyze, and make sense
of visual data (images, videos) in the same way that humans do.

Computer Vision uses convolutional neural networks (CNNs) to process visual data at the pixel level and deep learning recurrent neural networks to understand how one pixel relates to another.

Computer Vision Working:

1. Image Acquisition: Computer vision systems start with capturing or acquiring images or videos using cameras or other sensors.

2. Preprocessing: The acquired images are preprocessed to enhance quality, remove noise, and normalize them for further processing.

3. Feature Extraction: Key features like edges, corners, textures, shapes, and colours are extracted from the preprocessed images.

4. Feature Representation: Extracted features are represented in a suitable mathematical form for analysis and interpretation.

5. Recognition and Interpretation: Computer vision algorithms analyze the represented features to recognize objects, scenes, gestures, or patterns and interpret their meaning.

6. Decision Making: Based on the interpretation, computer vision systems make decisions or take actions, such as classification, detection, tracking, or 3D reconstruction.
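A minimal sketch of the first stages above using OpenCV and NumPy (a sketch only; the file name, threshold values, and chosen feature detectors are illustrative assumptions, not prescribed by these notes):

```python
# Minimal sketch of the first stages of a computer vision pipeline.
import cv2
import numpy as np

# 1. Image acquisition: read an image from disk (a live camera would use
#    cv2.VideoCapture instead).
image = cv2.imread("scene.jpg")

# 2. Preprocessing: convert to grayscale, remove noise, normalize intensity.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
denoised = cv2.GaussianBlur(gray, (5, 5), 0)
normalized = cv2.normalize(denoised, None, 0, 255, cv2.NORM_MINMAX)

# 3. Feature extraction: edges and corners.
edges = cv2.Canny(normalized, 100, 200)
corners = cv2.goodFeaturesToTrack(normalized, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)

# 4. Feature representation: stack corner coordinates into an array that
#    later recognition / decision stages can consume.
features = np.squeeze(corners) if corners is not None else np.empty((0, 2))
print(f"Detected {features.shape[0]} corner features")
```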

Advantages of CV:

1. Automation: It enables automation of visual tasks that are tedious, time-consuming, or error-prone for humans.

2. Accuracy: Computer vision systems can perform repetitive tasks with high accuracy and consistency.

3. Speed: They can process visual information much faster than humans, enabling real-time applications.

4. Scale: Computer vision systems can handle large datasets and vast amounts of visual information efficiently.

5. Objectivity: They provide objective analysis and interpretation of visual data, reducing biases.

6. Versatility: Computer vision algorithms can be adapted and applied to various domains, including healthcare, manufacturing, automotive, security, and entertainment.

Disadvantages:

1. Complexity: Developing robust computer vision systems requires expertise in image processing, machine learning, and software engineering.

2. Dependency on Data: Performance heavily depends on the quantity and quality of training data, which can be challenging to obtain and annotate.

3. Computational Resources: Some advanced computer vision algorithms require significant computational resources, making them unsuitable for resource-constrained environments.

4. Environmental Sensitivity: Performance may degrade in challenging environments with variations in lighting, weather, or viewpoint.

5. Interpretability: Complex deep learning models used in computer vision may lack interpretability, making it difficult to understand their decision-making process.

6. Privacy Concerns: Applications involving surveillance or image processing of personal data raise privacy concerns and ethical considerations.

Limitations:

1. Generalization: Computer vision systems may struggle to generalize to unseen scenarios or objects not encountered during training.

2. Ambiguity: Interpretation of visual data can be ambiguous, leading to errors or misinterpretations.

3. Occlusion and Clutter: Occlusions and clutter in images can hinder object detection and recognition.

4. Limited Context: Understanding visual scenes often requires contextual knowledge beyond what is captured in a single image.

5. Domain Specificity: Performance may vary across different domains, and models trained in one domain may not generalize well to others.

6. Semantic Gap: There's a semantic gap between low-level visual features and high-level semantic concepts, which poses a challenge for meaningful interpretation.

Applications of Computer Vision:

1. Agriculture:
Product Quality Testing, Plant disease detection, Livestock health monitoring, Crop and yield monitoring, Insect detection, Aerial survey and imaging, Automatic weeding, Yield Assessment, etc.

2. Sports:
Player Tracking, Performance Assessment, Batting Recognition, Real-Time Coaching, Sports Activity Scoring, etc.

3. Healthcare:
Cell Classification, Disease Progression Score, Cancer detection, Blood loss measurement, Movement analysis, CT and MRI, X-Ray analysis, etc.

4. Transportation:
Vehicle Classification, Traffic flow analysis, Self-driving cars, Moving Violations Detection, Pedestrian detection, License Plate Recognition, Parking occupancy detection, Road Condition Monitoring, Driver Attentiveness Detection, etc.

5. Manufacturing:
Defect inspection, Reading text and barcodes, Product assembly, etc.

6. Retail:
Intelligent video analytics, Waiting Time Analytics, Theft Detection, Foot traffic and people counting, Self-checkout, Automatic replenishment, etc.

7. Construction:
Predictive maintenance, PPE Detection, etc.

Top Tools used for Computer Vision:

OpenCV, TensorFlow, Keras, CUDA, MATLAB, Viso Suite, CAFFE, SimpleCV, DeepFace, YOLO, GPUImage, etc.

PART 1 CAMERAS

Pinhole Camera:
 There are many types of imaging devices, from animal eyes to
video cameras and radio telescopes, and they may or may not be
equipped with lenses.
 The visual arts and photography have undergone a major
transformation over the years.
 With the advancement of technology, the designs and
functioning of cameras have also changed.
 However, the Pinhole Camera, which is one of the fundamentals
of photography, is still an interesting technique today.
 The first models of the “camera obscura”, a Latin term meaning dark chamber, invented in the sixteenth century, did not have lenses, but instead used a pinhole to focus light rays onto a wall or translucent plate and demonstrate the laws of perspective discovered a century earlier by Brunelleschi.
 The pinhole camera is the simplest kind of camera.
 It does not have a lens.
 It just makes use of a tiny opening/aperture (a pinhole-sized
opening) to focus all light rays within the smallest possible area to
obtain an image, as clearly as possible.
 The core component of the Pinhole Camera, the hole is an
opening through which light passes.
 Light rays reflected or radiated from an object pass through
the hole and reach the projection plane.
 The hole forces the light rays coming from different points on the object to pass through a single point before reaching distinct points on the projection plane.
 This prevents the light rays from scattering and allows a
clearer image to fall on the projection plane.
 The small size of the hole helps to focus the light rays
accurately.
 The simple image formed using a pinhole camera is always
inverted.
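As a sketch of the geometry implied above (the standard perspective-projection equations, not written out in these notes): with the pinhole at the origin and the image plane at distance f from it, a scene point P = (X, Y, Z) projects to the image point

x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}

The image is inverted when the plane is placed behind the pinhole, which is why the pinhole image is always upside down.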

Some of the terms related to pinhole camera are:


1. Projection Plane:
The projection plane is where the Pinhole Camera image is formed.
This plane can often be a wall, a screen, or a special surface.
2. The Field of View
 The field of view is the angular area the Pinhole Camera can see
and determines the width of the scene where the photo is taken.
 A larger field of view covers a wider perspective and allows the
camera to view a wider area around it.
 Field of view is a modifiable feature in the Pinhole Camera. A pinhole camera uses factors such as hole size and focal length to adjust the field of view.

3. COP (Center of Projection):

 The center of projection is the point through which all the light rays pass; in a pinhole camera this is the pinhole itself.
 As the light rays pass through this center, they fall on the projection plane and are inverted there.
 This mechanism provides the formation of the image.

Advantages:
1. It is simple and has a few components.
2. Its design is quite simple.
3. It is low cost.
4. It is portable.
Disadvantages:
1. Due to absence of lens, it has a low light-gathering capacity.
2. Lenses receive more light and expose better, due to their ability to
collect light. In the Pinhole Camera, on the other hand, light is more
difficult to collect, which means longer exposure times may be needed
in low light conditions.
3. Due to absence of lens, there may be a loss of sharpness in
image quality.
4. There are also restrictions on depth of focus because focus
cannot be adjusted in the Pinhole Camera.

Images obtained without using lenses can create a certain atmosphere and nostalgic effect.
This makes the Pinhole Camera interesting as a creative tool for photographers and artists.

PART 2: RADIOMETRY – MEASURING LIGHT



Light in Space:

2.3 Sources and their Effects


There are three main types of geometrical light source models: point
sources, line sources and area sources.

The expression for the radiosity produced at a surface patch by these light sources is obtained by thinking about the appearance of the source from the patch.

2.3.1 Point Sources


Light sources are modelled as point sources because many sources
are physically small compared to the environment in which they
stand.

A model for the effects of a point source is obtained by modelling the source as a very small sphere which emits light at each point on the sphere, with an exitance that is constant over the sphere.

Assume that a surface patch is viewing a sphere of radius ε, at a distance r away, and that ε ≪ r, as shown below:

The solid angle that the source subtends is Ωs. For a small sphere of radius ε at distance r, this behaves approximately as π(ε/r)², i.e. it falls off as the inverse square of the distance.

The pattern of illumination that the source creates on the hemisphere will (roughly) scale, too. As the sphere moves away, the rays leaving the surface patch and striking the sphere move closer together (roughly) evenly, and the collection changes only slightly (a small set of new rays is added at the rim — the contribution from these rays must be very small, because they come from directions tangent to the sphere). In the limit as ε tends to zero, no new rays are added.

The radiosity due to the point source is obtained by integrating the illumination pattern over the surface, multiplied by the cosine of the angle of incidence, and scaled by the surface albedo.

The expression for the radiosity due to the point source is given below, where E is a term in the exitance of the source, integrated over the small patch, and ρ is the surface albedo.
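A hedged reconstruction of this expression, consistent with the terms just defined (the solid angle falls off as ε²/r², and θ is the angle of incidence):

B(x) \;\approx\; \rho(x)\, E\, \frac{\varepsilon^2}{r^2}\, \cos\theta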

A Nearby Point Source

The standard nearby point source model is given by:

where N(x) is the unit normal to the surface and S(x), known as the source vector, is a vector from x to the source.
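A hedged reconstruction of the model, following the standard form and the terms just defined:

B(x) = \rho_d(x)\, \frac{N(x)\cdot S(x)}{r(x)^2}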

This is an extremely convenient model, because it gives an explicit


relationship between radiosity and shape (the normal term).

A Point Source at Infinity


The sun is far away; as a result, the terms 1/r(x)² and S(x) are essentially constant.
In this case, the point source is referred to as being a point source at
infinity.
If all the surface patches we are interested in are close together with respect to the distance to the source, we can write r(x) = r0 + Δr(x), where r0 ≫ Δr(x).

The radiosity due to a point source at infinity is given by:
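A hedged reconstruction of the standard form, where S is the (constant) source vector:

B(x) = \rho_d(x)\,\bigl(N(x)\cdot S\bigr)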

Choosing a Point Source Model

Point source at infinity works for distant sources: The sun is a


good example because it appears very small from our perspective,
regardless of from where we look at it. This allows us to treat it as a
point source for calculations.

Apparent size of object matters: For very distant sources, their


apparent size doesn't change much with our viewpoint. However, for
nearby sources like a light bulb in a room, their apparent size
increases as we get closer, leading to a stronger light effect.

Point source model has limitations: While the point source model
simplifies calculations, it can create unrealistic results for nearby
sources. For example, a point source in the center of a cube wouldn't
accurately predict the darkness of corners in a real room.

As the source gets closer, the solid angle it subtends goes up as the inverse square of the distance, which means that the radiosity due to the source also goes up.

2.3.2 Line Sources

A line source has the geometry of a line.

A good example is a single fluorescent light bulb.

Line sources are not terribly common in natural scenes or in


synthetic environments.

Their main interest is as an example for radiometric problems.


In particular, the radiosity of patches reasonably close to a line
source changes as the reciprocal of distance to the source.

We model a line source as a thin cylinder with diameter ε.

We also assume that the line source is infinitely long, with constant
exitance along its surface and diameter ε.

We consider a patch that views the source frontally, as in Figure 2.4.

Figure 2.4

On the right, the view of the source from each patch is shown.

We can see that the length of the source on this hemisphere does not change, but the width does (as ε/r).

Hence, the radiosity due to the source decreases with the reciprocal
of distance.

2.3.3 Area Sources


Area sources are important, for two reasons.
Firstly, they occur quite commonly in natural scenes and in synthetic
environments.

Secondly, a study of area sources will allow us to explain various


shadowing and interreflection effects.

An overcast sky is a good example of an area source in a natural scene, and the fluorescent light boxes found in many industrial ceilings are good examples in synthetic environments.
Area sources are normally modelled as surface patches whose
emitted radiance is independent of position and of direction—they
can be described by their exitance.

For points not too far from the source, the radiosity due to an area
source does not change with distance to the source.

Area sources generally yield fairly uniform illumination and hence


they are widely used in illumination engineering.

The radiosity on the surface is obtained by summing the incoming


radiance over all incoming directions, which is shown below:
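A hedged reconstruction of this expression, integrating the incoming radiance Li over the input hemisphere Ω, weighted by the cosine of the angle of incidence:

B(x) = \rho_d(x) \int_{\Omega} L_i(x,\theta,\phi)\,\cos\theta \; d\omega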

2.4 Local Shading Models


In computer vision, shading refers to the process of altering the color of an object/surface/polygon in the 3D scene, based on things like the surface's angle to lights, its distance from lights, its angle to the camera, and material properties (e.g. the bidirectional reflectance distribution function) to create a photorealistic effect.

A shading model defines how a material’s color varies depending on


factors such as surface orientation, viewer direction, and lighting.

Shading models are algorithms that determine how light and color
interact with the surfaces of 3D objects in computer vision.

Radiance could arrive at surface patches by many ways, for example,


it could be reflected from other surface patches; and it is necessary
to know which components account for this radiance.

Hence, selecting a shading model is very difficult.

The easiest model to manipulate is a local shading model, which


models the radiosity at a surface patch as the sum of the radiosity
due to sources and sources alone.

Local shading models provide a way to determine the intensity and


color of a point on a surface.

The models are local because they don't consider other objects at all.

The models are used because they are fast and simple to compute.

The local shading models capture:


i) Direct illumination from light sources.
ii) Diffuse and Specular components
iii) Approximate effects of global lighting.

They do not require the knowledge of the entire scene only the current
piece of surface.

This model will support a variety of algorithms and theories.


Unfortunately, this model often produces wildly inaccurate predictions. Even worse, there is little reliable information about when this model is safe to use.

An alternate model is the global shading model, which takes into account all radiation (Section 2.6).

This takes into account radiance arriving from sources, and that
arriving from radiating surfaces.

This model is physically accurate, but usually very hard to


manipulate.

2.4.1 Local Shading Models for Point Sources


The local shading model for a set of point sources is:

where Bs(x) is the radiosity due to source s.

If all the sources are point sources at infinity, the expression


becomes:

For point sources that are not at infinity the model becomes:

where rs(x) is the distance from the source to x;
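Hedged reconstructions of the three expressions referred to above, following the standard forms (the sums run over the sources visible from x):

B(x) = \sum_{s \in \text{visible sources}} B_s(x) \qquad \text{(general case)}

B(x) = \rho_d(x) \sum_{s} \bigl(N(x)\cdot S_s\bigr) \qquad \text{(point sources at infinity)}

B(x) = \rho_d(x) \sum_{s} \frac{N(x)\cdot S_s(x)}{r_s(x)^2} \qquad \text{(nearby point sources)}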


The Appearance of Shadows
In a local shading model, shadows occur when the patch can not see
one or more sources.

In this model, point sources produce a series of shadows with crisp boundaries.

Shadow regions where no source can be seen are particularly dark.

Shadows cast with a single source can be very crisp and very black,
depending on the size of the source and the albedo of other nearby
surfaces.

The geometry of the shadow cast by a point source on a plane is


analogous to the geometry of viewing in a perspective camera (Figure
2.6).
Any patch on the plane is in shadow if a ray from the patch to the
source passes through an object.

Figure 2.6

There are two kinds of shadow boundary.


1. At self shadow boundaries, the surface is turning away from
the light, and a ray from the patch to the source is tangent to
the surface.

2. At cast shadow boundaries, from the perspective of the patch,


the source suddenly disappears behind an occluding
(obstructing) object.

Shadows cast onto curved surfaces can have extremely complex


geometries.

If there are many sources, the shadows will be less dark, except at
points where no source is visible.

There can be very many qualitatively distinct shadow regions (each source casts its own shadow — some points may not see more than one source).

2.4.2 Area Sources and their Shadows


The local shading model for a set of area sources is significantly more
complex, because it is possible for patches to see only a portion of a
given source. The model becomes:
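A hedged reconstruction of the area-source model: the radiosity integrates the source radiance over only the visible portion of each area source,

B(x) = \rho_d(x) \sum_{s} \int_{\text{visible part of source } s} L_s \cos\theta \; d\omega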

Area sources do not produce dark shadows with crisp boundaries.


This is because, from the perspective of a viewing patch, the source
appears slowly from behind the occluding object; an eclipse of the
moon is an exact analogy.
Area sources generate complex shadows with smooth boundaries,
because, from the point of view of a surface patch, the source
disappears slowly behind the occluder.

Regions where the source cannot be seen at all are known as the
umbra (Latin word meaning shadow).

Regions where some portion of the source is visible are known as the
penumbra (Latin word meaning almost shadow).

Umbra is the dark part of the shadow whereas the penumbra is the
less dark part of the shadow.

Umbra is the central part of the shadow while Penumbra is the


outer part.
Light cannot reach Umbra while light can reach penumbra.
Figure 2.7

A good model is to imagine lying with your back to the surface,


looking at the world above, as shown in figure 2.7.

At point 1, one can see all of the source; at point 2, one can see some
of it; and at point 3 you can see none of it.

2.4.3 Ambient Illumination


One problem with local shading models is that they predict that some
shadow regions are arbitrarily dark, because they cannot see the
source.

This prediction is inaccurate in almost every case.

But in fact, the local shading models miss an important effect: light
bouncing off nearby surfaces can still illuminate shadows.
This effect is especially noticeable, and can be very significant, in rooms with bright walls and large light sources, because shadows are illuminated by light from other diffuse surfaces.

To account for this indirect illumination, we can add an "ambient


illumination term" to the shading model.

This term represents the constant background light a surface


receives from its surroundings.

Figure 2.8

Ambient illumination is a term added to the radiosity predictions of


local shading models to model the effects of radiosity from distant,
reflecting surfaces.

For some environments, the total irradiance a patch obtains from


other patches is roughly constant and roughly uniformly distributed
across the input hemisphere.
In such an environment it is sometimes possible to model the effect
of other patches by adding an ambient illumination term to each
patch’s radiosity. There are two strategies for determining this term.

There are two ways to calculate this term as shown in figure 2.8
above:

In a world like the interior of a sphere or of a cube (the case on the


left in figure 2.8), where a patch sees roughly the same thing from
each point, a constant ambient illumination term is often added to
radiosity of each patch. The magnitude of this term is usually
guessed.

In more complex worlds, some surface patches see much less of the
surrounding world than others.

For example, the patch at the base of the groove on the right in figure 2.8 sees relatively little of the outside world. The outside world is modelled as an infinite polygon of constant radiosity, and the view of this polygon is occluded at some patches, as shown by the input hemispheres sketched in the figure.

The result is that the ambient term is smaller for patches that see
less of the world.

This model is often more accurate than adding a constant ambient


term.

Unfortunately, it is much more difficult to extract information from


this model, possibly as difficult as for a global shading model.

Monge Patch:
A Monge patch, named after the French mathematician Gaspard
Monge, is a type of parametric surface used in mathematics,
computer graphics and computer vision to represent a curved surface
in 3D space using a mathematical formula.

It is a representation of a piece of surface as a height function.


It is essentially a small, localized description of a larger surface,
similar to how a map uses a flat projection to represent a curved
Earth.

Figure 2.9 Monge Patch

A Monge patch is defined by two parameters u and v that vary over a two-dimensional domain, often a rectangle, together with a height function f(u, v): the surface point corresponding to (u, v) is (u, v, f(u, v)), so the first two coordinates are the parameters themselves and the third is the height above the parameter plane.

Monge patches are useful in differential geometry, a branch of


mathematics concerned with the properties of curves and surfaces.

They allow mathematicians to analyze and manipulate complex


surfaces by breaking them down into smaller, more manageable
pieces.

They are also used in computer graphics and CAD (computer aided
design)to model and render smooth, curved surfaces.
2.5 Application: Photometric Stereo
Photometric stereo is a computer vision technique that recovers the
3D shape of an object by analyzing how light reflects off the surface
from different lighting directions.

Photometric stereo is a method for recovering or reconstructing a


representation of the Monge patch from image data.
A Monge patch represents the surface as (x, y, f(x, y)).

Monge patch is attractive because we can determine a unique point


on the surface by giving the image coordinates.

Photometric stereo is a computer vision technique used to estimate


surface normals of objects from multiple images taken under
different lighting conditions.

This technique relies on the fact that different lighting directions


produce different shading patterns on a surface, and by analyzing
these variations in shading, it's possible to infer the surface
orientation or surface normal.

The method involves reasoning about the image intensity values for
several different images of a surface in a fixed view, illuminated by
different sources.

This method will recover the height of the surface at points


corresponding to each pixel; which is known as a height map, depth
map or dense depth map.

Assumptions:

1. Lambertian reflectance: This is a common assumption where


the reflected light intensity is proportional to the cosine of the
angle between the light source direction and the surface normal.
Perfect diffusers like matte surfaces exhibit Lambertian reflectance.

2. Directional light sources: We typically assume light sources


are infinitely far away, so all rays from the source are parallel.
Process:

1. Image Capture: The object is photographed from a fixed


viewpoint under several light sources with varying directions. It's
crucial to know the exact direction of each light source.
2. Intensity Analysis: For each pixel in the captured images, the
intensity values are collected. These values represent the amount of
light reflected towards the camera at that specific point on the object
under the corresponding lighting condition.
3. Shape from Shading: Mathematical models are applied to relate
the captured light intensity values at each pixel to the surface normal
and lighting conditions. This typically involves solving a system of
equations where each image under a different light source
contributes one equation.

4.Once the surface normal vectors are known for all pixels, we can
reconstruct the 3D shape of the object.
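A minimal NumPy sketch of steps 2 and 3 above, under the stated Lambertian and distant-light assumptions (the function name, array shapes, and the least-squares solver used here are illustrative choices, not part of these notes):

```python
# Photometric stereo sketch: recover albedo and surface normals from
# several images of a fixed view under known, distant light directions.
import numpy as np

def photometric_stereo(images, light_dirs):
    """images: list of m grayscale images (H x W), one per light source.
    light_dirs: (m x 3) array of known unit light-source directions."""
    m = len(images)
    H, W = images[0].shape
    I = np.stack([im.reshape(-1) for im in images], axis=0)   # m x (H*W)

    # Each pixel gives one equation per image: V g = i, where g = albedo * normal.
    V = np.asarray(light_dirs, dtype=float)                   # m x 3
    g, *_ = np.linalg.lstsq(V, I, rcond=None)                 # 3 x (H*W)

    # Albedo is the magnitude of g; the unit normal is its direction.
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-8)

    return albedo.reshape(H, W), normals.T.reshape(H, W, 3)
```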

Mathematical Equations:

The radiosity at a point x on the surface is

where N is the unit surface normal and S1 is the source vector.

The response of the camera is linear in the surface radiosity, and so we have that the value of a pixel at (x, y) is
where g(x, y) = ρ(x, y)N(x, y) and V1 = kS1, where k is the constant
connecting the camera response to the input radiance.

The albedo can be extracted as given by the equation

Finally the normal is extracted using the equation

To recover the depth map, we need to determine f(x, y) from measured


values of the unit normal.

The surface can be reconstructed by summing the changes in height


along some path given by,

where C is a curve starting at some fixed point and ending at (x, y)


and c is a
constant of integration, which represents the (unknown) height of the
surface at the start point.
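Hedged reconstructions of the equations referred to in this subsection, following the standard photometric-stereo formulation and the terms defined above:

B(x) = \rho(x)\,\bigl(N(x)\cdot S_1\bigr)

I(x,y) = g(x,y)\cdot V_1, \qquad g(x,y) = \rho(x,y)\,N(x,y), \quad V_1 = k\,S_1

\rho(x,y) = \lVert g(x,y)\rVert, \qquad N(x,y) = \frac{g(x,y)}{\lVert g(x,y)\rVert}

f(x,y) = \int_{C} \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right)\cdot d\boldsymbol{l} \;+\; c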
Advantages

 Simple and inexpensive


 Can be used to reconstruct the shape of a variety of objects

Disadvantages
 Assumes a Lambertian surface
 Sensitive to noise and shadows.

Limitations:

1. Lambertian Surfaces: The classic approach to photometric


stereo assumes Lambertian surfaces. These are ideal surfaces
where reflected light intensity is proportional to the cosine of the
angle between the light source direction and the surface normal.
Real-world surfaces often deviate from this ideal behavior.

2. Lighting Setup: Knowing the exact lighting directions is


essential for accurate results. In practice, calibrating the lighting
setup can be challenging.

3. Specular Highlights: Shiny or specular surfaces can cause


issues as they reflect light in a mirror-like fashion, making the
reflected intensity less dependent on surface orientation.

Applications:

1. 3D Reconstruction: Photometric stereo can be used to create a


3D model of an object's surface by combining the estimated surface
normals with other techniques like depth estimation.

2. Facial Recognition: In conjunction with other methods,


photometric stereo can help analyze subtle details of facial surfaces
for recognition purposes.

3. Robot Vision: By understanding the 3D shape of objects in their


environment, robots can grasp and manipulate them more
effectively.

Benefits of Shading Models:


The top six benefits of 3D shading models are:
1. Realism
3D shading models accurately simulate how light interacts with
surfaces, resulting in more realistic renderings of real-world objects
and environments.

2. Enhanced visual appeal


By adding highlights, shadows, and reflections, 3D shading models
improve the visual appeal of 3D computer graphics, making them
more engaging and captivating.

3. Depth and dimension


Shading models contribute to the perception of depth and
dimensionality in 3D graphics by manipulating light and shadow to
create the illusion of form and volume.

4. Artistic control
Shading models offer artists and designers precise control over the
appearance of materials and surfaces, helping them achieve specific
aesthetic goals and convey desired moods or atmospheres.

5. Increased efficiency
Advanced shading techniques, such as physically-based rendering,
can help streamline the rendering process by accurately simulating
light behaviour while minimizing the need for manual adjustments,
resulting in more efficient workflows.

6. Versatility
3D shading models are versatile tools that can be applied to
animation, visual effects, video games, architectural visualization,
and product design, providing consistent and high-quality results
across various media and industries.

2.6 Interreflections: Global Shading Models

Local shading models can be quite misleading.


In computer vision, interreflections refer to the phenomenon where
light bounces between multiple surfaces in a scene before reaching
the camera sensor.

This creates a situation where surfaces are illuminated not just by


the direct light source, but also by the reflected light from other
surfaces.

Thus, in the real world, each surface patch is illuminated not only by
sources, but also by other surface patches.

This leads to a variety of complex shading effects, which are still quite
poorly understood.

Unfortunately, these effects occur widely, and it is still not yet known
how to simplify interreflection models without losing essential
qualitative properties.

When one black room with black objects and a white room with white
objects are illuminated by a distant point source, the local shading
model predicts that these pictures would be indistinguishable.

But in fact, the images are qualitatively different, with darker


shadows and crisper boundaries in the black room, and bright
reflexes in the concave corners in the white room.

This is because surfaces in the black room reflect less light onto other
surfaces (they are darker) whereas in the white room, other surfaces
like the walls and floor of the room are significant sources of
radiation, which tend to light up the corners, which would otherwise
be dark.

2.6.1 An Interreflection Model


The total radiosity of a patch will be its exitance — which will be zero
for all but sources — plus all the radiosity due to all the other patches
it can see:
Since there is no distinction between energy leaving another patch
due to exitance and that due to reflection, the expression for
Bincoming(u) is given as:

Where every other patch in the world that the patch under
consideration can see is an area source, with exitance B(v).

Figure 2.16. Terminology for the expression derived in the text for the interreflection kernel.

Here the terminology is that of Figure 2.16, and visible(u, v)K(u, v) is usually referred to as the interreflection kernel.

This means that our model is:

In particular, the solution appears inside the integral.

Equations of this form are known as Fredholm integral equations


of the second kind.

This particular equation is a fairly nasty sample of the type, because


the interreflection kernel is generally not continuous and may have
singularities.
Solutions of this equation can yield quite good models of the
appearance of diffuse surfaces and the topic supports a substantial
industry in the computer graphics community.

2.6.2 Solving for Radiosity


We will sketch one approach to solving for radiosity, to illustrate the
methods.

Subdivide the world into small, flat patches and approximate the
radiosity as being constant over each patch.

This approximation is reasonable, because we could obtain a very


accurate representation by working with small patches.

Now we construct a vector B, which contains the value of the


radiosity for each patch.

In particular, the i’ th component of B is the radiosity of the i’th patch.

We write the incoming radiosity at the i’th patch due to radiosity on


the j’th patch as Bj→i:
where x is a coordinate on the i’th patch and v is a coordinate on the
j’th patch.

Now this expression is not a constant, and so we must average it over


the i’th patch to get

where Ai is the area of the i’th patch.

If we insist that the exitance on each patch is constant, too, we obtain


the model:

This is a system of linear equations in Bi (although an awfully big one — Kij could be a million by a million matrix), and as such can in principle be solved.
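A hedged reconstruction of the discrete model described above, together with a minimal NumPy sketch of solving it (the variable names and the way K, ρ and E are obtained are assumptions, not specified in these notes):

B_i = E_i + \rho_i \sum_{j} K_{ij}\, B_j
\qquad\Longleftrightarrow\qquad
\bigl(I - \mathrm{diag}(\rho)\,K\bigr)\,B = E

```python
# Solve the discretized radiosity system (I - diag(rho) K) B = E.
import numpy as np

def solve_radiosity(K, rho, E):
    """K: (n x n) patch-averaged interreflection kernel,
    rho: (n,) patch albedos,
    E: (n,) patch exitances (zero for all but sources)."""
    n = len(E)
    A = np.eye(n) - np.diag(np.asarray(rho, dtype=float)) @ K
    return np.linalg.solve(A, E)   # radiosity of each patch
```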

2.6.3 The qualitative effects of interreflections


Extracting shape information from radiosity is relatively easy to do
with a local model.

The model describes the world poorly, and very little is known about
how severely this affects the resulting shape information.
Extracting shape information from an interreflection model is
difficult, for two reasons.

Firstly, the relationship —which is governed by the interreflection


kernel — between shape and radiosity is complicated.

Secondly, there are almost always surfaces that are not visible, but
radiate to the objects in view.

These so-called “distant surfaces” mean it is hard to account for


all radiation in the scene using an interreflection model, because
some radiators are invisible and we may know little or nothing
about them.

Hence, understanding qualitative, local effects of interreflection becomes important; this is largely an open research topic.

Smoothing and Regional Properties


Firstly, interreflections have a characteristic smoothing effect.

Imagine a stained glass window: the intricate details are lost when
looking at the colored blobs it projects on the floor.

Figure 2.18
This effect is further explained with a simplified model:
A small patch views a plane with sinusoidal radiosity of unit
amplitude as shown in figure 2.18 above.

This patch will have a (roughly) sinusoidal radiosity due to the


effects of the plane.

We refer to the amplitude of this component as the gain of the


patch.

The graph shows numerical estimates of the gain for patches at ten
equal steps in slant angle, from 0 to π/2, as a function of spatial
frequency on the plane.

The gain falls extremely fast, meaning that large terms at high
spatial frequencies must be regional effects, rather than the
result of distant radiators.

Light bounces between a slanted patch and a flat plane. The model
shows that high spatial frequencies (sharp details) have a hard time
traveling between the surfaces.

Interreflections tend to blur high-detail information (high spatial


frequencies) in an image. This is because light bounces between
surfaces, and these bounces smooth out sharp variations in light
intensity.

Therefore, if we observe a sharp detail (high spatial frequency) in an


image, it's unlikely to be caused by the influence of distant, non-
luminous objects.

This is why it is hard to determine the pattern in a stained glass window by looking at the floor at the foot of the window.

There is a mid range of spatial frequencies that are largely unaffected


by mutual illumination from distant surfaces, because the gain is
small. Spatial frequencies in this range cannot be “transmitted” by
distant passive radiators unless these radiators have improbably
high radiosity.

There's a range of medium frequencies that are also not significantly


affected by distant objects. These frequencies can be thought of as
local properties, only influenced by interreflections within a specific
area.

A second important effect is colour bleeding, where a coloured


surface reflects light onto another coloured surface.

This is a common effect that people tend not to notice unless they
are consciously looking for it. It is quite often reproduced by
painters.

A third qualitative effect is the reflex: a small, bright patch that appears in concave regions as a result of interreflections (for example, the bright reflexes in the concave corners of the white room described earlier).

Whether such effects are clearly visible depends on the geometry and on the albedos of the nearby surfaces.

In practice, interreflection effects are often simply ignored.

If interreflection effects do not change the output of a method much,


then it is probably all right to ignore them.

COLOUR

Colour is a rich and complex experience, usually caused by the vision


system, responding differently to different wavelengths of light (other
causes include pressure on the eyeball and dreams).

While the colour of objects seems to be a useful cue in identifying


them, it is currently difficult to use.
3.1 The Physics of Colour
Color is all about light and how our eyes interact with it.

The major components include:

Light as a wave: Light is a form of electromagnetic radiation, which can be modelled as a wave with a specific wavelength and frequency. Different wavelengths correspond to different colors we perceive.

Visible spectrum: Not all electromagnetic radiation is visible to us.


Our eyes can detect a specific range of wavelengths, called the
visible spectrum.
This spectrum ranges from roughly 380 nanometers (violet) to 750
nanometers (red).

Object interaction: Objects appear colored because they interact


with light in different ways.
An object might absorb certain wavelengths and reflect others.
For instance, a red apple absorbs most wavelengths except for the
red ones, which it reflects back to our eyes, making it appear red.

Color perception: When the reflected light reaches our eyes, the
cone cells in our retina react to the specific wavelengths.
These signals are then sent to the brain, which interprets them as
different colors.

Thus, the physics of colour extends the radiometric vocabulary to


describe energy arriving in different quantities at different
wavelengths and then describing typical properties of coloured
surfaces and coloured light sources.

3.1.1 Radiometry for Coloured Lights: Spectral Quantities


All of the physical units we have described earlier can be extended
with the phrase “per unit wavelength” to yield spectral units.

These allow us to describe differences in energy, in BRDF or in albedo


with wavelength.

The definitions studied earlier can be extended by adding the phrase “per unit wavelength” to obtain what are known as spectral quantities; we ignore the cases where energy changes wavelength.

Spectral radiance is usually written as Lλ(x, θ, φ), and the radiance


emitted in the range of wavelengths [λ, λ+dλ] is Lλ(x, θ, φ)dλ.

Spectral radiance has units of Watts per cubic meter per steradian (W m−3 sr−1 — cubic meters because of the additional factor of the wavelength).

Spectral exitance has units of W m−3 and is the property used when the angular distribution of the source is unimportant.

Similarly, the spectral BRDF is obtained by considering the ratio of


the spectral radiance in the outgoing direction to the spectral
irradiance in the incident direction.
Because the BRDF is defined by a ratio, the spectral BRDF will again
have units sr−1.

The Colour of Surfaces


The colour of coloured surfaces is a result of a large variety of mechanisms, including differential absorption at different wavelengths, refraction, diffraction and bulk scattering.

Usually these effects are bundled into a macroscopic BRDF model,


which is typically a Lambertian plus specular approximation; the
terms are now spectral reflectance (sometimes abbreviated to
reflectance) or (less commonly) spectral albedo.

The colour of the light returned to the eye is affected both by the
spectral radiance of the illuminant and by the spectral reflectance of
the surface.

If we use the Lambertian plus specular model, we have:


E(λ) = ρdh(λ)S(λ) × geometric terms + specular terms
where E(λ) is the spectral radiosity of the surface, ρdh(λ) is the spectral
reflectance and S(λ) is the spectral irradiance.

The specular terms have different colours depending on the surface


— i.e. a spectral specular albedo is needed.

Colour and Specular Reflection


Generally, metal surfaces have a specular component that is
wavelength dependent— a shiny copper coin has a yellowish glint.

Surfaces that do not conduct — dielectric surfaces— have a


specular component that is independent of wavelength — for
example, the specularities on a shiny plastic object are the colour of
the light.

3.1.3 The Colour of Sources


Building a light source usually involves heating something until it glows.
An idealization of this process is described below:

Black Body Radiators


A body that reflects no light — usually called a black body — is the
most efficient radiator of illumination.

An object will appear black when it absorbs all wavelengths of


visible light.

Therefore, no light is scattered to our eye.

A heated black body emits electromagnetic radiation.

It is a remarkable fact that the spectral power distribution of this


radiation depends only on the temperature of the body.

It is possible to build quite good black bodies (one obtains a hollow


piece of metal and looks into the cavity through a tiny hole — very
little of the light getting into the hole will return to the eye), so that
the spectral power distribution can be measured.

At relatively low temperatures, black bodies are red, passing


through orange to a pale yellow-white to white as the temperature
increases.

It is possible to build quite good black bodies so that the spectral


power distribution can be measured.
A hollow piece of metal is obtained and when looked into the cavity
through a tiny hole — very little of the light getting into the hole will
return to the eye.

E(λ) is the spectral radiosity of the surface which is given by:
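A hedged reconstruction of the expression, following Planck's law for the spectral exitance of a black body:

E(\lambda) = \frac{2\pi h c^2}{\lambda^5} \cdot \frac{1}{\exp\!\left(\frac{hc}{k\lambda T}\right) - 1}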

where T is the temperature of the body in Kelvins,

h is Planck's constant (6.62607015 × 10⁻³⁴ Joule-seconds),

k is Boltzmann's constant (1.380649 × 10⁻²³ Joules per Kelvin),

c is the speed of light (3 × 10⁸ m/s), and

λ is the wavelength.

Black body radiators give a one-parameter family of light colours, parameterised by temperature; hence the colour temperature of a light source becomes a very important parameter.

This is the temperature of the black body that looks most similar.

The Sun and the Sky


The sun is the most important natural source of light and is usually modelled as a distant, bright point.

The colour of sunlight varies with time of day as shown in figure 3.3
and time of year.
Figure 3.3

In the figure, wavelength is plotted against spectral energy density


for seven different daylight measurements, as shown.

The sky is another important natural light source.


The sky is bright because light from the sun is scattered by the air.

A crude geometrical model for sky is assuming it to be a


hemisphere with constant exitance.
But since the sky is substantially brighter at the horizon than at
the zenith, this assumed model of sky is not correct.

The natural model for sky is to consider air as emitting a constant


amount of light per unit volume; which is why the sky is brighter on
the horizon than at the zenith, because a viewing ray along the
horizon passes through more sky.
Rayleigh Scattering:
Rayleigh scattering is a phenomenon in which light is scattered by particles much smaller than the wavelength of the light (smaller than about 1/10 of the wavelength).

It's named after the British scientist Lord Rayleigh, who first
described it in the 19th century.

In Rayleigh scattering, shorter wavelengths of light (like blue and


violet) scatter more strongly than longer wavelengths (like red and
orange).

This is why the sky appears blue: sunlight is scattered by the


molecules in Earth's atmosphere, with shorter blue wavelengths
scattering more than longer red wavelengths, making the sky
appear blue to our eyes.
Rayleigh scattering also explains why sunsets and sunrises often
appear red or orange, as the longer wavelengths of light scatter less,
allowing more red and orange light to reach our eyes.

Rayleigh scattering is called elastic scattering, since the energies of the scattered light particles are not changed.

Lord Rayleigh calculated the scattered intensity from dipole scatterers much smaller than the wavelength to be:
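A hedged reconstruction of the standard result, consistent with the symbols defined below:

\frac{I}{I_0} = \frac{8\pi^4 N \alpha^2}{\lambda^4 R^2}\,\bigl(1 + \cos^2\theta\bigr)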

where λ is the wavelength,
N is the number of scatterers,
α is the polarizability,
R is the distance from the scatterer, and
θ is the scattering angle.

That is, the scattered intensity is inversely proportional to the fourth power of the wavelength.

Mie Scattering:
The Mie scattering model is a mathematical description of how light
is scattered by particles that are comparable in size to the
wavelength of the light.

It's named after the German physicist Gustav Mie, who developed it
in the early 20th century.

Unlike Rayleigh scattering, which applies to particles much smaller


than the wavelength of light, the Mie scattering model is used for
larger particles like dust, pollen, or water droplets.

Mie scattering describes how light interacts with these larger


particles, taking into account factors such as the size of the
particles, the refractive index of the particles and the surrounding
medium, and the wavelength of the incident light.

The clouds appear to be white due to Mie Scattering.

Unlike Rayleigh scattering, which varies with the inverse fourth


power of the wavelength, Mie scattering's dependence on
wavelength is more complex, leading to different scattering patterns
for different colors of light.

The equation is usually expressed in terms of scattering efficiency,


phase function, and scattering amplitude.

The general form of the Mie scattering equation is:
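A hedged reconstruction of the standard expression for the Mie scattering efficiency, consistent with the symbols defined below:

Q_{\text{scat}} = \frac{2}{x^2} \sum_{n=1}^{\infty} (2n+1)\bigl(\lvert a_n\rvert^2 + \lvert b_n\rvert^2\bigr)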

where:
Qscat is the scattering efficiency, representing the fraction of incident light scattered by the particle,

x is the size parameter, defined as x = 2πr/λ, where r is the radius of the particle and λ is the wavelength of the incident light,

and an and bn are the Mie coefficients, which depend on the size parameter x and on the refractive indices of the particle and the surrounding medium.

Artificial Illumination
Artificial illumination, as opposed to natural light from the sun, refers
to any lighting created by humans.

Artificial illumination refers to the deliberate use of artificial sources


of light to illuminate indoor or outdoor spaces.

It plays a crucial role in modern society, providing illumination for


various activities, enhancing safety and security, and contributing to
aesthetics and ambiance.

Typical artificial light sources are commonly of a small number of


types.
• An incandescent light contains a metal filament which is heated
to a high temperature.

The construction of an incandescent lamp can be done by using


different parts like a Glass bulb, Inert gas, Tungsten filament,
Contact wire to foot, Contact wire to base, Support wires, Glass
mount or support, Base contact wire, Screw threads, Insulation, and
Electrical foot contact.

The spectrum roughly follows the black-body law, meaning that


incandescent lights in most practical cases have a reddish tinge.
• Fluorescent lights work by generating high-speed electrons that strike gas within the bulb; this in turn releases ultraviolet radiation, which causes the phosphor coating inside the bulb to fluoresce.

Typically, the coating consists of three or four phosphors, which


fluoresce in quite narrow ranges of wavelengths.

Most fluorescent bulbs generate light with a bluish tinge, but bulbs
that mimic natural daylight are increasingly available (figure 3.4 (b)).
• In some bulbs, an arc is struck in an atmosphere consisting of
gaseous metals and inert gases. Light is produced by electrons in
metal atoms dropping from an excited state, to a lower energy state.

Typical of such lamps is strong radiation at a small number of


wavelengths, which correspond to particular state transitions.

The most common cases are sodium arc lamps, and mercury arc
lamps.

Sodium arc lamps produce a yellow-orange light extremely


efficiently,
and are quite commonly used for freeway lighting.

Mercury arc lamps produce a blue-white light, and are often used
for security lighting.
The graph in figure 3.4 (a) shows the relative spectral power
distribution of two standard CIE models, illuminant A — which
models the light from a 100W Tungsten filament light bulb, with
colour temperature 2800K
— and illuminant D-65 — which models daylight.
Figure 3.4 (b) shows a sample of spectra from different light bulbs.

Figure 3.4 (a) Figure 3.4 (b)

3.2 Human Colour Perception

Spectral Energy Density:


The light coming out of sources or reflected from surfaces has more
or less energy at different wavelengths, depending on the processes
that produced the light.
This distribution of energy with wavelength is sometimes called a
spectral energy density.

To be able to describe colours, we need to know how people respond


to them.

Different kinds of color receptor in the human eye respond more or


less
strongly to light at different wavelengths, producing a signal that is
interpreted as color by the human vision system.
3.2.1 Colour Matching:

The precise interpretation of a particular light is a complex function


of context; illumination, memory, object identity, and emotion, etc.

The fact that different spectral radiances produce the same


response from people under simple viewing conditions yields a
simple, linear theory of
colour matching which is accurate and extremely useful for
describing colours.

Colour matching theory is a fascinating blend of art and science that helps us understand how humans perceive colour and create pleasing or impactful colour combinations.

Colour matching theory is also known as the trichromatic theory or the Young-Helmholtz theory.

It all revolves around a central tool: the colour wheel.

The Colour Wheel:


Developed by Sir Isaac Newton in the 17th century, the colour
wheel is a circular representation of the colour spectrum.

It typically consists of 12 hues – primary colours (red, yellow,


blue), secondary colours (orange, green, violet) created by mixing
primaries, and tertiary colours (mixtures of a primary and a
secondary colour).

The wheel's magic lies in how it visually encodes the relationships


between these colours.
Colour matching theory uses the colour wheel to identify colour
schemes that are considered harmonious or aesthetically pleasing.
Here are some key concepts:

Primary, Secondary & Tertiary Colours: As mentioned earlier,


these are the building blocks. Understanding how they mix and
interact is fundamental.

Complementary Colours: Colours sitting directly opposite each


other on the wheel (e.g., red and green, blue and orange) are called
complementaries. They create high contrast and vibrancy when
used together.
Analogous Colours: Colours that sit next to each other on the
wheel (e.g., red, orange, yellow) are analogous. They create a sense
of calmness and flow when used together.

Triadic Colours: Three colours spaced evenly around the wheel


(e.g., red, yellow, blue) form a triadic scheme. This creates a bold
and vibrant combination.

Monochromatic: This scheme uses variations of a single colour


(shades, tints) for a unified and elegant look.

Warm vs. Cool Colours: The colour wheel is also divided into warm
colours (red, orange, yellow) associated with fire and energy, and
cool colours (blue, green, violet) associated with water and
calmness. Understanding this temperature distinction helps create
specific moods in a design.

The simplest case of colour perception is obtained when only two


colours are in view, on a black background.

Human perception of colour can be studied by asking observers to mix coloured lights to match a test light, shown in a split field, by two methods:
1. Additive matching
2. Subtractive matching

Additive Matching:
Additive color matching occurs when different colored lights are
combined.

The primary colors in additive matching are red, green, and blue
(RGB).

When all three primary colors are mixed at full intensity, they
produce white light. This is known as Additive colour matching.
This is the principle behind electronic displays such as televisions
and computer monitors.
Figure 3.5 below shows the outline of such additive colour
matching.

The adjustments involve changing the intensity of some fixed


number of primaries in the mixture.
In this form, a large number of lights may be required to obtain a
match, but many different adjustments may yield a match.

Write T for the test light, an equals sign for a match, the weights wi
and the
primaries Pi.

The mixture of primaries can be written as w1P1+w2P2+w3P3;

A match can then written in an algebraic form as:

T = w1P1 + w2P2 + w3P3

meaning that test light T matches the particular mixture of


primaries (P1, P2, P3) given by (w1, w2, w3).

Figure 3.5

Subtractive Colour Matching:


In subtractive matching, the viewer can add some amount of some
primaries to the test light instead of to the match.
This can be written in algebraic form by allowing the weights in the
expression above to be negative, given by:

T = w1P1 - w2P2 - w3P3

In the subtractive color model, pigment is used to produce color


using reflected light.
The subtractive color matching model is used in printing, silk-
screening, painting and other mediums that add pigment to a
substrate.

The subtractive colors are cyan, yellow, magenta and black, also
known as CMYK.

Subtractive color begins with white (paper) and ends with black; as
color is added, the result is darker.
Black is referred to as "K," or the key color, and is also used to add
density.

Trichromacy:
Trichromacy, also known as trichromatic vision, is the ability to see
a wide range of colors.
Tri means three and chromacy means colour.

Trichromacy, also known as the Young-Helmholtz theory of color


vision, is a fundamental principle that explains how humans
perceive color.

It states that to perceive and differentiate between different


colors, only three primary colours are sufficient.
This phenomenon is known as the principle of trichromacy.

There are two caveats to the principle of trichromacy:

Firstly, subtractive matching must be allowed, and
secondly, the primaries must be independent — meaning that no mixture of two of the primaries may match a third.
The principle of trichromacy is often explained by assuming that
there are three distinct types of colour transducers/ colour
receptors in the eye and these three types of colour transducers/
colour receptors are common to most people.

The retina contains millions of photoreceptors called rods and


cones.
When light enters the pupil of our eye, it travels to the retina in the
back of the eye.
When the rods and cones detect light, they send a signal to the
brain for interpretation.

The rods are sensitive to light and help us to see in dim lighting,
whereas the cones allow us to detect color and detail in normal
lighting.
Of the three types of color receptors, one is most sensitive to short wavelengths (S), which correspond to blue, one to medium wavelengths (M), which correspond to green, and one to long wavelengths (L), which correspond to red.

The combinations of these three colors could produce all of the


colors that we are capable of perceiving.

But the combinations of only two colors could not produce all of the
colors that we are capable of perceiving.

Grassman’s Laws
Grassmann's Law, also known as the Grassmann's Axiom, is a
principle in color perception and computer vision that describes how
colors mix and interact.

It's named after Hermann Grassmann, a 19th-century German


mathematician who made significant contributions to various fields,
including optics and algebra.

Grassmann's Law is particularly relevant in the context of color


spaces and color models used in computer vision and image
processing.
In computer vision and image processing, Grassmann's Law is
applied in various tasks such as color correction, color matching, and
color synthesis.

Understanding how colors mix according to Grassmann's Law is


essential for accurately representing and manipulating colors in
digital images and videos.

1. Color Mixing Equations: Grassmann's Law can be


mathematically expressed using color mixing equations that describe
how the components of different colors combine.

These equations typically involve linear combinations of the color


components/primary colour, with each component/primary colour
weighted by its respective intensity or contribution to the final color.

Experimental facts have proved that matching is (to a very accurate


approximation) linear. This yields Grassman’s laws.

2. Linear Superposition: According to Grassmann's Law, the perception of a color resulting from the mixture of two or more colors is determined by the linear superposition of the spectral power distributions of those colors. In other words, the combined effect of multiple light sources is the sum of their individual effects.

1. Firstly, if we mix two test lights, then mixing the matches will
match the result, that is, if
Ta = wa1P1 + wa2P2 + wa3P3
and
Tb = wb1P1 + wb2P2 + wb3P3
Then
Ta + Tb = (wa1 + wb1)P1 + (wa2 + wb2)P2 + (wa3 + wb3)P3
Where Ta and Tb are two test lights,
P1 , P2 and P3 are three different primaries,
wa1 and wb1 are weights or percentage of intensity of P1 which
contributes to the final colour.
wa2 and wb2 are weights or percentage of intensity of P2 which
contributes to the final colour.
wa3 and wb3 are weights or percentage of intensity of P3 which
contributes to the final colour.

2. Secondly, if two test lights can be matched with the same set of
weights, then they will match each other, that is, if
Ta = w1P1 + w2P2 + w3P3
and
Tb = w1P1 + w2P2 + w3P3
Then
Ta = Tb
Where w1, w2 and w3 are the weights or percentage of intensity of P1,
P2 and P3 respectively, which contributes to the final colour.

3. Finally, matching is linear: if
Ta = w1P1 + w2P2 + w3P3
then
kTa = (kw1)P1 + (kw2)P2 + (kw3)P3
for non-negative k.
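Numerically, these three laws just say that match weights behave like vectors: they add, they scale, and equal weight vectors imply a visual match. A minimal sketch in Python (the weight values below are made-up placeholders, not measured matches):

```python
import numpy as np

# Hypothetical match weights for two test lights Ta and Tb against primaries P1, P2, P3.
wa = np.array([0.4, 0.1, 0.7])   # weights that match Ta
wb = np.array([0.2, 0.5, 0.1])   # weights that match Tb

# Law 1 (additivity): the mixture Ta + Tb is matched by the sum of the weight vectors.
w_mix = wa + wb

# Law 3 (scaling): k*Ta is matched by k times the weights, for non-negative k.
k = 2.5
w_scaled = k * wa

# Law 2 (symmetry): two lights matched by identical weight vectors match each other,
# so equality of the weight vectors predicts a visual match.
print(w_mix, w_scaled, np.array_equal(wa, wa))
```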

Exceptions
Given the same test light and the same set of primaries, most people
will use the same set of weights to match the test light.

Thus, trichromacy and Grassmann's laws are about as true as any law covering biological systems can be.

The exceptions include:

• people with aberrant (departing from an accepted standard) colour systems as a result of genetic ill-fortune (who may be able to match everything with fewer primaries);

• people with aberrant colour systems as a result of neural ill-fortune (who may display all sorts of effects, including a complete absence of the sensation of colour);

• some elderly people (whose choice of weights will differ from the
norm, because of the development of macular pigment in the eye);

• very bright lights (whose hue and saturation look different from less
bright versions of the same light);

• and very dark conditions (where the mechanism of colour transduction is somewhat different than in brighter conditions).

In summary, Grassmann's Law is a fundamental principle in color perception and computer vision that governs how colors mix and interact. By understanding the principles of additive color mixing and linear superposition, computer vision systems can accurately represent and manipulate colors in digital images and videos.
3.2.2 Colour Receptors
Trichromacy suggests that there are profound constraints on the way colour is transduced (converted into another form) in the eye.

One hypothesis assumes that there are three distinct types of receptors in the eye that mediate colour perception.

Each of these receptors turns incident light into neural signals.

Each of these colour receptors has a different sensitivity to different colours, which can be deduced from colour matching experiments.

If two test lights that have different spectra, look the same, then they
must have the same effect on these receptors.
The Principle of Univariance
The principle of univariance is a concept that applies to biological
vision systems, but it's relevant to understanding color perception in
computer vision as well.

The principle of univariance is a fundamental concept in computer vision, particularly in the context of image formation and processing.

This principle is based on the properties of the human visual system and the way in which light interacts with photoreceptor cells.

Our eyes contain photoreceptor cells called cones that respond to light of different wavelengths (colors).

The principle of univariance states that a single cone cell cannot distinguish between changes in wavelength (color) and changes in light intensity, or between different combinations of wavelengths that produce the same level of light intensity, because different combinations of wavelengths can stimulate the same cone cell with the same amount of total light.

That means a photoreceptor can only convey information about the total amount of light that falls on it, regardless of the specific wavelength, colour or intensity of that light.
As a result, the visual system cannot uniquely determine the wavelength or color of light based solely on the response of a single photoreceptor.
Imagine this:
We have a sensor that only measures the total amount of water
poured into a bucket (intensity).

We pour in 1 litre of blue water (short wavelength), or 1 litre of red water (long wavelength), or 1 litre of green water (medium wavelength) - the sensor reads the same level (univariant), irrespective of the colour of the water. It does not recognize the colour of the water.

Overcoming Univariance:
Our brains have multiple cone types, each with a different peak
sensitivity to specific wavelength ranges (red, green, blue).

To overcome this challenge, computer vision systems rely on complex algorithms and processing techniques to extract meaningful information from the raw pixel data and infer the characteristics of the observed scene or object.

These techniques may involve pattern recognition, feature extraction and machine learning algorithms to interpret the visual information and make sense of the images captured by the sensors.

The responses of the different cone types can be measured by dissecting light-sensitive cells and recording their responses to light at different wavelengths; by comparing the outputs of these different cones, the brain can decode the actual wavelength and perceive color.

Because the system of matching is linear, the receptors must be linear.

Let pk be the response of the k'th receptor,
σk(λ) its sensitivity,
E(λ) the light arriving at the receptor and
Λ (capital lambda) the range of visible wavelengths.

The overall response of a receptor is obtained by adding up the response to each separate wavelength in the incoming spectrum, which is given by

pk = ∫Λ σk(λ) E(λ) dλ
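In discrete form this integral is just a weighted sum over sampled wavelengths. A minimal numerical sketch (the sensitivity curve and spectrum below are made-up placeholders, not measured data):

```python
import numpy as np

# Wavelength samples across the visible range, in nanometres.
lam = np.arange(400, 701, 10)

# Hypothetical receptor sensitivity sigma_k(lambda): a Gaussian peaked at 550 nm.
sigma_k = np.exp(-((lam - 550.0) ** 2) / (2 * 40.0 ** 2))

# Hypothetical incoming spectrum E(lambda): flat "white" light.
E = np.ones_like(lam, dtype=float)

# p_k = integral over Lambda of sigma_k(lambda) * E(lambda) d(lambda),
# approximated here with the trapezoidal rule.
p_k = np.trapz(sigma_k * E, lam)
print(p_k)
```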

In computer vision, understanding the concept of univariance is important for developing algorithms and techniques for tasks such as color correction, color image processing, and object recognition.
By understanding how the human visual system processes color information, researchers and engineers can develop more effective algorithms for analyzing and interpreting digital images and videos.

Additionally, knowledge of univariance can also inform the design of image sensors and other hardware components used in computer vision systems.

Rods and Cones

The retina has two types of cells that are sensitive to light, differentiated by their shape: 1. Rods and 2. Cones.

The light sensitive region of a cone has a roughly conical shape, whereas that of a rod has a roughly cylindrical shape.

Cones largely dominate colour vision and completely dominate the fovea.

Cones are responsible for vision in bright light conditions, also known
as photopic vision.

Cones are somewhat less sensitive to light than rods are, meaning
that in low light, colour vision is poor and it is impossible to read.

There are three types of cones, differentiated by their sensitivity.

The sensitivities of the three different kinds of cones to different wavelengths can be obtained by comparing colour matching data for normal observers with colour matching data for observers lacking one type of cone.

Sensitivities obtained in this fashion are shown in Figure 3.6 below.

The three types of cones respond to all photons in the same way, but in different amounts.

The figure 3.6 above shows the log of the relative spectral sensitivities
of the three kinds of colour receptors in the human eye.

The three types of cone are properly called S cones, M cones and L
cones (for their peak sensitivity being to short, medium and long
wavelength light respectively).

They are occasionally called blue, green and red cones, which is a
bad practice.

The first two receptors —sometimes called the red and green cones
respectively, but more properly named the long and medium
wavelength receptors — have peak sensitivities at quite similar
wavelengths.
The third type of receptors sometimes called the blue cones, but more
properly named the short wavelength receptors - have a very different
peak sensitivity.
The response of a receptor to incoming light can be obtained by
summing the product of the sensitivity and the spectral radiance of
the light, over all wavelengths.

Differences between Rods and Cones:

1. Rods are photoreceptors whose outer segment is rod shaped or cylindrical; cones are photoreceptors whose outer segment is cone shaped.
2. Rods consist of a single type of cell; cones consist of 3 types of cells (S, M and L cones, corresponding to the blue, green and red primaries respectively).
3. Rods are very sensitive to light; cones are less sensitive to light.
4. There are about 120 million rod cells; there are about 6-7 million cone cells.
5. Rods lead to monochromatic vision; cones lead to colour vision.
6. Rods contain rhodopsin as the visual pigment; cones contain iodopsin as the visual pigment.
7. Rods are located in the periphery of the retina; cones are located in the center of the retina.
8. Rods do not dominate the fovea; cones completely dominate the fovea.
9. Deficiency of rhodopsin causes night blindness; deficiency of iodopsin causes colour blindness.
10. Rods are used for vision under low light conditions; cones are used for vision under high light conditions.
11. Rods provide less visual acuity; cones provide more visual acuity.
12. Rods give low resolution because many rods share a single neuron connected to the brain; cones give high resolution because each cone has its own neuron connected to the brain.

3.3 Representing Colour

Describing colours accurately is a matter of great commercial importance.

Many products are closely associated with very specific colours and
manufacturers take a great deal of trouble to ensure that different
batches have the same colour.

This requires a standard system for talking about colour.

Simple names are insufficient, because relatively few people know many colour names, and most people are willing to associate a large variety of colours with a given name.

Colour matching data yields simple and highly effective linear colour spaces (section 3.3.1).

Specific applications may require colour spaces that emphasize particular properties (section 3.3.2) or uniform colour spaces, which capture the significance of colour differences (section 3.3.2).

3.3.1 Linear Colour Spaces

A color space is a mathematical model that organizes colors in a way that facilitates their representation in digital devices like cameras, monitors, and printers.
Each color space has a specific set of coordinates that define colors within it.

In computer vision, understanding color spaces is crucial for processing and analyzing images accurately.

The most common color spaces include RGB (Red, Green, Blue),
CMYK (Cyan, Magenta, Yellow, Black), HSV/HSL (Hue, Saturation,
Value/Lightness), and XYZ.

A linear colour space is a natural mechanism for representing colour: agree on a standard set of primaries, and then describe any coloured light by the three values of the weights for each primary that would match the light using those primaries.

Linear color spaces are a subset of color spaces that have gained
importance due to their compatibility with various image processing
algorithms, especially those involving linear transformations like
convolution and matrix operations.

A color space is considered linear if the numerical values representing a colour are linearly related to the physical light intensity.

In other words, if you double the numerical value of a colour component, the corresponding light intensity doubles.

Linear colour spaces are easy to obtain and easy to use: a colour is described by setting up and performing matching experiments and transmitting the match weights.

This approach extends to give a representation not only for surface colours but also for the case when a standard light is used to illuminate the surface.

Importance of Linear Colour Spaces in Computer Vision:

1. Mathematical Operations: Linear color spaces are essential for various image processing tasks, such as convolution, filtering, and matrix transformations. Since these operations rely on linear algebra principles, using linear color spaces ensures that the mathematical operations accurately represent the visual transformations.
2. Accurate Calculations: Since intensity directly relates to
numerical values, linear spaces enable precise calculations for tasks
like lighting correction, color balancing, and image analysis.

3. Consistency in Manipulation: Linear color spaces maintain consistency when applying operations like brightness adjustments, contrast enhancements, and blending multiple images. Non-linear color spaces might introduce unexpected results or distortions during such manipulations.

4. Color Correction and Calibration: In applications like image editing, computer graphics, and machine vision, accurate color representation is crucial. Linear color spaces provide a standardized framework for color correction and calibration, ensuring consistent color reproduction across different devices and environments.

5. Physical Accuracy: Linear color spaces often correspond more closely to the physical properties of light, making them suitable for applications where precise color representation is necessary, such as scientific imaging, medical imaging, and remote sensing.

6. Easier Manipulation: Mathematical operations like addition, subtraction, and scaling have a more intuitive meaning in linear spaces, simplifying image processing algorithms.

Examples of Linear Color Spaces:

Raw (camera) RGB: Most camera sensors capture raw data with red, green, and blue (RGB) values proportional to the light intensity striking those pixels. This raw data is considered linear as it directly reflects the captured light.

Linearized sRGB: Standard sRGB is not truly linear because of the gamma correction applied during image storage. However, some image processing pipelines use a linearized version of RGB, in which the gamma correction is removed to recover linear light intensity values.
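The standard sRGB transfer curve and its inverse are published formulas; a small sketch of removing and re-applying the gamma encoding (values assumed to be normalised to [0, 1]):

```python
import numpy as np

def srgb_to_linear(c):
    """Map sRGB-encoded values in [0, 1] to (approximately) linear light."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    """Inverse transform: re-apply the sRGB gamma encoding."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.0031308, c * 12.92, 1.055 * c ** (1 / 2.4) - 0.055)

# A mid-grey pixel value of 0.5 in sRGB corresponds to only about 0.214 in linear light.
print(srgb_to_linear(0.5), linear_to_srgb(srgb_to_linear(0.5)))
```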

CIE XYZ: The CIE XYZ color space is linear and serves as a
foundational color space for various color models. It represents
colors based on their spectral characteristics and is widely used in
color science and standardization.

CIE Lab: Although not strictly linear, CIE Lab is approximately perceptually uniform, making it suitable for color transformations and adjustments in linear color spaces.

Applications of Linear Color Spaces:

Pre-processing: Linear spaces are often used as an intermediate step during image pre-processing. Calculations like white balancing and color correction are performed in the linear domain for better accuracy.

Feature extraction: In some computer vision tasks, features might be based on light intensity variations. Linear spaces ensure these features accurately reflect the actual light information.

Calibration: Cameras can be calibrated using linear color spaces to ensure consistent color response across different lighting conditions.

Limitations of Linear Color Spaces:

Dynamic Range: Linear color spaces may not fully utilize the
available dynamic range of digital sensors and displays, especially
in low-light or high-brightness conditions. Linear spaces can
struggle with the vast dynamic range of real-world scenes. Cameras
often capture data with a limited range, requiring non-linear spaces
for efficient storage and display. However, this limitation can be
addressed through techniques like gamma correction or tone
mapping.
Perceptual Non-uniformity: Linear color spaces may not provide
perceptual uniformity, meaning that equal differences in color
components do not always correspond to equal perceptual changes
in color. However, perceptual uniformity can be achieved through
transformations like CIE Lab.

In conclusion, linear color spaces play a vital role in computer vision by providing a standardized and mathematically consistent framework for representing and manipulating colors in digital images.

Their compatibility with linear algebra operations and their closer correspondence to physical color properties make them indispensable for a wide range of image processing applications.

However, it's important to understand their limitations and consider non-linear spaces when dealing with aspects like human color perception or wider dynamic range.

Colour Matching Functions

Colour matching functions (CMFs) are a fundamental concept in colour science, particularly important in colorimetry, the science of measuring and quantifying human colour perception.

Color matching functions are fundamental components of color science used to describe how the human visual system perceives colors.

They quantify the sensitivity of the three types of cone cells (red (L),
green (M) and blue (S)) in the human retina to different wavelengths
of light.

CMFs represent the amount of each primary colour required to match monochromatic lights of different wavelengths across the visible spectrum.

CMFs determine the weights that need to be used to match a source of some known spectral radiance, given a fixed set of primaries.
The spectral radiance of the source can be thought of as a weighted sum of single wavelength sources.

Because colour matching is linear, the combination of primaries that matches a weighted sum of single wavelength sources is obtained by matching the primaries to each of the single wavelength sources, and then adding up these match weights.

Types of Colour Matching Functions:

The most common CMFs are defined by the International Commission on Illumination (CIE):

1. CIE 1931 2-degree CMFs: This is the most widely used set,
representing colour matching for a 2-degree field of view (the central
part of human vision).
2. CIE 1964 10-degree CMFs: These CMFs account for a wider
field of view (10 degrees) and are more relevant for peripheral vision.
3. Normalized Functions: The color matching functions are
often normalized so that the area under each curve is equal to one.
Normalization ensures that the total response of the visual system
to all wavelengths of light is consistent across different conditions
and observers.

The colour matching functions, which are written as f1(λ), f2(λ) and f3(λ), can be obtained from a set of primaries P1, P2 and P3 by experiment.

During the experiment, the weight of each primary is tuned to match a unit radiance source at every wavelength.
A set of weights is then obtained, one for each wavelength, for
matching a unit radiance source U(λ), which can be written as:
U(λ) = f1(λ)P1 + f2(λ)P2 + f3(λ)P3
i.e. at each wavelength λ, f1(λ), f2(λ) and f3(λ) give the weights
required to match a unit radiance source at that wavelength.
The weights required to match a source S(λ), which is a sum of a vast number of single wavelength sources each with a different intensity, are then given by

wi = ∫Λ fi(λ) S(λ) dλ, for i = 1, 2, 3.
Figures (a) and (b) plot the wavelength on the x-axis and the tristimulus value, i.e. relative sensitivity, on the y-axis.

The figure (a) above shows colour matching functions for the
primaries for the RGB system. The negative values mean that
subtractive matching is required to match lights at that wavelength
with the RGB primaries.

The figure (b) above shows colour matching functions for the CIE X,
Y and Z primaries; the colour matching functions are everywhere
positive, but the primaries are not real.
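Numerically, each match weight is just the integral of the corresponding colour matching function against the source spectrum. A minimal sketch with made-up Gaussian curves standing in for f1, f2 and f3 (real CIE tables would be loaded from published data in practice):

```python
import numpy as np

lam = np.arange(400, 701, 5)   # wavelength samples in nanometres

# Placeholder colour matching functions f1, f2, f3 (Gaussian stand-ins, not CIE data).
def bump(mu, s):
    return np.exp(-((lam - mu) ** 2) / (2 * s ** 2))

f1, f2, f3 = bump(600, 40), bump(550, 40), bump(450, 30)

# A source spectrum S(lambda); here, flat equal-energy light.
S = np.ones_like(lam, dtype=float)

# Because matching is linear, the weight on each primary is the integral of
# the corresponding colour matching function times the source spectrum.
w1, w2, w3 = (np.trapz(f * S, lam) for f in (f1, f2, f3))
print(w1, w2, w3)
```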
Applications of Colour Matching Functions:
CMFs play a crucial role in various applications:
1. Standardizing Colour Measurement: They provide a reference
for instruments that measure colour, ensuring consistent results
across different devices. They serve as the basis for defining color
spaces, color models, and color reproduction systems used in various
industries, including printing, imaging, and display technology.

2. Defining Colour Spaces: CMFs form the basis for developing colour spaces like CIE XYZ, which represent colours numerically based on human perception.

3. Colour Calibration: CMFs are used in calibrating monitors, printers and displays to ensure they accurately reproduce colours.

4. Image Processing: Some image processing algorithms utilize CMFs to transform images into colour spaces more aligned with human perception.

Limitations of Colour Matching Functions:

1. Individual Variations: Colour vision can vary slightly between individuals. CMFs represent an average observer and might not perfectly capture everyone's colour perception.

2. Non-linearities: Human colour perception isn't entirely linear. CMFs provide a good approximation but might not perfectly predict how we perceive mixtures of colours.

General Issues for Linear Colour Spaces

Linear color naming systems can either be defined by:
1. specifying primaries, which imply color matching functions, or by
2. specifying color matching functions, which imply primaries.

If the primaries are real lights, at least one color matching function
may be negative for some wavelengths.

This necessitates subtractive matching to match certain lights, regardless of the chosen set of primaries.

While using negative CMF values isn't a violation of natural law, it can be inconvenient.
One solution to overcome this problem is to use color matching functions that are always positive, which implies that the primaries are imaginary.

This may seem problematic, but since color naming systems typically
rely on comparing weights rather than physically creating colors, it's
not a significant issue.

The CIE (Commission Internationale de l'Éclairage, the International Commission on Illumination) has standardized various systems to address these considerations.

The CIE XYZ Colour Space

The CIE XYZ color space is a foundational color model developed by the International Commission on Illumination (CIE) in 1931.

The XYZ color space should be considered the master color space as it can encompass and describe all other RGB color spaces.

It serves as a standardized and device-independent representation of color, providing a basis for various color models and systems used in color science, imaging, and colorimetry, and is a very popular model.

Basis of CIE XYZ Color Space:

1. Tristimulus Theory: The CIE XYZ color space is based on the tristimulus theory, which postulates that any color can be matched by mixing three primary colors in varying proportions.

The CIE defined three imaginary primary colors, X, Y, and Z, which are not based on any physical light sources but serve as a standardized reference for color representation.

2. Standard Observer: The CIE XYZ color space is based on the color matching functions defined by the CIE standard observer.

These functions describe the average color perception of human observers under standardized viewing conditions and form the basis for quantifying the human eye's sensitivity to different wavelengths of light.

Components of CIE XYZ Color Space:

1. X, Y, and Z Values: In the CIE XYZ color space, each color is represented by three coordinates: X, Y, and Z.
X: Represents the amount of stimulation of the L cones (long-wavelength, red receptors).

Y: Represents the amount of stimulation of both the L and M cones (medium-wavelength, green receptors). It is often referred to as the luminance or brightness of the color.

Z: Represents the amount of stimulation of the S cones (short-wavelength, blue receptors).

2. Color Space Primaries: The CIE XYZ color space has imaginary
primaries, meaning they do not correspond to any physically real
light sources.

These primaries are defined in such a way that any color can be
represented as a non-negative linear combination of X, Y, and Z
values.

The colour matching functions in CIE XYZ are chosen to be positive everywhere, so that the coordinates of any real light are always positive.

But it is not possible to obtain the CIE X, Y, or Z primaries physically, because for some wavelengths the value of their spectral radiance is negative.

However, given colour matching functions alone, one can specify the
XYZ coordinates of a colour and hence describe it.

Linear colour spaces allow a number of useful graphical constructions which are more difficult to draw in three dimensions than in two, so it is common to intersect the XYZ space with the plane X + Y + Z = 1.
Thus, the CIE has also introduced normalized coordinates x, y, and z, obtained by dividing the X, Y and Z values by (X + Y + Z).

Properties of CIE XYZ Color Space:

1. Device Independence: The CIE XYZ color space is device-independent, meaning that it is not tied to the characteristics of any specific display or imaging device. This makes it suitable for color measurement, analysis, and comparison across different devices and environments.
2. Linear Relationship: The CIE XYZ color space maintains a
linear relationship between the numerical values representing color
and the perceived color. This linearity is important for various image
processing and color manipulation tasks.
3. Uniformity: While the CIE XYZ color space is not perfectly
perceptually uniform, it provides a reasonably uniform
representation of color compared to other color spaces like RGB.
4. Foundation for Other Spaces: Many other colour spaces, like
CIELAB and CIELUV, are derived from XYZ. These spaces incorporate
aspects of human colour perception for more intuitive colour
manipulation.
5. Colour Gamut (complete range): The XYZ space encompasses
all the colours a human can see. This "gamut" is a useful reference
for comparing the range of colours reproducible by different devices.

Applications of CIE XYZ Color Space:

1. Color Measurement: The CIE XYZ color space is widely used for color measurement and specification in industries such as printing, textiles, and automotive manufacturing. It allows for precise quantification and communication of color information.
2. Color Analysis and Comparison: Because of its device-
independent nature, the CIE XYZ color space is used for color
analysis, comparison, and quality control in various applications,
including image processing, color grading, and scientific research.
3. Color Reproduction: The CIE XYZ color space serves as a
reference for color reproduction in digital imaging systems, helping
to ensure accurate color rendering across different devices and
platforms.
4. Colour Matching: In applications like printing or dyeing, XYZ
allows accurate colour matching by comparing target colours with
available materials.
5. Image Processing: Some image processing algorithms utilize
XYZ as an intermediate space for calculations because of its linear
relationship between tristimulus values and perceived intensity.

Standardization:

The CIE has standardized various aspects of the CIE XYZ color space
to ensure consistency and interoperability across different
applications and industries. These standards include color matching
functions, illuminants, observer conditions, and conversion
algorithms.

Limitations of CIE XYZ:

1. Non-Perceptual Uniformity: While a step towards perception-based colour, XYZ isn't perfectly aligned with human colour perception. Equal distances in the XYZ space don't always correspond to equal perceived colour differences.

2. Imaginary Primaries: The mathematical definition of XYZ leads to imaginary primaries (colours that cannot be created with real light sources). This isn't a practical limitation for most applications.
Figure 3.8 (a) and (b)

The volume of all visible colours in CIE XYZ coordinate space is a cone whose vertex is at the origin.
Usually, it is easier to suppress the brightness of a colour by intersecting the cone with the plane X + Y + Z = 1 to get the CIE xy space.

CIE xy Colour Space:

The CIE xy color space, also known as the chromaticity diagram or CIE 1931 chromaticity diagram, is a two-dimensional color space derived from the CIE XYZ color space.

It provides a way to represent colors based on their chromaticity, or the quality of a color independent of its brightness (luminance).

The CIE xy color space is widely used in color science, lighting design, and colorimetry.

Basis of CIE xy Color Space:

Chromaticity: Chromaticity refers to the quality of a color that is determined by its hue and saturation, irrespective of its brightness. In the CIE xy color space, colors are represented by two coordinates, x and y, which describe their chromaticity.

Transformation from CIE XYZ: The CIE xy color space is derived from the CIE XYZ color space using a transformation that normalizes the XYZ coordinates to obtain chromaticity coordinates. This normalization eliminates the dependence on luminance and results in a two-dimensional representation of color.

Components of CIE xy Color Space:

x and y Coordinates: In the CIE xy color space, each color is represented by two chromaticity coordinates, x and y, which are obtained by dividing the X, Y, and Z tristimulus values by their sum:

x = X / (X + Y + Z)
y = Y / (X + Y + Z)

These coordinates describe the relative amounts of redness-greenness (x) and yellowness-blueness (y) of a color.
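The projection itself is a one-liner; a small sketch (the tristimulus values below are arbitrary example numbers):

```python
def xyz_to_xy(X, Y, Z):
    """Project XYZ tristimulus values onto the chromaticity plane X + Y + Z = 1."""
    s = X + Y + Z
    return X / s, Y / s

# Arbitrary example: a colour with tristimulus values (0.4, 0.3, 0.3).
print(xyz_to_xy(0.4, 0.3, 0.3))   # -> (0.4, 0.3)
```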

Chromaticity Diagram: The CIE xy color space is often visualized as a chromaticity diagram, which is a two-dimensional plot with x and y coordinates.

The diagram typically includes a horseshoe-shaped boundary called the spectral locus, which represents the achievable colours for the human eye.

The chromaticity diagram is divided into the following parts:


1. Inner Area: The region enclosed by the curve represents the
full range of colours humans can perceive. Colours closer to
the centre are more desaturated or grayish, while those near
the edge are more saturated and vibrant.
2. Spectral Locus: The curved boundary itself is called the
spectral locus. It represents the colours produced by pure light
of a single wavelength (monochromatic light). Different
wavelengths of light have corresponding positions on the
spectral locus, with violet at one end and red at the other.
3. X and Y Axes: The horizontal axis (X) and vertical axis (Y)
represent the calculated chromaticity coordinates derived from
the XYZ tristimulus values.

Figure 3.10 Chromaticity Diagram of CIE xy Color Space.


Information conveyed by the chromaticity Diagram:

Colour Hue: Hue changes as one moves around the spectral locus.
Colours on the red side of the diagram have a reddish hue, those on
the green side have a greenish hue, and so on.

Colour Saturation: Saturation increases as one moves out radially from white. Colours closer to the edge of the horseshoe are more saturated, meaning they have a purer hue and less gray. Conversely, colours closer to the centre are less saturated and appear more grayish.

No Brightness Information: The xy diagram discards luminance information (brightness) from the original XYZ values. It only focuses on chromaticity.

Properties of CIE xy Color Space:

Chromaticity Representation: The CIE xy color space represents colors based solely on their chromaticity, providing a perceptually meaningful and intuitive way to describe and compare colors.

Device Independence: Like the CIE XYZ color space, the CIE xy color space is device-independent, making it suitable for color specification, analysis, and comparison across different devices and environments.

Applications of CIE xy Color Space:

1. Color Specification: The CIE xy color space is used to specify and communicate the chromaticity of light sources, displays, and materials in various industries such as lighting design, display technology, and interior design.

2. Color Matching: The CIE xy color space is used in color matching applications to determine how closely a given color matches a reference color. By comparing the chromaticity coordinates of two colors, color differences can be quantified and evaluated.

3. Color Analysis: The CIE xy color space is used for color analysis and visualization in fields such as color science, colorimetry, and image processing. It provides a convenient and standardized framework for analyzing and interpreting color data.

4. CIE xy is widely used in vision and graphics textbooks and in some applications, but is usually regarded by professional colorimetrists as out of date.

5. Colour Gamut Visualization: As mentioned earlier, xy is crucial for visualizing and comparing the colour gamuts of various devices.

6. Education and Communication: The xy chromaticity diagram is a helpful tool for teaching colour theory and visually representing colour relationships.

Limitations and Considerations:

Lack of Luminance Information: The CIE xy color space does not include information about the brightness (luminance) of colors. Therefore, it does not fully describe the perceptual appearance of colors, especially when brightness is an important factor.

Metamerism: Colors with different spectral compositions may have the same chromaticity coordinates in the CIE xy color space, leading to metamerism.
Metamerism occurs when colors appear visually identical under certain conditions but have different spectral properties.

Non-linear Perception: Similar to XYZ, equal distances in the xy space don't always correspond to equal perceived colour differences for humans.

The RGB Colour Spaces

Colour spaces are normally invented for practical reasons, and so a wide variety exist.

RGB (Red, Green, Blue) color spaces are fundamental to the digital world, used in digital imaging, video systems, display technologies, computer graphics, photography, digital media, etc.

RGB colour spaces form the foundation for how colors are displayed on electronic devices like monitors, TVs, and smartphones.

RGB colour spaces represent colors by specifying the intensity of the red, green, and blue primary colors.
These primary colors are combined in various proportions to produce a wide range of colors.

The RGB colour space is a linear colour space that formally uses single wavelength primaries (645.16 nm for R, 526.32 nm for G and 444.44 nm for B).
Informally, RGB uses whatever phosphors a monitor has as primaries.

Core Principles:
Additive Color Mixing: RGB relies on the principle of additive color
mixing.
By combining varying intensities of red, green, and blue light, a vast
range of colors can be produced.

Color Gamut: The color gamut of an RGB color space refers to the
range of colors that can be accurately represented within that color
space.
Different RGB color spaces have different gamuts, with some capable
of representing a wider range of colors than others.

Digital Representation: In digital systems, each color component (red, green, blue) is represented by a numerical value.
The most common format is 8 bits per channel, allowing values from 0 (no intensity) to 255 (maximum intensity).
This combination creates a total of 256 x 256 x 256 = 16,777,216 possible colors.

For example:
R=0, G=0, B=0 yields black
R=255, G=255, B=255 yields white
R=251, G=254, B=141 yields a pale yellow
R=210, G=154, B=241 yields a light purple.
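With 8 bits per channel, each colour is just three integers in 0..255, commonly packed into a 24-bit value or written as a hex string. A small illustration using the example values above:

```python
def rgb_to_hex(r, g, b):
    """Pack 8-bit R, G, B components (0-255) into the familiar #RRGGBB notation."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)

print(rgb_to_hex(0, 0, 0))         # black        -> #000000
print(rgb_to_hex(255, 255, 255))   # white        -> #FFFFFF
print(rgb_to_hex(251, 254, 141))   # pale yellow  -> #FBFE8D
print(rgb_to_hex(210, 154, 241))   # light purple -> #D29AF1
```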

Trichromatic Theory: RGB color spaces are based on the trichromatic theory of color vision, which suggests that the human visual system perceives color through the combined responses of three types of cone cells in the retina, each sensitive to different wavelengths of light (long, medium, and short wavelengths).

Available colours are usually represented as a unit cube, usually called the RGB cube, whose edges represent the R, G, and B weights.
The cube is drawn in figure 3.12.

Figure 3.12 (a) and Figure 3.12 (b)

Figure 3.12 (a) shows the RGB cube; which is the space of all colours
that can be obtained by combining three primaries (R, G, and B —
usually defined by the colour response of a monitor) with weights
between zero and one.

It is common to view this cube along its neutral axis — the axis from
the origin to the point (1, 1, 1) — to see a hexagon, shown in the
middle.

This hexagon codes hue (the property that changes as a colour is changed from green to red) as an angle, which is intuitively satisfying.

Figure 3.12 (b) shows a cone obtained from the RGB cube cross-
section, where the distance along a generator of the cone gives the
value (or brightness) of the colour, angle around the cone gives the
hue and distance out gives the saturation of the colour.

Variations of RGB: There isn't a single universal RGB color space.
Different variations exist, each with its own characteristics:

1. sRGB (Standard RGB):

The most commonly used space, particularly for web graphics and digital cameras.

The sRGB color space, defined by HP and Microsoft, was originally designed to work well with cathode ray tube displays and on the Internet.

The phosphors typically used in those displays (with substantially non-monochromatic emission) substantially limit the color gamut.

Deficiencies are particularly apparent in the blue-green region, but a deep red is also not included.

It has a gamma correction applied to compensate for the non-linear response of human vision and display devices.

Despite its limitations, the sRGB color space is still very widely used
and often considered to be the default if the color space of an image
is not explicitly specified.

It has been endorsed by the W3C, Exif, Intel, Pantone, Corel, and
many other industry players.

It is also well accepted by open-source software such as the GIMP (GNU Image Manipulation Program), and is used in proprietary and open graphics file formats such as SVG (Scalable Vector Graphics).

Adobe RGB:
The Adobe RGB color space was developed by Adobe Systems in
1998.

It was designed to encompass most of the colors achievable on CMYK color printers, but by using RGB primary chromaticities on a device such as the computer display.

The Adobe RGB color space encompasses roughly 50% of the visible
colors specified by the Lab color space, improving upon the gamut of
the sRGB color space primarily in cyan-greens.
It offers a wider gamut (range of colors) compared to sRGB, suitable
for professional image editing where preserving a larger color range
is crucial.

ProPhoto RGB:
ProPhoto RGB, also known as ROMM RGB, is an even wider-gamut
RGB color space designed for high-end professional imaging
applications.

It encompasses a larger range of colors than both sRGB and Adobe RGB and is often used in workflows involving high-quality image capture and editing.

Applications of RGB Color Spaces:

1. Image and Video Display: RGB is the primary color space for
displaying images and videos on various electronic devices.
2. Digital Photography: Most digital cameras capture images in
an RGB color space, often using variations like sRGB for
compatibility.
3. Computer Graphics and Animation: RGB is the foundation for
creating and manipulating colors in computer graphics and
animation software.

Limitations of RGB Color Spaces:

1. Device Dependence: RGB is inherently device-dependent.
The same RGB values might appear differently on two monitors due to variations in display technology and calibration.

2. Non-Perceptual Uniformity: Equal distances in the RGB space don't always correspond to equal perceived color differences for humans.

CMY and Black Colour Space:

CMY stands for Cyan, Magenta, and Yellow, and it's a color space primarily used in color printing and the dye industry.

In this colour space, there are three primaries:
1. Cyan (a blue-green colour),
2. Magenta (a purplish colour) and
3. Yellow.

In the CMY model, cyan, magenta and yellow are the complements of red, green and blue respectively.

It deals more with pigments than with light.

Pigments remove colour from incident light which is reflected from paper.

In the CMY colour space, each layer of ink reduces the initial
brightness, by absorbing some of the light wavelengths and reflecting
others, depending on its characteristics.
It uses the subtractive colour mixing where colors are created by
subtracting varying amounts of cyan, magenta, and yellow inks from
white light, to create different hues.

When all colors are fully subtracted, it results in black.

This is in contrast to the RGB colour space, which uses additive colour mixing.

Primary Colors:
1. Cyan: Cyan is a greenish-blue color.
It absorbs red light and reflects green and blue light.
In printing, cyan ink is used to subtract red light from white light,
resulting in cyan.
cyan = White − Red (C = W - R); -------- (1)

2. Magenta: Magenta is a purplish-red color.
It absorbs green light and reflects red and blue light.
In printing, magenta ink is used to subtract green light from white light, resulting in magenta.
magenta = White − Green (M = W − G) -------- (2)

3. Yellow: Yellow absorbs blue light and reflects red and green
light.
In printing, yellow ink is used to subtract blue light from white light,
resulting in yellow.
yellow = White − Blue (Y = W-B) ---------- (3)

Color Mixing:
The appearance of mixtures may be evaluated by reference to the RGB colour space.
For example,
W = R + G + B -------- (4)

By varying the amounts of cyan, magenta, and yellow inks, a wide range of colors can be produced.

1. Mixing cyan and magenta inks subtracts red and green light, resulting in blue.
Adding (1) and (2), we have
C + M = W − R + W − G -------- (5)

We assume that ink cannot cause paper to reflect more light than it does when uninked.
Hence, we can write
W + W = W -------- (6)
⇒ C + M = W − R − G -------- (7)
Putting (4) in (7), we have
C + M = R + G + B − R − G
⇒ C + M = B -------- (8)

2. Mixing magenta and yellow inks subtracts green and blue light, resulting in red.
Adding (2) and (3), we have
M + Y = W − G + W − B -------- (9)
W + W = W -------- (6)
⇒ M + Y = W − G − B -------- (10)
Putting (4) in (10), we have
M + Y = R + G + B − G − B
⇒ M + Y = R -------- (11)

3. Mixing cyan and yellow inks subtracts red and blue light, resulting in green.
Adding (3) and (1), we have
C + Y = W − R + W − B -------- (12)
W + W = W -------- (6)
⇒ C + Y = W − R − B -------- (13)
Putting (4) in (13), we have
C + Y = R + G + B − R − B
⇒ C + Y = G -------- (14)

4. Equal parts of cyan, magenta, and yellow inks subtract all primary colors equally, resulting in black.
Adding (1), (2) and (3),
C + M + Y = W − R + W − G + W − B -------- (15)
But W + W + W = W -------- (16)
⇒ C + M + Y = W − R − G − B
⇒ C + M + Y = W − (R + G + B) -------- (17)
Putting (4) in (17),
C + M + Y = W − W
⇒ C + M + Y = 0 (zero) -------- (18)
That is,
C + M + Y = Black -------- (19)

Limitations of CMY Model:

While CMY is a versatile color model, it has limitations.
Getting really good results from a colour printing process is still difficult because:
i) different inks have significantly different spectral properties;
ii) different papers have different spectral properties, too; and
iii) inks can mix non-linearly.

CMYK Colour Model:

In the CMY colour model, due to imperfections in inks and printing processes, mixing equal parts of cyan, magenta, and yellow inks often results in a muddy dark color rather than a true black.

To overcome this, a separate black ink, referred to as "Key" (hence, CMYK), is often added in printing to enhance the depth of shadows and produce true black and rich grayscale tones.

Black ink provides richer, deeper blacks and sharper details in print
compared to mixing CMY inks.

Hence, the new colour model is known as CMYK colour model, which
is an extension of the CMY model, incorporating a fourth color, black
(K), to improve the quality of color reproduction in printing.

The CMYK color space has a narrower color gamut compared to RGB,
meaning it can represent fewer colors.
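Working with values normalised to [0, 1], the idealised textbook conversion treats CMY as the complement of RGB and pulls the shared grey component out into K; real printer profiles are far more involved. A minimal sketch of that idealised rule:

```python
def rgb_to_cmyk(r, g, b):
    """Idealised RGB (0-1) to CMYK (0-1): C = 1-R, M = 1-G, Y = 1-B,
    with the shared grey component extracted as the key (black) channel K."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)
    if k == 1.0:                      # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    return (c - k) / (1 - k), (m - k) / (1 - k), (y - k) / (1 - k), k

print(rgb_to_cmyk(1.0, 0.0, 0.0))   # red   -> (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(0.0, 0.0, 0.0))   # black -> (0.0, 0.0, 0.0, 1.0)
```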

Applications of CMY and CMYK colour space:

CMY and CMYK colour spaces are predominantly used in color printing processes, including offset printing, digital printing, and color photography.

Understanding CMY and CMYK models is crucial for graphic designers, printers, and anyone involved in producing color prints.

Thus, the CMYK colour space enables accurate color reproduction and ensures consistency across different printing processes and materials.

3.3.2 Non-linear Colour Spaces

Disadvantages/Limitations of Linear Colour Spaces:
While linear color spaces offer simplicity and ease of calculation, they come with some limitations when it comes to representing color perception and working with real-world devices:

1. The coordinates of a linear colour space do not encode properties such as hue, saturation and brightness, which are very common in language and are also important in applications.
2. Non-uniformity: Linear spaces don't account for how humans
perceive brightness. Equal changes in value on a linear scale don't
translate to equal perceived changes in brightness. We are more
sensitive to variations in darker areas. This can lead to unrealistic
color representations, especially in low-light scenarios.
3. Color Matching: Matching colors across different devices or
media can be challenging in linear colour spaces. A specific value in
RGB might appear different on a phone screen compared to a high-
quality print.
4. Non-linear Display Response: Electronic displays don't emit
light with a perfectly linear relationship to the voltage they receive.
They have a gamma response curve, meaning higher values increase
brightness at a faster rate. Linear color spaces can't directly
represent this behavior, requiring conversion for accurate display.
This adds complexity to image processing workflows and can increase
computational overhead.
5. Complex Calculations: Some color processing tasks, like color
mixing or averaging, become mathematically complex in linear
spaces. Algorithms need to account for the non-linearity of human
perception or device response for accurate results.
6. Difficulties with hue representation: For example, checking
if a color is "red" might be tricky because a linear value might jump
from a high value (red) to a low value (not red) instead of smoothly
transitioning.
7. Missing the "color circle" intuition: Humans perceive hues as
a continuous circle, where red goes to orange, yellow, and so on,
eventually looping back to red. Linear spaces struggle to represent
this because a single coordinate might have a maximum value far
away from the minimum, making it difficult to represent the circular
nature of hues.
HSV (HSB) and HSL:
Hue, Saturation and Value (Brightness); Hue, Saturation and Lightness.
HSV is one of the very popular non-linear color spaces that addresses and overcomes some of the limitations of linear spaces like RGB.

HSV and HSL are cylindrical-coordinate representations of the cartesian points in an RGB color model, which makes them much more intuitive and perceptually relevant, in accordance with human vision.

In the HSV colour space, H stands for Hue, S stands for Saturation and V stands for Value (also called Brightness, hence the alternative name HSB).

1. Hue (H):
The property of a colour that varies in passing from red to green is known as hue.
In each cylinder, the angle around the central vertical axis
corresponds to "hue".
In the HSL color model, hue represents the actual colour, which
ranges from 0 to 360 degrees, covering the entire spectrum of
colors.
Red is typically at 0 degrees, green at 120 degrees, and blue at 240
degrees, with intermediate hues in between.

2. Saturation (S):
The distance from the axis corresponds to "saturation".
It can also be defined as the property of a colour that varies in
passing from red to pink;
Saturation refers to the intensity or purity of the color.
A saturation value of 0 results in a grayscale color (no hue), while a
saturation value of 100% represents the fully saturated hue.
Saturation is typically expressed as a percentage, ranging from 0%
(gray) to 100% (fully saturated).

3. Value (V):
Value is also known as brightness.
So the HSV colour space is also known as HSB colour space.
Value is the property that varies in passing from black to white.
The distance along the axis corresponds to value or brightness.
Value/Brightness represents the brightness of the color.
The value or brightness dimension is represented on a scale from 0%
to 100%.
Value (sometimes called Brightness) also goes from 0% (black) to
100% (white).
The lower the percentage, the darker the color will be; the higher the percentage, the brighter the color will be.
The value is essentially the brightness of the color irrespective of its
hue.

4. Lightness: Lightness in HSL represents the brightness of the color relative to a neutral gray.
It's represented as a percentage ranging from 0% (black) to 100% (white). At 0% lightness the color is black, at 100% it's white, and at 50% it's the color in its purest form.

Both HSL and HSV require conversion to/from linear spaces like
RGB for calculations and display on devices.
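Python's standard library already ships these conversions in the colorsys module; a quick sketch (all values in [0, 1], with hue expressed as a fraction of the full 0-360° circle):

```python
import colorsys

# Pure red: hue 0, fully saturated, full value.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # -> (0.0, 1.0, 1.0)

# The same red in HSL terms (colorsys names it HLS: hue, lightness, saturation).
print(colorsys.rgb_to_hls(1.0, 0.0, 0.0))   # -> (0.0, 0.5, 1.0)

# Round trip back to RGB.
print(colorsys.hsv_to_rgb(*colorsys.rgb_to_hsv(0.2, 0.6, 0.8)))
```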

Differences in HSV and HSL colour spaces:


In HSV, brightness is a measure of how much light is emitted by a
color, whereas in HSL, lightness is a measure of how much light is
reflected by a color.
In HSL, the colors are most saturated midway between the top
(white) and bottom (black) of the cylinder. i.e., most saturated at
50%.

In HSV, the colors are most saturated at the top (full value), and
desaturation happens as we move down the cylinder, i.e., most
saturated at 100%.

Perceptual Uniformity: HSL is designed to be more perceptually uniform than HSV, meaning that equal steps in the numerical values of the three parameters are intended to produce roughly equal perceptual changes in the color represented.
Primary Use Cases: HSV is often preferred in applications where the
manipulation of color based on its type (hue) is the primary concern,
such as image editing or color selection tools.
HSL, on the other hand, is more intuitive for tasks where changes in
the lightness of a color are more important, such as web design or
data visualization.

Figure 3.12 (a) and Figure 3.12 (b)

Figure 3.12 (a) shows the RGB cube, which can be viewed with
weights between 0 and 1. This cube appears as a hexagon, when
viewed along its neutral axis — the axis from the origin to the point
(1, 1, 1).

Figure 3.12 (b) shows the cone obtained from the cross-section of
the hexagon.
The angle around the cone gives hue, and the distance away from
the axis gives the saturation, and the distance from origin along the
vertical axis gives value or brightness.

Applications:
1. Developed in the 1970s for computer graphics applications,
HSL and HSV are used today in color pickers, in image editing
software, and less commonly in image analysis and computer vision.
2. HSV model is used in histogram equalization.
3. Converting grayscale images to RGB color images.
4. Visualization of images is easy as by plotting the H and S
components we can vary the V component or vice-versa and see the
different visualizations.

Uniform Colour Spaces (UCS):

Uniform color spaces (UCS) address a critical limitation of color spaces like HSL and RGB: perceptual uniformity.

Perfect color reproduction is challenging and very difficult because we cannot always perfectly match colors.

So it is crucial to know whether a colour difference would be noticeable to the human eye.

Just noticeable difference (JND): This is the smallest color change a human can perceive. By measuring JNDs, we can define the boundaries of indistinguishable colors.

When these colour differences are plotted on a colour space, they form the boundary of a region of colours that are indistinguishable from the original colours.

Non-uniform colour spaces: Existing spaces like CIE xy have limitations. The size of a difference in coordinates doesn't directly translate to the significance of the color change.
Figure 1

A uniform colour space is one in which the distance in coordinate space is a fair guide to the significance of the difference between two colours; in such a space, if the distance in coordinate space was below some threshold, then a human observer would not be able to tell the colours apart.

Uniform color spaces are color spaces that approximate human perception of color.
Geometric distances between color attributes (saturation, lightness, or hue) are close approximations to the way humans perceive changes in color.

Figure 2

Figure 1 shows an abrupt transition from green to blue due to a non-uniform colour space, whereas Figure 2 shows a smooth or gradual transition from green to blue as a result of a uniform colour space.
MacAdam Ellipses:
Just noticeable differences can be obtained by modifying a colour shown to an observer until they can only just tell it has changed in a comparison with the original colour.

When these differences are plotted on a colour space, they form the boundary of a region of colours that are indistinguishable from the original colours.
These boundaries form ellipses instead of circles because the human eye has varying sensitivity to color changes across the spectrum.

These ellipses are known as MacAdam ellipses, named after the physicist David MacAdam, who first measured them.

These MacAdam ellipses, shown below, are a cornerstone of color science.
These ellipses represent the regions in a color space where human vision struggles to distinguish a color from a central reference point.

Figure 3.13

Figure 3.13 shows variations in colour matches on a CIE x, y space.

At the center of the ellipse is the colour of a test light; the size of the
ellipse represents the scatter of lights that the human observers
tested would match to the test colour; the boundary shows where the
just noticeable difference is.
The ellipses at the top are larger than those at the bottom of the figure, and they rotate as they move up.
This means that the magnitude of the difference in x, y coordinates is a poor indicator of the significance of a difference in colour.
The CIE u'v' color space:
The CIE u'v' color space, also known as the CIE 1976 UCS (Uniform Colour Space), is an important step towards more perceptually uniform color representation compared to its predecessor, CIE xy.

The CIE u'v' space was developed to address the issue of non-uniform perception, wherein equal distances in the space don't necessarily correspond to equal perceived color differences.

It's a transformation of CIE xy using a mathematical formula that aims to create a more perceptually uniform space.


CIE u'v' is also visualized as a horseshoe-shaped diagram, but the scales and shapes are different from xy. The curved boundary still represents the spectral locus.

u' and v' Coordinates: Colors within the diagram are defined by coordinates u' and v' that are derived from the original x and y values of CIE xy.

The coordinates are:
u' = 4X / (X + 15Y + 3Z) = 4x / (−2x + 12y + 3)
v' = 9Y / (X + 15Y + 3Z) = 9y / (−2x + 12y + 3)

where X, Y and Z indicate tristimulus values and x and y indicate chromaticity coordinates.
Generally, the distance between coordinates in u', v' space is a fair indicator of the significance of the difference between two colours, neglecting the difference in brightness.

Disadvantage:
The transformation from xy to u'v' involved complex mathematical
formulas, making it less convenient for some applications.

CIELUV Color Space:

CIELUV, also known as CIE 1976 L*u*v*, is a uniform color space widely used for applications where accurate and consistent color perception is crucial.

It builds upon the foundation of previous color spaces like CIE xy and CIE u'v' to offer a more perceptually uniform representation of color.

Like its predecessors, CIELUV is based on the tristimulus values from human cone cells.

However, it incorporates a non-linear transformation to achieve greater perceptual uniformity.

This means that equal distances in the CIELUV space correspond more closely to how similar or different colors appear to the human eye.

Three Components of CIELUV:

CIELUV is a three-dimensional space defined by three coordinates:

L* (Lightness): This value represents the perceived lightness or darkness of a color, ranging from 0 (black) to 100 (white).

u*: This coordinate represents the green-red opponent channel. Positive values indicate colors towards red, and negative values indicate colors towards green.

v*: This coordinate represents the blue-yellow opponent channel. Positive values indicate colors towards yellow, and negative values indicate colors towards blue.

In this color space, the distance between two points approximately


tells how different the colors are in luminance, chroma, and hue.

The CIELUV coordinates (L*, u*, v*) can be calculated from the tristimulus values XYZ (or the chromaticity coordinates (x, y)) with the following formulas:

L* = 116 (Y/Yn)^(1/3) − 16      if Y/Yn > (6/29)³
L* = (29/3)³ (Y/Yn)             otherwise
u* = 13 L* (u' − u'n)
v* = 13 L* (v' − v'n)

where Y is the tristimulus value Y, u' and v' are the chromaticity coordinates from the CIE 1976 UCS diagram, and the subscript n denotes the corresponding values for the reference white.

Not Commonly Used: Unlike CIE xy and u'v', CIELUV is not


typically visualized as a two-dimensional diagram. This is because
it's a 3D space and visualizing it accurately in 2D can be
challenging.
Software Tools: Specialized software tools can be used to represent
CIELUV mathematically and calculate color differences within the
space.

Metric for Color Difference: CIELUV allows for the calculation of


Delta E (ΔE), a metric that quantifies the perceived color difference
between two colors in the space, but not the direction.

ΔE is given by:

ΔE*uv = √( (ΔL*)² + (Δu*)² + (Δv*)² )

This is valuable for tasks like color comparison and tolerance


testing.

A value of ΔE of unity represents a just noticeable difference (JND).
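A minimal NumPy sketch of the CIELUV conversion and the ΔE computation; the D65 white point and the sample XYZ values are assumptions of this sketch, not values given in the text:

import numpy as np

def xyz_to_luv(xyz, white=(95.047, 100.0, 108.883)):
    # Convert XYZ to CIE 1976 L*u*v*; the default white point (D65) is an
    # assumption of this sketch, use the white point of your own data.
    X, Y, Z = xyz
    Xn, Yn, Zn = white

    def uv_prime(X, Y, Z):
        d = X + 15.0 * Y + 3.0 * Z
        return 4.0 * X / d, 9.0 * Y / d

    up, vp = uv_prime(X, Y, Z)
    unp, vnp = uv_prime(Xn, Yn, Zn)

    t = Y / Yn
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > (6.0 / 29.0) ** 3 else (29.0 / 3.0) ** 3 * t
    return np.array([L, 13.0 * L * (up - unp), 13.0 * L * (vp - vnp)])

def delta_e(c1, c2):
    # Euclidean colour difference; a value near 1 is roughly a JND
    return float(np.linalg.norm(np.asarray(c1) - np.asarray(c2)))

print(delta_e(xyz_to_luv((41.24, 21.26, 1.93)), xyz_to_luv((35.76, 71.52, 11.92))))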

CIE LAB Uniform Colour Space:


The CIELAB color space, also known as the CIE 1976 (L*a*b*) color space, is a color space defined by the International Commission on Illumination (CIE).

It's one of the most widely used color spaces in various fields such
as color science, color management, and computer graphics.

CIELAB is designed to be perceptually uniform, meaning that a


change of a certain amount in any direction within the color space
should correspond roughly to a similar perceptual change in color
to the human eye.

It is based on the opponent color model of human vision,


where red/green forms an opponent pair, and blue/yellow
forms an opponent pair.

In CIELAB, colors are represented in a three-dimensional space


defined by three coordinates:
1. Luminance or Lightness(L): This represents the brightness of
the color, ranging from 0 (black) to 100 (white).

2. a*: (green–red): This axis represents the position between green


and red.
This coordinate (-128 to +127) represents the green-red opponent
channel. Positive values indicate redness, while negative values
indicate greenness.

3. b*: (blue–yellow): This axis represents the position between blue


and yellow.
This coordinate (-128 to +127) represents the blue-yellow
opponent channel.
Positive values indicate yellowness, while negative values
indicate blueness.

This makes it ideal for applications requiring high color accuracy,


and useful for color comparison and color reproduction across
different devices and viewing conditions.

The 1976 CIELAB coordinates (L*, a*, b*) can be calculated from the tristimulus values XYZ with the following formulas:

L* = 116 f(Y/Yn) − 16
a* = 500 [ f(X/Xn) − f(Y/Yn) ]
b* = 200 [ f(Y/Yn) − f(Z/Zn) ]

where f(t) = t^(1/3) if t > (6/29)³, and f(t) = t / (3 (6/29)²) + 4/29 otherwise.
The subscript n denotes the values for the white point: Xn, Yn and Zn are the X, Y and Z tristimulus values of a reference white patch.
Similar to CIELUV, CIELAB allows for calculating Delta E (ΔE), which
is a metric that quantifies the perceived color difference between two
colors in the space, but not the direction.
The "E" in delta E or delta E* is derived from "Empfindung", the
German word for sensation.
The magnitude of Delta E indicates how similar or dissimilar the
colors appear to the human eye.
ΔE is given by:

ΔE*ab = √( (ΔL*)² + (Δa*)² + (Δb*)² )
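A minimal NumPy sketch of the CIELAB conversion and ΔE*ab, again assuming a D65 reference white (an assumption of this sketch, not something fixed by the text):

import numpy as np

def xyz_to_lab(xyz, white=(95.047, 100.0, 108.883)):
    # Convert XYZ to CIE 1976 L*a*b*
    t = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3.0 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return np.array([L, a, b])

def delta_e_ab(lab1, lab2):
    # CIE 1976 colour difference: Euclidean distance in L*a*b*
    return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))

lab1 = xyz_to_lab((41.24, 21.26, 1.93))
lab2 = xyz_to_lab((42.0, 22.0, 2.1))
print(delta_e_ab(lab1, lab2))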

Advantages of CIELAB Color Space:


1. Since CIELAB is measured across three dimensions, there’s an
infinite number of color possibilities.
2. CIELAB color space is device-independent, which means it's not
tied to any specific device or technology.
3. Compared to CIELUV, CIELAB offers a slightly better
approximation of how humans perceive color differences.
4. Since the lab space is fully mathematically defined, the CIELAB
is copyright and license-free.
5. It’s also entirely in the public domain, meaning that it’s
completely free to use and integrate into your projects.

Disadvantages of CIELAB:
1. The precision provided by CIELAB requires significantly more data per pixel than the RGB and CMYK standards.
2. Since the gamut of the standard is larger than that of most computer displays, there is occasionally some loss of precision; however, advances in technology have made such issues negligible.

In conclusion, CIELAB stands as the industry standard for tasks


requiring precise and consistent color representation. Its focus on
perceptual uniformity and the Delta E metric make it a powerful tool
for various applications across science, design, and technology.

3.3.3 Spatial and Temporal Effects


Spatial and temporal effects are fundamental concepts in computer
vision (CV).
They deal with how we perceive and represent the colour of the
objects around us.

Spatial Effects: The spatial effects relate the objects being viewed
with space.
These focus on how objects are arranged and relate to each other
in 3D space.
Temporal Effects: The temporal effects relate the objects being
viewed with time.
These focus on how objects move and change over time.

Spatial Effects are essential for tasks like:


1. Object Detection and Recognition: Identifying and locating
objects in an image or video based on their spatial characteristics
(shape, size, texture).
2. Scene Understanding: Analyzing the spatial relationships
between objects in a scene to understand the layout and activity.
3. 3D Reconstruction: Recovering the 3D structure of a scene
from multiple 2D images or videos.

Temporal Effects are used for:


1. Motion Tracking: Following the movement of objects over
time in a video sequence.
2. Action Recognition: Classifying human actions in videos
(e.g., walking, running, jumping).
3. Video Segmentation: Separating objects of interest from the
background based on their motion patterns.

Spatial Effects:
These include the following:
1. Spatial Resolution: Spatial resolution refers to the level of
detail in an image and is determined by the number of pixels in an
image.
Higher resolution images contain more detail and provide a clearer
representation of the scene.
Spatial resolution is crucial in tasks like image classification,
object detection, and image segmentation.
2. Spatial Filtering: Spatial filtering involves applying mathematical operations to an image at each pixel to enhance or extract certain features (see the sketch after this list). Common spatial filters include blurring (smoothing), sharpening, edge detection, and noise reduction filters.
These filters manipulate the spatial arrangement of pixel values to highlight or suppress specific features in an image.

3. Spatial Transformation: Spatial transformation involves


geometrically altering the spatial arrangement of pixels in an image.
Common spatial transformations include rotation, scaling,
translation, and affine transformations.

These transformations are used for tasks like image registration,


image warping, and geometric correction.

4. Spatial Domain Processing: Spatial domain processing


involves manipulating pixel values directly in the image space.
Techniques like histogram equalization, contrast stretching, and
local adaptive processing are used to enhance the visual quality of
images or extract useful information.
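As referenced in item 2 above, a small sketch of spatial filtering with NumPy and SciPy; the random image and the particular kernels are illustrative choices:

import numpy as np
from scipy.ndimage import convolve

image = np.random.rand(128, 128)          # stand-in for a grayscale image in [0, 1]

box_blur = np.full((3, 3), 1.0 / 9.0)     # smoothing kernel
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)   # horizontal edge-detection kernel

smoothed = convolve(image, box_blur, mode="reflect")
edges_x = convolve(image, sobel_x, mode="reflect")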

Temporal Effects:
1. Temporal Resolution: Temporal resolution refers to the
ability to distinguish between changes in a scene over time.
In video processing, it is determined by the frame rate, which is the
number of frames captured or displayed per second.
Higher frame rates provide smoother motion and better temporal
resolution, which is essential for tasks like motion detection,
tracking, and video analysis.

2. Temporal Filtering: Temporal filtering involves processing a sequence of frames over time to remove noise, smooth motion, or extract temporal features (see the sketch after this list).
Techniques like temporal averaging, temporal differencing, and motion-compensated filtering are used to enhance the quality of videos and extract useful information about motion dynamics.
3. Temporal Integration: Temporal integration refers to the
aggregation of information over multiple frames to improve the
reliability of measurements or enhance the visibility of features.
Techniques like temporal averaging, temporal pooling, and
Kalman filtering are used to reduce noise, increase signal-to-noise
ratio, and improve the accuracy of measurements in video
processing tasks.

4. Temporal Dynamics Analysis: Temporal dynamics analysis


involves studying how features in a scene change over time.
This includes detecting motion patterns, identifying temporal events,
and analyzing long-term trends in video sequences.
Techniques like optical flow estimation, activity recognition, and
event detection are used to extract meaningful information from
temporal data.
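As referenced in item 2 above, a small NumPy sketch of temporal averaging and differencing; the random frame stack and the 0.25 threshold are illustrative choices:

import numpy as np

frames = np.random.rand(30, 120, 160)     # stand-in for 30 grayscale video frames

# Temporal averaging: integrates over time to suppress noise
background = frames.mean(axis=0)

# Temporal differencing: highlights changes (motion) between consecutive frames
motion = np.abs(np.diff(frames, axis=0))
motion_mask = motion[-1] > 0.25           # crude motion mask for the last frame pair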

In both computer graphics and computer vision, understanding and


effectively managing spatial and temporal effects are essential for
tasks such as image/video enhancement, feature extraction, object
recognition, motion analysis, and scene understanding.
By leveraging spatial and temporal information, algorithms can
extract more meaningful insights from visual data and perform a
wide range of tasks with greater accuracy and efficiency.

Chromatic Adaptation:
Predicting the appearance of complex displays of colour is difficult.

If the human visual system is exposed to a particular illuminant for some time, the colour system adapts to it, a process known as chromatic adaptation.

Chromatic adaptation is a phenomenon where the human visual


system adjusts its perception of colours to compensate for changes
in the illumination of a scene.

This adaptation mechanism allows humans to perceive colours


consistently across varying lighting conditions, ensuring that objects
maintain their perceived colour appearance under different light
sources.
Adaptation causes the colour diagram to skew(distort), in the sense
that two observers, adapted to different illuminants, can report that
spectral radiosities with quite different chromaticities have the same
colour.

Adaptation can be caused by surface patches in view.

The other mechanisms that are significant in adaptation are:

1. Assimilation — where surrounding colours cause the colour


reported for a surface patch to move towards the colour of the
surrounding patch.

2. Contrast — where surrounding colours cause the colour


reported for a surface patch to move away from the colour of the
surrounding patch.

3. Light Source: The colour of the light illuminating a scene


significantly impacts the perceived colour of objects. Sunlight vs
fluorescent lights create vastly different colour casts.

These effects appear to be related to coding issues within the optic nerve and to colour constancy.

Mechanism of Chromatic Adaptation:


The retina contains three types of cones, each sensitive to different
wavelengths of light (short, medium, and long wavelengths
corresponding roughly to blue, green, and red, respectively).
The signals from these cones are processed by the visual system to
compute the perceived color of objects in the scene.

When the spectral composition of the illumination changes, the


signals received by the cones also change.
Chromatic adaptation mechanisms adjust the sensitivity of the cones
and modify the processing of color signals to maintain a consistent
perception of object colors.

Types of Chromatic Adaptation:


1. Von Kries Adaptation: The Von Kries adaptation model assumes that the sensitivity of each of the three cone types scales proportionally with the spectral power distribution of the illuminant.
This means that the responses of the cones are adjusted multiplicatively to match the characteristics of the current lighting conditions (see the sketch after this list).
While this model is simple and computationally efficient, it does not fully capture all aspects of chromatic adaptation.

2. Bradford Adaptation: The Bradford adaptation model improves


upon Von Kries by considering the differences in adaptation between
daylight and artificial light sources.
It uses a diagonal transformation matrix to adapt colors based on the
spectral characteristics of the illuminant.
The Bradford model attempts to provide better color constancy under
a wider range of lighting conditions.

3. Sharp Adaptation: The Sharp adaptation model extends the


Bradford model by incorporating additional terms to account for
variations in adaptation across different regions of the visual field.
It considers the fact that chromatic adaptation mechanisms may vary
depending on the spatial distribution of colors in the scene.
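As referenced under Von Kries adaptation above, a minimal sketch of the diagonal (Von Kries) scaling in cone space; the cone responses and white points below are made-up illustrative numbers, not measured data:

import numpy as np

def von_kries_adapt(lms, src_white_lms, dst_white_lms):
    # Scale each cone (L, M, S) response by the ratio of the destination white
    # to the source white, i.e. a diagonal transform in cone space.
    scale = np.asarray(dst_white_lms, float) / np.asarray(src_white_lms, float)
    return np.asarray(lms, float) * scale

# Illustrative (made-up) cone responses and white points
patch = np.array([0.42, 0.31, 0.06])
white_tungsten = np.array([1.11, 0.98, 0.36])
white_daylight = np.array([0.95, 1.05, 1.09])
print(von_kries_adapt(patch, white_tungsten, white_daylight))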

3.5 Colour Constancy & Surface Colour from Image Colour:


The colour of light arriving at a camera is determined by two factors:
1. The spectral reflectance of the surface that the light is leaving,
and
2. The spectral radiance of the light falling on that surface.
The colour of the light falling on surfaces can vary very widely, as a
result of which the colour of the light arriving at the camera can be
quite a poor representation of the colour of the surfaces being viewed.

CV applications often need to deal with images or videos captured


under varying lighting conditions.

If we light a green surface with white light, we get a green image; if


we light a white surface with a green light, we also get a green
image. This makes it difficult to name surface colours from
pictures.
This is where the colour constancy algorithms come into picture.

Colour constancy algorithms are the algorithms which take an


image, discount the effect of the light, and report the actual colour
of the surface being viewed.

Chromatic adaptation algorithms help achieve colour constancy,


making objects appear with consistent colours despite illumination
variations.

3.5.1 Surface Colour Perception in People


Colour constancy is an interesting subproblem that is associated
with general human vision problem.

There is some form of colour constancy algorithm in the human


vision system, which people are often unaware of.

Colour constancy focusses on intensity independent descriptions of


colour like hue and saturation.

Lightness constancy allows humans to report whether a surface is


white, grey or black (the lightness of the surface) despite changes in
the intensity of illumination (the brightness).

The human colour constancy algorithm uses various forms of


simple linear models to predict the colour of a complex scene being
viewed.
But the predictions given by these models tend to be wildly
inaccurate because it is surprisingly difficult to predict the colours
a human will see in a complex scene.

This makes it hard to produce really good colour reproduction


systems.

Since colour constancy systems are neither perfectly accurate, nor


unavoidable, humans can report:
• the colour a surface would have in white light (often called surface
colour);
• the colour of the light arriving at the eye; and
• sometimes, the colour of the light falling on the surface.

However, understanding of human colour constancy remains limited, for the following reasons:


1. Human competence at colour constancy is poorly understood.
2. The main experiments on humans do not explore all
circumstances.
3. It is not known how robust colour constancy is.
4. The extent to which high level cues in colour constancy
contribute to colour judgements is not known.

Lightness constancy is extremely good over a wide range of illuminant


variation, in spite of the fact that the brightness of a surface varies
with its orientation as well as with the intensity of the illuminant.

3.5.2 Inferring Lightness


Human lightness constancy involves two processes:
i) one compares the brightness of various image patches, and uses
this comparison to determine which patches are lighter and
which darker;

ii) the second establishes some form of absolute standard to which


these comparisons can be referred.

The lightness constancy algorithms tend to be simpler than colour


constancy algorithms.

Lightness Constancy Algorithm:

A Simple Model of Image Brightness


Current lightness constancy algorithms were developed in the
context of simple scenes.

The radiance arriving at a pixel depends on the illumination of the


surface being viewed, its BRDF, its configuration with respect to the
source and the camera responses.

For simplification, the following assumptions are made:


1. The scene is flat and frontal;
2. That surfaces are diffuse, or that specularities have been
removed;
3. That surfaces are Lambertian and
4. That the camera responds linearly to radiance.

In this case, the camera response C at a point x is the product of an

illumination term I(x), an albedo term ρ(x), and a constant kc


that comes from the camera gain:
C(x) = kc I(x)ρ(x)
If we take logarithms, we get

log C(x) = log kc + log I(x) + log ρ(x)

Recovering Lightness from the Model


Algorithm to determine Lightness of 1D-Image Patches:
Assumptions made for the algorithm to determine lightness of 1D-
image patches are:
1. Albedoes change only quickly over space, which means that
spatial derivatives of the term log ρ(x) are either zero (where the
albedo is constant) or large (at a change of albedo).
2. Illumination changes only slowly over space, for example, the
illumination due to a point source will change relatively slowly unless
the source is very close.

Based on the above assumptions, algorithms are built.

The earliest algorithm, Retinex algorithm of Land and McCann


[1971], has fallen into disuse.

But a more convenient and easy way to measure/estimate the


lightness of 1D-image patches is shown in figure 3.22 below:
Figure 3.22 The Lightness Algorithm

In the top row, the graph on the left shows log ρ(x), the one in the center shows log I(x), and the one on the right shows their sum, log C(x).

The log of image intensity has large derivatives at changes in surface


reflectance and small derivatives when the only change is due to
illumination gradients.

Lightness is recovered by differentiating the log intensity, thresholding to dispose of small derivatives, and then integrating, at the cost of a missing constant of integration.
This procedure is widely known as Retinex.
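A minimal NumPy sketch of this one-dimensional procedure; the threshold and the synthetic signal are arbitrary illustrative choices:

import numpy as np

def lightness_1d(log_intensity, threshold=0.1):
    # Differentiate the log image, zero the small (illumination) derivatives,
    # then integrate; the result is known only up to an additive constant.
    d = np.diff(log_intensity)
    d[np.abs(d) < threshold] = 0.0
    return np.concatenate([[0.0], np.cumsum(d)])

x = np.linspace(0.0, 1.0, 200)
log_rho = np.where(x < 0.5, 0.0, 0.7)          # one sharp albedo step
log_illum = 0.3 * x                            # slowly varying illumination ramp
recovered = lightness_1d(log_rho + log_illum)  # approximately log_rho, up to a constant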

This approach can be extended to two dimensions as well.


The minimization problem-based lightness algorithm:
The minimization problem-based lightness algorithm is a
computational approach used in colour constancy to correct for
variations in illumination across an image.

This algorithm formulates colour constancy as a minimization


problem, where the goal is to find the optimal colour correction that
minimizes the discrepancy between observed colours and the true
colours of objects in the scene.

Algorithm:
1. The log albedo map whose gradient is most like the thresholded
gradient is chosen.
2. This is a relatively simple problem, because computing the
gradient of an image is a linear operation.
3. The x-component of the thresholded gradient is scanned into a
vector p and the y-component is scanned into a vector q.
4. The vector representing log-albedo is written as l.
5. Since the process of forming the x derivative is linear, there is
some matrix Mx such that Mxl is the x derivative.
6. For the y derivative, the corresponding matrix is written as My.

7. Differentiating and thresholding is easy: at each point, the magnitude of the gradient is estimated, and if the magnitude is less than some threshold, the gradient vector is set to zero; otherwise it is left as is.
8. The problem becomes to find the vector l that minimizes
|Mx l − p|² + |My l − q|²
9. This is a quadratic minimization problem, and the answer can
be found by a linear process.
10. Some special tricks are required, because adding a constant
vector to l cannot change the derivatives, so the problem does not
have a unique solution.
11. The constant of integration is obtained from one of the two
assumptions mentioned below:
• we can assume that the brightest patch is white;
• we can assume that the average lightness is constant.
The major difficulty in estimating the lightness is caused by shadow
boundaries.
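A small, dense NumPy sketch of the minimization described above; it is a toy implementation intended for tiny images, with an arbitrary threshold, not an optimized solver:

import numpy as np

def recover_log_albedo(log_image, threshold=0.1):
    # Minimizes |Mx l - p|^2 + |My l - q|^2 with dense matrices, so keep the
    # input small (e.g. 32 x 32).
    H, W = log_image.shape
    n = H * W

    def deriv_matrix(axis):
        # Forward-difference operator on the flattened image; boundary rows
        # with no valid neighbour are left as zero rows.
        M = np.zeros((n, n))
        for r in range(H):
            for c in range(W):
                i = r * W + c
                if axis == 'x' and c + 1 < W:
                    M[i, i], M[i, i + 1] = -1.0, 1.0
                if axis == 'y' and r + 1 < H:
                    M[i, i], M[i, i + W] = -1.0, 1.0
        return M

    Mx, My = deriv_matrix('x'), deriv_matrix('y')
    l0 = log_image.ravel()

    # Threshold the gradient: zero it wherever its magnitude is small
    p, q = Mx @ l0, My @ l0
    small = np.hypot(p, q) < threshold
    p[small], q[small] = 0.0, 0.0

    # Quadratic minimization solved as one linear least-squares problem
    A = np.vstack([Mx, My])
    b = np.concatenate([p, q])
    l, *_ = np.linalg.lstsq(A, b, rcond=None)

    # Constant of integration: assume the brightest patch is white
    return (l - l.max()).reshape(H, W)

np.linalg.lstsq returns a minimum-norm solution despite the constant-vector nullspace, and subtracting the maximum implements the "brightest patch is white" assumption from step 11.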

3.5.3 A Model for Image Colour


To build a colour constancy algorithm, we need a model to interpret
the colour of pixels.

Many phenomena affect the colour of this pixel.

The main parameters which affect the colour of the pixel are:
1. The camera response to illumination (which might not be
linear);
2. The choice of camera receptors;
3. The amount of light that arrives at the surface;
4. The colour of light arriving at the surface;
5. The dependence of the diffuse albedo on wavelength; and
specular components.

The value at a pixel can be modelled as:


C(x) = gd(x)d(x) + gs(x)s(x) + i(x)
where gd(x)d(x) is the diffuse (direct) term,
gs(x)s(x) is the specular term, and
i(x) is the interreflection term.
 d(x) is the image colour of an equivalent flat frontal surface
viewed under the same light;
 gd(x) is a term that varies over space and accounts for the
change in brightness due to the orientation of the surface;
 s(x) is the image colour of the specular reflection from an
equivalent flat frontal surface;
 gs(x) is a term that varies over space and accounts for the
change in the amount of energy specularly reflected; and
 i(x) is a term that accounts for coloured interreflections, spatial
changes in illumination, and the like.
The detailed structure of the terms gd(x) and i(x) is ignored because we are primarily interested in information that can be extracted from colour at a local level.

The term i(x) can sometimes be quite small with respect to other
terms and usually changes quite slowly over space.

Moreover, nothing is known about how to extract information from


i(x), and even if it is known, all evidence suggests that this is very
difficult.

So, the term i(x) is neglected.

Specularities are small and bright, and can be found by using


different techniques or even new images without specularities can be
generated.

In the term gd(x)d(x) in the model above, gd(x) is assumed as a


constant, so that we are viewing a flat, frontal surface.

The resulting term, d(x), represents flat, frontal diffuse coloured


surfaces.

It is also assumed that there is a single illuminant that has a


constant colour over the whole image.

This term is a combination of illuminant, receptor and reflectance


information, which is impossible to separate completely in a realistic
world.

Despite this, modern algorithms can provide reasonable estimates


of surface colours from image colors, given a diverse set of coloured
surfaces and a reasonable illuminant.

Finite-Dimensional Linear Models


Finite-dimensional linear models are a class of mathematical models
used in computer vision for various tasks such as image processing,
object recognition, and image reconstruction.

In computer vision, finite-dimensional linear models represent a


fundamental approach for tasks involving relationships between
image features and the quantities you want to estimate or predict.

These models represent visual data using linear combinations of


basis functions or features.

These basis functions can be predefined or learned from data.

Each basis function captures a specific aspect or characteristic of


the visual data, such as edges, textures, colors, or shapes.

We know that the value at a pixel can be modelled as:


C(x) = gd(x)d(x) + gs(x)s(x) + i(x)
 where d(x) is the image colour of an equivalent flat frontal
surface viewed under the same light.

The term d(x) results from interactions between the spectral


irradiance of the source, the spectral albedo of the surfaces, and
the camera sensitivity.

A model is needed to account for these interactions.


Figure 3.23

If a patch of perfectly diffuse surface with diffuse spectral reflectance


ρ(λ) is illuminated by a light whose spectrum is E(λ), the spectrum of
the reflected light will be ρ(λ)E(λ), multiplied by some constant which
is linked to surface orientation, and this constant is neglected.

Thus, if a photoreceptor of the k'th type sees this surface patch, its response will be:

pk = ∫Λ σk(λ) ρ(λ) E(λ) dλ

where Λ is the range of all relevant wavelengths and σk(λ) is the sensitivity of the k'th photoreceptor.
This response is linear in the surface reflectance and linear in the
illumination, which suggests using linear models for the families of
possible surface reflectances and illuminants.

A finite-dimensional linear model models surface spectral albedos


and illuminant spectral irradiance as a weighted sum of a finite
number of basis functions.
We need not use the same bases for reflectances and for illuminants.

If a finite-dimensional linear model of surface reflectance is a reasonable description of the world, any surface reflectance can be written as

ρ(λ) = Σj rj φj(λ)

where the φj(λ) are the basis functions for the model of reflectance, and the rj vary from surface to surface.

Similarly, if a finite-dimensional linear model of the illuminant is a reasonable model, any illuminant can be written as

E(λ) = Σi ei ψi(λ)

where the ψi(λ) are the basis functions for the model of illumination.

When both models apply, the response of a receptor of the k'th type is:

pk = ∫ σk(λ) ( Σj rj φj(λ) ) ( Σi ei ψi(λ) ) dλ = Σi,j ei rj gijk

where gijk = ∫ σk(λ) φj(λ) ψi(λ) dλ, and we expect that the gijk are known, as they are components of the world model.
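A minimal NumPy sketch of how these pieces fit together, using made-up (random) basis functions and sensitivities purely for illustration:

import numpy as np

# Discretised world model (illustrative random bases over wavelength)
lam = np.linspace(400.0, 700.0, 31)                 # wavelengths, nm
psi = np.random.rand(3, lam.size)                   # illuminant basis  ψ_i(λ)
phi = np.random.rand(3, lam.size)                   # reflectance basis φ_j(λ)
sigma = np.random.rand(3, lam.size)                 # receptor sensitivities σ_k(λ)

# g[i, j, k] = ∫ σ_k(λ) φ_j(λ) ψ_i(λ) dλ, approximated by a rectangle rule
g = np.einsum('iw,jw,kw->ijk', psi, phi, sigma) * (lam[1] - lam[0])

e = np.array([1.0, 0.2, -0.1])                      # illuminant coefficients
r = np.array([0.5, 0.3, 0.1])                       # reflectance coefficients

# Receptor responses p_k = Σ_{i,j} e_i r_j g_{ijk}
p = np.einsum('i,j,ijk->k', e, r, g)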

Surface Colour from Finite Dimensional Linear Models


Understanding surface colour is a crucial aspect of computer
vision, as it allows for tasks like object recognition and scene
interpretation despite varying lighting conditions.

Finite dimensional linear models provide a way to analyze how


surface colour interacts with illumination to produce the observed
image colour.

The colour we perceive from an object depends on two factors:


1. The inherent reflectance properties of the surface (surface
colour).
2. The spectral power distribution of the illumination source
(lighting).

The challenge lies in separating these effects and recovering the


intrinsic surface colour from the image captured under a specific
illuminant.
Finite dimensional linear models represent both surface reflectance
and illumination with a low number of dimensions (typically 2 or
3).

This is a simplification of the real world, where reflectance and


illumination spectra can have many dimensions.

The core assumption is that changes in illumination can be


approximated by linear transformations in this reduced-
dimensionality space.
This allows us to model the relationship between surface
reflectance, illumination, and the resulting image colour using a
linear equation.

We know that any surface reflectance can be written as

ρ(λ) = Σj rj φj(λ)

where the φj(λ) are the basis functions for the model of reflectance and the rj vary from surface to surface, and that the receptor responses are then pk = Σi,j ei rj gijk.

Each of the indexed terms in these equations can be interpreted as a component of a vector: p is the vector whose k'th component is the receptor response pk, r has components rj, and e has components ei.

The surface colour could be represented either directly by the


vector of coefficients r, or more indirectly by computing r and then
determining what the surfaces would look like under white light.

The latter representation is more useful in practice; among other


things, the results are easy to interpret.

Normalizing Average Reflectance


Assume that the spatial average of reflectance in all scenes is constant and known. In the finite-dimensional basis for reflectance, this average can be written as

ρ̄(λ) = Σj r̄j φj(λ)

Now if the average reflectance is constant, the average of the receptor responses must be constant too, since the imaging process is linear.

The average of the response of the k'th receptor can then be written as:

p̄k = Σi,j ei r̄j gijk

We know the p̄k and the r̄j, and so have a linear system in the unknown light coefficients ei: if p̄ is the vector with k'th component p̄k, and A is the matrix with k, i'th component Σj gijk r̄j, then we can write this system as

p̄ = A e

We solve this for e, and then recover reflectance coefficients at each pixel, as for the case of specularities.
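Continuing the sketch above (the array g and the averages p̄ and r̄ are assumed to come from that hypothetical world model), the illuminant coefficients and the per-pixel reflectance coefficients can be recovered with ordinary linear least squares:

import numpy as np

def estimate_illuminant(p_bar, r_bar, g):
    # Solve p_bar = A e, where A[k, i] = Σ_j g[i, j, k] r_bar[j]
    A = np.einsum('ijk,j->ki', g, r_bar)
    e, *_ = np.linalg.lstsq(A, p_bar, rcond=None)
    return e

def reflectance_at_pixel(p_pixel, e, g):
    # With e known, each pixel gives a linear system B r = p_pixel,
    # where B[k, j] = Σ_i e_i g[i, j, k]
    B = np.einsum('i,ijk->kj', e, g)
    r, *_ = np.linalg.lstsq(B, p_pixel, rcond=None)
    return r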

The matrix A, which represents the linear model for illumination, is


assumed to have full rank with reasonable choices of receptors.

This implies that we can determine the illumination (e) if the


dimension of the linear model matches the number of receptors.

Once the illumination is known, surface reflectance at each pixel


can be reported or the image can be corrected to simulate white
light conditions.

However, assuming the average reflectance as a constant is risky,


as it's often not accurate.

To mitigate this, one approach is to adjust the average reflectance


for different scenes, but determining the appropriate average is
challenging.
Another method is to compute a spatially averaged color, like
averaging colors represented by a certain number of pixels without
weighting them by pixel count.
However, the effectiveness of this approach is uncertain due to a
lack of experimental data in the literature.

Normalizing the Gamut


(Colour Constancy Algorithm by Gamut Mapping):
Color constancy refers to the ability of the human visual system to
perceive the color of objects consistently under varying lighting
conditions.

The gamut of an image is the set of different colours that appear in it, i.e. the collection of all its pixel values; this gamut carries information about the light source.

Gamut mapping is a technique used in color constancy algorithms


to achieve color consistency by mapping the colors of an image to a
reference colour space, while preserving their perceptual
appearance as much as possible.

This is typically achieved by finding the closest match for each


pixel's color in the reference color space.

Not every possible pixel value can be obtained by taking images of


real surfaces under white light.

It is usually impossible to obtain values where one channel


responds strongly and others do not - for example, 255 in the red
channel and 0 in the green and blue channels.

If an image gamut contains two pixel values, say p1 and p2, then it
must be
possible to take an image under the same illuminant that contains
the value tp1 + (1 − t)p2 for 0 ≤ t ≤ 1
This means that the convex hull of the image gamut contains the
illuminant information.

These constraints can be exploited to constrain the colour of the


illuminant.
Step 1: Obtain the gamut G of the image which represents the
convex hull of all image pixel values of the given image.

Step 2: Obtain the gamut W of many images of many different


coloured surfaces under white light, which represents the convex
hull of all image pixel values.

Step 3: Obtain the map Me that takes an image seen under


illuminant e to an image seen under white light.

Step 4: Then the only illuminants we need to consider are those whose maps Me satisfy
Me(G) ⊆ W.
Step 5: This is most helpful if the family of maps Me has a reasonable structure; the elements of Me are diagonal matrices, which means that changes in one illuminant parameter affect only the response of a single receptor.
In the case of finite-dimensional linear models, Me depends linearly on e, so that the family of illuminants that satisfy the constraint is also convex.

Step 6: This family can be constructed by intersecting a set of


convex hulls, each corresponding to the family of maps that takes a
hull vertex of G to some point inside W.

Step 7: Once we have formed this family, it remains to find an


appropriate illuminant, by using any one of the many strategies
available for selection of illuminant.

Step 8: Apply the selected illuminant to every pixel in the image.
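A hedged sketch of the feasibility test in Steps 4 to 6, assuming SciPy is available and that the candidate maps are supplied as a discretised set of diagonal scalings; this is a simplification of the full convex-hull intersection described above, not a complete gamut-mapping implementation:

import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def feasible_diagonal_maps(image_pixels, canonical_pixels, candidate_diagonals):
    # Keep only the diagonal maps d that send every vertex of the image gamut G
    # inside the canonical gamut W (both gamuts taken as convex hulls of RGB values).
    G = np.asarray(image_pixels, float).reshape(-1, 3)
    W = np.asarray(canonical_pixels, float).reshape(-1, 3)

    G_vertices = G[ConvexHull(G).vertices]        # hull vertices of the image gamut
    W_hull = Delaunay(W)                          # used for point-in-hull tests

    feasible = []
    for d in candidate_diagonals:                 # each d = (d_R, d_G, d_B)
        mapped = G_vertices * np.asarray(d, float)
        if np.all(W_hull.find_simplex(mapped) >= 0):
            feasible.append(d)
    return feasible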

Advantages:
Better color reproduction than other approaches.

Disadvantages:
1. Computationally expensive
2. Depends upon sensor sensitivity
3. Assumes uniform illumination distribution
4. Requires the knowledge of the range of illuminant.
