Introduction to color imaging science
HSIEN-CHE LEE received his B.S. from National Taiwan University in 1973 and a Ph.D. in
electrical engineering from Purdue University in 1981. He then worked for 18 years at
Kodak Research Laboratories in Rochester, New York. There he did research on digital
color image processing, color science, human color vision, medical imaging, and computer
vision. He is now Senior Vice President of Advanced Imaging at Foxlink Peripherals,
Inc., Fremont, California. With more than 20 years of research and product development
experience in imaging science, he has given many lectures and short courses on color
imaging, color science, and computer vision at various universities and research institutes.
He has published many technical papers and holds 14 US patents on inventions related to color
imaging science.
Introduction to
Color Imaging Science
HSIEN-CHE LEE
Preface xix
1 Introduction 1
1.1 What is color imaging science? 1
1.2 Overview of the book 2
1.2.1 Measurement of light and color 2
1.2.2 Optical image formation 3
1.2.3 In the eye of the beholder 4
1.2.4 Tools for color imaging 5
1.2.5 Color image acquisition and display 5
1.2.6 Image quality and image processing 6
1.3 The International System of Units (SI) 6
1.4 General bibliography and guide to the literatures 8
1.5 Problems 12
2 Light 13
2.1 What is light? 13
2.2 Wave trains of finite length 15
2.3 Coherence 15
2.3.1 Temporal coherence 16
2.3.2 Spatial coherence 17
2.4 Polarization 20
2.4.1 Representations of polarization 20
2.4.2 Stokes parameters 23
2.4.3 The Mueller matrix 26
2.4.4 The interference of polarized light 28
2.5 Problems 28
3 Radiometry 29
3.1 Concepts and definitions 29
3.2 Spectral radiometry 39
3.3 The International Lighting Vocabulary 40
3.4 Radiance theorem 40
3.5 Integrating cavities 42
4 Photometry 49
4.1 Brightness matching and photometry 49
4.2 The spectral luminous efficiency functions 52
4.3 Photometric quantities 54
4.4 Photometry in imaging applications 58
4.4.1 Exposure value (EV) 59
4.4.2 Guide number 59
4.4.3 Additive system of photographic exposure (APEX) 61
4.5 Problems 62
5 Light–matter interaction 63
5.1 Light, energy, and electromagnetic waves 63
5.2 Physical properties of matter 64
5.3 Light and matter 66
5.3.1 Optical properties of matter 67
5.3.2 Light wave propagation in media 69
5.3.3 Optical dispersion in matter 72
5.3.4 Quantum mechanics and optical dispersion 76
5.4 Light propagation across material boundaries 76
5.4.1 Reflection and refraction 77
5.4.2 Scattering 81
5.4.3 Transmission and absorption 83
5.4.4 Diffraction 84
5.5 Problems 87
6 Colorimetry 89
6.1 Colorimetry and its empirical foundations 89
6.2 The receptor-level theory of color matching 90
6.3 Color matching experiments 93
6.4 Transformation between two sets of primaries 95
6.5 The CIE 1931 Standard Colorimetric Observer (2°) 97
6.6 The CIE 1964 Supplementary Standard Colorimetric Observer (10°) 102
6.7 Calculation of tristimulus values 104
6.8 Some mathematical relations of colorimetric quantities 104
6.9 Cautions on the use of colorimetric data 106
6.10 Color differences and uniform color spaces 107
6.10.1 CIE 1976 UCS diagram 109
6.10.2 CIELUV color space 110
6.10.3 CIELAB color space 111
Glossary 625
References 635
Index 689
Preface
To understand the capturing, the processing, and the display of color images requires knowl-
edge of many disciplines, such as image formation, radiometry, colorimetry, psychophysics,
and color reproduction, that are not part of the traditional training for engineers. Yet, with
the advance of sensor, computing, and display technologies, engineers today often have to
deal with aspects of color imaging, some more frequently than others. This book is intended
as an introduction to color imaging science for engineers and scientists. It will be useful
for those who are preparing to work or are already working in the field of color imaging
or other fields that would benefit from the understanding of the fundamental processes of
color imaging.
The sound training of imaging scientists and engineers requires more than teaching
practical knowledge of color signal conversion, such as YIQ to RGB. It also has to impart
good understanding of the physical, mathematical, and psychophysical principles underlying
the practice. Good understanding ensures correct usage of formulas and enables one to come
up with creative solutions to new problems. The major emphasis of this book, therefore,
is to elucidate the basic principles and processes of color imaging, rather than to compile
knowledge of all known systems and algorithms. Many applications are described, but they
serve mainly as examples of how the basic principles can be used in practice and where
compromises are made.
Color imaging science covers so many fields of research that it takes much more than
one book to discuss its various aspects in reasonable detail. There are excellent books on
optics, radiometry, photometry, colorimetry, color science, color vision, visual perception,
pigments, dyes, photography, image sensors, image displays, image quality, and graphic arts.
Indeed, the best way to understand the science of color imaging is to read books on each of
these topics. The obvious problem is the time and effort required for such an undertaking,
and this is the main motivation for writing this book. It extracts the essential information
from the diverse disciplines to present a concise introduction to the science of color imaging.
In doing so, I have made unavoidable personal choices as to what should be included. I have
covered most of the topics that I considered important for a basic understanding of color
imaging. Readers who want to know more about any topic are strongly encouraged to study
the books and articles cited in the reference list.
I would like to thank Professor Thomas S. Huang of the University of Illinois for his wonderful
lectures and his suggestion that I write a book on color imaging. I would also like to
thank Professor Thomas W. Parks of Cornell University for his numerous suggestions on
how to improve the presentation of the material and for his help in constructing homework
problems for students. During the time he and I cotaught a course on color imaging science
at Cornell, I learned a lot from his many years of teaching experience. My career in imaging
science began under Mr. James S. Alkofer and Dr. Michael A. Kriss. They let me wander
around in the interesting world of color imaging under their experienced guidance. I appre-
ciate their encouragement, friendship, and wisdom very much. I am also very grateful to
my copy-editor, Maureen Storey, for her patient and meticulous editing of my manuscript.
During the preparation of this book, my wife took care of the family needs and all the
housework. Her smiles brightened my tired days and her lively description of her daily
activities kept me in touch with the real world. She loves taking pictures and her casual
comments on image quality serve as reality checks of all the theories I know. My book-
writing also required me to borrow many weekends from my daughter. Her witty and funny
remarks to comfort me on my ever increasing time debt just made it more difficult for me
to figure out how much I owe her. Certain things cannot be quantified.
1 Introduction
1.1 What is color imaging science?
Color imaging science is the study of the formation, manipulation, display, and evaluation of
color images. Image formation includes the optical imaging process and the image sensing
and recording processes. The manipulation of images is most easily done through computers
in digital form or electronic circuits in analog form. Conventional image manipulation in
darkrooms accounts only for a very small fraction of the total images manipulated daily. The
display of color images can use many different media, such as CRT monitors, photographic
prints, half-tone printing, and thermal dye-transfer prints. The complete imaging chain
from capture, through image processing, to display involves many steps of degradation,
correction, enhancement, and compromise. The quality of the final reproduced images has
to be evaluated by human observers, whose judgments are inherently subjective. Sometimes the evaluation process
can be automated with a few objectively computable, quantitative measurements.
The complexity of color imaging science stems from the need to understand many
diverse fields of engineering, optics, physics, chemistry, and mathematics. Although it
is not required for us to be familiar with every part of the process in detail before we
can work in and contribute to the color imaging science field, it is often necessary for
us to have a general understanding of the entire imaging chain in order to avoid making
unrealistic assumptions in our work. For example, in digital image processing, a frequently
used technique is histogram-equalization enhancement, in which an input image is mapped
through a tonal transformation curve such that the output image has a uniformly distributed
histogram of image values. However, the technique is often applied without knowing what
the units of the digital images really are. The same image can be digitized in terms of film
density or image exposure. Depending on which way it is digitized, the resulting histogram
can differ widely. Writing that an image has been processed by the “histogram-equalization”
technique without saying in which metric the histogram was equalized does not allow the
reader to draw any meaningful conclusion. If we have a general understanding of the practice
of image scanning and display, we can easily avoid this type of error. Sometimes the causes of
errors are more subtle, and avoiding them requires a different kind of understanding. For
example, the geometrical theory of optical imaging tells us that the out-of-focus point spread
function is a uniform disk. However, if we understand that the fundamental assumption of
geometrical optics is not valid around the image focus area, we are more careful in using the
uniform disk as a blur model. In this case, basic knowledge of the assumptions underlying
various approximations made by theories lets us watch out for potential pitfalls. For these
reasons, this book aims at providing the needed general understanding of the entire color
imaging chain whilst making the various assumptions and approximations clear.
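To make the histogram point above concrete, the short Python sketch below (not from the book; the synthetic scene and the 8-bit quantization choices are assumed purely for illustration) digitizes the same scene once linearly in exposure and once linearly in density, and prints how differently the pixel values populate the code-value histogram in the two metrics; a tone curve built to "equalize the histogram" is therefore a different curve in each metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic scene: relative exposure values with a long-tailed distribution
# (hypothetical data, used only to illustrate the point made in the text).
exposure = rng.lognormal(mean=-1.0, sigma=1.0, size=100_000)
exposure = np.clip(exposure / exposure.max(), 1e-3, 1.0)

# Digitize the same scene two ways: 8 bits linear in exposure, and
# 8 bits linear in density, where density = -log10(exposure) spans 0..3.
code_exposure = np.round(255 * exposure).astype(np.uint8)
density = -np.log10(exposure)
code_density = np.round(255 * density / 3.0).astype(np.uint8)

hist_exposure, _ = np.histogram(code_exposure, bins=256, range=(0, 256))
hist_density, _ = np.histogram(code_density, bins=256, range=(0, 256))

# The histograms of the *same* scene differ widely, so a curve that
# "equalizes the histogram" depends entirely on the chosen metric.
print("fraction of pixels in the lowest 32 code values:")
print("  linear-exposure coding:", hist_exposure[:32].sum() / exposure.size)
print("  density coding:        ", hist_density[:32].sum() / exposure.size)
```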
This book is written based on the belief that for a beginning color imaging scientist or
engineer, a basic, broad understanding of the physical principles underlying every step in
the imaging chain is more useful than an accumulation of knowledge about details of various
techniques. Therefore, on the one hand, some readers may be surprised by many of the topics
in the book that are not traditionally covered by textbooks on color science and imaging
science. On the other hand, some readers may be disappointed that no comprehensive
surveys are provided for various algorithms or devices. If we truly understand the nature
of a problem, we can often come up with very creative and robust solutions after some
careful thinking. Otherwise, even if we know all the existing tricks and methods to solve a
problem, we may be at a loss when some critical constraints are changed. The following is
an overview of the book.
see the world, separate colorimetric systems can be constructed according to their spectral
sensing mechanisms. From this perspective, we can appreciate how color imaging can be
thought of as a branch of science that relates different physical systems with the same basic
laws. For human color perception, the colorimetry system established by the Commission
Internationale de l’Eclairage (CIE) is the most widely accepted system today. Much of
Chapter 6 is devoted to explaining what the CIE system is and how it was derived. It is of
fundamental importance that we understand this system thoroughly.
Since the spectral composition of the light reflected from an object surface is the product
of the spectral composition of the light incident on the surface and the spectral reflectance
factor of the surface, the spectral characteristics of light sources directly (through direct
illumination) or indirectly (through mutual reflection) affect the spectral composition of the
optical image formed at the sensor(s) of a color imaging system. Therefore, it is necessary
for us to have a good knowledge of the nature of the various light sources that are involved
in color imaging applications. This is the subject of Chapter 7.
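The relation described above is a simple wavelength-by-wavelength product. A minimal Python sketch (the sampled spectra below are made-up numbers, used only to illustrate the bookkeeping):

```python
import numpy as np

# Wavelength samples across the visible range [nm].
wavelengths = np.arange(400, 701, 10)

# Made-up illustrative data: a bluish illuminant SPD and a reddish surface
# spectral reflectance factor (both in relative, unitless terms).
illuminant_spd = np.linspace(1.2, 0.8, wavelengths.size)
surface_reflectance = np.linspace(0.2, 0.9, wavelengths.size)

# The spectral composition of the reflected light is the wavelength-wise
# product of the incident light and the spectral reflectance factor.
reflected_spd = illuminant_spd * surface_reflectance

for i in range(0, wavelengths.size, 10):
    print(f"{wavelengths[i]} nm: {illuminant_spd[i]:.2f} x "
          f"{surface_reflectance[i]:.2f} = {reflected_spd[i]:.2f}")
```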
The colorful contents of natural scenes are the results of the complex interaction of light
and objects. The quantitative description of such interactions is called scene physics, and
is the subject of Chapter 8. It is important to note that such quantitative description is a
very difficult problem to formulate. The concept of the bidirectional reflectance distribution
function (BRDF) is one formulation that has been widely accepted because of its practical
applicability and usefulness, although it certainly is not valid for every conceivable light–
matter interaction. Various models for reflective and transmissive materials are discussed
following this basic concept. In addition to color imaging applications, these models often
find use in color image synthesis, colorant formulation, the printing industry, and computer
vision. These fields are closely related to color imaging and color imaging research benefits
from ideas and results from them. The chapter also includes a general overview of the
physical and optical properties of some of the common materials that we encounter in color
imaging applications. The chapter ends with a summary of some statistical properties of
natural scenes. These properties are empirical, but they are useful for at least two purposes:
(1) Many practical color imaging problems, such as white balance and exposure determi-
nation, are open research problems that seem to have no provable, deterministic solutions.
Statistical properties of natural scenes can be used as a priori knowledge in any Bayesian
estimate. (2) The statistical properties reveal certain regularities in the natural scenes and
thus form a very rich source of research topics that will increase our understanding of how
the physical world behaves.
assumed to be an ideal image point, independently of the aperture size. This is simply not
true. From electromagnetic wave theory, we can derive the so-called “diffraction-limited”
point spread function, which turns out to have a fairly complicated spatial distribution. The
description of the optical image formation through wave theory is called wave optics or
physical optics. Chapters 9 and 10 cover the basic concepts in both geometrical optics and
physical optics. The geometrical theory of optical imaging is quite general and, as far as
color imaging science is concerned, the most interesting result is that the mapping between
the object space and the image space is a projective transformation. This leads naturally to
the matrix method for paraxial ray tracing that allows us to do quick and simple calculations
of the basic characteristics of most optical imaging systems. The most fundamental tool for
analyzing the image quality of an imaging system is the optical transfer function (OTF). The
relationship between the OTF and the wavefront aberration can be derived from diffraction
theory, which is the foundation of physical optics for image formation.
In the sensing and recording of optical images, it is very important to calculate how much
light (image irradiance) is collected on the sensor plane, as a function of focal length, object
distance, and aperture size. In Chapter 10, the image irradiance equations, like the theory
of radiometry, are derived from geometrical optics. These equations are very important
for all practical optical imaging systems and should be understood well. A more detailed
description of the light distribution in the image space has to be derived from physical optics.
The results from geometrical optics and physical optics are compared using a case study of
the blur caused by defocus. The conclusion is that when the defocus is severe, the predictions
of both theories are quite similar. However, when the defocus is slight, the predictions are
very different. Physical optics even predicts, against our intuition, that the center of the point
spread function can become zero at a certain defocus distance. This rather counterintuitive
prediction has been confirmed by experiments.
of brightness and lightness in some detail because they show us how complicated the
computation can be even for some things that sound intuitively obvious. We also discuss
at length the perception of images when they are stabilized on our retinas. The finding that
the perceived images quickly fade when they are stabilized on the observer’s retina clearly
demonstrates that visual perception is more a task of reconstruction from visual features
than a job of mapping the optical images directly to our mind.
After we have studied the human visual system in Chapters 11–13, we are well prepared
to explore the basic ideas and theories behind the various color order systems in Chapter
14. We have delayed the discussion of this subject until now so that we can appreciate
the motivation, the limitations, and the difficulties involved in any color order system.
(For example, the concept of opponent color processes was developed to explain many
psychophysical observations, and therefore it also plays an important role in the Ostwald
and the NCS color order systems.) The idea of using a color atlas for everyday color
specification seems an intuitive thing to do, but from the perspective of colorimetry, a color
atlas may be a useless thing to have because everyday illuminants are almost never the same as those
specified by the atlas. It is the powerful color processing of our visual system that does all
the “auto” compensations that make a color atlas of any practical use.
of the characteristics and working principles of the various input/output systems is very
important in the practice of color imaging science. Even if we do not directly work on a
particular device or medium, it is very likely we will encounter images that are acquired by
that device or are to be displayed on that medium. Often, the solution to a color imaging
problem for a given device may have been worked out for other devices. Understanding the
problems and technology behind one type of system often helps us to solve problems in
another type of system. A good example is the unsharp masking method for image enhance-
ment, which has long been practised in photographic darkrooms. The same technique is
now used extensively in digital imaging as well.
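As a rough sketch of the digital form of this idea (not the photographic darkroom procedure itself; the Gaussian blur radius and gain below are arbitrary illustrative choices), unsharp masking adds back a scaled copy of the detail that a blur removes:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, sigma=2.0, amount=1.0):
    """Sharpen by adding back the detail removed by a Gaussian blur."""
    blurred = gaussian_filter(image, sigma=sigma)
    detail = image - blurred            # the high-pass "unsharp mask"
    return image + amount * detail

# Hypothetical example: a soft edge becomes steeper after unsharp masking.
x = np.linspace(-1.0, 1.0, 64)
soft_edge = np.tile(1.0 / (1.0 + np.exp(-6.0 * x)), (64, 1))
sharpened = unsharp_mask(soft_edge)
print("maximum edge slope before:", np.diff(soft_edge[32]).max().round(4))
print("maximum edge slope after: ", np.diff(sharpened[32]).max().round(4))
```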
1.3 The International System of Units (SI)
In this book, we use the terminology and units in the International System of Units (SI) and
those recommended by the Commission Internationale de l’Eclairage (CIE). When there are
conflicts in symbols, we will use the CIE symbols for the units in radiometry, colorimetry,
and photometry. The International System of Units is described in many standard documents
(such as [942]) and the book by Ražnjević [787] provides good explanations. The CIE
system is well described in its publication: International Lighting Vocabulary [187]. The
International System of Units (SI) adopted by CGPM1 is composed of basic units, derived
units, and supplementary units. There are seven basic units: meter [m] for length, kilogram
[kg] for mass, second [s] for time, ampere [A] for electric current, kelvin [K] for temperature,
candela [cd] for luminous intensity, and mole [mol] for amount of substance. The meter
is defined as the length of the path traveled by light in vacuum during a time interval of
1/299 792 458 second. The unit of plane angle, radian [rad], and the unit of solid angle,
steradian [sr], are two of the supplementary units. Since they are dimensionless derived
units, they do not need to be defined as a separate class of unit. Many SI derived units, such
as watt [W], volt [V], hertz [Hz], and joule [J], are quite familiar to us. Other SI derived
units, such as lux [lx] and lumen [lm], that we are going to use frequently in the book will be
defined in detail later. When the numerical values are too large or too small, the SI prefixes
in Table 1.1 can be used to form multiples and submultiples of SI units. It is a convention
that a grouping formed by a prefix symbol and a unit symbol is a new inseparable symbol.
Therefore, cm (centimeter) is a new symbol and can be raised to any power without using
parentheses. For example, 2 cm² = 2 (cm)². Convention also requires that unit symbols are
unaltered in the plural and are not followed by a period unless at the end of a sentence.
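A quick numerical check of this convention (a throwaway sketch, not part of the book):

```python
# The prefix and the unit symbol form one inseparable symbol, so the prefix
# is raised to the same power as the unit: 2 cm^2 means 2 (cm)^2.
centi = 1e-2
area_in_cm2 = 2.0                          # 2 cm^2
area_in_m2 = area_in_cm2 * (centi ** 2)    # (cm)^2 = (1e-2 m)^2 = 1e-4 m^2
print(f"2 cm^2 = {area_in_m2} m^2")        # prints 0.0002 m^2
```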
Unfortunately, there are many instances when one standard symbol could represent
more than one physical quantity. For example, E is used both for the electric field strength
[V m−1 ] and for irradiance [W m−2 ]. Similarly, H is used for the magnetic field strength
[A m−1 ] and also for exposure [J m−2 ]. Since this happens very frequently and since chang-
ing standard symbols for various physical quantities can create more confusion, we decided
that the best way to avoid ambiguity is to specify the units when it is not clear from the
context which physical quantity is used. This will free us to use the same, widely accepted,
standard symbol for different physical quantities in our discussion throughout the book. In
1 CGPM stands for Conférence Générale des Poids et Mesures. Its English translation is: General Conference
on Weights and Measures. It is the decision-making body of the Treaty of the Meter, signed in 1875. The decisions
by CGPM legally govern the international metrology system among all the countries that signed the Treaty.
almost all cases, the context and the name of the physical quantity will make the meaning
clear. The physical constants shown in Table 1.2 will be useful in our later discussion.
1.4 General bibliography and guide to the literatures
Color imaging science cuts across many different disciplines. For further details on any
specific topic, the reader is encouraged to consult books and papers in that field. There are
many excellent books in each field. Since every person has a different style of learning and a
different background of training, it is difficult to recommend books that will be both useful
and interesting to everyone. A short bibliography is compiled here. No special criteria
have been used for selection and the list represents only a tiny fraction of the excellent
books available on the various topics. Hopefully, it may be useful for you. If you know
some experts in the field you are interested in, you should ask them for more personalized
recommendations.
Color science
Billmeyer and Saltzman’s Principles of Color Technology, 3rd edition, by R.S. Berns [104].
Principles of Color Technology, 2nd edition, by F.W. Billmeyer and M. Saltzman [111].
Measuring Colour, by R.W.G. Hunt [430].
Color: An Introduction to Practice and Principles, by R.G. Kuehni [539].
Color Measurement, by D.L. MacAdam [620].
Colour Physics for Industry, 2nd edition, edited by R. McDonald [653].
Handbook of Color Science, 2nd edition, edited by Nihon Shikisaigakkai (in Japanese)
[716].
The Science of Color, 2nd edition, edited by S.K. Shevell [863].
Industrial Color Testing: Fundamentals and Techniques, by H.G. Völz [989].
Color Science, 2nd edition, by G. Wyszecki and W.S. Stiles [1053].
Optics
Handbook of Optics, Volumes I and II, edited by M. Bass [84].
Principles of Optics, 7th edition, by M. Born and E. Wolf [125].
Introduction to Matrix Methods in Optics, by A. Gerrard and J.M. Burch [341].
Statistical Optics, by J.W. Goodman [353].
Introduction to Fourier Optics, by J.W. Goodman [354].
Optics, 2nd edition, by E. Hecht [385].
Lens Design Fundamentals, by R. Kingslake [508].
Optics in Photography, by R. Kingslake [509].
Optics, 2nd edition, by M.V. Klein and T.E. Furtak [512].
Physiological Optics, by Y. Le Grand and S.G. El Hage [580].
Aberration Theory Made Simple, by V.N. Mahajan [626].
Optical Coherence and Quantum Optics, by L. Mandel and E. Wolf [631].
Geometrical Optics and Optical Design, by P. Mouroulis and J. Macdonald [682].
Scene physics
Absorption and Scattering of Light by Small Particles, by C.F. Bohren and D.R. Huffman
[120].
The Cambridge Guide to the Material World, by R. Cotterill [217].
Light, by R.W. Ditchburn [258].
Sensory Ecology, by D.B. Dusenbery [269].
Seeing the Light, by D.S. Falk, D.R. Brill, and D.G. Stork [297].
Color in Nature, by P.A. Farrant [301].
Color and Light in Nature, by D.K. Lynch and W. Livingston [615].
The Colour Science of Dyes and Pigments, by K. McLaren [654].
Light and Color in the Outdoors, by M. Minnaert [667].
The Physics and Chemistry of Color, by K. Nassau [693].
Light and Color, by R.D. Overheim and D.L. Wagner [736].
Introduction to Materials Science for Engineers, 4th edition, by J.F. Shackelford
[853].
Colour and the Optical Properties of Materials, by R.J.D. Tilley [952].
Light and Color in Nature and Art, by S.J. Williamson and H.Z. Cummins [1036].
Color Chemistry, 2nd edition, by H. Zollinger [1071].
Image science
Foundations of Image Science, by H.H. Barrett and K.J. Meyers [64].
Image Science, by J.C. Dainty and R. Shaw [232].
Principles of Color Photography, by R.M. Evans, W.T. Hanson, and W.L. Brewer
[289].
The Theory of the Photographic Process, 4th edition, edited by T.H. James [459].
Handbook of Image Quality, by B.W. Keelan [494].
Science and Technology of Photography, edited by K. Keller [495].
Image Technology Design: A Perceptual Approach, by J.-B. Martens [642].
Handbook of Photographic Science and Engineering, 2nd edition, edited by C.N. Proudfoot
[779].
Fundamentals of Electronic Imaging Systems, 2nd edition, by W.F. Schreiber [841].
Imaging Processes and Materials, edited by J. Sturge, V. Walworth, and A. Shepp
[923].
Photographic Sensitivity: Theory and Mechanisms, by T. Tani [936].
Color reproduction
Color Appearance Models, by M.D. Fairchild [292].
Color and Its Reproduction, by G.G. Field [309].
Digital Color Management, by E.J. Giorgianni and T.E. Madden [347].
Colour Engineering, edited by P.J. Green and L.W. MacDonald [362].
The Reproduction of Colour in Photography, Printing, and Television, 5th edition, by R.W.G.
Hunt [433].
Color Technology for Electronic Imaging Devices, by H.R. Kang [483].
Colour Imaging: Vision and Technology, edited by L.W. MacDonald and M.R. Luo [622].
Colour Image Science: Exploiting Digital Media, edited by L.W. MacDonald and M.R. Luo
[623].
Introduction to Color Reproduction Technology, (in Japanese) by N. Ohta [726].
Colour Science in Television and Display Systems, by W.N. Sproson [895].
Principles of Color Reproduction, by J.A.C. Yule [1065].
1.5 Problems
1.1 Let X = g(Y ) be the input/output characteristic response function of an image capture
device (say, a scanner), where Y is the input signal (reflectance) and X is the output
response (output digital image from the scanner). Let y = f (x) be the input/output
characteristic function of an image display device (say, a CRT monitor), where x is
the input digital image and y is the luminance of the displayed image. Assume that
both g and f are one-to-one functions. If our objective is to make the displayed im-
age y proportional to the scanned target reflectance Y , what should be the functional
transformation on X before it is used as the input, x, to the display?
1.2 A monitor has two gray squares, A and B, displayed on its screen. When the room
light is on, the amounts of light from the two squares are L_A and L_B, where L_A ≥ L_B.
When the room light is off, the amounts of light become D_A and D_B, where D_A ≥ D_B.
Which of the contrast ratios is higher, L_A/L_B or D_A/D_B?
1.3 The Poynting vector, E × H, is a very useful quantity in the study of electromagnetic
waves, where E is the electric field strength [V m−1 ] and H is the magnetic field strength
[A m−1 ]. By analyzing its unit, can you guess the physical meaning of the Poynting
vector?
2 Light
Within our domain of interest, images are formed by light and its interaction with matter.
The spatial and spectral distribution of light is focused on the sensor and recorded as an
image. It is therefore important for us to first understand the nature and the properties of
light. After a brief description of the nature of light, we will discuss some of its basic
properties: energy, frequency, coherence, and polarization. The energy flow of light and the
characterization of the frequency/wavelength distribution are the subjects of radiometry,
colorimetry, and photometry, which will be covered in later chapters. The coherence and
the polarization properties of light are also essential for understanding many aspects of the
image formation process, but they are not as important for most color imaging applications
because most natural light sources are incoherent and unpolarized, and most imaging sensors
(including our eyes) are not sensitive to polarization. Therefore, we will discuss these two
properties only briefly. They are presented in this chapter. Fortunately there are excellent
books [208, 631, 871] covering these two topics (also, see the bibliography in Handbook
of Optics [84]). From time to time later in the book, we will need to use the concepts we
develop here to help us understand some of the more subtle issues in light–matter interaction
(such as scattering and interference), and in the image formation process (such as the
OTFs).
2.1 What is light?
The nature of light has been one of the most intensively studied subjects in physics. Its
study has led to several major discoveries in human history. We have now reached a
stage where we have an extremely precise theory of light, quantum electrodynamics (QED)
[307, 602, 760], which can explain all the physical phenomena of light that we know about and
its interaction with matter, from diffraction, interference, blackbody radiation, the laser, and
the photoelectric effect, to Compton scattering of x-rays [211]. However, the nature of light
as described by QED is quite abstract. It is so different from our everyday experience that
no simple mental model or intuition, such as waves or particles, can be developed in our
understanding to comprehend its nature. A fair statement to make about the nature of light
is that we do not really “understand” it, but we have a very precise theory for calculating
and predicting its behavior. Since the nature of light is literally beyond our comprehension,
the most fundamental description of light has to rely on experimental facts – phenomena
that are observable. For example:
1. Due to its wave nature, light has different temporal frequencies. By saying this, we
are implying that light is described as periodic functions, at least over a very short
period of time. The spectrum of a beam of sunlight as produced by a prism has many
different colors, each associated with light of different frequency ν. The word “light”
usually refers to the frequency range that is visible (approximately, from 4.0 × 10¹⁴ Hz
to 7.8 × 10¹⁴ Hz).
2. Light carries energy (we feel heat from sunlight) and when it is absorbed, it is always
in discrete amounts. The unit energy of the discrete amounts is hν, where h is Planck’s
constant and ν is the frequency of the light.
3. Light (photon) has linear momentum, hν/c, and therefore exerts force on a surface it
illuminates.
4. Light of the same frequency can have different characteristics (called polarizations)
that can be separated out by certain materials called polarizers. In quantum mechanics,
a photon can have one of two different spins (angular momentum): ±h/(2π ).
Because of its complexity and its nonintuitive nature, QED theory is rarely used to explain
“simpler” light behavior, such as interference, or to design optical imaging systems, such
as a camera or a scanner. Fortunately, for these applications, we have alternative theories or
models. The two most valuable ones are the ray model (geometrical optics) and the wave
model (physical optics). Both models are incapable of explaining or predicting certain
phenomena, but within their domains of validity, they are much simpler and more intuitive,
and therefore, very useful.
The wave model is based on the Maxwell equations for classical electromagnetic theory.
The velocity of the electromagnetic wave was shown to be the same as that of light. By
now, it is well accepted (with the knowledge that the description is not complete) that
light is an electromagnetic wave, as are the microwaves used for cooking, the radio waves
used for communications, and the x-rays used for medical imaging. The light ray in the
simpler geometric optics is often thought of as the surface normal to the wavefront of
the electromagnetic wave, although this simple interpretation does not always work well,
especially when the wave is not a simple plane wave or spherical wave. The connection
between the electromagnetic wave and the photon in QED theory is not as straightforward
to make. Quantum theory uses two objects to describe a physical system: the operator for the
physical variables, such as the electric field intensity, and the Schrödinger wave function, ψ,
for the state of the system. The Schrödinger wave function, ψ, is usually a complex function
and its product with its complex conjugate, ψψ ∗ , gives the probability of finding photons
at a point in space and time. It should be pointed out that the Schrödinger wave function, ψ,
as solved in QED is not the electromagnetic wave as described by the Maxwell equations.
The connection between the two waves is a statistical one: for classical phenomena, such as
interference, the time-averaged Poynting vector, E × H [W m−2 ] as calculated from the
Maxwell equations, predicts the average number of photons per unit time per unit area at
that point in space, as calculated from QED.
For the majority of the optical applications that are of interest to us in this book, we will
treat light as electromagnetic waves described by the Maxwell equations. The wavelength
(in vacuum) range of the light that is visible to our eyes is approximately from 380 nm
(7.89 × 10¹⁴ Hz) to 740 nm (4.05 × 10¹⁴ Hz). The sources of light relevant to color imaging
are mostly thermal sources, such as the sun, tungsten lamps, and fluorescent lamps. For these
sources, light is incoherent and unpolarized – these two concepts can be treated within the
electromagnetic wave model.
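The frequencies quoted above, and the photon energy hν and momentum hν/c listed earlier in this chapter, follow from the vacuum relation ν = c/λ. A small sketch with rounded values of the constants:

```python
# Rounded standard values of the physical constants.
c = 2.998e8        # speed of light in vacuum [m s^-1]
h = 6.626e-34      # Planck's constant [J s]

for wavelength_nm in (380.0, 550.0, 740.0):
    wavelength = wavelength_nm * 1e-9        # [m]
    nu = c / wavelength                      # frequency [Hz]
    energy = h * nu                          # photon energy h*nu [J]
    momentum = h * nu / c                    # photon momentum h*nu/c [kg m s^-1]
    print(f"{wavelength_nm:5.0f} nm: nu = {nu:.2e} Hz, "
          f"h*nu = {energy:.2e} J, h*nu/c = {momentum:.2e} kg m/s")
```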
2.2 Wave trains of finite length
When we treat light as electromagnetic waves, we need to realize that the waves are of finite
length. When we turn on a lamp at time t1, light is emitted from the lamp, and when we
turn off the lamp at time t2 , the emission of light stops (approximately, because the tungsten
filament does not cool down instantly). In this case, the duration of each of the trains of
electromagnetic waves cannot be much longer than t2 − t1 . In fact, they are all many orders of
magnitude shorter than t2 − t1 . When an electron of an atom or a molecule makes a transition
from a higher energy state to a lower one, a photon is emitted. The time it takes for the electron
to make the transition is very short and so is the length of the wave train of the light emitted.
Although we have not measured the transition time directly, there are measurements that
give us good estimates of the approximate length of the wave train for several light sources
(e.g., [258, Chapter 4]). If the transition is spontaneous, the phase is often random, and the
length of the wave train is short (on the order of 10⁻⁸ s [258, p. 93; 306, Volume I, p. 33–2;
631, p. 150]). If the transition is induced by an external field, such as in a laser, then the wave
train can be much longer (as long as 10⁻⁴ s). However, for light with a wavelength of 500 nm,
even a 10⁻⁸ s wave train contains 6 million wave cycles! There are two implications from
the result of this simple calculation. (1) For most theoretical derivations concerning phase
relations on a spatial scale in the range of a few wavelengths, such as light reflection from a
smooth surface, we can approximate the light as a sinusoidal wave (such as a plane wave).
(2) For most measurements of light, the integration time for sensing is much longer than
10⁻⁸ s, and the finite length of a wave train cannot be neglected. From the theory of Fourier
analysis, a sine wave of duration Δt has a frequency bandwidth Δν ≈ 1/Δt. Therefore, there
is no such thing as a monochromatic (single-frequency) light wave. When the frequency
bandwidth of radiation is very narrow, Δν/ν ≪ 1, we call it a quasi-monochromatic wave.
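A quick check of the numbers in the preceding paragraphs (a throwaway sketch): the number of cycles in a wave train is νΔt, and the corresponding bandwidth estimate is Δν ≈ 1/Δt.

```python
c = 2.998e8                  # speed of light in vacuum [m s^-1]
wavelength = 500e-9          # [m]
nu = c / wavelength          # optical frequency, about 6e14 Hz

for dt in (1e-8, 1e-4):      # spontaneous emission vs. laser-like wave train [s]
    cycles = nu * dt         # number of wave cycles contained in the train
    dnu = 1.0 / dt           # bandwidth estimate from Fourier analysis [Hz]
    print(f"dt = {dt:.0e} s: {cycles:.1e} cycles, "
          f"dnu ~ {dnu:.0e} Hz, dnu/nu ~ {dnu/nu:.0e}")
```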
Conventional wave analysis relies heavily on Fourier analysis, which has the disadvantage
of having a very sharp frequency resolution, but very poor spatial or time resolution (i.e.,
the sine and cosine functions can have a single frequency, but then they extend to infinity
spatially or temporally). A new mathematical tool called wavelet analysis allows us to
decompose any signal into wavelets that are more localized in time. It can be shown that
wavelet solutions to the Maxwell equations can be found [478] and they may provide a more
natural description for wave trains of finite length.
2.3 Coherence
The electromagnetic fields at two different points in space-time can fluctuate completely
independently. In this case, we can say that they are completely incoherent. If the fluctuations
of the fields at these two points are not completely independent of each other, then they are
partially or completely coherent with each other. The degree of independence or the degree
of coherence can be measured by statistical correlation [631, Chapters 4 and 6, 742]. Two
special cases of coherence theory are temporal coherence (field fluctuation measured at the
same spatial location) and spatial coherence (field fluctuation measured at the same time
instant). Let us first consider the case of the temporal coherence in the famous Michelson
interferometer.
Figure 2.1. The Michelson interferometer (light source S, beam splitter M, mirrors A and B, wave trains Wa and Wb, and the plane of observation).
The source S is usually a light source that generates line spectra, such as a sodium lamp or a mercury arc lamp. For
these sources, we can imagine that many wave trains are emitted randomly, but each wave
train, W , is split into a pair of trains, Wa and Wb , which are later brought back together
at the plane of observation, which is set up somewhere along the path that the combined
light beam travels. Instead of making the reflecting mirrors, A and B, perfectly parallel with
respect to the images as seen by the beam splitter M, we introduce a minutely small tilt
angle on mirror B. As a result of this tilt, the wave trains arriving at different points on the
plane of observation are out of phase by different amounts and thus produce interference
fringes. At the points where the pair of wave trains Wa and Wb differ in relative phase by
integer multiples of the wavelength, the field amplitudes add exactly constructively and the
radiant flux density [W · m⁻²] reaches the maximum, E_max. At the points where the relative
phase differs by an odd multiple of half the wavelength, the field amplitudes cancel each
other, and the light flux density falls to the minimum, E_min. Michelson [661, p. 36] defined
the fringe visibility (also known as Michelson contrast), V, as:
$$V = \frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}}, \qquad (2.1)$$
and he showed that it varies as a function of the time delay Δt introduced between the
two paths for Wa and Wb, or equivalently as a function of the optical path difference,
Δd = vΔt, where v is the velocity of the light in the medium. By analyzing the visibility
V as a function of Δd, he was able to estimate the spectral distribution of the light source.
For example, the cadmium red line at 643.8 nm was shown to have a half-width (at the
half-height) of 0.000 65 nm [258, p. 80], which can be used to deduce that the duration of
the wave train emitted by the cadmium is on the order of 10⁻⁸ s.
Our immediate interest is that Michelson interference as described above occurs only
when the relative time delay between the two wave trains is less than the duration of the
original wave train. This time duration, ΔT, is called the coherence time of the light. Its
corresponding optical path difference, Δl = vΔT, is called the longitudinal coherence
length [631, pp. 148–9]. For the cadmium red line at 643.8 nm, the coherence time is about
10⁻⁸ s and the corresponding longitudinal coherence length is about 3 m in air.
There is another interesting aspect of Michelson interference. If we consider a wave train
as a sine wave of frequency ν windowed (multiplied) by a rectangle function of width ΔT,
from Fourier analysis, the resulting frequency spectrum of the wave train is a sinc function,
centered at ν, whose main lobe has a half-width of 1/ΔT. If the sine wave is windowed by
a Gaussian function with a standard deviation of ΔT, the resulting frequency spectrum is
also a Gaussian function, centered at ν, with a standard deviation of 1/(2πΔT). Experi-
mentally, one finds that the Michelson interference fringes appear only when ΔνΔT ≤ 1
approximately, where Δν is the bandwidth of the light source. Therefore, the coherence time
ΔT is approximately inversely proportional to the bandwidth of the light beam Δν.
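Putting these relations into numbers (a minimal sketch; the flux-density readings are made-up values, and 10⁻⁸ s is the order-of-magnitude coherence time quoted above for the cadmium red line):

```python
def fringe_visibility(e_max, e_min):
    """Michelson's fringe visibility, Eq. (2.1)."""
    return (e_max - e_min) / (e_max + e_min)

# Made-up flux-density readings [W m^-2], only to exercise Eq. (2.1).
print(f"V = {fringe_visibility(e_max=1.8, e_min=0.2):.2f}")      # 0.80

# From the coherence time to the coherence length and the bandwidth.
v = 2.998e8     # speed of light (in air, approximately) [m s^-1]
dT = 1e-8       # coherence time [s]
print("longitudinal coherence length ~", v * dT, "m")            # about 3 m
print("bandwidth estimate dnu ~ 1/dT =", 1.0 / dT, "Hz")         # 1e8 Hz
```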
Figure 2.2. Young's two-slit interference experiment.
According to Ditchburn [258, p. 119], Grimaldi was among the first to attempt to observe interference.
He used a (thermal) light source S (without screen A) in front of a screen (B) with two slits
and observed fringes on a screen (C) some distance behind the slits. However, it was Young
who discovered that the light source size had to be made very small for the interference
fringes to be observed. He used an additional screen (A) with a small hole (Δs) to let
the light through and projected the light onto the two slits. This small hole thus serves to
reduce the size of the light source. In our later analysis, we will see that this was the critical
modification that made him successful. Young reported his results in 1802 in front of the
Royal Society, but was met with great ridicule because Newton’s particle model of light
was the dominant theory at that time. However, regardless of how the phenomenon should
be explained, the experiment was a very important one in presenting the very basic nature
of light (see the interesting discussion in [306, Volume III]).
The two light beams that pass through the two slits to produce the interference are
separated spatially although they come from the same small thermal source. The fact that
the spatially separated light beams can produce interference means that the field fluctuations
in the two spatially-separated slits are correlated. This is easy to imagine if one thinks of
a spherical wave propagating from the small source towards the two slits on the screen.
However, this is only a mental model and in reality we know that this wave model is not true
because it does not explain many phenomena, such as the photoelectric effect. Therefore,
the experimental facts alone force us to describe the light going through the two slits as
having spatial coherence.
Experiments have shown that whether interference fringes are observed or not depends
critically on some of the experimental parameters. Let the source (the tiny hole on the screen
A) be at the origin of the coordinate system, and the positive z-axis go through the middle
of the two slits on the screen B, intersecting with the observation screen C at the point D.
(The widths of the two slits affect the modulation of the interference pattern because of
diffraction, but as long as they are very narrow compared with the distance between them,
we can ignore this in what we would like to discuss below. We can make sure that the fringe
pattern that we are seeing is due to interference, not diffraction, by covering one of the slits,
in which case the pattern should disappear.)
two slits on the screen B. As mentioned above, the size of the source along the x-direction,
Δs, should be small, because different points on the source along that direction generate
interference fringes that are offset from each other and therefore smear out the interference
pattern. Also, R, the distance from screen A to screen B, and R′, the distance from screen
B to screen C, both should be much larger than d, the distance between the two slits on the
screen B, because the angular subtenses from the source to the slits and from the slits to the
observation screen determine the optical path difference. If the optical path difference is too
long (say, longer than the typical length of the wave train of the light), the interference does
not occur. Experiments (as well as theoretical calculation based on optical path difference
[258, pp. 120–1]) show that the interference fringes are observable when
$$\Delta s\,\frac{d}{R} \approx \Delta s\,\theta \leq \lambda, \qquad (2.2)$$
where θ ≈ d/R is the angle formed by d, the distance between the two slits, relative to the
source, and λ is the wavelength of the light from the source. The width of the interference
band, b (the distance from maximum to maximum), on the observation plane C can also
be calculated from the optical path difference between the two slit paths: b = R′λ/d [208,
Section 2.3]. In a typical experiment, the two slits are separated by about 1 mm, the screen
distances, R and R′, are about 1 m, and the wavelength is about 500 nm. Therefore the
width of the interference band, b, is about 0.5 mm, which is observable by the naked
eye.
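The fringe width and the source-size condition can be checked with the numbers quoted above (a minimal sketch based on b = R′λ/d and Eq. (2.2)):

```python
wavelength = 500e-9   # [m]
d = 1e-3              # separation of the two slits [m]
R = 1.0               # source-to-slits distance [m]
R_prime = 1.0         # slits-to-observation-screen distance [m]

# Fringe spacing on the observation screen, b = R' * lambda / d.
b = R_prime * wavelength / d
print(f"fringe width b = {b * 1e3:.2f} mm")                      # 0.50 mm

# Eq. (2.2)/(2.3): fringes are observable only if the source size
# satisfies ds <= R * lambda / d.
ds_max = R * wavelength / d
print(f"largest usable source size ds = {ds_max * 1e3:.2f} mm")
```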
The above experimental results allow us to define a few terms regarding spatial coherence.
The two beams passing through the two slits are separated by a distance d and they are located
at a distance R away from the source of dimension Δs. In order for the interference fringes
to be observable, the spatial separation d has to satisfy the following relation:
$$d \leq \frac{R\lambda}{\Delta s}. \qquad (2.3)$$
Therefore, we can define Rλ/Δs as the transverse coherence length, and its square,
R²λ²/(Δs)², as the coherence area, ΔA. If we take the product of the longitudinal coher-
ence length Δl and the coherence area, ΔA, we get the coherence volume, ΔV = ΔlΔA.
From the uncertainty principle of quantum mechanics, we can show that photons in the
coherence volume are not distinguishable from each other [631, pp. 155–9]. Although we
have derived the concepts of coherence length, coherence area, and coherence volume from
the electromagnetic wave models, they are consistent with quantum theory as well.
It is instructive to calculate the coherence area of some common light sources that we
see in our imaging applications. The sun has an angular subtense (Δs/R) of about 0.5°.
The middle of the visible spectrum is at about 500 nm. Therefore, the coherence area of
sunlight at 500 nm is about 3.3 × 10⁻³ mm² and the transverse coherence length is about
0.057 mm. This is so small that we can treat sunlight reflected from any two points of
an object surface as incoherent for all practical purposes. On the other hand, light from a
distant star has a relatively large coherence area on the earth's surface and starlight needs to
be treated with its coherence property in mind. For example, the red giant star Betelgeuse
in the constellation of Orion has an angular subtense of 0.047 arcsec [660]. Assuming that
its effective wavelength is 575 nm, then its transverse coherence length is about 2.52 m!
Images of the stars do look like images of coherent sources.
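Both examples can be reproduced from the definition of the transverse coherence length, Rλ/Δs = λ/θ (a short sketch; the angular sizes and wavelengths are the ones quoted in the text):

```python
import math

def transverse_coherence_length(wavelength, theta):
    """R * lambda / ds = lambda / theta, where theta = ds / R."""
    return wavelength / theta

# The sun: angular subtense of about 0.5 degree, lambda = 500 nm.
theta_sun = math.radians(0.5)
l_sun = transverse_coherence_length(500e-9, theta_sun)
print(f"sunlight: {l_sun * 1e3:.3f} mm, "
      f"coherence area ~ {(l_sun * 1e3) ** 2:.1e} mm^2")

# Betelgeuse: angular subtense of 0.047 arcsec, effective lambda = 575 nm.
theta_star = math.radians(0.047 / 3600.0)
l_star = transverse_coherence_length(575e-9, theta_star)
print(f"Betelgeuse: {l_star:.2f} m")
```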
2.4 Polarization
The constraints imposed by Maxwell's equations require that, far from their source, the electric
and magnetic fields are orthogonal to each other and to the direction of propagation.
Since the magnetic field can be determined from the electric field, we will discuss only the
behavior of the electric field. The electric field, ξ, of the electromagnetic wave is a vector
that has a magnitude as well as a direction, which can vary in the plane perpendicular to the
vector of wave propagation. Therefore, there are two degrees of freedom in the direction
of the electric field and these can be represented by two basis vectors. The variation of the
electric vector direction as a function of time is called the polarization.
2.4.1 Representations of polarization
Consider a wave propagating along the z-axis. At a given point on the z-axis, the x- and
y-components of the electric field can be written as
$$\xi_x(t) = A_x \cos(2\pi\nu t + \delta_x),$$
$$\xi_y(t) = A_y \cos(2\pi\nu t + \delta_y), \qquad (2.4)$$
where ν is the frequency [s⁻¹], A_x and A_y are the amplitudes [V m⁻¹], and δ_x and δ_y are the
phases [rad]. For the following discussion, the important parameter is the phase difference
δ = δ_y − δ_x.
From electromagnetic theory [512, p. 70], the radiant flux density [W m⁻²] of the wave
is given by the magnitude of the Poynting vector, P:
$$P(t) = \frac{n}{c\mu}\,\xi^2(t), \qquad (2.5)$$
where ξ²(t) = ξ_x²(t) + ξ_y²(t), n is the index of refraction, μ is the magnetic permeability,
and c is the velocity of light in vacuum. For visible light, the frequency is on the order of
10¹⁴ Hz, too fast to be measured by almost all instruments that measure energy flux. What
is measured is the time-averaged radiant flux density, ⟨P(t)⟩ [W m⁻²]. Since the averaged
value of cosine squared is 1/2,
$$\langle P(t)\rangle = \frac{n}{2c\mu}\,(A_x^2 + A_y^2) = \eta\,(A_x^2 + A_y^2), \qquad (2.6)$$
where η = n/(2cμ).
The electric field vector varies continuously as a function of the phase of the wave within
the duration of the wave train. When δ = 0, the direction of the vector remains constant, and the
light is said to be linearly polarized (or plane polarized). When δ = ±π/2 and A_x = A_y,
the direction of the vector varies and traces out a circle, and the light is said to be circularly
polarized. In the most general case, the direction of the vector traces out an ellipse and
the light is said to be elliptically polarized. The circularly (or elliptically) polarized light is
further divided into the right-hand circular (RHC) (or elliptic) polarization and the left-hand
circular (LHC) (or elliptic) polarization. The handedness convention is to observe the light
coming to us. If the electric field vector rotates in the clockwise direction, i.e., δ > 0, the
light is said to be right-hand circularly (or elliptically) polarized. If the electric field vector
rotates in the counterclockwise direction, i.e., δ < 0, then the light is said to be left-hand
circularly (or elliptically) polarized.
Another important representation of polarization is to use the RHC polarization and
the LHC polarization as the two basis vectors. It can be shown that the electric field vector
represented by Eqs. (2.4) can be expressed as the sum of an RHC wave with amplitude A_R
and phase δ_R and an LHC wave with amplitude A_L and phase δ_L. At the same point on the
z-axis as in Eqs. (2.4), the RHC wave is represented as
The parameters in the (x, y) and the (RHC, LHC) representations are related by the following
equations:
$$A_R^2 = \tfrac{1}{4}(A_x^2 + A_y^2 + 2A_x A_y \sin\delta), \qquad (2.11)$$
$$A_L^2 = \tfrac{1}{4}(A_x^2 + A_y^2 - 2A_x A_y \sin\delta), \qquad (2.12)$$
$$\tan\delta_R = \frac{A_y\cos\delta_y - A_x\sin\delta_x}{A_x\cos\delta_x + A_y\sin\delta_y}, \qquad (2.13)$$
$$\tan\delta_L = \frac{A_x\sin\delta_x + A_y\cos\delta_y}{A_x\cos\delta_x - A_y\sin\delta_y}. \qquad (2.14)$$
It should be pointed out that at a given point on the z-axis, the magnitude of the electric field
of the circularly polarized wave remains the same for the duration of the wave train, but
its direction is changing around a circle. The averaged radiant flux density of a circularly
polarized wave thus does not have the 1/2 factor from the averaged value of cosine squared,
and the magnitude of the Poynting vector is 2ηA_R² for the RHC wave, and 2ηA_L² for the
LHC wave. The total radiant flux density [W m⁻²] for the wave is
$$\langle P(t)\rangle = 2\eta A_R^2 + 2\eta A_L^2 = \eta(A_x^2 + A_y^2). \qquad (2.15)$$
As we discussed in the previous section, light emitted from thermal sources consists of
short wave trains of duration on the order of 10⁻⁸ s. Each wave train has its polarization, but
it varies so rapidly (10⁸ times a second) and randomly that most instruments cannot detect
any effects due to polarization (assuming they average out in all directions). This type of
light is said to be completely unpolarized. If the averaged polarization does not completely
cancel out in all directions and the light is not of any single polarization, the light is said to
be partially polarized. These concepts will be defined more quantitatively later.
The polarization of light is treated in the quantum theory in a very different way con-
ceptually. A photon is a two-state system. The two base states are often taken as the RHC
polarization and the LHC polarization. The reason is that each base state is then associated
with a spin number +1 or −1, with an angular momentum of h/2π or −h/2π, where h is
Planck’s constant. The state of a given photon can be any linear combination of these two
base states. For the linearly polarized light, the coefficients (or amplitudes) of the two states
are equal. For the elliptically polarized light, one coefficient is greater than the other.
2.4.2 Stokes parameters
The Stokes vector consisting of the four Stokes parameters, s_0, s_1, s_2, and s_3, is defined as
$$S = \begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \begin{bmatrix} E_h + E_v \\ E_h - E_v \\ E_{45} - E_{135} \\ E_R - E_L \end{bmatrix}. \qquad (2.16)$$
The Stokes vector depends on the measurement geometry and the wavelength. If the po-
larizers are not ideal, corrections and transformations can be performed to calculate the
desired Stokes vector. Let us see how light of different polarizations is represented by the
Stokes vector. For a monochromatic light beam, using the same notation as before, we have
$$\xi_x(t) = A_x \cos(2\pi\nu t + \delta_x), \qquad (2.17)$$
$$\xi_y(t) = A_y \cos(2\pi\nu t + \delta_y). \qquad (2.18)$$
The horizontal linear polarizer only passes the ξ_x component, and therefore, E_h = ηA_x²,
where η = n/(2μc). Similarly, E_v = ηA_y². For the 45° linear polarizer, the x- and y-
components of the electric field have to be projected onto the 45° axis, and the elec-
tric field amplitude, ξ_45, is ξ_x cos(π/4) + ξ_y cos(π/4), and the radiant flux density E_45 =
n/(μc)⟨ξ_45²⟩, where ⟨· · ·⟩ represents the time-averaged value. Carrying out the computation
for the 45° polarizer and similarly for the 135° polarizer, we have
$$E_{45} = \frac{\eta}{2}(A_x^2 + A_y^2 + 2A_x A_y \cos\delta), \qquad (2.19)$$
$$E_{135} = \frac{\eta}{2}(A_x^2 + A_y^2 - 2A_x A_y \cos\delta). \qquad (2.20)$$
For the RHC and LHC polarizers, we have, from the previous section,
$$E_R = 2\eta A_R^2 = \frac{\eta}{2}(A_x^2 + A_y^2 + 2A_x A_y \sin\delta), \qquad (2.21)$$
$$E_L = 2\eta A_L^2 = \frac{\eta}{2}(A_x^2 + A_y^2 - 2A_x A_y \sin\delta). \qquad (2.22)$$
Therefore, we can express the Stokes vector for a monochromatic light wave as
$$S = \begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix} = \eta \begin{bmatrix} A_x^2 + A_y^2 \\ A_x^2 - A_y^2 \\ 2A_x A_y \cos\delta \\ 2A_x A_y \sin\delta \end{bmatrix}. \qquad (2.23)$$
The first Stokes parameter, s_0 = η(A_x² + A_y²) = ⟨P(t)⟩, is the radiant flux density [W m⁻²]
associated with the light wave. The other three parameters, s_1, s_2, and s_3, can be positive,
zero, or negative. For a completely polarized light wave, the four Stokes parameters are not
linearly independent because s_0² = s_1² + s_2² + s_3², as can be verified from Eq. (2.23). Since
the definition of the Stokes vector is in terms of energy flux density, the Stokes vector of
an incoherent light beam is the sum of all the Stokes vectors representing each individual
wave train:
$$S = \sum_i S_i, \qquad (2.24)$$
where the index i denotes each individual wave train in the beam. If the relation s_0² =
s_1² + s_2² + s_3² holds true for a light beam as in the case of the monochromatic light wave
described in Eq. (2.23), we say that the light beam is completely polarized. If the light is
completely unpolarized, the positive and the negative contributions to s_1, s_2, and s_3 from all
the individual wave trains cancel each other to make all three parameters zero. Therefore,
s_1² + s_2² + s_3² = 0 for completely unpolarized light. It can be proved that s_0² ≥ s_1² + s_2² + s_3²
for all possible mixtures of polarized light [168, p. 32]. When s_0² > s_1² + s_2² + s_3² > 0 for
a light beam, we call it partially polarized. We can now define the degree of polarization,
p, as
$$p = \frac{\sqrt{s_1^2 + s_2^2 + s_3^2}}{s_0}. \qquad (2.25)$$
It can be shown that for completely polarized light, p = 1; for unpolarized light, p = 0;
and for a mixture of polarized light and unpolarized light, 0 < p < 1. Furthermore, the
degree of polarization can be shown to represent a measure of the coherence (statistical
correlation) between two different components (such as the x- and y-components) of the
electromagnetic field at the same point in space [877, pp. 199–200].
Let us see how the different polarizations can be represented using the Stokes vector.
Let E denote the radiant flux density [W m−2 ] of the light beam, i.e., E = s0 . First, as
discussed just now, the unpolarized light is represented as [E, 0, 0, 0]T . Light linearly po-
larized parallel to the x-axis is [E, E, 0, 0]T and light linearly polarized parallel to the y-axis
is [E, −E, 0, 0]T . What about the light that is linearly polarized at an angle θ relative to the
x-axis? This can be represented by projecting the electric field vector to the x-axis and to the
y-axis (similarly to what is done in deriving the law of Malus). The resulting Stokes vector is
[E, E cos 2θ, E sin 2θ, 0]T . For the circularly polarized light, the representation is straight-
forward from the operational definition of the Stokes vector in Eq. (2.16): [E, 0, 0, E]T for
RHC and [E, 0, 0, −E]T for LHC .
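As a numerical illustration of these representations (a small Python sketch of our own, not part of the text; the function names are ours), the fragment below builds the Stokes vector of Eq. (2.23) for a monochromatic wave, compares it with the [E, E cos 2θ, E sin 2θ, 0]ᵀ form for linear polarization, and evaluates the degree of polarization of Eq. (2.25):

```python
import numpy as np

def stokes_monochromatic(A_x, A_y, delta, eta=1.0):
    """Stokes vector of a monochromatic wave, Eq. (2.23)."""
    return eta * np.array([
        A_x**2 + A_y**2,
        A_x**2 - A_y**2,
        2.0 * A_x * A_y * np.cos(delta),
        2.0 * A_x * A_y * np.sin(delta),
    ])

def stokes_linear(E, theta):
    """Light of flux density E, linearly polarized at angle theta to the x-axis."""
    return np.array([E, E * np.cos(2 * theta), E * np.sin(2 * theta), 0.0])

def degree_of_polarization(S):
    """Degree of polarization p, Eq. (2.25)."""
    return np.sqrt(S[1]**2 + S[2]**2 + S[3]**2) / S[0]

# A wave linearly polarized at 30 degrees: A_x = cos(30), A_y = sin(30), delta = 0.
theta = np.deg2rad(30.0)
S = stokes_monochromatic(np.cos(theta), np.sin(theta), 0.0)
print(S)                              # ~[1, 0.5, 0.866, 0]
print(stokes_linear(1.0, theta))      # the same vector, [E, E cos 2q, E sin 2q, 0]
print(degree_of_polarization(S))      # 1.0: a monochromatic wave is completely polarized
```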
It should be noted that although a given light beam is represented by a unique Stokes
vector, the converse is not true. Two light beams with the same Stokes vector are not
necessarily the same optically [742], i.e., there are ways to tell them apart by using optical
elements, such as polarizers. Because the Stokes vector is a function of frequency and a light
wave train is of finite duration and hence consists of many frequencies, the exact spectral
content of a light beam is quite important when we use the Stokes vector. In general, Stokes
vectors should be expressed as Stokes vector functions of frequency. This is necessary when
we consider how an optical element, a surface, or even the air, changes the polarization state
of a light beam, because these light–matter interactions are all functions of spectral content.
Let us consider a polarizer (diattenuator) which is an optical element that attenuates the
orthogonal components of an optical beam differently. Let us assume that the attenuation
factors are ax and a y , for the two orthogonal components, A x and A y , as in Eq. (2.23). The
exiting Stokes vector Se is related to the incident Stokes vector Si by
$$S_e = \eta \begin{bmatrix} a_x^2A_x^2 + a_y^2A_y^2 \\ a_x^2A_x^2 - a_y^2A_y^2 \\ 2a_xa_yA_xA_y\cos\delta \\ 2a_xa_yA_xA_y\sin\delta \end{bmatrix}
     = \frac{1}{2}\begin{bmatrix} a_x^2+a_y^2 & a_x^2-a_y^2 & 0 & 0 \\ a_x^2-a_y^2 & a_x^2+a_y^2 & 0 & 0 \\ 0 & 0 & 2a_xa_y & 0 \\ 0 & 0 & 0 & 2a_xa_y \end{bmatrix}
       \eta \begin{bmatrix} A_x^2+A_y^2 \\ A_x^2-A_y^2 \\ 2A_xA_y\cos\delta \\ 2A_xA_y\sin\delta \end{bmatrix}
     = M S_i, \qquad (2.27)$$
where
$$M = \frac{1}{2}\begin{bmatrix} a_x^2+a_y^2 & a_x^2-a_y^2 & 0 & 0 \\ a_x^2-a_y^2 & a_x^2+a_y^2 & 0 & 0 \\ 0 & 0 & 2a_xa_y & 0 \\ 0 & 0 & 0 & 2a_xa_y \end{bmatrix}. \qquad (2.28)$$
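As a quick check on Eq. (2.28) (a sketch of our own; the function name is not from the text), setting a_x = 1 and a_y = 0 gives an ideal horizontal polarizer, and applying its Mueller matrix to light linearly polarized at an angle θ reproduces the law of Malus:

```python
import numpy as np

def diattenuator_mueller(a_x, a_y):
    """Mueller matrix of a polarizer with attenuation factors a_x, a_y, Eq. (2.28)."""
    return 0.5 * np.array([
        [a_x**2 + a_y**2, a_x**2 - a_y**2, 0.0,           0.0          ],
        [a_x**2 - a_y**2, a_x**2 + a_y**2, 0.0,           0.0          ],
        [0.0,             0.0,             2 * a_x * a_y, 0.0          ],
        [0.0,             0.0,             0.0,           2 * a_x * a_y],
    ])

# Ideal horizontal linear polarizer acting on light linearly polarized at 30 degrees.
E, theta = 1.0, np.deg2rad(30.0)
S_in = np.array([E, E * np.cos(2 * theta), E * np.sin(2 * theta), 0.0])
S_out = diattenuator_mueller(1.0, 0.0) @ S_in
print(S_out[0], E * np.cos(theta)**2)   # both ~0.75: the law of Malus
```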
A retarder is an optical element that introduces a phase difference, δ, between the two
orthogonal components of the light wave. When δ = π/2, the retarder is called a quarter-wave
plate. When δ = π, the retarder is called a half-wave plate. An ideal circular polarizer can be
constructed from an ideal linear polarizer (with the polarization axis at an angle α) followed
by an ideal quarter-wave plate (with the fast axis at an angle β); its Mueller matrix, M_c, is
then the product of the two matrices, M_p and M_r:
$$M_c = M_r M_p \qquad (2.31)$$
$$\phantom{M_c} = \frac{1}{2}\begin{bmatrix}
1 & \cos 2\beta & \sin 2\beta & 0 \\
\cos 2\alpha\cos 2(\alpha-\beta) & \cos 2\alpha\cos 2\beta\cos 2(\alpha-\beta) & \cos 2\alpha\sin 2\beta\cos 2(\alpha-\beta) & 0 \\
\sin 2\alpha\cos 2(\alpha-\beta) & \sin 2\alpha\cos 2\beta\cos 2(\alpha-\beta) & \sin 2\alpha\sin 2\beta\cos 2(\alpha-\beta) & 0 \\
\sin 2(\alpha-\beta) & \cos 2\beta\sin 2(\alpha-\beta) & \sin 2\beta\sin 2(\alpha-\beta) & 0
\end{bmatrix}. \qquad (2.32)$$
When α − β = ±π/4,
$$M_c = \frac{1}{2}\begin{bmatrix}
1 & \cos 2\beta & \sin 2\beta & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\pm 1 & \pm\cos 2\beta & \pm\sin 2\beta & 0
\end{bmatrix}. \qquad (2.33)$$
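A small numerical check (ours, not from the text) that Eq. (2.33) indeed describes a circular polarizer: whatever Stokes vector enters, the exiting light has the form [E′, 0, 0, ±E′]ᵀ, i.e., it is completely circularly polarized:

```python
import numpy as np

def circular_polarizer_mueller(beta, sign=+1):
    """Mueller matrix of an ideal circular polarizer, Eq. (2.33) (alpha - beta = +/- pi/4)."""
    c, s = np.cos(2 * beta), np.sin(2 * beta)
    return 0.5 * np.array([
        [1.0,  c,        s,        0.0],
        [0.0,  0.0,      0.0,      0.0],
        [0.0,  0.0,      0.0,      0.0],
        [sign, sign * c, sign * s, 0.0],
    ])

S_in = np.array([1.0, 0.3, -0.2, 0.1])    # an arbitrary partially polarized beam
S_out = circular_polarizer_mueller(np.deg2rad(20.0), sign=+1) @ S_in
print(S_out)                               # [E', 0, 0, E']: right-handed circular light
```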
2.5 Problems
2.1 A laser beam has a coherence time of 10⁻⁴ s. What is its approximate frequency band-
width Δν?
2.2 A quasi-monochromatic light source has an angular subtense of 30 arcsec. If the wave-
length is 500 nm, what is its coherent area at the point of observation?
2.3 Let E be the radiant flux density [W m−2 ] of a light beam. If it is a linearly polarized
light beam with a polarization angle of 30◦ relative to the x-axis, what is its Stokes
vector?
2.4 What is the Mueller matrix of a 30◦ polarizer (relative to the x-axis)?
3 Radiometry
If we are given an optical imaging system, one thing we would like to know is how much
light will be available to our sensors. Because all sensors have limited operating ranges,
controlling the amount of light irradiating the sensors is very important to obtain the best
usage of the sensors and for the best quality of our images. In order to study the energy
flow of light through the various stages of image formation, we have to carefully define the
concepts and terms that we are going to use. The study and measurement of optical energy
flow are the subject of radiometry.
Over the years several nomenclature systems have been proposed for light measurement
and although there is still some debate on the subject, the units and terms proposed by
the CIE have gained general acceptance. These units and terms are described in detail in
the CIE publication International Lighting Vocabulary [187]. They have also been adopted
by the American National Standards Institute (ANSI Z7.1-1967) and recommended by the
publications of the (US) National Bureau of Standards [700, p. 8]. We will describe the
radiometric concepts using CIE units and terms.
The concepts and measurements of optical energy flow in radiometry are traditionally
defined using geometrical optics. For example, optical rays are used to define the cone
associated with a light beam and the path by which a ray is transmitted from one medium to
another is determined by Snell’s law. As a consequence of this idealization, many concepts
lose their meanings when the spatial dimension is reduced to an infinitely small distance.
For example, in an imaging system, rays cannot converge to an ideal image point, and the
physical optics of diffraction has to be considered. However, this difficulty does not make
these concepts useless because, in practice, we never measure things in infinitely small
spatial extents. Similar problems exist in other fields of physics as well. For example, we
often say that the humidity of the air is such and such percent, but in a volume smaller than
a hydrogen atom, the definition of humidity loses its meaning completely. This does not
make humidity a useless concept in a weather report.
It should be pointed out that one can also formulate the concepts of radiometry from
physical optics and take into account the coherence property of the light [598, 631, Chapter 5,
1041]. Conventional radiometry has been mainly developed for thermal sources that have
incoherent radiation. It can be shown [631, Chapter 5] that for a general class of sources,
called quasi-homogeneous sources1 (to which thermal sources usually belong), a radiometric
model can be developed from physical optics. However, the treatment becomes unnecessarily
complicated for most color imaging applications where light is almost always incoherent.
In the description of optical energy flow, there are quantities, such as radiance, which
are associated with light rays and they are functions of both position and direction. Other
quantities, such as radiant exposure, which are associated with surfaces or volumes, are
functions of only position or only direction, but not both. Of course, there are also quan-
tities, such as energy, that are not associated with geometry. Also, there are quantities that
are sometimes associated with rays and other times are associated with surfaces. For this
last class of quantities, the exact meaning can often be understood from the context in
which they are used. It is therefore not necessary to create separate terms for these quan-
tities when used in different ways. The International Lighting Vocabulary [187] usually
follows this principle. For example, irradiance is defined as the radiant flux per unit area
of a surface. It can mean the radiant flux from a given direction, or it can mean the total
radiant flux integrated over the hemisphere of the surface. The irradiance is thus a function
of both position and direction in the former, but it is only a function of position in the
latter.
The unit of radiant light energy, Q, is the joule. The energy flow per unit time through a
point (x, y, z) in space in a direction (θ, φ) is called the radiant flux, Φ(x, y, z, θ, φ) [W].
It is important to realize that the radiant flux is a function of both the position and direction,
and, in this sense, it is a quantity associated with a light beam.
At a distance very far away from the light source, we can treat the light source as a
point source. A point source is an idealization which has a spatial location but no physical
dimension. Although there is no such thing as a point source, it is a good approximation in
many real-world problems. For example, a star can be treated as a point source even if it is
much larger than the earth. As a rule of thumb, if the physical dimension of the source is
smaller than one-tenth of the distance between the source and the object, the source can be
treated as a point source and the error in the radiometric calculation is on the order of 10%
(see Example 3.1).
Solid angle
Since a cone of rays radiating from a point source diverges at a rate proportional to the
inverse of the square of the distance from the source, we need a measure of the spatial
1 Quasi-homogeneous sources are sources whose linear dimensions are much larger than the coherence length
of their radiation.
[Figure 3.2 illustrates the definition of the solid angle; its labels show the projected area Aproj, the surface normal N, and the angle θ.]
extent of the cone angle. The concept for this measure is called the solid angle (in contrast
with the plane angle we use in two-dimensional space). The solid angle of a cone is defined
as the area cut out by the cone on a unit sphere (radius = 1) that is centered at the apex
of the cone (see Figure 3.2). If the intersection of a cone and the unit sphere has an area
ω, we say that the cone has a solid angle of ω. A sphere thus has a solid angle of 4π . If
the intersection is between the cone and a sphere of radius r , the solid angle is equal to
the intersection area divided by r 2 . The concept of a solid angle is more general than a
regular cone. In Fig. 3.2, a patch of surface, S, of arbitrary shape, bounded by a closed
curve, C, in space, forms a solid angle with a point, P. If all the lines connecting P and
[Figure: a disk of radius R at a distance z from the point P, divided into elemental rings of width dr (Example 3.1).]
every point on C cut off an area a on the sphere centered at P with a radius of r , then the
surface patch S forms a solid angle of a/r 2 with the point P. The unit of solid angle is
called the steradian, because the measure of the intersection area is relative to the radius
of a sphere. The steradian is the solid angle that, having its vertex at the center of a sphere
with radius r , cuts off an area of the surface of the sphere equal to r 2 . To calculate the
solid angle subtended by an area of arbitrary shape, the differential definition of the solid
angle is
$$d\omega = \frac{dA_{\rm proj}}{r^2}, \qquad (3.1)$$
$$\omega = \int_A d\omega = \int_A \frac{dA_{\rm proj}}{r^2}, \qquad (3.2)$$
where dAproj is the elemental area dA projected to the direction of the ray connecting dA
and the point source.
Example 3.1 What is the solid angle formed by a circular disk of radius R with respect to
a point P located on the axis of the disk at a distance z from its center?
Solution Since points on the disk are not at equal distances from P, we will divide the
disk into ring-shaped elemental areas so that the solid angle can be calculated by integration.
If a ring has a radius of r, its squared distance from P is r² + z². Since the area of the ring,
dA, is not perpendicular to the view from P, we have to use its projected area dAproj.
Therefore the solid angle ω can be calculated as
$$\omega = \int_0^R \frac{dA_{\rm proj}}{r^2+z^2}
         = \int_0^R \frac{(2\pi r\,dr)\cos\theta}{r^2+z^2}
         = \int_0^R \frac{2\pi r z\,dr}{(r^2+z^2)^{3/2}}
         = 2\pi\left(1 - \frac{z}{\sqrt{R^2+z^2}}\right). \qquad (3.3)$$
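The closed form of Eq. (3.3) is easy to check numerically. The Python sketch below (our own helper names) also compares it with the small-disk approximation πR²/z²:

```python
import numpy as np
from scipy.integrate import quad

def disk_solid_angle(R, z):
    """Solid angle of a disk of radius R seen from an on-axis point at distance z, Eq. (3.3)."""
    return 2 * np.pi * (1 - z / np.sqrt(R**2 + z**2))

def disk_solid_angle_numeric(R, z):
    """The same quantity by numerically integrating the projected ring areas over r^2 + z^2."""
    integrand = lambda r: 2 * np.pi * r * z / (r**2 + z**2)**1.5
    omega, _ = quad(integrand, 0.0, R)
    return omega

R, z = 0.5, 2.0
print(disk_solid_angle(R, z), disk_solid_angle_numeric(R, z))  # both ~0.188 sr
print(np.pi * R**2 / z**2)                                     # small-disk approximation ~0.196 sr
```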
[Figure 3.4: a point source radiates a flux dΦ into a cone of solid angle dω; the radiant intensity is I = dΦ/dω.]
Intensity I
If the radiant flux leaving a point source through a cone of solid angle dω [sr] is dΦ [W]
(see Fig. 3.4), the quantity that describes the light output from the point source at this
cone is called the radiant intensity, I, which is defined as the radiant flux per unit solid
angle, i.e.,
$$I(x, y, z, \theta, \phi) = \frac{d\Phi(x, y, z, \theta, \phi)}{d\omega} \quad [\mathrm{W\,sr^{-1}}]. \qquad (3.4)$$
It should be pointed out that, in theory, light from a source of very small dimension has
a very large coherent area. Therefore, images of stars taken by well-corrected telescopes
often exhibit certain features of diffraction patterns that are characteristic of a coherent light
source [631, p. 154]. This will be discussed in Chapter 9 when we derive the OTF of an
ideal lens. The concept of intensity is an idealization and it is useful mostly in developing
other terms in radiometry.
Radiance L
If the light source is nearby, we can no longer treat it simply as a point source. If we divide
the area of the source into very small elements dA, each one can be treated as a point
source, but now the amount of light coming from that element is proportional to its area
dA. Furthermore, if that elemental surface is not perpendicular to the direction of the cone
of rays, its projected area dA cos θ has to be used, where θ is the angle between the cone and
the surface normal of the element. The quantity that describes the amount of light coming
from a surface is called the radiance, L [W sr⁻¹ m⁻²], which is the light flux per unit solid
angle per unit projected surface area (see Fig. 3.5):
$$L(x, y, z, \theta, \phi) = \frac{d^2\Phi(x, y, z, \theta, \phi)}{dA\cos\theta\,d\omega} \quad [\mathrm{W\,sr^{-1}\,m^{-2}}]. \qquad (3.5)$$
[Figure 3.5: a flux dΦ leaves an elemental area dA at an angle θ from its surface normal into a cone of solid angle dω.]
Therefore, we use radiance to describe how much light is reflected from a wall, or how
much light is coming from the sun.
Lambertian sources
Most reflecting surfaces appear to be approximately equally bright, independently of
the viewing angle. As we will discuss in later chapters, there are two reasons for this,
one physical and one psychophysical. The physical reason is that the retinal image irradi-
ance is proportional to the scene radiance, and most object surfaces reflect light (excluding
the specular highlight) with approximately equal radiance in all directions. A radiation
source (either self-emitting, transmitting, or reflecting) whose radiance is completely inde-
pendent of the viewing angle is called a Lambertian source (self-emitting) or a Lambertian
surface (reflecting or transmitting). A Lambertian surface that reflects (transmits) 100% of
the incident light is called a perfect reflecting (transmitting) diffuser. This idealized reflec-
tor (transmitter) is important because it is used in the definition of radiance factor, which
describes how bright a surface looks to an imaging system.
Figure 3.6. The spherical coordinate system used to integrate the total radiant flux from an area dA.
Example 3.2 What is the radiant exitance M of a Lambertian surface whose radiance is L?
Solution To find the radiant exitance, we first calculate the total amount of flux radiated
from the surface. The integration of the total flux, Φ, is best done in the spherical coordinate
system as shown in Fig. 3.6:
$$\Phi = \int d\Phi = \int L\,dA\cos\theta\,d\omega
      = \int_{\theta=0}^{\pi/2}\int_{\phi=0}^{2\pi} L\,dA\cos\theta\,\frac{(r\,d\theta)(r\sin\theta\,d\phi)}{r^2}
      = \int_{\theta=0}^{\pi/2}\int_{\phi=0}^{2\pi} L\,dA\cos\theta\sin\theta\,d\phi\,d\theta
      = \pi L\,dA.$$
Therefore, M = Φ/dA = πL. This is a very useful relation when we want to estimate the
radiance (or luminance in photometry) from the surface illumination. For example, if a
perfect Lambertian surface (i.e., reflectance = 1) is illuminated with 7π W m−2 , then we
know that its reflected radiance will be 7 W m−2 sr−1 . (Note that the dimension of exitance
M is [W m−2 ], while the dimension of radiance L is [W m−2 sr−1 ].)
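The relation M = πL can also be checked by brute-force numerical integration over the hemisphere (a small sketch of our own, assuming the same geometry as Fig. 3.6):

```python
import numpy as np
from scipy.integrate import dblquad

L = 7.0   # radiance in W m^-2 sr^-1, as in the example above

# Integrate L*cos(theta) over the hemisphere, with d_omega = sin(theta) d_theta d_phi.
M, _ = dblquad(lambda theta, phi: L * np.cos(theta) * np.sin(theta),
               0.0, 2.0 * np.pi,      # phi from 0 to 2*pi
               0.0, np.pi / 2.0)      # theta from 0 to pi/2
print(M, np.pi * L)                   # both ~21.99 W m^-2
```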
Example 3.3 A circular Lambertian disk source of radius r and radiance L is parallel to a
small surface element dA located on the axis of the disk at a distance z from its center.
What is the irradiance E at dA due to the disk source?
Solution All points on a ring with a radius of s on the disk source have the same distance
and orientation relative to the elemental area dA, which forms a solid angle of dω to any
point on the ring, where
$$d\omega = \frac{dA\cos\theta}{z^2 + s^2}.$$
The radiant flux, dΦ, illuminating dA from the ring can be calculated as
$$d\Phi = L\,(2\pi s\cos\theta\,ds)\,d\omega.$$
The total radiant flux illuminating dA can be computed by integrating the ring over the
entire disk:
$$\Phi = \int d\Phi = \int_0^r L\,(2\pi s\cos\theta\,ds)\,d\omega
       = L\,dA \int_0^r \frac{2\pi s\cos^2\theta}{z^2+s^2}\,ds
       = L\,dA \int_0^r \frac{2\pi s z^2}{(z^2+s^2)^2}\,ds
       = \pi L\,dA\,\frac{r^2}{z^2+r^2},$$
$$E = \frac{\Phi}{dA} = \frac{\pi r^2}{z^2+r^2}\,L.$$
Note that when z ≫ r, E → (πr²/z²)L, and the irradiance at a plane far away from a
circular, Lambertian source is simply the radiance L of the source times the solid angle of
the source.
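A short numerical illustration of this result (our own function names, not from the text): the exact on-axis irradiance approaches the far-field value, i.e., the radiance times the solid angle of the disk, as z grows relative to r:

```python
import numpy as np

def disk_irradiance(L, r, z):
    """On-axis irradiance from a Lambertian disk of radius r and radiance L at distance z."""
    return np.pi * r**2 / (z**2 + r**2) * L

L, r = 100.0, 0.1                        # W m^-2 sr^-1 and m (illustrative values)
for z in (0.2, 1.0, 5.0):
    exact = disk_irradiance(L, r, z)
    far_field = np.pi * r**2 / z**2 * L  # radiance times the solid angle of the disk
    print(z, exact, far_field)
```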
In this example, the irradiated plane element dA is located on the axis of symmetry and
that makes the integral simple. What happens if we shift dA off the axis by an amount h?
As expected, the integral becomes much more complicated. It is derived in Section 10.2.
Here we show the result for completeness of this discussion.
Radiant exposure
For sensors from which time-dependent signals cannot be read out continuously, a shutter
has to be used to control the time interval during which the sensors are exposed to the image
signals. Basically, these sensors respond to the energy per unit area, rather than the power
per unit area. All photographic films and most CCD sensors function this way.² Therefore,
the proper quantity for these sensors is the integrated product of image irradiance and time,
which gives energy per unit area. This quantity is called the radiant exposure, H. Let t be
the exposure time; then
$$H = \frac{dQ}{dA} = \int_t E\,dt \quad [\mathrm{J\,m^{-2}}]. \qquad (3.8)$$
² The human eye is different because the image signal is processed continuously in real time.
Reflectance ρ
Light illuminating an object surface is partially reflected, partially absorbed, and partially
transmitted. The ratio of the radiant flux of the reflected light to that of the incident light is
called the reflectance, ρ. In general, the value of reflectance depends on the geometry, the
polarization, and the spectral composition of the incident and reflected light. Since energy
cannot be created in a passive material, reflectance cannot be greater than 1. A Lambertian
surface with reflectance equal to 1 is called a perfect (reflecting) diffuser.
Solution Let L be the reflected radiance of a perfect reflecting diffuser identically irradi-
ated. The total radiant flux from the surface is π L, which is equal to the total flux irradiated
on the surface. The radiance from the safety reflector is approximately 0.6π L/(0.4π ), i.e.,
1.5L. Therefore, the reflectance factor in the cone direction is 1.5L/L = 1.5. This example
shows that the surface reflectance, ρ, is always less than 1, but its reflectance factor, R,
from a certain viewing geometry, can be greater than 1. In Chapter 10, we will show that
the image irradiance is proportional to the surface radiance, which, in turn, is proportional
to its reflectance factor. Therefore, the brightness of an object in an image is proportional
to its surface reflectance factor, not its reflectance. The image of a safety reflector appears
brighter than that of a Lambertian white object. Things that look brighter than white often
appear to be self-luminous sources.
Responsivity (sensitivity) s
The responsivity of a detector is defined most generally as the ratio of the output signal to
the input power (light flux). Its unit depends on the form of the electrical response exhibited
by a given detector (see [127, p. 109]). However, there are detectors (such as photographic
films) that respond to the total light energy (exposure), and the responsivity can be defined
as the ratio of the output signal to the input light energy. If the output signal is measured as a
function of the wavelength of the input light, the spectral responsivity is denoted by sλ . The
term sensitivity is often used interchangeably with responsivity. However, in vision science,
sensitivity is defined as the inverse of the input power required for the system to produce a
threshold response. If the system response is a nonlinear function of the input power, the measured
sensitivity is a function of the threshold response that is chosen for the measurement.
Although most radiometric measurements are simple to describe, they are very difficult
to perform accurately. The problem is that there are many factors that have to be taken
into account and uncertainty in each one of them introduces errors [1043]. For example,
the responsivity of a detector is often a function of the wavelength of the radiation, the
temporal variation of the radiation, the direction of the incident radiation, the position on
the detector, the temperature of the instrument, the polarization of the radiation, and the
degree of coherence of the radiation, etc.
The concepts and terms we described above are defined without reference to the spectral
composition of the light that we are measuring. In practice, the instrument’s response as
a function of wavelength has to be specified when reporting those quantities. When our
interest is in quantifying light energy flow as a function of spectral wavelength or frequency,
the quantities we have defined so far have to be measured within a small wavelength or
frequency interval. The resulting quantities are called by the same names but the adjective
“spectral” is added in front of the terms, and their units become per unit wavelength interval
(e.g., nanometer) or per unit frequency interval. For example, spectral radiance is L_λ =
d³Φ/(dω dλ dA cos θ) if a wavelength interval is used, or L_ν = d³Φ/(dω dν dA cos θ) if a
frequency interval is used. The relation between L_λ and L_ν can be derived from the relation
c/n = νλ, where c is the velocity of light in the vacuum, and n is the index of refraction of
the medium, and
$$L_\nu = \frac{n\lambda^2}{c}\,L_\lambda. \qquad (3.9)$$
So far, we have covered some of the most basic concepts and terms in radiometry. The
International Lighting Vocabulary contains 950 terms, which are well annotated. Obviously,
it is not appropriate to discuss all the terms here. Some of the terms that are related to our
interests in this book can be found in the Glossary at the end of the book. The reader should
consult the Glossary whenever an unfamiliar term is encountered.
In 1987, the International Electrotechnical Commission (IEC) and the CIE issued the
fourth edition of the International Lighting Vocabulary [187]. The aim of the publication is
to promote international standardization of quantities, units, symbols, and terminology in
the field of radiometry, photometry, and lighting engineering.
In Section 3.1, we discussed the concepts and definitions of radiant flux, radiance, and
irradiance, etc. These quantities are measured in terms of radiant energy, which is usually
denoted by a subscript e so that their symbols become Φe, Le, and Ee. When different
wavelengths are weighted by their visual efficiency in producing the “brightness” sensation,
the resulting terms are the luminous flux, luminance, and illuminance, and they are denoted
by Φv (unit: lumen), Lv (unit: candela per square meter), and Ev (unit: lux). These will
be defined later when we discuss photometry. Sometimes, it is desirable to measure the
radiation in terms of number of photons. In this case, the symbols have the subscript p. The
corresponding terms are the photon flux Φp, photon radiance Lp, and photon irradiance Ep.
Table 3.1 shows the correspondence of these different measures.
In geometrical optics, light is treated as rays, and tracing rays from one point in space to
another is an operation carried out in diverse applications, such as lens design, thermal
radiative transfer, scanner engineering, and computer graphics. An implicit assumption in
ray tracing is that each ray carries with it some measure of energy flux through space and
time. The radiometric quantity associated with each ray is the so-called “basic radiance”
[700] which is L/n 2 , the radiance divided by the index of refraction squared. It can be
proved that the basic radiance is conserved when light is propagated through nonabsorbing
and nonscattering media. This conservation property of the basic radiance is called the
radiance theorem.
Figure 3.9 shows the propagation of the basic radiance along a light ray through two
different media. Let dA be an elemental area of the interface between two media, with
refractive indices, n 1 and n 2 . From Snell’s law, a ray, with an incident angle θ1 relative to
[Figure 3.9: a ray crosses the interface dA between two media of refractive indices n1 and n2; the radiance is L1 in a cone dω1 at angle θ1 on one side and L2 in a cone dω2 at angle θ2 on the other.]
The spectral radiance can be expressed in two measures: L_ν and L_λ. If the spectral radiance
is measured with respect to a frequency interval ν, the basic spectral radiance remains the
same, L_ν/n². However, if the spectral radiance is measured with respect to wavelength, L_λ,
the basic spectral radiance changes to L_λ/n³, because the radiant flux is measured per
wavelength interval and the wavelength has to be scaled by n:
$$dL_1/n_1^2 = (L_\lambda)_1\,d\lambda_0/n_1^3, \qquad (3.17)$$
$$dL_2/n_2^2 = (L_\lambda)_2\,d\lambda_0/n_2^3. \qquad (3.18)$$
Since dL₁/n₁² = dL₂/n₂², we have
$$\frac{(L_\lambda)_1}{n_1^3} = \frac{(L_\lambda)_2}{n_2^3}. \qquad (3.19)$$
Figure 3.10. The integrating sphere.
and
$$dE_2 = \frac{d\Phi}{dA_2} = \frac{L\,dA_1}{4R^2}, \qquad (3.23)$$
which is independent of the location of d A2 if dA1 is a perfect diffuser, i.e., L is a constant
for all directions.
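The position independence follows from the chord geometry of the sphere. A brief sketch (not necessarily the steps of the omitted derivation): for two elemental areas dA₁ and dA₂ on a sphere of radius R, the chord joining them makes equal angles θ₁ = θ₂ with the two surface normals and has length D = 2R cos θ₁, so
$$d^2\Phi = L\,dA_1\cos\theta_1\,\frac{dA_2\cos\theta_2}{D^2}
         = \frac{L\,dA_1\,dA_2\cos\theta_1\cos\theta_2}{4R^2\cos\theta_1\cos\theta_2}
         = \frac{L\,dA_1\,dA_2}{4R^2},$$
which, divided by dA₂, gives Eq. (3.23) no matter where dA₂ lies on the sphere.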
An integrating sphere is an elegant way to achieve uniform diffuse lighting for measuring
reflectances. A reflectance sample can be positioned inside the sphere and an incident beam
is then directed at the sample. The light is reflected to and integrated by the sphere. Since
every point inside the sphere is uniformly illuminated, one can place the detector wherever
convenient to measure the irradiance, which is proportional to the amount of reflected light. The
same principle can be applied to the measurement of the radiant flux of a large beam of
nonuniform cross-section, because the sphere turns the beam into uniform surface irradiance
proportional to the incident radiant flux, and a small aperture detector can then be used for
the measurement. Figure 3.11 shows an example of how an integrating sphere can be used
to measure surface irradiance due to an extended source. The entrance port of the sphere
is positioned at the location of interest and the measurement is taken at the exit port. The
integrating sphere can be only a few centimeters in diameter. The entrance port allows it to
collect over a wide solid angle formed by the extended source, and the exit port provides a
fairly uniform area for measurement.
Natural and artificial light sources are often compared with the idealized concept of black-
body radiation for good reasons. Sunlight and its scattered portion, skylight, are the most
important sources of optical radiation on earth. Their combination, called daylight, varies
greatly with humidity, dust, and cloud conditions, and with the angle of the sun.
Figure 3.11. Using an integrating sphere to measure the spectral irradiance on a surface illuminated
by an extended source.
However, the color quality (chromaticity) of daylight is very similar to that of blackbody ra-
diation at various temperatures. In addition, the spectral compositions of artificial tungsten
light sources are very close to those of blackbody radiation. For these reasons, light source
colors are often compared with those of blackbody radiators at different temperatures, and
are often specified by their “correlated color temperatures” which will be formally defined
later.
[Figure 3.12 plots the logarithm of spectral radiance (W cm⁻² µm⁻¹) against wavelength from 0 to 2000 nm for blackbody radiators at 2800 K, 3200 K, 4000 K, 5000 K, 6500 K, and 9300 K.]
Figure 3.12. The spectral radiance functions of blackbody radiators at different temperatures.
which is known as Wien’s formula. T ≪ C₂/(nλ) turns out to be the temperature range that
is most relevant to color imaging. For λ = 550 nm and n = 1, C₂/(nλ) = 26 160 K, which
is much higher than the color temperatures of most commonly available light sources.
When the temperature is very high (C₂ ≪ nλT),
3.7 Problems
3.1 In vision literature, a visual angle is often used to describe the size of an object. If the
visual angle of a circular disk is α (rad), what is the solid angle (sr) subtended by the
disk relative to the eye?
3.2 Given a square plate of frosted glass which is a cm on a side (see Fig. 3.13) and can
be considered to be a Lambertian source with radiance L [W m−2 sr−1 ], calculate the
irradiance E at a point P on a plane parallel to the plane of the source, and separated
from the source by z meters.
z P
a
radiance L irradiance E
Figure 3.13.
3.3 There are two ways to express the spectral radiance: radiance per unit wavelength,
L λ , and radiance per unit frequency, L ν . How is L λ related to L ν ?
3.4 For an application, it is necessary to provide a uniform illumination for a planar test
target with a size of 25 cm × 25 cm. Suppose we position one spherical source directly
above each of the four corners at a distance d from the target plane. Assuming that each
spherical source is 5 cm in diameter and is uniformly bright (i.e., has equal radiance)
in all directions, what is the minimum distance d that will produce an illumination
uniform to within 10%, i.e., the maximum illuminance is no larger than 1.1 times the
minimum illuminance on the test target? (Hint: For a Lambertian sphere source of
radius r and luminance L, a surface perpendicular to the source direction will have
an illuminance E. It can be shown from spherical symmetry that E = π L(r 2 /d 2 ).
The symmetry argument is: since the flux from a unit area on the sphere is π L (from
Example 3.2), the total flux of the sphere is 4πr 2 (π L) which is distributed uniformly
to an area of 4πd 2 at a distance of d.)
3.5 The surface of many traffic signs is made to reflect light more strongly in the direction
perpendicular to the surface. Suppose that some snow happens to cover part of a
STOP sign. The snow on the STOP sign looks white in daylight, but looks dark gray
in the evening when it is illuminated by car headlights. How do you explain such a
perceptual difference?
4 Photometry
In our discussion of radiometry, light flux is measured in terms of power or energy. However,
even under identical viewing conditions, equal power of light of different wavelengths does
not produce equal brightness sensation in our visual perception. (In an extreme case, even
a kilowatt infrared source will not help us to see.) Therefore, radiometric quantities are not
always meaningful in our visual world, especially in the field of illumination engineering. For
example, in order to illuminate a library reading room, we need to know “how much” visible
light our chosen light sources will provide for reading. For these types of applications, we
need to measure light flux in quantities that are representative of its visual impact, such as
brightness. Photometry deals with measurements of visible light in terms of its effectiveness
to produce the “brightness” sensation in the human visual system. Given two stimuli of
different spectral compositions, the basic goal of photometry is to set up a quantitative
procedure for determining which stimulus will appear “brighter” or more luminous to an
average observer.
Measuring visual quantities of light is complicated because light stimuli of different
spectral compositions produce complex perceptions of light, such as bright red or dark
green. It is not easy (if not impossible) to order these different color sensations along a
single, intensive scale. In fact, years of research have not produced a completely satisfac-
tory solution. However, the applications are so important that an agreed-upon, incomplete
solution is better than no solution at all. It is important to point out that we should not
be surprised if, once in a while, object A appears to be brighter than object B when the
photometric measurements tell us that object B is more luminous than object A.
There are two types of photoreceptor, rods and cones, in our retinas and they operate under
different illumination levels. In the low-illumination range, only rods are responsible for
seeing and we call this range of vision scotopic vision. In the high-illumination range, only
cones are responsible, and we call this range of vision photopic vision. In the transition
between these two ranges, both rods and cones are responsible for vision, and we call
the transition range of vision mesopic vision. The peak luminous sensitivity changes from
555 nm for photopic vision to 507 nm for scotopic vision (see Fig. 4.1). This shift is called
the Purkinje shift. As the ambient light level becomes dimmer (as in the twilight), objects
that reflect mostly long wavelength light (such as red flowers) look darker compared with
other objects that reflect mostly short wavelength light (such as blue flowers). This is due
to the difference in spectral absorption in the photopigments in cones and rods.
Figure 4.1. The CIE luminous efficiency functions: V(λ) for photopic vision and V′(λ) for scotopic vision.
Monochromatic lights of different wavelengths, but the same power, do not produce equal
brightness or luminous sensation. For example, in the photopic range, it is found that light of
around 555 nm wavelength requires the least radiant power to produce a given level of lumi-
nous sensation. The problem of quantifying the photopic luminous sensation is that there is
no single photoreceptor that is responsible for it. Instead, in the photopic range, the sensation
is produced from the quantum catches of the three cone mechanisms. The exact dependence
of the luminous sensation on the three cone photoreceptions is unknown, and probably will
remain so for some time to come. In the scotopic range, there is only a single type of
photoreceptor (the rods) that actively responds to light at this low level, and the measured
luminous sensation as a function of wavelength does correspond to the spectral sensitivity of
the rods, as measured in vitro, with correction for the absorption of the optical media in the eye.
The difficulty of comparing the brightness of objects of very different colors is imme-
diately clear to anyone who has tried to do so. To deal with the problem, measurement
procedures have been proposed and the quantitative specifications of luminous sensation
are thus defined by the results from these procedures [479]. A typical procedure involves
matching the brightness of one wavelength against that of another wavelength. In order to
reduce the difficulty in matching the brightness of different colors, the two wavelengths un-
der comparison are chosen to be similar. Therefore, one compares the brightness of 500 nm
light with that of 505 nm light, then one compares 505 nm light with 510 nm light, and so
on. This procedure is called step-by-step brightness matching. Another brightness matching
procedure involves temporal comparison. Lights of two different wavelengths are presented
at the same spatial location in quick succession temporally at a certain frequency. If the two
wavelengths differ a lot in brightness, one sees flickers. When the radiant power of the darker
of the wavelengths is adjusted upward, the perceived flickering sensation is reduced. When
the ratio of the radiant powers of the two wavelengths is such that the flickering sensation is at
a minimum, the two wavelengths are said to be of equal luminance. This is called minimum-
flicker brightness matching. The setting of the alternating frequency is somewhat involved.
If the frequency is too high (e.g., higher than 30 Hz), one cannot see any flicker. If it is too low
(say lower than 2 Hz), one always sees the two alternating stimuli. Therefore the frequency
has to be adjusted so that a small amount of flicker is always perceived (about 12–15 Hz).
In addition to the minimum-flicker and step-by-step brightness matching methods, there
are other methods for determining the relative efficiency of different wavelengths to produce
luminous sensation. Some of these alternative methods are: direct heterochromatic bright-
ness matching, absolute thresholds, increment thresholds, minimally distinct border, and
visual acuity [185]. It is important to point out that different brightness matching procedures
do not produce the same luminous efficiency curve, and even within the same procedure,
data may vary a lot from observer to observer. The range of interobserver variation in the
luminous efficiency function as measured by flicker photometry can be as large as one log
unit in the short wavelength region, although the standard deviation of the variation is much
smaller (on the order of 0.1 log10 unit) [614]. A good discussion of the various methods and
issues can be found in [479].
Empirical findings of brightness matching are summarized in the forms of laws [1053,
p. 252] as:
1. Symmetry law: If stimulus A matches stimulus B, then stimulus B matches stimulus
A. (It does not matter how you move them around.)
2. Transitivity law: If A matches B and B matches C, then A matches C.
3. Proportionality law: If A matches B, then α A matches α B, where α is any positive
factor by which the radiant power of the stimulus is increased or decreased, while its
spectral composition is kept the same.
4. Additivity law: If A matches B and C matches D, then (A ⊕ C) matches (B ⊕ D),
where the operator “⊕” means additive color mixture. (By “additive mixture”, we mean
a color stimulus of which the radiant power at any wavelength is equal to the sum of the
powers at the same wavelength of the constituents of the mixture.) Also, if A matches
B and (A ⊕ C) matches (B ⊕ D), then C matches D.
The four laws are similar to the four laws in color matching to be described in Chap-
ter 6. Although the four laws in color matching are intuitively acceptable from the point
of view of equivalent quantum catches by photoreceptors, the similar laws in brightness
matching do not have the same foundations based on receptor-level matching. In particular,
the proportionality law and the additivity law, sometimes referred to as Abney’s laws, have
been questioned many times, and there is plenty of evidence to show their failure [185].
Among the different brightness matching methods, it is found that the minimum-flicker
method, the minimally distinct border method, and the visual acuity method yield results
that follow the additivity law well. Other methods of determining brightness matches do not
do as well. In particular, chromatic stimuli generally require less luminance to match their
brightness to that of the achromatic stimuli. This effect is called the Helmholtz–Kohlrausch
effect. A number of systematic studies have been conducted to quantify this effect [973].
However, the deviations of luminous quantities calculated using these two laws are often
acceptable for many applications (e.g., illumination engineering) for which the convenience
in computation often outweighs concern about the resulting inaccuracy. Applications, such
as color reproduction, that are much more sensitive to the perceptual errors have to model
brightness more accurately. One of the obvious problems with flicker photometry is that the
measurement involves temporal variations that are faster than our everyday color perception.
The other problem is that the spatial frequency has not been treated as a measurement vari-
able. The discrepancy between different brightness matching methods has been attributed to
the fact that certain perceptual information processing channels (the red–green and yellow–
blue chromatic channels) cannot respond (or respond with attenuated amplitudes) to the
flicker frequency used in the flicker photometry. As a consequence, the measured bright-
ness only represents the response from the achromatic channel. Brightness perception in a
static scene consists of contributions from the chromatic channels as well as the achromatic
channel. This “explanation” seems to be in general agreement with the experimental data.
If we are willing to assume that the luminous sensation is additive, one of the important
tasks in photometry is to measure the relative efficiency of light of various wavelengths
in producing the luminous sensation. The resulting function, called the spectral luminous
efficiency function, can then be used as a weighting function for calculating the luminous
sensation for stimuli of any spectral compositions.
Let us denote the wavelength of the maximal efficiency by λm . A spectral luminous
efficiency function gives the ratio of the radiant power of a monochromatic stimulus at
wavelength λm to that of a monochromatic stimulus at wavelength λ, when the two stimuli
produce luminous sensations that are judged to be equivalent under some given viewing
conditions and according to certain specified criteria. In 1924, the CIE adopted the standard
spectral luminous efficiency function for photopic vision, V (λ), which was derived from
the results of several independent experiments. The final result was not an average of the
experimental data, but a weighted assembly of the different sets of data [1053, p. 395]. In
all the experiments, a 2◦ test field was used. The luminance level of the experiments was
around 1.5 cd m−2 . Finally, it should be noted that because of the age-related increase of
optical density of the eye lens, especially in the short-wavelength region, there is a gradual
reduction of luminous efficiency with age in the corresponding wavelength region [817].
In 1951, the CIE adopted the standard spectral luminous efficiency function for scotopic
vision, V′(λ), which was based on experimental data from subjects all under 30 years of
age. Both V(λ) and V′(λ) are shown in Fig. 4.1. The photopic luminous efficiency function,
V(λ), has its maximum value, 1.0, at 555 nm, and the scotopic luminous efficiency function,
V′(λ), at 506–508 nm. A detector whose relative spectral responsivity curve conforms to the
V(λ) function (or to the V′(λ) function) may be referred to as a CIE Standard Photometric
Observer for photopic (or scotopic) vision. Because of the complexity of the problem, the CIE
has not recommended standards for the mesopic luminous efficiency function yet [188].
The V (λ) function is recommended for a visual field of 2◦ . For large fields, the lu-
minous efficiency is higher at short wavelengths. Recognizing the difference, CIE also
provisionally provided a 10° luminous efficiency function, V10(λ) (which is the same as
ȳ10(λ) for the CIE 1964 10° color matching functions) [1053, p. 397]. For field sizes larger
than 10◦ , the luminous efficiency function remains quite close to the 10◦ data. However,
V (λ) remains the official recommendation for photometric calculations.
Compared with the photopic V(λ), the scotopic V′(λ) is relatively free from difficulties.
Since only one type of photoreceptor (rods) is responsible for scotopic vision, the additivity
law holds well. The interobserver variation for V′(λ) is also small. At the opposite end
of the case of standardization, the luminous efficiency function for mesopic vision (from
about 10−3 cd m−2 to 3 cd m−2 ) is a sensitive function of the luminance level. Empirical
formulas have been suggested for calculating mesopic luminance from photopic luminance
and scotopic luminance [185, p. 21, 188]. From the above discussion, we can also predict that
the luminous efficiency function of color-deficient (color-blind) observers is quite different
from that of color-normal observers. For example, protanopic observers (those missing the
long-wavelength sensitive cones) have abnormally low luminous efficiency at the long-wave
end of the spectrum. Their peak luminous efficiency wavelength is shifted from the 555 nm
of the normal observer to 540 nm. The totally color-blind observer, missing all three types
of cone, has a luminous efficiency function identical to the scotopic V′(λ).
Not long after CIE published its 1924 recommendation for V (λ), experimental data
began to show that the recommended luminous efficiency value was too low in the short-
wavelength range. In 1951, Judd proposed a modified luminous efficiency function to correct
the deficiency. Judd’s modified function became widely used by the vision research commu-
nity and was finally published by CIE, with some slight modification, as one of the optional
luminous efficiency functions. It is called the CIE 1988 Modified Two Degree Spectral
Luminous Efficiency Function for Photopic Vision, VM (λ) [190]. This modification is a
supplement to, not a replacement of, the CIE 1924 V (λ). Figure 4.2 shows the comparison
between V (λ) and VM (λ).
Figure 4.2. Comparison between CIE 1924 V (λ) and Judd’s modified version VM (λ) for photopic
vision.
The terms radiant flux and radiant power (symbol: Φe; unit: watt, W) are synonyms for
power emitted, transferred, or received in the form of radiation. The spectral concentration
of radiant flux, called the spectral radiant flux, is represented as Φe,λ. For photopic vision,
the luminous flux Φv (unit: lumen) of a radiation whose spectral distribution of radiant flux
is Φe,λ(λ) can be expressed by the equation:
$$\Phi_v = K_m \int_{360\,\mathrm{nm}}^{830\,\mathrm{nm}} \Phi_{e,\lambda}(\lambda)\,V(\lambda)\,d\lambda, \qquad (4.1)$$
where the scaling factor, Km, is known as the maximum spectral luminous efficacy of
radiation for photopic vision. Its value is defined to be 683.002 lumens per watt [lm W⁻¹].
For scotopic vision the luminous flux Φ′v is given by:
$$\Phi'_v = K'_m \int_{360\,\mathrm{nm}}^{830\,\mathrm{nm}} \Phi_{e,\lambda}(\lambda)\,V'(\lambda)\,d\lambda, \qquad (4.2)$$
where the scaling factor, K′m, is known as the maximum spectral luminous efficacy of
radiation for scotopic vision. Its value is defined to be 1700.06 lm W⁻¹. Km and K′m are
defined so that the wavelength 555.016 nm has the same value of luminous flux for both
photopic and scotopic vision.
Example 4.1 A sample of skin is illuminated by a daylight (D65) source. The spectral radiant
power of the source, L(λ), and the spectral reflectance of the skin, ρ(λ), are tabulated at
10 nm intervals in Table 4.1. What is the luminous flux illuminating the skin, and what is
the luminous reflectance of the skin?
Solution Measurement data are often given at discrete wavelengths. Therefore, it is im-
portant to determine how the data are measured before we can calculate the spectral radiant
flux, Φe,λ(λ). In this example, the data points are given at discrete wavelengths at every
10 nm. For example, L(720 nm) = 0.6160 W means that the total radiant power illuminating
the sample skin, from λ = 715 nm to λ = 725 nm, is 0.6160 W. Sometimes, L(λ) is given as
power per nanometer, and then L(λ) needs to be multiplied by 10 for the 10 nm interval. Here,
we will assume that the L(λ) values are power per 10 nm interval. Using the discrete sum to
replace the integral, we can find the source luminous flux illuminating the skin:
$$\Phi_v(\text{illumination}) = K_m \sum_\lambda L(\lambda)V(\lambda) = 683.002\ [\mathrm{lm\,W^{-1}}] \times 10.571\ [\mathrm{W}] = 7220\ [\mathrm{lm}].$$
[Table 4.1, with columns λ (nm), L(λ) (W), ρ(λ), V(λ), L(λ)V(λ) (W), and L(λ)ρ(λ)V(λ) (W), is not reproduced here.]
The luminous flux reflected from the skin is computed in the same way, with L(λ) weighted
by ρ(λ) as well as V(λ), i.e., Φv(reflection) = Km Σλ L(λ)ρ(λ)V(λ), and the ratio of the
reflected to the illuminating luminous flux gives the luminous reflectance.
Therefore, the reflectance of the sample skin is about 0.44, which is typical of fair Caucasian
skin in a body area not exposed to sun. In this example, the reflectance is calculated with
respect to V (λ) and a daylight source (D65). In other applications, it may be calculated with
other system spectral response functions. Therefore, when we use the phrase “reflectance”
or “reflectance factor” in color imaging applications, it is important, wherever appropri-
ate, to specify the light source, the imaging geometry, and the system spectral response
function [563]. Let us illustrate this point by recalculating in Table 4.2 all the numbers
with respect to the scotopic luminous efficiency function V′(λ). Using the discrete sum
to replace the integral, we can find the scotopic luminous flux of the source illuminating
the skin:
$$\Phi'_v(\text{illumination}) = K'_m \sum_\lambda L(\lambda)V'(\lambda) = 1700.06\ [\text{scotopic lm}\ \mathrm{W^{-1}}] \times 10.469\ [\mathrm{W}] = 17\,978\ [\text{scotopic lm}].$$
[Table 4.2, with columns λ (nm), L(λ) (W), ρ(λ), V′(λ), L(λ)V′(λ) (W), and L(λ)ρ(λ)V′(λ) (W), is not reproduced here.]
The reflectance of the sample skin with respect to V′(λ) is now reduced to 0.38 (from 0.44).
This is easy to see because, compared with V(λ), the weight of V′(λ) is shifted to the
shorter-wavelength region, where ρ(λ) has a smaller value.
The above example also shows that the value of scotopic lumen and that of photopic
lumen are quite different. In this example, the former is larger than the latter, but this is
not always true. It is dependent on the spectral distribution of radiant flux and the spectral
reflectance function. The convention is that when the scotopic function is used in calculating
the photometric quantities, it has to be explicitly stated; otherwise, the photopic function is
assumed.
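The discrete-sum computation used in this example is easy to script. The sketch below is ours; the spectral values are placeholders (the tables are not reproduced here), and a real calculation would use the tabulated CIE V(λ) or V′(λ) values:

```python
import numpy as np

K_m = 683.002                              # photopic constant of Eq. (4.1), lm/W

wavelengths = np.arange(400, 701, 10)      # nm, one sample per 10 nm band
L_band = np.full(wavelengths.shape, 0.3)   # W per 10 nm band (placeholder values)
rho = np.full(wavelengths.shape, 0.44)     # spectral reflectance (placeholder values)
# Crude triangular stand-in for the tabulated V(lambda); use the CIE table in practice.
V = np.interp(wavelengths, [400, 555, 700], [0.0, 1.0, 0.0])

phi_illum = K_m * np.sum(L_band * V)       # luminous flux illuminating the surface
phi_refl = K_m * np.sum(L_band * rho * V)  # luminous flux reflected by the surface
print(phi_illum, phi_refl / phi_illum)     # reflectance with respect to V(lambda)
```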
We are now in a position to define the various quantities in photometry that correspond
to those we have learned in radiometry. The basic conversion formula is the same as that
used to convert radiant flux to luminous flux (i.e., Eq. (4.1) or (4.2)). Luminous flux cor-
responds to radiant flux, luminous intensity corresponds to radiant intensity, luminance
corresponds to radiance, and illuminance corresponds to irradiance. There is a one-to-one
correspondence in the two systems, with the understanding that the energy or power in
the radiometry is always weighted by the luminous efficiency function when we want to
calculate the corresponding photometric quantities. The base unit of photometry is the
candela (cd), which was defined by CGPM in 1979 as the luminous intensity in a given
direction of a source that emits monochromatic radiation of frequency 540 × 10¹² Hz (i.e.,
555.016 nm) and that has a radiant intensity in that direction of 1/683 W sr−1 . The lumen
(lm) is the unit of luminous flux emitted in unit solid angle by an isotropic point source
having a luminous intensity of 1 cd. The lux (lx) is the unit of illuminance (luminous flux
per unit area incident on a surface) and it is the illuminance produced by 1 lm of lumi-
nous flux over the area of 1 m2 . Table 4.3 lists the corresponding units in radiometry and
photometry.
Example 4.2
A spherical Lambertian source has a radius of 10 cm. It radiates a total luminous flux of
1000 lm. What is the luminance of the source?
Solution From Example 3.2, we know that the luminous exitance [lm m−2 ] of a Lambertian
surface with luminance L is π L. Since the radius of the source is r = 0.1 m, the total surface
area is A = 4πr 2 = 0.04π m2 . Therefore, 1000 lm = A(π L), and L = 2533 cd m−2 .
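A one-line check of this example (the helper name is ours): dividing the total flux by the surface area and by π gives the luminance directly:

```python
import numpy as np

def sphere_luminance(total_lumens, radius_m):
    """Luminance of a spherical Lambertian source: total flux = (4 pi r^2) * (pi L)."""
    return total_lumens / (4.0 * np.pi * radius_m**2 * np.pi)

print(sphere_luminance(1000.0, 0.1))   # ~2533 cd/m^2, as in Example 4.2
```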
CIE recommended a level of 3 cd m−2 for photopic color matching [185], the rods’
contribution being considered negligible above this luminance level. For most practical
purposes, we can say that the scotopic vision is below 10−3 cd m−2 [1053, p. 406], when
only rods are responsible for seeing. Mesopic vision is from luminances of approximately
10−3 cd m−2 to 3 cd m−2 , when rods and cones are both contributing to the perception of
visual contrasts. Typical levels of illumination and luminance of a 20% reflectance surface
are shown in Table 4.4 [198, 429, p. 617].
The photometric units that we introduce in this chapter are based on the SI base unit of
luminous intensity, the candela (cd), as defined by the International System of Units (SI),
and other derived units as used in the CIE vocabulary. Over the years, many other units
have been used. They have not been accepted as international standards and should not be
used in the future. For the purpose of converting the old units into the standard units, the
conversion factors in Table 4.5 can be consulted [714, 942].
Several photometric quantities and conventions are commonly used in imaging applications;
knowledge of them will be useful to help us understand products and literature related to
color imaging. We introduce some of them in this section.
The guide number of a flash unit is defined as the product of the f-number of the camera
lens and the subject distance. The product makes sense only if we assume that the camera exposure
is inversely proportional to the square of the f -number and the light reaching the subject is
also inversely proportional to the square of the subject distance. The following derivation
requires some knowledge of the imaging photometry to be covered later, but it will give
us an idea of how the concepts of photometry can be used for this practical problem. As
we mentioned in the above, the image illuminance E i of a camera is related to the scene
luminance L by the following simplified relation:
$$E_i = \frac{\pi L}{4F^2}, \qquad (4.8)$$
where F is the f -number of the camera. The ISO speed, S, of a film is related to the exposure
Hm required to develop the film to a certain density by S = k/Hm , where the constant k
depends on the type of film of interest. The flash light is assumed to be a point source with
an intensity I . At a distance r , the illuminance of an object surface can be calculated as
Er = I /r 2 . Assuming that the surface is a Lambertian reflector with a reflectance of ρ,
we can calculate the luminance L of the surface as seen by the camera from the relation
M = ρE_r = πL, where M is the luminous exitance of the surface. Therefore,
$$L = \frac{\rho I}{\pi r^2}. \qquad (4.9)$$
The image exposure Hm is related to the image illuminance E i by Hm = E i t, where t
is the exposure time. Therefore,
$$S = \frac{k}{H_m} = \frac{k}{E_i t} = \frac{4kF^2}{\pi L t} = \frac{4kF^2 r^2}{\rho I t},$$
and the guide number, GN, is given by:
$$\mathrm{GN} = Fr = \frac{1}{\sqrt{4k}}\sqrt{S\rho I t}. \qquad (4.10)$$
If the film is a black-and-white film, then according to the ISO standard [920, Section 2.11]
k = 0.8. The Hm in the speed definition corresponds to a density of 0.1 above the film base
density and therefore it is from a very dark object. Let us assume that ρ = 0.018; then we
have GN = 0.075√(S I t).
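A small sketch of Eq. (4.10) with hypothetical flash and film values (the numbers below are illustrative, not from the text):

```python
import numpy as np

def guide_number(S, I, t, rho=0.018, k=0.8):
    """Guide number GN = F*r = sqrt(S*rho*I*t / (4k)), Eq. (4.10)."""
    return np.sqrt(S * rho * I * t / (4.0 * k))

# An ISO 100 film and a flash of intensity 5e4 cd firing for 1/1000 s (hypothetical values).
GN = guide_number(S=100.0, I=5.0e4, t=1.0e-3)
print(GN)         # guide number (in meters here)
print(GN / 3.0)   # f-number to use for a subject 3 m away
```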
For a given photographic situation that requires the use of a strobe, the lens aperture
( f -number) can be found by dividing the given guide number of the flashbulb by the
subject distance. This is based on the assumption that the duration of the flash light is
shorter than the exposure time during which the shutter is open. The definition of the guide
number does not consider the effect of the ambient lighting, nor mutual reflection among
ambient surfaces. Therefore, it is at best an approximate guide. If the flash is used in a small
room with light walls, the amount of reflected light from the walls significantly increases
the exposure on the film/sensor. The aperture should be properly stopped down to avoid
overexposing the film/sensor. It should also be noted that the guide number has the unit of
length, which can be either in feet or meters, and the subject distance should be measured
in the same unit.
How much exposure is appropriate for a film/sensor is determined by how sensitive the
film/sensor is with respect to the flash light. Therefore, the guide number is specified for a
particular type of film/sensor and its ISO speed. The dependence of the guide number of a
flash unit on the film/sensor type is usually assumed to be small, i.e., the standard source
used in ISO speed determination should not be too different from the spectrum of the flash
light.
In the additive system of photographic exposure (APEX), the camera exposure is summarized
by the relation
$$\mathrm{AV} + \mathrm{TV} = \mathrm{BV} + \mathrm{SV} = \mathrm{EV}, \qquad (4.11)$$
where AV = log2 F 2 (F is the aperture or f -number of the lens), TV = log2 (1/t) (t is the
exposure time in seconds), BV = log2 (B/6) (B is the lighting illuminance in footcandles,
where 1 footcandle = 10.76 lx), SV = log2 S (S is the exposure index of film or sensor),
and EV is the exposure value. Note that AV + TV = EV is basically the definition of the
exposure value, EV. The other part of the system, BV + SV = EV defines the quantity called
the exposure index, which is proportional to the ISO film speed or digital camera speed.
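Before relating this to the quantities of Section 4.4.1, here is a small numerical sketch of Eq. (4.11); the exposure settings below are hypothetical examples, not values from the text:

```python
import numpy as np

def apex_values(F, t, B, S):
    """APEX quantities of Eq. (4.11): AV = log2(F^2), TV = log2(1/t), BV = log2(B/6), SV = log2(S)."""
    return np.log2(F**2), np.log2(1.0 / t), np.log2(B / 6.0), np.log2(S)

# f/8 at 1/125 s with an exposure index of 100 balances a scene brightness of 480 footcandles.
AV, TV, BV, SV = apex_values(F=8.0, t=1.0 / 125.0, B=480.0, S=100.0)
print(AV + TV)   # exposure value EV set on the camera (~12.97)
print(BV + SV)   # the same EV implied by the scene brightness and exposure index
```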
The proportionality constant depends on the type of film or camera speed. To see this
connection, let us recall the relation derived in Section 4.4.1: EV = log2 (π/4) + log2 L −
log2 H . Furthermore, the illuminance, E s [lx], of a Lambertian surface with reflectance ρ
is related to its luminance, L, by ρ E s = π L. Therefore, we have:
4.5 Problems
4.1 Give the commonly used metric units for the following radiometric and photometric
quantities:
(a) radiant flux, Φe;
(b) radiant intensity, Ie;
(c) radiance, Le;
(d) irradiance, Ee;
(e) radiant exposure, He;
(f) luminous flux, Φv;
(g) luminous intensity, Iv;
(h) luminance, Lv;
(i) illuminance, Ev;
(j) luminous exposure, Hv.
4.2 Let d be the distance between two small disks (disk 1 and disk 2) whose radii are r1
and r2 , respectively (see Fig. 4.3). The distance d is much larger than the sizes of the
disks, i.e., d ≫ r1 and d ≫ r2. Disk 1 is a Lambertian source with a luminance of
L 1 = 1000 cd m−2 . Disk 1 is tilted at an angle of 30◦ and disk 2 is tilted at an angle
of 45◦ , relative to the axis connecting the centers of the two disks. Let r1 = 0.1 m,
r2 = 0.2 m, and d = 10 m. What is the illuminance on disk 2 from the illumination of
disk 1? (Since d ≫ r1 and d ≫ r2, any solid angle can be approximated by projected
area divided by the distance squared.)
Figure 4.3. Two small disks of radii r1 and r2, tilted at 30° and 45°, respectively, and separated by a distance d.
4.3 Objects A and B look equally bright at noon, but object A looks brighter than object
B at sunset. What can we say about the spectral reflectance factors of the two objects?
4.4 For the same radiant power, some lamps look brighter than others. Why?
4.5 What is the maximum intensity (cd) that can be produced by a 100 W lamp? Is this
maximum efficiency lamp good for general lighting?
4.6 Let us assume that the sky is a uniform hemisphere with a luminance of 1500 cd m−2 .
Let us further assume that the sun is at a 30◦ angle from the zenith and has a luminance
of 1.2 × 10⁹ cd m⁻². The sun is approximately a uniform disk with a diameter of
30 arcmin in visual angle as viewed on the earth’s surface.
a. What is the solid angle subtended by the sun?
b. What is the illuminance on a horizontal plane due to the sunlight?
c. What is the illuminance on a horizontal plane due to the skylight?
5 Light–matter interaction
The interaction between light and matter is often very complicated. The general descrip-
tion of the resulting phenomena often uses empirical measurement functions, such as the
bidirectional spectral reflectance distribution function (BSRDF) to be discussed in the next
chapter. However, the optical properties of a homogeneous material in its simple form (such
as gas or crystal) can be calculated from physical principles. Understanding the basic optical
properties of material is important because it serves as a foundation for understanding more
complex phenomena. In this chapter, we will first discuss the physical properties of light,
matter, and their interaction for simple cases. We will then derive the optical “constants” of
material that characterize the propagation of light in the material.
For color imaging applications, light can be defined as the radiant electromagnetic energy
that is visible either to our visual system or to the image capture devices of interest. (When discussing the visual systems of different species, we have to adjust this range accordingly.) In optics, the definition of light is broader, including other wavelengths for which
the behavior of optical elements (such as lenses) can be described by the same laws as
used for the visible spectrum. In physical chemistry, light is sometimes used to denote
electromagnetic waves of all frequencies.
The electromagnetic spectrum that is visible to our eyes is from about 360 nm to about
830 nm in the air (according to the CIE specifications), corresponding to the frequency range
of 3.61 × 10^14–8.33 × 10^14 Hz. This range is dependent on the intensity of the stimulus.
For the wavelengths below 400 nm or above 700 nm to become easily visible, their radiant
power has to be very high. At the short-wavelength end, the optic media before the retina
block the ultraviolet radiation, while at the long-wavelength end, the photon energy is too
low to be absorbed by the photoreceptors in the retina. The two regions neighboring visible
light in the electromagnetic spectrum are the infrared and the ultraviolet. The infrared
region extends roughly from 3 × 10^11 Hz (λ = 1 mm) to about 4 × 10^14 Hz (λ = 750 nm), while the ultraviolet region extends from about 8 × 10^14 Hz (λ = 375 nm) to 3.4 × 10^16 Hz
(λ = 8.8 nm). Many image sensors are responsive to radiation from these two spectral
regions and care must be taken to exclude them so that color reproduction will not be
adversely affected. Against a completely dark background, light of various wavelengths
is seen as having a distinct color. Table 5.1 roughly shows the common color names for different wavelength regions when viewed against a completely dark background for an average color-normal observer.

Table 5.1. Common color names for different wavelength regions (viewed against a dark background)

Wavelength (nm)    Color name
360–430            blue–purple
430–480            blue
480–490            green–blue
490–510            blue–green
510–530            green
530–570            yellow–green
570–580            yellow
580–600            orange
600–720            red
720–830            red–purple

(It should be pointed out that the spectral composition at a
given spatial location does not uniquely determine the color we perceive at that location.
Color perception will be discussed in detail in later chapters.) The convenient energy unit for
photons is the electron volt (eV), which is 1.602 177 33 × 10^−19 J, and which corresponds to a wavelength of 1239.84 nm in vacuum or a frequency of 2.417 75 × 10^14 Hz. A 2 eV
photon has a wavelength of 619.92 nm in vacuum (that 620 nm corresponds to 2 eV is worth
remembering). The visible photon energy, hν, therefore is approximately from 1.49 eV
to 3.44 eV. Typically, the number of photons emitted by a light source is very large. For
example, a hypothetical 100 W light bulb might emit 5 W of visible light, which contain
1.5 × 10^19 photons per second at 600 nm. Let us assume that the light source subtends a visual angle of 1◦ by 1◦. There will then be about 3.6 × 10^14 photons per second coming
on to the cornea. Even if only 1% of these photons is captured by one million cones in the
fovea, each cone will receive 3.6 million photons per second!
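The unit conversions in this paragraph are easy to reproduce; the following is a small sketch (the constants and helper names are ours) that converts a vacuum wavelength to photon energy in eV and estimates the photon rate of the hypothetical 5 W source at 600 nm:

H = 6.62607015e-34      # Planck constant [J s]
C = 2.99792458e8        # speed of light in vacuum [m/s]
EV_J = 1.602176634e-19  # one electron volt [J]

def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a given vacuum wavelength in nm."""
    return H * C / (wavelength_nm * 1e-9) / EV_J

def photon_rate(power_w: float, wavelength_nm: float) -> float:
    """Number of photons emitted per second by a monochromatic source."""
    return power_w / (photon_energy_ev(wavelength_nm) * EV_J)

print(round(photon_energy_ev(620), 3))   # ~2.0 eV, the value worth remembering
print(f"{photon_rate(5.0, 600):.2e}")    # ~1.5e19 photons per second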
In terms of structural complexity, material can exist as free atoms (such as helium gas),
free molecules (such as hydrogen gas), atom clusters (a few hundreds of atoms), molecule
clusters, atomic crystals, ionic crystals, molecule crystals, polycrystals, glasses, molecular
liquid crystals, atomic or molecular liquids, and other forms. Before we discuss physical
properties, it is useful to have a good feeling for the relative scale of things that we will talk
about. Table 5.2 gives the order of magnitude of the size of various things.
Gas, liquid, and solid are commonly referred to as the three states of pure elements and
compounds. Gases and liquids are collectively called fluids, while liquids and solids are
referred to as condensed matter. A material exists in one or more of these states at a given
temperature, pressure, and volume.
In gas, the atoms or molecules collide with each other about 10^9–10^10 times per sec-
ond. At room temperature and atmospheric pressure, the mean free path length between
collisions is about 50–100 nm (corresponding to about 100–1000 times the size of the atom
or molecule). Therefore, each particle can be considered as independent of the others. The
optical properties of a gas can thus be derived from that of the individual atoms or molecules
that are present in the gas.
Both liquid and solid are forms of condensed matter. The density of a liquid is usually
only about 5% less than that of the solid (water is one of the exceptions). When the atoms
and molecules in a solid are arranged in a very orderly periodic lattice structure, the solid
is called a crystal. When they are arranged without a regular, periodic structure, the solid
is called a glass. (Note that here glass refers to a state of matter. Window glass is one of
the best-known materials in this state. Also, the common use of the name crystal glass to
denote certain clear glasses is not compatible with the terms we define here.) Typically a
1 cm3 crystal contains about 1023 atoms. The distance between atoms or molecules is on
the order of the size of the atom or molecule. For example, the sodium atom has a radius of
1.9 Å (at the valence electron shell) and the distance between the centers of nearest neigh-
bors in a sodium crystal is about 3.7 Å (neighbors actually overlap slightly!). Therefore,
the optical properties of solids or liquids cannot be derived from the individual atoms or
molecules alone – the interactions between all of them have to be considered.
In the simplest case of a crystal, atoms or molecules are well ordered in a lattice. Ac-
cording to the lattice geometry, there are 14 possible lattices (the Bravais lattices) and they
are grouped into seven crystal systems according to characteristic symmetry elements: tri-
clinic, monoclinic, orthorhombic, tetragonal, cubic, trigonal, and hexagonal. For example,
the cubic system consists of the simple cubic lattice, the body-centered cubic (bcc) lattice,
and the face-centered cubic (fcc) lattice. Many metal crystals are bcc (such as Fe(α), Li,
Na, and K) or fcc (such as Au, Al, Ag, Fe(γ ) and Pb). If there is more than one atom
at each lattice site, then considering all possible symmetries and the translational invari-
ance of a lattice, the total number of the so-called space groups is only 230. This is a
surprisingly small number, showing that the requirement to produce the spatially repeating
patterns of a lattice is a very severe constraint. The major consequence of the lattice struc-
ture is that the optical properties of most crystals are anisotropic [124, Chapter 14, 1048].
An exception is the cubic system – its lattice symmetry also makes it optically isotropic
[381, pp. 14–15]. The natural vibrational frequency in crystals is at around 10^12 Hz, i.e., in
the infrared region.
Most natural crystals are rather small in size (less than a few millimeters). A large piece
of metal consists of many small crystals bound together. Each of the tiny crystals is called
a grain, and the piece of metal is called a polycrystal. The metal crystal grains are usually
very small. This is why metals are not seen to have the characteristic shapes exhibited by
other crystalline materials, such as calcite.
As the temperature or pressure increases, the thermal vibrations reach a point at which
the crystal structure of the solid begins to break down. The empty spaces between atoms
or molecules may be large enough for an atom or molecule to move around without too
much force. The atoms or molecules lose their ordered structure and the substance becomes
liquid and optically isotropic. There are substances that can exist in a state that seems
to be between crystal and liquid. These are called liquid crystals; they are anisotropic
liquids, in the sense that molecules can move around, but they more or less maintain their
orientations relative to their neighbors (these molecules are rod-like or disk-like in shape
[209]).
The majority of the everyday materials around us are more complicated than pure ele-
ments and compounds. For example, stones, bricks, paints, paper, leaves, wood, and soils
are mixtures of many types of elements and compounds. Many of them are also mixtures
of materials in different phases. For example, an aerosol is liquid or solid dispersed in gas,
and shaving cream is gas dispersed in liquid. There are two consequences of this struc-
tural and compositional complexity. (1) Calculation of the optical properties of everyday
objects, such as a ceramic vase, from first principles is impractical if not impossible. It is
necessary to measure the properties directly or indirectly. (2) Light reflected from most
everyday objects is not as sensitive to viewing angle as that reflected from a pure and highly
ordered crystalline solid. Structural and compositional randomness tend to make the optical
properties more isotropic, because there is no orderly interference or diffraction from all
the atoms and molecules. Of course, everyday objects do not exactly reflect or transmit
light isotropically, i.e., independently of the incident and viewing angles. Nevertheless, this
randomization towards optical isotropy helps our eyes and brains solve the visual perception
problem. Although the reflected radiance from an object surface does not stay constant as
we move around and look at it from slightly different angles, it does not change drastically
either. Otherwise, it would be very difficult to maintain a constant appearance of that object
in our visual perception.
Electromagnetic radiation, of which visible light is one form, comes from the nonuni-
form motion of charged particles. Conversely, electromagnetic radiation can change the
motion of charged particles. The interaction between light and matter is that between
the electromagnetic field associated with light and the charged particles associated with
matter. In the classical model, an electromagnetic wave passing through a medium in-
troduces forced oscillations of the charged particles in the array of atoms or molecules.
From quantum mechanics, electrons can absorb quanta of light energy and be excited to
a higher energy state. The energy required for state transitions for the outer electrons of
an atom is typically from 1 to 10 eV, corresponding to the visible and ultraviolet ranges.
Infrared radiation is absorbed in the energy transitions between different states of molec-
ular vibrations and rotations. The microwave region, which extends roughly from 10^9 Hz to 3 × 10^11 Hz, is also effective in causing state transitions in the vibrations and rota-
tions of polar molecules, such as water. The interaction between light and matter can be
quite complicated. For example, some nonlinear optical materials, such as quartz, can
change the frequency of the incident light. If we shine a high-energy beam from an in-
frared laser at quartz, we can see a green light beam coming out of the crystal [131].
This is called frequency-doubling and is used in laser optics quite often. However, nonlin-
ear optical interaction is rare or weak in most color applications and we will not treat it
here.
When an external electric field is applied to an atom (or a molecule), its electrons are displaced slightly with respect to the positive charge (the nucleus) until their mutual attractive force balances out the external field. If
the field is strong enough, the charges can be separated and the atom (or the molecule) is
ionized. Otherwise, the atom (or the molecule) is said to be polarized. Two electric charges,
q and −q, separated by a distance d, form an electric dipole. It is represented by a vector whose direction is from the negative charge to the positive charge and whose magnitude,
p = qd, is called the dipole moment. An electromagnetic wave traveling through matter
induces dipole moments in the material. In a linear homogeneous material, when the electric
field is not too strong, the induced dipole moment per unit volume, P, is proportional to the
electric field, E:
P = ε0 χe(ν)E,    (5.1)
where ε0 = 8.8542 × 10^−12 m^−3 kg^−1 s^4 A^2 is the permittivity of free space and χe(ν) is
the electric susceptibility of the material. For many materials, the induced polarization P is
not even in the same direction as the electric field E, and the relation has to be expressed as a
3 × 3 matrix. Values of χe can be quite large for materials that consist of polar molecules,
such as water (χe = 79.4 at zero frequency, but drops to about 0.78 at optical frequencies,
because its permanent dipole moments cannot oscillate that fast). Glass has a χe (0) between
3 and 7, while polyethylene has a χe(0) of 1.26. The permittivity of a material is ε = ε0(1 + χe). For most dielectric materials, the permittivity is often larger than that of free
space, and, as we will see later, this is the major parameter that makes the speed of light
much slower in the dielectric material than in free space. It should be pointed out that
permittivity need not be a real number, meaning that an electromagnetic wave is attenuated
and/or delayed in phase when it propagates through the material.
The interaction of a magnetic field with matter is much weaker than that of an electric
field, except for certain ferromagnetic materials, such as iron, nickel, and cobalt, that retain
a substantial magnetization indefinitely after the external field is removed. In the presence
of an external magnetic field, materials become magnetized. There are two types of mag-
netization: paramagnetization (the magnetization is in the same direction as the external
field) and diamagnetization (the magnetization is in the opposite direction). Both phenom-
ena require quantum mechanical explanations. One can define magnetization in a similar
way to electric polarization. Thus, a magnetic dipole moment m (a vector) can be defined
as the contour integral of the current along a curved loop. For a planar loop, the definition
becomes the product of the current I and the area of the loop, a, i.e., m = I a, with the vector
direction normal to the area and the current running in the counterclockwise direction when
we look towards the vector. In a linear homogeneous material, when the magnetic field is
not too strong, the induced magnetic dipole moment per unit volume, M, is proportional to
the magnetic field, H:
M = χm (ν)H, (5.2)
where χm (ν) is called the magnetic susceptibility of the material and is usually a function
of the frequency of the magnetic field. The permeability µ of a material is µ = µ0(1 + χm), where µ0 = 1.2566 × 10^−6 m kg s^−2 A^−2 is the permeability of free space. For most materials, typical values of χm are 10^−5, and we can assume that µ ≈ µ0.
To study how light propagates in such a medium, consider a monochromatic plane wave,
E(r, t) = E0 exp[i(k · r − ωt)],    (5.3)
where E0 is a constant complex vector independent of the time, t, and the position vector, r; ω = 2πν is the angular frequency; and k is the wave vector.
From the Maxwell equations,
\[ \nabla \cdot \mathbf{D} = \rho, \tag{5.4} \]
\[ \nabla \cdot \mathbf{B} = 0, \tag{5.5} \]
\[ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \tag{5.6} \]
\[ \nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}, \tag{5.7} \]
where E is the electric field vector, B is the magnetic induction vector, H is the mag-
netic field vector, D is the electric displacement vector, J is the current density vector
due to the “free charge”, and ρ is the density of the “free charge”. The charged parti-
cles (mostly electrons and holes) that are free to move around in the material are called
the free charges, in contrast with the bound charges (such as valence electrons that are
bound to atoms or molecules). The effects due to the bound charges are accounted for
by the polarization vector and the magnetization vector. However, this distinction between
free charges and bound charges becomes quite arbitrary when the frequency of the ex-
ternal fields is very high, as in optical radiation. In a high-frequency oscillating field the
free charges do not move very far, and the bound charges also oscillate with the field.
When the free charges do not move further than the bound charges in a given oscillating
cycle, the distinction between them is difficult to make. Therefore, the definitions of the permittivity ε(ν) and the conductivity σ(ν) of a material at optical frequencies tend to become a matter of convention [38, pp. 776–9]. In general, the convention is to lump the response of all charged particles into the permittivity ε(ν). This will be discussed further
shortly.
For a linear, homogeneous, isotropic material, the electric displacement vector and the
magnetic field vector are related to the electric field vector and the magnetic induction
vector by
\[ \mathbf{D} = \epsilon_0 \mathbf{E} + \mathbf{P}, \tag{5.8} \]
\[ \mathbf{H} = \frac{1}{\mu_0} \mathbf{B} - \mathbf{M}, \tag{5.9} \]
where P is the electric polarization vector and M is the magnetization vector. These quantities are further related by
\[ \mathbf{J} = \sigma \mathbf{E}, \tag{5.10} \]
\[ \mathbf{P} = \epsilon_0 \chi_e \mathbf{E}, \tag{5.11} \]
\[ \mathbf{D} = \epsilon_0 \mathbf{E} + \mathbf{P} = \epsilon_0 (1 + \chi_e) \mathbf{E} = \epsilon \mathbf{E}, \tag{5.12} \]
\[ \mathbf{B} = \mu_0 (\mathbf{H} + \mathbf{M}) = \mu_0 (1 + \chi_m) \mathbf{H} = \mu \mathbf{H}. \tag{5.13} \]
Substituting Eqs. (5.10), (5.12), and (5.13) into Eqs. (5.4)–(5.7), we have
\[ \nabla \cdot \mathbf{E} = \frac{\rho}{\epsilon}, \tag{5.14} \]
\[ \nabla \cdot \mathbf{H} = 0, \tag{5.15} \]
\[ \nabla \times \mathbf{E} = -\mu \frac{\partial \mathbf{H}}{\partial t}, \tag{5.16} \]
\[ \nabla \times \mathbf{H} = \sigma \mathbf{E} + \epsilon \frac{\partial \mathbf{E}}{\partial t}. \tag{5.17} \]
Almost all the materials we encounter every day are electrically neutral, i.e., there is no (or
negligible) net charge on them, and we have ρ = 0. For typical conductors in the presence of
an external field, any electric charge density will decay exponentially with a time constant
on the order of 10^−18 s [124, p. 612]. From Eqs. (5.16), (5.17), and the vector identity ∇ × (∇ × E) = ∇(∇ · E) − ∇²E,
\[ \nabla^2 \mathbf{E} = \mu \sigma \frac{\partial \mathbf{E}}{\partial t} + \mu \epsilon \frac{\partial^2 \mathbf{E}}{\partial t^2}. \tag{5.18} \]
Substituting Eq. (5.3) into Eq. (5.18), we have:
\[ \mathbf{k} \cdot \mathbf{k} = \omega^2 \mu \epsilon + i \omega \mu \sigma = \omega^2 \mu \left( \epsilon + i\,\frac{\sigma}{\omega} \right) = \omega^2 \mu \hat{\epsilon}, \tag{5.19} \]
where ε̂ = ε + i(σ/ω) is called the complex permittivity. (At optical frequencies, the definition of ε and σ is not unique, as will be discussed later. We will relate the complex permittivity to the complex index of refraction.) Therefore, the wave vector k is also a complex vector.
We now express the wave vector as:
\[ \mathbf{k} = \mathbf{k}_1 + i \mathbf{k}_2, \tag{5.20} \]
where k1 and k2 are real vectors. Substituting Eq. (5.20) into Eq. (5.3), the field varies as exp(−k2 · r) exp[i(k1 · r − ωt)], so that k1 determines the phase and k2 the attenuation of the wave.
Substituting Eq. (5.20) into Eq. (5.19), we can solve for the magnitudes of k1 and k2 in
terms of the material optical constants (which are functions of frequency). The directions
of k1 and k2 have to be solved from boundary conditions. It should be noted that k1 and
k2 are not necessarily pointing in the same direction. The vector k1 is perpendicular to the
surface of constant phase, while the vector k2 is perpendicular to the surface of constant
amplitude. When k1 and k2 are in the same direction (or when k2 = 0), the wave is said
to be homogeneous. Otherwise, it is said to be inhomogeneous. For example, a plane wave
entering an absorbing material (such as a metal) at an oblique angle will be attenuated
according to the distance perpendicular to the interface (i.e., k2 is normal to the interface),
while its wave vector k1 is at an oblique angle with the interface. Therefore, the wave is
inhomogeneous. On the other hand, a wave propagating in a vacuum is homogeneous.
The surface of constant phase (k1 · r − ωt = constant) travels at a velocity v:
\[ v = \frac{dr}{dt} = \frac{\omega}{k_1}. \tag{5.22} \]
By definition, the index of refraction, n, of a medium is the ratio of the phase velocity of
light in vacuum (free space) to that in the medium:
\[ n = \frac{c}{v} = \frac{c}{\omega}\, k_1. \tag{5.23} \]
The complex wave vector k = k1 + ik2 is used to define the complex index of refraction n̂:
\[ \hat{n} = \frac{c}{\omega}\, k = \frac{c}{\omega}\, k_1 + i\, \frac{c}{\omega}\, k_2 = n + i\kappa. \tag{5.24} \]
The optical properties of a material are usually characterized by the complex index of refrac-
tion, because it can be derived from quantities that are directly measurable by experiments
[120, p. 41]. There are published tables and books of n and κ for many different materials
[739, 740].
The amplitude of the wave is attenuated in an absorbing material. The rate of attenuation
α (the absorption coefficient) is defined as:
\[ \alpha = -\frac{1}{I} \frac{dI}{dr}, \tag{5.25} \]
where I is the radiance of the light beam and is proportional to the square of the amplitude
of the electric field. Therefore,
\[ \alpha = 2 k_2 = \frac{2 \omega \kappa}{c}. \tag{5.26} \]
In a nonconducting material for which σ = 0, Eq. (5.19) gives us k1 = ω√(µε) and k2 = 0. Therefore, from Eq. (5.22), we have
\[ v = \frac{1}{\sqrt{\mu \epsilon}}, \tag{5.27} \]
where ε and µ are the permittivity and the permeability of the material. Thus, the speed of light in vacuum is c = 1/√(ε0µ0). These two are the special cases that we are most familiar with. In general, the expression for the phase velocity (Eq. (5.22)) is more complicated.
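As a numerical illustration of Eq. (5.27) (a sketch of our own, using the optical-frequency susceptibility of water quoted earlier in this chapter), the index of refraction of a nonconducting, nonmagnetic medium is n = c/v ≈ √(1 + χe):

import math

EPS0 = 8.8542e-12   # permittivity of free space [F m^-1]
MU0 = 1.2566e-6     # permeability of free space [H m^-1]

def index_of_refraction(chi_e: float, chi_m: float = 0.0) -> float:
    """n = c/v = sqrt(eps * mu / (eps0 * mu0)), from Eq. (5.27)."""
    eps = EPS0 * (1.0 + chi_e)
    mu = MU0 * (1.0 + chi_m)
    return math.sqrt((eps * mu) / (EPS0 * MU0))

# Water: chi_e ~ 0.78 at optical frequencies gives n ~ 1.33, the familiar value;
# the static value chi_e ~ 79.4 would give ~9, which is why dispersion matters.
print(round(index_of_refraction(0.78), 2))
print(round(index_of_refraction(79.4), 1))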
From Eqs. (5.24) and (5.19), we can derive expressions for ε and σ in terms of n and κ:
\[ \epsilon = \frac{\mu_0 \epsilon_0}{\mu} (n^2 - \kappa^2), \tag{5.28} \]
\[ \sigma = \frac{\omega \mu_0 \epsilon_0}{\mu} (2 n \kappa). \tag{5.29} \]
As we discussed before, ε accounts for the effect due to the bound charges and σ for that of the free charges. At optical frequencies, the distinction between the two effects becomes blurred. One can continue to think of the electrons in the conduction band as the free charges and the electrons in the valence band as the bound charges. In terms of externally measurable optical properties, they are not easy (if it is possible at all) to distinguish, and it is customary to lump the two together and define a complex permittivity ε̂ to account for their combined effect on the propagation of a light wave:
\[ \hat{\epsilon} = \hat{n}^2 \epsilon_0 \mu_0 / \mu = \epsilon_1 + i \epsilon_2. \tag{5.30} \]
The real part ε1 and the imaginary part ε2 of ε̂ can be expressed in terms of n and κ:
\[ \epsilon_1 = \frac{\mu_0 \epsilon_0}{\mu} (n^2 - \kappa^2), \tag{5.31} \]
\[ \epsilon_2 = \frac{\mu_0 \epsilon_0}{\mu} (2 n \kappa). \tag{5.32} \]
This pair of equations will be important when we discuss optical dispersion in the next section. The dispersion relation will be derived for ε̂ and then be related back to the measurable quantities n and κ through these two equations.
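For concreteness, here is a small sketch (assuming a nonmagnetic medium, µ = µ0; the particular n and κ values below are illustrative, not tabulated data from [739, 740]) that converts the measurable pair (n, κ) into the complex permittivity of Eqs. (5.31) and (5.32) and evaluates the absorption coefficient of Eq. (5.26):

import math

EPS0 = 8.8542e-12   # permittivity of free space [F m^-1]
C = 2.99792458e8    # speed of light in vacuum [m/s]

def complex_permittivity(n: float, kappa: float) -> complex:
    """eps_hat = eps1 + i*eps2 for mu = mu0 (Eqs. (5.31), (5.32))."""
    return EPS0 * ((n**2 - kappa**2) + 1j * (2.0 * n * kappa))

def absorption_coefficient(kappa: float, wavelength_m: float) -> float:
    """alpha = 2*omega*kappa/c = 4*pi*kappa/lambda (Eq. (5.26))."""
    omega = 2.0 * math.pi * C / wavelength_m
    return 2.0 * omega * kappa / C

# Illustrative glass-like values at 550 nm: n = 1.5, kappa = 1e-7.
print(complex_permittivity(1.5, 1e-7))
print(f"{absorption_coefficient(1e-7, 550e-9):.2f} m^-1")  # a weakly absorbing medium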
The optical dispersion of a material is characterized by its complex index of refraction, n̂(ν) = n(ν) + iκ(ν), in which both the real part, n(ν), and the imaginary part, κ(ν), are functions of the optical
frequency ν. For a typical glass, the imaginary part κ(ν) is quite small for most visible
frequencies, and the real part, n(ν), the index of refraction, is greater than 1. In this case, n
can be comfortably related to our familiar concept, the ratio of the phase velocity of light
in vacuum to that in the medium. As it turns out, for some materials, n can be less than 1
(meaning that the speed of light in the medium is greater than that in vacuum), or can even be
negative. We may feel quite uncomfortable about the idea that the phase velocity of light in
the medium is greater than c, the speed of light in vacuum, since this seemingly contradicts
the fundamental assumption of the theory of relativity. However, it can be proved that this
is not so [120, p. 236]. It can be shown that as long as n(ν) and κ(ν) satisfy the following
Kramers–Kronig relations, signals cannot be propagated with a speed greater than c. The
Kramers–Kronig relations are derived from the causality principle:
\[ n(\omega) - 1 = \frac{2}{\pi} \int_0^{\infty} \frac{\omega' \kappa(\omega')}{\omega'^2 - \omega^2}\, d\omega', \tag{5.34} \]
\[ \kappa(\omega) = \frac{-2\omega}{\pi} \int_0^{\infty} \frac{n(\omega')}{\omega'^2 - \omega^2}\, d\omega', \tag{5.35} \]
where ω = 2πν and the integral is understood to be the Cauchy principal value [402, pp. 592–6], meaning that the integral is taken excluding a small circle of ω′ around ω with the radius of the circle approaching zero. The Kramers–Kronig relations are valid for
all causal systems (and for that matter, all the optical phenomena we are interested in) and
therefore are frequently used to compute n(ν) from κ(ν), or vice versa.
The classical Drude–Lorentz model of optical dispersion assumes that an electron moves
under the influence of the electromagnetic field of the incident light wave. The motion also
experiences some resistive or frictional energy loss. The resulting equation for the forced
motion can be expressed as
\[ m \frac{d^2 r}{dt^2} + m \gamma \frac{dr}{dt} + m \omega_0^2 r = q E_l\, e^{-i\omega t}, \tag{5.36} \]
where r, m, and q are the position, the mass, and the electric charge of the electron, E_l is the local electric field, γ represents a damping mechanism, and the term mω0²r represents a spring-like Hooke's law restoring force. Now let us solve the equation for a steady-state solution. The solution r is
\[ r = \frac{q E_l / m}{(\omega_0^2 - \omega^2) - i \gamma \omega}. \tag{5.37} \]
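Equation (5.37) already contains the essence of dispersion: the amplitude of the driven electron is large and rapidly changing near the resonance and small far from it. A small numerical sketch (the resonance and damping values are illustrative choices of ours, not material data):

import cmath

Q = 1.602e-19     # electron charge [C]
M_E = 9.109e-31   # electron mass [kg]

def oscillator_amplitude(omega: float, omega0: float, gamma: float,
                         e_local: float = 1.0) -> complex:
    """Steady-state solution r of the driven, damped electron, Eq. (5.37)."""
    return (Q * e_local / M_E) / ((omega0**2 - omega**2) - 1j * gamma * omega)

OMEGA0, GAMMA = 2.0e16, 1.0e14   # an ultraviolet resonance with modest damping
for omega in (0.5 * OMEGA0, 0.99 * OMEGA0, 1.0 * OMEGA0, 1.5 * OMEGA0):
    r = oscillator_amplitude(omega, OMEGA0, GAMMA)
    print(f"omega/omega0 = {omega / OMEGA0:4.2f}  |r| = {abs(r):.2e}  "
          f"phase = {cmath.phase(r):+.2f} rad")
# The amplitude peaks and the phase swings from ~0 toward pi across the resonance,
# which is the origin of absorption and anomalous dispersion near omega0.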
As we discussed before, the forced motion of a charged particle induces an oscillating dipole moment, p = qr, proportional to the local field E_l; the proportionality factor is the (complex) polarizability, α̂, of the atom or molecule, and summing the dipole moments of the N atoms or molecules per unit volume gives the induced polarization P.
We now have to relate the averaged local electric field back to the externally applied macro-
scopic electric field E. The resulting dispersion function will be different for gas, solid, and
liquid, with metal as a special case.
Dispersion in gases
In a gas, atoms or molecules are so far apart that the local field is essentially the same as
the macroscopic electric field. Therefore, we have
\[ \mathbf{P} = N \hat{\alpha} \mathbf{E} \equiv \epsilon_0 \chi_e \mathbf{E} \tag{5.41} \]
and
\[ \hat{\epsilon} = \epsilon_0 (1 + \chi_e) = \epsilon_0 \left( 1 + \frac{N \hat{\alpha}}{\epsilon_0} \right) = \epsilon_0 \left[ 1 + \frac{N q^2}{\epsilon_0 m} \sum_j \frac{f_j}{(\omega_j^2 - \omega^2) - i \gamma_j \omega} \right]. \tag{5.42} \]
Recall that the complex index of refraction, n̂ = n + iκ, is related to the complex permittivity by ε̂ = n̂²ε0µ0/µ. Therefore,
\[ \hat{n}^2 = \frac{\mu}{\mu_0} \left[ 1 + \frac{N q^2}{\epsilon_0 m} \sum_j \frac{f_j}{(\omega_j^2 - \omega^2) - i \gamma_j \omega} \right]. \tag{5.43} \]
For a dielectric material, in the frequency region where ω is slightly lower than one
resonant frequency ω j , but much higher than the next lower resonant frequency, the index
of refraction, n, is an increasing function of frequency. For example, glasses have natural
resonant frequencies (around 3 × 10^15 Hz) in the ultraviolet region, and their indices of
refraction are higher for blue light than for red light.
and
\[ \hat{n}^2 = \frac{\mu}{\mu_0} \left[ 1 + \frac{N_f q^2}{\epsilon_0 m} \frac{1}{-\omega^2 - i \gamma_f \omega} + \frac{N q^2}{\epsilon_0 m} \sum_j \frac{f_j}{(\omega_j^2 - \omega^2) - i \gamma_j \omega} \right]. \tag{5.50} \]
If we neglect all the terms associated with the bound electrons and the effect due to the friction γf, we can simplify the expression for n to
\[ n^2(\omega) \approx \frac{\mu}{\mu_0} \left( 1 - \frac{\omega_p^2}{\omega^2} \right), \tag{5.51} \]
where ωp = √(Nf q²/(ε0 m)) is called the plasma frequency. The charge on an electron, q, is 1.602 × 10^−19 C. The electron mass is 9.109 × 10^−31 kg. The free electron density Nf is about 10^23 cm^−3 for metals and typical values of the plasma frequency for metals are in the ultraviolet region (ωp > 10^15 Hz). When the incident wave frequency is far above the plasma
frequency, the index of refraction approaches 1 and the metal is transparent. At frequencies
below ωp , the index of refraction for a metal is a complex number and electromagnetic
waves can only penetrate a fraction of the wavelength into the metal. The majority of metals
reflect almost all the incident light regardless of wavelength and are therefore essentially
colorless. If a metal has a particular color, it means that the bound electrons of the atoms
are participating in the absorption process, in addition to the free electrons. For example, in
copper, the d-bands absorption begins at λ = 600 nm (about 1.93 eV) and the metal looks
reddish.
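As a rough check on the numbers quoted above, here is a short sketch (using the free-electron density from the text; real metals differ in detail) evaluating the plasma frequency that appears in Eq. (5.51):

import math

Q = 1.602e-19       # electron charge [C]
M_E = 9.109e-31     # electron mass [kg]
EPS0 = 8.8542e-12   # permittivity of free space [F m^-1]
C = 2.99792458e8    # speed of light in vacuum [m/s]

def plasma_frequency(n_free_per_cm3: float) -> float:
    """omega_p = sqrt(N_f q^2 / (eps0 m)) in rad/s."""
    n_free = n_free_per_cm3 * 1.0e6   # convert cm^-3 to m^-3
    return math.sqrt(n_free * Q**2 / (EPS0 * M_E))

omega_p = plasma_frequency(1.0e23)            # N_f ~ 1e23 cm^-3, as in the text
wavelength_nm = 2.0 * math.pi * C / omega_p * 1e9
print(f"omega_p = {omega_p:.2e} rad/s, wavelength = {wavelength_nm:.0f} nm")
# -> about 1.8e16 rad/s, i.e. near 106 nm, well into the ultraviolet.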
Although QED describes the atomic, molecular, and crystal levels of light–matter inter-
action very well, practical applications in radiometry, photometry, colorimetry, and color
imaging involve light–matter interaction on a much larger scale where the detailed spatial
structures of the matter are too complicated for the quantum mechanical calculation to be
feasible. Descriptions that are based on electromagnetic theory of simplified models of
matter are thus widely used for light–matter interaction on this macroscopic scale. Five
such descriptions that are particularly relevant to color imaging are reflection, refraction,
transmission, absorption, and scattering. These processes are usually described in terms
of monochromatic, time-harmonic electromagnetic plane waves, not only because the ana-
lytical solution is easier, but more importantly because: (1) a monochromatic field can be
regarded as a Fourier component of an arbitrary field; and (2) in general, at a distance far
from the source and on a scale such that variations in the electromagnetic field are small, the
field behaves locally as a plane wave (see, e.g., [135, p. 121]). It should be emphasized that
the analytic descriptions of these “laws” are true only for monochromatic (single-wavelength) light, and the optical parameters of a material, such as the reflectance, the index of refraction, and the absorption coefficient, are functions of wavelength. Proper scales of material homo-
and absorption coefficients, are functions of wavelength. Proper scales of material homo-
geneity and surface smoothness relative to the wavelength of light of interest should also
be considered when interpreting the optical properties of a material.
In studying the optical scene formation processes, it should be noted that although we use
different terms (such as reflection, scattering, etc.) to describe the various aspects of light–
matter interaction, the reason for doing so is mostly historical. All their underlying causes
are actually the same – they are all the result of the interaction of light and the charged
particles (mainly the electrons) in matter. Reflection, refraction, scattering, diffraction,
absorption, transmission, polarization, and interference were studied empirically long before
any comprehensive theory of light was available. The distinctions between these phenomena
are mostly artificial and we should not try to classify them too rigidly.
Even after the classical electromagnetic theory of light was developed, phenomenological
derivation of various optical laws, such as Snell’s law of refraction and Fresnel’s formula for
reflection, often relies on solving the Maxwell equations with surface boundary conditions,
as if light is refracted or reflected only by the object surface and the rest of the matter
beneath it is irrelevant. In fact, the reflected light should be treated as the interference of the
secondary radiation from all the matter the surface encloses [121]. It just happens that these
secondary waves cancel out so neatly that the electromagnetic field solution can be derived
from the surface boundary conditions. If we were to remove most matter from under the
surface, the reflection would be quite different. With the right thickness of material left, the
reflected component of the light may disappear completely.
With these warnings and qualifiers, we will discuss the various aspects of light–matter
interaction in traditional terms. They offer us a common language with the current literature
and often a conventional list of topics to study.
Figure 5.1. The convention for the coordinate systems for the Fresnel equations.
The relevant mathematical background can be found in many books on calculus or complex variables (e.g., [402, Chapter 10]). The
relations describing the incident, reflected, and transmitted electromagnetic fields are called
the Fresnel equations. The coordinate systems (see Fig. 5.1) we will be using are the same
as the one we used in the previous chapter for describing the light polarization.
Nonabsorbing media
When a monochromatic plane wave encounters a boundary between two homogeneous
media of different optical properties, part of it is reflected back into the first (incident)
medium and part of it is transmitted through the second medium. If the boundary between
the two media is “ideally smooth”, the reflected wave vector is on the same plane (plane
of incidence) as the surface normal and the incident wave vector. The angle of reflection
is equal to the angle of incidence. The transmitted wave does not propagate in the same
direction as the incident wave unless the wave is normal to the boundary. The change in
direction is called refraction. The transmitted wave is also on the plane of incidence. The
incident angle, the reflected angle, and the transmitted angle are denoted by θi , θr and θt .
Snell’s law states that
\[ n_i \sin\theta_i = n_t \sin\theta_t, \tag{5.52} \]
where n i and n t are the indices of refraction of the incidence medium and the transmittance
medium, respectively. The proportions of energy in the reflected and refracted components
depend on the polarization of the plane wave, which is usually decomposed into a component parallel (∥) to the plane of incidence and a component perpendicular (⊥) to it. We will denote the parallel component of the amplitude of the electric field of the incident wave by E_i∥, and the perpendicular component by E_i⊥, and so on. Let the permeability of the
first medium be µi, and that of the second medium be µt; then the amplitude reflection coefficients, r∥ and r⊥, and the amplitude transmission coefficients, t∥ and t⊥, are [385, pp. 94–6]:
\[ r_{\perp} = \left( \frac{E_r}{E_i} \right)_{\perp} = \frac{(n_i/\mu_i) \cos\theta_i - (n_t/\mu_t) \cos\theta_t}{(n_i/\mu_i) \cos\theta_i + (n_t/\mu_t) \cos\theta_t}, \tag{5.53} \]
\[ t_{\perp} = \left( \frac{E_t}{E_i} \right)_{\perp} = \frac{2 (n_i/\mu_i) \cos\theta_i}{(n_i/\mu_i) \cos\theta_i + (n_t/\mu_t) \cos\theta_t}, \tag{5.54} \]
\[ r_{\parallel} = \left( \frac{E_r}{E_i} \right)_{\parallel} = \frac{(n_t/\mu_t) \cos\theta_i - (n_i/\mu_i) \cos\theta_t}{(n_t/\mu_t) \cos\theta_i + (n_i/\mu_i) \cos\theta_t}, \tag{5.55} \]
\[ t_{\parallel} = \left( \frac{E_t}{E_i} \right)_{\parallel} = \frac{2 (n_i/\mu_i) \cos\theta_i}{(n_t/\mu_t) \cos\theta_i + (n_i/\mu_i) \cos\theta_t}. \tag{5.56} \]
These coefficients are called Fresnel reflection and transmission coefficients. At the normal incident angle, θi = 0,
\[ r_{\parallel} = -r_{\perp} = \frac{n_t - n_i}{n_t + n_i}, \tag{5.57} \]
\[ t_{\parallel} = t_{\perp} = \frac{2 n_i}{n_t + n_i}. \tag{5.58} \]
For a glass with n t = 1.5 in air (i.e., n i ≈ 1.0), the amplitude reflection coefficient is 0.2.
Since the radiant power is proportional to the amplitude squared, the radiant flux (power)
reflection coefficient is 0.2 × 0.2 = 0.04.
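A minimal sketch of Eqs. (5.52)–(5.58) for nonmagnetic media (µi = µt; the function name is ours), reproducing the 4% reflectance of the air–glass interface quoted above:

import math

def amplitude_reflection(n_i: float, n_t: float, theta_i: float):
    """(r_perp, r_par) from Eqs. (5.53) and (5.55), with theta_t from Snell's law."""
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_perp = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    r_par = (n_t * ci - n_i * ct) / (n_t * ci + n_i * ct)
    return r_perp, r_par

# Air to glass (n_i = 1.0, n_t = 1.5) at normal incidence.
r_perp, r_par = amplitude_reflection(1.0, 1.5, 0.0)
print(r_perp, r_par)        # -> -0.2 and 0.2, as in Eq. (5.57)
print(r_par ** 2)           # -> 0.04: 4% of the radiant power is reflected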
It is interesting to note that since the perpendicular and parallel components are atten-
uated differently, the reflected light beam and the transmitted light beam have different
polarizations from the incident light beam. We can use the Stokes vectors for these beams
and derive the input–output relations between them. These relations are the Mueller matri-
ces for the planar dielectric interface. Let Si , Sr , and St be the Stokes vectors for the incident,
reflected, and transmitted beams, respectively. Since the Mueller matrix will be determined
on the interface plane, all observables (radiant flux density) in the Stokes vector should be
projected onto the plane. Therefore,
\[ \mathbf{S}_i = \frac{n_i \cos\theta_i}{2 \mu_i c} \begin{bmatrix} (E_i)_{\perp}^2 + (E_i)_{\parallel}^2 \\ (E_i)_{\perp}^2 - (E_i)_{\parallel}^2 \\ 2 (E_i)_{\perp} (E_i)_{\parallel} \cos\delta \\ 2 (E_i)_{\perp} (E_i)_{\parallel} \sin\delta \end{bmatrix}, \tag{5.59} \]
\[ \mathbf{S}_r = \frac{n_r \cos\theta_r}{2 \mu_r c} \begin{bmatrix} (E_r)_{\perp}^2 + (E_r)_{\parallel}^2 \\ (E_r)_{\perp}^2 - (E_r)_{\parallel}^2 \\ 2 (E_r)_{\perp} (E_r)_{\parallel} \cos\delta \\ 2 (E_r)_{\perp} (E_r)_{\parallel} \sin\delta \end{bmatrix}, \tag{5.60} \]
\[ \mathbf{S}_t = \frac{n_t \cos\theta_t}{2 \mu_t c} \begin{bmatrix} (E_t)_{\perp}^2 + (E_t)_{\parallel}^2 \\ (E_t)_{\perp}^2 - (E_t)_{\parallel}^2 \\ 2 (E_t)_{\perp} (E_t)_{\parallel} \cos\delta \\ 2 (E_t)_{\perp} (E_t)_{\parallel} \sin\delta \end{bmatrix}, \tag{5.61} \]
where δ is the phase difference between the perpendicular and parallel components, (δ⊥ − δ∥). Since nr = ni, µr = µi, and θr = θi, the Mueller matrix for reflection, Mr, is:
\[ \mathbf{S}_r = M_r \mathbf{S}_i = \frac{1}{2} \begin{bmatrix} r_{\perp}^2 + r_{\parallel}^2 & r_{\perp}^2 - r_{\parallel}^2 & 0 & 0 \\ r_{\perp}^2 - r_{\parallel}^2 & r_{\perp}^2 + r_{\parallel}^2 & 0 & 0 \\ 0 & 0 & 2 r_{\perp} r_{\parallel} & 0 \\ 0 & 0 & 0 & 2 r_{\perp} r_{\parallel} \end{bmatrix} \mathbf{S}_i, \tag{5.62} \]
Absorbing media
In our discussion of reflection and refraction involving absorbing media, we will use the
complex index of refraction n̂. All the formulas presented above remain the same in form,
but with all the ns replaced by n̂s. The meaning of the sine and cosine of a complex number
thus needs a new interpretation. The definitions of elementary functions of a complex
variable can be given in terms of polynomials once the products of two complex numbers
are defined [402]. The following functions will serve our purpose here:
\[ e^z = 1 + z + \frac{1}{2!} z^2 + \frac{1}{3!} z^3 + \cdots; \tag{5.64} \]
\[ \sinh z = \frac{1}{2} (e^z - e^{-z}); \tag{5.65} \]
\[ \cosh z = \frac{1}{2} (e^z + e^{-z}); \tag{5.66} \]
\[ \sin z = \sin(x + iy) = (\sin x \cosh y) + i (\cos x \sinh y); \tag{5.67} \]
\[ \cos z = \cos(x + iy) = (\cos x \cosh y) - i (\sin x \sinh y). \tag{5.68} \]
With the above mathematical definitions of complex functions, let us present the formulas
for the absorbing media. Snell’s law can still be derived from the boundary conditions that
the tangential components of E and H are continuous across the interface and so are the
normal components of εE and µH. We obtain the following Snell’s law:
\[ \hat{n}_i \sin\hat{\theta}_i = \hat{n}_t \sin\hat{\theta}_t, \tag{5.69} \]
where n̂ i and n̂ t are the complex indices of refraction for the incidence medium and the
transmittance medium, respectively. The incident angle θ̂i and the transmission angle θ̂t can
both now be complex numbers. The physical interpretation of complex angles requires us to
go back to the wave vector and find out how the surface of constant phase and the surface of
constant amplitude propagate through the second medium. An example of such derivation
can be found in [124, Section 13.2].
The amplitude reflection coefficients, r̂∥ and r̂⊥, and the amplitude transmission coefficients, t̂∥ and t̂⊥, become:
\[ \hat{r}_{\perp} = \left( \frac{E_r}{E_i} \right)_{\perp} = \frac{(\hat{n}_i/\mu_i) \cos\hat{\theta}_i - (\hat{n}_t/\mu_t) \cos\hat{\theta}_t}{(\hat{n}_i/\mu_i) \cos\hat{\theta}_i + (\hat{n}_t/\mu_t) \cos\hat{\theta}_t}, \tag{5.70} \]
\[ \hat{t}_{\perp} = \left( \frac{E_t}{E_i} \right)_{\perp} = \frac{2 (\hat{n}_i/\mu_i) \cos\hat{\theta}_i}{(\hat{n}_i/\mu_i) \cos\hat{\theta}_i + (\hat{n}_t/\mu_t) \cos\hat{\theta}_t}, \tag{5.71} \]
\[ \hat{r}_{\parallel} = \left( \frac{E_r}{E_i} \right)_{\parallel} = \frac{(\hat{n}_t/\mu_t) \cos\hat{\theta}_i - (\hat{n}_i/\mu_i) \cos\hat{\theta}_t}{(\hat{n}_t/\mu_t) \cos\hat{\theta}_i + (\hat{n}_i/\mu_i) \cos\hat{\theta}_t}, \tag{5.72} \]
\[ \hat{t}_{\parallel} = \left( \frac{E_t}{E_i} \right)_{\parallel} = \frac{2 (\hat{n}_i/\mu_i) \cos\hat{\theta}_i}{(\hat{n}_t/\mu_t) \cos\hat{\theta}_i + (\hat{n}_i/\mu_i) \cos\hat{\theta}_t}. \tag{5.73} \]
The reflectances are defined as R∥ = |r̂∥|² and R⊥ = |r̂⊥|², both being real numbers.
The interpretation of these complex reflection and transmission coefficients is simple. They
represent the amplitude as well as the phase change of the complex, incident electric field
vector, Ei , upon reflection and transmission. An important special case is that of the normal
incident angle, θ̂i = 0:
\[ \hat{r}_{\parallel} = -\hat{r}_{\perp} = \frac{\hat{n}_t - \hat{n}_i}{\hat{n}_t + \hat{n}_i}, \tag{5.74} \]
\[ \hat{t}_{\parallel} = \hat{t}_{\perp} = \frac{2 \hat{n}_i}{\hat{n}_t + \hat{n}_i}. \tag{5.75} \]
The reflectance at the normal incidence is:
\[ R_{\parallel} = R_{\perp} = \frac{(n_t - n_i)^2 + (\kappa_t - \kappa_i)^2}{(n_t + n_i)^2 + (\kappa_t + \kappa_i)^2}. \tag{5.76} \]
Equation (5.76) shows that when κt is very much greater than κi , i.e., when the absorption
is very high (κ is proportional to the absorption coefficient), the reflectance is also very
high. This is very typical for metals. They are strong reflectors and also strong absorbers (so
strong that light cannot travel through more than a wavelength or so before it disappears).
Actually, most light energy is reflected back.
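A one-function sketch of Eq. (5.76) for an interface with air (ni ≈ 1, κi ≈ 0); the metal-like n and κ below are illustrative values of ours rather than tabulated optical constants:

def normal_incidence_reflectance(n_t: float, kappa_t: float,
                                 n_i: float = 1.0, kappa_i: float = 0.0) -> float:
    """Reflectance at normal incidence from Eq. (5.76)."""
    return (((n_t - n_i) ** 2 + (kappa_t - kappa_i) ** 2)
            / ((n_t + n_i) ** 2 + (kappa_t + kappa_i) ** 2))

print(normal_incidence_reflectance(1.5, 0.0))   # glass-like: ~0.04
print(normal_incidence_reflectance(1.0, 6.0))   # strongly absorbing, metal-like: 0.9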
5.4.2 Scattering
Scattering is the process in which energy is removed from a beam of light and reemitted with
a change in direction, phase, or wavelength. This definition is general, but not universally
accepted. In fact, the term scattering as used in optical phenomena refers to many different
special cases of light–matter interaction [120, Chapter 1]. In the most general meaning,
scattering is the excitation of charged particles by, and the subsequent reradiation of, elec-
tromagnetic waves. The key point is that the reradiated waves are in all directions and the
sum of the reradiated waves that emerge at a given spatial position is very much depen-
dent on the physical composition and structure of the material and medium involved. The
specular reflection of a plane wave by a mirror is not typically considered to be a scattering
process, but the excitation and reradiation process is no different from diffuse reflection
from a rough surface. The unique feature in the former is that the mirror, because it is so smooth, produces a very directional path of constructive interference, so that the reflected
wave propagates “only” in that direction [307]. We will not adopt this broad definition
of scattering here, but instead opt for a narrower and more traditional phenomenological
definition.
For our applications, it is simpler to say that due to the presence of particles of index
of refraction different from the surrounding medium, light changes its direction and phase.
If the frequency is not changed, this is called elastic scattering; if it is changed, it is called inelastic
scattering. One of the best-known examples of scattering is that by particles much smaller
than the wavelengths of the light (known as Rayleigh scattering [921]), such as the scattering
of sunlight by air molecules to produce blue skies and red sunsets. The general problem of
scattering by a spherical particle has been solved several times independently [600]. A well-
known publication is that by Gustav Mie [662], published in 1908, about the scattering
of a plane monochromatic wave by a homogeneous sphere in a homogeneous medium. The
scattering of light by multiple spheres can be found by applying the Mie solution to each of
the spheres and summing the scattered light from each sphere. When the radius of the sphere
is much smaller than the wavelength, Mie’s solution approaches Rayleigh scattering. In that
case, the scattered energy is inversely proportional to the fourth power of the wavelength.
Light at 450 nm will be scattered 4 times more than that at 650 nm. Without air, the sky
would look dark but with air, blue light is scattered more than green or red light and the
sky looks cyan blue. For particles that are much larger than the wavelength, the scattering
is less dependent on the wavelength. Transparent materials such as glass when ground into
small particles become white and opaque, because light is scattered at the many interfaces
of the particles and is diffusely directed to all angles, instead of going through the material
in a regular direction. Sugar, salt, clouds, and snow appear white because their particles or
crystals are much larger than the visible wavelengths.
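The wavelength dependence cited above is easy to check numerically (a one-line sketch; the λ^−4 law is only approximate, as noted at the end of this subsection, because n itself varies with λ):

def rayleigh_ratio(lambda_short_nm: float, lambda_long_nm: float) -> float:
    """Approximate ratio of Rayleigh-scattered power, proportional to lambda^-4."""
    return (lambda_long_nm / lambda_short_nm) ** 4

print(rayleigh_ratio(450.0, 650.0))   # -> ~4.4: blue light is scattered about four times as much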
Scattering not only redistributes energy from a beam of light in different directions, but it
can also change the state of polarization of the scattered light. It is well known that skylight
is polarized even though sunlight is unpolarized. The description of light scattering thus
requires the use of a scattering matrix, which is a Mueller matrix that transforms the Stokes
vector of the incident light beam to that of the scattered light beam as a function of the
scattering angle. The conventional coordinate system takes the propagation direction of the
incident ray as the z-axis, which is called the forward direction (see Figure 5.2).

Figure 5.2. The convention for the coordinate systems for scattering.

A scattered
ray r and the incident ray form a plane, called the scattering plane for that scattered ray. The
angle between the scattered ray and the incident ray is called the scattering angle, θ. The
Stokes vectors are then defined relative to the scattering plane. The incident and scattered
electric field vectors are both decomposed into one component that is perpendicular to the
scattering plane and another component that is parallel to the scattering plane.
The quantitative description of Rayleigh scattering can be derived from the induced
dipole field of the air molecules and is given by the following equation [453, p. 423] in
terms of the total scattering cross-section per molecule of the gas, σs :
\[ \sigma_s \approx \frac{32 \pi^3}{3 N^2 \lambda^4} \left| n - 1 \right|^2, \tag{5.77} \]
where N is the number of molecules per unit volume, λ is the wavelength in vacuum, and n is the index of refraction of the air, assuming n − 1 ≪ 1. A somewhat simplified
calculation given by Jackson [453, p. 423] shows that in typical conditions, the percentages
of transmission through the atmosphere for the wavelengths 650 nm and 410 nm are 96%
and 76% for the sun at zenith, and 21% and 0.0065% for the sunrise/sunset. Rayleigh
scattering greatly contributes to the red sky of the beautiful sunset.
Alternatively, Rayleigh scattering can also be derived from the Mie scattering by summing
up the first few terms in a series expansion, under the assumption that the particle size is
much smaller than the wavelength. The relation can be expressed by the scattering matrix
[120, p. 132]:
\[ \mathbf{I}_s = \frac{16 \pi^4 a^6}{\lambda^4 r^2} \left| \frac{m^2 - 1}{m^2 + 2} \right|^2 \begin{bmatrix} \frac{1}{2}(1 + \cos^2\theta) & \frac{1}{2}(\cos^2\theta - 1) & 0 & 0 \\ \frac{1}{2}(\cos^2\theta - 1) & \frac{1}{2}(1 + \cos^2\theta) & 0 & 0 \\ 0 & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & \cos\theta \end{bmatrix} \mathbf{I}_i, \tag{5.78} \]
where Ii (the radiant flux density) is the Stokes vector of the incident light beam, Is (the
radiant flux density) is the Stokes vector of the scattered light beam, λ is the wavelength
in the medium, a is the radius of the particle, m is the ratio of n̂2, the (complex) index of refraction of the particle, to n̂1, the index of refraction of the medium, and r is the distance
from the center of the particle to the observation point of the scattered beam. It should be noted that Rayleigh scattering is not strictly proportional to λ^−4 because the index of
refraction is also a function of λ.
The two laws can be combined into a law called the Beer–Lambert law: It(λ) = Ii(λ) 10^−ε(λ)cd, where c is the concentration in moles per liter, d is the path length, and the constant ε(λ) is called the molar extinction coefficient. The Beer–Lambert law is true only when the effect of multiple scattering is negligible, because the energy scattered once away from the forward direction by one particle can be scattered back into the forward direction by a second particle. Of
course this can only happen if there is a large number of particles along the path, either
because the concentration of particles is high or because the path is long.
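A minimal sketch of the Beer–Lambert law as stated above (the extinction coefficient, concentration, and path length below are made-up illustrative values):

def transmittance(epsilon: float, concentration_mol_per_l: float, path_cm: float) -> float:
    """I_t / I_i = 10**(-epsilon * c * d), the Beer-Lambert law."""
    return 10.0 ** (-epsilon * concentration_mol_per_l * path_cm)

# A dye with molar extinction coefficient 1.5e4 L mol^-1 cm^-1 at this wavelength,
# at a concentration of 2e-5 mol/L in a 1 cm cell, transmits about half the light.
print(transmittance(1.5e4, 2.0e-5, 1.0))   # -> ~0.50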
5.4.4 Diffraction
In our everyday experience, a beam of light in air seems to travel in a straight line. Thus,
we can draw lines from the light source to predict the shape of the shadow that is cast by
an object. This type of shadow is called the geometrical shadow and the approximation of
the propagation of a light beam in a uniform medium by a straight line is called geomet-
rical optics. However, there are several areas where the approximation does not give us a
reasonable answer. Two of these areas in imaging applications are: (1) areas around material
discontinuities, such as the boundary of the shadow we mentioned above, and (2) areas
where many light rays concentrate together, such as near the focus point. In traditional
terms, deviation from a straight line in light propagation (due to an obstacle) is called
diffraction.
Diffraction problems in optics are very difficult to solve exactly. In principle, we can
solve the Maxwell equations with proper boundary conditions on the surfaces of obsta-
cles. In practice, only a small number of problems, which involve obstacles of very simple
shape and ideal properties have been solved rigorously [124, Chapter XI]. The most famous
one is the diffraction by a half-plane infinitely-conducting infinitely-thin sheet, solved by
Sommerfeld in 1896. Even for the simple geometry of an aperture in a planar screen of
finite width and of finite conductivity, the solution to the diffraction problem becomes too
complicated. An approximate solution to the diffraction problem can be calculated using the
Huygens–Fresnel principle formulated mathematically by Kirchhoff in 1882. Kirchhoff ’s
approximation expresses the electromagnetic wave as a scalar function (i.e., neglects its
polarization) and assumes that the field and its first derivative normal to the surface of
the screen are everywhere zero, except in the aperture where the field is the same as the
incident field (see Fig. 5.3). This zero field and normal derivative boundary condition is
mathematically inconsistent with the Helmholtz wave equation that describes the electro-
magnetic wave. If the boundary condition were true, the field would be zero everywhere. This
inconsistency can be removed by assuming different boundary conditions to allow either the
field or its normal derivative to be nonzero on the screen surface.

Figure 5.3. The geometry for Kirchhoff's diffraction approximation: the aperture A, a surface element dA with unit normal n, the angles θ and θ′, and the points P and P′.

These different boundary
conditions result in different approximations, but in the domains in which we are interested
(the source and the observation points are far away from the screen, relative to the size of
the aperture, which in turn is very much larger than the wavelength of the light), they give
very similar results. The vector theory of the approximation can also be derived when the
screen is assumed to be perfectly conducting [888, Section 12.18].
Despite all the idealistic assumptions, Kirchhoff ’s approximation works remarkably well
for many cases of practical interest. We will use it to calculate the optical transfer function
for a diffraction-limited lens. The Huygens–Fresnel principle formulated by Kirchhoff can
be expressed in the following form:
\[ E(P') = \frac{1}{i\lambda} \int_A E(P)\, \frac{e^{ikr}}{r}\, \frac{e^{ikr'}}{r'}\, \frac{1}{2} (\cos\theta + \cos\theta')\, dA, \tag{5.79} \]
where E(P) is the known electric field at the point P, and E(P′) is the diffracted electric field at the point of observation, P′; λ is the wavelength of the incident light, A is the aperture, r is the vector from P to a small surface element dA at the aperture A, r′ is the vector from dA to P′, n is the unit normal vector to the surface element dA, pointing to the opposite side from P (see Fig. 5.3), θ is the angle between r and n, and θ′ is the angle between r′ and n. The term which involves the two angles is called the inclination factor.
In the distance range where the theory is valid, the inclination factor is very close to a
constant.
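To see how Eq. (5.79) is used in practice, here is a crude one-dimensional numerical sketch of our own (unit-amplitude plane-wave illumination, inclination factor taken as a constant 1, a 50 µm slit observed 1 m away): the field on the screen is approximated by summing the secondary wavelets exp(ikr′)/r′ from sample points across the aperture, and the resulting intensity shows the bright central lobe and the first dark fringe expected from diffraction theory.

import cmath
import math

WAVELENGTH = 550e-9              # [m]
K = 2.0 * math.pi / WAVELENGTH
SLIT_WIDTH = 50e-6               # [m]
SCREEN_DISTANCE = 1.0            # [m]

def field_at(x_screen: float, n_samples: int = 2000) -> complex:
    """Sum the wavelets exp(ikr')/r' over sample points across the slit."""
    total = 0.0 + 0.0j
    dx = SLIT_WIDTH / n_samples
    for i in range(n_samples):
        x_aperture = -SLIT_WIDTH / 2.0 + (i + 0.5) * dx
        r_prime = math.hypot(SCREEN_DISTANCE, x_screen - x_aperture)
        total += cmath.exp(1j * K * r_prime) / r_prime * dx
    return total

center_intensity = abs(field_at(0.0)) ** 2
# For this geometry the first dark fringe is predicted at x = lambda*L/a = 11 mm.
for x_mm in (0.0, 5.5, 11.0):
    intensity = abs(field_at(x_mm * 1e-3)) ** 2
    print(f"x = {x_mm:5.1f} mm   relative intensity = {intensity / center_intensity:.3f}")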
From Eq. (5.79), if we know the source field distribution, E(P), we can compute the
diffraction field at any point in the space. The physical meaning of the Huygens principle
can be seen in Eq. (5.79): the source field at P propagates to the aperture element dA as a spherical wave, exp(ikr)/r, which then acts as a source and propagates as a spherical wave, exp(ikr′)/r′, to the observation point, P′. It should be noted that the input field at the aperture is assumed to be a diverging spherical wave, as indicated by the term exp(ikr)/r, implying that the lateral
spatial coherence length is at least as large as the diameter of the aperture. Therefore, the
source has to be far away from the aperture, or the aperture has to be smaller than the
transverse coherence length of the source (see Section 2.3). Normally, the integral is calculated
over a planar surface over the aperture (assuming that the boundary of the opening is a planar
curve). However, since the observation point is assumed to be far away from the aperture,
we can use any smooth surface over the opening that will make our calculation of the
integral easier. For example, we can choose a spherical surface with the center located at the observation point P′, so that r′ will be constant at every point of the surface. Alternatively, if the source is a point on the left hand side of the screen, we can choose the surface to be a spherical surface with the center located at the point source P, so that r will be constant.
In many imaging applications, the field is assumed to be known at the aperture A and the
Huygens–Fresnel integral is expressed as:
\[ E(P') = \frac{1}{i\lambda} \int_A E(P)\, \frac{e^{ikr'}}{r'}\, \cos\theta'\, dA, \tag{5.80} \]
where P is now a point in the aperture. The inclination factor has now been modified.
Although it is not consistent with the original form, this rarely matters because the integral
is an approximation anyway and is valid only for a far-away observation point, leading to
similar results for different forms of the inclination factor [453, p. 143].
Calculation of the diffraction pattern from Eq. (5.79) or (5.80) is done for far-away observation points. The factor r′ in the denominator can be treated as approximately constant for every point on the aperture. However, the r′ in the numerator, exp(ikr′) = exp(i2πr′/λ), determines the phase of the field and cannot be approximated as a constant, because a minute difference of half a wavelength in the radial distance r′ can change the wave from constructive to destructive, or vice versa. Two different approximations are often used for calculating
this phase factor, depending on the observation distance. The Fraunhofer approximation is
valid for the far field and the resulting diffraction patterns are called Fraunhofer diffraction
patterns. For the near field (still many, many wavelengths away), the Fresnel approximation
is used, and the resulting patterns are called Fresnel diffraction patterns. These two approx-
imations will be discussed in later chapters when we deal with the wave optics of image
formation.
Diffraction is a consequence of the “wave” nature of light and the resulting diffraction
pattern is often complicated and nonintuitive in the sense that it can be quite different from
what the light ray model (the geometric shadow) would predict. For example, from the
Huygens–Fresnel integral, Poisson showed that an opaque circular disk at a certain distance
can cast a shadow with a bright spot in the center of the shadow. This is so “counterintuitive”
that Poisson used it as evidence to object to Fresnel’s theory of diffraction. However, a careful
experiment performed later confirmed the prediction of the diffraction theory. Similarly, a
circular opening in a screen can transmit a bright disk of light with a dark spot right in the
center. In both cases, the diffraction patterns show numerous, alternating, dark and bright,
concentric rings.
5.5 Problems
5.1 The peak sensitivity of V′(λ) is at 507 nm. What is the photon energy in eV at this
wavelength?
5.2 The silicon substrate in a CCD sensor has a band gap of about 1.1 eV. Which part of
the visible wavelengths can be readily absorbed by the silicon?
5.3 When an external electric field, E, is applied to a material, the molecules in the
material are polarized (due to charge displacement). If the molecules are themselves
polar (such as water molecules), the induced dipole moment per unit volume, P, can
be quite large because the molecules will align with the external electric field. P is
often proportional to E. The proportionality constant is found to be a function of
frequency. Give a qualitative explanation of why this is so.
5.4 The index of refraction of a material is a function of frequency. Within the visible
wavelength range, the index of refraction of some materials, such as glasses, is higher
for shorter wavelengths and lower for longer wavelengths. For other materials, the
trend is reversed, i.e., the index of refraction is lower for shorter wavelengths. What
causes this difference? For a given material, does the index of refraction continue to
increase or decrease when the optical frequency is increased?
5.5 Due to the presence of free electrons in their conduction bands, most metals reflect
light of different visible wavelengths equally well and therefore appear gray. However,
some metals such as copper and gold have characteristic colors. Why?
5.6 In order to detect submarines under the sea surface, polarization filters are often used
in the optical imaging devices on an airplane. To which direction should the filters be
turned?
5.7 Although sunlight is unpolarized, skylight can be completely polarized at a certain
angle from the sun. This can be seen from Eq. (5.78). What is this angle and what
is the direction of polarization? Can you give an intuitive explanation of why this
polarization occurs?
5.8 It is well known that if the aperture size of a camera is reduced, the depth of field (the
range of depth in which objects appear sharp) is increased. However, the amount of
available light is also decreased. Therefore, we cannot always use as small an aperture
as we would like. Even if we have plenty of light, there are two other factors to
consider: coherence and diffraction. Discuss their effects on image formation.
6 Colorimetry
The beauty of the golden sky at sunset, the splendor of peacock feathers, and the glorious
spectacle of fireworks are displays of changing color. Our visual sense is greatly enriched
by our perception and appreciation of colors. Although our color perception seems to be
direct and effortless, it is a very interesting subject of immense complexity, as are other
aspects of our visual perception. In the last 70 years, we have made a lot of progress in
understanding the physics, chemistry, optics, physiology, psychophysics, anatomy, neural
science, and molecular biology of human color vision, but we are still very far from being
able to describe exactly how it works. Therefore, practical use of color requires certain
empirical rules. These rules, which are by no means perfect, are based on many years of
experimentation and observation, and they form the empirical foundation of colorimetry,
the science of measuring color.
The basic measurement of a color stimulus is its spectral power distribution as a function
of wavelength (or frequency). The spectral power distribution of a reflecting surface is the
product of the spectral power distribution of the illumination and the spectral reflectance
factor of the surface. Although the same spectral power distribution may produce different
color sensations, depending on its surroundings, background, illumination, and viewing
geometry, all physical specifications of color stimuli start from their spectral power distri-
butions. The link between the objective physics and the subjective perception is provided
by photometry and colorimetry. These two fields of scientific study attempt to quantify the
capacity of light stimuli to produce color sensation. Because of the incomplete knowledge
we have of the total process of color perception, the initial goals of photometry and col-
orimetry have been intentionally limited to the physical specifications of colors, without
addressing the problems of the appearance of color. However, even with this limited scope
of study, their applications to practical problems have been widely accepted and the results
have been quite useful. In this chapter, we will review the basis of colorimetry and describe
some of its applications.
Colorimetry is a scientific field in which the physical specifications of light stimuli that
produce a given color sensation under restricted viewing conditions are studied. In particular,
it seeks to represent the spectral power distribution of any incident light with a much
reduced number of parameters (almost always three, and sometimes four, corresponding to
the number of photoreceptor types we believe are facilitating the sensation of color under
the given conditions). Colorimetry is mainly concerned with color matching and predicting
small color differences when two stimuli do not match. It should be pointed out from the
beginning that colorimetric data alone do not specify what color we will see. The objective
of colorimetry is to specify the physical aspects of color stimuli, not the color appearance.
Long before the current, direct evidence for the four types of photoreceptor (rod and three
types of cone) in the retina of our eyes was available, psychophysical data had suggested the
experimental laws of color matching. In the luminance range in which the rods are mostly
saturated and by restricting the visual angle to a small subtense around the fovea, it was
found that many, but not all, visual stimuli can be completely matched in color by additive
mixtures of three fixed, independent stimuli (called primaries) with their radiant powers
properly adjusted. (“Independent” here means that none of the three stimuli can be matched
by any additive mixture of the other two.) The empirical observations were summarized
by Grassman [359, 861]. A stronger form of Grassman’s laws was given by Wyszecki and
Stiles [1053] as follows:
1. Symmetry law: If color stimulus A matches color stimulus B, then color stimulus B
matches color stimulus A. (It does not matter how you move them around.)
2. Transitivity law: If A matches B and B matches C, then A matches C.
3. Proportionality law: If A matches B, then α A matches αB, where α is any positive
factor by which the radiant power of the color stimulus is increased or decreased.
4. Additivity law: If A matches B and C matches D, then (A ⊕ C) matches (B ⊕ D),
where the operator “⊕” means the additive color mixture. Also, if A matches B and
(A ⊕ C) matches (B ⊕ D), then C matches D.
These laws hold true only if the matching color stimuli are observed under the same viewing
conditions (including the surround and the background) and by the same observer. The laws
are very similar to the definition of a vector space, and indeed, Grassman did express them
in terms of vector operations. These empirical laws become easy to understand if we assume
that the light incident on our retinas is absorbed by three types of photoreceptor and color
matching between two color stimuli is achieved when the photoreceptor responses to both
stimuli are equal. This interpretation is called the receptor-level theory of color matching.
One of the main goals of colorimetry is to derive a concise numerical representation of any
given color stimulus from its spectral power distribution. A spectral power distribution may
take more than 40 numbers to specify. Knowing that we have only three types of (cone)
photoreceptor responsible for color vision, we would expect that only three numbers are
necessary to specify the “visual effects” that the given spectral power distribution exerts
on our visual system. Therefore, the main task of colorimetry is to determine the three
cone response functions. However, until the 1980s [837], it was not technically possible to
measure the cone response functions directly from the photoreceptors themselves. Before
that the best available method for estimating the cone spectral response functions was the
Figure 6.1. The bipartite field used in color matching experiments: the test stimulus C(λ) fills one half of the field, and the additive mixture of the three primaries, R·R(λ) + G·G(λ) + B·B(λ), fills the other half.
psychophysical color matching experiment which was developed in the 1920s. As will be
shown shortly, this method cannot give us the cone response functions themselves, but
only some of their linear combinations. For the purpose of specifying a color stimulus,
linear combinations are good enough. The only need then was to standardize a set of linear
combinations of the cone spectral response functions to be used for color specifications.
The standardized functions are the CIE color matching functions, x(λ), y(λ), and z(λ), to
be described below.
The basic procedure for measuring a color matching function is to use a bipartite visual
field of a certain angular subtense, where the three primaries and the test monochromatic
stimulus are projected onto two separate halves (see Figure 6.1). The observer adjusts the
radiant powers R, G, B of the three primaries (R), (G), (B) until the color produced by the
additive mixture of the three primaries matches that of the test stimulus. The unit for the
radiant power of the primaries is chosen so that the mixture of one unit from each of the three
primaries will match the color of the equal-energy white. The term, equal-energy white,1
refers to a stimulus that has equal radiant power per wavelength interval at every wavelength.
The test stimulus is ideally a single-wavelength light of unit power. The wavelength is varied
over the range to be measured. The amounts of the unit power for the three primaries are
then recorded as the tristimulus values of the test stimulus. For example, if (R), (G), and
(B) are the unit amounts of the three primaries, and the test stimulus (C) is matched by
additively mixing R units of (R), G units of (G), and B units of (B), then R, G, B are the
tristimulus values of (C) with (R), (G), and (B) as the primaries. The CIE recommends
expressing this color matching as
(C) ≡ R(R) + G(G) + B(B), (6.1)
where ≡ means color matching. Bold upper case letters inside (), such as (C), (R), (G), and
(B), represent color stimuli or lights. Italic upper case letters, such as R, G, and B, represent
scalar multipliers, meaning the amounts of light. For example, if the radiant power of the
light (R) is t W, the radiant power of R(R) is Rt W.
Let us assume that there are three types of cone photoreceptor in our retinas: L, M, and
S, each of which has a different and relatively broad (about 100 nm) spectral sensitivity
function (see Figure 6.2). Let (R), (G), and (B) be our unit primaries. Let L r , L g , and
1 A more accurate term would be equal-power white because it is the radiant power that is our concern in
continuous viewing. However, by tradition the term equal-energy white has been used in colorimetry and we will
follow this tradition.
Figure 6.2. An imaginary example experiment for the receptor-level theory of color matching using monochromatic primaries, R, G, and B, and a monochromatic test stimulus, C. The three curves represent the normalized spectral sensitivity functions of the L, M, and S cones. (The vertical axis is arbitrarily scaled.)
L b be the number of photons that the L cone absorbs when it is stimulated by one unit
of (R), one unit of (G), and one unit of (B), respectively. Let Mr , Mg , Mb and Sr , Sg ,
Sb be the number of absorbed photons for the M cone and the S cone, corresponding to
stimulation by the unit (R), (G), (B) primaries. Now when the eye is stimulated by a color
stimulus (C), let L c , Mc , and Sc be the number of photons absorbed by the L, M, and S
cones respectively. The receptor-level theory maintains that if the photon catch of each cone
system is identical for two color stimuli, the two colors should match each other regardless
of any other difference in their physical parameters, such as their spectral compositions. In
a color matching experiment, the observer adjusts the radiant power of the three primaries
so that the color of their mixture matches the color of the test stimulus. Mathematically, the
observer is trying to find the three coefficients ar , ag , and ab by adjusting the primaries so that
(C) ≡ a_r(R) + a_g(G) + a_b(B)    (6.2)

if and only if

L_c = a_r L_r + a_g L_g + a_b L_b,
M_c = a_r M_r + a_g M_g + a_b M_b,    (6.3)
S_c = a_r S_r + a_g S_g + a_b S_b.

In matrix form,

\begin{bmatrix} L_c \\ M_c \\ S_c \end{bmatrix} =
\begin{bmatrix} L_r & L_g & L_b \\ M_r & M_g & M_b \\ S_r & S_g & S_b \end{bmatrix}
\begin{bmatrix} a_r \\ a_g \\ a_b \end{bmatrix} =
P \begin{bmatrix} a_r \\ a_g \\ a_b \end{bmatrix},    (6.4)

and

\begin{bmatrix} a_r \\ a_g \\ a_b \end{bmatrix} =
\begin{bmatrix} L_r & L_g & L_b \\ M_r & M_g & M_b \\ S_r & S_g & S_b \end{bmatrix}^{-1}
\begin{bmatrix} L_c \\ M_c \\ S_c \end{bmatrix} =
P^{-1} \begin{bmatrix} L_c \\ M_c \\ S_c \end{bmatrix}.    (6.5)
As long as the matrix P is nonsingular (i.e., none of the three primaries can be matched by
the linear combination of the other two), ar , ag , and ab will always have a unique solution
(i.e., the test color can be matched by some algebraic mixture of the three primaries).
However, the solution may require that some of ar , ag , and ab be negative, a requirement
that has to be satisfied by moving the negative-valued primaries to, and mixing with, the
test stimulus. It is also interesting to note that multiplying both sides of Eq. (6.4) by any
nonsingular 3 × 3 matrix from the left does not change the solution of ar , ag , and ab .
This means that color matching experiments can determine the cone spectral sensitivity
functions only up to their linear combinations. When the test stimulus (C) is an ideal,
monochromatic (single-wavelength) light of unit power and its wavelength is varied over
the visible range, the measured functions: ar (λ), ag (λ), and ab (λ), are called the color
matching functions of the chosen (R), (G), (B) primaries.
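As a minimal numerical sketch of Eq. (6.5), the following Python fragment solves for the matching coefficients (ar, ag, ab) from a photon-catch matrix P; the entries of P and the cone responses to the test stimulus are invented purely for illustration.

```python
import numpy as np

# Hypothetical photon-catch matrix P: column j holds the photon catches of the
# L, M, and S cones produced by one unit of primary (R), (G), (B), respectively.
# All numbers are made up for illustration only.
P = np.array([[0.60, 0.35, 0.05],   # L_r, L_g, L_b
              [0.30, 0.55, 0.10],   # M_r, M_g, M_b
              [0.02, 0.10, 0.80]])  # S_r, S_g, S_b

# Hypothetical photon catches (L_c, M_c, S_c) produced by the test stimulus (C).
c = np.array([0.45, 0.40, 0.30])

# Eq. (6.5): the matching amounts of the primaries are a = P^{-1} c.
a = np.linalg.solve(P, c)
print("a_r, a_g, a_b =", a)

# A negative coefficient means that primary must be moved to, and mixed with,
# the test stimulus rather than with the other two primaries.
if np.any(a < 0):
    print("negative tristimulus value: move that primary to the test side")
```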
The receptor-level theory of color matching is quite logical and intuitively “true”. How-
ever, it needs further refinements and qualifications. Firstly, it is assumed that a photon once
absorbed by a visual pigment produces the same “neural” signal independent of what its
wavelength is. This assumption is called the principle of univariance [1053, p. 586], which
has been shown to be consistent with experimental data. Secondly, it is implicitly assumed
that the stimuli have to be large enough so that signals sensed by the three types of cones can
be compared reliably. For example, if the stimulus when projected on the retina is less than
the cross-section of a single cone, the perceived color must be unstable because a slight eye
movement is likely to shift the image to a different type of cone. Thirdly, it is also assumed
that the neural circuits of our color vision system can compare signals sensed at spatially
separated photoreceptors. The same composition of light is sensed by three different types of
cone, and the cone output signals are compared to determine the color sensation. Fourthly, it
is assumed that the perceived color at a given spatial position is uniquely determined by the
receptor photon catches at that location alone. This has been shown many times in the past
to be false [139, 344]. The perceived color also depends on the surroundings, the luminance
level, and the chromatic adaptation. Therefore, color matches hold true from this point of
view only when two stimuli are presented under the “same” viewing conditions, including
the surrounding background, the visual angle, the illumination intensity, and the adaptation
state of the eyes.
Color matching results have been found to hold over a wide range of illumination level,
except when it is very bright (due to response saturation or self-screening: see [1053, p. 588])
or very dim (into the mesopic or scotopic range when rods come into play).
As we mentioned before, in order to set up a physical color specification system such that
receptor-level color matching can be achieved whenever two color stimuli have the same
numerical representation, the spectral sensitivity curves of the L, M, and S cones are clearly
the major functions to be measured. However, due to technical difficulties in direct recording
from the photoreceptor cells, in the 1920s and 1930s indirect psychophysical measurements
were the only possibility. Direct measurements on the cone spectral absorption curves have
since been accomplished [90, 236, 837, 838], and the results agree very well with those
from psychophysical color matching experiments [838]. The series of experiments that
eventually led to the CIE standard color matching functions (which were “believed” to be
linear combinations of the spectral sensitivity functions of the cones) was based on the idea of
matching all monochromatic stimuli wavelength by wavelength with three chosen primaries.
Since any spectral power distribution can be considered as the sum of monochromatic light
at all wavelengths, if we can measure the tristimulus values of monochromatic light at every
wavelength, we can calculate the tristimulus values of light of any spectral composition.
Therefore, we have a system for specifying all color stimuli. This idea can be most easily
understood by using a discrete approximation of the spectral power distribution, Pλ (λ), of
a given color stimulus, (C):
(C) = \sum_i P_\lambda(\lambda_i)\,\Delta\lambda.    (6.6)
Therefore, if we measure the color matching functions, ar(λi), ag(λi), and ab(λi), for a given set of primaries, (R), (G), and (B), any given color stimulus with spectral power distribution Pλ(λ) can be specified by the amounts of the primaries required to match it, which, by the additivity law, are Ar = Σi ar(λi)Pλ(λi)Δλ, Ag = Σi ag(λi)Pλ(λi)Δλ, and Ab = Σi ab(λi)Pλ(λi)Δλ. The basic task of colorimetry is thus to agree upon a set of color primaries and a set of color matching functions based on that set of primaries.
In order to specify the attributes of a color stimulus independent of its radiant power,
we would like to define quantities called chromaticity coordinates. Let R, G, and B be the
tristimulus values of a given color, then its chromaticity coordinates (r, g, b) are defined as:
r = \frac{R}{R + G + B},
g = \frac{G}{R + G + B},    (6.7)
b = \frac{B}{R + G + B}.
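As a small illustration of Eq. (6.7), the sketch below computes chromaticity coordinates from made-up tristimulus values; since r + g + b = 1, only two of the three coordinates are independent.

```python
def chromaticity(R, G, B):
    """Eq. (6.7): chromaticity coordinates from tristimulus values."""
    s = R + G + B
    return R / s, G / s, B / s

# Made-up tristimulus values; the three coordinates sum to one.
r, g, b = chromaticity(2.0, 3.0, 5.0)
print(r, g, b, r + g + b)
```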
Since the transformation between two sets of primaries can be represented by a 3 × 3 matrix, any three linearly independent pairs of tristimulus values are sufficient to determine
the unique transformation.
There are situations in which it is desirable to specify the transformation between two
sets of primaries by pairs of chromaticity coordinates, rather than tristimulus values in
both spaces. One example is the transformation of spectral chromaticity functions from
a set of physical primaries to a set of imaginary primaries in order to make the color
matching functions nonnegative for all wavelengths. This is exactly what was done when
the CIE specified its 1931 Standard Colorimetric Observer, as will be described in the
following section. Another example is the NTSC specification of color signals, where the
chromaticities of the three monitor phosphors and the illuminant are specified for TV color
reproduction. Here we will show how the transformation of chromaticity coordinates can
be done. Numerical examples will be given in later sections.
Let R, G, B be the tristimulus values for the (R), (G), (B) primaries and X , Y , Z be
the tristimulus values for the (X), (Y), (Z) primaries. Their chromaticity coordinates are
denoted by r, g, b and x, y, z. As we have just shown, the transformation from R, G, B to
X , Y , Z can be done by a 3 × 3 matrix, A:
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = A \begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}.    (6.12)
Since, by definition, X = x(X + Y + Z ), Y = y(X + Y + Z ), Z = z(X + Y + Z ), R =
r (R + G + B), G = g(R + G + B), and B = b(R + G + B), the above equation can be
expressed as
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \frac{R + G + B}{X + Y + Z}\, A \begin{bmatrix} r \\ g \\ b \end{bmatrix} =
\frac{R + G + B}{X + Y + Z}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} r \\ g \\ b \end{bmatrix}.    (6.13)
From Eq. (6.12) and the definition of chromaticity, we have

\frac{R + G + B}{X + Y + Z} = \frac{1}{(a_{11} + a_{21} + a_{31})r + (a_{12} + a_{22} + a_{32})g + (a_{13} + a_{23} + a_{33})b}.    (6.14)
Substituting the above relation into Eq. (6.13), we have

x = \frac{a_{11}r + a_{12}g + a_{13}b}{(a_{11} + a_{21} + a_{31})r + (a_{12} + a_{22} + a_{32})g + (a_{13} + a_{23} + a_{33})b},
y = \frac{a_{21}r + a_{22}g + a_{23}b}{(a_{11} + a_{21} + a_{31})r + (a_{12} + a_{22} + a_{32})g + (a_{13} + a_{23} + a_{33})b},    (6.15)

and z can be computed as 1 − x − y. Since all the numerators and the denominators can be divided by a nonzero constant, there are only eight unknowns in Eq. (6.15). Therefore, given the corresponding chromaticity coordinates of four suitably chosen stimuli in both systems, the transformation is uniquely determined.
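The transformation of chromaticity coordinates in Eq. (6.15) is easy to carry out numerically, as in the sketch below; the 3 × 3 matrix A is hypothetical (its entries are invented for illustration and are not the CIE coefficients).

```python
import numpy as np

# Hypothetical tristimulus transformation matrix A (invented for illustration).
A = np.array([[0.50, 0.30, 0.20],
              [0.20, 0.70, 0.10],
              [0.00, 0.05, 0.95]])

def transform_chromaticity(r, g, b, A):
    """Eq. (6.15): map (r, g, b) chromaticities to (x, y, z) chromaticities."""
    num = A @ np.array([r, g, b])   # the three numerators a_i1 r + a_i2 g + a_i3 b
    den = num.sum()                 # common denominator: sum of the three numerators
    x, y = num[0] / den, num[1] / den
    return x, y, 1.0 - x - y        # z = 1 - x - y

print(transform_chromaticity(0.3, 0.4, 0.3, A))
```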
In 1931, the CIE adopted the color matching functions x(λ), y(λ), z(λ) of the standard
observer for 2◦ viewing. The functions were based on the chromaticity coordinates of
monochromatic stimuli r (λ), g(λ), and b(λ) measured by Guild and Wright and the CIE
1924 luminous efficiency function V (λ).
Guild’s color matching results were from seven observers using color-filtered primaries of
relatively broad wavelength bands. Wright’s data were from ten observers using monochro-
matic stimuli at 650 nm, 530 nm, and 460 nm as primaries. Both of them limited their
experiments to between 400 and 700 nm, using 2◦ bipartite matching fields. Both sets of
data were converted to that of monochromatic primaries at wavelengths of 700.0, 546.1, and
435.8 nm. The three primaries are designated as (R), (G), and (B). The reason for choosing
these wavelengths was that the last two correspond to two strong mercury spectral lines,
which had served as standards for the National Physical Laboratory in England, and the
700 nm wavelength was chosen because its hue is very stable to slight wavelength varia-
tions. The units of the three primaries were scaled so that the mixture of unit quantities of
the three spectral stimuli matches the equal-energy spectrum. Their radiant powers were
in the ratios 72.0962:1.3791:1.0000. This ratio was selected so that equal quantities of the
three primaries would match the equal-energy spectrum. The two sets of data after being
Figure 6.3. The chromaticity coordinates, r(λ), g(λ), and b(λ), of the CIE 1931 RGB system with respect to three real primaries: (R) at 700 nm, (G) at 546.1 nm, and (B) at 435.8 nm.
carefully calibrated and converted were averaged and smoothed to produce the chromaticity
coordinates of monochromatic stimuli r (λ), g(λ), and b(λ), as shown in Fig. 6.3. It should
be pointed out that variations among individual observers were fairly marked, especially
in the blue–green region [289, p. 59]. The chromaticities alone do not specify the radiant
powers required to achieve a color match (because r (λ) + g(λ) + b(λ) = 1 and the abso-
lute radiant power for the match is lost). It would have been possible to derive the color
matching functions if all the radiant powers of the monochromatic test stimuli had been
recorded in the experiments. In order to derive the color matching functions for the (R),
(G), (B) primaries, the CIE made a strong assumption that the photopic luminous efficiency
function V(λ) is a linear combination of the color matching functions r̄(λ), ḡ(λ), and b̄(λ): V(λ) = L_R r̄(λ) + L_G ḡ(λ) + L_B b̄(λ), where L_R, L_G, and L_B are the luminances of the three primaries, which can be obtained from the products of the relative radiant powers and the luminous efficiency function V(λ). The calculated ratio is 1:4.5907:0.0601. Dividing both sides of the equation by s(λ) = r̄(λ) + ḡ(λ) + b̄(λ), we have

\frac{V(\lambda)}{s(\lambda)} = L_R \frac{\bar{r}(\lambda)}{s(\lambda)} + L_G \frac{\bar{g}(\lambda)}{s(\lambda)} + L_B \frac{\bar{b}(\lambda)}{s(\lambda)}
= L_R r(\lambda) + L_G g(\lambda) + L_B b(\lambda).    (6.20)

Therefore,

s(\lambda) = \frac{V(\lambda)}{L_R r(\lambda) + L_G g(\lambda) + L_B b(\lambda)}    (6.21)

and the color matching functions can be obtained as r̄(λ) = s(λ)r(λ), ḡ(λ) = s(λ)g(λ), and b̄(λ) = s(λ)b(λ), which are shown in Fig. 6.4. Since the units of the primaries are chosen
Figure 6.4. CIE 2◦ color matching functions, r̄(λ), ḡ(λ), b̄(λ), with 435.8, 546.1, and 700 nm as primaries.
so that equal quantities of the primaries match the color of the equal-energy spectrum, the
areas under the three curves are equal to each other.
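The scaling step of Eqs. (6.20) and (6.21) can be sketched in code as follows; the wavelength grid, chromaticity coordinates, and V(λ) samples are made up for illustration, while the luminance ratio 1:4.5907:0.0601 is the one quoted above.

```python
import numpy as np

# Made-up samples on a coarse wavelength grid (for illustration only):
# chromaticity coordinates r(l), g(l), b(l) of the monochromatic stimuli and V(l).
wl = np.array([450.0, 550.0, 650.0])
r = np.array([-0.10, 0.20, 0.90])
g = np.array([ 0.10, 0.80, 0.10])
b = 1.0 - r - g                      # chromaticity coordinates sum to one
V = np.array([0.04, 0.99, 0.10])

# Luminance ratio of the (R), (G), (B) primaries used in the CIE derivation.
L_R, L_G, L_B = 1.0, 4.5907, 0.0601

# Eq. (6.21): s(l) = V(l) / (L_R r(l) + L_G g(l) + L_B b(l)); then the color
# matching functions are rbar = s r, gbar = s g, bbar = s b.
s = V / (L_R * r + L_G * g + L_B * b)
rbar, gbar, bbar = s * r, s * g, s * b
print(rbar, gbar, bbar)
```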
The color matching functions, r (λ), g(λ), and b(λ), have negative values at some wave-
lengths, which is inconvenient for hand calculation. The CIE hence adopted a transformation
from the trichromatic system based on the primary stimuli (R), (G), and (B) to one based
on new primary stimuli (X), (Y), and (Z). The new primaries (X), (Y), and (Z) were chosen
so that [183]:
1. The new color matching functions x(λ), y(λ), z(λ) have only positive values. This
means that the new primaries are not physically realizable. (Why? Try to answer this
question after you finish studying this chapter.)
2. In the new (x, y)-chromaticity diagram, the spectrum locus is bounded, as closely as
possible, by the three lines: x = 0; y = 0; and x + y = 1, so that the area inside the
(x, y)-chromaticity triangle occupied by the real colors is maximized.
3. In the (r, g)-chromaticity diagram, the locus of spectrum colors from 570 nm to
700 nm is virtually a straight line (see Fig. 6.5). Since r (λ = 700 nm) = 1.0, g(λ =
700 nm) = 0.0 and r (λ = 600 nm) = 0.847, g(λ = 600 nm) = 0.154, the equation of
this line is r + 0.99g = 1.00. If one axis of the new primaries is chosen on this line,
only two primaries are needed in this spectral region.
4. The tristimulus value Y is chosen to be equivalent to the luminance of the color stimuli,
and the luminances of the (X) and (Z) primaries are set to zero. Therefore, the (X)
and (Z) primary stimuli are located on the alychne (which is Greek for “line without
light” [620, p. 189]). The alychne on the (r , g, b)-chromaticity diagram (see Fig. 6.5)
is defined by
L R r + L G g + L B b = 0, (6.22)
Figure 6.5. The (r, g)-chromaticity diagram, showing the spectrum locus (with wavelengths in nm), the new primaries (X), (Y), and (Z), and the alychne on which (X) and (Z) lie.
The above guidelines do not uniquely specify the required transformation. The trans-
formation recommended by the CIE was through the specification of corresponding chro-
maticity coordinates of four stimuli of well-defined spectral distributions: see Table 6.1.
The CIE 1931 Standard Colorimetric Observer is defined by first transforming the r, g, b
chromaticity coordinates to the x, y, z chromaticity coordinates by the following relations,
which can be derived from the above four cardinal stimuli:
x(\lambda) = \frac{0.49000\,r(\lambda) + 0.31000\,g(\lambda) + 0.20000\,b(\lambda)}{0.66697\,r(\lambda) + 1.13240\,g(\lambda) + 1.20063\,b(\lambda)},

y(\lambda) = \frac{0.17697\,r(\lambda) + 0.81240\,g(\lambda) + 0.01063\,b(\lambda)}{0.66697\,r(\lambda) + 1.13240\,g(\lambda) + 1.20063\,b(\lambda)},    (6.23)

z(\lambda) = \frac{0.00000\,r(\lambda) + 0.01000\,g(\lambda) + 0.99000\,b(\lambda)}{0.66697\,r(\lambda) + 1.13240\,g(\lambda) + 1.20063\,b(\lambda)}.
Figure 6.6. CIE 1931 2◦ color matching functions, x̄(λ), ȳ(λ), z̄(λ).
These x, y, z chromaticity coordinates are then converted to spectral tristimulus values (the color matching functions) x̄(λ), ȳ(λ), z̄(λ) as follows:

\bar{x}(\lambda) = \frac{x(\lambda)}{y(\lambda)}\,V(\lambda), \qquad
\bar{y}(\lambda) = V(\lambda), \qquad
\bar{z}(\lambda) = \frac{z(\lambda)}{y(\lambda)}\,V(\lambda),    (6.24)

where V(λ) is the CIE photopic luminous efficiency function. Figure 6.6 shows the resulting x̄(λ), ȳ(λ), z̄(λ) color matching functions, and Fig. 6.7 shows the CIE 1931 (x, y)-
chromaticity diagram. It should be pointed out that Guild and Wright’s experiments did not
actually determine spectral tristimulus values. They determined only matching chromatici-
ties for spectral stimuli. The function y(λ) was not measured in their experiments. This has
three very important consequences: (1) use of V (λ) as y(λ) makes the colorimetric lumi-
nance Y consistent with the photometric luminance; (2) errors in the original data of V (λ)
are now incorporated into the colorimetric standards; and (3) in theory, all color match-
ing functions can be regarded as linear combinations of the three cone spectral sensitivity
functions. However, the experimental procedures used to measure V (λ), such as the flicker
photometry, are not explicitly founded on this hypothesis. By using V (λ) as y(λ), the CIE
1931 color matching functions x(λ), y(λ), z(λ) can be regarded as linear combinations of
the three cone spectral sensitivity functions only to the extent that V (λ) is itself a linear
combination of the three cone spectral sensitivity functions.
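The conversion of Eq. (6.24) from the (x, y, z) chromaticities of the monochromatic stimuli and V(λ) to the color matching functions is a one-line scaling, sketched below with sample values invented for illustration.

```python
import numpy as np

# Made-up samples (for illustration only): x, y, z chromaticities of the
# monochromatic stimuli and the photopic luminous efficiency function V(l).
x = np.array([0.15, 0.30, 0.65])
y = np.array([0.05, 0.60, 0.34])
z = 1.0 - x - y
V = np.array([0.04, 0.95, 0.11])

# Eq. (6.24): ybar = V, xbar = (x / y) V, zbar = (z / y) V.
ybar = V
xbar = x * V / y
zbar = z * V / y
print(xbar, ybar, zbar)
```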
The color matching functions x(λ), y(λ), z(λ) recommended in 1986 [186] agree closely
with those defined originally in 1931. Three minor changes have been introduced. At λ =
775 nm the new value of x(λ) is 0.0001 instead of 0.0000; at λ = 555 nm the new value of
y(λ) is 1.0000 instead of 1.0002; and at λ = 740 nm the new value of y(λ) is 0.0002 instead
of 0.0003.
When the visual angle is between 1◦ and 4◦ , the CIE 1931 2◦ Standard Observer color
matching functions are recommended. When it is greater than 4◦, the CIE recommends use
of the 1964 Supplementary Standard Colorimetric Observer, which is described below.
Figure 6.7. CIE 1931 (x, y)-chromaticity diagram. The four crosses show the chromaticity loci of the CIE standard illuminants A, B, and C, and the equal-energy white E.
For a visual target larger than 2◦ , a new set of color matching functions is needed because the
ratio of different cone types varies as a function of retinal location and because the macular yellow pigment covers the area around the fovea (see Section 11.1). The CIE recommended
its 1964 Supplementary Standard Colorimetric Observer for 10◦ viewing based on the data
measured by Stiles and Burch (49 observers), and by Speranskaya (18 observers). In these
two experiments, the color matching functions, r 10 (λ), g 10 (λ), and b10 (λ) were measured
directly with no appeal to V (λ). In order to reduce possible rod intrusion, the luminance
of the matching field of the Stiles–Burch experiment was kept high. The three primaries
used were 645.2 nm, 526.3 nm, and 444.4 nm. Figure 6.8 shows the mean data measured
by Stiles and Burch. The CIE wanted to construct its 10◦ color matching functions using a
general plan that was as close as possible to that followed in deriving the 1931 CIE(X, Y,
Z) 2◦ color matching functions. The following equations were finally adopted to transform
the original r 10 (λ), g 10 (λ), b10 (λ) data to the color matching functions, x 10 (λ), y 10 (λ), and
z 10 (λ) of the CIE 1964 Supplementary Standard Colorimetric (10◦ ) Observer:
x 10 (λ) = 0.341 080r 10 (λ) + 0.189 145g 10 (λ) + 0.387 529b10 (λ),
y 10 (λ) = 0.139 058r 10 (λ) + 0.837 460g 10 (λ) + 0.073 316b10 (λ), (6.25)
z 10 (λ) = 0.000 000r 10 (λ) + 0.039 553g 10 (λ) + 2.026 200b10 (λ).
Figure 6.8. The mean 10◦ color matching functions in the Stiles–Burch (1959) experiment. The three primaries used were 645.2 nm, 526.3 nm, and 444.4 nm.
Figure 6.9. CIE 1964 10◦ color matching functions (solid curves): x10(λ), y10(λ), and z10(λ). The dashed curves are the CIE 1931 2◦ color matching functions, x(λ), y(λ), and z(λ), for comparison.
The units of the X 10 , Y10 , Z 10 primaries are again chosen so that the equal-energy white
has a chromaticity point (x10 , y10 ) at (1/3,1/3). The y 10 (λ) is to be used for calculating
luminances when the size of stimuli is greater than 4◦ . Figure 6.9 shows the x 10 (λ), y 10 (λ),
and z 10 (λ) color matching functions.
Once the color matching functions, x(λ), y(λ), and z(λ), are defined, the tristimulus values,
X , Y , and Z , of a color stimulus, φλ (λ), can be calculated by the additivity law of color
matching, as expressed by the following formulas:
X = k \sum_\lambda \phi_\lambda(\lambda)\, x(\lambda)\, \Delta\lambda,    (6.26)
Y = k \sum_\lambda \phi_\lambda(\lambda)\, y(\lambda)\, \Delta\lambda,    (6.27)
Z = k \sum_\lambda \phi_\lambda(\lambda)\, z(\lambda)\, \Delta\lambda.    (6.28)
The summation should be over the entire range of the visible spectrum, 360–830 nm in
1 nm increments. For most practical purposes, the summation may be approximated by
using wavelength intervals, λ, equal to 5 nm over the wavelength range 380–780 nm. For
a reflecting or transmitting object, the color stimulus function, φλ (λ), is replaced by the
relative color stimulus function, evaluated as φλ (λ) = R(λ)Sλ (λ), or φλ (λ) = τ (λ)Sλ (λ),
where R(λ) is the spectral reflectance factor (or spectral radiance factor or spectral re-
flectance) of the object color, τ (λ) is the spectral transmittance of the object color, and
Sλ (λ) is the relative spectral power distribution of the illuminant. In this case, the constant,
k, is chosen so that Y = 100 for objects for which R(λ) or τ (λ) = 1 for all wavelengths, and
hence
k = \frac{100}{\sum_\lambda S_\lambda(\lambda)\, y(\lambda)\, \Delta\lambda}.    (6.29)
For self-luminous objects and illuminants, if the Y value is required to give the absolute
value of a photometric quantity, the constant, k, must be put equal to K m , the maximum
spectral luminous efficacy (which is equal to 683 lm W−1 ) and φλ (λ) must be the spec-
tral concentration of the radiometric quantity corresponding to the photometric quantity
required.
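The discrete summations of Eqs. (6.26)–(6.29) translate directly into code; the sketch below uses a deliberately coarse, made-up grid, whereas a real computation would use the tabulated CIE color matching functions at 5 nm intervals over 380–780 nm.

```python
import numpy as np

def tristimulus(phi, xbar, ybar, zbar, S=None, dl=5.0):
    """Eqs. (6.26)-(6.29): tristimulus values by discrete summation.

    phi is the color stimulus function sampled at a regular interval dl; S is the
    relative spectral power of the illuminant, used to normalize k for reflecting
    or transmitting objects (if S is None, k = 1 is used). The color matching
    function samples xbar, ybar, zbar must lie on the same wavelength grid.
    """
    k = 1.0 if S is None else 100.0 / np.sum(S * ybar * dl)
    X = k * np.sum(phi * xbar * dl)
    Y = k * np.sum(phi * ybar * dl)
    Z = k * np.sum(phi * zbar * dl)
    return X, Y, Z

# Made-up spectra on a coarse grid, for illustration only.
R = np.array([0.2, 0.5, 0.8])        # spectral reflectance factor
S = np.array([80.0, 100.0, 95.0])    # relative illuminant power
xbar = np.array([0.3, 0.4, 0.2])
ybar = np.array([0.1, 0.9, 0.2])
zbar = np.array([1.2, 0.1, 0.0])
print(tristimulus(R * S, xbar, ybar, zbar, S=S))
```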
If some of the needed values of φλ (λ) are not measured, they should be interpolated from
a third-degree polynomial or by the Lagrange interpolation formula (see [430, p. 122]).
When the measurement range is less than the practical range of summation, 380–780 nm,
the missing values may be set equal to the nearest measured value of the appropriate quantity
in question. Methods that use extrapolated values may give better results (see,
for example, [620, pp. 64–70]). The CIE specifies that the range of the summation is an
essential part of the tristimulus specification.
There are a few useful mathematical relations between some of the colorimetric quantities.
These relations are often used in proving certain statements in colorimetry and frequently
have important consequences in applications. They are described in this section.
Figure 6.10. The locus of the additive mixture, cA + dB, is on the line segment AB.
1. The areas under the curves r (λ), g(λ), b(λ) are equal because of the way the units of
the primaries are chosen. The tristimulus values of the equal-energy white are matched
by unit amounts of the three primaries. The areas under x(λ), y(λ), and z(λ) are also
equal because of the way the transformation is defined in Eq. (6.23).
2. Let A and B be two color stimuli, and c and d be two constants. On a chromaticity
diagram, the chromaticity point of the additive mixture of cA and dB is on the line
connecting the chromaticity point of A and the chromaticity point of B (see Fig. 6.10).
Let the tristimulus values of A be (X a , Ya , Z a ), and those of B be (X b , Yb , Z b ). Their
chromaticity coordinates are (xa , ya ) and (xb , yb ). Define ka and kb by
k_a = \frac{c(X_a + Y_a + Z_a)}{c(X_a + Y_a + Z_a) + d(X_b + Y_b + Z_b)}, \qquad
k_b = \frac{d(X_b + Y_b + Z_b)}{c(X_a + Y_a + Z_a) + d(X_b + Y_b + Z_b)}.
The tristimulus values of the additive mixture of cA and dB are (cX a + d X b , cYa +
dYb , cZ a + d Z b ) and the chromaticity coordinates of the mixture, (x, y), are:
x = \frac{cX_a + dX_b}{c(X_a + Y_a + Z_a) + d(X_b + Y_b + Z_b)} = k_a x_a + k_b x_b,
y = \frac{cY_a + dY_b}{c(X_a + Y_a + Z_a) + d(X_b + Y_b + Z_b)} = k_a y_a + k_b y_b.
Since ka + kb = 1, (x, y) is on the line segment connecting (xa , ya ) and (xb , yb ). This
collinearity of additive color mixing is very useful. For example, it has been used in
determining the illuminant chromaticity from specular highlights [560] and the color
gamut of reflecting surfaces [617] (a numerical check of this property is sketched after this list).
3. The chromaticities of all physically possible stimuli are bounded in a convex region by the chromaticities of the spectrum colors. Any physically realizable color stimulus is an additive mixture of monochromatic components in various proportions, so, by repeated application of the collinearity property in item 2, its chromaticity must lie within the convex hull of the chromaticity loci of the single-wavelength lights.
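The collinearity property of item 2 can be checked numerically, as in the sketch below; the tristimulus values and mixing factors are made up, and the chromaticity of the mixture comes out equal to the weighted average ka(xa, ya) + kb(xb, yb).

```python
import numpy as np

# Made-up tristimulus values of stimuli A and B and mixing factors c, d.
XYZ_a = np.array([30.0, 40.0, 20.0])
XYZ_b = np.array([10.0, 15.0, 60.0])
c, d = 2.0, 1.0

def xy(XYZ):
    """(x, y) chromaticity coordinates of a tristimulus vector."""
    return XYZ[:2] / XYZ.sum()

# Chromaticity of the additive mixture cA + dB ...
xy_mix = xy(c * XYZ_a + d * XYZ_b)

# ... equals the weighted average k_a (x_a, y_a) + k_b (x_b, y_b), so it lies on
# the line segment joining the two chromaticity points.
ka = c * XYZ_a.sum() / (c * XYZ_a.sum() + d * XYZ_b.sum())
kb = 1.0 - ka
print(xy_mix, ka * xy(XYZ_a) + kb * xy(XYZ_b))
```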
Misuse of colorimetric data is quite common. The major problem arises when one confuses
color specifications with color appearance. For example, one often sees CIE diagrams with
pretty colors painted on them. As Breneman pointed out, this type of painting can be
misleading in two ways:
1. It implies that a given point on the chromaticity diagram represents one perceived color.
This is not true. In fact, a color stimulus at a given chromaticity point can appear to be
almost any color, depending on the illumination, the spectral reflectance, the surround,
and the viewing and lighting geometry.
2. It implies that a given perceived color should fall at a certain point on the chromaticity
diagram. This is not true, either. A perceived color can come from almost any point on
the diagram.
It should also be pointed out that a color matching experiment itself has its inherent
difficulties:
1. Adjusting three knobs to match a color cannot always be done accurately. Some people
stop short of getting an exact match and give up (humans do not have infinite patience).
2. Certain colors are more difficult to match than others. We are not equally sensitive to
color differences at all wavelengths.
3. It is not known how much influence adaptation has when the subject is fixated on the
test fields.
In addition, several cautions apply to the CIE colorimetric system itself:
1. V(λ) has been found to be too low in the short wavelength region (see Section 4.2).
2. The assumption that V (λ) is a linear combination of color matching functions is of
unknown validity, although this assumption is accepted as true by many researchers in
the vision community [863, 885].
3. Cone sensitivities are not uniform even within 2◦ of the fovea area, and are even less
so for the 10◦ area. Fovea cones are slender and parafovea cones are shorter. The shape
of the cone may affect its spectral sensitivity.
4. The CIE Standard Observers represent “averaged” sensitivities. Every individual has
slightly different sensitivities [134, 515, 702, 703]. For some people, the difference can
be significant. If one plots the spectrum locus for an individual observer, it generally has the same kind of shape as that of the standard observer, but the differences tend to lie in the yellow direction.
5. The additivity law in colorimetry fails at very high and very low luminance levels.
6. The facts that color matching functions vary with field size, that many objects are larger
than 2 degrees, and that our eyes are constantly moving, projecting the same object
point onto different locations on the retina, all make us wonder how the CIE system of
colorimetry can be so useful in so many applications. Evidence shows that our color
perception is more of a reconstructed impression than a point-by-point mapping of
tristimulus values.
7. Although there is a high correlation between the colorimetric specification of a color
stimulus and its perceptual appearance under normal viewing conditions, it would be a
major misuse of colorimetry if we did not keep in mind that the colorimetric specification of color stimuli only expresses the equivalence relation up to the photoreceptor signal level and not much beyond that. It has been well established that color perception involves many more complex spatial and temporal processes after the photoreceptors
in the retina and up into the visual cortex and other parts of the brain.
The CIE 1931 XYZ tristimulus values were not defined with explicit consideration of color
differences. Two colors with a small difference in the tristimulus values may look very
different or virtually indistinguishable depending on where the two colors are located in
the XYZ space. For example, MacAdam [618] measured the precision of color matching,
under constant luminance, in a 2◦ matching field surrounded by a 42◦ surround field of a
chromaticity similar to that of CIE source C and a luminance of about 24 cd m−2 . His data
showed that the just-noticeable color differences varied greatly over the CIE 1931 (x, y)
chromaticity diagram. Figure 6.11 shows the contours of 10/3 just-noticeable difference
for many chromaticities on the (x, y) diagram at a constant luminance of about 48 cd m−2 .
These elliptic contours are known as the MacAdam ellipses. It should be pointed out that the
MacAdam ellipses were derived from data from only a single observer, and therefore cannot
be considered as representative for the average observer. Other research groups repeated
MacAdam’s measurements and the general finding was that the sizes and the orientations
of the ellipses are in good agreement between different experiments. However, there were
considerable variations in the shape of the ellipses (the ratio of the major axis to the minor
axis) for different observers.
Since then, there have been many attempts to find some transformation that will map
these ellipses into circles of the same radius. A transformation with this ideal property
can then be said to have mapped the CIE 1931 (x, y)-chromaticity diagram into a uniform
chromaticity scale. If luminance variation is also considered, an ideal mapping should
produce a uniform color space. It was (and probably still is) believed by many people that a
uniform color space would be most useful for applications where perceptual errors are to be
minimized.
Figure 6.11. MacAdam ellipses showing 10/3 just-noticeable difference contours for stimuli of constant luminance (about 48 cd m−2).
After a long series of studies, proposals, and revisions, in 1976 the CIE recommended
the CIE 1976 UCS (uniform chromaticity scale) diagram, the CIELUV color space, and the
CIELAB color space. None of these recommended transformations succeeded in making
the MacAdam ellipses truly circles of uniform size. However, they all make the transformed
spaces nearly uniform. These uniform color spaces are intended for comparisons of small
color differences between object colors of the same size and shape, viewed in identical
white to mid-gray surroundings, by an observer photopically adapted to a visual field of
chromaticity not too different from that of average daylight. Two of the terms that require
special attention here are white and neutral. These two terms are often used interchange-
ably. However, some subtle, but important differences exist. Strictly speaking, white refers
to perception and neutral refers to physical specification. For example, in calibrating a CRT
monitor, it is important to adjust the monitor circuits so that equal R,G,B values produce
the same chromaticity (this is called color tracking, i.e., the three colors track each other
to maintain the same chromaticity throughout the whole luminance range). The (physical)
chromaticity that is so maintained is called the neutral point. However, it is often impossi-
ble to obtain perfect color tracking and therefore the chromaticity at the maximal R,G,B
values is commonly taken as the neutral point (it is often called the white point of the
monitor because it has the maximum luminance and also often appears to be white). As we
mentioned, white refers to perception. For example, the bright snow on an image displayed
on the monitor often appears to be white. However, color appearance is often influenced
by the colors of the immediately neighboring regions and also by almost everything else in
the visual field. Therefore, a color stimulus can appear white, but its physical measurement
can be quite different from the physical specification of the neutral chromaticity. The three
uniform color spaces recommended by the CIE are all based on normalization to neutral
for accounting for the effect of chromatic adaptation. For reflection samples, a neutral point
can be defined as the color stimulus from a perfect reflecting diffuser. Since it reflects 100%
of the light uniformly into all directions, we will call it the neutral white or reference white
to distinguish it from other reflection samples that have the same chromaticity but different
luminances. Although the term “neutral white” seems to be mixing physical and percep-
tual concepts together, we use it for lack of another more commonly accepted term. The
term “reference white” seems to be more widely accepted because the adjective “reference”
appears to take away the subjective part in the term “white”.
Figure 6.12. The CIE 1976 UCS (u′, v′) chromaticity diagram, with wavelengths (in nm) marked along the spectrum locus.
In the CIELUV space, Yn denotes the luminance of the reference white and (u′n, v′n) the chromaticity of the neutral point (or the reference white). Note that the CIELUV space attempts to adjust to different illuminants by a translation of the origin to (u′n, v′n) in (u′, v′) space. Since u′ and v′ are ratios of XYZ tristimulus values and since differences in ratios, such as u′ − u′n, do not follow any
known models of chromatic adaptation, it is not clear why such an illuminant adjustment
procedure was chosen by the CIE. For this reason, we suspect that the CIELUV space
should only be used for illuminants that are close to that used in the original experiments
that produced the data from which the CIELUV formulas were derived.
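A sketch of the CIELUV computation using the standard CIE 1976 definitions of u′, v′, L*, u*, and v* (with the usual linear branch of L* for very dark colors); the reference white in the example is a D65-like white point with Y normalized to 100, chosen only for illustration.

```python
def xyz_to_luv(X, Y, Z, Xn, Yn, Zn):
    """CIE 1976 L*u*v* from XYZ tristimulus values; (Xn, Yn, Zn) is the
    reference white (neutral) point."""
    def uv_prime(X, Y, Z):
        d = X + 15.0 * Y + 3.0 * Z
        return 4.0 * X / d, 9.0 * Y / d

    u_p, v_p = uv_prime(X, Y, Z)       # (u', v') of the stimulus
    un_p, vn_p = uv_prime(Xn, Yn, Zn)  # (u'_n, v'_n) of the reference white

    t = Y / Yn
    L = 116.0 * t ** (1.0 / 3.0) - 16.0 if t > 0.008856 else 903.3 * t
    u = 13.0 * L * (u_p - un_p)
    v = 13.0 * L * (v_p - vn_p)
    return L, u, v

# Example with a D65-like reference white (Y normalized to 100).
print(xyz_to_luv(41.2, 21.3, 1.9, 95.05, 100.0, 108.9))
```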
For viewing reflection samples, the reference neutral point can be taken as the chromatic-
ity of the lighting illuminant. For viewing self-luminous displays, the definition is less clear.
The usual practice is to take the calibrated neutral (white) point. For example, CRT monitors
are calibrated to chosen neutral (white) points of various color temperatures and therefore,
each monitor should have its white point as (u n , vn ). For direct viewing of transparency
materials, such as photographic reversal (slide) films, a common practice is to use light
transmitted through the minimum density of the film as the reference white. For viewing a
projection, the screen radiance projected by the film minimum density as measured at the
viewing position is taken as the reference white. However, this is problematic because the
experimental data used to derive the color space were collected from experiments that used
reflection samples.
The difference ΔE*uv between two color stimuli is calculated as the Euclidean distance between the points representing them in the space:

\Delta E^*_{uv} = [(\Delta L^*)^2 + (\Delta u^*)^2 + (\Delta v^*)^2]^{1/2}.    (6.36)

The CIE 1976 u,v chroma and saturation are defined by

C^*_{uv} = (u^{*2} + v^{*2})^{1/2},    (6.37)
s_{uv} = 13[(u' - u'_n)^2 + (v' - v'_n)^2]^{1/2},    (6.38)

and the CIE 1976 u,v hue-angle by

h_{uv} = \arctan(v^*/u^*).    (6.39)
Two important characteristics of the CIELUV space should be noted. The first is that the (u*, v*) coordinates are proportional to the lightness L* and so is the chroma C*uv.
However, the saturation suv does not depend on the lightness. The distinction between
chroma and saturation in CIELUV is an attempt to distinguish the physical correlates of
two perceptual quantities. These color perception terms will be described in Chapter 18
when we discuss color appearance. For now, we will clarify the difference in these two
quantities by referring to the following common experience. Assume that we are in a room
equipped with multiple lamps of the same physical characteristics. For example, in an office
environment, fluorescent lamps are arranged in sections, each of which has several lamps
that can be controlled by different switches. First, let us turn on only some of the lamps and
observe and try to remember the colors of the objects in the room. Next, we turn on all the
lamps and the room looks much brighter. The same objects now look more colorful
than before, although the basic physical properties such as the spectral reflectances of the
objects have not changed. We say that the chroma has increased with the higher illuminance,
but the saturation remains the same. Whether the “perceived” chroma is proportional to the
lightness is still an open question.
The second important characteristic of the CIELUV space is that u′ and v′ are projective transforms of the CIE XYZ space and therefore a straight line in the CIE XYZ space is also a straight line in the (u′, v′) space. For color stimuli of the same luminance, a straight line in
the CIE XYZ space is again a straight line in the CIELUV space. This property is important
for some applications in additive color reproduction (such as television).
The CIELAB color space is defined as follows:

L^* = 116\, f(Y/Y_n) - 16,    (6.40)
a^* = 500\,[\,f(X/X_n) - f(Y/Y_n)\,],    (6.41)
b^* = 200\,[\,f(Y/Y_n) - f(Z/Z_n)\,],    (6.42)

where f(t) = t^{1/3} if t > 0.008 856, else f(t) = 7.787t + 16/116. (Xn, Yn, Zn) are the tris-
timulus values of the reference (neutral) white point. Note that the adjustment for the
illuminant is by taking ratios of tristimulus values relative to the reference white. This type
of adjustment is similar to the von Kries model of chromatic adaptation, which is found
to describe experimental data quite well. However, the von Kries model requires ratios
to be taken for each cone response. The CIELAB space does not follow that model be-
cause it takes ratios in XYZ tristimulus values which are linear combinations of the cone
responses. Despite this theoretical difficulty, the CIELAB space is found to be adequate in
many practical applications.
For viewing reflection samples, the reference white can be taken as the reflection from
a perfect reflecting diffuser under the same lighting illuminant. However, this is often
unsatisfactory if the target is a photographic reflection print or reflection print of an image.
The reason is that the picture on the reflection print is a scene in itself and our visual system
makes some adjustment in its judgment regarding the “perceived” lightness, depending on
the illumination geometry in the scene. The best density balance for typical reflection prints
also leaves some reflectance dynamic range to accommodate specular highlights so that
they are printed “brighter than white”. Therefore, the common recommendation for the
reference white in a reflection print is to adjust the density to its scene-dependent optimum
and then back calculate what the ideal reflecting diffuser will be on that print and use that as
the reference white. In this case, the Yn in (X n , Yn , Z n ) for the reflection print is often darker
than the paper white, and the chromaticity of the reference white can also be different from
that of the viewing illuminant.
For viewing self-luminous displays, the definition of reference white is also problematic.
Again, one practice is to take the calibrated white point of the display as the reference white.
This is fine if the content of the displayed material is color graphics. If the content is an
image, a better result is often obtained when the reference white is taken to be the ideal
white object in the image. For direct viewing of transparency materials, such as photographic
reversal (slide) films, a common practice is to use light transmitted through the minimum
density of the film as the reference white. For viewing a projection, the screen radiance
projected by the film minimum density as measured at the viewing position is taken as the
reference white. Again, this is fine when the content is color graphics. A better practice is to
choose the ideal white in the image. In summary, the application of CIELAB to color image
displays is not a straightforward calculation. It involves a subjective adjustment of the color
and density balance for each image first before the reference white can be estimated or back
calculated.
CIE 1976 a,b chroma is defined by

C^*_{ab} = (a^{*2} + b^{*2})^{1/2}.    (6.43)

CIE 1976 a,b hue-angle is defined as

h_{ab} = \arctan(b^*/a^*).    (6.44)

The CIELAB space is designed for computing small color differences. The color difference between two stimuli, 1 and 2, is defined as follows. The CIE 1976 lightness difference ΔL*, chroma difference ΔC*ab, and total color difference ΔE*ab are:

\Delta L^* = L^*_1 - L^*_2,    (6.45)
\Delta C^*_{ab} = C^*_{ab,1} - C^*_{ab,2},    (6.46)
\Delta E^*_{ab} = [(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2]^{1/2}.    (6.47)

CIE 1976 a,b hue difference is defined by

\Delta H^*_{ab} = [(\Delta E^*_{ab})^2 - (\Delta L^*)^2 - (\Delta C^*_{ab})^2]^{1/2}.    (6.48)
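The CIELAB coordinates and the color differences of Eqs. (6.45)–(6.47) can be computed as in the sketch below; the piecewise function f(t) is the one given above, while the reference white and the two stimuli are made-up values.

```python
def xyz_to_lab(X, Y, Z, Xn, Yn, Zn):
    """CIE 1976 L*a*b* from XYZ tristimulus values; (Xn, Yn, Zn) is the
    reference white."""
    def f(t):
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

def delta_e_ab(lab1, lab2):
    """Eq. (6.47): CIE 1976 total color difference."""
    return sum((p - q) ** 2 for p, q in zip(lab1, lab2)) ** 0.5

# Made-up stimuli under a D65-like reference white (Y normalized to 100).
white = (95.05, 100.0, 108.9)
lab1 = xyz_to_lab(30.0, 25.0, 20.0, *white)
lab2 = xyz_to_lab(31.0, 26.0, 19.0, *white)
print(lab1, lab2, delta_e_ab(lab1, lab2))
```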
Among the various “uniform” color spaces, CIELAB is the most widely used. However,
it should be pointed out again that the CIELAB color difference was derived from observing
flat reflection samples on a light gray to white background. This type of color judgment is
quite different from typical color imaging applications, where colors appear in a complex
context and with cognitive interpretation of illumination, texture, shading, shadow, and sur-
face shapes. Two of the important attributes missing in designing the CIELAB and CIELUV
spaces are the spatial–temporal variations of the color stimuli and the interplay between the
object properties and the “interpreted” color perceptions. There are also attempts to apply
the CIELAB metric to the original three-dimensional scenes. This raises two serious ques-
tions: what is the reference white and can the metric be applied to stimuli more luminous
than the reference white? This type of application is obviously stretching the validity of the
CIELAB space too far beyond what it was originally designed for.
The CIE94 color difference is computed as

\Delta E^*_{94} = \left[\left(\frac{\Delta L^*}{k_L S_L}\right)^2 + \left(\frac{\Delta C^*_{ab}}{k_C S_C}\right)^2 + \left(\frac{\Delta H^*_{ab}}{k_H S_H}\right)^2\right]^{1/2},    (6.49)

where

S_L = 1,    (6.50)
S_C = 1 + 0.045\, C^*_{ab},    (6.51)
S_H = 1 + 0.015\, C^*_{ab},    (6.52)

and C*ab is the chroma of the standard sample. If neither sample is considered as the standard, then C*ab is the geometric mean chroma of the two samples. For the defined reference conditions, kL, kC, and kH are set to 1. In the textile industry, kL is usually set to 2.0. The complete model is denoted as the CIE 1994 (ΔL*, ΔC*ab, ΔH*ab) color-difference model, with symbol ΔE*94, and is abbreviated as CIE94. When the kL, kC, and kH used are different from unity, (kL : kC : kH) should be included after the symbol or abbreviation, e.g., CIE94(2:1:1) or ΔE*94(2:1:1) in the textile industry.
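A sketch of the CIE94 computation of Eqs. (6.49)–(6.52), assuming the first argument is the standard sample whose chroma is used in SC and SH; the input CIELAB values are invented for illustration.

```python
import math

def delta_e_94(lab_std, lab_sample, kL=1.0, kC=1.0, kH=1.0):
    """CIE94 color difference; the standard sample's chroma is used in the
    weighting functions S_C and S_H (Eqs. (6.50)-(6.52))."""
    L1, a1, b1 = lab_std
    L2, a2, b2 = lab_sample
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    dL, dC = L1 - L2, C1 - C2
    dE2 = (L1 - L2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2
    dH2 = max(dE2 - dL ** 2 - dC ** 2, 0.0)   # squared hue difference, Eq. (6.48)

    SL, SC, SH = 1.0, 1.0 + 0.045 * C1, 1.0 + 0.015 * C1
    return math.sqrt((dL / (kL * SL)) ** 2 +
                     (dC / (kC * SC)) ** 2 +
                     dH2 / (kH * SH) ** 2)

# Textile usage, CIE94(2:1:1), would be delta_e_94(lab_std, lab_sample, kL=2.0).
print(delta_e_94((50.0, 10.0, 20.0), (51.0, 12.0, 18.0)))
```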
More recently, the CIE recommended the CIEDE2000 formula to further improve the prediction of small color differences, particularly blue and near-neutral color differences. The formula is still based on CIELAB and the CIEDE2000 color differences are calculated as follows: let L̄′, C̄′ab, and h̄′ab be the arithmetic means of the L′, C′ab, and h′ab values for a pair of samples denoted by subscripts 1 and 2, and
G = 0.5\left(1 - \sqrt{\frac{\bar{C}^{*\,7}_{ab}}{\bar{C}^{*\,7}_{ab} + 25^7}}\right),    (6.53)

L' = L^*,    (6.54)
a' = (1 + G)\,a^*,    (6.55)
b' = b^*,    (6.56)
C'_{ab} = \sqrt{a'^2 + b'^2},    (6.57)
h'_{ab} = (180°/\pi)\,\tan^{-1}(b'/a')\ \ [\mathrm{degree}],    (6.58)

\Delta L' = L'_1 - L'_2,    (6.59)
\Delta C'_{ab} = C'_{ab,1} - C'_{ab,2},    (6.60)
\Delta h'_{ab} = h'_{ab,1} - h'_{ab,2},    (6.61)
\Delta H'_{ab} = 2\sqrt{C'_{ab,1}\,C'_{ab,2}}\,\sin\!\left(\frac{\Delta h'_{ab}}{2}\right),    (6.62)

\Delta E_{00} = \left[\left(\frac{\Delta L'}{k_L S_L}\right)^2 + \left(\frac{\Delta C'_{ab}}{k_C S_C}\right)^2 + \left(\frac{\Delta H'_{ab}}{k_H S_H}\right)^2 + R_T\,\frac{\Delta C'_{ab}}{k_C S_C}\,\frac{\Delta H'_{ab}}{k_H S_H}\right]^{1/2},    (6.63)

where

S_L = 1 + \frac{0.015\,(\bar{L}' - 50)^2}{\sqrt{20 + (\bar{L}' - 50)^2}},    (6.64)
S_C = 1 + 0.045\,\bar{C}'_{ab},    (6.65)
S_H = 1 + 0.015\,\bar{C}'_{ab}\,T,    (6.66)

with

T = 1 - 0.17\cos(\bar{h}'_{ab} - 30°) + 0.24\cos(2\bar{h}'_{ab}) + 0.32\cos(3\bar{h}'_{ab} + 6°) - 0.20\cos(4\bar{h}'_{ab} - 63°)    (6.67)

and

R_T = -\sin(2\Delta\theta)\,R_C,    (6.68)

where

\Delta\theta = 30\exp\!\left\{-\left[\frac{\bar{h}'_{ab} - 275°}{25°}\right]^2\right\}\ \ [\mathrm{degree}],    (6.69)

R_C = 2\sqrt{\frac{\bar{C}'^{\,7}_{ab}}{\bar{C}'^{\,7}_{ab} + 25^7}}.    (6.70)
In computing h̄′ab and Δh′ab, the unit used is the degree rather than the radian. All final angles are made positive by adding 360° if necessary. Since hue angle is a periodic function with period 360°, the convention for computing the average hue angle h̄′ab is to check whether the absolute difference between the two angles is less than 180°. If it is, h̄′ab is simply the arithmetic mean. If it is not, 360° is subtracted from the larger angle and the arithmetic mean is then taken. For example, if the two angles are 90° and 300°, the correct arithmetic mean is computed by first subtracting 360° from 300° to get −60° and then taking the arithmetic mean with 90° to get the correct mean of (−60° + 90°)/2 = 15°.
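The hue-angle averaging convention can be written compactly; the sketch below reproduces the 90°/300° example from the text.

```python
def mean_hue_angle(h1, h2):
    """Arithmetic mean of two hue angles in degrees, following the convention
    described above: if the angles differ by more than 180 degrees, subtract
    360 from the larger one before averaging, then map back into [0, 360)."""
    if abs(h1 - h2) > 180.0:
        if h1 > h2:
            h1 -= 360.0
        else:
            h2 -= 360.0
    return ((h1 + h2) / 2.0) % 360.0

print(mean_hue_angle(90.0, 300.0))   # 15.0, as in the example above
```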
As one can see, many changes have been made to CIE94 to derive the new CIEDE2000.
The color-difference formula is no longer a vector-space distance measure because it de-
pends on the two colors being compared. The increased complexity of the new formula
is definitely worthwhile for many applications where a small color difference is of large
importance. However, it also brings with it a heavier mathematical burden to maintain its
own internal consistency. For example, angle is a periodic function, but Eq. (6.69) contains
an exponential function, which is not a periodic function. Here it might be more appropriate
to use the von Mises function [318, p. 81–3, 633]: f (θ) = A exp[k cos(θ − α)]. It is very
possible that problems will be found and revision will be needed in the future, but for the
time being, the formula does improve the prediction of small color differences significantly
[613]. This in itself is a great contribution to many critical applications.
In order to specify the color coordinates relative to those of an illuminant, the CIE recom-
mends the following terms. Figure 6.13 shows how they are defined relative to an illuminant
W for any given stimulus, such as A or B.
• Dominant wavelength (of a color stimulus, A) λd: The wavelength of the monochro-
matic stimulus that, when additively mixed in suitable proportions with the specified
achromatic stimulus, matches the color stimulus considered. For stimuli whose chro-
maticities lie between those of the specified achromatic stimulus and the two ends of the
spectrum, the complementary wavelength is used instead of the dominant wavelength.
• Complementary wavelength (of a color stimulus, B) λc: The wavelength of the
monochromatic stimulus that, when additively mixed in suitable proportions with the
color stimulus considered, matches the specified achromatic stimulus.
• Colorimetric purity: Let the dominant wavelength of a given color stimulus, A, be
λd . The color stimulus, A, can be matched by additively mixing the monochromatic
light of λd and the specified achromatic light, W . Then the colorimetric purity, pc
is defined by pc = L d /(L w + L d ) where L d is the luminance of the monochromatic
light, and L w the luminance of the achromatic light. In the case of stimuli specified
by the complementary wavelength, suitable mixtures of light from the two ends of the
spectrum are used instead of the monochromatic stimuli.
• Excitation purity: pe = (y − yw)/(yd − yw) or (x − xw)/(xd − xw), where (x, y), (xw, yw), and (xd, yd) are the (x, y) chromaticity coordinates of the color stimulus considered, the specified achromatic stimulus, and the monochromatic stimulus at the dominant wavelength, respectively.
Figure 6.13. CIE 1931 (x, y) chromaticity diagram showing how dominant wavelength, complementary wavelength, colorimetric purity, and excitation purity are defined, all relative to an illuminant W.
• Correlated color temperature: To describe the color quality of an illuminant, the cor-
related color temperature is often used. The correlated color temperature of a given
stimulus is the temperature of the Planckian (blackbody) radiator whose perceived
color most closely resembles that of the stimulus at the same brightness, and under the
same viewing conditions. The recommended method of calculating the correlated color
temperature of a stimulus is to determine on a chromaticity diagram the temperature
corresponding to the point on the Planckian locus that is intersected by the agreed
isotemperature line containing the point representing the stimulus. The isotemperature
lines presently recommended are those normal to the Planckian locus in the now out-
dated CIE 1960-UCS (u,v) diagram or equivalently in a chromaticity diagram in which
2v /3 is plotted against u , where v and u are the coordinates of the CIE 1976 uniform
chromaticity scale diagram. One can determine the correlated color temperature of
a given light source by calculating its (x, y) chromaticity and graphically interpolat-
ing from an existing isotemperature chart [1053, p. 225], or by using a computer to
numerically calculate the answer from the definition [1053, pp. 226–7].
The spectral composition of the light reflected from a surface depends not only on the
reflecting properties of the surface, but also on the illumination incident on the surface. The
specification of color measurements thus requires the specification of the illumination. To
Figure 6.14. The CIE 1931 (x, y) chromaticity diagram showing the loci of blackbody radiators at correlated color temperatures from 2000 K to 20 000 K.
avoid use of a large number of light sources, the CIE has recommended “sources” which
it calls A, B, C, and “illuminants”, called A, B, C, D, and E. By source is meant the
specification of the actual physical setup that can produce the desired light. By illuminant is
meant the specification of a spectral power distribution which may or may not be realizable
physically. For example, CIE Source A is realized by a gas-filled tungsten filament lamp
operating at a correlated color temperature of 2856 K, while CIE Illuminant A is defined
to be the light from the blackbody radiator at 2856 K, whose spectral radiant exitance
[W m$^{-3}$], $M_{e,\lambda}$, is given by Planck's radiation law:

$$M_{e,\lambda}(\lambda, T) = \frac{c_1}{\lambda^5\left(e^{c_2/\lambda T} - 1\right)}, \qquad (6.71)$$
The CIE daylight illuminants D are defined by relative spectral power distributions of the form

$$S_D(\lambda) = S_0(\lambda) + M_1 S_1(\lambda) + M_2 S_2(\lambda), \qquad (6.72)$$

where $S_0(\lambda)$, $S_1(\lambda)$, and $S_2(\lambda)$ are the mean and the two most important "eigenvectors" of a large set of measured daylight distributions (see [1053, p. 762]). Figure 6.16 shows the
[Figure: the relative spectral power distributions of CIE Illuminants A, B, C, and D65.]
Figure 6.16. The spectral power distributions of the CIE daylight vectors, S0 (λ), S1 (λ), and S2 (λ).
spectral power distribution of the three vectors. M1 and M2 are scalar multipliers that are
related to the chromaticity coordinates (xD , yD ) of the illuminant by the following functions
[1053, p. 146]:
$$M_1 = \frac{-1.3515 - 1.7703\,x_D + 5.9114\,y_D}{0.0241 + 0.2562\,x_D - 0.7341\,y_D}, \qquad (6.73)$$

$$M_2 = \frac{0.0300 - 31.4424\,x_D + 30.0717\,y_D}{0.0241 + 0.2562\,x_D - 0.7341\,y_D}. \qquad (6.74)$$
The chromaticity coordinate $y_D$ is computed from $x_D$ by

$$y_D = -3.000\,x_D^2 + 2.870\,x_D - 0.275, \qquad (6.75)$$

which defines the "CIE daylight locus" in the CIE 1931 chromaticity diagram [1053, p. 146]. For correlated color temperatures $T_c$ from approximately 4000 K to 7000 K:

$$x_D = -4.6070\,\frac{10^9}{T_c^3} + 2.9678\,\frac{10^6}{T_c^2} + 0.09911\,\frac{10^3}{T_c} + 0.244063. \qquad (6.76)$$

For correlated color temperatures $T_c$ from approximately 7000 K to 25 000 K:

$$x_D = -2.0064\,\frac{10^9}{T_c^3} + 1.9018\,\frac{10^6}{T_c^2} + 0.24748\,\frac{10^3}{T_c} + 0.237040. \qquad (6.77)$$
Figure 6.17 shows the spectral power distributions of the CIE daylight illuminant at several
correlated color temperatures.
In order to reduce the number of illuminants used in colorimetry, the CIE recommends
that D65 (Tc = 6504 K) be used as the preferred illuminant. When it is not convenient to do
so, D55 (Tc = 5503 K) or D75 (Tc = 7504 K) should be used. CIE illuminant E is defined as a
stimulus whose spectral power distribution is constant at all wavelengths. It is also referred
to as the equal-energy white or the equal-energy spectrum.
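A small computational sketch of Eqs. (6.73)–(6.77) follows. It returns x_D, y_D, M_1, and M_2 for a given correlated color temperature; constructing the full spectrum S_D(λ) = S_0(λ) + M_1S_1(λ) + M_2S_2(λ) additionally requires the tabulated CIE daylight vectors, which are not reproduced here, and the function name is illustrative.

```python
# Sketch of the CIE daylight-illuminant calculation (Eqs. (6.73)-(6.77)).
def daylight_parameters(cct):
    """Return (x_D, y_D, M1, M2) for a daylight illuminant of the given CCT in kelvin."""
    if 4000.0 <= cct <= 7000.0:
        xd = (-4.6070e9 / cct**3 + 2.9678e6 / cct**2
              + 0.09911e3 / cct + 0.244063)
    elif 7000.0 < cct <= 25000.0:
        xd = (-2.0064e9 / cct**3 + 1.9018e6 / cct**2
              + 0.24748e3 / cct + 0.237040)
    else:
        raise ValueError("CCT outside the 4000 K - 25 000 K range")
    yd = -3.000 * xd**2 + 2.870 * xd - 0.275          # CIE daylight locus
    denom = 0.0241 + 0.2562 * xd - 0.7341 * yd
    m1 = (-1.3515 - 1.7703 * xd + 5.9114 * yd) / denom
    m2 = (0.0300 - 31.4424 * xd + 30.0717 * yd) / denom
    return xd, yd, m1, m2

# D65 corresponds to Tc = 6504 K; this prints approximately (0.3127, 0.3291, -0.29, -0.69).
print(daylight_parameters(6504.0))
```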
6.13 Illuminating and viewing conditions

Few, if any, physical samples transmit or diffuse light in an isotropic manner. It is, therefore,
necessary to specify the illuminating and viewing conditions when making colorimetric
measurements. For this purpose, the CIE recommends a set of standard illuminating and viewing geometries [186] for reflecting specimens, and a corresponding set [186] for transmitting specimens.
6.14 The vector space formulation of color calculations

As Grassmann recognized in the early 1850s [359, 861], vector space operations are very
convenient for describing color matching experiments and colorimetry in general. Indeed,
a number of elegant solutions to certain colorimetric problems are successfully derived
only after the vector space formulation is adopted [201, 202, 413, 965, 1005]. Interest in
the vector space formulation of color calculation is also partially due to the wide use of
computers.
The notation used for the vector space formulation is very concise, making it possible
to express a very complicated colorimetric statement in a very short mathematical relation.
Because of this power of expression and the large knowledge base in linear algebra and
set-theoretical methods, the vector space formulation will surely become very important in
colorimetry. Since vectors of finite-dimensional spaces are almost always used in computers,
we have to be careful about the potential errors resulting from the discrete sampling of
spectral distribution functions. Our understanding of this type of error is not quite adequate
at present and further research is needed. With this word of caution, we can now introduce
the vector space formulation of colorimetry.
Let the spectral sensitivities of the eye at wavelength λ be sr (λ), sg (λ), and sb (λ). Since the
visible range of wavelength is finite (roughly from 380 nm to 780 nm), and since most inte-
gration is done by summation with small increments, we can sample all functions of wave-
length at N discrete equally spaced wavelengths and represent them as finite-dimensional
vectors. (Depending on the required accuracy, N can be as many as 401 or more, or as few
as 41.) The three cone-sensitivity functions, sr (λ), sg (λ), and sb (λ), can thus be written as
N × 1 vectors Sr , Sg , and Sb . We can construct an N × 3 matrix S such that Sr , Sg , and Sb are
its three column vectors. Let L, an N × 1 vector, represent the color stimulus incident on the
eye, and let R, a 3 × 1 vector, represent the response (or quantum catches) of the three types
of cones. Then we can express the relation between $R$ and $L$ as $R = S^TL$. Given two color stimuli $L_1$ and $L_2$, a color match between them simply means $S^TL_1 = R_1 = R_2 = S^TL_2$. Let $A$, $B$, $C$, and $D$ be four color stimuli, and $R_A = S^TA$, $R_B = S^TB$, $R_C = S^TC$, and $R_D = S^TD$
be their cone-response vectors. Understanding that additive mixture of two stimuli A and
B, i.e., A ⊕ B, is simply A + B in vector notation, the four laws of colorimetry can now be
stated as follows:
For example, additivity states that if $A$ matches $B$ ($R_A = R_B$) and $C$ matches $D$ ($R_C = R_D$), then the mixture $A \oplus C$ matches $B \oplus D$:

$$R_{(A+C)} = S^T(A + C) = S^TA + S^TC = R_A + R_C \qquad (6.79)$$
$$\qquad\quad\; = R_B + R_D = S^TB + S^TD = S^T(B + D) = R_{(B+D)}. \qquad (6.80)$$
Similarly, if $A$ matches $B$ and $A \oplus C$ matches $B \oplus D$, then $C$ matches $D$:

$$R_C = S^TC = S^T(A + C) - S^TA = S^T(B + D) - S^TB = S^TD = R_D. \qquad (6.81)$$
Let us select three primaries P1 , P2 , and P3 , each one being an N × 1 vector. Let P be an
N × 3 matrix such that P1 , P2 , and P3 are its three column vectors. A given color stimulus
A can be matched by proper mixture of the three primaries if there exists a 3 × 1 coefficient
vector $C_P = [c_1, c_2, c_3]^T$ such that

$$S^TPC_P = S^TA. \qquad (6.82)$$

When $S^TP$ is nonsingular, the solution is

$$C_P = (S^TP)^{-1}S^TA. \qquad (6.83)$$
Since S has a rank of 3, it is only necessary to pick a set of three primaries that are mutually
independent to guarantee the full rank of the matrix P and the nonsingularity of S TP. This
proves that any set of three independent color stimuli can be used as primaries to match any
test stimulus. However, the solution vector $C_P$ may have negative component(s), in which
case the primary with the negative component is mixed with the test stimulus, instead of
with the other primaries. A very interesting and important property of color matching is
that the coefficient vector $C_P$ depends on the cone spectral sensitivity functions $S$ only up to a linear transformation. In other words, if we construct a new visual system whose cone spectral sensitivity functions, $S'$, are linear combinations of $S$, i.e., $S' = SW$, where $W$ is any nonsingular 3 × 3 matrix, it can be shown that the coefficient vector $C_P$ for the new
system will remain the same. This means that it is not possible to determine the exact cone
spectral sensitivity functions purely by color matching experiments.
By definition, the ith row vector of the color matching functions M P (an N × 3 matrix)
with respect to the primaries P is the coefficient vector C TP when the test stimulus A is
monochromatic light with the ith element equal to 1 and the rest of the elements 0. From
Eq. (6.83), this row vector is $C_P^T = A^TS(P^TS)^{-1}$; stacking the rows for $i = 1, \ldots, N$ (i.e., letting $A$ run over all the unit vectors) gives

$$M_P = S(P^TS)^{-1}. \qquad (6.85)$$
Since $(P^TS)^{-1}$ is a 3 × 3 matrix, the color matching functions $M_P$ are linear combinations of the cone spectral sensitivity functions. Multiplying both sides of Eq. (6.85) on the left by $P^T$, we can obtain an interesting relation:

$$P^TM_P = I, \qquad (6.86)$$
where I is the 3 × 3 identity matrix. Therefore, the spectra of the primaries, P, and their
corresponding color matching functions, M P , are related in a special way.
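These relations are easy to verify numerically. The sketch below uses random nonnegative N × 3 matrices as stand-ins for the cone sensitivities S and the primary spectra P (in practice one would use tabulated data); it checks Eqs. (6.83), (6.85), and (6.86), and the invariance of C_P under S′ = SW.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 41                                  # number of wavelength samples (e.g., 380-780 nm at 10 nm)

S = rng.random((N, 3))                  # stand-in cone sensitivities (columns: s_r, s_g, s_b)
P = rng.random((N, 3))                  # stand-in spectra of three independent primaries
A = rng.random(N)                       # an arbitrary test stimulus

# Matching coefficients of A with respect to the primaries P (Eq. (6.83)).
C_P = np.linalg.solve(S.T @ P, S.T @ A)

# The match: P @ C_P and A produce identical cone responses.
assert np.allclose(S.T @ (P @ C_P), S.T @ A)

# Color matching functions with respect to P (Eq. (6.85)) and the relation P^T M_P = I (Eq. (6.86)).
M_P = S @ np.linalg.inv(P.T @ S)
assert np.allclose(P.T @ M_P, np.eye(3))

# C_P is unchanged if the cone sensitivities are replaced by any nonsingular linear combination S' = S W.
W = rng.random((3, 3))
C_P2 = np.linalg.solve((S @ W).T @ P, (S @ W).T @ A)
assert np.allclose(C_P, C_P2)
```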
If another set of primaries Q is chosen to match the same stimulus A, the coefficient
vector C Q can be related to C P by:
$$C_Q = (S^TQ)^{-1}S^TA \quad \text{(from Eq. (6.83))}$$
$$\;\;\;\;= (S^TQ)^{-1}S^TPC_P \quad \text{(from Eq. (6.82))} \qquad (6.87)$$

and

$$C_Q^T = C_P^TP^TS(Q^TS)^{-1}. \qquad (6.88)$$
Again, by stacking up the $C_Q^T$s and $C_P^T$s for single-wavelength stimuli, we have

$$M_Q = M_PP^TS(Q^TS)^{-1} = M_PP^TM_Q, \qquad (6.89)$$

and therefore

$$M_P = M_Q(P^TM_Q)^{-1}. \qquad (6.90)$$
Similarly, $M_Q = M_P(Q^TM_P)^{-1}$, and using this relation, we can also derive the following equation:

$$C_Q = (S^TQ)^{-1}S^TPC_P = M_Q^TPC_P = (M_P^TQ)^{-1}M_P^TPC_P = (M_P^TQ)^{-1}C_P, \qquad (6.91)$$
which is the transformation of tristimulus values between two sets of primaries, as has been
shown in Eq. (6.11).
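A self-contained numerical check of Eq. (6.91), again with random stand-ins for S and the two primary sets P and Q:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 41
S = rng.random((N, 3))                  # stand-in cone sensitivities
P = rng.random((N, 3))                  # primaries P
Q = rng.random((N, 3))                  # a second set of primaries Q
A = rng.random(N)                       # test stimulus

C_P = np.linalg.solve(S.T @ P, S.T @ A)     # coefficients with respect to P (Eq. (6.83))
C_Q = np.linalg.solve(S.T @ Q, S.T @ A)     # coefficients with respect to Q

# Eq. (6.91): the same change of coordinates expressed with the color matching functions of P.
M_P = S @ np.linalg.inv(P.T @ S)
assert np.allclose(C_Q, np.linalg.solve(M_P.T @ Q, C_P))
```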
One area in colorimetry is the study of metameric color stimuli (stimuli that have different
spectral power distributions, but appear to be the same color visually). An interesting question is: given a spectral power distribution, how can we generate its various metamers
systematically? This question is difficult to answer in the continuous domain, but if cast in
the finite-dimensional vector space formulation, it becomes quite easy as was first pointed
out by Cohen and Kappauf [201]. The basic solution is based on the observation by Wyszecki
that if a spectral function, B, produces zero integrals with all three cone-sensitivity func-
tions, sr , sg , and sb , it is invisible to the eye, and when it is added to any spectral power
distribution, A, the resulting spectral power distribution, A + B, will be a metamer of A.
The invisible spectral function B is therefore called a metameric black.
If we treat the three spectral-sensitivity functions of the cones as an N × 3 matrix, S,
we can imagine that in N -dimensional space, all vectors that are perpendicular to the three
cone vectors will not be visible to our eyes because the inner products between them and
the cone spectral-sensitivity functions will be zero. If we can find a projection matrix, V ,
that will project any N -dimensional vector, A, into the three-dimensional subspace that is
spanned by the three cone vectors, then the difference, $A - VA$, will be a metameric black. A projection matrix of this nature is well known in matrix algebra [350] to be:

$$V = S(S^TS)^{-1}S^T. \qquad (6.92)$$
Since any nonsingular linear transform of the three cone vectors spans the same subspace,
the projection operator V can be constructed from any measured color matching functions
as well (see Eq. (6.85)). This can be easily shown as follows: for any $S' = SW$, where $W$ is any nonsingular 3 × 3 matrix,

$$S'(S'^TS')^{-1}S'^T = SW\left(W^TS^TSW\right)^{-1}W^TS^T = SWW^{-1}(S^TS)^{-1}(W^T)^{-1}W^TS^T = S(S^TS)^{-1}S^T = V.$$
The elegant simplicity resulting from the vector space formulation is thus clearly demon-
strated in the example of computing metameric black and proving its invariance under
nonsingular linear transform.
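The metameric-black construction of Eq. (6.92) and its invariance can likewise be verified in a few lines, using random stand-ins for the sensitivity functions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 41
S = rng.random((N, 3))                  # stand-in cone (or color matching) functions
A = rng.random(N)                       # any spectral power distribution

# Projection onto the subspace spanned by the columns of S (Eq. (6.92)).
V = S @ np.linalg.inv(S.T @ S) @ S.T

B = A - V @ A                           # the metameric black of A (may have negative values)
assert np.allclose(S.T @ B, 0.0)        # invisible: zero response in all three channels
assert np.allclose(S.T @ (A + B), S.T @ A)   # A + B is a metamer of A

# V is unchanged under any nonsingular linear transform S' = S W of the sensitivities.
W = rng.random((3, 3))
S2 = S @ W
V2 = S2 @ np.linalg.inv(S2.T @ S2) @ S2.T
assert np.allclose(V, V2)
```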
The applications of colorimetry are numerous. We will present three examples in this section.
The first example shows how the National Television Systems Committee (NTSC) color
signals are defined by its choice of phosphor primaries and reference white. The second
example shows how a realistic scene should be generated in computer graphics. The third
example discusses various issues in digital color image processing.
Table 6.2. The chromaticities of the NTSC phosphor primaries and reference white (CIE Illuminant C).

Stimulus                 r     g     b     x      y      z
Red primary (R)          1     0     0     0.67   0.33   0.00
Green primary (G)        0     1     0     0.21   0.71   0.08
Blue primary (B)         0     0     1     0.14   0.08   0.78
Reference white          1/3   1/3   1/3   0.310  0.316  0.374
The peak luminance of a typical TV monitor is much lower than that of an outdoor scene and is also less than studio lighting (200 cd m−2 or more). Therefore, it is not practical to require
the TV monitor to reproduce the same XYZ tristimulus values as the original scene. Thus
typically the signals of a “white object” with 60% reflectance are scaled to the maximum
amplitude but chromaticities are generally reproduced correctly.
The NTSC chromaticity transformation between the phosphor primaries and the CIE
XYZ primaries is shown in Table 6.2. Using the relations developed in Section 6.4, we have
$$V = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix} = \begin{pmatrix} 0.67 & 0.21 & 0.14 \\ 0.33 & 0.71 & 0.08 \\ 0.00 & 0.08 & 0.78 \end{pmatrix}, \qquad (6.93)$$

$$U = \begin{pmatrix} r_1 & r_2 & r_3 \\ g_1 & g_2 & g_3 \\ b_1 & b_2 & b_3 \end{pmatrix} = \begin{pmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{pmatrix}, \qquad (6.94)$$

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = U^{-1}\begin{pmatrix} r_4 \\ g_4 \\ b_4 \end{pmatrix} = \begin{pmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{pmatrix}\begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix} = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}, \qquad (6.95)$$

$$\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = V^{-1}\begin{pmatrix} x_4 \\ y_4 \\ z_4 \end{pmatrix} = \begin{pmatrix} 0.67 & 0.21 & 0.14 \\ 0.33 & 0.71 & 0.08 \\ 0.00 & 0.08 & 0.78 \end{pmatrix}^{-1}\begin{pmatrix} 0.310 \\ 0.316 \\ 0.374 \end{pmatrix} = \begin{pmatrix} 0.286\,283 \\ 0.260\,999 \\ 0.452\,718 \end{pmatrix}, \qquad (6.96)$$

and

$$D = \begin{pmatrix} 0.858\,849 & 0 & 0 \\ 0 & 0.782\,997 & 0 \\ 0 & 0 & 1.358\,15 \end{pmatrix}. \qquad (6.97)$$
The convention is to choose the constant c so that Y is equal to 1.0 when R, G, and B are
set to 1.0. By doing so, we obtain the transformation matrix A as
$$A = \begin{pmatrix} 0.607 & 0.173 & 0.200 \\ 0.299 & 0.586 & 0.115 \\ 0.000 & 0.066 & 1.117 \end{pmatrix}. \qquad (6.98)$$
The numbers are subject to errors due to rounding and matrix inversion. To the degree of accuracy in our computation, the matrix matches the familiar NTSC signal specifications (for example, the second row reproduces the well-known luminance equation Y = 0.299R + 0.587G + 0.114B).
In order to compensate for the nonlinear characteristics of the CRTs used as television receivers, the (R, G, B) signals are predistorted to (R′, G′, B′) so that, when (R′, G′, B′) are applied to the CRTs, the displayed images are proportional to (R, G, B). This predistortion step is called "gamma correction":

$$R' = R^{1/\gamma}, \quad G' = G^{1/\gamma}, \quad B' = B^{1/\gamma},$$
$$Y' = 0.299R' + 0.587G' + 0.114B',$$

where γ = 2.2 for the NTSC standard. The (R′, G′, B′) signals are then converted to color difference signals:
$$U = 0.493(B' - Y'), \qquad (6.99)$$
$$V = 0.877(R' - Y'), \qquad (6.100)$$
$$I = V\cos 33^\circ - U\sin 33^\circ, \qquad (6.101)$$
$$Q = V\sin 33^\circ + U\cos 33^\circ, \qquad (6.102)$$
$$I = 0.596R' - 0.275G' - 0.322B', \qquad (6.103)$$
$$Q = 0.211R' - 0.523G' + 0.312B'. \qquad (6.104)$$
From the TV transmitter, Y′ is the luminance signal and (I, Q) are the chrominance signals that are broadcast. Based on our visual sensitivity, Y′ is transmitted with the highest
bandwidth: 4.2 MHz; I with the next highest bandwidth: −1.3 MHz to +400 kHz about
the chrominance subcarrier frequency; and Q with the lowest bandwidth: −400 kHz to
+400 kHz about the chrominance subcarrier.
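A compact sketch of the NTSC encoding chain described above — gamma correction with γ = 2.2 followed by the Y′IQ equations (6.103) and (6.104); the function name is illustrative and the inputs are assumed to be linear NTSC primaries in [0, 1]:

```python
# Sketch of NTSC encoding: linear RGB -> gamma-corrected R'G'B' -> Y'IQ.
GAMMA = 2.2

def ntsc_encode(r, g, b):
    """r, g, b are linear NTSC primaries in [0, 1]; returns (Y', I, Q)."""
    rp, gp, bp = r ** (1.0 / GAMMA), g ** (1.0 / GAMMA), b ** (1.0 / GAMMA)
    y = 0.299 * rp + 0.587 * gp + 0.114 * bp
    i = 0.596 * rp - 0.275 * gp - 0.322 * bp    # Eq. (6.103)
    q = 0.211 * rp - 0.523 * gp + 0.312 * bp    # Eq. (6.104)
    return y, i, q

# A neutral input (R = G = B) has essentially zero chrominance: I and Q vanish up to rounding.
print(ntsc_encode(0.6, 0.6, 0.6))
```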
For example, in designing new automobile paints, it is faster and more economical to use computer graphics to synthesize the images of the car
painted with the different paints that we are considering, than to build models and actually
paint them.
In order that the synthesized images be as close to the real scene as possible, careful
modeling of surface reflection and accurate color rendering are very important. The tra-
ditional shading model represents Lambertian surface reflection by only three reflectance
factors ρr , ρg , and ρb , such as:
R = E r ρr (N · L),
G = E g ρg (N · L), (6.105)
B = E b ρb (N · L),
where R, G, and B are the three image irradiances, and r, g, and b indicate the conven-
tional red, green, and blue channels. E r , E g , and E b are the red, green, and blue incident
irradiances of the surface illumination, and ρr , ρg , and ρb are the diffuse (body) reflectance
factors. N and L are the surface normal and light source direction vectors. Although this
model has been widely used and produces seemingly acceptable images, it is physically
incorrect [320, p. 733]. The correct way to generate the R, G, B signals for a color monitor
is to use the full spectral energy distribution of the light source and the spectral reflectance
function of the surface to calculate the CIE XYZ tristimulus values and then convert them
to the CRT RGB values with a 3 × 3 matrix determined from the CRT phosphor chro-
maticities and luminances. In general, any reflection model of product forms in R, G, B,
such as Eq. (6.105), is physically correct only under very restricted conditions, such as
a single light source without ambient illumination (see [320, 563, 567] for more detailed
discussions).
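The following sketch illustrates the physically based approach just described: the illuminant spectrum and the surface spectral reflectance are multiplied wavelength by wavelength, integrated against the color matching functions to obtain XYZ, and then mapped to display RGB with a 3 × 3 matrix. The spectral tables and the display matrix are assumed to be supplied by the caller, and the function name is illustrative.

```python
import numpy as np

def render_patch_rgb(illuminant, reflectance, cmfs, xyz_to_rgb, shading):
    """Spectral rendering of one Lambertian patch (all arrays sampled on the same wavelengths).

    illuminant  : (N,) spectral power distribution of the light source
    reflectance : (N,) spectral reflectance factor of the surface
    cmfs        : (N, 3) color matching functions (x-bar, y-bar, z-bar)
    xyz_to_rgb  : (3, 3) matrix of the target display (derived from its phosphor
                  chromaticities and luminances)
    shading     : scalar geometric factor, e.g. max(N . L, 0)
    """
    stimulus = illuminant * reflectance * shading   # light reflected toward the camera/eye
    xyz = cmfs.T @ stimulus                         # tristimulus values (up to a normalizing constant)
    return xyz_to_rgb @ xyz                         # linear display RGB; gamma-correct afterwards

# Toy usage with made-up 4-sample spectra, just to show the shapes involved.
toy_illum = np.array([1.0, 1.0, 1.0, 1.0])
toy_refl = np.array([0.2, 0.4, 0.6, 0.8])
toy_cmfs = np.eye(4)[:, :3]                         # placeholder, not real CMFs
print(render_patch_rgb(toy_illum, toy_refl, toy_cmfs, np.eye(3), shading=0.9))
```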
With the increased popularity of electronic imaging devices comes a need for a default stan-
dard color space. For example, images captured with a digital camera have to be calibrated
so that they can be displayed on a color monitor with good color and tone reproduction.
Since cameras and monitors are manufactured by many different companies, the only way
to make color reproduction feasible in the open system environment is to set up a common,
default, color space on which color images can be exchanged over different devices. For this
purpose, sRGB was accepted by the International Electrotechnical Commission (IEC) and
the International Standards Organization (ISO) as the default color space for multimedia
applications.
The sRGB color space is based on the ITU-R BT.709 reference primaries: see Table 6.3.
Applying the color transformation as discussed for the NTSC system, we can obtain the 3 × 3 matrix for the color transformation between the linear R, G, B values of sRGB and CIE XYZ, with R, G, B assumed to be in the range [0, 1]; the inverse of this matrix converts CIE XYZ back to the linear R, G, B of sRGB. The linear values are then encoded nonlinearly and scaled to digital code values as follows.
Table 6.3. The chromaticities of the ITU-R BT.709 reference primaries and the D65 reference white.

Stimulus                 r     g     b     x       y       z
Red primary (R)          1     0     0     0.64    0.33    0.03
Green primary (G)        0     1     0     0.30    0.60    0.10
Blue primary (B)         0     0     1     0.15    0.06    0.79
Reference white (D65)    1/3   1/3   1/3   0.3127  0.3290  0.3583
If R, G, B ≤ 0.003 04, then

r = R × 12.92,
g = G × 12.92,
b = B × 12.92;

otherwise,

r = 1.055R^{1/2.4} − 0.055,
g = 1.055G^{1/2.4} − 0.055,
b = 1.055B^{1/2.4} − 0.055.

The nonlinear values are then scaled to eight-bit code values:

R′ = 255.0 × r,
G′ = 255.0 × g,
B′ = 255.0 × b.
The sRGB standard allows R′G′B′ to be encoded into different numbers of bits and offsets to match different monitor interfaces. The conversion is done simply by adding offsets and applying different scale factors.

If the encoded nonlinear R′G′B′ signals are eight-bit/color/pixel, they can be converted to linear R, G, B as follows:
r = R′/255.0,
g = G′/255.0,
b = B′/255.0.

If r, g, b ≤ 0.039 28,

R = r/12.92,
G = g/12.92,
B = b/12.92;

otherwise,

R = [(r + 0.055)/1.055]^{2.4},
G = [(g + 0.055)/1.055]^{2.4},
B = [(b + 0.055)/1.055]^{2.4}.
It should be pointed out that the initial linear segment serves two purposes: (a) to avoid the
large quantization of a pure 0.45-power function, and (b) to compensate for viewing flare
by making a small signal darker than it is with a pure 0.45-power function. The power 2.4 is
to produce a good approximation to the rest of the 2.2-power function after the initial linear
segment.
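A runnable sketch of the sRGB conversions described above. The 3 × 3 RGB–XYZ matrix is the commonly published sRGB (BT.709 primaries, D65 white) matrix rounded to four decimals, included here as an assumption rather than quoted from the text, and the threshold pair (0.0031308, 0.04045) is the one in the published IEC specification; it differs slightly from the 0.003 04/0.039 28 values used above.

```python
import numpy as np

# Commonly published sRGB (ITU-R BT.709 primaries, D65 white) matrix, rounded.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])
XYZ_TO_RGB = np.linalg.inv(RGB_TO_XYZ)

def srgb_encode(linear):
    """Linear RGB in [0, 1] -> nonlinear R'G'B' in [0, 1]."""
    linear = np.asarray(linear, dtype=float)
    small = linear <= 0.0031308
    return np.where(small, 12.92 * linear, 1.055 * linear ** (1.0 / 2.4) - 0.055)

def srgb_decode(nonlinear):
    """Nonlinear R'G'B' in [0, 1] -> linear RGB in [0, 1]."""
    nonlinear = np.asarray(nonlinear, dtype=float)
    small = nonlinear <= 0.04045
    return np.where(small, nonlinear / 12.92, ((nonlinear + 0.055) / 1.055) ** 2.4)

# Round trip for an 8-bit code value of 128.
v = 128 / 255.0
print(np.round(255.0 * srgb_encode(srgb_decode(v))))   # -> 128
```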
6.17 Problems
6.1 In a color matching equation: (C) ≡ R(R) + G(G) + B(B), (C) represents the color
stimulus of the test field, and (R), (G), and (B) are the three primaries.
(a) What do we call the coefficients R, G, and B?
(b) When (C) is a monochromatic light (single wavelength) whose wavelength is var-
ied through the entire visible spectrum, what do we call the resulting coefficients
R, G, and B as functions of wavelength and what are the standard notations for
these functions?
(c) What are the chromaticity coordinates for the color stimulus (C)?
6.2 Let (R)(G)(B) and (X)(Y)(Z) be two sets of primaries. Given:
$$\begin{pmatrix} (X) \\ (Y) \\ (Z) \end{pmatrix} \equiv \begin{pmatrix} X_R & X_G & X_B \\ Y_R & Y_G & Y_B \\ Z_R & Z_G & Z_B \end{pmatrix}\begin{pmatrix} (R) \\ (G) \\ (B) \end{pmatrix},$$
what is the transformation of the tristimulus values from one set of primaries (X)(Y)(Z)
to another set of primaries (R)(G)(B)?
6.3 If the chromaticity coordinates of a given color stimulus are (x, y) = (0.3, 0.5) and its luminance is 100, what are the tristimulus values, X, Y, Z, of this color stimulus?
6.4 In the CIE 1931 (x,y)-chromaticity diagram, the spectrum locus is a convex curve of
a horse-shoe shape. Knowing that the chromaticity coordinates of any mixture of two
color stimuli, c(A) + d(B), always fall on the straight-line segment connecting the two
points representing (A) and (B), prove that no three physically realizable primaries
can be chosen to match all physically possible color stimuli (without moving one or
more of the primaries to the test field).
6.5 Calculate the CIE 1931 chromaticity coordinates of the daylight illuminant at
6500 K.
6.6 A color sample under illuminant C has a dominant wavelength of 530 nm and an
excitation purity of 50%. Calculate its coordinates on the CIE 1931 (x, y) chromaticity
diagram.
6.7 On the CIE 1931 (x, y)-chromaticity diagram, additive mixtures of two color stimuli
fall on a straight-line segment. Is this property still true for CIE 1976 (u′, v′), (u∗, v∗), and (a∗, b∗)?
6.8 How many crossing points must a pair of metameric spectral power distributions have?
6.9 Show that $C_P = (S^TP)^{-1}S^TA$ is invariant under any linear transformation of $S$, i.e., for any $S' = SW$, where $W$ is any nonsingular 3 × 3 matrix.
6.10 If a TV color monitor does not use the NTSC standard phosphors, what type of
electronic circuits should be put in to correct it?
6.11 How do we generate a color image of a human face that is illuminated by two illumi-
nants of different spectral compositions? Can we use three reflectance factors, ρr , ρg ,
and ρb , to represent the skin color?
6.12 Histogram equalization is a standard technique for image enhancement. Since the
image code value can always be mapped to a different output metric, such as CIEXYZ,
CIELUV, or CIELAB, what is the histogram that should be equalized?
6.13 Let (X, Y, Z ) be the tristimulus values of color sample 1 and (8X, 8Y, 8Z ) be those
of color sample 2. Both color samples are identically illuminated reflection samples.
Let (a ∗ , b∗ ) of color sample 1 be (12, 7). What are the (a ∗ , b∗ ) values for color sample
2? Assume that X/ X n , Y /Yn , and Z /Z n are all greater than 0.1 and (X n , Yn , Z n ) are
the tristimulus values of the reference white.
6.14 Two color samples, A and B, are illuminated by a light source W. The chromatic-
ity coordinates of A are (x, y) = (0.3016, 0.5724) and those of B are (x, y) =
(0.4822, 0.3725). The dominant wavelength of A is 550 nm and its chromaticity
coordinates are (x, y) = (0.3016, 0.6923). The dominant wavelength of B is 600 nm
and its chromaticity coordinates are (x, y) = (0.6270, 0.3725). What are the chro-
maticity coordinates of the light source W? What is the excitation purity of the color
stimulus from color sample A?
6.15 A screen is illuminated by the additive mixture of the light beams from three
projectors, A, B, and C. When projected with each projector alone, the light re-
flected from the screen has the chromaticity coordinates (xA , yA ) = (0.62, 0.24),
(xB , yB ) = (0.23, 0.64), and (xC , yC ) = (0.21, 0.25), for projectors A, B, and C, re-
spectively. If the luminances of the projectors are adjusted so that YA = 100 cd m−2 ,
YB = 200 cd m−2 , and YC = 300 cd m−2 , what are the chromaticity coordinates (x, y)
of the mixture of light reflected from the screen?
6.16 Room A and room B are two separate rooms illuminated by different light sources.
The tristimulus values of the reference white are (X n , Yn , Z n )A = (80, 100, 128) for
room A and (X n , Yn , Z n )B = (128, 100, 80) for room B. A reflection color sample S
in room A has tristimulus values (X s , Ys , Z s ) = (16, 20, 16). Another reflection color
sample T in room B also has tristimulus values (X t , Yt , Z t ) = (16, 20, 16).
(a) Color samples S and T have the same tristimulus values. Do they look the same
color in the separate rooms?
(b) Calculate the CIELAB values for color samples S and T.
(c) What color will you see in color sample S? What color will you see in color
sample T?
(d) Do you think that color samples S and T have the same or different spectral
reflectance factors? If you think their spectral reflectance factors are different,
what kind of difference is most likely to exist?
7 Light sources
where R J is the actual sun–earth distance on day J, ω = 360◦ /365.25 days = 0.9856◦ /day
is the mean angular velocity of the earth around the sun, φ = 1.735◦ is an offset angle, J
is the number of Julian days, counted from 1 = January 1, and ε = 0.0167 is the numerical
eccentricity of the earth's orbit. The solar irradiance thus varies from 1.0334E_0 at perigee
(its near point) to 0.9666E 0 at apogee (its far point). The top curve of Fig. 7.2 shows the
solar spectral irradiance above the earth’s atmosphere. The spectral radiance distribution
can be approximated as a blackbody radiator at 5750 K.
The daylight that reaches the earth’s surface from the sun consists of direct sunlight (solar
radiation attenuated by the earth’s atmosphere) and skylight (solar radiation scattered by the
earth’s atmosphere). At sea level on a clear day, the average irradiance received by the earth’s
surface is about 1200 W m−2 . Only about 40% of this power is in the visible wavelength
range. The angular size of the solar disk viewed from the earth varies from 31.5 to 32.5 min-
utes of arc. The peak wavelength of the sunlight spectrum outside the atmosphere is 475 nm
(corresponding to a blackbody of 6101 K). It is shifted to a wavelength longer than 500 nm
at sea level, depending on the time of the day and the atmospheric conditions. The illumi-
nance and spectral composition of the sunlight are modified by absorption and scattering
by ozone, gas molecules, water droplets, dust, and other particles in the atmosphere. Figure
7.2 shows spectral and amplitude modification by a particular atmospheric condition [189]
Figure 7.1. The earth's orbit around the sun is an ellipse with the sun at one of the foci (perigee around January 3, apogee around July 3). The numerical eccentricity is only 0.0167, very close to a circle. The drawing is exaggerated to show the elliptic shape.
Figure 7.2. The solar spectral irradiance above the earth’s atmosphere (the top curve) and at sea level
on the earth’s surface (the bottom two curves). The bottom curve is the irradiance from direct sunlight
on a plane normal to the sun beam. The middle curve (the global solar radiation) shows the direct
sunlight plus the scattered sunlight from the atmosphere.
(relative air mass = 1; water vapor content = 1.42 cm precipitable water; ozone content =
0.34 cm STP; spectral optical depth of aerosol extinction = 0.1 at wavelength 500 nm).
Let E be the solar illuminance at sea level and E s = 1.275 × 105 lx be the solar illumi-
nance above the atmosphere, both measured at a surface normal to the sun’s rays. From the
Beer–Lambert law, we can estimate E by [687, p. 352]:
$$E = E_s\,e^{-\alpha m}, \qquad (7.2)$$
where α is the optical atmospheric extinction coefficient, and m is the relative optical air
mass, which is defined as the ratio of the air mass from the actual sun–earth path to the
mass when the sun is directly overhead at sea level, i.e., zenith angle = 0. The relative air
mass is related to the solar altitude h, which is the angle of elevation of the sun above the
horizon:

$$m = \frac{1}{\sin h}. \qquad (7.3)$$
If the earth and its atmosphere were flat, this equation would be exactly true. In reality, the curvature of the earth and its atmosphere reduces the actual air mass. For example, when
the sun is on the horizon (i.e., h = 0), the actual air mass is about 38, instead of infinity. The
optical atmospheric extinction coefficient, α, varies a lot depending on the local atmospheric
conditions. An average value is about 0.21 for a clear sky. From Eqs. (7.1), (7.2), and (7.3),
we can obtain a very rough estimate of the solar illuminance at sea level on a clear day for
the time and location of interest to us.
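A rough numerical sketch of this estimate, combining Eq. (7.2) with the flat-atmosphere air mass m = 1/sin h and the average clear-sky extinction coefficient quoted above; the result is only an order-of-magnitude figure and the function name is illustrative.

```python
import math

E_S = 1.275e5        # solar illuminance above the atmosphere, lx
ALPHA = 0.21         # average optical atmospheric extinction coefficient, clear sky

def solar_illuminance(solar_altitude_deg, alpha=ALPHA):
    """Very rough sea-level solar illuminance (lx) on a plane normal to the sun."""
    h = math.radians(solar_altitude_deg)
    m = 1.0 / math.sin(h)               # flat-atmosphere relative air mass (Eq. (7.3))
    return E_S * math.exp(-alpha * m)   # Beer-Lambert attenuation (Eq. (7.2))

# Sun at 40 degrees above the horizon: roughly 9e4 lx.
print(round(solar_illuminance(40.0)))
```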
The relative contribution of the two components of daylight (direct sunlight and skylight)
to the illumination of an object varies as a function of the object’s surface orientation with
respect to the sun’s position. For example, for a horizontal plane on a clear day, the skylight
contributes about 26% of the surface illuminance when the sun’s elevation angle is 20◦ ,
and its contribution drops to about 15% when the solar elevation is 50◦ . When the sun is
directly overhead (90◦ ), the skylight contributes about 13% of the surface illuminance of
a horizontal plane. Because of the optical path and scattering, the spectral compositions
of sunlight and skylight are both functions of the solar altitude. Skylight has more relative
energy in the short wavelength region (400–500 nm) and direct sunlight has an overall
flat spectrum (from 450 to 700 nm), but it contains less energy at the shorter wavelengths
and more energy at the longer wavelengths due to scattering when the solar altitude is
low. Figure 7.3 shows the relative spectral energy distributions of sunlight and skylight at
a solar altitude of 40◦ on a clear day. In everyday language, the sky is bluish and the sun
Figure 7.3. The relative spectral energy distributions of sunlight and skylight at a solar altitude of
40◦ .
is yellowish white. The consequence of this complex variation of illuminance and spectral
composition of daylight as a function of solar altitude and object surface orientation is that
in a digital image of an outdoor scene the pixels of a curved object surface have different
color values depending on its local surface orientations. This is a very basic fact with which
all color image segmentation algorithms have to deal. Fortunately, this color variation has a
characteristic yellow–blue direction that can be used to our advantage. What is remarkable
is that our visual system often perceives a curved surface of a homogeneous material as
having a uniform color, when physically every point on the surface is illuminated with a
different spectral energy composition!
Judd, MacAdam, and Wyszecki [476] compiled an extensive set of 622 measurements
of daylight spectral power distributions, and used principal component analysis to extract
several vectors which can be used to reconstruct the typical daylight spectral power dis-
tribution from a given correlated color temperature. The CIE has since recommended a
method for calculating the daylight illuminant based on their results [1053, pp. 145–6].
This is discussed in Section 6.12.
The luminance and chromaticity of the sky are not uniform. The luminance of the sky
depends on the number of scatterers in the line-of-sight. Therefore, the sky looks darker
when we are on a very high mountain because there are fewer air molecules to scatter the
sunlight into our eyes. For the same reason, the zenith sky also looks darker than the sky
near the horizon. This luminance variation can be approximately modeled as $I_0(1 - e^{-\tau})$, where τ is the optical depth of the scattering medium (the air) [615, pp. 24–6]. Near the horizon, the optical depth (air mass) is so large that light of all wavelengths is scattered back (due to multiple scattering), and the horizon sky looks white instead of blue.
7.1.2 Moonlight
Moonlight is mainly sunlight reflected by the lunar surface. The luminance of the moon
depends on the moon’s phase angle, its distance from the earth, its angular elevation from
the horizon, etc. The lunar surface reflects light in a manner quite different from that of a
Lambertian surface [276, pp. 152–61]. The full moon looks much brighter than the half-moon because of the strong retroreflectance of the lunar surface [276, Chapter VII]. The moon
at any phase also looks very different from a Lambertian sphere generated by computer
graphics. Except for the visible shadowing in some parts of the lunar surface, all points
on the moon seem to look equally bright. A Lambertian surface receives light (surface
irradiance) according to the cosine of the incident angle and distributes the reflected light
in all directions with the same radiance (flux per solid angle per projected area). Therefore,
a Lambertian surface looks darker as the surface normal points angularly further away from
the light source. This is not what we see from the lunar surface. Since our retinal image
irradiance is approximately proportional to the surface radiance, the lunar surface appears
to reflect light in such a way that the reflected radiance increases with greater reflection
angle so that the effect of the cosine incident angle is cancelled. It is as if the lunar surface
distributes the reflected light uniformly per solid angle in all directions, regardless of the
angle of reflection. This type of light-reflecting property is not unique to the lunar surface.
For example, Hawaiian volcanic cinder exhibits the same property [276, pp. 158–9].
Figure 7.4. The illuminance (on a plane at normal incidence to the moon) due to moonlight plus night
skylight as a function of the lunar phase angle.
The illuminance (on a plane at normal incidence to the moon) due to moonlight plus night
skylight on a clear night at sea level under full moon is about 0.4 lx. Under a half moon,
the average illuminance is only 0.05 lx. This illuminance falls off rapidly as the phase angle
increases or decreases from the full moon. The phase angle is defined as the angle between
the vector from the object surface to the light source and the vector from the object surface to
the viewing observer. Therefore, the phase angle of a full moon is approximately 0◦ . Let us
define the phase angle before full moon as negative and after it as positive. Figure 7.4 shows
the illuminance as a function of the lunar phase angle. The curve is not symmetric. The illu-
minance is generally slightly higher before the full moon for the same phase angle because
the visible part of the moon after the full moon has a larger proportion of large dark areas
[212].
7.1.3 Starlight
The total illuminance due to starlight is approximately 2 × 10−3 lx for a clear sky night.
Although a star is a good example of a point source, its brightness is not traditionally mea-
sured in terms of luminous intensity. The visual brightness of a star is measured in apparent
magnitude m, which is defined by $m = m_0 - 2.5\log(E/E_0)$, where E is the illuminance
produced by the star on a plane at the earth’s surface and normal to the direction of the star,
m 0 and E 0 are constants based on the magnitudes ascribed to certain standard stars that are
frequently measured by photoelectric instruments [667, p. 96, 772]. $E_0$ is approximately $2.15 \times 10^{-6}$ lx for a 0th magnitude star ($m_0 = 0$) [212]. The coefficient 2.5 is chosen so
that a magnitude 1 star is exactly 100 times more luminous than a magnitude 6 star. Sirius,
which is the brightest star in the sky, the full moon, and the sun have apparent magnitudes
of −1.45, −12.73, and −26.74, respectively.
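The magnitude relation can be inverted to estimate the illuminance a star produces, as in this short sketch using the 0th-magnitude constant quoted above:

```python
import math

E0 = 2.15e-6          # illuminance of a 0th-magnitude star, lx
M0 = 0.0

def illuminance_from_magnitude(m):
    """Illuminance (lx) at the earth's surface from a star of apparent magnitude m."""
    # m = m0 - 2.5 log10(E / E0)  =>  E = E0 * 10**((m0 - m) / 2.5)
    return E0 * 10.0 ** ((M0 - m) / 2.5)

# Sirius (m = -1.45) and the full moon (m = -12.73):
print(illuminance_from_magnitude(-1.45))    # about 8e-6 lx
print(illuminance_from_magnitude(-12.73))   # about 0.27 lx
```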
7.2 Artificial sources: lamps
There are many different types of lamps for everyday lighting and for color imaging light-
ing. Six of the major categories for everyday lighting are incandescent, tungsten halogen,
fluorescent, mercury, metal halide, and sodium. For color imaging (photography), the major
category is the electronic flash lamp. Two general characteristics of lamps that are important
for color imaging are how their spectral power distribution varies over the lamp's life and how it depends on the operating conditions. The light output of a lamp decreases during its life. Also, the spec-
tral power distribution of a tungsten lamp depends on the voltage at which it is operated.
Therefore, for critical color calibration or measurement, we cannot always assume that the
spectral power distribution of a lamp will remain the same after hours of use or at various
operating temperatures. Light sources for laboratory use are reviewed in Chapter 10 of
Handbook of Optics [84], and will not be covered here.
Radiant spectra from solid thermal sources (such as the tungsten lamps) are continuous.
On the other hand, the emission or absorption spectra of electronic transitions from isolated
atoms and molecules are quite narrow (they can be less than 10−3 nm); therefore, they
are called spectral lines. When plotting the spectral power distribution of a light source,
such as a fluorescent lamp, which emits a mixture of a continuous spectrum and several
spectral lines, the power densities of the spectral lines have to be plotted separately. For
example, if the continuous spectral radiance is measured at 5 nm intervals and expressed in [mW sr$^{-1}$ m$^{-2}$ nm$^{-1}$], a spectral line containing 100 mW sr$^{-1}$ m$^{-2}$ is plotted as a rectangle of height 20 and width 5, centered at the correct wavelength, on a graph for which the unit along the x-axis is the nanometer and the unit along the y-axis is mW sr$^{-1}$ m$^{-2}$ nm$^{-1}$. The area of the finite-width
rectangle is equal to the power of the spectral line it is representing. This rectangular bar
is raised on top of the continuous spectral radiance so that the total height is their sum. An
example is shown in Fig. 7.6.
Because the emissivity of tungsten decreases with increasing wavelength in the visible region, a tungsten filament radiates light of the same color as a blackbody radiator at a slightly higher temperature. (This is called the
correlated color temperature of a light source.) Linear interpolation of the emissivity as a
function of wavelength gives a fairly good approximation in the visible wavelength range
(400–700 nm). The emissivity decreases as the temperature is raised (for example, ε drops to
0.448 at 500 nm when the temperature is raised to 2800 K). A standard 120 V, 100 W tungsten
lamp operates at a filament temperature of about 2860 K. If the operating voltage is changed,
its filament temperature also changes. Figure 7.5 shows an example of the spectral power
distribution of a tungsten lamp. The color temperature of tungsten lamps is a few percent
higher than the true temperature. For example, a 3000 K filament radiates light of a correlated
color temperature of 3121 K [162]. In general, the spectral power distribution of a tungsten
lamp is very well approximated by that of the blackbody at the same color temperature.
Only about 10% of the electric power is converted to visible light in a 100 W light bulb
[162]. In an ordinary tungsten lamp, evaporated tungsten atoms are deposited on the inside
wall of the bulb which is much cooler than the filament. This results in the blackening of the
bulb during its life. Filling the bulb with gases such as argon and nitrogen greatly reduces
the blackening effect. However, the gases also reduce the filament temperature and therefore
its luminous efficacy. In lamps less than 40 W, the heat loss is greater than the gain in light
output by operating the filament at a somewhat higher temperature. Therefore, filling gas
is used only for lamps of 40 W or above. The reduction in filament temperature due to the
filling gas is less severe when the area of the filament is larger. Therefore, most tungsten
lamps today use double-, and even triple-, coiled filaments.
Tungsten lamps have a heating time to 90% of the steady-state luminance of the order of
0.1 s (for a 60 W gas-filled lamp), and a cooling time to 10% of the steady-state luminance
around 0.04 s. If one defines the percentage flicker as the ratio of the difference of the
maximum and minimum light levels to their sum, multiplied by 100, then for a 60 Hz power
source, the percentage flicker of tungsten lamps is about 5%.
Introducing halogen gases (such as iodine) was found (in 1958) to increase the number
of lumens per watt and also maintain the luminance level better during a lamp’s life. This
beneficial effect is caused by the halogen regenerative cycle in which the halogen gas
combines with the evaporated tungsten atoms to form tungsten halide; tungsten iodide
molecules do not deposit on the bulb wall if the wall temperature is kept between 500 and
1500 K. Tungsten iodide molecules dissociate near the filament where the temperature is
high (2800 K) and deposit the tungsten atoms back on the filament. In order to maintain
the high wall temperature, quartz is used instead of ordinary glass. The resulting lamps are
called the tungsten halogen lamps and have a longer life and better light output. Most such
lamps today use bromine instead of iodine because bromine allows a lower wall temperature
and it is almost colorless.
The strong spectral lines of fluorescent lamps often cause difficulties in color reproduction. For example, most photographic color
negative films have peak spectral sensitivities at about 430 nm, 550 nm, and 650 nm [924,
p. 551]. Images taken under fluorescent lighting often show unusually high “green” and
“blue” signals. This effect of course depends on the spectral reflectance factor of the object
and therefore it is not possible to correct the problem by simple shifts in log exposures in
the three color layers when printing the color negatives. For digital cameras, fluorescent
lamps also cause problems in color reproduction and a separate color correction matrix is
needed to make the colors of many familiar objects look right.
The spectral power distribution of a flash lamp consists of a continuous spectrum and a number of discrete
xenon lines. The overall shape of the distribution depends (weakly) on several operating
parameters, such as the gas pressure and applied voltage, but it is, in general, similar to the
daylight spectrum of a correlated color temperature of about 6000 K. At the long-wavelength end
(greater than 650 nm), the power depends on the operating voltage range. The low-voltage
types produce much more infrared than the high-voltage types.
Since the flash duration is often very short compared with the exposure time, the image
exposure is controlled by the aperture size or the f-number. The amount of light illuminating
the subject depends on the distance between the flash and the subject. Flash manufacturers publish exposure guide numbers for the flash
units they make to help users to set the exposure. The guide number is the subject distance
(usually in feet) multiplied by the f-number for a given film speed. For example, if the
guide number of a flash unit is 120 and the subject distance is 15 feet, then an aperture
of f /8 should be used. However, these numbers are not always very accurate because of
variations both in flash power due to manufacturing and in the luminance levels of the
ambient lighting. Therefore, some calibration for each flash unit is needed in applications,
such as digital cameras.
There are three types of flash unit used in cameras: (1) a manual flash, (2) a self-quenching (automatic) flash, and (3) a dedicated through-the-lens flash. A manual flash unit emits a fixed amount of light each time and it is up to the camera user to adjust the aperture for different
subject distances. The self-quenching flash unit has a sensor to integrate the total light
bounced back to the camera and automatically turns off the flash when enough light has
been collected. This type of unit is often used in digital cameras so that the proper exposure
can be obtained for flash pictures. The dedicated through-the-lens flash unit works with a
particular camera. It links up with the camera’s through-the-lens metering system to control
the exposure (flash power or aperture) automatically [518, p. 114].
Efficient blue LEDs can now be produced using GaN and related III–V nitride semiconductors [931, pp. 291–2]. This made
it possible to use LEDs to produce “white” light sources by combining different color LEDs
or by coating certain phosphors, such as yttrium aluminum garnet (YAG), on top of the blue
LEDs. The latter approach has the advantage of being more compact and robust. The phos-
phors on top of the blue LED serve a similar purpose to phosphors coated on fluorescent
lamp tubes, i.e., to convert the blue light into light of other wavelengths.
It is now possible to use white-light LEDs as light sources [843, 1072, 1073]. The
luminous efficiency of such LEDs approaches 15 lm W−1 , about 8 times that of tungsten
lamps and 2 times that of fluorescent lamps. Since an LED is quite small, a large array of
LEDs can be used to provide the light needed. White-light LEDs only require a low current
15–20 mA, and can produce light with a color temperature close to 8000 K. Many traffic
lights now use LEDs. They are low-cost, small in size, very bright, and very energy-efficient.
They also last a long time (more than 100 000 hours). CIE is working on standardizing the
methods for specifying and measuring LED and LED cluster array characteristics [219].
Figure 7.7. The spectral reflectance factors for the eight test-color samples to be used for calculating
the CIE general color-rendering index. (The bottom two curves are not CIE recommended color
samples – they are from a sample of human skin and a sample from a plant leaf.)
For test-color sample i, let $(\Delta E)_i$ be the color difference between the test source and the reference source, expressed in the CIE 1964 UCS color difference formula. Then the CIE special color-rendering index is defined by $R_i = 100 - 4.6(\Delta E)_i$. The CIE general color-rendering index $R_a$ is defined as the arithmetic mean of the eight CIE special color-rendering indices $R_i$ for the eight standard test-color samples, i.e.,

$$R_a = \frac{1}{8}\sum_{i=1}^{8} R_i. \qquad (7.4)$$
The factor 4.6 in the definition of Ri was chosen so that the CIE general color-rendering
index is equal to 50 when a CIE defined standard “warm white” fluorescent lamp is used as
a test source in comparison with an incandescent lamp as the reference source.
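A minimal sketch of Eq. (7.4), assuming the eight special color differences (ΔE)_i have already been computed with the CIE 1964 UCS formula (that computation, which requires the full spectral data and a chromatic adaptation step, is not shown); the function name is illustrative:

```python
def general_color_rendering_index(delta_e):
    """CIE general color-rendering index Ra from the eight (Delta E)_i values."""
    if len(delta_e) != 8:
        raise ValueError("expected eight special color differences")
    special = [100.0 - 4.6 * de for de in delta_e]    # R_i = 100 - 4.6 (Delta E)_i
    return sum(special) / 8.0                         # Eq. (7.4)

# A source with an average color difference of about 10.9 scores Ra ~ 50,
# the value assigned to the CIE "warm white" fluorescent reference case.
print(general_color_rendering_index([10.9] * 8))
```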
7.4 Problems
7.1 The earth is further away from the sun in July than in January. For the people living
in the northern hemisphere, why is it hotter in July than in January?
7.2 On a cloudless day, if the spectral power distributions of the sunlight and the skylight
remain constant throughout the day, what will be the daylight locus in the 1931
CIE chromaticity diagram? Of course, the spectral power distributions do not remain
constant. What are the factors that cause the spectral power distributions of the sunlight
and the skylight to change throughout the day?
7.3 As we saw in this chapter, the various natural and man-made light sources have quite
different spectral power distributions. What impact does this have on color imaging
applications?
7.4 Fluorescent lamps have a few spectral lines that have high radiant power. Why is this
particularly troublesome for color imaging applications?
7.5 Many digital cameras provide a manual color balance setting for fluorescent lighting. Since there are many different types of fluorescent lamp, each with a different color, does it make sense to provide a separate color balance setting just for fluorescent lighting?
7.6 Since the spectral power distribution of the electronic flash lamp that comes with
a camera is known to the manufacturer, does it make sense for a digital camera to
provide a separate color balance setting for flash pictures?
8 Scene physics
8.1 Introduction
Color images are formed by optical imaging systems from physical scenes that are composed
of three-dimensional matter interacting with light. Light radiated from light sources is
reflected, refracted, scattered, or diffracted by matter. As a result of all these light–matter
interactions, light is redistributed spatially and temporally to create the physical scenes that
we see and take pictures of. The study of color imaging science, thus, should begin with
the light-field formation process of the physical scenes. This is what we mean by scene
physics. The necessity for studying such a subject arises not simply to generate realistic
color images by computer graphics. It is also driven by our need to understand and model
the scene physics to develop computational algorithms that can adjust the color balance and
tone scale of color images automatically so that optimal reproduction and display can be
achieved.
8.2 General description of light reflection

Our discussion of reflection (Fresnel equations) in Section 5.4.1 assumes that the object
surface is perfectly smooth, flat, and isotropic. However, the surfaces of real objects are
almost never like that. In order to characterize light reflection and scattering from surfaces,
we need a more general way to describe the optical property of surface reflection.
Although surface reflection is a well-studied subject, the terms used in the literature have
not yet been standardized. Difficulties arise not only with the definitions, but also with the
underlying concept of measurement and the models of the assumed physical processes. Let
us start by treating light as rays (geometric optics) and see what can happen as light interacts
with a rough surface.1 Figure 8.1 shows two light rays (ray 1 and ray 2) undergoing refrac-
tion, reflection, scattering, and absorption. Let us assume that the top half of the diagram is
air and the bottom half is an object of some inhomogeneous material, say a painted surface.
When light (ray 1 and ray 2) is incident on the rough surface, part of its energy is reflected
(ray 3 and ray 7) and part of it penetrates into the surface (ray 4 and ray 6). The reflected light
can undergo more than one reflection (ray 7). The penetrating light can undergo multiple
1 There are several problems with this type of treatment, and we will discuss them later. For most applications,
it is a useful approximation.
scattering by pigment particles in the surface material, and eventually is absorbed (ray 6)
or reflected back into the air (ray 5). The first reflection at the interface (ray 3 and ray 7)
has several names: surface reflection, interface reflection, or specular reflection. Reflection
coming from under the surface (ray 5) also has several names: subsurface reflection, body
reflection, bulk reflection, or diffuse reflection. Strictly speaking, there are problems with
all these names and they come from the very basic fact that light cannot be treated as being
reflected only at the interface [120, p. 5, 307], but rather the incident light interacts with
all the atoms and molecules in the object. Only in the ideal situation of a perfectly smooth
surface of homogeneous material with infinite spatial extent, can the reflected and refracted
electromagnetic fields be neatly treated by the Fresnel equations discussed previously. For
nonideal (rough, nonhomogeneous, and finite-size) surfaces, the actual physical descrip-
tion is much more complicated. The geometric ray description is no longer valid. If the
surface roughness is small compared with the wavelength of the incident light, it can be
treated by the perturbation method for the ideal solution for smooth surfaces. If the scale
of the surface roughness or material inhomogeneity is of about the same order as the light
wavelength, the problem becomes very difficult to deal with theoretically. Only numerical
calculations can handle it satisfactorily. When the surface irregularity is of a scale much
larger than the wavelength, it is again possible to develop a method of approximation by
treating the surface as composed of many micro-facets, each of which is larger than many
wavelengths.
In this book, we use the name interface reflection for the first surface reflection, and the
name body reflection for the rest. Surface reflection will be used to describe the general
process of light being reflected from a surface and therefore includes both interface reflection
and body reflection. The term subsurface reflection poses no confusion and can be used
interchangeably with body reflection. The terms diffuse and specular will be used to refer
to the angular or directional distribution of the reflected light. If the angular distribution is
well concentrated in a narrow angle range, we will call it specular reflection. Otherwise, it
will be called diffuse reflection. This is mostly a qualitative distinction and it seems that it
is not necessary to be more quantitative than this in our application of these two terms. The reflected light can thus be decomposed into an interface-reflection component and a body-reflection component.
The subscript i denotes the component from the interface reflection and b that from the
body reflection. The diffuse component can come from either the interface reflection (due to
surface roughness) or the body reflection. The specular component is almost entirely from
the interface reflection and for a smooth surface this component is often perceived as the
specular highlight of surface reflection. The other component of the interface reflection
is more diffuse and is due to the roughness and/or the inhomogeneity of the surface that
cause the scattering of the reflected light into all directions. In comparison, body reflection
is always quite diffuse in direction, unless there is strong internal regularity in the material
structures. The major differences between the diffuse component from the interface re-
flection and the diffuse component from the body reflection are: (1) the former is more
dependent on the surface roughness, and (2) the spectral composition of the latter is more
strongly modified by the pigments or dyes in the subsurface material.
1. Geometric consideration
Referring to Fig. 8.2, the BRDF, f r , is defined as the ratio of the reflected radiance
dL r in the viewing direction to the irradiance dE i in the direction of the incident light
[412, 715]:

$$f_r(\theta_i, \phi_i; \theta_r, \phi_r) = \frac{dL_r(\theta_r, \phi_r)}{dE_i(\theta_i, \phi_i)},$$
where (θi , φi ) and (θr , φr ) are the incident and reflected direction angles with respect to
the surface normal.
2. Spectral consideration
The original definition of the BRDF f r can be extended to include the light wavelength
as a variable [715, p. 31]. The resulting function is called the bidirectional spectral-
reflectance distribution function (BSRDF), which is defined by

$$f_r(\theta_i, \phi_i; \theta_r, \phi_r; \lambda) = \frac{dL_r(\theta_r, \phi_r; \lambda)}{dE_i(\theta_i, \phi_i; \lambda)},$$
where dL r and dE i are the reflected spectral radiance and the incident spectral irradi-
ance.
[Figure 8.2: the geometry used in the definition of the BRDF — a surface element dA at the origin of the (X, Y, Z) coordinate system, with incident direction (θ_i, φ_i), reflected direction (θ_r, φ_r), and elementary solid angles dω_i and dω_r.]
3. Polarization consideration
If the incident light is polarized, we can use the Stokes vector to describe the incident
and reflected light beams. Surface reflection will then have to be described by a 4 × 4
Mueller matrix, whose elements are BSRDFs. This Mueller matrix is called the bidi-
rectional surface scattering spectral-reflectance distribution function (BSSSRDF). The
BSSSRDF is an extension of the bidirectional surface scattering reflectance distribution
function (BSSRDF) proposed by Richard Anderson [25] with the additional spectral
wavelength as a variable.
Although the concept of the BRDF and its extension to the spectral and polarization
cases is quite general and useful, we should point out its limitations as well:
• The concept is based on the approximation that light propagation can be treated as rays
(geometrical optics). Wherever and whenever this approximation breaks down (e.g. in
interference and diffraction), so does the concept of the BRDF. For modeling everyday
scenes for imaging, this does not pose too much of a problem because we can handle
those exceptions separately.
• Using the ratio of reflected radiance $L_r$ to incident irradiance $E_i$ assumes that these
two quantities are proportional, meaning that if E i is increased by a factor of 2, then
L r will be increased by a factor of 2. This may not be true for all object surfaces.
One difficulty arises from the practical question of how we measure the BRDF with
an instrument. An instrument for measuring the BRDF is called a goniophotometer. In
order to measure the BRDF, we have to illuminate the surface from each incident angle,
and detect the reflected light at each reflection angle. The aperture for illumination and
the aperture for detection of any real instrument have a finite size. The definition of the
BRDF is a ratio of differentials and the basic problem with practical measurement is
the question of instrument resolution. This is discussed in the literature [715, Section
IV]. Let us use illumination apertures of different sizes to measure the BRDF. If we get
a constant BRDF independent of the illumination aperture, there is no difficulty. This
is indeed the case for a Lambertian surface because the detected light (and hence L r )
increases in proportion to the illumination aperture, and so does the surface irradiance
(E i ). However, if using a different illumination aperture size gives us a different BRDF
measurement, what should we do? Experiments show that this difficulty is most serious
when the reflection angle is equal to the incident angle and the surface has a mirror-like
reflection [674]. The difficulty arises because the mirror-like reflection component does
not vary much with the illumination aperture, but the diffuse reflection component is
proportional to the illumination aperture. Therefore, in our imaging applications, this
problem has to be kept in mind when we deal with specular reflections.
• The definition of the BRDF assumes that the reflected radiance depends only on the
irradiance on the infinitely small area to which the sensor is pointing. This means that
light illuminating the near-by surface area should not affect our measurement. This is
not true if there is significant subsurface scattering. The light from the neighboring
area can be scattered below the surface and then emerge at the point at which we
are measuring. As a result, our measured value will differ depending on how large
the illuminated area is. This is called the edge effect. A useful (bidirectional surface
scattering distribution function) model for such subsurface light transport is described
in [466]. One solution in practical measurements is to illuminate (uniformly) an area
that is much larger than the extent of the subsurface scattering for the material we are
measuring.
There are other minor difficulties too. For example, fluorescent materials cannot be treated by the BSRDF concept as presented here. Despite these difficulties, the BRDF and its extensions are quite general and widely used.
Let us look at two special cases of the application of the BRDF. In the first case, we
derive the BRDF for a perfectly diffuse reflecting surface (ideal Lambertian surface). Here,
Lr is the same for all directions, and the total reflected radiant flux, Φr, leaving a small area dA can be calculated by integrating Lr over the hemisphere above the surface element dA:
Φr = ∫ Lr · dA · cos θr · dω = π Lr · dA.
If the surface reflects all of the incident flux, then Φr = Φi = Ei · dA, and the BRDF of the ideal Lambertian surface is Lr/Ei = 1/π. In the second case, we consider a perfect mirror. Since a perfect mirror reflects 100% of the incident light, the reflected flux Φr is equal to the incident flux Φi.
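For the perfect mirror, the reflected radiance is confined to the specular direction θr = θi, φr = φi + π, and the BRDF can be sketched (in our own notation, normalized so that the hemispherical reflectance is 1) as a delta function:

f_r(\theta_i, \phi_i; \theta_r, \phi_r) \;=\; \frac{\delta(\cos\theta_r - \cos\theta_i)\,\delta(\phi_r - \phi_i - \pi)}{\cos\theta_i},
\qquad
\int_{2\pi} f_r \cos\theta_r \, d\omega_r = 1 .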
Figure 8.3. An example of a rough surface. The reference surface is on the x–y plane.
Figure 8.3 shows a rough surface. We will define a reference smooth surface z = 0, on top
of which the random fluctuations in height z(x, y) represent the surface profile of the rough
surface. In the following discussion, the reference surface is taken to be a plane (the mean
surface) located at the x–y plane. For any given rough surface, it is possible to measure its
surface profile with a light beam or a mechanical stylus. In principle, from these surface
profile measurements we can calculate the surface scattering. However, all profile measure-
ments are limited by the instrument bandwidth. Even if we know the exact surface profile,
we still have to know the precise location of the light beam relative to the surface before
we can calculate the exact light scattering numerically. Practically, for almost all applica-
tions, we are only interested in the statistical properties of the roughness and their effect
on light scattering. A given type of rough surface is often treated as a stationary, isotropic
two-dimensional stochastic process (a random field). For example, the surface profiles of all
surfaces that are polished by a mechanical polishing machine are considered as belonging
to the same random process. There are three functions that are often used to characterize a
random rough surface: (1) S( f x , f y ), the power spectrum (spectral density) of the surface
profiles, or equivalently C(τx , τ y ), the autocorrelation function of the surface profiles; (2)
the single-point or multiple-point joint surface height distribution; and (3) the surface slope
distribution. Conceptually, the power spectrum of a random field, Z (x, y), can be thought
of as the ensemble average of the square of the amplitude of the Fourier transform of every
possible realization of the random field Z (x, y). However, there is a serious theoretical
difficulty in doing so because a stationary random field extends to infinity with the same
property and its Fourier transform does not converge. One way to avoid this difficulty is
to define the power spectrum of a random field through the autocorrelation function of the
random field. Another method for defining the power spectrum is the generalized harmonic
analysis of Wiener [743, Chapter 12]. The autocorrelation function, C(τx , τ y ), is defined by
C(τx, τy) = E[Z(x, y) Z(x + τx, y + τy)],   (8.7)
where E[. . .] represents the expected value over the entire ensemble of the random field.
The power spectrum of the random field is then defined as the Fourier transform of the autocorrelation function:
S(fx, fy) = ∫∫ C(τx, τy) exp[−i2π(fx τx + fy τy)] dτx dτy.   (8.8)
Since we are assuming that the reference mean surface is on the x–y plane, E[Z(x, y)] = 0, and the autocorrelation function is the same as the covariance function. Therefore, the mean-square height h² can be expressed as:
h² = E[Z²(x, y)] = C(0, 0).   (8.9)
Alternatively, a rough surface is often specified by its height distribution function, pheight (Z ),
or two-point joint distribution function pheight (Z 1 , Z 2 ). The height distribution function gives
less complete information about a rough surface than the power spectrum because it does
not say anything about the rate of height change as we traverse the rough surface. A third
function that has often been used is the surface slope distribution, pslope (z x , z y ), where
(z x , z y ) is the surface slope. In practice, the slope probability density function is quite
difficult to measure.
One of the most frequently used metrics in quantifying the roughness of a surface is the
root-mean-square height, h, in surface fluctuation. It is the standard deviation of the height
distribution. As we mentioned before, h does not tell us the rate of change in surface height,
i.e., the slope. Therefore, another quantity, called the correlation length, l, is used together
with h. The correlation length for a nonperiodic random rough surface is defined as the
offset distance over which the autocorrelation function falls to 1/e of its height at zero offset
(assuming that the random field is isotropic). The correlation length of a rough surface is
quite difficult to measure because of its sensitivity to instrument bandwidth [181].
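A minimal numerical sketch of how these statistics can be estimated from a measured height map z(x, y) sampled on a uniform grid (the function name and the periodogram-style estimates are our own choices):

import numpy as np

def roughness_statistics(z, dx):
    """Estimate rms height h, power spectrum S, and correlation length l
    from a height map z(x, y) sampled on a uniform grid of spacing dx."""
    z = z - z.mean()                      # place the reference (mean) surface at z = 0
    h = np.sqrt(np.mean(z**2))            # root-mean-square height, cf. Eq. (8.9)
    Z = np.fft.fft2(z)
    S = (np.abs(Z)**2) * dx**2 / z.size   # periodogram estimate of the spectral density
    # Autocorrelation via the inverse transform of |Z|^2 (Wiener-Khinchin relation).
    C = np.real(np.fft.ifft2(np.abs(Z)**2)) / z.size
    # Correlation length: offset at which the autocorrelation first falls to 1/e of C(0,0);
    # sampled along the x axis here for simplicity (isotropy assumed).
    profile = C[0, :] / C[0, 0]
    below = np.where(profile[: z.shape[1] // 2] < 1.0 / np.e)[0]
    l = below[0] * dx if below.size else np.nan
    return h, S, l

# Example: a synthetic random surface with heights in meters on a 1 um grid.
rng = np.random.default_rng(0)
z = rng.normal(scale=0.1e-6, size=(256, 256))
h, S, l = roughness_statistics(z, dx=1e-6)
print(h, l)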
Surface scattering has been studied extensively in radar and optics. Although the wave-
length of interest in the two fields differs by many orders of magnitude, the underlying
analysis is quite similar because the wavelength can be scaled with the roughness. A rough
surface in optics is a very smooth surface for radar. All roughness measures are scaled
by the wavelength of interest. Assuming that the incident light is in air or vacuum, a
good parameter for the classification of surface roughness is kh, where k = 2π/λ is the
wavenumber, λ is the wavelength in air or vacuum, and h is the root-mean-square height
of the roughness. As a general rule of thumb, a slightly rough surface is a surface with
kh < 1, a very rough surface is one with kh > 5, and a moderately rough surface is some-
thing in between. However, the light scattering behavior also depends on the incident and
scattering angles in question. Therefore, when we describe scattering from a rough surface,
we need to specify the lighting and viewing geometry as well as the surface topographic
irregularity.
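The rule of thumb can be put into a small calculation (the thresholds are those quoted above; the function name is ours). It also illustrates why the same physical surface that is very rough for visible light can be very smooth for radar:

import math

def roughness_class(h, wavelength):
    """Classify surface roughness by kh = 2*pi*h/lambda (rule of thumb in the text)."""
    kh = 2.0 * math.pi * h / wavelength
    if kh < 1.0:
        label = "slightly rough"
    elif kh > 5.0:
        label = "very rough"
    else:
        label = "moderately rough"
    return kh, label

print(roughness_class(h=0.5e-6, wavelength=0.5e-6))   # visible light: kh ~ 6.3, very rough
print(roughness_class(h=0.5e-6, wavelength=0.03))     # 10 GHz radar: kh ~ 1e-4, slightly rough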
The general problem of surface scattering is quite complicated. One of the successful
theories of scattering by a slightly rough surface treats the rough surface as a smooth
surface with perturbations in height which are smaller than a wavelength. This type of
treatment was first used by Lord Rayleigh for sound waves and later extended by Rice to
electromagnetic waves. It is called the first order Rayleigh–Rice vector perturbation theory
[62, 182, 789, 891, 916] because only the first term in the series expansion of the solution
is used for approximation. At the other extreme, the analysis of scattering by a very rough
surface usually treats the surface as a collection of small (but still much larger than a
wavelength), random, planar facets (tangent-plane approximation), each reflecting light
according to Fresnel reflection [66, 92]. The light scattered to a point A in space is the
sum of the light reflected from all those micro-facets whose surface normals are positioned
so that the angle of incidence is equal to the angle of reflection as viewed from point A.
We will call this analysis the specular-point tangent-plane approximation. There are three
further refinements that have been proposed: (1) self-shadowing: some tangent planes are
positioned correctly, but are occluded by the neighboring planes [883, 957]; (2) multiple
scattering: the incident wave is scattered several times on the rough surface before it is
scattered to the far distance; and (3) small perturbation of the tangent plane: each tangent
plane is treated as a slightly rough surface and the Rayleigh–Rice perturbation theory is
applied to calculate its scattering [926]. For surfaces that are between slightly rough and
very rough, there is no convenient approximation or theory for easy calculation, although
their measured light scattering behavior is shown to be between those predicted by the
first order Rayleigh–Rice vector perturbation theory and the specular-point tangent-plane
approximation.
The qualitative effect of surface roughness on light reflection can be described as follows.
For a perfectly smooth planar surface of infinite extent, the reflected light is in a single
specular direction θr = θi and φr = φi + π . Its BRDF is a delta function. If the planar
surface has only a finite area, the reflected light (far away) is now spread into a major
(very narrow) lobe around the specular angle and many very small lobes around it. This is
due to diffraction and the light distribution can be calculated from the Fourier transform
of the shape of the area. Now, if a very slight roughness is introduced into the surface,
the reflected radiance of the specular main lobe is decreased and that at other scattering
angles is increased. The specular main lobe is also called the coherent component (because
a perfectly smooth plane reflects a plane wave with a coherent phase), and the nonspecular
component is called the incoherent component. The average reflected power in the coherent
component decreases exponentially with the mean-square roughness height h 2 . As the
roughness continues to increase, the specular lobe continues to decrease until it is dominated
by the incoherent component and is no longer discernible when kh > 5.
We will now present some quantitative descriptions of surface scattering based on the
theories and approximations discussed above. As defined in Fig. 8.4, the mean surface is located at the x–y plane and the height fluctuation z(x, y) is in the z-direction. The plane of
incidence, defined by the incident beam and the surface normal, is the x–z plane. Similarly,
the plane of scattering is defined by the scattered beam and the surface normal. The incident
beam is specified by (θi , φi = π) and the scattered beam by (θs , φs ). As we described in
Section 5.4.1, the polarization of an incident plane electromagnetic wave is specified with
respect to whether its electric field vector is perpendicular (⊥) or parallel (∥) to the plane of
incidence. Similarly, the polarization of the scattered beam is defined relative to the plane of
scattering. The BRDF now has to be defined for the different polarizations for the incident
and the scattered light. We will use the notation BRDFab (θi , φi , θs , φs ), where a denotes the
polarization of the incident light and b that of the scattered light. For example, BRDF⊥∥ represents the bidirectional reflectance distribution for the perpendicular component of the incident beam and the parallel component of the scattered beam.
Figure 8.4. Definition of surface scattering angles. The incident ray is assumed to be on the x–z plane
and therefore its azimuth (horizontal) angle, φi is equal to π .
Here a and b denote the polarizations of the incident and the scattered light. For a very
slightly rough surface of infinite extent (i.e., its surface slopes are everywhere less than 1
and the radii of curvature everywhere are greater than the incident wavelength), it can be
shown that the coherent component is approximately a specular spike and
(BRDF)C⊥⊥ = r⊥²(θi) exp[−4(kh cos θi)²] · δ(sin²θs − sin²θi) · δ(φs − φi − π),   (8.11)
(BRDF)C∥∥ = r∥²(θi) exp[−4(kh cos θi)²] · δ(sin²θs − sin²θi) · δ(φs − φi − π),   (8.12)
where k is the wavenumber, h is the root-mean-square roughness height, and r⊥ and r∥ are the Fresnel reflection coefficients for the ideal, smooth surface, as described in Section 5.4.1.
Therefore, the coherent component amplitude is reduced exponentially by the surface mean-
square roughness. If the surface is of finite extent and its height distribution is a Gaussian
function, its BRDF for the coherent component of the scattered light can be derived as [66,
p. 702]:
(BRDF)Cab = [A(fx, fy, Lx, Ly) / (cos θi cos θs)] |Bab|² exp[−(2π)² h² fz²],   (8.13)
where A(fx, fy, Lx, Ly) is a factor determined by the shape and size (Lx, Ly) of the illuminated surface area.
In the tangent-plane approximation, each local planar facet is tangent to the local surface fluctuation. The total scattered field at a given observation
point is the sum of all the reflected fields from those tangent planar facets that are correctly
oriented so that the incident angle is equal to the reflection angle. An additional refinement
of the tangent plane approximation is to allow each tangent plane to be a slightly rough
surface and use the Rayleigh–Rice vector perturbation theory to calculate its contribution to
the final scattered field. Two complications arise from the tangent plane approximation: self-
shadowing and multiple scattering between planar facets. These two factors are usually dealt
with as a multiplication factor that modifies the scattered field strength. The self-shadowing
has two effects: (1) some of the planar facets are occluded from incident illumination
(shadowing), and (2) some of the reflected light is occluded by its neighboring facets
from reaching the observation point (masking). Using the geometric optics approximation,
these effects are treated as a correction factor (the geometrical attenuation factor) that
attenuates the scattered radiance as a function of incident and scattering angles. This type
of approximation is valid only for the case in which the local planar facets are much larger
than the wavelength.
Without considering self-shadowing and multiple scattering, Barrick [65] showed that
the BRDF of a very rough surface is approximately proportional to the probability density
function of its surface slopes, ps :
BRDFab = [4π|βab|² / (λ² fz² cos θi cos θs)] ps(−fx/fz, −fy/fz),   (8.24)
where βab is a factor that is not a function of the surface roughness, but is a complicated
function of the incident angle and scattered angle, θi , θs , and φs , and the incident and
scattered polarizations (a, b) [66, p. 722].
Beckmann and Spizzichino [92] used Kirchhoff theory to derive a general expression
for scattering by a rough surface. They discussed the case of a perfect conductor in detail.
Ogilvy’s book [723] offers a more up-to-date account of the derivation. This approach gives
a formal integral for surfaces of finite conductivity and the integral needs to be evaluated
numerically. Beckmann and Spizzichino give a closed form expression of the scattered field
for the case of a rough surface with a Gaussian height distribution. The reflectance of every
micro-facet on the local tangent plane of the rough surface is assumed to be 1.0 (for a
perfect conductor) or a local average “reflectance” of R(θi ) (this is a fudge factor because
the original derivation is valid only for a constant reflectance). The Beckmann–Spizzichino
model is well known, but there is some uncertainty about its exact form (see the discussion
in [404, p. 2854]). Here we use the form presented in [404, p. 2853] (note that we have
changed their formula into the BRDF by dividing dI /(I0 d) by cos θs and modifying the
delta functions into the correct form):
BRDF = [R(θi)/cos θs] exp[−g] δ(sin²θs − sin²θi) · δ(φs − φi ± π)
            + [R(θi)/(λ² cos θs)] exp[−g] F Σ(m=1 to ∞) (g^m/m!) Wm,   (8.25)
g = k²h²(cos θi + cos θs)²,   (8.26)
F = (1 + cos θi cos θs − sin θi sin θs cos φs) / [cos θi (cos θi + cos θs)²],   (8.27)
and
Wm = (π l²/m) exp[−(2π)² (fx² + fy²) l²/(4m)]   (8.28)
for a Gaussian autocorrelation function, or
Wm = 2π (l/m)² / [1 + (2π)² (fx² + fy²)(l/m)²]^(3/2)   (8.29)
for an exponential autocorrelation, where h is the root-mean-square height, l is the correla-
tion length, and f x and f y are as defined in Eqs. (8.14). The first term represents a specular
spike at θs = θi and it is reduced by the surface roughness exponentially, as we discussed
in the previous section on scattering by slightly rough surfaces.
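A small numerical sketch of the incoherent (diffuse-lobe) term of Eqs. (8.25)–(8.28) for the Gaussian autocorrelation case is given below. The specular-spike delta term is omitted, and fx and fy are taken to be (sin θs cos φs − sin θi)/λ and (sin θs sin φs)/λ for φi = π, which is our assumption about the convention of Eqs. (8.14); the function name is ours as well.

import math

def beckmann_incoherent_brdf(theta_i, theta_s, phi_s, wavelength, h, l, R, n_terms=50):
    """Incoherent term of the Beckmann-Spizzichino BRDF, Eqs. (8.25)-(8.28),
    for a Gaussian autocorrelation function (specular-spike delta term omitted).
    Angles in radians; h = rms height, l = correlation length; R = R(theta_i),
    the local average reflectance, treated as a constant. n_terms should exceed
    g for the series to converge."""
    k = 2.0 * math.pi / wavelength
    fx = (math.sin(theta_s) * math.cos(phi_s) - math.sin(theta_i)) / wavelength  # assumes phi_i = pi
    fy = (math.sin(theta_s) * math.sin(phi_s)) / wavelength
    g = (k * h * (math.cos(theta_i) + math.cos(theta_s))) ** 2                    # Eq. (8.26)
    F = (1.0 + math.cos(theta_i) * math.cos(theta_s)
         - math.sin(theta_i) * math.sin(theta_s) * math.cos(phi_s)) / (
        math.cos(theta_i) * (math.cos(theta_i) + math.cos(theta_s)) ** 2)         # Eq. (8.27)
    total, g_m_over_mfact = 0.0, 1.0
    for m in range(1, n_terms + 1):
        g_m_over_mfact *= g / m                                                   # g**m / m!
        Wm = (math.pi * l * l / m) * math.exp(
            -(2.0 * math.pi) ** 2 * (fx * fx + fy * fy) * l * l / (4.0 * m))      # Eq. (8.28)
        total += g_m_over_mfact * Wm
    return (R / math.cos(theta_s)) * math.exp(-g) * F * total / wavelength ** 2

# Example: near-specular scattering of visible light from a moderately rough surface.
print(beckmann_incoherent_brdf(math.radians(30), math.radians(30), 0.0,
                               wavelength=0.5e-6, h=0.1e-6, l=2.0e-6, R=0.9))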
Torrance and Sparrow [957] developed a theory to account for off-specular reflection
observed in both metal and nonmetal surfaces. The theory is based on the random planar
facet model of the rough surface. For each micro-facet plane, scattering is modeled simply
as the Fresnel reflection factor, F. The self-shadowing and masking effects are modeled
with geometric optics, and the total effect is represented as a geometrical attenuation factor,
G. The micro-facet model of the rough surface is described with a slope distribution, D.
The resulting expression, BRDF = cFGD (where c is a constant), is found to agree with
experimental data quite well for very rough surfaces (kh ≫ 1). The factor G is given as
[213]:
G = min{1, 2(N · H)(N · V)/(V · H), 2(N · H)(N · L)/(V · H)},   (8.30)
where N is the unit normal of the mean surface, V is the unit vector of the scattered beam
(pointing to the observer), L is the unit vector of the incident beam (pointing to the light
source), and H is the unit angular bisector of L and V:
H = (V + L)/|V + L|.   (8.31)
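A minimal sketch of the geometrical attenuation factor of Eqs. (8.30) and (8.31) (the function name and example vectors are ours):

import numpy as np

def geometric_attenuation(N, V, L):
    """Torrance-Sparrow geometrical attenuation factor G, Eq. (8.30),
    with H the unit angular bisector of L and V, Eq. (8.31).
    N, V, L are unit vectors: surface normal, to viewer, to light source."""
    H = (V + L) / np.linalg.norm(V + L)
    NH, NV, NL, VH = N @ H, N @ V, N @ L, V @ H
    return min(1.0, 2.0 * NH * NV / VH, 2.0 * NH * NL / VH)

# Normal illumination, grazing (85 degree) viewing: strong masking, G well below 1.
N = np.array([0.0, 0.0, 1.0])
L = np.array([0.0, 0.0, 1.0])
V = np.array([np.sin(np.radians(85.0)), 0.0, np.cos(np.radians(85.0))])
print(geometric_attenuation(N, V, L))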
A Gaussian model of the surface statistics is adequate only for surfaces finished by special processes, such as diamond-turned surfaces. The surface autocorrelation function for most surfaces is better described by an exponential or a fractal distribution; only very rarely is a Gaussian correlation function adequate [95].
Various numerical and experimental studies have established some ranges of parameters
within which different models are accurate. For slightly rough surfaces (kh ≤ 0.5), the
Rayleigh–Rice vector perturbation theory is known to be valid for kl < 4. The Kirchhoff
approximation (from which the Beckmann–Spizzichino model is derived) seems to be
accurate for kl > 6 and kh ≫ 1 for surfaces that have a Gaussian roughness spectrum,
except at high incident angles [950]. Numerical studies also show that for very rough surfaces
(e.g., kh = 9 and kl = 12), significant backscattering (θs = θi and φs = φi ) can occur at
an incident angle not far from normal, say 10◦ . The BRDF for a certain range of incident
angles shows two peaks, one at some off-specular angle and the other at a backscattering
angle. The backscattering seems to be the consequence of multiple scattering and cannot
be predicted from the approximation methods discussed so far.
We are thus forced to examine the details of how the two diffuse components should be modeled.
In such a model the functions g and h are normalized to 1 at their maximum values. The basic assumption is that the spectral and geometrical factors can be separated as products. Furthermore, the spectral factor is assumed to be independent of the lighting and viewing geometry. For many types of material, the interface reflection component is close to nonselective; we can therefore set the coefficient b(λ) to a constant, s, and call the resulting model the neutral interface reflection (NIR) model [563]. The physical meaning of the NIR model as expressed in Eq. (8.33) is that the surface reflects light in two components: the first term represents the wavelength-selective reflection, and the second term the nonselective reflection. This model has been used widely in computer graphics. It is also used to estimate the scene illuminant chromaticity [560].
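The separable form described here can be written schematically, with a(λ) denoting the wavelength-selective body-reflection factor (the symbols are our own choices), as

\rho(\theta_i, \phi_i; \theta_r, \phi_r; \lambda)
  \;=\; a(\lambda)\, g(\theta_i, \phi_i; \theta_r, \phi_r)
  \;+\; s\, h(\theta_i, \phi_i; \theta_r, \phi_r),

with the first term modeling the (wavelength-selective) body reflection and the second term the (nonselective) interface reflection.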
In the Chandrasekhar–Wolff model, ρ is the total diffuse reflectance, θi is the incident angle, θr is the scattering an-
gle, n is the index of refraction of the surface material, F1 (θ, n) is the Fresnel reflection
coefficient for incident angle θ and index of refraction n, as discussed in Section 5.4.1,
and
F2(θr, n) = F1(sin⁻¹(sin θr/n), 1/n).   (8.35)
Strictly speaking, ρ is a function of the lighting and viewing geometry. However, its value
does not change too much over a large range of angles, and it can therefore be approximated
as a constant [1045]. It is interesting to see that the first Fresnel factor, [1 − F1 ], accounts
for how much light is transmitted from the air into the surface body for the given incident
angle, θi , and the second Fresnel factor, [1 − F2 ], accounts for how much light escapes back
into the air from the surface body for the given incident angle, θ, which is related to the
reflection angle θr by Snell’s law, sin θr = n sin θ.
The Chandrasekhar–Wolff model predicts that for a normal incident angle, the reflected
radiance is relatively constant for θr less than 50◦ , but falls off rapidly when the reflection
angle is increased above 50◦ . This is in sharp contrast with the prediction by the Lambertian
model which predicts a constant radiance for all reflection angles. Wolff has shown that the
Chandrasekhar–Wolff model is in good agreement with the limited experimental data from
smooth surfaces.
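The angular behavior just described comes from the product [1 − F1(θi, n)][1 − F2(θr, n)]. A small sketch, assuming F1 is the unpolarized Fresnel reflectance of Section 5.4.1 and F2 follows Eq. (8.35) (function names are ours):

import math

def fresnel_unpolarized(theta, n):
    """Unpolarized Fresnel reflectance F1(theta, n) for light incident
    onto a dielectric of relative index n (assumed definition)."""
    sin_t = math.sin(theta) / n
    if sin_t >= 1.0:                      # total internal reflection (possible when n < 1)
        return 1.0
    theta_t = math.asin(sin_t)
    rs = (math.cos(theta) - n * math.cos(theta_t)) / (math.cos(theta) + n * math.cos(theta_t))
    rp = (n * math.cos(theta) - math.cos(theta_t)) / (n * math.cos(theta) + math.cos(theta_t))
    return 0.5 * (rs * rs + rp * rp)

def wolff_angular_factor(theta_i, theta_r, n):
    """Angular factor [1 - F1(theta_i, n)][1 - F2(theta_r, n)], with F2 from Eq. (8.35)."""
    F1 = fresnel_unpolarized(theta_i, n)
    F2 = fresnel_unpolarized(math.asin(math.sin(theta_r) / n), 1.0 / n)
    return (1.0 - F1) * (1.0 - F2)

# Normal incidence: the factor stays nearly flat up to ~50 degrees, then falls off.
for deg in (0, 30, 50, 70, 85):
    print(deg, round(wolff_angular_factor(0.0, math.radians(deg), 1.5), 3))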
Many real surfaces are too irregular to be characterized by a few statistical parameters, such as the root-mean-square roughness, the correlation length, or the surface power spec-
trum. The only reliable way to describe reflection from these surfaces is to measure their
BRDFs directly. However, measuring the BRDF, BSRDF, or BSSSRDF requires a complex
instrument, such as a goniophotometer or a goniospectrophotometer, that is not generally
available, requires skill to operate and calibrate, and with which it takes a long time to
measure a sample (because many angles and wavelengths have to be measured). Therefore,
many empirical surface reflection models have been proposed and they are widely used in
computer graphics, computer vision, and illumination engineering. The simplest model is
the Lambertian surface which has the same reflected radiance independent of the viewing
angle. This model is popular not only because it is simple, but also because it seems to
confirm our observation that most object surfaces look approximately equally bright from
different angles. Of course, near the specular highlight, the Lambertian model does not
work well. So, the next simplest model adds a specular component as a function of the
incident angle to the basic Lambertian reflection model. A popular model in the early years
of computer graphics was that of Phong [757], which models surface reflection as the sum
of a specular lobe and a diffuse (Lambertian) lobe. The specular lobe peaks at the specular
reflection angle and falls off as a power function of the cosine of the off-specular angle.
From the above discussion, we can see that an empirical surface reflection model may
consist of two components, one from interface reflection and the other from body reflection
[384, 696]. Interface reflection of a slightly or moderately rough surface usually has a
specular spike (coherent field) and a specular lobe (incoherent field). For a very rough
surface, the specular spike is often quite small and indistinguishable from the specular lobe.
Body reflection is comparatively less directional, and is called the diffuse lobe. As more
and more BRDF data are measured in the future, we may hope that someday the shapes
of the spike and lobes can be described by families of functions that are easy to use. One
example of such an easy-to-use model was proposed by Ward [1012].
Ward’s general reflection model is called the anisotropic (elliptical) Gaussian model.
The incident and reflecting geometry is shown in Fig. 8.5. The model assumes that the
surface has two perpendicular (uncorrelated) slope distributions, characterized by σx and
Figure 8.5. The incident and reflecting geometry for the anisotropic (elliptical) Gaussian model
proposed by Ward.
σ y , where x and y are the two major axes of the elliptical anisotropic surface. Let N be the
unit surface normal, V be the viewing vector (pointing to the observer), L be the incident
vector (pointing to the source), and H be the unit angular bisector of L and V:
H = (V + L)/|V + L|.   (8.40)
The angle δ is the angle between N and H. The angle φ is the angle between the x-axis and
the projection of H on the tangent plane (x–y plane) of the surface. The empirical BRDF
function is
f(θi, φi; θr, φr) = ρd/π + ρs · [1/√(cos θi cos θr)] · exp[−tan²δ (cos²φ/σx² + sin²φ/σy²)] / (4π σx σy),   (8.41)
where ρd is the diffuse reflectance, ρs is the specular reflectance, ρd + ρs is less than 1 and
σ is not much greater than 0.2. An isotropic reflection model can be obtained by setting
σ = σx = σy:
f(θi, φi; θr, φr) = ρd/π + ρs · [1/√(cos θi cos θr)] · exp[−tan²δ/σ²] / (4π σ²).   (8.42)
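A direct implementation sketch of Eq. (8.41) (vector inputs and function names are ours; the projections of the half vector H onto the tangent axes are used to evaluate tan δ cos φ and tan δ sin φ):

import numpy as np

def ward_brdf(N, X, Y, L, V, rho_d, rho_s, sigma_x, sigma_y):
    """Ward's anisotropic (elliptical) Gaussian BRDF, Eq. (8.41).
    N is the unit surface normal, X and Y the unit tangent vectors along the two
    principal axes of the anisotropy, L the unit vector to the source, V to the viewer."""
    H = (L + V) / np.linalg.norm(L + V)            # half vector, Eq. (8.40)
    cos_i, cos_r, cos_d = N @ L, N @ V, N @ H
    # tan^2(delta) * (cos^2(phi)/sx^2 + sin^2(phi)/sy^2) from projections of H:
    exponent = -((H @ X / sigma_x) ** 2 + (H @ Y / sigma_y) ** 2) / cos_d ** 2
    specular = rho_s * np.exp(exponent) / (4.0 * np.pi * sigma_x * sigma_y
                                           * np.sqrt(cos_i * cos_r))
    return rho_d / np.pi + specular

# Isotropic example (sigma_x = sigma_y = 0.1), 30-degree incidence, specular viewing.
N, X, Y = np.eye(3)[2], np.eye(3)[0], np.eye(3)[1]
L = np.array([np.sin(np.pi/6), 0.0, np.cos(np.pi/6)])
V = np.array([-np.sin(np.pi/6), 0.0, np.cos(np.pi/6)])
print(ward_brdf(N, X, Y, L, V, rho_d=0.2, rho_s=0.1, sigma_x=0.1, sigma_y=0.1))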
Figure 8.6 shows measured BRDF data for an incident angle of 35◦. The overall shape of the BRDF is close to that of a Lambertian reflector, except that there is noticeable backscattering at the incident angle and some off-specular reflection at θr = 75◦.
Figure 8.7 shows an example of measured BRDF data along the plane of incidence for
a steel plate surface with a matte finish [321]. The incident angle is 30◦ . We now see that
the overall shape of the BRDF is very different from that of a Lambertian reflector. The
reflection is strongest for θr = θi . However, there is significant spread around this ideal
reflection angle, due to the rough finish of the surface.
8.3 Radiative transfer theory and colorant formulation
In the previous section, we briefly discussed how body (subsurface) reflection can be modeled through radiative transfer theory. In fact, this is a topic that is of great interest in the
paint, textile, paper, plastic, and publishing industries. The central question is how to mix
different amounts of dyes and/or pigments to obtain products of a desired color. This is an
important part of a process called colorant formulation, which also includes the selection of
dyes and pigments for the application requirement, the study of color matching criteria, and
the minimization of production cost. An interesting example of such applications is called
shading in paint manufacturing. Pigments are ground and dispersed in the liquid solvent
and binder (called the vehicle) by milling equipment to produce paint of the desired color.
Because of batch-to-batch differences, the correct color is reached by successive approximations. First, about 80% of the required pigments is ground into the vehicle. Then the color is measured and the additional amount of each pigment required to match the desired color is calculated. Somewhat less than the calculated amount is added and the color checked again. With repeated trials of visual inspection and comparison, it usually takes as many as ten trials to arrive at a good color match. With the help of computational models
and color measurement, it is now possible to make the match in one or two trials.
The physics of colorant formulation is also of interest in color imaging science because
certain color reproduction problems in color imaging share similar types of calculation and
modeling. An example is the prediction of reflection density from known dye transmittance
functions. Other examples will be discussed in later chapters. The basic physical model
is simple for a transparent medium, but can be quite complicated for a turbid (translucent
or opaque) medium. A simple classification is that a transparent medium does not contain
appreciable numbers of particles or inhomogeneities that produce significant scattering,
while a turbid medium does. These two types of medium are discussed separately as follows.
For a transparent, non-scattering medium, the spectral radiance Lλ is attenuated exponentially with the distance t traveled in the medium (the Bouguer–Lambert law):
Lλ(z + t) = Lλ(z)e−α(λ)t,   (8.43)
where α(λ) is called the absorption coefficient. If we express the relation in base 10:
L λ (z + t) = L λ (z)10−k(λ)t , (8.44)
then k(λ) = α(λ)/2.303 is called the extinction coefficient. The corresponding law regarding
the dependence of absorption on colorant concentration c is called Beer’s law:
Lλ(z + t) = Lλ(z)10−k′(λ)c,   (8.45)
for a fixed thickness t. Beer’s law can be combined with the Lambert–Bouguer law to give
the Beer–Lambert–Bouguer law (usually called Beer’s law):
Lλ(z + t) = Lλ(z)10−ε(λ)ct,   (8.46)
where ε(λ) is called the molar extinction coefficient (or the molar absorption coefficient) and
c is the molar concentration of the absorbing colorant. In general, the extinction coefficient
also includes the effect of scattering. Unfortunately, the names and symbols of these terms
are not standardized across different fields and we have to pay attention to the meaning and
units of these terms when reading the literature.
The Bouguer–Lambert law can also be written in differential form when the extinction
coefficient is not constant throughout the medium. Within a transparent medium, the dif-
ferential amount of spectral radiance attenuation, dL λ (z), is proportional to the differential
distance it travels in the medium, dz, i.e.,
dLλ(z) = −αλ(z) Lλ(z) dz,   (8.47)
where αλ(z) is the absorption coefficient at spatial location z. Integrating over a finite thickness, t, we have
Lλ(z + t) = Lλ(z) exp[−∫_z^(z+t) αλ(z′) dz′] = Lλ(z) 10^(−∫_z^(z+t) kλ(z′) dz′).   (8.48)
The transmittance of a layer of thickness t is Tλ = Lλ(z + t)/Lλ(z), and for a uniform medium the corresponding optical (transmission) density is
Dλ = −log Tλ = kλt.   (8.50)
When several colorants are dissolved in a transparent substrate, their contributions to the extinction coefficient are additive:
k = ks + ε1c1 + ε2c2 + · · · ,   (8.51)
where ks is the extinction coefficient for the substrate alone, and εi and ci are the molar absorption coefficient and molar concentration of the ith colorant.
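A small numerical sketch of Eqs. (8.46), (8.50) and (8.51) for a hypothetical two-dye mixture (all coefficient values are made up for illustration):

import numpy as np

wavelengths = np.arange(400, 701, 10)               # nm
# Hypothetical molar extinction spectra of two dyes and of the substrate.
eps1 = 0.8 * np.exp(-((wavelengths - 450.0) / 40.0) ** 2)
eps2 = 1.2 * np.exp(-((wavelengths - 620.0) / 50.0) ** 2)
k_substrate = 0.01 * np.ones_like(wavelengths, dtype=float)

c1, c2, t = 0.5, 0.3, 1.0                           # concentrations (mol/l) and thickness (cm)
k_total = k_substrate + eps1 * c1 + eps2 * c2       # Eq. (8.51)
transmittance = 10.0 ** (-k_total * t)              # Eq. (8.46), internal transmittance
density = -np.log10(transmittance)                  # optical density, Eq. (8.50)

# In a transparent, non-scattering medium the densities of the components add:
assert np.allclose(density, k_substrate * t + eps1 * c1 * t + eps2 * c2 * t)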
Figure 8.8. Calculation of transmittance through a transparent film.
All the above discussion of absorption and transmission applies only to the interior of a
transparent medium. In many applications, the medium has a finite thickness and the effect
of the discontinuity in the index of refraction at the surface boundaries has to be considered.
This effect is usually called the surface correction. Let us consider the example of such a
calculation shown in Fig. 8.8. Assume that we have a transparent film of thickness t and
index of refraction n 2 . On one side of the film is a medium with the index of refraction
n 1 and, on the other side, a medium with index of refraction n 3 . A beam of light with
radiance L is incident on the film from medium 1. At the boundary of the interface, part
of the light is reflected back. A portion of the light that is transmitted into the film will be
partially absorbed. As the attenuated beam reaches the second surface boundary, part of it
is reflected back into the film and the rest is transmitted into medium 3. The reflected beam
then goes through the same process again. If the incident beam is perpendicular to the film,
the reflection coefficients at the two interfaces are
r1 = [(n1 − n2)/(n1 + n2)]²,   (8.52)
r2 = [(n2 − n3)/(n2 + n3)]².   (8.53)
It should be noted that when the beam is perpendicular to the interface, the reflection
coefficient is the same no matter which way the beam is traveling (from medium 1 to
medium 2 or from medium 2 to medium 1). Our problem is to calculate how much light is
transmitted from medium 1 to medium 3. The internal transmittance Ti can be calculated
from the extinction coefficient, k, of the film, i.e., Ti = 10−kt . The total amount of light
transmitted through the film is obtained by summing the series of inter-reflections:
T = (1 − r1)Ti(1 − r2)[1 + r1r2Ti² + (r1r2Ti²)² + · · ·] = (1 − r1)(1 − r2)Ti/(1 − r1r2Ti²).
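A sketch of this calculation at normal incidence, adding the multiple internal reflections incoherently (the function name and the example values are ours):

def film_transmittance(n1, n2, n3, k, t):
    """Total transmittance of a transparent film (index n2, extinction coefficient k,
    thickness t) between media of indices n1 and n3, at normal incidence,
    summing the multiple internal reflections incoherently."""
    r1 = ((n1 - n2) / (n1 + n2)) ** 2      # Eq. (8.52)
    r2 = ((n2 - n3) / (n2 + n3)) ** 2      # Eq. (8.53)
    Ti = 10.0 ** (-k * t)                  # internal transmittance
    return (1.0 - r1) * Ti * (1.0 - r2) / (1.0 - r1 * r2 * Ti ** 2)

# A gelatin-like film (n2 = 1.5) in air, with internal transmittance about 0.9:
print(film_transmittance(1.0, 1.5, 1.0, k=0.0458, t=1.0))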
Figure 8.9. The partition of the space around an elemental area d A in the medium into many channels.
The solid angle delineated by the cone θ1 is the first channel. The solid angle between the cones θ1
and θ2 is called channel 2, and so on.
Many-flux theory
Let us assume that the scattering medium is bounded by parallel planes and extends over
a region very large compared with its thickness, and that the medium is homogeneous and
isotropic. The many-flux theory partitions the space around an elemental area dA in the
medium into many concentric cone shells, each called a channel. The elemental area dA is
parallel to the surface of the medium. Figure 8.9 shows the partition. The z-axis is normal
to the interface between the air and the medium. The scattering medium is placed on top
of a backing. The thickness of the turbid medium is t. The air–medium boundary plane is
located at z = 0 and the backing is located at z = t. Let Fi be the flux going into channel
i. We will assume that there are m channels, where m is an even integer, such that m/2
channels (index 1 ≤ i ≤ m/2) are going down and m/2 channels (index m/2 < i ≤ m) are
going up. Let si j be the scattering coefficient that describes the scattering from channel j
into channel i when i = j. When i = j, s j j represents the total scattering and absorption
that occur in channel j.
The radiative transfer equations for the channel fluxes are then
dFi/dz = Σ(j=1 to m) sij Fj,   i ≤ m/2,   (8.56)
−dFi/dz = Σ(j=1 to m) sij Fj,   i > m/2,   (8.57)
where the negative sign for i > m/2 is to account for the fact that these channel fluxes are
going in the negative z direction. The above radiative transfer equations can be written in
matrix form. Let the matrix M be defined by Mij = sij when i ≤ m/2, and Mij = −sij when i > m/2, and let F be the vector [F1, F2, . . . , Fm]T; then
dF/dz = MF.   (8.58)
The general solution of this equation is:
Fi = Σ(j=1 to m) Aij Cj exp(λj z),   (8.59)
where λj are the eigenvalues of the matrix M, Aij is the ith component of the eigenvector corresponding to the eigenvalue λj, and Cj are constants to be determined from the boundary conditions at the front and back surfaces.
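A numerical sketch of Eqs. (8.58) and (8.59), using a small two-channel matrix of the Kubelka–Munk type as the example (the matrix values are made up, and the boundary conditions are not imposed here, so the Cj are simply given):

import numpy as np

def many_flux_general_solution(M, z, C):
    """General solution of dF/dz = M F, Eq. (8.59): F_i(z) = sum_j A_ij C_j exp(lambda_j z),
    where lambda_j and the columns A_.j are the eigenvalues and eigenvectors of M,
    and the C_j would normally be fixed by the boundary conditions."""
    eigvals, eigvecs = np.linalg.eig(M)           # columns of eigvecs are the A_.j
    return (eigvecs * np.exp(eigvals * z)) @ C

# Two channels (one downward, one upward) with diffuse absorption K and scattering S;
# the eigenvalues are +/- sqrt((K+S)^2 - S^2), the familiar two-flux exponents.
K, S = 0.2, 1.5
M = np.array([[-(K + S),  S      ],
              [ -S,       (K + S)]])
C = np.array([1.0, 0.2])
print(many_flux_general_solution(M, z=0.5, C=C))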
The many-flux theory provides a computational method for dealing with any type of
illumination and material scattering. However, it is not very practical to experimentally
measure so many coefficients in the matrix M. In fact, the most useful theory has been the
two-flux (Kubelka–Munk) theory. One problem with the two-flux theory is that it deals with
only diffuse illumination and diffuse reflection/transmission of the surfaces. With advances
in computing power, the four-flux theory should become more useful because it can deal
with more realistic illumination.
Four-flux theory
If the external light is collimated and is perpendicular to the material surface, we can expect
that a major component of the transmitted light just beneath the surface of a turbid material
will be a directional flux. As light penetrates deeper into the material, it is scattered more
and the intensity of the directional flux is reduced while the intensity of the scattered diffuse
flux is increased. Therefore, a useful model for light propagation in a turbid material can be
developed by using only four fluxes: two directional fluxes (one forward and one backward)
and two diffuse fluxes (again, one forward and one backward). Figure 8.10 shows the setup
of the four-flux theory.
Let F1 , F2 , F3 , and F4 be the forward directional flux, the backward directional flux,
the forward diffuse flux, and the backward diffuse flux, respectively. The scattering and
absorption coefficients are defined as follows:
k = −(1/F1)(dF1/dz) = (1/F2)(dF2/dz)|directional absorption;
sf = −(1/F1)(dF1/dz) = (1/F2)(dF2/dz)|directional forward scattering;
sb = −(1/F1)(dF1/dz) = (1/F2)(dF2/dz)|directional backward scattering;   (8.60)
K = −(1/F3)(dF3/dz) = (1/F4)(dF4/dz)|diffuse absorption;
S = −(1/F3)(dF3/dz) = (1/F4)(dF4/dz)|diffuse scattering.
Figure 8.10. The four-flux theory assumes that there are two directional fluxes (F1 and F2 ) and two
diffuse fluxes (F3 and F4 ).
Radiative transfer equations are set up by accounting for energy fluxes being transferred
from one flux to another or absorbed by the medium. For example, the forward-scattered
flux from F1 is added to the forward diffuse flux F3 , while the backward-scattered flux
from F1 is added to the backward diffuse flux F4 . With these coefficients, we can write the
radiative energy transport equations as follows:
dF1 = −k F1 dz − sf F1 dz − sb F1 dz;
−dF2 = −k F2 dz − sf F2 dz − sb F2 dz;
(8.61)
dF3 = sf F1 dz + sb F2 dz − (K + S)F3 dz + S F4 dz;
−dF4 = sb F1 dz + sf F2 dz + S F3 dz − (K + S)F4 dz.
Let µ = k + sf + sb and a = 1 + (K /S), then we can simplify the above equations to:
dF1/dz = −µF1;
dF2/dz = µF2;
dF3/dz = sf F1 + sb F2 − aS F3 + S F4;   (8.62)
dF4/dz = −sb F1 − sf F2 − S F3 + aS F4.
These equations can be solved in many different ways, depending on the boundary
conditions chosen. One solution based on the boundary conditions at the air–medium
interface is:
F1(z) = F1(0) e^(−µz);
F2(z) = F2(0) e^(µz) = F2(t) e^(−µ(t−z));
F3(z) = F1(0)[A e^(−µz) − A cosh Sbz + ((sf + µA)/(Sb)) sinh Sbz]
          + F2(0)[B e^(µz) − B cosh Sbz + ((sb − µB)/(Sb)) sinh Sbz]   (8.63)
          + F3(0)[cosh Sbz − (a/b) sinh Sbz] + F4(0)(1/b) sinh Sbz;
F4(z) = F1(0)[B e^(−µz) − B cosh Sbz + ((sb + µB)/(Sb)) sinh Sbz]
          + F2(0)[A e^(µz) − A cosh Sbz + ((sf − µA)/(Sb)) sinh Sbz]
          − F3(0)(1/b) sinh Sbz + F4(0)[cosh Sbz + (a/b) sinh Sbz],
where b = √(a² − 1) and
A = [S(a sf + sb) + µ sf]/(S²b² − µ²),
B = [S(a sb + sf) − µ sb]/(S²b² − µ²).
Let Rg be the (diffuse) reflectance of the backing, so that F4(t) = Rg F3(t), and let Ri = F4(0)/F3(0) be the reflectance of the turbid layer for diffuse flux, evaluated just below the air–medium interface, but still internal to the medium. We can solve for Ri by noting that
F3(t) = [cosh Sbt − (a/b) sinh Sbt] F3(0) + (1/b) sinh Sbt F4(0),   (8.67)
F4(t) = [cosh Sbt + (a/b) sinh Sbt] F4(0) − (1/b) sinh Sbt F3(0),   (8.68)
and therefore
Ri = F4(0)/F3(0) = [1 − Rg(a − b coth Sbt)]/[a − Rg + b coth Sbt].   (8.69)
There are two cases of special interest here. The first is when there is no scattering, i.e.,
S → 0. Then,
Ri = [1 − Rg(1 + K/S) + Rg b coth Sbt]/[(1 + K/S) − Rg + b coth Sbt]
    = [S − Rg(K + S) + Rg Sb coth Sbt]/[(K + S) − Rg S + Sb coth Sbt]
    ≈ [−Rg K + Rg Sb coth Sbt]/[K + Sb coth Sbt] ≈ [−Rg K + Rg K coth Kt]/[K + K coth Kt],
and therefore
lim(S→0) Ri = Rg e^(−2Kt),   (8.70)
which is what we expect from the Bouguer–Lambert law because the flux travels through
the thickness twice (going down and coming up). However, it should be noted [1056] that
the absorption coefficient K here refers to the diffuse flux, while in the Bouguer–Lambert
law, the extinction coefficient k refers to directional flux. The other special case is when
the thickness t approaches infinity, i.e., coth Sbt → 1. From the definition of a and b, it
follows that (a + b)(a − b) = a 2 − b2 = 1, and we have:
Ri = [1 − Rg(a − b)]/[a − Rg + b] = [(a + b)(a − b) − Rg(a − b)]/[(a + b) − Rg] = a − b.
This means that when the layer is very thick, the background reflectance, Rg , no longer
affects the layer reflectance, Ri . This result is usually expressed as
R∞ = 1 + K/S − [(K/S)² + 2(K/S)]^(1/2).   (8.71)
The correction for the reflections at the boundaries of a turbid layer is similar to the calculation given for a transparent film in the previous section, except that the light passing from the medium to the air is now diffuse and the reflectance at the medium–air interface is very different from that at the air–medium interface.
Depending on the external illumination and measurement geometry, there are different
formulas for the Saunderson correction. In order to calculate the correction, we need to
define a number of quantities. Let rx and ri represent the external and internal reflection
factors for diffuse light, respectively. From the Fresnel reflection equation, it can be shown
that [268]
rx = 1/2 + (n − 1)(3n + 1)/[6(n + 1)²] + [n²(n² − 1)²/(n² + 1)³] ln[(n − 1)/(n + 1)]
      − 2n³(n² + 2n − 1)/[(n² + 1)(n⁴ − 1)] + [8n⁴(n⁴ + 1)/((n² + 1)(n⁴ − 1)²)] ln n.   (8.73)
It is also possible to derive a similar theoretical equation for ri . However, such a theoretical
value is often not useful because of the complication caused by the total internal reflection
when the incident angle in the medium is larger than the critical angle. This makes the
real internal reflection factor ri very sensitive to the actual angular distribution of the
upward emergent flux. For example, the theoretical value of ri is 0.596 for n = 1.5, but
backward calculation from experimental data often indicates a smaller value of around
0.4–0.5. Therefore, in practice, ri is treated as an additional constant to be estimated.
If the external illumination is diffuse and the sensing geometry is also diffuse, we can
determine the relation between Ri and R as follows. Let L be the incident flux. The reflected
flux, L r , leaving the medium is
Lr = Lrx + L(1 − rx)Ri(1 − ri) + L(1 − rx)Ri ri Ri(1 − ri) + · · ·
    = Lrx + L(1 − rx)Ri(1 − ri)[1 + ri Ri + ri²Ri² + · · ·]
    = L[rx + (1 − rx)Ri(1 − ri)/(1 − ri Ri)]   (8.74)
and
R = Lr/L = rx + (1 − rx)(1 − ri)Ri/(1 − ri Ri).   (8.75)
Now, let us calculate an ideal case in which Ri = 0.5, n = 1.5, rx = 0.0918, and ri = 0.596.
The externally measured reflectance is R = 0.353 which is significantly different from Ri .
If instead of the ideal value, ri is only 0.4, then R = 0.432.
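Equation (8.75) reproduces the numbers quoted above; a minimal check (the function name is ours):

def saunderson(Ri, rx, ri):
    """Externally measured diffuse reflectance R from the internal reflectance Ri,
    Eq. (8.75), for diffuse illumination and diffuse sensing."""
    return rx + (1.0 - rx) * (1.0 - ri) * Ri / (1.0 - ri * Ri)

print(round(saunderson(0.5, 0.0918, 0.596), 3))   # 0.353
print(round(saunderson(0.5, 0.0918, 0.4), 3))     # 0.432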
8.4 Causes of color
The spectral composition of the light emitted from a light source undergoes various changes
in the process of scattering, reflection, refraction, transmission, and absorption. As a con-
sequence, objects under one illumination send light of different spectral compositions to
our visual system and they produce our perception of various colors. There are many ways
materials or light sources can modify or generate light of different spectral compositions
(see, e.g., [276, 391, 693, 1038], for more details).
According to Nassau [693], color can be produced by: (1) incandescence (light emitted
by heated bodies), (2) gas excitation, (3) atomic or molecular vibration and rotation, (4)
transition metals in a ligand field, (5) orbital transitions in organic molecules, (6) charge
transfer, (7) energy bands and impurities in metals and semiconductors, (8) color centers,
(9) dispersive refraction and polarization, (10) scattering, (11) nonlinear optical effects, (12)
interference, (13) diffraction, etc. It is clear that the causes of color are many. However, it should
be noted that this classification of different causes is a matter of convenience and different
causes should not be taken as mutually exclusive. After all, light–matter interaction can be
described on a very fundamental level by a single theory called quantum electrodynamics.
The following sections discuss some of the common causes of color.
8.4.2 Scattering
Scattering gives rise not only to the blue color of the sky but also to the blue color seen on
many animals. Small particles overlying a dark layer give a blue color because the portion
of light that is not scattered is absorbed by the underlying dark layer. If the underlying layer
has some color in it, the color will combine with the blue to produce various hues. The blue
feathers in the blue jay birds, the blue neck coloration of turkeys, and blue eyes in humans
(scattering from the pigments in the iris) are all examples where colors are produced by
structural scattering.
8.4.3 Interference
Thin films and regular fine structures can produce colors by interference. The colors we see
on soap bubbles, peacocks, butterflies, beetles, and opals are the results of light interference
[878]. The colors of these objects change when we shift our angle of view.
8.4.4 Dispersion
The index of refraction is a function of wavelength and as a result light of different wave-
lengths is refracted to different angles, producing spectrum-like colors. Diamond has the
highest dispersion power of any naturally occurring gem material, and it also gives a beau-
tiful play of color as it is turned. A rainbow is produced when sunlight is dispersed by water
droplets through internal reflection.
8.5 Common materials
In the study of color imaging, it is always helpful to model the underlying physics as much as
we can. This often means that some basic understanding of material structures is required.
In this section we will give a brief description of the composition and structure of several
major classes of common materials: water, minerals, ceramics, metals, glass, polymers,
plants, and animals. Books on materials science (e.g., [217, 853]) can be consulted for
further details.
8.5.1 Water
A water molecule consists of two hydrogen atoms bonded to one oxygen atom. The oxygen–
hydrogen bond length is about 0.096 nm, and the angle between the two oxygen–hydrogen bonds is about 104.5◦. The attractive forces between water molecules, in both the liquid and solid
states, are mainly provided by hydrogen bonds. The positive charge on a hydrogen atom of
one water molecule is attracted to the negative charge on an oxygen atom of another water
molecule. The two hydrogen bonds and the two covalent (oxygen–hydrogen) bonds form an
approximate tetrahedron. This symmetry is found in ice crystals. However, in liquid water
at room temperature, a molecule oscillates within a surrounding cage of other molecules
for about 4 ps, before jumping out of this position and into an adjacent cage [217, p. 89].
Water molecules can spontaneously break up to form H+ and OH− at a rate of 2.5 × 10−5
per second, resulting in a concentration of H+ (and also OH− ) of about 10−7 mole per
liter. The hydrogen ion concentration is measured by pH values (− log10 moles per liter).
If the hydrogen ion concentration is higher than 10−7 mole per liter due to the presence of
other chemicals (i.e., when the pH value is lower than 7), the solution is considered acidic,
and otherwise, it is considered basic. Changing the pH value of a solution can change the
solution’s color dramatically. For example, the same molecule, cyanidin, is responsible for
the red of the poppy and the blue of the cornflower [43]. In the poppy, the sap is acid and the
cyanidin acquires a hydrogen ion and its color is red. In the cornflower, the sap is alkaline,
and the cyanidin loses a hydrogen ion and its color is blue!
The color of pure water is pale blue because the hydrogen bonds tend to drag and push
neighboring water molecules when they are vibrating, causing a slight absorption at the
long-wavelength end of the visible spectrum.
8.5.2 Metals
Metals consist of polycrystals. Each crystal is called a grain. The grain size is typically in
the range of 10 µm to 1 mm. Since each grain has a different orientation of its crystal planes,
a metal surface exhibits grain boundaries under microscope examination using polarized
light. Most metals reflect all wavelengths of visible light nonselectively and appear to be
silvery gray; exceptions are the red–yellow of copper and the yellow of gold. In both copper
and gold, electrons in the filled d band can be transferred to vacant s and p bands, resulting in absorption in the visible wavelength range.
8.5.3 Minerals
In terms of percentages of the whole by weight, about 45% of the earth is oxygen and
about 28% of it is silicon. The outer crust of the earth consists of three types of rock:
igneous, metamorphic, and sedimentary. Most rocks are formed from aggregates of mainly
minerals and a small amount of nonmineral substances. In a restricted sense, a mineral
is a homogeneous inorganic crystalline solid occurring in the earth’s crust, although the
term “mineral” is often used more loosely as anything that is not animal or plant. In our
discussion here, we will use the term in the restricted sense. Of all the earth’s minerals,
more than 90% are silicates which are composed of silicon and oxygen.
Most minerals have quite complex chemical compositions and their crystal structures
are also very complicated. Table 8.3 shows the composition of some common minerals.
Crystal size ranges from smaller than a millimeter to a few centimeters.
Mineral Composition
8.5.5 Glass
The word glass is usually used to refer to a certain type of material, such as window glass
made of silicate oxides. As our understanding of the common glass material increased, it
was realized that the key characteristic of glass is its lack of long-range (distance longer
than a few hundred atoms) order. Therefore, glass is also used to denote a state of solid
that does not have specific order. The chemical elements that form glass are the same as
those that form ceramics. Ceramics contain polycrystals, but glass does not. The structure
in glass lacks the periodic regularity of ceramics. Instead of a regular arrangement of atoms
repeated over a distance of a few thousand or more atoms as in a crystal, glass consists of the same basic building blocks (such as Al2 O3 ) randomly connected, a structure called a random network.
Glasses that are used in large quantities for windows, bottles, and other containers are
mixtures of oxides. They are the soda-lime–silica glasses, consisting of 70–75% silica
(SiO2 ), 13–17% sodium oxide (Na2 O), 5–10% lime (CaO), 1–5% magnesia (MgO), and
other oxides in minor quantities.
Two of the most significant optical properties of glasses are their transparency and
their dispersion power to visible light. The transparency is primarily due to the random
arrangement of atoms and the lack of long-range order, which causes the electrons to be
trapped in localized wave states. The electrons do not receive sufficient energy from visible
light to enable them to jump to the next energy state.
Colored glass is produced by adding a very small amount of impurity to the glass. For
example, Fe3+ ions give the glass a green color, while a small amount of copper gives it a
blue color. An element produces different colors in different glasses because the absorption
bands of the color center depend on the local electronic environment. Cobalt gives a pink
color to a borosilicate glass, a purple color to potash–silica glass, and a blue color to a soda-
lime–silica glass. These impurity color centers are usually the transition metals, which have
unfilled d-orbitals. Another way of producing color in glass is to have small aggregates of
impurity atoms, which have diameters comparable with or smaller than the wavelengths of
visible light. For example, the presence of gold colloidal particles of about 40 nm diameter
in lead-silicate glass gives the glass a beautiful red color.
8.5.6 Polymers
The word polymer means many (poly) parts (meros). Polymers are formed by combining
(chaining) together many smaller units. These smaller units are called monomers and most
polymers consist of repetitions of only a few kinds of monomers. For example, polyethylene
consists of only one monomer, –C2 H4 –. The elements associated with commercial polymers
are H, C, N, O, F, and Si. Most polymers are organic materials (containing carbon). Polymer
chains are produced by inducing chemical reactions between monomers so that they are
joined by chemical bonds in a long chain. The polymer molecules are so long (several
hundred nanometers) that it is difficult for them to form crystals. For example, single
crystals of polyethylene are difficult to grow. They tend to be produced as thin platelets
about 10 nm thick, with chains folded back and forth in a small volume.
Cellulose and amylose (starch) are the two most abundant biopolymers in plants. Both
have the same monomer, glucose, and their structural difference arises from the way the
glucose units are connected together.
Most commercial polymers are plastics and synthetic fibers, which are not colored.
In order to produce colored plastics, it is necessary to add color pigments in the manufac-
turing process. Color pigments are not soluble in the plastics or synthetic fibers, and they
have to be dispersed in appropriate carrier media which can be solid or liquid. Polyethylene,
polypropylene, polyvinyl chloride (PVC), and polystyrene are some of the major types of
plastic. Polyester, polyamide, acrylic, and polyolefin (olefin: hydrocarbons with a double
bond) are some of the major types of synthetic fiber.
Conventional textile dyeing uses water-soluble or dispersed dyes applied from an aque-
ous bath. Therefore dye molecules are adsorbed onto the fiber surfaces. Modern mass
coloration introduces colorants during or after the formation of the fiber-forming polymers.
Dye molecules or pigments are thus dispersed into the fiber filament, rather than adsorbed
onto its surface. Interface reflection from plastics and mass-colored synthetic fibers is mostly
nonselective, while the spectral composition of their body reflection is determined by the
pigments used to color the plastics. Therefore, reflection from plastics materials can be
described quite well by the dichromatic reflection model or, more specifically, by the NIR
model.
8.5.7 Plants
The green color of plant leaves comes from the pigment chlorophyll. There are several
forms of chlorophyll, each with its own characteristic peaks in its absorption spectrum. The
most abundant form is chlorophyll a. Chlorophyll is the photopigment that is responsible for
photosynthesis. In the green alga, Chlorella, the most effective light for photosynthesis is in
two wavelength ranges: long-wavelength light (650–680 nm) and short-wavelength (400–
460 nm) light. Therefore the reflected as well as the transmitted light consists of mainly
middle wavelength light. There are three major classes of photosynthesis pigments found
in plants and algae: the chlorophylls, the carotenoids, and the phycobilins (in some algae).
The carotenoids and the phycobilins are called accessory photosynthetic pigments because
the light energy absorbed by these pigments can be passed on to chlorophyll. The number of
carotenoid molecules is smaller than that of the chlorophyll (about 1:3). A dark green leaf
has more carotenoids than a light green leaf. Table 8.4 shows the peak wavelengths of the
absorption spectra of these pigments in organic solvents, such as ether. The carotenoids are
yellow or orange pigments found in most photosynthesizing leaf cells. In the fall when the
chlorophyll breaks down, the color of carotenoids and other pigments, such as anthocyanins
and flavonols, can be seen in the colorful leaves.
Carotene (C40 H56 ) and its derivatives are also found in many other plants, fruits, and
vegetables. They are wholly or partially responsible for the orange–yellow color in carrot, mango, persimmon, butter, etc. Lycopene is a molecule of carotene with its two terminal rings open. It is responsible for the red color of tomato. Carotene and lycopene are both present in and produce the color of apricots. The yellow color of corn is caused by carotene
and zeaxanthin (C40 H56 O2 ), which is carotene with an oxygen added at each end ring.
Flavonoids are a class of molecules that have a common structure as shown in Fig. 8.11,
with two benzene-like rings and another one in the middle that includes an oxygen. Most
of the flavonoids occur in combination with a sugar molecule. Flavonoids are present in
leaves and flower petals. They absorb ultraviolet light to protect plant DNAs. Substitution
of the hydrogen atoms on the rings produces various classes of flavonoid, such as quercetin
(the color of Dyer’s oak), peonidin (the colors of peony, cherry, and grape), pelargonidin
(the color of red geranium), cyanidin (the colors of raspberry, apple, and grape), and many
other red, purple, and blue colors in plants [43, Chapter 5].
8.5.8 Animals
The external, visible surfaces of animals are covered with exoskeleton (insect, crab, and
scorpion), feather (bird), hair/wool (sheep, monkey), or skin (human). The structural mate-
rial of the flexible inner parts of the exoskeletons of insects and arthropods is chitin, which
is the same as cellulose except that an –OH group on each glucose unit is replaced by an
–NH(CO)CH3 group. The carapaces of crabs, lobsters, and shrimps contain astaxanthin
(C40 H56 O4 ), which is liberated from its protein chains when cooked. Astaxanthin is related
to carotene with two oxygens added at each of the two end rings, resulting in a pink–red
color.
Animal hair and wool consist of α-keratin, which is a protein formed by three right-
handed polypeptide α-helices that wrap around each other in a left-handed coil like a
triple-stranded rope. Nine of these coils surround two more in the center and form a mi-
crofibril of eleven coils. A hair or wool cell consists of a stack of macrofibrils, each of which
is a bundle of hundreds of microfibrils. The color of hair or wool is caused by the pigments
(mostly melanins) in the hair or wool cells.
In the higher animals, collagens are the main structural molecules. Taking the human
body as an example, collagens form the basic structures of tendon, skin, bone, teeth, blood
vessel, cartilage, and even the cornea of the eye. Collagen fibers in the dermis of human
skin are one of the main structures that scatter light back from the skin, as will be discussed
shortly.
Melanins are one of the most common natural pigments in animals. They are responsible
for the color of skin, feathers, hair, etc. Sometimes, melanins form structures that produce
interference colors, such as those found in peacock feathers. Natural melanins consist of a
protein (peptide) portion and a chromophoric polymer portion. The monomers that form
the chromophoric polymer are different for different melanins. For example, eumelanin
contains single nitrogen monomers such as indoles, indolines, pyrroles, and others, while
pheomelanin contains monomers that have two nitrogen atoms.
8.5.9 Humans
People are often the most important subjects in color images. The color reproduction of a
closed imaging system (meaning one that uses a known sensor and a known display medium,
such as an AgX photographic system from film to paper) is often fine-tuned so as to insure
a good reproduction of skin color. Therefore, we will discuss the origin and distribution of
skin colors in some detail. We will also touch on the color of eyes and hair because they
may be important for computer algorithms that attempt to detect and locate faces in color
images.
Skin color
The human skin consists of a series of layers. The top layer is the epidermis, which is about
0.10–0.15 mm thick. It is composed of two types of cell: keratinocytes and melanocytes
(which manufacture the pigment melanin). The epidermis consists of four layers. From the
top down these are stratum corneum (dead keratinized cells), stratum granulosum (several
layers of flat cells, partially keratinized), stratum spinosum (several layers of irregular
polyhedral cells), and the basal layer (a single layer of keratinocytes and melanocytes
interspersed, with melanin pigments present in both types of cell). Through cell divisions,
new keratinocytes are formed in the basal layer and they are shifted upward and gradually
keratinized. The top stratum corneum consists of dead cells that are constantly shed. Next
to the epidermis is the dermis layer which is a connective tissue with collagen, elastic, and
reticular fibers. Within this layer, blood and lymphatic vessels form complex networks. Also
present are sweat and sebaceous glands, hair follicles, hairs, nerves, and minute muscles.
Under the dermis layer are subcutaneous fat and muscle.
Light impinging on the skin propagates through these layers, and is partially absorbed
and scattered by each layer. The absorption of light is mainly due to the various pigments
present in the skin. There are several types of pigment in the human skin: melanin and
its derivatives (present in the epidermis), bilirubin (reddish-yellow pigments present in
the blood), β-carotene (present in the top layer of epidermis, in the blood, and in the
subcutaneous fat), and hemoglobin (in the blood). In the epidermis of human skin, melanin
(in the form of discrete granules, called melanosomes) is the dominant light absorber.
For Caucasian skins, there is little melanin present in the epidermis, except in the basal
melanocytes. For darker skins, the number and the size of melanin granules increase and
they are present in keratinocytes as well. This accounts for the major difference in skin
color among different races. In the dermis, the dominant absorbers of the visible light
are oxygenated and deoxygenated hemoglobin [836]. Female skin has less melanin and
blood, but more carotenes, and therefore, appears 3–4% lighter than male skin [452, p. 79].
The indices of refraction for most soft tissues are in the 1.38–1.41 range, except for the
adipose (fat) tissue which has a refractive index of about 1.46. Secretions from the sweat
and sebaceous glands are shown to be quite neutral in their spectral absorption. Therefore,
the sweat and sebum that cover the skin surface typically contribute an NIR component to
the total reflected light from the skin. Light scattering in skin is caused mostly by collagen
fiber bundles and red blood cells, both being much larger than the visible wavelength.
The evolution of human skin coloration depends strongly on the distribution of
ultraviolet radiation on the earth [68, 452]. The color of human skin varies from race to
race and from individual to individual. It also varies within an individual, e.g., the forearm
is darker and less reddish than the forehead. For example, in one case study of Caucasian
skins using the CIELAB space, the skin color of the forearm of an average white male is
L∗ = 69.9, a∗ = 7.9, and b∗ = 11.6, while that of the forehead is L∗ = 66.3, a∗ = 11.2,
and b∗ = 12.3 [246]. The color difference within an individual is due to exposure to sunlight
irradiation as well as the inherent pigment distribution [274, 275, 797]. Figure 8.12 shows
the effect of exposure to sunlight on the skin spectral reflectance. The sunlight irradiation
makes white skin become darker (e.g., reflectance drops from 0.48 to 0.32).
Figure 8.12. The effect of exposure to sunlight on skin color.
Figure 8.13. The spectral reflectances of different skins (from Edwards and Duntley, American Journal of Anatomy, 1939).
Two types of
skin reaction occur after exposure to sun radiation: an immediate erythema and a persistent
tanning. Increased blood flow in erythema makes the skin redder (and also darker). After
a few days, the red shift gradually disappears. In a study reported by Edwards [275], the
chromaticity change is mainly in dominant wavelength (e.g., from 580 nm to 592 nm relative
to CIE Illuminant C), and only slightly in excitation purity (e.g., from 20.2% to 20.9%).
The effect of exposure to sunlight is less on darker skin. The most significant difference
in skin color is across the different races. Figure 8.13 shows six samples of skin spectral
reflectance curves measured from the buttocks of people with varying pigmentation [274].
Since the buttock is the area least exposed to sunlight, the measured difference represents
purely the difference in race. From this set of data, we can calculate the CIE 1931 (Y, x, y)
for skins of different racial types illuminated by CIE daylight illuminants with correlated
color temperatures of 4000 K and 6500 K. The results are shown in Table 8.5. The major
difference is mainly in the lightness (reflectance), ranging from 7.37% in negro to 45.26%
in white blond. Figure 8.14 plots the chromaticity loci of the different skins under D65.
Figure 8.14. Chromaticity loci of the different skin types (white blond, white brunet, Japanese, Hindu, mulatto, negro) relative to D65.
Hair color
The portion of a hair external to the skin surface consists of tightly packed dead keratinized
cells. Under an electron microscope, the surface of a hair fiber is covered with overlapping
scales [793]. The root of the hair (the hair bulb) contains melanocytes that synthesize
melanins and transfer them to the neighboring keratinocytes, which are then pushed outward
to form the hair shaft. Hair color is determined by the absorption, reflection (external and
internal), and scattering of incident light [155, 896], and these processes are dependent
Eye color
As far as imaging is concerned, there are two regions in the eye where colors are of immediate
interest: the iris and the pupil. The iris consists of the anterior border layer, the stroma, and
the posterior, double epithelial layers [962, p. 39]. The posterior epithelial layers contain a
large amount of melanin pigment, with the amount not varying much for eyes of different
colors [797, p. 75]. In blue eyes, the anterior layer and the stroma contain very little melanin
pigment, and the incident short wavelength light is scattered back, while middle and long
wavelength light is absorbed by the posterior heavily pigmented layers [693, p. 240]. If there
are also yellowish pigments to absorb the short wavelength light, the iris then looks green.
If the anterior layer and the stroma contain a large amount of melanin pigment, then the
iris looks black. If the amount of melanin is less, then the iris looks brown.
The light that is reflected back through the pupil of the eye has several components, one
from the front surface of the retina, one from the pigment epithelium layer behind the retina,
and one from the choroid tissue (richly vascular) behind the pigment epithelium. The first
two components are not strongly colored. The reflection component from the choroid tissue
represents the “fundus” color that one sees when looking through the pupil. The fundus color,
as seen by ophthalmologists, is closely related to the skin pigmentation, being orange in
Caucasoids and dark-brown in negroids [797, p. 75]. In typical photographs, the color is most
distinctly red (the red-eye problem). However, the red-eye problem is rarely seen in the pho-
tographs of Mongoloid or negroid people, presumably because the red reflection from hemoglobin
is mostly masked by the melanin pigments (in the pigment epithelium and the choroid).
Figure 8.15. The molecular structure of a copper phthalocyanine pigment (a central Cu²⁺ ion coordinated by nitrogen atoms).
Both inorganic and organic pigments are widely used, with inorganic pigments being
more common. Inorganic pigments are usually oxides, sulfides, silicates, sulfates, and car-
bonates. The pigment that is used in the largest quantity is titanium dioxide, which is
produced at a particle diameter of about 200 nm to give effective light scattering. Tita-
nium dioxide (TiO2 ) is used as a white pigment in paints, plastics, inks, paper, cosmetics,
foodstuffs, and pharmaceuticals, etc. Pure titanium dioxide crystals are colorless. It imparts
whiteness by scattering, due to its very high index of refraction (n = 2.71 for the rutile
crystal form). The yellow and red pigments (such as iron oxides) come next in the quantity
of consumption. There is a large variety of organic pigments. Some of them have been used
for many decades, such as monoazo pigments, disazo pigments, and copper phthalocyanine
pigments (see Fig. 8.15). More recent pigments are mainly of two classes: azo and poly-
cyclic pigments. Azo pigments constitute about 50% of the overall world volume of organic
pigments, with polycyclic pigments and copper phthalocyanine pigments constituting about
25% each. High degrees of opacity cannot usually be achieved by organic pigments alone,
because their indices of refraction are not very different from that of the organic polymer
matrix in the binder.
8.5.11 Paints
A large number of man-made objects have painted surfaces. These surfaces show up very
frequently in color images. It is therefore useful to understand the material and optical
properties of paints. The major function of a paint is to coat the surface of objects. Depending
on the intended use of the surface material to be coated, the ingredients in the paint vary
significantly. Therefore, there is no such thing as a “universal paint”. A paint intended for
a wood surface is quite different from one for an automobile finish.
Paint is a colloidal system because it contains solids (pigment particles) dispersed in
liquid (the polymer solution known as the binder). In an oil-based paint, the polymer is
dissolved in solvent (oil), and therefore it is called a polymer solution. However, in a
latex paint, the polymer itself is also in the form of small particles suspended in water. A
typical paint composition consists of about 33% pigments (including filler or extender), 24%
polymer (film-former), 41.8% solvent or water, and 1.3% additives (such as driers, anti-
skinning, anti-settling, plasticizers, dispersants, etc.) by weight [261]. Pigments are mainly
used to introduce color to the polymer matrix (which is colorless) by scattering and/or
absorption. Fillers (also known as extenders) are used to fill the polymer matrix. They are
also colorless. The index of refraction of a filler is selected to be close to the polymer
matrix used, so that they do not participate in light scattering. Many different polymers are
used as the binder (film-former), for example, alkyds, polyesters, vinyls, acrylics, epoxides,
polyurethanes, and cellulosics. When the polymer solution dries, the solvent evaporates.
The polymer matrix is formed either by chemical reactions, producing cross-linking, or by
physical packing together. Therefore, the choice of polymer determines the properties of
the paint film. Since the polymer binder of paint is colorless, the spectral composition of
light reflected from a painted surface is often very well described by the NIR model.
Two factors determine the color of a paint: absorption and scattering. Both factors are
dependent on the pigment particle size and its distribution. The term particle size is usually
applied only to sphere-like particles, because not all pigment particles have similar shapes.
Most of them (such as titanium dioxide) are more spherical in shape, but some are more
needle-like (such as iron oxide yellow) and others are flake-like (such as some metallic
pigments). When the particles are elongated in shape, it is the minor dimension that matters
most. Roughly speaking, light absorption is inversely proportional to the particle size, d, because
the number of particles per unit volume is proportional to d⁻³ and the cross-section of a particle is
proportional to d². Light scattering by a particle depends on its size relative to the wavelength
of the light. The optimum scattering size depends on its concentration, but in general, it is
about half of the wavelength. Therefore, most mean particle sizes of pigments are about 200–
500 nm. The higher the scattering power a pigment has, the more opaque the surface will be.
8.5.12 Paper
Paper is produced from various kinds of fibers, some synthetic, some cotton, but mostly
derived from wood. The bark is removed from the trunk of a tree and the remaining wood
is made into chips. Woodchips can be converted into paper pulp in two ways: (1) mechan-
ical grinding, and (2) chemical treatment. Wood contains about 50% cellulose fibers that
are bound by a polymer, a phenolic compound called lignin (about 25% of the wood con-
tent). The other 25% is hemicellulose (polysaccharide formed from many different sugar
molecules with extensive branching). The lignin contained in the wood turns yellow and has
to be removed from the paper pulp by chemical reactions. Because of its inherent yellow-
ness, paper brightness is defined as its reflectance at a wavelength of 457 nm. To increase the
brightness, paper pulp is often bleached and TiO2 or fluorescent dyes are added. Newsprint
paper has a reflectance of about 60%, while many white printing papers have a reflectance
of about 70–80%, with some very bright papers reaching about 90%.
Both internal and surface sizing (to cover or stiffen with glutinous material to fill the
pores in surfaces) are used in paper making. Rosin (from pine trees) is often added to the
fibers for internal sizing. In the drying process, the tiny rosin globules are melted or fused
together. Surface sizing (for example, with starch) is applied to the surface of paper to seal
the surface fibers, and thereby, increase its strength.
A paper making machine (such as a fourdrinier machine) uses a fine screen (called a
wire) to spread out the fibers to give a flat surface. The side of the paper that touches
the wire is called the wire side, and the other side is called the felt side. These two sides
have different mechanical properties. The fibers on the wire side are more aligned with the
machine coating direction than those on the felt side. For some printing applications, it may
be important to know which is the wire side and which is the felt side, because the paper
feeder mechanism may work better on one side of the paper than the other side. More recent
paper making machines (such as the twin-wire machines) are often designed to reduce the
difference between the two sides of a paper.
The basis weight of paper used in the USA is the weight in pounds of 500 sheets. However,
sheets do not all have the same basic size, so a better basis weight is grams per square meter. For example, a
20 lb bond paper (75.2 g m⁻²) is only a little heavier than a 50 lb offset paper (74.0 g m⁻²),
because the basic size of the bond paper is 17 in by 22 in, while that of the offset paper is 25 in by 38 in.
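The conversion behind these numbers is a short calculation; the sketch below (an illustrative example of our own, using only the standard inch and pound conversion factors) reproduces the quoted values from the sheet sizes and pound weights given above.

    # Convert US basis weight (pounds per 500 sheets) to grams per square meter.
    LB_TO_G = 453.59237          # grams per pound
    IN_TO_M = 0.0254             # meters per inch

    def basis_weight_gsm(pounds, width_in, height_in, sheets=500):
        # Weight in g/m^2 of a paper whose 'sheets' sheets of the given size weigh 'pounds' lb.
        area_m2 = sheets * (width_in * IN_TO_M) * (height_in * IN_TO_M)
        return pounds * LB_TO_G / area_m2

    print(round(basis_weight_gsm(20, 17, 22), 1))    # bond paper:   75.2 g/m^2
    print(round(basis_weight_gsm(50, 25, 38), 1))    # offset paper: 74.0 g/m^2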
The surface structural fibers of an uncoated paper are about 20–40 µm wide. When
coated with pigment particles, such as clay (kaolin) or chalk (calcium carbonate) of a mean
size about 2 µm, the paper surface becomes much smoother and brighter. The adhesive used
to bind the pigments together is also an important factor in the properties of paper. Examples
of adhesive are starch derivative, protein derivative, cellulose derivative, polyvinyl alcohol,
and polymers in latex form, such as styrene–butadiene copolymers. The structure of paper
coatings is complicated because of the many ingredients and the various drying processes
[586]. The print quality on a coated paper is much higher and more consistent than on an
uncoated paper. Lightweight coated papers are typically coated with 4–7 g m⁻² of coating
on each side. Heavier-weight coated paper has about 12–15 g m⁻² of coating on each side.
Typical coating mixes have several ingredients, such as pigments, binders, modifiers, and
other additives. In paints, pigments are distributed in the binder polymer matrix and light
scattering is produced at the pigment/polymer interfaces. On a coated paper, the situation is
quite different. The amount of binder used is relatively small and light scattering occurs at
the air/pigment interfaces. It is the size of the air holes that has to be controlled on a coated
paper, rather than the size of the pigment particles.
Uncoated paper is often somewhat acidic because of the rosin and the aluminum sulfate
used for internal sizing. The sodium silicate used to disperse the pigments in the starch for
coating makes coated paper slightly alkaline. Since acids often depolymerize cellulose and
thus weaken the paper, coated paper or acid-free paper lasts much longer. The drying time
for inks also decreases as the paper or the coating becomes more alkaline. In addition to
the brighter surface, coated paper also makes inks appear more saturated than the uncoated
paper. Therefore, color images printed on coated paper look much better than those on
uncoated paper.
used. Typical drying processes are evaporation, absorption, oxidative polymerization, and
radiation curing. For example, in evaporation drying, the solvent evaporates as the ink is
dried, leaving the pigment particles bound by the polymer matrix (the resin). There are five
major types of ink: liquid, paste, news, screen, and radiation-curable. Inks are formulated
differently for different applications. For example, liquid inks (low viscosity) are used for
gravure and flexo printing, while paste inks are used for lithography and letter press printing.
Low-viscosity nitrocellulose is a very commonly used resin in solvent-based liquid inks.
Other varnishes, such as polyamides, urethane, maleic, and acrylic varnishes, are often
used in mixtures for printing inks. Table 8.6 (from [279, p. 226]) shows some representative
components in inks for various printing applications.
Many varieties of pigment are used in printing inks. Most of them can be used in different
types of printing. Carbon black (mostly in the form of graphite) is the most used black
pigment in printing inks. Colored pigments are mostly organic because of their high color
strength. The pigments used in printing ink for white surfaces have to be small (15–25 nm
in the minor dimension) and transparent. Because the light has to go through a few thin
ink layers (about 1–3 µm) and be reflected from the white paper substrate, scattering by the
pigment particles (and hence opacity) is not a desired feature as it is in the paint application.
For printing on a nonwhite surface (such as brown cartons), ink opacity is also important. In
that case, opaque pigments (such as TiO2 ) have to be added. Inorganic pigments are usually
opaque because their indices of refraction are quite different from that of the organic varnish
that carries them. Inkjet printing often uses inks that are solutions of dyes and they dry by
absorption into the paper substrate. There is also a trend to use pigments in inkjet inks
to improve lightfastness and reduce bleeding on the paper.
In multicolor press, inks of different colors are printed on top of each other. Two physical
properties are important in this type of printing: tack and trapping. Tack is the resistance of
a thin film of ink to breaking up. If the tack of an ink is too low, the half-tone dot will not be
sharp. If it is too high, then it may cause picking of paper or pulling of the coating from the
coated paper. Trapping is the transfer of ink from one surface to another. Since a high-tack
ink on the paper will trap a lower-tack ink from the blanket, in a multicolor press the first-
down ink needs to have the highest tack, and the succeeding inks have less and less tack.
Table 8.6. Some representative ink compositions for various printing applications
(1) binder: rosin ester, long-oil alkyd, hydrocarbon resin, waxes; solvent: hydrocarbon oil, vegetable oil; products: publications.
(2) binder: long-oil alkyd, phenolic resin, hydrocarbon resin, drier, waxes; solvent: hydrocarbon oil, vegetable oil; products: general commercial.
(3) binder: long-oil alkyd, phenolic resin, hydrocarbon resin, drier, waxes; solvent: hydrocarbon oil, vegetable oil; products: general commercial.
(4) binder: polyamide, nitrocellulose, shellac, acrylic resin; solvent: alcohol, water, glycol ether; products: packaging.
(5) binder: rosin ester, metallated rosin, cellulose ester, hydrocarbon resin; solvent: ester/ketone, toluene, aliphatic hydrocarbon; products: packaging/publication.
(6) binder: long-oil alkyd, epoxy resin, nitrocellulose, cellulose ester; solvent: hydrocarbon, alcohol, ester/ketone; products: textiles/posters/signs.
Therefore, a printing process that uses the same inks, but different ink-down sequences, will
produce different colors. The other cause of color difference due to ink-down sequence is
that inks are not completely transparent. Therefore, the ink-down sequence is also part of the
specification for a multicolor press. The most commonly used color sequence seems to be
cyan, magenta, and yellow (C–M–Y) with the black before, after, or in between [142, p. 64].
It is often argued [636] that vision is not possible if the physical world does not have enough
regularity, because visual tasks are ambiguous and the underlying inverse problems are ill
posed [765]. Although certain regularities can be found by experiment [308, 416, 879],
the source of any given regularity of natural scenes can be difficult to identify [48, 813].
Most regularities of natural scenes discovered so far are statistical and therefore cannot
be applied to any individual scene. However, many practical problems (e.g., the color and
density balance problems in image printing and display) do not currently have a reliable,
deterministic solution. Most color image processing algorithms today rely on some type
of Bayesian estimation, and prior statistics are used either explicitly or implicitly. It is thus
useful for us to learn about some of the regularities of natural scenes. The important thing
to keep in mind here is that, before the cause or the source of a regularity is identified, we
should use it with great care. It is always a good practice to build in some error checking
steps in the algorithms when using statistical assumptions.
The standard deviations of L, s, and t are 0.273, 0.065, and 0.030 for an image database of
2697 images. The composition of the image database is fairly close to a typical consumer
image population: (1) about 60% are indoor scenes and 40% outdoor scenes; (2) about 70%
have faces in them; (3) about 28% have sky in them; (4) about 43% have visible green
vegetation; and (5) about 23% have sky and green vegetation. From these statistics, it seems
that the integrated color of a natural scene is reasonably close to neutral most of the time.
2 If the estimated luminance range is affected by the camera flare, one would not be able to find a scene
luminance range as wide as the data show.
the power spectra are said to be scale-invariant. It should be noted that this is true only at the
high frequency end [548] and it is not isotropic in all directions. Because the earth’s gravity
dictates how things should grow or be built, the horizontal and the vertical frequencies tend
to have higher energy contents. In a carpentered environment,3 it is also found that there
is more spatial frequency energy in 1–25 cycles per degree along the horizontal and the
vertical directions than other directions [929].
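As an illustration of how such a regularity can be checked in practice, the following sketch (our own, assuming a grayscale image is available as a 2-D numpy array named img) computes a radially averaged power spectrum; a log-log slope near -2 at the higher frequencies corresponds to the approximately scale-invariant behavior described above.

    import numpy as np

    def radial_power_spectrum(img):
        # Radially averaged power spectrum of a 2-D grayscale image (rough sketch).
        img = img - img.mean()
        P = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
        h, w = img.shape
        y, x = np.indices((h, w))
        r = np.hypot(y - h // 2, x - w // 2).astype(int)
        sums = np.bincount(r.ravel(), weights=P.ravel())
        counts = np.bincount(r.ravel())
        power = sums / np.maximum(counts, 1)      # average power in each annulus
        return np.arange(power.size), power

    # Example usage on an image array img:
    # freq, power = radial_power_spectrum(img)
    # keep = (freq > 10) & (freq < min(img.shape) // 2)
    # slope = np.polyfit(np.log(freq[keep]), np.log(power[keep]), 1)[0]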
8.7 Problems
8.1 Why does the color of a piece of dyed fabric look more saturated when it is wet?
8.2 Why does the sky look red during sunset?
8.3 The BRDF for a perfect Lambertian reflector is f_r = dL_r/dE_i = 1/π. A Lambertian
sphere when illuminated by a collimated light source (all rays are parallel) does not
appear as a uniform disk because different areas on the sphere are oriented at different
angles from the light source and the surface irradiance E_i falls off as cos θ_i, where
θ_i is the angle between the surface normal and the incident light beam. A full moon
looks like a uniform bright disk (neglecting the somewhat random dark spots for
the moment). Therefore, the moon surface is not a Lambertian reflector. Since the
distance from the moon to the sun is much larger than the radius of the moon, we can
assume that the moon is illuminated by parallel light rays coming from the sun. Let
us further assume that the moon is a uniform, smooth spherical surface which reflects
light as a function of the vertical angle, θ_r, but independent of the horizontal (azimuth)
angle, φ_r. (The vertical angle, θ_r, is the angle between the reflected ray and the surface
normal.) If the reflectance of the lunar surface is ρ, what is its BRDF, f_r(θ_r)?
8.4 Equation (8.55) shows the relationship between the internal transmittance T_i of a
transparent film and the externally measurable reflectance factor L_r/L. Typically, r_1
depends on the film material. Let us assume that r_1 = 0.04. If the film is placed
in contact with a perfect, reflecting white base (i.e., r_2 = 1.0), plot the reflection
density D_r = −log₁₀(L_r/L) as a function of the (internal) transmission density
D_t = −log₁₀ T_i. What is the slope when D_t is large?
8.5 Computer detection and recognition of human faces are an important area of research
with many possible applications. One of the key features in such a task is the skin
color. Based on what we described in this chapter, list the factors that need to be
considered when designing a skin-color detection program. For example, what color
space would you use and why?
3 A carpentered environment is an indoor environment in which there are plenty of wooden structures (window
frames, door frames, cabinets, desks, chairs, etc.). It can also refer to an outdoor environment where wooden
structures are abundant.
9 Optical image formation
Imaging is a mapping from some properties of the physical world (object space) into an-
other representation of those properties (image space). The mapping can be carried out
by changing the propagation of various types of physical signals. For example, medical
ultrasound imaging is the mapping of the acoustic properties of the body tissue into their
representation in the transmitted or reflected intensity of the acoustic field. The mapping is
carried out by the absorption, scattering, and transmission of the acoustic energy. Optical
imaging, the formation of an optical representation separate from the original objects, is a
mapping carried out mostly by changing the directions of the electromagnetic waves coming
from the objects. Insofar as light can be treated as rays, the spatial mapping from a point
in the object space to a point in the image space can be studied geometrically. This field is
called the geometrical theory of optical imaging. Situations arise when the wave nature of
the light has to be dealt with explicitly. This field is called the physical (or wave) theory of
optical imaging. Of course, there are other cases where the quantum nature of the light is
the dominant characteristic to be considered.
In this and the next chapter we will study only the basic concepts and processes of
optical imaging. The three main subjects to be studied are geometric optics, physical optics,
and the radiometry of imaging. Since the literature on these subjects is voluminous and
the underlying principles and theories are very well presented in many books on physics,
electromagnetics, and optics [51, 124, 352, 385, 453, 510, 657, 733, 1021], our approach
is to build upon the fundamental results that can be derived from first principles such as
the Maxwell equations and concentrate on developing the concepts and relations in optical
imaging that will be useful in engineering applications.
As we have already discussed, our understanding of the nature of light and its interaction
with matter has changed many times in history. The current theory is known as quantum
electrodynamics, which describes very accurately both the wave-like character of electro-
magnetic radiation, such as in diffraction, and the discrete character of light, such as in
quantum light absorption. Electromagnetic theory based on the Maxwell equations can be
regarded as a special case in which the quantum effect can be neglected. Maxwell’s theory
describes the wave-like character very well for electromagnetic waves with the wide range
of wavelength of 10⁻⁹–10³ m. The wavelength of visible light is less than a micron. Since
the object sizes we are dealing with in an optical imaging system are often many orders of
magnitude larger than that, the laws of electromagnetic wave propagation can often be very
well approximated by completely neglecting the finiteness of the wavelength. In practice,
this simplifies a lot of computations and produces very adequate results. The limiting case
of making λ → 0 allows us to treat light as rays following certain geometric laws, and the
optics so developed is called geometrical optics. The basic assumption of geometric optics
is that the magnitudes of the changes in the electric and magnetic fields are much smaller
than the magnitudes of the two fields over a distance of the order of the wavelength. This
assumption is obviously not true where there is strong spatial or temporal discontinuity, such
as around the occlusion boundary of an aperture. The other important case where geometric
optics is not quite adequate is at the point of focus. For these cases, diffraction theory has
to be used and they are considered later in the discussion of physical optics. However, these
are exceptions, not rules. For most optical problems, geometric optics provides at least a
very good starting point for more refined and in-depth analysis.
Many of the results in optics have been derived under further simplification, which leads
to linear optics and Gaussian optics. Linear optics is an approximation of geometrical optics.
It assumes that all angular quantities are so small that only the linear terms are required
for the calculation, e.g., sin θ ≈ θ, tan θ ≈ θ, and cos θ ≈ 1. Gaussian optics deals with a
special case of linear optics, where all optical surfaces are rotationally symmetric about a
central axis (the optical axis). Since most imaging systems are designed to be as close to
rotational symmetry as cost permits, we will develop the basic mathematical descriptions
of optical imaging in terms of Gaussian optics. Wherever it is necessary, we will provide
more accurate descriptions based on more general geometrical optics or physical optics.
In geometrical optics, light is treated as rays that are infinitely small compared with all
the components in the optical system. Obviously, light cannot propagate as a single ray,
otherwise many optical phenomena, such as interference and diffraction, could not occur.
However, from the Maxwell equations, it is possible to derive Fermat’s principle, Snell’s
law, the eikonal equation, and the ray equation, which form the basis of geometrical optics
[510, 608]. Here we follow the arguments used by Sommerfeld and Runge [510, p. 11]
and Born and Wolf [124] to show that the constant-phase surface of an electromagnetic
(EM) wave can be described by the eikonal equation. The optical rays represent the normals
of the phase surface. As pointed out by Kline and Kay [510, Chapter 1], the arguments
are not completely satisfactory. Our purpose in presenting it here is only to provide some
hints for how the Maxwell equations can be linked to geometrical optics. This is based on
the belief that the former is more basic than the latter. On the other hand, we can take the
completely opposite position that we accept the basic laws in geometrical optics and try to
derive the Maxwell equations. It was shown by Stavroudis [899, Chapter XI] that we can
indeed come very close to deriving Maxwell’s electromagnetic equations from the basic
laws of geometrical optics. Interested readers should consult the books by Kline and Kay
[510], Stavroudis [899], and Luneburg [608].
An arbitrary complex time function of the EM field can be decomposed into Fourier
components of time harmonics. Therefore it is useful to study first the simple case of a
general time-harmonic field.
In the regions free of currents and charges, the vectors E_0 and H_0 will satisfy the time-free
Maxwell equations. By defining k_0 = 2π/λ_0, where λ_0 is the wavelength in vacuum, we
may represent the fields many wavelengths away from the source by the following general
types of fields:
Under the assumptions that λ_0 → 0 and the terms that contain the factor of 1/k_0 can be
neglected, one can derive the following equation from Maxwell's equations [124, p. 112]:
∇ψ · ∇ψ = (∂ψ/∂x)² + (∂ψ/∂y)² + (∂ψ/∂z)² = n²(x, y, z),    (9.5)
where n is the index of refraction. The function ψ is called the eikonal, and the equation is
called the eikonal equation. The surfaces ψ = constant are where the phases of the fields
are constants, and are called the geometrical wavefronts. The energy of the EM wave can
be shown to propagate, with a velocity of v = c/n, in the direction of the surface normal
of these wavefronts. Geometrical light rays are thus defined as the orthogonal trajectories
to the geometrical wavefronts. The electric and magnetic vectors are orthogonal to the rays
at every point.
Let r(s) denote the position vector of a point on a light ray, and s the arc length of the ray,
then dr/ds is a unit vector pointing to the direction of the light ray. The eikonal equation
can then be written as
n dr/ds = ∇ψ.    (9.6)
Since the distance between the two neighboring wavefronts, dψ, can be expressed as
dψ = dr · ∇ψ = n ds,    (9.7)
the integral ∫_{P_1}^{P_2} n ds, taken along a curve from point P_1 to point P_2, is thus called the
optical path length between the points. The shortest distance between two points may not
be the shortest optical path between them because light travels slower in the material of
higher index of refraction. In many cases, a light ray travels along the path of the shortest
optical length (or the path that takes the minimum time). However, this is not always true.
It can be shown [682, p. 11] that a light ray always travels along the path that has a zero
derivative with respect to time or with respect to optical path length (meaning that it can be
a minimum, a maximum or an inflection point). This is called Fermat’s principle.
One of the consequences of the ray being the gradient vector of a scalar field is that if
the ray vector is operated on by a curl operator, the result is zero (∇ × ∇ψ = 0). This leads
to the proof of Snell’s law [124, p. 125] which says that the incident ray, the refracted ray,
and the surface normal are all on the same plane. Furthermore, the ratio of the sine of the
refracted angle to the sine of the incident angle is equal to the ratio of the refractive index of
the incident medium to that of the refracted medium. In optical imaging, Snell’s law allows
us to trace rays through the various refracting surfaces in the optical system and many lenses
are designed with computer programs that do ray tracing based on Snell's law.
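A minimal numerical illustration of the scalar form of Snell's law just stated (the function below is an illustrative sketch of our own, not code from the text):

    import numpy as np

    def refraction_angle(theta_i, n_i, n_t):
        # Snell's law: n_i * sin(theta_i) = n_t * sin(theta_t); angles in radians.
        s = n_i * np.sin(theta_i) / n_t
        if abs(s) > 1.0:
            return None                 # total internal reflection: no refracted ray
        return np.arcsin(s)

    # Air (n = 1.0) into glass (n = 1.5) at 30 degrees incidence: about 19.5 degrees.
    print(np.degrees(refraction_angle(np.radians(30.0), 1.0, 1.5)))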
Geometry, in a broad sense, is the study of properties and relations of given elements, such
as points, lines, curves, and surfaces, that remain invariant under specified transformations.
Modern geometry [135, 394] adopts the definition, proposed by Christian Felix Klein (1849–
1925), that geometry consists of a set of elements and a transformation group associated
with the set. Here the term group is a mathematical entity that has a set with a binary
operation defined on the set, satisfying closure, identity, inverse, and associativity. For
our present purpose, the transformations are linear and represented by matrices and the
binary operation simply means matrix multiplication. According to this view, n-dimensional
projective geometry (S,T ) consists of a set, S, of elements in (n + 1)-dimensional space
(excluding the center of projection) and all the invertible (n + 1) × (n + 1) matrices as the
transformation group, T. An element in the set, S, is called an image point, which is in fact
a space line going through the origin. By convention, the origin is taken as the center of
projection. Note that it is not necessary to specify any image plane in this definition. This
avoids the awkward problem associated with the space points lying on the plane that passes
through the center of projection in a direction parallel to the image plane. Those points are
not mapped to any points on the image plane if one is assigned.
Let us look at the case of two-dimensional projective geometry. Let (x_i, y_i, z_i) be an
image point in S. By definition, all space points (kx_i, ky_i, kz_i), k ≠ 0, also belong to the
same image point (x_i, y_i, z_i). Recall that an image point is actually a space line passing
through the origin (0, 0, 0). Any space plane that contains the space line can be described
as ax_i + by_i + cz_i = 0, where a, b, and c are real numbers and a² + b² + c² ≠ 0. The
coordinates of the image point (x_i, y_i, z_i) are called the homogeneous coordinates, because
the equation ax_i + by_i + cz_i = 0 is a homogeneous equation. When an image plane is
chosen, say at z = f, then the space line specified by (kx_i, ky_i, kz_i) intersects with the
image plane at (f x_i/z_i, f y_i/z_i, f). Since all points on the image plane have the same
z coordinate, we can use the inhomogeneous coordinates, (f x_i/z_i, f y_i/z_i), as the two-
dimensional coordinates of the image point (x_i, y_i, z_i). This reduction in dimensions is the
basic nature of projective geometry in which a space line is identified with an image point.
That is why we call this two-dimensional (instead of three-dimensional) projective geometry
because the image points can in fact be specified with two-dimensional coordinates if an
image plane is chosen. Thus, one advantage of using the homogeneous coordinates is that
we do not have to specify an image plane.
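The reduction from homogeneous to inhomogeneous coordinates can be sketched in a few lines (our own illustration; the image plane z = f and the function name are assumptions made for the example):

    import numpy as np

    def to_inhomogeneous(point, f):
        # Map an image point with homogeneous coordinates (x_i, y_i, z_i) to its
        # two-dimensional coordinates on the image plane z = f.
        x, y, z = point
        if z == 0:
            raise ValueError("the point lies on the plane through the center of projection")
        return np.array([f * x / z, f * y / z])

    # All scalings k*(x_i, y_i, z_i), k != 0, give the same two-dimensional point:
    print(to_inhomogeneous((2.0, 4.0, 8.0), f=1.0))    # [0.25 0.5]
    print(to_inhomogeneous((1.0, 2.0, 4.0), f=1.0))    # [0.25 0.5]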
The important properties of projective geometry are: (1) a straight line is mapped to a
straight line; (2) incidence relations are preserved (e.g., a point on a line will be mapped to
a point on the image on that line); and (3) the cross-ratio is preserved [135]. Other familiar
properties are: the images of parallel space lines intersect at a vanishing point and the images
of parallel space planes intersect at a vanishing line [482]. The fundamental theorem of
projective geometry says that n + 2 independent points are sufficient to determine a unique
projective transformation in n-dimensional projective geometry. This theorem, when applied
to colorimetry, tells us that we need four chromaticity points to determine the transformation
between two color systems that use two different sets of primaries. All these properties are
proved in many textbooks on geometry (see, e.g., [135, 394]).
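To illustrate the fundamental theorem numerically: in two dimensions, four point correspondences (for example, four chromaticity points expressed in two different primary systems) determine the 3 × 3 projective transformation up to a scale factor. The sketch below (function names are ours) assembles the standard linear system and extracts its null space with an SVD:

    import numpy as np

    def projective_from_four_points(src, dst):
        # 3x3 projective transformation (defined up to scale) mapping the four
        # 2-D points src[i] to dst[i], e.g., chromaticity coordinates measured
        # in two different primary systems.
        rows = []
        for (x, y), (u, v) in zip(src, dst):
            rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
            rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
        _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
        return Vt[-1].reshape(3, 3)          # null-space vector reshaped to a matrix

    def apply_projective(M, p):
        q = M @ np.array([p[0], p[1], 1.0])
        return q[:2] / q[2]

Any further point can then be mapped with apply_projective; with exact correspondences the recovered matrix reproduces the four given points up to numerical precision.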
Since we will be developing the geometrical theory of optical imaging for images in three-
dimensional space, we will use three-dimensional projective geometry. The transformation
Figure 9.2. The inversion transformation relative to a sphere centered at o, with a radius of r. The
point (x, y, z) is mapped to the point (x′, y′, z′) with a · b = r².
Since, by definition, we can divide the homogeneous coordinates by any nonzero scale factor,
we can write (x′, y′, z′, t′) as (x′/t′, y′/t′, z′/t′, 1), where
x′/t′ = (m_11 x + m_12 y + m_13 z + m_14 t)/(m_41 x + m_42 y + m_43 z + m_44 t),
y′/t′ = (m_21 x + m_22 y + m_23 z + m_24 t)/(m_41 x + m_42 y + m_43 z + m_44 t),
z′/t′ = (m_31 x + m_32 y + m_33 z + m_34 t)/(m_41 x + m_42 y + m_43 z + m_44 t).
The above equations are often called the projective transformation. We will use this transfor-
mation in the next section in the development of the geometrical theory of optical imaging.
It should be pointed out that, from Klein’s point of view, projective geometry is the
most general type of geometry. It includes Euclidean geometry, affine geometry, inversive
geometry, and non-Euclidean geometry as special cases [135]. We now consider the inver-
sion operation used in inversive geometry (see Fig. 9.2). The transformation group in
this geometry maps points inside a sphere to points outside of the sphere, and vice versa.
The sphere is mapped into itself. The inversion transformation is defined as follows. Let
the sphere C be centered at the origin and its radius be r . A point (x, y, z) is mapped to
(x′, y′, z′), which is on the line connecting the origin and the point (x, y, z), i.e., x′ = kx,
y′ = ky, and z′ = kz, and
√(x² + y² + z²) · √(x′² + y′² + z′²) = r².    (9.8)
Solving for k gives
k = r²/(x² + y² + z²),
and therefore
x′ = r²x/(x² + y² + z²),   y′ = r²y/(x² + y² + z²),   z′ = r²z/(x² + y² + z²).
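A few lines suffice to check the defining property |a| · |b| = r² numerically (a small sketch of our own; the sample point is arbitrary):

    import numpy as np

    def invert_in_sphere(p, r):
        # Inversion with respect to a sphere of radius r centered at the origin.
        p = np.asarray(p, dtype=float)
        return (r ** 2 / np.dot(p, p)) * p

    p = np.array([1.0, 2.0, 2.0])            # |p| = 3
    q = invert_in_sphere(p, r=6.0)           # |q| = 12, so |p| * |q| = 36 = r^2
    print(q, invert_in_sphere(q, r=6.0))     # inverting twice recovers p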
The geometrical theory of optical imaging can be formulated without referring to any optical
components such as lenses or mirrors. The theory is purely abstract, but its beauty is that
real optical imaging systems are well described by it in the region near the optical axis. The
theory can be proved with rigorous but lengthy mathematics. Here we shall only summarize
the main results from the theory. Readers who are interested in more details should consult
the discussion in [124, Chapter IV].
First, let us consider what we want to achieve in an optical imaging system. An “ideal”
imaging system should be able to form a sharp image from an object. Ideally, the sharp image
should also preserve the exact geometrical shape of the object. If it is a three-dimensional
object, it will be ideal if the three-dimensional image is an exact duplicate of the original
object (except for scale and mirror image). Therefore, sharpness and similarity are the two
desired geometrical attributes of an ideal optical imaging system. How do we define these
two attributes mathematically?
If all the rays that radiate from a point in the object space can be so refracted as to
converge to a point in the image space, we have indeed created a truly sharp point image
for that point object. If an imaging system can do this for all the points in the object space,
we can say that this system produces sharp images. However, it is not practical to collect
all the rays in any direction and it is not necessary to do so to form an image. Therefore we
arrive at the following definition:
Regarding the similarity between the object and the image, we can allow for the mirror
image to be accepted as being similar. If every curve in the image space is similar to its
conjugate curve in the object space, we say the image is a perfect image, and the imaging
is perfect imaging. It should be noted that the object space and the image space are both
three-dimensional domains.
An optical imaging instrument producing images that are stigmatic (sharp) and perfect
(similar) is called an absolute instrument. The following theorems state important properties
of an absolute imaging instrument:
The above two theorems combined put very serious constraints on what types of images
can be produced by an absolute imaging instrument. If we would like to have an imaging
system that is both stigmatic (sharp) and perfect (the three-dimensional image is geometri-
cally similar to the three-dimensional object), and the object space and the image space to
be in the same homogeneous space (i.e., same index of refraction), since the optical length
of any curve in the object space is the same as the optical length of its image, we can only
have an imaging system that reproduces an image that is exactly the same size as the object.
Such a system is just like a plane mirror and it is not a very interesting system to have. In
order to change the image size in the same homogeneous space, we have to drop either
the requirement of stigmatism or that of exact similarity. It can further be shown [124, p.
149] that no more than two surfaces may be sharply imaged by a rotationally symmetric
system (except the degenerate case, such as a mirror). In most real optical imaging systems,
constructed from lenses and mirrors, three-dimensional similarity is never achieved. Images
are sharp and similar to the object only on a small region of only one plane or curved surface
(i.e., the image plane or surface conjugate to the object).
The above mathematical arguments suggest that we could approximate the imaging
process as a projective transformation even though we cannot achieve it exactly. This is
indeed true for Gaussian optics, where the off-axis distances and angles are assumed to be
small. As will be described later, analysis of the elementary properties of lenses, mirrors,
and their combinations under the assumptions of Gaussian optics shows that imaging with
a refracting or reflecting surface of revolution is a projective transformation. It can also
be proved that the combination of two successive projective transformations which are
rotationally symmetrical about the same axis is also a projective transformation. We can
therefore conclude that, within the approximation of Gaussian optics, imaging by a centered
system is a projective transformation. We now show that a projective transformation under
rotational symmetry can be characterized by only four variables.
A general projective transformation is described by the following relations between a
point (x, y, z) in object space and a point (x′, y′, z′) in image space (see Fig. 9.3). Both are
referred to the same Cartesian coordinate axes.
x′ = (m_11 x + m_12 y + m_13 z + m_14)/(m_41 x + m_42 y + m_43 z + m_44),    (9.9)
y′ = (m_21 x + m_22 y + m_23 z + m_24)/(m_41 x + m_42 y + m_43 z + m_44),    (9.10)
z′ = (m_31 x + m_32 y + m_33 z + m_34)/(m_41 x + m_42 y + m_43 z + m_44).    (9.11)
Let us take the z-axis as the axis of symmetry. Because of rotational symmetry, we only
Figure 9.3. Coordinates for the projective transformation from an object point (x, y, z) to an image point
(x′, y′, z′).
Figure 9.4. Coordinates for the projective transformation from an object point (x, y, z) to an image point
(x′, y′, z′) for a rotationally symmetric system. The six cardinal points of an imaging system are the
focal points (F and F′), the principal points (H and H′), and the nodal points (N and N′).
value of its conjugate object distance, z, is also infinity. This point is called the image focal
point F′ (also called the back focal point). The lateral magnification
m_y can be calculated as
m_y = (dy′/dy)|_z = f/(z − z_0) = (z′ − z′_0)/f′.    (9.17)
If the object is located at z = z_0 + f, then the lateral magnification is equal to 1. The
plane z = z_0 + f is called the object principal plane (also called the first principal plane
or the front principal plane). Similarly, the plane z′ = z′_0 + f′ is called the image principal
plane. The object and image principal planes intersect with the axis at the object and image
principal points, H and H′ (see Fig. 9.4). The distance from the object focal point F to
the object principal point H is f, and this is called the object focal length. Similarly, f′ is
called the image focal length. We can now conclude with the following statement.
definition of the principal planes, B′, the image of B, is on the image principal plane (at
z′_0 + f′) at a distance h′ = h away from the axis. The emerging ray intersects the axis at a
point A′, located at z′. The incident angle θ of this ray at the object principal plane is given
by tan θ = h/(z_0 + f − z). The exit angle θ′ of the emerging ray at the image principal
plane is given by tan θ′ = h′/(z′_0 + f′ − z′). Since h′ = h, the angular magnification m_a
can be calculated as
m_a = tan θ′/tan θ = (z_0 + f − z)/(z′_0 + f′ − z′).    (9.19)
From Eq. (9.16), if z = z_0 − f′, then z′ = z′_0 − f, and m_a = 1. This means that if a ray
passes through a point N at z = z_0 − f′ in the object space, it will emerge at a point N′ at
z′ = z′_0 − f in the image space at an exit angle exactly equal to its incident angle. These two
points, N and N′, are called the object nodal point and the image nodal point, respectively.
The two focal points, the two principal points, and the two nodal points are called the
cardinal points of an optical imaging system (see Fig. 9.4). Their locations are as follows:
z_F = z_0,    (9.20)
z_F′ = z′_0,    (9.21)
z_H = z_0 + f,    (9.22)
z_H′ = z′_0 + f′,    (9.23)
z_N = z_0 − f′,    (9.24)
z_N′ = z′_0 − f.    (9.25)
Another useful relation expresses the distance, s′, between the image plane and the image
principal point as a function of f′ and m_y. From Eqs. (9.17) and (9.23), we have¹
s′ = z′ − z_H′ = (m_y − 1) f′.    (9.26)
Example 9.1
The object focal point F and the image focal point F′ of an imaging system are located
at z_F = 40 and z_F′ = 120. It is also known that the object nodal point N and the image
nodal point N′ are located at z_N = 80 and z_N′ = 100. What are the object focal length f
and the image focal length f′ of this system? Where are the principal points, H and H′?
If an object is located at z = −800, then where is its image location z′? What is the lateral
magnification in this imaging condition?
¹ Under many imaging conditions, m_y and f′ are both negative numbers. Therefore, we might see the relation
expressed as s′ = (1 + m) f, where m is the absolute value of m_y and f = −f′ if both the object space and the
image space are in the same medium, such as air.
It is clear that if we use the two focal points as the origins for the object and image
spaces, all the equations are simplified. One minor inconvenience in doing so is that the
focal points are usually located at some intangible surfaces and therefore it is not easy
to measure the distances from them. For a thin lens, the principal points lie very close
to the lens vertices (the intersection point between the optical axis and the lens surface).
Therefore, formulas that relate distances measurable from the principal points have obvious
convenience. For example, Newton’s equation (9.16) can be cast into a form that is expressed
in distances measured from the principal points. Let the object distance from the object
principal point be s = z − (z_0 + f) and the image distance from the image principal point
be i = z′ − (z′_0 + f′); then
z′ − z′_0 = ff′/(z − z_0),
i + f′ = ff′/(s + f),
(i + f′)(s + f) = ff′,
i(s + f) = −sf′,
1/i = −(s + f)/(sf′) = −1/f′ − f/(sf′),
and therefore we arrive at the Gauss equation:
f/s + f′/i = −1.    (9.27)
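A quick numerical check of Eq. (9.27), again a sketch of our own, using the values found above for Example 9.1 (principal points at 60 and 80, and the image location computed there):

    # Check of the Gauss equation (9.27) with the Example 9.1 numbers.
    f, fp = 20.0, -40.0
    z_H, z_Hp = 60.0, 80.0
    z, zp = -800.0, 120.95238
    s = z - z_H                      # object distance from the object principal point
    i = zp - z_Hp                    # image distance from the image principal point
    print(f / s + fp / i)            # approximately -1.0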
In the above analysis of general optical imaging systems, we have employed a coordinate
system that is used in most textbooks on optics and will continue to be used in this book.
We therefore should take a moment to specify it more clearly here.
The z-axis is always pointing to the right and is the axis of symmetry if the system
is rotationally symmetric. This axis is called the optical axis of the system. The distance
is positive if measured from left to right and negative if measured from right to left. For
example, the object focal length f = z_H − z_0 is measured from the object focal point F to
the object principal point H. If H is located to the right of F, then f is positive. Very often,
in contrast to the object space, the image principal point H′ is located to the left of the image
focal point F′, and therefore, the image focal length f′ = z_H′ − z′_0 is negative. It is always
important to know the reference point or surface from which distances are measured.
We will use a right-handed coordinate system with the x-axis pointing perpendicularly
into the page, and the y-axis pointing upward vertically on the page. For a rotationally
symmetric system, an object point can be specified by only its y coordinate, assuming
that x = 0. In other words, the y-axis is (defined to be) on the plane that passes through
the object point and the optical axis. This plane is called the meridional plane (it is also
called the tangential plane). Any ray that is on this plane is called a meridional ray. In a
Figure 9.5. The entrance pupil is the image of the aperture formed by the optical components between
the aperture and the object. The exit pupil is the image of the aperture formed by the components
behind it.
rotationally-symmetric system, a meridional ray stays on the meridional plane all the way
through the system. Any ray which is not on the meridional plane is called a skew ray.
In any physically realizable imaging system, light can only pass through a finite aperture
(see Fig. 9.5). The aperture can be formed by the opening of a physical plane (called
the aperture stop), or it can represent the narrowest cross-section in the light path through
the system. This aperture as seen from the object side is called the entrance pupil, and
when seen from the image side is called the exit pupil. In other words, the entrance pupil
is the image of the aperture stop formed by all the optical components in front of it (i.e.,
on the object side). Similarly, the exit pupil is the image of the aperture stop formed by all
the optical components behind it (i.e., on the film side). Therefore, the size of the entrance
pupil and the size of the exit pupil are often different from the actual size of the aperture.
For example, the pupils in my eyes as seen by you are larger than their actual sizes because
you are seeing the images of my pupils as formed by my corneas.
Knowledge of the size and position of the entrance pupil and the exit pupil is very
important in analyzing an imaging system. The entrance pupil forms the base and the object
point forms the apex of the cone of light rays that are accepted by the imaging system.
Similarly, the exit pupil forms the base and the image point forms the apex of the cone of
light rays that are focused to the image point by the imaging system. In general, the size of
the entrance pupil and the size of the exit pupil are not the same. The ratio of the diameter
of the exit pupil, a_ex, to that of the entrance pupil, a_en, is called the pupil magnification (or
pupil factor), m_p, i.e.,
m_p = a_ex/a_en.    (9.28)
This parameter is important when we discuss the radiometry and photometry of an imaging
system.
The meridional ray which is directed toward the center of the entrance pupil is called the
chief ray. This ray is important in the sense that it represents approximately the center of the
cone of rays going through the optical system. It passes through the center of the aperture
stop. Any ray which passes the margin of the entrance pupil is called a marginal ray.
The plane which contains the chief ray and is perpendicular to the meridional plane is
called the sagittal plane. Any ray which is on the sagittal plane is called a sagittal ray.
Rays which make small angles, θ, with the optical axis are called paraxial rays. A small
region around the optical axis is called the paraxial region. The paraxial approximation
assumes that sin θ ≈ θ and cos θ ≈ 1. As mentioned before, optical calculation using the
paraxial approximation for a rotationally-symmetric system is called Gaussian optics.
The points where the optical axis intersects the front and back surfaces of the lens are
called the front and back vertices of the lens. These are tangible points and many optical
distances of a lens are specified with respect to the two vertices.
Having explored the geometrical theory of optical imaging, we will now look at the analysis
of real imaging systems, constructed from lenses and other optical components. Although
the basic tools of optical analysis are very simple (Snell’s law is often said to be the only
thing needed), the complete mathematical analysis of any practical system is almost always
not practical. Lens designers rely heavily on tracing many, many rays through various optical
components in the system to get a feeling of how well the system might work. For example,
if we trace many rays from an object point through the system to the image plane and all the
rays converge to a single image point, we know that our optical design will work well for
this object point. We can then trace rays from other points in the object space to find their
image quality in the image space. Obviously, the more object points we check, the better
the idea we have about the optical performance.
The ray tracing operation is mostly performed by computers, and experience, insight,
and knowledge are required to set up the initial designs to be given to the computers for
further modification and optimization. An optical system can be considered as consisting of
a number of refracting surfaces spaced apart from each another. The ray tracing operation is
to propagate the ray from an object point through one surface to another and trace it through
the entire system. There are two basic processes involved in the operation: transfer and
refraction [899, 1021]. The transfer process involves propagating the ray until it intersects
with a refracting surface. The major problem to solve here is the determination of the
intersection point. The refraction process involves finding the direction of propagation
after the ray is refracted at the refracting surface. These two processes are repeated again
and again until we have traced the ray to the desired image space. In this section, we
will study how these two processes are carried out using a spherical surface as an example.
Spherical surfaces have been the most important type of surface for lenses made by grinding,
because they are the only type of surface that can be produced by grinding two surfaces
9.6 Refraction at a spherical surface 207
I’
z
I B
A o
Figure 9.6. Ray tracing of an arbitrary ray through a spherical refracting surface.
repeatedly [385]. Aberrations of spherical lenses are well studied for their wide use in
imaging optics.
Referring to Fig. 9.6, a spherical surface φ(x, y, z) = x 2 + y 2 + (z − r )2 − r 2 = 0 in-
tersects the optical axis at point O, where x = y = z = 0. Here we will assume r > 0. A
ray radiating from point A is propagating along the direction of the unit vector I, and it
intersects the sphere at the point B. The surface normal of the sphere at B is the unit vector
N pointing to the center of the sphere. The incident ray is refracted by the sphere into the
direction of the unit vector I . The transfer process is to determine the intersection point B
and the refraction process is to determine the vector I . In the following we will let A and
B be the positional vectors from the origin O to the points A and B, respectively.
First, let us solve the transfer process. Since B is a point on the ray, it can be expressed as
B = A + kI, where k is a scalar. Let U be a unit vector along the z-axis, i.e., U = [0, 0, 1]T .
Thus, the center of the sphere is represented as r U. Let R = [x, y, z]T be a point on the
sphere, then
φ(R) = x 2 + y 2 + (z − r )2 − r 2 = x 2 + y 2 + z 2 − 2r z = R · R − 2r R · U = 0. (9.29)
Since point B is on the sphere,
φ(B) = B · B − 2r B · U (9.30)
= (A + kI) · (A + kI) − 2r (A + kI) · U (9.31)
= A · A + 2kA · I + k − 2r A · U − 2kr I · U
2
(9.32)
= k + 2(A · I − r I · U)k + A · A − 2r A · U
2
(9.33)
= 0. (9.34)
Solving for k,
k = −(A · I − r I · U) ± [(A · I − r I · U)2 − A · A + 2r A · U]1/2 . (9.35)
The two solutions represent two intersection points. Since the ray is refracted at the first
208 9 Optical image formation
intersection point, we have to determine which is the solution we need. This can be done
by choosing A to be a point on the axis, say A = −U, and I to be along the axis, i.e. I = U.
The intersection point B will now be at the vertex O, i.e., B = A + kI = −U + kU = O
and k = 1. Therefore we should use the negative sign in Eq. (9.35) and
Having found the intersection point B, we can proceed to the refraction process to
determine the refracted ray vector I . First, we find the surface normal vector N:
∇φ 2B − 2r U 1
N=− =− = U − B. (9.37)
|∇φ| 2r r
From Snell’s law, the refracted vector I is on the plane of incidence spanned by the incident
vector I and the surface normal N. Therefore, we can express I as a linear combination of
N and I:
I = αN + βI, (9.38)
I × N = αN × N + βI × N, (9.39)
sin θ = β sin θ. (9.40)
Therefore β = n/n , where n is the index of refraction of the sphere and n is that of the
space outside the sphere. Since I is a unit vector, we have:
In order to determine the sign, let the incident ray be normal to the surface, i.e., I = N, then
I = I, and α + β = 1. Therefore, the positive sign should be used,
B
I′
I
A q O C
s
c
kr α s k krβ
c = s + k cos θ + 1− − cos θ + cos θ, (9.46)
kα − rβ r r kα − rβ
The values of sin θ and cos θ can be approximated by their power series expansion:
θ3 θ5
sin θ = θ − + − ···; (9.50)
3! 5!
θ2 θ4
cos θ = 1 − + − ···. (9.51)
2! 4!
For small angles, if the first order approximation is used, sin θ ≈ θ and cos θ ≈ 1. All terms
which involve powers of θ higher than 1 are neglected. (Again, we remind ourselves here
210 9 Optical image formation
A Q
Figure 9.8. On-axis imaging: rays that diverge farther away from the axis are focused at a shorter
distance. This is called spherical aberration.
that the first order approximation is called linear optics. If the system is also rotationally
symmetric, the approximation is called Gaussian optics.)
If we use the Gaussian approximation, we can simplify Eqs. (9.45)–(9.49) to the follow-
ing: k = −s, N · I = 1 + sθ 2 /r , α = 1 − β, and
sr
c= , (9.52)
s + rβ − sβ
which can be put into the form of the Gauss equation (9.27):
f f
+ = −1, (9.53)
s c
where
βr nr
f = = , (9.54)
1−β n −n
r n r
f = = , (9.55)
β −1 n − n
f n
= − . (9.56)
f n
Therefore, under the Gaussian optics approximation, a spherical surface of radius r has
an object focal length f and an image focal length f . If n > n, as in the case of a
glass sphere in air, f > 0 and f < 0. Since s and c are both measured from the vertex
O, the object and image principal points are both located at the vertex of the surface,
i.e., z H = 0 and z H = 0. The object focal point is located at z 0 = z H − f = − f , and
the image focal point is located at z 0 = z H − f = − f . The object nodal point N is
located at z N = z H − f − f = −( f + f ) = r , and the image nodal point is located at
z N = z H − ( f + f ) = r . The two nodal points are both located at the center of the sphere.
This simply means that a ray that is directed towards the center of the sphere is perpendicular
to the spherical surface and its direction is not changed by refraction. Also of interest are
the intersection point B and the refracted ray vector I . The Gaussian approximation leads
to B = [0, −sθ, 0]T and I = V /|V |, where V = [0, [s(1 − β)/r ]θ + βθ, 1]T . These will
be used in the next section, where we develop the matrix method for Gaussian optics.
For rays that are at a large angle from the axis, the Gaussian optics approximation does
not apply. Figure 9.8 shows that the various rays radiating from the object point A do
9.7 Matrix method for paraxial ray tracing 211
Figure 9.9. Off-axis imaging: different cones of rays converge at different distances and different
heights.
not converge to the same point. Rays that diverge further away from the axis are focused
at a shorter distance. This is called spherical aberration. The point Q is the image point
calculated from Gaussian optics. One can see that most rays converge at a much shorter
distance on the axis. If we move the image plane closer to the spherical surface, the image
actually becomes sharper, but it is never an ideal point image. The image looks like a blurred
circle (called the circle of confusion). When we move the image plane closer to the lens,
we reach the position at which the blurred circle is the smallest, we can then say that this
is the best focus position. Obviously, this position depends on our definition of the circle
of confusion, but in general, when an imaging system has some aberrations, the best focus
position is often not at the Gaussian image plane.
In Gaussian optics, ray tracing can be very efficiently carried out by matrix multiplica-
tions. For rays that are coplanar with the optical axis (meridional rays), the operation can
be represented by 2 × 2 matrices. For other rays, 4 × 4 matrices are required. The general
212 9 Optical image formation
x
1 µm Q
Gaussian image point
Figure 9.10. Coma-like patterns resulting from cones of rays from an off-axis object point, as seen
on the Gaussian image plane.
M
q2
q1 y2
y1
z
z = z1 z = z2
n = n1 n = n2
Figure 9.11. Paraxial ray tracing by matrix transformation of ray vectors. The ray vector at the
reference plane at z = z 1 intersects the plane at a height y = y1 and at an angle θ = θ1 . The index
of refraction at z = z 1 is n = n 1 . After traveling a distance s along the optical axis, the ray vector
intersects the second reference plane at z = z 2 at a height y = y2 and at an angle θ = θ2 . The index of
refraction at z = z 2 is n = n 2 . The ray vector transition from z 1 to z 2 can be represented by a matrix M.
properties of these matrices are discussed under the subject of (linear) simplectic transfor-
mation in various books [50, 371]. The 2 × 2 matrices lead to a special case of projective
transformation and to a “proof ” that a rotationally symmetric, optical imaging system can
be approximated by a projective transformation characterized by four parameters. This is
the subject we will study in this section.
Let us take a plane, perpendicular to the optical axis and located at z = z 1 , as a reference
plane (see Fig. 9.11). A meridional ray which intersects the reference plane at a height of
9.7 Matrix method for paraxial ray tracing 213
y = y1 (the x-coordinate is irrelevant for such a ray) and at an angle θ1 can be characterized
by a 2 × 1 vector, V1 = [y1 , n 1 θ1 ]T , where n 1 is the index of refraction at (y1 , z 1 ). We trace
this ray to a second reference plane located at z = z 2 and represent that ray with the vector
V2 = [y2 , n 2 θ2 ]T , where n 2 is the index of refraction at (y2 , z 2 ). We would like to determine
the transition matrix M that maps V1 to V2 , i.e., V2 = MV1 .
As we described before, there are two processes in the ray tracing: transfer and refraction.
Let the distance between the two reference planes be s = z 2 − z 1 . If there are no refracting
surfaces between the two reference planes, the ray should propagate in a straight line.
Therefore, n 1 = n 2 = n, θ1 = θ2 = θ, y2 = y1 + s tan θ ≈ y1 + sθ, and
# $ # $# $
y2 1 s/n y1
V2 = = = Mst V1 . (9.57)
nθ2 0 1 nθ1
Therefore the matrix Mst is the transition matrix of the transfer process for a distance s. The
matrix for the refraction process can be determined from our discussion on refraction from
a spherical surface under Gaussian optics. First let us assume that the refraction surface
is very thin, i.e., z 2 ≈ z 1 , and therefore, y2 = y1 , meaning that the height of the ray is the
same before and after the refraction. From the results in Section 9.6.1, we have the following
relations:
1−β
tan θ2 = βθ1 − y1 , (9.58)
R
n1 n2 − n1
θ2 ≈ θ1 − y1 , (9.59)
n2 n2 R
n2 − n1
n 2 θ2 ≈ n 1 θ1 − y1 , (9.60)
R
where R is the radius of the refracting spherical surface. Therefore,
# $ # $# $ # $# $
y2 1 0 y1 1 0 y1
V2 = = = = M Rr V1 . (9.61)
nθ2 −(n 2 − n 1 )/R 1 nθ1 −K 1 nθ1
Thus, the matrix M Rr is the transition matrix of the refraction process for a surface of radius
R. The quantity K = (n 2 − n 1 )/R is often referred to as the power of the refracting surface.
It should be pointed out that the radius of the spherical surface R is positive when the surface
is convex to the left. Otherwise, it is negative.
Since the refraction process plays a central role in image formation, we will give an
alternative derivation of the refraction matrix here, which will give us a more intuitive
feeling of how and where paraxial approximations are made in the matrix method of ray
tracing. Figure. 9.12 shows a diagram of the various angles involved in the calculation. A ray,
intersecting the spherical surface at a height y and an angle θ1 , is refracted at the interface
to a direction of angle θ2 . From the diagram, we can write down the following equations:
s q2
f
y y R
q1
n1
n2
R1 R2
z
n = n1 n = n2 n = n3
d
M
Figure 9.13. A thick biconvex lens with thickness d, index of refraction n 2 , and radii of curvature R1
and R2 can be represented by a matrix M. The indexes of refraction on either side of the lens are n 1
and n 3 .
Since |M1 M2 | = |M1 | · |M2 |, it follows that a system transition matrix, being the product
of transition matrices of individual components, also has a determinant of 1. Therefore we
conclude that the determinant of the 2 × 2 transition matrix of any Gaussian optical imaging
system constructed by refracting surfaces is 1.
Let us look at some examples. A thick biconvex lens (see Fig. 9.13) consists of two
refracting surfaces separated by a distance d. Let R1 and R2 be the radii of the front and
back surfaces. By convention, R1 > 0 and R2 < 0. Assume that the indices of refraction
are n 1 , n 2 , and n 3 for the object space, the lens, and the image space, respectively. Let us
take the vertices of the two surfaces as the two reference planes, then the transition matrix
M of the thick lens is
# $# $# $
1 0 1 d/n 2 1 0
M=
−(n 3 − n 2 )/R2 1 0 1 −(n 2 − n 1 )/R1 1
# $# $# $
1 0 1 d/n 2 1 0
=
−K 2 1 0 1 −K 1 1
# $
1 − d K 1 /n 2 d/n 2
= . (9.73)
−(K 1 + K 2 ) + d K 1 K 2 /n 2 1 − d K 2 /n 2
In the thin lens approximation, d → 0, and
# $ # $
1 0 1 0
M= = . (9.74)
−K 1 −(K 1 + K 2 ) 1
The optical power of a thin lens is the sum of the powers of its two refracting surfaces and
n2 − n1 n3 − n2
K = K1 + K2 = + , (9.75)
R1 R2
which simplifies to:
1 1
K = (n 2 − n 1 ) − (9.76)
R1 R2
when n 3 = n 1 and is called the lens-maker’s formula.
216 9 Optical image formation
n n′
q M
y q′
y′
z z1 z2 z′
Figure 9.14. Diagram of a general imaging system with a transition matrix from z 1 to z 2 . It forms an
image y at z of an object y at z. A ray from the object forms an angle θ with the optical axis and
is refracted to a ray at an angle θ at the image. The indices of refraction are n and n for the object
space and the image space, respectively.
Can we calculate imaging parameters from these transition matrices? The answer is
yes. Here we show how these can be computed for a general optical imaging system (see
Fig. 9.14). Let M be the transition matrix of an optical imaging system from z 1 to z 2 . An
object located at z forms an image at z . Let M be a general matrix:
# $
a b
M= , (9.77)
c d
To find the principal points H and H , we let y = y for all θ because they are a conjugate
pair (i.e., H is the image of H ) with unit lateral magnification. The coefficient of y in the
first term of Eq. (9.79) should be unity and we have
c n
a+ (z − z 2 ) = 1 → z H = z = z 2 + (1 − a). (9.81)
n c
Since y is the image of y, it is independent of θ. The coefficient of θ in Eq. (9.79) should
be zero and since ad − bc = 1, we have
n
z H = z = z 1 − (1 − d). (9.82)
c
To find the nodal points N and N , we let y = y = 0 and θ = θ because they are a conjugate
pair (i.e., N is the image of N ) with unit angular magnification. From Eq. (9.80), we have
n n
z N = z = z1 − −d . (9.83)
c n
9.7 Matrix method for paraxial ray tracing 217
object to H and then continue from H to its image. The object and image distances are
now related by
f f
+ = −1 (9.90)
z − zH z − z H
or
n n
+ = K. (9.91)
zH − z z − z H
Equation (9.91) shows that distance is often divided by the index of refraction in the imaging
equation. For this reason, we call the “normalized” distances, such as (z H − z)/n and
(z − z H )/n , the reduced distances.
Example 9.2
A biconvex thick lens is made of a glass with index of refraction n = 1.5. Its frontal surface is
located at z 1 = 40 with a radius of curvature R1 = 20. Its back surface is located at z 2 = 43
with a radius of curvature R2 = −30. What is the focal length of this lens? Assuming that
the lens is in the air, what is the transition matrix from z = 10 to z = 63?
The focal length f can be found by f = 1.0/0.041 = 24.39. The transition matrix M from
z to z can be determined by
# $ # $ # $
1 20 1 30 0.133 25.333
M = Mz −z2 AMz1 −z =
t t
A = .
0 1 0 1 −0.041 −0.258
Example 9.3
Let M be the transition matrix of an optical imaging system from z 1 to z 2 , where
# $
1 0
M= ,
−K 1
z 1 = 120, and z 2 = 125. The object space and the image space are both in the air, i.e., the
index of refraction n ≈ 1. The image of an object located at z = 20 is formed at z = 135.
What is the power K of this imaging system?
Example 9.4
Let M be the transition matrix of an optical imaging system from the object nodal point N
(in the object space) to the image nodal point N (in the image space):
# $
a b
M= .
−0.06 d
Let n = 1 be the index of refraction of the object space, and n = 4/3 be that of the image
space. Determine the values of a, b, and d. What are the object focal length f and the image
focal length f ?
Solution Let us write the ray-tracing equation between the two nodal points:
# $ # $# $
y a b y
= .
nθ −0.06 d nθ
Therefore, y = ay + bnθ and n θ = −0.06y + dnθ. From the definition of nodal points,
at N ( y = 0) and N ( y = 0), the angles θ and θ should be equal for all rays. Therefore,
we have b = 0 and d = n /n = 4/3. Since the determinant of the matrix should be 1, a =
n/n = 3/4. The refracting power of the system is K = −c = 0.06. The object focal length
is f = n/K = 1.0/0.06 = 16.67 and the image focal length is f = −n /K = −22.22.
In the last section, we concluded that the determinant of the 2 × 2 transition matrix of any
Gaussian optical imaging system constructed by refracting surfaces is 1. We can prove that
the converse is also true: any 2 × 2 transition matrix with determinant 1 can be constructed
from refracting surfaces.
Given a 2 × 2 matrix M with determinant 1,
# $
a b
M= , (9.92)
c d
where ad − bc = 1. If c = 0, it is easy to verify that M can be expressed as the product of
three matrices, each implementable optically:
# $ # $# $# $
a b 1 t 1 0 1 s
M= = , (9.93)
c d 0 1 −K 1 0 1
where t = (a − 1)/c, K = −c, and s = (d − 1)/c. This decomposition was used by Gauss
as a very powerful tool for analyzing a general imaging system. It is therefore called the
Gauss decomposition.
If c = 0, then a = 0 and M can be expressed as the product of four matrices, each
implementable optically:
# $ # $# $# $# $
a b 1 0 1 t 1 0 1 s
M= = , (9.94)
0 d p/a 1 0 1 −K 1 0 1
220 9 Optical image formation
a b
M=
c d
zH zH ′ z
sn tn′
z = z1 z = z2
Figure 9.15. From the Gauss decomposition, a general imaging system represented by a transition
matrix M from z 1 to z 2 is equivalent to a thin lens transition between two principal planes, z H and z H .
tells us that there are two planes located at z H = z 1 + sn and z H = z 2 − tn (where n and
n are the refraction indices of the object and image spaces) such that the transition matrix of
the system from z H to z H is simply a thin lens transition matrix. These two planes are what
we called the principal planes. They are usually two different planes in a general imaging
system, while in the thin lens, they are the same. Let us take the thick lens as an example.
Decomposing M in Eq. (9.73), we have:
# $
1 − d K 1 /n 2 d/n 2
M=
−(K 1 + K 2 ) + d K 1 K 2 /n 2 1 − d K 2 /n 2
# $# $# $
1 −d K 2 n 2 /K 1 0 1 −d K 1 n 2 /K
= , (9.98)
0 1 −K 1 0 1
where K = (K 1 + K 2 ) − d K 1 K 2 /n 2 . The object principal plane is located at d K 1 n 2 n 1 /K
9.9 Generalized ray tracing 221
in front of the vertex of the first refracting surface, while the image principal plane is
at d K 2 n 2 n 3 /K behind the second surface. The separation of the two principal planes is
d[1 + n 2 (n 1 K 1 + n 3 K 2 )/K ].
The idea of principal planes allows us to model a general imaging system, no matter
how complex, as a black box. Let us summarize the results we have derived so far. Let n
and n be the refraction indexes of the object space and the image space, respectively. Let
the object be located at z and the image at z .
r A general, rotationally-symmetric, imaging system has two principal planes located at
z H and z H . The transition matrix M between these two planes is given by that of a thin
lens with refractive power K :
# $
1 0
M= . (9.99)
−K 1
r The object focal length f and the image focal length f are:
n
f = , (9.100)
K
n
f =− . (9.101)
K
r Let the object distance s = z H − z, the image distance i = z − z H , the object height
be y, and the image height be y. They are related by:
f
y = − y, (9.102)
s− f
n n
+ = K. (9.103)
s i
r The object nodal point and the image nodal point are located at s N and i N :
s N = (n − n )/K , (9.104)
i N = (n − n)/K . (9.105)
The six cardinal points – two focal points, two principal points, and two nodal points – plus
entrance and exit pupils can replace an extremely complex optical system in many design
studies. The principal points coincide with the nodal points if the index of refraction in the
object space is the same as that of the image space.
As we pointed out earlier in this chapter, light rays are simply vectors normal to the wavefront
that propagates through an optical system. So far, we have discussed methods of tracing a
ray through space and the refracting surfaces. Although these methods are very important
in the analysis and the design of practical optical systems, they do not directly tell us how
the shape of the wavefront associated with the ray is transformed as it propagates through
the system. In order to trace the wavefront deformation through an optical system, we have
222 9 Optical image formation
to characterize the local properties of the wavefront surface and find out how the properties
are changed during the transfer and refraction processes.
The study of local surface properties is the subject of a branch of mathematics called
differential geometry [730]. On a small neighborhood of any given point on a smooth
(differentiable) surface, we can describe its shape with two principal vectors. The local
surface shape is characterized by the two principal curvatures, k1 and k2 , along these two
vectors, i.e., the shape is approximately z = (k1 x 2 + k2 y 2 )/2 [730, p. 202]. For a given
ray, the local surface shape of its associated wavefront will, in general, have two principal
vectors and two principal curvatures along these directions. (For spherical surfaces, there
is only one curvature for all directions.) The question we would like to ask here is what
happens to the principal vectors and curvatures of the wavefront after it goes through the
transfer and refraction processes? The answer to this question was obtained by Gullstrand in
1906 and then worked out again in vector notation by Kneisly [514]. Interested readers can
also find the derivation in the book by Stavroudis [899, pp. 136–179]. He called the process
of tracing these wavefront parameters through an optical system, generalized ray tracing.
In geometrical optics, the propagation of light is described using rays that are surface
normals of the wavefronts. In a region of space, we can imagine that, in general, there is a
single ray passing through a given point in that region. The system of space-filling rays is
called a congruence. The rays can be curves or straight lines. If, in a congruence, every ray
is a straight line, we call it a rectilinear congruence. Light propagation in a homogeneous
medium forms such a congruence. A collection of rays possessing an orthogonal wavefront
is called a normal congruence. For example, rays coming from a point source form a normal
rectilinear congruence in a homogeneous medium and the wavefronts are concentric spheres
centered at the point source. It can be shown that a normal congruence remains normal after
any number of refractions or reflections. This is called the Malus and Dupin theorem [1021,
p. 18].
There are situations in which the light propagation deviates from its typical rectilinear
path. This deviation is called diffraction. For example, if light propagates through a hole
in an otherwise opaque screen, using geometrical optics, we can trace a straight line for
any given ray passing through the hole. However, this leads to the prediction of a very
sharp shadow boundary which contradicts experimental observation. In reality, the shadow
boundary is a little blurry over an angular region of θ < λ/d, where λ is the wavelength
and d is the diameter of the hole, and the shadow transition exhibits dark and light bands in
its fine structure. This cannot be described by geometrical optics.
In an optical imaging system, the aperture stop constricting the passage of light is a very
important case where diffraction through an aperture limits the ultimate performance of the
system. In order to understand how light propagates through an aperture, we have to treat
light as wave functions and the resulting optical calculation is called physical (or wave)
optics. For imaging applications, the most important subject in physical optics is the study
of diffraction through an aperture.
9.10 Physical optics 223
Y (x, y, z)
X
x
(x′, y′, 0)
dA
O
n n
A
n
cree Z
es
aqu
op
coupled with other components and cannot be treated as a scalar. Symthe formulated such
a vectorial diffraction theory and the Symthe–Kirchhoff diffraction formula [453, p. 438]
is as follows:
1 eikr
E(r) = ∇× [n × E(x)] dA, (9.107)
2π A r
where E(x) is the total tangential electric field in the aperture and E(r) is the total electric
field in the diffraction region. Experiments showed that for a diffraction aperture of diameter
d, the Symthe–Kirchhoff formula is quite accurate for λ/d < π , but it breaks down seriously
when λ/d is larger than π (see [453, p. 446]).
In spite of its simplistic assumption, the scalar diffraction theory produces very accurate
results for optical imaging applications if the diffraction aperture is relatively large compared
with the wavelength and if the diffraction field is not observed too close to the aperture,
i.e., the distance from the observation point to the aperture is much larger than the size
of the aperture [352, p. 32]. This is supported by experimental data [420, 874]. Let us
consider diffraction through an aperture on a planar screen of infinite extent. Let the z-axis
be perpendicular to the screen and the aperture be located at z = 0. Therefore, a point on the
aperture is specified by (x , y ). Light illuminating the aperture is coming from the negative
z-direction and the half-space corresponding to z > 0 is called the diffraction region where
light has propagated through the aperture opening. We can calculate the field at a point
(x, y, z) in the diffraction region z > 0 by Eq. (9.106):
k eikr i n·r
V (x, y, z) = 1+ V (x , y )dx dy , (9.108)
2πi A r kr r
We also have to assume that the wavelength is much smaller than the aperture size and
therefore much smaller than r . Therefore, 1 + i/kr = 1 + iλ/2πr ≈ 1, and
k eikr
V (x, y, z) ≈ V (x , y )dx dy . (9.110)
2π i A r
It is interesting to see that except for the scale factor k/2π i, Eq. (9.110) is clearly an
expression of the Huygens principle. Each point (x , y ) on the aperture emits a spherical
wave eikr /r that linearly adds up in the diffraction field. We can further simplify Eq. (9.110)
by substituting 1/r ≈ 1/z:
k eikr
V (x, y, z) ≈ V (x , y )dx dy . (9.111)
2π i A z
Note that eikr cannot be replaced by eikz because k is a very large number and the smaller
difference between r and z is amplified so much that it will introduce a large error in the
phase of the scalar field.
9.10 Physical optics 225
The exact evaluation of diffraction fields through Eq. (9.111) can be done quickly by
computers. However, for many imaging applications, there are some approximate evaluation
methods that can lead us to very interesting results, and they will be discussed here. The
first approximation is called the Fresnel approximation and it assumes that
r= (x − x )2 + (y − y )2 + z 2 (9.112)
"
x − x 2 y − y 2
=z 1+ + (9.113)
z z
1 x − x 2
1 y − y 2
≈z 1+ + . (9.114)
2 z 2 z
The region where Eq. (9.114) is a good approximation is called the region of Fresnel
diffraction, and the diffraction field calculated with Eq. (9.115) is called Fresnel diffraction.
The Fresnel diffraction region extends from a certain distance (approximately 10 times of
the diameter of the aperture) to infinity and its range is dependent on the degree of accuracy
required.
Equation (9.115) can be further simplified by another approximation (the Fraunhofer
approximation) when the observation distance is further away from the aperture (z very
large) and
k 2
(x + y 2 ) ≈ 0 (9.116)
2z
for any (x , y ) on the aperture. Since k = 2π/λ is very large for visible wavelengths,
the distance z at which the approximation is good is indeed very long. For example, if
λ = 500 nm and the radius of the aperture is 1 cm, then z has to be tens or hundreds of
kilometers. Using the Fraunhofer approximation, Eq. (9.115) becomes:
# $ # $
eikz k k
V (x, y, z) ≈ exp i (x 2 + y 2 ) V (x , y )exp −i (x x + yy ) dx dy (9.117)
iλz 2z A z
# $
eikz k , -
≈ exp i (x 2+ y 2 ) V (x , y )exp −i2π( f x x + f y y ) dx dy , (9.118)
iλz 2z A
where f x = x/(λz) and f y = y/(λz). This means that the diffraction field V (x, y, z) at a
plane normal to the z-axis is proportional to the Fourier transform of the aperture field
V (x , y ).
226 9 Optical image formation
y v v′ y′
x u u′ x′
Q
z′
z
Q′
Figure 9.17. A diffraction-limited imaging system converts a diverging spherical wave incident on
the entrance pupil into a converging spherical wave leaving the exit pupil.
V′
Y′
(x′, y′, z′)
(u′, v′) U′ r
Q′ (xo′, yo′, zo′)
ro X′
Z ′= 0
Z′
a Z ′= zo′
exit pupil
image plane
Figure 9.18. The coordinates of the exit pupil and the image plane for OTF calculation.
it is possible to obtain an adequate answer by assuming that diffraction occurs only from
the exit pupil to the image plane and that light propagation in other parts of the system can
be treated by geometrical optics [352, p. 103]. This is the approach we use in the following
calculation.
From Eq. (9.110), the diffracted field V (x , y , z , xo , yo , z o ) at any point (x , y , z ) in the
image space can be calculated from the aperture field V (u , v , xo , yo , z o ) at the exit pupil,
assuming that the exit pupil is located at z = 0 (see Fig. 9.18). It should be noted that both
fields are functions of the location of the point object source (xo , yo , z o ).
k eikr
V (x , y , z , xo , yo , z o ) ≈ V (u , v , xo , yo , z o )du dv , (9.119)
2πi exit pupil r
where ro = (xo − u )2 + (yo − v )2 + z o2 is the distance from a point (u , v ) on the exit
pupil to the ideal image point Q at (xo , yo , z o ), and U (xo , yo , z o ) is a constant proportional to
the intensity of the point source, its magnitude being chosen so that P(0, 0, xo , yo , z o ) = 1.
If the system is diffraction-limited, P(u , v , xo , yo , z o ) = 1 for (u , v ) on the exit pupil, and
P(u , v , xo , yo , z o ) = 0 elsewhere. If the exit pupil is a circular aperture with a radius of a,
then P(u , v , xo , yo , z o ) = 1 for u 2 + v 2 ≤ a 2 and P(u , v , xo , yo , z o ) = 0 for u 2 + v 2 >
a 2 . Thus P is a uniform disk of amplitude 1 inside the circle and 0 outside. The function
P(u , v , xo , yo , z o ) is therefore called the (generalized) pupil function. If P is not a constant,
228 9 Optical image formation
we can express it as
where A is its magnitude and W is its phase. A is called the apodization function and W is
the wavefront aberration. Equation (9.119) becomes:
k eik(r −ro )
V (x , y , z , xo , yo , z o ) ≈ U (xo , yo , z o )P(u , v , xo , yo , z o )du dv . (9.122)
2πi rro
Let d be the distance from the point (x , y , z ) in the image space to the ideal image
point Q , i.e., d 2 = (x − xo )2 + (y − yo )2 + (z − z o )2 , the phase function ik(r − ro ) can
be simplified as follows:
r 2 = (x − u )2 + (y − v )2 + z 2 (9.123)
= (x − xo + xo − u )2 + (y − yo + yo − v )2 + (z − z o + z o )2 (9.124)
= [(xo −u ) 2
+(yo 2
−v) + z o2 ]+ [(x −
xo )2 + (y −
yo )2 + (z −
z o )2 ] (9.125)
+ 2(x − xo )(xo − u ) + 2(y − yo )(yo − v ) + 2(z − z o )z o (9.126)
= ro2 + d + 2(x −
2
xo )xo + 2(y −
yo )yo + 2(z −
z o )z o
− 2[(x − xo )u + (y − yo )v ] (9.127)
= ro2 + d + 2b − 2[(x −
2
xo )u + (y −
yo )v ], (9.128)
≈ e . (9.132)
rro z z o
Equation (9.122) then becomes
V (x , y , z , xo , yo , z o )
U keikd /2zo eikb/zo
2
≈ P(u , v , xo , yo , z o )e−ik[(x −xo )u +(y −yo )v ]/zo du dv , (9.133)
i2π z z o
In imaging applications, the physical quantity we actually measure is the energy flow of
the imaging light, rather than the electromagnetic field strength itself. It can be shown [124,
9.10 Physical optics 229
Section 8.4] that the radiant flux density2 of natural light (incoherent) source for an imaging
system of moderate (not too small) aperture is proportional to |V |2 = V V ∗ , where V ∗ is
the complex conjugate of V . Therefore, the two phase factors, eikd /2zo eikb/zo , will not affect
2
the image irradiance if we restrict the object domain to be around a small neighborhood
of (xo , yo , z o ) and consider only the imaging between a plane z = z o in object space and
a plane at z in image space. Under this condition, Eq. (9.133) shows that the imaging
system has a “space-invariant” point spread function because the field distribution in image
space depends approximately on the distance to the ideal image point, ξ = x − xo and
η = y − yo . Except for the proportionality factor (denoted as s), the point spread function
g̃(ξ, η) of the eletromagnetic field is simply the Fourier transform of the scaled pupil function
evaluated for (xo , yo , z o ):
s
g̃(ξ, η) ≈ P(u , v )e−ik[ξ u +ηv ]/zo du dv (9.134)
λz z o
The system transfer function G̃( f x , f y ) is therefore a scaled version of the pupil function
P(−λz o f x , −λz o f y ). We can further simplify the expressions to make them more convenient
to use. First, we can define the pupil function in a reflected coordinate system, i.e., reversing
the u - and v -axes in Fig. 9.17 and removing the negative signs. Second, the pupil function
is usually defined to be equal to 1.0 at the origin; if we assume that the zero spatial frequency
(the DC component) is always transferred through the system, we can define the system
transfer function as G( f x , f y ), where G(0, 0) = 1.0 = P(0, 0), then
G( f x , f y ) = P(λz o f x , λz o f y ) (9.136)
and the corresponding field impulse response function is
2 The radiant flux has to be measured as a time-averaged quantity. The measurement time is assumed to be
much, much longer than a period of the highest frequency of the light considered.
230 9 Optical image formation
time. The other type of coherence is spatial coherence. Light coming from two spatial
locations can be well correlated in their phase. For example, in the last section, we assumed
that a light wave leaving the exit pupil is a perfect spherical wave for a diffraction-limited
system. The electromagnetic fields at any two spatial points on the exit pupil are thus
assumed to be perfectly correlated. This is a consequence of the source being an ideal point
source. If instead, we use a back-illuminated pinhole of finite size as the “point source”, the
electromagnetic field on the exit pupil will become partially coherent, and the irradiance
on the image plane will be different [868].
From Eq. (9.133), we can calculate the point spread function, h, in image irradiance.
First, let us simplify the notation by rewriting Eq. (9.133) as:
# $
2π
V (ξ, η) ≈ Vo P(u , v ) exp −i (u ξ + v η) du dv , (9.138)
λz o
where the dependence of V , Vo , and P on (xo , yo , z o ) is understood and omitted. The
coordinates ξ = x − xo and η = y − yo are centered at the Gaussian image point (xo , yo ):
(9.141)
where h o = λ2 z o2 Vo Vo∗ . The image irradiance function h(ξ, η) is the image of a point object
Q. If the light coming from any other point in object space is not correlated with that from
Q, then h(ξ, η) can be treated as the point spread function of the system because the radiant
flux from different object points can be added together to find the image irradiance. This is
almost always the case in imaging applications involving natural lighting. However, light
coherence is a continuum, ranging from complete incoherence to complete coherence. A
general treatment should consider all light sources as characterized by partial coherence,
which is a subject that will take us too far away from our immediate interest here. Readers
who are interested in the subject can consult to references [98, 124, 353, 631].
The system transfer function H (νx , ν y ) of the optical imaging system is the Fourier
transform of h(ξ, η):
Again, it is often more convenient to normalize the transfer function with respect to the
9.11 Problems 231
lRnx
lRny
u
zero frequency component. The normalized transfer function is called the OTF:
H (νx , ν y ) P(u, v)P ∗ (u + λRνx , v + λRν y )dudv
O T F(νx , ν y ) = = , (9.143)
H (0, 0) P(u, v)P ∗ (u, v)dudv
where R = z o is the distance from the exit pupil to the Gaussain image plane. In the
diffraction-limited system, the wavefront emerging from the exit pupil is a perfect sphere
centered at the Gaussian image point (xo , yo , z o ). This sphere is called the reference sphere
and its radius is R. Equation (9.143) shows that the OTF is simply the autocorrelation of
the pupil function, P, and its complex conjugate, P ∗ . If the pupil is a circular aperture of
radius a, any frequency ν, such that λRν > 2a, will be completely cut off by the imaging
system because the autocorrelation of the pupil will be zero once P and P ∗ are separated
by a distance greater than twice its radius (see Fig. 9.19).
The cut-off frequency, νc of the system is:
2a 2a 1
νc = ≈ = , (9.144)
λR λf λF
where f is the focal length of the system and F is the ratio of focal length to the diameter of
the pupil. F is known as the f -number of the lens. If the focal length is eight times the pupil
diameter, we say the f -number is f /8. As we will see later, the f -number is a key parameter
in determining the light-collecting capacity or “image brightness” of an optical system.
9.11 Problems
9.1 What are the names and definitions of the six cardinal points, H , H , F, F , N , and
N , of an imaging system?
9.2 Let M be the transition matrix of an optical imaging system from z 1 to z 2 , where
# $
1.0 0.0
M= ,
−0.02 1.0
z 1 = 20 and z 2 = 30. The object space and the image space are both in the air, i.e., the
232 9 Optical image formation
index of refraction n ≈ 1 (see Fig. 9.20). Where are the six cardinal points, H , H ,
F, F , N , and N ?
M
z1 z2
Figure 9.20.
9.3 A spherical surface with a radius of 0.5 is located at z 2 = 22 (see Fig. 9.21). The index
of refraction on the left is n = 1.0 and on the right is n = 1.5. Find the transition
matrix from z 1 = 10 to z 3 = 40. Where are the object focal point and the image focal
point?
z 1 = 10 z 2 = 22 z 3 = 40
Figure 9.21.
9.4 The nodal points, N , N , and the principal points, H , H , of an optical imaging system
are located at z N , z N , z H , and z H , respectively. Let n and n be the indices of refraction
of the object space and the image space. Given that z N − z H = z N − z H = u > 0,
find the transition matrix M from z N to z N , with the elements of M expressed only
in terms of u, n and n .
9.5 The object space and the image space of a given Gaussian imaging system are both in
the air. Let the z-axis be the optical axis of the system. An object with height y = 1
is located at z. Its image is formed at z with height y = −0.1. A ray radiated from
(z, y), at an angle of 0.01 radians relative to the z-axis, arrives at (z , y ) at an angle
of −0.12 radians. Determine the transition matrix M from z to z .
9.6 A telescope has an objective lens, a field lens, and an eye lens, with focal lengths
f 1 , f 2 , and f 3 , respectively (the telescope is used in the air) (see Fig. 9.22). All three
lenses are considered as thin lenses. The objective lens is the outermost lens facing
the objects. The field lens is located at a distance of f 1 behind the objective lens, and
the eye lens is at a distance of f 3 behind the field lens. The objective lens is known to
be the limiting aperture of the telescope, and therefore its image formed by the field
lens and the eye lens is the exit pupil. If a user’s eye is placed at the exit pupil, he/she
can see the entire field of view visible by the telescope.
9.11 Problems 233
f1
f3
10.1 Introduction
In this chapter, we will study lens aberrations and their effects on light distributed on the
image plane. We would like to calculate the image irradiance for a given optical imaging
system, especially when there is defocus because this is the most frequent problem in
consumer images. First, we derive the relation between the scene radiance and the image
irradiance for an ideal optical imaging system which has no lens aberrations and is in perfect
focus. Next, we study how the distribution of light on the image plane is affected by some
defects in the optical imaging process. The theory of wavefront aberrations is formulated
and it is used to calculate the point spread function (PSF) and the OTF in the presence of
focus error. Results from geometrical optics and physical optics are compared.
Some terms are used very often in the discussion of image light distribution. Sometimes,
however, they are defined differently by different authors. We will define some of these
terms here based on the international standard as specified in ISO 9334. The image of an
ideal point object is a two-dimensional function, f (x,y), on the image plane, on which
the coordinates (x,y) are defined. If we normalize this function so that it integrates to 1,
the normalized f (x,y) is the PSF of the imaging system. The Fourier transform of the PSF
is the OTF, F(νx ,ν y ), where νx and ν y are the horizontal and vertical spatial frequencies in
the image plane. By the definition of the PSF, the OTF is equal to 1 at zero frequency, i.e.,
F(0,0) = 1. An OTF can be a complex function. The modulus (i.e., the absolute value) of
the OTF is called the modulation transfer function (MTF), and the argument (i.e., the phase)
is called the phase transfer function (PTF). The normalized image of an ideal line (infinitely
long with no width) is called the line spread function (LSF). It is defined as the integral of
the PSF along the line direction. For example, if the ideal line is along the y-direction, then
∞
L S F(x) = P S F(x, y)dy. (10.1)
−∞
The normalized image of an ideal step edge (0 on one side and 1 on the other side, after
normalization) is called the edge spread function (ESF). It is defined as the integral of the
LSF. For example, if the step edge is along the y direction, then the ESF(x) is defined as
x
ESF(x) = LSF(x )dx . (10.2)
−∞
Strictly speaking, these functions are most useful and meaningful only for linear,
234
10.2 Radiometry of imaging 235
shift-invariant systems. However, for a fixed operating point and signals with small to mod-
erate modulation (say, 35% contrast), the concept of the MTF is often used to characterize a
nonlinear system as well. For example, the photographic MTF (ANSI/PIMA IT2.39-1988)
of a film is often given in product data sheets. For these applications, it is very important to
compare data using a comparable measurement procedure.
One of the most important relations in imaging is the radiometric relation between the scene
radiance, L, and the image irradiance, E . We will call this relation the image irradiance
equation. Because it is so useful, we will first summarize some of the key points in this
relation before we derive it from Gaussian optics. When the object distance is much larger
than the focal length of the system, we have the following relations:
r For a given imaging system, the image irradiance, E , is proportional to the scene
radiance, L.
r For a Lambertian surface parallel to the lens, the image irradiance falls off as cos4 θ,
where θ is the off-axis angle of the object.
r The image irradiance is proportional to the size of the entrance pupil of the imaging
system. (The larger the entrance pupil is, the more light a camera collects.)
r The image irradiance is inversely proportional to the square of the focal length of the
imaging system. (The lateral magnification is proportional to the focal length. The
longer the focal length is, the larger the image area is over which the collected light
flux is distributed.)
An imaging system collects a cone of rays radiating from an elemental area d A in the
object space of the entrance pupil. Figure 10.1 shows the geometry of radiant energy flowing
through an imaging system. The elemental area d A is assumed to be a Lambertian source
dA
with radiance L and its surface normal is parallel to the optical axis of the system. The
elemental area dA is located at a distance h away from the optical axis. Its image dA is
located at a distance h off axis. Let s be the distance from d A to the entrance pupil, and
s be the distance of d A from the exit pupil. Let n and n be the indices of refraction of
the object space and the image space, respectively. Let θ and θ be the angle between the
chief ray and the optical axis in the object space and the image space. Let a and a be the
radii of the entrance pupil and the exit pupil. The total flux, den , collected by the entrance
pupil from dA can be calculated by integrating the small area on the ring over ψ and r . The
vector, v, from dA to the small ring area is [r cos ψ, r sin ψ − h, s]T . Therefore, the cosine
of the angle β between the z-axis (normal to the pupil plane and dA) and the vector v can be
calculated from cos β = s/v, where v is the length of the vector v. The total collected
flux, den , is:
a 2π
r dψdr · cos β
den = L · dA · cos β
r =0 ψ=0 v2
a 2π
r s 2 dψdr
= L · dA ·
r =0 ψ=0 [r 2 cos2 ψ + (r sin ψ − h)2 + s 2 ]2
a
2π (s 2 + h 2 + r 2 )r s 2 dr
= L · dA ·
r =0 [(s + h + r ) − 4h r ]
2 2 2 2 2 2 3/2
πL s2 + h2 − a2
= 1− dA.
2 (s 2 + h 2 + a 2 )2 − 4h 2 a 2
We can also write a similar equation in image space:
π L s 2 + h 2 − a 2
dex = 1− dA . (10.3)
2 (s 2 + h 2 + a 2 )2 − 4h 2 a 2
Assuming that there is no loss in the imaging system, dex = den and the image irradiance,
E , at dA is
dex den πL s2 + h2 − a2 dA
E = = = 1− , (10.4)
dA dA 2 (s + h + a ) − 4h a
2 2 2 2 2 2 dA
or, equivalently,
dex π L s 2 + h 2 − a 2
E = = 1− . (10.5)
dA 2 (s 2 + h 2 + a 2 )2 − 4h 2 a 2
From the radiance theorem in Section 3.4, we know that the basic radiance L/n 2 is conserved
as it propagates through lossless media, and we have L/n 2 = L /n 2 . Therefore,
π L n 2 s 2 + h 2 − a 2
E = 1− . (10.6)
2 n (s 2 + h 2 + a 2 )2 − 4h 2 a 2
The above equations are quite general, but the required quantities are not easily measurable.
For practical purposes, there are two special cases of the image irradiance equations that
are more useful. These are discussed in the next two sections.
10.2 Radiometry of imaging 237
N A = n sin α . (10.9)
The larger the numerical aperature, the brighter the image. Therefore, it is often used to
specify the speed of the optical imaging system. The other quantity that is also often used
to quantify the speed is the relative aperture, F, commonly known as the f -number. It is
defined as
1 1
F= = . (10.10)
2n sin α 2(N A)
When the object point is located at infinity, the distance, s , between the exit pupil and the
image point is often assumed to be f , the image focal length, and
1 1 1 f 1 f
F=
≈ −1
≈ = , (10.11)
2n sin α 2n sin[tan (a / f )] n 2a mpn 2a
where m p = a /a is the pupil magnification. (Note that we use F for the relative aperture
of the general case, and F for the special case when the object is located at infinity.) For an
imaging system in which the index of refraction of the object space and the image space
are both in the air, n = n ≈ 1 and F ≈ f /D , where D = 2a is the diameter of the exit
pupil. It should be noted that it is also often defined as F = f /D, where D is the diameter
of the entrance pupil. Clearly, this is acceptable only when m p ≈ 1, which may not be true
for many imaging systems (for examples, see [786, p. 104]). The implicit assumption is that
m p is treated as yet another proportionality constant to be calibrated for the system.
It should also be pointed out that the numerical aperture and the relative aperture as
defined above are properties of the light beam converging at the image point, rather than the
imaging system itself. The reason is that the size of the cone of rays focusing on an image
point is a function of the object distance. In order to avoid such ambiguities, the numerical
aperture and the relative aperture are often defined for an object located at infinity. Using
such definitions, we can write Eq. (10.7) for an object located at infinity as
2
L π L π D2 mp
E = π (N A)2 = = L. (10.12)
n2 4F 2 n2 4 f
238 10 Lens aberrations and image irradiance
Therefore, E is proportional to the scene radiance L and the area of the entrance pupil,
π D 2 /4. Also, the longer the focal length we use in imaging, the darker our image will be.
For a close object, the distance, s , between the exit pupil and the image point deviates
from f significantly, and we need to use Eq. (10.7) to calculate the image irradiance. If
the location of the exit pupil is not known, s is often approximated by the distance from
the image plane to the image principal plane, and from Eq. (9.26), s ≈ (m − 1) f , where
m is the lateral image magnification (note: m and f are often negative numbers). Equation
(10.7) can then be expressed as
π L π L
E = ≈ , (10.13)
4F 2 (m − 1)2 q n2 4F 2 (m − 1)2 n2
where q = 1 + 1/[4(m − 1)2 F 2 n 2 ] is a factor close to 1. For example, at f /2.8 (i.e., F =
2.8), q ≈ 1.03 when n = 1.
pp. 212–213]. The first two affect image points at all locations, but the third process affects
the image irradiance more when it is further away from the optical axis. Vignetting occurs
because part of the imaging aperture is occluded by mechanical parts from image points
that are off the optical axis. The image is darker than can be accounted for by the cos4 θ
fall-off. A general image irradiance equation takes these factors into account:
# $
kπ T V (x , y ) cos4 θ L(x, y)
E (x , y ) = + g(x , y ), (10.18)
4F 2 (m − 1)2 n 2
where (x, y) and (x , y ) are the object and image coordinates, T is the transmittance of the
lens, V (x , y ) is the vignetting attenuation factor, g(x , y ) is the stray light (flare, glare)
component due to scattering from lens, camera body, sensor, etc., and the proportionality
constant, k, can be calibrated to include the pupil magnification and other factors. Another
popular form of the equation can be derived from Eq. (10.16):
# $
kπ T V (x , y ) f 2 cos4 θ L(x, y)
E (x , y ) = + g(x , y ), (10.19)
4F 2 v 2 n 2
where f is object focal length and v is the image distance.
In Chapter 9, we showed that the OTF can be calculated from the autocorrelation of the
(generalized) pupil function, P. For an ideal diffraction-limited system, the pupil function
is equal to 1.0 within the exit pupil and 0 elsewhere. However, for real lenses, the pupil
function is often a complicated function, consisting of various aberrations. The shape of
the pupil function is traditionally approximated by a polynomial and different terms in the
polynomial are given different names. Deviations of the pupil function from a constant for
a given wavelength are called monochromatic aberrations. The other type of aberration is
caused by lens dispersion and is called chromatic aberration.
Y′
exit pupil
h q y′
x r
X′
x′
ar
a image plane
Figure 10.2. The coordinates system for the generalized pupil function.
(x, h)
(x′, y′)
image plane
center of
exit pupil actual wavefront
reference sphere
exit pupil
Figure 10.3. The deviation, W , of the actual wavefront from the reference sphere is called the
wavefront aberration.
measure of the nonuniformity of the emerging wave. For a lens with aberration function
W , P(ξ, η; x , y ) = A(ξ, η; x , y ) exp[ikW (ξ, η; x , y )]. The apodization function,
A(ξ, η; x , y ), is usually equal to 1 for camera lenses. For an aberration-free lens,
P(ξ, η; x , y ) equals 1 if ξ 2 + η2 ≤ a 2 , where a is the radius of the exit pupil.
For a rotationally symmetric system, the image position on the image plane may be
specified by a single variable r = (x )2 + (y )2 and points in the exit pupil by the radial
and azimuthal coordinates ρ and θ, where ξ = aρ sin θ, η = aρ cos θ, and 0 ≤ ρ ≤ 1.
Without loss of generality, we assume that the image point of interest is on the y -axis, i.e.,
x = 0. Therefore, θ is defined relative to the image point and the η-axis is parallel to the
y -axis. From considerations of symmetry the wavefront aberration function exhibited by
such a system is a function of three variables: r 2 , ρ 2 , and rρ cos θ. (For a discussion of this
10.3 Light distribution due to lens aberrations 241
symmetry argument, see, e.g., [682, pp. 206–7]). The wavefront aberration function, W , may
be expanded as a power series in the above three variables. An alternative is also possible
using the Zernike circle polynomials to expand the function W (see [124, pp. 464–8]). The
power series expansion gives us:
W (r 2 , ρ 2 , rρ cos θ) = W020 ρ 2 + W111rρ cos θ
+ W040 ρ 4 + W222r 2 ρ 2 cos2 θ + W220r 2 ρ 2 + W131rρ 3 cos θ
+ W311r 3 ρ cos θ + W060 ρ 6 + · · · (10.21)
where the subscripts represent the powers of r , ρ, and cos θ. The first two terms represent
the defocus and the tilt respectively. The next five terms represent the five third order
(Seidel) aberrations: spherical aberration, astigmatism, curvature, coma, and distortion.
Higher orders are also considered in lens design programs. By careful adjustment of a
system’s physical parameters (e.g. the shapes, thicknesses, separations of the lenses, glass
types, and the locations of the stops), these aberrations can often be minimized.
We now briefly describe the five Seidel aberrations and their corresponding characteristic
image patterns which can be derived from their respective polynomial terms. Let (ξ, η) =
(aρ sin θ, aρ cos θ) be a point on the exit pupil and (x , y ) be a point on the image plane.
Then, let us trace an arbitrary ray from the object point and assume that it emerges from the
exit pupil at (ξ, η). If there is no aberration, this ray will go through the Gaussian image point
at G(x , y ). If there is an aberration, the ray will intersect with the image plane at a different
point Q(x + x , y + y ). The distance between G and Q is called the ray aberration.
It can be shown [682, pp. 212–20] that the ray aberration (x , y ) can be approximately
related to the wavefront aberration function, W (ξ, η; x , y ), by the following equations:
R ∂W
x = − , (10.22)
n ∂ξ
R ∂W
y = − , (10.23)
n ∂η
where R is the radius of the reference sphere (i.e., the distance from G to the center of the
exit pupil) and n is the index of refraction of the image space. From these two equations, we
can roughly determine the image pattern caused by the aberration. (These patterns are the
results of ray tracing and therefore are the predictions of geometrical optics.) In Figs. 10.4–
10.8, which explain the five Seidel aberrations, a single lens is used for illustration in place
of the exit pupil to show how different rays from an object point may deviate from the
Gaussian optics due to the aberrations. In practice, the exit pupil should replace the lens in
the diagrams.
lens
Therefore,
# $2
n ax 2
n ay
+ = 1. (10.28)
2RρW020 2Rρ(W020 + W222r 2 )
10.3 Light distribution due to lens aberrations 243
meridional rays
sagittal rays
T
lens S
object
Figure 10.5. Astigmatism: the meridional rays and the sagittal rays have different focal points, T
and S.
field curvature
lens
object
The rays from a circle with a radius of ρ on the exit pupil intersect with a defocused image
plane at an ellipse. The semi-axes of the ellipse depend on the position of the image plane,
i.e., the amount of defocus, W020 .
tangential sagittal
focal point focal point
Gaussian
coma image point
pattern
Ga
u
lens foc ssian
al p
lan
e
object
pincushion barrel
distortion distortion
Figure 10.8. Distortion.
Therefore,
2 2
W131 Rr 2 2W131 Rrρη
(x )2 + y + ρ = . (10.33)
na na2
Rays from a circle with a radius of ρ on the exit pupil intersect the image plane at another
circle with a radius of 2W131 Rrρη/(n a 2 ). However, the center of the circle on the image
plane is shifted by an amount proportional to ρ 2 .
The five Seidel aberrations are only one way to expand the wavefront aberration into
polynomials of r , ρ, and cos θ. Even in a well-designed lens, small amounts of all these
five as well as higher order terms are present at the same time. We now turn our attention
to a case study of focus error as an example of how to calculate the OTF and the PSF from
the wavefront aberration function. It is a very instructive case study because focus error is
one of the major problems in consumer imaging applications. We will also use this case
to compare the results from wave optics and from geometrical optics to show that one of
the assumptions in geometrical optics is violated when the rays converge to an ideal point,
because the spatial dimension of interest is no longer much larger than the wavelength.

Figure 10.9. The geometry of a defocused image point: an exit pupil of radius a (diameter 2a) and an out-of-focus distance z.
1 Sections 10.3 and 10.4 are a slightly modified version of part of a paper [564] published in Optical Engineering.
Readers interested in more details should read the paper.
Before we compute the OTF of a defocused perfect lens, we have to normalize the units
in pupil coordinates and spatial frequencies so that the results will be independent of focal
length and aperture size. Let us define the following variables: f, focal length; a, radius of the
exit pupil; F = f/(2a); φ, the half-angle of the cone subtended by the exit pupil at the image
point; νx, νy, horizontal and vertical spatial frequency in the image plane; ux, uy, normalized
horizontal and vertical spatial frequency; and Δ = W020/(λ/4), a measure of the degree of
defocus. Since we are only interested in the circularly symmetric system, we can specify
the spatial frequency by ν and u (normalized ν), without specifying the angular orientation.
The normalized spatial frequency u is defined as u = λν/(2 sin φ), where ν = (νx² + νy²)^{1/2}.
Assuming that φ is small so that φⁿ can be neglected for all n > 2, that the out-of-
focus distance z (Fig. 10.9) is small compared with R, and that the image distance R is
approximately equal to the focal length, the following relations can be derived:

u = λFν,   (10.39)
Δ = z/(2λF²).   (10.40)
It should be emphasized that the above approximation is good only for small values of z. In
fact, Δ = W020/(λ/4) is not symmetric with respect to z, as implied by Eq. (10.40). That is,
the magnitude of W020 will differ at equal distances either side of the true Gaussian image
plane, as can be seen from Eq. (10.38). The absolute values of W020 and Δ are larger when
z is negative than when it is positive. That is, at equal distances the OTF deteriorates much
faster when the film is moved towards the lens than it does if the film is moved away from
the lens. This is observed in experimental measurements.
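As a quick numerical illustration of Eq. (10.40) (a sketch, not part of the original text; the function name is ours), the defocus measure Δ can be computed from the out-of-focus distance z, the wavelength, and the f-number:

```python
# Sketch: defocus measure Delta = z / (2*lambda*F^2), Eq. (10.40).
# Valid only under the small-angle approximations stated in the text.

def defocus_delta(z_m, wavelength_m=500e-9, f_number=8.0):
    """Return Delta = W020/(lambda/4) for an out-of-focus distance z (metres)."""
    return z_m / (2.0 * wavelength_m * f_number**2)

if __name__ == "__main__":
    # Example: a 1 mm focus error at f/8 and 500 nm.
    print(defocus_delta(1e-3))   # about 15.6 -> already a severe defocus
```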
It can be shown that for a defocused perfect lens, the OTF can be computed from the
following equation which is derived from Eq. (9.143) by considering only νx and making
an appropriate change of variables (for details, see [410, 588]):
$$H(u, \Delta) = \frac{4}{\pi} \int_{u}^{1} (1 - x^2)^{1/2} \cos[2\pi\Delta u (x - u)]\,dx, \qquad (10.41)$$
where J0 is the Bessel function of the first kind of order 0. It should be noted that τ is the
normalized spatial distance in the image plane, and is related to the real distance r by the
relation r = λFτ. For a diffraction-limited system, Δ = 0 and

$$h(\tau, 0) = \frac{\pi}{4}\left[\frac{2 J_1(\pi\tau)}{\pi\tau}\right]^2, \qquad (10.44)$$
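The defocused OTF of Eq. (10.41) has no simple closed form, but it is easy to evaluate numerically. The following sketch (our illustration, with function names of our choosing) integrates Eq. (10.41) with standard quadrature and also evaluates the Airy PSF profile of Eq. (10.44):

```python
# Sketch: numerical evaluation of the defocused-lens OTF, Eq. (10.41), and of the
# in-focus (Airy) PSF profile, Eq. (10.44).  Function names are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.special import j1

def otf_defocus(u, delta):
    """H(u, Delta) = (4/pi) * integral from u to 1 of sqrt(1 - x^2) *
    cos[2*pi*Delta*u*(x - u)] dx, for a normalized frequency 0 <= u <= 1."""
    if u >= 1.0:
        return 0.0
    val, _ = quad(lambda x: np.sqrt(1.0 - x * x) *
                  np.cos(2.0 * np.pi * delta * u * (x - u)), u, 1.0)
    return 4.0 * val / np.pi

def airy_psf(tau):
    """h(tau, 0) = (pi/4) * [2*J1(pi*tau)/(pi*tau)]^2; the limit at tau = 0 is pi/4."""
    x = np.pi * tau
    if x == 0.0:
        return np.pi / 4.0
    return (np.pi / 4.0) * (2.0 * j1(x) / x) ** 2

if __name__ == "__main__":
    print(otf_defocus(0.2, 0.0))    # diffraction-limited response at u = 0.2 (about 0.75)
    print(otf_defocus(0.2, 20.0))   # the same frequency with severe defocus (Delta = 20)
    print(airy_psf(1.220))          # essentially zero: the first dark ring of the Airy pattern
```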
Figure 10.10. The amplitude as a function of the normalized distance of the Airy pattern: (a) the
amplitude profile; (b) a magnified view of the details of the ring structure.
where J1 is the Bessel function of the first kind of order 1. The diffraction-limited PSF has
a central bright disk, surrounded by many rings. It is called the Airy pattern, after G.B. Airy
who published the result in 1835. Figure 10.10 shows the profile of the Airy pattern. The
function J1 (x)/x has a first zero at x = 3.833 and therefore the first dark ring occurs at a
normalized distance of τ = 3.833/π = 1.220, which translates into the real distance r at
the image plane as r = λFτ = 1.220λF. This distance is often used as a measure of the
resolving power of an imaging system, and is called the Rayleigh criterion: this says that
if the images of two object points are separated by a distance r in the image plane, i.e., if
the brightest center of one point image is at the first dark ring of the other point image, the
two points can be resolved by the imaging system [881, Chapter 9]. This is a reasonable,
but somewhat arbitrary, criterion for at least two reasons: (1) most imaging systems are not
diffraction limited, and (2) whether two points can be resolved depends on the local image
contrast. If the two points are to be resolved by a human observer, human contrast sensitivity
has to be taken into consideration. The Rayleigh criterion can also be expressed as an angular
resolution:

$$\theta \approx \frac{r}{f} = 1.220\,\frac{\lambda}{D}, \qquad (10.45)$$

where f is the focal length and D is the diameter of the aperture.

Figure 10.11. The geometric diagram of the blur circle.
Comparison of the defocus models from wave optics and geometrical optics
Now let us compare the results from wave optics and geometrical optics. Because the PSF
from geometrical optics is so much simpler than that from wave optics, one would like to
see if the former is a good approximation of the latter. The answer is that geometrical optics
is a good approximation when the blur circle is large (say, Δ > 100). Otherwise, we have
to carefully specify what our error tolerance is. Figures 10.12–10.17 show the comparison
of the results from geometrical optics and wave optics. The PSF and the OTF at various
degrees of defocus, Δ, are compared.
It is interesting to note that the low-frequency part of the OTFs from wave optics and
geometrical optics move closer and closer to each other as the defocus becomes more and
more severe. The low-frequency responses (frequencies below the first zero crossing) from
these two models are almost identical to each other for Δ > 20 (see Fig. 10.15). However,
the high-frequency parts of the OTFs are never similar to each other. Two questions naturally
arise: (1) In consumer photography, is a typical focus error, Δ, often greater than 20? (2)
Are the differences in the high-frequency response visible in the final scanned images? The
answer to the first question is “not very often”. The answer to the second question has to
take into consideration the image blur introduced by other system components, e.g. the film
and the scanner.
Let us consider the case of a typical camera for 35 mm film. Let the focal length of
the lens be 50 mm, the f -number, F, be 8.0, and the subject be 3 m away, i.e., s = 3 m.
Let us assume that the average effective wavelength is 500 nm. From Eq. (10.50), we have
Δ = 13.02Δp. In order to have Δ > 20, the focus error Δp has to be greater than 150%,
Figure 10.12. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 0; (a) the geometrical PSF is an impulse at the origin with unit amplitude; the wave PSF is the
solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
meaning that we have to mistakenly focus the camera at a distance closer than 1.2 m, when
the subject is actually 3 m away. A focus error of this magnitude does occur, but only rarely.
Most errors are more likely to be less than 40%, i.e., Δp < 0.4, and the corresponding values
of Δ are less than 5.2. Clearly, the use of the geometrical approximation is often not justified in
consumer photography. Even for a serious focus error, say Δ = 40, it is not clear whether
geometrical optics is good enough for a high-resolution digital image. The two outermost
rings within the blur circle of the PSF as predicted by wave optics in this case have a
peak-to-peak spacing of 28.8 µm, with a contrast, (Imax − Imin)/(Imax + Imin), of more than
25%. A digital image scanned from a 34 mm by 22 mm area of a 35 mm negative with a
resolution of 2000 pixels per line (longer dimension) has a pixel spacing of 17 µm, which
could be capable of resolving the outer rings of the PSF.
Figure 10.13. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 3; (a) the geometrical PSF (dashed curve) is a uniform disk centered at the origin; the wave PSF
is the solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
Figure 10.14. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 10; (a) the geometrical PSF (dashed curve) is a uniform disk centered at the origin; the wave
PSF is the solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
It is usually assumed that if a point object is imaged into a disk with a diameter of less than
0.027 mm on the negative, the focus is acceptable. From this point of view, the term depth
of field can be defined as the range of depth in the field around the focused object distance
for which a point object will produce a disk image of diameter less than 1 minute of arc on
the final print under standard viewing conditions.
Assuming our eyes cannot resolve two points closer than 2b on the image plane, i.e. if
we can tolerate a blur circle of radius b, and the camera is focused at the object distance so,
then any point at a distance between Snear and Sfar is imaged into a circle with radius less
than b. From Eq. (10.48), assuming a ≫ b, we can calculate the depth of field DF from
Figure 10.15. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 20; (a) the geometrical PSF (dashed curve) is a uniform disk centered at the origin; the wave
PSF is the solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
It is easy to see from Eqs. (10.53) and (10.54) that if one keeps the focal length and the
Figure 10.16. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 40; (a) the geometrical PSF (dashed curve) is a uniform disk centered at the origin; the wave
PSF is the solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
subject distance constant, increasing the aperture radius a has the effect of reducing the
depth of field, as is well known in photography. If the camera is focused at an object
distance so = (1 + a/b) f , then Sfar becomes ∞, and Snear becomes (1 + a/b) f /2. This
distance, so = (1 + a/b) f , is called the hyperfocal distance [365]. A fixed-focus camera
can be made to focus at so = (1 + a/b) f and any object from a distance of so /2 to infinity
will be “well” in focus.
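The hyperfocal-distance rule just described is easy to turn into a small calculation. The sketch below is illustrative (the numerical values are ours, chosen only as an example), with the aperture radius obtained from the focal length and the f-number:

```python
# Sketch: hyperfocal distance s_o = (1 + a/b) * f, as described above.
# a = aperture radius = f / (2N) for f-number N; b = tolerable blur-circle radius.
# The numbers below are illustrative, not taken from the book.

def hyperfocal_distance(focal_length_m, f_number, blur_radius_m):
    a = focal_length_m / (2.0 * f_number)          # aperture radius
    return (1.0 + a / blur_radius_m) * focal_length_m

if __name__ == "__main__":
    f, N, b = 16e-3, 4.0, 5e-6                      # 16 mm lens at f/4, 5 um blur radius
    s_o = hyperfocal_distance(f, N, b)
    print(s_o)          # about 6.4 m
    print(s_o / 2.0)    # everything from about 3.2 m to infinity is "well" in focus
```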
In digital camera applications, one is often tempted to define the depth of field using the
sensor pixel size as the blur circle. This practice is not meaningful when the pixel size is
smaller than the size of the Airy disk (d = 2.44λF). For example, if the camera uses an
f /2.8 lens and a CCD sensor with a pixel size of 2.6 µm, the blur circle cannot be the pixel
size because the Airy disk diameter for λ = 500 nm would be 3.4 µm. A correct definition
Figure 10.17. The PSF and the OTF computed from wave optics and geometrical optics at focus error
Δ = 100; (a) the geometrical PSF (dashed curve) is a uniform disk centered at the origin; the wave
PSF is the solid curve; (b) geometrical OTF (dashed curve), wave OTF (solid curve).
of the depth of field has to start from the intended image magnification as viewed by an
observer with one minute of arc visual resolution. We can then calculate the size of the blur
circle on the sensor plane. Only when the calculated blur circle is smaller than a pixel can
we use the pixel size as the blur circle.
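The comparison described above can be made concrete with a few lines of code (an illustrative sketch reproducing the f/2.8, 500 nm, 2.6 µm example quoted in the text):

```python
# Sketch: compare the Airy disk diameter d = 2.44 * lambda * F with a sensor pixel size,
# following the f/2.8, 2.6 um example in the text (wavelength 500 nm).

def airy_disk_diameter(wavelength_m, f_number):
    return 2.44 * wavelength_m * f_number

if __name__ == "__main__":
    d = airy_disk_diameter(500e-9, 2.8)
    pixel = 2.6e-6
    print(d * 1e6)      # about 3.4 um
    print(d > pixel)    # True: the pixel cannot serve as the blur circle here
```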
So far we have mostly assumed that the rays make only small angles with the optical axis (the paraxial approximation),
so that we can simplify the relations and obtain a good intuitive understanding. In reality, we
have to consider those rays that make moderate angles with the optical axis. For example, a
35 mm camera with a focal length of 50 mm has to produce good images on the 24 mm ×
36 mm film area. The diagonal corners form angles up to 23.4◦ , which is about 0.4083
radians. The sine value of that angle is 0.3970 and the cosine is only 0.9178. The error of
the paraxial approximation is from 2.8% to 8.9% even at this moderate angle. If the object
point is located off-axis, rays coming from it can make even larger angles. In order for these
rays to converge at the same image point, the optical system has to satisfy the sine condition
[682, pp. 112–14]: n y sin θ = n′ y′ sin θ′, where n, y, and θ are the index of refraction, the
off-axis distance of the object point, and the angle that the ray makes with the axis, all in
the object space, while n′, y′, and θ′ are the corresponding quantities in the image space.
The sine condition is an important requirement for an optical imaging system to form a
sharp image of an extended object near the optical axis.
For two thin lenses with focal lengths f1 and f2 separated by a distance d, the combined focal length f is given by

$$\frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{d}{f_1 f_2}. \qquad (10.56)$$
Assuming that the index of refraction for lens 1 is n1 and that for lens 2 is n2, and that the
medium surrounding the lenses is air with a refractive index close to 1, the focal length of
a thin lens is inversely proportional to the difference in index of refraction between the lens material and the surrounding air.
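As a small illustration of Eq. (10.56) (a sketch with values of our choosing, not taken from the book), the combined focal length of two thin lenses separated by a distance d can be computed as follows:

```python
# Sketch: combined focal length of two thin lenses separated by d, Eq. (10.56):
# 1/f = 1/f1 + 1/f2 - d/(f1*f2).

def combined_focal_length(f1, f2, d):
    return 1.0 / (1.0 / f1 + 1.0 / f2 - d / (f1 * f2))

if __name__ == "__main__":
    # Two 100 mm thin lenses: in contact, then separated by 20 mm (illustrative values).
    print(combined_focal_length(0.100, 0.100, 0.0))    # 0.05 m (50 mm)
    print(combined_focal_length(0.100, 0.100, 0.020))  # about 0.0556 m
```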
10.4 Optical blur introduced by the camera
Light distribution on the image plane is also affected by factors other than the optics. These
factors are related to the mechanical structure of the camera and its relative motion with
respect to the objects and the scene. Here, we will examine these other factors.2
A camera is basically a light-tight box with a lens to form an image, a diaphragm to
control the aperture size, a shutter to limit the exposure time, and a means of holding the
sensors and/or the medium (e.g. film) to record the image. The lens, the diaphragm, and
the shutter, together, are responsible for producing the optical image on the film. They
are tightly coupled in their optical effects and usually cannot be analyzed as independent
components.
2 This section is a slightly modified version of part of a paper [564] published in Optical Engineering.
widely used. The MTF responses of several selected spatial frequencies and azimuth angles
are computed as functions of the out-of-focus distances. The OTF of a real lens depends
on many other variables than just the out-of-focus distance z. To completely characterize a
lens, a set of OTFs have to be used for the following reasons [741]:
1. The incident light for imaging is almost never monochromatic, and it is necessary to
specify the light source for the MTF measurement.
2. The best focus position is almost impossible to determine exactly because of the aber-
rations. For a given spatial frequency, there is a flat maximum on the transfer function
magnitude vs. focusing distance curve. Furthermore, the maximum does not occur at
the same focusing distance for different spatial frequencies.
3. The OTF is a function of the field position. It depends on the angle between the chief
ray for the object and the optical axis. It is, therefore, necessary to measure the MTF
for a number of field positions (usually three or four) to cover the range of interest.
4. It is also necessary to measure the MTF at various distances around the best focus,
because aberrations prevent the direct calculation of the curves from the “best focus”
MTF curves.
5. Optical image formation for extra-axial rays is more or less astigmatic and therefore
the tangential and radial spatial frequencies have different transfer responses. Separate
MTFs have to be measured along these two directions.
6. Objects at different distances are imaged at different magnifications. The MTF mea-
sured with the lens focused on infinity is different from that obtained when the lens
is focused on a nearby object. It is necessary to measure the lens MTF at several
magnifications.
The above discussion raises the question of the number of MTFs necessary for the com-
plete description of a photographic lens. Assuming that four field positions, three apertures,
five focusing distances, two magnifications, and three illuminants are required, then a total
of 360 MTF curves have to be measured. Unfortunately, individual curves cannot be de-
duced from the others but must be measured separately. This clearly shows how difficult it
is to model the image blur accurately, even if the blur is caused only by the lens optics. For
digital image restoration, one does not know the exact depth of an object, the individual
lens aberrations, the spectral distribution of the incident light, or any manufacturing defects
associated with the system with which the image was taken. It is not realistic to assume that
one can have complete knowledge of the lens blur. A reasonable approach to modeling is
to use the functions derived from the diffraction-limited lens and adjust certain parameters,
such as aperture size and defocus distance, to approximate the real blur functions.
The MTF of a real camera system under a small degree of defocus is quite difficult
to predict accurately unless the exact wavefront aberration function of the lens is known.
However, when the defect of focus is severe, its effect tends to dominate other aberration
terms, and thus becomes more susceptible to analytic approximation. Since information
about the exact wavefront aberration function of a camera lens is often unknown, we can
only model the blur function for a diffraction-limited lens under the various focus errors.
This serves as the best case study for small focus defects.
It has been proposed that the MTF of a very good camera lens at a given aperture may be
approximated by reducing the aperture diameter of the corresponding diffraction-limited
lens by a factor of 1.4–2. For example, the MTF of a good photographic lens at f /5.6
is about equivalent to that of an f /12 diffraction-limited lens. If one tests this idea, one
finds it necessary to have different factors for a lens at different defocus distances. A better
approximation of a good quality lens is to use the OTF of a stopped-down diffraction-limited
lens with a slight focus error, say Δ = 1. When the defocus error is severe (Δ larger than
5), the real aperture size can be used and the MTF computed from Eq. (10.41) is a good
approximation to the measured one. This seems to confirm one’s intuition that when the
defocus error, W020 , is large, it will dominate all other aberration terms, as can be seen from
Eq. (10.21).
Figure 10.18. The two major types of shutter: the interlens shutter and the focal plane shutter.
A shutter can be placed before a lens, between the lens components, or at any point between the lens and
the focal plane of the camera. There are two major types of shutter: the interlens shutter and
the focal plane shutter. Both types of shutter affect the effective camera OTF. In fact, some
shutters have been intentionally designed to shape the camera OTF in a desirable manner
[764].
Let the shutter function be S(t) (normalized to equal 1.0 at the full aperture) and the OTF be L(u). Then Eq. (10.59) becomes:

$$\bar{L}(u) = \frac{1}{T} \int S(t)\,L[u/S(t)]\,dt. \qquad (10.60)$$
Figure 10.20. The effect of a focal plane shutter on the lens OTF can be modeled as a modification
of the shape of the exit pupil.
OTF, the exact integral as shown in Eq. (10.71) has to be evaluated numerically with p(t)
and q(t) set to 0. In practice, for the case of w > w f − (d/F), the shutter effect on the total
system transfer function is negligible because the shutter slit is wide open compared with
the diaphragm and does not limit the passage of any imaging light.
It is not clear whether Shack’s assumption that the focal plane shutter can be treated as a
pupil plane shutter is valid in practice because the distance d between the image plane and
the shutter plane is about 5 mm, while the shutter slit is usually wider than 10 mm (the field
is observed too close to the aperture). It is necessary to gather experimental data in order to
verify Eq. (10.61). Obviously, the OTF can no longer be rotationally symmetric when the
camera uses a focal plane shutter. The question is how much effect it has on the final OTF.
The majority of today’s designs have a slit of variable width, traveling at a constant speed.
The speed can be as high as 6.7 m s−1 , as in the Nikon FM-2 camera, while other cameras
may have a lower speed. Assuming that the speed is 6 m · s−1 and the exposure time is set
to 1/500 s, the slit width will then be 12 mm. If d is 5 mm, f is 50 mm, then c = 13.44 for
f /2.8, and c = 76.8 for f /16. Recall that the normalized cut-off frequency is 1.0 when the
OTF of an ideal lens drops to zero, and, therefore, the effect of the focal plane shutter is
indeed negligible in most cases when there is no object motion. When the exposure time
is reduced to 1/4000 s, the slit width becomes 1.5 mm, and c = 1.68 for f /2.8. The effect
of the shutter on the system OTF then has to be considered. For more information about
typical shutter operations, Chapter 13 of [365] has a good but not too rigorous discussion.
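The defining equation for the parameter c is not reproduced in this excerpt; the sketch below therefore assumes the relation c = 2wF/d, which is stated here only because it reproduces the three values quoted above (13.44, 76.8, and 1.68), and should be read as a consistency check rather than as the book's formula:

```python
# Sketch: focal-plane-shutter slit width and the parameter c quoted in the text.
# ASSUMPTION: c = 2*w*F/d, chosen because it reproduces the quoted values;
# the defining equation is not shown in this excerpt.

def slit_width(speed_m_per_s, exposure_time_s):
    return speed_m_per_s * exposure_time_s

def shutter_parameter_c(w_m, f_number, d_m):
    return 2.0 * w_m * f_number / d_m

if __name__ == "__main__":
    d = 5e-3                                   # shutter-to-image-plane distance
    w = slit_width(6.0, 1.0 / 500.0)           # 12 mm slit at 1/500 s
    print(shutter_parameter_c(w, 2.8, d))      # 13.44 at f/2.8
    print(shutter_parameter_c(w, 16.0, d))     # 76.8  at f/16
    w2 = slit_width(6.0, 1.0 / 4000.0)         # 1.5 mm slit at 1/4000 s
    print(shutter_parameter_c(w2, 2.8, d))     # 1.68  at f/2.8
```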
of the instantaneous lens OTF with the projected image of the motion trajectory. The analysis
of object motion in the presence of shutter motion is simple for the interlens shutter, but
fairly complex for the focal plane shutter when the diffraction effect has to be considered.
and
$$G(u_x, u_y) = \frac{1}{T} \int S(t)\,L\!\left[u_x/S(t),\, u_y/S(t)\right] \exp\{-i2\pi[u_x p(t) + u_y q(t)]\}\,dt. \qquad (10.65)$$
For the simplified case in which S(t) is a rectangle with amplitude 1.0, and the motion is a
linear uniform motion, i.e. p(t) = vx t and q(t) = v y t, Eq. (10.65), after normalization with
G(0, 0), becomes
where sinc(x) = sin(π x)/(π x). Since dx = vx T and d y = v y T are the normalized distances
traveled by the image point during the entire exposure time T , Eq. (10.66) can be simplified
to
Equation (10.66) is a familiar simple model often used for motion blur. For a high-
efficiency interlens shutter (i.e., S(t) close to a rectangle function), this tends to be a very
good approximation. It should be pointed out that Eqs. (10.63) and (10.66) are derived for
the region inside the moving object. They are also useful for translational camera motion
when the scene is stationary.
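For reference, the familiar separable sinc model mentioned above can be sketched as follows. Because the exact normalization of Eq. (10.66) is not reproduced in this excerpt, the code implements the generic textbook form |sinc(dx·ux)·sinc(dy·uy)| as an illustration only:

```python
# Sketch: the familiar sinc model for uniform linear motion blur with an ideal
# (rectangular) shutter.  d_x, d_y are the normalized distances travelled by the
# image point during the exposure; sinc(x) = sin(pi*x)/(pi*x).
# This is the generic textbook model, given here for illustration.
import numpy as np

def motion_blur_mtf(u_x, u_y, d_x, d_y):
    # np.sinc already includes the factor pi: np.sinc(x) = sin(pi*x)/(pi*x).
    return np.abs(np.sinc(d_x * u_x) * np.sinc(d_y * u_y))

if __name__ == "__main__":
    print(motion_blur_mtf(0.0, 0.0, 2.0, 0.0))   # 1.0 at zero frequency
    print(motion_blur_mtf(0.5, 0.0, 2.0, 0.0))   # 0.0: first zero at u_x = 1/d_x
```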
$$H_t(u_x, u_y) = \iint s\!\left(x + vt + \tfrac{1}{2}u_x\right) P\!\left(x + \tfrac{1}{2}u_x,\, y + \tfrac{1}{2}u_y\right) s^*\!\left(x + vt - \tfrac{1}{2}u_x\right) P^*\!\left(x - \tfrac{1}{2}u_x,\, y - \tfrac{1}{2}u_y\right) dx\,dy. \qquad (10.69)$$
where

$$K(u_x, u_y, x) = \frac{1}{T} \int s\!\left(x + vt + \tfrac{1}{2}u_x\right) s^*\!\left(x + vt - \tfrac{1}{2}u_x\right) \exp\{-i2\pi[u_x p(t) + u_y q(t)]\}\,dt. \qquad (10.72)$$
For the case of a linear uniform motion (say, p(t) = vx t and q(t) = v y t), if w < w f −
(d/F), one can show that after normalization with G s (0, 0), Eq. (10.72) becomes:
where
for |u x | < c and 0 otherwise. It is interesting to note that the factor (1 − |u x |/c) which
represents the basic effect of the focal plane shutter on the stationary object still appears
in the same form for the moving object case. Furthermore, for most cases, c ≫ u_x and
(c − u x )/v ≈ c/v which is roughly the exposure time. The second term of Eq. (10.74) is
therefore very similar to the first term of Eq. (10.66) for the interlens shutter.
Equation (10.71) can now be written as:

$$G(u_x, u_y) = G_s(u_x, u_y)\,L_s(u_x, u_y), \qquad (10.75)$$

where

$$L_s(u_x, u_y) = \frac{1}{\pi} \iint P\!\left(x + \tfrac{1}{2}u_x,\, y + \tfrac{1}{2}u_y\right) P^*\!\left(x - \tfrac{1}{2}u_x,\, y - \tfrac{1}{2}u_y\right) \exp[i2\pi x(v_x u_x + v_y u_y)/v]\,dx\,dy. \qquad (10.76)$$
The factor 1/π is the normalization factor. It should be pointed out that because we have
used the normalized frequencies in the derivation, vx and vy are in the normalized image
coordinate space, while v is the shutter slit speed projected onto the exit pupil and normalized
with the pupil radius. The ratio vx/v is equal to [d/(2λF²)](Vx/V), if the unnormalized
speeds Vx and V are used. For a defocused perfect lens, P(x, y) = exp[ikW020(x² + y²)],
and its OTF is given by:

$$\begin{aligned}
L(u_x, u_y, \Delta) &= \frac{1}{\pi} \iint P\!\left(x + \tfrac{1}{2}u_x,\, y + \tfrac{1}{2}u_y\right) P^*\!\left(x - \tfrac{1}{2}u_x,\, y - \tfrac{1}{2}u_y\right) dx\,dy \\
&= \frac{1}{\pi} \iint_A \exp\!\left\{ikW_{020}\!\left[\left(x + \tfrac{1}{2}u_x\right)^2 + \left(y + \tfrac{1}{2}u_y\right)^2 - \left(x - \tfrac{1}{2}u_x\right)^2 - \left(y - \tfrac{1}{2}u_y\right)^2\right]\right\} dx\,dy \\
&= \frac{1}{\pi} \iint_A \exp[ikW_{020}\,2(u_x x + u_y y)]\,dx\,dy, \qquad (10.77)
\end{aligned}$$
where Δ = W020/(λ/4) and the integrals are only evaluated on the intersection area A
of the pupil functions P(x + ½ux, y + ½uy) and P*(x − ½ux, y − ½uy). This equation is
the same as Eq. (10.41) which, because of circular symmetry, has been simplified to a
one-dimensional integral.
If we let ax = λvx/v and ay = λvy/v, Eq. (10.76) becomes:

$$L_s(u_x, u_y) = \frac{1}{\pi} \iint_A \exp[ik(a_x u_x + a_y u_y + 2W_{020} u_x)x + ik\,2W_{020} u_y y]\,dx\,dy. \qquad (10.78)$$

If the lens is in perfect focus, then W020 = 0 and

$$L_s(u_x, u_y) = \frac{1}{\pi} \iint_A \exp[ik(a_x u_x + a_y u_y)x]\,dx\,dy. \qquad (10.79)$$
Both integrals could be evaluated by rotating the x–y coordinates by an angle θ, where
tan θ = u y /u x , and then integrating over A numerically. For typical system parameters,
d = 5 mm, F = 8, λ = 500 nm, and a large motion Vx /V = 0.01 (i.e. the image point
moves 20 pixels in a 2000 pixel wide image), vx /v is about 0.78 and ax is 0.78λ, which is
three times the commonly acceptable aberration of 0.25λ. For a smaller amount of motion,
say Vx /V = 0.002, ax = 0.156λ becomes comparable with or smaller than the aberration
of a good camera lens.
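The numbers in this example follow directly from the relation v_x/v = [d/(2λF²)](V_x/V) quoted above; the short sketch below (ours) reproduces them:

```python
# Sketch: checking the numerical example in the text with the relation given above,
# v_x/v = [d/(2*lambda*F^2)] * (V_x/V), together with a_x = lambda * (v_x/v).

def vx_over_v(d_m, wavelength_m, f_number, Vx_over_V):
    return d_m / (2.0 * wavelength_m * f_number**2) * Vx_over_V

if __name__ == "__main__":
    lam, d, F = 500e-9, 5e-3, 8.0
    for motion in (0.01, 0.002):            # V_x/V = 0.01 and 0.002
        ratio = vx_over_v(d, lam, F, motion)
        print(motion, ratio)                # about 0.78 and 0.156 -> a_x = 0.78*lambda, 0.156*lambda
```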
10.5 Camera flare
When a light ray enters the front surface of the camera lens, it can be refracted to form part of
the optical image and it is called the imaging light. Alternatively, it can be reflected between
the lenses, the camera body, and the sensor, and eventually absorbed at a sensor location very
far away from its intended image position. In the latter case, it is called the flare light and it
does not contribute usefully to the intended image formation. Obviously, this definition of
flare light is not good enough for a quantitative analysis because strictly speaking, even the
image of a point object formed by a diffraction-limited lens has a very large spread over the
image plane. Conventionally the term flare is used to refer to light that has a spread that is
more than ten times greater than that caused by aberrations of the lens. Again this is a vague
concept because there is an implicit assumption that the spread of image-forming light is
finite, which is true only when we put a threshold on the irradiance level. For example,
if we define the size of a PSF as the smallest radius that contains 99% of the power, the
above definition can begin to make sense. If one searches through the literature, one comes
to the conclusion that there is no quantitative definition of flare that is commonly agreed
upon. In fact, camera flare is often defined operationally by its measurement instruments
and procedures [522, 809, 1008, 1054]. A common simplifying assumption is that flare
light tends to spread so widely that it covers the entire image plane uniformly. This is called
uniform veiling flare. Of course, the flare light distribution is not really uniform because
one can easily see more flare around light sources. The experimental setup [645, 940, 1008]
to measure the veiling flare usually consists of a black hole (a light trap) surrounded by a
uniformly illuminated background with luminance L (the uniform background is usually
part of the interior surface of an integrating sphere). Figure 10.21 shows an example of such
a flare measurement setup. The design of the black hole is such that no light is emitted from
the hole (other than the negligible blackbody thermal radiation at room temperature). The
hole typically subtends about 1◦ at the camera. If it is too small, the measurement will be
sensitive to focus error. If it is too big, the flare may not be uniform inside the image area
of the hole. The camera under test is used to take an image of the black-hole target and the
exposure in the center of the black hole is measured as a percentage of the exposure of the
bright background in the image. Good photographic cameras have less than 3% flare [645];
however, the measured value depends somewhat on the size of the black hole because of the
local nature of camera flare [1008]. Since a large portion (as much as 30%) of the incident
light is reflected from the film or the sensor, flare measurement is a function of not only the
camera, but also the film or the sensor used in the camera.
Zoom lenses with many lens surfaces tend to have much more flare, especially at long
focal lengths. The data also show that flare is a function of aperture size. At large apertures,
lens aberration, lens edge scattering, and reflection from lens inner surfaces and the camera
interior all contribute to flare. As the aperture is stopped down, the total flare light from
these sources is reduced. However, the relative area of the diaphragm edge that is exposed
to light increases and reflection from it creates a new source of flare. As a result, flare as a
function of aperture size ( f -number) is usually a U-shaped curve. For color photography,
flare has to be measured for the red, green, and blue components separately because lens
transmission and film-surface reflection are functions of wavelength. The green component
of the flare is usually the smallest of the three. The differences in the red, green, and blue
components of flare also contribute to the color cast in the film image and therefore, a
well-color-balanced flare is preferred.
The most visible effect of camera flare is that a dark area on the image does not appear
black, especially a dark area near a bright object. The minimum image illuminance is raised
by the flare and the apparent image contrast is reduced. Jones and Condit [473] defined the
flare factor as the ratio of the scene luminance range (L max /L min ) to the image illuminance
range (E max /E min ). Obviously, the flare factor depends on the scene contents as well as the
instrument used to measure the scene luminance range because the instrument itself might
have flare. A better model of camera flare is to treat it as a PSF that has a large support
[643, 1008]. Thus, it can be estimated from a knife-edge target by computing the MTF from
the image. In this case, flare shows up as a very-low-frequency component superimposed
on top of a smooth MTF. One can often see that in the very-low-frequency region, there is a
discontinuity in the curve. The frequency where the discontinuity occurs is near where the
image-forming MTF intersects with the flare MTF.
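A minimal sketch of this veiling-flare picture is given below. It treats the flare as a constant illuminance equal to a fixed percentage of the illuminance of the brightest area, in the spirit of the black-hole measurement described above, and then evaluates the Jones–Condit flare factor; the model and the numbers are ours, for illustration only:

```python
# Sketch: the effect of a uniform veiling flare on image contrast, and the Jones-Condit
# flare factor (L_max/L_min) / (E_max/E_min).  The flare is modeled as a constant
# illuminance equal to a fraction of the brightest area's illuminance (an assumption
# in the spirit of the "uniform veiling flare" idealization; illuminances are in
# arbitrary units proportional to luminance).

def flare_factor(L_max, L_min, flare_fraction):
    E_max = L_max + flare_fraction * L_max
    E_min = L_min + flare_fraction * L_max
    return (L_max / L_min) / (E_max / E_min)

if __name__ == "__main__":
    # A 1000:1 scene with 3% veiling flare (a "good" camera per the text).
    print(flare_factor(1000.0, 1.0, 0.03))   # about 30: the contrast range is strongly compressed
```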
10.6 Problems
10.1 The exit pupil of an imaging system is located at z 1 = 10 and its radius is a = 4. The
image plane is located at z 2 = 18. What are the numerical aperture and the relative
aperture ( f -number) of this imaging condition? Assuming that there is no light loss
in the system, what is the on-axis image illuminance of a Lambertian object that has
a luminance of 100 cd m−2 ? The object is a plane perpendicular to the optical axis.
The object space and the image space are both in the air.
10.2 A camera is equipped with a flash lamp. The camera is used to take a picture of a
painting on a wall (see Fig. 10.22). Let us assume that the flash lamp is a point source
with equal intensity in all directions (although this is not true in reality). If the painting
and the wall are Lambertian surfaces, what is the dependence of the image illumi-
nance on cos θ, where θ is the off-axis angle of the camera? Although many museum
visitors take pictures this way, it is less than ideal. What are the potential problems?
Figure 10.22. A camera with a flash lamp taking a picture of a painting on a wall.
10.3 In order to take a good picture of a painting that is placed on a flat wall, we set up four
point sources. The painting is a square, 2 m long on each side. Let A, B, C, and D
be the four corners of the painting, and P be its center. The camera is 4 m away and
pointing at the center of the painting. In order to illuminate the painting as uniformly
as possible, the four point sources are placed on a plane parallel to the wall and at the
same distance away from the wall as the camera (i.e., 4 m). Each of the point sources
is directly aligned with one of the corners of the painting (i.e., the line connecting the
source and the corresponding corner of the painting is perpendicular to the wall). The
intensity of each of the point sources is 1000 cd and it is uniform in all directions.
The camera lens is set at f /8 (i.e., the f -number is 8.0) and the shutter at 0.1 second.
The lateral magnification, m, is much smaller than 1 (i.e., 1 + m ≈ 1). Assume that
the painting is a Lambertian surface and there is no light loss in the camera. (You can
assume that the reflectances are all equal to 1.0 or you can use ρ A , ρ B , ρC , ρ D , ρ P for
the reflectances.)
(a) What are the illuminances at the center P and the corners, A, B, C, and D, of the
painting?
(b) What are the exposures at the center P′ and the corners, A′, B′, C′, and D′, of the
film image of the painting, where P′, A′, B′, C′, and D′ are images of P, A, B,
C, and D, respectively?
10.4 Assume that the maximum acceptable blur circle is 10 µm in diameter on the image
plane. If the lens has a focal length of 16 mm and an f-number of 4, what is the hyperfocal
distance of this camera? If the camera specifications call for a focus range from
0.8 m to infinity, at least, how many focus points have to be calibrated to cover that
range?
10.5 A camera lens has a focal length of 50 mm. What is the size of the Airy disk (for
λ = 500 nm) when the aperture is set at f /8? (The size of the Airy disk is defined as
the radius of the first dark ring.)
11 Eye optics
The optics of the eye imposes the upper bound on the image details that can be seen by
the visual system. It is important to understand and be able to model this limit of image
quality under various viewing conditions, so that the performance of imaging systems can
be properly optimized. However, it should be pointed out that the optical characteristics of
the human eye are constantly changing throughout life, and there are also very significant
variations among individuals. In this chapter, we will first describe the important features
of the anatomy of the eye. Since the anatomy shows a structure too complicated to model in
detail, we will then describe two simplified optical models of the eye: the reduced eye and the
schematic eye. These models are very useful because they allow us to make good estimates
of geometrical metrics for our retinal images. We will discuss some optical properties of the
ocular media and the eye as a whole. We will also touch on the mechanism of accommodation
and pupil control. Finally, we will describe how to put together a computational model of the
eye optics for calculating the optical quality of the retinal image. Such a model will allow us
to perform more detailed analyses under various viewing conditions and for different stimuli.
Before our discussion on visual optics, we need to define “visual angle” as a measure of
image size and retinal distance. Since the image size of an object on the retina depends on
its distance from the eye, it is often more convenient to use visual angle to specify object
size or retinal distance. The visual angle formed by two image points on the retina is the
angle between two points relative to the image nodal point of the eye. The visual angle of
two points in the visual field is the angle between the two points relative to the object nodal
point of the eye (see Fig. 11.1).
The average dimensions of an adult human eye are about 24 mm from the anterior pole to the
posterior pole, 23 mm vertically, and 23.5 mm horizontally [962, p. 5]. The eyeball consists
of, approximately, segments of two spheres placed one in front of the other (Fig. 11.1). The
radius of the anterior sphere is 8 mm and that of the posterior sphere is 12 mm. Thus, the
centers of the two spheres are separated by 4 mm.
The eyeball has three coats, the sclera (with the cornea), the choroid (with the ciliary body
and the iris), and the retina. The outside radius of curvature of the cornea is about 8 mm, but
the horizontal radius is about 0.05–0.25 mm flatter than the vertical radius [822]. The margin
of the cornea is nearly circular, about 6 mm in radius. The iris is heavily pigmented and
Figure 11.1. Diagram of the horizontal section of the human eye. (To make it easier to label, the
positions of the fovea and the optic disk are drawn a little further away from the optical axis. See the
text for more accurate dimensions.)
contains muscles that constrict and dilate the pupil from 2 to 8 mm in diameter, depending
on the light level and the adaptation state of the eye. The entrance pupil, defined as the image
of the aperture stop as seen from the object side, is the pupil that one observes in a person’s
eye. It is the image of the real pupil magnified by the cornea. The exit pupil, defined as the
image of the aperture stop as seen from the image side, is the real pupil magnified by the
lens. The sizes of the entrance pupil and the exit pupil are about 1.125 and 1.05 times that
of the real pupil.
The sclera has a radius of curvature of about 12 mm, and comprises five-sixths of the
surface area of the eye, with the remaining one-sixth being the cornea [962, p. 4]. The
anterior cornea portion is more curved than the posterior sclera. The unit of the refracting
power for a lens is usually expressed in diopter (D), which is the inverse of focal length
or object distance in meters.1 A lens with a focal length 1 m is said to have a refracting
power of 1 D. A lens with a focal length 0.5 m is said to have a refracting power of 2 D.
The refracting power of the combination of two lenses is simply the sum of their individual
refracting powers. The relaxed eye as a whole has a refracting power of about 60 D, of which
the cornea is responsible for about 70%, and the remaining 30% is accomplished by the
lens behind it.
The retina, which contains the light-sensing photoreceptors, is the innermost layer
(with respect to the center of the eyeball) in the posterior sclera portion. The photoreceptors
are located at the outermost layer of the retina. All the optical media preceding the
photoreceptors absorb more of the short-wavelength energy than the long-wavelength
energy of the incident light. Furthermore, their optical density depends markedly on the age of
the eye, being much denser when it gets old. This is a significant factor to consider when
comparing colorimetric measurements [1053, pp. 108–112].
There is a shallow, rounded pit about 1.85 mm in diameter (about 5◦ of visual field)
on the temporal side of the optical axis of the eye (see Fig. 11.1). This area is called the
fovea centralis. In its center, there is an area, approximately 0.35 mm in diameter (about
1◦ of visual field), which is only 0.13 mm thick, much thinner than the surrounding retina
region (about 0.2–0.5 mm thick). This area is called the foveola. Around the foveola are
the parafovea (0.5 mm wide) and the perifovea (1.5 mm wide). The fovea region contains
mostly cones (with rod-free region about 0.5–0.6 mm, or 1.7–2◦ ), and the central region of
foveola contains almost no S (short-wavelength sensitive) cones.
Strictly speaking, the eye is not a rotationally symmetric structure and therefore, does not
have an optical axis. The center of curvature of the cornea is slightly shifted, relative to the
axis of the crystalline lens [580, p. 58]. Therefore, there is an ambiguity of 1–2◦ in defining
the best fit of an optical axis. The line connecting the center of the fovea with the image nodal
point is called the visual axis. The angle, α, between the visual axis and the optical axis is
about 5◦ . The visual axis extends from the object nodal point in the same direction outward
towards the point of fixation. The lens has a flatter front surface and a more curved back
surface. Its curvatures change as the eye accommodates for seeing objects at a short distance.
Figure 11.2. Diagram of the fundus of a human right eye. Redrawn from [94].
Figure 11.2 shows the spatial locations of the various structures when we look into the
fundus of a human right eye. Surrounding the foveola is a yellow-colored region, called the
macula lutea (or yellow spot), of about 3–5 mm diameter. The yellow coloration is probably
due to the presence of the carotenoid pigment xanthophyll in the ganglion and the bipolar
cells. The foveola region itself, having no bipolar and ganglion cells, does not have the
yellow coloration [962, p. 43].
Another significant structure on the retina is the optic disk, where all the nerve fibers
from the retina exit to the brain. The optic disk is about 7.5◦ high and 5.5◦ wide. The size
of the optic disk varies from individual to individual and also varies with race, age, gender,
and refractive error [986]. Typically its size is from 1.8 to 3.4 mm2 . Its center lies 1.5◦ below
the horizontal meridian, and its nasal boundary is about 13◦ temporal to the fixation point
[659, p. 179]. It is located about 4 mm from the fovea. The optic disk does not have any
photoreceptors, i.e., the portion of the optical image falling on that region of the retina is
not sensed at all, and the optic disk is also called the blind spot. One interesting problem to
think about is why our visual image does not have a blank hole in it, if the blind spot does
not detect image signals at all. This is a very intriguing question which leads us to rethink
very carefully how “visual images”, as we perceive them, are actually “synthesized” in our
brain. The optical retinal images and our mental images generated from them are two very
different sets of images!
11.2 Reduced eye and schematic eyes
For various applications, we need different levels of approximation of the eye optics. As
we discussed in the last section, the eye is not rotationally symmetric, nor is the fovea
located on the “optical axis”. Furthermore, the index of refraction of the lens of the eye
varies considerably from the center (highest) to the edge (lowest). This helps to reduce the
spherical aberration of the eye. However, in oversimplified models, we simply treat it as a
centered system with uniform indices of refraction in all the ocular media. The simplest
form of approximation is the reduced eye, which consists of a single spherical surface with a
radius R of 5.55 mm. The apex of the equivalent spherical surface is located about 1.67 mm
behind the apex of the cornea. The reduced eye ball is filled with the vitreous humour,
which has an index of refraction equal to 4/3 and a dispersion similar to water. The image
focal length can be calculated as n 2 R/(n 2 − n 1 ) = 22.22 mm, with the resulting image
focal point located at the retina [1025]. Since the nodal point is at the center of the sphere,
the retina is therefore at 16.67 mm behind the node point. For this reduced eye, a distant
ray directed at the node point will not be refracted and the visual angle is calculated with
16.67 mm as the radius of curvature. For example, near the optical axis, 1 mm on the retina
corresponds to 3.437◦ .
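The reduced-eye conversion between retinal distance and visual angle is a one-line calculation; the sketch below (ours) uses the 16.67 mm nodal distance quoted above and the small-angle approximation:

```python
# Sketch: small-angle conversion between retinal distance and visual angle for the
# reduced eye (image nodal point 16.67 mm in front of the retina).
import math

NODAL_TO_RETINA_MM = 16.67

def retinal_mm_to_degrees(x_mm):
    return (x_mm / NODAL_TO_RETINA_MM) * (180.0 / math.pi)

def degrees_to_retinal_mm(theta_deg):
    return theta_deg * (math.pi / 180.0) * NODAL_TO_RETINA_MM

if __name__ == "__main__":
    print(retinal_mm_to_degrees(1.0))   # about 3.437 degrees per mm, as in the text
    print(degrees_to_retinal_mm(1.0))   # about 0.29 mm per degree (about 291 um/degree)
```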
The next level of complexity of an eye model is a paraxial schematic eye that has more
than one refracting surface. There are several models that have been widely used for paraxial
approximation. Here we will not give the parameters for these models. They can be found
in many books (for example, [884] has a very good summary). Instead, for our applications,
the positions of the six cardinal points are more valuable. As we have already learned in
the previous chapters, a general imaging system can be characterized by four parameters
(e.g., the two focal points and the two focal lengths). In visual optics, we often assume that
one medium is air and the refractive indexes of the optical media of the eye are known. The
two focal lengths are thus related. Therefore, there are only three more parameters to be
specified. A convenient choice is to specify its focal length, f (when the index of refraction
is 1), and the two principal points, H1 and H2 . Other cardinal points can be calculated from
these. For example, let F1 and F2 be the object (front) focal point and the image (back)
focal point, N1 and N2 be the object nodal point and the image nodal point, n 1 and n 2 be the
indices of refraction of the object space and the image space, and f 1 and f 2 be the object
and image focal lengths.2 It can be shown that they are related by:
F1 H1 = f 1 = n 1 f, (11.1)
H2 F2 = f 2 = n 2 f, (11.2)
H1 N1 = (n 2 − n 1 ) f, (11.3)
H2 N2 = (n 2 − n 1 ) f. (11.4)
2 Note that, following the traditional usage in eye optics, f 1 , f 2 , and f in this chapter are all positive numbers.
If we use the convention in the previous chapters, the image focal length f = − f 2 .
For the accommodated state, it is located at the second lens surface. Figure 11.3 shows the
cardinal points of the schematic eye in the relaxed and the accommodated state. A is the
vertex of the cornea. For the sake of clarity, the two pupil locations are marked as E, where
E 1 , the entrance pupil, is located in front of E 2 , the exit pupil. The principal points, H1 and
H2 , are marked only with H , with the understanding that H1 is in front of H2 . Similarly, N1
Table 11.5. Cardinal points of the revised relaxed schematic eye (distances from the corneal vertex A, in mm).

    AH1     1.51
    AH2     1.82
    AN1     7.11
    AN2     7.42
    AF1   −15.16
    AF2    24.09
is in front of N2 . The distances between these points are listed in Table 11.4. The relaxed
state of the Gullstrand–Emsley schematic eye has an equivalent power of 60.49 D. The
index of refraction of the air, n 1 , is assumed to be 1.0 and that of the eye, n 2 , is 1.333. The
object (front) focal length is 14.98 mm +1.55 mm = 16.53 mm, and the image (back) focal
length is 23.89 mm −1.85 mm = 22.04 mm. The radius of the entrance pupil is 4.000 mm
(magnification 1.130), and that of the exit pupil is 3.667 mm (magnification 1.036). The
accommodated state of the Gullstrand–Emsley schematic eye has an equivalent power of
69.72 D. The object (front) focal length is 14.34 mm, and the image (back) focal length is
19.12 mm. The radius of the entrance pupil is 4.000 mm (magnification 1.114), and that of
the exit pupil is 3.766 mm (magnification 1.049).
After reviewing some more recent data, Bennett and Rabbetts [94, p. 253] suggested
changing the value of n 2 to 1.336 and the equivalent power of the relaxed eye to 60.0 D,
which corresponds to an object (front) focal length of 16.67 mm. The image (back) focal
length is n = 1.336 times that, i.e., 22.27 mm. For the revised model of a relaxed eye, the
cardinal points are given in Table 11.5. If the eye is accommodated, the equivalent power
increases and the focal length decreases. For example, if the eye is accommodated at an
object distance of 66.67 cm, i.e., 1.5 D, the equivalent power of the eye becomes 61.5 D and
the front focal length is 16.26 mm. In the computational eye-optics model we will develop
shortly, the revised values of n 2 and f 1 will be used.
Visual angles are often used in specifying spatial dimension in visual perception. When
relating the distance between two points on the retina to the visual angle, we have to
convert between these two quantities. The relation between the retinal distance and the
visual angle can be derived from the reduced eye or the schematic eye. For small visual
angles, the paraxial eye model is adequate, and the relation is a simple proportionality. For
example, in the relaxed state of the original Gullstrand–Emsley schematic eye, the image
nodal point is located 16.53 mm from the retina. For a region near the optical axis, the
visual angle dθ [degree] is then related to the retinal distance dx [mm] by dθ = 3.466dx.
In the accommodated state, it becomes dθ = 3.996dx. However, these approximations do
not apply a distance away from the optical axis because they do not take into account the
actual radius of curvature of the real retina (about 11 mm), which is significantly smaller
than the distance from the image nodal point to the retina. Let r be the radius of curvature
of the retina and R be the distance from the image nodal point to the retina. Then,
$$\theta = \frac{180}{\pi} \sin^{-1}\!\left[\frac{r}{R}\,\sin\!\left(\frac{x}{r}\right)\right] \; \text{[degree]}, \qquad (11.5)$$
where x is the arc distance on the retinal surface from the point where the retina intersects
with the optical axis.
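Equation (11.5) can be sketched as follows, using the retinal radius of curvature (about 11 mm) and the nodal-point-to-retina distance of the relaxed Gullstrand–Emsley eye (16.53 mm) mentioned in the text; the function name and the sample points are ours:

```python
# Sketch: Eq. (11.5) -- visual angle (degrees) for a retinal arc distance x (mm),
# using r ~ 11 mm (retinal radius of curvature) and R = 16.53 mm (image nodal point
# to retina, relaxed Gullstrand-Emsley eye), both quoted in the text.
import math

def visual_angle_deg(x_mm, r_mm=11.0, R_mm=16.53):
    return math.degrees(math.asin((r_mm / R_mm) * math.sin(x_mm / r_mm)))

if __name__ == "__main__":
    print(visual_angle_deg(0.1))   # about 0.35 deg: close to the paraxial 3.466 deg/mm
    print(visual_angle_deg(5.0))   # a peripheral point, where the paraxial rule overestimates slightly
```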
The above discussion is based on the paraxial eye model. To derive the conversion relation
for retinal points far away from the optical axis, we need to use a wide-angle model of the
eye optics and do ray tracing through the refracting surfaces of the cornea and the lens.
Currently, there is no agreed-upon wide-angle model of the eye optics. To convert between
retinal distance and visual angle, the curves published by Drasdo and Fowler [265] have
quite often been used. They calculated their numerical results based on their schematic
human eye, which has an equivalent power of 63.72 D, with the radii of curvature for the
cornea, the anterior lens surface, the posterior lens surface, and the retina being 7.80 mm,
10.0 mm, 6.0 mm, and 11.06 mm, respectively. The first lens surface is located 3.6 mm
from the apex of the cornea, while the object nodal point is at 6.95 mm and the image
nodal point is at 7.32 mm. The retina is located 23.01 mm from the apex of the cornea.
The unique feature of their schematic eye is the cornea, which is an ellipsoid. An empirical
equation of the nonlinear relation between the retinal distance and the visual angle is given
by
puts the value at 291 µm/degree [1033]. A simple number to remember is that 1 mm retinal
distance corresponds to about 3.5◦ .
From Eq. (10.17),3 the retinal irradiance, Er, is related to the scene radiance, L, by

$$E_r \approx \frac{\pi a^2}{(1+m)^2 f^2}\,L = \frac{L \cdot p}{(1+m)^2 f^2} \approx \frac{L \cdot p}{f^2},$$

where a is the radius of the entrance pupil, m is the lateral magnification,4 f is the image
(back) focal length, and p is the entrance pupil area. Since the lateral magnification, m, of
the eye is typically much smaller than 1, the retinal irradiance is proportional to the product
of the scene radiance L and the pupil area p. In most imaging applications, scene radiances
or luminances are measured. In order to calculate the retinal irradiance or illuminance when
the eye is viewing an external object of a known luminance, we need to know the pupil size.
Conventionally, retinal illuminance is given in trolands. A troland is defined as the retinal
illuminance when a surface of luminance 1.0 cd m−2 is viewed through a pupil of 1 mm2 .
The troland value of an eye with a p mm2 pupil is simply the product of the luminance and
p. If the luminance is calculated with the CIE 1924 photopic luminous efficiency function
V(λ), the unit is called the photopic troland, or simply the troland. If the scotopic luminous
efficiency function, V′(λ), is used, the unit is called the scotopic troland.
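Computing trolands is then a matter of multiplying the luminance by the pupil area; a minimal sketch (ours) follows:

```python
# Sketch: retinal illuminance in trolands: the product of the luminance (cd/m^2)
# and the pupil area (mm^2), as defined in the text.
import math

def pupil_area_mm2(diameter_mm):
    return math.pi * (diameter_mm / 2.0) ** 2

def trolands(luminance_cd_m2, pupil_diameter_mm):
    return luminance_cd_m2 * pupil_area_mm2(pupil_diameter_mm)

if __name__ == "__main__":
    # A 100 cd/m^2 surface viewed through a 3 mm pupil (illustrative numbers).
    print(trolands(100.0, 3.0))   # about 707 trolands
```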
The central bright area of the PSF of a diffraction-limited system is called the Airy disk and
its radius (the first dark ring) is equal to 1.22λ f /d, where f is the image focal length and d
is the diameter of the pupil. For example, if the pupil diameter is 2.5 mm, the wavelength
is 550 nm, and the image focal length is 22.22 mm, then the diameter of the Airy disk is
11.9 µm. The minimum cone spacing in the fovea is about 0.5 minutes of arc, which is
about 2.42 µm. The eye with a 2.5 mm pupil is close to the diffraction-limited system and
even under this condition the Airy disk still covers several cones. The best MTF for a group
of subjects seems to occur at a pupil size of 3 mm [35]. From these simple calculations, we
know that our focus detection mechanism cannot detect image blur on a single-cone basis.
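The two numbers quoted above are easy to verify; the sketch below (ours) reproduces the 11.9 µm Airy disk diameter and the 2.42 µm cone spacing:

```python
# Sketch: the numbers quoted above -- the Airy disk of the eye for a 2.5 mm pupil
# at 550 nm (image focal length 22.22 mm), and the retinal distance corresponding
# to the 0.5 arcmin minimum cone spacing (nodal distance 16.67 mm).
import math

def airy_disk_diameter_m(wavelength_m, focal_length_m, pupil_diameter_m):
    return 2.0 * 1.22 * wavelength_m * focal_length_m / pupil_diameter_m

def arcmin_to_retinal_m(arcmin, nodal_to_retina_m=16.67e-3):
    return (arcmin / 60.0) * (math.pi / 180.0) * nodal_to_retina_m

if __name__ == "__main__":
    print(airy_disk_diameter_m(550e-9, 22.22e-3, 2.5e-3) * 1e6)  # about 11.9 um
    print(arcmin_to_retinal_m(0.5) * 1e6)                        # about 2.4 um
```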
Charman and Whitefoot [170] found that for a 2 mm pupil diameter, the mean depth of
field is about 0.6 D, and it drops to about 0.3 D at a 7 mm pupil diameter. Other studies
using subjective methods [41] and objective methods [632] showed similar numbers. These
numbers are compatible with the reported threshold of perceptual blur being about 0.15
D [665, p. 57]. The depth of field of our visual system is not as sensitive as the ideal
diffraction-limited lens to the change in the pupil size.
3 Here, the cos⁴θ fall-off factor is not considered for the eye because the retinal surface is curved.
4 Here, m = |m y | is the absolute value of the lateral magnification, m y , which is a negative number for our
eyes, i.e., our retinal images are inverted.
Figure 11.4. Examples of eye accommodation response functions. At a high luminance level with a
stimulus with good luminance contrast details, the eye accommodates better and the response curve
(thick solid curve) quite closely follows the ideal diagonal line (chain line). At a low luminance level
or without good contrast details, the response function (dashed curve) deviates far from the ideal
line.
Experiments [174, 584] show that when the eye is presented with a uniform visual field
without a high-contrast target to focus at, it is accommodated to a tonic level (reflecting the
tonic level of neural activity when the accommodation system has no input signal to drive it).
This is often called the tonic accommodation or resting accommodation [857]. The tonic
accommodation appears to decrease with age [196]. The average tonic accommodation
for young college-age observers is about 1.52 D (corresponding to an object distance of
25.9 inches) with a standard deviation of about 0.77 D [584].
Figure 11.4 shows two examples of the eye accommodation response as a function of
stimulus distance; both are measured in diopters. Zero diopter means that the object is
located at infinity, and 1 D means at 1 m away, etc. The chain diagonal line represents an
ideal response, i.e., the eye is always focused at exactly the distance where the stimulus is
located. The real eye response deviates from this ideal. If the object is located at a distance
further or closer than the distance of the tonic accommodation of the observer, the eye tries
to accommodate to the correct distance, but the accommodative response seems always to
fall short of the amount needed, and thus creates a focus error. For example, if the tonic
accommodation is 1.5 D and the object vergence calls for a 3.0 D accommodation, the eye
typically responds with only a 2.85 D accommodation, resulting in a ΔD of 0.15 D. If the
object vergence calls for a 0.8 D accommodation, the eye usually responds with a 0.87 D
accommodation, resulting in a ΔD of −0.07 D. The general behavior of the accommodative
response system is therefore to respond with a lag behind the needed change from its tonic
level. This behavior is true for polychromatic as well as monochromatic light [172]. If we
model the accommodative mechanism as a feedback control system, we can identify the
tonic accommodation as the open loop output. The accommodation error results in image
blur which is detected by the sensing mechanism to generate the control signal. If the blur-
sensing mechanism has an inherent threshold, the accuracy of the accommodative response
depends on the form of the target and the depth of focus of the eye. Experimental data
show that the accommodative error is a function of the spatial frequency content of the
target [169, 196, 737], the luminance [468], and the pupil size [360, 395]. The accuracy
of accommodation is best for intermediate spatial frequencies (4–6 cycles per degree) and
deteriorates at higher and lower frequencies, following the general trend in the contrast
sensitivity function.
The accommodation response function is generally an S-shaped curve with a near
straight-line portion around the tonic accommodation level with a slope about 0.86 for
a Snellen target [171]. The variation of slope among individuals is fairly significant, e.g.,
from 0.72 to 0.92 for a 4.3 cycles per degree square wave target at 311 cd m−2 [737]. The
slope decreases very slightly as the luminance of the target is reduced from bright daylight
level. The rate of decrease accelerates greatly when the luminance falls below about 0.5 cd
m−2 [468]. The slope also decreases when the pupil diameter is reduced [395].
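As a rough numerical illustration only, the near-straight-line portion of the response curve can be approximated by a line of the quoted slope passing through the tonic level. The Python sketch below uses the tonic level and slope values given above; it is not a model from the text, and it breaks down away from the straight-line region (for example, at 0 D it would give about 0.2 D rather than the 0.6 D quoted earlier).

def accommodation_response(stimulus_D, tonic_D=1.5, slope=0.86):
    # Linear approximation of the accommodative response around the tonic
    # level: the response lags the demand by a fraction (1 - slope) of the
    # distance from the tonic accommodation (diopters in, diopters out).
    return tonic_D + slope * (stimulus_D - tonic_D)

# accommodation_response(3.0) -> 2.79 D (the text quotes about 2.85 D)
# accommodation_response(0.8) -> 0.90 D (the text quotes about 0.87 D)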
For targets of polychromatic light, a question arises as to which wavelength the eye will
accommodate to and bring to focus. Evidence suggests that the wavelength in focus on the
retina is a function of the viewing distance. It has been shown [664] that the eye focuses
at the long wavelengths (about 680 nm) when looking at infinity, and it focuses at shorter
wavelengths (484 nm) when looking at an object 25 cm away. This observation is consistent
with the data from the accommodation response curve. Suppose that the eye is trying to
focus at a middle wavelength (say, 543 nm) when it is at a tonic accommodation level of
1.5 D (65.8 cm). If the object is now moved to infinity, i.e., 0.0 D, the eye responds with
an accommodation to 0.6 D for 543 nm. This focus error means the 543 nm wavelength
image is out of focus and the wavelength in focus is shifted to a longer wavelength (about
660 nm). Conversely, if the object is moved closer to 25 cm (4.0 D), the eye responds with
an accommodation to 3.6 D, and the wavelength in focus is shifted to about 485 nm. We
can therefore account for the change of wavelength in focus by using the accommodative
response function, without invoking an additional empirical relation. However, this still
leaves a question unanswered: how do we calculate the wavelength that the eye is trying to
focus on? In fact, it seems that, with proper practice, the eye can focus on monochromatic
light of any wavelength [172]. There are two types of information that seem to suggest
that, to a good approximation, the sustained accommodation component focuses at the
luminance contrast. First, it was shown that accommodation was essentially unresponsive to
the isoluminant edge and exhibited increasing focusing accuracy with increased luminance
contrast [1042]. Second, it was found that for a white light stimulus, the accommodative
response curve matches that of the luminance-weighted average wavelength [172]. Because
of the accommodation error, the average wavelength is usually not the wavelength in focus.
If the object is closer than the tonic accommodation, the wavelength in focus is shorter than
the average wavelength. Conversely, if the object is further than the tonic accommodation,
the wavelength in focus is longer.
282 11 Eye optics
For imaging applications, it is important to be able to calculate the natural pupil size from the
image viewing condition. However, the pupillary response seems to be determined by many
complicated variables, such as changes in luminance [1053, p. 106], chromaticity [1062],
attitude [398], spatial pattern [975], accommodation [949, p. 418], and others [603]. The
natural pupil size of an eye looking at images is thus expected to change constantly with time.
Studies on pupillary response reveal that there are two components of the response: transient
and sustained. For imaging applications, a practical solution is to calculate the sustained
component from the viewing condition, knowing that this is a somewhat compromised
approach.
The spectral sensitivity of the sustained component of the pupillary response is close,
but not identical, to the luminance efficiency function. Several empirical formulas have
been proposed for calculating the pupil diameter from the luminance level of the viewing
condition [1053, p. 106]. The following relation proposed by de Groot and Gebhard [243]
is a useful approximation:

log d = 0.8558 − 0.000401 (log L + 8.6)³,   (11.7)

where d is the pupil diameter (in millimeters) and L is the luminance level (in candelas per square meter) of the adapting visual field.
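As a sketch only, the relation can be evaluated numerically as follows (Python); the constants are the commonly cited de Groot–Gebhard values and should be checked against the original reference before use.

import math

def pupil_diameter_mm(L_cd_m2):
    # de Groot-Gebhard approximation (constants as commonly cited):
    # log10(d) = 0.8558 - 0.000401 * (log10(L) + 8.6)^3
    log_d = 0.8558 - 0.000401 * (math.log10(L_cd_m2) + 8.6) ** 3
    return 10.0 ** log_d

# For example, pupil_diameter_mm(100.0) gives about 2.4 mm, and
# pupil_diameter_mm(0.002) gives about 6 mm, consistent with the
# pupil sizes discussed later in this section.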
The lighting level for image viewing controls the pupil size, which in turn affects the
contrast sensitivity and visual acuity [100, 582], presumably because the optical aberration
of the eye worsens when the pupil size becomes larger than about 3 mm in diameter.
However, the lighting level can be measured with either photopic luminance or scotopic
luminance. It is somewhat surprising that the pupil size in indoor viewing has been shown
to correlate much better with scotopic luminance than with photopic luminance [101], even
in the luminance range between 20 cd m−2 and 300 cd m−2 , at which level rod response
seems to have long reached saturation [13]. Regression on the experimental data showed
that the log of the pupil area A (in square millimeters) is linearly related to the log of the scotopic illuminance S (in scotopic lux) measured in the plane of the viewer’s eye.
Light entering the pupil at different distances from the optical axis is perceived as having
different brightnesses. This effect is called the Stiles–Crawford effect of the first kind. It is
attributed to the directional sensitivity of the photoreceptors [283]. The effect reduces the
adverse influence of the marginal rays entering the pupil and improves the OTF of the eye
slightly.
Stiles [908] first proposed using the following formula to fit the luminous efficiency data,
η, of the Stiles–Crawford effect:
η = 10^(−pr²),   (11.9)
where r is the distance (in millimeters) on the pupil from the peak location of η, and p is a
function of the wavelength of the incident light. Table 11.6 [1053, p. 427] lists the value of
p as a function of wavelength. Although later studies [816] found that a Gaussian function
fits log η better than the simple quadratic function in Eq. (11.9), the difference is very small
for pupil sizes smaller than 6 mm; pupils larger than that occur only when the field luminance is less than about 0.002 cd m−2. Also, it was found [29] that η is not rotationally symmetric, but the deviation
is again not large. The Stiles–Crawford effect can be modeled as an apodization function
A(x, y) for the OTF computation using the generalized pupil function P(x, y).
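As an illustration (not the book's own computation), the Stiles–Crawford weighting of Eq. (11.9) can be turned into an apodization over a pupil-plane grid as in the Python sketch below. Whether the apodization applied to the pupil amplitude should be η or √η depends on the convention adopted for the pupil function, and the value p ≈ 0.05 mm⁻² is only a typical mid-spectrum figure (Table 11.6 gives the wavelength dependence).

import numpy as np

def stiles_crawford_apodization(x_mm, y_mm, p=0.05, x0=0.0, y0=0.0,
                                amplitude=True):
    # Luminous efficiency eta = 10^(-p r^2), Eq. (11.9), with r measured
    # from the peak location (x0, y0) on the pupil (millimeters).
    r2 = (x_mm - x0) ** 2 + (y_mm - y0) ** 2
    eta = 10.0 ** (-p * r2)
    # Return sqrt(eta) if the apodization multiplies the pupil *amplitude*
    # (so that the intensity weighting equals eta); return eta otherwise.
    return np.sqrt(eta) if amplitude else eta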
Studies performed on large populations [29] showed that the peak position of the Stiles–
Crawford effect is located at nasal 0.51 mm and superior 0.20 mm. Using the nodal points in
the Gullstrand–Emsley schematic eye, the peak location is approximately mapped to 4.46◦
temporal and 1.78◦ inferior on the retina when the eye is accommodated. That position is
about where the fovea is located.
The capability of the eye to see spatial details is called the visual acuity. Depending on the
type of spatial details that are used for measuring the acuity, we can roughly classify acuity
into five types [884]:
1. Grating acuity (about 2 minutes of arc): this measures the highest sinusoidal frequency
that the eye can resolve. It is usually between 30 and 60 cycles per degree.
2. Snellen (letter) acuity (about 5 minutes of arc): this measures the smallest size of letter
that can be recognized.
3. Vernier acuity (about 10 seconds of arc): this measures the minimum detectable amount
of mis-alignment between two line segments.
4. Point discrimination acuity (about 1 minute of arc): this measures the discrimination
of two point white sources on a black background or vice versa.
5. Stereo-depth acuity (about 5 seconds of arc): this measures the minimum stereo dis-
parity that can be resolved.
These are single number measures. In the next sections, we discuss the MTF and OTF,
which are functions of spatial frequencies.
The MTF (or OTF) of the eye has been measured or estimated by several methods:
1. the double-pass method, which measures the point image reflected from the retina
[158, 695];
2. the psychophysical method, which measures the visual contrast sensitivity of an ob-
server with and without the degradation due to the eye optics; the ratio between
the two is an estimate of the eye MTF [157]. The eye optics is bypassed by using laser
sources to create interference gratings on the retina;
3. the objective aberroscope method, which measures the wavefront aberration of the eye
[821, 1004];
4. the computational method, which calculates the eye OTF from eye aberration data
[984].
Each of these methods has its shortcomings. The double-pass method assumes that the
retina is a perfect diffuser and that it contributes the major reflection component. In reality,
the retina is not a perfect diffuser, and there may be many other reflection components. Psy-
chophysics measurement is sensitive to noise at high frequencies and chromatic aberration
is not included because lasers are monochromatic sources. The aberroscope relies on the
reflected image from the eye and therefore suffers the same drawbacks as the double-pass
method. The computational method is very flexible, but data are not available on all aspects
of eye aberrations.
So far there is no internationally accepted standard eye MTF. Campbell and Gubisch’s
data [158] are usually used for comparison. Vos, Walraven, and van Meeteren [993] com-
bined those data with other flare measurement data to produce a numerical table of the eye
PSF at six pupil sizes. In 1982, Charman [173] considered this set of data the most useful then available.
Improvements in instrumentation have made it possible to measure a large number of
eye MTFs [34, 36, 632, 695, 1004]. As a result, several empirical formulas for eye MTF
have been proposed [37, 242, 338, 438, 465, 695, 994].
An analytical expression for the PSF of the eye was proposed by Geisler [338]:
h(r) = [a1/(2s1)] exp[−0.5(r/s1)²] + [a2/(2s2)] exp[−0.5(r/s2)²].   (11.10)
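A direct transcription of Eq. (11.10), as reconstructed above, might look like the following Python sketch; the amplitudes a1, a2 and spreads s1, s2 are fitted constants from Geisler's paper and are not reproduced here, so the arguments below are placeholders.

import numpy as np

def geisler_psf(r, a1, s1, a2, s2):
    # Sum of two Gaussian components, Eq. (11.10); r and the spreads
    # s1, s2 must be in the same angular units (e.g., minutes of arc).
    g1 = (a1 / (2.0 * s1)) * np.exp(-0.5 * (r / s1) ** 2)
    g2 = (a2 / (2.0 * s2)) * np.exp(-0.5 * (r / s2) ** 2)
    return g1 + g2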
Figure 11.5. The model MTFs as seen by the L, M, and S cones with a 3.0 mm pupil.
In color imaging applications, we are often interested in the optical image quality as seen by the three human cone systems, each having
a different spectral sensitivity.
In 1974, van Meeteren used eye aberration data to calculate the average eye MTF [984].
This is a very flexible tool for estimating the eye MTF under various application conditions.
Figure 11.5 shows an example of such a calculation. The spectrum of the incident light is
D65 and the LMS cone spectral sensitivity functions are used to compute the wavelength-
weighted average MTF for each cone type.
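The wavelength weighting just described can be sketched as follows (Python); the arrays for the monochromatic MTFs, the cone spectral sensitivities, and the D65 spectrum are assumed to be sampled on a common wavelength grid and are not supplied here.

import numpy as np

def cone_weighted_mtf(mtf_by_wavelength, cone_sensitivity, illuminant):
    # mtf_by_wavelength: array of shape (n_wavelengths, n_frequencies),
    # the monochromatic eye MTFs at each sampled wavelength.
    # cone_sensitivity, illuminant: arrays of length n_wavelengths
    # (e.g., an LMS cone fundamental and the D65 spectral power).
    w = cone_sensitivity * illuminant
    w = w / np.sum(w)                      # normalized spectral weights
    return np.tensordot(w, mtf_by_wavelength, axes=(0, 0))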
In order to compute the eye OTF, we have to determine the pupil function5 P(x, y) for
the eye, which is expressed as:
P(x, y) = A(x, y) exp[ikW(x, y)] = A(x, y) exp[i(2πn2/λ)W(x, y)],   (11.19)
where λ refers to the light wavelength in the air. Since the image space of the eye has an index
of refraction, n2 = 1.336, which is different from that of the air (n1 = 1.0), the wavelength parameter λ has to be divided by n2. However, most eye aberrations are measured external
to the eye and expressed in diopters. The index of refraction is automatically absorbed into
the aberration coefficients in the experimental data. It is therefore better to cast the pupil
function into the following form:
P(x, y) = A(x, y) exp[i(2π/λ)W′(x, y)],   (11.20)
5 The pupil function here refers to the generalized pupil function as defined in Eq. (10.20).
Figure 11.6. The model point spread functions as seen by the L, M, and S cones with a 3.0 mm pupil.
where W′(x, y) is introduced to express the eye aberrations in terms of the externally measured data, and thus

W′(x, y) = n2 W(x, y) = Wf(x² + y²) + Ws(x² + y²)² + Wa y² + Wc(x² + y²)y + Wd y,   (11.21)
where Wf represents the defocus, due to either accommodation error or chromatic focus
error, Ws is the spherical aberration, Wa represents the astigmatism, Wc the coma, and Wd
the distortion. The dependence on the image coordinate Y is absorbed into the respective aberration coefficients, because we are only interested in foveal vision and therefore Y is approximately a constant. Also the field curvature aberration, W220 Y²(x² + y²), is
neglected because the retina is approximately “concentric” with the curved image surface
and the error is quite small. The OTF of the eye can be calculated as the autocorrelation of
the generalized pupil function as shown in Eq. (9.143).
If we can collect the aberration data for the average subjects that we are interested in
modeling, we can then compute the MTF of their eyes. Furthermore, the spectral composi-
tion of the color stimulus, the spectral sensitivity functions of the cones, the focus error, etc.
can all be simulated to see their effects on the eye MTF.
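A minimal numerical sketch of this kind of calculation is given below (Python). It assumes the aberration coefficients of Eq. (11.21) are already expressed as optical path lengths in millimeters over pupil coordinates in millimeters (the conversion from diopters is omitted), uses a simple Stiles–Crawford apodization, and leaves out the scaling of the spatial-frequency axis; it is an illustration of the autocorrelation relation, not the book's own implementation.

import numpy as np

def eye_otf(pupil_diameter_mm=3.0, wavelength_nm=555.0, n_grid=256,
            Wf=0.0, Ws=0.0, Wa=0.0, Wc=0.0, Wd=0.0, p_sc=0.05):
    # Pupil-plane coordinates in millimeters.
    a = pupil_diameter_mm / 2.0
    x = np.linspace(-a, a, n_grid)
    X, Y = np.meshgrid(x, x)
    R2 = X ** 2 + Y ** 2
    inside = (R2 <= a ** 2).astype(float)

    # Aberration function of Eq. (11.21), assumed given in mm of path length.
    W = Wf * R2 + Ws * R2 ** 2 + Wa * Y ** 2 + Wc * R2 * Y + Wd * Y

    # Generalized pupil function, Eq. (11.20), with a Stiles-Crawford
    # apodization (taken here as the square root of eta = 10^(-p r^2)).
    A = np.sqrt(10.0 ** (-p_sc * R2)) * inside
    wavelength_mm = wavelength_nm * 1e-6
    P = A * np.exp(1j * 2.0 * np.pi / wavelength_mm * W)

    # Incoherent PSF as |FT of P|^2; the OTF is then the FT of the PSF,
    # which is equivalent to the autocorrelation of P (cf. Eq. (9.143)).
    psf = np.abs(np.fft.fftshift(np.fft.fft2(P))) ** 2
    otf = np.fft.fft2(np.fft.ifftshift(psf))
    return otf / otf[0, 0]                 # normalized to unity at zero frequency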
One of the advantages of the computational model is its flexibility. For example, Vos,
Walraven, and van Meeteren’s data [993] do not allow us to calculate the MTFs as seen by
the L, M, and S cones. Figure 11.5 shows an example of the MTFs as seen by the three
photoreceptors from the model calculations. The flare is not included. The figure shows the
horizontal frequencies at a 3.0 mm pupil size. The half-height frequencies of the MTF as
seen by the fovea cones are found to be 19.6, 19.1, and 4.7 cycles per degree for the L, M,
and S cones.
The profiles of the corresponding PSFs can also be computed. They are shown in Fig. 11.6.
The half-height widths of the PSF as seen by the L, M, and S cones in the fovea are found
to be 42.0, 42.2, and 108.4 seconds of arc, respectively. The foveal cone diameter is about
30 seconds of arc, smaller than the half-height width of the L-cone point spread function.
11.12 Problems
11.1 What is the size of the retinal image of the sun in terms of visual angle? Assume that the
sun has a radius of 6.95 × 108 m and its distance from the earth is 1.5 × 1011 m.
11.2 Use the matrix method in Section 9.7 to calculate the positions of the six cardinal points
for the Le Grand theoretical eye with four refracting surfaces as given in Section 11.2.
Verify the results with the numbers given in the same section.
11.3 Let the luminance of a surface be 100 cd m−2 and the eye be fully adapted to it. What
is the retinal illuminance in terms of the troland? Use Eq. (11.7) to calculate the pupil
size.
11.4 If a light source is a blackbody radiator at 2800 K, what is the ratio of the scotopic
troland to the photopic troland?
12 From retina to brain
Although our interest in the human visual system is mainly in the role it plays as the
observer in a color imaging system, we cannot completely treat it as a big black box,
because the system is nonlinear and far too complicated to be characterized as such. There
have been many attempts to apply linear system theory to the human visual system and
characterize it with a system transfer function. That approach may serve some purposes in a
few applications, but it is inadequate for most color imaging applications. Another possible
approach is to treat it as many medium-sized black boxes, one for each special aspect in
image perception. This is a more practical approach and it has been used very well for many
applications. For example, we can measure the human contrast sensitivity as a function of
luminance, field size, viewing distance, noise level, and chromatic contents, and the results
can be used to design the DCT quantization tables for color image compression. However,
the medium-sized black box approach does not give us much insight or guidance when our
problem becomes more complicated or when we have a new problem. The size of the black
boxes has to be reduced. An extreme limit is when each box corresponds to a single neuron
in the visual pathway. Even then, some details inside the neuron may still be important
to know. In general, how much detail we need to know is highly application-dependent,
but the more we know the better we are equipped to deal with image perception related
questions. For example, it is often desirable to build a computer model to predict if the
results of certain image processing operations, such as image compression/decompression
in medical imaging, will produce visible artifacts. A special-purpose computer model of
this kind (such as fixed luminance level, fixed noise level, and fixed target patterns) is
not too difficult to build because we can string together several medium-sized black boxes
to get what we want. But to build a general-purpose model for varying conditions and
application domains, we need to know more details about how the black boxes interact with
each other. It also helps to know whether certain operations in a visual model are plausible, in the sense that they could be implemented in the human neural system. Yet another reason
for studying such a subject is the hope that we may gain some insight into how we might
attempt to solve very difficult vision problems by learning from the best system we know.
For these reasons, we will study, in this chapter, some of the known neural structures in the
visual signal processing.
Figure 12.1. The four lobes of a human brain (a side view).
Vision is one of the five sensory functions that we have for collecting information about
the world we live in. The optic nerve contains about one million neural fibers transmitting
information from the retina to the brain. In comparison, the auditory nerve contains only
about 30 000 fibers, and the number of all the dorsal root fibers entering the entire spinal
cord is less than one million. Visual input thus accounts for about 40% of the total sensory
input to our brain. The complexity and importance of our visual signal processing are quite
obvious. Most of that processing is performed in our brain.
The human brain is a very complicated organ. On the average, it weighs about 1400 g.
Structurally, the brain is composed of the cerebrum, the cerebellum, and the brainstem. The
cerebrum contains two cerebral hemispheres (the left brain and the right brain). A very
distinctive feature of human cerebral hemispheres is their convoluted and folded surfaces.
A ridge on these heavily folded surfaces is called a gyrus, and a groove between ridges is
called a sulcus (however, a very deep groove is called a fissure). Although the appearance of
these gyri and sulci varies from one brain to another, certain major features are reasonably
constant in human brains. For example, four prominent sulci – the central sulcus, the lateral
sulcus, the parieto-occipital sulcus, and the calcarine sulcus (in conjunction with the preoccipital
notch) – are used to divide each cerebral hemisphere into four regions: the frontal lobe, the
parietal lobe, the temporal lobe, and the occipital lobe, which occupy roughly the front,
the top, the side, and the back parts of the hemisphere, respectively. Figure 12.1 shows the
locations of these four regions. In the discussion of the brain anatomy, we often have to refer
to the direction of a structure. The directional terms are of two types. Terms such as anterior,
posterior, superior, and inferior are relative to the normal upright orientation of the body as a
whole. Thus, we say that the frontal lobe is anterior to the parietal lobe. Terms such as dorsal
(top) and ventral (bottom) are relative to the center of the structure in discussion. If the brain
is treated as a sphere, then the northern hemisphere (the top part) is the dorsal direction
and the southern, the ventral direction. Note that surface normals at various points on the
northern hemisphere are not all pointing in the same direction. The only thing we can say is
that all of them have a positive “north” component. In comparison, the superior direction is
strictly north and has no horizontal component. Of course, no structure in the brain has the
shape of a sphere, but the dorsal/ventral convention roughly corresponds to such an analogy.
Thus, we can say that the visual pathway going from the occipital lobe to the parietal lobe
is the dorsal pathway (relative to the center of the brain) and the visual pathway going from
the occipital lobe to the bottom of the temporal lobe is the ventral pathway.
One of the major achievements in neural science is the establishment of the functional
localization of various brain areas (see [1066] for some very interesting historical accounts).
Various sensory and motor functions are performed in different localized cortical areas. For
example, Broca’s area in the frontal lobe is the first area so identified. It was hypothesized by
Pierre Paul Broca in 1861 to be involved in speech [1066]. The area that is mainly dedicated
to processing visual signals (the visual cortex) is located on the occipital lobe at the back of
the brain (the shaded area in Fig. 12.1). The visual cortex is further divided into the primary
visual cortex (V1) and the visual association cortex (V2, V3, V4, and V5) surrounding the
primary visual cortex [981, 982].
Visual perception begins at the photoreceptors (rods and cones) in the retina. The visual
signals are then processed and sent to the visual cortex and other areas in the brain. Fig-
ure 12.2 shows a schematic diagram of the visual pathway from the retina to the brain.
There are several important features that should be pointed out in this figure. The first is
that the optic nerve leaving the retina carries information about the whole visual field of
that retina, but the neurons that carry the visual signals from the right half of the visual
field separate out from those carrying the signal from the left half at the optic chiasm and
join with the neurons that carry the right visual field from the retina of the other eye. The
combined optic tract goes to the lateral geniculate nucleus on the left side of the brain, and
from there projects to the visual cortex of the left brain. Similarly, the visual signals from
the left visual field are projected to the right brain. The major communication between the
left brain and the right brain is through the corpus callosum (which is a huge fiber bundle,
serving the equivalent function of computer cables). The second notable feature is that
the visual cortex is not the only brain area that receives visual signals from the retina. There
are also other projections from the retina, such as: (1) to the superior colliculus (for control
of eye movement), (2) to the pretectal area (for control of pupillary light reflex), and (3) to
the hypothalamus (for light related control of the biological clock). Projections also exist
from the visual association cortex to the superior colliculus and/or the pretectal area (for
the accommodation reflex) and from the visual cortex to the superior colliculus (possibly
for visual attention).
The main pathway for visual information transmission and processing (see Figs. 12.3 and
12.4) is from the photoreceptor to the bipolar cells, the ganglion cells, the lateral geniculate
nucleus (LGN), the primary visual cortex (V1) and the visual association cortex (V2, V3,
V4, and V5) both sequentially and in parallel. Further projections are made to the temporal
lobe and the parietal lobe. The primary visual cortex is also called the striate cortex (because
of its visible layer structures) or area 17 (according to Brodmann’s chart [1066, p. 69]). The
visual association cortex is also called the prestriate cortex and it includes areas 18 and
19. The details of the visual information processing in the brain are mostly unknown and
remain areas of intensive research.
The physiological correlations with the various visual functions are established through
numerous measurements from various neurons in the visual pathway. However, such a con-
nection between perception and neural responses is not easy to make. Imagine how difficult it
would be to understand how computers work by measuring electrical waveforms at different
points in the complex circuits and chips. Fortunately, certain higher-level concepts in visual
information processing have emerged from some early physiological and psychophysical
studies. We will explain two such important concepts, receptive field and channel, before
we discuss the neural structures and signal processing.
In the study of neuron response to various input stimuli, it is found that a neuron (es-
pecially that in the early processing stage of a sensory signal) only responds to stimuli
presented in a certain spatial region of limited extent. For example, a fovea ganglion cell
responds (changes its discharge rate) to a small spot moving across the central region of
the visual field, but does not respond to stimuli that are presented in the peripheral region
of the visual field. The region of the visual field within which a neuron responds to any
input stimuli is called the receptive field of that neuron. Since the external visual field can
be mapped to the retina, the receptive field of a neuron is often specified in terms of the
region on the retina where some projected image of an external stimulus causes the neuron
to respond. An important thing to point out is that neural connections rely on some thresh-
old mechanisms to communicate with each other and therefore the measured size of the
receptive field of a retina cell may be a function of several variables, such as the level of the
illumination and the state of the visual adaptation. We should not constrain ourselves with
a preconceived notion that the receptive fields are fixed structures. For example, the area
for the spatial summation of rod signals seems to grow larger when the luminance level is
reduced. This signal pooling works to improve the signal-to-noise ratio when there is little
light available for seeing.
If the response of a neuron to a light pattern presented to its receptive field is determined
by the weighted sum of the image illuminance over the entire receptive field, the neuron is
said to have a linear response and this property is called the linearity of spatial summation. In
this case, the neuron can be modeled as a (linear) filter and the weighting factor as a function
of the spatial position in its receptive field can be treated as the correlation kernel of the
filter. If the weighting function is circularly symmetric, it can also be treated as a convolution
kernel. (It should be understood that when we say that a neuron, other than a photoreceptor,
responds to some light stimulus, we mean that the light is sensed by the photoreceptors and
the responses of the photoreceptors are transmitted through various intermediate neurons to
the neuron that we are discussing.) For a small input dynamic range (signal of low contrast),
many neurons can indeed be approximated as linear filters. A fairly-well-studied example
is the center-surround receptive field of the ganglion cell. Light illuminating the central
circular region of the receptive field evokes a response (increased rate of action potentials)
in the (on-center) ganglion cell, while light illuminating an annular region surrounding the
central circular region evokes a response with an opposite polarity (decreased rate of action
potentials). This center-surround receptive field response profile can be well approximated
as the difference of two Gaussian functions with different spreads and heights.
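The difference-of-Gaussians approximation can be written down directly; the Python sketch below uses arbitrary, illustrative spreads and gains in units of sample spacing, not fitted physiological values.

import numpy as np

def dog_receptive_field(size=65, sigma_c=1.5, sigma_s=4.5,
                        gain_c=1.0, gain_s=0.9):
    # ON-center receptive field profile: a narrow excitatory Gaussian
    # minus a broader inhibitory Gaussian (difference of two Gaussians).
    ax = np.arange(size) - size // 2
    X, Y = np.meshgrid(ax, ax)
    R2 = X ** 2 + Y ** 2
    center = gain_c / (2 * np.pi * sigma_c ** 2) * np.exp(-R2 / (2 * sigma_c ** 2))
    surround = gain_s / (2 * np.pi * sigma_s ** 2) * np.exp(-R2 / (2 * sigma_s ** 2))
    return center - surround

# Linearity of spatial summation: the (approximate) response to an image
# patch is the weighted sum of the image irradiance over the field, e.g.
# response = np.sum(dog_receptive_field() * image_patch)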
If we consider visual signal processing by a neuron with its receptive field as computing
a filtered output of the input image at a certain spatial location, we need a sampling grid of
neurons covering the entire visual field so that a complete filtered image can be properly
sampled and represented [333]. Such a set of neurons having the same (or very similar)
receptive field structure, and covering the entire visual field, is called a channel, because their
output as a whole represents the visual scene processed in a certain way (such as an image
filtered through a bandpass filter in digital image processing). Of course, the filter analogy
is only an approximation for human visual signal processing because the photoreceptors
in the human retina are not linear in their response to light, their sampling grids are not
exactly regular, and their sampling intervals are functions of eccentricity. It is not known to
great accuracy how the filtered images are represented in the early visual processing. The
concept of the channel is just a convenient way of imagining how the human visual signal
can be modeled. It should also be pointed out that the concept of the channel can often
oversimplify the actual complexity, interaction, and mixing of various types of signal along
the visual pathways. Visual signals most likely are not neatly partitioned into channels [802,
p. 269]. The concept of the channel is a useful one, but has to be taken cautiously until we
understand exactly what is going on in our visual signal processing.
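In the filter analogy, a channel amounts to applying the same receptive-field kernel at every location of the image; a minimal Python sketch (using a circular FFT convolution purely for brevity, and assuming the kernel is smaller than the image) is:

import numpy as np

def channel_output(image, kernel):
    # Apply one receptive-field kernel at every image location
    # (one "channel" in the filter analogy); circular convolution via FFT.
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))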
There have been many documented clinical cases that described patients who have suffered
selective loss of visual functions (such as color discrimination, face recognition, and motion
perception) while other functions (such as form perception) remained intact. For example,
an artist lost his color vision in an accident. After the injury, he could only see things in
black-and-white. Other aspects of his vision did not seem to suffer much. Therefore, he
could still paint and draw well, but the colors he used to paint were all wrong. Another
patient lost her capability to perceive motion. When she poured tea into a cup, the tea
seemed to be frozen. She could not see the tea filling up the cup in continuous motion,
and before she realized it, the tea had overflowed the cup. Such very selective loss of a
certain visual function can now be correlated with findings through various brain imaging
methods, such as x-ray computed tomography (CT), positron emission tomography (PET),
and magnetic resonance imaging (MRI). The cortical areas that are damaged can be lo-
cated quite well. Furthermore, it was discovered that a change in the amount of oxygen
carried in our blood flow has a remarkable effect on the blood’s magnetic properties, and
these changes can be picked up clearly by MRI. When performing certain tasks, the brain
areas that are involved show increased blood flow and these functional activities in the
brain can be mapped out by MRI. Such techniques in imaging the brain are referred to
as functional MRI (fMRI), and they have allowed us to see the brain’s activities while we
are carrying out some psychophysics experiments [774, 979]. Since the 1970s, anatomical,
physiological, and psychophysical evidence has clearly shown that the primate visual sys-
tem consists of several parallel visual pathways that analyze different aspects of the retinal
images more or less independently [585, 599, 666, 774, 1066]. Figure 12.3 shows how
some of these parallel visual pathways may be wired. It should be pointed out that such
a clean partition of different visual functions into separate neural pathways is most likely
an oversimplification. Exchange of information among these pathways cannot be ruled
out.
Our current knowledge about these parallel pathways indicates that there are at least four
of them: one color pathway, one motion pathway and two form pathways (dynamic form
and static form). Segregation of these different aspects of visual signals seems to start in the
retina, especially in the ganglion cells. They are called the P and M pathways because these
ganglion cells have different sizes and shapes, and they terminate in the different layers of
the LGN. New discoveries are continually made by researchers and more detailed structures
will undoubtedly emerge in the near future. In the following sections, let us trace the visual
signals from the retina to the brain through various neural connections.
The human retina extends about 1100 mm2 and its average thickness is about 250 µm. Under
a microscope, the human retina can be seen as a many-layered structure. It is made up of the
following identifiable layers as shown in Fig. 12.4: (1) the pigment epithelium, (2) the outer
segments of photoreceptors, (3) outer limiting membrane, (4) outer nuclear layer, (5) outer
plexiform layer, (6) inner nuclear layer, (7) inner plexiform layer, (8) ganglion cell layer, (9) optic nerve fibers, and (10) inner limiting membrane.

Figure 12.3. Schematic diagram of the parallel visual pathways according to our current understanding, showing the magnocellular pathway and the parvocellular ON and OFF pathways, from the ipsilateral (same side) and the contralateral (opposite side) retinas, through the numbered layers of the LGN, to the visual cortex and on to areas V4 and V5.

As can be seen from the names of
these layers, the term nuclear refers to cell bodies and the term plexiform refers to synaptic
connections. The terms outer (or distal ) and inner (or proximal ) are used relative to the
order of synaptic connection to the brain. Therefore, the photoreceptors are outer relative
to the ganglion cells.
From the signal transmission point of view, the direct path through the retina consists
of the photoreceptors (rods and cones), the bipolar cells, and the ganglion cells (Fig. 12.4).
Figure 12.4. Schematic cross-section of the retina, from the pigment epithelium and the outer and inner segments of the rods and cones, through the outer limiting membrane, the outer nuclear layer, the synaptic terminals (rod spherules and cone pedicles), the outer plexiform layer with the horizontal cells, the inner nuclear layer with the rod bipolar, hyperpolarizing and depolarizing cone bipolar, and amacrine cells, and the inner plexiform layer (sublaminae a and b), to the ON and OFF ganglion cells in the ganglion cell layer, the optic nerve fiber layer, and the inner limiting membrane.
There are horizontal cells that make contact with many photoreceptors. There are also
amacrine cells that make contact with bipolar cells and ganglion cells. The nuclei of the
horizontal and the amacrine cells are located in the inner nuclear layer. Some nuclei of the
amacrine cells are also found in the ganglion cell layer. Over the past 100 years, there have
been extensive studies on the structure and functions of the retina. Its complexity is now
much more appreciated than before. The five major classes of cells mentioned so far are
greatly oversimplified. We now know that there are at least 10 types of bipolar cells, 20–25
types of ganglion cells, 3 types of horizontal cells, and 20–40 types of amacrine cells. Most
of these cells have not been studied in detail, and the number of possible micro-circuits
formed by them is potentially very large. The current research effort is directed towards the
isolation and understanding of each of these micro-circuits [229, 231, 644].
1 There are mechanisms that actively realign the cones to point toward the entrance pupil of the eye and the
local disarray in pointing direction between cones is very small [805].
detection of the light (see the discussion in [215, p. 25]). In an experiment [1023], it
was estimated that the minimum quantal catch by a single cone for detection was near
5 (4–6), and the effective number of illuminated cones was 1 or 2. Other data show that
about 20–50 quanta absorbed per rod (dark-adapted) are needed to give a half-maximal
response. For cones, it ranges from 600 quanta in turtles to 3000 quanta in primates
[264, p. 88]. Therefore, rods are mainly responsible for seeing in dim light (scotopic
vision). When dark-adapted, they saturate at luminances that cause more than about 100
photoisomerizations per integration time (about 200 ms) [902, p. 207]. However, the
exact luminance level at which rods are completely saturated in their operating range
is dependent on the stimulus size, retina location, and the spectral composition of the
light. On the other hand, cones seem to be able to continue to operate at a very high level
of luminance without complete saturation. In summary, rods are able to detect single-
photon events, but saturate at the light level at which about 100 photons are absorbed
per rod. Cones are not able to detect single-photon events (because the response is too
weak to be reliably distinguished from noise), but do not seem to saturate until they are
almost completely bleached. The exact mechanisms underlying these behaviors are not
completely understood. Interesting discussions can be found in [467, 801, pp. 219–221,
802].
(2) The dynamic range of response to change
By dynamic range, here we mean the ratio of the maximum to the minimum change of
intensity over which the photoreceptor can operate with a detectable change in output
response. This is similar to another term called the dynamic range for a brief flash
[802, p. 143]. Obviously, the temporal characteristics of the photoreceptors have to be
considered. In fact, the dynamic range of response to change has to be specified as
a function of temporal frequency and the adapting luminance level. In typical image
viewing conditions, our eyes are constantly moving and each photoreceptor encounters
rapid rises or falls in image irradiance when the retina moves across an edge. Although
the PSF of the eye optics makes a step edge somewhat smooth on the retina, a dynamic
range measure based on a brief flash is still relevant to the photoreceptors’ day-to-day
operation. To a rough approximation, both rods and cones can respond to a dynamic
range of about 100:1 [802, Chapter 7]. Temporally, cones have a faster response time
than rods.
(3) Adaptation
Adaptation usually refers to the gain control process by which a sensory system adjusts
its operating range and response sensitivity. By this definition, the light (or dark) adap-
tation refers to the visual system’s adjustment of its operating point when the overall
luminance of the visual field is increased (or decreased). If the luminance change is
less than a few log units, the new operating point is usually reached within seconds.
The slower visual adaptation process is dark adaptation. The visual threshold of a
dark-adapted eye is elevated by a factor of more than 6 log units immediately after
the eye is exposed to an intense illumination for a period of time. As time goes by,
the threshold gradually decreases until it again reaches the absolute threshold for the
dark-adapted eye. This process of threshold recovery is a typical measure of the dark
adaptation. Cones dark-adapt much faster than rods. The dark-adapted cone threshold
is completely recovered within 15 minutes of light exposure, while it takes longer than 50 minutes for the rods to do so.

Figure 12.5. Mean cone and rod densities (in thousands of photoreceptors per square millimeter) as functions of eccentricity (in millimeters, nasal and temporal) from the fovea along the horizontal meridian of the human retina.
(4) Retinal distribution
The average human retina contains 92 ± 15 million rods and 4.6 ± 0.45 million cones
[224]. The cones are most densely packed at the center of the fovea, where no rod exists.
The cone density decreases very quickly outside the fovea. The rods, on the other hand,
start to appear at about 1◦ from the center of the fovea and their density increases and
reaches maximum at about 20◦ and then starts to decrease slowly towards the periphery
of the retina (see Fig. 12.5). Peak foveal cone density averages 199 000 ± 87 000 cones
per square millimeter and is highly variable between individuals [224]. The point of the
highest cone density can be found in an area as large as 0.032◦2 . In the fovea, the average
rod-free zone is 1.25◦ (350 µm ) in diameter. The number of cones in the rod-free area
is about 7000. The cone density falls off steeply with increasing eccentricity. There is
a streak of high cone density along the horizontal meridian. At the same eccentricities,
the cone density is 40–45% higher in nasal compared with temporal retina. The highest
rod densities are located along an elliptical ring at about the eccentricity of the optic
disk.
(5) Spectral sensitivity
The peak sensitivity of the rod occurs at 507 nm (at the cornea). It is the same as the
CIE luminous efficiency function for scotopic vision, V′(λ). There are three types of
cone: one sensitive to the long wavelengths (the L cone), one sensitive to the mid-
dle wavelengths (the M cone), and one to the short wavelengths (the S cone). Their
peak sensitivities (at the cornea) occur at 566 nm, 543 nm, and 440 nm, respectively [247].

Figure 12.6. Normalized spectral sensitivities of human cones (Stockman–MacLeod–Johnson).

The spectral sensitivity functions of the rod and the three cones have been es-
timated by various means. Figure 12.6 shows an estimate of the normalized human
cone spectral sensitivities by transient chromatic adaptation, published by Stockman,
MacLeod, and Johnson [910]. The numerical values given by these authors were from
390 nm to 730 nm in increments of 5 nm. The frequently used set of cone funda-
mentals by Smith and Pokorny [885] was determined from the color matching data
of color deficient observers. These two sets of cone fundamentals are quite close to
each other for calculating the L, M, S cone responses of most broadband spectra.
However, they produce significant differences when used for narrow-band spectra. The
Smith–Pokorny’s fundamentals are based on Judd’s modified color matching functions.
The Stockman–MacLeod–Johnson cone fundamentals are based on the CIE 1964 10◦
color matching functions and therefore, have the advantage of building on an interna-
tional standard. The question of which set of cone fundamentals is best is still being
debated.
The ion pumps in the photoreceptor membrane maintain a low Na+ ion concentration and a high K+ ion concentration inside the cell. As a result of this concentration gradient,
rods and cones are polarized in the dark (however, because of the dark current, they are
not as polarized as typical neurons which have a resting potential of −70 mV). The dark
resting potential of rods is about −37 mV, and that of cones about −46 mV, in the macaque
retina. In response to light, the permeability of the outer segment is reduced, resulting in a
decrease of the membrane current and an increase of the membrane voltage (hyperpolariza-
tion). Intracellular recordings of the photovoltage of rods and cones show that the response
are highly variable from cell to cell. The maximum response for rods varies from 13 mV to
35 mV, and that of cones from 5 mV to 11 mV [840]. The mean peak response due to the
absorption of a single photon is estimated at 1 mV for rods and 5.2 µV for cones. Intracel-
lular recordings also confirm the following anatomical findings: (1) rods interact with each
other only weakly if at all, consistent with the absence of gap junctions between primate
rods; (2) cones are observed to receive excitatory input from rods (possible through gap
junctions or contacts in synaptic processes). Another important finding from intracellular
recordings is that the photovoltage response of both rods and cones as a function of retinal
illuminance E can be well described by the Michaelis–Menten function:
Vintra = Vmax E / (E + E0),   (12.1)
where Vmax is the maximum photovoltage response of a photoreceptor, and E 0 is the retinal
illuminance required to evoke the half-maximum response V = Vmax /2. However, the values
Vmax and E 0 are highly variable among different rods and cones. The average value of E 0
is 75 photons µm−2 for rods and 1430 photons µm−2 for cones. (The Michaelis–Menten
function was derived by Leonor Michaelis and Maud Menten for their work on enzyme
kinetics in 1913.)
Another method of measuring the photovoltage response is through extracellular record-
ings. This method is easier to do, but the response may be due to more than one photoreceptor,
and typically the magnitude of the voltage response is much smaller than in the intracellular
recording. The response voltage as a function of retinal illuminance from the extracellular
recordings has a slightly different functional form:
Vextra = Vmax E^n / (E^n + E0^n),   (12.2)
where n = 0.74. Of course, Vmax and E 0 here are quite different from those in the intracel-
lular recordings. (This function, similar to the Michaelis–Menten function, was first used by
Archibald Hill in 1913 for describing the oxygen-binding of hemoglobin, and the coefficient
n is called the Hill coefficient [922, p. 155]. In vision research literature, the Hill function
is often called the Michaelis–Menten function or Naka–Rushton function [689].)
Yet another way to measure the photoresponse is to measure the change in the membrane
current due to absorption of photons. The advantage of measuring the photocurrent is that
it is not affected by interaction or coupling between neighboring cells. The amplitude of
the response current of a rod to a single photon is about 1 pA. The response of a cone to
a single photon is estimated to be about 30 fA. The rod response rises to a peak in about
200 ms, while the cone response peaks in about 55 ms and follows with an undershoot before
returning to the dark level. The response–illuminance relation can be fairly described as an
exponential function:
i = imax (1 − e^(−kE)),   (12.3)
where i max is the maximum current response and k is a constant. However, some photore-
ceptors can be better described by the Michaelis–Menten function or a weighted average of
the exponential function and the Michaelis–Menten function [839].
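The three response-versus-illuminance forms just described, Eqs. (12.1)–(12.3), can be compared numerically with a few lines of Python; the parameter values below are only the representative figures quoted in the text (for example E0 ≈ 75 photons µm⁻² for rods), not fitted constants.

import numpy as np

def michaelis_menten(E, V_max, E0):
    # Intracellular photovoltage vs retinal illuminance, Eq. (12.1).
    return V_max * E / (E + E0)

def hill(E, V_max, E0, n=0.74):
    # Extracellular form with Hill coefficient n, Eq. (12.2).
    return V_max * E ** n / (E ** n + E0 ** n)

def saturating_exponential(E, i_max, k):
    # Photocurrent vs illuminance, Eq. (12.3).
    return i_max * (1.0 - np.exp(-k * E))

# Example: a normalized rod photovoltage curve with the average
# half-saturation illuminance quoted above.
E = np.logspace(0, 4, 50)                      # photons per square micrometer
V_rod = michaelis_menten(E, V_max=1.0, E0=75.0)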
Having reviewed the general characteristics of the photo response of rods and cones, let
us examine the underlying mechanisms in more detail. The subject of the light response of
photoreceptors has also been reviewed extensively in the literature (e.g., see [90, 589, 655,
802]).
The light-absorbing pigments in the rods and the cones consist of the chromophore (the
light-absorbing atom or molecule) 11-cis retinal [454], which is embedded in a seven-helix
transmembrane protein molecule called opsin. We now know that each type of photoreceptor
contains only one type of pigment. It is the slight difference in the amino-acid sequences of the
opsin in the pigment that controls the difference in the absorption spectra of the different
photoreceptors. As we mentioned before, there are three types of cones: L, M, and S cones.
The names red, green, and blue cones are sometimes still used, but are not recommended
because they are not individually responsible for producing the perception of red, green,
and blue colors. The amino-acid sequences of all four visual pigments (for the rod and L, M, S
cones) have been worked out (from their DNA structures).
The outer segment of a rod or a cone contains a stack of roughly 1000–2000 membrane
disks. In rods, the interior of the disk is not continuous with the external medium, whereas in
cones it can be demonstrated that the disk interior is continuous with the external medium.
Each disk contains numerous (≈105 ) pigment molecules. Light passing through the densely
packed stack of disks has a very high probability (about 0.5–0.7 at the peak) of being
absorbed by one of the pigment molecules.
So far, the best-studied photoreceptor response to light is that of the rhodopsin in the
rod [149]. In the dark, some (fewer than 5%) of the cGMP-gated ion channels in the outer
segment membrane are open and there is a steady dark current (Na+ and Ca2+ ) flowing from
the cell body out to the extracellular space and then into the outer segment. The absorption
of a photon by the rhodopsin molecule activates an enzymatic cascade that leads to closure
of the channels in the plasma membrane of the outer segment. The cascade is initiated by
the photoisomerization (from cis to trans) of the retinal chromophore of rhodopsin (R).
Photo-excited rhodopsin (R∗ ) then activates transducin (T). (The transducin is a G-protein
which also plays important roles in many signal transduction processes in other cell func-
tions.) The transducin, in turns, switches on a potent phosphodiesterase (PDE∗ ) that rapidly
hydrolyzes cGMP (guanosine 3 ,5 cyclic monophosphate). (Each of these cascaded reac-
tions amplifies the signal a few hundred times. For example, a single R∗ activates about 500
transducins and the cGMP is hydrolyzed to 1/e of its initial concentration by fully activated
PDE in only 0.6 ms.) The light-induced decrease in the cGMP level closes channels, thereby
blocking the influx of cations. The absorption of a single photon causes the closure of a
few hundred channels. The resulting hyperpolarization of the plasma membrane is trans-
mitted to the synaptic terminal at the other end of the cell. In the dark, there is a constant release of the neurotransmitter (glutamate) at the synaptic terminal, and the light-induced hyperpolarization reduces this release.

The cones are packed in a semi-regular array in the retina of the macaque monkey [855]. Within the human fovea, the center-to-center spacing, a, between cones is about 2.6 µm [591]. The Nyquist frequency is 1/(√3 a), which is roughly 63 cycles per degree (assuming that 1 mm on the retina subtends 3.5◦).
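The arithmetic can be checked directly; the conversion factor of 3.5◦ per millimeter of retina is the one assumed in the text.

import math

a_mm = 2.6e-3                      # foveal cone spacing, 2.6 um expressed in mm
deg_per_mm = 3.5                   # approximate retinal angular conversion
f_cycles_per_mm = 1.0 / (math.sqrt(3.0) * a_mm)      # about 222 cycles/mm
f_cycles_per_deg = f_cycles_per_mm / deg_per_mm      # about 63 cycles/degree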
Photoreceptor noise
Retinal photoreceptors generate discrete electrical events in the dark that are indistinguish-
able from those evoked by light. The rate of these random events is highly dependent on the
temperature, strongly suggesting that the noise originates from the thermal isomerizations
of the chromophore molecule, 11-cis retinal. However, the isomerization seems to be a
two-step process. First, the Schiff-base linkage between the chromophore and the protein is
deprotonated. Second, the unprotonated form of the photopigment undergoes the thermal
isomerization of 11-cis retinal to the all-trans form [55].
The first rod pathway (through the rod bipolar and amacrine cells) may be the slower one, while the second pathway through cone–rod gap
junctions may be the faster one [911].
Cone bipolar cells can be divided into many subclasses, based on their contacts with
the cone pedicles (such as invaginating and flat), the extent of their dendritic field (midget
and diffuse), their response polarity (ON and OFF), and their synaptic contacts with the
amacrine cells and the ganglion cells (inhibitory or excitatory). Of course, these features
are not mutually exclusive and the number of functionally independent classes in the human
retina is still not known. The differentiation between midget and diffuse classes is important.
A midget bipolar cell makes contact with only one cone pedicle, while a diffuse (or parasol)
bipolar cell makes contact with many (usually 5–10) cones. The midget bipolar cells thus
preserve a spatial resolution as fine as the individual photoreceptor and they are found mostly
in the fovea cone connections. The existence of ON and OFF classes is usually thought to
increase the response resolution because they provide a larger dynamic range for signal
modulation. ON-bipolar cells respond to a light spot projected onto their receptive field
center with depolarization, while the OFF-bipolar cells respond with hyperpolarization.
One type of midget bipolar cell has processes invaginated in the cone pedicles, forming
the center process of the triad. Evidence shows that the invaginating midget bipolar cells
carry the ON-channel. Another class of midget bipolar cells have their dendritic terminals
contacting the cone pedicles at flat, basal junctions. These are called the flat midget bipolar
cells, and are believed to carry the OFF-channel. From the fovea to the periphery, each
midget bipolar cell is shown to have an exclusive contact with a single cone [126] and
maybe only one midget ganglion cell. The midget bipolar cells are believed to serve only
the L and the M cones.
The diffuse bipolars are also found to have flat and invaginating types. Another possible
classification is according to the stratification of the axons in the inner plexiform layer,
whether closer to the amacrine (sublamina a) or to the ganglion cells (sublamina b). Such a
scheme leads to the classifications DB1–DB6, the six different types of the diffuse bipolar
cells in rhesus monkeys [126]. Since the diffuse bipolar cells contact all cones in their
dendritic fields, they are not likely to carry chromatic signals [126]. The S cone bipolar
cells are morphologically distinct as a separate class [126]. Another study suggests that the
diffuse invaginating bipolar and the differently classified giant bistratified bipolar cells both
receive S cone input [634].
Like photoreceptors and horizontal cells, bipolar cells respond to light in their receptive
field with graded potential changes. Bipolar cells have center-surround receptive fields, with
the excitatory center receiving input directly from the cones, and the inhibitory surround
receiving input from the horizontal cells that pool over many cones in the neighborhood.
The neural transmitter released by the photoreceptor to the bipolar cells is glutamate. The
ON-bipolars respond by depolarizing and the OFF-bipolars by hyperpolarizing. Since both
the ON- and the OFF-bipolar cells respond to the same neural transmitter, their response
mechanisms must be different. It has been shown that 2-amino-4-phosphonobutyrate (APB)
prolongs hyperpolarization in the ON-bipolar cells, making them unresponsive to subse-
quent light stimuli [834]. (In fact, APB suppresses the maintained as well as the light-driven
activity of all ON-ganglion cells in cats [686].) On the other hand, the OFF-bipolar cells
are largely unaffected.
Rod bipolar cells appear to depolarize in response to light (the ON cells) [1013]. Behavior
studies show that under dark-adapted conditions when only rods are functional, APB blocks
monkeys’ detection of both light increment and light decrement, confirming that the primate
rod bipolar cells are probably all of the ON type [834, p. 89]. Rod bipolar cells terminate
in the inner portions (sublamina b) of the inner plexiform layer, just as do cone ON-bipolar
cells. However, rod bipolar cells do not seem to contact the ganglion cells directly. Instead
they form synaptic contacts with amacrine cells which in turn make excitatory contacts
(gap junctions) with ON cone bipolar cells and inhibitory (glycinergic) contacts with OFF-
bipolar cells and OFF-ganglion cells [1013]. This is one of the two rod signal pathways
that we discussed before. The ON- and OFF-channels remain segregated in the LGN and
start to converge in the visual cortex. Current evidence [834] seems to support the proposal
that the ON-channel facilitates the detection of light increment and the OFF channel the
detection of light decrement.
The ganglion cells in the primate retina have been classified into several types, including midget ganglion cells, small parasol ganglion cells, large parasol ganglion cells, and small bistratified ganglion cells. Their dendritic diameter in the foveal retina is 5–9 µm for midget
ganglion cells, 10–35 µm for small parasols, and 25–90 µm for large parasols [520, p. 64].
However, there is conflicting evidence that the parasol cells form only a single cluster whose
size distribution covers both the small and the large parasol cells as classified by Kolb [226].
The dendritic field size increases with eccentricity. For example, the dendritic diameter of
the midget ganglion cells increases tenfold between 2 and 6 mm eccentricity, followed by a
smaller increase, reaching 225 µm in the retinal periphery [227]. The midget ganglion cells
carry the color-opponent signals to the parvocellular layers of the LGN, the small parasol
ganglion cells carry the broad-band, tonic “luminance” signals also to the parvo layer, and
the large parasol ganglion cells carry the broad-band, phasic signals to the magnocellular
layers of the LGN. The term tonic means sustained (or lower temporal frequency) and the
term phasic means transient (or high temporal frequency).
Midget ganglion cells are also called A cells, Pβ cells, or P cells (because they project
to the parvocellular layers of the LGN). Parasol cells are also called B cells, Pα cells, or M
cells (because they project to the magnocellular layers of the LGN). The dendritic field of
a parasol cell is about four times wider than that of a midget cell. Each midget cell receives
its major input from its associated midget bipolar cell in the form of about 50–80 synaptic
contacts. It also makes about the same number of contacts with the amacrine cells [521].
Parasol cells whose dendrites stratify in the inner portion (sublamina b) of the inner
plexiform layer (IPL) give ON-center responses, and those that stratify in the outer portion
of the IPL give OFF-center responses. Similarly, midget cells also have ON-center
and OFF-center types, depending on where their dendrites stratify in the inner plexiform
layer. These four types of cells (midget ON and OFF, parasol ON and OFF) do not seem
to receive input from S cones. A separate cell type, called the small bistratified cell, which
has dendrites stratified in both the inner and outer portions of the IPL, seems to receive
excitatory input from S cones [228]. The inner tier in sublamina b is larger in diameter
and more densely branched than the outer tier in sublamina a (see Fig. 12.4). Its dendritic
diameter is about 50 µm around the fovea and about 400 µm in the far periphery. The
proportion of small bistratified ganglion cells ranges from about 1% of the total ganglion
cells in the central retina to about 6–10% at the periphery. The small bistratified ganglion
cells and the midget ganglion cells project to the parvocellular layers of the (dorsal) LGN,
while the parasol ganglion cells project to the magnocellular layers of the (dorsal) LGN.
It should be noted that these projections refer to the majority of the cells that have been
studied. The details of the exact number, type, and projection of the many types of ganglion
cells are still unknown.
The ON- and OFF-dendritic trees of the midget ganglion cells oppose one another but do
not overlap, having a coverage of no greater than 1. The two mosaics have different spatial
scales, with the ON-dendritic field being larger than the OFF-dendritic field in an area ratio
of about 1.7:1 [227]. It was estimated that 80% are midget ganglion cells and 10% are
parasol ganglion cells. The proportion of midget ganglion cells ranges from about 95% of
the total ganglion cells in the fovea to about 45% at the periphery. The calculated Nyquist
frequencies from the midget cell spacing closely match the measured human achromatic
spatial acuity from 6◦ to 55◦ eccentricity [227]. Therefore, the visual resolution is limited by
the ganglion cell density in this range of eccentricity. (In comparison, in the central vision
of the fovea, the visual resolution is limited by the cone density.)
Many ganglion cells have center-surround receptive fields. The ON-cells are excited
by illumination in the center and inhibited by illumination in the surround. The response
profile is often approximated as the difference between two Gaussian functions. When their
receptive fields are uniformly illuminated, the ganglion cells maintain a roughly constant
rate of discharge, averaging about 20–30 impulses per second in the macaque retina under
2000 td illumination [964].2 The peak discharge rate of a ganglion cell is between a few
hundred and a thousand impulses per second, and therefore the sustained discharge rate
of the ganglion cells is not high enough for decremental modulation. The ON- and OFF-
channels thus serve the purpose of encoding the increments and the decrements of light
with higher resolution.
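A common way to make the preceding description concrete is to write the receptive-field weighting as a difference of two Gaussians. The minimal Python sketch below does this for an ON-center cell; the particular widths and weights are illustrative assumptions, chosen only so that the center and surround integrals roughly cancel, and are not fitted to any measured cell.

```python
import numpy as np

def dog_profile(x, w_center=1.0, s_center=0.2, w_surround=0.2, s_surround=1.0):
    """Difference-of-Gaussians (DoG) weighting of an ON-center receptive field.

    x : retinal position in degrees of visual angle (scalar or array).
    The widths (s_center, s_surround) and weights are illustrative only; they
    are chosen so that the center and surround areas roughly cancel.
    """
    center = w_center * np.exp(-x**2 / (2.0 * s_center**2))
    surround = w_surround * np.exp(-x**2 / (2.0 * s_surround**2))
    return center - surround          # excitatory center minus inhibitory surround

x = np.linspace(-3.0, 3.0, 601)
profile = dog_profile(x)
print(round(float(profile[300]), 3))                      # at the center: positive (excitation)
print(round(float(dog_profile(1.0)), 3))                  # at 1 degree: negative (surround inhibition)
print(round(float(profile.sum() * (x[1] - x[0])), 3))     # near zero: weak net response to a uniform field
```

The last printed value illustrates why a cell with roughly balanced center and surround gives only a weak, maintained response to uniform illumination, as described above.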
In the center-surround receptive fields of ganglion cells, there are variations in the spectral
responses in the center and the surround. These cells are called color-opponent cells. Since
their properties are almost the same as those cells found in the LGN, and since cells in the
LGN are easier to study, most studies on color-opponent properties were performed in the
LGN. We will discuss the color-opponent encoding in the next section, where we discuss
the structures and properties of the LGN.
The ganglion cells in the retina transmit visual signals through the optic nerves that branch
out to different areas in the brain. The majority of them terminate at the LGNs in the
thalamus. The human lateral geniculate nucleus has six major layers, each of which can be
divided into two sublayers: the principal sublayer and the koniocellular (or the intercalated)
sublayer (see Fig. 12.3). Each major layer is about 6–10 cells thick. Although the LGN
receives input from both eyes, the signals from the two eyes remain separate in the LGN.
Each of the six layers receives input from only one eye. For example, major layers 1,
4, and 6 receive input from only the opposite (contralateral) eye, and major layers 2, 3,
and 5 receive input from the eye on the same side as the LGN (see Fig. 12.3). All layers
maintain a well-registered map of half the visual field. If cell recordings are made along
a perpendicular path through the layers, the receptive fields of all the cells encountered
correspond to roughly the same region in the visual field, except areas where there is no
correspondence between the two eyes. Cells in layers 1 and 2 are larger in size than those
in the other four layers. Therefore, layers 1 and 2 are called the magnocellular layers and
the other four layers, the parvocellular layers. The magnocellular neurons (M cells) exhibit
transient responses when light stimuli are turned on or off or when they are moved across
the visual field. The parvocellular neurons (P cells) respond to prolonged illumination with
sustained discharges.
The receptive fields of the P cells and the M cells are center-surround organizations,
similar to those of the ganglion cells – so similar that the LGN was once considered as
merely a relay station for the ganglion cells. However, the LGN also receive fibers from
the cerebral cortex, and from the brainstem. Therefore, the LGN are more than just relay
stations. There is evidence that an LGN also functions as an attention filter [862]. The
spectral sensitivities of the center and the surround of many LGN cells are the opposite of
each other (e.g., the center is more sensitive to long-wavelength light while the surround
is more sensitive to middle-wavelength light). Spectrally, there are three types of P cells:
broad-band, red–green opponent, and yellow–blue opponent. The M cells do not seem to
show any clear chromatic opponency. The color encoding is the subject of discussion shortly.
Based on numerous anatomical, physiological, and psychophysical studies, it appears that
P cells and M cells form two distinct visual pathways: the P pathway and the M pathway [599,
1066]. Discrimination of color, fine pattern, and texture require the parvocellular layers,
while perception of motion and depth require the magnocellular layers. Immunoreactive
staining also reveals that the koniocellular (or intercalated) sublayer cells in each of the six
main layers seem to form a distinct third channel (in addition to the P and M channels)
that projects to the visual cortex [392, 393, 802]. Retrograde labeling shows that these cells
provide input to the cytochrome oxidase-rich blobs in layers 2 and 3 of the primary visual
cortex that are known to have mostly cells that process color information.
Once the responses of the three cone types have been gathered, spectral attributes can be computed. The L cones have a larger response than the
M cones for a red flower, and the opposite may be true for a yellow flower. But how is
color information processed and encoded in the various stages along the neural pathways?
It turns out that the L, M, and S cone signals are not sent to the brain independently in
three separate channels. In fact, one of the major design principles of the neural signal
processing is to preserve retinal image contrast, or even enhance the differences spectrally,
spatially, and temporally. The three cone signals are compared and combined early on, and
by the time they reach the ganglion cells, color seems to be coded in opponent processes.
For example, one type of ganglion cell responds to a small spot of long wavelength illu-
minating the center of the receptive field by increasing the pulse rate, but to a larger spot
of middle-wavelength light by decreasing the pulse rate. The long wavelength is excita-
tory, but the middle wavelength is inhibitory – there is an opposite response for different
wavelengths.
There are many types of ganglion and LGN cells. Figure 12.7 shows some of the major
types that have been found. The notation used is R, G, B, Y, and W, representing the long
wavelength, middle wavelength, short wavelength, long + middle wavelength, and wide-
band spectra. This notation is how those cells were labeled when they were first found; it
might have been better to label them using the L, M, and S notation. However, there are
questions about whether these center or surround fields are contributed by one or more types
of cones. Therefore, we will tentatively use R, G, B, Y, and W here, with the understanding
Figure 12.7. Cell types found in the LGN: (a) the six types of color-opponent cells in the P cells and
the M cells; (b) a model of the R+ G− receptive field; (c) a model of the center-surround broadband
receptive field.
that they are not directly linked to our perception of red, green, blue, yellow, and white.
The color-opponent cells are mostly found among the P cells, while the broadband cells
are among the M cells. However, it is important to point out that the spectral responses of
cells that belong to the same type, say, R+ G−, are not exactly identical. In fact, they tend
to vary continuously; so much so that initially there was a lot of difficulty describing what
their spectral response characteristics were, other than using vague words such as red, green,
yellow, and blue. Fortunately, studies with better control of the spectral composition of the
stimuli give us some good ways to describe them. One such description is based on a color
space called the DKL color space [480], because it was first explicitly expressed in a paper
by Derrington, Krauskopf, and Lennie [252], who did an extensive analysis of the spectral
responses of the LGN cells, using this color space. The DKL color space consists of one
luminance axis and two chromatic axes. The luminance axis represents the CIE photopic
luminance. Depending on how the cone sensitivities L and M are scaled, we can write the
luminance axis as w 1 L + w 2 M. One convention is to scale the cone sensitivities so that
the luminance is simply L + M. One of the two chromatic axes is defined as the constant
S axis. That is, points along this axis have different L and/or M responses, but their S
responses are all the same. The other chromatic axis is the constant L and M axis. Points
along this axis have different S responses, but they all have the same L and M responses.
These two axes are sometimes called the cardinal axes. Derrington, Krauskopf, and Lennie
found that most of the LGN cells fall on one of three planes in the DKL space, meaning that
substitution of one stimulus by another on the same plane does not change the response of
that class of cells.
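As a concrete, deliberately simplified illustration of these axes, the sketch below maps a change in cone responses into three DKL-like coordinates, using the convention mentioned above that the luminance axis is L + M. The axis scalings are arbitrary here, and real DKL coordinates are normally scaled relative to the background in a more careful way; treat this as a sketch of the geometry only.

```python
import numpy as np

def dkl_like_coordinates(lms, lms_background):
    """Decompose a cone-response change into DKL-like coordinates.

    Luminance axis      : change in L + M.
    Constant-S axis     : an L - M difference (S is unchanged along this axis).
    Constant-(L,M) axis : a pure S change (L and M are unchanged along this axis).
    The scaling of each axis is arbitrary in this sketch.
    """
    dL, dM, dS = np.asarray(lms, float) - np.asarray(lms_background, float)
    return dL + dM, dL - dM, dS

background = (10.0, 10.0, 5.0)            # assumed background cone responses
red_green = (11.0, 9.0, 5.0)              # L up, M down, S unchanged
s_only = (10.0, 10.0, 6.0)                # only S changes

print(dkl_like_coordinates(red_green, background))   # (0.0, 2.0, 0.0): isoluminant, constant S
print(dkl_like_coordinates(s_only, background))      # (0.0, 0.0, 1.0): constant L and M
```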
Although many studies have been done to quantify how color information is encoded
in our neural pathways, currently there are still no definitive models that can account for
the many varied observations. Specifically, the psychophysical color appearance data are
not directly correlated with those that have been measured from neurons. For example,
the opponent color processes proposed by Hering [396], and Hurvich and Jameson [436]
do not correspond to the DKL color axes. A similar difference exists between spatial fre-
quency channels and the observed neural responses. It seems that the neural correlates of
perceptual attributes lie further up in the brain.
The total area of the human cerebral cortex is about 1500–1900 cm2 . About 27% of the area
is predominantly for vision, 8% for hearing, 7% for the sense of touch, and 7% for motor
control [982]. The large visual cortex area has been found to consist of many subareas, each
of which is involved in a different step or specialized processing of visual information. The
full complexity is far from clear, but the understanding of the organization of the visual areas
has been much refined due to the use of functional magnetic resonance imaging (fMRI)
[262, 980, 981, 1007]. The brain cortex is a convoluted three-dimensional surface and
therefore is difficult to draw on a flat display. Using computers, it is possible to flatten out
the cortical surface (like projecting the earth’s surface on a flat map). Figure 12.8 shows a
schematic flat map of the right brain cortex and the approximate locations of the various
Figure 12.8. Schematic diagram of the flattened map for the right human brain, showing the various
visual cortical areas (redrawn from [982]).
visual areas. Most of the areas have not been well studied yet and some of the areas are
not named universally (for example, the area MT in the human is tentatively so named here
because its response characteristics are similar to the corresponding area in the macaque
monkey). Many of the areas, such as V1, V2, V4, and MT, have been studied extensively in
the monkey. Area V1 is the primary area to which LGN neurons project. Area MT has been
shown to be essential for motion perception. It should be noted that even the size of the well-
studied areas can vary from individual to individual by a factor of as much as 2 or more. The
boundaries between neighboring areas are often quite subtle and cannot be distinguished
with high resolution. Since these areas are partitioned by response characteristics, anatomy,
and physiology, some ambiguities and inhomogeneities do exist. Therefore, the map serves
only as a rough guide of our current understanding and should not be taken too literally.
The visual cortex studies that will be discussed here were on the macaque monkey,
unless the description explicitly refers to humans. Differences between the macaque and the
human visual areas have only begun to be mapped out by fMRI in recent years. In particular,
human V3A is much more sensitive to motion [956], and human V4 seems to occupy
a different location [999].
When Hubel and Wiesel first recorded from cells in the primary visual cortex, they presented simple stimuli,
such as small spots of light, to see if the cells would respond, but the cells did not respond at all. After trying many different
types of stimuli, Hubel and Wiesel finally succeeded in finding the stimuli to which these
cells would respond very strongly. Although we do not yet know all the details, it seems
that the cells in the primary visual cortex may be computing image features such as edges,
lines, corners, and ends of a line, etc. (see [418] for a fascinating account of this discovery).
The axons of LGN neurons terminate in the two sublayers, α and β, of layer 4C of the
primary visual cortex. The M cells terminate in layer 4Cα, while the P cells terminate in
layer 4Cβ (see Fig. 12.3). Cells in layer 4C still have center-surround receptive fields similar
to the LGN cells, but instead of responding weakly to a diffuse white light as the LGN cells
do, they virtually do not respond at all. This is a continuous trend of cell behavior as we go
from the retina to LGN and then to the visual cortex and beyond. Cells respond to more and
more specific stimuli and less and less to diffuse light. From layer 4Cα, we find projections
to layer 4B, and then from 4B to other visual association cortex areas, such as V5. From
layer 4Cβ, we find projections to layers 2 and 3, and from there to layers 5 and 6, and to
other visual areas, such as V2, V3, and V4. In the sublayers of layer 4, Hubel and Wiesel
found many cells that respond strongly to stimuli that look like edges and lines. They called
them simple-cells, because their receptive field structures are simpler than the complex-
cells that they found in layers 2, 3, 5, and 6. Almost all cells are orientation selective. They
only respond to lines or edges that are oriented in a certain direction and their response
amplitudes decrease sharply when the features are oriented about 10–20◦ off the optimal
angle. Figure 12.9 shows three types of receptive field for such simple-cells. The size of
a simple-cell receptive field ranges from 1/4◦ near the fovea to 1◦ at the far periphery. The
Figure 12.9. Three types of receptive field for simple-cells in the primary visual cortex: an edge
detector, a bright-line detector, and a dark-line detector. The “+” means excitatory region and the “−”
the inhibitory region.
width of the center bar of a line detector is only a few minutes of arc (Fig. 12.9). To evoke
sustained responses from the simple-cells, the line or the edge has to be critically positioned
and oriented.
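In the computational vision literature, receptive fields of the kind just described are very often modeled as oriented Gabor functions, that is, a Gaussian envelope multiplying a sinusoidal carrier. That modeling choice, and every parameter value below, are assumptions of this sketch rather than results reported in this chapter; the sketch only illustrates the orientation selectivity described in the text.

```python
import numpy as np

def gabor_rf(x, y, theta=0.0, wavelength=0.25, sigma_x=0.1, sigma_y=0.2, phase=0.0):
    """Oriented Gabor receptive field (a common simple-cell model).

    theta      : preferred orientation (radians)
    wavelength : period of the carrier (degrees of visual angle)
    sigma_x/y  : envelope widths across and along the preferred bar
    phase      : 0 gives an even, line-detector-like field;
                 pi/2 gives an odd, edge-detector-like field.
    """
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 / (2.0 * sigma_x**2) + yr**2 / (2.0 * sigma_y**2)))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength + phase)

def simple_cell_response(stimulus, rf):
    """Half-wave-rectified inner product of the stimulus with the receptive field."""
    return max(0.0, float(np.sum(stimulus * rf)))

# A grating at the preferred orientation drives the model strongly;
# one rotated by about 20 degrees drives it much less.
x, y = np.meshgrid(np.linspace(-0.5, 0.5, 101), np.linspace(-0.5, 0.5, 101))
rf = gabor_rf(x, y, theta=0.0)
preferred = np.cos(2.0 * np.pi * x / 0.25)
rotated = np.cos(2.0 * np.pi * (x * np.cos(0.35) + y * np.sin(0.35)) / 0.25)
print(simple_cell_response(preferred, rf), simple_cell_response(rotated, rf))
```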
Complex-cells respond to moving lines and edges, and they do not have clear excitatory
and inhibitory regions in their receptive fields. They respond to lines and edges moving
through their receptive fields with the correct orientations. Complex-cells have larger re-
ceptive fields than simple-cells. Near the fovea, the field size is about 1/2◦ × 1/2◦. The width
of the bar in the field is about the same as that of a simple-cell. We still do not know how
the complex-cell receptive field is constructed from its input signals.
Another interesting type of cell is the end-stopped cell. Simple-cells and complex-cells
respond best to lines that are as long as their receptive fields. End-stopped cells respond
best when the line ends in a certain region of its receptive field. If the line is longer, the
response weakens or disappears.
All simple-cells, complex-cells, and end-stopped cells are orientation selective, and they
are mostly not particularly selective to color changes. That is, for most cells, their receptive
fields are the same whether they are mapped out using white light, or colored light. However,
a separate group of cells in layers 2 and 3 has a completely different behavior. These cells
were identified by staining the cortex for the enzyme cytochrome oxidase. The staining
patterns look like dark blobs, patches, or spots. Some cells in the blobs have receptive
fields that are similar to those of the LGN color opponent cells. Some others have center-
surround receptive fields, but they are different from the color-opponent cells of the LGN. For
example, the center is excited by long-wavelength light, but inhibited by middle-wavelength
light, while the surround responds in the opposite way. The spatial-opponency and the color-
opponency coincide exactly, and these cells are called double-opponent cells. Their receptive
field centers are several times larger than those of the color-opponent cells in the LGN. Yet
another type of cell has a color-opponent center and a nonchromatically opponent surround
[968]. It is also found that within the same blob, cells seem to have the same color opponency.
Blobs dedicated to red/green color opponency are about three times more numerous than
those dedicated to yellow/blue [968].
With the complicated cell response behaviors described above, we cannot help won-
dering how so much computation can be accomplished in so few steps along the neural
signal processing pathways. What is more amazing is the finding that macaque V1 neu-
rons seem to signal illusory contours [367, 587]. Therefore, not only are true edges and lines
detected here, but the proper completion of contours may also be accomplished as
early as V1. More data are needed to confirm such complicated behavior. The other aspect
that is still not clearly understood is the role of feedforward and feedback gain controls
from other precortical and cortical areas to the primary visual cortex. Anatomically, we
have found many such neural contacts. Psychophysical evidence is beginning to show that
these gain control signals shape the information encoding as early as the primary visual
cortex V1 [963].
One other important discovery made by Hubel and Wiesel about the primary vi-
sual cortex is the spatial organization of eye-preference and orientation-preference cells.
Cells receiving input predominantly from one eye are arranged in a stripe against an-
other stripe that has cells from the other eye. They also found that orientation preference
Figure 12.10. A schematic diagram of the visual cortical areas (redrawn from Fig. 17.1 in [52]).
differs by about 10◦ for every 0.05 mm distance parallel to the surface of the cortex.
This type of cell organization seems to imply that a spatial point in the visual field is
analyzed with respect to all possible orientations in a very small and localized region
of V1.
When two superimposed gratings are moving in perpendicular directions, we see a combined global motion in the
diagonal direction. However, local optical flow analysis shows that each grating is moving
in a different direction. Cells in the LGNs respond to the local optical flow vectors, but some
cells in V5 respond to the perceived global motion vector, instead. In the still higher cortical
areas, cells respond to even more global and three-dimensional concepts, such as surface
orientation [971, 979]. These findings suggest that by the time we reach cortical areas V4
and V5, perceptual attributes have been extracted from the raw visual signals sensed by
the retina. V4 further projects to the inferior temporal cortex which is known to process
information about the “what” aspect of an object, such as the form, shape, and color. V5
further projects to the posterior parietal cortex which is known to process information about
the “where” aspect of an object, such as location, and motion. However, it should be pointed
out that the functions of the two cortical pathways, one to the inferior temporal cortex and
the other to the posterior parietal cortex, are not clearly understood. Instead of the “what”
and “where” visual functions, the inferior temporal pathway is said to process perceptual
representation and the posterior parietal to process visuomotor control signals [666]. The
existence of many pathways to process different aspects of visual perception may explain
some very strange behavior in patients who suffer from brain damage in one of the pathways
[403, 782].
In our discussion of neural structures and connections of the retina and the visual cortex,
we have mentioned three main concurrent processing pathways: (1) the P–I pathway (parvo
→ 4Cβ → interblob → pale stripe → V4), (2) the P–B pathway (parvo → 4Cβ → blob
→ thin stripe → V4,V5), and (3) the M pathway (magno → 4Cα → 4B → thick stripe →
V3,V5). Evidence for the existence of such parallel neural pathways comes mainly from the
anatomical tracing of neural projections from one area to another, and from the morpho-
logical (shape), chemical (staining), and physiological (extra- and intra-cellular recordings)
classification of cell types. However, it should be pointed out that the segregation between
these pathways is not complete, and the pathways are not fully independent. Lesion studies also indicate that the
parvo- and the magno-cells of the LGNs may provide input to pathways other than those
in the above projections. By measuring the response characteristics of a cell, we cannot
say to which pathway it belongs. For example, the P–B pathway is characterized by cells
that have high wavelength selectivity, but similar selectivity can also be found in the other
two pathways. Similarly, stereo disparity selectivity can be found in both the P–I and M
pathways. Table 12.1 shows how cells with various selectivities are found in the different
cortical areas and visual pathways of the macaque monkey (which has been shown to have
very similar color vision to humans) in terms of percentages. It is therefore not correct to say,
for example, that cells in the M pathway do not respond to chromatic signals just because
the M pathway seems to be mainly for processing motion information. There are at least two
reasons why making such a statement is likely to be misleading: (a) color information may
also be useful for computing motion, and (b) the M pathway may be processing more than
just motion information. So what is a proper or most likely interpretation of these parallel
pathways? DeYoe and Van Essen [253] provide a plausible view of such an interpretation
as follows.
Vision is concerned with extracting information about three-dimensional objects and
environments from two-dimensional retinal images. Visual information is described in our
visual system in terms of certain perceptual attributes, such as color (hue, saturation, and
lightness), shape, motion, depth, etc. These perceptual attributes have to be computed from
two-dimensional retinal images, from which one can compute the sensory cues, such as the
gradient of the image irradiances as a measure of local contrast, the density and orientation
of local edges as a measure of local textures, and the relative variations of the L, M, and S
cone excitations at different spatial positions as a measure of reflectance and illumination
changes, etc. To derive the perceptual attributes from the sensory cues is a very complex
computational problem [636]. A given sensory cue can be used to help compute many visual
attributes and any given visual attribute requires more than one sensory cue for its robust
computation.
Knowing that the human brain has localized centers for certain sensory or motor func-
tions allows us to assume that one visual attribute is mainly computed in one localized
cortical area. Since the visual attributes are normally quite different from the physical cues
that they are computed from, we do not expect that the computations can be accomplished
in one step. For example, a white object in the shade may reflect less light than a black
object under the sun, but our perceptual attribute, lightness, is more correctly correlated
with the actual reflectance of the object, than with the amount of light reflected from the
object. In order to compute such a correct visual attribute, it is necessary to group pixels
into objects, make an inference about the surface shape, compute the spatial layout, and
estimate the illumination distribution. Each of these steps is difficult and may itself require
many steps to compute. But, more importantly, the results from each of these steps are also
useful for deriving other visual attributes as well. It is true that certain computations use one
type of physical cue more heavily than others. For example, motion may not require high-
resolution spatial information because the image areas under motion are blurred anyway. To
the extent that such demands of low-level image information can be parceled into different
spectral–temporal–spatial signal partitions, our neural computations can be processed in
parallel, and so we can justify the existence of parallel processing streams. However, the
computational results from each intermediate step need to be shared among various pro-
cesses that eventually lead to different perceptual attributes. We therefore expect that these
parallel pathways will diverge and converge, and share information along the progressive
stages from the retina to LGN, V1, V2, V3, V4, V5 and beyond. Furthermore, some compu-
tations may have to be duplicated among pathways, if sharing is not complete. Viewed from
this perspective, it is not surprising if we find wavelength-selective cells, or stereo-selective
cells in the motion pathway.
With the above perspective, we can now discuss the general characteristics of the three
pathways. Their main characteristics are listed in Table 12.2. The P–I pathway seems to
be involved in the perceptual tasks of shape and orientation discrimination and computa-
tion. The P–B pathway seems to be computing color perception. The M pathway seems to
be involved in movement perception, depth perception, and figure–ground discrimination
[599]. These perceptual tasks are found to be highly correlated with the response charac-
teristics of the neurons that are typically found in the associated pathways. (Although most
of the anatomical studies were performed on monkeys, there is evidence that the corre-
lation is also the same in humans [599, 977, 1066].) For example, our motion perception
is insensitive to color, has high temporal resolution and low spatial resolution, and has high
contrast sensitivity, all of which are characteristic of the cells found in the M pathway. If
we create an equiluminant target of red and green bars drifting across the visual field, we
do not see the motion very well, although we are aware that the positions of the red and
the green bars are changing. Since the M cells have a broadband sensitivity, they cannot
detect the boundary between the equiluminant red and green bars and, therefore, are not
able to compute the motion information from them very effectively. On the other hand,
the P–I pathway is sensitive to color and it responds to the shape and orientation of the
bars. This type of stimulus almost never occurs in nature, and the strange perception of
such a dissociation of form and motion perception is a very surprising validation of our
anatomical and physiological findings about the parallel visual pathways! We can take ad-
vantage of these unique characteristics of our visual perception in many of the color imaging
applications.
12.8 Problems
12.1 Describe the main differences in the characteristics between rods and cones, in terms
of morphology, cell structure, response dynamic range, light sensitivity, and retinal
distribution.
12.2 Let us assume that the photoreceptor response voltage, V , is related to the retina image
irradiance, E, by the Hill function:
V = \frac{V_m E^n}{E^n + E_0^n} . \qquad (12.4)
One hypothesis proposed in the literature is that the lightness, L ∗ in CIELAB, as a
function of Y (luminance) is a good fit to the first half of the Hill function. Determine
the proper values of Vm , E 0 , and n that produce a “best” fit. How do you interpret Vm
and E 0 in such a fit?
12.3 The 11-cis retinal is very stable at normal body temperature. Spontaneous isomer-
ization of the retinal chromophore of rhodopsin occurs once in a thousand years.
However, our retina contains about 92 million rods and each rod contains about 1500
membrane disks, each of which contains about 105 rhodopsin molecules. On the av-
erage, how many spontaneous isomerizations per second occur in the retina? How do
our eyes cope with this noise level?
12.4 Experiments show that the measured color matching functions are slightly different for
different observers. Discuss the various factors that might account for such differences.
12.5 The primary pathways for rods and cones remain separate until they merge into the
input to ganglion cells. Describe how and where the rod signals merge with the cone
signals.
12.6 It is often said that the retina has a mixed signal processing structure, i.e., visual
signals are represented in analog and digital forms at different stages of processing.
Can you point out which part is analog (graded potential) and which part is digital
(pulses of action potential)?
12.7 What are the P-pathway and the M-pathway? What are their possible visual functions?
13 Visual psychophysics
An important part in our study of color imaging science is the subject of image perception.
Before we embark on that topic, we need to review some general results from visual psy-
chophysics. In many cases, understanding how we perceive a reproduced image is not suffi-
cient, because we also need to understand how we perceive the original (three-dimensional)
scene, so that we can reason about and judge the imaging and reproduction
processes.
In this chapter, we will discuss the nature of psychophysical measurements and the
various psychophysical phenomena in visual perception. Psychophysics is the science of
studying the (human) psychological responses to physical stimuli. Since our psycholog-
ical responses are subject to the influence of many variables, the major problem in psy-
chophysics is to define the physical stimulus carefully so that only the relevant response
is measured. The formulation of most psychophysical problems is obviously very diffi-
cult and the majority of psychophysical experiments have not succeeded well in this re-
spect. Therefore, most experimental results in psychophysics are not easy to interpret.
The dilemma is that, if the stimulus configurations are very specific, we wonder if
the results are useful for other configurations. If they are not well controlled, we won-
der which factors in the configurations are affecting the responses. Discussions on these
concerns and other theoretical issues would take us too far from our main objective here.
Our emphasis will be more on experimental observations, rather than theoretical models.
If visual perception were better understood, we would be able to present our knowledge
in a better organized manner. In fact, there have been many theories and hypotheses that
we could have built upon to simplify our presentation, but we feel that at this moment,
these theories or models do not have sufficient validity and generality to warrant such
a presentation. Therefore, in this chapter we will first present the experimental founda-
tions of certain simple principles regarding visual perception and then we will discuss
various aspects of vision that will help our later study of imaging system design and eval-
uation issues, and also important visual phenomena that will be useful as constraints or
guidance when we study the computational models of vision. Hopefully, this emphasis on
experimental data and observations will avoid one of the problems in some areas in vi-
sion research: too much theorizing and too little study of what is to be explained [290,
Preface].
Typical responses measured in psychophysical experiments can be classified into seven cat-
egories (according to Stevens [904]): (1) Absolute thresholds: what is the minimum strength
of a stimulus we can detect? (2) Differential (incremental or decremental) thresholds: what
is the minimum change in a stimulus that we can detect? (3) Equality (or matching): which
stimuli that differ in certain physical attributes (such as spectral composition in color match-
ing) are perceptually equal? (4) Rank ordering: how is a set of stimuli arranged in order
according to some perceptual attribute? (5) Equal interval: what physical stimuli produce
equal differences in some perceptual attribute? (6) Equal ratio: what physical stimuli pro-
duce equal ratios in some perceptual attribute? (7) The subjective estimation of physical
attributes. These seven categories vary greatly in difficulty and reliability. Unfortunately, in
color imaging applications, we are often interested in measuring the “higher” level or more
subjective responses and hence have to deal with even worse problems in the reliability and
interpretation of the measured results. The best strategy is always to formulate the experi-
ments in one of the first four categories, whenever possible. These issues will be discussed
in the context of psychometric methods later.
There are three main areas in psychophysics that are of interest to us: (1) How do we
quantify psychological responses (measurement scale)? (2) What psychophysical methods
are useful for measuring responses (psychometric methods)? (3) How do we interpret psy-
chophysical data for color imaging applications (data interpretation)? The last area covers
both error analysis and data extrapolation. Our presentation here is necessarily limited in
scope and depth. Readers should consult books and papers that deal with the topics in more
detail (e.g., [298, 342, 361, 370, 904]).
Measurement scales are commonly classified into four types (nominal, ordinal, interval, and ratio
scales), according to whether they are based on identity, rank order, equality of intervals, or equality of
ratios. We are interested only in the interval scale and the ratio scale. By definition, the
interval scale is determined by judging equality of differences; it does not need to have an
origin, i.e., zero point. For example, if we take a piece of white paper and a piece of black
paper, and we ask an observer to select a piece of gray paper that is equally different from
both of them in lightness, we can collect a set of data that tell us what the reflectance factor
of the gray paper should be to produce a lightness that is half way between those of the
white paper and the black paper. We can then repeat this process to determine other equal-
difference points and come up with an interval scale for lightness because it is determined
by judgments of equality in differences. In this process, there is no need to assume an origin
(i.e., the zero point) because as far as the process is concerned, it is irrelevant. In comparison,
the ratio scale requires the existence of a zero point so that it is meaningful to talk about
ratios. For example, we can show a piece of dark gray paper A and a piece of light gray paper
B, and ask an observer to select a piece of even lighter paper C so that the lightness ratio of B to A
is equal to the lightness ratio of C to B. Again, by repeating this process, we can determine a ratio scale
of lightness. In this procedure, we have implicitly assumed that there is something called
zero lightness and all other lightness values are measured from that zero point. Thus, it is
meaningful to talk about lightness B, lightness A and their ratio. Fortunately, experiments
seem to show that these two lightness scales are equivalent (i.e., they can be described by
the same mathematical equation, L = Y^p − c, where L is the perceived lightness, Y is the
luminance of the paper, c is a constant, and p is approximately 1/3 for light surround).
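As a small numerical illustration of the interval-scale procedure under the stated model L = Y^p − c with p ≈ 1/3 (the reflectance factors of the white and black papers below are assumed for illustration), the reflectance of the half-way gray can be computed directly; note that the constant c drops out of the averaging.

```python
p = 1.0 / 3.0                    # exponent of the lightness model L = Y**p - c
Y_white, Y_black = 90.0, 4.0     # assumed reflectance factors (%) of the white and black papers

# Equality of differences: L_mid - L_black = L_white - L_mid, so L_mid is the
# average of the endpoint lightnesses, and the constant c cancels out.
Y_mid = (0.5 * (Y_white**p + Y_black**p)) ** (1.0 / p)
print(round(Y_mid, 1))           # about 28, well below the arithmetic mean of 47,
                                 # as expected for a compressive lightness scale
```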
The method of adjustment and the method of limits suffer from lack of control of the
criteria used by the observer. Two different observers may use different criteria to judge
when the stimuli are equal or just noticeable. Furthermore, these two methods are also
shown to produce biased estimates. For example, in the method of adjustment, an observer
may stop adjusting prematurely because of frustration, or may overadjust the stimulus
control to make sure that some difference can definitely be observed. For these reasons, these
two methods are not, and should not be, used when the method of constant stimuli under
computer control can be used. Currently, a widely used procedure is multiple alternative
forced choice, in which the observer is forced to choose in which of the few time intervals
(presented sequentially) or alternative stimuli (presented simultaneously), the desired signal
is present. This procedure can be used under computer control to determine some threshold
value or some psychometric function very quickly and accurately.
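To make the idea of a computer-controlled forced-choice procedure concrete, here is a small simulation of a two-alternative forced-choice (2AFC) contrast-detection run using a 2-down/1-up staircase. The simulated observer, its Weibull-shaped psychometric function, and every numerical value are assumptions chosen purely for illustration; real experiments use carefully validated procedures and many more trials.

```python
import random

def simulated_2afc_trial(contrast, threshold=0.01, slope=3.0):
    """Simulated observer: returns True when the 2AFC response is correct.

    A Weibull-shaped detection probability is combined with a 0.5 guessing
    rate; the threshold and slope values are arbitrary assumptions.
    """
    p_detect = 1.0 - 2.0 ** (-(contrast / threshold) ** slope)
    return random.random() < 0.5 + 0.5 * p_detect

def two_down_one_up(start=0.05, step=1.25, n_trials=200):
    """2-down/1-up staircase; it converges near the 70.7%-correct contrast."""
    contrast, run_of_correct, history = start, 0, []
    for _ in range(n_trials):
        history.append(contrast)
        if simulated_2afc_trial(contrast):
            run_of_correct += 1
            if run_of_correct == 2:        # two correct in a row: make the task harder
                contrast /= step
                run_of_correct = 0
        else:                              # any error: make the task easier
            contrast *= step
            run_of_correct = 0
    return sum(history[-50:]) / 50.0       # average of the late trials as the estimate

random.seed(1)
print(two_down_one_up())                   # close to the simulated observer's threshold
```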
Psychophysical data should therefore be used with a clear awareness of their limitations, while taking full advantage of them whenever no reliable alternatives are
available. Here we offer several cautions with the hope that if we can use the information
correctly, psychophysical data will help us in our engineering design and evaluation.
A well-known example is the logarithmic brightness function derived from Fechner's hypothesis that the brightness,
B, is the integral of the just noticeable difference (JND) as a function of luminance level, Y.
At high luminance levels, the JND luminance, ΔY, is found to be roughly proportional to the
luminance. Therefore, according to Fechner's hypothesis: ΔB = kΔY/Y, and, integrating
both sides, we get B = k log Y. Using this brightness function in imaging applications raises
very serious questions. First of all, the proportionality constant, k, is not a constant, but varies
with luminance level and stimulus configuration. Secondly, the experimental stimuli that
were used to determine the JND luminance are quite different from a complex, natural scene.
Thirdly, the JND luminance may be related more to the noise in the visual system that makes
it impossible to discriminate two similar luminance levels than to the brightness sensation
itself. To make the jump from ΔY (JND) = kY to ΔB = kΔY/Y is in fact an assumption,
not an experimental conclusion. However, the literature of digital image processing is full of
models that take B = k log Y as the basic relation that relates the brightness to the luminance
of the stimulus. This is a most unfortunate mistake made again and again by engineers who
are not aware of the assumptions and the conditions of the experiments.
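The step being criticized can be written out explicitly. The short derivation below is the standard Fechnerian argument, reproduced only to make the hidden assumption visible:

```latex
\begin{align*}
\text{Weber's observation (over a limited range):}\quad
  & \Delta Y_{\mathrm{JND}} \approx k_1\, Y, \\
\text{Fechner's assumption (one JND equals one unit of brightness):}\quad
  & \Delta B = k\,\frac{\Delta Y}{Y}
    \;\Longrightarrow\; \mathrm{d}B = k\,\frac{\mathrm{d}Y}{Y}, \\
\text{Integration:}\quad
  & B = k \ln Y + C .
\end{align*}
```

Only the first line is an experimental finding; the second line is the assumption that discrimination steps can be summed into a sensation magnitude, which is exactly the jump questioned above.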
In summary, visual psychophysical data are quite important for us to understand how our
visual perception works, but we should always pay attention to the experimental conditions
and the type of stimuli used in the experiments. For engineers working on color imaging
applications, it is not easy to check the details in these visual experiments. The first question
one should ask when presented with some psychophysical "law" is under what
conditions it applies. For example, Weber's law says that the JND in stimulus intensity,
ΔI, is proportional to the stimulus intensity I. This law is so generally useful in many areas
of perception that it is often considered as a summary of facts. In reality, experimental data
show that this law is true only over some range of stimulus intensity and even then it is
true only for certain stimulus configurations. Before we use Weber’s law in our imaging
application, we have to know those constraints. When we study the psychophysical literature,
we need to make clear distinctions between data, concepts, theories, and models. Weber’s
law as applied in visual perception is a model of only limited validity.
Visual thresholds, which represent our capability to detect the presence of a visual stimulus
under various conditions, are the best-studied area in visual psychophysics. Two of the most
important visual thresholds are the absolute threshold for detecting the minimum amount
of light energy and the contrast threshold for detecting the slight difference between two
neighboring luminance levels. The detection of small color differences is also studied in the
context of various thresholds, depending on which attribute or physical variable is changed
and detected.
Figure 13.1. Contrast thresholds as a function of the adaptation luminance and the target size. These
are measured with a circular disk against a uniform background (stimulus brighter than the background).
The target size is expressed as the visual angle in minutes of arc.
luminance, and (3) with sine wave gratings. The results from (1) and (2) are about the same.
Contrast thresholds for sine wave gratings are quite different and they are discussed in the
next section.
Since visual thresholds depend on target size, viewing distance (visual accommodation),
retinal position, eye movement, and time of viewing, these factors need to be controlled in
the experiments. When comparing contrast threshold data, it is important to note if these
factors are controlled the same way in the experiments. However, for imaging applications,
there is usually no constraint on retinal position or viewing time, and data that are collected
under these conditions are still quite useful, if not more so. One set of the best-known contrast
threshold data was collected by Blackwell [114], who used circular disks of various sizes
against a uniform background at different luminance levels. The observers were free to move
their eyes and had plenty of time (longer than 15 s) to examine the visual field. Contrast
is defined as the absolute luminance difference (between the target and the background)
divided by the background luminance. The contrast threshold is defined as the contrast at
which an average observer will detect the target 50% of the time, after due allowance is made for
chance success. It was pointed out that the observers did not feel confident of having seen
a stimulus unless the level of probability of detection was greater than 90%. For this level
of detection, the threshold value has to be multiplied by a factor of 1.62. Figure 13.1
shows the log threshold contrast as a function of log adaptation luminance and the target
size (in minutes of arc) from Blackwell’s experiments. It is clear that the curves have two
branches separated approximately at 10−3 cd m−2 (1 footlambert = 3.426 cd m−2 ). The
left-hand branch is believed to be rod-dominated detection and the right-hand branch the
cone-dominated one. When the target size is larger, it is easier to detect and the contrast
threshold is about 0.76% for the 121′ target at high-luminance adaptation. We can see that
for the largest target at high luminance (above 1 cd m−2 ), the contrast threshold is essentially
independent of the background luminance, as predicted by Weber’s law that says that the
detectable luminance difference is proportional to the mean (or background) luminance.
The smallest target size used was 36 minutes of arc because in lower background luminance
the visual threshold for a target smaller than this size depends mainly on the product of
luminance and target area. In higher luminance (above 1 cd m−2 ), this critical visual angle
is about 1 minute of arc.
In measuring the contrast threshold, we can use a stimulus that is either brighter (incre-
ment threshold) or darker (decrement threshold) than the background. Experimental data
show that, when the adapting background is brighter than 1 cd m−2 and the target size
is larger than 100 minutes of arc, the two thresholds are about the same. However, with
lower light levels or smaller targets, the decrement threshold is consistently lower than the
increment threshold by as much as 20%. A target darker than the background is easier to
detect than one that is brighter [750].
Luminance CSF
There have been many experimental measurements on our contrast sensitivities [70]. Al-
though they are not always compatible with each other because the experimental conditions
were not controlled the same way, detailed analysis of these data allows us to see the gen-
eral features and the important variables of the contrast sensitivity function (CSF) data. With normal eye movement, our
luminance CSF at high luminance (above 3 cd m−2) is found to be
bandpass in shape, peaked at around 2–8 cycles per degree (varies between experiments).
As the luminance decreases, the peak becomes broader and its maximum point gradually
shifts to lower frequency. For luminance higher than 50 cd m−2 , the CSF remains quite
constant. When the stimuli have the same mean luminance, the CSF of a monochromatic
luminance grating does not seem to depend on the wavelength [985].
Our contrast sensitivity, S, varies with many physical variables, such as spatial frequency
ν in cycles per degree, orientation θ, light adaptation level L in candelas per square meter,
image area s, in degrees squared, viewing distance d in meters (for the accommodation
effect), and retinal eccentricity φ in visual degrees. After comparing many published data,
Daly [234] proposed the following empirical function for describing the visual contrast
sensitivity:
S(\nu, \theta, L, s, d, \phi) = S_0 \cdot \min\!\left[ C\!\left(\frac{\nu}{r_d \, r_\phi \, r_\theta}, L, s\right),\; C(\nu, L, s)\right], \qquad (13.1)
where S_0 is used to adjust the peak sensitivity, and r_d, r_\phi, and r_\theta are adjustment factors
for viewing distance, retinal eccentricity, and orientation, respectively.
Figure 13.2. Contrast sensitivity functions. In (a) the CSFs are plotted on a linear scale, while in (b) they
are plotted on a log scale. The numbers are the luminance levels in candelas per square meter.
where a, b, c, and d are model parameters, and νx and νt are the spatial and temporal
frequencies. The constants SE and SI represent the strength of the “excitation” and “inhibition”
components, in accounting for the center-surround types of receptive field as measured along
the visual pathway. It should be pointed out that Daly’s model implies that S(ν = 0) = 0
and we are not able to see the zero frequency, i.e., a uniform field. This seems to agree with
the stabilized image experiments. However, this interpretation is not without controversy
because one can argue that, even with our eyes closed, we can still notice a difference
between light and dark [540, p. 290].
A more theoretically developed contrast sensitivity model was proposed by Barten [70].
This consists of two cascaded stages. The luminance signal entering the eye is first degraded
by the optical MTF of the eye (a low-pass filter). It then goes through another degradation
due to the neural process of the lateral inhibition, which has the effect of a high-pass filter.
These two stages combine to give the general bandpass shape of the luminance CSF. The
model then accounts for the effect of photon noise, the effect of neural noise, the effect
of spatial-temporal integration and the psychometric function of the two-alternative forced
choice method. The optical MTF of the eye is modeled as:
where ν is the spatial frequency. The standard deviation σ of the optical spread function of
the eye is modeled as a function of the pupil diameter d:
\sigma = \sqrt{\sigma_0^2 + (C_{ab}\, d)^2} , \qquad (13.10)
where σ0 is a constant. The complete model for binocular vision can then be described by
the following:
S(\nu) = \frac{e^{-2\pi^2\sigma^2\nu^2}/k}{\sqrt{\dfrac{2}{T}\left(\dfrac{1}{X_0^2}+\dfrac{1}{X_{\max}^2}+\dfrac{\nu^2}{N_{\max}^2}\right)\left(\dfrac{1}{\eta\, p\, E}+\dfrac{\Phi_0}{1-e^{-(\nu/\nu_0)^2}}\right)}} ,
where A is the effective area for object size; k is a constant calculated from the psychometric
function of the task involved and in this case it is the signal-to-noise ratio for grating
detection; σ is the standard deviation of the optical LSF of the eye; T is the visual integration
time [s]; X_0 (Y_0) is the angular size of the object [degrees]; X_max (Y_max) is the maximum
X (Y ) angular size of the integration area; Nmax is the maximum number of cycles over
which the eyes can integrate the information; η is the quantum efficiency of the eye; p is
the photon conversion factor that depends on the light source; E is the retinal illuminance
[td]; Φ_0 is the spectral density of the neural noise; and ν_0 is the cut-off frequency of the
lateral inhibition. The photon conversion factor p is defined by the number of photons per
unit of time, per unit of area, and per unit of luminous flux per angular area entering the eye.
Let P(λ) be the spectral power distribution of the visual target and V (λ) be the photopic
luminous efficiency function, then the photon conversion factor, p, can be computed by the
equation
p = 2.246 \times 10^3 \, \frac{\int P(\lambda)\, V(\lambda)\, \lambda \, d\lambda}{\int P(\lambda)\, V(\lambda)\, d\lambda} , \qquad (13.13)
where λ is expressed in nanometers and the unit of p is [photons s−1 deg−2 (photopic
td)−1 ]. For example, if P(λ) is the same as the spectral power distribution of CIE Illuminant
A, p = 1.285 × 10^6 [70, p. 63]. Typical values of the parameters are: k = 3.0; T = 0.1 s;
η = 0.03; σ_0 = 0.5 arcmin; C_ab = 0.08 arcmin mm−1; X_max = 12◦; Φ_0 = 3 × 10^−8 s deg^2;
Nmax = 15 cycles; and ν0 = 7 cycles per degree. This model has been shown to describe
many experimental data very well under various experimental conditions [70].
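Because the complete expression is long, a compact numerical sketch may be helpful. The Python function below uses the typical parameter values quoted above and the photon conversion factor given for CIE Illuminant A; the algebraic form is the commonly cited version of Barten's binocular formula and the simple troland conversion from pupil area, both of which should be read as assumptions of this sketch rather than a transcription of the book's equations.

```python
import numpy as np

def barten_csf(nu, L=100.0, X0=12.0, d=3.0, p=1.285e6):
    """Sketch of a Barten-style luminance contrast sensitivity function.

    nu : spatial frequency (cycles/degree); L : luminance (cd/m^2);
    X0 : angular object size (deg); d : pupil diameter (mm);
    p  : photon conversion factor (Illuminant A value quoted in the text).
    """
    # typical parameter values quoted in the text
    k, T, eta = 3.0, 0.1, 0.03
    sigma0, Cab = 0.5 / 60.0, 0.08 / 60.0          # arcmin converted to degrees
    Xmax, Nmax, Phi0, nu0 = 12.0, 15.0, 3e-8, 7.0

    sigma = np.sqrt(sigma0**2 + (Cab * d) ** 2)    # optical spread, cf. Eq. (13.10)
    M_opt = np.exp(-2.0 * np.pi**2 * sigma**2 * nu**2)
    E = (np.pi * d**2 / 4.0) * L                   # retinal illuminance in trolands

    integration = (2.0 / T) * (1.0 / X0**2 + 1.0 / Xmax**2 + nu**2 / Nmax**2)
    noise = 1.0 / (eta * p * E) + Phi0 / (1.0 - np.exp(-((nu / nu0) ** 2)))
    return (M_opt / k) / np.sqrt(integration * noise)

for f in (0.5, 2.0, 8.0, 30.0):
    print(f, round(float(barten_csf(f)), 1))       # bandpass shape peaking at a few cycles/degree
```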
Figure 13.3. Comparison of the luminance CSF and the chrominance CSF. The bandwidths of this set
of data seem to be narrower than other measured values. However, the general shapes are consistent
in different experimental results.
Chrominance CSF
Measurement of the chrominance CSFs is more difficult to do because of two complications:
(1) the detection criteria and (2) the correction for chromatic aberrations. Higher-frequency
color gratings can be detected before we can tell their colors. Therefore, we need to define
the criteria for the detection threshold. In general, the question is compounded by the
fact that pure chrominance (isoluminant) gratings are not easy to produce because they
are specific to each individual. The other difficulty is that our eyes have very significant
chromatic aberrations. Without correction, we are not sure if the detection is based purely
on chrominance variations. These problems have been addressed and the results show that
isoluminant red–green and yellow–blue chromatic CSFs have the same low-pass shape
(compared with the bandpass shape of the luminance CSF), but the yellow–blue sensitivity
is about 0.5 log unit lower than that for red–green [540, p. 215]. It has also been shown that
at each spatial frequency color contrast sensitivity declines with eccentricity approximately
twice as steeply as luminance contrast sensitivity [685]. Figure 13.3 shows a comparison of
the luminance CSF and the chrominance CSF for a typical observer as reported in [684].
The absolute sensitivity and the absolute resolution limit tend to vary from experiment to
experiment. The data shown in Fig. 13.3 seem to be on the low side. The reported highest
frequencies of the luminance and chrominance CSFs are twice as high as shown here. For
example, the red–green isoluminant interference fringes up to 20–25 cycles per degree are
still perceptible in some experiments [847].
It should be pointed out that, in color imaging applications, it is not practically possible
to separate color signals into pure luminance and pure chrominance components, and
therefore the precision of the chrominance CSF data is usually not critical. For example,
the NTSC YIQ color encoding is far from exact in this respect, but it achieves the major
benefit of chrominance bandwidth compression. The same thing can be said about most
color image compression algorithms. However, it does not mean that we need not worry
about this issue at all [673]. The fact is that the more accurately we encode our color signals,
the better the image compression, transmission, or processing results we will achieve.
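One concrete way this property is exploited is to transform RGB signals into one luma and two chrominance channels and to carry the chrominance at reduced resolution. The sketch below uses the standard NTSC RGB-to-YIQ matrix with simple block-averaged subsampling; the 4:1 factor is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np

# Standard NTSC RGB -> YIQ conversion matrix (Y is luma; I and Q are chrominance).
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523,  0.312]])

def encode_yiq_subsampled(rgb, factor=4):
    """Convert an RGB image (H x W x 3, values in [0, 1]) to YIQ and average
    the I and Q channels over factor x factor blocks, exploiting the low-pass
    shape of the chrominance CSF."""
    yiq = rgb @ RGB_TO_YIQ.T
    luma = yiq[..., 0]
    h, w = luma.shape
    hc, wc = h // factor, w // factor
    chroma = yiq[:hc * factor, :wc * factor, 1:]
    chroma = chroma.reshape(hc, factor, wc, factor, 2).mean(axis=(1, 3))
    return luma, chroma              # full-resolution luma, subsampled chrominance

# Example: a 64 x 64 image keeps all 4096 luma samples but only 16 x 16 samples
# for each chrominance channel after 4:1 subsampling in each direction.
luma, chroma = encode_yiq_subsampled(np.random.rand(64, 64, 3))
print(luma.shape, chroma.shape)      # (64, 64) (16, 16, 2)
```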
When the human visual system is first exposed to a new stimulus, its sensitivity to it is
initially quite high and then it drops as the time of exposure increases. The adjustment of
sensitivity to an external stimulus is called visual adaptation. For example, if we look at a
spatial sine wave grating for a long time, our sensitivity to another sine wave grating of the
same frequency is decreased. Another type of visual adaptation refers to the adjustment of
our visual system to a new operating point. For example, if we enter a dark theater from a
bright sunlit street, our eyes cannot see much for the first few minutes. After that, our visual
system gradually adjusts its light sensitivity over time. This is called dark adaptation. Visual
adaptation is a very important factor to consider in imaging science applications, because
the images are often viewed in an environment very different from the one in the original
scene. The operating condition for our visual system has changed and the effect of visual
adaptation has to be carefully considered.
p. 354]. The methods of colorimetry apply strictly to a middle range of about 1000-fold
in cone vision. In order for the eyes to function over such a large dynamic range, our
visual system adjusts its operating point photochemically and neurophysiologically. Light
adaptation refers to the change of visual sensitivity to an increase of background or average
luminance of the visual field, while dark adaptation refers to the change in sensitivity in
response to a decrease in luminance. By measuring visual thresholds at different times after
the light is turned on (off), we can estimate the time course of our light (dark) adaptation.
The curve of visual threshold as a function of time in the light (or dark) clearly shows two
separate branches. In dark adaptation, the visual threshold is initially determined by the
cone vision and after some time by the rod vision. It is found that the rods and cones adapt
at different rates: cone adaptation is completed in about 7–15 minutes, while rod adaptation
continues for about 1 hour or more [366, p. 86]. In light adaptation, the rods determine the
initial threshold elevation. Rods begin to saturate at around 100 scotopic td and the visual
threshold is then determined by the cones [53, 153, Chapter 3].
Within a factor of 100 in luminance change for photopic vision, our light or dark adap-
tation is accomplished relatively fast. The brightness and lightness of the objects appear to
be approximately the same after our eyes adapt to the change. For example, if we sit near a
window to read a book from morning to noon, we can hardly detect the continuous change
in the daylight luminance level. How does the visual system know how much adjustment
is needed to maintain brightness constancy? At first glance, this seems to be a trivial problem: all the visual system has to do is use the paper of the book as a reference and adjust its light sensitivity so that the paper keeps the same brightness or lightness while the illumination changes. However, that solution is not very robust, since a reference object may not be present all the time. Furthermore, the illumination changes not only
in intensity, but also in the direction and distribution of lighting. The so-called invariant
reference is actually not easy to track after we turn a page. One can also turn the question
around and ask how we know the paper is white in the first place. In any case, our visual
system seems to perform much better than this type of solution. Whether ratios or gradients are computed to construct a brightness/lightness description of the scene, the central problem is how to determine an anchor point to which the computed description can be “normalized” so that “white” objects remain “white” almost all the time.
This anchoring problem is also one of the most important problems in photofinishing, where
the problem is called the density balance problem – how to adjust the print density so that
the image will look good.
A simple model of chromatic adaptation was proposed by von Kries, based on the idea that the three cone systems adjust their individual sensitivities linearly and independently:

$L' = k_l L, \quad M' = k_m M, \quad S' = k_s S,$   (13.14)

where $L, M, S$ are the cone responses under the original illuminant, $L', M', S'$ are the cone responses after adaptation to another illuminant, and $k_l, k_m, k_s$ are three constants. The model is called the von Kries coefficient rule and the transformation from $L, M, S$ to $L', M', S'$ is called the von Kries transformation. But how does the visual system know how much adjustment is needed to maintain constant colors, i.e., how do we determine $k_l, k_m, k_s$ from an image? This problem is similar to the anchoring problem in the brightness/lightness computation and it is called the color balance problem in photofinishing. Many studies show that chromatic adaptation data deviate significantly from the von Kries coefficient rule, especially for the S cone response. Furthermore, Helson and Judd
[389, 475] observed that, under highly chromatic illuminants, bright objects continue to
appear in the same chromatic hue as the illuminant and dark objects appear to have a hue
complementary to the illuminant. For example, after adapting to a tungsten light, white
paper continues to appear slightly yellow–orange and very dark gray paper appears to be
slightly bluish. This effect is called the Helson–Judd effect. Note that this description assumes that there is a “normal” illuminant under which the effect does not exist; otherwise, we cannot define what we mean by highly chromatic illuminants. This “normal” illuminant
seems to relate to the so-called “absolute white point”, which can be measured in complete
dark adaptation by judging which blackbody illuminant requires least luminance to appear
white. However, not many data are available on this type of measurement and therefore
the concept of the “absolute white point” or the resting adaptation state requires further
experimental confirmation.
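A minimal numerical sketch of the von Kries coefficient rule is given below. Here the coefficients are simply taken as the ratios of the cone responses of the two reference whites, which is one common (but not the only) way of choosing them; the numbers in the example are hypothetical.

```python
import numpy as np

def von_kries_adapt(lms, white_src, white_dst):
    """Diagonal (von Kries) chromatic adaptation of cone responses.
    lms:       cone responses (L, M, S) of a stimulus under the source illuminant.
    white_src: cone responses of the reference white under the source illuminant.
    white_dst: cone responses of the reference white under the destination illuminant.
    The coefficients k_l, k_m, k_s scale each cone channel independently."""
    k = np.asarray(white_dst, float) / np.asarray(white_src, float)
    return k * np.asarray(lms, float)

# Hypothetical stimulus and whites, all expressed as (L, M, S) responses.
print(von_kries_adapt([40.0, 30.0, 8.0], [100.0, 95.0, 60.0], [95.0, 100.0, 108.0]))
```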
Many studies have been performed on the general characteristics of our chromatic adap-
tation. The main empirical observations are:
When we fix our eyes on a definite target in space, our eyes do not stay motionless. A fine tremor of amplitude about 30 seconds of arc (roughly the cone-to-cone distance in the fovea) is always present in the eye movements. This small tremor has a temporal frequency in the range 30–80 Hz or higher. The tremor movements are very irregular and appear noise-like in recorded eye-movement charts. In addition to the tremor, there are two other types of eye movement: drifts and saccades. Drifts are very slow, low-frequency movements (about 1–4 minutes of arc per second). Saccades are occasional, sharp movements. Involuntary saccades
are usually less than 10 minutes of arc. The median intersaccadic interval is about 600 ms
and the duration of a saccade is about 70 ms [257, p. 375]. Experimental measurements
showed that saccades in the two eyes are highly correlated, in both amplitude and direction.
This implies that there is a central action controlling the saccades in both eyes. With the three
types of involuntary eye movement, we cannot maintain an exact fixation. The distribution
of the direction of gaze is found to be a two-dimensional Gaussian function. The horizontal
and vertical standard deviations are usually different and the correlation coefficient is not
zero. The size of the standard deviation is on the order of 2–4 minutes of arc over a time
interval of 60 s. These measurements are often taken with the head held stationary by some
artificial support. Studies show that under natural viewing (without head support), the gaze
338 13 Visual psychophysics
is more unstable by a factor of 1.5–3 and the mean eye speed is increased by a factor of 2
(from 0.25 to 0.5 degrees per second) [526, p. 81].
In the presence of eye movements, the optical images formed by the eye are moving
across the retina. This raises at least two interesting questions: (1) image motion across the retina should create motion blur, so why don't we perceive such blur? (2) What happens to
our visual perception if we eliminate such motion? The answers to these questions are
not complete at this time, but many experiments have revealed important insights into
how our visual system works. A simplified answer to the first question is that perception is
suppressed during the eye movements and the missing temporal information is interpolated.
In order for the interpolation to work, some information about the eye movements has to
be provided to the visual processes. It seems that the control center that issues the signals
for the eye movements also sends the information to the visual processes for the correction
of eye movements. Experimental data show that our contrast threshold is not elevated until
image velocity exceeds 2 degrees per second. The answer to the second question turns out to
be quite unexpected, and it leads us to very interesting and important discoveries about the
mechanisms of visual perception. In summary, when the retinal images are well stabilized,
the images tend to disappear or fade out in a few (1–3) seconds to a few minutes, depending
on the spatial-spectral composition and the luminance of the stimulus. The color of the
stimulus first becomes desaturated and the brightness decreases, and then the structures of
the image fade out into a uniform, foggy, dark, gray field. Images that have low-contrast
edges fade out sooner than those with high-contrast edges. The perceived dark gray field
occasionally becomes completely black for a few seconds and then returns to dark gray. This
is called black-out. It is believed that the fade-out phenomenon is due to neurons’ complete
adaptation to the stationary retinal image, while the black-out phenomenon seems to be
caused by a complete loss of vision, as a result of the central visual attention mechanism
shutting down all visual processing.
Although several apparatuses [257, Chapter 5] have been developed for producing stabilized images on the retina, this type of experiment is not easy to do (some residual motion still exists in most experiments), and thus the total number of observers who have taken part in the experiments is small. Therefore, quantitative data are scarce, but the qualitative data are quite revealing and they are consistent in showing the importance of eye movements to the normal functioning of vision [257, 758, 1059]. The most compelling evidence from the
stabilized image experiments seems to suggest that edges and gradients of the image irra-
diances are the key features from which our perceived images are constructed. Figure 13.4
shows one of the experiments by Krauskopf [528]. The stimulus was a red central disk
surrounded by a green annulus (as seen in (a)). When the boundary between the central
disk and the annulus surround is stabilized on the retina, the subject sees a uniform green
disk! One can explain such a perception by arguing that, after it is stabilized, the boundary
of the central disk disappears. Our visual system does not see the red–green boundary and
our perception simply fills in the central disk with the green color. The propagation of in-
formation from the edges into the interior region surrounded by the edges is thus called the
filling-in process. The basic reasoning behind the explanation is that, without detecting any
temporal changes before and after the eye movement, the brain interprets the corresponding
image area as uniform and thus whatever is in the surround is used to fill in the center.
13.4 Eye movements and visual perception 339
Figure 13.4. Stabilized retinal images. The stimulus is a red central disk surrounded by a green
annulus (as seen in (a)). When the boundary between the central disk and the annulus surround is
stabilized on the retina, the subject sees a uniform green disk (as shown in (b))!
(However, the filling-in process may take some time to complete [782, p. 99], and it is possible that a faint gray line can be seen, though it may disappear a few moments later.)
Figure 13.5. Stabilized retinal images. The stimulus is a checker board pattern on the left and a uniform
red background on the right. In each half is a circular disk, half white and half black. The boundaries
between the circular disks and the backgrounds are stabilized on the retina, while the background
patterns are not. The observer fixates at the target so that the disks do not move across the two different
backgrounds. When the stimulus is first turned on, the target appears to the observer as the top panel.
Within a few seconds, both disks fade out to dark gray. After a few more seconds, the disk against the
uniform red background disappears (or becomes part of the red field), but the disk against the checker
board pattern remains dark gray.
Stabilized-image experiments also indicate that adaptation involves two processes, one fast and one slow. The “fast” process reaches a steady state in 1–3 seconds, while the “slow” process takes 30–40 seconds or longer [1059, p. 78]. For example, the stabilized part (the background)
of the visual field (saturated red) takes only 1–3 seconds to fade out, but the unstabilized
part (a moving white disk) continues to change in color appearance from white to cyan
for 30 s or longer. This finding is confirmed by many other experiments that measured the
adaptation time more directly.
Another very interesting experiment by Piantanida and Gilchrist [346, pp. 26–27] shed
more light on how visual perception is encoded. Instead of stabilizing the boundary between
the center target and its background, they stabilized the boundary between the background
and the surround. Figure 13.6(a) shows the stimulus configuration. The outer borders of the
black and white backgrounds were retinally stabilized. The two center squares have the same
luminance (middle gray). Figure 13.6(b) shows how the stimulus appears to the subject after
the stabilization. The original white and black backgrounds disappear to become a uniform
gray. The left center square now appears black and the right center square white, even though
they have the same luminance. A simple explanation is that the differences or ratios between
the center squares and their backgrounds are encoded. Stabilizing the outer borders of the
Figure 13.6. (a) The stimulus configuration used in Piantanida and Gilchrist’s experiment. The outer
borders of the black and white backgrounds were retinally stabilized. (b) Appearance of the stimulus
after the borders were stabilized.
backgrounds does not change those unstabilized relations. Here it is clear that the visual
system does not simply fill in the backgrounds and use the filled-in value and the luminance
of the two center squares to compute their appearances – the differences or ratios are encoded
first and they are used by the reconstruction algorithms.
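A one-dimensional sketch of this "encode ratios at edges, then reconstruct" idea is shown below. The edge threshold, the anchoring rule (the brightest region taken as white), and the luminance profile are all assumptions introduced only for illustration.

```python
import numpy as np

def lightness_from_ratios(luminance):
    """Reconstruct relative lightness from edge ratios only (1-D sketch).
    Log-luminance differences are kept at 'edges' (large steps) and zeroed
    elsewhere, then re-integrated; the overall offset (the anchor) is free,
    which is exactly the anchoring problem discussed in the text."""
    logL = np.log(np.asarray(luminance, float))
    d = np.diff(logL)
    d[np.abs(d) < 0.05] = 0.0            # ignore shallow gradients (threshold is arbitrary)
    rel = np.concatenate([[0.0], np.cumsum(d)])
    return rel - rel.max()               # anchor: brightest region maps to 0 (taken as "white")

profile = [20, 20, 20, 80, 80, 80, 40, 40]   # hypothetical luminance profile
print(np.round(lightness_from_ratios(profile), 2))
```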
The computational issues regarding visual perception are enormously complex [10, 337,
339, 346, 636, 802]. Before we discuss the broad and general subject of brightness per-
ception, let us examine how our visual perception behaves in a very simplified visual field.
From studying the simple visual stimuli, we hope to gain insight into some of the basic
principles of how our visual system works.
In order to simplify our discussion in this section, we will only deal with stimuli that are
identical in spectral composition and appear achromatic (meaning that, at a steady state, the
stimuli do not appear to have hues, such as red, green, orange, etc.). We will also assume
that the stimuli remain constant for the time of viewing. The only variable of the stimuli is
the intensity (radiance or luminance).
The first problem we need to deal with is that of definition. Unfortunately, this is also one of the most difficult problems in the study of visual perception, due not only to the subjective nature of judgments in visual perception but, more seriously, to the fact that even the basic categorical delineation of the component percepts is uncertain. For example, the amount of
light coming into the eye from a spatial area (the radiance or luminance) can be measured in
absolute physical quantities. However, experimental data clearly show that the human visual
system cannot judge or estimate this absolute physical quantity very reliably, especially in
a complex scene. That is why professional photographers, even though highly experienced
in judging photographic lighting, often carry some type of photometer around when they
are taking pictures. Therefore, the first basic thing we learn is that our visual system is
not designed to be a radiometer or a photometer. However, experimental data show that we
are extremely good at detecting differences between neighboring stimuli. So what kinds of
perceptual quantity can we define?
Let us start with the sensation caused by light. It appears true to our common experience
that, if everything else is kept constant, the more light coming into our eyes from a spatial
region, the stronger that sensation is. Therefore, let us define a quantity, called brightness,
which corresponds to the strength of our sensation of the intensity of light. So far, the definition of brightness seems intuitive and reasonable, and we will leave it at that. However,
we should keep in mind that such a definition presumes that there is an agreement on what
the sensation is and that somehow it can be judged internally with some reliability [815].
Both assumptions can become quite questionable when complex comparisons are being
made. Also, the history of research on perception is full of debate and confusion about
the meaning of “sensation” as opposed to “perception”. We need to pay attention to the
distinction between the two words here because “sensation” is meant to be more direct and
lower-level visual reception, without the contextual, cognitive, high-level interpretation that
belongs to “perception”. For example, under tungsten lighting, the sensation produced by the
light reflected from a piece of paper may be yellowish, but we may perceive it as being a piece
of “white” paper. Although we often do not pay attention to the fact that our color sensation
of snow in the shade is actually bluish because we perceive it as white, once someone points
it out to us, we can indeed see the bluish color in it. Many experiments have shown that we
are extremely capable of seeing and interpreting the visual world in its proper context. When
we look around, we are seeing not only the objects, their shapes, colors, and textures, but
also the general lighting and spatial layout of the room. We see that the objects in the shade
appear to have less light reflected from them because they are in the shade, not because they
have lower reflectances. This means that experimenters have to ask the right questions to
get the right answers. For this reason, brightness is used to mean apparent luminance, and lightness to mean apparent reflectance [796, 1028]. In many of the older psychophysical studies (say, before the 1960s), the two were not clearly distinguished, and the resulting data from different experiments can lead us to very different conclusions.
One related issue regarding the definition of brightness is the use of the words “dark”
and “black”. We will use “bright” or “dark” when the sensation of brightness is strong or
weak, and reserve “white” and “black” to refer to other perceptual attributes to be discussed
later.
When the entire visual field is completely uniform (a ganzfeld), eye movements produce no temporal changes on the retina, so we would expect that, in ganzfeld viewing, perceptual phenomena similar to those in the stabilized image experiments will be observed, and indeed this seems to be the case. When the light
in the uniform field is turned on, initially the field appears to be a uniformly bright field
of a certain color. Gradually, the color begins to desaturate and darken, and soon the entire
visual field looks like a dark gray field (the fade-out phenomenon) [45, 203, 204, 373, 374].
Just like in the viewing of an image stabilized on our retina, the perceived dark gray field
occasionally becomes completely black for a few seconds and then returns to dark gray
again. When the state of fading out is reached in viewing a ganzfeld, the final, perceived
dark gray often differs from complete darkness, in the sense that a certain faint sensation
of “brightness” does exist. The strength of this sensation of “brightness”, at its steady state,
increases slightly with the luminance of the ganzfeld. An easy way to approximate this
sensation is to close both eyes in a lighted room. The moment we close our eyes, we see a
dark gray field. If we are not sure this dark gray field is a steady state perception, we can
continue keeping our eyes closed and watch what changes occur during the dark adaptation.
In fact, our eyelids are not sufficiently opaque to keep out all the room light, and therefore
we are not undergoing a true dark adaptation. If, without opening our eyes, we now turn
off the room light (or have someone else turn off the light for us), we see the dark gray
field momentarily becomes much darker and then after a few seconds the field returns to
dark gray again and this dark gray appears to be much darker than when the room light was
on. Depending on how much residual light remains in the room, the visual field can appear
from very dark to completely dark. We therefore come to the conclusion that there is some
sensation of light even when the visual field is completely uniform.
For an isolated target (subtending a degree or more) viewed with dark-adapted eyes, the perceived brightness grows approximately as a power function of the target luminance, with an exponent of about 0.33. If the size of the target is reduced below 1°, the luminance–brightness relation is still a power function, but the power becomes greater than 0.33. For example, when the target is
1.5 minutes of arc, the power becomes 0.51. Similarly when the duration of stimuli is re-
duced below 1 s, the power also increases. For example, for flashes of 20–200 ms, the power
increases to about 0.4. It should be remembered that the psychophysical magnitude estima-
tion often has large variances and the exact functional form of the luminance–brightness
relation and its power should be taken only as an approximation.
The above experiments were performed on dark-adapted eyes. What happens to the
brightness–luminance relation when the eyes are light-adapted? We still want to measure
the brightness of an isolated spot of light without any contrasting background. Therefore,
we let the eye adapt to a constant luminance for a long time, and at the moment just before
the target is presented, the background light is turned off. Therefore the target is again
seen against a dark background, but the eye is in a different adaptation state. The effect of
light adaptation on the brightness function can be measured either by magnitude estimation
or by interocular matching, with one eye dark-adapted and the other light-adapted. Both
procedures produce the same results. The perceived brightness of the central target is still a
power function of the target luminance when the eye is light-adapted. However, the power is
slightly increased and there is a sharp curvature near the threshold brightness. The following
equation can be used to describe the functions:

$B = k(Y - Y_0)^n ,$   (13.16)

where B is the perceived brightness, Y is the target luminance, and $Y_0$ is an effective threshold parameter. Light adaptation lowers the value of k and raises the value of $Y_0$. The value of n is slightly
increased. For example, adaptation to a luminance of 1590 cd m−2 raises the value of n from
0.33 to 0.43. The value of Y0 determined by fitting the data to the equation was found to be
the same as the measured threshold brightness for the same level of light adaptation [905,
p. VII-12]. Table 13.1 shows the parameter values as determined by Stevens and Stevens
[905, p. VII-14].
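The brightness function can be exercised with a short sketch, assuming the power-law form $B = k(Y - Y_0)^n$ given above. The parameter values below are invented for illustration and are not the Stevens and Stevens Table 13.1 entries.

```python
import numpy as np

def brightness(Y, k=10.0, Y0=1e-4, n=0.33):
    """Brightness B = k (Y - Y0)^n for luminances above the threshold Y0."""
    Y = np.asarray(Y, float)
    return np.where(Y > Y0, k * np.clip(Y - Y0, 0.0, None) ** n, 0.0)

Y = np.array([1.0, 10.0, 100.0, 1000.0])                # cd m^-2
dark_adapted  = brightness(Y)                            # low threshold, n = 0.33
light_adapted = brightness(Y, k=3.0, Y0=1.0, n=0.43)     # hypothetical light-adapted values
print(np.round(dark_adapted, 2), np.round(light_adapted, 2))
```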
A more informative stimulus configuration is a central disk surrounded by a concentric ring (annulus) of a different luminance. We will call the central disk the target field. In the experiments reported by Stevens and Stevens [905], the central disk is about 2.4° and the annulus about 5.7°. In their experiments, the disk and its annulus
are presented at the same time for only about 2 s, with about 10 s between presentations
[905, p. VIII-9]. Therefore, the observers did not fully, if at all, adapt to the stimuli. This is a
key point to remember when interpreting their results and when comparing them with other
experimental results. One may question whether this set of experimental results is useful for our imaging applications. However, many interesting qualitative perceptual phenomena can be seen in the results, which makes the data worth examining. Another reason is that the data are also consistent with those from some other
experiments (e.g., [583]) that allowed continuous viewing and judging.
When the surround is fixed at a certain luminance and the disk is varied from very low
luminance to the luminance of the surround, the perceived brightness of the disk again varies
as a power function of its luminance, as described by Eq. (13.16), except that the exponent
n increases with the surround luminance S (measured in candelas per square meter). For S = 100 cd m−2, the exponent is 2.33, which is very much higher than the 0.33 found when no surround is present.
Now if the disk luminance is increased beyond that of the surround, its brightness is no
longer influenced by the surround and follows about the same relation with the luminance
as in the single disk, no surround experiment.
Stevens and Stevens [905] also described an attempt to answer a very important question
related to imaging applications. A reflection print contains areas of different reflectances.
The question is: if we view the print under various illuminations of different luminances,
will the perceived brightness difference between two areas on the print increase with the
illumination level or will it remain constant? Our everyday experience indicates that the
picture on a reflection print looks more vivid under brighter illumination and indeed this
is also true for the simple disk and annulus stimuli. The experimental data suggest a sim-
ple relation for describing this phenomenon. Let the surround luminance be S and the
disk luminance be ρ S, i.e., a fixed fraction ρ of the surround. It was found that the
brightness of the disk grows as a power function of the luminance with an exponent m that depends on ρ. This relation suggests that a disk with a reflectance smaller than 14% of the surround will look darker
as the illuminance increases, because the exponent is negative. On the other hand, a disk with
a reflectance greater than 15% of the surround will look brighter as the illuminance increases
due to the positive exponent. (The brightness of a disk with a reflectance of about 15% of
the surround will remain the same independently of the illuminance.) As a consequence,
dark regions become darker and bright regions become brighter when the illuminance is
increased, and a reflection print looks more vivid under brighter illumination.
If the goal of color vision is to distinguish visual stimuli of different spectral compositions
(such as telling if a banana is ripe for eating), then the task is truly complicated, not simply
because illumination and reflectance are coupled (the same problem exists for brightness
and lightness perception), but also because the potential number of spectral compositions
is huge. Palmer, Young, and von Helmholtz (see [728]) correctly reasoned that we cannot
have infinitely many types of color sensors, one for each spectral composition, and they
further proposed that in fact we have only three. Their theory of color vision based on direct
sensor outputs from the three types of photoreceptor in our retina is called trichromatic
theory. Maxwell’s demonstration that most colors we see can be reproduced by additively
mixing three primary colors gave further support to trichromatic theory. Of course, modern
physiological, anatomical, and psychophysical studies confirm the basic trichromatic nature
of our color vision. Most of us indeed have only three types of cone in our retina, and,
furthermore, it was argued that even if some of us have more than three cone types (because
of mutation and inheritance), our neural signal processing does not provide more than three
(luminance and chrominance) channels.
Figure 13.7. The chromatic response functions of the two chromatic opponent processes, red–green
and yellow–blue.
However, careful analysis of our color perception (mainly through self-inspection of what
one sees) convinced Schopenhauer (1816) and Hering (1870) [396] that our color vision
is coded and represented internally as three opponent processes: light–dark (white–black),
red–green, and yellow–blue. The two chromatic processes account for the “facts” that our
perceived hues can be arranged in a circle and there seem to be four unique hues (red,
yellow, green, and blue) that together produce all hues. The achromatic process (light–dark)
accounts for bright and dark colors. The term “opponent” further means that white and black,
red and green, and yellow and blue cannot be seen in the same color stimulus (i.e., no color
should look reddish green or yellowish blue), because they represent the opposite polarities
of each of the three processes. Hurvich and Jameson [436] measured the chromatic response
functions of the two chromatic processes by using red to cancel green, blue to cancel yellow,
and vice versa, in the color of every monochromatic wavelength. This procedure is called
hue cancelation and assumes that the chromatic responses of different wavelengths are
additive. Figure 13.7 shows the chromatic response functions from their measurements.
The wavelengths at which the two chromatic response functions cross zero correspond to
the unique hues. The chromatic response functions vary with chromatic adaptation and so
do the unique hues.
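The zone-theory idea discussed next (cone signals recoded into opponent channels) can be written as a toy linear transform. The weights below are hypothetical and are not the Hurvich–Jameson chromatic response functions; they only illustrate the structure of such an encoding.

```python
import numpy as np

# Hypothetical opponent recoding of cone signals (zone-theory sketch).
# Positive r_g is reddish, negative greenish; positive y_b is yellowish,
# negative bluish; w_k is an achromatic (light-dark) channel.
OPPONENT = np.array([
    [1.0, -1.0,  0.0],   # r_g = L - M
    [0.5,  0.5, -1.0],   # y_b = (L + M)/2 - S
    [1.0,  1.0,  0.0],   # w_k = L + M
])

def cones_to_opponent(lms):
    """Map cone responses (L, M, S) to illustrative opponent channels."""
    return OPPONENT @ np.asarray(lms, float)

print(cones_to_opponent([40.0, 30.0, 8.0]))
```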
For a while, there were supporters and opponents of trichromatic theory and opponent-
process theory. However, von Kries [163] proposed a zone theory in which the color sensing step is carried out by the three cone types and the sensed color signals are then transformed and
encoded in opponent processes. This is now a commonly accepted view of our color vision.
Although physiological measurements show that there are neurons from the retina to the
visual cortex that seem to encode color signals in opponent processes, the data are not quite consistent, nor are they as neatly packaged as opponent-process theory would suggest.
Instead, neurons are shown to have so many different spectral response types that interpre-
tations in terms of opponent processes are only conceptual and have no predictive value in
neurophysiology.
In this section, we describe some well-known visual phenomena. Not all of them are im-
portant to know for color imaging applications. However, since we understand so little
about how our visual perception works, it might be useful to be aware of as many visual
phenomena as possible, not only because they provide important clues for our research
effort, but also because they impose severe constraints on how our visual model should
work.
Figure 13.8 (schematic). The appearance of a center stimulus as its luminance is raised relative to the surround: B0, the black point (a hue first appears); grayish appearance up to G0, where grayness disappears; fluorescent (fluorent) appearance above G0; L0, the luminance match with the surround.
Consider a small disk presented at the center of a larger circular disk, with both initially at the same luminance. The perception as a whole is a bright disk, characterized by a single attribute called brightness. If we decrease or increase the luminances of both by equal amounts, the perceived
brightness also decreases or increases. If we now fix the luminance of the larger circular
disk and reduce the luminance of the center small disk by half, we immediately encounter
a new perceptual attribute called grayness – the center disk now appears gray. The center
disk also appears darker than the larger disk. This darkness (or lightness) was found to
be a separate variable, different from the grayness. The two variables are the same only
for achromatic stimuli. If we further decrease the center luminance, its grayness increases
and it looks darker and darker, until a point, the black point (B0 in Fig. 13.8), is reached
when the center disk looks black. Further decreases in luminance no longer make the cen-
ter look darker – it remains black. If we change the chromaticity of the center stimulus and repeat the process, it does not matter what the chromaticity of the center is: as its luminance is decreased, a point is always reached when it looks black without any trace of hue.
Evans described as follows the color perception when the luminance of the center stimulus (say, at 700 nm) is increased from zero until it can no longer be tolerated by the eyes.
When the luminance is much smaller than the surround (say, less than 1/1000), the center
stimulus looks black without any hue. As the luminance is increased, a point is reached
when a hue starts to appear. This point is called the black point B0 , as we described above.
Further increase of the center luminance results in less grayness in the stimulus. A point
G0 is reached when the grayness disappears (for 700 nm, G0 is about 1/10 of the surround luminance). Further increase in luminance makes the color appear to be fluorescent. Since
no actual, physical fluorescence is involved, Evans called this phenomenon “fluorence”, and
said the stimulus appeared “fluorent”. A point L 0 is then reached when the center luminance
matches that of the surround. Although no new perception occurs at this point, it is said to
be recognizable by skilled and experienced observers. If we continue to increase the center
luminance, the strength of fluorence continues to increase and then abruptly decreases and
disappears at a point S0 . The appearance of the center stimulus then changes from that of a
surface color to that of a light source.
Evans and his coworker measured G 0 for monochromatic light of various wavelengths
as the center stimulus. The resulting G 0 (λ) was found to have a shape almost identical to
the purity threshold curve, which is measured by adding monochromatic light to white light
to produce a just-noticeable hue. For this reason, Evans suggested that G 0 represents the
varying “chromatic strengths” of the different wavelengths. Evans further found that the
black point is a constant fraction of G 0 , independent of wavelength. With these findings,
he then defined G 0 as the point when the “brilliance” of the center is equal to that of the
surround. He also pointed out that the chromatic strength is not an inherent characteristic
that can be assigned to the individual wavelengths because such a measured function G 0 (λ)
varies with different surround stimuli [290, p. 134]. Although all these experimental results
are quite interesting and seem to be of fundamental importance, few follow-up studies are
to be found. Interested readers should consult Evans’ book [290].
13.7.3 Afterimages
If we look steadily at a saturated, colored target for about 10 s, and then look at a blank
sheet of white paper, we will see a pattern with a shape similar to that of the target, but
a color complementary to the original target color. This pattern is called the afterimage
of the original target. Afterimages that appear to be negative versions of the original are
called negative afterimages. For example, the negative afterimage of a red object appears
to be cyan, and so on. However, not all afterimages are complementary in color with
respect to the original objects. Some afterimages appear to have the same colors as the
originals, but often less saturated. These are called positive afterimages. It is also possible
to see an afterimage that changes its color appearance over time and does not seem to
have a well-defined relationship with the original colors. Negative afterimages often fade
and disappear in about 15 s or longer, while positive afterimages seldom last more than
5–10 s.
Figure 13.9. (a) A luminance profile across a transition edge and (b) its corresponding profile of the
perceived brightness.
Figure 13.10. The Chevreul effect: uniform-luminance squares do not appear to be uniform when
arranged to be adjacent to others.
Figure 13.11. Hermann–Hering grids. The dark gray dots seen at the intersections of white lines in
(a) are called Hermann grids. The light gray dots seen at the intersections of black lines in (b) are
called Hering grids.
Figure 13.12. The Craik–O’Brien–Cornsweet effect: (a) the reflectance pattern of a circular disk;
(b) the appearance of the disk in (a) when it is rotating at high speed; (c) the luminance profile as
measured by a photometer with a long integration time (longer than many cycles of the rotation).
13.7.9 Assimilation
When patterns of different colors are adjacent to each other (such as a square wave grating
consisting of alternating green and yellow stripes), the simultaneous contrast effect tends
to increase the apparent color differences. However, when one of the color patterns is much
finer than the others, its color tends to be pulled toward (rather than pushed away from)
the coarser color pattern surrounding it. This effect is called assimilation [436, p. 175]. It
is quite sensitive to the relative and the absolute sizes of the color patterns involved. For
example, assimilation is mostly seen above 4 cycles per degree for a square wave stimulus [886]. Therefore, we may have to hold the patterns somewhat farther away from us to see the effect. It should be noted that the effect occurs while the yellow stripes in the grating mentioned above are still distinctly visible as separate elements; therefore, optical blur alone cannot be what causes the effect.
Figure 13.14. Subjective contours can be generated easily in many different ways.
However, a general solution requires a greater understanding of the basic visual processes
involved.
13.8 Problems
13.1 Describe how the Weber fraction, ΔL/L, varies as a function of the adaptation luminance level and the size of the visual target.
13.2 The CSF is often used to optimize the quantization table used in JPEG compression.
A 512 × 512 image is to be displayed on an area of 20 cm by 20 cm of an ideal
CRT monitor (with MTF = 1.0 for all frequencies). The average luminance level of
the image is 50 cd m−2 , and the viewing distance is 25 cm. Determine the optimal
quantization table with the quantization interval set at 16 for the peak sensitivity
frequency. What are the potential problems with this design method?
13.3 The von Kries transformation is not very accurate, but it accounts for the major effect
of chromatic adaptation. The three coefficients are often determined from the light
source colors. Experimental data show that the transformation would work even better
if the coefficients were determined from the corresponding, perceived grays in both
illuminants. How is the Helson–Judd effect related to this finding?
13.4 Are the invariant hues in the Bezold–Brücke effect related to the unique hues (hues
that are pure red, pure green, pure blue, and pure yellow) in the opponent processes
(see the discussion in [995])?
13.5 The Mach bands are good examples showing that what we see is not a simple point-
by-point mapping of the scene luminance. Can you construct a mathematical model of
the Mach band perception? Does your model explain why the Mach bands are difficult
to see on a sharp edge?
14 Color order systems
Having studied radiometry, colorimetry, and the psychophysics of our visual perception,
we now have the appropriate background to study the subject of color order systems. This
is a subject that is often discussed on an intuitive level, but the concepts and the logic of
color order systems can be much better appreciated if we have a proper knowledge of the
physics of color and the psychophysics of human color perception. Therefore, we have
delayed discussion of this subject until now. Color order systems are important in applica-
tions because they provide some practical solutions for many color problems in our daily
life, such as how to specify the paint color we want and how to coordinate the colors of
furniture. Color order systems are also quite important for the explicit expression of our the-
oretical thinking and understanding of how we perceive colors, such as the opponent-color
processes.
14.1 Introduction
How many colors can we distinguish? The number is estimated to be more than one million
[713]. How do we accurately communicate with each other about a particular color without
actually showing a real sample? Obviously our vocabulary of color names is too limited for
this purpose. A system is needed to order all possible colors according to certain chosen
attributes in a well-defined manner so that any color can be specified by its attributes in
the system. In principle, a color order system can be designed purely on a conceptual level
[519]. However, for the convenience of practical use, most color systems are implemented
as collections of physical color samples. This makes them easy to understand and easy to use, and makes it easy to interpolate approximately between colors. For example,
when selecting the color of paints to use, we have a better appreciation of the colors if we
have actual samples to see (of course, we have to be aware of the effects of the illumination).
For this reason, the actual physical color samples are as important as the conceptual color
order system on which the physical samples are based. In order to distinguish the two,
the conceptual system will be referred to as the color order system and its physical color
samples will be called the color atlas.
From the vision point of view, colors are most often perceived as belonging to ob-
jects. Their perceived attributes are greatly influenced by their surrounds, surface textures,
surface shapes, illumination, and the adaptation state of the observer. It is hard to imag-
ine how colors in all their possible modes of appearance can be systematically ordered.
In fact, without a detailed understanding of the human visual system as a whole, such
an attempt is not possible because we do not know what attributes to use for the order-
ing. A practical approach is to restrict the domain of interest to the colors of uniform,
textureless, flat surfaces as viewed against a uniform, textureless, flat, neutral surround,
under a standard illumination. In this restricted domain, the spatial and temporal variables
are excluded, the quality and quantity of illumination are well controlled, and the adap-
tation state of the observers is maintained at a “constant”, steady state by asking them to
move their eyes continuously across the entire visual field. From our understanding of the
human visual system, we hope that we have essentially fixed all the visual variables ex-
cept the spectral variations. With the additional assumption that in photopic vision, only
three types of photoreceptor are responsible for vision, we can propose that we need only a
three-dimensional solid for ordering all the colors in this restricted domain of interest. How-
ever, even within this greatly simplified domain, infinite possible choices exist and other
requirements or preferences are needed to specify a color order system [399, 430, 519, 795,
1050].
The basic difference between systems is the attributes they choose to order colors. We
will describe only four color order systems to illustrate the diverse views that can be taken
to order colors: the Ostwald system, the Munsell system, the natural color system (NCS),
and the Optical Society of America (OSA) system. Descriptions of other systems, such as
the DIN system and the Coloroid system, can be found in [11, 399, 430, 795]. Among the
four systems discussed here, the Munsell system is the most widely used system (especially
in the USA and Asia).
Ostwald's placement of complementary hues at diametrically opposite positions on the hue circle is believed to be incompatible with placing red, yellow, green, and blue at 90° intervals, because there are more perceptually equal steps in hue between red and blue than between red and yellow.
The most widely used ideas from the Ostwald color system are his rules of color harmony.
Within each monochromatic triangle, colors that have equal white content, w, are called
isotints, colors that have equal black content, b, are called isotones, and colors that have
equal color content, c, are called isochromes (also called the shadow series). Colors that
have the same white content as well as black content in different monochromatic triangles
form an isovalue series. Ostwald’s basic rule for color harmony is to require that colors be
presented in an orderly sequence in his color system. The four isoseries mentioned above
plus the isohue series (colors that are in the same monochromatic triangle) are the five
simple orders that he liked very much, as do many others [112, 734].
Since its conception by Munsell in 1905, the Munsell color system has been extended and
refined many times. Efforts to smooth its irregularities and to calibrate it colorimetrically
were completed in 1943 [711]. Since then, the color atlas, the Munsell Book of Color, has
been revised to conform fully with the 1943 renotation. It is now the most widely used color
order system and has been incorporated in many national standards [11, p. 114].
Figure 14.1 (schematic). The Munsell color system: constant-Hue planes arranged about the central Value axis, with Chroma as the radial distance from the axis.
A color in the Munsell system is specified by three attributes: Hue (the basic color family), Value (how light the color is), and Chroma (how colorful it is compared with a white surface). Figure 14.1 shows a schematic diagram of the system. Hue is arranged as a circle, Value as the axis perpendicular to the Hue
circle, and Chroma as the radial distance from the center of the Hue circle. A color is thus
specified by its Hue, and on that Hue plane, it is further specified by its Value, and then
by its Chroma. For example, 7.5YR5/4 indicates that the color is on the Hue plane 7.5YR,
and its Value is 5.0 and its Chroma is 4.0. The important feature of the Munsell system
is the scaling of the attributes. Colors that differ only in one attribute are arranged so that
an equal difference in the attribute represents an equal perceptual difference. For example, along the Value attribute, equal steps in Value represent equal perceptual differences in lightness. The same is also done along the Hue circle and the Chroma radial distance. For a
given Value and a given Chroma, equal angular distance in Hue is meant to represent equal
perceptual difference in Hue. Since the Munsell system arranges colors in equal perceptual
intervals on cylindrical coordinates, equal Euclidean distances in the Munsell system do
not necessarily represent equal perceptual differences, unless two of the attributes are the
same. This is the basic difference between the Munsell system and the OSA color system
(to be discussed shortly) in which equal Euclidean distances are meant to represent equal
perceptual differences.
Figure 14.2. The Hue circle of the Munsell color order system.
The Hue circle consists of ten major Hues spaced 36° apart: 5R (0°), 5YR (36°), 5Y (72°), 5GY (108°), 5G (144°), 5BG (180°), 5B (216°), 5PB (252°), 5P (288°), and 5RP (324°) (see Fig. 14.2). One
Hue step is defined to be 3.6◦ , resulting in ten Hue steps between the neighboring major
Hues. The naming of these Hues starts from the mid-point between the major Hues and is
numbered from 0 to 10. For example, the Hue 6R is at 3.6°, 7R at 7.2°, 8R at 10.8°, 9R at 14.4°, and 10R at 18°, which can also be referred to as 0YR. The next Hue at 21.6°, being closer to the major Hue YR, is named 1YR, and the next Hue at 25.2° is 2YR, and so on. However,
the Munsell Book of Color does not provide chip samples at every Hue step. Between the
neighboring major Hues, only three Hue samples are provided (although in the collection
of glossy color chips, supplementary colors of intermediate Hue are also provided for some
Chromas and Values). For example, between 5R (0◦ ) and 5YR (36◦ ), color chips of 7.5R
(9◦ ), 10R (18◦ ), and 2.5YR (27◦ ) are provided. The collection of color chips thus provides
40 Hues, each representing a perceptually equally-spaced hue, 9◦ apart.
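The Hue notation just described maps directly onto an angle on the Hue circle. The short helper below implements that mapping (5R at 0°, major Hues every 36°, one Hue step equal to 3.6°); the function name and the lack of error handling are our own choices.

```python
# Order of the ten Munsell major Hue families around the circle, 36 degrees apart.
MAJOR_HUES = ["R", "YR", "Y", "GY", "G", "BG", "B", "PB", "P", "RP"]

def munsell_hue_angle(notation: str) -> float:
    """Angle (degrees) of a Munsell Hue designation such as '7.5YR' or '10R'."""
    i = 0
    while notation[i].isdigit() or notation[i] == ".":
        i += 1
    step, family = float(notation[:i]), notation[i:]
    base = 36.0 * MAJOR_HUES.index(family)   # angle of the '5' Hue of this family
    return (base + (step - 5.0) * 3.6) % 360.0

assert abs(munsell_hue_angle("7.5R") - 9.0) < 1e-9
assert abs(munsell_hue_angle("2.5YR") - 27.0) < 1e-9
assert abs(munsell_hue_angle("10R") - 18.0) < 1e-9
```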
The Munsell Chroma scale (corresponding to the perceptual attribute of saturation) is
measured along a Hue radius, starting with zero at the center for neutral black, gray, or
white. Munsell color samples are provided at Chroma 1 for the Hues at 18◦ intervals (5R,
10R, 5YR, and so on), and at Chroma 2, 4, 6, up to the maximum Chroma that can be
produced reliably and with certain permanency by the available pigments. The total number
of Chroma steps within the theoretical MacAdam limit [617] for all the Hues and Values is
estimated as 5836 [713].
The Munsell Value V (corresponding to the perceptual attribute of lightness) ranges between 0 and 10 in equal perceptual steps. V can be computed from the luminance reflectance factor, $R_Y$, under Illuminant C by first normalizing the luminance Y of the sample to $Y_w$, that of the then-standard white, MgO (97.5% reflectance), taken as 100%, i.e., $R_Y = Y/Y_w$, and then solving the renotation polynomial for V:

$100 R_Y = 1.2219V - 0.23111V^2 + 0.23951V^3 - 0.021009V^4 + 0.0008404V^5 .$
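Because the renotation polynomial cannot be inverted in closed form, Value is usually obtained numerically. The sketch below does so by bisection, assuming the quintic quoted above; treat the coefficients as an assumption if your edition states them differently.

```python
def reflectance_percent(V):
    """Luminance reflectance (percent, MgO = 100) as a function of Munsell Value V."""
    return (1.2219 * V - 0.23111 * V**2 + 0.23951 * V**3
            - 0.021009 * V**4 + 0.0008404 * V**5)

def munsell_value(RY, lo=0.0, hi=10.0, iters=60):
    """Solve reflectance_percent(V) = 100*RY by bisection (monotonic on [0, 10])."""
    target = 100.0 * RY
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reflectance_percent(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A mid-gray with about 19.8% luminance reflectance comes out near Value 5.
print(round(munsell_value(0.198), 2))
```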
According to psychological judgment [713], one Chroma step corresponds to roughly five just-perceptible increments, one Hue step corresponds to roughly two just-perceptible increments, and one Value step corresponds to roughly 50 just-perceptible increments. The
total number of surface colors that may be distinguished under good viewing conditions
[713] is thus estimated to be around 7 500 000, which needs about 23 bits in digital repre-
sentation, provided the ideal quantization of the entire space for surface colors is achieved.
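The bit-count figure is a quick arithmetic check:

```python
import math

# About 7.5 million distinguishable surface colors require
# ceil(log2(7 500 000)) bits under ideal quantization.
print(math.ceil(math.log2(7_500_000)))   # 23
```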
One major complaint about the Munsell color system is that the Munsell Value does
not correlate well with the perceived “lightness”. In general, color chips that have higher
Chromas tend to look lighter than those of lower Chromas but the same Value [1050]. This
is a result of the way Value was defined. In the color judgment experiments conducted for
the Munsell renotation, the value adjustments were made on the neutral or near-neutral
samples with the assumption that the Value of a color is a function of the luminance factor
(Y ) only, regardless of its chroma (see [711, p. 417]).
In 1964, the Royal Swedish Academy of Engineering Science established the Swedish
Colour Centre Foundation. Its initial aim was to create another and better edition of
Hesselgren’s Colour Atlas, based on the Hering–Johansson idea of opponent-color pro-
cesses. After a long course of study and experiment, it concluded that hue and saturation
are not the basic color elements. The NCS was then developed by Hård, Sivik, and their
coworkers, based on the concept of specifying a color in terms of relative amounts of the
six basic colors perceived to be present. The NCS system does not attempt to scale its color
attributes for perceptual uniformity, but the color gradation on any hue plane of its color
atlas looks very smooth.
The NCS color solid is usually depicted as a double cone. Since redness and greenness cannot coexist, and yellowness and blueness cannot coexist, any color can have at most four components of the six basic colors in it. A color
is thus specified by the relative amounts of the four components in terms of percentages.
For example, a color can have 20% whiteness, 30% blackness, 10% yellowness, and 40%
redness. The NCS defines Y, R, B, G as the four major hues. All other hues are expressed
as the relative percentage of the two neighboring major hues. For the color in the above
example, yellowness is 10/(10 + 40) = 20% and the redness is 40/(10 + 40) = 80%. The
hue can thus be denoted by 20Y80R. Since the yellowness and the redness sum up to 100%,
we need only write Y80R. The NCS defines chromaticness as the sum of the two major hue
components. For the color in the above example, the chromaticness is 10 + 40 = 50. The
complete NCS color specification is “blackness, chromaticness, hue”. For the color in the
above example, the NCS notation is 3050-Y80R.
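The notation scheme of this worked example can be scripted as follows. This small helper only handles the simple case illustrated in the text (two adjacent hue components such as Y and R), and its formatting choices are our own.

```python
def ncs_notation(blackness, yellowness=0, redness=0, blueness=0, greenness=0):
    """Build an NCS-style notation 'blackness chromaticness - hue' from perceived
    amounts (in percent). Whiteness is whatever remains to 100%."""
    hues = [("Y", yellowness), ("R", redness), ("B", blueness), ("G", greenness)]
    active = [(name, amount) for name, amount in hues if amount > 0]
    chromaticness = sum(amount for _, amount in active)
    if len(active) == 2:
        (h1, a1), (h2, a2) = active
        hue = f"{h1}{round(100 * a2 / (a1 + a2)):02d}{h2}"   # e.g. Y80R
    elif len(active) == 1:
        hue = active[0][0]
    else:
        hue = "N"   # neutral: no chromatic component
    return f"{blackness:02d}{chromaticness:02d}-{hue}"

# Example from the text: 30% blackness, 10% yellowness, 40% redness -> 3050-Y80R
print(ncs_notation(blackness=30, yellowness=10, redness=40))
```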
In 1947, the Optical Society of America formed the Committee on Uniform Color Scales
[619] to review all available data on subjective magnitudes of color differences and to
develop a set of color samples that can be assembled to form uniform color scales of as
many varieties as is feasible. The committee conducted color judgment experiments with
76 normal observers, using hundreds of pairs of color differences. By analyzing the data, a
set of formulas was constructed so that the Euclidean distance in the resulting color scales
represents color difference. After 30 years of work, the OSA Uniform-Color-Scale Atlas
was produced in 1977. The important features of the OSA color system are: (1) the CIE
tristimulus values are linearly transformed to a space that uses estimated cone spectral
sensitivities, before any nonlinear approximation to color appearance is done; (2) all the
color differences used are about ten times the just-noticeable difference, and therefore the
results are not intended for applications involving small color differences.
The OSA color space is based on estimated cone spectral sensitivities. The cone tristimulus values $R_{10}$, $G_{10}$, $B_{10}$ are estimated from the CIE 1964 tristimulus values, $X_{10}$, $Y_{10}$, $Z_{10}$, by the following transformation:

$R_{10} = 0.799X_{10} + 0.4194Y_{10} - 0.1648Z_{10},$
$G_{10} = -0.4493X_{10} + 1.3265Y_{10} + 0.0927Z_{10},$   (14.3)
$B_{10} = -0.1149X_{10} + 0.3394Y_{10} + 0.717Z_{10},$

where $X_{10}$, $Y_{10}$, $Z_{10}$ are measured under D65 with $Y_{10}$ set to 100 for white. The three coordinates L (lightness), j (yellowness), and g (greenness) of the OSA color system are defined as:

$L = 5.9\left[\overline{Y}_{10}^{1/3} - 2/3 + 0.042(\overline{Y}_{10} - 30)^{1/3}\right],$   (14.4)
$j = C\left(1.7R_{10}^{1/3} + 8G_{10}^{1/3} - 9.7B_{10}^{1/3}\right),$   (14.5)
$g = C\left(-13.7R_{10}^{1/3} + 17.7G_{10}^{1/3} - 4B_{10}^{1/3}\right),$   (14.6)

where

$\overline{Y}_{10} = Y_{10}\left(4.4934x_{10}^2 + 4.3034y_{10}^2 - 4.276x_{10}y_{10} - 1.3744x_{10} - 2.5643y_{10} + 1.8103\right),$   (14.7)
$C = \dfrac{L}{5.9\left(\overline{Y}_{10}^{1/3} - 2/3\right)}.$   (14.8)
The surround used for all the committee’s color comparisons was a medium gray with 30%
reflectance factor.
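The OSA transformation above can be coded directly. The sketch below simply transcribes Eqs. (14.3)–(14.8) as reconstructed here, assumes $Y_{10}$ is scaled so that the reference white has $Y_{10} = 100$, and uses cube roots that tolerate negative arguments; the D65 white-point values in the example are our own input.

```python
import numpy as np

def osa_Ljg(X10, Y10, Z10):
    """OSA-UCS coordinates (L, j, g) from CIE 1964 tristimulus values,
    following Eqs. (14.3)-(14.8) as given above."""
    s = X10 + Y10 + Z10
    x10, y10 = X10 / s, Y10 / s
    Ybar = Y10 * (4.4934 * x10**2 + 4.3034 * y10**2 - 4.276 * x10 * y10
                  - 1.3744 * x10 - 2.5643 * y10 + 1.8103)
    R10 = 0.799 * X10 + 0.4194 * Y10 - 0.1648 * Z10
    G10 = -0.4493 * X10 + 1.3265 * Y10 + 0.0927 * Z10
    B10 = -0.1149 * X10 + 0.3394 * Y10 + 0.717 * Z10
    L = 5.9 * (np.cbrt(Ybar) - 2.0 / 3.0 + 0.042 * np.cbrt(Ybar - 30.0))
    C = L / (5.9 * (np.cbrt(Ybar) - 2.0 / 3.0))
    j = C * (1.7 * np.cbrt(R10) + 8.0 * np.cbrt(G10) - 9.7 * np.cbrt(B10))
    g = C * (-13.7 * np.cbrt(R10) + 17.7 * np.cbrt(G10) - 4.0 * np.cbrt(B10))
    return L, j, g

# A neutral mid-gray (30% of an assumed D65/10-degree white) should give j and g near 0.
print(tuple(round(v, 2) for v in osa_Ljg(0.3 * 94.81, 30.0, 0.3 * 107.3)))
```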
In color imaging science, the question of color harmony is rarely raised. However, with
the advance of computing technology, color images can now be manipulated easily on a
pixel level, and it is likely that certain basic issues in color harmony will arise, especially
in image synthesis. As we mentioned earlier, the major theory in color harmony was pro-
posed by Ostwald. The basic idea is that color combinations have to follow certain orderly
distributions in color attributes. Five such orderly distributions are the isotint, isotone, isochrome (shadow), isovalue, and isohue series described earlier in this chapter.
14.7 Problems
14.1 Of the four color order systems discussed in this chapter, which one is actually defined
without reference to any visual system?
14.2 Of the four color order systems discussed in this chapter, which one has the property
that equal Euclidean distances represent equal color differences?
14.3 Plot the lightness as a function of luminance value in the four color order systems in
this chapter. How do they compare with the L ∗ in CIELAB?
15 Color measurement
Quantitative analysis in color imaging requires measurements of all the relevant charac-
teristics of the scene, the imaging system, and the display devices and media. The major
concepts underlying these measurements, such as radiometry, photometry, colorimetry, and
scene physics, have been covered in the previous chapters in the book. In this chapter, we will
briefly describe how various measurement instruments work and study how color measure-
ments are performed. Various national and international standards have been defined (such
as in [39, 444]) and it is important to understand them if accurate, standard measurements
are required.
There are three major types of color measurement in color imaging: spectral measure-
ment, colorimetric measurement, and density measurement. In addition to these three types,
some applications also require the measurements of spectral distribution as a function of
geometrical variables, such as the incident angles and the reflection angles. For example,
measurements of the BSRDF of surface materials are often performed for textile, printing,
or computer animation applications. Instruments that can make this type of measurement
are called gonioreflectometers.
The first step in many color imaging applications is to calibrate the devices that will be used
in the application, so that quantitative physical relations can be established between the
scenes, the sensors, and the displays. The most detailed description of a color stimulus is its
absolute spectral power distribution. This is usually measured as spectral radiance or spectral
irradiance by an instrument called a spectroradiometer. For many applications, only relative
spectral power distributions (with respect to some reference objects) are needed. The relative
spectral power distribution is measured by an instrument called a spectrophotometer. The
names of these two types of instrument are related to the fact that radiometry deals with
absolute radiant fluxes and photometry deals with relative light fluxes (with respect to the
human luminous sensitivity).
15.1.1 Spectroradiometer
A spectroradiometer (see Fig. 15.1) consists of four major parts: (1) the collecting optics,
(2) the monochromator, (3) the detector, and (4) the readout device. The collecting optics
serves the function of collecting or imaging light from the target and forming a light
beam in a proper shape for the monochromator. The monochromator selects and transmits
a narrow band of wavelengths (ideally a single wavelength) from the spectrum of the
incident light beam. A dispersing element (such as a prism or a diffraction grating) inside
the monochromator receives the incident light beam and spreads different wavelengths in
different angular directions. The focusing optics is then moved and positioned along the
angular spread to focus a narrow portion of the spectrum to pass through the exit slit to
the detector. The detector senses the radiant power and outputs an electrical signal to the
readout device. Figure 15.1 shows a schematic diagram of a spectroradiometer.
One of the key components in a spectroradiometer is the dispersing element that sep-
arates different wavelengths of the incident light. Diffraction gratings are often used for
this purpose. The way a diffraction grating works is well described in the literature (e.g.,
[385, 437, 646, 772]). Basically, periodic micro-structures on the grating surface cause the
reflected (or transmitted) light waves to interfere constructively or destructively, depend-
ing on the angle of propagation upon reflection or transmission. At any particular angle, a
certain wavelength may have the strongest intensity due to constructive interference, while
other wavelengths are not completely in phase and are weaker in intensity. This means that
wavelength separation is never perfect. This problem is called spectral scattering: unwanted
light of neighboring wavelength is scattered or diffracted into the exit slit and mixed with
the wavelength of interest. The scattering can come from the grating, mirrors, or the hous-
ing of the monochromator. The best way to reduce spectral scattering is to pass the output
beam of one monochromator through yet another monochromator (in a setup called a dou-
ble monochromator). If a single monochromator is used, a method of software correction,
deconvolution, can be employed to improve the spectral resolution of the measured data
[714, Chapter 8].
If the collecting optics is equipped with a telescope type of imaging lens, the spectro-
radiometer can be used to measure the spectral radiant power from a spot in the scene.
This type of instrument is called a tele-spectroradiometer. It is a most useful instrument
for color imaging research because we can use it to measure the spectral radiance from any
sample point in the scene under the natural, complex, ambient illumination in which we are
interested.
15.1.2 Spectrophotometer
A spectrophotometer is basically a spectroradiometer that has its own light source and
is equipped to switch the measurement between two alternative input light beams, one
reflected from or transmitted through the test sample and the other reflected from a reference
object (a reflection standard) or transmitted through an open gate without the test sample.
Figure 15.2 shows a schematic diagram of a spectrophotometer.
to make sure that the lighting and measuring geometry do not create problems. If the
instrument has a diffuser or an integrating sphere in the fore-optics, the measurement
will not be sensitive to polarization.
• Spectral scattering: when measuring an incident spectrum at a wavelength λ with an intended spectral window of width Δλ, stray light of other wavelengths may be scattered into the detector. This is called spectral scattering. It can cause serious
errors if the target wavelength is near a strong spectral peak. The scattered light may
be stronger than the spectral light to be measured. In most color imaging applications,
spectral functions are quite smooth and therefore their measurements are not as difficult
as those with spiky spectral lines. Some exceptions are fluorescent lamps and CRT
monitors.
• Wavelength calibration: the monochromator in the spectroradiometer or spectropho-
tometer spreads different wavelengths out in different directions. An accurate spectral
measurement requires an accurate wavelength identification. A small wavelength off-
set can produce a large error in the calculation of tristimulus values. Fortunately, there
are various standard spectral lines that can be used for calibrating the wavelength
positions.
• Fluorescence: some materials absorb radiant flux of certain wavelengths and reradiate
it at other (usually longer) wavelengths. It is thus very important to characterize the
light source of the instrument so that the amount of fluorescence can be quantified [39,
ASTM E 991, 431]. If color filters are used, it is important to check to see if they are
fluorescent.
• Nonlinearity: the responsivity function of an instrument is often nonlinear, especially
when a large dynamic input range is involved. It should be part of the instrument
calibration procedure to characterize the nonlinearity in detail.
15.2 Gonioreflectometers
Figure 15.3. Schematic diagram of a gonioreflectometer. Here the light source is a monochromator
illuminator and the detector is a simple photodiode detector. Alternatively, the light source may be a
white-light source and the detector may be a spectroradiometer.
From our study of colorimetry, we know that, as far as our color perception is concerned,
a color stimulus can be represented by its tristimulus values, regardless of its spectral
power distribution. Therefore, most spectral measurements are converted into the CIEXYZ
tristimulus values. In many applications, it is necessary only to measure the CIEXYZ
tristimulus values or the luminance and chromaticity (Y , x, y), without having to measure
the spectral power distribution of the light. In this case, instruments that use colorimetric
spectral filters, such as x̄(λ), ȳ(λ), and z̄(λ), can be used to make the measurements very
efficiently. This type of instrument is called a colorimeter. However, it is very difficult to
manufacture color filters that closely approximate the CIE color matching functions, and
most colorimeters have limited accuracy. With the fast development of new types of optical
filter material, such as the giant birefringent optics (GBO) polymers [1019], it may be
possible to have very inexpensive, but very accurate colorimeters in the near future [919].
The major advantage of a colorimeter is that tristimulus values can be measured very quickly and conveniently.
As discussed in Section 15.1.3, there are many potential sources of error in spectral mea-
surements. Factors, such as lighting and viewing geometry, polarization, and fluorescence,
have to be carefully checked. Other factors, such as wavelength scale, spectral scatter-
ing, and zero-offset, are somewhat correctable. One approach for such data correction is
to model the errors with simple equations and find the parameters of those equations so
that corrected values can be determined [102, 870]. Therefore, the success of this type of
correction depends on the validity of the error model and the magnitude of the error.
Once we have obtained reliable spectral data, we can compute the tristimulus values
according to their definitions, replacing the integrals with discrete sums. However, there are
several situations that may cause problems in many applications:
• When the measurement interval is too large. The process recommended by the CIE for computing the tristimulus values is to use a 1 nm interval, or a 5 nm interval if the spectral
function is smooth. If the spectral data are available only for 10 nm or 20 nm inter-
vals, calculation with subsampled color matching functions at such a large wavelength
interval often produces very significant numerical errors. The ASTM-recommended
method for calculation is to interpolate the wavelength samples that were not measured.
The interpolation can be as simple as assuming that the missing wavelength samples
have the same values as their nearest measured data points.
• When the measurement range is too small. The CIE-recommended range is from 360 nm
to 830 nm. In practice, the CIE color matching functions have very small values below
380 nm and above 780 nm, and that is a very typical range for spectral measurement.
Sometimes, spectral measurement data are available for an even smaller range, say
400 nm to 700 nm. The recommended process is to extrapolate the wavelength samples
that are not measured, again by assuming they have the same values as the nearest
measured data points.
The above situations of working with missing data points are fairly classical problems in
signal estimation. Knowledge of the signal itself, such as the bandwidth, can be used to
derive more accurate results. For example, if we know that the spectral data were measured
with a triangular window whose width at half height equals the wavelength interval, we
can work out a set of optimum weights for the illuminant–observer functions (assuming that
the reflectance or transmittance factor of all the samples is equally likely to be going up or
down in any wavelength interval), and use those optimum weights to calculate the tristimulus
values. Experimental data showed that this approach can indeed reduce the computation
error very significantly [988].
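The discrete-sum calculation and the simple nearest-value fill-in described above can be sketched in a few lines of Python/NumPy. The function names are illustrative, the normalization constant is omitted, and the actual CIE color matching function tables (not reproduced here) must be supplied by the caller:

```python
import numpy as np

def resample_nearest(wl_target, wl_measured, values):
    """Fill every target wavelength with the value of the nearest measured
    wavelength (the simple interpolation/extrapolation described above)."""
    idx = np.abs(wl_target[:, None] - wl_measured[None, :]).argmin(axis=1)
    return np.asarray(values, float)[idx]

def tristimulus(wl_measured, spd, wl_cmf, xbar, ybar, zbar):
    """Approximate CIE tristimulus values by a discrete sum over the CMF grid.
    wl_measured, spd : measured wavelengths (nm) and spectral power data
    wl_cmf, x/y/zbar : wavelength grid and color matching functions
                       (use the official CIE tables in real work)."""
    wl_cmf = np.asarray(wl_cmf, float)
    dl = wl_cmf[1] - wl_cmf[0]                 # wavelength interval of the CMF grid
    s = resample_nearest(wl_cmf, np.asarray(wl_measured, float), spd)
    return (np.sum(s * xbar) * dl,
            np.sum(s * ybar) * dl,
            np.sum(s * zbar) * dl)
```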
Although spectral radiances of color stimuli give us the complete information, it is not
always easy or convenient to make such measurements. The instruments are expensive and
the procedures time consuming. Fortunately, for many applications, such as photographic
imaging and graphic art printing, an alternative color measurement called density can be
used with great success. Densities are proportional to log reflectance or log transmittance.
They are very effective in modeling the color reproduction processes and are much simpler
and less expensive to measure. There are two important reasons why densities are so useful
in these applications: (1) they are approximately proportional to the additive sum of dye
or ink amounts, when the colorants are substantially transparent, and (2) color cross-talks
(such as the interimage effect in color films) through which one color layer affects the
formation of another color layer can be approximately modeled by product terms in density.
Therefore, a significant portion of color measurement in imaging has been reported in terms
of density. However, there is a serious drawback in using densities because they are tied to
a particular device and medium. The same density measured on two different media can
appear quite different visually. Therefore, use of density as color measurement works best
when a single closed system is the target application. There are three major factors that have
to be considered in density measurements: (1) aperture, (2) optical geometry, and (3) system
spectral response.
In the digitization of a film image, the aperture used by a scanner or a microdensitometer
is usually quite small (on the order of 10–20 µm). On the other hand, the characteristic curves
of films are often measured with large apertures. In order to reduce the effect of noise and
nonuniformity, a minimum aperture area of 7 mm² is recommended [924, p. 545]. Also,
the measurement should be made at least 1 mm away from the edge of the exposure. To
measure the density of half-tone images, the minimum aperture size is dependent on the
screen ruling frequency. For example, according to ISO 13655, for a screen frequency of 300
lines per inch, the measurement should use a minimum aperture of 1 mm, and, for a screen
of 65 lines per inch, a minimum of 3.5 mm. Typical density measurements are certified only
to two decimal places. To measure reliable and traceable density values, physical standards
(such as those produced by NIST) should be used frequently to keep the densitometers in
good calibration. Instrument-specific calibration procedures are described in the manuals
provided with the instruments. A good practical discussion can be found in [928, Chapter 1].
Since how a sample reflects or transmits light depends on how it is illuminated and
how the detector collects the light, the illumination (influx) and the light-collection (efflux)
geometry for density measurement have to be specified explicitly. The geometric parameters
usually involve the cone shape, the solid angle, and the direction of light flux. In a general
classification, the cone shape can be specular or diffuse, and the flux direction can be 0◦
or 45◦ from the surface normal of the sample being measured. However, the best optical
geometry to use should be determined by the particular imaging application. For example,
if a photographic print is to be viewed under general office lighting, the influx geometry
should be close to diffuse lighting.
The last major factor in density measurement is the system spectral response. Again,
the target application should be the dominating decision factor. For example, if a color
negative is to be printed by an optical printer, the density should be measured by the
combined spectral response function of the printer light source, the color filters, and the
color paper spectral sensitivities. The density measured by a system spectral response of
a specific printer and paper combination is called the printing density. There are system
spectral response functions that are standardized by national and international standards
organizations. Density measurements made according to such standards are given special
names, such as Status A density, Status M density, and so on. In considering the system
spectral response function for measuring densities, we have to understand one very important
factor – the position of the peak response for each red, green, and blue channel. The
spectral density as a function of the amount of dye in most transparency films can be
well approximated by Beer’s law, i.e., doubling the amount of dye in a film doubles its
spectral density (not its spectral transmittance) at each wavelength. However, the (integral)
density of a film is actually calculated from the integration of the total transmitted light (not
the spectral density) of all wavelengths. Therefore, the (integral) film density measurement
is quite sensitive to the peak position of the system spectral response function. As can be
seen in the following ISO system spectral responses, these functions are all very-narrow-
band filters and often their peak positions are selected to measure commonly used dye or
ink materials for a particular application industry. That is the major reason why there are
such a plurality of status filters in the standards.
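The sensitivity of integral density to the bandwidth (and peak position) of the system spectral response is easy to demonstrate numerically. The following Python/NumPy sketch uses a purely synthetic Gaussian dye absorption curve (not data for any real dye): doubling the dye amount doubles the density measured through a narrow-band response, but not the density measured through a wide-band response.

```python
import numpy as np

wl = np.arange(400.0, 701.0, 5.0)                     # wavelength grid (nm)
# Illustrative unit dye spectral density: a Gaussian absorption band (not a
# real dye; real data would come from measurements of the actual colorant).
unit_density = np.exp(-0.5 * ((wl - 550.0) / 40.0) ** 2)

def integral_density(amount, width):
    """Integral (status-like) density of `amount` of dye measured through a
    Gaussian system spectral response centred on the dye peak."""
    response = np.exp(-0.5 * ((wl - 550.0) / width) ** 2)
    response /= response.sum()
    T = 10.0 ** (-amount * unit_density)              # Beer's law at each wavelength
    return -np.log10(np.sum(response * T))            # density of the total transmitted light

for w in (5.0, 60.0):                                 # narrow versus wide system response
    d1, d2 = integral_density(1.0, w), integral_density(2.0, w)
    print(f"response width {w:5.1f} nm: D(1 unit)={d1:.3f}, D(2 units)={d2:.3f}, ratio={d2/d1:.2f}")
```

In this toy case the narrow-band response gives a ratio close to 2, while the wide-band response does not, which is the behavior that motivates the narrow-band status filters.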
The reflectance density, Dρ, is defined from the reflectance, ρ, as

Dρ = − log10 ρ. (15.1)
In the CIE lighting vocabulary, the term reflectance density is intended for a measurement
geometry that collects all the reflected light, as specified in the definition of reflectance, ρ.
This is to be distinguished from reflectance factor R, in which the measurement geometry
is a finite cone. The density defined for the reflectance factor is called the reflectance factor
density, D R :
D R = − log10 R. (15.2)
In the literature, the terms reflection density and optical density are also frequently used to
mean either the reflectance density or the reflectance factor density. Since an exact specifica-
tion of any density measurement has to include the illumination geometry, the measurement
geometry, the system spectral response, and the aperture, it is generally acceptable to use
the term reflection density as a generic term to distinguish it from the transmission density
when transmitted light is measured.
Reflection geometry
A typical reflection density measurement uses 0/45 or 45/0 geometry. In order to increase
the illumination level on the reflection target, an annular ring-shaped light source is posi-
tioned at a 45◦ angle from the surface normal and the reflected light is collected through
a cone centered along the surface normal (0◦ ). Figure 15.4 shows such a configuration.
Alternatively, we can also position the light source at the normal and collect the reflected
light with an annular aperture.
Figure 15.4. Typical geometry for a reflection density measurement.
Figure 15.5. System spectral responses for (a) ISO visual density, type 1 and type 2 printing density,
and (b) Status A density.
be used for reflection density measurement. The backing material should be spectrally
nonselective, diffuse-reflecting, and have an ISO reflection density of 1.50 ± 0.20.
The transmission density, Dτ, is similarly defined from the transmittance, τ, as

Dτ = − log10 τ. (15.3)
Transmission geometry
The geometry for measuring transmission density refers to the incident light (influx) and the
transmitted light (efflux). The influx and the efflux geometry both can be either diffuse or
specular (regular). Therefore, there are four combinations: diffuse–diffuse, diffuse–specular
(d/0◦ ), specular–diffuse (0◦ /d), and specular–specular. The type of geometry to use is
dependent on the intended application. For example, if a color reversal film (transparency) is
to be displayed against a diffuse light source, the influx geometry for density measurement
should be diffuse as well. Since the transparency is usually viewed in a surface normal
direction, the efflux geometry should be specular. The transmission density measured with
such a diffuse–specular geometry is called the diffuse transmission density. (ISO 13655
specifies that the transmittance factor shall be measured with d/0◦ or 0◦ /d geometry.) On
the other hand, if the transparency is to be projected by a slide projector in which light
from the projector lamp passes through a condenser lens and forms a radiant cone or beam,
the influx geometry for density measurement should be specular. Since the transmitted
light is projected and focused into a cone of radiant flux, the efflux geometry is also specular.
Therefore, for such an application, the proper geometry for density measurement is specular–
specular, and the density so measured is called the projection transmission density. These
two types of measurement geometry are shown in Fig. 15.6. The half-angle of the specular
cone should be no larger than 10◦ .
Figure 15.7. System spectral responses for: (a) ISO Status M density and (b) Status T density.
When we use an instrument to measure some physical quantity, we often do not obtain
the true value of the quantity that we are trying to measure. The difference between the
measured value and the true value is called an error. Of course, in most cases, we do not
know what the true value is and therefore we do not know the error either. In reality, we can
only estimate the error, mostly through repeated measurements.
There are two things that we are particularly concerned with in measurement results:
accuracy and precision. Accuracy is a measure of how close the measured value comes
to the true value. Precision is a measure of how reproducible the measurement is. The
absolute precision is the magnitude of the uncertainty in the measurement in the same units
as the measurement. The relative precision is the uncertainty expressed as a percentage or
a fraction of the true value.
Excluding operational errors caused by human mistakes, measurement errors are gener-
ally classified into three types: systematic errors, instrumental errors, and statistical errors.
Systematic errors are due to system bias or faulty calibration. For example, if a spectropho-
tometer does not have the correct wavelength scale or alignment, the measured spectral
functions will have systematic errors. Instrumental errors are caused by the limited pre-
cision of the instrument. For example, a digital meter cannot measure a quantity more
precisely than the last significant digit. Electronic noise in the instrument also limits how
precisely a quantity can be measured each time. Statistical errors are caused by the sta-
tistical distribution of the physical quantity to be measured. For example, the number of
photons arriving at a detector within a given time interval is statistical in nature. The reason
for making such a classification of errors is that different classes of error have different
probability distributions that need to be considered when we analyze the measured data.
Systematic errors should be removed by careful calibration of the instrument. In the fol-
lowing discussion, we will assume that systematic errors do not exist. Instrumental errors
and statistical errors are called random errors. We will deal only with random errors in this
chapter.
$$
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (15.4)
$$
Let x and y be two random variables with a joint probability density function; then we define the variances of x and y and their covariance as V(x) = σx² = E[(x − E[x])²], V(y) = σy² = E[(y − E[y])²], and C(x, y) = E[(x − E[x])(y − E[y])], where E[·] denotes the expected value. One can show that V(ax + by) = a²V(x) + b²V(y) + 2abC(x, y). When x and y are uncorrelated, i.e., C(x, y) = 0, then V(ax + by) = a²V(x) + b²V(y).
Let x be an n × 1 random vector, y be an m × 1 random vector, and A be an m × n matrix. The variance matrix of x is defined as V(x) = E[(x − E[x])(x − E[x])^T]. The covariance matrix of x and y is C(x, y) = E[(x − E[x])(y − E[y])^T]. If y = Ax + b, then V(y) = A V(x) A^T. This is a useful relation because matrices are often used in color transformation from one color vector space to another. Another useful relation is a quadratic form. Let E[x] = µ and V(x) = Σ; then E[x^T A x] = tr(AΣ) + µ^T A µ, where tr(·) means the trace of a matrix, i.e., the sum of all the diagonal elements. Let us look at an example. Let Δx = (ΔX, ΔY, ΔZ) be the error vector in the tristimulus values. Let Σ be the variance matrix of Δx and let the matrix J be defined by

$$
\begin{pmatrix} \Delta L^* \\ \Delta a^* \\ \Delta b^* \end{pmatrix}
=
\begin{pmatrix}
\partial L^*/\partial X & \partial L^*/\partial Y & \partial L^*/\partial Z \\
\partial a^*/\partial X & \partial a^*/\partial Y & \partial a^*/\partial Z \\
\partial b^*/\partial X & \partial b^*/\partial Y & \partial b^*/\partial Z
\end{pmatrix}
\begin{pmatrix} \Delta X \\ \Delta Y \\ \Delta Z \end{pmatrix}
= J \begin{pmatrix} \Delta X \\ \Delta Y \\ \Delta Z \end{pmatrix}. \qquad (15.6)
$$

In calculating the CIELAB color error, (ΔE)² = (ΔL*)² + (Δa*)² + (Δb*)², caused by a CIEXYZ error, we can express (ΔE)² as a quadratic form in Δx: (ΔE)² = (JΔx)^T(JΔx) = Δx^T J^T J Δx. Therefore, if we assume that E[Δx] = 0, the variance of the CIELAB error can be calculated as E[(ΔE)²] = tr(J^T J Σ) + E[Δx]^T J^T J E[Δx] = tr(J^T J Σ).
These simple examples show that the result of the error analysis is not always intuitive
and therefore it is important to verify the analytical results carefully when an estimate of the error is required. An alternative for error analysis, when the analytical calculation is too
complex is to do a simulation. One can generate random errors in the measurement data
and use them to calculate the derived quantities and their statistical distribution.
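As an illustration of both approaches, the following Python/NumPy sketch computes tr(JᵀJΣ) with a numerically estimated Jacobian and checks it against a Monte Carlo simulation. The mean tristimulus values and the covariance matrix are illustrative placeholders, not measured data:

```python
import numpy as np

def xyz_to_lab(xyz, white=(95.047, 100.0, 108.883)):   # D65 white point (2 degree observer)
    """Standard CIELAB from CIEXYZ (both on a 0-100 scale)."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    d = 6.0 / 29.0
    f = np.where(t > d ** 3, np.cbrt(t), t / (3.0 * d ** 2) + 4.0 / 29.0)
    return np.array([116.0 * f[1] - 16.0,               # L*
                     500.0 * (f[0] - f[1]),             # a*
                     200.0 * (f[1] - f[2])])            # b*

xyz0 = np.array([45.0, 40.0, 30.0])          # illustrative mean tristimulus values
cov = np.array([[2.0, 0.5, 0.2],             # illustrative XYZ noise covariance matrix
                [0.5, 1.5, 0.3],
                [0.2, 0.3, 2.5]])

# Jacobian J = d(Lab)/d(XYZ) estimated by central differences.
eps = 1e-4
J = np.column_stack([(xyz_to_lab(xyz0 + eps * e) - xyz_to_lab(xyz0 - eps * e)) / (2.0 * eps)
                     for e in np.eye(3)])

analytic = np.trace(J.T @ J @ cov)           # E[(dE)^2] ~ tr(J^T J Sigma)

rng = np.random.default_rng(0)               # Monte Carlo check of the same quantity
dxyz = rng.multivariate_normal(np.zeros(3), cov, size=20000)
dlab = np.array([xyz_to_lab(xyz0 + d) - xyz_to_lab(xyz0) for d in dxyz])
mc = np.mean(np.sum(dlab ** 2, axis=1))

print(f"tr(J^T J Sigma) = {analytic:.2f},  Monte Carlo E[(dE)^2] = {mc:.2f}")
```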
which is the square root of the sample variance divided by the number of samples. The reason
for using the square root is that it has the same dimension as the measured quantity. When a
measurand Y is not measured directly, but is determined from other independent quantities,
X 1 , X 2 , . . . , X N , through a functional relationship Y = f (X 1 , X 2 , . . . , X N ), the combined standard uncertainty of Y has to be obtained by propagating the uncertainties of X 1 , X 2 , . . . , X N through the function f.
15.8 Problems
15.1 Since light of different polarizations is reflected differently from a smooth surface,
how does this difference affect the measurement of the spectral reflectance factor
of an object surface? How can we eliminate the polarization sensitivity of a given
instrument?
15.2 A color negative film is exposed in a series of neutral density steps. Using densitometer
A to measure the D–log H curve of the film, we find that the red, green, and blue
curves are parallel. However, when we use densitometer B to measure the curves, they
are not parallel. Can you explain why there might be such a difference? Is it always
justified to say that densitometer B is not as well calibrated as densitometer A?
15.3 A piece of film has a specular density 1.0; what is its transmittance? Assume that the
index of refraction of the film is 1.5. What is the film’s internal transmittance value
after we correct for the air–film interface reflection?
15.4 Let the Status A red, green, blue densities of an area on a reflection print be Dr , Dg ,
and Db . If Dr = Dg = Db , would that area look achromatic?
15.5 Under D65 , the tristimulus values, (X ,Y ,Z ), measured from a color target have a mean
of (34.0, 29.0, 15.0) and a variance matrix of
$$
\begin{pmatrix} 2.62 & 1.23 & 0.56 \\ 1.32 & 2.34 & 1.15 \\ 0.61 & 1.29 & 3.87 \end{pmatrix}. \qquad (15.16)
$$
Measurement noise in XYZ causes errors, ΔE, in the computed CIELAB values. Calculate the variance of ΔE, i.e., E[(ΔE)²].
16 Device calibration
In Chapters 19 and 20, we will be studying various image capture and image display devices. Specific calibration
problems related to each individual device will be discussed in more detail there.
For color imaging applications, device calibration usually involves three processes: (a) de-
vice adjustment, (b) target measurement, and (c) signal transformation. The device ad-
justment process is aimed at setting up the device at the desired operating conditions. For
example, a good CRT monitor calibration should include: (a) device adjustment: adjusting
the CRT’s cut-off bias, gain, dynamic range, color tracking, etc.; (b) target measurement:
measuring the luminance and the chromaticity for each of the selected input code values;
and (c) signal transformation: constructing lookup tables that map any reproducible color
into the digital values that will produce the color on the CRT screen. Device adjustment
usually requires professional knowledge and is not commonly performed by users. The
remaining processes of target measurement and signal transformation are called device
characterization. However, in the color imaging literature, the term “calibration” is loosely
used to mean either (a) + (b) + (c), or (b) + (c). In some of the literature, especially in
statistics, “calibration” is used to mean (c) alone [140, 640]. For example, given n points,
(xi , yi ), i = 1, 2, . . . , n, where x is the independent variable and y is the dependent variable
(or the response to the input x), we can fit a model y(x) to the data points. This model
fitting process is called regression (or approximation). For any future x, we can find (or
predict) what the response y will be by the regression model y(x). If, instead, we are given a
response y and asked to determine what x can produce that response, the problem is called
calibration. In the example of a scanner, if we scan a gray scale of known reflectance values
x, we can read out the scanner response value y for each reflectance. We can run regression
to determine a function y(x). When we examine the digital values in the scanned image, we
want to find out what their corresponding reflectances are. This then becomes a calibration
problem.
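A minimal Python/NumPy sketch of this distinction, using a synthetic (illustrative) gray-scale scan: the regression step fits a model of scanner code versus reflectance, and the calibration step inverts that model to recover reflectance from an observed code value.

```python
import numpy as np

# Illustrative data: known gray-patch reflectances and the (synthetic, noisy)
# scanner code values they produced.  Real data would come from a scan of a
# calibrated gray-scale target.
reflectance = np.array([0.02, 0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90])
rng = np.random.default_rng(1)
code = 255.0 * reflectance ** (1.0 / 2.2) + rng.normal(0.0, 1.0, reflectance.size)

# Regression: fit a model code = y(reflectance), here a cubic polynomial.
coeff = np.polyfit(reflectance, code, deg=3)
predict_code = np.poly1d(coeff)

# Calibration: given an observed code value, find the reflectance that would
# produce it.  With a monotonic response this can be done by inverting the
# fitted model on a dense grid.
grid = np.linspace(reflectance.min(), reflectance.max(), 2001)
def calibrate(observed_code):
    return grid[np.argmin(np.abs(predict_code(grid) - observed_code))]

print("code 128 -> estimated reflectance", calibrate(128.0))
```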
Similarly, to calibrate a digital camera, we can establish a relation between the known colorimetric values of the color patches on the ColorChecker and the digital responses from the camera. The calibration problem
then is to determine the color metrics (say, in terms of CIELAB) of an unknown object that
corresponds to a future camera (R,G,B) digital response of a pixel of the image of that object.
In this example, we can see many difficult problems in calibrating an imaging device:
1. The spectral sampling problem If the spectral responsivities of the input device are
not a linear combination of the human cone sensitivities, there is no way to calibrate
the device so that it will see all colors the same way that an average human observer
sees them, unless the input device effectively measures the spectral compositions of
the color stimuli, rather than measuring only three (R,G,B) spectral bands. This is
called the spectral sampling problem because it is caused by the fact that a three-color
imaging device is undersampling the spectral signals.
2. The scene balance problem The scene illuminant is usually different from the illu-
minant used in the calibration. Either the captured image or the calibration table has to
be adjusted to compensate for the difference. The questions are whether it is possible
to do such an adjustment and how good the adjustment can be. This is called the scene
balance problem. The problem is usually addressed in two parts: (a) the brightness
(or density) balance problem deals with the fact that the exposure control of an input
device varies from image to image, depending on how strong the scene illumination is;
and (b) the color balance problem deals with the different chromatic shifts due to the
difference between the spectral composition of the scene illuminant (including ambient
light) and that of the calibration illuminant. In the video imaging applications, the color
balance problem is called the white balance problem.
3. The color interpolation problem For practical reasons, the number of color targets
that are measured is often relatively small compared with the total number of possible
colors that can be represented in the input device. For example, a color digital camera
usually outputs an image in 8 bits per color per pixel. In theory, there are 16.7 million
(2²⁴) representable colors. Although many of them never occur in natural scenes, we
cannot realistically measure even a fraction of 1% of that many colors. Therefore, a basic
problem in color calibration is the color interpolation (or approximation) problem. We
should point out the difference between interpolation and approximation (regression).
Interpolation requires that yi = y(xi ) for all i, where (xi , yi ) are the measured data
points. Therefore, the curve or the model y(x) is forced to pass through all given
data points. In approximation or regression, this is not required. In principle, all color
measurements contain noise and thus we should always use approximation rather than
interpolation. However, approximation requires a good functional model of the color
data, and an imaging device is usually too complicated for us to guess a good model.
Furthermore, color calibration involves multivariate approximation and interpolation
and the data are not very easy to visualize. Forcing a model through the calibration data
can produce unexpectedly large errors. As a result, interpolation is often used in color
calibration. When interpolation is used, color measurements are repeated many times
and averages are taken as the ground truths.
Figure 16.2 shows a general block diagram for input device calibration. The first step
in the calibration is to remove any nonlinearity in sensor response. The image data are
converted into a metric that is linearly proportional to image irradiance. Removal of sensor
nonlinearity is usually done with a one-dimensional lookup table (1-D LUT) for each color
channel. With image data in linear space, we can perform the colorimetric transform to put
the image data into some kind of colorimetric quantities, such as CIEXYZ or CIELAB.
Depending on the desired metrics and accuracy, this step is performed with either a 3 × 3
matrix or a three-dimensional lookup table (3-D LUT). Finally, the colorimetric image
data are mapped into the desired output data metric. For example, if the images are to be
viewed on a monitor, the desired output metric is usually sRGB, NTSC or other standard
video signal. This step of converting to the desired output metric can be complicated and
therefore may need to be carried out in multiple transformations. The general diagram in
Fig. 16.2 thus represents conceptual steps rather than implementational steps. Therefore,
blocks can be merged or split in any particular implementation. We will see examples of
input device calibration in Chapter 19 in which we discuss image capture devices. At this
point, we only have to note that the 1-D LUT, the 3-D LUT, and the 3 × 3 matrix are
commonly used building blocks for input device calibration.
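A conceptual sketch of these building blocks in Python/NumPy is given below; the function names, the measured gray-ramp data, and the identity matrix used as the colorimetric transform are all illustrative placeholders (a real calibration would derive them from target measurements, and might use a 3-D LUT instead of, or after, the matrix):

```python
import numpy as np

def build_linearization_lut(code_values, measured_irradiance, size=256):
    """1-D LUT (one per channel) that maps device code values to a metric
    proportional to image irradiance, built by interpolating measured
    (code, irradiance) pairs."""
    codes = np.arange(size, dtype=float)
    return np.interp(codes, code_values, measured_irradiance)

def apply_input_calibration(rgb_codes, luts, matrix):
    """Conceptual pipeline: per-channel 1-D LUT, then a 3x3 colorimetric matrix.
    rgb_codes: (..., 3) integer device codes; luts: three 1-D LUTs;
    matrix: 3x3 transform from linear device RGB to, e.g., CIEXYZ."""
    linear = np.stack([luts[c][rgb_codes[..., c]] for c in range(3)], axis=-1)
    return linear @ np.asarray(matrix).T

# Illustrative use with a synthetic gray-ramp measurement for one channel.
codes_measured = np.array([0, 64, 128, 192, 255])
irr_measured = np.array([0.0, 0.04, 0.18, 0.55, 1.00])
lut = build_linearization_lut(codes_measured, irr_measured)
M = np.eye(3)                                  # placeholder colorimetric matrix
pixel = np.array([[10, 128, 200]])             # integer device codes
print(apply_input_calibration(pixel, [lut, lut, lut], M))
```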
In calibrating output devices, there are also several difficult problems:
1. The color matching stability problem The inks or dyes used in a color hard-copy
output device reproduce colors (under a given viewing illuminant) by metameric color
matching to the original color stimuli. The reproduced colors are not matched to those
of the original in terms of spectral composition, but rather in terms of equivalent human
cone spectral responses. When the spectral composition of the viewing illuminant is
changed (as is often the case in consumer imaging), the reproduced colors no longer
match those of the original. Because of this problem, inks and dyes have to be selected
to reflect or transmit somewhat broad spectral bands. This does not solve the problem; it only makes it less severe. As a consequence, the range of reproducible colors is often
quite limited.
2. The color gamut problem The phosphors (in CRT monitors), the color filters (in
LCDs), the inks (in inkjet printers) and the dyes (in photographic papers) are almost
always much less saturated in the colors they can reproduce than the spectrum colors
produced by diffraction gratings or interference. The range (or the volume) of repro-
ducible colors produced by a given output device/medium is called the color gamut
of that device. When an input color cannot be reproduced by the output device, it is
called an out-of-gamut color. Rendering a color image on an output device requires a
systematic algorithm for handling the out-of-gamut colors. Such algorithms are called gamut mapping algorithms.
3. The media-dependency problem Calibration of hard-copy devices is tied to a specific
output medium and how it is processed. For photographic paper, the chemical processes
have to be controlled well to make the calibration work. For the same paper and dye,
different surface finishes also affect color reproduction. A printer calibrated with one
type of paper will not produce good prints on another type of paper.
4. The viewing condition problem There are three types of problem in the viewing of a
hard-copy reproduction. One is caused by the difference in the measurement geometry
and the viewing geometry. For example, most color measurements are performed under
0/45 or 45/0 geometry, but images are rarely viewed with the same type of geometry
because the illumination tends to be more diffuse. The second problem is caused by
the luminance level difference between the illumination of the original scene and that
of the viewing environment. Reflective objects look less colorful when viewed under
lower-luminance lighting. The third problem is that the surround of the image also
affects the color appearance in the reproduced image.
Figure 16.3 shows a general block diagram for output device calibration. Usually but
not always, the first step in the output calibration is to perform signal shaping on the image
data. The reason for this is that the relation between the incoming image data and the input
data expected by the output device can be highly nonlinear. For example, the incoming
image data may be in CIEXYZ, but the printer may be expecting CMYK data. In theory,
we can always use a 3-D LUT to achieve such a nonlinear mapping. In reality, computing
time and memory cost often force us to use a sparse 3-D LUT and some simple linear
interpolation. Any nonlinear relation between the input and the output metrics will cause
large errors in linear interpolation from sparse data [486]. The signal shaping step is to
use 1-D LUTs to transform the input image data into a shape that is more nearly linear
with respect to the output metric. The next step of colorimetric transform is usually done
by a 3-D LUT or a 3 × 3 matrix. The purpose is to convert the image data into the output
device’s color primaries. Once the image data are in the output color space, it is often
necessary to predistort the data to compensate for the nonlinear characteristics of the output
device/medium. Again, the general diagram in Fig. 16.3 represents only conceptual steps
rather than implementational steps. Therefore, blocks can be merged or split in any particular
implementation. We will see examples of output device calibration in Chapter 20 in which
we discuss image display devices. At this point, we should notice that a 1-D LUT, a 3-D
LUT, and a 3 × 3 matrix are also the building blocks for output device calibration.
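The benefit of signal shaping before a sparse lookup table can be demonstrated with a one-dimensional toy example in Python/NumPy (the response function and shaping function here are arbitrary illustrations, not models of any real device):

```python
import numpy as np

g = lambda x: x ** 0.25          # stand-in for a strongly nonlinear device response
x_dense = np.linspace(0.0, 1.0, 1001)
nodes = np.linspace(0.0, 1.0, 5) # a very sparse table, 1-D for clarity

# (a) interpolate g directly from the sparse nodes in x
direct = np.interp(x_dense, nodes, g(nodes))

# (b) first reshape the input with a 1-D LUT s = sqrt(x), then interpolate in s
shape = np.sqrt                  # the 1-D shaping function (analytic here for brevity)
shaped = np.interp(shape(x_dense), shape(nodes), g(nodes))

print("max error, direct interpolation :", np.max(np.abs(direct - g(x_dense))))
print("max error, with signal shaping  :", np.max(np.abs(shaped - g(x_dense))))
```

Interpolating in the shaped variable reduces the worst-case error in this toy case; in practice the shaping LUTs are chosen from measurements of the actual device.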
Let a_c, a_m, and a_y be the amounts of the three dyes, and let C(λ), M(λ), and Y(λ) be the spectral transmission density functions of unit amounts of cyan, magenta, and yellow dye, respectively. The spectral transmittance T(λ) of the combined dye layers can then be computed as

$$
T(\lambda) = 10^{-[a_c C(\lambda) + a_m M(\lambda) + a_y Y(\lambda)]}.
$$
By definition, the spectral transmittance T(λ) is related to the spectral transmission density Dt(λ) by the following relation:

$$
D_t(\lambda) = -\log_{10} T(\lambda).
$$
Now we have to relate the spectral transmission density Dt(λ) to the spectral reflection density Dρ(λ) by an empirical function f(Dt(λ)) that can be determined experimentally. The function f(Dt(λ)) converts the transmission density Dt of the dyes to the reflection density Dρ of the print. (This function is further explained in Section 20.5.5.) We also have to account for the minimum spectral density Dmin(λ) of the paper when no dye is formed. Thus

$$
D_\rho(\lambda) = f(D_t(\lambda)) + D_{\min}(\lambda).
$$
Let Sr, Sg, Sb be the system Status A spectral response functions, and ρr, ρg, ρb be the Status A reflectance factors. By the definition of Status A density, we have the following relations:

$$
\rho_r = \int \rho(\lambda) S_r(\lambda)\,d\lambda, \qquad
\rho_g = \int \rho(\lambda) S_g(\lambda)\,d\lambda, \qquad
\rho_b = \int \rho(\lambda) S_b(\lambda)\,d\lambda,
$$
$$
D_r = -\log_{10}\rho_r, \qquad D_g = -\log_{10}\rho_g, \qquad D_b = -\log_{10}\rho_b,
$$

where ρ(λ) = 10^{−Dρ(λ)} is the spectral reflectance factor of the print.
We would like to construct a 3-D LUT that will map Dr , Dg , and Db to ac , am , and ay . In
order to simplify the above equations to come up with a practical calibration model, we need
to remove all the integrals. From Fig. 15.5, we can observe that the spectral bandwidths
of the Status A response functions are fairly narrow compared with the general spectral
reflectance functions. We can therefore make the following approximations:

$$
\rho_r \approx \rho(\lambda_r) S_r(\lambda_r), \qquad
\rho_g \approx \rho(\lambda_g) S_g(\lambda_g), \qquad
\rho_b \approx \rho(\lambda_b) S_b(\lambda_b),
$$

where λr, λg, and λb are the peak wavelengths of the spectral sensitivity functions Sr, Sg, and Sb, respectively.
Figure 16.4. A model-based lookup table design for mapping Status A densities to dye amounts.
$$
\begin{aligned}
D_r &= -\log \rho_r \\
    &\approx -\log[\rho(\lambda_r) S_r(\lambda_r)] \\
    &= -\log \rho(\lambda_r) - \log S_r(\lambda_r) \\
    &= -\log 10^{-D_\rho(\lambda_r)} - \log S_r(\lambda_r) \\
    &= D_\rho(\lambda_r) - \log S_r(\lambda_r) \\
    &= f(D_t(\lambda_r)) + D_{\min}(\lambda_r) - \log S_r(\lambda_r) \\
    &= f(-\log T(\lambda_r)) + D_{\min}(\lambda_r) - \log S_r(\lambda_r) \\
    &= f(a_c C(\lambda_r) + a_m M(\lambda_r) + a_y Y(\lambda_r)) + D_{\min}(\lambda_r) - \log S_r(\lambda_r).
\end{aligned}
$$
Let us define dr = log Sr (λr ) − Dmin (λr ), dg = log Sg (λg ) − Dmin (λg ), and db =
log Sb (λb ) − Dmin (λb ), then
$$
\begin{pmatrix} a_c \\ a_m \\ a_y \end{pmatrix}
\approx
\begin{pmatrix}
C(\lambda_r) & M(\lambda_r) & Y(\lambda_r) \\
C(\lambda_g) & M(\lambda_g) & Y(\lambda_g) \\
C(\lambda_b) & M(\lambda_b) & Y(\lambda_b)
\end{pmatrix}^{-1}
\begin{pmatrix} f^{-1}(D_r + d_r) \\ f^{-1}(D_g + d_g) \\ f^{-1}(D_b + d_b) \end{pmatrix}. \qquad (16.7)
$$
The above equation tells us that three 1-D LUTs plus a 3 × 3 matrix will essentially transform
the Status A densities to the three dye amounts. Since the model has made some simplifying
assumptions, it cannot be as accurate as required in the application. However, if we follow
the 1-D LUTs and the 3 × 3 matrix with a 3-D LUT, we will be able to get very accurate
results. Since the model has removed most of the nonlinearity between the input variables
and the output variables, the 3-D LUT does not need many grid points and it serves only to
correct local errors caused by the oversimplified device model. Figure 16.4 shows the final
lookup table structure for this calibration problem. It requires some more memory space,
but gives much better results than a simple 3-D LUT can provide.
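A sketch of this calibration structure (without the final 3-D LUT correction stage) in Python/NumPy is shown below. The unit-dye densities and the f curve are synthetic placeholders; in practice they would be measured for the actual dye set and paper:

```python
import numpy as np

# Illustrative unit-dye spectral densities at the three Status A peak
# wavelengths: rows = (lambda_r, lambda_g, lambda_b), columns = (C, M, Y).
dye_matrix = np.array([[1.00, 0.15, 0.05],
                       [0.20, 1.00, 0.10],
                       [0.05, 0.30, 1.00]])
dye_matrix_inv = np.linalg.inv(dye_matrix)

# f maps transmission density to reflection density; here a monotone synthetic
# curve that would normally be determined experimentally.
dt_nodes = np.linspace(0.0, 3.0, 31)
dref_nodes = 1.6 * dt_nodes / (1.0 + 0.2 * dt_nodes)   # placeholder f curve

def f_inverse(d_reflection):
    """1-D LUT implementing f^{-1}: reflection density -> transmission density."""
    return np.interp(d_reflection, dref_nodes, dt_nodes)

def densities_to_dye_amounts(status_densities, offsets):
    """Equation (16.7): three 1-D LUTs (f^{-1}) followed by a 3x3 matrix."""
    d = np.asarray(status_densities, float) + np.asarray(offsets, float)
    return dye_matrix_inv @ f_inverse(d)

print(densities_to_dye_amounts([0.8, 0.9, 0.7], offsets=[0.02, 0.02, 0.02]))
```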
In the above discussion, we have identified several common building blocks for device cal-
ibration. They are: (a) device models (for sensor and display nonlinearity), (b) 3 × 3 matri-
ces, (c) 1-D LUTs, and (d) 3-D LUTs. In order to construct these common building blocks,
we need to study several statistical and computational tools: regression, approximation,
interpolation, and constrained optimization. Regression and approximation are used for
determining parameters for device models. Approximation and interpolation are used for
generating lookup tables and finding the needed unknown values from the known values in
the lookup tables. Constrained optimization is used to determine the optimal 3 × 3 matrix
for color transformation. These tools will be briefly reviewed here. There are many books
and papers on these computational tools and their statistical and mathematical foundations
(see, e.g., [110, 140, 175, 239, 256, 545, 552, 601, 640, 748, 812, 1001]). They should be
consulted for more detailed studies.
16.2.1 Interpolation
Given a set of data points (xi , f (xi )), i = 1, 2, . . . , N , where xi < x j for all i < j, the
problem of interpolation is to determine f (x) for any new x value when x1 < x < x N . The
function f (x) is, of course, unknown, but it is known to pass through the given set of data
points. Data interpolation serves two purposes in color calibration. It is used in populating
a lookup table (off-line) and also in generating pixel values from the lookup table (on-
line). Typically, the number of measured data points is much smaller than the number
of lookup table entries, which is in turn much smaller than the number of representable
colors in a digital image. For example, a typical digital color image has 8 bits per color per
pixel, which can represent 256 × 256 × 256 colors. A typical color lookup table contains
33 × 33 × 33 entries. The number of measured color targets is often 9 × 9 × 9, which is
a lot of colors to measure. From these measured data points, we have to use interpolation
to populate the (33 × 33 × 33)-entry lookup table. When processing an image, the color
of each pixel, one out of the 256 × 256 × 256 colors, has to be calculated by interpolation
from the 33 × 33 × 33 lookup table. The first color interpolation for populating the lookup
table can be performed off-line, but the second color interpolation for each pixel has to
be performed on-line. The off-line interpolation can be sophisticated and slow, but the
on-line interpolation needs to be very, very fast, usually using only the simplest linear
interpolation. When we discuss methods of interpolation, we have to distinguish between
these two different applications.
Univariate interpolation
Univariate interpolation is used in color calibration when each color channel is calibrated
independently or when only the luminance channel is calibrated. In most cases, several data
points are measured and the goal is to interpolate between them to determine the output
response for the unmeasured input. Another application is for generating a tone scale curve
from a given set of points with or without the slope specified.
Data interpolation for functions of a single variable is a well-studied subject [237, 239].
Many of these well-known methods can be used in color calibration. When a single-variate
lookup table is needed in calibration (for tone scale or curve shaping), the table size is usually
small enough (≤4096) that a full lookup table is generated off-line. Therefore, the criteria
for choosing one interpolation method over the others are mostly based on physical models
of the devices, rather than on computational complexity. For example, the luminance as a
function of the digital input value of a CRT monitor has been found to follow a certain power
function very well. There is no reason, then, to interpolate the table by, say, a cubic spline. On the other hand, the reflectance of an output gray patch as a function of the input digital value of a printer does not seem to follow any particular analytical function. Piecewise cubic interpolation can be used to very good effect here.
One of the major concerns in color interpolation is the unpredictable, nonlinear characteristics of an imaging device. It is thus not generally a good idea to use global functions for interpolation. For example, Lagrange interpolation with all the data points included is a very bad choice, because every data point exerts its influence on the entire curve. A preferred method uses only a few neighboring data points for interpolation. In this sense, piecewise cubic interpolation is better than the cubic spline, because the former only uses two neighboring data points on each side of the interval being interpolated, whereas the cubic spline involves all the data points.

Figure 16.5. Using four data points (xi−1, yi−1), (xi, yi), (xi+1, yi+1), and (xi+2, yi+2) to determine a cubic polynomial for interpolating the interval [xi, xi+1].
For the two intervals at the ends, the first four data points and the last four data points can be
used. This type of piecewise polynomial interpolation is continuous, but the derivatives at the
data points often are not continuous or do not exist. This is not necessarily a concern because
there are situations in which the devices to be calibrated may not have sufficiently smooth
response functions. Piecewise cubic interpolation is often used in scanner calibration where
several neutral (gray) patches of known reflectance are scanned and the complete curves
that relate scanner code values to input reflectances are constructed through piecewise cubic
interpolation [422]. If the scanner codes are averaged over many pixels in each region of
the reflectance target, the resulting curves are often smooth enough for scanner calibration.
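A small Python/NumPy sketch of this four-point piecewise cubic scheme (the data are illustrative; a production implementation would add input validation):

```python
import numpy as np

def piecewise_cubic(x_new, x, y):
    """Piecewise cubic interpolation: each interval [x[i], x[i+1]] uses the
    cubic through the four points i-1, i, i+1, i+2; the first/last four
    points are reused for the two end intervals.  x must be strictly increasing."""
    x = np.asarray(x, float); y = np.asarray(y, float)
    x_new = np.atleast_1d(np.asarray(x_new, float))
    out = np.empty_like(x_new)
    for k, xv in enumerate(x_new):
        i = int(np.clip(np.searchsorted(x, xv) - 1, 0, len(x) - 2))   # interval index
        j0 = int(np.clip(i - 1, 0, len(x) - 4))                       # first of the 4 points
        c = np.polyfit(x[j0:j0 + 4], y[j0:j0 + 4], deg=3)             # local cubic
        out[k] = np.polyval(c, xv)
    return out

# Example: reconstructing a scanner response curve from a few gray patches.
codes = np.array([10.0, 40.0, 80.0, 130.0, 180.0, 230.0])   # illustrative
refl  = np.array([0.02, 0.06, 0.15, 0.33, 0.60, 0.92])
print(piecewise_cubic([100.0, 200.0], codes, refl))
```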
Spline interpolation
It is possible to find a piecewise polynomial function for a set of data points such that some
of its derivatives are continuous throughout the entire interval covering all the data points.
One such example is the spline function of order k, defined as a piecewise polynomial
function of degree k − 1 on every interval between two consecutive data points with k − 2
continuous derivatives at the data points.
Definition Given a strictly increasing sequence of points (called knots) x0 < x1 < . . . <
x N −1 , where x0 = a and x N −1 = b, a function f (x), defined on the finite interval [a, b], is
called a spline function of order k > 0, if: (1) on each interval, [xi, xi+1], i = 0, . . . , N − 2,
f (x) is a polynomial of, at most, degree k − 1, and (2) f (x) and its derivatives, up to order
k − 2, are continuous on [a, b].
A spline function of order 1 is a piecewise constant function, with each interval given
by a zeroth degree polynomial (constant). The function itself is not continuous. A spline
function of order 2 is a piecewise linear function (linear interpolation) with each interval
given by a first degree polynomial. The function is continuous. Although linear interpolation
does not produce a smooth curve, it is very fast and often sufficiently accurate. When a
large image has to be mapped through a sparsely populated calibration table (especially
three- or four-dimensional tables), linear interpolation is often used. The best-known spline
function is the spline function of order 4, with each interval given by a third-degree (cubic)
polynomial. It is called the cubic spline. It has continuous first and second derivatives over
the entire data interval [a, b]. It can be shown [545, Theorem 3.7.2] that the family of
cubic spline functions over a given set of N knots is a vector space of dimension N + 2.
Therefore, one can find N + 2 basis functions that will span the vector space of cubic spline
functions. One particularly useful basis function is the B-spline function (B here means
basis), which has a finite support and a bell shape similar to a Gaussian function. If we have
one B-spline function at each knot and two more at two artificially introduced knots, we have
enough basis functions to span the entire vector space. B-spline functions are useful tools in
digital image processing applications, such as image interpolation and edge-detector-based
wavelet decomposition.
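If SciPy is available, a C²-continuous cubic spline can be used to populate a full one-dimensional calibration table; the tone-scale data below are illustrative only:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative tone-scale data: input code value vs. measured luminance.
code = np.array([0.0, 32.0, 64.0, 96.0, 128.0, 160.0, 192.0, 224.0, 255.0])
lum  = np.array([0.4, 1.1, 3.0, 6.8, 13.0, 22.5, 35.8, 53.6, 76.0])

spline = CubicSpline(code, lum)        # an order-4 spline: C2-continuous cubics
lut = spline(np.arange(256))           # populate a full 256-entry 1-D LUT off-line

print(lut[100], spline(100.0, 1))      # interpolated value and first derivative
```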
Multivariate interpolation
Interpolation of a multivariate function is much less well studied [175]. Color interpolation
can be seen as a special case in multivariate interpolation. If the measured data are on a regular grid, the interpolation is considerably simpler than for scattered data.
Once a set of points has been selected to interpolate the point of interest, we need a data
interpolation model. The commonly used data interpolation models are:
• Polynomials: depending on the number of points selected by the point selection strategy,
we can fit a multivariate polynomial to the selected points and use it to interpolate. For
example, if four noncoplanar points are selected in a three-dimensional color space,
a first order polynomial f (x, y, z) = ax + by + cz + d can be used to interpolate a
point. The very useful tetrahedral interpolation algorithm uses such a data model.
• Distance-based: the interpolated value is a weighted linear combination of the values
of the selected points. The weight assigned to each selected point is a function of its
distance to the point to be interpolated. This is a very general class of methods for
multivariate interpolation for scattered data [355, 860]. However, it is not widely used
in color interpolation, partly because there are other fast and accurate methods, and
partly because its underlying physical meaning does not match well how most color
imaging devices behave.
• Volume-based: in this data model, the volume of the convex hull of the selected k
points is partitioned into k segments, each of which uses the point of interest as one
of the vertices. The weight assigned to each of the selected points is proportional to
the volume of the segment opposite to it. Figure 16.6 shows an example of such a data
model. In fact, it can be shown that such a data model is equivalent to the one used in the
popular trilinear interpolation, in which each axis is linearly interpolated in sequence.
(In the figure, v = v1 + v2 + · · · + v8 and the interpolated value is h = h1·v1/v + h2·v2/v + · · · + h8·v8/v.)
Figure 16.6. Example of a volume-based data interpolation model. The volume labeled v1 is the
rectangular volume that has the vertex h 7 and the point of interest, h, as the diagonal vertices. The
weight assigned to vertex 1 is v1 divided by the total volume v.
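A direct Python/NumPy sketch of trilinear interpolation in a 3-D LUT, equivalent to the volume-based weighting of Fig. 16.6 (the identity LUT used in the example is only a sanity check):

```python
import numpy as np

def trilinear(lut, grid, point):
    """Trilinear interpolation in a 3-D LUT.
    lut  : array of shape (N, N, N, 3) holding output values at the grid nodes
    grid : 1-D array of the N node positions (shared by all three axes)
    point: (3,) input coordinates"""
    idx = np.clip(np.searchsorted(grid, point) - 1, 0, len(grid) - 2)
    x0, x1 = grid[idx], grid[idx + 1]
    t = (np.asarray(point, float) - x0) / (x1 - x0)    # fractional position in the cell
    out = 0.0
    for corner in np.ndindex(2, 2, 2):                 # 8 corners of the cell
        w = np.prod([t[k] if corner[k] else 1.0 - t[k] for k in range(3)])
        out = out + w * lut[idx[0] + corner[0], idx[1] + corner[1], idx[2] + corner[2]]
    return out

# Tiny example: a 5x5x5 LUT that happens to store the identity mapping.
grid = np.linspace(0.0, 1.0, 5)
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
print(trilinear(lut, grid, np.array([0.30, 0.55, 0.80])))   # ~ [0.30, 0.55, 0.80]
```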
There are several practical considerations in color interpolation that impose constraints
on the choices of the methods of multivariate interpolation:
Figure 16.7. Example of neighboring polyhedra that do not share a full face. The diagram shows two neighboring cubes drawn separately for a better view, i.e., A and A′ are the same point, and so are B and B′, C and C′, and D and D′. The left-hand cube is partitioned into two prisms and so is the right-hand cube, but the ways they are partitioned are different. As a result, none of the four prisms shares a full face with another.
All the multivariate interpolation methods discussed so far require fair amounts of
memory space for lookup table storage and computing power for on-line calculation.
One natural question to ask is whether it is possible to combine several 1-D LUTs to
do the job of a 3-D LUT. For example, f (x, y, z) = x yz can be represented as f (x, y, z) =
exp[ln x + ln y + ln z]. Thus, it is possible to use three 1-D LUTs to transform x, y, and
z into ln x, ln y, and ln z. We then add the outputs of the three 1-D LUTs and use the
resulting value (properly quantized) as an index to a fourth 1-D LUT to perform the ex-
ponential function. Although this example seems to be quite contrived, one amazing result
in approximation theory is that it is possible to represent a continuous function of several
variables by superpositions and additions of functions of one variable. This was proved by
Kolmogorov in 1957. The result can be stated by the following Theorem of Kolmogorov
[601, p. 168]:

Theorem (Kolmogorov) Any continuous function f(x1, x2, . . . , xn) of n variables defined on [0, 1]^n can be represented in the form

$$
f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{qp}(x_p) \right),
$$

where Φ_q and φ_{qp} are continuous functions of a single variable.
A related idea is to approximate a sampled function of two variables by a sum of separable products of one-variable functions, for example through the singular value decomposition of a matrix. However, this may take many components to produce a good approximation.
For tetrahedral interpolation, let the tetrahedron have vertices P0, P1, P2, and P3, with known coordinates (Xi, Yi, Zi) in one color space and (ri, gi, bi) in the other, i = 0, 1, 2, 3. A point P can be expressed in terms of the edge vectors from P0:

$$
\overrightarrow{P_0 P} = \alpha\,\overrightarrow{P_0 P_1} + \beta\,\overrightarrow{P_0 P_2} + \gamma\,\overrightarrow{P_0 P_3},
$$

i.e.,

$$
\begin{pmatrix} X_P - X_0 \\ Y_P - Y_0 \\ Z_P - Z_0 \end{pmatrix}
=
\begin{pmatrix}
X_1 - X_0 & X_2 - X_0 & X_3 - X_0 \\
Y_1 - Y_0 & Y_2 - Y_0 & Y_3 - Y_0 \\
Z_1 - Z_0 & Z_2 - Z_0 & Z_3 - Z_0
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}.
$$
Test for convex inclusion   Point P is inside the tetrahedron if and only if α ≥ 0, β ≥ 0, γ ≥ 0, and α + β + γ ≤ 1, where

$$
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
=
\begin{pmatrix}
X_1 - X_0 & X_2 - X_0 & X_3 - X_0 \\
Y_1 - Y_0 & Y_2 - Y_0 & Y_3 - Y_0 \\
Z_1 - Z_0 & Z_2 - Z_0 & Z_3 - Z_0
\end{pmatrix}^{-1}
\begin{pmatrix} X_P - X_0 \\ Y_P - Y_0 \\ Z_P - Z_0 \end{pmatrix}. \qquad (16.10)
$$
What are the (r, g, b) coordinates of an interior point P?   When P is inside the tetrahedron, its (r, g, b) coordinates can be determined from those of the four vertices by

$$
\begin{pmatrix} r_P \\ g_P \\ b_P \end{pmatrix}
=
\begin{pmatrix}
r_1 - r_0 & r_2 - r_0 & r_3 - r_0 \\
g_1 - g_0 & g_2 - g_0 & g_3 - g_0 \\
b_1 - b_0 & b_2 - b_0 & b_3 - b_0
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
+
\begin{pmatrix} r_0 \\ g_0 \\ b_0 \end{pmatrix}. \qquad (16.11)
$$
In Fig. 16.10, when P0 and P4 are on opposite sides of the plane P1 P2 P3 , and the vector
P0 P4 intersects with the triangular face P1 P2 P3 at a point P, what are the (X, Y, Z ) and
(r, g, b) coordinates of P?
$$
\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix}
=
\begin{pmatrix}
X_1 - X_0 & X_2 - X_0 & X_3 - X_0 \\
Y_1 - Y_0 & Y_2 - Y_0 & Y_3 - Y_0 \\
Z_1 - Z_0 & Z_2 - Z_0 & Z_3 - Z_0
\end{pmatrix}^{-1}
\begin{pmatrix} X_4 - X_0 \\ Y_4 - Y_0 \\ Z_4 - Z_0 \end{pmatrix}. \qquad (16.12)
$$
The conditions for P0 and P4 to be on opposite sides of the plane P1 P2 P3 and for the vector
P0 P4 to intersect the triangular face P1 P2 P3 are: α ≥ 0, β ≥ 0, γ ≥ 0, and α + β + γ ≥ 1.
In this case, the intersection point P can be determined by normalizing α, β, and γ with
their sum so that their normalized values sum to 1.
$$
\begin{pmatrix} X_P \\ Y_P \\ Z_P \end{pmatrix}
=
\begin{pmatrix}
X_1 - X_0 & X_2 - X_0 & X_3 - X_0 \\
Y_1 - Y_0 & Y_2 - Y_0 & Y_3 - Y_0 \\
Z_1 - Z_0 & Z_2 - Z_0 & Z_3 - Z_0
\end{pmatrix}
\begin{pmatrix} \alpha/(\alpha+\beta+\gamma) \\ \beta/(\alpha+\beta+\gamma) \\ \gamma/(\alpha+\beta+\gamma) \end{pmatrix}
+
\begin{pmatrix} X_0 \\ Y_0 \\ Z_0 \end{pmatrix} \qquad (16.13)
$$
and
$$
\begin{pmatrix} r_P \\ g_P \\ b_P \end{pmatrix}
=
\begin{pmatrix}
r_1 - r_0 & r_2 - r_0 & r_3 - r_0 \\
g_1 - g_0 & g_2 - g_0 & g_3 - g_0 \\
b_1 - b_0 & b_2 - b_0 & b_3 - b_0
\end{pmatrix}
\begin{pmatrix} \alpha/(\alpha+\beta+\gamma) \\ \beta/(\alpha+\beta+\gamma) \\ \gamma/(\alpha+\beta+\gamma) \end{pmatrix}
+
\begin{pmatrix} r_0 \\ g_0 \\ b_0 \end{pmatrix}. \qquad (16.14)
$$
Linear regression
Given a set of data points (xi, yi), i = 1, 2, . . . , n, and a device model y = f(x) = Σ_{j=1}^{m} u_j φ_j(x), we would like to find the coefficients u_j, j = 1, 2, . . . , m, such that the
total error is minimized. This problem is a linear regression problem because f (x) is a
linear function of the unknowns u j , j = 1, 2, . . . , m, although the functions φ j (x) in the
expansion of f (x) may be nonlinear. For example, a cubic spline can also be cast into linear
regression [960]. The most frequently used regression criterion is to minimize the sum of the squared errors, Σ_i [y_i − f(x_i)]². Setting the partial derivative of this sum with respect to each coefficient u_k equal to zero gives a set of m linear equations in the m unknown coefficients.
The above equations are called the normal equations and they can be written in matrix form. Let Φ be an m × m matrix whose elements are Φ_{kj} = Σ_{i=1}^{n} φ_k(x_i)φ_j(x_i), and let u and b be m × 1 vectors, where the jth component of u is u_j and the kth component of b is b_k = Σ_{i=1}^{n} y_i φ_k(x_i); then Φu = b, i.e.,

$$
\begin{pmatrix}
\sum_{i=1}^{n}\phi_1(x_i)\phi_1(x_i) & \sum_{i=1}^{n}\phi_1(x_i)\phi_2(x_i) & \cdots & \sum_{i=1}^{n}\phi_1(x_i)\phi_m(x_i) \\
\sum_{i=1}^{n}\phi_2(x_i)\phi_1(x_i) & \sum_{i=1}^{n}\phi_2(x_i)\phi_2(x_i) & \cdots & \sum_{i=1}^{n}\phi_2(x_i)\phi_m(x_i) \\
\vdots & \vdots & & \vdots \\
\sum_{i=1}^{n}\phi_m(x_i)\phi_1(x_i) & \sum_{i=1}^{n}\phi_m(x_i)\phi_2(x_i) & \cdots & \sum_{i=1}^{n}\phi_m(x_i)\phi_m(x_i)
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{n} y_i\phi_1(x_i) \\
\sum_{i=1}^{n} y_i\phi_2(x_i) \\
\vdots \\
\sum_{i=1}^{n} y_i\phi_m(x_i)
\end{pmatrix}. \qquad (16.15)
$$
If the matrix Φ is not singular, the coefficient vector u = Φ^{-1}b. However, the matrix Φ is often very ill-conditioned (a small perturbation in the data leads to a large change
in the solution [960, Lecture 12]) and matrix inversion tends to be numerically unstable.
Therefore, the normal equations are usually solved by the singular value decomposition.
Before we examine a more general case of the least-squares problem, let us introduce the
singular value decomposition and the Moore–Penrose inverse.
Singular value decomposition theorem [625, p. 18]: Let A be a real m × n matrix with rank r. There exist an m × r matrix U such that U^T U = I_r, an n × r matrix V such that V^T V = I_r, and an r × r diagonal matrix Σ with positive diagonal elements, such that A = U Σ V^T, where I_r is an r × r identity matrix.
(Note that this theorem is stated in a form such that the r × r diagonal matrix Σ is invertible. This property will be used to find the Moore–Penrose inverse. In most singular value decomposition algorithms, the diagonal matrix Σ is computed as an m × n matrix. In that case, very small singular values have to be discarded and the matrices put into the form stated here before the Moore–Penrose inverse is computed as A^+ = V Σ^{-1} U^T. In numerical computation, the rank r of matrix A is defined by the smallest singular value that we decide to keep.)
The n × m matrix A, whose elements are A_{ij} = φ_j(x_i), is the “design” matrix. The regression model y ≈ Au means that the dependent (observation)
variable, y, is a linear combination of the variables φ1 (x), φ2 (x), . . . , φm (x), i.e., y =
u 1 φ1 (x) + u 2 φ2 (x) + · · · + u m φm (x), and the coefficients are given by the vector u which is
to be determined by the method of least squares. It can be proved that (y − Au)T (y − Au) ≥
yT (I − A A+ )y and the least squares solution for u is given by u = A+ y + (I − A+ A)c,
where I is the identity matrix, and c is an arbitrary constant n × 1 vector [625, p. 232].
In most applications, A has a full column rank, then A+ = (AT A)−1 AT and A+ A = I .
Therefore, u = A^+ y = (A^T A)^{-1} A^T y. (Note that A^T y = b, A^T A = Φ, and u = Φ^{-1}b.)
A more generalized least squares problem is to allow different weighting factors for the
error components, and the weighted square errors become (y − Au)T W (y − Au), where
the weight matrix W is assumed to be a symmetric, positive semidefinite n × n matrix
(i.e., uT W u ≥ 0 for all u). It can be proved that (y − Au)T W (y − Au) ≥ yT (W −
W A(AT W A)+ AT W )y and the weighted least squares solution is given by u =
(AT W A)+ AT W y + (I − (AT W A)+ AT W A)c, where c is an arbitrary constant n × 1 vector.
Again, when AT W A has a full column rank, u = (AT W A)−1 AT W y.
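In practice, the least-squares and weighted least-squares solutions are computed with standard numerical routines (which use the SVD or an equivalent factorization) rather than by explicitly inverting the normal equations. A Python/NumPy sketch with synthetic data:

```python
import numpy as np

# Illustrative basis functions phi_j(x) and noisy observations of a model
# y = u1*phi1 + u2*phi2 + u3*phi3; real data would come from device measurements.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
A = np.column_stack([np.ones_like(x), x, x**2])          # design matrix, A[i, j] = phi_j(x_i)
u_true = np.array([0.2, 1.5, -0.8])
y = A @ u_true + rng.normal(0.0, 0.02, x.size)

# Ordinary least squares via lstsq (SVD-based, better conditioned than
# forming and inverting A^T A explicitly).
u_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# Weighted least squares, u = (A^T W A)^{-1} A^T W y, with a diagonal W.
w = np.ones_like(x); w[:5] = 10.0                        # weight the dark end more heavily
W = np.diag(w)
u_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

print(u_ls, u_wls)
```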
Nonlinear regression
Sometimes, the nonlinear models can be reformulated into linear ones. For example, y =
ax^b can be changed to log y = log a + b log x. Although linear regression problems are
relatively easy to solve, not all models can be cast in that way. For example, the CRT
monitor luminance as a function of input code value is well described by a power function:
L = a(v + b)^γ. The parameters a, b, and γ cannot be cast into linear coefficients of
any chosen function. Standard nonlinear optimization algorithms, such as the Levenberg–
Marquardt method [777], can be used. However, as in any nonlinear optimization problem,
we have to use a good initial guess and avoid local minima in the optimization process. The
other problem is that the valley near the global minimum of the cost function may be very
shallow and the algorithm can stop prematurely before it reaches the true minimum. In the
case of monitor calibration, most monitors have a gamma value within a certain range, say
2.0 < γ < 2.5, and it is helpful to include a penalty function to prevent the algorithm from
being trapped at a gamma value far from 2.2.
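A minimal sketch of such a fit, using scipy.optimize.least_squares on synthetic monitor data; the data, the penalty weight, and the prior value of 2.2 for gamma are illustrative assumptions, not values from the text:

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic CRT measurements following L = a*(v + b)**gamma (hypothetical values).
v = np.linspace(0.0, 1.0, 32)                       # normalized input code value
a_true, b_true, g_true = 80.0, 0.02, 2.3
L = a_true * (v + b_true) ** g_true
L *= 1.0 + 0.01 * np.random.default_rng(1).standard_normal(v.size)

def residuals(p, penalty_weight=5.0):
    a, b, gamma = p
    r = a * (v + b) ** gamma - L                    # fitting error
    # Soft penalty discouraging gamma from wandering far from 2.2.
    return np.append(r, penalty_weight * (gamma - 2.2))

fit = least_squares(residuals, x0=[L.max(), 0.05, 2.2],
                    bounds=([0.0, 0.0, 1.5], [np.inf, 0.5, 3.0]))
print(fit.x)                                        # estimated a, b, gamma
```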
Robust regression
The most widely used cost function in regression is the sum of squared errors. The popular least-squares regression is mathematically easy to solve and, in the case of Gaussian noise, it is
also the maximum likelihood solution. However, it is quite sensitive to outliers. A single
outlier in the data may throw the estimates off so far as to make the results useless. The basic
problem is that the cost function is proportional to the square of the distance, and therefore,
is heavily biased by the outliers. A family of new cost functions has been proposed to reduce
the impact of outliers by discounting large deviations. Regression using this type of cost
function is called robust regression. The least median of squares (LMS) is found to have
many good characteristics. It can tolerate up to almost 50% bad data, because the median
is not much affected [812].
However, there is no efficient way to compute the exact LMS solution. Current algorithms run regressions on many randomly sampled subsets of the data. If enough different subsets are tried, at least one of them will contain no outliers, and the fit from that clean subset approximates the LMS solution. In order to make sure that at least one such subset is included with high probability, a sufficiently large number of random samplings has to be tried. That is why the LMS takes much more time to compute. However, for off-line interpolation, the robustness may be worth the effort.
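The following Python sketch shows the idea for a straight-line fit (my own simplified implementation; the number of random trials and the data are hypothetical, and a practical implementation would choose the trial count from the expected fraction of outliers):

```python
import numpy as np

def lms_line(x, y, n_trials=500, seed=2):
    """Fit y = a*x + b by least median of squares using random two-point samples."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        i, j = rng.choice(x.size, size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        med = np.median((y - (a * x + b)) ** 2)     # median of squared residuals
        if best is None or med < best[0]:
            best = (med, a, b)
    return best[1], best[2]

# Hypothetical data: a straight line with 20% gross outliers.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
y = 1.5 * x + 2.0 + 0.1 * rng.standard_normal(x.size)
y[::5] += 20.0                                      # corrupt every fifth point
print(lms_line(x, y))                               # roughly recovers slope 1.5, intercept 2.0
```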
For the colorimetric calibration of a digital camera with a 3 × 3 matrix, we will assume that the desired output of the digital camera is in CIEXYZ space. Therefore, the
function of the 3 × 3 matrix is to transform the camera (R, G, B) values into CIEXYZ
tristimulus values. We can use the camera to take an image of a color calibration target,
such as the Macbeth ColorChecker, which has 24 color patches. Since the CIEXYZ values
of the color calibration target are known for a given illuminant, we can determine the 3 × 3
matrix that minimizes the total errors between the transformed and the measured CIEXYZ
values. Alternatively, we can minimize the CIELAB color differences. However, if the matrix is determined in this way without any constraint, it is very likely that a neutral object in the (R, G, B) image (say, after color balance, R = G = B) will not be reproduced as neutral. When an image is rendered with a slight hue cast in neutral areas (such as gray or white objects), the error is quite objectionable. Since preservation of neutrals is one of the
most important requirements in color reproduction, a matrix that does not preserve neutrals
is usually not an acceptable solution. Therefore, in the minimization of color errors, the
matrix elements should be constrained in such a way that neutrals are preserved. The other
frequently used constraint is the explicit constraint on skin color reproduction.
General constrained optimization problems have been well studied (see, e.g., [605, 668]).
Algorithms appropriate to the problem at hand should be chosen for each application. Here
we will present one simple method that allows us to impose linear equality constraints.
This is called the method of Lagrange multipliers [108]. We will use it to solve the color
transformation matrix problem under the constraint that neutrals be preserved. First let us
show that the neutral constraint can be formulated as equality constraints. Under a given
illuminant with chromaticity coordinates (xc , yc ), all neutral objects should have the same
chromaticity as that of the illuminant. Let M be the 3 × 3 color transformation matrix that
maps camera (R, G, B) to CIE (X , Y , Z ):
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= M \begin{bmatrix} R \\ G \\ B \end{bmatrix}
= \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}. \qquad (16.17)
\]
We will assume that the camera RGB image has been properly color balanced and R = G =
B for neutral colors. A neutral object with R = G = B = c is transformed to (X , Y , Z ),
where
\[
X = c(m_{11} + m_{12} + m_{13}), \qquad
Y = c(m_{21} + m_{22} + m_{23}), \qquad
Z = c(m_{31} + m_{32} + m_{33}).
\]
To preserve the neutral colors, the chromaticity coordinates should be equal to those of the
illuminant, i.e.,
\[
x = \frac{X}{X+Y+Z} = x_c, \qquad
y = \frac{Y}{X+Y+Z} = y_c, \qquad
z = \frac{Z}{X+Y+Z} = z_c.
\]
Taking ratios of these equations to eliminate c and (X + Y + Z), we obtain
\[
y_c(m_{11} + m_{12} + m_{13}) = x_c(m_{21} + m_{22} + m_{23}), \qquad (16.18)
\]
\[
y_c(m_{31} + m_{32} + m_{33}) = z_c(m_{21} + m_{22} + m_{23}). \qquad (16.19)
\]
Furthermore, the scale of the transformation is specified by conventions. For example, when R = G = B = 255 (maximum input value), we want the output luminance Y to be set at 255 (maximum output value) as well. In that case, m_{21} + m_{22} + m_{23} = 1. Therefore,
\[
m_{11} + m_{12} + m_{13} = x_c/y_c, \qquad (16.20)
\]
\[
m_{21} + m_{22} + m_{23} = 1, \qquad (16.21)
\]
\[
m_{31} + m_{32} + m_{33} = z_c/y_c. \qquad (16.22)
\]
The general problem can be stated as follows: Given a set of corresponding color val-
ues: Ui = (Ri , G i , Bi )T , Vi = (X i , Yi , Z i )T , i = 1, 2, . . . , n, find a 3 × 3 color transfor-
mation matrix, M, so that the color errors are minimized in CIELAB space, under the
constraint that the neutral colors have to be preserved. To a first-order approximation, the error V_i − MU_i = [ΔX_i, ΔY_i, ΔZ_i]^T is related to the error [ΔL*_i, Δa*_i, Δb*_i]^T by:
\[
\begin{bmatrix} \Delta L^*_i \\ \Delta a^*_i \\ \Delta b^*_i \end{bmatrix}
= \begin{bmatrix}
\partial L^*/\partial X & \partial L^*/\partial Y & \partial L^*/\partial Z \\
\partial a^*/\partial X & \partial a^*/\partial Y & \partial a^*/\partial Z \\
\partial b^*/\partial X & \partial b^*/\partial Y & \partial b^*/\partial Z
\end{bmatrix}
\begin{bmatrix} \Delta X_i \\ \Delta Y_i \\ \Delta Z_i \end{bmatrix}
= J_i \begin{bmatrix} \Delta X_i \\ \Delta Y_i \\ \Delta Z_i \end{bmatrix}
= J_i (V_i - MU_i). \qquad (16.23)
\]
We want to find the matrix M that minimizes the following cost function:
\[
\chi^2 = \sum_{i=1}^{n} [J_i(V_i - MU_i)]^T [J_i(V_i - MU_i)]
- 2\sum_{j=1}^{3} \lambda_j (m_{j1} + m_{j2} + m_{j3} - c_j), \qquad (16.24)
\]
where the λ_j are the Lagrange multipliers and, from Eqs. (16.20)–(16.22), c_1 = x_c/y_c, c_2 = 1, and c_3 = z_c/y_c.
Setting all the J_i matrices to the identity matrix, let us reformulate the cost function of Eq. (16.24) by adding an additional cost term, k Σ_i Σ_j m_{ij}^2, which penalizes large matrix elements because they often amplify noise. Therefore, we have
\[
\chi^2 = \sum_{i=1}^{n} (V_i - MU_i)^T (V_i - MU_i)
- 2\sum_{j=1}^{3} \lambda_j (m_{j1} + m_{j2} + m_{j3} - c_j)
+ k \sum_{i=1}^{3} \sum_{j=1}^{3} m_{ij}^2 \qquad (16.26)
\]
and in order to find the optimal M, we set the partial derivatives to zero:
\[
\frac{\partial \chi^2}{\partial M}
= -2\sum_i V_i U_i^T + 2M\sum_i U_i U_i^T
- 2\begin{bmatrix} \lambda_1 & \lambda_1 & \lambda_1 \\ \lambda_2 & \lambda_2 & \lambda_2 \\ \lambda_3 & \lambda_3 & \lambda_3 \end{bmatrix}
+ 2kM = 0. \qquad (16.27)
\]
Let P = Σ_i V_i U_i^T, S = Σ_i U_i U_i^T, Q = (S + kI)^{-1}, and
\[
[\lambda] = \begin{bmatrix} \lambda_1 & \lambda_1 & \lambda_1 \\ \lambda_2 & \lambda_2 & \lambda_2 \\ \lambda_3 & \lambda_3 & \lambda_3 \end{bmatrix}. \qquad (16.28)
\]
We can now solve for [λ] in terms of the known data. Let
\[
N = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad
C = \begin{bmatrix} c_1 & c_1 & c_1 \\ c_2 & c_2 & c_2 \\ c_3 & c_3 & c_3 \end{bmatrix}, \qquad
W = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad (16.31)
\]
and therefore [λ] = (C − PQW)/(N^T Q N). Substituting [λ] into Eq. (16.30), we have
\[
M_0 = PQ + \frac{(C - PQW)Q}{N^T Q N}. \qquad (16.33)
\]
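A sketch of this closed-form solution in Python (with J_i = I); the camera data, the illuminant chromaticity, and the value of k are hypothetical:

```python
import numpy as np

def neutral_constrained_matrix(U, V, xc, yc, k=0.01):
    """M0 = P Q + (C - P Q W) Q / (N^T Q N), Eq. (16.33), with J_i = I.

    U: n x 3 camera (R, G, B) values; V: n x 3 measured CIEXYZ values;
    (xc, yc): illuminant chromaticity; k: penalty on large matrix elements."""
    zc = 1.0 - xc - yc
    c = np.array([xc / yc, 1.0, zc / yc])        # constraint targets, Eqs. (16.20)-(16.22)
    P = V.T @ U                                  # sum_i V_i U_i^T
    S = U.T @ U                                  # sum_i U_i U_i^T
    Q = np.linalg.inv(S + k * np.eye(3))
    N = np.ones((3, 1))
    C = np.tile(c[:, None], (1, 3))              # rows of constant c_j
    W = np.ones((3, 3))
    return P @ Q + (C - P @ Q @ W) @ Q / (N.T @ Q @ N).item()

# Hypothetical color-balanced camera data for a few patches.
U = np.array([[0.8, 0.8, 0.8], [0.6, 0.3, 0.2], [0.2, 0.5, 0.7], [0.4, 0.4, 0.1]])
V = np.array([[76.0, 80.0, 85.0], [35.0, 28.0, 12.0],
              [25.0, 38.0, 70.0], [30.0, 35.0, 10.0]])
M0 = neutral_constrained_matrix(U, V, xc=0.3127, yc=0.3290)
print(M0.sum(axis=1))   # row sums equal (xc/yc, 1, zc/yc): neutrals are preserved
```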
When the J_i matrices are not set to the identity, the problem can be rewritten as a constrained, weighted least-squares problem by stacking the elements of M into a single vector, u = [m_{11}, m_{12}, m_{13}, m_{21}, m_{22}, m_{23}, m_{31}, m_{32}, m_{33}]^T. Let A be a 3n × 9 matrix:
\[
A = \begin{bmatrix}
R_1 & G_1 & B_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & R_1 & G_1 & B_1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & R_1 & G_1 & B_1 \\
R_2 & G_2 & B_2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & R_2 & G_2 & B_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & R_2 & G_2 & B_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
R_n & G_n & B_n & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & R_n & G_n & B_n & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & R_n & G_n & B_n
\end{bmatrix}, \qquad (16.34)
\]
y be a 3n × 1 vector: y = [X_1, Y_1, Z_1, X_2, Y_2, Z_2, ..., X_n, Y_n, Z_n]^T; F be a 3 × 9 matrix:
\[
F = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}; \qquad (16.35)
\]
c be a 3 × 1 vector: c = [c_1, c_2, c_3]^T; and J be a 3n × 3n block-diagonal matrix with the 3 × 3 Jacobian matrices J_1, J_2, ..., J_n of Eq. (16.23) along its diagonal:
\[
J = \begin{bmatrix}
J_1 & 0 & \cdots & 0 \\
0 & J_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & J_n
\end{bmatrix}.
\]
Let B = J^T J; then the constrained minimization problem becomes: minimize (y − Au)^T B(y − Au) subject to Fu = c. The solution is given by [625, pp. 233–4]: u = u_0 + H^+ F^T (F H^+ F^T)^+ (c − F u_0) + (I − H^+ H)q, where H = A^T B A + F^T F, u_0 = H^+ A^T B y, and q is an arbitrary constant 9 × 1 vector. Very often in practical applications, H^+ H = I and the solution for u is unique.
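A minimal sketch of this general solution (dropping the arbitrary (I − H^+H)q term, i.e., taking q = 0); in practice A, y, and J would be assembled from the calibration patches as in Eqs. (16.34) and (16.35), and B = J^T J:

```python
import numpy as np

def constrained_weighted_ls(A, B, y, F, c):
    """Minimize (y - A u)^T B (y - A u) subject to F u = c, using
    u = u0 + H^+ F^T (F H^+ F^T)^+ (c - F u0),
    where H = A^T B A + F^T F and u0 = H^+ A^T B y."""
    H = A.T @ B @ A + F.T @ F
    Hp = np.linalg.pinv(H)
    u0 = Hp @ A.T @ B @ y
    return u0 + Hp @ F.T @ np.linalg.pinv(F @ Hp @ F.T) @ (c - F @ u0)
```

When H^+H = I, the dropped q term of the full solution vanishes anyway, so this sketch then returns the unique minimizer.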
16.3 Spatial calibration

In addition to the colorimetric calibration discussed in the previous section, color imaging
devices, such as cameras, scanners, CRT displays, and printers, also require spatial cali-
bration to characterize their performance in reproducing spatial patterns. Resolution and
geometry calibrations are two of the most frequently performed procedures. The former is
aimed at measuring how a device reproduces spatial details, and the latter how a device
reproduces geometric shapes. The resolution of an imaging device is often defined in sev-
eral ways, each of which may measure somewhat different characteristics of the system.
For example, in one definition, the resolution of an imaging device is the smallest distance
between two dots that can be visually resolved. In another definition, two lines are used. In
yet another definition, the bandwidth of the system transfer function is used to represent its
resolution. All three of these definitions attempt to characterize the system with a single number.
Here, by resolution calibration, we will mean the measurement of the spatial frequency
response function or the MTF of the system. In the case of a nonlinear device, international
standards often specify the operating point at which the response function is to be measured, using a low-contrast input signal. Geometric calibration is often more complicated than resolution
calibration, especially for image capture devices [331, 449, 590, 966, 1009, 1039]. It is
actually an important part of several engineering fields, such as photogrammetry, remote
sensing, and computer vision. The complexity increases quickly with the desired accu-
racy. Fortunately, not all imaging devices require complicated geometric calibrations. For
example, color thermal printers usually do not require such a calibration process because
the alignment of pixels is often done in the manufacturing of print heads. The geometric
calibration will be discussed in Chapter 19 when we deal with different types of imaging
device (for example, digital cameras). Here, we discuss the general method for resolution
calibration.
The edge spread function (ESF) is the convolution of the system point spread function, h(x, y), with an ideal step function, s(x, y), which is equal to zero for all x < 0 and to 1 for all x ≥ 0. These
expressions are of practical importance because a relatively good knife-edge target is not
difficult to create and the image of the ESF of a device can be analyzed to derive the
system transfer function of the device. Since it is almost impossible to perfectly align the
knife edge along any direction, the image of a slanted edge has to be analyzed. For digital
devices, the slanted edge image actually provides good information for estimating the system
spatial frequency response (SFR) beyond the Nyquist frequency. The reason is that we have
complete knowledge of the signal. Various pixels across the slanted edge provide digital
samples at many different distances from the edge and, when all of them are projected onto a line perpendicular to the slanted edge (see Fig. 16.11), they can be pooled together to form a subpixel sampling of the ESF,1 allowing the computation of the spatial frequency response well beyond the Nyquist frequency of the pixel grid. A well-known method for estimating the device SFR from a digital image of a slanted edge [150, 788] relies on taking the first derivative of the ESF to compute the LSF, from which one can compute the device SFR as shown in Eq. (16.38). In order to estimate the response along other directions in the frequency plane, we have to take images with the knife edge slanted at different orientations.

Figure 16.11. By projecting all the pixels onto a line perpendicular to the slanted edge, we can pool them together to form a subpixel sampling of the ESF. The figure shows only the projection of some of the pixels as examples.
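A simplified sketch of the computation (my own illustration, not the standardized algorithm of [150, 788]): project the pixel coordinates onto the edge normal, bin them into a supersampled ESF, differentiate to obtain the LSF, and take the magnitude of its Fourier transform.

```python
import numpy as np

def slanted_edge_sfr(img, edge_angle_deg, oversample=4):
    """Estimate the SFR from an image of a near-vertical edge tilted by
    edge_angle_deg from vertical. Returns (cycles/pixel, normalized SFR)."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    theta = np.deg2rad(edge_angle_deg)
    # Signed distance from each pixel to the edge line through the image
    # center, i.e., the projection onto the edge normal.
    d = (xx - w / 2) * np.cos(theta) + (yy - h / 2) * np.sin(theta)
    band = np.abs(d) < 0.25 * min(h, w)         # keep a well-covered band
    bins = np.round(d[band] * oversample).astype(int)
    bins -= bins.min()
    counts = np.bincount(bins)
    sums = np.bincount(bins, weights=img[band])
    centers = np.arange(counts.size)
    filled = counts > 0
    esf = np.interp(centers, centers[filled], sums[filled] / counts[filled])
    lsf = np.diff(esf)                          # line spread function
    lsf *= np.hanning(lsf.size)                 # window against noise leakage
    spec = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(lsf.size, d=1.0 / oversample)
    return freqs, spec / spec[0]

# Synthetic test: a blurred edge tilted by 5 degrees (hypothetical data).
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
d = (xx - w / 2) * np.cos(np.deg2rad(5)) + (yy - h / 2) * np.sin(np.deg2rad(5))
img = 1.0 / (1.0 + np.exp(-d / 1.5))            # smooth step edge
freqs, sfr = slanted_edge_sfr(img, edge_angle_deg=5)
```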
To determine the orientation of the edge, a straight line ax + by = c can be fitted to the estimated edge positions so that the sum of squared distances to the line is minimized, subject to the constraint a² + b² = 1; the line parameters are then found by setting the partial derivatives to zero.
1 If the slope of the slanted edge is a rational number, the projections form a periodic sampling with a period
smaller than the original pixel sampling. If the slope is not a rational number, then projections are not exactly
periodic and the between-projection intervals are irregular.
16.4 Problems
16.1 Given the four data points in Table 16.1, we want to find the straight line that best
fits the four data points. (a) What is the least-squares solution if there is no error in x?
(b) What is the least-squares solution if both x and y have errors?
Table 16.1.
i 1 2 3 4
16.2 Given the four (r, g, b) and (X, Y, Z ) pairs from a printer shown in Table 16.2, use
the tetrahedral interpolation to interpolate the (rp , gp , bp ) input to the printer so that
it will print the color (X p , Yp , Z p ) = (46.6, 31.9, 41.9).
Table 16.2.
i (ri , gi , bi ) (X i , Yi , Z i )
0 (115,104,107) (45.0,30.0,40.0)
1 (125,104,107) (49.0,32.0,43.0)
2 (125,114,107) (48.0,34.0,42.0)
3 (125,104,117) (47.0,33.0,44.0)
17 Tone reproduction
17.1 Introduction
Images are often considered as records of the physical scenes that we have seen. Therefore,
we wish to have images that reproduce the visual impressions of the original scenes as we
remember them. Among the various attributes that contribute to the total visual impression,
tone and color are two of the most important factors. Tone reproduction is the process of
reproducing the visual brightness/lightness impression of the original scene in an image.
Similarly, color reproduction refers to the process of reproducing the visual color impression.
Although color perception involves brightness/lightness perception, the two topics will be
discussed separately, with the implied, narrower definitions that tone reproduction deals with luminance perception and color reproduction with chrominance perception. However, it
should be understood that there are interactions and trade-offs between the two processes.
The criteria and goals of tone reproduction vary from application to application, and we
will mainly be interested in consumer imaging applications.
Since the success of a tone reproduction is finally judged by human observers, there are
at least two separate systems involved in a tone reproduction task, i.e., the imaging system
and the human visual system. Therefore, it is convenient to divide any tone reproduction
into three processes: (1) the subjective process that specifies what a desired reproduction
should be in terms of visual impression, (2) the psychophysical (translation) process that
converts the perceptual criteria as specified in the subjective process into physically quan-
tifiable criteria, and (3) the objective process that deals with calibrating and controlling
image devices to achieve the desired reproduction in terms of physical quantities. It is ar-
guable whether the first subjective process can be or should be separated from the second
psychophysical (translation) process. In the existing literature, these two processes are in
fact merged into one [470, 704, 706]. However, there are good expository reasons to sep-
arate them as we do here. The main reason is that visual impression relies on terms that
are not easy to define, but seem to be intuitively clear to us. Therefore, it is very effective
to be able to describe the tone reproduction criteria in those terms. The psychophysical
process that is supposed to translate these meaningful perceptual terms into physically
quantifiable criteria is a much more difficult task. If we separate the two processes, we
can better evaluate the cause of the success or failure of the entire tone reproduction. A
good example of such a practical use is as follows. When we say that one object appears
to be brighter than another object, the meaning is quite clear to everyone. Therefore, we
can say that we would like to make the reproduced tone scale preserve the brightness of
the scene as we see it. This criterion of reproducing the perceived brightness is clearly
understandable to us, but to translate the visual impression of brightness into a physically
quantifiable measure turns out to be extremely difficult. We can take the position that if we
cannot quantify a term, then we should not use that term. Unfortunately, this would make
most discussion even more difficult and not necessarily more precise. The question can also
be raised as to whether we are hiding all the difficulties in the second process. The answer is
that this is not true. For example, the subjective criterion could be changed to reproducing
the perceived contrast, and the change would have a pronounced effect on the direction
in which we should conduct the psychophysical process. This will be clear in our later
discussion.
In the third objective process, we are interested in establishing the relation between the
reproduced luminance and the original scene luminance. This relation is often expressed
as a curve, called the tone reproduction curve (TRC) or the tone scale curve. For example,
we would like to determine the image illuminance on the film or sensor as a function of
the scene luminance, the output luminance of the display device as a function of the input
image signal, and the illumination flare as a function of viewing luminance. The objective
process is quantitative and can be experimentally measured and verified. Tone reproduction
that aims at producing artistic expressions of specific moods or feelings is fascinating, but
its goal is beyond our current understanding so we cannot deal with it effectively. Other
applications, such as scientific imaging, have quite different tone reproduction objectives
than that of consumer imaging, but the objective tone reproduction processes underlying
all applications are very similar to each other.
In the discussion of tone reproduction, it is convenient to consider monochromatic im-
ages (such as black-and-white pictures) as the main subject of study. Tone reproduction
of color images will be discussed in the context of color reproduction in the next chap-
ter. Studies show that good tone reproduction for monochromatic images also applies to
color images [427]. Although tone reproduction is the most important aspect of making
a good image, it is rarely discussed in the digital image processing literature. For many
people, the pixel values in a digital image are simply assumed to represent scene radiances
at those pixel locations and they are often called the gray levels, implying that when the
image is printed on a reflective medium, such as paper, the pixel values are to be printed
as reflectance factors. However, a good tone reproduction is much more complicated than
this simple intuition implies. First of all, the pixel values are rarely, if ever, proportional to
scene radiances, because of the presence of camera flare or scanner flare. Secondly, scene
radiances have a much wider dynamic range than most media can render, and a straight
one-to-one linear reproduction of scene radiances on a medium is not only impossible, but
also undesirable. The reason is that the absolute luminance level of the scene may not be
reproducible in the medium, and, even when it is reproducible, it is not really desirable
because highlights and shadows will be clipped, without visible details. Even in the very
rare cases in which all scene luminances can be reproduced on the medium, it is often found
that a straight one-to-one linear reproduction does not produce an image as well as we might
think.
Figure 17.1. TRCs for a reflection print viewed under three different levels of illumination (250 lx, 500 lx, and 1000 lx), plotted as the luminance of the reproduction vs. scene luminance (cd m−2).
17.2 TRCs
We have defined the TRC as the curve that describes the relation between the reproduced
luminance and the original scene luminance. Since every imaging system has its own spectral responsivity function, the quantity actually measured is often not luminance as defined by the CIE luminous efficiency function. However, luminance is computed with the human visual sensitivity and is thus the appropriate metric for describing the visual impression of the tone reproduction. For this reason, luminance has always been used for
this purpose. It should be understood that often the metrics used in a TRC are measured
with system or instrument spectral responsivities. In those cases, the TRCs are related to
luminances only in a proportional sense for a fixed spectral composition.
There are several ways that we can plot the TRC that relate the reproduced image lumi-
nance to the scene luminance. The most direct plot is to show the luminance of reproduction
as a function of the scene luminance. Figure 17.1 shows the TRCs of a reflection print viewed
under three different illumination levels. There are several drawbacks with TRCs plotted in
this way. First, it requires many curves for a single reflection print, one for each illumination
level. As we will see shortly, the appearance of a reflection print does not change much over
a range of illumination, say, from 250 lx to 1000 lx. Therefore, it is not efficient to use
the absolute luminances as the variables. For reflection prints or transparencies, it is more
efficient to specify the reproduction luminance on a relative scale. The minimum density
reproducible on a medium is often used as the reference. Similarly, the scene luminance is
usually specified on a relative scale. For example, the perfect Lambertian surface (reference
white) can be used as the reference. Second, the perceived brightness or lightness of an
object is not linearly proportional to the luminance. The slope of the curve in a linear–linear
plot does not give us information about the perceptual rate of change at any region on the lu-
minance scale. Due to the wide acceptance of Weber’s law, log luminance is most frequently
used for the luminance scale. It is also well known that the slope in a log luminance vs.
log luminance plot correlates very well with perceived contrast. Therefore, the log–log plot is, by far, the most widely used way to plot the TRC.

Figure 17.2. TRCs for a reflection print, plotted in log space: (a) the relative log luminance of the reproduction as a function of the relative log luminance of the scene; (b) the density of the reflection print as a function of the relative log luminance of the scene.

However, measurements on repro-
duced hard-copies are usually done with densitometers, and the reproduction luminance is
expressed as a density. Here, we have to be aware of the fact that there are several den-
sitometric standards. The one most frequently used for evaluating photographic reflection
prints is the Status A density. In most photography or graphic art literature, the TRCs are
plotted as density vs. relative log scene luminance. Figure 17.2 shows both types of log–log
plot for the same reflection print as used in Fig. 17.1. Figure 17.2(a) is a log luminance vs.
log luminance plot and Fig. 17.2(b) is a density vs. log luminance plot. In this case, we have
assumed that the minimum density of the paper is 0. In general, the curve in Fig. 17.2(b)
will be shifted up by the minimum density of the paper, usually on the order of 0.05–0.1. The
log–log plot not only looks quite different from the linear–linear plot, but also shows more
information that is relevant to our image perception. The nearly straight line portion on the
log–log curve (roughly from −1.6 to −1.0 on the x-axis) tells us that within this range of
luminance the perceptual contrast is approximately constant. This range corresponds to a
scene luminance range 500–2000 cd m−2 on the linear–linear plot, occupying only a small
fraction of the range on the linear axis.
17.3 The concept of reference white

In our discussion of TRCs, we mentioned that it is useful to deal with relative log luminances,
instead of absolute log luminances. In that context, we mentioned the concept of reference
white and used it to calculate the relative quantities. For example, in viewing a reflection
print, we often use the minimum density of the paper as the reference white. However,
the concept is ambiguous in other situations. Some clarification is needed. In viewing the
original scene, we mentioned that a perfect (100%) Lambertian reflector (called diffuse
white) can be used as the reference white. This in itself is not sufficient because the angle
between the reflector and the light source has to be specified too. Furthermore, the reflector
may not be uniformly illuminated in every part that is visible to the camera. Obviously,
natural scenes may be more complicated than we can define exhaustively. For the purpose
of our future discussion, we will assume that there is a main subject in the image and the
imaginary, perfect reflector has the same shape and spatial location as the main subject.
If the illumination is not uniform, a subjective preference comes in as to how that subject should be reproduced, and the "reference white" is adjusted accordingly. For example, if the subject is half in the shade and half in the sun, and the preference is to reproduce the shaded part properly and let the sunlit part be too light, the reference white should be defined relative to the shaded part.
For viewing a projected transparency image in a darkened room, choosing a reference
white is again problematic. If we choose the open-gate (meaning no slide in the projec-
tor) luminance of the screen as the reference white, specular highlight in the image cannot
be rendered at all. Empirical tests show that the “reference white” in the scene should be
produced somewhat darker than the open-gate luminance. Figure 17.3 shows an exam-
ple of how the reference (diffuse) white of the scene is typically produced for a reflec-
tion print and for a transparency. In this example, the scene white is produced at about
0.2 above the minimum density (Dmin ) for the transparency and at about 0.1 above the
minimum paper density for the reflection print. However, this placement of scene white
is not necessarily optimum for all pictures. There is sufficient variation in the optimum
placement that we should not take these numbers literally as recommendations. The in-
herent difficulty in defining the reference white is only part of the cause for this vari-
ation. Scene content, viewing illuminance, and the shape of the TRC also play a role
here.
In summary, the concept of reference white is ambiguous in many applications. It can be defined precisely only when the subject is a uniformly illuminated, planar object. In future
discussion, the term should be understood with the ambiguity in mind. Whenever the main
subject can be assumed to be a uniformly illuminated, planar object, the reference white
can be defined. In other cases, quantitative analysis involving the use of a "reference white" can only be as exact as that term can be defined for the particular imaging condition under discussion.

Figure 17.3. The average optimum TRCs for a reflection print (bright surround) and a transparency (dark surround) from empirical studies [706], plotted as density vs. relative log scene luminance. Note that the zero density point is the minimum density (Dmin) reproducible for the intended medium. The diffuse white in both cases is reproduced at some density above Dmin, so that specular highlights can be rendered with some visible contrast. The dashed line has a slope of 1.0.
17.4 Experimental studies of tone reproduction

The early studies on tone reproduction did not attempt to formulate what is to be achieved
subjectively in the tone reproduction process. Instead, the experiments produced a large
number of images, each through a different TRC, on reflection prints and projected trans-
parencies. The procedure was to ask the observers to select the best reproduction of a
given scene and then the corresponding TRC used to generate that picture was ana-
lyzed. Although it is not possible to derive truly optimum tone reproduction this way,
these experiments were very successful in discovering some of the important factors in the
tone reproduction process. Among them are scene content, luminance level, and viewing
surround.
Two of these early experiments will be described here because the insights gained from
their results are very helpful to our understanding of the dimension of the tone reproduction
problem. The first experiment was done by Simonds [880] for reflection prints, and the
second experiment by Clark [197] for transparencies. One of the key questions in the
design of these experiments is how to systematically generate different TRCs that can be
used to produce images for subjective evaluation. At that time, photographic film and paper
were the main interests of the organizations where the research was done. It was discovered
that the characteristic curves (density as a function of log exposure) of films and papers
could be well described by a few (four or five) vectors derived from principal component
analysis. This implied that, by properly choosing different films, papers, and processing
variations, they could generate various tone reproductions for a given picture.
Simonds used four scenes for his study. Scene A was an indoor wedding portrait with
a moderately complex background and its log luminance range was 2.58. Scenes B and C
were close-up indoor portraits with uniform walls as the backgrounds, and the log luminance
range was 2.20 for both scenes. Scene D was an outdoor picnic scene with direct sunlight
and dark shadows, and its log luminance range was 2.70. The luminances of the scenes
were measured and recorded by photographic photometry. Various (black-and-white) films,
camera exposures, printer flares, (black-and-white) papers, and processings were used to
reproduce these four scenes on reflection prints. The reflection densities of these prints were
measured using a 45/0 geometry. The negatives were 5 in by 7 in and the prints were 8 in by
10 in. Scenes A, B, and C were printed on Polycontrast Paper with a semi-matte (G) surface.
The prints for Scene D were printed on the Polycontrast Paper with a glossy (F) surface,
having a higher maximum density than the semi-matte surface paper. Thirty experienced
observers ranked the prints in order of subjective quality. The prints were viewed under a
bank of fluorescent lights with the illuminance maintained at 861 lx. The mean rank of a print
given by the observers was taken as its final quality measure. The mean rank numbers for a
given reproduction were linearly transformed to a quality scale Q from 0 to 100, in which
100 means the optimum quality reproduced for that scene. It was found that approximately
10 units on this quality scale represent a quality change which would be discerned and
similarly ranked by at least 85% of the observers.
Clark performed a similar experiment for projected transparencies (reversal films). He
used only two scenes: Scene 1 was the same as Scene C in Simonds’ study and Scene 2 was
an outdoor sunlit scene with a brick house, trees, shrubs, a car, and a human subject. The
transparencies were printed from 5 in by 7 in negatives by projection onto 3.25 in by 4 in
projector slide plates. A wide variety of tone reproductions were generated by varying the
exposure and processing, by the choice of the contrast grade of the plates, and by the use
of a variety of positive masks. For Scene 1, several negatives were prepared with different
negative materials and various levels of exposure and processing, so that more variations of
tone reproduction could be obtained. The major difference between this experiment and that
by Simonds is that the slides were viewed in a dark surround. The visual angle subtended by
the projected image was 35◦ in diagonal, leaving plenty of peripheral areas dark. From an
earlier work by Breneman [136], it was known that an image viewed with a dark surround (as
in viewing slides) produces a very different visual impression from that of the same image
viewed with a light surround (as in viewing reflection prints). In order to study the effect
of the surround, Clark’s experiment included a second part study in which the projected
image was surrounded with a bright border about one-third of the width of the image. One
other factor that was studied in the experiment was the effect of the luminance level of the
projected image on its perceived quality. The following paragraphs summarize the main
results from these and other experiments [427, 472, 704, 706].
Figure 17.4. The optimum TRCs for the four scenes in Simonds' study on reflection prints, plotted as density vs. relative log scene luminance (log luminance ranges: A, 2.58; B, 2.20; C, 2.20; D, 2.70).
Figure 17.5. The optimum quality of the projected image as a function of the projector luminance for a given scene, plotted as subjective image quality vs. log maximum image luminance on the screen. The upper curve is the maximum quality obtained when the tone reproduction is varied at each luminance level; the lower curve is the quality obtained with the tone reproduction fixed at the one that is optimal at the overall optimum luminance Lmax.
Figure 17.6. Comparison between the optimum TRCs (density vs. relative log scene luminance) for dark surround and light surround viewing conditions. The upper curve is for the dark viewing surround and the lower curve is for the light surround. The two dashed curves are straight lines with slopes equal to 1.5 (upper) and 1.34 (lower).
The effect of the viewing surround is clearly seen by comparing the optimum TRCs of the same scene for the dark surround (as
in the slide viewing condition) and for the light surround (as in the reflection print viewing
condition). Scene C in Simonds’ reflection print experiment is the same as Scene 1 in Clark’s
slide experiment. The optimum tone reproduction curves for these two viewing conditions,
as determined by the observers, are shown in Fig. 17.6. The major conclusion is that the
optimum slope (also called gamma) of the TRC for the dark surround viewing condition
has to be much higher than that for the light surround viewing condition. On the average
for many scenes, the optimum gamma is about 1.5 for dark surrounds and 1.0–1.2 for light
surrounds [427, 706].
The average luminance range of typical scenes was found to be 160:1. The luminance range of a scene can be greatly reduced if smoke or fog is present. When an image is printed such that the darkest shadow has a density less than 1.3 (corresponding to a luminance range of less than 20:1), it often gives a visual impression of fog or flare. In digital printing, adjusting the TRC is easy, but the shadow has to be rendered as a good black to produce a good image. This is a fairly consistent requirement in tone reproduction.
In an experiment to determine the best control of photographic printing, Jones and
Nelson [472] reported that the majority of the negatives they used tended to produce the
most preferred print when the minimum density of the negative (the darkest area in the
scene) was printed on the shoulder of the D–logE curve of the paper (at a density of
about 1.46–1.49, which is 0.1–0.07 less than the maximum density, Dmax = 1.56, of the
paper used in that experiment). Although we have to be cautious about this conclusion
because the number of negatives they used was quite small (only 171), and because all the
scenes used were front-lighted [472, p. 585], this finding seems to agree with our common
experience in printing images. Another interesting aspect of this finding in the Jones and
Nelson experiment was that this preference of having good black came at the price of
losing highlight details in many of the prints, and yet most observers chose this trade-off
very decisively! Those two authors suggested that in making the trade-off between losing
the shadow details and losing the highlight details, the observers preferred to preserve the
shadow details. In reality, the trade-off is more complicated than that. We also know that
most people prefer faces to be reproduced at a density slightly lighter than that of real facial skin. Typical Caucasian skin has a luminance factor of about 37, and the preferred skin reproduction is at a luminance factor of about 39. This preference in flesh color reproduction is
an additional constraint on tone reproduction other than the trade-off to render the shadow
or highlight details.
There is one minor question raised in Jones and Nelson’s experiment: why was the
preferred black at a density 0.1 less than the paper Dmax ? This question cannot be resolved
because the spot size used in these density measurements was not given in the original
publication and we can only suspect that there were image details finer than the spot size
of the densitometer used. If the negative density was measured with a much smaller spot
size (as small as the smallest visible details when the negative is printed on the paper), it is
likely that it would have been preferred that the minimum negative density was printed at
the paper Dmax . But, this is not necessarily the right thing to do because a smaller spot will
produce high noise in the reading and the absolute minimum density so measured may no
longer be reliable.
In the last section we described some of the early efforts to determine the optimum tone
reproduction through experimental trials. Although these experiments gave us many useful
insights, blind variation of TRCs cannot really give us the true optimum results and it would
be too expensive to conduct such an exhaustive search experiment for every different viewing
condition, and every different camera, flare, film, processing, and printer combination. For
these reasons, several attempts have been made to come up with a good theoretical criterion
for tone reproduction. Some of these efforts have been to define in perceptual terms what
is to be achieved in the ideal tone reproduction. In the early days before digital printing
was affordable, the effort was mainly on designing a “system-level” TRC, which could
be used as the default tone scale for all images. This is still important in digital printing
applications because a good system tone scale can serve as the baseline on which adjustment
can be made digitally on an image-by-image basis. As digital printing becomes widely
available today, subjective tone reproduction criteria have become much more complicated,
because high-speed digital processing can now be used to manipulate spatial details pixel
by pixel, which is equivalent to dodging and burning in a photographic darkroom, only
more efficient and precise. Tone reproduction can no longer be described by a single curve.
For this reason, it is important for us to study the subjective criteria for tone reproduction
at a conceptual level, rather than at an implementation level. Building upon the perceptual
criteria, we can use spatial processing to implement whatever perceptual tone adjustment we
desire.
Figure 17.7. Ideal TRCs that reproduce the relative brightness of the scene, plotted against log scene luminance (cd m−2). The brightness is calculated according to the Bartleson–Breneman brightness model. The luminance of the scene reference white is 5000 cd m−2 and the luminance of the reference white in the reproduction is 300 cd m−2. The solid curve is for the illuminated surround (reflection print viewing condition) and the dashed curve is for the dark surround (transparency viewing condition). The slope of the solid curve is between 1.1 and 1.2, while that of the dashed curve is between 1.4 and 1.5.
Therefore, we can also use these new brightness models to derive the TRCs. The general
effect of the viewing surround on brightness perception is similar to that of the Bartleson
and Breneman model, and therefore we will not discuss this further here.
Relative brightness as a subjective reproduction criterion is incomplete for several rea-
sons, two of which are: (1) The display media have limited dynamic range and, therefore,
cannot reproduce the relative brightness as required by the criterion. The criterion gives
no guidance as to how the highlight and shadow brightness should be compressed. (2) The
brightness model relies on the “reference white” as the anchor point. The criterion does not
tell us at what density the “reference white” should be reproduced. For reflection prints, it
seems that the paper white can be used as the "reference white". However, doing so would leave no room for printing specular highlights, which are brighter than the diffuse white. Experiments
show that the reference white of the scene should be printed somewhat darker than the paper
white to produce optimum image quality.
The discussion so far has concerned the "system" tone reproduction, not individual images. With the advent of high-speed
computing, we may ask if there are other tone reproduction criteria that are more general and
more desirable. One of the well-known properties of our visual system is that our perception
is correlated more with the contrast signals than with the image irradiance signals. It is the
difference between spatially neighboring elements that determines our perception, not the
absolute luminances at those elements. If the luminance of each image point is increased
by a factor of 10, the image will still look about the same. We know this is more or less true
over a limited luminance range. The idea of “reproducing the visual contrast of the scene
on the image” is thus a very reasonable tone reproduction criterion. The difficulty is in the
implementation.
There are many issues to be addressed in carrying out such a tone reproduction process.
First, how do we define the visual contrast of a scene? Second, how does the perceived
contrast change with viewing conditions, such as luminance level and surround? Third,
how is the quality of the reproduction affected by the unavoidable distortion in contrast
reproduction? In spite of these difficulties, early attempts to achieve such a tone reproduction
objective seem to show encouraging results [751]. This is an area of active research.
brightness) would predict. Let us express it in a different way. We know that the best tone
reproduction is scene-dependent, but we also know that this dependence is related, to a large
extent, to the lighting and, to a much smaller degree, to the objects present in the scene.
This is why a large number of the pictures taken by consumers are reproduced very well by
the fixed system tone reproduction process that exists in the current photographic system.
The scenes that are not handled well are mostly those with extreme lighting conditions,
such as harsh flash and backlit scenes. This is an important point to keep in mind when we
review the various methods of histogram modification. Another thing worth pointing out is
that one need not constrain oneself to the use of histogram modification for deriving a tone
transformation curve. A simple and effective way to produce a good tone curve for an input
image is to extract some histogram features and use those features to adjust the parameters
that control the shape of the tone curve [407].
There are two key issues in histogram modification methods: (1) how to sample the
image so that the resulting histogram represents more directly what we want to measure;
and (2) how to modify the histogram so that the resulting tone reproduction will render
the image in the way we desire. Obviously, sampling every pixel in a digital image to form
the histogram will make the histogram too sensitive to the scene objects and not sensitive
enough to the scene lighting. The best example is an image in which a human subject is
standing in front of a uniform wall. The histogram is dominated by the pixels of the wall
and almost no useful feature about the scene lighting can be extracted from the histogram.
This is one of the main reasons why histogram equalization as originally proposed does not
work in general.
One of the new ideas in histogram modification methods is the realization that the input
image has to be sampled intelligently to avoid the excessive bias introduced by irrelevant
pixels. One of the early attempts was to sample a pixel only when it is sufficiently different
from its neighbors in a raster scan order [421]. Other sampling schemes use Laplacian or edge
operators [16, 180, 671, 735]. These methods attempt to derive a histogram that represents
the log scene luminance distribution of the modulated parts of an image. It is known from
experiments on stabilized retinal images that edges are important in our visual perception.
Therefore, the luminance histogram of the edges in an image represents a simple extraction
of what our visual system sees. From an argument based on the central limit theorem in
statistics [16, 790], we can expect that the log scene luminance histograms tend to be
normally distributed. Therefore, it is claimed that if a TRC is constructed to transform the
input edge histogram into a Gaussian distribution, the image will look esthetically pleasing
[16, 180, 671, 735]. One may argue that, since the Gaussian distribution is only a statistical
tendency, it is not reasonable to expect that every individual image should have a Gaussian
histogram for its edge pixels. Based on thinking along those lines, a flexible algorithm was
proposed that modifies the shape of the output target histogram as a function of the input
histogram [574]. It seems to produce reasonably good results for most images.
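As an illustration of this family of methods (a simplified sketch, not the algorithm of any particular reference), the following fragment builds a histogram from edge pixels only and constructs a tone curve that maps it toward a Gaussian target by histogram specification; the gradient threshold and the default target parameters are hypothetical choices.

```python
import numpy as np
from scipy.special import erf

def edge_histogram_tone_curve(log_img, grad_thresh=0.05, n_bins=256,
                              target_mean=None, target_sigma=None):
    """Return (input_levels, output_levels) of a TRC that maps the histogram
    of the edge pixels of a log-luminance image toward a Gaussian target."""
    gy, gx = np.gradient(log_img)
    samples = log_img[np.hypot(gx, gy) > grad_thresh]   # modulated pixels only
    lo, hi = samples.min(), samples.max()
    levels = np.linspace(lo, hi, n_bins)
    hist, _ = np.histogram(samples, bins=n_bins, range=(lo, hi))
    cdf = np.cumsum(hist) / hist.sum()                  # CDF of the edge histogram
    mu = samples.mean() if target_mean is None else target_mean
    sigma = samples.std() if target_sigma is None else target_sigma
    grid = np.linspace(lo, hi, 4096)
    target_cdf = 0.5 * (1.0 + erf((grid - mu) / (np.sqrt(2.0) * sigma)))
    out_levels = np.interp(cdf, target_cdf, grid)       # histogram specification
    return levels, out_levels
```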
Another variation of histogram modification is to measure the amount of image modula-
tion activity present in an image and form an image activity histogram as a function of the
digital code value in the image. The image activity histogram is then used to construct the
TRC for that image [572, 576]. The image activity measures can be based on the number
of level-crossings, the image gradient, the image Laplacian, or the number of neighboring
pixels that have similar gray levels. The focus of attention in this type of approach has
shifted from measuring how many pixels have a certain property to measuring how much
image activity is present in the image at what tonal range. This shift in attention repre-
sents an attempt to construct a histogram that is directly coupled to the main objective of
reproducing image details.
17.6 Density balance in tone reproduction

The term density balance refers to adjusting the overall brightness of a reproduced image.
In traditional photographic applications, both reflection prints and transparencies are mea-
sured in density, and therefore the brightness adjustment is done by changing the density
in the reproductions. For this reason, the term density balance is used for this manipula-
tion. According to our definition of tone reproduction, density balance is part of the tone
reproduction process. In a traditional negative–positive photographic system, the shape of
the system TRC is fixed. Under standard processing and printing, the only variable was the exposure used to print the image on the negative film onto the photographic paper. Here, the exposure
can be adjusted to make the final image look brighter or darker. The density balance oper-
ation is thus also called the printer exposure control. In digital printing, the density balance
can occur any time before the digital image is sent to a digital (inkjet, thermal, or laser)
printer. If the digital image is in log exposure metric, the density balance is accomplished
by simply adding or subtracting a number from every pixel of the image. If the digital
image is in a linear exposure metric, a multiplication or division has to be performed. The
density balance operation is equivalent to increasing/decreasing the exposure time of the
camera or increasing/decreasing the light source intensity in the original scene, both being
multiplicative operations in linear luminance metric.
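A trivial numerical sketch of the equivalence (the image values and the shift are hypothetical):

```python
import numpy as np

linear_exposure = np.random.default_rng(4).uniform(0.01, 1.0, size=(4, 4))
log_exposure = np.log10(linear_exposure)

shift = 0.15                                     # density-balance shift in log exposure
balanced_log = log_exposure + shift              # additive in a log metric
balanced_linear = linear_exposure * 10.0**shift  # equivalent multiplication in linear metric

assert np.allclose(balanced_log, np.log10(balanced_linear))
```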
In consumer imaging applications, thousands of prints are made every hour by a photofin-
ishing machine. Density balance has to be performed automatically for each image. Com-
putation of the optimum density balance is the single most important factor in the print
quality of a well-exposed negative.
17.7 Tone reproduction processes
Having discussed various tone reproduction criteria, we are now ready to study the tone
reproduction process, in which the TRC of an imaging system is derived. Let us take a
published case study [573] as an example. In this application, the goal is to derive good
TRCs for computed radiography. This example is simpler than the consumer photographic
system, but contains all the key elements of a tone reproduction process. Therefore, we will
use it to illustrate how we go about designing tone reproduction curves.1
The absorption and scattering of x-rays produces a shadow image of the internal structures
of an object. This shadow image represents the x-ray transmittances through the object.
The mathematical mapping of this invisible x-ray transmittance to the visible film density
is called the tone scale of the radiographic system. Since soon after their discovery by
Roentgen in 1895, x-rays have been used as a tool for medical diagnosis. Until the 1990s, photographic film was the dominant sensing (direct, or indirect from a phosphor screen) and display medium. The characteristic curves of the combined screen/film systems thus
determine the tone scale mapping for diagnostic radiography. In computed radiography, an
x-ray transmittance image is recorded on a phosphor screen and then a laser scanning system
is used to read out the exposure signals which are converted into a digital image. The digital
image values are often calibrated so that they are proportional to log exposures of the x-rays
on the phosphor screen. In order to view the image, the digital image is often printed on a
radiographic film. The film writer is often calibrated in film density. The question is: what
is the optimal mapping from the digital image value (in log exposure) to the film density?
This is similar to the question in consumer imaging where the relative scene log exposure
is mapped to paper density on the reflection print.
As we discussed at the beginning of the chapter, there are three components in a typical
tone reproduction process: (1) the subjective criteria that specify a desired reproduction
in terms of visual impression, (2) the psychophysical model that translates the perceptual
criteria into physically quantifiable criteria, and (3) the objective process that deals with
calibrating and controlling image devices to achieve the desired reproduction in terms of
physical quantities. Let us discuss these three components one at a time.
The first question is: what is the objective of the TRC? A simple answer for diagnostic
radiography is that we wish to optimize the chance for a radiologist to visually detect a tumor.
If the TRC is not designed well, the tumor may be more difficult to see in a darker area than
in a lighter area, or the reverse. Therefore, a logical objective is to make a tumor equally
visible independent of the density of its background. To a first-order approximation, equal
thickness increments of a homogeneous object correspond to equal log exposure differences.
A given object in the path of x-ray beams will absorb a certain percentage of the x-ray energy
independently of what is in front of or behind it. That means that an object (say, a tumor)
will roughly create a certain log exposure difference in the image no matter what density the
surrounding area has. (This, of course, is only approximately true, because of the scattering
and the polychromatic nature of the x-ray source.) Based on this objective, Van Metter [573]
proposed that the tone reproduction criterion is to produce equal brightness difference for equal log exposure difference.

1 This section is a shortened and modified version of a paper [573] published in Proceedings of SPIE, Volume 3036. Readers interested in more details should read the paper.
The second question is: what is the physical correlate of visual brightness? There are
many brightness models that we can use to answer this question. However, x-ray image
viewing conditions are somewhat different from other image viewing conditions, such as
for movies, TVs, or transparencies. It is necessary to verify the performance of each model
for such viewing conditions. Seven brightness models and their parametric variations were
tested: (1) Hunt’s color-appearance model [432], (2) CIELAB L ∗ [794], (3) Bartleson and
Breneman’s model [75], (4) the Michaelis–Menten function with a small local support
[851], (5) the Michaelis–Menten function with one global adaptation parameter [718], (6)
power laws [80, 117, 903], and (7) the logarithmic function (the Weber–Fechner law [906]).
Although a number of the models produced reasonable uniformity in brightness contrast,
none of them produced truly equal brightness differences across the entire gray scale. The
experimental results show that:
1. The logarithmic function and the linear functions are clearly not good, i.e., equal-density
or equal-luminance differences do not correspond to equal-brightness differences. For
the logarithmic function, the brightness increment is too small at the dark end and
too large at the light end. This is the opposite of the linear function: equal-luminance
increments produce too small a brightness increment at the light end and too large a
brightness increment at the dark end.
2. The power law, B = L^p − c, where L is the luminance and B is the brightness, seems to give a fairly uniform brightness scale when p is between 0.2 and 0.3.
3. With proper choice of parameters, Hunt’s model and Bartleson and Breneman’s model
also produce very respectable uniform gray scales, comparable to that from the power
law.
4. The Michaelis–Menten function characterizes the rate of biochemical reaction when
an enzyme is involved [326, 658, 917]. It also describes very well the physiological
response of primate cones [978] and the psychophysical response of human brightness
perception [409, 432, 718]. The Michaelis–Menten (or Hill) function is
\[
B = \frac{B_m L^n}{L^n + L_0^n}, \qquad (17.2)
\]
where B is the perceived brightness, B_m is a scale factor, L is the luminance of the object, and the Hill coefficient n = 0.7 seems to give the best uniform gray scale for this particular application. If L_a is the adapting luminance (cd m−2) of the visual field, L_0 can be calculated as
\[
L_0 = 12.6 \times L_a^{0.63} + 1.083 \times 10^{-5}. \qquad (17.3)
\]
Let L w be the luminance of the reference white (the minimum density area of the x-ray
film). Hunt [432] suggested that L a = 0.2 × L w .
Among all the brightness models tested, the Michaelis–Menten function (Eq. (17.2)) per-
forms best for the x-ray image viewing applications. Therefore, we will choose it as the
brightness model.
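A small numerical sketch of Eqs. (17.2) and (17.3); the scale factor B_m and the luminance values are arbitrary, and L_a follows Hunt's suggestion L_a = 0.2 L_w:

```python
import numpy as np

def brightness(L, L_a, n=0.7, B_m=100.0):
    """Michaelis-Menten (Hill) brightness, Eqs. (17.2) and (17.3)."""
    L0 = 12.6 * L_a**0.63 + 1.083e-5
    return B_m * L**n / (L**n + L0**n)

L_w = 2000.0                             # luminance of the film Dmin area (cd/m^2), hypothetical
L_a = 0.2 * L_w                          # adapting luminance
L = np.logspace(-1.0, np.log10(L_w), 7)  # sample luminances up to the reference white
print(np.round(brightness(L, L_a), 2))
```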
Figure 17.8. An example of a graphic method of deriving a TRC for an x-ray film system. Quadrant I shows density vs. log exposure (the tone scale curve to be constructed), quadrant II density vs. luminance, quadrant III brightness vs. luminance, and quadrant IV brightness vs. log exposure. The tone reproduction objective in this system is to produce equal-brightness difference for equal-log-exposure difference.
The third question is: how is the image luminance related to the film density? For x-ray
image viewing, the luminance of an image area, L, is related to the film density, D, by
\[
L = S \times 10^{-D}, \qquad (17.4)
\]
where S is the luminance corresponding to zero density (the luminance of the light box seen through clear film). The criterion of producing equal brightness differences for equal log exposure differences means that the perceived brightness, B, should be a linear function of the log exposure, log H:
\[
B = a \log H + b. \qquad (17.5)
\]
The parameter a controls the contrast (or gamma) of the tone scale and the parameter b
controls the exposure or speed of the film. From the brightness model (Eq. (17.2)) and Eqs.
(17.4) and (17.5), we can construct the ideal tone scale curve for any given contrast a and
speed b. Figure 17.8 shows an example of such a construction. The first quadrant shows
the tone scale curve to be constructed. Quadrant II shows the relationship between the film
density and the luminance of an image area as described by Eq. (17.4). Quadrant III shows
the relation between luminance and perceived brightness as described by the Michaelis–
Menten function (Eq. (17.2)). Quadrant IV shows the linear relation (Eq. (17.5)) as required
to achieve Van Metter’s objective. A given exposure E 1 defines a point U1 on the brightness
vs. log exposure curve, which in turn defines a brightness point B1 . Point B1 defines point
U2 on the luminance vs. brightness curve. This in turn defines a luminance point L 1 . Point
L_1 then defines point U_3 on the luminance vs. density curve. Point U_3 defines a density D_1, and the pair (E_1, D_1) defines the point U_4 on the tone scale curve in quadrant I. Repeating this construction for a series of exposures traces out the complete visually optimized tone scale curve.
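The same construction can be carried out numerically instead of graphically. A sketch (with arbitrary choices of B_m, S, the contrast a, and the speed b; a is negative here because, on film, a higher exposure gives a higher density and hence a lower luminance and brightness) composes Eq. (17.5) with the inverse of Eq. (17.2) and then Eq. (17.4):

```python
import numpy as np

def visual_tone_curve(log_H, a, b, S, L_a, n=0.7, B_m=100.0):
    """Density vs. log exposure satisfying B = a*log H + b (Eq. 17.5), the
    Michaelis-Menten brightness model (Eq. 17.2), and L = S*10**(-D) (Eq. 17.4)."""
    L0 = 12.6 * L_a**0.63 + 1.083e-5
    B = np.clip(a * log_H + b, 1e-6, B_m - 1e-6)   # Eq. (17.5), kept invertible
    L = L0 * (B / (B_m - B)) ** (1.0 / n)          # inverse of Eq. (17.2)
    return np.log10(S / L)                         # D from Eq. (17.4)

S = 2000.0                        # luminance through clear film (cd/m^2), hypothetical
L_a = 0.2 * S                     # adapting luminance
log_H = np.linspace(1.0, 4.0, 200)
D_visual = visual_tone_curve(log_H, a=-23.0, b=94.0, S=S, L_a=L_a)
```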
Figure 17.9. The visually optimized (aim) tone scale curve for this application, plotted as density vs. log exposure × 1000.

The toe and the shoulder are rolled off using the Green–Saunders equation,
\[
D = D_{\min} + \frac{D_{\max} - D_{\min}}{1 + 10^{\beta(\log H_0 - \log H)}}. \qquad (17.6)
\]
We can use the above function to generate a smooth toe and a smooth shoulder, and paste
them on the visual aim curve in Fig. 17.9 to produce a smooth curve shown in curve (b)
of Fig. 17.10, so that the overall TRC will have a gentle roll-off in the shadow and in
the highlight, while the mid-tone region is the visually optimized tone curve. In order to
produce a smooth tone scale curve, we require that the visual tone scale curve and the
Green–Saunders curve be continuous in the first derivative. This is achieved through the
following replacement calculation.
Let D = V (log H ) be the visually optimized tone scale curve as determined by Eqs.
(17.2), (17.4), and (17.5). Let D = A(log H ) be the aim tone scale curve to be constructed
from D = V (log H ) by rolling off the toe and the shoulder.
Figure 17.10. Comparison between a visually optimized tone scale curve (b) and the KODAK
INSIGHT HC Thoracic Imaging Screen/Film tone scale (a), plotted as density versus log exposure × 1000.
Let Dt be the density where D = A(log H) starts to deviate from the ideal curve, D = V(log H),
at the toe, and let Gt be the slope at D = Dt:
\[ G_t = \left.\frac{\mathrm{d}A}{\mathrm{d}\log H}\right|_{D=D_t} = \left.\frac{\mathrm{d}V}{\mathrm{d}\log H}\right|_{D=D_t}. \tag{17.7} \]
Since D = V (log H ) can be numerically generated from Eqs. (17.2), (17.4), and (17.5), G t
can be numerically calculated as well.
Letting \( y = 10^{\beta(\log H_0 - \log H)} \), the Green–Saunders equation can be written as
\[ D - D_{\min} = \frac{D_{\max} - D_{\min}}{1 + y}. \tag{17.8} \]
Its derivative is
\[ \frac{\mathrm{d}D}{\mathrm{d}\log H} = \frac{(D_{\max} - D_{\min})(\ln 10)\,\beta\, y}{(1 + y)^2}. \tag{17.9} \]
From Eq. (17.8), the value of y at the transition density Dt is
\[ y_t = \frac{D_{\max} - D_{\min}}{D_t - D_{\min}} - 1. \tag{17.10} \]
Setting the derivative in Eq. (17.9) equal to Gt at y = yt gives
\[ \beta = \frac{G_t (1 + y_t)^2}{(D_{\max} - D_{\min})(\ln 10)\, y_t}. \tag{17.11} \]
Having determined yt and β, the only unknown left is log H0 , which can be found from:
log H0 = log Ht + (log yt )/β, (17.12)
where log Ht is the log exposure that maps to Dt when D = V (log H ) is generated.
Let Ds be the density where D = A(log H ) starts to deviate from the ideal curve, D =
V (log H ), at the shoulder, and G s be the slope at D = Ds . The above procedure can also
be applied to generate a roll-off shoulder with the Green–Saunders equation.
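The toe-fitting procedure of Eqs. (17.10)–(17.12) reduces to a few closed-form assignments once Dt, Gt, and log Ht have been read off the numerically generated visual curve. A minimal sketch follows; the numerical values in the usage example are illustrative only.

```python
import numpy as np

def green_saunders(log_H, D_min, D_max, beta, log_H0):
    """Green-Saunders curve of Eq. (17.6)."""
    return D_min + (D_max - D_min) / (1.0 + 10.0**(beta * (log_H0 - np.asarray(log_H))))

def fit_toe(D_t, G_t, log_H_t, D_min, D_max):
    """Match the Green-Saunders toe to the visual curve at density D_t.

    D_t     : density where the aim curve leaves the visual curve
    G_t     : slope dV/dlogH of the visual curve at that density
    log_H_t : log exposure that maps to D_t on the visual curve
    Returns (beta, log_H0) of Eq. (17.6).
    """
    y_t = (D_max - D_min) / (D_t - D_min) - 1.0                           # Eq. (17.10)
    beta = G_t * (1.0 + y_t)**2 / ((D_max - D_min) * np.log(10.0) * y_t)  # Eq. (17.11)
    log_H0 = log_H_t + np.log10(y_t) / beta                               # Eq. (17.12)
    return beta, log_H0

# Example with illustrative values: roll off the toe below density 0.9.
beta, log_H0 = fit_toe(D_t=0.9, G_t=2.5, log_H_t=1.2, D_min=0.2, D_max=3.0)
```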
Figure 17.10 shows an example of an aim tone scale curve generated to match the
slope and exposure of the KODAK INSIGHT HC Thoracic Imaging Screen/Film tone scale
curve at density 0.9. It can be seen that a higher contrast in the highlight and the shadow is
desirable for improving this screen/film system. However, this comes at the price of reducing
the exposure latitude of the film.
17.8 Flare correction

In our study of the tone reproduction process for computed radiography, we did not consider
the effect of viewing flare because we assumed that the x-ray images are viewed in a dark
room. The major effect of viewing flare is to reduce the perceptual contrast in the dark areas
in an image, similarly to the effect of camera flare (imaging flare). Since a system tone
scale curve cannot correct for localized flare, the best one can do is to assume a uniform
flare model and correct for the average effect. For example, let us assume that the ideal
camera exposure at a pixel without flare is H and the amount of flare is Hf; then the actual
camera exposure is H′ = H + Hf. In a linear–linear plot, the effect of flare is simply an
offset. However, we know that a log–log plot is perceptually more meaningful. Therefore,
the effect of flare is better shown in a log–log plot as in Figure 17.11. This clearly shows that
the slope in the dark areas of an image is greatly reduced, and so is the perceived contrast.
Compensation for the imaging flare in a tone reproduction process can be done simply by
relating the real exposure H′ to the ideal exposure H, i.e., H = H′ − Hf. Compensation for
the viewing flare is performed at the output end. If the intended output image luminance is L
and the viewing flare is Lf, the actual luminance the system should produce is L′ = L − Lf,
so that when the image is viewed, the viewing flare Lf will be added to L′ to produce the
intended luminance L.

Figure 17.11. The effect of camera flare in a log–log plot (relative film log exposure versus relative
scene log luminance, for 0.0%, 0.5%, and 1.0% flare). The percentage flare is relative to the reference
white at log luminance = 0.
The tone scale in digital images that are intended for viewing on a CRT monitor has to
be predistorted to compensate for the nonlinear tone reproduction characteristics of a CRT
monitor. This process is called gamma correction. The CRT TRC is well described by a
power function: L = A · (V + B)^γ, where L is the luminance on the monitor screen, V is
the input value to the monitor (usually normalized to 1 at the peak input), and A and B are
two constants. The value of γ varies from monitor to monitor, but is around 2.0–3.0. The
standard γ value is assumed to be 2.22 in most international standards. If a monitor is well
set up and carefully adjusted, B is usually set to zero, i.e.,
\[ L = A \cdot V^{\gamma} = A \cdot V^{2.22}. \tag{17.13} \]
The gamma correction curve is therefore the inverse function of the nonlinear TRC of the
CRT monitor, i.e.,
\[ V = Y^{1/2.22} = Y^{0.45}, \tag{17.14} \]
where Y is the image illuminance, which is linearly proportional to scene luminance. With
such a gamma correction, Eqs. (17.13) and (17.14) in combination give L = A · Y . Although
the relative scene luminance is reproduced on the CRT screen, the contrast of the resulting
images often appears too low and the colors too dull. There are three possible causes: (1)
camera flare, (2) viewing flare, and (3) dim viewing surrounds. In order to compensate for
flares, dark areas have to be driven darker than Eq. (17.14) would render them. In order
to compensate for dim viewing surrounds, the overall contrast has to be raised. There are
several ways to design a better gamma correction curve to accomplish these two goals. One
of the most popular methods is to use a linear segment in conjunction with the offset gamma
correction. Such a family of curves can be expressed as:
\[ V = \begin{cases} (1+\alpha)\,Y^{n} - \alpha & \text{for } 1.0 \ge Y \ge Y_0 \\ \beta\, Y & \text{for } Y_0 > Y \ge 0.0 \end{cases} \tag{17.15} \]
where α and β are determined so that the two equations have the same value and the same first
derivative at Y0 where they meet. The three parameters are related by the following relations:
\[ Y_0 = \left[\frac{\alpha}{(1-n)(1+\alpha)}\right]^{1/n}, \tag{17.16} \]
\[ \beta = n\,(1+\alpha)\,Y_0^{\,n-1}. \tag{17.17} \]
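The relations among α, n, Y0, and β can be checked numerically. The sketch below implements Eqs. (17.15)–(17.17), with Eq. (17.17) taken as the slope-continuity condition; with α = 0.099 and n = 0.45 it reproduces, to rounding, the break point 0.018 and linear slope 4.50 of the ITU-R BT.709 curve quoted later in Eq. (17.21).

```python
import numpy as np

def offset_gamma_params(alpha, n):
    """Break point Y0 and linear-segment slope beta, Eqs. (17.16)-(17.17)."""
    Y0 = (alpha / ((1.0 - n) * (1.0 + alpha)))**(1.0 / n)
    beta = n * (1.0 + alpha) * Y0**(n - 1.0)
    return Y0, beta

def offset_gamma(Y, alpha, n):
    """Gamma correction with an offset and a linear segment, Eq. (17.15)."""
    Y = np.asarray(Y, dtype=float)
    Y0, beta = offset_gamma_params(alpha, n)
    return np.where(Y >= Y0, (1.0 + alpha) * Y**n - alpha, beta * Y)

print(offset_gamma_params(alpha=0.099, n=0.45))   # approximately (0.018, 4.5)
```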
Empirical evidence shows that this family of functions produces very-high-quality images
when α is properly chosen to compensate for the camera flare, viewing flare, and viewing
surrounds. The only drawback is that flare compensation and contrast adjustment are coupled
in this function. If we want to keep the same highlight contrast and only darken the shadow,
another family of curves can be used:
\[ V = \begin{cases} Y^{n} & \text{for } 1.0 \ge Y \ge Y_0 \\ aY^3 + bY^2 + cY & \text{for } Y_0 > Y \ge 0.0 \end{cases} \tag{17.18} \]
where
\[ a = \frac{cY_0 + (n-2)Y_0^{\,n}}{Y_0^{\,3}}, \tag{17.19} \]
\[ b = \frac{(3-n)Y_0^{\,n} - 2cY_0}{Y_0^{\,2}}. \tag{17.20} \]
Again, the two curve segments have the same first derivative at Y0 where they intersect with
each other. There are two degrees of freedom here. We can choose a desired slope, c, at
the origin and the point of intersection Y0 . These two degrees of freedom are very useful
because c also controls the quantization interval in the shadow region, as defined by Y0 . For
example, if a gamma correction table is provided to convert an eight-bit input signal to an
eight-bit output signal, a straight implementation of Eq. (17.14) will map input code 0 to
output code 0, and input code 1 to output code 21, leaving the first 20 code values unused in
the output. If we use c = 6 and Y0 = 0.12 in Eq. (17.18), input value 1 will be mapped to 6,
and the two curves only differ below Y0 . In fact, Eq. (17.18) often produces better images
because it compensates for camera flare in the dark area. Equation (17.18) can also be used to
adjust the overall contrast by choosing a higher n value. Compared with Eq. (17.15), the
advantage of using Eq. (17.18) is that the contrast adjustment and flare compensation can be
explicitly controlled. The drawback of this family of functions is that the cubic polynomial
can produce a nonmonotonic curve and the final curve should be visually examined to see
if it is satisfactory.
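As a concrete illustration, the sketch below builds an eight-bit gamma correction table from Eq. (17.18) using the values c = 6 and Y0 = 0.12 quoted above, assuming n = 0.45 (the exponent of Eq. (17.14)); the coefficients a and b follow from Eqs. (17.19) and (17.20), and the final check reflects the caution about monotonicity at the end of the paragraph.

```python
import numpy as np

def cubic_segment_gamma(Y, n=0.45, c=6.0, Y0=0.12):
    """Gamma correction with a cubic shadow segment, Eqs. (17.18)-(17.20)."""
    Y = np.asarray(Y, dtype=float)
    a = (c * Y0 + (n - 2.0) * Y0**n) / Y0**3          # Eq. (17.19)
    b = ((3.0 - n) * Y0**n - 2.0 * c * Y0) / Y0**2    # Eq. (17.20)
    return np.where(Y >= Y0, Y**n, a * Y**3 + b * Y**2 + c * Y)

# Eight-bit to eight-bit gamma correction table.
codes = np.arange(256)
lut = np.rint(255.0 * cubic_segment_gamma(codes / 255.0)).astype(int)
assert np.all(np.diff(lut) >= 0), "cubic segment is not monotonic; re-tune c and Y0"
print(lut[0], lut[1])   # 0 and 6, instead of 0 and 21 for the pure power law of Eq. (17.14)
```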
The viewing conditions for television are also quite unique in that the room is often dim,
but not completely dark. Therefore, some ambient flare is always present. The luminance of
typical receivers is around 70–200 cd m−2 . From our discussions so far, we should be able
to develop a good TRC for television. We can take “reproducing the relative brightness” as
the objective and use the Bartleson–Breneman brightness model to derive the desired TRC.
It is also important to compensate for the viewing flare. By going through this exercise, we
will find that the resulting tone scale curve is not too far from the standard curves adopted by
various television systems. Let us look at one example of such standards. The international
standard (ITU-R BT.709-2) for encoding HDTV luminance channel is specified by the
following:
\[ V = \begin{cases} 1.099\,Y^{0.45} - 0.099 & \text{for } 1.0 \ge Y \ge 0.018 \\ 4.50\,Y & \text{for } 0.018 > Y \ge 0.0 \end{cases} \tag{17.21} \]
Figure 17.12. The TRC used in the HDTV luminance channel as specified by the international
standard (ITU-R BT.709), plotted as relative log monitor luminance versus relative log scene
luminance factor, together with the one-to-one tone reproduction line.
If we assume that the video signal is displayed on a CRT monitor with a gamma of 2.22 and
a viewing flare of 0.1%, the TRC for the HDTV images can be derived. The result is shown
in Figure 17.12. From the figure, it is obvious that the curve has a slope much higher than
1, as is required by the Bartleson–Breneman brightness model.
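The derivation behind Fig. 17.12 can be reproduced in a few lines: encode the relative scene luminance with Eq. (17.21), display it on a CRT following Eq. (17.13) with γ = 2.22 (taking A = 1 for relative luminance), add 0.1% viewing flare, and examine the slope of the resulting log–log curve. The sampling grid below is an arbitrary choice; the mid-tone slope comes out well above 1.

```python
import numpy as np

def bt709_encode(Y):
    """ITU-R BT.709 luminance encoding, Eq. (17.21)."""
    Y = np.asarray(Y, dtype=float)
    return np.where(Y >= 0.018, 1.099 * Y**0.45 - 0.099, 4.50 * Y)

def log_monitor_luminance(Y, gamma=2.22, flare=0.001):
    """Relative log monitor luminance: the CRT of Eq. (17.13) plus 0.1% viewing flare."""
    V = bt709_encode(Y)
    return np.log10(V**gamma + flare)

log_Y = np.linspace(-3.0, 0.0, 301)
log_L = log_monitor_luminance(10.0**log_Y)
slope = np.gradient(log_L, log_Y)
print(slope[200])   # mid-tone slope (at log Y = -1), noticeably greater than 1
```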
17.10 Problems
17.1 A scene is reproduced as a reflection print (to be viewed in a light surround) and a
transparency (to be viewed in a dark surround). The minimum density of the paper is
0.1 and that of the transparency is 0.2. If the objective of the tone reproduction is to
reproduce the relative brightness, we learned from the Bartleson–Breneman brightness
model that the reflection print should have a gamma of 1.0 and the transparency should
have a gamma of 1.5. We will assume that there is no camera flare, that the TRCs are
straight lines, and that the minimum density of each medium serves as the reference
white.
(a) If there is no viewing flare and given that an object in the scene is reproduced at
density 1.8 on the reflection print, at what density should it be reproduced on the
transparency?
(b) Let us assume that the amount of viewing flare is 1% of the reference white. At
what density should the object in (a) be printed on the reflection print?
(Hint: The viewing flare is added to whatever light is reflected from the print. The
density of a reflection print is measured with a 0/45 geometry and thus is not affected
by the viewing flare.)
17.2 The maximum luminance of a bright outdoor scene is 30 000 cd m−2 . We want to
make a transparency to be projected on a screen and viewed in a dark room. The
maximum luminance produced by the projector on the screen when no film is present
is 500 cd m−2 . The transparency (slide) film has a maximum density of 3.4 and a
2 Since flare is never uniform, these code values are still useful in areas not affected by flare.
18 Color reproduction
18.1 Introduction
In a typical natural scene, the reflected spectral radiance at each surface element of an
object is determined by the spectral reflectance factor of the surface and the spectral power
distribution of the illumination. Therefore, there are very rich varieties of spectral distribu-
tions that produce colorful images in our visual perception. In a reproduced image, either
hard-copy or soft-copy, we cannot create colors using such rich varieties of spectral com-
positions. Fortunately, due to the trichromatic nature of our color vision, we can reproduce
most colors by using mixtures of a few (≥3) primary colors. In general, colors can be re-
produced either by using self-emitting light (e.g. in CRT monitors), or by using reflected or
transmitted light (e.g. in paintings, prints, and slides). If self-emitting light is used, different
colors are often produced by additively mixing the chosen primaries, which is similar to
what is done in color matching experiments. For example, on a CRT monitor, very small
dots of red, green, and blue phosphors are coated on the screen. When the phosphors are
excited by more electron current, they generate more red, green, and blue light. Since the
phosphor dots are visually unresolvable at a typical viewing distance, the red, green, and
blue light is additively mixed (or blurred) on our retina. This type of color reproduction is
called additive color reproduction.
When reflected or transmitted light is used for color reproduction, the light sources are
often broad-band sources that contain radiant power in all visible wavelengths. Typically
these sources appear “white” or nearly so when they are seen directly. In order to produce
colors, the spectral composition of the source has to be modified, usually by filtering out
some parts of the spectrum more strongly than other parts. For example, if a red color is to
be produced, the middle-wavelength and the short-wavelength parts of the source spectrum
are often filtered out or subtracted out. Color reproduction by subtracting light energy
from a source spectrum is called subtractive color reproduction, and is the basic working
principle underlying the majority of color imaging applications. For example, a photographic
reflection print uses a cyan dye that absorbs (i.e., subtracts) the long-wavelength light (≈600–
700 nm), a magenta dye that absorbs the middle-wavelength light (≈500–600 nm), and a
yellow dye that absorbs the short-wavelength light (≈400–500 nm). Light illuminating a
reflection print goes through the dyes, is reflected from the diffuse reflection layer coated
on the paper support, and goes through the dyes again before passing through the air and
entering our eyes. If we want to produce a bright red color, we deposit on the paper no cyan
dye, but a lot of magenta and yellow dyes, which absorb most of the light from 400 nm to
600 nm. Similarly, if we want to produce a bright green color, we deposit no magenta dye,
but a lot of cyan and yellow dyes, on the paper.
In principle, we can also use additive color reproduction for reflected or transmitted light.
For example, if we can lay down nonoverlapping, opaque dots of red, green, and blue inks
on paper, color images can be reproduced in a manner similar to that in a CRT monitor.
Colors can be varied by changing the relative dot sizes. However, such a scheme has one
major difficulty: small dots are difficult to lay down without overlapping. Therefore, most
color reproduction for reflected or transmitted light is based on the subtractive method.
As in our discussion of the objectives of tone reproduction, the success of any color re-
production is finally judged by human observers; therefore, it is convenient to also divide
any color reproduction into three processes: (1) the subjective color reproduction process
that specifies what a desired reproduction should be in terms of visual impression, (2) the
psychophysical (translation) process that converts the perceptual criteria as specified in the
subjective process into physically quantifiable criteria, and (3) the objective process that
deals with calibrating and controlling image devices to achieve the desired reproduction
in terms of physical quantities. The major difference between tone and color reproduction
objectives is that the former is univariate, while the latter is multivariate. Color reproduction
is thus more complicated and even less understood than tone reproduction. Fortunately, we
are not as sensitive to deviation from the desired color reproduction as we are to deviation
in tone reproduction. In spite of the added complexity, the criteria and objectives for color
reproduction can be phrased in a similar list to those for tone reproduction: (a) reproducing
the absolute color stimuli (i.e., either the absolute radiant spectral powers, or the absolute
colorimetric quantities), (b) reproducing the relative color stimuli (i.e., either the relative
radiant spectral powers, or the relative colorimetric quantities), (c) reproducing the color ap-
pearances, (d) reproducing color contrasts, (e) reproducing maximum visible color details,
and (f) reproducing the preferred colors.
Objectives (a) and (b) do not involve any subjective color reproduction process and
therefore are self-evident in what they are trying to achieve. Whether they are practical or
desirable depends on the particular application. For example, in copying a reflection color
print to the same type of material (including the same dye set or inks), it is both desirable
and practical to reproduce the exact spectral compositions in the original. When the media
or inks are different, reproducing the relative colorimetric quantities may be both desirable
and practical too. However, when the originals (say, natural scenes) and the reproductions
(say, photographic prints) are very different in dynamic range, color gamut, and spectral
characteristics, reproduction objectives such as (a) and (b) are no longer practical and often
not desirable either.
The objectives (d) and (e) are achievable when (a) and (b) are achievable. The interesting
cases are when (a) and (b) are not practical or desirable. We can try to achieve (d) and (e)
using methods similar to the histogram modification methods in tone reproduction. There
are few (if any) serious studies of these two objectives in the literature and we will not
discuss them further here. Objectives (c) and (f ) have attracted more attention in the past
and some of the ideas about them are presented below.
Preferred colors may also deviate from memory colors due to some ideal prototype colors that we prefer to see. For example, we may prefer to see
our body shape as more like that of a fashion model, although we do not have the illusion
that that shape is what we remember of our bodies. In any case, an ideal objective of color
reproduction is to reproduce all colors as we prefer to see them.
This objective is difficult to achieve not only because its goal is ambiguous, but also
because it is not easy to get good data to aim for. Some of the difficulties are:
1. Experiments only varied the selected colors without changing others. This type of color
manipulation is not globally consistent and therefore the results may vary with the color
gamut of the reproduction media and the viewing conditions.
2. The physical difference in skin colors for different peoples is relatively large, as the
following measurement data (forehead/cheek) [246, 759, 1055] show:
African: L∗ = 37.6 ± 1.3, a∗ = 6.9 ± 1.4, b∗ = 10.7 ± 2.3;
Arabian: L∗ = 61.5 ± 2.3, a∗ = 5.6 ± 1.1, b∗ = 17.3 ± 1.8;
Caucasian: L∗ = 66.3 ± 2.8, a∗ = 11.2 ± 0.9, b∗ = 12.3 ± 1.8;
Japanese: L∗ = 60.7 ± 4.37, a∗ = 10.8 ± 2.36, b∗ = 17.1 ± 2.19;
Vietnamese: L∗ = 65.1 ± 3.1, a∗ = 5.4 ± 0.8, b∗ = 15.4 ± 1.1.
3. The preferred color reproduction varies with culture. For example, it has been shown
that the preferred skin colors are different in different regions of the world, and possibly
at different times. Ancient people preferred lighter skin color (a sign of being rich
enough not to have to work out in the sun), while modern people prefer a suntanned
skin color (a sign of being rich enough to spend time on the beach). Certain regions
of the world prefer a more cyan-blue (cool) color balance, while there are others that
prefer a more red–yellow warm color balance.
Bartleson’s study [73] on flesh color reproduction showed that the preferred (Caucasian)
flesh color reproduction is close to the mean memory color (x,y) = (0.3548, 0.3441) under
Illuminant C, which is significantly different from the average natural flesh color, located
at about (x,y) = (0.3786, 0.3463) under Illuminant C. The preferred flesh reproduction
(about Munsell 7.5YR6.3/3.3) is yellower, lighter, and less saturated than the actual average
flesh color (about Munsell 1.5YR5.8/4.3). However, that study was done in the 1950s and
may not be representative today. The preferred skin color reproduction also varies with
races. Assuming D55, the preferred skin colors in CIELAB are approximately (a ∗ ,b∗ ) =
(19–27, 21–31) for the Japanese, (16–25, 14–23) for Caucasians, and (20–32, 24–37) for
blacks. It is also observed that there are considerable variations between individuals. All
races seem to like to have skin tone reproduced yellower than it actually is. The empiri-
cal observations in preferred color reproduction studies [74, 428] can be summarized as
follows:
1. The preferred skin color in reflection prints is more orange–yellow than the actual
average skin color. The preferred skin color is quite close to the memory skin color.
Reproducing the actual average skin chromaticity on a reflection print is only marginally
acceptable. The preferred skin color (luminance factor 39) is also slightly lighter than
the average skin (luminance factor 37).
2. The preferred grass color (luminance factor 27) is slightly yellower and lighter than
average grass samples (luminance factor 13, 5.6GY/1-4). The preferred grass color is
more yellow–green than the memory grass color.
3. The preferred sky color (luminance factor 30) is about the same dominant wavelength
as the actual sky, but with higher excitation purity. It is more purple–blue than the
memory sky color.
In most applications, the objective of color reproduction is mainly to reproduce the color
appearance. We have to understand how our perceived color appearances are related to the
physical stimuli. Through many elegant demonstrations by numerous color scientists in the
past, it has been firmly established that there is no one-to-one correspondence between a
physical stimulus and its perceived color. A given light can appear to be red or orange or
black or white in color, depending on what else is present in our visual field at that time and
what we had been looking at in the previous few minutes or even much longer. (The famous
McCollough effect [54, p. 183] can last for a few hours or more.) Color appearance is
thus a function of spatial, temporal, as well as spectral variables. Currently we only have
a few very limited ideas of how this function behaves. Most of these ideas came from
extensive observations and systematic studies under controlled viewing conditions. It is not
surprising that they are applicable only in some situations and they are not easy to generalize.
However, they currently constitute the bulk of guidelines for color reproduction, and, in a
few applications, they are very successful in achieving the objective of reproducing color
appearance. In a very real sense, all sciences follow the same path, some further along than
others, but none can claim to have the ultimate truth.
The three major factors that affect the color appearance of an image are: (1) the state of
adaptation (chromatic as well as achromatic) of the eye(s), (2) the surround of the image,
and (3) the method of presentation. The major efforts of color science research have been
to quantify these three factors and their effects on color appearance.
When the eye is adapted to D65 daylight illumination, light with the CIE tristimulus values (X1, Y1, Z1)
appears to be a certain purple color P. If now, the eye is adapted to a tungsten illumination,
say the CIE illuminant A, and suppose that light with tristimulus values (X 2 , Y2 , Z 2 ) looks
exactly like the same purple color P, then, (X 2 , Y2 , Z 2 ) is said to be the Illuminant A’s
corresponding tristimulus values of the D65 ’s (X 1 , Y1 , Z 1 ). Hunt [425, 426] and Burnham,
Evans and Newhall [148] determined many such pairs by matching colors between observers
who were adapted to different illuminations. It was found that a linear transformation could
fit the data very well. For example, one such linear transformation mapped Illuminant C’s
tristimulus values (Xc, Yc, Zc) to the corresponding values (Xa, Ya, Za) under Illuminant A
adaptation.
The chromaticity of the Illuminant A used was (0.4475, 0.4084), and that of the Illuminant
C was (0.3125, 0.3343). Under both illuminants, the luminance of the white surrounds was
set to 25 foot lamberts (i.e., 85.65 cd m−2 ). Using the linear transformation, the white surface
under Illuminant C, (X c , Yc , Z c ) = (23.3697, 25.0, 26.4134) was mapped to (X a , Ya , Z a ) =
(26.6476, 23.5446, 9.3975), which has a luminance smaller than 25 foot lamberts and a
chromaticity of (0.4472, 0.3951), slightly different from that of the Illuminant A. Assuming
that the difference was real, i.e., larger than the regression error (in this case, this assumption
is only marginally true), the same white object looked a little brighter and a little more
yellow under tungsten lighting. The phenomenon that a white object takes on the hue of the
illuminant is known as the Helson–Judd effect [389, 390, 475], as we mentioned before. (This
shows that color constancy is not strictly true.) The Helson–Judd effect also predicts that a
dark object will take on a complementary hue, but this is not predicted exactly by the above
linear transformation. For example, (X c , Yc , Z c ) = (0.9348, 1.0000, 1.0565) is mapped to
(X a , Ya , Z a ) = (1.0682, 0.9442, 0.3741) which has a luminance of 0.9442 ft lamberts and
a chromaticity of (0.4476, 0.3956), not too different from the corresponding chromaticity
for the white object. Evidence seems to indicate that linear transformations cannot fully
describe the effect of adaptation [79, 432, 475], but they are good approximations if the
correct coefficients can be determined [138].
There has been extensive experimental and modeling work on the effect of chromatic
adaptation [81, 138, 291, 292, 432, 698, 998]. The chromatic adaptation transforms in the
two color appearance models, CIECAM97s and CIECAM02, recommended by the CIE
provide us with some standard models for color imaging applications. We will study these
two models shortly. However, it should be pointed out that current color appearance models
are empirical in nature, do not account for the spatial processing involved in color vision,
and the model parameters have to be tuned by trial and error. This unsatisfactory status will
hopefully be improved by developing new models that are more based on computational
theory and algorithmic design.
Figure 18.1 shows how a typical chromatic adaptation model is applied to find the
corresponding tristimulus values from one illuminant to another. The first step is to transform
Figure 18.1. Computation of corresponding tristimulus values using a chromatic adaptation model.
(X1, Y1, Z1) are converted by a 3 × 3 matrix to cone excitations (L1, M1, S1); the chromatic adaptation
model for the first viewing condition maps these to adapted (normalized) signals (La, Ma, Sa); the
inverse model for the second viewing condition then gives (L2, M2, S2) and, through the inverse
3 × 3 matrix, the corresponding tristimulus values (X2, Y2, Z2).
the CIEXYZ to the LMS cone excitation space. This step is necessary because it is believed
that the nonlinear behavior of cone adaptation can be much more accurately modeled in
the cone excitation space. The chromatic adaptation model is then applied to determine the
corresponding cone excitations under a different illuminant. Many chromatic adaptation
experiments using simple color images showed that in general the von Kries model works
reasonably well [79, 132, 133, 138]. One of the surprising findings in Breneman’s study
[138] of chromatic adaptation using complex images is that the simple von Kries model
(described in Section 13.3.2) can account for most of the effect, but the scaling coefficients
have to be based on the matching grays, not reference whites, under the two illuminants [138].
Figure 18.2 shows a comparison of von Kries’ model predictions based on normalizing
to the physical white (Fig. 18.2(a)) and normalizing to the perceived gray (Fig. 18.2(b)).
His data showed that the chromaticity coordinates of the perceived gray are different from
those of the illuminant. It is clear that the prediction is much more accurate if the von
Kries normalization is based on the matching grays (Fig. 18.2(b)) under the two different
illuminants.
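A minimal sketch of the flow in Fig. 18.1 with a von Kries-type transform is given below: tristimulus values are taken to cone-like responses, each channel is scaled by the ratio of the responses of the normalizing stimuli under the two illuminants, and the result is taken back to tristimulus values. The matrix used here is the cone-response matrix MH that appears later in Eq. (18.50), an assumption of this sketch; per Breneman's finding, the normalizing stimuli should be the matching grays rather than the reference whites.

```python
import numpy as np

# XYZ -> cone-like responses (the matrix M_H of Eq. (18.50))
M_H = np.array([[ 0.38971, 0.68898, -0.07868],
                [-0.22981, 1.18340,  0.04641],
                [ 0.00000, 0.00000,  1.00000]])

def von_kries_corresponding(XYZ, XYZ_gray_1, XYZ_gray_2):
    """Corresponding tristimulus values under illuminant 2 for a stimulus
    seen under illuminant 1, using von Kries scaling normalized to the
    matching grays (XYZ_gray_1, XYZ_gray_2) under the two illuminants."""
    lms = M_H @ np.asarray(XYZ, dtype=float)
    g1 = M_H @ np.asarray(XYZ_gray_1, dtype=float)
    g2 = M_H @ np.asarray(XYZ_gray_2, dtype=float)
    lms_2 = lms * (g2 / g1)              # independent gain control per channel
    return np.linalg.solve(M_H, lms_2)   # back to XYZ under illuminant 2
```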
A reproduction presented under low illuminance gives us a very different visual impres-
sion than under high illuminance. The following three effects are well known:
1. Under chromatic illumination (chromaticity very different from D55 ), when an observer
is presented with a series of nonselective reflectance samples, the high-reflectance
samples appear to take on the color of the illuminant and the low-reflectance samples
appear to have a hue complementary to the illuminant. Only the medium-reflectance
samples appear neutral gray.1 This is called the Helson–Judd effect [389, 390, 475].
2. Light of a given chromaticity appears to be more colorful when its luminance is in-
creased. This is called the Hunt effect [425, 426].
3. The perceived scene contrast increases with luminance. This is called the Stevens effect
[906].
1 This empirical observation seems to contradict the data from Breneman’s experiments, in which complex
images were used. It is possible that the observers in the two experiments might be answering two different types
of question [30]. This shows that our visual perception is much richer than some simple experimental protocols
can specify.
Figure 18.2. Comparison of von Kries’ model predictions in the (u′, v′) chromaticity diagram, using
Breneman’s experimental data [138]: panel (a) is based on normalizing with respect to the illuminant,
and panel (b) with respect to gray. The luminance level was kept constant at 1500 cd m−2. The solid
circles are the chromaticities under illuminant A. The open squares are the matching chromaticities
under D65 as set by the observers. The open triangles are the model predictions. The prediction errors
are the distances between the open squares and the open triangles.
The American National Standards Institute (ANSI) ANSI PH2.30-1989 specifies the
standard viewing conditions for graphic arts, with regard to color prints, transparencies, and
photomechanical reproductions. The light source should have a spectral power distribution
of D50 (or close to it) with CIE (1931) chromaticity coordinates of (0.3457, 0.3586). The
surround should be a neutral light gray of Munsell N8/ (60% reflectance factor) with matte
finish. The illuminance on the color prints and the photomechanical reproductions should
be 2200 ± 470 lx, and the luminance for the transparencies (direct viewing or projection)
should be 1400 ± 300 cd m−2 . The luminance of the projected image is measured from
50 cm away from the screen which should have a spatial resolution of at least 40 cycles per
millimeter.
The following recommendation by some GATF publications (GATFWORLD, Volume 2,
Issue 6, p. 6, 1990) shows how primitive our state of understanding is:
In the initial evaluation stage, when materials are being selected for reproduction, judgments should
be made under the point-of-use illuminating conditions. During the reproduction process, however,
standard viewing conditions should be used to improve the consistency and communication between
all the parties involved.
It is not clear how we can compensate for the point-of-use illuminating conditions after
we have done all the color reproduction process under the standard viewing conditions.
Presumably, if the point-of-use illuminating conditions are not convenient to set up for
a color reproduction process, using standardized viewing conditions will at least prevent
inconsistent conclusions from different parties.
18.5 Color balance

In most color imaging applications, once the imaging chain (from capture to display) has
been calibrated for a standard imaging and viewing condition, the first order problem in
color reproduction is to reproduce the neutral colors correctly for each input image. This is
called the white balance problem in consumer electronic imaging (video camcorders, digital
cameras, desktop scanners, etc.). A similar, but more general, problem in photography and
graphic arts is called the color balance problem, which tends to also include adjusting some
color interaction to make all the colors look right.
The factors that cause a calibrated system to give poor color reproduction vary from
system to system, but they can be classified into three main categories:
1. Variations in scene illumination. A calibrated system expects a neutral color to be
represented by certain numerical relations in each color channel. For example, the red,
green, and blue gray levels of a neutral color are expected to be the same for a calibrated
color monitor. The relative gains in the three color channels have to be adjusted in the
captured images or in the image capturing devices for different scene illuminations. For
example, a tungsten light source has much less spectral power in the short-wavelength
region than a daylight source. Therefore, the blue channel gain has to be higher when
the image is taken under a tungsten illumination than when it is taken under a daylight
illumination. Digital cameras attempt to make such adjustments in the image capturing
process, while photographic systems tend to correct the color bias at the printing stage.
2. Variations in image capture processes. Imaging system characteristics often change
with temperature, usage, exposure time, aperture size, and luminance levels. These
variations can introduce uncontrolled color variations in the image capture stage. For
example, underexposed films have lower sensitometrical contrast and underexposed
digital camera images exhibit clipping and quantization. These affect color reproduction
differently depending on which color channel has the lowest exposure. Other factors
are random in nature. CCD sensor characteristics depend on the operating temperature
and film sensitometry depends on chemical processing. These factors vary in a more
or less uncontrolled manner.
3. Variations in image display processes. The characteristics of display systems and print-
ers are also subject to change with the physical environment and the usage parame-
ters. The output color of an inkjet printer can be affected by the quality of the paper
and the ink. Photographic paper processing can change from day to day and batch to
batch. The characteristics of a color monitor can be affected by the temperature and
the electromagnetic fields in the surrounds. These can all cause shifts in reproduced
colors.
For most systems, the dominating factor is the variations in scene illumination. This will
be discussed here. It should be noted that the color balance problem is related to the “color
constancy” problem in human color vision. The objective is to study how the color variations
due to changing illumination can be discounted or compensated for. The fact that our visual
system can perceive and discount the illumination variations serves as an existence proof
that the problem should be solvable to a large extent. However, as we will see, we are not
anywhere near this yet.
\[ E_r(x, y) = N(\theta, \phi, x, y) \int L(\lambda, x, y)\, \rho(\lambda, x, y)\, S_r(\lambda)\, \mathrm{d}\lambda, \]
\[ E_g(x, y) = N(\theta, \phi, x, y) \int L(\lambda, x, y)\, \rho(\lambda, x, y)\, S_g(\lambda)\, \mathrm{d}\lambda, \]
\[ E_b(x, y) = N(\theta, \phi, x, y) \int L(\lambda, x, y)\, \rho(\lambda, x, y)\, S_b(\lambda)\, \mathrm{d}\lambda, \]
where Er, Eg, and Eb are the image irradiances, L(λ, x, y) is the spectral power distribution
of the illuminant, ρ(λ, x, y) is the spectral reflectance factor of the object point correspond-
ing to the image location (x, y), and Sr (λ), Sg (λ), and Sb (λ) are the red, green, and blue
system spectral responsivity functions. The geometric factor N(θ, φ, x, y) accounts for the
effects of f -number, surface orientation, light source direction, object distance, etc. This
system of equations is not completely general (for example, the spectral and the geometrical
factors in general are not separable), but is sufficient for our discussion here.
From this set of equations, one can say that the most general problem of color bal-
ance is that, given a color image (E r (x, y), E g (x, y), E b (x, y)), one would like to estimate
L(λ, x, y) and ρ(λ, x, y). We will call this type of problem formulation the (illuminant)
spectra estimation problem. Clearly, the number of unknowns in this formulation far ex-
ceeds the number of measured data, and the problem is often solved by assuming that L(λ)
and ρ(λ) are linear combinations of a small number of basis vectors. Such an approach is
called the linear model approach.
The color balance problem can and should also be looked at from the utility point of
view. The practical purpose of solving the color balance problem is to use the solution to
correct the color reproduction of an image which was taken under an unknown illuminant
different from the one we will use to view the image, or the one we assume to be the
standard illuminant. General color correction can be done through various means of different
complexity. For example, scanners often use 3-D LUTs, while digital cameras use 3 × 3
matrices. However, in correcting for color balance error, 3 × 3 matrices are probably as
complicated as is practicable. Therefore, one can formulate the color balance problem as
estimating the optimal 3 × 3 matrix that will transform the given color image into one
that has the best desired color reproduction according to the user’s objective. This problem
formulation is called the (illuminant) matrix estimation problem. One series of the most
critical colors to be balanced correctly is the neutral series (black, gray, to white). Therefore,
the estimated matrix should transform the neutral colors under the unknown illuminant to
the neutral colors under the desired illuminant.
A simpler formulation of the problem is to estimate only the three channel responses
(Rn, Gn, Bn) corresponding to a neutral surface of which ρ(λ) = c:
\[ R_n = k \int L(\lambda)\, c\, S_r(\lambda)\, \mathrm{d}\lambda, \qquad G_n = k \int L(\lambda)\, c\, S_g(\lambda)\, \mathrm{d}\lambda, \qquad B_n = k \int L(\lambda)\, c\, S_b(\lambda)\, \mathrm{d}\lambda. \]
It is usually possible to estimate only the ratios between Rn , G n , and Bn because the im-
age irradiances are scaled by sensor sensitivity, lens transmittance, and other unknown
factors, all of which are lumped into the factor k. We will call this type of problem formula-
tion, the (illuminant) chromaticity estimation problem, because the objective is to estimate
the illuminant chromaticity, such as (Rn /G n , Bn /G n ). Even for this simplified problem,
it is still fairly easy to see that without additional information or constraints, there is no
unique solution to this problem. For example, if we multiply L(λ) by a positive function
β(λ) and divide all ρ(λ) by the same function β(λ), we get exactly the same input image
(Er(x, y), Eg(x, y), Eb(x, y)), but now the correct answer for (Rn, Gn, Bn) becomes a different
set of values, corresponding to the modified illuminant L(λ)β(λ). Additional information or
constraints are therefore needed; those that have been used include the following:
• Color gamut
By comparing the color gamut of the input image under the unknown illuminant with the
previously compiled standard color gamut under a standard illuminant, we can compute
all the possible linear transforms that can map the input color gamut into the standard
gamut. From the feasible set of transforms, we can use some criterion to select the
transform that best accounts for the gamut distortion [311, 315, 322].
• Neutral interface reflection (specular highlight)
The interface reflection component of an inhomogeneous surface has the same spectral
composition as the illuminant. Therefore, the chromaticity loci of various surfaces
will converge at the illuminant chromaticity [505, 560, 934].
• Natural light sources
The chromaticity loci of natural light sources are located near those of blackbody
radiators [244, 312, 315, 561].
• Eigenvector (linear) models of sources and reflectances:
It is known that natural daylight spectra can be well approximated by one mean vector
and two principal components [476]. It is also known that spectral reflectance
functions of common materials and Munsell color chips can be well approximated
by one mean vector and 2–6 principal components [200, 630]. Therefore, the
illuminant spectral estimation problem can be cast as a problem for estimating
the few coefficients of the dominant principal components [270, 629, 825, 939].
In addition, the distribution of the coefficients has a definite structure that can be
used to reduce the number of unknowns for estimation [254, 255].
• Machine-learning models:
If we collect a very large sample of color images taken under various illuminants, we
can use them to train a learning machine, such as a neural network, to do the
illuminant estimation [332].
In addition to the average density, other simple features such as the minimum density, the
maximum density, and various other combinations are used in the regression optimization
of the algorithm performance [206, 472, 945, 972]. As memory devices become cheaper
and computer processors become more and more powerful, algorithms are designed to be
more intelligent in an attempt to recognize objects and scene types in the images and adjust
color balance accordingly. For example, detecting faces and skins [826, 932] in the images
can be used to help produce a pleasing skin tone or correct for the overall lightness of the
print. Detection of sky, backlit, flash, snow, or beach scenes allows the color and density
balance algorithm to adjust its estimated correction, depending on the scene types.
Land’s retinex theory has had a great impact on the research on color constancy because it emphasizes the importance
of edges in color perception, it insists on using the reflectance ratio (not the tristimulus
values) as the major predictor for perceived lightness and color, and it pioneers the use of a
computational algorithm for predicting visual perception. As a result, it has stimulated many
other research projects that eventually go well beyond its original, oversimplified operational
description [434, 889]. Many algorithms today claim to be better implementations of retinex
theory, but in reality, there have been few experiments [648, 650] aiming at verifying the
most important claim in the theory that normalization to the maximum channel reflectance
is “the” operation for color constancy.
Suppose that the three color channels of an imaging device are quantized to eight-bit precision, i.e., R,G,B = 0, 1, . . . , 255. For any color image captured by the device,
it turns out that many of the (R,G,B) values will not occur. The total volume of the (R,G,B)
values that occur in an image is called the color gamut of that image. The color gamut of
one image is often different from that of another image for various reasons. For example,
an image of a forest scene may not have saturated red or blue colors, while an image of
Christmas decorations may have a much larger selection of colors. The color gamut can also
differ because of the illumination. If an image is taken under tungsten lighting, the whole
color gamut will be shifted in the red–orange direction. This color shift occurs for every
object in the scene. Assuming that a scene contains a sufficient number of colored objects,
intuitively one should be able to make some estimate of the illuminant color by comparing
the color gamut of the input image with the canonical gamut2 under a reference illuminant,
say D65. Under tungsten lighting, even the most bluish color cannot have a very high blue
value. This is the basic idea behind the gamut mapping algorithm for color constancy [322].
Clearly, there are two obvious difficulties with this approach: one is the statistical validity
and the other is the computational cost. If the color gamut of an input image does not have
any pixels that have high blue values, is it because there was no bluish object in the scene
or is it because the scene was illuminated by tungsten lamps? This type of ambiguity is
not unique to this approach, but is inherent in the color balance problem because of its
underdetermined nature. In the gamut mapping algorithms, this ambiguity translates into a
large set of illuminants that are all compatible with the input color gamut. For linear imaging
sensors, the color gamut of an image can be shown to be a convex set. In order to speed up
the computation, the input gamut and the canonical gamut are both approximated by their
convex hulls and the mapping between the two gamuts is checked only for the vertices of
the two polyhedrons. The set of all the linear transforms that can map all the input vertices
into the canonical gamut is called the feasible set. We then need a selection criterion to
choose the best mapping from the feasible set. For example, we can choose the mapping
that maximizes the volume of the transformed gamut or the one that is closest to the natural
daylight locus. The question is: how well does it work in practice? As one would expect,
the performance of the algorithm is quite sensitive to the choice of the canonical gamut and
the criteria it uses to select the best guess from the feasible set. What is surprising is that,
from the limited experiments reported, the algorithms can work reasonably well [57, 955].
A different implementation of the gamut mapping idea is to cast the problem into one
of illuminant classification [315, 955]. If we compare the input gamut with the gamuts of
a set of illuminants we can choose the illuminant which generates the gamut that is most
similar to the input gamut. The gamut comparison can be done very quickly by segmenting
the input color image into a few tens of color regions and computing the convex inclusion
relation only on those few tens of colors.
2 Unfortunately, there are no safe rules as to how this canonical gamut can be constructed. In practice, it is
usually synthesized by whatever color samples are available to the implementer of the algorithm.
As the illuminant changes, the color distribution of an image is shifted and skewed, as we discussed in
the gamut mapping algorithms. If we collect the various color distributions under different
illuminants, we can use the Bayesian rule to estimate the likelihood that an input image
was taken under any illuminant and choose the most likely illuminant as an estimate of the
scene illuminant [133]. This is called the Bayesian estimation approach.
A very interesting way of implementing the Bayesian illuminant estimation approach is
by a framework called color by correlation [313]. The first thing to notice is that for color
balance applications, it seems sufficient to partition natural light sources into about 10–20
different illuminants, depending on the precision required for the estimate. For each of these
illuminants, many color images can be taken to compile the probability distribution that a
color will occur under an illuminant. A correlation matrix is then put together, in which each
column represents one illuminant and each row represents a color. A given element in the
correlation matrix thus contains the probability of occurrence of the color represented by
that row and under the illuminant represented by that column. An input image is represented
as a column vector, with the ith element representing the same color as represented by the
ith row in the correlation matrix. If a color is present in the image, the element is set to
1, otherwise, it is set to 0. The input column vector is then correlated with each column
of the correlation matrix (element-by-element multiplication and summation) to derive the
likelihood of the illuminant corresponding to the column. The operation basically adds up
all the probabilities of the colors present in the image. The most likely illuminant is chosen
as the illuminant estimate. This color-by-correlation algorithm is very fast and is claimed
to be quite accurate. It is very important to note that each color present in an image is only
counted once and therefore, this method is not sensitive to large areas of dominant color. A
threshold is imposed so that an accidental color pixel will not cause the probability of that
color to be added.
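A sketch of the color-by-correlation framework just described is given below; the color quantization scheme, the threshold value, and the function names are assumptions of the sketch, not part of the published algorithm. The essential operations are that each color present in the image is counted only once and that the per-illuminant likelihood is the sum of the probabilities of the colors present.

```python
import numpy as np

def estimate_illuminant(color_indices, corr_matrix, min_count=4):
    """Color-by-correlation illuminant estimation (a sketch).

    color_indices : quantized color index of every pixel in the image
    corr_matrix   : (n_colors, n_illuminants) array; entry (i, j) is the
                    probability that color i occurs under illuminant j
    min_count     : pixels required before a color is considered present,
                    so that an accidental pixel does not switch a color on
    """
    n_colors = corr_matrix.shape[0]
    counts = np.bincount(np.asarray(color_indices), minlength=n_colors)
    present = (counts >= min_count).astype(float)   # each color counted once only
    likelihood = present @ corr_matrix              # sum of probabilities per illuminant
    return int(np.argmax(likelihood))               # index of the most likely illuminant
```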
18.6 Color appearance models

One of the most intuitive objectives in color reproduction is to reproduce the color ap-
pearance. The difficulty of course lies in computing the color appearance from the original
and predicting it in the reproduction. An accurate computational theory or algorithm for
computing color appearance requires a thorough understanding of how our color percep-
tion works. Undoubtedly, this will involve spatial and temporal processing, in addition to
spectral processing. Since this is not achievable in the foreseeable future, even an over-
simplified color appearance model for uniform color patches in a uniform background is
useful for some applications. Several color appearance models for such applications have
been proposed and tested. They are well described and compared in the literature (e.g., see
[292]).
According to CIE Technical Committee 1-34 (TC1-34), a color appearance model is
any model that includes predictors of at least the relative color appearance attributes of
lightness, chroma, and hue [292, p. 217]. Since changes in illumination alter most quantities
from physical measurements, a basic requirement of any color appearance model is to
account for the chromatic adaptation. From this definition, CIELAB is a color appearance
model, although a very simple one. Its lightness predictor is L ∗ . It also has the hue and
chroma predictors. It accounts for the chromatic adaptation by normalizing all CIEXYZ
values by those of the reference white (X n , Yn , Z n ). This is one of the major weaknesses
in CIELAB because the normalization is based on the CIE tristimulus values XYZ, which
are an arbitrary linear combination of the LMS cone spectral sensitivities. A reasonable
chromatic adaptation model should be based directly on the cone spectral sensitivities
instead. Furthermore, psychophysical data show that the S cones seem to behave differently
from the L and M cones and therefore require a different type of gain adjustment to account
for the chromatic adaptation data. The other major deficiency in using CIELAB as a color
appearance model is that it does not account for the effect of the viewing surround.
Figure 18.3. Specification of the visual field for a color appearance model: the stimulus, the proximal
field, the background, and the surround.
18.6.3 CIECAM97s
Due to popular demand from industry and the universities, CIE planned to recommend
a complete version (CIECAMc) and a simple version (CIECAMs) of an interim color
appearance model. The complete model has not yet appeared. We will here briefly describe
the simple model, i.e., the CIE 1997 Interim Color Appearance Model (Simple Version),
abbreviated to CIECAM97s [193]. A more detailed description of the model can be found
in [293, 431, Chapter 12]. Color appearance modeling is still an active research area and
constant consultation of the color research literature is necessary for anyone planning to
work on related problems. In 2003, CIE was still working to approve a draft revised version
of its “new” color appearance model called CIECAM02 [676]. However, an even more
sophisticated color appearance model for imaging applications is being developed to account
for spatial frequency and simultaneous contrast effects [295]. This is definitely a good
direction in which to go and we can expect a better model to be available for future color
imaging applications.
Before we present the CIECAM97s model, we should give a warning for engineers and
scientists who are used to working on firm theoretical foundations. The color appearance
model to be discussed will strike you as a mess of empirical equations, with no justification
whatsoever. If that is how you feel, again you are certainly not alone. However, these
empirical equations did not “come from nowhere”. They represent a justifiable effort to
summarize a huge amount of experimental data with a few equations. They do not look
trustworthy only because it would have taken a few hundred pages to list all the relevant
experimental data and to explain how the equations were derived to fit those data. In a sense,
this is a desperate effort to make those data useful for people who do not have the time to
go through the same derivation process, i.e., for those people who simply want to have a
blackbox machine to compute the physical correlates of some of the important perceptual
judgments, such as brightness, lightness, hue, chroma, and colorfulness. If you do not have
any use for such a blackbox machine, you should feel free to skip this section. However, if
you do work in a color imaging related field, you may encounter discussions on this subject
from time to time. It is probably useful at least to know what it is trying to do and get
some idea about how it works. We need to treat it as work in progress and expect that, as we
understand more about how our color vision works, the models will be changed accordingly.
The input data to the model include:
1. the photopic luminance of the adapting field, L A (cd m−2 );
2. the tristimulus values of the test sample in the source conditions, X Y Z ;
3. the tristimulus values of the source white in the source conditions, X w Yw Z w ;
4. the relative luminance of the source background in the source conditions, Yb ;
5. the impact of background, c;
6. the chromatic induction factor, Nc ;
7. the lightness contrast factor, FLL ;
8. the degree of adaptation, F.
Some of the recommended parameters are given in Table 18.1.

Table 18.1. Recommended values for some parameters in the CIECAM97s model
The first step is to account for chromatic adaptation. The reference viewing illuminant
is chosen to be the equal-energy white. All color appearance attributes are formulated in
the reference conditions. The tristimulus values in the application are first transformed into
an approximation of cone responses. These responses are then further transformed to the
reference viewing conditions by the Bradford chromatic adaptation model. The transfor-
mation from CIEXYZ to some intermediate normalized responses R, G, B (in which the
simple chromatic adaptation model was found to work well) is given by:
\[ \begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 0.8951 & 0.2664 & -0.1614 \\ -0.7502 & 1.7135 & 0.0367 \\ 0.0389 & -0.0685 & 1.0296 \end{bmatrix} \begin{bmatrix} X/Y \\ Y/Y \\ Z/Y \end{bmatrix} = M_{\mathrm{B}} \begin{bmatrix} X/Y \\ Y/Y \\ Z/Y \end{bmatrix}. \tag{18.13} \]
Note that each row of the transformation matrix, MB , sums up to 1. For the CIE tristimu-
lus values (X ref , Yref , Z ref ) of the equal-energy reference illuminant, X ref = Yref = Z ref and
therefore Rref = G ref = Bref = 1.
The normalized responses R, G, B are then transformed into the corresponding quanti-
ties Rc, Gc, Bc under the reference conditions by the Bradford chromatic adaptation model.
The perceptual correlates are then computed as follows.
Brightness, Q:
\[ Q = \frac{1.24}{c} \left(\frac{J}{100}\right)^{0.67} (A_{\mathrm{w}} + 3)^{0.9}. \tag{18.36} \]
Saturation, s:
\[ s = \frac{50\,(a^2 + b^2)^{1/2}\, 100\, e\, (10/13)\, N_c N_{cb}}{R'_a + G'_a + (21/20) B'_a}. \tag{18.37} \]
Chroma, C:
\[ C = 2.44\, s^{0.69} \left(\frac{J}{100}\right)^{0.67n} (1.64 - 0.29^n). \tag{18.38} \]
Colorfulness, M:
\[ M = C\, F_{\mathrm{L}}^{0.15}. \tag{18.39} \]
As can be seen in the equations listed above, the computation of CIECAM97s involves many
nonlinear functions. Furthermore, the inverse transform from the perceptual attributes to
the CIEXYZ is even more complicated. For industrial applications, complex computations
are often avoided and, as a result, CIECAM97s has not been as widely used as it was hoped.
This prompted the CIE to simplify the model further and the revised color appearance model
called CIECAM02 was proposed.
In CIECAM02, the responses under the reference conditions, Rc, Gc, Bc, are transformed into some
approximation of cone responses, R′, G′, B′, as follows:
\[ \begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} = \begin{bmatrix} 0.389\,71 & 0.688\,98 & -0.078\,68 \\ -0.229\,81 & 1.183\,40 & 0.046\,41 \\ 0.0 & 0.0 & 1.0 \end{bmatrix} M_{\mathrm{CAT02}}^{-1} \begin{bmatrix} R_c \\ G_c \\ B_c \end{bmatrix} \tag{18.50} \]
\[ = M_{\mathrm{H}}\, M_{\mathrm{CAT02}}^{-1} \begin{bmatrix} R_c \\ G_c \\ B_c \end{bmatrix}. \tag{18.51} \]
\[ R'_a = \frac{400\,(F_{\mathrm{L}} |R'|/100)^{0.42}}{(F_{\mathrm{L}} |R'|/100)^{0.42} + 27.13} + 0.1, \tag{18.52} \]
\[ G'_a = \frac{400\,(F_{\mathrm{L}} |G'|/100)^{0.42}}{(F_{\mathrm{L}} |G'|/100)^{0.42} + 27.13} + 0.1, \tag{18.53} \]
\[ B'_a = \frac{400\,(F_{\mathrm{L}} |B'|/100)^{0.42}}{(F_{\mathrm{L}} |B'|/100)^{0.42} + 27.13} + 0.1. \tag{18.54} \]
The redness–greenness response, a, the yellowness–blueness response, b, and the hue angle,
h, are computed as follows:
\[ a = R'_a - \frac{12}{11} G'_a + \frac{1}{11} B'_a, \tag{18.55} \]
\[ b = \frac{1}{9}\left(R'_a + G'_a - 2 B'_a\right), \tag{18.56} \]
\[ h = \tan^{-1}\!\left(\frac{b}{a}\right) \quad \text{[degree]}. \tag{18.57} \]
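The post-adaptation compression and the opponent responses of Eqs. (18.52)–(18.57) translate directly into code. In the sketch below, R′, G′, B′ and the luminance-level adaptation factor FL are assumed to have been computed by the earlier steps of the model (not reproduced in this excerpt); arctan2 is used so that the hue angle falls in the correct quadrant.

```python
import numpy as np

def post_adaptation(R_p, G_p, B_p, F_L):
    """Nonlinear response compression of Eqs. (18.52)-(18.54).
    R_p, G_p, B_p are the cone-like responses R', G', B'."""
    def compress(x):
        t = (F_L * np.abs(x) / 100.0)**0.42
        return 400.0 * t / (t + 27.13) + 0.1
    return compress(R_p), compress(G_p), compress(B_p)

def opponent_responses(R_a, G_a, B_a):
    """Redness-greenness a, yellowness-blueness b and hue angle h,
    Eqs. (18.55)-(18.57)."""
    a = R_a - 12.0 * G_a / 11.0 + B_a / 11.0
    b = (R_a + G_a - 2.0 * B_a) / 9.0
    h = np.degrees(np.arctan2(b, a)) % 360.0   # tan^-1(b/a), expressed in degrees
    return a, b, h
```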
Hue quadrature, H, is calculated from the following unique hue data by linear interpolation:
• red: h = 20.14°, e = 0.8, H = 0 or 400;
• yellow: h = 90.0°, e = 0.7, H = 100;
• green: h = 164.25°, e = 1.0, H = 200; and
• blue: h = 237.53°, e = 1.2, H = 300.
Chroma, C:
\[ C = 0.1\, t^{0.9} \sqrt{J}\, (1.64 - 0.29^n)^{0.73}. \tag{18.64} \]
Colorfulness, M:
\[ M = C\, F_{\mathrm{L}}^{0.25}. \tag{18.65} \]
Saturation, s:
\[ s = 100 \left(\frac{M}{Q}\right)^{0.5}. \tag{18.66} \]
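Once the intermediate quantities t, J, n, FL, and Q are available from the rest of the model (they are not reproduced in this excerpt), the final correlates follow directly from Eqs. (18.64)–(18.66); a one-function sketch:

```python
def chroma_colorfulness_saturation(t, J, n, F_L, Q):
    """Chroma, colorfulness and saturation of CIECAM02, Eqs. (18.64)-(18.66).
    t, J, n, F_L and Q come from earlier steps of the model."""
    C = 0.1 * t**0.9 * J**0.5 * (1.64 - 0.29**n)**0.73   # Eq. (18.64)
    M = C * F_L**0.25                                    # Eq. (18.65)
    s = 100.0 * (M / Q)**0.5                             # Eq. (18.66)
    return C, M, s
```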
Values computed can be quite different for CIECAM97s and CIECAM02, especially the
chroma value C and the hue H . While the lightness, J , has a nominal range of 100, the scale
of brightness, Q, is more arbitrary and the large brightness difference in the two models is
only a matter of scale. Initial field trials seem to show that CIECAM02 is significantly more
accurate and consistent, in addition to being simpler.
In developing his color order system, Ostwald defined his full colors (most-saturated colors)
as those generated from spectral reflectance curves that have either 0% or 100% reflectance
over the visible wavelengths. Schrödinger [842] and then MacAdam [617] showed that this
type of spectral reflectance function indeed produces the maximum attainable excitation
purity for a given luminance reflectance factor and dominant wavelength, provided that the
spectral reflectance factors are restricted to between 0% and 100%. Ostwald’s full colors are
now called the optimal colors. It is also known that the spectral reflectance factor function
of an optimal color has one or at most two transitions in the visible wavelength range.
Therefore, there are four types of spectral reflectance factor function [441, 842] as shown
in Fig. 18.4. The set of all optimal colors constitutes the theoretical boundary of the volume
of colors reproducible by all object surfaces that reflect light diffusely and do not produce
fluorescence or other nonlinear optical energy conversion. In the USA, this theoretical
Figure 18.4. The four types of spectral reflectance function for optimal colors.
Figure 18.5. Chromaticity loci of optimal color stimuli, under D65, as a function of luminance level
Y = 10, 20, 30, . . . , 80, 90, 95. These chromaticity contours are computed with the wavelength
resolution interpolated to 0.1 nm.
boundary of color volume is often referred to as the MacAdam limits. Figure 18.5 shows
the chromaticity loci of optimal color stimuli under D65 illuminant.
It is not difficult to understand qualitatively why the spectral reflectance factor function,
ρ(λ), of an optimal color has at most two transitions. Let us start with ρ(λ) = 0 for all
λ. Now as an example, given the dominant wavelength at 520 nm and the luminance at
Y = 80, the optimal color can be determined by first setting ρ(520 nm) = 1. This is the
highest excitation purity possible for that dominant wavelength. Of course, this function
does not reflect enough luminance. We can then set either ρ(519 nm) = 1 or ρ(521 nm) = 1,
depending on which one will result in a chromaticity closer to the dominant wavelength
520 nm. Setting one more wavelength to 1 will slightly increase the luminance, but it will
also slightly decrease the excitation purity. If we continue this process of setting the ρ at the
neighboring wavelength equal to 1 so that the resulting color has a dominant wavelength
always close or equal to 520 nm, we arrive at a spectral reflectance factor function, ρ(λ), that
produces the desired luminance Y . At that point, all other ρ(λ) are at 0. Therefore, we have
at most two transitions in the spectrum. The resulting excitation purity is higher than that of
any other spectral reflectance factor function, because each wavelength at which we set ρ(λ) to 1
in the process is the one closest to 520 nm. From the chromaticity collinearity property of color
mixtures, we know that the resulting chromaticity of the mixture of all the monochromatic
stimuli is bounded within the region enclosed by their chromaticity loci, and therefore will
be closest to the chromaticity locus of 520 nm. If a chromaticity direction does not have a
dominant wavelength, the complementary wavelength, λc , has to be used. In this case, we
need to start with ρ(λ) = 1 for all λ and set the ρ(λc ) to 0 as the first step. We then proceed
to set neighboring wavelengths to 0 one at a time until the desired luminance is reached.
From this discussion, it is also easy to see that the one-transition types of spectral reflectance
factor function, shown on the top row of Fig. 18.4, are intermediate between the two two-transition
types shown on the bottom row. The upper half of the chromaticity diagram has the bottom-left type,
while the lower half has the bottom-right type. The top-right type occurs in the middle of the right
half of the chromaticity diagram, and the top-left type in the middle of the left half. For
example, in Fig. 18.5 under D65, the one-transition type occurs at about 585 nm for Y = 60.

Figure 18.6. CIELAB contours of optimal color stimuli, under D65, as functions of lightness level
L* = 10, 20, 30, . . . , 80, 90, 95, 98. These contours are computed with the wavelength
resolution interpolated to 0.05 nm.
If the computation of optimal colors is done by the above procedure using discrete sums,
instead of real integrals, through the entire wavelength interval, significant quantization
errors are encountered, unless the color matching functions are sampled at a resolution
much finer than 1 nm. The contours as shown in Fig. 18.5 are computed with a wavelength
resolution interpolated to 0.1 nm. We can also compute the optimal colors in the CIELAB
color space. Figure 18.6 shows the theoretical boundary under D65 . As will be seen shortly,
in a study by Pointer almost all real object colors fell well within this boundary. When the
lightness is increased above L ∗ = 95, the theoretical boundary closes in very quickly to the
origin. Very bright colors cannot have very high chromas. It is also interesting to note that,
at L ∗ = 98, the color boundary is very asymmetric with respect to the origin – yellow colors
have much higher chromas than red or blue colors. This is easy to understand because the
luminous efficiency function, V (λ) = y(λ), peaks in the “middle” wavelength range.
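The procedure described above for constructing optimal colors is easy to program. The following Python sketch is a simplified variant that widens the pass band symmetrically in wavelength about a chosen center wavelength lam0 (rather than tracking the dominant wavelength exactly); it assumes the CIE color matching functions x_bar, y_bar, z_bar and the illuminant spectrum S have already been loaded on a common, finely interpolated wavelength grid.

```python
import numpy as np

def optimal_color_chromaticity(lam, x_bar, y_bar, z_bar, S, lam0, Y_target):
    """Approximate the optimal (two-transition) color centered near lam0 with
    luminance factor Y_target (0-100).  lam, x_bar, y_bar, z_bar, S are arrays
    on a fine wavelength grid (e.g., 0.1 nm spacing)."""
    k = 100.0 / np.sum(S * y_bar)               # normalize: perfect white has Y = 100
    rho = np.zeros_like(lam, dtype=float)       # start with zero reflectance everywhere
    order = np.argsort(np.abs(lam - lam0))      # wavelengths sorted by distance to lam0
    for idx in order:                           # widen the pass band one sample at a time
        rho[idx] = 1.0
        if k * np.sum(S * y_bar * rho) >= Y_target:
            break
    X = k * np.sum(S * x_bar * rho)
    Y = k * np.sum(S * y_bar * rho)
    Z = k * np.sum(S * z_bar * rho)
    return X / (X + Y + Z), Y / (X + Y + Z), Y  # chromaticity (x, y) and Y

# Sweeping lam0 over the visible range at a fixed Y_target traces one contour of
# Fig. 18.5; a fine wavelength grid keeps the quantization error mentioned above small.
```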
Since the optimal colors require (physically impossible) discontinuous transitions between
0% and 100% reflectance values, the volume of real surface colors is actually significantly
smaller than the theoretical volume, as has been shown by a number of studies [766].
Figure 18.7 shows the gamut of a large collection of real surface colors plotted in (a ∗ , b∗ ).
Figure 18.7. The gamut of real surface colors plotted in (a*, b*). Each contour represents a constant
lightness level, from L* = 15 to L* = 90 in increments of 5. The data were reported by Pointer [766].
We can see that the gamut is not symmetric with respect to neutral and that the distribution of
lighter colors, say at L* = 90 (the thickest contour), is completely skewed towards yellow (the
90–100° direction).
An output device, such as a printer or a monitor, can display only a limited set of colors on a
given medium, such as paper or a phosphor screen. This set of colors is called the color gamut
of the device/medium combination. Any color that is outside this gamut has to be mapped
into a displayable color before it can be sent to the device/medium. This process is called
color gamut mapping. Various strategies have been proposed to perform such a mapping.
Since cost, speed, and quality requirements differ greatly from application to application,
we will discuss different color gamut mapping strategies separately for different sets of
design considerations. The major questions are: (1) Is the mapping image-dependent [818]
or image-independent? (2) Does the mapping require spatial processing or not? Another
important factor to consider is whether the evaluation is done by single stimulus (seeing
only the reproduction) or pair comparison (seeing both the original and the reproduction).
(4) chromatic adaptation is not properly modeled. In order to reduce the nonuniformity,
different approaches have been taken. One is to modify the color-difference formula, such
as ΔE*_94 [191]. Another is to use the Munsell Color System. A third approach is to use color
appearance models, such as CIECAM97s [193] or CIECAM02 [676]. Since these models
do not account for spatial variations either, spatial processing models have also been used
[47, 295, 678, 692, 770, 1052].
Although we may think that the most accurate color appearance model will be the best
choice for color gamut mapping, it should be realized that most of the models are dependent
on viewing conditions because our color perception is a function of viewing conditions.
However, in practical applications, reproduced images, such as color reflection prints, are
intended to be viewed under many different and often uncontrolled viewing conditions. The
optimality criteria for color gamut mapping for these types of applications should take into
account the wide range of viewing conditions.
Figure 18.8. Two types of color gamut mapping strategy: (a) three examples of the hard-clipping
approach; (b) an example of the soft-compression approach.
When a requested color falls outside of the printer color gamut, the color gamut mapping process takes over to find
a “closest” in-gamut color to use for printing the desired color. This strategy is called
hard-clipping. Alternatively, we can also create a mapping (linear or nonlinear) that shrinks
all colors towards neutral so that out-of-gamut colors are compressed into the printable
gamut. This is called soft-compression. Figure 18.8 shows typical examples of these two
approaches: Fig. 18.8(a) shows three commonly used hard-clipping strategies, and Fig.
18.8(b) shows an example of the soft-compression approach in which some out-of-gamut
colors are mapped to the gamut boundary and some of the in-gamut colors are mapped
inwards away from the boundary. In either case, the color gamut mapping operates on a
pixel-by-pixel basis, independently of the image content and without any spatial processing.
Extensive studies have shown that neither of these two strategies is uniformly good for all
images [677, 857, Chapter 10]. Hard-clipping tends to preserve pleasing color saturation,
but lose image details where clipping occurs. Soft-compression preserves some details, but
tends to desaturate and noticeably distort many colors. There are many variations in both
approaches. The major differences between different algorithms are: (i) the color difference
formula used to measure color distances, and (ii) the constraints on the allowable mapping.
The former has been discussed in the context of the color space and metric. The latter
is often classified [450] in terms of the dimensionality of the constraints: one-, two-, or
three-dimensional. For example, the most often used one-dimensional constraint is to pre-
serve hue and lightness (as shown by projection (1) in Fig. 18.8(a)). Therefore, the chroma
of the out-of-gamut color is reduced to the maximum printable chroma for the same hue
and lightness. The allowable mapping is a one-dimensional projection along the chroma
direction. This strategy actually works very well for images whose color gamuts are only
slightly larger than the printer gamut. Colors maintain their colorfulness and are mostly
undistorted. Figure 18.8 also shows two examples of typical two-dimensional constraints
(projections (2) and (3) in the figure). One constraint (projection (2)) is to project to the
nearest in-gamut color on the constant hue plane. The motivation is to distort out-of-gamut
colors as little as possible, according to the defined color distance. However, this approach
can make the out-of-gamut colors lighter or darker, depending on the shape and distance
of the local gamut boundary. For example, on one hue plane, the nearest projection may be
downward, but, on a neighboring hue plane, it may be upward, thus creating noisy lumi-
nance fluctuations. In order to reduce the probability of such an occurrence, one strategy is
to project towards a medium gray point (projection (3)). It is easy to imagine many other
variations around such ad-hoc constraints. Each of them will do well for some images and
not so well for others. This motivates other strategies that are either image-dependent or
spatially varying, or both. All these new strategies are the subject of active research, and
more work is needed before we know which directions are the most promising to pursue.
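As an illustration of the hue- and lightness-preserving chroma clipping discussed above (projection (1) in Fig. 18.8(a)), the following Python sketch clips a CIELAB color along the chroma direction; max_chroma is a hypothetical function representing the gamut boundary, which in practice would come from the printer characterization.

```python
import numpy as np

def clip_chroma(L, a, b, max_chroma):
    """Map an out-of-gamut CIELAB color to the gamut boundary along the chroma
    direction, preserving lightness L* and hue angle h
    (projection (1) in Fig. 18.8(a))."""
    C = np.hypot(a, b)                    # chroma C*_ab
    h = np.arctan2(b, a)                  # hue angle in radians
    C_max = max_chroma(L, h)              # gamut boundary chroma for this (L*, h)
    C_out = min(C, C_max)                 # hard clip: in-gamut colors are unchanged
    return L, C_out * np.cos(h), C_out * np.sin(h)

# Example with a toy gamut whose boundary chroma is independent of hue:
toy_boundary = lambda L, h: 60.0 * (1.0 - abs(L - 50.0) / 50.0)
print(clip_chroma(70.0, 80.0, 20.0, toy_boundary))
```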
Many color reproduction problems can be solved more robustly if we have more than
three color channels to use. On the input side, theory says three channels are enough only if the three
spectral sensitivity functions of the imaging system are a linear combination of the three
human cone spectral sensitivity functions. Otherwise, there will be metameric problems.
Even if metameric problems are not a concern, there are some other potential advantages
in having more than three channels. For example, one can make better estimates about the
spectral power distribution of the incident light. This potential has been studied by several
groups, and the conclusion seems to be that it is realizable. Some digital
cameras are using sensors that have more than three spectral channels and are claimed to
have better color reproduction than those with conventional trichromatic sensors. It is also
easy to see that color scanners can beneficially take this path.
On the output side, three channels are not enough, mainly because of the limited color
gamut. Black ink expands the color gamut for the dark colors (including gray and black).
Other color inks expand the color gamut in different parts of the color space. The other
potential benefit of more than three output channels is similar to that on the input side, i.e.,
reducing metameric problems. The more channels we use, the more stable the color reproduction
will be under different illuminants. In the extreme case, one can reproduce the spectral
reflectance of an object.
In a closed system such as the conventional photographic system, all the components in the
imaging chain are made to work with each other. Little intervention is needed to “manage”
the color reproduction process. In digital imaging applications, there are many different
input and output devices available, each of which might be made by a different manufacturer
without conforming to a single standard. One digital camera may use NTSC primaries to
specify its output colors, while another camera may use sRGB as output metrics. Similarly,
if the same digital image is printed on two printers, the resulting color prints may look
very different. In order to solve the problems created by the incompatible color metrics
used in the various devices, color management systems are set up to do color conversions
between different color metrics so that color imaging devices can talk to each other properly.
Furthermore, a color management system has to deal with color conversion from one
viewing condition to another, the luminance adjustment when there is a mismatch in the
dynamic ranges between two devices, and the interpretation of rendering intent as specified
in the device profiles. These issues and solutions of color management systems can be found
in a number of books [347, 469].
One solution proposed by the International Color Consortium (ICC) is to use device
profiles, which are files that contain device calibration information in a standard data format.
Device manufacturers are asked to provide a device profile for each model of imaging device
they make. A color management system will then make color transformations from one
device to another, based on the device profile that comes with the imaging device. Therefore,
the color management system works like an international bank that may receive a payment
in one currency and make a payment in another currency. It works only if there is a table of
exchange rates between different currencies, or a table of exchange rates with a common
currency.
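The "common currency" idea can be sketched in a few lines of Python. The example below is not the ICC specification: the tone curves and matrices are hypothetical stand-ins for the data a real device profile would carry, and the conversion simply passes through CIEXYZ as the device-independent intermediate.

```python
import numpy as np

# Hypothetical profile data: a gamma-type tone curve and an RGB-to-XYZ matrix
# for the source device, and the corresponding inverse data for the destination.
SRC_GAMMA = 2.2
SRC_RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                           [0.2126, 0.7152, 0.0722],
                           [0.0193, 0.1192, 0.9505]])     # sRGB-like primaries
DST_XYZ_TO_RGB = np.linalg.inv(SRC_RGB_TO_XYZ * 0.95)     # a slightly different device
DST_GAMMA = 1.8

def source_to_destination(rgb):
    """Convert device-dependent RGB of one device to another through XYZ,
    the 'common currency' of the color management system."""
    linear = np.asarray(rgb, dtype=float) ** SRC_GAMMA    # undo the source tone curve
    xyz = SRC_RGB_TO_XYZ @ linear                         # to device-independent XYZ
    dst_linear = np.clip(DST_XYZ_TO_RGB @ xyz, 0.0, 1.0)  # to destination linear RGB
    return dst_linear ** (1.0 / DST_GAMMA)                # apply destination tone curve

print(source_to_destination([0.5, 0.4, 0.3]))
```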
However, the solution provided by device profiles is not perfect. Some of the main
problems are as follows:
• Quantization errors. Color transformations are often performed with a limited number
of bits (e.g., 8 bits). Converting from one device-dependent color space to another
often involves transformation to an intermediate device-independent color space, such
as CIEXYZ or CIELAB. The resulting quantization errors can be quite serious.
• Device stability. A device profile may be generated in the factory when the device is
made. However, device characteristics often drift with time, making the original device
profile inaccurate.
• Viewing conditions. A device profile may be produced under one viewing condition
and the application may use another viewing condition. Although one can use a color
appearance model to compensate for the difference, the model may not work well
because there is no automatic algorithm for calculating the required parameters for the
particular viewing condition.
• Device characteristics. When the characteristics of an output device are very different
from those the profile was designed for, the color reproduction may be less than ideal;
for example, if the output dynamic range is too small or the viewing flare is larger than
expected.
• Ambiguous interpretation. Some color transformations are not one-to-one. For example,
RGB to CMYK is not a one-to-one mapping. In such cases, an optimal mapping
may not be chosen by a color management system. For example, if certain areas in
a color document are to be printed black, a color management system may map the
device-dependent RGB to a device-independent CIELAB, and then from there to a
printer-dependent CMYK space. The process may cause the printer to print some cyan,
magenta, and yellow inks, in addition to black ink, resulting in a muddy black.
18.11 Problems
18.1 We use a digital camera to take a picture of a room and print it using a color printer.
We can then compare the color print and the original scene side by side to see how
good the color reproduction of our camera and printer as an imaging chain is.
(a) What do you think is a proper color reproduction criterion for this application?
(b) Can we use ΔE in CIELAB for this comparison?
(c) How do we quantify the color reproduction errors in such a situation?
(Hint: Although this scenario sounds simple, as a color reproduction problem, it is
actually quite complicated. The purpose of this exercise is to let us think through what
we have learned so far and see how we should formulate some kind of solution.)
18.2 We use a scanner to digitize a color print and then print the digitized image using an
inkjet printer. We can now compare the original with the scan/print copy in terms of
color reproduction.
(a) What do you think is a proper color reproduction criterion for this application?
(b) Can we use ΔE in CIELAB for this comparison?
(c) How do we quantify the color reproduction errors in such a situation?
(d) If the paper white of the original has a low minimum density, say, 0.05, and the
paper white of the copy has a higher minimum density, say 0.15, how do you
justify your answers in (a), (b), and (c) ?
18.3 The CIE 1997 Interim Color Appearance Model (CIECAM97s) provides a way of com-
puting the lightness, J , of a color sample under a given viewing condition. The L ∗ in
CIELAB is also recommended for computing the lightness of flat color samples in a
light-gray–white background. Let us assume that the light source is CIE Illuminant C.
Compare the two lightness scales, J and L ∗ , as functions of luminance Y for neutral
gray patches.
19 Color image acquisition
Color images of scenes and objects can be captured on photographic film by conventional
cameras, on video tape by video cameras, and on magnetic disk or solid-state memory card
by digital cameras. Digital color images can be digitized from film or paper by scanners.
In this chapter, we will cover these major color image acquisition devices. Photographic
film has the longest history and still offers a convenient, low-cost, high-quality means for
capturing color images. For this reason, it is very useful to understand the photographic
processes, photographic film and photographic paper, because they are often the sources of
many color images that we will encounter. They have some unique properties that influence
how film-originated digital color images should be processed by computers. The next in
importance is the solid state sensors, of which charge-coupled devices (CCDs) are the
most widely used so far, with others (such as complementary metal-oxide–semiconductor
(CMOS) sensors) gaining in popularity. Scanners are devices that are used to digitize images
from film and paper. They are the main devices for generating high-quality digital color
images. Most scanners use CCD sensors, except some high-end graphic arts scanners that
use photomultiplier tubes. Digital cameras are becoming more and more competitive with
photographic films in terms of image quality and convenience. Most digital cameras today
also use CCD sensors. Each of these devices has different characteristics and unique image
processing problems. They are discussed separately.
Color image acquisition systems are designed under a lot of practical constraints. Many
system components are designed and manufactured separately. For example, CCD sensors,
zoom lenses, and DSP (digital signal processing) chips are made by different companies.
Most imaging systems are not optimized at the system design level; rather, the
performance is tuned after the components have been made and the system is put together.
Therefore, color imaging engineers usually do not have the chance to apply their knowledge
and expertise to the system as a whole. However, if every engineer in the imaging field
analyzes performance on a system level, we might hope that the components are not too
far off from how they should be designed. It is thus very important for us to have a good
overall understanding of what the general issues are before we look at each individual
image acquisition system in this chapter. Some of these general issues are: (1) spectral
sensitivity/responsivity, (2) calibration and linearity, (3) dynamic range and signal-to-noise
ratio, (4) signal shaping and quantization, (5) geometrical accuracy, (6) resolution and
addressability, (7) image processing algorithms, and (8) output metric conversion. System
engineers should study how these different factors may interact with each other before they
can lay out a good system hardware and firmware architecture to ensure that the final images
from the system are of the best quality possible.
cone sensitivity functions and the subspace of the system spectral response functions as the
goodness measure [991]. Of course, the sensor-theoretic approach can be cross-validated
by the data-driven approach to make sure that the theoretical optimal spectral responses are
in fact a good solution for a collection of color samples.
Photographic films and papers are the dominant input/output media for high-quality color
images. This is due to their excellent image resolution, exposure latitude, compactness, and
convenience for image recording and storage. With the advances in solid-state sensors and
memory devices, many applications, such as photojournalism, entertainment, and educa-
tional videos, use electronic imaging devices instead of the conventional photographic films
and papers. However, when it comes to consumer photography or high-quality hard-copy
output, film and paper are still the media of choice because of their cost, convenience,
and quality. Of course, the trend to electronic imaging is quickening its pace and imaging
engineers need to learn both systems.
The chemistry of daguerreotype photography was first described in France in 1839 [210,
p. 20]. Initially, the single layer emulsion was coated on glass plates. The first commercial
color film, 16 mm Kodachrome motion picture film, was introduced in 1935. The image dye
couplers for Kodachrome are in the developers. The Kodachrome transparency film consists
of five layers: a blue-sensitive layer with a yellow dye to trap the blue light, a clear gelatin
layer, a blue-and-green-sensitive layer, a clear gelatin layer, and a blue-and-red-sensitive
layer at the bottom, coated on a celluloid support. In the following year, Agfa announced a
reversal film, a negative film, and a motion picture film based on a tripack idea, proposed
by Rudolph Fischer in 1912. The tripack films use nondiffusing couplers which are put in
the film emulsion layers. By the end of the 1950s, photofinishers were processing color
films for consumer imaging. Long years of research in chemistry, physics, photoscience,
emulsion coating, and manufacturing technology have made color photography possible for
consumer applications. This is one of the major achievements in color imaging history.
Gelatin is used to prevent silver halide grains from aggregating. It allows each grain to
be acted upon by the developer individually. Because of the presence of gelatin, when the
emulsion is coated, it is set to a jelly-like layer on the film support and partially dried. The
gelatin swells in the developer or other solutions, allowing the chemical agents to penetrate
freely to reach all parts of the emulsion. The wonderful physical and chemical properties of
gelatin make it an indispensable ingredient in most photographic films and papers. So far,
no other medium has been found that can replace gelatin in this application with the same
cost/efficiency performance.
the entire silver halide grain consisting of about 10⁸ silver atoms. The signal amplification
of the silver halide imaging is thus on the order of 10⁶, about the same magnitude as in
the human photoreceptors. Photographic developer solutions are reducing agents that act
on exposed grains several orders of magnitude faster than the unexposed grains (some of
which are also developed).
After the latent image is converted into a silver image, the rest of the undeveloped silver
halide grains are dissolved and washed out from the emulsion by a different chemical
solution. This process is called fixing. After the fixing, what is left on the emulsion is a
negative image, in the sense that, where there was high exposure, there is now a large amount
of silver deposit and less light is reflected from or transmitted through the area.
Alternatively, if the developed silver is dissolved and washed away by some chemicals
and the undeveloped silver halide grains are then exposed and developed to silver deposits,
we have a positive image. This process is then called the reversal processing.
A small proportion of the grains that do not have latent image centers may also be
developed, forming a small density across the developed film even where little or no light
has irradiated. This small density level is called fog. In addition to the fog density, other
ingredients, such as the film support or the colored-coupler in the color film, in the processed
film or paper also contribute to the minimum density, Dmin , that a piece of film or paper has.
This density component is called the base density. Therefore, Dmin is equal to base density
plus fog density.
(5) low-speed (fine grain) blue-sensitive emulsion layer, (6) yellow filter layer, (7) active sep-
arating layer, (8) high-speed green-sensitive emulsion layer, (9) low-speed green-sensitive
emulsion layer, (10) red filter layer, (11) high-speed red-sensitive emulsion layer, (12) low-
speed red-sensitive emulsion layer, (13) antihalation coating, and (14) film base (cellulose
acetate). The emulsion thickness of the 35 mm roll film is about 23 µm and the film base
is about 125 µm.
Although the spectral information is sensed in roughly the three differently-sensitized
layers, there remains the question of how one can extract the three color signals from the
three layers. There are two major approaches to solving this problem. The first approach
is to put image dyes in the developers, with the image dye being formed when the latent
image centers are developed. The second approach is to put the image dye precursors
into the emulsion layers, with the image dyes being formed when the latent silver image
is developed. Kodachrome uses the former, while most others use the latter. Since the
Kodachrome processing is much more complicated and less frequently used, we will only
discuss the latter approach.
As we described before, the film developing process uses a reducing agent to reduce
the silver ions in the exposed grains to silver atoms. In this reaction, the reducing agent
is oxidized. If we can put in the emulsion some chemicals that will in turn react with the
oxidized developer to form image dyes, we can produce a dye image which is proportional
to the silver image. These chemicals are called couplers, because they couple with the
oxidized developer to form dyes. The coupler which forms the cyan dye is called the cyan
coupler, and similarly we also have the magenta coupler and the yellow coupler.
“red”, “green”, and “blue” light pass through the media. For example, if an image region is to
be rendered red, there should be little or no cyan dye, but there should be a lot of magenta and
yellow dyes to block the passage of the “green” and the “blue” light. The intensity of the red
light can be made stronger or weaker by adjusting the amount of cyan dye on the film or the
slide, but, at every intensity, all of the green and the blue light has to be completely removed,
otherwise the hue of the red color will shift when its intensity is changed. This analysis
has two important consequences: (1) color reproduction in photography is subtractive in
nature, and (2) an ideal image dye has to transmit 100% in some parts of the spectrum,
and it has to transmit various amounts of light in other parts of the spectrum. This type of
ideal dye is thus called a block dye, because its spectral transmission curve must look like
a series of blocks, having sharp transitions from 100% transmission to some intermediate
transmission value as a function of dye concentration.
Photographic paper also reproduces color by the subtractive process. Image dyes are
formed on top of the highly reflective support. Light shining on the paper, with a small
fraction of first surface reflection, has to penetrate through the dyes, be reflected from
the paper support, and then come back out into the air to reach our eyes. The spectral
composition of the reflected light is thus modified by the absorption of the dyes and any
color stimuli are thus produced by a subtractive process as we discussed in the case of the
negative film or the positive slide.
exactly the color that is created from the unwanted absorption of the cyan dye, the cast of
that color bias will be a constant across the entire image independent of the local “red”
exposure. Thus, instead of a transparent cyan coupler, we use a colored cyan coupler to
correct the unwanted absorption of the cyan dye. For example, if the cyan dye absorbs some
of the “green” and the “blue” light, the colored cyan coupler has to partially absorb the
same amount of the “green” and the “blue” light. Similarly, the magenta dye has unwanted
absorption in the “red” and “blue” regions, and the magenta coupler is made to partially
absorb the “red” and “blue” light. Most color negatives have a strong red–orange color cast
as a result of using colored cyan and magenta couplers to correct for
the unwanted absorptions. The color cast in the color negative can be easily removed in the
printing process by adjusting the color filters used. Colored couplers increase the overall
density of the negatives and it takes longer for a printer to print images from the negatives,
but the resulting color quality improvement is more than worthwhile.
1 Here E means exposure, but the CIE standard symbol for exposure is H . Therefore, most new literature uses
the term D–log H curve, instead of D–log E curve.
Figure 19.1. Determination of ISO photographic speeds for (a) black-and-white negative films and
(b) color negative films.
frequency. When a picture on a film is digitized, the density of each pixel represents a micro-
density. There is a significant difference between the macro-density and the micro-density
for two reasons: (1) the aperture size of the film scanner is usually smaller than that used
in a densitometer, and, (2) the optical signal modulation (such as an edge) in a natural
scene produces a film density modulation that is dependent on spatial frequency as well as
chemical processing. Detailed models of micro-densities will be discussed next.
Color images scanned from photographic films represent a major source of high-quality
digital images. There are many factors that affect the correspondence between the image
irradiance and the micro-density recorded on the film. We will look at some of the factors
and review film models from a signal processing perspective.2
Light incident on the photographic film is spread from the point of incidence to the
surrounding area by reflection, refraction, diffraction, and scattering. The resulting effect is
an additional blur to the input optical image. Because in consumer photography the incident
light is incoherent, this stage of image blur is, presumably, a linear process in terms of light
intensity. This PSF is called the film’s optical PSF.
As film is developed, the reaction products from chemical developing diffuse through
the emulsion layers, causing an additional spread of the information contained in the latent
image. The reaction-product concentration generated per unit area at each point in the
latent image appears to be proportional to the amount of silver being developed at that
point. For color films, the color dye image is produced by the coupling of the oxidation
product of the developer and the coupler. The diffusion of an oxidation product in the
emulsion layer generates a dye cloud in the neighborhood of a silver halide grain. The
dye-forming processes, thus, introduce still more image blur. These effects are nonlinear
and very complicated. Their detailed mechanisms are not completely clear.
Because of the nonlinear characteristics of the film, the concept of cascading system
component MTFs is no longer valid. The analysis of photographic images has therefore
taken two different paths.
2 The next sections are a slightly modified version of part of a paper [564] published in Optical Engineering.
Figure 19.2. Three nonlinear models for the photographic process. Redrawn from [873].
modulations as high as 60–70% could be used and MTF data would still be effective.
Furthermore, the correlation between visual perception and computed sharpness from
such MTF measurements is quite high. MTFs are now generally measured as part of film
evaluation.
Frieser's model, using the following LSF, is also favored by many other researchers:

l_f(r) = \frac{1}{2\sigma_f} \exp(-|r|/\sigma_f);     (19.4)

it has the following Fourier transform:

L_f(\nu) = \frac{1}{1 + (2\pi \sigma_f \nu)^2}.     (19.5)
For a typical color negative film, the constant σs is about 2–3 µm for the yellow layer,
3–4 µm for the magenta, and 7–8 µm for the cyan. The constant σf is about 1.22 times the
corresponding σs .
The third stage, chemical diffusion, has been modeled as a high-pass unsharp masking
filter in density space. The mechanism is usually described as follows [498, 705]. The
chemical concentrations during film development vary throughout the emulsion as functions
of local exposure levels. Since the developer is more nearly exhausted where the density
is greatest, and vice versa, the developer-concentration distribution is tone-reversed with
respect to the density image. On the other hand, the reaction products that tend to slow down
the development have higher concentrations in the high-density region. Fresh developer and
the reaction products will thus diffuse in opposite directions across the boundary of regions
of different densities. The net result is to make the density higher on the high-density side
and lower on the low-density side. The effect is therefore called the adjacency effect (for a
detailed review of other factors, see [67]). Nelson [705] studied this effect, and proposed
a formula to predict, for the one-dimensional case only, the final density level Dc (x) as a
function of the density D(x) that would have been produced according to the large-area
sensitometric D–log E curves. He called D(x) the nominal density at position x. Nelson’s
semi-empirical formula is as follows:
D_c(x) = D(x) + B D^2(x) - D(x) \int_{-\infty}^{\infty} b(\xi)\, D(x - \xi)\, d\xi,     (19.6)
The above formula assumes that the relation between density, D, and mass of silver per unit
area, M, is linear for the particular film and development combination involved. If this is
not true, the mass of silver per unit area should be used instead of the density, because the
chemical concentration of the reaction products is proportional to the former, not the latter.
An empirical relation between these two quantities was given as
M = P D^n,     (19.8)

where P is the reciprocal of the covering power of silver at a density of 1.0, and n usually
lies between 0.5 and 1.0. When n is not equal to 1, the formula should be written in terms of
M rather than D (Eq. (19.9)).
The usual procedure is to determine P and n in Eq. (19.8), then substitute P D^n for M in
Eq. (19.9) to obtain the following formula:

D_c^n(x) = D^n(x) + B D^{2n}(x) - D^n(x) \int_{-\infty}^{\infty} b(\xi)\, D^n(x - \xi)\, d\xi.     (19.10)
According to Nelson’s experiments, the discrepancies between the predicted and measured
densities were generally less than 0.04 at density levels above 2.0 and less than 0.03 or 0.02
at lower densities.
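Equation (19.6) is straightforward to evaluate numerically for a one-dimensional nominal density trace. The Python sketch below uses a hypothetical exponential chemical LSF and an illustrative value of B (neither is measured data); applied to a density step, it shows the characteristic overshoot and undershoot of the adjacency effect.

```python
import numpy as np

def nelson_adjacency(D, dx, B=0.1, sigma=5.0):
    """Apply Nelson's semi-empirical formula, Eq. (19.6), to a 1-D nominal
    density trace D sampled at spacing dx (micrometers).  B and the LSF width
    sigma are illustrative values, not measured film data."""
    xi = np.arange(-5.0 * sigma, 5.0 * sigma + dx, dx)
    b = np.exp(-np.abs(xi) / sigma)
    b *= B / (b.sum() * dx)        # scale b so its integral equals B; the correction
                                   # then vanishes in large uniform areas
    conv = np.convolve(D, b, mode="same") * dx   # integral of b(xi) D(x - xi) dxi
    return D + B * D**2 - D * conv               # Dc(x) per Eq. (19.6)

# A density step ("knife edge"): the corrected trace overshoots on the
# high-density side and undershoots on the low-density side -- the adjacency effect.
x = np.arange(0.0, 200.0, 1.0)
D = np.where(x < 100.0, 0.5, 2.0)
print(nelson_adjacency(D, dx=1.0)[95:105])
```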
To determine the chemical LSF, a knife edge is exposed by x-ray on the film to be
measured (it is assumed that x-ray suffers only a negligible amount of optical blur). The
density variation across the edge is then traced by a microdensitometer. If the nominal
density on the low-density side is D1 for x < 0 and on the high-density side D2 for x > 0,
then b(x) can be determined according to the following formula:
b(x) = -\frac{n D^{n-1}}{D_x^n (D_2^n - D_1^n)} \frac{dD}{dx},     (19.11)

where D_x = D_2 for x > 0, and D_x = D_1 for x < 0.
b(x) from the high-density side [401], because the signal-to-noise ratio is higher.
Measurement data showed that the chemical LSF could be well described by the following
functions:
Apparently, the MTF depends on the average density level D and the input modulation level
Si . This makes it much less useful for predicting the shape of the output signal.
It should be noted that Nelson’s model is derived from the adjacency effect in black-and-
white films. Experimental results [464, 593] showed that the model has to be modified to
describe the image-enhancing effects in color films. The effects of the various chemicals
(e.g., DI(A)R and image coupler) on the chemical LSF are density-dependent. To what
degree one can continue using Nelson’s model in color films without introducing too much
error is a question to be answered by experiment.
Both Kelly’s three-stage model and Nelson’s model used empirical forms for the chem-
ical spread function. Ehn and Silevitch modeled the film adjacency effects from a more
fundamental formulation, which started with the diffusion equations relating the concentra-
tion of developer and the concentration of reaction products that inhibit development to the
rate of change in the mass of silver developed per unit area. Their results show that Nelson’s
model can be derived from the diffusion model when the density fluctuation is not too large.
However, the chemical spread function is now density-dependent. The full expression of
the relation between M and Mc is too complicated to describe here. Silevitch, Gonsalves,
and Ehn [873] later simplified it to the following form:
where A and κ are constants, b(x) is the chemical LSF, and ∗∗ represents convolution.
Although the diffusion model is more appealing, there have been many fewer measurements
on these model parameters than on those of Nelson’s model. The complexity of Eq. (19.15)
also presents a problem in estimating the unknowns.
in a large uniform area. A region-based model is needed to predict the detail in the final dye
image.
The densities, Dc , used in the above equations are the so-called analytical densities,
which are the densities of the individual layers. In practice, densities of a developed color
film are measured with all the layers contributing (the so-called integral densities). The
relations between the integral density Df and the analytical density Dc are often described
by the following linear equations:
D_{f,i}^n = \sum_{j=1}^{k} a_{ij} D_{c,j}^n + b_i \quad \text{for } i = 1, \ldots, k,     (19.16)
where i and j denote the different layers of a color film. A point-based model of the
interimage effect can thus be written as
D_{f,i}^n = \sum_{j=1}^{k} c_{ij} D_j^n + \sum_{j=1}^{k} \sum_{l=1}^{j} d_{ijl} D_j^n D_l^n + g_i \quad \text{for } i = 1, \ldots, k.     (19.17)
Using this empirical model, one can convert the so-called nominal densities Di to the
integral film densities Df,i by a 3 × 10 matrix. This is the current practice of film unbuilding,
assuming n = 1.
Photographic films serve as high-quality sensing and storage devices. They are relatively
inexpensive and have very fine resolution. An ASA-100 35-mm color negative has an area
of 24 mm by 36 mm and its resolution is about equivalent to a 2000 × 3000 pixel digital
camera. Therefore, photographic film remains a very good source of high-quality color
images. Film scanners are often used to convert the film images into digital form. The
calibration of a film scanner is more complicated than a desktop reflection scanner for two
reasons: (a) film is a complex, nonlinear device, and (b) the dynamic range of a film is much
higher – roughly 1000:1, compared with 100:1 for a reflection print.
Two major steps in calibrating a film scanner are the removal of the film interimage effect
(color cross-talk in film development), and the film D–log H curve. The film interimage
effect can be removed more accurately through a 3-D LUT or less accurately by a 3 × 10
matrix (including the R, G, B linear terms, the square terms R², G², B², and the product
terms RG, RB, GB, and a constant term) [443]. Since the interimage effect can be modeled
better in film density, the input and output variables in the lookup table or color matrix
should be set up in film density. There is also the adjacency effect (which also occurred in
the chemical processing of film) which can only be removed approximately by “inverse”
filtering in density domain. This is time consuming and rarely done because the adjacency
effect actually makes the image look sharper. Let us assume that we want to calibrate a film
scanner so that the output codes represent log exposure on the film. A typical calibration
will consist of the following steps:
1. The scanner CCD sensor measures scanner transmittance. Since the scanner spectral
responsivity functions may not be the same as the standard system response filters
(such as Status A for reversal films or Status M for negative films), a 3-D LUT or 3 × 3
matrix is used to convert the scanner transmittance into the transmittance that would
have been measured by a standard densitometer.
2. The transmittance is converted into density by 1-D LUTs because the interimage is
better modeled in density.
3. The film interimage effect is removed by a 3-D LUT or a 3 × 10 matrix.
4. The 1-D LUTs (inverse D–log H ) are used to map the interimage-removed film density
to log exposure.
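A minimal Python sketch of steps 2–4 of this calibration chain is given below (step 1, the conversion of scanner responses to standard Status M quantities, is assumed to have been done already). The 3 × 10 interimage matrix and the inverse D–log H LUTs are hypothetical placeholders for data that would come from the sensitometric calibration.

```python
import numpy as np

def interimage_terms(d):
    """The 10 terms (R, G, B, R^2, G^2, B^2, RG, RB, GB, 1) of the film
    densities used with the 3 x 10 interimage matrix of step 3."""
    r, g, b = d
    return np.array([r, g, b, r*r, g*g, b*b, r*g, r*b, g*b, 1.0])

def scanner_to_log_exposure(transmittance, M_interimage, dlogh_luts):
    """Steps 2-4 of the calibration outlined above: transmittance -> density ->
    interimage-removed density -> log exposure."""
    density = -np.log10(np.clip(transmittance, 1e-4, 1.0))     # step 2
    corrected = M_interimage @ interimage_terms(density)       # step 3: 3 x 10 matrix
    return np.array([np.interp(c, *dlogh_luts[i])              # step 4: inverse D-log H LUT
                     for i, c in enumerate(corrected)])

# Hypothetical calibration data, for illustration only:
M = np.hstack([np.eye(3), np.zeros((3, 7))])    # identity: no interimage correction
luts = [(np.linspace(0.1, 3.0, 16), np.linspace(-3.0, 0.0, 16))] * 3
print(scanner_to_log_exposure(np.array([0.2, 0.3, 0.4]), M, luts))
```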
The most expensive step is generating data for the interimage lookup table or matrix. It
requires an instrument (sensitometer) that can generate controlled exposures on the film
to be calibrated. For consumer applications, photofinishing scanning services usually are
equipped to do this step. Therefore, reasonable calibration can be expected when films
are scanned by photofinishers. Potential film scanner problems are lamp stability, dusty
mirrors, infrared heat absorption, CFA (color filter array) organic dye fading, and dichroic
filter fading. Because film scanners have to handle a high dynamic range, often a preliminary
low-resolution scan is performed to collect data needed to choose the proper neutral density
filter before the final scan is performed.
The above calibration procedure for film digitization is obviously an oversimplified
solution to a potentially complicated problem. It is instructive to consider the problem
mathematically. Let Fr (λ), Fg (λ), and Fb (λ) be the spectral sensitivity functions of the film.
The first complication is that they are also functions of exposure but, over a small exposure
range, we can assume that they are not. The definitions of the film exposures, H_r, H_g, and H_b,
are

H_r = \int L(\lambda) F_r(\lambda)\, d\lambda, \qquad
H_g = \int L(\lambda) F_g(\lambda)\, d\lambda, \qquad
H_b = \int L(\lambda) F_b(\lambda)\, d\lambda,     (19.18)–(19.20)

where L(λ) is the spectral image exposure of the film. When color films are exposed and
processed, the resulting (Status A or Status M) densities, Dr , Dg , Db , are formed as functions
of exposures:
D_r = f_r(H_r, H_g, H_b),     (19.21)
D_g = f_g(H_r, H_g, H_b),     (19.22)
D_b = f_b(H_r, H_g, H_b).     (19.23)
There are two main reasons why each color density depends on all three color exposures:
(1) the spectral energy intended for one color layer is absorbed by another color layer
(punch-through), and (2) the density developed in one color layer is affected by the density
developed in other layers (interimage effect). The functions, f r , f g , and f b , are nonlinear
and have the general shape of the typical D–log H curve when two of the variables are held
constant. The last set of equations needed is the relations between scanner density and the
standard density (Status A or Status M). Let Ar (λ), Ag (λ) and Ab (λ) be the system spectral
responsivity functions (including the densitometer light source) for the standard density,
and Sr (λ), Sg (λ) and Sb (λ) be the scanner spectral responsivity functions (including the
scanner light source). Let T (λ) be the spectral transmittance of the film, and Drs , Dgs , Dbs be
the scanner densities. We have:
D_r = -\log\left[\int T(\lambda) A_r(\lambda)\, d\lambda\right],     (19.24)
D_g = -\log\left[\int T(\lambda) A_g(\lambda)\, d\lambda\right],     (19.25)
D_b = -\log\left[\int T(\lambda) A_b(\lambda)\, d\lambda\right],     (19.26)
D_r^s = -\log\left[\int T(\lambda) S_r(\lambda)\, d\lambda\right],     (19.27)
D_g^s = -\log\left[\int T(\lambda) S_g(\lambda)\, d\lambda\right],     (19.28)
D_b^s = -\log\left[\int T(\lambda) S_b(\lambda)\, d\lambda\right].     (19.29)
From Eqs. (19.18)–(19.29), we can see that the film calibration is a complex problem. The
calibration procedure we outlined before is thus an engineering approximation. If we com-
pare the equations with the procedure, we can see how the various approximations are made.
In some applications, film data may not be available. Empirical data-driven calibration proce-
dures, such as neural network and 3-D LUTs, can also be used very successfully [336, 443].
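When the spectral data are available, Eqs. (19.24)–(19.29) can be evaluated directly. The sketch below assumes the film transmittance T(λ) and the responsivity functions are sampled on a common wavelength grid; the Gaussian responsivities in the example are purely illustrative, not Status M or real scanner data.

```python
import numpy as np

def band_density(T, A, dlam=1.0):
    """One channel's density, D = -log10(integral of T(lambda) A(lambda) dlambda),
    as in Eqs. (19.24)-(19.29).  A is normalized so that T = 1 gives D = 0."""
    A = A / (A.sum() * dlam)
    return -np.log10(np.sum(T * A) * dlam)

def densities(T, responsivities, dlam=1.0):
    """Standard (Status) or scanner densities, depending on which set of
    spectral responsivity functions is passed in."""
    return np.array([band_density(T, A, dlam) for A in responsivities])

# Toy example: a neutral patch (flat 10% transmittance) gives density 1.0 for both
# responsivity sets, but a non-neutral patch generally will not -- hence the 3-D LUT
# or 3 x 3 matrix correction of step 1 in the calibration procedure.
lam = np.arange(400.0, 701.0, 1.0)
T_neutral = np.full_like(lam, 0.10)
gauss = lambda mu, s: np.exp(-0.5 * ((lam - mu) / s) ** 2)
standard = [gauss(620, 20), gauss(540, 25), gauss(450, 20)]
scanner  = [gauss(630, 30), gauss(550, 30), gauss(460, 25)]
print(densities(T_neutral, standard), densities(T_neutral, scanner))
```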
In the last two decades, great progress has been made in the development of solid-state
sensors. Among them, the undisputed leader has been the CCD sensor in the area of high-
resolution and low-noise sensors [461, 948]. However, CMOS sensors are becoming more
popular because it is easier to use the existing CMOS integrated circuit technology to
build camera-on-a-chip and because the CMOS sensors have lower power consumption
[325, 1047]. Still another type of sensor uses amorphous silicon to build the sensor on top
of a dedicated signal processing IC. This type of vertically integrated sensor is called the
thin-film-on-ASIC (TFA) sensor [606, 607]. More interestingly, it is possible to change the
spectral sensitivity of a pixel electronically on this type of sensor [890]. Another new type of
CMOS sensor [616, 656] takes advantage of the fact that light of different wavelengths has
different absorption depths in silicon (the longer the wavelength, the deeper it penetrates),
and therefore a vertical pixel can be constructed so that “red”, “green”, and “blue” image
information is captured at the same location of the image plane, but in different depth layers,
thus eliminating the need for color filters and color interpolation from an RGB mosaic. Solid-
state sensors are a very active research area and the number of commercially successful
sensor types is increasing over time. We will focus our discussion on CCD sensors and their
applications in most detail because so far they are the most widely used high-performance
sensors, but CMOS sensors are quickly becoming important players.
A CCD camera is different from a film camera in that a CCD is used as the image sensor
and an electronic memory device is used for image storage. Some CCD cameras also have an elec-
tronic shutter, instead of the mechanical one in the film cameras. CCD sensors have discrete
sensing elements (pixels) and the image exposure at each element is often digitized into
discrete numbers by an analog-to-digital converter. Therefore, the images acquired by CCD
cameras are often in digital form, which is very convenient for further computer processing.
Figure 19.3. Schematic diagram of three types of CCD architecture: (a) linear array, (b) interline
transfer, and (c) frame transfer.
single pixel and manipulate the voltages of the electrodes so that the charges are transferred
in the desired direction to the neighboring pixel. The arrangement of the electrodes for a
pixel is repeated for every pixel. A group of electrodes that receive the same clock signal
(control voltage) is called a phase. Most CCD sensors use two-phase, three-phase, or four-
phase transfer structures.
CCD sensors are roughly divided into four architectures: (1) linear arrays; (2) inter-
line transfer arrays; (3) frame transfer arrays; and (4) time-delay and integration (TDI).
Figure 19.3 shows schematic diagrams of the first three types of architectures. The fourth
architecture is not usually used in color imaging and will not be discussed here. Sometimes
it is also desirable to mix two types of design to achieve a certain system performance
requirement. For example, a frame–interline transfer CCD architecture [948, p. 119] can
be used so that each vertical line is transferred to an interline shift register and then each
vertical shift register is transferred to a full frame storage, which is then read out externally.
Linear arrays are mostly used in scanners where high-resolution linear arrays can be used
without high cost. Interline transfer arrays have the advantage of a fast read-out rate, but the
sensor area occupies only about half of the exposed image area, i.e., the fill-factor is low.
If the interline shift registers are shielded from light, the exposure time can be controlled
electronically without a mechanical shutter. In comparison, frame transfer arrays have high
fill-factors (100% is possible), but suffer from a slow read-out rate.
The average dark current i_d generated in a pixel is a sensitive function of the temperature T,
as described by Eq. (19.30), in which C is a constant, A is the pixel area (cm²), k is the
Boltzmann constant, and E_g is the silicon bandgap energy, which is given in eV by

E_g = 1.1557 - \frac{7.021 \times 10^{-4}\, T^2}{1108 + T}.     (19.31)
Simple calculation shows that when the CCD is cooled from 25 °C to 17 °C, the dark
current is reduced by a factor of 2. (Note: 25 °C = 298.16 K.) At lower temperatures,
the reduction of the dark current is even more dramatic for the same size temperature
decrement.
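The factor-of-2 figure is easy to reproduce. The Python sketch below assumes the commonly used dark-current temperature dependence i_d ∝ A T^{3/2} exp(−E_g/2kT) (Eq. (19.30) is not reproduced in this excerpt, so this form is an assumption; the constant C and the area A cancel in the ratio) together with the bandgap expression of Eq. (19.31).

```python
import numpy as np

K_BOLTZMANN = 8.617e-5          # eV/K

def bandgap_eV(T):
    """Silicon bandgap energy, Eq. (19.31); T in kelvin."""
    return 1.1557 - 7.021e-4 * T**2 / (1108.0 + T)

def relative_dark_current(T):
    """Temperature-dependent part of the dark current, assuming the common
    form i_d proportional to T^1.5 * exp(-E_g / (2 k T))."""
    return T**1.5 * np.exp(-bandgap_eV(T) / (2.0 * K_BOLTZMANN * T))

T_warm, T_cool = 25.0 + 273.16, 17.0 + 273.16      # 25 C and 17 C in kelvin
print(relative_dark_current(T_warm) / relative_dark_current(T_cool))
# -> approximately 2: cooling by 8 degrees roughly halves the dark current
```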
Occasionally some isolated pixels on a CCD sensor generate an unusually high dark
current compared with the average of other pixels. These abnormal pixels are called
dark spikes (note that these pixels look bright on the image). It seems that silicon lattice
defects in a high electric field tend to generate these dark spikes.
• Fixed pattern noise
Fixed pattern noise refers to the variation in the CCD output pixel values when the
sensor is uniformly illuminated. However, several factors contribute to the total pixel-
to-pixel variations. For example, due to differences in the pixel area in manufacturing,
nonuniformity of silicon doping, nonuniformity of substrate thickness, and differences
in oxide–substrate surface structure, etc., the quantum efficiency, the spectral respon-
sivity, the dark current, and the voltage offset can vary from pixel to pixel. These
variations are properties of each pixel and therefore are called fixed pattern noise. For-
tunately, this type of noise can be corrected to a certain degree by taking several kinds
of calibration image and performing signal processing to normalize a flat field image.
A simple procedure, called flat-fielding, to correct the fixed pattern noise is to take
three kinds of calibration frame of the image: the offset frame Io (x, y), the dark current
frame Id (x, y), and the response frame Ir (x, y) [791, p. 194]. The offset frame is taken
with light blocked off the CCD sensor and using very short exposure time to , so that
the dark current does not have time to build up. The dark current frame is also taken
in the dark, but with a long exposure time td , so that the dark current can accumulate
to very significant levels. The response frame is taken with a uniform illumination and
some normal exposure time tr . Any image I (x, y) that is captured with exposure time
t can be processed to produce a corrected output image, Ic (x, y), by
I_c(x, y) = a\, \frac{[I(x, y) - I_o(x, y)] - (t/t_d)[I_d(x, y) - I_o(x, y)]}
{[I_r(x, y) - I_o(x, y)] - (t_r/t_d)[I_d(x, y) - I_o(x, y)]},     (19.32)
where a is a constant scaling factor. It should be pointed out that the dark current noise
is assumed to be proportional to the exposure (integration) time. The other assumption
is that the illumination of the response frame is assumed to have the same spectral
composition as that of the image frame (because the CCD response is a function of
wavelength). This, of course, cannot be true in typical color imaging applications.
19.5 Solid-state sensors and CCD cameras 499
Therefore, the gain correction does not work as precisely as it should in theory. To be
effective, each of the calibration frames should be the average of many frames taken
under the same conditions.
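Equation (19.32) maps directly onto array arithmetic. The sketch below is a minimal implementation; frame averaging, saturation handling, and defective-pixel treatment are omitted, and the synthetic frames in the example are arbitrary.

```python
import numpy as np

def flat_field_correct(I, Io, Id, Ir, t, td, tr, a=1.0):
    """Fixed-pattern-noise correction of Eq. (19.32).
    I  : raw image taken with exposure time t
    Io : offset frame (dark, very short exposure)
    Id : dark-current frame (dark, long exposure td)
    Ir : response frame (uniform illumination, exposure tr)
    Each calibration frame should itself be an average of many frames."""
    numerator   = (I  - Io) - (t  / td) * (Id - Io)
    denominator = (Ir - Io) - (tr / td) * (Id - Io)
    return a * numerator / np.maximum(denominator, 1e-6)   # guard against division by zero

# Example with small synthetic frames:
I  = np.array([[120.0, 130.0], [125.0, 128.0]])
Io = np.full((2, 2), 10.0); Id = np.full((2, 2), 30.0); Ir = np.full((2, 2), 210.0)
print(flat_field_correct(I, Io, Id, Ir, t=0.1, td=10.0, tr=0.1))
```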
• High-energy radiation
Cosmic rays are constantly bombarding the earth’s surface. At sea level, the cosmic ray
event rate is about 0.025 cm⁻² s⁻¹. Therefore, in an hour of use, a 1 cm² sensor can be
hit by high-energy radiation particles about 90 times.
resistance of the reset transistor (usually a MOSFET functions as a switch) and (2) the
source-follower noise from its output resistance. The reset transistor, when switched
on, has a very low resistance and allows a quick setting of the reference voltage to the
sense node capacitor. Because of the thermal noise in the transistor channel resistance,
the actual voltage that is finally reset on the capacitor is the reference voltage plus or
minus some random fluctuation. Fortunately, when the reset transistor is switched off
for charge readout, its resistance is very high and therefore the time constant (RC) of
the capacitor voltage variation is very long. This allows multiple readings of the sense
capacitor voltage. By sensing that voltage once before and once after the sensor charge
is dumped to the capacitor and taking the difference between these two readings, we
can essentially remove the fluctuation of the reference voltage due to the reset transistor
resistance. This is called correlated double sampling, and is so effective in removing
the reset noise that this noise component is now so small as to be neglected. There
are several methods of implementing the correlated double sampling, each having a
different frequency response and signal-to-noise ratio [652].
• Shot noise
In a semiconductor device, an electric current is the flow of discrete charge particles
(electrons or holes). The discrete nature of the flow produces statistical fluctuations,
resulting in noise similar to that due to photon statistics. However, this is much reduced
in a metal conductor because, in metals, electrons are strongly coupled over a long
distance.
• Flicker (1/f) noise
The source of flicker noise is not well understood. The noise power is inversely
proportional to the frequency and therefore it is also called the 1/f noise.
The flicker noise generally increases with increasing current through an electronic
device [791, p. 68]. At very high frequency, this component becomes quite small,
compared with the thermal noise, which remains relatively independent of frequency
as long as hν/kT ≪ 1.
Figure 19.4. An example of a photon transfer curve (log noise versus log signal: the read-noise
floor has slope 0.0, the shot-noise region slope 0.5, and the fixed-pattern-noise region slope 1.0,
up to full well).
This problem has largely been solved by the camera manufacturers, but we should be aware of this possible noise source
in case it arises in places where the interference noise might be large.
3 The output code is assumed to be linearly proportional to image irradiance. Therefore, all gamma correction
or tone scale curves should be removed before the measurements are made. It is also important to make sure that
the CCD sensor is operating in its linear response range.
To exclude the fixed pattern noise, the difference between two frames of the same exposure can be taken to calculate the noise
variance. Assuming that the noise of the two frames is uncorrelated, the variance calculated
from the frame difference should be divided by 2.
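A minimal sketch of this measurement, assuming the two frames are already linearized and co-registered:

    import numpy as np

    def temporal_noise_variance(frame1, frame2):
        # Differencing two frames of the same exposure removes the fixed pattern
        # noise; for uncorrelated temporal noise, the variance of the difference
        # is twice the per-frame variance, hence the division by 2.
        d = np.asarray(frame1, float) - np.asarray(frame2, float)
        return d.var() / 2.0

Repeating this at several exposure levels and plotting the result against the mean signal gives the photon transfer curve of Fig. 19.4.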
Figure 19.6. Three examples of the pixel block used in CMOS sensors: (a) the simplest pixel has only
a photodiode and its own parasitic capacitance; (b) a more advanced pixel block has an amplifier to
boost the signal; and (c) a complicated pixel block may have an analog to digital converter (A/D) to
convert the photodiode voltage to a digital value and store that digital value in a local memory.
An amplifier in each pixel block boosts the signal and allows a higher readout
rate [12, 176, 439, 534, 858]. We can even go one step further to include an analog-to-
digital converter and a local storage memory in each pixel block to convert the pixel image
irradiance signal into digital form and store it in the local memory, as shown in Fig. 19.6(c).
An electronic shutter controls the exposure by controlling the charge integration time
of the CCD or CMOS image sensors. Two types of electronic shutter have often been used:
the rolling shutter and the global shutter.
In an interline CCD image sensor (see Fig. 19.3(b)), the entire image sensor is reset before
the beginning of an exposure, to remove any residual charges in the photodiodes. Each pixel
then starts to integrate the photon-generated charges from its photodiode. At the end of the
exposure time, the integrated charges at every pixel are transferred simultaneously to the
neighboring, light-shielded, interline pixels, which are then read out sequentially. Therefore,
a digital camera using an interline CCD sensor can function without a mechanical shutter.
In certain cases, the electronic shutter is used in conjunction with a mechanical shutter to
achieve better exposure control.
Some CMOS image sensors also implement light-shielded areas on the sensor and there-
fore can function as the interline CCD sensors with a global electronic shutter [273, 1011].
Most CMOS image sensors use a rolling shutter. Each row of the sensor is reset one after
another sequentially, and after some exposure time, the integrated charges in each row are
read out in the same sequence as it is reset. Therefore, it is similar to the focal-plane shutter
in the film camera described in Section 10.4.3. A rolling shutter, like the focal-plane shutter,
tends to create distortion in the image of a moving object. An additional constraint in CMOS
sensors with a rolling shutter is that the exposure time can only be integer multiples of the
time it takes to read out a row on the sensor. For flickering sources, such as fluorescent
lamps, dark bands can appear in the images for certain shutter speeds.
A special problem arises when photoflash is needed for a CMOS imager with a rolling
shutter, because the flash duration is often very short (a few tens of microseconds to less
than 1 ms). The operation of the CMOS sensor has to be modified to ensure that each pixel
is exposed to the flash light. The operation consists of the following steps in sequence:
(1) the sensor is reset row by row for the entire image; (2) at the completion of the reset
operation, the flash is fired; (3) the image on the sensor is then read out row by row. The
time interval between step (1) and step (3) is determined by an exposure algorithm to get
the proper exposure time for the background where the photoflash may be too weak.
they generate tend to diffuse and be collected by neighboring pixels [551]. In addition,
the cameras are often mass produced and the optical elements have to be cheap and
their optical performance has to be tolerant of manufacturing errors. Therefore, the
optical design is particularly demanding [492, 1027].
2. Exposure control
Although high-quality CCD sensors when cooled to low temperatures can have a very
large dynamic range, typical consumer digital cameras operated at room temperature
have a narrower exposure latitude than color negative films. Therefore it is important
to use the right exposure when taking a picture with a digital camera. Exposure can be
controlled using the shutter speed and/or the aperture size. In general, it is desirable
to keep the shutter speed high to minimize motion blur due to camera shake or object
motion. It is also desirable to use a small aperture to have a longer depth of field and
less lens aberration. Therefore, some trade-off has to be made in exposure control. If
there is enough light, the shutter speed need not be made shorter than necessary, and
the aperture can be reduced further. Similarly, we do not want to make the aperture so
small that lens diffraction degrades the image quality; in that case we may increase the
exposure time a little instead.
Most exposure control algorithms measure the weighted average luminance of the
scene to determine a proper exposure. Exactly how that weighted average is calculated is
often classified as proprietary information. A simple but relatively effective algorithm
can be developed as follows. An image database on the order of several thousands
of images is collected, with the optimal (aim) exposure for each image determined
manually by displaying each image with many exposure variations. In order to predict
a proper exposure for an image, the image is divided into many sections and several
statistical descriptors, such as the average, the minimum, and the maximum luminance
of each section, are calculated. These descriptors and their combinations are then used
in a statistical regression against the manual aims to determine the optimal weighting
coefficients for each descriptor. Of course, the results depend on the image database used
and therefore it is important to collect the images according to the targeted consumer
image population.
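A hedged sketch of the regression step described above follows; the choice of descriptors, the linear model, and the helper names are ours, and a real camera algorithm would likely be more elaborate.

    import numpy as np

    def fit_exposure_weights(descriptors, aims):
        # descriptors: one row per training image, holding per-section statistics
        #              (e.g., average, minimum, maximum log luminance of each section)
        # aims:        manually determined optimal (aim) exposures for those images
        X = np.column_stack([descriptors, np.ones(len(descriptors))])  # add a bias term
        w, *_ = np.linalg.lstsq(X, aims, rcond=None)
        return w

    def predict_exposure(descriptor_row, w):
        return float(np.dot(np.append(descriptor_row, 1.0), w))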
3. Focus control
In order to capture images in a fast changing scene, automatic focus control is needed
to bring the object of interest quickly into good focus. There are two major classes of
approach to automatic focus control: (1) the direct ranging approach works by mea-
suring object distances with infrared (or visible light) triangulation or with ultrasound
reflection; and (2) the maximum contrast approach works by taking images at several
focus positions, calculating image contrast for each position, and determining the focus
position that will produce the maximum image contrast.
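The maximum-contrast approach can be sketched as follows. The Laplacian-variance focus measure used here is one common choice among many, not necessarily the one used in any particular camera.

    import numpy as np

    def focus_measure(image):
        # Variance of a discrete Laplacian: a sharper image has stronger
        # high-frequency content and therefore a larger value.
        img = np.asarray(image, float)
        lap = (-4.0 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
               + img[1:-1, :-2] + img[1:-1, 2:])
        return lap.var()

    def best_focus_position(images_at_positions):
        # Return the index of the lens position giving the maximum image contrast.
        return int(np.argmax([focus_measure(im) for im in images_at_positions]))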
4. CFA interpolation
A single-chip CCD camera can take color images by sequentially capturing the image
with different color filters. Obviously, this is not applicable if objects move between
shots. Therefore, a CCD sensor is coated with different color filters at the pixel sites,
so that each pixel senses the image signal within only one of the spectral bands. This
is similar to our retinas which have L, M, and S cones, occupying different spatial
locations on the retina and sensing different spectral bands.

Figure 19.7. The Bayer color filter array pattern:
    R G R G R
    G B G B G
    R G R G R
    G B G B G
    R G R G R

In our retinas, the cone
mosaic is somewhat random in both spatial and spectral distributions. It has been argued
that the random mosaic helps to reduce visible aliasing. Although it appears that, in
the fovea, the eye optics sufficiently blurs the retinal image so that luminance aliasing
is not a real concern, it is still not clear how our visual system manages to reconstruct
seemingly artifact-free visual images from the color mosaic of cones. For single-chip
cameras, several CFA patterns have been used [88, 207, 749]. Some use “primary”
color filters, such as red, green, and blue, while others use “complementary” color
filters, such as cyan, magenta, and yellow filters. Wide-band (white) filters are also
used in conjunction with color filters to measure the luminance signal. A simple and
very popular color filter array is the Bayer pattern (see Fig. 19.7) which uses two green
pixels for one red pixel and one blue pixel. No matter what CFAs are used, the output
images have to be reconstructed so that at each pixel location there are red, green, and
blue values, or alternatively one luminance and two chrominance values as required in
some video signal representations.
Because color interpolation from the CFA signals often produces undesirable color
artifacts, much research has been done to develop good algorithms [4, 5, 6, 7, 19,
20, 205, 207, 328, 372, 506, 783, 967, 1051, 1069]. Unfortunately, many of the high-
performing algorithms used in consumer digital cameras are not disclosed in the pub-
lished literature.
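As an illustration of the simplest kind of CFA interpolation, the sketch below performs bilinear demosaicking of a Bayer pattern by normalized convolution; it assumes R sites at the even-row, even-column positions (as in Fig. 19.7) and makes no attempt at the edge-adaptive processing used by better algorithms.

    import numpy as np
    from scipy.ndimage import convolve

    def bayer_bilinear_demosaic(raw):
        raw = np.asarray(raw, float)
        rows, cols = np.indices(raw.shape)
        masks = {'R': (rows % 2 == 0) & (cols % 2 == 0),
                 'G': (rows % 2) != (cols % 2),
                 'B': (rows % 2 == 1) & (cols % 2 == 1)}
        k_rb = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
        k_g = np.array([[0., 1., 0.], [1., 4., 1.], [0., 1., 0.]])
        planes = []
        for ch, k in (('R', k_rb), ('G', k_g), ('B', k_rb)):
            m = masks[ch].astype(float)
            # Normalized convolution: a weighted average of the available samples.
            planes.append(convolve(raw * m, k, mode='mirror')
                          / np.maximum(convolve(m, k, mode='mirror'), 1e-6))
        return np.dstack(planes)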
5. Signal processing
Raw image signals captured by the sensors are processed so that sensor defects are
corrected, missing color signals interpolated, control signals generated, and output
images converted into the desired output formats.
There are two types of signal processing in a digital camera. One generates system
control signals, such as automatic focus adjustment, automatic exposure control, and
automatic color (white) balance. The other processes control the acquired image signals
to produce the output images. System control signals can be calculated from the mea-
surements by separate devices, such as infrared range sensors, or can be calculated from
the image signals themselves, such as the exposure control. The many steps of signal
Figure 19.8. A block diagram of typical signal processing steps in a single-chip CCD digital camera: sensor → defect & uniformity correction → CFA color interpolation → white balance → tone scale → conversion to output metrics. AF, AE, AGC, CDS, and CFA stand for autofocus, autoexposure, automatic gain control, correlated double sampling, and color filter array.
processing typically include: (1) input signal shaping; (2) analog-to-digital conver-
sion; (3) pixel-offset correction; (4) pixel-gain correction; (5) pixel-defect correction;
(6) CFA interpolation; (7) exposure (density) balance; (8) color (white) balance; (9)
color correction (3 × 3 matrix); (10) output signal shaping. There are many possible
variations in the sequence of these signal processing steps. In addition, lens fall-off
correction, denoising, contrast enhancement, edge sharpening, and chroma adjustment
are operations that are routinely implemented in most image processing pipelines.
Figure 19.8 shows an example of a simplified signal processing chain.
6. Image verification
In most digital cameras, images are played back on a small on-camera LCD display
soon after they are captured. This provides an easy way to verify the image composition.
The possibility also exists of using the display device to check the accuracy of exposure
control and white balance. However, this does not seem to have been exploited so far.
If focus error is large, the display can also show the problem so that the image can be
taken again.
7. Image compression
In film cameras, films serve as both sensors and storage. In digital cameras, images are
stored on memory cards or magnetic disks. Images are often compressed to reduce the
storage space required.
Calibrating a digital camera typically involves characterizing its opto-electronic conversion function, noise, image resolution, and the geometric distortion of image formation. The noise measurement
has been discussed previously. We will discuss the other aspects of the camera calibration
issues here:
1. Optoelectronic conversion function (OECF)
The OECF of a digital camera is defined as the output digital number as a
function of the input scene log luminance. It is important to note that the meaning
of the digital number is not specified in the definition. However, the OECF allows us
to go back to the scene log luminance and therefore serves as the main calibration
function of the camera. If we measure the output digital number as a function of the
log exposure at the focal plane (instead of the scene log luminance), the relation is
called the focal plane OECF. Since camera flare distribution is scene-dependent, in
general it is not possible to derive the camera OECF from the focal plane OECF or
vice versa. The measurement of a camera OECF is specified by international standards
(ISO 14524, [516]), regarding the test targets to be used, the illumination (luminance,
spectral composition, and lighting geometry), and the imaging geometry. Since most
digital cameras have automatic exposure control which depends on scene structure,
it is important to understand that, given a luminance level of a scene (however it is
measured), we cannot assume that a camera OECF measured at the same luminance
level can be used directly without verification.
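Once measured, the OECF is usually applied in inverted form. A minimal sketch, assuming the measured OECF is monotonic; the function names are ours.

    import numpy as np

    def make_inverse_oecf(log_luminances, code_values):
        # Given OECF samples (code value as a function of scene log luminance),
        # return a function that estimates the scene log luminance from a code value.
        order = np.argsort(code_values)
        cv = np.asarray(code_values, float)[order]
        ll = np.asarray(log_luminances, float)[order]
        return lambda code: np.interp(code, cv, ll)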
2. SFR
If the image formation process of a digital camera can be approximated as a linear, shift-
invariant system, its system transfer function is a very useful measure for characterizing
its spatial imaging performance. There are many ways to measure the system transfer
function of an imaging system. For example, if sine-wave targets are available for many
spatial frequencies, we can take images of these targets and measure the amplitudes
and phases of the sine-wave images to estimate the system transfer function. However,
accurate sine-wave targets are difficult to make. Therefore, other targets such as square-
wave and knife-edge targets are often used [445, 533, 788]. ISO 12233 recommended
using a slanted edge target to determine a camera’s SFR. The mathematical basis of
such measurements is presented in Section 16.3.1.
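A much simplified sketch of the measurement principle is given below: it differentiates an edge profile and takes its Fourier transform, but omits the sub-pixel projection and binning that give the slanted-edge method of ISO 12233 its accuracy.

    import numpy as np

    def sfr_from_edge_profile(esf):
        esf = np.asarray(esf, float)
        lsf = np.gradient(esf)              # line spread function
        lsf *= np.hanning(len(lsf))         # window to reduce truncation effects
        mtf = np.abs(np.fft.rfft(lsf))
        return mtf / mtf[0]                 # normalized SFR (1 at zero frequency)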
3. Geometric calibration
The goal of the geometric calibration of a digital camera is to determine the geometric
mapping from the object space to the image space as observed on the image captured by
the camera. To achieve a reasonable accuracy, this calibration can be a fairly complicated
process [331, 900, 966, 1039]. We will present the basic principles involved in the
process.
From Chapter 9, we learn that in the ideal mapping of Gaussian optics, the ob-
ject coordinates are related to the image coordinates by a perspective transformation.
From Eq. (9.15),
    y′ = f y / (z − z0),
where z0 = zF, we can see that, for a rotationally symmetric system, if the object focal
point, F, is used as the center of projection, the image coordinate y′ is inverted (because
z − z0 is negative for objects in front of the camera, y′ and y have opposite signs).
Figure 19.9. A simple diagram for calculating image coordinates from object coordinates. Note that the point at zF is the object focal point, F, and the point at zH = zF + f is the object principal point. In many applications, a plane at zp = zF − f is used to intercept the projection at y′′, which has the same sign as the object coordinate y. This plane is called the plane of projection.
Figure 19.10. An alternative camera model using nodal points. The image coordinate y′ is determined by the intersection of the ray emerging from the image nodal point N′ with the image plane located at z′. The distance d between the image nodal point N′ and the image plane at z′ depends on where the camera is focused. For convenience in many applications, a plane at zp = zN − d is used to intercept the projection at y′′, which has the same sign as the object coordinate y. This plane is called the plane of projection.
Figure 19.11. The relation between the world coordinates (X, Y, Z), camera coordinates (x, y, z), and image coordinates (x′, y′, z′).
Both the focal point model (Fig. 19.9) and the nodal point model (Fig. 19.10) have been used in the literature, and sometimes there is some confusion about the meaning of the
parameters. For example, the distance between the center of projection (perspective
center) and the plane of projection is fixed ( f ) in the focal point model, but variable
(d) for the nodal point model. Since the focal point of a camera is not difficult to
determine in the laboratory and since it is convenient to have a fixed distance between the
perspective center and the plane of projection, the focal point model is more frequently
used.
In order to discuss the problem of camera geometric calibration, a more general
setting for the coordinate systems is needed. Figure 19.11 shows the relation between
the world coordinates (X, Y, Z ), camera coordinates (x, y, z), and image coordinates
(x , y , z ). The perspective center F is located at (X 0 , Y0 , Z 0 ). The transformation
from the world coordinates (X, Y, Z ) to the camera coordinates (x, y, z) can be ex-
pressed as a translation followed by a rotation (or, alternatively, a rotation followed by a
translation):
    [ x ]   [ a11 a12 a13 ] [ X − X0 ]       [ X − X0 ]
    [ y ] = [ a21 a22 a23 ] [ Y − Y0 ]  =  A [ Y − Y0 ] .   (19.36)
    [ z ]   [ a31 a32 a33 ] [ Z − Z0 ]       [ Z − Z0 ]

The rotational matrix A is an orthogonal matrix (i.e., A Aᵀ = Aᵀ A = I). The image coordinates (x′, y′) can be calculated by projecting (x, y, z) to the projection plane:

    x′ = (f/z) x = f [a11(X − X0) + a12(Y − Y0) + a13(Z − Z0)] / [a31(X − X0) + a32(Y − Y0) + a33(Z − Z0)],   (19.37)

    y′ = (f/z) y = f [a21(X − X0) + a22(Y − Y0) + a23(Z − Z0)] / [a31(X − X0) + a32(Y − Y0) + a33(Z − Z0)].   (19.38)
The above equations are true for an ideal imaging system. In practice, there are several
types of distortion in a camera. The main ones are lens distortion and decentering
distortion [331]. Lens distortion is a result of lens aberration (such as W311 ) and de-
centering distortion is a result of misalignment of the optical components along the
system optical axis. The ideal image coordinates (x′, y′) are related to the distorted
image coordinates (xd, yd) by

    x′ = xd + δx + Δx,
    y′ = yd + δy + Δy,

where δx and δy are due to lens distortion, and Δx and Δy are due to decentering
distortion.
The lens distortion is often modeled as the radial distortion:

    δr = K1 r³ + K2 r⁵ + K3 r⁷ + · · · ,   (19.39)

where r = √(xd² + yd²). Projecting the radial distortion to the x′- and y′-axes, we can express them as:

    δx = (xd/r) δr = K1 xd r² + K2 xd r⁴ + · · · ,   (19.40)

    δy = (yd/r) δr = K1 yd r² + K2 yd r⁴ + · · · .   (19.41)
The decentering distortion [330] is modeled with a radial component, Δr, and a tangential component, Δt. When they are projected to the x′- and y′-axes, they can be approximated as
where α and β are parameters to be determined in the calibration. In the above derivation, we have assumed that the optical axis of the camera intersects the plane of projection at a point P, which is taken to be the origin of the image coordinates. Ideally,
P is located at the center of the image as captured by the CCD camera. However, this
is often found not to be true. There are many methods [1039] of determining the point
P and they often do not give the same results. Therefore, it is important to use the
method that is most relevant to the specific application at hand. In many applications,
P is chosen to be the point that will give the minimum radial distortion. Finally, the
image coordinates have to be converted into pixel locations for a digital camera [966].
This simple process requires scaling and quantization.
Typically, camera calibration involves taking images of test targets containing a set of spatial
points with known spatial coordinates. Given the set of corresponding points (xd, yd)i
and (X, Y, Z )i , i = 1, 2, . . . , m, we can use constrained nonlinear optimization methods
to determine the best set of parameters (i.e., A, X 0 , Y0 , Z 0 , f , K 1 , K 2 , K 3 , etc.)
under the constraint that A AT = I , so that the squared errors between the predicted
coordinates and the measured coordinates are minimized [449, 590, 1022]. However,
such a global optimization may not always give good calibration parameters because:
(1) nonlinear optimization methods require good starting points; and (2) the effect of
error in one parameter can be cancelled by the error in another parameter. Therefore,
it is often more desirable to calibrate some parameters by special procedures (see,
e.g., [487, 627, 900]).
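A compact sketch of such a calibration is given below, assuming NumPy and SciPy. The rotation is parameterized by three angles so that the orthogonality constraint A Aᵀ = I holds by construction, and, to keep the example short, the radial polynomial of Eq. (19.39) is applied directly to the projected coordinates, which is a common approximation rather than the exact inverse of Eqs. (19.40) and (19.41). All function names and the choice of optimizer are ours.

    import numpy as np
    from scipy.optimize import least_squares

    def rotation_from_angles(wx, wy, wz):
        cx, sx = np.cos(wx), np.sin(wx)
        cy, sy = np.cos(wy), np.sin(wy)
        cz, sz = np.cos(wz), np.sin(wz)
        Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
        Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
        Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx          # orthogonal by construction

    def residuals(p, world_pts, image_pts):
        wx, wy, wz, X0, Y0, Z0, f, K1, K2 = p
        A = rotation_from_angles(wx, wy, wz)
        cam = (world_pts - np.array([X0, Y0, Z0])) @ A.T     # Eq. (19.36)
        xp = f * cam[:, 0] / cam[:, 2]                       # Eq. (19.37)
        yp = f * cam[:, 1] / cam[:, 2]                       # Eq. (19.38)
        r2 = xp**2 + yp**2
        radial = 1.0 + K1 * r2 + K2 * r2**2                  # radial distortion terms
        pred = np.column_stack([xp * radial, yp * radial])
        return (pred - image_pts).ravel()

    # world_pts: m x 3 known target coordinates; image_pts: m x 2 measured image points;
    # p0: a good starting guess for (wx, wy, wz, X0, Y0, Z0, f, K1, K2)
    # fit = least_squares(residuals, p0, args=(world_pts, image_pts))

As the text cautions, the quality of the result depends strongly on the starting point p0, which is why some parameters are often calibrated first by special procedures.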
19.6 Scanners
Scanners are often used to convert an analog original into a digital image. The original can
be a reflection print or a transmission film (color negative or reversal). There are four main
types of scanner: handheld, sheetfed, flatbed, and drum scanners. Some scanners are built
specifically for scanning photographic films and some for scanning documents. Here we
will first briefly describe some film scanners,4 and then we will focus on desktop (reflection)
scanners.
Several types of scanner are commonly used for digitizing film images [639, 663]. They
can be roughly classified according to the relative scanning motion between the light beam,
the film, and the sensor. In type I the light beam and the sensor are stationary, and the film
is moved through them in a selected path pattern. In type II the light beam is moving, but
the film and the sensor are stationary. In type III the light beam, the film, and the sensor are
all stationary. These scanner types have different optical and mechanical designs. Micro-
densitometers, drum scanners, and flatbed scanners are all examples of type I instruments.
The basic components of these scanners are: (1) the illuminating (influx) optics, (2) the
film-moving stage (sample plane), and (3) the light-collecting (efflux) optics.
The micro-densitometer is a good example of a film scanner. It has been the main
workhorse for digitizing very-high-quality, very-high-resolution color images from photo-
graphic films. A typical microdensitometer consists of a tungsten lamp as the light source, a
source condenser, a source aperture, an influx ocular, an influx objective, the film plane, an
efflux objective, an efflux ocular, a sensor aperture, a sensor condenser, and a light sensor
(e.g., photomultiplier tube) (see Fig. 19.12). The influx ocular and objective function as
a microscope in a reverse direction, forming a reduced image of the source aperture on
the film plane, while the efflux objective and ocular function as a microscope, forming,
on the sensor aperture, an enlarged transmittance image of the area of the film illuminated
by the image of the source aperture. The irradiated region, the specimen (sample), and the
detected region are actually on the same plane. They are drawn separately because they
are three important parameters that need to be considered separately in the performance
analysis [928].
The optical characteristics of the micro-densitometer have been studied quite extensively
(see, e.g., [232, Chapter 9, 928], and references cited therein). The instrument is quite sen-
sitive to alignment errors and changes in other parameters [987]. Although many problems
(e.g. the partial coherence problem and the conditions for linearity [927]) are fairly well un-
derstood and the manufacturers often provide a few guidelines for selecting proper settings,
the operation of such a scanner requires careful setup and calibration. If the instrument is
4 This section is a modified version of part of a paper [564] published in Optical Engineering.
light source
source condenser
source aperture
influx optics
irradiated region
specimen (sample)
detected region
efflux optics
sensor aperture
sensor condenser
sensor
Figure 19.12. Schematic diagram of a micro-densitometer. (Redrawn from [928, Fig. 2.1].)
configured and operated properly, the system transfer function, M(ν), can be well described
[232] by the following equation:

    M(νx, νy) = F(νx, νy) A(νx, νy),   (19.44)

where F(νx, νy) is the OTF of the efflux optics, and A(νx, νy) is the Fourier transform of the
scanning aperture. This formula has been shown to agree very well with experimental data
[458]. The sampling aperture is the smaller of the source aperture and the sensor aperture
as they are projected to the sample plane. Good-quality microscope lenses are used in the
micro-densitometers, and they often approach the diffraction-limited system. Therefore, to
determine F(νx, νy), it is necessary only to know the f/number or the numerical aperture
of the efflux lens, which typically limits the optical performance of the micro-densitometer.
A numerical aperture of NA = 0.25 gives a cut-off frequency of 2(NA)/λ which, at
λ = 500 nm, is 1000 cycles per millimeter, a resolution far above that of typical films. It should
be pointed out that this assumes that the instrument is operated at a scanning speed such
that the bandwidth of the electronic circuits in the instrument does not degrade the sampled
signals being measured.
A CRT flying spot scanner is an example of a type II device. In the typical configuration,
a scanning spot is generated on the surface of a CRT and a lens system images the light spot
onto the film. The transmitted light from the film is then imaged to a sensor. To calculate
the system transfer function, one can still use Eq. (19.44) with proper functions in place.
One major difference is that the light spot on the CRT generally has a Gaussian-shaped
intensity distribution. If the effective scanning aperture (or sampling aperture) is small
compared with the Gaussian spot, one can approximate the aperture function as a uniform
distribution. Otherwise, it has to be approximated as a truncated Gaussian function. An
additional problem with a flying spot scanner is that the light spot is now moving relatively
Figure 19.13. A simplified schematic diagram of a typical flatbed desktop scanner: a fluorescent lamp (light source), folding mirrors, a lens, a linear CCD, and an A/D converter.
fast compared with the mechanical motion of the type I scanner, and the motion blur may
have to be considered.
Finally, a CCD area array film scanner (for an example, see [663]) is an example of a
type III scanner. The mechanical structure of this kind of scanner is probably the simplest
of the three types. Light projected through the film is imaged onto a CCD sensor. The cell
size of the sensor relative to the image determines the scanning aperture. The optical system
used to form the image determines the optical transfer function. Equation (19.44) can now
be used to compute the system response function. Because of the amount of light needed
for the sensor, the lens must not be too small, and therefore its OTF has to be considered in
computing the total response. Additional problems associated with CCD sensors are lateral
charge diffusion [411] and charge transfer efficiency [460].
With the increased availability of personal computers, desktop scanners have also become
affordable and popular. The image quality achievable in these inexpensive scanners has also
improved a lot. The basic structure of a flatbed desktop color scanner consists of a light
source, a (mechanical or optical) scanning mechanism,5 color filters, light-sensors, and
analog-to-digital converter. The original is laid flat on a glass panel and an illuminated line
is scanned through the original as the reflected or transmitted light is sensed and digitized.
Figure 19.13 shows a simplified schematic diagram of a typical desktop scanner. Almost
all desktop flatbed scanners use linear CCD arrays as the sensors. The physical size of the
CCD is smaller than the full page width. The scanner optics reduces the physical image size
to that of the CCD. Some scanners use contact image sensors that are the same physical
size as the area that is scanned. Although this eliminates the need for a lengthy optical
path, the self-focus lenses on a contact image sensor are often inefficient in light collection
and provide only limited depth of field. Some color scanners may scan an original in three
5 It is also possible to use optics and a two-dimensional CCD area sensor to capture the whole image at once
without having to scan the original.
passes, each with a different color filter inserted between the light and the sensor. Other
single-pass color scanners use a trilinear CCD array sensor which has three rows of CCD
linear arrays, each with a different built-in color filter on top of the CCD.
19.7 A worked example of 3 × 3 color correction matrix

There are two commonly used tools in device color calibration: the color correction matrix
and the 3-D LUT. In this section, we will work out a numerical example of how to derive
a 3 × 3 color correction matrix for an image acquisition system, such as a digital camera
or a home desktop scanner. Since the spectral response functions of most image capture
devices are not any linear combinations of the CIE color matching functions, we cannot
expect that a 3 × 3 matrix will accurately transform the device-dependent RGB color
space to a device-independent color space, such as CIEXYZ. Furthermore, the light source
of the scene or the light source used in a desktop scanner is often very different from
that required by most viewing standards. The 3 × 3 matrix is charged with the task of
providing a best approximation for correcting the difference both in spectral responses and
in illuminants. This being impossible, we should not be surprised when errors for some
colored objects become very large. However, using a 3 × 3 matrix for color correction is
very cost effective to implement and very fast to compute (compared with 3-D LUTs and
the associated multivariate interpolations). In practice, color correction by a 3 × 3 matrix is
sufficiently accurate for many digital camera and other desktop applications. We will discuss
the matrix approach here for the scanner calibration and the 3-D LUT approach later for
the printer calibration. If we want more accurate scanner color calibration, we could also
use the 3-D LUT approach here. For the color correction matrix, we can use the Macbeth
ColorChecker, which has 18 color patches and 6 gray patches from black to white. For the
Figure 19.14. A block diagram of scanner calibration using a color correction matrix: the scanner code values (r, g, b) pass through three 1-D LUTs to give the scanner reflectance values (r′, g′, b′), which a 3 × 3 matrix then converts to (X, Y, Z).
3-D LUT, we would then certainly need to measure more than just 24 color patches as in the
ColorChecker. The graphics art standard target IT8.7/2 or Kodak Q-60R can be used for this
purpose.
Figure 19.14 shows a block diagram of scanner calibration using a color correction
matrix. The scanner output code values (r, g, b) are mapped through three 1-D LUTs into
scanner reflectance values (r′, g′, b′). This step is used to remove any response nonlinearity
in the scanner. A 3 × 3 color correction matrix is then used to transform the scanner
reflectance values (r′, g′, b′) into a device-independent color space, such as the CIE XYZ
tristimulus values. Alternatively, other device-dependent or device-independent color spaces
can be used, but the basic algorithm for deriving the matrix is similar.
There are two types of scanner: reflection and transmission. The transmission scanner is
usually used for scanning reversal (slide) films or color negative films. In this example, we
will further restrict our attention to reflection scanners. The calibration of a transmission
scanner for photographic films is more complicated and has been discussed earlier in this
chapter. Scanner calibration using a color correction matrix proceeds as follows. A Macbeth
ColorChecker (as described in [647]) is scanned on the reflection scanner. A software
program is written that can detect the horizontal and vertical black lines in the Macbeth
ColorChecker. Each of the 24 color and gray patches can be extracted automatically. Several
pixels around the boundary of each patch are discarded and the pixel values of the central
region of a patch are averaged, thus producing relatively noise-free scanner codes for each
patch. Table 19.1 shows the averaged (r, g, b) code values for the 24 color patches of the
Macbeth ColorChecker from the output of a desktop reflection scanner. The bottom row is
a gray scale from white to black. The usefulness of Macbeth ColorCheckers is that they are
made to reproduce the same spectral reflectances and therefore, for noncritical applications,
users can rely on the published data without measuring them. Of course, the target can fade
and its surface can be contaminated by dust. Therefore, the actual spectral reflectances
should be measured for each target when resources permit. From the measured spectral
reflectances, we can compute the CIE tristimulus values of all the color patches under a
given illuminant. Table 19.2 shows the results of such measurements and calculation for
CIE illuminant D50.
Table 19.1. The averaged (r,g,b) code values for the 24 color patches of the Macbeth ColorChecker
from the output of a desktop reflection scanner. The order is the same as the physical target from
top to bottom and left to right
Table 19.2. The CIE XYZ tristimulus values for the 24 color patches of the Macbeth ColorChecker
under illuminant D50 . The order is the same as the physical target from top to bottom and left to
right
(12.05 10.33 4.82) (41.31 35.98 19.50) (16.92 18.36 25.14) (11.17 14.10 4.75) (25.04 23.76 33.95) (31.14 42.54 34.69)
(42.82 32.72 4.19) (11.49 10.88 26.94) (30.50 19.88 9.71) (8.82 6.40 11.14) (36.48 44.73 7.94) (50.82 45.75 6.35)
(8.00 6.56 24.33) (14.20 21.66 7.38) (20.92 11.75 3.43) (61.58 62.14 7.14) (31.39 20.43 23.40) (13.56 19.47 30.33)
(86.79 90.05 74.15) (57.02 59.09 48.14) (34.78 36.20 29.78) (19.00 19.79 16.03) (8.62 9.00 7.24) (2.92 3.01 2.41)
The first step is to find the three 1-D LUTs that characterize the scanner response as a
function of the target reflectance. The bottom six patches of the Macbeth ColorChecker are
different shades of gray, with almost flat spectral reflectance functions. Table 19.3 shows
the averaged scanner r, g, b code values for each gray patch. Figure 19.15 shows a plot of
the data. The measured data points are the circles and the continuous curves are the natural
cubic spline interpolation through the data (i.e., with second derivatives set to zero at the
two end points). There are several things that should be noted: (1) the scanner response seems
to saturate at the high end, and (2) the curves do not seem to go through the origin. The
question is: how do we know if spline interpolation is the right thing to do? This is not an
easy question to answer. If we inspect the figure, we can see that the scanner response to
the first five gray patches is quite linear. We might as well fit a straight line to the first five
points and clip the line at y = 255, the maximum possible output value. The answer has to
come from an understanding of the device physics of the scanner. Our justification is that
the scanner uses a CCD linear array that has an anti-blooming control, which starts to kick
in before the sensor is saturated and therefore tends to create a smooth nonlinear curve at
the upper end of the response curve. Therefore, a spline interpolation is more likely to be
Figure 19.15. The scanner response curves. The circles are the averaged red, green, and blue scanner codes plotted against the reflectance factors of the gray patches. The curves are from the natural cubic spline interpolation through the data points. The red, green, and blue curves are almost on top of each other because scanners are adjusted to give equal r, g, b responses to gray.
closer to the truth than a straight line with clipping. However, this may not always be true
for all scanners. One way to find out is to have reflection targets that have a finer increment
in reflectance factor. The question of why the curves do not seem to go through the origin
is even more troublesome. There are several possible causes, but a likely cause is flare, the
stray light scattered into the sensor by other brighter regions adjacent to the patch being
measured.
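A short sketch of how such a calibration curve can be built, assuming SciPy's natural cubic spline; the clipping and the table length are our own choices.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def build_scanner_lut(gray_codes, reflectances, n_codes=256):
        # Natural cubic spline (second derivatives zero at the end points)
        # through the measured gray-patch data, sampled at every code value.
        order = np.argsort(gray_codes)
        spline = CubicSpline(np.asarray(gray_codes, float)[order],
                             np.asarray(reflectances, float)[order],
                             bc_type='natural')
        return np.clip(spline(np.arange(n_codes)), 0.0, None)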
Figure 19.16. The scanner calibration table: reflectance factor (%) as a function of scanner code value. For a given scanner code value, we can determine the input reflectance factor from the table.
The low price of a desktop scanner often forces a manufacturer to save on material cost and, as a result, scanner
interiors are not all black; some even have white or shiny parts. The flare problem is often
quite severe in desktop scanners [302] and its measurement requires special test targets.
The major problem caused by flare is the reduction of image contrast in the dark areas of an
image. This is not easy to deal with because the flare is dependent on local image details and
the exact solution calls for some kind of deconvolution,6 which is time consuming and noise
sensitive. One corrective measure is to treat the scanner flare as a uniform flare and then use
a tone scale curve with a very steep shadow slope to make dark regions darker and to restore
some of the lost contrast.
6 Basically, the flare is measured as a PSF and some constrained inverse filtering can be used to restore the
flare-free image.
Table 19.4. The (r′, g′, b′) code values for the 24 color patches of the Macbeth ColorChecker as
the results of mapping the data in Table 19.1 through the scanner calibration curves shown in
Fig. 19.16.
(16.91, 7.92, 5.51) (51.06, 30.55, 22.83) (14.34, 19.42, 31.98) (12.44, 15.91, 5.18) (27.19, 22.66, 41.63) (26.96, 50.10, 34.61)
(52.17, 18.91, 4.10) (9.73, 10.30, 33.09) (46.33, 9.16, 12.91) (11.56, 5.14, 16.30) (36.47, 48.98, 6.70) (61.12, 35.92, 5.38)
(4.36, 4.50, 26.74) (12.20, 29.90, 5.97) (37.87, 4.02, 3.91) (75.37, 55.82, 4.66) (42.87, 10.85, 30.34) (8.53, 21.42, 29.38)
(88.98, 88.88, 89.33) (59.00, 58.47, 57.61) (36.09, 35.71, 35.29) (19.73, 19.39, 19.07) (8.80, 8.85, 8.55) (2.79, 2.97, 2.57)
Table 19.5. ΔE errors in the scanner calibration using a color correction matrix.
From the CIEXYZ values of the Macbeth ColorChecker in Table 19.2 and the (r′, g′, b′)
scanner reflectance values in Table 19.4, we can determine a 3 × 3 color transformation
matrix to convert scanner (r′, g′, b′) to CIEXYZ, such that the total CIELAB ΔE error is
minimized, subject to the constraint that neutral colors (r′ = g′ = b′ = 100) are mapped
to the reference white under the chosen illuminant, D50 (Xn = 96.4306, Yn = 100.0, Zn =
82.3924). The optimal matrix determined by a conjugate gradient search is:

        [  0.639101  0.215884  0.096116 ]
    M = [  0.318735  0.641256  0.036091 ] .   (19.45)
        [ −0.002456  0.054175  0.797830 ]
The performance of such a simple calibration process using three 1-D LUTs and one 3 × 3
matrix can be roughly estimated by comparing the measured (L∗, a∗, b∗) values of the
24 patches with their approximated (L∗, a∗, b∗) from the calibration. Table 19.5 shows the
resulting calibration ΔE errors for the 24 patches. The mean ΔE is 3.69, but the maximum
ΔE is as high as 10.18 for the red patch (#15). It should be noted that the patches from (#19)
to (#24) are known to be neutral, but the calibrated (L∗, a∗, b∗) values for (#19), (#20), and (#24)
have quite a noticeable color shift. These errors can be traced back to the calibration curves
constructed by the cubic spline. We can see from Table 19.4 that the (r′, g′, b′) values for
these three patches have significant deviations from the neutral (i.e., r′ = g′ = b′). Since
our visual perception is quite sensitive to neutral errors and they can be very objectionable,
we have to be very careful when constructing the three 1-D LUTs. Depending on how
objectionable the resulting neutral errors are, we may or may not choose to go back and
adjust the calibration curves and recompute the matrix to reduce the neutral errors.
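One way to set up such a constrained fit is sketched below. This is our own construction, not necessarily the procedure used to obtain Eq. (19.45): six matrix entries are left free and the third column is computed so that r′ = g′ = b′ = 100 maps exactly to the D50 white; the CIELAB conversion keeps only the cube-root branch, which is adequate for the ColorChecker patches; and a general-purpose optimizer stands in for the conjugate gradient search mentioned in the text.

    import numpy as np
    from scipy.optimize import minimize

    WHITE_D50 = np.array([96.4306, 100.0, 82.3924])

    def xyz_to_lab(xyz, white=WHITE_D50):
        f = np.cbrt(np.asarray(xyz, float) / white)
        return np.stack([116.0 * f[..., 1] - 16.0,
                         500.0 * (f[..., 0] - f[..., 1]),
                         200.0 * (f[..., 1] - f[..., 2])], axis=-1)

    def make_matrix(p):
        M = np.zeros((3, 3))
        M[:, :2] = p.reshape(3, 2)
        M[:, 2] = WHITE_D50 / 100.0 - M[:, :2].sum(axis=1)   # neutral constraint
        return M

    def total_delta_e(p, rgb_prime, lab_aim):
        lab = xyz_to_lab(rgb_prime @ make_matrix(p).T)
        return np.linalg.norm(lab - lab_aim, axis=1).sum()

    # rgb_prime: 24 x 3 scanner reflectance values (Table 19.4)
    # lab_aim:   24 x 3 CIELAB values computed from the XYZ data in Table 19.2
    # result = minimize(total_delta_e, x0=np.full(6, 0.3),
    #                   args=(rgb_prime, lab_aim), method='Nelder-Mead')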
This example clearly shows that in order to achieve a good scanner calibration, we have
to pay special attention to the gray-scale measurements. It also shows that a 3-D LUT has
to be used if the maximum color error is to be reduced further.
19.8 Problems
19.1 Why should we use cyan, magenta, and yellow dyes as the subtractive primaries?
19.2 Design a set of block dyes that will give the maximum color gamut for a color reversal
film, using CIE illuminant A as the light source.
19.3 On a Bayer CFA, a blue pixel is immediately surrounded by four green pixels. One
method for interpolating the green signal at the blue pixel is to use the average of the
medians of the four green neighbors. For example, if the four green pixels have the
values 63, 80, 84, 85, the average of the medians, 80 and 84, is 82. This median filter
approach is less sensitive to noise and also has the advantage of preserving a high-
contrast edge. However, the computation tends to be more costly than other methods.
Design a fast method for computing the average of medians. How many comparison
operations are needed for each pixel?
19.4 You have a color digital camera and a Macbeth ColorChecker. You would like to con-
vert a color image taken by the camera into a black-and-white image which would ap-
proximate an image taken by a camera with a spectral response function identical to the
CIE luminous efficiency function, V (λ). Given that you know the luminance value of
each color patch on the ColorChecker, how do you do the calibration and conversion?
19.5 Let there be m color samples with spectral reflectance factors ρi (λ), i = 1, 2, . . . , m,
and the spectral power distribution of the illumination L(λ). An imaging system with
spectral response functions R(λ), G(λ), and B(λ) is used to take an image of all the
m color samples. The pixel values corresponding to each of the color samples are
(ri , gi , bi ), i = 1, 2, . . . , m. Assuming that the pixel values are proportional to the
image irradiance and that we are only interested in spectral samples at every 10 nm
from 400 nm to 700 nm (inclusive):
(a) How can we determine the spectral response functions, R(λ), G(λ), and B(λ),
from these pixel values: (ri , gi , bi ), i = 1, 2, . . . , m?
(b) What is the minimum number, m, of color samples needed for the determination?
19.6 In calibrating a digital camera, we use three 1-D LUTs and a 3 × 3 matrix to convert
camera (R, G, B) to CIE (X, Y, Z ).
(a) What are the three 1-D LUTs for?
(b) What is the 3 × 3 matrix for?
(c) Under what conditions will this type of calibration give exact results for all color
stimuli?
(d) Can we use a 3-D LUT for this calibration? What are the advantages and dis-
advantages of the 3-D LUT approach over that using 1-D LUTs and a 3 × 3
matrix?
20 Color image display
Color images can be displayed in many different kinds of devices: CRT monitors [160],
liquid-crystal displays (LCDs) [491], plasma display panels (PDPs), field-emission displays
(FEDs) [451], organic light-emitting diodes (OLEDs) [865], and other electroluminescent
displays (ELDs) that receive electronic signals and produce color images on the display
screen. The images are transient in the sense that they last not much longer than the electric
signals. They are called soft-copies. In comparison, color images that are printed on reversal
films (slides) and papers, by photographic processes, inkjet printing, thermal dye transfer,
and offset printing, last much longer after the printing is completed. These are called hard-
copies.
Color images displayed as soft-copies are almost exclusively based on additive color
reproduction, in which colors are produced by additive mixtures of three or more primary
colored lights (usually red, green, and blue). In comparison, color images produced by
hard-copy devices mostly use subtractive color reproduction, in which part of the light
spectrum is selectively removed from the illumination on the hard-copy. For example, cyan
ink removes mainly the long-wavelength part of the illuminant spectrum. The more cyan
ink is deposited on a paper, the less “red” light is reflected from the print. Different colors
are produced by controlling the amounts of cyan ink, magenta ink, yellow ink, and black
ink (to produce very dark colors).
Therefore, there are two major topics in our study of color image displays. The first is
the device or the medium itself, and the second is the method of color reproduction for
that display. The basic principles in additive and subtractive color reproductions are the
same, independently of the devices. In this chapter, we will discuss the two most important
soft-display devices, CRT monitors and LCD monitors, and the two most important
hard-copy printing technologies: continuous tone printing (such as photographic paper and
dye transfer) and half-tone printing (such as inkjet and lithographic offset printing).
Since its invention in 1897 by Ferdinand Braun [864], the CRT has been greatly improved and
become the most widely used image display device today. Figure 20.1 shows a schematic
diagram of a typical CRT, which consists of an electron beam-forming and accelerating
subsystem, a beam deflection subsystem, a metal mask, and a phosphor screen, all contained
in a vacuum tube. A CRT image display works by producing and focusing an electron beam onto the phosphor screen, where the electron energy is converted into light by the phosphors.
Figure 20.1. A schematic diagram of a typical color CRT: the cathode, grids 1–4, the deflection yoke, the anode, the metal mask, and the phosphor screen on the glass faceplate, all contained in a vacuum tube.
Figure 20.2. Schematic diagrams of typical masks used in color CRTs: (a) the shadow mask (dot mask), invented in 1949, used with a tri-dot phosphor screen; (b) the parallel-stripe (aperture grille) metal mask, used with in-line guns.
There are two types of electron gun: the delta gun and the in-line gun. The in-line gun
arranges the three electron beams on the horizontal axis. With this arrangement, a deflection
yoke can be designed that maintains dynamic convergence over the full face of the tube
without the need for correcting waveforms [97, p. 12.16]. Because of this advantage of a
simplified deflection circuit, in-line guns are used for most commercial televisions.
The glass bulb (tube) used in a CRT is usually opaque to x-rays and visible light. The mask
is made of metal and, as the CRT is warmed up, the mask bulges (the thermal expansion can
be caused by the heat generated by electrons hitting the mask). This can cause misalignment
between the holes in the mask and the phosphor dots and the resulting mislanding of electrons
is called doming. The aperture grille mask is usually restrained with thin wires (called
damper wires) to maintain its shape. The phosphor layer is coated with a thin aluminum
film so that accumulated electrons on the phosphor layer can be conducted away.
If the drive voltage makes grid 1 positive with respect to the cathode, current is drawn to grid 1. The maximum current density occurs when grid 1 is at the same
potential as the cathode.
The other possible limiting factor of the current density is the cathode temperature T . If
we neglect the space-charge effect, the maximum current density, JT , is given by
    JT = (4π m e / h³) k² T² exp(−eφ / kT),   (20.2)
where eφ is the work function (about 1–2 eV) of the cathode surface at temperature T . Most
cathodes operate between 900 and 1100 K, and, in most CRT designs, JT is greater than Js ,
making cathode current density space-charge-limited.
Moss [681] derived the following empirical relation between the cathode current Ic and
the grid drive voltage Vd for the space-charge-limited emission at the cathode surface when
the grid 2 voltage is fixed relative to the cathode:
where K is a constant, Vo is the absolute value of the cut-off voltage of grid 1, and Vd is the
positive-going drive signal applied to the negative grid voltage, −Vo. Moss also observed
that when Vd is greater than about seven-tenths of Vo , the cathode current is better described
by the following relation:
If the drive signal is applied to the cathode, instead of to grid 1, then the grid 2 voltage
relative to the cathode will vary with the drive signal and the above relation has to be
modified (see [97, p. 12.12]).
In practice, the relation between the cathode current and the drive signal (measured
above cut-off) is often described as a power function with an exponent that varies from
device to device. Over a considerable range of the cathode current, the phosphor luminous
output is linearly proportional to the cathode current, and the relation between the phosphor
luminance and the signal voltage is also approximated by the power function. The reported
exponents for color CRTs vary from about 1.5 to 3.0 ([895, p. 107] reported a range from
2.6 to 3.0). However, the power function is a good model only when the cut-off voltage is
subtracted from the applied signal. It is not known if all the reported values were determined
with the proper offset subtracted, and therefore the typical value of the exponent of a CRT
cannot be accurately estimated from these reported data. NTSC has used 2.2 as its standard
gamma and there are informal measurement reports that agree with this value.
A thin aluminum film is deposited on the surface of the phosphor screen. This metal coating is thin enough to
allow most of the incident high-energy electrons to go through. It also functions as a mirror,
reflecting back to the glass side roughly 80–90% of the light emitted to the gun side.
The incident electrons that penetrate into the phosphor lose their energy by interacting
with the electrons and nuclei of the phosphor powders. The depth of penetration is approxi-
mately a power function of the electron energy [382, p. 295]. For the electron energy between
15 and 30 keV typical of a CRT, the penetration depth is usually less than the average size of
phosphor particles, 3–12 µm. A typical screen thickness is about 2–4 particles. This depth is
determined to optimize the luminescence efficiency [154]. When the energy of the incident
electrons falls below a threshold level, the light output is reduced to zero. Well above the
threshold, the screen luminescence intensity is approximately proportional to the energy of
the incident electrons. The brightness achievable is limited by several factors: saturation of
the activator centers and reduction in efficiency at higher operating temperatures.
Figure 20.3. The CRT model (solid curve) and the measured data (circles): luminance (fL) plotted against code value.
Figure 20.3 shows a typical example of the accuracy of the model compared with the
measured data (the subject is a SONY-17E10 monitor). The circles are the measured data
and the solid curve is the model resulting from the least-square fit. It should be pointed out
that measurement of the radiance (luminance) at low signal levels is a very time-consuming
process with very low precision. In fact, in a US patent [83], it is proposed that the model
be used to extrapolate the low end of the curve after the higher radiance levels have been
measured with an instrument.
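The exact form of the model fitted in Fig. 20.3 is not reproduced here, but a commonly used characterization of this kind is the gain–offset–gamma form, L(v) = Lmax · max(v/255 + b, 0)^γ. The following SciPy sketch fits such a model to measured (code value, luminance) pairs and is offered only as an illustration of the fitting procedure; the model form and starting values are assumptions.

    import numpy as np
    from scipy.optimize import curve_fit

    def gog_model(code, l_max, offset, gamma):
        # Gain-offset-gamma model of luminance versus code value, clipped at zero.
        v = np.asarray(code, float) / 255.0
        return l_max * np.clip(v + offset, 0.0, None) ** gamma

    # codes, luminances: measured pairs (e.g., luminance in fL as in Fig. 20.3)
    # params, _ = curve_fit(gog_model, codes, luminances, p0=[25.0, 0.0, 2.2])
    # The fitted curve can then be used to extrapolate the low end, as suggested in [83].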
Table 20.1. Examples of phosphors that are designated as P22 by the Electronic
Industries Association
Table 20.2. The chromaticity coordinates of CRT phosphors for the various standards
and monitors
The Worldwide Type Designation System (WTDS) was established in 1982. A listing of the various registered phosphors can be
found in reference books (for example, [866, Chapter 6]).
Although different national and international standards attempt to define standard phos-
phors, few if any CRT monitors have the standard phosphors, as can be seen in Table 20.2.
Figure 20.4. A block diagram of CRT monitor calibration: the desired (X, Y, Z) tristimulus values are converted by a 3 × 3 matrix into the phosphor tristimulus values (R, G, B), which are then mapped through three 1-D LUTs into the code values (r, g, b) sent to the CRT monitor.
Let (XR, YR, ZR), (XG, YG, ZG), and (XB, YB, ZB) be the tristimulus values of the red, green,
and blue phosphors, respectively. Then for any CRT tristimulus values, (R, G, B), the tristimulus values (X, Y, Z) of
the color displayed on the CRT screen can be calculated as
    [ X ]   [ XR XG XB ] [ R ]
    [ Y ] = [ YR YG YB ] [ G ]        (20.6)
    [ Z ]   [ ZR ZG ZB ] [ B ]

and

    [ R ]   [ XR XG XB ]^(-1) [ X ]
    [ G ] = [ YR YG YB ]      [ Y ] .   (20.7)
    [ B ]   [ ZR ZG ZB ]      [ Z ]
In the above matrix transformation, we can scale the (R, G, B) tristimulus values to any
convenient range, as long as we also scale the matrix elements inversely, so that the equation
still holds. The actual code values (r, g, b) that are sent to the color monitor can be looked
up from the three 1-D LUTs using (R, G, B) as indices as shown in Fig. 20.4.
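A minimal sketch of Eqs. (20.6) and (20.7), assuming the tristimulus values of the three phosphors at full drive have been measured; the function name is ours.

    import numpy as np

    def phosphor_matrices(xyz_r, xyz_g, xyz_b):
        # The columns of M are the phosphor tristimulus values, as in Eq. (20.6);
        # its inverse implements Eq. (20.7).
        M = np.column_stack([xyz_r, xyz_g, xyz_b])
        return M, np.linalg.inv(M)

    # M, M_inv = phosphor_matrices(xyz_r, xyz_g, xyz_b)
    # R, G, B = M_inv @ np.array([X, Y, Z])   # then map (R, G, B) through the three
    #                                         # 1-D LUTs to the code values (r, g, b)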
20.2 LCDs
Two of the major disadvantages of CRT monitors are their big physical size and the large
power source they require. Flat panel displays using low-voltage, small-power sources
have distinct advantages where physical space and weight matter most, such as in portable
computers. Among the many types of flat panel displays, LCDs have gained the most
popularity. The image quality of LCDs still lags behind that of the CRT, but it has been
drastically improved since the early 2000s [491]. In this section, we will study how LCDs
work and some of their characteristics.
We will concentrate on the basic principles of the thin-film transistor (TFT), active-matrix (AM), TN-LCD, because it is
currently the most widely used LCD. Figure 20.6 shows a basic TN liquid crystal cell [679].
In Fig. 20.6(a) without any externally applied electric field, the liquid crystal molecules
orient themselves to the glass plates that confine them. Note that the molecules near the
top glass plate are oriented in one direction and those near the bottom plate in a direction
orthogonal to that of the top plate. The orientation of the molecules in between the two plates
gradually twists between the two. Therefore, this is called the twisted nematic liquid crystal.
On the surfaces of the top and bottom glass plates are polarizer plates, each polarized to the
Figure 20.6. A basic twisted nematic (TN) liquid crystal cell between two polarizers: (a) with no applied voltage, the twisted molecules rotate the polarization of the light and the cell transmits; (b) with a voltage applied, the molecules align with the field and the light is blocked.
direction of the molecules immediately adjacent to it. An unpolarized light beam passing
through the top polarizer will become polarized, and this polarization is gradually twisted
by the LC molecules. By the time the light reaches the bottom glass plate, it has the correct
polarization to go through the bottom polarizer plate and emerge on the other side of the
cell. Therefore, light is transmitted when the cell is in the off position (meaning no external
electric field). If a large voltage (say 10 V) is applied between the top plate and the bottom
plate, an electric field is present in the cell (see Fig. 20.6(b)) and the molecules are polarized
into electrical dipoles that align themselves along the electric field. The polarization of the
light beam is now no longer twisted and when the light reaches the bottom polarizer plate, it
does not have the correct polarization to pass through. Therefore, the light is blocked when
the voltage is on. It turns out that, if a smaller voltage (say 4 V) is applied, the molecules
will only partially align with the electric field and still twist the polarization of the light
somewhat to allow partial transmission of the light through the bottom of the cell. By
controlling the cell voltage, we can therefore control how much light is transmitted through
the TN liquid crystal cell. (This way of using DC voltage differences to drive the LCD is
called direct drive. Over a wide range of AC frequencies, TN liquid crystal also responds
to the root-mean-square (RMS) of the voltage difference. Most TN-LCD drivers are now
based on these characteristics.) What we have just discussed is called the normally-white
mode. We can also arrange the bottom polarizer to make it parallel with the top polarizer to
obtain a normally-black mode, because then the beam is twisted to orthogonal polarization
and cannot pass through without external voltage applied to the cell.
The basic cell we described above can function as a pixel. If we have an array of pixels
and can apply the image signals as voltages to each pixel individually, we then have a
liquid-crystal image display. The most successful method for addressing each pixel in an
LCD is by an AM drive, which is basically a set of horizontal and vertical voltage lines
forming a grid pattern of pixels. The reason the matrix is called active is that at each grid
(pixel) location there is an active electronic element (such as a transistor), instead of merely
a passive element (such as a resistor or capacitor). The active element allows the transfer of
the signal at each pixel to the liquid crystal cell to be controlled by a small addressing clock
signal. This is usually accomplished by a transistor, implemented by thin-film deposition of
differently-doped semiconductor materials. The resulting LCDs are thus called TFT-LCDs
(the AM is implied). Figure 20.7 shows a schematic diagram of a TFT-LCD with a CFA
for color image display [249, 680, 728]. Figure 20.7(a) shows the physical structure and
Fig. 20.7(b) shows the equivalent functional circuit. Note that the liquid crystal here serves
as the dielectric material of the capacitor whose two electrodes are the pixel electrode
(on the lower glass plate) and the common electrode (on the upper glass plate). Both
electrodes are made of indium tin oxide (ITO) which is a transparent conducting material.
On the liquid crystal sides of the ITO surfaces, there are polyimide coatings, which are
rubbed to align the liquid crystal molecules along the desired directions. In order to operate
smoothly (by reducing the possible directions of rotation), the liquid-crystal molecules
are tilted at a few degrees from the ITO planes. This is called the pretilt angle. The TFT
is opaque and occupies only a small fraction of the total pixel area. The data signal is
transferred to the pixel electrode from the data line through the TFT. The voltage between
the pixel electrode and the common electrode creates the electrical field that is applied to the
liquid crystal molecules to change the light-transmitting property. The R, G, B color filters
modify the output spectra to produce the desired color stimulus at a spatial location on an
image.
Figure 20.7. Schematic diagram of a TFT-LCD with a color filter array: (a) the physical layer stack (polarizer, glass, color filter array with R, G, B filters, common electrode (ITO), liquid crystal, pixel electrode (ITO), glass, polarizer); (b) the equivalent circuit, with a TFT at each pixel connected to a gate line and a data line.
There are several types of material that have been used to make the TFTs. Amorphous
(noncrystal) silicon has been the most widely used material because it does not require
high-temperature processes. However, its mobility is low (therefore giving a slow response
time) and it is sensitive to light (thus requiring a light shield). Polysilicon material
has high mobility, but usually requires high-temperature processes and/or special glass
plates. However, low-temperature, polysilicon (LTPS) transistors have been successfully
manufactured into very good quality displays.
One of the major operational differences between a CRT and an LCD is the duration of
the dwell time, which is the length of time a pixel emits light in each refresh cycle. In
the AM LCD, each pixel is addressed once per field or frame and the luminance of the pixel
stays constant until it is addressed again. In order to avoid DC voltage build-up, the polarity
of the driving voltage of an LCD cell is often alternated at line rate or field rate. This still
makes the dwell time of an LCD pixel orders of magnitude longer than that of a CRT pixel.
As a consequence, flickering is generally not a problem in an LCD.
One of the major deficiencies of the TN-LCD as described above is the narrowness of
the viewing angle, typically 40◦ vertically and 90◦ horizontally. This is caused partially
by the polarizers but mainly by the angular transmittance of the liquid crystal cell being a
sensitive function of the angle between the light beam direction and the orientations of the liquid
crystal molecules. The viewing angle can be greatly increased by various methods, such
as optical film, multidomains, in-plane switching, and axially symmetric aligned micro-
cell mode [542]. These methods have different advantages and shortcomings. For example,
the in-plane switching method is one of the early successful methods, but it suffers from
having a lower aperture ratio and slower response time. Figure 20.8 shows how the in-plane
switching LCD works. It is possible to extend the viewing angle to 140◦ both vertically and
horizontally by this method [523].
Figure 20.8. Schematic diagram of an in-plane switching (IPS) LCD cell, showing the liquid crystal (LC) molecules between the two polarizers without and with the in-plane electric field E applied.
Figure 20.9. Coordinate framework (inclination angle θ and azimuth angle φ) for the polar diagram of LCD angular measurements.
Several experimental measurements of the BRDF of LCD surfaces [91, 497] show that
much of the ambient light is reflected as a haze component, i.e., the image is not as sharp
as a specular reflection, and not as uniform as from a Lambertian surface either.
The electrooptical transfer curve of an LCD, the voltage–luminance curve, has an s-
shape that is quite different from that of a CRT monitor. The toe and the shoulder of the
s-curve are often not symmetric, with one much steeper than the other (see, for example,
[856, Fig. 7]). The red, green, and blue electrooptical transfer curves of an LCD also tend
to differ more from each other, resulting in neutral color shift as a function of input code
values. Because the luminance and chromaticity of a pixel vary dramatically as a function of
viewing angle, the calibration of an LCD usually requires measurements at various angles.
The convention is to plot the angular variation of the measured data on a polar diagram,
where the radius vector represents the inclination angle θ and the polar angle represents
the azimuth angle φ, both being defined with respect to the LCD plane (Fig. 20.9). There
are two methods for making measurements of the angular distribution: the goniometric
method and the conoscopic method [848]. The goniometric method uses precisely machined
equipment to rotate the LCD and measure its light output with a photometer, a colorimeter,
or a spectroradiometer. The conoscopic method uses an optical system to map the angular
distribution to be measured into a spatial image. Therefore, the conoscopic method is much
faster.
20.3 PDPs
Light produced by gas discharge has been well studied. Fluorescent lamps and neon lamps
are well-known light sources based on gas discharge. However, high-luminance and high-
resolution display panels based on gas discharge were introduced to the consumer electronics
market in the early 2000s. Judging from their excellent image quality, we can expect that
they will become more widely used for high-quality color displays.
Basically, a PDP works with a neon light at each pixel. A pixel cell is filled with inert
gas between two electrodes, one of which is transparent to light. When the voltage between
the two electrodes of the gas-filled cell is increased, the current increases at a very small
rate and begins to saturate at a very low level. As the voltage increases further and further,
a point is reached when the electrons gain enough energy to ionize other gas atoms (such
as neon) during collisions, and the freed electrons can further ionize other atoms, and
the current starts to increase at an exponential rate. Eventually a current level is reached
at which it can sustain itself while the voltage remains at about the same level. If there
are no external current-limiting circuits, it is very difficult to control the operating point
of this gas plasma (ionized atoms and electrons). When the current is further increased
beyond this level, the voltage across the gas gap actually decreases (a negative resistance).
At this operating point, the gas plasma glows with light [1018]. The spectral emission is a
characteristic of the gas composition. For example, the red–orange glow seen on some PDPs
is due to neon gas. This is vaguely called the “primary color” of the gas. Different types
of gas have different electron orbital energy levels and the emitted wavelengths correspond
to the energy differences between orbitals. By using different gas compositions, different
color primaries can be generated for a color PDP. Some of the light emitted by the gaseous
discharge is in the ultraviolet range and can be used to excite phosphors which in turn emit
photons of different wavelengths, the same mechanism as is used in fluorescent lamps. By
using different phosphors, different color primaries can be produced. Color PDPs may use
one of these two methods to produce different colors at different pixel cells. Figure 20.10
shows an example of the basic structure of a color PDP (glass plates, an MgO protective layer, the plasma cells, and the data electrodes).
Color PDPs have several advantages over CRTs and other flat panel display devices.
Since there is not much pressure difference between the inside and the outside of the panel,
PDPs do not require thick glass plates like CRTs. While the response time of an LCD is on
the order of 10−3 s, the PDP response time is on the order of 10−6 s. Another advantage is
that PDPs can be made fairly large (40–50 in) because the capacitance between electrodes
is not large and the required driving circuits can be readily achieved. The level of luminance
(500 cd m−2 ) and luminous efficiency (2–3 lm W−1 ) of the modern PDPs are as good as
those of CRTs [974]. With contrast ratios approaching 500:1, the images of commercially
available color PDPs are of very high quality (they look very colorful and very sharp).
20.4 Electroluminescent displays
There are many new types of color display that are self-emissive. Most of them rely on
energy transformation from externally supplied power into photons. Luminescence is the
nonthermal emission of photons by a material when it is excited by some form of external
energy. For example, chemical reactions supply the external energy for chemiluminescence,
mechanical disruptions that for triboluminescence, and electron motion energy that for
cathodoluminescence (as used in CRTs). The type of luminescence we are interested in this
section is electroluminescence which converts electric energy into light, mainly through
electric fields.
One of the major inventions in this field was the development of materials for hole/electron injection
layers that allow efficient injection of electrons and holes. Holes and electrons will not have
a good chance to recombine and produce photons if they pass by each other too fast or do
not have properly matched energies or quantum states. Therefore, the transport layers were
developed to increase the efficiency of photon-producing hole–electron recombination. In
practice, the electron injection layer and the electron transport layer are often combined
into one layer and the light-emissive layer can be in the hole–electron transport layer. The
structures for small-molecule OLED cells and those for LEP cells are quite similar, but the
latter tend to require fewer layers than the former.
The spectral emission bandwidth of organic molecules is usually around 50–70 nm at
the half-height amplitude. The color gamut can be made larger by sharpening the emission
peak. Very high luminous efficiencies have been achieved with these OLEDs, roughly on
the order of 10–40 lm W−1 in the green [415]. Operating voltages are low, on the order of
3–10 V, because the organic emissive layer is very thin, on the order of 50–200 nm. The
manufacturing processes of the low-molecular OLED and the polymer OLED are quite
different. Color displays using either type of OLED have been demonstrated. They are
very bright (they can be greater than 50 000 cd m−2 ), have long lifetimes (longer than
10 000 hours), have wide viewing angles (greater than 160◦ ), and fast response time (less
than 10−6 s). As manufacturing processes are perfected and quantities grow, the OLED
color displays may become one of the dominant color image display devices for many
applications.
20.5 Printing technologies
Before personal computers were widely used, office printing relied on typewriters, daisy-
wheels, or dot-matrix printers that used a type bar or stylus to strike the print medium
(paper) to produce output documents. These printing processes are known as impact printing.
Figure 20.12. Various types of printing technologies: impact printing (water-based and waterless offset lithography, letterpress, gravure, and screen printing) and nonimpact printing (silver-halide photography, electrophotography, drop-on-demand and continuous inkjet, thermal transfer and sublimation printing, ionography, and magnetography).
Traditional commercial printing processes, such as letterpress, gravure, lithography,
flexography, and screen printing, rely on separate carriers called plates to transfer images
or documents to a blanket and then to impress them on the output medium (paper). These
are the conventional impact printing processes, and are generally known as plate printing.
Other printing technologies, such as electrophotography, inkjet, and thermal dye transfer,
are called nonimpact printing or plateless printing [719]. Figure 20.12 shows a list of cur-
rent printing technologies. The printing industry is a very large industry and many different
printing technologies are used to produce documents and hard-copy images [8, 216, 511].
Here we briefly describe some of the technologies commonly used for printing color im-
ages. The main purpose is to introduce the working principles of each technology so that we
can have at least some basic understanding of the physical processes when dealing with the
potential color imaging science problems associated with each printing technology. Readers
who are interested in more detailed information should consult books on these technologies
(e.g., [8, 484, 511, 719]).
20.5.1 Offset lithography
A lithographic plate is typically made of a (possibly thin and flexible) material (such as aluminum) which does not transfer or impress
inks to paper very well. In 1903, Ira Rubel built a lithographic machine in which the plate
image was impressed on a rubber-covered cylinder that in turn impressed the image on the
paper [199]. The rubber-covered cylinder is called the blanket cylinder and the plate image
is said to be offset to the blanket cylinder before it is transferred to the paper. Therefore,
the combined printing process is called offset lithography.
On a lithographic plate, the image areas are coplanar with the nonimage areas. The basic
working principle of lithography is that the image areas on the plate are ink-accepting and
the nonimage areas are ink-repellent. For example, in water-based lithography, the image
areas are hydrophobic (i.e., water repellent), and the nonimage areas are hydrophilic. Greasy
inks are thus adsorbed only onto the image areas. However, new waterless offset printing
technology uses silicone-layer coated plates so that the nonimage areas are ink-repellent
[511, p. 211]. The image areas are formed by removing the silicone layer. In this process, no
water is needed to wet the nonimage areas. The plates used in offset lithography printing
are thin (up to 0.3 mm) and therefore are easy to mount on the plate cylinders.
The lithographic process is inherently binary in the sense that the amount of ink per
unit area is not controlled in gradation. Therefore, the half-toning process is always used in
offset lithography. Typically four color plates (CMYK) are prepared by using different color
filters and screens of different angles to produce half-tone images. There are many books
and papers on the science and practice of color separations and proofing [142, 310, 1065].
20.5.2 Letterpress
In the letterpress process, the image areas on the plate are raised above the nonimage areas. A
layer of ink of uniform thickness is applied with a roller to the image areas, which are usually all raised to the same
height. The inked plate is then pressed against the surface of the print
substrate (e.g., paper) and the ink is transferred. Letterpress printing used to be the dominant printing technology for
printing books and has been used for hundreds of years. Plates for letterpress printing were
traditionally made of rigid materials, predominantly alloys of lead, tin, and other metals.
Most letterpress plates today use flexible materials, such as rubber or photopolymers. This
is called flexography. Unlike traditional letterpress printing, flexography requires relatively
low pressure between the plate cylinder and the substrate. Therefore, it can be used to print
on very thin, flexible film, rough-surfaced packaging materials, and fabrics [511, p. 397].
20.5.3 Gravure
In the gravure printing process, the image areas on the plate are recessed below the nonimage
plate areas. They are engraved into the plate surface. The recessed image areas can be of
different depths (continuous tone) or the same depth (half-tone). The ink applied to the plate
is trapped in the recessed areas while the ink adhering to the nonimage areas of the plate is
wiped off by a doctor blade. The plate is then pressed on the printing substrate and the ink
is transferred to the substrate. The print quality of the gravure process is excellent. Due to
the high cost of plate making, it is often used only for very-long-run jobs (e.g., more than
a million copies), such as weekly news magazines and packaging boxes.
Figure 20.13. An example of the reflection density as a function of transmission density, Dρ = f (Dt ).
The spectral composition of the light reflected from color paper can be computed fairly
well from the amounts of dyes and their spectral transmittance functions. If multiple re-
flections between the air–emulsion interface and the emulsion-paper coating are neglected,
the reflection density is approximately twice the dye transmission density because most
reflected light goes through the dye layers twice. However, multiple reflections account for
a significant portion of the total reflected light [386, 867, 1035] and the reflection den-
sity Dρ (λ) is a nonlinear function, Dρ (λ) = f (Dt (λ)), of the transmission density Dt (λ),
both being functions of wavelength. Of course, this nonlinear relation is dependent on the
illumination and viewing geometry. Figure 20.13 shows an example of such a nonlinear
relation. Note that when the transmission density is larger, the effect of multiple reflections
is smaller, and the reflection density increases linearly at a rate approximately twice that of
the transmission density because the reflected light passes through the dyes twice. Using
this measurable nonlinear function [761], the reflection densities can be calculated from the
dye amounts analytically as discussed in Section 16.1.3.
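As a small numerical sketch of this idea, the measured curve Dρ = f(Dt) can be stored at a few sample points and interpolated, and compared with the naive double-pass approximation Dρ ≈ 2Dt. The sample values below are invented for illustration and are not taken from Fig. 20.13.

    import numpy as np

    # Hypothetical measured pairs (Dt, Drho) sampled from a curve like Fig. 20.13;
    # the numbers are illustrative, not the book's data.
    Dt_samples   = np.array([0.0, 0.2, 0.5, 1.0, 1.5, 2.0])
    Drho_samples = np.array([0.05, 0.25, 0.75, 1.80, 2.85, 3.90])

    def reflection_density(Dt):
        # Interpolate the measured Drho = f(Dt) curve (piecewise linear).
        return np.interp(Dt, Dt_samples, Drho_samples)

    def naive_reflection_density(Dt):
        # Double-pass approximation: the light crosses the dye layer twice.
        return 2.0 * float(Dt)

    for dt in (0.3, 1.0, 1.8):
        print(f"Dt = {dt:.1f}: measured {float(reflection_density(dt)):.2f}, "
              f"naive {naive_reflection_density(dt):.2f}")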
The density range of a good glossy color paper is from a minimum density of 0.05–0.1 to
a maximum density of 2.2–2.4, measured by 0/45 reflection densitometers. This represents
a dynamic range of about 160:1. In typical print viewing environments, such as the office or
home, the illumination is more diffuse (thus creating more viewing flare) and the available
dynamic range is often less than 50:1. The average gradient of color negative paper (i.e.,
the average slope of the D–logH curve) is about 1.7–1.8 so that the images recorded on
color negative films (average gradient about 0.63) can be printed to a final contrast of about
1.1–1.2. The average gradient of color reversal paper is much lower because color reversal
films have a higher gradient of 1.3–1.5.
Because of the effect of dark-surround viewing on our brightness perception, it was found that films with a gamma higher than
1.0 produce higher-quality images. Therefore, color reversal films usually have a gamma of
1.3–1.5. Another factor related to image quality is the screen luminance level. In general,
image quality increases with the screen luminance, up to a certain level (typically from 70
to 100 cd m−2 , depending on the TRC), then it falls off gradually. It seems that, when our
eyes are adapted to the dark surround, the optimum operating range of our visual system is
only partially determined by the screen and therefore very high screen luminance may push
the image beyond the optimum visual range.
Similarly to the model of the reflection density, the spectral transmission T(λ) of a color
reversal film can be calculated from the amounts of dye, a_c, a_m, and a_y:
T(λ) = b(λ) 10^{−[a_c C(λ) + a_m M(λ) + a_y Y(λ)]},
where b(λ) is the spectral transmission function of the film base, and C(λ), M(λ), and Y(λ)
are the spectral transmission densities of a unit amount of cyan, magenta, and yellow dye,
respectively.
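This relation is straightforward to evaluate once the unit dye densities have been measured. The sketch below uses a coarse wavelength grid and made-up spectral curves purely for illustration; the function and variable names are not from the text.

    import numpy as np

    def film_transmission(a_c, a_m, a_y, base_t, C, M, Y):
        # Spectral transmission of a color reversal film:
        #   base_t : spectral transmittance b(lambda) of the film base (array)
        #   C, M, Y: unit spectral transmission densities of the cyan, magenta,
        #            and yellow dyes, sampled on the same wavelengths
        #   a_c, a_m, a_y: scalar dye amounts
        total_density = a_c * C + a_m * M + a_y * Y
        return base_t * 10.0 ** (-total_density)

    # Illustrative example on a coarse wavelength grid (all values are made up).
    wavelengths = np.arange(400, 701, 50)                     # nm
    base_t = np.full_like(wavelengths, 0.95, dtype=float)     # nearly clear base
    C = np.array([0.10, 0.15, 0.30, 0.70, 1.00, 0.90, 0.60])  # cyan absorbs red
    M = np.array([0.20, 0.40, 1.00, 0.70, 0.20, 0.10, 0.05])  # magenta absorbs green
    Y = np.array([1.00, 0.80, 0.30, 0.10, 0.05, 0.05, 0.05])  # yellow absorbs blue
    T = film_transmission(0.5, 0.6, 0.4, base_t, C, M, Y)
    print(np.round(T, 3))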
The photoconductor is first uniformly (negatively) charged in the dark. The laser discharges the black
text areas and the negatively charged toner powder is repelled from the unexposed areas
and only attaches to the discharged text areas. In color copiers or color laser printers, the
process is repeated three (CMY) or four (CMYK) times to produce a color copy.
The electrophotographic process is inherently high contrast because it is difficult to
control gentle gradation of toner powder as it is developed on the photoconductor and
transferred to the paper. For this reason, the half-toning process is used to produce images.
There is a trade-off between different half-tone patterns for producing sharp, high-contrast
texts or for producing gray-scale images.
Inkjet inks are commonly water-based solutions of dyes (or pigments in dispersion). Solvent inks are used when the substrates, such as metals or
plastics, are hydrophobic or when a very fast drying time is required. Phase change inks are
referred to as solid inks or hot-melt inks. They are similar to crayons. At room temperature,
they are solid, and they melt quickly when heated.
Since the base support may not be able to withstand the high temperature of the printhead, the
backcoat serves to maintain the thermal stability of the dye sheet.
The receiver of the dye-diffusion thermal transfer printing typically consists of three
main layers: the release layer on the top, the dye receiving layer in the middle, and the base
support at the bottom. The release layer is to prevent the fusion between the dye sheet and
the receiver sheet. The dye receiving layer is a clear polymer that is receptive to the dyes.
The base support layer can be white paper (for a reflection print) or clear polyester (for an
overhead transparency).
20.6 Half-toning
From the above discussion, we have learnt that many important display and printing tech-
nologies, such as offset lithography, electrophotography, and inkjet printing, cannot reliably
produce many shades of gray. Yet, most of the books and magazines that we see every day
are printed with such technologies. Theoretically, it is possible to vary the thickness of ink
layers to produce different shades of gray. In practice, the required control is so precise that
the printing process would become too slow and too expensive for most applications.
If we require that only two levels of gray (black and white) be printed, the pro-
cess becomes so fast and so economical that it dominates the color image printing market
in its sheer volume. But how can a good color image with many shades of hue, chroma,
and lightness be produced by a bi-level printing process? First, three colored inks (cyan,
magenta, and yellow) and a black are used to produce colors. Second, the appearance of
different shades of color is produced by printing small dots of cyan, magenta, yellow, and
black inks on a piece of white paper. These dots are so small that they are not visually
resolvable at a normal viewing distance. By controlling the fraction of area covered by the
ink dots, we can produce different shades of a color. For example, if half of a paper area
is covered by black ink, the integrated reflectance of that area will be 0.5. If only one third
of a paper area is covered, then the integrated reflectance is one-third.1 This method (or
process) of controlling the black-to-white area ratios to produce the appearance of different
shades of gray is called half-toning. The three key factors that make half-toning work are
the capability to produce small dots (the smaller the better), the capability to distribute the
dots over a small area precisely (the more precisely the better), and the capability to control
the ink deposition on the dots. All three capabilities have been greatly enhanced in the last
few decades, with a dramatic increase of color image quality in the printed hard-copies. We
will first discuss the half-toning process for monochromatic images and then we will show
how color images can be printed by half-toning.
1 In this analysis, we have assumed that the area covered by the ink has a zero reflectance. This is an oversimplified
assumption to make the explanation easy.
Traditional analog half-toning uses mechanical screens to create the half-tone patterns of a given image. A set of horizontal black lines
and a set of vertical black lines form a mechanical screen of regular grids. This screen
is placed in front of a high-contrast photographic film. The original image is projected
through the screen and the film is located a short distance from the true focused plane to
create a slightly out-of-focus image on the film. The amount of light that falls on a grid
point on the film is proportional to the radiance from the original (positive) image. Because
the film is a high-contrast film, stronger light makes a larger area developable and thus
produces a larger dot on the developed film. The film thus contains many dots of various
sizes, each proportional to the local radiance of the original image, and is used to etch the
printing plate. The positioning of the screen and the film relative to the original image has a
major impact on the quality of the resulting half-tone image. It takes a lot of experience and
skill to perfect this art. In 1953, the contact screen was invented. This is a half-tone screen
made from photographic film. The screen is placed in contact with the film to make the
exposure, thus eliminating the need for careful adjustment of the screen-to-film distance.
The modulated light from the original image (a print or a film) passes through the contact
screen to expose the contacted film, creating half-tone dots of varying size as functions of
the original image brightness or density. This film with the half-tone dots is then used to
make the plate for the printing machine. Contact screens are now mass produced in many
different varieties of dot shape, density profile, and density scale.
To produce a half-tone color print, four half-tone images (called color separations) are
prepared, one for each of the four color inks (CMYK). Each color separation is produced
by placing a red, green, or blue color filter or a combination of color filters (for K) in the
light path of the original image and the contact screen. Panchromatic black-and-white films
are used for the screened color separations, which are used to produce the color separa-
tion plates. In order to reduce visibility of the screen and the unwanted moiré patterns,
the screen angle for each color separation is offset from the others. The yellow separa-
tion is at 0◦ , the cyan at 15◦ , the black at 45◦ and the magenta at 75◦ . This standard
arrangement is based on the observation that the human visual system is most sensitive
to horizontal and vertical gratings, and least sensitive to 45◦ gratings. The yellow screen,
having the least luminance contrast with the paper white, is thus chosen to be at the most
sensitive orientation. The four color separation half-tone images are then overprinted on
the output medium to form the final color image. The exact ink lay down sequence is not
standardized because it depends on the interaction characteristics (ink trapping) of the four
inks.
Due to focus blur and diffraction, an analog dot can be larger than the width of a
white line. These analog screens produce half-tone dots that are not as precise as those in
modern digital half-tone printing. Most digital printers, such as inkjet and laser printers,
are specified by the number of dots per unit length (e.g., dots per inch, i.e., dpi). It is
usually assumed that the dots are equal in size and uniformly spaced. The only control is to
turn them on or off. For many digital half-toning algorithms (such as the threshold-array-
based ordered dithering), different shades of gray are produced by using N × N dots as a
half-tone cell. If N = 4, then 16 possible shades of gray (plus white) can be represented
by a half-tone cell. If the grids are square, the conversion between lpi and dpi is simple:
lpi = dpi/N. However, many printers today are capable of producing variable-size dots and
possibly variable ink thickness. The relation between lpi and dpi then becomes more complicated.
Furthermore, for some half-toning algorithms (such as error diffusion and other stochastic
screening), there is no regular half-tone cell structure and only the dpi frequency is
applicable.
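As a simple numerical illustration of this trade-off for a regular threshold-array cell (600 dpi and the list of N values are arbitrary example choices):

    def halftone_tradeoff(dpi, n):
        # Screen frequency and gray levels for an N x N threshold-array cell.
        lpi = dpi / n                 # half-tone cells per inch (square grid)
        gray_levels = n * n + 1       # including the all-off (white) state
        return lpi, gray_levels

    for n in (2, 4, 8, 16):
        lpi, levels = halftone_tradeoff(600, n)
        print(f"N = {n:2d}: {lpi:6.1f} lpi, {levels:4d} gray levels")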
The above discussion assumes that the device outputs square or hexagonal grids, but
many digital printers are line oriented devices. For example, a laser printer may scan a
line and then move to the next line. On a given line, the pixel frequency is usually very
high and can be changed by changing the clock rate. This is called the fast-scan direction.
The other direction, from one line to the next line, is called the slow-scan (or the process)
direction. Thus, on a given line, the distance between two neighboring, addressable pixel
locations may be much smaller than the line-to-line distance, and may be smaller than the
minimum dot size that the printer can print. This type of printer is said to have very high
addressability. The base resolution of the printer is defined to be the frequency that is most
difficult to change. In this example, it is usually the line-to-line frequency. This shows
that without a proper printer model, it may not be possible to design a proper half-toning
algorithm.
The intuition behind half-toning is that our visual system will locally integrate (or blur) a small area covered with many cells to create
the perception of a continuous-tone image. This intuition is true only when the half-tone
cell is truly invisible, in which case it does not matter how the half-tone cell is rendered. When this
is not true, the half-tone texture pattern can interfere with our perception of the image we
want to see, and then it becomes very important how we control the half-tone pattern to
please the eye. Therefore, when we study the vast literature on half-toning techniques, we
need to know the spatial resolution that the half-toning technique was good for, because
one algorithm may work well for low-resolution images, but may produce unacceptable
artifacts for higher-resolution images [516].
Since the main objective of half-tone printing is to (fool and) please the eye, it is obvious
that we need to know how the eye can be pleased. A human visual model thus should be a
basic component in the design of a good half-toning algorithm [746, 925, 1010]. However,
our knowledge of human visual perception is far from complete and no existing visual model
can reliably predict what is visible and what is pleasing. For example,
our visual sensitivity to the grouping of dots into lines and curves is rarely considered in
the existing visual models.
The second key component in the design of a half-toning algorithm is the device/media
model [746]. If we are going to produce some dot patterns with a particular device on
a particular medium, we have to know how those dot patterns will actually be rendered
physically: for example, the printer may not be able to place two neighboring dots at precisely
the relative locations that we ask for; a dot when printed on the paper may not have the
exact size and shape we ask for; the light reflected from an area of the printed page to the
user’s eyes may not have the spectral radiance that we ask for, etc. A phenomenon called
dot gain is a good example. When a small dot of ink is printed on a piece of paper, its
effect on the reflectance cannot be calculated from its area alone. It also depends on how the
paper substrate scatters the light under and around the dot. The net effect of light scattering
from an isolated dot on the reflectance is to make the dot appear larger. The ratio of the
effective dot size to the requested dot size is called the dot gain. This should be part of the
device/media model that is considered when designing and implementing any half-toning
algorithm. In reality, the device/media model is often considered to be a calibration issue
and is dealt with independently from the algorithm design. Obviously, this is often a matter
of necessity for practical applications, but certainly a good algorithm design should take as
many device/media characteristics into consideration as possible.
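One widely used way to fold the optical part of dot gain into a device/media model, not developed in the text here, is the Yule–Nielsen modification of the area-weighted reflectance. The sketch below uses illustrative numbers; the function name and the value of n are assumptions.

    def yule_nielsen_reflectance(area, r_ink, r_paper, n=1.7):
        # Reflectance of a half-tone patch with fractional dot area `area`.
        # n = 1 gives the simple area-weighted (Murray-Davies) prediction;
        # larger n (empirically often 1.5-2.5) accounts for light scattering
        # in the paper, i.e., optical dot gain.
        return (area * r_ink ** (1.0 / n) +
                (1.0 - area) * r_paper ** (1.0 / n)) ** n

    # A 30% dot of an ink with 5% reflectance on 90% reflective paper:
    for n in (1.0, 1.7, 2.5):
        print(n, round(yule_nielsen_reflectance(0.3, 0.05, 0.9, n), 3))

Larger n predicts a lower patch reflectance for the same requested area, which is exactly the "dot appears larger" effect described above.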
The third key component to consider in half-toning design is the model of computational
architecture that is to be used to perform the algorithms. Often, the most difficult part is to
take care of the long-distance interaction, such as the forming of false contours. Algorithms
based on small local windows are often inadequate, unless iterative processing is used to
refine the dot patterns repeatedly. Practical implementations cannot afford time-consuming
calculation, nor can they afford very large buffer size.
The fourth component to consider is the system calibration and optimization model. It is
often assumed that we can design the device hardware, the half-toning algorithm, the media,
and the human visual model independently of each other. In reality we may be forced to
perform each design in parallel or we may have no control over how other components
are designed, but the end product has to be calibrated and optimized. If any of the model components is changed, the calibration and optimization may have to be repeated.
Random dither
The simplest way to convert a gray-level image to a binary image is to apply a threshold to the
gray levels. This creates sharp contours and does not produce the illusion of a continuous-
tone image. One of the oldest algorithms in digital half-toning is to add random noise to
the input image signal and then apply a fixed threshold to convert it to a binary output. This
is called the method of random dither. Usually the method does not produce high-quality
half-tone images because the random dither contains noise in all spatial frequencies and
our visual perception does not like the low-frequency mottles associated with the random
dither.
Although random dither is not a very powerful method in half-toning, it is useful in other
applications when contours caused by quantization need to be made less visible in gray-
level images. For example, density balance operations on 8-bit color images often create
quantization contours in the final prints. Adding a small amount of noise and requantizing
the image signal is quite effective in making the contouring artifacts much less visible.
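A minimal sketch of random dither for an 8-bit gray-level image is given below; the noise amplitude, threshold, and test pattern are illustrative choices, not values from the text.

    import numpy as np

    def random_dither(image, noise_amplitude=32, threshold=128, seed=0):
        # Binarize an 8-bit gray-level image by adding noise, then thresholding.
        rng = np.random.default_rng(seed)
        noise = rng.uniform(-noise_amplitude, noise_amplitude, size=image.shape)
        return (image.astype(float) + noise >= threshold).astype(np.uint8)

    # A horizontal ramp: without the noise, thresholding would give a hard edge.
    ramp = np.tile(np.linspace(0, 255, 16), (4, 1))
    print(random_dither(ramp))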
Figure 20.14. A simple example of ordered dither. The input image block (left) is compared pixel by pixel with the spiral-dot threshold array (middle); the binary output (right) is 1 wherever the input value is greater than or equal to the threshold.
input block: [6 4 2; 7 5 3; 8 6 4]; threshold array (spiral-dot): [7 8 9; 6 1 2; 5 4 3]; binary output: [0 0 0; 1 1 1; 1 1 1].
Ordered dither
In contrast to random dither, ordered dither uses a deterministic, periodic, two-dimensional,
threshold array to generate a binary image from a gray-level image. For example, if the
dimension of the threshold array is N × N , the input image is divided into nonoverlapping,
consecutive blocks, each of which is N × N pixels. Each pixel of the input image is then
compared with the corresponding value in the threshold array, and the pixel is set to 0 or
1. Figure 20.14 shows a simple example of how ordered dither works. The numbers in the
thresold array can represent either the actual thresholds or the order in which a dot is turned
on. Here we treat them as the actual thresholds. The threshold array is repeated over the
entire input image and therefore the output binary image contains a fundamental frequency
with a period of the size of the threshold array. In this example, a 3×3 threshold array is
used, and the fundamental frequency is very high if the dot size is small. However, it can
only represent ten levels of gray (including the case when the output is all zeros). In order
to increase the number of representable levels of gray, the size of the threshold array has to
be increased, resulting in a lower fundamental frequency, which can become very visible
and annoying. The threshold array used in this example is called a spiral-dot screen. This
is an example of a clustered-dot, ordered dither screen. It is called “clustered-dot” because,
for any given uniform gray-level input, the output dots are all clustered. The clustered-dot
screen tends to generate a lower-frequency dither noise because small dots are clustered
together to become a large dot in a uniform-gray area. If we order the threshold values in the
threshold array so that dots turned on at a given gray level are located far away from each
other, we obtain a dispersed-dot screen. Figure 20.15 shows an example of the threshold
array of a dispersed-dot screen. A systematic design method for the dispersed-dot screen
was invented in 1973 by Bayer [87]. Examples of screen design can be found in many
articles and books (e.g., [482] and [976]).
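The following sketch implements threshold-array (ordered) dithering with the 4 × 4 dispersed-dot cell of Fig. 20.15, scaling the order values to an 8-bit input range; the scaling rule and the on/off convention are illustrative choices rather than the text's prescription.

    import numpy as np

    # The 4 x 4 dispersed-dot threshold array of Fig. 20.15 (order values 1..16).
    BAYER_4X4 = np.array([[ 2, 16,  3, 13],
                          [10,  6, 11,  7],
                          [ 4, 14,  1, 15],
                          [12,  8,  9,  5]])

    def ordered_dither(image, cell=BAYER_4X4):
        # Tile the threshold array over the image and compare pixel by pixel.
        levels = cell.size + 1                       # representable gray levels
        thresholds = cell * (255.0 / levels)         # scale order values to 0-255
        h, w = image.shape
        reps = (int(np.ceil(h / cell.shape[0])), int(np.ceil(w / cell.shape[1])))
        tiled = np.tile(thresholds, reps)[:h, :w]
        return (image >= tiled).astype(np.uint8)     # 1 = dot turned on

    gray = np.full((8, 8), 128.0)                    # a uniform mid-gray patch
    print(ordered_dither(gray))

For the uniform mid-gray input, about half of the dots in each cell are turned on, which is the expected behavior of a threshold-array screen.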
Error diffusion
Both the clustered-dot and the dispersed-dot ordered dithers are point operations: each pixel
is compared with a value in the threshold array, independently of what the neighboring pixel
values are. A completely different approach is to use neighborhood operations. This type of
approach tends to be more computationally intensive, but can produce much better looking
half-tone images. One very successful method is called the error diffusion method.
2 16 3 13 2
10 6 11 7 10
4 14 1 15 4
12 8 9 5 12
2 16 3 13 2
Figure 20.15. An example of the threshold array of a dispersed-dot screen. The dashed square repre-
sents a half-tone cell.
The basic idea behind the error diffusion algorithms is that, when quantizing a gray-level
pixel into a binary black or white dot, an error in brightness at that image point is created. This
error can be compensated by adjusting the neighboring pixels in the direction that will cancel
the current error when all the reflectances in the neighborhood are integrated by our visual
system. Therefore, there are three key components in designing an error diffusion algorithm:
an error measure, a neighborhood to diffuse the error to, and a method for distributing the
errors. Figure 20.16 shows a conceptual block diagram of the error diffusion algorithm.
When the error diffusion algorithm was first invented by Floyd and Steinberg [319], they
used the gray value difference between the input pixel value and the thresholded output
pixel value as the error measure. The error is diffused to the four nearest “future” neighbors
in the raster scan order with different weighting factors. Since then, many variations on
the three key components have been proposed and tested with improved results [483, 516].
Images rendered by the error diffusion algorithms are generally quite good, but using a fixed
threshold to calculate the error has the potential of creating a false contour in a uniform area
when the distributed and accumulated error suddenly exceeds the threshold and requests
the algorithm to produce a dot in the output.
Figure 20.16. A conceptual block diagram of the error diffusion algorithm: the input image is summed with the distributed errors, thresholded to form the output image, and the resulting error measure is fed back through the error distribution to the neighboring pixels.
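A minimal sketch of the original Floyd–Steinberg variant is given below, using a fixed threshold and the standard 7/16, 3/16, 5/16, 1/16 weights for the four "future" neighbors in raster-scan order; the 0/1 output convention and the test pattern are illustrative choices.

    import numpy as np

    def floyd_steinberg(image, threshold=128):
        # Binarize a gray-level image by diffusing the quantization error
        # to the four unprocessed neighbors in raster-scan order.
        img = image.astype(float).copy()
        h, w = img.shape
        out = np.zeros((h, w), dtype=np.uint8)
        for y in range(h):
            for x in range(w):
                old = img[y, x]
                new = 255.0 if old >= threshold else 0.0
                out[y, x] = 1 if new else 0
                err = old - new
                if x + 1 < w:
                    img[y, x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        img[y + 1, x - 1] += err * 3 / 16
                    img[y + 1, x] += err * 5 / 16
                    if x + 1 < w:
                        img[y + 1, x + 1] += err * 1 / 16
        return out

    print(floyd_steinberg(np.full((6, 10), 64.0)))   # roughly 25% of dots turn on

Many practical variants scan alternate lines in opposite (serpentine) order or modulate the threshold to break up the worm-like artifacts mentioned above.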
Color half-toning
For display or printing devices where binary output levels are used, we control the perceived
gray level of an area by changing the percentage of the area that is covered by black dots.
In order to produce a color image on such devices, different colored dots are displayed
or printed together to give a visual impression of a color image. Usually, we use dots of
only three (CMY) or four (CMYK) primary colors or process inks. We obtain different
colors by varying the percentage of the areas printed by these process inks. For example,
the color turquoise can be printed using 80% cyan dots, 7% magenta dots, and 20% yellow
dots. However, these numbers alone do not mean much for many reasons. For example, the
spectral characteristics of the three process inks on the intended paper have to be measured
and the printing sequence (e.g., cyan ink is printed first, then magenta, and then yellow)
needs to be specified. Colorimetric specifications are thus described in standards, such as
SWOP (Specifications for Web Offset Publications) [930].
There are many uncontrollable factors that affect the output color reproduction of a half-
toning process. Chief among them is the stochastic distribution of the colored dots. We do
not know how much of the area printed with cyan ink will be overprinted by the magenta
ink and the yellow ink. The effect of printing one process ink over another is printer- and
setup-dependent, and has to be treated as a parameter to be measured for each application.
The effect shows up in two ways: (1) the ink trapping problem: the amount of the second
process ink printed depends on the amount of overlap area with the first process ink that has
been printed on the paper; and (2) the failure of density additivity: the combined spectral
density may not be equal to the density sum of the two inks. If the effects are consistent
throughout a press run, they can be treated as parameters to be measured.
The basic question in color half-toning is the following: given a set of percentage dot
areas for the process inks, can we predict what color will be printed? Or in a reverse sense,
given the CIEXYZ tristimulus values, what percentage dot areas should we specify for
each process ink to produce that color? Several attempts have been made to analyze the
color reproduction in color half-toning. The best-known model is the set of Neugebauer equations,
which Neugebauer extended from Demichel’s model. Let c, m, y, and k
be the fractional areas covered by the cyan, magenta, yellow, and black process inks, respec-
tively, i.e., 0 ≤ c, m, y, k ≤ 1. Assuming that each ink is printed randomly and independently
of the others, any half-tone color area consists of combinations of the following
components:
• the unprinted white paper, with fractional area (1 − c)(1 − m)(1 − y)(1 − k);
• the areas covered by exactly one ink, e.g., cyan alone, with area c(1 − m)(1 − y)(1 − k);
• the areas covered by two, three, or all four overlapping inks, with areas given by the corresponding products, e.g., cm(1 − y)(1 − k) for the cyan–magenta overlap and cmyk for the overlap of all four inks;
giving 16 idealized color elements in all, where a_i and β_i(λ), i = 1, 2, . . . , 16, are the fractional area and spectral reflectance factor of each idealized color element. The spectral reflectance factor β(λ) of the color area can thus be expressed as
β(λ) = Σ_{i=1}^{16} a_i β_i(λ).    (20.8)
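Under the random-overlap assumption, the Demichel areas and Eq. (20.8) can be evaluated directly, as in the following sketch; it assumes the 16 primary reflectance spectra have been measured and are supplied as an array in a known order, and the function names are not from the text.

    import numpy as np
    from itertools import product

    def demichel_areas(c, m, y, k):
        # Fractional areas of the 16 Neugebauer primaries (random ink overlap).
        areas = []
        for use_c, use_m, use_y, use_k in product((0, 1), repeat=4):
            a = ((c if use_c else 1 - c) * (m if use_m else 1 - m) *
                 (y if use_y else 1 - y) * (k if use_k else 1 - k))
            areas.append(a)
        return np.array(areas)          # order: paper, k, y, yk, m, ..., cmyk

    def halftone_reflectance(c, m, y, k, primary_spectra):
        # Eq. (20.8): area-weighted sum of the 16 primary reflectance spectra.
        # primary_spectra has shape (16, n_wavelengths), measured in the same
        # order as demichel_areas() enumerates the primaries.
        return demichel_areas(c, m, y, k) @ primary_spectra

    # Check with flat (made-up) spectra: the 16 areas must sum to one.
    spectra = np.ones((16, 31))
    print(demichel_areas(0.8, 0.07, 0.2, 0.0).sum())        # sums to 1
    print(halftone_reflectance(0.8, 0.07, 0.2, 0.0, spectra)[:3])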
Several implicit assumptions have been made in this model. (1) It is assumed that the
fractional areas of the process inks c, m, y, and k are printed as requested. This is shown
to be unrealistic. The dot areas are often different (almost always larger) than requested.
As we mentioned before, this dot area magnification in the printing process is called dot
gain. There are two main reasons for this gain [32, 33]: physical and optical. The physical
gain comes from the fact that, when ink is printed on paper, it spreads out slightly. The
reason for the optical gain is that light illuminating the area outside the dot boundary can
be scattered through the paper into the dot and come out from the dot and vice versa. As a
consequence, the coefficients ai have to be determined from the spectral measurement data,
rather than calculated. (2) It is assumed that the spectral reflectance factor does not depend
on the size of the overlapped area of different inks. In reality, a tiny overlap area is likely
to reflect light differently from a large overlap area. This means that it is not possible to
decompose the total areal spectral reflectance factor into a linear combination of 16 spectral
functions.
In general, modeling any color half-toning process requires estimates of the parameters in
many empirical equations. It can be used to optimize or simulate a product being developed.
In practice, modeling is often not used in printer calibration because of the large number of
parameters that need to be estimated. Instead, we often use 1-D LUTs to shape the signals,
apply matrices to rotate color primaries, and then rely on 3-D LUTs to achieve the desired
color mapping. This is discussed in the next section.
20.7 Printer calibrations
Compared with CRT monitors, printers are much more complicated to model. Even when
models do exist, their accuracies are often less than desirable. The trend is to use 3-D LUTs
for printer calibration. There are many ways to construct 3-D LUTs [422, 482, 488] and
many ways to structure the signal transformation paths. Here we will present a method for
calibrating a digital color printer that expects (R, G, B) as input code values, where R,
G, and B are integers and 0 ≤ R, G, B ≤ 255. A digital color image to be printed by the
printer can be specified in a device-independent color space, such as CIEXYZ or CIELAB.
The problem of printer calibration is to establish a signal transformation path that converts
the input image into printer (R, G, B) code values so that when the image is printed, the
color at each pixel on the print is as close to that specified in the original device-independent
color space as possible. What is meant by “as close as possible” should be defined by the
user for each application. In many cases, CIELAB E errors are minimized for a selected
set of colors.
Since a high-resolution, digital color image usually consists of millions of pixels, the
color transformation of each pixel from CIEXYZ or CIELAB to printer (R, G, B) has
to be performed very fast and very efficiently. If the 3-D LUT is completely dense, i.e., it
contains one entry for every possible input color, the only operation needed is to use the input
color as an index to locate the desired output value(s): a memory-read operation. However,
this is not yet practical because it would require a lookup table as large as 2^24 memory
locations for each output color channel. Therefore a sparse lookup table is often used.
For example, 8 × 8 × 8, 17 × 17 × 17, and 31 × 31 × 31 are typical sizes of 3-D LUTs.
An input color is thus interpolated from the table. This is called forward interpolation. In
order to be efficient, the table is set up on a regular grid so that no search is required to
find the neighborhood points that are to be used for interpolation. Trilinear interpolation is
often used for this purpose because it is easy to perform and can be made fast by clever
tricks. The major issue of calibration is how to set up such a multi-dimensional lookup
table on a regular grid that needs to be indexed by the device-independent color coordinates
in which the input image is represented. For example, if the input image is in CIELAB,
we need to set up a 3-D LUT that will map CIELAB to printer (R,G,B) values. The
indices to the table are on a regular grid in CIELAB space. Before the calibration is done,
we can only produce color patches from the printer by specifying the printer (R,G,B)
values. We can then measure the CIELAB values of each color patch for which we know
what printer (R,G,B) values were used to print it. Therefore, we have a set of (R,G,B)
versus CIELAB data points, (L*, a*, b*) = f(R, G, B), in which the (R, G, B) values are
regularly spaced, but the CIELAB values, (L*, a*, b*), are not. In order to set up a lookup
table, (R, G, B) = h(L*, a*, b*), that has regularly spaced CIELAB grid points, we have to
interpolate the (R, G, B) values from the measured data set, (L*, a*, b*) = f(R, G, B). This
is called backward interpolation, because (R, G, B) are known values on a regular grid and
(L*, a*, b*) are measured values. The backward interpolation can be done off-line. Therefore
complexity is not a concern and the priority is the accuracy.
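At print time, the forward interpolation on the regular grid can be done with trilinear weights, as in the following sketch; the grid size, the function name, and the identity table used for the sanity check are illustrative assumptions, not the book's procedure.

    import numpy as np

    def trilinear_lookup(lut, r, g, b, input_max=255.0):
        # Trilinear interpolation in a regular 3-D LUT.
        # lut has shape (n, n, n, n_out): output values stored at the grid
        # points of a regular n x n x n grid spanning 0..input_max.
        n = lut.shape[0]
        t = np.array([r, g, b], dtype=float) / input_max * (n - 1)
        i0 = np.clip(np.floor(t).astype(int), 0, n - 2)   # lower cube corner
        f = t - i0                                        # fractional position
        out = 0.0
        for dx, wx in ((0, 1 - f[0]), (1, f[0])):
            for dy, wy in ((0, 1 - f[1]), (1, f[1])):
                for dz, wz in ((0, 1 - f[2]), (1, f[2])):
                    out = out + wx * wy * wz * lut[i0[0] + dx, i0[1] + dy, i0[2] + dz]
        return out

    # Sanity check with an identity LUT: the interpolated output equals the input.
    grid = np.linspace(0.0, 255.0, 17)
    identity = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
    print(trilinear_lookup(identity, 10.0, 200.0, 77.7))   # ~ [ 10. 200.  77.7]

Because the grid is regular, no search is needed to find the enclosing cube, which is why this step can be made fast enough for per-pixel use.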
Figure 20.17. The signal transformation path used for printer calibration: the input (L*, a*, b*) is converted by the 3 × 3 matrix A to (X′, Y′, Z′), by the 1-D LUTs F (fx, fy, fz) to the tristimulus values (X, Y, Z), by the 1-D LUTs P (px, py, pz) to (R′, G′, B′), and finally by the 3-D LUT Q to the printer (R, G, B) code values.
Signal shaping by 1-D LUTs and transformation by 3 × 3 matrices are very effective means for reducing nonlinearity in the
color mapping and should be used wherever they are practical.
In Fig. 20.17, the matrix A and the 1-D LUTs, F, are determined from the definition of
CIELAB. The calibration task is to determine the three 1-D LUTs, P, and the 3-D LUT, Q,
for the targeted printer/medium. Let us summarize the basic steps in printer calibration as
follows:
1. Use the printer to print a set of color patches corresponding to some selected in-
put color values. For example, for an 8-bit/color/pixel printer, an input pixel contains
three numbers: (R, G, B) where 0 ≤ R, G, B ≤ 255. Let us select (Ri , G j , Bk ), i, j,
k = 1, 2, . . . , m, R1 = G 1 = B1 = 0, and Rm = G m = Bm = 255. It is important to
include 0 and 255 in the combinations so that the printer color gamut can be deter-
mined. It is also desirable to print a gray scale of many neutral steps, (Rl , G l , Bl ),
l = 1, . . . , n. This will be useful for deriving the 1-D LUTs, P, for correcting the
printer RGB nonlinearity.
2. Measure the spectral reflectance factors and calculate the colorimetric data for the
printed color patches under the desired viewing illuminant: (X i , Y j , Z k ) corresponding
to (Ri , G j , Bk ), i, j, k = 1, 2, . . . , m, and (X l , Yl , Z l ) corresponding to the gray scale:
(Rl , G l , Bl ), l = 1, . . . , n.
3. Use the gray-scale data, (X l , Yl , Z l ), l = 1, . . . , n, to derive the 1-D LUTs, P. For
example, we can fit piecewise cubic polynomial functions, pr (R), pg (G), and pb (B), to
the data such that pr (Rl ) = X l , pg (G l ) = Yl , and pb (Bl ) = Z l , for l = 1, . . . , n. Then
the three 1-D LUTs, P, in Fig. 20.17, are simply the inverse functions: px = pr^{−1},
py = pg^{−1}, and pz = pb^{−1}. (Note that these three 1-D LUTs essentially convert the
X Y Z tristimulus values of the neutral grays into a linear function of printer RGB
values. However, for some printers, R = G = B does not print neutral grays. In that
case, we have to find the RGB values that will print to grays so that the three 1-D LUTs
can be used to linearize the neutral axis in the following 3-D LUT.)
4. Map (Xi, Yj, Zk), i, j, k = 1, 2, . . . , m, through the 1-D LUTs, P, to (R′i, G′j, B′k), i.e.,
R′i = px(Xi), G′j = py(Yj), and B′k = pz(Zk). We now have a data set: (Ri, Gj, Bk) →
(R′i, G′j, B′k), i, j, k = 1, 2, . . . , m.
5. Construct the 3-D LUT, Q : (R′, G′, B′) → (R, G, B), from the data (Ri, Gj, Bk) →
(R′i, G′j, B′k), i, j, k = 1, 2, . . . , m, by backward tetrahedral interpolation. This set of
data has a regular grid in (R, G, B), but it is irregular in (R′, G′, B′).
The last step of constructing the 3-D LUT, Q : (R′, G′, B′) → (R, G, B), is a complicated step, because the measured data set, (Ri, Gj, Bk) → (R′i, G′j, B′k), is regular in (R, G, B) but irregular in (R′, G′, B′).
Figure: a cube on the regular grid can be decomposed into six tetrahedra that share the main diagonal between vertices 1 and 7: (1, 2, 3, 7), (1, 2, 6, 7), (1, 4, 3, 7), (1, 4, 8, 7), (1, 5, 6, 7), and (1, 5, 8, 7).
The maximum density that can be printed is limited by light scattering in the paper and the interface reflection at the ink layers. For example, a three-color reproduction on
glossy paper with high-quality inks can have a density range of only about 1.5. Furthermore,
high-density dark grays tend to be unstable (i.e., they deviate significantly from neutral when
the viewing illuminant is changed). These problems can be solved to a large degree by adding
black ink. The printable color gamut is greatly increased by the black ink. For example, the
reproducible density range can be extended to about 2.0.
In three-color printing, there is only one combination of the three colorants (or inks)
that will match the color (say, CIEXYZ) of the original under a given illuminant. If we
now add a fourth colorant (or ink), there are many combinations that will produce the
same match. The question is: which of the many combinations should be used to match
the desired color? In many cases, black ink is used for neutral grays so that they are more
stable against illuminant changes. In other cases, the black ink is the least expensive of
the inks, and therefore it is profitable to use as much black ink as possible to replace color
inks when a color match is realizable. These two cases happen frequently and they have
special names: undercolor removal (UCR) and gray component replacement (GCR). UCR
reduces the amounts of CMY ink to make room for the black ink. It is applied only to the
neutral areas in the image. GCR is similar to undercolor removal, but is applied to all color
areas. The process colors (CMY) are usually replaced by the black ink so that the gray
component in the color is mostly reproduced by the black ink. The current recommended
safe range of GCR to use is between 50% and 80% [930, p. 31], i.e., 50% to 80% of the
gray component normally produced by the CMY inks is removed and an equivalent amount
of the black ink is added. It is important that the percentage of color inks after GCR should
not be too small, otherwise the color will appear less glossy and less stable. In some other
cases, UCR and GCR are required because of physical constraints of the printing processes.
For example, in high-speed wet printing, there is not enough time to wait for one ink to
dry before another ink is printed. Therefore, as little total ink as possible should be used.
Another application is when too much ink overprinting creates an ink-trapping problem, i.e.,
too much of one ink prevents more than a certain amount of another ink being printed over it.
It is fairly typical that the maximum total amount of inks to be printed cannot exceed some
percentage of the dot area, usually around 240–300%, depending on the size of the colored
area.
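The gray-component idea can be sketched as follows. The rule below is a simplified hypothetical illustration, not the SWOP procedure: it takes the gray component as the minimum of the three ink amounts, removes a chosen fraction of it from all three inks, and adds an equal amount of black, which implicitly assumes density additivity.

    def gray_component_replacement(c, m, y, gcr_fraction=0.5):
        # Replace part of the CMY gray component by black ink.
        # c, m, y are fractional dot areas (0-1); gcr_fraction is the portion
        # of the gray component to replace (e.g., 0.5-0.8 as recommended in
        # SWOP).  Equal-amount substitution assumes density additivity, which
        # is only a rough approximation for real inks.
        gray = min(c, m, y)                 # the gray component
        k = gcr_fraction * gray             # black ink added
        return c - k, m - k, y - k, k

    # Example: a dark color built from CMY alone vs. with 70% GCR.
    print(gray_component_replacement(0.80, 0.65, 0.90, 0.0))   # no replacement
    print(gray_component_replacement(0.80, 0.65, 0.90, 0.7))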
The additional degree of freedom in choosing how much black ink to use for certain colors
thus demands a strategy in color reproduction. Several commonly used strategies are:
• Maximum black: black is substituted for the three colors up to the point where one of
the colors is completely replaced.
• Minimum black: no black is used except for the areas where the dark colors cannot be
produced by the three process colors.
• Minimum dot area: black is substituted for the process color that has the minimum dot
area.
• Smoothest transition: use the amount of the black ink that results in the spectral re-
flectance curve changing as little as possible from the neighboring colors in a chosen
color space. This requires an off-line global optimization in the 3-D LUT to ensure that
the black ink usage changes smoothly from color to color [423].
20.8 Problems
20.1 Use Eq. (8.55) to derive the relation, Dρ = f (Dt ), that expresses the reflection density,
Dρ , as a function of transmission density Dt .
20.2 A color imaging application requires a reflection print to be scanned and the scanned
image displayed on a CRT monitor.
(a) Draw a block diagram showing the building blocks necessary to calibrate the
whole imaging chain from the scanner to the CRT monitor so that the (X, Y, Z )
tristimulus values in the reflection print can be fairly well reproduced on the
CRT screen. The only building blocks that you can use are 1-D LUTs and 3 × 3
matrices. You have to specify the input and the output variables for each block
and briefly explain the physical meaning of each variable.
(b) Since the reflection print is typically viewed in a light surround and the CRT
monitor is viewed in a dark surround, explain which blocks you would modify to
compensate for the visual effect due to the difference in the viewing surround.
20.3 If you want to scan an image on a reflection print and post it on the web, what do you
have to do to make sure that your image will look good on most computer monitors?
20.4 The NTSC chromaticity transformation between the phosphor primaries and the
CIEXYZ primaries is given in Table 20.3.
Table 20.3.
Stimulus    r   g   b   x      y      z
R           1   0   0   0.67   0.33   0.00
G           0   1   0   0.21   0.71   0.08
B           0   0   1   0.14   0.08   0.78
The ITU-R BT.709-2 HDTV chromaticity transformation between the phosphor pri-
maries and the CIEXYZ primaries is given in Table 20.4.
Table 20.4.
Stimulus    r   g   b   x       y       z
R           1   0   0   0.640   0.330   0.030
G           0   1   0   0.300   0.600   0.100
B           0   0   1   0.150   0.060   0.790
Determine the 3 × 3 matrix to transform the NTSC RGB signals to the ITU-R BT.709-
2 HDTV RGB signals. (Since the reference whites in both standards, Illuminant C
and D65 , are fairly close to each other in this problem, use the CIELAB type of
“normalization to white” to find the corresponding tristimulus values for the two
illuminants, i.e., X 709 / X D65 = X NTSC / X C . In general, if the two illuminants are quite
different, a chromatic adaptation model (such as the Bradford transformation) has to
be used to find the corresponding tristimulus values.)
20.5 The error measure and the error distribution are two important components in the
error diffusion method for half-toning. What kind of error measure can we use and
what are the computations involved for the error distribution? For example, can we
use ΔE in CIELAB as the error measure? If so, how do we distribute the error?
20.6 In most half-toning algorithms, the percentage dot area requested is assumed to be
what is actually printed. In practice, this is not strictly true. This difference between
what is specified and what is actually printed is called the dot gain, and can be caused by
ink spread, optical diffusion, and ink transfer problems. Given a half-toning algorithm,
how do you take care of the dot gain problem?
21 Image quality
The performance of a color imaging system is often evaluated by the image quality it can
deliver to the user. Image quality can be evaluated physically (objective image quality) or
psychophysically (subjective or perceptual image quality) or both. In this chapter, we will
discuss some of the metrics and procedures that are used in image quality measurements.
Objective image quality measures, such as resolving power, noise power spectrum, detective
quantum efficiency (DQE), and system MTF, are well defined and can often be measured
consistently [64]. However, they may not be directly correlated with the perceived image
quality. Therefore psychophysical procedures are used to construct metrics that relate to
the subjective image quality. Given our inadequate understanding of image perception, one
may even argue that the definitive quality evaluation can only be done by human observers
looking at images and making judgments. Therefore, the subjective quality rating is the only
reliable metric for image quality. Although this statement is true, it does not help us much in
developing better imaging systems because human judgment is too time-consuming, costly,
and not always consistent. Objective image quality metrics are needed for many product
optimizations and simulations.
In the past (before 1970), image quality was often measured on a system level. With the
advance and availability of digital imaging devices, quality metrics for individual digital
images have also been developed. These image-dependent image quality measures are
becoming more and more important because they can be used to detect and correct problems
before images are displayed or printed. An automatic correction algorithm for individual
images requires a reliable image quality metric that can be computed from a digital image
[489, 672].
The design and manufacturing of imaging devices often require making trade-offs between
various desirable features. For example, a lens designer would like to minimize all lens
aberrations at all wavelengths, but that would increase the cost of lenses beyond what is af-
fordable. Therefore, the question of what constitutes the most desirable or optimal trade-offs
arises. The lens is only one component of an imaging system. The design and manufac-
turing of the other components in the system also require similar trade-offs. If the entire
imaging system (from capture to display) is designed by a single manufacturer, all the
components can be optimized together. However, this is almost never the case. As a con-
sequence, many component-specific image quality metrics have been developed. These
component-specific image quality metrics are very useful for component designers, but
may not be as useful for system engineers or consumers, because some of the parameters
cannot be measured conveniently or because the metrics are less correlated with perceptual
image quality. In this section, we will discuss only the more generally applicable, system-
oriented quality metrics. A color imaging system is often evaluated in terms of optical
image quality, detector efficiency, tone reproduction, color reproduction, image structure
(including resolving power, MTF, and noise), and system stability. It should be pointed
out that image quality evaluation is always related to what we plan to do with the images
[61, 62, 63], especially in the fields of medical imaging, remote sensing, and pattern recog-
nition. Here we are mainly interested in consumer imaging, where the task is not well defined
beyond people's enjoyment and the preservation of memories.
Alternatively, we can also measure the minimum resolvable distance of two point sources.
However, the most widely used method for system analysis is spatial frequency analysis.
There are several related functions in this type of analysis. They are all based on the idea
of measuring the system output in response to a sinusoidal input as a function of spatial
frequency.
• The OTF [354, 882, 887]: this is defined for the optical image forming components.
The function is normalized to 1.0 at zero frequency. Both the amplitude and phase
responses as functions of input spatial frequency are specified.
• The system transfer function: this is a general term, applied to all linear, shift-invariant
systems. It describes the output amplitude and phase as a function of the input sinusoidal
signal at various frequencies.
• The MTF: this is the amplitude response part of a system transfer function. It is often
assumed that the phase response is a linear function.
• The contrast transfer function (CTF): this term is occasionally used in imaging applica-
tions. Contrast is often defined as the ratio of the difference to the sum of the maximum
and the minimum signal amplitudes (the Michelson contrast). The CTF is the output
contrast divided by the input contrast as a function of spatial frequency (a numerical
sketch follows this list). The main difference between the CTF and the MTF is that the
definition of contrast requires a DC level at least as large as the modulation amplitude,
i.e., the modulation cannot drive the signal negative, because an image signal cannot
have a negative amount of light.
• The SFR function: this term is used to denote the MTF in the spatial domain, as
opposed to the temporal domain.
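As a rough numerical illustration of the Michelson contrast and the CTF defined above, the following Python sketch compares the contrast of a hypothetical sinusoidal input and output at one frequency; the signal values and the 0.5 modulation loss are illustrative assumptions, not measured data.

```python
import numpy as np

def michelson_contrast(signal):
    """Michelson contrast: (max - min) / (max + min); requires a non-negative signal."""
    return (signal.max() - signal.min()) / (signal.max() + signal.min())

x = np.linspace(0.0, 1.0, 1000)
freq = 5.0                                   # cycles over the unit interval
dc, amp_in = 1.0, 0.8                        # DC chosen so the signal never goes negative
f_in  = dc + amp_in * np.sin(2 * np.pi * freq * x)          # input sinusoid
f_out = dc + 0.5 * amp_in * np.sin(2 * np.pi * freq * x)    # hypothetical system output

ctf_at_freq = michelson_contrast(f_out) / michelson_contrast(f_in)
print(ctf_at_freq)   # 0.5: the modulation at this frequency is halved
```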
The sinusoidal functions are eigenfunctions of linear, shift-invariant systems, i.e., if the
input is a sinusoidal function, the output is also a sinusoidal function of the same frequency,
although the amplitude or phase may be different from that of the input. Fourier analysis
allows us to represent any square-integrable function as sums or integrals of sinusoidal
functions. If we can characterize how a system responds to any sinusoidal function by a
system transfer function H (ωx , ω y ), where ωx and ω y are the angular frequencies for the
x and y spatial variables, we can compute the Fourier transform G(ωx , ω y ) of the output
function g(x, y) for any square-integrable input function f (x, y) by transforming the input
signal into its spatial frequency representation, F(ωx , ω y ), and
G(ωx , ω y ) = H (ωx , ω y )F(ωx , ω y ). (21.1)
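A minimal numerical sketch of Eq. (21.1), assuming a hypothetical Gaussian transfer function H: the input spectrum is multiplied by H and transformed back to obtain the output image.

```python
import numpy as np

# Hypothetical input image f(x, y): a bright square on a dark background.
f = np.zeros((128, 128))
f[48:80, 48:80] = 1.0

# Hypothetical Gaussian system transfer function H(wx, wy).
w = np.fft.fftfreq(128)
WX, WY = np.meshgrid(w, w, indexing="ij")
H = np.exp(-(WX**2 + WY**2) / (2 * 0.05**2))

F = np.fft.fft2(f)             # spatial-frequency representation of the input
G = H * F                      # Eq. (21.1): output spectrum = transfer function x input spectrum
g = np.real(np.fft.ifft2(G))   # blurred output image g(x, y)
```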
It turns out that an optical imaging system can be approximated (in the limit of paraxial
approximation) as a linear, shift-invariant system for a small region around the optical axis
on an image plane [354, Chapter 6]. Use of Fourier analysis in optics was popularized by
Duffieux in 1946 [267]. OTFs were calculated for lens aberrations by Hopkins [410]. The
applications of the Fourier method to imaging system analysis were extensively demon-
strated by Schade [830, 831]. Since almost all practical imaging systems are not linear,
shift-invariant systems, spatial frequency analysis is not a complete measure of the system
characteristics. In order to characterize the optical image quality of an imaging system,
many OTFs have to be measured along different angles and at different positions on the
image planes. It is also important to measure the OTF as a function of distance from the
ideal image plane to get a measure of the depth of focus for the camera. If we choose a
spatial frequency and plot the system response as a function of distance, we will obtain a
function, called the “through-focus transfer function”. For digital cameras, the OTF can be
measured by the slanted edge method [788]. There are two reasons for applying frequency
analysis to nonlinear systems: (1) theoretically, when the signal amplitude is small around
a fixed operating point of a nonlinear system, linear analysis can be a good approximation;
(2) empirically, the results from cascading frequency response functions of nonlinear sys-
tems seem to be quite useful in predicting some attributes of the final image quality, such
as the sharpness of reflection prints from color negative films. Until better metrics can be
formulated and verified, spatial frequency analysis will continue to be used as a convenient
(although not complete), objective, image quality measure.
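The following Python sketch illustrates the idea behind edge-based MTF estimation in a simplified one-dimensional form (it is not the ISO slanted-edge procedure): a blurred step edge is differentiated to obtain the line-spread function, whose Fourier magnitude, normalized at zero frequency, estimates the MTF. The Gaussian blur is an assumed stand-in for a real system.

```python
import numpy as np

# Simulated edge-spread function (ESF): an ideal step edge blurred by a Gaussian.
x = np.arange(-64, 64)
sigma = 2.0
psf = np.exp(-x**2 / (2 * sigma**2))
psf /= psf.sum()
esf = np.cumsum(psf)                       # the ESF is the integral of the line-spread function

lsf = np.gradient(esf)                     # differentiate the ESF to recover the LSF
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]                              # normalize to 1 at zero frequency
freqs = np.fft.rfftfreq(len(lsf))          # cycles per sample
# mtf[k] is now an estimate of the system MTF at spatial frequency freqs[k].
```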
Other, simpler measures of optical image quality are also commonly used. For example,
a lens may be designed to produce at an image plane the maximum Strehl intensity [124,
p. 462] (i.e., the image irradiance, normalized to the ideal diffraction-limited case, at the
peak of the PSF). A system may be characterized by its highest cut-off frequency, or the
half-height frequency, where the OTF is equal to 1/2 of the DC level. Yet another measure
is to use the maximum number of line pairs per millimeter that can be resolved on an
image.
Let t1 = (x1, y1) and t2 = (x2, y2) be two locations in a random field I, μ1 = E[I(t1)]
and μ2 = E[I(t2)] be the mean values at t1 and t2, and σ1 and σ2 be the standard deviations
at t1 and t2. The covariance of the values of a random field, I, at t1 and t2 is called the
covariance function (of t1 and t2):
C_II(t1, t2) = E[(I(t1) − μ1)(I(t2) − μ2)].
Again, for a homogeneous random field, R_II(t1, t2) = R_II(τ), where τ = t2 − t1. For two random fields, I(x, y)
and J(x, y), we can define the cross-correlation function, R_IJ, as
R_IJ(τx, τy) = E[I(x + τx, y + τy)J(x, y)].
Because a sample function of a random field is generally not square-integrable, its Fourier transform
does not exist, and the power spectrum cannot be defined directly from it. For a homogeneous random
field, the power spectrum (spectral density) is therefore defined as the Fourier transform of the
autocorrelation function.
It can be shown that this definition also holds true for square-integrable, deterministic
signals, and the new definition is equivalent to the original concept of power spectra [113,
pp. 83–88]. Wiener [1030, 1031] and Khinchine [503] established the relationship between
the autocorrelation function and the spectral density function (the power spectrum). For
this reason, power spectra are often called Wiener spectra. Another way to solve the first
problem (see, e.g., [353, p. 70]) is to define the power spectrum of a random field as the
limit of the expected value of the average power over a finite area as the area goes to
infinity.
The second problem is solved by using windows to reduce the leakage of energy of a
frequency into its neighboring frequencies, and by averaging many finite areas of one or
more realizations to reduce the variance in the estimate (assuming that the random field
is ergodic). Alternatively, the autocorrelation is computed from the data and its Fourier
transform is computed as the power spectrum. The variance in the estimate of the noise
power spectrum can be reduced by multiplying the autocorrelation function by a smooth
window function.
It is interesting to note that increasing the area of the noise field used to do the Fourier
transform does not reduce the variance of the estimate. The extra data just increase the res-
olution of the estimate. An effective way to reduce the variance is to do a moving average
to smooth the computed spectrum. This essentially trades the extra resolution in the esti-
mate to get a more accurate estimate of the power (see, e.g., [777, pp. 549–58]). Therefore
the data field is divided into overlapping or nonoverlapping segments. The FFT is used
to compute the Fourier transform of each segment. The square of the Fourier transform
is then taken to obtain the power spectrum of each segment. Finally, the power spectra of
all the segments are summed and averaged to obtain the final estimate of the noise power
spectrum.
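A sketch of the segment-averaging procedure just described, applied to a synthetic white-noise field; the block size and the Hann window are illustrative choices.

```python
import numpy as np

def noise_power_spectrum(noise, block=64):
    """Average the squared FFT magnitude over non-overlapping, windowed blocks
    to reduce the variance of the noise power spectrum estimate."""
    h, w = noise.shape
    win = np.hanning(block)[:, None] * np.hanning(block)[None, :]   # reduces leakage
    spectra = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            seg = noise[i:i + block, j:j + block] * win
            spectra.append(np.abs(np.fft.fft2(seg))**2)
    return np.mean(spectra, axis=0) / (block * block)

nps = noise_power_spectrum(np.random.normal(0, 1, (512, 512)))
# For white Gaussian noise the averaged estimate is roughly flat across spatial frequency.
```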
For two random fields, I(x, y) and J(x, y), we can define the cross-power spectrum,
S_IJ(ωx, ωy), as
S_IJ(ωx, ωy) = ∫∫ R_IJ(τx, τy) exp[−i(ωx τx + ωy τy)] dτx dτy,   (21.7)
where the double integral extends over −∞ < τx, τy < ∞.
Let I (x, y) and J (x, y) be the input and output random fields of a linear shift-invariant
system with a system transfer function of H (ωx , ω y ). It can then be proved [744, p. 312]
that:
S_JJ(ωx, ωy) = |H(ωx, ωy)|² · S_II(ωx, ωy),   (21.8)
S_JJ(ωx, ωy) = H(ωx, ωy) · S_IJ(ωx, ωy),   (21.9)
S_IJ(ωx, ωy) = H*(ωx, ωy) · S_II(ωx, ωy),   (21.10)
where H*(ωx, ωy) is the complex conjugate of H(ωx, ωy). The first of these three relations
shows that the power of the input noise at a given frequency is amplified by the linear system
to produce the power of the output noise, just as is done to any deterministic signal. One can
therefore input a noise field to a linear system and measure its output noise field to derive
the system transfer function.
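The following sketch illustrates the use of Eq. (21.8) to recover |H| from input and output noise fields; the Gaussian transfer function is a hypothetical stand-in for a real system, and the single-periodogram estimates used here would need block averaging in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
i_field = rng.normal(0.0, 1.0, (n, n))                  # input white-noise field I

# Hypothetical linear, shift-invariant "system": a Gaussian transfer function.
w = np.fft.fftfreq(n)
WX, WY = np.meshgrid(w, w, indexing="ij")
H_true = np.exp(-(WX**2 + WY**2) / (2 * 0.1**2))
j_field = np.real(np.fft.ifft2(H_true * np.fft.fft2(i_field)))   # output noise field J

S_ii = np.abs(np.fft.fft2(i_field))**2                  # crude power-spectrum estimates
S_jj = np.abs(np.fft.fft2(j_field))**2

H_est = np.sqrt(S_jj / S_ii)                            # |H| from Eq. (21.8)
print(np.max(np.abs(H_est - H_true)))                   # tiny here, since J was generated exactly
```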
A homogeneous random field, I (x, y), is called white noise if R I I (τx , τ y ) = δ(τx ) · δ(τ y ).
The power spectrum of white noise is a constant for all spatial frequencies. The color
analogy here is that the equal-energy spectrum in optical radiation appears to be white.
Similarly, if the noise power spectrum has more energy in the high frequencies than in the
low frequencies, it is called blue noise.
For photographic films, the granularity measured with a scanning aperture of area A approximately
follows Selwyn's law: the product of the aperture area and the variance of the measured density
fluctuations, σ_D², is nearly constant,
A · σ_D² ≈ constant.   (21.11)
This relation does not work well for some color materials, prints from negatives, and radio-
graphic films. Therefore, it is always necessary to specify the aperture size when reporting
the film granularity. For example, Kodak has chosen to use a circular aperture 48 µm in
diameter to measure the granularity values for most Kodak films. Furthermore, noise is a
function of signal level. In silver-image films or color reversal films, granularity increases
with mean density. In color negative films, granularity increases with density in the low-
density areas and then stays constant or decreases as density increases. Kodak has chosen
to measure granularity at a diffuse density of 1.0. Typically film granularity is from about
0.005 to 0.05. It is important to note that equal granularity is not equally visible on a reflec-
tion print because density is not a perceptually uniform lightness scale. It requires a much
larger density difference in the dark area than in the light area to make an edge visible. The
just-noticeable difference (JND) in granularity was found to be about 6% for a uniform field
of density 1.0, and can be as much as 16–30% for complex scenes [1075], due to visual
masking from image signals.
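A small numerical check of the aperture-area relation in Eq. (21.11), using a synthetic, spatially uncorrelated density field as a stand-in for real film granularity; the mean density and the noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
film = rng.normal(1.0, 0.04, (2048, 2048))     # synthetic density field, mean density 1.0

def sigma_d(density, aperture):
    """RMS density fluctuation measured with a square aperture of the given side (pixels)."""
    h, w = density.shape
    blocks = density[:h - h % aperture, :w - w % aperture]
    blocks = blocks.reshape(h // aperture, aperture, w // aperture, aperture)
    return blocks.mean(axis=(1, 3)).std()      # std of the aperture-averaged densities

for a in (8, 16, 32):
    area = a * a
    print(a, area * sigma_d(film, a)**2)       # roughly constant, as Eq. (21.11) predicts
                                               # for this uncorrelated ("white") grain model
```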
21.2 Subjective image quality evaluation
In color imaging applications, it is often desirable to evaluate subjective image quality for
a given system performance. For example, when a new photographic film is developed, the
manufacturer would like to know if the new film delivers better image quality than a com-
peting film. A digital camera manufacturer would like to know if a new CFA interpolation
algorithm works better than the old one. In general, image quality questions are very difficult
to answer reliably for several reasons: (1) Most image quality improvements are incremental
and therefore not obviously perceptible. (2) Improvements are often made at the sacrifice of
some other image attributes and whether the results are perceived as improvements can be
a matter of preference. A good example is image sharpening which makes the image look
sharper, but also makes the noise more visible. There is no question that most people prefer
sharper images, but they differ a lot in terms of how much noise they are willing to tolerate.
(3) There is a learning effect that can go either way. The new “look” of a processed image
can be initially perceived to be objectionable, and yet gradually becomes acceptable or even
preferred. This learning effect can go the other way too. The new “look” can be perceived
initially as a great improvement, but gradually become regarded as less desirable. (4) Image
quality evaluation is often performed by third-party observers (people who did not take the
pictures and do not know the subjects in the pictures). Their judgments of image quality
can be quite different from those of first-party observers. (5) Image quality judgments are
affected by the judgment attitudes and strategies induced by the experimental protocols and
procedures [251, 343], such as the influence of context and the influence of instructions.
(6) Reliable evaluation of system performance requires a large sample of images. Most
imaging systems make compromises in cost/performance trade-offs. A system may do an
excellent job in certain imaging situations, but fail miserably for other imaging situations.
For example, a camera may give an excellent exposure for a frontlit scene, but a poor expo-
sure for a backlit scene. Without proper statistical sampling in “photographic space” (all the
possible situations in which the imaging system may be used to take pictures), it is impos-
sible to thoroughly evaluate the system performance. We will return to this topic later in the
chapter.
There are many image attributes that affect the overall image quality. The problem of
image quality evaluation can thus be seen as projecting a multi-dimensional space into
a one-dimensional space. For example, contrast, sharpness, noise, tone reproduction, and
color reproduction are five of the most studied attributes. For digital images, artifacts of
digital processing also add many more dimensions to the problem. Whether it is possible
to give a numerical quality to each image is something that we all have to keep in mind
when we deal with image quality issues. Another problem of image quality evaluation is
the question of methodology [281]. Psychophysical methods that are successfully used for
measuring the detection threshold and the discrimination threshold are not always applicable
to image quality evaluation, and, when they are, it is often too time consuming to conduct
experiments with those methods. One frequently used method in image quality evaluation
is that of numerical category scaling. For example, image quality is rated from 1 to 10
and the task of an observer is to assign a quality number to each image. It is important
to understand that we should not assume the numerical categories to be equally spaced.
In fact, observers may scale things differently in different parts of the scale. Numerical
category scaling is easy and fast, but its validity is based on the assumption that observers
can indeed evaluate all the attributes of an image and consistently produce a numerical
value to represent the combined impact of all the attributes. Fortunately, cross-comparison
with other, nonnumerical methods (such as judging which of a pair is better in a paired comparison
[875]) shows that the results from the method of numerical category scaling are consistent
with those from other, more time-consuming methods. In summary, perceptual
evaluation of image quality can be done with statistical validity, but the reliability differs
from task to task. In particular, if many attributes are varied at the same time, the results
tend to be more noisy. In the next few sections, we will discuss the effect of varying one
attribute at a time. We will then discuss some attempts that have been made to evaluate the
impact on the total image quality of a combination of many varying attributes.
21.2.1 Contrast
When black letters are printed on a piece of white paper, the “contrast” of the printed page
is perceived to be high. The contrast in this case is correlated with the reflectance ratio
between the letters and the paper. When a dark room is illuminated by a small point source,
the “contrast” of the scene is also perceived to be high. Here, the contrast is produced by
comparison of the luminances in the shadow and in the highlight. In addition to reflectance
and lighting, contrast can also be produced by color differences. Ideally, we would like to
conclude that reflectance contrast, illumination contrast, and color contrast are the three
main physical correlates of our perceived contrast of a scene or its image. However, we
do not have sufficient knowledge of how various physical (contrast) events are combined
and perceived as the perceptual attribute that we call image contrast. Perceptually, we seem
to be able to visually parse the scene into reflectance, illumination, and color variations;
it is unlikely that our visual system does this parsing for the sole purpose of computing
the perceived scene contrast. A more likely scenario is that the perceptual contrast is a
by-product of other visual computations, such as edge detection, contour integration, and
shape from shading, etc.
Although we are quite ignorant about how our perceptual contrast is computed from an
image, we do know a number of physical variables that we can use to manipulate it, such
as sharpness and chroma [156]. The best-known variable is called gamma, the slope of the
TRC when the log output luminance is plotted as a function of log input luminance. The
results from several experiments [811] can be summarized as follows:
• The perceived image contrast increases with gamma, but it reaches a maximum at a
gamma of 3–5, depending on the scene.
• The optimum gamma (the one that gives the best image quality) is dependent on scene
type and viewing conditions.
• When image quality is plotted as a function of perceived image contrast, image quality
is a convex function of perceived image contrast. It reaches a peak when the perceived
contrast is about 6 on a scale of 1–10. Surprisingly, this does not seem to be scene-
dependent.
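The gamma referred to above can be estimated as the slope of the tone reproduction curve in log-log coordinates; a minimal sketch follows, with an assumed power-law TRC standing in for a measured one.

```python
import numpy as np

# Hypothetical tone reproduction curve: output luminance vs. input luminance.
L_in  = np.linspace(0.01, 1.0, 50)
L_out = 0.9 * L_in**1.8            # a system with gamma = 1.8 (assumed for illustration)

# Gamma is the slope of log(output luminance) vs. log(input luminance).
gamma, offset = np.polyfit(np.log10(L_in), np.log10(L_out), 1)
print(round(gamma, 2))             # -> 1.8
```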
21.2.2 Sharpness
The intuitive meaning of image sharpness is associated with how clearly and distinctly
a physical object in the image can be seen. Step edges and fine textures are often the
objects used to judge image sharpness. In the ideal world of geometrical optics, a step
edge is reproduced as a step edge, and any deviation from that constitutes a blur, i.e., a
degradation of sharpness. In practice, depending on the viewing conditions, our visual sys-
tem cannot perceive a blur that is smaller than a certain threshold value. For example, the
minimum detectable blur on a knife edge (transition width) is about 25 seconds to 1 minute
of arc under foveal inspection (presumably for longer than 1 s) [378, 1017, 1024]. How-
ever, when the exposure time is reduced to 30 ms, the blur threshold increases by a fac-
tor of 2–4 [1024]. Physical sharpness, such as is characterized by the spread of the PSF
or the system MTF, is found to correlate quite well with perceptual sharpness. This has
been studied extensively and several sharpness measures have been proposed. Among
them are edge gradients [400, 493], various acutances (e.g., system modulation transfer
(SMT) acutance [221], cascaded modulation transfer (CMT) acutance [340] and a mod-
ulation transfer (AMT) acutance [222]), the subjective quality factor (SQF) [357], the
square-root integral (SQRI) method [69], the sharpness factor (SF), and the perceived in-
formation capacity (PIC) [455]. Given their purely empirical nature, the predictive power
of these sharpness measures is surprisingly good. Reviews and comparisons of the var-
ious metrics can be found in a number of references [455, 533, 707]. We will briefly
mention some of these sharpness measures to gain some insights into these empirical
findings.
For the CMT acutance, Ec(ω) is the eye response function, approximated as a Gaussian function
with σ = 26.7 cycles per degree, and M(ω) is the cascaded system transfer function.
3. AMT acutance:
AMTA = 100 + 66 log[(∫ Ea(ω)M(ω) dω) / (∫ Ea(ω) dω)],   (21.13)
where Ea(ω) is a visual (eye) weighting function and M(ω) is the system transfer function.
For the noise-based measures, such as the perceived information capacity, S(ω) is the signal power
spectrum multiplied by the squares of the system MTF, Meye(ω) and Neye(ω) are the eye "MTF" and
the equivalent eye noise power spectrum, and N(ω) is the system noise power spectrum.
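A sketch that evaluates an acutance-type expression of the form of Eq. (21.13) numerically; both the Gaussian eye weighting and the exponential system MTF used here are illustrative assumptions, not published data.

```python
import numpy as np

freq = np.linspace(0.0, 60.0, 601)            # spatial frequency (cycles/degree)
E = np.exp(-freq**2 / (2 * 26.7**2))          # eye weighting, Gaussian with sigma = 26.7 c/deg
M = np.exp(-freq / 15.0)                      # hypothetical system MTF

amta = 100 + 66 * np.log10(np.trapz(E * M, freq) / np.trapz(E, freq))
print(round(amta, 1))                         # higher values correspond to sharper systems
```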
Several interesting observations can be made from this long series of studies:
• The logarithm of the integrated area of the CSF-weighted MTF of an imaging system
is found to be linearly correlated with the subjective sharpness, especially when the
system noise is low or not perceptible.
• When the noise is high, the system's signal-to-noise ratio seems to predict the perceived
sharpness better.
• With digital image processing, it is now possible to enhance images with MTFs greater
than 1 and in almost any arbitrary shape. None of the existing sharpness measures has
been tested thoroughly for this type of MTF manipulation. It is often assumed that image
quality increases with sharpness. This cannot be true if the existing sharpness measures
are used because certain MTF manipulations make images look quite unnatural, and
their quality is actually degraded, although their computed sharpness is higher.
The perception of image sharpness is coupled with the perception of image contrast. For
a given system gamma, the perceived sharpness increases with the cut-off frequency. As
gamma is increased, the perceived sharpness for a given cut-off frequency also increases
until the gamma value reaches 0.8 or higher; beyond that point, it becomes insensitive to further
gamma changes [811, p. 49].
In electronic imaging systems, equivalent types of tone reproduction error also exist.
Furthermore, due to limited bit-depth, tonal quantization errors can sometimes be clearly
seen, especially after image compression. On a pixel level, errors due to color matrixing
and edge enhancement are also visible in some image areas. In general, tone reproduction
errors can be measured in terms of L* in CIELAB on a pixel-by-pixel basis, or on an
image-by-image basis. However, certain quantization errors are much more objectionable
than the mere L* error can tell us.
Experimental results [42] showed that acceptability versus overall CRI is both scene- and
hue-dependent. It is clear that any attempt to quantify color reproduction errors has to
take the main subject matter into account. A fixed weighting scheme cannot do a proper
job.
In consumer photofinishing, a color balance error greater than 2 buttons is considered
to be unacceptable. The color button here corresponds to 0.05 log exposure on paper [214,
p. 152]. Since most photographic color papers have very nonlinear D–log H curves, a 2-
button error in color exposure will introduce varying amounts of color shift in highlight,
mid-tone, and shadow. In the mid-tone areas, the color error can translate into a 0.1 density
error or higher, depending on the gamma of the paper used. The other complication is that
we are more sensitive to green–magenta error than to yellow–blue error, and therefore,
the acceptable color balance error should be hue-dependent. Alternatively, the color error
should be specified in a perceptually more uniform color metric, such as CIELAB. In
summary, the 2 button criterion gives us an empirical rule of thumb, which is based on a
long history of photofinishing. However, this criterion is not generalizable to other color
imaging applications because the photographic system is a closed system and the color
button correction cannot be easily translated to other systems.
Overall image quality has been found to depend nonmonotonically on perceived attributes such as contrast,
sharpness, and colorfulness [156, 304, 463]. Different types of function have been proposed
to model such nonmonotonic dependence [282].
Let us look at one example of such a study on the combination model. In combining the
influence of sharpness and graininess, Bartleson [82] found that p = −3.4 seemed to fit
his experimental data well. It should be pointed out that these were purely empirical curve
fittings and the exponents may vary widely for different sets of data. In fact, for each print,
different parameters are needed. The value p = −3.4 was for the average of the set of prints
used in the experiment. The same expression is also found to describe the combined effect of
various perceptual impairments on image quality. Image quality degradations (blurring and
distortion) caused by artifacts are called perceptual impairments. Several studies [250, 717]
have shown that, in image coding, the total perceptual impairment I due to impairments I_i
in different image attributes can be combined with a Minkowski metric:
I^p = Σ_(i=1..N) I_i^p.   (21.16)
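A minimal sketch of the Minkowski combination in Eq. (21.16); the impairment values and the exponent p = 3 are purely illustrative.

```python
import numpy as np

def combined_impairment(impairments, p=3.0):
    """Total impairment from Eq. (21.16): I = (sum_i I_i^p)^(1/p)."""
    impairments = np.asarray(impairments, dtype=float)
    return (impairments**p).sum() ** (1.0 / p)

# Hypothetical impairments for blockiness, blur, and ringing on some common scale.
print(combined_impairment([0.5, 1.2, 0.3], p=3.0))
```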
measures, we can then use multi-dimensional scaling to figure out which combinations of
noise and gamma can serve as the efficient measure of image quality. Experiments that
use stimuli generated from combinations of physical parameters without predefined image
attributes have been conducted to establish physical descriptors for psychophysical attributes
[641, 753]. This type of approach is inherently more satisfactory than predefining each
image quality attribute and then combining their contribution with a Minkowski distance.
However, this is generally useful only if most relevant physical parameters are considered
in generating the experimental images.
Image quality performance evaluation is a complex process, not only because image quality
is a subjective judgment, but also because system performance requires statistical validation
with proper sampling in the “photographic space”, i.e., the frequency distribution of image
taking as a function of photographic variables [303, 846]. Many of the algorithms in a digital
camera, such as autoexposure and auto-white-balance, are statistical in nature. Certain types
of scene can make an algorithm fail frequently because those scenes do not follow the
assumptions that the algorithm makes when computing its output values. If a digital camera
produces too many bad pictures in a field test, it does not necessarily mean that the camera
has poor quality performance. It could be a result of taking too many pictures of the same
type of scene that happens to be inconsistent with the assumptions on which the algorithms
are based. A simple example is that if the auto-white-balance algorithm is based on the
assumption that an image always integrates to gray, then any scene that has a dominant
subject color is likely to cause a significant error in the color balance of the image of that
scene. Modern auto-white-balance algorithms are much more sophisticated than the simple
integrate-to-gray algorithm, but they are all based on some variations of the same basic
assumption.
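A sketch of the integrate-to-gray (gray-world) assumption discussed above, showing how a dominant subject color defeats it; the synthetic green scene is an illustrative stand-in for a real image.

```python
import numpy as np

def gray_world_balance(img):
    """Simple integrate-to-gray white balance: scale each channel so that the channel
    means become equal. It fails when a single color dominates the scene."""
    img = img.astype(float)
    means = img.reshape(-1, 3).mean(axis=0)          # mean R, G, B
    gains = means.mean() / means                     # per-channel gains
    return np.clip(img * gains, 0.0, None)

# A scene dominated by green grass: the algorithm wrongly pushes it toward magenta.
scene = np.dstack([np.full((100, 100), 0.2),         # R
                   np.full((100, 100), 0.6),         # G
                   np.full((100, 100), 0.2)])        # B
balanced = gray_world_balance(scene)
```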
Once we understand the subjective and the statistical nature of the image quality per-
formance evaluation, it is important for us to follow some agreed process to approach the
problem in order to arrive at a repeatable and reliable result. Since the performance of algo-
rithms is highly dependent on the types of scene that are being imaged, a reliable evaluation
has to cover enough varieties of scene type to be statistically valid. The following is an
attempt to provide a partial check list for testing each of the major functions of a camera or
other image capture device.
• Autoexposure: the following scene types can be used for evaluation: (1) back-lit,
(2) front-lit with close-up objects, (3) side-lit with strong shadow and highlight, (4)
mixed illumination (e.g., outdoor scenes with some buildings in the sun and others
in the shade), (5) main subject wearing white clothing, (6) main subject wearing dark
clothing, (7) dark surrounds, (8) bright surrounds, (9) busy background, (10) uniform
background, (11) main subject in the center, (12) main subject off-center, (13) main
subject half in the shade and half in the sun, (14) snow or beach scenes, (15) document
images, (16) event images (graduation, birthday, reunion, etc), (17) location images
(amusement parks, mountains, stores, shopping malls, office buildings), (18) baby and
small child images.
• Auto-white-balance: the following scene types can be used for evaluation: (1) scenes
under different illuminations, such as tungsten light, at different times of the day, under
fluorescent light, (2) scenes that have dominant colors, such as large areas of green
grass, blue sky, or red carpet as the background, (3) scenes with mixed illuminants, (4)
scenes that have bright saturated colors, (5) event images (parties, dinners).
• Autofocus: the following scene types can be used for evaluation: (1) scenes with objects
located at varying distances, (2) scenes that have busy textures at distances other than
that of the main subject, (3) scenes that have two main subjects, located on either sides
of the center of the image, (4) baby and small child images, (5) event images, (6)
location images.
• Tone reproduction: the following scene types can be used for evaluation: (1) scenes
that have very subtle and gradual shadings, such as cloudless sky or uniform walls,
(2) scenes that have very wide luminance dynamic ranges, such as some objects illu-
minated by bright sunlight and others in deep shadow, (3) overcast days, (4) snow or
beach scenes, (5) flash-in-the-face scenes, (6) event images, (7) location images (street
scenes, mountains, office scenes, amusement parks), (8) home life images.
• Color reproduction: the following scene types can be used for evaluation: (1) scenes
that have a Macbeth ColorChecker, (2) scenes that have large skin areas, such as a close-
up portrait, (3) people scenes, (4) home life images, (5) location images (restaurants,
stores, parks, etc), (6) scenes with blue sky and green grass, (7) scenes that show fruit.
• Edge enhancement: the following scene types can be used for evaluation: (1) scenes that
have busy textures, such as those that show hair and grass, (2) scenes that have high-
contrast and low-contrast edges, (3) close-up portraits, (4) snow scenes, (5) beach and
water scenes, (6) scenes that have objects at varying distances, (7) scenes of building,
(8) natural scenes (such as mountains, lakes, etc.), (9) scenes with blue sky and white
clouds, (10) scenes with gradual and subtle shading, (11) scenes with high-spatial-
frequency patterns.
• Noise suppression: the following scene types can be used for evaluation: (1) scenes
that are underexposed, (2) scenes that have fine textures, such as those that show hair,
grass, or a water surface, (3) portraits, (4) scenes with edges in various directions, (5)
scenes with wide dynamic luminance range, (6) scenes with stripe structures, such as
shirts with grid patterns, (7) scenes with people at varying distances.
• Image compression: the following scene types can be used for evaluation: (1) scenes
that have subtle shading and high-contrast diagonal edges, (2) close-up portraits, (3)
scenes that have fine textures, such as those that show hair and grass, (4) scenes with
people at varying distances.
Image quality evaluation studies are costly and time-consuming to conduct and their results
can have a major impact on business decisions. Therefore it is very important to clearly
define the objective, the method, the procedure, and the control. Expertise is required in the
following areas: psychophysics, marketing, imaging science, and system engineering. Image
quality evaluation studies differ from other marketing or psychophysical studies. Here, we
will discuss some of the issues in observer screening and the planning of experiments.
21.6 Problems

22 Basic concepts in color image processing
Digital image processing is a field that has diverse applications, such as remote sensing,
computer vision, medical imaging, computer graphics, graphic arts, pattern recognition,
and industrial inspection. There have been many books that cover the general topics of
digital image processing in varying depths and applications (e.g., [86, 165, 262, 351, 363,
456, 457, 594, 752, 776, 807, 841]). Readers are encouraged to consult these books for
various operations and algorithms for digital image processing. Most of the books deal
with monochromatic images. When dealing with color images, there are several concepts
that are inherently quite different. For example, if we treat the RGB signals at a pixel
as a three-dimensional vector, a color image becomes a vector field, while a monochro-
matic image is a scalar field. Typical operations, such as the gradient of an image, have
to be thought over again because simply repeating the same scalar operation three times
is often not the best thing to do. Another important reason for much of the required re-
thinking is that our visual perception of a color image is usually described in terms of
luminance–chrominance color attributes, not RGB color channels. A color image simply
provides much more information than a monochromatic image about the scene, its ma-
terial properties and its illumination. We have to think and rethink about how to extract
the additional information more effectively for the applications we have in mind. In this
chapter, we will study some basic issues and explore some new concepts for formulating
old problems which we might have encountered when working on monochromatic image
processing.
In every image processing task, the most important job is always problem formulation. If
the problem is not formulated well, the solution is often less than desirable. Let us take
image sharpening as an example. Many engineers immediately go to the textbooks and the
published literature to see what kinds of algorithm there are, and pick up the one that they
like best according to whatever criteria they have. Often, they choose some variation of an
adaptive unsharp masking algorithm [1064] because it is widely used and they have seen
many impressive images in the published papers. They then spend a lot of effort coding
it up and trying it out by systematically adjusting the important parameters. Although
most adaptive unsharp masking algorithms are quite acceptable for many applications, few
engineers actually think through their justification and compare them with other image
sharpening algorithms, such as multiscale contrast enhancement [997], edge shaping [549],
or curvelet transform [897, 898]. It is often observed that, over the lifetime of an algorithm
research project, less than 5% of the time is spent on problem formulation, 30% on coding
and debugging, and the rest on modifying the codes to fix the “minor” failures or the
“occasional” image artifacts created by the algorithm. In fact, it might be more profitable to
spend more than 20% of the time on problem formulation. In the case of image sharpening,
the unsharp masking approach attempts to raise the slope of edge transitions at the risk of
distorting the edge profiles with overshoots. Furthermore, the unsharp masking operation
is a second-order derivative and therefore quite sensitive to image noise. Before we rush to
implement the algorithm, we should seriously consider other approaches that do not create
overshoots and are not as sensitive to image noise.
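For reference, a minimal one-dimensional sketch of the unsharp masking idea discussed above; the box blur, gain, and noise level are illustrative assumptions. It shows how the operation steepens the edge while amplifying noise and creating overshoots.

```python
import numpy as np

def unsharp_mask_1d(signal, gain=1.5, radius=3):
    """signal + gain * (signal - local average); the high-pass term behaves like a
    second-derivative operator, so it boosts edges, noise, and overshoots alike."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    blurred = np.convolve(signal, kernel, mode="same")
    return signal + gain * (signal - blurred)

edge = np.concatenate([np.zeros(50), np.ones(50)]) + np.random.normal(0, 0.02, 100)
sharpened = unsharp_mask_1d(edge)   # steeper transition, but visible over/undershoots
```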
In general, after going through a careful thought process to formulate the problem, we
have to study the following four basic questions that face every color image processing
problem:
Use of color information in machine vision and image processing has raised questions about
the “best” color space for a particular task. That is, how do we represent the color information
so that the task can be implemented efficiently? For example, if we would like to detect and
locate faces in images using skin color as one of the features, what is a good color space to do
skin color detection? One can argue that since all color spaces are transformations of each
other, it does not matter which color space is used. Mathematically speaking, this may be
true. In practice, a poor choice of color space can increase the complexity of processing and
sometimes obscure useful features. An analogy is the time domain versus frequency domain
representations. They are mathematically equivalent, but some tasks can be performed more
efficiently in the frequency domain, and others in the time domain. However, there are more
issues to be considered in choosing an efficient color representation than choosing a time
or frequency representation.
In most cases, the “best” color space depends on the intended application. For example,
if we want to quantify how a trade-mark is reproduced in an image, a perceptually uniform
color space such as CIELAB or the Munsell system is the most useful. On the other hand,
if our job is to adjust the color balance of an image, some kind of log exposure space is
a good choice because the adjustment can be done by simple subtraction or addition of
constants. In this case, CIELAB is very inconvenient to use because changing (X n , Yn , Z n )
for all pixels involves a lot of computation. Yet for other applications, the color space may
be fixed by the requirement of the imaging chain. For example, NTSC TV (video) signals
are broadcast in a standard YIQ encoding. In the receiver, image processing circuits are
often designed to receive YIQ or gamma-corrected RGB space. To do otherwise requires
extra hardware and increases product cost.
Conventional color spaces [430] were defined either for colorimetry (such as CIEXYZ)
or for perceptual comparison (such as CIELAB, the Munsell system, and the Natural Color
System). Although these color systems provide good specifications of color information,
they are not necessarily suitable for machine vision and image processing applications. For
example, the CIE (x,y) chromaticity is known to be singular for zero input values, making
it very sensitive to noise for dark image regions [501]. Other systems that use hue angles
as one of the attributes are very unstable for the less saturated colors. A more theoretically-
oriented approach has been proposed that uses the eigenvectors of the covariance of the color
signals [145, 727, 776]. This is a good idea for data compression because the eigenvectors
tend to decorrelate the color components. However, it is not clear how one can estimate
the “true” covariance of the natural color variations. We will come back to this subject
shortly.
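A sketch of the eigenvector (principal component) idea mentioned above: the eigenvectors of the covariance matrix of the color signals define decorrelated color axes. The correlated synthetic pixels below are only a stand-in for real image data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "image" pixels: correlated R, G, B values (stand-ins for real data).
base = rng.uniform(0.05, 0.9, 10000)
rgb = np.stack([base + rng.normal(0, 0.05, 10000),
                base + rng.normal(0, 0.05, 10000),
                base + rng.normal(0, 0.08, 10000)], axis=1)

cov = np.cov(rgb, rowvar=False)                 # 3x3 covariance of the color signals
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvectors = decorrelated color axes
decorrelated = (rgb - rgb.mean(axis=0)) @ eigvecs
print(np.round(np.cov(decorrelated, rowvar=False), 4))   # approximately diagonal
```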
In this section, we will first discuss the general issues that one has to consider when
choosing a color space. In the next section, we will then develop a good color space for
image segmentation based on the discussion here. In order to restrict the scope of the
discussion, we will assume that the vision or imaging system has three color channels that
sense three independent spectral signals from a natural scene. Following conventions in
consumer applications, we will designate the three color signals as the red, green, and blue
signals which simply indicates their relative sensitivities to the long, medium, and short
wavelengths. These spectral bands are often about 100 nm wide (say 400–500 nm, 500–
600 nm, and 600–700 nm). There are also systems which use color filters that are broader
in spectral transmittance, such as yellow, cyan, and magenta, to collect more light in each
channel. These “complementary” color signals are then converted into red, green, and blue
for further color processing.
In processing digital color images, the first thing we should find out is what the numbers
mean, i.e., the signal calibration. For example, we should know if the digital numbers are
proportional to the scene radiances (linear), or the log scene radiances (log), or the gamma-
corrected scene radiances (video). Although these are the three metrics that are most directly
tied to the original scenes, there are other media- or perception-related metrics that are also
very widely used. For example, images that are digitized from photographic films are often
calibrated in terms of film density or printing density. Images that are used in graphic arts
are often calibrated in reflection density or CMYK ink coverage. Images that are captured by
digital cameras may have been rendered with nonstandard gamma correction curves [177].
Ideally we should also find out the spectral responsivity function of the device/medium
that is used to capture the original image and, if possible, the signal processing (chemical,
or electronic) that was applied to the captured image. Unfortunately, this information is
often not available and can be very difficult or expensive to determine. The next best thing
is to have companion images that include certain calibration targets (such as the Macbeth
ColorChecker [647], IT8 or ISO 12641 targets [446]) on them. However, it is not unusual for
there to be no calibration information related to the image at all. In such cases, the statistical
or physical properties of the image may be used for automatic rendering or classification
of the unknown images [300, 368, 575, 597]. Alternatively, some careful comparisons of the
digital values corresponding to various objects in the image may give a clue as to which of
the possible metrics is most likely to be the correct one.
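As a rough illustration of the three scene-referred metrics mentioned above, the sketch below converts a linear radiance value into a log metric and a gamma-corrected "video" metric; the 1/2.2 exponent is a common but here merely illustrative choice, not a particular standard.

```python
import numpy as np

linear = np.linspace(0.001, 1.0, 5)        # relative scene radiance (linear metric)

log_metric   = np.log10(linear)            # "log" metric (proportional to density, up to sign)
video_metric = linear ** (1.0 / 2.2)       # gamma-corrected "video" metric (illustrative gamma)

for l, d, v in zip(linear, log_metric, video_metric):
    print(f"linear {l:.3f}  log {d:+.3f}  video {v:.3f}")
```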
Once we know the metric of the digital images, we have to consider three issues before we
decide how to transform our color signals to a best representation for the task in hand. These
three issues are: (a) signal characteristics, (b) noise statistics, and (c) system constraints.
The signal characteristics of graphics images are quite different from those
of natural scenes. Presentation graphics are full of saturated colors, high-contrast edges, and
uniform areas with little image noise. The major concern here is to make sure that they are
easy to read and give good visual impressions. For this type of image, it may be desirable to
represent color signals in perceptual spaces so that visual contrast and color preference can
be manipulated directly. On the other hand, for color images of natural scenes, the repre-
sentation issues are more complicated. Here we list some of the key factors that need to be
considered:
1. Effect of light source variation
Daylight, tungsten lamps, fluorescent lamps, and electronic flashes are the four major
types of light source for color imaging. They differ greatly in their spectral power
distributions. Daylight is the dominant natural light source in outdoor scenes, but its
spectral composition varies with location, weather conditions, and the time of the day.
Fortunately, the variation can be approximated by two eigenvectors and a mean vector
as specified in the CIE daylight illuminants. Its chromaticity locus (daylight locus)
is almost parallel to that of the blackbody radiator at different temperatures. Object
surfaces of a natural scene are mainly illuminated by various mixtures of two major
components of daylight sources: sunlight and skylight. If we compute the eigenvectors
of the covariance matrix of the red, green, and blue signals of a natural scene, the exact
results vary from scene to scene. However, one of the eigenvectors is most often in
the general direction of yellow–blue. The most probable cause is that yellow–blue is
the direction of the additive mixture of yellow sunlight and blue skylight. Surfaces
of different orientations relative to the sun are illuminated by mixtures of the two
components in varying amounts, and therefore the reflected light of a homogeneous
curved surface tends to have a wider spread of color in the yellow–blue direction
than any other direction. Furthermore, objects in shadow are mainly illuminated by
the skylight, and those under the sun by the sunlight. Therefore, the chromaticity
distribution of a natural scene is often elongated along the sunlight–skylight direction.
In addition to the global lighting variation, virtually every object surface is illuminated
by light of different spectral compositions, due to mutual reflections among surfaces.
This variation is multiplicative in scene spectral radiances or integrated radiances [563].
Any color signal representation for natural scenes should account for the multiplicative
color variation of different illuminations.
2. Effect of reflectance variation
There are many more dark colors than bright colors in natural scenes. Studio photos may
have dominant groups of white objects in the scene, but they are not typical in consumer
photos or outdoor scenes. If we plot the reflectance distribution of natural objects, it is
not a symmetric or uniform distribution. In fact, various studies have shown that the
average reflectance is between 14 and 20%. If a linear metric is used to represent color
signals (such as the 12-bit digital output of an analog-to-digital converter that is used
to digitize the analog output of a CCD sensor), we will expend many of the bits in the
high-reflectance region, where only a relatively small percentage of pixels occur. The
number of bits used in a logarithmic metric may be more proportional to the number
of objects in natural scenes, therefore many dark objects can be distinguished.
Regarding noise, a ratio-based chromaticity metric has a singularity at zero and is therefore very
sensitive to noise in dark image regions; a logarithmic metric has the same problem. As a result, color
signals are best represented in a space that eliminates any such singularity [566].
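A rough numerical illustration of the bit-allocation point made in item 2 above; the 12-bit linear and 8-bit density encodings, and the density range of 0 to 2, are illustrative assumptions.

```python
import numpy as np

# Linear 12-bit encoding of reflectance (0-100%):
linear_codes = 4096
codes_below_20 = int(0.20 * (linear_codes - 1))        # ~819 codes for reflectance <= 20%
print(codes_below_20 / linear_codes)                    # ~0.20 of the code space

# Log (density) 8-bit encoding covering density 0.0-2.0 (reflectance 100% down to 1%):
log_codes = 256
density_at_20 = np.log10(1 / 0.20)                      # density of a 20% reflector, ~0.70
codes_above_20 = int(density_at_20 / 2.0 * (log_codes - 1))
print(1 - codes_above_20 / log_codes)                   # ~0.65 of the codes cover the darker range
```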
When we look at the world around us, our visual system readily parses the retinal images into
objects and their spatial layouts. Because this is done so fast and effortlessly by our visual
perception, we are often led to believe that this can also be done by a computer as a processing
step preceding all other, higher-level cognitive interpretations. This step of parsing and
grouping of image regions into perceptually or physically meaningful objects is called
image segmentation. However, we soon find that this is probably the most difficult problem
in vision. Many years of research have not resulted in much progress in the algorithms for,
or the understanding of, image segmentation.
If we examine the problem further, we realize that too many of our ideas are simply
too naive. The first misleading idea is that physical objects are well-defined entities and
therefore, we should be able to segment the image into objects. However, this is not true.
Let us consider the following questions. Is a nose an object? Is a face an object? Is a head
an object? Or is an entire human being an object? How about the clothing on our body?
A second misleading idea is that the color of an object is uniform and therefore it can be
used to segment the image into different regions of objects. Again, this is not true, not
only because surfaces rarely have uniform BRDFs, but also because they are almost never
uniformly illuminated. A third misleading idea is that an object has boundaries that are
distinguishable from the background. The reality is that object boundaries may be distinct
physically, but they may not be detectable in the image. There are many other difficult ques-
tions like these that make image segmentation a very unrealistic goal unless we define the
problem in a more manageable manner. The most important thing to clarify is that perfect
segmentation in terms of grouping pixels into semantically meaningful entities (such as sky,
trees, houses, roads, people, etc.) is ill defined and not achievable from a purely bottom-
up, data-driven approach without using some top-down, model-driven processes. On the
other hand, using colors, edges, and textures, we can often accomplish a useful partition
of an image into regions that other processes can further analyze (e.g., [610, 611, 894]).
For example, we can locate regions that have skin-like colors and then look for other
face features in those regions. This type of low-level region partitioning will be the sub-
ject of our discussion here [571, 609]. In particular, we will examine a case in which we
would like to see how we can do such a primitive image segmentation based on a color
feature.1 An input color image is partitioned into nonoverlapping regions, each correspond-
ing to a “different color”. In order to quantify what we mean by “different colors”, we have
to define a color space to specify colors, and a distance metric to measure color differences.
1 This section is a shortened and slightly modified version of a paper published in Journal of the Society of
Photographic Science and Technology of Japan [571]. Interested readers should read the paper for more details
and the related application.
When considering the effect of noise in color coordinates, we have to remember that
noise in the color signals broadens a single point in a color space into a volume. If noise
is isotropic in the chosen color space, the volume is a sphere around the true color point.
Partitioning in such a space will have an equal probability of error in all directions. Noise in
images from film origination is additive and more symmetric in density space (proportional
to log exposure) than in exposure space.
Regarding the effect of light source variations, we should first note that the spectral com-
position of the light reflected from an object surface changes under different illuminations.
For different color spaces, the change of color coordinates takes place in different forms.
For a given set of objects, some color spaces preserve the geometric relations (such as angle
and distance) between the different colors better than other spaces, when the objects are
illuminated with different light sources. For example, in the CIE chromaticity diagram,
changing the illuminant (say, from illuminant A to D65) will cause a large geometrical
distortion in the relative positions of the chromaticity coordinates. For image segmentation,
it is desirable to use a color space that preserves the geometric relations under different
illuminations, so that the segmentation results are less dependent on the illumination.
With regard to the effect of reflectance variations, we need to consider the fact that the
radiance and spectral content of the light reflected from an object surface depend on the
lighting and viewing geometry. As the surface orientation changes, the diffuse (body re-
flection) component of the light reflected from an object often varies in radiance but not in
chromaticity [560, 562, 563]. This is one of the reasons why we see a uniformly-painted,
curved object as having gradual shading but constant “color”. The specular reflection (in-
terface reflection), on the other hand, often dilutes the saturation of the object color. A color
space that provides simple geometric relations for additive mixtures of light is a better choice
for segmentation because it is easier to design the segmentation metric to accommodate the
desaturation effect of object colors due to the specular reflection.
2 The luminance here relates to the magnitude of radiant exposure. It is not meant to be the luminance as defined
in photometry.
Comparison between these two spaces is done according to the four factors listed in the
previous section.
Let r , g, b be the three (red, green, blue) color signals (image irradiances or exposures),
and R, G, B be the logarithms of r , g, b, respectively. For images scanned from film,
integral densities are the quantities that can be conveniently measured. In order to get the
log exposure, we have to convert the film density through the D–log H curves. Films all
have D–log H curves that have very small slopes around the film Dmin and Dmax , where a
small density variation is thus translated into a large difference in log exposure. In order
to avoid greatly amplifying noise around these toe and shoulder density ranges, we convert
the film density to “equivalent” log exposure by dividing the density by the average gamma
of the film. We will refer to the “equivalent” log exposure simply as the log exposure.
The two color spaces are defined as follows:
Y = αr + βg + γb,
x = r/(r + g + b),                          (22.1)
y = g/(r + g + b)

and

L = (R + G + B)/√3 = (log r + log g + log b)/√3,
s = (R − B)/√2 = (log r − log b)/√2,        (22.2)
t = (R − 2G + B)/√6 = (log r − 2 log g + log b)/√6.
Both spaces consist of one “luminance” component and two “chrominance” components.
Furthermore, the chrominance components in both spaces are intensity-invariant, i.e., they
remain constant when the light source intensity is changed by any arbitrary, nonzero factor.
Both spaces have singularity points that can be dealt with by setting small values to some
constant. In photographic applications, these singularity points are not of any practical
importance, because films have minimum densities that serve as the threshold for small
signal values. The (Y, x, y) space is similar to the CIE luminance–chromaticity space [1053].
The (L , s, t) space is traditionally called the T-space in photographic applications. Similar
spaces had been used before [727], but it is not clear if the RGB in those spaces are linear
or log. The use of log exposure instead of film density makes the segmentation algorithm
independent of film gamma.
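A direct, minimal implementation of Eqs. (22.1) and (22.2) follows; the luminance weights and the small offset used to sidestep the singularities at zero are illustrative choices, not values prescribed by the text.

```python
import numpy as np

EPS = 1e-6                       # guard against the singularities at zero

def to_Yxy(r, g, b, alpha=0.3, beta=0.6, gamma=0.1):    # illustrative luminance weights
    total = r + g + b + EPS
    return alpha * r + beta * g + gamma * b, r / total, g / total    # Eq. (22.1)

def to_Lst(r, g, b):
    R, G, B = (np.log(np.maximum(c, EPS)) for c in (r, g, b))
    L = (R + G + B) / np.sqrt(3)                                      # Eq. (22.2)
    s = (R - B) / np.sqrt(2)
    t = (R - 2 * G + B) / np.sqrt(6)
    return L, s, t

# Scaling all exposures by a constant changes only L, not the chrominance (s, t):
print(to_Lst(0.2, 0.4, 0.1))
print(to_Lst(0.4, 0.8, 0.2))     # same s and t, larger L
```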
The chromatic axes for the (L , s, t) space are chosen so that the s component approx-
imately represents the illuminant variations (daylight to tungsten light). The three compo-
nents are mutually orthogonal and are normalized to have unit vector length. The normaliza-
tion to unity is to make the noise variances of the three components approximately equal. (In
practice, there is more noise in the blue signal, and it may be desirable to adjust the coeffi-
cients accordingly.) Other considerations, such as aligning the color vectors with the princi-
pal components of the covariance matrix of the color signals, or with the opponent processes
of human color encoding, lead to coefficients similar to those in (L , s, t) space [565, 569].
The first consideration, of the statistical distribution of color signals, favors the use
of (L , s, t) space because both theory [790] and empirical data [16, 735] show that the
statistical distribution of color signals is more symmetrical in log space than in linear space.
For images originated from film, the second consideration, of noise, favors the (L , s, t)
space because film grain noise is more symmetrically distributed in log exposure space.
(However, the choice for images captured with CCD sensors is not as easy because the CCD
noise distribution is more symmetric in the linear exposure space [566].)
To evaluate the effects of light source and reflectance variations on the color coordi-
nates, we have chosen eight spectral samples recommended by the CIE for evaluating the
color rendering power of different illuminants [620]. Skin and a green plant leaf are also
included because of their frequent appearance in consumer images, making a total of ten
samples of varying hues and saturations. The relative spectral reflectance curves are shown
in Fig. 7.7. As can be seen, these ten samples represent wide variations of different spectral
reflectances.
For the effect of illuminant variation, the comparison is done by plotting the color coor-
dinates of these ten samples under D65 and tungsten illuminants (the tungsten illuminant is
a blackbody radiator at 2856 K) and checking the distortions with and without a von Kries
type color normalization ([1053, p. 431]). We use a consumer color negative film as the
hypothetical image sensing and recording medium, whose spectral sensitivities are shown
in Fig. 22.1. Figure 22.2 shows the color coordinates, in the two color spaces, of the ten
samples under the two illuminants. As can be clearly seen, the ten points undergo a fairly
simple translational shift in the (s,t) space, but a highly complex distortion in the (x, y)
space.
For the effect of specular reflection, the (x, y) chromaticity space is known to be quite
simple: the color coordinates of additive mixtures of the diffuse and specular components
of the reflected light fall on a straight line. This is a very good property for image seg-
mentation because it is very simple to define a distance metric in the space to discount
the color desaturation effect due to specular reflection. Mathematically, one can prove that
the collinear property does not hold in the (s, t) space. However, the deviation from that
Figure 22.1. The spectral sensitivity curves of a consumer color negative film (normalized spectral sensitivity as a function of wavelength, 300–800 nm).
Figure 22.2. The color coordinates of the ten reflectance samples under the two illuminants (D65 and tungsten): (a) the CIE (x, y) chromaticity diagram; (b) the (s, t) chromaticity diagram.
property becomes large only when the color is very saturated. If we add a specular reflection
component of 10% to the ten samples, i.e., we increase the spectral reflectance value by 0.1 at
all wavelengths, their color coordinates are shifted convergently toward the illuminant point in
both spaces, as can be seen in Fig. 22.3. (Note that the large shift in the color coordinates of
the green leaf is due to its relatively low reflectance at most wavelengths before the specular
component is added.) Even in the (s, t) space, the shift lines do not seem to deviate much from
straight lines converging on the illuminant point.
Judging from the above comparison, we conclude that the (L , s, t) space is the better
choice for segmenting images of film origin. We have not considered the so-called uniform
color spaces or others based on perceptual scaling, such as the CIELUV and CIELAB,
for two reasons. (1) They are unnecessarily complicated for images from different imag-
ing systems. Not only the viewing conditions, but also the system spectral responsivities,
have to be specified. (2) The perceptual difference is not necessarily a better metric for
separating two physically different materials. For example, infrared sensing is very good
for segmenting hot objects from cold ones, without any corresponding visual perception
signals.
Figure 22.3. The color coordinates of the ten reflectance samples with and without a 10% specular (interface) reflection component: (a) the CIE chromaticity diagram; (b) the (s, t) chromaticity diagram.
In fact, surface areas that differ greatly in luminance are almost always illuminated
by light of different spectral contents.
It is true that introducing luminance into the segmentation increases the risk of breaking up
an image region that corresponds to one object surface, but empirical results show that this
is a risk well worth taking.
Figure 22.4. The decomposition of the distance into a “saturation” component, P, and a “hue”
component, Q. Point c represents a color in (s, t) space and m represents the center of a color
cluster (i.e., a local peak in the three-dimensional color histogram).
In the color space, a color segmentation algorithm attempts to partition the whole space
into volumes that hopefully correspond to colors of different objects. The metric for par-
titioning should thus be coarser in the luminance component than in the chrominance
components. Furthermore, the metric should also be coarser in color saturation than in hue,
in order to reduce the effect of specular reflection.
If “hue” and “saturation” are to be used by the distance metric, we will need an estimate
of the neutral point in the chrominance space. Methods for estimating the neutral point
have been discussed in Section 18.5 and we will not discuss them here. Suffice it to say
that a rough estimate, such as the average of the whole image, is often better than taking
a default value. Moreover, the distance measure should not be too sensitive to the error
in estimating the neutral point. From the above considerations and experiments on many
images, the following distance metric was found to be quite satisfactory:
where (ΔL)_{c−m}, (ΔP)_{c−m}, and (ΔQ)_{c−m} are the differences, between a color c and a peak
m, in luminance, “saturation”, and “hue”, respectively, after they have been quantized into
histogram bins (see Fig. 22.4). It should be noted that the coefficients in Eq. (22.4) are dependent
on the quantization intervals used in forming the three-dimensional color histogram. The
definitions of (ΔP)_{c−m} and (ΔQ)_{c−m} are as follows. If we connect each peak with the
neutral point, we form one radial line for each peak. The distance from any point, c, to
a peak, m, is decomposed into two components: one parallel and the other perpendicular
to the radial line of that peak. The parallel component is (ΔP)_{c−m} and the perpendicular
one is (ΔQ)_{c−m}. Remember that the luminance quantization is much coarser than that of the two
chrominance components (0.08 vs. 0.02), so the effect of the luminance difference is still
greatly reduced even with the weighting factor 2.0 on (ΔL)²_{c−m} in Eq. (22.4).
There are a few alternatives that one can use to define “hue” and “saturation”. For
example, one can use hue angle and relative saturation. We have chosen not to use hue
angle, because it is very sensitive to the estimation error in the neutral point, and not to use
relative saturation, because, in principle, the (L , s, t) space does not have a finite support.
This is not to say that the distance metric in Eq. (22.4) is best in any sense. We have not
been able to formulate its definition on a firmer theoretical ground.
As readers may have noticed, the distance metric in Eq. (22.4) is not symmetric with
respect to c and m, i.e., if c, instead of m, is the peak, the two components, (ΔP)_{m−c} and
(ΔQ)_{m−c}, are, in general, different from (ΔP)_{c−m} and (ΔQ)_{c−m}. The reason we chose to
define the metric as shown in Fig. 22.4, rather than the other way, is mainly for computational
speed. In order to compute (ΔP)_{c−m} and (ΔQ)_{c−m}, one can prestore the normalized vectors
from the neutral point to all the peaks (typically less than 64 in number).
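The decomposition of Fig. 22.4 can be sketched as follows. This is an illustrative reconstruction: only the factor 2.0 on the luminance term is taken from the text, while the relative weights wP and wQ, and the example numbers, are placeholders, since the exact coefficients of Eq. (22.4) depend on the histogram quantization.

```python
import numpy as np

def pq_decompose(c, m, neutral):
    """Decompose the chrominance difference (c - m) into a component parallel to the
    radial line from the neutral point through the peak m ("saturation", P) and a
    component perpendicular to it ("hue", Q), as in Fig. 22.4."""
    radial = np.asarray(m, float) - np.asarray(neutral, float)
    radial /= np.linalg.norm(radial) + 1e-12      # prestored as a unit vector in practice
    d = np.asarray(c, float) - np.asarray(m, float)
    dP = float(np.dot(d, radial))                 # parallel ("saturation") component
    dQ = float(np.linalg.norm(d - dP * radial))   # perpendicular ("hue") component
    return dP, dQ

def color_distance(dL, dP, dQ, wL=2.0, wP=1.0, wQ=4.0):
    """Weighted squared distance of the general form used for clustering.
    wL = 2.0 follows the text; wP and wQ are placeholders to be tuned together
    with the histogram bin sizes (0.08 for L, 0.02 for s and t)."""
    return wL * dL**2 + wP * dP**2 + wQ * dQ**2

# Example: a color c compared against a histogram peak m in the (s, t) plane.
dP, dQ = pq_decompose(c=(0.10, 0.04), m=(0.08, 0.02), neutral=(0.0, 0.0))
print(dP, dQ, color_distance(dL=0.05, dP=dP, dQ=dQ))
```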
Remarks
An image segmentation algorithm based on color space partitioning has been described
above. Much thought has been put into the selection and design of the color space with a
proper distance metric, all based on considerations of the physical principles underlying the
color imaging processes. The algorithm was designed to be simple and fast, while tolerating
some potential segmentation errors that could only be dealt with by reasoning in both the
spatial and color domains at the same time [609]. For example, boundary pixels should be
classified with a higher probability of being in one of the neighboring segments, rather than
equally likely to be in all the segments, near or far. Another obvious improvement can be
achieved by using edge detection results [566] for delineating the segment boundaries.
Color changes from one point to another in an image. For many applications, it is desirable
to detect such color changes, so that image operations can be adjusted accordingly. For
example, in smoothing out image noise, we would also like to preserve the sharpness of
edges. The change in color (vector) is called the color gradient. It is a measure similar to
the luminance (scalar) gradient (in irradiance) for a monochromatic image. Let us look at
how we can define the color gradient as a generalization of the luminance gradient.3
3 This and the following sections are a shortened and modified version of a paper published in IEEE Transactions
on Signal Processing [567]. Readers interested in more details should consult the paper.
Edge detection is a very widely used operation in image processing and computer vision, but
what do we mean by edges? A general definition may be that they correspond to reflectance,
illumination, or object boundaries. However, even this general definition does not completely
describe our intuitive concept of edges. For example, most people will agree that a nose is
an object on our face (that is why it has a name). How do we define the boundary of the
nose on a face? Careful thinking along these lines tells us that as long as our mathematical
definition of edges does not cover our intuitive concept of edges, we cannot expect to develop
a “perfect” edge detector. In fact, we may begin to question whether we are asking the right
questions. Having pointed out the negative aspects of this topic, we should also look at some
of the positive things that we have learned from many years of research in edge detection.
At least, three definitions of edges for monochromatic images have been proposed: (a)
edges are where the rate of change in image irradiance is a local maximum along the
gradient direction (e.g., [161, 380, 635]); (b) edges are where the scene radiances (or their
derivatives) are discontinuous (e.g., [559]); and (c) edges are boundaries where the statistical
distribution of image irradiances changes from one side to another (e.g., [417, 419, 524]).
All these three definitions are quite reasonable, but they lead to different methods and thus
different results of edge detection. Here we will use definition (a) and show how we can
generalize it to color images.
The most straightforward approach to color edge detection is to treat each color component
independently. Edges are detected in each of the three color components and then combined
to give a final edge map according to some proposed rules [245, 435, 710, 798]. A
typical color gradient is computed as the square root of the sum of the squares of the three
scalar gradient magnitudes: √(‖∇r‖² + ‖∇g‖² + ‖∇b‖²). As for the direction of a color
edge, the rule chooses either the direction that corresponds to the maximum component or a
weighted average of the three gradient directions. Instead of separately computing the scalar
gradient for each color component, we can treat a color image as a vector field and compute
the vector gradient of the vector field to detect the local maximum gradient [259, 567, 720].
It will be shown shortly that the vector gradient approach is slightly less sensitive to noise
than the approach that uses the sum of the squares of the scalar gradients.
Let us define the following variables to simplify the expression of the final solution:
\[
p = \left(\frac{\partial r}{\partial x}\right)^{2} + \left(\frac{\partial g}{\partial x}\right)^{2} + \left(\frac{\partial b}{\partial x}\right)^{2}, \tag{22.8}
\]
\[
t = \frac{\partial r}{\partial x}\frac{\partial r}{\partial y} + \frac{\partial g}{\partial x}\frac{\partial g}{\partial y} + \frac{\partial b}{\partial x}\frac{\partial b}{\partial y}, \tag{22.9}
\]
\[
q = \left(\frac{\partial r}{\partial y}\right)^{2} + \left(\frac{\partial g}{\partial y}\right)^{2} + \left(\frac{\partial b}{\partial y}\right)^{2}. \tag{22.10}
\]
The squared magnitude of the vector gradient is the larger eigenvalue of the 2 × 2 matrix DᵀD, where D is the matrix of the partial derivatives:
\[
\lambda = \frac{1}{2}\left[\,p + q + \sqrt{(p+q)^{2} - 4(pq - t^{2})}\,\right]. \tag{22.12}
\]
The gradient direction requires a little careful examination. In the general case, the direction
is along the vector [t, λ − p]ᵀ, which is the eigenvector corresponding to the eigenvalue λ.
However, if t = 0 and λ = p, then [t, λ − p]ᵀ becomes a zero vector, and we have to use
[λ − q, t]ᵀ. The remaining case to be considered is when both vectors are zero vectors. In
this case, uᵀDᵀDu is locally a spherical surface, and all vector directions are equivalent.
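The following minimal sketch (added for illustration) computes the vector-gradient magnitude and direction implied by Eqs. (22.8)–(22.12), assuming a floating-point RGB image and simple central differences as the derivative operator; the actual derivative operator used in [567] may differ.

```python
import numpy as np

def vector_gradient(img):
    """img: H x W x 3 float array. Returns the vector-gradient magnitude sqrt(lambda)
    and its direction (radians) at every pixel, following Eqs. (22.8)-(22.12)."""
    dy, dx = np.gradient(img, axis=(0, 1))          # central differences along rows, columns
    p = np.sum(dx * dx, axis=2)                      # Eq. (22.8)
    t = np.sum(dx * dy, axis=2)                      # Eq. (22.9)
    q = np.sum(dy * dy, axis=2)                      # Eq. (22.10)
    root = np.sqrt(np.maximum((p + q) ** 2 - 4.0 * (p * q - t * t), 0.0))
    lam = 0.5 * (p + q + root)                       # Eq. (22.12), largest eigenvalue
    # Direction: eigenvector [t, lam - p]; fall back to [lam - q, t] when it vanishes.
    vx, vy = t, lam - p
    degenerate = (np.abs(vx) + np.abs(vy)) < 1e-12
    vx = np.where(degenerate, lam - q, vx)
    vy = np.where(degenerate, t, vy)
    return np.sqrt(lam), np.arctan2(vy, vx)

magnitude, direction = vector_gradient(np.random.rand(64, 64, 3))
```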
It is interesting to note that
\[
pq - t^{2} = \left(\frac{\partial r}{\partial x}\frac{\partial g}{\partial y} - \frac{\partial g}{\partial x}\frac{\partial r}{\partial y}\right)^{2} + \left(\frac{\partial g}{\partial x}\frac{\partial b}{\partial y} - \frac{\partial b}{\partial x}\frac{\partial g}{\partial y}\right)^{2} + \left(\frac{\partial b}{\partial x}\frac{\partial r}{\partial y} - \frac{\partial r}{\partial x}\frac{\partial b}{\partial y}\right)^{2}. \tag{22.13}
\]
Therefore, pq − t 2 ≥ 0, and p + q ≥ λ. Since
\begin{align}
p + q &= \left(\frac{\partial r}{\partial x}\right)^{2} + \left(\frac{\partial r}{\partial y}\right)^{2} + \left(\frac{\partial g}{\partial x}\right)^{2} + \left(\frac{\partial g}{\partial y}\right)^{2} + \left(\frac{\partial b}{\partial x}\right)^{2} + \left(\frac{\partial b}{\partial y}\right)^{2} \tag{22.14}\\
&= \|\nabla r\|^{2} + \|\nabla g\|^{2} + \|\nabla b\|^{2}, \tag{22.15}
\end{align}
it follows that the sum of the squares of the scalar gradients, p + q, is always greater than or equal to
λ, the squared vector gradient. This is true even for the case of a general vector field, V_{n→m},
as can be seen by the following simple proof. The sum of the squares of the scalar gradients is the
trace of the matrix DᵀD. Since DᵀD is symmetric and positive semi-definite, it has all real,
nonnegative eigenvalues. The trace of DᵀD is also equal to the sum of its eigenvalues. Since
the eigenvalues are all nonnegative, the trace is at least as large as the largest eigenvalue.
But the largest eigenvalue is the square of the magnitude of the vector gradient. Therefore
the vector gradient squared is never larger than the sum of the squares of scalar gradients,
and can be as small as 1/n of the latter.
In summary, when using the vector gradient in boundary detection, the vector gradient
is about the same as the scalar gradient for the signal, but its value becomes smaller than
the scalar gradient for the noise. The effect is a net increase in the signal-to-noise ratio for
edge detection. This gain comes from the fact that signals of different color components
are more correlated than noise. We can therefore conclude that, for a typical color image,
the vector gradient is less sensitive to noise than the scalar gradient.
The choice of the two thresholds, Tl and Th , is important for the boundary detector. Too
low a threshold will produce too many false edges, while too high a threshold will throw
away too many true edges. In order to quantify the trade-off, we have to characterize the
noise behavior of the boundary detector. In the following, we will assume that the noise at
each point of the image is stationary, white (independent), Gaussian noise N (x, y), with
mean = 0 and variance = σn2 . We also assume that all the attributes have the same amount
of noise. If this is not true, each attribute has to be weighted with the inverse of its noise
standard deviation before it is combined with other attributes.
After the image smoothing in step 1, the smoothed noise P(x, y) is no longer white. Its
autocorrelation function R P (m, n) can be approximated as follows [567]:
\[
R_P(m, n) = E[P(m, n)P(0, 0)] \approx \frac{\sigma_n^{2}}{4\pi\sigma_b^{2}}\,\exp\!\left(-\frac{m^{2}+n^{2}}{4\sigma_b^{2}}\right). \tag{22.17}
\]
This approximation is very good for σb ≥ 1.0 pixel, but quickly becomes unacceptable
when σb is less than 0.7. In practice, for σb less than 1.0 pixel, the discrete Gaussian mask
becomes a very undersampled representation of a Gaussian filter, and should not be used
without careful analysis. The partial derivatives Px = ∂ P/∂ x and Py = ∂ P/∂ y of P(x, y),
as computed by step 2, can be shown to be independent of each other, and their variances
are given by:
Substituting Eq. (22.17) into Eq. (22.18), we arrive at the following relation:
\[
\sigma_d^{2} \approx \frac{\sigma_n^{2}}{4\pi\sigma_b^{2}}\,\bigl(6 + 8c - 2c^{4} - 8c^{5} - 4c^{8}\bigr). \tag{22.19}
\]
Smoothing with a Gaussian filter of size σb thus reduces the noise standard deviation of the
partial derivative by a factor of approximately 1/σb2 . If we increase the size of the smoothing
filter by a factor of 2, the noise is reduced by a factor of 4. Equations (22.19) and (22.20)
are the quantitative relations we need to determine how much smoothing we need for step
boundary detection. Since the partial derivatives, such as Px and Py , are linear combinations
of Gaussian random variables, they are themselves normally distributed. We now have all
the information needed to derive the distribution of the scalar gradient.
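As a quick numerical sanity check (added here, not part of the original text), the approximation in Eq. (22.17) can be compared against the sample autocorrelation of Gaussian-smoothed white noise. The sketch below assumes scipy's gaussian_filter as the step-1 smoother and periodic boundaries for the autocorrelation estimate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
sigma_n, sigma_b, N = 1.0, 2.0, 1024

noise = rng.normal(0.0, sigma_n, size=(N, N))
P = gaussian_filter(noise, sigma=sigma_b, mode="wrap")   # smoothed noise field P(x, y)

def autocorr(field, m, n):
    """Sample estimate of E[P(x+m, y+n) P(x, y)] under periodic boundaries."""
    return float(np.mean(field * np.roll(np.roll(field, m, axis=0), n, axis=1)))

for (m, n) in [(0, 0), (1, 0), (1, 1), (2, 0)]:
    model = sigma_n**2 / (4.0 * np.pi * sigma_b**2) * np.exp(-(m * m + n * n) / (4.0 * sigma_b**2))
    print((m, n), autocorr(P, m, n), model)               # sample vs. Eq. (22.17)
```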
Let the amplitude r_s of the scalar gradient of a vector field, V_{n→m}, which maps
(x_1, . . . , x_n) to (u_1, . . . , u_m), be defined as
\[
r_s = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\frac{\partial u_j}{\partial x_i}\right)^{2}}. \tag{22.21}
\]
Since each partial derivative is an independent, zero-mean Gaussian random variable, r_s/σ_d
follows a chi distribution with k degrees of freedom, where k = mn and σ_d² is the variance
of the partial derivative, which, for n = 2, can be computed by Eq. (22.18). The peak of the
distribution occurs at r_s = √(k − 1) σ_d.
Let rv be the amplitude of the vector gradient of a vector field, V2→m , which maps (x1 , x2 )
to (u 1 , . . . , u m ). The statistical distribution of rv turns out to be too complicated for us to
find a closed form expression. We therefore search for an empirical equation to describe
it. First, we show that in Eq. (22.12), the value of ( pq − t 2 ) is statistically a fraction of
( p + q)2 :
\[
\begin{aligned}
E[pq - t^{2}] &= \frac{1}{2}\sum_{i=1}^{m}\sum_{\substack{j=1\\ j\neq i}}^{m} E\!\left[\left(\frac{\partial u_i}{\partial x_1}\frac{\partial u_j}{\partial x_2} - \frac{\partial u_j}{\partial x_1}\frac{\partial u_i}{\partial x_2}\right)^{2}\right] \\
&= \frac{1}{2}\sum_{i=1}^{m}\sum_{\substack{j=1\\ j\neq i}}^{m} \left\{ E\!\left[\left(\frac{\partial u_i}{\partial x_1}\frac{\partial u_j}{\partial x_2}\right)^{2}\right] - 2E\!\left[\frac{\partial u_i}{\partial x_1}\frac{\partial u_j}{\partial x_2}\frac{\partial u_j}{\partial x_1}\frac{\partial u_i}{\partial x_2}\right] + E\!\left[\left(\frac{\partial u_j}{\partial x_1}\frac{\partial u_i}{\partial x_2}\right)^{2}\right] \right\} \\
&= \frac{1}{2}\sum_{i=1}^{m}\sum_{\substack{j=1\\ j\neq i}}^{m} \bigl(\sigma_d^{4} - 0 + \sigma_d^{4}\bigr) \\
&= m(m-1)\,\sigma_d^{4}.
\end{aligned}
\tag{22.23}
\]
Since p + q = Σ_{i=1}^{m}[(∂u_i/∂x_1)² + (∂u_i/∂x_2)²] has a χ² distribution with 2m degrees of
freedom (scaled by σ_d²), its mean is 2mσ_d² and its variance is 4mσ_d⁴. We therefore have
\[
\frac{E[pq - t^{2}]}{E[(p+q)^{2}]} = \frac{m(m-1)\,\sigma_d^{4}}{4m\sigma_d^{4} + (2m\sigma_d^{2})^{2}} = \frac{m-1}{4(m+1)}, \tag{22.24}
\]
so that, statistically, 4(pq − t²) is approximately the fraction (m − 1)/(m + 1) of (p + q)², and
we can approximate r_v in the following way:
\[
\begin{aligned}
r_v = \sqrt{\lambda} &= \left\{\frac{1}{2}\left[\,p + q + \sqrt{(p+q)^{2} - 4(pq - t^{2})}\,\right]\right\}^{1/2} \\
&\approx \frac{\sqrt{p+q}}{\sqrt{2}}\left[1 + \sqrt{1 - \frac{m-1}{m+1}}\,\right]^{1/2} \\
&= \frac{r_s}{\sqrt{2}}\left[1 + \sqrt{\frac{2}{m+1}}\,\right]^{1/2}.
\end{aligned}
\tag{22.25}
\]
Assuming that the above approximation is true for all integer values of m, the surprising
conclusion is that, even by measuring more and more attributes, the spread of the noise
vector gradient cannot be reduced beyond 1/√2 of that of the noise scalar gradient, and
the return diminishes fairly quickly. For m = 2, r_v ≈ 0.953 r_s, and for m = 3, r_v ≈ 0.924 r_s.
Therefore, we might expect that the amplitude of the vector gradient, rv , would have a
distribution very similar in shape to that of the scalar gradient, rs , with the scale reduced
by a fraction. Experimental results confirm that this is indeed a very good approximation.
Thus, we have a numerically predictable advantage of reduced noise sensitivity when we
use the vector gradient instead of the scalar gradient. However, the advantage is not large.
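A small Monte Carlo check (added) of the noise-only ratio between vector and scalar gradient amplitudes, assuming independent, equal-variance Gaussian partial derivatives. The sample ratio is only expected to track the approximation of Eq. (22.25), not match it exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_ratio(m, n_samples=200000, sigma_d=1.0):
    """Mean r_v / mean r_s for pure noise: the partial derivatives are i.i.d. N(0, sigma_d^2)."""
    dx = rng.normal(0.0, sigma_d, size=(n_samples, m))
    dy = rng.normal(0.0, sigma_d, size=(n_samples, m))
    p = np.sum(dx * dx, axis=1)
    q = np.sum(dy * dy, axis=1)
    t = np.sum(dx * dy, axis=1)
    lam = 0.5 * (p + q + np.sqrt(np.maximum((p + q) ** 2 - 4.0 * (p * q - t * t), 0.0)))
    return np.mean(np.sqrt(lam)) / np.mean(np.sqrt(p + q))

for m in (2, 3, 6):
    approx = np.sqrt(0.5 * (1.0 + np.sqrt(2.0 / (m + 1))))   # Eq. (22.25)
    print(m, noise_ratio(m), approx)
```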
Gradient amplitude
A two-dimensional step edge with step size A, after smoothing by a Gaussian filter of size
σb , becomes a blurred edge with the following profile:
\[
f(x) = \frac{A}{\sqrt{2\pi}\,\sigma_b}\int_{-\infty}^{x}\exp\!\left(\frac{-t^{2}}{2\sigma_b^{2}}\right)\mathrm{d}t. \tag{22.26}
\]
Its gradient amplitude is
\[
\|\nabla f(x)\| = \frac{A}{\sqrt{2\pi}\,\sigma_b}\exp\!\left(\frac{-x^{2}}{2\sigma_b^{2}}\right). \tag{22.27}
\]
The maximum gradient of a step edge is thus reduced by a factor proportional to σb−1 after
the smoothing. For the noise, the reduction factor is proportional to σb−2 (as shown in Eq.
(22.20)). Therefore, in principle, it is possible to apply a proper smoothing filter to increase
the signal-to-noise ratio to any desirable amount for detecting an ideal step boundary (even
if the step signal is arbitrarily small compared with the noise!). In practice, one never has an
ideal step edge with an infinite extent of flat regions on both sides of the edge. Furthermore,
extensive Gaussian smoothing distorts the underlying edge structures in image irradiances,
especially around the corners. The smallest smoothing kernel that meets the required
detection-error performance is therefore always preferred over a larger one. For ideal step
edges, we know the statistical distributions of both the pure noise and the signal-plus-noise,
and therefore it is possible to calculate the statistical errors in this simple case of edge
detection [567].
Gradient direction
Another interesting effect of the smoothing is on the distribution of the directional angle of
the gradient. We will first derive the angular distribution of the gradient directions along
a step boundary for the special case (m = 1) in which the vector field has only one color
(e.g., a monochromatic image). It is shown that the angular spread is reduced as the amount
of smoothing is increased. We then look at the case in which there are three colors in the
vector field (m = 3). The distributions show the same trend of reduced angular spread for
increased smoothing. Furthermore, for the same amount of smoothing, increasing m, the
number of colors, also reduces the angular spread.
For a vector field V2→1 , the vector gradient is the same as the scalar gradient. Assume
that a step boundary with step size A is oriented vertically and the additive Gaussian white
noise has a standard deviation of σn . Let the smoothed image be E and its partial deriva-
tives E x and E y . As we mentioned before, E x and E y are independent Gaussian random
fields with the same standard deviation, σd . The mean of E y is zero, while that of E x is
given by
\[
\mu = \frac{3A}{\sqrt{2\pi}\,\sigma_b}\left[1 + \exp\!\left(\frac{-1}{2\sigma_b^{2}}\right)\right].
\]
The angular distribution of the gradient direction along the step boundary can be determined
by performing a change of variables to polar coordinates and integrating out the variable of
radial distance. After lengthy but straightforward manipulations, the resulting distribution
is found to be
\[
p_{2\pi}(\theta) = \frac{e^{-s^{2}}}{\sqrt{2\pi}}\left[\frac{1}{\sqrt{2\pi}} + \frac{1}{\sqrt{2}}\,(s\cos\theta)\,[1 + \operatorname{erf}(s\cos\theta)]\,e^{s^{2}\cos^{2}\theta}\right], \tag{22.28}
\]
where s = µ/(√2 σ_d), 0 ≤ θ < 2π, and the error function, erf, is defined as
\[
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^{2}}\,\mathrm{d}t. \tag{22.29}
\]
Since we treat the directional angle θ as equivalent to θ + π , we have to modify the range
of the distribution to that of −π/2 to π/2 (or 0 to π):
\[
p_{\pi}(\theta) = \frac{e^{-s^{2}}}{\sqrt{\pi}}\left[\frac{1}{\sqrt{\pi}} + (s\cos\theta)\operatorname{erf}(s\cos\theta)\,e^{s^{2}\cos^{2}\theta}\right]. \tag{22.30}
\]
For example, let us look at the case of a vector field with three colors (m = 3). Since
the distribution of the directional angle is too complicated for us to derive a closed-form
expression, we can use the same functional form as in Eq. (22.30) to approximate the angular
distribution of the vector gradient of an m-color vector field by defining s to be
\[
s = \sqrt{\sum_{i=1}^{m} s_i^{2}} = \sqrt{\sum_{i=1}^{m}\frac{\mu_i^{2}}{2\sigma_{d_i}^{2}}}. \tag{22.31}
\]
Numerical experiments showed that this approximation seems to be quite reasonable [567].
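A short numerical sketch (added, not from the original text) that evaluates Eq. (22.30) as reconstructed above and verifies that it integrates to one over a range of length π, using math.erf:

```python
import numpy as np
from math import erf, exp, sqrt, pi

def p_pi(theta, s):
    """Angular density of Eq. (22.30) for the folded gradient direction."""
    sc = s * np.cos(theta)
    e = np.array([erf(v) for v in np.atleast_1d(sc)])     # elementwise error function
    return (exp(-s * s) / sqrt(pi)) * (1.0 / sqrt(pi) + sc * e * np.exp(sc * sc))

theta = np.linspace(-pi / 2, pi / 2, 20001)
dtheta = theta[1] - theta[0]
for s in (0.0, 1.0, 3.0):
    print(s, np.sum(p_pi(theta, s)) * dtheta)   # should be close to 1 for each s
```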
A line segment in space can be directed or undirected. For example, a vector has a definite
direction and it is directed. On the other hand, a line segment in a triangle is undirected. In
dealing with color images or vector fields, we often encounter some statistics of vectorial
data (directed line) and axial data (undirected line). Both vectorial data and axial data deal
with orientations of lines, and we will call them directional data. For example, color hue
angles are vectorial data, while edge directions are axial data. Certain denoising algorithms
assume that the hue angle is a smooth function within an object surface. Since noise will
perturb the hue angle, the algorithms have to model the hue angle distribution as a function of
noise. We have to deal with the statistics of directional data. The most important difference
in dealing with directional data is that the distribution has to be periodic. Therefore, the very
useful Gaussian distribution is no longer valid for directional data. Many of the familiar
concepts that we have learned about nondirectional data have to be modified to deal with
directional data. For example, our familiar way of expressing a histogram as a linear array
has to be changed into a pie chart, a fan diagram, or a rose diagram in which the radial
length is proportional to the number of data points in the given direction. Taking an average
of directional data becomes a modulo π or 2π operation. Fortunately, there are very good
books on this unfamiliar topic [318, 633, 1016].
Let X_1, X_2, . . . , X_n be a set of n unit vectors and let R = X_1 + X_2 + · · · + X_n.
The vector R is called the resultant vector of the set. The mean direction of the set of unit
vectors is simply the direction of the resultant vector. Let r be the length of the resultant
vector. If most of the vectors are clustered around the mean direction, r will have a value
close to n. If most of the vectors are spread out, r will be a small number. Therefore, r is a
measure of the concentration (or dispersion) of the distribution of the directional data. The
spherical variance is thus defined as s = (n − r )/n.
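A minimal sketch (added) of the resultant vector, mean direction, and spherical variance, assuming the data are already unit vectors in three dimensions:

```python
import numpy as np

def spherical_stats(unit_vectors):
    """unit_vectors: n x 3 array of unit vectors. Returns the mean direction,
    the resultant length r, and the spherical variance s = (n - r) / n."""
    X = np.asarray(unit_vectors, float)
    R = X.sum(axis=0)                    # resultant vector
    r = np.linalg.norm(R)
    n = X.shape[0]
    return R / (r + 1e-12), r, (n - r) / n

# Tightly clustered directions give a resultant length close to n and a small variance.
rng = np.random.default_rng(2)
v = rng.normal([0.0, 0.0, 1.0], 0.05, size=(500, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
print(spherical_stats(v)[1:])
```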
In a spherical coordinate system (see Fig. 8.2), a direction is specified with two angles:
(θ, φ), with x = sin θ cos φ, y = sin θ sin φ, z = cos θ. However, the quantity dθdφ used in
probability integration does not represent an elemental area. An elemental area dA on the
unit sphere is sin θdθdφ. This means that the factor sin θ often appears in the probability
density function (pdf) of a distribution of directional data. Let us derive the pdf of the uniform
distribution on the unit sphere as an example. Let g(θ, φ)dA be an elemental probability.
By the definition of a uniform distribution on the unit sphere, g(θ, φ) = 1/(4π ). Therefore,
the pdf, f (θ, φ), of a uniform distribution on the unit sphere is
\[
f(\theta, \phi)\,\mathrm{d}\theta\,\mathrm{d}\phi = g(\theta, \phi)\,\mathrm{d}A = \frac{\sin\theta}{4\pi}\,\mathrm{d}\theta\,\mathrm{d}\phi, \tag{22.33}
\]
\[
f(\theta, \phi) = \frac{\sin\theta}{4\pi}, \qquad 0 \le \theta \le \pi,\; 0 \le \phi < 2\pi. \tag{22.34}
\]
For spherical data (i.e., in a three-dimensional space), the von Mises distribution is
generalized to a distribution called the Fisher distribution. Let U0 (with angle µ0 ) be the
mean direction vector of a von Mises distribution, and κ be the directional concentration. For
a unit vector U in direction θ, the probability density function of the von Mises distribution
is
\[
g(\theta) = \frac{1}{2\pi I_0(\kappa)}\, e^{\kappa\cos(\theta-\mu_0)} = \frac{1}{2\pi I_0(\kappa)}\, e^{\kappa\, U\cdot U_0}. \tag{22.37}
\]
Therefore, the von Mises distribution can be generalized to the spherical data simply
by using, in the exponent, the inner product of a three-dimensional unit vector and
the mean unit vector of the distribution. Let U = [sin θ cos φ, sin θ sin φ, cos θ]T and
U0 = [sin θ0 cos φ0 , sin θ0 sin φ0 , cos θ0 ]T , we have
Let (θ0 , φ0 ) be the mean direction, the resulting function is called the Fisher distribution:
\[
f(\theta, \phi) = \frac{\kappa}{4\pi\sinh\kappa}\, e^{\kappa[\sin\theta\sin\theta_0\cos(\phi-\phi_0) + \cos\theta\cos\theta_0]}\,\sin\theta, \tag{22.39}
\]
where 0 ≤ θ ≤ π, 0 ≤ φ < 2π, κ > 0, 0 ≤ θ0 ≤ π , and 0 ≤ φ0 < 2π . Let the north pole
be the mean directional vector, i.e., θ0 = 0, then
\[
f(\theta, \phi) = \frac{\kappa}{4\pi\sinh\kappa}\, e^{\kappa\cos\theta}\,\sin\theta. \tag{22.40}
\]
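For illustration (added), the Fisher distribution of Eq. (22.40), with the mean direction at the north pole, can be sampled with the standard inverse-CDF formula for cos θ. The comparison against the Langevin function coth κ − 1/κ relies on the standard result for the mean of cos θ, which is an assumption here rather than a statement from the text.

```python
import numpy as np

def sample_fisher(kappa, n, rng):
    """Draw n unit vectors from the Fisher distribution, Eq. (22.40),
    with the mean direction at the north pole (theta0 = 0)."""
    u = rng.uniform(size=n)
    # Inverse CDF of cos(theta) for the Fisher density (standard result).
    cos_t = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    sin_t = np.sqrt(np.maximum(1.0 - cos_t**2, 0.0))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t], axis=1)

rng = np.random.default_rng(3)
kappa = 5.0
v = sample_fisher(kappa, 100000, rng)
print(np.mean(v[:, 2]), 1.0 / np.tanh(kappa) - 1.0 / kappa)   # sample mean vs. coth(k) - 1/k
```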
In fact, the von Mises distribution and the Fisher distribution can be generalized to a
hypersphere of dimension m, and the resulting distribution is called the Langevin distribution
[318]. The general form is simpler if we again assume that the mean directional vector is the
north pole of the hypersphere. The pdf of the hyperspherical directional distribution [633,
p. 247, 1015, 1016] of polar angles θ1 , θ2 , . . . , θm−1 can be expressed as
\[
f(\theta_1, \theta_2, \ldots, \theta_{m-1}) = \frac{\kappa^{\frac{m}{2}-1}}{(2\pi)^{m/2}\, I_{\frac{m}{2}-1}(\kappa)}\, e^{\kappa\cos\theta_1}\,\sin^{m-2}\theta_1\,\sin^{m-3}\theta_2\cdots\sin\theta_{m-2}.
\]
22.7 Denoising
Image noise comes from several sources, such as the photon statistics, sensor noise, amplifier
noise, and quantization noise. Some of them are signal-independent, and some are not.
Basic methods for dealing with signal-dependent noise are of two types. The first approach
scales the signal plus noise so that the noise variance becomes constant for all levels. The
second approach uses a lookup table to adjust the noise threshold at different signal levels
[566].
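To make the first approach concrete, here is a hedged sketch using the Anscombe transform, which approximately stabilizes Poisson-like (photon-statistics) noise to unit variance. It is offered as one common choice of variance-stabilizing scaling, not as the specific method of [566].

```python
import numpy as np

def anscombe(x):
    """Variance-stabilizing transform for Poisson-like noise: after the transform
    the noise standard deviation is approximately 1 at all signal levels."""
    return 2.0 * np.sqrt(np.asarray(x, float) + 3.0 / 8.0)

def inverse_anscombe(y):
    """Simple algebraic inverse (a bias-corrected inverse is often preferred)."""
    return (np.asarray(y, float) / 2.0) ** 2 - 3.0 / 8.0

# The raw standard deviation grows with the mean, but it is nearly constant
# after the transform.
rng = np.random.default_rng(4)
for mean in (5.0, 50.0, 500.0):
    x = rng.poisson(mean, size=100000)
    print(mean, x.std(), anscombe(x).std())
```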
Noise makes image details difficult to see and it also interferes with computer image
analysis. Many algorithms have been proposed to reduce the undesirable presence of noise
through digital image processing. In order to reduce noise, we need to ask ourselves: how can
we tell the signal from the noise? Most of the existing algorithms rely on three assumptions
about noise: (1) image noise is less correlated than the image signal from pixel to pixel
(spatial correlation), and between color channels (color correlation); (2) image noise is
more objectionable in high spatial frequencies; and (3) image noise has a lower contrast
amplitude than that of the image signal. The validity of these assumptions depends on the
noise source, image rendering, and image viewing. But, the important thing to remember is
that all these three criteria require more quantitative specification before they can be useful
in practical applications.
One important observation is that our visual system can tell the image noise from the
image signal when looking at the image as a whole, correctly interpreting objects and
textures. However, if we only cut out a small part of an image, say a 5 pixel by 5 pixel
window, we are not much better than a computer at telling the noise and the signal apart.
However, that is typically the neighborhood size used by many denoising algorithms. Our
visual perception seems to rely on a much larger context and global inference to distinguish
noise from signal. Thus we should not expect any denoising algorithms to perform well
for images that have low signal-to-noise ratios. However, when the image signal is much
stronger than the noise, some algorithms can do a fairly good job in cleaning up the images.
Although denoising problems can often be formulated as finding a minimum mean-squared-
error estimate of the noise-free image, such as Wiener filtering [27, 457, 776], the
resulting image often looks soft and blurry. In general, methods (e.g., [535, 577, 688]) that
pay careful attention to preserving edge sharpness by adaptive processing tend to produce
much better looking images.
Edge-preserving smoothing algorithms are based on the idea of choosing the proper
neighboring pixels to do the averaging, so that the smoothing operation does not run across
edges. For example, an elongated bar mask can be rotated with the current pixel as the center,
and the mask that has the least variance is selected for averaging with the current pixel [688].
An alternative implementation is to select, within a neighborhood window, only a given fraction
of the pixels whose gray levels are nearest to that of the current pixel for averaging. We can
imagine that many other variations can
be implemented along these lines. This class of algorithm often performs quite well, except
in texture regions. Therefore, the variance of the neighborhood is often used to adjust how
much smoothing should be used [577]. A new type of filter, called the bilateral filter, has
greatly improved the performance of the edge-preserving smoothing approach [953]. A
conventional filter weights spatially nearby pixels more than faraway pixels.
The bilateral filter further varies a neighboring pixel’s weighting according to how close its
brightness value is to that of the center pixel. If the value of a neighboring pixel is closer to
the center pixel value, it is weighted more than a value that is farther away. Therefore, the
filter weighting is in both the spatial domain and the image exposure domain. The bilateral
filter is a very effective noise smoother without blurring high-contrast edges. However, it
still blurs fine textures.
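A minimal brute-force sketch (added) of a bilateral filter for a single-channel image, with Gaussian weights in both the spatial and the intensity domains; the parameter names and values are placeholders.

```python
import numpy as np

def bilateral(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: each neighbor is weighted both by its spatial
    distance (sigma_s) and by how close its value is to the center pixel (sigma_r)."""
    img = np.asarray(img, float)
    pad = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + img.shape[0],
                          radius + dx: radius + dx + img.shape[1]]
            w = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2)
                       - (shifted - img) ** 2 / (2.0 * sigma_r ** 2))
            out += w * shifted
            norm += w
    return out / norm

noisy = np.clip(np.random.default_rng(5).normal(0.5, 0.05, (64, 64)), 0.0, 1.0)
smoothed = bilateral(noisy)
```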
Another class of noise smoothing algorithm is based on signal transforms, such as
singular value decomposition [568], wavelet [260, 530], curvelet [897, 898], or polynomials
[299]. The basic idea is that these transforms tend to represent correlated image signals
better than the uncorrelated noise. Therefore, a simple discrimination between signal and
noise can be made from the magnitude of the transform coefficient. The main thrust of
the curvelet methods [897, 898] is that image structures, such as edges and curves, can be
modeled in many directions and many scales, allowing a better discrimination of signal from
noise.
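A toy sketch (added) of the transform-domain idea, using a single-level 2-D Haar transform and soft thresholding; practical wavelet or curvelet denoisers use multiple scales and more careful threshold selection.

```python
import numpy as np

def haar2(x):
    """One level of a 2-D Haar transform (image dimensions assumed even)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)          # row-pair averages
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)          # row-pair differences
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0), (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2.0), (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2.0)
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    a = np.empty((ll.shape[0], 2 * ll.shape[1]))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + lh) / np.sqrt(2.0), (ll - lh) / np.sqrt(2.0)
    d[:, 0::2], d[:, 1::2] = (hl + hh) / np.sqrt(2.0), (hl - hh) / np.sqrt(2.0)
    x = np.empty((2 * a.shape[0], a.shape[1]))
    x[0::2], x[1::2] = (a + d) / np.sqrt(2.0), (a - d) / np.sqrt(2.0)
    return x

def soft(c, thresh):
    """Shrink small (noise-dominated) coefficients toward zero; large ones survive."""
    return np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)

def denoise(img, thresh):
    ll, lh, hl, hh = haar2(np.asarray(img, float))
    return ihaar2(ll, soft(lh, thresh), soft(hl, thresh), soft(hh, thresh))

clean = denoise(np.random.rand(64, 64), thresh=0.1)
```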
The denoising algorithms discussed so far apply equally well to monochromatic images
and color images. However, if we treat color images as vector fields, there are several possible
ways to generalize some classical denoising algorithms to color images. For example, the
vector median filter [40] is generalized from the scalar median filter for removing the
impulsive noise. The median vector Xmedian of a set of vectors, Q = {Xi : i = 1, 2, . . . , n},
is the vector that minimizes the sum of all distances between the vector and all other vectors
in the set:
\[
X_{\text{median}} = \arg\min_{X \in Q}\;\sum_{i=1}^{n} \|X_i - X\|. \tag{22.44}
\]
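A brute-force sketch (added) of Eq. (22.44) for a small window of color vectors, assuming the Euclidean norm; practical implementations use faster search strategies [59].

```python
import numpy as np

def vector_median(Q):
    """Q: n x 3 array of color vectors. Returns the member of Q that minimizes the
    sum of distances to all other members, Eq. (22.44), using the Euclidean norm."""
    Q = np.asarray(Q, float)
    d = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=2)   # pairwise distances
    return Q[np.argmin(d.sum(axis=1))]

window = np.array([[10, 10, 10], [11, 10, 9], [12, 11, 10],
                   [200, 0, 0],                  # impulse outlier
                   [10, 12, 11]])
print(vector_median(window))                     # the outlier cannot be selected
```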
When the vector is a scalar, the definition becomes the same as the scalar median. In higher
dimensions, the definition of the median vector depends on the definition of the vector
norm (or the distance between two vectors). The choice of an effective norm and a fast
way to search for the median are required for practical applications [59]. Since a vector has
a length and a direction, we can also perform color noise filtering with vector directional
filters [958, 959]. The idea is that on a smooth surface, the direction of a color vector should
also vary smoothly. We can use the spherical variance as defined in Section 22.6.1 to form
a vectorized version of the sigma filter [578]. A different definition of the angular spread
is as follows [959]. Let Q = {Xi : i = 1, 2, . . . , n} be a set of color vectors and A(Xi , X j )
be the angle between the vectors Xi and X j . Define an angular spread si associated with a
vector Xi as
\[
s_i = \sum_{j=1}^{n} A(X_i, X_j), \qquad \text{where } A(X_i, X_j) = \cos^{-1}\!\left(\frac{X_i \cdot X_j}{\|X_i\|\,\|X_j\|}\right). \tag{22.45}
\]
We can then choose the vector that has the minimum angular spread as the filtered output.
This is similar to the idea of the spherical median as defined in [318, p. 111], except that the
spherical median is not constrained to be one of the vectors in the set. The spherical median
has several robustness properties similar to those of the linear median [317]. We can also
order the vectors in Q according to their angular spreads and average the k vectors with
the smallest spreads as the output.
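A minimal sketch (added) of a directional filter based on the angular spread of Eq. (22.45); averaging the k lowest-spread vectors follows the ordering idea just mentioned, with k as a placeholder parameter.

```python
import numpy as np

def directional_filter(Q, k=1):
    """Q: n x 3 array of color vectors from a neighborhood window. Computes the angular
    spread of Eq. (22.45) for every vector and averages the k vectors with the smallest
    spreads (k = 1 returns the basic vector directional filter output)."""
    Q = np.asarray(Q, float)
    U = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-12)
    cosine = np.clip(U @ U.T, -1.0, 1.0)
    spread = np.arccos(cosine).sum(axis=1)        # s_i of Eq. (22.45)
    best = np.argsort(spread)[:k]
    return Q[best].mean(axis=0)

window = np.array([[0.50, 0.40, 0.30], [0.52, 0.41, 0.29], [0.49, 0.38, 0.31],
                   [0.10, 0.90, 0.10],            # hue outlier
                   [0.51, 0.40, 0.30]])
print(directional_filter(window, k=2))
```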
It should be pointed out that color vectors can and should be expressed in different color
spaces, depending on the application. For denoising purposes, we should consider perceptual
color spaces, such as CIELAB and CIELUV. We can also transform the color image into
luminance–chrominance vectors and work on the hue angle as a function of image location,
i.e., a hue field. A noise-corrupted pixel is more likely to have a hue angle significantly
different from its neighbors. Therefore color denoising can be done by suppressing random
variations in a smooth hue region while preserving the sharp boundaries between the smooth
regions [96].
Appendix Extended tables
CIE 1931 color matching functions x̄(λ), ȳ(λ), z̄(λ) (2° observer) and the corresponding chromaticity coordinates (x, y), tabulated at 5 nm intervals (λ in nm):
360 0.000 129 900 0 0.000 003 917 000 0.000 606 100 0 0.175 56 0.005 29
365 0.000 232 100 0 0.000 006 965 000 0.001 086 000 0.175 16 0.005 26
370 0.000 414 900 0 0.000 012 390 00 0.001 946 000 0.174 82 0.005 22
375 0.000 741 600 0 0.000 022 020 00 0.003 486 000 0.174 51 0.005 18
380 0.001 368 000 0.000 039 000 00 0.006 450 001 0.174 11 0.004 96
385 0.002 236 000 0.000 064 000 00 0.010 549 99 0.174 01 0.004 98
390 0.004 243 000 0.000 120 000 0 0.020 050 01 0.173 80 0.004 92
395 0.007 650 000 0.000 217 000 0 0.036 210 00 0.173 56 0.004 92
400 0.014 310 00 0.000 396 000 0 0.067 850 01 0.173 34 0.004 80
405 0.023 190 00 0.000 640 000 0 0.110 200 0 0.173 02 0.004 78
410 0.043 510 00 0.001 210 000 0.207 400 0 0.172 58 0.004 80
415 0.077 630 00 0.002 180 000 0.371 300 0 0.172 09 0.004 83
420 0.134 380 0 0.004 000 000 0.645 600 0 0.171 41 0.005 10
425 0.214 770 0 0.007 300 000 1.039 050 1 0.170 30 0.005 79
430 0.283 900 0 0.011 600 00 1.385 600 0 0.168 88 0.006 90
435 0.328 500 0 0.016 840 00 1.622 960 0 0.166 90 0.008 55
440 0.348 280 0 0.023 000 00 1.747 060 0 0.164 41 0.010 86
445 0.348 060 0 0.029 800 00 1.782 600 0 0.161 11 0.013 79
450 0.336 200 0 0.038 000 00 1.772 110 0 0.156 64 0.017 71
455 0.318 700 0 0.048 000 00 1.744 100 0 0.150 99 0.022 74
460 0.290 800 0 0.060 000 00 1.669 200 0 0.143 96 0.029 70
465 0.251 100 0 0.073 900 00 1.528 100 0 0.135 50 0.039 88
470 0.195 360 0 0.090 980 00 1.287 640 0 0.124 12 0.057 80
475 0.142 100 0 0.112 600 0 1.041 900 0 0.109 60 0.086 84
480 0.095 640 00 0.139 020 0 0.812 950 1 0.091 29 0.132 70
485 0.057 950 01 0.169 300 0 0.616 200 0 0.068 71 0.200 72
490 0.032 010 00 0.208 020 0 0.465 180 0 0.045 39 0.294 98
495 0.014 700 00 0.258 600 0 0.353 300 0 0.023 46 0.412 70
500 0.004 900 000 0.323 000 0 0.272 000 0 0.008 17 0.538 42
505 0.002 400 000 0.407 300 0 0.212 300 0 0.003 86 0.654 82
510 0.009 300 000 0.503 000 0 0.158 200 0 0.013 87 0.750 19
515 0.029 100 00 0.608 200 0 0.111 700 0 0.038 85 0.812 02
520 0.063 270 00 0.710 000 0 0.078 249 99 0.074 30 0.833 80
525 0.109 600 0 0.793 200 0 0.057 250 01 0.114 16 0.826 21
530 0.165 500 0 0.862 000 0 0.042 160 00 0.154 72 0.805 86
535 0.225 749 9 0.914 850 1 0.029 840 00 0.192 88 0.781 63
790 0.000 020 673 83 0.000 007 465 700 0.000 000 000 000 0.734 69 0.265 31
795 0.000 014 559 77 0.000 005 257 800 0.000 000 000 000 0.734 69 0.265 31
800 0.000 010 253 98 0.000 003 702 900 0.000 000 000 000 0.734 69 0.265 31
805 0.000 007 221 456 0.000 002 607 800 0.000 000 000 000 0.734 69 0.265 31
810 0.000 005 085 868 0.000 001 836 600 0.000 000 000 000 0.734 69 0.265 31
815 0.000 003 581 652 0.000 001 293 400 0.000 000 000 000 0.734 69 0.265 31
820 0.000 002 522 525 0.000 000 910 930 0 0.000 000 000 000 0.734 69 0.265 31
825 0.000 001 776 509 0.000 000 641 530 0 0.000 000 000 000 0.734 69 0.265 31
830 0.000 001 251 141 0.000 000 451 810 0 0.000 000 000 000 0.734 69 0.265 31
A.4 Judd's modified V_M(λ) (CIE 1988) and scotopic V′(λ) (CIE 1951)
λ (nm) A B C D65
The chromaticity coordinates in the table below are based on 5 nm intervals from 380 nm
to 780 nm. For D55 , D65 , and D75 , the (x, y) chromaticity coordinates are computed from
the CIE daylight chromaticity equations for Tc = 5503 K, 6504 K, and 7504 K respectively.
Pointer [766] computed CIELAB data for a large collection of spectral reflectance mea-
surements of real object surfaces. The color gamut boundary is expressed in terms of the
CIELAB chroma for each lightness (in increments of 5) and hue angle (in increments of
10◦ ). The table shows the CIELAB chroma as a function of hue angle and lightness:
Hue/L ∗ 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
0 10 30 43 56 68 77 79 77 72 65 57 50 40 30 19 8
10 15 30 45 56 64 70 73 73 71 64 57 48 39 30 18 7
20 14 34 49 61 69 74 76 76 74 68 61 51 40 30 19 9
30 35 48 59 68 75 82 84 83 80 75 67 56 45 33 21 10
40 27 40 53 66 79 90 94 93 88 82 72 60 47 35 22 10
50 10 21 34 45 60 75 90 100 102 99 88 75 59 45 30 15
60 4 15 26 37 48 59 70 82 93 103 106 98 85 66 45 23
70 5 15 25 36 46 56 67 76 85 94 102 108 103 82 58 34
80 6 15 24 32 40 48 55 64 72 82 94 105 115 115 83 48
90 4 12 20 28 36 44 53 60 68 75 83 90 98 106 111 90
100 9 16 23 30 37 45 51 58 65 72 80 86 94 100 106 108
110 9 18 27 35 44 52 59 66 74 82 87 92 95 100 96 84
120 4 14 23 32 41 49 57 64 71 78 84 90 94 95 83 50
130 5 18 30 40 48 56 64 70 77 82 85 88 89 84 64 35
140 7 20 32 42 52 60 69 76 82 87 89 90 83 71 54 30
150 7 21 34 45 57 68 75 81 84 84 83 80 72 58 44 20
160 8 24 36 48 58 68 76 82 85 83 78 69 59 49 34 15
170 13 25 36 47 57 65 70 75 76 75 71 65 57 45 30 15
180 10 25 38 48 57 64 69 71 72 69 64 59 51 41 29 16
190 7 19 30 40 48 55 59 62 62 60 55 49 41 32 23 13
200 5 19 29 37 42 45 46 46 45 43 39 35 30 22 14 7
210 0 12 17 26 34 43 49 51 54 50 46 40 32 24 14 4
220 2 12 20 28 35 40 45 48 51 49 45 38 32 23 15 6
230 10 20 29 36 42 46 49 51 52 50 45 39 32 24 15 7
240 8 16 26 34 41 47 49 50 50 47 42 36 29 21 12 4
250 9 21 32 40 49 54 55 55 52 48 43 36 29 21 13 4
260 12 24 34 41 46 51 55 56 51 46 40 33 27 20 13 6
270 14 31 42 50 55 60 60 57 50 45 39 33 26 20 13 6
280 10 29 45 55 60 61 60 57 53 46 40 34 25 18 11 4
290 20 40 60 69 71 69 65 58 50 43 36 29 24 18 12 5
300 30 55 72 81 79 72 64 57 50 42 35 30 24 17 12 5
310 62 76 85 88 85 80 71 62 55 47 41 34 27 20 14 6
320 60 71 79 84 85 86 82 74 66 57 48 40 31 24 16 8
330 20 50 72 86 89 89 86 80 72 63 54 45 36 27 18 9
340 26 49 63 73 82 87 87 83 78 71 62 51 40 28 16 4
350 15 37 52 65 73 79 82 84 79 73 63 53 40 30 17 6
Glossary
Most of the following terms and definitions are from International Lighting Vocabulary,
CIE Publication No. 17.4. Others are from various standard documents, such as ISO, ANSI,
CGATS, etc.
Abney’s law
An empirical law stating that, if two stimuli, A and B, are perceived to be of equal brightness
and two other color stimuli, C and D, are perceived to be of equal brightness, the additive
mixtures of A with C and B with D will also be perceived to be of equal brightness. The
validity of this law depends strongly on the observation conditions.
absorption
Transformation of radiant energy to a different form of energy by interaction with matter.
adapted white
Color stimulus that an observer who is adapted to the viewing environment would judge to
be perfectly achromatic and to have a luminance factor of unity. The adapted white usually
varies within a scene.
additive mixture of color stimuli
Mixture of color stimuli acting in such a manner that they enter the eye simultaneously or
in rapid succession and are incident on the same area of the retina, or are incident in the
form of mosaic which the eye cannot resolve.
aliasing
Output image artifacts that occur in a sampled imaging system for input images having
significant energy at frequencies greater than the Nyquist frequency of the system.
alychne
Plane in color space representing the locus of colors of zero luminance. This plane passes
through the black point (usually the origin of the system); it intersects any linear chromaticity
diagram in a straight line which is also called an alychne.
aperture color
Color perceived as nonlocated in depth such as that perceived as filling a hole in a
screen.
Bezold–Brücke phenomenon
Change in the hue of the perceived color with a change in luminance level within the range
of photopic vision.
brightness
Attribute of visual sensation according to which an area appears to emit more or less light.
candela (cd)
SI unit of luminous intensity, in a given direction, of a source that is emitting monochromatic
radiant energy of frequency 540 × 10¹² Hz (555.016 nm in air) and whose radiant intensity
in that direction is 1/683 W sr⁻¹.
candela per square meter (cd m−2 )
SI unit of luminance.
cathodoluminescence
Luminescence caused by the impact of electrons on certain types of luminescent materials,
such as the coating on a television screen.
chroma
Attribute of visual sensation which permits a judgement to be made of the amount of pure
chromatic color present, irrespective of the amount of achromatic color. Note: in a series
of perceived colors of constant saturation, the chroma increases with the luminosity.
chromatic adaptation
Adaptation by stimuli in which the dominant effect is that of different relative spectral
distributions.
CIE standard illuminants
Standard illuminant A, 2855.6 K
Standard illuminant B, 4874 K (direct sunlight)
Standard illuminant C, 6774 K
Standard illuminant D65 , 6504 K
CIE standard photometric observer
Ideal observer having a relative spectral responsivity curve that conforms to the V(λ)
function for photopic vision or to the V′(λ) function for scotopic vision, and that complies
with the summation law implied in the definition of the luminous flux.
coherent radiation
Monochromatic radiation whose electromagnetic oscillations maintain constant phase dif-
ferences from one position to another.
color (perceived)
Attribute of visual perception consisting of any combination of chromatic and achromatic
content. This attribute can be described by chromatic color names such as yellow, orange,
brown, red, green, blue, pink, purple, etc., or by achromatic color names such as white,
gray, black, etc., and be qualified by bright, dim, light, dark, etc., or by combinations of
such names. Perceived color depends on the spectral distribution of the color stimulus, on
the size, shape, structure and surround of the stimulus area, on the state of adaptation of
the observer’s visual system, and on the observer’s experience of the prevailing and similar
situations of observation.
color atlas
Collection of color samples used for evaluating colors by visual matching.
color equation
C(C) ≡ R(R) + G(G) + B(B)
color gamut
Volume in a color space, consisting of all those colors that are either (1) present in a specific
scene, artwork, print, or some form of reproduction, or (2) capable of being produced or
created using a particular output device and/or medium.
colorimeter
Instrument for measuring the tristimulus values or the chromaticity coordinates of a color
stimulus.
colorimetry
Measurement of colors based on a set of conventions. If the eye is used to make a quantita-
tive comparison of colors, it is called visual colorimetry. If physical devices are used for
the comparison, it is physical colorimetry.
color matching functions
Tristimulus values of monochromatic stimuli of equal radiant power.
color rendering
Mapping of image data representing the colorimetric coordinates of the elements of a scene
or original to image data representing the colorimetric coordinates of the elements of a
reproduction.
color space
Manifold of three dimensions for the geometrical representation of colors.
color temperature
Temperature of the full radiator which emits radiation of the same chromaticity as the
radiation considered. (See, also, correlated color temperature.)
contrast sensitivity
Reciprocal of the minimum relative luminance difference perceptible: S_c = L/ΔL.
correlated color temperature
The color temperature corresponding to the point on the Planckian locus which is nearest to
the point representing the chromaticity of the illuminant considered on an agreed uniform-
chromaticity-scale diagram.
diffraction
Deviation of the direction of the propagation of a radiation, determined by the wave nature
of radiation and occurring when the radiation passes the edge of an obstacle.
diffusion
Change of the spatial distribution of a beam of radiation when it is deflected in many
directions by a surface or by a medium without change of frequency of the monochromatic
components of which the radiation is composed.
equi-energy spectrum, equi-energy white
Spectrum in which the spectral concentration of energy evaluated on a wavelength basis is
constant throughout the visible region.
flare
Unintended stray light added to an image.
fluorescence
Photoluminescence in which the emitted optical radiation results from direct transitions
from the photoexcited energy level to a lower level, usually taking less than 10−8 s.
footcandle
Unit of illuminance: lumen per square foot. 1 footcandle = 10.76 lux.
fovea; fovea centralis
Central part of the retina, thin and depressed, which contains almost exclusively cones
and is the site of the most distinct vision. It subtends an angle of about 0.026 rad
or 1.5◦ .
foveola
Central region of the fovea which contains only cones. It subtends about 1◦ .
Helmholtz–Kohlrausch phenomenon
Change in brightness of perceived color produced by increasing the purity of a color stimulus
while keeping its luminance constant within the range of photopic vision.
hue
Attribute of visual sensation which has given rise to color names, such as blue, green, yellow,
red, purple, etc.
hyperfocal distance
Focus distance of a camera lens that offers the greatest depth of field.
incandescence
Emission of visible radiation by thermal excitation.
infrared radiation
Radiation with a wavelength greater than that of visible light and shorter than 1 mm. IR-A:
780–1400 nm, IR-B: 1.4–3 µm, IR-C: 3 µm to 1 mm.
interflection (interreflection)
General effect of the reflections of radiation between several reflecting surfaces.
irradiance (at a point of a surface) E_e
Quotient of the radiant flux dΦ_e incident on an element of the surface containing the point,
by the area dA of that element:
E_e = dΦ_e/dA [W m⁻²].
Equivalent definition: integral, taken over the hemisphere visible from the given point, of
the expression L_e · cos θ · dΩ, where L_e is the radiance at the given point in the various
directions of the incident elementary beams of solid angle dΩ, and θ is the angle between
any of these beams and the normal to the surface at the given point. The following integration
is over 2π sr:
E_e = ∫_{2π sr} L_e · cos θ · dΩ [W m⁻²].
radiance (at a point of a surface, in a given direction) L_e
L_e = d²Φ_e/(dΩ · dA · cos θ) [W m⁻² sr⁻¹],
where d²Φ_e is the radiant flux transmitted by an elementary beam passing through the given
point and propagating in the solid angle dΩ containing the given direction; dA is the area
of a section of that beam containing the given point; θ is the angle between the normal to
that section and the direction of that beam.
radiance (luminance) factor (at a point on the surface of a non-self-radiating body, in a
given direction, under specified conditions of irradiation (illumination)) β
Ratio of radiance (luminance) of the body to that of a perfect reflecting or transmitting
diffuser identically irradiated (illuminated).
radiant exitance (at a point of a surface) M_e
Quotient of the radiant flux dΦ_e leaving an element of the surface containing the point, by
the area dA of that element:
M_e = dΦ_e/dA [W m⁻²].
Equivalent definition: integral, taken over the hemisphere visible from the given point, of the
expression L_e cos θ dΩ, where L_e is the radiance at the given point in the various directions
of the emitted elementary beams of solid angle dΩ, and θ is the angle between any of these
beams and the normal to the surface at the given point.
radiant exposure (at a point of a surface) H_e
H_e = dQ_e/dA = ∫_t E_e dt [W s m⁻²].
radiant intensity (of a source, in a given direction) I_e
I_e = dΦ_e/dΩ [W sr⁻¹].
radiometry
Measurement of the quantities associated with radiation.
Rayleigh scattering (in a medium)
Diffusion of radiation in the course of its passage through a medium containing particles
the size of which is small compared with the wavelength of the radiation.
reflectance factor (at a point on the surface, for the part of the reflected radiation contained
in a given cone with its apex at the point on the surface, and for incident spectral composition
and geometric distribution)
Ratio of the radiant flux reflected in the directions delimited by the cone to that reflected
in the same directions by a perfect reflecting diffuser identically irradiated. Note: if the
solid angle of the cone approaches zero, then the reflectance factor approaches the radiance
factor. If the solid angle of the cone approaches 2π , then the reflectance factor approaches
the reflectance.
reflectance (optical) density Dρ
Logarithm to base 10 of the reciprocal of the reflectance. Dρ = − log10 ρ.
reflection
Return of radiation by a surface without change of frequency of the monochromatic compo-
nents of which the radiation is composed. “Regular reflection”: reflection without diffusion
in accordance with the laws of optical reflection as in a mirror. “Diffuse reflection”: dif-
fusion by reflection in which, on the macroscopic scale, there is no regular reflection.
“Retro-reflection”: reflection in which radiation is returned in directions close to the direc-
tion from which it came, this property being maintained over wide variations of the direction
of the incident radiation.
reflectivity ρ∞
Reflectance of a layer of material of such a thickness that there is no change of reflectance
with increase in thickness.
refractive index
Ratio of the velocity of light in a vacuum to its phase velocity in a medium.
related color
Color perceived to belong to an area or object in relation to other perceived colors in the
visual field.
relative responsivity ŝ
Ratio of the responsivity s(X ) when the detector input is X to the responsivity s(N ) when
the detector input is the reference irradiation N : ŝ = s(X )/s(N ).
relative spectral responsivity ŝ(λ)
Ratio of the spectral responsivity s(λ) of the detector at wavelength λ to a given reference
value sm : ŝ(λ) = s(λ)/sm .
responsivity s
Quotient of the detector output Y by the detector input X : s = Y / X . Note: if the de-
tector output is Y0 when there is no input, and is Y1 when the input is X , then
s = (Y1 − Y0 )/X .
retro-reflecting material
Material in which is incorporated a large number of very small elements which, by refraction
and reflection, produce the phenomenon of retro-reflection when they become the surface
as the material wears. (See, also, reflection.)
saturation
Attribute of visual sensation which permits a judgement to be made of the proportion of
pure chromatic color in the total sensation.
scotopic vision
Vision by the normal eye when it is adapted to levels of luminance of less than some
hundredths of a candela per square meter.
secondary light source
Surface or object which is not self-emitting but receives light and redirects it, at least in
part, by reflection or transmission.
sensation
Element of mental content of a sense impression which cannot be analyzed further.
spectral distribution X λ (of a radiant, luminous or photon quantity X (λ))
Quotient of the radiant or luminous or photon quantity dX (λ) contained in an elementary
range of dλ of wavelength at the wavelength λ, by that range.
X_λ = dX(λ)/dλ.
spectral luminous efficiency (V(λ) for photopic vision or V′(λ) for scotopic vision)
Ratio of the radiant flux at wavelength λ_m to that at wavelength λ such that both radiations
produce equally intense luminous sensations under specified photometric conditions and
λm is chosen so that the maximum value of this ratio is equal to 1.
spectral responsivity s(λ)
Quotient of the detector output dY(λ) by the monochromatic detector input dX(λ) = X_λ(λ)dλ:
s(λ) = dY(λ)/dX(λ).
spectroradiometer
Instrument for measuring the spectral concentration of radiant energy or power.
spectrophotometer
Instrument for measuring the ratio of two spectral radiometric quantities.
Stiles–Crawford effect
Decrease of the brightness of a light stimulus with the position of entry of the light pencil
through the pupil. This is called the Stiles–Crawford effect of the first kind. If the variation
is in hue and saturation instead of in brightness, it is called the Stiles–Crawford effect of
the second kind.
Talbot’s law
If a point of the retina is excited by a light stimulus undergoing periodic variations in
magnitude at a frequency exceeding the fusion frequency, the visual sensation produced
is identical to that produced by a steady stimulus whose magnitude is equal to the mean
magnitude of the variable stimulus taken over one period.
tone reproduction
Relationship of the luminance (luminance factor, L ∗ , log luminance, density) in a scene or
original to the luminance (luminance factor, L ∗ , log luminance, density) in a reproduction.
Tone reproduction is usually used to describe an imaging system (such as a system consisting
of a digital camera and a digital printer).
tonescale
Tonescale is a rule of mapping between two numerical representations of photometric
quantities (such as luminance, luminance factor, L ∗ , and density). For example, the tonescale
of a printer means the mapping of the input digital values (representing some recorded
photometric quantities of an image) to the photometrically measured values of the printed
output. Tonescale is usually used to describe an imaging device (such as a printer or a film).
transmission
Passage of radiation through a medium without change of frequency of the monochromatic
components of which the radiation is composed.
transmittance (for incident radiation of given spectral composition, polarization and geo-
metrical distribution) τ
Ratio of the transmitted radiant or luminous flux to the incident flux in the given conditions.
transmittance (optical) density Dτ
Logarithm to base 10 of the reciprocal of the transmittance. Dτ = − log10 τ .
trichromatic system
System of color specification based on the possibility of matching a color stimulus by the
additive mixture of three suitably chosen reference stimuli.
tristimulus values of a color stimulus
Amounts of the three reference color stimuli (primaries), in a given trichromatic system,
required to match the color of the stimulus considered. For example, if a color stimulus (C)
is matched by the three primaries (R), (G), and (B), as expressed by (C) ≡ R(R) + G(G) +
B(B), then R, G, B are the tristimulus values of the color (C).
troland
Unit used for expressing the magnitude of the external light stimulus applied to the eye.
When the eye is viewing a surface of uniform luminance, the number of trolands is equal to
the product of the area in square millimeters of the limiting pupil and the luminance of the
surface in candelas per square meter. Note: in computing retinal illuminance, absorption and
reflection losses and the dimension of the particular eye under consideration must be taken
into account. If all the corrections are negligible, 1 troland produces a retinal illuminance
of approximately 0.002 lm m−2 .
ultraviolet radiation
Radiation with a wavelength less than that of visible light and longer than 1 nm.
uniform color space
Color space in which the distance between any two colors is intended to represent a measure
of the perceived difference between the corresponding colors.
unrelated color
Color perceived to belong to an area with completely dark surroundings.
veiling glare
Light reflected from an imaging medium that has not been modulated by the means used to
produce the image.
viewing flare
Veiling glare that is observed in a viewing environment, but not accounted for in radiometric
measurements made using the prescribed measurement geometry and calibration conditions.
visible radiation
Any optical radiation capable of causing visual sensation directly. The lower limit is gen-
erally taken to be between 360 nm and 400 nm, and the upper limit between 760 nm and
830 nm.
wave number (σ )
The reciprocal of the wavelength [m⁻¹].
References
[1] W. Abney, “On the changes in hue of spectrum colours by dilution with white light,”
Proceedings of the Royal Society (London), A83, 120–124, 1910.
[2] W. Abney, Researches in Colour Vision and the Trichromatic Theory, London:
Longmans, Green, 1913.
[3] K. Adachi, T. Nishimura, and K. Iwabe, “Parallel signal processing system for a
super high-resolution digital camera,” in Proc. IS&T’s 2003 PICS Conference, 340–
344, 2003.
[4] J.E. Adams, Jr., “Interaction between color plane interpolation and other image
processing functions in electronic photography,” SPIE Proceedings, 2416, 144–151,
1995.
[5] J.E. Adams, Jr., “Design of practical color filter array interpolation algorithms for
digital cameras,” SPIE Proceedings, 3028, 117–125, 1997.
[6] J.E. Adams, Jr., “Design of practical color filter array interpolation algorithms for
digital cameras, Part 2,” Proc. IEEE International Conference on Image Processing,
Chicago, 1, 488–492, 1998.
[7] J. Adams, K. Parulski, and K. Spaulding, “Color processing in digital cameras,”
IEEE Micro, 18, 6, 20–30, 1998.
[8] M.J. Adams and P.A. Dolin, Printing Technology, 5th edition, Clifton Park, NY:
Delmar Publishers, 2002.
[9] E.H. Adelson, “Perceptual organization and the judgement of brightness,” Science,
262, 2042–2044, 1993.
[10] E.H. Adelson, “Lightness perception and lightness illusions,” in The Cognitive Neu-
rosciences, edited by M. Gazzaniga, pp. 339–351, Cambridge, MA: MIT Press, 2000.
[11] G.A. Agoston, Color Theory and Its Application in Art and Design, 2nd edition,
Berlin: Springer-Verlag, 1987.
[12] G. Agranov, V. Berezin, and R.H. Tsai, “Crosstalk and microlens study in a color
CMOS image sensor,” IEEE Transactions on Electron Devices, 50, 1, 4–11, 2003.
[13] M. Aguilar and W.S. Stiles, “Saturation of the rod mechanism of the retina at high
levels of stimulation,” Optica Acta, 1, 59–65, 1954.
[14] P.K. Ahnelt and R. Pflug, “Telodendrial contacts between foveolar cone pedicles in
the human retina,” Experientia, 42, 298–300, 1986.
[15] A.J. Ahumada and A.B. Watson, “Equivalent-noise model for contrast detection
and discrimination,” Journal of the Optical Society of America, A, 2, 7, 1133–1139,
1985.
[16] J.S. Alkofer, Tone Value Sample Selection in Digital Image Processing Method
Employing Histogram Normalization, US Patent 4,654,722, March 31, 1987.
[17] J.P. Allebach, (editor), Selected Papers on Digital Halftoning, Bellingham WA: SPIE
Press, 1999.
[18] E. Allen, “Colorant formulation and shading,” in Optical Radiation Measurements,
Volume 2, Color Measurement, edited by F. Grum and C.J. Bartleson, New York,
NY: Academic Press, 1980.
[19] D. Alleysson, S. Süsstrunk, and J. Hérault, “Color demosaicing by estimating lumi-
nance and opponent chromatic signals in the Fourier domain,” Proc. IS&T/SID 10th
Color Imaging Conference, 331–336, 2002.
[20] D. Alleysson, S. Süsstrunk, and J. Marguier, “Influence of spectral sensitivity func-
tions on color demosaicing,” Proc. IS&T/SID 11th Color Imaging Conference, 351–
357, 2003.
[21] J.H. Altman, “The measurement of RMS granularity,” Applied Optics, 3, 35–38,
1964.
[22] T. Amano, Method of Determining Exposure Amounts in Photographic Printing, US
Patent 3,873,201, March 25, 1975.
[23] T. Amano and R. Andoh, Process of and System for Printing in Color Photography,
US Patent 3,888,580, June 10, 1975.
[24] I. Amidror, “Scattered data interpolation methods for electronic imaging systems:
a survey,” Journal of Electronic Imaging, 11, 2, 157–176, 2002.
[25] R. Anderson, “Matrix description of radiometric quantities,” Applied Optics, 30, 7,
858–867, 1991.
[26] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd edition,
New York, NY: John Wiley & Sons, Inc., 1984.
[27] H.C. Andrews and B.R. Hunt, Digital Image Restoration, Englewood Cliffs, NJ:
Prentice-Hall, 1977.
[28] T.M. Apostol, Calculus, Volume II, 2nd edition, pp. 269–271, New York, NY: John
Wiley & Sons, Inc., 1969.
[29] R.A. Applegate and V. Lakshminarayanan, “Parametric representation of Stiles–
Crawford functions: normal variation of peak location and directionality,” Journal
of the Optical Society of America, A, 10, 7, 1611–1623, 1993.
[30] L.E. Arend and A. Reeves, “Simultaneous colour constancy,” Journal of the Optical
Society of America, A, 3, 10, 1743–1751, 1986.
[31] D. Armitage, “Liquid-crystal display device fundamentals,” in Electro-Optical
Displays, edited by M.A. Karim, New York, NY: Marcel Dekker, 1992.
[32] J.S. Arney and M. Alber, “Optical effects of ink spread and penetration on halftones
printed by thermal ink jet,” Journal of Imaging Science and Technology, 42, 4,
331–334, 1998.
[33] J.S. Arney and E. Pray, “Kubelka–Munk theory and the Yule–Nielsen effect on
halftones,” Journal of Imaging Science and Technology, 43, 4, 365–370, 1999.
[34] P. Artal, J. Santamaria, and J. Bescos, “Retrieval of wave aberration of human eyes
from actual point-spread-function data,” Journal of the Optical Society of America,
A, 5, 8, 1201–1206, 1988.
[35] P. Artal, “Calculations of two-dimensional foveal retinal images in real eyes,” Jour-
nal of the Optical Society of America, A, 7, 8, 1374–1381, 1990.
[36] P. Artal, M. Ferro, I. Miranda, and R. Navarro, “Effects of aging in retinal image
quality,” Journal of the Optical Society of America, A, 10, 7, 1656–1662, 1993.
[37] P. Artal and R. Navarro, “Monochromatic modulation transfer function of the human
eye for different pupil diameters: an analytical expression,” Journal of the Optical
Society of America, 11, 1, 246–249, 1994.
[38] N.W. Ashcroft and N.D. Mermin, Solid State Physics, Fort Worth, TX: Saunders,
1976.
[39] ASTM (American Society for Testing and Materials), ASTM Standards on Color
and Appearance Measurement, 3rd edition, Philadelphia, PA: ASTM, 1991.
[40] J. Astola, P. Haavisto, and Y. Neuvo, “Vector median filters,” Proceedings of the
IEEE, 78, 678–689, 1990.
[41] D.A. Atchison, W.N. Charman, and R.L. Woods, “Subjective depth-of-focus of the
human eye,” Optometry and Vision Science, 74, 511–520, 1997.
[42] G.G. Attridge, C. Leverton, M.R. Pointer, and R.E. Jacobson, “Measured colour
difference and the acceptability of colour prints,” The Journal of Photographic
Science, 44, 14–17, 1996.
[43] P.W. Atkins, Molecules, Scientific American Library, New York, NY: W.H. Freeman,
1987.
[44] P.W. Atkins, Physical Chemistry, 5th edition, New York, NY: W.H. Freeman, 1994.
[45] L.L. Avant, “Vision in the Ganzfeld,” Psychological Bulletin, 64, 4, 246–258,
1965.
[46] H.D. Baker, “The instantaneous threshold and early dark adaptation,” Journal of the
Optical Society of America, 43, 9, 798–803, 1953.
[47] R. Bala, R. DeQueiroz, R. Eschbach, and W. Wu, “Gamut mapping to preserve
spatial luminance variations,” Journal of Imaging Science and Technology, 45, 5,
436–443, 2001.
[48] R.M. Balboa, C.W. Tyler, and N.M. Grzywacz, “Occlusions contribute to scaling in
natural images,” Vision Research, 41, 955–964, 2001.
[49] P. Ball, Designing the Molecular World, Princeton, NJ: Princeton University Press,
1994.
[50] P. Bamberg and S. Sternberg, A Course in Mathematics for Students of Physics: 1,
Cambridge: Cambridge University Press, 1988.
[51] P. Bamberg and S. Sternberg, A Course in Mathematics for Students of Physics: 2,
Cambridge: Cambridge University Press, 1990.
[52] R. Barker, Neuroscience: An Illustrated Guide, New York, NY: Ellis Horwood, 1991.
[53] E. Barlow, “Dark and light adaptation,” in Handbook of Sensory Physiology, VII/4,
Visual Psychophysics, edited by D. Jameson and L.M. Hurvich, Berlin: Springer-
Verlag, 1972.
[54] H.B. Barlow and J.D. Mollon, (eds.), The Senses, Cambridge: Cambridge University
Press, 1982.
[55] R.B. Barlow, R.R. Birge, E. Kaplan, and J.R. Tallent, “On the molecular origin of
photoreceptor noise”, Nature, 366, 64–66, 1993.
[56] R.B. Barlow, Jr., and R.T. Verrillo, “Brightness sensation in a Ganzfeld,” Vision
Research, 16, 1291–1297, 1976.
[57] K. Barnard, Practical Colour Constancy, Ph.D. thesis, Simon Fraser University,
School of Computing, 1999.
[58] P.Y. Barnes, E.A. Early, and A.C. Parr, NIST Measurement Services: Spectral Re-
flectance, NIST Special Publication 250-48, National Institute of Standards and
Technology, Gaithersburg, MD, March 1998.
[59] M. Barni, F. Buti, F. Bartolini, and V. Cappellini, “A quasi-Euclidean norm to speed
up vector median filtering,” IEEE Transactions on Image Processing, 9, 10, 1704–
1709, 2000.
[60] C.R. Barr, J.R. Thirtle, and P.W. Vittum, “Development inhibitor releasing (DIR)
couplers in color photography,” Photographic Science and Engineering, 13, 2, 74–80
(the typographical errors were corrected on pages 214–217), 1969.
[61] H.H. Barrett, “Objective assessment of image quality: effects of quantum noise and
object variability,” Journal of the Optical Society of America, A, 7, 7, 1266–1278,
1990.
[62] H.H. Barrett, J.L. Denny, R.F. Wagner, and K.J. Meyers, “Objective assessment of
image quality. II. Fisher information, Fourier crosstalk, and figures of merit for
task performance,” Journal of the Optical Society of America, A, 12, 5, 834–852,
1995.
[63] H.H. Barrett, C.K. Abbey, and E. Clarkson, “Objective assessment of image quality.
III. ROC metrics, ideal observers, and likelihood-generating functions,” Journal of
the Optical Society of America, A, 15, 6, 1520–1536, 1998.
[64] H.H. Barrett and K.J. Meyers, Foundations of Image Science, New York, NY: John
Wiley & Sons, 2004.
[65] D.E. Barrick, “Relationship between slope probability density function and the phys-
ical optics integral in rough surface scattering,” Proc. IEEE, 56, 1728–1729, 1968
(correction in 57, 256, 1969).
[66] D.E. Barrick, “Rough surfaces,” in Radar Cross-Section Handbook, edited by G.T.
Ruck, D.E. Barrick, W.D. Stuart and C.K. Krichbaum, New York, NY: Plenum Press,
1970.
[67] R.S. Barrows and R.N. Wolfe, “A review of adjacency effects in silver photographic
images,” Photographic Science and Engineering, 15, 6, 472–479, 1971.
[68] G.S. Barsh, “What controls variations in human skin color?” PLoS Biology,
(http://biology.plosjournals.org), 1, 1, 19–22, 2003.
[69] P.G.J. Barten, “Evaluation of subjective image quality with the square-root integral
method,” Journal of the Optical Society of America, A, 7, 10, 2024–2031, 1990.
[70] P.G.J. Barten, Contrast Sensitivity of the Human Eye and Its Effects on Image Quality,
SPIE Optical Engineering Press, Bellingham, WA, 1999.
[71] C.J. Bartleson and R.W. Huboi, “Exposure determination methods for color printing:
The concept of optimum correction level,” Journal of the Society of Motion Picture and
Television Engineering, 65, 205–215, April 1956.
[72] C.J. Bartleson, “Influence of observer adaptation on the acceptance of color prints,”
Photographic Science and Engineering, 2, 1, 32–39, 1958.
[73] C.J. Bartleson, “Some observations on the reproduction of flesh colors,” Photo-
graphic Science and Engineering, 3, 3, 114–117, 1959.
[74] C.J. Bartleson and C.P. Bray, “On the preferred reproduction of flesh, blue-sky, and
green-grass colors,” Photographic Science and Engineering, 6, 1, 19–25, 1962.
[75] C.J. Bartleson and E.J. Breneman, “Brightness perception in complex fields,” Journal
of the Optical Society of America, 57, 7, 953–957, 1967.
[76] C.J. Bartleson and E.J. Breneman, “Brightness reproduction in the photographic
process”, Photographic Science and Engineering, 11, 254–262, 1967.
[77] C.J. Bartleson, “Criterion for tone reproduction,” Journal of the Optical Society of
America, 58, 7, 992–995, 1968.
[78] C.J. Bartleson and E.J. Breneman, “Differences among responses of observers in
scaling brightness,” Color 73, pp. 398–400, 1973.
[79] C.J. Bartleson, “A review of chromatic adaptation,” in Color 77, edited by F.W.
Billmeyer, Jr., and G. Wyszecki, pp. 63–96, Bristol: Adam Hilger, 1978.
[80] C.J. Bartleson, “Colorimetry,” in Optical Radiation Measurements, Volume 2, edited
by F. Grum and C.J. Bartleson, New York, NY: Academic Press, 1980.
[81] C.J. Bartleson, “On chromatic adaptation and persistence,” Color Research and
Application, 6, 3, 153–160, 1981.
[82] C.J. Bartleson, “The combined influence of sharpness and graininess on the quality
of colour prints,” The Journal of Photographic Science, 30, 33–38, 1982.
[83] R.E. Bartow, W.K. Darrow, and W.T. Hartmann, CRT Device Light Versus Input
Signal Characteristic Function, US Patent 4,862,265, Aug. 29, 1989.
[84] M. Bass (ed.), Handbook of Optics, Volumes I and II, New York, NY: McGraw-Hill,
1995.
[85] E. Baumgardt, “Threshold quantal problems,” in Handbook of Sensory Physiol-
ogy, VII/4, Visual Psychophysics, edited by D. Jameson and L.M. Hurvich, Berlin:
Springer-Verlag, 1972.
[86] G.A. Baxes, Digital Image Processing: Principles and Applications, New York, NY:
John Wiley and Sons, 1994.
[87] B.E. Bayer, “An optimum method for two-level rendition of continuous-tone pic-
tures,” IEEE International Conference on Communications, 1, 26.11–26.15, 1973.
[88] B.E. Bayer, Color Imaging Array, US Patent 3,971,065, July 20, 1976.
[89] D.A. Baylor, T.D. Lamb, and K.-W. Yau, “Responses of retinal rods to single pho-
tons,” Journal of Physiology, 288, 613–634, 1979.
[90] D.A. Baylor, “Photoreceptor signals and vision,” Investigative Ophthalmology and
Visual Science, 28, 1, 34–49, 1987.
[91] M.E. Becker, “Evaluation and characterization of display reflectance,” SID 97
Digest, 827–830, 1997.
[92] P. Beckmann and A. Spizzichino, The Scattering of Electromagnetic Waves from
Rough Surfaces, New York, NY: Pergamon, 1963.
[93] J. Belloni, M. Treguer, H. Remita, and R.R. De Heyzer, “Enhanced yield of pho-
toinduced electrons in doped silver halide crystals,” Nature, 402, 865–867, 1999.
[94] A.G. Bennett and R.B. Rabbetts, Clinical Visual Optics, 2nd edition, London:
Butterworths, 1989.
[95] J.M. Bennett, and L. Mattsson, Introduction to Surface Roughness and Scattering,
Washington, DC: Optical Society of America, 1989.
[96] O. Ben-Shahar and S.W. Zucker, “Hue field and color curvatures: a perceptual orga-
nization approach to color image denoising,” in Proc. 2003 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Volume II, pp. 713–720,
2003.
[97] K.B. Benson (ed.), Television Engineering Handbook, New York, NY: McGraw-Hill
Book Company, 1986.
[98] M.J. Beran and G.B. Parrent, Jr., Theory of Partial Coherence, Englewood Cliffs,
NJ: Prentice-Hall, 1964.
[99] M.A. Berkley, F. Kitterle, and D.W. Watkins, “Grating visibility as a function of
orientation and retinal eccentricity,” Vision Research, 15, 239–244, 1975.
[100] S.M. Berman, G. Fein, D.J. Jewett, and F. Ashford, “Luminance controlled pupil
size affects Landolt C test performance,” Journal of the Illuminating Engineering
Society, 22, 2, 150–165, 1993.
[101] S.M. Berman, D.J. Jewett, B.R. Benson, and T.M. Law, “Despite different wall
colors, vertical scotopic illuminance predicts pupil size,” Journal of the Illuminating
Engineering Society, 26, 2, 59–64, 1997.
[102] R.S. Berns and K.H. Petersen, “Empirical modeling of systematic spectrophotomet-
ric error,” COLOR Research and Application, 13, 243–256, 1988.
[103] R. Berns, R. Motta, and M. Gorzynski, “CRT Colorimetry. Part I: Theory and
Practice,” COLOR Research and Application, 18, 5, 299–314, 1993. R. Berns,
M. Gorzynski, and R. Motta, “CRT Colorimetry. Part II: Metrology,” COLOR
Research and Application, 18, 5, 315–325, 1993.
[104] R.S. Berns, Billmeyer and Saltzman’s Principles of Color Technology, 3rd edition,
New York, NY: John Wiley and Sons, 2000.
[105] R.S. Berns, S.R. Fernandez, and L. Taplin, “Estimating black-level emissions of
computer-controlled displays,” COLOR Research and Application, 28, 5, 379–383,
2003.
[106] M.J. Berry, D.K. Warland, and M. Meister, “The structure and precision of retinal
spike trains,” Proc. of National Academy of Science, 94, 5411–5416, 1997.
[107] D.M. Berson, F.A. Dunn, and M. Takao, “Phototransduction by retinal ganglion cells
that set the circadian clock,” Science, 295, 1070–1073, 2002.
[108] D.P. Bertsekas, Nonlinear Programming, Belmont, MA: Athena Scientific, 1999.
[109] E.I. Betensky, “Photographic lenses,” Chapter 1 in Applied Optics and Optical En-
gineering, Vol. VIII, edited by R.R. Shannon and J.C. Wyant, pp. 1–30, New York,
NY: Academic Press, Inc., 1980.
[110] P.R. Bevington, Data Reduction and Error Analysis for Physical Sciences, New
York, NY: McGraw-Hill, 1969.
[111] F.W. Billmeyer and M. Saltzman, Principles of Color Technology, 2nd edition, New
York, NY: John Wiley & Sons, Inc., 1981.
[112] F. Birren, “Application of the Ostwald color system to the design of consumer
goods,” Journal of the Optical Society of America, 34, 7, 396–399, 1944.
[113] R.B. Blackman and J.W. Tukey, The Measurement of Power Spectra, New York, NY:
Dover, 1959.
[114] H.R. Blackwell, “Contrast thresholds of the human eye,” Journal of the Optical
Society of America, 36, 11, 624–643, 1946.
[115] C. Blakemore (ed.), Vision: Coding and Efficiency, Cambridge: Cambridge Univer-
sity Press, 1990.
[116] A.J. Blanksby and M.J. Loinaz, “Performance analysis of a color CMOS photogate
image sensor,” IEEE Transactions on Electron Devices, 47, 1, 55–64, 2000.
[117] H.W. Bodmann, P. Haubner, and A.M. Marsden, “A unified relationship between
brightness and luminance,” CIE Proceedings, Kyoto Session, pp. 99–102, 1979.
[118] K.R. Boff, L. Kaufman, and J.P. Thomas, Handbook of Perception and Human
Performance, Volume I, Sensory Processes and Perception, New York, NY: John
Wiley and Sons, 1986.
[119] K.R. Boff, L. Kaufman, and J.P. Thomas, Handbook of Perception and Human
Performance, Volume II, Cognitive Processes and Performance, New York, NY:
John Wiley and Sons, 1986.
[120] C.F. Bohren and D.R. Huffman, Absorption and Scattering of Light by Small Parti-
cles, New York, NY: John Wiley and Sons, 1983.
[121] C.F. Bohren, “Scattering by particles,” in Handbook of Optics, Volume I, edited by
M. Bass, New York, NY: McGraw-Hill, 1995.
[122] M.E. Bond and D. Nickerson, “Color-order system, Munsell and Ostwald,” Journal
of the Optical Society of America, 32, 709–719, 1942.
[123] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications,
New York, NY: Springer, 1997.
[124] M. Born and E. Wolf, Principles of Optics, 6th (corrected) edition, Oxford: Pergamon
Press, 1983.
[125] M. Born and E. Wolf, Principles of Optics, 7th (expanded) edition, Cambridge:
Cambridge University Press, 1999.
[126] B.B. Boycott and H. Wässle, “Morphological classification of bipolar cells of the
primate retina,” European Journal of Neuroscience, 3, 1069–1088, 1991.
[127] R.W. Boyd, Radiometry and the Detection of Optical Radiation, New York, NY:
John Wiley & Sons, Inc., 1983.
[128] R.N. Bracewell, The Fourier Transform and Its Applications, 2nd edition, New York,
NY: McGraw-Hill, 1986.
[129] R.N. Bracewell, The Fourier Transform and Its Applications, 3rd edition, Boston,
MA: McGraw-Hill, 2000.
[130] R. Bradbury, “Thermal printing,” Chapter 6 in Chemistry and Technology of Print-
ing and Imaging Systems, edited by P. Gregory, London: Blackie Academic and
Professional, 1996.
[131] D. Bradley, “Plastics that play on light,” Science, 261, 1272–1273, 1993.
[132] D.H. Brainard and B.A. Wandell, “Asymmetric color matching: how color appear-
ance depends on the illuminant,” Journal of the Optical Society of America, A, 9, 9,
1433–1448, 1992.
[133] D.H. Brainard and W.T. Freeman, “Bayesian color constancy,” Journal of the Optical
Society of America, A, 14, 7, 1393–1411, 1997.
[134] D.H. Brainard, A. Roorda, Y. Yamauchi, J.B. Calderone, A.B. Metha, M. Neitz,
J. Meitz, D.R. Williams, and G.H. Jacobs, “Functional consequences of individual
[153] P. Buser and M. Imbert, Vision, translated by R.H. Kay, Cambridge, MA: MIT Press,
1992.
[154] W. Busselt and R. Raue, “Optimizing the optical properties of TV phosphor screens,”
Journal of the Electrochemical Society, 135, 3, 764–771, 1988.
[155] H. Bustard and R. Smith, “Investigation into the scattering of light by human hair,”
Applied Optics, 30, 24, 3485–3491, 1991.
[156] A.J. Calabria and M.D. Fairchild, “Perceived image contrast and observer preference
I. The effects of lightness, chroma, and sharpness manipulations on contrast percep-
tion,” Journal of Imaging Science and Technology, 47, 6, 479–493, 2003. “Perceived
image contrast and observer preference II. Empirical modeling of perceived image
contrast and observer preference data,” Journal of Imaging Science and Technology,
47, 6, 494–508, 2003.
[157] F.W. Campbell and D.G. Green, “Optical and retinal factors affecting visual resolu-
tion,” Journal of Physiology (London), 181, 576–593, 1965.
[158] F.W. Campbell and R.W. Gubisch, “Optical quality of the human eye,” Journal of
Physiology, 186, 558–578, 1966.
[159] F.W. Campbell, J.J. Kulikowski, and J. Levinson, “The effect of orientation on the
visual resolution of gratings,” Journal of Physiology, 187, 427–436, 1966.
[160] K. Campton, Image Performance in CRT Displays, Bellingham, WA: SPIE Press,
2003.
[161] J.F. Canny, Finding Edges and Lines in Images, MIT Artificial Intelligence Labora-
tory, Cambridge, MA, AI TR 720, 1983. (Also, “A computational approach to edge
detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 8, 6,
679–698, 1986.)
[162] F.E. Carlson and C.N. Clark, “Light sources for optical devices,” in Applied Optics
and Optical Engineering, Volume I, edited by R. Kingslake, pp. 43–109, New York,
NY: Academic Press, 1965.
[163] C.R. Carvonius, “Not seeing eye to eye,” Nature, 370, 259–260, 1994.
[164] J. Castanet and J.-P. Ortonne, “Hair melanin and hair color,” in Formation and
Structure of Human Hair, edited by P. Jolles, H. Zhan, and H. Hocker, pp. 209–225,
Basel: Birkhäuser Verlag, 1997.
[165] K.R. Castleman, Digital Image Processing, Upper Saddle River, NJ: Prentice Hall,
1996.
[166] P.B. Catrysse and B.A. Wandell, “Optical efficiency of image sensor pixels,” Journal
of the Optical Society of America, A, 19, 8, 1610–1620, 2002.
[167] L.M. Chalupa and J.S. Werner, (eds.), The Visual Neurosciences, Cambridge, MA:
MIT Press, 2004.
[168] S. Chandrasekhar, Radiative Transfer, New York, NY: Dover, 1960.
[169] W.N. Charman and J. Tucker, “Dependence of accommodation response on the spa-
tial frequency spectrum of the observed object,” Vision Research, 17, 129–139, 1977.
[170] W.N. Charman and H. Whitefoot, “Pupil diameter and the depth-of-field of the
human eye as measured by laser speckle,” Optica Acta, 24, 12, 1211–1216, 1977.
[171] W.N. Charman and J. Tucker, “Accommodation as a function of object form,” Amer-
ican Journal of Optometry, 55, 84–92, 1978.
[172] W.N. Charman and J. Tucker, “Accommodation and color,” Journal of the Optical
Society of America, 68, 4, 459–471, 1978.
[173] W.N. Charman, “The retinal image in the human eye,” in Progress in Retinal Re-
search, Volume 2, pp. 1–50, Oxford: Pergamon Press, 1982.
[174] W.N. Charman (ed.), Vision and Visual Dysfunction, Volume 1, Visual Optics and
Instrumentation, Boca Raton, FL: CRC Press, 1991.
[175] E.W. Cheney, Multivariate Approximation Theory: Selected Topics, Philadelphia,
PA: Society for Industrial and Applied Mathematics, 1986.
[176] H.-Y. Cheng and Y.-C. King, “A CMOS image sensor with dark-current cancellation
and dynamic sensitivity operations,” IEEE Transactions on Electron Devices, 50, 1,
91–95, 2003.
[177] T.L.V. Cheung and S. Westland, “Accurate estimation of the non-linearity of input–
output response for color digital cameras,” in Proc. IS&T’s 2003 PICS Conference,
366–369, 2003.
[178] M.E. Chevreul, De la loi de contraste simultane des couleurs, Paris: Leo Laget,
1969. (Original work published in 1839.)
[179] C.D. Child, “Discharge from hot CaO,” Physical Review (series I), 32, 5, 492–511,
1911.
[180] R. Chung, “A statistical method for image classification and tone reproduction de-
termination,” Journal of Applied Photographic Engineering, 3, 2, 74–81, 1977.
[181] E.L. Church, “Comments on the correlation length,” Proceedings of SPIE, 680,
102–111, 1986.
[182] E.L. Church and P.Z. Takacs, “Surface scattering,” in Handbook of Optics, Volume
I, edited by M. Bass, New York, NY: McGraw-Hill, 1995.
[183] CIE, ISO/CIE 10527, (CIE S002-1986), Colorimetric Observers, Vienna: CIE Cen-
tral Bureau, 1991.
[184] CIE, Publication CIE 13.3, Method of Measuring and Specifying Colour Rendering
Properties of Light Sources, Vienna: CIE Central Bureau.
[185] CIE, Publication CIE 41, Light as a True Visual Quantity: Principles of Measure-
ment, Vienna: CIE Central Bureau, 1978.
[186] CIE, Publication CIE 15.2, Colorimetry, 2nd edition, Vienna: CIE Central Bureau,
1986.
[187] CIE, Publication CIE 17.4, International Lighting Vocabulary, 4th edition, Vienna:
CIE Central Bureau, 1987.
[188] CIE, Publication CIE 81, Mesopic Photometry: History, Special Problems, and
Practical Solutions, Vienna: CIE Central Bureau, 1989.
[189] CIE, Publication CIE 85, Solar Spectral Irradiance, Vienna: CIE Central Bureau,
1989.
[190] CIE, Publication CIE 86, CIE 1988 2◦ Spectral Luminous Efficiency Function for
Photopic Vision, Vienna: CIE Central Bureau, 1990.
[191] CIE, Publication CIE 116, Industrial Colour-Difference Evaluation, Vienna: CIE
Central Bureau, 1995.
[192] CIE, Publication CIE 127, Measurement of LEDs, Vienna: CIE Central Bureau,
1997.
[193] CIE, Publication CIE 131, The CIE 1997 Interim Colour Appearance Model (Simple
Version), CIECAM97s, Vienna: CIE Central Bureau, 1998.
[194] CIE, Publication CIE 142, Improvement to Industrial Colour-Difference Evaluation,
Vienna: CIE Central Bureau, 2001.
[195] The Camera and Imaging Product Association, Sensitivity of Digital Cameras, CIPA
DC-004, July 27, 2004.
[196] K.J. Ciuffreda, “Accommodation and its anomalies,” in Visual Optics and Instru-
mentation, edited by W.N. Charman, pp. 231–279, Boca Raton, FL: CRC Press,
1991.
[197] L.D. Clark, “Mathematical prediction of photographic picture from tone-
reproduction data,” Photographic Science and Engineering, 11, 5, 306–315, 1967.
[198] R.N. Clark, Visual Astronomy of the Deep Sky, New York, NY: Cambridge University
Press, 1990.
[199] J.E. Cogoli, Photo-Offset Fundamentals, 5th edition, Encino, CA: Glencoe Publish-
ing, 1986.
[200] J. Cohen, “Dependency of the spectral reflectance curves of the Munsell color chips,”
Psychonomic Science, 1, 369–370, 1964.
[201] J.B. Cohen and W.E. Kappauf, “Metameric color stimuli, fundamental metamers,
and Wyszecki’s metameric black,” American Journal of Psychology, 95, 4, 537–564,
1982.
[202] J.B. Cohen and W.E. Kappauf, “Color mixture and fundamental metamers: Theory,
algebra, geometry, application,” American Journal of Psychology, 98, 2, 171–259,
1985.
[203] W. Cohen, “Spatial and textural characteristics of the Ganzfeld,” The American
Journal of Psychology, 70, 3, 403–410, 1957.
[204] W. Cohen, “Color-perception in the chromatic Ganzfeld,” The American Journal of
Psychology, 71, 2, 390–394, 1958.
[205] D.R. Cok, Signal Processing Method and Apparatus for Sampled Image Signals,
US Patent 4,630,307, December 16, 1986.
[206] D.R. Cok, Apparatus and Accompanying Methods for Achieving Automatic Color
Balancing in a Film to Video Transfer System, US Patent 4,945,406, July 31, 1990.
[207] D.R. Cok, “Reconstruction of CCD images using template matching,” Final Pro-
grams and Advanced Printing of Papers of IS&T’s 47th Annual Conference, Volume
II, 380–385, 1994.
[208] E. Collett, Polarized Light: Fundamentals and Applications, New York, NY: Marcel
Dekker, 1993.
[209] P.J. Collings and M. Hird, Introduction to Liquid Crystals Chemistry and Physics,
London: Taylor and Francis, 1997.
[210] D. Collins, The Story of Kodak, New York, NY: Harry N. Abrams, Inc., 1990.
[211] A.H. Compton, “The spectrum of scattered X-rays,” Physical Review, 22, 409–413,
1923.
[212] H.R. Condit, “Natural Sources”, Section 1.3 in Handbook of Photographic Sci-
ence and Engineering, 2nd edition, edited by C.N. Proudfoot, Springfield, VA: The
Society for Imaging Science and Technology, 1997.
[213] R.L. Cook and K.E. Torrance, “A reflectance model for computer graphics,” ACM
Transactions on Graphics, 1, 1, 7–24, 1982.
[214] J.H. Coote, Photofinishing Techniques and Equipment, London: The Focal Press,
1970.
[215] T.N. Cornsweet, Visual Perception, Orlando, FL: Academic Press, 1970.
[216] F. Cost, Pocket Guide to Digital Printing, Albany, NY: Delmar Publishers, 1997.
[217] R. Cotterill, The Cambridge Guide to the Material World, Cambridge: Cambridge
University Press, 1985.
[218] W. Cowan, “Displays for vision research,” in Handbook of Optics, Volume I, edited
by M. Bass, New York, NY: McGraw-Hill, 1995.
[219] T.F. Cox and M.A.A. Cox, Multidimensional Scaling, 2nd edition, Boca Raton, FL:
Chapman & Hall/CRC, 2000.
[220] K.J.W. Craik, The Nature of Psychology. A Selection of Papers, Essays and Other
Writings by the Late K.J.W. Craik, edited by S.L. Sherwood, Cambridge: Cambridge
University Press, 1966.
[221] E.M. Crane, “An objective method for rating picture sharpness: SMT acutance,”
Journal of the Society of Motion Picture and Television Engineering, 73, 643–647,
1964.
[222] E.M. Crane, “Acutance and granulance,” Proceedings of SPIE, 310, 125–132, 1981.
[223] H.D. Crane, J.D. Peter, and E. Martinez-Uriegas, Method and Apparatus for
Decoding Spatiochromatically Multiplexed Color Image Using Predetermined
Coefficients, US Patent 5,901,242, May 4, 1999.
[224] C.A. Curcio, K.R. Sloan, R.E. Kalina, and A.E. Hendrickson, “Human photoreceptor
topography,” The Journal of Comparative Neurology, 292, 497–523, 1990.
[225] C.A. Curcio, K. Allen, K.R. Sloan, C.L. Lerea, J.B. Hurley, I.B. Klock, and A.H.
Milam, “Distribution and morphology of human cone photoreceptors stained with
anti-blue opsin,” The Journal of Comparative Neurology, 312, 610–624, 1991.
[226] D.M. Dacey, “Dendritic field size and morphology of midget and parasol ganglion
cells of the human retina,” Proceedings of the National Academy of Science, 89,
9666–9670, 1992.
[227] D.M. Dacey, “The mosaic of midget ganglion cells in the human retina,” The Journal
of Neuroscience, 13, 12, 5334–5355, 1993.
[228] D.M. Dacey and B.B. Lee, “The ‘blue-on’ opponent pathway in primate retina orig-
inates from a distinct bistratified ganglion cell type,” Nature, 367, 731–735, 1994.
[229] D.M. Dacey, “Circuitry for color coding in the primate retina,” Proceedings of the
National Academy of Science, USA, 93, 582–588, 1996.
[230] D.M. Dacey, L.C. Diller, J. Verweij, and D.R. Williams, “Physiology of L- and M-
cone inputs to H1 horizontal cells in the primate retina,” Journal of the Optical
Society of America, A, 17, 589–596, 2000.
[231] D.M. Dacey, “Parallel pathways for spectral coding in primate retina,” Annual
Reviews: Neurosciences, 23, 743–775, 2000.
[232] J.C. Dainty and R. Shaw, Image Science, London: Academic Press, 1974.
[233] S. Daly, “Application of a noise-adaptive contrast sensitivity function to image data
compression,” Optical Engineering, 29, 8, 977–987, 1992.
[234] S. Daly, “The visible differences predictor: An algorithm for the assessment of image
fidelity,” in Digital Images and Human Vision, edited by A.B. Watson, pp. 179–206,
Cambridge, MA: MIT Press, 1993.
[235] K.J. Dana and J. Wang, “Device for convenient measurement of spatially varying
bidirectional reflectance,” Journal of the Optical Society of America, A, 21, 1, 1–12,
2004.
[236] H.J.A. Dartnall, J.K. Bowmaker, and J.D. Mollon, “Human visual pigments: micro-
spectrophotometric results from the eyes of seven persons,” Proceedings of the
Royal Society of London, B, 220, 115–130, 1983.
[237] A. Davies and P. Samuels, An Introduction to Computational Geometry for Curves
and Surfaces, Oxford: Clarendon, 1996.
[238] G. Davis and J. Driver, “Parallel detection of Kanizsa subjective figures in the human
visual system,” Nature, 371, 791–793, 1994.
[239] P.J. Davis, Interpolation and Approximation, New York, NY: Dover, 1975.
[240] M.L. Davison, Multidimensional Scaling, Melbourne, FL: Krieger, 1993.
[241] N.W. Daw, R.J. Jensen and W.J. Brunken, “Rod pathways in mammalian retinae,”
Trends in Neuroscience, 13, 3, 110–115, 1990.
[242] R.J. Deeley, N. Drasdo and W.N. Charman, “A simple parametric model of the
human ocular modulation transfer function,” Ophthalmic and Physiological Optics,
11, 91–93, 1991.
[243] S.G. de Groot and J.W. Gebhard, “Pupil size as determined by adapting luminance,”
Journal of the Optical Society of America, 42, 492–495, 1952.
[244] P.B. Delahunt and D.H. Brainard, “Does human color constancy incorporate the
statistical regularity of natural daylight?”, Journal of Vision, 4, 57–81, 2004.
(http://journalofvision.org/4/2/1)
[245] C.J. Delcroix and M.A. Abidi, “Fusion of edge maps in color images,” Proceedings
of SPIE, 1001, 545–554, 1988.
[246] F. Deleixhe-Mauhin, J.M. Krezinski, G. Rorive, G.E. Pierard, “Quantification of skin
color in patients undergoing maintenance hemodialysis,” Journal of the American
Academy of Dermatology, 27, 6, 1, 950–953, 1992.
[247] P. DeMarco, J. Pokorny, and V.C. Smith, “Full-spectrum cone sensitivity functions
for X-chromosome-linked anomalous trichromats,” Journal of the Optical Society
of America, A, 9, 9, 1465–1476, 1992.
[248] L.E. De Marsh, “Optimum telecine transfer characteristics,” Journal of the Society
of Motion Picture and Television Engineering, 81, 784–787, 1972.
[249] W. den Boer, F.C. Luo, and Z. Yaniv, “Microelectronics in active-matrix LCDs and
image sensors,” in Electro-Optical Displays, edited by M.A. Karim, New York, NY:
Marcel Dekker, 1992.
[250] H. de Ridder, “Minkowski-metrics as a combination rule for digital-image-coding
impairments,” Proceedings of SPIE, 1616, 16–26, 1992.
[251] H. de Ridder, “Psychophysical evaluation of image quality: from judgement to
impression,” Proceedings of SPIE, 3299, 252–263, 1998.
[252] A.M. Derrington, J. Krauskopf, and P. Lennie, “Chromatic mechanisms in lateral
geniculate nucleus of macaque,” Journal of Physiology, 357, 241–265, 1984.
[253] E.A. DeYoe and D.C. Van Essen, “Concurrent processing streams in monkey visual
cortex,” Trends in Neurosciences, 11, 5, 219–226, 1988.
[254] J.M. DiCarlo and B.A. Wandell, “Illuminant estimation: beyond the bases,” in Proc.
the 8th IS&T Color Imaging Conference, 91–96, 2000.
[255] J.M. DiCarlo and B.A. Wandell, “Spectral estimation theory: beyond linear but
before Bayesian,” Journal of the Optical Society of America, A, 20, 7, 1261–1270,
2003.
[256] P. Dierckx, Curve and Surface Fitting with Splines, Oxford: Clarendon Press, 1993.
[257] R.W. Ditchburn, Eye-Movements and Visual Perception, Oxford: Clarendon Press,
1973.
[258] R.W. Ditchburn, Light, New York, NY: Dover, 1991. (Original text published in
1961.)
[259] S. Di Zenzo, “A note on the gradient of a multi-image,” Computer Vision, Graphics,
and Image Processing, 33, 116–125, 1986.
[260] D.L. Donoho and I.M. Johnstone, “Wavelet shrinkage: Asymptopia?” Journal of the
Royal Statistical Society, B, 57, 2, 301–369, 1995.
[261] A. Doroszkowski, “Paints,” in Technological Applications of Dispersions, edited by
R.B. McKay, New York, NY: Marcel Dekker, 1994.
[262] E.R. Dougherty, Digital Image Processing Methods, New York, NY: Marcel Dekker,
1994.
[263] R.F. Dougherty, V.M. Koch, A.A. Brewer, B. Fischer, J. Modersitzki, and B.A. Wan-
dell, “Visual field representations and locations of visual areas V1/2/3 in human vi-
sual cortex,” Journal of Vision, 3, 586–598, 2003. (http://journalofvision.org/3/10/1)
[264] J.E. Dowling, The Retina: An Approachable Part of the Brain, Cambridge, MA: The
Belknap Press of Harvard University Press, 1987.
[265] N. Drasdo and C.W. Fowler, “Nonlinear projection of the retinal image in a wide-
angle schematic eye,” British Journal of Ophthalmology, 58, 709–714, 1974.
[266] R.O. Dror, A.S. Willsky, and E.H. Adelson, “Statistical characterization of real-
world illumination,” Journal of Vision, 4, 821–837, 2004. (http://journalofvision.
org/4/9/11)
[267] P.M. Duffieux, The Fourier Transform and Its Applications to Optics, 2nd edition,
New York, NY: John Wiley & Sons, 1983. (Original text published in 1946.)
[268] S.Q. Duntley, “The optical properties of diffusing materials,” Journal of the Optical
Society of America, 32, 2, 61–70, 1942.
[269] D.B. Dusenbery, Sensory Ecology, New York, NY: W.H. Freeman, 1992.
[270] M. D’Zmura and G. Iverson, “Color constancy. I. Basic theory of two-stage linear
recovery of spectral descriptions for lights and surfaces,” and “Color constancy. II.
Results for two-stage linear recovery of spectral descriptions for lights and surfaces,”
Journal of the Optical Society of America, A, 10, 10, 2148–2165, and 2166–2180,
1993.
[271] Eastman Kodak Company, Encyclopedia of Practical Photography, Vol. 12 (Shutter),
Garden City, NY: American Photographic Book Publishing Company, 1979.
[272] Eastman Kodak Company, Advanced Color Printing Technology for Photofinishers
and Professional Finishers, Rochester, NY: Eastman Kodak Company, 1979.
[273] Eastman Kodak Company, Shutter Operations for CCD and CMOS Image Sensors,
Application Note MTD/PS-0259, Eastman Kodak Company, Rochester, NY,
October 23, 2001.
[274] E.A. Edwards and S.Q. Duntley, “Pigment and color in living human skin,” American
Journal of Anatomy, 65, 1–33, 1939.
[275] E.A. Edwards and S.Q. Duntley, “Analysis of skin pigment changes after exposure
to sunlight,” Science, 90, 235–237, 1939.
[276] W.G. Egan and T.W. Hilgeman, Optical Properties of Inhomogeneous Materials,
New York, NY: Academic Press, 1979.
[277] D.C. Ehn and M.B. Silevitch, “Diffusion model for the adjacency effect in viscous
development,” Journal of the Optical Society of America, 64, 5, 667–676, 1974.
[278] W. Ehrenstein, Probleme der ganzheitspsychologischen Wahrnehmungslehre,
Leipzig: J.A. Barth, 1954.
[279] N.R. Eldred, Chemistry for the Graphic Arts, 2nd edition, Pittsburgh, PA: Graphic
Arts Technical Foundation, 1992.
[280] J.M. Elson, “Theory of light scattering from a rough surface with an inhomogeneous
dielectric permittivity,” Physical Review, B, 30, 5460–5480, 1984.
[281] P.G. Engeldrum, “Psychometric scaling: avoiding the pitfalls and hazards,” in Proc.
IS&T’s 2001 PICS Conference, 101–107, 2001.
[282] P.G. Engeldrum, “Extending image quality models,” in Proc. IS&T’s 2002 PICS
Conference, 65–69, 2002.
[283] J.M. Enoch and F.L. Tobey, Jr. (eds.), Vertebrate Photoreceptor Optics, Berlin:
Springer-Verlag, 1981.
[284] K. Era, S. Shionoya, and Y. Washizawa, “Mechanism of broad-band luminescences
in ZnS phosphors,” Journal of Physics and Chemistry of Solids, 29, 1827, 1968.
[285] R.M. Evans and J. Klute, “Brightness constancy in photographic reproduction,”
Journal of the Optical Society of America, 34, 533–540, 1944.
[286] R.M. Evans, An Introduction to Color, New York, NY: John Wiley and Sons, Inc.,
1948.
[287] R.M. Evans, “On some aspects of white, gray, and black,” Journal of the Optical
Society of America, 39, 9, 774–779, 1949.
[288] R.M. Evans, Method for Correcting Photographic Color Prints, US Patent
2,571,697, Oct. 16, 1951. (The patent application was filed on June 20, 1946.)
[289] R.M. Evans, W.T. Hanson, and W.L. Brewer, Principles of Color Photography,
New York, NY: John Wiley and Sons, Inc., 1953.
[290] R.M. Evans, The Perception of Color, New York, NY: Wiley, 1974.
[291] M.D. Fairchild and P. Lennie, “Chromatic adaptation to natural and incandescent
illuminants,” Vision Research, 32, 11, 2077–2085, 1992.
[292] M.D. Fairchild, Color Appearance Models, Reading, MA: Addison-Wesley, 1997.
[293] M.D. Fairchild and D.R. Wyble, Colorimetric Characterization of the Apple Studio
Display (Flat Panel LCD), Munsell Color Science Laboratory Technical Report,
Center for Imaging Science, Rochester Institute of Technology, July, 1998.
[294] M.D. Fairchild, “A revision of CIECAM97s for practical applications,” COLOR
Research and Application, 26, 418–427, 2001.
[295] M.D. Fairchild and G.M. Johnson, “Image appearance modeling,” Proceedings of
SPIE, 5007, 149–160, 2003.
[296] H.S. Fairman and H. Hemmendinger, “Stability of ceramic color reflectance stan-
dard,” COLOR Research and Application, 23, 6, 408–415, 1998.
[297] D.S. Falk, D.R. Brill, and D.G. Stork, Seeing the Light, New York, NY: John Wiley &
Sons, 1986.
[298] J.C. Falmagne, “Psychophysical measurement and theory,” in Handbook of Percep-
tion and Human Performance, Volume 1, edited by K.R. Boff, L. Kaufman, and J.P.
Thomas, New York, NY: John Wiley & Sons, 1986.
[299] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, London:
Chapman and Hall, 1996.
[300] H. Farid, “Blind inverse gamma correction,” IEEE Transactions on Image Process-
ing, 10, 10, 1428–1433, 2001.
[301] P.A. Farrant, Color in Nature, London: Blandford, 1997.
[302] J.E. Farrell and B.A. Wandell, “Scanner linearity,” Journal of Electronic Imaging,
2, 3, 225–230, 1993.
[303] T.W. Faulkner and T.M. Rice, “The use of photographic space in the development
of the disc photographic system,” in Proceedings of the Journal of Applied Photo-
graphic Engineering, 36th Annual Meeting, 9, 2, 52–57, 1983.
[304] E.A. Fedorovskaya, H. De Ridder, and F.J.J. Blommaert, “Chroma variations and
perceived quality of color images of natural scenes,” COLOR Research and Appli-
cation, 22, 96, 1997.
[305] B. Fergg, W. Zahn, and W. Knapp, Automatic Color Printing Apparatus, US Patent
4,101,217, July 18, 1978.
[306] R.P. Feynman, R.B. Leighton, and M.L. Sands, The Feynman Lectures in Physics,
Volumes I, II, and III, Reading, MA: Addison-Wesley, 1965.
[307] R.P. Feynman, QED: The Strange Theory of Light and Matter, Princeton, NJ:
Princeton University Press, 1985.
[308] D.J. Field, “Relations between the statistics of natural images and the response
properties of cortical cells,” Journal of the Optical Society of America, A, 4, 12,
2379–2394, 1987.
[309] G.G. Field, Color and Its Reproduction, Pittsburgh, PA: Graphic Arts Technical
Foundation, 1988.
[310] G.G. Field, Tone and Color Correction, Pittsburgh, PA: Graphic Arts Technical
Foundation, 1991.
[311] G.D. Finlayson and S.D. Hordley, “Improving Gamut mapping color constancy,”
IEEE Transactions on Image Processing, 9, 10, 1774–1783, 2000.
[312] G.D. Finlayson and G. Schaefer, “Convex and non-convex illuminant constraints
for dichromatic colour constancy,” Proc. Conf. on Computer Vision and Pattern
Recognition, I, 598–604, 2001.
[313] G.D. Finlayson, S.D. Hordley, and P.M. Hubel, “Color by correlation: a simple,
unifying framework for color constancy,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, 23, 11, 1209–1221, 2001.
[314] G.D. Finlayson and S.D. Hordley, “Color constancy at a pixel,” Journal of the Optical
Society of America, A, 18, 2, 253–264, 2001.
[315] G.D. Finlayson, S.D. Hordley, and I. Tastl, “Gamut constrained illuminant estima-
tion,” Proceedings of the Ninth IEEE International Conference on Computer Vision,
792–799, 2003.
[316] A. Fiorentini, “Mach band phenomena,” in Handbook of Sensory Physiology,
Volume VII/4, edited by D. Jameson and L.M. Hurvich, pp. 188–201, Berlin:
Springer-Verlag, 1972.
[317] N.I. Fisher, “Spherical median,” Journal of the Royal Statistical Society, B, 47, 2,
342–348, 1985.
[318] N.I. Fisher, T. Lewis, and B.J.J. Embleton, Statistical Analysis of Spherical Data,
Cambridge: Cambridge University Press, 1987.
[319] R.W. Floyd and L. Steinberg, “An adaptive algorithm for spatial grayscale,” Pro-
ceedings of the Society for Information Display, 17, 2, 75–77, 1976.
[320] J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes, Computer Graphics Principles
and Practice, 2nd edition, pp. 568–573, Reading, MA: Addison-Wesley Publishing
Company, 1990.
[321] S.C. Foo, A Gonioreflectometer for Measuring the Bidirectional Reflectance of
Material for Use in Illumination Computation, Master Thesis, Cornell University,
1997.
[322] D.A. Forsyth, “A novel algorithm for color constancy,” International Journal of
Computer Vision, 5, 1, 5–36, 1990.
[323] G.E. Forsythe, M.A. Malcolm, and C.B. Moler, Computer Methods for Mathematical
Computations, Chapter 9, Englewood Cliffs, NJ: Prentice-Hall, 1977.
[324] C.E. Foss, D. Nickerson, and W.C. Granville, “Analysis of the Ostwald color system,”
Journal of the Optical Society of America, 34, 7, 361–381, 1944.
[325] E.R. Fossum, “CMOS image sensor: Electronic camera-on-a-chip,” IEEE Transac-
tions on Electron Devices, 44, 10, 1689–1698, 1997.
[326] R.L. Foster, The Nature of Enzymology, New York, NY: Halsted Press, 1980.
[327] R. Franke, “Scattered data interpolation: Tests of some methods,” Mathematics of
Computation, 38, 157, 181–200, 1982.
[328] W.T. Freeman, Method and Apparatus for Reconstructing Missing Color Samples,
US Patent 4,663,655, 1987.
[329] H. Frieser, “Spread function and contrast transfer function of photographic layers,”
Photographic Science and Engineering, 4, 6, 324–329, 1960.
[330] J. Fryer and D. Brown, “Lens distortion for close-range photogrammetry,” Pho-
togrammetric Engineering and Remote Sensing, 52, 1, 51–58, 1986.
[331] J.G. Fryer, “Camera calibration in non-topographic photogrammetry,” in Non-
Topographic Photogrammetry, 2nd edition, edited by H.M. Karara, Falls Church,
VA: American Society for Photogrammetry and Remote Sensing, 1989.
[332] B.V. Funt, V. Cardei, and K. Barnard, “Learning color constancy,” Proc. IS&T/SID
4th Color Imaging Conference, 58–60, 1996.
[333] M. Fürsich, H. Treiber, B. Fergg, G. Findeis, and W. Zahn, Method of Copying
Color Exposures, US Patent 4,566,786, January 28, 1986.
[334] R.S. Gairns, “Electrophotography,” Chapter 4 in Chemistry and Technology of Print-
ing and Imaging Systems, edited by P. Gregory, London: Blackie Academic and
Professional, 1996.
[355] W.J. Gordon and J.A. Wixom, “Shepard’s method of metric interpolation to bivariate
and multivariate interpolation,” Mathematics of Computation, 32, 141, 253–264,
1978.
[356] C.H. Graham and Y. Hsia, “Saturation and the foveal achromatic interval,” Journal
of the Optical Society of America, 59, 993–997, 1969.
[357] E.M. Granger and K.N. Cupery, “An optical merit function (SQF), which correlates
with subjective image judgements,” Photographic Science and Engineering, 16, 3,
221–230, 1972.
[358] W.C. Granville and E. Jacobson, “Colorimetric specification of the Color Harmony
Manual from spectrophotometric measurements,” Journal of the Optical Society of
America, 34, 7, 382–395, 1944.
[359] H.G. Grassmann, “Theory of compound colors,” English translation collected in
Sources of Color Science, edited by D.L. MacAdam, Cambridge MA: MIT Press,
1970, pp. 53–60.
[360] L.S. Gray, B. Winn, and B. Gilmartin, “Accommodative microfluctuations and pupil
diameter,” Vision Research, 33, 15, 2083–2090, 1993.
[361] D.M. Green and J.A. Swets, Signal Detection Theory and Psychophysics, Los Altos,
CA: Peninsula Publishing, 1988.
[362] P.J. Green and L.W. MacDonald (eds.), Colour Engineering, Chichester: John Wiley
and Sons, 2002.
[363] W.B. Green, Digital Image Processing: A System Approach, 2nd edition, New York,
NY: Van Nostrand Reinhold, 1989.
[364] N.C. Greenham and R.H. Friend, “Semiconductor device physics of conjugated
polymers,” Solid State Physics, 49, 1–149, 1995.
[365] A.R. Greenleaf, Photographic Optics, New York, NY: MacMillan Company,
1950.
[366] R.L. Gregory, Eye and Brain, 3rd edition, Princeton, NJ: Princeton University Press,
1990.
[367] D.H. Grosof, R.M. Shapley, and M.J. Hawken, “Macaque V1 neurons can signal
illusory contours”, Nature, 366, 550–552, 1993.
[368] M.D. Grossberg and S.K. Nayar, “Modeling the space of camera response functions,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, 25, 10, 1272–
1282, 2004.
[369] F. Grum and R.J. Becherer, Optical Radiation Measurements, Volume I, Radiometry,
New York, NY: Academic Press, 1979.
[370] J.P. Guilford, Psychometric Methods, 2nd edition, New York, NY: McGraw-Hill,
1954.
[371] V.W. Guillemin and S. Sternberg, Symplectic Techniques in Physics, Cambridge:
Cambridge University Press, 1984.
[372] B.K. Gunturk, Y. Altunbasak, and R.M. Mersereau, “Color plane interpolation using
alternating projection,” IEEE Transactions on Image Processing, 11, 9, 997–1013,
2002.
[373] M. Gur, “Color and brightness fade-out in the Ganzfeld is wavelength dependent,”
Vision Research, 29, 10, 1335–1341, 1989.
[374] M. Gur, “Perceptual fade-out occurs in the binocularly viewed Ganzfeld,” Perception,
20, 645–654, 1991.
[375] E.F. Gurnee and R.T. Fernandez, Organic Electroluminescent Phosphors, US Patent
3,172,862, March 9, 1965.
[376] J. Hagerman, “Optimum spot size for raster-scanned monochrome CRT displays,”
Journal of the Society for Information Display, 1/3, 367–369, 1993.
[377] R. Hailstone, “Making every photon count,” Nature, 402, 23, 856–857, 1999.
[378] J.R. Hamerly and C.A. Dvorak, “Detection and discrimination of blur in edges and
lines,” Journal of the Optical Society of America, 71, 448–452, 1981.
[379] W.T. Hanson Jr. and C.A. Horton, “Subtractive color reproduction: interimage ef-
fects,” Journal of the Optical Society of America, 42, 9, 663–669, 1952.
[380] R.M. Haralick, “Edge and region analysis for digital image data,” Computer Vision,
Graphics, and Image Processing, 12, 60–73, 1980.
[381] W.A. Harrison, Solid State Theory, New York, NY: Dover, 1979.
[382] T. Hase, T. Kano, E. Nakazawa, and H. Yamamoto, “Phosphor materials for cathode-
ray tubes,” Advances in Electronics and Electron Physics, 79, 271–373, 1990.
[383] S. Hattar, H.-W. Liao, M. Takao, D.M. Berson, and K.-W. Yau, “Melanopsin-
containing retinal ganglion cells: architecture, projections, and intrinsic photosen-
sitivity,” Science, 295, 1065–1070, 2002.
[384] X.D. He, K.E. Torrance, F.X. Sillion, and D.P. Greenberg, “A comprehensive phys-
ical model of light reflection,” Computer Graphics (SIGGRAPH ’91 Conference
Proceedings) 25, 4, 175–186, 1991.
[385] E. Hecht, Optics, 2nd edition, Reading, MA: Addison-Wesley Publishing Company,
1987.
[386] M. Hebert and R.D. Hersch, “Classical print reflection models: a radiometric ap-
proach,” Journal of Imaging Science and Technology, 48, 4, 363–374, 2004.
[387] S. Hecht, S. Shlaer, and M.H. Pirenne, “Energy, quanta, and vision,” Journal of
General Physiology, 25, 819–840, 1942.
[388] E.G. Heinemann, “Simultaneous brightness induction,” in Handbook of Sensory
Physiology, VII/4, Visual Psychophysics, edited by D. Jameson and L.M. Hurvich,
pp. 146–169, Berlin: Springer-Verlag, 1972.
[389] H. Helson, “Fundamental principles in color vision. I. The principle governing
changes in hue, saturation, and lightness of non-selective samples in chromatic
illumination,” Journal of Experimental Psychology, 23, 439–471, 1938.
[390] H. Helson, Adaptation-Level Theory, New York, NY: Harper and Row, 1964.
[391] B. Henderson and G.F. Imbusch, Optical Spectroscopy of Inorganic Solids, Oxford:
Clarendon Press, 1989.
[392] S.H.C. Hendry and T. Yoshioka, “A neurochemically distinct third channel in the
Macaque dorsal lateral geniculate nucleus,” Science, 264, 575–577, 1994.
[393] S.H.C. Hendry and R.C. Reid, “The koniocellular pathway in primate vision,” Annual
Reviews: Neurosciences, 23, 127–153, 2000.
[394] M. Henle, Modern Geometry: The Analytic Approach, Upper Saddle River, NJ:
Prentice Hall, 1997.
[395] R.T. Hennessy, T. Iida, K. Shina, and H.W. Leibowitz, “The effect of pupil size on
accommodation,” Vision Research, 16, 587–589, 1976.
[396] E. Hering, Outlines of a Theory of the Light Sense, translation by L.M. Hurvich and
D. Jameson, Cambridge, MA: Harvard University Press, 1964. (Originally published
in 1920.)
[397] L. Hermann, “Eine Erscheinung des simultanen Kontrastes,” Pflügers Archiv für die
gesamte Physiologie, 3, 13–15, 1870.
[398] E.H. Hess, “Attitude and pupil size,” Scientific American, 212, 4, 44–54, 1965.
[399] S. Hesselgren, “Why color order systems,” Color Research and Application, 9, 4,
220–228, 1984.
[400] G.C. Higgins and L.A. Jones, “The nature and evaluation of the sharpness of pho-
tographic images,” Journal of the SMPTE, 58, 277–290, 1952.
[401] G.C. Higgins, “Methods for analyzing the photographic system, including the ef-
fects of nonlinearity and spatial frequency response,” Photographic Science and
Engineering, 15, 2, 106–118, 1971.
[402] F.B. Hildebrand, Advanced Calculus for Applications, 2nd edition, Englewood Cliffs,
NJ: Prentice-Hall, 1976.
[403] D.D. Hoffman, Visual Intelligence, New York, NY: W.W. Norton and Company,
1998.
[404] H. Hogrefe and C. Kunz, “Soft x-ray scattering from rough surfaces: experimental
and theoretical analysis,” Applied Optics, 26, 14, 2851–2859, 1987.
[405] T.M. Holladay, “An optimum algorithm for halftone generation for displays and hard-
copies,” Proceedings of the Society for Information Display, 21, 2, 185–192, 1980.
[406] A. Holloway, The Handbook of Photographic Equipment, pp. 11, 21, New York,
NY: Alfred A. Knopf, Inc., 1981.
[407] J.M. Holm, Pictorial Digital Image Processing Incorporating Adjustments to Com-
pensate for Dynamic Range Differences, US Patent 6,628,823, September 30, 2003.
[408] R.T. Holm, “Convention confusion,” in Handbook of Optical Constants of Solids II,
pp. 21–55, edited by E.D. Palik, San Diego, CA: Academic Press, 1991.
[409] D.C. Hood, T. Ilves, E. Maurer, B. Wandell, and E. Buckingham, “Human cone
saturation as a function of ambient intensity: A test of models of shifts in the dynamic
range,” Vision Research, 18, 983–993, 1978.
[410] H.H. Hopkins, “The frequency response of a defocused optical system,” Proceedings
of the Royal Society of London, A, 231, 91–103, 1955.
[411] G.R. Hopkinson, “Analytic modeling of charge diffusion in charge-coupled-device
imagers,” Optical Engineering, 26, 8, 766–772, 1987.
[412] B.K.P. Horn and R.W. Sjoberg, “Calculating the reflectance map,” Applied Optics,
18, 11, 1770–1779, 1979.
[413] B.K.P. Horn, “Exact reproduction of colored images,” Computer Vision, Graphics,
and Image Processing, 26, 135–167, 1984.
[414] P. Horowitz and W. Hill, The Art of Electronics, 2nd edition, Cambridge: Cambridge
University Press, 1989.
[415] W.E. Howard, “Better displays with organic films,” Scientific American, 290, 2,
76–81, 2004.
[416] J. Huang and D. Mumford, “Statistics of natural images and models,” in Proc. IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 1, 541–
547, 1999.
[417] J.S. Huang and D.H. Tseng, “Statistical theory of edge detection,” Computer Vision,
Graphics, and Image Processing, 43, 337–346, 1988.
[418] D.H. Hubel, Eye, Brain, and Vision, Scientific American Library, New York, NY:
W.H. Freeman, 1988.
[419] M.H. Hueckel, “An operator which locates edges in digitized pictures,” Journal of
the Association for Computing Machinery, 18, 1, 113–125, 1971.
[420] M.E. Hufford and H.T. Davis, “The diffraction of light by a circular opening and
the Lommel wave theory,” Physical Review, 33, 589–597, 1929.
[421] J. Hughes and J.K. Bowker, “Automatic color printing techniques,” Image Technol-
ogy, 39–43, April/May 1969.
[422] P.-C. Hung, “Colorimetric calibration in electronic imaging devices using a look-
up-table model and interpolation,” Journal of Electronic Imaging, 2, 1, 53–61,
1993.
[423] P.-C. Hung, “Smooth colorimetric calibration technique utilizing the entire color
gamut of CMYK printers,” Journal of Electronic Imaging, 3, 4, 415–424, 1994.
[424] P.-C. Hung, “Camera sensitivity evaluation and primary optimization considering
color constancy,” Proc. IS&T/SID Tenth Color Imaging Conference, 127–132, 2002.
[425] R.W.G. Hunt, “The effects of daylight and tungsten light-adaptation on color per-
ception,” Journal of the Optical Society of America, 40, 6, 362–371, 1950.
[426] R.W.G. Hunt, “Light and dark adaptation and the perception of color,” Journal of
the Optical Society of America, 42, 190–199, 1952.
[427] R.W.G. Hunt, I.T. Pitt, and P.C. Ward, “The tone reproduction of colour photographic
materials,” The Journal of Photographic Science, 17, 198–204, 1969.
[428] R.W.G. Hunt, I.T. Pitt, and L.M. Winter, “The preferred reproduction of blue sky,
green grass and Caucasian skin,” The Journal of Photographic Science, 22, 144–150,
1974.
[429] R.W.G. Hunt, The Reproduction of Colour in Photography, Printing, and Television,
4th edition, pp. 441–442, Tolworth: Fountain Press, 1987.
[430] R.W.G. Hunt, Measuring Colour, Chichester: Ellis Horwood Limited, 1987; 2nd
edition, 1991.
[431] R.W.G. Hunt, Measuring Colour, 3rd edition, Kingston-upon-Thames: Fountain
Press, 1998.
[432] R.W.G. Hunt, “Revised colour-appearance model for related and unrelated colours,”
COLOR Research and Application, 16, 3, 146–165, 1991.
[433] R.W.G. Hunt, The Reproduction of Colour in Photography, Printing, and Television,
5th edition, Tolworth: Fountain Press, 1995.
[434] A. Hurlbert, “Formal connections between lightness algorithms,” Journal of the Op-
tical Society of America, A, 3, 10, 1684–1693, 1986. “Synthesizing a color algorithm
from examples,” Science, 239, 482–485, 1988.
[435] A. Hurlbert, The Computation of Color, Ph.D. Thesis, Department of Brain and Cog-
nitive Science, Massachusetts Institute of Technology, Cambridge, MA, September
1989.
[436] L.M. Hurvich, Color Vision, Sunderland, MA: Sinauer Associates Inc., 1981.
[437] M.C. Hutley, Diffraction Gratings, London: Academic Press, 1982.
[438] J.K. IJspeert, T.J.T.P. van den Berg, and H. Spekreijse, “An improved mathematical
description of the foveal visual point spread function with parameters for age, pupil
size, and pigmentation,” Vision Research, 33, 1, 15–20, 1993.
[439] I. Inoue, N. Tanaka, H. Yamashita, T. Yamaguchi, H. Ishiwata, and H. Ihara, “Low-
leakage-current and low-operating-voltage buried photodiode for a CMOS imager,”
IEEE Transactions on Electron Devices, 50, 1, 43–47, 2003.
[440] M. Inui, “A fast algorithm for computing the colour gamuts of subtractive colour
mixtures,” The Journal of Photographic Science, 38, 163–164, 1990.
[441] M. Inui, “A fast algorithm computing optimal colors” (in Japanese), Journal of
the Society of Photographic Science and Technology of Japan, 57, 6, 420–423,
1994.
[442] A. Ishihara, Condensed Matter Physics, New York, NY: Oxford University Press,
1991.
[443] A. Ishii, “Color management technology for digital film mastering,” in Proc.
IS&T/SID Eleventh Color Imaging Conference, 319–325, 2003.
[444] ISO (International Organization for Standardization), Photography – Density Mea-
surements, Part 1, Terms, Symbols and Notations, ISO 5/1-1984(E/F/R); Part 2,
Geometric Conditions for Transmission Density, ISO 5-2:1991(E); Part 3, Spectral
Conditions, ISO 5/3-1984(E); Part 4, Geometric Conditions for Reflection Density,
ISO 5/4-1983(E).
[445] ISO (International Organization for Standardization), Photography – Electronic Still
Picture Cameras – Resolution Measurements, ISO 12233:2000, 2000.
[446] ISO (International Organization for Standardization), Graphic Technology – Pre-
press Digital Data Exchange – Colour Targets for Input Scanner Calibration, ISO
12641:1997, 1997.
[447] ISO (International Organization for Standardization), Photography – Electronic Still
Picture Cameras – Methods for Measuring Opto-electronic Conversion Functions
(OECFs), ISO 14524:2000, 2000.
[448] ISO (International Organization for Standardization), Guide to the Expression of
Uncertainty in Measurement, 1995.
[449] M. Ito and A. Ishii, “A non-iterative procedure for rapid and precise camera cali-
bration,” Pattern Recognition, 27, 2, 301–310, 1994.
[450] M. Ito and N. Katoh, “Three-dimensional gamut mapping using various color dif-
ference formulae and color spaces,” Proceedings of SPIE, 3648, 83–95, 1999.
[451] S. Itoh and M. Tanaka, “Current status of field-emission displays,” Proceedings of
the IEEE, 90, 4, 514–520, 2002.
[452] N.G. Jablonski and G. Chaplin, “Skin deep,” Scientific American, 287, 4, 74–81,
2002.
[453] J.D. Jackson, Classical Electrodynamics, 2nd edition, New York, NY: John Wiley
& Sons, 1975.
[454] G.H. Jacobs and M.P. Rowe, “Evolution of vertebrate colour vision,” Clinical and
Experimental Optometry, 87, 4–5, 206–216, 2004.
[455] R.E. Jacobson, “An evaluation of image quality metrics,” The Journal of Photo-
graphic Science, 43, 7–16, 1995.
[456] B. Jähne, Digital Image Processing: Concepts, Algorithms, and Scientific Applica-
tions, 4th edition, Berlin: Springer-Verlag, 1997.
[457] A.K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, NJ:
Prentice-Hall, 1989.
[458] J. Jakubowski, “Scanner performance from edge slope and cutoff frequencies,” Op-
tical Engineering, 35, 7, 1993–2004, 1996. Errata: 36, 8, 2361, 1997.
[459] T.H. James (ed.), The Theory of the Photographic Process, 4th edition, New York,
NY: Macmillan, 1977.
[460] J.R. Janesick, T. Elliott, S. Collins, M.M. Blouke, and J. Freeman, “Scientific charge-
coupled devices,” Optical Engineering, 26, 8, 692–714, 1987.
[461] J.R. Janesick, Scientific Charge-Coupled Devices, Bellingham, WA: SPIE Press,
2001.
[462] J.R. Janesick, “Lux transfer: Complementary metal oxide semiconductors versus
charge-coupled devices,” Optical Engineering, 41, 6, 1203–1215, 2002.
[463] R. Janssen, Computational Image Quality, Bellingham, WA: SPIE Press, 2001.
[464] J.R. Jarvis, “The calculation of sharpness parameters for color negative materials
incorporating DIR coupler,” in Photographic and Electronic Image Quality, Royal
Photographic Symposium, University of Cambridge, September 1984, pp. 10–20,
1984.
[465] J.A.M. Jennings and W.N. Charman, “An analytical approximation for the modula-
tion transfer function of the eye,” British Journal of Physiological Optics, 29, 64–72,
1974.
[466] H.W. Jensen, S.R. Marschner, M. Levoy, and P. Hanrahan, “A practical model for
subsurface light transport,” ACM Proceedings of SIGGRAPH 2001, 511–518, 2001.
[467] J. Jin, G.J. Jones, and M.C. Cornwall, “Movement of retinal along cone and rod
photoreceptors,” Visual Neuroscience, 11, 389–399, 1994.
[468] C.A. Johnson, “Effects of luminance and stimulus distance on accommodation and
visual resolution,” Journal of the Optical Society of America, 66, 138–142, 1976.
[469] T. Johnson, Colour Management in Graphic Arts and Publishing, Leatherhead: Pira
International, 1996.
[470] L.A. Jones, “The evaluation of negative film speeds in terms of print quality,” Journal
of the Franklin Institute, 227, 3, 297–354 and 497–544, 1939.
[471] L.A. Jones and H.R. Condit, “The brightness scale of exterior scenes and the compu-
tation of correct photographic exposure,” Journal of the Optical Society of America,
31, 651–678, 1941.
[472] L.A. Jones and C.N. Nelson, “The control of photographic printing by measured
characteristics of the negative,” Journal of the Optical Society of America, 32, 558–
619, 1942.
[473] L.A. Jones and H.R. Condit, “Sunlight and skylight as determinants of photographic
exposure. Part I,” Journal of the Optical Society of America, 38, 123–178, 1948;
Part II, 39, 94–135, 1949.
[474] P.R. Jones, “Evolution of halftoning technology in the United States patent litera-
ture,” Journal of Electronic Imaging, 3, 3, 257–275, 1994.
[475] D.B. Judd, “Hue, saturation, and lightness of surface colors with chromatic illumi-
nation,” Journal of the Optical Society of America, 30, 2–32, 1940.
[476] D.B. Judd, D.L. MacAdam, and G.W. Wyszecki, “Spectral distribution of typical daylight
as a function of correlated color temperature,” Journal of the Optical Society of
America, 54, 1031–1040, 1964.
[477] D.B. Judd and G. Wyszecki, Color in Business, Science, and Industry, 3rd edition,
New York, NY: John Wiley and Sons, 1975.
[478] G. Kaiser, A Friendly Guide to Wavelets, Boston, MA: Birkhäuser, 1994.
[479] P.K. Kaiser, “Photometric measurements,” Chapter 11 in Optical Radiation Mea-
surements, Volume 5, edited by C.J. Bartleson and F. Grum, Orlando, FL: Academic
Press, 1984.
[480] P.K. Kaiser and R.M. Boynton, Human Color Vision, 2nd edition, Washington, DC:
Optical Society of America, 1996.
[481] K. Kanamori, “Interpolation errors on gray gradations caused by the three-
dimensional lookup table method,” Journal of Electronic Imaging, 10, 2, 431–444,
2001.
[482] K. Kanatani, Geometric Computation for Machine Vision, Oxford: Clarendon Press,
1993.
[483] H.R. Kang, Color Technology for Electronic Imaging Devices, Bellingham, WA:
SPIE Press, 1997.
[484] H.R. Kang, Digital Color Halftoning, Bellingham, WA: SPIE Press, 1999.
[485] G. Kanizsa, “Margini quasi-percettivi in campi con stimolazione omogenea,” Rivista
di Psicologia, 49, 7–30, 1955.
[486] M. Kaplan, “Monte Carlo calculation of light distribution in an integrating cavity
illuminator,” Proceedings of SPIE, 1448, 206–217, 1991.
[487] H.M. Karara (ed.), Non-Topographic Photogrammetry, 2nd edition, Falls Church,
VA: American Society for Photogrammetry and Remote Sensing, 1989.
[488] J.M. Kasson, S.I. Nin, W. Plouffe, and J.L. Hafner, “Performing color space conver-
sions with three-dimensional linear interpolation,” Journal of Electronic Imaging,
4, 3, 226–250, 1995.
[489] J. Katajamäki and H. Saarelma, “Objective quality potential measures of natural
color images,” Journal of Imaging Science and Technology, 42, 3, 250–263, 1998.
[490] A.H. Katz, “Camera shutters,” Journal of the Optical Society of America, 39, 1,
1–21, 1949.
[491] H. Kawamoto, “The history of liquid-crystal displays,” Proceedings of the IEEE,
90, 4, 460–500, 2002.
[492] S. Kawamura, “Capturing images with digital still cameras,” IEEE Micro, 18, 6,
14–19, 1998.
[493] V. Kayargadde and J.-B. Martens, “Estimation of edge parameters and image blur
using polynomial transforms,” CVGIP: Graphical Models and Image Processing, 56,
6, 442–461, 1994.
[494] B.W. Keelan, Handbook of Image Quality, New York, NY: Marcel Dekker, 2002.
[495] K. Keller (ed.), Science and Technology of Photography, Weinheim: VCH, 1993.
[496] P.A. Keller, Electronic Display Measurement: Concepts, Techniques and Instrumen-
tation, New York, NY: John Wiley & Sons, 1997.
[497] E.F. Kelley and G.R. Jones, “Utilizing the bi-directional reflection distribution func-
tion to predict reflections from FPDs,” SID 97 Digest, 831–834, 1997.
[498] D.H. Kelly, “Systems analysis of the photographic process. I. A three-stage model,”
Journal of the Optical Society of America, 50, 3, 269–276, 1960.
[499] D.H. Kelly, “Systems analysis of the photographic process. II. Transfer func-
tion measurements,” Journal of the Optical Society of America, 51, 3, 319–330,
1961.
[500] D.H. Kelly, Visual Science and Engineering: Models and Applications, New York,
NY: Marcel Dekker, 1994.
[501] J.R. Kender, “Instabilities in color transformations,” in Proc. IEEE Conference on
Pattern Recognition and Image Processing, Troy, NY: Rensselaer Polytechnic
Institute, pp. 266–274, 1977.
[502] R.W. Kenyon, “Ink jet printing,” in Chemistry and Technology of Printing and
Imaging Systems, edited by P. Gregory, London: Blackie Academic and Professional,
1996.
[503] I.A. Khinchine, Mathematical Foundations of Statistical Mechanics, New York, NY:
Dover, 1949.
[504] I.-C. Khoo, Liquid Crystals: Physical Properties and Nonlinear Optical Phenomena,
New York, NY: John Wiley & Sons, 1995.
[505] J.-Y. Kim, Y.-S. Seo, and T.-H. Ha, “Estimation of illuminant chromaticity from
single color image using perceived illumination and highlight,” Journal of Imaging
Science and Technology, 45, 3, 274–282, 2001.
[506] R. Kimmel, “Demosaicing: image reconstruction from CCD samples,” IEEE Trans-
actions on Image Processing, 8, 1221–1228, 1999.
[507] R. Kingslake, “Illumination in optical images,” in Applied Optics and Optical
Engineering, Vol. II, p. 195, New York, NY: Academic Press, 1965.
[508] R. Kingslake, Lens Design Fundamentals, San Diego, CA: Academic Press,
1978.
[509] R. Kingslake, Optics in Photography, Bellingham, WA: SPIE Optical Engineering
Press, 1992.
[510] M. Kline and I.W. Kay, Electromagnetic Theory and Geometrical Optics, New York,
NY: Wiley (Interscience), 1965.
[511] H. Kipphan (ed.), Handbook of Print Media, Berlin: Springer, 2001.
[512] M.V. Klein and T.E. Furtak, Optics, 2nd edition, New York, NY: John Wiley and
Sons, 1986.
[513] S. Kleinfelder, S.H. Lim, X. Liu, and A. El Gamal, “A 10000 frames/s CMOS digital
pixel sensor,” IEEE Journal of Solid-State Circuits, 36, 12, 2049–2059, 2001.
[514] J.A. Kneisly, III., “Local curvature of wavefronts in an optical system,” Journal of
the Optical Society of America, 54, 229–235, 1964.
[515] K. Knoblauch, F. Vital-Durand, and J.L. Barbur, “Variation of chromatic sensitivity
across the life span,” Vision Research, 41, 23–36, 2001.
[516] D.E. Knuth, “Digital halftones by dot diffusion,” ACM Transactions on Graphics,
6, 4, 245–273, 1987.
[517] S. Kobayashi, “An LCD in the multimedia network age: polymer stabilized FLCD,”
Proceedings of the 18th International Display Research Conference, Asia Display
’98, 11–14, 1998.
[518] Kodak Limited, The Complete Kodak Book of Photography, Ann Arbor, MI: Lowe
& B. Hould Publishers, 1994.
[519] J.J. Koenderink, “Color atlas theory,” Journal of the Optical Society of America, A,
4, 7, 1314–1321, 1987.
[520] H. Kolb, “Anatomical pathways for color vision in the human retina,” Visual Neu-
roscience, 7, 61–74, 1991.
[521] H. Kolb and L. Dekorver, “Midget ganglion cells of the parafovea of human retina:
a study by electron microscopy and serial section reconstructions,” The Journal of
Comparative Neurology, 303, 617–636, 1991.
[522] H. Kondo, Y. Chiba, and T. Yoshida, “Veiling glare in photographic systems,” Optical
Engineering, 21, 2, 343–346, 1982.
[523] K. Kondo, K. Kinugawa, N. Konishi, and H. Kawakami, “Wide-viewing-angle dis-
plays with in-plane switching mode of nematic LCs addressed by 13.3-in. XGA
TFTs,” SID 96 Digest, 81–84, 1996.
[524] S. Konishi, A.L. Yuille, J.M. Coughlan, and S.C. Zhu, “Statistical edge detection:
learning and evaluating edge cues,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, 25, 1, 57–74, 2003.
[525] H.J. Kostkowski, Reliable Spectroradiometry, La Plata, MD: Spectroradiometry
Consulting, 1997.
[526] E. Kowler, “The stability of gaze and its implication for vision,” in Eye Movements,
edited by R.H.S. Carpenter, Boca Raton, FL: CRC Press, 1991.
[527] W. Kraft and W.R. von Stein, Exposure Control Process and Photographic Color
Copying Apparatus, US Patent 5,016,043, May 14, 1991.
[528] J. Krauskopf, “Effect of retinal image stabilization on the appearance of heterochro-
matic targets,” Journal of the Optical Society of America, 53, 741–744, 1963.
[529] Y.A. Kravtsov and L.A. Apresyan, “Radiative transfer: new aspects of the old theory,”
in Progress in Optics, Volume XXXVI, edited by E. Wolf, Amsterdam: Elsevier,
1996.
[530] H. Krim, D. Tucker, S. Mallat, and D. Donoho, “On denoising and best signal
representation,” IEEE Transactions on Information Theory, 45, 7, 2225–2238, 1999.
[531] E.L. Krinov, Spectral Reflectance Properties of Natural Formations, translated by G.
Belkov, National Research Council of Canada, Technical Translation TT-439, 1953.
[532] M.A. Kriss, C.N. Nelson, and F.C. Eisen, “Modulation transfer function in photo-
graphic systems containing development adjacency effects,” Photographic Science
and Engineering, 18, 2, 131–138, 1974.
[533] M.A. Kriss, “Image Structure,” in The Theory of the Photographic Process, 4th
edition, edited by T.H. James, pp. 547–552, New York, NY: Macmillan Publishing
Co., 1977.
[534] A.I. Krymski, N.E. Bock, N. Tu, D. Van Blerkom, and E.R. Fossum, “A high-speed,
240-frames/s, 4.1-Mpixel CMOS sensor,” IEEE Transactions on Electron Devices,
50, 1, 130–135, 2003.
[535] D.T. Kuan, A.A. Sawchuk, T.C. Strand, and P. Chavel, “Adaptive noise smoothing
filter for image with signal-dependent noise,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, 7, 165–177, 1985.
[536] P. Kubelka and F. Munk, “Ein Beitrag zur Optik der Farbanstriche,” Z. Techn. Physik,
12, 593–601, 1931.
[537] P. Kubelka, “New contributions to the optics of intensely light-scattering materials,
Part I,” Journal of the Optical Society of America, 38, 49, 1067, 1948.
[538] P. Kubelka, “New contributions to the optics of intensely light-scattering materials,
Part II. Nonhomogeneous layers,” Journal of the Optical Society of America, 44,
330, 1954.
[539] R.G. Kuehni, Color: An Introduction to Practice and Principles, New York, NY:
John Wiley & Sons, 1997.
[540] J.J. Kulikowski, V. Walsh, and I.J. Murray (eds.), Vision and Visual Dysfunction,
Volume 5, Limits of Vision, Boca Raton, FL: CRC Press, 1991.
[541] S. Kumar, Liquid Crystals: Experimental Study of Physical Properties and Phase
Transitions, New York, NY: Cambridge University Press, 2001.
[542] Y. Kume, N. Yamada, S. Kozaki, H. Kisishita, F. Funada, and M. Hijikigawa, “Ad-
vanced ASM mode: Improvement of display performance by using a negative-
dielectric liquid crystal,” SID 98 Digest, 1089–1092, 1998.
[543] R.L. Lamberts, “Measurement of sine-wave response of a photographic emulsion,”
Journal of the Optical Society of America, 49, 5, 425–428, 1959.
[544] R.L. Lamberts, “Sine-wave response techniques in photographic printing,” Journal
of the Optical Society of America, 51, 9, 982–987, 1961.
[545] P. Lancaster and K. Šalkauskas, Curve and Surface Fitting, London: Academic
Press, 1986.
[546] E.H. Land and J.J. McCann, “Lightness and retinex theory,” Journal of the Optical
Society of America, 61, 1–11, 1971.
[547] E.H. Land, “The retinex theory of color vision,” Scientific American, 237, 6, 108–
129, 1977.
[548] M.S. Langer, “Large-scale failures of f^−α scaling in natural image spectra,” Journal
of the Optical Society of America, A, 17, 1, 28–33, 2000.
[549] G.N. LaRossa and H.-C. Lee, Digital Image Processing Method for Edge Shaping,
US Patent 6,611,627, August 26, 2003.
[550] D.L. Lau, R. Ulichney, and G.R. Arce, “Blue- and green-noise halftoning models,”
IEEE Signal Processing Magazine, 28–38, July 2003.
[551] J.P. Lavine, E.A. Trabka, B.C. Burkey, T.J. Tredwell, E.T. Nelson, and C. Anagnos-
topoulos, “Steady-state photocarrier collection in silicon imaging devices,” IEEE
Transactions on Electron Devices, 30, 9, 1123–1134, 1983.
[552] C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Englewood Cliffs,
NJ: Prentice-Hall, 1974. (Reprinted with corrections and a new appendix by SIAM,
Philadelphia, 1995.)
[553] H.P. Le, “Progress and trends in ink-jet printing technology,” Journal of Imaging
Science and Technology, 42, 1, 49–62, 1998.
[554] J.C. Leachtenauer, Electronic Image Display: Equipment Selection and Operation,
Bellingham, WA: SPIE Press, 2003.
[555] M. Leadbeater, “Polymers shine the light,” SPIE’s OE Magazine, 14–17, June
2002.
[556] B.B. Lee, J. Pokorny, V.C. Smith, P.R. Martin, and A. Valbergt, “Luminance and
chromatic modulation sensitivity of macaque ganglion cells and human observers,”
Journal of the Optical Society of America, A, 7, 12, 2223–2236, 1990.
[557] B.B. Lee, D.M. Dacey, V.C. Smith, and J. Pokorny, “Horizontal cells reveal cone
type-specific adaptation in primate retina,” Proc. of National Academy of Science,
USA, 96, 25, 14611–14616, 1999.
[558] C. Lee, M. Eden, and M. Unser, “High-quality image resizing using oblique projec-
tion operators,” IEEE Transactions on Image Processing, 7, 5, 679–692, 1998.
[559] D. Lee, “Coping with discontinuities in computer vision: their detection, classifi-
cation, and measurement,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 12, 4, 321–344, 1990.
[560] H.-C. Lee, “Method for computing the scene-illuminant chromaticity from spec-
ular highlights,” Journal of the Optical Society of America, A, 3, 10, 1694–1699,
1986.
[561] H.-C. Lee, Digital Color Image Processing Method Employing Constrained Cor-
rection of Color Reproduction Function, US Patent, 4,663,663, May 5, 1987.
[562] H.-C. Lee, Estimating the Illuminant Color from the Shading of a Smooth Sur-
face, MIT AI Memo 1068, Massachusetts Institute of Technology, Cambridge, MA,
August 1988.
[563] H.-C. Lee, E.J. Breneman, and C.P. Schulte, “Modeling light reflection for computer
color vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 12,
4, 402–409, 1990.
[564] H.-C. Lee, “A review of image-blur models in a photographic system using the
principles of optics,” Optical Engineering, 29, 5, 405–421, 1990.
[565] H.-C. Lee, “A computational model for opponent color encoding,” in Advanced
Printing of Conference Summaries, SPSE’s 43rd Annual Conference, Rochester,
New York, May 1990, pp. 178–181, 1990.
[566] H.-C. Lee, “Chromatic edge detection: Idealization and reality,” International Jour-
nal of Imaging Systems and Technology, 2, 251–266, 1990.
[567] H.-C. Lee and D.R. Cok, “Detecting boundaries in a vector field,” IEEE Transactions
on Signal Processing, 39, 5, 1181–1194, 1991.
[568] H.-C. Lee, H.-J. Lee, H. Kwon, and J. Liang, Noise Suppression Algorithm Using
Singular Value Decomposition, US Patent 5,010,504, April 23, 1991.
[569] H.-C. Lee, “A physics-based color encoding model for images of natural scenes,”
in Proceedings of the Conference on Modern Engineering and Technology, Electro-
Optics Session, Taipei, Taiwan, Dec. 6–15, pp. 25–52, 1992.
[570] H.-C. Lee and R.M. Goodwin, “Colors as seen by humans and machines,” in Final
Program and Advance Printing Papers of the IS&T’s 47th Annual Conference, 401–
405, 1994.
[571] H.-C. Lee, “Color image quantization based on physics and psychophysics,” Journal
of the Society of Photographic Science and Technology of Japan, 59, 1, 212–225,
1996.
[572] H.-C. Lee, L.L. Barski, and R.A. Senn, Automatic Tone Scale Adjustment Using
Image Activity Measures, US Patent 5,633,511, May 27, 1997.
[573] H.-C. Lee, S. Daly, and R.L. Van Metter, “Visual optimization of radiographic tone
scale,” Proceedings of SPIE, 3036, 118–129, 1997.
[574] H.-C. Lee and H. Kwon, Method for Estimating and Adjusting Digital Image Con-
trast, US Patent 5,822,453, October 13, 1998.
[575] H.-C. Lee, “Internet Color Imaging,” Proceedings of SPIE, 4080, 122–135, 2000.
[576] H.-C. Lee, Tone Scale Processing Based on Image Modulation Activity, US Patent
No. 6,717,698, Apr. 6, 2004.
[577] J.-S. Lee, “Refined filtering of image noise using local statistics,” Computer Graphics
and Image Processing, 15, 380–389, 1981.
[578] J.-S. Lee, “Digital image smoothing and the sigma filter,” Computer Vision, Graph-
ics, and Image Processing, 24, 2, 255–269, 1983.
[579] J.S. Lee, J. Shah, M.E. Jernigan, and R.I. Hornsey, “Characterization and deblur-
ring of lateral crosstalk in CMOS image sensors,” IEEE Transactions on Electron
Devices, 50, 12, 2361–2368, 2003.
[580] Y. Le Grand and S.G. El Hage, Physiological Optics, Berlin: Springer-Verlag, 1980.
[581] D. Leibovici and R. Sabatier, “A singular value decomposition of a k-way array for
a principal component analysis of multiway data, PTA-k,” Linear Algebra and Its
Applications, 269, 307–329, 1998.
[582] H. Leibowitz, “The effect of pupil size on visual acuity for photometrically equated
test fields at various levels of luminance,” Journal of the Optical Society of America,
42, 6, 416–422, 1952.
[583] H. Leibowitz, N.A. Myers, and P. Chinetti, “The role of simultaneous contrast in
brightness constancy,” Journal of Experimental Psychology, 50, 1, 15–18, 1955.
[584] H.W. Leibowitz and D.A. Owens, “New evidence for the intermediate position of
relaxed accommodation,” Documenta Ophthalmologica, 46, 133–147, 1978.
[585] P. Lennie, “Parallel visual pathways: a review,” Vision Research, 20, 561–594, 1980.
[586] P. Lepoutre, The Structure of Paper Coatings: An Update, Atlanta, GA: TAPPI Press,
1989.
[587] A.G. Leventhal, Y. Wang, M.T. Schmolesky, and Y. Zhou, “Neural correlates of
boundary perception,” Visual Neuroscience, 15, 1107–1118, 1998.
[588] L. Levi and R.H. Austing, “Tables of the modulation transfer function of a defocused
perfect lens,” Applied Optics, 7, 5, 967–974, 1968.
[589] A. Lewis and L.V. Del Priore, “The biophysics of visual photoreception,” Physics
Today, 38–46, January, 1988.
[590] M. Li and J.-M. Lavest, “Some aspects of zoom lens camera calibration,” IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 18, 11, 1105–1110, 1996.
[591] J. Liang, D.R. Williams, and D.T. Miller, “Supernormal vision and high-resolution
retinal imaging through adaptive optics,” Journal of the Optical Society of America,
A, 14, 11, 2873–2883, 1997.
[592] S. Liebes, Jr., “Brightness – On the ray invariance of B/n²,” American Journal of
Physics, 37, 9, 932–934, 1969.
[593] W. Liekens, “A modified chemical spread function concept for the prediction of
densities in fine detail in photographic images,” in Photographic Image Quality,
Royal Photographic Symposium, Oxford, September, pp. 8–14, 1980.
[594] J.S. Lim, Two-Dimensional Signal and Image Processing, Englewood Cliffs, NJ:
Prentice-Hall, 1990.
[595] J.O. Limb, “Distortion criteria of the human viewer,” IEEE Transactions on Systems,
Man, and Cybernetics, 9, 12, 778–793, 1979.
[596] C.-S. S. Lin, M.P. Mathur, and M.-C. F. Chang, “Analytical charge collection and
MTF model for photodiode-based CMOS imagers,” IEEE Transactions on Electron
Devices, 49, 5, 754–761, 2002.
[597] S. Lin, J. Gu, S. Yamazaki, and H.-Y. Shum, “Radiometric calibration from a single im-
age,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 938–945, 2004.
[598] R.G. Littlejohn and R. Winston, “Corrections to classical radiometry,” Journal of
the Optical Society of America, A, 10, 9, 2024–2037, 1993.
[599] M.S. Livingstone and D.H. Hubel, “Segregation of form, color, movement, and
depth: anatomy, physiology, and perception,” Science, 240, 740–749, 1988.
[600] N.A. Logan, “Survey of some early studies of the scattering of plane waves by a
sphere,” Proceedings of the IEEE, 53, 773, 1965.
[601] G.G. Lorentz, Approximation of Functions, 2nd edition, New York, NY: Chelsea,
1986.
[602] R. Loudon, The Quantum Theory of Light, Oxford: Oxford University Press, 1983.
[603] O. Lowenstein and I.E. Loewenfeld, “The pupil,” in The Eye, Volume 3, 2nd edition,
edited by H. Davson, New York, NY: Academic Press, 1969.
[604] J. Lubin, “A visual discrimination model for imaging systems design and evaluation,”
in Visual Models for Target Detection and Recognition, edited by E. Peli, pp. 245–
283 Singapore: World Scientific, 1995.
[605] D.G. Luenberger, Optimization by Vector Space Methods, New York, NY: John Wiley
& Sons, 1969.
[606] T. Lule, M. Wagner, M. Verhoeven, H. Keller, and M. Bohm, “100,000-pixel, 120-dB
imager in TFA technology,” IEEE Journal of Solid-State Circuits, 35, 5, 732–739,
2000.
[607] T. Lule, S. Benthien, H. Keller, F. Mutze, P. Rieve, K. Seibel, M. Sommer, and
M. Bohm, “Sensitivity of CMOS based imagers and scaling perspectives,” IEEE
Transactions on Electron Devices, 47, 11, 2110–2122, 2000.
[608] R.K. Luneburg, Mathematical Theory of Optics, Berkeley, CA: University of Cali-
fornia Press, 1964.
[609] J. Luo, R.T. Gray, and H.-C. Lee, “Towards physics-based segmentation of photo-
graphic color images,” in Proc. International Conference on Image Processing, 3,
58–61, 1997.
[610] J. Luo and A. Singhal, “On measuring low-level saliency in photographic images,”
in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1, 84–89,
2000.
[611] J. Luo and S.P. Etz, “A physical model-based approach to detecting sky in photo-
graphic images,” IEEE Transactions on Image Processing, 11, 3, 201–212, 2002.
[612] M.R. Luo and B. Rigg, “BFD(l:c) colour difference formula” (Part 1 and Part 2),
Journal of the Society of Dyers and Colourists, 126–132, 1987.
[613] M.R. Luo, G. Cui, and B. Rigg, “The development of CIE2000 colour-difference
formula: CIEDE2000,” COLOR Research and Application, 26, 5, 340–350, 2001.
[614] S.M. Luria and D.F. Neri, “Individual differences in luminous efficiency measured
by flicker photometry”, COLOR Research and Application, 11, 1, 72–75, 1986.
[615] D.K. Lynch and W. Livingston, Color and Light in Nature, Cambridge: Cam-
bridge University Press, 1995.
[616] R.F. Lyon and P.M. Hubel, “Eyeing the camera: Into the next century,” in Proceedings
IS&T/SID Tenth Color Imaging Conference, 349–355, 2002.
[617] D.L. MacAdam, “Maximum visual efficiencies of colored materials,” Journal of the
Optical Society of America, 25, 361–367, 1935.
[618] D.L. MacAdam, “Visual sensitivities to color differences in daylight,” Journal of
the Optical Society of America, 32, 247, 1942.
[619] D.L. MacAdam, “Uniform color scales,” Journal of the Optical Society of America,
64, 1691–1702, 1974.
[620] D.L. MacAdam, Color Measurement, Heidelberg: Springer-Verlag, 1985.
[621] L.W. MacDonald and A.C. Lowe (eds.), Display Systems: Design and Applications,
Chichester: John Wiley & Sons, 1997.
[622] L.W. MacDonald and M.R. Luo (eds.), Colour Imaging: Vision and Technology,
Chichester: John Wiley and Sons, 2000.
[623] L.W. MacDonald and M.R. Luo (eds.), Colour Image Science: Exploiting Digital
Media, Chichester: John Wiley and Sons, 2002.
[624] E. Mach, “Über die Wirkung der räumlichen Verteilung des Lichtreizes auf die
Netzhaut. I.,” Sitzungsberichte der mathematisch-naturwissenschaftlichen Classe der
Kaiserlichen Akademie der Wissenschaften, 52 II, 303–322, 1865.
[625] J.R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in
Statistics and Econometrics, Chichester: John Wiley & Sons, 1988.
[626] V.N. Mahajan, Aberration Theory Made Simple, Bellingham WA: SPIE Optical
Engineering Press, 1991.
[627] D. Malacara (ed.), Optical Shop Testing, New York, NY: John Wiley and Sons, 1978.
[628] S.G. Mallat, A Wavelet Tour of Signal Processing, 2nd edition, Orlando, FL: Aca-
demic Press, 1999.
[629] L. Maloney and B. Wandell, “Color constancy: a method for recovering surface
spectral reflectance,” Journal of the Optical Society of America, A, 3, 1, 29–33,
1986.
[630] L.T. Maloney, “Evaluation of linear models of surface spectral reflectance with small
number of parameters,” Journal of the Optical Society of America, A, 3, 1673–1683,
1986.
[631] L. Mandel and E. Wolf, Optical Coherence and Quantum Optics, New York: Cam-
bridge University Press, 1995.
[632] S. Marcos, E. Moreno, and R. Navarro, “The depth-of-field of the human eye from
objective and subjective measurements,” Vision Research, 39, 2039–2049, 1999.
[633] K.V. Mardia, Statistics of Directional Data, London: Academic Press, 1972.
[634] A.P. Mariani, “Bipolar cells in monkey retina selective for cones likely to be blue-
sensitive,” Nature, 308, 184–186, 1984.
[635] D.C. Marr and E. Hildreth, “Theory of edge detection,” Proceedings of the Royal
Society of London, B, 207, 187–217, 1980.
[636] D. Marr, Vision: A Computational Investigation into the Human Representation and
Processing of Visual Information, San Francisco, CA: W.H. Freeman, 1982.
[637] S.R. Marschner, S.H. Westin, E.P.F. Lafortune, and K.E. Torrance, “Image-based
bidirectional reflectance distribution function measurement,” Applied Optics, 39,
16, 2592–2600, 2000.
[638] S.R. Marschner, H.W. Jensen, M. Cammarano, S. Worley, and P. Hanrahan, “Light
scattering from human hair fibers,” ACM Transactions on Graphics, 22, 3, 780–791,
2003.
[639] G.F. Marshall, “Scanning devices and systems,” Applied Optics and Optical Engi-
neering, VI, 203–262, 1980.
[640] H. Martens and T. Næs, Multivariate Calibration, Chichester: John Wiley and Sons,
1989.
[641] J.-B. Martens, “Multidimensional modeling of image quality,” Proceedings of the
IEEE, 90, 1, 133–153, 2002.
[642] J.-B. Martens, Image Technology Design: A Perceptual Approach, Boston, MA:
Kluwer, 2003.
[643] S. Martin, “Glare characteristics of lenses and optical instruments in the visible
region,” Optica Acta, 19, 6, 499–513, 1972.
[644] R.H. Masland and E. Raviola, “Confronting complexity: strategy for understanding
the microcircuitry of the retina,” Annual Reviews: Neurosciences, 23, 249–284,
2000.
[645] S. Matsuda and T. Nitoh, “Flare as applied to photographic lenses,” Applied Optics,
11, 8, 1850–1856, 1972.
[646] D. Maystre, “Rigorous vector theories of diffraction gratings,” Progress in Optics,
Volume XXI, edited by E. Wolf, pp. 1–67, Amsterdam: North-Holland Publishing
Company, 1984.
[647] C.S. McCamy, H. Marcus, and J.G. Davidson, “A color-rendition chart,” Journal of
Applied Photographic Engineering, 11, 3, 95–99, 1976.
[648] J.J. McCann, S.P. McKee, and T.H. Taylor, “Quantitative studies in retinex theory:
a comparison between theoretical predictions and observer responses to the color
Mondrian experiments,” Vision Research, 16, 445–458, 1976.
[649] J.J. McCann, “Capturing a black cat in shade: past and present of Retinex color
appearance model, ” Journal of Electronic Imaging, 13, 1, 36–47, 2004.
[650] J.J. McCann, “Color constancy: small overall and large local changes,” Proceedings
of SPIE, 1666, 310–320, 1992.
[651] C. McCollough, “Color adaptation of edge detectors in the human visual system,”
Science, 149, 1115–1116, 1965.
[652] T.W. McCurnin, L.C. Shooley, and G.R. Sims, “Charge-coupled device signal pro-
cessing models and comparisons,” Journal of Electronic Imaging, 2, 2, 100–107,
1993.
[653] R. McDonald (ed.), Colour Physics for Industry, 2nd edition, Bradford: The Society
of Dyers and Colourists, 1997.
[654] K. McLaren, The Colour Science of Dyes and Pigments, 2nd edition, Bristol: Adam
Hilger Ltd, 1986.
[655] P.A. McNaughton, “Light response of vertebrate photoreceptors”, Physiological
Reviews, 70, 3, 847–883, 1990.
[656] R.B. Merrill, Vertical Color Filter Detector Group and Array, US Patent 6,632,701,
October 14, 2003.
[657] J. Meyer-Arendt, Introduction to Classical and Modern Optics, 2nd edition, Engle-
wood Cliffs, NJ: Prentice-Hall, 1984.
[658] L. Michaelis and M. Menten, “Die Kinetik der Invertinwirkung,” Biochemische
Zeitschrift, 49, 333, 1913.
[659] D.D. Michaels, Visual Optics and Refraction, 3rd edition, St. Louis, MO: The C.V. Mosby
Company, 1985.
[660] A.A. Michelson and F.G. Pease, “Measurement of the diameter of α Orionis with
the interferometer,” Astrophysical Journal, 53, 249–259, 1921.
[661] A.A. Michelson, Studies in Optics, New York, NY: Dover, 1995. (Original text
published in 1927.)
[662] G. Mie, “Beiträge zur Optik trüber Medien, speziell kolloidaler Metallösungen,”
Annalen der Physik, 25, 377–445, 1908.
[663] J.R. Milch, “Image scanning and digitization,” Chapter 10 in Imaging Processes
and Materials, edited by J. Sturge, V. Walworth, and A. Shepp, New York, NY: Van
Nostrand Reinhold, 1989.
[664] M. Millodot and J. Sivak, “Influence of accommodation on the chromatic aberration
of the eye,” British Journal of Physiological Optics, 28, 169–174, 1973.
[665] M. Millodot, “Image formation in the eye,” in The Senses, edited by H.B. Barlow
and J.D. Mollon, London: Cambridge University Press, 1982.
[666] A.D. Milner and M.A. Goodale, The Visual Brain in Action, Oxford: Oxford Uni-
versity Press, 1995.
[667] M. Minnaert, Light and Color in the Outdoors, translated by L. Seymour, New York,
NY: Springer-Verlag, 1993.
[668] K.V. Mital, Optimization Methods in Operations Research and Systems Analysis,
New Delhi: Wiley Eastern Limited, 1977.
[669] S. Mitton (ed.), The Cambridge Encyclopedia of Astronomy, New York, NY: Crown
Publishers, 1977.
[670] E. Miyahara, V.C. Smith, and J. Pokorny, “The consequences of opponent recti-
fication: the effect of surround size and luminance on color appearance,” Vision
Research, 41, 859–871, 2001.
[671] Y. Miyake, “Tone correction of color picture by histogram modification,” Nippon
Shashin Gakkaishi, 48, 2, 94–101, 1980.
[672] Y. Miyake, Analysis and Evaluation of Digital Color Images, Tokyo, Japan: Univer-
sity of Tokyo Press, 2000.
[673] Y. Miyake, T. Ishihara, K. Ohishi, and N. Tsumura, “Measurement and modeling
for the two dimensional MTF of human eye and its application for digital color
reproduction,” in Proc. IS&T/SID Ninth Color Imaging Conference, 153–157, 2001.
[674] P. Moon and D.E. Spencer, The Photic Field, Cambridge, MA: The MIT Press, 1981.
[675] J. Morgan, Introduction to Geometrical and Physical Optics, pp. 114–121, New
York, NY: McGraw-Hill Book Company, 1953.
[676] N. Moroney, M.D. Fairchild, R.W.G. Hunt, C.J. Li, M.R. Luo, and T. Newman, “The
CIECAM02 color appearance model,” Proc. of the IS&T/SID 10th Color Imaging
Conference, Scottsdale, 23–27, 2002.
[677] J. Morovic and M.R. Luo, “Evaluating gamut mapping algorithms for universal
applicability,” COLOR Research and Application, 26, 1, 85–203, 2001.
[678] J. Morovic and Y. Wang, “A multi-resolution full-colour spatial gamut mapping
algorithm,” in Proc. IS&T/SID Eleventh Color Imaging Conference, 282–287,
2003.
[679] S. Morozumi, “Active-matrix thin-film transistor liquid-crystal displays,” in Advances
in Electronics and Electron Physics, Volume 77, pp. 1–82, New York, NY: Academic
Press, 1990.
[680] S. Morozumi, “Issues in manufacturing active-matrix LCDs,” in SID Seminar Lec-
ture Notes, Volume II, Seminar F3, 1–58, 1992.
[681] H. Moss, Narrow Angle Electron Guns and Cathode Ray Tubes, Supplement 3 of
Advances in Electronics and Electron Physics series, New York, NY: Academic
Press, 1968.
[682] P. Mouroulis and J. Macdonald, Geometrical Optics and Optical Design, New York,
NY: Oxford University Press, 1997.
[683] P.S. Mudgett and L.W. Richards, “Multiple scattering calculations for technology,”
Applied Optics, 10, 7, 1485–1502, 1971.
[684] K.T. Mullen, “The contrast sensitivity of human colour vision to red/green and
blue/yellow chromatic gratings,” Journal of Physiology (London), 359, 381–400,
1985.
[685] K.T. Mullen, “Colour vision as a post-receptoral specialization of the central visual
field,” Vision Research, 31, 1, 119–130, 1991.
[686] F. Müller, H. Wässle, and T. Voigt, “Pharmacological modulation of the rod pathway
in the cat retina,” Journal of Neurophysiology, 59, 1657–1672, 1988.
[687] J.B. Murdoch, Illumination Engineering – From Edison’s Lamp to the Laser, New
York, NY: Macmillan Publishing Company, 1985.
[688] M. Nagao and T. Matsuyama, “Edge preserving smoothing,” Computer Graphics
and Image Processing, 9, 394–407, 1979.
[689] K.I. Naka and W.A. Rushton, “S-potentials from luminosity units in the retina of
fish (Cyprinidae),” Journal of Physiology (London), 185, 587–599, 1966.
[690] K. Nakamura, “Grain distribution and sensitivity characteristics of photographic
black and white films,” Journal of Imaging Technology, 11, 17–21, 1985.
[691] K. Nakamura, “Grain distribution of color negative films,” Journal of Imaging Tech-
nology, 12, 6–10, 1986.
[692] S. Nakauchi, S. Hatanaka, and S. Usui, “Color gamut mapping based on a perceptual
image difference measure,” COLOR Research and Application, 24, 4, 280–291,
1999.
[693] K. Nassau, The Physics and Chemistry of Color, New York, NY: John Wiley and
Sons, Inc., 1983.
[694] J. Nathans, D. Thomas, and D.S. Hogness, “Molecular genetics of human color
vision: The genes encoding blue, green, and red pigments,” Science, 232, 193,
1986.
[695] R. Navarro, P. Artal, and D.R. Williams, “Modulation transfer of the human eye as
a function of retinal eccentricity,” Journal of the Optical Society of America, A, 10,
2, 201–212, 1993.
[696] S.K. Nayar, K. Ikeuchi, and T. Kanade, “Surface reflection: physical and geometrical
perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 13,
7, 611–634, 1991.
[697] S.K. Nayar and M. Oren, “Visual appearance of matte surfaces,” Science, 267,
1153–1156, 1995.
[698] Y. Nayatani, K. Takahama, H. Sobagaki, and K. Hashimoto, “Color-appearance
model and chromatic-adaptation,” Color Research and Application, 15, 210–221,
1990.
[699] Y. Nayatani, H. Sobagaki, and K. Hashimoto, “Existence of two kinds of representa-
tions of the Helmholtz–Kohlrausch effect. I. The experimental confirmation,” Color
Research and Application, 19, 4, 246–261, 1994.
[700] NBS (The US National Bureau of Standards), Self Study Manual on Optical Radia-
tion Measurements, Part I, NBS Technical Note No. 910–1, 1976; No. 910–2, 1978;
and No. 910–3, 1977.
[701] S.-M. F. Nee, “Error analysis for Mueller matrix measurement,” Journal of the
Optical Society of America, A, 20, 8, 1651–1657, 2003.
[702] J. Neitz and G.H. Jacobs, “Polymorphism of long-wavelength cone in normal human
colour vision,” Nature, 323, 623–625, 1986.
[703] J. Neitz and G.H. Jacobs, “Polymorphism in normal human color vision and its
mechanism,” Vision Research, 30, 4, 621–636, 1990.
[704] C.N. Nelson, “The theory of tone reproduction,” in The Theory of the Photographic
Process, 3rd edition, edited by T.H. James, pp. 464–498, New York, NY: Macmillan
Publishing Co., 1966.
[705] C.N. Nelson, “Prediction of densities in fine detail in photographic images,” Photo-
graphic Science and Engineering, 15, 1, 82–97, 1971.
[706] C.N. Nelson, “Tone and color reproduction,” in The Theory of the Photographic
Process, 4th edition, edited by T.H. James, pp. 547–552, New York, NY: Macmillan
Publishing Co., 1977.
[707] C.N. Nelson and G.C. Higgins, “Image Sharpness,” in Advances in the Psychophys-
ical and Visual Aspects of Image Evaluation, edited by R.P. Dooley, pp. 72–75,
Springfield, VA: The Society of Photographic Scientists and Engineers, 1977.
[708] A.N. Netravali and B.G. Haskell, Digital Pictures: Representation and Compression,
New York, NY: Plenum Press, 1988.
[709] H.E. Neugebauer, “Quality factor for filters whose spectral transmittances are dif-
ferent from color mixture curves, and its application to color photography,” Journal
of the Optical Society of America, 46, 821–824, 1956.
[710] R. Nevatia, “A color edge detector and its use in scene segmentation,” IEEE Trans-
actions on Systems, Man, and Cybernetics, 7, 820–826, 1977.
[711] S.M. Newhall, “Final report of the O.S.A. subcommittee on the spacing of the
Munsell Colors,” Journal of the Optical Society of America, 33, 7, 385–418,
1943.
[712] L.J. Newson, “Some principles governing changes in the apparent lightness of test
surfaces isolated from their normal backgrounds,” The Quarterly Journal of Exper-
imental Psychology, 10, 82–95, 1958.
[713] D. Nickerson and S.M. Newhall, “A psychological color solid,” Journal of the Optical
Society of America, 33, 7, 419–422, 1943.
[714] F.E. Nicodemus (ed.), Self-Study Manual on Optical Radiation Measurements, Part
I – Concepts, Gaithersburg, MD: National Bureau of Standards (now, National
Institute of Standards and Technology): Chapters 1–3, March 1976; Chapters 4–5,
February 1978; Chapter 6, June 1977; Chapters 7–9, June 1979; Chapter 10, March
1983; Chapter 11, April 1984; Chapter 12, April 1985.
[715] F.E. Nicodemus, J.C. Richmond, J.J. Hsia, I.W. Ginsberg, and T. Limperis, Geometri-
cal Considerations and Nomenclature for Reflectance, Gaithersburg, MD: National
Bureau of Standards (US), Monograph 160, October 1977.
[716] Nihon Shikisaigakkai [The Color Science Association of Japan], Handbook of Color
Science, 2nd edition (in Japanese), Tokyo: Tokyo University Press, 1998.
[717] M.R.M. Nijenhuis and F.J.J. Blommaert, “Perceptual error measure for sampled and
interpolated images,” Journal of Imaging Science and Technology, 41, 3, 249–258,
1997.
[718] R.A. Normann, B.S. Baxter, H. Ravindra, and P.J. Anderton, “Photoreceptor contri-
bution to contrast sensitivity: Applications in radiological diagnosis,” IEEE Trans-
actions on Systems, Man, and Cybernetics, 13, 5, 944–953, 1983.
[719] G.A. Nothmann, Nonimpact Printing, Pittsburgh, PA: Graphic Arts Technical Foun-
dation, 1989.
[720] C.L. Novak and S.A. Shafer, “Color edge detection,” in Image Understanding Re-
search at CMU, Proc. DARPA Image Understanding Workshop, Los Angeles, Feb.
1987, edited by T. Kanade, pp. 35–37, Los Altos, CA: Morgan Kaufmann Publishers,
Inc., 1987.
[721] H. Nyquist, “Thermal agitation of electrical charge in conductors,” Physical Review,
32, 110–113, 1928.
[722] V. O’Brien, “Contour perception, illusion and reality,” Journal of the Optical Society
of America, 48, 112–119, 1958.
[723] J.A. Ogilvy, Theory of Wave Scattering from Random Rough Surfaces, Bristol: In-
stitute of Physics Publishing, 1991.
[724] K.N. Ogle, “Blurring of the retinal image and contrast thresholds in the fovea,”
Journal of the Optical Society of America, 50, 4, 307–315, 1960.
[725] N. Ohta, “Optimization of spectral sensitivities,” Photographic Science and Engi-
neering, 27, 193–201, 1983.
[726] N. Ohta, Introduction to Color Reproduction Technology, Tokyo: Corona Publishing,
1997.
[727] Y. Ohta, T. Kanade, and T. Sakai, “Color information for region segmentation,”
Computer Graphics and Image Processing, 13, 221–241, 1980.
[728] W.C. O’Mara, Liquid Crystal Flat Panel Displays: Manufacturing, Science, and
Technology, New York, NY: Van Nostrand Reinhold, 1993.
[729] E.L. O'Neill, Introduction to Statistical Optics, New York, NY: Dover, 1992.
[730] B. O’Neill, Elementary Differential Geometry, San Diego, CA: Academic Press,
1966.
[731] S. Ooue, “The photographic image,” in Progress in Optics, Volume VII, edited by
E. Wolf, pp. 300–358, Amsterdam: North-Holland Publishing Company, 1969.
[732] M. Oren and S.K. Nayar, “Generalization of Lambert’s reflectance model,” Proceed-
ings of ACM SIGGRAPH 1994, pp. 239–246, 1994.
[733] D.C. O’Shea, Elements of Modern Optical Design, New York, NY: John Wiley and
Sons, 1985.
[734] W. Ostwald, The Color Primer, English translation edited by F. Birren, New York,
NY: Van Nostrand Reinhold Company, 1969.
[735] Y. Ovchinnikov, I. Fainberg, R. Litvan, I. Solntsev, and N. Avatkova, “A new approach
to programming in photomechanical reproduction,” in Proc. of the 12th International
Conference of Printing Research Institutes, Versailles, France, edited by W. Banks,
pp. 160–163, Guildford: IPC Science and Technology Press, 1974.
[736] R.D. Overheim and D.L. Wagner, Light and Color, New York, NY: John Wiley and
Sons, 1982.
[737] D.A. Owens, “A comparison of accommodative responsiveness and contrast sensi-
tivity for sinusoidal gratings,” Vision Research, 20, 159–167, 1980.
[738] D.A. Owens, “The resting state of the eyes,” American Scientist, 72, 378–387,
1984.
[739] E.D. Palik (ed.), Handbook of Optical Constants of Solids, Orlando, FL: Academic
Press, 1985.
[740] E.D. Palik (ed.), Handbook of Optical Constants of Solids II, Orlando, FL: Academic
Press, 1991.
[741] J.M. Palmer, Lens Aberration Data, New York, NY: American Elsevier Publishing
Company, 1971.
[742] S. Pancharatnam, “Partial polarisation, partial coherence, and their spectral de-
scription for polychromatic light – Part I”, Proceedings of the Indian Academy of
Sciences, 57, 4, 218–230, 1963, “ – Part II”, 231–243, ibid. Both papers are collected
in Selected Papers on Interferometry, edited by P. Hariharan, SPIE Milestone Series,
Volume MS28, Bellingham, WA: SPIE Optical Engineering Press, 1991.
[743] A. Papoulis, The Fourier Integral and Its Applications, New York, NY: McGraw-
Hill, 1962.
[744] A. Papoulis, Signal Analysis, New York, NY: McGraw-Hill, 1977.
[745] A. Papoulis, Probability, Random Variables, and Stochastic Processes, New York,
NY: McGraw-Hill, 1991.
[746] T.N. Pappas, J.P. Allebach, and D.L. Neuhoff, “Model-based digital halftoning,”
IEEE Signal Processing Magazine, 14–27, July 2003.
[747] J.P.S. Parkkinen, J. Hallikainen, and T. Jaaskelainen, “Characteristic spectra of Mun-
sell colors,” Journal of the Optical Society of America, A, 6, 2, 318–322, 1989.
[748] J.R. Partington, Interpolation, Identification, and Sampling, Oxford: Clarendon
Press, 1997.
[749] K.A. Parulski, “Color filters and processing alternatives for one-chip cameras,” IEEE
Transactions on Electron Devices, 32, 8, 1381–1389, 1985.
[750] A.S. Patel and R.W. Jones, “Increment and decrement visual thresholds,” Journal
of the Optical Society of America, 58, 5, 696–699, 1968.
[751] S.N. Pattanaik, J.A. Ferwerda, M.D. Fairchild, and D.P. Greenberg, “A multiscale
model of adaptation and spatial vision for realistic image display,” Proc. ACM SIG-
GRAPH 1998, 287–298, 1998.
[752] T. Pavlidis, Algorithms for Graphics and Image Processing, Rockville, MD: Com-
puter Science Press, 1982.
[753] F. Pellacini, J.A. Ferwerda, and D.P. Greenberg, “Toward a psychophysically-based
light reflection model for image synthesis,” Proc. ACM SIGGRAPH 2000, 55–64,
2000.
[754] D.G. Pelli and L. Zhang, “Accurate control of contrast on microcomputer display,”
Vision Research, 31, 1337–1350, 1991.
[755] D.G. Pelli, “Pixel independence: Measuring spatial interactions on a CRT display,”
Spatial Vision, 10, 4, 443–446, 1997.
[756] H. Peterson, A.J. Ahumada, and A.B. Watson, “An improved detection model for
DCT coefficient quantization,” Proceedings of SPIE, 1913, 191–201, 1993.
[757] B.T. Phong, “Illumination for computer generated images,” Communications of
ACM, 18, 6, 311–317, 1975.
[758] T. Piantanida and J. Larimer, “The impact of boundaries on color: Stabilized image
studies,” Journal of Imaging Technology, 15, 2, 58–63, 1989.
[759] G.E. Pierard, C. Pierard-Franchimont, F. Laso Dosal, T. Ben Mosbah, J. Arrese-
Estrada, A. Rurangirwa, A. Dowlati, and M. Vardar, “Pigmentary changes in skin
senescence,” Journal of Applied Cosmetology, 9, 57–63, 1991.
[760] E.R. Pike and S. Sarkar, The Quantum Theory of Radiation, Oxford: Clarendon
Press, 1995.
[761] J.E. Pinney and W.F. Voglesong, “Analytical Densitometry of Reflection Color Print
Materials,” Photographic Science and Engineering, 6, 6, 367–370, 1962.
[762] F.H.G. Pitt and E.W.H. Selwyn, “Colour of outdoor photographic subjects,” The
Photographic Journal, 78, 115–121, 1938.
[763] M. Planck, “Distribution of energy in the spectrum,” Annals of Physics, 4, 3, 553–
563, 1901.
[764] W.T. Plummer, “Photographic shutters: better pictures with a reconsideration of
shutter efficiency,” Applied Optics, 16, 7, 1914–1917, 1977.
[765] T. Poggio and F. Girosi, “Regularization algorithms for learning that are equivalent
to multilayer networks,” Science, 247, 4945, 978–982, 1990.
[766] M.R. Pointer, “The gamut of real surface colours,” Color Research and Application,
5, 145, 1980.
[767] M.R. Pointer, “Measuring colour reproduction,” The Journal of Photographic Sci-
ence, 34, 81–90, 1986.
[768] M.R. Pointer, “A colour reproduction index,” in Proceedings of the 2nd IS&T/SID
Color Imaging Conference, 180–182, 1994.
[769] F.J.A.M. Poirier and R. Gurnsey, “The effects of eccentricity and spatial frequency
on the orientation discrimination asymmetry,” Spatial Vision, 11, 4, 349–366, 1998.
[770] A.B. Poirson and B.A. Wandell, “The appearance of colored patterns: pattern–color
separability,” Journal of the Optical Society of America, A, 10, 2458–2471, 1993.
[771] M. Pope, H.P. Kallmann, and P.J. Magnante, “Electroluminescence in organic crys-
tals,” Journal of Chemical Physics, 38, 2042–2043, 1963.
[772] E. Popov, “Light diffraction by relief gratings: a macroscopic and microscopic view,”
Progress in Optics, vol. XXXI, edited by E. Wolf, pp. 139–187, Amsterdam: North-
Holland, 1993.
[773] Z. Popovic and J. Sjöstrand, “Resolution, separation of retinal ganglion cells, and
cortical magnification in humans,” Vision Research, 41, 1313–1319, 2001.
[774] M.I. Posner and M.E. Raichle, Images of Mind, Scientific American Library, New
York, NY: W.H. Freeman, 1994.
[775] C. Poynton, Digital Video and HDTV: Algorithms and Interfaces, San Francisco,
CA: Morgan Kaufmann, 2003.
[776] W.K. Pratt, Digital Image Processing, 2nd edition, New York, NY: John Wiley &
Sons, 1991.
[777] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes
in C, 2nd edition, Cambridge: Cambridge University Press, 1992.
[778] J.E. Proctor and P.Y. Barnes, “NIST high accuracy reference reflectometer–
spectrophotometer,” Journal of Research of the National Institute of Standards and
Technology, 101, 5, 619–627, 1996.
[779] C.N. Proudfoot (ed.), Handbook of Photographic Science and Engineering, 2nd
edition, Springfield, VA: The Society for Imaging Science and Technology, 1997.
[780] S. Quan, N. Ohta, and N. Katoh, “Optimization of camera spectral sensitivities,” in
Proc. IS&T/SID Eighth Color Imaging Conference, 273–278, 2000.
[781] S. Quan, N. Ohta, R.S. Berns, and N. Katoh, “Optimal design of camera spectral
sensitivity functions based on practical filter components,” in Proc. IS&T/SID
Ninth Color Imaging Conference, 326–331, 2001.
[782] V.S. Ramachandran and S. Blakeslee, Phantoms in the Brain, New York, NY: Quill
William Morrow, 1998.
[783] R. Ramanath, W.E. Snyder, G.L. Bilbro, and W.A. Sander III, “Demosaicking meth-
ods for Bayer color arrays,” Journal of Electronic Imaging, 11, 3, 306–315, 2002.
[784] F. Ratliff, Mach Bands: Quantitative Studies on Neural Networks in the Retina, San
Francisco, CA: Holden-Day, 1965.
[785] S. Ray, Camera Systems, London: Focal Press, 1983.
[786] S.F. Ray, Applied Photographic Optics, 3rd edition, Oxford: Focal Press, 2002.
[787] K. Ražnjević, Physical Quantities and the Units of the International System (SI),
New York, NY: Begell House, 1995.
[788] S.E. Reichenbach, S.K. Park, and R. Narayanswamy, “Characterizing digital image
acquisition devices,” Optical Engineering, 30, 2, 170–177, 1991.
[789] S.O. Rice, “Reflection of electromagnetic waves from slightly rough surfaces,”
Communications on Pure and Applied Mathematics, 4, 351–378, 1951.
[790] W.A. Richards, “Lightness scale from image intensity distribution,” Applied Optics,
21, 14, 2569–2582, 1982.
[791] G.H. Rieke, Detection of Light: From the Ultraviolet to the Submillimeter, Cam-
bridge: Cambridge University Press, 1994.
[792] H. Ripps and R.A. Weale, “The Visual Photoreceptors,” in The Eye, Volume 2A,
2nd edition, edited by H. Davson, New York, NY: Academic Press, 1976.
[793] C.R. Robbins, Chemical and Physical Behavior of Human Hair, 3rd edition, New
York, NY: Springer-Verlag, 1994.
[794] A.R. Robertson, “The CIE 1976 color-difference formulae,” Color Research and
Application, 2, 1, 7–11, 1977.
[795] A.R. Robertson, “Colour order systems: an introductory review,” Color Research
and Application, 9, 4, 234–240, 1984.
[796] R. Robilotto and Q. Zaidi, “Limits of lightness identification for real ob-
jects under natural viewing conditions,” Journal of Vision, 4, 779–797, 2004.
(http://journalofvision.org/4/9/9)
[797] A.H. Robins, Biological Perspectives on Human Pigmentation, Cambridge: Cam-
bridge University Press, 1991.
[798] G.S. Robinson, “Color edge detection,” Optical Engineering, 16, 479–484, 1977.
[799] K. Robinson, “Laser safety: Powerful tools need powerful protection,” Photonics
Spectra, 92–100, October 1998.
[800] R.L. Rockhill, T. Euler, and R.H. Masland, “Spatial order within but not between
types of retinal neurons,” Proceedings of the National Academy of Science, USA,
97, 5, 2303–2307, 2000.
[801] R.W. Rodieck, “The primate retina,” in Comparative Primate Biology, Vol. 4, Neu-
rosciences, pp. 203–278, New York, NY: Alan R. Liss Inc., 1988.
[802] R.W. Rodieck, The First Steps in Seeing, Sunderland, MA: Sinauer Associates, 1998.
[803] A. Roorda and D.R. Williams, “The arrangement of the three cone classes in the
living human eye,” Nature, 397, 520–522, 1999.
[804] A. Roorda, A.B. Metha, P. Lennie, and D.R. Williams, “Packing arrangement for
the three cone classes in primate retina,” Vision Research, 41, 1291–1306, 2001.
[805] A. Roorda and D.R. Williams, “Optical fiber properties of individual human cones,”
Journal of Vision, 2, 404–412, 2002.
[806] C. Rosenberg, M. Hebert, and S. Thrun, “Color constancy using KL-divergence,”
Proc. 8th IEEE International Conference on Computer Vision, 239–246, 2001.
[807] A. Rosenfeld and A.C. Kak, Digital Picture Processing, 2nd edition, New York,
NY: Academic Press, 1982.
[808] G. Rosenfeld, Screened Image Reproduction, US Patent 4,456,924, June 26,
1984.
[809] K. Rosenhauer and K. Rosenbruch, “Flare and optical transfer function,” Applied
Optics, 7, 2, 283–287, 1968.
[810] J. Ross, M.C. Morrone, and D.C. Burr, “The conditions under which Mach bands
are visible,” Vision Research, 29, 6, 699–715, 1989.
[811] J.A.J. Roufs, “Perceptual image quality: concept and measurement,” Philips Journal
of Research, 47, 1, 35–62, 1992.
[812] P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, New
York, NY: John Wiley & Sons, 1987.
[813] D.L. Ruderman, “Origins of scaling in natural images,” Vision Research, 37, 23,
3385–3395, 1997.
[814] J.C. Russ, The Image Processing Handbook, Boca Raton, FL: CRC Press, 1992.
[815] M.D. Rutherford and D.H. Brainard, “Lightness constancy: a direct test of the
illumination-estimation hypothesis,” Psychological Science, 13, 2, 142–149, 2002.
[816] A. Safir, L. Hyams, and J. Philpot, “The retinal directional effect: a model based on
the Gaussian distribution of cone orientations,” Vision Research, 11, 819–831, 1971.
[817] K. Sagawa and Y. Takahashi, “Spectral luminous efficiency as a function of age,”
Journal of the Optical Society of America, A, 18, 11, 2659–2667, 2001.
[818] R. Saito and H. Kotera, “Image-dependent three-dimensional gamut mapping using
gamut boundary descriptor,” Journal of Electronic Imaging, 13, 3, 630–638, 2004.
[819] T. Sakamoto and A. Itooka, Linear Interpolator for Color Correction, US Patent
4,275,413, June 23, 1981.
[820] K. Sakatani and T. Itoh, “Color noise analysis,” Proc. IS&T’s 1999 PICS Conference,
241–246, 1999.
[821] T.O. Salmon and L. Thibos, “Comparison of the eye’s wave-front aberration mea-
sured psychophysically and with the Shack-Hartmann wave-front sensor,” Journal
of the Optical Society of America, A, 15, 9, 2457–2465, 1998.
[822] T.O. Salmon, Corneal Contribution to the Wavefront Aberration of the Eye, Ph.D.
Dissertation, Indiana University School of Optometry, 1999.
[823] C.L. Sanders, “Color preferences for natural objects,” Illuminating Engineering, 54,
452–456, 1959.
[824] C. Sano, T. Song, and M.R. Luo, “Colour differences for complex images,” in
Proceedings IS&T/SID Eleventh Color Imaging Conference, 121–126, 2003.
[825] G. Sapiro, “Color and illuminant voting,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, 21, 11, 1210–1215, 1999.
[826] Y. Satoh, Y. Miyake, H. Yaguchi, and S. Shinohara, “Facial pattern detection and
color correction from negative color film,” Journal of Imaging Technology, 16, 2,
80–84, 1990.
[827] E. Saund, “Perceptual organization of occluding contours generated by opaque sur-
faces,” Proc. of IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2, 624–630, 1999.
[828] J.L. Saunderson, “Calculation of the color of pigmented plastics,” Journal of the
Optical Society of America, 32, 727–736, 1942.
[829] K. Sayanagi, “On the light distribution in the photographic image,” Journal of the
Optical Society of America, 47, 6, 566–567, 1957.
[830] O.H. Schade, “Image gradation, graininess and sharpness in television and motion-
picture systems,” Journal of the Society of Motion Picture and Television Engineers,
(This is a four-part article.) Part I: 56, 2, 137–177, 1951; Part II: 58, 3, 181–222,
1952; Part III: 61, 8, 97–164, 1953; Part IV: 64, 11, 593–617, 1955.
[831] O.H. Schade, Sr., “An evaluation of photographic image quality and resolving
power,” Journal of the Society of Motion Picture and Television Engineers, 73,
2, 81–119, 1964.
[832] L.B. Schein, Electrophotography and Development Physics, Morgan Hill, CA:
Laplacian Press, 1996.
[833] P. Scheunders, “An orthogonal wavelet representation of multivalued images,” IEEE
Transactions on Image Processing, 12, 6, 718–725, 2003.
[834] P.H. Schiller, “The ON and OFF channels of the visual system,” Trends in Neuro-
sciences, 15, 3, 86–92, 1992.
[835] D. Schmidt and P. Bachmann, Circuit Apparatus for Automatic Correction of TV
Color Balance, US Patent 5,040,054, August 13, 1991.
[836] J.M. Schmitt, G.X. Zhou, and E.C. Walker, “Multilayer model of photon diffusion
in skin,” Journal of the Optical Society of America, A, 7, 11, 2141–2153, 1990.
[837] J.L. Schnapf, T.W. Kraft, and D.A. Baylor, “Spectral sensitivity of human cone
photoreceptors,” Nature, 325, 439–441, 1987.
[838] J.L. Schnapf, T.W. Kraft, B.J. Nunn, and D.A. Baylor, “Spectral sensitivity of primate
photoreceptors,” Visual Neuroscience, 1, 255–261, 1988.
[839] J.L. Schnapf, B.J. Nunn, M. Meister, and D.A. Baylor, “Visual transduction in cones
of the monkey Macaca fascicularis,” Journal of Physiology, 427, 681–713, 1990.
[840] D.M. Schneeweiss and J.L. Schnapf, “Photovoltage of rods and cones in the Macaque
retina,” Science, 268, 1053–1056, 1995.
[841] W.F. Schreiber, Fundamentals of Electronic Imaging Systems, 2nd edition, Berlin:
Springer-Verlag, 1991.
[842] E. Schrödinger, “Theorie der Pigmente von größter Leuchtkraft,” Annalen der
Physik, 62, 603–622, 1920.
[843] E.F. Schubert, Light-Emitting Diodes, Cambridge: Cambridge University Press,
2003.
[844] A. Schuster, “Radiation through a foggy atmosphere,” Astrophysical Journal, 21, 1,
1–22, 1905.
[845] A. Schwarz, “Camera shutters,” Applied Optics and Optical Engineering, edited by
R. Kingslake, Vol. IV, pp. 95–125, New York, NY: Academic Press, 1967.
[846] R.K. Segur, “Using photographic space to improve the evaluation of consumer
cameras,” in Proc. IS&T’s 2000 PICS Conference, 221–224, 2000.
[847] N. Sekiguchi, Contrast Sensitivity for Isoluminant Interference Fringes in Hu-
man Foveal Vision, Ph.D. Dissertation, Department of Psychology, University of
Rochester, 1992.
[848] L. Selhuber and A. Parker, “Optical characterisation of LCDs: pitfalls and solutions,”
in Display Systems: Design and Application, edited by L.W. MacDonald and A.C.
Lowe, Chichester: John Wiley & Sons, 1997.
[849] E.W.H. Selwyn, “A theory of graininess,” Photographic Journal, 75, 571–580, 1935.
[850] C.H. Sequin and M.F. Tompsett, Charge Transfer Devices, New York, NY: Academic
Press, 1975.
[851] M.I. Sezan, K.L. Yip, and S. Daly, “Uniform perceptual quantization: Applications
to digital radiography,” IEEE Transactions on Systems, Man, and Cybernetics, 17, 4,
622–634, 1987.
[852] R.V. Shack, “The influence of image motion and shutter operation on the photo-
graphic transfer function,” Applied Optics, 3, 10, 1171–1181, 1964.
[853] J.F. Shackelford, Introduction to Materials Science for Engineers, 4th edition, Upper
Saddle River, NJ: Prentice Hall, 1996.
[854] S.A. Shafer, “Using color to separate reflection components,” Color Research and
Application, 10, 4, 210–218, 1985.
[855] M.B. Shapiro, S.J. Schein, and F.M. De Monasterio, “Regularity and structure of the
spatial pattern of blue cones of macaque retina,” Journal of the American Statistical
Association, 80, 803–812, 1985.
[856] G. Sharma, “LCDs versus CRTs – color-calibration and gamut considerations,”
Proceedings of the IEEE, 90, 4, 605–622, 2002.
[857] G. Sharma (ed.), Digital Color Imaging Handbook, Boca Raton, FL: CRC Press,
2002.
[858] I. Shcherback and O. Yadid-Pecht, “Photoresponse analysis and pixel shape opti-
mization for CMOS active pixel sensors,” IEEE Transactions on Electron Devices,
50, 1, 12–18, 2003.
[859] R.A. Shelby, D.R. Smith, and S. Schultz, “Experimental verification of a negative
index of refraction,” Science, 292, 5514, 77–79, 2001.
[860] D. Shepard, “A two-dimensional interpolation function for irregularly spaced data,”
Proc. 23rd National Conference of ACM, 517–524, 1968.
[861] P.D. Sherman, Colour Vision in the Nineteenth Century, Chapter 6, Bristol: Adam
Hilger Ltd, 1981.
[862] S.M. Sherman and C. Koch, “The control of retinogeniculate transmission in the
mammalian lateral geniculate nucleus,” Experimental Brain Research, 63, 1–20,
1986.
[863] S.K. Shevell (ed.), The Science of Color, 2nd edition (Optical Society of America),
Amsterdam: Elsevier, 2003.
[864] G. Shiers, “Ferdinand Braun and the cathode-ray tube,” Scientific American, 230, 3,
92–101, March 1974.
[865] J. Shinar (ed.), Organic Light-Emitting Devices: A Survey, New York, NY: Springer-
Verlag, 2003.
[866] S.H. Shionoya and W.M. Yen (eds.), Phosphor Handbook, Boca Raton, FL: CRC
Press, 1999.
[867] J.D. Shore and J.P. Spoonhower, “Reflection density in photographic color prints:
generalizations of the Williams-Clapper transform,” Journal of Imaging Science and
Technology, 45, 5, 484–488, 2001.
[868] R.A. Shore, B.J. Thompson, and R.E. Whitney, “Diffraction by apertures illuminated
with partially coherent light,” Journal of the Optical Society of America, 56, 6, 733–
738, 1966.
[869] J.B. Shumaker, Self-Study Manual on Optical Radiation Measurements. Part I Con-
cepts, Chapter 6, Distribution of Optical Radiation with Respect to Polarization,
Washington, DC: National Bureau of Standards (now NIST), 1977.
[870] J.B. Shumaker, Self-Study Manual on Optical Radiation Measurements. Part I Con-
cepts, Chapter 8, Deconvolution, Washington, DC: National Bureau of Standards
(now NIST), 1979.
[871] W.A. Shurcliff, Polarized Light: Production and Use, Cambridge, MA: Harvard
University Press, 1962.
[872] R. Siegel and J.R. Howell, Thermal Radiation Heat Transfer, 2nd edition, Washing-
ton, DC: Hemisphere Publishing, 1981.
[873] M.B. Silevitch, R.A. Gonsalves, and D.C. Ehn, “Prediction and removal of adjacency
effects from photographic images,” Photographic Science and Engineering, 21, 1,
7–13, 1977.
[874] S. Silver, “Microwave aperture antennas and diffraction theory,” Journal of the
Optical Society of America, 52, 131, 1962.
[875] D.A. Silverstein and J.E. Farrell, “Efficient method for paired comparison,” Journal
of Electronic Imaging, 10, 2, 394–398, 2001.
[876] L.D. Silverstein, “Color in electronic displays,” in SID Seminar Lecture Notes, II,
Seminar F5, 1–75, 1997.
[877] J.W. Simmons and M.J. Guttmann, States, Waves and Photons: A Modern Introduc-
tion to Light, Reading, MA: Addison-Wesley, 1970.
[878] H. Simon, The Splendor of Iridescence, New York, NY: Dodd, Mead and Company,
1971.
[879] E.P. Simoncelli and B.A. Olshausen, “Natural image statistics and neural represen-
tation,” Annual Reviews: Neurosciences, 24, 1193–1216, 2001.
[880] J.L. Simonds, “A quantitative study of the influence of tone-reproduction factors on
picture quality,” Photographic Science and Engineering, 5, 5, 270–277, 1961.
[881] E.M. Slayter and H.S. Slayter, Light and Electron Microscopy, Cambridge: Cam-
bridge University Press, 1992.
[882] G.G. Slyusarev, Aberration and Optical Design Theory, 2nd edition, translated by
Major J.H. Dixon, Bristol: Adam Hilger Ltd., 1984.
[883] B.G. Smith, “Geometrical shadowing of a random rough surface,” IEEE Transac-
tions on Antennas and Propagation, 15, 5, 668–671, 1967.
[884] G. Smith and D.A. Atchison, The Eye and Visual Optical Instruments, Cambridge:
Cambridge University Press, 1997.
[885] V.C. Smith and J. Pokorny, “Spectral sensitivity of the foveal cone photopigments
between 400 and 500 nm,” Vision Research, 15, 161–171, 1975.
[886] V.C. Smith, P.Q. Jin, and J. Pokorny, “The role of spatial frequency in color induction,”
Vision Research, 41, 1007–1021, 2001.
[887] W.J. Smith, Modern Optical Engineering, 3rd edition, New York, NY: McGraw-Hill
Book Company, 2000.
[888] W.R. Smythe, Static and Dynamic Electricity, 3rd edition, New York, NY: McGraw-
Hill, 1969.
[889] R. Sobol, “Improving the Retinex algorithm for rendering wide dynamic range
photographs,” Journal of Electronic Imaging, 13, 1, 65–74, 2004.
[890] M. Sommer, P. Rieve, M. Verhoeven, M. Bohm, B. Schneider, B. van Uffel, and
F. Librecht, “First multispectral diode color imager with three color recognition
and color memory in each pixel,” in Proceedings of the 1999 IEEE Workshop on
Charge-Coupled Devices and Advanced Image Sensors, pp. 187–190, Piscataway,
NJ: The IEEE, 1999.
[891] J.M. Soto-Crespo, M. Nieto-Vesperinas, and A.T. Friberg, “Scattering from slightly
rough random surfaces: a detailed study on the validity of the small perturbation
method,” Journal of the Optical Society of America, A, 7, 7, 1185–1201, 1990.
[892] L. Spillmann and J.S. Werner (eds.), Visual Perception: The Neurophysiological
Foundations, San Diego, CA: Academic Press, 1990.
[893] D.L. Spooner, “Why do measured values taken with different color instruments
usually differ?” in Proc. 2nd IS&T/SID Color Imaging Conference, 159–164,
1994.
[894] N. Sprague and J. Luo, “Clothed people detection in still images,” in Proc. 16th
International Conference on Pattern Recognition, Volume 3, 585–589, 2002.
[895] W.N. Sproson, Colour Science in Television and Display Systems, Bristol: Adam
Hilger Ltd, 1983.
[896] R.F. Stamm, M.L. Garcia, and J.J. Fuchs, “The optical properties of human hair I.
Fundamental considerations and goniophotometer curves,” Journal of the Society of
Cosmetic Chemists, 28, 571–600, 1977.
[897] J.-L. Starck, E.J. Candès, and D.L. Donoho, “The curvelet transform for image
denoising,” IEEE Transactions on Image Processing, 11, 6, 670–684, 2002.
[898] J.-L. Starck, F. Murtagh, E.J. Candès, and D.L. Donoho, “Gray and color image
contrast enhancement by the curvelet transform,” IEEE Transactions on Image Pro-
cessing, 12, 6, 706–717, 2003.
[899] O.N. Stavroudis, The Optics of Rays, Wavefronts, and Caustics, New York, NY:
Academic Press, 1972.
[900] G.P. Stein, Internal Camera Calibration Using Rotation and Geometric Shapes,
Master Thesis, Dept. of Electrical Engineering and Computer Science, Mas-
sachusetts Institute of Technology, Cambridge, MA, 1993.
[901] M.A. Stephens, “Random walk on a circle,” Biometrika, 50, 385–390, 1963.
[902] P. Sterling, “Retina,” in The Synaptic Organization of the Brain, edited by G.M.
Shepherd, New York, NY: Oxford University Press, 1990.
[903] J.C. Stevens and S.S. Stevens, “Brightness function: Effects of adaptation,” Journal
of the Optical Society of America, 53, 3, 375–385, 1963.
[904] S.S. Stevens (ed.), Handbook of Experimental Psychology, New York, NY: John
Wiley & Sons, 1951.
[905] S.S. Stevens and J.C. Stevens, The Dynamics of Visual Brightness, Psychophysical
Project Report PPR-246, Harvard University, Cambridge, MA., August 1960.
[906] S.S. Stevens, “To honour Fechner and repeal his law,” Science, 133, 80–86, 1961.
[907] E.C. Stewart, “The Gelb effect,” Journal of Experimental Psychology, 57, 4, 235–
242, 1959.
[908] W.S. Stiles, “The luminous efficiency of monochromatic rays entering the eye pupil
at different points and a new color effect,” Proceedings of the Royal Society of
London, B, 123, 90–118, 1937.
[909] T.G. Stockham Jr., “Image processing in the context of a visual model,” Proceedings
of the IEEE, 60, 828–842, July 1972.
[910] A. Stockman, D.I.A. MacLeod, and N.E. Johnson, “Spectral sensitivities of the hu-
man cones,” Journal of the Optical Society of America, A, 10, 12, 2491–2521, 1993.
[911] A. Stockman, L.T. Sharpe, K. Rüther, and K. Nordby, “Two signals in the human rod
visual system: A model based on electrophysiological data,” Visual Neuroscience,
12, 951–970, 1995.
[953] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in
Proceedings of the 6th International Conference on Computer Vision, 839–846,
1998.
[954] S. Tominaga and B.A. Wandell, “Standard surface-reflectance model and illuminant
estimation,” Journal of the Optical Society of America, A, 6, 4, 576–584, 1989.
[955] S. Tominaga and B.A. Wandell, “Natural scene-illuminant estimation using the
sensor correlation,” Proceedings of the IEEE, 90, 1, 42–56, 2002.
[956] R.B.H. Tootell, J.D. Mendola, N.K. Hadjikhani, P.J. Ledden, A.K. Liu, J.B. Reppas,
M.I. Sereno, and A.M. Dale, “Functional analysis of V3A and related areas in human
visual cortex,” The Journal of Neuroscience, 17, 7060–7078, 1997.
[957] K.E. Torrance and E.M. Sparrow, “Theory for off-specular reflection from rough-
ened surfaces,” Journal of the Optical Society of America, 57, 9, 1105–1114, 1967.
[958] P.E. Trahanias and A.N. Venetsanopoulos, “Vector directional filters: A new class
of multichannel image processing filters,” IEEE Transactions on Image Processing,
2, 528–534, 1993.
[959] P.E. Trahanias, D.G. Karakos, and A.N. Venetsanopoulos, “Directional processing
of color images: Theory and experimental results,” IEEE Transactions on Image
Processing, 5, 868–880, 1996.
[960] L.N. Trefethen and D. Bau, III, Numerical Linear Algebra, Philadelphia, PA: SIAM,
1997.
[961] S. Triantaphillidou and R.E. Jacobson, “A simple method for the measurement
of modulation transfer functions of displays,” in Proceedings of the IS&T’s 2000
PICS Conference, 139–144, 2000.
[962] R.C. Tripathi and B.J. Tripathi, “Anatomy, orbit and adnexa of the human eye,” in
The Eye, Volume 1A, edited by H. Davson, Orlando, FL: Academic Press, 1984.
[963] Y. Trotter and S. Celebrini, “Gaze direction controls response gain in primary visual-
cortex neurons,” Nature, 398, 239–242, 1999.
[964] J.B. Troy and B.B. Lee, “Steady discharges of macaque retinal ganglion cells,” Visual
Neuroscience, 11, 111–118, 1994.
[965] H.J. Trussell, “Applications of set theoretic methods to color systems,” Color
Research and Application, 16, 1, 31–41, 1991.
[966] R.Y. Tsai, “A versatile camera calibration technique for high-accuracy 3-D ma-
chine vision metrology using off-the-shelf TV cameras and lenses,” IEEE Journal
of Robotics and Automation, 3, 4, 323–344, 1987.
[967] P.-S. Tsai, T. Acharya, and A.K. Ray, “Adaptive fuzzy color interpolation,” Journal
of Electronic Imaging, 11, 3, 293–305, 2002.
[968] D.Y. Ts’o and C.D. Gilbert, “The organization of chromatic and spatial interac-
tions in the primate striate cortex,” The Journal of Neuroscience, 8, 5, 1712–1727,
1988.
[969] D.Y. Ts’o, A. Wang Roe, and C.D. Gilbert, “A hierarchy of the functional organi-
zation for color, form, and disparity in primate visual area V2,” Vision Research, 41,
1333–1349, 2001.
[970] Y. Tsukamoto, P. Masarachia, S. Schein, and P. Sterling, “Gap junctions between
the pedicles of macaque foveal cones,” Vision Research, 32, 10, 1809–1815, 1992.
[971] K.-I. Tsutsui, H. Sakata, T. Naganuma, and M. Taira, “Neural correlates for percep-
tion of 3D surface orientation from texture gradient,” Science, 298, 5592, 409–412,
2002.
[972] C.M. Tuttle, “Photoelectric photometry in the printing of amateur negatives,” Journal
of the Franklin Institute, 224, 315–337, 1937.
[973] K. Uchikawa, H. Uchikawa, and P.K. Kaiser, “Luminance and saturation of equally
bright colors,” Color Research and Application, 9, 1, 5–14, 1984.
[974] H. Uchike and T. Hirakawa, “Color plasma displays,” Proceedings of the IEEE, 90,
4, 533–539, 2002.
[975] K. Ukai, “Spatial pattern as a stimulus to the pupillary system,” Journal of the
Optical Society of America, A, 2, 7, 1094–1100, 1985.
[976] R. Ulichney, Digital Halftoning, Cambridge, MA: The MIT Press, 1987.
[977] L.M. Vaina, “Functional segregation of color and motion processing in the human
visual cortex: clinical evidence,” Cerebral Cortex, 4, 5, 555–572, 1994.
[978] J.M. Valeton and D. van Norren, “Light adaptation of primate cones: an analysis
based on extracellular data,” Vision Research, 23, 12, 1539–1547, 1983.
[979] W. Vanduffel, D. Fize, H. Peuskens, K. Denys, S. Sunaert, J.T. Todd, and G.A. Orban,
“Extracting 3D from motion: differences in human and monkey intraparietal cortex,”
Science, 298, 5592, 413–415, 2002.
[980] D.C. Van Essen and H.A. Drury, “Structural and functional analyses of human
cerebral cortex using a surface-based map,” The Journal of Neuroscience, 17, 7079–
7102, 1997.
[981] D.C. Van Essen, J.W. Lewis, H.A. Drury, N. Hadjikhani, R.B.H. Tootell, M. Bakir-
cioglu, and M.I. Miller, “Mapping visual cortex in monkeys and humans using
surface-based atlases,” Vision Research, 41, 1359–1378, 2001.
[982] D.C. Van Essen, “Organization of visual areas in macaque and human cerebral
cortex,” in The Visual Neurosciences, edited by L.M. Chalupa and J.S. Werner,
Cambridge, MA: MIT Press, 2004.
[983] E. Vanmarcke, Random Fields: Analysis and Synthesis, Cambridge, MA: The MIT
Press, 1983.
[984] A. van Meeteren, “Calculations on the optical modulation transfer function of the
human eye for white light,” Optica Acta, 21, 5, 395–412, 1974.
[985] F.L. Van Nes and M.A. Bouman, “Spatial modulation transfer in the human eye,”
Journal of the Optical Society of America, 57, 401–406, 1967.
[986] R. Varma, J.M. Tielsch, H.A. Quigley, et al., “Race-, age-, gender-, and refractive
error-related differences in the normal optic disc,” Archives of Ophthalmology, 112,
1068–1076, 1994.
[987] F.O.C.A. Veau, Optimization of the Modulation Transfer Function of a Joyce Loebl
Microdensitometer, Master Thesis, School of Physics, University of Melbourne,
Australia, 1971.
[988] W.H. Venable, “Accurate tristimulus values from spectral data,” Color Research and
Application, 14, 5, 260, 1989.
[989] H.G. Völz, Industrial Color Testing: Fundamentals and Techniques, Weinheim,
Germany: VCH, 1995.
[1012] G.J. Ward, “Measuring and modeling anisotropic reflection,” Computer Graphics,
26, 2, 265–272, 1992. (Proceedings of ACM SIGGRAPH 1992.)
[1013] H. Wässle, M. Yamashita, U. Greferath, U. Grünert, and F. Müller, “The rod bipolar
cell of the mammalian retina,” Visual Neuroscience, 7, 99–112, 1991.
[1014] A.B. Watson (ed.), Digital Images and Human Vision, Cambridge, MA: MIT Press,
1993.
[1015] G.S. Watson and E.J. Williams, “On the construction of significance tests on the
circle and the sphere,” Biometrika, 43, 344–352, 1956.
[1016] G.S. Watson, Statistics on Spheres, New York, NY: John Wiley, 1983.
[1017] R.J. Watt and M. Morgan, “The recognition and representation of edge blur,” Vision
Research, 23, 1465–1477, 1983.
[1018] L.F. Weber, Plasma Displays, Seminar Lecture Notes, M-8, 1–36, Santa Ana, CA:
Society for Information Display, 1994.
[1019] M.F. Weber, C.A. Stover, L.R. Gilbert, T.J. Nevitt, and A.J. Ouderkirk, “Giant bire-
fringent optics in multilayer polymer mirrors,” Science, 287, 2451–2456, 2000.
[1020] V.R. Weidner and J.J. Hsia, “Reflection properties of pressed polytetrafluoroethylene
powder,” Journal of the Optical Society of America, 71, 7, 856–861, 1981.
[1021] W.T. Welford, Aberrations of Optical Systems, Bristol: Adam Hilger Ltd, 1986.
[1022] J. Weng, P. Cohen, and M. Herniou, “Camera calibration with distortion models
and accuracy evaluation,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 14, 10, 965–980, 1992.
[1023] M.F. Wesner, J. Pokorny, S.K. Shevell, and V.C. Smith, “Foveal cone detection statis-
tics in color-normals and dichromats,” Vision Research, 31, 6, 1021–1037, 1991.
[1024] G. Westheimer, “Sharpness discrimination for foveal targets,” Journal of the Optical
Society of America, A, 8, 4, 681–685, 1991.
[1025] G. Westheimer, “Optics in vision,” in Visual Science and Engineering, edited by
D.H. Kelly, New York, NY: Marcel Dekker, Inc., 1994.
[1026] J.C. Whitaker, Electronic Displays, New York, NY: McGraw-Hill, 1994.
[1027] G. Whiteside, E. Betensky, D. Butler, Y. Chao, and J. Van Tassell, “Image acquisition
module with all plastic optics,” Proc. IS&T’s PICS Conference, pp. 70–75, Savannah,
Georgia, April 1999.
[1028] P. Whittle, “The psychophysics of contrast brightness,” and “Contrast brightness
and ordinary seeing,” in Lightness, Brightness, and Transparency, edited by A.L.
Gilchrist, Hillsdale, NJ: Lawrence Erlbaum, 1994.
[1029] H. Widdel and D.L. Post (eds.), Color in Electronic Displays, New York, NY: Plenum
Press, 1992.
[1030] N. Wiener, “Generalized harmonic analysis,” Acta Mathematica, 55, 117–258, 1930.
[1031] N. Wiener, Extrapolation, Interpolation and Smoothing of Stationary Time Series,
Cambridge, MA: MIT Press, 1949.
[1032] D.R. Williams, D.I.A. MacLeod, and M. Hayhoe, “Foveal tritanopia,” Vision Re-
search, 21, 1341–1356, 1981; and “Punctate sensitivity of the blue-sensitive mecha-
nisms,” Vision Research, 21, 1357–1375, 1981.
[1033] D.R. Williams, “Topography of the foveal cone mosaic in the living human eye,”
Vision Research, 28, 433–454, 1988.
[1034] D.R. Williams and A. Roorda, “The trichromatic cone mosaic in the human eye,”
in Color Vision: From Genes to Perception, edited by K.R. Gegenfurtner and L.T.
Sharpe, pp. 113–122, Cambridge: Cambridge University Press, 2000.
[1035] F.C. Williams and F.R. Clapper, “Multiple internal reflections in photographic color
prints,” Journal of the Optical Society of America, 43, 7, 595–599, 1953.
[1036] S.J. Williamson and H.Z. Cummins, Light and Color in Nature and Art, New York,
NY: John Wiley and Sons, 1983.
[1037] E.N. Willmer and W.D. Wright, “Colour sensitivity of the fovea centralis,” Nature,
156, 119–121, 1945.
[1038] F.M. Willmouth, “Transparency, translucency, and gloss,” Chapter 5 in Optical Prop-
erties of Polymers, edited by G.H. Meeten, London: Elsevier Applied Science
Publishers, 1986.
[1039] R.G. Willson, Modeling and Calibration of Zoom Lenses, Ph.D. Thesis, Robotics
Institute, Carnegie Mellon University, Pittsburgh, PA, 1994.
[1040] M.C.K. Wiltshire, “Bending light the wrong way,” Science, 292, 5514, 60–61, 2001.
[1041] E. Wolf, “Coherence and radiometry,” Journal of the Optical Society of America,
68, 1, 6–17, 1978.
[1042] J.M. Wolfe and D.A. Owens, “Is accommodation colorblind? Focusing chromatic
contours,” Perception, 10, 53–62, 1981.
[1043] W.L. Wolfe, “Radiometry,” in Applied Optics and Optical Engineering, Vol. VIII,
pp. 117–170, New York, NY: Academic Press, 1980.
[1044] W.L. Wolfe, Introduction to Radiometry, Bellingham, WA: SPIE Press, 1998.
[1045] L.B. Wolff, “Diffuse-reflectance model for smooth dielectric surfaces,” Journal of
the Optical Society of America, A, 11, 11, 2956–2968, 1994.
[1046] M. Wolski, C.A. Bouman, J.P. Allebach, and E. Walowit, “Optimization of sen-
sor response functions for colorimetry of reflective and emissive objects,” IEEE
Transactions on Image Processing, 5, 507–517, 1996.
[1047] H.-S. Wong, “Technology and device scaling considerations for CMOS imagers,”
IEEE Transactions on Electron Devices, 43, 12, 2131–2142, 1996.
[1048] E.A. Wood, Crystals and Light, New York, NY: Dover, 1977.
[1049] S.L. Wright, K. Ho, and A. Lien, “Status of TFT-LCD color and metrology,” in Proc.
IS&T/SID Eighth Color Imaging Conference, 301–304, 2000.
[1050] W.D. Wright, “The basic concepts and attributes of colour order systems,” Color
Research and Application, 9, 4, 229–233, 1984.
[1051] X. Wu and N. Zhang, “Primary-consistent soft-decision color demosaicking for
digital cameras,” IEEE Transactions on Image Processing, 13, 9, 1263–1274, 2004.
[1052] S. Wuerger, A.B. Watson, and A.J. Ahumada Jr., “Toward a standard observer for
spatio-chromatic detection,” Proceedings of the SPIE, 4662, 159–172, 2002.
[1053] G. Wyszecki and W.S. Stiles, Color Science, 2nd edition, New York, NY: John Wiley
and Sons, 1982.
[1054] J.M. Yaeli, “Stray light measurement for imaging systems,” Optical Engineering,
27, 1, 86–87, 1988.
[1055] Y. Yamamoto, “Colorimetric evaluation of skin color in the Japanese,” Plastic and
Reconstructive Surgery, 96, 1, 139–145, 1995.
[1056] L. Yang and B. Kruse, “Revised Kubelka–Munk theory. I. Theory and application,”
Journal of the Optical Society of America, A, 21, 10, 1933–1941, 2004.
[1057] X.L. Yang and S.M. Wu, “Modulation of rod-cone coupling by light,” Science, 244,
352–354, 1989.
[1058] T. Yano and K. Hashimoto, “Preference for Japanese complexion color under illu-
mination,” Color Research and Application, 22, 4, 269–274, 1997.
[1059] A.L. Yarbus, Eye Movements and Vision, translated by B. Haigh, New York, NY:
Plenum Press, 1967.
[1060] A. Yariv, Optical Electronics, New York, NY: CBS College Publishing, Holt, Rine-
hart and Winston, 1985.
[1061] Y. Yoshida and Y. Yamamoto, “Color calibration of LCDs,” in Proc. IS&T/SID 10th
Color Imaging Conference, 305–311, 2002.
[1062] R.S.L. Young and M. Alpern, “Pupil responses to foveal exchange of monochromatic
lights,” Journal of the Optical Society of America, 70, 6, 697–706, 1980.
[1063] K.C. Yow and R. Cipolla, “Feature-based human face detection,” Image and Vision
Computing, 15, 9, 713–735, 1997.
[1064] J.A.C. Yule, “Unsharp masks and a new method of increasing definition in prints,”
Photographic Journal, 84, 321–327, 1944.
[1065] J.A.C. Yule, Principles of Color Reproduction, New York, NY: Wiley, 1967.
[1066] S. Zeki, A Vision of the Brain, Oxford: Blackwell Scientific Publications, 1993.
[1067] X. Zhang and B. Wandell, “A spatial extension of CIELAB for digital color image
reproduction,” SID Digest, 27, 731–734, 1996.
[1068] X. Zhang and B. Wandell, “Color image fidelity metrics evaluated using image
distortion maps,” Signal Processing, 70, 201–214, 1998.
[1069] W. Zhu, K. Parker, and M.A. Kriss, “Color filter arrays based on mutually exclusive
blue noise patterns,” Journal of Visual Communication and Image Representation,
10, 245–267, 1999.
[1070] A. van der Ziel, Noise in Solid State Devices and Circuits, New York, NY: John
Wiley and Sons, 1986.
[1071] H. Zollinger, Color Chemistry, 2nd edition, Weinheim: VCH Publishers, 1991.
[1072] G. Zorpette, “Let there be light,” IEEE Spectrum, 39, 9, 70–74, September, 2002.
[1073] A. Zukauskas, M.S. Shur, and R. Gaska, Introduction to Solid-State Lighting, New
York, NY: John Wiley & Sons, 2002.
[1074] D.M. Zwick, “Colour granularity and graininess,” Journal of Photographic Science,
11, 269–275, 1963.
[1075] D. Zwick and D.L. Brothers, Jr., “RMS granularity: determination of just-noticeable
differences,” Photographic Science and Engineering, 19, 4, 235–238, 1975.