Digital Camera
[Figure 2.23: Image sensing pipeline, showing the various sources of noise as well as typical digital post-processing steps.]
Outline
❖ The Sensor
❖ Colour Coding
The Sensor
๏ Then charge is transferred from well to well (“bucket brigade”) until deposited at the sense amplifiers

The two main kinds of sensor used in digital still and video cameras today are the charge-coupled device (CCD) and complementary metal oxide on silicon (CMOS). In a CCD, photons are accumulated in each active well during the exposure time. Then, in a transfer phase, the charges are transferred from well to well in a kind of “bucket brigade” until they are deposited at the sense amplifiers, which amplify the signal and pass it to an analog-to-digital converter (ADC). Older CCD sensors were prone to blooming, where charges from one over-exposed pixel spilled into adjacent ones, but most newer CCDs have anti-blooming technology (“troughs” into which the excess charge can spill).

In CMOS, the photons hitting the sensor directly affect the conductivity (or gain) of a photodetector, which can be selectively gated to control exposure duration, and locally amplified before being read out using a multiplexing scheme. Traditionally, CCD sensors outperformed CMOS in quality-sensitive applications, such as digital SLRs, while CMOS was better for low-power applications, but today CMOS is used in most digital cameras.

The main factors affecting the performance of a digital image sensor are the shutter speed, sampling pitch, fill factor, chip size, analog gain, sensor noise, and the resolution (and quality) of the analog-to-digital converter.
Shutter Speed
❖ Measured in fractions of a second (e.g., 1/125, 1/60, 1/30,…)
Sampling Pitch
❖ For a fixed chip size, smaller pitch means higher resolution (good!) but less light per pixel (bad!)
❖ The sensor chip width is generally less than the 35mm width of a standard film frame.
Focal Length
❖ Our understanding of focal lengths (e.g., a standard 50mm lens) is based on using 35mm film.
❖ To adapt this to a digital camera we must scale by the ratio of the sensor widths.
$$\tan\frac{\theta}{2} = \frac{W}{2f} \quad\text{or}\quad f = \frac{W}{2}\left[\tan\frac{\theta}{2}\right]^{-1} \tag{2.60}$$

[Figure 2.10: Central projection, showing the relationship between the 3D coordinates (X, Y, Z) and the 2D image coordinates (x, y, 1), as well as the relationship between the focal length f, the image width W, and the field of view θ.]

For conventional film cameras, W = 35mm, and hence f is also expressed in millimeters. Since we work with digital images, it is more convenient to express W in pixels so that the focal length f can be used directly in the calibration matrix K as in (2.59). Another possibility is to scale the pixel coordinates so that they go from [−1, 1) along the longer image dimension and [−a⁻¹, a⁻¹) along the shorter axis, where a ≥ 1 is the image aspect ratio (as opposed to the sensor cell aspect ratio introduced earlier).
Figure 2.9 shows how these quantities can be visualized as part of a simplified projection model. Note that now we have placed the image plane in front of the nodal point (projection center of the lens). The sense of the y axis has also been flipped to get a coordinate system compatible with the way that most imaging libraries treat the vertical (row) coordinate. Certain graphics libraries, such as Direct3D, use a left-handed coordinate system, which can lead to some confusion.

The issue of how to express focal lengths is one that often causes confusion in implementing computer vision algorithms and discussing their results. This is because the focal length depends on the units used to measure pixels. If we number pixel coordinates using integer values, say [0, W) × [0, H), the focal length f and camera center (cx, cy) in (2.59) can be expressed as pixel values. How do these quantities relate to the more familiar focal lengths used by photographers?

❖ Example: What focal length would give me the equivalent of a 50mm lens for the FLIR BlackFly S BFS-PGE-122S6C-C?
❖ Sensor width: 1.1” = 27.94mm
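A quick numerical sketch of this example (the function names are mine, and it takes the slide's 1.1” = 27.94mm sensor width at face value):

```python
import math

def fov_deg(W_mm: float, f_mm: float) -> float:
    """Field of view from sensor width W and focal length f (Eq. 2.60)."""
    return math.degrees(2.0 * math.atan2(W_mm, 2.0 * f_mm))

def equivalent_focal(f_35mm: float, sensor_width_mm: float,
                     film_width_mm: float = 35.0) -> float:
    """Scale a 35mm-film focal length by the ratio of the sensor widths."""
    return f_35mm * sensor_width_mm / film_width_mm

# Slide's example: a 50mm-equivalent lens on a 27.94mm-wide sensor.
f_eq = equivalent_focal(50.0, 27.94)
print(f"equivalent focal length: {f_eq:.1f} mm")          # ~39.9 mm
print(f"field of view: {fov_deg(27.94, f_eq):.1f} deg")   # same FOV as 50mm on 35mm film
```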
Analog Gain
❖ May be controlled through automatic gain control logic
❖ Higher gain allows faster shutter speeds (less motion blur) and/or smaller apertures (greater depth of field).
[Figure 2.23: Image sensing pipeline, showing the various sources of noise as well as typical digital post-processing steps.]
❖ Sources of noise in this pipeline include:
๏ shot noise
๏ amplifier noise
๏ quantization noise
❖ The sensor is spatially sampling this signal at discrete locations determined by the
sampling pitch.
❖ If the image is not low-pass filtered, aliasing will result: high frequency content will
be inextricably mixed with low frequency content in the digital image.
[Figure 2.24: Aliasing of a one-dimensional signal: sampled at a rate of fs = 2, the signals sin(2π(3/4)x) and −sin(2π(5/4)x) produce identical samples, so the f = 5/4 component is aliased down to f = 3/4.]
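A quick numerical check of this claim (my own sketch; the sampling rate fs = 2 is an assumption consistent with the two frequencies shown):

```python
import numpy as np

# Sample both sinusoids at rate fs = 2 (samples at x = n / fs).
fs = 2.0
n = np.arange(16)
x = n / fs

low  = np.sin(2 * np.pi * (3 / 4) * x)   # f = 3/4, below Nyquist (fs/2 = 1)
high = -np.sin(2 * np.pi * (5 / 4) * x)  # f = 5/4, above Nyquist

# The two signals are indistinguishable after sampling: f = 5/4 aliases to 3/4.
assert np.allclose(low, high)
print("max difference:", np.abs(low - high).max())
```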
❖ However, the Fourier transform of this ‘boxcar’ filter falls only as 1/f, and thus high
frequencies, while attenuated, are still present and cause aliasing.
[Figure 2.25: Aliasing of a two-dimensional signal: (a) original full-resolution image; (b) downsampled 4× with a 25% fill-factor box filter; (c) downsampled 4× with a 100% fill-factor box filter; (d) downsampled 4× with a high-quality 9-tap low-pass filter.]
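The figure's comparison can be approximated in a few lines. This sketch (mine, not the textbook's code) uses a Gaussian as a stand-in for the high-quality low-pass filter, whose exact 9 taps the slides do not give:

```python
import numpy as np
from scipy import ndimage

def downsample_box(img: np.ndarray, k: int = 4) -> np.ndarray:
    """100% fill-factor boxcar: average each k x k neighbourhood, then subsample."""
    smoothed = ndimage.uniform_filter(img, size=k)
    return smoothed[k // 2 :: k, k // 2 :: k]

def downsample_gaussian(img: np.ndarray, k: int = 4) -> np.ndarray:
    """Better low-pass: Gaussian wide enough to attenuate above the new Nyquist."""
    smoothed = ndimage.gaussian_filter(img, sigma=k / 2)
    return smoothed[k // 2 :: k, k // 2 :: k]

# A high-frequency test pattern (zone-plate-like) makes the aliasing obvious.
y, x = np.mgrid[0:256, 0:256]
img = np.sin(0.002 * (x**2 + y**2))

box = downsample_box(img)         # visible moiré / aliasing artifacts
gauss = downsample_gaussian(img)  # mostly smooth: high frequencies suppressed
```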
Point Spread Function (PSF)
❖ The pre-filtering of the optical signal is determined by:
๏ the blur (point spread function) of the lens
๏ the optical anti-aliasing filter
๏ the finite integration area of each sensor cell (fill factor)
❖ If together these filters adequately attenuate frequencies above the Nyquist limit,
visible aliasing will be minimal.
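To see how much pre-filtering the sensor's fill factor alone provides, here is a small sketch (my own) evaluating the boxcar's frequency response at the Nyquist frequency:

```python
import numpy as np

def boxcar_mtf(f: np.ndarray, width: float) -> np.ndarray:
    """Fourier magnitude of a boxcar of the given width: |sinc(f * width)|."""
    return np.abs(np.sinc(f * width))  # np.sinc(x) = sin(pi x) / (pi x)

pitch = 1.0                   # sampling pitch of 1 pixel; Nyquist at 0.5 cycles/pixel
f_nyquist = 0.5 / pitch
for fill in (0.25, 1.0):      # fill factor = active width / pitch
    atten = boxcar_mtf(np.array([f_nyquist]), fill * pitch)[0]
    print(f"fill factor {fill:>4}: response at Nyquist = {atten:.2f}")
# 25% fill: ~0.97 (almost no attenuation, hence strong aliasing); 100% fill: ~0.64
```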
❖ Colour Coding
❖ Yet most colour cameras have only 3 discrete types of sensor elements tuned to 3
different colours (wavelengths): red, green and blue.
❖ Any 3 colour vectors that span this 3D space are sufficient to generate the entire space
of colours that we experience.
[Figure: cone sensitivity curves.]
๏ e.g., the CIE standard primaries: Red (700.0nm), Green (546.1nm), Blue (435.8nm)
❖ Note that reproducing pure spectra in the blue-green range requires a negative amount of red
light!
[Figure 2.28: (a) CIE RGB colour matching functions r̄(λ), ḡ(λ), b̄(λ); (b) XYZ colour matching functions x̄(λ), ȳ(λ), z̄(λ).]
…obtain the resulting (x̄(λ), ȳ(λ), z̄(λ)) curves shown in Figure 2.28b. Notice how all three spectra (color matching functions) now have only positive values and how the ȳ(λ) curve matches that of the luminance perceived by humans.

Chromaticity Coordinates

If we divide the XYZ values by the sum X + Y + Z, we obtain the chromaticity coordinates

$$x = \frac{X}{X+Y+Z}, \quad y = \frac{Y}{X+Y+Z}, \quad z = \frac{Z}{X+Y+Z}, \tag{2.104}$$
which sum up to 1. The chromaticity coordinates discard the absolute intensity of a given color sample and just represent its pure color. If we sweep the monochromatic color parameter λ in Figure 2.28b from λ = 380nm to λ = 800nm, we obtain the familiar chromaticity diagram shown in Figure 2.29. This figure shows the (x, y) value for every color value perceivable by most humans. (Of course, the CMYK reproduction process in this book does not actually span the whole gamut of perceivable colors.) The outer curved rim represents where all of the pure monochromatic color values map in (x, y) space, while the lower straight line, which connects the two endpoints, is known as the purple line.

A convenient representation for color values, when we want to tease apart luminance and chromaticity, is therefore Yxy (luminance plus the two most distinctive chrominance components).
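A minimal sketch of Eq. (2.104) (the function name is mine; the D65 white point used in the example is standard data):

```python
import numpy as np

def xyz_to_xyy(X: float, Y: float, Z: float) -> tuple[float, float, float]:
    """Convert XYZ to Yxy: luminance Y plus chromaticity (x, y) from Eq. (2.104)."""
    s = X + Y + Z
    x, y = X / s, Y / s
    return Y, x, y

# The D65 white point (a standard illuminant) lands near the diagram's centre.
print(xyz_to_xyy(95.047, 100.0, 108.883))  # -> (100.0, ~0.3127, ~0.3290)
```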
While the XYZ color space has many convenient properties, including the ability to separate luminance from chrominance, it does not actually predict how well humans perceive differences in color or luminance.
[Figure 2.29: CIE chromaticity diagram, showing colors and their corresponding (x, y) values.]
L*a*b* Space
❖ Human luminance/colour sensitivity is roughly logarithmic

$$L^* = 116\, f\!\left(\frac{Y}{Y_n}\right) - 16 \tag{2.105}$$

[Figure: L* as a function of Y, for 0 ≤ Y ≤ 100.]

where Yn is the luminance value for nominal white (Fairchild 2005), here Yn = 100, and

$$f(t) = \begin{cases} t^{1/3} & t > \delta^3 \\ t/(3\delta^2) + 2\delta/3 & \text{else} \end{cases} \tag{2.106}$$

is a finite-slope approximation to the cube root, with δ = 6/29. The resulting 0…100 scale roughly measures equal amounts of lightness perceptibility.

In a similar fashion, the a* and b* components are defined as

$$a^* = 500\left[f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right)\right], \quad b^* = 200\left[f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right)\right]$$

In MATLAB:
• rgb2lab(rgb)
• lab2rgb(lab)
• xyz2lab(xyz)
• lab2xyz(lab)
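A minimal Python sketch of these formulas (function names are mine; the D65 white point is an assumed default, and MATLAB's xyz2lab handles scaling conventions differently):

```python
import numpy as np

DELTA = 6.0 / 29.0

def f(t):
    """Finite-slope approximation to the cube root (Eq. 2.106)."""
    t = np.asarray(t, dtype=float)
    return np.where(t > DELTA**3, np.cbrt(t), t / (3 * DELTA**2) + 2 * DELTA / 3)

def xyz_to_lab(X, Y, Z, Xn=95.047, Yn=100.0, Zn=108.883):
    """CIELAB from XYZ (Eqs. 2.105-2.106 plus the a*/b* definitions), D65 white."""
    L = 116 * f(Y / Yn) - 16
    a = 500 * (f(X / Xn) - f(Y / Yn))
    b = 200 * (f(Y / Yn) - f(Z / Zn))
    return L, a, b

print(xyz_to_lab(95.047, 100.0, 108.883))  # nominal white -> (100.0, 0.0, 0.0)
```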
Appendix A: Spectral Response Curves
❖ It’s the job of the camera firmware to convert these proprietary sensor responses to
standard colour values.
❖ For some professional and scientific cameras, the manufacturer provides the spectral
responses.
❖ The most common design is the Bayer pattern, consisting of green filters over half of the sensors (in a checkerboard pattern), with red and blue filters over the remainder

[Figure 2.30: Bayer RGB pattern, with unknown (guessed) interpolated values shown as lower case.]

๏ visual acuity is far greater for luminance than colour

The most commonly used pattern in color cameras today is the Bayer pattern (Bayer 1976), which places green filters over half of the sensors (in a checkerboard pattern), and red and blue filters over the remaining ones (Figure 2.30). The reason that there are twice as many green filters as red and blue is because the luminance signal is mostly determined by green values and the visual system is much more sensitive to high frequency detail in luminance than in chrominance (a fact that is exploited in color image compression; see Section 2.3.3).

❖ Interpolation of missing colour values at each pixel is known as demosaicing (a simple bilinear version is sketched below).
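A naive bilinear demosaicer, as a sketch only (an RGGB layout and dimensions divisible by 2 are assumed; real camera pipelines use more sophisticated edge-aware interpolation):

```python
import numpy as np
from scipy import ndimage

def demosaic_bilinear(raw: np.ndarray) -> np.ndarray:
    """Bilinear demosaic of an RGGB Bayer mosaic (H x W -> H x W x 3)."""
    H, W = raw.shape
    masks = np.zeros((H, W, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True   # red samples
    masks[0::2, 1::2, 1] = True   # green samples (checkerboard)
    masks[1::2, 0::2, 1] = True
    masks[1::2, 1::2, 2] = True   # blue samples

    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0  # green interpolation
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0  # red/blue interpolation

    rgb = np.zeros((H, W, 3))
    for c, k in zip(range(3), (k_rb, k_g, k_rb)):
        sparse = np.where(masks[:, :, c], raw, 0.0)  # zero-filled colour plane
        # These kernels reproduce known samples exactly and average neighbours
        # at the missing locations.
        rgb[:, :, c] = ndimage.convolve(sparse, k, mode="mirror")
    return rgb
```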
White Balance
❖ The colour of the irradiance received from a surface depends upon both the colour of the
surface material and the colour of the illuminant.
❖ If the illuminant deviates from the standard assumed by the camera, the resulting photo (out of context) may look oddly coloured.
❖ White balance is an attempt to reduce this effect by moving the white point of the image
closer to pure white (equal RGB values).
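One simple way to implement this (a per-channel von Kries-style scaling; the slide does not specify the camera's actual algorithm):

```python
import numpy as np

def white_balance(img: np.ndarray, white_rgb: np.ndarray) -> np.ndarray:
    """Scale each channel so a measured white patch maps to equal RGB values."""
    gains = white_rgb.max() / white_rgb   # per-channel gains (von Kries style)
    return np.clip(img * gains, 0.0, 1.0)

# Example: a warm (reddish) illuminant measured off a white patch.
patch = np.array([0.9, 0.8, 0.6])
img = np.random.rand(4, 4, 3) * patch     # scene tinted by the illuminant
balanced = white_balance(img, patch)      # white point moved toward equal RGB
```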
Note: some cameras do more than simple white balancing; they apply a full color twist, i.e., a general 3 × 3 color transform matrix.

Gamma

❖ Cameras typically compress the intensity (luminance) of pixel values through an inverse ‘gamma function’:

$$Y' = Y^{1/\gamma} \tag{2.111}$$

where 1/γ ≈ 0.45.

❖ This cancels the gamma function applied to RGB values by display systems prior to rendering:

$$B = V^{\gamma} \tag{2.110}$$

In the early days of black and white television, the phosphors in the CRT used to display the TV signal responded non-linearly to their input voltage. The relationship between the voltage and the resulting brightness was characterized by a number called gamma (γ), as in (2.110), with γ ≈ 2.2. To compensate for this effect, the electronics in the TV camera would map the sensed luminance Y through an inverse gamma (2.111), with a typical value of 1/γ = 0.45. Passing the signal through this non-linearity before transmission had a beneficial side effect: noise added during transmission (remember, these were analog days!) would be reduced (after applying the gamma at the receiver) in the darker regions of the signal, where it is more visible (Figure 2.31). (Remember that our visual system is roughly sensitive to relative differences in luminance.) Those of us who remember the early days of color television will naturally think of the hue adjustment knob, which could produce truly bizarre results. A related technique called companding was the basis of the Dolby noise reduction systems used with audio tapes.

[Figure 2.31: the gamma mapping between input intensity and output intensity.]

❖ However, the nonlinear relationship between encoded RGB values and physical intensities complicates physics-based computer vision algorithms, which often assume access to linear luminance values.
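A pure power-law sketch of Eqs. (2.110) and (2.111); note that real encodings such as sRGB use a piecewise curve with a linear segment near black:

```python
import numpy as np

GAMMA = 2.2

def encode(Y: np.ndarray) -> np.ndarray:
    """Camera-side compression (Eq. 2.111): Y' = Y**(1/gamma), 1/gamma ~ 0.45."""
    return np.power(Y, 1.0 / GAMMA)

def decode(Yp: np.ndarray) -> np.ndarray:
    """Display-side expansion (Eq. 2.110): B = V**gamma."""
    return np.power(Yp, GAMMA)

Y = np.linspace(0.0, 1.0, 5)
assert np.allclose(decode(encode(Y)), Y)  # the two non-linearities cancel
# Physics-based vision wants linear values: apply decode() before photometric math.
```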
Compression
❖ All compression algorithms start by separating luma and chroma channels so that
luma can be encoded with higher fidelity.
❖ A block transform stage then breaks the image into disjoint blocks (e.g., 8 × 8 pixels) and codes each using a discrete cosine transform (DCT), which approximates an efficient coding (principal components) strategy (see the sketch after this list).
❖ Video coding uses predictive (difference) encoding between frames, compensating for
estimated motion in the image.
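A sketch of the block-DCT stage (my own; zeroing high-frequency coefficients stands in for quantization, which real JPEG does with a perceptual table, and the image dimensions are assumed divisible by 8):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_block_codec(img: np.ndarray, keep: int = 4) -> np.ndarray:
    """Transform 8x8 blocks with a 2D DCT, keep only the `keep` lowest
    frequencies per axis, and reconstruct the compressed image."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(0, H, 8):
        for j in range(0, W, 8):
            block = img[i:i+8, j:j+8]
            coef = dctn(block, norm="ortho")  # energy compacts into low frequencies
            coef[keep:, :] = 0.0              # discard high vertical frequencies
            coef[:, keep:] = 0.0              # discard high horizontal frequencies
            out[i:i+8, j:j+8] = idctn(coef, norm="ortho")
    return out
```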