Chapter 23 - Introduction to Video Processing - 2017 - Digital Signal Processing 101
Figure 23.1
Y, Cr, Cb images (which may be difficult to view in black/white book).
Digital Signal Processing 101. http://dx.doi.org/10.1016/B978-0-12-811453-7.00023-8
Copyright © 2017 Elsevier Inc. All rights reserved.
brightness, than to color. So a higher resolution can be used for luminance and a lower
resolution for chrominance. There are several formats in use:
4:4:4 YCrCb: each set of four pixels is composed of four Y (luminance) and four Cr and
four Cb (chrominance) samples.
4:2:2 YCrCb: each set of four pixels is composed of four Y (luminance) and two Cr and
two Cb (chrominance) samples.
4:2:0 YCrCb: each set of four pixels is composed of four Y (luminance) and one Cr and
one Cb (chrominance) samples.
Most broadcast systems and video signals use the 4:2:2 YCrCb format, where the
luminance is sampled at twice the rate of each Cr and Cb chrominance. Each pixel,
therefore, requires an average of 20 bits to represent, as compared to 30 bits for 4:4:4
YCrCb.
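As a quick check of these figures, the average bits per pixel of each format can be computed directly. This sketch assumes 10-bit samples, which matches the 20- and 30-bit averages quoted above; the function name is illustrative.

```python
# Average bits per pixel for YCrCb chroma subsampling formats, assuming
# 10-bit samples (so 4:4:4 comes out to 30 bits and 4:2:2 to 20 bits,
# matching the figures quoted in the text).
BITS_PER_SAMPLE = 10

def avg_bits_per_pixel(y, cr, cb, pixels=4):
    """Average bits per pixel when each group of `pixels` pixels carries
    y luminance, cr and cb chrominance samples."""
    total_samples = y + cr + cb
    return total_samples * BITS_PER_SAMPLE / pixels

print(avg_bits_per_pixel(4, 4, 4))  # 4:4:4 -> 30.0
print(avg_bits_per_pixel(4, 2, 2))  # 4:2:2 -> 20.0
print(avg_bits_per_pixel(4, 1, 1))  # 4:2:0 -> 15.0
```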
An alternate system was developed for computer systems and displays. There was no
legacy of black and white to maintain compatibility with, and transmission bandwidth was
not a concern, as the display is just a short cable connection to the computer. This is
known as the RGB format, for red/green/blue (Fig. 23.2). Each pixel is composed of these
three primary colors and requires 30 bits to represent. Nearly all televisions, flat screens, and
monitors use RGB video, whereas nearly all broadcast signals use 4:2:2 YCrCb video.
These two color spaces can be mapped to each other as follows:
Y = 0.299·R + 0.587·G + 0.114·B
Cr = 0.498·R - 0.417·G - 0.081·B + 128
Cb = -0.168·R - 0.330·G + 0.498·B + 128
Figure 23.2
R, G, B images (which may be difficult to view in black/white book).
and
R = Y + 1.397·(Cr - 128)
G = Y - 0.711·(Cr - 128) - 0.343·(Cb - 128)
B = Y + 1.765·(Cb - 128)
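These mappings translate directly into code. The sketch below uses the coefficients from the equations above, with 8-bit input values; it does no rounding or clamping, which a real implementation would add.

```python
# RGB <-> YCrCb conversion using the coefficients from the text.
# Inputs are assumed to be 8-bit values (0-255); outputs are floats and
# are not clamped or rounded here.

def rgb_to_ycrcb(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cr =  0.498 * r - 0.417 * g - 0.081 * b + 128
    cb = -0.168 * r - 0.330 * g + 0.498 * b + 128
    return y, cr, cb

def ycrcb_to_rgb(y, cr, cb):
    r = y + 1.397 * (cr - 128)
    g = y - 0.711 * (cr - 128) - 0.343 * (cb - 128)
    b = y + 1.765 * (cb - 128)
    return r, g, b

# A neutral gray maps to Y = 128 with both chroma channels centered at 128.
print(rgb_to_ycrcb(128, 128, 128))  # approximately (128.0, 128.0, 128.0)
```

Because the rounded coefficients of the two directions are not exact inverses, a round trip reproduces the original RGB values only to within a small error.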
There are other color schemes as well, such as CMYK, which is commonly used in
printers, but we will not cover them further here.
Different resolutions are used. A very common resolution is NTSC (National Television
System Committee), also known as SD or standard definition. This has a pixel resolution
of 480 rows and 720 columns, which forms a frame of video. In video jargon, each frame is
composed of lines (480 rows) containing 720 pixels each. The frame rate is approximately 30
frames (actually 29.97) per second.
23.2 Interlacing
Most NTSC SD broadcast video is interlaced. This is due to early technology: cameras
filmed at 30 frames per second (fps), but this was not a sufficient update rate to
prevent annoying flicker on television and movie theater screens. The solution was
interlaced video, where frames are updated at 60 fps but only half of the lines are updated
on each frame. On frame N, the odd lines are updated; on frame N + 1, the even lines
are updated; and so forth. This is known as odd and even field updating.
Interlaced video requires half the bandwidth of noninterlaced (progressive) video at the
same frame rate, as only one half of each frame is updated at the 60 fps rate.
Modern cameras can record full images at 60 fps, although there are still many low-cost
cameras that produce interlaced video. Most monitors and flat screen televisions
display full or progressive video frames at 60 fps. When you see a 720p or 1080i
designation on a flat screen, the "p" or "i" stands for progressive or interlaced, respectively.
23.3 Deinterlacing
An interlaced video stream is usually converted to progressive for image processing, as
well as for display on nearly all computer monitors. Deinterlacing should be viewed as
interpolation, as the result has twice the video bandwidth. There are several methods
available for deinterlacing, which can result in different video qualities under different
circumstances.
The two basic methods are known as "bob" and "weave." Bob is the simpler of the two.
Each frame of interlaced video has only one half the lines. For example, the odd lines
(1, 3, 5, ... 479) would have pixels, and the even lines (2, 4, 6, ... 480) are blank. On the
Figure 23.3
Bob versus weave deinterlacing.
following frame, the even lines have pixels, but the odd lines are blank. The simplest bob
deinterlacing is to just copy the pixels from the line above for blank even lines (copy line 1
to line 2), and copy the pixels from the line below for blank odd lines (copy line 2 to line
1). Another method would be to interpolate between the two adjacent lines to fill in a
blank line. Both of these methods are shown in Fig. 23.3.
This method can cause blurring of images, because the vertical resolution has been
effectively halved.
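Both bob variants described above, copying a neighboring line or interpolating between the two adjacent lines, can be sketched as follows. The frame representation and function name are illustrative; a blank line is modeled as None.

```python
# Bob deinterlacing sketch: a field is a full-height frame in which every
# other row is None (the blank lines). Blank rows are filled by copying a
# neighboring valid line, or by averaging the two adjacent valid lines
# when both exist and average=True.

def bob_deinterlace(field_rows, average=True):
    out = []
    n = len(field_rows)
    for i, row in enumerate(field_rows):
        if row is not None:
            out.append(list(row))
            continue
        above = field_rows[i - 1] if i > 0 else None
        below = field_rows[i + 1] if i + 1 < n else None
        if average and above is not None and below is not None:
            # interpolate between the two adjacent lines
            out.append([(a + b) / 2 for a, b in zip(above, below)])
        else:
            # at the frame edge, just copy the single valid neighbor
            src = above if above is not None else below
            out.append(list(src))
    return out

# Tiny 4-line frame holding only the odd field (lines 1 and 3 in video jargon):
field = [[10, 10], None, [30, 30], None]
print(bob_deinterlace(field))  # [[10, 10], [20.0, 20.0], [30, 30], [30, 30]]
```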
Weave deinterlacing creates a full frame from the separate interlaced frames with odd and
even lines. It then copies this frame twice, to achieve the 60 fps rate. This method tends to
work only if there is little change in the odd and even interlaced frames, meaning there is
little motion in the video. As the odd and even frame pixels belong to different instances
in time (1/60th of a second difference), rapid motion can result in jagged edges in the
images rather than smooth lines. This is shown in Fig. 23.4.
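Weave deinterlacing amounts to a simple interleave of the two fields; a minimal sketch, with an illustrative field representation:

```python
# Weave deinterlacing sketch: merge two consecutive fields into one full
# frame. The merged frame would then be repeated to keep the 60 fps rate.

def weave(top_field, bottom_field):
    """top_field holds lines 1, 3, 5, ... and bottom_field holds lines
    2, 4, 6, ... (video jargon, 1-based). Returns the interleaved frame."""
    frame = []
    for t, b in zip(top_field, bottom_field):
        frame.append(list(t))
        frame.append(list(b))
    return frame

print(weave([[1, 1], [3, 3]], [[2, 2], [4, 4]]))
# [[1, 1], [2, 2], [3, 3], [4, 4]]
```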
Both of these methods have drawbacks. A better method, which requires more
sophisticated video processing, is to use motion adaptive deinterlacing. Where there is
motion on the image, the bob technique works better, and slight blurring is not easily seen.
In still areas of the image, the weave method will result in crisper images. A motion
adaptive deinterlacer scans the whole image and detects areas of motion, by comparing to
previous frames. It will use the bob method in these areas of the frame and use the weave
method on the remaining areas of the frame. In this way, interlaced video can be converted
to progressive with little loss of quality.
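The per-pixel decision of a motion adaptive deinterlacer can be sketched as below. The motion threshold is an illustrative tuning parameter, not a value from the text, and real deinterlacers compare whole regions across several frames rather than single pixels.

```python
# Motion adaptive deinterlacing sketch, per pixel: use the weave value
# (the pixel from the other field) where the fields agree, and the bob
# value (average of vertical neighbors) where the difference suggests
# motion. `threshold` is an illustrative tuning parameter.

def motion_adaptive_pixel(weave_px, above_px, below_px, threshold=10):
    bob_px = (above_px + below_px) / 2
    if abs(weave_px - bob_px) <= threshold:
        return weave_px   # still area: weave keeps full vertical detail
    return bob_px         # moving area: bob avoids jagged edges

print(motion_adaptive_pixel(100, 100, 102))  # still area -> 100 (weave)
print(motion_adaptive_pixel(200, 100, 102))  # motion -> 101.0 (bob)
```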
Figure 23.4
Deinterlacing effects.
Table 23.1: Uncompressed video transfer rates.

Format | Image Size | Frame Size | Color Plane Format | Bit/s Transfer Rate (at 60 frames per second)
1080p | 1920 × 1080 | 2200 × 1125 | 4:2:2 YCrCb | 2200 × 1125 × 20 × 60 = 2.97 Gbps
1080i | 1920 × 1080 | 2200 × 1125 | 4:2:2 YCrCb | 2200 × 1125 × 20 × 60 × 0.5 = 1.485 Gbps
720p | 1280 × 720 | 1650 × 750 | 4:2:2 YCrCb | 1650 × 750 × 20 × 60 = 1.485 Gbps
480i | 720 × 480 | 858 × 525 | 4:2:2 YCrCb | 858 × 525 × 20 × 60 × 0.5 = 270 Mbps
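The transfer rates above follow one formula: total frame size (including blanking) times 20 bits per pixel (4:2:2 YCrCb) times 60 frames per second, halved for interlaced formats. A sketch:

```python
# Uncompressed 4:2:2 YCrCb transfer rate: total frame dimensions
# (including blanking) x 20 bits/pixel x 60 frames/s, halved for
# interlaced formats since only half the lines are sent per update.

def transfer_rate_bps(frame_w, frame_h, bits_per_pixel=20, fps=60,
                      interlaced=False):
    rate = frame_w * frame_h * bits_per_pixel * fps
    return rate / 2 if interlaced else rate

print(transfer_rate_bps(2200, 1125) / 1e9)                 # 1080p, in Gbps
print(transfer_rate_bps(858, 525, interlaced=True) / 1e6)  # 480i, in Mbps
```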
Figure 23.5
Image downscaling effects: (A) bilinear (2 × 2) interpolation and 5-tap (5 × 5 pixel array) interpolation; (B) 9-tap (9 × 9 pixel array) interpolation.
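For downscaling by exactly a factor of 2, bilinear (2 × 2) interpolation reduces to averaging each 2 × 2 block of input pixels; the wider 5-tap and 9-tap kernels of Fig. 23.5 average over larger neighborhoods for smoother results. A minimal sketch of the 2 × 2 case, with an illustrative function name:

```python
# 2 x 2 downscale sketch: each output pixel is the average of a 2 x 2
# block of input pixels. Assumes a 2D list with even dimensions.

def downscale_2x2(img):
    out = []
    for r in range(0, len(img), 2):
        row = []
        for c in range(0, len(img[0]), 2):
            block_sum = (img[r][c] + img[r][c + 1] +
                         img[r + 1][c] + img[r + 1][c + 1])
            row.append(block_sum / 4)
        out.append(row)
    return out

print(downscale_2x2([[10, 20, 30, 40],
                     [10, 20, 30, 40]]))  # [[15.0, 35.0]]
```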
Cropping is simply eliminating pixels to allow an image to fit within the frame size. It
does not introduce any visual artifacts.
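Because no filtering is involved, cropping can be sketched in a couple of lines; the window parameters here are illustrative.

```python
# Cropping sketch: keep only the pixels inside the target window.
# No interpolation or filtering is involved.

def crop(img, top, left, height, width):
    return [row[left:left + width] for row in img[top:top + height]]

frame = [[r * 10 + c for c in range(4)] for r in range(4)]
print(crop(frame, 1, 1, 2, 2))  # [[11, 12], [21, 22]]
```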
Figure 23.6
Picture in picture (PiP).
Figure 23.7
Video compression quality comparison.
SDI: This is a broadcast industry standard (Fig. 23.8), used to interconnect various
professional equipment in broadcast studios and mobile video processing centers (like
those big truck trailers seen at major sports events). SDI stands for "serial digital interface,"
which is not very descriptive. It is a serial digital signal, usually carried over a
coaxial cable. It is able to carry all of the data rates
listed in Table 23.1 and dynamically switch between them. Most FPGAs and broadcast
ASICs can interface directly with SDI signals.
DVI: Digital visual interface (DVI) is a connection type commonly used to connect
computer monitors. It is a multipin connector carrying separated RGB digitized video
information at the desired frame resolution (Fig. 23.9).
Figure 23.8
Serial digital interface coax connector.
Figure 23.9
Digital visual interface monitor connector.
HDMI: High definition multimedia interface (HDMI) is also commonly used on computer
monitors and on big screens, connecting home theater equipment such as flat panels,
computers, and DVD players together. Also carrying video and audio information in digital
form, HDMI is electrically backward compatible with DVI but utilizes a more compact
connector. Later versions of HDMI support larger video frame sizes, higher rates, and more
bits per pixel (Fig. 23.10).
DisplayPort: The latest state-of-the-art video interface is DisplayPort. This digital interface
uses a packetized protocol to transmit video and audio information. DisplayPort can have
1, 2, 3, or 4 serial differential interfaces to support various data rates and very high
Figure 23.10
High definition multimedia interface connector.
Figure 23.11
DisplayPort interface and connector.
resolutions. Each serial interface supports a data rate of about 5 Gbps. DisplayPort uses
8b/10b encoding, which allows the clocking to be embedded with the data. It also has a
compact form factor connector, similar in size to HDMI (Fig. 23.11).
Figure 23.12
Video graphics array monitor connector.
CVBS: Standing for "composite video, blanking, and sync," this is the basic yellow cable
used to connect televisions, VCRs, and DVD players together. It carries an SD 4:2:2 YCrCb
combined analog video signal on a low-cost coax "patch cable" (Fig. 23.13).
S-Video: This is a legacy method used to connect consumer home theater equipment such
as flat panels, televisions, VCRs, and DVD players together. It carries analog 4:2:2 YCrCb
signals in separate form over a single multipin connector, using a shielded cable. It is of
higher quality than CVBS (Fig. 23.14).
Component Video: This is also a legacy method to connect consumer home theater
equipment such as flat panels, televisions, VCRs, and DVD players together. It carries analog
4:2:2 YCrCb signals in separate form over three coax patch cables. Often the connectors
are labeled Y, PB, and PR. It is of higher quality than S-video due to the separate cables
(Fig. 23.15).
Figure 23.13
Composite video cable.
Figure 23.14
S-video cable.
Figure 23.15
Component video cables.