Encoding and Transcoding Guide V3
Table of Contents
1 INTRODUCTION
2 VIDEO PRE-PROCESSING TOOLS
3 VIDEO ENCODING (CORE)
4 VIDEO ENCODING (ADDITIONAL TOOLS)
5 ENTROPY CODING
6 STATISTICAL MULTIPLEXING
7 OUTGOING VIDEO FORMATS (BROADCAST MODE)
09/02/2015 2
Confidential
1 Introduction
The new compression product launches occurring during 2014 and 2015 herald a new era of
excellence for Appear TV, which is already a very strong player in video compression markets.
These revolutionary new modules build upon the considerable achievements that have already
been made, and enable all of the advantages of a fully modular solution to be delivered with
best in class performance.
And by best in class, we mean any class from any competing provider.
During the past few years, Appear TV has grown to bring innovation into several new markets.
From the introduction of high performance modular encoding and transcoding solutions which
brought a new dimension to the capability of our revolutionary chassis concept in 2010, we have
continued to add;
During 2014 the addition of new, ultra-high performance compression options will raise the
video performance to ‘best on market’ standards, making the platform equally applicable for the
most demanding tier 1 operators. Based on a new, ‘hybrid’ hardware platform that combines the
best efficiency and VQ properties of a hardware encoder with the flexibility associated with
software encoding, the new UNIVERSAL module series can be tailored to virtually any task.
Providing many times the processing power of current modules, the new UNIVERSAL range
either enables significant density improvements, for turn-around markets such as Cable, or
significant VQ improvements to beat even the best performing competing encoders on the
market today.
The superb performance of the new model line-up goes beyond the video compression stage
alone, and for broadcast applications also introduces a new and improved stat mux mechanism.
A new quad-core audio DSP will also be introduced to bring significant capability and density
benefits, such as the native ability to encode multi-channel Dolby audio.
Appear TV has always offered class leading packaging with a ‘do-anything’ capability to extract
maximum advantage from a fully modular solution, and without question, this is revolutionizing
the art of headend design. Today, Appear TV provides powerful systems that are more flexible,
space efficient and easy to use / maintain than ever before. Now, Appear TV can also provide a
standard of compression performance that is second to none.
This document moves through the encoding process to provide a stage by stage account of many
of the important tools that the new platform offers. It describes key basic principles, as well as
many of the available features, including internal parameters that advanced users may at some
stage wish to access to fine tune their systems.
If you are just looking for a summary to best configure the encoder / transcoder and wish to
omit the theory / background, please skip through to the summary table (section 8).
1. The mode decisions made to try and encode the noise result in sub-optimal encoding of
the real picture content.
2. If stat mux is used, the higher than necessary bitrate allocations waste bandwidth, not
only for the noisy service but for all services since the bandwidth is taken from a
common resource pool.
3. The noise will usually not be successfully encoded anyway and high frequency noise co-
efficients especially are likely to still be quantized out. This means the outgoing video
will appear to be noise free, even though its presence on the input will have caused severe
reductions in encoding efficiency. The service will appear noise free, but will have
consumed a greater bitrate to encode than would otherwise be necessary and will also
have lower VQ.
High frequency noise will only survive the DCT stage if very high video bitrates are used. This
is why a suspected noise problem should only be investigated by looking at the source of an
encoder, and never its output (unless it can be operated at maximum video bitrate as a test).
Since both mode decisions and stat mux allocation occur early on during encoding, noise
problems can only be tackled by pre-filtering specifically designed to remove noise at source.
The MCTF filter is the tool designed to achieve this in Appear TV encoders, and has been re-
developed for the new Appear TV universal compression range to provide bi-directional noise
reduction that also restores the true detail within content, especially textures and edges and
especially in low lighting conditions.
MCTF is always enabled by default. However, the default setting is very mild. The
MctfStrength parameter adjusts this from 0 (weakest, and the default setting) to 7 (strongest).
It is possible to disable MCTF completely using the BYPASS command, where 0 enables the MCTF
engine and 1 disables it. The default is 0 (MCTF enabled).
The following illustrations show source content with noise, and how application of MCTF can
remove this. The first image is with noise, and the second is after processing with MCTF.
MCTF can cause noticeable softening of the image, which is likely to occur for aggressive
values higher than +3. For this reason, it is important to use the tool appropriately, with 0 or 1
covering the majority of typical scenarios.
The following illustrations show a pre-compressed source exhibiting visible MPEG2 macroblock
edges before and after being passed through the de-blocking filter.
As with noise reduction, applying the deblocking filter too aggressively will soften and remove
detail, although with less danger of being extremely aggressive (as can be the case with MCTF).
The tool is adjustable between 0 and 3 and, with MPEG2 sources especially, should be used at
an aggressive setting.
iii. Detelecine
The encoder is provided with a de-telecine (inverse 3-2 pulldown) feature to help optimize
encoding efficiency when video has been through a telecine process. Telecine inserts additional
fields to take the frame rate up from native film rate (24p) to NTSC (60i). It is possible to
reverse this process to get the original 24 frame progressive source back. This is often
preferable when a progressive source is required, for the following reasons:
If the telecine 60i source is converted to 30p progressive then the interlaced fields will be
‘merged’ by performing temporal filtering. This will include the additional fields added
by the Telecine process.
If the Telecine 60i source is converted to progressive by removing the additional fields
(inverse telecine) then the original 24p source will be restored.
Inverse telecine should only be applied to material that has been through the telecine process.
The encoder has two automatic triggers which can be used to activate de-telecine. The first is the
IT flag that signals whether telecine has been applied. If this is set, the encoder will apply de-
telecine automatically. The second is if the encoder detects repeated fields. Both of these
functions are set by enabling de-telecine in the GUI.
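The 3-2 pulldown cadence and its reversal can be sketched as follows. This is a minimal illustration only and is not the encoder's detection logic: the function names are hypothetical, fields are modelled as labelled tuples rather than pixel data, and a repeated field is recognised simply because it equals the field two positions earlier.

```python
def telecine(film_frames):
    """Apply 3-2 pulldown: every 4 film frames become 10 fields (5 interlaced frames)."""
    fields = []
    for index, frame in enumerate(film_frames):
        copies = 2 if index % 2 == 0 else 3  # 2, 3, 2, 3 ... field cadence
        for _ in range(copies):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


def inverse_telecine(fields):
    """Drop repeated fields and re-pair the remainder into progressive frames."""
    # A field that matches the field two positions earlier (same parity,
    # same content) is one of the extra fields added by the pulldown.
    kept = [f for i, f in enumerate(fields) if i < 2 or f != fields[i - 2]]
    return [kept[i][0] for i in range(0, len(kept) - 1, 2)]


film = ["A", "B", "C", "D"]
fields = telecine(film)
restored = inverse_telecine(fields)
```

Here four film frames become ten fields, and removing the two repeated fields returns the original four progressive frames.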
An interlaced frame consists of two fields. The ‘top’ field carries even numbered lines and the
‘bottom’ field carries odd numbered lines. The reason for interlacing is historical, and dates
back to the CRT days when it provided a convenient ‘analogue’ method of video compression by
allowing the effective picture rate to double and therefore reduce flicker.
Currently, interlaced video is a legacy problem for modern progressive displays and finely tuned
compression schemes that want to exploit commonality between frames. In this setting,
interlaced video does not fit since it does the reverse by presenting a frame as two offset fields.
The offset is not just by line, but also by time making the difference between the two fields
largely dependent upon the degree of motion present.
Two legacy fixed methods exist for scanning interlaced content. The first is to treat each
field separately, and encode them both. This is FIELD mode, and when selected it will
result in the top field and bottom field of interlaced content being permanently encoded as
separate pictures. The second legacy option is FRAME mode, in which case the two fields will
be merged and encoded as one frame.
Field mode works best if there is motion throughout the entire picture. Frame mode works best if
there is no motion in the picture at all.
Obviously content will vary between these two extremes, and very often there will be motion
only within selective areas of the picture. Two adaptive options have been introduced into the
AVC toolkit for encoding interlaced content more intelligently. The first is Picture Adaptive
Field Frame (PAFF) which assesses the degree of motion and makes a frame by frame decision
on whether to encode in FIELD or FRAME mode.
The second is Macroblock Adaptive Field Frame, or MBAFF. When a frame is encoded, it is
divided into a matrix of 16x16 macroblocks. MBAFF enables the field/frame encoding decision
to be made at macroblock level, to optimise for any type of content, especially where there are
static areas as well as areas of movement. MBAFF sets either FRAME or FIELD mode for each
pair of vertically adjacent macroblocks, and therefore operates with a granularity of 16x32 pixels.
The picture below illustrates how macroblocks have been encoded in these two different modes,
with FRAME mode (un-highlighted areas) marking stationary areas and FIELD mode
(highlighting the macroblock edges) for areas of movement.
Figure panels: Field only, Frame only, PAFF, MBAFF
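A field/frame decision of the kind MBAFF makes can be illustrated with a simple line-difference test. This is a sketch under stated assumptions, not the decision logic used by Appear TV encoders (which is not described here): the function name and the cost measure are hypothetical. The idea is that when motion separates the two fields, vertically adjacent lines differ more than lines of the same field.

```python
def mb_pair_mode(frame, x0, y0, width=16, height=32):
    """Return 'FIELD' or 'FRAME' for the macroblock pair whose top-left pixel is (x0, y0)."""
    frame_cost = 0  # difference between adjacent lines (opposite fields)
    field_cost = 0  # difference between lines two apart (same field)
    for y in range(y0, y0 + height - 2):
        for x in range(x0, x0 + width):
            frame_cost += abs(frame[y][x] - frame[y + 1][x])
            field_cost += abs(frame[y][x] - frame[y + 2][x])
    # Strong inter-field differences ("combing") indicate motion: code as fields.
    return "FIELD" if frame_cost > field_cost else "FRAME"


# Synthetic 32x32 luma picture: the left half is a static vertical gradient,
# the right half has content that changed between the two fields.
picture = [[y * 4 if x < 16 else (100 if y % 2 else 0) for x in range(32)]
           for y in range(32)]

modes = [mb_pair_mode(picture, x0, 0) for x0 in (0, 16)]
```

In this synthetic picture the static left-hand pair is coded in FRAME mode and the moving right-hand pair in FIELD mode.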
i. Encoding Profile
Profiles exist within complex encoding standards to define a subset of allowed tools which can
be used to set the complexity of the encoded stream and therefore match it to the capability of
the decoder.
The following chart illustrates the profiles within H.264.
                     H264            H264   H264   H264   H264    MPEG2   MPEG2
                     Con. Baseline   Main   High   Hi10   Hi422   Main    High
B Frames             No              Yes    Yes    Yes    Yes     Yes     Yes
CABAC                No              Yes    Yes    Yes    Yes     No      No
PAFF or MBAFF        No              Yes    Yes    Yes    Yes     Yes     Yes
Reference B Frames   No              Yes    Yes    Yes    Yes     No      No
4:2:2                No              No     No     No     Yes     No      Yes
10-bit               No              No     No     Yes    Yes     No      No
Within the H.264 standard, three profiles are commonly used for broadcast. These are:
Constrained Baseline. Often used in OTT applications with older and / or lower priced
smartphones. To provide an indication, the Apple iPhone 3S was the first Apple device
capable of supporting main profile decoding. Usually, with the simplest devices, either
constrained baseline or baseline profile is all that is supported.
Main Profile. The complexity difference between constrained baseline and main profile
is very significant, because main profile introduces B frames. The encoding efficiency
(resulting in improved video quality at a given bitrate) is therefore much improved. Large
screen broadcast viewing requires this step-up in terms of efficiency, hence for AVC,
main profile is the most commonly used profile for broadcast.
High profile introduces further tools to improve the compression gain, at the expense of
additional complexity. The key tool is the 8x8 transform. Most modern decoders support
high profile fully, and so its use is becoming more common.
Appear TV can support the H.264 Hi10P and HI422P modes in hardware, but does not support
either in standard broadcast mode. The Hi10P profile introduces 10 bit luminance quantization
and so increases luminance granularity from 256 to 1,024 quantization levels. This benefits both
AVC broadcast and contribution applications by reducing contouring on blanket backgrounds,
which AVC is prone to display because it uses an integer DCT and does not provide the DCT
noise that dithered out these artefacts with MPEG2. Since luminance sampling occurs before the
DCT stage, no increase in video bitrate is necessary as a result of using 10-bit precision: All that
is required is to ensure that the content is not continually compressed to its limits, enabling static
scenes to be moderately quantized to realize the benefits of improved luminance granularity on
reducing contouring. Although highly applicable to DTH broadcast, 10 bit precision is poorly
supported within the consumer decoder domain. HEVC will change this by making native 10 bit
support almost universal. Many 422 AVC encoders / decoders do support 10 bit, hence it has
become very popular for contribution.
The HI422P profile is almost exclusively used in high-end events contribution. 420 profiles
sub-sample chroma both horizontally and vertically, whereas 422 profiles sub-sample it
horizontally only. The benefits of 422 sampling become apparent after multiple decode / encode
generations, since vertical chroma resolution is maintained. By contrast,
multiple 420 generations will lose chroma definition and suffer chroma ‘bleed’. The extent of the
problem will largely depend upon how accurately the downscaling and upscaling is done
between generations, which was a particular problem with MPEG2 (because the filter
characteristics between the encoder and decoder were poorly defined) compared to H.264, which
defines the filter characteristics fully.
Appear TV broadcast encoders enable easy user selection between constrained baseline, main
and high profiles for AVC, and main profile for MPEG2.
i. Level
Defining a profile is still too wide to ensure compatibility with decoders, and the concept of
‘levels’ within a profile has been adopted to narrow the specification down further.
Levels primarily indicate the range of bitrates, frame rates and resolutions that should be
supported within a profile. The general concept is that the simpler the profile, the lower the
minimum resolution, bitrate and framerate options provided within the levels will be to support
the intended use of that device type (eg. Baseline profile for low cost simple decoders powering
small, low resolution mobile phone displays).
The following table, courtesy of Wikipedia, lists all of the levels mandated within AVC.
        Max decoding speed              Max frame size               Max video bit rate for video coding layer (VCL), kbit/s
Level   Luma samples/s  Macroblocks/s   Luma samples  Macroblocks    Baseline, Extended   High Profile   High 10 Profile
                                                                     and Main Profiles
1       380,160         1,485           25,344        99             64                   80             192
1b      380,160         1,485           25,344        99             128                  160            384
1.1     768,000         3,000           101,376       396            192                  240            576
1.2     1,536,000       6,000           101,376       396            384                  480            1,152
1.3     3,041,280       11,880          101,376       396            768                  960            2,304
2       3,041,280       11,880          101,376       396            2,000                2,500          6,000
2.1     5,068,800       19,800          202,752       792            4,000                5,000          12,000
2.2     5,184,000       20,250          414,720       1,620          4,000                5,000          12,000
3       10,368,000      40,500          414,720       1,620          10,000               12,500         30,000
3.1     27,648,000      108,000         921,600       3,600          14,000               17,500         42,000
3.2     55,296,000      216,000         1,310,720     5,120          20,000               25,000         60,000
4       62,914,560      245,760         2,097,152     8,192          20,000               25,000         60,000
4.1     62,914,560      245,760         2,097,152     8,192          50,000               62,500         150,000
4.2     133,693,440     522,240         2,228,224     8,704          50,000               62,500         150,000
5       150,994,944     589,824         5,652,480     22,080         135,000              168,750        405,000
5.1     251,658,240     983,040         9,437,184     36,864         240,000              300,000        720,000
5.2     530,841,600     2,073,600       9,437,184     36,864         240,000              300,000        720,000
MPEG2: MPEG2 High. MPEG2 High 1440. MPEG2 Main. MPEG2 Auto mode.
AVC: H.264 1b. H.264 1.0. H.264 1.1. H.264 1.2. H.264 1.3. H.264 2.0. H.264 2.1. H.264 2.2.
H.264 3.0. H.264 3.1. H.264 3.2. H.264 4.0. H.264 4.1. H.264 4.2. H.264 Auto mode
The definition of auto mode may be perplexing, but simply means that the level will be
automatically selected from the parameters (video resolution, bitrate etc.) that have been entered
into the encoder. This is the default mode and means that you can almost forget about levels
when configuring the encoder, and will certainly not be bound by the constraints of any
particular level.
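As an illustration of what automatic level selection involves, the sketch below picks the lowest H.264 level whose limits cover a given resolution, frame rate and bitrate. It is an assumption about how such a selection could work, not a description of the Appear TV implementation: the function name is hypothetical, and only the macroblock rate, frame size and Baseline / Main profile bitrate limits for levels 1 to 4.2 are checked.

```python
import math

# (level, max macroblocks/s, max macroblocks per frame, max kbit/s for Baseline/Main)
LEVELS = [
    ("1", 1485, 99, 64), ("1b", 1485, 99, 128), ("1.1", 3000, 396, 192),
    ("1.2", 6000, 396, 384), ("1.3", 11880, 396, 768), ("2", 11880, 396, 2000),
    ("2.1", 19800, 792, 4000), ("2.2", 20250, 1620, 4000), ("3", 40500, 1620, 10000),
    ("3.1", 108000, 3600, 14000), ("3.2", 216000, 5120, 20000),
    ("4", 245760, 8192, 20000), ("4.1", 245760, 8192, 50000),
    ("4.2", 522240, 8704, 50000),
]


def auto_level(width, height, fps, bitrate_kbps):
    """Return the lowest level that satisfies all three limits, or None."""
    # Pictures are coded in whole 16x16 macroblocks, so round each dimension up.
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    macroblocks_per_second = macroblocks * fps
    for name, max_mb_rate, max_frame_mbs, max_kbps in LEVELS:
        if (macroblocks <= max_frame_mbs
                and macroblocks_per_second <= max_mb_rate
                and bitrate_kbps <= max_kbps):
            return name
    return None


hd_level = auto_level(1920, 1080, 25, 8000)
sd_level = auto_level(720, 576, 25, 4000)
```

With these limits, 1920x1080 at 25 frames per second and 8 Mbit/s resolves to level 4, and 720x576 at 25 frames per second and 4 Mbit/s resolves to level 3.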
v. GOP Structure
The GOP structure determines how many B frames will be used within a GOP. B frames are
extremely efficient because they are predicted in both forward and backward temporal directions
and therefore just convey change between two other references. The two references used to derive
a B frame can be I, P or (in H.264 only) even two other B frames if the ‘hierarchical B frame’
mode is enabled. Maximum use of B frames results in the most efficient coding, as long as the
content is easily predictable (contains limited motion). For high motion sequences, an over
reliance on B frames will introduce errors and will decrease VQ. In this case, a GOP Structure with less
reliance on B frames is required. Before it was possible for an encoder to adjust GOP structure
dynamically, this setting was a compromise between maximizing efficiency and VQ for static
content and coping with dynamic content. It explains why Appear TV encoders can adjust GOP
structure dynamically, within a maximum and minimum B frame range selected by the following
parameters. Of course, by defining the B frame ratio the GOP structure also defines how
regularly P frames occur.
For Appear TV encoders, the general recommendation is to use a maximum setting of IBBP for
typical MPEG2 applications (2 B frames).
For AVC, you can start at IBBBP (3 B frames) but this can be increased to up to 7 B frames if
the set top boxes will support this. Most recent STBs will now support up to 7 B frames.
The ability to use up to 7 B frames efficiently is a strong differentiating factor for Appear TV
encoders, which can help outperform competing products as long as the decoders being used
support this.
The recommendation for Appear TV encoders is to set this to IP, for both MPEG2 and MPEG4
applications.
By definition, a closed GOP is built entirely of references that are contained within the GOP, but
an open GOP can also use a reference (last P frame) from the previous GOP. An open GOP
structure is therefore more efficient, because operating in closed GOP mode requires an
additional P frame to be included, whereas in open mode the use of external P references allows
this to be replaced with a B frame, which is more efficient.
The GOP length parameter will define for how long the GOP structure repeats until another I
frame is inserted and the process repeats again. Only one I frame is present per GOP, and so the
GOP length sets the naturally repeating I frame interval. Normally, the GOP length is defined in
terms of number of frames but an internal setting (GOP SIZE MODE) can allow alternative
counting via frame rate.
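The relationship between GOP length, B frame count and the open / closed setting can be sketched as a simple frame type pattern. This is an illustrative simplification in display order only, with a hypothetical function name; it does not model the dynamic GOP planning or scene change handling of the encoder.

```python
def gop_pattern(length, b_frames, closed=False):
    """Return the frame types of one GOP in display order, e.g. 'IBBPBBP...'."""
    frames = ["I"]
    while len(frames) < length:
        # After the I frame, every (b_frames + 1)th frame is a P frame.
        is_anchor = len(frames) % (b_frames + 1) == 0
        frames.append("P" if is_anchor else "B")
    if closed and frames[-1] == "B":
        # A closed GOP cannot reference the next GOP, so the final frame
        # must be a P frame rather than a B frame.
        frames[-1] = "P"
    return "".join(frames)


open_gop = gop_pattern(12, 2)
closed_gop = gop_pattern(12, 2, closed=True)
```

For a GOP length of 12 with 2 B frames, the open pattern ends in B frames that rely on a reference outside the GOP, while the closed pattern spends one extra P frame to remain self-contained.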
The I frame at the start of each GOP is described as ‘naturally occurring’ to distinguish it
from the unplanned I frames that Appear TV encoders can insert to start a new GOP early as a
result of the scene change detection feature. When features such as scene change detection are
enabled, GOP lengths are always approximate and will be liable to change with video content.
Typically, long GOP settings result in the efficient encoding of predictable sequences but non-
predictable events (such as scene changes) require a reference I frame to be provided, ideally as
an IDR. Since GOP planning mechanisms differ between encoder manufacturers, setting equal
GOP lengths may produce very different comparative results between different encoder
manufacturers when dynamic tools such as scene change detection are enabled. For this reason,
GOP lengths should be set as recommended by the manufacturer, and should not be based on
values taken from dissimilar equipment.
Since Appear TV encoders are usually set to alter GOP length dynamically to changing content,
it is no longer necessary to be conservative about setting short GOP lengths unless required for
other reasons. For example, it might be necessary to use fixed short GOPs to provide regular
random access points (I frames) for quick channel changes, or to enhance video editing or
provide trick play modes. Viewing content in fast forward / reverse modes relies on frequent I
frame access points to make the video decodable, and therefore the I frame interval can set the
granularity of ‘scrubbing’ through content in trick play mode.
Generally, Appear TV encoders will perform best if long maximum GOP lengths are used.
Compared to many encoders, Appear TV GOP planning works optimally if longer GOP lengths
are set than may be considered normal. For H264, a GOP length of 44 will work well for many
broadcast scenarios. Making the GOP length divisible by 4 will optimize efficiency when
hierarchical B frame mode is used (AVC only), and this practice is recommended. For MPEG2,
the maximum GOP length should be shorter and 24 works optimally in many scenarios. Some
older MPEG2 decoders may be limited by the maximum GOP length of 15 specified in the
MPEG 2 DVD specification, where a GOP length of 12 was recommended for PAL frame rates
with 15 for NTSC.
There are further internal controls that also define GOP length. These include the MAX GOP
SIZE parameter which constrains dynamic GOP planning to ensure a preset GOP interval is
never exceeded. This can be useful in limiting the worst case channel change time, but does not
affect the average channel change time (which is determined by the average GOP length).
If an Appear TV transcoder is being used, then the internal ‘GOP Control’ parameter can set the
re-encode stage to follow the incoming GOP structure if it is detected that the original GOP
structure is closed. The default setting is disabled, so the transcoder will use the GOP structure
that has been manually defined.
v. Scaling Matrix
An MPEG 2 encoder will divide each macroblock into 8x8 pixel blocks, and each will undergo
DCT to derive the vertical and horizontal frequency components that describe these 8x8 pixels in
frequency terms. At the top left is the DC component, followed by the low frequency components
which get higher in frequency as you move right and down. At this stage, it is possible to
transform between the amplitude representation and frequency representation by performing DCT /
inverse DCT stages quite freely, because the DCT stage is the precursor to the lossy compression
that will take place and is not the source of it by itself.
What the DCT does do is prepare the co-efficients for lossy compression by presenting what
were discrete pixel amplitude values as a series of frequency co-efficients that describe the 8x8
block in ascending frequency component order. This is important because lossy compression will
be performed by discriminating on a frequency basis, so that high frequency (and therefore high
detail) components of the block being processed can be eliminated if necessary to meet the video
bitrate constraints that have been set.
Following DCT, the 8x8 pixel values that formerly represented the amplitude of each pixel are
transformed into 8x8 co-efficients that represent the frequency content of the entire block.
The aim of the lossy compression process is to favour the frequency components that the eye is
sensitive to, and whose loss would therefore be noticed, and to eliminate the higher frequencies
that are of lesser subjective significance. By transforming the block data into the frequency
domain, the DCT has presented the data in an ideal form for this. The first step is to multiply
the co-efficients by a Quantisation Scale (Q scale) factor. The second step is to divide the
values by a pre-defined quantisation matrix, which defines a divisor for each frequency
component. The result is then rounded to a whole number.
The Q scale factor enables the degree of quantisation to be varied, and will be set by the rate
control loop. It is a global value. The quantisation matrix enables the divisor of each DCT
frequency co-efficient to be set unequally, to ‘weight’ the co-efficients in terms of importance.
Furthermore, the quantisation matrix can be defined with the objective of reducing as many
higher frequency components as possible to zero to provide efficient and targeted lossy
compression.
These small values are efficient to transmit. They need to be sent to the receiver where they will
be re-scaled using the same quantisation matrix and passed through an inverse DCT stage to
recover what will hopefully be a good objective approximation of the original pixel values, with
the zero values being the result of the ‘lossy compression’ stage. It is the zero values that are the
contributors of errors.
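The DCT, quantisation, re-scaling and inverse DCT round trip described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the quantisation matrix is a hypothetical one whose divisors simply grow with frequency (it is not a standard MPEG2 matrix), and the Q scale is folded into the divisor as a simplification.

```python
import math

N = 8  # MPEG2 transform block size


def _c(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Forward 8x8 DCT: pixel amplitudes to frequency co-efficients."""
    return [[0.25 * _c(u) * _c(v) * sum(block[y][x] * _basis(x, u) * _basis(y, v)
                                        for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct_2d(coeffs):
    """Inverse 8x8 DCT: frequency co-efficients back to pixel amplitudes."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[v][u] * _basis(x, u) * _basis(y, v)
                        for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


# Hypothetical matrix: larger divisors for higher frequencies.
MATRIX = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]


def quantise(coeffs, matrix, q_scale):
    """Lossy step: divide each co-efficient by its scaled divisor and round."""
    return [[round(coeffs[v][u] / (matrix[v][u] * q_scale)) for u in range(N)]
            for v in range(N)]


def dequantise(levels, matrix, q_scale):
    """Receiver side: re-scale using the same matrix and Q scale."""
    return [[levels[v][u] * matrix[v][u] * q_scale for u in range(N)]
            for v in range(N)]


def zero_count(levels):
    return sum(1 for row in levels for value in row if value == 0)


ramp = [[100 + 4 * x for x in range(N)] for _ in range(N)]  # horizontal gradient
fine = quantise(dct_2d(ramp), MATRIX, 1)
coarse = quantise(dct_2d(ramp), MATRIX, 8)
restored = idct_2d(dequantise(fine, MATRIX, 1))
```

Comparing `zero_count(fine)` with `zero_count(coarse)` shows how a larger Q scale drives more co-efficients to zero, which is where the loss (and the bitrate saving) comes from.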
For MPEG4, 4x4 is the ‘core’ transform. There is also a 4x4 and 4x2 option (using Hadamard
transform) and 8x8 in high profile only.
The ScalingMatrix parameter specifies the scaling matrix used, with 1 being optimised for use
with CABAC entropy coding. The default is 1.
The Intra DC precision parameter specifies the bit depth for DC co-efficients and therefore sets
the degree of precision. It has 5 possible settings: 0 = 8 bits, 1 = 9 bits, 2 = 10 bits,
3 = 11 bits and 4 = auto, which is the default. The use of 11 bits is not supported in MPEG2
mode. The use of AUTO is strongly recommended to prevent banding or blockiness on some types
of video content.
The QScale type parameter sets the Qscale values used as DCT coefficient multipliers. Two
options are available; 0= linear (and sets a linear scale from 1 to 32) and 1 = non linear (which is
also default and sets the scale from 0.5 to 56 to provide greater granularity). The default (non
linear) should always be used and will achieve noticeably improved performance when the
encoder is being pushed to its limits (requirement for high resolutions at very low bitrates).
Sending these compressed values involves further stages of compression. This is lossless data
compression (entropy coding).
CBR mode implements a feedback loop between the DCT and the output buffer (CPB or
‘coded picture buffer’) to keep the output bitrate constant. The feedback loop is generally
known as the ‘rate control’ mechanism.
Rate Control
The video bitrate at the output of the DCT is not constant, and will vary with content and picture
coding type. The rate control mechanism works to maintain the CPB buffer occupancy at a
consistent level (within defined limits) by adjusting the degree of quantisation performed by the
DCT. The CPB buffer characteristics are important to ensure interoperability with the decode
buffer implemented within MPEG compliant set top boxes. The standard buffer model incurs
appreciable delay and was specified with DTH applications in mind. It is large enough to
accommodate peaks in incoming bitrate associated with events such as I frame insertion without
having to spontaneously re-quantise to compensate for peaks, which could result in I frame
pulsing. The CPB buffer size is pre-set on the Appear TV GUI to provide optimum VQ and set
top box interoperability, and changing it (to try and reduce delay for example) should not be
attempted without consulting the Appear TV support team first to request assistance.
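The feedback principle behind rate control can be shown with a toy model. This is a sketch only and is not the Appear TV rate control algorithm: the function name, the linear mapping from buffer fullness to quantiser, and the assumption that a picture costs complexity / quantiser bits are all illustrative choices.

```python
def simulate_cbr(complexities, bits_per_frame, buffer_bits, q_min=1.0, q_max=51.0):
    """Toy CBR loop: buffer fullness is fed back to set the quantiser."""
    occupancy = buffer_bits / 2.0  # start with a half-full buffer
    history = []
    for complexity in complexities:
        fullness = occupancy / buffer_bits
        # A fuller buffer forces coarser quantisation, an emptier one allows finer.
        q = q_min + (q_max - q_min) * fullness
        frame_bits = complexity / q  # assumed cost model for one picture
        # The buffer fills with coded bits and drains at the constant channel rate.
        occupancy += frame_bits - bits_per_frame
        occupancy = min(max(occupancy, 0.0), buffer_bits)
        history.append((q, frame_bits, occupancy))
    return history


easy = simulate_cbr([1_000_000] * 400, bits_per_frame=100_000, buffer_bits=2_000_000)
hard = simulate_cbr([2_000_000] * 400, bits_per_frame=100_000, buffer_bits=2_000_000)
```

In both runs the bits produced per picture settle at the channel rate, but the more complex content settles at a coarser quantiser, which is the VQ cost of holding the bitrate constant.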
VBR mode allows the output bitrate to vary; the outgoing video bitrate is no longer held
constant. In this mode, the output bitrate changes in response to image complexity, so that the
incoming material is always encoded at the bitrate needed to meet a pre-set threshold for VQ.
This is ideal for recording encoded files to disk, since video quality is maintained to a
consistently high standard whatever the difficulty of the content, and the file size is kept
minimal because there is no wastage and only the bitrate required to encode a picture to the
defined quality threshold is used. With capped mode, you can place a cap on the maximum video
bitrate that is allowed.
The BitRateFormat parameter specifies if the video bitrate sets the video elementary stream
bitrate, or the video transport stream rate which is inclusive of MPEG TS packetisation
overheads. The default (in common with most manufacturers) is elementary stream rate.
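The difference between the two rate definitions can be estimated from the transport stream packet structure, in which each 188 byte packet carries a 4 byte header. The sketch below is a lower-bound approximation with a hypothetical function name: it ignores PES headers, adaptation fields and stuffing, and it is not a statement of how the encoder itself performs the calculation.

```python
TS_PACKET_BYTES = 188  # fixed MPEG transport stream packet size
TS_HEADER_BYTES = 4    # minimum header carried by every packet


def ts_rate_from_es(es_rate_bps):
    """Estimate the transport stream rate needed to carry an elementary stream."""
    payload_bytes = TS_PACKET_BYTES - TS_HEADER_BYTES
    return es_rate_bps * TS_PACKET_BYTES / payload_bytes


ts_rate = ts_rate_from_es(8_000_000)
```

An 8 Mbit/s elementary stream therefore needs a little over 2% more than that as transport stream rate before any other overheads are counted.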
The MaxPicSize parameter can limit the maximum picture data size to comply with limitations
in some legacy devices. An example is the Motorola Cherry picker with some legacy firmware
versions. The default setting is 0 (no artificial limit) but it is possible to define a limit in bytes up
to a maximum of 4,294,967,295 bytes. There should be no reason to use this parameter except in
exceptional circumstances which should always be guided by advice from the Appear TV
engineering team.
In Appear TV encoders, the adaptive QP toolset covers border processing, saliency detection
(detects prominent features), consistency enhancer to enhance borders / edges, eye tracker,
texture classifier / skin tone identification, low light detection, grass detection, logo and banner
detection and noise / mosquito and ringing rejection. For best VQ, it is recommended that
adaptive QP is enabled except when performing automated quality measurements.
i. Quantisation Table
This parameter applies to MPEG2 only. It will affect H.264 operation adversely if set to anything
other than 0 (default) in H.264 mode.
With MPEG2, it is possible to trade intra predicted picture artefacts between either
This is determined by adjusting the quantisation steps within the DCT using this parameter.
Possible values range from 0 (sharp macroblock edges, default) to 4 (smoothed macroblock
edges with prevalence for ringing / mosquito artefacts).
This is an advanced, internal setting that will require careful adjustment if not left at default.
predict in the forward direction, from the IDR frame onwards. Marking the I frame as an IDR is
therefore the correct option whenever there really is no correlation between the pictures before
and after the scene transition has occurred.
IDR insertion is not possible for OTT applications where IDRs signal the chunk boundary
points. IDR insertion may also not be feasible in applications where splicers are being used.
i. IDR Frequency
The IDR Frequency parameter enables rules to be set to define repeating IDR points. In AVC
mode, it has five settings (0 to 4). Mode 0 (default for all broadcast applications) declares no I
frames as IDR. Mode 1 declares every I frame as an IDR (compromising efficiency but
providing very regular splice points). Mode 2 defines every 2nd I frame as an IDR; mode 3 every
3rd I frame and mode 4 every 4th I frame. For applications requiring regular access points, this
enables the frequency versus compression efficiency trade-off to be set according to requirements.
In MPEG2 mode, the setting has a ‘switch’ function with only mode 0 or 1 being valid. Mode 0
sets OPEN GOP mode and mode 1 sets CLOSED GOP mode.
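The AVC mode rules above can be expressed as a small function. This is a minimal sketch of the rule as described, not the encoder's actual logic: the function names are hypothetical, and the assumption that counting starts with the first I frame being the IDR is ours (the text does not state the phase).

```python
def is_idr(i_frame_index: int, idr_frequency: int) -> bool:
    """Return True if the I frame at this 0-based index is declared an IDR.

    idr_frequency follows the AVC mode values described in the text:
    0 = no I frame is an IDR, N (1 to 4) = every Nth I frame is an IDR.
    Assumption: the count starts at the first I frame.
    """
    if not 0 <= idr_frequency <= 4:
        raise ValueError("IDR Frequency must be between 0 and 4 in AVC mode")
    if idr_frequency == 0:
        return False
    return i_frame_index % idr_frequency == 0


def idr_pattern(i_frame_count: int, idr_frequency: int) -> list:
    """Label a run of consecutive I frames as 'IDR' or plain 'I'."""
    return ["IDR" if is_idr(i, idr_frequency) else "I"
            for i in range(i_frame_count)]


if __name__ == "__main__":
    for mode in range(5):
        print(mode, idr_pattern(8, mode))
```

Printing the pattern for each mode makes the trade-off visible: higher modes leave more I frames as ordinary (non-IDR) pictures, which preserves efficiency at the cost of fewer clean access points.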
5 Entropy Coding
i. CAVLC / CABAC
Quantised transform co-efficients from the DCT (which have been ‘zig-zag scanned’) can be
further compressed using a lossless data compression stage which is undone in the decoder to
recover the original co-efficients. Lossless compression works by recognising patterns that occur
more commonly, and represents them with shorter code words than patterns occurring less
frequently. This is known as Entropy coding, and its contribution to the encoding gain of
the overall system is very significant.
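The principle of giving shorter code words to more common patterns can be illustrated with a small Huffman code builder. This is a sketch for illustration only: it derives the table from the sample data, whereas the video standards use fixed, pre-defined tables, and the function name and sample values are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code in which frequent symbols get shorter code words."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tiebreak, {symbol: partial code}).
    # The unique tiebreak stops Python from ever comparing the dictionaries.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent groups
        w2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantised co-efficients: zero dominates, as is typical.
    data = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [4]
    table = huffman_code(data)
    coded_bits = sum(len(table[s]) for s in data)
    print(table)
    print(coded_bits, "bits with variable length codes versus", 3 * len(data), "fixed")
```

With this sample the most common value is coded in a single bit, and the whole sequence takes fewer bits than a fixed three bit code per value.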
MPEG2 uses Huffman coding, which uses a fixed coding table. The first H.264 option, CAVLC,
is very similar but can select one of four fixed VLC tables depending upon the data to be
encoded and the ratio of trailing 1s and 0s in the samples. The decoder needs to know what the
shortened variable codes represent, in terms of the true original word length, and one of six
Exp-Golomb codes is defined in the standard to enable this. The additional flexibility provided with
CAVLC provides a big step up in efficiency compared with Huffman coding.
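As an illustration of how an Exp-Golomb code lets a decoder recover the word length without a table, the sketch below implements the unsigned order-0 form used for many H.264 syntax elements. It is a minimal example working on strings of '0' and '1' rather than a real bitstream, and the function names are hypothetical.

```python
def exp_golomb_ue(value: int) -> str:
    """Encode a non-negative integer as an unsigned order-0 Exp-Golomb code word."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb codes only cover values >= 0")
    binary = bin(value + 1)[2:]  # binary representation of value + 1
    # The run of leading zeros tells the decoder how many bits follow the first 1.
    return "0" * (len(binary) - 1) + binary


def decode_ue(bits: str) -> int:
    """Decode one unsigned order-0 Exp-Golomb code word from the start of bits."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    # Read the marker 1 plus the same number of bits as there were leading zeros.
    return int(bits[zeros:2 * zeros + 1], 2) - 1


if __name__ == "__main__":
    for v in range(6):
        print(v, exp_golomb_ue(v))
```

Small values get short code words and the code word length is self-describing, so no side information about the length has to be sent.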
The second H.264 option, CABAC, is even more complex. It has the ability to
dynamically adapt to different content, and can result in compression gains of between 5 and 10 %
depending on content.
The CABAC parameter sets either CAVLC mode (0) or CABAC mode (1, which is default).
6 Statistical Multiplexing
Statistical multiplexing takes the same approach towards fixing the video quality of a service to a
consistent level and also achieves this by making the video bitrate of the service the variable
factor. However, rather than treat each service in isolation, statistical multiplexing uses the total
bandwidth allocated to multiple video services as a resource pool. For example, imagine a DTT
multiplex providing a total capacity for video of 22Mb/s, shared between 12xSD AVC services.
The system will treat the 22Mb/s as a ‘pool’ (the statmux pool) to be shared between all services.
How well the bitrate actually gets distributed between the services, so that best use is made of it,
depends upon the implementation of the statistical multiplexing system.
[Diagram: Encoder 1, Encoder 2, Encoder 3, Encoder 4, Encoder 5, etc.]
Appear TV’s second generation statistical multiplexing system analyses the incoming video at
10ms intervals to determine the instantaneous encoding complexity. This is achieved by
equipping each encoder with a pre-encode stage that exists only to perform this assessment. The
pre-encode stage performs an intra-frame encode ahead of the main encoder, which has its input
artificially delayed by the look ahead buffer.
[Diagram: Video Input, Look ahead buffer, Main Encoder O/P, Stat Mux Metrics, Stat Mux
Controller; Encoder 2 to Encoder 5 shown alongside.]
The stat mux controller receives bitrate requests from all of the encoders and transcoders
participating in the stat mux group. Appear TV equalises the timing / delay between different
module types, meaning that encoders and transcoders working in either HD or SD mode and
with MPEG2 or MPEG4 outputs can all be mixed within a stat mux group.
The role of the stat mux controller is to decide how the total available bitrate (statmux pool) is
divided to achieve optimum results. The following are taken into account when determining the
actual bitrate that will be allocated for each encoder during the 10ms time segment:
The metrics from the pre-encode stage which indicate the bitrate required to encode
optimally. This is the encoder’s estimation of the bitrate needed to enable it to encode the
content to a high degree of VQ. The accuracy of these metrics is critically important for
optimum operation of the system.
The quality weighting that the customer has set for the service, which provides either a
positive or a negative bias for that service. If positive, it highlights the service as being
important and instructs the arbitration process to favour it compared to others with lower
priority weighting. This ensures the bitrate requested is usually allowed, resulting in
higher and more consistent VQ. If negative, it lowers the priority of the service so that if
necessary, the bitrate allocation for the service can be considerably lower than requested.
This can make the variation in VQ for the service less well controlled since the optimum
video bitrate may not always be allocated, which may suit a low priority service.
The B (min) parameter. The service will never be allocated less than the video bitrate
defined in B (min).
The B (max) parameter. The service will never be allocated more than the video bitrate
defined in B (max).
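The four inputs listed above can be combined in many ways. The sketch below shows one simple possibility for a single time segment, and it is not Appear TV's arbitration algorithm: every service first receives its B (min), and the rest of the pool is shared in proportion to weighted requests, with any service that reaches its B (max) dropping out so that the surplus is re-shared. The class, field and function names are hypothetical, and modelling the quality weighting as a multiplier (1.0 being neutral) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    requested: float  # bitrate requested by the pre-encode metrics (b/s)
    weight: float     # assumed quality weighting multiplier, 1.0 = neutral
    b_min: float      # B (min) in b/s
    b_max: float      # B (max) in b/s


def allocate(pool: float, services: list) -> dict:
    """Share the statmux pool between services for one time segment."""
    alloc = {s.name: s.b_min for s in services}  # honour B (min) first
    remaining = pool - sum(alloc.values())
    if remaining < 0:
        raise ValueError("pool is smaller than the sum of the B (min) values")
    active = list(services)
    while remaining > 1e-6 and active:
        demand = sum(s.weight * s.requested for s in active)
        if demand <= 0:
            break
        still_active, distributed = [], 0.0
        for s in active:
            share = remaining * s.weight * s.requested / demand
            room = s.b_max - alloc[s.name]
            grant = min(share, room)  # never exceed B (max)
            alloc[s.name] += grant
            distributed += grant
            if grant < room:
                still_active.append(s)
        remaining -= distributed
        if len(still_active) == len(active):
            break  # nobody was capped, so the pool is fully shared out
        active = still_active
    return alloc


if __name__ == "__main__":
    services = [
        Service("sport", 4_000_000, 1.5, 250_000, 6_000_000),
        Service("news", 1_500_000, 1.0, 250_000, 3_000_000),
        Service("shopping", 1_000_000, 0.5, 250_000, 2_000_000),
    ]
    for name, rate in allocate(6_000_000, services).items():
        print(f"{name}: {rate / 1e6:.2f} Mb/s")
```

A proportional share with caps is used here because it is easy to reason about. A production controller would also have to smooth allocations over time and respect decoder buffer constraints, neither of which is modelled.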
The role of B (min) is largely historical and dates back to the early days of statistical
multiplexing. Initially, there was no look ahead and the bitrate estimation and bitrate
planning stages were rudimentary by today’s standards. It was possible for systems to set
very low video bitrates during easy content, and then not be able to respond fast enough to
capture spontaneous events such as pans and scene changes which require a significant
increase in bitrate to encode without artefacts. Additionally, many systems only sampled
every few tens of frames. The B (min) parameter was provided to compensate for the lack of
accuracy and responsiveness in these early systems by preventing the bitrate of premium
channels from being set too low.
In contrast, the dedicated pre-encoding stage of Appear TV’s new statistical multiplexing
system is able to gauge real bitrate requirements very accurately. The delay before the real
encode takes place provides time for a proper bitrate arbitration process to take place, and
provides advance visibility of scene cuts and pans in particular. Very often, the encoded
picture type will have to change as well (for example, I frame insertion at a scene change).
The Appear TV system combines ultra-rapid 10ms sampling with the capability to allow the
entire process to be planned in advance, well before the primary encoder encodes the section
of content. Although Appear TV has retained the B (min) setting, in reality it is redundant
and can be safely set to its minimum value of 250kb/s.
The B (max) setting limits the maximum video bitrate that a service will be allocated and can
be useful in the following scenarios;
The diagram below completes the stat mux system overview from a top level perspective.
[Diagram: stat mux system overview. Encoder 1: Video Input, Pre Encode stage, Look ahead
buffer, Main Encoder O/P. Other Encoders: Encoder 2 to Encoder 5.]
Although the Appear TV statistical multiplexing system is fully automatic, its role is to optimise
bandwidth distribution. Getting the most out of it still requires great care in setting the
MPEG configuration of the encoders optimally. It also places a greater
emphasis on using the pre-processing tools correctly. This is because statistical multiplexing is
exploiting the variation that exists between content. To be truly effective, a stat mux group
should contain a good mixture of content types and ideally have at least 8 SD services or four
HD services within the stat mux group. The statistical variation within a service will be nullified
if it appears to have difficult content all of the time, which is what the presence of random noise
will do. Poor quality sources, or those derived from analogue feeds, must therefore be assessed
for such artefacts at the encoder source (and not at the output of the encoder) with the MCTF and
pre-deblocking filters used fairly aggressively, especially if low frequency analogue noise is
seen. At typical DTH bitrates, even harsh unfiltered noise will usually be quantised out by the
DCT during compression, so the output of the encoder will appear clean. However, it will still
have caused the statistical multiplexing system to over-allocate bitrate for the service, and the
noise and sharp edges of previous MPEG2 macroblocks will jointly cause sub-optimal MPEG
encoding mode decisions to be made within the encoder. The result will be sub-optimal encoding
efficiency for the service, with bandwidth stealing from the rest of the stat mux group. Applying
the filtering tools provided can stop this from happening.
Another method for verifying the performance of the stat mux group is to use the graphical
reporting tools provided by the system. These are able to show the instantaneous and long term
performance of services in terms of bitrate variation, QP, and longer term historical bitrate
utilisation for a service.
Various options exist for setting the output video format. The options will be automatically
filtered by Appear TV, so that only sensible options that are supported in the mode being used
are displayed to users. The full suite is controlled by the following parameters.
WIDTH controls the desired width of the outgoing video. Options are 1920, 1280 and 720.
HEIGHT controls the height. Options are 1080, 720, 480 and 576 lines.
Frame Rate specifies the outgoing frame rate. Options are 23.976, 24, 25, 29.97, 30, 50, 59.94,
60, 12, 15, 10, 9.9, 11.99, 12.5, 14.99.
The encoder cannot convert from fractional to integer frame rates and vice versa.
VIDEO FORMAT specifies if the outgoing video is INTERLACED or PROGRESSIVE.
ASPECT RATIO specifies square or widescreen, with 4x3, 16x9 and 14x9 as possible options.
DETELECINE specifies if 3-2 pulldown is enabled for the outgoing video. This is either enabled
or disabled.
MADEINT enables advanced 8 field motion-adaptive pixel-adaptive low-angle de-interlacing.
De-interlacing to provide a progressive output has become a complex subject that is worthy of
some explanation. In general, there are two types of method used. INTRA FIELD interpolation is
used when motion is present, and INTER FIELD is used when motion is not present.
Considering INTRA FIELD methods first, the simplest type is line repetition. By definition,
INTRA FIELD requires only the current field to reconstruct the missing lines needed to convert
from interlaced to progressive. Simply repeating lines is crude, and is unable to exploit any
temporal redundancy that exists between successive frames, but it has been used because of its
simplicity. A slightly more complex method is linear interpolation, where missing lines are
created by averaging the lines above and below. Although this performs better, both methods are
poor and suffer from excessive aliasing and jitter.
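The two INTRA FIELD methods just described can be sketched in a few lines. This is an illustration only, with hypothetical function names: a field is modelled as a list of lines, each line a list of luma values, and the field is assumed to hold the upper line of each output pair. At the bottom edge, where there is no line below, the sketch falls back to repeating the last line.

```python
def line_repetition(field):
    """Rebuild a frame by repeating every field line once."""
    frame = []
    for line in field:
        frame.append(list(line))
        frame.append(list(line))  # the missing line is a copy of the one above
    return frame


def linear_interpolation(field):
    """Rebuild a frame by averaging the field lines above and below each gap."""
    frame = []
    for i, line in enumerate(field):
        frame.append(list(line))
        # Assumption: at the bottom edge the last line is reused.
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) / 2 for a, b in zip(line, below)])
    return frame


if __name__ == "__main__":
    field = [[0, 0], [10, 20]]  # a tiny two line, two pixel field
    print(line_repetition(field))
    print(linear_interpolation(field))
```

Neither function looks at any other field, which is why both are unable to exploit temporal redundancy.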
INTER FIELD methods use other fields to provide de-interlacing. This method can de-interlace
stationary objects perfectly. For example, if an object is stationary, then simply combining the
odd and even fields (field repetition) will provide perfect results. If the object is moving, then
this will not be the case because the movement will cause the odd and even fields to no longer be
identical and representative of a single de-interlaced picture. The result will be severe blurring
around the moving object.
Bi-linear field interpolation uses the average of previous and future lines to de-interlace the
current missing line. This method can work very well for stationary objects, but will exacerbate
any motion problems because the motion over an even longer time period is taken into account.
Any high performance de-interlacer must use a combination of INTRA and INTER field
methods. It is clear that INTER FIELD can only be used with pixels that have no motion, and
this needs to be determined by comparing past and future fields. For best performance, the
comparison must be done only between identical pixels on the same line. For various reasons, if
the comparison is made between just two fields, there is a tendency to falsely declare movement
when there actually is none. If this is increased to three fields, movement can be missed. Four
fields provides a good compromise between complexity and accuracy. Since the Appear TV de-
interlacer is performance focussed, it actually uses eight fields of comparison.
Pixels where motion is detected use an advanced INTRA FIELD method which generates the
non-stationary pixels from adjacent pixels within the same field, but with a boundary edge
detection algorithm that ensures the new pixel values are correlated to real objects.
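The per-pixel choice between INTER FIELD and INTRA FIELD methods can be sketched as follows. This is a deliberately reduced illustration and not the MADEINT algorithm: it compares only two fields (the opposite parity fields before and after the current one) rather than eight, uses plain vertical averaging instead of edge-directed interpolation, and the function name and threshold value are hypothetical.

```python
def motion_adaptive_deinterlace(current, previous, following, threshold=8):
    """De-interlace one field, choosing a method for each missing pixel.

    current:   lines of the field being de-interlaced
    previous:  opposite parity field immediately before it
    following: opposite parity field immediately after it
    All three are lists of lines with identical dimensions.
    """
    frame = []
    for y, line in enumerate(current):
        frame.append(list(line))
        below = current[y + 1] if y + 1 < len(current) else line
        missing = []
        for x in range(len(line)):
            before, after = previous[y][x], following[y][x]
            if abs(before - after) <= threshold:
                # Same pixel is unchanged over time: treat as stationary (INTER FIELD).
                missing.append((before + after) / 2)
            else:
                # Motion detected: use only the current field (INTRA FIELD).
                missing.append((line[x] + below[x]) / 2)
        frame.append(missing)
    return frame


if __name__ == "__main__":
    current = [[100, 100], [100, 100]]
    previous = [[50, 50], [50, 50]]
    following = [[50, 200], [50, 50]]  # one pixel changes between fields
    for row in motion_adaptive_deinterlace(current, previous, following):
        print(row)
```

In the example only the pixel that changes between the previous and following fields falls back to intra-field interpolation. All the others are taken from the neighbouring fields.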
MADEINT is enabled by default, and should not be disabled. Doing so will de-activate this
advanced processing and will result in poorer de-interlacing performance.
HORSHARPNESS controls a filtering stage that is applied when changing the horizontal
resolution. This is to better conceal the artefacts that can result from resolution changes. The
filter characteristics can be set between -10 and +4, with 0 being default. High negative values
increase aliasing and artefacts in the horizontal direction but provide increased sharpness, while
more positive values decrease artefacts but also decrease sharpness in the horizontal direction.
VERSHARPNESS performs exactly the same function as HORSHARPNESS and has exactly
the same control range, but operates in the vertical direction.
Pre de-blocking filter
Description: A pre-filter that removes high-frequency artefacts such as macroblock edges.
Particularly useful for MPEG2 sources that have been aggressively compressed for DTH.
Overuse will
Subjective setting: For high quality sources can be set to off or 1. For MPEG2 DTH sources,
set to at least 2. When performing dramatic decreases in video bitrate, can be set aggressively
to provide some high frequency softening before the
Objective test setting: Off. Never have this feature enabled when performing objective tests.
MCTF (Motion Compensated Temporal Filtering)
Description: The primary noise reduction tool. This pre-filter is adaptive and designed to
remove random noise whilst leaving wanted content. Use too aggressively and it will remove
fine details from your image.
Subjective setting: It is difficult to assess the degree of noise in content unless you can view
the source. For most turnaround, it's best to assume it is there and set moderate MCTF filtering
(setting 2) especially if you are stat muxing the source. For clean sources, you can set it to 1 or
off (but OFF is a minimum filter setting and still applies MCTF at its lowest setting).
Objective test setting: MCTF should be set to OFF. This means completely off, which currently
requires a command to activate MCTF bypass via Telnet.
GOP structure
Description: Defines the extent to which temporal
Subjective setting: Set to auto unless the customer has special
Objective test setting: Set to Auto
GOP length
Description: This sets the maximum distance between I frames. The GOP planner can and will
often reset the GOP in response to the action of scene change detection and other tools. Some
customers genuinely need to restrict maximum GOP lengths to limit channel change time (IPTV
customers mainly). Some will ask for GOP length to be set the same as another encoder to make
it a ‘fair test’. This is not correct and is an invalid argument. See main narrative for the
reasons why.
Subjective setting: For MPEG2, 24 is a good place to start. For AVC, set to 48. Use up to 60 for
non-demanding channels. Appear TV GOP lengths are typically set above average to achieve
best performance.
Objective test setting: Same settings as subjective.
AVC high profile
Description: Always use high profile unless the customer prohibits this for STB compatibility
reasons. It enables the 8x8 transform which you need to enable separately.
Subjective setting: Choose HIGH profile and check 8x8 transform.
Objective test setting: Choose HIGH profile and check 8x8 transform.
Notes:
Field Frame: This setting is now configured automatically according to the following rules;
Progressive content. Always Frame.
AVC encoding. PAFF
MPEG2 encoding. MBAFF. (PAFF can also be used with good results but can affect
very old MPEG2 STBs with low decoder RAM. If this happens, motion areas will
exhibit motion judder.)