
Applied Sciences
Article
Transitioning Broadcast to Cloud
Yuriy Reznik, Jordi Cenzano and Bo Zhang *

Brightcove Inc., 290 Congress Street, Boston, MA 02210, USA; [email protected] (Y.R.);
[email protected] (J.C.)
* Correspondence: [email protected]; Tel.: +1-888-882-1880

Abstract: We analyze the differences between on-premise broadcast and cloud-based online video de-
livery workflows and identify technologies needed for bridging the gaps between them. Specifically,
we note differences in ingest protocols, media formats, signal-processing chains, codec constraints,
metadata, transport formats, delays, and means for implementing operations such as ad-splicing,
redundancy and synchronization. To bridge the gaps, we suggest specific improvements in cloud
ingest, signal processing, and transcoding stacks. Cloud playout is also identified as a critically needed technology for convergence. Finally, based on all such considerations, we offer sketches of several possible hybrid architectures, with different degrees of offloading of processing to the cloud, that are likely to emerge in the future.

Keywords: broadcast; video encoding and streaming; cloud-based services; cloud playout

1. Introduction
Terrestrial broadcast TV was historically the first, and is still a broadly used, technology for delivery of visual information to the masses. Cable and DTH (direct-to-home) satellite TV technologies came next, as highly successful evolutions and extensions of the broadcast TV model [1,2].
Yet, broadcast has some limits. For instance, in its most basic form, it only enables linear delivery. It also provides direct reach to only one category of devices: TV sets. To reach other devices, such as mobiles, tablets, PCs, game consoles, etc., the most practical option currently available is to send streams Over the Top (OTT). The OTT model utilizes IP connections that many such devices already have, and internet streaming as a delivery mechanism [3–7]. The use of OTT/streaming also makes it possible to implement interactive, non-linear, time-shifted TV, or DVR types of services.
Considering all such benefits and conveniences, many companies in the broadcast ecosystem are now increasingly adding OTT services, complementing their traditional (e.g., terrestrial, cable, satellite) services or distribution models [8–11]. At a broader scale, we must also recognize new standards and industry initiatives such as HbbTV [12], as well as ATSC 3.0 [13,14], which are further blurring the boundaries between traditional broadcast and OTT.
In other words, we are now living in an era where hybrid broadcast + OTT distribution is becoming the norm, and this brings us to the question of how such hybrid systems can be deployed and operated most efficiently.
At present, the two extreme choices are:
• on-prem: everything, including playout systems, encoders, multiplexers, servers, and other equipment for both broadcast and OTT distribution, is installed and operated on premises, and
• cloud-based: almost everything is turned into software-based solutions and operated using the infrastructure of cloud service providers, such as AWS, GCP, Azure, etc.
The on-prem model is indeed well known. This is how all traditional broadcast systems have always been built and operated. The cloud-based approach is a more recent development.


It requires a considerably different (modular, software-only) implementation of the system, but in the end it brings a number of significant advantages: it minimizes investments in hardware, allows pay-as-you-go operation, simplifies management and upgrades, and makes the whole design more flexible and future-proof [15–17].
Furthermore, the use of cloud has already been proven to be highly scalable, reliable,
and cost-effective for implementing OTT/streaming delivery. Today, cloud already powers
mass-scale online video services, such as YouTube and Netflix, as well as online video
platforms (OVPs)—Brightcove, Kaltura, thePlatform, etc. [9–11]. Besides enabling basic
streaming functionality, OVPs also provide means for content management, ad-insertions,
analytics, client SDKs, and even automatic generators of apps for all major platforms. They
basically provide turn-key solutions for OTT services of all sorts.
However, while transition of OTT services to cloud is no longer a challenge, the
offload of traditionally on-prem functions of broadcast systems, such as ingest, content
management, master control/playout, distribution encoding, etc., is a topic that we believe
deserves additional discussion. As we will show in this paper, there are many important differences in the way video processing is currently done in the cloud vs. in on-prem broadcast, and there are specific technologies that may be needed to bridge the gaps between them.

2. Processing Chains in Broadcast and Cloud-Based Online Video Systems


In this section, we will study commonalities and differences between processing
chains in broadcast and online video systems. We focus on functions, formats, means for
implementation of certain operations, and overall system characteristics such as reliability,
processing granularity, and delays.
The idealized chain of processing in broadcast distribution is shown in Figure 1, and
the chain of processing in online video system is shown in Figure 2. Both are indeed
conceptual and high-level.

Figure 1. Conceptual diagram of processing operations in broadcast distribution.



Figure 2. Conceptual diagram of processing operations in cloud-based online video platform.

2.1. Main Functions and Distribution Flows


In broadcast, everything is based around processing and delivery of a set of live
streams, visible to end users as “TV channels”. As shown in Figure 1, the selection or
scheduling of input feeds that go in each channel is done by master control or playout
systems. Such systems also insert graphics (e.g., channel logos or “bugs”), slots for ads,
captions, metadata, etc.
After playout, all channels are subsequently encoded and passed to a multiplexer, which combines them into a multi-program transport stream (aka TS or MPEG-2 TS [17]) intended for distribution. In addition to the channels' media content, the final multiplex TS also carries program and system information (PSIP [18]), SCTE-35 ad markers [19,20], and other descriptors required for broadcast distribution [21].
As further shown in Figure 1, the distribution chain in broadcast systems may have
multiple tiers—from main network center to local stations and also multichannel video
programming distributors (MVPDs), such as cable or satellite TV companies. At each stage,
media streams corresponding to each channel can be extracted, modified (e.g., by adding
local content or ads), re-multiplexed into a new set of channels, with new program tables and other metadata inserted, and then sent down to distribution or to the next headend.
In other words, we see that broadcast systems are effectively responsible for both the formation of the content, turning it into a set of channels, and then the distribution of that content to the end users.
In contrast, online video platforms are used primarily for distribution. They assume
that content is already fully formed. As shown in Figure 2, live inputs are typically turned
into live output streams, and pre-recorded media files are typically published as VOD
assets. They transcode and repackage inputs into HLS [5], DASH [6], or MSS [7] streaming
formats, and then pass them to Content Delivery Networks (CDNs) for propagation and
delivery to end user devices (streaming clients). In some cases, OVPs may also be used
for live to VOD conversions and VOD content management, but not for formation of
live streams.
Another important difference between online video systems and broadcast is the
availability of the feedback chain. In Figure 2 it is shown by contour arrows connecting
players and CDNs to the analytics module within OVP. This module collects playback
and CDN usage statistics, and turns them into metrics used for ad-monetization/billing,
operations-control, and optimization purposes [22,23].

2.2. Contribution and Ingest


In broadcast, live input streams (or "feeds") originate from remote or field production. They are normally encoded by a contribution encoder and delivered to the broadcast center over a certain physical link (dedicated IP, satellite, 4G, etc.). The encoding is always done using one of the standard TV formats (e.g., 480i SD or 1080i HD), and with a number of codec- and TS-level constraints applied, making such streams compatible with broadcast
systems [24–32]. When streams are sent over IP, real-time UDP (User Datagram Protocol)-
based delivery protocols are normally used. Examples of such protocols include RTP [33],
SMPTE 2022-1 [34], SMPTE 2022-2 [35], Zixi [36], etc.
Pre-recorded content usually comes in the form of files produced by studio encoders.
Again, only standard TV/broadcast video formats are used, and specific codec- and
container-level restrictions are applied (see e.g., [37]). Moreover, in most cases, the contri-
bution (or so-called “mezzanine”) encodings are done at rates that are considerably higher
than rates used for final distribution. This allows broadcast systems to start with “cleaner”
versions of the content.
In case of online video platforms, input content generally comes from a much broader
and more diverse set of sources—from professional production studios and broadcast
workflows to user-generated content. Consequently, the bitrates, formats, and quality of
such streams can also vary greatly. This forces OVPs to be highly versatile, robust, and
tolerant on the ingest end.
The quality of links used to deliver content to OVPs may also vary greatly, ranging from dedicated connections to datacenters to the public Internet over local ISPs. UDP may or may not be available.
In such a context, the live ingest protocol that has become most commonly used, remarkably enough, is RTMP (Real-Time Messaging Protocol) [38]. This is an old, Flash-era protocol with many known limitations, but it works over TCP (Transmission Control Protocol) and remains a popular choice for live-to-cloud ingest. Other protocols for live ingest include
SRT (Secure, Reliable Transport) [39] and RIST (Reliable Internet Stream Transport) [40].
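To make the practical difference between these ingest options concrete, the following minimal sketch (with hypothetical endpoints and stream keys, and assuming an ffmpeg build with libsrt) contrasts an RTMP push, which re-wraps audio and video into FLV and loses TS-level metadata such as SCTE-35, with an SRT push that keeps the contribution stream in MPEG-TS form:

```python
# Sketch: pushing a live contribution feed to a cloud ingest point (endpoints are hypothetical).
import subprocess

SOURCE_TS = "udp://239.1.1.1:5000"   # multicast transport stream from a contribution encoder

def push_rtmp() -> None:
    # RTMP carries audio/video as separate FLV streams over TCP; the TS container
    # (and with it SCTE-35 and most other metadata) does not survive the trip.
    subprocess.run([
        "ffmpeg", "-re", "-i", SOURCE_TS,
        "-c:v", "libx264", "-c:a", "aac",
        "-f", "flv", "rtmp://ingest.example.com/live/streamKey",
    ], check=True)

def push_srt() -> None:
    # SRT keeps the content in MPEG-TS form over UDP, with retransmission-based loss
    # recovery (latency budget given in microseconds). "-map 0" forwards all input streams;
    # full pass-through of data PIDs such as SCTE-35 may need further mapping options.
    subprocess.run([
        "ffmpeg", "-re", "-i", SOURCE_TS,
        "-map", "0", "-c", "copy", "-f", "mpegts",
        "srt://ingest.example.com:9000?mode=caller&latency=2000000",
    ], check=True)
```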

2.3. Video Formats and Elementary Streams


We next look at characteristics of video formats used in both broadcast and online
video systems. The summary of this comparison is provided in Table 1. For simplicity, in
this comparison, we only consider SD and HD TV systems.
As shown in this table, SD systems almost always use interlaced, bottom-field first (bff)
sampling format [26,30,41]. If the source is progressive (e.g., film) it is typically converted to
interlaced form by so-called telecine process [1,2]. HD systems also use interlaced formats,
but with top-field first (tff) order. HD systems can also carry progressive formats (e.g., 720p).
In contrast, in internet streaming, only progressive formats are normally used [42–44].
In terms of color parameters and Sample Aspect Ratios (SARs), streaming video
formats are well aligned with HDTV systems. On the other hand, streaming of SD content
requires both color- and SAR-type conversions.
The primary reason why streaming systems are more restrictive is compatibility with a wide range of possible receiving devices—mobiles, tablets, PCs, etc. [42]. In many such devices, graphics stacks are simply not designed to properly render interlaced video, colors other than sRGB/ITU-R BT.709, or pixels that are non-square. This forces color-, temporal sampling type-, and SAR-type conversions.
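As a small worked example of the SAR conversion involved (the numbers below are typical illustrative values, not taken from the paper), an NTSC-derived 704x480 frame with a 10:11 sample aspect ratio must be resampled to square pixels before such devices can render it correctly:

```python
# Converting a non-square-pixel SD frame to square (1:1) pixels; values are illustrative.
from fractions import Fraction

coded_w, coded_h = 704, 480          # NTSC-derived SD coding resolution
sar = Fraction(10, 11)               # sample aspect ratio signaled in the bitstream

dar = Fraction(coded_w, coded_h) * sar      # display aspect ratio = 4:3
square_pixel_w = round(coded_h * dar)       # width after resampling to SAR 1:1 -> 640

print(f"DAR = {dar}, square-pixel raster = {square_pixel_w}x{coded_h}")
```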
In Table 2, we further analyze characteristics of encoded video streams (or elementary
streams) used for broadcast and streaming distribution. Again, consideration is limited to
SD and HD systems.

Table 1. Comparison of video formats used in broadcast and Internet streaming.

Format Characteristic | Broadcast Systems | Online Platforms/Streaming
Temporal sampling | SD: interlaced (bff), telecine; HD: progressive, interlaced (tff), telecine | progressive only
Framerates [Hz] | 24,000/1001, 24, 25, 30,000/1001, 30, 50, 60,000/1001, 60 | same as source, typically capped at 30 Hz or 60 Hz
Display Aspect Ratio (DAR) | SD: 4:3, 16:9; HD: 16:9 | same as indicated by the source
Sample Aspect Ratio (SAR) | SD: 1:1, 12:11, 10:11, 16:11, 40:33, 24:11, 32:11, 80:33, 18:11, 15:11, 64:33, 160:99; HD: 1:1 (most common), 4:3 (1440 mode) | 1:1 (or the nearest rounding to 1:1) SARs are usually preferred
Resolutions | SD (NTSC-derived): 480i; SD (PAL, SECAM-derived): 576i; HD: 720p, 1080i | same as source + down-scaled versions; additional restrictions may apply [45,46]
Chroma sampling | 4:2:0, 4:2:2 | 4:2:0 only
Primary colors | SD (NTSC-derived): SMPTE-C [47,48]; SD (PAL, SECAM-derived): EBU [49,50]; HD: ITU-R BT.709 [51] | ITU-R BT.709/sRGB [51,52]
Color matrix | SD: ITU-R BT.601 [50]; HD: ITU-R BT.709 [51] | ITU-R BT.709
Transfer characteristic | SD (NTSC-derived): power law 2.2; SD (PAL, SECAM-derived): power law 2.8; HD: ITU-R BT.709 [51] | ITU-R BT.709

First, we notice that the number of encoded streams is different. In broadcast, each
channel is encoded as a single stream. In streaming, each input is encoded into several
output streams with different resolutions and bitrates. This is needed to accommodate
adaptive bitrate (ABR) delivery.
There are also differences in codecs, encoding modes, and codec constraints. For exam-
ple, in broadcast, the use of constant bitrate (CBR) encoding is most common [24]. It forces the codec to operate at a certain target bitrate, matching the amount of bandwidth allocated to the particular channel. The use of variable bitrate (VBR) encoding in broadcast
is rare and only allowed in so-called statistical multiplexing (or statmux) regime [59,60],
where the multiplexer is effectively driving dynamic bandwidth allocation across all chan-
nels in a way that the total sum of their bitrates remains constant. In streaming, there is
no need for CBR or statmux modes. All streams are typically VBR-encoded with some
additional constraints applied on decoder buffer size and maximum bitrate [44].
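As an illustration of the capped-VBR constraints typically applied in streaming (the ladder and cap values below are hypothetical and are not prescribed by any of the cited guidelines), an encoder configuration for an ABR ladder might look as follows:

```python
# Sketch: capped-VBR settings for an ABR ladder (illustrative values only).
LADDER = [
    # (width, height, target_kbps)
    (1920, 1080, 6000),
    (1280, 720, 3500),
    (960, 540, 2000),
    (640, 360, 1000),
]

def x264_rate_args(target_kbps: int, cap_factor: float = 1.5) -> list:
    """Build libx264 arguments for one rendition: VBR around the target,
    peak capped at ~1.5x the target, and a bounded decoder (HRD/VBV) buffer."""
    maxrate = int(target_kbps * cap_factor)
    bufsize = 2 * maxrate          # simple choice; profile/level limits may bound it further
    return [
        "-c:v", "libx264",
        "-b:v", f"{target_kbps}k",
        "-maxrate", f"{maxrate}k",
        "-bufsize", f"{bufsize}k",
        "-g", "120",               # e.g., a 2 s GOP at 60 fps, matched to the segment length
        "-sc_threshold", "0",      # keep GOPs fixed so segments align across renditions
    ]

for w, h, kbps in LADDER:
    print(w, h, x264_rate_args(kbps))
```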

Table 2. Comparison of encoded video streams used in broadcast and Internet streaming.

Stream Characteristic | Broadcast Systems | Online Platforms/Web Streaming
Number of outputs | single | multiple (usually 3–10), as needed to support ABR delivery [4,42,43]
Preprocessing | denoising, MCTF-type filters [24,53] | rarely used
Video codecs | MPEG-2 [54], MPEG-4/AVC [55] | MPEG-4/AVC—most deployments; HEVC [56], AV1 [57]—special cases
Codec profiles, levels | fixed for each format; applicable standards: ATSC A/53 P4 [26], ATSC A/72 P1 [29], ETSI TS 101 154 [30], SCTE 128 [33] | based on target set of devices [42]; guidelines: Apple HLS [44], ETSI TS 103 285 [46], CTA 5001 [58]
GOP length | 0.5 s | 2–10 s
GOP type | open, closed | closed
Error resiliency features | mandatory slicing of I/IDR pictures | N/A
Encoding modes | CBR, VBR (with statmux); many additional constraints apply, see ATSC A/53 P4 [26], ATSC A/54A [27], ATSC A/72 P1 [29], ETSI TS 101 154 [30], SCTE 43 [31], SCTE 128 [33] | capped VBR; max. bitrate is typically capped to 1.1x–1.5x the target bitrate; HRD buffer size is typically limited by codec profile + level constraints
VUI/HRD parameters | required | optional, usually omitted
VUI/colorimetry data | required | optional, usually included
VUI/aspect ratio | required | optional, usually included
Picture timing SEI | required in some cases (e.g., in film mode) | optional, usually omitted
Buffering period SEI | optional | optional, usually omitted
AFD/bar data/T.35 | required and carried in video ES | not used
Closed captions | required and carried in video ES | optional, may be carried out-of-band

Significantly different are also GOP (Group of Pictures) lengths. In broadcast, GOPs
are typically 0.5 s, as required for channel switching. In streaming, GOPs can be 2–10 s
long, typically limited by lengths of segments used for delivery.
Broadcast streams also carry more metadata. They typically include the relevant video buffering verifier (VBV) [54] or hypothetical reference decoder (HRD) parameters [55], as well as picture structure-, picture timing-, and colorimetry-related information [55]. They also carry CEA 608/708 closed captions [61,62] and active format descriptor (AFD)/bar data information [63–65]. For streaming, most of these may be omitted.
Finally, there are also important differences in pre-processing. Broadcast encoders are
famous for the use of denoisers, MCTF-filters, and other pre-processing techniques applied
to make compression more efficient [24,53]. In streaming, the use of such techniques is only
beginning to emerge.

2.4. Distribution Formats


As mentioned earlier, in broadcast, distribution is always done using MPEG-2 trans-
port streams [17]. They carry audio and video elementary streams, program and system
information [18], SCTE-35 ad markers [19], and other metadata as prescribed by relevant
broadcast standards and guidelines [21,25,27,30]. TS in cable systems may also carry EBPs
and other cable-specific metadata.
In streaming, things are more diverse. There are several streaming formats and
standards currently in use. The most prevalent ones, as of time of writing, are:
1. HTTP Live Streaming (HLS) [5],
2. Dynamic Adaptive Streaming over HTTP (DASH) [6], and
3. Microsoft Smooth Streaming (MSS) [7].
There are also several different types of digital rights management (DRM) technologies.
The most commonly used ones are:
1. FairPlay [66],
2. PlayReady [67], and
3. Widevine Modular [68].
The support for these technologies varies across different categories of receiving
devices. Hence, in order to reach all devices, multiple streaming formats and combinations
of formats and DRM technologies must be supported. We show a few common choices of such combinations in Table 3.

Table 3. Combination of streaming formats and DRMs that can be used to reach different devices. Orange tick marks indicate possible, but less commonly used choices.

Device Category | Players/Platforms | HLS | DASH | HLS + FairPlay | HLS + Widevine | DASH + Widevine | DASH + PlayReady | MSS + PlayReady
PCs/Browsers | Chrome | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗
PCs/Browsers | Firefox | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗
PCs/Browsers | IE, Edge | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓
PCs/Browsers | Safari | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
Mobiles | Android | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Mobiles | iOS | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
Set-top boxes | Chromecast | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Set-top boxes | Android TV | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Set-top boxes | Roku | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Set-top boxes | Apple TV | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
Set-top boxes | Amazon Fire TV | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Smart TVs | Samsung/Tizen | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Smart TVs | LG/webOS | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✗
Smart TVs | SmartTV Alliance | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓
Smart TVs | Android TV | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✓
Game Consoles | Xbox One/360 | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓

HLS, DASH, as well as MSS use multi-rate, segment-based representations of media data. Original content is encoded at several different resolutions and bitrates, and then split into segments, each starting at a GOP boundary, such that they can be retrieved and decoded separately. Along with media segments (either in TS [17], ISOBMFF [69], or CMAF [70] formats), additional files (usually called manifests, playlists, or MPD (Media Presentation
Descriptor) files) are provided, describing locations and properties of all such segments.
Such manifests are used by players (or streaming clients) to retrieve and play the content.
The carriage of metadata in streaming systems is also more diverse. Some meta-
data can be embedded in media segments, while others may also be embedded in man-
ifests, carried as additional “sidecar” tracks of segment files, or as “event” messages [6],
or ID3 tags [71].
For example, in addition to “broadcast-style” carriage of CEA 608/708 [62,63] closed
captions in video elementary streams, it is also possible to carry captions as separate tracks
of WebVTT [72] or TTML [73] segments, or as IMSC1 timed text data [74] encapsulated
in XML or ISOBMFF formats [45]. The preferred way of carriage depends on player
capabilities, and may vary for different platforms.
The SCTE-35 information is allowed to be carried only at the manifest level in HLS, by either manifest or in-band events in MPEG-DASH, and only in-band in MSS [7,45,75].
To manage such broad diversity of formats, DRMs, and metadata representations,
online video platforms are commonly deploying so-called dynamic or just-in-time (JIT)
packaging mechanisms [23]. This is illustrated by an architecture shown in Figure 2. Instead
of proactively generating and storing all possible permutations of packaged streams on the origin server, such a system stores all VOD content in a single intermediate representation that allows fast transmuxing to all desired formats. The origin server works as a cache/proxy, invoking JIT transmuxers to produce each version of content only if there is a client device that requests it. Such logic is commonly accompanied by dynamic manifest generation, matching the choices of formats, DRMs, and metadata representation to the capabilities of the devices requesting them. This reduces the amount of cloud storage needed and also increases the efficiency of use of CDNs when handling multiple content representations [23].
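The core of this logic can be summarized by the following sketch (function names and storage layout are simplified placeholders, not the actual implementation of any particular OVP):

```python
# Sketch of just-in-time (JIT) packaging at the origin: one mezzanine copy per asset,
# packaged into a delivery format only when a client actually requests it.
from functools import lru_cache

def fetch_mezzanine(asset_id: str, rendition: str, seg_index: int) -> bytes:
    # Placeholder: read one intermediate-format segment from object storage.
    return f"{asset_id}/{rendition}/{seg_index}".encode()

def transmux(source: bytes, fmt: str) -> bytes:
    # Placeholder: re-wrap the mezzanine segment as TS (for HLS) or fMP4 (for DASH).
    return fmt.encode() + b":" + source

@lru_cache(maxsize=4096)   # the origin works as a cache/proxy for already-packaged segments
def get_segment(asset_id: str, rendition: str, seg_index: int, fmt: str) -> bytes:
    if fmt not in ("hls-ts", "dash-fmp4"):
        raise ValueError(f"unsupported packaging format: {fmt}")
    return transmux(fetch_mezzanine(asset_id, rendition, seg_index), fmt)
```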
As easily observed, the delivery formats and their supporting systems in the case of OTT/streaming are completely different compared to broadcast.

2.5. Ad Processing
In broadcast systems there are several types of ad slots, where some are local and
anticipated to be filled by local stations, and some are regional or global and are filled
earlier in the delivery chain.
In all cases, insertions are done by splicing ads in the distribution TS streams, aided by
SCTE-35 [19] ad markers. Such markers (or cue tones) are inserted earlier—at playout or
even production stages [20]. Ad splicers subsequently look for SCTE-35 markers embedded
in the TS, and then communicate with ad servers (normally over SCTE 30 [76]) to request
and receive ad content that needs to be inserted. Then they update TS streams to insert
segments of ad content. Such TS update is actually a fairly tedious process, involving
re-mux, regeneration of timestamps, etc. It also requires both main content and ads
to be consistently encoded: have the same exact codec parameters, HRD model, etc.
(see e.g., [77]).
In the online/streaming world, ad-related processing is quite different. The ads
are usually inserted/personalized on a per-stream/per-client basis, and the results of
viewers watching the ads (so-called ad-impressions) are also registered, collected, and
subsequently used for monetization. It is all fully automated and has to work in real-time
and at mass scale.
There are two models for ad-insertion that are used in streaming currently: server-side
ad-insertion (SSAI) and client-side ad insertion (CSAI) [45]. In case of CSAI, most ad-
related processing resides in a client. The cloud only needs to deliver content and SCTE-35
cue tones to the client. This scales well regardless of how cue tones are delivered—both
in-band, or in-manifest carriage methods are adequate.
In case of SSAI, most ad-related processing resides in cloud. To operate it at high
scale and reasonable costs, such processing has to be extremely simple. In this context,
in-manifest carriage of SCTE-35 cue tones is strongly preferred, as it allows ad-insertions
to be done by manipulation of manifests.

For example, in the case of HLS, SCTE-35 markers in HLS playlists are substituted with sections containing URLs to ad-content segments, with extra EXT-X-DISCONTINUITY markers added at the beginning and end of such sections [75]. In case of MPEG DASH, essen-
tially the same functionality is achieved by using multiple periods [45]. The discontinuity
markers or changing periods are effectively forcing clients to reset decoders when switch-
ing between program and ad content. This prevents possible HRD buffer overflows and
other decodability issues during playback.
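A minimal sketch of such manifest-level SSAI for HLS is shown below (cue-tone syntax varies across implementations; here we assume simple EXT-X-CUE-OUT/EXT-X-CUE-IN tags derived from SCTE-35, and the ad URLs and durations are illustrative):

```python
# Sketch: replacing an SCTE-35-marked break in an HLS media playlist with ad segments,
# bracketed by EXT-X-DISCONTINUITY tags so that clients reset their decoders.
AD_SEGMENTS = ["https://ads.example.com/ad1_0.ts", "https://ads.example.com/ad1_1.ts"]

def splice_ads(playlist: str) -> str:
    out, in_break = [], False
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-CUE-OUT"):
            in_break = True
            out.append("#EXT-X-DISCONTINUITY")        # decoder reset before the ad
            for uri in AD_SEGMENTS:
                out += ["#EXTINF:6.0,", uri]
        elif line.startswith("#EXT-X-CUE-IN"):
            in_break = False
            out.append("#EXT-X-DISCONTINUITY")        # and again when returning to program
        elif not in_break:                            # drop program segments inside the break
            out.append(line)
    return "\n".join(out) + "\n"
```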

2.6. Delay, Random Access, Fault Tolerance, and Signal Discontinuities


In broadcast systems, many essential signal processing operations—format conver-
sions, editing, switching, etc., are normally done with uncompressed video streams, carried
over by SDI (Serial Digital Interface) [78,79], or more recently by SMPTE (Society of Motion
Picture and Television Engineers) 2110 [80] over IP. This enables all such operations to be
performed with extremely short delays and with frame-level temporal precision. When re-
dundant processing chains are employed, the switching between them in SDI domain also
happens seamlessly. When streams are encoded, this increases random access granularity
to about 0.5 s, which is a typical GOP length in broadcast streams.
In streaming, as discussed earlier, the delivery of encoded videos to clients is done
using segments. Such segments cannot be made arbitrarily small due to CDN efficiency rea-
sons. In practice, 2-, 6-, and 10-s segments are most commonly used [44]. Same segmented
media representations are also commonly used internally in cloud-based processing work-
flows. This simplifies exchanges, avoids additional transcoding or transmuxing operations,
and reduces many basic stream-level operations to manifest updates. However, such
design also makes random access and delay capabilities in cloud video systems much
worse compared to broadcast.
What also makes things in the cloud complicated is the distributed and inhomogeneous nature of processing resources. For instance, physical servers (or cloud "instances") re-
sponsible for running video processing tasks may be located in different datacenters, have
somewhat different characteristics of hardware, non-synchronized local clocks, etc. The
network-induced delays in accessing such instances may also be different. Processing jobs
thus have to be scheduled dynamically and in anticipation of all such possible differences.
Moreover, occasionally cloud instances may become unstable, non-responsive, or termi-
nated by the cloud service provider. These are rare, but relatively “normal” events. Cloud
workflows must be designed to be “immune” to such events.
To illustrate how fault tolerance in cloud may be achieved, in Figure 3 we show an
example of a live streaming system with 2-way redundancy introduced. There are two
contribution feeds, marked as A and B respectively, and two processing chains, including
ingest, transcoding, and packaging stages. The outputs of packagers are DASH or HLS
media segments and manifests. This system also deploys two redundancy control modules.
These modules check whether manifest and segment updates along route A or B are arriving at the expected times, and if so they just leave the manifests unchanged. However, if they detect that either of these processing chains has become non-functional, they update the manifest to include a discontinuity marker, and then continue with segments arriving from the alternative path.
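The decision logic of such a redundancy control module can be sketched as follows (the staleness threshold and naming are illustrative assumptions, not taken from an actual implementation):

```python
# Sketch of the redundancy-control behavior in Figure 3: publish from the active chain
# while its manifests/segments keep arriving; otherwise fall back and flag a discontinuity.
import time

STALE_AFTER = 10.0   # seconds without an update before a chain is considered failed

class RedundancyController:
    def __init__(self) -> None:
        self.active = "A"
        self.last_update = {"A": time.monotonic(), "B": time.monotonic()}

    def on_update(self, chain: str) -> None:
        self.last_update[chain] = time.monotonic()

    def select_chain(self) -> tuple:
        """Return (chain to publish from, whether a discontinuity marker must be added)."""
        now = time.monotonic()
        if now - self.last_update[self.active] <= STALE_AFTER:
            return self.active, False                 # healthy: leave the manifest unchanged
        standby = "B" if self.active == "A" else "A"
        if now - self.last_update[standby] <= STALE_AFTER:
            self.active = standby                     # continue from the alternative path;
            return self.active, True                  # content may not be time-aligned
        return self.active, False                     # both stale: keep the last known chain
```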
As easily observed, this system remains operational in case either of the chains A or B fails. It also stays operational in case of a failure of one of the redundancy control units. However, what is important to note is that in case of a failure, the switch between videos in chains A and B may not be perfectly time-aligned. The output stream will be decodable, but it may exhibit a time-shift discontinuity in media content at the time of switch/fallback.
This comes as a result of operation in a distributed system with variable delays and
different processing resources that may be utilized along chains A and B. Naturally, with
some additional effort, the magnitude of such misalignment could be minimized, but that
will necessarily increase complexity and delay of the system. Perfect synchronization, in
principle, is one of the most challenging problems in the cloud.

Figure 3. An example of cloud-based redundant live streaming workflow.

The observed differences in delays, random access granularity, and also possible discontinuities in signals coming from cloud-based workflows are among the most critical factors that must be considered in planning the migration of signal processing functionality to the cloud.

3. Technologies Needed to Support Convergence


We next discuss measures that we believe must be taken to make cloud-based video
platforms more compatible with broadcast systems.

3.1. Cloud Contribution Links and Protocols


As mentioned earlier, cloud-based video platforms typically use RTMP [38] as a proto-
col for live ingest. It is an old, Flash-era protocol, performing internal demux and carriage
of audio and video data as separate streams sent over TCP [38]. It has no control over
latencies, alters PTS (Presentation Timestamp)/DTS (Decoding Timestamp) timestamps,
and makes it very difficult to carry SCTE-35 and other important metadata. In other words,
it is basically inadequate for integration with broadcast workflows.
Things can be done better. Nowadays most cloud systems can accept UDP traffic,
enabling the use of protocols such as RTP [33], RTP+SMPTE 2022-1 [34], RTP+SMPTE
2022-2 [35], Zixi [36], SRT (Secure, Reliable Transport) [39] or RIST (Reliable Internet Stream
Transport) [40]. Such protocols can carry unaltered transport streams from contribution
encoders or broadcast workflows to the cloud. Some of these protocols can also be used to
send information back from cloud to the broadcast systems.
What also ultimately helps with achieving reliable ingest (as well as all other exchanges
between broadcast on-prem systems and cloud) is the use of dedicated connections to cloud
datacenters. Such dedicated links can be established with most major cloud operators (see
e.g., AWS Direct Connect [81], or Azure ExpressRoute [82]).

3.2. Signal Processing


As also mentioned earlier, broadcast workflows may carry videos in progressive,
interlaced, or telecine formats. Field orders and pulldown patterns may also differ across
different sources. When such signals are then edited or combined together, this produces
output with changing temporal sampling type. If such videos are then encoded and
delivered as interlaced, they may still look ok on traditional TV sets. However, if one receives such interlace-encoded signals and then "naively" tries to convert them to progressive, the results can be disastrous, e.g., a wrong assumption about field order can make videos look jerky, a lack of detection of 3:2 pulldowns can produce periodic garbled frames, etc.
What also makes broadcast signals difficult to work with are accumulations of compression and conversion artifacts. The further down the chain the signal is obtained, the "noisier" it becomes.
To work with such complex signals, a proper processing stack is needed. One possible
architecture is illustrated in Figure 4. It includes a content analysis module, which performs
detection of segment cuts and identifies types of temporal sampling patterns and artifacts
in each segment. Such information, along with bitstream metadata is then passed to a chain
of filters, including artifact removal, temporal sampling conversion, color space conversion,
and scaling filters.

Figure 4. Decoding and format conversion chain needed to operate with cable/broadcast content.

The artifact removal filters, such as deblocking and denoising operations, are among the most basic techniques needed to work with broadcast signals. Deblocking filters are needed, e.g., in working with MPEG-2 encoded content, as the MPEG-2 codec [54] does not have in-loop filters and passes all such artifacts to the output. In Figure 5, we show how such artifacts look, along with the cleaned output produced by our deblocking filter. Denoising is also needed, especially when working with older (analog-converted) SD signals. Removal of low-magnitude noise not only makes the signal cleaner, but also makes the job of the subsequent encoder easier, enabling it to achieve better quality or a lower rate. We illustrate this effect in Figure 6.
The temporal sampling conversion filter in Figure 4 performs conversions between progressive, telecine, and interlace formats, as well as temporal interpolation and resampling operations. As discussed earlier, this filter is driven by information from the content analysis module. This way, e.g., a telecine segment can be properly converted back to progressive, an interlaced one properly deinterlaced, etc.
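The way the content analysis results drive this filter can be sketched as follows (the detection labels are hypothetical, and common FFmpeg filters are used here as stand-ins for the production-grade converters discussed below):

```python
# Sketch: per-segment selection of a temporal-sampling conversion based on analysis results.
def conversion_filter(detected: str) -> str:
    if detected == "telecine":
        return "fieldmatch,decimate"     # inverse telecine: recover the native 24p cadence
    if detected == "interlaced_tff":
        return "yadif=parity=tff"        # deinterlace, honoring the detected field order
    if detected == "interlaced_bff":
        return "yadif=parity=bff"
    return "null"                        # already progressive: pass through unchanged

# Per-segment decisions produced by the content analysis stage (illustrative values).
segments = [("00:00-00:30", "telecine"), ("00:30-01:00", "interlaced_bff")]
for time_range, kind in segments:
    print(time_range, "->", conversion_filter(kind))
```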

Figure 5. Example of MPEG2 blocking artifacts (a) and their removal by deblocking filter (b).

Figure 6. Example of using denoising filter for improving quality of final transcoded signal.

The quality of temporal sampling conversion operations is critical. For example, in Figure 7, we show the outputs of a basic deinterlacing filter (the FFmpeg "yadif" filter [83]) and of a more advanced optical-flow-based algorithm [84]. It can be seen that the basic deinterlacer cannot maintain continuity of field lines under high motion. Effects of this nature can be very prominent in sports broadcast content.

Figure 7. Comparison of outputs of basic (a) and advanced (b) deinterlacing filters.

The use of the subsequent filters in Figure 4, such as the color space conversion and scaling filters, is driven by possible differences in color spaces, SARs, and resolutions between the input and output formats.
All such conversion operations need to be state of the art, or at least comparable in quality with Teranex [85], Snell & Wilcox/Grass Valley KudosPro [86], and other standards converter boxes commonly used in post-production and broadcast.

3.3. Broadcast-Compliant Encoding


As discussed earlier, broadcast and streaming workflows use encoders that are sig-
nificantly different in their feature sets and tuning/stream constraints. Perhaps the most
extreme example of such differences is a statmux regime, where encoders are operating
under control of a multiplexer—an operating regime that has no parallel in streaming.
Consequently, if cloud workflows are intended to be used for producing streams
going back to broadcast distribution, the tuning or upgrade of existing cloud encoders
will be needed. For implementation of statmux, the multiplexer should also be natively
implemented in cloud and integrated with encoders.
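For reference, the core bandwidth-allocation idea behind statmux can be summarized with the following sketch (the allocation rule and numbers are purely illustrative; a real multiplexer would also enforce per-channel bounds and HRD constraints):

```python
# Sketch: statistical multiplexing as proportional bandwidth allocation per GOP.
def allocate(complexities: dict, total_kbps: int = 38000) -> dict:
    """Split a fixed multiplex bandwidth across channels in proportion to their
    current complexity estimates, so the sum of channel bitrates stays constant."""
    total_c = sum(complexities.values())
    return {ch: int(total_kbps * c / total_c) for ch, c in complexities.items()}

# Example: a demanding sports channel gets most of the pool for this GOP.
print(allocate({"ch1-sports": 4.0, "ch2-news": 1.0, "ch3-movie": 2.5}))
```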

3.4. Cloud Playout


The last, and most important technology that is needed to enable convergence is a
high-quality, cloud-based implementation of a playout system.
The design of such a system is a non-trivial task. As we discussed earlier, current
cloud-based video workflows typically use HLS/DASH-type segmented media formats,
causing them to operate with significant delays and random-access limitations. One cannot
build a broadcast-grade playout system based on such architecture. Even so-called ultra-
low-delay versions of HLS, DASH, or CMAF [87–89] are inadequate. For most master
control operations, such as previews, non-linear editing, switching, etc., frame-level access
accuracy is an essential requirement.
In Figure 8, we show one possible cloud playout system architecture that can be suggested. To enable frame-level random access, this system uses an internal Intra-only mezzanine format. Such a format could use any image or video codec operating in Intra-coding mode, along with PCM (Pulse-Code Modulation) audio, and an index enabling access to each frame. Both input live feeds and pre-recorded content are then converted into such an internal mezzanine format and placed on cloud storage. All subsequent operations, such as previews, non-linear editing, as well as selection and mixing of content producing the channel outputs, are done by accessing media in such mezzanine format. The final stream selections, addition of logos, transitions, etc., are done by "stream processor" elements.
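A minimal sketch of producing such a mezzanine and its frame index is given below (the codec choice, container, and index layout are our own illustrative assumptions; the paper does not prescribe a particular format):

```python
# Sketch: creating an Intra-only mezzanine with PCM audio, plus a per-frame index.
import json
import subprocess

def to_intra_mezzanine(src: str, dst: str = "mezzanine.mov") -> None:
    # GOP length of 1 makes every frame an intra picture, so any frame can be
    # decoded (for previews, edits, switches) without reference to its neighbors.
    subprocess.run([
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-g", "1", "-crf", "16",
        "-c:a", "pcm_s16le",
        dst,
    ], check=True)

def build_frame_index(path: str) -> list:
    # Per-frame timestamps and byte positions, enabling frame-accurate random access.
    probe = subprocess.run([
        "ffprobe", "-v", "quiet", "-select_streams", "v:0",
        "-show_frames", "-show_entries", "frame=pts_time,pkt_pos",
        "-of", "json", path,
    ], capture_output=True, text=True, check=True)
    return json.loads(probe.stdout)["frames"]
```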

Figure 8. Example architecture of cloud playout system.



In addition to enabling frame-level-accurate processing operations, the use of the Intra-only mezzanine format also minimizes the impact of possible failures in the system. All signal processing blocks shown in Figure 8 can be run in a redundant fashion, with checks and switches added to ensure frame-accurate fault-tolerance.

4. Transitioning Broadcast to Cloud


In this section, we discuss possible ways in which broadcast and cloud-based video workflows may evolve in the future. We offer three examples of possible hybrid architectures, with different degrees of migration of processing to the cloud.

4.1. Cloud-Based OTT Systems


In Figure 9, we show a hybrid architecture where cloud is used only to implement
OTT/streaming services. Everything else stays on prem. This is the easiest possible
example of integration.

Figure 9. Hybrid architecture with Over the Top (OTT)/streaming workflow offloaded to cloud.

To route streams to the cloud, the broadcast workflow produces contribution streams, one for each channel, and then sends them over IP (e.g., using RTP+FEC or SRT) to the cloud.
The cloud platform receives such streams, performs necessary conversions, transcodes
them, and distributes them over CDNs to clients. As shown in Figure 9, the cloud platform
may also be used to implement DVR or time-shift TV-type functionality, DRM protection,
SSAI, analytics, etc. All standard techniques for optimizing multi-format/multi-screen
streaming delivery (dynamic packaging, dynamic manifest generation, optimized profiles,
etc. [23]) can also be employed in this case.
To make such a system work well, the main technologies/improvements that are needed include:
• reliable real-time ingest, e.g., using RTP+FEC, SRT, RIST, or Zixi-type protocols and/or a dedicated link, such as AWS Direct Connect [81];
• improvements in signal processing stack—achieving artifact-free conversion of broad-
cast formats to ones used in OTT/streaming;
• improvements in metadata handling, including full pass-through of SCTE-35 and
compliant implementation of SSAI and CSAI functionality based on it.
Generally, however, hybrid architectures of this kind have already been deployed and proven to be effective in practice. Some of the above-mentioned gap-bridging technologies have also been implemented. For instance, cloud ingest using RTP, SMPTE 2022-1, SMPTE
2022-2, or SRT, improvements in support of SCTE-35 for ad-insertions, and improvements in the encoding stack were among recent updates to the Brightcove VideoCloud system [90].

4.2. Cloud-Based Ingest, Playout, and OTT Delivery System


In Figure 10, we show a more advanced architecture, in which not only OTT delivery,
but also ingest, media asset management, and playout are offloaded to cloud.

Figure 10. Hybrid architecture with ingest, playout and OTT/streaming workflow offloaded to cloud.

As can be immediately grasped, the move of playout functionality to the cloud also enables the use of the cloud platform for ingest. This is particularly helpful on a
global scale, as major cloud systems have data centers in all major regions, and so the
contribution link is only needed to deliver content to the nearest local datacenter. Media
asset management also naturally moves to cloud in this case.
With cloud-based playout, there will still be a need for a control room with monitors, switch panels, etc., but it will all be reduced to the function of a thin client. All storage,
redundancy management, media processing, etc., will happen in cloud, significantly
reducing required investments in hardware and operating costs.
In the system depicted in Figure 10, the broadcast distribution encoding, multiplexer
and all subsequent operations stay on prem without any changes. This way, broadcasters
can operate all current ATSC equipment until ATSC 3.0 matures or there is some other
serious need to replace it. This is another hybrid cloud + on-prem architecture, which we
believe will make sense in practice.
To make such a system work, in addition to all the improvements mentioned earlier, what is further needed is:
• cloud-based broadcast-grade playout system,
• direct link connection to cloud ensuring low latency monitoring and real-time re-
sponses in operation of cloud playout system,
• improvements in cloud-run encoders, specifically those acting as contribution transcoders
sending broadcast-compliant streams back to the on-prem system.

4.3. Cloud-Based Broadcast and OTT Delivery System


Finally, in Figure 11, we show an architecture, where pretty much all signal processing,
transcoding, and multiplexing operations are moved to cloud.

Figure 11. Hybrid cloud-based broadcast + OTT system architecture.

In addition to running playout, this system also runs the broadcast transcoders and the multiplexer in the cloud. The final multiplex TS is then sent back to the on-prem distribution system, but mostly only to be relayed to modulators and amplifiers or (via IP or ASI) to next-tier stations or MVPD headends.
To make this system work, in addition to all the improvements mentioned earlier, what is further needed is:
• broadcast-grade transcoders and a multiplexer natively implemented in the cloud;
• this includes implementation of statmux capability, generation and insertion of all related program and system information, possible addition of datacast service capability, etc.
This architecture is indeed an extreme example, where pretty much all data- and processing-intensive operations are migrated to the cloud. It is the most technically challenging to implement, but it is also the most promising, as it enables the best utilization of the cloud and of all the benefits that it brings.

4.4. Comparison of the Proposed Systems


In Table 4 we present a comparison of the features of the proposed hybrid architectures relative to the pure on-prem broadcast + OTT delivery solution.
The last row of this table offers estimates of relative costs of physical equipment
required by each system. Such estimates are based on the assumption that the OTT chain
delivers about the same number of channels and handles a comparable volume of delivered
content as the broadcast chain. It can be observed that even the simplest combined solution
(Architecture 1) could potentially eliminate up to 30–50% in equipment costs. With more
evolved architectures the investment in physical equipment can be reduced further down
to about 10–20%.

Table 4. Comparison of proposed architectures.

Characteristic/Components | On Prem Solution | Architecture 1 (Section 4.1) | Architecture 2 (Section 4.2) | Architecture 3 (Section 4.3)
Ingest from remote and field production | On prem | On prem | Cloud | Cloud
Playout | On prem | On prem | Cloud | Cloud
Broadcast distribution transcoding | On prem | On prem | On prem | Cloud
Broadcast multiplexers | On prem | On prem | On prem | Cloud
QAMs (Quadrature Amplitude Modulation) and links to distribution | On prem | On prem | On prem | On prem
OTT transcoding | On prem | Cloud | Cloud | Cloud
OTT packaging, DRM, and SSAI | On prem | Cloud | Cloud | Cloud
OTT origin servers | On prem | Cloud | Cloud | Cloud
Fault-tolerance and redundancy support | On prem | On prem for broadcast chain, cloud for OTT | Cloud | Cloud
% of physical equipment required | 100% | 50–75% | 25–50% | 10–20%

5. Scale and Performance of Cloud-Based Media Delivery Systems


In this section, we present a set of field results demonstrating scalability and perfor-
mance of today’s cloud-based OTT media delivery systems. As examples, we will take
data observed for several randomly chosen media distributors using the Brightcove OVP
system [90]. The data were captured during the period of 19 December–22 December 2020.
First, in Figure 12 we show the numbers of concurrent live streaming encoders run by different users of the cloud-based OVP. Such numbers can be very different, from single encoders/channels for some users to several hundred for others. The overall number of live transcoders run by this set of OVP users was close to 1000. This compares well to the typical numbers of channels supported in today's broadcast systems.

Figure 12. The numbers of concurrent live encoders run by different users of cloud-based OVP.

Figure 12 also shows that the numbers of concurrent live encoders in the cloud-based system may change over time. There were periods when the system scaled them up, and there were periods when it scaled them down. Such fluctuations are generally related to the creation of new or termination of existing live streaming events, to the management of redundant streams, or to the management of a pool of "available" encoders that were instantiated speculatively to enable new streams to be launched and streamed right away, without any additional delays.

The dynamics of creation of new live streams in this system are further illustrated in Figure 13. Here, we also observed high variation: anywhere from a single job to almost 100 jobs could be created at any instant of time. This indicates that the composition of work processed by this system includes not only constantly running 24/7 channels, but also a good volume of special events (e.g., concerts, sports events, etc.), triggering the creation of new live delivery chains. Cloud-based architectures are highly suitable for handling variable workloads of this kind.

Figure 13. The numbers of new live streams created at each instant of time.

Next, in Figure 14 we show CDN bandwidth statistics reported for the same set of users of the cloud-based delivery platform. The graphs show the amounts of data delivered at the CDN edge, the data pulled from origin servers, and the amounts of CDN midgress traffic. Here, we also observed significant variations. Around 12/20 8:00 to 11:00 (US Eastern Standard Time) we saw a cascade of two significant spikes in traffic. The volume of data increased almost 10× during this period. However, we also noticed that the amount of origin traffic during the same period did not increase much. This illustrates that CDNs managed to absorb these spikes in traffic successfully. More generally, however, the amount of origin traffic may also fluctuate significantly. To support such a variable load, cloud-based delivery platforms provide means for auto-scaling and balancing of the load on origin servers.

Figure 14. CDN-reported bandwidth statistics.

Finally, in Figure 15, we also plot the numbers of concurrent players that are pulling
live streaming content. It can be observed that this figure looks somewhat similar to
CDN-reported statistics. Around 12/20 8:00 to 11:00 (US eastern standard time) we also
saw a cascade of two spikes, where almost 1 M concurrent viewers joined. This explains
spikes in CDN traffic noted earlier. Such spikes are pretty common for popular events, and
the system is designed to handle them efficiently.

Figure 15. The numbers of concurrent streaming players pulling the content.

Naturally, the above statistics capture only a small example set of use cases of today's existing cloud-based OTT delivery platforms. However, they show that such platforms are operational, mass deployed, and capable of handling volumes of transcoding and media delivery streams comparable to the ones used in broadcast distribution systems.
As also shown, the numbers of concurrent live events/streams, transcoders, transmuxers, origins, and other elements of the delivery chain can be highly variable in practice. Cloud-based deployments are ideally suited for handling such variability in resource requirements in a cost-efficient manner. Transitioning additional components of broadcast workflows to the cloud will lead to additional reductions in investments in hardware, will simplify management and maintenance, and will make the overall design of such systems much more flexible, extensible, and future-proof.

6. Conclusions
In this paper, we have studied the differences between on-premise broadcast and
cloud-based online video delivery workflows and identified means needed for bridging the
gaps between them. Such means include: improvements in cloud ingest, signal processing
stacks, transcoder capabilities, and most importantly, a broadcast-grade cloud playout
system. To implement a cloud playout system, we have suggested an architecture employing an intra-only mezzanine format and associated processing blocks that can be easily replicated and operated in a fault-tolerant fashion. We finally considered possible evolutions of broadcast and cloud-based video systems and suggested several possible hybrid architectures, with different degrees of offloading of processing to the cloud, that are likely to emerge in the future. Examples of operational statistics observed in today's mass-deployed
cloud-based media delivery systems were also shown. These statistics confirm that such
systems can indeed handle the load required for transitioning of additional elements of
broadcast systems to cloud.

Author Contributions: Conceptualization, Y.R. and J.C.; methodology, Y.R., J.C. and B.Z.; software,
J.C. and B.Z.; validation, Y.R., J.C. and B.Z.; formal analysis, Y.R.; investigation, Y.R. and J.C.;
resources, Y.R., J.C. and B.Z.; data curation, B.Z. and Y.R.; writing—original draft preparation, Y.R.;
writing—review and editing, J.C. and B.Z.; visualization, Y.R. and B.Z.; supervision, Y.R.; project
administration, Y.R.; funding acquisition, Y.R.; All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Pizzi, S.; Jones, G. A Broadcast Engineering Tutorial for Non-Engineers, 4th ed.; National Association Broadcasters (NAB): Chicago,
IL, USA, 2014; 354p, ISBN 13:978-0415733380/10:0415733380.
2. Luther, A.; Inglis, A. Video Engineering, 3rd ed.; McGraw-Hill: New York, NY, USA, 1999; 549p.
3. Wu, D.; Hou, Y.T.; Zhu, W.; Zhang, Y.; Peha, J.M. Streaming video over the internet: Approaches and directions. IEEE Trans.
Circuits Syst. Video Technol. 2001, 11, 282–300.
4. Conklin, G.J.; Greenbaum, G.S.; Lillevold, K.O.; Lippman, A.F.; Reznik, Y.A. Video coding for streaming media delivery on the
internet. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 269–281. [CrossRef]
5. Pantos, R.; May, W. HTTP Live Streaming, RFC 8216. IETF. Available online: https://tools.ietf.org/html/rfc8216 (accessed on 1 November 2020).
6. ISO/IEC 23009-1:2014. Information Technology—Dynamic Adaptive Streaming Over HTTP (DASH)—Part 1: Media Presentation Description and Segment Formats. ISO/IEC, 2014. Available online: https://www.iso.org/about-us.html (accessed on 1 November 2020).
7. Microsoft Smooth Streaming. Available online: https://www.iis.net/downloads/microsoft/smooth-streaming (accessed on 1 November 2020).
8. Evens, T. Co-opetition of TV broadcasters in online video markets: A winning strategy? Int. J. Digit. Telev. 2014, 5, 61–74.
[CrossRef]
9. Nielsen Holdings Plc, Total Audience Report. 2020. Available online: https://www.nielsen.com/us/en/client-learning/tv/nielsen-total-audience-report-february-2020/ (accessed on 1 November 2020).
10. Sandvine. The Global Internet Phenomena Report. 2019. Available online: https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downloads/Internet%20Phenomena/Internet%20Phenomena%20Report%20Q32019%2020190910.pdf (accessed on 1 November 2020).
11. Frost & Sullivan. Analysis of the Global Online Video Platforms Market. Frost & Sullivan, 2014. Available online: https://store.frost.com/analysis-of-the-global-online-video-platforms-market.html (accessed on 1 November 2020).
12. ETSI TS 102 796. Hybrid Broadcast Broadband TV. ETSI, 2016. Available online: https://www.etsi.org/deliver/etsi_ts/102700_102799/102796/01.04.01_60/ts_102796v010401p.pdf (accessed on 1 November 2020).
13. ATSC A/331:2020. Signaling, Delivery, Synchronization, and Error Protection. ATSC, 2020. Available online: https://www.atsc.org/atsc-documents/3312017-signaling-delivery-synchronization-error-protection/ (accessed on 1 November 2020).
14. Stockhammer, T.; Sodagar, I.; Zia, W.; Deshpande, S.; Oh, S.; Champel, M. Dash in ATSC 3.0: Bridging the gap between OTT
and broadcast. In Proceedings of the IET Conference Proceedings, IBC 2016 Conference, Amsterdam, The Netherlands, 8–12
September 2016; Volume 1, p. 24.
15. AWS Elemental. Video Processing and Delivery Moves to the Cloud, e-book. 2018. Available online: https://www.elemental.com/resources/white-papers/e-book-video-processing-delivery-moves-cloud/ (accessed on 1 November 2020).
16. Fautier, T. Cloud Technology Drives Superior Video Encoding; In SMPTE 2019; SMPTE: Los Angeles, CA, USA, 2019; pp. 1–9.
17. ISO/IEC 13818-1:2019. Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Systems—Part 1: Systems. ISO/IEC, 2019. Available online: https://www.iso.org/standard/75928.html (accessed on 1 November 2020).
18. ATSC A/65B. Program and System Information Protocol for Terrestrial Broadcast and Cable (PSIP). ATSC, 2013. Available online: https://www.atsc.org/wp-content/uploads/2015/03/Program-System-Information-Protocol-for-Terrestrial-Broadcast-and-Cable.pdf (accessed on 1 November 2020).
19. ANSI/SCTE 35 2007. Digital Program Insertion Cueing Message for Cable. SCTE, 2007. Available online: https://webstore.ansi.org/standards/scte/ansiscte352007 (accessed on 1 November 2020).
20. ANSI/SCTE 67 2010. Recommended Practice for SCTE 35 Digital Program Insertion Cueing Message for Cable. SCTE, 2010. Available online: https://webstore.ansi.org/standards/scte/ansiscte672010 (accessed on 1 November 2020).
21. Lechner, B.J.; Chernock, R.; Eyer, M.; Goldberg, A.; Goldman, M. The ATSC transport layer, including Program and System
Information (PSIP). Proc. IEEE 2006, 94, 77–101. [CrossRef]
22. Reznik, Y.; Lillevold, K.; Jagannath, A.; Greer, J.; Corley, J. Optimal design of encoding profiles for ABR streaming. In Proceedings
of the Packet Video Workshop, Amsterdam, The Netherlands, 12–15 June 2018. [CrossRef]
23. Reznik, Y.; Li, X.; Lillevold, K.; Peck, R.; Shutt, T.; Marinov, R. Optimizing Mass-Scale Multi-Screen Video Delivery. In Proceedings
of the 2019 NAB Broadcast Engineering and Information Technology Conference, Las Vegas, NV, USA, 6–11 April 2019.
24. Davidson, G.A.; Isnardi, M.A.; Fielder, L.D.; Goldman, M.S.; Todd, C.C. ATSC Video and Audio Coding. Proc. IEEE 2006,
94, 60–76. [CrossRef]
25. ATSC A/53 Part 1: 2013. ATSC Digital Television Standard: Part 1—Digital Television System. ATSC, 2013. Available online:
https://www.atsc.org/wp-content/uploads/2015/03/A53-Part-1-2013-1.pdf (accessed on 1 November 2020).
26. ATSC A/53 Part 4:2009. ATSC Digital Television Standard: Part 4—MPEG-2 Video System Characteristics. ATSC, 2009. Available
online: https://www.atsc.org/wp-content/uploads/2015/03/a_53-Part-4-2009-1.pdf (accessed on 1 November 2020).
27. ATSC A/54A. Recommended Practice: Guide to the Use of the ATSC Digital Television Standard. ATSC, 2003. Available online:
https://www.atsc.org/wp-content/uploads/2015/03/a_54a_with_corr_1.pdf (accessed on 1 November 2020).
28. ATSC A/72 Part 1:2015. Video System Characteristics of AVC in the ATSC Digital Television System. ATSC, 2015. Available
online: https://www.atsc.org/wp-content/uploads/2015/03/A72-Part-1-2015-1.pdf (accessed on 1 November 2020).
29. ATSC A/72 Part 2:2014. AVC Video Transport Subsystem Characteristics. ATSC, 2014. Available online: https://www.atsc.org/
wp-content/uploads/2015/03/A72-Part-2-2014-1.pdf (accessed on 1 November 2020).
30. ETSI TS 101 154. Digital Video Broadcasting (DVB): Implementation Guidelines for the Use of MPEG-2 Systems, Video and
Audio in Satellite, Cable and Terrestrial Broadcasting Applications. Doc. ETSI TS 101 154 V1.7.1, Annex B. 2019. Available
online: https://standards.iteh.ai/catalog/standards/etsi/af36a167-779e-4239-b5a7-89356c6c2dde/etsi-ts-101-154-v2.6.1-2019-
09 (accessed on 1 November 2020).
31. ANSI/SCTE 43 2015. Digital Video Systems Characteristics Standard for Cable Television. SCTE, 2015. Available online:
https://webstore.ansi.org/standards/scte/ansiscte432015 (accessed on 1 November 2020).
32. ANSI/SCTE 128 2010-a. AVC Video Systems and Transport Constraints for Cable Television. SCTE, 2010. Available online:
https://webstore.ansi.org/standards/scte/ansiscte1282010 (accessed on 1 November 2020).
33. Schulzrinne, H.; Casner, S.; Frederick, R.; Jacobson, V. RTP: A Transport Protocol for Real-Time Applications. RFC 1889. IETF,
1996. Available online: https://tools.ietf.org/html/rfc1889 (accessed on 1 November 2020).
34. SMPTE ST 2022-1:2007. Forward Error Correction for Real-Time Video/Audio Transport over IP Networks. ST 2022-1. SMPTE,
2007. Available online: https://ieeexplore.ieee.org/document/7291470/versions#versions (accessed on 1 November 2020).
35. SMPTE ST 2022-2:2007. Unidirectional Transport of Constant Bit Rate MPEG-2 Transport Streams on IP Networks. ST 2022-2;
SMPTE, 2007. Available online: https://ieeexplore.ieee.org/document/7291740 (accessed on 1 November 2020).
36. Zixi, LLC. Streaming Video over the Internet and Zixi. 2015. Available online: http://www.zixi.com/PDFs/Adaptive-Bit-Rate-
Streaming-and-Final.aspx (accessed on 1 November 2020).
37. OC-SP-MEZZANINE-C01-161026. Mezzanine Encoding Specification. Cable Television Laboratories, Inc., 2016. Available
online: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=1d76e930-6d98-4de3-89ee-
9d0fb4b5292a (accessed on 1 November 2020).
38. Adobe Systems, Real-Time Messaging Protocol (RTMP) Specification. Version 1.0. 2012. Available online: https://www.adobe.
com/devnet/rtmp.html (accessed on 1 November 2020).
39. Haivision. Secure Reliable Transport (SRT). 2019. Available online: https://github.com/Haivision/srt (accessed on
1 November 2020).
40. VSF TR-06-1. Reliable Internet Stream Transport (RIST) Protocol Specification—Simple Profile. Video Services Forum.
2018. Available online: http://vsf.tv/download/technical_recommendations/VSF_TR-06-1_2018_10_17.pdf (accessed on
1 November 2020).
41. OC-SP-CEP3.0-I05-151104. Content Encoding Profiles 3.0 Specification. Cable Television Laboratories, Inc., 2015. Available
online: https://community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=c7eb769e-1020-402c-b2f2
-d839ee532945 (accessed on 1 November 2020).
42. Ozer, J. Encoding for Multiple Devices. Streaming Media Magazine. 2013. Available online: http://www.streamingmedia.com/
Articles/ReadArticle.aspx?ArticleID=88179&fb_comment_id=220580544752826_937649 (accessed on 1 November 2020).
43. Ozer, J. Encoding for Multiple-Screen Delivery. Streaming Media East. 2013. Available online: https://www.streamingmediablog.
com/wp-content/uploads/2013/07/2013SMEast-Workshop-Encoding.pdf (accessed on 1 November 2020).
44. Apple. HLS Authoring Specification for Apple Devices. 2019. Available online: https://developer.apple.com/documentation/
http_live_streaming/hls_authoring_specification_for_apple_devices (accessed on 1 November 2020).
45. DASH-IF. DASH-IF Interoperability Points, v4.3. 2018. Available online: https://dashif.org/docs/DASH-IF-IOP-v4.3.pdf
(accessed on 1 November 2020).
46. ETSI TS 103 285 v1.2.1. Digital Video Broadcasting (DVB); MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services
Over IP Based Networks. 2020. Available online: https://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.03.01_60/ts_
103285v010301p.pdf (accessed on 1 November 2020).
47. SMPTE RP 145. Color Monitor Colorimetry. SMPTE, 1987. Available online: https://standards.globalspec.com/std/1284848/
smpte-rp-145 (accessed on 1 November 2020).
48. SMPTE 170M. Composite Analog Video Signal—NTSC for Studio Applications. SMPTE, 1994. Available online: https://
standards.globalspec.com/std/892300/SMPTE%20ST%20170M (accessed on 1 November 2020).
49. EBU Tech. 3213-E. E.B.U. Standard for Chromaticity Tolerances for Studio Monitors. EBU, 1975. Available online: https:
//tech.ebu.ch/docs/tech/tech3213.pdf (accessed on 1 November 2020).
50. ITU-R Recommendation BT.601. Studio Encoding Parameters of Digital Television For standard 4:3 and Wide Screen 16:9 Aspect
Ratios. ITU-R, 2011. Available online: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.601-7-201103-I!!PDF-E.pdf
(accessed on 1 November 2020).
51. ITU-R Recommendation BT.709. Parameter Values for the HDTV Standards for Production and International Programme
Exchange. ITU-R. ITU-R, 2015. Available online: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!
!PDF-E.pdf (accessed on 1 November 2020).
52. IEC 61966-2-1:1999. Multimedia Systems and Equipment—Colour Measurement and Management—Part 2-1: Colour
Management—Default RGB Colour Space—sRGB. IEC, 1999. Available online: https://webstore.iec.ch/publication/6169
(accessed on 1 November 2020).
53. Brydon, N. Saving Bits—The Impact of MCTF Enhanced Noise Reduction. SMPTE J. 2002, 111, 23–28. [CrossRef]
54. ISO/IEC 13818-2:2013. Information Technology—Generic Coding of Moving Pictures and Associated Audio Information—Part 2:
Video. ISO/IEC, 2013. Available online: https://www.iso.org/standard/61152.html (accessed on 1 November 2020).
55. ISO/IEC 14496-10:2003. Information Technology—Coding of Audio-Visual Objects—Part 10: Advanced Video Coding. ISO/IEC,
2003. Available online: https://www.iso.org/standard/37729.html (accessed on 1 November 2020).
56. ISO/IEC 23008-2:2013. Information Technology—High Efficiency Coding and Media Delivery in Heterogeneous Environments—
Part 2: High Efficiency Video Coding. ISO/IEC, 2013. Available online: https://www.iso.org/standard/35424.html (accessed on
1 November 2020).
57. AOM AV1. AV1 Bitstream & Decoding Process Specification, v1.0.0. Alliance for Open Media. 2019. Available online: https:
//aomediacodec.github.io/av1-spec/av1-spec.pdf (accessed on 1 November 2020).
58. CTA 5001. Web Application Video Ecosystem—Content Specification. CTA WAVE, 2018. Available online: https://cdn.cta.tech/
cta/media/media/resources/standards/pdfs/cta-5001-final_v2_pdf.pdf (accessed on 1 November 2020).
59. Perkins, M.; Arnstein, D. Statistical multiplexing of multiple MPEG-2 video programs in a single channel. SMPTE J. 1995,
4, 596–599. [CrossRef]
60. Boroczky, L.; Ngai, A.Y.; Westermann, E.F. Statistical multiplexing using MPEG-2 video encoders. IBM J. Res. Dev. 1999,
43, 511–520. [CrossRef]
61. CEA-608-B. Line 21 Data Services. Consumer Electronics Association, 2008. Available online: https://webstore.ansi.org/standards/
cea/cea6082008r2014ansi (accessed on 1 November 2020).
62. CEA-708-B. Digital Television (DTV) Closed Captioning. Consumer Electronics Association, 2008. Available online: https://www.
scribd.com/document/70239447/CEA-708-B (accessed on 1 November 2020).
63. CEA CEB16. Active Format Description (AFD) & Bar Data Recommended Practice. CEA, 2006. Available online: https:
//webstore.ansi.org/standards/cea/ceaceb162012 (accessed on 1 November 2020).
64. SMPTE 2016-1. Standard for Television—Format for Active Format Description and Bar Data. SMPTE, 2007. Available online:
https://www.techstreet.com/standards/smpte-2016-1-2009?product_id=1664006 (accessed on 1 November 2020).
65. OC-SP-EP-I01-130118. Encoder Boundary Point Specification. Cable Television Laboratories, Inc., 2013. Available online: https://
community.cablelabs.com/wiki/plugins/servlet/cablelabs/alfresco/download?id=2a4f4cc6-3763-40b9-9ace-7de923559187 (ac-
cessed on 1 November 2020).
66. FairPlay Streaming. Available online: https://developer.apple.com/streaming/fps/ (accessed on 1 November 2020).
67. PlayReady. Available online: https://www.microsoft.com/playready/overview/ (accessed on 1 November 2020).
68. Widevine. Available online: https://www.widevine.com/solutions/widevine-drm (accessed on 1 November 2020).
69. ISO/IEC 14496-12:2015. Information Technology—Coding of Audio-Visual Objects—Part 12: ISO Base Media File Format. 2015.
Available online: https://www.iso.org/standard/68960.html (accessed on 1 November 2020).
70. ISO/IEC 23000-19:2018. Information Technology—Coding of Audio-Visual Objects—Part 19: Common Media Application
Format (CMAF) for Segmented Media. 2018. Available online: https://www.iso.org/standard/71975.html (accessed on
1 November 2020).
71. ID3 Tagging System. Available online: http://www.id3.org/id3v2.3.0 (accessed on 1 November 2020).
72. W3C WebVTT. The Web Video Text Tracks. W3C, 2018. Available online: http://dev.w3.org/html5/webvtt/ (accessed on
1 November 2020).
73. W3C TTML1. Timed Text Markup Language 1. W3C, 2019. Available online: https://www.w3.org/TR/2018/REC-ttml1-20181
108/ (accessed on 1 November 2020).
74. W3C IMSC1. TTML Profiles for Internet Media Subtitles and Captions 1.0. W3C, 2015. Available online: https://dvcs.w3.org/
hg/ttml/raw-file/tip/ttml-ww-profiles/ttml-ww-profiles.html (accessed on 1 November 2020).
75. Apple. Incorporating Ads into A Playlist. Available online: https://developer.apple.com/documentation/http_live_streaming/
example_playlists_for_http_live_streaming/incorporating_ads_into_a_playlist (accessed on 1 November 2020).
76. ANSI/SCTE 30 2017. Digital Program Insertion Splicing API. SCTE, 2017. Available online: https://webstore.ansi.org/standards/
scte/ansiscte302017 (accessed on 1 November 2020).
77. ANSI/SCTE 172 2011. Constraints on AVC Video Coding for Digital Program Insertion. SCTE, 2011. Available online: https:
//webstore.ansi.org/preview-pages/SCTE/preview_SCTE+172+2011.pdf (accessed on 1 November 2020).
78. SMPTE 259:2008. SMPTE Standard—For Television—SDTV—Digital Signal/Data—Serial Digital Interface. SMPTE, 2008.
Available online: https://ieeexplore.ieee.org/document/7292109 (accessed on 1 November 2020).
79. SMPTE 292-1:2018. SMPTE Standard—1.5 Gb/s Signal/Data Serial Interface. SMPTE, 2018. Available online: https://ieeexplore.
ieee.org/document/8353270 (accessed on 1 November 2020).
80. SMPTE 2110-20:2017. SMPTE Standard—Professional Media over Managed IP Networks: Uncompressed Active Video. SMPTE,
2017. Available online: https://ieeexplore.ieee.org/document/8167389 (accessed on 1 November 2020).
81. AWS Direct Connect. Available online: https://aws.amazon.com/directconnect/ (accessed on 1 November 2020).
82. Azure ExpressRoute. Available online: https://azure.microsoft.com/en-us/services/expressroute/ (accessed on 1 November 2020).
83. FFMPEG Filter Documentation. Available online: https://ffmpeg.org/ffmpeg-filters.html (accessed on 1 November 2020).
84. Vanam, R.; Reznik, Y. Temporal Sampling Conversion using Bi-directional Optical Flows with Dual Regularization. In Proceedings
of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 25–28 October 2020.
85. Teranex Standards Converters. Available online: https://www.blackmagicdesign.com/products/teranex (accessed on
1 November 2020).
86. Grass Valley KudosPro Converters. Available online: https://www.grassvalley.com/products/kudospro/ (accessed on 1
November 2020).
87. Apple. Protocol Extension for Low-Latency HLS (Preliminary Specification). Apple Inc., 2020. Available online: https://developer.
apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification (accessed
on 1 November 2020).
88. DASH-IF. DASH-IF ATSC3.0 IOP. 2012. Available online: https://dashif.org/docs/DASH-IF-IOP-for-ATSC3-0-v1.1.pdf (accessed
on 1 November 2020).
89. Law, W. Ultra Low Latency with CMAF. Available online: https://www.akamai.com/us/en/multimedia/documents/white-
paper/low-latency-streaming-cmaf-whitepaper.pdf (accessed on 1 November 2020).
90. Brightcove VideoCloud System. Available online: https://www.brightcove.com/en/online-video-platform (accessed on
1 November 2020).