
RFC 6184
RTP Payload Format for H.264 Video May 2011
multiplex environments. Annex B of H.264 defines an encapsulation
process to transmit such NALUs over bytestream-oriented networks. In
the scope of this memo, Annex B is not relevant.
Internally, the NAL uses NAL units. A NAL unit consists of a one-
byte header and the payload byte string. The header indicates the
type of the NAL unit, the (potential) presence of bit errors or
syntax violations in the NAL unit payload, and information regarding
the relative importance of the NAL unit for the decoding process.
This RTP payload specification is designed to be unaware of the bit
string in the NAL unit payload.
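
The one-byte header described above can be unpacked with a few bit
operations. The following is a minimal sketch, not a normative
definition. The field names (forbidden_zero_bit, nal_ref_idc,
nal_unit_type) and the 1/2/5-bit layout are taken from the H.264 NAL
unit syntax; the NalHeader type and the parse_nal_header function are
hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Illustrative container for the three NAL unit header fields."""
    forbidden_zero_bit: int  # 1 bit: set when bit errors or syntax violations may be present
    nal_ref_idc: int         # 2 bits: relative importance for the decoding process
    nal_unit_type: int       # 5 bits: type of the NAL unit


def parse_nal_header(nal_unit: bytes) -> NalHeader:
    """Split the first byte of a NAL unit into its three header fields."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    first = nal_unit[0]
    return NalHeader(
        forbidden_zero_bit=(first >> 7) & 0x01,
        nal_ref_idc=(first >> 5) & 0x03,
        nal_unit_type=first & 0x1F,
    )


# 0x67 = 0b0_11_00111: F = 0, NRI = 3, type = 7.
# The bytes after the header are arbitrary sample payload and are not parsed.
header = parse_nal_header(bytes([0x67, 0x42, 0x00, 0x1E]))
print(header)
```

Only the first byte is inspected; the rest of the NAL unit is treated
as an opaque byte string, in line with the payload format being
unaware of the NAL unit payload.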
One of the main properties of H.264 is the complete decoupling of the
transmission time, the decoding time, and the sampling or
presentation time of slices and pictures. The decoding process
specified in H.264 is unaware of time, and the H.264 syntax does not
carry information such as the number of skipped frames (as is common
in the form of the Temporal Reference in earlier video compression
standards). Also, there are NAL units that affect many pictures and
that are, therefore, inherently timeless. For this reason, the
handling of the RTP timestamp requires some special considerations
for NAL units for which the sampling or presentation time is not
defined or, at transmission time, is unknown.
1.2. Parameter Set Concept
One very fundamental design concept of H.264 is to generate self-
contained packets, to make mechanisms such as the header duplication
of RFC 4629 [11] or MPEG-4 Visual's Header Extension Code (HEC) [12]
unnecessary. This was achieved by decoupling information relevant to
more than one slice from the media stream. This higher-layer meta
information should be sent reliably, asynchronously, and in advance
of the RTP packet stream that contains the slice packets.
(Provisions for sending this information in-band are also available
for applications that do not have an out-of-band transport channel
appropriate for the purpose.) The combination of the higher-level
parameters is called a parameter set. The H.264 specification
includes two types of parameter sets: sequence parameter sets and
picture parameter sets. An active sequence parameter set remains
unchanged throughout a coded video sequence, and an active picture
parameter set remains unchanged within a coded picture. The sequence
and picture parameter set structures contain information such as
picture size, optional coding modes employed, and the
macroblock-to-slice-group map.
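
The decoupling described above can be pictured as two lookup tables
that are filled before slice data is decoded. The sketch below is
illustrative only. It assumes the referencing scheme of the H.264
syntax, in which a slice names a picture parameter set by identifier
and that picture parameter set names a sequence parameter set. The
class names, the reduced set of fields, and the resolve helper are
hypothetical; real parameter sets carry many more syntax elements.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SequenceParameterSet:
    sps_id: int
    width: int   # picture size, simplified to pixels for this sketch
    height: int


@dataclass(frozen=True)
class PictureParameterSet:
    pps_id: int
    sps_id: int            # sequence parameter set this one refers to
    num_slice_groups: int  # stand-in for the slice group map information


class ParameterSetStore:
    """Holds parameter sets that arrived ahead of the slice packets."""

    def __init__(self) -> None:
        self._sps: Dict[int, SequenceParameterSet] = {}
        self._pps: Dict[int, PictureParameterSet] = {}

    def add_sps(self, sps: SequenceParameterSet) -> None:
        self._sps[sps.sps_id] = sps

    def add_pps(self, pps: PictureParameterSet) -> None:
        self._pps[pps.pps_id] = pps

    def resolve(self, pps_id: int) -> Tuple[SequenceParameterSet, PictureParameterSet]:
        """Return the parameter sets a slice needs, or fail if one is missing."""
        try:
            pps = self._pps[pps_id]
            sps = self._sps[pps.sps_id]
        except KeyError as exc:
            raise LookupError(f"parameter set {exc.args[0]} not yet received") from exc
        return sps, pps


store = ParameterSetStore()
store.add_sps(SequenceParameterSet(sps_id=0, width=1280, height=720))
store.add_pps(PictureParameterSet(pps_id=0, sps_id=0, num_slice_groups=1))
sps, pps = store.resolve(0)  # what a slice referring to PPS 0 would use
print(sps.width, sps.height, pps.num_slice_groups)
```

A slice that refers to a parameter set that has not yet arrived
cannot be resolved in this sketch, which illustrates why the
parameter sets should be delivered reliably and ahead of the slice
packets.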
Wang, et al. Standards Track [Page 5]