Regular Paper
The new video coding standard H.264/AVC offers major improvements in coding efficiency and flexible mapping to transport layers. It consists of a video coding layer (VCL) and a network abstraction layer (NAL). The VCL carries out the coding, and the NAL encapsulates data from the VCL in a way that readily enables transmission over a broad variety of transport layers. Since H.264/AVC offers no security features, an authentication scheme is needed to verify the sender and the integrity of the data. In this paper we propose SANAL, a stream authentication scheme for H.264/AVC. Unlike existing schemes, which carry out authentication procedures at the packet level, SANAL carries out authentication procedures at the NAL level. This makes it possible to assign priorities to H.264/AVC-specific data without interfering with the features of H.264/AVC. We implemented a SANAL prototype and carried out comparative evaluations of playout rate, communication overhead, and process load. The evaluation results show that the playout rate is improved by 40% compared to existing schemes.
Information and Media Technologies 2(2): 675-683 (2007)
reprinted from: IPSJ Digital Courier 3: 55-63 (2007)
© Information Processing Society of Japan
cussed in Section 3. Our stream authentication scheme for H.264/AVC is proposed in Section 4. Evaluation results are presented in Section 5. Finally, concluding remarks are given in Section 6.

2. Overview of H.264/AVC
A brief description of the H.264/AVC standard is given in this section.
H.264/AVC is a new video coding standard developed jointly by ITU-T, as Recommendation H.264, and ISO/IEC, as International Standard 14496-10 Advanced Video Coding. H.264/AVC is a generic coding standard designed for broadcast, storage, and transmission in a wide range of multimedia applications. A particular focus was improving the coding efficiency, and the new standard therefore allows the bit rate of MPEG-4 to be halved at the same level of fidelity. However, the methods that H.264/AVC uses to improve coding efficiency are not relevant to our proposal in terms of stream authentication.

2.1 NAL
One characteristic feature of H.264/AVC is that it is separated into a video coding layer (VCL) and a network abstraction layer (NAL). The VCL carries out the encoding tasks. The NAL encapsulates the data from the VCL to enable transmission over packet networks or in multiplex environments. Data such as picture slices and parameter sets are sent from the VCL to the NAL and encapsulated into units called NAL units. These NAL units are used in transport layer mapping. This structure allows H.264/AVC to operate flexibly over a variety of network environments.
The format of a NAL unit is shown in Fig. 1. A NAL unit consists of a 1-byte NAL header and a raw byte sequence payload (RBSP) of variable byte length. Data such as picture slices (coded video data) and parameter sets are stored in the RBSP. The NAL header consists of one forbidden bit, two bits (nal_ref_idc) indicating whether or not the NAL unit is used for prediction, and five bits (nal_unit_type) indicating the type of the NAL unit. Details on nal_unit_type are given in the next section. The payload trailing bits pad the payload to a whole number of bytes. The trailing bits start with a "1", followed by as many "0"s as are needed to reach the byte boundary. This "1", the first of the trailing bits, marks the end of the payload data.

2.2 NAL Unit Types
The types of NAL units are listed in Table 1. nal_unit_type values 1-12 are currently defined. NAL units with nal_unit_type 1-5 and 12 carry coded video data and are called VCL NAL units. The remaining NAL units are called non-VCL NAL units and contain information such as parameter sets and supplemental enhancement information. Of these NAL units, IDR pictures, SPSs, and PPSs are particularly important and are described in more detail below.
An instantaneous decoding refresh (IDR) picture is a picture placed at the beginning of a coded video sequence. When the decoder receives an IDR picture, all information is refreshed, which marks the start of a new coded video sequence. Therefore, pictures prior to this IDR picture are not needed for the new sequence.
A sequence parameter set (SPS) contains important header information that applies to all NAL units in the coded video sequence. A picture parameter set (PPS) contains header information that applies to the decoding of one or more pictures within the coded video sequence.
H.264/AVC can handle multiple sequences in one bitstream, and a sequence contains multiple pictures. Therefore, SPSs and PPSs are numbered to identify each sequence and picture. Each PPS contains an identifier of the SPS it refers to, and each VCL NAL

Table 1 NAL unit types.
Type    Name
0       [Unspecified]
1       Coded Slice
2       Data Partition A
3       Data Partition B
4       Data Partition C
5       IDR (Instantaneous Decoding Refresh) Picture
6       SEI (Supplemental Enhancement Information)
7       SPS (Sequence Parameter Set)
8       PPS (Picture Parameter Set)
9       Access Unit Delimiter
10      EoS (End of Sequence)
11      EoS (End of Stream)
12      Filler Data
13-23   [Extended]
24-31   [Undefined]

Fig. 1 NAL unit format.
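The 1-byte NAL header and the RBSP trailing bits described in Section 2.1 can be illustrated with a short sketch. This Python is not part of SANAL or of the JM reference software. The names `parse_nal_header`, `rbsp_payload_bits`, and `NAL_TYPE_NAMES` are hypothetical, the type names are short forms of Table 1, and the RBSP is assumed to have already had any emulation-prevention bytes removed.

```python
# Hypothetical lookup table: short forms of the names in Table 1.
NAL_TYPE_NAMES = {
    1: "Coded Slice",
    2: "Data Partition A",
    3: "Data Partition B",
    4: "Data Partition C",
    5: "IDR Picture",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "Access Unit Delimiter",
    10: "End of Sequence",
    11: "End of Stream",
    12: "Filler Data",
}


def parse_nal_header(header_byte: int) -> dict:
    """Split the 1-byte NAL header: 1 forbidden bit, 2-bit nal_ref_idc, 5-bit nal_unit_type."""
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("the NAL header is exactly one byte")
    nal_unit_type = header_byte & 0x1F
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc": (header_byte >> 5) & 0x3,         # 0: not used for prediction
        "nal_unit_type": nal_unit_type,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "unspecified/reserved"),
    }


def rbsp_payload_bits(rbsp: bytes) -> int:
    """Number of payload bits that precede the RBSP trailing bits.

    The trailing bits are a single '1' (the stop bit) followed by '0's up to
    the byte boundary, so the payload ends just before the last '1' bit.
    Assumes emulation-prevention bytes have already been removed.
    """
    # Scan backwards and skip all-zero bytes (H.264 permits cabac_zero_word
    # padding after the trailing bits of slice data).
    for index in range(len(rbsp) - 1, -1, -1):
        byte = rbsp[index]
        if byte:
            lowest_set_bit = (byte & -byte).bit_length() - 1  # 0 = least significant bit
            return index * 8 + (7 - lowest_set_bit)
    raise ValueError("no stop bit found: RBSP is empty or all zeros")


if __name__ == "__main__":
    for first_byte in (0x67, 0x68, 0x65, 0x41, 0x06):
        print(hex(first_byte), parse_nal_header(first_byte))
    print(rbsp_payload_bits(bytes([0xA5, 0x80])))
```

For example, the header byte 0x67 (binary 0110 0111) splits into a forbidden bit of 0, nal_ref_idc 3, and nal_unit_type 7, i.e., an SPS. Masking and shifting keeps the sketch independent of any bitstream-reader library.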
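Section 5 evaluates SANAL over virtual networks with a two-state Markov chain loss model, parameterized by the packet loss rate p and an expected burst loss length of eight packets. The sketch below assumes the common Gilbert parameterization, in which every packet sent in the Bad state is lost and no packet sent in the Good state is lost. The paper does not state its transition probabilities, so deriving them from p and the mean burst length is an assumption, and the helper names `gilbert_params`, `simulate_losses`, and `mean_burst_length` are hypothetical.

```python
from __future__ import annotations

import random


def gilbert_params(loss_rate: float, mean_burst: float) -> tuple[float, float]:
    """Return (p_gb, p_bg): Good->Bad and Bad->Good transition probabilities.

    Assumption: packets are lost only in the Bad state, so
      mean burst length    = 1 / p_bg
      stationary loss rate = p_gb / (p_gb + p_bg)
    """
    if not (0.0 < loss_rate < 1.0) or mean_burst < 1.0:
        raise ValueError("need 0 < loss_rate < 1 and mean_burst >= 1")
    p_bg = 1.0 / mean_burst
    p_gb = loss_rate * p_bg / (1.0 - loss_rate)
    if p_gb > 1.0:
        raise ValueError("loss_rate too high for this mean_burst")
    return p_gb, p_bg


def simulate_losses(n_packets: int, loss_rate: float, mean_burst: float,
                    seed: int = 0) -> list[bool]:
    """Return one flag per packet, True where the packet is lost."""
    p_gb, p_bg = gilbert_params(loss_rate, mean_burst)
    rng = random.Random(seed)
    # Draw the first state from the stationary distribution.
    bad = rng.random() < loss_rate
    lost = []
    for _ in range(n_packets):
        lost.append(bad)
        if bad:
            bad = rng.random() >= p_bg  # leave Bad with probability p_bg
        else:
            bad = rng.random() < p_gb   # enter Bad with probability p_gb
    return lost


def mean_burst_length(lost: list[bool]) -> float:
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


if __name__ == "__main__":
    trace = simulate_losses(100_000, loss_rate=0.2, mean_burst=8.0, seed=1)
    print("loss rate:", sum(trace) / len(trace))
    print("mean burst:", mean_burst_length(trace))
```

With p = 20% and a mean burst of eight packets, this parameterization gives p_bg = 1/8 = 0.125 and p_gb = 0.2 × 0.125 / 0.8 = 0.03125. A fixed seed makes a loss trace reproducible, so different schemes can be compared on identical loss patterns.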
MPEG-4 AVC Reference Software 19) to support authentication. Since the packet loss probability changes over time, it is difficult to evaluate our scheme over real networks. We therefore used the two-state Markov chain loss model to express burst packet losses and ran measurements over virtual networks. We chose this model because it is often used to evaluate stream authentication schemes.
5.1 Implementation Environment
Performance measurements were carried out on a machine with a 3.4-GHz Pentium 4 CPU and 2.0 GB of RAM. SANAL is implemented in C/C++. We embedded SANAL into the H.264/MPEG-4 AVC Reference Software JM9.6. The 160-bit SHA-1 hash function and 1024-bit RSA digital signatures from the OpenSSL library were used, although our authentication scheme does not depend on any particular hash function or digital signature. IDA from the Crypto++ library was used as the FEC technique to reconstruct data lost through packet loss. Due to the features of JM9.6, each NAL unit was encapsulated into one RTP packet.
5.1.1 Measurement Parameters
The parameters of the performance measurements are shown in Table 2.
The maximum size of a NAL unit group, Nmax, was varied from 5 to 15. The reconstruction threshold, M, was varied from 3 to Nmax. The reconstruction threshold is the number of FEC data needed to reconstruct the original data in case of packet loss. The maximum value of the packet loss rate, p, was set to 40%. Several studies that measured packet loss over the Internet show that the packet loss probability on the Internet is much less than 40% 20)∼23). The expected burst loss length was set to eight packets, since the average burst packet loss length over the Internet is less than eight packets.
The number of frames generated at the encoder, Fn, was set to 900 frames, and the frame rate, Fr, was set to 30 frames/sec. The value Fn/Fr (30 sec) is the length of the encoded video sequence used for evaluation. The SPS insertion interval, Si, was set to a random value between 2000 and 5000 msec, since an IDR picture is reportedly inserted every 2 to 5 seconds 24). The PPS insertion interval was set to a random value between 1000 and 2000 msec. This makes it possible to measure values for Permutation P.
5.1.2 Evaluation Criteria
We evaluated playout rate, communication overhead, and process load. The playout rate is the number of authenticated and playable NAL units on the receiver side divided by the total number of NAL units transmitted by the sender. In previously proposed schemes, robustness to packet loss is often evaluated with the authentication rate, that is, the total number of authenticated received packets divided by the total number of packets transmitted by the sender. However, authenticated data can still be unplayable when there are dependencies between data, so the authentication rate is not necessarily equal to the playout rate. Therefore, the playout rate is a more valuable evaluation criterion when dealing with data that carry dependencies. The communication overhead is the amount of SANAL authentication information divided by the total amount of NAL data generated by the H.264/AVC encoder. Here the authentication information refers to data such as Hc, Sig, and FEC, which are added by applying SANAL to the original H.264/AVC encoder. The encoder-generated NAL data are data such as coded slices, IDR pictures, SPSs, and PPSs, which are generated by the original H.264/AVC encoder. The process load is the encoder and decoder process time due to SANAL divided by the process time of the original H.264/AVC encoder. The encoder process
time is the total time for hash calculation, signature generation, and IDA encoding. The decoder process time is the total time for hash calculation, signature verification, and IDA decoding.
We compare SANAL with SAIDA, since SAIDA also uses FEC techniques.
5.2 Results
5.2.1 Playout Rate
Figure 10 shows the relationship between the packet loss rate and the playout rate when Nmax = 9 and M = 5 and 9.
The playout rate of SANAL is higher than that of SAIDA for both M = 5 and M = 9. For example, when the packet loss rate is 20% and M = 5, the playout rates of SANAL and SAIDA are 0.65 and 0.47, respectively. This shows that SANAL has a 38% better playout rate than SAIDA (0.65/0.47 ≈ 1.38). When packets are lost, the high-priority parameter data are reconstructed in SANAL but not in SAIDA. In SAIDA, only the authentication information is made robust to packet loss. Therefore, when the parameter data are lost, all data referring to the lost parameter data are unplayable even if authenticated. This is the main reason for the difference in playout rate between the two schemes.
In addition, in SANAL, the NAL unit groups are formed according to the occurrence of the coded video sequences generated by the encoder, such as Permutations S, P, and C. In contrast, in SAIDA, a NAL unit group is formed for every Nmax packets. For example, suppose a NAL unit group formed by SAIDA is 'C, C, C, C, S, P, I'. This NAL unit group contains data from two different sequences, since S indicates a new sequence. When the coded slices in the first half of the NAL unit group are lost, the parameter data in the last half of the NAL unit group may become unverifiable due to the loss of data belonging to a different sequence. This is inefficient in terms of authentication. Furthermore, the succeeding data that depend on these unverifiable parameter data are unplayable, which is also inefficient in terms of playout. In SANAL, the parameter data are placed at the start of the NAL unit group, and no data from a different sequence are included. Thus, data never become unverifiable or unplayable because NAL units of a different sequence were lost.
Fig. 10 Relationship between packet loss rate and playout rate.
5.2.2 Communication Overhead
Figure 11 shows the relationship between the communication overhead and the playout rate. SANAL has a higher playout rate than SAIDA, as explained in the previous section. However, the overhead of SANAL is also up to 10 times higher than that of SAIDA. In SANAL, FEC is carried out for IDR slices, which are NAL units that include some of the largest coded video data. In SAIDA, on the other hand, FEC is carried out only for the packet hashes and digital signatures, so the overhead is small. Although the overhead of SANAL is higher than that of SAIDA, it is still less than 10% of the total bitstream generated by the H.264/AVC encoder. Also, the FEC data generated for Permutation P are similar in size to the FEC data generated by SAIDA, so the increase in overhead is mainly due to the FEC data of Permutation S.
Fig. 11 Relationship between overhead and playout rate.
5.2.3 Process Load
Figure 12 shows the relationship between the reconstruction threshold and the encoder and decoder process load. For the decoder process load, p is set to 20%.
Encoder Process Load
Figure 12 shows that the encoder process
load is kept below 0.5% and that the encoder process load is approximately constant and not affected by the value of M or Nmax. This is because the encoding time of the H.264/AVC encoder is much larger than the time SANAL needs to generate hashes, digital signatures, and FEC data. Also, SANAL keeps small the number of times that comparatively time-consuming procedures, such as generating digital signatures and FEC data, are carried out.
Decoder Process Load
Figure 12 shows that the decoder process load is kept below 3.5%. The maximum decoder process load does not occur at the minimum or maximum possible value of M, for the following reasons. When M is small, fewer FEC data are needed to reconstruct the lost NAL units. IDA decoding is an inverse matrix calculation: the larger the number of data, the larger the dimension of the matrix, which results in a longer procedure time. In other words, the smaller the number of FEC data, the shorter the procedure time. When M becomes large, on the other hand, reconstruction of the lost NAL units becomes difficult since more FEC data are needed, and therefore IDA decoding is not carried out for the lost NAL units.
Unlike the encoder process load, the decoder process load increases with Nmax. Also, the value of M at which the decoder process load peaks is higher for larger Nmax. A higher value of M results in a larger number of FEC data for reconstruction and therefore a longer process time.
Fig. 12 Relationship between reconstruction threshold and process load.

6. Conclusion
We have proposed SANAL, a stream authentication scheme for H.264/AVC. To take account of the features of H.264/AVC, authentication procedures are carried out at the NAL level. We implemented a SANAL prototype, and our measurement results showed the effectiveness of SANAL. The playout rate is improved by 40% compared to existing schemes, while the process load is kept below 3.5%.

Acknowledgments This work is supported in part by JSPS Research Fellowships for Young Scientists.

References
1) Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V.: RTP: A transport protocol for real-time applications, RFC 3550 (2003).
2) Argyriou, A. and Madisetti, V.: Streaming H.264/AVC Video over the Internet, IEEE Consumer Comm. and Networking Conference, pp.169–174 (Jan. 2004).
3) Shahbazian, J. and Christensen, K.J.: TSGen: a tool for modeling of frame loss in streaming video, International Journal of Network Management, pp.315–327 (2004).
4) ISO/IEC 13818-2: 2000, Information technology - Generic coding of moving pictures and associated audio information (2000).
5) ISO/IEC 14496-2: 2001, Coding of audio-visual objects - Part 2: Visual (2001).
6) ITU-T Recommendation H.264: Advanced Video Coding for generic audiovisual services (2003).
7) ISO/IEC International Standard 14496-10 (2003).
8) Wiegand, T., Sullivan, G., Bjontegaard, G. and Luthra, A.: Overview of the H.264/AVC Video Coding Standard, IEEE Trans. on Circuits and Systems for Video Technology, Vol.13, No.7, pp.560–576 (July 2003).
9) Wenger, S.: H.264/AVC Over IP, IEEE Trans. on Circuits and Systems for Video Technology, Vol.13, No.7, pp.645–656 (July 2003).
10) Ueda, S., Eto, S., Kawaguchi, N., Uda, R., Shigeno, H. and Okada, K.: Real-time Stream Authentication Scheme for IP Telephony, IPSJ Journal, Vol.45, No.2, pp.605–613 (Feb. 2004).
11) Ueda, S., Kaneko, S., Kawaguchi, N., Shigeno, H. and Okada, K.: A Real-Time Stream Authentication Scheme for Video Streams, IPSJ Journal, Vol.47, No.2, pp.415–425 (Feb. 2006).
12) Challal, Y., Bettahar, H. and Bouabdallah, A.: A Taxonomy of Multicast Data Origin