USB Video Temporal Encoder Examples 1.5
Revision 1.5
August 9, 2012
UVC 1.5 Examples
Contributors
David Roh Dolby Laboratories Inc.
Choon Chng Google Inc.
Ville-Mikko Rautio Google Inc.
Van Duros Immedia Semiconductor Inc.
Abdul R. Ismail Intel Corp.
Bradley Saunders Intel Corporation
Ygal Blum Jungo
Yoav Nissim Jungo
Chandrashekhar Rao Logitech Inc.
Chris Yokum MCCI Corporation
Stephen Cooper Microsoft Corp.
Maribel Figuera Microsoft Corp.
Richard Webb Microsoft Corp.
Tim Vlaar Point Grey Research Inc.
Mark Bohm SMSC
John Sisto SMSC
Will Harris Texas Instruments
Grant Ley Texas Instruments
Paul E. Berg USB-IF
Revision History
Table of Contents
1 H.264 Simulcast Example
1.1 Introduction
1.1.1 Purpose
1.1.2 Scope
1.1.3 Device Description
1.2 Descriptors
1.3 Scenario
1.4 Negotiation
1.5 Configuration using the Encoding Units Prior to Streaming
1.6 Dynamic Configuration using the Encoding Units While Streaming
2 Webcam with VP8 Encoding Capability
2.1 Product Description
2.2 Descriptor Hierarchy
2.3 Descriptors
2.4 Requests
2.4.1 Probe & Commit for Video Streaming Interface Two
2.4.2 Sequence Diagram
3 The UVC 1.5 backward compatibility example:
3.1 Device Descriptor
3.2 First Configuration:
3.3 Second Configuration:
List of Tables
Table 1-1 Standard VC Interface Descriptor
Table 1-2 Class-Specific VS Interface Input Header Descriptor
Table 1-3 Encoding Unit Descriptor
Table 1-4 Video Streaming H.264 Format Descriptor
Table 1-5 Video Streaming H.264 Frame Descriptors
Table 1-6 GET_CUR Probe state
Table 1-7 SET_CUR Commit data structure
Table 2-1 Encoding Unit Descriptor
Table 2-2 Class-specific VS Format Descriptor
Table 2-3 Class-specific VS Frame Descriptor
Table 2-4 VS_COMMIT_CONTROL(SET_CUR) Request to VSI Two
List of Figures
Figure 1-1 USB Video Camera Topology
Figure 1-2 Sequence for Negotiating a Simulcast Stream
Figure 1-3 Configuration of each stream in the simulcast payload prior to streaming
Figure 1-4 Dynamic configuration while streaming
Figure 2-1 USB Video Camera Topology
Figure 2-2 USB Video Camera Descriptor Hierarchy
Figure 2-3 Sequence for Configuring Preview Stream and Two Simulcast Streams
The specific features the H.264 Simulcast format supports in this example are:
• Up to two single H.264 streams multiplexed into a single H.264 simulcast stream
• A maximum macroblock (MB) processing rate of 244,800 MB/s for a single H.264 stream.
  For example, a maximum resolution of 1080p at 30 fps ((1920*1088)*30/256 = 244,800).
• A maximum macroblock (MB) processing rate of 169,200 MB/s for H.264 simulcasts. For
  example, the simulcast stream may consist of 720p and 540p at 30 fps
  ((960*544+1280*720)*30/256 = 169,200).
• Three rate control methods: Constant QP, VBR and GVBR with underflow allowed.
• Eight H.264 frame descriptors (i.e. 4 resolutions x 2 H.264 profiles per resolution):
  o 4 different resolutions: 1080p (1920x1080), 720p (1280x720), 540p (960x540), and
    360p (640x360)
  o For each resolution, the device supports 2 profiles: constrained baseline and
    constrained high
  o Each frame descriptor supports 2 frame intervals: dwFrameInterval = 333,333 (30 Hz)
    and dwFrameInterval = 666,666 (15 Hz)
  o Each frame descriptor supports 2 bUsage values: UC Config mode 0 and UC Config mode 1
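The macroblock rates quoted in the list above can be reproduced with a short calculation. This is a minimal sketch that assumes, as the 1920x1088 and 960x544 figures suggest, that each frame dimension is rounded up to a whole number of 16x16 macroblocks; the helper name mb_per_second is hypothetical.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples

def mb_per_second(width: int, height: int, fps: int) -> int:
    """Macroblocks per second, rounding each dimension up to whole macroblocks."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE) * fps

# Resolutions supported by the frame descriptors above, at 30 fps
RESOLUTIONS = {
    "1080p": (1920, 1080),  # padded to 1920x1088
    "720p": (1280, 720),
    "540p": (960, 540),     # padded to 960x544
    "360p": (640, 360),     # padded to 640x368
}

for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: {mb_per_second(w, h, 30):,} MB/s")

# The 720p + 540p simulcast example from the list above
print(mb_per_second(1280, 720, 30) + mb_per_second(960, 540, 30))  # 169200
```

Rounding up matters because 1080 and 540 are not multiples of 16, which is why the list uses 1088 and 544 in its arithmetic.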
Figure 1-1 USB Video Camera Topology (Video Streaming Interface 2: EU, OT (H.264 Simulcast format), USB IN Endpoint 2)
wMaxMBperSecTwoResolutionsTemporalSpatialScalability 0x0000
wMaxMBperSecThreeResolutionsTemporalSpatialScalability 0x0000
wMaxMBperSecFourResolutionsTemporalSpatialScalability 0x0000
wMaxMBperSecOneResolutionFullScalability 0x0000
wMaxMBperSecTwoResolutionsFullScalability 0x0000
wMaxMBperSecThreeResolutionsFullScalability 0x0000
wMaxMBperSecFourResolutionsFullScalability 0x0000
1.3 Scenario
Simulcast of 720p 30fps and 360p 30fps streams, both using UC Config mode 1 with two
temporal layers.
1.4 Negotiation
This section shows how the host negotiates a simulcast transport stream that consists of two
multiplexed H.264 streams that have different resolutions. The fields that start with
“wMaxMBperSec” in the Video Format Descriptor indicate that a simulcast stream
generated by this device can support up to two different resolutions. This is given by the
non-zero value of wMaxMBperSecTwoResolutionsNoScalability for a simulcast payload
composed of multiplexed AVC streams and of
wMaxMBperSecTwoResolutionsTemporalScalability for a simulcast payload composed of
temporal scalable streams.
Note that the value of wMaxMBperSecTwoResolutions (169,200 MB/s) indicates that a two-stream
simulcast that includes 1080p at 30 fps is not supported, because the 1080p stream alone
requires 1920x1088x30/256 = 244,800 MB/s. The host can also discover this restriction with
GET_MAX, as illustrated in step 3 of the sequence given below.
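The host-side reasoning in this paragraph can be sketched as a budget check: sum the macroblock rates of the requested streams and compare the total against the advertised two-resolution limit. This is an illustrative sketch only; simulcast_fits is a hypothetical helper, and a real host would read the limit from the format descriptor instead of hard-coding 169,200.

```python
import math

def mb_per_second(width, height, fps):
    # Round each dimension up to whole 16x16 macroblocks (1080 -> 1088)
    return math.ceil(width / 16) * math.ceil(height / 16) * fps

# Two-resolution simulcast limit advertised in this example
MAX_MB_TWO_RESOLUTIONS = 169_200

def simulcast_fits(streams, limit=MAX_MB_TWO_RESOLUTIONS):
    """streams: list of (width, height, fps); returns (fits, total MB/s)."""
    total = sum(mb_per_second(w, h, f) for w, h, f in streams)
    return total <= limit, total

# Initial request: 1080p + 360p at 30 fps
print(simulcast_fits([(1920, 1080, 30), (640, 360, 30)]))  # (False, 272400)
# Scenario of section 1.3: 720p + 360p at 30 fps
print(simulcast_fits([(1280, 720, 30), (640, 360, 30)]))   # (True, 135600)
```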
Initially, the host selects a simulcast payload composed of two UC Config mode 1 H.264
streams with two temporal layers each where the highest resolution is 1080p and the second
resolution is 360p. Once it discovers that at 1080p the device does not support simulcast of
two H.264 streams, the host instead selects a simulcast payload with two H.264 streams
where the highest resolution is 720p. The 720p stream corresponds to the stream with
stream_id = 0 and is set to use VBR low delay rate control mode. The second stream
corresponds to the stream with stream_id = 1 and is set to Constant QP rate control mode.
The resolution of the second stream is configured to 360p once the device is in the active
state, as illustrated in section 1.5.
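The per-stream rate control modes described here are carried in a single bmRateControlModes value, one 4-bit mode per stream_id, which is how the value 0x0031 in step 6 arises. A minimal sketch, assuming the UVC 1.5 mode codes 1 (VBR with underflow allowed, i.e. low delay) and 3 (Constant QP) and stream_id 0 in the least significant nibble; pack_rate_control_modes is a hypothetical helper.

```python
# Mode codes assumed from the UVC 1.5 bmRateControlModes definition
RC_VBR_UNDERFLOW_ALLOWED = 1  # "VBR low delay"
RC_CONSTANT_QP = 3

def pack_rate_control_modes(modes):
    """Pack up to four 4-bit modes; stream_id 0 occupies bits 3..0."""
    if len(modes) > 4:
        raise ValueError("bmRateControlModes holds at most four streams")
    value = 0
    for stream_id, mode in enumerate(modes):
        if not 0 <= mode <= 0xF:
            raise ValueError(f"mode {mode} does not fit in 4 bits")
        value |= mode << (4 * stream_id)
    return value

# stream_id 0 (720p): VBR low delay; stream_id 1 (360p): Constant QP
value = pack_rate_control_modes([RC_VBR_UNDERFLOW_ALLOWED, RC_CONSTANT_QP])
print(f"0x{value:04X}")  # 0x0031
```

Streams that are not listed are left at 0 (not applicable), so the same helper also produces the VP8 value for two streams in mode 4.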
Figure 1-2 Sequence for Negotiating a Simulcast Stream (host and device exchange VS_PROBE_CONTROL SET_CUR, GET_CUR, GET_MIN and GET_MAX requests)
Figure 1-2 illustrates the communication between host and device during the Probe and
Commit stage. The individual steps are:
1) The host sets the streaming interface Probe state by issuing a SET_CUR request to
the VS_PROBE_CONTROL with all the fields set to 0 except the following:
a. bFormatIndex = 0x01
b. bFrameIndex = 0x02
c. dwFrameInterval = 0x00051615
d. bUsage = 0x02
e. bmLayoutPerStream = 0x0000000000020002 (two H.264 streams, each with
2 temporal layers)
2) Given that at 1080p the device cannot support simulcast of two H.264 streams, upon
a GET_CUR request to the VS_PROBE_CONTROL, the device returns a
GET_CUR state with bmLayoutPerStream changed to 0x0000000000000002.
3) Next, the host issues a GET_MIN and a GET_MAX request to the
VS_PROBE_CONTROL, and the device returns bmLayoutPerStream =
0x0000000000000001 and bmLayoutPerStream = 0x0000000000000002,
respectively, indicating that at 1080p the device supports only a single UC Config
mode 1 stream with one or two temporal layers.
4) The host now issues a SET_CUR request to the VS_PROBE_CONTROL with all
the fields set to 0 except the following:
a. bFormatIndex = 0x01
b. bFrameIndex = 0x04
c. dwFrameInterval = 0x00051615
d. bUsage = 0x02
e. bmLayoutPerStream = 0x0000000000020002 (two H.264 streams, each with
2 temporal layers)
5) Upon a GET_CUR request to the VS_PROBE_CONTROL, the device returns the
following state:
Table 1-6 GET_CUR Probe state
Control Selector VS_PROBE_CONTROL
6) The host changes the rate control mode of the second stream to Constant QP by
issuing a SET_CUR request to the VS_PROBE_CONTROL with
bmRateControlModes = 0x0031 (stream_id 0: VBR low delay; stream_id 1: Constant QP)
and with the remaining fields set to the values the device returned in the GET_CUR
state of step 5.
7) Upon a GET_CUR request to the VS_PROBE_CONTROL, the device returns the
same Probe data structure as the one set by the host in step 6.
8) The host sets the active device state by issuing a SET_CUR request to the
VS_COMMIT_CONTROL where all the field values match the GET_CUR state of
step 7. Table 1-7 shows the field values of the Commit data structure.
Table 1-7 SET_CUR Commit data structure
Control Selector VS_COMMIT_CONTROL
Note that the value of bmSettings is set to 0x2A, establishing CABAC as the entropy
encoding method. The host could have selected CAVLC by setting bmSettings to 0x29.
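The Probe and Commit data exchanged in these steps is a fixed little-endian structure, so it can be serialized with Python's struct module. This sketch assumes the 48-byte UVC 1.5 layout (field order and sizes as in the UVC 1.5 Video Probe and Commit Controls); pack_probe and unpack_probe are hypothetical helpers. It builds the step 1 request, whose bytes a host would send as the data stage of SET_CUR with wLength = 48.

```python
import struct

# UVC 1.5 Video Probe and Commit control, little-endian, no padding.
PROBE_COMMIT_FORMAT = (
    "<"
    "H"  # 0  bmHint
    "B"  # 2  bFormatIndex
    "B"  # 3  bFrameIndex
    "I"  # 4  dwFrameInterval (100 ns units)
    "H"  # 8  wKeyFrameRate
    "H"  # 10 wPFrameRate
    "H"  # 12 wCompQuality
    "H"  # 14 wCompWindowSize
    "H"  # 16 wDelay
    "I"  # 18 dwMaxVideoFrameSize
    "I"  # 22 dwMaxPayloadTransferSize
    "I"  # 26 dwClockFrequency
    "B"  # 30 bmFramingInfo
    "B"  # 31 bPreferedVersion
    "B"  # 32 bMinVersion
    "B"  # 33 bMaxVersion
    "B"  # 34 bUsage
    "B"  # 35 bBitDepthLuma
    "B"  # 36 bmSettings
    "B"  # 37 bMaxNumberOfRefFramesPlus1
    "H"  # 38 bmRateControlModes
    "Q"  # 40 bmLayoutPerStream
)
FIELD_NAMES = (
    "bmHint", "bFormatIndex", "bFrameIndex", "dwFrameInterval",
    "wKeyFrameRate", "wPFrameRate", "wCompQuality", "wCompWindowSize",
    "wDelay", "dwMaxVideoFrameSize", "dwMaxPayloadTransferSize",
    "dwClockFrequency", "bmFramingInfo", "bPreferedVersion", "bMinVersion",
    "bMaxVersion", "bUsage", "bBitDepthLuma", "bmSettings",
    "bMaxNumberOfRefFramesPlus1", "bmRateControlModes", "bmLayoutPerStream",
)

def pack_probe(**fields):
    """Serialize a probe/commit block; fields not given are set to 0."""
    unknown = set(fields) - set(FIELD_NAMES)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    return struct.pack(PROBE_COMMIT_FORMAT, *(fields.get(n, 0) for n in FIELD_NAMES))

def unpack_probe(data):
    """Parse a GET_CUR/GET_MIN/GET_MAX response into a field dict."""
    return dict(zip(FIELD_NAMES, struct.unpack(PROBE_COMMIT_FORMAT, data)))

# Step 1: all fields 0 except the ones listed in the step
step1 = pack_probe(
    bFormatIndex=0x01,
    bFrameIndex=0x02,
    dwFrameInterval=0x00051615,  # 333,333 x 100 ns, i.e. 30 fps
    bUsage=0x02,
    bmLayoutPerStream=0x0000000000020002,  # two streams, 2 temporal layers each
)
print(len(step1))  # 48
```

The commit of step 8 would be built the same way from the Table 1-7 values, for example with bmSettings = 0x2A for CABAC.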
Host to device:
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0080 (temporal_id = 1, stream_id = 0)
EU_VIDEO_RESOLUTION_CONTROL(GET_CUR): wWidth = 1280, wHeight = 720
EU_AVERAGE_BITRATE_CONTROL(GET_CUR): dwAverageBitRate = 4 Mbps
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0000 (temporal_id = 0, stream_id = 0)
EU_AVERAGE_BITRATE_CONTROL(GET_CUR): dwAverageBitRate = 2.5 Mbps
EU_AVERAGE_BITRATE_CONTROL(SET_CUR): dwAverageBitRate = 800 Kbps
EU_CPB_SIZE_CONTROL(SET_CUR): dwCPBSize = 25,000 (= 800 Kbps * 500 ms / 16)
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0080 (temporal_id = 1, stream_id = 0)
EU_VIDEO_RESOLUTION_CONTROL(SET_CUR): wWidth = 640, wHeight = 360
EU_QUANTIZATION_PARAMS_CONTROL(SET_CUR): wQpPrime_I = wQpPrime_P = 34,
  wQpPrime_B = 0xFFFF (don't care, because B slices are unsupported in UC Config mode 1)
SET_INTERFACE(1)
STREAMING
Figure 1-3 Configuration of each stream in the simulcast payload prior to streaming
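The dwCPBSize value in Figure 1-3 follows from the formula printed next to it. This sketch simply reproduces that formula (bit rate multiplied by the buffer duration, divided by 16) with integer arithmetic so no floating-point rounding creeps in; the 500 ms buffer duration is the figure's choice and cpb_size is a hypothetical helper.

```python
def cpb_size(bit_rate_bps: int, buffer_ms: int) -> int:
    """dwCPBSize as computed in Figure 1-3: bit rate x duration / 16."""
    bits_buffered = bit_rate_bps * buffer_ms // 1000
    return bits_buffered // 16

print(cpb_size(800_000, 500))  # 25000
```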
Host to device:
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0480 (temporal_id = 1, stream_id = 1)
EU_QUANTIZATION_PARAMS_CONTROL(SET_CUR): wQpPrime_I = wQpPrime_P = 40
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0400 (temporal_id = 0, stream_id = 1)
EU_QUANTIZATION_PARAMS_CONTROL(SET_CUR): wQpPrime_I = wQpPrime_P = 37
EU_MIN_FRAME_INTERVAL_CONTROL(SET_CUR): dwFrameInterval = 666,666 (15 Hz)
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x1C00 (temporal_id = 0, stream_id = 7 (wildcard))
EU_SYNC_REF_FRAME_CONTROL(SET_CUR): bSyncFrameType = 1, wSyncFrameInterval = 0,
  bGradualDecoderRefresh = 0
Figure 1-4 Dynamic configuration while streaming
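Every wLayerOrViewId value used in Figures 1-3 and 1-4 (and in the VP8 example) is consistent with temporal_id occupying bits 9..7 and stream_id occupying bits 12..10, with 7 acting as a wildcard. The sketch below encodes and decodes only those two fields. The bit positions are inferred from the example values, any other bits of wLayerOrViewId are simply left at 0, and layer_id and parse_layer_id are hypothetical helpers.

```python
WILDCARD = 7        # value used in the figures as a wildcard
TEMPORAL_SHIFT = 7  # temporal_id in bits 9..7 (0x0080 -> temporal_id 1)
STREAM_SHIFT = 10   # stream_id in bits 12..10 (0x0400 -> stream_id 1)
FIELD_MASK = 0x7    # both fields are 3 bits wide

def layer_id(stream_id: int, temporal_id: int) -> int:
    """Build wLayerOrViewId from stream_id and temporal_id only."""
    for name, v in (("stream_id", stream_id), ("temporal_id", temporal_id)):
        if not 0 <= v <= FIELD_MASK:
            raise ValueError(f"{name} must be 0..7, got {v}")
    return (stream_id << STREAM_SHIFT) | (temporal_id << TEMPORAL_SHIFT)

def parse_layer_id(value: int) -> tuple:
    """Return (stream_id, temporal_id) from a wLayerOrViewId value."""
    return ((value >> STREAM_SHIFT) & FIELD_MASK,
            (value >> TEMPORAL_SHIFT) & FIELD_MASK)

for s, t in [(0, 1), (0, 0), (1, 1), (1, 0), (WILDCARD, 0)]:
    print(f"stream_id={s} temporal_id={t} -> 0x{layer_id(s, t):04X}")
```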
Figure 2-1 USB Video Camera Topology (Video Function: Sensor, CT, PU, OT, USB IN Endpoint 1; Video Streaming Interface 2: EU, OT, USB IN Endpoint 2)
The video function contains a Camera Terminal representing the sensor. The video stream
captured by the Camera Terminal goes through any necessary analogue-to-digital conversion and
is routed into a Processing Unit for video signal processing. The output of the Processing Unit
fans out to two destinations: an Output Terminal for the preview stream, which transmits the
uncompressed preview bitstream to the host via a USB IN endpoint, and an Encoding Unit for
video compression. The output of the Encoding Unit is routed to a second Output Terminal, which
transmits the compressed video bitstream to the host via another USB IN endpoint. Each USB IN
endpoint belongs to its own VideoStreaming interface; the compressed stream is carried by
VideoStreaming interface two. The internals of the video function (unit and terminal topology)
are presented to the host through the mandatory VideoControl interface.
2.2 Descriptor Hierarchy
This USB camera device uses a Video Interface Collection that includes:
1. VideoControl interface (interface 0),
Figure 2-2 USB Video Camera Descriptor Hierarchy (descriptors shown: Video Function Interface Association, Camera Terminal, Output Terminal 1, Output Terminal 2, Processing Unit, Encoding Unit, VideoStreaming Format, VideoStreaming Frame, and String descriptors)
2.3 Descriptors
The following sections present the class-specific Encoding Unit descriptor and all VP8
payload-specific descriptors used to describe the device to the host. For the generic and other
class-specific descriptors, refer to the other examples.
2.3.1 Encoding Unit Descriptor
This descriptor describes the encoding unit that processes the video stream data delivered by
the processing unit. This implementation supports the Select Layer, Video Resolution, and Start
or Stop Layer/View controls both at initialization time and at runtime.
Table 2-1 Encoding Unit Descriptor
Offset Field Size Value Description
0 bLength 1 0x0D Size of this descriptor, 13 bytes.
1 bDescriptorType 1 0x24 CS_INTERFACE
2 bDescriptorSubtype 1 0x07 VC_ENCODING_UNIT
3 bUnitID 1 0x05 This unit is #5.
4 bSourceID 1 0x04 The input pin of this unit is connected to the output pin of unit #4.
5 iEncoding 1 0x02 Index of the string descriptor identifying the encoding unit (Product string).
6 bControlSize 1 0x03 Size of the bmControls and bmControlsRuntime fields, in bytes.
7 bmControls 3 0x010005 Supports the Select Layer (D0), Video Resolution (D2), and Start or Stop Layer/View (D16) controls at initialization time.
10 bmControlsRuntime 3 0x010005 Supports the Select Layer (D0), Video Resolution (D2), and Start or Stop Layer/View (D16) controls at runtime.
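Assembling the descriptor programmatically makes the length and field offsets easy to check, because bLength is computed from the fields rather than typed in. This sketch assumes the field order listed in Table 2-1 and little-endian encoding of the 3-byte bitmaps; encoding_unit_descriptor is a hypothetical helper.

```python
import struct

CS_INTERFACE = 0x24
VC_ENCODING_UNIT = 0x07

# bmControls / bmControlsRuntime bits used by this device
SELECT_LAYER = 1 << 0
VIDEO_RESOLUTION = 1 << 2
START_OR_STOP_LAYER = 1 << 16

def encoding_unit_descriptor(unit_id, source_id, i_encoding,
                             bm_controls, bm_controls_runtime, control_size=3):
    """Return the class-specific Encoding Unit descriptor as bytes."""
    body = struct.pack("<BBBBBB", CS_INTERFACE, VC_ENCODING_UNIT,
                       unit_id, source_id, i_encoding, control_size)
    body += bm_controls.to_bytes(control_size, "little")
    body += bm_controls_runtime.to_bytes(control_size, "little")
    return bytes([len(body) + 1]) + body  # bLength counts itself

controls = SELECT_LAYER | VIDEO_RESOLUTION | START_OR_STOP_LAYER  # 0x010005
desc = encoding_unit_descriptor(0x05, 0x04, 0x02, controls, controls)
print(desc.hex(" "))  # 0d 24 07 05 04 02 03 05 00 01 05 00 01
```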
2.4 Requests
The following example shows how the host and device collaborate through controls to configure
the device to stream a preview stream for local preview and two compressed simulcast streams,
each with one temporal enhancement layer. The first simulcast stream is configured to a
resolution of 1280x720 with a global average bit rate of 1 Mbit/s. The second simulcast stream
is configured to a resolution of 640x360 with a global average bit rate of 400 kbit/s. The host
then sets the device into a streaming state, and video streams through the VideoStreaming
interfaces of the device.
Once streaming has started, the host stops the temporal enhancement layer on both simulcast
streams and enhances the picture quality in a region of interest placed in the bottom-right
corner of the viewport.
2.4.1 Probe & Commit for Video Streaming Interface Two
Probe & Commit for the VideoStreaming interface that streams the VP8 payload follows the usual
negotiation rules set forth in the UVC 1.5 specification. Table 2-4 presents a valid negotiated
value for the VS_COMMIT_CONTROL(SET_CUR) request.
Table 2-4 VS_COMMIT_CONTROL(SET_CUR) Request to VSI Two.
Control Selector VS_COMMIT_CONTROL
Request SET_CUR
wLength 48
Offset Field Size Value Description
0 bmHint 2 0x000F dwFrameInterval (D0), wKeyFrameRate (D1), wPFrameRate (D2), wCompQuality (D3), and wCompWindowSize (D4) to be kept fixed.
2 bFormatIndex 1 0x01 First video payload format.
3 bFrameIndex 1 0x01 First frame descriptor frame type.
4 dwFrameInterval 4 0x00051615 30 frames per second (333,333 x 100 ns).
8 wKeyFrameRate 2 0x0000 N/A
10 wPFrameRate 2 0x0000 N/A
12 wCompQuality 2 0x0000 N/A
14 wCompWindowSize 2 0x0000 N/A
16 wDelay 2 0x0021 33 ms internal latency.
1 The field consists of four 4-bit values for the simulcast streams with stream_id = {0, 1, 2, 3}.
Streams 2 and 3 are disabled. Streams 0 and 1 are configured for mode 4, which is the Global VBR
mode.
2 The field consists of four 16-bit values for the simulcast streams with stream_id = {0, 1, 2, 3}.
Streams 2 and 3 are disabled. Streams 0 and 1 have the following configuration (for the field
specification, see the bmLayoutPerStream explanation; for temporal layers, see the section
"Temporal Layering with VP8 Encoders" in the VP8 Payload Format Specification):
Bit layout per stream (column labels: stream enabled, number of temporal enhancement layers,
and Prev, Golden, and Alt reference "allowed" flags for each of four layers):
Bit   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Value  0  0  0  0  0  0  0  1  1  1  1  0  1  0  1  1
Hex   0x01EB
Temporal enhancement layer enabled. According to this structure, on both simulcast streams the
temporal base layer may depend on the previous and golden frames, and the temporal enhancement
layer may depend on all available reference frames.
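The 64-bit bmLayoutPerStream value is formed by placing each stream's 16-bit layout at bit offset 16 x stream_id, which matches the H.264 values used in section 1.4 (for example 0x0000000000020002 for two streams). A minimal sketch with a hypothetical helper, applied to the 0x01EB layout of streams 0 and 1:

```python
def pack_layout_per_stream(layouts):
    """Pack up to four 16-bit layouts; stream_id 0 occupies bits 15..0."""
    if len(layouts) > 4:
        raise ValueError("bmLayoutPerStream holds at most four streams")
    value = 0
    for stream_id, layout in enumerate(layouts):
        if not 0 <= layout <= 0xFFFF:
            raise ValueError(f"layout for stream {stream_id} does not fit in 16 bits")
        value |= layout << (16 * stream_id)
    return value

# VP8 example: streams 0 and 1 use 0x01EB; streams 2 and 3 stay disabled (0)
vp8_layout = pack_layout_per_stream([0x01EB, 0x01EB])
print(f"0x{vp8_layout:016X}")                 # 0x0000000001EB01EB
print(vp8_layout.to_bytes(8, "little").hex(" "))  # eb 01 eb 01 00 00 00 00
```

The little-endian byte order in the second line is how the field appears at offset 40 of the Probe and Commit data.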
Host to device:
loop [each VSI]:
  VS_PROBE_CONTROL(SET_CUR)
  VS_PROBE_CONTROL(GET_CUR)
  VS_COMMIT_CONTROL(SET_CUR)
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0380 (temporal_id = 7 (wildcard), stream_id = 0)
EU_VIDEO_RESOLUTION_CONTROL(SET_CUR): wWidth = 1280, wHeight = 720
EU_AVERAGE_BITRATE_CONTROL(SET_CUR): dwAverageBitRate = 1,000,000
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x0780 (temporal_id = 7 (wildcard), stream_id = 1)
EU_VIDEO_RESOLUTION_CONTROL(SET_CUR): wWidth = 640, wHeight = 360
EU_AVERAGE_BITRATE_CONTROL(SET_CUR): dwAverageBitRate = 400,000
loop [each VSI]:
  SET_INTERFACE(1)
STREAMING
EU_SELECT_LAYER_CONTROL(SET_CUR): wLayerOrViewId = 0x1C80 (temporal_id = 1, stream_id = 7 (wildcard))
EU_START_OR_STOP_LAYER_CONTROL(SET_CUR): bUpdate = 0
CT_REGION_OF_INTEREST(SET_CUR): wROI_Top = 480, wROI_Left = 720, wROI_Bottom = 720,
  wROI_Right = 1280, bmAutoControls = 0x80 (HQ)
Figure 2-3 Sequence for Configuring Preview Stream and Two Simulcast Streams.
===>Device Descriptor<===
bLength: 0x12
bDescriptorType: 0x01
bcdUSB: 0x0200
bDeviceClass: 0xEF This is a Multi-interface Function Code Device
bDeviceSubClass: 0x02 This is the Common Class Sub Class
bDeviceProtocol: 0x01 This is the Interface Association Descriptor protocol
bMaxPacketSize0: 0x40 = (64) Bytes
idVendor: 0x046D
idProduct: 0x0823
bcdDevice: 0x0010
iManufacturer: 0x00
iProduct: 0x00
iSerialNumber: 0x00
bNumConfigurations: 0x02
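The dump above is the 18-byte standard USB device descriptor. A minimal parsing sketch, assuming the standard USB 2.0 device descriptor layout; parse_device_descriptor and uses_iad are hypothetical helpers, and the raw bytes are rebuilt from the dump rather than read from a device. The class triple 0xEF/0x02/0x01 is the one defined for devices that use Interface Association Descriptors.

```python
import struct

# USB 2.0 standard device descriptor, little-endian, 18 bytes
DEVICE_DESCRIPTOR_FORMAT = "<BBHBBBBHHHBBBB"
FIELDS = ("bLength", "bDescriptorType", "bcdUSB", "bDeviceClass",
          "bDeviceSubClass", "bDeviceProtocol", "bMaxPacketSize0",
          "idVendor", "idProduct", "bcdDevice", "iManufacturer",
          "iProduct", "iSerialNumber", "bNumConfigurations")

def parse_device_descriptor(data: bytes) -> dict:
    if len(data) < struct.calcsize(DEVICE_DESCRIPTOR_FORMAT):
        raise ValueError("device descriptor must be 18 bytes")
    return dict(zip(FIELDS, struct.unpack_from(DEVICE_DESCRIPTOR_FORMAT, data)))

def uses_iad(desc: dict) -> bool:
    # Multi-interface function class codes for Interface Association Descriptors
    triple = (desc["bDeviceClass"], desc["bDeviceSubClass"], desc["bDeviceProtocol"])
    return triple == (0xEF, 0x02, 0x01)

# Bytes corresponding to the dump above
raw = struct.pack(DEVICE_DESCRIPTOR_FORMAT, 0x12, 0x01, 0x0200, 0xEF, 0x02, 0x01,
                  0x40, 0x046D, 0x0823, 0x0010, 0x00, 0x00, 0x00, 0x02)
desc = parse_device_descriptor(raw)
print(desc["bNumConfigurations"], uses_iad(desc))  # 2 True
```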
===>Configuration Descriptor<===
bLength: 0x09
bDescriptorType: 0x02
wTotalLength: 0x0CC2
bNumInterfaces: 0x04
bConfigurationValue: 0x01 (first configuration)
iConfiguration: 0x00
bmAttributes: 0x80
MaxPower: 0xFA
bInCollection: 0x01
baInterfaceNr[1]: 0x01
===>Configuration Descriptor<===
bLength: 0x09
bDescriptorType: 0x02
wTotalLength: 0x0CC2
bNumInterfaces: 0x04
bConfigurationValue: 0x02 (second configuration)
iConfiguration: 0x00
bmAttributes: 0x80
MaxPower: 0xFA