# Absolute Capture Time

The Absolute Capture Time extension is used to stamp RTP packets with an NTP
timestamp showing when the first audio or video frame in a packet was originally
captured. The intent of this extension is to provide a way to accomplish
audio-to-video synchronization when RTCP-terminating intermediate systems (e.g.
mixers) are involved.

**Name:**
"Absolute Capture Time"; "RTP Header Extension for Absolute Capture Time"

**Formal name:**
<http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time>

**Status:**
This extension is defined here to allow for experimentation. Once experience has
shown that it is useful, we intend to make a proposal based on it for
standardization in the IETF.

Contact <chxg@google.com> for more info.

## RTP header extension format

### Data layout overview
Data layout of the shortened version of `abs-capture-time` with a 1-byte header +
8 bytes of data:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ID   | len=7 |     absolute capture timestamp (bit 0-23)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             absolute capture timestamp (bit 24-55)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |
    +-+-+-+-+-+-+-+-+

Data layout of the extended version of `abs-capture-time` with a 1-byte header +
16 bytes of data:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ID   | len=15|     absolute capture timestamp (bit 0-23)     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             absolute capture timestamp (bit 24-55)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |   estimated capture clock offset (bit 0-23)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           estimated capture clock offset (bit 24-55)          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ... (56-63)  |
    +-+-+-+-+-+-+-+-+

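
As a rough illustration of the two layouts above, the following Python sketch
parses a single extension element into its fields. It assumes network byte
order (big-endian), as is conventional for RTP, and that the `len` field holds
the data length minus one, as in the one-byte header format of RFC 8285. The
names `AbsCaptureTime` and `parse_abs_capture_time_element` are hypothetical and
not taken from any library.

```python
import struct
from typing import NamedTuple, Optional, Tuple


class AbsCaptureTime(NamedTuple):
    # Raw UQ32.32 value (unsigned 64-bit integer).
    absolute_capture_timestamp: int
    # Raw signed Q32.32 value, or None for the shortened version.
    estimated_capture_clock_offset: Optional[int]


def parse_abs_capture_time_element(element: bytes) -> Tuple[int, AbsCaptureTime]:
    """Parse one element: a 1-byte ID/len header plus 8 or 16 data bytes.

    Assumption: both fields are stored in big-endian (network) byte order.
    """
    if not element:
        raise ValueError("empty extension element")
    ext_id = element[0] >> 4            # upper 4 bits: negotiated extension ID
    data_len = (element[0] & 0x0F) + 1  # lower 4 bits: data length minus one
    data = element[1:1 + data_len]
    if len(data) != data_len:
        raise ValueError("truncated extension element")
    if data_len == 8:   # shortened version (len=7)
        (timestamp,) = struct.unpack("!Q", data)
        return ext_id, AbsCaptureTime(timestamp, None)
    if data_len == 16:  # extended version (len=15)
        timestamp, offset = struct.unpack("!Qq", data)
        return ext_id, AbsCaptureTime(timestamp, offset)
    raise ValueError(f"unexpected abs-capture-time data length: {data_len}")
```

The values are kept as raw fixed-point integers here so that no precision is
lost before the receiver decides how to use them.
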
### Data layout details
#### Absolute capture timestamp

Absolute capture timestamp is the NTP timestamp of when the first frame in a
packet was originally captured. This timestamp MUST be based on the same clock
as the clock used to generate NTP timestamps for RTCP sender reports on the
capture system.

It's not always possible to do an NTP clock readout at the exact moment a media
frame is captured. A capture system MAY postpone the readout until a more
convenient time. A capture system SHOULD have known delays (e.g. from hardware
buffers) subtracted from the readout to make the final timestamp as close to the
actual capture time as possible.

This field is encoded as a 64-bit unsigned fixed-point number with the high 32
bits for the timestamp in seconds and low 32 bits for the fractional part. This
is also known as the UQ32.32 format and is what the RTP specification defines as
the canonical format to represent NTP timestamps.

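
To make the UQ32.32 encoding concrete, here is a minimal Python sketch that
converts between Unix time in seconds and the 64-bit field value. The function
names are hypothetical. The sketch assumes a non-negative Unix time and ignores
NTP era rollover; the only external fact it relies on is the 2,208,988,800
second gap between the NTP epoch (1900) and the Unix epoch (1970).

```python
import math

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA_S = 2_208_988_800


def unix_seconds_to_uq32_32(unix_seconds: float) -> int:
    """Encode a non-negative Unix time in seconds as an NTP UQ32.32 value."""
    whole = math.floor(unix_seconds)
    fraction = unix_seconds - whole
    ntp_seconds = whole + NTP_UNIX_EPOCH_DELTA_S
    # Addition (not bitwise OR) lets a fraction that rounds up to 1.0 carry
    # into the seconds part.
    return ((ntp_seconds << 32) + round(fraction * (1 << 32))) & 0xFFFFFFFFFFFFFFFF


def uq32_32_to_unix_seconds(value: int) -> float:
    """Decode an NTP UQ32.32 value into Unix time in seconds."""
    seconds = value >> 32                        # high 32 bits: whole seconds
    fraction = (value & 0xFFFFFFFF) / (1 << 32)  # low 32 bits: fractional part
    return (seconds - NTP_UNIX_EPOCH_DELTA_S) + fraction
```

Converting to a float is convenient for display but loses some of the 32-bit
fractional precision, so arithmetic on timestamps is better done on the raw
integers.
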
#### Estimated capture clock offset

Estimated capture clock offset is the sender's estimate of the offset between
its own NTP clock and the capture system's NTP clock. The sender is here defined
as the system that owns the NTP clock used to generate the NTP timestamps for
the RTCP sender reports on this stream. The sender system is typically either
the capture system or a mixer.

This field is encoded as a 64-bit two’s complement **signed** fixed-point number
with the high 32 bits for the seconds and low 32 bits for the fractional part.
It’s intended to make it easy for a receiver that knows how to estimate the
sender system’s NTP clock to also estimate the capture system’s NTP clock:

    Capture NTP Clock = Sender NTP Clock + Capture Clock Offset

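
The formula above can be applied directly to the raw fixed-point values. The
following Python sketch is one way to do so; the helper names are hypothetical.
It assumes the offset may arrive as an unsigned 64-bit integer that still needs
to be reinterpreted as two’s complement, and that the result should wrap to 64
bits like the timestamp field itself.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def to_signed_64(raw: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's complement value."""
    return raw - (1 << 64) if raw & (1 << 63) else raw


def q32_32_to_seconds(offset: int) -> float:
    """Convert a signed Q32.32 offset to (possibly negative) seconds."""
    return offset / (1 << 32)


def estimate_capture_ntp(sender_ntp: int, capture_clock_offset: int) -> int:
    """Capture NTP Clock = Sender NTP Clock + Capture Clock Offset.

    sender_ntp is a UQ32.32 value, capture_clock_offset a signed Q32.32 value;
    the result is a UQ32.32 value.
    """
    return (sender_ntp + capture_clock_offset) & MASK_64
```

For example, a sender clock reading of 100 s combined with an offset of -0.5 s
yields a capture clock reading of 99.5 s.
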
### Further details

#### Capture system

A receiver MUST treat the first CSRC in the CSRC list of a received packet as if
it belongs to the capture system. If the CSRC list is empty, then the receiver
MUST treat the SSRC as if it belongs to the capture system. Mixers SHOULD put
the most prominent CSRC as the first CSRC in a packet’s CSRC list.

#### Intermediate systems

An intermediate system (e.g. mixer) MAY adjust these timestamps as needed. It
MAY also choose to rewrite the timestamps completely, using its own NTP clock as
reference clock, if it wants to present itself as a capture system for A/V-sync
purposes.

#### Timestamp interpolation

A sender SHOULD save bandwidth by not sending `abs-capture-time` with every
RTP packet. It SHOULD still send it at regular intervals (e.g. every second)
to help mitigate the impact of clock drift and packet loss. Mixers SHOULD always
send `abs-capture-time` with the first RTP packet after changing capture system.

A receiver SHOULD memorize the capture system (i.e. CSRC/SSRC), capture
timestamp, and RTP timestamp of the most recently received `abs-capture-time`
packet on each received stream. It can then use that information, in combination
with RTP timestamps of packets without `abs-capture-time`, to extrapolate
missing capture timestamps.

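
A minimal Python sketch of the receiver-side bookkeeping described above might
look as follows. It is not a reference implementation: the class and parameter
names are hypothetical, one instance is assumed per received stream, and the
RTP clock rate (`rtp_clock_rate_hz`, e.g. 90000 for video) is assumed to be
known from signaling. The capture system is taken to be the first CSRC, or the
SSRC if the CSRC list is empty, and no timestamp is produced when the capture
system differs from the memorized one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MASK_64 = 0xFFFFFFFFFFFFFFFF


@dataclass
class _Anchor:
    source: int             # capture system (first CSRC, else SSRC)
    capture_timestamp: int  # UQ32.32
    rtp_timestamp: int      # 32-bit RTP timestamp


class CaptureTimeExtrapolator:
    """Hypothetical per-stream helper that fills in missing capture times."""

    def __init__(self, rtp_clock_rate_hz: int) -> None:
        self._rate = rtp_clock_rate_hz
        self._anchor: Optional[_Anchor] = None

    def on_packet(
        self,
        ssrc: int,
        csrcs: Sequence[int],
        rtp_timestamp: int,
        capture_timestamp: Optional[int] = None,
    ) -> Optional[int]:
        source = csrcs[0] if csrcs else ssrc
        if capture_timestamp is not None:
            # Packet carries abs-capture-time: memorize it and use it as is.
            self._anchor = _Anchor(source, capture_timestamp, rtp_timestamp)
            return capture_timestamp
        anchor = self._anchor
        if anchor is None or anchor.source != source:
            return None  # nothing to extrapolate from
        # Signed RTP timestamp difference, tolerant of 32-bit wraparound.
        delta = (rtp_timestamp - anchor.rtp_timestamp) & 0xFFFFFFFF
        if delta >= 1 << 31:
            delta -= 1 << 32
        # Convert RTP ticks to Q32.32 seconds and add to the anchor.
        return (anchor.capture_timestamp + (delta << 32) // self._rate) & MASK_64
```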
Timestamp interpolation works well as long as NTP/RTP clock drift is reasonably
low, which is not always the case. A sender that detects a "jump" between its
NTP and RTP clock mappings SHOULD send `abs-capture-time` with the first RTP
packet after the jump.
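
The sender-side recommendations in this section can be combined into a single
decision per packet. The Python sketch below is one possible way to do that,
under stated assumptions: the class name, the one-second interval, and the
5 ms error threshold are hypothetical choices, and a "jump" is detected by
checking whether a receiver extrapolating from the last sent value would end up
too far from the true capture time. Capture times are passed as NTP seconds in
floating point for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class _LastSent:
    source: int            # capture system (first CSRC, else SSRC)
    rtp_timestamp: int     # 32-bit RTP timestamp
    capture_time_s: float  # NTP capture time in seconds


class AbsCaptureTimeSendPolicy:
    """Hypothetical helper deciding when to attach abs-capture-time."""

    def __init__(
        self,
        rtp_clock_rate_hz: int,
        max_interval_s: float = 1.0,  # assumed "regular interval"
        max_error_s: float = 0.005,   # assumed jump threshold
    ) -> None:
        self._rate = rtp_clock_rate_hz
        self._max_interval_s = max_interval_s
        self._max_error_s = max_error_s
        self._last: Optional[_LastSent] = None

    def _extrapolate(self, last: _LastSent, rtp_timestamp: int) -> float:
        # What a receiver would compute from the last sent extension.
        delta = (rtp_timestamp - last.rtp_timestamp) & 0xFFFFFFFF
        if delta >= 1 << 31:
            delta -= 1 << 32
        return last.capture_time_s + delta / self._rate

    def should_send(self, source: int, rtp_timestamp: int, capture_time_s: float) -> bool:
        last = self._last
        send = (
            last is None                                                    # first packet
            or source != last.source                                        # capture system changed
            or capture_time_s - last.capture_time_s >= self._max_interval_s  # periodic refresh
            or abs(self._extrapolate(last, rtp_timestamp) - capture_time_s)
            > self._max_error_s                                             # NTP/RTP mapping jumped
        )
        if send:
            self._last = _LastSent(source, rtp_timestamp, capture_time_s)
        return send
```

Tying the jump check to the receiver's extrapolation error means the extension
is resent exactly when the previously sent value would no longer give a usable
estimate.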