Investigation: Import VideoFrame from WebCodec to WebGPU #1380


Closed
shaoboyan opened this issue Jan 27, 2021 · 34 comments · Fixed by #4063
Labels: feature request (A request for a new GPU feature exposed in the API), investigation

Comments

@shaoboyan
Contributor

shaoboyan commented Jan 27, 2021

This is based on #1154 and focuses on uploading VideoFrame from WebCodecs, incorporating @Kangz's inputs.

Rationale

An important class of application that could use WebGPU when it is released is applications handling video on the Web. These applications increasingly need to manipulate video to add effects, but also to extract data from it through machine learning. An example is background replacement in Zoom video calls, which runs image processing to detect the background and then composites a replacement image over it.

Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform, and, more importantly, the list of planes of a VideoFrame. We could imagine that WebGPU combined with WebCodecs could enable amazing video applications.

Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, which uploads video contents to a GPUTexture. But upload performance is poor, because extra copies/transforms (e.g. ImageBitmap creation, plus at least one copy into the destination texture) are needed along the way.

In WebGL, the WEBGL_webcodecs_video_frame extension, which introduces a 0-copy upload path (in the hardware-decoder case) through the WebCodecs VideoFrame, shows better performance than "direct uploading" (the 1-copy path) in some cases (e.g. on bandwidth-constrained platforms).

So it is reasonable for WebGPU to adopt a similar import API to achieve an efficient upload path for interacting with WebCodecs.

Proposals for Import API

The purposes of this API are:

  • Provide an upload path from a WebCodecs VideoFrame into WebGPU with minimal copies/transforms.
  • Expose the planes of a VideoFrame so developers have the ability to handle a single plane (see feedback here and here).

The current IDL of VideoFrame exposes a lot of metadata about the frame content to JS developers:

[Exposed=(Window,Worker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              VideoFrameInit frameInit);

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

There are several basic ideas:

Per-Plane Per-Readonly-Texture

This API imports each plane of a VideoFrame as a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format, and the expected (read-only) usage to import a plane as a GPUTexture.
webidl:

interface GPUTextureImportOptions {
    // ReadOnly Usages
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoFrame plane format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame.plane as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture( GPUTextureImportDescriptor desc);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.
    // Users can get the default formats by calling getOptimalImportOptions.
    
    const plane0 = device.importTexture({
        source: videoFrame.planes[0],
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: "r8unorm" // compatible plane format for "I420"
    });
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });
    
    const plane0View = plane0.createView();
    const plane1View = plane1.createView();
    
    // Using shaders to access each plane and do some ops.
    
    // Destroy the plane textures to release the VideoFrame.
    plane0.destroy();
    plane1.destroy();
}

User needs to provide:

  • The VideoFrame object
  • The correct plane size (can be obtained from the VideoFrame)
  • A compatible texture format
  • The expected usage (a read-only one)

The VideoFrame will be 'locked' once any plane has been imported, and 'released' by calling GPUTexture.destroy() on all imported planes.

Pros:

  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
  • It is possible to call GPUTexture.destroy() eagerly.

Challenges:

  • It introduces implementation complexity for creating multiple GPUTextures wrapping individual planes of a native multi-planar texture (e.g. subresource state transitions).
  • Texture format reinterpretation may need some special restrictions.

Readonly Multiplanar Texture

This is very similar to the per-plane video import, but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users could create a texture view of a single plane by using an aspect for the new format.
webidl:

enum GPUTextureAspect {
    "all",
    "stencil-only",
    "depth-only",
    "plane0",
    "plane1",
    "plane2",
    "plane3" // Maximum plane number of multi-planar format is 4
};

enum GPUTextureFormat {
// Multi-planar texture formats
"I420",
...
};

interface GPUTextureImportOptions {
    // ReadOnly Usages
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoPixelFormat, which can be known from VideoFrame.format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture( GPUTextureImportDescriptor desc);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.
    
    const frame = device.importTexture({
        source: videoFrame,
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: videoFrame.format // assume it is 'I420'
    });
    
    const plane0View = frame.createView({aspect: 'plane0'});
    const plane1View = frame.createView({aspect: 'plane1'});
    
    // Using shaders to access each plane view and do some ops.
    
    // Destroy the texture to release the VideoFrame.
    frame.destroy();
}

The API imports a VideoFrame as a GPUTexture object with a multi-planar texture format. The user needs to provide:

  • The VideoFrame object.
  • The correct VideoFrame size (can be obtained from the VideoFrame)
  • A compatible multi-planar texture format
  • The expected usage (a read-only one)

The VideoFrame will be 'locked' once it has been imported, and 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.

Pros:

  • It is possible to call GPUTexture.destroy() eagerly.
  • The multi-planar formats would match the VideoFrame pixel format, which is a bit less confusing than the other alternatives.
  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.

Challenges:

  • The concept of multiplanar textures is exposed through the WebGPU API so they need to be well-specified and all operations made secure. This is a lot of work because the texture would have per-aspect mip size, weird copy restrictions, etc.
  • Users would likely not be able to create a multi-planar texture themselves (because the formats aren't supported universally) so they wouldn't be able to test their code with fake multiplanar textures.
  • Users need to be quite familiar with video pixel formats.

Per-Plane Per-Readonly-Texture-Views

This introduces some new APIs based on #1154 that import the GPUTextureSource but return multiple GPUTextureViews. An API is introduced to release the imported resource explicitly.

interface GPUTextureViewImportDescriptor {
    (VideoFrame as GPUTextureSource) source;
    sequence<GPUTextureFormat>? formats;
};

interface GPUDevice {
    GPUTextureViewImportDescriptor
        getOptGPUTextureViewImportDescriptor((VideoFrame as GPUTextureSource) source);
    FrozenArray<GPUTextureView> importTextureView(GPUTextureViewImportDescriptor desc);
    undefined releaseTextureSource((VideoFrame as GPUTextureSource) source);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420', so users can know the compatible plane formats.
    // Or users could call getOptGPUTextureViewImportDescriptor(videoFrame)
    // to get default view formats and choose shaders.
    
    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame});

    // Using shaders to access each plane view and do some ops.
    
    // Release the VideoFrame.
    device.releaseTextureSource(videoFrame);
}

The API imports a VideoFrame as several GPUTextureView objects. The user needs to provide:

  • VideoFrame object.

The VideoFrame will be 'locked' once it has been imported, and 'released' by calling releaseTextureSource with the VideoFrame object as a parameter.

Pros:

  • We have GPUTextureView which can be set in the bind group directly.
  • The concept of multi-planar textures isn't exposed to WebGPU.
  • Validation and implementation are not complex.

Challenges:

  • No support for copying from the texture views, no format reinterpretation, and no GPUTexture.destroy() to release resources.
  • The browser still needs to implement multi-planar format support internally.
@shaoboyan added the 'feature request' and 'investigation' labels on Jan 27, 2021
@Kangz
Contributor

Kangz commented Jan 27, 2021

Thanks for the detailed proposal! As you know, I think the best option is the last one, "Per-Plane Per-Readonly-Texture-Views", because it doesn't expose the complexity of multi-planar textures in WebGPU and doesn't require expensive emulation as different textures in the implementation.

Given the low-levelness of WebCodecs, can't we ask applications to provide the correct format for importTextureView? Even better, since the views can only be used as SAMPLED, can't we just guarantee that they are "GPUTextureSampleType.Float" and not require a format?

For the lifetime control, it seems we may not need GPUDevice.releaseTextureSource, because WebCodecs has its own lifetime control with VideoFrame.destroy(). We could specify things such that the underlying GPUTexture is destroyed when VideoFrame.destroy() is called.

@shaoboyan
Contributor Author

shaoboyan commented Jan 28, 2021

Given the low-levelness of WebCodecs, can't we ask applications to provide the correct format for importTextureView?

Yes, maybe getOptGPUTextureViewImportDescriptor could accept a VideoFrame from WebCodecs or an HTMLVideoElement and return the formats. But I still have concerns, because the user needs to know the video pixel format to know how to handle these views (e.g. choose the transform matrix), and it seems only VideoFrame can provide this metadata.

can't we just guarantee that they are a "GPUTextureSampleType.Float" and not require a format?

That may be possible, but my concern is that users might need to know the component info (whether the r value is valid or the rg values are valid). (But maybe users could learn this through the video pixel format from the VideoFrame.)

@Kangz
Contributor

Kangz commented Jan 28, 2021

Yes, maybe getOptGPUTextureViewImportDescriptor could accept a VideoFrame from WebCodecs or an HTMLVideoElement and return the formats. But I still have concerns, because the user needs to know the video pixel format to know how to handle these views (e.g. choose the transform matrix), and it seems only VideoFrame can provide this metadata.

For HTMLVideoElement there are more problems than just the format. Like you said the developer also needs to know the clipping, color transform, etc. That's why I think this solution should just be for VideoFrame. Either in the future HTMLVideoElement gains the ability to produce VideoFrames or we'll need a solution that's more like externalSamplerOES. WDYT?

@kainino0x
Contributor

We could specify things such that the underlying GPUTexture is destroyed when VideoFrame.destroy is called.

+1

That's why I think this solution should just be for VideoFrame.

+1

As you know I think the best option is the last one: "Per-Plane Per-Readonly-Texture-Views" because it doesn't expose the complexity of multi-planar textures in WebGPU and doesn't require expensive emulation as different textures in the implementation.

I'm not totally comfortable limiting the API to only what can be done with TextureViews (i.e. can't do copies). I'd like to understand this problem better - what expensive emulation would occur if Textures were provided instead? Of course expensive emulation defeats the purpose of this low-level API.

  • Is it that the texture formats don't actually represent the internal formats of the texture planes? If so, it makes good sense to only allow sampling (and only expose TextureViews) - and it's very convenient that GPUTextureBindingLayout doesn't require an actual format.
  • OTOH, if there really are concrete texture formats (in actual decoder memory!), like r8unorm/rg8snorm, I'd like to use them.
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });

nit: UV planes should be signed. I have no idea what their internal format is, though, so I don't know if it should be rg8snorm or something else.

@Kangz
Contributor

Kangz commented Jan 29, 2021

I'm not totally comfortable limiting the API to only what can be done with TextureViews (i.e. can't do copies). I'd like to understand this problem better - what expensive emulation would occur if Textures were provided instead? Of course expensive emulation defeats the purpose of this low-level API.

  • If we go with "Per-Plane Per-Readonly-Texture" then we have to pretend we have separate textures for each plane when in fact we have a single one. This causes a ton of complexity in implementations because they need to somehow virtualize the concept of texture. There's also the issue that D3D12 can only do combined memory barriers for the N planes of multi-planar textures so that's additional tricky magic to be done.
  • If we go with "Readonly Multiplanar Texture" then copies work and there is no magic in implementations, but we have to standardize multi-planar formats in this group. It is a good potential long-term solution but I don't think there is appetite to standardize multi-planar textures for v1 (especially since they aren't universally supported).

@kainino0x
Contributor

ACK, "Per-Plane Per-Readonly-Texture-Views" makes sense to me then.

@kvark
Contributor

kvark commented May 5, 2021

Thank you for writing this down, @shaoboyan ! I somehow missed this issue, and now I think we are close to the point of no return with the "fat sampler" concept. If WebCodec can be a superior solution in the longer term, while copyExternalImageXxx is fine for the short term, now is the time to act.

In your suggestions, generally, each frame the user is expected to do the following:

  1. get a VideoFrame
  2. create GPUTexture (or multiple ones)
  3. create GPUTextureViews
  4. create GPUBindGroup
  5. draw
  6. clean up properly

The "Per-Plane Per-Readonly-Texture-Views" removes point (2) here. I wonder if we could go further and also remove (3) and (4)? For example the API might look like this:

dictionary GPUVideoPlaneEntryLayout {
    required GPUIndex32 binding;
    GPUTextureFormat format;
};
dictionary GPUVideoFrameBindGroupDescriptor {
    VideoFrame frame;
    sequence<GPUVideoPlaneEntryLayout> entries;
};
interface GPUDevice {
    GPUBindGroup createVideoFrameBindGroup(GPUVideoFrameBindGroupDescriptor desc);
 };

We can call this "Whole Bind Group". I think it's the shortest path to what we are trying to expose here.

Alternatively, if we want to allow copies, "Readonly Multiplanar Texture" looks good to me.

@Kangz
Contributor

Kangz commented May 5, 2021

I'm not sure what additional ease of use the GPUVideoFrameBindGroupDescriptor gives us, though. It's trivially polyfillable and just adds more concepts to the API, no?

@kvark
Contributor

kvark commented May 5, 2021

It doesn't add more concepts. All of the proposals here add more dictionaries, and one of the proposals adds more concepts (multi-planar textures). The "Per-Plane Per-Readonly-Texture-Views" proposal also adds 2 entry points to the device (not counting the release), instead of 1.
It's technically very similar. It just feels weird to require the user to create a bunch of stuff every frame (views, textures, bind groups) when generally our message is that resource creation should not be high frequency.

A more important question is: how do you feel about waiting for WebCodecs? It seems to be roughly on the same timeline as WebGPU, and there is interest.
Going with the "fat sampler" approach is slightly less risky, but it's not like that approach is well tested either: if I understand correctly, it's too early to consider any feedback from the WebGL side of it. I'm a bit concerned that it's not scalable, i.e. we'd have to consider "but what about the fat sampler?" for each and every texture/sampler operation we add to WGSL, like the gather instructions. And it's not the end game either, since WebCodecs is simply more powerful.

@Kangz
Contributor

Kangz commented May 5, 2021

Just feels weird to require the user to create a bunch of stuff every frame (views, textures, bind groups), when generally our message should be that resource creation should not be high frequency.

Views and bind groups should be cheap, and I think it's OK for developers to create them each frame (within reason; they're not going to have 100s of different video frames to wrap each frame). And the texture is obtained through an "import", so it's not really a texture creation.

A more important question is: how do you feel about waiting for the Web Codecs? They seem to be roughly on the same timeline as WebGPU, and there is interest.

That was our (Chromium contributors including @shaoboyan) initial idea, but AFAIK Firefox hasn't implemented WebCodecs and WebKit isn't participating in WebCodecs at all. Chromium is the only browser where WebCodecs is likely to ship before WebGPU, so I don't think it's a good idea to gate the powerful "video" interop capability on that API. However, we should eventually specify what happens when interoperating WebCodecs and WebGPU in browsers that implement both.

@kvark
Contributor

kvark commented May 5, 2021

The situation may be a bit different now. Firefox is definitely interested in implementing WebCodecs this year (at least in Nightly), and Apple is participating in the discussions.
It would be so nice to avoid baking the color transformations, rotation, clipping, multi-planar fetching, etc. into the implementation, since users can do all of it themselves (if they can access the planes directly), and better than us.

@kainino0x
Contributor

WebCodecs is a huge dependency for us to take for WebGPU. If we were to only allow WebCodecs VideoFrames in importExternalTexture, then I think we would need a different pre-WebCodecs solution, like #1647 (which I'm not extremely fond of).

That said, the semantics of the API would be much less nuanced if we only allowed imports from WebCodecs, so there is certainly appeal to it.

It's only been a few months since we discussed this, but it would still be good to hear from @litherum whether Apple would be okay with taking that dependency, since I don't know their current position on WebCodecs.

@shaoboyan
Contributor Author

@kvark Thanks for coming back to this.
Personally, I think this API is an extension-like thing (same as in WebGL). HTMLVideoElement is well supported in all browsers, so I think we need to handle it properly in WebGPU. WebCodecs sounds like a powerful tool for professional media developers, but it will also add overhead for normal developers (e.g. TF.js framework developers only want an RGBA result from multiple plane textures), so I think an extension may be the best position for this proposal :).

@kdashg
Contributor

kdashg commented May 6, 2021

To put this into context: today, browsers internally (often) use something like WebCodecs to sample from YUV planes for videos. However, not even our own browsers handle all video inputs correctly, leading to incorrect and non-portable results. (I have been fixing full-range video in Firefox over the last few weeks, except in h264 mp4s, which need yet more fixes.) This is such a "sharp" tool that we would be expecting apps to do an even better job of handling videos than our browsers do today.

Furthermore, browsers' decode subsystems (and OSes!) don't necessarily give back the same plane configuration for the same video. This is a pretty extreme portability concern!

@kvark
Contributor

kvark commented May 6, 2021

@jdashg this is a fair argument, and it's an argument against WebCodecs in general, even without any relation to WebGPU.
It feels to me that this isn't our group's decision to make - whether or not WebCodecs is portable enough. The relevant W3C group should figure this out, possibly with your input, and communicate the decision. If the portability story is that bad, why does Chrome implement it today, and why is Firefox willing to implement it? This doesn't match up.

@kainino0x
Contributor

It's not that the Web can't have less-portable APIs (in our opinion), it's that WebGPU shouldn't force users to use them. If WebCodecs is the only way to get zero-copy uploads into WebGPU, we're forcing developers to choose between (a) extra copy overhead and (b) significant additional engineering effort AND thorough testing of their video upload paths across more devices and browsers.

@kvark
Contributor

kvark commented May 6, 2021

Suppose we are in a world where WebCodecs is available. We can either A) provide a fat-sampler mechanism, handle the WebCodecs frames internally, and handle all the transformations that need to happen (multi-planar resolution, color space conversion, clipping, rotation, etc.), leaving those who want more control without options; or B) expose the mechanism of importing WebCodecs planes, and let user space figure out the rest.

What stops us from developing a small user-space library that does the proper checks for plane formats, generates shader code, bindings, etc? Basically, everything that the browser is expected to do with #1666, we could do in this library, thus addressing the "hard to use" concern.

And there is always going to be an easy path for those who don't want to bother with zero-copy: just use copyExternalImageToTexture (which would need to support videos).

@kdashg
Contributor

kdashg commented May 6, 2021

You all know I am generally a fan of "just do it in user space", so I hope my resistance here underlines the concerns I have. There are things to punt to user space, and there are things that are Surprisingly Hard that we should try to make much harder to mess up.

@kdashg
Contributor

kdashg commented May 6, 2021

Unfortunately, none of this can be quantified, so I can only say (given my experience with what does and doesn't work in APIs, and also in this subfield of video decoding) that I think we have a duty to make something safer than the web-codec path but still with great perf characteristics.

Also, from my experience when we tried to do something similar by designing around ImageBitmap as The New Hotness for WebGL: it just didn't pan out, and years on, it's still not the best way to deal with texture uploads, except for certain cases in Chrome.

@Kangz
Contributor

Kangz commented Jun 14, 2021

This issue is listed as "needs discussion" but I don't really see what aspect should be discussed in the WebGPU call. What's needed to make progress?

@kvark
Contributor

kvark commented Jun 14, 2021

Action item for myself: figure out how WebCodecs would deal with newly added formats. How that would or would not break the applications.

@kainino0x
Contributor

I think it's the converse question of: w3c/webcodecs#67
That asks: When the app creates a VideoFrame, how does it know what PixelFormats the UA can handle?
We want to know: When the UA creates a VideoFrame, how does it know what PixelFormats the app can handle?

The current spec answer appears to be "Doesn't matter, we only have one PixelFormat". However I'm sure someone has thought about this.

@Kangz added this to the V1.0 milestone on Sep 2, 2021
@kainino0x
Contributor

FTR: Specification being discussed in #2124.

@alexkarpenko

alexkarpenko commented Nov 21, 2021

As someone who's building a user-space application atop WebCodecs, I'd like to add a vote for option B) expose the mechanism of importing WebCodecs planes, and let user-space figure out the rest.

The reasons are:

  1. I often want access to just the grayscale Y plane for video processing (e.g., feature tracking). I don't want a black-box API that does color conversions which I do not want or need.
  2. There are many color space and format combinations. New ones are added frequently in OS upgrades. I can add support for them faster than it would take all the browser teams (Firefox + Chrome + Safari) to add and ship support, especially when it comes to more esoteric formats such as ProRes. So as long as the browser exposes a description of the frame (clipping, transform matrix, color profile, plane count, and format type), I can do the rest.
  3. The browser should expose a thin API on top of the hardware decoder API (which is fairly uniform across platforms). Convenience conversion functions should be written in user space and polyfilled quickly without depending on browser updates. This lets me ship to users faster and provide a consistent experience across browsers, since I do not depend on the browser teams getting it right inside their own black box.

@kainino0x
Contributor

1. I often want access to just the grayscale Y plane for video processing (e.g., feature tracking). I don't want a blackbox API that does color conversions for me which I do not want or need.

The proposal is to add a "give me Y only" function, although we've punted for now on actually figuring out how to specify it. See #1681.

Re: 2/3, low-level access like this is of course going to be predicated on WebCodecs being adopted in Firefox/Safari in the first place. We're not at that point yet, so I think we need the simplest possible API to start.

Can WebCodecs really expose hardware decoding of new formats like ProRes without any browser implementation work? I don't know what the extensibility story is for WebCodecs but right now I think it's slightly restricted to what's needed for the majority of cases. (Maybe that's not true anymore, it's been a while since I looked.)

@alexkarpenko

Thanks for pointing to #1681. Reading through that issue, it isn't quite what I'd want, as it again implies potential color conversions to guarantee a consistent luma/luminance result. What I'd want instead is the ability to 0-copy map the decoder frame's planes directly onto GPU textures, and then do any processing that's needed in user space.

This capability is provided by at least the VideoToolbox (macOS, iOS) and MediaCodec (Android) APIs. Can't speak to Windows as I've no experience there, but I imagine it behaves similarly.

As for more esoteric formats like ProRes, this could be facilitated through a platform-specific extension API. Where platform-specific keys are passed in an extension dictionary. I agree that this can be punted on for later though.

@kainino0x
Contributor

Thanks, that's a useful perspective on it. (One reason we didn't standardize a luma/luminance thing yet is that we needed more info on the use cases.) I was trying to start out with a more color-aware solution under the assumption that arithmetic conversions would be quite cheap compared to loading the U/V planes. But if there's really no use to a color-aware solution then we would probably want to jump directly to a low-level one with an explicit Y-plane of a WebCodecs VideoFrame if possible.

@alexkarpenko

alexkarpenko commented Nov 23, 2021

Yep. Just to make it a little more concrete. Here is the Apple API that maps an IOSurfaceRef (the object that holds the decoded pixel data as it comes directly out of the VideoToolbox API) to a MTLTexture (a Metal texture): https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc

Note that you specify the plane you want to map. And you get a MTLTexture with the same number of channels as the IOSurfaceRef (e.g., R8 for Y, RG8 for UV, BGRA10_2 for wide gamut, etc). This is essentially guaranteed to be 0-copy. The MTLTexture directly references the pixel data written by the decoder when doing texture sampling. It is the most efficient way to get video frames from the decoder to the GPU for processing. Any additional color conversions would require an extra render-pass and texture allocation, which are often undesirable for 4K video or high-performance video processing.

On Android this is accomplished via mapping AHardwareBuffer (what's produced by the AMediaCodec decoder) onto a Vulkan texture: https://developer.android.com/ndk/reference/group/a-hardware-buffer

Note how many different hardware buffer formats there are in the above API docs. Fortunately, these plane formats are already enumerated in the WebGPU spec. So IMO all the browser should do internally is provide an API to map the decoded frame plane onto the matching WebGPU texture (implemented internally via the above 0-copy APIs), expose the frame's color space, crop & matrix transform metadata, and leave the rest to the user.

@kdashg modified the milestones: V1.0 → post-V1 on Dec 2, 2021
@Kangz modified the milestones: post-V1 → Polish post-V1 on Apr 14, 2022
@Kangz
Contributor

Kangz commented Feb 23, 2023

@litherum putting this up for discussion in Milestone 2, since Safari "Added video-only support for Web Codecs." in TP 164.

@Kangz modified the milestones: Polish post-V1 → Milestone 2 on Feb 23, 2023
@dalecurtis

w3c/webcodecs#83 is the inverse of this issue on the WebCodecs side. I.e., creation of a VideoFrame from WebGPU objects. Your input would be welcome for any considerations there!

@greggman
Contributor

greggman commented Apr 6, 2023

@dalecurtis

dalecurtis commented Apr 6, 2023

Yes, w3c/webcodecs#83 is about being able to directly get a frame from WebGPU objects, like this:

let frame = new VideoFrame(gpuTexture, {timestamp: 0});
let frame2 = new VideoFrame(gpuBuffer, {timestamp: 0});

Developers can already create a VideoFrame from canvas or use MediaStreamTrackProcessor to get a ReadableStream of VideoFrame objects, so it's not impossible to do this today; it just can't be done directly.

@kainino0x
Contributor

Commented on the webcodecs issue. Thanks for surfacing it!

@kdashg
Contributor

kdashg commented Apr 26, 2023

GPU Web 2023-04-19 Atlantic
  • CW: summary:
  • CW: 1) Source of importExternalTexture can be VideoFrame, too. Takes a union type.
  • CW: 2) Lifetime of external texture's tied to lifetime of VideoFrame. If you close the VideoFrame, it expires the external texture. External texture stays alive until you close the VideoFrame.
  • CW: we've implemented this in Chrome, it works, and is easier than HTMLVideoElement.
  • KR: Is there any autoexpiry of VideoFrame at the Web platform level? If you get the VideoFrame from a stream and drop it on the floor, does it expire?
  • KG: The decoders will deadlock if it doesn't expire. Some OSes' video decoders have a ring buffer of frames and block if you don't stop using one (they always go in order).
  • KN: You could let GC take care of them, it might work, but you should close them.
  • BJ: Question is, do we let the GC close the object? If yes, does that destroy the externalTexture?
  • KN: The ExternalTexture would ref the VideoFrame.
  • KR: That works, and avoids the ExternalTexture transitioning unexpectedly to destroyed.
  • KN: Yes, that would be exposing the GC.
  • MM: The interesting thing about videos is not the sequence of frames; it is that frames have timestamps associated with them. It is important to let the window server handle the progression, to avoid jank and to let the display adapt to the refresh rate of the video.
  • MM: So GPUExternalTexture doesn't get these benefits. The swapchain is presented each frame immediately. So for WebGPU we'd like a way to integrate something like that, either have a timestamp on the WebGPU canvas, or have a way to put the result of WebGPU back in WebCodecs with a timestamp.
  • KG: you can do that. You can create a stream from your canvas.
  • MM: yes, but that loses the presentation time.
  • KG: do you need that now?
  • MM: no, but should come eventually.
  • KN: pretty sure you can already do this. Can create VideoFrame from canvas, turn into MediaStream, and the videoframe has a timestamp associated. Looks like you can specify a timestamp upon construction.
  • RC: not only are timestamps important for MM's reason - they're important for battery life too. We want to wake up at the video framerate, not the display refresh rate, and sleep for most frames.
  • CW: think we all agree on the use cases/optimizations described. The integration of WebGPU with WebCodecs doesn't need to worry about this.
  • KG: this is about ingestion into - not integration with - WebGPU. Just pulling in WebCodecs frames is more limited in scope than the timestamps question.
  • CW: WebGPU producing WebCodecs frames already works because of the Canvas API. Have prototyped that flow - it works. WebGPU's just a computation block in this model.
  • KG: main point - this issue is only about pulling in VideoFrames. That's something we can more easily talk about, rather than use cases for e.g. timestamps, which are important but which should be discussed in other issues.
  • KG: the proposal above makes sense.
  • KG: when the VideoFrame's closed, it invalidates the external texture?
  • CW: yes.
  • KG: seems reasonable.
  • CW: suggest that Chromium folks can close this issue, open a new one with just the description of what we discussed today. Either issue or PR form. Discuss more there.
