Investigation: Import VideoFrame from WebCodec to WebGPU #1380


Closed
shaoboyan opened this issue Jan 27, 2021 · 34 comments · Fixed by #4063
Labels: feature request (A request for a new GPU feature exposed in the API), investigation

Comments

@shaoboyan
Contributor

shaoboyan commented Jan 27, 2021

This is based on #1154 and focuses on uploading VideoFrame from WebCodecs, incorporating @Kangz's inputs.

Rationale

An important class of application that could use WebGPU when it is released is applications handling video on the Web. These applications increasingly need to manipulate video to add effects, but also to extract data from it through machine learning. An example is background replacement in Zoom video calls, which runs image processing to detect the background and then composites a replacement image over it.

Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform, and, more importantly, the list of planes of a VideoFrame. We could imagine that WebGPU combined with WebCodecs could enable amazing video applications.

Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, which uploads video contents to a GPUTexture. But upload performance is poor, because extra copies/transforms (e.g. ImageBitmap creation, plus at least one copy into the destination texture) are needed along the way.

In WebGL, the WEBGL_webcodecs_video_frame extension, which introduces a 0-copy upload path (in the hardware-decoder case) through the WebCodecs VideoFrame, shows better performance than "direct uploading" (the 1-copy path) in some cases (e.g. on bandwidth-constrained platforms).

So it is reasonable for WebGPU to adopt a similar import API to achieve an efficient upload path for interacting with WebCodecs.

Proposals for Import API

The purposes of this API are:

  • Provide an upload path from a WebCodecs VideoFrame into WebGPU with minimal copies/transforms.
  • Expose the planes of a VideoFrame so developers have the ability to handle a single plane (see feedback here and here).

The current IDL of VideoFrame exposes a lot of metadata about the frame content to JS developers:

[Exposed=(Window,Worker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              VideoFrameInit frameInit);

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

There are several basic ideas:

Per-Plane Per-Readonly-Texture

This API imports each plane of a VideoFrame as a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format, and the expected (read-only) usage to import a plane as a GPUTexture.
webidl:

interface GPUTextureImportOptions {
    // ReadOnly Usages
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoFrame plane format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame.plane as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture( GPUTextureImportDescriptor desc);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.
    // Users can get the default formats by calling getOptimalImportOptions.
    
    const plane0 = device.importTexture({
        source: videoFrame.planes[0],
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: "r8unorm" // compatible plane format for "I420"
    });
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });
    
    const plane0View = plane0.createView();
    const plane1View = plane1.createView();
    
    // Using shaders to access each plane and do some ops.
    
    // Destroy the plane textures to release the VideoFrame.
    plane0.destroy();
    plane1.destroy();
}

User needs to provide:

  • The VideoFrame object
  • The correct plane size (can be obtained from the VideoFrame)
  • A compatible texture format
  • The expected usage (a read-only one)

The VideoFrame will be 'locked' once any plane has been imported, and 'released' by calling GPUTexture.destroy() on all imported planes.

Pros:

  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
  • It is possible to call GPUTexture.destroy() eagerly.

Challenges:

  • It introduces implementation complexity for creating multiple GPUTextures wrapping individual planes of a native multi-planar texture (e.g. subresource state transitions).
  • Texture format reinterpretation may need some special restrictions.

Readonly Multiplanar Texture

This is very similar to the per-plane video import, but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users could create a texture view of a single plane by using an aspect for the new format.
webidl:

enum GPUTextureAspect {
    "all",
    "stencil-only",
    "depth-only",
    "plane0",
    "plane1",
    "plane2",
    "plane3" // Maximum plane number of multi-planar format is 4
};

enum GPUTextureFormat {
// Multi-planar texture formats
"I420",
...
};

interface GPUTextureImportOptions {
    // ReadOnly Usages
    GPUTextureUsage usage;
    // Needs to be compatible with the VideoPixelFormat, which can be known from VideoFrame.format
    GPUTextureFormat format;
};

interface GPUTextureImportDescriptor : GPUTextureImportOptions {
    (VideoFrame as GPUTextureSource) source;
    GPUExtent3D size; // Can always be known from VideoFrame
};

interface GPUDevice {
    GPUTexture importTexture( GPUTextureImportDescriptor desc);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420'.
    
    const frame = device.importTexture({
        source: videoFrame,
        size: [videoFrame.codedWidth, videoFrame.codedHeight],
        usage: GPUTextureUsage.SAMPLED,
        format: videoFrame.format // assume it is 'I420'
    });
    
    const plane0View = frame.createView({aspect: 'plane0'});
    const plane1View = frame.createView({aspect: 'plane1'});
    
    // Using shaders to access each plane view and do some ops.
    
    // Destroy the texture to release the VideoFrame.
    frame.destroy();
}

The API imports a VideoFrame as a GPUTexture object with a multi-planar texture format. The user needs to provide:

  • The VideoFrame object.
  • The correct VideoFrame size (can be obtained from the VideoFrame)
  • A compatible multi-planar texture format
  • The expected usage (a read-only one)

The VideoFrame will be 'locked' once it has been imported, and 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.

Pros:

  • It is possible to call GPUTexture.destroy() eagerly.
  • The multi-planar formats would match the VideoFrame pixel format, which is a bit less confusing than the other alternatives.
  • Users have GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.

Challenges:

  • The concept of multiplanar textures is exposed through the WebGPU API so they need to be well-specified and all operations made secure. This is a lot of work because the texture would have per-aspect mip size, weird copy restrictions, etc.
  • Users would likely not be able to create a multi-planar texture themselves (because the formats aren't supported universally) so they wouldn't be able to test their code with fake multiplanar textures.
  • Users need to be quite familiar with video pixel formats.

Per-Plane Per-Readonly-Texture-Views

This introduces some new APIs based on #1154 that import the GPUTextureSource but return multiple GPUTextureViews. An API is introduced to release the imported resource explicitly.

interface GPUTextureViewImportDescriptor {
    (VideoFrame as GPUTextureSource) source;
    sequence<GPUTextureFormat>? formats;
};

interface GPUDevice {
    GPUTextureViewImportDescriptor
        getOptGPUTextureViewImportDescriptor((VideoFrame as GPUTextureSource) source);
    FrozenArray<GPUTextureView> importTextureView(GPUTextureViewImportDescriptor desc);
    undefined releaseTextureSource((VideoFrame as GPUTextureSource) source);
 };

Using this API to import a VideoFrame into WebGPU could look like:

function frame() {
    const videoFrame = getSomeWebCodecVideoFrame();
    // videoFrame.format tells us that this frame is 'I420', so users can know the compatible plane formats.
    // Or users could call getOptGPUTextureViewImportDescriptor(videoFrame)
    // to get default view formats and choose shaders.
    
    const [plane0View, plane1View] = device.importTextureView({ source: videoFrame});

    // Using shaders to access each plane view and do some ops.
    
    // Release the VideoFrame.
    device.releaseTextureSource(videoFrame);
}

The API imports a VideoFrame as several GPUTextureView objects. The user needs to provide:

  • VideoFrame object.

The VideoFrame will be 'locked' once it has been imported, and 'released' by calling releaseTextureSource with the VideoFrame object as a parameter.

Pros:

  • We have GPUTextureView which can be set in the bind group directly.
  • The concept of multi-planar textures isn't exposed to WebGPU.
  • Validation and implementation are not complex.

Challenges:

  • No support for copying from the texture views, no format reinterpretation, and no GPUTexture.destroy() to release resources.
  • The browser still needs to implement multi-planar format support internally.
@shaoboyan added the 'feature request' and 'investigation' labels on Jan 27, 2021
@Kangz
Contributor

Kangz commented Jan 27, 2021

Thanks for the detailed proposal! As you know, I think the best option is the last one, "Per-Plane Per-Readonly-Texture-Views", because it doesn't expose the complexity of multi-planar textures in WebGPU and doesn't require expensive emulation as different textures in the implementation.

Given the low-levelness of WebCodecs, can't we ask applications to provide the correct format for importTextureView? Even better, since the views can only be used as SAMPLED, can't we just guarantee that they are "GPUTextureSampleType.Float" and not require a format?

For the lifetime control, it seems we may not need GPUDevice.releaseTextureSource, because WebCodecs has its own lifetime control with VideoFrame.destroy(). We could specify things such that the underlying GPUTexture is destroyed when VideoFrame.destroy() is called.

@shaoboyan
Contributor Author

shaoboyan commented Jan 28, 2021

Given the low-levelness of WebCodecs, can't we ask applications to provide the correct format for importTextureView?

Yes, maybe getOptGPUTextureViewImportDescriptor could accept a VideoFrame from WebCodecs or an HTMLVideoElement and return the formats. But I still have concerns, because the user needs to know the video pixel format to know how to handle these views (e.g. choose the transform matrix), and it seems only VideoFrame can provide this metadata.

can't we just guarantee that they are a "GPUTextureSampleType.Float" and not require a format?

That may be possible, but my concern is that users might need to know the component info (whether the r value is valid or the rg values are valid). (But maybe users could learn this through the video pixel format from the VideoFrame.)

@Kangz
Contributor

Kangz commented Jan 28, 2021

Yes, maybe getOptGPUTextureViewImportDescriptor could accept a VideoFrame from WebCodecs or an HTMLVideoElement and return the formats. But I still have concerns, because the user needs to know the video pixel format to know how to handle these views (e.g. choose the transform matrix), and it seems only VideoFrame can provide this metadata.

For HTMLVideoElement there are more problems than just the format. Like you said the developer also needs to know the clipping, color transform, etc. That's why I think this solution should just be for VideoFrame. Either in the future HTMLVideoElement gains the ability to produce VideoFrames or we'll need a solution that's more like externalSamplerOES. WDYT?

@kainino0x
Contributor

We could specify things such that the underlying GPUTexture is destroyed when VideoFrame.destroy is called.

+1

That's why I think this solution should just be for VideoFrame.

+1

As you know I think the best option is the last one: "Per-Plane Per-Readonly-Texture-Views" because it doesn't expose the complexity of multi-planar textures in WebGPU and doesn't require expensive emulation as different textures in the implementation.

I'm not totally comfortable limiting the API to only what can be done with TextureViews (i.e. can't do copies). I'd like to understand this problem better - what expensive emulation would occur if Textures were provided instead? Of course expensive emulation defeats the purpose of this low-level API.

  • Is it that the texture formats don't actually represent the internal formats of the texture planes? If so, it makes good sense to only allow sampling (and only expose TextureViews) - and it's very convenient that GPUTextureBindingLayout doesn't require an actual format.
  • OTOH, if there really are concrete texture formats (in actual decoder memory!), like r8unorm/rg8snorm, I'd like to use them.
    const plane1 = device.importTexture({
        source: videoFrame.planes[1],
        size: [videoFrame.codedWidth / 2, videoFrame.codedHeight / 2],
        usage: GPUTextureUsage.SAMPLED,
        format: "rg8unorm" // compatible plane format for "I420"
    });

nit: UV planes should be signed. I have no idea what their internal format is, though, so I don't know if it should be rg8snorm or something else.

@Kangz
Contributor

Kangz commented Jan 29, 2021

I'm not totally comfortable limiting the API to only what can be done with TextureViews (i.e. can't do copies). I'd like to understand this problem better - what expensive emulation would occur if Textures were provided instead? Of course expensive emulation defeats the purpose of this low-level API.

  • If we go with "Per-Plane Per-Readonly-Texture" then we have to pretend we have separate textures for each plane when in fact we have a single one. This causes a ton of complexity in implementations because they need to somehow virtualize the concept of texture. There's also the issue that D3D12 can only do combined memory barriers for the N planes of multi-planar textures so that's additional tricky magic to be done.
  • If we go with "Readonly Multiplanar Texture" then copies work and there is no magic in implementations, but we have to standardize multi-planar formats in this group. It is a good potential long-term solution but I don't think there is appetite to standardize multi-planar textures for v1 (especially since they aren't universally supported).

@kainino0x
Contributor

ACK, "Per-Plane Per-Readonly-Texture-Views" makes sense to me then.

@kvark
Contributor

kvark commented May 5, 2021

Thank you for writing this down, @shaoboyan ! I somehow missed this issue, and now I think we are close to the point of no return with the "fat sampler" concept. If WebCodec can be a superior solution in the longer term, while copyExternalImageXxx is fine for the short term, now is the time to act.

In your suggestions, generally, each frame the user is expected to do the following:

  1. get a VideoFrame
  2. create GPUTexture (or multiple ones)
  3. create GPUTextureViews
  4. create GPUBindGroup
  5. draw
  6. clean up properly

The "Per-Plane Per-Readonly-Texture-Views" removes point (2) here. I wonder if we could go further and also remove (3) and (4)? For example the API might look like this:

dictionary GPUVideoPlaneEntryLayout {
    required GPUIndex32 binding;
    GPUTextureFormat format;
};
dictionary GPUVideoFrameBindGroupDescriptor {
    VideoFrame frame;
    sequence<GPUVideoPlaneEntryLayout> entries;
};
interface GPUDevice {
    GPUBindGroup createVideoFrameBindGroup(GPUVideoFrameBindGroupDescriptor desc);
 };

We can call this "Whole Bind Group". I think it's the shortest path to what we are trying to expose here.

Alternatively, if we want to allow copies, "Readonly Multiplanar Texture" looks good to me.

@Kangz
Contributor

Kangz commented May 5, 2021

I'm not sure what additional ease of use the GPUVideoFrameBindGroupDescriptor gives us, though. It's trivially polyfillable and just adds more concepts to the API, no?

@kvark
Contributor

kvark commented May 5, 2021

It doesn't add more concepts. All of the proposals here add more dictionaries, and one of the proposals adds more concepts (multi-planar textures). The "Per-Plane Per-Readonly-Texture-Views" proposal also adds 2 entry points to the device (not counting the release), instead of 1.
It's technically very similar. It just feels weird to require the user to create a bunch of stuff every frame (views, textures, bind groups) when generally our message is that resource creation should not be high frequency.

A more important question is: how do you feel about waiting for WebCodecs? It seems to be roughly on the same timeline as WebGPU, and there is interest.
Going with the "fat sampler" approach is slightly less risky, but it's not like that approach is well tested either: if I understand correctly, it's too early to consider any feedback from the WebGL side of it. I'm a bit concerned that it's not scalable, i.e. we'd have to consider "but what about the fat sampler?" for each and every texture/sampler operation we add to WGSL, like the gather instructions. And it's not the end game either, since WebCodecs is simply more powerful.

@Kangz
Contributor

Kangz commented May 5, 2021

Just feels weird to require the user to create a bunch of stuff every frame (views, textures, bind groups), when generally our message should be that resource creation should not be high frequency.

Views and bind groups should be cheap, and I think it's OK for developers to create them each frame (within reason; they're not going to have 100s of different video frames to wrap each frame). And the texture is obtained through an "import", so it's not really a texture creation.

A more important question is: how do you feel about waiting for the Web Codecs? They seem to be roughly on the same timeline as WebGPU, and there is interest.

That was our (Chromium contributors including @shaoboyan) initial idea, but AFAIK Firefox hasn't implemented WebCodecs and WebKit isn't participating in WebCodecs at all. Chromium is the only browser where WebCodecs is likely to ship before WebGPU, so I don't think it's a good idea to gate the powerful "video" interop capability on that API. However, we should eventually specify what happens when interoperating WebCodecs and WebGPU in browsers that implement both.

@kvark
Contributor

kvark commented May 5, 2021

The situation may be a bit different now. Firefox is definitely interested in implementing WebCodecs this year (at least in Nightly), and Apple is participating in the discussions.
It would be so nice to avoid baking the color transformations, rotation, clipping, multi-planar fetching, etc. into the implementation, since users can do all of it themselves (if they can access the planes directly), and better than us.

@kainino0x
Contributor

WebCodecs is a huge dependency for us to take for WebGPU. If we were to only allow WebCodecs VideoFrames in importExternalTexture, then I think we would need a different pre-WebCodecs solution, like #1647 (which I'm not extremely fond of).

That said, the semantics of the API would be much less nuanced if we only allowed imports from WebCodecs, so there is certainly appeal to it.

It's only been a few months since we discussed this, but it would still be good to hear from @litherum whether Apple would be okay with taking that dependency, since I don't know their current position on WebCodecs.

@shaoboyan
Contributor Author

@kvark Thanks for coming back to this.
Personally, I think this API is an extension-like thing (same as in WebGL). HTMLVideoElement is well supported in all browsers, so I think we need to handle it properly in WebGPU. WebCodecs sounds like a powerful tool for professional media developers, but it will also add overhead for normal developers (e.g. TF.js framework developers only want an RGBA result from multiple plane textures), so I think an extension may be the best position for this proposal :).

@kdashg
Contributor

kdashg commented May 6, 2021

To put this into context: today, browsers internally (often) use something like WebCodecs to sample from YUV planes for videos. However, not even our own browsers handle all video inputs correctly, leading to incorrect and non-portable results. (I have been fixing full-range video in Firefox over the last few weeks, except in h264 mp4s, which need yet more fixes.) This is such a "sharp" tool that we would be expecting apps to do an even better job of handling videos than our browsers do today.

Furthermore, browsers' decode subsystems (and OSes!) don't necessarily give back the same plane configuration for the same video. This is a pretty extreme portability concern!

@kvark
Contributor

kvark commented May 6, 2021

@jdashg this is a fair argument, and it's an argument against WebCodecs in general, even without any relation to WebGPU.
It feels to me that this isn't our group's decision to make - whether or not WebCodecs is portable enough. The relevant W3C group should figure this out, possibly with your input, and communicate the decision. If the portability story is that bad, why does Chrome implement it today, and why is Firefox willing to implement it? This doesn't match up.

@kainino0x
Contributor

It's not that the Web can't have less-portable APIs (in our opinion), it's that WebGPU shouldn't force users to use them. If WebCodecs is the only way to get zero-copy uploads into WebGPU, we're forcing developers to choose between (a) extra copy overhead and (b) significant additional engineering effort AND thorough testing of their video upload paths across more devices and browsers.

@kvark
Contributor

kvark commented May 6, 2021

Suppose we are in a world where WebCodecs is available. We can either A) provide a fat-sampler mechanism, handle the WebCodecs frames internally, and handle all the transformations that need to happen (multi-planar resolution, color space conversion, clipping, rotation, etc.), leaving those who want more control without options; or B) expose the mechanism of importing WebCodecs planes, and let user space figure out the rest.

What stops us from developing a small user-space library that does the proper checks for plane formats, generates shader code, bindings, etc? Basically, everything that the browser is expected to do with #1666, we could do in this library, thus addressing the "hard to use" concern.

And there is always going to be an easy path for those who don't want to bother with zero-copy: just use copyExternalImageToTexture (which would need to support videos).

@kdashg
Contributor

kdashg commented May 6, 2021

You all know I am generally a fan of "just do it in user space", so I hope my resistance here underlines the concerns I have. There are things to punt to user space, and there are things that are Surprisingly Hard that we should try to make much harder to mess up.

@kdashg
Contributor

kdashg commented May 6, 2021

Unfortunately, none of this can be quantified, so I can only say (given my experience with what does and doesn't work in APIs, and also in this subfield of video decoding) that I think we have a duty to make something safer than the web-codec path but still with great perf characteristics.

Also, from my experience when we tried to do something similar by designing around ImageBitmap as The New Hotness for WebGL: it just didn't pan out, and years on, it's still not the best way to deal with texture uploads, except for certain cases in Chrome.

@Kangz
Contributor

Kangz commented Jun 14, 2021

This issue is listed as "needs discussion" but I don't really see what aspect should be discussed in the WebGPU call. What's needed to make progress?

@kvark
Contributor

kvark commented Jun 14, 2021

Action item for myself: figure out how WebCodecs would deal with newly added formats. How that would or would not break the applications.

@kainino0x
Contributor

I think it's the converse question of: w3c/webcodecs#67
That asks: When the app creates a VideoFrame, how does it know what PixelFormats the UA can handle?
We want to know: When the UA creates a VideoFrame, how does it know what PixelFormats the app can handle?

The current spec answer appears to be "Doesn't matter, we only have one PixelFormat". However I'm sure someone has thought about this.

@Kangz added this to the V1.0 milestone on Sep 2, 2021
@kainino0x
Contributor

FTR: Specification being discussed in #2124.

@alexkarpenko

alexkarpenko commented Nov 21, 2021

As someone who's building a user-space application atop WebCodecs, I'd like to add a vote for option B) expose the mechanism of importing WebCodecs planes, and let user-space figure out the rest.

The reasons are:

  1. I often want access to just the grayscale Y plane for video processing (e.g., feature tracking). I don't want a black-box API that does color conversions which I do not want or need.
  2. There are many color space and format combinations. New ones are added frequently in OS upgrades. I can add support for them faster than it would take all the browser teams (Firefox + Chrome + Safari) to add and ship support, especially when it comes to more esoteric formats such as ProRes. So as long as the browser exposes a description of the frame (clipping, transform matrix, color profile, plane count, and format type), I can do the rest.
  3. The browser should expose a thin API on top of the hardware decoder API (which is fairly uniform across platforms). Convenience conversion functions should be written in user space and polyfilled quickly without depending on browser updates. This lets me ship to users faster and provide a consistent experience across browsers, since I do not depend on the browser teams getting it right inside their own black box.

@kainino0x
Contributor

1. I often want access to just the grayscale Y plane for video processing (e.g., feature tracking). I don't want a blackbox API that does color conversions for me which I do not want or need.

The proposal is to add a "give me Y only" function, although we've punted for now on actually figuring out how to specify it. See #1681.

Re: 2/3, low-level access like this is of course going to be predicated on WebCodecs being adopted in Firefox/Safari in the first place. We're not at that point yet, so I think we need the simplest possible API to start.

Can WebCodecs really expose hardware decoding of new formats like ProRes without any browser implementation work? I don't know what the extensibility story is for WebCodecs but right now I think it's slightly restricted to what's needed for the majority of cases. (Maybe that's not true anymore, it's been a while since I looked.)

@alexkarpenko

Thanks for pointing to #1681. Reading through that issue, it isn't quite what I'd want, as it again implies potential color conversions to guarantee a consistent luma/luminance result. What I'd want instead is the ability to 0-copy map the decoder frame's planes directly onto GPU textures, and then do any processing that's needed in user space.

This capability is provided by at least the VideoToolbox (macOS, iOS) and MediaCodec (Android) APIs. Can't speak to Windows as I've no experience there, but I imagine it behaves similarly.

As for more esoteric formats like ProRes, this could be facilitated through a platform-specific extension API. Where platform-specific keys are passed in an extension dictionary. I agree that this can be punted on for later though.

@kainino0x
Contributor

Thanks, that's a useful perspective on it. (One reason we didn't standardize a luma/luminance thing yet is that we needed more info on the use cases.) I was trying to start out with a more color-aware solution under the assumption that arithmetic conversions would be quite cheap compared to loading the U/V planes. But if there's really no use to a color-aware solution then we would probably want to jump directly to a low-level one with an explicit Y-plane of a WebCodecs VideoFrame if possible.

@alexkarpenko

alexkarpenko commented Nov 23, 2021

Yep. Just to make it a little more concrete. Here is the Apple API that maps an IOSurfaceRef (the object that holds the decoded pixel data as it comes directly out of the VideoToolbox API) to a MTLTexture (a Metal texture): https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc

Note that you specify the plane you want to map. And you get a MTLTexture with the same number of channels as the IOSurfaceRef (e.g., R8 for Y, RG8 for UV, BGRA10_2 for wide gamut, etc). This is essentially guaranteed to be 0-copy. The MTLTexture directly references the pixel data written by the decoder when doing texture sampling. It is the most efficient way to get video frames from the decoder to the GPU for processing. Any additional color conversions would require an extra render-pass and texture allocation, which are often undesirable for 4K video or high-performance video processing.

On Android this is accomplished via mapping AHardwareBuffer (what's produced by the AMediaCodec decoder) onto a Vulkan texture: https://developer.android.com/ndk/reference/group/a-hardware-buffer

Note how many different hardware buffer formats there are in the above API docs. Fortunately, these plane formats are already enumerated in the WebGPU spec. So IMO all the browser should do internally is provide an API to map the decoded frame plane onto the matching WebGPU texture (implemented internally via the above 0-copy APIs), expose the frame's color space, crop & matrix transform metadata, and leave the rest to the user.

@kdashg modified the milestones: V1.0 → post-V1 on Dec 2, 2021
@Kangz modified the milestones: post-V1 → Polish post-V1 on Apr 14, 2022
@Kangz
Contributor

Kangz commented Feb 23, 2023

@litherum putting this up for discussion in Milestone 2, since Safari "Added video-only support for Web Codecs." in TP 164.

@Kangz modified the milestones: Polish post-V1 → Milestone 2 on Feb 23, 2023
@dalecurtis

w3c/webcodecs#83 is the inverse of this issue on the WebCodecs side. I.e., creation of a VideoFrame from WebGPU objects. Your input would be welcome for any considerations there!

@greggman
Contributor

greggman commented Apr 6, 2023

@dalecurtis

dalecurtis commented Apr 6, 2023

Yes, w3c/webcodecs#83 is about being able to directly get a frame from WebGPU objects, like this:

let frame = new VideoFrame(gpuTexture, {timestamp: 0});
let frame2 = new VideoFrame(gpuBuffer, {timestamp: 0});

Developers can already create a VideoFrame from canvas or use MediaStreamTrackProcessor to get a ReadableStream of VideoFrame objects, so it's not impossible to do this today; it just can't be done directly.

@kainino0x
Contributor

Commented on the webcodecs issue. Thanks for surfacing it!

@kdashg
Contributor

kdashg commented Apr 26, 2023

GPU Web 2023-04-19 Atlantic
  • CW: summary:
  • CW: 1) Source of importExternalTexture can be VideoFrame, too. Takes a union type.
  • CW: 2) Lifetime of external texture's tied to lifetime of VideoFrame. If you close the VideoFrame, it expires the external texture. External texture stays alive until you close the VideoFrame.
  • CW: we've implemented this in Chrome, it works, and is easier than HTMLVideoElement.
  • KR: Is there any autoexpiry of VideoFrame at the Web platform level? If you get the VideoFrame from a stream and drop it on the floor, does it expire?
  • KG: The decoders will deadlock if it doesn't expire. Some OSes' video decoders have a ring buffer of frames and block if you don't stop using one (they always go in order).
  • KN: You could let GC take care of them, it might work, but you should close them.
  • BJ: Question is, do we let the GC close the object? If yes, does that destroy the externalTexture?
  • KN: The ExternalTexture would ref the VideoFrame.
  • KR: That works, and avoids the ExternalTexture transitioning unexpectedly to destroyed.
  • KN: Yes, that would be exposing the GC.
  • MM: The interesting thing about videos is not the sequence of frames; it is that frames have timestamps associated with them. It is important to let the window server handle the progression, to avoid jank and to let the display adapt to the refresh rate of the video.
  • MM: So GPUExternalTexture doesn't get these benefits. The swapchain is presented each frame immediately. So for WebGPU we'd like a way to integrate something like that, either have a timestamp on the WebGPU canvas, or have a way to put the result of WebGPU back in WebCodecs with a timestamp.
  • KG: you can do that. You can create a stream from your canvas.
  • MM: yes, but that loses the presentation time.
  • KG: do you need that now?
  • MM: no, but should come eventually.
  • KN: pretty sure you can already do this. Can create VideoFrame from canvas, turn into MediaStream, and the videoframe has a timestamp associated. Looks like you can specify a timestamp upon construction.
  • RC: not only are timestamps important for MM's reason - they're important for battery life too. We want to wake up at the video framerate, not the display refresh rate, and sleep for most frames.
  • CW: think we all agree on the use cases/optimizations described. The integration of WebGPU with WebCodecs doesn't need to worry about this.
  • KG: this is about ingestion into - not integration with - WebGPU. Just pulling in WebCodecs frames is more limited in scope than the timestamps question.
  • CW: WebGPU producing WebCodecs frames already works because of the Canvas API. Have prototyped that flow - it works. WebGPU's just a computation block in this model.
  • KG: main point - this issue is only about pulling in VideoFrames. That's something we can more easily talk about, rather than use cases for e.g. timestamps, which are important but which should be discussed in other issues.
  • KG: the proposal above makes sense.
  • KG: when the VideoFrame's closed, it invalidates the external texture?
  • CW: yes.
  • KG: seems reasonable.
  • CW: suggest that Chromium folks can close this issue, open a new one with just the description of what we discussed today. Either issue or PR form. Discuss more there.
