Investigation: Import VideoFrame from WebCodec to WebGPU #1380
Thanks for the detailed proposal! As you know, I think the best option is the last one: "Per-Plane Per-Readonly-Texture-Views". Given the low-levelness of WebCodec, can't we ask the application to provide the correct format for each plane? For the lifetime control, it seems maybe we don't need an explicit release.
Yes, maybe getOptGPUTextureViewImportDescriptor could accept a VideoFrame from WebCodecs or an HTMLVideoElement and return the formats. But I still have concerns, because the user needs to know the video pixel format to know how to handle these views (e.g. to choose the transform matrix), and it seems only VideoFrame could provide this metadata.
That may be possible, but my concern is that users might need to know the component info (whether the r value is valid or the rg value is valid). (But maybe users could learn this through the video pixel format from VideoFrame.)
+1
+1
I'm not totally comfortable limiting the API to only what can be done with TextureViews (i.e. can't do copies). I'd like to understand this problem better - what expensive emulation would occur if Textures were provided instead? Of course expensive emulation defeats the purpose of this low-level API.
nit: UV planes should be signed. I have no idea what their internal format is, though, so I don't know if it should be rg8snorm or something else.
ACK, "Per-Plane Per-Readonly-Texture-Views" makes sense to me then.
Thank you for writing this down, @shaoboyan ! I somehow missed this issue, and now I think we are close to the point of no return with the "fat sampler" concept, even though WebCodec could be a superior solution in the longer term. In your suggestions, generally, each frame the user is expected to do the following:

1. Import the VideoFrame.
2. Create a texture view for each plane.
3. Create a bind group containing those views.
4. Release the imported frame.
The "Per-Plane Per-Readonly-Texture-Views" removes point (2) here. I wonder if we could go further and also remove (3) and (4)? For example, the API might look like this:

dictionary GPUVideoPlaneEntryLayout {
required GPUIndex32 binding;
GPUTextureFormat format;
};
dictionary GPUVideoFrameBindGroupDescriptor {
VideoFrame frame;
sequence<GPUVideoPlaneEntryLayout> entries;
};
interface GPUDevice {
GPUBindGroup createVideoFrameBindGroup(GPUVideoFrameBindGroupDescriptor desc);
};

We can call this "Whole Bind Group". I think it's the shortest path to what we are trying to expose here. Alternatively, if we want to allow copies, "Readonly Multiplanar Texture" looks good to me.
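For concreteness, a hedged sketch of how the descriptor above might be filled in from a frame. createVideoFrameBindGroup does not exist; the NV12/I420 plane-format table and the helper name are assumptions for illustration only:

```javascript
// Build a hypothetical GPUVideoFrameBindGroupDescriptor (per the IDL above)
// for common 4:2:0 pixel formats. The plane-format mapping is an assumption:
// NV12 has a full-res Y plane plus an interleaved half-res UV plane.
function videoFrameBindGroupDescriptor(frame) {
  const planeFormats = {
    NV12: ["r8unorm", "rg8unorm"],           // Y plane, interleaved UV plane
    I420: ["r8unorm", "r8unorm", "r8unorm"], // separate Y, U, V planes
  };
  const formats = planeFormats[frame.format];
  if (!formats) throw new Error(`no mapping for pixel format ${frame.format}`);
  return {
    frame,
    entries: formats.map((format, binding) => ({ binding, format })),
  };
}

// Then, hypothetically:
//   const bindGroup = device.createVideoFrameBindGroup(videoFrameBindGroupDescriptor(frame));
```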
I'm not sure what additional ease of use the "Whole Bind Group" entry point brings.
It doesn't add more concepts. All of the proposals here add more dictionaries, and one of the proposals adds more concepts (multi-planar textures). The "Per-Plane Per-Readonly-Texture-Views" also adds 2 entry points to the device (not counting the release), instead of 1. A more important question is: how do you feel about waiting for the Web Codecs? They seem to be roughly on the same timeline as WebGPU, and there is interest.
Views and bind groups should be cheap, and I think it's OK for developers to create them each frame (within reason, but they're not going to have 100s of different video frames to wrap each frame). And the texture is obtained through an "import", so it's not really a texture creation.
That was our (Chromium contributors including @shaoboyan) initial idea, but AFAIK Firefox hasn't implemented WebCodec and WebKit isn't participating in WebCodec at all. Chromium is the only browser where WebCodec is likely to ship before WebGPU, so I don't think it's a good idea to gate the powerful "video" interop capability on that API. However, we should eventually specify what happens when interoperating WebCodec and WebGPU in browsers that implement both.
The situation may be a bit different now. Firefox is definitely interested in implementing WebCodec this year (at least in Nightly), and Apple is participating in the discussions.
WebCodecs is a huge dependency for us to take for WebGPU. If we were to only allow WebCodecs VideoFrame as the import source, WebGPU's video interop would be gated on WebCodecs shipping. That said, the semantics of the API would be much less nuanced if we only allowed imports from WebCodecs, so there is certainly appeal to it. It's only been a few months since we discussed this, but it would still be good to hear from @litherum whether Apple would be okay with taking that dependency, since I don't know their current position on WebCodecs.
@kvark Thanks for coming back to this.
To put this into context: today, browsers internally (often) use something like WebCodec to sample from YUV planes for videos. However, not even our own browsers handle all video inputs correctly, leading to incorrect and non-portable results. (I have been fixing full-range video in Firefox over the last few weeks, except in h264-mp4s, which need yet more fixes.) This is such a "sharp" tool that we would be expecting apps to do an even better job of handling videos than our browsers do today. Furthermore, browsers' decode subsystems (and OSes!) don't necessarily give back the same plane configuration for the same video. This is a pretty extreme portability concern!
@jdashg this is a fair argument, and it's an argument against WebCodec in general, even without any relation to WebGPU. |
It's not that the Web can't have less-portable APIs (in our opinion), it's that WebGPU shouldn't force users to use them. If WebCodecs is the only way to get zero-copy uploads into WebGPU, we're forcing developers to choose between (a) extra copy overhead and (b) significant additional engineering effort AND thorough testing of their video upload paths across more devices and browsers. |
Suppose we are in the world where WebCodecs are available. We can either (A) provide a fat-sampler mechanism, handle the web codecs internally, handle all kinds of transformations that need to happen (multi-planar resolution, color space conversion, clipping, rotation, etc.), leaving those who want more control without options; or (B) expose the mechanism of importing WebCodecs planes, and let user-space figure out the rest. What stops us from developing a small user-space library that does the proper checks for plane formats, generates shader code, bindings, etc.? Basically, everything that the browser is expected to do with #1666 we could do in this library, thus addressing the "hard to use" concern. And there is always going to be an easy path for those who don't want to bother with zero-copy: just use the copy-based upload path.
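To make option (B) concrete, a sketch of what such a user-space library could generate: given a frame's pixel format, emit the WGSL plane bindings and a fetch expression the app splices into its shader. All names here (wgslForFormat, planeY, planeUV) are invented for illustration, not an existing library:

```javascript
// Generate WGSL glue for sampling a decoded frame's planes, per pixel format.
// Only NV12 is handled in this sketch; a real library would cover every
// format WebCodecs can report and pick the right color conversion.
function wgslForFormat(format) {
  if (format !== "NV12") throw new Error(`format ${format} not handled in this sketch`);
  return {
    bindings: [
      "@group(0) @binding(0) var planeY : texture_2d<f32>;",
      "@group(0) @binding(1) var planeUV : texture_2d<f32>;",
    ],
    // Fetch luma at full resolution and 4:2:0 chroma at half resolution.
    fetch: "let yuv = vec3<f32>(textureLoad(planeY, px, 0).r, textureLoad(planeUV, px / 2, 0).rg);",
  };
}
```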
You all know I am generally a fan of "just do it in user-space", so I hope my resistance here underlines the concerns I have. There are things to punt to user-space for, and there are things that are Surprisingly Hard that we should try to make much harder to mess up. |
Unfortunately, none of this can be quantified, so I can only say (given my experience with what does and doesn't work in APIs, and also in this subfield of video decoding) that I think we have a duty to make something safer than the web-codec path but still with great perf characteristics. Also, from my experience when we tried to do something similar by designing around ImageBitmap as The New Hotness for WebGL: it just didn't pan out, and years on, it's still not the best way to deal with texture uploads, except for certain cases in Chrome.
This issue is listed as "needs discussion" but I don't really see what aspect should be discussed in the WebGPU call. What's needed to make progress? |
Action item for myself: figure out how WebCodecs would deal with newly added formats. How that would or would not break the applications. |
I think it's the converse question of: w3c/webcodecs#67 The current spec answer appears to be "Doesn't matter, we only have one PixelFormat". However I'm sure someone has thought about this. |
FTR: Specification being discussed in #2124. |
As someone who's building a user-space application atop WebCodecs, I'd like to add a vote for option B) expose the mechanism of importing WebCodecs planes, and let user-space figure out the rest. The reasons are:
The proposal is to add a "Give me Y only" function, although we've punted for now on actually figuring out how to specify it: #1681. Re: 2/3, low-level access like this is of course going to be predicated on WebCodecs being adopted in the first place in Firefox/Safari. We're not at that point yet, so I think we need the simplest possible API to start. Can WebCodecs really expose hardware decoding of new formats like ProRes without any browser implementation work? I don't know what the extensibility story is for WebCodecs, but right now I think it's slightly restricted to what's needed for the majority of cases. (Maybe that's not true anymore, it's been a while since I looked.)
Thanks for pointing to #1681. Reading through that issue, it isn't quite what I'd want, as it again implies potential color conversions to guarantee a consistent luma/luminance result. What I'd want instead is the ability to 0-copy map the decoder frame's planes directly onto GPU textures, and then do any processing that's needed in user space. This capability is provided by at least the VideoToolbox (macOS, iOS) and MediaCodec (Android) APIs. Can't speak to Windows as I've no experience there, but I imagine it behaves similarly. As for more esoteric formats like ProRes, this could be facilitated through a platform-specific extension API, where platform-specific keys are passed in an extension dictionary. I agree that this can be punted on for later, though.
Thanks, that's a useful perspective on it. (One reason we didn't standardize a luma/luminance thing yet is that we needed more info on the use cases.) I was trying to start out with a more color-aware solution under the assumption that arithmetic conversions would be quite cheap compared to loading the U/V planes. But if there's really no use for a color-aware solution, then we would probably want to jump directly to a low-level one with an explicit Y-plane of a WebCodecs VideoFrame if possible.
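For concreteness, the "arithmetic conversion" being discussed, written out for one common case: BT.709 limited-range 8-bit YCbCr to RGB. The coefficients differ per colorspace (BT.601, BT.2020, full vs. limited range), which is exactly why apps need the frame's colorspace metadata. The helper name is illustrative:

```javascript
// BT.709 limited-range 8-bit YCbCr -> normalized RGB.
// Limited range: Y in [16, 235], Cb/Cr in [16, 240] centered at 128.
function bt709ToRgb(y, cb, cr) {
  const yf = (y - 16) / 219;   // expand limited-range luma to [0, 1]
  const pb = (cb - 128) / 224; // center and scale chroma
  const pr = (cr - 128) / 224;
  const r = yf + 1.5748 * pr;
  const g = yf - 0.1873 * pb - 0.4681 * pr;
  const b = yf + 1.8556 * pb;
  const clamp = (v) => Math.min(1, Math.max(0, v));
  return [r, g, b].map(clamp);
}
```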
Yep. Just to make it a little more concrete: here is the Apple API that maps an IOSurfaceRef (the object that holds the decoded pixel data as it comes directly out of the VideoToolbox API) to a MTLTexture (a Metal texture): https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc Note that you specify the plane you want to map, and you get a MTLTexture with the same number of channels as the IOSurfaceRef (e.g., R8 for Y, RG8 for UV, BGRA10_2 for wide gamut, etc). This is essentially guaranteed to be 0-copy. The MTLTexture directly references the pixel data written by the decoder when doing texture sampling. It is the most efficient way to get video frames from the decoder to the GPU for processing. Any additional color conversions would require an extra render pass and texture allocation, which are often undesirable for 4K video or high-performance video processing. On Android this is accomplished by mapping an AHardwareBuffer (what's produced by the AMediaCodec decoder) onto a Vulkan texture: https://developer.android.com/ndk/reference/group/a-hardware-buffer Note how many different hardware buffer formats there are in the above API docs. Fortunately, these plane formats are already enumerated in the WebGPU spec. So IMO all the browser should do internally is provide an API to map the decoded frame plane onto the matching WebGPU texture (implemented internally via the above 0-copy APIs), expose the frame's color space, crop & matrix transform metadata, and leave the rest to the user.
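To illustrate the plane-to-texture correspondence described above, a sketch of how common decoder outputs line up with existing WebGPU formats, mirroring what IOSurface/Metal and AHardwareBuffer/Vulkan expose. The NV12/I420 layouts are standard, but planeTextures is an invented helper, not an API:

```javascript
// Map a decoded frame's planes onto existing WebGPU texture formats with
// no conversion: Y -> r8unorm, interleaved UV -> rg8unorm.
function planeTextures(format, codedWidth, codedHeight) {
  switch (format) {
    case "NV12": // 4:2:0, UV interleaved in one half-resolution plane
      return [
        { format: "r8unorm", width: codedWidth, height: codedHeight },
        { format: "rg8unorm", width: codedWidth / 2, height: codedHeight / 2 },
      ];
    case "I420": // 4:2:0, U and V in separate half-resolution planes
      return [
        { format: "r8unorm", width: codedWidth, height: codedHeight },
        { format: "r8unorm", width: codedWidth / 2, height: codedHeight / 2 },
        { format: "r8unorm", width: codedWidth / 2, height: codedHeight / 2 },
      ];
    default:
      throw new Error(`unmapped pixel format: ${format}`);
  }
}
```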
@litherum putting this for discussion in Milestone 2 since Safari "Added video-only support for Web Codecs." in TP 164. |
w3c/webcodecs#83 is the inverse of this issue on the WebCodecs side. I.e., creation of a VideoFrame from WebGPU objects. Your input would be welcome for any considerations there! |
Is this different from |
Yes, w3c/webcodecs#83 is about being able to directly get a frame from WebGPU objects, like this:
Developers can already create a |
Commented on the webcodecs issue. Thanks for surfacing it! |
GPU Web 2023-04-19 Atlantic
This is based on #1154 and focuses on uploading VideoFrame from WebCodecs, with input from Kangz@.

Rationale

An important class of applications that could use WebGPU when it is released is applications handling video on the Web. These applications increasingly need to manipulate the video to add effects, but also to extract data from it through machine learning. An example is the background replacement in Zoom video calls, which does image processing to detect the background and then composites it with a replacement image.

Unlike HTMLVideoElement, the upcoming WebCodecs API allows applications to open a video stream and manipulate it at a very fine-grained level. WebCodecs exposes the exact format, colorspace, transform, and, more importantly, the list of planes for a VideoFrame. We can imagine that WebGPU combined with WebCodecs could enable amazing video applications.
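For reference, the fine-grained metadata mentioned above can be read off a VideoFrame like this. The property names (format, codedWidth, codedHeight, visibleRect, colorSpace) are real WebCodecs attributes; the frame passed in here is just a plain stand-in object of the same shape, and describeFrame is an illustrative helper:

```javascript
// Collect the VideoFrame metadata an import API would rely on.
function describeFrame(frame) {
  return {
    format: frame.format,                       // pixel format, e.g. "NV12" or "I420"
    size: [frame.codedWidth, frame.codedHeight],
    visible: frame.visibleRect,                 // crop rectangle to display
    colorSpace: frame.colorSpace,               // primaries, transfer, matrix, fullRange
  };
}
```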
Currently, WebCodecs can only interact with WebGPU through CopyImageBitmapToTexture, by uploading video contents to a GPUTexture. But the upload performance is not good, because extra copies/transforms (e.g. the ImageBitmap creation, and at least one copy into the destination texture) are needed during the upload. In WebGL, the WEBGL_webcodecs_video_frame extension, which introduces a 0-copy upload path (in the HW decoder case) for a VideoFrame from WebCodecs, shows better performance than "direct uploading" (the 1-copy path) in some cases (e.g. bandwidth-constrained platforms).

So it is reasonable for WebGPU to have a similar Import API to achieve an efficient upload path for interacting with WebCodecs.

Proposals for Import API

The purposes of this API are:

- Import a VideoFrame from WebCodecs with minimal copies/transforms.
- Expose each plane in the VideoFrame so that developers have the ability to handle a single plane (see feedback here and here).

The current IDL of VideoFrame contains lots of metadata about the frame content for JS developers to query. There are several basic ideas:
Per-Plane Per-Readonly-Texture

This API imports each plane of a VideoFrame as a separate GPUTexture object. The user needs to provide the correct plane size, a compatible texture format, and the expected usages (read-only) to import a plane as a GPUTexture.

webidl:

Using this API to import a VideoFrame into WebGPU, the user needs to provide:

- the VideoFrame object;
- each plane's size and a compatible format (these could be derived from the VideoFrame).

The VideoFrame will be 'locked' once any plane has been imported, and it will be 'released' by calling GPUTexture.destroy() on all imported planes.

Pros:

- Import creates real GPUTexture objects that can be used in copy operations and to create texture views with format reinterpretation.
- Resources can be reclaimed eagerly by calling GPUTexture.destroy().

Challenges:

- A GPUTexture wrapping an individual plane of a native multi-planar texture is difficult to implement (e.g. subresource state transitions).
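A toy model of the locking rule above: the frame stays 'locked' while any imported per-plane texture is alive, and is 'released' once destroy() has been called on every imported plane. FrameImportModel and importPlane are invented names for illustration, not proposed API:

```javascript
// Model the per-plane lifetime semantics described in this proposal.
class FrameImportModel {
  constructor() {
    this.livePlanes = 0;
  }
  importPlane() {
    this.livePlanes += 1; // importing any plane locks the frame
    let destroyed = false;
    return {
      // Stand-in for the per-plane GPUTexture; destroy() is idempotent.
      destroy: () => {
        if (!destroyed) {
          destroyed = true;
          this.livePlanes -= 1;
        }
      },
    };
  }
  get locked() {
    return this.livePlanes > 0; // 'released' once this turns false
  }
}
```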
Readonly Multiplanar Texture

This is very similar to the per-plane video importing, but instead introduces new multi-planar WebGPU texture formats for the most common video formats (a concept that already exists in native GPU APIs). Users could create a texture view of a single plane by using aspect with these new formats.

webidl:

The API imports the VideoFrame as a single GPUTexture object with a multi-planar texture format. The user needs to provide:

- the VideoFrame object;
- the VideoFrame size (this could be obtained from the VideoFrame).

The VideoFrame will be 'locked' once it has been imported, and it will be 'released' by calling GPUTexture.destroy() on the imported GPUTexture object.

Pros:

- Import creates a real GPUTexture object that can be used in copy operations and to create texture views with format reinterpretation.

Challenges:
Per-Plane Per-Readonly-Texture-Views

This introduces new APIs, based on #1154, that import the GPUTextureSource but return multiple GPUTextureViews. An API is introduced to release the imported resource explicitly.

The API imports the VideoFrame as several GPUTextureView objects. The user needs to provide:

- the VideoFrame object.

The VideoFrame will be 'locked' once it has been imported, and it will be 'released' by calling releaseImportSource with the VideoFrame object as a parameter.

Pros:

- Import creates GPUTextureView objects, which can be set in a bind group directly.

Challenges:

- There is no GPUTexture.destroy() to release the resources; a dedicated release entry point is required.