AudioNode Lifetime section seems to attempt to make garbage collection observable #1471


Closed
karlt opened this issue Jan 12, 2018 · 86 comments · Fixed by #1810

@karlt
Contributor

karlt commented Jan 12, 2018

Attempting to describe observable behavior on garbage collection wouldn't make
sense because garbage collection is the removal of objects that are no longer
needed. If their removal would cause changes in behaviour, then that implies
that the objects are still needed.

https://w3ctag.github.io/design-principles/#js-gc states the requirement in
reverse: "There must not be a way for author code to deduce when/if garbage
collection of JavaScript objects has run."

https://webaudio.github.io/web-audio-api/#lifetime-AudioNode
describes conditions that would keep an AudioNode alive and
the deletion of AudioNodes when none of these conditions are present.

The section appears to be describing garbage collection. The first condition
is "A normal reference obeying normal garbage collection rules". I'm not
clear exactly what this means. Garbage collection attempts to detect objects
that are no longer needed. Typically, implementations ensure that objects
that are still needed have "normal references" reachable from roots.

All other "types of references" listed seem to be explicit conditions where the
AudioNode is still needed.

One situation that is not explicitly described is when an AudioNode has
connections on its outputs. In this situation non-unit output channel counts
can affect downstream input mixing, and the behavior of some kinds of
downstream nodes (e.g. AudioWorkletNode) depends on whether or not these
connections exist. An implementation, however, is able to treat an otherwise
unneeded upstream AudioNode as collectable, provided that it maintains the (now
unchangeable) observable effects on downstream nodes when garbage collecting
the upstream node.

The part of the section causing the most confusion is:

When an AudioNode has no references it will be deleted. Before it is deleted,
it will disconnect itself from any other AudioNodes which it is connected to.
In this way it releases all connection references (3) it has to other nodes.

The releasing of "connection references" to other nodes is kind of consistent
with the model that permits otherwise unneeded upstream nodes to be deleted,
but "disconnect itself from any other AudioNodes which it is connected to"
could be interpreted to mean a disconnection similar to an explicit
disconnect(). Such a disconnection can produce observable behavior changes on
downstream nodes.
#1453 (comment) at least is interpreting this to mean that the effects of the connections are
removed together with the connection references when the AudioNode is deleted.

If this section is trying to define an object lifetime model that is different
from that of JS objects then that is problematic. AudioNodes are JS objects,
and efficient and effective garbage collection algorithms cannot immediately
detect when an object becomes otherwise unneeded. There is an indeterminate
delay before the object is deleted.

Requiring behavior changes when the object is otherwise unneeded leads to
unpredictable behavior, as the behavior may either continue as if the object
is otherwise needed for a long period of time or not at all. Such variations
may occur either between implementations or within implementations.

@joeberkovitz

I agree, this does seem to be a problem leading to the observability of GC.

@joeberkovitz joeberkovitz added this to the Web Audio V1 milestone Jan 12, 2018
@joeberkovitz

Thinking out loud here... possibly not very soundly... it seems we can go in at least two different directions:

  1. Purposely hide all observable information resulting from disconnection of nodes, whether explicit or GC-related. At the least, disconnection would have to be prevented from decreasing the number of input channels visible to downstream nodes -- the input channel count could only go up as new nodes were added. But this would make the dynamic mixing behavior asymmetric and weird, as removals (even explicit ones) would no longer cause the corresponding downstream adjustments.

  2. Disallow GC from having any effect on the graph at all. Nodes could no longer be pinned into the graph by merely maintaining a JS reference to them: they'd have to be "active" in some sense. This probably requires a very careful look at how we've defined "activity". However, the concepts of tail-time, the "playing" state of an AudioScheduledSourceNode and AudioWorkletProcessor's "active source flag" feel like they are sufficient, or almost so.

In many ways Option 2 feels like the more rational way to go here, but it would obviously be a big change. The hugest piece of this change seems to be that nodes could become disconnected from the graph due to their own deterministic changes in state. Since JS references to the node could still exist (because they would no longer prevent such disconnection), the node would have to make its own disconnected state visible to applications. Such nodes would either be no longer usable (and many of our nodes do become unusable once they are finished with their job), or there would have to be a codified way to re-use them.

@karlt
Contributor Author

karlt commented Jan 17, 2018

I hadn't considered Option 1. It would work and is simple, but having explicit
disconnect() calls remove only some of the effects would, I think, be quite
unexpected. I'm unsure how much that matters, but option 3 below would
make this change unnecessary.

For option 2, I assume that an inactive node that became active again would be
automatically "reconnected" to the graph where it was before? If not, I fear
option 2 would be too much of a change in direction. I suspect there are plenty
of existing use cases where a processing graph is pre-constructed once and
one-shot source nodes are added on demand.

Regardless, I expect this would be simpler than option 2:

  1. Require that the downstream effects of connected nodes are
    maintained if and when the implementation garbage collects.

    Nodes are not "disconnected" when garbage collected, but their
    "connection references" can be removed.

This would be equivalent to not having any AudioNode lifetime section at all.
The implementation is free to garbage collect any resources provided that they
are no longer needed to maintain specified behavior.

@joeberkovitz

joeberkovitz commented Jan 18, 2018

The WG discussed this today. We came up with the following proposal along the lines of Option 2:

  • Clarify the Node Lifetime section https://webaudio.github.io/web-audio-api/#lifetime-AudioNode to define that the lifetime in question is the lifetime of an AudioNode's effects on the graph, not of the JS AudioNode object itself.

  • Remove condition 1 of this section, i.e. delete the language: "A normal reference obeying normal garbage collection rules." As a result of this change, GC can no longer have any effect on the audio graph which fixes this issue.

  • Modify condition 2 of this section to clarify that AudioScheduledSourceNodes are retained in the graph not only while playing, but also during the period preceding a scheduled start time.

As Karl and I pointed out, the big consequence of this change is that a node can now be discarded from the graph due to inactivity even though it has JS references. If the holder of the JS reference could reactivate the node somehow (e.g. by connecting new inputs to it), this would present a problem. There are no such source node types, though, except AudioWorkletNode. Consequently we proposed to resolve #1463 by stating that returning false from process() causes the node to become permanently inactive.

On reflection this approach seems to me to have a big flaw: Consider an app that puts a GainNode into the graph, and holds a JS reference to that node with the intention of connecting other source nodes to its input later -- a very common pattern for a "master gain control". If this node automagically disappears from the graph (since it has no incoming connections yet, and it's not "active" as defined by the Lifetime section) then we broke this application. On the other hand, if we say that GainNodes always stick around until explicitly removed, we'll break all the applications that dynamically schedule a source S routed through a source-specific GainNode G: these apps create S and G on the fly, with the expectation that when S goes away, G will disappear too.

If that logic is correct, we're left with these choices:

  • Allow "resuscitated" nodes to be automatically reconnected to the same place in the graph (which might require all downstream nodes to be also automatically reconnected to their former spots)

  • Option 1 (effects of all disconnections are not observable from downstream)

  • Option 3 (effects of disconnections "due to GC" are not observable downstream)

@joeberkovitz

I'll mention one other infeasible approach which is to require app developers to explicitly mark nodes that ought to magically go away when disconnected. Incompatible.

@rtoy
Member

rtoy commented Jan 29, 2018

I think, independent of what the lifetime section says, GC is actually observable because we added AudioWorkletNode.

Consider a default ChannelMergerNode connected to an AudioWorklet. By default the merger has 6 channels on the single output. The worklet would see 6 channels on its input. If the reference to the merger node is dropped and collected, then there will suddenly be no inputs to the worklet. As currently spec'ed, input[0] will be an array with zero elements. Thus, you can tell that GC happened sometime before this change in the channel count.
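The deduction described here can be sketched as a toy model in plain JavaScript (illustrative names only; this is not Web Audio API code, just the channel-count logic the comment relies on):

```javascript
// Toy model: a downstream node's observed input channel count is the max of
// its connected sources' output channel counts, or 0 when nothing is
// connected -- mirroring how input[0] is described for AudioWorkletNode.
function observedInputChannels(upstreamOutputCounts) {
  if (upstreamOutputCounts.length === 0) return 0; // input[0] is empty
  return Math.max(...upstreamOutputCounts);
}

// Before GC: a default ChannelMergerNode (6 output channels) is connected.
const beforeGC = observedInputChannels([6]); // 6
// After the merger is collected, the connection vanishes.
const afterGC = observedInputChannels([]);   // 0
// The worklet can compare the two observations and deduce that GC ran.
const gcWasObservable = beforeGC !== afterGC; // true
```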

I don't know how to fix this while also preserving the very, very useful feature of the worklet having dynamic channel count support.

@joeberkovitz

Here is an outline of a possible fix to the situation. I do not think we can have perfect dynamic channel upmixing and GC-blindness at the same time, so this approach makes the channel upmixing a little less dynamic in order to eliminate the GC problem.

  • Add a variable named deactivated to the AudioNode interface, initially having the value false. This variable cannot be accessed programmatically.

  • Any node with a definite lifetime that is independent of GC (e.g. Oscillator, ABSN, etc.) must set deactivated to true when its lifetime ends. (Conversely, GC has no effect on the value of deactivated. So if a node's lifetime ends because of GC only, deactivated will be false.)

  • Add a variable named input channel floors to the AudioNode interface, initially having an array containing elements with value 0, with one element for each of the node's inputs. This variable is taken into account by dynamic channel count up-mixing: an input to a given node will always be up-mixed to include at least as many channels as the value of the corresponding element of input channel floors.

  • When a node N1 whose lifetime has ended has its outgoing connection to downstream node N2 automatically removed:

    • If N1's deactivated variable is false, then set the value of the corresponding element of N2's input channel floors to N1's output channel count. This causes N2 to continue reflecting the channel mixing effects of N1, even after N1 is gone. (For AudioWorkletNode, this implies that input[0] will not be empty, but shows a bunch of empty frames for the number of channels being "faked".)

This approach limits the cases where dynamic channel mixing is affected. In particular, dynamic channel count behavior will continue working, except in a small number of cases where developers rely on GC alone to eliminate nodes from the graph and these nodes have varying output channel counts.

@hoch
Member

hoch commented Jan 29, 2018

It is interesting that we had a similar thought, but consider this case:

N1 (6ch) -> N3
N2 (2ch) -> N3

Here N3's input channel floors will be 6, but what would be the input channel floors of N3 when N1 is removed by GC? 2 or 6? Either case poses a problem of its own:

  1. If the value is still 6, 2 channel input from N2 will be up-mixed to 6.
  2. If the value becomes 2, that means GC is still observable by inspecting the input channel count in N3.

Correct me if I misunderstood your proposal.

@joeberkovitz

Let me explain what I think would happen here; maybe you misunderstood me (and I'm sure I could have been clearer!):

At the time the graph is constructed, N3's input channel floors will be set to [0], not [6]. A floor value for an input only gets set when a GCed node is disconnected from that input.

Thus, when N1 is removed by GC, then N3's input channel floors will be set to [6], making the GC unobservable. Correspondingly the input of N2 will continue being upmixed to 6.

Note that if N1 had been explicitly removed by calling N1.disconnect(), or if N1 was a source node that explicitly finished playing and was thus deactivated, input channel floors would still remain as [0]. So it's always possible to retain the dynamic mixing behavior when there is a disconnection driven by an explicit, non-GC node lifetime.

@joeberkovitz

[previous hasty comment deleted]

We also need a way for explicitly deactivated source nodes to communicate their state, by setting the deactivated flag on downstream nodes that then become eligible for release. This handles the case where subgraphs of sources chained through filter-type nodes like GainNode etc become inactive. I'm mulling this over and hope to post an update tomorrow after further thought.

@hoch
Member

hoch commented Jan 29, 2018

Uh oh. Now I am confused.

At the time the graph is constructed, N3's input channel floors will be set to [0], not [6].

This sounds okay. How about during the rendering? What's N3's input channel count?

A floor value for an input only gets set when a GCed node is disconnected from that input.

So it'll be 6 when N1 is GCed. Then it'll be 2 when N2 is GCed? Then can't we get a signal when the number goes from 6 to 2? It won't match the exact timing of GC, but it's still a piece of information that developers can act upon.

Correspondingly the input of N2 will continue being upmixed to 6.

Well, this defeats the whole purpose of dynamic up/down mixing. This feels like sacrificing the most useful bit of WebAudio only to hide GC.

In my opinion, the source nodes are the least of our problems. The GainNode (N2 in the example) and the ChannelMerger (N1 in the example) going into an AudioWorkletNode (N3 in the example) would be our litmus test for GC observability.

@rtoy
Member

rtoy commented Jan 29, 2018

I believe source nodes can be spec'ed so that GCing them isn't observable, but this requires that AudioWorkletNode change so that if an input is not connected, the process method is still called with an array (of all zeroes).

The real problem is with nodes whose number of output channels is fixed, like a ChannelMergerNode, PannerNode, StereoPannerNode, and ConvolverNode with more than one channel in the response buffer.

@karlt
Contributor Author

karlt commented Jan 30, 2018

At the time the graph is constructed, N3's input channel floors will be set to [0], not [6].

This sounds okay. How about during the rendering? What's N3's input channel count?

Before any GC occurs, input channel count follows the specified mixing rules.
For channelCountMode = "max", computedNumberOfChannels will be 6. After GC
occurs, the maximum is recorded so that computedNumberOfChannels remains at 6.

A floor value for an input only gets set when a GCed node is disconnected from that input.

So it'll be 6 when N1 is GCed. Then it'll be 2 when N2 is GCed?

No, this is just a mechanism to maintain the same behavior as before the GC.
input channel floors need only record the maximum of the nodes that have been
GCed. If N1 is GCed, followed by N2, then input channel floors will remain at [6].
If N2 is GCed first, then input channel floors will first become [2], and then
become [6] when N1 is GCed.
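A minimal sketch of this max-recording behavior (hypothetical bookkeeping in plain JavaScript; none of these names are real Web Audio API):

```javascript
// The floor for an input only ever records the maximum output channel count
// among nodes that were garbage collected while connected to it, so the final
// value is independent of the order in which nodes are collected.
function recordCollected(floors, inputIndex, gcedOutputChannels) {
  floors[inputIndex] = Math.max(floors[inputIndex], gcedOutputChannels);
  return floors;
}

// N1 has 6 output channels, N2 has 2; both feed N3's input 0.
let orderA = [0];
recordCollected(orderA, 0, 6); // N1 GCed first -> [6]
recordCollected(orderA, 0, 2); // then N2      -> still [6]

let orderB = [0];
recordCollected(orderB, 0, 2); // N2 GCed first -> [2]
recordCollected(orderB, 0, 6); // then N1      -> [6]
```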

Correspondingly the input of N2 will continue being upmixed to 6.

Well, this defeats the whole purpose of dynamic up/down mixing. This feels like sacrificing the most useful bit of WebAudio only to hide GC.

Would this really be a sacrifice?

It is specified that the input is upmixed to 6 channels when N1 has stopped but GC has not occurred.
If that is OK, then it is OK for the input to be upmixed when N1 has finished and GC has occurred.

@rtoy
Member

rtoy commented Jan 30, 2018 via email

@joeberkovitz

Thanks @karlt for supplying the explanation, which is in line with my understanding.

@rtoy I believe that an AWN would continue to have an ability to duplicate the builtin nodes, since the proposed GC/lifetime/channel-floor rules are the same for AWNs as for any other node type. There is no difference in either its input or its output channel count behavior from other nodes: the definition given in https://webaudio.github.io/web-audio-api/#dfn-computednumberofchannels would be modified to take account of the AWN's input channel floor. And when an AWN is released due to GC alone (i.e. while its active source flag is false), its output channel count at that moment would be propagated to the input channel floor of whatever nodes the AWN was connected to. If I've missed something, perhaps you could spell out exactly the case where you believe the AWN is going to act differently.

That said, I do think there are some problems with this approach still, and it needs a more formal definition. I'm thinking about it and I expect to post an updated definition today or tomorrow in time for discussion on Thursday's call.

@joeberkovitz

joeberkovitz commented Jan 30, 2018

The following changes will help us formulate new lifetime rules in a way that is more rigorous and which avoids magical connections between GC and lifetime, by introducing the general concept of an audio processor. It also clarifies the equivalence of AudioWorkletNode to other nodes, by defining AudioWorkletProcessor as an audio processor whose lifetime behavior is identical to that of other processors.

  • Each AudioNode possesses an associated audio processor. The audio processor is an object that is internal to the audio engine and is not directly accessible via JavaScript. The processor resides in the audio graph and performs the work associated with the node.

  • An audio processor is created when its owning AudioNode is constructed. The processor is destroyed when its lifetime is determined to have ended, or when its owning AudioContext is destroyed.

  • An AudioNode is a regular JavaScript object and as such has a lifetime determined solely via GC, as given in the language spec. The lifetime of its associated audio processor, on the other hand, is separate.

  • Each AudioNode maintains an internal JS reference to any AudioNode to which its outputs are connected. This means that nodes downstream in the graph from a retained AudioNode will themselves be retained as per normal GC behavior.

  • Each audio processor has a referenced variable, initially set to true. When its AudioNode owner is garbage collected, as part of releasing the AudioNode's internal resources, a message is queued to the rendering thread to "mark the processor as unreferenced" (see steps below).

  • Each audio processor has a live variable, reflecting its "liveness" as determined by all of the rules enumerated in https://webaudio.github.io/web-audio-api/#lifetime-AudioNode, but with rule 1 defined in terms of the processor's referenced variable instead of " normal GC rules".

  • For clarity, we state that an AudioWorkletProcessor is the audio processor of an AudioWorkletNode.

  • A processor whose live variable changes to false is removed from the graph (and the implementation) immediately. Note that this may change the live value of a downstream node to false if it now lacks inputs.

  • The steps for "mark the processor as unreferenced" in the rendering thread are as follows:

    • Let previously live be the present value of the processor's live variable.

    • Set the processor's referenced variable to false.

    • Re-evaluate the processor's live flag, using the lifetime rules. This may remove the processor from the graph.

    • If previously live is true and live is false then the node has become dead due to GC. Propagate each of its outputs' current channel count to the corresponding elements of the input channel floors for each processor to which that output is connected.

    • If a processor formerly receiving input from this one now has a live value of false, then it was only being retained due to the now-destroyed connection. Set its referenced flag as true and invoke the algorithm "mark the processor as unreferenced" (recursively) on it, to further propagate input channel floors.

  • Redefine https://webaudio.github.io/web-audio-api/#dfn-computednumberofchannels to calculate the maximum of the existing definition, and the corresponding element of input channel floors.
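The redefined computation could be sketched as follows (an illustrative helper in plain JavaScript; the authoritative definition is the spec text, and the no-connections fallback of 1 is an assumption on my part):

```javascript
// Sketch of the proposed redefinition: computedNumberOfChannels becomes the
// max of the existing channelCountMode "max" computation over live
// connections and the corresponding element of input channel floors.
function computedNumberOfChannels(connectionChannelCounts, inputChannelFloor) {
  // Existing "max" mode rule, simplified: max over connected outputs.
  // (Assumption: treat an unconnected input as a single silent channel.)
  const existing = connectionChannelCounts.length > 0
    ? Math.max(...connectionChannelCounts)
    : 1;
  return Math.max(existing, inputChannelFloor);
}
```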

It's likely that we'd need some global changes to refer to "processor" instead of "node" in a bunch of places but that's just linguistic.

@rtoy
Member

rtoy commented Jan 30, 2018 via email

@joeberkovitz

The worklet would only be subject to "the floor stuff" if its 32-channel source is disconnected as a result of garbage collection. If the 32-channel source goes away because it stops producing output (e.g. it's an ABSN that finishes playing, or is a filter on an upstream source that stops), or because it's explicitly disconnected, then there will be no "floor stuff" and the worklet will revert to seeing a mono input.

I think you're seeing this proposal as outright dispensing with dynamic channel counts, but the goal here is to make the "pinning" of channel counts apply to what will likely be edge cases, and ones that developers can easily remedy at that.

@karlt
Contributor Author

karlt commented Jan 31, 2018

A biquad filters each channel separately. I connect a 32-channel source and a mono oscillator to the biquad. The biquad sees 32 channels on the input and creates 32 separate filter states, one per channel. The 32-channel source goes away, leaving the mono oscillator. The biquad now only has to filter just the mono channel, resulting in 32 times less computation.

As currently specified, if the 32-channel input is an ABSN that has finished,
the BiquadFilterNode with channelCountMode "max" will continue to filter 32
channels, at least until the source "goes away". Whether the source can just
go away due to GC is unclear from the current text (but that is what
this issue is addressing).

As you indicate, there is an opportunity for optimization if there is only a
single non-silent channel of input. A BiquadFilterNode implementation may
know that the 32 channels derive from a single channel and so skip some
processing (and downstream built-in nodes could even do similarly), but an
AudioWorkletProcessor wouldn't have this information, and so wouldn't be able
to optimize so effectively.

I'm not sure how important/common the temporary 32-channel input situation is,
but the probably more common case of ABSN could be addressed by changing its
output to be a single channel when not playing, as you and I suggested in
#462 (comment) and #462 (comment) .

Beware though that BiquadFilterNode (and perhaps other nodes) would also need
to be changed because currently "The number of channels of the output always
equals the number of channels of the input." BiquadFilterNode has a tail
time. If the output channel count were to change from 32 to 1 at any time
before the tail time expires, then a step change (glitch) would be observed on
any downstream node.

As you point out, specifying a change in the fixed output channel count of
other nodes such as PannerNode would be more complicated. I'm inclined to
think that optimizing for processing of 1 channel instead of 2 from
PannerNode, or more in the case of ChannelMergerNode or other explicit large
channel situations is not so important. I'm not clear whether or not
joeberkovitz's proposal is aiming to support optimization in these cases.

@karlt
Contributor Author

karlt commented Jan 31, 2018

[...] all of the rules enumerated in https://webaudio.github.io/web-audio-api/#lifetime-AudioNode, but with rule 1 defined in terms of the processor's referenced variable instead of " normal GC rules".

A benefit of this processor concept is that this definition of rule 1 would be much more clear than "normal GC rules", thanks.

If previously live is true and live is false then the node has become dead due to GC. Propagate each of its outputs' current channel count to the corresponding elements of the input channel floors for each processor to which that output is connected.

previously live would always be true here IIUC because the referenced variable was true.

But note that there is a race regarding whether GC of the AudioNode occurs
before or after the processor is finished. If GC occurs first, then the
processor will not leave the effects of its output channel count when it
becomes dead. If GC occurs second, then the processor will leave its effects.

If ABSN were changed, then its output channel count change would propagate
through gain nodes and other simple filters, without the need for downstream
propagation of the live variable. I don't know whether it is worth
the additional complexity of a mechanism to affect output channel counts of
inactive PannerNodes, etc.

@joeberkovitz

I thought of the race condition also, and it led me to an unexpected and rather pessimistic conclusion about this whole scheme -- or, rather, family of schemes -- where we attempt to distinguish GC-related node death from other causes.

All such schemes appear doomed to make facts about the timing of GC observable in the following manner:

  • create a source node with N channels (an ABSN will do)
  • arrange for it to stop at time T
  • connect its output to an AudioWorkletNode
  • release all JS references to it
  • wait until time T
  • observe the input channel count of the AWN. If no channels are found, GC of the source node occurred prior to time T. If it's N, then GC occurred at or after time T.
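A toy timeline model of this recipe (plain JavaScript; not runnable against a real AudioContext, and the function names are hypothetical):

```javascript
// Toy model: the source stops at time T, but its effect on the worklet's
// observed input channel count persists until the node is garbage collected.
function observedChannelsAt(t, gcTime, n) {
  // The connection (and its channel count n) disappears only once GC has run.
  return gcTime < t ? 0 : n;
}

// The deduction: observe the AWN's input channel count at the stop time T.
function deduceGcTiming(T, gcTime, n) {
  return observedChannelsAt(T, gcTime, n) === 0
    ? 'GC occurred prior to T'
    : 'GC occurred at or after T';
}
```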

@joeberkovitz

In light of the observability of GC timing, it's time to take a more detailed look at possible node lifetime schemes that truly do ignore GC. Here is a start.

Let's begin with a very common use case that frames the central problem of what happens when GC has no influence on node lifetime. Our example application has:

  • A "master gain" GainNode that is permanently connected to the AudioDestinationNode, and is used to set the level of all sound output from an application. The master gain node has no inputs when created; later, when the application wishes to create sound, scheduled sources will be attached. The application maintains a reference to this master gain node, so that it can connect those sources to it. In today's world, that reference prevents the master gain node from being removed from the graph.

  • A set of "scheduled source" subgraphs that are dynamically created to produce the application's sound output. Each subgraph is an OscillatorNode with definite start and stop times, connected to a GainNode giving control of the level of this individual source. Each scheduled source's GainNode is in turn connected to the "master gain" node. These scheduled sources have no JS references and thus are fire-and-forget in today's world: when the OscillatorNode stops, it disconnects from the source's GainNode. Since there are no references to the GainNode, it disconnects from the master node, and both scheduled source nodes disappear.

Without GC influence on lifetime, it seems clear to me that we would need a new node attribute, endowing each of these GainNodes with different behavior, under application control. The master gain node needs to survive indefinitely, in spite of having no references. The scheduled source node needs to disappear as soon as its upstream Oscillator goes away.

Obviously this would break some applications, no matter which default value we adopt for such an attribute.
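The two regimes can be contrasted with a toy retention predicate (a hypothetical model in plain JavaScript, not spec language) showing how the "master gain" pattern breaks when JS references stop counting:

```javascript
// Toy model: a node stays in the graph if it is "active", has connected
// inputs, or -- only under today's rules -- is reachable via a JS reference.
function isRetained(node, jsRefsKeepNodesAlive) {
  return node.active ||
         node.hasConnectedInputs ||
         (jsRefsKeepNodesAlive && node.hasJsRef);
}

// A "master gain" GainNode: no inputs yet, not active, but the app holds a
// JS reference so it can connect sources later.
const masterGain = { active: false, hasConnectedInputs: false, hasJsRef: true };

const retainedToday = isRetained(masterGain, true);    // true: app works
const retainedGcBlind = isRetained(masterGain, false); // false: app breaks
```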

@rtoy
Member

rtoy commented Jan 31, 2018 via email

@rtoy
Member

rtoy commented Jan 31, 2018 via email

@rtoy
Member

rtoy commented Jan 31, 2018 via email

@hoch
Member

hoch commented Jan 31, 2018

This is an AudioWorklet-specific issue. We should not touch the other parts of the spec to solve it. Also, if this turns out to be a large-scale architecture problem, we should consider addressing it in V2. On that path, we may have to shut down the dynamic channel count feature in AudioWorklet.

The channel count change being visible is a useful feature, but the feature also makes GC observable. I don't think we can reconcile this without a messy hack. If we have to do it with a messy hack, I would rather have a fixed channel count in AudioWorklet to avoid confusion around the spec and the implementation.

@rtoy
Member

rtoy commented Jul 17, 2018

I think for an ABSN, it would output a single channel of silence only after the stop time had been reached. If the ABSN is looping a multi-channel silent buffer forever, it would never change to a single channel of silence.

For a panner node and convolver node, if the input is known to be silent, and the tail time has passed, these nodes would produce an output of one channel of silence.

As you say, cycles complicate this a lot. Consider absn -> convolver -> delay -> back to convolver. Even though the convolver is an FIR filter, the feedback loop makes the overall response an IIR filter, so the output of the convolver is never really zero. But if the convolver detected that the input is "close enough" to zero that it can do its tail processing, then the output of the convolver could go to a single channel of silence, which causes the delay node (eventually) to output one channel of silence, and the convolver then has one channel of true silence.

The panner node could do the same thing, and get similar behavior to the convolver, so I think we're ok there.

I think this would take care of the cycle case with these nodes.

I think the only problematic node is the ChannelMerger which is supposed to have a fixed number of output channels. It would be kind of weird, but if we said that if all the inputs are silent, then the merger produces a single channel of silence for the output, we can make everything work and not expose GC. While strange, no native node could really tell that the channel merger channel count changed because any upmixing/downmixing would just duplicate silent channels and not affect the audio (but possibly affect the number of channels). The worklet would be able to tell, of course, and this is what we want. And the channel count changes are due to silence, not GC.
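The suggested ChannelMergerNode behavior can be sketched as a toy helper (plain JavaScript, hypothetical names; the real merger operates on audio blocks, not booleans):

```javascript
// Toy model of the suggestion: a merger normally has a fixed number of
// output channels, but collapses to a single channel of silence once every
// input is silent -- tying the channel count change to silence, not to GC.
function mergerOutputChannels(inputsSilent, fixedOutputCount) {
  // Note: with zero inputs, every() is vacuously true, so the output is
  // also a single silent channel.
  return inputsSilent.every(Boolean) ? 1 : fixedOutputCount;
}
```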

Perhaps there are some corner cases that I've missed.

@rtoy
Copy link
Member

rtoy commented Aug 30, 2018

Teleconf: write up something for review based on the idea in #1471 (comment)

@karlt
Copy link
Contributor Author

karlt commented Aug 31, 2018

As you say, cycles complicate this a lot. Consider absn -> convolver -> delay -> back to convolver. Even though the convolver is an FIR filter, the feedback loop makes the overall response an IIR filter, so the output of the convolver is never really zero. But if the convolver detected that the input is "close enough" to zero that it can do its tail processing, then the output of the convolver could go to a single channel of silence, which causes the delay node (eventually) to output one channel of silence, and the convolver then has one channel of true silence.

The panner node could do the same thing, and get similar behavior to the convolver, so I think we're ok there.

I'm not clear here on whether the proposal is that convolver and panner nodes
always check for close enough to silence or only when in a cycle.

ABSN -> Gain -> Delay -> back to Gain needs similar treatment to when
other filter nodes are in the cycle.

Consider if filter nodes were to always check for close enough to silence, but
an ABSN with a silent multi-channel buffer never outputs a single channel
while playing. ABSN -> Gain, however, would output a single channel while
playing. It would be unexpected for the insertion of a Gain into the graph to
change the number of channels.

If filter nodes are to only check for silence while in a cycle, then this is
more than necessary, because it would be sufficient for only DelayNodes to
check for close enough to silence. There is always a DelayNode in any unmuted
cycle, and a single channel "known" silent output would propagate
appropriately to other nodes.

I think these are the options from which to choose:

  1. All nodes check for close-enough to silent output (and produce only a
    single channel when silent).

  2. Only DelayNodes check for close enough to silent output.

  3. Only DelayNodes check for close enough to silent output, and only when in a
    cycle.

I expect any of these could be a solution.

For 2 and 3, in other situations, filter nodes need only check whether all
inputs are single channel "known" silent, where the output of an ABSN playing
a silent mono buffer is not "known" silent, I assume. I guess some other term
such as "inactive" might be better to describe the "known" silence.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Oct 20, 2018
…denot

This is necessary for efficient processing of silence, and is consistent with
behavior of other nodes.

A null block currently has a single channel of silence, which isn't aligned with
the current spec, but is consistent with the direction of
WebAudio/web-audio-api#1471 (comment)

Depends on D9210

Differential Revision: https://phabricator.services.mozilla.com/D9211

--HG--
extra : moz-landing-system : lando
@padenot
Copy link
Member

padenot commented Oct 26, 2018

TPAC resolution (2018-10-16): the group agrees on solution 3 above.

@hoch hoch removed the "Needs Discussion" label Nov 1, 2018
@padenot
Copy link
Member

padenot commented Nov 8, 2018

I think for an ABSN, it would output a single channel of silence only after the stop time had been reached. If the ABSN is looping a multi-channel silent buffer forever, it would never change to a single channel of silence.

This is actually the current normative behaviour.

@hoch
Copy link
Member

hoch commented Nov 8, 2018

I think it's helpful to summarize where we are before writing up a PR.

  • When an AudioNode is disabled, its output produces known silence (a single channel of silence) at the inputs of subsequent nodes.
  • A disabled node will be disconnected and collected later.
  • In the case of a DelayNode in a cycle, its output becomes known silence when it goes below a threshold.

Please feel free to add more if I am missing something.

@karlt
Copy link
Contributor Author

karlt commented Nov 9, 2018

I'm not clear exactly what is meant by "disabled", but that seems like the gist of it. Also

  • Any filter node with known silent input and no tail remaining produces known silent output.
A disabled node will be disconnected and collected later.

A disabled or known-silent node may be collected later, but this need not be spec'd.

It still cannot be disconnected just because an implementation wants to garbage collect the node. IOW any effects of connection counts must not be affected by GC.

The solution proposed here means channel count effects from inactive nodes are not observable, but connection counts are a slightly different issue. ScriptProcessorNode behavior is what depends on this. AudioWorklet behavior would not be affected by inactive connections when #1453 is addressed.

@hoch
Copy link
Member

hoch commented Nov 9, 2018

I think the following terms need to be clarified (so I can grasp what needs to be written):

  1. Disabled AudioNode
  2. Known silent input

In Chrome, the "disabled" status means that a node has completed its rendering activity and won't be pulled any more. If a path in the graph is to be dismantled after a source node stops, the nodes in that path will be disabled before they are disconnected. Thus, it does not mean either disconnection or garbage collection. We can mention the disconnection, but need not say anything about GC.

I see your reference to "inactive connection" is quite close to this concept. No?

Once we settle on this idea, then defining known silent input should be easy: When an AudioNode is processed, it checks its upstream nodes to see whether they are disabled or not. If all upstream nodes are disabled, then the node will have known silent input in turn.
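That rule is easy to sketch (a toy object model with illustrative field names, not real AudioNodes):

```javascript
// A node's input is "known silent" when every upstream node feeding it
// is disabled, i.e. has finished its rendering activity and won't be
// pulled again.
function hasKnownSilentInput(node) {
  return node.upstream.every(n => n.disabled);
}

const stoppedSource = { disabled: true, upstream: [] };
const playingSource = { disabled: false, upstream: [] };
const gainA = { disabled: false, upstream: [stoppedSource] };
const gainB = { disabled: false, upstream: [stoppedSource, playingSource] };

console.log(hasKnownSilentInput(gainA)); // true: its only upstream node is disabled
console.log(hasKnownSilentInput(gainB)); // false: one upstream node is still active
```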

@karlt
Copy link
Contributor Author

karlt commented Nov 12, 2018

In Chrome, the "disabled" status means that a node has completed its rendering activity and won't be pulled any more. If a path in the graph is to be dismantled after a source node stops, the nodes in that path will be disabled before they are disconnected. Thus, it does not mean either disconnection or garbage collection. We can mention the disconnection, but need not say anything about GC.

If only aiming to define "disabled" for source nodes that have played and stopped, then it would be possible to do so without saying anything about GC, but I doubt that is useful here.

IIUC the path "to be dismantled" includes filter nodes. These could only be disabled after they have no external references. It would be necessary to talk about GC to describe which filter nodes could be disabled. Source nodes that have not played would be in a similar situation.

As "disabled" would only be useful for source nodes that have finished playing, we'd still need a mechanism for other nodes. The mechanism for other nodes would also be sufficient for source nodes that have finished, and so there is little value in adding the "disabled" concept.

I see your reference to "inactive connection" is quite close to this concept. No?

It differs in that an inactive node can become active again.
I've been using inactive in a way that is equivalent to known silent.

The concept of known silent is key here.
A source node that is not currently playing would be known silent.

@hoch
Copy link
Member

hoch commented Nov 15, 2018

Okay, it seems like you think known silent is the only term that needs to be newly introduced in the spec. Would you mind writing up some prose or something so we can get this going?

@karlt
Copy link
Contributor Author

karlt commented Nov 19, 2018

The note about a "variety of approaches" at https://webaudio.github.io/web-audio-api/#AudioWorkletProcessor-methods uses the term "active" for this, but "actively processing" would be more specific and avoid confusion with other active states.

The general prose for AudioNode could be

Unless otherwise specified, an AudioNode starts actively processing when any node connected to one of its inputs is actively processing, and ceases actively processing when the input received from other actively processing nodes no longer affects the output.

When an AudioNode is not actively processing, its outputs are silent and have only a single channel.

There are a few necessary special cases.

An AudioScheduledSourceNode is actively processing when and only when it is playing for at least part of the current rendering quantum.

A DelayNode in a cycle is actively processing only when any output sample for the current rendering quantum is greater than or equal to 2^-126.

A ScriptProcessorNode is actively processing when its output is connected.

(This could be "input or output" to use the same sentence as onaudioprocess events because the actively processing state won't be observable if its output is not connected.)

For AudioWorkletNode, I suggest including [[callable process]] in the active processing conditions:

An AudioWorkletNode is actively processing when BOTH of the following conditions are met:

  • [[callable process]] is true.

  • EITHER of the following conditions are met.

    • The associated AudioWorkletProcessor's active source flag is equal to true.

    • Any node connected to one of its inputs is actively processing.

"If process() is not called during some rendering quantum due to the lack of any applicable active processing conditions, the result is as if the processor emitted silence for this period." may be removed.

Replace other active processing conditions references with references to actively processing.
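The special cases above can be sketched as predicates (a toy model with illustrative field names; the node objects are stand-ins, not real AudioNodes, and taking the absolute value of samples is an assumption on my part):

```javascript
const THRESHOLD = Math.pow(2, -126);

// A DelayNode in a cycle is actively processing only while some output
// sample in the current rendering quantum reaches 2^-126 in magnitude.
function delayInCycleActive(outputBlock) {
  return outputBlock.some(s => Math.abs(s) >= THRESHOLD);
}

// An AudioWorkletNode is actively processing when [[callable process]]
// is true AND (the processor's active source flag is true OR any node
// connected to one of its inputs is actively processing).
function workletNodeActive(node) {
  return node.callableProcess &&
         (node.activeSourceFlag ||
          node.inputs.some(n => n.activelyProcessing));
}

console.log(delayInCycleActive(new Float32Array(128))); // false: all zeros
console.log(workletNodeActive({
  callableProcess: true,
  activeSourceFlag: false,
  inputs: [{ activelyProcessing: true }],
})); // true
```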

@karlt
Copy link
Contributor Author

karlt commented Nov 19, 2018

In line with #1471 (comment), the AudioNode lifetime section would also be removed.

@padenot
Copy link
Member

padenot commented Nov 19, 2018

@hoch, do you agree with the above? I can take care of writing the prose here.

@hoch
Copy link
Member

hoch commented Nov 19, 2018

Thanks @karlt. This all looks good to me, @padenot!

@hoch hoch unassigned rtoy and hoch Nov 19, 2018
padenot added a commit to padenot/web-audio-api that referenced this issue Jun 26, 2019
padenot added a commit that referenced this issue Jun 28, 2019
Update the length of the parameter sequence after #1471.