core, grpclb,xds: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken #8048

voidzcy · 2021-04-02T21:46:10Z

Currently, each subchannel implicitly refreshes name resolution when its connection is broken; the feature is built into the subchannel's internal implementation. Although this relieves LB implementations of the burden of refreshing the resolver when connections to backends break, it gives LB policies no chance to disable or override the refresh. For example, in a complex load balancing hierarchy like xDS, an LB policy may embed its own resolver for resolving backends, so the refresh should be hooked to the resolver embedded in the LB policy instead of the one in the Channel.
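As a rough illustration of the new responsibility, here is a dependency-free sketch of a leaf policy requesting the refresh itself. `LeafPolicy`, `CountingHelper`, and the local `Helper` interface are hypothetical stand-ins, not grpc-java classes, and the sketch assumes a broken connection surfaces as a transition to TRANSIENT_FAILURE or IDLE.

```java
public class LeafPolicyRefreshSketch {
  /** Connectivity states, mirroring the names gRPC uses. */
  public enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

  /** Hypothetical stand-in for the one LoadBalancer.Helper method used here. */
  public interface Helper {
    void refreshNameResolution();
  }

  /** Hypothetical leaf policy that balances directly over its own subchannels. */
  public static final class LeafPolicy {
    private final Helper helper;

    public LeafPolicy(Helper helper) {
      this.helper = helper;
    }

    /** Called whenever one of this policy's subchannels changes state. */
    public void onSubchannelState(State newState) {
      // Assumption: a broken connection shows up as TRANSIENT_FAILURE or IDLE.
      if (newState == State.TRANSIENT_FAILURE || newState == State.IDLE) {
        helper.refreshNameResolution();
      }
      // Picker updates are omitted from this sketch.
    }
  }

  /** Test double that counts refresh requests. */
  public static final class CountingHelper implements Helper {
    private int refreshCount;

    @Override
    public void refreshNameResolution() {
      refreshCount++;
    }

    public int refreshCount() {
      return refreshCount;
    }
  }

  public static void main(String[] args) {
    CountingHelper helper = new CountingHelper();
    LeafPolicy policy = new LeafPolicy(helper);
    policy.onSubchannelState(State.READY);
    policy.onSubchannelState(State.TRANSIENT_FAILURE);
    System.out.println("refreshes requested: " + helper.refreshCount());
  }
}
```

A policy that embeds its own resolver could route the same call to that resolver instead, which is the override the implicit refresh did not allow.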

This is likely to be a breaking change for users who implement their own LB policies (performing load balancing directly on backends). To make the transition smooth, we add a check to SubchannelImpl that verifies whether the LoadBalancer has explicitly called Helper.refreshNameResolution() for broken subchannels it created. If not, it logs a warning and does the refresh itself.

A temporary LoadBalancer.Helper API, ignoreRefreshNameResolutionCheck(), is added to avoid false-positive warnings for xDS, which intentionally does not want a refresh. Once the migration is done, this API should be deleted.
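The transition check and the temporary opt-out can be modeled roughly as follows. This is a simplified sketch, not the code in SubchannelImpl or ManagedChannelImpl: `ChannelModel` and `deliverState` are hypothetical names, the warning text is made up, and a broken connection is again assumed to mean TRANSIENT_FAILURE or IDLE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RefreshCheckSketch {
  public enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

  /** Hypothetical model of the channel-side bookkeeping for one LB policy. */
  public static final class ChannelModel {
    private final List<String> warnings = new ArrayList<>();
    private int resolverRefreshes;
    private boolean refreshCalled; // set when the LB asks for a refresh
    private boolean ignoreCheck;   // set when the LB opts out of the check

    /** Models Helper.refreshNameResolution(). */
    public void refreshNameResolution() {
      refreshCalled = true;
      resolverRefreshes++;
    }

    /** Models the temporary Helper.ignoreRefreshNameResolutionCheck(). */
    public void ignoreRefreshNameResolutionCheck() {
      ignoreCheck = true;
    }

    /** Delivers a subchannel state to the LB's listener, then runs the check. */
    public void deliverState(State newState, Consumer<State> lbListener) {
      refreshCalled = false;
      lbListener.accept(newState);
      boolean broken = newState == State.TRANSIENT_FAILURE || newState == State.IDLE;
      if (broken && !refreshCalled && !ignoreCheck) {
        // Fallback for unmigrated policies: warn, then refresh on their behalf.
        warnings.add("LoadBalancer did not call Helper.refreshNameResolution()");
        resolverRefreshes++;
      }
    }

    public int resolverRefreshes() {
      return resolverRefreshes;
    }

    public List<String> warnings() {
      return warnings;
    }
  }

  public static void main(String[] args) {
    ChannelModel legacy = new ChannelModel();
    legacy.deliverState(State.TRANSIENT_FAILURE, s -> { });
    System.out.println("legacy refreshes: " + legacy.resolverRefreshes()
        + ", warnings: " + legacy.warnings().size());
  }
}
```

In this model a migrated policy triggers exactly one refresh with no warning, an unmigrated policy still gets its refresh along with a warning, and a policy that opted out gets neither.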

See details in #8088.

ejona86

Do we have any ideas how we're supposed to notice bugs with LB behavior? It would seem trivial for a LB policy to not be triggering refreshes and no test would fail.

voidzcy · 2021-04-15T19:37:47Z

> Do we have any ideas how we're supposed to notice bugs with LB behavior? It would seem trivial for a LB policy to not be triggering refreshes and no test would fail.

Fixed by adding a flag. The Channel will still do the automatic refresh if the LoadBalancer did not do so. PTAL.

ejona86

Just to help check that we've updated our code appropriately, replace the warning with an exception/panic and run the tests to make sure they pass. After a few releases with the warning, I wouldn't be surprised if we want to do something similar for a grpc-java release.

voidzcy added 7 commits April 2, 2021 14:20

Do not implicitly refresh name resolution when subchannel connection is broken, let LB policies handle it.

4743bc8

Add subchannel refreshing name resolution when connection broken logic in pick_first.

1305135

Add subchannel refreshing name resolution when connection broken logic in round_robin.

6c7f2f3

Add subchannel refreshing name resolution when connection broken logic in grpclb.

b6633e6

Fix mock in RLS test.

8f3f060

Merge branch 'master' of https://github.com/grpc/grpc-java into bugfix/move_subchannel_refreshing_ns_to_lb

826f6e1

Add subchannel refreshing name resolution when connection broken logic in ring hash LB policy.

a3b5e13

voidzcy changed the title ~~core, grpclb: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken~~ core, grpclb,xds: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken Apr 9, 2021

voidzcy requested a review from ejona86 April 9, 2021 22:18

ejona86 reviewed Apr 9, 2021

voidzcy added 3 commits April 15, 2021 11:52

Add ignoreRefreshNameResolutionCheck() API to skip warning and Channel's refresh for cases LoadBalancer intentionally does not want a refresh.

be2386c

Check if refreshNameResolution() has been called by the LoadBalancer when handling subchannel state changes. Log a warning and do a refresh if it is not.

fd1b608

The cluster_resolver LB policy intentionally does not want to trigger a name resolution refresh for the resolver in Channel, so use ignoreRefreshNameResolutionCheck() to avoid false-positive warnings.

2ff23c6

voidzcy force-pushed the bugfix/move_subchannel_refreshing_ns_to_lb branch from 9256ec1 to 2ff23c6 April 15, 2021 19:34

Add proper annotation.

cf5af02

ejona86 approved these changes Apr 15, 2021

voidzcy added 4 commits April 15, 2021 16:38

Improve warning message.

e443a78

Make LoadBalancer.Helper.ignoreRefreshNameResolutionCheck() default to no-op, update its doc and not mark it as deprecated for now.

417532f

Add TODO note for changing LB policies manage name resolution refresh for OOB channel state changes.

522d545

Fix warning message check test.

8312c61

voidzcy merged commit 9614738 into grpc:master Apr 16, 2021

github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2021
