feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header (#11048)
* spanner: implement generation and propagation of "x-goog-spanner-request-id" Header
In tandem with the specification:
https://orijtech.notion.site/x-goog-spanner-request-id-always-on-gRPC-header-to-aid-in-quick-debugging-of-errors-14aba6bc91348091a58fca7a505c9827
this change adds sending over the "x-goog-spanner-request-id" header
for every unary and streaming call, in the form:
<version>.<processId>.<clientId>.<channelId>.<requestCountForClient>.<rpcCountForRequest>
where:
* version is the version of the specification
* processId is a randomly generated uint64 singleton for the lifetime of a process
* clientId is the monotonically increasing id/number of gRPC Spanner clients created
* channelId is the id of the channel, currently fixed at 1 for Go
* requestCountForClient is the monotonically increasing number of requests made by the client
* rpcCountForRequest is the number of RPCs/retries within a specific request
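The layout above can be sketched as follows. This is a minimal illustration, not the code from this change: the type `client`, the helpers `newClient`, `nextRequest`, `requestID` and `splitRequestID`, the value of `specVersion`, and the decimal rendering of processId are all assumptions made for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"strconv"
	"strings"
	"sync/atomic"
)

// specVersion is assumed to be 1 for this sketch.
const specVersion = 1

// processID is a random uint64 generated once for the lifetime of the process.
var processID = mustRandUint64()

// clientCount is the number of clients created so far (accessed atomically).
var clientCount uint64

func mustRandUint64() uint64 {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return binary.BigEndian.Uint64(b[:])
}

// client is a hypothetical stand-in for a Spanner client.
type client struct {
	id        uint64
	channelID uint64
	requests  uint64 // accessed atomically
}

func newClient() *client {
	return &client{id: atomic.AddUint64(&clientCount, 1), channelID: 1}
}

// nextRequest reserves the next request number for this client.
func (c *client) nextRequest() uint64 { return atomic.AddUint64(&c.requests, 1) }

// requestID renders the six dot-separated fields for one attempt of a request.
func (c *client) requestID(nthRequest, attempt uint64) string {
	return fmt.Sprintf("%d.%d.%d.%d.%d.%d",
		specVersion, processID, c.id, c.channelID, nthRequest, attempt)
}

// splitRequestID parses a header value back into its six numeric fields.
func splitRequestID(s string) ([]uint64, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 6 {
		return nil, fmt.Errorf("want 6 fields, got %d", len(parts))
	}
	out := make([]uint64, len(parts))
	for i, p := range parts {
		v, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return nil, err
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	c := newClient()
	req := c.nextRequest()
	fmt.Println(c.requestID(req, 1)) // first attempt
	fmt.Println(c.requestID(req, 2)) // retry of the same request
}
```

Only the last field changes between attempts of the same request, which is what lets a retry be correlated with the original call.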
This header is sent on both unary and streaming calls and will
help debug latencies for customers. On an error, customers can type-assert
against .Error, retrieve the associated .RequestID and log it; better still,
it is printed out whenever errors are logged.
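A caller-side sketch of that retrieval is shown here. The `Error` type below is a local stand-in defined only for this example; apart from the `RequestID` field named above, its fields, its message format, and the helper `requestIDOf` are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Error is a hypothetical stand-in for the client's error type.
// Only the RequestID field is taken from the commit message.
type Error struct {
	Desc      string
	RequestID string
}

// Error includes the request id when one is present (format assumed).
func (e *Error) Error() string {
	if e.RequestID == "" {
		return "spanner: " + e.Desc
	}
	return fmt.Sprintf("spanner: %s, requestID = %q", e.Desc, e.RequestID)
}

// requestIDOf returns the request id carried anywhere in err's chain.
func requestIDOf(err error) (string, bool) {
	var se *Error
	if errors.As(err, &se) && se.RequestID != "" {
		return se.RequestID, true
	}
	return "", false
}

func main() {
	err := fmt.Errorf("reading row: %w",
		&Error{Desc: "deadline exceeded", RequestID: "1.42.1.1.7.2"})
	if id, ok := requestIDOf(err); ok {
		fmt.Println("failed request:", id)
	}
	fmt.Println(err)
}
```

Using `errors.As` rather than a direct type assertion keeps the lookup working when the error has been wrapped by intermediate layers.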
Importantly, randIdForProcess is a uint64, which is 64 bits, and not
a UUID4, which is 128 bits. The prior design used a uint32, aka 32 bits, for
which just 50,000 new processes being created could get the probability
of collisions to 25%. A 128-bit id would massively reduce the possibility of
collisions, so that high QPS applications could function and accept bursts of
traffic without worry: a 1% chance of collision would take about 2.6e18 ids,
which is 82 million new ids every second for 1,000 years.
Using 64 bits still provides really good protection, whereby for a 1% chance of collision
we would need 810 million objects.
However, Google Cloud Spanner's backend has to store every one of the always-on
headers for a desired retention period, hence 64 bits is a great balance between collision
protection and storage.
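For readers who want to check these orders of magnitude, here is a small sketch using the standard birthday approximation n ≈ sqrt(2 · 2^bits · ln(1/(1−p))). The function name `idsForCollision` is made up for this example. Under this approximation the 32-bit, 25% case comes out near 50,000, the 2.6e18 figure corresponds to a 128-bit space at 1%, and the 64-bit, 1% case lands in the hundreds of millions.

```go
package main

import (
	"fmt"
	"math"
)

// idsForCollision estimates how many uniformly random ids of the given bit
// width must be drawn before the probability of at least one collision
// reaches p, using the birthday approximation.
func idsForCollision(bits int, p float64) float64 {
	space := math.Pow(2, float64(bits))
	return math.Sqrt(2 * space * math.Log(1/(1-p)))
}

func main() {
	fmt.Printf("32-bit,  25%%: %.3g ids\n", idsForCollision(32, 0.25))
	fmt.Printf("64-bit,   1%%: %.3g ids\n", idsForCollision(64, 0.01))
	fmt.Printf("128-bit,  1%%: %.3g ids\n", idsForCollision(128, 0.01))
}
```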
Fixes #11073
* Rebase with main; rename nthRPC to attempt
* Infer channelID from ConnPool directly
* Attach nthRequest to sessionClient instead of to grpcClient given channelID is derived from sessionClient.connPool
* Retain reference to grpc.Header(*metadata.MD)
We have to re-insert the request-id even after gax.Invoke->grpc
internals clear it. Added test to validate retries.
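The idea of re-inserting the header on every attempt can be sketched with the standard library alone. This is not the code from this change: `md` merely mirrors the map shape of gRPC metadata, and `invokeWithRetry`, `errTransient` and the fixed-count retry policy are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

const requestIDHeader = "x-goog-spanner-request-id"

// md mirrors the shape of gRPC metadata: a map from key to values.
type md map[string][]string

// errTransient is a placeholder for a retryable failure.
var errTransient = errors.New("transient failure")

// invokeWithRetry calls rpc up to maxAttempts times. Before every attempt it
// writes the request-id header again, so the value is present even if a lower
// layer cleared it, and the trailing attempt counter reflects the retry.
func invokeWithRetry(prefix string, maxAttempts int, headers md, rpc func(md) error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		headers[requestIDHeader] = []string{fmt.Sprintf("%s.%d", prefix, attempt)}
		if err = rpc(headers); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	calls := 0
	err := invokeWithRetry("1.42.1.1.7", 3, md{}, func(h md) error {
		calls++
		fmt.Println("attempt sends", h[requestIDHeader][0])
		delete(h, requestIDHeader) // simulate a lower layer clearing the header
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("err:", err)
}
```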
* Fix up Error.Error() to show RequestID for both cases
* spanner: bring in tests contributed by Knut
* spanner: allow errors with grpc.codes: Canceled and DeadlineExceeded to be wrapped with request-id
* spanner: correctly track and increment retry attempts for each ExecuteStreamingSql request
* spanner: propagate RequestID even for DeadlineExceeded
* spanner: assert .RequestID exists
* Address code review nits+feedback
* spanner: account for stream resets and retries
This change accounts for logic graciously raised by Knut
along with his test contribution.
* Address more updates