Commit 3159b12

feat: add timeout to inflight queue waiting (#1957)
* feat: Split writer into connection worker and wrapper, this is a prerequisite for multiplexing client
* feat: add connection worker pool skeleton, used for multiplexing client
* feat: add Load api for connection worker for multiplexing client
* feat: add multiplexing support to connection worker. We will treat every new stream name as a switch of destination
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* feat: port the multiplexing client core algorithm and basic tests; also fixed a tiny bug inside fake bigquery write impl for getting the response from offset
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* feat: wire multiplexing connection pool to stream writer
* feat: some fixes for multiplexing client
* feat: fix some todos, and reject the mixed behavior of passed in client or not
* feat: fix the bug that we may peek into the write_stream field but it's possible the proto schema does not contain this field
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* feat: fix the bug that we may peek into the write_stream field but it's possible the proto schema does not contain this field
* feat: add getInflightWaitSeconds implementation
* feat: Add schema comparison in connection loop to ensure schema update for the same stream name can be notified
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* feat: add schema update support to multiplexing
* fix: fix windows build bug: windows Instant resolution is different with linux
* fix: fix another failing tests for windows build
* fix: fix another test failure for Windows build
* feat: Change new thread for each retry to be a thread pool to avoid create/tear down too much threads if lots of retries happens
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* fix: add back the background executor provider that's accidentally removed
* feat: throw error when use connection pool for explicit stream
* fix: Add precision truncation to the passed in value from JSON float and double type.
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* modify the bom version
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* fix deadlock issue in ConnectionWorkerPool
* fix: fix deadlock issue during close + append for multiplexing
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* fix: fix one potential root cause of deadlock issue for non-multiplexing case
* 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* Add timeout to inflight queue waiting, and also add some extra log

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
1 parent dcb234b commit 3159b12

3 files changed, +93 -2 lines changed

google-cloud-bigquerystorage/src/main/java/com/google/cloud/bigquery/storage/v1/ConnectionWorker.java

+29 -2
@@ -61,6 +61,9 @@
 class ConnectionWorker implements AutoCloseable {
   private static final Logger log = Logger.getLogger(StreamWriter.class.getName());
 
+  // Maximum wait time on inflight quota before error out.
+  private static long INFLIGHT_QUOTA_MAX_WAIT_TIME_MILLI = 300000;
+
   private Lock lock;
   private Condition hasMessageInWaitingQueue;
   private Condition inflightReduced;
@@ -322,7 +325,14 @@ private ApiFuture<AppendRowsResponse> appendInternal(AppendRowsRequest message)
       this.inflightBytes += requestWrapper.messageSize;
       waitingRequestQueue.addLast(requestWrapper);
       hasMessageInWaitingQueue.signal();
-      maybeWaitForInflightQuota();
+      try {
+        maybeWaitForInflightQuota();
+      } catch (StatusRuntimeException ex) {
+        --this.inflightRequests;
+        waitingRequestQueue.pollLast();
+        this.inflightBytes -= requestWrapper.messageSize;
+        throw ex;
+      }
       return requestWrapper.appendResult;
     } finally {
       this.lock.unlock();
@@ -347,6 +357,15 @@ private void maybeWaitForInflightQuota() {
                 .withCause(e)
                 .withDescription("Interrupted while waiting for quota."));
       }
+      long current_wait_time = System.currentTimeMillis() - start_time;
+      if (current_wait_time > INFLIGHT_QUOTA_MAX_WAIT_TIME_MILLI) {
+        throw new StatusRuntimeException(
+            Status.fromCode(Code.CANCELLED)
+                .withDescription(
+                    String.format(
+                        "Interrupted while waiting for quota due to long waiting time %sms",
+                        current_wait_time)));
+      }
     }
     inflightWaitSec.set((System.currentTimeMillis() - start_time) / 1000);
   }
@@ -373,7 +392,6 @@ public void close() {
     log.fine("Waiting for append thread to finish. Stream: " + streamName);
     try {
       appendThread.join();
-      log.info("User close complete. Stream: " + streamName);
     } catch (InterruptedException e) {
       // Unexpected. Just swallow the exception with logging.
       log.warning(
@@ -387,6 +405,7 @@ public void close() {
     }
 
     try {
+      log.fine("Begin shutting down user callback thread pool for stream " + streamName);
       threadPool.shutdown();
       threadPool.awaitTermination(3, TimeUnit.MINUTES);
     } catch (InterruptedException e) {
@@ -396,7 +415,10 @@ public void close() {
               + streamName
               + " is interrupted with exception: "
               + e.toString());
+      throw new IllegalStateException(
+          "Thread pool shutdown is interrupted for stream " + streamName);
     }
+    log.info("User close finishes for stream " + streamName);
   }
 
   /*
@@ -858,6 +880,11 @@ public static void setOverwhelmedCountsThreshold(double newThreshold) {
     }
   }
 
+  @VisibleForTesting
+  static void setMaxInflightQueueWaitTime(long waitTime) {
+    INFLIGHT_QUOTA_MAX_WAIT_TIME_MILLI = waitTime;
+  }
+
   @AutoValue
   abstract static class TableSchemaAndTimestamp {
     // Shows the timestamp updated schema is reported from response
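
Note on the ConnectionWorker.java changes above: the two related hunks bound the wait for inflight quota and, when the wait fails, undo the bookkeeping that appendInternal had already done (request count, byte count, queue entry). The pattern can be sketched in isolation as below. This is a minimal sketch and not the library's code: the class BoundedInflightQueue and its methods add, done, and inflightCount are hypothetical names, the limit check uses the sketch's own convention (block while the count exceeds the limit), it tracks the remaining budget with Condition.awaitNanos rather than comparing System.currentTimeMillis() as the diff does, and it throws IllegalStateException instead of a gRPC StatusRuntimeException so that it has no external dependencies.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for inflight bookkeeping; not part of the BigQuery Storage client. */
final class BoundedInflightQueue {
  private final Lock lock = new ReentrantLock();
  private final Condition inflightReduced = lock.newCondition();
  private final int maxInflight;
  private final long maxWaitMillis;
  private int inflight = 0;

  BoundedInflightQueue(int maxInflight, long maxWaitMillis) {
    this.maxInflight = maxInflight;
    this.maxWaitMillis = maxWaitMillis;
  }

  /** Registers one request; blocks while over the limit, up to maxWaitMillis. */
  void add() {
    lock.lock();
    try {
      inflight++; // Bookkeeping happens before the wait, as in the diff.
      long remainingNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
      while (inflight > maxInflight) {
        if (remainingNanos <= 0L) {
          inflight--; // Roll back so a failed call leaves no trace in the counters.
          throw new IllegalStateException(
              "Timed out waiting for quota after " + maxWaitMillis + "ms");
        }
        try {
          // awaitNanos returns an estimate of the time still left in the budget.
          remainingNanos = inflightReduced.awaitNanos(remainingNanos);
        } catch (InterruptedException e) {
          inflight--; // Same rollback on interruption.
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for quota", e);
        }
      }
    } finally {
      lock.unlock();
    }
  }

  /** Marks one request as finished and wakes any waiting callers. */
  void done() {
    lock.lock();
    try {
      inflight--;
      inflightReduced.signalAll();
    } finally {
      lock.unlock();
    }
  }

  int inflightCount() {
    lock.lock();
    try {
      return inflight;
    } finally {
      lock.unlock();
    }
  }
}
```

Without the rollback, a caller that times out would leave the counter one higher than the number of requests actually queued, and later callers could block on quota that is never released.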

google-cloud-bigquerystorage/src/test/java/com/google/cloud/bigquery/storage/v1/ConnectionWorkerTest.java

+63
@@ -16,6 +16,9 @@
 package com.google.cloud.bigquery.storage.v1;
 
 import static com.google.common.truth.Truth.assertThat;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertThrows;
+import static org.junit.Assert.assertTrue;
 
 import com.google.api.core.ApiFuture;
 import com.google.api.gax.batching.FlowController;
@@ -28,7 +31,9 @@
 import com.google.cloud.bigquery.storage.v1.ConnectionWorker.Load;
 import com.google.protobuf.DescriptorProtos;
 import com.google.protobuf.Int64Value;
+import io.grpc.StatusRuntimeException;
 import java.io.IOException;
+import java.time.Duration;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
@@ -52,6 +57,7 @@ public class ConnectionWorkerTest {
   @Before
   public void setUp() throws Exception {
     testBigQueryWrite = new FakeBigQueryWrite();
+    ConnectionWorker.setMaxInflightQueueWaitTime(300000);
     serviceHelper =
         new MockServiceHelper(
             UUID.randomUUID().toString(), Arrays.<MockGrpcService>asList(testBigQueryWrite));
@@ -281,6 +287,63 @@ public void testAppendInSameStream_switchSchema() throws Exception {
     }
   }
 
+  @Test
+  public void testAppendButInflightQueueFull() throws Exception {
+    ConnectionWorker connectionWorker =
+        new ConnectionWorker(
+            TEST_STREAM_1,
+            createProtoSchema("foo"),
+            6,
+            100000,
+            Duration.ofSeconds(100),
+            FlowController.LimitExceededBehavior.Block,
+            TEST_TRACE_ID,
+            client.getSettings());
+    testBigQueryWrite.setResponseSleep(org.threeten.bp.Duration.ofSeconds(1));
+    ConnectionWorker.setMaxInflightQueueWaitTime(500);
+    ProtoSchema schema1 = createProtoSchema("foo");
+
+    long appendCount = 6;
+    for (int i = 0; i < appendCount; i++) {
+      testBigQueryWrite.addResponse(createAppendResponse(i));
+    }
+
+    // In total insert 6 requests, since the max queue size is 5 we will stuck at the 6th request.
+    List<ApiFuture<AppendRowsResponse>> futures = new ArrayList<>();
+    for (int i = 0; i < appendCount; i++) {
+      long startTime = System.currentTimeMillis();
+      // At the last request we wait more than 500 millisecond for inflight quota.
+      if (i == 5) {
+        assertThrows(
+            StatusRuntimeException.class,
+            () -> {
+              sendTestMessage(
+                  connectionWorker,
+                  TEST_STREAM_1,
+                  schema1,
+                  createFooProtoRows(new String[] {String.valueOf(5)}),
+                  5);
+            });
+        long timeDiff = System.currentTimeMillis() - startTime;
+        assertEquals(connectionWorker.getLoad().inFlightRequestsCount(), 5);
+        assertTrue(timeDiff > 500);
+      } else {
+        futures.add(
+            sendTestMessage(
+                connectionWorker,
+                TEST_STREAM_1,
+                schema1,
+                createFooProtoRows(new String[] {String.valueOf(i)}),
+                i));
+        assertEquals(connectionWorker.getLoad().inFlightRequestsCount(), i + 1);
+      }
+    }
+
+    for (int i = 0; i < appendCount - 1; i++) {
+      assertEquals(i, futures.get(i).get().getAppendResult().getOffset().getValue());
+    }
+  }
+
   private AppendRowsResponse createAppendResponse(long offset) {
     return AppendRowsResponse.newBuilder()
         .setAppendResult(

google-cloud-bigquerystorage/src/test/java/com/google/cloud/bigquery/storage/v1/StreamWriterTest.java

+1
@@ -105,6 +105,7 @@ public StreamWriterTest() throws DescriptorValidationException {}
   @Before
   public void setUp() throws Exception {
     testBigQueryWrite = new FakeBigQueryWrite();
+    ConnectionWorker.setMaxInflightQueueWaitTime(300000);
     serviceHelper =
         new MockServiceHelper(
             UUID.randomUUID().toString(), Arrays.<MockGrpcService>asList(testBigQueryWrite));
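
Note on the setUp changes: the maximum wait time is a mutable static, so a value set in one test (500 ms in testAppendButInflightQueueFull) remains in effect for any test that runs later in the same JVM. That is presumably why both setUp methods reset it to 300000. An alternative, shown here only as a sketch, is to scope an override with try/finally so the previous value is restored even if the test body throws. The class InflightWaitConfig and its methods are hypothetical and do not exist in the library.

```java
/** Hypothetical holder for a process-wide test knob; not part of the library. */
final class InflightWaitConfig {
  static final long DEFAULT_MAX_WAIT_MILLIS = 300_000L;

  // volatile so a change made on the test thread is visible to worker threads.
  private static volatile long maxWaitMillis = DEFAULT_MAX_WAIT_MILLIS;

  private InflightWaitConfig() {}

  static long get() {
    return maxWaitMillis;
  }

  static void setForTesting(long millis) {
    maxWaitMillis = millis;
  }

  /** Runs body with a temporary override, then restores the previous value. */
  static void runWithMaxWait(long millis, Runnable body) {
    long previous = maxWaitMillis;
    maxWaitMillis = millis;
    try {
      body.run();
    } finally {
      maxWaitMillis = previous; // Restored on both normal and exceptional exit.
    }
  }
}
```

A test would wrap only the statements that need the short timeout in runWithMaxWait(500, ...), which leaves other test classes unaffected without each of them resetting the value.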
