Add register analysis to repo analysis API #93955

Merged

DaveCTurner
Contributor

In #93825 we introduced a CAS operation on snapshot repositories. This commit extends the repository analysis API to include some concurrent operations which atomically increment a register, retrying on collisions, and a check which ensures that the final value of the register is as expected.
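
The check described above can be sketched in a few lines. This is only an illustration, not the code added by this PR: the register is replaced by an in-memory `AtomicLong`, and the names `RegisterIncrementSketch`, `incrementOnce` and `runIncrements` are hypothetical. Each task performs one increment through compare-and-exchange, retrying from the witnessed value on a collision, and the final value is compared with the number of tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an in-memory stand-in for a repository register.
final class RegisterIncrementSketch {

    private final AtomicLong register = new AtomicLong();

    // Returns the witness, i.e. the value held before the call.
    // The update was applied if and only if the witness equals `expected`.
    long compareAndExchange(long expected, long updated) {
        return register.compareAndExchange(expected, updated);
    }

    // Performs exactly one increment, retrying from the witnessed value on collision.
    void incrementOnce() {
        long current = 0;
        while (true) {
            long witness = compareAndExchange(current, current + 1);
            if (witness == current) {
                return; // our increment was applied
            }
            current = witness; // someone else got there first; retry from their value
        }
    }

    // Runs `requestCount` concurrent increments and returns the final register value.
    static long runIncrements(int requestCount) {
        RegisterIncrementSketch sketch = new RegisterIncrementSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CountDownLatch done = new CountDownLatch(requestCount);
            for (int i = 0; i < requestCount; i++) {
                pool.execute(() -> {
                    sketch.incrementOnce();
                    done.countDown();
                });
            }
            if (done.await(30, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("increments did not finish in time");
            }
            return sketch.register.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int requestCount = 10;
        long finalValue = runIncrements(requestCount);
        if (finalValue != requestCount) {
            throw new AssertionError("expected " + requestCount + " but found " + finalValue);
        }
        System.out.println("final register value: " + finalValue);
    }
}
```

A final value other than the number of increments would mean that the compare-and-exchange was not atomic, which is the kind of failure the analysis is meant to detect.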

@DaveCTurner DaveCTurner added the >enhancement, :Distributed Coordination/Snapshot/Restore, and v8.8.0 labels Feb 21, 2023
@DaveCTurner DaveCTurner requested a review from fcofdez February 21, 2023 08:56
@elasticsearchmachine
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) label Feb 21, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

The test failures are legitimate and highlight an interesting question: there are situations where we might not even be able to read the current value (e.g. someone else holds the lock). How should we handle this? Do we need a specific exception to describe this case?

@DaveCTurner
Contributor Author

It seems there are some problems related to concurrently manipulating file locks, e.g. a6b4e67 would trip this assertion in the JDK:

[2023-02-21T13:29:58,265][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [yamlRestTest-0] fatal error in thread [elasticsearch[yamlRestTest-0][snapshot][T#1]], exiting
java.lang.AssertionError: null
        at sun.nio.ch.FileLockTable.removeKeyIfEmpty(FileLockTable.java:139) ~[?:?]
        at sun.nio.ch.FileLockTable.removeAll(FileLockTable.java:193) ~[?:?]
        at sun.nio.ch.FileChannelImpl.implCloseChannel(FileChannelImpl.java:188) ~[?:?]
        at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112) ~[?:?]
        at org.elasticsearch.common.blobstore.fs.FsBlobContainer.compareAndExchangeRegister(FsBlobContainer.java:418) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.repositories.blobstore.testkit.RegisterAnalyzeAction$TransportAction$1Execution.doRun(RegisterAnalyzeAction.java:121) ~[?:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1589) ~[?:?]

I haven't spent much time tracking this down, but I suspect this might be tricky to avoid in general (e.g. allowing for things like mounting the same repository twice, aliasing via symlinks, etc.). Here I've implemented a simple solution that serialises all attempts to open or close a locked file.
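
As a rough illustration of what "serialising all attempts to open or close a locked file" can look like, here is a minimal sketch. It is not the code from this PR: the class `SerializedFileLockSketch`, the `MUTEX` field and the methods `tryOpen` and `selfCheck` are hypothetical names, and the choice of a single JVM-wide monitor is an assumption. Only standard `java.nio` calls are used.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

// Hypothetical sketch: every open+lock and every unlock+close runs under one monitor.
final class SerializedFileLockSketch {

    // Assumption: a single JVM-wide mutex is acceptable for this workload.
    private static final Object MUTEX = new Object();

    // A channel together with the lock held on it; closing releases both.
    static final class LockedChannel implements Closeable {
        private final FileChannel channel;
        private final FileLock lock;

        private LockedChannel(FileChannel channel, FileLock lock) {
            this.channel = channel;
            this.lock = lock;
        }

        FileChannel channel() {
            return channel;
        }

        @Override
        public void close() throws IOException {
            synchronized (MUTEX) {
                try {
                    lock.release();
                } finally {
                    channel.close();
                }
            }
        }
    }

    // Returns empty if the file is already locked, by this JVM or by another process.
    static Optional<LockedChannel> tryOpen(Path path) throws IOException {
        synchronized (MUTEX) {
            FileChannel channel = FileChannel.open(
                path, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            FileLock lock = null;
            try {
                lock = channel.tryLock(); // null if another process holds the lock
            } catch (OverlappingFileLockException e) {
                // the lock is already held through another channel in this JVM
            } finally {
                if (lock == null) {
                    channel.close(); // also runs if tryLock threw an IOException
                }
            }
            return lock == null ? Optional.empty() : Optional.of(new LockedChannel(channel, lock));
        }
    }

    // Small self-check: a second attempt is refused while the first lock is held,
    // and the file can be locked again once the first lock has been closed.
    static boolean selfCheck() {
        try {
            Path path = Files.createTempFile("register", ".lock");
            try {
                boolean secondRefused;
                try (LockedChannel first = tryOpen(path).orElseThrow()) {
                    secondRefused = tryOpen(path).isEmpty();
                }
                try (LockedChannel again = tryOpen(path).orElseThrow()) {
                    return secondRefused;
                }
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("self-check passed: " + selfCheck());
    }
}
```

The mutex trades throughput for simplicity. Since the JDK documents that file locks are held on behalf of the whole JVM and are not suitable for coordinating threads within it, a coarse in-process lock is a straightforward way to keep concurrent threads from interleaving these operations.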

@DaveCTurner
Contributor Author

Sorry for all the noise and iterations here @fcofdez; these new tests showed that the existing implementation needed quite a bit of work. It looks to be done now though, so this is ready for review.

Contributor

@fcofdez fcofdez left a comment

This looks mostly good to me. I left a question about one of the invalid register transitions, though.

return FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
private record LockedFileChannel(FileChannel fileChannel, Closeable fileLock) implements Closeable {

// avoid concurrently opening/closing locked files
Contributor

Can we add to the comment that we're serializing the LockedFileChannel creation due to a tripping assertion in the JDK?

Contributor Author

Sure, done in 37e3761.

*/
long compareAndExchangeRegister(String key, long expected, long updated) throws IOException;
long compareAndExchangeRegister(String key, long expected, long updated) throws IOException, ConcurrentRegisterOperationException;
Contributor

nit: does it need to be a checked exception?

Contributor Author

No, but I expect that callers will need to handle this exception very close to the call site (much like handling the case where the operation failed but managed to read the value) and forgetting to do this would be pretty bad, so I went this way. Perhaps it would be nicer to return an OptionalLong, we don't really need any details?

Contributor Author

Perhaps it would be nicer to return an OptionalLong, we don't really need any details?

see ab1385b for a draft of that
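
To make the `OptionalLong` idea concrete, here is a minimal sketch of what such a return type could express. It is not the draft in ab1385b: the class `OptionalWitnessSketch`, its in-memory register and the `whileHeld` helper are hypothetical. An empty result means the current value could not be read (for instance because someone else holds the lock), while a present result carries the witness.

```java
import java.util.OptionalLong;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: an in-memory register whose compare-and-exchange reports
// "could not read the current value" as an empty OptionalLong instead of an exception.
final class OptionalWitnessSketch {

    private final Semaphore permit = new Semaphore(1); // stands in for the file lock
    private long value;

    // Present result: the witness (the update was applied iff witness == expected).
    // Empty result: the register was busy, so the current value is unknown.
    OptionalLong compareAndExchange(long expected, long updated) {
        if (permit.tryAcquire() == false) {
            return OptionalLong.empty();
        }
        try {
            long witness = value;
            if (witness == expected) {
                value = updated;
            }
            return OptionalLong.of(witness);
        } finally {
            permit.release();
        }
    }

    // Simulates another writer holding the lock while `action` runs.
    void whileHeld(Runnable action) {
        permit.acquireUninterruptibly();
        try {
            action.run();
        } finally {
            permit.release();
        }
    }

    public static void main(String[] args) {
        OptionalWitnessSketch register = new OptionalWitnessSketch();
        System.out.println(register.compareAndExchange(0, 1)); // applied
        System.out.println(register.compareAndExchange(0, 2)); // rejected, witness present
        register.whileHeld(() -> System.out.println(register.compareAndExchange(1, 2))); // busy
    }
}
```

With this shape the caller has to inspect the result before it can use the witness, which gives a similar "handle it near the call site" nudge to a checked exception without adding an exception type.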

final var witness = blobContainer.compareAndExchangeRegister(registerName, currentValue, currentValue + 1);
if (witness == currentValue) {
listener.onResponse(null);
} else if (witness < currentValue || witness >= request.getRequestCount()) {
Contributor

`witness >= request.getRequestCount()`: I think this is a valid scenario? i.e. another task runs > `request.getRequestCount()` updates while this task is waiting to be executed. Am I missing something here?

Contributor Author

There are only request.getRequestCount() increment requests in total, and this one hasn't succeeded yet, so the current value must be smaller.
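
The reasoning can be written out as a small classification function. This is a sketch for illustration only: `WitnessCheckSketch`, `Outcome` and `classify` are hypothetical names, and the condition simply mirrors the one quoted from the diff. The register only ever grows, so a witness below the value already seen is invalid; and since this increment has not yet been applied, at most `requestCount - 1` increments can have succeeded, so a witness of `requestCount` or more is invalid too.

```java
// Hypothetical sketch of the three possible outcomes of one increment attempt.
final class WitnessCheckSketch {

    enum Outcome { SUCCESS, RETRY, INVALID }

    static Outcome classify(long witness, long currentValue, long requestCount) {
        if (witness == currentValue) {
            return Outcome.SUCCESS; // our compare-and-exchange was applied
        }
        if (witness < currentValue || witness >= requestCount) {
            // The register went backwards, or it holds more increments than can
            // have succeeded while this one is still outstanding.
            return Outcome.INVALID;
        }
        return Outcome.RETRY; // a legitimate collision: retry from the witness
    }

    public static void main(String[] args) {
        long requestCount = 10;
        System.out.println(classify(3, 3, requestCount));
        System.out.println(classify(9, 3, requestCount));
        System.out.println(classify(10, 3, requestCount));
    }
}
```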

@fcofdez fcofdez left a comment

LGTM 👍

@DaveCTurner DaveCTurner merged commit eac8727 into elastic:main Feb 23, 2023
@DaveCTurner DaveCTurner deleted the 2023-02-20-repo-analysis-register branch February 23, 2023 09:21
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 23, 2023
Further work towards the S3 compare-and-exchange implementation showed
that we would like this API to permit async operations. This commit
moves to an async API.

Also, this change made it fairly awkward to use an exception to deliver
to the caller the indication that the current value could not be read,
so this commit adjusts things to use `OptionalLong` throughout as
suggested in the discussion on elastic#93955.
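
The signature adopted by that follow-up commit is not shown in this thread, so the following is only a guess at the general shape of an async compare-and-exchange that reports its result as an `OptionalLong`. It uses `CompletableFuture` purely as an assumption for the sake of a self-contained example; `AsyncRegisterSketch`, `increment` and `runIncrements` are hypothetical names.

```java
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an async compare-and-exchange whose result is an OptionalLong.
final class AsyncRegisterSketch {

    private final AtomicLong register = new AtomicLong();

    // Completes with the witness, or with empty if the value could not be read.
    // This in-memory stand-in can always read the value, so it never yields empty.
    CompletableFuture<OptionalLong> compareAndExchange(long expected, long updated) {
        return CompletableFuture.supplyAsync(
            () -> OptionalLong.of(register.compareAndExchange(expected, updated)));
    }

    // Performs one increment, chaining a retry onto the future after each collision.
    CompletableFuture<Void> increment(long current) {
        return compareAndExchange(current, current + 1).thenCompose(result -> {
            if (result.isPresent() == false) {
                return increment(current); // value unknown: try again from the same guess
            }
            long witness = result.getAsLong();
            if (witness == current) {
                return CompletableFuture.<Void>completedFuture(null);
            }
            return increment(witness);
        });
    }

    // Runs `requestCount` concurrent increments and returns the final register value.
    static long runIncrements(int requestCount) {
        AsyncRegisterSketch sketch = new AsyncRegisterSketch();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[requestCount];
        for (int i = 0; i < requestCount; i++) {
            futures[i] = sketch.increment(0);
        }
        CompletableFuture.allOf(futures).join();
        return sketch.register.get();
    }

    public static void main(String[] args) {
        System.out.println("final register value: " + runIncrements(20));
    }
}
```

Here the "could not read" case is an ordinary value in the result, so it passes through the same completion path as a successful read instead of needing a separate failure channel.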
DaveCTurner added a commit that referenced this pull request Feb 27, 2023
Labels
:Distributed Coordination/Snapshot/Restore, >enhancement, Team:Distributed (Obsolete), v8.8.0
3 participants