Add register analysis to repo analysis API #93955
In elastic#93825 we introduced a CAS operation on snapshot repositories. This commit extends the repository analysis API to include some concurrent operations which atomically increment a register, retrying on collisions, and a check which ensures that the final value of the register is as expected.
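To illustrate the shape of the new check, here is a minimal sketch, not the actual analysis code. `InMemoryRegister`, `incrementWithRetry` and `runAnalysis` are hypothetical names, and an `AtomicLong` stands in for the repository register. Each task attempts a compare-and-exchange from its current guess to guess + 1. On a collision it retries from the witness value it observed. Once all increments finish, the final value must equal the number of increment requests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class RegisterIncrementSketch {

    // Hypothetical in-memory stand-in for a repository register. compareAndExchange
    // returns the witness (the value it saw) and only stores `updated` if the
    // witness equals `expected`.
    static final class InMemoryRegister {
        private final AtomicLong value = new AtomicLong();

        long compareAndExchange(long expected, long updated) {
            return value.compareAndExchange(expected, updated);
        }

        long get() {
            return value.get();
        }
    }

    // Performs one increment, retrying from the witness after each collision.
    static void incrementWithRetry(InMemoryRegister register) {
        long current = 0L; // initial guess; a failed CAS reveals the real value
        while (true) {
            long witness = register.compareAndExchange(current, current + 1);
            if (witness == current) {
                return; // our CAS took effect
            }
            current = witness; // someone else got in first: retry from what we saw
        }
    }

    // Runs requestCount concurrent increments, then checks the final value.
    static long runAnalysis(int requestCount, int threads) {
        InMemoryRegister register = new InMemoryRegister();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requestCount; i++) {
            executor.execute(() -> incrementWithRetry(register));
        }
        executor.shutdown();
        try {
            if (executor.awaitTermination(30, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("increments did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for increments", e);
        }
        long finalValue = register.get();
        if (finalValue != requestCount) {
            throw new IllegalStateException("expected register value " + requestCount + " but found " + finalValue);
        }
        return finalValue;
    }

    public static void main(String[] args) {
        System.out.println(runAnalysis(1000, 8));
    }
}
```

Each successful CAS raises the register by exactly one. Any lost or duplicated update therefore shows up as a final value other than `requestCount`.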
Hi @DaveCTurner, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
The test failures are legitimate and highlight an interesting question: there are situations where we might not even be able to read the current value (e.g. someone else holds the lock). How should we handle this? Do we need a specific exception to describe this case?
It seems there are some problems related to concurrently manipulating file locks, e.g. a6b4e67 would trip this assertion in the JDK:
I haven't spent much time tracking this down, but I suspect this might be tricky to avoid in general (e.g. allowing for things like mounting the same repository twice, aliasing via symlinks, etc.). Here I've implemented a simple solution that serialises all attempts to open or close a locked file.
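Below is a minimal sketch of that serialisation. It assumes a single JVM-wide mutex guards every open-and-lock and every unlock-and-close. `LockedFiles` and `tryOpenLocked` are hypothetical names, not the code in this PR (which wraps the lock as a `Closeable`). Here `tryOpenLocked` returns `null` when the lock is unavailable. That covers both another process holding it and an overlapping lock already held elsewhere in this JVM.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockedFiles {

    // One JVM-wide mutex: opening+locking and unlocking+closing never run concurrently.
    private static final Object MUTEX = new Object();

    record LockedFileChannel(FileChannel fileChannel, FileLock fileLock) implements Closeable {
        @Override
        public void close() throws IOException {
            synchronized (MUTEX) {
                try {
                    fileLock.release();
                } finally {
                    fileChannel.close();
                }
            }
        }
    }

    // Returns a locked channel, or null if the file is already locked.
    static LockedFileChannel tryOpenLocked(Path path) throws IOException {
        synchronized (MUTEX) {
            FileChannel channel = FileChannel.open(
                path, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            boolean success = false;
            try {
                FileLock lock;
                try {
                    lock = channel.tryLock(); // null if another process holds the lock
                } catch (OverlappingFileLockException e) {
                    lock = null; // this JVM already holds an overlapping lock on the file
                }
                if (lock == null) {
                    return null;
                }
                success = true;
                return new LockedFileChannel(channel, lock);
            } finally {
                if (success == false) {
                    channel.close(); // don't leak the channel when we could not lock it
                }
            }
        }
    }
}
```

The `FileLock` documentation also warns that on some systems closing any channel on a file releases all of the JVM's locks on that file, so using a single channel per file is the safer pattern. This sketch does not try to enforce that.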
Sorry for all the noise/iterations here @fcofdez, these new tests showed that the existing implementation needed quite a bit of work. Looks to be done now though, so this is good to review.
This looks mostly good to me, I left a question about one of the invalid register transitions though.
return FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
private record LockedFileChannel(FileChannel fileChannel, Closeable fileLock) implements Closeable {

// avoid concurrently opening/closing locked files
Can we add to the comment that we're serializing the `LockedFileChannel` creation due to a tripping assertion in the JDK?
Sure, done in 37e3761.
*/
-long compareAndExchangeRegister(String key, long expected, long updated) throws IOException;
+long compareAndExchangeRegister(String key, long expected, long updated) throws IOException, ConcurrentRegisterOperationException;
nit: does it need to be a checked exception?
No, but I expect that callers will need to handle this exception very close to the call site (much like handling the case where the operation failed but managed to read the value) and forgetting to do this would be pretty bad, so I went this way. Perhaps it would be nicer to return an `OptionalLong`, since we don't really need any details?
> Perhaps it would be nicer to return an `OptionalLong`, we don't really need any details?

See ab1385b for a draft of that.
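Here is a sketch of the `OptionalLong` variant discussed above. It assumes an empty result means "the current value could not be read" (for instance because someone else holds the lock), as opposed to a failed exchange. `Register`, `FlakyRegister` and `incrementOnce` are hypothetical. The point is that the caller has to deal with the empty case right at the call site, which is what the checked exception was meant to force.

```java
import java.util.OptionalLong;

class OptionalLongRegisterSketch {

    // Hypothetical register: OptionalLong.empty() replaces the checked exception
    // and means the current value could not be read.
    interface Register {
        OptionalLong compareAndExchange(long expected, long updated);
    }

    // Toy register that reports "unreadable" for the first few calls.
    static final class FlakyRegister implements Register {
        private long value;
        private int unreadableCalls;

        FlakyRegister(int unreadableCalls) {
            this.unreadableCalls = unreadableCalls;
        }

        @Override
        public synchronized OptionalLong compareAndExchange(long expected, long updated) {
            if (unreadableCalls > 0) {
                unreadableCalls--;
                return OptionalLong.empty();
            }
            long witness = value;
            if (witness == expected) {
                value = updated;
            }
            return OptionalLong.of(witness);
        }
    }

    // Increments once and returns the value written; both outcomes handled inline.
    static long incrementOnce(Register register) {
        long current = 0L;
        while (true) {
            OptionalLong witness = register.compareAndExchange(current, current + 1);
            if (witness.isEmpty()) {
                continue; // value unreadable: retry with the same guess
            }
            if (witness.getAsLong() == current) {
                return current + 1; // exchange succeeded
            }
            current = witness.getAsLong(); // collision: retry from the observed value
        }
    }
}
```

Compared with a checked exception, an `OptionalLong` keeps the "unreadable" case in the return type. This makes it harder to ignore without adding a separate exception type.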
final var witness = blobContainer.compareAndExchangeRegister(registerName, currentValue, currentValue + 1);
if (witness == currentValue) {
    listener.onResponse(null);
} else if (witness < currentValue || witness >= request.getRequestCount()) {
> `witness >= request.getRequestCount()`

I think this is a valid scenario? i.e. another task runs more than `request.getRequestCount()` updates while this task is waiting to be executed. Am I missing something here?
There are only `request.getRequestCount()` increment requests in total, and this one hasn't succeeded yet, so the current value must be smaller.
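The witness check in the diff can be restated as a small classification function that spells out the argument. The register only ever increases. Because this increment has not yet succeeded, at most `requestCount - 1` others can have. So a witness below the last observed value, or at or above `requestCount`, indicates a broken register. `WitnessCheck` and `Outcome` are hypothetical names used only for this sketch.

```java
class WitnessCheck {

    enum Outcome { SUCCESS, RETRY, INVALID }

    // Classifies the witness returned by a CAS that tried to move the register
    // from currentValue to currentValue + 1, where requestCount is the total
    // number of increments in the analysis and this one has not yet succeeded.
    static Outcome classify(long witness, long currentValue, long requestCount) {
        if (witness == currentValue) {
            return Outcome.SUCCESS; // our exchange took effect
        }
        if (witness < currentValue || witness >= requestCount) {
            // the register never decreases, and at most requestCount - 1 other
            // increments can have completed, so neither case can happen
            return Outcome.INVALID;
        }
        return Outcome.RETRY; // another task incremented first: retry from witness
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 3, 10) + " " + classify(5, 3, 10) + " " + classify(10, 3, 10));
    }
}
```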
Right, for some reason I thought that the variable was created within the loop in https://github.com/elastic/elasticsearch/pull/93955/files#diff-2685c0b9e390123278fc8429f89d237daf9ed0d74ba41fd9ccc8318d077ca69cR454-R465 🤦
LGTM 👍
Further work towards the S3 compare-and-exchange implementation showed that we would like this API to permit async operations. This commit moves to an async API. Also, this change made it fairly awkward to use an exception to deliver to the caller the indication that the current value could not be read, so this commit adjusts things to use `OptionalLong` throughout as suggested in the discussion on elastic#93955.
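Below is a rough sketch of what an async, `OptionalLong`-based register might look like. `CompletableFuture<OptionalLong>` stands in for the listener-based callback the real API uses. `AsyncRegister`, `inMemory` and `increment` are hypothetical names. Retries are chained onto the completion of the previous attempt, so no thread blocks in a retry loop.

```java
import java.util.OptionalLong;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

class AsyncRegisterSketch {

    // Hypothetical async register: the future completes with the witness value,
    // or with OptionalLong.empty() if the current value could not be read.
    interface AsyncRegister {
        CompletableFuture<OptionalLong> compareAndExchange(long expected, long updated);
    }

    // Toy implementation that performs each CAS on the given executor.
    static AsyncRegister inMemory(Executor executor) {
        AtomicLong value = new AtomicLong();
        return (expected, updated) -> CompletableFuture.supplyAsync(
            () -> OptionalLong.of(value.compareAndExchange(expected, updated)), executor);
    }

    // Increments once without blocking; each retry is chained onto the previous
    // attempt's completion. Completes with the value this call wrote.
    static CompletableFuture<Long> increment(AsyncRegister register, long guess) {
        return register.compareAndExchange(guess, guess + 1).thenCompose(witness -> {
            if (witness.isEmpty()) {
                return increment(register, guess); // unreadable: retry with the same guess
            }
            long observed = witness.getAsLong();
            if (observed == guess) {
                return CompletableFuture.completedFuture(guess + 1);
            }
            return increment(register, observed); // collision: retry from the observed value
        });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        AsyncRegister register = inMemory(executor);
        CompletableFuture<Long> first = increment(register, 0L);
        CompletableFuture<Long> second = increment(register, 0L);
        // the two increments land in some order, writing 1 and 2
        System.out.println(first.join() + second.join());
        executor.shutdown();
    }
}
```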