Initial implementation for DataLifecycleService #94012
Conversation
This adds support for managing the lifecycle for data streams. It currently supports rollover and data retention.
modules/dlm/src/main/java/org/elasticsearch/dlm/DataLifecycleService.java
Hi @andreidan, I've created a changelog YAML for you.
```java
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }
    RolloverRequest that = (RolloverRequest) o;
    return dryRun == that.dryRun
        && Objects.equals(rolloverTarget, that.rolloverTarget)
        && Objects.equals(newIndexName, that.newIndexName)
        && Objects.equals(conditions, that.conditions)
        && Objects.equals(createIndexRequest, that.createIndexRequest);
}

@Override
public int hashCode() {
    return Objects.hash(rolloverTarget, newIndexName, dryRun, conditions, createIndexRequest);
}
```
Needed as the requests are used as keys in the ResultDeduplicator
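To illustrate why value-based `equals`/`hashCode` matter here, a minimal sketch of a deduplicator keyed on request objects (the class and record names below are illustrative, not the real `ResultDeduplicator`): without these overrides, two structurally identical rollover requests would be distinct map keys and both would be executed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DedupSketch {
    // Stand-in for a request type; records get value-based equals/hashCode for free.
    record FakeRolloverRequest(String target, boolean dryRun) {}

    private final Map<Object, Boolean> inFlight = new ConcurrentHashMap<>();

    /** Returns true only the first time a structurally equal request is seen. */
    public boolean executeOnce(Object request) {
        return inFlight.putIfAbsent(request, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        boolean first = dedup.executeOnce(new FakeRolloverRequest("logs", false));
        boolean second = dedup.executeOnce(new FakeRolloverRequest("logs", false));
        assert first;   // first request goes through
        assert !second; // equal duplicate is suppressed
    }
}
```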
```java
@Override
public boolean equals(Object o) {
    if (this == o) {
        return true;
    }
    if (o == null || getClass() != o.getClass()) {
        return false;
    }
    DeleteIndexRequest that = (DeleteIndexRequest) o;
    return Arrays.equals(indices, that.indices) && Objects.equals(indicesOptions, that.indicesOptions);
}

@Override
public int hashCode() {
    int result = Objects.hash(indicesOptions);
    result = 31 * result + Arrays.hashCode(indices);
    return result;
}
```
Needed as the requests are used as keys in the ResultDeduplicator
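A quick note on why the snippet above routes the `indices` field through `Arrays.equals`/`Arrays.hashCode` rather than `Objects.hash`: Java arrays inherit identity-based `equals` and `hashCode` from `Object`, so equal contents would still produce different dedup keys. A self-contained check:

```java
import java.util.Arrays;

public class ArrayKeySketch {
    public static void main(String[] args) {
        String[] a = { "logs-000001" };
        String[] b = { "logs-000001" };
        assert !a.equals(b);                              // identity comparison: different objects
        assert Arrays.equals(a, b);                       // value comparison: equal contents
        assert Arrays.hashCode(a) == Arrays.hashCode(b);  // stable hash over contents
    }
}
```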
```java
dataLifecycleInitialisationService.set(
    new DataLifecycleService(
        settings,
        new OriginSettingClient(client, DLM_ORIGIN),
```
DLM runs as superuser
Pinging @elastic/es-data-management (Team:Data Management)
```java
    continue;
}

TimeValue indexLifecycleDate = getCreationOrRolloverDate(dataStream.getName(), backingIndex);
```
Hm, I was wondering if it would be better to call this `rolloverDate`; at this point in the code we can only encounter rolled-over indices, right? I believe that name is better because it is more explicit.
I'd prefer not to make that assumption, as the modify data stream API could be used to bring any index into the data stream (e.g. one that was never rolled over, at which point we'd take that index's creation date into consideration).
LGTM! 🚀 So cool to see it working!!!!
```java
long nowMillis = nowSupplier.getAsLong();
if (nowMillis >= indexLifecycleDate.getMillis() + retention.getMillis()) {
    // there's an opportunity here to batch the delete requests (i.e. delete 100 indices / request)
    // let's start simple and reevaluate
```
Nice remark!
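The retention check quoted above boils down to a single comparison: an index becomes eligible for deletion once "now" is at or past its origination date plus the configured retention. A minimal sketch with illustrative parameter names (`originationMillis`, `retentionMillis`):

```java
public class RetentionCheck {
    static boolean isTimeToBeDeleted(long nowMillis, long originationMillis, long retentionMillis) {
        return nowMillis >= originationMillis + retentionMillis;
    }

    public static void main(String[] args) {
        long dayMillis = 24L * 60 * 60 * 1000;
        long created = 0L;
        long retention = 7 * dayMillis;
        assert !isTimeToBeDeleted(6 * dayMillis, created, retention); // still within retention
        assert isTimeToBeDeleted(7 * dayMillis, created, retention);  // at the boundary: eligible
    }
}
```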
```java
if (this.isMaster) {
    if (scheduler.get() == null) {
        // don't create scheduler if the node is shutting down
        if (isClusterServiceStoppedOrClosed() == false) {
```
Should we also take into account the shutdown API here?
No, I don't think we should, because DLM should continue to work while a master node is marked as shutting down (which could be for hours)
Thanks for working on this Andrei! It's super exciting to see it starting to actually do things :)
I left some comments about this. I think we need to be really defensive in our error handling (ILM has taught us that!), and we should try to factor as much as possible out into unit-testable, non-mock tests.
modules/dlm/src/main/java/org/elasticsearch/dlm/DataLifecycleService.java
```java
RolloverRequest rolloverRequest = defaultRolloverRequestSupplier.apply(dataStream.getName());
transportActionsDeduplicator.executeOnce(
    rolloverRequest,
    ActionListener.noop(),
```
Should we pass in some kind of listener for logging purposes so that DLM can log (at trace) that it's invoking rollover requests? Not sure if it'd be too much noise or whether it'd be useful, what do you think?
++
I think we need a custom listener here that will collect the encountered errors. I am planning to add it as part of the next effort (follow-up PR) related to error reporting.
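A rough sketch of what such an error-collecting listener could look like, replacing `ActionListener.noop()` above. The class name, `onResponse`/`onFailure` shape, and `errors()` accessor are illustrative stand-ins, not the actual follow-up implementation:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ErrorRecordingListener {
    private final List<Exception> recordedErrors = new CopyOnWriteArrayList<>();

    public void onResponse(Object response) {
        // success: nothing to record
    }

    public void onFailure(Exception e) {
        recordedErrors.add(e); // collect instead of swallowing, for a later error-reporting API
    }

    public List<Exception> errors() {
        return recordedErrors;
    }

    public static void main(String[] args) {
        ErrorRecordingListener listener = new ErrorRecordingListener();
        listener.onFailure(new RuntimeException("rollover failed"));
        assert listener.errors().size() == 1;
    }
}
```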
```java
rolloverRequest.addMaxIndexAgeCondition(TimeValue.timeValueDays(30));
rolloverRequest.addMaxPrimaryShardSizeCondition(ByteSizeValue.ofGb(50));
rolloverRequest.addMaxPrimaryShardDocsCondition(200_000_000);
```
I think we need to start a conversation (earlier is better) about what this default should be. I think we should perhaps aim for something a little shorter than 30 days, more in the 7 day range (or shorter if we think we can get away with it).
I changed it to 7 days, which is what we postulated in the design doc (not sure where my 30 days came from). Do you think we should discuss having it lower than 7 days?
```java
if (scheduler.get() != null) {
    scheduler.get().remove(DATA_LIFECYCLE_JOB_NAME);
```
One of these days we should just write a `SetOnceOptional<T>` class of our own that combines `SetOnce` and `Optional`, so we can dispense with the safety checks and do `scheduler.set(...)` and `scheduler.ifPresent(s -> s.remove(DATA_LIFECYCLE_JOB_NAME));`
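A rough sketch of what such a `SetOnceOptional<T>` could look like: set-once semantics plus an `ifPresent`-style accessor, so callers drop the explicit null checks. This is an illustration of the idea, not an existing Elasticsearch or Lucene class:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class SetOnceOptional<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();

    /** Sets the value; throws if a value was already set. */
    public void set(T value) {
        if (!ref.compareAndSet(null, value)) {
            throw new IllegalStateException("already set");
        }
    }

    /** Runs the action only if a value has been set. */
    public void ifPresent(Consumer<? super T> action) {
        T value = ref.get();
        if (value != null) {
            action.accept(value);
        }
    }

    public static void main(String[] args) {
        SetOnceOptional<String> scheduler = new SetOnceOptional<>();
        int[] calls = { 0 };
        scheduler.ifPresent(s -> calls[0]++); // not set yet: no-op
        scheduler.set("job-scheduler");
        scheduler.ifPresent(s -> calls[0]++); // set: action runs
        assert calls[0] == 1;
    }
}
```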
```java
void setDefaultRolloverRequestSupplier(Function<String, RolloverRequest> defaultRolloverRequestSupplier) {
    this.defaultRolloverRequestSupplier = defaultRolloverRequestSupplier;
}
```
We have this to make testing easier, right? I would like to get to the point where we parse the real rollover request out of the configuration, and then we can remove this entirely. What do you think?
Absolutely. Mary is working on introducing the default rollover setting. This will go away once that's merged.
modules/dlm/src/test/java/org/elasticsearch/dlm/DataLifecycleServiceTests.java
@elasticmachine update branch
This LGTM, thanks for the changes Andrei! I left one more comment that is more design-centric but could be done in follow-up work if you agree (or ignored if you don't agree).
```java
List<Index> backingIndices = dataStream.getIndices();
// we'll look at the current write index in the next run if it's rolled over (and not the write index anymore)
for (int i = 0; i < backingIndices.size() - 1; i++) {
    IndexMetadata backingIndex = state.metadata().index(backingIndices.get(i));
    if (backingIndex == null || isManagedByDLM(dataStream, backingIndex) == false) {
        continue;
    }

    if (isTimeToBeDeleted(dataStream.getName(), backingIndex, nowSupplier, retention)) {
        // there's an opportunity here to batch the delete requests (i.e. delete 100 indices / request)
        // let's start simple and reevaluate
        DeleteIndexRequest deleteRequest = new DeleteIndexRequest(backingIndex.getIndex().getName()).masterNodeTimeout(
            TimeValue.MAX_VALUE
        );
```
Thinking about this a little bit, we could possibly push this logic into the `DataStream` itself, right? Something like `dataStream.getIndicesPastRetention()` returning the list. Especially since we're pushing the lifecycle information into the data stream, it lets the logic live next to the lifecycle configuration, and then this code here doesn't need to know anything about the write index. What do you think?
++ I think that could work. Thanks for the suggestion Lee.
Will do this in a follow-up PR
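The suggested `getIndicesPastRetention()` could be sketched roughly as below: the data stream filters its own backing indices (always excluding the write index, which is last in the list) by retention. The types here are simplified stand-ins for the real `DataStream`/`IndexMetadata` classes, and the method name follows the suggestion above:

```java
import java.util.ArrayList;
import java.util.List;

public class DataStreamSketch {
    record BackingIndex(String name, long originationMillis) {}

    private final List<BackingIndex> indices;

    DataStreamSketch(List<BackingIndex> indices) {
        this.indices = indices;
    }

    /** All backing indices, except the write index, whose retention has lapsed. */
    List<BackingIndex> getIndicesPastRetention(long nowMillis, long retentionMillis) {
        List<BackingIndex> past = new ArrayList<>();
        for (int i = 0; i < indices.size() - 1; i++) { // last entry is the write index
            BackingIndex index = indices.get(i);
            if (nowMillis >= index.originationMillis() + retentionMillis) {
                past.add(index);
            }
        }
        return past;
    }

    public static void main(String[] args) {
        DataStreamSketch dataStream = new DataStreamSketch(List.of(
            new BackingIndex(".ds-logs-000001", 0L),
            new BackingIndex(".ds-logs-000002", 500L),
            new BackingIndex(".ds-logs-000003", 900L) // write index: never eligible
        ));
        List<BackingIndex> past = dataStream.getIndicesPastRetention(1000L, 600L);
        assert past.size() == 1 && past.get(0).name().equals(".ds-logs-000001");
    }
}
```

With this shape, the service loop collapses to iterating the returned list and issuing deletes, with no write-index bookkeeping at the call site.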
Note that error collection and reporting will come in a follow-up PR.
Relates to #93596