Initial implementation for DataLifecycleService #94012

Merged
@andreidan merged 21 commits into elastic:main on Feb 27, 2023

Conversation

@andreidan (Contributor) commented Feb 22, 2023:

This adds support for managing the lifecycle for data streams. It currently supports rollover and data retention.

Note that error collection and reporting will come in a follow-up PR.

Relates to #93596

@andreidan changed the title from "Add initial implementation for DataLifecycleService" to "Initial implementation for DataLifecycleService" Feb 23, 2023
@elasticsearchmachine (Collaborator):

Hi @andreidan, I've created a changelog YAML for you.

Comment on lines +435 to +453
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
RolloverRequest that = (RolloverRequest) o;
return dryRun == that.dryRun
&& Objects.equals(rolloverTarget, that.rolloverTarget)
&& Objects.equals(newIndexName, that.newIndexName)
&& Objects.equals(conditions, that.conditions)
&& Objects.equals(createIndexRequest, that.createIndexRequest);
}

@Override
public int hashCode() {
return Objects.hash(rolloverTarget, newIndexName, dryRun, conditions, createIndexRequest);
}
@andreidan (author):

Needed as the requests are used as keys in the ResultDeduplicator
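
For context, a minimal sketch of how the deduplicator uses that equality (the call shape is taken from the usage quoted further down this thread; the listener and generics are assumptions):

// Requests are the deduplication keys: an equal RolloverRequest submitted
// while another is still in flight shares the running execution, which only
// works if RolloverRequest implements value-based equals()/hashCode().
transportActionsDeduplicator.executeOnce(
    rolloverRequest,
    ActionListener.noop(),
    (request, reqListener) -> client.execute(RolloverAction.INSTANCE, request, reqListener)
);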

Comment on lines +111 to +128
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DeleteIndexRequest that = (DeleteIndexRequest) o;
return Arrays.equals(indices, that.indices) && Objects.equals(indicesOptions, that.indicesOptions);
}

@Override
public int hashCode() {
int result = Objects.hash(indicesOptions);
result = 31 * result + Arrays.hashCode(indices);
return result;
}
@andreidan (author):

Needed as the requests are used as keys in the ResultDeduplicator

dataLifecycleInitialisationService.set(
new DataLifecycleService(
settings,
new OriginSettingClient(client, DLM_ORIGIN),
@andreidan (author):

DLM runs as superuser
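
For reference, a minimal sketch of what that means in practice (only the wrapper construction is from the snippet above; the execute calls and listener names are illustrative assumptions):

// OriginSettingClient stamps each request's thread context with DLM_ORIGIN,
// so these actions run with internal privileges rather than an end user's.
Client dlmClient = new OriginSettingClient(client, DLM_ORIGIN);
dlmClient.execute(RolloverAction.INSTANCE, rolloverRequest, rolloverListener);
dlmClient.execute(DeleteIndexAction.INSTANCE, deleteIndexRequest, deleteListener);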

@andreidan marked this pull request as ready for review February 23, 2023 13:48
@andreidan requested a review from gmarouli February 23, 2023 13:48
@elasticsearchmachine added the Team:Data Management label Feb 23, 2023
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-data-management (Team:Data Management)

@andreidan requested a review from dakrone February 23, 2023 13:57
continue;
}

TimeValue indexLifecycleDate = getCreationOrRolloverDate(dataStream.getName(), backingIndex);
@gmarouli (Contributor) commented Feb 23, 2023:

Hm, I was wondering if it would be better to call this rolloverDate; at this point in the code we can only encounter rolled-over indices, right?

I believe that name is better because it is more explicit.

@andreidan (author):

I'd prefer not to make that assumption, as the modify data stream API could be used to bring any index into the data stream (e.g. one that was never rolled over, at which point we'd take that index's creation date into consideration).
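
A sketch of the fallback being described, assuming the method reads the per-data-stream rollover info from the index metadata (an illustration, not the PR's exact code):

// Prefer the rollover timestamp recorded for this data stream; fall back to
// the index creation date for indices that were never rolled over (e.g.
// indices added via the modify data stream API).
static TimeValue getCreationOrRolloverDate(String dataStreamName, IndexMetadata backingIndex) {
    RolloverInfo rolloverInfo = backingIndex.getRolloverInfos().get(dataStreamName);
    long millis = rolloverInfo != null ? rolloverInfo.getTime() : backingIndex.getCreationDate();
    return TimeValue.timeValueMillis(millis);
}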

@gmarouli (Contributor) left a review:

LGTM! 🚀 So cool to see it working!!!!

long nowMillis = nowSupplier.getAsLong();
if (nowMillis >= indexLifecycleDate.getMillis() + retention.getMillis()) {
// there's an opportunity here to batch the delete requests (i.e. delete 100 indices / request)
// let's start simple and reevaluate
@gmarouli (Contributor):

Nice remark!

if (this.isMaster) {
if (scheduler.get() == null) {
// don't create scheduler if the node is shutting down
if (isClusterServiceStoppedOrClosed() == false) {
@gmarouli (Contributor):

Should we also take into account the shutdown API here?

@dakrone (Member):

No, I don't think we should, because DLM should continue to work while a master node is marked as shutting down (which could be for hours)
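
For reference, a check like the one guarding scheduler creation can be written against the local service lifecycle (a sketch; the PR's exact body may differ):

// The local lifecycle of the ClusterService on this node; note this is
// distinct from the shutdown-API marker discussed above.
private boolean isClusterServiceStoppedOrClosed() {
    final Lifecycle.State state = clusterService.lifecycleState();
    return state == Lifecycle.State.STOPPED || state == Lifecycle.State.CLOSED;
}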

@andreidan mentioned this pull request Feb 23, 2023
@dakrone (Member) left a review:

Thanks for working on this, Andrei! It's super exciting to see it starting to actually do things :)

I left some comments; I think we need to be really defensive in our error handling (ILM has taught us that!), and we should try to factor as much as possible out into unit-testable, non-mock tests.


RolloverRequest rolloverRequest = defaultRolloverRequestSupplier.apply(dataStream.getName());
transportActionsDeduplicator.executeOnce(
rolloverRequest,
ActionListener.noop(),
@dakrone (Member):

Should we pass in some kind of listener for logging purposes so that DLM can log (at trace) that it's invoking rollover requests? Not sure if it'd be too much noise or whether it'd be useful, what do you think?

@andreidan (author):

++

I think we need a custom listener here that will collect the encountered errors. I am planning to add it as part of the next effort (follow-up PR) related to error reporting.
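
A rough sketch of what such an error-collecting listener might look like (the class name and error store are hypothetical placeholders for the planned follow-up, not code from this PR):

// Hypothetical: remembers the last failure per target so the follow-up
// error-reporting work can surface it; a success clears the entry.
class ErrorRecordingActionListener implements ActionListener<Void> {
    private final String targetName;
    private final Map<String, String> errorStore; // target -> last error message

    ErrorRecordingActionListener(String targetName, Map<String, String> errorStore) {
        this.targetName = targetName;
        this.errorStore = errorStore;
    }

    @Override
    public void onResponse(Void unused) {
        errorStore.remove(targetName);
    }

    @Override
    public void onFailure(Exception e) {
        errorStore.put(targetName, e.getMessage());
    }
}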

Comment on lines 270 to 272
rolloverRequest.addMaxIndexAgeCondition(TimeValue.timeValueDays(30));
rolloverRequest.addMaxPrimaryShardSizeCondition(ByteSizeValue.ofGb(50));
rolloverRequest.addMaxPrimaryShardDocsCondition(200_000_000);
@dakrone (Member):

I think we need to start a conversation (earlier is better) about what this default should be. I think we should perhaps aim for something a little shorter than 30 days, more in the 7-day range (or shorter, if we think we can get away with it).

@andreidan (author):

I changed it to 7 days, which is what we postulated in the design doc (not sure where my 30 days came from here).

Do you think we should discuss having it lower than 7 days?
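
With that change, the defaults quoted above become (reusing the setters from the snippet; assuming the size and doc-count conditions stay as shown):

// Default rollover conditions after review: max age lowered from 30 to 7 days.
rolloverRequest.addMaxIndexAgeCondition(TimeValue.timeValueDays(7));
rolloverRequest.addMaxPrimaryShardSizeCondition(ByteSizeValue.ofGb(50));
rolloverRequest.addMaxPrimaryShardDocsCondition(200_000_000);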

Comment on lines +288 to +289
if (scheduler.get() != null) {
scheduler.get().remove(DATA_LIFECYCLE_JOB_NAME);
@dakrone (Member):

One of these days we should just write a SetOnceOptional<T> class of our own that combines SetOnce and Optional, so we can dispense with the safety checks and do scheduler.set(...) and scheduler.ifPresent(s -> s.remove(DATA_LIFECYCLE_JOB_NAME));
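
A minimal sketch of that hypothetical helper (SetOnceOptional does not exist in the codebase; this just spells out the idea from the comment, built on org.apache.lucene.util.SetOnce):

// Hypothetical: combines SetOnce's set-once semantics with Optional-style access.
final class SetOnceOptional<T> {
    private final SetOnce<T> value = new SetOnce<>();

    void set(T t) {
        value.set(t); // throws AlreadySetException on a second set, like SetOnce
    }

    void ifPresent(Consumer<T> consumer) {
        T t = value.get();
        if (t != null) {
            consumer.accept(t);
        }
    }
}

With that, the call sites above become scheduler.set(...) and scheduler.ifPresent(s -> s.remove(DATA_LIFECYCLE_JOB_NAME)).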

Comment on lines +318 to +320
void setDefaultRolloverRequestSupplier(Function<String, RolloverRequest> defaultRolloverRequestSupplier) {
this.defaultRolloverRequestSupplier = defaultRolloverRequestSupplier;
}
@dakrone (Member):

We have this to make testing easier, right? I would like to get to the point where we parse the real rollover request out of the configuration, and then we can remove this entirely. What do you think?

@andreidan (author):

Absolutely. Mary is working on introducing the default rollover setting. This will go away once that's merged.

@andreidan (author):

@elasticmachine update branch

@andreidan requested a review from dakrone February 24, 2023 15:51
@dakrone (Member) left a review:

This LGTM, thanks for the changes, Andrei! I left one more comment that is more design-centric; it could be addressed in follow-up work if you agree (or ignored if you don't).

Comment on lines +191 to +204
List<Index> backingIndices = dataStream.getIndices();
// we'll look at the current write index in the next run if it's rolled over (and not the write index anymore)
for (int i = 0; i < backingIndices.size() - 1; i++) {
IndexMetadata backingIndex = state.metadata().index(backingIndices.get(i));
if (backingIndex == null || isManagedByDLM(dataStream, backingIndex) == false) {
continue;
}

if (isTimeToBeDeleted(dataStream.getName(), backingIndex, nowSupplier, retention)) {
// there's an opportunity here to batch the delete requests (i.e. delete 100 indices / request)
// let's start simple and reevaluate
DeleteIndexRequest deleteRequest = new DeleteIndexRequest(backingIndex.getIndex().getName()).masterNodeTimeout(
TimeValue.MAX_VALUE
);
@dakrone (Member):

Thinking about this a little bit, we could possibly push this logic into the DataStream itself, right? Something like dataStream.getIndicesPastRetention() returning the list. Especially since we're pushing the lifecycle information into the data stream, it lets the logic live next to the lifecycle configuration, and then this code here doesn't need to know anything about the write index. What do you think?

@andreidan (author):

++ I think that could work. Thanks for the suggestion Lee.

Will do this in a follow-up PR
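
A sketch of what the suggested DataStream method could look like (signature and body are assumptions for the agreed follow-up; it reuses a creation-or-rollover-date helper like the one discussed earlier in this thread):

// Hypothetical: returns the non-write backing indices whose origination date
// (rollover date, falling back to creation date) is older than the retention.
List<Index> getIndicesPastRetention(Function<Index, IndexMetadata> metadataSupplier, LongSupplier nowSupplier, TimeValue retention) {
    List<Index> pastRetention = new ArrayList<>();
    // skip the last backing index: it is the write index and must never be deleted
    for (int i = 0; i < indices.size() - 1; i++) {
        IndexMetadata metadata = metadataSupplier.apply(indices.get(i));
        if (metadata == null) {
            continue;
        }
        long originationDate = getCreationOrRolloverDate(getName(), metadata).getMillis();
        if (nowSupplier.getAsLong() >= originationDate + retention.getMillis()) {
            pastRetention.add(indices.get(i));
        }
    }
    return pastRetention;
}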

@andreidan merged commit 4760f00 into elastic:main Feb 27, 2023
Labels: >feature, Team:Data Management, v8.8.0

5 participants