[ML] add zero_shot_classification task for BERT nlp models #77799
Conversation
Looks good, only some minor comments
@Override
public InferenceConfig toConfig() {
    throw new UnsupportedOperationException("cannot serialize to nodes before 7.8");
I'm not sure what the reason for this is. I'm guessing the error message is a copy-paste; if the idea is that implementing classes should implement this method, then remove this and let the compiler do its work.
@davidkyle we cannot create an NLP inference config from an update. In a separate PR I am gonna remove this check and this method as it is not used in master (so, this is an intermediate change really).
.orElse(new VocabularyConfig(InferenceIndexConstants.nativeDefinitionStore()));
this.tokenization = tokenization == null ? Tokenization.createDefault() : tokenization;
this.isMultiLabel = isMultiLabel != null && isMultiLabel;
this.hypothesisTemplate = Optional.ofNullable(hypothesisTemplate).orElse(DEFAULT_HYPOTHESIS_TEMPLATE);
If we allow labels to be null, perhaps hypothesisTemplate should not be defaulted, so that both must be defined at point of call.

Suggestion: rename labels to hypothesisLabels
> rename labels to hypothesisLabels

I don't think anybody calls them that. Another good option might be target_labels?
> If we allow labels to be null, perhaps hypothesisTemplate should not be defaulted, so that both must be defined at point of call.

The typical default (for MNLI-trained models) is the one we are providing. This is a nice quality-of-life improvement, I think.

I also don't think that labels should be required on creation. When the user is putting the model with its config, they have no idea what labels the model user will use. I think allowing null enforces that the person using the model has to provide the labels they want. The whole point of zero_shot is that you don't know/care about the labels until you call infer.
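The semantics being discussed can be sketched as follows. This is a hypothetical, standalone sketch, not the actual Elasticsearch class: the template falls back to the MNLI-style default when null, while labels may legitimately stay null at model-creation time and must be supplied by inference time (the default template string here is taken from the example in the PR description).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the defaulting behavior discussed above.
public class ZeroShotConfigSketch {
    static final String DEFAULT_HYPOTHESIS_TEMPLATE = "This example is {}.";

    final String hypothesisTemplate;
    final List<String> labels; // may be null until inference time

    ZeroShotConfigSketch(String hypothesisTemplate, List<String> labels) {
        // Template is defaulted; labels are intentionally NOT validated here,
        // because the model creator may not know which labels callers will use.
        this.hypothesisTemplate = Optional.ofNullable(hypothesisTemplate)
            .orElse(DEFAULT_HYPOTHESIS_TEMPLATE);
        this.labels = labels;
    }

    // At inference time, labels must have been provided (on the config
    // or via an update); only then are the hypothesis sequences built.
    List<String> hypotheses() {
        if (labels == null || labels.isEmpty()) {
            throw new IllegalArgumentException("labels must be provided before inference");
        }
        List<String> out = new ArrayList<>();
        for (String label : labels) {
            out.add(hypothesisTemplate.replace("{}", label));
        }
        return out;
    }
}
```

This illustrates why defaulting the template while leaving labels nullable is a quality-of-life choice rather than an inconsistency: the two fields have different natural owners (model creator vs. model user).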
@Override
public ZeroShotClassificationConfigUpdate.Builder setResultsField(String resultsField) {
    throw new IllegalArgumentException();
Seems a little harsh; I use this for the regression/classification models.
Well, we don't even have a results field in the original NLP configs. I think this is a larger discussion around unifying the NLP configs with the classification/regression configs.
It should really be part of the inference processor config. Looking at the code, this function appears to exist for use by InferencePipelineAggregationBuilder. ++ to revisiting this and simplifying the code.
@@ -414,6 +414,68 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenizati
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-max-sequence-length]

`with_special_tokens`::::
@szabosteve mind taking a look at the doc changes?
Documentation LGTM! Thanks for writing the content! ✍️ I left three suggestions. This one will make the docs CI pass.
docs/reference/ml/ml-shared.asciidoc
it is possible to adjust the labels to classify. This makes this type of model
and task exceptionally flexible.

If consistently classifying the same labels, it may be better to use an optimized
Suggested change:
- If consistently classifying the same labels, it may be better to use an optimized
+ If consistently classifying the same labels, it may be better to use an fine tuned
If you accept this suggestion, please also change the indefinite article from an to a.
);
}
final double[] normalizedScores;
if (isMultiLabel) {
isMultiLabel is just about how the results are interpreted? When true, the probability of entailment (as opposed to contradiction) for each label is returned. When false, it is the probability of each label being the entailment. Can you help me understand this and update the docs?
isMultiLabel is basically:

- When true, it's softmax of individual entailment vs. contradiction (probabilities don't sum to 1.0)
- When false, it's softmax of all entailments (probabilities sum to 1.0)

The docs already state you use it when you could have more than one true label, which is exactly what we use it for. I would rather not talk about softmax, entailment, etc.
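The two interpretations above can be sketched numerically. This is a hypothetical standalone sketch, not the actual Elasticsearch code; it assumes each label yields MNLI-style logits ordered [contradiction, neutral, entailment], which is an assumption for illustration only.

```java
// Sketch of the two normalization modes described in the comment above.
public class MultiLabelSketch {
    // Standard numerically-stable softmax over a logit vector.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) max = Math.max(max, v);
        double sum = 0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // isMultiLabel == true: per label, softmax of entailment vs. contradiction.
    // Scores are independent and need not sum to 1.0 across labels.
    static double[] multiLabelScores(double[][] perLabelLogits) {
        double[] scores = new double[perLabelLogits.length];
        for (int i = 0; i < perLabelLogits.length; i++) {
            double[] l = perLabelLogits[i];
            scores[i] = softmax(new double[] { l[0], l[2] })[1]; // entailment prob
        }
        return scores;
    }

    // isMultiLabel == false: softmax over all entailment logits.
    // Scores sum to 1.0 across labels (a single-best-label distribution).
    static double[] singleLabelScores(double[][] perLabelLogits) {
        double[] ent = new double[perLabelLogits.length];
        for (int i = 0; i < perLabelLogits.length; i++) {
            ent[i] = perLabelLogits[i][2];
        }
        return softmax(ent);
    }
}
```

With uniform logits, both modes give each label 0.5, but only the single-label scores are constrained to sum to 1.0 as more labels are added.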
Co-authored-by: David Kyle <[email protected]> Co-authored-by: István Zoltán Szabó <[email protected]>
LGTM
Zero-shot classification allows for text classification tasks without a pre-trained collection of target labels. This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset, which pairs text sequences with "entailment" clauses. For example, "Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could be paired with the entailment clauses ["This example is history", "This example is sociology", ...]. This training set, combined with the attention and semantic knowledge in modern-day NLP models (BERT, BART, etc.), affords a powerful tool for ad-hoc text classification.
See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.
The zero_shot_classification task is configured as follows: given a label such as sad, the constructed sequence looks like This example is sad.
For inference in a pipeline, one may provide label updates: the labels we care about, which replace the default ones if they exist. Similarly, one may provide label changes against the _infer endpoint.
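The update semantics described above can be sketched as follows. This is a hypothetical helper for illustration, not the actual Elasticsearch code: labels supplied with the inference call (pipeline processor config or _infer request) replace any labels stored on the model configuration.

```java
import java.util.List;

// Hypothetical sketch: resolve the effective labels for an inference call.
public class LabelUpdateSketch {
    // Labels provided at inference time win; otherwise fall back to the
    // labels stored on the model config (which may themselves be null,
    // in which case the call must fail before inference runs).
    static List<String> effectiveLabels(List<String> configured, List<String> update) {
        return (update != null && !update.isEmpty()) ? update : configured;
    }
}
```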