Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) #22575
Conversation
@lvwerra @harm-devries
Code on the Hub is fine too and we are adding better support for it every day :-)
Hi @sgugger, the next generation of the model will also support this architecture, so there should also be significantly more usage. I also discussed this with @LysandreJik previously; what do you think?
The documentation is not available anymore as the PR was closed or merged. |
if position_ids is None:
    position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
    position_ids = position_ids.unsqueeze(0).view(-1, input_shape[-1])
Could benefit from #21853
I'm not sure I understand that PR. Is it to make generation independent of the padding? If so, we definitely want it.
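For reference, the idea (as I understand it) is to derive the position ids from the attention mask so that left padding does not shift positions during generation. A minimal sketch of that idea, not the exact code from #21853:

import torch

# Minimal sketch of padding-aware position ids (illustrative, not the PR's exact code).
# Positions count only the real tokens, so left padding does not shift them.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])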
If you prefer, @lvwerra, and if the architecture is frozen: we won't be able to accommodate changes after it's merged and released in Transformers (no breaking changes in Transformers), whereas it's easier to quickly experiment with code on the Hub. If you feel the model is mature enough and it's time, I'm not opposed :-)
zeros = torch.zeros(attn_view, dtype=query.dtype, device=query.device)
attn_weights = torch.baddbmm(zeros, query, key, beta=1, alpha=scale_factor).view(attn_shape)
For learning purposes:
Note that this block was previously:

attn_weights = torch.baddbmm(
    torch.empty(attn_view), query, key, beta=0, alpha=scale_factor
).view(attn_shape)

This seemed to be needed to fix the CI tests that were failing on CPU. The reason is the following: torch.empty(attn_view) creates a tensor of shape attn_view that also contains random values on the order of 1e-43. Even though that uninitialized tensor is multiplied by beta (which is hardcoded to 0), it led to overflows on CPU only, leaving nan values inside attn_weights. Hence the fix seemed to be to create a tensor of zeros and multiply it by an arbitrary float value (here 1).
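For clarity, here are the two variants side by side (shapes are illustrative, not the model's actual dimensions):

import torch

# Illustrative shapes: attn_view is (batch * heads, q_len, k_len) in the model.
attn_view, attn_shape = (4, 8, 8), (4, 1, 8, 8)
scale_factor = 0.125
query = torch.randn(4, 8, 16)
key = torch.randn(4, 16, 8)

# Old variant: uninitialized buffer, relying on beta=0 to ignore its contents.
old = torch.baddbmm(torch.empty(attn_view), query, key, beta=0, alpha=scale_factor).view(attn_shape)

# Fixed variant: explicit zeros with beta=1, which avoided the nan issue on CPU.
zeros = torch.zeros(attn_view, dtype=query.dtype, device=query.device)
new = torch.baddbmm(zeros, query, key, beta=1, alpha=scale_factor).view(attn_shape)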
@jlamypoirier mentioned that this would add some overhead on GPU, so I will add a check for whether the model is running on CPU.
👀 I knew it
Thanks a lot for adding this new model! My main comment is on the weight initialization.
*The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at [this https URL.](https://huggingface.co/bigcode)*
The model is an optimized GPT2 model with support for Multi-Query Attention.
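A rough sketch of what Multi-Query Attention changes compared to standard multi-head attention (illustrative only, not the model's actual implementation): all query heads share a single key/value head, which shrinks the KV cache by a factor of the number of heads.

import torch

# Illustrative Multi-Query Attention: n_head query heads, one shared key/value head.
batch, seq, n_head, head_dim = 2, 10, 8, 32

query = torch.randn(batch, n_head, seq, head_dim)  # per-head queries
key = torch.randn(batch, 1, seq, head_dim)         # single shared key head
value = torch.randn(batch, 1, seq, head_dim)       # single shared value head

# Broadcasting over the head dimension stands in for per-head key/value projections.
scores = query @ key.transpose(-1, -2) / head_dim ** 0.5   # (batch, n_head, seq, seq)
attn_output = torch.softmax(scores, dim=-1) @ value        # (batch, n_head, seq, head_dim)
print(attn_output.shape)  # torch.Size([2, 8, 10, 32])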
Maybe link to the GPT-2 model doc page here?
# Copyright 2023 The OpenAI Team Authors.
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
Not sure we need more than the BigCode team and Hugging Face here.
# Reinitialize selected weights subject to the OpenAI GPT-2 Paper Scheme:
# > A modified initialization which accounts for the accumulation on the residual path with model depth. Scale
# > the weights of residual layers at initialization by a factor of 1/√N where N is the # of residual layers.
# > -- GPT-2 :: https://openai.com/blog/better-language-models/
#
# Reference (Megatron-LM): https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/gpt_model.py
for name, p in module.named_parameters():
    if name == "c_proj.weight":
        # Special Scaled Initialization --> There are 2 Layer Norms per Transformer Block
        p.data.normal_(mean=0.0, std=(self.config.initializer_range / math.sqrt(2 * self.config.n_layer)))
This is a bit ugly and won't necessarily work since the module is not marked as initialized, so it will get re-initialized as a linear layer afterwards. This should be in a check for GPTBigCodeAttention, where you initialize module.c_proj.weight this way and then mark module.c_proj._is_hf_initialized = True so that the layer is not reinitialized. See this example in OneFormer for instance.
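A sketch of the suggested pattern, assuming it lives in the model's _init_weights and that GPTBigCodeAttention and self.config come from this PR's modeling file (the exact final code may differ):

import math
import torch.nn as nn

def _init_weights(self, module):
    # Sketch only: handle the attention block explicitly so its output projection
    # gets the scaled initialization, then flag it as initialized so the generic
    # nn.Linear branch below does not overwrite it.
    if isinstance(module, GPTBigCodeAttention):
        module.c_proj.weight.data.normal_(
            mean=0.0,
            std=self.config.initializer_range / math.sqrt(2 * self.config.n_layer),
        )
        module.c_proj.bias.data.zero_()
        module.c_proj._is_hf_initialized = True  # skip later re-initialization
    elif isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()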
This part is copied as-is from GPT2...
So let's fix GPT-2 too.
loss = None
if labels is not None:
    # Shift so that tokens < n predict n
Move the labels to the logits device here, for model parallelism support.
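A minimal sketch of what that looks like in the causal-LM loss (variable names are illustrative, not the exact diff):

import torch
from torch.nn import CrossEntropyLoss

# Illustrative tensors standing in for the model's outputs and inputs.
lm_logits = torch.randn(2, 10, 100)      # (batch, seq, vocab)
labels = torch.randint(0, 100, (2, 10))  # may live on a different device under model parallelism

# Move labels to the logits' device, then shift so that tokens < n predict n.
labels = labels.to(lm_logits.device)
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
loss = CrossEntropyLoss()(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))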
Again these are copies of GPT2, but I'm open to fixing them.
loss = None
if labels is not None:
    if self.config.problem_type is None:
Same here.
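For context, the problem_type inference used by the library's sequence-classification heads boils down to roughly this (a sketch, not this PR's exact code):

import torch

def infer_problem_type(labels: torch.Tensor, num_labels: int) -> str:
    # Sketch of the usual heuristic: one label -> regression, integer labels ->
    # single-label classification, otherwise multi-label classification.
    if num_labels == 1:
        return "regression"
    if labels.dtype in (torch.long, torch.int):
        return "single_label_classification"
    return "multi_label_classification"

print(infer_problem_type(torch.tensor([0, 2, 1]), num_labels=3))  # single_label_classification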
loss = None
if labels is not None:
    loss_fct = CrossEntropyLoss()
Same here.
Thanks a lot for your feedback! I just addressed them all.
Please wait a bit before merging; I'll do a final check of the latest changes.
if query.device == torch.device("cpu"):
    # this seemed to be needed - on CPU only: check https://github.com/huggingface/transformers/pull/22575/files#r1159858870
    zeros = torch.zeros(attn_view, dtype=query.dtype, device=query.device)
    attn_weights = torch.baddbmm(zeros, query, key, beta=1, alpha=scale_factor).view(attn_shape)
I know it doesn't really matter, but shouldn't this also have beta=0?
I tried it with beta=0 and still got the issue... maybe torch complains if you multiply a zero tensor by 0.
That should be the least problematic operation here...
The point is that baddbmm shouldn't even read the input when beta=0. Also, do you have some code to reproduce this? On my machine there is no problem:
>>> a=torch.full([50, 50, 50], torch.nan)
>>> b=torch.randn([50, 50, 50])
>>> c=torch.randn([50, 50, 50])
>>> d=torch.full([50, 50, 50], 0)
>>> torch.baddbmm(a,b,c, beta=1, alpha=5).sum()
tensor(nan)
>>> torch.baddbmm(a,b,c, beta=0, alpha=5).sum()
tensor(-9915.6406)
>>> torch.baddbmm(d,b,c, beta=0, alpha=5).sum()
tensor(-9915.6406)
>>> torch.baddbmm(d,b,c, beta=1, alpha=5).sum()
tensor(-9915.6406)
>>> (5*torch.bmm(b,c)).sum()
tensor(-9915.6406)
Now running on an A100:
import torch
s=32
dtype=torch.float32
a=torch.full([s, s, s], torch.nan, dtype=dtype)
b=torch.randn([s, s, s], dtype=dtype)
c=torch.randn([s, s, s], dtype=dtype)
d=torch.zeros([s, s, s], dtype=dtype)
y0=torch.baddbmm(a,b,c, beta=1, alpha=5)
y1=torch.baddbmm(a,b,c, beta=0, alpha=5)
y2=torch.baddbmm(d,b,c, beta=0, alpha=5)
y3=torch.baddbmm(d,b,c, beta=1, alpha=5)
y4=torch.bmm(b,c)*5
y5=torch.matmul(b,c)*5
yy=[y0,y1,y2,y3,y4,y5]
aa=a.cuda()
bb=b.cuda()
cc=c.cuda()
dd=d.cuda()
z0=torch.baddbmm(aa,bb,cc, beta=1, alpha=5).cpu()
z1=torch.baddbmm(aa,bb,cc, beta=0, alpha=5).cpu()
z2=torch.baddbmm(dd,bb,cc, beta=0, alpha=5).cpu()
z3=torch.baddbmm(dd,bb,cc, beta=1, alpha=5).cpu()
z4=(torch.bmm(bb,cc)*5).cpu()
z5=(torch.matmul(bb,cc)*5).cpu()
zz=[z0,z1,z2,z3,z4,z5]
print([(z-y).std() for y,z in zip(yy,zz)])
print([(y-yy[1]).std() for y in yy])
print([(z-zz[1]).std() for z in zz])
>>> print([(z-y).std() for y,z in zip(yy,zz)])
[tensor(nan), tensor(0.0084), tensor(0.0084), tensor(0.0084), tensor(0.0084), tensor(0.0084)]
>>>
>>> print([(y-yy[1]).std() for y in yy])
[tensor(nan), tensor(0.), tensor(0.), tensor(0.), tensor(4.2393e-06), tensor(4.2393e-06)]
>>> print([(z-zz[1]).std() for z in zz])
[tensor(nan), tensor(0.), tensor(0.), tensor(0.), tensor(0.), tensor(0.)]
>>>
From this output it looks like the output is indeed different on CPU vs GPU, but that happens for every single kind of matrix multiplication, so there is nothing we can do about it... In general I don't think we should expect numerically equal outputs on different devices.
I could reproduce it through a common test; what I did was to replace the following block with:
if False:
# this seemed to be needed - on CPU only: check https://github.com/huggingface/transformers/pull/22575/files#r1159858870
zeros = torch.zeros(attn_view, dtype=query.dtype, device=query.device)
attn_weights = torch.baddbmm(zeros, query, key, beta=1, alpha=scale_factor).view(attn_shape)
else:
# We do the standard operation on GPU for faster inference
attn_weights = torch.baddbmm(
torch.empty(attn_view, device=query.device, dtype=query.dtype), query, key, beta=0, alpha=scale_factor
).view(attn_shape)
if attn_weights.isnan().any():
print()
And run
CUDA_VISIBLE_DEVICES= pytest tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeMQAModelTest::test_beam_sample_generate
and put a breakpoint right on the print. Strangely I couldn't reproduce in a small snippet. Here is what I have tried:
import torch
N_EXPERIMENTS = 10000
device = torch.device("cpu")
batch_size=8
q_len=4
k_len=5
hidden_dim=8
dtype=torch.float32
attn_view =(batch_size, q_len, k_len)
attn_shape = (batch_size, 1, q_len, k_len)
for _ in range(N_EXPERIMENTS):
query = torch.randn(batch_size, q_len, hidden_dim, device=device, dtype=dtype)
key = torch.randn(batch_size, hidden_dim, k_len, device=device, dtype=dtype)
a = torch.empty(attn_view, device=query.device, dtype=query.dtype)
out = torch.baddbmm(a, query, key, beta=0, alpha=0.356).view(attn_shape)
if out.isnan().any():
print(out.isnan().any())
I did a bit more investigating; from what I could find, it's a semi-random bug that only manifests itself when it feels like it, so it's not easy to reproduce. I was able to get it consistently with:
import torch
s=7
dtype=torch.float32
aa=[]
n=[]
for i in range(10000):
a=torch.full([s,s,s],torch.nan, dtype=dtype)
b=torch.randn([s,s,s], dtype=dtype)
c=torch.randn([s,s,s], dtype=dtype)
y=torch.baddbmm(a,b,c, beta=0, alpha=5)
aa+=[a,b,c]
n.append(y.isnan().float().mean().item())
>>> print(torch.mean(torch.tensor(n)).item())
0.2447994202375412
For some reason I never get any nan with s>=8, and only rarely when not accumulating tensors in the list (i.e. reusing memory addresses).
Edit: interestingly, it can produce nans with beta=0 even if the input doesn't have any. It's way less common but seems enough to break the tests.
import torch
s=7
dtype=torch.float32
aa=[]
n=[]
for i in range(10000):
a=torch.zeros([s,s,s], dtype=dtype)
b=torch.randn([s,s,s], dtype=dtype)
c=torch.randn([s,s,s], dtype=dtype)
y=torch.baddbmm(a,b,c, beta=0, alpha=5)
aa+=[a,b,c]
n.append(y.isnan().float().mean().item())
>>> print(torch.mean(torch.tensor(n)).item())
1.1661808230201132e-06
Anyway, it looks like that's a known bug pytorch/pytorch#96037. It's been fixed in pytorch/pytorch#96086 but only for the next release of pytorch. So for now we should leave the zero, I'll just simplify that code and add a reference to the issue in the comment. (In the future it could be updated to only set to zero for torch version <=2.0.0)
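A sketch of what that future version gate could look like (assuming the upstream fix ships in releases after torch 2.0.0; the threshold is an assumption):

import torch
from packaging import version

# Assumption: the baddbmm fix (pytorch/pytorch#96086) lands after torch 2.0.0.
_TORCH_HAS_BADDBMM_FIX = version.parse(torch.__version__.split("+")[0]) > version.parse("2.0.0")

def attention_bias(attn_view, query):
    # Returns the bias tensor and beta to pass to torch.baddbmm.
    if _TORCH_HAS_BADDBMM_FIX:
        # Fixed torch: the uninitialized buffer is safe since beta=0 never reads it.
        return torch.empty(attn_view, dtype=query.dtype, device=query.device), 0
    # Work around pytorch/pytorch#96037: explicit zeros with beta=1.
    return torch.zeros(attn_view, dtype=query.dtype, device=query.device), 1

The call site would then do something like bias, beta = attention_bias(attn_view, query) and pass both into torch.baddbmm(bias, query, key, beta=beta, alpha=scale_factor).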
Thanks a mile for investigating!
    GPTBigCodeForCausalLM,
    GPTBigCodeForSequenceClassification,
    GPTBigCodeForTokenClassification,
    GPTBigCodeModel,
)
from transformers.models.gpt_bigcode.modeling_gpt_bigcode import GPTBigCodeAttention

torch.backends.cuda.matmul.allow_tf32 = False
Why??? If needed, that would need a comment, and it can't go in the imports.
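If it really were needed, one option would be to scope it to the test instead of setting it at import time, e.g. (illustrative):

import torch

def test_precision_sensitive_case():
    # Scope the TF32 toggle to the test and restore the previous value afterwards,
    # instead of flipping the global flag at import time.
    previous = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    try:
        pass  # run the precision-sensitive checks here
    finally:
        torch.backends.cuda.matmul.allow_tf32 = previous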
The tests pass on A100 without it, so I'll just remove it.
I did a few minor tweaks; I'm OK with merging if it works for everyone (assuming CI passes).
Looking great to me as well! Thanks a lot for all your work and investigations @jlamypoirier 🔥!
Thanks again for everything!
…de) (huggingface#22575)

* Add model with cli tool
* Remove unwanted stuff
* Add new code
* Remove inference runner
* Style
* Fix checks
* Test updates
* make fixup
* fix docs
* fix doc
* fix test
* hopefully fix pipeline tests
* refactor
* fix CIs
* add comment
* rename to `GPTBigCodeForCausalLM`
* correct readme
* make fixup + docs
* make fixup
* fixes
* fixes
* Remove pruning
* Remove import
* Doc updates
* More pruning removal
* Combine copies
* Single MQA implementation, remove kv cache pre-allocation and padding
* Update doc
* Revert refactor to match gpt2 style
* Merge back key and value caches, fix some type hints
* Update doc
* Fix position ids pith padding (PR 21080)
* Add conversion script temporarily
* Update conversion script
* Remove checkpoint conversion
* New model
* Fix MQA test
* Fix copies
* try fix tests
* FIX TEST!!
* remove `DoubleHeadsModel`
* add MQA tests
* add slow tests
* clean up
* add CPU checker
* final fixes
* fixes - fix GPU issue - fixed slow tests - skip disk offload
* fix final issue
* Simplify and comment baddbmm fix
* Remove unnecessary code
* Transpose tweaks
* Use beta=1 on cpu, improve tests

---------

Co-authored-by: younesbelkada <[email protected]>
Any updates on supporting flash attention? Or is there a different PR to track it?
cc @younesbelkada I think this is supported in BetterTransformer, no?
Indeed this should go into
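Presumably this refers to BetterTransformer support in `optimum` (an assumption on my part). Usage would look roughly like the sketch below; the checkpoint name is only illustrative, and whether `gpt_bigcode` is actually covered depends on the installed `optimum` version.

# Requires `pip install optimum`; checkpoint name is illustrative.
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained("bigcode/gpt_bigcode-santacoder")
model = BetterTransformer.transform(model)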
The GPTBigCode model from BigCode. It is the same model as GPT2, with Multi-Query Attention (MQA).
Other than MQA, it's the same model as GPT2, just a new implementation (though it's not numerically equivalent and the checkpoints are not compatible).
The optimizations (I might be missing some):

- Use the `gelu_pytorch_tanh` activation (see "Add the pytorch implementation of the OpenAI GeLU approximation" #21344 and "Add the GeLU activation from pytorch with the tanh approximation" #21345)
- Merge `_attn` and `_upcast_and_reordered_attn`. Always merge the matmul with scaling. Rename `reorder_and_upcast_attn` -> `attention_softmax_in_fp32`
- Rename `scale_attn_by_inverse_layer_idx` -> `scale_attention_softmax_in_fp32` and change its behavior to match Megatron-LM (divide by layer_idx in fp16, then multiply in fp32).
- Changes to `layer_past`/`present` (does it risk creating problems?)

Excluded from this PR (optional/opt-in features, could be added later):

TODO: