
2024-04-09

CodeGemma: Open Code Models Based on Gemma

CodeGemma Team, Google LLC¹

¹ See the Contributions and Acknowledgments section for the full author list. Please send correspondence to [email protected].

This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma,
capable of a variety of code and natural language generation tasks. We release three model checkpoints.
CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural
language understanding, excel in mathematical reasoning, and match code capabilities of other open
models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and
open-ended generation in latency-sensitive settings.

Introduction

We present CodeGemma, a collection of open code models based on Google DeepMind's Gemma models (Gemma Team et al., 2024). Continuing from Gemma pretrained models, CodeGemma models are further trained on more than 500 billion tokens of primarily code, using the same architectures as the Gemma model family. As a result, CodeGemma models achieve state-of-the-art code performance in both completion and generation tasks, while maintaining strong understanding and reasoning skills at scale. We release a 7B code pretrained model and a 7B instruction-tuned code model. Further, we release a specialized 2B model, trained specifically for code infilling and open-ended generation. The lineage of these models is depicted in Figure 1.

In this report, we provide an overview of the additions to Gemma, such as pretraining and instruction-tuning details for CodeGemma, followed by evaluations of all models across a wide variety of academic and real-world tasks against similar models. Finally, we outline the areas in which CodeGemma excels and its limitations, followed by recommendations for using these models.

[Figure 1 diagram: the Gemma 2B and 7B pretrained models are further trained (2B on 100% code infilling; 7B on 80% code infilling and 20% natural language) to produce CodeGemma 2B and CodeGemma 7B; code SFT and RLHF on CodeGemma 7B yields CodeGemma 7B Instruct.]

Figure 1 | Both pretrained models are derived from corresponding Gemma pretrained models.

Pretraining

Training Data

CodeGemma models are further trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code. The 2B models are trained with 100% code, while the 7B models are trained with an 80% code / 20% natural language mixture. Our code corpus comes from publicly available code repositories. Datasets are deduplicated and filtered to remove contamination of evaluation code and certain personal and sensitive data. In addition to the processing done for Gemma, we perform additional pretraining steps for code data.

Preprocessing for Fill-in-the-Middle




The pretrained CodeGemma models are trained using a method based on the fill-in-the-middle (FIM) task (Bavarian et al., 2022), with improvements that address the shortcomings cited in the original work as well as empirically found systemic issues with existing FIM-trained models. The relevant formatting control tokens are presented in Table 1. The models are trained to work with both PSM (Prefix-Suffix-Middle) and SPM (Suffix-Prefix-Middle) modes. Figure 2 shows a sample snippet formatted in PSM. We provide detailed FIM usage instructions in the Inference Recommendations section.

Context          Relevant Token
FIM prefix       <|fim_prefix|>
FIM middle       <|fim_middle|>
FIM suffix       <|fim_suffix|>
File separator   <|file_separator|>

Table 1 | Formatting control tokens used for the FIM task. Note that | is the standard pipe character (ASCII code 124).

Multi-file Packing

Many downstream code-related tasks involve generating code based on a repository-level context as opposed to a single file. To improve model alignment with real-world applications, we create training examples by co-locating the most relevant source files within code repositories and best-effort grouping them into the same training examples. Specifically, we employ two heuristics: dependency-graph-based packing and unit-test-based lexical packing.

To construct the dependency graph, we first group files by repository. For each source file, we extract imports from the top N lines and perform suffix matching to determine the longest matching paths within the repository structure. We determine edge importance (a heuristic measure) between files, and remove unimportant edges to break cyclic dependencies (common in Python). We then calculate all-pairs shortest paths within the graph, where shorter distances signify stronger file relationships. Finally, we linearize the graph of files using a topological sort, selecting the next unparented node based on minimum distance to sorted nodes and using lexicographic order to break ties (an illustrative sketch of this procedure is given below, after the Mathematics Datasets list).

Files not covered by this dependency-graph method are sorted alphabetically within their repository, with unit tests packed next to their implementations (e.g. TestFoo.java beside Foo.java).

Instruction Tuning

Our training data consists of a combination of open-source math datasets and synthetically generated code, in addition to the finetuning datasets used by Gemma. By exposing the model to mathematical problems, we aim to enhance its logical reasoning and problem-solving skills, which are essential for code generation.

Mathematics Datasets

To enhance the mathematical reasoning capabilities of coding models, we employ supervised fine-tuning on a diverse set of mathematics datasets, including:

MATH Dataset: A collection of 12,500 challenging mathematical problems from competitions, providing step-by-step solutions for training models in answer derivation and explanation generation (Hendrycks et al., 2021).

GSM8k Dataset: A collection of 8,500 grade school math problems. This dataset tests the multi-step reasoning abilities of models, highlighting their limitations despite the simplicity of the problems (Cobbe et al., 2021a).

MathQA Dataset: A large-scale dataset of math word problems (Amini et al., 2019) with annotations built on top of the AQuA dataset (Ling et al., 2017).

Synthetic Mathematical Data: A programmatically generated dataset of algebraic problems used to improve the model's ability to solve long algebra problems.

By leveraging these diverse datasets, we expose the model to a wide range of mathematical problems, increasing its ability to perform complex mathematical reasoning. Our training experiments indicate that these datasets significantly boost code generation performance.

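As referenced in the Multi-file Packing discussion above, the following is a minimal sketch of dependency-graph-based packing. It is illustrative only: the import-extraction rule, the undirected treatment of the graph, and all helper names are assumptions made for this example, and the actual pipeline additionally breaks cyclic dependencies using edge-importance estimates before sorting.

from collections import defaultdict, deque

def extract_imports(source, top_n=50):
    """Pull import-like targets from the top N lines of a Python source file."""
    targets = []
    for line in source.splitlines()[:top_n]:
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "import":
            targets.append(parts[1].replace(".", "/"))
        elif len(parts) >= 4 and parts[0] == "from" and parts[2] == "import":
            targets.append(parts[1].replace(".", "/") + "/" + parts[3])
    return targets

def build_graph(files):
    """Undirected adjacency: u relates to v if an import in u suffix-matches v's path."""
    adj = defaultdict(set)
    for path, source in files.items():
        for target in extract_imports(source):
            for other in files:
                if other != path and other.removesuffix(".py").endswith(target):
                    adj[path].add(other)
                    adj[other].add(path)
    return adj

def shortest_dists(adj, start):
    """BFS distances from one file to every reachable file."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def linearize(files):
    """Order files so the next pick is closest to the already-sorted files,
    breaking ties lexicographically."""
    adj = build_graph(files)
    dists = {f: shortest_dists(adj, f) for f in files}   # all-pairs shortest paths
    ordered, remaining = [], sorted(files)
    while remaining:
        def key(f):
            d = min((dists[f].get(s, float("inf")) for s in ordered), default=0)
            return (d, f)
        nxt = min(remaining, key=key)
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

# Example: files from one hypothetical repository packed into a single training order.
repo = {
    "pkg/foo.py": "def foo():\n    return 1\n",
    "pkg/bar.py": "from pkg import foo\n\ndef bar():\n    return foo.foo()\n",
    "tests/test_foo.py": "from pkg import foo\n",
}
print(linearize(repo))

Calling linearize on a repository's files yields the order in which they would be concatenated into one packed training example.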

path/to/the/first/file.py↵
<|fim_prefix|>from typing import List↵

def mean_absolute_deviation(numbers: List[float]) -> float:↵
"""For a given list of input numbers, calculate Mean Absolute Deviation↵
around the mean of this dataset.↵
Mean Absolute Deviation is the average absolute difference between each↵
element and a centerpoint (mean in this case):↵
MAD = average | x - x_mean |↵
>>> mean_absolute_deviation([1.0, 2.0, 3.0, 4.0])↵
1.0↵
"""↵
<|fim_suffix|><|fim_middle|> return sum(abs(x - mean) for x in numbers) / len(numbers)↵
<|file_separator|>path/to/the/second/file.py↵
<|fim_prefix|>...

Figure 2 | Example code snippet in PSM mode. The green ↵ characters are part of the format, whereas
uncolored ↵ is from the source. The shown code sample is from HumanEval (Chen et al., 2021).

Coding Dataset

Effectively instruction-tuning large language models for code generation tasks requires a substantial amount of question-answer pairs. We leverage synthetic code instruction data generation to create the datasets used in the supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF) phases. We apply the following steps:

Example Generation: Following the approach outlined in the OSS-Instruct paper (Wei et al., 2023), we generate a set of self-contained question-answer pairs.

Post-Filtering: We filter question-answer pairs using an LLM tasked with evaluating the helpfulness and correctness of the generated question-answer pairs.

Evaluation

We evaluate CodeGemma for code completion and generation performance, as well as natural language understanding, with automated benchmarks across a variety of domains.

Infilling Capability

HumanEval Infilling

The CodeGemma models are trained for code completion purposes. We use the single-line and multi-line metrics in the HumanEval Infilling benchmarks introduced in Fried et al. (2023) to evaluate. Performance against other FIM-aware code models is shown in Table 2.

We observe that our 2B pretrained model is an excellent well-rounded model for code completion use cases where low latency is a critical factor. It performs on par with the other models while being, in many cases, nearly twice as fast during inference. We attribute this speedup to the base Gemma architectural decisions.

Real-world Evaluation

We validate our model's infilling abilities by masking out random snippets in code with cross-file dependencies, generating samples from the model, and retesting the code files with the generated snippets to show that the model performs as expected, a similar approach to Liu et al. (2023) or Ding et al. (2023). Due to our inclusion of very recently committed open source code, we do not use these evaluations directly, but use an internal version with the same testing methodology.

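A minimal sketch of this retest-style loop is given below. The mask-selection policy, the generate callable, and the pytest-based test runner are assumptions for illustration rather than the internal harness described above; only the FIM control tokens come from Table 1.

import random
import subprocess

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def mask_random_span(source, max_lines=3):
    """Split a non-empty file into (prefix, masked_span, suffix) by hiding a random run of lines."""
    lines = source.splitlines(keepends=True)
    start = random.randrange(len(lines))
    end = min(len(lines), start + random.randint(1, max_lines))
    return "".join(lines[:start]), "".join(lines[start:end]), "".join(lines[end:])

def infill(generate, path, prefix, suffix):
    """Build a PSM-formatted prompt (Table 1 tokens) and ask the model for the middle.
    `generate` is an assumed callable wrapping whichever model is being evaluated."""
    prompt = f"{path}\n{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return generate(prompt)

def passes_tests(repo_dir):
    """Re-run the repository's tests after splicing the generated snippet back in."""
    return subprocess.run(["pytest", "-q"], cwd=repo_dir).returncode == 0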

                                      Time (s)              Performance
Model                              Single     Multi      Single     Multi

2B class
  CodeGemma                           543      8479      78.41%    51.44%
  DeepSeek Coder                      990     13138      79.96%    50.95%
  DeepSeek Coder Instruct            5632     31505      81.41%    37.35%
  StarCoder2                         3665     20629      77.44%    47.65%

7B class
  CodeGemma                          1505     22896      76.09%    58.44%
  CodeGemma Instruct                 8330     49438      68.25%    20.05%
  Code Llama*                           -         -      74.10%    48.20%
  DeepSeek Coder                     1559     22387      85.87%    63.20%
  DeepSeek Coder Instruct            9500     53498      86.45%    58.01%
  StarCoder2                         8080     45459      81.03%    53.21%

Table 2 | Single-line and multi-line code completion capability of CodeGemma compared to other FIM-aware code models. Time is the total number of seconds to obtain 128-token continuations per HumanEval Infilling task (1033 tasks in single-line and 5815 in multi-line). Measurements are done with HuggingFace's Transformers (Wolf et al., 2020) model implementations on g2-standard-4 GCE instances with bfloat16 datatype and batch size of 1. * Code Llama numbers are taken from Rozière et al. (2024).
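The timing columns above can be approximated with a loop like the following sketch, which assumes the HuggingFace Transformers API and a CodeGemma checkpoint identifier; the exact harness, prompt set, and hardware configuration beyond bfloat16 and batch size 1 are not specified in this report.

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id for illustration; substitute the model under test.
MODEL_ID = "google/codegemma-2b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def timed_completions(prompts):
    """Total seconds to produce 128-token continuations, batch size 1."""
    total = 0.0
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=128, do_sample=False)
        total += time.perf_counter() - start
    return total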

In addition to offline evaluations, the model was tested within live coding environments to benchmark its performance against current Google completion models.

Coding Capability

Python Coding

The canonical benchmarks used in coding evaluation are HumanEval (Chen et al., 2021) and Mostly Basic Python Problems (Austin et al., 2021). We present our results in Table 3.

Benchmark        HumanEval     MBPP

2B-PT              31.1%      43.6%
Gemma 2B PT        22.0%      29.2%
7B-PT              44.5%      56.2%
7B-IT              56.1%      54.2%
Gemma 7B PT        32.3%      44.4%

Table 3 | Python coding capability of CodeGemma on de-facto coding benchmarks.

Compared to the base Gemma models (Gemma Team et al., 2024), CodeGemma models perform significantly better on tasks from the coding domain.

Multi-lingual Benchmarks

BabelCode (Orlanski et al., 2023) is used to measure the performance of CodeGemma on code generation across a variety of popular programming languages. Results are presented in Table 4.

Language Capability

We evaluate performance on a variety of domains including question answering (Bisk et al., 2019; Clark et al., 2019, 2018; Joshi et al., 2017), natural language (Hendrycks et al., 2020; Sakaguchi et al., 2019; Zellers et al., 2019) and mathematical reasoning (Cobbe et al., 2021b; Hendrycks et al., 2021). We present the results of our two 7B models next to the instruction-tuned Gemma 7B model in Figure 3.


             Language      2B       7B      7B-IT

HumanEval    C/C++       24.2%    32.9%    42.2%
             C#          10.6%    22.4%    26.7%
             Go          20.5%    21.7%    28.6%
             Java        29.2%    41.0%    48.4%
             JavaScript  21.7%    39.8%    46.0%
             Kotlin      28.0%    39.8%    51.6%
             Python      21.7%    42.2%    48.4%
             Rust        26.7%    34.1%    36.0%

MBPP         C/C++       47.1%    53.8%    56.7%
             C#          28.7%    32.5%    41.2%
             Go          45.6%    43.3%    46.2%
             Java        41.8%    50.3%    57.3%
             JavaScript  45.3%    58.2%    61.4%
             Kotlin      46.8%    54.7%    59.9%
             Python      38.6%    59.1%    62.0%
             Rust        45.3%    52.9%    53.5%

Table 4 | Multi-lingual coding capability of CodeGemma on BabelCode-translated HumanEval and Mostly Basic Python Problems (MBPP) datasets. IT stands for instruction-tuned.

Model                     GSM8K     MATH

CodeGemma PT              44.2%    19.9%
CodeGemma IT              41.2%    20.9%
Code Llama                13.0%       -
DeepSeek Coder            43.2%    19.2%
StarCoder2                40.4%       -

Table 5 | Math reasoning capability of CodeGemma and other code models in the 7B size class. Results collected from Guo et al. (2024); Lozhkov et al. (2024); Rozière et al. (2024).

CodeGemma retains most of the same natural language capabilities seen in the base Gemma models. CodeGemma PT and IT both outperform Mistral 7B (Jiang et al., 2023) by 7.2% and the Llama-2 13B model (Touvron et al., 2023) by 19.1% (numbers reported in Gemma Team et al. 2024). Further, we compare GSM8K and MATH scores for several code models in the 7B size class in Table 5, and show that CodeGemma excels at mathematical reasoning compared to similarly sized models.

[Figure 3: grouped bar chart comparing Gemma 7B IT, CodeGemma 7B PT, and CodeGemma 7B IT on Boolq, PIQA, TriviaQA, ARC-C, HellaSwag, MMLU, WinoGrande, GSM8K, and MATH.]

Figure 3 | Language capability comparison of CodeGemma and the instruction-tuned version of Gemma. Both Gemma and CodeGemma are in the 7B size class.

Practical Considerations

CodeGemma is tailored for practical use and deployment in latency-sensitive settings. The 2B model is considerably faster than all models in our comparison set, which is critical for latency-sensitive applications such as code completion. This speedup does not come with a significant, measured compromise in quality according to our evaluations: the 2B model performs as well as or better than other open models in its class at code infilling tasks. Consequently, CodeGemma 2B is exceptionally suitable for utilization within Integrated Development Environments (IDEs), local environments, and other applications with memory constraints.

The 7B models, characterized by their strong performance, are general coding models that surpass the baseline Gemma models on coding tasks while maintaining a high level of natural language comprehension. The larger memory requirement during inference renders these models particularly suitable for deployment in hosted environments and applications where model quality is of utmost importance.

The Responsible Deployment section in Gemma Team et al. (2024) contains a thorough discussion about the limitations and benefits of using an open model.

Inference Recommendations


For pretrained models, prompts should be formatted for code completion tasks such as function completion, docstring generation, and import suggestion. Figure 4 shows an example of a prompt format, where the file path is optional but recommended. The stopping strategy for model outputs should be chosen carefully to align with the deployment setting. The most straightforward method is to truncate upon generating a FIM sentinel token, as shown in Table 1.

path/file.py↵
<|fim_prefix|>prefix<|fim_suffix|>suffix
<|fim_middle|>

Figure 4 | Prompt in PSM mode. The carriage return ↵ is part of the format. There are no spaces after the suffix.
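As a concrete illustration of the prompt format in Figure 4 and the sentinel-based stopping strategy, the sketch below builds a PSM prompt and truncates a raw completion at the first FIM control token. The file contents and the completion string are invented for illustration; in practice the completion would come from the model.

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"
FILE_SEP = "<|file_separator|>"
SENTINELS = (FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE, FILE_SEP)

def build_psm_prompt(path, prefix, suffix):
    """PSM prompt per Figure 4: optional file path, then prefix and suffix,
    with no spaces after the suffix."""
    return f"{path}\n{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

def truncate_at_sentinel(completion):
    """Cut the raw model output at the first FIM sentinel token."""
    cut = len(completion)
    for token in SENTINELS:
        idx = completion.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

# Example usage with invented content.
prompt = build_psm_prompt(
    "path/to/file.py",
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
)
completion = "return a + b<|file_separator|>"
print(truncate_at_sentinel(completion))   # -> "return a + b"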

The same formatting as Gemma, with <start_of_turn> and <end_of_turn> tokens, can also be used to prompt the instruction-tuned model.
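A minimal sketch of that turn structure is shown below; the role labels and newline placement are assumptions based on the Gemma formatting, so the released tokenizer's chat template should be treated as authoritative.

def build_instruct_prompt(user_message):
    """Gemma-style turn format (sketch); verify against the released chat template."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(build_instruct_prompt("Write a Python function that reverses a string."))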

Conclusion
We present a collection of open models spe-
cialized for coding applications, built on top of
Gemma, an openly available family of language
models (Gemma Team et al., 2024). These mod-
els push the state of the art in code completion
and generation, while retaining natural language
capabilities from the base models.
The CodeGemma models presented in this re-
port are highly capable language models designed
for effective real-world deployment, optimized
to be run in latency-constrained settings while
delivering high-quality code completion on a va-
riety of tasks and languages. We show that the
lessons and technologies that built Gemini and
Gemma are transferable to downstream applica-
tions, and we are excited to release these models
to the broader community and to enable the appli-
cations which will be built on top of these models.


Contributions and Acknowledgments

Core Contributors
赵赫日 (Heri Zhao)
許嘉倫 (Jeffrey Hui)
Joshua Howland
Nguyễn Thành Nam¹ (Nam Nguyen)
左斯琦 (Siqi Zuo)

¹ Lead.

Contributors
胡琪恩 (Andrea Hu)
Christopher A. Choquette-Choo
Jingyue Shen
Joe Kelley
Kshitij Bansal
Luke Vilnis
Mateo Wirth
Paul Michel
Peter Choy
Pratik Joshi
Ravin Kumar
Sarmad Hashmi
Shubham Agrawal
Zhitao Gong

Product Management
Jane Fine
Tris Warkentin

Program Management
Ale Jakse Hartman

Executive Sponsors
Bin Ni
Kathy Korevec
Kelly Schaefer
Scott Huffman

Acknowledgements
Our work is made possible by the dedication and efforts of numerous teams at Google. We would like to acknowledge the support from the following teams: AIDA, DevRel, Gemini Infrastructure, Gemini Safety, Gemma, Google Cloud, Google Research Responsible AI, Kaggle, Keras.

Special thanks and acknowledgment to Alek Andreev, Anirudh Sriram, Antonia Paterson, Aroma Mahendru, Arthur Zucker, Austin Huang, David Huntsperger, Dhvanik Viradiya, Elisa Bandy, Emma Yousif, Gaurang Kothiya, Glenn Cameron, Hetul Patel, James Freedman, Jasmine George, Jenny Brennan, Johan Ferret, Josh Woodward, Kathleen Kenealy, Keelin McDonell, Lav Rai, Léonard Hussenot, Loubna Ben Allal, Ludovic Peran, Luiz Gustavo Martin, Manvinder Singh, Matthew Watson, Meg Risdal, Michael Butler, Michael Moynihan, Min Kim, Minwoo Park, Minh Giang, Morgane Rivière, Navneet Potti, Nino Vieillard, Olivier Bachem, Omar Sanseviero, Pedro Cuenca, Phil Culliton, Pier Giuseppe Sessa, Raj Gundluru, Robert Dadashi, Sanjana Purohit, Sertan Girgin, Surya Bhupatiraju, Utkarsh Pandya, Vaibhav Srivastav, and 单志昊 (Zhihao Shan).

References

A. Amini, S. Gabriel, P. Lin, R. Koncel-Kedziorski, Y. Choi, and H. Hajishirzi. MathQA: Towards interpretable math word problem solving with operation-based formalisms, 2019. URL http://arxiv.org/abs/1905.13319.

J. Austin, A. Odena, M. I. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. J. Cai, M. Terry, Q. V. Le, and C. Sutton. Program synthesis with large language models. CoRR, abs/2108.07732, 2021. URL https://arxiv.org/abs/2108.07732.

M. Bavarian, H. Jun, N. Tezak, J. Schulman, C. McLeavey, J. Tworek, and M. Chen. Efficient training of language models to fill in the middle, 2022.

Y. Bisk, R. Zellers, R. L. Bras, J. Gao, and Y. Choi. PIQA: Reasoning about physical commonsense in natural language. CoRR, abs/1911.11641, 2019. URL http://arxiv.org/abs/1911.11641.


M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. URL https://arxiv.org/abs/2107.03374.

C. Clark, K. Lee, M. Chang, T. Kwiatkowski, M. Collins, and K. Toutanova. BoolQ: Exploring the surprising difficulty of natural yes/no questions. CoRR, abs/1905.10044, 2019. URL http://arxiv.org/abs/1905.10044.

P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge, 2018.

K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman. Training verifiers to solve math word problems, 2021a. URL https://arxiv.org/abs/2110.14168v2.

K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, and J. Schulman. Training verifiers to solve math word problems. CoRR, abs/2110.14168, 2021b. URL https://arxiv.org/abs/2110.14168.

Y. Ding, Z. Wang, W. U. Ahmad, H. Ding, M. Tan, N. Jain, M. K. Ramanathan, R. Nallapati, P. Bhatia, D. Roth, and B. Xiang. CrossCodeEval: A diverse and multilingual benchmark for cross-file code completion, 2023.

D. Fried, A. Aghajanyan, J. Lin, S. Wang, E. Wallace, F. Shi, R. Zhong, W.-t. Yih, L. Zettlemoyer, and M. Lewis. InCoder: A generative model for code infilling and synthesis, 2023.

Gemma Team, T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love, P. Tafti, L. Hussenot, A. Chowdhery, A. Roberts, A. Barua, A. Botev, A. Castro-Ros, A. Slone, A. Héliou, A. Tacchetti, A. Bulanova, A. Paterson, B. Tsai, B. Shahriari, C. L. Lan, C. A. Choquette-Choo, C. Crepy, D. Cer, D. Ippolito, D. Reid, E. Buchatskaya, E. Ni, E. Noland, G. Yan, G. Tucker, G.-C. Muraru, G. Rozhdestvenskiy, H. Michalewski, I. Tenney, I. Grishchenko, J. Austin, J. Keeling, J. Labanowski, J.-B. Lespiau, J. Stanway, J. Brennan, J. Chen, J. Ferret, J. Chiu, J. Mao-Jones, K. Lee, K. Yu, K. Millican, L. L. Sjoesund, L. Lee, L. Dixon, M. Reid, M. Mikuła, M. Wirth, M. Sharman, N. Chinaev, N. Thain, O. Bachem, O. Chang, O. Wahltinez, P. Bailey, P. Michel, P. Yotov, P. G. Sessa, R. Chaabouni, R. Comanescu, R. Jana, R. Anil, R. McIlroy, R. Liu, R. Mullins, S. L. Smith, S. Borgeaud, S. Girgin, S. Douglas, S. Pandya, S. Shakeri, S. De, T. Klimenko, T. Hennigan, V. Feinberg, W. Stokowiec, Y. hui Chen, Z. Ahmed, Z. Gong, T. Warkentin, L. Peran, M. Giang, C. Farabet, O. Vinyals, J. Dean, K. Kavukcuoglu, D. Hassabis, Z. Ghahramani, D. Eck, J. Barral, F. Pereira, E. Collins, A. Joulin, N. Fiedel, E. Senter, A. Andreev, and K. Kenealy. Gemma: Open models based on Gemini research and technology, 2024.

D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y. Wu, Y. K. Li, F. Luo, Y. Xiong, and W. Liang. DeepSeek-Coder: When the large language model meets programming - the rise of code intelligence, 2024.

D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. CoRR, abs/2009.03300, 2020. URL https://arxiv.org/abs/2009.03300.

D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt. Measuring mathematical problem solving with the MATH dataset. NeurIPS, 2021.

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mistral 7B, 2023.


M. Joshi, E. Choi, D. S. Weld, and L. Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. CoRR, abs/1705.03551, 2017. URL http://arxiv.org/abs/1705.03551.

W. Ling, D. Yogatama, C. Dyer, and P. Blunsom. Program induction by rationale generation: Learning to solve and explain algebraic word problems, 2017. URL https://arxiv.org/abs/1705.04146v3.

T. Liu, C. Xu, and J. McAuley. RepoBench: Benchmarking repository-level code auto-completion systems, 2023.

A. Lozhkov, R. Li, L. B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y. Wei, T. Liu, M. Tian, D. Kocetkov, A. Zucker, Y. Belkada, Z. Wang, Q. Liu, D. Abulkhanov, I. Paul, Z. Li, W.-D. Li, M. Risdal, J. Li, J. Zhu, T. Y. Zhuo, E. Zheltonozhskii, N. O. O. Dade, W. Yu, L. Krauß, N. Jain, Y. Su, X. He, M. Dey, E. Abati, Y. Chai, N. Muennighoff, X. Tang, M. Oblokulov, C. Akiki, M. Marone, C. Mou, M. Mishra, A. Gu, B. Hui, T. Dao, A. Zebaze, O. Dehaene, N. Patry, C. Xu, J. McAuley, H. Hu, T. Scholak, S. Paquet, J. Robinson, C. J. Anderson, N. Chapados, M. Patwary, N. Tajbakhsh, Y. Jernite, C. M. Ferrandis, L. Zhang, S. Hughes, T. Wolf, A. Guha, L. von Werra, and H. de Vries. StarCoder 2 and The Stack v2: The next generation, 2024.

G. Orlanski, K. Xiao, X. Garcia, J. Hui, J. Howland, J. Malmaud, J. Austin, R. Singh, and M. Catasta. Measuring the impact of programming language distribution. arXiv preprint arXiv:2302.01973, 2023.

B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, R. Sauvestre, T. Remez, J. Rapin, A. Kozhevnikov, I. Evtimov, J. Bitton, M. Bhatt, C. C. Ferrer, A. Grattafiori, W. Xiong, A. Défossez, J. Copet, F. Azhar, H. Touvron, L. Martin, N. Usunier, T. Scialom, and G. Synnaeve. Code Llama: Open foundation models for code, 2024.

K. Sakaguchi, R. L. Bras, C. Bhagavatula, and Y. Choi. WinoGrande: An adversarial Winograd schema challenge at scale. CoRR, abs/1907.10641, 2019. URL http://arxiv.org/abs/1907.10641.

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom. Llama 2: Open foundation and fine-tuned chat models, 2023.

Y. Wei, Z. Wang, J. Liu, Y. Ding, and L. Zhang. Magicoder: Source code is all you need, 2023. URL http://arxiv.org/abs/2312.02120.

T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush. HuggingFace's Transformers: State-of-the-art natural language processing, 2020.

R. Zellers, A. Holtzman, Y. Bisk, A. Farhadi, and Y. Choi. HellaSwag: Can a machine really finish your sentence?, 2019.
