Contributing to a Triton Operator Library

哦豁灬

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/qq_38342510/article/details/146715447

1 FlagGems

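FlagGems is a high-performance, general-purpose operator library written in Triton, the programming language released by OpenAI. It provides a set of operators for large language models that can be used from the PyTorch framework to accelerate model inference and training.

FlagGems works by overriding PyTorch's backend ATen operators and supports PyTorch eager mode by default. This makes it a drop-in replacement: users can switch to the Triton operator library without modifying their model code. FlagGems does not interfere with normal use of the ATen backend, and it can deliver good performance gains.

FlagGems dependencies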

Triton >= 2.2.0
PyTorch >= 2.2.0
Transformers >= 4.40.2
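To check an environment against the minimum versions listed above before installing, a small script along these lines may help. It is only a sketch: the distribution names `triton`, `torch` and `transformers` are assumed to be the names the packages are installed under, and `version_tuple` and `check_dependencies` are hypothetical helpers, not part of FlagGems.

```python
from importlib import metadata

# Minimum versions as listed in this article. The keys are ASSUMED to be the
# installed distribution names of Triton, PyTorch and Transformers.
MINIMUMS = {"triton": "2.2.0", "torch": "2.2.0", "transformers": "4.40.2"}


def version_tuple(version):
    """Turn '2.2.0+cu121' or '2.3.0.dev1' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev1'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '2.2' compares equal to '2.2.0'
    return tuple(parts)


def check_dependencies(minimums=MINIMUMS):
    """Return {name: (installed_version_or_None, ok)} without importing anything."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        print(f"{name}: installed={installed} ok={ok}")
```

The check reads package metadata only, so it does not need a GPU and does not import the heavy packages themselves.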

Installing FlagGems

git clone https://github.com/FlagOpen/FlagGems.git
cd FlagGems
pip install .

Replacing operators with FlagGems globally

import flag_gems
flag_gems.enable()

Replacing operators with FlagGems locally

import flag_gems
with flag_gems.use_gems():
    pass
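The difference between a global and a scoped replacement can be illustrated with a toy dispatch table. This is not how FlagGems or PyTorch's dispatcher is implemented; `DISPATCH`, `enable`, `use_replacements` and `call` are hypothetical names used only to show the pattern of swapping implementations inside a `with` block and restoring them afterwards.

```python
from contextlib import contextmanager

CALLS = []  # records which implementation actually ran


def reference_add(a, b):
    CALLS.append("reference")
    return a + b


def replacement_add(a, b):
    # Stand-in for an optimized kernel; here it computes the same result.
    CALLS.append("replacement")
    return a + b


# Toy operator table (NOT PyTorch's real dispatcher).
DISPATCH = {"add": reference_add}
REPLACEMENTS = {"add": replacement_add}


def enable():
    """Global switch: swap in the replacements and leave them in place."""
    DISPATCH.update(REPLACEMENTS)


@contextmanager
def use_replacements():
    """Scoped switch: swap in the replacements only inside the with block."""
    saved = dict(DISPATCH)
    DISPATCH.update(REPLACEMENTS)
    try:
        yield
    finally:
        # Restore the original table even if the body raised.
        DISPATCH.clear()
        DISPATCH.update(saved)


def call(op, *args):
    """Look up the current implementation of op and run it."""
    return DISPATCH[op](*args)
```

Calling code goes through `call("add", 1, 2)` in both cases and gets the same result; only the implementation behind the name changes, which is the sense in which model code does not need to be modified.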

FlagGems usage example

import torch
import flag_gems

M, N, K = 1024, 1024, 1024
A = torch.randn((M, K), dtype=torch.float16, device="cuda")
B = torch.randn((K, N), dtype=torch.float16, device="cuda")
with flag_gems.use_gems():
    C = torch.mm(A, B)

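2 Contributing

Open an Issue to report a bug or request a new feature (https://github.com/FlagOpen/FlagGems/issues/new/choose)
Open a Pull Request to fix a bug or implement a new feature (https://github.com/FlagOpen/FlagGems/compare)

2.1 Code contribution workflow

Pay particular attention to the CI-related parts.

2.1.1 Fork the repository and clone the code locally

Open the FlagGems GitHub home page (https://github.com/FlagOpen/FlagGems) and click the Fork button to create a copy of the repository.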

# Clone the FlagGems repository locally and enter the FlagGems folder
git clone https://github.com/FlagOpen/FlagGems
cd FlagGems

2.1.2 Create a local branch

# Create and switch to a new branch (my-feature is a placeholder name)
git checkout -b my-feature

2.1.3 Install pre-commit

FlagGems uses pre-commit (https://pre-commit.com) git hooks to format the source code and to run static checks whenever git commit is invoked. The pre-commit check is also part of CI, so a Pull Request that fails it cannot be merged into FlagGems.

pip install pre-commit
pre-commit install  # register the git hooks in this clone

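2.1.4 Develop code

When submitting a pull request, contributors should describe the changes they made and the reasons for them. If test cases can be designed for the change, please include them. A pull request needs approval from one member before it can be merged, and it must also pass the continuous integration checks.

The continuous integration checks currently consist of four pipelines:

Code format check
Operator unit tests
Model tests
Code coverage check

As the FlagGems code base grows, more checks will be added to the integration tests. See the contributing guide (https://github.com/FlagOpen/FlagGems/blob/master/CONTRIBUTING_cn.md) for the latest description.

2.1.5 Run unit tests and performance tests locally

Operator correctness tests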

cd tests
pytest test_xx_ops.py # on CUDA
pytest test_xx_ops.py --device cpu # on CPU
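For readers who have not written an operator accuracy test before, the general shape is to run the operator under test and a trusted reference on the same input and compare the results within a tolerance. The sketch below shows that shape in pure Python so that it runs without a GPU. It is not taken from the FlagGems test suite; `candidate_softmax` merely stands in for the operator under test, and the tolerances are illustrative.

```python
import math


def reference_softmax(xs):
    """Straightforward reference implementation."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_softmax(xs):
    """Stand-in for the operator under test: subtracts the max for stability."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def assert_close(actual, expected, rel_tol=1e-6, abs_tol=1e-9):
    """Element-wise comparison within a relative and absolute tolerance."""
    assert len(actual) == len(expected)
    for i, (a, e) in enumerate(zip(actual, expected)):
        assert math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol), (i, a, e)


def test_softmax_matches_reference():
    # pytest collects functions named test_*; this one can also be called directly.
    xs = [0.5, -1.25, 3.0, 0.0]
    assert_close(candidate_softmax(xs), reference_softmax(xs))
```

Comparing with a tolerance rather than exact equality matters because an optimized kernel may legitimately reorder floating-point operations.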

Model correctness tests

cd examples
pytest model_xx_test.py

Operator performance tests

cd benchmark
pytest test_xx_perf.py -s # kernel
pytest test_xx_perf.py -s --mode cpu # e2e
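As a rough, framework-free illustration of what a latency benchmark does (warm-up runs, repeated measurements, a robust statistic), here is a minimal sketch. `benchmark` is a hypothetical helper and not part of the FlagGems benchmark suite. It measures host-side wall-clock time only; for GPU kernels a real harness would also have to synchronize the device before reading the clock.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, repeats=10):
    """Return the median wall-clock time of fn(*args) in milliseconds."""
    # Warm-up runs absorb one-off costs such as caching or lazy initialization.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    data = list(range(100_000, 0, -1))
    print(f"sorted(): {benchmark(sorted, data):.3f} ms")
```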

Print log messages at run time (not recommended for performance tests)

pytest program.py --log-cli-level debug

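2.1.6 Submit code

git commit triggers the static code checks; if they find a problem, the commit fails.
Before opening a Pull Request, sync the latest code from the original repository to avoid conflicts.
To open the Pull Request, go to the page of your FlagGems fork, switch to the branch you created, and click the Compare & pull request button.
Opening a Pull Request triggers GitHub Actions to run the CI (Continuous Integration) tests, and every subsequent push (git push) to the Pull Request triggers another CI run.
Once the CI tests pass, wait for Code Review. When you receive review comments, reply to the reviewer and revise the code accordingly.
The contribution is complete once the PR is merged.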
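The step of syncing with the original repository before opening a Pull Request can be sketched as a short sequence of git commands. The helper below only builds and prints the commands, it does not run them. It rests on several assumptions that are not stated in this article: a remote named `upstream` already points at the original repository, the base branch is `master`, the fork is the `origin` remote, and rebasing is preferred over merging. `my-feature` is a placeholder branch name.

```python
import shlex


def sync_commands(branch, upstream_remote="upstream", base="master"):
    """Build (but do not run) git commands that rebase branch onto upstream."""
    return [
        ["git", "fetch", upstream_remote],
        ["git", "checkout", branch],
        ["git", "rebase", f"{upstream_remote}/{base}"],
        # A rebase rewrites history, so the fork needs a forced push;
        # --force-with-lease refuses to overwrite work it has not seen.
        ["git", "push", "--force-with-lease", "origin", branch],
    ]


if __name__ == "__main__":
    for cmd in sync_commands("my-feature"):
        print(shlex.join(cmd))
```

If the project prefers merge commits over rebasing, the third command would be a `git merge` of the same upstream branch and the final push would not need to be forced.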