MagicPIG: A GPU-CPU Collaborative Framework for LLM Inference Optimization

MagicPIG [ICLR 2025 Spotlight]: "MagicPIG: LSH Sampling for Efficient LLM Generation". Project repository: https://gitcode.com/gh_mirrors/ma/MagicPIG

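1. Project Overview

MagicPIG is an open-source project that explores what heterogeneous GPU-CPU systems can do when backed by Locality-Sensitive Hashing (LSH). The framework uses LSH sampling to make large language model (LLM) inference more efficient: the GPU and the CPU share the inference work, which substantially improves inference performance across different application scenarios.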
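To make the idea of LSH-based retrieval concrete, here is a minimal pure-Python sketch of SimHash (random-hyperplane LSH). It is not MagicPIG's implementation; the helper names are hypothetical, and it assumes that the `--K` and `--L` flags in the Quick Start commands play the usual LSH roles (K hash bits per table, L hash tables). A key becomes a candidate when it lands in the same bucket as the query in at least one table.

```python
import random

def make_hyperplanes(num_tables, bits_per_table, dim, seed=0):
    """Draw L sets of K random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)]

def signature(vec, planes):
    """K-bit SimHash code: one sign bit per hyperplane."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0
                 for plane in planes)

def build_tables(keys, hyperplanes):
    """One hash table per hyperplane set, mapping code -> key indices."""
    tables = []
    for planes in hyperplanes:
        table = {}
        for idx, key in enumerate(keys):
            table.setdefault(signature(key, planes), []).append(idx)
        tables.append(table)
    return tables

def candidates(query, hyperplanes, tables):
    """Indices of keys sharing a bucket with the query in any table."""
    hits = set()
    for planes, table in zip(hyperplanes, tables):
        hits.update(table.get(signature(query, planes), []))
    return hits

# Toy data: key 0 equals the query, key 2 points the opposite way.
keys = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
planes = make_hyperplanes(num_tables=4, bits_per_table=2, dim=3)
tables = build_tables(keys, planes)
result = candidates([1.0, 0.0, 0.0], planes, tables)
print(sorted(result))
```

Key 0 is always retrieved because it hashes identically to the query, while the opposite-facing key 2 gets the opposite sign on every hyperplane and never collides. Raising K makes buckets more selective; raising L gives similar keys more chances to collide.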

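2. Quick Start

Environment Setup

Hardware: an Intel CPU with AVX512 support. To use BFloat16, the CPU must also support AVX512_BF16, and GCC must be version 11 or later.
Recommended Python versions: 3.9 or 3.10.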
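The requirements above can be checked before installing. The sketch below is a hypothetical helper, not part of MagicPIG; it assumes a Linux host where `/proc/cpuinfo` lists CPU features on a `flags` line, with `avx512f` marking baseline AVX512 and `avx512_bf16` marking BFloat16 support.

```python
import sys

def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def check_environment(cpuinfo_text, python_version=None):
    """Report whether the CPU flags and Python version match the requirements."""
    if python_version is None:
        python_version = sys.version_info[:2]
    flags = cpu_flags(cpuinfo_text)
    return {
        "avx512": "avx512f" in flags,            # baseline AVX512 (foundation)
        "avx512_bf16": "avx512_bf16" in flags,   # only needed for BFloat16
        "python_ok": tuple(python_version) in {(3, 9), (3, 10)},
    }
```

On a Linux machine, `check_environment(open("/proc/cpuinfo").read())` returns a small dictionary of booleans. The GCC version is not covered here and can be checked with `gcc --version`.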

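Installation

# Create and activate a virtual environment (Python 3.10, one of the recommended versions)
conda create -n magicpig python=3.10
conda activate magicpig

# Install dependencies
bash install.sh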

Generation Example

# Enter the examples directory
cd examples

# Run generation, pinned to CPU cores 0-31 and 52-83 and NUMA memory nodes 0 and 1
numactl -C 0-31,52-83 -m 0,1 \
python generation.py \
--model meta-llama/Meta-Llama-3.1-8B-Instruct \
--M 8192 \
--G 256 \
--K 10 \
--L 170 \
--template meta-llama3 \
--data ../data/story.txt
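The core ranges passed to `numactl -C` are specific to the machine used in the example and will usually need adjusting. As a small aid, the following sketch (a hypothetical helper, not part of MagicPIG) expands a numactl-style CPU list so it can be compared with the cores available on your own system.

```python
def expand_cpu_list(spec):
    """Expand a numactl-style CPU list such as "0-31,52-83" into sorted core IDs."""
    cores = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cores.add(int(part))
    return sorted(cores)

requested = expand_cpu_list("0-31,52-83")
print(len(requested))  # 64 cores in total
```

On Linux, `set(requested) - os.sched_getaffinity(0)` lists any requested cores that the current process cannot use, which is a quick way to spot a range that does not fit your CPU.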

Benchmark

# Enter the examples directory
cd examples

# Run the benchmark with the same CPU and memory binding
numactl -C 0-31,52-83 -m 0,1 \
python bench.py \
--model meta-llama/Meta-Llama-3.1-8B-Instruct \
--B 1 \
--P 98000 \
--M 98304 \
--K 10 \
--L 150