[LLM Deployment in Practice 3] Three 4-bit quantized chat models that run on a 3090 (baichuan2-13b, InternLM-20b, Yi-34b)

Before we start: working environment

Operating system: Windows 11 Pro
GPU: RTX 3090 24 GB
CUDA: 12.0
Python: 3.8
PyTorch: 2.0.1 (required by baichuan2)
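
As a rough sanity check on why these three models can fit on a 24 GB card, the weight footprint at 4 bits per parameter can be estimated directly. This is only a back-of-the-envelope sketch: the parameter counts are assumed to be the nominal sizes in the model names, and the estimate ignores quantization scales, any layers kept in higher precision, activations and the KV cache, so real memory usage will be higher.

```python
# Rough weight-only memory estimate for 4-bit quantized models.
# Assumption: parameter counts are the nominal sizes taken from the model names.
NOMINAL_PARAMS_BILLION = {
    "baichuan2-13b": 13,
    "InternLM-20b": 20,
    "Yi-34b": 34,
}

GPU_MEMORY_GB = 24  # RTX 3090


def weight_gb(params_billion, bits=4):
    """Return the weight size in GB (1 GB = 1e9 bytes) at the given bit width."""
    return params_billion * 1e9 * bits / 8 / 1e9


if __name__ == "__main__":
    for name, size in NOMINAL_PARAMS_BILLION.items():
        gb = weight_gb(size)
        headroom = GPU_MEMORY_GB - gb
        print(f"{name}: ~{gb:.1f} GB of weights, ~{headroom:.1f} GB left for everything else")
```

By this estimate the largest of the three, Yi-34b, leaves the least headroom, so it is the one most likely to be sensitive to context length and batch size.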

Download from the ModelScope community.
First git clone the repository, then download the .bin files manually.
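
One way to see which .bin files still need a manual download: assuming the repository stores its weights through Git LFS, a clone made without LFS leaves each large file as a small text pointer. The sketch below lists the files that are still such pointers. The helper name find_lfs_pointer_stubs and the example directory are illustrative and not part of any ModelScope tooling.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointer_stubs(repo_dir, pattern="*.bin"):
    """Return files matching `pattern` under repo_dir that are LFS pointers, not real weights."""
    root = Path(repo_dir)
    if not root.is_dir():
        return []
    stubs = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # Only the first bytes are read, so real multi-GB weight files are cheap to check.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_HEADER))
        if head == LFS_HEADER:
            stubs.append(path)
    return stubs


if __name__ == "__main__":
    # Hypothetical local clone directory; adjust to wherever the repository was cloned.
    for stub in find_lfs_pointer_stubs("./Baichuan2-13B-Chat-4bits"):
        print(f"still a pointer, download manually: {stub}")
```

An empty result means every matching file already holds real data rather than a pointer.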

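This part had the most pitfalls, mainly library compatibility problems that surface as a variety of errors.
transformers must be version 4.33.1. Newer versions are not compatible and produce errors such as No module named 'transformers_modules.Baichuan2-13B-Chat'.
bitsandbytes is required. Because this project runs on Windows 11, I tried installing bitsandbytes-windows, but it did not work.
I then found a release from a modified repository. The version needed is bitsandbytes-0.41.1-py3-none-win_amd64 (download from https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels). Versions older than this tend to cause problems such as CUDA detection failed and cannot import name 'Params4bit' from 'bitsandbytes.nn.modules' (this fix was inspired by an existing solution).
Running pip install xFormers==0.0.20 removes the warning and speeds up inference.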
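
Since most of the errors above come down to version mismatches, it can help to compare the active environment against these pins before loading the model. The following is a minimal sketch using only the standard library. The pins are the versions reported in this post, the helper names are illustrative, and local build tags such as "+cu118" are ignored in the comparison.

```python
from importlib import metadata

# Versions reported in this post for a Windows 11 + RTX 3090 setup.
PINNED = {
    "torch": "2.0.1",
    "transformers": "4.33.1",
    "bitsandbytes": "0.41.1",
    "xformers": "0.0.20",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match its pin."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Drop a local build tag such as "+cu118" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

No output means every pinned package matches. The lookup parameter exists only so the check can be exercised without installing the packages.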

Some references:

Running the Baichuan2 int4 quantized version
"Playing around" with baichuan2
Which third-party package versions Baichuan2 depends on at startup

For this part I downloaded the baichuan2 GitHub repository and ran the model with its cli-demo.

Download the quantized weights from the ModelScope community:
first git clone the repository, then download the .bin files manually.

The toolkit is used here, so no additional configuration is needed.

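The documentation for loading the weights through transformers has some problems, so I mainly followed the article "Only one 3090 GPU needed: deploying the InternLM-20B model efficiently" and used the LMDeploy commands to convert the model weights and to chat with the model.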

lmdeploy convert \
    --model-name internlm-chat \
    --model-path ./internlm-chat-20b-4bit \
    --model-format awq \
    --group-size 128 \