Running the code

The script below loads Baichuan2-13B-Chat with transformers and asks it a single chat question, first in plain half precision and then with online 4-bit quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

def run_baichuan2(model_id):
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        # Model path or Hub id
        model_id,
        # Use the slow tokenizer (use_fast defaults to True)
        use_fast=False,
        # Fixes "ValueError: Tokenizer class BaichuanTokenizer does not exist"
        trust_remote_code=True
    )
    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Load the weights in half precision
        torch_dtype=torch.float16,
        # Trust the model's custom code
        trust_remote_code=True,
        # Spread the model across available devices automatically.
        # Caution: for online 4-bit quantization this argument must be
        # commented out, otherwise loading fails.
        device_map="auto"
    )
    # Load the generation config
    model.generation_config = GenerationConfig.from_pretrained(model_id)
    # Run inference
    messages = [{"role": "user", "content": "Why was the cavalry of Alexander the Great so powerful?"}]
    response = model.chat(tokenizer, messages)
    print(response)
    return response
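
# Optional: streaming output. A minimal sketch, assuming the custom
# chat() method shipped in the Baichuan2 remote code accepts stream=True
# and yields the response decoded so far (as in the official cli_demo);
# verify against the modeling code of your model revision.
def run_baichuan2_stream(model, tokenizer, prompt):
    messages = [{"role": "user", "content": prompt}]
    position = 0
    for response in model.chat(tokenizer, messages, stream=True):
        # Each yielded value is the full text so far, so print only the delta.
        print(response[position:], end="", flush=True)
        position = len(response)
    print()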
def run_baichuan2_quantize(model_id):
    # Load the model on CPU first; note there is no device_map="auto" here.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        use_fast=False,
        trust_remote_code=True
    )
    model.generation_config = GenerationConfig.from_pretrained(model_id)
    '''
    Official online 4-bit quantization.
    Note: from_pretrained is normally called with device_map="auto";
    for online quantization that argument must be removed, otherwise
    an error is raised.
    '''
    model = model.quantize(4).cuda()
    messages = [{"role": "user", "content": "Why was the cavalry of Alexander the Great so powerful?"}]
    response = model.chat(tokenizer, messages)
    print(response)
    return response
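
# Online 8-bit quantization works the same way: per the official docs,
# replace the quantize(4) call above with
#     model = model.quantize(8).cuda()
# which trades a larger memory footprint for a smaller accuracy loss.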
if __name__ == "__main__":
    model_id = "Baichuan2-13B-Chat"
    # Run without quantization
    run_baichuan2(model_id)
    # Run with online quantize(4)
    run_baichuan2_quantize(model_id)
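
Besides online quantization, Baichuan also publishes pre-quantized checkpoints. The sketch below loads one; the model id "baichuan-inc/Baichuan2-13B-Chat-4bits" and its loading arguments follow the official README, but treat them as assumptions and verify against the model card (the pre-quantized weights also need the bitsandbytes package installed).

def run_baichuan2_prequantized():
    # Assumption: the officially released 4-bit checkpoint id.
    model_id = "baichuan-inc/Baichuan2-13B-Chat-4bits"
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        use_fast=False,
        trust_remote_code=True
    )
    # Pre-quantized weights load directly with device_map="auto";
    # no quantize() call is needed afterwards.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )
    model.generation_config = GenerationConfig.from_pretrained(model_id)
    messages = [{"role": "user", "content": "Why was the cavalry of Alexander the Great so powerful?"}]
    response = model.chat(tokenizer, messages)
    print(response)
    return response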