LLM - A First Look at the Qwen-VL Vision Model

BIT_666

Article link: https://blog.csdn.net/BIT_666/article/details/139117608

I. Introduction

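With the rise of language LLMs, multimodal image and video models are iterating faster as well. Today we try out Qwen-VL, Qwen's latest multimodal model. According to the official documentation, the 4-bit quantized version shows no significant drop in performance, so we go straight to the 4-bit quantized model: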

Model download: https://huggingface.co/Qwen/Qwen-VL-Chat-Int4
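
For readers who prefer to fetch the checkpoint from a script, here is a minimal sketch. It assumes the `huggingface_hub` package is installed and that the files should land under `/model`, the directory used when the model is loaded later in this post. `local_dir_for` and `download_model` are hypothetical helper names, not part of any library.

```python
from pathlib import PurePosixPath


def local_dir_for(repo_id: str, root: str = "/model") -> str:
    """Map a Hub repo id such as 'Qwen/Qwen-VL-Chat-Int4' to a folder under root."""
    return str(PurePosixPath(root) / repo_id.split("/")[-1])


def download_model(repo_id: str = "Qwen/Qwen-VL-Chat-Int4", root: str = "/model") -> str:
    """Download every file of the repo and return the local directory."""
    # Imported lazily so the path helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target


if __name__ == "__main__":
    # Only shows where the files would go; call download_model() to fetch them.
    print(local_dir_for("Qwen/Qwen-VL-Chat-Int4"))
```

Calling `download_model()` needs network access and several GB of free disk space, so it is left out of the `__main__` block on purpose.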

II. Environment Setup

Besides updating the base package dependencies, a few libraries have to be installed separately, chiefly AutoGPTQ:

pip install -r requirements.txt
pip install optimum

git clone https://github.com/JustinLin610/AutoGPTQ.git && cd AutoGPTQ
pip install -v .

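There are a few more dependency pitfalls both before and after downloading the model, so in addition to the dependencies above, make sure the following two packages are installed first: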

pip install transformers_stream_generator
pip install gekko
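
Before loading the model it can save time to confirm that everything above is importable. The sketch below is one way to do that with the standard library only. The module names in `REQUIRED_MODULES` are my assumption of the import names behind the pip packages listed in this section (for example `auto_gptq` for AutoGPTQ), and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Import names (not pip names). Assumed from the packages installed above.
REQUIRED_MODULES = [
    "transformers",
    "optimum",
    "auto_gptq",
    "transformers_stream_generator",
    "gekko",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

`find_spec` only checks that a module can be located; it does not import it, so a package that is present but broken would still pass this check.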

III. Model Testing

1. Loading the Model

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(1234)

# Toggle between the 4-bit quantized checkpoint and the original one
use_int4 = True

if use_int4:
    model_path = "/model/Qwen-VL-Chat-Int4"
else:
    model_path = "/model/Qwen-VL-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# use bf16
#model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu", trust_remote_code=True).eval()
# use cuda device
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True).eval()

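Once loaded, the 4-bit model takes roughly 20 GB of GPU memory here. If you would rather not use the quantized model, you can also download the original, non-quantized Qwen-VL-Chat model from Hugging Face.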
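
To check the memory figure on your own hardware, a small helper like the following can be called right after the model is loaded. This is a sketch that assumes a CUDA build of PyTorch. `bytes_to_gib` and `report_gpu_memory` are hypothetical names, and the numbers will vary with the GPU, the driver, and whatever else is running on the device.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


def report_gpu_memory(device: int = 0) -> None:
    """Print memory used by this process and memory free on the whole device."""
    # Imported lazily so bytes_to_gib can be used without PyTorch installed.
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available on this machine")
        return

    allocated = torch.cuda.memory_allocated(device)  # tensors owned by this process
    reserved = torch.cuda.memory_reserved(device)    # held by PyTorch's caching allocator
    free, total = torch.cuda.mem_get_info(device)    # device-wide view from the driver

    print(f"allocated: {bytes_to_gib(allocated):.2f} GiB")
    print(f"reserved:  {bytes_to_gib(reserved):.2f} GiB")
    print(f"free:      {bytes_to_gib(free):.2f} / {bytes_to_gib(total):.2f} GiB")
```

Calling `report_gpu_memory()` once before and once after `from_pretrained` gives a rough idea of how much of the footprint comes from the weights themselves.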

2. Chatting with the Model

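# Build a multimodal query from a list of image and text items
query = tokenizer.from_list_format([
    {'image': '/vision/cr7.jpg'},
    {'text': '这是什么'},  # "What is this?"
])
# First dialogue turn: history=None starts a new conversation
response, history = model.chat(tokenizer, query=query, history=None)
print(response)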

Given an image and an instruction, the model returns its understanding of the image.
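
To make the query format less of a black box, the sketch below approximates what `from_list_format` does with image and text items: it flattens the list into a single prompt string in which each image path is wrapped in tags and numbered. This is an illustration based on my reading of the Qwen-VL tokenizer code, not the tokenizer itself. The `<img>`/`</img>` tag strings and the `Picture N:` prefix are assumptions, and `build_query` is a hypothetical name.

```python
from typing import Dict, List

# Assumed tag strings; the real tokenizer defines its own start/end tags.
IMG_START, IMG_END = "<img>", "</img>"


def build_query(items: List[Dict[str, str]]) -> str:
    """Flatten a list of {'image': path} / {'text': str} items into one prompt string."""
    parts = []
    num_images = 0
    for item in items:
        if "image" in item:
            # Images are numbered in order of appearance.
            num_images += 1
            parts.append(f"Picture {num_images}: {IMG_START}{item['image']}{IMG_END}\n")
        elif "text" in item:
            parts.append(item["text"])
        else:
            raise ValueError(f"unsupported item: {item!r}")
    return "".join(parts)


if __name__ == "__main__":
    print(build_query([{"image": "/vision/cr7.jpg"}, {"text": "这是什么"}]))
```

Since the result is an ordinary string, later turns can pass plain text to `model.chat` and rely on `history` to carry the image reference forward.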

3. Object Detection

# 2nd dialogue turn: ask for the bounding box of "奖杯" (trophy)
response, history = model.chat(tokenizer, '输出"奖杯"的检测框', history=history)

print(response)
# Draw the predicted box on the most recent image in the history
image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
    image.save('1.jpg')
else:
    print("no box")

An instruction that asks for a "检测框" (bounding box) makes the model perform object detection on the image.
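
If you want the coordinates themselves rather than an annotated image, the response text can be parsed directly. The sketch below assumes the response embeds boxes as `<ref>label</ref><box>(x1,y1),(x2,y2)</box>` with coordinates normalized to a 0-1000 range, which is how I understand the Qwen-VL README to describe the format. Verify this against your own output before relying on it. `parse_boxes` is a hypothetical helper and the sample string in the usage block is made up for illustration.

```python
import re
from typing import List, Tuple

# Assumed response format: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
BOX_PATTERN = re.compile(
    r"<ref>(.*?)</ref><box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)

Box = Tuple[int, int, int, int]


def parse_boxes(response: str, width: int, height: int, scale: int = 1000) -> List[Tuple[str, Box]]:
    """Extract (label, (x1, y1, x2, y2)) pairs and rescale them to pixel coordinates."""
    results = []
    for label, x1, y1, x2, y2 in BOX_PATTERN.findall(response):
        box = (
            int(x1) * width // scale,
            int(y1) * height // scale,
            int(x2) * width // scale,
            int(y2) * height // scale,
        )
        results.append((label.strip(), box))
    return results


if __name__ == "__main__":
    # Made-up response string, only to show the expected shape of the input.
    sample = "<ref>奖杯</ref><box>(100,200),(500,800)</box>"
    print(parse_boxes(sample, width=2000, height=1000))
```

The image width and height have to come from the original file (for example via PIL's `Image.open(path).size`), since the response itself carries only normalized coordinates under this assumption.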