Local deployment of deepseek-vl-1.3b-base (Windows)

THe CHallEnge of THe BrAve

License: CC 4.0 BY-SA

Category: Notes. Tags: windows, ai, computer vision, visual inspection, artificial intelligence

Article link: https://blog.csdn.net/qq_44842374/article/details/148833416

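A complete walkthrough for installing and running DeepSeek-VL inside a virtual environment created with venv.
Make sure Python is already installed.

Directory layout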

DeepSeek-VL/
├── deepseek_env/          # your virtual environment
├── weights/               # model weights
│   ├── deepseek-vl-1.3b-base
│   └── deepseek-vl-1.3b-chat
├── demo/                  # example code
├── test.py                # your test script
└── requirements.txt       # dependency list
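
Once all the steps are done, a quick way to confirm the folder matches the layout above is a small check script. This is only a sketch: the entry names are copied from the tree, and the helper name `missing_entries` is made up for illustration.

```python
from pathlib import Path

# Entries taken from the directory layout above (True means "is a directory").
EXPECTED = {
    "deepseek_env": True,
    "weights": True,
    "demo": True,
    "test.py": False,
    "requirements.txt": False,
}


def missing_entries(project_root):
    """Return the expected entries that are absent under project_root."""
    root = Path(project_root)
    missing = []
    for name, is_dir in EXPECTED.items():
        path = root / name
        present = path.is_dir() if is_dir else path.is_file()
        if not present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run this from inside the DeepSeek-VL folder.
    print("Missing:", missing_entries(".") or "nothing")
```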

1. Clone the repository

git clone https://github.com/deepseek-ai/DeepSeek-VL.git
cd DeepSeek-VL

2. Create and activate the virtual environment in the DeepSeek-VL directory (Windows)

python -m venv deepseek_env
deepseek_env\Scripts\activate

Upgrade pip

python -m pip install --upgrade pip
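
To confirm that the commands are running inside the virtual environment rather than the system Python, the interpreter can be asked directly. A minimal sketch using only the standard library (the function name `in_virtualenv` is illustrative):

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the second line prints False, run deepseek_env\Scripts\activate again before installing anything.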

3. Install dependencies (make sure you are using Python 3.10+)

# torch, torchvision and torchaudio must come from the same release: torch 2.7.1 pairs with torchvision 0.22.1 and torchaudio 2.7.1
pip install torch==2.7.1+cu118 torchvision==0.22.1+cu118 torchaudio==2.7.1+cu118 --index-url https://download.pytorch.org/whl/cu118
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121  # CUDA 12.1

# Install the project's core dependencies
pip install -r requirements.txt
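
Because torch and torchvision are released in matched pairs, it can help to check the installed versions after this step. The sketch below reads the versions with `importlib.metadata` and applies an assumed rule of thumb (torch 2.N goes with torchvision 0.(N+15)). That rule fits the 2.x releases known at the time of writing, but it is not an official API guarantee, and the helper names are illustrative.

```python
from importlib import metadata


def minor_of(version):
    """Return (major, minor) from a version string such as '2.7.1+cu118'."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def versions_match(torch_version, torchvision_version):
    """Assumed pairing rule: torch 2.N goes with torchvision 0.(N + 15)."""
    t_major, t_minor = minor_of(torch_version)
    v_major, v_minor = minor_of(torchvision_version)
    return t_major == 2 and v_major == 0 and v_minor == t_minor + 15


def installed(package):
    """Installed version of a package, or None when it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v, tv_v = installed("torch"), installed("torchvision")
    print("torch:", torch_v, "| torchvision:", tv_v,
          "| torchaudio:", installed("torchaudio"))
    if torch_v and tv_v:
        print("torch/torchvision pairing looks consistent:",
              versions_match(torch_v, tv_v))
```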

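4. Install extra optimization components (optional but recommended)

pip install flash-attn --no-build-isolation  # faster attention; it builds from source and may fail on Windows, in which case skip it
pip install bitsandbytes  # 8-bit quantization support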
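
Since both components are optional, a script can check whether they are present before relying on them. A small sketch, assuming the usual import names `flash_attn` and `bitsandbytes`; the function name is illustrative:

```python
from importlib.util import find_spec

# pip package name -> module name used at import time.
OPTIONAL = {"flash-attn": "flash_attn", "bitsandbytes": "bitsandbytes"}


def optional_components():
    """Map each optional pip package to whether its module can be found."""
    return {pip_name: find_spec(module) is not None
            for pip_name, module in OPTIONAL.items()}


if __name__ == "__main__":
    for name, available in optional_components().items():
        print(f"{name}: {'installed' if available else 'not installed'}")
```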

5. Download the model weights

Create the directory that will hold the model

mkdir weights
cd weights

Download the base model (deepseek-vl-1.3b-base)

git lfs install
git clone https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-base

Alternatively, download the base model files manually and place them in the directory DeepSeek-VL\weights\deepseek-vl-1.3b-base\
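
If git lfs is inconvenient, another option is the `snapshot_download` function from the `huggingface_hub` package, which is normally pulled in alongside transformers. The sketch below assumes it is run from the DeepSeek-VL folder; `local_dir_for` and `download_weights` are illustrative helper names, not part of the project.

```python
from pathlib import Path


def local_dir_for(repo_id, weights_root="weights"):
    """Folder for a repo, e.g. weights/deepseek-vl-1.3b-base."""
    return Path(weights_root) / repo_id.split("/")[-1]


def download_weights(repo_id="deepseek-ai/deepseek-vl-1.3b-base",
                     weights_root="weights"):
    """Download every file of the repo into the weights folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, weights_root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_weights()` from a Python prompt should leave the files in weights\deepseek-vl-1.3b-base, the same place the git clone would put them.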

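6. Test model inference with inference.py

In this project (DeepSeek-VL), inference.py is an example inference script. Its main purpose is as follows:

What it does

1. Load the model and processor

The script loads the DeepSeek-VL multimodal large model (which accepts image and text input) together with its tokenizer and image processor.

2. Prepare the input data

The script contains a sample conversation (a user message with an image and a question) and loads a local image as input.

3. Preprocess the data

The project's own VLChatProcessor preprocesses the image and text into the input format the model expects.

4. Run inference

The model is called to generate a reply to the image and text.

5. Output the result

The generated reply is decoded and printed.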
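
The loading step can be pointed at the local weights folder instead of a model name on the Hugging Face hub. The sketch below follows the quick-start pattern from the DeepSeek-VL repository (`VLChatProcessor.from_pretrained` plus `AutoModelForCausalLM.from_pretrained` with `trust_remote_code=True`). Treat the details as assumptions to check against your checkout of inference.py; `resolve_model_path` and `load_model` are illustrative names.

```python
from pathlib import Path


def resolve_model_path(repo_root, name="deepseek-vl-1.3b-base"):
    """Return the local weights folder, failing early if it does not exist."""
    path = Path(repo_root) / "weights" / name
    if not path.is_dir():
        raise FileNotFoundError(f"Model folder not found: {path}")
    return path


def load_model(model_path):
    """Load processor, tokenizer and model from a local folder."""
    # Heavy imports are kept local so the path helper can be used on its own.
    import torch
    from transformers import AutoModelForCausalLM
    from deepseek_vl.models import VLChatProcessor

    vl_chat_processor = VLChatProcessor.from_pretrained(str(model_path))
    tokenizer = vl_chat_processor.tokenizer
    vl_gpt = AutoModelForCausalLM.from_pretrained(
        str(model_path), trust_remote_code=True
    )
    # bfloat16 on the GPU, as in the quick-start; this needs a CUDA device.
    vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
    return vl_chat_processor, tokenizer, vl_gpt
```

A typical call would be `load_model(resolve_model_path("."))`, run from the DeepSeek-VL folder with the virtual environment active.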

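Test example:
You can use the inference.py script directly to test recognition of text, numbers, and similar information in an image.
Suppose you want to recognize the text or numbers in an image (for example images/sample.jpg) and have the model describe them.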

1). Modify the input section of `inference.py`

Find the following section (around line 30):

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["./images/training_pipelines.jpg"],
    },
    {"role": "Assistant", "content": ""},
]

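Change it to your own image and question, for example:

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Identify all text and numbers in this image and list them in detail.",
        "images": ["./images/sample.jpg"],  # replace with the path of the image you want to test
    },
    {"role": "Assistant", "content": ""},
]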

2). Run the script

Make sure the image path is correct and the image is in the images/ folder (or use an absolute path), then run from the command line:

python inference.py
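
A wrong image path is an easy mistake at this point, so it can be worth checking the paths before the model is loaded. A minimal sketch, assuming the conversation structure shown above; `missing_images` is an illustrative helper, not part of the project:

```python
from pathlib import Path


def missing_images(conversation):
    """List every image path in the conversation that is not an existing file."""
    missing = []
    for message in conversation:
        # Messages without an "images" key (e.g. the Assistant turn) are skipped.
        for image_path in message.get("images", []):
            if not Path(image_path).is_file():
                missing.append(image_path)
    return missing
```

Adding `assert not missing_images(conversation)` just after the conversation is defined makes the script stop immediately when a file is missing.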

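3). Complete example snippet

Suppose you want to recognize the text and numbers in images/sample.jpg. The code looks like this:

# ... model-loading code omitted ...

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Identify all text and numbers in this image and list them in detail.",
        "images": ["./images/sample.jpg"],
    },
    {"role": "Assistant", "content": ""},
]

# The rest of the code needs no changes; just run it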
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device)

inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)

answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)

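4). Prompt suggestions

You can change the content field as needed, for example:

<image_placeholder>Identify all text and numbers in the image in detail.
<image_placeholder>What numbers are in the image? List all of them.
<image_placeholder>What is the text content of the image?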
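
To try several prompts against the same image without editing the script each time, the conversation can be built by a small helper. This is a sketch: `build_conversation` and the `QUESTIONS` list are illustrative and mirror the structure used in inference.py.

```python
def build_conversation(question, image_path):
    """Wrap one question and one image in the format used by inference.py."""
    return [
        {
            "role": "User",
            "content": f"<image_placeholder>{question}",
            "images": [image_path],
        },
        {"role": "Assistant", "content": ""},
    ]


QUESTIONS = [
    "Identify all text and numbers in the image in detail.",
    "What numbers are in the image? List all of them.",
    "What is the text content of the image?",
]

# One conversation per question; each can be fed through the same inference code.
conversations = [build_conversation(q, "./images/sample.jpg") for q in QUESTIONS]
```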

5). Recognizing multiple images

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Text content of the first image: <image_placeholder>Numeric content of the second image:",
        "images": ["./images/sample1.jpg", "./images/sample2.jpg"],
    },
    {"role": "Assistant", "content": ""},
]
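
In the examples here, each image has exactly one `<image_placeholder>` tag in the content string. Assuming that one-to-one pairing is required, a quick consistency check can catch a forgotten tag or image before inference. The helper name `placeholders_match` is illustrative.

```python
PLACEHOLDER = "<image_placeholder>"


def placeholders_match(conversation):
    """True when every message has as many placeholders as images."""
    for message in conversation:
        n_tags = message.get("content", "").count(PLACEHOLDER)
        n_images = len(message.get("images", []))
        if n_tags != n_images:
            return False
    return True
```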