How Well Does Qwen3 Handle 4-bit and 2-bit Quantization?

runner000001

Modified on 2025-07-01 11:17:25

Licensed under CC 4.0 BY-SA

Article tags: artificial intelligence, language models, natural language processing

First published on 2025-07-01 10:44:33

Article link: https://blog.csdn.net/xuner1213/article/details/149041015

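The Qwen3 models are finally here, and they live up to expectations!

Despite their compact size, these models score well across benchmarks. The 14B and 32B versions look especially promising for consumer-grade hardware. The most striking release, though, is Qwen3-30B-A3B: a 30-billion-parameter model that activates only 3 billion parameters at inference time. Its mixture-of-experts (MoE) architecture makes it very lightweight to run: a quantized version fits comfortably on a 24 GB GPU, and it runs even more efficiently in a GPU-friendly format such as GPTQ + Marlin.

This article looks at how well the Qwen3 models quantize. In short, the results are a pleasant surprise: the family is very quantization-friendly, and even the 2-bit versions of the larger models remain strong. I walk through the quantization process, share evaluation results, and show how to run the models efficiently with vLLM, with thinking mode both enabled and disabled.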

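Qwen3: Compact Hybrid Reasoning Models

So far, the only sources of information on Qwen3 are the official model cards and a blog post. Even with this limited documentation, Qwen3's results have already drawn comparisons with leading models such as DeepSeek R1. The release includes six instruction-tuned dense models, along with their base and FP8 versions, plus two sparse mixture-of-experts (MoE) models whose architecture resembles DeepSeek's. All models are released under the permissive Apache license.

Benchmark results suggest that even the 4-billion-parameter dense Qwen3 model can rival much larger models, competing in a performance range dominated by systems such as GPT-4 and DeepSeek V3 (a non-reasoning model). The team changed little in the neural architecture, which is a bit of a pity. Instead, they focused on improving the post-training recipe. In many respects they followed DeepSeek's approach: supervised fine-tuning first, to instill reasoning patterns, followed by extensive reinforcement learning stages. The smaller models were distilled using a less well-documented method that probably involves a large amount of synthetic instruction tuning and may prioritize benchmark performance over broader robustness.

A thinking-mode toggle, as seen in models such as Grok 3 and Claude 3.7, is also available, and it comes with clear scaling curves showing how performance changes with the thinking token budget.

Qwen3 was trained on more than 35 trillion tokens, on par with top models such as Llama 4 and DeepSeek V3. The training data is highly multilingual, and the team claims support for 119 languages. Since meaningful benchmarks are missing for most of these languages, it is hard to tell how well Qwen3 actually supports them.

Note that these models are not multimodal, which leaves plenty of room for improvement. Qwen3-VL could well outperform the models released this week on language tasks!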

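Quantizing Qwen3 (GPU-Optimized)

We will use AutoRound, one of the most efficient quantization methods currently available. It can export models in both GPTQ and AWQ formats, which are widely supported by mainstream inference frameworks and run very fast on GPUs.

With this configuration, I created 4-bit and 2-bit versions of Qwen3. The 2-bit settings are shown here; the 4-bit runs use bits=4 and group_size=128 instead: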

from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original model and its tokenizer.
model_name = "Qwen/Qwen3-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2-bit symmetric quantization with a group size of 32.
autoround = AutoRound(model, tokenizer, nsamples=512, iters=512, low_gpu_mem_usage=False, enable_torch_compile=True, bits=2, seqlen=4096, group_size=32, sym=True)
output_dir = "./Qwen3-32B-autoround-2bit-gptq"
# Quantize, then export the result in GPTQ format.
autoround.quantize_and_save(output_dir, format='auto_gptq')

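Quantizing the 32B model on an H100 GPU took a little over three hours, but you do not need such high-end hardware: a GPU with 32 GB of VRAM is enough, and with some hyperparameter adjustments even a 24 GB card can handle it.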
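
To make the bits, group_size, and sym settings more concrete, here is a minimal sketch of group-wise symmetric quantization using plain round-to-nearest. This is an assumption-laden baseline for illustration only: AutoRound tunes the rounding rather than using round-to-nearest, and the function name, grid convention, and random test weights are all hypothetical.

```python
import random


def quantize_rtn_sym(weights, bits, group_size):
    """Round-to-nearest symmetric quantization, one scale per group.

    Returns the dequantized weights so the error can be measured directly.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4-bit, 1 for 2-bit
    qmin = -(2 ** (bits - 1))    # -8 for 4-bit, -2 for 2-bit
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # The scale maps the largest magnitude in the group onto the integer grid.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            q = min(max(round(w / scale), qmin), qmax)
            dequantized.append(q * scale)
    return dequantized


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in for real weights
err_4bit = mse(w, quantize_rtn_sym(w, bits=4, group_size=128))
err_2bit = mse(w, quantize_rtn_sym(w, bits=2, group_size=32))
print(f"4-bit MSE: {err_4bit:.4f}, 2-bit MSE: {err_2bit:.4f}")
```

Even with the smaller group size, the 2-bit grid has so few levels that the naive rounding error is much larger than at 4-bit, which is why a smarter rounding method matters most at very low precision.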

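To my surprise, I did not need to update my environment: Qwen3 models were already supported natively. There is one limitation for now, though. Qwen3 MoE models quantized in GPTQ format cannot yet be run with vLLM or Transformers, and the same is true for bitsandbytes and AWQ quantization. For now, we will focus on the dense models.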

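Is Qwen3 Easy to Quantize Accurately?

Yes, and the results are impressive.

The 2-bit Qwen3-32B scores only 10 points lower than the 4-bit version on IFEval, and the 4-bit version performs very close to the original model.

Better still, the 2-bit Qwen3-32B runs smoothly on 16 GB GPUs such as the RTX 5080 or RTX 4080.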
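
As a rough sanity check on the 16 GB figure, the sketch below estimates the size of the quantized weights alone. It rests on several assumptions that are not from the measurements in this article: a nominal 32e9 parameters, one 16-bit scale and one zero point of `bits` width stored per group, and no allowance for unquantized layers, the KV cache, or activations.

```python
def quantized_weight_gib(n_params, bits, group_size, scale_bits=16):
    """Approximate size of quantized weights in GiB, including per-group metadata."""
    # Assumed layout: each group stores one scale and one zero point.
    bits_per_weight = bits + (scale_bits + bits) / group_size
    return n_params * bits_per_weight / 8 / 1024 ** 3


for label, bits, group_size in [("4-bit, group size 128", 4, 128),
                                ("2-bit, group size 32", 2, 32)]:
    size = quantized_weight_gib(32e9, bits, group_size)
    print(f"32B at {label}: ~{size:.1f} GiB")
```

Under these assumptions the 2-bit weights come to roughly 9.5 GiB, which leaves some headroom on a 16 GB card, while the 4-bit weights alone are already around 15.5 GiB.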

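Surprisingly, the 2-bit version of the 14B model did not collapse. It is clearly worse than the 4-bit version, but that is expected: most 2-bit models of this size produce nothing but gibberish. Here, the model still generates coherent output, which is quite encouraging.

For tasks that require good instruction following, the 4-bit Qwen3-8B is the most memory-efficient option while keeping almost the same accuracy as the larger models.

For the smaller 1.7B and 0.6B models, 4-bit quantization causes a significant drop in performance, although the models remain usable. I recommend not quantizing them at all. Their GGUF versions are probably even worse (unverified, but very likely). The 2-bit versions are completely unusable and generate meaningless text.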

Turning Thinking On and Off

Qwen3 comes with an inference-time parameter that enables or disables its "thinking" mode. With Transformers, you can use it as follows:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # True is the default value for enable_thinking
)

Or with vLLM's offline chat inference:

outputs = llm.chat(prompts, sampling_params, chat_template_kwargs={"enable_thinking": True})

In practice, this is just a switch that modifies the model's chat template:

 "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set content = message.content %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in message.content %}\n                
{%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n                {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n           
 {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",

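In this template, reasoning in an assistant message can be supplied in two ways: explicitly through a reasoning_content field, or inline as a <think>...</think> block in the message content. When reasoning is present, the template separates it from the final answer and wraps it in <think> tags to mark the model's internal thinking. This only applies to assistant messages that come after the last user query and that either contain reasoning or are the last message in the conversation. If enable_thinking is set to false, the template appends an empty <think> block to the generation prompt, which keeps the format consistent while suppressing the actual reasoning output.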
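
The effect of the switch on the prompt can be reproduced in a few lines. The sketch below mirrors only the last part of the template for a single user message, without system prompt or tools; the function name is hypothetical, and in real use the tokenizer's apply_chat_template does this work.

```python
def generation_prompt(user_message: str, enable_thinking: bool = True) -> str:
    """Build the prompt the template would produce for one user turn."""
    prompt = f"<|im_start|>user\n{user_message}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    if enable_thinking is False:
        # An empty think block tells the model the reasoning phase is already over.
        prompt += "<think>\n\n</think>\n\n"
    return prompt


print(repr(generation_prompt("Hello", enable_thinking=True)))
print(repr(generation_prompt("Hello", enable_thinking=False)))
```

With thinking enabled, the template adds nothing after the assistant header, so the model is free to open a <think> block by itself.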

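In my view, these are soft constraints, since Qwen3 can probably still enter "thinking mode" by emitting a <think> tag on its own. This raises the question of how controllable, or how "hackable", this behavior really is.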
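
Since the switch is only a soft constraint, it can be useful to check the generated text for a reasoning block regardless of the setting. The helper below is a minimal sketch that applies the same string splitting the chat template uses for assistant messages; the function name is hypothetical.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Returns an empty reasoning string when no </think> tag is present.
    """
    if "</think>" not in text:
        return "", text
    # Same logic as the template: everything before </think> is reasoning,
    # with an optional leading <think> tag removed.
    reasoning = text.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    answer = text.split("</think>")[-1].lstrip("\n")
    return reasoning, answer


reasoning, answer = split_thinking("<think>\nLet S = a + b.\n</think>\n\nThe range is [6, inf).")
print(f"reasoning={reasoning!r}, answer={answer!r}")
```

A non-empty reasoning string for a request sent with enable_thinking set to False would indicate that the model opened a thinking block on its own.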

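It suggests that Qwen3 is fundamentally a reasoning-oriented model, designed to think. Early signs indicate that this design choice may slightly hurt its performance on simple tasks that need no reasoning, such as SimpleQA, but this remains to be confirmed.

Here is a complete inference example with our 2-bit Qwen3-32B model, with and without thinking enabled: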

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [[{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.6, top_k=20, top_p=0.95, max_tokens=4096)

# Create an LLM.
llm = LLM(model="kaitchup/Qwen3-32B-autoround-2bit-gptq")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.chat(prompts, sampling_params, chat_template_kwargs={"enable_thinking": False})
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Thinking disabled ({"enable_thinking": False})

We are given the equation:\n\n$$\nab = a + b + 3\n$$\n\nLet us define $ a + b = S $ and $ ab = P $. Then the equation becomes:\n\n$$\nP = S + 3\n$$\n\nWe want to find the range of possible values for $ S = a + b $.\n\n---\n\n### Step 1: Express $ P $ in terms of $ S $\n\nFrom the equation $ ab = a + b + 3 $, we have:\n\n$$\nP = S + 3\n$$\n\nLet’s also recall the identity:\n\n$$\na + b = S \\quad \\text{and} \\quad ab = P\n$$\n\nWe can use the identity:\n\n$$\na^2 + b^2 = S^2 - 2P\n$$\n\nBut we don't need that now. Instead, let's try to find the range of $ S $.\n\n---\n\n### Step 2: Express $ a $ and $ b $ in terms of $ S $\n\nLet’s define $ a = x $ and $ b = y $. Then:\n\n$$\nx + y = S \\quad \\text{and} \\quad xy = P = S + 3\n$$\n\nLet’s solve for $ x $ and $ y $ in terms of $ S $.\n\nWe can write the quadratic equation in terms of $ x $:\n\n$$\nx^2 + x(y) = S \\quad \\text{and} \\quad x(y) = P = S + 3\n$$\n\nLet’s substitute $ y = S - x $ into the equation:\n\n$$\nx(S - x) = S + 3\n$$\n\n$$\nx(S - x) = S + 3\n$$\n\nNow solve this quadratic equation:\n\n$$\nx(S - x) = S + 3 \\Rightarrow x(S - x) = S + 3\n$$\n\nMultiply both sides by 1:\n\n$$\nx(S - x) = S + 3\n$$\n\nNow solve for $ x $:\n\n$$\nx(S - x) = S + 3 \\Rightarrow x(S - x) = S + 3\n$$\n\nLet’s expand:\n\n$$\nx(S - x) = S + 3 \\Rightarrow xS - x^2 = S + 3\n$$\n\nRearranged:\n\n$$\nx^2 - xS + S + 3 = 0\n$$\n\nNow solve this quadratic equation for $ x $:\n\n$$\nx^2 - xS + S + 3 = 0\n$$\n\nWe can solve this using the quadratic formula:\n\n$$\nx = \\frac{S \\pm \\sqrt{S^2 - 4(S + 3)}}{2}\n$$\n\nNow compute the discriminant:\n\n$$\n\\Delta = S^2 - 4(S + 3)\n$$\n\nLet’s simplify:\n\n$$\n\\Delta = S^2 - 4S - 12\n$$\n\nWe want this to be non-negative for real solutions:\n\n$$\nS^2 - 4S - 1 2 \\ge 0\n$$\n\nLet’s solve this inequality:\n\n$$\nS^2 - 4S - 12 \\ge 0\n$$\n\nFactor:\n\n$$\n(S - 6)(S + 2) \\ge 0\n$$\n\nSo the solution is:\n\n$$\nS \\ge 6 \\quad \\text{or} \\quad S \\le -2\n$$\n\nBut since $ a, b > 0 $, we 
only consider $ S > 0 $. So the valid range is:\n\n$$\nS \\ge 6\n$$\n\n---\n\n### Final Answer:\n\n$$\n\\boxed{[6, \\infty)}\n$$\n\nThis means the range of possible values for $ a + b $ is all real numbers greater than or equal to 6."

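Thinking is disabled, yet the model is clearly still trying to think. It may not have been well optimized for concise responses.

Thinking enabled ({"enable_thinking": True})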

Okay, so I need to figure out the range of possible values for a + b given that ab = a + b + 3, where a and b are positive real numbers. Let me start by trying to understand the problem and how to approach it.\n\nFirst, the equation is ab = a + b + 3. I need to find the range of a + b. Let me denote S = a + b and P = ab. Then the equation becomes P = S + 3. So if I can express S in terms of P, or vice versa, maybe I can find some relationship between them.\n\nBut since S = a + b and P = ab, I know that for two variables a and b, there are relationships between S and P. For example, if I consider a and b as roots of a quadratic equation, then S would be the sum of the roots and P the product. However, in this case, maybe I don't need to go that route. Let me try to manipulate the equation.\n\nGiven that ab = a + b + 3, let me substitute S = a + b into the equation. Then we have ab = S + 3. So P = S + 3. But also, since S = a + b, we can express P in terms of S. But how?\n\nAlternatively, maybe I can express P in terms of S. Let me think. If I have S = a + b, then P = ab. Let me recall that for two numbers, (a + b)^2 = a^2 + 2ab + b^2. But I don't know if that helps here.\n\nAlternatively, if I consider S and P, then for two numbers, we have S^2 ≥ 4P (since (a + b)^2 ≥ 4ab). But maybe that's not directly useful here unless I can relate S and P.\n\nWait, let me try to express P in terms of S. Let me suppose that S = a + b and P = ab. Then from the equation P = S + 3, so P = S + 3. So if I can find a relationship between S and P, maybe I can find a way to express S in terms of P or vice versa.\n\nBut since S = a + b and P = ab, perhaps I can use the standard identity for S and P. Let me think. If I have S and P, then for two numbers, we have the quadratic equation x^2 - Sx + P = 0. The roots are a and b. But maybe I need to use this.\n\nAlternatively, let me try to express P in terms of S. Let me think. If P = S + 3, then P = S + 3. 
So if I can find S in terms of P, or vice versa, maybe I can find a relationship.\n\nBut maybe I can use the inequality S^2 ≥ 4P. Since S^2 ≥ 4P, then S^2 ≥ 4(S + 3). Let me try that.\n\nSo S^2 ≥ 4(S + 3). Let me write that down:\n\nS^2 ≥ 4S + 12.\n\nThen S^2 - 4S - 12 ≥ 0. Let me solve this quadratic inequality.\n\nFirst, solve S^2 - 4S - 12 = 0. The quadratic equation is S^2 - 4S - 12 = 0. Using the quadratic formula:\n\nS = [4 ± sqrt(16 + 48)] / 2 = [4 ± sqrt(64)] / 2 = [4 ± 8]/2.\n\nSo S = (4 + 8)/2 = 6 or S = (4 - 8)/2 = -2. But since S = a + b and a, b are positive real numbers, S must be positive. Therefore, S ≥ 6.\n\nSo the inequality S^2 ≥ 4S + 12 implies that S ≥ 6. Therefore, the minimum value of S is 6. But is this correct? Let me check.\n\nIf S = 6, then S^2 = 36 and 4S + 12 = 24 + 12 = 36. So equality holds. So the minimum value of S is 6. Therefore, the range of S is [6, ∞). But wait, is that correct?\n\nBut let me check if S can be greater than 6. Let me think. If S increases, then P = S + 3 increases as well. But if S increases, then P increases. However, the relationship between S and P is such that P = S + 3. So if S increases, P increases as well. However, the inequality S^2 ≥ 4P implies that S^2 ≥ 4(S + 3). So if S increases beyond 6, then S^2 will be larger than 4(S + 3). So the inequality holds. Therefore, the minimum value of S is 6. But can S be any value greater than or equal to 6?\n\nBut let me check if there are any constraints on S. Let me think. For example, if S is too large, then P = S + 3 might not satisfy the condition that a and b are positive real numbers. Wait, but since S = a + b and P = ab, if S is large enough, then there might be solutions for a and b.\n\nBut maybe there's another constraint. Let me think. For example, if S is too large, then maybe there are no real solutions for a and b. But I need to check.\n\nAlternatively, maybe there's another way to find the range of S. Let me consider the equation ab = a + b + 3. 
Let me try to solve for one variable in terms of the other.\n\nLet me suppose that a is a positive real number, then solve for b. So ab = a + b + 3. Let me rearrange this equation:\n\nab - a - b = 3.\n\nLet me factor this. Let me add 1 to both sides:\n\nab - a - b + 1 = 4.\n\nThen (a - 1)(b - 1) = 4. Because (a - 1)(b - 1) = ab - a - b + 1 = 4. So this is a nice identity. So (a - 1)(b - 1) = 4. Therefore, if I let x = a - 1 and y = b - 1, then x*y = 4. So x and y are positive real numbers such that x*y = 4.\n\nTherefore, since x and y are positive real numbers, their product is 4. So the product of x and y is fixed. But then, since x = a - 1 and y = b - 1, then S = a + b = x + y + 2. Therefore, S = x + y + 2. Since x*y = 4, then S = x + y + 2. But since x and y are positive real numbers, we can find the minimum and maximum of S.\n\nSo let me consider x and y such that x*y = 4. Then S = x + y + 2. Let me find the minimum and maximum of S.\n\nWe know from the AM-GM inequality that x + y ≥ 2*sqrt(xy) = 2*sqrt(4) = 4. Therefore, S = x + y + 2 ≥ 4 + 2 = 6. Therefore, the minimum value of S is 6. But what about the maximum?\n\nSince x and y can be any positive real numbers such that x*y = 4, then x and y can be made arbitrarily large by taking x approaching infinity and y approaching zero. Therefore, S = x + y + 2 can be made arbitrarily large. Therefore, the range of S is [6, ∞). So that's the answer.\n\nBut let me check if this is correct. Let me take an example. Suppose x = 2, y = 2. Then x*y = 4. Then S = 2 + 2 + 2 = 6. So that's the minimum. If x = 4, y = 1, then S = 4 + 1 + 2 = 7. If x = 1, y = 4, same. If x is approaching infinity, then y approaches 0, so S approaches infinity. Therefore, the range of S is [6, ∞). Therefore, the answer is [6, ∞).\n\nBut let me check if there are any other constraints. For example, if we have x*y = 4, then x and y must be positive. Therefore, since x and y are positive, then S = x + y + 2 is at least 6. 
Therefore, the range is [6, ∞). So that's the answer.\n\nBut let me check another way. Let's go back to the original equation ab = a + b + 3. Let me try to find the minimum value of S = a + b. Let me use the AM-GM inequality. Since ab = a + b + 3, and S = a + b. Then, since ab = S + 3, and S = a + b, then S + 3 = ab. So if I use the AM-GM inequality on a and b, which says that a + b ≥ 2*sqrt(ab). So S ≥ 2*sqrt(ab) = 2*sqrt(S + 3). Therefore:\n\nS ≥ 2*sqrt(S + 3).\n\nLet me solve this inequality. Let me write it as:\n\nS ≥ 2*sqrt(S + 3).\n\nLet me square both sides:\n\nS^2 ≥ 4*(S + 3).\n\nWhich is the same inequality as before: S^2 - 4S - 12 ≥ 0. Which as before gives S ≥ 6 or S ≤ -2. But since S is positive, S ≥ 6. Therefore, this confirms that the minimum value of S is 6. So the range is [6, ∞). Therefore, the answer is [6, ∞).\n\nBut let me check if there's a maximum. Since S can be as large as possible, as we saw when x approaches infinity, then S can go to infinity. Therefore, there's no upper bound. Therefore, the range is [6, ∞).\n\nBut let me check if there's any other constraints. For example, if we take a and b to be positive real numbers, then x = a - 1 and y = b - 1 must be positive real numbers such that x*y = 4. Therefore, x and y can be any positive real numbers with product 4. Therefore, S = x + y + 2 can be any value ≥ 6. So yes, the range is [6, ∞). Therefore, the answer is [6, ∞).\n\nTherefore, the range of possible values for a + b is [6, ∞). So the answer is [6, ∞).\n\nBut let me check another example. Suppose a = 2, b = 2. Then ab = 4, and a + b = 4. But according to the equation, ab = a + b + 3. So 4 = 4 + 3? No, that's not true. Wait, that's a problem. Wait, let me check. If a = 2, b = 2, then ab = 4. Then a + b + 3 = 2 + 2 + 3 = 7. So 4 ≠ 7. Therefore, that example doesn't satisfy the equation. So maybe my earlier reasoning was wrong?\n\nWait, but in my earlier analysis, I had S = a + b, and I had S = x + y + 2 where x = a - 1 and y = b - 1. 
So if x*y = 4, then S = x + y + 2. So if x = 1 and y = 4, then S = 1 + 4 + 2 = 7. Then ab = (1 + 1)*(4 + 1) = 2*5 = 10. And a + b + 3 = 2 + 5 + 3 = 10. So that works. So if x = 1, y = 4, then a = x + 1 = 2, b = y + 1 = 5. Then ab = 10, a + b + 3 = 10. So that works. If x = 2, y = 2, then a = 3, b = 3. Then ab = 9, and a + b + 3 = 6 + 3 = 9. So that works too. So in this case, S = 6. So that works. So the minimum S is 6.\n\nBut in my first example, when I took a = 2, b = 2, then S = 4, but ab = 4, which does not equal a + b + 3 = 7. Therefore, that example is invalid. So that shows that when x = 1, y = 4, we have a valid example with S = 7, and when x = 2, y = 2, S = 6. So the minimum S is 6.\n\nTherefore, my earlier reasoning is correct. So the range is [6, ∞). Therefore, the answer is [6, ∞).\n\nBut let me check another example. Suppose a = 3, b = 3. Then ab = 9, a + b + 3 = 6 + 3 = 9. So that works. If a = 4, b = 1, then ab = 4, a + b + 3 = 5 + 3 = 8. But 4 ≠ 8. So that's not valid. Wait, but according to the equation x*y = 4, if x = 3, y = 4/3, then x*y = 4. Then a = x + 1 = 4, b = y + 1 = 4/3 + 1 = 7/3. Then ab = 4*(7/3) = 28/3 � forth. Then a + b + 3 = 4 + 7/3 + 3 = 4 + 7/3 + 3 = 7 + 7/3 = 28/3. So that works. So in this case, S = a + b = 4 + 7/3 = 19/3 ≈ 6.333... which is greater than 6. So that works. Therefore, the range is indeed [6, �infty).\n\nTherefore, the answer is [6, �infty). So the range of possible values for a + b is [6, ∞).\n\nBut let me check if there's another way to see this. Let me consider the equation ab = a + b + 3. Let me solve for b in terms of a.\n\nab = a + b + 3. Let me rearrange:\n\nab - b = a + 3.\n\nb(a - 1) = a + 3.\n\nTherefore, b = (a + 3)/(a - 1), provided that a ≠ 1. Since a is positive, and since a - 1 must be positive (since b must be positive as well), then a must be greater than 1. Therefore, a > 1. Then b is positive.\n\nSo if a > 1, then b = (a + 3)/(a - 1). Then S = a + b = a + (a + 3)/(a - 1). 
Let me compute S.\n\nLet me write S = a + (a + 3)/(a - 1). Let me combine the terms:\n\nS = [a(a - 1) + a + 3]/(a - 1) = [a^2 - a + a + 3]/(a - 1) = [a^2 + 3]/(a - 1).\n\nSo S = (a^2 + 3)/(a - 1). Let me analyze this function. Let me find the minimum value of S. Since a > 1, let me consider S as a function of a.\n\nLet me denote f(a) = (a^2 + 3)/(a - 1). Let me find the minimum of f(a). Let me take derivative.\n\nf'(a) = [2a(a - 1) - (a^2 + 3)]/(a - 1)^2.\n\nLet me compute numerator:\n\n2a(a - 1) - (a^2 + 3) = 2a^2 - 2a - a^2 - 3 = a^2 - 2a - 3.\n\nTherefore, f'(a) = (a^2 - 2a - 3)/(a - 1)^2.\n\nSet f'(a) = 0. So a^2 - 2a - 3 = 0. Solve for a:\n\na^2 - 2a - 3 = 0. The solutions are a = [2 ± sqrt(4 + 12)]/2 = [2 ± sqrt(16)]/2 = [2 ± 4]/2. Therefore, a = 3 or a = -1. Since a > 1, only a = 3 is valid.\n\nTherefore, the minimum occurs at a = 3. Let me compute S at a = 3. S = (3^2 + 3)/(3 - 1) = (9 + 3)/2 = 12/2 = 6. So S = 6 at a = 3. Therefore, the minimum value of S is 6. Therefore, the range is [6, ∞).\n\nBut let me check another point. Let me take a = 2. Then b = (2 + 3)/(2 - 1) = 5/1 = 5. Then S = 2 + 5 = 7. Which is greater than 6. If a = 4, then b = (4 + 3)/(4 - 1) = 7/3. Then S = 4 + 7/3 = 19/3 ≈ 6.333. Which is greater than 6. If a approaches infinity, then b = (a + 3)/(a - 1) approaches 1 as a approaches infinity. Therefore, S = a + b approaches infinity as a approaches infinity. Therefore, the range is [6, ∞).\n\nTherefore, the answer is [6, ∞). So the range of possible values for a + b is [6, �infty).\n\nBut let me check if there are any other constraints. For example, if a approaches 1 from the right, then b approaches (1 + 3)/(1 - 1) which is undefined. But since a must be greater than 1, as we saw earlier, then b is positive. Therefore, there's no problem here.\n\nTherefore, the answer is [6, ∞). So the range is from 6 to infinity.\n\nTherefore, the answer is [6, ∞). 
So the possible values for a + b are all real numbers greater or equal to 6.\n\nSo the answer is [6, ∞). Therefore, the boxed answer is \\boxed{[6, \\infty)}.\n\n**Final Answer**\nThe range of possible values for $ a + b $ is \\boxed{[6, \\infty)}.\n</think>\n\nTo solve the problem, we are given the equation:\n\n$$\nab = a + b + 3\n$$\n\nwhere $ a $ and $ b $ are positive real numbers. Our goal is to determine the range of possible values for $ a + b $.\n\n---\n\n### Step 1: Substituting the Sum and Product\n\nLet $ S = a + b $ and $ P = ab $. Then the equation becomes:\n\n$$\nP = S + 3\n$$\n\nThis is a key relationship between $ S $ and $ P $. Since $ S = a + b $ and $ P = ab $, we can use the identity:\n\n$$\n(a + b)^2 \\ge 4ab \\quad \\text{(AM-GM inequality)}\n$$\n\nThis gives:\n\n$$\nS^2 \\ge 4P = 4(S + 3)\n$$\n\nSo:\n\n$$\nS^2 \\ge 4S + 12\n$$\n\nWe now solve this inequality.\n\n---\n\n### Step 2: Solving the Inequality\n\n$$\nS^2 \\ge 4S + 12 \\quad \\Rightarrow \\quad S^2 - 4S - 12 \\ge 0\n$$\n\nWe solve the quadratic equation:\n\n$$\nS^2 - 4S - 12 = 0\n$$\n\nUsing the quadratic formula:\n\n$$\nS = \\frac{4 \\pm \\sqrt{16 + 48}}{2} = \\frac{4 \\pm \\sqrt{64}}{2} = \\frac{4 \\pm 8}{2}\n$$\n\nSo:\n\n$$\nS = 6 \\quad \\text{or} \\quad S = -2\n$$\n\nSince $ S = a + b $ and $ a, b > 0 $, the only valid value is $ S = 6 $.\n\n---\n\n### Step 3: Verifying the Minimum Value\n\nWe now analyze the inequality $ S^2 \\ge 4S + 12 $ to confirm the minimum value of $ S $. The minimum occurs when the inequality becomes equality:\n\n$$\nS^2 = 4S + 12 \\quad \\Rightarrow \\quad S = 6\n$$\n\nSo $ S \\ge 6 $.\n\n---\n\n### Step 4: Exploring the Range of $ S $\n\nLet’s consider the case where $ a $ and $ b $ are positive real numbers such that $ ab = a + b + 3 $. 
We define $ x = a - 1 $ and $ y = b - 1 $, so that:\n\n$$\nx \\cdot y = 4 \\quad \\text{and} \\quad S = a + b = x + y + 2\n$$\n\nFrom the AM-GM inequality, $ x + y \\ge 2\\sqrt{xy} = 2\\sqrt{4} = 4 $, so:\n\n$$\nS = x + y + 2 \\ge 6\n$$\n\nThus, the minimum value of $ S $ is 6.\n\n---\n\n### Step 5: Exploring the Upper Bound\n\nAs $ x \\to \\infty $ and $ y \\to 0 $, the product $ x \\cdot y = 4 $ remains fixed, but the sum $ x + y \\to \\infty $. Therefore, $ S = x + y + 2 \\to \\infty $.\n\nThus, the range of $ a + b = S $ is:\n\n$$\n[6, \\infty)\n$$\n\n---\n\n### Final Answer\n\n$$\n\\boxed{[6, \\infty)}\n$$

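Despite the low precision of its parameters, this long reasoning trace shows that the 2-bit model can still reason in a structured and (somewhat) coherent way.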
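
The final answer in both outputs can be checked numerically. The sketch below uses the substitution b = (a + 3)/(a - 1) for a > 1 that appears in the model's own reasoning; the search grid and its bounds are arbitrary choices made for illustration.

```python
def sum_for(a: float) -> float:
    """Return a + b for the b > 0 that satisfies ab = a + b + 3 (requires a > 1)."""
    b = (a + 3) / (a - 1)
    return a + b


# Scan a over (1, 101) in steps of 0.001 and look for the smallest sum.
candidates = [1 + k / 1000 for k in range(1, 100_000)]
best_a = min(candidates, key=sum_for)
print(f"min a + b ~ {sum_for(best_a):.4f} at a ~ {best_a:.3f}")
print(f"a + b near a = 1: {sum_for(1.001):.1f}, at a = 100: {sum_for(100.0):.1f}")
```

The scan finds the minimum at a = b = 3, and the sum grows without bound both as a approaches 1 and as a becomes large, in line with the range [6, ∞) given by the model.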

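Conclusion

Previous work has shown that models pre-trained on more tokens are generally harder to quantize accurately. This was particularly visible with Llama 3, whose 2-bit quantization often yields unstable or poorly performing models. Surprisingly, Qwen3 quantizes remarkably well even though it was trained on roughly twice as many tokens. This suggests that something in the nature of its pre-training data, or in the training strategy itself, makes Qwen models more robust to extreme quantization.

I was hoping to see a Qwen3-50B. Quantized to 2-bit, it might have been the strongest model that can run on a 24 GB GPU.

Overall, the Qwen3 family is excellent. The one open question is whether the models tend to "overthink": even with thinking turned off, they may produce unnecessarily long answers. Because they were explicitly trained to reason, such models may default to verbose output in situations that do not call for it. Fortunately, the Qwen team also released the base models, which gives us the flexibility to do our own post-training, both to avoid the reasoning issue and to specialize the models for particular tasks.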

Installation

git clone https://github.com/vllm-project/vllm.git && cd vllm && VLLM_USE_PRECOMPILED=1 pip install --editable .

Inference with vLLM and 2-bit Qwen3 32B

%env VLLM_USE_V1=0
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [[{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.6, top_k=20, top_p=0.95, max_tokens=8192)

# Create an LLM.
llm = LLM(model="kaitchup/Qwen3-32B-autoround-2bit-gptq")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.chat(prompts, sampling_params, chat_template_kwargs={"enable_thinking": False})
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

outputs = llm.chat(prompts, sampling_params, chat_template_kwargs={"enable_thinking": True})
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Evaluation

git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness && cd lm-evaluation-harness && pip install -e .

VLLM_USE_V1=0 lm_eval --model vllm \
    --model_args pretrained="Qwen/Qwen3-32B",dtype="bfloat16",max_model_len=12000 \
    --tasks leaderboard_ifeval \
    --device cuda:0 \
    --batch_size auto \
    --apply_chat_template \
    --output_path results

VLLM_USE_V1=0 lm_eval --model vllm \
    --model_args pretrained="kaitchup/Qwen3-32B-autoround-4bit-gptq",dtype="float16",max_model_len=12000 \
    --tasks leaderboard_ifeval \
    --device cuda:0 \
    --batch_size auto \
    --apply_chat_template \
    --output_path results

VLLM_USE_V1=0 lm_eval --model vllm \
    --model_args pretrained="kaitchup/Qwen3-32B-autoround-2bit-gptq",dtype="float16",max_model_len=12000 \
    --tasks leaderboard_ifeval \
    --device cuda:0 \
    --batch_size auto \
    --apply_chat_template \
    --output_path results
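
The three commands differ only in the model name and dtype, so they can also be generated from a small table. The sketch below only builds and prints the command lines, using the same flags as above; the helper name is hypothetical, and running the commands is left to the reader.

```python
import shlex

# Model name -> dtype, as used in the three evaluation runs above.
MODELS = {
    "Qwen/Qwen3-32B": "bfloat16",
    "kaitchup/Qwen3-32B-autoround-4bit-gptq": "float16",
    "kaitchup/Qwen3-32B-autoround-2bit-gptq": "float16",
}


def lm_eval_command(model: str, dtype: str) -> list[str]:
    """Return the lm_eval argument list for one model."""
    model_args = f"pretrained={model},dtype={dtype},max_model_len=12000"
    return [
        "lm_eval", "--model", "vllm",
        "--model_args", model_args,
        "--tasks", "leaderboard_ifeval",
        "--device", "cuda:0",
        "--batch_size", "auto",
        "--apply_chat_template",
        "--output_path", "results",
    ]


for model, dtype in MODELS.items():
    print("VLLM_USE_V1=0 " + shlex.join(lm_eval_command(model, dtype)))
```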

Quantization

2-bit

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "Qwen/Qwen3-4B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
from auto_round import AutoRound

autoround = AutoRound(model, tokenizer, nsamples=512, iters=512, low_gpu_mem_usage=False, enable_torch_compile=True, bits=2, seqlen=4096, group_size=32, sym=True)
output_dir = "./Qwen3-4B-autoround-2bit-gptq"
autoround.quantize_and_save(output_dir, format='auto_gptq')

4-bit

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "Qwen/Qwen3-4B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
from auto_round import AutoRound



autoround = AutoRound(model, tokenizer, nsamples=512, iters=512, low_gpu_mem_usage=False, enable_torch_compile=True, bits=4, seqlen=4096, group_size=128, sym=True)
output_dir = "./Qwen3-4B-autoround-4bit-gptq"
autoround.quantize_and_save(output_dir, format='auto_gptq')
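
The two scripts above differ only in bits, group_size, and the output directory. As a possible way to avoid the duplication, the sketch below keeps the shared settings in one place. The helper names are hypothetical, and the AutoRound calls are the same ones used above, wrapped in a function so that nothing heavy runs until quantize is called.

```python
# Settings shared by both runs, copied from the scripts above.
SHARED = {
    "nsamples": 512,
    "iters": 512,
    "low_gpu_mem_usage": False,
    "enable_torch_compile": True,
    "seqlen": 4096,
    "sym": True,
}
# The only settings that change with the target precision.
GROUP_SIZE = {2: 32, 4: 128}


def autoround_kwargs(bits: int) -> dict:
    """Return the AutoRound keyword arguments for a given bit width."""
    return {**SHARED, "bits": bits, "group_size": GROUP_SIZE[bits]}


def output_dir_for(model_name: str, bits: int) -> str:
    """Build the output directory name used in the scripts above."""
    return f"./{model_name.split('/')[-1]}-autoround-{bits}bit-gptq"


def quantize(model_name: str, bits: int) -> None:
    """Quantize a model and save it in GPTQ format (needs a GPU and the libraries)."""
    from auto_round import AutoRound
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    autoround = AutoRound(model, tokenizer, **autoround_kwargs(bits))
    autoround.quantize_and_save(output_dir_for(model_name, bits), format="auto_gptq")


print(autoround_kwargs(2))
print(output_dir_for("Qwen/Qwen3-4B", 2))
```

Calling quantize("Qwen/Qwen3-4B", 2) and quantize("Qwen/Qwen3-4B", 4) would then correspond to the two runs above.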