Vector Quantization (VQ) and Codebooks

Originally published 2025-05-27 15:56:41 · 391 views

License: CC 4.0 BY-SA

Tags: #artificial-intelligence #python #deep-learning #neural-networks #notes

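Put simply, VQ is a compression technique: it condenses data by representing vectors with similar features by a shared codeword taken from a codebook.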

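Vector Quantization (VQ)

Purpose: discretize continuous data (e.g. image pixels or speech signals) into a finite set of representative vectors (the entries of a codebook).

Procedure:

Training stage: learn the codebook (a fixed set of centroid vectors) from the data by clustering, for example with K-Means.

Encoding stage: map each input vector to the nearest vector in the codebook and represent it by that vector's index. The codebook is a condensed summary of the data, so every vector assigned to a codeword lies close to it.

Example: image compression. Each image patch is mapped to its best-matching codeword, and the index is stored instead of the raw data.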
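The encoding and decoding steps above can be illustrated on toy data. This is only a minimal sketch, not the article's implementation: the codebook is written by hand instead of being learned with K-Means, and the names `toy_codebook`, `encode_nearest`, and `decode_indices` are hypothetical, chosen for illustration.

```python
import numpy as np

# Hypothetical hand-written codebook: 3 codewords in 2-D.
# In practice the codebook would be learned from data (e.g. with K-Means).
toy_codebook = np.array([[0.0, 0.0],
                         [1.0, 1.0],
                         [5.0, 5.0]])

def encode_nearest(vectors, codebook):
    """Return, for each vector, the index of the nearest codeword (Euclidean)."""
    # (N, 1, D) - (K, D) broadcasts to (N, K, D); the norm over axis 2 gives (N, K).
    distances = np.linalg.norm(vectors[:, np.newaxis] - codebook, axis=2)
    return np.argmin(distances, axis=1)

def decode_indices(indices, codebook):
    """Replace each index by its codeword (a lossy reconstruction)."""
    return codebook[indices]

vectors = np.array([[0.1, -0.2], [0.9, 1.2], [4.7, 5.1], [1.1, 0.8]])
indices = encode_nearest(vectors, toy_codebook)
reconstructed = decode_indices(indices, toy_codebook)

print(indices)        # one small integer per input vector
print(reconstructed)  # each row is the codeword nearest to the corresponding input
```

Only `indices` (plus the codebook, stored once) needs to be kept; the reconstruction is approximate because every input is replaced by its nearest codeword.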

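Basic definition of a codebook

A codebook is a predefined set of symbols, vectors, or codes used to map raw data (such as signals, images, or text) to a more compact or more meaningful representation.

Components

Codeword: one entry of the codebook, usually a vector (e.g. a 128-dimensional float array).

Index: the unique number of each codeword, used for efficient storage and lookup (e.g. the integer 3 stands for one particular codeword).

Distance metric: how the similarity between an input vector and a codeword is measured (e.g. Euclidean distance or cosine similarity).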
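The choice of distance metric can change which codeword an input is assigned to. The sketch below is an illustration under assumed toy values: the two-entry `codebook`, the query `x`, and the helpers `nearest_euclidean` and `nearest_cosine` are all hypothetical.

```python
import numpy as np

def nearest_euclidean(x, codebook):
    """Index of the codeword with the smallest Euclidean distance to x."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def nearest_cosine(x, codebook):
    """Index of the codeword with the largest cosine similarity to x."""
    sims = codebook @ x / (np.linalg.norm(codebook, axis=1) * np.linalg.norm(x))
    return int(np.argmax(sims))

codebook = np.array([[1.0, 0.0],     # index 0: short vector along the x axis
                     [10.0, 10.0]])  # index 1: long vector along the diagonal
x = np.array([2.0, 2.0])

print(nearest_euclidean(x, codebook))  # picks the codeword closest in position
print(nearest_cosine(x, codebook))     # picks the codeword closest in direction
```

Euclidean distance is sensitive to magnitude, while cosine similarity compares direction only, so the two metrics select different codewords for this input.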

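Examples and code

With a codebook of size 1, the representation clearly has no expressive power, because every patch is replaced by the same codeword.

With a codebook of size 2, the reconstruction already starts to resemble the original, even though only 2 codewords are used: every patch is mapped to one of just two vectors.

With a codebook of size 32, the reconstruction is already hard to tell apart from the original. The 36,864 patches of the original image are represented using only 32 codewords.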
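The storage cost for each codebook size can be worked out directly. The sketch below assumes a 1536x1536 8-bit grayscale image split into 8x8 patches (which gives 36,864 patches), counts only the index bits, and ignores the cost of storing the codebook itself. The helper names `index_bits` and `index_bytes` are hypothetical.

```python
import math

def index_bits(num_codewords):
    """Bits used to store one index: ceil(log2(K)).

    K=1 carries no information in principle; a floor of 1 bit is used here
    purely to avoid dividing by zero in the ratio below.
    """
    return max(1, math.ceil(math.log2(num_codewords)))

def index_bytes(num_patches, num_codewords):
    """Total bytes needed to store one index per patch."""
    return num_patches * index_bits(num_codewords) / 8

num_patches = (1536 // 8) ** 2   # assumed image size: 192 x 192 patches
original_bytes = 1536 * 1536     # one byte per 8-bit grayscale pixel

for k in (1, 2, 32):
    size = index_bytes(num_patches, k)
    ratio = original_bytes / size
    print(f"K={k:>2}: {index_bits(k)} bit(s) per patch, {size:.0f} bytes, ratio {ratio:.1f}x")
```

A larger codebook improves fidelity but needs more bits per index, so the compression ratio drops as K grows.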

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from PIL import Image

class VectorQuantizer:
    def __init__(self, num_codewords=32, patch_size=8):
        self.num_codewords = num_codewords  # codebook size (K)
        self.patch_size = patch_size        # side length of a square image patch
        self.codebook = None                # codebook (K x D, D = patch_size^2)

    def train_codebook(self, data):
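        """Train the codebook with K-Means."""

        # num_codewords sets the number of cluster centers.
        # random_state is the integer random seed, fixed here for reproducibility.
        # data has shape (36864, 64) for a 1536x1536 image with 8x8 patches.
        kmeans = KMeans(n_clusters=self.num_codewords, random_state=0)
        # Initialization: choose num_codewords initial cluster centers (codewords);
        # scikit-learn uses k-means++ seeding by default.
        # Iteration: assign each sample to its nearest center, then
        # recompute each center as the mean of the samples assigned to it.
        # Convergence: stop when the centers stop moving or the maximum
        # number of iterations is reached.

        kmeans.fit(data)

        # codebook has shape (32, 64)
        self.codebook = kmeans.cluster_centers_  # the learned codebook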

        return kmeans

    def encode(self, vectors):
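        """Encode vectors as codebook indices."""
        # vectors: (36864, 64)
        # vectors[:, np.newaxis]: (36864, 1, 64)
        # np.newaxis adds an axis; it is an alias for None

        # Compute the distance from every vector to every codeword. With 32
        # codewords the result has shape (36864, 32).
        # axis=2 takes the L2 norm along the last (feature) axis:
        # (36864, 1, 64) - (32, 64) broadcasts to (36864, 32, 64); the norm gives (36864, 32).
        # Note: the broadcast difference is a temporary of roughly 600 MB in float64 at these shapes.
        distances = np.linalg.norm(vectors[:, np.newaxis] - self.codebook, axis=2)

        indices = np.argmin(distances, axis=1)
        # distances.shape: (36864, 32)
        # indices.shape: (36864,)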

        return indices

    def decode(self, indices):
        """Decode indices back into codewords."""
        return self.codebook[indices]

def extract_patches(image, patch_size):
    """Split the image into patches and flatten each patch into a vector."""
    h, w = image.shape
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patch = image[i:i+patch_size, j:j+patch_size]
            # Keep only full patches; partial patches at the border are skipped
            if patch.shape == (patch_size, patch_size):
                patches.append(patch.flatten())
    # patches has shape (36864, 64)
    return np.array(patches)

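def reconstruct_image(patches, codebook, image_shape, patch_size):
    """Reconstruct the image from the codebook.

    Note: `codebook` is a trained VectorQuantizer instance, not the raw array.
    """
    indices = codebook.encode(patches)

    # decoded_patches has shape (36864, 64)
    decoded_patches = codebook.decode(indices)

    image = np.zeros(image_shape)

    idx = 0
    # Visit only full patches, in the same order as extract_patches, so that
    # partial border patches (skipped there) are skipped here as well.
    for i in range(0, image_shape[0] - patch_size + 1, patch_size):
        for j in range(0, image_shape[1] - patch_size + 1, patch_size):
            if idx < len(decoded_patches):
                # The trailing backslash is a line-continuation character
                image[i:i+patch_size, j:j+patch_size] = \
                    decoded_patches[idx].reshape(patch_size, patch_size)
                idx += 1
    return image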

if __name__ == "__main__":
    # 1. Load the image and convert it to grayscale
    # convert("L") converts a color image to a single-channel grayscale image
    image = np.array(Image.open("生成1536像素图片.png").convert("L"))  # replace with the path to your image
    image = image / 255.0  # normalize to [0, 1]

    # 2. Parameters
    patch_size = 8
    num_codewords = 32  # codebook size

    # 3. Extract image patches
    patches = extract_patches(image, patch_size)
    print(f"Extracted {len(patches)} patches of size {patch_size}x{patch_size}")

    # 4. Train the codebook
    vq = VectorQuantizer(num_codewords, patch_size)

    # patches has shape (36864, 64)
    vq.train_codebook(patches)

    # 5. Encode and reconstruct the image
    reconstructed = reconstruct_image(patches, vq, image.shape, patch_size)

    # 6. Visualize

    # Create the figure
    plt.figure(figsize=(12, 6))

    plt.subplot(1, 2, 1)
    plt.title("Original Image")
    plt.imshow(image, cmap='gray')

    plt.subplot(1, 2, 2)
    plt.title(f"Reconstructed (K={num_codewords})")
    plt.imshow(reconstructed, cmap='gray')

    plt.show()

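    # 7. Compute the compression ratio
    original_size = image.size  # original size in bytes (one byte per 8-bit grayscale pixel)

    # Bits per index: ceil(log2(K)), with a floor of 1 so that K=1 does not divide by zero
    bits_per_index = max(1, int(np.ceil(np.log2(num_codewords))))
    compressed_size = (len(patches) * bits_per_index) / 8  # bytes used by the indices (codebook storage not counted)
    compression_ratio = original_size / compressed_size
    print(f"Compression ratio: {compression_ratio:.1f}x")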
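
The `encode` method above builds a temporary array of shape (N, K, D) before taking the norm, which can be memory-hungry for large inputs. One common alternative, offered here as a sketch and not as part of the original code, expands the squared distance as ||v - c||^2 = ||v||^2 - 2 v·c + ||c||^2 so that only (N, K) arrays are needed. The function name `encode_low_memory` and the random test data are hypothetical.

```python
import numpy as np

def encode_low_memory(vectors, codebook):
    """Nearest-codeword indices without building the (N, K, D) difference array."""
    v_sq = np.sum(vectors ** 2, axis=1, keepdims=True)  # (N, 1): squared norms of inputs
    c_sq = np.sum(codebook ** 2, axis=1)                # (K,):   squared norms of codewords
    cross = vectors @ codebook.T                        # (N, K): dot products
    sq_dist = v_sq - 2.0 * cross + c_sq                 # (N, K): squared Euclidean distances
    # argmin of the squared distance equals argmin of the distance
    return np.argmin(sq_dist, axis=1)

# Compare against the broadcast version on random data (assumed shapes).
rng = np.random.default_rng(0)
vectors = rng.random((1000, 64))
codebook = rng.random((32, 64))

reference = np.argmin(
    np.linalg.norm(vectors[:, np.newaxis] - codebook, axis=2), axis=1)
fast = encode_low_memory(vectors, codebook)
print(np.mean(reference == fast))  # fraction of vectors assigned to the same codeword
```

The expansion trades a little floating-point accuracy for memory, so two nearly equidistant codewords could in principle be ordered differently than in the broadcast version.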