Getting Started with RedisVL: Building Efficient AI Vector Search Applications

1. Prerequisites

Before you begin, make sure that:

redisvl is installed in your Python environment.
A Redis Stack or Redis Cloud instance is running.

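2. Defining the Index Schema (IndexSchema)

An IndexSchema defines the index configuration and field information for Redis. It can be created from a Python dictionary or a YAML file. The examples below use a user dataset with the fields user, job, age, credit_score, and a three-dimensional user_embedding vector.

2.1. Example Schema

Suppose we need an index for this dataset named user_simple, with the key prefix user_simple_docs.

2.2. YAML Format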

version: '0.1.0'

index:
  name: user_simple
  prefix: user_simple_docs

fields:
  - name: user
    type: tag
  - name: credit_score
    type: tag
  - name: job
    type: text
  - name: age
    type: numeric
  - name: user_embedding
    type: vector
    attrs:
      algorithm: flat
      dims: 3
      distance_metric: cosine
      datatype: float32

Save the content above as schema.yaml.

2.3. Python Dictionary Format

schema = {
    "index": {
        "name": "user_simple",
        "prefix": "user_simple_docs",
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ]
}

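3. Preparing a Sample Dataset

We create a sample dataset with the fields user, job, age, credit_score, and user_embedding. The user_embedding vectors have only three dimensions and are for demonstration purposes.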

import numpy as np

data = [
    {
        'user': 'john',
        'age': 1,
        'job': 'engineer',
        'credit_score': 'high',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'mary',
        'age': 2,
        'job': 'doctor',
        'credit_score': 'low',
        'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
    },
    {
        'user': 'joe',
        'age': 3,
        'job': 'dentist',
        'credit_score': 'medium',
        'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
    }
]

Note that the user_embedding vectors are converted to bytes with NumPy, because Redis stores vector fields in hashes as raw byte strings.
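To make the byte format concrete, here is a minimal sketch that packs the same three values as float32 using only the standard library. It assumes little-endian byte order, which is what NumPy's tobytes() produces on typical x86 and ARM machines; the variable names are illustrative only.

```python
import struct

vector = [0.1, 0.1, 0.5]

# "<3f" = three little-endian 32-bit floats, i.e. 4 bytes per dimension.
packed = struct.pack("<3f", *vector)
print(len(packed))  # 12

# Unpacking recovers the values, up to float32 rounding.
restored = struct.unpack("<3f", packed)
print([round(x, 6) for x in restored])  # [0.1, 0.1, 0.5]
```

A vector field with dims: 3 and datatype: float32 therefore expects exactly 12 bytes per document.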

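4. Creating a Search Index (SearchIndex)

With the schema and dataset ready, we can create a SearchIndex object.

4.1. Using a Custom Redis Connection

If you need custom Redis connection settings or want to share a connection pool: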

from redisvl.index import SearchIndex
from redis import Redis

client = Redis.from_url("redis://localhost:6379")
index = SearchIndex.from_dict(schema, redis_client=client, validate_on_load=True)

4.2. Letting the Index Manage the Connection

For simple cases, you can let the index manage the Redis connection itself:

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379", validate_on_load=True)

4.3. Creating the Index

Run the following call to create the index:

index.create(overwrite=True)

At this point the index exists but contains no data.

5. Inspecting the Index with the rvl CLI

Use the rvl command-line tool to list the existing indices:

rvl index listall

Output:

19:17:09 [RedisVL] INFO   Indices:
19:17:09 [RedisVL] INFO   1. user_simple

To view the details of a specific index:

rvl index info -i user_simple

Output:

Index Information:
╭──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────╮
│ Index Name           │ Storage Type         │ Prefixes             │ Index Options        │ Indexing             │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ user_simple          │ HASH                 │ ['user_simple_docs'] │ []                   │ 0                    │
╰──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────╯
Index Fields:
╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────╮
│ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ user            │ user            │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │
│ credit_score    │ credit_score    │ TAG             │ SEPARATOR       │ ,               │                 │                 │                 │                 │                 │                 │
│ job             │ job             │ TEXT            │ WEIGHT          │ 1               │                 │                 │                 │                 │                 │                 │
│ age             │ age             │ NUMERIC         │                 │                 │                 │                 │                 │                 │                 │                 │
│ user_embedding  │ user_embedding  │ VECTOR          │ algorithm       │ FLAT            │ data_type       │ FLOAT32         │ dim             │ 3               │ distance_metric │ COSINE          │
╰─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────╯

6. Loading Data into the Index

Use the load method to load the sample data into Redis:

keys = index.load(data)
print(keys)

Output:

['user_simple_docs:01JT4PPPNJZMSK2395RKD208T9', 'user_simple_docs:01JT4PPPNM63J55ZESZ4TV1VR8', 'user_simple_docs:01JT4PPPNM59RCKS2YQ58B1HQW']

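RedisVL validates data with Pydantic to make sure loaded records conform to the schema. With validate_on_load=True, as set when the index was created above, invalid data (for example, a user_embedding that is not of type bytes) raises a SchemaValidationError.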
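As a rough illustration of the kind of check involved, the sketch below inspects a record against the vector field of a schema dictionary before loading. It is not RedisVL's implementation: check_record, FLOAT32_BYTES, and the trimmed-down schema are hypothetical names introduced here, and only float32 vectors and numeric fields are covered.

```python
import struct

FLOAT32_BYTES = 4  # size of one float32 value (hypothetical constant)

# Trimmed-down copy of the schema used in this article.
schema = {
    "fields": [
        {"name": "age", "type": "numeric"},
        {"name": "user_embedding", "type": "vector",
         "attrs": {"dims": 3, "datatype": "float32"}},
    ]
}

def check_record(record, schema):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            continue  # this sketch does not treat absent fields as errors
        value = record[name]
        if field["type"] == "vector":
            expected = field["attrs"]["dims"] * FLOAT32_BYTES
            if not isinstance(value, bytes):
                problems.append(f"{name}: expected bytes, got {type(value).__name__}")
            elif len(value) != expected:
                problems.append(f"{name}: expected {expected} bytes, got {len(value)}")
        elif field["type"] == "numeric" and not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
    return problems

good = {"age": 1, "user_embedding": struct.pack("<3f", 0.1, 0.1, 0.5)}
bad = dict(good, user_embedding=[0.1, 0.1, 0.5])  # a list instead of bytes

print(check_record(good, schema))  # []
print(check_record(bad, schema))   # ['user_embedding: expected bytes, got list']
```

Running a check like this before load can surface malformed records early, although the library's own validation remains the authority on what is accepted.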

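7. Updating Index Data

Calling load again inserts new documents or updates existing ones (upsert). Because keys are generated automatically by default, the call below adds a new document. To overwrite an existing document instead, the record must be written to the same key, which load supports through its id_field and keys arguments.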

new_data = [{
    'user': 'tyler',
    'age': 9,
    'job': 'engineer',
    'credit_score': 'high',
    'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes()
}]
keys = index.load(new_data)
print(keys)

Output:

['user_simple_docs:01JT4PPX63CH5YRN2BGEYB5TS2']
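Whether a second load inserts or overwrites comes down to the key each record is written under. The toy model below uses a plain dictionary as a stand-in for Redis to show the difference. Everything in it (make_key, load, the auto-numbered IDs) is hypothetical and only mimics the prefix:id key layout seen in the output above; it does not call RedisVL.

```python
import itertools

_counter = itertools.count(1)  # stand-in for automatically generated unique IDs

def make_key(prefix, record, id_field=None):
    """Build a key as prefix:id, using a record field or a fresh ID."""
    if id_field is not None:
        ident = record[id_field]
    else:
        ident = f"auto{next(_counter)}"
    return f"{prefix}:{ident}"

def load(store, prefix, records, id_field=None):
    """Write each record under its key; an existing key is overwritten."""
    keys = []
    for record in records:
        key = make_key(prefix, record, id_field)
        store[key] = dict(record)
        keys.append(key)
    return keys

tyler = {"user": "tyler", "age": 9}

# Generated IDs: loading the same record twice yields two documents.
auto_store = {}
load(auto_store, "user_simple_docs", [tyler])
load(auto_store, "user_simple_docs", [tyler])
print(len(auto_store))  # 2

# Stable IDs taken from the "user" field: the second load overwrites the first.
stable_store = {}
load(stable_store, "user_simple_docs", [tyler], id_field="user")
load(stable_store, "user_simple_docs", [dict(tyler, age=10)], id_field="user")
print(stable_store)  # {'user_simple_docs:tyler': {'user': 'tyler', 'age': 10}}
```

The same reasoning applies to the real index: a record only replaces an earlier one when both resolve to the same key.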

8. Creating and Running a Vector Query

Use VectorQuery to build a vector query object:

from redisvl.query import VectorQuery

query = VectorQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "age", "job", "credit_score", "vector_distance"],
    num_results=3
)

Run the query:

results = index.query(query)

Output:

vector_distance  user   age  job       credit_score
0                john   1    engineer  high
0                mary   2    doctor    low
0.0566299557686  tyler  9    engineer  high
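The distances above can be reproduced by hand. The sketch below ranks the four sample vectors by cosine distance (1 minus cosine similarity) with a brute-force scan, which is conceptually what a flat index does. It uses plain Python floats, so the values can differ from Redis's float32 results in the last digits; cosine_distance and the vectors dictionary are names introduced for this illustration.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# The four embeddings loaded in the previous sections.
vectors = {
    "john": [0.1, 0.1, 0.5],
    "mary": [0.1, 0.1, 0.5],
    "joe": [0.9, 0.9, 0.1],
    "tyler": [0.1, 0.3, 0.5],
}
query_vector = [0.1, 0.1, 0.5]

# Brute-force scan: score every vector, sort ascending, keep the top 3.
ranked = sorted(
    ((name, cosine_distance(query_vector, vec)) for name, vec in vectors.items()),
    key=lambda pair: pair[1],
)
top3 = ranked[:3]
for name, dist in top3:
    print(name, dist)
```

john and mary share the query vector and so sit at distance zero (up to rounding), tyler comes out at roughly 0.0566, and joe is dropped because num_results is 3.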

9. Using the Asynchronous Redis Client

In production, the asynchronous client AsyncSearchIndex is recommended:

from redisvl.index import AsyncSearchIndex
from redis.asyncio import Redis

client = Redis.from_url("redis://localhost:6379")
index = AsyncSearchIndex.from_dict(schema, redis_client=client)
results = await index.query(query)

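The output matches the synchronous query. Note that top-level await only works in environments such as Jupyter or the asyncio REPL (python -m asyncio); in a regular script, wrap these calls in an async function and run it with asyncio.run().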

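10. Updating the Index Schema

To change the index schema (for example, switching the job field from text to tag, or switching user_embedding from a flat vector index to hnsw), modify the schema and recreate the index: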

index.schema.remove_field("job")
index.schema.remove_field("user_embedding")
index.schema.add_fields([
    {"name": "job", "type": "tag"},
    {
        "name": "user_embedding",
        "type": "vector",
        "attrs": {
            "dims": 3,
            "distance_metric": "cosine",
            "algorithm": "hnsw",
            "datatype": "float32"
        }
    }
])

await index.create(overwrite=True, drop=False)

With drop=False, the existing data is kept and only the index configuration is replaced.

11. Checking Index Statistics

Use the rvl CLI to view index statistics:

rvl stats -i user_simple

Output:

Statistics:
╭─────────────────────────────┬────────────╮
│ Stat Key                    │ Value      │
├─────────────────────────────┼────────────┤
│ num_docs                    │ 4          │
│ num_terms                   │ 0          │
│ max_doc_id                  │ 4          │
│ num_records                 │ 20         │
│ percent_indexed             │ 1          │
│ hash_indexing_failures      │ 0          │
│ number_of_uses              │ 2          │
│ bytes_per_record_avg        │ 48.2000007 │
│ doc_table_size_mb           │ 4.23431396 │
│ inverted_sz_mb              │ 9.19342041 │
│ key_table_size_mb           │ 1.93595886 │
│ offset_bits_per_record_avg  │ nan        │
│ offset_vectors_sz_mb        │ 0          │
│ offsets_per_term_avg        │ 0          │
│ records_per_doc_avg         │ 5          │
│ sortable_values_size_mb     │ 0          │
│ total_indexing_time         │ 0.74400001 │
│ total_inverted_index_blocks │ 11         │
│ vector_index_sz_mb          │ 0.23560333 │
╰─────────────────────────────┴────────────╯

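12. Cleanup

To clean up the data or the index:

# Remove all data from the index but keep the index definition
await index.clear()

# Delete the index together with its data
await index.delete()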

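13. Summary

RedisVL provides a simple yet powerful interface for vector search in Redis. By defining an index schema, loading data, running vector queries, and updating the index, you can quickly build efficient AI applications. Together with the asynchronous client and the CLI tools, RedisVL covers a range of scenarios from development to production.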