TensorFlow - Mastering tf.nn.embedding_lookup and tf.nn.embedding_lookup_sparse

BIT_666

Last modified 2022-06-10 12:05:42

First published 2021-05-20 18:05:16

Article link: https://blog.csdn.net/BIT_666/article/details/117076183

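I. Introduction

An earlier post on Wide & Deep covered embeddings for categorical features, where the embedding came from an Embedding layer. In practice you can also load embeddings for known ids from a pre-trained model, for example user and item vectors obtained from matrix factorization, or semantic embeddings pre-trained with Word2vec. The embedding_lookup family of functions can be thought of as a dictionary: given the id of a feature, it returns the embedding for that id for further processing.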

II. embedding_lookup

1. API Overview

The basic function: given ids, it fetches the corresponding Tensor (the embedding vector) from params (which can be thought of as the embedding dictionary).

tf.nn.embedding_lookup(
    params, ids, max_norm=None, name=None
)

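2. Basic Usage Demo

    import numpy as np
    import tensorflow as tf

    # Simulated embedding table: 100 ids, 100 dimensions each
    embeddings = np.random.randint(0, 100, size=(100, 100))

    # Fetch the rows for ids 1 and 3
    emb = tf.nn.embedding_lookup(embeddings, [1, 3])
    print(emb)

embeddings is a randomly generated 100 x 100 matrix; the lookup returns the two rows at indices 1 and 3.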

tf.Tensor(
[[20 31 27  3  9 64 82 55 24 61  5 54 42 52 49 26 47 96 87 40 18 91  9 25
  67  5  6 44 70 30 39 99 90 40 67 27 91  5 47 50 77 59 36 72 77 14 69 36
  96  8 84 96 85 48 89 20 25 65 67 44 83 42 37 35 75 85 86 35 15  0 55 45
  22 50 12 42 65 76 79 72 46 27 11 23 64 32 11 83 91 70 77 20 86 87 18  5
  53 10 10 49]
 [27 22  9 94 24 53 97 63 97  2 54 84 11 39 56 53 46 72 50  9 26 73 15 27
  95 50 14 45  0 21 91 44 80 41 49 52 91 90 16 12 95 23 70 46 67 83 74 55
  29 10 38  6 51 46 89 80 42 12 12 93  7 19 64 79 88 21 63 91 25 15 26 68
   7 96 17 64 19  0 47 55 65 84 85 81 25 21 35 64 65 91  4 71 73 63 65 86
  13 75 74 23]], shape=(2, 100), dtype=int64)
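
To make the "dictionary" view concrete, here is a minimal plain-Python sketch of what a dense lookup amounts to: gathering rows of the table by id. The helper name `lookup` and the small table are illustrative assumptions, not part of TensorFlow.

```python
# Plain-Python sketch of a dense lookup: gather rows of `params` by id.
# `lookup` is a hypothetical helper, not a TensorFlow function.
def lookup(params, ids):
    return [list(params[i]) for i in ids]

# A tiny made-up table: row i is the embedding of id i.
params = [[10, 11, 12],   # id 0
          [20, 21, 22],   # id 1
          [30, 31, 32],   # id 2
          [40, 41, 42]]   # id 3

rows = lookup(params, [1, 3])
print(rows)  # [[20, 21, 22], [40, 41, 42]]
```

The result has one row per requested id, which matches the (2, 100) shape printed above for two ids.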

3. Max Norm Demo

    max_norm: If not `None`, each embedding is clipped if its l2-norm is larger than this value.

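If max_norm is not None and the L2 norm of a looked-up embedding exceeds it, the embedding is clipped. In effect this rescales the retrieved embedding so that its vector does not grow too large.

If the Vector for an id has an L2 norm larger than max_norm, it is rescaled so that its L2 norm equals max_norm:

$Vector = \frac{Vector \cdot MaxNorm}{norm(Vector)}$

The L2 norm of a vector Vector = [x1, x2, x3, ..., xn] is computed as: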

$norm(Vector) = \sqrt{x_1^2 + x_2^2 + ... + x_n^2}$

    embeddings = tf.constant([[1., 1., 1.],
                              [2., 2., 2.],
                              [3., 3., 3.]])
    max_norm = tf.Variable(2.)
    emb = tf.nn.embedding_lookup(embeddings, [0, 2], max_norm=max_norm)
    print(emb)

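Let's work it out by hand first. max_norm is 2, and ids 0 and 2 map to the vectors [1,1,1] and [3,3,3]. The first has L2 norm $\sqrt{3}$, which is below 2, so it is left unchanged. The second has L2 norm $\sqrt{27}$, which is above 2, so by the formula each component becomes 3 * 2 / $\sqrt{27}$ = 1.1547: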

3 * 2 / math.pow(27,0.5) = 1.1547005383792515

The final result: the first vector is unchanged, and the second has been rescaled to an L2 norm of 2:

tf.Tensor(
[[1.        1.        1.       ]
 [1.1547006 1.1547006 1.1547006]], shape=(2, 3), dtype=float32)
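
The clipping rule can also be checked without TensorFlow. The sketch below is a plain-Python version of the formula above; `clip_by_max_norm` is a hypothetical helper name used only for illustration.

```python
import math

def clip_by_max_norm(vector, max_norm):
    """Rescale `vector` to L2 norm `max_norm` if its norm is larger (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm <= max_norm:
        return list(vector)          # small enough: returned unchanged
    scale = max_norm / norm          # Vector * MaxNorm / norm(Vector)
    return [x * scale for x in vector]

kept = clip_by_max_norm([1.0, 1.0, 1.0], 2.0)     # norm sqrt(3) < 2
clipped = clip_by_max_norm([3.0, 3.0, 3.0], 2.0)  # norm sqrt(27) > 2
print(kept)
print(clipped)
```

The clipped vector has components of about 1.1547, in line with the TensorFlow output above, and its L2 norm is 2.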

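III. embedding_lookup_sparse

1. API Overview

params is the embedding dictionary; sp_ids is a SparseTensor holding the ids to look up; sp_weights lines up with sp_ids and gives the weight of each id, somewhat like an attention mechanism; finally the combiner aggregates the weighted embeddings of each row into the output embedding. max_norm was demonstrated above, so it is not repeated here.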

tf.nn.embedding_lookup_sparse(
    params, sp_ids, sp_weights, combiner=None, max_norm=None, name=None
)

A dense tensor representing the combined embeddings for the sparse ids. For each row in the dense tensor represented by sp_ids, 
the op looks up the embeddings for all ids in that row, multiplies them by the corresponding weight, and combines these embeddings as specified.

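params is the embedding dictionary, and the ids in sp_ids select embeddings from it. Each embedding is first multiplied by its weight, then the embeddings of each row are aggregated by the combiner. The available combiners are listed in the official API, and the default is mean:

Source code for mean, sqrtn and sum: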

      embeddings *= weights

      if combiner == "sum":
        embeddings = math_ops.segment_sum(embeddings, segment_ids, name=name)
      elif combiner == "mean":
        embeddings = math_ops.segment_sum(embeddings, segment_ids)
        weight_sum = math_ops.segment_sum(weights, segment_ids)
        embeddings = math_ops.divide(embeddings, weight_sum, name=name)

      elif combiner == "sqrtn":
        embeddings = math_ops.segment_sum(embeddings, segment_ids)
        weights_squared = math_ops.pow(weights, 2)
        weight_sum = math_ops.segment_sum(weights_squared, segment_ids)
        weight_sum_sqrt = math_ops.sqrt(weight_sum)
        embeddings = math_ops.divide(embeddings, weight_sum_sqrt, name=name)
      else:
        assert False, "Unrecognized combiner"
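
As a reading aid for the excerpt above, the following plain-Python sketch applies the same three combiners to the embeddings of a single sp_ids row. `combine` is a hypothetical helper and the two-dimensional rows are made-up values; it mirrors the arithmetic of the excerpt under those assumptions, not TensorFlow's actual implementation.

```python
import math

def combine(rows, weights, combiner="mean"):
    """Combine the looked-up rows of ONE sp_ids row (hypothetical helper)."""
    dim = len(rows[0])
    # embeddings *= weights, then a sum over the row (the segment_sum step)
    weighted_sum = [sum(w * row[k] for row, w in zip(rows, weights))
                    for k in range(dim)]
    if combiner == "sum":
        return weighted_sum
    if combiner == "mean":
        denom = sum(weights)                            # sum of weights
    elif combiner == "sqrtn":
        denom = math.sqrt(sum(w * w for w in weights))  # sqrt of sum of squared weights
    else:
        raise ValueError("Unrecognized combiner")
    return [v / denom for v in weighted_sum]

rows = [[0.3, 0.3], [0.2, 0.2]]   # illustrative embeddings of two ids
sum_out = combine(rows, [1.0, 1.0], "sum")
mean_out = combine(rows, [1.0, 2.0], "mean")
sqrtn_out = combine(rows, [1.0, 1.0], "sqrtn")
print(sum_out, mean_out, sqrtn_out)
```

With these inputs, sum gives 0.5 per component, mean gives 0.7 / 3, and sqrtn gives 0.5 / sqrt(2).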

2.sparse_tensor & sp_ids

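A sparse tensor stores only the positions that hold a value. A conventional dense tensor needs a value for every index (i, j), whereas a SparseTensor only lists the coordinates (i, j) that have a value. For sp_ids these coordinates are given by its indices argument, and every other position is empty (a default value or None).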

    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[2, 1, 1, 1],
                          dense_shape=[2, 4])
    print("Indices")
    print(indices)

SparseTensor(indices=tf.Tensor(
[[0 1]
 [0 3]
 [1 2]
 [1 3]], shape=(4, 2), dtype=int64), values=tf.Tensor([2 1 1 1], shape=(4,), dtype=int32), dense_shape=tf.Tensor([2 4], shape=(2,), dtype=int64))

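The code above creates a sparse tensor with dense shape (2, 4). The indices tensor has shape (4, 2): its four entries [0,1], [0,3], ... are the coordinates that hold a value, and values gives the value at each coordinate, in the same order. Converted to a dense array, it can be read as: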

[[None, 2, None, 1],
 [None, None, 1, 1]]

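Tips:

Empty positions are shown as None rather than 0, because 0 is meaningful here: embedding_lookup_sparse treats a value of 0 as id 0 and fetches the first embedding in params. This differs from a sparse tensor used in ordinary numeric computation, where missing entries stand for 0, so take extra care when using it.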
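
One way to read sp_ids is as a list of ids per row. The sketch below groups the values by the row part of each coordinate, in plain Python; `ids_per_row` is a hypothetical helper, and the column part of each coordinate is ignored on the assumption that it only distinguishes entries within a row.

```python
def ids_per_row(indices, values, num_rows):
    """Group sparse values by the row part of their coordinate (hypothetical helper)."""
    rows = [[] for _ in range(num_rows)]
    for (row, _col), value in zip(indices, values):
        rows[row].append(value)
    return rows

# Same coordinates and values as the SparseTensor above
indices = [[0, 1], [0, 3], [1, 2], [1, 3]]
values = [2, 1, 1, 1]

grouped = ids_per_row(indices, values, num_rows=2)
print(grouped)  # [[2, 1], [1, 1]]
```

Row 0 therefore looks up ids 2 and 1, and row 1 looks up id 1 twice. An id of 0 would appear in these lists like any other id, which is why empty positions must not be written as 0.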

3.sp_weights

sp_weights: either a `SparseTensor` of float / double weights, or `None` to indicate all weights should be taken to be 1. 
If specified, `sp_weights` must have exactly the same shape and indices as `sp_ids`.

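These are the weights for the ids in the sparse tensor. sp_weights is either a SparseTensor whose values are float or double weights, or None. The weights scale the embeddings: if None is passed, every weight defaults to 1, and giving one embedding a higher weight is comparable to the attention mechanism in deep learning. Note that sp_weights must have exactly the same shape and indices as sp_ids. As the source above shows, every combiner is preceded by the weighting step embeddings *= weights.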

    weights = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[1, 2, 2, 2],
                          dense_shape=[2, 4])

4.Sum Demo

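Start with the simplest combiner, sum. sp_ids is the 2 x 4 sparse tensor from subsection 2, and to make the result easy to check, the embedding dictionary consists of three simple collinear vectors: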

    params = tf.constant([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                          [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])


    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[2, 1, 1, 1],
                          dense_shape=[2, 4])

    emb = tf.nn.embedding_lookup_sparse(params, indices, None, combiner='sum')
    print(emb)

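The first row of the sparse tensor holds ids 2 and 1, so the first output embedding is params[2] + params[1]. The second row holds ids 1 and 1, so the second output embedding is params[1] + params[1]. For the implementation, see the line embeddings = math_ops.segment_sum(embeddings, segment_ids, name=name) in the source above.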

tf.Tensor(
[[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4]], shape=(2, 10), dtype=float32)

This result is easy to verify in your head.

5.Mean Demo

For mean, add weights and see how they take effect:

    params = tf.constant([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                          [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])

    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[2, 1, 1, 1],
                          dense_shape=[2, 4])

    weights = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[1, 2, 2, 2],
                          dense_shape=[2, 4])

    emb = tf.nn.embedding_lookup_sparse(params, indices, weights, combiner='mean')
    print(emb)

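The weights can be read as the dense matrix below. Unlike in sp_ids, the 0 entries here carry no meaning: weighting is applied only at the positions present in sp_ids, so the zeros do not affect the computation.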

[[0, 1, 0, 2],
 [0, 0, 2, 2]]

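From the source, the execution order is as follows. Tips: the text in [] traces the first row.

(1) embeddings *= weights applies the weights [the ids in sp_ids select 0.3... and 0.2..., which become 0.3... * 1 = 0.3... and 0.2... * 2 = 0.4...]

(2) embeddings = math_ops.segment_sum(embeddings, segment_ids) sums the weighted embeddings [0.3... + 0.4... gives 0.7...]

(3) weight_sum = math_ops.segment_sum(weights, segment_ids) sums the weights [1 + 2 gives a weight sum of 3]

(4) embeddings = math_ops.divide(embeddings, weight_sum, name=name) divides the weighted embedding from (2) by the weight sum from (3) [0.7 / 3 gives 0.233...; the other row gives 0.2 by the same logic]

Following the steps above and checking them against params and weights yields the final result of step (4).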
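
Written out per component, the weighted mean for the two rows of this demo is (weights 1, 2 on ids 2, 1 in the first row; weights 2, 2 on ids 1, 1 in the second):

$row_0 = \frac{1 \cdot 0.3 + 2 \cdot 0.2}{1 + 2} = \frac{0.7}{3} \approx 0.2333$

$row_1 = \frac{2 \cdot 0.2 + 2 \cdot 0.2}{2 + 2} = \frac{0.8}{4} = 0.2$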

6.Sqrtn Demo

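sqrtn is similar to mean, except that the weights are squared, summed and then square-rooted, rather than summed directly. To keep the arithmetic simple, sp_weights is set to all ones here; params and sp_ids are the same as in the demo above.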

    params = tf.constant([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                          [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])


    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[2, 1, 1, 1],
                          dense_shape=[2, 4])

    weights = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[1, 1, 1, 1],
                          dense_shape=[2, 4])

    emb = tf.nn.embedding_lookup_sparse(params, indices, weights, combiner='sqrtn')
    print(emb)

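From the source, the execution order is as follows. Tips: the text in [] traces the first row.

(1) embeddings = math_ops.segment_sum(embeddings, segment_ids) sums the weighted embeddings of each row [all weights are 1, so the looked-up values are unchanged: 0.3... + 0.2... = 0.5...]

(2) weights_squared = math_ops.pow(weights, 2) squares the weights [1^2 = 1, 1^2 = 1]

(3) weight_sum = math_ops.segment_sum(weights_squared, segment_ids) sums the squared weights [1 + 1 = 2]

(4) weight_sum_sqrt = math_ops.sqrt(weight_sum) takes the square root of the sum [sqrt(2)]

(5) embeddings = math_ops.divide(embeddings, weight_sum_sqrt, name=name) divides the weighted embedding by the root from (4) [0.5... / 1.414 = 0.35355339059327373..., i.e. 0.5 / math.pow(2, 0.5)]

Following the steps above and checking them against params and weights yields the final result of step (5).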
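
Written out per component, with all weights equal to 1, the two rows of this demo are:

$row_0 = \frac{1 \cdot 0.3 + 1 \cdot 0.2}{\sqrt{1^2 + 1^2}} = \frac{0.5}{\sqrt{2}} \approx 0.3536$

$row_1 = \frac{1 \cdot 0.2 + 1 \cdot 0.2}{\sqrt{1^2 + 1^2}} = \frac{0.4}{\sqrt{2}} \approx 0.2828$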

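7. Output Shape

In the demos above the output always has shape 2 x m (m = params.shape[1]). Where the 2 comes from is explained in the official documentation:

(1) The shape of the combined params is [p0, p1, ..., pm]. Here p0 is the size of params along dimension 0, that is, the number of embeddings in the dictionary, and all ids are assumed to lie in the range [0, p0).

(2) The shape of sp_ids (taking the demos above as an example) is 2 x 4, so d0 = 2 and d1 = 4.

(3) shape(output) = [d0, p1, ..., pm], so the leading output dimension is d0 = 2, giving 2 x m. p0 does not appear in the output shape; it only bounds the valid ids.

Based on this, the number of output vectors is determined by sp_ids.shape[0]. Now try changing 2 x 4 to 3 x 4: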

    params = tf.constant([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                          [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])


    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3],
                                   [2, 1]],
                          values=[2, 1, 1, 1, 1],
                          dense_shape=[3, 4])
    print("Indices")
    print(indices)
    emb = tf.nn.embedding_lookup_sparse(params, indices, None, combiner='sum')
    print(emb)

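After changing the shape of sp_ids to 3 x 4, the output also changes from 2 x 10 to 3 x 10.

Note, however, that if only the dense_shape of sp_ids is changed and the added row holds no values, the output shape follows the rows of sp_ids that actually contain values. In the example below dense_shape is [3, 4] but only rows 0 and 1 have ids, so the output is still 2 x 10: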

    params = tf.constant([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                          [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
                          [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]])


    indices = tf.SparseTensor(indices=[[0, 1],
                                   [0, 3],
                                   [1, 2],
                                   [1, 3]],
                          values=[2, 1, 1, 1],
                          dense_shape=[3, 4])
    emb = tf.nn.embedding_lookup_sparse(params, indices, None, combiner='sum')
    print(emb)
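
A plausible explanation, assuming the op aggregates with segment_sum as in the source excerpt earlier, is that the number of output rows comes from the largest row index that actually appears in sp_ids, not from dense_shape. The plain-Python sketch below imitates that behaviour; `segment_sum` here is a simplified stand-in written for illustration, and only one embedding dimension is shown.

```python
def segment_sum(data, segment_ids):
    """Sum rows of `data` sharing a segment id (simplified stand-in, sorted ids assumed)."""
    num_segments = max(segment_ids) + 1   # output rows follow the largest id seen
    dim = len(data[0])
    out = [[0.0] * dim for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        for k, v in enumerate(row):
            out[seg][k] += v
    return out

# Looked-up embeddings for ids 2, 1, 1, 1 (one dimension shown)
looked_up = [[0.3], [0.2], [0.2], [0.2]]

# Row indices 0, 0, 1, 1: the empty third row never appears, so 2 output rows
two_rows = segment_sum(looked_up, [0, 0, 1, 1])

# Adding an entry in row 2 (id 1) produces a third output row
three_rows = segment_sum(looked_up + [[0.2]], [0, 0, 1, 1, 2])

print(len(two_rows), len(three_rows))  # 2 3
```

Under this assumption, declaring dense_shape=[3, 4] without any entry in row 2 gives the aggregation nothing to place in a third row, which is consistent with the 2 x 10 output described above.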

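IV. Summary

That covers embedding_lookup. In essence it behaves like a Layer that fetches the required embeddings and combines them. lookup and lookup_sparse suit different scenarios: lookup_sparse is the better fit for large-scale sparse features, while lookup is mostly used to fetch embeddings loaded from pre-trained vectors. Because of version differences, some code and APIs may not match exactly; questions and discussion are welcome.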