Fuzzy Extractor: Code Walkthrough and Implementation

MrCharles

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/MrCharles/article/details/119872958

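This article introduces the fuzzy extractor, which tolerates a certain amount of noise in its input yet still produces a uniformly distributed random string. It explains how fuzzy extractors work, their correctness and security properties, and how they compare with secure sketches. It also walks through a Python implementation based on the 2016 paper by Canetti et al., highlighting how the implementation depends on masks and which security considerations apply.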

For the theory behind fuzzy extractors, see this earlier blog post: https://blog.csdn.net/MrCharles/article/details/108734526

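Secrets used in cryptographic schemes are usually random strings that must be uniformly distributed and exactly reproducible whenever they are needed. Real-world secrets rarely satisfy both requirements. Biometric features such as fingerprints are not uniformly distributed, and they cannot be reproduced exactly each time (every fingerprint reading contains some error). The most common way to authenticate users is a password: a short password is easy to remember but has low entropy and poor security, while a long passphrase is hard to remember and is still not uniformly random. Bridging this gap requires a method that converts such real-world secrets into the uniformly distributed random values that a cryptosystem needs. The fuzzy extractor described here meets that requirement.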

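A fuzzy extractor (FE) tolerates a certain amount of noise (errors) in its input: as long as two inputs are close enough, it extracts the same uniformly random string from both. The overall construction is as follows:
[Figure: fuzzy extractor construction (Gen and Rep)]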

Gen: takes w as input and outputs public helper data P and a uniformly random value R of l bits.

Rep: given P and an input w', regenerates the uniformly random value R.

Correctness: if dis(w, w') <= t, R is reconstructed exactly; if dis(w, w') > t, no guarantee is made about the output of Rep.

Security: the helper data P does not leak much information about R, and the distribution of R is close to uniform.

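Application: the R extracted from w can be used as a key that never needs to be stored; it is simply recovered from w' the next time it is needed. w can be a biometric fingerprint, a physical fingerprint such as a PUF, or other secret material (for example, a passphrase the user only roughly remembers).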

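A secure sketch (SS) allows exact reconstruction from a noisy input. Unlike an FE, an SS reconstructs the original input itself and does nothing about its non-uniform distribution. The overall construction is as follows:
[Figure: secure sketch construction (SS and Rec)]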

SS: takes w as input and outputs s, which can be made public.

Rec: given s and an input w' that is close to w, recovers w.

Correctness: if dis(w, w') <= t, w is recovered exactly; if dis(w, w') > t, no guarantee is made about the output of Rec.

Security: s does not leak much information about w.
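
To make the SS/Rec pair concrete, here is a minimal sketch of one well-known way to build a secure sketch, the code-offset construction, using a 3x repetition code. The post does not specify a particular construction, so the choice of code and the names `sketch`, `recover`, and `REP` are assumptions made for illustration only. With this toy code, recovery is only guaranteed when each 3-bit group of w' contains at most one flipped bit.

```python
import secrets

REP = 3  # hypothetical parameter: each message bit is repeated 3 times


def encode(bits):
    """Repetition-code encoder: repeat every bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(bits):
    """Majority vote over each group of REP bits."""
    return [int(sum(bits[i:i + REP]) > REP // 2)
            for i in range(0, len(bits), REP)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def sketch(w):
    """SS: pick a random codeword c and publish s = w XOR c."""
    msg = [secrets.randbits(1) for _ in range(len(w) // REP)]
    return xor(w, encode(msg))


def recover(w_prime, s):
    """Rec: w' XOR s is c plus noise; decode it back to c, then return c XOR s = w."""
    c = encode(decode(xor(w_prime, s)))
    return xor(c, s)


w = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0]   # length must be a multiple of REP
s = sketch(w)
w_noisy = list(w)
w_noisy[0] ^= 1   # one flipped bit in the first group
w_noisy[5] ^= 1   # one flipped bit in the second group
print(recover(w_noisy, s) == w)
```

Note that `recover` returns w itself, which is exactly why a secure sketch alone does not solve the non-uniformity problem.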

An SS in fact implies an FE: as the figure below shows, adding a strong randomness extractor Ext turns an SS into an FE:

[Figure: constructing an FE from an SS and a strong extractor Ext]

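As seen above, both an SS and an FE can reconstruct an exact earlier value, but an SS reconstructs the original input itself, which is not uniformly random. Adding a strong randomness extractor Ext yields an FE from an SS. Here x is the random seed of Ext, and x together with SS(w) serves as the FE's helper data. In practice, Ext can be implemented with HMAC or a KDF.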
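
The SS-plus-Ext construction can be sketched in a few lines. This is only an illustration under stated assumptions: the secure sketch is a toy code-offset scheme with a 3x repetition code (not something the post prescribes), Ext is HMAC-SHA256 keyed with the seed x (HMAC is suggested above, but treating it as a strong extractor is a heuristic), and the function names `ss`, `rec`, `ext`, `gen`, and `rep` are hypothetical.

```python
import hashlib
import hmac
import secrets

REP = 3  # hypothetical: 3x repetition code inside the toy secure sketch


def _encode(bits):
    return [b for b in bits for _ in range(REP)]


def _decode(bits):
    return [int(sum(bits[i:i + REP]) > REP // 2)
            for i in range(0, len(bits), REP)]


def _xor(a, b):
    return [p ^ q for p, q in zip(a, b)]


def ss(w):
    """Toy secure sketch: s = w XOR (random codeword)."""
    msg = [secrets.randbits(1) for _ in range(len(w) // REP)]
    return _xor(w, _encode(msg))


def rec(w_prime, s):
    """Recover w from a close reading w' and the sketch s."""
    return _xor(_encode(_decode(_xor(w_prime, s))), s)


def ext(w, x, out_len=16):
    """Ext stand-in: HMAC-SHA256 of w (one byte per bit) keyed with seed x."""
    return hmac.new(x, bytes(w), hashlib.sha256).digest()[:out_len]


def gen(w):
    """Gen: returns (R, P) where the helper data P = (SS(w), x)."""
    s = ss(w)
    x = secrets.token_bytes(16)
    return ext(w, x), (s, x)


def rep(w_prime, helper):
    """Rep: recover w with the sketch, then re-run Ext with the same seed."""
    s, x = helper
    return ext(rec(w_prime, s), x)


w = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0]
R, P = gen(w)
w_noisy = list(w)
w_noisy[4] ^= 1                      # a single bit error
print(rep(w_noisy, P) == R)
```

Only P needs to be stored; R is recomputed from a fresh reading whenever it is needed.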

With these concepts in place, we can implement a very typical fuzzy extractor.

Below is a typical fuzzy extractor (it resembles key binding, but the two differ):
[Figure: a typical fuzzy extractor construction]
The code is taken from:
Canetti, Ran, et al. “Reusable fuzzy extractors for low-entropy distributions.” Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, Berlin, Heidelberg, 2016.
GitHub：https://github.com/carter-yagemann/python-fuzzy-extractor

Key generation:

    def generate(self, value):
        """Takes a source value and produces a key and public helper
        This method should be used once at enrollment.
        Note that the "public helper" is actually a tuple. This whole tuple should be
        passed as the helpers argument to reproduce().
        :param value: the value to generate a key and public helper for.
        :rtype: (key, helper)
        """
        if isinstance(value, str):
            value = value.encode('utf-8')
        if isinstance(value, bytes):
            value = np.frombuffer(value, dtype=np.uint8)

        key = np.frombuffer(urandom(self.length), dtype=np.uint8)  # the key k from the algorithm above
        key_pad = np.concatenate((key, np.zeros(self.sec_len, dtype=np.uint8)))

        nonces = np.zeros((self.num_helpers, self.nonce_len), dtype=np.uint8)  # one random nonce per helper
        masks = np.zeros((self.num_helpers, self.length), dtype=np.uint8)  # one random mask per helper
        digests = np.zeros((self.num_helpers, self.cipher_len), dtype=np.uint8)  # holds the hashed data

        for helper in range(self.num_helpers):
            nonces[helper] = np.frombuffer(urandom(self.nonce_len), dtype=np.uint8)  # fill with random bytes
            masks[helper] = np.frombuffer(urandom(self.length), dtype=np.uint8)  # fill with random bytes

        # By masking the value with random masks, we adjust the probability that given
        # another noisy reading of the same source, enough bits will match for the new
        # reading & mask to equal the old reading & mask.

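        # Bitwise AND of each mask with the input. This is the clever part: a mask
        # zeroes out some bit positions, so a later noisy reading still matches
        # whenever all of its flipped bits are masked out. The masks are stored as
        # helper data and reused at recovery.
        vectors = np.bitwise_and(masks, value)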

        # The "digital locker" is a simple crypto primitive made by hashing a "key"
        # xor a "value". The only efficient way to get the value back is to know
        # the key, which can then be hashed again xor the ciphertext. This is referred
        # to as locking and unlocking the digital locker, respectively.

        for helper in range(self.num_helpers):
            d_vector = vectors[helper].tobytes()
            d_nonce = nonces[helper].tobytes()
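            # Hash the masked input d_vector, using the random nonce d_nonce as the salt
            digest = pbkdf2_hmac(self.hash_func, d_vector, d_nonce, 1, self.cipher_len)
            digests[helper] = np.frombuffer(digest, dtype=np.uint8)  # store the hash

        # XOR each hash with the padded key. The order is mask, then hash, then XOR,
        # so any reading that agrees on the unmasked bits reproduces the same digest.
        ciphers = np.bitwise_xor(digests, key_pad)

        return (key.tobytes(), (ciphers, masks, nonces))  # the key and the public helper tuple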
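
To see why the masking step helps, the following small simulation estimates how often a random mask hides every differing bit between an enrolled value and a noisy reading. It assumes, as in the excerpt above, that masks are uniformly random bits, so each bit position survives with probability 1/2 and a reading with t flipped bits should match under a single mask with probability about 2^-t. The function name `match_rate` and its parameters are hypothetical.

```python
import random


def match_rate(n_bits, flipped, trials=20000, seed=0):
    """Fraction of uniformly random masks under which w & mask == w' & mask."""
    rng = random.Random(seed)
    w = rng.getrandbits(n_bits)

    # Build the noisy reading w' by flipping `flipped` distinct bit positions of w.
    diff = 0
    for pos in rng.sample(range(n_bits), flipped):
        diff |= 1 << pos
    w_noisy = w ^ diff

    hits = 0
    for _ in range(trials):
        mask = rng.getrandbits(n_bits)
        if (w & mask) == (w_noisy & mask):
            hits += 1
    return hits / trials


# With 3 flipped bits the rate should be close to 2**-3 = 0.125.
print(match_rate(128, 3))
```

Because a single mask succeeds only with small probability, the implementation stores many independent masks and tries each one at recovery.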

Key recovery:

    def reproduce(self, value, helpers):
        """Takes a source value and a public helper and produces a key
        Given a helper value that matches and a source value that is close to
        those produced by generate, the same key will be produced.
        :param value: the value to reproduce a key for.
        :param helpers: the previously generated public helper.
        :rtype: key or None
        """
        if isinstance(value, str):
            value = value.encode('utf-8')
        if isinstance(value, bytes):
            value = np.frombuffer(value, dtype=np.uint8)

        if self.length != len(value):
            raise ValueError('Cannot reproduce key for value of different length')

        ciphers = helpers[0]
        masks = helpers[1]
        nonces = helpers[2]

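        # First AND the input with the masks. These are the same masks stored at
        # enrollment, and they are what makes reconstruction possible. The input
        # value here is a noisy reading of the enrolled value.
        vectors = np.bitwise_and(masks, value)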

        digests = np.zeros((self.num_helpers, self.cipher_len), dtype=np.uint8)
        for helper in range(self.num_helpers):
            d_vector = vectors[helper].tobytes()
            d_nonce = nonces[helper].tobytes()
            digest = pbkdf2_hmac(self.hash_func, d_vector, d_nonce, 1, self.cipher_len)  # hash with the same d_nonce as at enrollment
            digests[helper] = np.frombuffer(digest, dtype=np.uint8)

        # XOR with the stored ciphers to get the unlocked plains. If value is close
        # enough to the enrolled value, at least one row of plains should equal the
        # original padded key.
        plains = np.bitwise_xor(digests, ciphers)

        # When the key was stored in the digital lockers, extra null bytes were added
        # onto the end, which makes it easy to detect if we've successfully unlocked
        # the locker.

        checks = np.sum(plains[:, -self.sec_len:], axis=1)
        for check in range(self.num_helpers):
            if checks[check] == 0:
                return plains[check, :-self.sec_len].tobytes()

        return None
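
The "digital locker" used by generate() and reproduce() can be isolated into a standalone sketch. This is a simplified illustration rather than the library's API: it assumes hashlib's `pbkdf2_hmac` as the hash (its call signature matches the calls above), works on plain bytes instead of NumPy arrays, and the names `lock`, `unlock`, and `SEC_LEN` are hypothetical. `SEC_LEN = 2` mirrors the default `sec_len` of the implementation.

```python
import hashlib
import os

SEC_LEN = 2  # number of trailing zero bytes used to detect a successful unlock


def _pad(locker_key, nonce, out_len):
    """Derive a pseudorandom pad from the locker key and a nonce (salt)."""
    return hashlib.pbkdf2_hmac('sha256', locker_key, nonce, 1, out_len)


def lock(locker_key, secret):
    """Lock `secret`: XOR (secret || zero bytes) with a pad derived from locker_key."""
    nonce = os.urandom(16)
    padded = secret + bytes(SEC_LEN)
    pad = _pad(locker_key, nonce, len(padded))
    cipher = bytes(a ^ b for a, b in zip(pad, padded))
    return cipher, nonce


def unlock(locker_key, cipher, nonce):
    """Return the secret if the trailing check bytes are all zero, else None."""
    pad = _pad(locker_key, nonce, len(cipher))
    plain = bytes(a ^ b for a, b in zip(pad, cipher))
    if plain[-SEC_LEN:] == bytes(SEC_LEN):
        return plain[:-SEC_LEN]
    return None


secret = b'sixteen byte key'
cipher, nonce = lock(b'masked reading', secret)
print(unlock(b'masked reading', cipher, nonce) == secret)
print(unlock(b'another reading', cipher, nonce))
```

With a wrong locker key the check bytes are effectively random, so `unlock` returns None except with probability about 2^-16 for two check bytes. In the fuzzy extractor, the locker key is the masked reading and the secret is the key k.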

Example:

from fuzzy_extractor import FuzzyExtractor
extractor = FuzzyExtractor(16, 8)

key, helper = extractor.generate('AABBCCDDEEFFGGHH')

r_key = extractor.reproduce('AABBCCDDEEFFGGHH', helper)  # r_key should equal key
r_key = extractor.reproduce('AABBCCDDEEFFGGHI', helper)  # r_key will probably still equal key!
r_key = extractor.reproduce('AAAAAAAAAAAAAAAA', helper)  # r_key is no longer likely to equal key
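
It is worth checking how much helper data the example parameters imply. The sketch below re-implements the helper-count formula from the constructor in the complete code and multiplies by the per-helper size (cipher, mask, and nonce) using the default `sec_len=2` and `nonce_len=16`. The function names `num_helpers` and `helper_bytes` are hypothetical and not part of the library.

```python
from math import log


def num_helpers(length, ham_err, rep_err=0.001):
    """Same formula as FuzzyExtractor.__init__: bits**(ham_err/ln(bits)) * log2(2/rep_err)."""
    bits = length * 8
    const = float(ham_err) / log(bits)
    return int(round((bits ** const) * log(2.0 / rep_err, 2)))


def helper_bytes(length, ham_err, rep_err=0.001, sec_len=2, nonce_len=16):
    """Total size of (ciphers, masks, nonces) in bytes."""
    per_helper = (length + sec_len) + length + nonce_len
    return num_helpers(length, ham_err, rep_err) * per_helper


print(num_helpers(16, 8))    # helper count for FuzzyExtractor(16, 8)
print(helper_bytes(16, 8))   # total public helper size in bytes
```

For `FuzzyExtractor(16, 8)` this comes to roughly 32,700 helpers and about 1.6 MB of helper data. Since bits ** (ham_err / ln(bits)) equals e ** ham_err, the count in this formula grows exponentially with `ham_err` and does not depend on `length`, so large error tolerances quickly become expensive.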

Complete code:

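# Assumed imports (the listing omits them). hashlib's pbkdf2_hmac has the
# call signature used below.
from hashlib import pbkdf2_hmac
from math import log
from os import urandom

import numpy as np


class FuzzyExtractor(object):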
    """The most basic non-interactive fuzzy extractor"""

    def __init__(self, length, ham_err, rep_err=0.001, **locker_args):
        """Initializes a fuzzy extractor
        :param length: The length in bytes of source values and keys.
        :param ham_err: Hamming error. The number of bits that can be flipped in the
            source value and still produce the same key with probability (1 - rep_err).
        :param rep_err: Reproduce error. The probability that a source value within
            ham_err will not produce the same key (default: 0.001).
        :param locker_args: Keyword arguments to pass to the underlying digital lockers.
            See parse_locker_args() for more details.
        """
        self.parse_locker_args(**locker_args)
        self.length = length
        self.cipher_len = self.length + self.sec_len

        # Calculate the number of helper values needed to be able to reproduce
        # keys given ham_err and rep_err. See "Reusable Fuzzy Extractors for
        # Low-Entropy Distributions" by Canetti, et al. for details.
        bits = length * 8
        const = float(ham_err) / log(bits)
        num_helpers = (bits ** const) * log(float(2) / rep_err, 2)

        # num_helpers needs to be an integer
        self.num_helpers = int(round(num_helpers))

    def parse_locker_args(self, hash_func='sha256', sec_len=2, nonce_len=16):
        """Parse arguments for digital lockers
        :param hash_func: The hash function to use for the digital locker (default: sha256).
        :param sec_len: security parameter. This is used to determine if the locker
            is unlocked successfully with accuracy (1 - 2 ^ -sec_len).
        :param nonce_len: Length in bytes of nonce (salt) used in digital locker (default: 16).
        """
        self.hash_func = hash_func
        self.sec_len = sec_len
        self.nonce_len = nonce_len

    def generate(self, value):
        """Takes a source value and produces a key and public helper
        This method should be used once at enrollment.
        Note that the "public helper" is actually a tuple. This whole tuple should be
        passed as the helpers argument to reproduce().
        :param value: the value to generate a key and public helper for.
        :rtype: (key, helper)
        """
        if isinstance(value, str):
            value = value.encode('utf-8')
        if isinstance(value, bytes):
            value = np.frombuffer(value, dtype=np.uint8)

        key = np.frombuffer(urandom(self.length), dtype=np.uint8)
        key_pad = np.concatenate((key, np.zeros(self.sec_len, dtype=np.uint8)))

        nonces = np.zeros((self.num_helpers, self.nonce_len), dtype=np.uint8)
        masks = np.zeros((self.num_helpers, self.length), dtype=np.uint8)
        digests = np.zeros((self.num_helpers, self.cipher_len), dtype=np.uint8)

        for helper in range(self.num_helpers):
            nonces[helper] = np.frombuffer(urandom(self.nonce_len), dtype=np.uint8)
            masks[helper] = np.frombuffer(urandom(self.length), dtype=np.uint8)

        # By masking the value with random masks, we adjust the probability that given
        # another noisy reading of the same source, enough bits will match for the new
        # reading & mask to equal the old reading & mask.

        vectors = np.bitwise_and(masks, value)

        # The "digital locker" is a simple crypto primitive made by hashing a "key"
        # xor a "value". The only efficient way to get the value back is to know
        # the key, which can then be hashed again xor the ciphertext. This is referred
        # to as locking and unlocking the digital locker, respectively.

        for helper in range(self.num_helpers):
            d_vector = vectors[helper].tobytes()
            d_nonce = nonces[helper].tobytes()
            digest = pbkdf2_hmac(self.hash_func, d_vector, d_nonce, 1, self.cipher_len)
            digests[helper] = np.frombuffer(digest, dtype=np.uint8)

        ciphers = np.bitwise_xor(digests, key_pad)

        return (key.tobytes(), (ciphers, masks, nonces))

    def reproduce(self, value, helpers):
        """Takes a source value and a public helper and produces a key
        Given a helper value that matches and a source value that is close to
        those produced by generate, the same key will be produced.
        :param value: the value to reproduce a key for.
        :param helpers: the previously generated public helper.
        :rtype: key or None
        """
        if isinstance(value, str):
            value = value.encode('utf-8')
        if isinstance(value, bytes):
            value = np.frombuffer(value, dtype=np.uint8)

        if self.length != len(value):
            raise ValueError('Cannot reproduce key for value of different length')

        ciphers = helpers[0]
        masks = helpers[1]
        nonces = helpers[2]

        vectors = np.bitwise_and(masks, value)

        digests = np.zeros((self.num_helpers, self.cipher_len), dtype=np.uint8)
        for helper in range(self.num_helpers):
            d_vector = vectors[helper].tobytes()
            d_nonce = nonces[helper].tobytes()
            digest = pbkdf2_hmac(self.hash_func, d_vector, d_nonce, 1, self.cipher_len)
            digests[helper] = np.frombuffer(digest, dtype=np.uint8)

        plains = np.bitwise_xor(digests, ciphers)

        # When the key was stored in the digital lockers, extra null bytes were added
        # onto the end, which makes it easy to detect if we've successfully unlocked
        # the locker.

        checks = np.sum(plains[:, -self.sec_len:], axis=1)
        for check in range(self.num_helpers):
            if checks[check] == 0:
                return plains[check, :-self.sec_len].tobytes()

        return None