Hausdorff 距离

最新推荐文章于 2025-06-06 21:53:37 发布

有为少年

最新推荐文章于 2025-06-06 21:53:37 发布

阅读量1.4k

点赞数 18

CC 4.0 BY-SA版权

分类专栏：深度学习基础知识文章标签：人工智能计算机视觉深度学习神经网络

本文链接：https://blog.csdn.net/P_LarT/article/details/145226053

深度学习基础知识专栏收录该内容

11 篇文章

订阅专栏

Hausdorff 距离

本文的内容主要围绕目标定位经典工作 Locating Objects Without Bounding Boxes 展开，着重于介绍 Hausdorff Distance 相关的知识。

Hausdorff Distance

Pasted image 20220714184814.png

这是一种用于度量两个点集的距离的度量方式。其已经被广泛应用于多种任务中，包括字符识别、人脸识别、以及场景匹配等。

$\Omega$ 表示所有可能点的空间。对于 2D 图像这样的二维平面可能就是 $\mathbb{R}^{2}$ 。对于两个包含点数可能不同的点集 $X$ 和 $Y$ ，有针对其中的点的距离度量 $d (x, y)$ ，则具体计算形式为：

$d_H(X,Y) = \max \{ \sup_{x \in X} \inf_{y \in Y} d(x,y), \sup_{y \in Y} \inf_{x \in X} d(x,y) \}$

其中 $\sup$ 为上确界操作， $\inf$ 为下确界操作，对于我们关注的图像中的计算所对应的有限点集而言，分别可以简单理解为最大值和最小值。且两点之间距离最大为图像的对角线长度。而这也可以认为是两个点集之间 Hausdorff 距离可能的上界：

$d_H(X,Y) \le d_{max} = \max_{x \in \Omega, y \in \Omega} d(x,y)$

这个度量的计算过程可以简单归纳为如下几步：

$\inf_{y \in Y} d(x,y)$ ：对于每个 $x$ ，寻找距离最近（下确界）的 $y$ 所对应的距离。
$\sup_{x \in X} \inf_{y \in Y} d(x,y)$ ：从所有的 $x$ 所对应的距离下界中寻找最大值（上确界）。
$\inf_{x \in X} d(x,y)$ ：对于每个 $y$ ，寻找距离最近（下确界）的 $x$ 所对应的距离。
$\sup_{y \in Y} \inf_{x \in X} d(x,y)$ ：从所有的 $y$ 所对应的距离下界中寻找最大值（上确界）。
$\max \{ \dots \}$ ：对两部分计算结果选择最大值。

这一计算过程直观上可以简单理解为，如果一个点集中的每个点都非常接近于另一个点集中的一些点，那么两个点集就可以认为很接近。

对于 $\in \Omega$ ，Hausdorff 距离满足：

$d_H(X,Y) \ge 0$
$d_H(X,Y)=0 \Leftrightarrow X=Y$
$d_H(X,Y)=d_H(Y,X)$
$d_H(X,Y) \le d_H(X,Z)+d_H(Z,Y)$

由于涉及到最大距离的选择，所以 Hausdorff 距离对于异常点是很敏感的。

Average Hausdorff Distance

为了避免这一点，平均 Hausdorff 距离成为了更常用的选择：

$d_{AH} = \frac{1}{|X|}\sum_{x \in X} \min_{y \in Y} d(x,y) + \frac{1}{|Y|}\sum_{y \in Y} \min_{x \in X} d(x,y)$

这里的两项分别对两个点集中点的数量计算了平均最短距离。

这一形式仍然满足前面四条属性中的前三条，但是不再满足第四条了。并且也因此，Hausdorff 距离对于两个集合中的任意点都是可微分的。

让 $Y$ 表示包含真值点坐标的集合， $X$ 作为模型的预测。理想情况下，可以会使用平均 Hausdorff 距离作为损失函数用于训练过程，但是其作为损失函数存在限制。由于 FCN 风格的模型通常使用预测图上的高的激活位置指示目标中心，通常并不会直接返回像素坐标。为了使得这样情况可以正常优化，必须保证损失函数对于模型输出时可微分的。而上面直接基于坐标的形式就不行了。

Weighted Hausdorff Distance

于是，在这篇论文中提出了一个新的改进版本的 Hausdorff 距离，即加权 Hausdorff 距离：

$\begin{align} d_{WH}(p,Y) & = \frac{1}{\sum_{x \in \Omega}p_x +\epsilon} \sum_{x \in \Omega} p_x \min_{y \in Y} d(x,y) + \frac{1}{|Y|} \sum_{y \in {Y}} \underset{x \in \Omega}{M_{\alpha}}[p_x d(x,y) + (1-p_x)d_{max}] \\ \underset{a \in A}{M_{\alpha}}[f(a)] & = (\frac{1}{|A| \sum_{a \in A} f^\alpha(a)})^\frac{1}{\alpha} \end{align}$

第一部分计算了每个 $x$ 与最近的 $y$ 的距离，使用 x 处的预测值对平均后的结果进行加权。这里可以看做是一个加权的平均距离。

而第二部分则把针对位置 $x$ 处理的距离设定为了使用 $p_x$ 加权的 $d (x, y)$ 和图像中可能的最大距离 $d_{max}$ （即对角线长度）的组合。这个式子：

在极端情形，即 $p_x=0$ 时，此时对应的含义就成了图像对角线，因为此时距离可以理解为与任意的 $y$ 都等于最大距离。而当 $p_x=1$ 时，此时则仅考虑 $d (x, y)$ ，即实际的距离。
由于 $p_x$ 实际并不是二值状态，而是一个 0~1 之间的变化值，所以这可以表示一种连续集合与离散集合的距离形式。

这里特别的是 $\underset{a \in A}{M_{\alpha}}[f(a)]$ 是广义平均，具体可见：

广义平均在特殊参数的设定下可以实现对于最大和最小函数的逼近。但是广义平均本身却是可微的。

代码解析

代码来自：https://github.com/javiribera/locating-objects-without-bboxes/blob/master/object-locator/losses.py

先定义一些基本计算函数，包括计算成对欧氏距离的 cdist 和计算广义平均的 generaliz_mean。

def _assert_no_grad(variables):
    for var in variables:
        assert not var.requires_grad, \
            "nn criterions don't compute the gradient w.r.t. targets - please " \
            "mark these variables as volatile or not requiring gradients"


def cdist(x, y):
    """
    Compute distance between each pair of the two collections of inputs.
    :param x: Nxd Tensor
    :param y: Mxd Tensor
    :res: NxM matrix where dist[i,j] is the norm between x[i,:] and y[j,:],
          i.e. dist[i,j] = ||x[i,:]-y[j,:]||

    """
    differences = x.unsqueeze(1) - y.unsqueeze(0)
    distances = torch.sum(differences**2, -1).sqrt()
    return distances


def generaliz_mean(tensor, dim, p=-9, keepdim=False):
    """The generalized mean. It corresponds to the minimum when p = -inf.

    https://en.wikipedia.org/wiki/Generalized_mean

    :param tensor: Tensor of any dimension.
    :param dim: (int or tuple of ints) The dimension or dimensions to reduce.
    :param keepdim: (bool) Whether the output tensor has dim retained or not.
    :param p: (float<0).
    """
    assert p < 0
    res= torch.mean((tensor + 1e-6)**p, dim, keepdim=keepdim)**(1./p)
    return res

Average Hausdorff Distance

def averaged_hausdorff_distance(set1, set2, max_ahd=np.inf):
    """
    Compute the Averaged Hausdorff Distance function
    between two unordered sets of points (the function is symmetric).
    Batches are not supported, so squeeze your inputs first!
    :param set1: Array/list where each row/element is an N-dimensional point.
    :param set2: Array/list where each row/element is an N-dimensional point.
    :param max_ahd: Maximum AHD possible to return if any set is empty. Default: inf.
    :return: The Averaged Hausdorff Distance between set1 and set2.
    """

    if len(set1) == 0 or len(set2) == 0:
        return max_ahd

    set1 = np.array(set1)
    set2 = np.array(set2)

    assert set1.ndim == 2, 'got %s' % set1.ndim
    assert set2.ndim == 2, 'got %s' % set2.ndim

    assert set1.shape[1] == set2.shape[1], \
        'The points in both sets must have the same number of dimensions, got %s and %s.'\
        % (set2.shape[1], set2.shape[1])

    d2_matrix = pairwise_distances(set1, set2, metric='euclidean')

    res = np.average(np.min(d2_matrix, axis=0)) + \
        np.average(np.min(d2_matrix, axis=1))

    return res


class AveragedHausdorffLoss(nn.Module):
    def __init__(self):
        super(nn.Module, self).__init__()

    def forward(self, set1, set2):
        """Compute the Averaged Hausdorff Distance function between two unordered sets of points (the function is symmetric).
        Batches are not supported, so squeeze your inputs first!
        
        :param set1: Tensor where each row is an N-dimensional point.
        :param set2: Tensor where each row is an N-dimensional point.
        :return: The Averaged Hausdorff Distance between set1 and set2.
        """

        assert set1.ndimension() == 2, 'got %s' % set1.ndimension()
        assert set2.ndimension() == 2, 'got %s' % set2.ndimension()

        assert set1.size()[1] == set2.size()[1], \
            'The points in both sets must have the same number of dimensions, got %s and %s.'\
            % (set2.size()[1], set2.size()[1])

        d2_matrix = cdist(set1, set2)

        # Modified Chamfer Loss
        term_1 = torch.mean(torch.min(d2_matrix, 1)[0])
        term_2 = torch.mean(torch.min(d2_matrix, 0)[0])

        res = term_1 + term_2

        return res

平均形式的实现方式非常简单，关键在于计算点之间的成对距离矩阵，之后沿着两个轴（以不同的点集作为基准）计算取最小和均值操作。最终的加和即为最终的距离。

注意这里的实现里没有为 batch 形式提供支持。而且考虑到不同样本对应的点集中点数量的差异，导致无法直接将不同的数据堆叠到一起。

Werighted Hausdorff Distance

class WeightedHausdorffDistance(nn.Module):
    def __init__(self,
                 resized_height, resized_width,
                 p=-9,
                 return_2_terms=False,
                 device=torch.device('cpu')):
        """
        :param resized_height: Number of rows in the image.
        :param resized_width: Number of columns in the image.
        :param p: Exponent in the generalized mean. -inf makes it the minimum.
        :param return_2_terms: Whether to return the 2 terms
                               of the WHD instead of their sum.
                               Default: False.
        :param device: Device where all Tensors will reside.
        """
        super(nn.Module, self).__init__()

        # Prepare all possible (row, col) locations in the image
        self.height, self.width = resized_height, resized_width
        self.resized_size = torch.tensor([resized_height,
                                          resized_width],
                                         dtype=torch.get_default_dtype(),
                                         device=device)
        self.max_dist = math.sqrt(resized_height**2 + resized_width**2)
        self.n_pixels = resized_height * resized_width
        self.all_img_locations = torch.from_numpy(cartesian([np.arange(resized_height),
                                                             np.arange(resized_width)]))
        # Convert to appropiate type
        self.all_img_locations = self.all_img_locations.to(device=device,
                                                           dtype=torch.get_default_dtype())

        self.return_2_terms = return_2_terms
        self.p = p

    def forward(self, prob_map, gt, orig_sizes):
        """
        Compute the Weighted Hausdorff Distance function
        between the estimated probability map and ground truth points.
        The output is the WHD averaged through all the batch.

        :param prob_map: (B x H x W) Tensor of the probability map of the estimation.
                         B is batch size, H is height and W is width.
                         Values must be between 0 and 1.
        :param gt: List of Tensors of the Ground Truth points.
                   Must be of size B as in prob_map.
                   Each element in the list must be a 2D Tensor,
                   where each row is the (y, x), i.e, (row, col) of a GT point.
        :param orig_sizes: Bx2 Tensor containing the size of the original images.
                           B is batch size.
                           The size must be in (height, width) format.
        :return: Single-scalar Tensor with the Weighted Hausdorff Distance.
                 If self.return_2_terms=True, then return a tuple containing
                 the two terms of the Weighted Hausdorff Distance.
        """

        _assert_no_grad(gt)

        assert prob_map.dim() == 3, 'The probability map must be (B x H x W)'
        assert prob_map.size()[1:3] == (self.height, self.width), \
            'You must configure the WeightedHausdorffDistance with the height and width of the ' \
            'probability map that you are using, got a probability map of size %s'\
            % str(prob_map.size())

        batch_size = prob_map.shape[0]
        assert batch_size == len(gt)

        terms_1 = []
        terms_2 = []
        for b in range(batch_size):

            # One by one
            prob_map_b = prob_map[b, :, :]
            gt_b = gt[b]
            orig_size_b = orig_sizes[b, :]
            norm_factor = (orig_size_b/self.resized_size).unsqueeze(0)
            n_gt_pts = gt_b.size()[0]

            # Corner case: no GT points
            if gt_b.ndimension() == 1 and (gt_b < 0).all().item() == 0:
                terms_1.append(torch.tensor([0],
                                            dtype=torch.get_default_dtype()))
                terms_2.append(torch.tensor([self.max_dist],
                                            dtype=torch.get_default_dtype()))
                continue

            # Pairwise distances between all possible locations and the GTed locations
            n_gt_pts = gt_b.size()[0]
            normalized_x = norm_factor.repeat(self.n_pixels, 1) *\
                self.all_img_locations
            normalized_y = norm_factor.repeat(len(gt_b), 1)*gt_b
            d_matrix = cdist(normalized_x, normalized_y)  # HWxN

            # Reshape probability map as a long column vector,
            # and prepare it for multiplication
            p = prob_map_b.view(prob_map_b.nelement())
            n_est_pts = p.sum()
            p_replicated = p.view(-1, 1).repeat(1, n_gt_pts)

            # Weighted Hausdorff Distance
            term_1 = (1 / (n_est_pts + 1e-6)) * \
                torch.sum(p * torch.min(d_matrix, 1)[0])  # HWxN -> HW -> 1
            weighted_d_matrix = (1 - p_replicated)*self.max_dist + p_replicated*d_matrix
            minn = generaliz_mean(weighted_d_matrix,
                                  p=self.p,
                                  dim=0, keepdim=False)  # HWxN -> N
            term_2 = torch.mean(minn)
            terms_1.append(term_1)
            terms_2.append(term_2)
        terms_1 = torch.stack(terms_1)
        terms_2 = torch.stack(terms_2)

        if self.return_2_terms:
            res = terms_1.mean(), terms_2.mean()
        else:
            res = terms_1.mean() + terms_2.mean()
        return res

由于这里不同样本对应的点集不同，所以无法直接利用 batch 形式的计算，需要对每个样本单独计算，最后整体平均。这里同时考虑了实际模型输入输出中图像形状的变化，为了将距离关系对应于原图，所以这里利用放缩后的尺寸和原始尺寸之间计算了一个坐标放缩因子用于调整输出图中坐标和真值坐标。

之后针对两项分别进行计算，对于第一项，其计算方式和 Average Hausdorff Distance 一致，而第二项则根据公式进行了变换。这类时候首先计算了加权形式的距离矩阵，利用不同位置上的预测值对最大距离（预测图的对角线长度）和前面计算的距离矩阵进行加权求和。之后对加权距离矩阵计算广义平均，利用负指数，获得近似的最小值。最终对所有真值点对应的近似平均最小距离。