How to Use the Idea of Clustering for Classification (Prediction)

Article link: https://blog.csdn.net/The_lastest/article/details/81078987

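Using clustering for classification requires a training set with correct labels. The approaches are to compute the cluster center of each class directly, or to iterate the KMeans algorithm. A sample is then classified by comparing its distance to each cluster center.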


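Someone in my group just asked me to use clustering to make predictions. My first reaction was: what? Are you sure you didn't misspeak? How is clustering supposed to do classification? After a few pointers, it finally clicked.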

Using the idea of clustering for classification has one prerequisite: the training set must have correct labels.

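Approach 1:

Step 1: Using the training set and its labels, directly compute the cluster center of each class;
Step 2: Go through all the test samples, compute the distance from each sample to every center in turn, and assign the class label of the nearest center.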
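As a compact illustration of these two steps, here is a minimal sketch that is not the batched implementation from this post. It assumes that the "center" of a class is the mean of its training samples and that distance is Euclidean; the post does not state either explicitly. The function names `fit_class_centers` and `predict_nearest_center` and the toy data are made up for the example.

```python
import numpy as np


def fit_class_centers(x_train, y_train):
    """Step 1: one center per class (assumed here: the mean of that class's samples)."""
    classes = np.unique(y_train)
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centers


def predict_nearest_center(x_test, classes, centers):
    """Step 2: label each test sample with the class of its closest center."""
    # Pairwise Euclidean distances (an assumption), shape (n_test, n_classes).
    diffs = x_test[:, None, :] - centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return classes[np.argmin(dists, axis=1)]


# Toy data: two well-separated classes in 2D.
x_train = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
y_train = np.array([0, 0, 1, 1])
classes, centers = fit_class_centers(x_train, y_train)

x_test = np.array([[1.0, 1.0], [9.0, 11.0]])
pred = predict_nearest_center(x_test, classes, centers)
print(pred)
```

The broadcasting in step 2 builds an array of shape (n_test, n_classes, n_features), so for large test sets it would probably need to be processed in chunks.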

Code:

import datetime

import numpy as np


def Compute_Center_Each_Class(x_train, y_train, batch_size):
    now = datetime.datetime.now()
    print("Compute center of each class begin: ", now)
    # x_train = x_train.toarray()  # add this line if x_train is a sparse matrix
    classes = np.unique(y_train)  # distinct classes (clusters)
    class_center = np.zeros([len(classes), x_train.shape[1]])
    # times: number of mini-batch rounds needed to cover all samples
    # (the last batch may be partial, or empty when the sizes divide evenly)
    times = int(x_train.shape[0] / batch_size) + 1
    for k in range(times):  # process the samples in mini-batches
        begin = k * batch_size
        end = begin + batch_size
        if end >= x_train.shape[0]:
            end = x_train.shape[0]
        batch_x = x_train[begin:end]
        batch_y = y_train[begin:end]
        batch_classes = np.unique(batch_y)  # classes present in this batch
        for i in batch_classes:
            index = np.