[AI Project] Multi-Label Image Classification with Deep Learning


License: CC 4.0 BY-SA




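In this post I implement a multi-label image classification task. In later posts I will also share other tasks I worked on during my graduate studies. Let's get started!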

Task Description

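Train a multi-label classification model that, given any input image, automatically outputs multiple tags describing the content of the image.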

Dataset Description

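The competition provides 35,000 images as the training set, 8,000 images as the test set for first-round scoring, and 6,612 images as the test set for the final round.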

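visual_china_train.csv: list mapping each image to its tags.
valid_tags.txt: ordered list of the 6,941 tags.
tag_train.npz: tags for the 35,000 training images.
train.tgz: the 35,000 training images.
valid.tgz: the 8,000 validation images.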

Approach

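Unlike binary classification (such as Dogs vs. Cats) or multi-class classification (such as CIFAR-10), this task is multi-label image classification: an image may carry no tags at all or any number of the 6,941 tags, so the tags are not mutually exclusive. That is why the output layer cannot use softmax, which forces the class probabilities to sum to 1, and must use sigmoid instead, trained with binary cross-entropy. The classification layer therefore has 6,941 neurons, and each neuron uses a sigmoid to output the probability that the corresponding tag is present.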
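The difference between the two activations is easy to see on a toy example. The NumPy sketch below uses four made-up tags and made-up logits (not outputs of the real model): sigmoid scores each tag independently, so two tags can both be close to 1, whereas softmax makes the tags compete for a total probability of 1. It also computes binary cross-entropy, the loss that pairs with per-tag sigmoid outputs.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function: each output is an independent probability."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over the last axis: outputs compete and sum to 1."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all tags (each tag is its own yes/no problem)."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Toy example: 4 hypothetical tags, the image actually carries tags 0 and 2
logits = np.array([3.0, -2.0, 3.0, -4.0])
y_true = np.array([1.0, 0.0, 1.0, 0.0])

p_sig = sigmoid(logits)
p_soft = softmax(logits)
print(p_sig.round(3))   # tags 0 and 2 are both above 0.5
print(p_soft.round(3))  # tags 0 and 2 have to split the total probability of 1
print(binary_crossentropy(y_true, p_sig))
```

With softmax, neither of the two true tags reaches 0.5, so a fixed 0.5 cut-off would miss both; with sigmoid, both are detected.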

Implementation Steps

# Import modules
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from PIL import Image

# Directory holding the competition data
dir_path = "D:\\01\\01contest"
print(os.listdir(dir_path))

['tag_train.npz', 'train', 'train.tar', 'train.tgz', 'valid', 'valid.tar', 'valid.tgz', 'valid_tags.txt', 'visual_china_train.csv']

Loading the Data and Labels

First, read visual_china_train.csv, which stores each image's file name together with its tags.

# Inspect the data
train_path = os.path.join(dir_path,"visual_china_train.csv")
train_df = pd.read_csv(train_path)
train_df.head()


print(train_df.shape)

(35000, 2)

Store the image file names in the img_paths list.

img_paths = list(train_df["img_path"])

Check that the training tags indeed contain exactly 6,941 distinct tags.

tags = []
for i in range(train_df["tags"].shape[0]):
    for tag in train_df["tags"].iloc[i].split(","):
        tags.append(tag)
        
tags = set(tags)
print("the length of tags:",len(tags))

the length of tags: 6941
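Each row of the label matrix is a multi-hot vector: a 1 in column j means the image carries tag j. The sketch below shows how comma-separated tag strings like those in visual_china_train.csv can be turned into such a matrix. The function name encode_tags and the four-tag vocabulary are hypothetical, and it is an assumption (not stated in the dataset description) that the columns of tag_train.npz follow the order of valid_tags.txt.

```python
import numpy as np

def encode_tags(tag_strings, vocab):
    """Turn comma-separated tag strings into a multi-hot matrix.

    vocab: ordered list of tag names; column j of the result corresponds to vocab[j].
    """
    index = {tag: j for j, tag in enumerate(vocab)}
    y = np.zeros((len(tag_strings), len(vocab)), dtype=np.uint8)
    for i, s in enumerate(tag_strings):
        for tag in s.split(","):
            y[i, index[tag]] = 1  # mark every tag present in this image
    return y

# Hypothetical 4-tag vocabulary and two labelled images
vocab = ["户外", "女性", "微笑", "男性"]
y = encode_tags(["女性,微笑", "户外,男性,微笑"], vocab)
print(y)
```

If the column-order assumption holds, encode_tags(list(train_df["tags"]), vocab), with vocab read line by line from valid_tags.txt, would be expected to reproduce the array stored in tag_train.npz, which makes a cheap consistency check.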

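With the preparation mostly done, we load the training set. The raw training set contains some images in CMYK format, while image-processing pipelines usually expect RGB, so we use the convert method of PIL's Image to convert every non-RGB image to RGB.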

# Load a small subset first to validate the pipeline and the model
num_train = 5000
X_train = np.zeros((num_train, 224, 224, 3), dtype=np.uint8)
for i, img_path in enumerate(img_paths[:num_train]):
    with Image.open(os.path.join(dir_path, "train", img_path)) as img:
        # Some training images are CMYK; convert everything that is not RGB
        if img.mode != "RGB":
            img = img.convert("RGB")
        img = img.resize((224, 224))
        X_train[i] = np.asarray(img)
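Since the test images later need exactly the same conversion and resizing, the loop can be factored into a reusable helper. This is a sketch; the names to_rgb_array and load_images are hypothetical and not part of the original code. PIL's convert("RGB") also handles other non-RGB modes such as grayscale (L), palette (P) and RGBA, not only CMYK.

```python
import numpy as np
from PIL import Image

def to_rgb_array(img, size=(224, 224)):
    """Convert a PIL image in any mode (CMYK, L, P, RGBA, ...) to a resized uint8 RGB array."""
    if img.mode != "RGB":
        img = img.convert("RGB")
    img = img.resize(size)  # PIL sizes are (width, height)
    return np.asarray(img, dtype=np.uint8)

def load_images(paths, size=(224, 224)):
    """Load a list of image files into an (N, height, width, 3) uint8 array."""
    out = np.zeros((len(paths), size[1], size[0], 3), dtype=np.uint8)
    for i, path in enumerate(paths):
        with Image.open(path) as img:  # closes the file handle after reading
            out[i] = to_rgb_array(img, size)
    return out

# Quick check with an in-memory CMYK image (no files needed)
cmyk = Image.new("CMYK", (64, 48))
arr = to_rgb_array(cmyk)
print(arr.shape, arr.dtype)
```

For example, X_train = load_images([os.path.join(dir_path, "train", p) for p in img_paths[:num_train]]) would replace the loop above. Note that PIL's resize takes (width, height) while the resulting array is indexed (height, width, channels); the two only coincide here because the target size is square.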

With the training set loaded, let's look at some of the images.

fig, axes = plt.subplots(6, 6, figsize=(20, 20))

# Show the first 36 training images in a 6x6 grid
for i, img in enumerate(X_train[:36]):
    axes[i // 6, i % 6].imshow(img)
    axes[i // 6, i % 6].axis("off")

Preparing the Labels

y_train_path = os.path.join(dir_path,"tag_train.npz")
y_train = np.load(y_train_path)

y_train.files

['tag_train']

y_train = y_train["tag_train"]
y_train.shape

(35000, 6941)

Now we have both the images and the labels, but we still need to split the data.

We split it into a training subset and a validation subset.

from sklearn.model_selection import train_test_split
X_train2,X_val,y_train2,y_val = train_test_split(X_train,y_train[:num_train],test_size=0.2,random_state=2019)

print(X_train2.shape)
print(y_train2.shape)
print(X_val.shape)
print(y_val.shape)

(4000, 224, 224, 3)
(4000, 6941)
(1000, 224, 224, 3)
(1000, 6941)

Building the Model

We use transfer learning directly, with an ImageNet-pretrained ResNet50 as the backbone.

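# Import the required libraries
from keras.models import Model
from keras.layers import Input, GlobalAveragePooling2D, Dropout, Dense
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.applications import ResNet50

# ResNet50 backbone with ImageNet weights, without its classification top.
# Note: these weights were trained on inputs processed by
# keras.applications.resnet50.preprocess_input; this pipeline feeds raw 0-255 RGB pixels.
base_model = ResNet50(input_tensor=Input((224, 224, 3)), weights="imagenet", include_top=False)

# Freeze the backbone so that only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base_model.output)
x = Dropout(0.25)(x)
# One sigmoid unit per tag, each giving an independent probability
x = Dense(6941, activation="sigmoid")(x)
model = Model(base_model.input, x)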
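Because every layer of the ResNet50 backbone is frozen, only the new Dense head is trained. A quick count, assuming the standard ResNet50 final feature map of 2048 channels (global average pooling and Dropout add no parameters), shows how large that head is:

```python
# ResNet50 without its top ends in 2048 channels; after global average pooling
# the new head is a single Dense layer mapping 2048 features to 6941 tags.
features = 2048
num_tags = 6941

weights = features * num_tags  # one weight per (feature, tag) pair
biases = num_tags              # one bias per tag
head_params = weights + biases
print(f"{head_params:,}")  # 14,222,109
```

model.summary() can be used to confirm the trainable-parameter count in your own environment.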

Helper functions for monitoring precision, recall and F1

import keras.backend as K

def precision(y_true, y_pred):
    # Calculates the precision
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def recall(y_true, y_pred):
    # Calculates the recall
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def fbeta_score(y_true, y_pred, beta=1):
    # Calculates the F score, the weighted harmonic mean of precision and recall.
    if beta < 0:
        raise ValueError('The lowest choosable beta is zero (only precision).')

    # No Python `if` on the number of true labels here: a symbolic tensor cannot be
    # tested reliably in a Python condition. When there are no true positives,
    # p and r are 0 and K.epsilon() keeps the division finite, so the score is 0 anyway.
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    bb = beta ** 2
    fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon())
    return fbeta_score

def fmeasure(y_true, y_pred):
    # Calculates the f-measure, the harmonic mean of precision and recall.
    return fbeta_score(y_true, y_pred, beta=1)
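Keras evaluates these metric functions on each batch and averages the results over the epoch, so val_fmeasure only approximates the F1 score of the whole validation set, especially with a batch size as small as 4. A NumPy sketch that computes micro-averaged precision, recall and F1 over a full prediction matrix, using the same 0.5 cut-off that K.round effectively applies, can serve as a cross-check. The function name micro_prf and the toy arrays are hypothetical:

```python
import numpy as np

def micro_prf(y_true, y_prob, threshold=0.5, eps=1e-7):
    """Micro-averaged precision, recall and F1 over all (image, tag) pairs."""
    y_pred = (y_prob > threshold).astype(np.float64)
    tp = float(np.sum(y_true * y_pred))           # correctly predicted tags
    precision = tp / (np.sum(y_pred) + eps)       # share of predicted tags that are right
    recall = tp / (np.sum(y_true) + eps)          # share of true tags that were found
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Toy example: 2 images, 4 tags
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]], dtype=np.float64)
y_prob = np.array([[0.9, 0.2, 0.4, 0.7],
                   [0.1, 0.8, 0.3, 0.2]])
p, r, f1 = micro_prf(y_true, y_prob)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

After training, micro_prf(y_val, model.predict(X_val)) would give the figures for all 1,000 validation images at once.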

Data Augmentation

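from keras.preprocessing.image import ImageDataGenerator

# Light geometric augmentation for training: random shifts of up to 10% and zoom in [0.9, 1.1]
train_datagen = ImageDataGenerator(width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   zoom_range=0.1)
val_datagen = ImageDataGenerator()  # no augmentation for the validation set

batch_size = 4

# Shuffle the training batches every epoch; keep the validation order fixed
train_generator = train_datagen.flow(X_train2, y_train2, batch_size=batch_size, shuffle=True)
val_generator = val_datagen.flow(X_val, y_val, batch_size=batch_size, shuffle=False)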

Training the Model

keras.callbacks.ModelCheckpoint(filepath,monitor='val_loss',verbose=0,save_best_only=False, save_weights_only=False, mode='auto', period=1)

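filepath: string, the path where the model is saved; it may contain formatting placeholders filled from the epoch number and the logged metrics, e.g. weights.{epoch:02d}-{val_loss:.2f}.hdf5
monitor: the quantity to monitor, such as the validation loss or the training loss
save_best_only: when True, the model is saved only when the monitored quantity improves
verbose: verbosity mode, 0 or 1; with 1, a message is printed at each checkpoint
mode: one of 'auto', 'min', 'max'. With save_best_only=True it decides what counts as the best model: when monitoring val_acc the mode should be max, when monitoring val_loss it should be min. In auto mode the direction is inferred from the name of the monitored quantity.
save_weights_only: if True, only the model weights are saved; otherwise the whole model is saved
period: number of epochs between checkpoints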

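keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)

Reduces the learning rate when the monitored metric stops improving.
monitor: the quantity to monitor
factor: the factor by which the learning rate is reduced, i.e. lr = lr * factor
patience: number of epochs without improvement after which the learning rate is reduced
mode: one of 'auto', 'min', 'max'. In min mode, the learning rate is reduced when the monitored quantity stops decreasing; in max mode, when it stops increasing.
min_delta (named epsilon in older Keras versions): threshold for counting a change as an improvement, i.e. for deciding whether the metric has reached a plateau
cooldown: number of epochs to wait after a reduction before resuming normal operation
min_lr: lower bound on the learning rate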
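To see how factor=0.5, patience=2 and mode='max' (the values used for the callback below) interact, here is a simplified re-implementation of the plateau rule in plain Python. It is a sketch for intuition, not Keras's own code; the F1 curve is invented, and the starting rate of 1e-3 is assumed to be Keras's default Adam learning rate.

```python
def simulate_reduce_lr(scores, lr=1e-3, factor=0.5, patience=2, min_delta=1e-4,
                       cooldown=0, min_lr=0.0):
    """Simplified ReduceLROnPlateau rule in 'max' mode.

    scores: monitored value (e.g. val_fmeasure) at the end of each epoch.
    Returns the learning rate in effect after each epoch.
    """
    best = float("-inf")
    wait = 0
    cooldown_counter = 0
    lrs = []
    for score in scores:
        if cooldown_counter > 0:          # still cooling down after a reduction
            cooldown_counter -= 1
            wait = 0
        if score > best + min_delta:      # counts as an improvement in 'max' mode
            best = score
            wait = 0
        elif cooldown_counter == 0:
            wait += 1
            if wait >= patience:          # plateau lasted `patience` epochs
                lr = max(lr * factor, min_lr)
                cooldown_counter = cooldown
                wait = 0
        lrs.append(lr)
    return lrs

# Hypothetical F1 curve: improves for two epochs, then plateaus
print(simulate_reduce_lr([0.10, 0.15, 0.15, 0.15, 0.15, 0.15]))
# [0.001, 0.001, 0.001, 0.0005, 0.0005, 0.00025]
```

Every two stagnant epochs halve the learning rate, so with only 5 epochs in total at most two reductions can happen.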

checkpointer = ModelCheckpoint(filepath='weights_best_simple_model.hdf5', 
                            monitor='val_fmeasure',verbose=1, save_best_only=True, mode='max')
reduce = ReduceLROnPlateau(monitor='val_fmeasure',factor=0.5,patience=2,verbose=1,min_delta=1e-4,mode='max')

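model.compile(optimizer='adam',
              loss='binary_crossentropy',
              # With binary_crossentropy, 'accuracy' is per-tag binary accuracy; with 6941
              # mostly-zero labels it is dominated by true negatives, so watch F1 instead
              metrics=['accuracy', fmeasure, recall, precision])

epochs = 5

# One epoch covers each split exactly once: len(generator) = ceil(samples / batch_size),
# i.e. 4000 / 4 = 1000 training steps and 1000 / 4 = 250 validation steps.
# (fit_generator is the Keras 2 API; in TF2's tf.keras, model.fit accepts generators directly.)
history = model.fit_generator(train_generator,
                              validation_data=val_generator,
                              steps_per_epoch=len(train_generator),
                              validation_steps=len(val_generator),
                              epochs=epochs,
                              callbacks=[checkpointer, reduce],
                              verbose=1)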

That completes the training of a simple model on the 5,000-image subset.

Saving the Model

model.save("model.h5")

Prediction

predict_path = os.path.join(dir_path,"valid")
predict_img_paths = os.listdir(predict_path)
predict_num = len(predict_img_paths)
print(predict_num)

Loading the Test Set

# Restore the best checkpoint (highest val_fmeasure) saved during training
model.load_weights("weights_best_simple_model.hdf5")

X_test = np.zeros((predict_num, 224, 224, 3), dtype=np.uint8)
for i, img_path in enumerate(predict_img_paths):
    with Image.open(os.path.join(predict_path, img_path)) as img:
        # Same RGB conversion and resizing as for the training images
        if img.mode != "RGB":
            img = img.convert("RGB")
        img = img.resize((224, 224))
        X_test[i] = np.asarray(img)

Predict on the test set and convert the results into Chinese tags so that the submission file can be generated.

y_pred = model.predict(X_test)

print(y_pred.shape)
y_pred[0]

(8000, 6941)
array([3.2395124e-05, 0.0000000e+00, 2.0861626e-07, …, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00], dtype=float32)

# Convert the predictions into Chinese tags

# Step 1: build an index -> tag dictionary from the ordered tag list
def load_tag_dict(filepath):
    with open(filepath, "r", encoding="utf-8") as f:
        return {i: line.strip() for i, line in enumerate(f)}

filepath = os.path.join(dir_path, "valid_tags.txt")
hash_tag = load_tag_dict(filepath)

# Step 2: keep every tag whose predicted probability exceeds the threshold
def arr2tag(arr, threshold=0.5):
    tags = []
    for row in arr:
        index = np.where(row > threshold)[0].tolist()
        tags.append([hash_tag[j] for j in index])
    return tags

y_tags = arr2tag(y_pred)

print(y_tags[0])

['20多岁', '一个人', '不看镜头', '东亚', '东方人', '亚洲', '亚洲人', '亚洲人和印度人', '人', '仅成年人', '休闲活动', '休闲装', '女性', '幸福', '彩色图片', '微笑', '成年人', '户外', '拿着', '摄影', '放松', '水平画幅', '爱', '生活方式', '男人', '男性', '白昼', '衣服', '青年人']

print(len(y_tags[0]))
print(len(y_tags[1]))

29
30
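The thresholding in arr2tag can also be written without a per-row np.where loop. In the sketch below (the function name probs_to_tags and the four-tag toy vocabulary are hypothetical), np.nonzero returns the (row, column) positions of all probabilities above 0.5 at once. The second toy row shows that an image whose probabilities all stay below the threshold gets an empty tag list, consistent with the task statement that an image may have no tags.

```python
import numpy as np

def probs_to_tags(probs, idx2tag, threshold=0.5):
    """Map each row of sigmoid outputs to the tag names whose probability exceeds threshold."""
    rows, cols = np.nonzero(probs > threshold)  # row-major order: columns ascend within a row
    tags = [[] for _ in range(probs.shape[0])]
    for r, c in zip(rows, cols):
        tags[int(r)].append(idx2tag[int(c)])
    return tags

# Hypothetical 4-tag vocabulary and predictions for two images
idx2tag = {0: "户外", 1: "女性", 2: "微笑", 3: "男性"}
probs = np.array([[0.9, 0.1, 0.6, 0.2],
                  [0.3, 0.4, 0.2, 0.1]])
print(probs_to_tags(probs, idx2tag))  # [['户外', '微笑'], []]
```

With the real data, probs_to_tags(y_pred, hash_tag) should return the same lists as arr2tag(y_pred).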

Generating the Submission File

import pandas as pd

# One row per test image; the tags of each image are joined into a single comma-separated string
df = pd.DataFrame({"img_path": predict_img_paths,
                   "tags": [",".join(tags) for tags in y_tags]})
df.to_csv("submit.csv", index=False)

# Preview the submission file
predict_df = pd.read_csv("submit.csv")
predict_df.head()


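Summary

That's all for this post. If you have any questions, please leave a comment, though if too much time has passed I may not remember the details~~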
