深度学习中的主动学习方法综述资源-CSDN下载

需积分: 44 16 浏览量 2020-09-06 20:23:32 上传评论收藏 1.21MB PDF 举报

主动学习(Active Learning, AL)是一种机器学习策略，旨在通过最小化所需标注样本来最大化模型的性能提升。在传统的机器学习中，由于模型通常需要较少的标记数据，因此早期的主动学习并未受到广泛关注。然而，随着深度学习(Deep Learning, DL)的兴起，情况发生了变化。深度学习依赖于大量标注数据来训练复杂的神经网络模型，这些模型包含成千上万的参数，用于学习高效能的特征表示。近年来，互联网技术的快速发展带来了数据的爆炸性增长，我们进入了信息丰富的时代。这为深度学习的发展提供了丰富的数据资源，使其在语音识别、信息提取、医学图像等领域取得了重大突破。然而，大规模高质量标注数据的获取成本高昂，特别是在需要专业知识的领域，如医疗诊断或自然语言处理，这使得依赖大量标注数据的方法变得不切实际。因此，将主动学习应用于深度学习，即深度主动学习(Deep Active Learning, DAL)，成为了一种探索的方向。DAL的目标是利用主动学习策略减少样本标注的成本，同时保持深度学习的强大学习能力。在DAL中，模型可以自我选择最有价值的未标注数据进行标注，以最大程度地提升其性能。尽管对DAL的研究已经相对丰富，但还没有全面的综述文献对此进行系统性的总结和分类。这样的综述对于理解当前的研究进展、发现潜在的问题以及指导未来的研究方向至关重要。论文可能将涵盖以下方面： 1. **主动学习策略**：论文可能会讨论各种主动学习策略，如不确定采样（Uncertainty Sampling）、查询策略（Query Strategy）、多样性采样（Diversity Sampling）等，以及它们如何与深度学习模型结合。 2. **深度模型集成**：研究可能包括如何在主动学习框架下融合多种深度学习模型，以提高模型的泛化能力和学习效率。 3. **性能评估与比较**：论文可能会对比不同主动学习策略在不同任务和数据集上的性能表现，提供一个基准来评估新的方法。 4. **领域应用**：论文可能会探讨DAL在特定领域的应用，如医学图像分析、自然语言理解和计算机视觉，以及这些应用中的挑战和解决方案。 5. **开放问题与未来研究方向**：论文可能会指出当前DAL研究的局限性，比如如何更有效地选择标注样本，如何降低人工标注的需求，以及如何将主动学习与半监督学习、强化学习等其他学习范式结合。 6. **算法效率与计算资源**：DAL可能需要考虑如何在有限的计算资源下实现高效的样本选择和模型训练，以适应实际应用场景。这篇"深度主动学习综述论文"将对深度主动学习的理论、方法和应用进行详尽的总结，为研究者提供一个全面的视角来理解这个领域的最新进展，并可能为未来的研究提供有价值的指导。

资源推荐

资源详情

资源评论

A Survey of Deep Active Learning

PENGZHEN REN

∗

and YUN XIAO

∗

, Northwest University

XIAOJUN CHANG, Monash University

PO-YAO HUANG, Carnegie Mellon University

ZHIHUI LI, Qilu University of Technology (Shandong Academy of Sciences)

XIAOJIANG CHEN and XIN WANG, Northwest University

Active learning (AL) attempts to maximize a model’s performance gain while annotating the fewest samples

possible. Deep learning (DL) is greedy for data and requires a large amount of data supply to optimize a

massive number of parameters if the model is to learn how to extract high-quality features. In recent years,

due to the rapid development of internet technology, we have entered an era of information abundance

characterized by massive amounts of available data. As a result, DL has attracted signicant attention from

researchers and has been rapidly developed. Compared with DL, however, researchers have relatively low

interest in AL. This is mainly because before the rise of DL, traditional machine learning requires relatively

few labeled samples, meaning that early AL is rarely accorded the value it deserves. Although DL has made

breakthroughs in various elds, most of this success is due to the large number of publicly available annotated

datasets. However, the acquisition of a large number of high-quality annotated datasets consumes a lot of

manpower, making it unfeasible in elds that require high levels of expertise (such as speech recognition,

information extraction, medical images, etc.) Therefore, AL is gradually coming to receive the attention it is

due.

It is therefore natural to investigate whether AL can be used to reduce the cost of sample annotations, while

retaining the powerful learning capabilities of DL. As a result of such investigations, deep active learning

(DAL) has emerged. Although research on this topic is quite abundant, there has not yet been a comprehensive

survey of DAL-related works; accordingly, this article aims to ll this gap. We provide a formal classication

method for the existing work, along with a comprehensive and systematic overview. In addition, we also

analyze and summarize the development of DAL from an application perspective. Finally, we discuss the

confusion and problems associated with DAL and provide some possible development directions.

CCS Concepts: • Computing methodologies → Machine learning algorithms.

Additional Key Words and Phrases: Deep Learning, Active Learning, Deep Active Learning.

ACM Reference Format:

Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, and Xin Wang. 2020. A

Survey of Deep Active Learning. 30 pages.

1 INTRODUCTION

Both deep learning (DL) and active learning (AL) have important applications in the machine

learning community. Due to their excellent characteristics, they have attracted widespread research

interest in recent years. More specically, DL has achieved unprecedented breakthroughs in various

challenging tasks; however, this is largely due to the publication of massive labeling datasets

[

]. Therefore, DL is limited by the high cost of sample labeling in some professional elds

that require rich knowledge. In comparison, an eective AL algorithm can theoretically achieve

exponential acceleration in labeling eciency [

]. This huge potential saving in labeling costs

∗

Both authors contributed equally to this research.

Authors’ addresses: Pengzhen Ren, pzhren@foxmail.com; Yun Xiao, [email protected], Northwest University; Xiaojun

Chang, [email protected], Monash University; Po-Yao Huang, Carnegie Mellon University; Zhihui Li, Qilu University of

Technology (Shandong Academy of Sciences); Xiaojiang Chen; Xin Wang, Northwest University.

2020.

arXiv:2009.00236v1 [cs.LG] 30 Aug 2020

2 Ren and Chang, et al.

is a fascinating development. However, the classic AL algorithm also nds it dicult to handle

high-dimensional data [

160

]. Therefore, the combination of DL and AL, referred to as DAL, is

expected to achieve superior results. DAL has been widely utilized in various elds, including image

recognition [

], text classication [

145

180

185

], visual question answering [

] and

object detection [

121

], etc. Although a rich variety of related work has been published, DAL

still lacks a unied classication framework. In order to ll this gap, in this article, we will provide

a comprehensive overview of the existing DAL related work, along with a formal classication

method. We will next briey review the development status of DL and AL in their respective

elds. Subsequently, in Section 2, the necessity and challenges of combining DL and AL are further

explicated.

1.1 Deep Learning

DL attempts to build appropriate models by simulating the structure of the human brain. The

McCulloch-Pitts (MCP) model proposed in 1943 by [

] is regarded as the beginning of modern DL.

Subsequently, in 1986, [

129

] introduced backpropagation into the optimization of neural networks,

which laid the foundation for the subsequent rapid development of DL. In the same year, Recurrent

Neural Networks (RNNs) [

] were rst proposed. In 1998, the LeNet [

] network made its rst

appearance, representing one of the earliest uses of deep neural networks (DNN). However, these

pioneering early works were limited by the computing resources available at the time and did

not receive as much attention and investigation as they should have [

]. In 2006, Deep Belief

Networks (DBNs) [

] were proposed and used to explore a deeper range of networks, which

prompted the name of neural networks as DL. In 2012, in the ImageNet competition, the DL model

AlexNet [

] won the championship in one fell swoop. AlexNet uses the ReLU activation function

to eectively suppress the gradient disappearance problem, while the use of multiple GPUs greatly

improves the training speed of the model. Subsequently, DL began to win championships in various

competitions and constantly beat records in various tasks. From the perspective of automation, the

emergence of DL has transformed the manual design of features [

102

] in machine learning to

facilitate automatic extraction [

149

]. It is precisely because of this powerful automatic feature

extraction ability that DL has demonstrated such unprecedented advantages in many elds. After

decades of development, the eld of DL-related research work quite rich. In Fig.1a, we present

a standard deep learning model example: convolutional neural network (CNN) [

130

]. Based

on this approach, similar CNNs are applied to various image processing tasks. In addition, RNNs

and Generative Adversarial Networks (GANs) [

132

] are also widely utilized. Beginning in 2017,

DL gradually shifted from the initial feature extraction automation to the automation of model

architecture design [11, 124, 189]; however, this still has a long way to go.

Thanks to the publication of a large number of existing annotation datasets [

], in recent

years, DL has made breakthroughs in various elds including machine translation [

159

168

speech recognition [

110

116

120

136

], and image classication [

106

115

174

]. However, this

comes at the cost of a large number of manually labeled datasets, and DL has a strong greedy

attribute to the data. While, in the real world, obtaining a large number of unlabeled datasets is

relatively simple, the manual labeling of datasets comes at a high cost; this is particularly true for

those elds where labeling requires a high degree of professional knowledge [

153

]. For example,

the labeling and description of lung lesion images of COVID-19 patients requires experienced

clinicians to complete, and it is clearly impractical to demand that such professionals complete

a large amount of medical image labeling. Similar elds also include speech recognition [

188

medical imaging [

109

176

], recommender systems [

], information extraction [

satellite remote sensing [

] and robotics [

158

186

], etc. Therefore, a way of maximizing the

performance gain of the model when annotating a small number of samples is urgently required.

A Survey of Deep Active Learning 3

(a) Structure diagram of convolutional neural

network.

(b) The pool-based active learning cycle.

Fig. 1. Comparison of typical architectures of DL, AL and DAL. (a) A common DL model: Convolutional

Neural Network. (b)The pool-based AL cycle: Use the query strategy to query the sample in the unlabeled pool

and hand it over to the oracle for labeling, then add the queried sample to the labeled training dataset

and train, and then use the newly learned knowledge for the next round of querying. Repeat this process until

the label budget is exhausted or the pre-defined termination conditions are reached. (c) A typical example of

DAL: The parameters

of the DL model are initialized or pre-trained on the label training set

, and the

samples of the unlabeled pool

are used to extract features through the DL model. Then select samples

based on the corresponding query strategy, and query the label in querying to form a new label training set

, then train the DL model on

, and update

at the same time. Repeat this process until the label budget is

exhausted or the pre-defined termination conditions are reached.

1.2 Active Learning

AL is such a method. It aims to select the most useful samples from the unlabeled dataset and hand

it over to the oracle (e.g., human annotator) for labeling, so as to reduce the cost of labeling as much

as possible while still maintaining performance. AL approaches can be divided into membership

query synthesis [

], stream-based selective sampling [

] and pool-based [

] AL from

application scenarios [

139

]. Membership query synthesis means that the learner can request to

query the label of any unlabeled sample in the input space, including the sample generated by the

learner. Moreover, the key dierence between stream-based selective sampling and pool-based

sampling is that the former makes an independent judgment on whether each sample in the data

stream needs to query the labels of unlabeled samples, while the latter chooses the best query

sample based on the evaluation and ranking of the entire dataset. Although the pool-based scenario

4 Ren and Chang, et al.

seems to be more common in the application of the paper, it is clear that the application scenario

of stream-based selective sampling is more suitable for scenarios involving small mobile devices

where timeliness is required. In Fig.1b, we illustrates the framework diagram of the pool-based

active learning cycle. In the initial state, we can randomly select one or more samples from the

unlabeled pool

, give this sample to the oracle query label to get the labeled dataset

, and then

train the model on

using supervised learning. Next, we use this new knowledge to select the next

sample to be queried, add the newly queried sample to

and then conduct training. This process is

repeated until the label budget is exhausted or the pre-dened termination conditions are reached.

It is dierent from DL by using manual or automatic methods to design models with high-

performance feature extraction capabilities. AL starts with datasets, primarily through the design

of elaborate query rules to select the best samples from unlabeled datasets and query their labels,

in an attempt to reduce the labeling cost to the greatest extent possible. Therefore, the design

of query rules is crucial to the performance of AL methods. The related research is also quite

rich. For example, in a given set of unlabeled datasets, the main query strategies include the

uncertainty-based approach [

123

142

161

], diversity-based approach [

111

]

and expected model change [

127

141

]. In addition, many works have also studied hybrid query

strategies [

146

177

183

], taking into account the uncertainty and diversity of query samples,

and attempting to nd a balance between these two strategies. Because separate sampling based on

uncertainty often results in sampling bias [

], the currently selected sample is not representative of

the distribution of unlabeled datasets. On the other hand, considering only strategies that promote

diversity in sampling may lead to increased labeling costs, as may be a considerable number of

samples with low information content will consequently be selected. More classic query strategies

are examined in [

140

]. Although there is a substantial body of existing AL-related research, AL still

faces the problem of expanding to high-dimensional data (e.g., images, text and video, etc.) [

160

];

thus, most AL works tend to concentrate on low-dimensional problems [

160

]. In addition, AL

often queries high-value samples based on features extracted in advance, and does not have the

ability to extract features.

2 THE NECESSITY AND CHALLENGE OF COMBINING DL AND AL

DL has a strong learning ability in the context of high-dimensional data processing and automatic

feature extraction, while AL has signicant potential to eectively reduce labeling costs. Therefore,

an obvious approach is to combine DL and AL, as this will greatly expand their application potential.

This combined approach, referred to as DAL, was proposed by considering the complementary

advantages of the two methods, and researchers have high expectations for the results of studies in

this eld. However, although AL-related research into query strategy is quite rich, it is still quite

dicult to apply this strategy directly to DL. This is mainly due to:

•

Insucient data for label samples. AL often relies on a small amount of labeled sample data

to learn and update the model, while DL is often very greedy for data [

]. The labeled

training samples provided by the classic AL method thus insucient to support the training

of traditional DL. In addition, the one-by-one sample query method commonly used in AL is

also not applicable in the DL context [183].

•

Model uncertainty. The query strategy based on uncertainty is an important direction of

AL research. In classication tasks, although DL can use the softmax layer to obtain the

probability distribution on the label, the facts show that they are too condent. The softmax

response (SR) [

166

] of the nal output is unreliable as a measure of condence, and the

performance of this method will thus be even worse than that of random sampling [165].

A Survey of Deep Active Learning 5

(a) Batch query strategy considering

only the amount of information.

(b) Batch query strategy considering

both information volume and diver-

sity.

Fig. 2. A comparison diagram of two batch query strategies, one that only considers the amount of information

and one that considers both the amount and diversity of information. The size of the dots indicates the

amount of information in the samples, while the distance between the dots represents the similarity between

the samples. The points shaded in gray indicate the sample points to be queried in a batch.

•

Processing pipeline inconsistency. The processing pipelines of AL and DL are inconsistent.

Most AL algorithms focus primarily on the training of classiers, and the various query

strategies utilized are largely based on xed feature representations. In DL, however, feature

learning and classier training are jointly optimized. Only ne-tuning the DL models in the

AL framework, or treating them as two separate problems, may thus cause divergent issues

[166].

To address the rst problem, researchers have considered using generative networks for data

augmentation [

162

] or assigning pseudo-labels to high-condence samples in order to expand the

labeled training set [

166

]. Some researchers have also used labeled and unlabeled datasets to combine

supervised and semisupervised training across AL cycles [

148

]. In addition, previous heuristic-

based AL [

139

] query strategies have proven to be ineective when applied to DL [

138

]; therefore,

for the one-by-one query strategy in classic AL, many researchers focus on the improvement of

the batch sample query strategy [

183

], taking both the amount of information and the

diversity of batch samples into account. In order to solve the neglect of model uncertainty in DL,

some researchers have applied Bayesian deep learning [

] to deal with the high-dimensional mini-

batch samples with fewer queries in the AL context [

118

162

], thereby eectively alleviating

the problem of the DL model being too condent about the output results. Furthermore, to deal

with the pipeline inconsistency problem, researchers have considered modifying the combined

framework of AL and DL to make the proposed DAL model as general as possible, an approach that

can be extended to various application elds. This is of great signicance to the promotion of DAL.

For example, [

178

] embeds the idea of AL into DL and consequently proposes a task-independent

architecture design.

We will focus on the detailed discussion and summary of the various strategies used in DAL in

Section 3.

剩余29页未读，继续阅读

评论收藏

内容反馈

syp_net

粉丝: 158

深度主动学习综述论文

深度强化学习综述

图像分类中的深度主动学习研究综述.pdf

关于深度学习的综述与讨论_胡越

最新中文深度强化学习综述文章， 利于入门学习者理解（兼论计算机围棋的发展）

深度对比学习综述论文PDF

deep-active-learning-pytorch:一站式购买最新的深度主动学习方法

电子科大最新《深度半监督学习》综述论文（2021版）

deep-active-learning:深度主动学习

深度迁移主动学习研究综述.pdf

主动学习算法综述.pdf

深度学习论文集一

深度学习的经典论文

深度学习原理及应用综述.docx

deep learning

deep learning经典论文30篇

《深度持续学习》综述论文

最新《智能交通系统的深度强化学习》综述论文

近十年国内教育领域深度学习研究综述——基于CNKI的文献计量可视化分析.pdf

深度强化学习进展：从AlphaGo到AlphaGo Zero.pdf

2019-2020必看的十篇【深度学习领域综述论文】.zip

深度学习多源领域自适应综述论文.pdf

浙大最新「多模态深度学习」综述论文

主动学习经典论文入门.zip

论文研究 - 主动行为的初步文献综述

概率记录链接的主动学习-研究论文

最新资源

最新中文深度强化学习综述文章，利于入门学习者理解（兼论计算机围棋的发展）