Multimodal Data Fusion: An Overview of Methods

This paper provides an overview of multimodal data fusion, which involves analyzing multiple datasets that provide information about the same phenomenon. The paper discusses why multimodal data fusion is useful, providing a more complete understanding than single datasets alone. It also covers several data-driven methods for performing multimodal data fusion, such as matrix and tensor decompositions, that account for diversity across datasets. The goal is to introduce readers to the broad field of multimodal data fusion and its opportunities across many application domains.

INVITED PAPER

Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects
This paper provides an overview of the main challenges in multimodal data fusion across various disciplines and addresses two key issues: "why we need data fusion" and "how we perform it."

By Dana Lahat, Tülay Adali, Fellow IEEE, and Christian Jutten, Fellow IEEE

ABSTRACT | In various disciplines, information about the same phenomenon can be acquired from different types of detectors, under different conditions, in multiple experiments or subjects, among others. We use the term "modality" for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or "challenges," are common to multiple domains. This paper deals with two key issues: "why we need data fusion" and "how we perform it." The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, "diversity" is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.

KEYWORDS | Blind source separation; data fusion; latent variables; multimodality; multiset data analysis; overview; tensor decompositions

Manuscript received April 25, 2015; accepted June 25, 2015. Date of current version August 20, 2015. The work of D. Lahat and C. Jutten was supported by the Project CHESS (2012-ERC-AdG-320684). The work of T. Adali was supported by the National Science Foundation (NSF) under Grants IIS 1017718 and CCF 1117056. GIPSA-Lab is a partner of the LabEx PERSYVAL-Lab (ANR-11-LABX-0025).
D. Lahat and C. Jutten are with the GIPSA-Lab, UMR CNRS 5216, F-38402 Saint Martin d’Hères, France (e-mail: [email protected]; [email protected]).
T. Adali is with the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250 USA (e-mail: [email protected]).
Digital Object Identifier: 10.1109/JPROC.2015.2460697
0018-9219 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Proceedings of the IEEE, Vol. 103, No. 9, September 2015

I. INTRODUCTION

Information about a phenomenon or a system of interest can be obtained from different types of instruments, measurement techniques, experimental setups, and other types of sources. Due to the rich characteristics of natural processes and environments, it is rare that a single acquisition method provides complete understanding thereof. The increasing availability of multiple data sets that contain information, obtained using different acquisition methods, about the same system, introduces new degrees of freedom that raise questions beyond those related to analyzing each data set separately.

The foundations of modern data fusion were laid in the first half of the 20th century [1], [2]. Joint analysis of multiple data sets has since been the topic of extensive research, and took a significant leap forward in the late 1960s/early 1970s with the formulation of concepts and techniques such as multiset canonical correlation analysis (CCA) [3], parallel factor analysis (PARAFAC) [4], [5], and other tensor decompositions [6], [7]. However, until rather recently, in most cases, these data fusion methodologies were confined within the limits of psychometrics and chemometrics, the communities in which they evolved. With recent technological advances, in a growing number of domains, the availability of data sets that correspond to the same phenomenon has increased, leading to increased interest in exploiting them efficiently. Many of the providers of multiview, multirelational, and multimodal data are associated with high-impact commercial, social, biomedical, environmental, and military applications, and thus the drive to develop new and efficient analytical methodologies is high and reaches far beyond pure academic interest.

Motivations for data fusion are numerous. They include obtaining a more unified picture and global view of the system at hand; improving decision making; exploratory research; answering specific questions about the system, such as identifying common versus distinctive elements across modalities or time; and in general, extracting knowledge from data for various purposes. However, despite the evident potential benefit, and the massive work that has already been done in the field (see, for example, [8]–[16] and references therein), the knowledge of how to actually exploit the additional diversity that multiple data sets offer is still at a very preliminary stage.

Data fusion is a challenging task for several reasons [11], [17]–[19]. First, the data are generated by very complex systems: biological, environmental, sociological, and psychological, to name a few, driven by numerous underlying processes that depend on a large number of variables to which we have no access. Second, due to the augmented diversity, the number, type, and scope of new research questions that can be posed is potentially very large. Third, working with heterogeneous data sets such that the respective advantages of each data set are maximally exploited, and drawbacks suppressed, is not an evident task. We elaborate on these matters in Sections II–IV. Most of these questions have been devised only in very recent years, and, as we show in this paper, only a fraction of their potential has already been exploited. Hence, we refer to them as "challenges."

A rather wide perspective on challenges in data fusion is presented in [8], where Van Mechelen and Smilde discuss linked-mode decomposition models within the framework of chemometrics and psychometrics, and [9], where Khaleghi et al. focus on "automated decision making" with special attention to multisensor information fusion. In practice, however, challenges in data fusion are most often brought up within a framework dedicated to a specific application, model, and data set; examples will be given in the sections that follow.

In this paper, we bring together a comprehensive (but definitely not exhaustive) list of challenges in data fusion. Following from [8], [9], [16], and [19] (and others), and further emphasized by our discussion in this paper, it is clear that at the appropriate level of abstraction, the same challenge in data fusion can be relevant to completely different and diverse applications, goals, and data types. Consequently, a solution to a challenge that is based on a sufficiently data-driven, model-free approach may turn out to be useful in very different domains. Therefore, there is an obvious interest in opening up the discussion of data fusion challenges to include and involve disparate communities, so that each community could inform the others. Our goal is to stimulate and emphasize the relevance and importance of a perspective based on challenges to advanced data fusion. More specifically, we would like to promote data-driven approaches, that is, approaches with minimal and weak priors and constraints, such as sparsity, nonnegativity, low-rank, and independence, among others, that can be useful to more than one specific application or data set. Hence, we present these challenges in quite a general framework that is not specific to an application, goal, or data type. We also give examples and motivations from different domains.

In order to contain our discussion, we focus on setups in which a phenomenon or a system is observed using multiple instruments, measurement devices, or acquisition techniques. In this case, each acquisition framework is denoted as a modality and is associated with one data set. The whole setup, in which one has access to data obtained from multiple modalities, is known as multimodal. A key property of multimodality is complementarity, in the sense that each modality brings to the whole some type of added value that cannot be deduced or obtained from any of the other modalities in the setup. In mathematical terms, this added value is known as diversity. Diversity makes it possible to reduce the number of degrees of freedom in the system by providing constraints that enhance uniqueness, interpretability, robustness, performance, and other desired properties, as will be illustrated in the rest of this paper. Diversity can be found in a broad range of scenarios, and plays a key role in a wide scope of mathematical and engineering studies. Accordingly, we suggest the following operative definition for the special type of diversity that is associated with multimodality.

Definition I.1: Diversity (due to multimodality) is the property that makes it possible to enhance the uses, benefits, and insights (such as those discussed in Section II), in a way that cannot be achieved with a single modality.

Diversity is the key to data fusion, as will be explained in Section III. Furthermore, in Section III, we demonstrate how a diversity approach to data fusion can provide a fresh look at previously well-known and well-founded data and signal processing techniques.

As already noted, "data fusion" is quite a diffuse concept that takes different interpretations with applications and goals [8], [9], [20]. Therefore, within the context of this paper, and in accordance with the types of problems on which we focus, our emphasis is on the following tighter interpretation [21]:

Definition I.2: Data fusion is the analysis of several data sets such that different data sets can interact and inform each other.

This concept will be given a more concrete meaning in Sections III and V.
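
As a small illustration of what it can mean for data sets to "interact and inform each other," the sketch below applies two-set canonical correlation analysis (CCA), one of the techniques named above, to synthetic data. It is only a toy example under stated assumptions and is not taken from the paper: the latent variable, the channel gains, the noise levels, and the helper name first_canonical_correlation are all hypothetical, and the QR-plus-SVD route is one standard way to compute canonical correlations for full-column-rank data.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between data sets X (n x p) and Y (n x q).

    Assumes both centered data sets have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(s[0])

rng = np.random.default_rng(0)
n = 2000

# Hypothetical latent variable observed by both modalities.
shared = rng.standard_normal(n)

# Modality 1: three channels, each a scaled and noisy view of the latent variable.
X = np.outer(shared, [1.0, -0.5, 0.3]) + rng.standard_normal((n, 3))
# Modality 2: four channels with different gains and independent noise.
Y = np.outer(shared, [0.8, 0.2, -1.0, 0.5]) + rng.standard_normal((n, 4))
# Control: a data set that carries no information about the latent variable.
Z = rng.standard_normal((n, 4))

rho_xy = first_canonical_correlation(X, Y)  # modalities sharing a latent variable
rho_xz = first_canonical_correlation(X, Z)  # unrelated data sets

print(f"X vs Y: {rho_xy:.2f}   X vs Z: {rho_xz:.2f}")
```

In this toy setup the first canonical correlation between X and Y is clearly larger than that between X and the unrelated control Z, because only X and Y share a latent variable. Neither data set reveals on its own which part of its variance is common to both; that information emerges only from the joint analysis.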