
INVITED PAPER

Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects
This paper provides an overview of the main challenges in multimodal data fusion across various disciplines and addresses two key issues: “why we need data fusion” and “how we perform it.”
By Dana Lahat, Tülay Adalı, Fellow IEEE, and Christian Jutten, Fellow IEEE

ABSTRACT | In various disciplines, information about the same phenomenon can be acquired from different types of detectors, at different conditions, in multiple experiments or subjects, among others. We use the term “modality” for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or “challenges,” are common to multiple domains. This paper deals with two key issues: “why we need data fusion” and “how we perform it.” The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, “diversity” is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.

KEYWORDS | Blind source separation; data fusion; latent variables; multimodality; multiset data analysis; overview; tensor decompositions

Manuscript received April 25, 2015; accepted June 25, 2015. Date of current version August 20, 2015. The work of D. Lahat and C. Jutten was supported by the Project CHESS (2012-ERC-AdG-320684). The work of T. Adalı was supported by the National Science Foundation (NSF) under Grants IIS 1017718 and CCF 1117056. GIPSA-Lab is a partner of the LabEx PERSYVAL-Lab (ANR-11-LABX-0025).
D. Lahat and C. Jutten are with the GIPSA-Lab, UMR CNRS 5216, F-38402 Saint Martin d’Hères, France (e-mail: [email protected]; [email protected]).
T. Adalı is with the Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250 USA (e-mail: [email protected]).
Digital Object Identifier: 10.1109/JPROC.2015.2460697
0018-9219 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
Information about a phenomenon or a system of interest can be obtained from different types of instruments, measurement techniques, experimental setups, and other types of sources. Due to the rich characteristics of natural processes and environments, it is rare that a single acquisition method provides complete understanding thereof. The increasing availability of multiple data sets that contain information, obtained using different acquisition methods, about the same system, introduces new degrees of freedom that raise questions beyond those related to analyzing each data set separately.
The foundations of modern data fusion have been laid in the first half of the 20th century [1], [2]. Joint analysis of multiple data sets has since been the topic of extensive research, and earned a significant leap forward in the late 1960s/early 1970s with the formulation of concepts and techniques such as multiset canonical correlation analysis (CCA) [3], parallel factor analysis (PARAFAC) [4], [5], and other tensor decompositions [6], [7]. However, until rather recently, in most cases, these data fusion methodologies were confined within the limits of psychometrics and chemometrics, the communities in which they evolved. With recent technological advances, in a growing number of domains, the availability of data sets that correspond to the same phenomenon has increased, leading to increased interest in exploiting them efficiently. Many of the providers of multiview, multirelational, and multimodal data are associated with high-impact commercial, social, biomedical,
Vol. 103, No. 9, September 2015 | Proceedings of the IEEE 1449
Lahat et al.: Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects

environmental, and military applications, and thus the drive to develop new and efficient analytical methodologies is high and reaches far beyond pure academic interest.
Motivations for data fusion are numerous. They include obtaining a more unified picture and global view of the system at hand; improving decision making; exploratory research; answering specific questions about the system, such as identifying common versus distinctive elements across modalities or time; and in general, extracting knowledge from data for various purposes. However, despite the evident potential benefit, and the massive work that has already been done in the field (see, for example, [8]–[16] and references therein), the knowledge of how to actually exploit the additional diversity that multiple data sets offer is still in its very preliminary stages.
Data fusion is a challenging task for several reasons [11], [17]–[19]. First, the data are generated by very complex systems: biological, environmental, sociological, and psychological, to name a few, driven by numerous underlying processes that depend on a large number of variables to which we have no access. Second, due to the augmented diversity, the number, type, and scope of new research questions that can be posed are potentially very large. Third, working with heterogeneous data sets such that the respective advantages of each data set are maximally exploited, and drawbacks suppressed, is not an evident task. We elaborate on these matters in Sections II–IV. Most of these questions have been devised only in very recent years, and, as we show in this paper, only a fraction of their potential has already been exploited. Hence, we refer to them as “challenges.”
A rather wide perspective on challenges in data fusion is presented in [8], where Van Mechelen and Smilde discuss linked-mode decomposition models within the framework of chemometrics and psychometrics, and [9], where Khaleghi et al. focus on “automated decision making” with special attention to multisensor information fusion. In practice, however, challenges in data fusion are most often brought up within a framework dedicated to a specific application, model, and data set; examples will be given in the sections that follow.
In this paper, we bring together a comprehensive (but definitely not exhaustive) list of challenges in data fusion. Following from [8], [9], [16], and [19] (and others), and further emphasized by our discussion in this paper, it is clear that at the appropriate level of abstraction, the same challenge in data fusion can be relevant to completely different and diverse applications, goals, and data types. Consequently, a solution to a challenge that is based on a sufficiently data-driven, model-free approach may turn out to be useful in very different domains. Therefore, there is an obvious interest in opening up the discussion of data fusion challenges to include and involve disparate communities, so that each community could inform the others. Our goal is to stimulate and emphasize the relevance and importance of a perspective based on challenges to advanced data fusion. More specifically, we would like to promote data-driven approaches, that is, approaches with minimal and weak priors and constraints, such as sparsity, nonnegativity, low-rank, and independence, among others, that can be useful to more than one specific application or data set. Hence, we present these challenges in quite a general framework that is not specific to an application, goal, or data type. We also give examples and motivations from different domains.
In order to contain our discussion, we focus on setups in which a phenomenon or a system is observed using multiple instruments, measurement devices, or acquisition techniques. In this case, each acquisition framework is denoted as a modality and is associated with one data set. The whole setup, in which one has access to data obtained from multiple modalities, is known as multimodal. A key property of multimodality is complementarity, in the sense that each modality brings to the whole some type of added value that cannot be deduced or obtained from any of the other modalities in the setup. In mathematical terms, this added value is known as diversity. Diversity allows one to reduce the number of degrees of freedom in the system by providing constraints that enhance uniqueness, interpretability, robustness, performance, and other desired properties, as will be illustrated in the rest of this paper. Diversity can be found in a broad range of scenarios, and plays a key role in a wide scope of mathematical and engineering studies. Accordingly, we suggest the following operative definition for the special type of diversity that is associated with multimodality.

Definition I.1: Diversity (due to multimodality) is the property that allows one to enhance the uses, benefits, and insights (such as those discussed in Section II), in a way that cannot be achieved with a single modality.

Diversity is the key to data fusion, as will be explained in Section III. Furthermore, in Section III, we demonstrate how a diversity approach to data fusion can provide a fresh look at previously well-known and well-founded data and signal processing techniques.
As already noted, “data fusion” is quite a diffuse concept that takes different interpretations with applications and goals [8], [9], [20]. Therefore, within the context of this paper, and in accordance with the types of problems on which we focus, our emphasis is on the following tighter interpretation [21]:

Definition I.2: Data fusion is the analysis of several data sets such that different data sets can interact and inform each other.

This concept will be given a more concrete meaning in Sections III and V.
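As a purely illustrative aside, the claim that diversity reduces the number of degrees of freedom and enhances uniqueness can be sketched with a toy linear model. The sketch below is not a method proposed in this paper: the latent vector `x_true`, the observation matrices `A1` and `A2`, and the helper `min_norm_solution` are hypothetical names chosen for illustration, and the linear, noise-free setting is an assumption. Each assumed modality observes only two linear combinations of four unknowns, so either one alone leaves a two-dimensional family of equally valid solutions; analyzing both jointly, in the spirit of Definition I.2, makes the solution unique.

```python
import numpy as np

# Hypothetical latent state of the observed system (4 unknowns).
x_true = np.array([1.0, -2.0, 0.5, 3.0])

# Two hypothetical modalities; each measures only 2 linear combinations
# of the 4 unknowns (an assumed toy model, chosen so the stack is invertible).
A1 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0, 0.0]])
A2 = np.array([[0.0, 0.0, 1.0, 1.0],
               [1.0, 0.0, 0.0, -1.0]])
y1 = A1 @ x_true
y2 = A2 @ x_true


def min_norm_solution(A, y):
    """Return the minimum-norm least-squares solution of A x = y."""
    return np.linalg.lstsq(A, y, rcond=None)[0]


# Single modality: rank 2 < 4 unknowns, so infinitely many x fit y1 exactly.
x_single = min_norm_solution(A1, y1)

# Joint analysis: stack both modalities so that each constrains the other.
A_joint = np.vstack([A1, A2])
y_joint = np.concatenate([y1, y2])
x_fused = min_norm_solution(A_joint, y_joint)

rank_single = np.linalg.matrix_rank(A1)
rank_joint = np.linalg.matrix_rank(A_joint)
err_single = np.linalg.norm(x_single - x_true)
err_fused = np.linalg.norm(x_fused - x_true)

print("rank with one modality:", rank_single)
print("rank with both modalities:", rank_joint)
print("error with one modality:", err_single)
print("error with both modalities:", err_fused)
```

In this toy setting, the single-modality estimate fits its own data exactly yet misses the true state, whereas the joint estimate recovers it up to rounding error. Real modalities are rarely related to the latent variables by known linear maps, so this should be read only as a concretization of the uniqueness argument.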


The goal of this paper is to provide some ideas, perspectives, and guidelines as to how to approach data fusion. This paper is not a review, not a literature survey, not a tutorial, nor a cookbook. As such, it does not propose or promote any specific solution or method. On the contrary, our message is that whatever specific method or approach is considered, it should be kept in mind that it is just one among a very large set, and should be critically judged as such. In the same vein, any example in this paper should only be regarded as a concretization of a much broader idea.

How to read this paper? In order to make this paper accessible for readers with various interests and backgrounds, it is organized in two types of cross sections. The first part (Sections II and III) deals with the question “why?”, i.e., why we need data fusion. The second part (Sections IV and V) deals with the question “how?”, i.e., how we perform data fusion. Each question is treated on two levels: data (Sections II and IV) and theory (Sections III and V). More specifically, Section II presents the concepts of multimodality and data fusion and motivates them using examples from various applications. In Section III, we introduce the concept of diversity as a key to data fusion and give it a concrete mathematical formulation. Section IV discusses complicating factors that should be addressed in the actual processing of heterogeneous data. Section V gives some guidelines as to how to actually approach a data fusion problem from a model design perspective. Section VI concludes our work.

II. WHAT IS MULTIMODALITY? WHY DO WE NEED MULTIMODALITY?
For living creatures, multimodality is a very natural concept. Living creatures use external and internal sensors, sometimes denoted as “senses,” in order to detect and discriminate among signals, communicate, cross-validate, disambiguate, and add robustness to numerous life-and-death choices and responses that must be taken rapidly, in a dynamic and constantly changing internal and external environment.
The well-accepted paradigm that certain natural processes and phenomena can express themselves under completely different physical guises is the raison d’être of multimodal data fusion. Too often, however, very little is known about the underlying relationships among these modalities. Therefore, the most obvious and essential endeavour to be undertaken in any multimodal data analysis task is exploratory: to learn about relationships between modalities, their complementarity, shared versus modality-specific information content, and other mutual properties.
In this section, we try to provide, by numerous practical examples, a more concrete sense of what we mean when we speak of “diversity” and “multimodality.” The examples below illustrate the complementary nature of multimodal data, and as a result, some of the prominent uses, benefits, and insights that can be obtained from properly exploiting multimodal data, especially as opposed to the analysis of single-set and single-modal data. They also present various complicating factors, due to which multimodal data fusion is not an evident task. The purpose of this section is to show that multimodality is already present in almost every field of science and technology, and thus it is of potential interest to everyone.

A. Multisensory Systems

Example II-A.1: Audiovisual Multimodality: Audiovisual multimodality is probably the most intuitive, since it uses two of our most informative senses. Most human verbal communication involves seeing the speaker [18]. Indeed, a large number of audiovisual applications involve human speech and vision. In such applications, it is usually the audio channel that conveys the information of interest. It is well known that audio and video convey complementary information. Audio has the advantage over video that it does not require line of sight. On the other hand, the visual modality is resistant to various factors that make audio and speech processing difficult, such as ambient noise, reverberations, and other acoustic disturbances.
Perhaps the most striking evidence of the amount of caution that needs to be taken in the design and use of multimodal systems is the “McGurk effect” [18]. In their seminal paper, McGurk and MacDonald [18] showed that presenting contradictory, or discrepant, speech [“ba”] and visual lip movements [“ga”] can cause a human to perceive completely different syllables [“da”]. These unexpected results have since been the subject of ongoing exploratory research on human perception and cognition [22, Sec. VI.A.5]. The McGurk effect serves as an indication that in real-life scenarios, data fusion can take paths much more intricate than simple summation of information. No less important, it serves as a lesson that fusing modalities can yield undesired results and severe degradation of performance if the underlying relationships between modalities are not properly understood.
Today, audiovisual multimodality is used for a broad range of applications [10], [23]. Examples include: speech processing, including speech recognition, speech activity detection, speech enhancement, speaker extraction, and separation; scene analysis, including tracking a speaker within a group, biometrics and monitoring, for safety and security applications [24]; human–machine interaction (HMI) [10]; calibration [10, Sec. V.C], [25]; and more.

Example II-A.2: Human–Machine Interaction: A domain that is heavily inspired by natural multimodality is HMI. In


HMI, an important task is to design modalities that will make HMI as natural, efficient, and intuitive as possible [11]. The idea is to combine multiple interaction modes based on audiovision, touch, smell, movement (e.g., gesture detection and user tracking), interpretation of human language commands, and other multisensory functions [10], [11]. The principal point that makes HMI stand out among other multimodal applications that we mention is that, in HMI, the modalities are often interactive (as their name implies). Unlike other multimodal applications that we mention, not one but two very different types of systems (human and machine) are “observed” by each other’s sensors, and the goal of data fusion is not only to interpret each system’s output, but also to actively convey information between these two systems. An added challenge is that this task should usually be accomplished in real time. An additional complicating factor that makes multimodal HMI stand out is due to the fact that the human user often plays an active part in the choice of modalities (from the available set) and in the way that they are used in practice. This implies that the design of the multimodal setup and data fusion procedure must rely not only on the theoretically and technologically optimal combination of data streams but also on the ability to predict and adapt to the subjective cognitive preferences of the individual user. We refer to [11] (and references therein) for further discussion of these aspects.

B. Biomedical, Health

Example II-B.1: Understanding Brain Functionality: Functional brain study deals with understanding how the different elements of the brain take part in various perceptual and cognitive activities. Functional brain study largely relies on noninvasive imaging techniques, whose purpose is to reconstruct a high-resolution spatiotemporal image of the neuronal activity within the brain. The neuronal activity within the brain generates ionic currents that are often modeled as dipoles. These dipoles induce electric and magnetic fields that can be directly recorded by electroencephalography (EEG) and magnetoencephalography (MEG), respectively. In addition, neuronal activity induces changes in magnetization between oxygen-rich and oxygen-poor blood, known as the haemodynamic response. This effect, also called blood-oxygen-level-dependent (BOLD) changes, can be detected by functional magnetic resonance imaging (fMRI). Therefore, fMRI is an indirect measure of neuronal activity. These three modalities register data at regular time intervals and thus reflect temporal dynamics. However, these techniques vary greatly in their spatiotemporal resolutions: EEG and MEG data provide high temporal (in milliseconds) resolution, whereas fMRI images have low temporal (in seconds) resolution. fMRI data are a set of high-resolution 3-D images, taken at regular time intervals, representing the whole volume of the brain of a patient lying in an fMRI scanner. EEG and MEG data are a set of time-series signals reflecting voltage or neuromagnetic field changes recorded at each of the (usually a few dozen of) electrodes attached to the scalp (EEG) or fixed within an MEG scanner helmet. The sensitivity of EEG and MEG to deep-brain signals is limited. In addition, they have different selectivity to signals as a function of brain morphology. Therefore, they provide data at much poorer spatial resolution and do not have access to the full brain volume. Consequently, the spatiotemporal information provided by EEG, MEG, and fMRI is highly complementary.
Functional imaging techniques can be complemented by other modalities that convey structural information. For example, structural magnetic resonance imaging (sMRI) and diffusion tensor imaging (DTI) report on the structure of the brain in terms of gray matter, white matter, and cerebrospinal fluid. sMRI is based on nuclear magnetic resonance of water protons. DTI measures the diffusion process of molecules, mainly water, and thus reports also on brain connectivity. Each of these methods is based on different physical principles and is thus sensitive to different types of properties within the brain. In addition, each method has different pros and cons in terms of safety, cost, accuracy, and other parameters. Recent technological advances allow recording data from several functional brain imaging techniques simultaneously [26], [27], thus further motivating advanced data fusion.
It is a well-accepted paradigm in neuroscience that EEG and fMRI carry complementary information about brain function [26], [28]. However, their very heterogeneous nature and the fact that brain processes are very complicated systems that depend on numerous latent phenomena imply that simultaneously extracting useful information from them is not an evident task. The fact that there is no ground truth is reflected in the very broad range of methods and approaches that are being proposed [12], [15], [17], [21], [28]–[31]. Works on biomedical brain imaging often emphasize the exploratory nature of this task. Despite decades of study, the underlying relationship between EEG and fMRI is far from being understood [17], [29], [30], [32].
A well-known challenge in brain imaging is the EEG inverse problem. A prevalent assumption is that the measured EEG signal is generated by numerous current dipoles within the brain, and the goal is to localise the origins of this neuronal activity. Often formulated as a linear inverse problem, it is ill-posed: many different spatial current patterns within the skull can give rise to identical measurements [33]. In order to make the problem well-conditioned, additional hypotheses are required. A large number of solutions are based on adding various priors to the EEG data [34]. Alternatively, an identifiable and unique solution can be obtained using spatial constraints from fMRI [12], [22], [30].

Example II-B.2: Medical Diagnosis: Various medical conditions such as potentially malignant tumors cannot


be diagnosed by a single type of measurement due to many factors such as low sensitivity, low positive predictive values, low specificity (high false-positive), a limited number of spatial samples (as in biopsy), and other limitations of the various assessment techniques. In order to improve the performance of the diagnosis, risk assessment, and therapy options, it is necessary to perform numerous medical assessments based on a broad range of medical diagnostic techniques [35], [36]. For example, one can augment physical examination, blood tests, biopsies, static and functional magnetic resonance imaging, with other parameters such as genetic, environmental, and personal risk factors. The question of how to analyze all these simultaneously available resources is largely open. Currently, this task relies mostly on human medical experts. One of the main challenges is the automation of such decision procedures, in order to improve correct interpretation, as well as save costs and time [35].

Example II-B.3: Developing Noninvasive Medical Diagnosis Techniques: In some cases, the use of multimodal data fusion is only a first step in the design of a single-modal system. In [37], the challenge is understanding the link between surface and intracardiac electrodes measuring the same atrial fibrillation event and the goal is eventually extracting relevant atrial fibrillation activity using only the noninvasive modality. For this aim, the intracardiac modality is exploited as a reference to guide the extraction of an atrial electrical signal of interest from noninvasive electrocardiography (ECG) recordings. The difficulty lies in the fact that the intracardiac modality provides a rather pure signal whereas the ECG signal is a mixture of the desired signal with other sources, and the mixing model is unknown.

Example II-B.4: Smart Patient Monitoring: Health monitoring using multiple types of sensors is drawing increasing attention from modern health services. The goal is to provide a set of noninvasive, nonintrusive, reasonable-cost sensors that allow the patient to run a normal life while providing reliable warnings in real time. Here, we focus on monitoring, predicting, and warning epileptic patients from potentially dangerous seizures [38]. The gold standard in monitoring epileptic seizures is combining EEG and video, where EEG is manually analyzed by experts and the whole diagnostic procedure requires a stay of up to several days in a hospital setting. This procedure is expensive, time consuming, and physically inconvenient for the patient. Obviously, it is not practical for daily life. While much effort has already been dedicated to the prediction of epileptic seizures from EEG, with no clear-cut results so far, a considerable proportion of potentially lethal seizures are hardly detectable by EEG at all. Therefore, a primary challenge is to understand the link between epileptic seizures and additional body parameters: movement, breathing, heart rate, and others. Due to the fact that epileptic seizures vary within and across patients, and due to the complex relations between different body systems, it is likely that any such system should rely on more than one modality [38].

C. Environmental Studies

Example II-C.1: Remote Sensing and Earth Observations: Various sensor technologies can report on different aspects of objects on Earth. Passive optical hyperspectral (respectively, multispectral) imaging technologies report on material content of the surface by reconstructing its spectral characteristics from hundreds of (respectively, a few) narrow (respectively, broad) adjacent spectral bands within the visible range and beyond. A third type of an optical sensor is panchromatic imaging, which generates a monochromatic image with a much broader band. Typical spatial resolutions of hyperspectral, multispectral, and panchromatic images are tens of meters, a few meters, and less than one meter, respectively. Hence, there exists a tradeoff between spectral and spatial resolution [13, Ch. 9], [39], [40]. Topographic information can be acquired from active sensors such as light detection and ranging (LiDAR) and synthetic aperture radar (SAR). LiDAR is based on a narrow pulsed laser beam and thus provides highly accurate information about distance to objects, i.e., altitude. SAR is based on radio waves that illuminate a rather wide area, and the backscattered components reaching the sensor are registered; interpreting the reflections from the surface requires some additional processing with respect to (w.r.t.) LiDAR. Both technologies can provide information about elevation, 3-D structure of the observed objects, and their surface properties. LiDAR, being based on a laser beam, generally reports on the structure of the surface, although it can partially penetrate through certain areas such as forest canopy, providing information on the internal structure of the trees, for example. This ability is a mixed blessing, however, since it generates reflections that have to be accounted for. SAR and LiDAR use different electromagnetic frequencies and thus interact differently with materials and surfaces. As an example, depending on the wavelength, SAR may see the canopy as a transparent object (waves reach the soil under the canopy), a semitransparent object (they penetrate the canopy and interact with it) or an opaque object (they are reflected by the top of the canopy). Optical techniques are passive, which implies that they rely on natural illumination. Active sensors such as LiDAR and SAR can operate at night and in shaded areas [41].
Beyond the strengths and weaknesses of each technology w.r.t. the others, the use of each is limited by a certain inherent ambiguity. For example, hyperspectral imaging cannot distinguish between objects made of the same material that are positioned at different elevations, such as


concrete roofs and roads. LiDAR cannot distinguish between objects with the same elevation and surface roughness that are made of different materials such as natural and artificial grass [42]. SAR images may sometimes be difficult to interpret due to their complex dependence on the geometry of the surface [41].

In real-life conditions, interpretability of the observations of one modality may be difficult without additional information. For example, in hyperspectral imaging, on a flat surface, reflected light depends on the abundance (proportion of a material in a pixel) and on the endmember (pure material present in a pixel) reflectance. In a nonflat surface, the reflected light depends also on the topography, which may induce variations in scene illumination and scattering. Therefore, in nonflat conditions, one cannot accurately extract material content information from optical data alone. Adding a modality that reports on the topography, such as LiDAR, is necessary to resolve spectra accurately [43].

As an active initiative, we point out the yearly data fusion contest of the IEEE Geoscience and Remote Sensing Society (GRSS) (see dedicated paper in this issue [44]). Problems addressed include multimodal change detection, in which the purpose is to detect changes in an area before and after an event (a flood, in this case), given SAR and multispectral imaging, using either all or part of the modalities [45]; multimodal multitemporal data fusion of optical, SAR, and LiDAR images taken at different years over the same urban area, where suggested applications include assessing urban density, change detection and overcoming adverse illumination conditions for optical sensors [41]; and proposing new methods for fusing hyperspectral and LiDAR data of the same area, e.g., for improved classification of objects [42].

Example II-C.2: Meteorological Monitoring: Accurate measurements of atmospheric phenomena such as rain, water vapor, dew, fog, and snow are required for meteorological analysis and forecasting, as well as for numerous applications in hydrology, agriculture, and aeronautical services. Data can be acquired from various devices such as rain gauges, radars, satellite-borne remote sensing devices (see Example II-C.1), and recently also by exploiting existing commercial microwave links [46]. Rain gauges, as an example, are simply cups that collect the precipitation. Albeit the most direct and reliable technique, their small sampling area implies very localized representativeness and thus poor spatial resolution (e.g., [46] and [47]). Rain gauges may be read automatically at intervals as short as seconds. Satellites observe Earth at different frequencies, including visible, microwave, infrared, and shortwave infrared to report on various atmospheric phenomena such as water vapor content and temperature. The accuracy of radar rainfall estimation may be affected by topography, beam effects, distance from the radar, and other complicating factors. Radars and satellite systems provide large spatial coverage; however, they are less accurate in measuring precipitation at ground level (e.g., [48]). Microwave links are deployed by cellular providers for backhaul communication between base stations. The signals transmitted by the base stations are influenced by various atmospheric phenomena (e.g., [49]), primarily attenuation due to rainfall [46], [47]. These changes in signal strength are recorded at predefined time intervals and kept in the cellular provider's logs. Hence, the precipitation data is in fact a "reverse engineering" of this information. The microwave links' measurements provide average precipitation on the entire link and close to ground level [46]. Altogether, these technologies are largely complementary in their ability to detect and distinguish between different meteorological phenomena, spatial coverage, temporal resolution, measurement error, and other properties. Therefore, meteorological data are often combined for better accuracy, coverage and resolution; see, e.g., [19], [47], [48], and references therein.

Example II-C.3: Cosmology: A major endeavour in astronomy and astrophysics is understanding the formation of our Universe. Recent results include robust support for the six-parameter standard model of cosmology, of a Universe dominated by Cold Dark Matter and a cosmological constant $\Lambda$, known as ΛCDM [50], [51]. The purpose of ongoing and planned sky surveys is to decrease the allowable uncertainty volume of the six-dimensional ΛCDM parameter space and to improve the constraints on the other cosmological parameters that depend on it [51]. The goal is to validate (or disprove) the standard model.

A major difficulty in astrophysics and cosmology is the absence of ground truth. This is because cosmological processes involve very high energies, masses, large space and time scales that make experimental study prohibitive. The lack of ground truth and experimental support implied that, from its very beginning, cosmological research had to rely on cross validation of outcomes of different observations, numerical simulations, and theoretical analysis; in other words, data fusion. A complicating factor associated with this task is the fact that in many types of inferences, for all practical purposes, we have only one realization of the Universe. This means that even if we make statistical hypotheses about underlying processes, there is still only one sample. This fact induces an uncertainty called "cosmic variance" that cannot be accommodated by improving the measurement precision.

Despite its simplicity, the ΛCDM model has proved to be successful in describing a wide range of cosmological data [52]. In particular, it is predicted that its six parameters can fully explain the angular power spectra of the temperature and polarization fluctuations of the cosmic microwave background radiation (CMB). Therefore, since the first experimental discovery of the CMB in 1965 [53], there has been an ongoing effort to obtain better and more accurate measurements of these fluctuations.

A severe problem in validating the ΛCDM model from CMB observations is known as "parameter degeneracy." Although the CMB power spectrum can be fully explained by the standard model, this relationship is not unique in the sense that the same measured CMB power spectrum can be explained by other models, not only ΛCDM. These degeneracies can be broken by combining CMB observations with other cosmological data. While CMB corresponds to photons released about 300 000 years after the Big Bang, the same parameters that controlled the evolution of the early Universe continue to influence its matter distribution and expansion rate to this day. Therefore, other measures, such as redshift from certain types of supernovae, angular and radial baryon acoustic oscillation scales that can be derived from galaxy surveys, galaxy clustering [54], [55], and stacked gravitational lensing, also serve as important cosmological probes [51], [52]. Since the cosmological parameters that determine the evolution of the early Universe are the same as those that control high-energy physics, cosmological observations are fused and cross validated with experimental outcomes such as the Large Hadron Collider Higgs data [56].

III. MULTIMODALITY AS A FORM OF DIVERSITY

In this section, we discuss data fusion from a theoretical perspective. In order to contain our discussion, we focus on data-driven methods. Within these, we restrict our examples to a class of problems known as blind separation, and within these, to data and observations that can be represented by (multi)linear relationships. Reasons are as follows. First, by definition, data-driven models may be useful to numerous applications, as will be explained in Section III-B. Second, there exists much established theory and numerous models that fit into this framework. Third, it is impossible to cover all types of models. Still, the ideas that these examples illustrate go far beyond these specific models.

A key property in any analytical model is uniqueness. Uniqueness is necessary in order to achieve interpretability, i.e., attach physical meaning to the output [2], [5]. In order to establish uniqueness, all blind separation problems invariably rely on one or more types of diversity [57]: concrete mathematical examples will be given in Section III-C1. In this section, we show how the concept of "diversity" plays a part, under different guises, in data fusion. In particular, we show that multimodality can provide a new form of diversity that can achieve uniqueness even in cases that are not unique otherwise.

The rest of this section is as follows. Section III-A presents some basic mathematical preliminaries that will serve us to provide a more concrete meaning to the ideas that we lay out in the rest of this work. Section III-B explains the model-driven approach versus the data-driven approach, and motivates the latter. Section III-C discusses diversity and data fusion in data sets that are stacked in a single matrix or a higher order array, also known as a tensor. In Section III-D, we go beyond single-array data analysis, and establish the idea of "a link between data sets as a new form of diversity" as the key to advanced data fusion. We conclude our claims and summarize these ideas in Section III-E.

A. Mathematical Preliminaries

In a large number of applications, one is interested in extracting knowledge from the data. In real-life scenarios, each observation or measurement often consists of contributions from multiple sources. These can be divided into sources of interest, which carry valuable information, and other sources, which do not carry any information of interest. The latter type of contribution is sometimes referred to as noise, or interference, depending on the scenario and context.

Consider one point $\mathbf{x}$ in the measurement space. We can approximate it as (we write equality but we mean that we attribute a certain model to it)

$$\mathbf{x} = f(\mathbf{z}) \tag{1}$$

where $\mathbf{z} = \{z_1, \ldots, z_V\}$ is the ensemble of points in the latent variable space. These could be signals, parameters, or any other elements that contribute to the observation $\mathbf{x}$, and $f$ represents the corresponding transformation (e.g., channel effects). We are interested in scenarios where $\mathbf{z}$ is unknown, and in addition, cannot be observed directly without the intermediate transformation $f$. In certain scenarios, $f$ is also unknown. We denote all the unknown elements of the model as "latent variables."

Perhaps the first and most obvious interpretation of (1) is an inverse problem, where the goal is to obtain an estimate as precise as possible of $\mathbf{z}$ and $f$ given $\mathbf{x}$. Recovering $f$ and $\mathbf{z}$ can also be regarded as finding the simplest set of variables that explains the observations [5, Sec. I]. This interpretation particularly corresponds to exploratory research. In addition, and especially when the number of observations is large w.r.t. the size of $\mathbf{z}$, recovering the smallest-size $\mathbf{z}$ that best explains the observations can be regarded as a form of compression. This can be particularly useful in large-scale data scenarios. It is clear that in order to solve (1), one needs a sufficient number of constraints in order to (over)determine the problem, i.e., constrain the number of degrees of freedom such that the problem is well posed.

In the rest of this paper, we use standard mathematical notations. Scalars, vectors, matrices, and higher order arrays (tensors) are denoted as $a$, $\mathbf{a}$, $\mathbf{A}$, and $\mathcal{A}$, respectively. The dimensions of an $N$th-order array (tensor) are $I_1 \times I_2 \times \cdots \times I_N$, where $N = 1, 2, 3, \ldots$ implies a vector, matrix, or higher order array (tensor), respectively. $(\cdot)^{\top}$ denotes transpose or conjugate transpose,
where the exact interpretation should be understood from the context.

B. Data-Driven Versus Model-Driven Methods

Roughly, and for the sake of the discussion that follows, approaches to the problem in Section III-A can be divided into two groups: model driven and data driven. Model-driven approaches rely on an explicit realistic model of the underlying processes [12], [27, Sec. 3.3], [58], and are generally successful if the assumptions are plausible and the model holds. However, model-driven methods may not always be the best choice, for example, when the underlying model of the signals or the medium in which they propagate is too complicated, varying rapidly, or simply unknown.

In the context of multimodal data sets that are generated by complex systems such as those mentioned in Sections I and II, very little is known about the underlying relationships between modalities. The interactions between data sets and data types are not always known or sufficiently understood. Therefore, we focus on and advocate a data-driven approach. In practice, this means making the fewest assumptions and using the simplest models, both within and across modalities [5]. "Simple" means, for example, linear relationships between variables, avoiding model-dependent parameters, and/or use of model-independent priors such as sparsity, nonnegativity, statistical independence, low rank, and smoothness, to name a few. As its name implies, a data-driven approach is self-contained in the sense that it relies only on the observations and their assumed model: it avoids external input [5]. For this reason, and especially in the signal processing community, data-driven methods are sometimes termed "blind." In the rest of this section, we give a more concrete meaning to these ideas.

Data-driven methods, both single modal and multimodal, have already proven successful in a broad range of problems and applications. A noncomprehensive list includes astrophysics [59], biomedics [60], telecommunications [61], audiovision [23], chemometrics [62], and more. For further examples, see, e.g., [63]–[65] and references therein, as well as the numerous models mentioned in the rest of this paper.

In the rest of this section, we discuss and explain the role of diversity in achieving uniqueness in data-driven models. In particular, we demonstrate how the presence of multiple data sets can be exploited as a new form of diversity.

C. Diversity in Single Matrix or Tensor Decomposition Models

Earlier in this section, we argued that diversity has a key role in achieving uniqueness of analytical models. We now give a concrete mathematical meaning to this statement, by way of examples from signal processing and linear and multilinear algebra. We begin by discussing diversity in data sets that can be stacked in a single array, be it a matrix or a higher order array.

1) Diversity in Matrix Decomposition Models: Perhaps the simplest yet useful implementation of (1) is

$$x = \sum_{r=1}^{R} a_r b_r . \tag{2}$$

In many applications, model (2) is generalized as

$$x_{ij} = \sum_{r=1}^{R} a_{ir} b_{jr} \tag{3}$$

where $i = 1, \ldots, I$, $j = 1, \ldots, J$. An often-used interpretation is that $x_{ij}$ is a linear combination of $R$ signals $b_{j1}, \ldots, b_{jR}$ impinging on sensor $i$ at sample index $j$, with weights $a_{i1}, \ldots, a_{iR}$. Equation (3) can be rewritten in a matrix form as

$$\mathbf{X} = \sum_{r=1}^{R} \mathbf{a}_r \mathbf{b}_r^{\top} = \mathbf{A}\mathbf{B}^{\top} \tag{4}$$

such that $x_{ij}$ is the $(i, j)$th entry of $\mathbf{X} \in \mathbb{K}^{I \times J}$, $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$, and similarly for $\mathbf{A} \in \mathbb{K}^{I \times R}$ and $\mathbf{B} \in \mathbb{K}^{J \times R}$. The $r$th column vectors of $\mathbf{A}$ and $\mathbf{B}$ are $\mathbf{a}_r = [a_{1r}, \ldots, a_{Ir}]^{\top}$ and $\mathbf{b}_r = [b_{1r}, \ldots, b_{Jr}]^{\top}$, respectively.

The model in (4) provides $I$ linear combinations of the columns of $\mathbf{B}$ and $J$ linear combinations of the columns of $\mathbf{A}$ [57]. In the terminology of [57], $\mathbf{X}$ provides $I$-fold diversity for $\mathbf{B}$ and $J$-fold diversity for $\mathbf{A}$. Unfortunately, these types of diversity are generally insufficient to retrieve the underlying factor matrices $\mathbf{A}$ and $\mathbf{B}$. For any $R \times R$ invertible matrix $\mathbf{T}$, it always holds that

$$\mathbf{X} = \mathbf{A}\mathbf{B}^{\top} = (\mathbf{A}\mathbf{T}^{-1})(\mathbf{T}\mathbf{B}^{\top}). \tag{5}$$

Hence, the pairs $(\mathbf{A}\mathbf{T}^{-1}, \mathbf{T}\mathbf{B}^{\top})$ and $(\mathbf{A}, \mathbf{B}^{\top})$ have the same contribution to the observations $\mathbf{X}$ and thus cannot be distinguished. Consequently, one cannot uniquely identify the rank-1 terms $\mathbf{a}_r \mathbf{b}_r^{\top}$ unless $R \le 1$ [66, Lemma 4i]. We refer to this matter as the indeterminacy problem. A prevalent approach is to reduce $\mathbf{T}$ to a unitary matrix using a simplifying assumption that the columns of $\mathbf{B}$ are decorrelated. In such cases, the indeterminacy (5) is referred to as the rotation problem [2], [5], [67, Sec. 4]. Conversely, even if the rank-1 terms are known, it is clear
from (4) that they can be uniquely characterized, at most, up to $(\alpha_r \mathbf{a}_r)(\beta_r \mathbf{b}_r)^{\top} = \mathbf{a}_r \mathbf{b}_r^{\top}$, $\alpha_r \beta_r = 1$, if $R \le \min(I, J)$. The latter amounts to $\mathbf{T} = \mathbf{P}\boldsymbol{\Lambda}$, where $\mathbf{P}$ is a permutation matrix and $\boldsymbol{\Lambda}$ is diagonal and invertible. The presence of $\mathbf{P}$ implies that the indexing $1, \ldots, R$ is arbitrary. This indeterminacy is inherent to the problem and thus inevitable. If all decompositions yield the same rank-1 terms then we say that the model is unique. The fact that the factorization of a matrix into a product of several matrices is generally not unique for $R > 1$ unless additional constraints are imposed is well known [66, Sec. 3].

We now discuss approaches to fix the indeterminacy in (5). In a general algebraic context, matrix factorizations such as singular value decomposition (SVD) and eigenvalue decomposition (EVD) are made unique by imposing orthogonality on the underlying matrices and inequality on the singular or eigenvalues [66, Sec. 3], [68]. Such constraints are convenient mathematically but usually physically implausible since they yield noninterpretable results [69]. It is thus desirable to find other types of constraints that allow for better representation of the natural properties of the data.

Depending on the application, the matrix factorization model in (4) may be interpreted in different ways that give rise to different types of constraints. When the model in (4) is used to analyze data, it is sometimes termed factor analysis (FA) [70]. In the signal processing community, when the columns of $\mathbf{B}$ represent signal samples and the goal is to recover these signals given only the observations $\mathbf{X}$, model (4) is commonly associated with the blind source separation (BSS) problem [63], [71]. The goal of FA and BSS is to represent $\mathbf{X}$ as a sum of low-rank terms with interpretable factors [65], where the difference lies in the type of assumptions being used.

In FA, one approach to fixing the indeterminacy (5) is by imposing external constraints [5, Sec. I]. This is not a data-driven approach and is thus excluded from our discussion. A data-driven approach to FA is to use physically meaningful constraints on the factor matrices that reduce the number of degrees of freedom. For example, a specific arrangement of a receive antenna array or other properties of a communication system may be imposed via a Vandermonde [72]–[75] or Toeplitz [76] structure. Alternatively, a factor may reflect a specific signal type such as constant modulus or finite alphabet [57], [61]. Another approach is to use sparsity [77]–[79].

Probably the most well-known BSS approach to fix the indeterminacy in (5) is independent component analysis (ICA). ICA is more commonly formulated as

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t), \quad t = 1, \ldots, T \tag{6}$$

where $\mathbf{s}(t) = [s_1(t), \ldots, s_R(t)]^{\top} \in \mathbb{K}^{R \times 1}$ is a vector of $R$ statistically independent random processes known as "sources," and $\mathbf{x}(t) \in \mathbb{K}^{I \times 1}$ their observations. $\mathbf{A}$ is full column rank. The link with (4) is established via $\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(T)]$, $J = T$, and $\mathbf{B}^{\top} = [\mathbf{s}(1), \ldots, \mathbf{s}(T)]$ such that the $R$ columns of $\mathbf{B}$ represent samples from the $R$ statistically independent random processes. ICA uses the "spatial diversity" provided by an array of sensors, which amounts to the $I$-fold diversity for $\mathbf{B}$ mentioned before, together with an assumption of statistical independence on the sources, in order to obtain estimates of $\mathbf{s}(t)$ whose entries are as statistically independent as possible. This amounts to fixing the indeterminacy (5). Under these assumptions, separation can be achieved if the statistically independent sources are nonstationary, nonwhite, or non-Gaussian [71], [80]–[82]. The first two can be interpreted as diversity across time or diversity in the spectral domain: the sources must have different nonstationarity profiles or power spectra [81, Sec. 6]. Non-Gaussianity is associated with diversity in higher order statistics (HOS). A plethora of methods has been devised to exploit this diversity [63], [80], [83]–[86], and the matter is far from being exhausted.

Both FA and ICA have been used for decades and with much success to analyze a very broad range of data, their success being much due to the simplicity of their basic idea and the fact that very robust algorithms exist that yield satisfying results. Therefore, they are at the focus of our discussion. It should be kept in mind, however, that in practice, many observations can be better explained by other types of underlying models that are not limited to decomposition into a sum of rank-1 terms, statistical independence, linear relationships, or even matrix factorizations. Other properties that are often used to achieve uniqueness, improve numerical robustness and enhance interpretability are, for example, nonnegativity, sparsity, and smoothness [63]. Proving uniqueness for these types of factorizations is a matter of ongoing research.

Any type of constraint or assumption on the underlying variables that helps achieve essential uniqueness can be regarded as a "diversity."

2) Going Up to Higher Order Arrays: In Section III-C1, we have seen that the two linear types of diversity that are present in the rows and columns of $\mathbf{X}$ are not sufficient in order to obtain a unique matrix factorization. We saw that uniqueness can be established by imposing sufficiently strong constraints on the factor matrices $\mathbf{A}$ and $\mathbf{B}$ in (4). An alternative approach is to enrich the observational domain, without constraining the factor matrices. For example, if the two linear diversities given by the 2-D array $\mathbf{X}$ are interpreted as spatial and temporal, it is possible to obtain uniqueness by adding a third diversity in the frequency domain, without imposing constraints on the factor matrices. We now explain how this can be done.
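Before a third diversity is added, it may help to see numerically why the two-way model (4) alone is not unique. The following is a minimal sketch in plain Python, not taken from the paper: the factor values, the two choices of T, and the helper names (`matmul`, `inverse_2x2`, `apply_T`, `rank1_terms`) are all hypothetical and chosen only for illustration. It checks that a generic invertible T in (5) reproduces the same observations from different rank-1 terms, whereas a scaled permutation T = PΛ merely reorders them.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def inverse_2x2(T):
    """Exact inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply_T(A, B, T):
    """Return the alternative factors (A T^{-1}, B') with B'^T = T B^T, as in (5)."""
    A_alt = matmul(A, inverse_2x2(T))
    B_alt = transpose(matmul(T, transpose(B)))
    return A_alt, B_alt

def rank1_terms(A, B):
    """List the R rank-1 matrices a_r b_r^T, whose sum is A B^T as in (4)."""
    R = len(A[0])
    return [[[a[r] * b[r] for b in B] for a in A] for r in range(R)]

# Hypothetical factors: I = 3 sensors, J = 2 samples, R = 2 terms.
A = [[F(1), F(2)], [F(0), F(1)], [F(3), F(1)]]  # I x R
B = [[F(1), F(1)], [F(2), F(-1)]]               # J x R
X = matmul(A, transpose(B))

# A generic invertible T mixes the columns of A and B.
A1, B1 = apply_T(A, B, [[F(2), F(1)], [F(1), F(1)]])
same_data = matmul(A1, transpose(B1)) == X
same_terms = rank1_terms(A1, B1) == rank1_terms(A, B)

# T = P * Lambda (a scaled permutation) only reorders the rank-1 terms.
A2, B2 = apply_T(A, B, [[F(0), F(2)], [F(3), F(0)]])
same_up_to_order = sorted(rank1_terms(A2, B2)) == sorted(rank1_terms(A, B))

print(same_data, same_terms, same_up_to_order)
```

With these values the script prints `True False True`: the observations coincide, the rank-1 terms do not, and only the scaled-permutation case leaves them unchanged up to ordering. Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested without a floating-point tolerance.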

The two-way model (4) can be generalized by extending (3) to

$$x_{ijk} = \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr} \tag{7}$$

with $i = 1, \ldots, I$, $j = 1, \ldots, J$, $k = 1, \ldots, K$. These observations can be collected into a three-way array (third-order tensor) with dimensions $I \times J \times K$

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \tag{8}$$

whose $(i, j, k)$th entry is $x_{ijk}$. $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_R] \in \mathbb{K}^{I \times R}$, $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_R] \in \mathbb{K}^{J \times R}$, and $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_R] \in \mathbb{K}^{K \times R}$ are matrices whose column vectors are $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r = [c_{1r}, \ldots, c_{Kr}]^{\top}$, respectively. Here, $\mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \in \mathbb{K}^{I \times J \times K}$ is an outer product of three vectors and thus is a rank-1 term. Its $(i, j, k)$th entry is $a_{ir} b_{jr} c_{kr}$. When (8) holds and is irreducible in the sense that $R$ is minimal, it is sometimes referred to as the canonical polyadic decomposition (CPD) [4], [87]. Note that (4) can be rewritten as $\mathbf{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r$.

In striking difference to (5), the pair $\{(\bar{\mathbf{A}}, \bar{\mathbf{B}}, \bar{\mathbf{C}}), (\mathbf{A}, \mathbf{B}, \mathbf{C})\}$ has the same triple product (8) if and only if there exists an $R \times R$ permutation matrix $\mathbf{P}$ and three diagonal matrices $\boldsymbol{\Lambda}_A, \boldsymbol{\Lambda}_B, \boldsymbol{\Lambda}_C$ such that

$$\bar{\mathbf{A}} = \mathbf{A}\mathbf{P}\boldsymbol{\Lambda}_A, \quad \bar{\mathbf{B}} = \mathbf{B}\mathbf{P}\boldsymbol{\Lambda}_B, \quad \bar{\mathbf{C}} = \mathbf{C}\mathbf{P}\boldsymbol{\Lambda}_C, \quad \text{and} \quad \boldsymbol{\Lambda}_A \boldsymbol{\Lambda}_B \boldsymbol{\Lambda}_C = \mathbf{I}_R \tag{9}$$

even for $R > 1$, under very mild constraints on $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ [66], [67], [88]. Equation (9) can be reformulated as $\mathcal{X} = \sum_{r=1}^{R} (\alpha_r \mathbf{a}_r) \circ (\beta_r \mathbf{b}_r) \circ (\gamma_r \mathbf{c}_r)$ $\forall\, \alpha_r \beta_r \gamma_r = 1$. If a three-way array is subject only to these trivial indeterminacies (alternatively: if all CPDs yield the same rank-1 terms) then we say that it is (essentially) unique.

The key difference between matrix and tensor factorizations is that CPD is inherently "essentially unique" up to a scaled permutation matrix, whereas in the bilinear case the indeterminacy is an arbitrary nonsingular matrix.

The uniqueness of the CPD becomes even more pronounced when it is joined with the fact that it holds also for $R > \max(I, J, K)$ [67], [89]. This is in contrast to FA, where it holds only for $R \le \min(I, J)$. The immediate outcome is that underdetermined cases of "more sources than sensors" can be handled straightforwardly. In addition, the factor matrices $\mathbf{A}, \mathbf{B}, \mathbf{C}$ need not be full column rank [67], [89], [90, Th. 2.2]; see Example III-D.2. Upper bounds on $R$ have first been derived in [88] and [89]. These results have later been extended to higher order arrays, where "order" indicates the number $N$ of indices $x_{ijk\ldots}$ and $N \ge 3$ [57], [72], [91]. Recently, more relaxed bounds that guarantee uniqueness for larger $R$ have been derived; see, e.g., [92]–[97] and references therein.

In analogy to (4), the three-way array $\mathcal{X}$ provides three modes of linear diversity. It contains $JK$ linear combinations of the columns of $\mathbf{A}$, $IK$ of $\mathbf{B}$ and $IJ$ of $\mathbf{C}$ [57]. The fact that there exist multiple linear relationships within the model gives it the name "multilinear." As argued in [57], in many real-life scenarios, often there exist $N \ge 3$ linear types of diversity that admit the multilinear decomposition (8) and thus guarantee uniqueness without any further assumptions. For example, in direct-sequence code-division multiple-access (DS-CDMA) communication systems, one may exploit (spatial × temporal × spreading code) [57] or (sensor × polarization × source signal) types of diversity; in psychometrics, (occasions × persons × tests) [70] or (observations × scores × variables) [98]; in chemometrics and metabolomics, (sample × frequency × emission profile × excitation profile) [8], [62], [99]; in polarized Raman spectroscopy, (polarization × spatial diversity × wavenumber) [100]; in EEG, (time × frequency × electrode) [101]–[103]; and in fMRI, (voxels × scans × subjects) [104].

Each type of constraint, structural (i.e., on the factor matrices) or observational (i.e., any of the nondegenerate modes of a matrix or a higher order array), that contributes to the unique decomposition and thus to the identifiability of the model, and cannot be deduced from the other constraints, i.e., is "disjoint" [16], can be regarded as a "diversity." In particular, each observational mode in the $N$th order tensor (8) is a "diversity." Hence, a tensor order corresponds to the number of types of (observational) diversity [57], [61].

The explicit link between tensor order as a diversity and data fusion has been made in [16]. The fact that we can now associate "diversity" with well-defined mathematical properties of an analytical model implies that we can now link results on uniqueness, identifiability, and performance with the number of types of diversity that this model involves. Hence, the contribution of each "diversity" to the model can now be characterized and quantified [57], [82].

An application of this idea is the question raised in [57] as to how the number of types of observational
diversity, i.e., tensor order $N \ge 3$, contributes to the identifiability. To answer this question, it is shown that as $N$ increases, indeed the bound on the number of rank-1 terms that can be uniquely identified becomes more relaxed. In other words, more observational modes make it possible to identify more sources in the same setup. Hence, this is a proof that increasing observational diversity improves identifiability. This is an example of how questions regarding multimodality and diversity are a stimulus for new mathematical and theoretical insights.

Until now, we have looked at $N$-way arrays as a way to represent simultaneously $N$ (multi)linear types of diversity. An interesting link with the matrix factorization problem in Section III-C1 is achieved if we look at an $N$-way array as a structure that stores $(N - 1)$-way arrays by stacking them along the $N$th dimension. As noted, e.g., in [4], [5], [70], [89], and [105], the CPD can be thought of as a generalization of FA, as follows. Let

$$\mathbf{X}_k = \mathbf{A}\boldsymbol{\Lambda}_k \mathbf{B}^{\top}, \quad k = 1, \ldots, K \tag{10}$$

denote $K$ instances of the FA problem (4) where the diagonal $R \times R$ matrix $\boldsymbol{\Lambda}_k = \mathrm{diag}\{c_{k1}, \ldots, c_{kR}\}$ can be regarded as a scaling of the rows of $\mathbf{B}$. It can be readily verified that stacking the $K$ matrices $\mathbf{X}_k$ in parallel along the third dimension results in (8). As we already know from (9), the rotation problem is eliminated [89]. It is thus no surprise that the tensor decomposition (8) is also known as parallel factor analysis (PARAFAC) [5]. Combining this observation with the perspective of data fusion, it has been noted that a tensor decomposition can be regarded as a way to fuse and jointly analyze data of multiple observations when all the data sets have the same size and share the same type of decomposition [16]. Note that this notion applies also to two-way arrays. For example, if we associate a BSS interpretation to the model in (4), the $i$th row can be regarded as the contribution of the $i$th sensor, and stacking all $I$ observations yields the $I \times J$ observation matrix $\mathbf{X}$ [16].

Model (10) can be linked not only to FA but also to BSS, as follows. In Section III-C1, we mentioned that uniqueness of BSS can be achieved if the sources are non-Gaussian, nonstationary, or not spectrally flat (i.e., colored). These properties can be reformulated algebraically as a symmetric joint diagonalization (JD) of several matrices [81], [83], i.e., a special case of (10) when $\mathbf{A} = \mathbf{B}$. As we have just explained, JD can be interpreted as a simple data fusion problem in which several data sets share the same mixing matrix. A key point is that diagonalization of a single matrix has an infinite number of solutions, and each of these "nonproperties" [81] provides a set of at least two matrices that can be jointly diagonalized, thus fixing the indeterminacies.

The discussion in this section implies that if we can represent our observations in terms of $N \ge 3$ linear types of diversity or stack multiple data sets in an $N$th-order tensor then we may benefit from the following powerful properties.

Why are tensor decompositions useful for data fusion?
(1) The model for $R \ge 1$ rank-1 terms is identifiable: The exact maximal number of identifiable rank-1 terms is generally unknown, though bounds that depend on various properties of the factor matrices exist.
(2) Underdetermined mixtures are identifiable: identification of $R \ge 1$ rank-1 terms is possible even for "more sources than sensors" cases.
(3) Factor matrices need not be full rank: identifiability of $R \ge 1$ rank-1 terms is possible even if no factor matrix $\mathbf{A}, \mathbf{B}, \mathbf{C}, \ldots$, is of full rank.
(4) Rank-1 terms are identifiable up to permutation: when a tensor decomposition is interpreted as joint analysis of lower order tensors, the arbitrary individual permutation that arises if each decomposition is done separately becomes common to all decompositions.
(5) Increasing $N$ allows uniqueness for higher $R$: more types of observational diversity make it possible to resolve more latent sources.
(6) There is no need for structural constraints or assumptions such as statistical independence, nonnegativity, sparsity, or smoothness in order to achieve a unique decomposition. And yet, multilinear structures readily admit such additional types of diversity that can further contribute to interpretability, robustness, uniqueness, and other desired properties; see the end of Section III-D for examples.

More properties of tensor decompositions and their uses in various engineering applications can be found, for example, in [64], [65], [106], and references therein.

Concluding Section III-C, Sections III-C1 and III-C2 presented two ways to look at matrices or tensors as data fusion structures. We have shown that matrix or tensor decompositions provide a natural framework to incorporate multiple types of observational diversity [16] on top of structural ones. We have shown that matrices and higher order tensors can be regarded as ways to jointly analyze multiple observations of the same data, when data sets share the same underlying structure [16]. It is thus no surprise that many multimodal data fusion models use matrix or tensor decompositions as their underlying analytical engine.

Until now, we focused on decompositions into a sum of rank-1 factors and statistical independence. In fact, these constraints can be regarded as too strong. Indeed, there exist other factorizations that may represent more flexible underlying relationships; see the end of Section III-D for
examples. It is only for the sake of simplicity and limited space that we restrict our discussion to one type of decomposition.

D. A Link Between Data Sets as a New Form of Diversity

As explained in Section III-C, if all data sets share the same underlying factorization model, and in addition, admit a (multi)linear relationship, then it may be possible to use a single matrix or tensor decomposition in order to perform data fusion. This assumption may be challenged in various scenarios. An obvious conflict arises when data sets are given in different types of physical units. A technical difficulty is when data sets are stacked in arrays of different orders, such as matrices versus higher order arrays. Further examples are data sets with different latent models, different types of uncertainty, or when not all factors or latent variables are shared by all data sets. In such cases, we say that data sets are heterogeneous [8]. While each of these complicating factors may be accommodated by preprocessing the data sets such that they all comply, e.g., by normalizing, realigning, interpolating, upsampling or downsampling, using features, or reducing dimensions, these procedures have the risk of being lossy in various respects (for further discussion on complicating factors in data fusion, see Section IV). For these reasons, more elaborate models that allow heterogeneous data sets to remain in their most explanatory form and still perform true data fusion, i.e., in the sense of Definition I.2 and Section V-A, have been devised.

In the following, we discuss data fusion approaches that go beyond single matrix or tensor factorization. Our emphasis is on demonstrating how the concepts of true data fusion allow pushing even further the limits of extracting knowledge from data that were summarized in Section III-C2. We show how these properties are carried over to more elaborate data fusion models and how they can be reinforced into stronger properties that cannot be achieved using single-set single-modal data. In particular: 1) allowing more relaxed uniqueness conditions that admit more challenging scenarios: for example, more relaxed assumptions on the underlying factors, and the ability to resolve more latent variables (low-rank terms) in each data set; and 2) terms that are shared across data sets enjoy the same permutation at all data sets. This obviates the need for an additional step of identifying the arbitrarily ordered outputs of each individual decomposition and matching them, a task that generally cannot be accomplished without additional information, in a blind or data-driven context. Fixing the permutation reduces the number of degrees of freedom and thus enhances performance and interpretability. The following examples illustrate these points.

Example III-D.1: Coupled Independent Component Analysis: Consider the ICA problem (6). It is well known that statistically independent real-valued Gaussian processes with independent and identically distributed (i.i.d.) samples, mixed by an invertible A, cannot be blindly separated based on their observations $x(t)$ alone [71], [107]. If several such data sets are considered simultaneously, however, without changing the model within each mixture, but allowing statistical dependence across data sets, then a unique and identifiable solution to all these mixtures, up to unavoidable scale and permutation ambiguities, exists [82]. This model, when not restricted to Gaussian i.i.d. samples, is known as independent vector analysis (IVA) [82], [108], [109] and can be solved using second-order statistics (SOS) alone [110], [111].

IVA was originally proposed to separate convolutive mixtures of audio signals [108], [109]. In the frequency domain, this amounts (approximately) to resolving M ICA mixtures (6)

$x^{(m)}(t) = A^{(m)} s^{(m)}(t), \quad t = 1, \ldots, T$    (11)

where the M matrices $A^{(m)}$, $m = 1, \ldots, M$, are generally different (in this context, t denotes samples in the frequency domain and m are the frequency bins). For simplicity, we assume that both $x^{(m)}(t)$ and $s^{(m)}(t)$ are $I \times 1$. When each mixture (11) is solved separately, it is associated with an individual permutation matrix $P^{(m)}$. It is clear that proper separation and reconstruction of the I audio signals cannot be achieved if the elements of the same source at different frequency bins are not properly matched. The key point in IVA w.r.t. a collection of ICA is that it exploits statistical dependence among latent sources that belong to different mixtures, as illustrated in Fig. 1. Under certain conditions, the IVA framework provides a single $R \times R$ permutation matrix $P^{(m)} = P$ that applies to all the involved mixtures [82], [109].

Fig. 1. Diagram of the IVA model. Figure reproduced from [116, Fig. 1].

The ability of IVA to obviate the need to match the outputs of M separate ICA soon turned out useful far beyond convolutive mixtures: it has since been applied to


fMRI group data analysis [112], [113], multimodal fusion of several brain-imaging modalities [114], and the analysis of temporal dynamic changes [115]. IVA extends CCA [1] and its multiset extension (MCCA) [3], which have both been widely used for fusion [31], [36], [58], [116]–[118], to the case where not only second-order statistics but all-order statistics are taken into account [82]. Recently, a generalization of IVA that allows decomposition into terms of rank larger than one has been proposed [119]–[121]. In addition, since IVA is a generalization of ICA, it readily accommodates additional types of diversity such as coloured (i.e., not spectrally flat) or nonstationary sources [111], [122] (recall Section III-C1). Identifiability analysis of the multiple types of diversity in IVA is given in [82] and [123]. It should be noted that the uniqueness results for coupled CPD [90] (Example III-D.2) require at least one tensor of order larger than two in the coupled set and thus they cannot be applied to IVA.

Example III-D.2: Coupled Tensor Decompositions: In multilinear algebra, an ongoing endeavour is to obtain uniqueness conditions on a tensor decomposition [67], [88], [92]–[97]. The goal is to derive bounds that are as relaxed as possible on the largest R that still satisfies essential uniqueness (9). As an example, two necessary conditions for the essential uniqueness of the CPD of a third-order tensor (8) are that

$(A \odot B)$, $(C \odot A)$ and $(B \odot C)$ have full column rank; and $\min(k_A, k_B, k_C) \geq 2$    (12)

(e.g., [72] and [92]) where $\odot$ denotes the columnwise Khatri–Rao product and $k_A$ is the Kruskal-rank of matrix A, equal to the largest integer $k_A$ such that every subset of $k_A$ columns of A is linearly independent [67].

Consider now M third-order tensors $\mathcal{X}^{(m)} \in \mathbb{C}^{I_m \times J_m \times K}$, $m = 1, \ldots, M$, with the same factorization as (8), that are coupled by sharing one factor

$\mathcal{X}^{(m)} = \sum_{r=1}^{R} a_r^{(m)} \circ b_r^{(m)} \circ c_r$    (13)

where the factor matrices of the mth tensor are $A^{(m)} = [a_1^{(m)}, \ldots, a_R^{(m)}] \in \mathbb{C}^{I_m \times R}$, $B^{(m)} = [b_1^{(m)}, \ldots, b_R^{(m)}] \in \mathbb{C}^{J_m \times R}$, $C = [c_1, \ldots, c_R] \in \mathbb{C}^{K \times R}$. The coupled rank of the set $\{\mathcal{X}^{(m)}\}$ is defined as the minimal number of rank-1 terms $a_r^{(m)} \circ b_r^{(m)} \circ c_r$ that yield $\{\mathcal{X}^{(m)}\}$ in a linear combination [90]. If the coupled rank of $\{\mathcal{X}^{(m)}\}$ is R, then (13) is called the coupled CPD of $\{\mathcal{X}^{(m)}\}$. It has recently been shown that the coupled CPD may be unique even if conditions (12) are violated such that none of the individual CPDs in (13) is unique [90].

This fundamental result extends to more elaborate scenarios. Uniqueness can be further improved if the order of (at least one of) the involved tensors increases [90]. This is analogous to the previously mentioned result (Section III-C2) for a single tensor, that increasing its order N relaxes the bound on R [57], [91]. Adding assumptions such as individual uniqueness of one of the involved CPDs, full column rank of the shared factor C, or a specific structure such as a Vandermonde matrix, also reinforces the uniqueness of the whole decomposition [90], [124]. Finally, all these results can be extended to more elaborate tensor decompositions that are not limited to rank-1 terms [90].

Another benefit from coupling is that it helps relax the permutation ambiguity. Coupled tensor decompositions have a unique arbitrary permutation matrix in a manner that extends single-tensor results (9) [90], [125, Sec. III.A]. Consequently, the low-rank terms that are shared by all the coupled tensors automatically have the same ordering at the output of the algorithm.

Linked-mode PARAFAC in which two or more third-order tensors share a mode has first been suggested in [126, p. 281]. The idea was extended to the case of arrays of different orders (one of them must be three-way or higher) in [69, Sec. 5.1.1]. Coupled tensor decompositions have already proven useful in telecommunications [125], multidimensional harmonic retrieval (MHR) [124], chemometrics and psychometrics [8], [99], and more. See Fig. 2(a) for an example in metabolomics. Linked-mode analysis has also been proposed as a means to represent missing values: each tensor is a data set that by itself is complete, but as a whole, each data set has only partial information w.r.t. a larger array in which all these data sets are enclosed. This idea is accompanied by a more flexible coupling design where more than one mode may be shared between two tensors and the coupling may even involve only parts of modes, i.e., shared (sub) factors [69, Sec. 5.1.2]. Fig. 2(b) illustrates this idea. Missing values are further discussed in Section IV-B4.

We now summarize Examples III-D.1 and III-D.2. In Section III-C, we have shown that both ICA and PARAFAC can provide sufficient diversity to overcome the indeterminacy problem inherent to FA. We then extended our discussion to jointly analyzing M such problems: $M \times \text{ICA} \xrightarrow{\text{joint pdf}} \text{IVA}$ (Example III-D.1) and $M \times \text{PARAFAC} \xrightarrow{\text{shared factor}} \text{coupled CPD}$ (Example III-D.2). We have shown that by properly defining a link between data sets, we can extend and reinforce uniqueness and identifiability beyond those obtained by individual analysis, up to the point of establishing uniqueness of otherwise nonunique scenarios. In PARAFAC, mixtures share certain factors, whereas in IVA, each mixture has its own individual parameters and the link is via statistical dependence between certain variables. Next, we have shown that all these models are flexible in the sense that they can easily be fine-tuned and modified in multiple ways, in order to better fit various real-life data. More specifically, they readily admit


various types of diversity. A first generalization of these basic models is by relaxing the assumptions within each decomposition: allowing statistical dependence between latent sources of the same mixture in ICA (respectively, IVA) leads to independent subspace analysis (ISA) [128]–[132] (respectively, joint independent subspace analysis (JISA) [119]–[121]) as well as other BSS models [133], [134]. Relaxing the sum-of-rank-1-terms constraint in PARAFAC leads to more flexible tensor decompositions such as Tucker [6], [7], block term decomposition (BTD) [135], three-way decomposition into directional components (DEDICOM) [136], and others [137]. A second generalization is by combining several types of constraints and assumptions: for example, PARAFAC may be combined with statistical independence [104], [138], nonnegativity, sparsity, as well as structure of the latent factors: Vandermonde [72]–[75], Toeplitz [76], among others [16], [65], [106], [139]. A third generalization is increasing the number of types of observational diversity by increasing the tensor order [57], [91]. A fourth is by linking data sets, leading to various coupled models, as explained in this section. When all these types of generalizations are taken into account, one obtains very general data fusion frameworks such as structured data fusion (SDF) [16], coupled matrix and tensor factorization (CMTF) [99], linked multiway component analysis (LMWCA) [65], and others [140]–[142]. These generalizations, and many more, are further discussed in Section V. In all cases, the link between underlying factors at different modalities helps not only to enhance uniqueness but also to enable the same ordering for all decompositions, thus further enhancing performance, identifiability, and interpretability.

Fig. 2. Illustration of different types of coupling between matrices and third-order tensors. (a) Linked-mode matrices and tensors in metabolomics. Data sets represent four different acquisition methods. All data sets share the same “samples” mode. Figure reproduced from [127]. (b) Arrays (in this case, third-order tensors) may be coupled in different modes and also via only part of a mode. In addition, linked arrays may be regarded as elements in a larger volume (the red cube), in which certain data points are missing. Figure reproduced from [69, Fig. 3].

E. Conclusion: A Link Between Data Sets Is Indeed a New Form of Diversity

The strength of IVA and coupled CPD over a set of unlinked factorizations lies in their ability to exploit commonalities among data sets. In IVA, it is the statistical dependence of sources across mixtures; in coupled CPD, it is the shared factors. In both scenarios, the links themselves are new types of information: the fact that data sets are linked, that elements in different data sets are related (or not), and the nature of these interactions, bring new types of constraints into the system that allow to reduce the number of degrees of freedom and thus enhance uniqueness, performance, interpretability, and robustness, among others. On top of that, the links among the data sets allow desired properties within one data set to propagate to the ensemble and enhance the properties of the whole decomposition [16]. This is a concrete mathematical manifestation of the raison d’être of data fusion that we have mentioned in Section II, implying that [11, Sec. 9], [82], [143]

    An ensemble of data sets is “more than the sum of its parts” in the sense that it contains precious information that is lost if these relations are ignored.

The models that we have just presented allow multiple data sets to inform each other and interact, as formulated in Definition I.2 and further elaborated in Section V-A. Therefore, in the same vein of the preceding discussion and Definition I.1, we conclude that [16], [82], [143]

    Properly linking data sets can be regarded as introducing a new form of diversity, and this diversity is the basis and driving force of data fusion.

IV. CHALLENGES AT THE DATA LEVEL

Thanks to recent advances, the availability of multimodal data is now a fact of life. The acquisition of multimodal data, however, is only a first step. In this section, and Section V that follows, we discuss some of the issues that should be addressed in the actual processing of multimodal data. In this section, we focus on challenges imposed by the data. These can be partitioned into challenges at the acquisition and observation level and challenges due to various types of uncertainty in the data. A number of approaches, to both types of challenges, are briefly mentioned in this section. Section V complements this


section with a more comprehensive discussion of how to approach, in practice, some of these challenges, from a model design perspective.

A. Challenges at the Acquisition and Observation Level

Challenge IV-A.1: Noncommensurability: As explained in Sections I and II, a key motivation for multimodality is that different instruments are sensitive to different physical phenomena, and consequently, report on different aspects of the underlying processes. A natural outcome is that the raw measurements may be represented by different types of physical units that do not commute. This situation is known as noncommensurability. Numerous examples of noncommensurable data fusion scenarios were given in Section II. Allowing noncommensurable data sets to inform each other and interact is probably the first and foremost task that one encounters in a large number of multimodal data fusion scenarios [8].

Challenge IV-A.2: Different Resolutions: It is most natural that different types of acquisition methods and observation setups provide data at different sampling points, and often at very disparate resolutions. The specific type of challenge that is associated with this property varies according to the task. Consequently, solutions are diverse. In some cases, different resolutions may be associated with various types of uncertainty, as explained in Section IV-B. Below, we list some scenarios in which challenges related to different resolutions occur. Data with different resolutions is a prevalent challenge in multimodal image fusion [13], as well as many other imaging techniques. For example, EEG has an excellent temporal but low spatial resolution, whereas fMRI has a fine spatial resolution but a very large integration time (Example II-B.1). In remote sensing (Example II-C.1), a common task is “pan-sharpening” [13, Ch. 9], [40]: merging a high-spatial, low-spectral (single band) resolution panchromatic image with a lower spatial, higher spectral (several bands) resolution multispectral image, in order to generate a new synthetic image that has both the higher spectral and spatial resolution of the two. In audiovisual applications, the temporal resolution of the signals differs by orders of magnitude. An audio signal is usually sampled at several kilohertz whereas the video signal is typically sampled at 15–60 Hz [144] (Example II-A.1). In meteorological monitoring (Example II-C.2), each modality has very distinct spatial and temporal resolutions; this is probably the reason why solutions based on data integration [47] (see Section V-A) are preferred. Different sampling schemes in coupled matrix and tensor decompositions are discussed, e.g., in [145] and [146].

Challenge IV-A.3: Incompatible Size: In practical situations, it is quite rare that different data sets contain exactly the same number of data samples. As explained in Section IV-B, this incompatibility may be associated with various types of uncertainty. Data size incompatibility may be due to a different number of samples at each observational mode, as explained in Challenge IV-A.2. Among possible causes are different acquisition techniques and experimental setups. The difference in size becomes even more acute if data sets are arrays of different orders [8], [147], as is often encountered in chemometrics, metabolomics [e.g., Fig. 2(a)], and psychometrics, among others.

Challenge IV-A.4: Alignment and Registration: Registration is the task of aligning several data sets, usually images, on the same coordinate system. Registration is particularly challenging when 3-D biomedical imaging techniques are involved (Example II-B.1). In the first scenario, images of the same subject are taken at different times using the same imaging technique. The difficulty arises from the fact that each image has some bias and spatial distortion w.r.t. the others since the patient is never precisely in the same position. In this case, image registration usually relies on the basic assumption that image intensities are linearly correlated [148]. This assumption, however, is much less likely in the second scenario, for multimodal images. Consider, for example, registration of modalities that convey anatomical information with others that report on functional and metabolic activity. Naturally, the information conveyed by each modality is inherently of different physical nature. Other complicating factors include different types of noise, spatial distortions, varying contrasts, and different positions of the imaging instruments. One approach uses information theory and maximizes mutual information [148], [149]. In remote sensing (Example II-C.1), images of the same area are taken by different types of instruments, e.g., airborne SAR and satellite-borne LiDAR, and possibly at different times and conditions, e.g., before and after landscape-changing events such as natural disasters. In principle, one can use the global positioning system (GPS) for aligning the images. However, even the GPS signal has a finite spatial precision. In biomedical imaging, the BOLD signal, to which fMRI is sensitive, has a large integration time and thus a delay w.r.t. EEG. This leads to noninstantaneous coupling, even if the measurements themselves are perfectly synchronized.

Calibration can be interpreted as a special case of alignment and registration using two sets of measurements, and thus it can be considered as a form of data fusion. Calibration is a major task in chemometrics, where it is often achieved via regression methods. Frequently used models such as multiway partial least squares (PLS) [150] and PARAFAC [62] are less adequate when the underlying profiles change shape from sample to sample. A regression method that can accommodate such variability in multiway arrays is proposed in [151], and a multimodal audiovisual calibration technique is given in [25]. The advantage of the proposed solution is that it is based on direction of arrival estimation, an easier task than


microphone-based time-difference-of-arrival estimation, which requires strict synchronization between microphones. The challenge of automatic calibration of audiovisual sensors (and others) in the context of HMI (Example II-A.2) is discussed in [10, Sec. V.C].

B. Challenges Due to Various Types of Uncertainty

We now turn to discussing uncertainty in the data. Any real-world set of observations is prone to various types of uncertainty. The presence of heterogeneous multiple data sets creates new types of uncertainty that may also be heterogeneous. We argue that in such cases, it is the complementarity and diversity (Definition I.1) of the data sets that should be exploited to resolve these challenges [9].

Challenge IV-B.1: Noise: Thermal noise, calibration errors, finite precision, quantization or any other quality degradation in the measurements is unavoidable. For simplicity, we denote all these unavoidable phenomena as “noise.” Naturally, each acquisition method produces not only heterogeneous types of desired data, but also heterogeneous types of errors [8]. The question of how to jointly weigh or balance different sources of error is brought up in a number of data fusion scenarios, although most data fusion work currently ignores noise. Naturally, in the presence of noise, an appropriate model yields a more precise inference.

Several authors [152]–[154] use an additive noise model with a distribution whose parameters may vary within and across data sets. A Bayesian or maximum-likelihood (ML) framework is then applied to estimate the noise parameters. In some cases, the noise estimates are interpreted as weights that balance the contribution of each element [153] (note the link with Challenge IV-B.2). Beal et al. [24] propose a graphical model for audiovisual object tracking in which they attribute different parameters to audio and video noise, and estimate both in a Bayesian inference framework. All these methods assume independence among sources of noise across modalities. However, ignoring possible links (correlations) between noise across data sets may lead to bias [9].

Challenge IV-B.2: Balancing Information From Different Origins: In practice, for various reasons, not all observations or data entries have the same level of confidence, reliability or information quality [11], [21], [155]. Below, we list scenarios in which this occurs, as well as some approaches to resolve these problems.

In real-life scenarios, certain sensors may provide information that has more value than others, or certain measurements might be taken at better controlled scenarios. For example, in a medical questionnaire about patients' symptoms, certain symptoms may be more obvious, whereas others may be harder to define [155]. In the same vein, heterogeneity of acquisition methods implies heterogeneity in their level of importance or usefulness. For example, we may use questionnaires filled by specialists (experts) and others by patients (nonexperts). Alternatively, if we consider two medical questionnaires, patients' symptoms and patients' diagnosis, the first one may be more reliable since symptoms are observed directly whereas diagnosis relies on interpretation [155]. In order to address this issue, Wilderjans et al. [153], [155] propose to associate the level of reliability with noise, and use appropriate weights, obtained via an ML variant of simultaneous component analysis (SCA). Şimşekli et al. [156] propose individual weights for data sets with different divergence measures based on their relative “importance.” Similar to [153], these weights are interpreted as noise variances. Finally, in [47], Liberman et al. process several meteorological monitoring modalities separately and then make a soft decision, using an optimal weighted average based on location, number of links, rainfall intensity and other parameters.

Another source of potential unbalance is data sets of different size (recall Challenge IV-A.3). In the absence of additional assumptions, a simulation study favors equal weight to each data entry regardless of its data set of origin, over the alternative approach of weighting data sets by the number of their respective entries [147]. This approach is generalized to the case of missing values, where the weights should be proportional to the number of nonmissing entries in each data set [16].

Challenge IV-B.3: Conflicting, Contradicting, or Inconsistent Data: Whenever more than one origin of information is available, be it a single sensor or an ensemble of observations, conflicts, contradictions, and inconsistencies may occur. If data are fused at the decision level, then a decision or voting [8] rule may be applied, as in the fusion of different classification maps in remote sensing (see Example II-C.1). Other approaches, related to multisensor data fusion, are discussed in [9] and [157]. When only two data sets are confronted, more elaborate solutions may be required. An obvious challenge is to devise a suitable compromise. A more fundamental challenge, however, is identifying these inconsistencies.

In [158], Tmazirte et al. consider the problem of detecting faults in multimodal sensors in a distributed data fusion framework, and dynamically reconfigure the system using information theoretical concepts. Their approach is based on detecting inconsistency in the mutual information contribution of each sensor w.r.t. its history. Kumar et al. [159] deal with the problem of multimodal sensors that occasionally produce spurious data, possibly due to sensor failure or environmental issues, and thus may bias estimation. The challenge arises from the fact that spurious events are difficult to predict and to model. Kumar et al. [159] propose a Bayesian approach that can identify and eliminate spurious data from a sensor. The procedure attributes less weight to the measurement from


a suspected sensor when fused with measurements from other sensors. In the inference of cosmological parameters (Example II-C.3), detecting and explaining (in)consistencies of observations from different experiments is of utmost importance. A methodology for validation is comparing various error measures on several types of analytical products (Example V-A.1): cosmological parameters, CMB power spectra, and full sky maps, with and without the inclusion of data sets from both space-borne satellite missions and Earth-bound telescopes [50]. These experiments vary in the spectral bands at which they observe the sky, angular (spatial) resolution, sensitivity to different types of polarization, sky coverage, sky-scanning strategies [50, Sec. 4.1], types of noise, and other parameters. Therefore, they carry complementary information.

Challenge IV-B.4: Missing Values: The challenge of missing values is not new and not unique to data fusion. The problem of matrix and tensor completion is long-standing in linear and multilinear algebra. However, its prevalence in data fusion draws special attention to it. In Section III, we have seen that low-rank tensor decompositions provide redundancy that results in strong uniqueness that is further improved in the presence of coupling or additional constraints. It turns out that the same applies also in the case of missing values; see [16], [99], [160]–[162], and references therein. Approaches to missing values that are motivated by various aspects of data fusion can be found, e.g., in [16], [160], and [161].

Missing values may occur in various scenarios. While the first case that we mention below is not unique to data fusion, the remaining ones are. More specifically, the first case deals with samples that are locally missing within an individual data set, whereas the other cases arise due to interaction among data sets. First, certain data entries may be unreliable, discarded, or unavailable due to faulty detectors, occlusions, partial coverage, or any other unavoidable effects. Second, sometimes a modality can report only on part of the system w.r.t. the other modalities, as is the case with EEG versus MEG [12], nuclear magnetic resonance (NMR) versus liquid chromatography/mass spectrometry (LC–MS) [99], occlusions or partial spatial coverage in remote sensing (Example II-C.1), audiovideo (Example II-A.1), meteorological monitoring (Example II-C.2), and HMI (Example II-A.2). A third scenario is illustrated in Fig. 2(b). In this case, there exist several data sets, depicted as complete third-order tensors. However, when linked together, they can be regarded as elements in a larger third-order tensor in which they are all contained, and whose volume is only partially filled. Fourth, data may be regarded as structurally missing if samples at different modalities are not taken at comparable sampling points [8] (recall Challenge IV-A.2), and we would like to construct a more complete picture from the entire sample set. In this case, each modality is properly sampled on its own, but there exist points on the common sampling grid that do not contain data from all modalities; these points can be regarded as missing values. A fifth scenario is link prediction. This is a common issue in recommender systems and social network analysis. In social network analysis, the challenge is predicting social links or activities based on an existing database of connections or activities, where only a few entries are known. As an example for a recommender system we mention the “Netflix Prize,” where the challenge is defined as improving the accuracy of predictions about how much a person is going to enjoy a future movie based on past preferences. The data can be regarded as an incomplete user × movie matrix, whose entries are user ratings in an ordinal 1–5 scale, and the challenge is to fill in the missing entries (initially set to zero). Among the many and diverse methods that have been proposed we mention that some are based on (coupled) matrix or tensor factorizations, possibly by augmenting these data with further types of diversity; see, e.g., [16], [161], [163], and references therein.

V. CHALLENGES AT THE MODEL DESIGN LEVEL

In this section, we confront the unavoidable “how” question, presenting some guidelines that might be helpful in the actual design of data fusion solutions, from a model design perspective. This question has already been raised by numerous authors, e.g., in [8]–[14], [16], [152], [163], [164], among others, and the following discussion builds upon these foundations. In a sense, this section concludes our paper. It complements Section III by proposing theoretical model design principles that allow diversity to manifest itself. It complements Sections II and IV by presenting model design principles that can accommodate the practical data-level challenges presented in Section IV and the numerous tasks given in Section II. It provides examples of approaches that allow data sets to interact and inform each other, in the sense of Definition I.2. As in previous sections, due to the vastness of the field, the discussion in this section is far from being exhaustive: we only touch on certain topics, and leave others, such as computation, algorithms and fusion of large-scale data, outside the scope of this overview. The rest of this section is organized as follows. In Section V-A, we discuss different strategies to data fusion, and address, in particular, at which level of abstraction, reduction and simplification the data should be fused. Section V-B discusses mathematical models for links between data sets that maximally exploit diversity and enhance interpretability and performance. Section V-C discusses some theoretical approaches to the analysis of the ensemble of linked data sets. Section V-D brings together the numerous model design steps and considerations in a unified framework of “structured data fusion.” We conclude our discussion with validation issues in Section V-E.

Vol. 103, No. 9, September 2015 | Proceedings of the IEEE 1465


Lahat et al.: Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects

A. Level of Data Fusion

At first thought, it may seem that fusing multiple data sets at the raw-data level should always yield the best inference, since there would be no loss of information. In practice, however, due to the complex and largely unknown nature of the underlying phenomena (Section II), various complicating factors (Section IV), as well as the specific research question (Sections I and II), it may turn out to be more useful to fuse the data sets at a higher level of abstraction [9], and after certain simplification and reduction steps. The procedures listed below precede the actual fusion of the data. Therefore, they are related to the preprocessing stage. Naturally, the choice of analytical model is influenced by decisions taken at this point.

The first strategy that we mention is data integration. It implies parallel processing pipelines for each modality, followed by a decision-making step. Integration is a common approach to deal with heterogeneous data. When modalities are completely noncommensurable (Challenge IV-A.1), as with remote sensing techniques that report on material content versus others that report on 3-D structures (Example II-C.1), integration becomes a natural choice, and is often related to classification tasks. Integration can be done via soft decision, using optimal weights, as in the fusion of data from wireless microwave sensor networks and radar, for rainfall measurement and mapping [47] (Example II-C.2). Bullmore and Sporns [165] study brain networks by first constructing separate models of structural and functional networks based on several brain imaging modalities, and fuse them using a graph-theoretical framework. Data integration may be preferred when modality-specific information carries more weight compared with the shared information, as argued for the joint analysis of EEG–fMRI in [32] (Example II-B.1). A framework to choose between alternative soft decision strategies in the presence of multiple sensor outputs, given various assumptions on uncertainty or partial knowledge, confidence levels, reliability, and conflicts, in a data fusion context, is given in [157]. Due to its simplicity, and relative stability since it allows to rely on well-established methods from single-modal data analysis, a large number of existing data fusion approaches are still based on decision-level fusion. Pros and cons to data integration are further discussed in [21].

A second type of data fusion strategy is processing modalities sequentially, where one (or more) modality(ies) is used to constrain another. Mathematically, this amounts to using one modality to restrict the number of degrees of freedom, and thus the set of possible solutions, in another. A sequential approach makes sense when one modality has better quality in terms of the information that it conveys, in a certain respect, as in certain audiovisual scenarios [10], [23] (Example II-A.1), as well as in the fMRI-constrained solution for the otherwise-underdetermined, ill-posed EEG inverse problem [12], [26] (Example II-B.1).

In this paper, we focus on a third strategy, true fusion, that lets modalities fully interact and inform each other as claimed in Section I. True fusion is also characterized by assigning a symmetric role to all modalities, i.e., not sequential. The data fusion models mentioned in Section III fall into this category, as well as most of the models that we mention in the rest of this section. Within “true fusion,” there are varying degrees.

True fusion using high-level features. In this approach, the dimensionality is significantly reduced by associating each modality with a small number of variables. High-level features are often univariate. Examples include standard variation, skewness, ratio of active voxels, other variables which concisely summarize statistics, or geometrical and other properties. In this case, inference is typically of classification type. Examples include multisensor [9], HMI [10], and remote sensing [42] applications.

True fusion using multivariate features. Unlike high-level features, this approach leaves the data sufficiently multivariate within each modality (which now is in feature form) such that data in each modality can fully interact [21], [58]. In neuroimaging, common features are task-related spatial maps from fMRI, gray matter images from sMRI, and event-related potential (ERP) from EEG, extracted for each subject [21], [58], [60]. In audiovisual applications, features often correspond to speech spectral coefficients and visual cues such as lip contours or speaker’s presence in the scene [23].

True fusion using the data as is, or with minimal reduction. In fact, working with features implies a two-step approach: in the first step, features are computed using a certain criterion; in the second step, features are fused using a different, second criterion. An approach that merges the two, and thus expected to better exploit the whole raw data, is proposed in [166] for the fusion of fMRI and EEG. A remote sensing application in which it is natural to work with raw data is pan-sharpening (explained in Challenge IV-A.2). Here, acquisition conditions are favorable since the two sensors (multispectral and pan) acquire data over the same area, with the same angle of view and simultaneously, and the modalities are commensurable.

Features, at different levels, may accommodate heterogeneities across modalities, such as different types of uncertainty and noncommensurability (Section IV). Features may significantly reduce the number of samples involved, i.e., allow compression. Example V-A.1 illustrates this point. It also serves as a conclusion to the discussion on the strategy for data fusion by showing how different levels of features can be used for varying data fusion purposes. For further discussion on features and choosing the right level of data fusion, see, e.g., [21], [27], [58] (biomedical imaging) and [10], [11] (HMI).

Example V-A.1: Use of Features in Cosmological Inference From CMB Observations: In the inference of cosmological


parameters from CMB observations (Example II-C.3), the raw data consist of detector readouts as well as other auxiliary information that amounts to several terabytes, or $O(10^{12})$, of observations [167]. The scientific products are usually provided in several levels of multivariate “features,” as follows: 1) full-sky maps, of CMB and non-CMB emissions, amounting to roughly $O(10^{8})$ pixels; 2) CMB power spectrum computed from the CMB spatial map, at $O(10^{3})$ spectral multipoles; and 3) six cosmological parameters that represent the best fit of the CMB power spectrum to the ΛCDM model. It is clear that each level represents a strong compression of the data w.r.t. the preceding one. Each level of features is useful for a different type of inference. High-resolution component maps are the first useful outcome from the component separation procedure [59]. Apart from providing valuable information about the sky, they are useful, for instance, for consistency checks between instruments, experiments, and methods [50], [59]. Power spectra are useful to compare outcomes of different experiments that measure the CMB, e.g., Planck and BICEP2/Keck [168], whereas cosmological parameters form the link, via the ΛCDM model, with data sets that do not involve astrophysical observations, e.g., high-energy physics at CERN [56].

Order selection and dimension reduction. Related to the open issue of choosing the most appropriate strategy of data fusion is order selection. As in nonmultimodal analysis, a dimension reduction step may be required in order to avoid overfitting the data, as well as a form of compression [9]. In a data fusion framework, this step must take into consideration the possibly different representations of the latent variables across data sets. As an example, a solution that maximally retains the joint information while also ensuring that the extracted sources are independent from each other, in the context of a “joint ICA”-based approach, is proposed in [117]. Dimension reduction may be performed locally, at each sensor or modality, or at a central processing unit [9].

B. Link Between Data Sets

Data fusion is all about enabling modalities to fully interact and inform each other. Hence, a key point is choosing an analytical model that faithfully represents the relationship between modalities and yields a meaningful combination thereof, without imposing phantom connections or suppressing existing ones. The underlying idea of data fusion is that an ensemble of data sets is “more than the sum of its parts” in the sense that it contains precious information that is lost if these relations are ignored. The purpose of properly defined links is to support this goal, as motivated by the discussion in Section III. In order to maximize diversity, we would like links to be able to exploit the heterogeneity among data sets. Properly defined links provide a clear picture of the underlying structure of the ensemble of the related data sets [147]. Consider, for example, two data sets, patients × symptoms and patients × diagnosis; we would like the data fusion model to allow us to uncover the medical conditions that underlie both symptoms and diagnoses [155]. Properly defined links explain similarities and differences among data sets and allow better interpretability. As explained in Section III, one of the first motivations for linking data sets in joint matrix decomposition scenarios is resolving the arbitrary ordering of the latent components in each individual data set. It is interesting to note that all types of links eventually alleviate this problem since they provide a single frame of reference.

Since data fusion generally deals with heterogeneous data sets, we would like links to be flexible enough to allow each data set to remain in its most explanatory form, as further discussed in Section V-B1. In various scenarios, certain elements may be present only in a specific data set whereas others are shared by two or more. We would like the model not only to properly express these elaborate interactions but also to have the capacity to inform us about (non)existence of links when this information is not available in advance, a topic further elaborated on in Section V-B2.

As stated in Section II, the raison d’être of multimodal data fusion is the paradigm that certain natural processes and phenomena express themselves under completely different physical guises. Due to the often complex nature of the driving phenomena, it is likely that data sets will be related via more than one type of diversity; e.g., time, space, and frequency. Therefore, links should be designed such that they support a relationship via several types of diversity simultaneously, whenever applicable. Models based on multilinear relationships, as well as those that admit multiple types of links simultaneously, seem to better support this aim.

1) “Soft” and “Hard” Links Between Data Sets: One type of decision that has to be made is whether each data set will have its own set of individual parameters, disjoint of the others. In the first case, none of the parameters that define each data set’s model are shared by any other data set. As a result, additional information is required to define the link. In such cases, the link is often defined as some correspondence between data sets that can be interpreted as similarity, smoothness, or continuity [169]. Therefore, we call such links “soft.” In the second case, data sets explicitly share certain factor matrices or latent variables. For the sake of our discussion, we call such links “hard” [145].

“Hard” links between data sets. We have already seen shared factor matrices in numerous examples in Section III. Naturally, data fusion methods that are based on stacking data in a single tensor fall within this category. Such are PARAFAC (Section III-C-2), generalized singular value decomposition (GSVD) [170] and its higher order generalization [171], the higher order SVD (HOSVD) [172], and more. In joint ICA [173] and group ICA [174], [175], several


ICA problems share a mixing matrix or source subspace, respectively, by concatenating the observation matrices in rows or columns. Simultaneous factor analysis (FA) [152] and simultaneous component analysis (SCA)-based methods [98], [176]–[179] deal with multiway data that have at least one shared mode, but do not stack it in a single tensor (Section III-C2) due to various complicating factors, such as those mentioned in Section IV. Linked tensor ICA [154] has one factor matrix shared by all decompositions. In Bayesian group FA [180] and its tensor generalization [142], [181], as well as in collective matrix factorization (CMF) [164], several matrices or tensors share all but one factor matrix. In fusion of hyperspectral and multispectral images (Example II-C.1), the joint factor is a matrix that reflects the (desired, unknown) high-resolution image before spatial and spectral degradation [182]–[185]. In group nonnegative matrix factorization (NMF), shared columns of the feature matrix reflect task-related variations [186]. The generalized linked-mode framework for multiway data [8] allows flexible links across data sets by shared (sub)factors, as do other flexible tensor-based data fusion models such as coupled matrix and tensor factorization (CMTF) [99] and its probabilistic extension generalized coupled tensor factorization (GCTF) [161], linked multiway component analysis (LMWCA) [65], and structured data fusion (SDF) [16]. In the fusion of astrophysical observations of the CMB from different experiments (Example II-C.3), the link may be established by a joint distribution of the ensemble of samples from all data sets. In this case, the fusion is based on the assumption that the random processes from which all samples are generated are controlled by the same underlying cosmological parameters [50]. A shared random variable is used also in [187] to extract a common source of variability from measurements in multiple sensors using diffusion operators.

“Soft” links between data sets. Prevalent types of “soft” links are statistical dependence, as in IVA (Example III-D.1); covariations, as in CCA [1] and its extension to more than two matrices, multiset CCA (MCCA) [3], [58], [110], [118], and parallel ICA [188]; and “similarity,” in the sense of minimizing some distance measure between corresponding elements, as in soft nonnegative matrix cofactorization [189] and joint matrix and tensor decompositions with flexible coupling [145]. For audiovisual data fusion, a dictionary learning model where each atom consists of an audio and video component has been proposed in [144]. A graphical model in which audio and video shifts are linearly related in far-field conditions is proposed in [24]. Generalized linked-mode for multiway data [8] and LMWCA [65] mention explicitly that they can be defined both with “soft” or “hard” links.

Although the partition into “soft” and “hard” links is conceptually appealing and simplifies our presentation, the following reservation is in order. In practice, when it comes to writing the optimization problem, models with “soft” links are often reformulated using shared variables. In the models that we have just mentioned, shared variables are, for example, cross correlation or cross cumulants when statistical (in)dependence and covariation are concerned, or a shared latent variable in regression models. The reformulated models often take the form of (approximate) (coupled) matrix or tensor factorizations. We mention [111], [190], and [191] as just a few examples; further discussion is beyond the scope of this paper. The bottom line is that the distinction between “soft” and “hard” links is often immaterial. The implication is that models with “soft” links can sometimes neatly fit within optimization frameworks that assume shared variables, for example, SDF [16].

2) Shared Versus Unshared Elements: The idea that data sets have both shared (common) and unshared (individual, modality-specific) elements w.r.t. the others can be found in numerous models. It can be formulated mathematically by defining certain columns of a factor matrix or subelements of a latent variable as shared, while others are unshared. Models that admit this formulation include incomplete mode PARAFAC [69, Sec. 5.1.2] [Fig. 2(b)], group NMF [186], LMWCA [192], and SDF [16]. In the extraction of a common source of variability from heterogeneous sensors [187], it is hidden random variables that are either shared or unshared. Another example is Bayesian group FA [180] and its tensor extensions [142], [181], where a dedicated factor matrix determines which of the factors in a common pool are active within each data set.

The more fundamental challenge, however, is to identify the shared and unshared elements from the data itself, without a priori assignment of individual and shared variables. Bayesian linked tensor ICA [154] holds a modality-specific factor matrix of optimally determined weights that can eliminate a source from some modalities while keeping it in others. In [170], Alter et al. propose GSVD to infer, from two genome-scale expression data sets, shared and individual processes. Ponnapalli et al. [171] extend this GSVD-based approach to more than two data sets. Shared and unshared processes in genomic and metabolomic data may also be revealed by a proper rotation of the components resulting from SCA. The proposed approach, called distinctive and common components with simultaneous-component analysis (DISCO–SCA) [177], [193], [194], may outperform GSVD in certain scenarios and can be straightforwardly generalized to more than two data sets. A comparative study of GSVD, DISCO–SCA, and other methods that can identify shared and unshared processes underlying multiset data can be found in [179]. HOSVD [195] can differentiate between shared and unshared phenomena in DNA analysis from multiple experiments [172]. As a last example, in CMTF [196], model constraints may be defined in the form of sparse weights such that unshared components have norms equal or close to zero in one of the data sets.
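
To make the distinction in Section V-B concrete, the following is a minimal NumPy sketch, not taken from any of the cited works. It assumes two matrix data sets `X1` and `X2` that share their first mode (e.g., "patients"), a squared Frobenius loss, and a quadratic coupling penalty with a hypothetical strength `lam`. All function names are illustrative. A "hard" link uses one factor `A` for both data sets; a "soft" link keeps separate factors `A1` and `A2` and penalizes their distance; `shared_components` flags a component as shared only if its weight is nonzero in every data set, in the spirit of the sparse-weight idea mentioned for CMTF.

```python
import numpy as np

def hard_link_loss(X1, X2, A, B1, B2):
    """Both data sets are modeled with the same factor A (a "hard" link)."""
    r1 = X1 - A @ B1.T
    r2 = X2 - A @ B2.T
    return 0.5 * (np.sum(r1**2) + np.sum(r2**2))

def soft_link_loss(X1, X2, A1, A2, B1, B2, lam):
    """Each data set keeps its own factor; a penalty pulls A1 and A2 together."""
    r1 = X1 - A1 @ B1.T
    r2 = X2 - A2 @ B2.T
    coupling = np.sum((A1 - A2)**2)
    return 0.5 * (np.sum(r1**2) + np.sum(r2**2)) + 0.5 * lam * coupling

def shared_components(weights, tol=1e-8):
    """weights[m, r] scales component r in data set m.
    A component counts as shared only if it is active in every data set."""
    return (np.abs(weights) > tol).all(axis=0)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))            # factor of the shared mode
B1 = rng.standard_normal((5, 3))           # second-mode factor of data set 1
B2 = rng.standard_normal((4, 3))           # second-mode factor of data set 2
weights = np.array([[1.0, 0.7, 0.5],
                    [0.9, 0.0, 0.4]])      # component 1 is switched off in data set 2

# Noise-free synthetic data: X_m = A diag(weights[m]) B_m^T
X1 = A @ (B1 * weights[0]).T
X2 = A @ (B2 * weights[1]).T

hard = hard_link_loss(X1, X2, A, B1 * weights[0], B2 * weights[1])
soft = soft_link_loss(X1, X2, A, A.copy(), B1 * weights[0], B2 * weights[1], lam=10.0)
shared = shared_components(weights)
```

When `A1` equals `A2` the coupling penalty vanishes and the two losses coincide, which is one way to see why the soft/hard distinction can become immaterial once the optimization problem is written down.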


C. Analytical Framework

Certain data fusion approaches rely on existing theoretical analytical frameworks that have originally been devised for nonfusion applications, at least not explicitly. Such are ICA and algebraic-based methods such as PARAFAC, generalized eigenvalue decomposition (GEVD), GSVD, and HOSVD, as will be elaborated below. These methods have been around for a while and there is a large body of works that has been dedicated to their computation. Data fusion approaches that rely on these well-established, widely known methods are often more easily accepted and integrated within the research communities. However, these approaches may not be able to exploit the full range of diversity in the data, and thus, more advanced data fusion methods may be preferred. Below, we briefly review some of the analytical approaches that have been proposed for data fusion.

Well-known matrix and tensor factorizations can be used for data fusion. In [170], Alter et al. use GSVD for the comparison of genetic data from two different organisms. In [172], HOSVD is proposed for the analysis of data from different studies. In the presence of two data sets, or in a noise-free scenario, many matrix- and tensor-based methods can be reformulated as GEVD [197]. This holds for various BSS closed-form solutions [198], CCA [199, Ch. 12] and its multiset extension [3], [110], joint BSS [111], and coupled tensor decompositions [200]. As explained in Section V-B1, algebraic (possibly approximate) solutions to models with “soft” links often exist.

Certain data fusion methods concatenate or reorganize data such that it can be analyzed by a classical ICA algorithm. Such is the case in joint ICA [173] and group ICA [174], [175]. These models can thus be solved using any existing ICA approach [63].

Guo et al. [201] propose a tensor extension to group ICA [174], [175] and to tensor ICA [104] that can accommodate different group structures. Parallel ICA [188] and IVA [108], [109]–[111] (Example III-D.1) jointly solve several separate ICA problems by exploiting covariations or statistical dependence, respectively. CCA [1], its extension to multiple data sets [3], as well as one of the approaches to LMWCA [65], search for maximal correlation, or other second-order-based relationships, between variables.

Certain methods minimize the Frobenius norm between model and data. In the presence of additive white Gaussian noise, this amounts to maximum likelihood (ML). Further considerations associated with this choice of norm are given in [16, Sec. II]. This type of optimization is used in group NMF [186], coupled NMF [182], certain SCA-based methods [98], [176]–[178], and numerous coupled tensor decompositions; see, e.g., [16], [99], [124], and references therein. In some cases, it may be better to tailor loss functions individually to each data set, and use norms other than Frobenius [8], [153]. Such is the case in the GCTF framework [156], [161], for example. An ML framework can accommodate data sets with different noise patterns. Such are flexible simultaneous FA [152] and ML-based SCA [153], [155]. ML underlies certain noiseless stochastic models, e.g., SOS–IVA [202] and JISA [119]–[121].

Regression provides another solution to data fusion. Regression searches for latent factors that best explain the covariance between two sets of observations. We mention PLS [62], [203] and its multilinear extensions N-way PLS [62], [150] and higher order PLS (HOPLS) [141].

Bayesian group FA [180], its tensor extension [142], [181], coupled matrix and tensor decompositions with flexible coupling [145], and certain methods for the fusion of hyperspectral and multispectral images [183], [184], rely on a Bayesian framework for the decomposition. Certain tensor extensions of ICA rely on a probabilistic Bayesian framework [154]. In [185], fusion of hyperspectral and multispectral images is achieved via dictionary learning and sparse coding. This is also the underlying technique of [144] for learning bimodal structure in audiovisual data. Beal et al. [24], [204] use probabilistic generative models, also termed graphical models, in order to fuse audio and video models into a single probabilistic graphical model. Lederman and Talmon [187] use an alternating-diffusion method for manifold learning that extracts a common source of variability from measurements in multiple sensors, where all sensors observe the same physical phenomenon but have different sensor-specific effects. Combining labeled and unlabeled data via cotraining is described in [205]. A survey of techniques for multiview machine learning can be found in [206]. A multimodal deep-learning method for information retrieval from bimodal data consisting of images and text is described in [207].

D. Structured Data Fusion: A General Mathematical Framework

In Sections III, IV and V.A–V.C, we mentioned a large number of data fusion models. However, it is clear that no list of existing solutions, comprehensive as it might be, can cover the practically endless number of current, future and potential data sets, problems and tasks. Indeed, the purpose of this paper is not in promoting specific models or methods. Instead, and building upon [8]–[14], [16], [152], [163], [164], and others, we wish to provide a deeper and broader understanding of the concepts and ideas that underlie data fusion. As such, in the model design front, our goal is providing guidelines and insights that may apply also to data sets, problems, and tasks that do not necessarily conform to any of the specific examples, solutions, and mathematical frameworks that we mention. The concept of diversity, presented in Section III, is one such example. In the same vein, we now present a general mathematical framework that will allow us to give a more concrete meaning to some of the model design concepts that have been discussed. Although this


formulation is given in terms of matrices and higher order arrays, also known as tensors, the underlying ideas behind “structured data fusion” [16], such as flexibility and modularity, are not limited to these. The mathematical formulation that we use is only a concretization of a more general idea, applied to data sets that admit certain types of decompositions.

We now present a formulation proposed by Sorber et al. [16], followed by a few examples for motivation and clarification.

Model of an individual data set: Consider an ensemble of $M$ data sets, collected in $M$ arrays (tensors) $\mathcal{T}^{(m)} \in \mathbb{C}^{I_1 \times \cdots \times I_{N_m}}$, $m = 1, \ldots, M$, where $N_m = 1, 2, 3, \ldots$ implies a vector, a matrix or a higher order tensor, respectively. In order to allow maximal flexibility in the model associated with each of these data sets, Sorber et al. [16] define several layers of underlying structures. The first layer is an ordered set of $V$ variables $z = \{z_1, \ldots, z_V\}$, where each variable may be anything from a scalar to a higher order tensor, real or complex [recall (1)]. The second layer is an ordered set of $F$ factors $\mathcal{X}(z) = \{x_1(z_{i_1}), \ldots, x_F(z_{i_F})\}$ that are driven by the $V$ variables $z$. Each factor $x_f(z_{i_f})$ is a mapping of the $i_f$th variable to a tensor. In the third layer, each data set $\mathcal{T}^{(m)}$ is associated with a decomposition model $\mathcal{M}^{(m)}$ that approximates it. Function $\mathcal{M}^{(m)}(\mathcal{X}(z))$ maps a subgroup of the factors $\mathcal{X}(z)$ to a tensor. Fig. 3 illustrates these layers. For simplicity, each data set $\mathcal{T}^{(m)}$ is associated with a tensor model $\mathcal{M}^{(m)}$ of the same order and size. This is not evident: the order $N_m$ and dimensions $I_1 \times \cdots \times I_{N_m}$ of the model tensor, as used in the analysis, may differ from those that most naturally represent the acquired data, as well as from the natural way to visualize the samples. As a first example, raw EEG data are a time series in several electrodes, i.e., electrode × time. However, it has been proposed to augment these data using a third type of diversity, electrode × time × frequency [101]–[103]. In this case, the EEG data will be stacked in a third-order tensor. On the other hand, data that are naturally visualized in 2-D or 3-D arrays such as images do not necessarily admit any useful (multi)linear relationships among their pixels or voxels. Hence, in the analysis, image data are often vectorized into 1-D arrays. Further discussion of how to choose the right array structure for data analysis can be found, e.g., in [8], [64], [65], [106], and references therein.

Link between data sets: In the SDF formulation, a link between data sets can be established if their models share at least one factor or variable. This corresponds to the “hard” links, mentioned in Section V-B1. However, this does not exclude other types of interaction between data sets: “soft” links may be established by reformulating “soft” links using shared parameters, as explained in Section V-B1, and possibly via regularization terms.

The following examples provide a more concrete meaning to the mathematical formulation that we have just laid out. Consider two matrix data sets, patients × diagnosis and patients × symptoms. A latent variable may be the syndrome that underlies both diagnoses and symptoms factors. The link is established via the shared “patients” mode [155]. As a second example, certain properties of a communication system in a coupled CPD framework may be expressed using a factor with a Vandermonde structure [124]. This can be implemented in SDF with $z_f$ a vector of $p$ scalars and $x_f(z_f)$ a $p \times q$ Vandermonde matrix constructed thereof. Further examples for factor structures that can be reformulated in SDF include orthogonality, Toeplitz structure, nonnegativity, as well as fixed and known entries. Constraints such as sparsity and smoothness within factors may be implemented using regularization terms. Tensor decompositions that can be reformulated as $\mathcal{M}^{(m)}(\mathcal{X}(z))$ include rank-1 decompositions (CPD, PARAFAC), decompositions into rank  1 terms, Tucker, BTD, PARAFAC2 [70], and many others. These options, as well as other alternatives for latent variables, factors, and models for each data set, are mentioned in Section III. For further explanations about implementation, see [16] and [208].

Fig. 3. Schematic illustration of structured data fusion. For example, vector $z_1$, upper triangular matrix $z_2$, and full matrix $z_3$ are transformed using mappings $x_1$, $x_2$, and $x_3$, into a Toeplitz, orthogonal, and nonnegative matrix, respectively. The resulting factors are then used to jointly factorize two coupled data sets. Figure and caption reproduced from [16, Fig. 1].
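
The three layers of SDF (variables, factors, models) can be sketched in a few lines of NumPy. This is an illustrative toy, not the implementation of [16]: the helper names, the dictionary layout, the sizes, and the choice of squaring as the nonnegativity mapping are all assumptions made here. It builds one third-order CPD model and one matrix model that are coupled through a shared Vandermonde factor `A`.

```python
import numpy as np

def vandermonde_factor(z, q):
    """Map p scalars to a p x q Vandermonde matrix with rows [1, z_i, z_i**2, ...]."""
    return np.vander(z, N=q, increasing=True)

def nonnegative_factor(z):
    """Map an unconstrained array to a nonnegative factor by squaring entrywise."""
    return z ** 2

def cpd3(A, B, C):
    """Third-order CPD model: sum over r of the outer products a_r, b_r, c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(1)
R = 3  # number of components (assumed)

# Layer 1: variables z (free parameters of the optimization)
z = {
    "z1": rng.uniform(0.5, 1.5, size=4),   # 4 generators of the Vandermonde factor
    "z2": rng.standard_normal((5, R)),
    "z3": rng.standard_normal((6, R)),
    "z4": rng.standard_normal((7, R)),
}

# Layer 2: factors x(z), each a structured mapping of one variable
x = {
    "A": vandermonde_factor(z["z1"], R),   # 4 x R, shared by both models
    "B": z["z2"],                          # identity mapping, no structure
    "C": nonnegative_factor(z["z3"]),      # 6 x R, entries >= 0
    "D": z["z4"],
}

# Layer 3: one model per data set, coupled through the shared factor A
model_1 = cpd3(x["A"], x["B"], x["C"])     # 4 x 5 x 6 tensor model
model_2 = x["A"] @ x["D"].T                # 4 x 7 matrix model
```

Because structure lives in the mappings of layer 2, swapping the Vandermonde mapping for, say, a Toeplitz or orthogonal one would leave layers 1 and 3 untouched, which is the modularity the text refers to.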


Loss/objective function, regularization, and penalty terms: The next step is to fit the model to the data. Depending on the analytical framework that we choose (Section V-C), each data set or the whole ensemble is attributed with a loss/objective function $\mathcal{D}^{(m)}(\cdot,\cdot)$, between observed and modeled data [153], [156], [161]. An individual loss/objective function allows flexibility both in the analytical framework applied to each data set, and in the individual types of uncertainty (Challenge IV-B.1). The loss/objective function may be complemented by various penalty or regularization terms, in order to impose constraints that are not expressed by the other optimization functionals. Regularization terms may impose certain types of sparsity, nonnegativity [196], similarity [145], [189], or coherence [209], to name a few.

Missing values: In order not to take account of unknown data entries in the optimization procedure, these values are masked. This is done via an entrywise (Hadamard) product (denoted as $\ast$) of the data tensor $\mathcal{T}^{(m)}$ with a binary tensor $\mathcal{B}^{(m)}$ of the same size; see, e.g., [16], [160], and references therein. Missing values are discussed in Challenge IV-B.4.

The whole optimization problem: Given these elements, SDF may be written as the optimization problem

$$\min_{z} \sum_{m=1}^{M} \frac{\omega_m}{2}\, \mathcal{D}^{(m)}\Big(\mathcal{M}^{(m)}(\mathcal{X}(z)) \ast \mathcal{B}^{(m)},\; \mathcal{T}^{(m)} \ast \mathcal{B}^{(m)}\Big) + \text{regularization terms} \qquad (14)$$

admissible combinations are considered, then the number of potential analytical data fusion models is significantly larger than what is currently available in the literature [8].

The modular perspective on data fusion offers several benefits. First, a major challenge in data fusion is its augmented complexity due to the increased number of degrees of freedom. The modular approach to data fusion answers this challenge by reformulating the problem in a small set of disjoint simpler components that can be separately analyzed, optimized, and coded. Second, the modular approach, in which the problem is factorized into smaller standalone elements, allows a broader view that makes it easier to come up with new combinations of the basic building blocks, thus leading to new mathematical models, algorithms, and concepts [8], [16], [163], [164]. Third, modularity of the formulation makes it easier to adapt it to computational challenges such as large-scale data [16], [65], [162], [210], [211]. Fourth, in Sections I and II, we have emphasized the importance of exploratory research in data fusion. The modularity of the design is particularly helpful in that, making it straightforward to come up with new exploratory variations, to test and compare alternatives with minimal effort [16]. Modularity allows to easily diagnose which elements in the model are particularly useful, need to be modified, replaced or fine-tuned, without having to undo the whole derivation, coding or analysis. The latter also facilitates the validation stage; see Section V-E for further discussion.
E. Validation
Despite accumulating empirical evidence of the
) )
where DðmÞ ð; ÞBðmÞ implies DðmÞ ðBðmÞ
; BðmÞ
Þ. Scalars benefits of data fusion, there is still very little theoretical
!m denote weights, reflecting the relative importance of validation and quantitative measure of its gain [11], [99].
the loss/objective functions in the ensemble. Scenarios in Choosing an appropriate model is a widely open question,
which weights are useful are discussed in Section IV-B. and approximate and highly simplified models are often
The optimization problem (14) implements the overall preferred. Therefore, a validation step is indispensable.
analytical framework associated with the model. Equation The following points are of particular interest. 1) Lower
(14) is a slight generalization of the original SDF formulation bounds on the best achievable error: How far are we from
[16, Eq. (1)], in which the loss/objective function is the best possible result (for a given data set, task, goal, and
a weighted Frobenius norm, DðmÞ ðMðmÞ ; T ðmÞ ÞBðmÞ ¼ model)? 2) Theoretical results on the reliability and
2
)
kBðmÞ
ðMðmÞ  T ðmÞ ÞkF . Numerical and computational practical usefulness of the method: Can we prove that the
advantages associated with the Frobenius norm in the model is identifiable? Is the solution unique? Is the output
context of SDF are discussed in [16, Sec. II]. An illustration physically meaningful? Are the results sufficiently inter-
of SDF is given in Fig. 3. pretable? IVA (Example III-D.1) and coupled tensor
As noted in [16], a large number of the existing data decompositions (Example III-D.2) are two of the models
fusion models can be reformulated in terms of SDF, or some for which there now exists a comprehensive theoretical
variation thereof. However, an even more interesting insight analysis that answers this type of questions.
is that each step in the design of (14) is independent of the Although these questions are not specific to multi-
others: to a large extent, the choice of constraints, modal data fusion, they take special interpretation in the
assumptions, types of links, loss/objective functions, and presence of multiple data sets. Some of the new questions
other parameters can be done disjointly. In other words, SDF that arise are as follows. 1) What is the mathematical
is a modular approach to data fusion. These insights have led formulation of ‘‘success,’’ ‘‘optimality,’’ and ‘‘error,’’ when
to the key observation that in fact, a large number of existing heterogeneous modalities and types of uncertainty are
data fusion models can be regarded as composed of a rather involved? What is the most appropriate target function and
small number of building blocks [16]. In other words, if all criterion of success? 2) How do we evaluate performance


of exploratory tasks? 3) How do we design a figure of merit that can inform us how to exploit the advantages of each modality without suffering from its drawbacks with respect to the other modalities? 4) How do we identify and process information that is shared by several modalities, and how do we identify and exploit modality-specific information? 5) How do we compare alternative design choices such as level of data fusion, order selection, and analytical model within and across modalities? As an example, theoretical figures of merit such as the Cramér–Rao lower bound may help answer some of these questions. However, calculating theoretical error bounds for all possible alternatives (especially in view of the modular approach of Section V-D) is a prohibitive task, both due to the very large number of options and because many models are not mathematically tractable. Rescue may come from the computational front. As an example, Tensorlab [208], a Matlab toolbox that follows the modular principles of SDF [16] (Section V-D), enables the user to switch between the numerous combinations arising from multiple choices in the model design. As such, it allows the user to rapidly iterate toward a plausible solution for the problem at hand. Therefore, Tensorlab [208] (or any other computational tool following the modularity principles) may serve as a verification and validation tool, at least in the preliminary stages of the design.

Questions regarding the choice of modalities and the added value from using multiple modalities in general form a class of their own. 1) Should all available modalities be used, and/or given equal importance? 2) How much (information, diversity, redundancy) does each modality bring into the total equation? How do we quantify this ‘‘extra contribution’’? Some of these questions (and examples of possible answers) have been brought up within the challenges in Section IV; others are related to the design of a data fusion model (Section V). Information theory seems like a natural framework to evaluate the contribution of various types of diversity, as discussed, e.g., in [12]. Uniqueness analysis of (coupled) tensor decompositions, as well as other forms of error analysis, such as those mentioned in Section III, quantify the added value of diversity in terms of the admissible number of uniquely identifiable components or factors. Attention should be paid, for example, when modalities are too close to each other: in this case, they may not really convey new information; in addition, they may be exposed to similar noise, and thus bias results [9].

Due to the heterogeneous characteristics of the data, and particularly in exploratory tasks, the interpretability of the output should be given special care. Questions related to the representation of the output of multimodal data analysis are discussed, e.g., in [11, Sec. 8].

VI. CONCLUSION

We enter an era where the abundance of diverse sources of information makes it practically impossible to ignore the presence of multiple data sets that are possibly related. It is very likely that an ensemble of related data sets is ‘‘more than the sum of its parts,’’ in the sense that it contains precious information that is lost if these relations are ignored. The information of interest that is hidden in these data sets is usually not easily accessible, however. We argue that the road to this added value must go through first understanding and identifying the particularities of multimodal and multiset data, as opposed to other types of aggregated data sets. At the same time, the joint analysis of multiple data sets ‘‘stands on the shoulders of’’ single-set analysis. Hence, the development of methods and techniques for single-set analysis is a cornerstone for advanced data fusion. In this paper, we have shown that methods that properly account for the links among data sets indeed have the potential to achieve gains and benefits that go far beyond those possible when each data set is processed individually. As argued in this paper, the potential impact of these gains is high, and spans the whole spectrum from solving theoretical problems that cannot be solved in single-set scenarios, to opening up new opportunities in numerous medical, environmental, psychological, social, and technological domains, among others. By adopting a data-driven approach, we have shown that the encountered challenges are ubiquitous, whence the incentive that both challenges and solutions be discussed at a level that brings together all involved communities.

Acknowledgment

The authors would like to thank L. De Lathauwer, J.-F. Cardoso, J. Chanussot, M. Dalla Mura, N. David, I. Fijalkow, H. Messer, G. Miller, and S. Van Huffel, whose expertise, insightful remarks, and feedback have greatly helped extend the scope of this paper; and the anonymous reviewers for their careful reading and valuable remarks.

REFERENCES
[1] H. Hotelling, ‘‘Relations between two sets of variates,’’ Biometrika, vol. 28, no. 3/4, pp. 321–377, Dec. 1936.
[2] R. B. Cattell, ‘‘Parallel proportional profiles and other principles for determining the choice of factors by rotation,’’ Psychometrika, vol. 9, no. 4, pp. 267–283, Dec. 1944.
[3] J. Kettenring, ‘‘Canonical analysis of several sets of variables,’’ Biometrika, vol. 58, no. 3, pp. 433–451, 1971.
[4] J. D. Carroll and J.-J. Chang, ‘‘Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘‘Eckart-Young’’ decomposition,’’ Psychometrika, vol. 35, no. 3, pp. 283–319, Sep. 1970.
[5] R. A. Harshman, ‘‘Foundations of the PARAFAC procedure: Models and conditions for an ‘‘explanatory’’ multimodal factor analysis,’’ UCLA Working Papers Phonetics, vol. 16, pp. 1–84, Dec. 1970.
[6] L. R. Tucker, ‘‘The extension of factor analysis to three-dimensional matrices,’’ in Contributions to Mathematical Psychology. New York, NY, USA: Holt, Rinehart & Winston, 1964, pp. 109–127.
[7] L. R. Tucker, ‘‘Some mathematical notes on three-mode factor analysis,’’ Psychometrika, vol. 31, no. 3, pp. 279–311, Sep. 1966.
[8] I. Van Mechelen and A. K. Smilde, ‘‘A generic linked-mode decomposition model for data fusion,’’ Chemom. Intell. Lab. Syst., vol. 104, no. 1, pp. 83–94, Nov. 2010.
[9] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi, ‘‘Multisensor data fusion: A


review of the state-of-the-art,’’ Inf. Fusion, vol. 14, no. 1, pp. 28–44, Jan. 2013.
[10] S. T. Shivappa, M. M. Trivedi, and B. D. Rao, ‘‘Audiovisual information fusion in human-computer interfaces and intelligent environments: A survey,’’ Proc. IEEE, vol. 98, no. 10, pp. 1692–1715, Oct. 2010.
[11] M. Turk, ‘‘Multimodal interaction: A review,’’ Pattern Recognit. Lett., vol. 36, pp. 189–195, Jan. 2014.
[12] F. Bießmann, S. Plis, F. C. Meinecke, T. Eichele, and K. Müller, ‘‘Analysis of multimodal neuroimaging data,’’ IEEE Rev. Biomed. Eng., vol. 4, pp. 26–58, 2011.
[13] T. Stathaki, Image Fusion: Algorithms and Applications. Amsterdam, The Netherlands: Academic Press, 2008.
[14] H. B. Mitchell, Data Fusion: Concepts and Ideas, 2nd ed. New York, NY, USA: Springer-Verlag, 2012.
[15] T. Adalı, Ed., Special Section on Multimodal Biomedical Imaging: Algorithms and Applications, IEEE Trans. Multimedia, vol. 15, no. 5, Aug. 2013.
[16] L. Sorber, M. Van Barel, and L. De Lathauwer, ‘‘Structured data fusion,’’ IEEE J. Sel. Top. Signal Process., vol. 9, no. 4, pp. 586–600, Jun. 2015.
[17] A. R. McIntosh, F. L. Bookstein, J. V. Haxby, and C. L. Grady, ‘‘Spatial pattern analysis of functional brain images using partial least squares,’’ NeuroImage, vol. 3, no. 3, pp. 143–157, Jun. 1996.
[18] H. McGurk and J. MacDonald, ‘‘Hearing lips and seeing voices,’’ Nature, vol. 264, no. 5588, pp. 746–748, Dec. 1976.
[19] D. McLaughlin, ‘‘An integrated approach to hydrologic data assimilation: Interpolation, smoothing, and filtering,’’ Adv. Water Resour., vol. 25, no. 8–12, pp. 1275–1286, Aug.–Dec. 2002.
[20] H. Boström et al., ‘‘On the definition of information fusion as a field of research,’’ Schl. Humanities Inf., Univ. Skövde, Skövde, Sweden, Tech. Rep. HS-IKI-TR-07-006, 2007.
[21] V. D. Calhoun and T. Adalı, ‘‘Feature-based fusion of medical imaging data,’’ IEEE Trans. Inf. Technol. Biomed., vol. 13, no. 5, pp. 711–720, Sep. 2009.
[22] M. Hämäläinen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa, ‘‘Magnetoencephalography–Theory, instrumentation, and applications to noninvasive studies of the working human brain,’’ Rev. Mod. Phys., vol. 65, pp. 413–497, Apr. 1993.
[23] B. Rivet, W. Wang, S. M. Naqvi, and J. A. Chambers, ‘‘Audiovisual speech source separation: An overview of key methodologies,’’ IEEE Signal Process. Mag., vol. 31, no. 3, pp. 125–134, May 2014.
[24] M. Beal, N. Jojic, and H. Attias, ‘‘A graphical model for audiovisual object tracking,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 7, pp. 828–836, Jul. 2003.
[25] A. Plinge and G. A. Fink, ‘‘Geometry calibration of distributed microphone arrays exploiting audio-visual correspondences,’’ in Proc. EUSIPCO, Lisbon, Portugal, Sep. 2014, pp. 116–120.
[26] P. L. Nunez and R. B. Silberstein, ‘‘On the relationship of synaptic activity to macroscopic measurements: Does co-registration of EEG with fMRI make sense?,’’ Brain Topogr., vol. 13, no. 2, pp. 79–96, Dec. 2000.
[27] X. Lei, P. A. Valdes-Sosa, and D. Yao, ‘‘EEG/fMRI fusion based on independent component analysis: Integration of data-driven and model-driven methods,’’ J. Integr. Neurosci., vol. 11, no. 3, pp. 313–337, 2012, PMID: 22985350.
[28] B. Horwitz and D. Poeppel, ‘‘How can EEG/MEG and fMRI/PET data be combined?’’ Human Brain Mapping, vol. 17, no. 1, pp. 1–3, 2002.
[29] S. Makeig, T.-P. Jung, and T. J. Sejnowski, ‘‘Having your voxels and timing them too?’’ in Exploratory Analysis and Data Modeling in Functional Neuroimaging. Cambridge, MA, USA: MIT Press, 2003, p. 195.
[30] E. Martínez-Montes, P. A. Valdés-Sosa, F. Miwakeichi, R. I. Goldman, and M. S. Cohen, ‘‘Concurrent EEG/fMRI analysis by multiway partial least squares,’’ NeuroImage, vol. 22, no. 3, pp. 1023–1034, Jul. 2004.
[31] J. Sui, T. Adalı, Y.-O. Li, H. Yang, and V. D. Calhoun, ‘‘A review of multivariate methods in brain imaging data fusion,’’ in SPIE Medical Imaging, R. C. Molthen and J. B. Weaver, Eds. Philadelphia, PA, USA: SPIE, 2010, pp. 76260D-1–76260D-11.
[32] M. De Vos et al., ‘‘The quest for single trial correlations in multimodal EEG-fMRI data,’’ in Proc. Eng. Med. Biol. Conf., Osaka, Japan, Jul. 2013, pp. 6027–6030.
[33] J. C. Mosher, P. S. Lewis, and R. M. Leahy, ‘‘Multiple dipole modeling and localization from spatio-temporal MEG data,’’ IEEE Trans. Biomed. Eng., vol. 39, no. 6, pp. 541–557, Jun. 1992.
[34] H. Becker et al., ‘‘A performance study of various brain source imaging approaches,’’ in Proc. Int. Conf. Acoust. Speech Signal Process., Florence, Italy, May 2014, pp. 5910–5914.
[35] C. M. A. Hoeks et al., ‘‘Prostate cancer: Multiparametric MR imaging for detection, localization, and staging,’’ Radiology, vol. 261, no. 1, pp. 46–66, Oct. 2011.
[36] A. R. Croitor-Sava et al., ‘‘Fusing in vivo and ex vivo NMR sources of information for brain tumor classification,’’ Meas. Sci. Technol., vol. 22, no. 11, 2011, Art. ID. 114012.
[37] M. Garibaldi and V. Zarzoso, ‘‘Exploiting intracardiac and surface recording modalities for atrial signal extraction in atrial fibrillation,’’ in Proc. Eng. Med. Biol. Conf., Osaka, Japan, Jul. 2013, pp. 6015–6018.
[38] A. Van de Vel et al., ‘‘Non-EEG seizure-detection systems and potential SUDEP prevention: State of the art,’’ Seizure–Eur. J. Epilep., vol. 22, no. 5, pp. 345–355, Jun. 2013.
[39] N. Yokoya, T. Yairi, and A. Iwasaki, ‘‘Hyperspectral, multispectral, and panchromatic data fusion based on coupled non-negative matrix factorization,’’ in Proc. Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lisbon, Portugal, Jun. 2011, DOI: 10.1109/WHISPERS.2011.6080924.
[40] G. Vivone et al., ‘‘A critical comparison among pansharpening algorithms,’’ IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2565–2586, May 2015.
[41] C. Berger et al., ‘‘Multi-modal and multi-temporal data fusion: Outcome of the 2012 GRSS data fusion contest,’’ IEEE J. Sel. Top. Appl. Earth Observat. Remote Sens., vol. 6, no. 3, pp. 1324–1340, Jun. 2013.
[42] C. Debes et al., ‘‘Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest,’’ IEEE J. Sel. Top. Appl. Earth Observat. Remote Sens., vol. 7, no. 6, pp. 2405–2418, Jun. 2014.
[43] R. B. Smith, ‘‘Introduction to hyperspectral imaging,’’ TNTmips, Mar. 2013. [Online]. Available: http://www.microimages.com/documentation/Tutorials/hyprspec.pdf.
[44] M. Dalla Mura et al., ‘‘Challenges and opportunities of multimodality and data fusion in remote sensing,’’ Proc. IEEE, vol. 103, no. 9, Sep. 2015, DOI: 10.1109/JPROC.2015.2462751.
[45] N. Longbotham et al., ‘‘Multi-modal change detection, application to the detection of flooded areas: Outcome of the 2009–2010 data fusion contest,’’ IEEE J. Sel. Top. Appl. Earth Observat. Remote Sens., vol. 5, no. 1, pp. 331–342, Feb. 2012.
[46] H. Messer, A. Zinevich, and P. Alpert, ‘‘Environmental monitoring by wireless communication networks,’’ Science, vol. 312, no. 5774, p. 713, May 2006.
[47] Y. Liberman, R. Samuels, P. Alpert, and H. Messer, ‘‘New algorithm for integration between wireless microwave sensor network and radar for improved rainfall measurement and mapping,’’ Atmos. Meas. Technol., vol. 7, pp. 3549–3563, 2014.
[48] H. Seyyedi, ‘‘Comparing satellite derived rainfall with ground based radar for North-Western Europe,’’ M.S. thesis, ITC, Enschede, The Netherlands, Jan. 2010.
[49] N. David, P. Alpert, and H. Messer, ‘‘Technical note: Novel method for water vapour monitoring using wireless communication networks measurements,’’ Atmos. Chem. Phys., vol. 9, no. 7, pp. 2413–2418, 2009.
[50] Planck Collaboration, ‘‘Planck 2013 results. I. Overview of products and scientific results,’’ Astronomy Astrophys., vol. 571, no. A1, pp. 1–48, Nov. 2014.
[51] G. Hinshaw et al., ‘‘Nine-year Wilkinson microwave anisotropy probe (WMAP) observations: Cosmological parameter results,’’ Astrophys. J. Suppl. Ser., vol. 208, no. 2, pp. 19-1–19-25, 2013.
[52] Planck Collaboration, ‘‘Planck 2013 results. XVI. Cosmological parameters,’’ Astronomy Astrophys., vol. 571, no. A16, pp. 1–66, Nov. 2014.
[53] A. A. Penzias and R. W. Wilson, ‘‘A measurement of excess antenna temperature at 4080 Mc/s,’’ Astrophys. J., vol. 142, pp. 419–421, Jul. 1965.
[54] M. Betoule et al., ‘‘Improved cosmological constraints from a joint analysis of the SDSS-II and SNLS supernova samples,’’ Astronomy Astrophys., vol. 568, Aug. 2014, Art. ID. A22.
[55] N. Palanque-Delabrouille et al., ‘‘Constraint on neutrino masses from SDSS-III/BOSS Lyα forest and other cosmological probes,’’ J. Cosmol. Astroparticle Phys., vol. 2015, no. 2, p. 045, Feb. 2015.
[56] M. Krawczyk, D. Sokołowska, P. Swaczyna, and B. Świeżewska, ‘‘Constraining inert dark matter by Rγγ and WMAP data,’’ J. High Energy Phys., vol. 55, no. 9, Sep. 2013, DOI: 10.1007/JHEP09(2013)055.
[57] N. D. Sidiropoulos and R. Bro, ‘‘On communication diversity for blind identifiability and the uniqueness of low-rank decomposition of N-way arrays,’’ in Proc. Int. Conf. Acoust. Speech Signal Process., Istanbul, Turkey, Jun. 2000, vol. 5, pp. 2449–2452.
[58] N. M. Correa, T. Adalı, Y.-O. Li, and V. D. Calhoun, ‘‘Canonical correlation analysis for data fusion and group inferences,’’ IEEE Signal Process. Mag., vol. 27, no. 4, pp. 39–50, Jul. 2010.


[59] Planck Collaboration, ‘‘Planck 2013 results. XII. Diffuse component separation,’’ Astronomy Astrophys., vol. 571, Nov. 2014, Art. ID. A12.
[60] J. Sui, T. Adalı, Q. Yu, J. Chen, and V. Calhoun, ‘‘A review of multivariate methods for multimodal fusion of brain imaging data,’’ J. Neurosci. Methods, vol. 204, no. 1, pp. 68–81, 2012.
[61] N. D. Sidiropoulos, G. B. Giannakis, and R. Bro, ‘‘Blind PARAFAC receivers for DS-CDMA systems,’’ IEEE Trans. Signal Process., vol. 48, no. 3, pp. 810–823, Mar. 2000.
[62] A. Smilde, R. Bro, and P. Geladi, Multi-Way Analysis: Applications in the Chemical Sciences. New York, NY, USA: Wiley, Aug. 2004.
[63] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Applications, 1st ed. New York, NY, USA: Academic, Feb. 2010.
[64] T. G. Kolda and B. W. Bader, ‘‘Tensor decompositions and applications,’’ SIREV, vol. 51, no. 3, pp. 455–500, Sep. 2009.
[65] A. Cichocki et al., ‘‘Tensor decompositions for signal processing applications: From two-way to multiway component analysis,’’ IEEE Signal Process. Mag., vol. 32, no. 2, pp. 145–163, Mar. 2015.
[66] J. B. Kruskal, ‘‘Rank, decomposition, and uniqueness for 3-way and N-way arrays,’’ in Multiway Data Analysis. Amsterdam, The Netherlands: Elsevier, 1989, pp. 7–18.
[67] J. B. Kruskal, ‘‘Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics,’’ Linear Algebra Appl., vol. 18, no. 2, pp. 95–138, 1977.
[68] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
[69] R. A. Harshman and M. E. Lundy, ‘‘PARAFAC: Parallel factor analysis,’’ Comput. Stat. Data Anal., vol. 18, no. 1, pp. 39–72, Aug. 1994.
[70] R. A. Harshman, ‘‘PARAFAC2: Mathematical and technical notes,’’ UCLA Working Papers Phonet., vol. 22, pp. 30–47, 1972.
[71] J.-F. Cardoso, ‘‘Blind signal separation: Statistical principles,’’ Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, Oct. 1998.
[72] X. Liu and N. D. Sidiropoulos, ‘‘Cramér-Rao lower bounds for low-rank decomposition of multidimensional arrays,’’ IEEE Trans. Signal Process., vol. 49, no. 9, pp. 2074–2086, Sep. 2001.
[73] N. D. Sidiropoulos, ‘‘Generalizing Carathéodory’s uniqueness of harmonic parameterization to N dimensions,’’ IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1687–1690, May 2001.
[74] N. D. Sidiropoulos and X. Liu, ‘‘Identifiability results for blind beamforming in incoherent multipath with small delay spread,’’ IEEE Trans. Signal Process., vol. 49, no. 1, pp. 228–236, Jan. 2001.
[75] M. Sørensen and L. De Lathauwer, ‘‘Blind signal separation via tensor decomposition with Vandermonde factor: Canonical polyadic decomposition,’’ IEEE Trans. Signal Process., vol. 61, no. 22, pp. 5507–5519, Nov. 2013.
[76] L. De Lathauwer and A. de Baynast, ‘‘Blind deconvolution of DS-CDMA signals by means of decomposition in rank-$(1, L, L)$ terms,’’ IEEE Trans. Signal Process., vol. 56, no. 4, pp. 1562–1571, Apr. 2008.
[77] J. Paisley and L. Carin, ‘‘Nonparametric factor analysis with Beta process priors,’’ in Proc. ICML, New York, USA: ACM, 2009, pp. 777–784.
[78] P. Rai and H. Daumé, III, ‘‘The infinite hierarchical factor regression model,’’ in Proc. Neural Inf. Process. Syst., Dec. 2008, pp. 1321–1328.
[79] D. Knowles and Z. Ghahramani, ‘‘Nonparametric Bayesian sparse factor models with application to gene expression modeling,’’ Ann. Appl. Stat., vol. 5, no. 2B, pp. 1534–1552, Jun. 2011.
[80] P. Comon, ‘‘Independent component analysis, a new concept?’’ Signal Process., vol. 36, no. 3, pp. 287–314, Apr. 1994.
[81] J.-F. Cardoso, ‘‘The three easy routes to independent component analysis; contrasts and geometry,’’ in Proc. ICA, San Diego, CA, USA, Dec. 2001, pp. 1–6.
[82] T. Adalı, M. Anderson, and G.-S. Fu, ‘‘Diversity in independent component and vector analyses: Identifiability, algorithms, and applications in medical imaging,’’ IEEE Signal Process. Mag., vol. 31, no. 3, pp. 18–33, May 2014.
[83] J.-F. Cardoso and A. Souloumiac, ‘‘Blind beamforming for non-Gaussian signals,’’ Inst. Electr. Eng. Proc. F–Radar Signal Process., vol. 140, no. 6, pp. 362–370, Dec. 1993.
[84] L. De Lathauwer, ‘‘Signal processing based on multilinear algebra,’’ Ph.D. dissertation, KU Leuven, Leuven, Belgium, Sep. 1997.
[85] A. L. F. de Almeida, X. Luciani, A. Stegeman, and P. Comon, ‘‘CONFAC decomposition approach to blind identification of underdetermined mixtures based on generating function derivatives,’’ IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5698–5713, Nov. 2012.
[86] E. Moreau and T. Adalı, Blind Identification and Separation of Complex-Valued Signals. Hoboken, NJ, USA: Wiley, 2013.
[87] F. L. Hitchcock, ‘‘The expression of a tensor or a polyadic as a sum of products,’’ J. Math. Phys., vol. 6, pp. 164–189, 1927.
[88] R. A. Harshman, ‘‘Determination and proof of minimum uniqueness conditions for PARAFAC1,’’ UCLA Working Papers Phonet., vol. 22, pp. 111–117, 1972.
[89] J. B. Kruskal, ‘‘More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling,’’ Psychometrika, vol. 41, no. 3, pp. 281–293, Sep. 1976.
[90] M. Sørensen and L. De Lathauwer, ‘‘Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-$(L_{r,n}, L_{r,n}, 1)$ terms–Part I: Uniqueness,’’ SIAM J. Matrix Anal. Appl., vol. 36, no. 2, pp. 496–522, 2015.
[91] N. D. Sidiropoulos and R. Bro, ‘‘On the uniqueness of multilinear decomposition of N-way arrays,’’ J. Chemometrics, vol. 14, no. 3, pp. 229–239, May/Jun. 2000.
[92] A. Stegeman and N. D. Sidiropoulos, ‘‘On Kruskal’s uniqueness condition for the Candecomp/Parafac decomposition,’’ Linear Algebra Appl., vol. 420, no. 2, pp. 540–552, 2007.
[93] I. Domanov and L. De Lathauwer, ‘‘On the uniqueness of the canonical polyadic decomposition of third-order tensors–Part I: Basic results and uniqueness of one factor matrix,’’ SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 855–875, 2013.
[94] I. Domanov and L. De Lathauwer, ‘‘On the uniqueness of the canonical polyadic decomposition of third-order tensors–Part II: Uniqueness of the overall decomposition,’’ SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 876–903, 2013.
[95] I. Domanov and L. De Lathauwer, ‘‘Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL,’’ 2014. [Online]. Available: http://arxiv.org/abs/1405.6238.
[96] L. Chiantini, G. Ottaviani, and N. Vannieuwenhoven, ‘‘An algorithm for generic and low-rank specific identifiability of complex tensors,’’ SIAM J. Matrix Anal. Appl., vol. 35, no. 4, pp. 1265–1287, 2014.
[97] I. Domanov and L. De Lathauwer, Canonical polyadic decomposition of third-order tensors: Relaxed uniqueness conditions and algebraic algorithm, ESAT-STADIUS, KU Leuven, Leuven, Belgium, Tech. Rep. 14-152, 2015.
[98] K. De Roover, E. Ceulemans, M. E. Timmerman, J. B. Nezlek, and P. Onghena, ‘‘Modeling differences in the dimensionality of multiblock data by means of clusterwise simultaneous component analysis,’’ Psychometrika, vol. 78, no. 4, pp. 648–668, Oct. 2013.
[99] E. Acar, M. A. Rasmussen, F. Savorani, T. Næs, and R. Bro, ‘‘Understanding data fusion within the framework of coupled matrix and tensor factorizations,’’ Chemom. Intell. Lab. Syst., vol. 129, pp. 53–63, 2013.
[100] S. Miron, M. Dossot, C. Carteret, S. Margueron, and D. Brie, ‘‘Joint processing of the parallel and crossed polarized Raman spectra and uniqueness in blind nonnegative source separation,’’ Chemom. Intell. Lab. Syst., vol. 105, no. 1, pp. 7–18, Jan. 2011.
[101] E. Acar, C. Aykut-Bingol, H. Bingol, R. Bro, and B. Yener, ‘‘Multiway analysis of epilepsy tensors,’’ Bioinformatics, vol. 23, no. 13, pp. i10–i181, 2007.
[102] M. De Vos, L. De Lathauwer, B. Vanrumste, S. Van Huffel, and W. Van Paesschen, ‘‘Canonical decomposition of ictal scalp EEG and accurate source localisation: Principles and simulation study,’’ Comput. Intell. Neurosci., vol. 2007, pp. 1–10, 2007.
[103] F. Miwakeichi et al., ‘‘Decomposing EEG data into space-time-frequency components using parallel factor analysis,’’ NeuroImage, vol. 22, no. 3, pp. 1035–1045, Jul. 2004.
[104] C. F. Beckmann and S. M. Smith, ‘‘Tensorial extensions of independent component analysis for multisubject fMRI analysis,’’ Neuroimage, vol. 25, no. 1, pp. 294–311, Mar. 2005.
[105] J. Levin, ‘‘Simultaneous factor analysis of several Gramian matrices,’’ Psychometrika, vol. 31, no. 3, pp. 413–419, 1966.
[106] P. Comon, ‘‘Tensors: A brief introduction,’’ IEEE Signal Process. Mag., vol. 31, no. 3, pp. 44–53, May 2014.
[107] J. J. Lacoume and P. Ruiz, ‘‘Sources indentification: A solution based on the cumulants,’’ in Proc. 4th Annu. ASSP Workshop Spectrum Estimat. Model., Minneapolis, MN, USA, Aug. 1988, pp. 199–203.
[108] T. Kim, I. Lee, and T.-W. Lee, ‘‘Independent vector analysis: Definition and algorithms,’’ in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, Nov. 2006, pp. 1393–1396.
[109] T. Kim, T. Eltoft, and T.-W. Lee, ‘‘Independent vector analysis: An extension of ICA to multivariate components,’’ in Independent Component Analysis and Blind Signal Separation, vol. 3889. Berlin,


Germany: Springer-Verlag, 2006, pp. 165–172.
[110] Y.-O. Li, T. Adalı, W. Wang, and V. D. Calhoun, “Joint blind source separation by multiset canonical correlation analysis,” IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3918–3929, Oct. 2009.
[111] X.-L. Li, T. Adalı, and M. Anderson, “Joint blind source separation by generalized joint diagonalization of cumulant matrices,” Signal Process., vol. 91, no. 10, pp. 2314–2322, Oct. 2011.
[112] J.-H. Lee, T.-W. Lee, F. A. Jolesz, and S.-S. Yoo, “Independent vector analysis (IVA): Multivariate approach for fMRI group study,” NeuroImage, vol. 40, no. 1, pp. 86–109, Mar. 2008.
[113] A. M. Michael, M. Anderson, R. L. Miller, T. Adalı, and V. D. Calhoun, “Preserving subject variability in group fMRI analysis: Performance evaluation of GICA vs. IVA,” Front. Syst. Neurosci., vol. 8, Jun. 2014, Art. ID. 106.
[114] Y. Levin-Schwartz, V. D. Calhoun, and T. Adalı, “Data-driven fusion of EEG, functional and structural MRI: A comparison of two models,” in Proc. Conf. Inf. Sci. Syst., Princeton, NJ, USA, Mar. 2014, DOI: 10.1109/CISS.2014.6814108.
[115] S. Ma, V. D. Calhoun, R. Phlypo, and T. Adalı, “Dynamic changes of spatial functional network connectivity in healthy individuals and schizophrenia patients using independent vector analysis,” NeuroImage, vol. 90, pp. 196–206, Apr. 2014.
[116] Y.-O. Li, T. Eichele, V. D. Calhoun, and T. Adalı, “Group study of simulated driving fMRI data by multiset canonical correlation analysis,” J. Signal Process. Syst., vol. 68, no. 1, pp. 31–48, Jul. 2012.
[117] J. Sui et al., “Three-way (N-way) fusion of brain imaging data based on mCCA+jICA and its application to discriminating schizophrenia,” NeuroImage, vol. 66, pp. 119–132, Feb. 2013.
[118] A. Nielsen, “Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data,” IEEE Trans. Image Process., vol. 11, no. 3, pp. 293–305, Mar. 2002.
[119] D. Lahat and C. Jutten, “Joint blind source separation of multidimensional components: Model and algorithm,” in Proc. EUSIPCO, Lisbon, Portugal, Sep. 2014, pp. 1417–1421.
[120] R. F. Silva, S. Plis, T. Adalı, and V. D. Calhoun, “Multidataset independent subspace analysis extends independent vector analysis,” in Proc. Int. Conf. Image Process., Paris, France, Oct. 2014, pp. 2864–2868.
[121] D. Lahat and C. Jutten, “Joint independent subspace analysis using second-order statistics,” GIPSA-Lab, Grenoble, France, Tech. Rep. hal-01132297, Mar. 2015.
[122] J. Chatel-Goldman, M. Congedo, and R. Phlypo, “Joint BSS as a natural analysis framework for EEG-hyperscanning,” in Proc. Int. Conf. Acoust. Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 1212–1216.
[123] M. Anderson, G.-S. Fu, R. Phlypo, and T. Adalı, “Independent vector analysis: Identification conditions and performance bounds,” IEEE Trans. Signal Process., vol. 62, no. 17, pp. 4399–4410, Sep. 2014.
[124] M. Sørensen and L. De Lathauwer, “Multidimensional harmonic retrieval via coupled canonical polyadic decomposition,” ESAT-SISTA, KU Leuven, Leuven, Belgium, Internal Rep. 13-240, 2013.
[125] M. Sørensen and L. De Lathauwer, “Coupled tensor decompositions for applications in array signal processing,” in Proc. Int. Workshop Comput. Adv. Multi-Sensor Adapt. Process., 2013, pp. 228–231.
[126] R. A. Harshman and M. E. Lundy, “Data preprocessing and the extended PARAFAC model,” in Research Methods for Multimode Data Analysis. New York, NY, USA: Praeger, 1984, pp. 216–284.
[127] Joint data analysis for enhanced knowledge discovery in metabolomics. [Online]. Available: http://www.models.life.ku.dk/joda.
[128] P. Comon, “Supervised classification, a probabilistic approach,” in Proc. Eur. Symp. Artif. Neural Netw., Brussels, Belgium, Apr. 1995, pp. 111–128.
[129] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Fetal electrocardiogram extraction by source subspace separation,” in Proc. IEEE Signal Process./ATHOS Workshop HOS, Girona, Spain, Jun. 1995, pp. 134–138.
[130] J.-F. Cardoso, “Multidimensional independent component analysis,” in Proc. Int. Conf. Acoust. Speech Signal Process., Seattle, WA, USA, May 1998, vol. 4, pp. 1941–1944.
[131] A. Hyvärinen and P. O. Hoyer, “Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces,” Neural Comput., vol. 12, no. 7, pp. 1705–1720, Jul. 2000.
[132] D. Lahat, J.-F. Cardoso, and H. Messer, “Second-order multidimensional ICA: Performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4598–4610, Sep. 2012.
[133] M. Castella and P. Comon, “Blind separation of instantaneous mixtures of dependent sources,” in Independent Component Analysis and Signal Separation, vol. 4666, M. E. Davies, C. J. James, S. A. Abdallah, and M. D. Plumbley, Eds. Berlin, Germany: Springer-Verlag, 2007, pp. 9–16.
[134] A. Boudjellal, K. Abed-Meraim, A. Belouchrani, and P. Ravier, “Informed separation of dependent sources using joint matrix decomposition,” in Proc. EUSIPCO, Lisbon, Portugal, Sep. 2014, pp. 1945–1949.
[135] L. De Lathauwer, “Decompositions of a higher-order tensor in block terms. Part II: Definitions and uniqueness,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1033–1066, 2008.
[136] R. A. Harshman, “Models for analysis of asymmetrical relationships among N objects or stimuli,” in Proc. 1st Joint Meeting Psychometric Soc./Soc. Math. Psychol., Hamilton, Ontario, Canada, Aug. 1978, pp. 1–25.
[137] G. Favier and A. L. F. de Almeida, “Overview of constrained PARAFAC models,” EURASIP J. Adv. Signal Process., 2014, Art. ID. 142.
[138] M. De Vos, D. Nion, S. Van Huffel, and L. De Lathauwer, “A combination of parallel factor and independent component analysis,” Signal Process., vol. 92, no. 12, pp. 2990–2999, 2012.
[139] T. Yokota, A. Cichocki, and Y. Yamashita, “Linked PARAFAC/CP tensor decomposition and its fast implementation for multi-block tensor analysis,” in Neural Information Processing, vol. 7665, T. Huang, Z. Zeng, C. Li, and C. Leung, Eds. Berlin, Germany: Springer-Verlag, 2012, pp. 84–91.
[140] A. H. Phan and A. Cichocki, “Tensor decompositions for feature extraction and classification of high dimensional datasets,” Nonlinear Theory Appl., vol. 1, no. 1, pp. 37–68, 2010.
[141] Q. Zhao et al., “Higher order partial least squares (HOPLS): A generalized multilinear regression method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 7, pp. 1660–1673, Jul. 2013.
[142] S. A. Khan, E. Leppäaho, and S. Kaski, Multi-tensor factorization, 2014. [Online]. Available: http://arxiv.org/abs/1412.4679.
[143] D. Lahat, T. Adalı, and C. Jutten, “Challenges in multimodal data fusion,” in Proc. EUSIPCO, Lisbon, Portugal, Sep. 2014, pp. 101–105.
[144] G. Monaci, P. Vandergheynst, and F. T. Sommer, “Learning bimodal structure in audio-visual data,” IEEE Trans. Neural Netw., vol. 20, no. 12, pp. 1898–1910, Dec. 2009.
[145] R. Cabral Farias, J. E. Cohen, C. Jutten, and P. Comon, “Joint decompositions with flexible couplings,” in Proc. LVA/ICA, Liberec, Czech Republic, Aug. 2015.
[146] A. Liutkus, U. Şimşekli, and T. Cemgil, “Extraction of temporal patterns in multi-rate and multi-modal datasets,” in Proc. LVA/ICA, Liberec, Czech Republic, Aug. 2015.
[147] T. Wilderjans, E. Ceulemans, and I. Van Mechelen, “Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes,” Comput. Stat. Data Anal., vol. 53, no. 4, pp. 1086–1098, Feb. 2009.
[148] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, “Multimodality image registration by maximization of mutual information,” IEEE Trans. Med. Imag., vol. 16, no. 2, pp. 187–198, Apr. 1997.
[149] D. Loeckx, P. Slagmolen, F. Maes, D. Vandermeulen, and P. Suetens, “Nonrigid image registration using conditional mutual information,” IEEE Trans. Med. Imag., vol. 29, no. 1, pp. 19–29, Jan. 2010.
[150] R. Bro, “Multiway calibration. Multilinear PLS,” J. Chemometrics, vol. 10, no. 1, pp. 47–61, Jan./Feb. 1996.
[151] F. Marini and R. Bro, “SCREAM: A novel method for multi-way regression problems with shifts and shape changes in one mode,” Chemom. Intell. Lab. Syst., vol. 129, Special Issue: Multiway and Multiset Methods, pp. 64–75, 2013.
[152] K. G. Jöreskog, “Simultaneous factor analysis in several populations,” Psychometrika, vol. 36, no. 4, pp. 409–426, Dec. 1971.
[153] T. F. Wilderjans, E. Ceulemans, I. Van Mechelen, and R. A. van den Berg, “Simultaneous analysis of coupled data matrices subject to different amounts of noise,” Brit. J. Math. Stat. Psychol., vol. 64, no. 2, pp. 277–290, May 2011.
[154] A. R. Groves, C. F. Beckmann, S. M. Smith, and M. W. Woolrich, “Linked independent component analysis for multimodal data fusion,” NeuroImage, vol. 54, no. 3, pp. 2198–2217, Feb. 2011.
[155] T. F. Wilderjans, E. Ceulemans, and I. Van Mechelen, “The SIMCLAS model: Simultaneous analysis of coupled binary data matrices with noise heterogeneity between and within data blocks,” Psychometrika, vol. 77, no. 4, pp. 724–740, Oct. 2012.
[156] U. Şimşekli, B. Ermiş, A. T. Cemgil, and E. Acar, “Optimal weight learning for coupled tensor factorization with mixed divergences,” in Proc. EUSIPCO, Marrakech, Morocco, Sep. 2013, pp. 1–5.
[157] I. Bloch, “Information combination operators for data fusion: A comparative review with classification,” IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol. 26, no. 1, pp. 52–67, Jan. 1996.
[158] N. A. Tmazirte, M. E. El Najjar, C. Smaili, and D. Pomorski, “Dynamical reconfiguration strategy of a multi sensor data fusion algorithm based on information theory,” in Proc. IEEE Intell. Veh. Symp., Gold Coast, Qld., Australia, Jun. 2013, pp. 896–901.
[159] M. Kumar, D. P. Garg, and R. A. Zachery, “A method for judicious fusion of inconsistent multiple sensor data,” IEEE Sensors J., vol. 7, no. 5, pp. 723–733, May 2007.
[160] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. Mørup, “Scalable tensor factorizations for incomplete data,” Chemom. Intell. Lab. Syst., vol. 106, no. 1, pp. 41–56, Mar. 2011.
[161] B. Ermiş, E. Acar, and A. Cemgil, “Link prediction in heterogeneous data via generalized coupled tensor factorization,” Data Mining Knowl. Disc., vol. 29, no. 1, pp. 203–236, 2015.
[162] N. Vervliet, O. Debals, L. Sorber, and L. De Lathauwer, “Breaking the curse of dimensionality using decompositions of incomplete tensors: Tensor-based scientific computing in big data analysis,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 71–79, Sep. 2014.
[163] A. P. Singh and G. J. Gordon, “A unified view of matrix factorization models,” in Machine Learning and Knowledge Discovery in Databases, vol. 5212, W. Daelemans, B. Goethals, and K. Morik, Eds. Berlin, Germany: Springer-Verlag, 2008, pp. 358–373.
[164] A. P. Singh and G. J. Gordon, “Relational learning via collective matrix factorization,” in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining, Las Vegas, NV, USA, Aug. 2008, pp. 650–658.
[165] E. Bullmore and O. Sporns, “Complex brain networks: Graph theoretical analysis of structural and functional systems,” Nature Rev. Neurosci., vol. 10, no. 3, pp. 186–198, Mar. 2009.
[166] N. M. Correa, T. Eichele, T. Adalı, Y.-O. Li, and V. D. Calhoun, “Multi-set canonical correlation analysis for the fusion of concurrent single trial ERP and functional MRI,” NeuroImage, vol. 50, no. 4, pp. 1438–1445, May 2010.
[167] R. Stompor, “Data analysis of massive data sets a Planck example,” presented at the LOFAR Workshop, Meudon, France, Mar. 2006. [Online]. Available: http.lesia.obspm.fr/plasma/LOFAR2006/Stompor.pdf.
[168] BICEP2/Keck and Planck Collaborations, “A joint analysis of BICEP2/Keck Array and Planck data,” Phys. Rev. Lett., vol. 114, no. 10, Mar. 2015, Art. ID. 101301.
[169] T. Virtanen, “Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria,” IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 3, pp. 1066–1074, Mar. 2007.
[170] O. Alter, P. O. Brown, and D. Botstein, “Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms,” Proc. Nat. Acad. Sci., vol. 100, no. 6, pp. 3351–3356, 2003.
[171] S. P. Ponnapalli, M. A. Saunders, C. F. Van Loan, and O. Alter, “A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms,” PLoS One, vol. 6, no. 12, Dec. 2011, Art. ID. e28072.
[172] L. Omberg, G. H. Golub, and O. Alter, “A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies,” Proc. Nat. Acad. Sci. USA, vol. 104, no. 47, pp. 18371–18376, Nov. 2007.
[173] V. D. Calhoun, T. Adalı, G. D. Pearlson, and K. A. Kiehl, “Neuronal chronometry of target detection: Fusion of hemodynamic and event-related potential data,” NeuroImage, vol. 30, no. 2, pp. 544–553, Apr. 2006.
[174] V. D. Calhoun, T. Adalı, G. Pearlson, and J. Pekar, “Group ICA of functional MRI data: Separability, stationarity, and inference,” in Proc. ICA, San Diego, CA, USA, Dec. 2001, pp. 155–160.
[175] V. D. Calhoun, T. Adalı, G. D. Pearlson, and J. J. Pekar, “A method for making group inferences from functional MRI data using independent component analysis,” Human Brain Mapping, vol. 14, no. 3, pp. 140–151, Nov. 2001.
[176] H. A. L. Kiers and J. M. F. ten Berge, “Hierarchical relations between methods for simultaneous component analysis and a technique for rotation to a simple simultaneous structure,” Brit. J. Math. Stat. Psychol., vol. 47, no. 1, pp. 109–126, May 1994.
[177] K. Van Deun et al., “DISCO-SCA and properly applied GSVD as swinging methods to find common and distinctive processes,” PLoS One, vol. 7, no. 5, May 2012, Art. ID. e37840.
[178] K. De Roover, M. E. Timmerman, I. Van Mechelen, and E. Ceulemans, “On the added value of multiset methods for three-way data analysis,” Chemom. Intell. Lab. Syst., vol. 129, pp. 98–107, Nov. 2013.
[179] K. Van Deun, A. K. Smilde, L. Thorrez, H. A. L. Kiers, and I. Van Mechelen, “Identifying common and distinctive processes underlying multiset data,” Chemom. Intell. Lab. Syst., vol. 129, pp. 40–51, Nov. 2013.
[180] S. Virtanen, A. Klami, S. Khan, and S. Kaski, “Bayesian group factor analysis,” in Proc. AISTATS, La Palma, Canary Islands, Apr. 2012, vol. 22, pp. 1269–1277.
[181] S. A. Khan and S. Kaski, “Bayesian multi-view tensor factorization,” in Machine Learning and Knowledge Discovery in Databases, vol. 8724, T. Calders, F. Esposito, E. Hüllermeier, and R. Meo, Eds. Berlin, Germany: Springer-Verlag, 2014, pp. 656–671.
[182] N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528–537, Feb. 2012.
[183] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, “Bayesian fusion of hyperspectral and multispectral images,” in Proc. Int. Conf. Acoust. Speech Signal Process., Florence, Italy, May 2014, pp. 3176–3180.
[184] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, “Bayesian fusion of multispectral and hyperspectral images with unknown sensor spectral response,” in Proc. Int. Conf. Inf. Process., Paris, France, Oct. 2014, pp. 698–702.
[185] Q. Wei, J. M. Bioucas-Dias, N. Dobigeon, and J.-Y. Tourneret, “Fusion of multispectral and hyperspectral images based on sparse representation,” in Proc. EUSIPCO, Lisbon, Portugal, Sep. 2014, pp. 1577–1581.
[186] H. Lee and S. Choi, “Group nonnegative matrix factorization for EEG classification,” in Proc. AISTATS, Clearwater Beach, FL, USA, 2009, vol. 5, pp. 320–327.
[187] R. R. Lederman and R. Talmon, “Common manifold learning using alternating-diffusion,” Yale Univ., New Haven, CT, USA, Tech. Rep. YALEU/DCS/TR-1497, Mar. 2015.
[188] J. Liu et al., “Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA,” Human Brain Mapping, vol. 30, no. 1, pp. 241–255, Jan. 2009.
[189] N. Seichepine, S. Essid, C. Févotte, and O. Cappé, “Soft nonnegative matrix co-factorization,” IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5940–5949, Nov. 2014.
[190] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Independent component analysis and (simultaneous) third-order tensor diagonalization,” IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2262–2271, Oct. 2001.
[191] G. Chabriel et al., “Joint matrices decompositions and blind source separation: A survey of methods, identification, and applications,” IEEE Signal Process. Mag., vol. 31, no. 3, pp. 34–43, 2014.
[192] A. Cichocki, “Tensor decompositions: A new concept in brain data analysis?” 2013. [Online]. Available: http://arxiv.org/abs/1305.0395.
[193] M. Schouteden, K. Van Deun, T. F. Wilderjans, and I. Van Mechelen, “Performing DISCO-SCA to search for distinctive and common information in linked data,” Behav. Res. Methods, vol. 46, no. 2, pp. 576–587, Jun. 2013.
[194] M. Schouteden, K. Van Deun, S. Pattyn, and I. Mechelen, “SCA with rotation to distinguish common and distinctive information in linked data,” Behav. Res. Methods, vol. 45, no. 3, pp. 822–833, 2013.
[195] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.
[196] E. Acar, A. J. Lawaetz, M. A. Rasmussen, and R. Bro, “Structure-revealing data fusion model with applications in metabolomics,” in Proc. Eng. Med. Biol. Conf., Osaka, Japan, Jul. 2013, pp. 6023–6026.
[197] I. Domanov and L. De Lathauwer, “Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition,” SIAM J. Matrix Anal. Appl., vol. 35, no. 2, pp. 636–660, Apr./May 2014.
[198] A. Yeredor, “Performance analysis of GEVD-based source separation with second-order statistics,” IEEE Trans. Signal Process., vol. 59, no. 10, pp. 5077–5082, Oct. 2011.
[199] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York, NY, USA: Wiley, 1958.
[200] M. Sørensen, I. Domanov, and L. De Lathauwer, “Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-$(L_{r,n}, L_{r,n}, 1)$ terms – Part II: Algorithms,” SIAM J. Matrix Anal. Appl., vol. 36, no. 3, pp. 1015–1045, 2015.
[201] Y. Guo and G. Pagnoni, “A unified framework for group independent component analysis for multi-subject fMRI data,” NeuroImage, vol. 42, no. 3, pp. 1078–1093, Sep. 2008.
[202] M. Anderson, T. Adalı, and X.-L. Li, “Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis,” IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1672–1683, Apr. 2012.
[203] P. Geladi and B. R. Kowalski, “Partial least-squares regression: A tutorial,” Analytica Chimica Acta, vol. 185, pp. 1–17, 1986.
[204] M. J. Beal, H. Attias, and N. Jojic, “Audio-video sensor fusion with probabilistic graphical models,” in Proc. ECCV, vol. 2350, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, Eds. Berlin, Germany: Springer-Verlag, May 28–31, 2002, pp. 736–750.
[205] A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proc. COLT, 1998, pp. 92–100.
[206] S. Sun, “A survey of multi-view machine learning,” Neural Comput. Appl., vol. 23, no. 7–8, pp. 2031–2038, Dec. 2013.
[207] N. Srivastava and R. R. Salakhutdinov, “Multimodal learning with deep Boltzmann machines,” in Proc. NIPS, F. Pereira, C. Burges, L. Bottou, and K. Weinberger, Eds. Lake Tahoe, NV, USA: Curran Associates, 2012, pp. 2222–2230.
[208] L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab v2.0, Jan. 2014. [Online]. Available: http://www.tensorlab.net/.
[209] S. Sahnoun and P. Comon, “Joint source estimation and localization,” IEEE Trans. Signal Process., vol. 63, no. 10, pp. 2485–2495, May 2015.
[210] N. Sidiropoulos, E. Papalexakis, and C. Faloutsos, “Parallel randomly compressed cubes: A scalable distributed architecture for big tensor decomposition,” IEEE Signal Process. Mag., vol. 31, no. 5, pp. 57–70, Sep. 2014.
[211] A. Cichocki, “Era of Big Data processing: A new approach via tensor networks and tensor decompositions,” 2014. [Online]. Available: http://arxiv.org/abs/1403.2048.

ABOUT THE AUTHORS


Dana Lahat received the Ph.D. degree in electrical engineering from Tel Aviv University, Tel Aviv, Israel, in 2013.
She is currently a Postdoctoral Researcher at the Grenoble Images, Speech, Signals and Control Lab (GIPSA-lab), Grenoble, France. She was awarded the Chateaubriand Fellowship of the French Government for the academic year 2007–2008. Her research interests include statistical and deterministic methods for signal and data processing, blind source separation, and linear and multilinear algebra.

Tülay Adalı (Fellow, IEEE) received the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, NC, USA, in 1992.
She joined the faculty at the University of Maryland Baltimore County (UMBC), Baltimore, MD, USA, in 1992. She is currently a Distinguished University Professor in the Department of Computer Science and Electrical Engineering at UMBC. Her research interests are in the areas of statistical signal processing, machine learning for signal processing, and biomedical data analysis.
Prof. Adalı assisted in the organization of a number of international conferences and workshops, including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), the IEEE International Workshop on Neural Networks for Signal Processing (NNSP), and the IEEE International Workshop on Machine Learning for Signal Processing (MLSP). She was the General Co-Chair, NNSP (2001–2003); Technical Chair, MLSP (2004–2008); Program Co-Chair, MLSP (2008, 2009, and 2014) and the 2009 International Conference on Independent Component Analysis and Source Separation; Publicity Chair, ICASSP (2000 and 2005); and Publications Co-Chair, ICASSP 2008. She is Technical Program Co-Chair for ICASSP 2017 and Special Sessions Co-Chair for ICASSP 2018. She chaired the IEEE Signal Processing Society (SPS) MLSP Technical Committee (2003–2005, 2011–2013), and served on the SPS Conference Board (1998–2006) and the Bio Imaging and Signal Processing Technical Committee (2004–2007). She was an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING (2003–2006), the IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING (2007–2013), the IEEE JOURNAL OF SELECTED AREAS IN SIGNAL PROCESSING (2010–2013), and Elsevier Signal Processing Journal (2007–2010). She is currently serving on the Editorial Boards of the PROCEEDINGS OF THE IEEE and Journal of Signal Processing Systems for Signal, Image, and Video Technology, and is a member of the IEEE Signal Processing Theory and Methods Technical Committee. She is a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), a Fulbright Scholar, and a recipient of a 2010 IEEE Signal Processing Society Best Paper Award, the 2013 University System of Maryland Regents’ Award for Research, and an NSF CAREER Award. She was an IEEE Signal Processing Society Distinguished Lecturer for 2012 and 2013.

Christian Jutten (Fellow, IEEE) received the Ph.D. and Doctor ès Sciences degrees in signal processing from Grenoble Institute of Technology (GIT), Grenoble, France, in 1981 and 1987, respectively.
From 1982, he was an Associate Professor at GIT, before becoming Full Professor at the University Joseph Fourier of Grenoble, Grenoble, France, in 1989. For 30 years, his research interests have been machine learning and source separation, including theory (separability, source separation in nonlinear mixtures, sparsity, multimodality) and applications (brain and hyperspectral imaging, chemical sensor array, speech). He is the author or coauthor of more than 85 papers in international journals, four books, 25 keynote plenary talks, and 190 communications in international conferences. He has been a Visiting Professor at the Swiss Federal Polytechnic Institute (Lausanne, Switzerland, 1989), at Riken labs (Japan, 1996), and at Campinas University (Brazil, 2010). He was Director or Deputy Director of his lab from 1993 to 2010, as well as Head of the Signal Processing Department (120 people) and Deputy Director of GIPSA-lab (300 people) from 2007 to 2010. He was a scientific advisor for signal and image processing at the French Ministry of Research (1996–1998) and for the French National Research Center (2003–2006). Since May 2012, he has been Deputy Director at the Institute for Information Sciences, French National Center of Research (CNRS), in charge of signal and image processing.
Prof. Jutten was organizer or Program Chair of many international conferences, including the 1st International Conference on Blind Signal Separation and Independent Component Analysis in 1999. He was a member of a few IEEE Technical Committees, and is currently a member of “Theory and Methods” of the IEEE Signal Processing Society. He received best paper awards from the European Association for Signal Processing (EURASIP) in 1992 and the IEEE Geoscience and Remote Sensing Society (GRSS) in 2012, and the 1997 Medal Blondel from the French Electrical Engineering Society for his contributions in source separation and independent component analysis. He has been a EURASIP Fellow since 2013. He has been a Senior Member of the Institut Universitaire de France since 2008, with renewal in 2013. He is the recipient of a 2012 ERC Advanced Grant for a project on challenges in extraction and separation of sources (CHESS).
