A_Review_of_Deep_Learning_Methods_for_
Doha, Qatar
[email protected]
Arab Academy for Science, Technology, and Maritime Transport
Alexandria, Egypt
[email protected]
Abstract: Omics integration in the field of bioinformatics and computational biology has advanced pharmaceutical research and ultimately precision medicine. Recently, biomedical research has grown exponentially in terms of the gathering of genetic and molecular profiles of humans, prompting great interest in linking omics with phenotypic data. The availability of different omics (e.g., genomics, transcriptomics, epigenomics, glycomics, proteomics, metabolomics, and lipidomics) has created opportunities in precision medicine by linking an individual’s unique omics profile and phenotypic data such that the relationship between genotype and phenotype, disease diagnosis, disease subtyping, and disease prediction can be understood. There are tremendous challenges in merging, analyzing, and interpreting the biological insights of these omics datasets, which are growing exponentially. In this paper, we review the most recent advancements in machine learning and deep learning-based approaches to harmonize multiomics and phenotype data. Finally, we discuss the various challenges in integrating omics with phenotypic data and future directions.
Keywords: multiomics, integration, deep learning, machine learning
I. INTRODUCTION
Biomedical research has shifted tremendously toward a big-data-driven approach [1]. This shift is a result of the drastic increase in the rate of generating genetic and molecular profiles of human subjects. Machine learning and deep learning have recently been applied to omics data in various bioinformatics contexts, as reviewed in several papers [2-16]. Recent advancements in and integration of electronic health record (EHR) systems have further facilitated the link between omics and phenotypic data for analysis [17, 18]. This integration of different omics, including genomics, transcriptomics, glycomics, proteomics, metabolomics, and lipidomics, with medical information has created opportunities for precision medicine (Fig. 1). Understanding the relationship between genotype and phenotype, which is complicated by the multiple layers that present as various levels of complexity in the model, is important in disease subtyping and diagnosis, and may aid in the prediction and prevention of adverse outcomes (Fig. 2).
In this review, we summarize the most recent advancements, particularly in machine learning and deep learning, in the integration of multiomics data with phenotypic data to answer important biological questions that can improve personalized human health. In section II, we describe the publicly available and easily accessible data resources that include multiomics and phenotypic data pairs. In section III, we review the methods for integrating omics and phenotypic data. In section IV, we discuss the challenges and limitations of applying current deep learning systems to omics integration.
Fig. 1. Overview of different available omics and their link to phenotype and disease.
II. MULTIOMICS DATASETS AND RESOURCES
Multiomics data range from genome to proteome, transcriptome, metabolome, and epigenome [19]. Like the Human Genome Project [20], early studies usually carried out data analysis on a single omics data type [21]. Gradually, researchers started combining several omics datasets from the same set of subjects to generate multiomics datasets. This combination allowed the identification of causative changes leading to disease.
There are numerous public repositories of multiomics data [19]. One of the largest and most widely used resources for multiomics is The Cancer Genome Atlas (TCGA) [22]. It includes more than 33 types of cancer gathered from 20,000 tumor samples. There is also a proteomics dataset from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) [23]. Other resources include the Connectivity Map [24], a resource rich with over one million gene expression profiles, TARGET for pediatric cancers [25], and the Omics Discovery Index for genomics, transcriptomics, proteomics and metabolomics [26]. Although most of the early multiomics databases contain cancer samples, there has been a shift to population-based biobanks of multiomics data including the UK Biobank
Fig. 2. Integration of multiomics data to study human phenotypes is essential towards improving precision medicine. By combining various machine
learning, deep learning, and computational methods, this integration may allow the analysis of metabolic pathways or forming meaningful biological
networks, thus enabling the study of population health. Identification of disease biomarkers or prediction of disease outcomes may then allow tailoring
of individualized therapies and interventions.
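As a minimal sketch of what combining several omics datasets from the same set of subjects (Section II) can look like in practice, the Python snippet below keeps only the subjects that appear in every omics layer and concatenates their features into one row per subject. The helper `align_omics`, the layer names, the sample IDs, and all values are hypothetical illustrations and are not taken from any of the repositories cited above; real pipelines would typically also handle normalization and batch effects, which are omitted here.

```python
import math
from typing import Dict, List, Tuple

# One omics layer: subject ID -> {feature name -> measured value}.
Layer = Dict[str, Dict[str, float]]


def align_omics(layers: Dict[str, Layer]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Keep subjects measured in every layer and concatenate their features."""
    # Subjects shared by all layers, sorted for a reproducible row order.
    shared = sorted(set.intersection(*(set(layer) for layer in layers.values())))
    # Feature names per layer, sorted so columns are stable across runs.
    features = {
        name: sorted({f for row in layer.values() for f in row})
        for name, layer in layers.items()
    }
    # Prefix each column with its layer name to avoid name collisions.
    columns = [f"{name}:{f}" for name in layers for f in features[name]]
    # A value missing for a shared subject is kept as NaN, not imputed.
    matrix = [
        [layers[name][subject].get(f, math.nan) for name in layers for f in features[name]]
        for subject in shared
    ]
    return shared, columns, matrix


# Hypothetical toy data; IDs, features, and values are made up.
layers = {
    "rna": {
        "S1": {"TP53": 5.2, "BRCA1": 3.1},
        "S2": {"TP53": 4.8, "BRCA1": 2.9},
        "S3": {"TP53": 6.0, "BRCA1": 1.7},  # no methylation data, so dropped
    },
    "meth": {"S1": {"cg0001": 0.8}, "S2": {"cg0001": 0.35}},
}
shared, columns, matrix = align_omics(layers)
print(shared)   # ['S1', 'S2']
print(columns)  # ['rna:BRCA1', 'rna:TP53', 'meth:cg0001']
print(matrix)   # [[3.1, 5.2, 0.8], [2.9, 4.8, 0.35]]
```

Restricting to the shared subjects is only one possible design choice; it trades sample size for completeness, whereas the imputation-oriented methods reviewed later keep partially measured subjects.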
Authorized licensed use limited to: UCLA Library. Downloaded on February 29,2024 at 05:18:49 UTC from IEEE Xplore. Restrictions apply.
Fig. 3. Outline of various deep learning techniques used for multiomics integration.
A. Unsupervised Deep Learning Approaches
1) Denoised Networks
Studies have implemented various deep learning and machine learning approaches to integrate omics data in the classification of various cancer subtypes and survival prediction. Different aspects, such as noise effects and missing data, were thoroughly addressed. Wang et al. [47] presented a multiomics fusion technique that considers the impact of noise and the unique properties of data-specific patterns, which make it hard to reveal coherent patterns and understand multiomics data. Initially, the authors modeled the error factor in their data reconstruction while considering the noise impact and data-specific patterns. They used a denoised network regularization called “Defusion” to construct fused networks with denoising processes that control noise effects and data-specific patterns. The error term combined with the denoised network regularization term captures specific data patterns. The optimization problem was solved through an alternating minimization algorithm. They validated the efficiency of the Defusion network on seven cancer multiomics cohorts from TCGA. Similarly, Chen et al. [48] proposed a Deep Learning (DL)-based method named DeepMF to control noise effects in identifying cancer subtypes. DeepMF is a deep neural network based on matrix factorization. It identifies the correlation between molecular feature-associated and sample-associated latent matrices. This method is robust to noise and missing values and efficiently identifies cancer subtypes in mRNA, miRNA, and protein profiles of cancers such as medulloblastoma, leukemia, breast cancer and small-scale blue-round cell cancer.
2) Autoencoders and their variants
Autoencoders (AEs) are neural networks that are trained in an unsupervised manner and are considered a deep learning method that can simultaneously achieve dimensionality reduction and data integration. They are used to automatically capture nonlinear features from unlabeled data while maintaining input and output integrity. Simple neurons are combined to construct an AE architecture where the information from one layer of neurons serves as the input for the next layer. By keeping the input and output layers the same size, the layered network architecture is shaped like a butterfly, with a bottleneck layer (BL) at its core. Using this type of architecture, the network is forced to produce a compressed data representation while preserving the essential characteristics of the original data [49]. Several versions of AEs have been used throughout the literature. For instance, Withnell et al. [50] designed an explainable deep learning model for cancer classification using high-dimensional cancer omics data. Their model (XOmiVAE) is based on a variational autoencoder (VAE) and shows the impact of every gene on the latent dimensions for every classification process. In addition, it identifies the correlation between each gene and latent dimension. Likewise, Zhang et al. [51] designed a deep learning classifier for cancer subclass classification named Deep Latent Space Fusion (DLSF) that combines multiomics data by studying consistent manifolds in the latent space. The DLSF model is based on a cycle AE having a shared self-expressive layer, which can fuse nonlinear features at every omics level into a single sample manifold and generate an adaptive interpretation of diverse samples at the multiomics level. Also, Borhani et al. [52] designed a non-linear DL model named DIDL to predict inter-omics interactions. DIDL is composed of an AE that automatically extracts features from biomolecules based on present interactions, paired with a predictor that predicts unanticipated interactions. This model has numerous strengths including automatic feature extraction, end-to-end deep learning, and robustness to network sparsity. In addition, because the algorithm relies exclusively on existing layers and because the biochemical properties of molecules interact independently, it can be applicable to various networks.
Later, Lemsara et al. [53] proposed a method named PathMe, also based on an autoencoder but combined with sparse non-negative matrix factorization, to effectively cluster patients using multiomics data in order to identify cancer subtypes. PathMe uses pathway information to effectively reduce the dimension of the omics data into a pathway and a patient-specific scoring profile. Consequently, it enables the identification of pathways in specific patient clusters. This method was tested on several cancer datasets and may potentially enable the identification of biologically plausible disease subtypes that are characterized by specific molecular features. In addition, Zhang et al. [54] proposed a multi-task deep learning approach for multiomics data named OmiEmbed. This method consists of deep embedding based on a VAE and downstream task blocks that discover latent information from high-dimensional omics data. OmiEmbed fuses multiomics data, reduces feature dimension, identifies cancer subtypes, predicts survival, and performs feature reconstruction. In contrast, attention training was carried out in Zhang et al. [55] with a DL model, named multiGATAE, to identify cancer subtypes. First, similarity network integration was used to fuse the multiomics data and create a similarity graph. Then, the feature matrix and output of the similarity graph were used as
inputs to a graph autoencoder that has a graph attention network and omics-level attention structure, to interpret the embedding. K-means clustering is then used to group the embedding interpretations and to identify cancer subsets. The performance of this approach was evaluated on eight different TCGA datasets.
3) Generative Adversarial Networks
Generative adversarial networks (GANs) [56] have grown in prominence in recent years as a means of resolving issues related to computational biology. GANs produce realistic synthetic data that is similar to real data by mimicking the distribution of the real data when given random noise or predetermined data as input. During training, GANs can discover non-linear correlations between omics data features that can be applied afterwards to gain more knowledge [57]. Because of their ability to learn and mimic any data distribution, GANs can handle missing data and are effective in the imputation of missing data. Many studies rely on applying GAN-based algorithms to multiomics datasets. Yang et al. [58] proposed a deep learning model called Subtype-GAN to identify cancer subtypes. This model is based on an artificial neural network that accurately models the multiomics data of the cohort. The network extracts latent features and then uses a Gaussian mixture model clustering approach to identify cancer subtypes. Based on a comparative study that used 10 cancer multiomics datasets, the clustering pattern generated by the Subtype-GAN model was not consistent with the AE method; however, it was similar to the VAE. Cancer subtypes were significantly different clinically according to this method. By combining two omics datasets and their interactions (e.g., mRNA expression, miRNA expression, and the miRNA-mRNA interaction network), another study [56] designed a biologically-motivated deep learning-based model, omicsGAN, to predict disease phenotypes. Using information from additional omics datasets and the interaction network, their model used a GAN to create new enhanced feature sets for each omics dataset, thus improving prediction. By creating a functional interaction network from reconstructed multiomics datasets, another study [59] used a GAN for the discovery of better biomarkers. Other applications of GANs include using bulk RNA-seq datasets to construct gene expression data [60] and removing batch effects in scRNA-seq data [61].
4) Hybrid Approaches
Hybrid approaches combine individual methods that each perform a specific task. Some studies have used unsupervised deep learning hybrid techniques for integrating multiomics data in the form of an ensemble of deep learning approaches or as a combination of machine and deep learning methods. Hybrid approaches combine the strengths of different models. Huang et al. [62] constructed a DL-based model named SALMON to identify genes that predict survival in breast cancer. To predict the prognosis of cancer, SALMON combines gene expression data and cancer biomarkers using artificial neural networks and Cox regression. Instead of using raw gene expression values as inputs to the DL model, new genetic modules that are derived from gene co-expression network analysis are used. Experiments have indicated performance improvement when multiple omics data are used in building the model. It was also possible to identify co-expression modules associated with breast cancer. Another ensemble pipeline called DeepProg, based on a combination of deep learning and machine learning approaches to predict survival-based cancer subtypes using multiomics data, was also developed (Poirion et al.) [63]. It identified two subtypes of cancers with the highest probability of survival and provided significantly better risk stratification compared to other multiomics fusion methods. By connecting genetic features to models of extracellular matrices, immune deregulation, and mitosis processes in cancer subtypes with poor survival, this pipeline resulted in high predictive capacity for two liver and five breast cancer datasets.
Viaud et al. [64] also studied several approaches that integrate multiomics data, using a group of feature reduction techniques such as principal component analysis (PCA), multiple factor analysis (MFA), and other extensions. They investigated DL-based approaches for learning integrated representations of multiomics data such as a Standard Deep Autoencoder (SDAE) and a Disjointed Deep Autoencoder (DDAE). They used a new combination of a DDAE construction and a layer-wise reconstruction loss. These different representations were combined to identify a group of patients that were biologically similar. They also introduced a weighted internal clustering index to assess how well the clustering information was preserved by each dataset while preferring to have contributions from all datasets. This was applied to two case studies, producing clusters that were more relevant compared to earlier studies.
More hybrid approaches were proposed by Hess et al. [65], including a method based on a generative DL model that uses VAEs or deep Boltzmann machines (DBMs). This method first generates synthetic samples, and then extracts patterns from these samples. Their equivalent latent interpretations are then learned with the generative DL model. Then the generative DL model classifies clusters of features that are associated with the states of latent attributes. The latent state data is used to reduce the representation of patterns observed in these clusters of attributes. Although this method appeared promising at first, it has some limitations, such as requiring synthetic data. Also, since the number of features is randomly chosen, there is no guarantee that the most significant variables are selected.
B. Supervised Deep Learning Approaches
1) Convolutional Neural Networks
Convolutional neural networks (CNNs) are designed to resemble the visual cortex of the brain and are used to process a variety of data sources, particularly two-dimensional images. Convolution layers, non-linear layers, fully connected layers, and pooling layers make up the fundamental components of CNNs. Nowadays, CNNs are among the most successful deep-learning model structures because of their exceptional ability to interpret spatial data [66]. CNNs have been widely used for integrating multiomics data. For example, one study [67] developed a generalized CNN model that incorporates data from many omics to prioritize candidate genes using objective measures. They used a non-model organism to test this model. Another study [68] developed a method for integrating multiomics data that is built using a combination of CNNs and a gene similarity network (GSN) based on uniform manifold approximation and projection (UMAP). The technique uses UMAP to integrate DNA methylation, copy number alterations (CNAs), and gene expression into a lower dimension to
produce two-dimensional RGB pictures. Using these CNNs, the Gleason scores of prostate cancer patients and the tumor stage in breast cancer patients were predicted. Likewise, another study [69] used an integrative CNN to build a classification model for molecular subtypes of breast cancer based on a CNA profile and a gene expression profile. They predicted the molecular subtypes of breast cancer including the state of the estrogen receptor (ER+ and ER-), which was considered a binary classification problem, and the cancer subtype (luminal A, luminal B, HER-2 enriched and basal-like), which was a multi-class classification problem.
Using multiomics data, Yu et al. [70] compared traditional machine learning and DL techniques in terms of classification accuracy and robustness. They used 37 high-dimensional multiomics datasets including transcriptomes and metabolomes. They studied several DL architectures including CNNs and Multi-Layer Perceptrons (MLPs) to find the optimal construction that boosts performance. They used five traditional machine learning algorithms including Support Vector Machines (SVMs), Linear Discriminant Analysis (LDA), Random Forest (RF), Multinomial Logistic Regression (MLR), and Naïve Bayes (NB) to identify disease stages or to distinguish diseased samples from normal ones. Their highest performance was obtained using the MLP structure. Interestingly, further analysis indicated that an MLP with one hidden layer outperformed the deeper MLPs.
In another study, Shen et al. [71] presented a feature aggregation method named AggMap to reduce multiomics data by aggregating and mapping omics features into multi-channel 2D images representing feature maps. These feature maps represent the intrinsic correlations of the omics features. The images were then fed to a CNN called AggMapNet to diagnose several diseases. Different noise levels were added to the omics data as a sensitivity analysis. After combining the unsupervised AggMap with the multi-channel AggMapNet models, there was considerable improvement and robustness in the learning of both high- and low-sample omics data.
Despite the strengths of CNNs, they have some weaknesses. First, they require huge datasets and are unable to effectively generalize in the absence of a large number of images, which is commonly the case with a given cohort of patients. The second issue is the "black box" issue, which makes it difficult to determine which features actively contribute to the categorization or whether there are variations in the way features contribute to different subgroups of the population. Although in some instances it may be adequate to identify positive or negative samples, more information that informs treatment choices, rather than simple cancer detection, is frequently needed [72]. This led the way to the development of newer deep-learning architectures that address these limitations.
2) Capsule Networks
The limitations of CNNs can be addressed by cutting-edge deep learning architectures known as capsule networks (also known as CapsNets) [73]. By covering all potential rotations and transformations of the underlying objects, they do not require a huge number of samples. Therefore, CapsNets may be quite useful in the medical field. Afshar et al. have used CapsNets to predict the aggressiveness of lung tumors [74]. Specifically, their framework, named 3D Multi-scale Capsule Network (3D-MCN), used 3D inputs to provide information about the nodules in 3D, multi-scale inputs, which capture the local features of the nodule as well as the characteristics of the surrounding tissues, and a CapsNet-based design, which can handle a limited number of training samples. In the meantime, Peng et al. [75] used Capsule Network-based Modeling of multiomics data, known as CapsNetMMD, as a deep learning approach for the identification of genes associated with breast cancer. Previously identified breast cancer-related genes were used in CapsNetMMD to convert the problem of identifying genes into a supervised classification problem. Several multiomics data, including mRNA expression data, z-scored mRNA expression data, DNA methylation, and two types of DNA copy number variations, were integrated to obtain gene CNA features.
3) Recurrent Neural Networks
Recurrent neural networks (RNNs) are a type of artificial neural network characterized by connections between nodes that create a cycle, allowing the use of sequential information and exhibiting temporal dynamic behavior. Recurrent computing occurs in the hidden units where cyclic connections exist because the input data is handled sequentially. By recognizing the sequential properties in data, RNNs can use patterns to predict the next likely state. The varying length of biological sequences and the significance of their sequential information make RNNs an ideal deep learning architecture. RNNs have been used in several studies to predict protein structure [76, 77], in gene expression control [78, 79], and in the categorization of proteins [80].
4) Hybrid Approaches
Several supervised deep learning hybrid approaches are used in the literature. Among them, one study [81] proposed a mixed convolutional neural network and convolutional auto-encoder strategy to build a deep transfer (migration) learning classification model for the early diagnosis of lung cancer. To make the dataset more suitable for transfer learning, its dimensionality is first reduced using the convolutional auto-encoder approach. Second, using the initial dataset and the preexisting labeled dataset, a CNN is built, and model transfer rules are established. To finish building the classification model, a modest number of labeled target datasets are employed in the training. On the other hand, Zhang et al. [82] proposed a DL-based model named DeepGP for predicting endocrine disease genes via omics data. DeepGP depends on convolutional neural networks (CNNs) and a graph CNN for selecting significant genes of five endocrine diseases. The study showed that type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM) are associated with similar genes. Ensemble deep learning was also used by Huang et al. [83] to capture the relationship between insulin resistance and various multiomics. They developed a deep neural network to identify the contribution of various microbiome features to a model that discriminates between insulin-resistant and insulin-sensitive samples. The model performance was optimized using hyperparameter optimization of ensemble models with grid search.
Finally, Dutta et al. [84] constructed a self-attention-based DL architecture called DeePROG, a disease-related genetic prediction tool based on various omics data. They
used three NCBI datasets that included three omics datasets: 3D protein structures, gene expression profiles, and DNA sequences. They constructed several context-specific DL models to extract features and developed three deep bimodal architectures based on attention and DeePROG for the prediction of important biomedical data.
IV. CHALLENGES AND LIMITATIONS
The availability of multiomics and phenotype data is growing at a fast pace; however, linking these different types of data remains a great challenge. The interpretation of meaningful biological connections can aid in designing personalized therapies. Although standard data analysis methods have been established for single-omics data, there is nevertheless a need for established methods that can connect multiple data resources including various molecular data and clinical data. A structured data linking approach would improve the process by making it de-centralized and accessible. Ensuring traceability and successful collaborations requires the adoption of new technologies and models [1, 85, 86].
The other main challenge is the interpretation of these models. Although deep learning models can extract complex patterns, they still suffer from the black-box issue. Identifying biomarkers from predictive models is an integral part of the translation of information to clinical practice. Thus, decoding this black box and improving the interpretability of the models is a current trend. Many efforts aim to improve this, such as SHAP (SHapley Additive exPlanations) [87]. There are also issues of missing data and of the amount of noise present in omics and phenotypic data alike, which could bias the models and invalidate the health-related conclusions [88, 89]. For these reasons, it is important to acknowledge the potential biases and minimize such effects by including diverse cohorts and adopting methods that mitigate such risks.
Finally, deep learning is incredibly data-hungry, making the requirement for extraordinarily large datasets the most crucial aspect in multiomics integration. Despite our assertion that this is the era of big data, the number of certain kinds of samples is still limited and therefore insufficient for deep learning algorithms to be applied effectively. Investing in the generation and integration of multiple omics is essential. Despite advancements in biotechnologies, generating and collecting several omics simultaneously can be quite expensive. Thus, it is necessary to come up with alternatives to augment or complete such data. Considering the success of many deep learning algorithms in the field of diagnostics, increased use of imaging data to assess morphological or phenotypic alterations of cells and tissue slides is an exciting field. Although extracting useful knowledge from omics data remains a challenging task, recent advancements in the integration of multiomics and in applying various computational models have certainly advanced our insights into biology and biomedicine [7].
ACKNOWLEDGMENT
We would like to thank all authors of the papers that were cited and reviewed in this manuscript.
REFERENCES
[1] J. Zhao, Q. Feng, and W. Q. Wei, “Integration of Omics and Phenotypic Data for Precision Medicine,” Methods Mol Biol, vol. 2486, pp. 19-35, 2022.
[2] R. Miotto, F. Wang, S. Wang, X. Jiang, and J. T. Dudley, “Deep learning for healthcare: review, opportunities and challenges,” Brief Bioinform, vol. 19, no. 6, pp. 1236-1246, Nov 27, 2018.
[3] S. R. Stahlschmidt, B. Ulfenborg, and J. Synnergren, “Multimodal deep learning for biomedical data fusion: a review,” Brief Bioinform, vol. 23, no. 2, Mar 10, 2022.
[4] Y. Li, F. X. Wu, and A. Ngom, “A review on machine learning principles for multi-view biological data integration,” Brief Bioinform, vol. 19, no. 2, pp. 325-340, Mar 1, 2018.
[5] P. Mamoshina, A. Vieira, E. Putin, and A. Zhavoronkov, “Applications of Deep Learning in Biomedicine,” Mol Pharm, vol. 13, no. 5, pp. 1445-54, May 2, 2016.
[6] M. Kang, E. Ko, and T. B. Mersha, “A roadmap for multiomics data integration using deep learning,” Brief Bioinform, vol. 23, no. 1, Jan 17, 2022.
[7] R. Li, L. Li, Y. Xu, and J. Yang, “Machine learning meets omics: applications and perspectives,” Brief Bioinform, vol. 23, no. 1, Jan 17, 2022.
[8] S. Min, B. Lee, and S. Yoon, “Deep learning in bioinformatics,” Brief Bioinform, vol. 18, no. 5, pp. 851-869, Sep 1, 2017.
[9] Z. Zhang, Y. Zhao, X. Liao, W. Shi, K. Li, Q. Zou, and S. Peng, “Deep learning in omics: a survey and guideline,” Brief Funct Genomics, vol. 18, no. 1, pp. 41-57, Feb 14, 2019.
[10] B. Tang, Z. Pan, K. Yin, and A. Khateeb, “Recent Advances of Deep Learning in Bioinformatics and Computational Biology,” Front Genet, vol. 10, pp. 214, 2019.
[11] G. Nicora, F. Vitali, A. Dagliati, N. Geifman, and R. Bellazzi, “Integrated Multiomics Analyses in Oncology: A Review of Machine Learning Methods and Tools,” Front Oncol, vol. 10, pp. 1030, 2020.
[12] A. Holzinger, B. Haibe-Kains, and I. Jurisica, “Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data,” Eur J Nucl Med Mol Imaging, vol. 46, no. 13, pp. 2722-2730, Dec, 2019.
[13] M. Picard, M. P. Scott-Boyer, A. Bodein, O. Perin, and A. Droit, “Integration strategies of multiomics data for machine learning analysis,” Comput Struct Biotechnol J, vol. 19, pp. 3735-3746, 2021.
[14] D. Grapov, J. Fahrmann, K. Wanichthanarak, and S. Khoomrung, “Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integration in Precision Medicine,” OMICS, vol. 22, no. 10, pp. 630-636, Oct, 2018.
[15] R. Duan, L. Gao, Y. Gao, Y. Hu, H. Xu, M. Huang, K. Song, H. Wang, Y. Dong, C. Jiang, C. Zhang, and S. Jia, “Evaluation and comparison of multiomics data integration methods for cancer subtyping,” PLoS Comput Biol, vol. 17, no. 8, pp. e1009224, Aug, 2021.
[16] B. Jankovic, and T. Gojobori, “From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome,” Hum Genomics, vol. 16, no. 1, pp. 7, Feb 18, 2022.
[17] W. Q. Wei, L. A. Bastarache, R. J. Carroll, J. E. Marlo, T. J. Osterman, E. R. Gamazon, N. J. Cox, D. M. Roden, and J. C. Denny, “Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record,” PLoS One, vol. 12, no. 7, pp. e0175508, 2017.
[18] P. Wu, A. Gifford, X. Meng, X. Li, H. Campbell, T. Varley, J. Zhao, R. Carroll, L. Bastarache, J. C. Denny, E. Theodoratou, and W. Q. Wei, “Mapping ICD-10 and ICD-10-CM Codes to Phecodes: Workflow Development and Initial Evaluation,” JMIR Med Inform, vol. 7, no. 4, pp. e14325, Nov 29, 2019.
[19] I. Subramanian, S. Verma, S. Kumar, A. Jere, and K. Anamika, “Multiomics Data Integration, Interpretation, and Its Application,” Bioinform Biol Insights, vol. 14, pp. 1177932219899051, 2020.
[20] L. Hood, and L. Rowen, “The Human Genome Project: big science transforms biology and medicine,” Genome Med, vol. 5, no. 9, pp. 79, 2013.
[21] A. Conesa, and S. Beck, “Making multiomics data accessible to researchers,” Sci Data, vol. 6, no. 1, pp. 251, Oct 31, 2019.
[22] T. J. Giordano, “The cancer genome atlas research network: a sight to behold,” Endocr Pathol, vol. 25, no. 4, pp. 362-5, Dec, 2014.
[23] M. J. Ellis, M. Gillette, S. A. Carr, A. G. Paulovich, R. D. Smith, K. K. Rodland, R. R. Townsend, C. Kinsinger, M. Mesri, H. Rodriguez, D. C. Liebler, and C. Clinical Proteomic Tumor Analysis, “Connecting genomic alterations to cancer biology with proteomics: the NCI Clinical Proteomic Tumor Analysis Consortium,” Cancer Discov, vol. 3, no. 10, pp. 1108-12, Oct, 2013.
[24] A. Subramanian, R. Narayan, S. M. Corsello, D. D. Peck, T. E. Natoli, X. Lu, J. Gould, J. F. Davis, A. A. Tubelli, J. K. Asiedu, D. L. Lahr, J. E. Hirschman, Z. Liu, M. Donahue, B. Julian, M. Khan, D. Wadden, I. C. Smith, D. Lam, A. Liberzon, C. Toder, M. Bagul, M. Orzechowski, O. M. Enache, F. Piccioni, S. A. Johnson, N. J. Lyons, A. H. Berger, A. F. Shamji, A. N. Brooks, A. Vrcic, C. Flynn, J. Rosains, D. Y. Takeda, R. Hu, D. Davison, J. Lamb, K. Ardlie, L. Hogstrom, P. Greenside, N. S. Gray, P. A. Clemons, S. Silver, X. Wu, W. N. Zhao, W. Read-Button, X. Wu, S. J. Haggarty, L. V. Ronco, J. S. Boehm, S. L. Schreiber, J. G. Doench, J. A. Bittker, D. E. Root, B. Wong, and T. R. Golub, “A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles,” Cell, vol. 171, no. 6, pp. 1437-1452 e17, Nov 30, 2017.
[25] “TARGET,” https://ocg.cancer.gov/programs/target.
[26] “Omics Discovery,” https://www.omicsdi.org.
[27] C. Bycroft, C. Freeman, D. Petkova, G. Band, L. T. Elliott, K. Sharp, A. Motyer, D. Vukcevic, O. Delaneau, J. O'Connell, A. Cortes, S. Welsh, A. Young, M. Effingham, G. McVean, S. Leslie, N. Allen, P. Donnelly, and J. Marchini, “The UK Biobank resource with deep phenotyping and genomic data,” Nature, vol. 562, no. 7726, pp. 203-209, Oct, 2018.
[28] A. Nagai, M. Hirata, Y. Kamatani, K. Muto, K. Matsuda, Y. Kiyohara, T. Ninomiya, A. Tamakoshi, Z. Yamagata, T. Mushiroda, Y. Murakami, K. Yuji, Y. Furukawa, H. Zembutsu, T. Tanaka, Y. Ohnishi, Y. Nakamura, G. BioBank Japan Cooperative Hospital, and M. Kubo, “Overview of the BioBank Japan Project: Study design and profile,” J Epidemiol, vol. 27, no. 3S, pp. S2-S8, Mar, 2017.
[29] H. Al Kuwari, A. Al Thani, A. Al Marri, A. Al Kaabi, H. Abderrahim, N. Afifi, F. Qafoud, Q. Chan, I. Tzoulaki, P. Downey, H. Ward, N. Murphy, E. Riboli, and P. Elliott, “The Qatar Biobank: background and methods,” BMC Public Health, vol. 15, pp. 1208, Dec 3, 2015.
[30] D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindorff, and H. Parkinson, “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations,” Nucleic Acids Res, vol. 42, no. Database issue, pp. D1001-6, Jan, 2014.
[31] J. C. Denny, M. D. Ritchie, M. A. Basford, J. M. Pulley, L. Bastarache, K. Brown-Gentry, D. Wang, D. R. Masys, D. M. Roden, and D. C. Crawford, “PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations,” Bioinformatics, vol. 26, no. 9, pp. 1205-10, May 1, 2010.
[32] H. Qin, T. Niu, and J. Zhao, “Identifying Multiomics Causers and Causal Pathways for Complex Traits,” Front Genet, vol. 10, pp. 110, 2019.
[33] Y. Lin, W. Zhang, H. Cao, G. Li, and W. Du, “Classifying Breast Cancer Subtypes Using Deep Neural Networks Based on Multiomics Data,” Genes (Basel), vol. 11, no. 8, Aug 4, 2020.
[34] H. Zhang, T. Liu, Z. Zhang, S. H. Payne, B. Zhang, J. E. McDermott, J. Y. Zhou, V. A. Petyuk, L. Chen, D. Ray, S. Sun, F. Yang, L. Chen, J. Wang, P. Shah, S. W. Cha, P. Aiyetan, S. Woo, Y. Tian, M. A. Gritsenko, T. R. Clauss, C. Choi, M. E. Monroe, S. Thomas, S. Nie, C. Wu, R. J. Moore, K. H. Yu, D. L. Tabb, D. Fenyo, V. Bafna, Y. Wang, H. Rodriguez, E. S. Boja, T. Hiltke, R. C. Rivers, L. Sokoll, H. Zhu, I. M. Shih, L. Cope, A. Pandey, B. Zhang, M. P. Snyder, D. A. Levine, R. D. Smith, D. W. Chan, K. D. Rodland, and C. Investigators, “Integrated Proteogenomic Characterization of Human High-Grade Serous Ovarian Cancer,” Cell, vol. 166, no. 3, pp. 755-765, Jul 28, 2016.
[35] P. K. Mankoo, R. Shen, N. Schultz, D. A. Levine, and C. Sander, “Time to recurrence and survival in serous ovarian tumors predicted from integrated genomic profiles,” PLoS One, vol. 6, no. 11, pp. e24709, 2011.
[36] J. Zhao, Q. Feng, P. Wu, R. A. Lupu, R. A. Wilke, Q. S. Wells, J. C. Denny, and W. Q. Wei, “Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction,” Sci Rep, vol. 9, no. 1, pp. 717, Jan 24, 2019.
[37] B. Zhu, N. Song, R. Shen, A. Arora, M. J. Machiela, L. Song, M. T. Landi, D. Ghosh, N. Chatterjee, V. Baladandayuthapani, and H. Zhao, “Integrating Clinical and Multiple Omics Data for Prognostic Assessment across Human Cancers,” Sci Rep, vol. 7, no. 1, pp. 16954, Dec 5, 2017.
[38] S. Zhang, and X. J. Zhou, “Matrix factorization methods for integrative cancer genomics,” Methods Mol Biol, vol. 1176, pp. 229-42, 2014.
[39] D. Lin, J. Zhang, J. Li, V. D. Calhoun, H. W. Deng, and Y. P. Wang, “Group sparse canonical correlation analysis for genomic data integration,” BMC Bioinformatics, vol. 14, pp. 245, Aug 12, 2013.
[40] C. Qiu, F. Yu, K. Su, Q. Zhao, L. Zhang, C. Xu, W. Hu, Z. Wang, L. Zhao, Q. Tian, Y. Wang, H. Deng, and H. Shen, “Multiomics Data Integration for Identifying Osteoporosis Biomarkers and Their Biological Interaction and Causal Mechanisms,” iScience, vol. 23, no. 2, pp. 100847, Feb 21, 2020.
[41] L. Omberg, G. H. Golub, and O. Alter, “A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies,” Proc Natl Acad Sci U S A, vol. 104, no. 47, pp. 18371-6, Nov 20, 2007.
[42] V. Hore, A. Vinuela, A. Buil, J. Knight, M. I. McCarthy, K. Small, and J. Marchini, “Tensor decomposition for multiple-tissue gene expression experiments,” Nat Genet, vol. 48, no. 9, pp. 1094-100, Sep, 2016.
[43] G. Zhou, S. Li, and J. Xia, “Network-Based Approaches for Multiomics Integration,” Methods Mol Biol, vol. 2104, pp. 469-487, 2020.
[44] S. Huang, K. Chaudhary, and L. X. Garmire, “More Is Better: Recent Progress in Multiomics Data Integration Methods,” Front Genet, vol. 8, pp. 84, 2017.
[45] L. Zhang, C. Lv, Y. Jin, G. Cheng, Y. Fu, D. Yuan, Y. Tao, Y. Guo, X. Ni, and T. Shi, “Deep Learning-Based Multiomics Data Integration Reveals Two Prognostic Subtypes in High-Risk Neuroblastoma,” Front Genet, vol. 9, pp. 477, 2018.
[46] T. Wang, W. Shao, Z. Huang, H. Tang, J. Zhang, Z. Ding, and K. Huang, “MOGONET integrates multiomics data using graph convolutional networks allowing patient classification and biomarker identification,” Nat Commun, vol. 12, no. 1, pp. 3445, Jun 8, 2021.
[47] W. Wang, X. Zhang, and D. Q. Dai, “DeFusion: a denoised network regularization framework for multiomics integration,” Brief Bioinform, vol. 22, no. 5, Sep 2, 2021.
[48] L. Chen, J. Xu, and S. C. Li, “DeepMF: deciphering the latent patterns in omics profiles with a deep learning method,” BMC Bioinformatics, vol. 20, no. Suppl 23, pp. 648, Dec 27, 2019.
[49] Madhumita, and S. Paul, “Capturing the latent space of an Autoencoder for multiomics integration and cancer subtyping,” Comput Biol Med, vol. 148, pp. 105832, Sep, 2022.
[50] E. Withnell, X. Zhang, K. Sun, and Y. Guo, “XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data,” Brief Bioinform, vol. 22, no. 6, Nov 5, 2021.
[51] C. Zhang, Y. Chen, T. Zeng, C. Zhang, and L. Chen, “Deep latent space fusion for adaptive representation of heterogeneous multiomics data,” Brief Bioinform, vol. 23, no. 2, Mar 10, 2022.
[52] N. Borhani, J. Ghaisari, M. Abedi, M. Kamali, and Y. Gheisari, “A deep learning approach to predict inter-omics interactions in multi-layer networks,” BMC Bioinformatics, vol. 23, no. 1, pp. 53, Jan 26, 2022.
[53] A. Lemsara, S. Ouadfel, and H. Frohlich, “PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multiomics data,” BMC Bioinformatics, vol. 21, no. 1, pp. 146, Apr 16, 2020.
[54] X. Zhang, Y. Xing, K. Sun, and Y. Guo, “OmiEmbed: A Unified Multi-Task Deep Learning Framework for Multiomics Data,” Cancers (Basel), vol. 13, no. 12, Jun 18, 2021.
[55] G. Zhang, Z. Peng, C. Yan, J. Wang, J. Luo, and H. Luo, “MultiGATAE: A Novel Cancer Subtype Identification Method Based on Multiomics and Attention Mechanism,” Front Genet, vol. 13, pp. 855629, 2022.
[56] K. T. Ahmed, J. Sun, S. Cheng, J. Yong, and W. Zhang, “Multiomics Data Integration by Generative Adversarial Network,” Bioinformatics, Aug 20, 2021.
[57] Y. G. Xu, Z. G. Zhang, L. You, J. J. Liu, Z. W. Fan, and X. B. Zhou, “scIGANs: single-cell RNA-seq imputation using generative adversarial networks,” Nucleic Acids Research, vol. 48, no. 15, Sep 4, 2020.
[58] H. Yang, R. Chen, D. Li, and Z. Wang, “Subtype-GAN: a deep learning approach for integrative cancer subtyping of multiomics data,” Bioinformatics, Feb 18, 2021.
[59] M. Kim, I. Oh, and J. Ahn, “An Improved Method for Prediction of Cancer Prognosis by Network Learning,” Genes (Basel), vol. 9, no. 10, Oct 2, 2018.
[60] J. Park, H. Kim, J. Kim, and M. Cheon, “A practical application of generative adversarial networks for RNA-seq analysis to predict the molecular progress of Alzheimer's disease,” PLoS Comput Biol, vol. 16, no. 7, pp. e1008099, Jul, 2020.
[61] M. Bahrami, M. Maitra, C. Nagy, G. Turecki, H. R. Rabiee, and Y. Li, “Deep feature extraction of single-cell transcriptomes by generative adversarial network,” Bioinformatics, vol. 37, no. 10, pp. 1345-1351, May 15, 2021.
[62] Z. Huang, X. Zhan, S. Xiang, T. S. Johnson, B. Helm, C. Y. Yu, J. Zhang, P. Salama, M. Rizkalla, Z. Han, and K. Huang, “SALMON: Survival Analysis Learning With Multiomics Neural Networks on Breast Cancer,” Front Genet, vol. 10, pp. 166, 2019.
[63] O. B. Poirion, Z. Jing, K. Chaudhary, S. Huang, and L. X. Garmire, “DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multiomics data,” Genome Med, vol. 13, no. 1, pp. 112, Jul 14, 2021.
[64] P. M. G. Viaud, and P.-H. Cournede, “Representation Learning for the Clustering of Multiomics Data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021.
[65] M. H. M. Hess, and H. Binder, “Exploring generative deep learning for omics data using log-linear models,” Bioinformatics, vol. 36, no. 20, pp. 5045-5053, 2020.
[66] Y. Yan, X. J. Yao, S. H. Wang, and Y. D. Zhang, “A Survey of Computer-Aided Tumor Diagnosis Based on Convolutional Neural Network,” Biology (Basel), vol. 10, no. 11, Oct 22, 2021.
[67] Y. H. Fu, J. Y. Xu, Z. S. Tang, L. Wang, D. Yin, Y. Fan, D. D. Zhang, F. Deng, Y. P. Zhang, H. H. Zhang, H. Y. Wang, W. H. Xing, L. L. Yin, S. L. Zhu, M. J. Zhu, M. Yu, X. Y. Li, X. L. Liu, X. H. Yuan, and S. H. Zhao, “A gene prioritization method based on a swine multiomics knowledgebase and a deep learning model,” Communications Biology, vol. 3, no. 1, Sep 10, 2020.
[68] B. ElKarami, A. Alkhateeb, H. Qattous, L. Alshomali, and B. Shahrrava, “Multiomics Data Integration Model Based on UMAP Embedding and Convolutional Neural Network,” Cancer Informatics, vol. 21, Sep, 2022.
[69] M. M. Islam, S. J. Huang, R. Ajwad, C. Chi, Y. Wang, and P. Z. Hu, “An integrative deep learning framework for classifying molecular subtypes of breast cancer,” Computational and Structural Biotechnology Journal, vol. 18, pp. 2185-2199, 2020.
[70] H. Yu, D. C. Samuels, Y. Y. Zhao, and Y. Guo, “Architectures and accuracy of artificial neural network for disease classification from omics data,” BMC Genomics, vol. 20, no. 1, pp. 167, Mar 4, 2019.
[71] W. X. Shen, Y. Liu, Y. Chen, X. Zeng, Y. Tan, Y. Y. Jiang, and Y. Z. Chen, “AggMapNet: enhanced and explainable low-sample omics deep learning with feature-aggregated multi-channel networks,” Nucleic Acids Res, Jan 31, 2022.
[72] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an overview and application in radiology,” Insights into Imaging, vol. 9, no. 4, pp. 611-629, Aug, 2018.
[73] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic Routing Between Capsules,” Advances in Neural Information Processing Systems 30 (NIPS 2017), vol. 30, 2017.
[74] P. Afshar, A. Oikonomou, F. Naderkhani, P. N. Tyrrell, K. N. Plataniotis, K. Farahani, and A. Mohammadi, “3D-MCN: A 3D Multi-scale Capsule Network for Lung Nodule Malignancy Prediction,” Sci Rep, vol. 10, no. 1, pp. 7948, May 14, 2020.
[75] C. Peng, Y. Zheng, and D. S. Huang, “Capsule Network Based Modeling of Multiomics Data for Discovery of Breast Cancer-Related Genes,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 17, no. 5, pp. 1605-1612, Sept 1, 2020.
[76] a. O. W. S. K. Sønderby, “Protein secondary structure prediction with long short term memory networks,” arXiv preprint, arXiv:1412.7828.
[77] S. Agrawal, D. S. Sisodia, and N. K. Nagwani, “Long short term memory based functional characterization model for unknown protein sequences using ensemble of shallow and deep features,” Neural Computing & Applications, vol. 34, no. 6, pp. 4831-4841, Mar, 2022.
[78] a. L. S. J. Thomas, “Deep neural network based precursor microRNA prediction on eleven species,” arXiv preprint, 2017.
[79] B. Lee, J. Baek, S. Park, and S. Yoon, “deepTarget: End-to-end Learning Framework for microRNA Target Prediction using Deep Recurrent Neural Networks,” Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 434-442, 2016.
[80] S. K. Sønderby, C. K. Sønderby, H. Nielsen, and O. Winther, “Convolutional LSTM Networks for Subcellular Localization of Proteins,” Algorithms for Computational Biology. pp. 68-80.
[81] Z. Rong, D. Lingyun, L. Jinxing, and G. Ying, “Diagnostic Classification of Lung Cancer Using Deep Transfer Learning Technology and Multi-Omics Data,” Chinese Journal of Electronics, vol. 30, no. 5, pp. 843-852, 2021.
[82] N. Zhang, H. Wang, C. Xu, L. Zhang, and T. Zang, “DeepGP: An Integrated Deep Learning Method for Endocrine Disease Gene Prediction Using Omics Data,” Front Cell Dev Biol, vol. 9, pp. 700061, 2021.
[83] E. Huang, S. Kim, and T. Ahn, “Deep Learning for Integrated Analysis of Insulin Resistance with Multiomics Data,” J Pers Med, vol. 11, no. 2, Feb 15, 2021.
[84] P. Dutta, A. P. Patra, and S. Saha, “DeePROG: Deep Attention-based Model for Diseased Gene Prognosis by Fusing Multiomics Data,” IEEE/ACM Trans Comput Biol Bioinform, vol. PP, Jun 24, 2021.
[85] A. E. A. Azaria, T. Vieira and A. Lippman, “MedRec: Using Blockchain for Medical Data Access and Permission Management,” 2016 2nd International Conference on Open and Big Data (OBD), pp. 25-30, 2016.
[86] L. O.-M. Tsung-Ting Kuo, “ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks.”
[87] S.-I. L. Scott M. Lundberg, “A unified approach to interpreting model predictions,” NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4768-4777, 2017.
[88] M. P. Kadija Ferryman, Fairness in precision medicine, 2018.
[89] A. Rajkomar, M. Hardt, M. D. Howell, G. Corrado, and M. H. Chin, “Ensuring Fairness in Machine Learning to Advance Health Equity,” Ann Intern Med, vol. 169, no. 12, pp. 866-872, Dec 18, 2018.