A Practical Guide To Amplicon and Metagenomic Analysis of Microbiome Data
REVIEW
A practical guide to amplicon
and metagenomic analysis of microbiome data
Yong-Xin Liu1,2,3&, Yuan Qin1,2,3,4, Tong Chen5, Meiping Lu6, Xubo Qian6, Xiaoxuan Guo1,2,3, Yang Bai1,2,3,4&
1 State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
2 CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing 100049, China
3
and standards have been evolving rapidly over the past few years (Knight et al., 2018). For example, there was a proposal to replace operational taxonomic units (OTUs) with amplicon sequence variants (ASVs) in marker gene-based amplicon data analysis (Callahan et al., 2016). The next-generation microbiome analysis pipeline QIIME 2, a reproducible, interactive, efficient, community-supported platform, was recently published (Bolyen et al., 2019). In addition, new methods have recently been proposed for taxonomic classification (Ye et al., 2019), machine learning (Galkin et al., 2018), and multi-omics integrated analysis (Pedersen et al., 2018).

The development of HTS and analysis methods has provided new insights into the structures and functions of the microbiome (Jiang et al., 2019; Ning and Tong, 2019). However, these new developments have made it challenging for researchers, especially those without a bioinformatics background, to choose suitable software and pipelines. In this review, we discuss the widely used software packages for microbiome analyses, summarize their advantages and limitations, and provide sample codes and suggestions for selecting and using these tools.

Protein & Cell

HTS METHODS OF MICROBIOME ANALYSIS

The first step in microbiome research is to understand the advantages and limitations of specific HTS methods. These methods are primarily used for three types of analysis: microbe-, DNA-, and mRNA-level analyses (Fig. 1A). The appropriate method(s) should be selected based on sample types and research goals.

Culturome is a high-throughput method for culturing and identifying microbes at the microbe-level (Fig. 1A). The microbial isolates are obtained as follows. First, the samples are crushed, empirically diluted in liquid medium, and distributed in 96-well microtiter plates or Petri dishes. Second, the plates are cultured for 20 days at room temperature. Third, the microbes in each well are subjected to amplicon sequencing, and wells with pure, non-redundant colonies are selected as candidates. Fourth, the candidates are purified and subjected to 16S rDNA full-length Sanger sequencing. Finally, the newly characterized pure isolates are preserved (Zhang et al., 2019). Culturome is the most effective method for obtaining bacterial stocks, but it is expensive and labor intensive (Fig. 1B). This method has been used for microbiome analysis in humans (Goodman et al., 2011; Zou et al., 2019), mouse (Liu et al., 2020), marine sediment (Mu et al., 2018), Arabidopsis thaliana (Bai et al., 2015), and rice (Zhang et al., 2019). These studies not only expanded the catalog of taxonomic and functional databases for metagenomic analyses, but also provided bacterial stocks for experimental verification. For further information, please see Lagier et al. (2018) and Liu et al. (2019a).

DNA is easy to extract, preserve, and sequence, which has allowed researchers to develop various HTS methods (Fig. 1A). The HTS methods commonly used for microbiome analysis are amplicon
[Figure 1, panels A and B. Recoverable excerpt of panel B: level, Microbes; method, Culturome; advantages: high-throughput, targeted selection, provides microbial isolates; limitations: expensive, laborious, influenced by media and the environment.]
Figure 1. Advantages and limitations of HTS methods used in microbiome research. A Introduction to HTS methods for
different levels of analysis. At the molecule-level, microbiome studies are divided into three types: microbe, DNA, and mRNA. The
corresponding research techniques include culturome, amplicon, metagenome, metavirome, and metatranscriptome analyses. B The
advantages and limitations of various HTS methods for microbiome analysis.
and metagenomic sequencing (Fig. 1B). Amplicon sequencing, the most widely used HTS method for microbiome analysis, can be applied to almost all sample types. The major marker genes used in amplicon sequencing include 16S ribosomal DNA (rDNA) for prokaryotes and 18S rDNA and internal transcribed spacers (ITS) for eukaryotes. 16S rDNA amplicon sequencing is the most commonly used method, but there is currently a confusing array of available primers. A good method for selecting primers is to evaluate their specificity and overall coverage using real samples or electronic PCR based on the SILVA database (Klindworth et al., 2012) and on host factors including the presence of chloroplasts, mitochondria, ribosomes, and other potential sources of non-specific amplification. Alternatively, researchers can refer to the primers used in published studies similar to their own, which would save time in method optimization and facilitate the comparison of results among studies. Two-step PCR is typically used for amplification and to add barcodes and adaptors to each

sample, virus enrichment (Metsky et al., 2019) or the removal of host DNA (Charalampous et al., 2019) is an essential step for obtaining sufficient quantities of viral DNA or RNA for analysis (Fig. 1B).

The selection of sequencing methods depends on the scientific questions and sample types. The integration of different methods is advisable, as multi-omics provides insights into both the taxonomy and function of the microbiome. In practice, most researchers select only one or two HTS methods for analysis due to time and cost limitations. Although amplicon sequencing can provide only the taxonomic composition of microbiota, it is cost effective ($20–50 per sample) and can be applied to large-scale research. In addition, the amount of data generated from amplicon sequencing is relatively small, and the analysis is quick and easy to perform. For example, data analysis of 100 amplicon samples could be completed within a day using an ordinary
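
The electronic PCR check mentioned above, in which a primer pair is evaluated by its coverage of reference sequences, can be sketched in Python. This is only a minimal illustration under simplifying assumptions: primers must match exactly (degenerate IUPAC bases are expanded, but no mismatches are tolerated), the reverse primer is given 5' to 3' on the reverse strand, and the primers and sequences are toy values rather than real primers or SILVA records. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers degenerate bases.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def primer_coverage(sequences, forward, reverse):
    """Fraction of sequences containing the forward primer followed by
    the reverse complement of the reverse primer (exact matches only)."""
    if not sequences:
        return 0.0
    pattern = re.compile(
        primer_regex(forward) + ".*?" + primer_regex(reverse_complement(reverse))
    )
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)


# Toy reference sequences and primers (hypothetical values).
references = [
    "AAGTGCCAGCTTTTGGATTAGAAA",  # matches (Y = C)
    "AAGTGACAGCTTTTGGATTAGAAA",  # forward primer site mutated
    "AAGTGTCAGCTTTTGGATTAGAAA",  # matches (Y = T)
]
print(primer_coverage(references, "GTGYCAGC", "CTAATCC"))
```

A real evaluation would also allow a small number of mismatches and report coverage per taxonomic group, which dedicated tools handle more thoroughly.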
[Figure 2 panel labels. A, Amplicon: (fastq); Merged clean amplicons (fastq/fasta); Clustering (USEARCH); Denoising (DADA2/Deblur); Quantifying (QIIME2/USEARCH); feature table (OTU/ASV/genus…); Functional prediction (PICRUSt/Tax4Fun). B, Metagenome: Clean reads; Reads-based (MetaPhlAn2/Kraken2); (MEGAHIT/metaSPAdes); Contigs; Quantifying; (Gene/KO/pathway…).]
Figure 2. Workflow of commonly used methods for amplicon (A) and metagenomic (B) sequencing. Blue, orange, and green
blocks represent input, intermediate, and output files, respectively. The text next to the arrow represents the method, with frequently
used software shown in parentheses. Taxonomic and functional tables are collectively referred to as feature tables. Please see
Table 1 for more information about the software listed in this figure.
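
The "Quantifying" step in Figure 2A, which turns per-sample sequences into a feature table, can be illustrated with a short Python sketch. It assumes that each read has already been assigned to a feature (an OTU or ASV identifier) by an upstream tool; the function name and the toy data are hypothetical and are not part of any of the packages named in the figure.

```python
from collections import Counter


def build_feature_table(sample_reads):
    """Count how often each feature (OTU/ASV ID) occurs in each sample.

    sample_reads maps a sample ID to the list of feature IDs assigned to
    its reads. Returns (feature IDs, sample IDs, table), where
    table[feature][sample] is the read count.
    """
    counts = {sample: Counter(ids) for sample, ids in sample_reads.items()}
    samples = sorted(counts)
    features = sorted({f for c in counts.values() for f in c})
    # A Counter returns 0 for features that are absent from a sample.
    table = {f: {s: counts[s][f] for s in samples} for f in features}
    return features, samples, table


# Toy input (hypothetical data): feature assignments of reads per sample.
toy_reads = {
    "S1": ["ASV_1", "ASV_1", "ASV_2"],
    "S2": ["ASV_2", "ASV_3", "ASV_3", "ASV_3"],
}
features, samples, table = build_feature_table(toy_reads)
print("feature", *samples, sep="\t")
for feature in features:
    print(feature, *(table[feature][s] for s in samples), sep="\t")
```

The resulting rows (features) by columns (samples) layout is the same shape as the taxonomic and functional tables that the caption refers to as feature tables.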
2010) or QIIME (Caporaso et al., 2010). Alternatively, clean amplicon data supplied by sequencing service providers can be used for subsequent analysis (Fig. 2A).

Picking the representative sequences as proxies of a species is a key step in amplicon analysis. Two major approaches for representative sequence selection are clustering to OTUs and denoising to ASVs. The UPARSE algorithm clusters sequences with 97% similarity into OTUs (Edgar, 2013). However, this method may fail to detect subtle differences among species or strains. DADA2 is a recently developed denoising algorithm that outputs ASVs as more exactly representative sequences (Callahan et al., 2016). Denoising is available through denoise-paired/single (DADA2) and denoise-16S (Deblur) in QIIME 2 (Bolyen et al., 2019), and through -unoise3 in USEARCH (Edgar and Flyvbjerg, 2015). Finally, a feature table (OTU/ASV table) can be obtained by quantifying the frequency of the feature sequences in each sample. Simultaneously, the feature sequences can be assigned taxonomy, typically at the kingdom, phylum, class, order, family, genus, and species levels, providing a dimensionality reduction perspective on the microbiota.

In general, 16S rDNA amplicon sequencing can only be used to obtain information about taxonomic composition. However, many available software packages have been developed to predict potential functional information. The principle behind this prediction is to link the 16S rDNA sequences or taxonomy information with functional descriptions in the literature. PICRUSt (Langille et al., 2013), which is based on the OTU table of the Greengenes database (McDonald et al., 2011), could be used to predict the metagenomic functional composition (Zheng et al., 2019) of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Kanehisa and Goto, 2000). The newly developed PICRUSt2 software package (https://github.com/picrust/picrust2) can directly predict metagenomic functions based on an arbitrary OTU/ASV table. The R package Tax4Fun (Asshauer et al., 2015) can predict KEGG functional capabilities of microbiota based on the SILVA database (Quast et al., 2013). The functional annotation of prokaryotic taxa (FAPROTAX) pipeline performs functional annotation based on published metabolic and ecological functions such as nitrate respiration, iron respiration, plant pathogen, and animal parasites or symbionts, making it useful for environmental (Louca et al., 2016), agricultural (Zhang et al., 2019), and animal (Ross et al., 2018) microbiome research. BugBase is an extended database of Greengenes used to predict phenotypes such as oxygen tolerance, Gram staining,
Table 1 continued
metaWRAP (https://github.com/bxlab/metaWRAP): Binning pipeline includes 140 tools and supports conda install, default binning by MetaBAT, MaxBin, and CONCOCT. Provides refinement, quantification, taxonomic classification and visualization of bins (Uritskiy et al., 2018)
DAS Tool (https://github.com/cmks/DAS_Tool): Binning pipeline that integrates five binning software packages and performs refinement (Sieber et al., 2018)
and pathogenic potential (Ward et al., 2017); this database is mainly used in medical research (Mahnert et al., 2019).

Metagenomic analysis

Compared to amplicon sequencing, shotgun metagenomic sequencing can provide functional gene profiles directly and reach a much higher resolution of taxonomic annotation. However, metagenomic analysis is more demanding because of the large amount of data, the fact that most software is only available for Linux systems, and the large amount of computing resources needed. To facilitate software installation and maintenance, we recommend using the package manager Conda with the BioConda channel (Grüning et al., 2018) to deploy metagenomic analysis pipelines. Since metagenomic analysis is computationally intensive, it is better to run multiple tasks/samples in parallel, which requires software such as GNU Parallel for queue management (Tange, 2018).

The Illumina HiSeqX/NovaSeq system often produces PE150 reads for metagenomic sequencing, whereas reads generated by BGI-Seq500 are in PE100 mode. The first crucial step in metagenomic analysis is quality control and the removal of host contamination from raw reads, which requires the KneadData pipeline (https://bitbucket.org/biobakery/kneaddata) or a combination of Trimmomatic (Bolger et al., 2014) and Bowtie 2 (Langmead and Salzberg, 2012). Trimmomatic is a flexible quality-control software package for Illumina sequencing data that can be used to trim low-quality sequences, library primers and adapters. Reads mapped to host genomes using Bowtie 2 are treated as contaminated reads and filtered out. KneadData is an integrated pipeline, including Trimmomatic, Bowtie 2, and related scripts, that can be used for quality control, to remove host-derived reads, and to output clean reads (Fig. 2B).

The main step in metagenomic analysis is to convert clean data into taxonomic and functional tables using reads-based and/or assembly-based methods. The reads-based methods align clean reads to curated databases and output feature tables (Fig. 2B). MetaPhlAn2 is a commonly used taxonomic profiling tool that aligns metagenome reads to a pre-defined marker-gene database to perform taxonomic classification (Truong et al., 2015). Kraken 2 performs exact k-mer matching to sequences within the NCBI non-redundant database and uses lowest common ancestor (LCA) algorithms to perform taxonomic classification (Wood et al., 2019). For a review about benchmarking 20 tools of taxonomic classification, please see Ye et al. (2019). HUMAnN2 (Franzosa et al., 2018), the widely used functional profiling software, can also be used to explore within- and between-sample contributional diversity (species’ contributions to a specific function). MEGAN (Huson et al., 2016) is a cross-platform graphical user interface (GUI) software that performs taxonomic and functional analyses (Table 1). In addition, various metagenomic gene catalogs are available, including catalogs curated from the human gut (Li et al., 2014; Pasolli et al., 2019; Tierney et al., 2019), the mouse gut (Xiao et al., 2015), the chicken gut (Huang et al., 2018), the cow rumen (Stewart et al., 2018; Stewart et al., 2019), the ocean (Salazar et al., 2019), and the citrus rhizosphere (Xu et al., 2018). These customized databases can be used for taxonomic and functional annotation in the appropriate field of study, allowing efficient, precise, rapid analysis.

Assembly-based methods assemble clean reads into contigs using tools such as MEGAHIT or metaSPAdes (Fig. 2B). MEGAHIT is used to assemble large, complex metagenome datasets quickly using little computer memory (Li et al., 2015), while metaSPAdes can generate longer contigs but requires more computational resources (Nurk et al., 2017). Genes present in assembled contigs are then identified using metaGeneMark (Zhu et al., 2010) or Prokka (Seemann, 2014). Redundant genes from separately assembled contigs must be removed using tools such as CD-HIT (Fu et al., 2012). Finally, a gene abundance table can be generated using alignment-based tools such as Bowtie 2 or alignment-free methods such as Salmon (Patro et al., 2017). Millions of genes are normally present in a metagenomic dataset. These genes must be combined into functional annotations, such as KEGG Orthology (KO), modules and pathways, representing a form of dimensional reduction (Kanehisa et al., 2016).

In addition, metagenomic data can be used to mine gene clusters or to assemble draft microbe genomes. The antiSMASH database is used to identify, annotate, and visualize gene clusters involved in secondary metabolite biosynthesis (Blin et al., 2018). Binning is a method that can be used to recover partial or complete bacterial genomes in metagenomic data. Available binning tools include CONCOCT (Alneberg et al., 2014), MaxBin 2 (Wu et al., 2015), and
[Figure 3 panel labels A to H. Recoverable axis and table labels: Relative abundance; Index; PCo 2; -Log10(P-value); Taxonomic table and Functional table (Sample ID S1 to S4; OTU_1 … OTU_5; KO_01 … KO_05); Taxonomy; Feature abundance; Sample metadata; tree tips labeled by OTU and grouped into Class A and Class B.]
Figure 3. Overview of statistical and visualization methods for feature tables. Downstream analysis of microbiome feature
tables, including alpha/beta-diversity (A/B), taxonomic composition (C), difference comparison (D), correlation analysis (E), network
analysis (F), classification of machine learning (G), and phylogenetic tree (H). Please see Table 2 for more details.
MetaBAT2 (Kang et al., 2015). Binning tools cluster contigs into different bins (draft genomes) based on tetra-nucleotide frequency and contig abundance. Reassembly is performed to obtain better bins. We recommend using a binning pipeline such as metaWRAP (Uritskiy et al., 2018) or DAS Tool (Sieber et al., 2018), which integrate several binning software packages to obtain refined binning results and more complete genomes with less contamination. These pipelines also supply useful scripts for evaluation and visualization. For a more comprehensive review on metagenomic experiments and analysis, we recommend Quince et al. (2017).

STATISTICAL ANALYSIS AND VISUALIZATION

The most important output files from amplicon and metagenomic analysis pipelines are taxonomic and functional tables (Figs. 2 and 3). The scientific questions that researchers could answer using the techniques include the following: Which microbes are present in the microbiota? Do different experimental groups show significant differences in alpha- and beta-diversity? Which species, genes, or functional pathways are biomarkers of each group? To answer these questions, methods are needed for both overall and detailed statistical analysis and visualization. Overall visualization
Table 2 continued

Taxonomic composition: Relative abundance of features
  Stacked bar plot: Taxonomic composition of each sample (Beckers et al., 2017) or group (Jin et al., 2017) (Fig. 3C)
  Flow or alluvial diagram: Relative abundance (RA) of taxonomic changes among seasons (Smits et al., 2017) or time-series (Zhang et al., 2018b)
  Sankey diagram: A variety of Venn diagrams showing changes in RA and common or unique features among groups (Smits et al., 2017)

Difference comparison: Significantly different biomarkers between groups
  Volcano plot: A variety of scatter plots showing P-value, RA, fold change, and number of differences (Shi et al., 2019a)
  Manhattan plot: A variety of scatter plots showing P-values, taxonomy, and highlighting significantly different biomarkers (Zgadzaj et al., 2016) (Fig. 3D)
  Extended bar plot: Bar plot of RA combined with difference and confidence intervals (Parks et al., 2014)

Correlation analysis: Correlation between features and sample metadata
  Scatter plot with linear fitting: Shows changes in features with time (Metcalf et al., 2016) or relationships with other numeric metadata (Fig. 3E)
  Corrplot: Correlation coefficient or distance triangular matrix visualized by color and/or shape (Zhang et al., 2018b)
  Heatmap: RA of features that change with time (Subramanian et al., 2014)

Network analysis: Global view correlation of features
  Colored based on taxonomy or modules: Finding correlation patterns of features based on taxonomy (Fig. 3F) and/or modules (Jiao et al., 2016)
  Colors highlight important features: Highlighting important features and showing their positions and connections (Wang et al., 2018b)

Machine learning: Classification groups or regression analysis for numeric metadata prediction
  Heatmap: Colored block showing classification results (Fig. 3G) (Wilck et al., 2017) or feature patterns in a time series (Subramanian et al., 2014)
  Bar plot: Feature importance, RA (Zhang et al., 2019), and increase in mean squared error (Subramanian et al., 2014)

Treemap: Phylogenetic tree or taxonomy hierarchy
  Phylogenetic tree or cladogram: Phylogenetic tree (Fig. 3H) shows relationship of OTUs or species (Levy et al., 2018). Taxonomic cladogram highlighting interesting biomarkers (Segata et al., 2011)
can be used to explore differences in alpha/beta-diversity and taxonomic composition in a feature table. Detailed analysis could involve identifying biomarkers via comparison, correlation analysis, network analysis, and machine learning (Fig. 3). We will discuss these methods below and provide examples and references to facilitate such studies (Fig. 3 and Table 2).

Alpha diversity evaluates the diversity within a sample, including richness and evenness measurements. Several software packages can be used to calculate alpha diversity,

permutational multivariate analysis of variance (PERMANOVA) with the adonis() function in vegan (Oksanen et al., 2007).

Taxonomic composition describes the microbiota that are present in a microbial community, which is often visualized using a stacked bar plot (Fig. 3C and Table 2). For simplicity, the microbiota is often shown at the phylum or genus level in the plot.

Difference comparison is used to identify features (such as species, genes, or pathways) with significantly different
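
The richness and evenness measurements that alpha diversity is described as covering above can be made concrete with a small Python sketch. It is an illustration rather than the method of any particular package: the Shannon index and Pielou's evenness are common choices that we assume here, the function name is hypothetical, and the counts are toy values.

```python
import math


def alpha_diversity(counts):
    """Return (richness, Shannon index, Pielou evenness) for one sample.

    counts holds the read counts of each feature in the sample.
    Richness is the number of features observed at least once, the
    Shannon index uses the natural logarithm, and evenness is the
    Shannon index divided by log(richness).
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    richness = len(present)
    if total == 0:
        return 0, 0.0, 0.0
    shannon = -sum((c / total) * math.log(c / total) for c in present)
    # Evenness is undefined for a single feature; 0.0 is used by convention here.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness


# Toy samples (hypothetical counts): one even community, one dominated by a single feature.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(alpha_diversity(even_sample))
print(alpha_diversity(skewed_sample))
```

Both toy samples have the same richness, but the skewed sample has a lower Shannon index and evenness, which is why richness and evenness are usually reported together.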
clinical indices, or to identify key environmental factors that affect microbiota and dynamic taxa in a time series (Edwards et al., 2018).

Network analysis explores the co-occurrence of features from a holistic perspective (Fig. 3F). The properties of a correlation network might represent potential interactions between co-occurring taxa or functional pathways. Correlation coefficients and significant P-values could be computed using the cor.test() function in R or more robust tools that are suitable for compositional data such as the SparCC (sparse correlations for compositional data) package (Kurtz et al., 2015). Networks could also be visualized and analyzed using R library igraph (Csardi and Nepusz, 2006), Cytoscape (Saito et al., 2012), or Gephi (Bastian et al., 2009). There are several good examples of network analysis, such as studies exploring the distribution of phyla or modules (Fan et al., 2019) or showing trends at different time points (Wang et al., 2019).

Machine learning is a branch of artificial intelligence that learns from data, identifies patterns, and makes decisions (Fig. 3G). In microbiome research, machine learning is used for taxonomic classification, beta-diversity analysis, binning, and compositional analysis of particular features. Commonly used machine learning methods include random forest (Vangay et al., 2019; Qian et al., 2020), Adaboost (Wilck et al., 2017), and deep learning (Galkin et al., 2018), which are used to classify groups by selecting biomarkers or to perform regression analysis showing experimental condition-dependent changes in biomarker abundance (Table 2).

Treemap is widely used for phylogenetic tree construction and for taxonomic annotation and visualization of the microbiome (Fig. 3H). Representative amplicon sequences are readily used for phylogenetic analysis. We recommend using IQ-TREE (Nguyen et al., 2014) to quickly build high-confidence phylogenetic trees using big data and online visualization using iTOL (Letunic and Bork, 2019). Annotation files for trees can easily be generated using the R script table2itol (https://github.com/mgoeker/table2itol). In addition, we recommend using GraPhlAn (Asnicar et al., 2015) to visualize the phylogenetic tree or hierarchical taxonomy in an attractive cladogram.

In addition, researchers may be interested in examining microbial origin to address issues such as the origin of gut microbiota and river pollution, as well as for forensic testing. FEAST (Shenhav et al., 2019) and SourceTracker (Knights et al., 2011) were designed to unravel the origins of microbial communities. If researchers would like to focus on the regulatory relationship between genetic information from the host and microorganisms (Wang et al., 2018a), genome-wide association analysis (GWAS) might be a good choice (Wang et al., 2016).

REPRODUCIBLE ANALYSIS

Reproducible analysis requires that researchers submit their data and code along with their publications instead of merely describing their methods. Reproducibility is critical for microbiome analysis because it is impossible to reproduce results without raw data, detailed sample metadata, and analysis codes. If the readers can run the codes, they will better understand what has been done in the analyses. We recommend that researchers share their sequencing data, metadata, analysis codes, and detailed statistical reports using the following steps:

Upload and share raw data and metadata in a data center

Amplicon or metagenomic sequencing generates a large volume of raw data. Normally, raw data must be uploaded to data centers such as NCBI, EBI, and DDBJ during publication. In recent years, several repositories have also been established in China to provide data storage and sharing services. For example, the Genome Sequence Archive (GSA) established by the Beijing Institute of Genomics, Chinese Academy of Sciences (Wang et al., 2017; Members, 2019) has many advantages (Table 3). We recommend that researchers upload raw data to one of these repositories, which not only provides backup but also meets the requirements for publication. Several journals such as Microbiome require raw data to be deposited in repositories before the manuscript is submitted.

Share pipeline scripts with other researchers

Pipeline scripts could help reviewers or readers evaluate the reproducibility of experimental results. We provide sample pipeline scripts for amplicon and metagenome analyses at https://github.com/YongxinLiu/Liu2020ProteinCell. The running environment and software versions used in analysis should also be provided to help ensure reproducibility. If Conda is used to deploy software, the command “conda env export -n environment_name > environment.yaml” can generate a file containing both the software used and the corresponding versions for reproducible usage. For users who are not familiar with command lines, webservers such as Qiita (Gonzalez et al., 2018), MGnify (Mitchell et al., 2020), and gcMeta (Shi et al., 2019b) could be used to perform analysis. However, webservers are less flexible than the command line mode because they provide fewer adjustable steps and parameters.

Provide detailed statistical and visualization reports

The tools used for statistical analysis and visualization of a feature table include Excel, GraphPad, and SigmaPlot, but these are commercial software tools that make it difficult to quickly reproduce results. We recommend using tools such as R Markdown or Python Notebooks to trace all analysis codes and parameters and storing them in a version control management system such as GitHub (Table 3). These tools are free, open-source, cross-platform, and easy-to-use. We recommend that researchers record all scripts and results of statistical analysis and visualization in R markdown files. An R markdown document is a fully reproducible report that includes codes, tables, and figures in HTML/PDF format. This work mode would greatly improve the efficiency of microbiome analysis and make the analysis process transparent and easier to understand. For R visualization codes, researchers can refer to the R Graph Gallery (Table 3). The input files (feature tables + metadata), analysis notebook (*.Rmd), and output results (figures, tables, and HTML reports) of the analysis can be uploaded to GitHub, which would allow peers to repeat your analyses or reuse your analysis codes. ImageGP (http://www.ehbio.com/ImageGP) provides more than 20 statistical and visualization methods, making it a good choice for researchers without a background in R.

NOTES AND PERSPECTIVES

studies are needed to dissect the causality of microbiome and host phenotypes.

Shotgun metagenomic sequencing could provide insights into microbial community structure at the strain level, but it is difficult to recover high-quality genomes (Bishara et al., 2018). Single-cell genome sequencing shows very promising applications in microbiome research (Xu and Zhao, 2018). Based on flow cytometry and single-cell sequencing, MetaSort could recover high-quality genomes from sorted sub-metagenome (Ji et al., 2017). Recently developed third-generation sequencing techniques have been used for metagenome analysis, including Pacific Biosciences (PacBio) single molecule real time sequencing and the Oxford Nanopore Technologies sequencing platform (Bertrand et al., 2019; Stewart et al., 2019; Moss et al., 2020). With the improvement in sequencing data quality and decreasing costs, these techniques will lead to a technological revolution
Biosciences; PERMANOVA, permutational multivariate analysis of Beckers B, Op De Beeck M, Weyens N, Boerjan W, Vangronsveld J
variance; PE250, paired-end 250 bp; PCoA, principal coordinate (2017) Structural variability and niche differentiation in the
analysis; RA, relative abundance; rDNA, ribosome DNA. rhizosphere and endosphere bacterial microbiome of field-grown
poplar trees. Microbiome 5:25
Bertrand D, Shaw J, Kalathiyappan M, Ng AHQ, Kumar MS, Li C,
COMPLIANCE WITH ETHICS GUIDELINES Dvornicic M, Soldo JP, Koh JY, Tong C et al (2019) Hybrid
metagenomic assembly enables high-resolution analysis of
Yong-Xin Liu, Xubo Qian and Yang Bai contributed to write the
resistance determinants and mobile elements in human micro-
paper. Yuan Qin designed and draw the figures. Tong Chen tested
biomes. Nat Biotechnol 37:937–944
all the software mentioned in this review and share the codes. All
Bishara A, Moss EL, Kolmogorov M, Parada AE, Weng Z, Sidow A,
authors read, revise and approved this paper. Yong-Xin Liu, Yuan
Dekas AE, Batzoglou S, Bhatt AS (2018) High-quality genome
Qin, Tong Chen, Xubo Qian, Meiping Lu, Xiaoxuan Guo and Yang
sequences of uncultured microbes by assembly of read clouds.
Bai declare that they have no conflict of interest. This article does not
Nat Biotechnol 36:1067–1075
contain any studies with human or animal subjects performed by the
Blin K, Weber T, Lee SY, Medema MH, Pascal Andreu V, de los
any of the authors.
Santos ELC, Del Carratore F (2018) The antiSMASH database
version 2: a comprehensive resource on secondary metabolite
biosynthetic gene clusters. Nucleic Acids Res 47:D625–D630
Protein & Cell
OPEN ACCESS
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
REFERENCES
Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C (2014) Binning metagenomic contigs by coverage and composition. Nat Methods 11:1144–1146
Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, Fernandes GR, Tap J, Bruls T, Batto JM et al (2011) Enterotypes of the human gut microbiome. Nature 473:174–180
Asnicar F, Weingart G, Tickle TL, Huttenhower C, Segata N (2015) Compact graphical representation of phylogenetic data and metadata with GraPhlAn. PeerJ 3:e1029
Asshauer KP, Wemheuer B, Daniel R, Meinicke P (2015) Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31:2882–2884
Bai Y, Müller DB, Srinivas G, Garrido-Oter R, Potthoff E, Rott M, Dombrowski N, Münch PC, Spaepen S, Remus-Emsermann M et al (2015) Functional overlap of the Arabidopsis leaf and root microbiota. Nature 528:364–369
Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: Third international AAAI conference on weblogs and social media
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F et al (2019) Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 37:852–857
Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA et al (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35:725–731
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016) DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods 13:581–583
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI et al (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335–336
Carini P, Marsden PJ, Leff JW, Morgan EE, Strickland MS, Fierer N (2016) Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nat Microbiol 2:16242
Carrión VJ, Perez-Jaramillo J, Cordovez V, Tracanna V, de Hollander M, Ruiz-Buck D, Mendes LW, van Ijcken WFJ, Gomez-Exposito R, Elsayed SS et al (2019) Pathogen-induced activation of disease-suppressive functions in the endophytic root microbiome. Science 366:606–612
Charalampous T, Kay GL, Richardson H, Aydin A, Baldan R, Jeanes C, Rae D, Grundy S, Turner DJ, Wain J et al (2019) Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection. Nat Biotechnol 37:783–792
Chen Q, Jiang T, Liu Y-X, Liu H, Zhao T, Liu Z, Gan X, Hallab A, Wang X, He J et al (2019) Recently duplicated sesterterpene (C25) gene clusters in Arabidopsis thaliana modulate root microbiota. Sci China Life Sci 62:947–958
Costea PI, Zeller G, Sunagawa S, Pelletier E, Alberti A, Levenez F, Tramontano M, Driessen M, Hercog R, Jung F-E et al (2017) Towards standards for human fecal sample processing in metagenomic studies. Nat Biotechnol 35:1069–1076
Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJ Complex Syst 1695:1–9
de Goffau MC, Lager S, Sovio U, Gaccioli F, Cook E, Peacock SJ, Parkhill J, Charnock-Jones DS, Smith GCS (2019) Human placenta has no microbiome but can contain potential pathogens. Nature 572:329–334
de Muinck EJ, Trosvik P, Gilfillan GD, Hov JR, Sundaram AYM (2017) A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform. Microbiome 5:68
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461
Edgar RC (2013) UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat Methods 10:996–998
Edgar RC, Flyvbjerg H (2015) Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31:3476–3482
Edwards J, Johnson C, Santos-Medellín C, Lurie E, Podishetty NK, Bhatnagar S, Eisen JA, Sundaresan V (2015) Structure, variation, and assembly of the root-associated microbiomes of rice.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J, The Bioconda Team (2018) Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15:475–476
Guo X, Zhang X, Qin Y, Liu Y-X, Zhang J, Zhang N, Wu K, Qu B, He Z, Wang X et al (2020) Host-associated quantitative abundance profiling reveals the microbial load variation of root microbiome. Plant Commun 1:100003
Huang AC, Jiang T, Liu Y-X, Bai Y-C, Reed J, Qu B, Goossens A, Nützmann H-W, Bai Y, Osbourn A (2019) A specialized metabolic network selectively modulates Arabidopsis root microbiota. Science 364:eaau6389
Huang P, Zhang Y, Xiao K, Jiang F, Wang H, Tang D, Liu D, Liu B, Liu Y, He X et al (2018) The chicken gut metagenome and the modulatory effects of plant-derived benzylisoquinoline alkaloids. Microbiome 6:211
Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, Ruscheweyh H-J, Tappu R (2016) MEGAN community edition—
Kurtz ZD, Müller CL, Miraldi ER, Littman DR, Blaser MJ, Bonneau RA (2015) Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput Biol 11:e1004226
Lagier J-C, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, Levasseur A, Rolain J-M, Fournier P-E, Raoult D (2018) Culturing the human microbiota and culturomics. Nat Rev Microbiol 16:540–550
Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Vega Thurber RL, Knight R et al (2013) Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31:814
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
Letunic I, Bork P (2019) Interactive tree of life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259
Levy A, Salas Gonzalez I, Mittelviefhaus M, Clingenpeel S, Herrera Paredes S, Miao J, Wang K, Devescovi G, Stillman K, Monteiro F et al (2018) Genomic features of bacterial adaptation to plants. Nat Genet 50:138–150
Li D, Liu C-M, Luo R, Sadakane K, Lam T-W (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676
Li J, Jia H, Cai X, Zhong H, Feng Q, Sunagawa S, Arumugam M, Kultima JR, Prifti E, Nielsen T et al (2014) An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol 32:834–841
Liu C, Zhou N, Du M-X, Sun Y-T, Wang K, Wang Y-J, Li D-H, Yu H-Y, Song Y, Bai B-B et al (2020) The mouse gut microbial Biobank expands the coverage of cultured bacteria. Nat Commun 11:79
Liu Y-X, Qin Y, Bai Y (2019) Reductionist synthetic community approaches in root microbiome research. Curr Opin Microbiol 49:97–102
Liu Y-X, Qin Y, Guo X, Bai Y (2019) Methods and applications for microbiome data analysis. Hereditas (Beijing) 41:1–18
Louca S, Parfrey LW, Doebeli M (2016) Decoupling function and taxonomy in the global ocean microbiome. Science 353:1272–1277
Mahnert A, Moissl-Eichinger C, Zojer M, Bogumil D, Mizrahi I, Rattei T, Martinez JL, Berg G (2019) Man-made microbial resistances in built environments. Nat Commun 10:968
Marchesi JR, Ravel J (2015) The vocabulary of microbiome research: a proposal. Microbiome 3:31
McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, Andersen GL, Knight R, Hugenholtz P (2011) An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6:610
Members BDC (2019) Database resources of the BIG data center in 2019. Nucleic Acids Res 47:D8–D14
Metcalf JL, Xu ZZ, Weiss S, Lax S, Van Treuren W, Hyde ER, Song SJ, Amir A, Larsen P, Sangwan N et al (2016) Microbial community assembly and metabolic function during mammalian corpse decomposition. Science 351:158–162
Metsky HC, Siddle KJ, Gladden-Young A, Qu J, Yang DK, Brehio P, Goldfarb A, Piantadosi A, Wohl S, Carter A et al (2019) Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat Biotechnol 37:160–168
Mikheenko A, Saveliev V, Gurevich A (2016) MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32:1088–1090
Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G, Crusoe MR, Kale V, Potter SC, Richardson LJ et al (2020) MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res 48:D570–D578
Moss EL, Maghini DG, Bhatt AS (2020) Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat Biotechnol
Mu D-S, Liang Q-Y, Wang X-M, Lu D-C, Shi M-J, Chen G-J, Du Z-J (2018) Metatranscriptomic and comparative genomic insights into resuscitation mechanisms during enrichment culturing. Microbiome 6:230
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ (2014) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274
Ning K, Tong Y (2019) The fast track for microbiome research. Genom Proteom Bioinf 17:1–3
Nurk S, Meleshko D, Korobeynikov A, Pevzner PA (2017) metaSPAdes: a new versatile metagenomic assembler. Genome Res 27:824–834
Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MHH, Oksanen MJ, Suggests M (2007) The vegan package. Commun Ecol Pack 10:631–637
Parks DH, Tyson GW, Hugenholtz P, Beiko RG (2014) STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics 30:3123–3124
Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, Beghini F, Manghi P, Tett A, Ghensi P et al (2019) Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176:649–662.e620
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C (2017) Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417–419
Pedersen HK, Forslund SK, Gudmundsdottir V, Petersen AØ, Hildebrand F, Hyötyläinen T, Nielsen T, Hansen T, Bork P, Ehrlich SD et al (2018) A computational framework to integrate high-throughput ‘-omics’ datasets for the identification of potential mechanistic links. Nat Protoc 13:2781–2800
Proctor LM, Creasy HH, Fettweis JM, Lloyd-Price J, Mahurkar A, Zhou W, Buck GA, Snyder MP, Strauss JF, Weinstock GM et al (2019) The integrative human microbiome project. Nature 569:641–648
Qian X, Liu Y-X, Ye X, Zheng W, Lv S, Mo M, Lin J, Wang W, Wang W, Zhang X et al (2020) Gut microbiota in children with juvenile idiopathic arthritis: characteristics, biomarker identification, and usefulness in clinical prediction. BMC Genom 21:286
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T et al (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59–65
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glockner FO (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41:D590–596
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N (2017) Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 35:833
Ren Z, Li A, Jiang J, Zhou L, Yu Z, Lu H, Xie H, Chen X, Shao L, Zhang R et al (2019) Gut microbiome analysis as a tool towards targeted non-invasive biomarkers for early hepatocellular carcinoma. Gut 68:1014–1023
Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–140
Rognes T, Flouri T, Nichols B, Quince C, Mahé F (2016) VSEARCH: a versatile open source tool for metagenomics. PeerJ 4:e2584
Ross AA, Müller KM, Weese JS, Neufeld JD (2018) Comprehensive skin microbiome analysis reveals the uniqueness of human skin and evidence for phylosymbiosis within the class mammalia. Proc Natl Acad Sci USA 115:E5786–E5795
Assessment of variation in microbial community amplicon sequencing by the microbiome quality control (MBQC) project consortium. Nat Biotechnol 35:1077–1086
Smits SA, Leach J, Sonnenburg ED, Gonzalez CG, Lichtman JS, Reid G, Knight R, Manjurano A, Changalucha J, Elias JE et al (2017) Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania. Science 357:802–806
Stewart RD, Auffret MD, Warr A, Walker AW, Roehe R, Watson M (2019) Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat Biotechnol 37:953–961
Stewart RD, Auffret MD, Warr A, Wiser AH, Press MO, Langford KW, Liachko I, Snelling TJ, Dewhurst RJ, Walker AW et al (2018) Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat Commun 9:870
Subramanian S, Huq S, Yatsunenko T, Haque R, Mahfuz M, Alam MA, Benezra A, DeStefano J, Meier MF, Muegge BD et al (2014) Persistent gut microbiota immaturity in malnourished Banglade-
et al (2016) Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat Genet 48:1396–1406
Wang J, Zheng J, Shi W, Du N, Xu X, Zhang Y, Ji P, Zhang F, Jia Z, Wang Y et al (2018) Dysbiosis of maternal and neonatal microbiota associated with gestational diabetes mellitus. Gut 67:1614–1625
Wang W, Yang J, Zhang J, Liu Y-X, Tian C, Qu B, Gao C, Xin P, Cheng S, Zhang W et al (2020) An Arabidopsis secondary metabolite directly targets expression of the bacterial type III secretion system to inhibit bacterial virulence. Cell Host Microbe 27:601–613.e607
Wang X, Wang M, Xie X, Guo S, Zhou Y, Zhang X, Yu N, Wang E (2020b) An amplification-selection model for quantified rhizosphere microbiota assembly. Sci Bull
Wang Y, Song F, Zhu J, Zhang S, Yang Y, Chen T, Tang B, Dong L, Ding N, Zhang Q et al (2017) GSA: genome sequence archive*. Genom Proteom Bioinf 15:14–18
Ward T, Larson J, Meulemans J, Hillmann B, Lynch J, Sidiropoulos D, Spear JR, Caporaso G, Blekhman R, Knight R et al (2017) BugBase predicts organism-level microbiome phenotypes. bioRxiv 133462
Wilck N, Matus MG, Kearney SM, Olesen SW, Forslund K, Bartolomaeus H, Haase S, Mähler A, Balogh A, Markó L et al (2017) Salt-responsive gut commensal modulates TH17 axis and disease. Nature 551:585–589
Wood DE, Lu J, Langmead B (2019) Improved metagenomic analysis with Kraken 2. bioRxiv 762302
Wu Y-W, Simmons BA, Singer SW (2015) MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605–607
Xiao L, Feng Q, Liang S, Sonne SB, Xia Z, Qiu X, Li X, Long H, Zhang J, Zhang D et al (2015) A catalog of the mouse gut metagenome. Nat Biotechnol 33:1103
Xu J, Zhang Y, Zhang P, Trivedi P, Riera N, Wang Y, Liu X, Fan G, Tang J, Coletta-Filho HD et al (2018) The structure and function of the global citrus rhizosphere microbiome. Nat Commun 9:4894
Xu Y, Zhao F (2018) Single-cell metagenomics: challenges and applications. Protein Cell 9:501–510
Yang J, Yu J (2018) The association of diet, gut microbiota and colorectal cancer: what we eat may imply what we get. Protein Cell 9:474–487
Ye SH, Siddle KJ, Park DJ, Sabeti PC (2019) Benchmarking metagenomics tools for taxonomic classification. Cell 178:779–794
Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, Gilbert JA, Karsch-Mizrachi I, Johnston A, Cochrane G et al (2011) Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 29:415–420
Zgadzaj R, Garrido-Oter R, Jensen DB, Koprivova A, Schulze-Lefert P, Radutoiu S (2016) Root nodule symbiosis in Lotus japonicus drives the establishment of distinctive rhizosphere, root, and nodule bacterial communities. Proc Natl Acad Sci USA 113:E7996–E8005
Zhang F, Cui B, He X, Nie Y, Wu K, Fan D, Feng B, Chen D, Ren J, Deng M et al (2018) Microbiota transplantation: concept, methodology and strategy for its modernization. Protein Cell 9:462–473
Zhang J, Liu Y-X, Zhang N, Hu B, Jin T, Xu H, Qin Y, Yan P, Zhang X, Guo X et al (2019) NRT1.1B is associated with root microbiota composition and nitrogen use in field-grown rice. Nat Biotechnol 37:676–684
Zhang J, Zhang N, Liu Y-X, Zhang X, Hu B, Qin Y, Xu H, Wang H, Guo X, Qian J et al (2018) Root microbiota shift in rice correlates with resident time in the field and developmental stage. Sci China Life Sci 61:613–621
Zheng M, Zhou N, Liu S, Dang C, Liu Y-X, He S, Zhao Y, Liu W, Wang X (2019) N2O and NO emission from a biological aerated filter treating coking wastewater: main source and microbial community. J Clean Prod 213:365–374
Zhu W, Lomsadze A, Borodovsky M (2010) Ab initio gene identification in metagenomic sequences. Nucleic Acids Res 38:e132–e132
Zou Y, Xue W, Luo G, Deng Z, Qin P, Guo R, Sun H, Xia Y, Liang S, Dai Y et al (2019) 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol 37:179–185