The Cycas Genome and The Early Evolution of Seed Plants

Article in Nature Plants · April 2022


DOI: 10.1038/s41477-022-01129-7



Articles
https://doi.org/10.1038/s41477-022-01129-7

The Cycas genome and the early evolution of seed plants
Yang Liu 1,2,34 ✉, Sibo Wang1,34, Linzhou Li 1,34, Ting Yang1,34, Shanshan Dong2,34, Tong Wei1,34,
Shengdan Wu 3,34, Yongbo Liu 4,34, Yiqing Gong2, Xiuyan Feng5, Jianchao Ma6, Guanxiao Chang6,
Jinling Huang 5,6,7, Yong Yang8, Hongli Wang1,9, Min Liu1, Yan Xu1,9, Hongping Liang 1,9, Jin Yu 1,9,
Yuqing Cai1,9, Zhaowu Zhang1,9, Yannan Fan 1, Weixue Mu1, Sunil Kumar Sahu 1, Shuchun Liu2,
Xiaoan Lang2,10, Leilei Yang2, Na Li2, Sadaf Habib2,11, Yongqiong Yang12, Anders J. Lindstrom 13,
Pei Liang14, Bernard Goffinet 15, Sumaira Zaman15, Jill L. Wegrzyn15, Dexiang Li10, Jian Liu5,
Jie Cui 16, Eva C. Sonnenschein 17, Xiaobo Wang 18, Jue Ruan 18, Jia-Yu Xue19, Zhu-Qing Shao 20,
Chi Song 21, Guangyi Fan1, Zhen Li22, Liangsheng Zhang23,24, Jianquan Liu 25, Zhong-Jian Liu 26,
Yuannian Jiao 27, Xiao-Quan Wang 27, Hong Wu28, Ertao Wang 29, Michael Lisby 30,
Huanming Yang1, Jian Wang1, Xin Liu 1, Xun Xu 1, Nan Li2, Pamela S. Soltis 31,
Yves Van de Peer 19,22,32 ✉, Douglas E. Soltis 31,33 ✉, Xun Gong 5 ✉, Huan Liu 1 ✉ and
Shouzhou Zhang 2 ✉

Cycads represent one of the most ancient lineages of living seed plants. Identifying genomic features uniquely shared by cycads
and other extant seed plants, but not non-seed-producing plants, may shed light on the origin of key innovations, as well as the
early diversification of seed plants. Here, we report the 10.5-Gb reference genome of Cycas panzhihuaensis, complemented by
the transcriptomes of 339 cycad species. Nuclear and plastid phylogenomic analyses strongly suggest that cycads and Ginkgo
form a clade sister to all other living gymnosperms, in contrast to mitochondrial data, which place cycads alone in this position. We found evidence for an ancient whole-genome duplication in the common ancestor of extant gymnosperms. The Cycas
genome contains four homologues of the fitD gene family that were likely acquired via horizontal gene transfer from fungi, and
these genes confer herbivore resistance in cycads. The male-specific region of the Y chromosome of C. panzhihuaensis contains
a MADS-box transcription factor expressed exclusively in male cones that is similar to a system reported in Ginkgo, suggesting
that a sex determination mechanism controlled by MADS-box genes may have originated in the common ancestor of cycads and
Ginkgo. The C. panzhihuaensis genome provides an important new resource of broad utility for biologists.

A full list of affiliations appears at the end of the paper.

Nature Plants | www.nature.com/natureplants

Cycads are often referred to as ‘living fossils’; they originated in the mid-Permian and dominated terrestrial ecosystems during the Mesozoic, a period called the ‘age of cycads and dinosaurs’1. Although the major cycad lineages are ancient, modern cycad species emerged from several relatively recent diversifications2,3. Cycads are long-lived woody plants that, unlike other extant gymnosperms, bear frond-like leaves clustered at the tip of the stem4. Extant cycads comprise 10 genera and approximately 360 species, two-thirds of which are on the International Union for Conservation of Nature Red List of threatened species5. All living cycad species are dioecious, with individual plants developing either male or female cones (except in Cycas, which produces a loose cluster of megasporophylls rather than a true female cone; Fig. 1a)6. Unlike other extant seed plants, cycads and Ginkgo retain flagellated sperm, an ancestral trait shared with bryophytes, lycophytes and ferns7. Cycads exhibit other special features, such as the accumulation of toxins that deter herbivores8 in seeds and vegetative tissues. They also produce coralloid roots that host symbiotic cyanobacteria, making them the only gymnosperms associated with nitrogen-fixing symbionts9. The origin of the seed marked one of the most important events of plant evolution10. As one of the four extant gymnosperm groups (cycads, Ginkgo, conifers and gnetophytes), cycads hold an important evolutionary position for understanding the origin and early evolution of seed plants. We therefore generated a high-quality genome assembly for a species of Cycas to explore fundamental questions in seed plant evolution, including the phylogenetic position of cycads, the occurrence of ancient whole-genome duplications (WGDs), innovation in gene function and the evolution of sex determination.

A chromosome-scale genome assembly
Here, we report a high-quality, chromosome-level genome assembly of Cycas panzhihuaensis based on sequencing of the haploid megagametophyte using a combination of MGI-SEQ short-read, Oxford Nanopore long-read and Hi-C sequencing methods (Supplementary Note 2). The genome comprises 10.5 Gb assembled in 5,123 contigs (N50 = 12 Mb), with 95.3% of these contigs anchored to the largest 11 pseudomolecules, corresponding to the 11 chromosomes (n = 11) of the C. panzhihuaensis karyotype11 (Supplementary Note 3 and Extended Data Fig. 1). The annotated genome describes 32,353 protein-coding genes and is mostly composed of repetitive elements adding up to 7.8 Gb (Supplementary Note 4). Based on BUSCO12 estimation, the gene space completeness of the C. panzhihuaensis genome assembly is 91.6% (Supplementary Note 4). Compared with other gymnosperms, the size of the Cycas genome is similar to that of Ginkgo (10.6 Gb)13,14 and intermediate between the relatively compact genome of Gnetum (4.1 Gb)15 and the very large genomes of conifers (for example, the ~20-Gb genomes of Picea and Pinus)16–18. As in other gymnosperm genomes, a large portion (76.14%) of the C. panzhihuaensis genome consists of ancient repetitive elements (Supplementary Note 4). In addition, the genome contains almost equal proportions of copia and gypsy long terminal repeat (LTR) elements, in contrast to other gymnosperm genomes, in which gypsy repeats are more frequent14,15 (Supplementary Note 6). Among all sequenced plant genomes, C. panzhihuaensis has the longest average introns (~30.8 kb) and genes (~121.3 kb) (Extended Data Fig. 2a), surpassing those of Ginkgo14. Whereas LTRs dominate intron content in Ginkgo, the introns of C. panzhihuaensis contain a large portion of unknown sequences (Extended Data Fig. 2b). The longest gene, CYCAS_013063, encoding a kinesin-like protein KIF3A, covers 2.1 Mb in the C. panzhihuaensis genome; the longest intron is approximately 1.5 Mb and was detected in CYCAS_030563, a gene that encodes a photosystem II CP43 reaction centre protein. Both genes are expressed, as evidenced by our long-read transcriptome data.

Phylogeny of cycads and seed plants
The C. panzhihuaensis genome provides an opportunity to revisit the long-standing debate on the evolutionary relationships among living seed plants. On the basis of molecular phylogenetic analyses, extant gymnosperms are resolved as a monophyletic group, but the branching order among their major lineages has remained controversial19–23. Our phylogenetic analyses of separate nuclear (Fig. 1b, Extended Data Fig. 3 and Supplementary Note 5) and plastid datasets strongly support cycads plus Ginkgo as sister to the remaining extant gymnosperms, in agreement with several other analyses23,24, whereas mitochondrial data resolve cycads alone in that position (Fig. 1c). This conflict cannot be explained by the presence of extensive RNA editing sites in the mitochondrial data (Fig. 1c), which in some cases have been reported to bias phylogenetic inference25,26; instead, it may best be explained by incomplete lineage sorting, which is supported by our PhyloNet27 and coalescent analyses of nuclear genes (Supplementary Note 5).

The extant diversity of cycads was previously considered to have arisen synchronously within the past 9–50 million years (Myr)2,3. Our inferences, based on 1,170 low-copy nuclear genes sampled for 339 cycad species and 6 fossil calibrations3, corroborate recent broad analyses of gymnosperms indicating that extant species-rich cycad genera emerged from rapid radiations ranging from 11 to 20 Myr ago, which may have been a consequence of dramatic Miocene global temperature changes24,28. Notably, major temperate and tropical radiations in several major clades of flowering plants have been shown to be associated with Miocene cooling in the past 15 Myr (refs. 29–31).

Cycas is an ancient polyploid
WGD is a major driving force in the evolution of land plants and has dramatically promoted the diversification of flowering plants23,32. Analysis of synonymous substitutions per synonymous site (KS) in duplicate genes33 revealed a clear peak at similar KS values (~0.85, range 0.5–1.2) for both Cycas and Ginkgo, suggestive of an ancient WGD possibly shared by these two lineages (Supplementary Note 7)34. However, the precise evolutionary position of this WGD event remains ambiguous. Our phylogenomic analyses based on 15 genomes and 1 transcriptome revealed 2,469 gymnosperm-wide duplications in 9,545 gene families and indicate that this WGD event dates to the most recent common ancestor (MRCA) of extant gymnosperms (Fig. 2a), supporting recent findings based on transcriptome data24. We also identified 69 ancient syntenic genomic segments that further support a gymnosperm-wide WGD (Extended Data Fig. 3, Supplementary Fig. 23 and Supplementary Tables 24 and 25). Furthermore, a mixed dataset with increased sampling (29 genomes and 61 transcriptomes) also yielded the same result (Fig. 2a and Extended Data Fig. 4). This gymnosperm-wide WGD, here named omega (ω), is independent of the WGD preceding the split between gymnosperms and angiosperms35 and may have contributed to the subsequent evolution of gymnosperm-specific genes involved in plant hormone signal transduction, biosynthesis of secondary metabolites, plant–pathogen interaction and terpenoid biosynthesis (Supplementary Note 7).

Ancestral gene innovation in the origin of the seed
The origin of seed plants is marked by the emergence of key traits including the seed, pollen and secondary growth of xylem and phloem36. Reconstruction of the evolution of gene families across the seed plant tree of life revealed that 663 orthogroups were gained and 368 expanded in the MRCA of extant seed plants compared with non-seed plants (Fig. 2b, node 1). Among these, 106 of the new orthogroups and 55 of the expanded orthogroups are associated with seed development in Arabidopsis37, including the regulation of development during early embryogenesis, seed dormancy and germination, and seed coat formation, as well as with immunity and stress response of the seed (Supplementary Note 6).

Genes of the LAFL family are well known as core regulatory genes of seed development, including LEAFY COTYLEDON1 (LEC1), ABSCISIC ACID INSENSITIVE3 (ABI3), LEAFY COTYLEDON2 (LEC2) and FUSCA3 (FUS3), which encode master transcriptional regulators that interact to form complexes that control embryo
development and maturation38. LEC1 genes are found only in vascular plants, but ABI3 is widely distributed in embryophytes (Supplementary Note 10.6). Cycas and Ginkgo each contain a small number of LEC1 (two and three, respectively) and ABI3 (one each) genes, whereas C. panzhihuaensis encodes a burst of FUS3 (ten) and LEC2 (seven) genes in the form of tandem repeats. FUS3 and LEC2 are shared by all living seed plants; the Cycas and other gymnosperm genomes contain genes composing a new clade of B3 domain proteins, that is, the FUS3/LEC2-like clade, which is sister to the clade of FUS3 and LEC2 (Extended Data Fig. 5). The FUS3/LEC2-like families are unique to gymnosperms, show significant expression after pollination in C. panzhihuaensis (Extended Data Fig. 5c) and may play specific roles in initiating embryogenesis in gymnosperms.

Regulation of seed development in Cycas
To better understand the dynamic changes in gene regulation and regulatory programmes during ovule pollination and fertilization, we performed a weighted correlation network analysis (WGCNA) and identified 11 co-expression modules at different developmental stages of the C. panzhihuaensis ovule and seed (Fig. 3a). The modules are enriched in seed nutrition metabolic processes (M2, M6 and M8), membrane biosynthesis (M9, which may relate to the development of the integument) and the synthesis of callose, a major component of the pollen tube (M4) (Supplementary Note 10). A survey of phytohormones showed that salicylic acid and jasmonic acid, which are both involved in pathogen resistance, were produced at higher levels in unpollinated ovules than in post-pollinated ovules (Fig. 3b), and genes involved in the biosynthesis of these two

Fig. 1 | Phylogenomic analyses of cycads and seed plants. a, Illustration of Cycas panzhihuaensis. b, Chronogram of seed plants on the basis of the SSCG-NT12 dataset inferred using MCMCTree. All branches are maximally supported by bootstrap values (ML) and posterior probabilities (ASTRAL). I, II, III, IV, V and VI indicate internal branches for which the pie charts depicting gene tree incongruence are complemented by histograms (lower panel) showing quartet support for the main topology (q1), the first alternative topology (q2) and the second alternative topology (q3). O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; T, Triassic; J, Jurassic; K, Cretaceous; Pg, Palaeogene; N, Neogene; Q, Quaternary; Ma, million years ago. c, DiscoVista species tree analysis: rows correspond to the nine hypothetical groups tested (see Supplementary Note 5 for details) and columns correspond to the results derived from the use of different datasets and methods. SSCG, single-copy genes; LCG, low-copy genes; MT, mitochondrial genes; PT, plastid genes; AA, amino acid sequences; NT, nucleotide sequences; NT12, codon 1st + 2nd positions; ASTRAL, coalescent tree inference method using ASTRAL; CONCAT, maximum likelihood tree inferred with IQ-TREE based on concatenated datasets; STAG, species tree inference using software STAG with low-copy genes (one to four copies); Original, original organellar nucleotide sequences; RNA Editing, organellar genes with RNA editing sites modified. Strong support, the clade is reconstructed with a support value >95%. Weak support, the clade is reconstructed with a support value <95%. Weak rejection, the clade is not recovered, but the alternative topology is not in conflict with it if poorly supported branches (<85%) are collapsed. Strong rejection, the clade is not recovered, and the alternative topology is in conflict with it even when poorly supported branches (<85%) are collapsed. d, Diversification of Cycadales. The chronogram of 339 cycad species was inferred with MCMCTree based on 100 nuclear single-copy genes with concordant evolutionary histories. All illustrations are specifically created for this study (a high-resolution version is available at https://db.cngb.org/codeplot/datasets/public_dataset?id=PwRftGHfPs5qG3gE).
[Figure 1 graphic: a, illustration of C. panzhihuaensis with leaf, male cone, megasporophylls, stem, and primary, precoralloid and coralloid roots labelled; b, chronogram of seed plants (Selaginella moellendorffii as outgroup; time axis in Ma) with histograms of the proportion of gene trees supporting q1, q2 and q3 for branches I–VI; c, DiscoVista support matrix for the 15-taxa-nuclear, 90-taxa-nuclear and 72-taxa-organellar analyses, testing seed plants, angiosperms, gymnosperms, Cycads-Ginkgo, cycads alone, gnetophytes alone, Gnepine, Gnecup and Gnetifer; d, chronogram of Cycadales genera coloured by speciation rate]

[Figure 2 graphic: a, phylogeny of the 16 vascular plants with previously recognized WGDs and WGDs confirmed by this study (Greek letters; ω marks the gymnosperm-wide WGD), annotated with the number of gene families with retained duplicates per branch; b, counts of orthogroup gains, losses, expansions and contractions at key nodes, with Salvinia cucullata, Azolla filiculoides and Selaginella moellendorffii as outgroups]

Fig. 2 | Ancient polyploidy events and evolution of gene families in seed plants. a, Inference of the number of gene families with duplicated genes surviving after WGD events mapped on a phylogenetic tree depicting the relationships among 16 vascular plants included in this study. The number of gene families with retained gene duplicates reconciled on a particular branch of the species tree is shown above the branch across the phylogeny (Methods). Numbers in square brackets denote the number of gene families with duplicated genes also supported by synteny evidence. b, Evolutionary analyses and phylogenetic profiles depicting the gains (light green), losses (light red), expansions (light yellow) and contractions (light blue) of orthogroups, according to the reconstruction of the ancestral gene content at key nodes and the dynamic changes of the lineage-specific gene characteristics.
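Fig. 2b summarizes each node by counts of orthogroups gained, lost, expanded and contracted. One simple way to derive such counts, assuming ancestral copy numbers have already been reconstructed for every orthogroup (the reconstruction step itself is outside this sketch, and this is not necessarily the authors' procedure), is to compare parent and child copy numbers along a branch. The orthogroup IDs, copy numbers and helper names below are hypothetical:

```python
from collections import Counter
from typing import Dict


def classify_change(parent: int, child: int) -> str:
    """Label the copy-number change of one orthogroup along a branch."""
    if parent == 0 and child > 0:
        return "gain"
    if parent > 0 and child == 0:
        return "loss"
    if child > parent:
        return "expansion"
    if child < parent:
        return "contraction"
    return "unchanged"


def summarize_node(parent_counts: Dict[str, int], child_counts: Dict[str, int]) -> str:
    """Return 'gains/losses/expansions/contractions' for one parent -> child branch."""
    tally = Counter(
        classify_change(parent_counts.get(og, 0), child_counts.get(og, 0))
        for og in set(parent_counts) | set(child_counts)
    )
    return "/".join(str(tally[k]) for k in ("gain", "loss", "expansion", "contraction"))


# Hypothetical reconstructed copy numbers at an ancestral node and a descendant node.
ancestor = {"OG1": 0, "OG2": 3, "OG3": 2, "OG4": 4, "OG5": 1}
descendant = {"OG1": 2, "OG2": 0, "OG3": 5, "OG4": 1, "OG5": 1, "OG6": 1}
print(summarize_node(ancestor, descendant))
```

Orthogroups missing from one of the dictionaries are treated as having zero copies, so OG6, present only in the descendant, is counted as a gain alongside OG1.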

phytohormones were also more highly expressed in unpollinated ovules, indicating a higher demand for these hormones as agents of pathogen resistance in the unpollinated ovule. Gibberellin, which is reported to regulate integument development in the ovules of flowering plants39, accumulated in the late stage of the pollinated ovule in Cycas. We also found that gene families related to integument development (for example, those involved in cutin, suberin and wax biosynthesis) showed increased expression at the late stage of the pollinated ovule. Fertilized ovules accumulated a high level of abscisic acid and expressed genes related to cell wall organization and biogenesis, indicating their activity in embryo development, seed coat formation, and seed maturation and dormancy40 (Supplementary Note 10.1–10.5).

Among genes related to seed development, the most notable is the cupin protein family, which is expanded in C. panzhihuaensis compared with all other green plants. Phylogenetic analysis revealed that the cupin family can be subdivided into two groups: the germin-like and the seed storage protein (SSP)-encoding genes. Surprisingly, we identified a new type of gene encoding vicilin-like storage proteins in C. panzhihuaensis; this type appears to be homologous to the vicilin-like antimicrobial peptides (v-AMP) and is organized as a tandem gene array in the C. panzhihuaensis genome (Fig. 3c). These v-AMP homologues are mostly expressed at the late stage of pollinated ovules and in fertilized ovules, with expression gradually decreasing during embryogenesis, suggesting a potentially important role of v-AMP genes in seed development (Fig. 3d and Supplementary Note 10.6).

Secondary growth and cell wall synthesis
Secondary growth is also a major innovation of seed plants36, and it has been recognized in fossils of now-extinct progymnosperms, which predated the origin of seed plants36,41. Secondary phloem and xylem are produced by the activity of a bifacial vascular cambium (secondary meristem). We found that several genes known in angiosperms to regulate the positioning of the xylem or xylem/phloem patterning during secondary growth underwent clear expansions in the MRCA of extant seed plants compared with non-seed plants, including the MYB family member ALTERED PHLOEM DEVELOPMENT (APL), WOL and BRASSINOSTEROID-INSENSITIVE LIKE 1 (BRL1) and BRL3. The APL gene is expressed in the phloem and cambium in vascular plants, and its encoded protein promotes phloem differentiation42. The expression of APL is regulated by WOL in the procambium43. The BRL1 and BRL3 genes encode brassinosteroid receptors that play major roles in xylem differentiation and phloem/xylem patterning in angiosperms44. Many copies of these genes were found to be highly expressed in the cambium or apical meristem of C. panzhihuaensis (Supplementary Note 6).

Many gymnosperms are tall, woody plants with cell walls containing large quantities of cellulose, xyloglucan, glucomannan, homogalacturonans and rhamnogalacturonans45. In the cellulose synthase (CESA/CSL) superfamily46, we discovered putative ancestral cellulose synthase-like B/H (CSLB/H) and CSLE/G genes that are specifically shared by gymnosperms; both gene groups originated before the divergence of CSLB and CSLH in angiosperms (Extended Data Fig. 6). Cycads have manoxylic wood, with a large pith, large amounts of parenchyma and relatively few tracheids, in contrast to most other gymnosperms, which have pycnoxylic wood, with small amounts of pith, cortex and parenchyma, and a greater density of tracheids4. The glycosyltransferase 77 (GT77) family, involved in the synthesis of rhamnogalacturonan II, which is essential for cell wall synthesis in rapidly growing tissues47, is expanded in C. panzhihuaensis compared with other gymnosperms (Supplementary Note 11). In addition, gene families related to cell wall extension and loosening are uniquely expanded in C. panzhihuaensis, including those encoding hydroxyproline-rich glycoproteins, which are seven times more abundant in Cycas than in Ginkgo, and the fasciclin-like arabinogalactan proteins, which are twice as numerous in Cycas as in Ginkgo, Sequoiadendron giganteum and Pseudotsuga menziesii. Understanding how these wood-related gene families are regulated in cycads relative to other gymnosperms will be important for explaining the differences in wood density.

The evolution of pollen, pollen tube and sperm
Another major innovation during seed plant evolution is the production of pollen and the pollen tube36. We found that many genes

[Figure 3 graphic: a, heatmap of relative expression in the 11 WGCNA co-expression modules (M1–M11) across stages S1–S4, with enriched GO terms (biological process, molecular function, cellular component) listed per module; b, hormone amounts (ng g–1) across S1–S4 for salicylic acid, 1-aminocyclopropane-1-carboxylate, gibberellic acid, abscisic acid, jasmonic acid-isoleucine, jasmonic acid, 3-indole acetic acid and trans-zeatin-riboside; c, SSP phylogeny with GLP, l-SSP, v-SSP and v-AMP clades across angiosperms, gymnosperms, ferns, lycophytes and bryophytes; d, SSP expression (log10(TPM)) across C. panzhihuaensis tissues and ovule stages]

Fig. 3 | Gene expression and phytohormone synthesis at different developmental stages of the seed of Cycas and the evolution of seed storage proteins.
a, Heatmap showing relative expression of genes in 11 co-expression modules by WGCNA across 4 developmental stages of the seed: S1, unpollinated ovule; S2, early stage of pollinated ovule; S3, late stage of pollinated ovule; and S4, fertilized ovule. b, Quantification of eight phytohormones in the same four developmental stages of the Cycas seed. The grey bars represent hormone amounts (n = 2 biologically independent experiments) and the error bars represent the standard error. c, Phylogeny of SSPs in representative land plant species. The SSPs analysed include germin-like protein (GLP), legumin-like SSP (l-SSP), vicilin-like SSP (v-SSP) and v-AMP. A maximum likelihood tree with 500 bootstrap replicates was constructed using RAxML. Bootstrap values (≥50%) for each major clade (highlighted in colour) and the relationships among them are provided. The Cycas sequences are highlighted in red. d, Expression levels of SSP genes in different tissues of C. panzhihuaensis.
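WGCNA, used for the module analysis in Fig. 3a, starts from a soft-thresholded co-expression adjacency, a_ij = |cor(x_i, x_j)|^β, which preserves strong correlations and shrinks weak ones toward zero. The sketch below shows only this first step in plain Python. The power β = 6 (a common default for unsigned networks), the gene names and the stage-wise expression profiles are assumptions for illustration; the full method, as implemented in the WGCNA R package, additionally computes a topological overlap matrix and cuts a gene dendrogram into modules:

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(expr: Dict[str, Sequence[float]],
                   beta: int = 6) -> Dict[Tuple[str, str], float]:
    """Unsigned WGCNA-style adjacency |cor| ** beta for every unordered gene pair."""
    genes = sorted(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


def connectivity(adj: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Whole-network connectivity: sum of each gene's adjacencies."""
    k: Dict[str, float] = {}
    for (g, h), a in adj.items():
        k[g] = k.get(g, 0.0) + a
        k[h] = k.get(h, 0.0) + a
    return k


# Hypothetical mean expression of four genes at stages S1-S4.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
adj = soft_adjacency(expr)
print(round(adj[("geneA", "geneD")], 4))
print(connectivity(adj))
```

Here geneA, geneB and geneC are perfectly correlated or anti-correlated, so each of their pairwise adjacencies is 1.0 in an unsigned network, whereas geneD's correlation of 0.8 with each of them drops to about 0.26 after raising to the sixth power. That down-weighting of moderate correlations is the purpose of the soft threshold.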

regulating pollen and pollen tube development (pollen maturation, pollen tube growth, pollen tube perception and prevention of multiple-pollen tube attraction) were gained (or the respective gene families expanded) in the MRCA of extant seed plants (Fig. 2b), as might be predicted for these features. For instance, genes encoding egg cell-secreted proteins that prevent the attraction of multiple pollen tubes48 originated in the MRCA of living seed plants. The Ole e 1-like gene families, which encode proteins that accumulate in the pollen tube cell wall and play a role in pollen germination and pollen tube growth49, are markedly expanded in the MRCA of extant seed plants compared with non-seed plants (Supplementary Note 6). Such expansion also includes polcalcin, which is involved in calcium signalling to guide pollen tube growth50 (Supplementary Note 11). Both the COBRA and COBRA-like protein gene families are expanded in Cycas and other seed plants compared with non-seed plants, and the


[Figure 4 graphic: a, Manhattan plot of −log10(P) along chromosome 8 (Mb); b, πmale/πfemale, FST and ΔHp along chromosome 8 (colour scale low to high); c, alignment of MSY scaffolds (Mb) against chromosome 8 (15–50 Mb); d, male microsporophyll and female megasporophyll; e, expression (log2(FPKM + 1)) of MADS-Y on the MSY and CYCAS_010388 on an autosome; f, phylogeny of MADS-Y and CYCAS_010388 homologues from Cycas, Macrozamia, Zamia and Ginkgo (GbMADS4, GbMADS10), with Selaginella moellendorffii and Physcomitrium patens as outgroups; g, genotyping of male (M) and female (F) samples of Cycas debaoensis, Macrozamia lucida and Zamia furfuracea]

Fig. 4 | Identification of male-specific chromosomal region in Cycas. a, Manhattan plot of GWAS analysis of sex differentiation in 31 male and 31
female Cycas samples. The red horizontal dashed line represents the Bonferroni-corrected threshold for genome-wide significance (α = 0.05). P values
were calculated from a mixed linear model association of SNPs. Association analyses were performed once with a population of 31 male and 31 female
individuals. b, Ratio of nucleotide diversity (πmale/πfemale), FST and difference of pooled heterozygosity (ΔHp) between the male and female sequences, calculated within a 100-kb sliding window.
Colour represents values from low (blue) to high (red). c, Genome alignment of the MSY scaffolds with the corresponding female-specific region on
chromosome 8. Scaffolds are separated by grey dashed lines. Red lines represent alignments >5 kb on the forward strand, and blue lines represent those
on the reverse strand. Pink boxes in a–c represent the most differentiated regions between the sex chromosomes. d, Photographs of microsporophyll
and megasporophyll of C. panzhihuaensis. Bar, 1 cm. e, Sex-specific expression of MADS-Y (CYCAS_034085) and CYCAS_010388 in male and female
reproductive organs. Microsporophyll tissues were collected before meiosis (BFm), during prophase (Prophase), after meiosis (AFm) and before
pollination (BFp); female tissues were collected at 0, 7, 11 and 21 days post-pollination. f, Phylogeny of MADS-Y homologues across land plants. Genes from
MSY and autosomes are marked on the right, and those from Selaginella and Physcomitrium are used as outgroups. Numbers above branches represent
bootstrap scores from IQ-TREE. g, Molecular genotyping of male and female cycad samples from Cycas debaoensis, Macrozamia lucida and Zamia furfuracea
using primers specific to homologues of MADS-Y and CYCAS_010388.
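The genome-wide significance line in Fig. 4a is a Bonferroni correction of α = 0.05. The sketch below shows that arithmetic. It assumes the number of tests equals the 4.65-million linkage disequilibrium-pruned SNP set described in the Methods, because the exact test count behind the plotted line is not stated here. The function names are illustrative only and are not part of EMMAX or any other tool used in this study.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cut-off that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def manhattan_line(alpha: float, n_tests: int) -> float:
    """Height of the dashed significance line on a -log10(P) Manhattan plot."""
    return -math.log10(bonferroni_threshold(alpha, n_tests))


def significant_snps(pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """SNP identifiers whose P value falls below the Bonferroni cut-off.

    The number of tests is taken to be the number of SNPs supplied.
    """
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return sorted(snp for snp, p in pvalues.items() if p < cutoff)


if __name__ == "__main__":
    n_snps = 4_650_000  # assumption: one test per LD-pruned SNP
    print(f"P cut-off: {bonferroni_threshold(0.05, n_snps):.2e}")
    print(f"-log10 line: {manhattan_line(0.05, n_snps):.2f}")
```

Under that assumption, the per-SNP cut-off is about 1.1 × 10⁻⁸, which places the dashed line near 7.97 on the −log10(P) axis.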

COBRA-like protein localizes at the tip of the pollen tube membrane and plays an important role in pollen tube growth and guidance51 (Supplementary Note 11).

All seed plants produce pollen and deliver their sperm through the growth of a pollen tube. By contrast, all non-seed land plants (that is, bryophytes, lycophytes and ferns) rely on free-swimming motile sperm for sexual reproduction, as did the ancestors of land plants1,4 (Extended Data Fig. 7a,b). The exceptions among seed plants are cycads and Ginkgo, whose pollen grains release motile spermatozoids that, following pollination, swim the remaining short distance within the ovule to fertilize the egg52 (Supplementary Video 1). Sperm motility is conferred by a flagellar apparatus, and most genes related to its assembly occur in the C. panzhihuaensis genome. Ginkgo also retains flagellar genes, although fewer, and most notably lacks those encoding radial spoke proteins (RSP) (that is, RSP2, RSP3, RSP9 and RSP11; Extended Data Fig. 7c). By contrast, Gnetum, conifers and angiosperms, which develop non-flagellated sperm, lost many flagellar

[Fig. 5 panel residue: a, maximum likelihood tree with bootstrap values placing the C. panzhihuaensis genes CYCAS_004373, CYCAS_004375, CYCAS_004376 and CYCAS_004918 and the C. debaoensis (DEBAO) genes evm.model.contig000138.52, evm.model.contig000009.32 and evm.model.contig000009.39 alongside fungal sequences (Neonectria ditissima, Epichloe typhina subsp. poae, Lentinula edodes), with Proteobacteria, Firmicutes, CFB group, Actinobacteria and Cyanobacteria (outgroups) sequences; b, heatmap of log10(TPM) across tissues; c,d, mortality (%) of Plutella xylostella and Helicoverpa armigera after PBS or cytotoxin treatment (asterisks mark significant differences); e,f, larvae after no treatment, PBS or cytotoxin treatment (scale bars, 500 µm)]

Fig. 5 | Origin of a Cycas insecticidal protein. a, Phylogenetic analysis of the TcdA/TcdB pore-forming-domain-containing proteins shows that the genes
encoding four cytotoxin proteins of Cycas were likely acquired from fungi through an ancient horizontal gene transfer event. The maximum likelihood
tree was generated by RAxML with the PROTCATGTR model and 1,000 bootstrap replicates. The numbers above the branches are bootstrap support
values. b, Expression levels of the four cytotoxin genes in different tissues of C. panzhihuaensis. The digital expression values were normalized using the
TPM method. c,d, Mortalities of Plutella xylostella (c) and Helicoverpa armigera (d) after treatment with phosphate-buffered saline (PBS) and cytotoxin.
The asterisk indicates a significant difference (two-sided Student's t-test, P < 0.05, n = 3 biologically independent experiments), and error bars
represent the standard error. e,f, Morphologies of Plutella xylostella (e) and Helicoverpa armigera (f) after receiving PBS and cytotoxin treatments.
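Fig. 5b reports expression as TPM. As a rough illustration of TPM normalization, the sketch below divides each gene's read count by its transcript length in kilobases and then rescales so that each sample sums to one million. It is a simplification under stated assumptions: eXpress, which the Methods name for TPM estimation, uses effective lengths and probabilistically assigned reads. The pseudocount in the log transform, the gene names and the counts are hypothetical.

```python
import math


def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million for one sample.

    counts: reads assigned to each gene; lengths_bp: transcript length in bp.
    """
    # Step 1: reads per kilobase, which removes the transcript-length bias.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    # Step 2: rescale so that all genes in the sample sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: value / scale for gene, value in rpk.items()}


def log10_tpm(value: float, pseudocount: float = 1.0) -> float:
    """log10 transform for a heatmap; the pseudocount of 1 is an assumption."""
    return math.log10(value + pseudocount)


if __name__ == "__main__":
    # Hypothetical counts and lengths for two genes in one tissue.
    sample = tpm({"gene_a": 100, "gene_b": 300}, {"gene_a": 1_000, "gene_b": 3_000})
    print({gene: round(value) for gene, value in sample.items()})
```

In this example both genes receive the same TPM: gene_b has three times the reads, but it is also three times longer.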

structural genes (Supplementary Note 12). Outer dense fibres are unique accessory structures that maintain the structural integrity of flagella and are vital for flagellar function53. Outer dense fibres exist in C. panzhihuaensis and Ginkgo biloba, as well as in all non-seed land plants, but are absent in Gnetum, conifers and angiosperms, all of which have non-motile sperm (Extended Data Fig. 7c). The shift from swimming to non-motile sperm is a major innovation in land plant evolution, and C. panzhihuaensis and G. biloba exhibit an ancestral gene content associated with the shift from producing flagellate to non-flagellate sperm cells.

Sex chromosomes and sex determination in Cycas

Heteromorphic chromosomes have been reported to be associated with sex determination in Cycas54. To reveal the underlying genetic mechanism of sex determination, we carried out a genome-wide association study (GWAS) of sex as a binary phenotype in C. panzhihuaensis and identified the most significant association signals on chromosome 8, spanning the first 124 Mb of the reference female genome (Fig. 4a). This sex-associated region is also the most differentiated between male and female Cycas genomes, with the largest fixation index (FST; Supplementary Fig. 37) and the most differentiated nucleotide diversity (π) and heterozygosity ratios in the window between 18 and 50 Mb on chromosome 8 (Fig. 4b and Supplementary Note 13). These results confirm that Cycas possesses an XY sex determination system located on chromosome 8.

Assembling the male-specific region of the Y chromosome (MSY) from Nanopore long-read and Hi-C data yielded 45.5 Mb of sequence distributed over 43 scaffolds, most of which aligned to the sex-differentiation region on chromosome 8 (Fig. 4c and Supplementary Fig. 38). The assembled MSY is almost 80 Mb shorter than the corresponding region on the X chromosome, which agrees with the heteromorphy of the Cycas sex chromosomes. We annotated 624 putative protein-coding genes within the MSY, 11 of which were highly expressed (transcripts per million (TPM) > 1) in the microsporophylls. The most highly expressed gene in the MSY, and also the most differentially regulated gene between the two sexes, is CYCAS_034085 (Fig. 4d,e and Extended Data Fig. 8). It encodes a GGM13-like MADS-box transcription factor (TF) belonging to a lineage sister to the angiosperm AP3/PI clade, which plays crucial roles in floral development. Its closest homologue, CYCAS_010388, was identified on autosome 2. In contrast to CYCAS_034085, CYCAS_010388 was much more highly expressed in the ovule than in the microsporophyll (Fig. 4e). A male-specific polymerase chain reaction (PCR) product of CYCAS_034085 was amplified from all tested male cycad samples but was not detected in female samples, whereas a CYCAS_010388-specific PCR product was amplified from both males and females (Fig. 4g and Supplementary Fig. 39b). Because of its presence in the MSY and its male-specific expression pattern, we named CYCAS_034085 MADS-Y, a potential sex determination gene.
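The sex-differentiation scan above compares nucleotide diversity (π) between the sexes in 100-kb windows (Fig. 4b). The sketch below shows one way to compute windowed π from per-site allele counts and then take a female-to-male ratio. It uses the usual pairwise-difference definition of π in the spirit of the VCFtools windowed estimate cited in the Methods, but it is not guaranteed to match that tool's handling of missing data or window edges. The input format, the function names, the 1-based non-overlapping windows and the direction of the ratio are all assumptions for illustration. FST and ΔHp are not covered.

```python
from collections import defaultdict
from collections.abc import Iterable


def site_pi(ref_count: int, alt_count: int) -> float:
    """Fraction of chromosome pairs that differ at one biallelic site."""
    n = ref_count + alt_count  # number of sampled chromosomes with a call
    if n < 2:
        return 0.0
    return 2 * ref_count * alt_count / (n * (n - 1))


def window_pi(sites: Iterable[tuple[int, int, int]], window_bp: int = 100_000) -> dict[int, float]:
    """Sum per-site pi into non-overlapping windows and divide by window length.

    sites: (1-based position, ref allele count, alt allele count).
    Returns {window start: pi per bp}; windows without variant sites are omitted.
    """
    totals: dict[int, float] = defaultdict(float)
    for pos, ref, alt in sites:
        start = (pos - 1) // window_bp * window_bp + 1
        totals[start] += site_pi(ref, alt)
    return {start: total / window_bp for start, total in sorted(totals.items())}


def pi_ratio(female: dict[int, float], male: dict[int, float]) -> dict[int, float]:
    """Female-to-male pi ratio for windows present in both samples with male pi > 0."""
    return {w: female[w] / male[w] for w in female if w in male and male[w] > 0}


if __name__ == "__main__":
    # Hypothetical allele counts from 62 chromosomes per sex (31 diploid plants).
    female = window_pi([(120_000, 60, 2), (150_000, 31, 31)])
    male = window_pi([(120_000, 31, 31), (150_000, 31, 31)])
    print(pi_ratio(female, male))
```

Under an XY system, sites where the X and Y alleles differ are heterozygous in every male, so male π rises in the sex-differentiated region and a female-to-male ratio drops below 1 there.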


The reduced size of the MSY compared with the X chromosome indicates that the Y chromosome of Cycas, unlike those reported in some angiosperms55, underwent severe degeneration and gene loss. The most divergent 32-Mb region between the X and Y chromosomes (between 18 and 50 Mb) probably represents an ancient evolutionary segment of the Cycas sex chromosomes. The broad association of MADS-Y homologues with sex across cycads indicates a conserved sex determination system within this ancient lineage (Fig. 4f and Supplementary Fig. 39). Moreover, the presence of GbMADS4, a homologue of Cycas MADS-Y, in Ginkgo male-specific contigs56 suggests that the same sex determination mechanism might have originated before the split of cycads and Ginkgo, thus representing an ancient system of sex determination in seed plants.

Evolution of disease and herbivore resistance genes

All three types of immune receptors, CC-NBS-LRR (CNL), TIR-NBS-LRR (TNL) and RPW8-NBS-LRR (RNL), show patterns of expansion in C. panzhihuaensis and other gymnosperms compared with non-seed plants (Supplementary Note 14). CNLs are widely expanded in both gymnosperms and angiosperms, whereas the TNL family tends to be more expanded in gymnosperms than in most angiosperms, indicating different evolutionary patterns of plant resistance (R) genes in these two lineages. Our data suggest that RNL genes occur widely in gymnosperms. The RNL family plays a critical role in downstream resistance signal transduction in angiosperms, and its broad occurrence in gymnosperms suggests that this signalling pathway may have been established no later than the origin of seed plants. Gene families encoding resistance-related proteins are greatly expanded in C. panzhihuaensis and other gymnosperm genomes compared with non-seed plants (Supplementary Note 14). For example, genes encoding endochitinases and chitinases, which defend against chitin-containing fungal pathogens, are expanded as tandem repeats in C. panzhihuaensis and most other gymnosperm genomes compared with other land plants.

Cycads comprise many more living species57 than Ginkgo, which was diverse in the Mesozoic but now includes only one extant species58. One possible explanation is that cycads may have acquired enhanced resistance to pathogens and herbivores through diversified resistance-related genes and the biosynthesis of diversified secondary compounds4,8. Indeed, comparisons of the Cycas and Ginkgo genomes reveal many Cycas-specific orthogroups enriched in pathogen interaction pathways (Supplementary Note 14). C. panzhihuaensis also shows remarkable expansions in plant immunity and stress response gene families compared with Ginkgo. These include genes involved in programmed cell death and abiotic stress response, and genes encoding serine protease inhibitors against pests and ginkbilobin, which has antibacterial and antifungal activities (Supplementary Note 14).

Terpenoids are a diverse group of secondary metabolites synthesized by terpene synthases, which are encoded by terpene synthase (TPS) genes59. Several TPS subfamilies (TPS-a to TPS-h) are known in plants60, among which the TPS-d family is unique to gymnosperms. Three of the four types of TPS-d were found in C. panzhihuaensis, with remarkable expansions of TPS-d2 compared with Ginkgo and most other gymnosperms (Supplementary Note 15). In addition, we identified a novel TPS subfamily in Cycas, with three copies in C. panzhihuaensis and eight copies in Cycas debaoensis (Extended Data Fig. 9a). The expression levels of all TPS genes across C. panzhihuaensis tissues (Extended Data Fig. 9b) reveal that many TPS genes are mainly expressed in the root (especially primary root and coralloid root), in the microsporophyll and pollen sac, and in the late-stage pollinated ovule and fertilized ovule. The three Cycas-specific TPS genes were mainly expressed in the root and male cone, but one of them (CYCAS_009486) is particularly highly expressed in the megagametophyte and in the post-pollination and fertilized ovule.

Cycas obtained a cytotoxin defence gene via horizontal gene transfer

Genes of fungal or bacterial origin are rare in seed plants61. However, we identified a gene family in the C. panzhihuaensis genome that appears to have been acquired from a microbial organism and that encodes a homologue of the Pseudomonas fluorescens insecticidal toxin (fitD). The acquired genes are flanked by vertically inherited plant sequences, and we further confirmed that the relevant assembled regions were free of bacterial contamination. Transcriptomes and PCR amplification from genomic DNA indicated that these genes occur in many Cycas species (Supplementary Note 16). The fitD gene family comprises four gene copies in the C. panzhihuaensis genome and three copies in the C. debaoensis genome (Supplementary Table 51). Each copy encodes a protein similar to the fit toxin and the 'makes caterpillars floppy' (mcf) toxin of the bacterium Photorhabdus luminescens, a lethal pathogen of insects. Both fit and mcf toxins are known for their insecticidal properties, and fit- or mcf-producing bacteria are often used in pest biocontrol62–64. Phylogenetic analyses suggest that the fitD genes might have been acquired from fungi and then expanded before the divergence of C. panzhihuaensis and C. debaoensis (Fig. 5a). The fitD family genes are mainly expressed in roots and in reproductive tissues such as male cones, unpollinated or early-stage pollinated ovules and embryos (Fig. 5b). Injection of the synthesized C. panzhihuaensis fitD protein resulted in significantly higher mortality in larvae of both the diamondback moth (Plutella xylostella) and the cotton bollworm (Helicoverpa armigera) (Fig. 5c,d). The acquisition of the fitD gene family may have provided Cycas with an important defence against insect pests.

Conclusions

The high-quality genome sequence of Cycas, the last major lineage of seed plants for which a high-quality genome assembly was lacking, closes an important gap in our understanding of genome structure and evolution in seed plants. This genome enables comparative genomic and phylogenomic analyses to unravel the genetic control of important traits in cycads and other gymnosperms. These include a WGD shared by gymnosperms, a sex determination mechanism that appears to be shared by cycads and Ginkgo, and critical gene innovations, including those that enable seed and pollen tube formation and chemical defence.

Methods

Plant materials. Fresh megagametophytes of Cycas panzhihuaensis, cultivated in the garden of the Kunming Institute of Botany, Chinese Academy of Sciences, were collected for genome sequencing. The plant was originally transplanted from the Pudu River, Luquan county, Yunnan, China (25° 57′ 35.2584″ N, 102° 43′ 41.5848″ E), and the voucher specimen (collection number: PZHF03) has been deposited in the Herbarium of the Kunming Institute of Botany (KUN). For transcriptome sequencing, we sampled 12 different types of organs and tissues from C. panzhihuaensis, including megagametophyte, pollen sac, microsporophylls, apical meristem of stem, cortex of stem, pith of stem, cambium of stem, mature leaf, young leaf, primary root, precoralloid roots and coralloid roots (Supplementary Table 2). Ovule material was collected from two artificially pollinated individuals, and we divided development into four stages: unpollinated ovule (before artificial pollination), early stage of pollinated ovule (21 d after artificial pollination), late stage of pollinated ovule (88 d after artificial pollination) and fertilized ovule or seed (119 d after artificial pollination) (Supplementary Tables 2 and 19). In addition, stem and root tissues of C. panzhihuaensis were used to generate full-length transcriptomes (Supplementary Table 2). For phylogenomic analyses, we newly generated transcriptomes of 47 gymnosperms (Supplementary Tables 2 and 13). We also sequenced transcriptomes of 339 cycad species (Supplementary Tables 2 and 14). For population resequencing, fresh leaf samples were collected from 31 male and 31 female plants randomly sampled in the Cycas panzhihuaensis National Natural Reserve in Sichuan, China, which holds a population of approximately 38,000 C. panzhihuaensis individuals (Supplementary Table 4).

DNA and RNA sequencing. For genome sequencing, genomic DNA was extracted with the QIAGEN Genomic kit following the manufacturer's instructions65. Nanodrop and Qubit (Invitrogen) were used to quantify the DNA. Nanopore libraries were prepared with SQK-LSK108 and sequenced using
a Nanopore PromethION sequencer. The rest of the DNA was used to generate short-read sequences on an MGI-SEQ platform, with 150-bp read length and 300–500-bp insert size. Hi-C libraries were created from fresh megagametophyte following a previously published method66. Briefly, the tissue was fixed in formaldehyde and lysed, and the cross-linked DNA was digested overnight with HindIII. Sticky ends were biotinylated and proximity-ligated to generate chimeric junctions, which were subsequently physically sheared to 500–700 bp. The original cross-linked long-distance physical interactions were thus represented by chimeric fragments, which were processed into paired-end sequencing libraries. Paired-end reads were produced on both the MGI-SEQ and Illumina HiSeq X platforms. See Supplementary Note 3 for details on transcriptome, organelle genome and small RNA sequencing.

Genome assembly. About 1,010 Gb (~100×) of Nanopore long-read data were used for genome assembly with NextDenovo (https://github.com/Nextomics/NextDenovo) using default parameters (read_cutoff = 1k, seed_cutoff = 12k, minimap2_options_cns = -x ava-ont -k17 -w17). To further enhance assembly contiguity, about 456 Gb of Hi-C data were used for chromosome-level scaffolding with the 3D-DNA algorithm67. The accuracy of the Hi-C-based chromosomal assembly was assessed using the chromatin contact matrix in Juicebox.

Repeat annotation. We identified tandem repeats and transposable elements throughout the genome. Tandem repeats were predicted using Tandem Repeats Finder (v.4.07)68 with the following parameters: 'Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50 and MaxPeriod = 2,000'. To maximize the detection of transposable elements, a combination of de novo and homology-based approaches was used, following the Repeat Library Construction-Advanced pipeline (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced). RepeatMasker69 and RepeatProteinMask69 were used to search for known repeat sequences; MITE-hunter70, LTR_retriever71, LTR_FINDER (v.1.0.6)72 and RepeatModeler73 were then used to search for repeats de novo. The MITE and LTR libraries and the consensus repeat library generated by RepeatModeler were combined and used as input for RepeatMasker.

LTR identification and estimation of LTR insertion times. All candidate LTR elements were first identified using LTR_FINDER and LTR_retriever. LTR_STRUC74 was then used to extract the complete 5′- and 3′-ends of the LTR elements, and RepeatClassifier was used to classify the candidate LTRs. Distmat from the EMBOSS (v.6.5.7.0) package was then used to calculate the divergence (K) between the 5′- and 3′-LTR sequences of each retrotransposon. Finally, the insertion time (T) of each LTR element was calculated as T = K/2r, where r is the average substitution rate of 2.2 × 10⁻⁹ substitutions per synonymous site per year.

Gene annotation and functional annotation. Three types of evidence were used to predict protein-coding genes in the C. panzhihuaensis genome. For protein evidence, Genewise75 was used to predict gene models based on Cycas proteins downloaded from the UniProt protein database and on other proteins collected from representative plant species. Next, Hisat76 was used to map the transcriptome reads to the genome, and StringTie77 was used to predict transcriptome-based gene models. AUGUSTUS78, with custom-trained parameters and hints, was then used to predict ab initio gene models. All the evidence was finally combined and integrated by EVidenceModeler79. To retain only high-confidence genes, we further filtered out genes that were not expressed in the full-length transcriptome or did not match any functional annotation. For functional annotation, the gene models were searched with BLAST against the UniProt, TrEMBL, KEGG, KOG and NR databases. Protein domains and gene ontology terms of the gene models were identified by InterProScan80 (using data from Pfam, PRINTS, SMART, ProDom and PROSITE).

Identification of key candidate functional genes. All candidate genes were screened according to the following criteria: first, candidate gene sequences were detected by BLAST searches (e value cut-off of 1 × 10⁻⁵) against query gene sequences gathered from previous studies or public databases; and second, candidate genes should have online or UniProt functional annotations similar to those of the query genes. For flagellar genes, 58 flagellar-related genes were collected from previous studies81, and the reciprocal best BLAST hit method was used to identify flagella-related genes. For seed-related genes, we searched the genes against both the known seed gene database (seedgenes.org/) and previous studies. We first used an e value of <1 × 10⁻²⁰ as a cut-off to filter candidates and then filtered the candidates by functional annotation. To identify TFs, we used HMMER searches. HMMER domain models for each TF family in the TAPscan v.2 TF database (https://plantcode.online.uni-marburg.de/tapscan/) were downloaded from the Pfam website (https://pfam.xfam.org/). Preliminary TF candidate genes were collected for each species (e value < 1 × 10⁻⁵) by searching the hidden Markov model profiles. Genes were then filtered out if they were not homologues according to their SwissProt functional annotation (<1 × 10⁻⁵). Finally, we filtered out genes containing an incorrect domain under the TAPscan v.2 transcription factor database domain rules. Phylogenetic tree analysis was used to verify the majority of TFs and transcriptional regulators. Details about phylogenetic tree reconstruction for each TF can be found in the figure captions.

Phylogenetic reconstruction and divergence-time estimation. Nuclear phylogenetic reconstruction. The downloaded genome sequences and the newly generated genome sequence of C. panzhihuaensis were used to construct orthogroups using OrthoFinder82 with default settings. The software KinFin83 was used with default parameters to select single-copy gene families for phylogenetic reconstruction. TranslatorX84 was used to build gene alignments for codon (nt), codon 1st + 2nd (nt12) and amino acid (aa) sequences (command: perl translatorx_vLocal.pl -i gene.fa -o gene.out -p F -t F -w 1 -c 1 -g "-b1="$b1" -b2="$b1" -b3=8 -b4=5 -b5=h -b6=y"). IQ-TREE 2 (ref. 85) was used to infer maximum likelihood trees with an initial partition scheme by codon position, combining ModelFinder, tree search and ultrafast bootstrap. ASTRAL86 was used to summarize the coalescent species tree and quartet supports with default settings (-t 8). ASTRAL uses the quartet trees of the maximum likelihood phylogeny of each gene to produce the species tree topology, while quartet supports (bar charts) show the percentage of quartets that agree with a specific branch in the species tree. STAG (https://github.com/davidemms/STAG) was also used to construct the species tree with default settings using low-copy genes (one to four copies). The software PHYPARTS87 was used with default settings to infer and visualize gene tree conflicts on the species tree topology. The software DISCOVISTA88 was used to summarize conflicts among different analytical methods and datasets regarding several focal phylogenetic relationships.

Molecular dating and diversification analysis. The transcriptome sequencing reads from 339 cycad species were generated in the current study. Clean reads were assembled with TRINITY89, and the longest transcripts were selected and translated with TRANSDECODER (https://github.com/TransDecoder). OrthoFinder82 was then used to construct orthogroups for all the cycad species, using Ginkgo as the outgroup. The software KinFin83 was used with default settings to select mostly single-copy genes for phylogenetic reconstruction. TranslatorX84, IQ-TREE 2 (ref. 85) and ASTRAL86 were used to align the sequences and to infer the species tree for cycads as described above. The software SORTADATE90 was used to select genes with mostly concordant evolutionary histories for dating analyses using MCMCTREE within the software PAML 4 (ref. 91). Rate priors and time priors were set following the method of Morris et al.92. A total of 27 fossils were used to calibrate the chronogram of seed plants, and six fossils to calibrate the chronogram of cycads. The diversification pattern of cycads was analysed with Bayesian analysis of macroevolutionary mixtures (BAMM; www.bamm-project.org) following Condamine et al.93. See Supplementary Note 5 for details on organellar phylogenetic reconstruction, evaluation of the impact of RNA editing and investigation of cyto-nuclear incongruences.

Identification of whole-genome duplication. An integrated phylogenomic approach and a synteny-based method as described previously35,94,95 were used to identify WGD events in seed plant evolution. The protein-coding sequences of 15 completely sequenced genomes and 1 transcriptome were classified into putative gene families/subfamilies by OrthoFinder82 and then scored for gene duplications across global gene families. These represented seven gymnosperms (C. panzhihuaensis, Encephalartos longifolius, G. biloba, Gnetum montanum, Picea abies, Pinus taeda and Sequoiadendron giganteum), six angiosperms (Arabidopsis thaliana, Amborella trichopoda, Cinnamomum micranthum, Liriodendron chinense, Nymphaea colorata and Oryza sativa) and three other vascular plant outgroups (Azolla filiculoides, Salvinia cucullata and Selaginella moellendorffii). For the phylogenetic analysis of gene families, the amino acid sequences of each gene family were first aligned with MAFFT96, and the program PAL2NAL97 was then used to construct the corresponding nucleotide sequence alignments. We used trimAl98 with the 'automated1' option, which applies a heuristic to optimize the trimming of each alignment, to remove poorly aligned portions. Finally, maximum likelihood trees were calculated using RAxML99 with the GTRGAMMA model, and bootstrap support was estimated from 100 replicates. Following Wu et al.95, we applied two basic requirements for identifying a reliable duplication event: (1) genes of at least one common species are present in both child branches; and (2) the bootstrap values of the parental node and of one of the child nodes are both ≥50%. After scoring gene duplications across gene families at a large scale, we were able to confidently identify nodes with concentrated gene duplications across the phylogeny, which possibly support WGD events. Furthermore, syntenic information is the most solid evidence for WGD, and remnants of syntenic blocks may still be found if the concentrated gene duplications are indeed derived from WGD events. We therefore also examined whether such syntenic blocks exist. Intra- and intergenomic syntenic analyses were conducted using MCScanX100 with default settings. In addition, the Nei–Gojobori method101, as implemented in the yn00 program of the PAML package91, was used to estimate synonymous substitutions per synonymous site (KS) for pairwise comparisons of paralogous genes located in syntenic blocks. To search for genome-wide duplications, we used DupGen_finder
(https://github.com/qiao-xin/DupGen_finder) to identify duplicated genes, which were classified into five categories: WGD duplicates, tandem duplicates, proximal duplicates, transposed duplicates and dispersed duplicates.

Identification of the sex-differentiation region. To identify the sex-differentiation region in the Cycas genome, a GWAS approach was applied to sequence variants from 31 male and 31 female individuals, with sex treated as a binary phenotype. Briefly, raw reads were filtered with Trimmomatic (v.0.38) (ILLUMINACLIP:adapter.fa:2:30:10 HEADCROP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:5:15 MINLEN:140), and read alignment and single-nucleotide polymorphism (SNP) calling were performed using the Sentieon pipeline102. SNPs were filtered using the following criteria: (1) SNPs were filtered with GATK VariantFiltration using 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0', and indels with 'QD < 2.0 || FS > 200.0 || SOR > 10.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0'; variants with (2) total depth <80 or >1,300, (3) more than two alleles or (4) a missing rate >10% or minor allele frequency <0.1 were removed; and (5) linkage disequilibrium pruning was performed with PLINK (v.1.9) using a 10-kb window, a step size of one SNP and an r² threshold of 0.5. This resulted in a set of 4.65 million pruned SNPs for association analysis of sex differentiation. GWAS of sex differentiation was performed on the linkage disequilibrium-pruned SNP set using the EMMAX program103 (beta-07Mar2010 version). The BN kinship matrix and the first five principal components calculated from principal component analysis104 (v.1.91.4beta3) were included as random effects. Genetic differentiation (FST) and nucleotide diversity (π) were calculated in non-overlapping 100-kb windows using VCFtools105 (v.0.1.13). See Supplementary Note 13 for details on the assembly of Cycas male-specific regions, phylogenetic analysis of MADS-Y and CYCAS_010388 homologues, and genotyping of male and female cycad samples.

Analysis of the differentially expressed genes. Transcriptome sequencing reads were trimmed using the Trimmomatic106 program (ILLUMINACLIP:adapter.fa:2:30:10 HEADCROP:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:5:15 MINLEN:140) and mapped against the annotated C. panzhihuaensis gene models using bowtie2 (sensitive mode, default alignment parameters), retaining the best alignments. TPM values were calculated using the eXpress program, as incorporated in the Trinity89 package. Differentially expressed genes (false discovery rate ≤ 0.01 and at least a twofold change in expression) were identified using DESeq2 (ref. 107). To identify co-expressed genes during seed development, we used the R package WGCNA108 on the TPM values of genes whose expression showed a coefficient of variation >0.5 across the four stages. To better visualize the expression levels, we normalized the expression results. For each gene, the

Detection of metabolites and phytohormones. The plant tissues were collected and stored in liquid nitrogen, then transferred to a freezer at −80 °C. For detection of metabolites, tissue samples were first processed with methanol containing 2-chlorophenylalanine (4 ppm). Samples and glass beads were then ground in a tissue grinder for 90 s at 55 Hz, followed by centrifugation at 13,780g at 4 °C for 10 min. The supernatant was filtered through a 0.22-μm membrane, and the filtrate was transferred into a detection vial for liquid chromatography mass spectrometry analysis. The sample extracts were then analysed using the Vanquish ultra-high-performance liquid chromatography system (ThermoFisher Scientific) and the Q Exactive HF-X (ThermoFisher Scientific). For the quantitative detection of phytohormones (auxin, cytokinins, ethylene, abscisic acid, jasmonic acid, gibberellin, salicylic acid and brassinolide), tissue samples of primary root, precoralloid roots and coralloid roots, unpollinated ovule, early stage of pollinated ovule, late stage of pollinated ovule, fertilized ovule and mature embryo were collected. The Vanquish system (ThermoFisher Scientific) and the Q Exactive HF-X (ThermoFisher Scientific) were used to detect the phytohormones. Qualitative analysis was carried out using a self-constructed database built from reference standards, and quantitative analysis used standards at different concentrations.

Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The genome and transcriptome data, genome assemblies and annotations can be found at https://db.cngb.org/codeplot/datasets/public_dataset?id=PwRftGHfPs5qG3gE. The raw genomic, transcriptomic and Hi-C data generated in this study were deposited in the NCBI Sequence Read Archive (SRA, BioProject PRJNA734434) and the CNGB data center (https://db.cngb.org/) under project number CNP0001756. Source data are provided with this paper.

Received: 3 September 2021; Accepted: 10 March 2022;
Published: xx xx xxxx

References

1. Raven, P. H., Evert, R. F. & Eichhorn, S. E. Biology of Plants 7th edn (Macmillan, 2005).
2. Nagalingum, N. S. et al. Recent synchronous radiation of a living fossil. Science 334, 796–799 (2011).
3. Condamine, F. L., Nagalingum, N. S., Marshall, C. R. & Morlon, H. Origin
TPM value normalized by the maximum TPM value of all stages is shown. and diversification of living cycads: a cautionary tale on the impact of the
Fisher’s exact test was used to examine whether the functional categories were branching process prior in Bayesian molecular dating. BMC Evol. Biol. 15,
over-represented. The resulting P values were adjusted to Q values by the false 65 (2015).
discovery rate correction. 4. Norstog, T. J. & Nicholls, K. J. The Biology of the Cycads (Cornell Univ.
Press, 1997).
Identification of the horizontally transferred cytotoxin genes in C. 5. Calonje, M., Stevenson, D. W. & Osborne, R. The World List of Cycads
panzhihuaensis. The cytotoxin protein sequences of Cycas were used as query to http://www.cycadlist.org (2013–2021).
perform BLASTP searches against the NCBI nr protein sequence database using 6. Sultana, M., Mukherjee, K. K. & Gangopadhyay, G. in Reproductive Biology
the cut-off e value = 1 × 10−5 and max_target_seqs = 20,000. We also performed of Plants (eds Johri, B. M. & Srivastava, P. S.) 118–132 (Springer Science &
additional BLAST searches against the OneKP database and many other available Business Media, 2014).
genomes. See Supplementary Note 16 for details on verification and phylogenetic 7. Paolillo, D. J. Jr The swimming sperms of land plants. BioScience 31,
analysis of the cytotoxin gene. 367–373 (1981).
8. Brenner, E. D., Stevenson, D. W. & Twigg, R. W. Cycads: evolutionary
Assessing the effectiveness of cytotoxin. To improve the expression efficiency innovations and the role of plant-derived neurotoxins. Trends Plant Sci. 8,
of cytotoxin in the prokaryotic system, the full-length coding sequence 446–452 (2003).
of the C. panzhihuaensis cytotoxin protein was optimized for its codons. 9. Costa, J.-L. & Lindblad, P. in Cyanobacteria in Symbiosis (eds Rai, A. N.
C. panzhihuaensis, the optimized sequence was synthesized and ligated to et al.) 195–205 (Springer, 2002).
the pET-28a vector. The pET-28a-CR toxin plasmid was transformed into 10. Pettitt, J. Heterospory and the origin of the seed habit. Biol. Rev. 45,
Escherichia coli BL 21 (DE3) pLysS cells, the resulting strain was used for 401–415 (1970).
expression and purification of recombinant proteins under the control of 11. Yang, D.-Q. & Zhu, X.-F. Karyotype analysis of Cycas panzhihuaensis L.
isopropyl-β-d-thiogalactoside-inducible T7 promoter. Overnight-grown Zhou et S. Y. Yang. J. Syst. Evol. 23, 352–354 (1985).
cultures were diluted 100-fold with 200 ml of fresh LB medium and further 12. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. &
grown at 37 °C and 220 r.p.m. rotation until the optical density at 600nm Zdobnov, E. M. BUSCO: assessing genome assembly and annotation
reached 0.5. The culture was induced by adding a 0.01 mM final concentration completeness with single-copy orthologs. Bioinformatics 31, 3210–3212
of isopropyl-β-d-thiogalactoside and incubated at 28 °C for 6 h. Cells were (2015).
then harvested and suspended with 20 ml 50of mM Tris–HCl buffer with pH 13. Guan, R. et al. Draft genome of the living fossil Ginkgo biloba. GigaScience
8 at 4 °C, containing 200 mM NaCl, then disrupted by sonication at 4 °C. In an 5, 49 (2016).
RC5 plus centrifuge, the cell lysate was spun at 13,800g for 40 min at 4 °C. The 14. Liu, H. et al. The nearly complete genome of Ginkgo biloba illuminates
preceding step’s supernatant was put onto a Ni-NTA agarose column that had gymnosperm evolution. Nat. Plants 7, 748–756 (2021).
been pre-equilibrated with Tris–NaCl buffer at 4 °C. Tris–NaCl buffer containing 15. Wan, T. et al. A genome for gnetophytes and early evolution of seed plants.
20 mM imidazole was used to thoroughly wash the column, and the 6× His-tagged Nat. Plants 4, 82–89 (2018).
protein was eluted with Tris–NaCl buffer containing 250 mM imidazole. The 16. Nystedt, B. et al. The Norway spruce genome sequence and conifer genome
elution product containing pure protein were washed three times with Tris–NaCl evolution. Nature 497, 579–584 (2013).
buffer and concentrated using centricon (Millipore PM10). Using an horseradish 17. Stevens, K. A. et al. Sequence of the sugar pine megagenome. Genetics 204,
peroxidase-conjugated monoclonal antibody and a western blot assay, the purified 1613–1626 (2016).
His-tagged protein was identified (HRP-66005). See Supplementary Note 16 for 18. Niu, S. et al. The Chinese pine genome and methylome unveil key features
further details on experimental verification of the function of Cycas cytotoxin. of conifer evolution. Cell 185, 204–217 (2021).

19. Ran, J.-H., Shen, T.-T., Wang, M.-M. & Wang, X.-Q. Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms. Proc. Biol. Sci. 285, 20181012 (2018).
20. Li, Z. et al. Single-copy genes as molecular markers for phylogenomic studies in seed plants. Genome Biol. Evol. 9, 1130–1147 (2017).
21. Xi, Z., Rest, J. S. & Davis, C. C. Phylogenomics and coalescent analyses resolve extant seed plant relationships. PLoS ONE 8, e80870 (2013).
22. Soltis, D. et al. Phylogeny and Evolution of the Angiosperms: Revised and Updated Edition (Univ. of Chicago Press, 2018).
23. Leebens-Mack, J. H. et al. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574, 679–685 (2019).
24. Stull, G. W. et al. Gene duplications and phylogenomic conflict underlie major pulses of phenotypic evolution in gymnosperms. Nat. Plants 7, 1015–1025 (2021).
25. Dong, S., Li, H., Goffinet, B. & Liu, Y. Exploring the impact of RNA editing on mitochondrial phylogenetic analyses in liverworts, an early land plant lineage. J. Syst. Evol. 60, 16–22 (2021).
26. Du, X.-Y., Lu, J.-M. & Li, D.-Z. Extreme plastid RNA editing may confound phylogenetic reconstruction: A case study of Selaginella (lycophytes). Plant Divers. 42, 356–361 (2020).
27. Wen, D., Yu, Y., Zhu, J. & Nakhleh, L. Inferring phylogenetic networks using PhyloNet. Syst. Biol. 67, 735–740 (2018).
28. Zachos, J., Pagani, M., Sloan, L., Thomas, E. & Billups, K. Trends, rhythms, and aberrations in global climate 65 Ma to present. Science 292, 686–693 (2001).
29. Folk, R. A. et al. Rates of niche and phenotype evolution lag behind diversification in a temperate radiation. Proc. Natl Acad. Sci. USA 116, 10874–10882 (2019).
30. Sun, M. et al. Recent accelerated diversification in rosids occurred outside the tropics. Nat. Commun. 11, 1–12 (2020).
31. Soltis, P. S., Folk, R. A. & Soltis, D. E. Darwin review: angiosperm phylogeny and evolutionary radiations. Proc. Biol. Sci. 286, 20190099 (2019).
32. Van de Peer, Y., Ashman, T.-L., Soltis, P. S. & Soltis, D. E. Polyploidy: an evolutionary and ecological force in stressful times. Plant Cell 33, 11–26 (2021).
33. Vanneste, K., Van de Peer, Y. & Maere, S. Inference of genome duplications from age distributions revisited. Mol. Biol. Evol. 30, 177–190 (2013).
34. Roodt, D. et al. Evidence for an ancient whole genome duplication in the cycad lineage. PLoS ONE 12, e0184454 (2017).
35. Jiao, Y. et al. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100 (2011).
36. Doyle, J. A. Phylogenetic analyses and morphological innovations in land plants. Annu. Plant Rev. 45, 1–50 (2018).
37. Tzafrir, I. et al. The Arabidopsis SeedGenes Project. Nucleic Acids Res. 31, 90–93 (2003).
38. Lepiniec, L. et al. Molecular and epigenetic regulations and functions of the LAFL transcriptional regulators that control seed development. Plant Reprod. 31, 291–307 (2018).
39. Gomez, M. D., Ventimilla, D., Sacristan, R. & Perez-Amador, M. A. Gibberellins regulate ovule integument development by interfering with the transcription factor ATS. Plant Physiol. 172, 2403–2415 (2016).
40. Staszak, A. M., Rewers, M., Sliwinska, E., Klupczyńska, E. A. & Pawłowski, T. A. DNA synthesis pattern, proteome, and ABA and GA signalling in developing seeds of Norway maple (Acer platanoides). Funct. Plant Biol. 46, 152–164 (2019).
41. Spicer, R. & Groover, A. Evolution of development of vascular cambia and secondary growth. New Phytol. 186, 577–592 (2010).
42. Baucher, M., El Jaziri, M. & Vandeputte, O. From primary to secondary growth: origin and development of the vascular system. J. Exp. Bot. 58, 3485–3501 (2007).
43. Mähönen, A. P. et al. A novel two-component hybrid molecule regulates vascular morphogenesis of the Arabidopsis root. Genes Dev. 14, 2938–2943 (2000).
44. Caño-Delgado, A. et al. BRL1 and BRL3 are novel brassinosteroid receptors that function in vascular differentiation in Arabidopsis. Development 131, 5341–5351 (2004).
45. Harris, P. J. in Plant Diversity and Evolution: Genotypic and Phenotypic Variation in Higher Plants (ed. Henry, R. J.) 201–227 (CAB International, 2005).
46. Yin, Y., Huang, J. & Xu, Y. The cellulose synthase superfamily in fully sequenced plants and algae. BMC Plant Biol. 9, 99 (2009).
47. Dumont, M. et al. The cell wall pectic polymer rhamnogalacturonan-II is required for proper pollen tube elongation: implications of a putative sialyltransferase-like protein. Ann. Bot. 114, 1177–1188 (2014).
48. Sprunck, S. et al. Egg cell-secreted EC1 triggers sperm cell activation during double fertilization. Science 338, 1093–1097 (2012).
49. Prado, N. et al. Nanovesicles are secreted during pollen germination and pollen tube growth: a possible role in fertilization. Mol. Plant 7, 573–577 (2014).
50. Neudecker, P. et al. Solution structure, dynamics, and hydrodynamics of the calcium-bound cross-reactive birch pollen allergen Bet v 4 reveal a canonical monomeric two EF-hand assembly with a regulatory function. J. Mol. Biol. 336, 1141–1157 (2004).
51. Higashiyama, T. & Takeuchi, H. The mechanism and key molecules involved in pollen tube guidance. Annu. Rev. Plant Biol. 66, 393–413 (2015).
52. Bold, H. C., Alexopoulos, C. J. & Delevoryas, T. Morphology of Plants and Fungi 5th edn (Harper and Row, 1987).
53. Zhao, W. et al. Outer dense fibers stabilize the axoneme to maintain sperm motility. J. Cell. Mol. Med. 22, 1755–1768 (2018).
54. Abraham, A. & Mathew, P. M. Cytological studies in the cycads: sex chromosomes in Cycas. Ann. Bot. 26, 261–266 (1962).
55. Ming, R., Bendahmane, A. & Renner, S. S. Sex chromosomes in land plants. Annu. Rev. Plant Biol. 62, 485–514 (2011).
56. Liao, Q. et al. The genomic architecture of the sex-determining region and sex-related metabolic variation in Ginkgo biloba. Plant J. 104, 1399–1409 (2020).
57. Jones, D. L. Cycads of the World: Ancient Plants in Today’s Landscape 2nd edn (Smithsonian Institution Press, 2002).
58. Crane, P. R. An evolutionary and cultural biography of ginkgo. Plants People Planet 1, 32–37 (2019).
59. Zhou, F. & Pichersky, E. More is better: the diversity of terpene metabolism in plants. Curr. Opin. Plant Biol. 55, 1–10 (2020).
60. Chen, F., Tholl, D., Bohlmann, J. & Pichersky, E. The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J. 66, 212–229 (2011).
61. Chen, R. et al. Adaptive innovation of green plants by horizontal gene transfer. Biotechnol. Adv. 46, 107671 (2020).
62. Ruffner, B. et al. Oral insecticidal activity of plant-associated pseudomonads. Environ. Microbiol. 15, 751–763 (2013).
63. Daborn, P. J., Waterfield, N., Silva, C. P., Au, C. P. Y. & Sharma, S. A single Photorhabdus gene, makes caterpillars floppy (mcf), allows Escherichia coli to persist within and kill insects. Proc. Natl Acad. Sci. USA 99, 10742–10747 (2002).
64. Péchy-Tarr, M. et al. Molecular analysis of a novel gene cluster encoding an insect toxin in plant-associated strains of Pseudomonas fluorescens. Environ. Microbiol. 10, 2368–2386 (2008).
65. Sahu, S. K., Thangaraj, M. & Kathiresan, K. DNA extraction protocol for plants with high levels of secondary metabolites and polysaccharides without using liquid nitrogen and phenol. ISRN Mol. Biol. 2012, 205049 (2012).
66. Xie, T. et al. De novo plant genome assembly based on chromatin interactions: a case study of Arabidopsis thaliana. Mol. Plant 8, 489–492 (2015).
67. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
68. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
69. Chen, N. Using Repeat Masker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.11–14.10.14 (2004).
70. Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
71. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
72. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
73. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
74. McCarthy, E. M. & McDonald, J. F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19, 362–367 (2003).
75. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
76. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
77. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
78. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
79. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
80. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
81. Li, L. et al. The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nat. Ecol. Evol. 4, 1220–1231 (2020).
82. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
83. Laetsch, D. R. & Blaxter, M. L. KinFin: software for taxon-aware analysis of clustered protein sequences. G3 (Bethesda) 7, 3349–3357 (2017).
84. Abascal, F., Zardoya, R. & Telford, M. J. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38, W7–W13 (2010).
85. Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
86. Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinform. 19, 15–30 (2018).
87. Smith, S. A., Moore, M. J., Brown, J. W. & Yang, Y. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol. Biol. 15, 150 (2015).
88. Sayyari, E., Whitfield, J. B. & Mirarab, S. DiscoVista: Interpretable visualizations of gene tree discordance. Mol. Phylogenet. Evol. 122, 110–115 (2018).
89. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
90. Smith, S. A., Brown, J. W. & Walker, J. F. So many genes, so little time: A practical approach to divergence-time estimation in the genomic era. PLoS ONE 13, e0197433 (2018).
91. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
92. Morris, J. L. et al. The timescale of early land plant evolution. Proc. Natl Acad. Sci. USA 115, E2274–E2283 (2018).
93. Condamine, F. L., Rolland, J., Höhna, S., Sperling, F. A. & Sanmartín, I. Testing the role of the Red Queen and Court Jester as drivers of the macroevolution of Apollo butterflies. Syst. Biol. 67, 940–964 (2018).
94. Jiao, Y., Li, J., Tang, H. & Paterson, A. H. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell 26, 2792–2802 (2014).
95. Wu, S., Han, B. & Jiao, Y. Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol. Plant 13, 59–71 (2020).
96. Katoh, K., Kuma, K.-i, Toh, H. & Miyata, T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518 (2005).
97. Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
98. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
99. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
100. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
101. Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426 (1986).
102. Kendig, K. I. et al. Sentieon DNASeq variant calling workflow demonstrates strong computational performance and accuracy. Front. Genet. 10, 736 (2019).
103. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
104. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
105. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
106. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
107. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
108. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).

Acknowledgements
This study was supported by the Scientific Foundation of Urban Management Bureau of Shenzhen (No. 201916 to Yang Liu, No. 202019 to Shouzhou Zhang and No. 202105 to Y.G.), the National Key R&D Program of China (No. 2019YFC1711000 to Huan Liu), the Biodiversity Survey and Assessment Project of the Ministry of Ecology and Environment, China (No. 2019HJ2096001006 to Shouzhou Zhang and Yongbo Liu), the Major Science and Technology Projects of Yunnan Province (Digitalization, development and application of biotic resource, No. 860 202002AA100007 to Huan Liu) and Shenzhen Municipal Government of China (No. JCYJ20151015162041454 to Huan Liu). Y.V.d.P. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (No. 833522) and from Ghent University (Methusalem funding, BOF.MET.2021.0005.01). Plant illustrations were drawn by S. Li, Z. Li, D. Cui and X. Zeng. We are grateful to the Orchid Conservation and Research Centre of Shenzhen for allowing us to access their computing resources. We also acknowledge T. Wan (Fairy Lake Botanical Garden) and D. Stevenson (New York Botanical Garden), who kindly commented on an earlier draft of the manuscript, and T. Takaso (University of the Ryukyus), who provided the video for swimming sperm of Cycas. The study was supported by the National Cycad Conservation Center at Fairy Lake Botanical Garden. This work is part of the 10KP project (https://db.cngb.org/10kp/) and was also supported by China National GeneBank (CNGB; https://www.cngb.org/).

Author contributions
S.Z., H.L., X.G. and Y.L. led and managed the project. S.Z., H.L. and Yang Liu conceived the study. Yang Liu, S.W., L.L., S.D., T.W., J.M. and S. Wu wrote the manuscript. S.D., Y.G., X.F., A.J.L., Y.Y., X.G., D.L., N.L., H.W. and L.Y. prepared materials. S.W., L.L., T.Y., Yang Liu, J.R., J.W., S. Zaman, J.-Y.X., L.Z., J.C., Z.-Q.S., C.S., S.H., Na Li, M.L., G.F., H. Wang, J.Y., M. Lisby, S.K.S., W.M., Y.F., Y.C. and Z.Z. performed bioinformatics analysis. J.H., J.M., G.C. and P.L. performed horizontal gene transfer analysis. T.W., S.L., X.W. and X.L. performed SDR analysis. S.D., Yang Liu, Y.G., J.L., Y.Y. and Jianquan Liu performed gene family clustering and comparative phylogenomics. S. Wu, Y.V.d.P., Y.J., Z.-J.L. and Z.L. performed WGD analysis. P.S.S., Y.V.d.P., D.E.S., B.G., X.-Q.W., J.H., E.C.S., E.W. and M. Lisby contributed substantially to revisions. All authors read and approved the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41477-022-01129-7.
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41477-022-01129-7.
Correspondence and requests for materials should be addressed to Yang Liu, Yves Van de Peer, Douglas E. Soltis, Xun Gong, Huan Liu or Shouzhou Zhang.
Peer review information Nature Plants thanks James Clugston and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2022

1 State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, China.
2 Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen & Chinese Academy of Sciences, Shenzhen, China.
3 State Key Laboratory of Grassland Agro-Ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
4 State Environmental Protection Key Laboratory of Regional Eco-process and Function Assessment, Chinese Research Academy of Environmental Sciences, Beijing, China.
5 Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.
6 Key Laboratory of Plant Stress Biology, State Key Laboratory of Crop Stress Adaptation and Improvement, Henan University, Kaifeng, China.
7 Department of Biology, East Carolina University, Greenville, NC, USA.
8 College of Biology and Environment, Nanjing Forestry University, Nanjing, China.
9 College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China.
10 Nanning Botanical Garden, Nanning, China.
11 School of Life Sciences, Sun Yat-sen University, Guangzhou, China.
12 Sichuan Cycas panzhihuaensis National Nature Reserve, Panzhihua, China.
13 Global Biodiversity Conservancy, Chonburi, Thailand.
14 Department of Entomology, China Agricultural University, Beijing, China.
15 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA.
16 Guangdong Provincial Key Laboratory for Plant Epigenetics, Longhua Institute of Innovative Biotechnology, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China.
17 Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark.
18 Shenzhen Agricultural Genome Research Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China.
19 College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, China.
20 State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China.
21 Chengdu University of Traditional Chinese Medicine, Chengdu, China.
22 Department of Plant Biotechnology and Bioinformatics, Ghent University, VIB UGent Center for Plant Systems Biology, Gent, Belgium.
23 College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China.
24 Hainan Institute of Zhejiang University, Sanya, China.
25 The College of Life Sciences, Sichuan University, Chengdu, China.
26 Key Laboratory of Orchid Conservation and Utilization of National Forestry and Grassland Administration at College of Landscape Architecture, Fujian Agriculture and Forestry University, Fuzhou, China.
27 State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China.
28 College of Life Sciences, South China Agricultural University, Guangzhou, China.
29 National Key Laboratory of Plant Molecular Genetics, Chinese Academy of Sciences Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China.
30 Department of Biology, University of Copenhagen, Copenhagen, Denmark.
31 Florida Museum of Natural History, University of Florida, Gainesville, FL, USA.
32 Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa.
33 Department of Biology, University of Florida, Gainesville, FL, USA.
34 These authors contributed equally: Yang Liu, Sibo Wang, Linzhou Li, Ting Yang, Shanshan Dong, Tong Wei, Shengdan Wu, Yongbo Liu.
✉e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]


Extended Data Fig. 1 | Genome features of C. panzhihuaensis. Outer ring: the 11 chromosomes, labeled Chr1 to Chr11. Inner rings 1–4 (from outside to inside): number of repeat elements (light purple); GC content (light blue; y-axis range 0.27–0.48); percentage of expressed bases (light blue; y-axis range 0–0.20); and gene number (light orange; y-axis range 0–30). Rings 1–4 use a 1-Mb sliding window. Inner ring 5 indicates miRNA locations across the genome. The blue lines in the center represent syntenic regions in the Cycas genome.


Extended Data Fig. 2 | Comparative analysis of C. panzhihuaensis. (a) Comparison of the longest 10% of introns and genes in representative land plants. The box plots show, in order, the minimum, first quartile (Q1), median, third quartile (Q3) and maximum values after excluding outliers. (b) Comparison of intron components across the selected plants.


Extended Data Fig. 3 | The chronogram of 90 vascular plant species inferred with MCMCTree based on 100 nuclear single-copy genes with concordant evolutionary histories. Twenty-five fossil calibrations and two secondary calibrations were used. Individual gene trees (1,569 nucleotide trees) were mapped onto the nuclear coalescent tree with Phyparts. The pie charts at each node show the proportion of genes in concordance (blue), in conflict (green, a single dominant alternative; red, all other conflicting trees) and without enough information (gray). Quartet support for the six internal branches I–VI is shown as bar charts in the left panel. Image courtesy of Zanqian Li and Xiaolian Zeng.


Extended Data Fig. 4 | Ancestral polyploidy events in extant gymnosperms. An example showing both phylogenomic and syntenic evidence supporting an ancestral polyploidy event in extant gymnosperms. According to the phylogenetic trees, four pairs of paralogous genes in OG0000093, OG0000255, OG00000276 and OG0000316 were duplicated after the split between angiosperms and gymnosperms but before the divergence of gymnosperms. These pairs of duplicated genes are located on the same syntenic block identified in the C. panzhihuaensis genome. The abbreviated name given before the protein ID represents the species name: CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, ELO: Encephalartos longifolius, SEGI: Sequoiadendron giganteum, GMON: Gnetum montanum, PICABI: Picea abies, PITA: Pinus taeda.


Extended Data Fig. 5 | The phylogeny of LAFL (NF-YB, ABI3, FUS3 and LEC2) transcriptional regulators. (a) Phylogenetic tree of NF-YB. The tree was constructed using the maximum likelihood method with 500 bootstrap replicates; bootstrap values are shown on the branches. (b) Phylogenetic tree of the B3 domain-containing gene family of C. panzhihuaensis. Bootstrap values are shown on the branches. (c) Transcript expression levels (TPM) during seed development. The phylogenetic trees were built using RAxML (branch support estimated with 500 bootstrap replicates) with the PROTGAMMAGTRX amino acid substitution model. The abbreviated name given before the protein ID represents the species name: CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, SEGI: Sequoiadendron giganteum, GMON: Gnetum montanum, PICABI: Picea abies, PITA: Pinus taeda, ATH: Arabidopsis thaliana, DEBAO: Cycas debaoensis, AMTR: Amborella trichopoda, OS: Oryza sativa, AFILI: Azolla filiculoides, SACU: Salvinia cucullata, SELMO: Selaginella moellendorffii, PPATEH: Physcomitrella patens, MARPO: Marchantia polymorpha.

Extended Data Fig. 6 | Phylogenetic trees of the CESA/CSL gene families. (a) Phylogenetic tree of the CESA and CSL gene families. (b) Phylogenetic tree of CSLB and CSLH genes. (c) Phylogenetic tree of CSLE and CSLG genes. The gymnosperm CSLE/G genes represent the ancestral form of angiosperm CSLE and CSLG. The phylogenetic trees were generated using RAxML with the PROTCATGTR model and 500 bootstrap replicates. Bootstrap values ≥50% are shown. The abbreviation preceding each protein ID denotes the species: CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, SEGI: Sequoiadendron giganteum, GMON: Gnetum montanum, PICABI: Picea abies, PITA: Pinus taeda, ATH: Arabidopsis thaliana, DEBAO: Cycas debaoensis, AMTR: Amborella trichopoda, OS: Oryza sativa.

Extended Data Fig. 7 | The evolution of flagella-related genes in Embryophyta. (a) Sketch of the Cycas sperm. (b) Schematic diagram of flagellum loss events in the green lineage. (c) Distribution of outer dense fiber protein and other key flagellar proteins across representative embryophytes.

Extended Data Fig. 8 | The phylogeny and expression levels of TPS genes. (a) Phylogenetic tree of the TPS gene family, constructed by maximum likelihood in RAxML with the PROTCATGTR amino acid substitution model and 500 bootstrap replicates. Bootstrap values ≥50% are shown on the central branches. Cycas genes are shown in red. (b) Heatmap of TPS gene expression in different tissues of C. panzhihuaensis. Asterisks denote C. panzhihuaensis-specific TPS genes.

Extended Data Fig. 9 | Two MADS-box transcription factor genes differentially expressed in reproductive organs of C. panzhihuaensis. (a) Heatmap of 1,971 genes differentially expressed between male and female organs. Arrows indicate CYCAS_034085 on the MSY and CYCAS_010388 on chromosome 2. (b) Expression of CYCAS_034085 (on the MSY) and CYCAS_010388 (on chromosome 2) in the male microsporophyll and in the ovule.
