The Cycas Genome and The Early Evolution of Seed Plants
65 authors, including:
All content following this page was uploaded by Sunil Kumar Sahu on 19 April 2022.
Cycads represent one of the most ancient lineages of living seed plants. Identifying genomic features uniquely shared by cycads
and other extant seed plants, but not non-seed-producing plants, may shed light on the origin of key innovations, as well as the
early diversification of seed plants. Here, we report the 10.5-Gb reference genome of Cycas panzhihuaensis, complemented by
the transcriptomes of 339 cycad species. Nuclear and plastid phylogenomic analyses strongly suggest that cycads and Ginkgo
form a clade sister to all other living gymnosperms, in contrast to mitochondrial data, which place cycads alone in this position.
We found evidence for an ancient whole-genome duplication in the common ancestor of extant gymnosperms. The Cycas
genome contains four homologues of the fitD gene family that were likely acquired via horizontal gene transfer from fungi, and
these genes confer herbivore resistance in cycads. The male-specific region of the Y chromosome of C. panzhihuaensis contains
a MADS-box transcription factor expressed exclusively in male cones that is similar to a system reported in Ginkgo, suggesting
that a sex determination mechanism controlled by MADS-box genes may have originated in the common ancestor of cycads and
Ginkgo. The C. panzhihuaensis genome provides an important new resource of broad utility for biologists.
Cycads are often referred to as ‘living fossils’; they originated in the mid-Permian and dominated terrestrial ecosystems during the Mesozoic, a period called the ‘age of cycads and dinosaurs’1. Although the major cycad lineages are ancient, modern cycad species emerged from several relatively recent diversifications2,3. Cycads are long-lived woody plants that, unlike other extant gymnosperms, bear frond-like leaves clustered at the tip of the stem4. Extant cycads comprise 10 genera and approximately 360 species, two-thirds of which are on the International Union for Conservation of Nature Red List of threatened species5. All living cycad species are dioecious, with individual plants developing either male or female cones (except in Cycas, which produces a loose cluster of megasporophylls rather than a true female cone; Fig. 1a)6. Unlike other extant seed plants, cycads and Ginkgo retain flagellated sperm, an ancestral trait shared with bryophytes, lycophytes and ferns7. Cycads exhibit other special features, such as the accumulation of toxins that deter herbivores8 in seeds and vegetative tissues. They also produce coralloid roots that host symbiotic cyanobacteria, making them the only gymnosperm associated with nitrogen-fixing symbionts9. The origin of the seed marked one of the most important events of plant evolution10. As one of the four extant gymnosperm groups (cycads, Ginkgo, conifers and gnetophytes), cycads hold an important evolutionary position for understanding the origin and early evolution of seed plants. We therefore generated a high-quality genome assembly for a species of Cycas to explore fundamental questions in seed plant evolution, including the phylogenetic position of cycads, the occurrence of ancient whole-genome duplications (WGDs), innovation in gene function and the evolution of sex determination.

A chromosome-scale genome assembly
Here, we report a high-quality, chromosome-level genome assembly of Cycas panzhihuaensis based on sequencing of the haploid megagametophyte using a combination of MGI-SEQ short-read, Oxford Nanopore long-read and Hi-C sequencing methods (Supplementary Note 2). The genome comprises 10.5 Gb assembled in 5,123 contigs (N50 = 12 Mb), with 95.3% of these contigs anchored to the largest 11 pseudomolecules, corresponding to the 11 chromosomes (n = 11) of the C. panzhihuaensis karyotype11 (Supplementary Note 3 and Extended Data Fig. 1). The annotated genome describes
32,353 protein-coding genes and is mostly composed of repetitive elements adding up to 7.8 Gb (Supplementary Note 4). Based on BUSCO12 estimation, the gene space completeness of the C. panzhihuaensis genome assembly is 91.6% (Supplementary Note 4).
Compared with other gymnosperms, the size of the Cycas genome is similar to that of Ginkgo (10.6 Gb)13,14 and intermediate between the relatively compact genome of Gnetum (4.1 Gb)15 and the very large genomes of conifers (for example, ~20-Gb genomes of Picea and Pinus)16–18. As in other gymnosperm genomes, a large portion (76.14%) of the C. panzhihuaensis genome consists of ancient repetitive elements (Supplementary Note 4). In addition, the genome contains almost equal proportions of copia and gypsy long terminal repeat (LTR) elements, in contrast to other gymnosperm genomes, in which gypsy repeats are more frequent14,15 (Supplementary Note 6). Among all sequenced plant genomes, C. panzhihuaensis has the longest average introns (~30.8 kb) and genes (~121.3 kb) (Extended Data Fig. 2a), surpassing those of Ginkgo14. In comparison with Ginkgo, in which LTRs dominate intron content, the introns of C. panzhihuaensis contain a large portion of unknown sequences (Extended Data Fig. 2b). The longest gene, CYCAS_013063, encoding a kinesin-like protein KIF3A, covers 2.1 Mb in the C. panzhihuaensis genome; the longest intron is approximately 1.5 Mb and was detected in CYCAS_030563, a gene that encodes a photosystem II CP43 reaction centre protein. Both genes are expressed, as evidenced by our long-read transcriptome data.

Phylogeny of cycads and seed plants
The C. panzhihuaensis genome provides an opportunity to revisit the long-standing debate on the evolutionary relationships among living seed plants. On the basis of molecular phylogenetic analyses, extant gymnosperms are resolved as a monophyletic group, but the branching order among their major lineages has remained controversial19–23. Our phylogenetic analyses of separate nuclear (Fig. 1b, Extended Data Fig. 3 and Supplementary Note 5) and plastid datasets strongly support cycads plus Ginkgo as sister to the remaining extant gymnosperms, in agreement with several other analyses23,24, whereas mitochondrial data resolve cycads alone in that position (Fig. 1c). This conflict arising from the mitochondrial data cannot be explained by the presence of extensive RNA editing sites in the mitochondrial data (Fig. 1c), which in some cases has been reported to bias phylogenetic inferences25,26, and instead may be best explained by incomplete lineage sorting, which is supported by our PhyloNet27 and coalescent analyses of nuclear genes (Supplementary Note 5).
The extant diversity of cycads was previously considered to have arisen synchronously within the past 9–50 million years (Myr)2,3. Our inferences, based on 1,170 low-copy nuclear genes sampled for 339 cycad species and 6 fossil calibrations3, corroborate recent broad analyses of gymnosperms indicating that extant species-rich cycad genera emerged from rapid radiations ranging from 11 to 20 Myr ago, which may have been a consequence of dramatic Miocene global temperature changes24,28. Notably, major temperate and tropical radiations in several major clades of flowering plants have been shown to be associated with Miocene cooling in the past 15 Myr (refs. 29–31).

Cycas is an ancient polyploid
WGD is a major driving force in the evolution of land plants and has dramatically promoted the diversification of flowering plants23,32. Synonymous substitutions per synonymous site (KS) analysis of duplicate genes33 revealed a clear peak at similar KS values (~0.85, range 0.5–1.2) for both Cycas and Ginkgo, suggestive of an ancient WGD possibly shared by these two lineages (Supplementary Note 7)34. However, the precise evolutionary position of this WGD event remains ambiguous. Our phylogenomic analyses based on 15 genomes and 1 transcriptome revealed 2,469 gymnosperm-wide duplications in 9,545 gene families and indicate that this WGD event dates to the most recent common ancestor (MRCA) of extant gymnosperms (Fig. 2a), supporting recent findings based on transcriptome data24. We also identified 69 ancient syntenic genomic segments that further support a gymnosperm-wide WGD (Extended Data Fig. 3, Supplementary Fig. 23 and Supplementary Tables 24 and 25). Furthermore, a mixed dataset with increased sampling (29 genomes and 61 transcriptomes) also yielded the same result (Fig. 2a and Extended Data Fig. 4). This gymnosperm-wide WGD, here named omega (ω), is independent of the WGD preceding the split between gymnosperms and angiosperms35 and may have contributed to the subsequent evolution of gymnosperm-specific genes involved in plant hormone signal transduction, biosynthesis of secondary metabolites, plant–pathogen interaction and terpenoid biosynthesis (Supplementary Note 7).

Ancestral gene innovation in the origin of the seed
The origin of seed plants is marked by the emergence of key traits including the seed, pollen and secondary growth of xylem and phloem36. Reconstruction of the evolution of gene families across the seed plant tree of life revealed that 663 orthogroups were gained and 368 expanded in the MRCA of extant seed plants compared with non-seed plants (Fig. 2b, node 1). Among these, 106 of the new orthogroups and 55 of the expanded orthogroups are associated with seed development in Arabidopsis37, including the regulation of development during early embryogenesis, seed dormancy and germination, and seed coat formation, as well as in immunity and stress response of the seed (Supplementary Note 6).
Genes of the LAFL family are well-known as core regulatory genes of seed development, including LEAFY COTYLEDON1 (LEC1), ABSCISIC ACID INSENSITIVE3 (ABI3), LEAFY COTYLEDON2 (LEC2) and FUSCA3 (FUS3), which encode master transcriptional regulators, interacting to form complexes that control embryo
Fig. 1 | Phylogenomic analyses of cycads and seed plants. a, Illustration of Cycas panzhihuaensis. b, Chronogram of seed plants on the basis of the SSCG-NT12 dataset inferred using MCMCTree. All branches are maximally supported by bootstrap values (ML) and posterior probabilities (ASTRAL). I, II, III, IV, V and VI indicate internal branches for which the pie charts depicting gene tree incongruence are complemented by histograms (lower panel) showing quartet support for the main topology (q1), the first alternative topology (q2) and the second alternative topology (q3). O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; T, Triassic; J, Jurassic; K, Cretaceous; Pg, Palaeogene; N, Neogene; Q, Quaternary; Ma, million years ago.
c, DiscoVista species tree analysis: rows correspond to the nine hypothetical groups tested (see Supplementary Note 5 for details) and columns correspond to the results derived from the use of different datasets and methods. SSCG, single-copy genes; LCG, low-copy genes; MT, mitochondrial genes; PT, plastid genes; AA, amino acid sequences; NT, nucleotide sequences; NT12, codon 1st + 2nd positions; ASTRAL, coalescent tree inference method using ASTRAL; CONCAT, maximum likelihood tree inferred with IQ-TREE based on concatenated datasets; STAG, species tree inference using software STAG with low-copy genes (one to four copies); Original, original organellar nucleotide sequences; RNA Editing, organellar genes with RNA editing sites modified. Strong support, the clade is reconstructed with a support value >95%. Weak support, the clade is reconstructed with a support value <95%. Weak rejection, the clade is not recovered, but the alternative topology is not in conflict if poorly supported branches (<85%) are collapsed. Strong rejection, the clade is not recovered, and the alternative topology is in conflict even when poorly supported branches (<85%) are collapsed. d, Diversification of Cycadales. The chronogram of 339 cycad species was inferred with MCMCTree based on 100 nuclear single-copy genes with concordant evolutionary histories. All illustrations were specifically created for this study (a high-resolution version is available at https://db.cngb.org/codeplot/datasets/public_dataset?id=PwRftGHfPs5qG3gE).
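The four support categories defined for panel c amount to a small decision rule, which can be sketched as follows. This is an illustrative reading of the caption only, not the DiscoVista implementation: the function name and arguments are hypothetical, and because the caption leaves a support value of exactly 95% undefined, the sketch assumes it counts as weak support.

```python
from typing import Optional

STRONG_SUPPORT = 95.0  # caption: strong support means support > 95%
COLLAPSE_BELOW = 85.0  # caption: branches with support < 85% are collapsed


def classify_clade(recovered: bool,
                   support: Optional[float] = None,
                   conflicts_after_collapse: Optional[bool] = None) -> str:
    """Assign one of the four categories used in Fig. 1c (hypothetical helper).

    recovered: whether the tested clade appears in the inferred tree.
    support: support value (%) of the clade, required when recovered.
    conflicts_after_collapse: whether the alternative topology still conflicts
        once branches below COLLAPSE_BELOW are collapsed; required when the
        clade is not recovered.
    """
    if recovered:
        if support is None:
            raise ValueError("support is required for a recovered clade")
        # Assumption: a value of exactly 95% is treated as weak support.
        return "strong support" if support > STRONG_SUPPORT else "weak support"
    if conflicts_after_collapse is None:
        raise ValueError("conflict status is required for an unrecovered clade")
    return "strong rejection" if conflicts_after_collapse else "weak rejection"


if __name__ == "__main__":
    print(classify_clade(True, support=100.0))
    print(classify_clade(True, support=88.0))
    print(classify_clade(False, conflicts_after_collapse=False))
    print(classify_clade(False, conflicts_after_collapse=True))
```

Deciding whether the alternative topology conflicts after collapsing is the substantive step; here it is simply passed in as a boolean.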
[Fig. 1 graphic. a, Illustration labels: primary root, megasporophylls. b, Taxa: Arabidopsis thaliana, Oryza sativa, Cinnamomum micranthum, Liriodendron chinense, Nymphaea colorata, Amborella trichopoda (angiosperms); Picea abies, Pinus taeda, Gnetum montanum, Sequoiadendron giganteum, Ginkgo biloba, Cycas panzhihuaensis (gymnosperms); internal branches I–VI. c, Legend: strong support, weak support, weak rejection, strong rejection. d, Genus labels include Stangeria, Microcycas, Zamia and Ceratozamia; axis, speciation rate.]
[Fig. 2 graphic. a, Species tree marking previously recognized WGDs and WGDs confirmed by this study. b, Species tree annotated with gain, loss, expansion and contraction of orthogroups. Taxon labels recovered include Pinus taeda, Picea abies, Gnetum montanum, Ginkgo biloba, Cycas panzhihuaensis (gymnosperms); Oryza sativa, Liriodendron chinense, Cinnamomum micranthum, Nymphaea colorata, Amborella trichopoda (angiosperms); Azolla filiculoides, Salvinia cucullata and Selaginella moellendorffii (outgroups).]
Fig. 2 | Ancient polyploidy events and evolution of gene families in seed plants. a, Inference of the number of gene families with duplicated genes
surviving after WGD events mapped on a phylogenetic tree depicting the relationships among 16 vascular plants included in this study. The number of gene
families with retained gene duplicates reconciled on a particular branch of the species tree are shown above the branch across the phylogeny (Methods).
Numbers in square brackets denote the number of gene families with duplicated genes also supported by synteny evidence. b, Evolutionary analyses and
phylogenetic profiles depicting the gains (light green), losses (light red), expansions (light yellow) and contractions (light blue) of orthogroups, according
to the reconstruction of the ancestral gene content at key nodes and the dynamic changes of the lineage-specific gene characteristics.
phytohormones were also more highly expressed in unpollinated ovules, indicating the higher demand for these hormones as agents of pathogen resistance in the unpollinated ovule. Gibberellin, which is reported to regulate integument development in the ovules of flowering plants39, accumulated in the late stage of the pollinated ovule in Cycas. We also found gene families related to integument development (for example, those involved in cutin, suberine and wax biosynthesis), with increased expression levels at the late stage of the pollinated ovule. Fertilized ovules accumulated a high level of abscisic acid and expressed the genes related to cell wall organization and biogenesis, indicating their activity in embryo development, seed coat formation, and seed maturation and dormancy40 (Supplementary Note 10.1–10.5).
Among genes related to seed development, the most notable is the cupin protein family, expanded in C. panzhihuaensis compared with all other green plants. Phylogenetic analysis revealed that the cupin family can be subdivided into two groups: the germin-like and seed storage protein (SSP)-encoding genes. Surprisingly, we identified a new type of gene encoding vicilin-like storage proteins in C. panzhihuaensis; this type appears to be homologous to the vicilin-like antimicrobial peptides (v-AMP) and is organized as a tandem gene array in the C. panzhihuaensis genome (Fig. 3c). These v-AMP homologues are mostly expressed in C. panzhihuaensis at the late stage of pollinated ovules and fertilized ovules, with expression gradually decreasing during embryogenesis, suggesting the potentially important role of v-AMP genes in seed development (Fig. 3d and Supplementary Note 10.6).

Secondary growth and cell wall synthesis
Secondary growth is also a major innovation of seed plants36, and it has been recognized from fossils of now-extinct progymnosperms, which predated the origin of seed plants36,41. Secondary phloem and xylem are produced by the activity of a bifacial vascular cambium (secondary meristem). We found that several genes that are known in angiosperms to regulate secondary growth, in the positioning of the xylem, or in xylem/phloem patterning, underwent obvious expansions in the MRCA of extant seed plants compared with non-seed plants, including the MYB family member ALTERED PHLOEM DEVELOPMENT (APL), WOL and BRASSINOSTEROID-INSENSITIVE LIKE 1 (BRL1) and BRL3. The APL gene is expressed in the phloem and cambium in vascular plants, and its encoded protein promotes phloem differentiation42. The expression of APL is regulated by WOL in the procambium43. The BRL1 and BRL3 genes encode brassinosteroid receptors that play major roles in xylem differentiation and phloem/xylem patterning in angiosperms44. Many copies of these genes were found to be highly expressed in cambium or apical meristem of C. panzhihuaensis (Supplementary Note 6).
Many gymnosperms are tall, woody plants with cell walls containing large quantities of cellulose, xyloglucan, glucomannan, homogalacturonans and rhamnogalacturonans45. In the cellulose synthase (CESA/CSL) superfamily46, we discovered the existence of putative ancestral cellulose synthase-like B/H (CSLB/H) and CSLE/G that are specifically shared by gymnosperms, and both gene groups originated before the divergence of CSLB and CSLH in angiosperms (Extended Data Fig. 6). Cycads have manoxylic wood, with a large pith, large amounts of parenchyma and relatively few tracheids, in contrast to most other gymnosperms, which have pycnoxylic wood, with small amounts of pith, cortex and parenchyma, and a greater density of tracheids4. The glycosyltransferase 77 (GT77) family, involved in the synthesis of rhamnogalacturonan II, which is essential for cell wall synthesis in rapidly growing tissues47, is expanded in C. panzhihuaensis compared with other gymnosperms (Supplementary Note 11). In addition, gene families related to cell wall extension and loosening are uniquely expanded in C. panzhihuaensis, including those encoding hydroxyproline-rich glycoproteins, which are seven times more abundant in Cycas than in Ginkgo, and the fasciclin-like arabinogalactan proteins, which are twice as numerous in Cycas as in Ginkgo, Sequoiadendron giganteum and Pseudotsuga menziesii. How all these gene families related to wood features are regulated in cycads relative to other gymnosperms will be important for understanding the differences in wood density.

The evolution of pollen, pollen tube and sperm
Another major innovation during seed plant evolution is the production of pollen and the pollen tube36. We found that many genes
[Fig. 3 graphic. a, Functional terms listed per co-expression module (M3–M11 visible) with relative expression across stages S1–S4. b, Phytohormone amounts across S1–S4 (visible panel titles: jasmonic acid-isoleucine, jasmonic acid). c, SSP phylogeny with clades GLP1–GLP8, v-SSP, l-SSP and v-AMP homologue; taxa colour-coded as angiosperm, gymnosperm, fern, lycophytes and bryophytes. d, Expression heatmap (log10(TPM)) across tissues: embryo, fertilized ovule, late stage of pollinated ovule, early stage of pollinated ovule, unpollinated ovule, megagametophyte, pollen sac (male cone), microsporophylls (male cone), apical meristem (stem), cambium (stem), pith (stem), cortex (stem), mature leaf, primary root, precoralloid root and coralloid root.]
Fig. 3 | Gene expression and phytohormone synthesis at different developmental stages of the seed of Cycas and the evolution of seed storage proteins.
a, Heatmap showing relative expression of genes in 11 co-expression modules by WGCNA across 4 developmental stages of the seed: S1, unpollinated
ovule; S2, early stage of pollinated ovule; S3, late stage of pollinated ovule; and S4, fertilized ovule. b, Quantification of eight phytohormones
in the same four developmental stages of the Cycas seed as above. The grey histogram represents the amount of hormone (n = 2 biologically independent
experiments) and the error bar represents the standard error. c, Phylogeny of SSPs in some representative species in land plants. The SSPs analysed
include germin-like protein (GLP), legumin-like SSP (l-SSP), vicilin-like SSP (v-SSP) and v-AMP. A maximum likelihood tree with 500 bootstrap replicates
was constructed using RAxML. Bootstrap values (≥50%) for each major clade (highlighted in colour) and the relationships among them are provided. The
Cycas sequences are highlighted in red. d, Expression levels of SSP in different tissues of C. panzhihuaensis.
regulating pollen and pollen tube development (pollen maturation, pollen tube growth, pollen tube perception and prevention of multiple-pollen tube attraction) were gained (or the respective gene family expanded) in the MRCA of extant seed plants (Fig. 2b), as might be predicted for these features. For instance, those genes encoding egg cell-secreted proteins that prevent attraction of multiple pollen tubes48 originated in the MRCA of living seed plants. The Ole e 1-like gene families, which encode proteins that accumulate in the pollen tube cell wall and play a role in pollen germination and pollen tube growth49, are remarkably expanded in the MRCA of extant seed plants compared with non-seed plants (Supplementary Note 6). Such expansion also includes polcalcin, which is involved in calcium signalling to guide pollen tube growth50 (Supplementary Note 11). Both the COBRA and COBRA-like protein gene families are expanded in Cycas and other seed plants compared with non-seed plants, and the
[Fig. 4 graphic. a, Manhattan plot: x-axis, chromosome 8 (Mb); y-axis, −log10(P). b, Tracks of πmale/πfemale, FST and ΔHp along chromosome 8 (Mb). c, MSY scaffolds (Mb) aligned against chromosome 8 (Mb). e, Expression heatmap (log2(FPKM + 1)) of MADS-Y on MSY and CYCAS_010388 on autosome across stages BFm, Prophase, AFm and BFp, and 0, 7, 11 and 21 d. g, Genotyping of male (M) and female (F) samples of Cycas debaoensis, Macrozamia lucida and Zamia furfuracea for MADS-Y on MSY and CYCAS_010388 on autosome.]
Fig. 4 | Identification of male-specific chromosomal region in Cycas. a, Manhattan plot of GWAS analysis of sex differentiation in 31 male and 31
female Cycas samples. The red horizontal dashed line represents the Bonferroni-corrected threshold for genome-wide significance (α = 0.05). P values
were calculated from a mixed linear model association of SNPs. Association analyses were performed once with a population of 31 male and 31 female
individuals. b, Ratio of π, FST and difference of pooled heterozygosity (ΔHp) within a 100-kb sliding window between the female and male sequences.
Colour represents values from low (blue) to high (red). c, Genome alignment of the MSY scaffolds with the corresponding female-specific region on
chromosome 8. Scaffolds are separated by grey dashed lines. Red lines represent alignments >5 kb on the forward strand, and blue lines represent those
on the reverse strand. Pink boxes in a–c represent the most differentiated regions between the sex chromosomes. d, Photographs of microsporophyll
and megasporophyll of C. panzhihuaensis. Bar, 1 cm. e, Sex-specific expression of MADS-Y (CYCAS_034085) and CYCAS_010388 in male and female
reproductive organs. Microsporophyll tissues were collected before meiosis (BFm), during prophase (Prophase), after meiosis (AFm) and before
pollination (BFp); female tissues were collected at 0, 7, 11 and 21 days post-pollination. f, Phylogeny of MADS-Y homologues across land plants. Genes from
MSY and autosomes are marked on the right, and those from Selaginella and Physcomitrium are used as outgroups. Numbers above branches represent
bootstrap scores from IQ-TREE. g, Molecular genotyping of male and female cycad samples from Cycas debaoensis, Macrozamia lucida and Zamia furfuracea
using primers specific to homologues of MADS-Y and CYCAS_010388.
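The genome-wide significance line in panel a follows from a Bonferroni correction at α = 0.05. A minimal sketch of that arithmetic is given below; the number of tested SNPs is not stated in this section, so `n_snps` is a purely hypothetical value, and the function names are illustrative rather than taken from any GWAS package.

```python
import math
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return the SNP identifiers whose P value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < cutoff]


if __name__ == "__main__":
    n_snps = 1_000_000  # hypothetical number of tested SNPs
    cutoff = bonferroni_threshold(0.05, n_snps)
    # Height of the dashed line on a Manhattan plot's -log10(P) axis.
    print(cutoff, -math.log10(cutoff))

    toy = {"snp_a": 1e-12, "snp_b": 0.02, "snp_c": 0.5}  # hypothetical P values
    print(significant_snps(toy))
```

With three toy tests the cutoff is 0.05/3, so only the first two SNPs pass; in a real scan the cutoff is far stricter because the number of tests is the number of SNPs.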
COBRA-like protein localizes at the tip of the pollen tube membrane and plays an important role in pollen tube growth and guidance51 (Supplementary Note 11).
All seed plants produce pollen and deliver their sperm through the growth of a pollen tube, whereas all non-seed land plants (that is, bryophytes, lycophytes and ferns) rely on free-swimming motile sperm for sexual reproduction, as do the ancestors of land plants1,4 (Extended Data Fig. 7a,b). The exceptions among seed plants are cycads and Ginkgo, both of which have pollen grains that release motile spermatozoids that, following pollination, swim the remaining minute distance within the ovule to fertilize the egg52 (Supplementary Video 1). Sperm motility is conferred by a flagellar apparatus, and most genes related to its assembly occur in the C. panzhihuaensis genome. Ginkgo also retains flagellar genes, although fewer, and most notably lacks those encoding radial spoke proteins (RSP) (that is, RSP2, RSP3, RSP9 and RSP11; Extended Data Fig. 7c). By contrast, Gnetum, conifers and angiosperms, which develop non-flagellated spermatozoa, lost many flagellar
[Fig. 5 graphic. a, Maximum likelihood tree with bacterial clades (CFB group, actinobacteria, cyanobacteria (outgroups), proteobacteria, firmicutes), fungi (Neonectria ditissima, Epichloe typhina subsp. poae, Lentinula edodes) and a Cycas clade containing CYCAS_004376, CYCAS_004918, CYCAS_004373, CYCAS_004375 and three gene models labelled DEBAO. b, Expression heatmap (log10(TPM)) of CYCAS_004376, CYCAS_004918, CYCAS_004373 and CYCAS_004375 across tissues. c,d, Mortality (%) of Plutella xylostella and Helicoverpa armigera under PBS and cytotoxin treatment. e,f, Photographs of Plutella xylostella and Helicoverpa armigera; scale bars, 500 µm.]
Fig. 5 | Origin of a Cycas insecticidal protein. a, Phylogenetic analysis of the TcdA/TcdB pore-forming domain containing proteins shows that the genes
encoding four cytotoxin proteins of Cycas were likely acquired from fungi through an ancient horizontal gene transfer event. The maximum likelihood
tree was generated by RAxML with the PROTCATGTR model and 1,000 bootstrap replicates. The numbers above the branches are bootstrap support
values. b, The expression level of four cytotoxin proteins in different tissues of C. panzhihuaensis. The digital expression values were normalized using the
TPM method. c,d, Mortalities of Plutella xylostella (c) and Helicoverpa armigera (d) after treatment with phosphate buffered saline (PBS) and cytotoxin.
The asterisk indicates a significant difference (two-sided Student’s t-test, P < 0.05, n = 3 biologically independent experiments), whereas the error bar
represents the standard error. e,f, Morphologies of Plutella xylostella (e) and Helicoverpa armigera (f) after receiving PBS and cytotoxin treatments.
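The comparison reported in panels c and d (two-sided Student's t-test, n = 3 per treatment, standard-error bars) can be reproduced in outline as below. This is a sketch under stated assumptions: the mortality values are invented for illustration and are not the study's measurements, the helper names are hypothetical, and significance is judged against the tabulated two-sided 5% critical value for 4 degrees of freedom rather than by computing an exact P value.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

# Two-sided 5% critical value of Student's t with 4 degrees of freedom,
# which is the case for two groups of n = 3.
T_CRIT_DF4 = 2.776


def standard_error(x: Sequence[float]) -> float:
    """Standard error of the mean, using the sample (n - 1) variance."""
    return math.sqrt(variance(x) / len(x))


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Equal-variance (pooled) two-sample t statistic and degrees of freedom.

    Raises ZeroDivisionError if both groups have zero variance.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se_diff, df


if __name__ == "__main__":
    pbs = [10.0, 12.0, 14.0]        # hypothetical mortality (%), control
    cytotoxin = [70.0, 80.0, 90.0]  # hypothetical mortality (%), treated
    t, df = student_t(cytotoxin, pbs)
    print(f"t = {t:.3f}, d.f. = {df}, significant = {abs(t) > T_CRIT_DF4}")
    print(f"SE(PBS) = {standard_error(pbs):.3f}")
```

With only three replicates per group the critical value is large, so the test rejects only when the difference in means is several times its standard error.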
structural genes (Supplementary Note 12). Outer dense fibres are unique accessory structures that maintain the structural integrity
of flagella and are vital for flagellar function53. Outer dense fibres exist in C. panzhihuaensis and Ginkgo biloba, as well as all
non-seed land plants, but are absent in Gnetum, conifers and angiosperms, all of which have non-motile sperm (Extended Data Fig. 7c).
The shift from swimming to non-motile sperm is a major innovation in land plant evolution, and C. panzhihuaensis and G. biloba
exhibit an ancestral gene content that is part of the shift from producing flagellate to non-flagellate sperm cells.

Sex chromosomes and sex determination in Cycas
Heteromorphic chromosomes have been reported to be associated with sex determination in Cycas54. To reveal the underlying genetic
mechanism of sex determination, we carried out a genome-wide association study (GWAS) of sex as a binary phenotype for
C. panzhihuaensis and identified the most significant association signals on chromosome 8, spanning the first 124 Mb on the
reference female genome (Fig. 4a). This sex-associated region is also the most differentiated between male and female Cycas genomes,
with the largest fixation index (FST; Supplementary Fig. 37) and the most differentiated nucleotide diversity (π) and heterozygosity
ratios characterizing the window between 18 and 50 Mb on chromosome 8 (Fig. 4b and Supplementary Note 13). These results confirm
that Cycas possesses an XY sex determination system positioned on chromosome 8.
Assembling the male-specific region of the Y chromosome (MSY) based on Nanopore long-read and Hi-C data resulted in 45.5 Mb of
sequence distributed over 43 scaffolds, most of which aligned to the sex-differentiation region on chromosome 8 (Fig. 4c and
Supplementary Fig. 38). The assembled MSY had an almost 80-Mb difference in length from the corresponding region on the X chromosome,
which agrees with the heteromorphy of the Cycas sex chromosomes. We annotated 624 putative protein-coding genes within the MSY,
11 of which were highly expressed (transcripts per million (TPM) > 1) in the microsporophylls. The most highly expressed gene in
the MSY and also the most differentially regulated gene between the two sexes is CYCAS_034085 (Fig. 4d,e and Extended Data Fig. 8),
which encodes a GGM13-like MADS-box transcription factor (TF), belonging to a lineage sister to the angiosperm AP3/PI clade that
plays crucial roles in floral development. Its closest homologue, CYCAS_010388, was identified on autosomal chromosome 2. In contrast
to CYCAS_034085, CYCAS_010388 was much more highly expressed in the ovule than in the microsporophyll (Fig. 4e). A male-specific
polymerase chain reaction (PCR) product of CYCAS_034085 was amplified from all tested male cycad samples, but was not detected in
female samples, whereas a CYCAS_010388-specific PCR product was amplified in both males and females (Fig. 4g and Supplementary
Fig. 39b). Because of its presence in the MSY and its exclusive expression pattern in males, we named CYCAS_034085 MADS-Y, a
potential sex determination gene.
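As an illustration of why a sex-linked region stands out in a male-versus-female FST scan such as the one on chromosome 8, the following minimal sketch computes a windowed fixation index from per-site allele frequencies. It assumes Hudson's estimator combined across sites as a ratio of sums; the estimator, the window size, the toy allele frequencies and the function names are illustrative assumptions and are not taken from this study. Only the sample sizes (31 males and 31 females, that is, 62 sampled chromosomes per sex) follow the text.

```python
import math


def hudson_terms(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic site.

    p1, p2: alternate-allele frequencies in the two groups
    n1, n2: number of sampled chromosomes in each group
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(sites, window_size):
    """Windowed FST as a ratio of summed numerators and denominators.

    sites: iterable of (position, p_male, p_female, n_male, n_female)
    Returns a dict mapping window start coordinate -> FST.
    """
    sums = {}
    for pos, p_m, p_f, n_m, n_f in sites:
        start = (pos // window_size) * window_size
        num, den = hudson_terms(p_m, p_f, n_m, n_f)
        acc = sums.setdefault(start, [0.0, 0.0])
        acc[0] += num
        acc[1] += den
    # Windows containing only monomorphic sites have a zero denominator.
    return {s: (n / d if d > 0 else math.nan) for s, (n, d) in sums.items()}


# Toy data (hypothetical): a Y-linked variant is heterozygous in every male
# (frequency 0.5) and absent in females; an autosomal-like variant has the
# same frequency in both sexes.
toy_sites = [
    (100, 0.5, 0.0, 62, 62),        # sex-linked pattern
    (1_200_000, 0.3, 0.3, 62, 62),  # no differentiation
]
fst_by_window = windowed_fst(toy_sites, window_size=1_000_000)
for start, fst in sorted(fst_by_window.items()):
    print(start, round(fst, 4))
```

In this toy input the window holding the male-heterozygous site reaches an FST of roughly 0.49, whereas the undifferentiated window is close to zero (the estimator can be slightly negative), which is the kind of contrast a sex-linked region is expected to produce.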
The reduced size of the MSY compared with the X chromosome indicates that the Y chromosome of Cycas, unlike that reported for some
angiosperms55, underwent severe degeneration and gene loss. The most divergent 32-Mb region (between the 18 and 50 Mb locations)
between the X and Y chromosomes probably represents an ancient evolutionary segment in the Cycas sex chromosomes. The broad
association of the MADS-Y homologue with sex in cycads indicates a conserved sex determination system within this ancient lineage
(Fig. 4f and Supplementary Fig. 39). Moreover, the presence of GbMADS4, a homologue of the Cycas MADS-Y, in Ginkgo male-specific
contigs56 suggests that the same mechanism for sex determination might have originated before the split of cycads and Ginkgo, thus
representing an ancient system of sex determination in seed plants.

Evolution of disease and herbivore resistance genes
All three types of immune receptors, namely CC-NBS-LRR (CNL), TIR-NBS-LRR (TNL) and RPW8-NBS-LRR (RNL), show patterns of expansion
in C. panzhihuaensis and other gymnosperms, compared with non-seed plants (Supplementary Note 14). CNLs are expanded widely in both
gymnosperms and angiosperms, whereas the TNL family tends to have been more expanded in gymnosperms than in most angiosperms,
indicating different evolutionary patterns of plant resistance (R) genes in these two lineages. Our data suggest that RNL genes
occur widely in gymnosperms. The RNL family plays a critical role in downstream resistance signal transduction in angiosperms, and
the broad occurrence of the RNL family in gymnosperms suggests that this signalling pathway may have been established no later than
the origin of seed plants. Gene families encoding resistance-related proteins are greatly expanded in C. panzhihuaensis and other
gymnosperm genomes compared with non-seed plants (Supplementary Note 14). For example, genes encoding endochitinases and chitinases
as defences against chitin-containing fungal pathogens are expanded as tandem repeats in the C. panzhihuaensis and most gymnosperm
genomes compared with other land plants.
Cycads comprise many more living species57 than Ginkgo, which was once diverse in the Mesozoic but includes only one extant
species58. One possible explanation is that cycads may have acquired enhanced resistance to pathogens and herbivores through
encoding diversified resistance-related genes and the biosynthesis of diversified secondary compounds4,8. Indeed, comparisons of the
Cycas and Ginkgo genomes reveal many Cycas-specific orthogroups enriched in pathogen interaction pathways (Supplementary Note 14),
and C. panzhihuaensis also shows remarkable expansions in plant immunity and stress response gene families compared with Ginkgo,
including genes that encode programmed cell death, abiotic stress response, serine protease inhibitors against pests and ginkbilobin
with antibacterial and antifungal activities (Supplementary Note 14).
Terpenoids are a diverse group of secondary metabolites encoded by terpene synthase (TPS) genes59. Several TPS subfamilies (TPS-a to
TPS-h) are known in plants60, among which the TPS-d family is unique to gymnosperms, and three of the four types of TPS-d were found
in C. panzhihuaensis, with remarkable expansions of TPS-d2 compared with Ginkgo and most other gymnosperms (Supplementary Note 15).
In addition, we identified a novel TPS subfamily in Cycas, with three copies in C. panzhihuaensis and eight copies in Cycas
debaoensis (Extended Data Fig. 9a). The gene expression levels of all TPS genes across different C. panzhihuaensis tissues (Extended
Data Fig. 9b) reveal that many TPS genes are mainly expressed in the root (especially primary root and coralloid root),
microsporophyll and pollen sac, late stage of the pollinated ovule and fertilized ovule. The three Cycas-specific TPS genes were
mainly expressed in the root and male cone, but one of them (CYCAS_009486) is particularly highly expressed in the megagametophyte
and in the post-pollination and fertilized ovule.

Cycas obtained a cytotoxin defence gene via horizontal gene transfer
Genes of fungal or bacterial origin are rare in seed plants61. However, we identified a gene family in the C. panzhihuaensis genome
that appears to have been acquired from a microbial organism and that codes for a Pseudomonas fluorescens insecticidal toxin (fitD).
The acquired genes are flanked by vertically inherited plant sequences. We further confirmed that the relevant assembled regions
were free of bacterial contamination. Transcriptomes and PCR amplification from genomic DNA indicated that these genes occur in many
Cycas species (Supplementary Note 16). The fitD gene family comprises four gene copies in the C. panzhihuaensis genome and three
copies in the C. debaoensis genome (Supplementary Table 51); each copy encodes a protein that is similar to the fit toxin and the
‘makes caterpillars floppy’ (mcf) toxin of the bacterium Photorhabdus luminescens, a lethal pathogen of insects. Both fit and mcf
toxins are known for their insecticidal properties, and fit- or mcf-producing bacteria are often used in pest biocontrol62–64.
Phylogenetic analyses suggest that the fitD genes might have been acquired from fungi and then expanded before the divergence of
C. panzhihuaensis and C. debaoensis (Fig. 5a). The fitD family genes are mainly expressed in roots, reproductive tissues such as male
cones, unpollinated or early stages of pollinated ovules and embryos (Fig. 5b). Injection of the synthesized C. panzhihuaensis fitD
protein resulted in significantly higher mortality in larvae of both the diamondback moth (Plutella xylostella) and cotton bollworm
(Helicoverpa armigera) (Fig. 5c,d). The acquisition of the fitD gene family may have provided an important defence for Cycas against
insect pests.

Conclusions
The high-quality genome sequence for Cycas, the last major lineage of seed plants for which a high-quality genome assembly was
lacking, closes an important gap in our understanding of genome structure and evolution in seed plants. This genome enables
comparative genomics and phylogenomic analyses to unravel the genetic control of important traits in cycads and other gymnosperms,
including a WGD shared by gymnosperms, a sex determination mechanism that appears to be shared by cycads and Ginkgo, and critical
gene innovations including those that enable seed and pollen tube formation, as well as chemical defence.

Methods
Plant materials. Fresh megagametophytes of Cycas panzhihuaensis, cultivated in the garden of the Kunming Institute of Botany, Chinese
Academy of Sciences, were collected for genome sequencing. The plant was originally transplanted from the Pudu River, Luquan county,
Yunnan, China (25° 57′ 35.2584″ N, 102° 43′ 41.5848″ E) and the voucher specimen (collection number: PZHF03) has been deposited in
the Herbarium of the Kunming Institute of Botany (KUN). For transcriptome sequencing, we sampled 12 different types of organs and
tissues from C. panzhihuaensis, including megagametophyte, pollen sac, microsporophylls, apical meristem of stem, cortex of stem,
pith of stem, cambium of stem, mature leaf, young leaf, primary root, precoralloid roots and coralloid roots (Supplementary
Table 2). Ovule material was collected from two artificially pollinated individuals, and we divided the development stages into
four: unpollinated ovule (before the artificial pollination), early stage of pollinated ovule (21 d after the artificial
pollination), late stage of pollinated ovule (88 d after the artificial pollination) and fertilized ovule or seed (119 d after the
artificial pollination) (Supplementary Tables 2 and 19). In addition, stem and root tissues of C. panzhihuaensis were used to
generate full-length transcriptomes (Supplementary Table 2). For phylogenomic analyses, we newly generated transcriptomes of 47
gymnosperms (Supplementary Tables 2 and 13). We also sequenced transcriptomes of 339 cycad species (Supplementary Tables 2 and 14).
For population resequencing, fresh leaf samples were collected for 31 male and 31 female plants that were randomly sampled in the
Cycas panzhihuaensis National Natural Reserve in Sichuan, China, where there is a population of approximately 38,000
C. panzhihuaensis individuals (Supplementary Table 4).
DNA and RNA sequencing. For genome sequencing, the genomic DNA was extracted with the QIAGEN Genomic kit following the manufacturer’s
instructions65. Nanodrop and Qubit (Invitrogen) were used to quantify the DNA. Nanopore libraries were prepared with SQK-LSK108 and
sequenced using
Extended Data Fig. 1 | Genome features of C. panzhihuaensis. Outer ring: the 11 chromosomes, labelled Chr1 to Chr11. Inner rings 1-4 (from
outside to inside): number of repeat elements (light purple); GC content (light blue; y-axis min-max: 0.27–0.48); expressed base
percentage (light blue; y-axis min-max: 0–0.20); gene number (light orange; y-axis min-max: 0–30). The sliding window for inner
rings 1-4 is 1 Mb. Inner ring 5 indicates the locations of miRNAs across the genome. The blue lines inside represent the syntenic regions in Cycas.
Extended Data Fig. 2 | Comparative analysis of C. panzhihuaensis. (a) Comparison of
the longest 10% of introns and genes in representative land plants. The box plots show, in order, the minimum, first quartile (Q1), median,
third quartile (Q3) and maximum values after excluding outliers. (b) Comparison of intron components across the selected plants.
Extended Data Fig. 3 | The chronogram of 90 vascular plant species inferred with MCMCTree based on 100 nuclear single-copy genes with concordant
evolutionary histories. 25 fossil calibrations and 2 secondary calibrations were used. Individual gene trees (1,569 NT trees) were mapped on the nuclear
coalescent tree with Phyparts. The pie charts at each node show the proportion of genes in concordance (blue), conflict (green = a single dominant
alternative; red = all other conflicting trees), and without enough information (gray). Quartet support for the six internal branches I, II, III, IV, V and VI is
indicated in the left panel as bar charts. Image courtesy of Zanqian Li and Xiaolian Zeng.
Extended Data Fig. 4 | Ancestral polyploidy events in extant gymnosperms. Example showing both the phylogenomic and syntenic evidence supporting
an ancestral polyploidy event in extant gymnosperms. Four pairs of paralogous genes in OG0000093, OG0000255, OG00000276 and OG0000316
were duplicated before the divergence of gymnosperms and after the split of angiosperms and gymnosperms based on phylogenetic trees. These pairs
of duplicated genes are located on the same syntenic block identified in the C. panzhihuaensis genome. The abbreviated name given before the protein ID
represents species name: CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, ELO: Encephalartos longifolius, SEGI: Sequoiadendron giganteum, GMON: Gnetum
montanum, PICABI: Picea abies, PITA: Pinus taeda.
Extended Data Fig. 5 | The phylogeny of LAFL (NF-YB, ABI3, FUS3 and LEC2) transcriptional regulators. (a) Phylogenetic tree of NF-YB. The tree
was constructed using the maximum likelihood method with 500 bootstrap replicates. The bootstrap values are shown on the branches. (b) Phylogenetic
tree of the B3 domain-containing gene family of C. panzhihuaensis. Bootstrap values are shown on the branches. (c) Transcript expression level is
indicated by TPM during seed development. The phylogenetic trees were built using RAxML (estimating branch support values by bootstrap iterations
with 500 replicates) with PROTGAMMAGTRX amino acid substitution model. The abbreviated name given before the protein ID represents species name:
CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, SEGI: Sequoiadendron giganteum, GMON: Gnetum montanum, PICABI: Picea abies, PITA: Pinus taeda, ATH:
Arabidopsis thaliana, DEBAO: Cycas debaoensis, AMTR: Amborella trichopoda, OS: Oryza sativa, AFILI: Azolla filiculoides, SACU: Salvinia cucullata, SELMO:
Selaginella moellendorffii, PPATEH: Physcomitrella patens, MARPO: Marchantia polymorpha.
Extended Data Fig. 6 | Phylogenetic tree of CESA/CSL gene families. (a) Phylogenetic trees of CESA and CSL gene families. (b) Phylogenetic tree of
CSLB and CSLH genes. (c) The phylogenetic tree of CSLE and CSLG genes. The CSLE/G from gymnosperm are the ancestral form of the angiosperm CSLE
and CSLG. The phylogenetic trees were generated using RAxML with PROTCATGTR model and 500 bootstrap replicates. Bootstrap values ≥ 50% are
shown. The abbreviated name given before the protein ID represents species name: CYCAS: Cycas panzhihuaensis, Gb: Ginkgo biloba, SEGI: Sequoiadendron
giganteum, GMON: Gnetum montanum, PICABI: Picea abies, PITA: Pinus taeda, ATH: Arabidopsis thaliana, DEBAO: Cycas debaoensis, AMTR: Amborella
trichopoda, OS: Oryza sativa.
Extended Data Fig. 7 | Evolution of flagella-related genes in Embryophyta. (a) Sketch of the Cycas sperm. (b) Schematic diagram of flagellum loss
events in the green lineage. (c) Distribution of outer dense fibre protein and other key flagellar proteins across representative Embryophyta.
Extended Data Fig. 8 | The phylogeny and expression level of TPS. (a) Phylogenetic tree of the TPS gene family. The tree was constructed using
RAxML (the maximum-likelihood method) with PROTCATGTR amino acid substitution model and 500 bootstrap replicates. The bootstrap values ≥
50% are shown in the central branches. The red colors in the tree represent the Cycas genes. (b) Heatmap of TPS gene family expression in different tissues of C.
panzhihuaensis. The * denotes the C. panzhihuaensis-specific TPS genes.
Extended Data Fig. 9 | Two MADS-box transcription factor genes differentially expressed in reproductive organs of C. panzhihuaensis. (a) Heatmap of
1,971 genes differentially expressed in male and female organs. Arrows indicate CYCAS_034085 on the MSY and CYCAS_010388 on chromosome 2.
(b) Expression of CYCAS_034085 on MSY and CYCAS_010388 on chromosome 2 in male microsporophyll and in the ovule.