Copyedited by: YS MANUSCRIPT CATEGORY: Systematic Biology

Point of View
Syst. Biol. 69(6):1231–1253, 2020
© The authors 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact [email protected]
DOI:10.1093/sysbio/syaa026
Advance Access publication April 16, 2020

Repositories for Taxonomic Data: Where We Are and What is Missing


AURÉLIEN MIRALLES1,2, TEDDY BRUY1,2, KATHERINE WOLCOTT2,3, MARK D. SCHERZ4,5, DOMINIK BEGEROW6,
BANK BESZTERI7, MICHAEL BONKOWSKI8, JANINE FELDEN9,10, BIRGIT GEMEINHOLZER11, FRANK GLAW4,
FRANK OLIVER GLÖCKNER10, OLIVER HAWLITSCHEK4,12, IVAYLO KOSTADINOV13, TIM W. NATTKEMPER14,
CHRISTIAN PRINTZEN15, JASMIN RENZ16, NATALIYA RYBALKA17, MARC STADLER18, TANJA WEIBULAT13, THOMAS WILKE19,
SUSANNE S. RENNER2,∗, AND MIGUEL VENCES20
1 Departement Origins and Evolution, Institut Systématique, Evolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne
Université, EPHE, 57 rue Cuvier, CP50, 75005 Paris, France; 2 Systematic Botany and Mycology, University of Munich (LMU), Menzingerstraße 67, 80638
Munich, Germany; 3 National Museum of Natural History, Smithsonian Institution, Washington, DC, USA; 4 Department of Herpetology, Zoologische
Staatssammlung München (ZSM-SNSB), Münchhausenstraße 21, 81247 München, Germany; 5 Department of Biology, Universität Konstanz,
Universitätstraße 10, 78464 Konstanz, Germany; 6 Department of Geobotany, Ruhr-University Bochum, Universitätsstraße 150, 44780 Bochum, Germany;
7 Department of Phycology, Faculty of Biology, University of Duisburg-Essen, Universitätsstraße 2, 45141 Essen, Germany; 8 Department of Terrestrial
Ecology, Center of Excellence in Plant Sciences (CEPLAS), Terrestrial Ecology, Institute of Zoology, University of Cologne, 50674 Köln, Germany;
9 MARUM - Center for Marine Environmental Sciences, University of Bremen, Leobenerstraße 8, 28359 Bremen, Germany; 10 Alfred Wegener Institute -
Helmholtz Center for Polar- and Marine Research, Am Handelshafen 12, 27570 Bremerhaven, Germany; 11 Department of Systematic Botany, Justus Liebig
University Gießen, Heinrich-Buff Ring 38, 35392 Giessen, Germany; 12 Department of Scientific Infrastructure, Centrum für Naturkunde (CeNak),
Universität Hamburg, Martin-Luther-King-Platz 3, 20146 Hamburg, Germany; 13 GFBio - Gesellschaft für Biologische Daten e.V., c/o Research II, Campus
Ring 1, 28759 Bremen, Germany; 14 Biodata Mining Group, Center of Biotechnology (CeBiTec), Bielefeld University, PO Box 100131, 33501 Bielefeld,
Germany; 15 Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Senckenberganlage
25, 60325 Frankfurt/Main, Germany; 16 Zooplankton Research Group, DZMB – Senckenberg am Meer, Martin-Luther-King Platz 3, 20146 Hamburg,
Germany; 17 Department of Experimental Phycology and Culture Collection of Algae, University Göttingen, Nikolausberger-Weg 18, 37073 Göttingen,
Germany; 18 Department Microbial Drugs, Helmholtz Centre for Infection Research (HZI), and German Centre for Infection Research (DZIF), Partner Site
Hannover-Braunschweig, Inhoffenstrasse 7, 38124 Braunschweig, Germany; 19 Department of Animal Ecology and Systematics, Justus Liebig University
Gießen, Heinrich-Buff Ring 26, 35392 Giessen, Germany; and 20 Department of Evolutionary Biology, Zoological Institute, Technische Universität
Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
∗ Correspondence to be sent to: Systematic Botany and Mycology, University of Munich (LMU), Menzingerstraße 67, 80638 Munich, Germany;
E-mail: [email protected]

Received 13 November 2019; reviews returned 20 February 2020; accepted 24 March 2020
Associate Editor: Matt Friedman

Abstract.—Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata,
DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards
safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of
naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties
and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We
surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet
comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published
in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%),
but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with
photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or
3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome
assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-
taxonomy. Because long-term—ideally perpetual—data storage is of particular importance for taxonomy, energy footprint
reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic
studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining
to evolutionary relatedness of individuals for alpha-taxonomy. For this reason, an improved reuse of taxonomic data,
including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach—
linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for
taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-
centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by
alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000–40,000 taxonomists
globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires
low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to
identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections
of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic
data.]

Taxonomy, the science of documenting, naming, classifying, and understanding the diversity of life on Earth (Simpson 1961; Small 1989; Stuessy et al. 2014), is deeply embedded in evolutionary biology. It also is of direct relevance for documenting and understanding biodiversity dynamics in the face of global change. Since the current system of binomial scientific names was introduced by Linnaeus (1753, 1758), taxonomists have named about 1.8 million species (Roskov et al. 2019), and an unknown but undoubtedly vast number of species remain unnamed (Wheeler 2007; Mora et al. 2011; Fontaine et al. 2012; Costello et al. 2013a,b; Locey and Lennon 2016; Larsen et al. 2017). With an estimated global holding of 3 billion biological specimens in collections (Brooke 2000) and some 15,000–20,000 species descriptions per year (IISE 2011: numbers for 2006 and 2007 are 16,969 and 18,516, respectively; this study: Fig. 1), taxonomy clearly qualifies as big data science by fulfilling the main criteria of volume, variety, and velocity (De Mauro et al. 2016). Still, initiatives to implement cybertaxonomic approaches in taxonomic publishing (Smith et al. 2013; Penev et al. 2018) have not been widely adopted, and, most importantly, the rate of new species naming has failed to increase, despite the rise of ever more efficient computational and DNA sequencing tools available. One reason is that the basic species diagnosis and description procedure has remained unchanged (Fig. 1, original data in Supplementary Appendix S1 available on Dryad at http://dx.doi.org/10.5061/dryad.fj6q573qd).

Naming a new species not only involves gathering images, measurements, and molecular sequences for a few reference specimens but also a comprehensive comparative study to distinguish the new from the already known. Little effort has been directed toward harvesting the massive amount of original data that is being generated in the species naming process, and it is therefore often not safeguarded in repositories. As in other fields of evolutionary biology, nonmolecular archived data are often incomplete or insufficiently standardized, and therefore not available for reuse (Roche et al. 2015). Furthermore, many taxonomic journals lack mechanisms (and funds) for the maintenance of online supplementary documents with original specimen-based data, and specialized taxonomic data repositories are largely lacking, as we will show below.

The importance of the availability, connectivity, and management of data in taxonomy is obvious (Gemeinholzer et al. 2020) and is reflected in concepts of cybertaxonomy (Pyle et al. 2008; Winterton 2009; LaSalle et al. 2009; Padial et al. 2010; Balke et al. 2013; Favret 2014; Rosenberg 2014; Stackebrandt and Smith 2019). As claimed by Bik (2017), if we play our cards right, taxonomy could be on the brink of another golden age. Driven by the need to comprehensively explore and document Earth’s species (Wheeler et al. 2012a), big advances are being made in building cybertaxonomic infrastructures, especially by digitally mobilizing metadata and images of voucher specimens in biological collections as well as literature, by increasingly registering nomenclatural acts online (Krell 2015), and by building curated databases of species names, diagnoses, and descriptions (Crous et al. 2004; Patterson et al. 2010; Webster 2017). At the moment, for instance, 172 taxonomic databases are contributing to the Catalogue of Life (Roskov et al. 2019).

Here, we review the data repositories currently available for taxonomic data and describe how improved data management could contribute to improving the inventory of life on Earth. We focus on alpha-taxonomy, the purpose of which is to establish an inventory of the past and present species on Earth, combining (i) a fundamental component, grounded in evolutionary biology, which consists of specimen-based species delimitation and (ii) an applied component, which consists of providing a universal communication system to unambiguously communicate about biodiversity. This is achieved via the assignment of a two-part name in Latin ruled by taxon-specific codes of nomenclatures, all of which require (i) designating type material from a collection and (ii) a diagnosis that sets the new taxon apart from the most similar already named taxa. Descriptions are not mandatory in any of the five codes for the simple reason that Linnaeus did not use them, relying instead on concise diagnoses (Renner 2016).

Data for fundamental research in alpha-taxonomy of eukaryotes necessarily are specimen-based. They are therefore not covered in species-based taxonomic databases that store information on diagnostic features, synonymy, distribution, phylogeny, traits, or natural history of species. The original analyses carried out for this study show that many established data repositories do not meet the requirements of taxonomists for data submission, retrieval, searchability, and reuse.

FIGURE 1. Trends over time in taxonomic output (new species named per year) compared to number of academic publications, computing power, and DNA sequencing capacity. Numbers of new species were compiled from the Index to Organism Names (organismnames.com), the International Plant Name Index (ipni.org), and MycoBank (mycobank.org); scientific knowledge is represented as number of academic publications compiled from Scopus (scopus.com); computing power is the number of transistors on silicon chips (Moore’s law; data from Rupp 2018); DNA sequencing capacity is the number of Mbp that can be sequenced per 1000 US$. All data presented as 2-year averages (Original data in Supplementary Appendix S1 available on Dryad).

PROPERTIES AND DIVERSITY OF ALPHA-TAXONOMIC DATA

Historically, taxonomy was based on an essentialist concept, with members of a species assumed to share an essence setting them apart from other species. Today, taxonomy is embedded in evolutionary biology, and species are seen as inferred population-level evolutionary lineages (Mayden 1997; de Queiroz 1998, 2007; Padial et al. 2010). This change of paradigm, however, did not change how other biological disciplines, and most end users of taxonomies, tend to conceive and utilize taxonomic species hypotheses: individual organisms are examined and their traits are considered as representative for the nominal species to which they were assigned by the most recent taxonomist to label or otherwise “identify” the organism in question (Supplementary Appendix S2 available on Dryad). This implies that databases for end users of taxonomy, in science and society, will be centered on species names: traits, geographic ranges, taxonomy, phylogeny, diagnoses, images, or DNA sequences will primarily be labeled with and retrieved via scientific names and conceptualized as representing the respective species in other research, identification tools, laws, and conservation assessments.

The alpha-taxonomic workflow itself, that is, the elaboration of species hypotheses, follows a different approach. Ideally, multiple individuals are studied to infer “sufficiently” divergent, evolutionarily independent population-level lineages, and based on this evaluation, they are assigned species rank. The species is thus not the basic unit of research, but instead the endpoint and result of a study (Supplementary Appendix S2 available on Dryad). Independent of the species concept and species criteria used, alpha-taxonomic research is centered on individual organisms in order to assess variation, and so are the data produced during this research activity.

The unit studied by alpha-taxonomists typically is a specimen: either an individual organism or, in the case of paleontology, part thereof, or a cultured isolate composed of multiple, often clonal individuals. Of particular importance are name-bearing type specimens, which constitute anchors for assigning a scientific name to a species. Almost universally, these are physical objects (preserved organisms or their parts, metabolically inactive strains, or living, viable cultures) as recommended by all five codes of nomenclature (Amorim et al. 2016; Santos et al. 2016; Renner 2016), although where type specimens are declared lost, images can be used. Fierce disputes revolve around the option of basing new scientific nomina on photographs, videos, or DNA sequences alone (Ceriaco et al. 2016; Thorpe 2017; Krell and Marshall 2017; Garraffoni and Freitas 2017). In mycology, proposals have been put forward to allow DNA sequences alone, even environmental DNA sequences, as a basis for naming new species (Hawksworth et al. 2016), but the majority of mycologists are presently reluctant to accept voucherless species-level taxa to be validly erected (May et al. 2018), also because many comparative DNA sequences available from repositories are insufficiently linked to permanently preserved specimens (Hongsanan et al. 2018; Zamora et al. 2018).

Because physical specimens in collections are not always accessible and deteriorate as they age or as they are destructively sampled for carbon-14 dating, scanning electron microscopy, or DNA isolation, some authors have pushed for the introduction of digital type specimens or cybertypes (e.g., Godfray 2007). Such cybertypes would be a complement (not a substitute) to physical types deposited in collections. Representing visual type information online is becoming more widespread (Bosselaers et al. 2010; Wheeler et al. 2012b; Faulwetter et al. 2013; Akkari et al. 2015; Scherz et al. 2016a,b). Wheeler et al. (2012b) suggested that a cybertype should minimally comprise a photo of the holotype and ideally additional photos of the organism in life, as well as detailed photos of important diagnostic characters. The cybertypes of Faulwetter et al. (2013) and Akkari et al. (2015), for example, include microCT scans with iodine, also known as diffusible iodine-based contrast-enhanced computed tomography (diceCT), which were used to create 3D digital models of the external and internal morphology of specimens without permanently damaging them (Gignac et al. 2016). Such cyberspecimens (Favret 2014) could be expanded by nonvisual characteristics (e.g. DNA sequences or sound recordings, Fig. 2). Standards for digital representations of specimens are so far lacking, but it is obvious that the cyberspecimen concept, also referred to as extended specimens (Cicero et al. 2017; Lendemer et al. 2020), implies digital publication of extensive and diverse data packages connected via unique specimen identifiers.

FIGURE 2. Schematic representation of a cyberspecimen: a virtual representation of a physical specimen to which the cyberspecimen is linked by a unique identifier. Primarily, the cyberspecimen consists of a high-resolution digital representation, ideally a 3D image obtained, for example, via microCT-scanning or photogrammetry. The cyberspecimen additionally contains all other digital data obtained specifically from the specimen, including photographs of the specimen “in life,” morphometric data, genetic (genomic) sequences, sound files, or chemical profiles, all linked by the specimen identifier to the physical/true specimen.

The data that are generated in taxonomic research (and that would make up a cyberspecimen) are extremely diverse, depending on the organisms studied and the methods used (Table 1). They comprise both metadata and taxonomic data, and in a data management context it is crucial to conceptually distinguish these two categories (Fig. 3). Metadata come in different categories (Riley 2004): in alpha-taxonomy, specimen metadata characterize a specimen as a collection item, and contextualize it (Leonelli 2014) by providing information on taxonomic assignment (species name, supraspecific ranks), type status, spatial and temporal origin (collection date and location), and other technical and historical characteristics (collector name, preservation modality or storage coordinates including institution, collection, individual identifier). In contrast, taxonomic data are those that characterize the specimen as a biological entity. They represent different kinds of raw or encoded data intended to capture or to describe biological characteristics, such as morphological, anatomical, molecular, or behavioral traits. They are most often generated a posteriori in the framework of research that includes the specimen, but they can also be generated in situ during the collection of the specimen, anticipating future investigations (for instance, pictures taken in the field to document coloration in life).

Data on a specimen comprise both raw data and processed, selected, and encoded data (Fig. 4; Table 2). Specimen metadata can also become important raw material for taxonomic research, for example, when geographical coordinates are used to model and distinguish environmental niches (e.g. Rissler and Apodaca 2007; Cicero et al. 2017), when time of collection is used to characterize phenology, migration behavior, or invasiveness (Chauvel et al. 2006; Miller-Rushing et al. 2006; Grass et al. 2014; Lorieul et al. 2019), or to determine the applicable regulations for access and benefit sharing, which depend on the time when a specimen was acquired by a collection.

The heterogeneous taxonomic data themselves also need to be described with contextual or methodological information. This might include the device, methods, and conditions used for photographic, tomographic, or sound recording (Roch et al. 2016; Köhler et al. 2017), laboratory methods for histological staining or molecular sequencing, or even the sociological context of the data collection (McClellan 2019); these constitute what one might consider “metadata of taxonomic data” (not the same as specimen metadata). Ideally, these data and metadata must all be accommodated in the archiving process. On the one hand, these intricate requirements suggest that a distributed system of specialized repositories for specific kinds of taxonomic data would be the best approach. On the other hand, it is preferable to adjust the existing infrastructure of established repositories rather than create new ones and to streamline the submission process of diverse data via user-friendly submission portals. The key lies in linking the data to a single specimen for which a specimen identifier will be required (Güntsch et al. 2018).

The specimen identifier approach still has to overcome multiple practical problems due to ambiguities in defining what a specimen is (Supplementary Appendix S3 available on Dryad). For instance, in most insect collections, specimens (individual insects in the collection) have no identifying number and usually also lack a catalog that could provide an inventory of specimens. Even type specimens may lack individual specimen identifiers (e.g. Zompro 2005). This is a massive impediment considering an overall estimated 500 million preserved insect specimens in collections (Short et al. 2018). For most of these specimens, the associated metadata are pinned on small labels underneath the specimen and therefore cannot be scanned without labor-intensive unpinning of every specimen. If several specimens have been collected at the same location and time, their metadata will be identical, and distinguishing among these specimens is impossible from the metadata. While it is possible to consider these and other bulk samples as one specimen, problems arise if data (DNA sequences, images, measurements) refer to only one of the individuals included in the bulk, and problems are exacerbated if the bulk is found to contain individuals of different characteristics or even species (see also Nelson et al. 2018).

Many natural history collections are currently digitizing their specimens. For instance, 91% of the 5.5 million plant specimens deposited in the world’s largest herbarium (MNHN in Paris) have been photographed at high resolution and made available online in less than a

TABLE 1. Data types used and/or produced in the context of taxonomy, currently or potentially in the future, their predicted storage requirements and main issues to be solved to allow their efficient storage and reuse

Regular images (e.g., .jpeg, .pdf, .tiff)
    Current use in alpha-taxonomy: Regularly used
    Potential and prospective use in taxonomy: Images of different kinds will continue to be a main workhorse of taxonomic description and identification; new perspectives by machine-learning character extraction
    Storage requirements (per specimen): Moderate to very high, depending on image quality and quantity
    Established specialized repositories: Yes (many specialized and generalist repositories will accept images)
    Issues and gaps: Images produced in taxonomic revisions are rarely submitted to repositories; images are often not linked to specimen identifiers

High-resolution images (stacks etc.) (e.g., .tiff)
    Current use in alpha-taxonomy: Increasingly used, e.g. in insects
    Potential and prospective use in taxonomy: As with regular images
    Storage requirements (per specimen): High to very high
    Established specialized repositories: Yes (many specialized and generalist repositories will accept images)
    Issues and gaps: As with regular images

Annotated images
    Current use in alpha-taxonomy: Very rarely used
    Potential and prospective use in taxonomy: Documentation of morphology of small-sized organisms (e.g., on a microscopic slide)
    Storage requirements (per specimen): High to very high
    Established specialized repositories: Only few specialized repositories
    Issues and gaps: Requires development of standards for repositories and submitters

3D microCT, photogrammetry, and laser scanners (e.g., stack of .tiff, polygon mesh such as .ply, .bend, .obj)
    Current use in alpha-taxonomy: Used rarely but regularly, especially in vertebrates. Increasing use in invertebrates.
    Potential and prospective use in taxonomy: High importance to visualize internal features of an organism or 3D morphometrics, key method in cyberspecimen approaches
    Storage requirements (per specimen): High to very high, depending on storage modality (e.g., polygon mesh vs. raw data) and level of resolution
    Established specialized repositories: Yes, several
    Issues and gaps: Requires development of standards for repositories and submitters. See commentary by Hipsley and Sherratt (2019).

DNA sequences (Sanger) (e.g., .fasta, .fastq, .gb)
    Current use in alpha-taxonomy: Regularly used for most organism groups, almost omnipresent in mycological taxonomy
    Potential and prospective use in taxonomy: DNA barcodes will continue to drive species identification and discovery, multigene phylogenies important for inferring relationships
    Storage requirements (per specimen): Very low to low depending on the number of loci sequenced
    Established specialized repositories: Yes, several very well established ones
    Issues and gaps: Sequences deposited in databases are not always curated postsubmission, leading to mismatches after taxonomic changes.

RNAseq (raw) (e.g., .fastq)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Potentially useful after read mapping and variant calling, but currently rarely used.
    Storage requirements (per specimen): High
    Established specialized repositories: Yes (e.g., Sequence Read Archive)
    Issues and gaps: No issues

RNAseq (assembly) (e.g., .fasta)
    Current use in alpha-taxonomy: Very rarely used
    Potential and prospective use in taxonomy: Valuable source of sequences for phylogenomics and species delimitation
    Storage requirements (per specimen): Low
    Established specialized repositories: Yes (e.g., Transcriptome Shotgun Assembly Sequence Database)
    Issues and gaps: Assemblies are often not submitted to repositories, although they could be a valuable source of sequences for machine-learning species discovery pipelines

Amplicon (raw) (e.g., .fastq)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Not straightforward; requires filtering and preprocessing
    Storage requirements (per specimen): High
    Established specialized repositories: Yes (e.g., Sequence Read Archive)
    Issues and gaps: No issues

Amplicon (consensus OTUs) (e.g., .fasta)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Metabarcoding data help ascertain distribution and ecology of taxa
    Storage requirements (per specimen): Low
    Established specialized repositories: Not really. Sequences >200 bp could be submitted to GenBank.
    Issues and gaps: OTU consensus sequences from metabarcoding studies are in most cases not submitted to a repository, but could be important for DNA-based assessments of distribution of taxa; targeted and searchable repositories do not exist (GenBank does not accept sequences <200 bp)

Bait capture (raw) (e.g., .fastq)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Only usable after assembly
    Storage requirements (per specimen): High
    Established specialized repositories: Yes (e.g., Sequence Read Archive)
    Issues and gaps: No issues

Bait capture (assembled) (e.g., .fasta)
    Current use in alpha-taxonomy: Rarely used (e.g., sequencing of historical types)
    Potential and prospective use in taxonomy: Very valuable source of sequences for phylogenomics and species delimitation (next-generation DNA barcoding)
    Storage requirements (per specimen): Low to moderate
    Established specialized repositories: Yes, well established, same ones as for Sanger sequences
    Issues and gaps: Similar as for Sanger sequences

Genomes (raw) (e.g., .fastq)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Only usable after assembly
    Storage requirements (per specimen): Very high
    Established specialized repositories: Yes (e.g., Sequence Read Archive)
    Issues and gaps: Similar as for Sanger sequences

Genomes (assembled) (e.g., .fasta)
    Current use in alpha-taxonomy: Very rarely used
    Potential and prospective use in taxonomy: Valuable source of sequences for phylogenomics and species delimitation
    Storage requirements (per specimen): High
    Established specialized repositories: Yes
    Issues and gaps: Similar as for Sanger sequences

Maldi-TOF (e.g., .raw, .mzXML, .mzML)
    Current use in alpha-taxonomy: Sometimes used in mycology; commonly in prokaryotes.
    Potential and prospective use in taxonomy: Useful for chemotaxonomic approaches
    Storage requirements (per specimen): Moderate to very high, depending on storage of spectra vs. raw data.
    Established specialized repositories: No
    Issues and gaps: Requires development of standards for repositories and submitters

Near-infrared spectroscopy (e.g., .snirf, .csv, .spc, and many others)
    Current use in alpha-taxonomy: Not used
    Potential and prospective use in taxonomy: Possibly useful for “metabolomic barcoding”
    Storage requirements (per specimen): Moderate
    Established specialized repositories: No
    Issues and gaps: Requires development of standards for repositories and submitters

GC-MS/ (e.g., .raw, .cdf, .D, .mzxml)
    Current use in alpha-taxonomy: Sometimes used in mycology; commonly in prokaryotes.
    Potential and prospective use in taxonomy: Useful for chemotaxonomic approaches, e.g., fatty acid profiling in yeasts, and in bacterial taxonomy
    Storage requirements (per specimen): Moderate to very high, depending on storage of spectra vs. raw data.
    Established specialized repositories: No
    Issues and gaps: Requires development of standards for repositories and submitters; reference databases do exist.

NMR/TLC/HPLC (e.g., .raw, .data, .cdf)
    Current use in alpha-taxonomy: Rarely used (e.g., TLC and HPLC in lichenology)
    Potential and prospective use in taxonomy: Possibly useful for chemotaxonomic approaches
    Storage requirements (per specimen): Moderate to very high, depending on storage of spectra vs. raw data.
    Established specialized repositories: No
    Issues and gaps: Requires development of standards for repositories and submitters

Sounds (e.g., .wav, .mp3)
    Current use in alpha-taxonomy: Regularly used in sound-producing animals
    Potential and prospective use in taxonomy: Very useful for species delimitation of sound-producing animals
    Storage requirements (per specimen): Moderate to high, depending on file format and sound duration
    Established specialized repositories: Yes
    Issues and gaps: Most repositories do not feature user-friendly submission procedures and often data are not open access

Videos (.avi, .mov, .mp4)
    Current use in alpha-taxonomy: Very rarely used (e.g., to document specific behavior)
    Potential and prospective use in taxonomy: Limited value
    Storage requirements (per specimen): Moderate to very high, depending on definition and duration
    Established specialized repositories: Yes
    Issues and gaps: Extend image databases to accept videos if linked to specimens

Measurements (e.g., .csv, .xls, .txt)
    Current use in alpha-taxonomy: Regularly used
    Potential and prospective use in taxonomy: Very useful basic data for diagnosis and identification of species
    Storage requirements (per specimen): Very low
    Established specialized repositories: No
    Issues and gaps: Requires development of standards for repositories and submitters

2D geometric morphometric data sets (e.g., .csv)
    Current use in alpha-taxonomy: Very rarely used
    Potential and prospective use in taxonomy: Increasingly used for resolving species complexes
    Storage requirements (per specimen): Very low to low
    Established specialized repositories: Yes
    Issues and gaps: No issues

3D geometric morphometric data sets (e.g., .csv)
    Current use in alpha-taxonomy: Very rarely used
    Potential and prospective use in taxonomy: Increasingly used for resolving species complexes, especially in combination with microCT scans
    Storage requirements (per specimen): Very low to low
    Established specialized repositories: Yes
    Issues and gaps: No issues

Note: The second column specifically focuses on the current use of data types in alpha-taxonomic studies (mostly based on our survey reported below), not other taxonomy-related activities such as species identification or phylogenetics. Storage capacity required per specimen: very low (<0.1 MB), low (0.1–1 MB), moderate (1–10 MB), high (10–100 MB), and very high (>100 MB).
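
The storage categories defined in the note to Table 1 can be expressed as a small lookup. The following Python sketch is only an illustration: the function name `storage_class` is hypothetical, and because the note does not say to which category the boundary values (1, 10, and 100 MB) belong, assigning them to the lower category is an assumption made here.

```python
def storage_class(size_mb: float) -> str:
    """Map a per-specimen data volume (in MB) to a Table 1 storage category.

    Assumption: boundary values (1, 10, 100 MB) fall into the lower
    category; the table note itself leaves this open.
    """
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    if size_mb < 0.1:
        return "very low"
    if size_mb <= 1:
        return "low"
    if size_mb <= 10:
        return "moderate"
    if size_mb <= 100:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Illustrative sizes only, not measurements from the survey.
    for size in (0.05, 0.5, 5, 50, 500):
        print(f"{size} MB -> {storage_class(size)}")
```

Such a function could, for instance, help a repository estimate which data types dominate its storage budget.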

FIGURE 3. Two categories of data linked to a specimen: metadata and taxonomic data. While specimen metadata from museum catalogs are
increasingly made digitally available, the scarceness of specialized specimen-based data repositories adapted to the wide range of taxonomic
data types is a limitation for the development of digital taxonomy. Additionally, “metadata of taxonomic data” (not shown) are associated with
the taxonomic data (e.g., device used, methodology, author name, and date of the measurement).
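
The distinction drawn in Figure 3 between specimen metadata, taxonomic data, and the "metadata of taxonomic data" can be sketched as a simple data structure. This is a minimal Python illustration under stated assumptions: the class and field names (`Specimen`, `TaxonomicDatum`, `data_of_type`) and all example values are hypothetical and do not correspond to any existing repository schema.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomicDatum:
    """One item of taxonomic data, carrying its own metadata."""
    data_type: str          # e.g. "photograph", "sound recording"
    file_name: str
    # "Metadata of taxonomic data" as listed in the caption of Figure 3:
    device: str = ""
    methodology: str = ""
    author: str = ""
    date: str = ""


@dataclass
class Specimen:
    """A specimen with its catalog metadata and linked taxonomic data."""
    identifier: str
    metadata: dict = field(default_factory=dict)        # museum catalog data
    taxonomic_data: list = field(default_factory=list)  # TaxonomicDatum items

    def data_of_type(self, data_type: str) -> list:
        """Return all linked data items of the requested type."""
        return [d for d in self.taxonomic_data if d.data_type == data_type]


# Hypothetical example record; identifier and values are placeholders.
specimen = Specimen(
    identifier="EXAMPLE-0001",
    metadata={"locality": "unknown", "collector": "unknown"},
)
specimen.taxonomic_data.append(
    TaxonomicDatum("photograph", "dorsal_view.jpg", device="camera",
                   author="A. Example", date="2020-01-01")
)
specimen.taxonomic_data.append(
    TaxonomicDatum("sound recording", "call_01.wav", device="recorder")
)
print([d.file_name for d in specimen.data_of_type("photograph")])
```

Keeping the specimen identifier as the single key that all data items hang from reflects the specimen-centered structure argued for in the text.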

FIGURE 4. Overview of data types, transformations, and specification of information in the process of specimen-based alpha-taxonomic
research. Paleontological samples can be considered to be already “fixed” for the purposes of this graphic, by the process of fossilization.



TABLE 2. Properties of different kinds of specimen-based taxonomic data
a. Raw vs. encoded taxonomic data.
These two categories differ by the quantitative and qualitative nature of the information they convey, and consequently by their ease and cost of storage.

Description
  Raw taxonomic data: One of the multiple facets characterizing a specimen captured by a sensor (e.g., camera, sound recorder, scanner, DNA sequencer). Allows one to represent the different properties of a virtual specimen (e.g., coloration, shape, size, structure, texture, chemical composition, bioacoustics properties).
  Encoded taxonomic data: Data already interpreted.
Strengths
  Raw taxonomic data: Free of interpretation. Containing much information. Different forms of encoded data can be extracted, including by future methods that do not yet exist.
  Encoded taxonomic data: Although specific file types may exist, it is always possible (and easy) to translate them into a suite of alphanumeric characters compatible with a universal format (such as .csv). Small files, minimal storage cost.
Weaknesses
  Raw taxonomic data: Cannot be directly used as input for analyses (need to be interpreted and encoded, either by a human or artificial intelligence). Different and specific storage formats. Large files and high storage cost.
  Encoded taxonomic data: Information restricted to a minimal level. Subjectivity: alternative interpretations (or coding errors) are possible.
Examples
  Raw taxonomic data: Photographs, sound and video recordings, microCT scans, chromatograms depicting DNA sequences.
  Encoded taxonomic data: Quantitative measurements, qualitative traits encoded numerically or described in natural language, nucleotide or amino acid sequences, morphometric landmarks.

b. Taxonomic data of unique vs. multiple specimens.
These two categories of data differ in the way they are (i) submitted to repositories, (ii) searched for (a particular specimen nested in a multiple-specimen data set has to be detectable using basic search options), and (iii) presented and downloaded on the repository interface. Ideally, for multiple-specimen data sets, it should be possible to download either the data measured for a particular specimen only, or the whole data set.

Description
  Unique specimen data: Data or set of data concerning a single specimen. Most often it consists of raw taxonomic data (see above).
  Multiple-specimen data sets: Set of data concerning particular trait(s) measured for several specimens. Most often it consists of encoded data.
Strengths
  Unique specimen data: Specific (individual) searches are easy.
  Multiple-specimen data sets: Submission of large data set at once.
Weaknesses
  Unique specimen data: Case by case treatment unrealistic where specimen numbers are >> 10^2.
  Multiple-specimen data sets: Stringent search and data extraction might be compromised by an inadequate data archiving process.
Examples
  Unique specimen data: Picture(s) of a specimen, complete mitogenome sequence of a given individual.
  Multiple-specimen data sets: Tabular data (.csv), DNA alignment (.fasta, .nex).
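
The requirement stated in Table 2b, that a single specimen nested in a multiple-specimen data set remains retrievable on its own, can be illustrated with a tabular (.csv) data set. The following Python sketch uses only the standard library; the column names, specimen identifiers, and measurement values are hypothetical placeholders, and the function names are not taken from any existing repository software.

```python
import csv
import io

# Hypothetical multiple-specimen data set (encoded data, one row per measurement).
MEASUREMENTS_CSV = """specimen_id,trait,value_mm
EX-0001,body_length,31.2
EX-0001,head_width,9.8
EX-0002,body_length,29.5
"""


def all_rows(csv_text: str) -> list:
    """Return the whole data set as a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_for_specimen(csv_text: str, specimen_id: str,
                      id_column: str = "specimen_id") -> list:
    """Return only the rows measured for one particular specimen."""
    return [row for row in all_rows(csv_text) if row[id_column] == specimen_id]


if __name__ == "__main__":
    for row in rows_for_specimen(MEASUREMENTS_CSV, "EX-0001"):
        print(row["trait"], row["value_mm"])
```

The sketch presupposes that every row carries a specimen identifier column; without it, the per-specimen download described in the table would not be possible.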

decade (Le Bras et al. 2017, constantly updated online at https://edition-humboldt.de), although so far only 16% have field-collecting information (label data) associated with them. Important efforts are also being made on several entomology collections (specimen images and metadata; e.g., Dietrich et al. 2012). So far, however, only an estimated 2% have been digitized (Short et al. 2018). To allow taxonomists to efficiently access, use and reuse these data, individual specimen identifiers are essential (Page 2016; Güntsch et al. 2018), and consequently, priority efforts are usually directed towards providing specimen identifiers to type specimens and accordingly adding labels to the physical types in the collection. Surprisingly, the International Code of Zoological Nomenclature (Anonymous 1999) does not require individual identifiers for type specimens.

QUANTIFYING THE KINDS OF DATA USED AND PRODUCED IN ALPHA-TAXONOMY

To understand which repositories and storage capacities are needed for taxonomic data, we quantitatively assessed the number of alpha-taxonomic studies and the kinds of data produced in them. An updated summary of numbers of studies naming new insects, plants, mollusks, fungi, and vertebrates from 1950 to 2016 (Fig. 5) illustrated a noticeable increase after 1966 for insects, with >8000 new species named per year, while in plants, a peak was apparent in the 1980s. Species discovery and naming in fungi has been undergoing a striking increase since 2010 (see also Cannon et al. 2018), whereas for vertebrates numbers have risen more continuously.

Molecular data are at the core of a modern, integrative taxonomy (Padial et al. 2010). To assess their impact, we undertook a systematic search in Web of Science using a combination of search terms to detect alpha-taxonomic studies referring to molecular data during the years 1990–2018 (search terms: molecular, DNA, gene; details in Supplementary Appendix S4 available on Dryad). The results confirm a rise in the explicit use of molecular evidence across all groups (Fig. 6). Mycologists and protistologists mention molecular data in >75% of their taxonomic studies in 2018, whereas this was the case for only 33% of insect and 26% of plant studies. Such an increasing use of DNA sequences in taxonomy likely reflects a growing tendency to take evolutionary concepts into account during the species delimitation process, even if only implicitly.

We next undertook a survey of 4178 alpha-taxonomic studies (published in 2002, 2010, and 2018) that involved scientific naming of species. Each of these was manually screened, and kinds of data used in the respective study were tabulated, along with a series of metadata for each paper. We surveyed the taxonomic journals Phytotaxa, Zootaxa, Systematic Botany, and Mycological Progress, and six generalist journals with higher impact factors (Nature, Science, PNAS, PLoS One, Scientific Reports, and the Biological, Botanical and Zoological Journal of the Linnean Society) for alpha-taxonomic studies (Supplementary Appendix S5 available on Dryad). The average publication named 1–2 (fungi, plants, protists, vertebrates) or 3–4 (insects and other invertebrates) new species (Fig. 7; original data in Supplementary Appendix S6 available on Dryad).

In this survey, we more restrictively considered the use of a certain kind of data only if it was explicitly part of the arguments supporting a taxonomic change (usually the description of a new species). The use of molecular evidence, newly generated or from other sources, was similar to our Web of Science survey (Fig. 6). In the specialized taxonomic journals (4113 studies), molecular data were widely used in mycology, but much less so in botany and zoology (Supplementary Appendix S7 available on Dryad): in 2018 papers, DNA sequence analysis was used in 94% of taxonomic studies of fungi, 53% of vertebrates, 15% of plants, and 10% and 14% of insects and other invertebrates, respectively (Fig. 7). Surprisingly, even in works on protists, which are difficult to identify morphologically, genetic evidence was used in only 29% of the 66 surveyed papers, although our Web of Science survey suggested otherwise (Fig. 6). Even the frequent DNA use in mycology suggested by our survey may be an overestimate because many fungi are described in other specialized journals not surveyed here, mostly without molecular data. Comparing papers from 2002, 2010, and 2018, an increase in the use of molecular evidence is apparent for all organismal groups (Fig. 7).

Photographic images were used in >80% of the papers in all categories in 2018, whereas other sets of data (extensive morphometric data sets or 3D-imagery) were only rarely used and almost restricted to studies on vertebrates. Specifically, in the entire set of 4113 papers, only 17 studies used microCT-scanning, 2 used synchrotron-based visualization, 6 used other kinds of 3D-visualization, 14 used X-ray images, and 1 used videos. Besides macroscopic photos, microscopy and microscopy-produced images were used frequently: 670 (16%) studies used electron microscopy (SEM or TEM) and 709 (17%) used light microscopy. Classical drawings were part of 2371 (58%) of the 4113 studies.

Genome-scale data sets (e.g., RADseq, sequence capture, RNAseq, full genomes) in 2018 were only used in one paper in mycology (a draft genome), and not at all in zoology or in botany. Similarly, metabolomics data were rare in the surveyed papers in 2018: one publication using NIR spectra in entomology, one using NMR spectra in mycology, and one using peptide fingerprints in vertebrate zoology.

Several other kinds of molecular data were used in a moderate proportion of the 4113 papers: cytological techniques from cell descriptions to flow-cytometric determination of ploidy and genome size (n = 329), karyotypes (n = 34), fragment analysis (microsatellites, AFLP, RFLP, n = 10), allozymes (n = 3), and chemotaxonomic approaches including analysis of cuticular hormones or metabolites (n = 17) and GC-MS or HPLC metabolite profiles (n = 4).


FIGURE 5. Species named per year for the study period. Insects and mollusks (ION—organismnames.com), fungi (MycoBank—
mycobank.org), plants (IPNI—ipni.org), vertebrates (compiled from Eschmeyer’s Catalog of Fishes, Amphibian Species of the World:
Frost 2019, Reptile Database, Howard and Moore Bird Checklist: Christidis 2018, Mammal Diversity Database; all accessed in March
2019: calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes, research.amnh.org/vz/herpetology/amphibia/, reptile-database.org,
mammaldiversity.org). The gray-shaded windows indicate the time frames for which our surveys of data types were carried out. Vertebrate
numbers refer to currently accepted species, whereas for the other taxa, species currently considered synonyms are also included. Furthermore,
the ION data (insects) also include subspecies. Note that paleontological studies were excluded from our survey.

FIGURE 6. Comparison of the frequency of use of molecular data in taxonomic studies naming new species, in various groups of organisms,
given as proportion of taxonomic papers retrieved from a semantic search on Web of Science, measured every 2 years. Molecular data were
considered as contributing to every study based on a combination of search terms (cf. details in the Supplementary Appendix S4 available on
Dryad). Data do not necessarily reflect absolute numbers due to inaccuracy involved with keyword searches but primarily serve as a comparison
among organism groups.


FIGURE 7. a1) Histograms indicating the proportion of alpha-taxonomic studies that have implicated different categories of data in specialized
taxonomic journals in 2002, 2010, and 2018. Each series of bars in a and b corresponds to values (from left to right) of plants, fungi, vertebrates,
invertebrates, and protists. a2) Same statistics for generalist journals (only 2018 is represented for these journals, the number of papers dealing
with alpha-taxonomic issues being negligible for 2002 and 2010, with only 3 and 5 new species during these 2 years). b) Mean number of species
named per article in 2002, 2010, and 2018 in specialized taxonomic journals and generalist journals. c) Proportion of articles with a taxonomic
component involving molecular data as a function of the number of authors. Specialized taxonomic journals are represented by a selection
of four journals with a strong taxonomic component: Mycological Progress (Mycology), Phytotaxa and Systematic Botany (Botany), and Zootaxa
(Zoology). Taxonomic works dealing with protists are shared among these journals. These journals belong to the top journals with taxonomic
orientation and were selected according to our subjective opinion. The generalist journal category includes PLoS ONE, Scientific Reports, Nature,
Science, Biological, Zoological and Botanical Journals of the Linnean Society, and PNAS. “DNA” refers to mitochondrial or nuclear sequence data sets,
“Photography” to classical photography plus pictures generated by light and electron microscopy, “Morphometry” to all sets of measurements
realized and reported with a comparative perspective on a set of several specimens, and “3D imagery” to every study that generated data using
tomographic methods (mostly 3D X-ray CT, plus one paper using synchrotron radiation CT). See details in Supplementary Appendices S6–S8
available on Dryad.


Of other kinds of data, measurement-based morphometric analysis was used relatively frequently (348 studies), whereas landmark-based 2D- or 3D-morphometry was rarely applied (7 studies only); 9 studies used geographical models; 13 reported or analyzed extensive ecological data sets, including variables ranging from climate to culture media; 78 studies used analysis of sounds (of vertebrates and insects); and 5 used analyses of electric waves, vibrations, and similar signals.

Our survey may be biased against innovative and groundbreaking taxonomic discoveries because those are often published in generalist journals of higher impact factor. The data we obtained from the generalist journals surveyed (Supplementary Appendix S5 available on Dryad) confirmed this suspicion, with 69% of papers on all organismal categories discussing DNA data. Overall, molecular data were rare in papers published by single authors, whereas papers published by larger author teams mentioned such data more frequently (Fig. 7c, Supplementary Appendix S8 available on Dryad). Taxonomists from each of five global regions use similar proportions of the data types (2D imagery > DNA > morphometrics > 3D data; Supplementary Appendices S9 and S10 available on Dryad).

The journals Zookeys and Phytokeys, established in 2008 and 2014 respectively, and hence not included in our main survey, encourage data sharing and automatic linking of metadata, and the aims of Zookeys (zookeys.pensoft.net, accessed 22 August 2019) include the “preservation of digital materials to meet the highest possible standards of the cybertaxonomy era.” Yet, the general pattern of data use in these two journals so far does not differ from that in other outlets. In 2018, for all 83 alpha-taxonomic papers published in Phytokeys, and 100 randomly chosen ones published in Zookeys, molecular data were implicated in 29% (botany), 22% (entomology), and 50% (vertebrates). Despite innovations such as semantic markup or tagging, a method that assigns markers, or tags, to taxonomic names, gene sequences, localities, designations of nomenclatural novelties, and so on (Penev et al. 2018), standardization and sharing of raw data are far from being widely implemented in taxonomy. For instance, only 2.5% of all the GBIF-mediated occurrences for the 24 classes of organisms surveyed by Troudet et al. (2018) were linked to digital data and 1.5% to DNA sequences, and outlets such as the Biodiversity Data Journal (Smith et al. 2013) that try to redefine taxonomic papers as sources of data rather than narratives remain an exception, probably not only because of technological limitations but also because of motivational factors (Hipsley and Sherratt 2019).

How many DNA sequences are produced in the context of taxonomic research? We used Zootaxa as a benchmark, representative of a large amount of contemporary taxonomic work. For 2015–2018, numbers of sequences deposited in NCBI-GenBank (accessed August 22, 2019) with a Zootaxa reference varied between 8662 and 14,073 per year (Supplementary Appendix S11 available on Dryad). With 2321 papers published in the journal in 2018, this corresponds to an average of six DNA sequences per taxonomic study. While this may be an underestimate because taxonomists often report the results of their molecular phylogenetic studies separately in higher-impact journals, the overall picture is that taxonomy is not yet fully embracing the opportunities offered by the analysis of genetic data.

Our analysis indicates that images are the most universal data type produced in alpha-taxonomic work. This is true of all regions of the world (Supplementary Appendix S9 available on Dryad). As a conservative estimate, 10 images may typically be produced of the holotype and paratypes of a new species and published as part of the taxonomic study. Mostly, these are photographs and drawings, sometimes scanning electron microscopy (SEM) images. We may assume that in comprehensive revisionary studies, up to 100 images (of comparative voucher specimens, or of different morphological characters) will be produced per newly named species. Most are probably neither published nor submitted to repositories. Assuming again 20,000 new species named per year (Fig. 1), and an upper bound of 100 images per new species, this leads to an estimated ≤2 million images produced per year in the context of alpha-taxonomic studies. Considering that Instagram alone hosts more than 50 billion images and accepts more than 100 million new images per day (www.omnicoreagency.com, accessed January 19, 2020), the yearly storage capacity required for taxonomy-specific images produced in alpha-taxonomic research appears manageable and in the short term is smaller than that needed for intensive digitization campaigns of natural history museums and herbaria (e.g., Le Bras et al. 2017).

USEFUL DATA FOR NEXT-GENERATION TAXONOMY

Our survey revealed that taxonomists in their routine alpha-taxonomic work do not make systematic use of large omics data sets or 3D imagery. A rise in the use of such advanced molecular and imagery data sets, however, is likely, especially as these methods become more affordable and as images of the type specimens of new names may become required by the codes of nomenclature. Taxonomists’ requirements for data and metadata formats, however, go beyond DNA sequences and images. Verifiability of taxonomic work may sometimes require the archiving of computer memory-intensive raw data of genomic and transcriptomic studies, for example, in the NCBI-SRA Sequence Read Archive, but assemblies, especially if findable via a specimen identifier and accompanied by specimen metadata, may be more important. So far, however, assemblies, especially of RNAseq experiments, are often not submitted to the Transcriptome Shotgun Assembly Sequence Database (https://www.ncbi.nlm.nih.gov/genbank/tsa/) or other specialized repositories in a searchable format (Moreton et al. 2015).
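
The two back-of-the-envelope estimates given above (DNA sequences per Zootaxa paper, and the upper bound of images produced per year) can be retraced with a few lines of Python. This is a sketch of the arithmetic only: all input numbers are those quoted in the text, the function names are hypothetical, and pairing the upper sequence count (14,073) with the 2018 paper count (2321) is an assumption, because the text does not state which year each count belongs to.

```python
# Figures quoted in the text.
NEW_SPECIES_PER_YEAR = 20_000
MAX_IMAGES_PER_SPECIES = 100
INSTAGRAM_IMAGES_PER_DAY = 100_000_000


def max_images_per_year(species: int = NEW_SPECIES_PER_YEAR,
                        images: int = MAX_IMAGES_PER_SPECIES) -> int:
    """Upper bound of images produced per year in alpha-taxonomy."""
    return species * images


def sequences_per_paper(n_sequences: int, n_papers: int) -> float:
    """Average number of deposited sequences per published paper."""
    return n_sequences / n_papers


def instagram_minutes_equivalent(n_images: int,
                                 per_day: int = INSTAGRAM_IMAGES_PER_DAY) -> float:
    """Minutes of Instagram uploads that correspond to n_images."""
    return n_images / per_day * 24 * 60


if __name__ == "__main__":
    upper = max_images_per_year()
    print("images per year (upper bound):", upper)
    # Assumption: the upper sequence count is paired with the 2018 paper count.
    print("sequences per paper:", round(sequences_per_paper(14_073, 2_321), 2))
    print("equivalent Instagram minutes:",
          round(instagram_minutes_equivalent(upper), 1))
```

Under these inputs, the yearly upper bound of taxonomic images corresponds to roughly half an hour of uploads at the quoted Instagram rate, which illustrates why the required storage capacity is described as manageable.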


Geographical occurrence data, also extremely important for taxonomic work, are available from GBIF (https://www.gbif.org/; 1.3 billion records as of September 2019) or Map of Life (https://mol.org/) and furthered also by citizen science portals (e.g., iNaturalist, https://www.inaturalist.org/), but metabarcoding data, which include occurrence records of morphologically cryptic or microscopic taxa including fungi, protists, or small invertebrates, are so far not stored in a retrievable way. This is because the focus has been on archiving the raw sequence reads rather than the consensus OTU sequences that could be reused by taxonomists. Standards for metabarcoding data should therefore include the archiving of quality-filtered consensus reads in a searchable format, preferably as species hypotheses linked to DOI numbers (Tedersoo et al. 2015).

Lastly, chemotaxonomy is routine in the taxonomy of prokaryotes (Stackebrandt and Smith 2019), is often used in fungi (Frisvad et al. 2008), has proven useful in several classification approaches in plants (Wink et al. 2010), and may be useful for some insects (Kather and Martin 2012) and vertebrates (Poth et al. 2012; Starnberger et al. 2013). According to our survey, it is rarely used in alpha-taxonomic studies of nonfungal eukaryotes today, but metabolomic or proteomic profiles (Steinmann et al. 2013; Rossel and Martínez 2019) and NIR spectra (Rodríguez-Fernández et al. 2011; Kinzner et al. 2015) have proven useful in large-scale species identification and discrimination. Chemotaxonomic data traditionally play an important role in lichenized fungi (Lumbsch 2002), and mycologists distinguish species by HPLC profiling (Kuhnert et al. 2017; Helaly et al. 2018) and sometimes higher taxa based on secondary metabolites (Wendt et al. 2018). The retention factors of known chemotaxonomic markers in standard thin-layer chromatography systems are stored in the LIAS database (http://www.lias.net/). For spectroscopic data including GC-MS, the NIST database (https://www.nist.gov/pml/atomic-spectra-database) provides reference spectra for many plant metabolites but does not act as a repository. Chemotaxonomy can be aided by commercial databases like DNP (http://dnp.chemnetbase.com/), which contains comprehensive information about the occurrence and distribution of secondary metabolites across all organism kingdoms, but these databases are not open access and incur considerable license fees. Metabolomic and chemotaxonomic repositories do exist (e.g., Tsugawa et al. 2019), but the underlying raw data may vary in quality and quantity depending on the applied technological sensitivity, and thus may not be readily searchable or comparable across platforms.

CRITERIA FOR TAXONOMIC DATA REPOSITORIES

The importance of data repositories becoming part of the routine taxonomic research workflow was recognized almost 20 years ago (Louis et al. 2002; Lynch 2008). Today, there is a plethora of repositories, many of them highly specialized (Louis et al. 2002; Pampel et al. 2013). Of the few generalist repositories, some are not free of charge, and many do not provide curated metadata that would allow informed searches (Assante et al. 2016). Many scientific journals in the life sciences now recommend data repositories for archiving the data that accompany a paper (e.g., the journal Scientific Data on behalf of Springer Nature journals: https://www.nature.com/sdata/policies/repositories, or PLoS: Public Library of Science Recommended Data Repositories; DOI: 10.25504/FAIRsharing.t2exm). Dedicated registries have been developed for searching repositories for specific kinds of data (e.g., re3data.org/ and fairsharing.org/), with the FAIR data principles (data should be Findable, Accessible, Interoperable, and Reusable) as a framework (Wilkinson et al. 2016) and measurable metric (Wilkinson et al. 2018). Taxonomic data repositories should be (i) free of charge for data contributors, (ii) user-friendly, with a low-complexity submission workflow, not requiring affiliation to academic institutions and not requiring cumbersome registration or login procedures, and (iii) including careful and prompt quality-checks of submissions by dedicated data curators. This is particularly important because a substantial proportion of the estimated 30,000–40,000 taxonomists worldwide (Haas and Häuser 2007) lack data management expertise and support as they often work as single authors or small teams (Knapp 2008; Joppa et al. 2011) and in many cases are nonprofessional researchers (Hopkins and Freckleton 2002; Fontaine et al. 2012).

Ideally, taxonomic repositories should be able to handle universally unique identifiers to refer to specimens (Guralnick et al. 2015; Güntsch et al. 2018; Nelson et al. 2018; Triebel et al. 2018). At present, however, a mandatory use of such identifiers for submission of taxonomic data is unrealistic because, as we have explained above, (i) they do not yet exist for many collections and (ii) the best way of numbering bulk collections is still unclear. For data reuse to be encouraged and facilitated in taxonomy and by its end users, emphasis should be on making data and metadata available in highly standardized formats, enhancing comparability across taxonomic studies. Metadata should thus include a specimen identifier in best-practice format for the respective group of organisms, in addition to a species-level name (accepted or candidate species) and information on geographic location, if possible including geographical coordinates. Usage of standards defined in the Darwin Core or ABCD (Holetschek et al. 2012; Wieczorek et al. 2012) would be highly advisable. In general, however, the submission procedure should keep mandatory metadata to a minimum but provide an extensive, standardized list of optional metadata, as in the minimum checklist concept of the Minimum Information about any (x) Sequence (MIxS) for DNA data (Yilmaz et al. 2011).

Taxonomy is firmly grounded in history. Studies published 100 or 200 years ago are regularly consulted by taxonomists today and so are voucher specimens collected over centuries (see also Venu and Sanjappa 2011). The principal task of natural history museums and herbaria is to preserve biological materials in perpetuity. The rapid technological turnover of the digital era therefore elicits concerns in the taxonomic community (e.g., Dubois 2003; Padial and De la Riva 2007): can data storage be ensured for “perpetuity”? This concern may be alleviated by data repositories acquiring a certificate, like the CoreTrustSeal (https://www.coretrustseal.org/), which certifies that they are sustainable and trustworthy. Because museums and herbaria already provide long-term storage and careful curation of specimens, their data centers are also the ideal location for long-term repositories of specimen-associated data, certified under even stricter rules such as requiring a well-defined exit strategy defining where the data will be archived if the repository ceases to exist (Table 3).

Taxonomic data repositories should include (i) the option of complex advanced searches with elaborate combinations of inclusion and exclusion of search terms (and/or an API), (ii) semantic (contextual) searches for finding species under synonymous names, (iii) fuzzy searches allowing for different spelling variations, e.g., of specimen identifiers, and (iv) the option to search a repository through other, general portals like GBIF (gbif.org) or GFBio (gfbio.org). Searches that include taxon names could be facilitated by the possibility to access established taxonomic backbones, such as the NCBI taxonomy (Federhen 2012), GBIF, or the many databases underlying the Catalogue of Life (http://www.catalogueoflife.org/), or ideally to a dynamic database providing a Global Names Architecture (Pyle 2016).

Large-scale taxonomic studies are often impeded by the sheer amount of data that need to be compared. The problem is compounded by an inherent conflict between the two main interests of taxonomy: quality and speed of delimitation (Sangster and Luksenburg 2015). Probabilistic tools for (semi-)automated species delimitation relying on high-quality data repositories might help. A few such tools have been developed, including Structure (Pritchard et al. 2000), GMYC (Pons et al. 2006), Haploweb (Flot et al. 2010), ABC (Camargo et al. 2012), ABGD (Puillandre et al. 2012), RESL (Ratnasingham and Hebert 2013), and PTP (Zhang et al. 2013), but they all rely on DNA data and do not integrate other taxonomic evidence (Edwards and Knowles 2014). Examples of programs for automated integrative species delimitation (including information from geography or morphology) are Geneland (Guillot et al. 2005) and iBPP (Solís-Lemus et al. 2015). In the future, initial species delimitation hypotheses could be elaborated by probabilistic (machine-learning) algorithms that make full use of data from different repositories. For this to work, data in repositories need to be machine-accessible, standardized, reviewed, georeferenced, and current.

A final criterion for taxonomic data repositories is flexibility in format because of the diversity of taxonomic data (above and Figs. 3 and 4). To reflect this diversity, data submission should allow for user-defined metadata formats, but enforce the use of Darwin Core or ABCD standards (Holetschek et al. 2012; Wieczorek et al. 2012; Cicero et al. 2017) where applicable and not impose restrictions on the number of data files to be submitted. None of the 15 taxonomic repositories reviewed for this article meets all 12 of the needs and criteria assessed (Tables 3 and 4, Supplementary Appendix S12 available on Dryad). Some criteria, especially free and open access, are fulfilled by most repositories, but taxonomy-specific options for submission or search are not. As examples, the leading repositories in the field of molecular data (GenBank, http://www.ncbi.nlm.nih.gov/genbank; DDBJ, https://www.ddbj.nig.ac.jp; ENA, https://www.ebi.ac.uk/ena) seem to be compliant with most of the criteria in Table 3. In contrast, taxonomy-specific repositories, for instance those for bioacoustic recordings in amphibian taxonomy (Köhler et al. 2017), do not make data openly available for reuse.

RECOMMENDATIONS AND CONCLUSIONS

The last decades have seen a massive increase of taxonomic cyber-infrastructure, delivering crucial services to many end users. Only a minor fraction of this infrastructure has, however, been specifically conceived to support the alpha-taxonomic workflow itself. Taxonomists themselves need to become more involved with the development of tools to integrate the existing resources into their operational pipelines. Perhaps most important are data portals to retrieve and submit specimen-based data. Via customized searches, a taxonomic portal fully dedicated to aggregating data based on specimen identifiers would retrieve in real time all data (DNA sequences, images, current species attribution) available for a specimen across distributed repositories and databases, thus coming close to the cyberspecimen concept. Distributed collection catalog portals, in particular VertNet (http://vertnet.org/), already have implemented many of the search options needed by taxonomists and could be successively expanded (Cicero et al. 2017). Connecting such a catalog to molecular data repositories, especially GenBank (https://www.ncbi.nlm.nih.gov/genbank/) or the Barcode of Life (http://www.boldsystems.org/), whose structure fits our criteria for taxonomic data repositories quite well (Table 4), seems to be a logical first step. Repositories should also be linked with taxonomic databases in a flexible way, allowing data to be retrieved not only under the current taxonomic name but also in nomenclatural and perhaps taxonomic synonym searches. A closer collaboration of taxonomists with the data scientists working on large cybertaxonomy projects in the same institutions may create unexpected synergies

TABLE 3. Criteria relevant for specimen-based taxonomic data repositories.

Priority | Criterion | Explanation
1 | Specimen-based data structure | As alpha-taxonomy is centered on specimens, the repository structure must allow for the identification of data from specimen numbers. Both submission and retrieval/search must include a specimen identifier option.
2 | Sustainability: certainty of perpetual data storage | The naming of organisms is based on the principle of historical priority, and in taxonomy, publications and data do not lose importance over time. The long-term availability of taxonomic data is therefore a sine qua non condition for repositories. This includes, but is not limited to, long-term funding (preferably permanent), adequate data backups and, if possible, the existence of mirrors and contingency strategies.
3 | Adherence to the FAIR principles | The principles of findability, accessibility, interoperability, and reusability partly overlap with the more specific conditions listed in this table; still, overall adherence to the FAIR principles constitutes an important criterion, measurable by “Fair Metrics” (Wilkinson et al. 2018).
4 | Free of charge for data submitters and open access for data users | Many taxonomists do not have access to institutional funds, and many taxonomic journals do not cover repository fees. To be successful in capturing an increasing proportion of taxonomy-related data, a repository must not charge data submission fees.
5 | User-friendly low-complexity workflow for data submission | Time-consuming submission procedures strongly deter the large community of taxonomists (including amateurs) from making their data available. Furthermore, given the enormous differences among collections in defining and labeling specimens, data-deficient historical specimens, and nonstandardized collections across the world, the number of mandatory data fields for submission should be minimal (specimen identifier, species name, geographic location).
6 | Submission and storage of data packages from multicollection sets of specimens | Taxonomists typically revise a group of organisms by examining specimens from collections held by multiple institutions, often from different countries and continents. Repositories should allow for coherent data packages containing such multicollection data rather than institutional or national repositories restricting data to those from their collection or country.
7 | Data submission portal with options for taxonomic (specimen-based) data | Even if a repository allows for specimen identifiers, the submission tools are often not optimized for taxonomy-related data. Ideally, a repository should allow bulk submissions of many kinds of data (e.g., DNA sequences, images), linked to specimen identifiers by a separate metadata table.
8 | Machine-accessible for automated data retrieval | Given the prospect of machine-learning tools for species delimitation and species identification, the information in a repository should be automatically retrievable and readable through the web.
9 | Link to taxonomic databases for species identifiers, synonymies, etc. | The assignment of species names to taxonomic data is secondary because these names are bound to change over time. Yet, to facilitate their retrieval, data should be associated as much as possible with accepted and valid genus and species names. Through dynamic links to taxonomic databases, entries can be assigned to species names even if originally entered under different synonyms, declensions, or combinations.
10 | Compliance with taxonomic data standards | While allowing for flexibility and enforcing only a minimal number of metadata fields per data item is preferable, repositories for taxonomic data should ideally be structured in agreement with international taxonomy standards: metadata field names should agree with Darwin Core or ABCD terminology, and specimen identifiers should allow for CETAF standards.
11 | Manual search options tailored to the needs of taxonomists | To reflect the variation of taxonomic questions, advanced, semantic, and fuzzy searches are desirable.
12 | Data searches possible through other portals | Repositories should be favored for taxonomic data if they are linked to overarching data portals which can be used to search multiple repositories at once.
13 | No limitation to number of data files | Since data packages for taxonomic monographs may contain data on hundreds or thousands of specimens, a repository should not enforce an a priori limit on the number of data items per submission.
14 | Wide use and acceptance by the community | Reinventing the wheel should be avoided and repositories widely accepted and used by the community should be preferred, i.e., repositories (i) where many data have already been submitted by (ii) a large number of different submitters, and (iii) which are listed as standard and recommended by journals and publishers (e.g., Springer Nature and PLoS lists).
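Criteria 5, 7, and 10 of Table 3 can be illustrated with a minimal sketch of a submission check: a separate metadata table links each data file to a specimen identifier, and only three fields are mandatory. The column names catalogNumber, scientificName, and locality are Darwin Core terms; the file column, the sample rows, the specimen identifiers, and the function name are hypothetical and serve only as an illustration, not as the template proposed in the Supplementary Appendices.

```python
import csv
import io
from collections import defaultdict

# Mandatory fields (specimen identifier, species name, geographic location),
# expressed here with Darwin Core term names.
REQUIRED_FIELDS = ("catalogNumber", "scientificName", "locality")
FILE_COLUMN = "file"  # hypothetical column naming the submitted data file

# Hypothetical metadata table: one row per data file, linked to a specimen.
SAMPLE = (
    "catalogNumber,scientificName,locality,file\n"
    "EX-0001,Genus alpha,Site A,EX-0001_dorsal.jpg\n"
    "EX-0001,Genus alpha,Site A,EX-0001_coi.fasta\n"
    "EX-0002,Genus beta,,EX-0002_dorsal.jpg\n"
)


def load_metadata(text):
    """Group data files by specimen identifier and report incomplete rows.

    Returns (files_by_specimen, problems), where problems is a list of
    (line_number, [names of empty mandatory fields]).
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [f for f in REQUIRED_FIELDS + (FILE_COLUMN,) if f not in header]
    if missing:
        raise ValueError(f"metadata table lacks columns: {missing}")

    files_by_specimen = defaultdict(list)
    problems = []
    for row in reader:
        empty = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if empty:
            # reader.line_num is the line just read (rows are single-line here)
            problems.append((reader.line_num, empty))
            continue
        files_by_specimen[row["catalogNumber"].strip()].append(row[FILE_COLUMN])
    return dict(files_by_specimen), problems


if __name__ == "__main__":
    grouped, issues = load_metadata(SAMPLE)
    print(grouped)
    print(issues)
```

Keeping the mandatory set this small reflects the argument of criterion 5; a real portal would presumably add optional fields and identifier checks (e.g., against CETAF-style identifiers), which are not attempted here.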

because often, small modifications to existing data-aggregating portals could substantially improve their utility for taxonomists.

Images are among the most widely produced and used types of data in alpha-taxonomy (Fig. 7). Establishing portals that allow image repositories to be searched by specimen identifiers should become a priority. Images are semistructured data, and successfully managing or searching such data requires metadata, including species identifiers, annotations, scale information, authorship, and geographical location. New software solutions are needed to collect and safeguard this information and the diverse image data. Recently, image annotation software tools have been proposed to support, for instance, environmental monitoring (Schlining and Stout 2006; Kloster et al. 2014; Althaus et al. 2015; Beijbom et al. 2015; Langenkämper et al. 2017). These tools are easy to use and require little computational power (Zurowietz et al. 2019). Most of them are already equipped with machine-learning functions to automate some steps in the annotation process. Toolboxes to be included in taxonomic repositories, or in cyberspecimen data portals, could include automatic detection of rulers or scale bars, dynamic continuous zoom, and measurement tools for both 2D and 3D images.

Versatile data portals connected to rich taxonomic data repositories would benefit taxonomists as well as end users of taxonomy. For instance, the progress in computational power and imaging technology on smartphones allows the collection of visual data and the instant availability of taxonomic knowledge on a new scale. There is a boom of cellphone apps that identify species of plants and mushrooms (e.g., Pl@ntNet, https://identify.plantnet.org/; PlantSnap, https://www.plantsnap.com/; Naturblick, http://www.naturblick.naturkundemuseum.berlin) or

TABLE 4. Evaluation of a selection of 15 repositories according to 12 of the 14 criteria listed in Table 3 (++ = compliant; + = unsatisfactory or partial; − = not compliant).

Columns (criterion numbers as in Table 3): 1.1 = Specimen-based structure (submission); 1.2 = Specimen-based structure (search); 2 = Sustainability; 3 = Compliance with FAIR criteria; 4.1 = Free of charge (submitters); 4.2 = Free of charge (users); 6 = Multicollection data package; 7 = Submission portal with taxonomic options; 9 = Link to taxonomic databases; 10 = Compliance with taxonomic data standards; 11 = Taxonomic search options; 12 = Access through other portals; 13 = No limitation to number of data; 14 = Wide use/acceptance by the community.

Repository       | 1.1 | 1.2 | 2   | 3   | 4.1 | 4.2 | 6   | 7   | 9   | 10  | 11  | 12  | 13  | 14
DRYAD            | +   | −   | ++  | ++  | −   | ++  | ++  | −   | −   | +   | +   | −   | +   | ++
Figshare         | +   | +   | +   | ++  | ++  | ++  | ++  | −   | −   | +   | −   | −   | +   | +
Macaulay Library | ++  | ++  | +   | −   | ++  | +   | ++  | +   | +   | ++  | −   | ++  | ++  | ++
PANGEA           | +   | −   | ++  | ++  | ++  | ++  | ++  | +   | −   | ++  | −   | ++  | ++  | ++
OSF              | −   | −   | ++  | ++  | ++  | ++  | ++  | −   | −   | −   | −   | −   | ++  | +
Morphobank       | +   | +   | +   | ++  | ++  | ++  | ++  | −   | −   | ++  | −   | −   | ++  | ++
Digimorph        | ++  | +   | −   | −   | ++  | +   | ++  | −   | −   | −   | −   | −   | +   | +
Morphomuseum     | ++  | ++  | −   | −   | ++  | ++  | ++  | −   | −   | ++  | +   | −   | ++  | +
Morphosource     | ++  | ++  | ++  | ++  | ++  | ++  | ++  | +   | −   | +   | +   | −   | ++  | ++
IDR              | +   | +   | +   | ++  | ++  | ++  | ++  | ++  | −   | −   | −   | +   | ++  | +
Metabolights     | ++  | ++  | +   | ++  | ++  | ++  | ++  | +   | −   | +   | −   | −   | ++  | ++
Genbank          | ++  | ++  | ++  | ++  | ++  | ++  | ++  | +   | ++  | ++  | ++  | +   | ++  | ++
DDBJ             | ++  | ++  | ++  | ++  | ++  | ++  | ++  | +   | ++  | ++  | ++  | +   | ++  | ++
ENA              | ++  | ++  | ++  | ++  | ++  | ++  | ++  | +   | ++  | ++  | ++  | ++  | ++  | ++
Movebank         | ++  | −   | +   | ++  | ++  | ++  | ++  | ++  | −   | +   | −   | ++  | ++  | +

See details in Supplementary Appendix S12 available on Dryad.
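As a reading aid for Table 4, the following minimal sketch tallies the compliance symbols of a repository and lists the criteria scored as not compliant. Only three rows are transcribed (with an ASCII hyphen standing in for the minus sign); the function names and the idea of a simple tally are assumptions made for illustration and are not a ranking method used in this study.

```python
from collections import Counter

# Column codes of Table 4, in table order (criteria 5 and 8 are not evaluated).
COLUMNS = ["1.1", "1.2", "2", "3", "4.1", "4.2", "6", "7",
           "9", "10", "11", "12", "13", "14"]

# Three rows transcribed from Table 4; "-" stands for "not compliant".
TABLE_4 = {
    "GenBank": "++ ++ ++ ++ ++ ++ ++ + ++ ++ ++ + ++ ++",
    "ENA":     "++ ++ ++ ++ ++ ++ ++ + ++ ++ ++ ++ ++ ++",
    "OSF":     "- - ++ ++ ++ ++ ++ - - - - - ++ +",
}


def parse(row):
    """Split a row into symbols and check that it has one symbol per column."""
    symbols = row.split()
    if len(symbols) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} symbols, got {len(symbols)}")
    return symbols


def tally(row):
    """Count how often each symbol (++, +, -) occurs in a row."""
    return Counter(parse(row))


def non_compliant(row):
    """Return the column codes scored '-' for a repository."""
    return [col for col, sym in zip(COLUMNS, parse(row)) if sym == "-"]


if __name__ == "__main__":
    for name, row in TABLE_4.items():
        print(name, dict(tally(row)), non_compliant(row))
```

A plain count treats all criteria as equally important, whereas Table 3 orders them by priority; any weighting would be a further assumption and is deliberately left out.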


animals (e.g., https://fieldguide.ai/) or all of the above (https://www.inaturalist.org) by automated comparison of photos with large image collections. Similar apps also exist for sound-based species identification of birds (e.g., SongSleuth, https://www.songsleuth.com/; BirdNet, https://birdnet.cornell.edu/; BirdGenie, https://press.princeton.edu/apps/birdgenie.html; BirdSongID, http://isoperla.co.uk/; ChirpOMatic, http://www.chirpomatic.com/), bats (e.g., iBatsID, https://sites.google.com/site/ibatsresources/iBatsID), and increasingly also insects (e.g., CicadaHunt, http://newforestcicada.info/app/). These apps impressively demonstrate the potential of computer-based approaches to species identification and provide a glimpse into what may be possible in a future in which large virtual collections of cyberspecimens become available to train artificial intelligence pipelines.

Having reviewed numerous data repositories for this study, we propose a pilot submission template in Supplementary Appendices S13 and S14 available on Dryad, building upon models established by the NCBI Sequence Read Archive and (re-)using ABCD terms. This template is currently being tested for the submission of data to the GFBio data centers (Diepenbroek et al. 2014). Because taxonomy is intrinsically dependent on long-term availability of data, taxonomists will be highly motivated to meet the “taxonomic data repository” challenge and to develop concepts of truly sustainable, potentially perpetual data storage. The electricity usage and the carbon footprint associated with data storage (Andrae and Edler 2015; Jones 2018) may require standards allowing submitters to identify which data truly merit long-term storage (e.g., to prevent submission of redundant or blurred pictures, or to optimize their resolution level when it is excessively high). A stringent archiving strategy of original taxonomic data could become an integral part of a renewed procedure to name new species, accelerated but without compromising the quality of species hypotheses, mobilizing species information through images, DNA sequences, sounds, or tabulated trait information, while relieving taxonomists from manually compiling lengthy descriptions. Although words will necessarily remain the means to justify taxonomic decisions, evaluate species criteria, and (briefly) list diagnostic features of new species, taxonomists should consider moving towards publishing alpha-taxonomic results as interlinked, standardized, and openly accessible data sets rather than traditional descriptive papers.

SUPPLEMENTARY MATERIAL
Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.fj6q573qd.

FUNDING
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, grant number DFG RE 603/29-1) and benefited from the sharing of expertise within the DFG priority program SPP 1991 Taxon-Omics.

ACKNOWLEDGMENTS
We are grateful to William N. Eschmeyer, Jon D. Fong, Ronald Fricke, Darrel R. Frost, Rafaël Govaerts, Vincent Robert, Peter Uetz, and Richard van der Laan for useful advice and data on rates of species discovery and naming. We thank Christy Hipsley and one anonymous reviewer for constructive feedback on our manuscript. We also thank Steve A. Marshall, Neal Evenhuis, and Sébastien Soubzmaigne for allowing the use of original photographs.

REFERENCES
Akkari N., Enghoff H., Metscher B.D. 2015. A new dimension in documenting new species: high-detail imaging for myriapod taxonomy and first 3D cybertype of a new millipede species (Diplopoda, Julida, Julidae). PLoS One 10:e0135243.
Althaus F., Hill N., Ferrari R., Edwards L., Przeslawski R., Schönberg C.H., Stuart-Smith R., Barrett N., Edgar G., Colquhoun J., Tran M., Jordan A., Rees T., Gowlett-Holmes K. 2015. A standardised vocabulary for identifying benthic biota and substrata from underwater imagery: the catami classification scheme. PLoS One 10:e0141039.
Amorim D.S., Santos C.M.D., Krell F.T., Dubois A. 2016. Timeless standards for species delimitation. Zootaxa 4137:121–128.
Andrae A.S.G., Edler T. 2015. On global electricity usage of communication technology: trends to 2030. Challenges 6:117–157.
Anonymous [International Commission on Zoological Nomenclature]. 1999. International code of zoological nomenclature. 4th ed. London: International Trust for Zoological Nomenclature, p. i–xxix + 1–306.
Assante M., Candela L., Castelli D., Tani A. 2016. Are scientific data repositories coping with research data publishing? Data Sci. J. 15:6.
Balke M., Schmidt S., Hausmann A., Toussaint E.F.A., Bergsten J., Buffington M., Häuser C.L., Kroupa A., Hagedorn G., Riedel A., Polaszek A., Ubaidillah R., Krogmann L., Zwick A., Fikáček M., Hájek J., Michat J.C., Dietrich C., La Salle J., Mantle B.K.L., Ng P., Hobern D. 2013. Biodiversity into your hands - a call for a virtual global natural history ‘metacollection’. Front. Zool. 10:55.
Beijbom O., Edmunds P., Roelfsema C., Smith J., Kline D., Neal B., Dunlap M.J., Moriarty V., Fan T.Y., Tan C.J., Chan S., Treibitz T., Gamst A., Mitchell B.G., Kriegman D. 2015. Towards automated annotation of benthic survey images: variability of human experts and operational modes of automation. PLoS One 10:e0130312.
Bik H.M. 2017. Let’s rise up to unite taxonomy and technology. PLoS Biol. 15:e2002231.
Bosselaers J., Dierick M., Cnudde V., Masschaele B., van Hoorebeke L., Jacobs P. 2010. High-resolution X-ray computed tomography of an extant new Donuea (Araneae: Liocranidae) species in Madagascan copal. Zootaxa 2427:25–35.
Brooke M. de L. 2000. Why museums matter. Trends Ecol. Evol. 15:136–137.
Camargo A., Morando M., Avila L.J., Sites J.W. 2012. Species delimitations with ABC and other coalescent-based methods: a test of accuracy with simulations and an empirical example with lizards of the Liolaemus darwinii complex (Squamata: Liolaemidae). Evolution 66:2834–2849.
Cannon P., Aguirre-Hudson B., Aime M.C., Ainsworth A.M., Bidartondo M.I., Gaya E., Hawksworth D., Kirk P., Leitch I.J., Lücking R. 2018. Definition and diversity. In: Willis K.J., editors. State of the world’s fungi. Report. Kew: Royal Botanic Gardens, p. 4–11.
Ceriaco L.M.P., Gutiérrez E.E., Dubois A. 2016. Photography-based taxonomy is inadequate, unnecessary, and potentially harmful for biological sciences. Zootaxa 4196(3):435–445.
Chauvel B., Dessaint F., Cardinal-Legrand C., Bretagnolle F. 2006. The historical spread of Ambrosia artemisiifolia L. in France from herbarium records. J. Biogeogr. 33:665–673.
Christidis L. (Ed.) 2018. The Howard and Moore complete checklist of the birds of the world, version 4.1 (Downloadable checklist). Available from: https://www.howardandmoore.org (March 15, 2019).
Cicero C., Spencer C.L., Bloom D.A., Guralnick R.P., Koo M.S., Otegui J., Russell L.A., Wieczorek J.R. 2017. Biodiversity informatics and data quality on a global scale. In: Webster M.S., editors. Emerging frontiers in collections-based ornithological research: the extended specimen. Studies in avian biology. Boca Raton, FL: CRC Press, p. 201–218.
Costello M.J., May R.M., Stork N.E. 2013a. Can we name Earth’s species before they go extinct? Science 339(6118):413–416.
Costello M.J., Wilson S., Houlding B. 2013b. More taxonomists describing significantly fewer species per unit effort may indicate that most species have been discovered. Syst. Biol. 62:616–624.
Crous P.W., Gams W., Stalpers J.A., Robert V., Stegehuis G. 2004. MycoBank: an online initiative to launch mycology into the 21st century. Stud. Mycol. 50(1):19–22.
De Mauro A., Greco M., Grimaldi M. 2016. A formal definition of big data based on its essential features. Library Rev. 65:122–135.
de Queiroz K. 1998. The general lineage concept of species, species criteria, and the process of speciation. In: Howard D.J., Berlocher S.H., editors. Endless forms: species and speciation. New York: Oxford University Press, p. 57–75.
de Queiroz K. 2007. Species concepts and species delimitation. Syst. Biol. 56:879–886.
Diepenbroek M., Glöckner F., Grobe P., Güntsch A., Huber R., König-Ries B., Kostadinov I., Nieschulze J., Seeger B., Tolksdorf R., Triebel D. 2014. Towards an integrated biodiversity and ecological research data management and archiving platform: the German Federation for the Curation of Biological Data (GFBio). In: Plödereder E., Grunske L., Schneider E., Ull D., editors. Informatik 2014 - big data komplexität meistern. GI-Edition: Lecture Notes in Informatics (LNI) - Proceedings. GI edn., vol. 232. Bonn: Köllen, p. 1711–1724.
Dietrich C., Hart J., Raila D., Ravaioli U., Sobh N., Sobh O., Taylor C. 2012. InvertNet: a new paradigm for digital access to invertebrate collections. Zookeys 209:165–181.
Dubois A. 2003. Should internet sites be mentioned in the bibliographies of scientific publications? Alytes 21:1–2.
Edwards D.L., Knowles L.L. 2014. Species detection and individual assignment in species delimitation: can integrative data increase efficacy? Proc. R. Soc. Lond. [Biol]. 281:20132765.
Faulwetter S., Vasileiadou A., Kouratoras M., Dailianis T., Arvanitidis C. 2013. Micro-computed tomography: introducing new dimensions to taxonomy. Zookeys 263:1–45.
Favret C. 2014. Cybertaxonomy to accomplish big things in aphid systematics. Insect Sci. 21:392–399.
Federhen S. 2012. The NCBI taxonomy database. Nucleic Acids Res. 40 (Database issue):D136–D143.
Flot J.-F., Couloux A., Tillier S. 2010. Haplowebs as a graphical tool for delimiting species: a revival of Doyle’s “field for recombination” approach and its application to the coral genus Pocillopora in Clipperton. BMC Evol. Biol. 10:1–14.
Fontaine B., van Achterberg K., Alonso-Zarazaga M.A., Araujo R., Asche M., Aspöck H., Aspöck U., Audisio P., Aukema B., Bailly N., Balsamo M., Bank R.A., Belfiore C., Bogdanowicz W., Boxshall G., Burckhardt D., Chylarecki P., Deharveng L., Dubois A., Enghoff H., Fochetti R., Fontaine C., Gargominy O., Gomez Lopez M.S., Goujet D., Harvey M.S., Heller K.G., van Helsdingen P., Hoch H., De Jong Y., Karsholt O., Los W., Magowski W., Massard J.A., McInnes S.J., Mendes L.F., Mey E., Michelsen V., Minelli A., Nieto Nafría J.M., van Nieukerken E.J., Pape T., De Prins W., Ramos M., Ricci C., Roselaar C., Rota E., Segers H., Timm T., van Tol J., Bouchet P. 2012. New species in the old world: Europe as a frontier in biodiversity exploration, a test bed for 21st century taxonomy. PLoS One 7:e36881.
Frisvad J.C., Andersen B., Thrane U. 2008. The use of secondary metabolite profiling in chemotaxonomy of filamentous fungi. Mycol. Res. 112(2):231–240.
Frost D.R. 2019. Amphibian species of the world: an online reference. Version 6.0. Website. Available from: http://research.amnh.org/herpetology/amphibia/index.html. American Museum of Natural History, New York, USA (March 15, 2019).
Garraffoni A.R.S., Freitas A.V.L. 2017. Photos belong in the taxonomic code. Science 355(6327):805.
Gemeinholzer B., Vences M., Beszteri B., Bruy T., Felden J., Kostadinov I., Miralles A., Nattkemper T.W., Printzen C., Renz J., Rybalka N., Schuster T., Weibulat T., Wilke T., Renner S.S. 2020. Data storage and data re-use in taxonomy - the need for improved storage and accessibility of heterogeneous data. Org. Divers. Evol. 20:1–8.
Gignac P.M., Kley N.J., Clarke J.A., Colbert M.W., Morhardt A.C., Cerio D., Cost I.N., Cox P.G., Daza J.D., Early C.M., Echols M.S., Henkelman R.M., Herdina A.N., Holliday C.M., Li Z., Mahlow K., Merchant S., Müller J., Orsbon C.P., Paluh D.J., Thies M.L., Tsai H.P., Witmer L.M. 2016. Diffusible iodine-based contrast-enhanced computed tomography (diceCT): an emerging tool for rapid, high-resolution, 3-D imaging of metazoan soft tissues. J. Anat. 228(6):889–909.
Godfray H.C.J.Jr. 2007. Linnaeus in the information age. Nature 446:259–260.
Grass A., Tremetsberger K., Hössinger R., Bernhardt K-G. 2014. Change of species and habitat diversity in the Pannonian Region of Eastern Lower Austria over 170 years: using herbarium records as a witness. Nat. Resour. 5:583–596.
Guillot G., Estoup A., Mourtier F., Cosson J.F. 2005. A spatial statistical model for landscape genetics. Genetics 170:1261–1280.
Güntsch A., Groom Q., Hyam R., Chagnoux S., Röpert D., Berendsohn W., Casino A., Droege G., Gerritsen W., Holetschek J., Marhold K., Mergen P., Rainer H., Smith V., Triebel D. 2018. Standardised globally unique specimen identifiers. Biodivers. Inf. Sci. Stand. 2:e26658.
Guralnick R.P., Cellinese N., Deck J., Pyle R.L., Kunze J., Penev L., Walls R., Hagedorn G., Agosti D., Wieczorek J., Catapano T., Page R. 2015. Community next steps for making globally unique identifiers work for biocollections data. ZooKeys 494:133–154.
Haas F., Häuser C.L. 2007. How many taxonomists are there? Available from: http://www.senckenberg.uni-frankfurt.de/odes/Haas_Haeuser.pdf.
Hawksworth D.L., Hibbett D.S., Kirk P.M., Lücking R. 2016. Proposals to permit DNA sequence data to serve as types of names of fungi. Taxon 65:899–900.
Helaly S.E., Thongbai B., Stadler M. 2018. Diversity of biologically active secondary metabolites from endophytic and saprotrophic fungi of the ascomycete order Xylariales. Nat. Prod. Rep. 35:992–1014.
Hipsley C.A., Sherratt E. 2019. Psychology, not technology, is our biggest challenge to open digital morphology data. Sci. Data 6:41.
Holetschek J., Dröge G., Güntsch A., Berendsohn W.G. 2012. The ABCD of primary biodiversity data access. Plant Biosyst. 146:771–779.
Hongsanan S., Xie N., Liu J.K., Dissanayake A., Ekanayaka A.H., Raspé O., Jayawardena R.S., Hyde K.D., Jeewon R., Purahong W., Stadler M., Peršoh D. 2018. Can we use environmental DNA as holotypes? Fungal Divers. 92:1–30.
Hopkins G.W., Freckleton R.P. 2002. Declines in the numbers of amateur and professional taxonomists: implications for conservation. Anim. Conserv. 5:245–249.
IISE 2011. State of observed species. Tempe, AZ: International Institute for Species Exploration. Available from: http://species.asu.edu/SOS (March 15, 2019).
Jones N. 2018. How to stop data centres from gobbling up the world’s electricity. Nature 561:163–166.
Joppa L.N., Roberts D.L., Pimm S.L. 2011. The population ecology and social behaviour of taxonomists. Trends Ecol. Evol. 26:551–553.
Kather R., Martin S.J. 2012. Cuticular hydrocarbon profiles as a taxonomic tool: advantages, limitations and technical aspects. Physiol. Entomol. 37:25–32.
Kinzner M.C., Wagner H.C., Peskoller A., Moder K., Dowell F.E., Arthofer W., Schlick-Steiner B.C., Steiner F.M. 2015. A near-infrared spectroscopy routine for unambiguous identification of cryptic ant species. PeerJ 3:e991.
Kloster M., Kauer G., Beszteri B. 2014. SHERPA: an image segmentation and outline feature extraction tool for diatoms and other objects. BMC Bioinformatics 15:218.
Knapp S. 2008. Taxonomy as a team sport. In: Wheeler Q., editor. The new taxonomy. Systematics Association Special Volume 76. London: CRC Press. p. 33–53.
Köhler J., Jansen M., Rodríguez A., Kok P.J.R., Toledo L.F., Emmrich M., Glaw F., Haddad C.F.B., Rödel M.O., Vences M. 2017. The use of bioacoustics in anuran taxonomy: theory, terminology, methods and recommendations for best practice. Zootaxa 4251:1–124.
Krell F.-T. 2015. ZooBank progress report. Bull. Zool. Nomenclat. 72:181.
Krell F.-T., Marshall S.A. 2017. New species described from photographs: Yes? No? Sometimes? A fierce debate and a new Declaration of the ICZN. Insect Syst. Divers. 1(1):3–19.
Kuhnert E., Sir E.B., Lambert C., Hyde K.D., Hladki A.I., Romero A.I., Rohde M., Stadler M. 2017. Phylogenetic and chemotaxonomic resolution of the genus Annulohypoxylon (Xylariaceae) including four new species. Fungal Divers. 85:1–43.
Langenkämper D., Zurowietz M., Schoening T., Nattkemper T.W. 2017. BIIGLE 2.0 - browsing and annotating large marine image collections. Front. Mar. Sci. 4:83.
Larsen B.B., Miller E.C., Rhodes M.K., Wiens J.J. 2017. Inordinate fondness multiplied and redistributed: the number of species on Earth and the new pie of life. Q. Rev. Biol. 92:229–265.
LaSalle J., Wheeler Q., Jackway P., Winterton S., Hobern D., Lovell D. 2009. Accelerating taxonomic discovery through automated character extraction. Zootaxa 2217:43–55.
Le Bras G., Pignal M., Jeanson M.L., Muller S., Aupic C., Carré B., Flament G., Gaudeul M., Gonçalves C., Invernón V.R., Jabbour F., Lerat E., Lowry P.P., Offroy B., Pimparé Pérez E., Poncy O., Rouhan G., Haevermans T. 2017. The French Muséum national d’Histoire naturelle vascular plant herbarium collection dataset. Sci. Data 4:170016.
Lendemer J., Thiers B., Monfils A.K., Zaspel J., Ellwood E.R., Bentley A., LeVan K., Bates J., Jennings D., Contreras D., Lagomarsino L., Mabee P., Ford L.S., Guralnick R., Gropp R.E., Revelez M., Cobb N., Seltmann K., Aime M.C. 2020. The extended specimen network: a strategy to enhance US biodiversity collections, promote research and education. BioScience 70(1):23–30.
Leonelli S. 2014. What difference does quantity make? On the epistemology of big data in biology. Big Data Soc. 2014:1–11.
Linnaeus C. 1753. Species plantarum exhibentes plantas rite cognitas ad genera relatas, cum differentiis specificis, nominibus trivialibus, synonymis selectis, locis natalibus, secundum systema sexuale digestas. Holmiæ [Stockholm]: Impensis Laurentii Salvi. 132 p.
Linnaeus C. 1758. Systema naturæ per regna tria naturæ, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Tomus I. Editio decima, reformata. Holmiæ [Stockholm]: Impensis Laurentii Salvi. 824 p.
Locey K.J., Lennon J.T. 2016. Scaling laws predict global microbial diversity. Proc. Natl. Acad. Sci. USA 113(21):5970–5975.
Lorieul T., Pearson K.D., Ellwood E.R., Goëau H., Molino J.F., Sweeney P.W., Yost J.M., Sachs J., Mata-Montero E., Nelson G., Soltis P.S., Bonnet P., Joly A. 2019. Toward a large-scale and deep phenological stage annotation of herbarium specimens: case studies from temperate, tropical, and equatorial floras. Appl. Plant Sci. 7(3):e01233.
Louis K.S., Jones L.M., Campbell E.G. 2002. Macroscope: Sharing in Science. Am. Sci. 90:304–307.
Lumbsch H.T. 2002. Analysis of phenolic products in lichens for identification and taxonomy. In: Kranner I.C., Beckett R.P., Varma A.K., editors. Protocols in lichenology. Springer Lab Manuals. Berlin, Heidelberg: Springer. p. 281–295.
Lynch C. 2008. Big data: How do your data grow? Nature 455:28–29.
Marcial L.H., Hemminger B.M. 2010. Scientific data repositories on the web: an initial survey. J. Assoc. Inf. Sci. Technol. 61(10):2029–2048.
Marshall S.A., Evenhuis N.L. 2015. New species without dead bodies: a case for photo-based descriptions, illustrated by a striking new species of Marleyimyia Hesse (Diptera, Bombyliidae) from South Africa. ZooKeys 525:117–127.
May T.W., Redhead S.A., Lombard L., Rossman A.Y. 2018. XI International Mycological Congress: report of Congress action on nomenclature proposals relating to fungi. IMA Fungus 9(2):xxii.
Mayden R.L. 1997. A hierarchy of species concepts: the denouement in the saga of the species problem. In: Claridge M.F., Dawah H.A., Wilson M.R. editors. Species: the units of diversity. London, NY: Chapman & Hall. p. 381–423.
McClellan P.H. 2019. Taxonomic punchlines: metadata in biology. Hist. Biol. https://doi.org/10.1080/08912963.2019.1618293.
Miller-Rushing A.J., Primack R.B., Primack D., Mukunda S. 2006. Photographs and herbarium specimens as tools to document phenological changes in response to global warming. Am. J. Bot. 93:1667–1674.
Mora C., Tittensor D.P., Adl S., Simpson A.G.B., Worm B. 2011. How many species are there on Earth and in the Ocean? PLoS Biol. 9:e1001127.
Moreton J., Izquierdo A., Emes R.D. 2015. Assembly, assessment, and availability of de novo generated eukaryotic transcriptomes. Front. Genet. 6:361.
Nelson G., Sweeney P., Gilbert E. 2018. Use of globally unique identifiers (GUIDs) to link herbarium specimen records to physical specimens. Appl. Plant Sci. 6:e1027.
Padial J.M., De la Riva I. 2007. Taxonomy, the Cinderella of science, hidden by its evolutionary stepsister. Zootaxa 1577:1–2.
Padial J.M., Miralles A., De la Riva I., Vences M. 2010. The integrative future of taxonomy. Front. Zool. 7:16.
Page R.D.M. 2016. DNA barcoding and taxonomy: dark taxa and dark texts. Philos. Trans. R. Soc. B. 371:20150334.
Pampel H., Vierkant P., Scholze F., Bertelmann R., Kindling M., Klump J., Goebelbecker H.J., Gundlach J., Schirmbacher P., Dierolf U. 2013. Making research data repositories visible: the re3data.org registry. PLoS One 8:e78080.
Patterson D.J., Cooper J., Kirk P.M., Pyle R.L., Remsen D.P. 2010. Names are key to the big new biology. Trends Ecol. Evol. 25:686–691.
Penev L., Agosti D., Georgiev T., Senderov V., Sautter G., Catapano T., Stoev P. 2018. The open biodiversity knowledge management (eco-)system: tools and services for extraction, mobilization, handling and re-use of data from the published literature. Biodiver. Inf. Sci. Stand. 2:e25748.
Pons J., Barraclough T.G., Gomez-Zurita J., Cardoso A., Duran D.P., Hazell S., Kamoun S., Sumlin W.D., Vogler A.P. 2006. Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Syst. Biol. 55:595–609.
Poth D., Wollenberg K.C., Vences M., Schulz S. 2012. Volatile amphibian pheromones: macrolides of mantellid frogs from Madagascar. Angew. Chem. Int. Ed. 51:1–5.
Pritchard J.K., Stephens M., Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959.
Puillandre N., Lambert A., Brouillet S., Achaz G. 2012. ABGD, Automatic barcode gap discovery for primary species delimitation. Mol. Ecol. 21:1864–1877.
Pyle R.L. 2016. Towards a global names architecture: the future of indexing scientific names. Zookeys 550:261–281.
Pyle R.L., Earle J.L., Greene B.D. 2008. Five new species of the damselfish genus Chromis (Perciformes: Labroidei: Pomacentridae) from deep coral reefs in the tropical western Pacific. Zootaxa 1671:3–31.
Ratnasingham S., Hebert P.D.N. 2013. A DNA-based registry for all animal species: the Barcode Index Number (BIN) system. PLoS One 8:e66213.
Renner S.S. 2016. A return to Linnaeus’s focus on diagnosis, not description: the use of DNA characters in the formal naming of species. Syst. Biol. 65:1085–1095.
Riley J. 2004. Understanding metadata. Bethesda, MD: NISO Press, National Information Standards Organization.
Rissler L.J., Apodaca J.J. 2007. Adding more ecology into species delimitation: ecological niche models and phylogeography help define cryptic species in the black salamander (Aneides flavipunctatus). Syst. Biol. 56(6):924–942.
Roch M.A., Batchelor H., Baumann-Pickering S., Berchok C.L., Cholewiak D., Fujioka E., Garland E.C., Herbert S., Hildebrand J.A., Oleson E.M., Van Parijs S., Risch D., Širović A., Soldevilla M.S. 2016. Management of acoustic metadata for bioacoustics. Ecol. Inform. 31:122–136.
Roche D.G., Kruuk L.E., Lanfear R., Binning S.A. 2015. Public data archiving in ecology and evolution: how well are we doing? PLoS Biol. 13:e1002295.
Rodríguez-Fernández J.I., De Carvalho C.J.B., Pasquini C., Gomes de Lima K.M., Moura M.O., Carbajal Arizaga G.G. 2011. Barcoding without DNA? Species identification using near infrared spectroscopy. Zootaxa 2933:46–54.
Rosenberg M.S. 2014. Contextual cross-referencing of species names for fiddler crabs (genus Uca): an experiment in cyber-taxonomy. PLoS One 9:e101704.
Roskov Y., Ower G., Orrell T., Nicolson D., Bailly N., Kirk P.M., Bourgoin T., DeWalt R.E., Decock W., Nieukerken E. van, Zarucchi J., Penev L., eds. 2019. Species 2000 & ITIS Catalogue of Life, 26th February 2019. Digital resource at www.catalogueoflife.org/col. Species 2000. Naturalis, Leiden, the Netherlands.
Rossel S., Martínez Arbizu P. 2019. Revealing higher than expected diversity of Harpacticoida (Crustacea: Copepoda) in the North Sea using MALDI-TOF MS and molecular barcoding. Sci. Rep. 9:9182.
Rupp K. 2018. 42 Years of microprocessor trend data. Website. Available from: https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/ (March 13, 2019).
Sangster G., Luksenburg J.A. 2015. Declining rates of species described per taxonomist: Slowdown of progress or a side-effect of improved quality in taxonomy? Syst. Biol. 64:144–151.
Santos C.M.D., Amorim D.S., Klassa B., Fachin D.A., Nihei S.S., Carvalho C.J., Falaschi R.L., Mello-Patiu C.A., Couri M.S., Oliveira S.S., Silva V.C., Ribeiro G.C., Capellari R.S., Lamas C.J. 2016. On typeless species and the perils of fast taxonomy. Syst. Entomol. 41:511–515.
Scherz M.D., Glaw F., Vences M., Andreone F., Crottini A. 2016a. Two new species of terrestrial microhylid frogs (Microhylidae: Cophylinae: Rhombophryne) from northeastern Madagascar. Salamandra 52:91–106.
Scherz M.D., Ruthensteiner B., Vences M., Glaw F. 2014. A new microhylid frog, genus Rhombophryne, from northeastern Madagascar, and a re-description of R. serratopalpebrosa using micro-computed tomography. Zootaxa 3860:547–560.
Scherz M.D., Vences M., Rakotoarison A., Andreone F., Köhler J., Glaw F., Crottini A. 2016b. Reconciling molecular phylogeny, morphological divergence and classification of Madagascan narrow-mouthed frogs (Amphibia: Microhylidae). Mol. Phylogenet. Evol. 100:372–381.
Schlining B.M., Stout N.J. 2006. "MBARI’s Video Annotation and Reference System," OCEANS 2006. Boston, MA: IEEE. p. 1–5.
Short A.E.Z., Dikow T., Moreau C.S. 2018. Entomological collections in the age of big data. Annu. Rev. Entomol. 63:513–530.
Simpson G.G. 1961. Principles of animal taxonomy. New York: Columbia University Press. p. xii + 247.
Small E. 1989. Systematics of biological Systematics (or, Taxonomy of Taxonomy). Taxon 38(3):335–356.
Smith V., Georgiev T., Stoev P., Biserkov J., Miller J., Livermore L., Baker E., Mietchen D., Couvreur T.L., Mueller G., Dikow T., Helgen K.M., Frank J., Agosti D., Roberts D., Penev L. 2013. Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal. Biodivers. Data J. 1:e995.
Solís-Lemus C., Knowles L.L., Ané C. 2015. Bayesian species delimitation combining multiple genes and traits in a unified framework. Evolution 69:492–507.
Stackebrandt E., Smith D. 2019. Paradigm shift in species description: the need to move towards a tabular format. Arch. Microbiol. 201:143–145.
Starnberger I., Poth D., Peram P.S., Schulz S., Vences M., Knudsen J., Barej M.F., Rödel M.-O., Walzl M., Hödl W. 2013. Take time to smell the frogs: vocal sac glands of reed frogs (Anura: Hyperoliidae) contain species-specific chemical cocktails. Biol. J. Linn. Soc. 110:828–838.
Steinmann I.C., Pflüger V., Schaffner F., Mathis A., Kaufmann C. 2013. Evaluation of matrix-assisted laser desorption/ionization time of flight mass spectrometry for the identification of ceratopogonid and culicid larvae. Parasitology 140:318–327.
Stuessy T.F., Crawford D.J., Soltis D.E., Soltis P.S. 2014. Plant systematics—the origin, interpretation, and ordering of plant biodiversity. In: Regnum Vegetabile, vol. 156. Königstein (Taunus): Koeltz Scientific Books. 425 p.
Tedersoo L., Ramirez K.S., Nilsson R.H., Kaljuvee A., Kõljalg U., Abarenkov K. 2015. Standardizing metadata and taxonomic identification in metabarcoding studies. GigaScience 4:34.
Thorpe S.E. 2017. Is photography-based taxonomy really inadequate, unnecessary, and potentially harmful for biological sciences? A reply to Ceríaco et al. (2016). Zootaxa 4226:449–450.
Triebel D., Reichert W., Bosert S., Feulner M., Osieko Okach D., Slimani A., Rambold G. 2018. A generic workflow for effective sampling of environmental vouchers with UUID assignment and image processing. Database 2018:bax096.
Troudet J., Vignes-Lebbe R., Grandcolas P., Legendre F. 2018. The increasing disconnection of primary biodiversity data from specimens: how does it happen and how to handle it? Syst. Biol. 67:1110–1119.
Tsugawa H., Satoh A., Uchino H., Cajka T., Arita M., Arita M. 2019. Mass spectrometry data repository enhances novel metabolite discoveries with advances in computational metabolomics. Metabolites 9(6): pii: E119.
Venu P., Sanjappa M. 2011. The impact factor and taxonomy. Curr. Sci. 101(11):1397.
Webster M.S. 2017. Emerging frontiers in collections-based ornithological research: the extended specimen. Studies in avian biology. Boca Raton, FL: CRC Press. 240 p.
Wendt L., Sir E.B., Kuhnert E., Heitkämper S., Lambert C., Hladki A.I., Romero A.I., Luangsaard J.J., Srikitikulchai P., Peršoh D., Stadler M. 2018. Resurrection and emendation of the Hypoxylaceae, recognised from a multi-gene genealogy of the Xylariales. Mycol. Prog. 17:115–154.
Wheeler Q.D. 2007. Invertebrate systematics or spineless taxonomy? Zootaxa 1668:11–18.
Wheeler Q.D., Knapp S., Stevenson D.W., Stevenson J., Blum S.D., Boom B.M., Borisy G.G., Buizer J.L., De Carvalho M.R., Cibrian A., Donoghue M.J., Doyle V., Gerson E.M., Graham C.H., Graves P., Graves S.J., Guralnick R.P., Hamilton A.L., Hanken J., Law W., Lipscomb D.L., Lovejoy T.E., Miller H., Miller J.S., Naeem S., Novacek M.J., Page L.M., Platnick N.I., Porter-Morgan H., Raven P.H., Solis M.A., Valdecasas A.G., Van Der Leeuw S., Vasco A., Vermeulen N., Vogel J., Walls R.L., Wilson E.O., Woolley J.B. 2012a. Mapping the biosphere: exploring species to understand the origin, organization and sustainability of biodiversity. Syst. Biodivers. 10:1–20.
Wheeler Q.D., Bourgoin T., Coddington J., Gostony T., Hamilton A., Larimer R., Plaszek A., Schauff M., Solis M.A. 2012b. Nomenclatural benchmarking: the roles of digital typification and telemicroscopy. ZooKeys 209:193–202.
Wieczorek J., Bloom D., Guralnick R., Blum S., Döring M., Giovanni R., Robertson T., Vieglais D. 2012. Darwin Core: an evolving community-developed biodiversity data standard. PLoS One 7:e29715.
Wilkinson M.D., Dumontier M., Aalbersberg I.J.J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.-W., Silva Santos L.B. da, Bourne P.E., Bouwman J., Brookes A.J., Clark T., Crosas M., Dillo I., Dumon O., Edmunds S., Evelo C.T., Finkers R., Gonzalez-Beltran A., Gray A.J.G., Groth P., Goble C., Grethe J.S., Heringa J., Hoen P.A.C. 't, Hooft R., Kuhn T., Kok R., Kok J.N., Lusher S.J., Martone M.E., Mons A., Packer A.L., Persson B., Rocca-Serra P., Roos M., Schaik R. van, Sansone S.-A., Schultes E., Sengstag T., Slater T., Strawn G., Swertz M.A., Thompson M., Lei J. van der, Mulligen E. van, Velterop J., Waagmeester A., Wittenburg P., Wolstencroft K.J., Zhao J., Mons B. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018.
Wilkinson M.D., Sansone S.A., Schultes E., Doorn P., Bonino da Silva Santos L.O., Dumontier M. 2018. A design framework and exemplar metrics for FAIRness. Sci. Data 5:180118.
Wink M., Botschen F., Gosmann C., Schäfer H., Waterman G. 2010. Chemotaxonomy seen from a phylogenetic perspective and evolution of secondary metabolism. Annu. Plant Rev. 40:364–433.
Winterton S.L. 2009. Revision of the stiletto fly genus Neodialineura Mann (Diptera: Therevidae): an empirical example of cybertaxonomy. Zootaxa 2157:1–33.
Yilmaz P., Kottmann R., Field D., Knight R., Cole J.R., Amaral-Zettler L., Gilbert J.A., Karsch-Mizrachi I., Johnston A., Cochrane G., Vaughan R., Hunter C., Park J., Morrison N., Rocca-Serra P., Sterk P., Arumugam M., Bailey M., Baumgartner L., Birren B.W., Blaser M.J., Bonazzi V., Booth T., Bork P., Bushman F.D.,
Buttigieg P.L., Chain P.S., Charlson E., Costello E.K., Huot-Creasy H., Dawyndt P., DeSantis T., Fierer N., Fuhrman J.A., Gallery R.E., Gevers D., Gibbs R.A., San Gil I., Gonzalez A., Gordon J.I., Guralnick R., Hankeln W., Highlander S., Hugenholtz P., Jansson J., Kau A.L., Kelley S.T., Kennedy J., Knights D., Koren O., Kuczynski J., Kyrpides N., Larsen R., Lauber C.L., Legg T., Ley R.E., Lozupone C.A., Ludwig W., Lyons D., Maguire E., Methé B.A., Meyer F., Muegge B., Nakielny S., Nelson K.E., Nemergut D., Neufeld J.D., Newbold L.K., Oliver A.E., Pace N.R., Palanisamy G., Peplies J., Petrosino J., Proctor L., Pruesse E., Quast C., Raes J., Ratnasingham S., Ravel J., Relman D.A., Assunta-Sansone S., Schloss P.D., Schriml L., Sinha R., Smith M.I., Sodergren E., Spo A., Stombaugh J., Tiedje J.M., Ward D.V., Weinstock G.M., Wendel D., White O., Whiteley A., Wilke A., Wortman J.R., Yatsunenko T., Glöckner F.O. 2011. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 29:415–420.
Zamora J.C., and 412 coauthors. 2018. Considerations and consequences of allowing DNA sequence data as types of fungal taxa. IMA Fungus 9:167–175.
Zhang J., Kapli P., Pavlidis P., Stamatakis A. 2013. A general species delimitation method with applications to phylogenetic placements. Bioinformatics 29:2869–2876.
Zompro O. 2005. Catalogue of type material of the insect order Phasmatodea, housed in the Museum für Naturkunde der Humboldt Universität zu Berlin, Germany and in the Institut für Zoologie der Martin Luther Universität in Halle (Saale), Germany. Dtsch. Entomol. Z. 52:251–290.
Zurowietz M., Langenkämper D., Nattkemper T.W. 2019. BIIGLE2Go—a scalable image annotation system for easy deployment on cruises. OCEANS 2019-Marseille. Marseille, France: IEEE, p. 1–6.
