Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
The rise of the data-centric
research and publication enterprises

A journey into the whirlwind
Susanna-Assunta Sansone, PhD

uk.linkedin.com/in/sasansone
@biosharing
@isatools
@scientificdata

http://www.slideshare.net/SusannaSansone
Oxford Interdisciplinary Bioscience Doctoral Training Partnership (DTP) – 31 July, 2014
Scope is to outline the
strategies taken by the
researchers to allow access
to their research outputs
But it does not give you all the information you need to know,
partly because data sharing policies are unclear. A very common, loosely worded text is:
“Applicants should make use of existing, recognised standards for data collection
and management, where these exist, and make data available through existing
community resources or databases where possible”. But what constitutes a
recognised standard or acceptable community resource?
Outline
•  About myself
o  activities and interests
•  FAIR data
o  concept
o  my related projects
•  Scientific Data
o  rationale
o  Data Descriptors
o  examples
Communities we work with/for:
•  Describe my experiments
•  Share info with my group
•  Store for query & access
•  Upload to analysis tool
•  Compare with similar data
•  etc……
•  Support the needs of our
researchers
•  Enhance our existing
tools, develop new
components, or connect
with other tools
•  Comply with community
standards
•  etc.
•  Drive data reusability agenda
•  Recommend tools, data
resources and standards in
data policies
•  Criteria for endorsement
•  Mapping the ecosystem of
efforts
•  etc……
Key areas of activity:
•  Data capture and curation
•  Database development
•  Data (nano)publication
•  Data provenance
•  Open, community ontologies
and standards
•  Semantic web
•  Social engineering
•  Software development
•  Training
•  Visualization (collaboration/
jointly with Prof Min Chen)
As part of:
•  UK, European and international
consortia
•  Pre-competitive informatics
public-private partnerships
•  Standardization initiatives
http://www.flickr.com/photos/notbrucelee/8016189356/
CC BY
Our activities are around and in support of
data curation, management and publication
and their pivotal roles in
enabling FAIR data and research, driving science and discoveries
eTRIKS – European Translational Information
and Knowledge management Services
Consortium of academic institutions (Imperial College, CNRS, University
of Luxembourg) and pharmaceutical companies (Janssen, Merck, AZ, Lilly,
Lundbeck, Pfizer, Roche, Sanofi, Bayer, GSK) building
a sustainable, open translational research informatics
platform
• Nature Publishing Group's Scientific Data
• BioMedCentral and BGI's GigaScience
• F1000 Research
• Oxford University Press
Susanna-Assunta Sansone, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Eamonn Maguire, Milo Thurston;
Oxford alumni: Annapaola Santarsiero, Pavlos Georgiou + my previous team when at EBI (2001 – 2010)
Overview to: BBSRC Oxford Doctoral Training Partnership - Dr Sansone - July 2014
https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
Credit to:
A great start, but not enough!
image by Greg Emmerich
http://discovery.urlibraries.org/
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
Worldwide movement for FAIR data
§  Researchers and bioinformaticians in both academic and commercial
science, along with funding agencies and publishers, embrace the
concept that both
•  DATA: entities of interest, e.g. genes, metabolites, phenotypes, and
•  METADATA: experimental steps, e.g. provenance of study materials,
technology and measurement types
should be Findable, Accessible, Interoperable and Reusable
Source: http://ebbailey.wordpress.com
In all fairness, not much data is FAIR!
But it is not just about technology…
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
…breadth and depth of the content
…is pivotal
sample characteristic(s)
experimental design
experimental variable(s)
technology(ies)
measurement(s)
protocol(s)
data file(s)
......
§  To make this dataset ‘FAIR’, one must have best practices, standards and tools to:
•  report sufficient details
•  capture all salient features of the experimental workflow
•  make annotation explicit and discoverable
•  structure the descriptions for consistency
•  ensure/regulate access
•  deposit and publish
•  etc.
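One way to picture "report sufficient details" is a simple completeness check of a metadata record against a checklist. The sketch below is only an illustration: the field names are hypothetical, derived from the elements named in this talk (sample characteristics, experimental design, experimental variables, technologies, measurements, protocols, data files), and do not reproduce any actual community standard.

```python
from typing import Dict, List

# Hypothetical checklist built from the elements named on the slide;
# this is an assumption for illustration, not a real reporting guideline.
REQUIRED_FIELDS = (
    "sample_characteristics",
    "experimental_design",
    "experimental_variables",
    "technologies",
    "measurements",
    "protocols",
    "data_files",
)


def missing_fields(record: Dict[str, object]) -> List[str]:
    """Return the checklist fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# An invented, partially described dataset.
example = {
    "sample_characteristics": {"organism": "Homo sapiens"},
    "experimental_design": "time course",
    "measurements": ["metabolite profiling"],
    "data_files": [],  # declared but empty, so it still counts as missing
}

print(missing_fields(example))
```

A check like this only tells you that a field is filled in, not that its content is meaningful; that part still depends on curation and on the terminologies used.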
A community mobilization to develop standards, e.g.:
§  Structural and operational differences
•  organization types (open, closed to members, society, WG etc.)
•  standards development (how to formulate, conduct and maintain)
•  adoption, uptake, outreach (link to journals, funders and commercial sector)
•  funds (sponsors, memberships, grants, volunteering)
[Figure labels: de jure / de facto; grass-roots groups; standards organizations; Nanotechnology Working Group]
Focus on reporting or content standards
Including minimum
information reporting
requirements, or
checklists to report the
same core, essential
information
Including controlled
vocabularies, taxonomies,
thesauri, ontologies etc. to
use the same word and
refer to the same ‘thing’
Including conceptual
model, conceptual
schema from which an
exchange format is derived
to allow data to flow from
one system to another
Community-developed standards are pivotal to structuring and enriching the
description of data, and to sharing data and associated contextual metadata,
facilitating understanding and reuse
Growing number of reporting standards
+ 130
+ 150
+ 303
Source: BioPortal
Databases, annotation, curation tools implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
Source: BioSharing
Technologically-delineated views of the world: transcriptomics, proteomics, metabolomics (arrays, scanning, columns, gels, MS, FTIR, NMR)
Biologically-delineated views of the world: plant biology, epidemiology, microbiology
Generic features (common core)
- description of source biomaterial
- experimental design components
Fragmentation, duplications and gaps
To compare and integrate data we need interoperable standards
Fragmentation of the databases and data, e.g.
Three EBI omics systems
Proteomics
Submission: DIFFERENT formats, terminologies and tools
Access: DIFFERENT download formats
Storage: DIFFERENT
- core requirements represented
- representation of the studies and related samples
- curation practices
How much do we know, and which standards can we use?
Registering and cataloguing is just step one; the next steps are:
•  Develop assessment criteria for usability and popularity of standards
•  Associate standards to data policies and databases
•  Assemble journal and funder policies regarding data storage
•  Make fully cross-searchable
•  Intended goal: help stakeholders make informed decisions
Users can claim
records and
maintain them
Classify and link standards, and visualize relations
Example
The relationship among popular standard formats for pathway information
BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and
network data integration. SBML and CellML are designed to support mathematical simulations
of biological systems and SBGN represents pathway diagrams.
CREDIT:
Demir, et al., The BioPAX
community standard for
pathway data sharing, 2010.
ISA powers data collection, curation resources and repositories, e.g.:
General-purpose, configurable format,
designed to support:
•  description of the experimental metadata, making the annotation explicit and discoverable
•  provenance tracking
•  use of community standards, such as minimal reporting guidelines and terminologies
•  conversion to a growing number of other metadata formats, e.g. those used by EBI repositories
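To give a feel for what a tabular, convertible metadata format looks like, here is a minimal sketch that writes sample descriptions as a tab-delimited table. The column headers are loosely modelled on ISA-Tab-style names and are an assumption for illustration; the sample values are invented, and this is not a complete or validated ISA-Tab file. Real submissions should rely on the ISA tools and the published specification.

```python
import csv
import io

# Column headers loosely modelled on ISA-Tab-style names (an assumption,
# not a complete or validated ISA-Tab study file).
COLUMNS = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]


def to_tab_delimited(rows, columns):
    """Serialise a list of dict rows as a tab-delimited table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example records; each row traces a sample back to its source
# and to the protocol applied, which is the provenance part of the table.
rows = [
    {
        "Source Name": "subject-1",
        "Characteristics[organism]": "Homo sapiens",
        "Protocol REF": "sample collection",
        "Sample Name": "subject-1.plasma",
    },
    {
        "Source Name": "subject-2",
        "Characteristics[organism]": "Homo sapiens",
        "Protocol REF": "sample collection",
        "Sample Name": "subject-2.plasma",
    },
]

table = to_tab_delimited(rows, COLUMNS)
print(table)
```

Because the table is plain delimited text with explicit headers, it can be read by other tools and mapped to other metadata formats, which is the design goal the bullets above describe.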
[Figure labels: analysis method, script; data file or record in a database]
Our industry needs a Disruptive Innovation. 

That Disruption...is Pistoia!
Oxford e-Research Centre (Sansone SA):
Data Standards and Curation training
FAIR data - roles and responsibilities
•  Data has to become an integral part of scholarly communication
•  Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers
•  But publishers occupy a “leverage point” in this process
Journal publishing: the changing landscape
Human Genome 2001
62 pages, 150 authors,
49 figures, 27 tables
ENCODE Project 2012
30 papers,
3 journals
Nature Publishing Group: the changing landscape
Launched on May 27th, 2014
A new online-only publication for descriptions of scientifically valuable datasets
in the life, environmental and biomedical sciences, but not limited to these
Credit for sharing
your data
Focused on reuse
and reproducibility
Peer reviewed,
curated
Promoting Community
Data Repositories
Open Access
Supported by:

Data Descriptor: narrative and structure
Article or narrative component (PDF and HTML)
Experimental metadata or structured component (in-house curated, machine-readable formats)
Data Descriptor: narrative
Sections:
•  Title
•  Abstract
•  Background & Summary
•  Methods
•  Technical Validation
•  Data Records
•  Usage Notes
•  Figures & Tables
•  References
•  Data Citations

Focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the
quality of the measurements.
Does not contain tests of new scientific hypotheses
In traditional publications this
information is not provided in a
sufficiently detailed manner
However this information is
essential for understanding,
reusing, and reproducing
datasets


Joint Declaration of Data Citation Principles by the
Data Citation Synthesis Group, incl.:
-  CODATA
-  Research Data Alliance (RDA),
-  Force11
Data Descriptor: structure (CC0)
In-house curation team:
•  assists users in submitting the structured content via simple templates and an internal authoring tool
•  performs value-added semantic annotation of the experimental metadata
For advanced users/service providers willing to export ISA-Tab for direct
submission, we will release a technical specification.
[Figure labels: analysis method, script; data file or record in a database]
Hanke: Neuroscience
Code in GitHub
New Dataset
Data in OpenfMRI
Source code in GitHub
Big Data
Stefano: Stem Cells
Associated Nature Article
Data
- figshare
- NCBI GEO
Integrated figshare data viewer
Relation with traditional articles - content and time
Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what when?
BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
AFTER: expand on your research articles, adding further information for reuse of the data
Making data discoverable
Export to various formats (ISA-Tab, RDF, etc.)
Linking between research papers, Data Descriptors, and data records
We currently recognize over 50 public data repositories
Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings
•  Experimental Rigour and Technical Data Quality
o  Were the data produced in a rigorous and methodologically sound manner?
o  Was the technical quality of the data supported convincingly with technical validation experiments and statistical analyses of data quality or error, as needed?
o  Are the depth, coverage, size, and/or completeness of these data sufficient for the types of applications or research questions outlined by the authors?
•  Completeness of the Description
o  Are the methods and any data-processing steps described in sufficient detail to allow others to reproduce these steps?
o  Did the authors provide all the information needed for others to reuse this dataset or integrate it with other data?
o  Is this Data Descriptor, in combination with any repository metadata, consistent with relevant minimum information or reporting standards?
•  Integrity of the Data Files and Repository Record
o  Have you confirmed that the data files deposited by the authors are complete and match the descriptions in the Data Descriptor?
o  Have these data files been deposited in the most appropriate available data repository?
Current content is diverse – bimonthly releases
•  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology
•  New datasets and previously published datasets
o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
•  Datasets in figshare and domain specific databases
•  Code deposited in figshare and GitHub
•  Individual datasets, curated aggregation and citizen science
•  First dataset part of a collection
•  Academic and industry authors
Take home messages
•  Make sure your research outputs make an impact!
•  Open your research outputs, via the right channels, to get cited and credited
•  Contribute to the reproducible research movement and to FAIR data
http://www.flickr.com/photos/12308429@N03/4957994485/
•  Uniquely identify yourself via ORCID
•  Share identified generic research outputs, e.g. FigShare
•  Share and deposit code, e.g. GitHub, Bitbucket
http://www.flickr.com/photos/idiolector/289490834/
•  Learn about open standards in your area, via e.g. BioSharing
•  Select tools that implement relevant standards, e.g. ISA
•  Publish not just in traditional journals, but think Scientific Data
http://www.flickr.com/photos/webhamster/2582189977/
Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
The rise of the data-centric !
research and publication enterprises!
!
A journey into the whirlwind!
Susanna-Assunta Sansone, PhD!
!
uk.linkedin.com/in/sasansone!
@biosharing!
@isatools!
@scientificdata!
!
http://www.slideshare.net/SusannaSansone
Oxford Interdisciplinary Bioscience Doctoral Training Partnership (DTP) – 31 July, 2014

More Related Content

PPT
Improving the Transparency and Credibility of Open Access Publishing by Lars ...
PPTX
Ada slide presentation rsc day_feb2017_v2
PPTX
Active research management and sharing
PDF
Open science and data sharing: the DataFirst experience/Martin Wittenberg
PPTX
The African Open Science Platform/Geoffrey Boulton
PPTX
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
PDF
Open Science Incentives/Veerle van den Eynden
PDF
"Building Capacity for Open Research" - AAMC
Improving the Transparency and Credibility of Open Access Publishing by Lars ...
Ada slide presentation rsc day_feb2017_v2
Active research management and sharing
Open science and data sharing: the DataFirst experience/Martin Wittenberg
The African Open Science Platform/Geoffrey Boulton
NISO Working Group Connection Live! Research Data Metrics Landscape: An Updat...
Open Science Incentives/Veerle van den Eynden
"Building Capacity for Open Research" - AAMC

What's hot (20)

PDF
Incentivizing data sharing: a "bottom up" perspective/Louise Bezuidenhout
PPTX
Without data, science is merely an opinion: African Open Science Platform/Ina...
PPTX
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
PPTX
11.13.14 Slides, “SHARE: An Overview”
PPTX
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
PDF
From BioSharing to FAIRsharing - mapping the standards landscape
PPTX
2.24.16 Slides, “VIVO plus SHARE: Closing the Loop on Tracking Scholarly Acti...
PDF
3.11.16 Slides, “Institutional Perspectives on the Impact of SHARE and VIVO T...
PPTX
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
PDF
How can we ensure research data is re-usable? The role of Publishers in Resea...
PDF
12.10.14 Slides, “The SHARE Notification Service”
PPTX
A librarian's road map to open access
PPT
5-14-13 An Introduction to VIVO Presentation Slides
PPTX
NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? Part 1: ...
PPTX
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
PDF
Information literacy in an online world: A digital approach to address the n...
PPTX
August 12 NISO Webinar: MOOCs and Libraries: A Brewing Collaboration.
PPTX
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
PPTX
June 18 NISO Virtual Conference: Keynote Speaker: Altmetrics at the Portfolio...
PPTX
Vivo2016 presentation-haak herbertmichalek
Incentivizing data sharing: a "bottom up" perspective/Louise Bezuidenhout
Without data, science is merely an opinion: African Open Science Platform/Ina...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
11.13.14 Slides, “SHARE: An Overview”
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
From BioSharing to FAIRsharing - mapping the standards landscape
2.24.16 Slides, “VIVO plus SHARE: Closing the Loop on Tracking Scholarly Acti...
3.11.16 Slides, “Institutional Perspectives on the Impact of SHARE and VIVO T...
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
How can we ensure research data is re-usable? The role of Publishers in Resea...
12.10.14 Slides, “The SHARE Notification Service”
A librarian's road map to open access
5-14-13 An Introduction to VIVO Presentation Slides
NISO Two Part Webinar: Is Granularity the Next Discovery Frontier? Part 1: ...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
Information literacy in an online world: A digital approach to address the n...
August 12 NISO Webinar: MOOCs and Libraries: A Brewing Collaboration.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
June 18 NISO Virtual Conference: Keynote Speaker: Altmetrics at the Portfolio...
Vivo2016 presentation-haak herbertmichalek
Ad

Similar to Overview to: BBSRC Oxford Doctoral Training Partnership - Dr Sansone - July 2014 (20)

PDF
Managing Big Data - Berlin, July 9-10, 201.
PDF
"Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013
PDF
Overview of standards/stakeholders in life science (RDA Engagement Interest G...
PDF
INSERM - Data Management & Reuse of Health Data - May 2017
PDF
Big Data Standards - Workshop, ExpBio, Boston, 2015
PPTX
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
PDF
All Things Biocuration
PPTX
FAIRsharing presentation at the Japan Science and Technology Agency
PPT
Sansone bio sharing introduction
PDF
FAIR, standards and FAIRsharing - MAQC Society 2019
PPT
Sansone mibbi-intro
PDF
Life science odin-oct2013-sa-sansone
PDF
FAIR, community standards and data FAIRification: components and recipes
PDF
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014
PDF
Going FAIR: premises, promises and challenges of interoperability standards
PDF
FAIR: standards and services
PPTX
Susanna Sansone - OpenCon Oxford, 1st Dec 2017
PDF
The FAIR movement - Oxford Open Data Week
PDF
Open Access Week - Oxford, 20-24 Oct 2014
PDF
BioSharing.org - mapping the landscape of community standards, databases, dat...
Managing Big Data - Berlin, July 9-10, 201.
"Standards landscape" NIF Big Data 2 Knowledge (BD2K) Initiative, Sep, 2013
Overview of standards/stakeholders in life science (RDA Engagement Interest G...
INSERM - Data Management & Reuse of Health Data - May 2017
Big Data Standards - Workshop, ExpBio, Boston, 2015
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
All Things Biocuration
FAIRsharing presentation at the Japan Science and Technology Agency
Sansone bio sharing introduction
FAIR, standards and FAIRsharing - MAQC Society 2019
Sansone mibbi-intro
Life science odin-oct2013-sa-sansone
FAIR, community standards and data FAIRification: components and recipes
FAIR data and NPG Scientific Data: RIKEN Yokohama, 25 June, 2014
Going FAIR: premises, promises and challenges of interoperability standards
FAIR: standards and services
Susanna Sansone - OpenCon Oxford, 1st Dec 2017
The FAIR movement - Oxford Open Data Week
Open Access Week - Oxford, 20-24 Oct 2014
BioSharing.org - mapping the landscape of community standards, databases, dat...
Ad

More from Susanna-Assunta Sansone (20)

PDF
FAIR and Reproducible - GSC, Tucson, Aug 2024
PDF
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
PDF
FAIRsharing-Standards-4-GSC-Aug23.pdf
PDF
FAIR-4-GSC-Sansone-Aug23.pdf
PDF
FAIRsharing & FAIRcookbook at RDA 2023
PDF
NFDI Physical Sciences Colloquium - FAIR
PDF
Metadata Standards
PDF
FAIRcookbook: GSRS22-Singapore
PDF
FAIR Cookbook
PDF
FAIRsharing and the FAIR Cookbook
PDF
FAIRsharing for EOSC
PDF
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
PDF
FAIRsharing: what we do for policies
PDF
FAIRsharing: how we assist with FAIRness
PDF
ELIXIR FAIR Activities - Examplars
PDF
FAIRsharing - focus on standards and new features
PDF
FAIR data and standards for a coordinated COVID-19 response
PDF
FAIRsharing poster
PDF
The FAIR Cookbook poster
PDF
The FAIR Cookbook in a nutshell
FAIR and Reproducible - GSC, Tucson, Aug 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIRsharing-Standards-4-GSC-Aug23.pdf
FAIR-4-GSC-Sansone-Aug23.pdf
FAIRsharing & FAIRcookbook at RDA 2023
NFDI Physical Sciences Colloquium - FAIR
Metadata Standards
FAIRcookbook: GSRS22-Singapore
FAIR Cookbook
FAIRsharing and the FAIR Cookbook
FAIRsharing for EOSC
FAIRification is a Team Sport: FAIRsharing and the FAIR Cookbook
FAIRsharing: what we do for policies
FAIRsharing: how we assist with FAIRness
ELIXIR FAIR Activities - Examplars
FAIRsharing - focus on standards and new features
FAIR data and standards for a coordinated COVID-19 response
FAIRsharing poster
The FAIR Cookbook poster
The FAIR Cookbook in a nutshell

Recently uploaded (20)

PDF
How to run a consulting project- client discovery
PPTX
Modelling in Business Intelligence , information system
PPTX
Managing Community Partner Relationships
PPTX
modul_python (1).pptx for professional and student
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PPTX
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
PPTX
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
PPTX
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
PPTX
SAP 2 completion done . PRESENTATION.pptx
PDF
Data Engineering Interview Questions & Answers Cloud Data Stacks (AWS, Azure,...
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
annual-report-2024-2025 original latest.
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PDF
Lecture1 pattern recognition............
PDF
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
PPT
ISS -ESG Data flows What is ESG and HowHow
PDF
Introduction to Data Science and Data Analysis
PDF
Business Analytics and business intelligence.pdf
How to run a consulting project- client discovery
Modelling in Business Intelligence , information system
Managing Community Partner Relationships
modul_python (1).pptx for professional and student
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
SAP 2 completion done . PRESENTATION.pptx
Data Engineering Interview Questions & Answers Cloud Data Stacks (AWS, Azure,...
Qualitative Qantitative and Mixed Methods.pptx
annual-report-2024-2025 original latest.
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
Acceptance and paychological effects of mandatory extra coach I classes.pptx
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
Lecture1 pattern recognition............
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
ISS -ESG Data flows What is ESG and HowHow
Introduction to Data Science and Data Analysis
Business Analytics and business intelligence.pdf

Overview to: BBSRC Oxford Doctoral Training Partnership - Dr Sansone - July 2014

  • 1. Data Consultant, Honorary Academic Editor Associate Director, Principal Investigator The rise of the data-centric ! research and publication enterprises! ! A journey into the whirlwind! Susanna-Assunta Sansone, PhD! ! uk.linkedin.com/in/sasansone! @biosharing! @isatools! @scientificdata! ! http://www.slideshare.net/SusannaSansone Oxford Interdisciplinary Bioscience Doctoral Training Partnership (DTP) – 31 July, 2014
  • 2. Scope is to outline the strategies taken by the researchers to allow access to they research outputs
  • 3. But it does not give you all info you need to know! Also because data sharing policies are unclear; a very common and loose text is: “Applicants should make use of existing, recognised standards for data collection and management, where these exist, and make data available through existing community resources or databases where possible”. But what constitutes a recognised standard or acceptable community resource?
  • 4. •  About myself! o  activities and interests! •  FAIR data! o  concept! o  my related projects! •  Scientific Data! o  rationale! o  Data Descriptors! o  examples! Outline!
  • 6. Communities we work with/for: •  Describe my experiments •  Share info with my group •  Store for query & access •  Upload to analysis tool •  Compare with similar data •  etc…… •  Support the needs of our researchers •  Enhancing our existing tools; develop new components, or connect with other tools •  Comply to community standards •  etc…. •  Drive data reusability agenda •  Recommend tools, data resources and standards in data policies •  Criteria for endorsement •  Mapping the ecosystem of efforts •  etc……
  • 7. Key areas of activity: •  Data capture and curation •  Database development •  Data (nano)publication •  Data provenance •  Open, community ontologies and standards •  Semantic web •  Social engineering •  Software development •  Training •  Visualization (collaboration/ jointly with Prof Min Chen) Communities we work with/for:
  • 8. Key areas of activity: •  Data capture and curation •  Database development •  Data (nano)publication •  Data provenance •  Open, community ontologies and standards •  Semantic web •  Social engineering •  Software development •  Training •  Visualization (collaboration/ jointly with Prof Min Chen) Communities we work with/for: As part of: •  UK, European and international consortia •  Pre-competitive informatics public-private partnerships •  Standardization initiatives
  • 9. http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY Our activities are around and in support of data curation, management and publication and their pivotal roles in enabling FAIR data and research, driving science and discoveries
  • 10. eTRIKS – european Translational Information and Knowledge management Services Consortium of academic (Imperial College, CNRS, Un of Luxemburg) and pharmas (Janssen, Merck, AZ, Lilly, Lundbeck, Pfizer, Roche, Sanofi, Bayer, GSK) building a sustainable, open translational research informatics platform • Nature Publishing Group‘s Scientific Data • BioMedCentral and BGI‘s GigaScience • F1000 Research • Oxford University Press Susanna-Assunta Sansone, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Eamonn Maguire, Milo Thurston; Oxford alumni: Annapaola Santarsiero, Pavlos Georgiou + my previous team when at EBI (2001 – 2010)!
  • 14. A great start, but not enough! image by Greg Emmerich http://discovery.urlibraries.org/ http://www.theguardian.com/higher-education-network/blog/2014/jun/26
  • 15. Worldwide movement for FAIR data §  Researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that both •  DATA: entities of interest, e.g. genes, metabolites, phenotypes, and •  METADATA: experimental steps, e.g. provenance of study materials, technology and measurement types, should be Findable, Accessible, Interoperable and Reusable. Source: http://ebbailey.wordpress.com
  • 16. In all fairness, not much data is FAIR!
  • 18. But it is not just about technology…!
  • 19. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project …breadth and depth of the content… is pivotal!
  • 20. Sample characteristic(s), experimental design, experimental variable(s), technology(ies), measurement(s), protocol(s), data file(s), ...
  • 21. §  To make this dataset ‘FAIR’, one must have best practices, standards and tools to: •  report sufficient details •  capture all salient features of the experimental workflow •  make annotation explicit and discoverable •  structure the descriptions for consistency •  ensure/regulate access •  deposit and publish •  etc.
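The idea of reporting sufficient detail can be sketched as a simple completeness check. This is only an illustration: the field names are adapted from the items listed on the previous slide (sample characteristics, experimental design, and so on), while the function name, the record layout and the example values are assumptions made for this sketch and do not come from any particular standard or tool.

```python
# Hypothetical minimum-information checklist; the field names are adapted from
# the slide's list of items an experiment description should cover.
REQUIRED_FIELDS = [
    "sample_characteristics",
    "experimental_design",
    "experimental_variables",
    "technology",
    "measurement",
    "protocols",
    "data_files",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative record; every value here is made up for the example.
record = {
    "sample_characteristics": {"organism": "Homo sapiens"},
    "experimental_design": "case-control",
    "technology": "mass spectrometry",
    "measurement": "metabolite profiling",
    "data_files": ["run01.mzML"],
}

print(missing_fields(record))  # ['experimental_variables', 'protocols']
```

A check like this only confirms that something was reported for each item; judging whether the detail is actually sufficient still needs a curator or reviewer.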
  • 22. A community mobilization to develop standards, e.g. de jure standards organizations and de facto grass-roots groups (Nanotechnology Working Group) §  Structural and operational differences •  organization types (open, closed to members, society, WG etc.) •  standards development (how to formulate, conduct and maintain) •  adoption, uptake, outreach (link to journals, funders and commercial sector) •  funds (sponsors, memberships, grants, volunteering)
  • 24. Focus on reporting or content standards •  Minimum information reporting requirements, or checklists, to report the same core, essential information •  Controlled vocabularies, taxonomies, thesauri, ontologies etc., to use the same word and refer to the same ‘thing’ •  Conceptual models and conceptual schemas, from which an exchange format is derived to allow data to flow from one system to another. Community-developed standards are pivotal to structure and enrich the description, and to share data and associated contextual metadata, facilitating understanding and reuse
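Two of these kinds of standard can be illustrated together: a controlled vocabulary that maps different words onto the same "thing", and an exchange format that lets the annotated records move between systems. The sketch below is a toy under stated assumptions: the vocabulary, the sample records and the choice of tab-delimited text as the exchange format are all invented for the example and are not taken from any specific ontology or community format.

```python
import csv
import io

# Hypothetical controlled vocabulary: free-text synonyms map to one preferred label.
VOCABULARY = {
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def normalize(term):
    """Map a free-text term to its preferred label, or return it unchanged."""
    return VOCABULARY.get(term.strip().lower(), term)


def to_exchange_format(rows, columns):
    """Serialize annotated rows as tab-delimited text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two labs describe the same organism with different words.
samples = [
    {"sample": "S1", "organism": "Human"},
    {"sample": "S2", "organism": "h. sapiens"},
]
annotated = [{**s, "organism": normalize(s["organism"])} for s in samples]
print(to_exchange_format(annotated, ["sample", "organism"]))
```

After normalization both samples carry the same organism label, so a query for "Homo sapiens" finds both records regardless of how each lab originally wrote it.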
  • 25. Growing number of reporting standards: + 130, + 150, + 303 (sources: BioSharing, BioPortal) •  Checklists: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT… •  Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB… •  Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO… Databases, annotation and curation tools implementing standards
  • 26. Technologically-delineated views of the world (arrays, scanning, columns, gels, MS, FTIR, NMR) versus biologically-delineated views of the world (transcriptomics, proteomics, metabolomics, plant biology, epidemiology, microbiology). Generic features (common core): description of source biomaterial, experimental design components. Fragmentation, duplications and gaps: to compare and integrate data we need interoperable standards
  • 28. Fragmentation of the databases and data, e.g. Proteomics: three EBI omics systems (Submission, Access, Storage)
  • 30. Fragmentation of the databases and data, e.g. Proteomics: three EBI omics systems •  Submission: DIFFERENT formats, terminologies and tools •  Access: DIFFERENT download formats •  Storage: DIFFERENT core requirements represented, representation of the studies and related samples, and curation practices
  • 31. How much do we know and which standards can we use
  • 33. Registering and cataloguing is just step one; the next steps are: •  Develop assessment criteria for usability and popularity of standards •  Associate standards to data policies and databases •  Assemble journal and funder policies regarding data storage •  Make everything fully cross-searchable •  Intended goal: help stakeholders make informed decisions
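The steps above amount to linking three kinds of record (standards, databases, policies) and querying across them. A minimal sketch of that idea follows. Every record name below is an invented placeholder, not a real registry entry, and the "popularity" count is just one possible assessment criterion assumed for illustration; it is not how any existing registry scores standards.

```python
# All records below are invented placeholders, not real registry entries.
STANDARDS = {
    "checklist-A": {"type": "reporting guideline", "domain": "transcriptomics"},
    "format-B": {"type": "exchange format", "domain": "transcriptomics"},
    "ontology-C": {"type": "terminology", "domain": "metabolomics"},
}

# Which standards each (hypothetical) database implements.
DATABASES = {
    "ExampleDB": ["checklist-A", "format-B"],
    "OtherDB": ["ontology-C"],
}

# Which standards each (hypothetical) journal or funder policy recommends.
POLICIES = {
    "Journal X data policy": ["checklist-A"],
    "Funder Y data policy": ["checklist-A", "ontology-C"],
}


def cross_search(standard):
    """Collect everything the registry links to one standard."""
    return {
        "standard": STANDARDS[standard],
        "implemented_by": sorted(
            db for db, stds in DATABASES.items() if standard in stds
        ),
        "recommended_by": sorted(
            policy for policy, stds in POLICIES.items() if standard in stds
        ),
    }


def popularity(standard):
    """Assumed criterion: count of databases and policies citing the standard."""
    hits = cross_search(standard)
    return len(hits["implemented_by"]) + len(hits["recommended_by"])


print(cross_search("checklist-A"))
print(popularity("checklist-A"))  # 3
```

With links like these in place, a researcher choosing a standard can see at a glance which repositories accept it and which policies ask for it.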
  • 35. Users can claim records and maintain them
  • 37. Classify and link standards, and visualize their relations
  • 38. Example: the relationship among popular standard formats for pathway information. BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams. CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  • 39. §  To make this dataset ‘FAIR’, one must have best practices, standards and tools to: •  report sufficient details •  capture all salient features of the experimental workflow •  make annotation explicit and discoverable •  structure the descriptions for consistency •  ensure/regulate access •  deposit and publish •  etc.
  • 42. ISA powers data collection, curation resources and repositories, e.g.:
  • 43. General-purpose, configurable format, designed to support: •  description of the experimental metadata, making the annotation explicit and discoverable •  provenance tracking •  use of community standards, such as minimal reporting guidelines and terminologies •  conversion to a growing number of other metadata formats, e.g. those used by EBI repositories. Diagram labels: analysis, method, script, data file or record in a database
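Provenance tracking, as described above, means each data file can be traced back through the steps and materials that produced it. The sketch below shows that idea with a tiny linked structure. It is not the ISA-Tab format and does not use the ISA tools; the class, the function and all example names (donor_01, run01.fastq and so on) are hypothetical and chosen only to make the chain readable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A material, process, or data file in the experimental workflow."""

    name: str
    kind: str  # e.g. "source", "sample", "protocol", "data file"
    derived_from: list = field(default_factory=list)


def provenance(node):
    """Walk back from a node to its origins, listing each step visited."""
    trail = [f"{node.kind}: {node.name}"]
    for parent in node.derived_from:
        trail.extend(provenance(parent))
    return trail


# Hypothetical chain: source material -> sample -> protocol -> data file.
source = Node("donor_01", "source")
sample = Node("liver_biopsy_01", "sample", [source])
extraction = Node("RNA extraction", "protocol", [sample])
data_file = Node("run01.fastq", "data file", [extraction])

for step in provenance(data_file):
    print(step)
```

Keeping the links explicit is what allows the same description to be converted into other metadata formats later, since each target format needs to know which sample and protocol a file came from.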
  • 45. Our industry needs a Disruptive Innovation. 
 That Disruption...is Pistoia!
  • 46. Oxford e-Research Centre (Sansone SA): Data Standards and Curation training
  • 47. FAIR data - roles and responsibilities •  Data has to become an integral part of scholarly communication •  Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers •  But publishers occupy a “leverage point” in this process
  • 48. Journal publishing: the changing landscape!
  • 49. Nature Publishing Group: the changing landscape •  Human Genome, 2001: 62 pages, 150 authors, 49 figures, 27 tables •  ENCODE Project, 2012: 30 papers, 3 journals
  • 50. Launched on May 27th, 2014. A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these •  Credit for sharing your data •  Focused on reuse and reproducibility •  Peer reviewed, curated •  Promoting Community Data Repositories •  Open Access
  • 51. Data Descriptor: narrative and structure •  Article or narrative component (PDF and HTML) •  Experimental metadata or structured component (in-house curated, machine-readable formats)
  • 52. Data Descriptor: narrative. Sections: •  Title •  Abstract •  Background & Summary •  Methods •  Technical Validation •  Data Records •  Usage Notes •  Figures & Tables •  References •  Data Citations. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. In traditional publications this information is not provided in sufficient detail; however, it is essential for understanding, reusing, and reproducing datasets
  • 54. Data Descriptor: narrative. Sections: •  Title •  Abstract •  Background & Summary •  Methods •  Technical Validation •  Data Records •  Usage Notes •  Figures & Tables •  References •  Data Citations. Focus on data reuse: detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses. Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, incl.: •  CODATA •  Research Data Alliance (RDA) •  Force11
  • 55. Data Descriptor: structure (CC0). In-house curation team: •  helps users submit the structured content via simple templates and an internal authoring tool •  performs value-added semantic annotation of the experimental metadata. For advanced users/service providers willing to export ISA-Tab for direct submission, we will release a technical specification. Diagram labels: analysis, method, script, data file or record in a database
  • 56. Hanke: Neuroscience •  New dataset •  Big data •  Data in OpenfMRI •  Source code in GitHub
  • 57. Stefano: Stem Cells! Associated Nature Article Data - figshare - NCBI GEO Integrated figshare data viewer
  • 58. Relation with traditional articles - content and time. Scientific hypotheses: synthesis, analysis, conclusions. Methods and technical analyses supporting the quality of the measurements: What did I do to generate the data? How was the data processed? Where is the data? Who did what, and when? BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy) AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s) AFTER: expand on your research articles, adding further information for reuse of the data
  • 59. Making data discoverable •  Export to various formats (ISA-Tab, RDF, etc.) •  Linking between research papers, Data Descriptors, and data records •  We currently recognize over 50 public data repositories
  • 60. Peer review process focused on quality and reuse. Evaluation is not based on the perceived impact or novelty of the findings •  Experimental Rigour and Technical Data Quality o  Were the data produced in a rigorous and methodologically sound manner? o  Was the technical quality of the data supported convincingly with technical validation experiments and statistical analyses of data quality or error, as needed? o  Are the depth, coverage, size, and/or completeness of these data sufficient for the types of applications or research questions outlined by the authors? •  Completeness of the Description o  Are the methods and any data-processing steps described in sufficient detail to allow others to reproduce these steps? o  Did the authors provide all the information needed for others to reuse this dataset or integrate it with other data? o  Is this Data Descriptor, in combination with any repository metadata, consistent with relevant minimum information or reporting standards? •  Integrity of the Data Files and Repository Record o  Have you confirmed that the data files deposited by the authors are complete and match the descriptions in the Data Descriptor? o  Have these data files been deposited in the most appropriate available data repository?
  • 61. Current content is diverse, with bimonthly releases •  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology •  New datasets and previously published datasets o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own •  Datasets in figshare and domain-specific databases •  Code deposited in figshare and GitHub •  Individual datasets, curated aggregation and citizen science •  First dataset part of a collection •  Academic and industry authors
  • 62. Take home messages •  Make sure your research outputs make an impact! •  Open your research outputs via the right channels, to get cited and credited •  Contribute to the reproducible research movement and to FAIR data. Image: http://www.flickr.com/photos/12308429@N03/4957994485/
  • 63. Take home messages •  Uniquely identify yourself via ORCID •  Share identified generic research outputs, e.g. via FigShare •  Share and deposit code, e.g. via GitHub, Bitbucket. Image: http://www.flickr.com/photos/idiolector/289490834/
  • 64. Take home messages •  Learn about open standards in your area, e.g. via BioSharing •  Select tools that implement relevant standards, e.g. ISA •  Publish not just in traditional journals, but think Scientific Data. Image: http://www.flickr.com/photos/webhamster/2582189977/
  • 65. Data Consultant, Honorary Academic Editor Associate Director, Principal Investigator The rise of the data-centric ! research and publication enterprises! ! A journey into the whirlwind! Susanna-Assunta Sansone, PhD! ! uk.linkedin.com/in/sasansone! @biosharing! @isatools! @scientificdata! ! http://www.slideshare.net/SusannaSansone Oxford Interdisciplinary Bioscience Doctoral Training Partnership (DTP) – 31 July, 2014