FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe... – Carole Goble
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying-cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure, ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (de.NBI, German Virtual Liver Network, UK SynBio centres) and PIs' labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges of, and approaches to, sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in influencing sharing practices using behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
Reproducibility, Research Objects and Reality, Leiden 2016 – Carole Goble
Presented at the Leiden Bioscience Lecture, 24 November 2016, Reproducibility, Research Objects and Reality
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. It all sounds very laudable and straightforward. BUT…..
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in data-driven computational life sciences through examples and stories from initiatives I am involved in, and that Leiden is involved in too, including:
· FAIRDOM, which has built a Commons for Systems and Synthetic Biology projects, with an emphasis on standards smuggled in by stealth and efforts to influence sharing practices using behavioural interventions
· ELIXIR, the EU Research Data Infrastructure, and its efforts to exchange workflows
· Bioschemas.org, an ELIXIR-NIH-Google effort to support the finding of assets.
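To make the Bioschemas idea concrete, here is a minimal, hypothetical sketch of the kind of schema.org markup such an effort promotes for life-science assets; the dataset name, URL and creator are invented for illustration, and the real Bioschemas profiles add further recommended properties.

```python
import json

# Hypothetical example: minimal schema.org/Bioschemas-style markup for a dataset,
# of the kind Bioschemas.org encourages sites to embed so assets become findable.
dataset_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example glucose metabolism time-course",   # hypothetical name
    "description": "Time-course measurements supporting a kinetic model.",
    "url": "https://example.org/datasets/glucose-tc",    # hypothetical URL
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "keywords": ["systems biology", "kinetic model", "time course"],
}

# Embedding this JSON-LD in a page (e.g. in a script tag of type application/ld+json)
# is what lets crawlers and registries find the asset.
print(json.dumps(dataset_markup, indent=2))
```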
This document discusses Research Objects (RO), which provide a framework for bundling, exchanging, and linking resources related to experiments in order to improve reproducibility. The RO framework uses unique identifiers, aggregation, and metadata to group related resources. Real-world examples of ROs include reviewed scientific papers, workflow runs, and Docker images. ROs can help make research fully FAIR (Findable, Accessible, Interoperable, Reusable). Tools and platforms like FAIRDOM, SEEK, and Figshare support the use of ROs.
FAIR Data, Operations and Model management for Systems Biology and Systems Me... – Carole Goble
This document discusses the FAIRDOM consortium's efforts to promote FAIR (Findable, Accessible, Interoperable, Reusable) principles for managing data, operations, and models from systems biology and systems medicine projects. It outlines challenges in asset management for multi-partner, multi-disciplinary projects using multiple formats and repositories. FAIRDOM provides pillars of support including community actions, platforms/tools, and a public project commons to help address these challenges and better enable sharing, reuse, and reproducibility of research assets according to FAIR principles.
Being FAIR: FAIR data and model management SSBSS 2017 Summer School – Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
Being Reproducible: SSBSS Summer School 2017 – Carole Goble
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Written and presented by Tom Ingraham (F1000) at the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as coming in different types and as packaging up all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (as we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers, and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory but also practice, with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
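As an illustration of the "containers plus Linked Data manifest" idea described above, here is a minimal sketch, assuming a zip container and a JSON-LD manifest loosely in the style of the Research Object Bundle specification; the file names and aggregated resources are hypothetical placeholders, not an example from the talk.

```python
import json
import zipfile

# Minimal, illustrative Research Object bundle: resources plus a Linked Data
# manifest recording what is aggregated and how the parts are annotated.
manifest = {
    "@context": "https://w3id.org/bundle/context",   # RO Bundle JSON-LD context
    "id": "/",
    "aggregates": [
        {"uri": "data/measurements.csv", "mediatype": "text/csv"},
        {"uri": "model/pathway-model.xml"},
    ],
    "annotations": [
        # an annotation linking the model to a (hypothetical) provenance description
        {"about": "model/pathway-model.xml", "content": "annotations/provenance.ttl"}
    ],
}

# Pack manifest and resources into one exchangeable container (here a zip file).
with zipfile.ZipFile("example-research-object.zip", "w") as bundle:
    bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    bundle.writestr("data/measurements.csv", "time,glucose\n0,5.1\n10,3.2\n")
    bundle.writestr("model/pathway-model.xml", "<sbml><!-- placeholder model --></sbml>")
    bundle.writestr("annotations/provenance.ttl", "# placeholder provenance annotations\n")
```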
Metadata and Semantics Research Conference, Manchester, UK 2015
Research Objects: why, what and how
In practice the exchange, reuse and reproduction of scientific experiments is hard, dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Neither should they be viewed just as second-class artifacts tethered to publications, but the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects. The term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?
I’ll present our practical experiences of the why, what and how of Research Objects.
Reproducibility of model-based results: standards, infrastructure, and recogn... – FAIRDOM
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
This document summarizes Professor Carole Goble's presentation on making research more reproducible and FAIR (Findable, Accessible, Interoperable, Reusable) through the use of research objects and related standards and infrastructure. It discusses challenges to reproducibility in computational research and proposes bundling datasets, workflows, software and other research products into standardized research objects that can be cited and shared to help address these challenges.
Being FAIR: Enabling Reproducible Data Science – Carole Goble
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
Findable, Accessible, Interoperable, Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a bunch of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVeL), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
Research Objects: more than the sum of the parts – Carole Goble
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
What is Reproducibility? The R* brouhaha (and how Research Objects can help) – Carole Goble
Presented at the 1st International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/
FAIR Data and Model Management for Systems Biology (and SOPs too!) – Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimising burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and ERASysAPP ERANets and the ISBE ESFRI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
ROHub is a digital library and management system for research objects (ROs). It enables scientists to create, manage, and share ROs, which are semantic aggregations of related scientific resources, annotations, and research context. ROHub provides APIs and a web portal for scientists to use throughout the research lifecycle. It stores ROs long-term to support reproducibility and allows for monitoring changes to assess quality.
This document introduces FAIRDOM, a consortium that provides a platform and services to help researchers organize, manage, share, and preserve research outputs according to FAIR principles. FAIRDOM has been in operation for 10 years and has over 50 installations supporting over 118 projects. It provides tools and services to help researchers collaborate better and integrate their data, models, publications and other research objects. FAIRDOM also works with other organizations and infrastructure providers to support broader research initiatives.
Short talk on Research Objects and their use for reproducibility and publishing in the Systems Biology Commons Platform FAIRDOMHub, and the underlying software SEEK.
Improving the Management of Computational Models -- Invited talk at the EBI – Martin Scharm
Improving the Management of Computational Models:
storage – retrieval & ranking – version control
More information and slides to download at http://sems.uni-rostock.de/2013/12/martin-visits-the-ebi/
Aspects of Reproducibility in Earth Science – Raul Palma
The document discusses aspects of reproducibility in earth science research within the European Virtual Environment for Research - Earth Science Themes (EVEREST) project. The key objectives of EVEREST are to establish an e-infrastructure to facilitate collaborative earth science research through shared data, models, and workflows. Research Objects (ROs) will be used to capture and share workflows, processes, and results to help ensure reproducibility and preservation of earth science research. An example RO is described for mapping volcano deformation using satellite imagery and other data sources. Issues around reproducibility related to data access, software dependencies, and manual intervention in workflows are also discussed.
Citing data in research articles: principles, implementation, challenges - an... – FAIRDOM
Prepared and presented by Jo McEntyre (EMBL_EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Crediting informatics and data folks in life science teams – Carole Goble
Science Europe LEGS Committee: Career Pathways in Multidisciplinary Research: How to Assess the Contributions of Single Authors in Large Teams, 1-2 Dec 2015, Brussels
The People Behind Research Software: crediting from the informatics/technical point of view
The swings and roundabouts of a decade of fun and games with Research Objects – Carole Goble
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications; delivered virtually to the 17th Italian Research Conference on Digital Libraries
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Improving the management of computational models – FAIRDOM
Written by Martin Scharm (University of Rostock), Ron Henkel (University of Rostock), Dagmar Waltemath (University of Rostock), Olaf Wolkenhauer (University of Rostock, Stellenbosch University), and presented by Martin Scharm (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
FAIR data and model management for systems biology – FAIRDOM
Written and presented by Carole Goble (University of Manchester) as part of Intelligent Systems for Molecular Biology (ISMB), Dublin. July 10th - 14th 2015.
Written and presented by Wolfgang Müller (HITS) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
The two-day Systems Biology Data Management Foundry Workshop brought together 35 participants from 5 countries to improve collaboration among data management practitioners and explore opportunities in systems biology, synthetic biology, and systems medicine. Participants gained a better understanding of different systems through show-and-tell sessions, generated ideas for cross-integration, and discussed establishing a foundry to support developers. Outcomes included forming collaborations and planning for future meetings to continue developing solutions for open, interoperable, and reusable data management.
The document discusses licensing, citation, and sustainability of intellectual property. It covers different types of licenses for software and data including open source, proprietary, and Creative Commons licenses. It provides resources for choosing an appropriate license, ensuring works are properly cited and credited to help sustain them, and guidelines for repositories, audits, and certifications.
Reproducible and citable data and models: an introduction – FAIRDOM
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS) and Dagmar Waltemath (University of Rostock) at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany. September 14th - 16th 2015.
The webinar discussed FAIRDOM services that can help applicants to the ERACoBioTech call with their data management plans and requirements. FAIRDOM offers webinars on developing data management plans, and their platform and tools can help with organizing, storing, sharing, and publishing research data and models in a FAIR manner by utilizing metadata standards. Different levels of support are available, from general community resources through their hub, to premium customized support for individual projects. Consortia can include FAIRDOM as a subcontractor within the guidelines of the ERACoBioTech call.
Precision Medicine in Oncology Informatics – Warren Kibbe
Precision medicine in oncology aims to provide targeted cancer treatments based on a patient's individual tumor characteristics. The presentation discusses precision oncology initiatives including NCI-MATCH clinical trials which assign cancer therapies based on a tumor's molecular abnormalities rather than location. It outlines plans to expand genomically-based cancer trials, understand and overcome treatment resistance through molecular analysis, and establish a national cancer database integrating genomic and clinical data to accelerate cancer research. Cloud computing platforms are being developed to provide researchers access to large cancer genomic and clinical datasets. The goal is to advance precision cancer treatment by incorporating individual patient genetics and biomarkers into therapeutic decision making.
2016-10-20 BioExcel: Advances in Scientific Workflow Environments – Stian Soiland-Reyes
Carole Goble, Stian Soiland-Reyes
http://orcid.org/0000-0001-9842-9718
Presented at 2016-10-20 BioExcel Workflow Training, BSC, Barcelona
http://bioexcel.eu/events/bioexcel-workflow-training-for-computational-biomolecular-research/
NOTE: Although these slides are licensed as CC Attribution, it includes various logos which are covered by their own licenses and copyrights.
"Data Provenance: Principles and Why it matters for BioMedical Applications"Pinar Alper
Tutorial given at the Informatics for Health 2017 Conference. These slides are for the second part of the tutorial, describing provenance capture and management tools.
Taverna workflows can be run in the cloud to automate complex analysis pipelines and access remote data and services. This allows sophisticated computational analyses to be shared as web services. The BioVeL and CA4LS projects are developing cloud-based workflow systems to support life scientists and clinical researchers. Workflows are hidden from users, who access pre-configured analyses via a web interface. This "workflow as a service" approach scales easily and provides a secure environment for data-intensive biomedical research.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects, just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that authors’ credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober, FAIR Computational Workflows, Data Intelligence 2020; 2(1-2):108-121, https://doi.org/10.1162/dint_a_00033
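To ground the "explicit method description" role mentioned above, here is a minimal sketch of a single workflow step described declaratively in the Common Workflow Language (one of the approaches this ecosystem supports); JSON is an accepted CWL serialisation, and the tool and file names are illustrative only, not taken from the talk.

```python
import json

# Minimal sketch: one workflow step described declaratively in CWL.
# Declarative descriptions like this are what make a workflow findable,
# understandable and re-executable independently of any one engine.
count_lines_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["wc", "-l"],
    "inputs": {
        "infile": {"type": "File", "inputBinding": {"position": 1}},
    },
    "outputs": {
        "line_count": {"type": "stdout"},
    },
    "stdout": "count.txt",
}

with open("count-lines.cwl", "w") as fh:   # illustrative file name
    json.dump(count_lines_tool, fh, indent=2)

# Runnable with any CWL engine, e.g.: cwltool count-lines.cwl --infile data.csv
```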
The function-as-a-service (FaaS) model is well established in commercial cloud offerings but less so in research computing environments. The Globus Compute service enables remote computing using the FaaS model, but allows users to execute functions on any compute resource where they have access. We provide an overview of the Globus Compute service, and demonstrate how to install an endpoint and execute a function on a remote system.
This material was presented at the Research Computing and Data Management Workshop, hosted by Rensselaer Polytechnic Institute on February 27-28, 2024.
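As a rough sketch of the FaaS pattern described above (not the exact demo from the workshop), the Globus Compute SDK lets you submit a plain Python function to a remote endpoint; the endpoint UUID below is a placeholder you would replace with your own.

```python
from globus_compute_sdk import Executor

def add(a: int, b: int) -> int:
    # An ordinary Python function; Globus Compute serialises it and runs it remotely.
    return a + b

ENDPOINT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder: your endpoint UUID

# Submit the function to the remote endpoint and wait for the result,
# following the familiar concurrent.futures Executor pattern.
with Executor(endpoint_id=ENDPOINT_ID) as gce:
    future = gce.submit(add, 2, 3)
    print(future.result())  # -> 5

# Installing and starting an endpoint on the remote resource is done separately,
# e.g. with the globus-compute-endpoint CLI ("configure" then "start").
```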
End-to-end Data Governance with Apache Avro and Atlas – DataWorks Summit
This document discusses end-to-end data governance with Apache Avro and Apache Atlas at Comcast. It outlines how Comcast uses Avro for schema governance and Apache Atlas for data governance, including metadata browsing, schema registry, and tracking data lineage. Comcast has extended Atlas with new types for Avro schemas and customizations to better handle their hybrid environment and integrate platforms for comprehensive data governance.
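For readers unfamiliar with Avro, here is a minimal, hypothetical sketch of the kind of schema such governance is built around, serialised with the fastavro library; the record name and fields are invented and are not Comcast's actual schemas.

```python
from fastavro import parse_schema, writer

# Hypothetical Avro schema: the kind of artefact a schema registry governs
# and a metadata catalogue can track lineage against.
schema = {
    "type": "record",
    "name": "ViewEvent",
    "namespace": "com.example.telemetry",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "channel", "type": ["null", "string"], "default": None},
    ],
}
parsed = parse_schema(schema)

records = [
    {"device_id": "stb-001", "timestamp": 1700000000, "channel": "news"},
    {"device_id": "stb-002", "timestamp": 1700000042, "channel": None},
]

# Write records in the Avro container format; readers recover the schema from the
# file itself, which is what makes schema evolution and governance tractable.
with open("events.avro", "wb") as out:
    writer(out, parsed, records)
```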
Big Data Streams Architectures. Why? What? How? – Anton Nazaruk
With the current zoo of technologies and the different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that will conform to low-latency Big Data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are attracting more and more attention compared with the classic Hadoop-centric technology stack. The new Consumer API has given a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams library are proving a synergy in the Big Data world.
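To illustrate the consumer side of such a log-centric (Kappa-style) architecture, here is a small sketch using the kafka-python client rather than the Java Consumer API the slides discuss; the topic, group id and broker address are placeholders.

```python
import json
from kafka import KafkaConsumer

# Each microservice reads the same immutable event log independently, tracked by
# its consumer group; topic, group id and broker address are placeholders.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="analytics-service",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Low-latency processing happens here, message by message (or in micro-batches).
    print(message.topic, message.partition, message.offset, event)
```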
eResearch workflows for studying free and open source software development – Andrea Wiggins
1. The document discusses using scientific workflows and tools like Taverna for distributed collaborative research on free and open source software development using large datasets, computational resources, and reproducible analysis.
2. Taverna is presented as an example of a scientific workflow tool that allows modular development of analysis through reusable components with input and output ports, offering advantages over scripts.
3. An example workflow is shown that calculates network centralization in dynamic networks and generates time series and CSV output for further analysis.
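As a rough illustration of the kind of computation that example workflow performs (not the actual Taverna components), the sketch below computes Freeman degree centralization for a series of network snapshots and writes a CSV time series; the snapshots here are randomly generated placeholders.

```python
import csv
import networkx as nx

def degree_centralization(G: nx.Graph) -> float:
    """Freeman degree centralization: 0 for a regular graph, 1 for a perfect star."""
    n = G.number_of_nodes()
    if n < 3:
        return 0.0
    degrees = [d for _, d in G.degree()]
    max_d = max(degrees)
    return sum(max_d - d for d in degrees) / ((n - 1) * (n - 2))

# Placeholder dynamic network: one random snapshot per period, standing in for
# the per-month interaction networks a real analysis would load.
snapshots = {
    "2009-01": nx.gnp_random_graph(30, 0.10, seed=1),
    "2009-02": nx.gnp_random_graph(30, 0.15, seed=2),
    "2009-03": nx.gnp_random_graph(30, 0.20, seed=3),
}

with open("centralization.csv", "w", newline="") as fh:
    out = csv.writer(fh)
    out.writerow(["period", "degree_centralization"])
    for period in sorted(snapshots):
        out.writerow([period, round(degree_centralization(snapshots[period]), 4)])
```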
Next-Generation Completeness and Consistency Management in the Digital Threa... – Ákos Horváth
The document discusses challenges in maintaining consistency, completeness, and correctness (3C) across disconnected engineering data silos. It proposes using links and transformations to connect models between systems engineering and electrical design tools. Validation rules can then check that connections and components are properly mapped between the silos. The IncQuery Validator was used to import a model from E3.GENESYS into its knowledge graph and generate a validation report checking for 3C issues. Tracking link management and validation results over time provides visibility into the progress of the "digital thread" across the engineering lifecycle.
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS – Ed Dodds
1) Globus Genomics addresses challenges in sequencing analysis by providing a platform that integrates data transfer via Globus Online, workflow management in Galaxy, and scalable compute resources in AWS.
2) An example collaboration with the Dobyns Lab saw over a 10x speedup in exome data analysis by replacing a manual process with Globus Genomics.
3) Globus Genomics leverages XSEDE services like Globus Transfer and Nexus while integrating additional resources like sequencing centers and cloud computing, in order to reduce the costs and complexities of genomic research for communities not traditionally using advanced cyberinfrastructure.
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit... – Ilkay Altintas, Ph.D.
Scientific workflows are used by many scientific communities to capture, automate and standardize computational and data practices in science. Workflow-based automation is often achieved through a craft that combines people, process, computational and Big Data platforms, application-specific purpose and programmability, leading to provenance-aware archival and publication of the results. This talk summarizes varying and changing requirements for distributed workflows influenced by Big Data and heterogeneous computing architectures and presents a methodology for workflow-driven science based on these maturing requirements.
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome – Carole Goble
Workflow systems support the design, configuration and execution of repetitive, multi-step pipelines and analytics, well established in many disciplines, notably biology and chemistry, but less so in biodiversity and ecology. From an experimental perspective workflows are a means to handle the work of accessing an ecosystem of software and platforms, manage data and security, and handle errors. From a reporting perspective they are a means to accurately document methodology for reproducibility, comparison, exchange and reuse, and to trace the provenance of results for review, credit, workflow interoperability and impact analysis. Workflows operate in an evolving ecosystem and are assemblages of components in that ecosystem; their provenance trails are snapshots of intermediate and final results. Taking a lifecycle perspective, what are the challenges in workflow design and use with different stakeholders? What needs to be tackled in evolution, resilience, and preservation? And what are the “mitigate or adapt” strategies adopted by workflow systems in the face of changes in the ecosystem/environment, for example when tools are deprecated or datasets become inaccessible in the face of funding shortfalls?
Introduces the Globus software-as-a-service for file transfer and data sharing. Includes step-by-step instructions for creating a Globus account, transferring a file, and setting up a Globus endpoint on your laptop.
In the new era of digitalization, there is an ever-growing need for design and production processes capable of increasing systems quality, reducing risks and the chance of errors, while, at the same time, reducing overall production costs. Nowadays, more and more systems design scenarios comprise a high number of domains.
However, the underlying tool landscape is still dominated by closed ecosystems, resulting in the design data remaining in separate silos. To effectively deal with novel, massively diverse yet interconnected engineering scenarios, while also considering industrial sustainability and the well-being of the future digital society, we have to propose new ways to look at the digital thread, supporting every phase of a digital engineering lifecycle, while turning the siloed multi-domain engineering data into a holistic, accessible and globally analyzable digital thread.
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
Keynote presentation given 13/9/16 @ ESA Earth Observation Open Science workshop 2016.
"The rise in cloud computing as an e-infrastructure model is one that has the power to democratise access to computational and data resources throughout the research communities. We have seen the difference that Infrastructure as a Service (IaaS) has made for different communities and are now only beginning to understand what different models further up the stack can make. It is also becoming clear that with the increase in research data volumes, the number of sources and the possibility of utilising data from different regulatory regimes that a different model of how analysis is performed on the data is possible. Utilising a "Desktop as a Service" model, with community focused applications installed on a common and well understood virtual system image that is directly connected to community relevant data allows the researcher to no longer have to consider moving data but only the final analysed results. This massively simplifies both the user model and the data and resource owner model. We will consider the specific example of the Environmental Ecomics Synthesis Cloud and how it could easily be generalised to other areas."
Apache Airavata is an open source science gateway software framework that allows users to compose, manage, execute, and monitor distributed computational workflows. It provides tools and services to register applications, schedule jobs on various resources, and manage workflows and generated data. Airavata is used across several domains to support scientific workflows and is largely derived from academic research funded by the NSF.
The Taverna Suite provides tools for interactive and batch workflow execution. It includes a workbench for graphical workflow construction, various client interfaces, and servers for multi-user workflow execution. The suite utilizes a plug-in framework and supports a variety of domains, infrastructures, and tools through custom plug-ins.
This document discusses tools for distributed data analysis including Apache Spark. It is divided into three parts:
1) An introduction to cluster computing architectures like batch processing and stream processing.
2) The Python data analysis library stack including NumPy, Matplotlib, Scikit-image, Scikit-learn, Rasterio, Fiona, Pandas, and Jupyter.
3) The Apache Spark cluster computing framework and examples of its use including contexts, HDFS, telemetry, MLlib, streaming, and deployment on AWS.
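For part 3, a minimal PySpark sketch (assuming a local Spark installation; the input file and column names are hypothetical) showing the session as the entry point, a simple aggregation, and writing results back out:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create (or reuse) a Spark session, the DataFrame-era entry point. App name is arbitrary.
spark = SparkSession.builder.appName("telemetry-summary").getOrCreate()

# Hypothetical telemetry CSV with columns: device_id, latency_ms, ...
events = spark.read.csv("telemetry.csv", header=True, inferSchema=True)

summary = (
    events.groupBy("device_id")
          .agg(F.count("*").alias("n_events"),
               F.avg("latency_ms").alias("mean_latency_ms"))
)

# Write the aggregate back out (Parquet here); on AWS this path could be an s3:// URI.
summary.write.mode("overwrite").parquet("telemetry-summary.parquet")
spark.stop()
```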
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo... – Carole Goble
Presented at the FAIR Data in Practice Symposium, 16 May 2023 at BioITWorld Boston. https://www.bio-itworldexpo.com/fair-data. The ELIXIR European Research Infrastructure for life science data is an inter-governmental organisation coordinating, integrating and sustaining FAIR data and software resources across its 23 nations. To help advise users, data stewards, project managers and service providers, ELIXIR has developed complementary community-driven, open knowledge resources for guiding FAIR Research Data Management (RDMkit) and providing FAIRification recipes (FAIRCookbook). 150+ people have contributed content so far, including representatives of the pharmaceutical industry.
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research... – Carole Goble
Invited talk, PHIL_OS, March 30-31 2023, Exeter
https://opensciencestudies.eu/whither-open-science. Includes hidden slides.
FAIR and Open Science needs Digital Research Infrastructure, which is a federated system of systems and needs funding models that are fit for purpose
Culture change is needed in how we pay for Open Science’s infrastructure, and funding support for data-driven research needs more reality and less rhetoric
RO-Crate: packaging metadata love notes into FAIR Digital Objects – Carole Goble
Abstract
slides available at: https://zenodo.org/record/7147703#.Y7agoxXP2F4
The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers.
Metadata is not just “a love note to the future” [2], it is a love note to today’s collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate!
RO-Crate [3] is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following key principles – “just enough” and “developer and legacy friendliness” – RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way to realise the goals of FAIR Digital Objects (FDO) [4], and to not just overcome platform diversity but celebrate it while retaining investigation contextual integrity.
In this talk I will present the why, and how Research Object packaging eases Metadata Collaboration using examples in big data and mixed object exchange, mixed object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, USA and Australia, and from different disciplines.
Metadata is a love note to the future, RO-Crate is the delivery package.
[1] https://helmholtz-metadaten.de/en
[2] Scott, Jason The Metadata Mania, http://ascii.textfiles.com/archives/3181, June 2011
[3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
[4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications8020021
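As a rough illustration of the packaging idea, a minimal sketch using the ro-crate-py library (rocrate on PyPI); the file name and properties are hypothetical examples, and the exact API should be checked against the library's documentation:

    # Sketch of packaging an output as an RO-Crate with ro-crate-py (pip install rocrate).
    # The payload file and its properties are hypothetical examples.
    from rocrate.rocrate import ROCrate

    crate = ROCrate()
    crate.add_file("results/expression_matrix.csv", properties={
        "name": "Expression matrix",
        "encodingFormat": "text/csv",
    })
    crate.write("my-crate/")  # writes ro-crate-metadata.json alongside the payload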
Research Software Sustainability takes a VillageCarole Goble
1. Research software sustainability requires communities to support development and maintenance over time.
2. Strong communities cultivate relationships between developers, users, and other stakeholders to establish trust and shared responsibility for software.
3. Maintaining communities requires ongoing efforts like change management, skills development, and cultivating relationships that span organizational boundaries. Funders can support these community efforts.
“Bioscience has emerged as a data-rich discipline, in a transformation that is spreading as widely now as molecular biology in the twentieth century. We look forward to supporting new research careers, where data are valued and shared widely, where new software is a natural part of Biology, and where re-analysis and modelling are as creative as experimentation in understanding the rules of life and their applications.” Prof Andrew Millar FRS, chair Expert Group UKRI-BBSRC Review of data-intensive bioscience 2020.
Indeed - biomedical science is knowledge work and knowledge turning - the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR and Open flow and availability of data and methods for automated processing and reproducible results, and on a society of scientists coordinating and collaborating.
For the past 25 years I have worked on the social and technical challenges in digital infrastructure to support scientific collaboration, data and method sharing, and to automate scientific processing. The big ideas I have been instrumental in – sharing and publishing high quality computational workflows, semantic web technologies in bioscience, ecosystems of Research Objects as the currency of scholarly knowledge, the FAIR data principles – preached revolution to inspire, but they need nudges* to get traction.
I'll talk about making good on Andrew's quote: what I'm doing to nudge and where we need to do more. I'll also talk about my experiences as a woman in digital infrastructure and computer science over the past 40 years – and some nudging is needed there too.
*Thaler RH, Sunstein CR (2008) Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press. ISBN 978-0-14-311526-7. OCLC 791403664.
https://www.bsc.es/research-and-development/research-seminars/hybrid-bsc-rslife-sessionbioinfo4women-seminar-love-money-fame-nudge-enabling-data-intensive
This document discusses FAIR computational workflows and why they are important. It defines computational workflows as multi-step processes for data analysis and simulation that link computational steps and handle data and processing dependencies. Workflows improve reproducibility, enable automation, and allow for increased sharing and reuse of research. The document outlines how applying FAIR principles to workflows makes them findable, accessible, interoperable, and reusable. This includes using standardized metadata, identifiers, licensing, and formats to describe workflows and ensure their components and data are also FAIR. Adopting FAIR workflows requires support from workflow systems, tools, communities and services.
Open Research: Manchester leading and learningCarole Goble
Open and FAIR science has international momentum. Large-scale communities are striving to make and manage the digital infrastructure needed for scientists to be as open as possible and as closed as necessary, as expected by the NIH, OECD, UNESCO and the EC. ELIXIR is such a research infrastructure in Europe for the Life Sciences. This talk will highlight two of ELIXIR's Open Science resources, built by Open Science communities to enable life science researchers to be open, and led by Manchester. How can we learn from these and bring their practices to Manchester?
Launch: Manchester Office for Open Research, 4th April 2022
https://www.openresearch.manchester.ac.uk/
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org (link is external)) is a toolkit built by the biosciences community, for the biosciences community to provide the RDM information they need. It is a framework for advice and best practice for RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit; aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
This document discusses computational workflows and FAIR principles. It begins by providing background on computational workflows and their increasing importance. It then discusses challenges around finding, accessing, and sharing workflows. Next, it explores how applying FAIR principles to workflows could help address these challenges by making workflows and their associated objects findable, accessible, interoperable, and reusable. This includes discussing applying metadata standards, using persistent identifiers, and developing principles for FAIR workflows and FAIR software. The document concludes by examining the roles and responsibilities of different stakeholders in working towards FAIR workflows.
presentation at https://researchsoft.github.io/FAIReScience/, FAIReScience 2021 online workshop
virtually co-located with the 17th IEEE International Conference on eScience (eScience 2021)
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of public domain databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data's journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based data management, and support researchers across their whole RDM lifecycle, navigating the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aims to do just that.
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible) and citable, so that authors' credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober, “FAIR Computational Workflows”, Data Intelligence 2020 2:1-2, 108-121, https://doi.org/10.1162/dint_a_00033
FAIR Workflows and Research Objects get a Workout Carole Goble
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
Presented at DataVerse Community Meeting 2021
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the researcher's lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
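As a small illustration of the Schema.org-style markup mentioned above, a sketch of Dataset metadata expressed as JSON-LD (the kind of description Bioschemas profiles build on); all values are hypothetical examples:

    # Sketch of Schema.org-style Dataset markup as JSON-LD; every value here
    # is a hypothetical example, not a real record.
    import json

    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example metabolomics dataset",
        "identifier": "https://doi.org/10.xxxx/example",
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "creator": {"@type": "Person", "name": "A. Researcher"},
    }

    print(json.dumps(dataset, indent=2))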
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
RO-Crate: A framework for packaging research products into FAIR Research Objects presented to Research Data Alliance RDA Data Fabric/GEDE FAIR Digital Object meeting. 2021-02-25
How are we Faring with FAIR? (and what FAIR is not)Carole Goble
Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data and prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects, a lot of positioning of interests and raised awareness. COVID has injected impetus and urgency to the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original principles authors; as a participant in many current FAIR initiatives (particularly in the biomedical sector and for research objects other than data) and as a veteran of FAIR before we had the principles.
What is Reproducibility? The R* brouhaha and how Research Objects can helpCarole Goble
This document discusses reproducibility in computational research. It defines several key terms related to reproducibility, including replicate, rerun, and repeat. It notes that computational papers should provide the full software and data used to generate results. The document outlines several rules for reproducible research, such as tracking how results were produced, version controlling scripts, and archiving intermediate results. It also discusses challenges to reproducibility like lack of funding and changing dependencies over time. Finally, it introduces Research Objects as a framework to bundle resources like data, software, and protocols to help address reproducibility issues.
A keynote given on the FAIR Data Principles at the FAIRplus Innovation and SME Forum, Hinxton Genome Campus, Cambridge, UK, 29 January 2020. It covers the history of the principles, issues around the principles, and speculations about the future.
ELIXIR UK Node presentation to the ELIXIR BoardCarole Goble
The document provides information about ELIXIR-UK, which is the UK node of ELIXIR, a European infrastructure for biological information. ELIXIR-UK is a network of 18 UK organizations and has established training programs, services, and communities. It coordinates UK participation in ELIXIR and related EU projects. ELIXIR-UK also works to establish interoperability across biological data resources and help make these resources FAIR (Findable, Accessible, Interoperable, Reusable). It is working to establish BioFAIR, a proposed new institute that would coordinate UK life science data infrastructure.
Advances in Scientific Workflow Environments
1. 2016-09-04 BioExcel SIG, ECCB, Amsterdam
Advances in Scientific Workflow Environments
Carole Goble, Stian Soiland-Reyes
The University of Manchester
[email protected]
http://esciencelab.org.uk/
2. What is a Workflow?
• Orchestrating multiple computational tasks
• Managing the control and data flow between them
• In a world that is homogeneous or heterogeneous
• Tasks
  – Local / remote
  – Local / third party
  – White, grey or black boxes
  – Reliable / fragile
  – Reserved / dynamic
  – Various underpinning infrastructure
  – Various access controls
BioExcel: Biomolecular recognition
3. What is a Workflow?
Automation
  – Automate computational aspects
  – Repetitive pipelines, sweep campaigns
Scaling – compute cycles
  – Make use of computational infrastructure & handle large data
Abstraction – people cycles
  – Shield complexity and incompatibilities
  – Report, re-use, evolve, share, compare
  – Repeat – Tweak – Repeat
  – First class commodities
Provenance – reporting
  – Capture, report and utilize log and data lineage auto-documentation
  – Traceable evolution, audit, transparency
  – Compare
With thanks to Bertram Ludascher: WORKS 2015 Keynote
Findable, Accessible, Interoperable, Reusable, (Reproducible)
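A toy sketch of those concerns in plain Python (no particular workflow system assumed): the tasks are ordinary functions and the "workflow" is the explicit wiring of data between them; a real WMS adds the scaling, abstraction and provenance capture on top.

    # Toy sketch, plain Python rather than any particular workflow system:
    # tasks are steps, and the workflow is the explicit data flow between them.
    def fetch(sample_id):
        # Task 1: obtain raw data (stand-in for a download or instrument read)
        return f"raw data for {sample_id}"

    def clean(raw):
        # Task 2: transform the data
        return raw.upper()

    def analyse(cleaned):
        # Task 3: produce a result
        return {"length": len(cleaned)}

    def workflow(sample_id):
        # The control and data flow a WMS would automate, scale and record
        return analyse(clean(fetch(sample_id)))

    print(workflow("S001"))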
12. Workflow Patterns, templates
Data wrangling & analytics + Simulations + Instrument pipelines
http://tpeterka.github.io/maui-project/
The Future of Scientific Workflows, Report of DOE Workshop 2015,
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pdf
13. Workflow Patterns, templates
Data wrangling & analytics + Simulations + Instrument pipelines
Garijo et al, Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
14. Workflow Patterns, templates
• Long running and complex code
• Tunable parameters and input sets
• Simulation sweeps / iterations
• Ensembles, comparisons
• Tricky set-ups, human-in-the-loop interaction
• Computational steering
• In situ workflows – multiple tasks, same box, within fixed time
  – data locality
  – human-in-the-loop
  – capture provenance
Data wrangling & analytics + Simulations + Instrument pipelines
15. Traction + Examples
Reuse behaviours
Exploratory vs Production
Different kinds of user / deployment
Developer – User Ratios
Developer – Computational Scientist – Biologist
19. “Multi-scale” WFMS
• Workflow Management System
  – Its design and reporting environment
  – Its execution environment
• The tasks
  – tools, codes and services and their execution environments
• Stack layer
  – App level, infrastructure level
20. Component making – “Multi-scale” WFMS
DAIC (Distributed Area/Instrument Computing): tasks loosely coupled through files
• execute on geographically distributed clusters, clouds, grids across systems
• execute on multiple facilities
• call host services (web / grid services)
HPC: tasks tightly coupled
• exchanging info over memory/storage
• network of supercomputers
• In situ workflows – multiple tasks, same box, within fixed time
Interoperability | Portability | Granularity | Maintenance
22. Copernicus: workflow engine for parallel adaptive molecular dynamics
• Peer-to-peer distributed computing platform
  – high-level parallelization of statistical sampling problems
• Consolidation of heterogeneous compute resources
• Automatic resource matching of jobs against compute resources
• Automatic fault tolerance of distributed work
• Workflow execution engine to define a problem (reporting) and trace its results live (provenance)
• Flexible plugin facilities
  – programs to be integrated to the workflow execution engine
Free Energy Workflow using GROMACS
http://copernicus-computing.org/
23. COMPSs/PyCOMPSs: Programmer Productivity framework
• Sequential programming
  – Parallelisation and distribution heavy-lifting
  – Dependency detection
• Infrastructure unaware
  – Abstract application from underlying infrastructure
  – Portability
• Standard Programming Languages
  – Java, Python, C/C++
• No (or few!) APIs
  – Standard Java
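A minimal sketch of that programming model, based on the PyCOMPSs task decorator as described in its documentation; treat the exact API details as illustrative:

    # Sketch of the PyCOMPSs style: sequential-looking Python where decorated
    # functions become tasks and the runtime resolves the data dependencies.
    # (Decorator and synchronisation call as per the PyCOMPSs docs; illustrative.)
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on

    @task(returns=int)
    def square(x):
        return x * x

    if __name__ == "__main__":
        futures = [square(i) for i in range(10)]  # scheduled as parallel tasks
        results = compss_wait_on(futures)         # synchronise on the results
        print(sum(results))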
26. Stop Press! GUIs not essential!
• Canvas, drag-drop blocks, arrows, run button
• Command-line & embedding in developer or user applications
Scripts can be workflows!
• WMS <-> Scripts
• Script vs Workflows / ASAP:
  – Automation: *****
  – Scaling: **
  – Abstraction: *
  – Provenance: **
Work close to a problem-specific ad-hoc data model
Domain Specific Language "programming-lite" scripts
• wire with declarative "makefile"-like DAG
Plus
• procedural scripting and expressions in languages like Javascript and Python
Nextflow, Snakemake, Common Workflow Language
29. Provenance: the link between computation and results
W3C PROV model standard
• record for reporting
• compare diffs/discrepancies
• provenance analytics
• track changes, adapt
• partial repeat/reproduce
• carry attributions
• compute credits
• compute data quality/trust
• select data to keep/release
• optimisation and debugging
Metadata propagation – where was the physical sample collected, and who should be attributed?
Task-based abstractions: simplifying provenance using motifs and tool annotations
“Free energy calculation” rather than 5 steps including preparation of PDB files and GROMACS execution
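A minimal sketch of recording such a computation-to-result link with the W3C PROV model, using the Python prov package; the namespace, activity, entity and agent names are hypothetical examples:

    # Minimal W3C PROV sketch with the `prov` package (pip install prov);
    # the example URIs, activity and agent names are hypothetical.
    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("ex", "http://example.org/")

    run = doc.activity("ex:free_energy_calculation")   # the computation
    result = doc.entity("ex:binding_energy_result")    # the data it produced
    author = doc.agent("ex:researcher")                # who to attribute

    doc.wasGeneratedBy(result, run)
    doc.wasAssociatedWith(run, author)
    doc.wasAttributedTo(result, author)

    print(doc.get_provn())  # PROV-N view of the record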
30. Provenance: the link between workflow variants and workflow reuse and repurpose
W3C PROV model standard?
• record for reporting
• compare diffs/discrepancies
• provenance analytics
• track changes, adapt
• carry attributions
• compute design credits
• versioning, forking, cloning
Nested workflows – functions by stealth
Copy and paste fragmentation
Designing for reuse
Find and Go
Software practices
Systematic reuse
Guidelines for persistently identifying software using DataCite
https://epubs.stfc.ac.uk/work/24058274
https://www.force11.org/software-citation-principles
31. ASAP WFMS for FAIR Science
Automate: workflows, programs and services folks already use or want to use
Scale: Enable computational productivity
Abstract: Enable human productivity
Provenance: Record and use
Usability – Workflow – Plugged-in Code – Reporting – Comparison
Thanks to Bertram Ludascher
33. Application Building Blocks – BioExcel Virtualised Software Library
“transversal workflow units”, higher level operations
● Task-specific “mini-workflow” fragments
  – e.g. using Gromacs, CPMD, HADDOCK
● Packaged
  – EGI VM images and Docker containers
● Backed by existing registries
  – ELIXIR’s bio.tools and EGI App DB
● Instantiated as cloud instances
  – private (OpenNebula, OpenStack)
  – public (e.g. Amazon AWS)
34. BioExcel Use cases
● Genomics (Ensembl)
● Molecular simulations
● Free Energy simulations
● Multiscale modelling of molecular basis for odor and taste
● Biomolecular recognition
● Pharmacological queries
● Virtual Screening
35. Finding valid pathways through free-energy landscapes: implementation of the “string of swarms” method using Copernicus as a workflow manager and GROMACS as a compute engine.
36. Workflow Interoperability
• Common format for bioinformatics tool & workflow execution
• Community-based standards effort
• Designed for clusters & clouds
• Supports the use of containers (e.g. Docker)
• Specify data dependencies between steps
• Scatter/gather on steps
• Nest workflows in steps
• Develop your pipeline on your local computer (optionally with Docker)
• Execute on your research cluster or in the cloud
• Deliver to users via workbenches
• EDAM ontology (ELIXIR-DK) to specify file formats and reason about them: “FASTQ Sanger” encoding is a type of FASTQ file
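The slide above describes a common, community-based workflow format (the Common Workflow Language is named earlier in the deck). A minimal sketch of running such a portable description locally from Python via the cwltool reference runner; the workflow and input file names are hypothetical examples:

    # Sketch: driving a portable workflow description with the `cwltool`
    # reference runner. The .cwl and .yml file names are hypothetical examples.
    import subprocess

    subprocess.run(
        ["cwltool", "variant_calling.cwl", "job_inputs.yml"],
        check=True,  # raise CalledProcessError if the run fails
    )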
37. Workflow Research Object Bundle
researchobject.org
application/vnd.wf4ever.robundle+zip
Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics, doi:10.1016/j.websem.2015.01.003
38. Z. Zhao et al., “Workflow bus for e-Science”, in IEEE e-Science 2006, Amsterdam
40. http://bioexcel.eu/events/bioexcel-workflow-training-for-computational-biomolecular-research/
Adam Hospital (IRB), Anna Montras (IRB), Stian Soiland-Reyes (UNIMAN), Alexandre Bonvin (UU), Adrien Melquiond (UU), Josep Lluís Gelpí (BSC), Daniele Lezzi (BSC), Steven Newhouse (EBI), Jose A. Dianes (EBI), Mark Abraham (KTH), Rossen Apostolov (KTH), Emiliano Ippoliti (Jülich), Adam Carter (UEDIN), Darren J. White (UEDIN)
Slides: Bertram Ludascher, Ewa Deelman, Vasa Curcin, Paolo Missier, Pinar Alper, Susheel Varma, Rob Finn, Michael Crusoe, Rizos Sakellariou
Sign up ASAP!