Research Data Life Cycle
By Christine Kollen
University of Arizona Libraries
September 4, 2017
Agenda
 Introduction
  Ready Set Data!
 Issues in Research Data Management
 Research Data Life Cycle
 Data Sharing and Access - benefits
 Long-lived Data
 Metadata and Documentation
 Data Management Plans
 What are publishers and funders saying about data?
Introduction
  This workshop covers the main components of the research data life cycle
  Some of the content, exercises, and discussion topics are taken from the
ANDS 23 (Research Data) Things program and the UCSD Library 23
(Research Data) Things program
 Introduce yourself – what is your name, institution, and position?
What do you hope to learn from this workshop?
Ready Set Data!
What is Research Data?
“the recorded factual material commonly
accepted in the scientific community as
necessary to validate research findings, but not
any of the following: preliminary analyses,
drafts of scientific papers, plans for future
research, peer reviews, or communications
with colleagues.”
(OMB Circular A-110)
What is Research Data?
Observational data – captured in real time, usually
irreplaceable
 Sensor readings
 Images of the physical world
 Survey data
 Telemetry
Experimental – from lab equipment, reproducible but expensive
 Gene sequences
 Images of water flowing through a flume
 Chromatograms
Simulation – generated from test models; the models and metadata are
more important than the raw output data
 Climate models
 Economic models
What is Research Data? (continued)
Derived/Compiled – reproducible, but expensive
 Text and data mining
 Compiled databases
 3D models
Reference or canonical – collection of smaller datasets
 Gene sequence databanks
 Chemical structures
 Spatial Data portals, such as Spatial Data Explorer
Exercise 1 -- Ready Set Data
Go online and look at one of the following repositories:
• Harvard Dataverse - https://dataverse.harvard.edu/
• Global Change Master Directory – http://gcmd.nasa.gov -- Search for data
• Dryad Digital Repository -- http://datadryad.org/
• NeuroMorpho – http://neuromorpho.org – Search metadata or keyword
• Qualitative Data Repository -- https://qdr.syr.edu/ -- Discover > Search Data
Questions:
1. How does this data differ from what you are familiar with? By format, size, access?
2. Does the collection have a code book or data dictionary? Other documentation?
3. Explore the metadata representing the collection. Besides the title and description,
what other elements are described?
Issues in Research Data Management
YouTube video from NYU Health Sciences Library – what happens when a
researcher does not manage their data (4:40 minutes)
Data Management Best Practices
UA Data Management Resources on best practices:
Data Organization - http://data.library.arizona.edu/data-management-tips/data-organization
Discussion
1. Are there file naming conventions in your discipline?
2. It is important to share data in a non-proprietary format that uses open
data standards. In each pair below, which format is preferred when
sharing data?
Microsoft Word    or   PDF/A
Microsoft Excel   or   CSV (comma-separated values)
GIF/JPEG          or   TIFF
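A file naming convention is easier to follow when the names are built the same way every time. The sketch below is only an illustration: the function name build_filename, the order of the parts (project, description, ISO 8601 date, version), and the separators are all assumptions, not a convention recommended by the workshop or by any discipline.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name like project_description_YYYY-MM-DD_vNN.ext (hypothetical convention)."""

    def clean(part: str) -> str:
        # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

    extension = ext.lower().lstrip(".")
    return f"{clean(project)}_{clean(description)}_{day.isoformat()}_v{version:02d}.{extension}"


print(build_filename("Flume Study", "water images", date(2017, 9, 4), 3, "TIFF"))
# prints: flume-study_water-images_2017-09-04_v03.tiff
```

Putting the date in year-month-day order means the files sort chronologically in an ordinary directory listing, and the zero-padded version number keeps v02 ahead of v10.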
Image Source: http://guides.library.ucsc.edu/datamanagement/
Research Data Life Cycle
  Data often has a longer lifespan than the research project that creates
it
  Other projects may analyze or add to the data, and other researchers
may reuse it
 Funders and journal editors are beginning to require that researchers
make the underlying data accessible for the long term
Finding data repositories
“The resulting data ecosystem, therefore, appears to be
moving away from centralization, is becoming more diverse,
and less integrated…” (M.D. Wilkinson)
 Numerous repositories
 Scales range from institutional (campus repositories) to globally-
scoped repositories
 Accept a wide range of data types and formats
 Little attempt to integrate or harmonize deposited datasets
 Few requirements for the descriptors of a dataset
Data to use in your Research
What does the data repository landscape look like? Let’s use the Registry of Research
Data Repositories (http://re3data.org) to explore data repositories by academic discipline.
Go to http://re3data.org and click on Browse > By Country > Mexico
Look at a few – EcoCyc, California Coastal Atlas, International Maize & Wheat Improvement
• What content types are available?
• How do you access the database?
• Is it available for data upload or are there restrictions?
• Do they assign a persistent identifier (such as a DOI)?
• What metadata schema do they use?
Sharing Data
Open Data
 Is freely available on the internet
 Permits any user to download, copy, analyze, re-process, or use for any
other purpose
  Has no financial, legal, or technical barriers
Benefits
 Accelerates the pace of discovery
 Grows the economy
 Improves the integrity of the scientific and scholarly record
  Increasingly recognized by many in the research community as an important
part of the research enterprise
Sharing Data (continued)
 Shared Data – data available to a specific group of people for a
specific purpose.
 Closed Data – data that only those within an organization can see
Sharing Data – attitudes of researchers
Wiley’s Researcher Data Insights Survey (2016) found that:
 Globally – 69% share their data; 31% do not
 Data is shared:
 41% as supplementary material in a journal
  29% on a personal, institutional, or project website
  25% in an institutional data repository
  10% in a disciplinary data repository
  6% in a general-purpose data repository (figshare, Dryad)
 Motivations for sharing – increase impact & visibility, public benefit,
transparency and re-use, journal requirement
Source: Researcher Data Sharing Insights - https://hub.wiley.com/community/exchanges/discover/blog/2017/04/19/open-science-trends-you-need-to-know-about
Data Restrictions and Protection
 Appropriate protection of privacy
 Security of data
 Confidentiality/HIPAA or FERPA
 Intellectual Property and Copyright
 Embargo
 Other rights or requirements?
Long-lived Data: Curation and Preservation
  Data Curation - “active and ongoing management of data through its life
cycle of interest and usefulness to scholarship, science, and education…”
(UI Graduate School of Library and Information Science)
  Data Preservation – “series of managed activities necessary to ensure
continued access to digital materials for as long as necessary.”
(Digital Preservation Handbook)
Data curation is the process of making data FAIR: Findable, Accessible,
Interoperable, and Re-usable
Archiving for preservation and long-term access
What happens to data once a research project is complete?
 How long should the data be retained?
 What format?
 Migration schedule?
 Plans for archiving other research products, physical samples and
derivatives
Metadata and Documentation
  Metadata is structured information that describes content, quality,
format, location, and contact information
  Similar to descriptive cataloguing of library resources
  Metadata schemas are sets of metadata elements (or fields) for
describing a particular type of information resource
  The most familiar are those used in library catalogs and publication
repositories, such as MARC and Dublin Core
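A metadata schema can be pictured as a fixed list of field names that every record is checked against. The sketch below uses the 15 elements of the Dublin Core Metadata Element Set. The record values are placeholders rather than a real dataset, and the choice of which elements count as "required" is an assumption made for illustration, since Dublin Core itself makes every element optional.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical local policy: the elements a record must fill in.
REQUIRED = ("title", "creator", "date", "identifier", "rights")


def missing_elements(record: dict, required=REQUIRED) -> list:
    """Return the required elements that are absent or empty in the record."""
    return [name for name in required if not str(record.get(name, "")).strip()]


# Placeholder record; none of these values describe a real dataset.
record = {
    "title": "Example sensor readings",
    "creator": "Example Researcher",
    "date": "2017-09-04",
    "format": "text/csv",
    "description": "Placeholder description of how the data were collected.",
}

unknown = [name for name in record if name not in DUBLIN_CORE_ELEMENTS]
print("Fields outside the schema:", unknown)
print("Missing required elements:", missing_elements(record))
```

With this placeholder record, the check reports that identifier and rights are missing, which are two of the fields that most affect whether others can find and reuse a dataset.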
Metadata and Documentation (continued)
 Also important to provide documentation so that your data will be
understood and interpreted correctly
 How your data was created, context for the data, structure and any
manipulations or analysis that have been done
 Can be as basic as a readme text file
• See Easy Data Management: Add README.txt file -
http://databrarians.org/2016/05/easy-data-management-add-a-readme-txt-to-your-project-folders/
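A readme file can be generated from a template so that every project folder answers the same questions. The sketch below is one possible layout: the headings follow the points listed above (how the data was created, its structure, and any processing), while the function name write_readme, the template wording, and the sample values are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical template; adjust the headings to suit your discipline.
README_TEMPLATE = """\
Title: {title}
Contact: {contact}
Date created: {created}

How the data were created:
{methods}

File structure:
{structure}

Processing or analysis applied:
{processing}
"""


def write_readme(folder: Path, **fields: str) -> Path:
    """Write README.txt into the folder (creating it if needed) and return its path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "README.txt"
    path.write_text(README_TEMPLATE.format(**fields), encoding="utf-8")
    return path


# Usage with placeholder values, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(
        Path(tmp) / "example-project",
        title="Example sensor readings",
        contact="Example Researcher",
        created="2017-09-04",
        methods="Placeholder: instrument, sampling interval, site.",
        structure="Placeholder: one CSV file per month, one row per reading.",
        processing="Placeholder: none; files are raw instrument output.",
    )
    print(readme.read_text(encoding="utf-8"))
```

Because str.format raises a KeyError when a field is left out, the template also acts as a checklist: the file cannot be written until every heading has an answer.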
Exercise 2 - Metadata and Documentation
Look at a good quality metadata record:
Long-term variation of surface phytoplankton chlorophyll a in the
Southern Ocean during 1965-2002
Questions:
1. Why do you think this record is considered high quality?
2. What metadata fields help discovery and reuse of the data?
3. Why is metadata often neglected?
Data Management Plan (DMP)
Documents the life cycle of your data and provides details on data
collection, storage, access, sharing, and reproducibility of your results.
This helps ensure the availability and accessibility of your research
results after your project is complete.
Image Source: http://guides.library.ucsc.edu/datamanagement/
Exercise 3 – Data Management Plans (DMP)
Go to DMPTool at https://dmptool.org
Questions:
1. Review 2-3 Data Management Plan samples. You will find them
under Public DMPs on the main screen.
What are 2 to 3 pieces of information that are essential to a DMP?
Why?
2. Log in to your DMPTool account and review one of the DMP
templates.
What are the strengths and weaknesses of the template you chose?
What are publishers and funders saying about data?
Data sharing policies are becoming increasingly common
  Journal editors are asking authors to make the data underlying an
article available. In addition, new forms of data publishing are
emerging, such as the data journal
  Funders, including US federal agencies and some foundations, are requiring
researchers to:
 Submit a data management plan as part of the grant proposal or funding
request
 Deposit dataset(s) supporting published research results in a public data
repository
What is a data journal?
 A journal that publishes “data papers” describing datasets in rich detail so they
can be found and used by other researchers.
 The data paper contains a link to the entire set of data, which is usually published
in a public data repository. The dataset has a persistent identifier, usually a DOI.
 Two types of journals: hybrid and pure
• Hybrid journals publish regular papers and data papers
• Pure journals publish only data papers
 Pure data journals “explicitly provide peer review prior to ‘publication’ of the
data.” But the quality of that peer review varies. (Todd Carpenter)
 Hybrid journals may or may not peer review the dataset, although they usually
review the accompanying metadata for completeness.
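A DOI is persistent because it is resolved through a central service rather than pointing at one web address directly. The sketch below turns a DOI written in a common citation style into a link through the https://doi.org/ resolver. The function name doi_to_url and the list of prefixes it strips are assumptions, and the check is deliberately minimal: it only confirms that the value starts with the "10." directory indicator.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi:10.1002/asi.23358' into a https://doi.org/ link."""
    value = doi.strip()
    # Strip the prefixes commonly seen in citations (hypothetical, not exhaustive).
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + value


print(doi_to_url("doi:10.1002/asi.23358"))
# prints: https://doi.org/10.1002/asi.23358
```

The example value is the DOI of the Candela et al. survey of data journals cited in this workshop. If the publisher later moves the article, the link stays the same and only the resolver's record changes.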
Data Journals
116 data journals published by 15 different publishers, by subject. (Figure 2)
Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762.
doi:10.1002/asi.23358
Examples of Data Journals
  Geoscience Data Journal
http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060
  Earth System Science Data
http://earth-system-science-data.net/
  GigaScience – big data from life & biomedical sciences; open-access,
open-data, open peer-review
https://academic.oup.com/gigascience
  Scientific Data (Nature.com)
http://www.nature.com/sdata/
Journal or Funder Recommended Repositories
  Nature.com recommended:
https://www.nature.com/sdata/policies/repositories
  PLOS ONE recommended:
http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories
  Biosharing.org: https://biosharing.org/
  NIH-supported:
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
 UA Library Data Management Data Repositories
Exercise 4 – Publisher and data repository policies
Choose one of the following:
  PLOS ONE data policy – http://journals.plos.org/plosone/s/data-availability
  Review their policy
  Dryad – data repository that integrates data and articles --
http://datadryad.org/pages/journalLookup
 Look up a journal you know and see what advice it gives on related data
Questions:
1. Was the policy you looked at clear?
2. Did you understand what you needed to do?
Conclusion
Important to think through the Research Data Life Cycle
 Identify potential data to reuse
 Develop your data management plan
 Collect, analyze, and reanalyze data, with organized protocols for data
management, data storage and back-up
 Archive data – migrate to suitable format, finalize metadata and
documentation
 Publication – distribute, share and promote data
Resources
“23 (Research Data) Things.” Australian National Data Service. August 3, 2017.
http://www.ands.org.au/partners-and-communities/23-research-data-things
“23 (Research Data) Things.” University of California San Diego Library. June 19, 2017.
https://ucsdlib.github.io/23-Research-Data-Things/
Carpenter, Todd. “What Constitutes Peer Review of Data? A Survey of Peer Review Guidelines.” August 4, 2017.
https://scholarlykitchen.sspnet.org/2017/04/11/what-constitutes-peer-review-research-data/
“Data Management Planning Tool, DMPTool.” University of California, California Digital Library, California Curation Center.
July 14, 2017. https://dmptool.org
“Data Management Resources.” University of Arizona Libraries. July 14, 2017. http://data.library.arizona.edu
“Guiding Principles for Findable, Accessible, Interoperable and Re-Usable Data Publishing Version B1.0.” FORCE 11. July 14,
2017. https://www.force11.org/fairprinciples
NYU Health Sciences Library. “Data Sharing and Management Snafu in 3 Short Acts.” July 14, 2017.
https://www.youtube.com/watch?v=66oNv_DJuPc
“Open/Closed/Shared: the world of data.” Open Data Institute. July 14, 2017. https://vimeo.com/125783029
“Open Data.” Scholarly Publishing and Academic Resources Coalition. July 14, 2017. https://sparcopen.org/open-data/
Vocile, Bobby. “Open Science Trends You Need to Know About.” Wiley Exchanges. April 20, 2017.
https://hub.wiley.com/community/exchanges/discover/blog/2017/04/19/open-science-trends-you-need-to-know-about
Wilkinson, M.D., et al. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data
3:160018. https://www.nature.com/articles/sdata201618
Witt, Michael. “23 Things: Libraries for Research Data.” Research Data Alliance. July 14, 2017.
https://www.rd-alliance.org/group/libraries-research-data-ig/outcomes/23-things-libraries-research-data-supporting-output
Questions?
Contact:
kollen@email.arizona.edu or 520-305-0495

Research data life cycle

  • 1. Research Data Life Cycle By Christine Kollen University of Arizona Libraries September 4, 2017
  • 2. Agenda  Introduction  Ready Set, Data  Issues in Research Data Management  Research Data Life Cycle  Data Sharing and Access - benefits  Long-lived Data  Metadata and Documentation  Data Management Plans  What are publishers and funders saying about data?
  • 3. Introduction  Workshop will cover main components of the research data life cycle  Some of the content, exercises, and discussion topics were taken from ANDS 23 (Research Data) Things program and UCSD Library 23 (Research Data) Things program  Introduce yourself – what is your name, institution, and position? What do you hope to learn from this workshop?
  • 4. Ready Set Data! What is Research Data? "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues.” (OMB Circular 110).
  • 5. What is Research Data? Observational data – captured in real time, usually irreplaceable  Sensor readings  Images of the physical world  Survey data  Telemetry Experimental – from lab equipment, reproducible but expensive  Gene sequences  Images of water flowing through a flume  Chromatograms Simulation – test models, models and metadata more important than raw data  Climate models  Economic models
  • 6. What is Research Data? (continued) Derived/Compiled – reproducible, but expensive  Text and data mining  Compiled databases  3D models Reference or canonical – collection of smaller datasets  Gene sequence databanks  Chemical structures  Spatial Data portals, such as Spatial Data Explorer
  • 7. Exercise 1 -- Ready Set Data Go online and look at one of the following repositories: • Harvard DataVerse - https://dataverse.harvard.edu/ • Global Change Master Directory – http://gcmd.nasa.gov -- Search for data • Dryad Digital Repository -- http://datadryad.org/ • NeuroMorpho – http://neuromorpho.org – Search metadata or keyword • Qualitative Data Repository -- https://qdr.syr.edu/ -- Discover  Search Data Questions? 1. How does this data differ from what you are familiar with? By format, size, access? 2. Does the collection have a code book or data dictionary? Other documentation? 3. Explore the metadata representing the collection. Besides the title and description, what other elements are described?
  • 8. Issues in Research Data Management YouTube video from NYU Health Sciences Library – what happens when a researcher does not manage their data (4:40 minutes)
  • 9. Data Management Best Practices UA Data Management Resources on best practices: Data Organization - http://data.library.arizona.edu/data-management- tips/data-organization Discussion 1. Are there file naming conventions in your discipline? 2. It is important to share data in a non-proprietary format that uses open data standards. Which of the following formats are preferred when sharing data? Microsoft Word PDF/A Microsoft Excel CSV (comma-separated values) GIF/JPEG TIFF
  • 11. Research Data Life Cycle  Data often has a longer lifespan than the research project that creates them  Other projects may analyze or add to the data; reused by other researchers  Funders and journal editors are beginning to require that researchers make the underlying data accessible for the long term
  • 12. Finding data repositories “The resulting data ecosystem, therefore, appears to be moving away from centralization, is becoming more diverse, and less integrated…” (M.D. Wilkinson)  Numerous repositories  Scales range from institutional (campus repositories) to globally- scoped repositories  Accept a wide range of data types and formats  Little attempt to integrate or harmonize deposited datasets  Few requirements for the descriptors of a dataset
  • 13. Data to use in your Research What does the data repository landscape look like? Let’s explore the Registry of Research Data Repositories, http://re3data.org to explore data repositories by academic discipline. Go to http://re3data.org and click on Browse > By Country > click on Mexico Look at a few – EcoCyc, California Coastal Atlas, International Maize & Wheat Improvement • What content types are available? • How do you access the database? • Is it available for data upload or are there restrictions? • Do they assign a persistent identifier (such as a DOI)? What metadata schema do they use? • What metadata schema do they use?
  • 14. Sharing Data Open Data  Is freely available on the internet  Permits any user to download, copy, analyze, re-process, or use for any other purpose  Is without financial, legal or other technical barriers Benefits  Accelerates the pace of discovery  Grows the economy  Improves the integrity of the scientific and scholarly record  Becoming recognized by many in the research community, important part of the research enterprise
  • 15. Sharing Data (continued)  Shared Data – data available to a specific group of people for a specific purpose.  Closed Data – data that only those within an organization can see
  • 16. Sharing Data – attitudes of researchers Wiley’s Researcher Data Insights Survey (2016) found that:  Globally – 69% share their data; 31% do not  Data is shared:  41% as supplementary material in a journal  29% - personal institutional or project website  25% Institutional data repository  10% Disciplinary data repository  6% General purpose data repository (figshare, Dryad)  Motivations for sharing – increase impact & visibility, public benefit, transparency and re-use, journal requirement Source: Researcher Data Sharing Insights - https://hub.wiley.com/community/exchanges/discover/blog/2017/04/19/open- science-trends-you-need-to-know-about
  • 17. Data Restrictions and Protection  Appropriate protection of privacy  Security of data  Confidentiality/HIPAA or FERPA  Intellectual Property and Copyright  Embargo  Other rights or requirements?
  • 18. Long-lived Data: Curation and Preservation  Data Curation - “active and ongoing management of data through its life cycle of interest and usefulness to scholarship, science, and education…” (UI Graduate School of Library and Information Science)  Data Preservation – “series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” (Digital Preservation Handbook) Data curation is the process of making data FAIR:
  • 19. Archiving for preservation and long-term access What happens to data once a research project is complete?  How long should the data be retained?  What format?  Migration schedule?  Plans for archiving other research products, physical samples and derivatives
  • 20. Metadata and Documentation  Metadata is structured information - describes content, quality, format, location and contact information  Similar to descriptive cataloguing of library resources  Metadata schema are sets of metadata elements (or fields) for describing a particular type of information resource  Most familiar those used in library catalogs and publications repositories such as MARC and Dublin Core
  • 21. Metadata and Documentation (continued)  Also important to provide documentation so that your data will be understood and interpreted correctly  How your data was created, context for the data, structure and any manipulations or analysis that have been done  Can be as basic as a readme text file • See Easy Data Management: Add README.txt file - http://databrarians.org/2016/05/easy-data-management-add-a-readme-txt- to-your-project-folders/
  • 22. Exercise 2 - Metadata and Documentation Look at a good quality metadata record: Long-term variation of surface phytoplankton chlorophyll a in the Southern Ocean during 1965-2002 Questions: 1. Why do you think this record is considered high quality? 2. What metadata fields help discovery and reuse of the data? 3. Why is metadata often neglected?
  • 23. Data Management Plan (DMP) Documents the lifecycle of your data and provides details on data collection for storage, access, sharing, and reproducibility of your results. This can ensure the availability and accessibility of your research results after your project is complete. Image Source: http://guides.library.ucsc.edu/datamanagement/
  • 24. Exercise 3 – Data Management Plans (DMP) Go DMPTool at https://dmptool.org Questions: 1. Review 2-3 Data Management Plan samples. You will find them under Public DMPs on the main screen. What are 2 to 3 pieces of information that are essential to a DMP? Why? 2. Log in to your DMPTool account and review one of the DMP templates. What are the strengths and weaknesses of the template you chose?
  • 25. What are publishers and funders saying about data? Data sharing policies are becoming increasingly common  Journal editors are asking authors to make the data underlying an article available. In addition, new forms of data publishing are emerging  data journal  Funders, US federal agencies and some foundations are requiring researchers to:  Submit a data management plan as part of the grant proposal or funding request  Deposit dataset(s) supporting published research results in a public data repository
  • 26. What is a data journal?  A journal that publishes “data papers” describing datasets in rich detail so they can be found and used by other researchers.  The data paper contains a link to the entire set of data, which is usually published in a public data repository. The dataset has a persistent identifier, usually a DOI.  Two types of journals: hybrid and pure • Hybrid journals publish regular papers and data papers • Pure journals publish only data papers  Pure data journals “explicitly provide peer review prior to ‘publication’ of the data.” But the quality of that peer review varies. (Todd Carpenter)  Hybrid journals may or may not peer review the dataset, although they usually review the accompanying metadata for completeness.
  • 27. Data Journals 116 data journals published by 15 different publishers, by subject. (Figure 2) Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015), Data journals: A survey. J Assn Inf Sci Tec, 66: 1747–1762. doi:10.1002/asi.23358
  • 28. Examples of Data Journals  Geoscience Data Journal http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060  Earth System Science Data http://earth-system-science-data.net/  GigaScience – big data from life & biomedical sciences; open-access, open-data, open peer-review https://academic.oup.com/gigascience  Scientific Data (Nature.com) http://www.nature.com/sdata/
  • 29. Journal or Funder Recommended Repositories  Nature.com recommended: https://www.nature.com/sdata/policies/repositories  PlosOne recommended: http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories  Biosharing.org: https://biosharing.org/  NIH-supported: https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html  UA Library Data Management Data Repositories
  • 30. Exercise 4 – Publisher and data repository policies Choose one of the following:  PLOS One data policy – http://journals.plos.org/plosone/s/data-availability  Review their policy  Dryad – data repository that integrates data and articles -- http://datadryad.org/pages/journalLookup  Look up a journal you know and see what advice it gives on related data Questions: 1. Was the policy you looked at clear? 2. Did you understand what you needed to do?
  • 31. Conclusion Important to think through the Research Data Life Cycle  Identify potential data to reuse  Develop your data management plan  Collect, analyze, and reanalyze data, with organized protocols for data management, data storage and back-up  Archive data – migrate to suitable format, finalize metadata and documentation  Publication – distribute, share and promote data
  • 32. Resources “23 (Research Data) Things.” Australian National Data Service. August 3, 2017. http://www.ands.org.au/partners-and-communities/23-research-data-things “23 (Research Data) Things.” University of California San Diego Library. June 19, 2017. https://ucsdlib.github.io/23-Research-Data-Things/ Carpenter, Todd. “What Constitutes Peer Review of Data? A Survey of Peer Review Guidelines.” August 4, 2017. https://scholarlykitchen.sspnet.org/2017/04/11/what-constitutes-peer-review-research-data/ “Data Management Planning Tool, DMPTool.” University of California, California Digital Library, California Curation Center. July 14, 2017. https://dmptool.org “Data Management Resources.” University of Arizona Libraries. July 14, 2017. http://data.library.arizona.edu “Guiding Principles for Findable, Accessible, Interoperable and Re-Usable Data Publishing Version B1.0.” FORCE 11. July 14, 2017. https://www.force11.org/fairprinciples. NYU Health Sciences Library. “Data Sharing and Management Snafu in 3 Short Acts.” July 14, 2017. https://www.youtube.com/watch?v=66oNv_DJuPc. “Open/Closed/Shared: the world of data.” Open Data Institute. July 14, 2017. https://vimeo.com/125783029 “Open Data.” Scholarly Publishing and Academic Resources Coalition. July 14, 2017. https://sparcopen.org/open-data/ Vocile, Bobby. “Open Science Trends You Need to Know About.” Wiley Exchanges. April 20, 2017. https://hub.wiley.com/community/exchanges/discover/blog/2017/04/19/open-science-trends-you-need-to-know-about. Wilkinson, M.D., et al. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data 3:160018. https://www.nature.com/articles/sdata201618 Witt, Michael. “23 Things: Libraries for Research Data.” Research Data Alliance. July 14, 2017. https://www.rd-alliance.org/group/libraries-research-data-ig/outcomes/23-things-libraries-research-data-supporting-output

Editor's Notes

  • #4: Developed from the RDA handout: 23 Things: Libraries for Research Data – see Resource List for link
  • #5: Research data often take physical and digital formats: numerical datasets, observational information, maps, texts, images, time-dependent media, etc. The National Science Foundation states that "data are any and all complex data entities from observations, experiments, simulations, models, and higher order assemblies, along with the associated documentation needed to describe and interpret the data."
  • #6: These are a few examples of the different categories of data
  • #7: May include: text or word documents, spreadsheets; lab notebooks; questionnaires; audiotapes, videotapes; slides, specimens; methodologies and workflows.
  • #9: Ask them to write down any “mistakes” pointed out in the video that pique their interest. Anyone want to share?
  • #12: Here is a model of what we call the Research Data Management Lifecycle. Research begins with a research question, finds and assesses what data is already out there that bears on the question, then designs experiments that will (hopefully!) answer the question. Those experiments require the collection, analysis, and re-analysis of data, with organized protocols for data management and data storage and back-up. Once the study is finished, the data needs to be archived. Archiving – migrate data to a suitable format and create metadata and documentation (these should be developed as you go through the project, not at the end). Publication – distribute and share data, control access (?), promote data. Data should have a DOI. Move back to Research Question, etc.
  • #13: For reproducibility and to validate results
  • #14: Slide contents quoted or adapted from Wilkinson, M.D., et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018 doi: 10.1038/sdata.2016.18 (2016).
  • #15: re3data.org is a global registry of over 1,500 research data repositories from different academic disciplines. Show them the different components of the record.
  • #16: View
  • #17: Research funders, journal publishers, institutions involved in the research process
  • #19: Wiley authors – survey filled out by over 4600 Wiley authors from 112 countries
  • #20: 70% indicated that they use email to share their data with others, 34% link supplementary information to a journal article, and 33% share their data in a data repository. Department, project, or university websites accounted for 63%, cloud services accounted for 48%, and 14% indicated other: secure data network, secure ftp, mailing DVDs or other physical media.
  • #21: You will need to address whether your data will be of a sensitive nature: human subject concerns, potential patentability, or species/ecological endangerment concerns where public access is inappropriate. In the US, HIPAA (Health Insurance Portability and Accountability Act) sets data privacy and security provisions for safeguarding medical information, and FERPA (Family Educational Rights and Privacy Act) provides certain protections for student records. How will granular control and access be achieved (e.g. formal consent agreements; anonymization of data; restricted access, only available within a secure network)? Who will hold the intellectual property rights to the data? Do you need to establish an embargo period for political, commercial, or patent reasons, so that the data is only available after a certain time period?
  • #22: Data Curation is, among other things, the process of making data FAIR: Findable, Accessible, Interoperable, and Reusable. The FAIR guiding principles were developed by a diverse international consortium of “academia, industry, funding agencies, and scholarly publishers” so that “data providers and data consumers - both machine and human - could more easily discover, access, interoperate, and sensibly re-use, with proper citation, the vast quantities of information being generated by contemporary data-intensive science”. Guiding Principles for Findable, Accessible, Interoperable and Re-Usable Data Publishing Version B1.0, https://www.force11.org/fairprinciples Findable – any digital object should be uniquely and persistently identifiable. Accessible – can always be obtained by machines and humans. Interoperable – metadata is machine-actionable; metadata formats use shared vocabularies and ontologies. Re-usable – compliant with the first three principles; can be linked or integrated with other data sources; has rich enough metadata to enable proper citation.
  • #23: How long to retain – 3-5 years, 10 years, forever? What final format to retain? Will you need to migrate to a different format before depositing? What procedures does the data storage facility have in place for preservation and back-up? Does your long-term storage option not only provide continual access to your data but also ensure that your data will be usable over time?
  • #24: Lose information when you move data to new media
  • #25: Metadata is your best data friend!
  • #27: Question 1: Consider both the type and quality of information provided.
  • #28: So, what is a data management plan? A data management plan should document the lifecycle of your data, and I would recommend writing one up even if you’re not currently applying for a grant, or you plan to apply for a grant that doesn’t require one. Plans can be really beneficial: they require you to think about data reuse, storage and sharing, and reproducibility.
  • #30: Such as a campus repository or disciplinary data repository
  • #31: From: What Constitutes Peer Review of Data? A Survey of Peer Review Guidelines by Todd Carpenter, https://scholarlykitchen.sspnet.org/2017/04/11/what-constitutes-peer-review-research-data/
  • #32: These include hybrid and pure journals.