2. Contents
1. Open science and research data management
2. Funding agency requirements
3. Research data and publication
4. RDR, the repository for publishing FAIR data
4. What is open science?
A new approach to the research process based on an open environment, that is, on
a communication ecosystem that makes research results transparent,
accessible, interoperable and reusable, and in which science is done with and for
society.
https://cora.csuc.cat/
M. Imming & J. Tennant
5. Open Science at CSUC
Promoting open science:
-Open access
-Research data management
-Repositories
-Catalonia Research Portal
https://cora.csuc.cat/
6. What do they have in common?
-Compilation of the percentage of abstention in different European countries
-Photographs of an excavation
-Prints of an archaeological intervention
-Interview scripts
-Samples from colon cancer patients
It's all RESEARCH DATA!
7. Therefore... What do we mean by...?
DATASET: "set of files, descriptive metadata and associated
documentation, together with the rights and the license of use,
which validates a research activity" (CT CORA.RDR)
OPEN RESEARCH DATA: "are online, at no cost of use,
accessible and can be reused and distributed as long as the
source of the data is cited" (FOSTER)
RESEARCH DATA: "any information that has been collected,
observed, generated or created to validate a research process"
(FOSTER)
FAIR DATA: "data that has been processed to be findable,
accessible, interoperable and reusable"
Terms
Medium
8. Research Data Management (RDM)
It is the set of practices related to the creation, organization, structuring,
storage, preservation and sharing of data.
1.5 Why is it necessary to manage data?
9. 1.7 Data types
There are different types of data:
-Observational: data captured in real time
-Experimental: data captured on laboratory equipment
-Simulation: data generated from test models
-Derived or compiled: data that can be reproduced, but only with difficulty
-Reference: a conglomerate or collection of datasets
And these usually go through different stages:
-Raw data: data as originally collected, without processing or analysis
-Curated data: processed and analyzed data
-Published data: data made public, for example in a repository
-Metadata: data that accompanies the data and describes the resource
11. 2.1 A European desire for national and institutional implementation
12. 2.2 Funding agency requirements
Generally, funding agencies ask researchers to:
• Prepare a data management plan (DMP) and update it periodically
• Deposit data following the FAIR principles in a trusted repository
OpenAIRE
15. 3.2.1 Criteria for selecting a repository
What happens in each case?
-We cede the rights to the data and replicate the current business model of the journals
-The project ends and we lose all the information
-We do not know the viability of the project, and there are usually no quality processes
16. 3.2.2 Criteria for selecting a repository
Document about criteria for selecting a repository
-Check whether your institution has policies or recommendations on how to select a research data repository.
-If you have a funded research project, check the funding agency's requirements.
-Check whether you can deposit your research data in a disciplinary research data repository.
-Otherwise, use a general-purpose and trustworthy data repository.
17. 3.2 Criteria for selecting a repository
What should be taken into account when choosing?
• Is there a specific repository for the discipline?
• Does the repository meet your needs? (formats, size, openness level, etc.)
• Does it assign a persistent identifier, such as a DOI?
• Is it certified as a trusted repository?
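The four questions above can be treated as a simple checklist. The following is a minimal sketch, not a tool provided by any repository or catalogue: the `Repository` class, its field names and the example values are all hypothetical and only mirror the questions on the slide.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical description of a candidate repository (illustrative only)."""
    name: str
    disciplinary: bool       # is it specific to the discipline?
    meets_data_needs: bool   # formats, size, openness level, etc.
    assigns_pid: bool        # persistent identifier such as a DOI
    certified: bool          # certified as a trusted repository


def unmet_criteria(repo: Repository) -> list[str]:
    """Return the selection criteria that the repository does not satisfy."""
    checks = {
        "specific to the discipline": repo.disciplinary,
        "meets data needs (formats, size, openness level)": repo.meets_data_needs,
        "assigns a persistent identifier": repo.assigns_pid,
        "certified as a trusted repository": repo.certified,
    }
    return [criterion for criterion, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Made-up example: a general-purpose repository that meets every other criterion.
    candidate = Repository("Example general-purpose repository",
                           disciplinary=False, meets_data_needs=True,
                           assigns_pid=True, certified=True)
    print(unmet_criteria(candidate))
```

A repository that fails only the first question can still be a reasonable choice, since a general-purpose trustworthy repository is the usual fallback when no disciplinary one exists.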
18. 3.2.2 Criteria for selecting a repository
Re3Data, the catalogue par excellence
-More than 3,000 repositories
-Search by name or discipline
-Possibility of adding filters (certification, license, PID...)
20. 4.1 What is CORA.RDR?
-Federated dataset repository for universities and CERCA research centers
-Multidisciplinary
-For teaching and research academic staff and PhD students
-Open (as open as possible but as closed as necessary)
-Complies with funding agency requirements and FAIR principles
-100 GB per dataset by default
https://dataverse.csuc.cat/
26. 4.6 First of all, essential metadata must be added
-Predefined info
-A descriptive title is required; if it is the same as the article's, add "Replication Data for" at the beginning
-Add as many authors as you need. ORCID!
-Designate a contact to receive communications from other researchers
-Describe the dataset (content, formats, files, variables, etc.). It's not the abstract of the article!
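As an illustration of these essential fields, here is a minimal sketch that assembles them into a JSON record. It assumes that CORA.RDR runs on Dataverse software (its web address suggests so) and that the record follows the general field layout of a Dataverse dataset description; both are assumptions to check against the repository's own documentation. The function names, the people and the ORCID shown are placeholders.

```python
import json


def primitive(type_name, value):
    """One single-valued field in the assumed Dataverse-style layout."""
    return {"typeName": type_name, "typeClass": "primitive",
            "multiple": False, "value": value}


def compound(type_name, entries):
    """A repeatable group of sub-fields (authors, contacts, descriptions)."""
    return {"typeName": type_name, "typeClass": "compound", "multiple": True,
            "value": [{f["typeName"]: f for f in entry} for entry in entries]}


def essential_metadata(title, article_title, authors, contact, description):
    """Build the essential metadata: title, authors, contact and description."""
    # Slide rule: if the dataset shares the article's title, prefix it.
    # The colon after "for" is an assumption about the preferred punctuation.
    if title == article_title:
        title = f"Replication Data for: {title}"
    author_entries = []
    for name, orcid in authors:
        entry = [primitive("authorName", name)]
        if orcid:  # the ORCID is optional here, but recommended on the slide
            entry.append({"typeName": "authorIdentifierScheme",
                          "typeClass": "controlledVocabulary",
                          "multiple": False, "value": "ORCID"})
            entry.append(primitive("authorIdentifier", orcid))
        author_entries.append(entry)
    fields = [
        primitive("title", title),
        compound("author", author_entries),
        compound("datasetContact", [[
            primitive("datasetContactName", contact[0]),
            primitive("datasetContactEmail", contact[1]),
        ]]),
        # Describe the dataset itself, not the article.
        compound("dsDescription", [[primitive("dsDescriptionValue", description)]]),
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}


if __name__ == "__main__":
    record = essential_metadata(
        title="Soil survey", article_title="Soil survey",
        authors=[("Jane Doe", "0000-0000-0000-0000")],  # placeholder ORCID
        contact=("Jane Doe", "jane@example.org"),
        description="CSV files with one row per sample; variables in Readme.txt.")
    print(json.dumps(record, indent=2))
```

In practice these fields are filled in through the web form shown on the slide; the sketch only makes explicit which pieces of information are expected.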
27. 4.7 Adding more metadata by discipline
-Choose your discipline from a drop-down list
-Add as many keywords as you need. If they come from a controlled vocabulary, even better!
-Relate the dataset to associated publications
-Explain anything you want in your notes
-Indicate the language of the dataset
-The producer and distributor are predefined
-Depositor credit can be given
-The deposit date is predefined
-The type of data must be indicated
28. 4.8 Adding files
Add all the files you need!
Attention!
-Every dataset must contain a Readme.txt that describes the content of the data (variable and value labels, units of measurement, etc.)
-If your dataset has a certain file structure, upload it as a ZIP file to preserve its structure
Save dataset [not yet public]
Accept the Terms and conditions
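The two recommendations on this slide (a Readme.txt and a ZIP that keeps the folder structure) can be prepared locally before uploading. The following is a minimal sketch using only the Python standard library; the function names and the README layout are hypothetical and are not a template prescribed by the repository.

```python
import zipfile
from pathlib import Path


def write_readme(dataset_dir: Path, description: str, variables: dict[str, str]) -> Path:
    """Write a plain-text Readme.txt describing the content of the data."""
    lines = [description, "", "Variables (label and unit of measurement):"]
    for name, meaning in variables.items():
        lines.append(f"- {name}: {meaning}")
    readme = dataset_dir / "Readme.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


def zip_with_structure(dataset_dir: Path, archive: Path) -> list[str]:
    """Zip every file under dataset_dir, storing paths relative to it.

    The archive should be created outside dataset_dir so that it does not
    end up containing itself.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dataset_dir.rglob("*")):
            if path.is_file():
                # Relative names such as "raw/a.csv" keep the folder layout.
                zf.write(path, path.relative_to(dataset_dir).as_posix())
        return zf.namelist()
```

Calling `write_readme` first and then `zip_with_structure` yields a single archive in which the Readme.txt sits at the top level next to the original folders.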
29. 4.9 Dataset in draft
At the top of the screen, different informative messages will appear:
-The dataset has been created
-The dataset is a draft
-The tab file is being ingested
Elements shown on the dataset page:
-Dataset title
-Dataset status
-Citation of the dataset (with DOI!)
-License
-Versions
-Metadata fields
30. 4.10 After this point...
MANY MORE metadata can be added, especially by discipline
The default license is CC0, but it can be modified
31. 4.11 Add more metadata by discipline
-Geospatial metadata
-Social Science and Humanities metadata
-Astronomy and Astrophysics metadata
-Life sciences metadata
-Journal metadata
-Computational workflow metadata
32. The deposit and publication workflow
Contact your institution the first time you want to make a deposit!
Researchers: DEPOSIT (dataset in draft), then SUBMIT FOR REVIEW
33. Creating a preview URL for an unpublished dataset
You can ask your institution's support service to create a private URL so that
others can view the dataset even while it is still a draft
36. Dataset published following the FAIR principles
-Persistent identifiers (DOIs) for datasets, and metadata indexing by search engines (EOSC, Google, OpenAIRE, Recolecta...)
-Licenses that clearly state how the data can be used and that ensure that the data is well documented and retained for long-term use (CC Licensing Recommendation; Metadata CC0)
-Possibility of making versions and having access to each one
-It provides standardized metadata schemas (generic and by discipline) and allows the integration of data with research tools and platforms
-Guarantee that data can be downloaded in machine-readable formats (depending on the level of openness: open, embargoed or restricted)
37. How can we reuse and access data?
-Search by spaces, datasets or files
-Facets for metadata
-Basic search
-Advanced search
38. How can we reuse and access data?
If a tree structure has been established for the files, it will be the first display option.
Files can be viewed in tree or table format
39. How can we reuse and access data? – Open access
-To download the file individually
-To select all dataset files
-To download all selected files
-File preview
-File in open access
-Available formats
40. How can we reuse and access data? - Embargoed
The file will be accessible from 25-07-2025
41. How can we reuse and access data? - Embargoed
We choose the date on which we want the file to stop being restricted
42. How can we reuse and access data? - Restricted
File restricted, but access can be requested
43. How can we reuse and access data? - Restricted
Access requests can be enabled for just a single file of the dataset
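The three openness levels shown on these slides (open, embargoed, restricted) can be summarised as a small decision rule. This is a minimal sketch of that logic for illustration only; it is not how the repository implements access control, and the names `Access` and `can_download` are hypothetical.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open"
    EMBARGOED = "embargoed"
    RESTRICTED = "restricted"


def can_download(access: Access, today: date,
                 embargo_end: Optional[date] = None,
                 access_granted: bool = False) -> bool:
    """Decide whether a file can be downloaded on a given day."""
    if access is Access.OPEN:
        return True
    if access is Access.EMBARGOED:
        if embargo_end is None:
            raise ValueError("an embargoed file needs an embargo end date")
        # The file becomes accessible from the chosen date onwards.
        return today >= embargo_end
    # Restricted: only after an access request has been approved.
    return access_granted


if __name__ == "__main__":
    end = date(2025, 7, 25)  # the date used on the embargo slide
    print(can_download(Access.EMBARGOED, date(2025, 7, 24), embargo_end=end))
    print(can_download(Access.EMBARGOED, date(2025, 7, 25), embargo_end=end))
```

The rule is applied per file, which matches the slides: an embargo date or an access request can be set for a single file of the dataset.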
#14: But what do we mean by data? Research data is any information that has been collected, observed, generated or created to validate a research process. If we want to understand what open research data is, we also have to add that it is online, free of charge to use, accessible, and can be reused and redistributed as long as the source is cited. And what do we mean by a dataset? It is a set of files (which can be data, code or documentation) together with the metadata that describes them. For whatever reason, it makes sense for this set of files to be grouped together. Sometimes the reason is that they belong to the same project; sometimes it is that they support or validate the results of an article, etc.
Finally, as data publication spread, it became clear that not all data should always be published openly, and for this reason the very successful concept of FAIR emerged in 2014. FAIR is simply an acronym for Findable, Accessible, Interoperable and Reusable, and for each of these concepts there is a basic set of principles intended to optimize the reuse of data. For example, for data to be findable (the F), it must be assigned a unique identifier, such as a DOI; to be accessible, it must be available; to be interoperable, it must follow the conventions and standards of the discipline and be automatically exportable between different portals; and finally, to be reusable, it must have a clearly defined license stating what use can be made of it.
#38: Datasets in the RDR have three levels of openness: open, embargoed and restricted.
Regardless of the openness level of the files, we will see what is shown on the slide.