Consider These Criteria When Selecting a Repository
FAIR means that data publishing platforms should enable data to be Findable, Accessible, Interoperable, and Re-usable. Many organizations, including the NIH, place considerable emphasis on data sharing that meets these principles. The principles primarily target the technical implementation of data repositories, but they can also be considered in terms of how they facilitate human interaction with these systems and their data. The FORCE11 FAIR Principles (simplified here):
- To be Findable, any Data Object should be uniquely and persistently identifiable (have an identifier, such as a DOI)
- Data are Accessible in that they can always be obtained by machines and humans, upon authorization, through a well-defined protocol
- Data Objects are Interoperable if metadata and data are machine-accessible and actionable, and use shared terminology
- Data Objects are Re-Usable if the above are met, and the data can be automatically linked or integrated with other data sources, with proper citation of the source
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
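As an illustration of the Accessible items above, a DOI can be retrieved over HTTPS, which is an open and widely implemented protocol, and the public DOI resolver supports content negotiation for machine-readable metadata. The following is a minimal sketch that only builds the request and does not contact the network. The function name `build_metadata_request` and the DOI shown are placeholders chosen for illustration, not part of the FAIR principles.

```python
import urllib.request
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver
CSL_JSON = "application/vnd.citationstyles.csl+json"  # citation metadata media type


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request for a DOI's metadata.

    The identifier is the lookup key (A1); HTTPS is the open protocol (A1.1).
    """
    url = DOI_RESOLVER + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


# Placeholder DOI, used only to show the shape of the request.
request = build_metadata_request("10.5555/12345678")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; whether metadata comes back in this format depends on the agency that registered the DOI, so treat the media type as an assumption to verify for your repository.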
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
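A few of the items above can be checked mechanically before deposit. The sketch below assumes a simple dictionary record with hypothetical field names (`identifier`, `description`, `license`, `provenance`); real repositories use their own metadata schemas, so the mapping is illustrative only.

```python
from typing import Dict, List

# Hypothetical field names mapped to the FAIR items they help satisfy.
REQUIRED_FIELDS: Dict[str, str] = {
    "identifier": "F1 (persistent identifier)",
    "description": "F2 (rich metadata)",
    "license": "R1.1 (usage license)",
    "provenance": "R1.2 (provenance)",
}


def missing_fair_fields(record: dict) -> List[str]:
    """Return the FAIR items whose field is absent or blank in the record."""
    missing = []
    for field, label in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(label)
    return missing


# Illustrative record with a placeholder DOI and no provenance statement.
record = {
    "identifier": "doi:10.5555/12345678",
    "description": "Survey responses, 2020 wave",
    "license": "CC-BY-4.0",
}
print(missing_fair_fields(record))  # ['R1.2 (provenance)']
```

A check like this only confirms that a field is present; it says nothing about whether the metadata is accurate or meets community standards (R1.3).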
What is the cost structure? Are there ongoing costs after deposit? Have you accounted for these costs in your grant budget?
- How will others discover your dataset? Does the repository provide for rich metadata that will enable discovery?
- Is the repository indexed by Google and/or scholarly databases in your field?
Persistent identifiers for deposited files
- Does the repository register your data to create a persistent identifier (e.g., a DOI; see Findable, above)?
- Identifiers are necessary for citing your dataset. Three types of persistent identifiers are commonly used: Handles (hdl), Digital Object Identifiers (DOI), and Archival Resource Keys (ARK)
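Each of these identifier types is made actionable by prefixing it with a resolver address. The sketch below assumes identifiers are written with a `doi:`, `hdl:`, or `ark:` prefix and uses the public resolvers doi.org, hdl.handle.net, and n2t.net; the function name `resolver_url` and the sample identifiers are hypothetical.

```python
# Public resolver base URLs for each identifier scheme.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
}


def resolver_url(identifier: str) -> str:
    """Turn a prefixed identifier such as 'doi:10.5555/12345678' into a URL."""
    scheme, separator, value = identifier.strip().partition(":")
    scheme = scheme.lower()
    if not separator or not value or scheme not in RESOLVERS:
        raise ValueError(f"Unrecognized persistent identifier: {identifier!r}")
    return RESOLVERS[scheme] + value


# Placeholder identifiers, shown only to illustrate the three forms.
for pid in ("doi:10.5555/12345678", "hdl:20.500.12345/678", "ark:/12345/x6abc"):
    print(resolver_url(pid))
```

The resulting URL is the form to use in a data citation, since it keeps working if the repository later moves the landing page.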
Policies and licenses
- Are data use agreements and/or licensing options clearly presented, so that depositors can state explicitly up front which uses they are willing to permit?
- Creative Commons licensing
- Does the repository track and report data citations and other indicators of impact?
In an ideal world, all data repositories would meet a set of certification standards, such as the CoreTrustSeal or the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC). Meeting these criteria indicates that a repository has a framework in place to preserve digital content. They are useful benchmarks, but many repositories have not yet achieved certification.