Data publishing routes

A data publication is expected to ensure that data can potentially be considered a first-class research output (Knowledge Exchange, 2013).

For a dataset to “count” as a publication, it should follow a similar publication process to an article (Brase et al., 2009) and should be:

Properly documented with metadata;
Reviewed for quality;
Searchable and discoverable in catalogues (or databases);
Citable in publications.

The authors of a report from Knowledge Exchange (Knowledge Exchange, 2013) define this type of data publication as 'Publishing with a capital P' and contrast it with 'publishing with a small p', meaning that researchers publish their data files on a website somewhere. Publishing with a small 'p' offers no guarantee that the data will still be there after some time or that the files will not get corrupted.

Five routes

There are different ways to publish your data. Your preference may depend on the existing practices in your discipline or on the expectations of your funder.

According to a survey by Wiley (2014), the preferred way of publishing data is as supplementary material to a journal article. That may change as more data repositories become available and more scientific journals recommend depositing in them. A data repository is a digital archive that collects, preserves and displays datasets, related documentation, and metadata (OpenAIRE, 2017).

In the comparison table below we show five ways of publishing your data, together with their advantages and disadvantages.

Journal supplementary material service

Advantages

Most likely to comply with the journal or publisher’s requirements;
Data readily available alongside published findings.

Disadvantages

May be costly;
May claim copyright over the data;
May keep data behind a subscription wall;
Unlikely to offer a data repository’s functionality or long-term solution;
May not apply user-friendly or preservation formats;
More likely to accept subsets rather than complete datasets.

Institutional data repository

Advantages

Most likely to accept any data of value, especially if no suitable home can be found for it elsewhere, and to ensure that policy requirements for long-term access are met;
Researchers may trust such a repository more readily;
Possibly no charge for the data deposit;
May make your data visible via dissemination and promotion.

Disadvantages

May not offer sustainable long-term access to your data collection;
Might not have sufficient expertise in the data and metadata standards needed for long-term preservation and access.

General purpose repository

Advantages

Most likely to offer useful search, navigation and visualisation functionality;
Reach a wider audience of potential users;
Accepts a wide range of data types;
Suitable for cross-disciplinary data.

Disadvantages

Requires scrutiny of terms and conditions to ensure consistency with your funder, journal or institution’s policies on cost recovery, copyright/IP, and long-term preservation;
No editorial control over quality of deposited materials;
In most cases, only simple metadata is available, which is usually not enough for reuse.

Domain specific data repository

Advantages

Offers specialist domain knowledge and data management expertise, e.g. to create a catalogue record and documentation;
Likely to accept complete datasets (and not only the part of the dataset on which a publication is based);
May make your data visible via dissemination and promotion.

Disadvantages

Likely to be selective about what kind of data they accept.

Trusted domain specific data repository

Advantages

Offers specialist domain knowledge and data management expertise, e.g. to create a catalogue record and documentation;
More likely to accept complete datasets;
Provides preservation and curation to community standards, e.g. file format migration;
Ability to control access to (sensitive) personal data;
May handle data re-use queries;
May make your data visible via dissemination and promotion.

Disadvantages

Most likely to be selective about what kind of data they accept;
May charge for data publishing;
Requires advance planning of the effort needed to meet high standards for metadata and documentation.

Choosing a data repository

There are hundreds of repositories worldwide. Some cater to a specific research domain, while others are general-purpose repositories. They may be called something other than a repository, for example, a data centre or an archive (Whyte, 2015).

If you decide to publish your data in a data repository, which one should you choose? Sometimes the repository is already determined by your funder or another external party. But if the choice is yours to make, you may consider following the order of preference in the recommendations by OpenAIRE (2016b):

1: A (trusted) domain repository

Use a (trusted) repository already established for your research domain. The CESSDA archives are examples of domain-specific trusted repositories. Note that not every dataset will be accepted, or only certain types of data may be (e.g. surveys but not qualitative data). As a general rule, this kind of repository takes high-quality data that has potential for reuse and can be shared publicly.

2: An institutional or recommended data repository

If a domain repository is not available, use an institutional research data repository. If such a repository is not available, you may follow the guidelines of your university or publisher. Some publishers provide lists of recommended repositories, e.g. PLoS ONE (2014b).

3: A general purpose repository

If none of the above is available, use a general purpose repository like Zenodo (n.d.), Figshare (n.d.) or Harvard Dataverse (2017). Here you can store, share and register your research data. Do take note that long-term preservation of your data collection is not always guaranteed. Check the repository in question to find out.

4: Find your own at re3data.org

Search Re3data.org (n.d.), a registry of over 1500 research data repositories, to discover other data repositories. You can search by subject, content type, and country. In addition, you can filter for data archives with a certificate (a trusted repository), for datasets that are available via open access, or for datasets that have a persistent identifier.
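
The order of preference above can be read as a simple fall-through decision. The sketch below is only an illustration of that reading: the class RepositoryOptions, its field names and the function choose_repository are hypothetical names invented for this example and are not part of the OpenAIRE recommendations or of any repository's software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepositoryOptions:
    """Hypothetical summary of the options open to a researcher (illustrative only)."""
    mandated: Optional[str] = None          # fixed by a funder or other external party
    domain: Optional[str] = None            # (trusted) domain repository, if one exists
    domain_accepts_data: bool = True        # domain repositories may be selective
    institutional: Optional[str] = None     # institutional research data repository
    recommended: Optional[str] = None       # named in university or publisher guidelines
    general_purpose: Optional[str] = None   # e.g. Zenodo, Figshare, Harvard Dataverse


def choose_repository(options: RepositoryOptions) -> str:
    """Walk the order of preference and return the first option that applies."""
    if options.mandated:
        return options.mandated
    if options.domain and options.domain_accepts_data:
        return options.domain
    if options.institutional:
        return options.institutional
    if options.recommended:
        return options.recommended
    if options.general_purpose:
        return options.general_purpose
    # Nothing suitable so far: fall back to searching the registry.
    return "search re3data.org"


if __name__ == "__main__":
    survey_project = RepositoryOptions(
        domain="a CESSDA archive",
        institutional="university repository",
        general_purpose="Zenodo",
    )
    print(choose_repository(survey_project))
```

In practice the decision also depends on checks the sketch leaves out, such as whether long-term preservation is guaranteed and whether the terms and conditions match your funder's policies.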

Expert tips

Timing is everything!

In data archiving and publishing, timing is everything. If you archive or publish your data as soon as data collection ends, your knowledge of the data is still fresh. Preparing the data for deposit will then take you the least time, and future users will get the highest possible data quality.

Publish a data paper

For high-quality datasets, consider publishing a data paper in a data journal. This way, you can describe your datasets in more detail, which will increase their visibility and their chances of being re-used. The data journal does not hold the datasets (they are in a data repository). See 'Promoting your data' for more information on this route.

Choose between self-archiving and expert help

There is a difference between self-archiving without any help and archiving with the help of an expert. While self-archiving is a quick and easy way to publish data, archiving with the help of an expert will enhance data quality. Expert help is most likely to be available at a trusted domain repository or an institutional repository. Check to see whether that is the case.
