Data publishing routes

It is expected that a Data Publication will ensure that data will potentially be considered as a first-class research output (Knowledge Exchange, 2013).


For a dataset to “count” as a publication, it should go through a publication process similar to that of an article (Brase et al., 2009) and should be:

  • Properly documented with metadata;
  • Reviewed for quality;
  • Searchable and discoverable in catalogues (or databases);
  • Citable in publications.

The authors of a Knowledge Exchange report (Knowledge Exchange, 2013) define this type of data publication as 'Publishing with a capital P' and contrast it with 'publishing with a small p', meaning that researchers publish their data files on a website somewhere. Publishing with a small 'p' offers no guarantee that the data will still be available after some time or that the files will not become corrupted.

Five routes


There are different ways to publish your data. Your preference may depend on the existing practices in your discipline or on the expectations of your funder.

According to a survey by Wiley (2014), the preferred way of publishing data is as supplementary material to a journal article. This may change as more data repositories become available and more scientific journals recommend depositing data in them. A data repository is a digital archive that collects, preserves and displays datasets, related documentation and metadata (OpenAIRE, 2017).

The comparison below shows five ways of publishing your data, together with their advantages and disadvantages.

Advantages

  • Most likely to comply with the journal or publisher’s requirements;
  • Data readily available alongside published findings.

Disadvantages

  • May be costly;
  • May claim copyright over the data;
  • May keep data behind a subscription wall;
  • Unlikely to offer a data repository’s functionality or long-term solution;
  • May not apply user-friendly or preservation formats;
  • More likely to accept subsets rather than complete datasets.

Advantages

  • Most likely to accept any data of value, especially if no suitable home can be found for it elsewhere, and to ensure that policy requirements for long-term access are met;
  • Researchers may trust such a repository more readily;
  • Possibly no charge for the data deposit;
  • May make your data visible via dissemination and promotion.

Disadvantages

  • May not offer sustainable long-term access to your data collection;
  • Might not have sufficient expertise in the data and metadata standards needed for long-term preservation and access.

Advantages

  • Most likely to offer useful search, navigation and visualisation functionality;
  • Reaches a wider audience of potential users;
  • Accepts a wide range of data types;
  • Suitable for cross-disciplinary data.

Disadvantages

  • Requires scrutiny of terms and conditions to ensure consistency with your funder, journal or institution’s policies on cost recovery, copyright/IP, and long-term preservation;
  • No editorial control over quality of deposited materials;
  • In most cases, only simple metadata is available, which is usually not enough for reuse.

Advantages

  • Offers specialist domain knowledge and data management expertise, e.g. to create a catalogue record and documentation;
  • Likely to accept complete datasets (and not only the part of the dataset on which a publication is based);
  • May make your data visible via dissemination and promotion.

Disadvantages

  • Likely to be selective about what kind of data they accept.

Advantages

  • Offers specialist domain knowledge and data management expertise, e.g. to create a catalogue record and documentation;
  • More likely to accept complete datasets;
  • Provides preservation and curation to community standards, e.g. file format migration;
  • Able to control access to (sensitive) personal data;
  • May handle data re-use queries;
  • May make your data visible via dissemination and promotion.

Disadvantages

  • Most likely to be selective about what kind of data they accept;
  • May charge for data publishing;
  • Requires advance planning of the effort needed to meet high standards for metadata and documentation.

Choosing a data repository


There are hundreds of repositories worldwide. Some cater to a specific research domain, while others are general-purpose repositories. They may also be called something other than a repository, for example a data centre or an archive (Whyte, 2015).

If you decide to publish your data in a data repository, which repository should you choose? Sometimes the repository is already determined by your funder or another external party. If the choice is yours, you may consider following the order of preference recommended by OpenAIRE (2016b):

Expert tips

    Timing is everything!
    In data archiving and publishing, timing is everything. If you archive or publish your data as soon as data collection ends, your knowledge of the data is still fresh. Preparing the data for deposit will then take the least time, while also ensuring the highest possible data quality for future users.

    Publish a data paper
    For high-quality datasets, consider publishing a data paper in a data journal. This lets you describe your datasets in more detail, which increases their visibility and their chances of being reused. The data journal does not hold the datasets themselves; these are deposited in a data repository. See 'Promoting your data' for more information on this route.

    Choose between self-archiving and expert help
    There is a difference between self-archiving without any help and archiving with the help of an expert. While self-archiving is a quick and easy way to publish data, archiving with the help of an expert will enhance data quality. Expert help is most likely to be available at a trusted domain repository or an institutional repository. Check whether the repository you are considering offers it.