News

Wed 28 Jul 2021

Find out about a script to make Dataverse-produced metadata compliant with the CESSDA Data Catalogue.

Now that the CESSDA Data Catalogue (CDC) is up and running, service providers are expected to give the catalogue bot access to their collections.

The bot harvests metadata from various sources so that it all becomes available in a one-stop shop, the CDC. Researchers can thus find datasets available for reuse all over Europe.

However, to make this possible, service providers have to format their metadata to meet the conditions set out by the CESSDA DDI profiles. This is not always easy, especially when service providers rely on externally developed tools to manage their collections.

One such tool is Dataverse, a web application for data ingest and dissemination developed by the Institute for Quantitative Social Science (IQSS) of Harvard University. Dataverse is used by eight CESSDA service providers and partners.

While the software enables some degree of metadata customisation, until now its XML output had to be used as-is. That is no longer the case thanks to the SUPER DADA script developed by SODHA.

‘SUPER DADA’ stands for Script for Updating Electronic Records: From Dataverse to CESSDA.

Once run, the script edits Dataverse-produced metadata to make it compliant with the CESSDA CDC DDI 2.5 Profile 1.0.4. This means the CDC will return no constraint violations and all the information contained in the metadata will be mapped correctly to the CDC’s own metadata fields.
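
To give a feel for what this kind of post-processing involves, here is a minimal Python sketch. It is not SUPER DADA's actual code, and the specific fix it applies is an assumption chosen purely for illustration: it supposes that a profile expects an `xml:lang` attribute on certain DDI Codebook 2.5 elements and adds a default language wherever the attribute is missing. The function name `add_default_lang`, the list of element names and the default language are all hypothetical.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"  # namespace used by DDI Codebook 2.5 documents
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute

# Serialise DDI elements without a prefix, as in the input document.
ET.register_namespace("", DDI_NS)


def add_default_lang(xml_text, tags=("titl", "abstract", "keyword"), lang="en"):
    """Add xml:lang to the listed DDI elements when it is missing.

    The element list and default language are illustrative assumptions,
    not the rules enforced by the CESSDA profile or by SUPER DADA.
    Returns the edited XML as a string and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for tag in tags:
        for elem in root.iter(f"{{{DDI_NS}}}{tag}"):
            if XML_LANG not in elem.attrib:
                elem.set(XML_LANG, lang)
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Made-up sample record: the title lacks xml:lang, the abstract already has it.
sample = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation><titlStmt><titl>Example study</titl></titlStmt></citation>
    <stdyInfo><abstract xml:lang="nl">Voorbeeld</abstract></stdyInfo>
  </stdyDscr>
</codeBook>"""

fixed, n = add_default_lang(sample)
print(n)
print(fixed)
```

Existing language tags are left untouched, so a record that already declares its languages passes through unchanged. A real compliance script would take its rules from the published profile rather than from a hard-coded list like the one above.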

The script is available on GitHub in a dedicated sub-repository created by IQSS for CESSDA-related Dataverse developments.

The Belgian CESSDA service provider, SODHA (Social Sciences and Digital Humanities Archive), welcomes feedback and suggestions for improvement.

More information:

SUPER DADA script on GitHub

CESSDA DDI profiles

Dataverse installations around the world