Publishing data in a data repository does not automatically make them openly accessible. (Sensitive) personal data can still be protected by limiting access to the data. Access controls can permit control down to an individual file level, meaning that mixed levels of access control can be applied to a data collection.
Many data repositories operate a three-tiered approach to data access:
Open access Data that can be accessed by any user, whether they are registered or not. Data in this category should not contain personal information unless consent has been given (see 'Informed consent').
Access for registered users (safeguarded) Data that is accessible only to users who have registered with the archive. This data contains no direct identifiers but there may be a risk of disclosure through the linking of indirect identifiers.
Restricted access Access is limited and can only be granted upon request. This access category is for the most sensitive data that may contain disclosive information. Restricted access requires a long-term commitment from the researcher or person responsible for the data to handle incoming permission requests.
Embargo Besides offering the opportunity for restricted access 'for eternity', most data repositories allow you to place a temporary embargo on your data. During the embargo period, only the description of the dataset is published. The data themselves will become available in open access after a certain period of time.
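The tiers and the embargo described above can be sketched as a simple per-file access check. This is a minimal illustration, not how any particular repository implements access control: the names `AccessLevel`, `User`, `DataFile` and `can_download` are hypothetical, and the sketch assumes that an embargo blocks all downloads until its end date, after which the file's own access level applies.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    OPEN = "open"              # any user, registered or not
    REGISTERED = "registered"  # safeguarded: registered users only
    RESTRICTED = "restricted"  # permission granted upon request


@dataclass
class User:
    name: str
    registered: bool = False


@dataclass
class DataFile:
    # Access is set per file, so one collection can mix levels.
    name: str
    level: AccessLevel
    embargo_until: Optional[date] = None        # None means no embargo
    approved_users: frozenset = frozenset()     # names granted restricted access


def can_download(user: Optional[User], f: DataFile, today: date) -> bool:
    """Return True if the user (None = anonymous) may download the file."""
    # Assumption: during an embargo only the description is visible.
    if f.embargo_until is not None and today < f.embargo_until:
        return False
    if f.level is AccessLevel.OPEN:
        return True
    if user is None or not user.registered:
        return False
    if f.level is AccessLevel.REGISTERED:
        return True
    # Restricted: the user must have been granted permission on request.
    return user.name in f.approved_users
```

Because the level is stored on each file rather than on the collection, a codebook could be open while the interview transcripts in the same collection stay restricted.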
Access conditions may differ slightly between data repositories. Two examples are given below.
At the Slovenian Social Science Data Archives (ADP) access to data and accompanying materials is determined in the Policy of Digital Preservation (ADP, 2017b). The types of access in ADP are the following:
Open Access Users may freely access the catalogue of the ADP, study metadata and research data of a limited range of studies without registration. Nonetheless, the use of data and accompanying materials is limited by the legislation, social sciences and institutional ethical standards and copyright.
Special Conditions Access Some data sets are only accessible under special conditions. In order to gain access, a special permission from the original authors is needed. For example when:
Data are not fully anonymized In this case, additional protection is required. Such files are called Scientific Use Files (SUF).
Embargo The authors place an embargo on access and decide that the datasets will only be available after a certain period of time, for example, after 6 months.
Limited availability The dataset is available only to the person or institution that ordered the study, or to the original authors.
A user who wants access to files in the Special Conditions Access section must fill in not only the regular registration form (Standard Access) but also an additional form, the "Application for access to materials on request". The Commission for the protection of Confidentiality carefully inspects such applications and decides whether access to the requested study data can be granted.
Types of possible special access are:
Access through a safe connection
Access in a safe environment If the data are especially sensitive, the user may be granted access to them only in a safe room of the data archive. These are the so-called Secure Use Files (ScUF), which the user may access only after signing a special contract that sets out the rights and obligations of use of the requested research data.
All research data at DANS are stored in and made available by its online repository EASY (DANS, 2017b). A licence agreement is always concluded between DANS and the depositor of the dataset: the person or organisation depositing a dataset in EASY, who is normally the rights holder. One of the most important parts of this licence agreement is the access category, which specifies how the dataset can be accessed.
DANS supports the Open Access movement. This means that DANS encourages research data and publications to be made freely available as much as possible, without any restrictions. However, there can be well-founded reasons why research data are not, or not immediately, freely accessible. These include the presence of personal data, a temporary embargo due to an impending PhD thesis or other publication, and contractual obligations with third parties. DANS therefore provides, along with open access, the possibility of restricted access to research data.
EASY offers two Open Access categories and one Restricted Access category. The access categories are:
Open Access (CC0 Waiver, Creative Commons, n.d.b.) The dataset is, without any restriction, made available to all EASY users, both registered and unregistered, in accordance with the conditions of the Creative Commons Zero Waiver.
Open Access for Registered Users The dataset is made available only to registered EASY users. Any existing copyrights and/or database rights are respected.
Restricted Access The dataset is only made available to those registered users that have obtained permission from the rights holder.
Datasets containing personal data are mostly placed in the category Restricted Access. Some datasets with personal data are made available in the Open Access categories. This is, however, only possible when explicit informed consent has been given by the persons involved, as is quite often the case with Oral History interviews (DANS, 2012). Apart from this open category, sensitive data can only be accessed by authorised users whose identities have been checked and who may also be required to sign special, additional conditions of use.
Open metadata for (sensitive) personal data
Even if personal data cannot be published in open access, it is always possible to publish the metadata belonging to the dataset. Openly publishing metadata is, in fact, the only way to make such datasets discoverable.
Trusted data repositories are dedicated to increasing the discoverability of your data sets. Therefore, metadata is always freely accessible in any of the CESSDA archives. That means that:
No registration is needed for searching in the metadata;
No registration is needed for harvesting the metadata (e.g. by search engines).
Metadata of sensitive datasets should never contain confidential or identifying elements or characteristics, like names.
When someone finds a dataset under restricted access (most likely because it contains (sensitive) personal data), they can submit an access request to the rights holder. If the request is granted, the dataset will be available for this user to download. Even then, use is restricted: the user is not allowed to make the personal data in the dataset public and can only refer to the data in an anonymised way.
Access control strategy
When choosing an access category, consider the following:
Does the data contain identifiable information?
Can the information in this data collection be linked with anything in another data collection in a way that might lead to participants’ identities being disclosed?
What did participants consent to?
If ‘restricted access’ is chosen, who will manage the access requests?
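As a rough aid, the questions above can be turned into a small decision helper. This is only a sketch under simplifying assumptions: the function `suggest_access_category` and its parameters are hypothetical, the mapping from answers to categories is one possible reading of the three tiers described earlier, and a real decision should always involve the repository and, where relevant, an ethics or data protection adviser.

```python
from typing import Optional


def suggest_access_category(
    has_direct_identifiers: bool,
    linkable_indirect_identifiers: bool,
    consent_to_open_sharing: bool,
    access_manager: Optional[str] = None,
) -> str:
    """Suggest 'open', 'registered' or 'restricted' from the checklist answers."""
    if has_direct_identifiers:
        # Personal data may only be open if participants consented to that.
        if consent_to_open_sharing:
            return "open"
        # Restricted access needs someone to handle requests in the long term.
        if access_manager is None:
            raise ValueError(
                "restricted access requires someone to manage access requests"
            )
        return "restricted"
    if linkable_indirect_identifiers:
        # No direct identifiers, but linking could still disclose identities.
        return "registered"
    return "open"
```

For example, a dataset with direct identifiers, no consent for open sharing and a named data steward would be placed under restricted access, while the same dataset with no one to handle requests raises an error, which mirrors the last question in the checklist.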