I am collecting data on asylum seekers' and refugees' experiences of forced labour. These participants can be considered 'doubly vulnerable'. We want to share these data. How should we protect our participants' anonymity?
The best way to protect your participants' privacy may be not to collect certain identifiable information at all. The second best is anonymisation, which allows data to be shared whilst protecting participants' personal information. Anonymisation should be considered in the context of the whole project, including how it can be used alongside informed consent and access controls. For example, if a participant consents to their data being shared, then anonymisation may not be required.
Personal data can be disclosed through two categories of identifiers: direct identifiers and indirect identifiers.
When anonymising data, identifiers need to be removed, generalised, aggregated or distorted. Best practices for anonymising quantitative and qualitative data are given below.
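As an illustration of the four operations named above, here is a minimal Python sketch applied to a few invented survey records. The field names, the age-band width, the use of the first part of a postcode as an "area", and the income cap are all assumptions made for this example; they are not taken from any particular guideline, and real projects will need their own rules.

```python
from collections import Counter

# Hypothetical survey records; all field names and values are invented.
records = [
    {"name": "A. Example", "age": 34, "postcode": "AB1 2CD", "income": 21450},
    {"name": "B. Example", "age": 37, "postcode": "AB1 9XY", "income": 38900},
    {"name": "C. Example", "age": 52, "postcode": "ZZ9 1AA", "income": 250000},
]

INCOME_CAP = 100000  # assumed top-coding threshold for this example


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record):
    """Return a new record in which the identifiers have been treated."""
    return {
        # Removed: 'name' is a direct identifier and is simply not copied.
        # Generalised: exact age becomes an age band.
        "age_band": age_band(record["age"]),
        # Generalised: only the first part of the postcode is kept.
        "area": record["postcode"].split()[0],
        # Distorted: income is rounded to the nearest 1000 and capped.
        "income": min(round(record["income"], -3), INCOME_CAP),
    }


anonymised = [anonymise(r) for r in records]

# Aggregated: counts per age band instead of individual rows.
counts_by_age_band = Counter(r["age_band"] for r in anonymised)

for row in anonymised:
    print(row)
print(dict(counts_by_age_band))
```

Whether such treatment is sufficient depends on the dataset as a whole, since combinations of the remaining variables may still identify a participant.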
1. Data access controls
In situations where (sensitive) personal data are not fully anonymised, data can still be archived and shared by regulating or limiting access to the data. Access controls can permit control down to an individual file level, meaning that mixed levels of access control can be applied to a data collection. You will learn more about choosing the appropriate data access category for your data files in the chapter on archiving and publishing data (see 'Access categories').
2. Irreversible anonymisation
In some countries anonymisation needs to be irreversible and the original data deleted. Be sure to check the national requirements.
3. Anonymisation tools
The UK Data Archive (n.d.b.) has developed a Text anonymisation helper tool (downloaded as a .zip file), with installation instructions available via a wiki. It is an add-on MS Word macro that aids the anonymisation of qualitative data.
4. Reading tip
This factsheet by OpenAIRE (2017) explains how to balance open access and data protection and advises on what to do when anonymisation isn't possible.
In a research study investigating how couples manage their households during recessions (Gush and Laury, 2015), finding the right balance between confidentiality and usefulness of the data was a real challenge (UK Data Service, 2017c). The archiving challenges in this project were to anonymise the data and to apply optimal access conditions.
Careful judgement was required to apply the level of anonymisation most appropriate for these particular data. The research team members went through the transcripts and removed certain types of identifying data such as names, places of work, and geographic areas. Regarding access conditions, it was decided to make the data available using a Special Licence (UK Data Service, 2017d; see 'Licensing your data' for other possible licences). Under this kind of licence, a potential user is required not only to register with the UK Data Service, but also to complete a detailed application form and agree to additional restrictions on data handling and usage. The use of the Special Licence then made it possible to apply a minimal level of anonymisation, thus reducing loss of data quality.
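To give a feel for what replacing names, workplaces and places in a transcript can look like in practice, here is a minimal Python sketch. It is not the procedure used by the research team above, nor the UK Data Archive tool. The replacement list, the placeholder wording and the sample sentence are all invented for illustration, and it assumes the identifying terms have already been found by someone reading the transcript.

```python
import re

# Hypothetical replacement list, compiled by hand while reading the transcript.
replacements = {
    "Maria": "[participant]",
    "Acme Bakery": "[workplace]",
    "Colchester": "[town in East of England]",
}


def redact(text, mapping):
    """Replace each listed identifier with its bracketed placeholder."""
    # Longest terms first, so a long phrase is not partly replaced by a shorter one.
    for term in sorted(mapping, key=len, reverse=True):
        # Word boundaries avoid replacing the term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        # A function is used so the placeholder is inserted literally.
        text = re.sub(pattern, lambda _m, r=mapping[term]: r, text)
    return text


transcript = "Maria said she lost her job at Acme Bakery in Colchester last year."
print(redact(transcript, replacements))
```

Bracketed placeholders keep the replacement visible to later readers of the transcript. A pass like this only catches the terms on the list, so nicknames, misspellings and indirect identifiers still need to be checked by hand.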
Follow the steps to see whether you recognise direct and indirect identifiers in an interview transcript and whether you know how to deal with them accordingly.