The challenges of new data types in data management and archiving
Social media data is increasingly used in the social sciences. Data archivists and managers and social scientists met online to discuss ongoing data management and archiving activities of new types of data. Other topics addressed were the willingness of researchers to deposit their data, access to such data, and challenges of an Internet-centred society.
A CESSDA Training webinar “New Data Types in Data Management and Archiving” took place on 15 September 2022. Four data archivists, data management specialists and social scientists presented their experience with storage, management and archiving of new data types (NDTs), i.e. administrative, transactional and social media data. They also shared results of their own research related to the usage and sharing of NDTs (see the programme).
The webinar and the following discussion attracted the attention of not only data archivists and data managers but also social scientists who use NDTs for their research.
CESSDA archives lack experience in working with NDTs
The first presenter was sociologist, data archivist and data management specialist Martin Vávra, from the Czech Social Science Data Archive (CSDA). In spring 2021, a questionnaire was sent to 24 European data archives in member and partner countries. It asked about their experience with archiving NDTs. The survey revealed that, although some NDTs are stored in CESSDA archives – especially the larger and well-established ones (e.g. GESIS) – the overall volume of NDTs archived is rather small. Another finding was that archives lack experience in working with NDTs and have not yet developed a widely shared and effective strategy for handling NDTs.
A proposal for a more coordinated approach
Brian Kleiner, the head of data services at the Swiss Centre of Expertise in the Social Sciences (FORS) and data management specialist, presented his proposal for a more coordinated approach to handling new data types across CESSDA archives. He addressed the key questions that European data repositories must solve in the near future and suggested possible forms of coordination between data archives. He recommended setting up forums for the exchange of expertise about the sharing and archiving of NDTs, as well as publications on shared know-how. According to Kleiner, CESSDA archives should establish a common conceptual framework for NDTs and strive to take a coherent approach.
Barriers to sharing social media data
Yevhen Voronin, a researcher and data management specialist from GESIS, presented a study on sharing social media data (SMD) in the research community. An online survey was conducted among researchers using SMD. They were asked about their willingness to share the SMD they had used in their research. Voronin’s study focused on the barriers connected to sharing SMD and identified factors behind the researchers’ reluctance to share these data. The most important factors were a lack of interest in sharing data and/or a lack of knowledge about available data repositories, perceived legal obstacles to sharing the data, and finally, ethical and privacy concerns for the participants.
Social science in the Embattled Digital Age
Professor Pascal Jürgens from the University of Trier took a different approach to that of data management and archiving experts. Jürgens is a computational social scientist who works with NDTs on a daily basis. His perspective is strongly researcher-oriented and takes into consideration the broader economic and political background of the contemporary, Internet-centred society.
Pascal Jürgens identified several reasons why NDTs have not been widely shared within the research and archiving community.
The two main reasons were:
The very high costs of obtaining NDTs
The collection of social media data and transactional data can be technically challenging and very time-consuming and expensive. Hence, social scientists who collected the NDTs are sometimes reluctant to share their data.
Pressure from powerful actors who are interested in the Internet as a source of economic and political power
These actors were identified as predominantly being either powerful private companies such as Google and Facebook, or authoritarian regimes with ambitions to censor and control the content of the Internet. These powerful actors prevent social scientists from obtaining the NDTs, alter the data itself, and attack those who dare to criticise the policies of the technological giants or the state laws. They strive to make the usage of NDTs difficult.
Based on this view, Jürgens recommended that archives offer researchers protection against these powerful actors. They should create a safe environment in which social scientists have legal access to NDTs and are protected against attacks from these powerful actors.
Still much to learn
Lastly, participants in the final discussion panel agreed that the community of data archivists and data management experts is still working out how to grasp this relatively new phenomenon. Handling NDTs is in its early days and there is still much to learn.
In 2021, CESSDA published “Archiving Social Media Data: A guide for archivists and researchers”. The guide presents suggestions for metadata elements that need to be developed or extended for the proper documentation of social media data. The guide takes into account the various challenges in archiving social media data and how they may be addressed. It also presents practical recommendations for researchers working with social media data.