CESSDA asks ten questions to Simon Hodson
Simon Hodson is the Executive Director of CODATA, the Committee on Data of the International Science Council (ISC).
“CODATA exists to promote global collaboration to advance Open Science and to improve the availability and usability of data for all areas of research”1. Simon is a data policy and research data management expert and is active in a number of projects. Among other roles, he is a member of the CESSDA Scientific Advisory Board.
He has contributed to influential reports on Current Best Practice for Research Data Management Policies and to the Science International Accord on Open Data in a Big Data World. Most recently he chaired the European Commission’s Expert Group on FAIR Data which produced the report Turning FAIR into Reality https://doi.org/10.2777/1524.
CESSDA asked Simon Hodson to answer a few questions.
CODATA has substantial experience in bringing together stakeholders with differing roles and perspectives - researchers, data management experts, policy leaders - to facilitate knowledge sharing between research communities and research institutions. What have been the key approaches and incentives in achieving a productive dialogue?
We are a global organisation and the key to success is identifying activities in areas of shared interest. It is important to have a dialogue, analyse the situation and understand what the points of contact and areas for collaboration are. The varied organisations with which we work in different countries have diverse interests in what we are doing. The root approach is “to keep your ears to the ground”, to be a listener as well as a talker, and to try to understand what the issues are from the perspective of the governments, the institutions or the researchers that we are engaged with.
If you look to Japan or Australia, do you see different topics appear than in Europe or the US?
I think that there is a shared recognition of the need for research data stewardship. And there is a shared recognition of the need to get more results, more return on investment from scientific expenditure and in particular in relation to the data.
What is very different is that in Japan, for example, a lot of the funding is more centralised. When a couple of years ago we did the work with the OECD on business models for data centres, there was a very big difference in the way that data repositories, data services were funded from country to country, from economic zone to economic zone. In China, Taiwan and Korea it is also mainly central funding. Their concern is that we need to have an assessment mechanism that allows us to demonstrate and allows the government funder to make the decision “is this data worth the investment?” either for the creation, or the stewardship of the data.
What are the next steps in the work of the European Commission FAIR Data Expert Group and what role can CESSDA play in achieving its aims and ambitions?
The work of the Expert Group on FAIR Data as such is completed. We really put a lot of effort into our report and the feedback has been very gratifying. What we really tried to do was to create something that would be approximate to a manual and a guide that would have a validity for a few years beyond that and that people could reference and say “these are things that need to be done in the more medium- to long-term, not just in the short-term.”
So what I hope is that stakeholder organisations such as CESSDA can use it as a manual and a guide not just for engagement with EOSC but for engagement with the research data management/stewardship and Open Science space and the FAIR space more generally. We see organisations like CESSDA as fundamental building blocks for the OS and FAIR space in Europe and for the EOSC.
What is important is that there is coordination at a European level of those entities that look after social science data in the long-term, that ensure that those data are Findable, Accessible, Interoperable, Reusable. Organisations that keep pushing the technology and culture in relation to those data, such that we can analyse them not just by downloading datasets and putting them in our statistical software, but we can actually do data level or variable level search and integration across the sort of holdings that might be of interest in different CESSDA members, for example.
What are the biggest challenges to FAIR Data that need international collaboration?
Something that always springs to mind for me is that the “F” and “A” of FAIR are a lot easier than the “I” and the “R”. Relating to “R”, I think that those research institutions and those entities looking after data in the long-term should realise that it will benefit the advancement of research if we take on board archival principles. The provenance of data needs to be understood and communicated so that it can be properly understood and reused. That covers a lot of aspects. The key aspect here is the recognition and realisation that provenance information needs to be captured effectively, automated where possible, and stored, otherwise we cannot reuse the data.
In terms of interoperability (the “I”), although it is a well-discussed topic, it still rests with the semantics, the metadata standards, the definitions, vocabularies, and the ontologies that we use. Real concerted work on that is still necessary. It will always be necessary and on one level, isn’t this the core of what we do as scientists? If we don’t have shared definitions of the things that we are measuring, whatever they are (boiling points or chemical constitutions or the way particular social groups answer particular questions), if we do not have proper definitions about that, we can’t compare the data that has been gathered and it’s not interoperable or reusable.
What future business models for science publishing in the open science/open data policy framework will be needed to avoid a greater divide between researchers in the developed world and those in developing countries?
One aspect of that is that there is a need for greater investment in Open Science platforms and activities in the developing world, and that is something I think that we are trying to do through the African Open Science Platform. We are trying to ensure that the investment (including some outside investment) is African-led and that the governance and coordination come from Africa.
We have a responsibility, as the global north, in that scenario as well, as it is not acceptable to conduct research in the global south without having some of that investment lead to improved research infrastructure. There has to be investment in those African countries to look after the data, developing those research infrastructures and an African equivalent of CESSDA. One example of doing things the right way is the SKA example in South Africa, where in other tiers of the research infrastructure, data will be looked after in other countries. The H3Africa project (Human Heredity & Health in Africa) and the bioinformatics wing of that project (H3ABioNet) are developing data stewardship capacity in a number of institutions across Africa to look after bioinformatics and human genome data and to analyse it.
You played a prominent role in working on the recently published Open Science Framework for South Africa. What are the next steps in taking this forward?
The report is now with the South African Department of Science & Technology, where there is a lot of activity around creating an Open Science programme. Something that is important to stress is that the African OS Platform is envisaged as a pan-African initiative. Although the pilot funding comes from South Africa, this is not to be the case in the long term. That ethos is very strong.
Some challenges that the report uncovered which need to be addressed on a pan-African level were the balance to be struck between the open availability of data and the promotion of open innovation, while also encouraging the monetisation and exploitation of original thought.
At the International Data Week last year in Botswana, there were 850 participants, with almost two thirds of these from Africa and half of those from Botswana and South Africa. There is a huge amount of interest. The main challenge is the investment.
What good practices and mechanisms for controlled sharing of sensitive and restricted data exist in the medical sciences and elsewhere that the social sciences and humanities could learn from?
In the human and medical sciences, controlled sharing of data is obviously the way forward. There are mechanisms for doing that in a particular way, for example at ELIXIR nodes or other biomedical data archives. Of course social science data archives have long explored how to maximise data access while retaining necessary protection. I think the principle of proportionality in data protection is important also.
Quality control of data is both essential for the reuse of data and often a root cause of a scientist's reluctance to share their data. How do we encourage more data sharing amongst scientists within and across universities?
Quality control is a root cause of the reluctance sometimes to use other people’s data. What one also hears in the surveys that are conducted with researchers about sharing their data is the concern that it will not be understood properly. I think there is also the fear of some research groups or individuals that when sharing the data, they will be found to have made a mistake, that they have done something wrong and that they will be caught out.
You have to show your working, or the sources that you used or the data that you referred to and substantiate why those data are significant, why they’re a representation of a reality that is worth paying attention to and why the analysis that is conducted is valid. This is what we do. In the end, this is about research culture. Surely, we should all be more concerned about doing the work in a valid way, that can be shown to be robust, rather than be concerned that we will be caught out.
I think that part of the issue is that people feel under pressure to publish and to publish quickly. Perhaps there is also the knowledge that people have in their heart of hearts that they have cut corners in order to achieve that. The answer to that has to be to cultivate a scientific culture that puts more emphasis on quality than quantity.
I come from a humanities background. I was a historian and the sort of articles or work that I used to enjoy reading the most was where the interpretation was there but also you would get immediate access to transcripts of the documents, or at least the presentation of the resources in various ways (a database, a map, the transcripts of text). I think that sort of research output is what is genuinely scholarly work and I am sure it can be translated to almost any discipline of research. For things generally to be evidence based, you have to provide access to those sources and the analysis.
There is another discussion that could be had about what the quality of data is and it’s a number of things. FAIR helps us understand a certain aspect of quality, the epiphenomena of data, things around the data that allow the data to be FAIR (the metadata, etc.) and so we can judge that. But the quality of data properly speaking is a scientific or a research question. I think it is important to retain that distinction when we talk about the quality of data.
What do you see as the main challenges and channels for introducing more people to academic research and enhancing citizen engagement?
The challenge of our time is populism and anti-intellectualism, politics that are anti-expert (for people following UK politics), and anti-science. There is a very strong anti-science current in political movements and it’s very concerning. Part of it is a resentment of intellectual activity and a resentment of the rigour that is required to communicate things based upon evidence rather than just to latch onto easy soundbites and easy prejudices.
I think that we have to fight for enlightenment values, fundamentally scepticism (in the proper philosophical sense) about what we’re told and a demand for evidence to prove that we should believe what we are asked to believe or what people say should be believed. We have to push the agenda for what’s called “transdisciplinary research”, which is actually engaging communities and stakeholders with the research that is being conducted about them and about things that matter to them (see HRI-LIRA).
At International Data Week in 2016 in Denver, a scientist from an indigenous community on the West Coast of America, Christopher Horsethief, spoke very eloquently about the need not to ‘Other’ your subject in social sciences, not to be on the outside, but to engage them properly in the objectives and the conduct of the research. That is as good an example of transdisciplinary research as you can get. We are no longer doing nineteenth century anthropology work where the subject was very much on the other side of an observational divide.
CODATA is part of the International Science Council, which merged the natural and physical sciences and the international social science council to form a new body. The mission statement is “science as a global public good”. This sums up in a few words I think something which is extremely important which I hope that ISC and CODATA will be fighting for over the next few years. It’s very important to make the case for science as a public good and all the ramifications that that leads to, including in relation to data.
If one big leap forward could be achieved now in Open Science/Open Data, what in your view should it be?
In the end, the fundamental question here is what is the unit of production for scientific research? It shouldn’t be anymore the PDF article. It should be this agglomeration of what could be described as FAIR outputs, and it’s the interpretation of that, the presentation of that to a human audience. It’s the data, it’s the analysis and something that can be used by machines. Moving towards that as the culture of research is what is really needed.
We need for the thing that we are measured for, the thing that advances our careers, to be appropriate for research in the digital age. For it to be that rather than an article which is an analogy for paper, which you download as a PDF. For that to still be the unit of production of research, what is measured when assessing people’s scientific contribution and considering their careers, and therefore what people are concerned about, is simply not enough anymore.
European Commission Expert Group on Turning FAIR Data into Reality (CODATA website)