In social and scientific research, data are extremely important. They start out as raw material, but once professionals process and analyze them within a research plan, they ultimately produce knowledge. Associations, and not only scientific ones, generate a lot of data, and today it is essential for them to handle, preserve and store most of it.
In practice, a Data Management Plan (DMP) summarizes how researchers will manage their data over time, whatever their origin or type. Data can come from a functional magnetic resonance imaging scanner or a particle accelerator, but can also consist of texts, graphs, images, tables and so on.
Some scientists, such as geneticists and astronomers, have used data management methods for a long time; for others, they are a novelty they now have to deal with. Geneticists, for example, can already count on over 70 metadata information systems, covering subjects that range from viruses to oncological images. However, each research community has its own metadata management system, as www.re3data.org shows.
DMPs are needed to fully realize so-called open science, whose aim is to make scientific research available as quickly and as accessibly as possible. Retaining data makes it possible to reuse them, compare them and replicate studies. It also keeps other researchers from embarking on paths that have already failed to produce concrete results.
In addition, when the amount of data is large, artificial intelligence can step in and handle it much better than humans can, especially in the medical field. Healthcare planners, providers and researchers are perfectly aware that the data they collect every day can be translated into valuable information for patients worldwide. Just think, for instance, of the millions of mammograms that are processed every day.
US federal agencies, such as the National Science Foundation and the National Institutes of Health, as well as the European Commission, require a DMP as a condition of funding. Applicants are asked to specify not only how the data will be produced, but also how they will be stored once the research project comes to an end. Those who do not provide open access to the information they have collected, perhaps for reasons of intellectual property or security, must justify that choice.
Lack of information
The problem is that many researchers are not thoroughly informed. A survey carried out last year among over 1,200 young European research fellows and PhD students showed that only a quarter of them had produced a DMP and another quarter did not even know what one was. Many complained about poor support from the institutions to which they belonged.
Unfortunately, each scientific discipline produces data that differ in both kind and quantity, so the variety of DMPs that may be needed is very wide. Clearly, a particle accelerator provides an enormous quantity of data, while an anthropologist typically produces less. There is also research of a conceptual or theoretical nature that does not require any DMP: it is simply impossible to preserve every source of information, however minor. A fundamental step, however, is to indicate who will keep the data after a research project is finished. Choosing an individual would be unwise; a library would be more appropriate. But even in this case, since libraries do not store personal data, it is advisable to deposit the data in a specialized digital archive.
There is also the problem of sensitive data, especially medical data, which are more vulnerable than ever in our ‘big data’ era. Many companies accumulate personal data in order to resell them to third parties, which use them to hone their marketing strategies or propaganda. These collections of information are based on individual, demographic and geographical data. So-called “psychographic” data focus on behaviours and attitudes, and usually come from the illegal harvesting of information from smartphones and social networks. Some agencies, as was shown during the last presidential campaign in the US, even managed to collect data on voters, with the aim of tailoring their messages and, in the end, influencing the election.
In this context, since some sensitive data can clearly be obtained illegally, many research institutions are looking for ways to control them. Trust protocols that allow for transparent exchanges have been adopted. To speed up acceptance of what some regard as just another administrative burden, science professionals and research associations must work to streamline the process and explain its benefits. Funders and institutions, for their part, must ensure that data management, and the basic skills needed to do it properly, become widespread.
Clearly, some steps are fundamental to producing a good DMP. Online help is available for developing a DMP that meets the requirements of the funding agency and suits the association’s field. Setting clear objectives for how the data will be archived will clarify the storage space and the formats needed. Other aspects to address include who will use and manage the data, and how and when the data will be shared with people outside the association.
We are all perfectly aware that technology can outpace its regulation. But we cannot be afraid of sharing and, at the same time, controlling the data we generate, in keeping with our codes of ethics.
This article was contributed by Franco Viviani, the former President of the International Council for Physical Activity and Fitness Research (ICPAFR) and a professor of Anthropology at the University of Padua, Italy.