Data Management Plan – An Essential Tool for the Data Industry
By Mickaël Rigault, ELDA
For as long as research has existed, it has relied on data to produce its findings. In the last few years, however, the management of data in research projects and other data-intensive processes has become a crucial concern. Managing data well implies knowing the value of datasets for other projects and developments, for the evaluation of research and sometimes for commercial undertakings.
This is the reason why, nowadays, both the research community and funding institutions require data-production teams to draft and implement data management plans for the data they collect in the course of their R&D work.
1. Definition and Content of the Data Management Plan (DMP)
A data management plan is generally a written document whose purpose is to define how data are going to be managed from the time they are collected during the project to the time they are stored and curated, even after the project has ended.
Drafting a data management plan usually requires the research team to answer a series of questions that help them establish the requirements they will follow while collecting and curating their research data.
The European Commission Horizon 2020 portal gives access to a template that is designed to cover the main aspects of most data management plans and gives an indication of what a plan should contain. The points and sections addressed by this DMP template are described below.
1.1 Data Summary
This section covers the general information required from the project team, such as the purpose of the data collection and its relation to the project, and the types and formats of data to be collected.
In addition, there are also questions about the re-use of data, their origin, size and utility.
1.2 FAIR Requirements
The H2020 template is distinctive in that it refers explicitly to the FAIR data principles, whereas other templates may not address those principles strictly, even if they cover essentially the same issues, which are described below:
- Making data findable
This requirement covers questions around the findability of data, which in practice deals with the identification of data, naming conventions, searchability of data, versioning and metadata standards.
- Making data openly accessible
Accessibility has to be properly managed too. Project teams have to determine how and which data will be made openly accessible by default. They will need to define which tools will be needed to access the data and whether the data and associated metadata will be deposited on an open source repository. If restrictions on access are imposed, they need to be defined and justified, as well as properly described.
- Making data interoperable
The researchers preparing the DMP need to define how they will make their data interoperable by default, in order to allow data exchange and re-use with other organizations. In doing so, they will need to clarify which metadata standards and standard vocabularies they will use.
- Making data reusable
This requirement covers mostly questions around the licensing of data to allow wide re-use, as well as quality assurance principles.
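As an illustration, the four FAIR requirements above can be turned into a simple completeness check on a dataset description. The sketch below is not part of the H2020 template: the `DatasetRecord` class, its field names and the `fair_gaps` function are hypothetical, and the sample values ("Dublin Core", "CC BY 4.0", the identifier) are only examples of what a team might record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical description of one dataset; field names are assumptions."""
    identifier: str = ""          # identifier used to find the dataset
    version: str = ""             # version label of the dataset
    metadata_standard: str = ""   # metadata standard used to describe it
    access_url: str = ""          # where the data can be obtained, if open
    access_restriction: str = ""  # justification when access is restricted
    vocabularies: List[str] = field(default_factory=list)  # standard vocabularies
    licence: str = ""             # licence that governs re-use


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return the FAIR requirements that the record leaves unaddressed."""
    gaps = []
    if not (record.identifier and record.version and record.metadata_standard):
        gaps.append("findable")
    # Either an access point or a documented restriction counts as addressed.
    if not (record.access_url or record.access_restriction):
        gaps.append("accessible")
    if not record.vocabularies:
        gaps.append("interoperable")
    if not record.licence:
        gaps.append("reusable")
    return gaps


record = DatasetRecord(
    identifier="example-id-001",      # placeholder value
    version="1.0",
    metadata_standard="Dublin Core",  # example standard
    licence="CC BY 4.0",              # example licence
)
print(fair_gaps(record))
```

A check of this kind only shows whether each requirement has been addressed at all; whether the answers are adequate remains a matter for the project team and the funder.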
1.3 Allocation of resources
The questions in this section concern the costs of producing and collecting the data in compliance with the FAIR data principles, and therefore the costs of open access. This section also covers responsibilities and long-term preservation costs.
1.4 Data security
Data security is also a key point regarding data preservation and curation. This section compiles information about secure storage, means for data recovery, transfer of sensitive data and usage of certified repositories.
1.5 Ethical aspects
This part comprises questions about ethical issues that could limit data sharing, in particular consent in questionnaires that collect personal data.
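The template structure described in sections 1.1 to 1.5 can be sketched as a checklist that a team fills in as the plan progresses. This is a minimal sketch under stated assumptions: the section names come from the description above, while the `DMP_SECTIONS` list, the `unanswered_sections` function and the sample draft are hypothetical and not an official tool.

```python
from typing import Dict, List

# Section names as described above; the list itself is an illustrative assumption.
DMP_SECTIONS = [
    "Data Summary",
    "FAIR Requirements",
    "Allocation of resources",
    "Data security",
    "Ethical aspects",
]


def unanswered_sections(answers: Dict[str, str]) -> List[str]:
    """Return the template sections that have no non-empty answer yet."""
    return [s for s in DMP_SECTIONS if not answers.get(s, "").strip()]


# Hypothetical draft in which only the first section has been written.
draft = {
    "Data Summary": "Purpose, types, formats and origin of the collected data.",
    "Data security": "   ",  # whitespace only, so still counted as unanswered
}
print(unanswered_sections(draft))
```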
2. Purposes of the Data Management Plan
Filling out a Data Management Plan before starting any project that aims to collect and/or produce data may be tedious or seem irrelevant, in particular when it comes to innovation. Nevertheless, careful drafting and thorough implementation are of great help in managing certain concerns that arise in the research community.
Indeed, a Data Management Plan serves several useful purposes:
- The sharing scheme developed in the Data Management Plan is a useful tool to make a project's research more influential and easier for others in the research community to verify.
- This document serves as a planning document which will help save time and resources that will be better used on the project itself.
- This document may help comply with regulatory obligations such as the GDPR in Europe or other privacy protection legislation, by providing useful answers to questions regarding issues like data minimization, purposes of data collection, storage, security and many others.
- This document may be a requirement to obtain financial support from funding agencies.
- The openness of data or lack thereof may foster or hinder innovation in the field.
The usefulness of the Data Management Plan goes even beyond the completion of the research as it is meant to ensure the accessibility and reusability of data even after the completion of the project.
This makes the DMP an essential tool for all projects that rely heavily on data collection or production.