Keywords
Data Management Plan (DMP), TILDA, health research, data governance
This article is included in the TILDA gateway.
A Data Management Plan (DMP) is a formal document that outlines the management and stewardship of data generated over the lifecycle of a research project from data collection, and governance structures, to the long-term preservation of data outputs. DMPs are an important feature of good research practice. Our aim is to provide details of the development of a DMP that others can learn from and adapt to their specific needs.
Our DMP was developed as part of a COVID-19 sub-study of The Irish Longitudinal Study on Ageing (TILDA), titled “Altered lives in a time of crisis: preparing for recovery from the impact of the COVID-19 pandemic on the lives of older adults”. TILDA is a longitudinal study of community-dwelling older adults. In 2009/2010, an initial nationally representative sample of 8,500 adults aged 50 years and older was selected. The sample for the COVID-19 study was recruited from this existing sample. The objective of the sub-study was to document the lives and experiences of older adults during the COVID-19 pandemic to better understand the effect of the pandemic and public health responses on their well-being.
This DMP describes the study design and objectives; data collection tools and procedures; data preparation; data storage and security; data sharing and preservation; and ethical and legal considerations within the European Union and Irish Health Research legislative context.
Responsible data governance in Ireland is complex, requiring adherence to both European and Irish legislation. Implementation of the Health Information Bill (2023) may bring further complexities to this context. It is therefore crucial that researchers, data stewards, and other practitioners, share their expertise freely, as we have done here, so that others can learn from their experiences and the health research community can develop standards of best practice.
Here, we provide a detailed description of the Data Management Plan (DMP) developed by The Irish Longitudinal Study on Ageing (TILDA). This record is intended to support the development of DMPs in other studies. The DMP described herein was developed as part of a large-scale study of older adults conducted during the COVID-19 pandemic. A DMP is a formal document that outlines the management and stewardship of study data during the lifecycle of a research project, including data-collection procedures, storage, and dissemination. Importantly, a DMP also includes a description of the data governance structures beyond the project funding lifecycle. To do so, DMPs describe the long-term preservation, archiving, and data-sharing aspects of data collected during a research study (Science Europe, 2021). DMPs are an important and increasingly necessary feature of good research practice and are routinely required by national and international funding agencies. Our contribution is not intended as a toolkit or template to be strictly adhered to by others as they develop their own DMPs. Instead, we describe TILDA’s experience of developing a DMP, which others can learn from and adapt to their specific context and needs.
TILDA is a longitudinal study of community-dwelling older adults. This study is part of the international family of Health and Retirement Studies (HRS). In 2009/2010, an initial nationally representative sample of 8,500 adults aged 50 years and older was recruited for the study. These same individuals were interviewed every two years and completed a comprehensive health assessment at Wave 1 (2009–2011), Wave 3 (2014–15; Cronin et al., 2013), and most recently Wave 6 (2022–2023). Details of the methodologies and procedures employed by TILDA have been published previously (Donoghue et al., 2018; Kearney et al., 2011; Kenny et al., 2010; Whelan & Savva, 2013). In early 2020, TILDA completed the pilot phase for Wave 6 data collection. However, this round of computer-assisted personal interviews (CAPI) was halted by the onset of the COVID-19 pandemic in March 2020. With an existing sample of older adults and a fully developed research infrastructure in place, TILDA was uniquely positioned to document the real-time impact of the pandemic on the lives of older adults. Therefore, with the financial support of the Health Research Board (HRB; COVID-19 Pandemic Rapid Response Funding Call [COV19-2020-070]), TILDA invited existing participants to complete a Self-Completion Questionnaire (SCQ) between July and November 2020. Full details of the procedures adopted by TILDA during the COVID-19 sub-study are provided in the study protocol (Ward et al., 2021a). As part of this study, we also developed a DMP that we share in detail here.
Our intention in describing this study DMP in detail is to provide an example of good research practice that other researchers and data stewards may learn from when preparing their own DMPs. This work will be of particular interest to individuals developing DMPs within the European Union and Irish Health Research legislative context.
The title of TILDA’s COVID-19 sub-study was “Altered lives in a time of crisis: preparing for recovery from the impact of the COVID-19 pandemic on the lives of older adults” (Costello et al., 2021; Ward et al., 2021a; Ward et al., 2021b). The overall aim of this study was to document the lives and experiences of older adults during the COVID-19 pandemic to better understand the effect of the pandemic and public health responses on their well-being. Specifically, the aims of the study were to: (1) describe the prevalence of COVID-19 symptomatology and testing among older adults; (2) describe the levels of adherence to public health guidelines intended to halt the spread of the virus; (3) examine health-related, caring, and other unmet needs; (4) measure changes in well-being and examine whether these vary between groups defined by gender and other socio-demographic characteristics, socioeconomic status, coexisting conditions, and existing psychological ill-health; (5) examine whether older adults experienced ageism or discrimination during the pandemic; and (6) describe how public health information was received and understood.
It was not possible to conduct in-person face-to-face interviews due to the Irish Government’s public health restrictions in response to the COVID-19 pandemic. Therefore, data were collected via the postal SCQ. The questionnaire design was guided by three considerations: First, indicators that had been included in previous TILDA questionnaires were selected so that we could examine changes over time. Second, preference was given to indicators that were included in other ageing cohorts from within the Health and Retirement Study (HRS) family of studies, among others, to aid harmonisation of analysis. Third, our choice of indicators was informed by the World Health Organization (WHO) COSMO toolkit (World Health Organization, 2020). This toolkit provides guidance for the development of survey instruments to capture insights into changes due to the COVID-19 pandemic.
Questionnaires, participant information leaflets (PIL), informed consent forms (ICF), and pre-paid stamped addressed return envelopes were posted to TILDA participants in early July 2020. Of the 5535 booklets posted to participants, 3922 were completed and returned, giving a response rate of 71%. Key demographic characteristics were extracted from existing TILDA data and linked to the newly generated sub-study data so that they could be included in the analyses. These demographic variables included gender, age, education, household size, and urban/rural location.
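The response rate quoted above follows directly from the two counts given in the text. The short sketch below reproduces the arithmetic; the function name is illustrative and is not part of any TILDA tooling.

```python
def response_rate(returned: int, posted: int) -> float:
    """Return the response rate as a percentage of questionnaires posted."""
    if posted <= 0:
        raise ValueError("posted must be a positive count")
    return 100 * returned / posted


# Counts taken from the text: 5535 booklets posted, 3922 returned.
rate = response_rate(returned=3922, posted=5535)
print(f"{rate:.1f}%")  # 70.9%, reported as 71% in the text
```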
The data generated from the SCQ were primarily quantitative. A small amount of qualitative data (two open-ended questions with free-text response boxes) was also generated. However, because of the personal and identifiable nature of qualitative data, this qualitative component is not included in the publicly available data files described below.
Upon the return of each completed SCQ, the paper questionnaires were immediately reviewed, time-stamped, and assigned a storage box by a TILDA data team administrator. All identifying information was removed from the returned materials by using a permanent black marker. Each booklet was assigned a unique ID, rendering it pseudonymised. Data cleaning was then performed. Each booklet was manually coded following a standardised protocol to ensure that all responses within the booklets were formatted consistently. A red pen was used to differentiate in-house coding from the participants’ own ticks or responses. Examples of data cleaning processes include coding an item as missing if there were multiple responses to a question that required only one response. In this scenario, the SCQ administrator would code this item as missing because an accurate response could not be ascertained. All participants’ freehand notes were transcribed into a relevant free text box for data capture.
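The coding rule for single-response items described above can be expressed as a small function. This is a minimal sketch rather than TILDA's actual coding protocol: the missing-value code `-99` is a hypothetical placeholder, and treating an item with no ticks as missing is an assumption added for completeness.

```python
MISSING = -99  # hypothetical missing-value code; the code used by TILDA is not stated


def code_single_response(ticked: list[int]) -> int:
    """Code a single-response item from the options a participant ticked.

    Exactly one tick is kept as the response. More than one tick is coded as
    missing because a single answer cannot be ascertained. An item with no
    ticks is also coded as missing (an assumption, not stated in the text).
    """
    if len(ticked) == 1:
        return ticked[0]
    return MISSING


print(code_single_response([3]))     # 3
print(code_single_response([2, 4]))  # -99
print(code_single_response([]))      # -99
```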
Once the booklets had been pseudonymised and coded, a third-party company was contracted to complete the data entry. A suitable data management company was selected after a competitive tender process. This vendor was compliant with the European Union General Data Protection Regulation (GDPR) and certified to ISO 27001. The data entry company also signed a non-disclosure agreement and was obliged to destroy all study data in their possession once data entry was completed. Each returned questionnaire was entered by the data entry company using a double-blind data entry method and a spreadsheet template provided by TILDA. In this process, two operators independently enter the records. Their outputs are then compared in a software validation programme that automatically flags any character discrepancies for review. According to the vendor's process, this ensures that the output is 99.99% accurate. The only scope for error is if both operators make the same mistake at the same data point. Data were password protected at rest. Once each batch of SCQ data had passed these quality control measures, the data were securely transferred in a password-protected 7zip archive (https://www.7-zip.org/) encrypted with AES-256.
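The comparison step in double data entry can be illustrated with a short sketch. It is not the vendor's validation programme, whose details are not given; the record layout (booklet ID mapped to field values), the field names, and the `Discrepancy` type are assumptions made for illustration.

```python
from typing import NamedTuple


class Discrepancy(NamedTuple):
    booklet_id: str
    field: str
    first_entry: str
    second_entry: str


def compare_entries(
    first: dict[str, dict[str, str]],
    second: dict[str, dict[str, str]],
) -> list[Discrepancy]:
    """Flag every field where the two operators' keyed values differ."""
    flagged = []
    for booklet_id in sorted(first.keys() | second.keys()):
        row_a = first.get(booklet_id, {})
        row_b = second.get(booklet_id, {})
        for field in sorted(row_a.keys() | row_b.keys()):
            a = row_a.get(field, "")
            b = row_b.get(field, "")
            if a != b:
                # A value keyed by only one operator is also flagged.
                flagged.append(Discrepancy(booklet_id, field, a, b))
    return flagged


# Hypothetical booklet IDs and item names.
operator_1 = {"B0001": {"q1": "2", "q2": "5"}, "B0002": {"q1": "1", "q2": "3"}}
operator_2 = {"B0001": {"q1": "2", "q2": "6"}, "B0002": {"q1": "1", "q2": "3"}}

for item in compare_entries(operator_1, operator_2):
    print(item)
```

A comparison of this kind cannot detect a value that both operators keyed incorrectly in the same way, which is the residual source of error noted above.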
The sequence of procedures in the study, from participant recruitment to data archiving, is shown in Figure 1.
Further details of Quality Control (QC) checks performed during the above project lifecycle, from the initial coding of returned questionnaires to final data archiving, are provided in Figure 2.
TILDA works closely with Trinity College Dublin’s (TCD) Data Protection Officer (DPO) to ensure that all data processing is compliant with the requirements of the GDPR and the Health Research Regulations 2018 (as amended) (HRRs). A Data Protection Impact Assessment (DPIA) was also conducted, and recommendations were provided by the TCD DPO. The DMP was also reviewed by the TCD data steward, who provided recommendations. TILDA performs in-house reviews to assess compliance with data protection legislation, to examine how policies and procedures are implemented, and to review the adequacy of the control measures in place, particularly with third-party services.
Once returned, completed paper copies of the SCQs were stored at secure TILDA offices in Trinity College Dublin. Questionnaires were stored in rooms secured by access keypads in an area with restricted swipe card access. During off-site data entry processing, SCQs were stored at the purpose-built fireproof storage facility of the data entry company. TILDA employs the services of an offsite records storage company to provide long-term secure storage in a locked and controlled facility for study documentation, including consent forms and hard copies of completed SCQs. Individual consent forms and SCQs are stored separately, so that the individual identifiable information contained in the consent form cannot be linked to their corresponding questionnaire information.
The cleaned data generated from the questionnaires were stored on a secure storage resource hosted by Institutional IT Services. The volume of generated data was approximately 2GB. Data are stored redundantly to avoid data loss owing to the failure of individual hard drives. All computers, including the backup and storage servers, are protected from external access by the institutional network firewall.
Data should generally be stored on secure Institutional or National IT infrastructure, as available and appropriate. The data entry company similarly backs up data nightly, which are stored on their secure server at an Irish location. The back-up site implements physical and environmental controls similar to those in place at the main site. Backup media are tested biannually to ensure that the backup can be relied upon. These data storage and backup processes are fully compliant with the GDPR requirements.
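The text does not say how backups are tested. One common approach, shown here purely as an assumption, is to restore a file from the backup and confirm that its checksum matches the original. The function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, restored: Path) -> bool:
    """Return True if a restored backup copy is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Recording the digest at the time of backup would allow the same check to be repeated later without access to the original file.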
TILDA adheres to the FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data management and stewardship (Wilkinson et al., 2016). In accordance with the requirements of the Health Research Board’s data-sharing policy, data are shared according to FAIR principles. This promotes transparency and accountability and increases the reuse of the data generated during the study. Data collected during this sub-study will be stored indefinitely and reviewed in line with the study funding cycles. This is in line with consent and ethical approval, as this study is of public and scientific interest. This was highlighted to participants in the PIL, ICF, and privacy notice available on the TILDA website. Future processing for secondary research must be in line with the original consent obtained and used only for research purposes, as expressed in the explicit consent form.
The pseudonymised public researcher data file was made available via application in the Irish Social Science Data Archive (ISSDA) at University College Dublin, alongside existing public TILDA data files (www.ucd.ie/issda/data/tilda). Importantly, a Digital Object Identifier (DOI) was assigned to each uploaded TILDA dataset. The public archiving of data allows for long-term data storage and wider dissemination. To support data access, data were available in multiple formats (SPSS/Stata/SAS/R). TILDA transfers datasets to ISSDA using the secure end-to-end encrypted file-sharing tool ‘HEAnet FileSender,’ which is available from HEAnet at https://filesender.heanet.ie/.
The publicly archived data files were protected under Directive 96/9/EC on the legal protection of databases, and no IP issues were anticipated or experienced. In addition to adhering to FAIR principles, our public archiving of data generated in this study also adheres to the principles set out in the joint statement on sharing research data and findings relevant to the COVID-19 outbreak. This will ensure that the study’s research findings will be publicly accessible in a timely manner.
In preparation for public archiving, the TILDA data team employed a range of anonymisation techniques, as described below. Each variable was reviewed by the TILDA data team for low cell counts, extreme values, or potentially identifiable patterns, resulting in some recoding as appropriate to preserve participant confidentiality. A small number of response categories with very few valid responses were merged with other categories, and other variables were top or bottom coded as appropriate to remove extreme values. Some distinct values were banded into wider ranges to maintain the confidentiality of the participant data. The publicly archived dataset also contains the derived variables generated during the project. In addition, this public file has an associated detailed codebook and complete metadata. The codebook provides a description of each variable and the coding schema for the derived variables and is published alongside the public data file available from the ISSDA repository. Associated metadata records are also exported in the following formats: Dublin Core, DDI 2.5, and DATS 2.2 (JSON). Access to publicly archived data is by application only. The applicant must commit to a set of conditions associated with the use of TILDA data and relevant data protection legislation. While TILDA specifies the conditions of sharing to ensure access is only for research and education purposes, ISSDA manages the application and approval process. The publicly archived dataset was given a second layer of unique ID numbers (linked to the TILDA study ID number). Pseudonymisation does not affect data quality in any way, but does ensure that EU GDPR and Ireland's HRR (2018, as amended) data protection safeguards are adhered to.
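The three recoding techniques named above (top or bottom coding, banding, and merging sparse categories) can be sketched as follows. This is an illustration under stated assumptions, not TILDA's procedure: the thresholds, band width, and the "Other" label are hypothetical, and merging sparse categories into a single catch-all is only one possible way of combining them.

```python
from collections import Counter


def top_bottom_code(value: int, lower: int, upper: int) -> int:
    """Clamp a value to [lower, upper] so extreme values are not disclosed."""
    return max(lower, min(upper, value))


def band(value: int, width: int) -> str:
    """Replace a distinct value with the range of the given width that contains it."""
    start = (value // width) * width
    return f"{start}-{start + width - 1}"


def merge_rare(values: list[str], min_count: int, other: str = "Other") -> list[str]:
    """Merge categories with fewer than min_count responses into one category."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical examples: household size top coded at 5, age banded in 5-year ranges.
print(top_bottom_code(9, lower=1, upper=5))        # 5
print(band(67, width=5))                           # 65-69
print(merge_rare(["a", "a", "a", "b"], min_count=2))  # ['a', 'a', 'a', 'Other']
```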
Contractual agreements are in place with data repositories and are reviewed in line with legislative requirements and changes. International researchers and educators from within the European Economic Area and countries with GDPR adequacy decisions can access the data for teaching and research purposes. As per the end-user Licence agreement with the ISSDA, data will only be distributed and accessed by the research team members listed on the data request form. Data will only be used for purposes in line with the original consent obtained, and access to the data will be kept secure and not disclosed to others.
TILDA adheres to the guidelines set out in the 1964 Declaration of Helsinki and its later amendments. Ethical approval for the wider TILDA study is granted for each wave of data collection by the Faculty of Health Sciences Research Ethics Committee at Trinity College Dublin (Wave 6 REC Ref: 190407). As part of Ireland’s response to the COVID-19 pandemic, and in accordance with a recommendation in the WHO's ‘A Coordinated Global Research Roadmap’, the Irish Minister for Health established the first National Research Ethics Committee (NREC) to deliver an expedited process for review of COVID-19-related health research. The National Research Ethics Office launched the first NREC in March 2020. This study has full ethical approval in place (NREC application number: 20NREC-COV-030-2, approved 10.05.20). Informed explicit written consent was obtained from all participants who agreed to participate in the study. Participants consented to sharing their data both within and outside the EU for research purposes. In addition to completing the SCQ, participants were asked to read the PIL and to read and sign the ICF. Multiple contact methods were provided in the PIL and on the TILDA website in case participants had any questions or concerns.
Trinity College Dublin is the data controller for this study. In collaboration with the Trinity College Dublin DPO, TILDA operates in compliance with GDPR and HRR.
Data sharing is in line with consent and relevant research guidelines, data protection legislation, and health research legislation. TILDA participants have been a part of the study for over ten years and are very familiar with the consent process. The PIL and ICF were reviewed and approved by the DPO, in addition to the NREC. Note that, owing to the pandemic restrictions, this was the first time TILDA obtained consent by post rather than face-to-face.
Participants retain the right to restrict the processing of their individual data under particular circumstances. However, these rights are not absolute, and are subject to certain restrictions. Participants were informed that these rights include the following:
The right to access data.
The right to restrict the use of the data.
The right to correct inaccuracies.
The right to have information deleted.
The right to object to profiling.
All TILDA staff members are required to receive regular in-house data protection training and must complete the University of Oxford Epigeum Research Integrity Training. A comprehensive data management policy is in place that details the procedures for the storage, transfer, processing, and retention of TILDA data. All identifiable data (directly and indirectly identifiable) were further restricted and password protected. These data are stored on a secure server separate from the final pseudonymised researcher dataset and are only available to a limited number of members of the TILDA team via role-based access.
In this paper, we provided a detailed description of a DMP developed for a large sub-study of an established prospective cohort study of older adults in Ireland. The template used for this DMP follows the ‘Health Research Board DMP Template’ available for download at DMP Online. It also aligns with the Science Europe 2021 core requirements for a DMP. Specifically, the DMP describes the procedures used in TILDA's COVID-19 study, which collected almost 4,000 completed postal self-completion questionnaires from existing TILDA participants during the early months of the COVID-19 pandemic in 2020. The DMP provides details on the management and stewardship of all data generated during the lifecycle of the research project, including data collection procedures, storage, and dissemination. Importantly, it also describes data management and archiving structures beyond the project funding lifecycle. To ensure that all core elements of the DMP have been considered, TILDA has also incorporated the DMP Evaluation Rubric from Science Europe's practical guide to the international alignment of research data management (Science Europe, 2021).
Our objective was to provide an exemplar of good research practice that may be helpful to researchers and data stewards operating within European and Irish legislative frameworks. Documented DMPs are increasingly required and reviewed by research funding agencies in many jurisdictions, and their importance is likely to increase. The legislative context within which researchers operate in Ireland is complex and requires adherence to both European (GDPR) and Irish (HRR) legislation. This context is likely to change when the Health Information Bill (2023) becomes law in Ireland. The main element of this proposed legislation is "electronic patient summaries that can be accessed for care and treatment, wider use of health information for desirable health service and organisational developments that support the digitisation of health services" (Department of Health, 2023: p.8). Implementation of the Bill is likely to bring further complexity for health researchers. Within this intricate and developing research environment, it is important that researchers, data stewards, and other practitioners share their expertise freely so that others can learn from their experiences, and the health research community can develop standards of best practice.
Is the rationale for, and objectives of, the study clearly described?
Yes
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Ageing research
Is the rationale for, and objectives of, the study clearly described?
Partly
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Not applicable
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: EMR + MRI data analysis, statistics. I'm not an expert on data management plans and don't feel completely comfortable providing advice on improving this article. I'm very familiar with the technical aspect and the fact that I found this article confusing to read suggests further work is required.
Is the rationale for, and objectives of, the study clearly described?
Yes
Is the study design appropriate for the research question?
Yes
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Please be more specific about the exact measures employed in the study.
Alongside their report, reviewers assign a status to the article:
Version | Reviewer 1 | Reviewer 2 | Reviewer 3
---|---|---|---
Version 1 (24 Jun 24) | read | read | read