Exploring factors that influence the practice of Open Science by early career health researchers: a mixed methods study

Background: There is a growing global movement towards open science and ensuring that health research is more transparent. It is vital that researchers are adequately prepared for this research environment from early in their careers. However, limited research has been conducted on the barriers and enablers to practicing open science for early career researchers (ECRs). This study aimed to explore the views, experiences and factors influencing open science practices amongst ECRs working in health research. Methods: Semi-structured individual interviews were conducted with a convenience sample of ECRs working in health research. Participants also completed surveys regarding the factors influencing open science practices. Thematic analysis was used to analyse the qualitative data and descriptive statistical analyses were used to analyse survey data. Results: 14 ECRs participated. Two main themes were identified from interview data: Valuing Open Science and Creating a Culture for Open Science. Within ‘Valuing Open Science’, participants described open science as being open across the entire research cycle, and as important for producing better and more impactful research for patients and the public. Within ‘Creating a Culture of Open Science’, participants spoke about a number of factors influencing their practice of open science. These included cultural and academic pressures, the positives and negatives of increased accountability and transparency, and the need for more training and supporting resources to facilitate open science practices. Conclusion: ECRs see the importance of open science for beneficially impacting patient and public health, but many feel that they are not fully supported to practice open science. Resources and supports including education and training are needed, as are better incentives for open science activities. Crucially, tangible engagement from institutions, funders and researchers is needed to facilitate the development of an open science culture.


Introduction
Open science aims to make research materials and results openly accessible and available to as wide an audience as possible. Although there is no commonly accepted definition, open science is an umbrella term (Fecher & Friesike, 2013; Pontika et al., 2015). In essence, open science is about transparency and collaboration with all stakeholders throughout the whole research cycle, from conception and design to data production, analysis and dissemination. Opaque research limits our ability to build on existing research, leading to unnecessary duplication and research waste. It has been estimated that 85% of biomedical research resources have been wasted on flawed and nontransparent research (Chalmers & Glasziou, 2009). As such, openness and transparency are particularly important for health research to maximise research efficiency and ensure optimum outcomes for patient care and health service delivery.
The importance of open science is becoming increasingly recognised on a global scale (Fecher & Friesike, 2013). Open science has been recognised as a key priority for the European Commission, with a heightened focus on open access publishing and open data for Horizon 2020 funded projects (Guedj & Ramjoué, 2015). As such, it is crucial that researchers are adequately equipped to navigate an open science landscape. However, despite its growing importance, researcher awareness of and engagement in open science activities remains suboptimal (Morais & Borrell-Damian, 2019). A 2017 survey of 1,277 European researchers from a wide range of disciplines including natural sciences, social sciences, engineering and medical/health sciences found that the majority of respondents were unaware of the concept of open science (O'Carroll et al., 2017). This survey also found that early career researchers (ECRs) have less knowledge of open science policies and practices compared to senior researchers, but reasons for this were not explored. In addition, ECRs are often heavily involved in research data collection and analyses, but often have less autonomy for research decision-making (Allen & Mehler, 2019; Farnham et al., 2017). As such, the particular barriers and facilitators to practicing open science may well be different for researchers at the beginning of their research career than for those at a later stage, and warrant in-depth exploration.
Achieving a transition to open science practices within health research is complex, and requires a holistic approach to behaviour change that considers multiple individual, institutional and sociocultural level factors. Previous research has found that data sharing is influenced by individual factors, such as authors' attitudes, past behaviours, perceived social norms and abilities, as well as broader factors such as journal data sharing policies and institutional supports and career incentives (Kim & Stanton, 2016; Zenk-Möltgen et al., 2018). However, many studies have focused on specific singular aspects of open science such as data sharing. Moreover, few studies have explored the perceptions, benefits and challenges of practicing open science among ECRs specifically, nor have they explored specific challenges and opportunities that may exist within health research. In addition, existing studies are predominantly quantitative in nature and many have not sought to qualitatively explore factors at the depth needed to gain insight into and understanding of this phenomenon. We aimed to explore the perceptions and experiences of open science for ECRs working in health research. We also specifically sought to explore the barriers, facilitators and factors influencing their practice of open science activities.

Ethical approval and study protocol
Ethical approval for the study was obtained from the NUI Galway Research Ethics Committee. Written informed consent was obtained from all study participants prior to data collection. The study protocol is accessible at https://doi.org/10.17605/OSF.IO/PKREN (Zecevic et al., 2020). As the qualitative method had priority and no reporting criteria for mixed methods research exist, this study is reported in accordance with COREQ criteria (Extended data: Appendix 1 (Zecevic et al., 2020)).

Study sample and setting
Study participants were a convenience sample recruited from a two-day introductory training workshop on open science, which was held in NUI Galway (Republic of Ireland) in April 2019 for early career researchers funded by the Irish Health Research Board. Participants self-identified as ECRs when registering for the event, with no restrictions placed on eligibility. There is no unanimously accepted definition of ECR; however, previous research has defined an ECR as 'one who is currently within their first five years of academic or other research-related employment allowing uninterrupted, stable research development following completion of their postgraduate research training' (Bazeley, 2003).

Study design
We used a convergent mixed methods design to elicit a comprehensive and holistic understanding of the research question (Creswell et al., 2011). A convergent mixed methods design typically requires qualitative and quantitative data from different sources to be collected, analysed separately and brought together and integrated during the interpretation phase (Creswell et al., 2011).
Participants provided quantitative data via questionnaires and were subsequently followed up with individual semi-structured qualitative interviews. In our study, participants had self-selected into the two-day workshop, with a further level of self-selection to participate in this study; interviewees may therefore not have been truly representative of the general population of ECRs. As such, the qualitative data provided comprehensive insights into their experiences and perceptions of open science, with the quantitative data used to further describe the characteristics, beliefs and experiences of this sample and to add further context to the qualitative findings. We originally intended (as outlined in the study protocol) to integrate quantitative and qualitative data regarding the factors influencing open science practices using triangulation methods (Farmer et al., 2006). However, upon completion of analysis for each dataset this was deemed largely unnecessary and uninformative: because of the semi-structured nature and conversational flow of the interviews, not all interviewees spoke about specific survey items.

Quantitative data collection
Participants completed a study questionnaire before and after the workshop. The content and structure of the questionnaire were informed by the 2017 European open science survey (O'Carroll et al., 2017). The pre-workshop questionnaire collected data on participant demographics, such as gender, age and discipline. Both pre- and post-workshop questionnaires (Extended data: Appendix 2 (Zecevic et al., 2020)) included closed and open-ended questions exploring the knowledge and awareness of open science components and initiatives among early career health researchers, as well as their perceptions of the barriers and facilitators influencing their practice of open science activities. As interviews were carried out after the workshop, only data regarding knowledge, awareness and influencing factors from the post-workshop questionnaire were used in this study, as these were perceived to be more relevant for characterising the sample. Pre-post questionnaire data were reported separately as part of a workshop evaluation for the funders (Toomey, 2019).

Qualitative data collection
Individual semi-structured interviews were carried out either in person or by telephone, according to participant preference. Interview duration ranged from 13 to 34 minutes, with an average of 21 minutes. All participants were interviewed within three weeks of the workshop, and interviews were facilitated by one or two members of the research team (KZ, CH). Interviewers may have been known to participants from attendance at the workshop, but were introduced during interviews as independent researchers with no bias or hidden interests in the topic. A topic guide was developed by an experienced qualitative researcher (CH) with input from KZ and ET, and used to structure the interviews (Extended data: Appendix 3 (Zecevic et al., 2020)). The topic guide specifically explored participants' understanding and experience of open science and their perceptions of barriers and enablers to practicing open science, with specific probes to enable deeper exploration of the topic in question. Interviews were audio-recorded and transcribed verbatim. Member checking of transcripts was not conducted due to time constraints and also due to evidence suggesting the benefits of member checking for verifying accuracy of transcripts may be relatively small (Hagens et al., 2009).

Data analysis
Basic descriptive statistical analyses including percentage distributions and median calculations were used to describe the closed-ended questionnaire data. Data analysis was conducted using Microsoft Office Excel. Open-ended questions were analysed by coding answers using themes identified during the qualitative analysis, whilst remaining open to the potential for iterative generation of new themes.
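The descriptive statistics described above can be illustrated with a minimal sketch. The study itself used Microsoft Office Excel; the Python below is only an assumed equivalent, and the response values, the five-point scale and the function name are hypothetical illustrations rather than data or procedures from the study.

```python
from collections import Counter
from statistics import median


def percentage_distribution(responses):
    """Return each response category's share of all responses, in percent."""
    counts = Counter(responses)
    total = len(responses)
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical answers from 14 respondents on a 1-5 agreement scale
# (illustrative values only, not data from the study).
hypothetical_scores = [4, 5, 3, 4, 4, 2, 5, 4, 3, 5, 4, 4, 1, 5]

distribution = percentage_distribution(hypothetical_scores)
median_score = median(hypothetical_scores)

# Print categories in ascending order with their percentage share.
for category in sorted(distribution):
    print(f"score {category}: {distribution[category]:.1f}%")
print(f"median score: {median_score}")
```

With a sample of this size, percentages and medians describe the respondents only and are not intended to support inference about the wider ECR population.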
Interviews were analysed using thematic analysis (Braun & Clarke, 2006), which was facilitated within NVivo 12 qualitative management software. Thematic analysis is an inductive approach to analysis, thus moving beyond description into interpretation and "telling a story" of the data (Clarke & Braun, 2018, p. 106). First, transcripts were read several times by one member of the research team (KZ). The researcher then coded statements and citations from the interviews into nodes. Each node was named according to the content of the citations it encapsulated. If a citation did not fit in any of the existing nodes, the researcher created a new node and named it appropriately. The coding was an iterative process, and the researcher went back and forth between the transcripts and the formed nodes, sometimes re-naming a node to better reflect its content.
Once this initial round of coding was complete, the NVivo file was sent to a second researcher (CH) to review the nodes. Initial themes were then generated and refined by the two researchers through iterative cycles of discussion and review. The final themes, subthemes and their descriptions were then reviewed by a third researcher (ET) alongside the transcripts. This third researcher acted as a 'critical friend', offering critical feedback on interpretations of the data and encouraging reflexivity by providing a "theoretical sounding board" (Smith & McGannon, 2018, p. 113).

Rigour
A number of strategies were employed to ensure the study was carried out in a rigorous and transparent way. Firstly, as outlined above, the research team agreed the coding and theme development from the qualitative phase. This exercise is known as peer de-briefing and is used to ensure the data are represented fairly in the developed themes and to minimise researcher bias (Houghton et al., 2013). Secondly, we created a codebook within QSR NVivo to demonstrate the dependability of our findings (Extended data: Appendix 4 (Zecevic et al., 2020)). Using the coding query function, we were able to illustrate the density of coded references from each participant across all subthemes, emphasising that our findings were grounded in the data (Extended data: Appendix 5 (Zecevic et al., 2020)). This strategy also enhances transparency without the privacy concerns of publishing raw transcripts (Tsai et al., 2016).
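The idea of a coding density matrix can be sketched in a few lines of Python. This is not NVivo's coding query function, only an assumed plain-Python analogue of its output: a count of coded references for each participant and subtheme. The participant identifiers, the coded references and the function name are hypothetical; the subtheme names are those reported in this study.

```python
from collections import Counter


def coding_density(coded_references, participants, subthemes):
    """Count coded references per participant (rows) and subtheme (columns)."""
    counts = Counter(coded_references)
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {p: {s: counts[(p, s)] for s in subthemes} for p in participants}


# Hypothetical (participant, subtheme) pairs, one per coded reference.
coded_references = [
    ("P01", "Striving to be open"),
    ("P01", "Striving to be open"),
    ("P01", "The 'why' of open science"),
    ("P02", "Striving to be open"),
    ("P02", "Increased accountability and the challenges of transparency"),
]
participants = ["P01", "P02"]
subthemes = [
    "The 'why' of open science",
    "Increased accountability and the challenges of transparency",
    "Striving to be open",
]

matrix = coding_density(coded_references, participants, subthemes)

# One row per participant, listing reference counts in subtheme order.
for participant, row in matrix.items():
    print(participant, [row[s] for s in subthemes])
```

A matrix of this kind shows at a glance whether a subtheme rests on many participants or only a few, without exposing any transcript text.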

Participant characteristics
Fourteen of the 40 workshop participants agreed to participate in the mixed methods study. Ten were female. Four participants had obtained a PhD in the previous 1-2 years, one was 6 years post-PhD, seven were undertaking a PhD at the time of participating in the study and two did not have a PhD. Participant demographics are further described in Table 1. Overall, interviewees felt they did not have much experience with practicing specific open science activities, but the experiences they had were predominantly positive. Most interviewees had pre-registered a study or published a study protocol, and some of them were planning to do so at the next opportunity.

The 'why' of open science
Interviewees generally agreed open science has many benefits, with some less obvious than others. For example, participants felt that practices like protocol publication would enable others to see what researchers planned to do, in a timely and accessible way. This provided opportunities for peer-reviewers to spot potential errors at an early stage of the research cycle. Pre-registering a study or publishing a protocol also helped in the planning of the research and thinking more thoroughly about the adopted study procedures.
"… I found publishing a protocol really helpful because it meant I dedicated time at the outset of a project, i.e. the systematic review to plan and think ahead about the
Increased accountability and the challenges of transparency
• Increased accountability deriving from transparency was commonly identified as a barrier to open science as an ECR, for example, the fear of being exposed, making mistakes or data confidentiality breaches
○ On the other hand, participants felt that increased accountability was also good, for example, improving the peer review process and ensuring reviewers are more constructive, and improving the quality of the overall research
• Nefarious practices, such as the fear of others misusing available data, or fear of having research ideas being taken from them (i.e. being 'scooped'), were occasionally identified as a barrier
Striving to be open
• All identified the importance and need for more training and resources to support ECRs and all researchers, and the need for this to be integrated into existing systems and driven from the top, e.g. institutional buy-in
Interviewees identified a resistance to change in the current culture, particularly among some of their more senior colleagues.
Interviewees reported that more senior colleagues may not see the value in changing an established system that works, that they are used to and where they are also able to delegate certain tasks they may not see as important. However, at the same time interviewees felt the academic pressures and many challenges to doing open science were similar regardless of being at an early or more advanced career stage:
In addition to the desire to protect themselves, interviewees also expressed a strong need to protect others, for example, research participants. A breach of confidentiality was considered of great importance, particularly when dealing with qualitative data and clinical data. The protection of research participants was an important factor influencing how interviewees felt about the practice of sharing data. They felt that openly sharing data was a key concern especially when treating rare conditions or narrow, very specific groups of people. Interviewees had legal, ethical and personal concerns around sharing data and making data accessible to anyone. They felt that there was still room for improvement and further consideration regarding how to achieve a balance between data sharing and confidentiality:
ECRs have a particularly important role to play in the future of research. Recently described as the 'harbingers of change' (Nicholas et al., 2019), ECRs have the potential to influence and transform future research practices (Stürmer et al., 2017). However, ECRs experience a number of challenges to practicing open science, which may be heightened due to their circumstances. In our study, participants particularly highlighted the pressures already being faced to publish in order to establish their career.
Although the issue of 'publish or perish' is by no means a challenge unique to ECRs, it is perhaps particularly problematic during a stage where the development and refining of research skills and the process of learning itself is crucial, and potentially more important than the final output. This is also the point at which we are socialised to the scientific practices that we will continue throughout our careers (Felt et al., 2013

Implications for Practice, Policy and Research
It is clear that appropriate support systems are needed to help ECRs to become the new generation of researchers and develop a culture that embraces open science. For example, our participants emphasised the importance of the availability of training events, education and resources within research active institutions to improve open science awareness, knowledge and skills for researchers. However, both formal and informal undergraduate and postgraduate research training and supervision in these institutions should also endeavour to include a holistic focus on the bigger picture beyond knowledge and skills, e.g., by focusing on the benefits and impact of transparent

Strengths and weaknesses
This study has some limitations. It is important to note that our interviewees were recruited from participants of a two-day open science training workshop based in Ireland, and also volunteered to participate in interviews. This implies that our study sample is a sub-sample of the wider ECR target population who had a pre-existing interest in open science, and who may also have been pre-equipped with awareness and a deeper understanding of open science challenges and opportunities. Taking part in the workshop inevitably also influenced their knowledge about open science, which should be taken into consideration when interpreting the study findings. However, this also means that participants were in a position to provide substantial depth and richness of insight into an under-explored area of research. As such, these findings will provide useful comparison data for future similar studies or replications amongst other samples of ECRs. The rigour with which we conducted data collection and analysis also enhances the transparency and transferability of our findings.
Ironically, as a team of ECRs, the challenges we encountered with making data from this study available mirror those described in the study findings. Given the size and unique nature of our sample, de-identification of qualitative study transcripts was not deemed possible. However, as outlined in our methods section, we used NVivo coding queries as advocated by Tsai et al. (2016) to provide a clear audit trail and facilitate transparency (Houghton et al., 2013). In addition, although de-identification of the quantitative data was deemed more easily achievable, the nature of our ethics approval meant that explicit consent for access to data beyond the research team had not been obtained. On reflection, this was deemed mostly due to a lack of forward planning for data sharing, with priority given to data confidentiality and participant protection. As such, our experience highlights another example of the need for better training and awareness for researchers in collaboration with other bodies such as ethics committees and data protection offices to facilitate appropriate data sharing.

Conclusion
Our study aimed to explore the views and experiences

Data availability
Underlying data
Given the size and unique nature of our sample, de-identification of qualitative study transcripts was not deemed possible.
As outlined in our methods section, we used NVivo coding queries as advocated by Tsai et al. (2016) to provide a clear audit trail and permit the running of coding and matrix queries to facilitate transparency (Houghton et al., 2013) (see Extended data). The data cannot be shared via an alternative route of closed access, since ethics approval did not provide explicit consent for access to the data beyond the research team. For the quantitative data, the information sheet given to participants clearly stated that their data would not be shared with anyone other than the research team. Aggregated results are provided in Extended data.

Emily Arden Close
Department of Psychology, Bournemouth University, Bournemouth, UK
This extremely well-written paper is very timely. As the authors state, with the growing global movement towards open science it is important to ensure researchers are prepared for it from early in their careers. Although the findings are not representative of the general population of ECRs, the finding that even ECRs who had a keen interest in open science felt they were not fully supported to practice it is extremely valuable for identifying ways this issue can be overcome. The qualitative research in particular was conducted with a high level of rigour.
My only criticism is that the suggestions for how to address the barriers mentioned in the article are not very specific. For example, the authors state: "It is clear that appropriate support systems are needed to help ECRs to become the new generation of researchers and develop a culture that embraces open science." Some suggestions about how these "appropriate support systems" would look would be very helpful. Similarly, where the authors state "A concerted approach is needed from all stakeholders including research funders, university and institutional management and researchers to actively improve how research is valued, how career progression is evaluated and to explicitly seek engagement in open science activities," some specific suggestions would be extremely helpful.
In addition, I have a number of minor comments about the wording below:
Abstract: Background: I would change the wording to "Limited research has been conducted on the barriers and enablers to practicing open science for early career researchers."
Introduction: This is extremely well written and concisely sets out the background to the study.
Qualitative data collection: Change wording from "over the phone" to "by telephone".
In this article, the authors conducted a mixed methods research study, interviewing 14 students and also asking them to complete a survey. The results indicated that the students support OS practices but some barriers, particularly institutional or social ones, may prevent their use.
There's much to like in this article and frankly little to critique. Given its qualitative nature, the sample and sampling design seem fine. The authors clearly have developed the themes and the analysis process seemed sound. I enjoyed reading some of the quotes, too. One small critique: some of the quotes could probably be "cleaned up" a bit. Not changed substantively, but our spoken words don't always convey exactly what the written word tries to. So it might be useful to clean them up some.
So the biggest critique I can surmise is simply with the conclusions and discussion. I'd like to see the authors try and provide even greater suggestions about how to address the barriers or issues the respondents raised. How will we go from the start of the OS movement, where we are now, to the next generation? How can we ask ECRs to do more and be better than the last generation, and what will it take? I'm not asking for world-breaking ideas, but a few more suggestions, written plainly and in their own section at the end, might go a long way here.
I appreciate the authors' hard work on this manuscript. I look forward to reading it again soon.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results?
(Egan et al., 2020), at different levels (e.g. undergraduate, postgraduate, postdoctoral) to help overcome the obstacles to open science faced by ECRs working in health research and across other disciplines and improve open science practices.