Keywords
Primary Care (Primary Health Care), General Practice (Family Practice), Digital Scribes (MeSH heading), Artificial Intelligence (MeSH heading), Non-specific symptoms
Non-specific symptoms (NSS) such as unexplained fatigue, weight loss, or abdominal discomfort are common in general practice and may herald serious disease, including cancer. These symptoms are less consistently recorded than "alarm" symptoms, partly because time-pressured consultations limit comprehensive documentation. AI-enabled digital scribes are increasingly used in routine consultations and can be configured with templates to surface specified symptom groups, but their acceptability and practical value for NSS documentation in primary care are unknown.
To assess the acceptability and feasibility of an AI digital scribe template enhanced for NSS documentation in Irish general practice, and to describe patterns of NSS documentation in template-generated consultation notes.
Mixed-methods feasibility study in general practices within the ARQ Practice-Based Research Network in Ireland, all of which routinely use the Heidi™ AI medical scribe.
In Phase 1 we will iteratively develop an NSS-focused template (covering nine target symptoms) and test it on publicly available primary care consultation data. In Phase 2, five purposively sampled GPs will deploy the template during routine consultations over four weeks. In Phase 3, all GPs in the network using Heidi™ (up to n=30) will be invited to complete an online acceptability survey, with semi-structured interviews for pilot participants (n=5). In Phase 4, a pseudonymised chart review of clinical notes will describe template utilisation, GP editing behaviour, and NSS documentation prevalence. Quantitative data will be analysed descriptively; qualitative data will be analysed using the Framework Method, and the two strands will be integrated using convergence coding matrices.
Primary feasibility outcomes are: recruitment and retention of GPs; template use during routine consultations; template acceptability; and completeness of routine data extraction. Findings will inform refinement of the NSS template and the design of subsequent evaluative studies examining clinical impact. Study materials will be available via the Open Science Framework.
Non-specific symptoms (NSS), including unexplained fatigue, weight loss, abdominal discomfort, and generalised pain, are frequent in general practice and are most often benign, but in a minority of patients they represent the earliest presentation of serious conditions, including poor-prognosis cancers1–3. Non-specific symptoms may be associated with multiple cancer types: for example, abdominal discomfort can indicate eight different malignancies in men and nine in women4, and 11–35% of patients presenting with NSS who are referred to a rapid diagnostic cancer pathway ultimately receive a cancer diagnosis5. Because NSS are non-localising and overlap with common self-limiting problems, they are harder to triage and are less consistently documented than classic "alarm" symptoms6,7.
In time-pressured general practice, comprehensive documentation of subtle, non-specific presentations is challenging. Globally, 50% of patients receive consultations of five minutes or less8, and diagnostic errors in primary care are strongly associated with documentation failures, with 78.9% involving breakdowns in patient-practitioner encounters related to incomplete history-taking and examination recording9. When NSS remain undocumented, opportunities for pattern recognition across serial consultations or when symptoms evolve may be missed, potentially delaying investigation for serious disease. The extent of NSS under-documentation in routine Irish general practice, and its clinical consequences, remain poorly quantified.
"Ambient" AI-enabled digital scribe tools, which transcribe consultations and generate draft notes using large language models, are now entering routine primary care workflows10,11. Early systematic reviews suggest digital scribes can reduce documentation time and improve note completeness, though accuracy varies substantially in complex primary care scenarios12–14. Most evidence derives from hospital settings or highly structured clinical encounters, with AI implementation in primary care remaining predominantly developmental with minimal real-world adoption beyond pilot studies15,16. Whether digital scribes can reliably capture clinically salient information, including subtle, potentially significant symptoms in routine primary care consultations remains uncertain.
Many digital scribe systems allow users to add structured templates that instruct the model to surface specified clinical features17,18. Structured prompts of this kind can substantially increase documentation of essential clinical elements; for example, clinical alerting systems increase problem documentation 3.4-fold19. When applied to digital scribes, templates function as instruction sets that guide the language model to identify and surface specific symptoms mentioned during the consultation. This represents a form of passive clinical decision support: rather than requiring active GP recall or manual entry during already time-pressured consultations, the system surfaces potentially relevant information for clinician review and verification. Targeted template customisation could therefore offer a low-friction way to standardise NSS documentation. However, whether such templates generate notes that are acceptable to GPs, fit within everyday workflow, and contain NSS in a form that is usable for research or quality improvement in real Irish practice settings has not been evaluated.
Large language models generate probabilistic outputs and may produce inaccurate or hallucinated content20,21. When applied to generating clinical notes, large language models produce hallucinations in 42% of summaries, omit clinically relevant information in 47%, and generate entirely error-free content in only 33% of cases22,23. These high error rates necessitate rigorous human oversight24, and the fluent, authoritative tone of LLM-generated text may foster over-reliance, particularly when time-pressured clinicians review outputs rapidly25. For NSS documentation specifically, false positives (flagging symptoms not mentioned) could create unnecessary clinical work or patient anxiety, whilst false negatives (missing genuine NSS) would undermine the intervention's purpose. Any implementation of AI-assisted documentation must therefore balance potential benefits (improved efficiency and systematic symptom capture) against the risks of error and the need for clinician verification of AI-generated content.
Evidence regarding digital scribe use in routine primary care remains limited, with few studies examining their application to specific clinical documentation challenges such as NSS capture11,14. Existing evaluations focus predominantly on efficiency metrics (time saved, note length) rather than documentation accuracy, clinical utility, or acceptability to clinicians. It is unclear whether template-enhanced digital scribes can reliably surface NSS in real-world consultations, whether GPs find such tools acceptable and practical, and whether systematically surfaced NSS would influence clinical decision-making. Furthermore, no studies have examined digital scribe implementation in Irish general practice, where consultation patterns and practice autonomy differ from other healthcare systems26. These gaps highlight the need for a structured approach to evaluating NSS documentation. Figure 1 illustrates the rationale for the study intervention. The left panel shows the current challenge where time-pressured consultations and the subtle nature of NSS can lead to inconsistent documentation, potentially delaying diagnosis. The middle panel depicts the proposed intervention: an AI digital scribe enabled with a passive NSS template designed to systematically surface these symptoms for clinician review during routine workflows. The right panel outlines the long-term goal, where improved capture of NSS may facilitate earlier investigation and referral pathways, ultimately improving diagnostic timelines.
We therefore designed a mixed-methods feasibility study to evaluate an NSS-enhanced template for an AI digital scribe already in routine use within an Irish practice-based research network. Our specific objectives are: first, to develop and internally test an NSS-enhanced template using simulated consultation data; second, to evaluate GP perceptions of acceptability, usability, workflow integration, and clinical utility of both the digital scribe and the NSS-enhanced template; third, to describe NSS documentation prevalence in consultation notes generated using the NSS-enhanced template; and fourth, to assess the feasibility of using routinely collected consultation notes to evaluate NSS documentation patterns in future studies.
This mixed-methods feasibility study evaluates the acceptability and implementation of an AI-enabled digital scribe with an NSS-enhanced template in Irish general practice. We employ a convergent parallel mixed-methods design27, integrating quantitative survey data, qualitative interview findings, and descriptive chart review analyses to provide equally weighted, complementary perspectives on feasibility, acceptability, and early signals of impact on NSS documentation. A mixed-methods design is required because feasibility, acceptability, workflow integration, and documentation patterns are multidimensional constructs that cannot be adequately captured using a single methodological approach. Quantitative data allow description of utilisation patterns and documentation outputs, while qualitative inquiry provides explanatory insight into clinician behaviour, contextual influences, and perceptions of utility. Integrating these strands is necessary to address the overarching feasibility research question, where understanding both "whether it works" and "how it works in context" is essential for informing subsequent definitive evaluation28 (Supplementary material 1).
The study comprises four sequential phases: (1) template development and internal testing, (2) pilot deployment in routine practice, (3) acceptability evaluation through surveys and interviews, and (4) retrospective analysis of documentation patterns. This phased approach allows iterative refinement whilst generating preliminary evidence on implementation barriers and facilitators prior to potential scale-up. The study is reported according to the Good Reporting of A Mixed Methods Study (GRAMMS) framework29.
This study will be conducted within practices in the ARQ Practice-Based Research Network, a collaboration between the Royal College of Surgeons in Ireland (RCSI) and Centric Health, Ireland's largest primary care provider. The network comprises over 70 general practices across the Republic of Ireland, providing routine primary care whilst participating in coordinated clinical and health services research. Participating practices serve diverse populations across urban, suburban, and rural settings, with registered patient populations ranging from 3,000 to 15,000 per practice.
All participating practices have implemented the Heidi™ digital scribe (Heidi Health, Melbourne, Australia) as part of routine clinical workflow. Heidi™ is a commercial AI-powered medical scribe that uses automatic speech recognition and large language model technology to transcribe patient-clinician interactions into structured clinical notes. The platform operates through a desktop or mobile application, recording consultation audio (with patient consent), generating draft clinical notes, and allowing clinician review and editing before transfer to the electronic medical record. Heidi™ supports customisable templates that function as instruction sets for the underlying language model, specifying note structure, terminology, and emphasis. This existing implementation provides an opportunity to evaluate template-enhanced NSS documentation without requiring de novo digital scribe adoption.
Practice and clinician sampling
We will use purposive sampling30 to select practices and clinicians across study phases. For Phase 2 (pilot deployment), we will invite five general practitioners (GPs) from practices within the ARQ network where Heidi™ has been implemented and where GPs report routine use (defined as ≥10 consultations per week using the platform). Selection will prioritise diversity in practice setting, clinician experience (years since qualification), and baseline Heidi™ usage patterns to capture varied implementation contexts. This sample size aligns with recommendations for feasibility studies in primary care quality improvement interventions, where 5–10 participants provide sufficient data to identify major implementation barriers whilst remaining pragmatically achievable31.
For Phase 3 (acceptability evaluation), all GPs within the ARQ network who routinely use Heidi™ will be invited to complete a survey assessing general digital scribe acceptability and usage patterns. This broader sampling frame provides contextual data on digital scribe implementation across the network and allows pilot participants' experiences to be situated within wider patterns of technology adoption. All GPs who participated in the NSS template pilot will also be invited to participate in semi-structured interviews32. Participants will provide written informed consent before taking part in the study.
Inclusion and exclusion criteria
Eligible clinicians for pilot (Phase 2): Fully qualified GPs employed within ARQ network practices who routinely use Heidi™ for clinical documentation and who will remain in post for the full four-week pilot period.
Exclusions: Locum GPs with anticipated tenure <6 weeks, GP trainees, and clinicians who do not routinely use Heidi™. These exclusions reduce heterogeneity in baseline digital scribe familiarity for this early-phase feasibility work.
Eligible clinicians for survey (Phase 3): All GPs within ARQ network practices who routinely use Heidi™ for clinical documentation.
No restrictions are placed on consultation type, patient demographics, or presenting complaint.
Overview and rationale
Phase 1 involves iterative development of an NSS-enhanced template: a structured instruction set that guides Heidi™'s language model to systematically identify and document NSS mentioned during consultations. The NSS template retains Heidi™'s standard SOAP (Subjective, Objective, Assessment, Plan) or 'Issues List' note format, with specific instructions appended directing the model to surface and summarise NSS in a dedicated "Non-Specific Symptoms" section, and to note the absence of NSS when none are mentioned. This appended-section approach was designed to surface NSS without disrupting routine clinical note formatting and preferences.
Development process
Template development will proceed through iterative cycles of design, testing, and refinement, guided by the research team (comprising general practitioners, computer scientists, and implementation researchers) and informed by the Primock57 dataset—a publicly available collection of 57 simulated primary care consultation audio recordings with corresponding transcripts33. Whilst modest in size, this dataset provides standardised test cases spanning common primary care presentations, allowing controlled assessment of template performance without patient involvement. Each development cycle will involve: (1) defining or refining template instructions (symptom definitions, formatting, placement within note structure); (2) processing Primock57 consultations through Heidi™ using the current template version; (3) GP researcher review of outputs against consultation transcripts to assess whether NSS mentioned in conversations are surfaced in generated notes; and (4) team discussion to identify instruction ambiguities, formatting issues, or systematic errors requiring revision.
Target non-specific symptoms
Based on epidemiological evidence linking specific NSS to cancer risk2 and expert consensus within the research team, we will target nine NSS for documentation: unexplained fatigue; unintentional weight loss; persistent pain (location unspecified or non-localised); persistent, unexplained abdominal discomfort; persistent bloating; unexplained appetite change; persistent nausea; night sweats; and generalised itch. The template will include qualifying modifiers for each symptom, specifying that the persistence or unexplained nature of the symptom must be explicit or implied.
Performance assessment
Template performance will be assessed descriptively rather than through formal validation, recognising the exploratory nature of this development work and the constraints of the small test dataset. GP researchers will assess whether: (1) NSS mentioned in the consultation are captured in the generated note (detection completeness); (2) NSS documented in the note were mentioned in the consultation (accuracy); and (3) note formatting and readability are clinically acceptable. Disagreements will be resolved through discussion. We will use an indicative benchmark of ≥70% accuracy to guide template refinement. This threshold was set through expert consensus within the research team, recognising that excessive false positives (>30% of documented NSS) would create unacceptable clinical burden. The threshold is provisional and not statistically validated.
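As an illustration only, the two descriptive measures above could be tallied as in the following Python sketch. The `CaseReview` structure, the function name, and the example symptom labels are hypothetical and are not part of the study materials; the sketch assumes each GP reviewer records, per test consultation, the set of NSS mentioned and the set of NSS surfaced in the generated note.

```python
from __future__ import annotations

from dataclasses import dataclass

ACCURACY_BENCHMARK = 0.70  # indicative benchmark stated in the protocol


@dataclass
class CaseReview:
    """One reviewer judgement for a test consultation (hypothetical structure)."""
    mentioned: set[str]   # NSS the reviewer identified in the consultation
    documented: set[str]  # NSS surfaced in the template-generated note


def summarise_reviews(reviews: list[CaseReview]) -> dict[str, float]:
    """Pool counts across consultations and return both descriptive measures."""
    captured = sum(len(r.mentioned & r.documented) for r in reviews)
    mentioned = sum(len(r.mentioned) for r in reviews)
    documented = sum(len(r.documented) for r in reviews)
    return {
        # share of mentioned NSS that appear in the note
        "detection_completeness": captured / mentioned if mentioned else float("nan"),
        # share of documented NSS that were actually mentioned
        "accuracy": captured / documented if documented else float("nan"),
    }


# Invented example data, for illustration only.
reviews = [
    CaseReview({"fatigue", "weight loss"}, {"fatigue"}),      # one NSS missed
    CaseReview({"night sweats"}, {"night sweats", "nausea"}),  # one false positive
    CaseReview(set(), set()),                                  # no NSS, none flagged
]
summary = summarise_reviews(reviews)
meets_benchmark = summary["accuracy"] >= ACCURACY_BENCHMARK
print(summary, meets_benchmark)
```

Pooling counts across consultations, rather than averaging per-consultation proportions, is one possible choice; it avoids undefined ratios for consultations in which no NSS are mentioned.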
Development outputs
Phase 1 will produce: (1) a finalised NSS-enhanced template suitable for pilot testing; (2) documentation of template instruction text and formatting specifications; (3) technical notes on Heidi™ system settings that influence output (e.g., detail level, processing mode); and (4) descriptive data on template performance using Primock57 test cases. This phase involves no patient participants and serves solely to establish a template version suitable for real-world pilot deployment.
Pilot implementation
Five purposively selected GPs will be invited to use the NSS-enhanced template during routine consultations over a four-week period. This timeframe balances the need for sufficient exposure to assess implementation barriers (shorter periods risk insufficient experience) against feasibility constraints and potential pilot fatigue (longer periods increase dropout risk in this feasibility work). Participating GPs will receive a structured onboarding package comprising: (1) a two-page information sheet explaining study aims, digital scribe functionality, large language model limitations, and participant rights; (2) step-by-step instructions for installing the NSS-enhanced template as a selectable option within Heidi™; (3) a brief video demonstration (5–7 minutes); and (4) access to technical support from the research team via email or telephone. We will also provide guidance on patient consent for audio recording, emphasising that Heidi™ use requires explicit verbal consent documented in the medical record, consistent with Irish data protection requirements.
Template usage and clinical workflow
GPs will be instructed to use the NSS-enhanced template as they judge appropriate during the four-week pilot. They may use it for all consultations, specific consultation types, or selected patients, and may switch between the NSS template and standard templates at will. This pragmatic approach reflects real-world implementation where clinicians exercise judgement about tool application. However, it introduces heterogeneity in exposure that we will quantify and explore qualitatively in Phase 3. The template generates a draft consultation note with a dedicated NSS section appended. Clinicians retain full editorial control: they may accept, edit, delete, or ignore the NSS content before finalising the note. This design maintains clinician agency and patient safety whilst allowing us to observe which NSS content is retained (suggesting clinical utility) versus removed (suggesting irrelevance or inaccuracy). Each note generated using the NSS template during the pilot period will include a system-generated metadata marker ("pilot participation label") and a template version identifier, enabling retrospective identification for Phase 4 analysis.
Fidelity monitoring
We will monitor pilot fidelity by tracking: (1) number of participating GPs who complete the four-week period; (2) number of consultations where the NSS template is used (via pilot participation labels); (3) distribution of usage across GPs and across the four-week period; and (4) any reported technical issues or adverse events. GPs will be asked to document any problems encountered via a brief online incident log. Mid-pilot modifications to the template are not anticipated; however, any changes will be documented with rationale, and version control will be maintained through template version identifiers embedded in notes.
Survey component
Survey development and content
We will develop a structured online survey assessing acceptability and perceived utility of the Heidi™ digital scribe in routine practice, with additional questions for GPs who piloted the NSS-enhanced template. Survey development will be informed by the Theoretical Framework of Acceptability34, which identifies seven component constructs: affective attitude, burden, ethicality, intervention coherence, opportunity costs, perceived effectiveness, and self-efficacy. We will adapt items from validated technology acceptance instruments where available35,36 and develop bespoke items for NSS-specific content.
The survey comprises two parts with conditional branching. Part 1 assesses general Heidi™ acceptability and will be completed by all respondents, covering: practice context (years in practice, practice setting, patient volume); baseline documentation practices; frequency and patterns of Heidi™ use; perceived utility for routine documentation; impact on consultation conduct and efficiency; trust in AI-generated content; perceived burden and time savings; overall satisfaction with Heidi™; intention to continue use; and experienced barriers and enablers to adoption.
Respondents will be asked whether they used the NSS-enhanced template during the four-week pilot period. Those responding affirmatively will be directed to Part 2, which includes additional items specific to the NSS-enhanced template: frequency of template use during eligible consultations; clarity and clinical relevance of NSS prompts; ease of workflow integration; perceived impact on NSS consideration and documentation; trust in template-generated NSS content; control over the NSS section; comparative burden relative to standard templates; overall template satisfaction; intention to continue template use; and likelihood of recommending the template to colleagues.
Items will use five-point Likert scales (1=Strongly Disagree to 5=Strongly Agree) for agreement statements, five-point directional scales for comparative questions (e.g., "Compared to manual documentation, Heidi™ is: Much worse / Somewhat worse / About the same / Somewhat better / Much better"), and "check all that apply" formats for barriers and enablers. One optional free-text item in each part will invite general feedback (Part 1) and template-specific suggestions for improvement (Part 2). The survey will be piloted with two GPs outside the study sample and refined based on feedback before deployment.
Survey administration and sampling
Following the four-week pilot period, all GPs across participating practices within the ARQ network who use Heidi™ will be invited via email to complete the online survey (Microsoft Forms). This broader sampling frame (beyond the five pilot participants) serves two purposes: first, it provides contextual data on general digital scribe acceptability and usage patterns in the network; second, it captures experiences of the pilot participants regarding both general Heidi™ use and the specific NSS-enhanced template. Based on network composition and Heidi™ adoption rates, we anticipate 20–30 eligible GPs, with an estimated response rate of 40–60% yielding 8–18 completed surveys for Part 1. Of these, we expect up to five respondents (the pilot participants) will have used the NSS template and will therefore complete Part 2. Up to two email reminders will be sent at weekly intervals. Survey responses will be anonymised, with no identifiable information collected beyond basic demographic and practice context variables. Completion time is estimated at 8–12 minutes for Part 1, with an additional 5–7 minutes for Part 2.
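The anticipated Part 1 yield follows from multiplying the bounds of the eligible population by the bounds of the response rate. A minimal Python sketch of that arithmetic is given below; the function name is illustrative.

```python
from __future__ import annotations


def expected_completions(
    eligible: tuple[int, int], response_rate: tuple[float, float]
) -> tuple[int, int]:
    """Lower and upper bounds on completed surveys (low x low, high x high)."""
    low = round(eligible[0] * response_rate[0])
    high = round(eligible[1] * response_rate[1])
    return low, high


# 20-30 eligible GPs and a 40-60% response rate, as stated in the protocol.
part1_low, part1_high = expected_completions((20, 30), (0.40, 0.60))
print(part1_low, part1_high)  # 8 18
```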
Interview component
Interview design and conduct
Semi-structured interviews will explore pilot participants' experiences, perceptions, and contextual factors influencing NSS template implementation in greater depth than survey data allow. Interviews will follow a semi-structured topic guide informed by survey domains and implementation science frameworks, particularly the Consolidated Framework for Implementation Research37. The topic guide will cover: motivations for participating in the pilot (characteristics of individuals); experiences using the NSS template in routine practice (innovation characteristics); perceived clinical utility and impact on consultation conduct (innovation characteristics and inner setting); workflow integration and practical challenges (inner setting); trust in template outputs and verification strategies (characteristics of individuals and innovation characteristics); views on NSS documentation importance; comparison with standard Heidi™ templates and manual documentation (innovation characteristics); unanticipated consequences or concerns (innovation characteristics and implementation process); and recommendations for template refinement or implementation strategy (implementation process). Interviews will be conducted by a trained qualitative researcher (CD or CM) either remotely (video call) or in person according to participant preference. Interviews will last approximately 30–40 minutes, be audio-recorded with consent, transcribed verbatim by a professional transcription service, and checked for accuracy by the interviewer. Transcripts will be pseudonymised, removing any identifying information about participants, patients, or specific practices.
Interview sampling and recruitment
All GPs who participated in the four-week NSS template pilot (n=5) will be invited to participate in interviews, regardless of whether they completed the survey. We will aim to interview all five pilot participants if willing; if fewer than five volunteer, we will interview all volunteers. If the first four interviews show clear thematic saturation, the point at which no substantially new themes emerge32, we may conclude data collection without a fifth interview. Invitation to interview will occur via email following survey closure, with up to two reminder emails. Participants will be offered flexibility in scheduling and format (video call or in-person) to maximise participation. A €50 voucher will be offered as compensation for time, recognising the additional burden of interview participation beyond survey completion.
Overview and data source
Following pilot completion, we will conduct a retrospective descriptive analysis of consultation notes generated during the pilot period. This analysis aims to describe NSS documentation prevalence and patterns, assess template utilisation, and evaluate the feasibility of extracting structured data from routine clinical notes for future research. Importantly, this is an observational descriptive analysis without a control group; we cannot attribute observed documentation patterns to template use per se, as we lack baseline or contemporaneous comparison data. Rather, this phase establishes what NSS documentation looks like when the template is used and assesses whether meaningful data extraction is feasible.
Data extraction and inclusion criteria
The research team will access pseudonymised consultation note data through the ARQ network's secure research data environment. Only GP-signed final consultation notes generated during the four-week pilot period will be eligible for data extraction. Notes must meet the following inclusion criteria: (1) contains a pilot participation label (system-generated marker confirming NSS template use); (2) dated within the four-week pilot period; and (3) not a duplicate entry. Notes failing any criterion will be excluded, with exclusion reasons documented.
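The three inclusion criteria could be applied in sequence as in the sketch below, which also records the exclusion reason for each rejected note. The field names (`note_id`, `pilot_label`, `consult_date`), the pilot dates, and the use of a repeated note identifier to detect duplicates are all assumptions made for illustration; the actual export format of the research data environment is not specified here, and notes are assumed to have been restricted to GP-signed final versions upstream.

```python
from datetime import date


def screen_notes(notes, pilot_start, pilot_end):
    """Apply the three inclusion criteria; return (eligible, exclusions)."""
    eligible, exclusions, seen_ids = [], [], set()
    for note in notes:
        if not note.get("pilot_label"):
            reason = "no pilot participation label"   # criterion 1
        elif not (pilot_start <= note["consult_date"] <= pilot_end):
            reason = "outside pilot period"           # criterion 2
        elif note["note_id"] in seen_ids:
            reason = "duplicate entry"                # criterion 3
        else:
            reason = None
        if reason is None:
            seen_ids.add(note["note_id"])
            eligible.append(note)
        else:
            exclusions.append((note["note_id"], reason))
    return eligible, exclusions


# Hypothetical four-week window and invented records, for illustration only.
PILOT_START, PILOT_END = date(2025, 1, 6), date(2025, 2, 2)
notes = [
    {"note_id": "n1", "pilot_label": True, "consult_date": date(2025, 1, 10)},
    {"note_id": "n2", "pilot_label": False, "consult_date": date(2025, 1, 11)},
    {"note_id": "n3", "pilot_label": True, "consult_date": date(2025, 3, 1)},
    {"note_id": "n1", "pilot_label": True, "consult_date": date(2025, 1, 10)},
]
eligible, exclusions = screen_notes(notes, PILOT_START, PILOT_END)
print(len(eligible), exclusions)
```

Returning the exclusions alongside the eligible notes keeps the documented exclusion reasons available for reporting the completeness of routine data extraction.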
From eligible notes, we will extract only pseudonymised note-level variables; no free-text narratives or patient-identifiable information will be exported from the secure environment. Extracted variables will include: GP identifier (pseudonymised), date of consultation, template version used, and presence/absence of each target NSS in the final GP-signed note (binary coding: documented or not documented). We will also code whether the NSS section was: (1) retained as generated; (2) edited by the GP; or (3) deleted entirely, where this can be reliably determined from evidence of modification.
Analytical approach
Descriptive analyses will summarise: (1) template utilisation (number and proportion of consultations using NSS template, distribution across GPs and over time); (2) NSS documentation prevalence (overall proportion of consultations where any NSS was documented, proportion documenting each specific NSS type); (3) GP editing behaviour (proportion of NSS sections retained unchanged, edited, or deleted); and (4) most frequently documented NSS types. Data will be analysed using Microsoft Excel and Python programming, with results reported as frequencies, proportions, and ranges. No inferential statistics will be applied given the descriptive nature of this feasibility work and absence of control data. We will conduct exploratory descriptive analyses to assess whether sensitivity- and specificity-style metrics could be applied to template performance using routine data. However, this requires external verification (e.g., independent review of consultation audio to confirm that NSS were actually discussed), which is beyond the scope of this feasibility study. We will report on the feasibility of such verification for future work.
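A minimal sketch of how these descriptive summaries might be computed in Python from the extracted note-level variables follows. The record layout (`gp_id`, `nss`, `nss_section`) is a hypothetical representation of the extracted variables, and the data are invented. The proportion of all consultations that used the template is not computed, because it would require a count of all consultations during the pilot, which these records do not contain.

```python
from collections import Counter


def describe(records):
    """Frequencies and proportions for eligible notes (one record per note)."""
    n = len(records)
    any_nss = sum(1 for r in records if r["nss"])
    by_type = Counter(s for r in records for s in r["nss"])
    editing = Counter(r["nss_section"] for r in records)
    per_gp = Counter(r["gp_id"] for r in records)
    return {
        "n_notes": n,
        "prop_any_nss": any_nss / n if n else float("nan"),
        # most_common() orders symptoms from most to least frequently documented
        "prop_by_type": {s: c / n for s, c in by_type.most_common()},
        "editing": {status: c / n for status, c in editing.items()},
        "notes_per_gp_range": (min(per_gp.values()), max(per_gp.values())) if per_gp else None,
    }


# Invented example records, for illustration only.
records = [
    {"gp_id": "GP1", "nss": ["fatigue"], "nss_section": "retained"},
    {"gp_id": "GP1", "nss": [], "nss_section": "deleted"},
    {"gp_id": "GP2", "nss": ["fatigue", "night sweats"], "nss_section": "edited"},
    {"gp_id": "GP2", "nss": [], "nss_section": "retained"},
]
note_summary = describe(records)
print(note_summary)
```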
Quantitative analysis
Survey data will be analysed using descriptive statistics. Likert scale responses, treated as numeric scores for descriptive purposes, will be summarised using means, standard deviations, medians, and ranges. Categorical variables (e.g., barriers experienced, practice setting) will be presented as frequencies and proportions. Free-text survey responses will be coded using deductive content analysis38, grouping similar responses into categories aligned with survey domains. Given the small sample size, we will not conduct inferential statistical testing. All analyses will be conducted in Microsoft Excel or Python, and results will be reported in aggregate to preserve anonymity.
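For illustration, the survey summaries described above can be produced with the Python standard library alone, as sketched below. The item responses and barrier labels are invented, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def summarise_item(scores):
    """Mean, sample SD, median, and range for one five-point Likert item."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        # sample standard deviation is undefined for a single response
        "sd": statistics.stdev(scores) if len(scores) > 1 else float("nan"),
        "median": statistics.median(scores),
        "range": (min(scores), max(scores)),
    }


def summarise_multiselect(responses):
    """Frequency and proportion of respondents ticking each option."""
    n = len(responses)
    counts = Counter(option for ticked in responses for option in set(ticked))
    return {option: (c, c / n) for option, c in counts.items()}


# Invented responses from five respondents, for illustration only.
item = summarise_item([4, 5, 3, 4, 4])
barriers = summarise_multiselect([
    ["time", "accuracy"],
    ["accuracy"],
    [],
    ["time"],
    ["accuracy"],
])
print(item, barriers)
```

Proportions for "check all that apply" items use the number of respondents as the denominator, so they need not sum to one.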
Qualitative analysis
Interview transcripts will be analysed using the Framework Method39, a matrix-based analytical approach suited to multi-case comparison in health services research. Analysis will proceed through five stages: (1) familiarisation (reading transcripts, noting initial themes); (2) developing a coding framework (combining deductive codes from survey domains and CFIR constructs with inductive codes emerging from data); (3) indexing (systematically applying codes to all transcripts); (4) charting (organising coded data into framework matrices by theme and participant); and (5) mapping and interpretation (identifying patterns, associations, and explanations across cases).
Two researchers will independently code the first two transcripts, compare coding, and discuss discrepancies to refine the framework. Remaining transcripts will be coded by one researcher with regular team review. NVivo software (version 14) will be used to support coding and data management. Analytical rigour will be enhanced through regular team meetings to discuss emerging themes, reflexive consideration of researcher positionality, and use of deviant case analysis to explore perspectives that contradict dominant patterns. Quotations will be fully anonymised before inclusion in reports.
Mixed-methods integration
Integration of quantitative and qualitative data will be undertaken by designated members of the research team (CD and CM) at the interpretation stage using a convergence coding matrix40. This approach systematically compares findings across data sources (survey, interview, chart review) for each domain of interest (e.g., perceived utility, workflow integration, trust, NSS documentation patterns). For each domain, we will code whether findings from different sources show: (a) agreement (convergence), (b) partial agreement (complementarity, where findings expand on each other), (c) silence (one source provides data, others do not), or (d) dissonance (contradictory findings). For example, if survey data show high mean acceptability scores, interview data reveal concerns about specific workflow challenges, and chart review shows low template utilisation, this would be coded as "partial agreement" or "dissonance" depending on interpretation, prompting deeper exploration of why quantitative acceptability does not translate to sustained use. This approach generates integrated insights that are richer and more nuanced than either data type alone, providing a fuller understanding of feasibility and implementation context. A joint display41 will be created to visually present integrated findings, showing quantitative results alongside illustrative qualitative themes and chart review patterns for each key domain. This will form the basis of the results narrative.
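One possible way to hold a convergence coding matrix as data is sketched below. The convergence code for each domain remains a researcher judgement; the code only stores the matrix, checks basic consistency, and tallies the codes. All row contents, field names, and function names are hypothetical illustrations, not study findings or study tooling.

```python
from collections import Counter

SOURCES = ("survey", "interview", "chart_review")
CODES = ("agreement", "partial agreement", "silence", "dissonance")

# Hypothetical matrix rows: one per domain, one column per data source,
# plus the convergence code assigned by the research team.
matrix = [
    {"domain": "perceived utility",
     "survey": "high mean acceptability",
     "interview": "template described as useful",
     "chart_review": "NSS sections mostly retained",
     "code": "agreement"},
    {"domain": "workflow integration",
     "survey": "high mean acceptability",
     "interview": "concerns about workflow challenges",
     "chart_review": "low template utilisation",
     "code": "dissonance"},
    {"domain": "trust",
     "survey": None,
     "interview": "mixed views on AI output",
     "chart_review": None,
     "code": "silence"},
]


def check_row(row):
    """Validate one row and return the sources that contributed data."""
    if row["code"] not in CODES:
        raise ValueError(f"unknown code: {row['code']!r}")
    reporting = [s for s in SOURCES if row[s]]
    if row["code"] == "silence" and len(reporting) == len(SOURCES):
        raise ValueError("silence requires at least one source without data")
    if row["code"] != "silence" and len(reporting) < 2:
        raise ValueError("comparison codes need at least two sources with data")
    return reporting


def summarise_matrix(rows):
    """Tally convergence codes and list domains needing deeper exploration."""
    for row in rows:
        check_row(row)
    tally = Counter(row["code"] for row in rows)
    follow_up = [row["domain"] for row in rows if row["code"] == "dissonance"]
    return tally, follow_up


tally, follow_up = summarise_matrix(matrix)
print(dict(tally), follow_up)
```

Rows of this kind map directly onto a joint display, with one row per domain and one column per data source.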
This is a feasibility study with descriptive and exploratory aims; formal power calculations are not applicable31. Sample sizes are determined by pragmatic considerations balanced against the need for sufficient data to address study objectives. Five GP pilot participants align with feasibility study recommendations for early-phase quality improvement interventions42 and provide sufficient diversity to identify major implementation barriers whilst remaining achievable within resource constraints. The survey sample is appropriate for generating preliminary acceptability data in this professional population. The interview sample (5 participants) is consistent with guidance for achieving thematic saturation in homogeneous professional samples32. The chart review sample size will depend on participating GPs' clinical activity; assuming 5 GPs each conduct 50 consultations over four weeks (250 total), with 30–50% NSS template usage, we anticipate 75–125 eligible notes for analysis, sufficient to describe documentation prevalence with reasonable precision.
No patient and public involvement (PPI) was incorporated in protocol development, as this feasibility work focuses on GP acceptability and implementation. However, patient perspectives on AI-documented consultations will be essential for subsequent evaluative research, and PPI will be integrated into the design of definitive trials.
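The anticipated chart review sample follows from simple arithmetic, which the sketch below reproduces. To give a rough sense of what "reasonable precision" might mean, it also computes a 95% Wilson score interval for a proportion. The interval method and the assumed 40% prevalence are illustrative assumptions only; the protocol does not specify how precision will be quantified.

```python
import math


def expected_notes(n_gps, consults_per_gp, usage_low, usage_high):
    """Total consultations and the anticipated range of template-generated notes."""
    total = n_gps * consults_per_gp
    return total, round(total * usage_low), round(total * usage_high)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Figures stated in the protocol: 5 GPs, 50 consultations each, 30-50% usage.
total, low, high = expected_notes(5, 50, 0.30, 0.50)
print(total, low, high)

# Illustrative assumption: any NSS documented in 30 of 75 notes (40%).
ci_low, ci_high = wilson_interval(30, 75)
print(round(ci_low, 3), round(ci_high, 3))
```

Under these assumptions the interval spans roughly 30% to 51% at the lower bound of 75 notes, and it narrows as the number of eligible notes increases.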
This mixed-methods feasibility study addresses an evidence gap regarding AI-enabled digital scribes in primary care, specifically their application to capturing non-specific symptoms that may indicate serious disease. Whilst digital scribes have been adopted rapidly across healthcare settings43, evaluation of their real-world implementation, clinical utility for specific documentation challenges, and acceptability to clinicians remains limited12,15,16. This study provides foundational evidence on these domains, informing both template refinement and the design of subsequent evaluative research within a structured feasibility-to-efficacy evaluation pathway26.
Several limitations constrain the inferences that can be drawn. First, the small sample (5 pilot participants, anticipated 8–18 survey respondents) limits precision and generalisability. Findings will be indicative rather than definitive, suitable for hypothesis generation and template refinement but not for establishing effectiveness. This sample size is consistent with recommendations for feasibility studies in primary care quality improvement interventions, where samples of this size provide sufficient data to identify major implementation barriers whilst remaining pragmatically achievable31,42. Second, without a control group, observed documentation cannot be causally attributed to template use. The analysis is descriptive, showing what documentation looks like with the template rather than whether it improves documentation. Third, selection bias is likely, as participating GPs are early adopters of digital scribe technology within a single practice network. Early adopters in healthcare are characterised by higher technology self-efficacy and willingness to embrace innovation44, potentially yielding more favourable acceptability findings than would be observed in broader implementation. Fourth, findings may not generalise beyond the Heidi™ platform; template performance could differ on other digital scribe platforms. Fifth, the four-week pilot duration may be insufficient for novelty effects to dissipate, which can artificially inflate initial positive results45. Finally, feasibility outcomes (acceptability scores, usage rates, documentation prevalence) do not measure clinical impact. Even if the template is acceptable and widely used, we cannot determine whether this influences clinical decision-making, diagnostic pathways, or patient outcomes.
Progression to definitive evaluation will be informed by pre-specified criteria across feasibility domains42. Acceptability will be adequate if mean Likert scores for overall template satisfaction (Part 2 survey) exceed 3.5 (between "neither agree nor disagree" and "agree") and if ≥60% of pilot participants report intention to continue template use. Feasibility of sustained usage will be judged adequate if ≥50% of participants use the template in ≥20 consultations over four weeks. Data extraction feasibility will be considered adequate if ≥80% of expected notes contain pilot participation labels and if NSS presence can be reliably coded from ≥90% of extracted notes. Progression will be contingent on absence of serious adverse events or major usability failures, and on qualitative evidence that identifies implementation barriers as addressable through template refinement. Success is not defined by high NSS documentation rates per se (frequent surfacing could reflect false positives) but by evidence that GPs find the template acceptable and practical, that template use is feasible within routine workflow, and that no major safety concerns emerge precluding further development.
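The quantitative progression thresholds stated above can be expressed as a simple check, sketched below. The data structure, field names, and example figures are hypothetical. The safety and qualitative criteria are deliberately left out because they depend on team judgement rather than a numeric threshold.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    """Hypothetical container for pilot summary figures (all names assumed)."""
    mean_satisfaction: float       # mean Likert score, overall satisfaction
    n_pilot_gps: int
    n_intend_to_continue: int
    template_consults_per_gp: list  # template consultations per GP, four weeks
    expected_notes: int
    labelled_notes: int            # notes carrying the pilot participation label
    extracted_notes: int
    codable_notes: int             # notes where NSS presence could be coded


def assess_progression(r):
    """Apply the quantitative thresholds; integer maths avoids rounding issues."""
    sustained = sum(1 for c in r.template_consults_per_gp if c >= 20)
    return {
        "satisfaction_above_3.5": r.mean_satisfaction > 3.5,
        "intention_ge_60pct": r.n_intend_to_continue * 100 >= 60 * r.n_pilot_gps,
        "sustained_use_ge_50pct": sustained * 100 >= 50 * r.n_pilot_gps,
        "labels_ge_80pct": r.labelled_notes * 100 >= 80 * r.expected_notes,
        "codable_ge_90pct": r.codable_notes * 100 >= 90 * r.extracted_notes,
    }


# Invented example figures, for illustration only.
example = PilotResults(
    mean_satisfaction=3.8,
    n_pilot_gps=5,
    n_intend_to_continue=3,
    template_consults_per_gp=[25, 30, 12, 20, 8],
    expected_notes=250,
    labelled_notes=210,
    extracted_notes=210,
    codable_notes=185,
)
criteria = assess_progression(example)
print(criteria)
```

In this invented example every threshold is met except coding reliability (185 of 210 notes, about 88%), which would prompt review of the extraction process before progression.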
Findings will inform subsequent research in several ways. Template refinement will be guided by qualitative feedback on prompt clarity, symptom definitions, and false positive patterns, recognising that careful prompt design substantially influences large language model performance in clinical contexts46. Optimal study design for definitive evaluation will be informed by observed usage patterns and implementation barriers; if template use proves highly variable across GPs, cluster randomisation or stepped-wedge designs may be more appropriate than individual-level randomisation47. Outcome selection for definitive trials will be refined based on which NSS are most frequently documented and whether documentation patterns plausibly link to clinical actions such as investigations or referrals. Furthermore, implementation strategies can be tailored based on identified barriers and enablers37,48. If interviews reveal concerns about false positives, future implementations might emphasise calibration of sensitivity-specificity trade-offs or educational interventions about interpreting AI outputs. If editing burden emerges as problematic, streamlined interfaces or context-dependent prompting could be developed. These adaptations are essential before scaling beyond feasibility contexts.
Generalisability beyond the Centric Health ARQ network requires careful consideration. The network represents a specific context: practices with existing digital scribe adoption, coordinated research infrastructure, and organisational quality improvement support. Implementation outcomes are inherently context-dependent, influenced by organisational readiness, professional networks, and local implementation climate49. Findings regarding acceptability and barriers may differ in settings without these enabling factors. However, the clinical problem (NSS documentation challenges and their relationship to diagnostic delay) is not unique to this network, suggesting the intervention addresses a generalisable need. Purposive sampling to include diverse practice settings enhances transferability to similar practices elsewhere. Whilst this study evaluates a Heidi™-specific implementation, the underlying concept of using template-based instructions to guide language model output for systematic symptom surfacing is platform-agnostic. Findings regarding effective symptom definitions and prompt structures could inform template development across multiple platforms, though platform-specific validation would be necessary. A generalisable implementation framework, rather than vendor-locked solutions, represents the most scalable long-term approach.
This feasibility study provides foundational evidence for evaluating AI-supported documentation in primary care28. By integrating quantitative acceptability data, qualitative insights, and descriptive documentation patterns, it will generate evidence to inform template refinement and guide definitive evaluative research. While limited by sample size and lack of controls, the study addresses an important gap in understanding real-world use of AI documentation tools for diagnostically challenging presentations. Success will be demonstrated by feasibility, safety, and addressable implementation barriers rather than effectiveness. The study recognises both the potential of AI-enhanced documentation to support earlier cancer detection and the substantial technical, clinical, and ethical requirements that must be met before routine adoption.
Ethical approval has been granted by the Irish College of General Practitioners (reference: ICGP_REC_2025_3282). Participants will provide written informed consent before taking part in the study.
No data associated with this article.
Open Science Framework. Project data for AI Digital Scribe Template Enhancement for Non-Specific Symptom Documentation in Irish General Practice: A Mixed-Methods Feasibility Protocol. https://doi.org/10.17605/OSF.IO/U9XJR (reference 50)
This project contains the following:
Data is available under the terms of the CC-By Attribution 4.0 International license.