Study Protocol

Protocol of a Scoping Review on the Use of Transactional Data for Early Diagnosis (TRADED-ScR)

[version 1; peer review: 1 approved with reservations]
PUBLISHED 10 Jan 2025

Abstract

Background

The early detection of many diseases is crucial for effective treatment, as it increases the likelihood of successful management and helps prevent or delay complications. However, early detection is often hampered by the asymptomatic nature of initial disease stages and delays in patients seeking care. In cancers such as ovarian, gastrointestinal, and haematological types, delays may result from self-management of non-specific symptoms with over-the-counter medications. Recent studies, such as the Cancer Loyalty Card Study (CLOCS), suggest that transactional data can reveal early self-medicating behaviours indicative of cancer. Despite this potential, a comprehensive understanding of the use of transactional data for early diagnosis across diseases is lacking.

Aim

This scoping review aims to systematically collate and analyse the literature on the use of transactional data for early detection of any disease, assessing its viability as a predictive tool.

Methods

The review will follow the Arksey and O'Malley methodological framework, with enhancements by Levac et al., and will be reported according to PRISMA-P and PRISMA-ScR guidelines. A comprehensive search will be conducted across databases including MEDLINE, Embase, Scopus, Web of Science, and the Cochrane Database of Systematic Reviews. The review will include studies that utilise transactional data to identify early signs of disease, focusing on both peer-reviewed articles and conference abstracts. The data will be thematically charted and synthesised to compare methodologies, disease types, types of transactional data used, key findings, and limitations.

Implications

By mapping the use of transactional data as a non-invasive tool for early detection, this review aims to inform and potentially transform approaches to screening and diagnosis. The findings will provide insights for healthcare professionals, researchers, and policymakers, supporting the development of targeted interventions that leverage transactional data for disease surveillance and early detection.

Keywords

Early diagnosis, Screening, Transactional data, Health informatics, Consumer behaviour, Scoping review

Introduction

Many diseases begin with subtle symptoms, which can lead to delayed diagnoses that occur only after significant, sometimes irreversible, damage has been done. These late diagnoses are linked to lower survival rates and higher morbidity, highlighting the importance of early detection to improve patient outcomes [1–3]. Early diagnosis refers to identifying a condition at a stage when interventions can prevent or minimise irreversible damage, thereby improving survival rates, reducing morbidity, and enhancing quality of life [1–3].

However, the early detection of diseases presents substantial challenges. Many diseases in their early stages are asymptomatic or present with non-specific symptoms, which complicates timely diagnosis. Conditions such as cardiovascular diseases, renal diseases, and various forms of cancer exemplify this issue, where delays in seeking care are common due to the subtlety of early symptoms [4–6]. These delays are often exacerbated by self-management behaviours, where individuals attempt to alleviate vague symptoms with over-the-counter (OTC) medications. While self-medication can temporarily mask symptoms, it may also obscure the early signs of more serious underlying conditions, such as malignancies [7,8].

Recent studies, including the Cancer Loyalty Card Study (CLOCS), have explored the use of transactional data, specifically the purchase of OTC medications, as a potential tool for early disease detection. These studies suggest that patterns in transactional data can reveal self-medicating behaviours that may be indicative of conditions like cancer, offering a novel and non-invasive method for early diagnosis [9]. Beyond cancer, transactional data has also been used to enhance the detection of infectious disease outbreaks, such as influenza, by tracking sales of specific OTC medications [10,11].

The relevance of this approach is further supported by research showing that patients with certain cancers, such as ovarian, haematological, and gastrointestinal cancers, often delay seeking professional advice due to non-specific symptoms, which they manage with OTC medications [12–14]. For example, a survey of over 1,500 women with ovarian cancer revealed that many attributed early symptoms to non-threatening causes such as irregular menstrual cycles, menopause, or aging, leading to delayed diagnosis [14]. Additionally, focus group studies have demonstrated that these women had higher purchases of specific medications prior to diagnosis compared to healthy controls, indicating a potential pattern that could be leveraged for earlier detection [15].

The potential for using transactional data in medical research is further supported by regulatory frameworks like the General Data Protection Regulation (GDPR), which grants patients the right to access their personal data, including transaction records [16]. This regulation could facilitate the ethical use of consumer transactional data for medical research, provided that safeguards such as the ability to withdraw consent, data security measures, and transparency in data management are maintained [15].

Given these developments, transactional data, such as purchase histories of OTC medications, has emerged as a promising avenue for the early detection of cancer and other diseases [9–11]. The ability to detect early warning signs through consumer purchasing patterns represents a novel, non-invasive diagnostic approach that could significantly improve early diagnosis strategies. However, the existing literature on this topic is fragmented, with studies varying widely in design, disease focus, and outcomes. To date, there has been no comprehensive synthesis that could clarify the overall efficacy of using transactional data for disease detection and its potential applications in public health and clinical practice.

Aims and objectives

The aim of this scoping review is to provide a detailed overview of how transactional data can be used in the early detection of disease. This review intends to map the range, scope, and nature of the existing research, providing critical insights into the feasibility of utilising such data to enhance early diagnosis strategies.

The specific objectives include:

  1. to identify all peer-reviewed articles which describe the use of transactional data for early detection, cataloguing them according to methodology, disease type and data type;

  2. to report, in the case of observational and interventional studies, the population characteristics, the event horizon studied, and how the transactional data and disease diagnosis data were operationalised;

  3. to quantify, where the published data allow, the predictive power of the disease signal identified in the transactional data, by calculating the implied positive predictive value;

  4. to discuss the logistical, legal and ethical challenges related to the use of transactional data in disease screening and early detection reported by the authors of the primary studies;

  5. to conduct this review in two phases: the first phase will focus specifically on the early detection of cancer, while the second phase will broaden the scope to include the early detection of any disease.

Methods

Reporting and registration

We will conduct a scoping review in accordance with the Arksey and O'Malley framework, as enhanced by Levac et al. [17]. The protocol will adhere to the PRISMA-P reporting guidelines and will be pre-registered on the Open Science Framework. The results will be reported following the PRISMA-ScR guideline [18]. Supplementary files and relevant datasets will be made available via the Open Science Framework [19].

Eligibility criteria

The review will include studies that use transactional data for early detection of diseases. Eligible studies must specifically address the use of transactional data, such as loyalty card purchases or OTC medication sales, with incident disease or disease recurrence as an outcome, regardless of the disease type. Studies from all geographical locations will be considered.

Since our aim is to describe the scope of the literature, all article types will be eligible for inclusion: interventional studies, observational studies, qualitative studies, review articles, letters and editorials will all be included. Both peer-reviewed articles (including doctoral theses) and conference abstracts will be included. We will exclude studies that do not focus on transactional data or are not concerned with diagnosis as an outcome.

Information sources

In Phase 1, the databases to be searched include MEDLINE, Embase, Scopus, Web of Science, and the Cochrane Database of Systematic Reviews. In Phase 2, additional sources such as CINAHL, ProQuest Dissertations and Theses Global (PQDT Global), three clinical trial registries (ClinicalTrials.gov, EU Clinical Trials Register, and International Standard Randomised Controlled Trial Number), and two systematic review registries (PROSPERO and Joanna Briggs Institute) will also be searched.

Search strategy

The search strategy, developed with an information specialist, will combine keywords and subject headings related to the themes of “transactional data” and “early detection”. To ensure thorough coverage, the strategy will also incorporate synonyms and related terms such as "purchase data," "loyalty card data," "retail data," and "consumer behaviour data." The detailed search strategies for both phases are available via the Open Science Framework: https://doi.org/10.17605/OSF.IO/ZRJT5 [20]. Based on pilot searches, it is anticipated that fewer than 20,000 papers will need to be screened in Phase 1, while more than 100,000 are expected in Phase 2. References of included studies will be reviewed to identify additional relevant works.
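To illustrate how the two themes might be combined, the sketch below joins synonyms within a theme using OR and joins the themes using AND. It is only an illustration, not the registered search strategy: the function names are hypothetical, the "early detection" synonyms are assumed placeholders, and real database interfaces each have their own syntax for phrases, truncation and subject headings.

```python
def or_block(terms):
    """Join the synonyms of one theme with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concept_blocks):
    """Combine themes with AND so a record must match every theme."""
    return " AND ".join(or_block(block) for block in concept_blocks)


# Synonyms for the "transactional data" theme, as named in the protocol text.
transactional_terms = [
    "transactional data",
    "purchase data",
    "loyalty card data",
    "retail data",
    "consumer behaviour data",
]

# Assumed placeholder synonyms for the "early detection" theme
# (not taken from the registered strategy).
early_detection_terms = ["early detection", "early diagnosis"]

query = build_query(transactional_terms, early_detection_terms)
print(query)
```

The full, database-specific strategies remain those deposited on the Open Science Framework.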

Selection of sources of evidence

Two reviewers will independently screen titles and abstracts for eligibility using Rayyan, software designed to facilitate efficient screening of abstracts and titles [21]. Reviewers will follow the screening algorithm (Table 1) to determine which studies are included and which are excluded. Full-text screening will follow for potentially relevant studies. Discrepancies will be resolved through discussion or consultation with a third reviewer. A PRISMA flow diagram will document the selection process. The screening algorithm, which will be piloted using a limited search of the MEDLINE database, is outlined on the Open Science Framework: https://doi.org/10.17605/OSF.IO/ZRJT5 [20].

Table 1. Screening algorithm.

Step | Concept | Question | Exclusion label
1 | Transaction | Does this paper deal with transaction data? | “NotTransactional”
2 | Diagnosis | Does this paper discuss the association between patterns in transaction data and a subsequent disease diagnosis/incidence/risk? | “NotDiagnosis”
3 | (INCLUDE) | If negative to all the above, then include | n/a
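One way to read Table 1 is as a sequence of checks in which the first question answered "no" assigns its exclusion label, and a record that receives no label is included. The sketch below encodes that reading; the function name and boolean arguments are hypothetical, and the ordering rule is an assumption about how reviewers would apply the table.

```python
from typing import Optional


def screen(deals_with_transaction_data: bool,
           links_to_diagnosis: bool) -> Optional[str]:
    """Return the exclusion label for a record, or None if it is included.

    Assumes the steps are applied in table order, so a record failing
    step 1 is labelled without step 2 being considered.
    """
    if not deals_with_transaction_data:
        return "NotTransactional"
    if not links_to_diagnosis:
        return "NotDiagnosis"
    return None  # no exclusion label applies: include


# A record on transaction data with no diagnosis outcome is excluded at step 2.
print(screen(True, False))
```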

Given the high volume of articles anticipated in Phase 2, title and abstract screening may be assisted by an artificial intelligence tool based on a large language model. The validity of this approach will be assessed by comparing the tool's results on the Phase 1 records with the authors' own Phase 1 screening results. The detailed approach will be outlined in a subsequent protocol, akin to a previous study that implemented a similar approach [22].
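The protocol does not state which agreement measures will be used for this comparison. As a minimal sketch under that caveat, the code below treats the authors' decisions as the reference standard and computes sensitivity, specificity and Cohen's kappa for the tool. The function name and the example decisions are hypothetical and purely illustrative.

```python
def screening_agreement(human, tool):
    """Compare include/exclude decisions (True = include) record by record."""
    if len(human) != len(tool) or not human:
        raise ValueError("decision lists must be non-empty and of equal length")

    tp = sum(h and t for h, t in zip(human, tool))
    tn = sum((not h) and (not t) for h, t in zip(human, tool))
    fp = sum((not h) and t for h, t in zip(human, tool))
    fn = sum(h and (not t) for h, t in zip(human, tool))
    n = len(human)

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else nan

    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "kappa": kappa}


# Made-up decisions for eight records, for illustration only.
human = [True, True, False, False, False, False, True, False]
tool = [True, False, False, False, True, False, True, False]
result = screening_agreement(human, tool)
print(result)
```

For screening, sensitivity is usually the measure of most concern, since a missed eligible record cannot be recovered at full-text review.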

Data collection and data management

A standardised form will be used to extract data from selected studies based on Table 2. Extracted data will include study characteristics (e.g. author, year, disease type), transactional data specifics (data sources, items tracked), methodology (e.g. data collection, analysis techniques), and key findings (e.g. disease symptoms identified, effectiveness of detection, and limitations).

Table 2. Data charting pro-forma.

Step | Concept | Question | Inclusion label example
1 | Publication details | (Automatically extracted by Rayyan software) | Title, Authors, Publication Year, Data Source (e.g., Journal, Registry)
2 | Publication type | What type of record/publication is this? | P-Journal, P-Conference, P-Registration
3 | Methods | What type of study is this? | M-Review, M-Trial, M-CCC, M-Cohort, M-Other
4 | Location | Where did the study population live? | [Coded as “Note”] L-Korea, L-UK, L-NA
5 | Disease type | What disease was being studied? | [Coded as “Note”] C-Pancreas, C-CRC, C-Ovarian, C-Lung
6 | Disease data | What data was used to determine disease diagnosis? | [Coded as “Note”] D-CPRD, D-THIN
7 | Transactional data | What type of transactional data was used? | [Coded as “Note”] T-Statement, T-Loyalty
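Because each label in Table 2 carries a one-letter prefix for its concept, exported labels could be tallied by concept with a few lines of code. The sketch below assumes labels are available as lists of strings per record; the function name and the example records are hypothetical placeholders, not real studies or a real Rayyan export format.

```python
from collections import Counter

# Prefix-to-concept mapping taken from the label examples in Table 2.
PREFIXES = {
    "P": "Publication type",
    "M": "Methods",
    "L": "Location",
    "C": "Disease type",
    "D": "Disease data",
    "T": "Transactional data",
}


def tally_labels(records):
    """Count label values per concept across all charted records."""
    counts = {concept: Counter() for concept in PREFIXES.values()}
    for labels in records:
        for label in labels:
            prefix, _, value = label.partition("-")
            if prefix not in PREFIXES or not value:
                raise ValueError(f"unrecognised label: {label!r}")
            counts[PREFIXES[prefix]][value] += 1
    return counts


# Placeholder records for illustration only.
records = [
    ["P-Journal", "M-Cohort", "L-UK", "C-Ovarian", "D-CPRD", "T-Loyalty"],
    ["P-Conference", "M-CCC", "L-UK", "C-CRC", "D-THIN", "T-Loyalty"],
]
counts = tally_labels(records)
print(counts["Transactional data"])
```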

Data items

This review aims to identify and catalogue peer-reviewed articles that utilise transactional data for early disease detection, categorising them based on methodology, disease type, and data type. For observational and interventional studies, it will report population characteristics, the event horizon studied, and operationalisation of transactional and disease diagnosis data. Additionally, it will quantify the predictive power of the studied transactional data signal, when possible, by calculating the implied positive predictive value.

Critical appraisal

While scoping reviews do not typically involve a quality assessment, we will use the ROBINS-E and RoB 2/ROBINS-I tools to critically appraise the observational and interventional studies identified, respectively [23,24]. Meta-biases and confidence in cumulative evidence will not be reported since we are conducting a scoping review that does not focus on a single primary outcome.

Data synthesis

The data synthesis will proceed through four steps. First, a bibliometric analysis will map the research landscape by categorising identified articles by methodology, disease type, and data type, providing a high-level overview of the field. Second, study details will be analysed, including population characteristics, study durations, and the operational definitions and uses of transactional and disease diagnosis data, using simple descriptive statistics and narrative approaches as appropriate. Third, where data is sufficient, the review will calculate the implied positive predictive value for the disease signal identified in transactional data. Finally, the review will discuss the logistical, legal, and ethical challenges highlighted in the studies, addressing the complexities of using transactional data in disease detection.
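The protocol does not specify how the implied positive predictive value will be derived. One common route, shown here as a sketch under that assumption, applies Bayes' rule to a study's reported sensitivity and specificity together with an assumed disease prevalence. The function name and the input values are illustrative and do not come from any primary study.

```python
def implied_ppv(sensitivity: float, specificity: float,
                prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")

    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0:
        raise ValueError("no positive signals are possible with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Illustrative inputs: 80% sensitivity, 90% specificity, 1% prevalence.
ppv = implied_ppv(0.8, 0.9, 0.01)
print(round(ppv, 3))  # about 0.075
```

With these illustrative inputs, only about 7.5% of flagged individuals would have the disease, which shows why prevalence matters so much when judging a signal for a rare condition.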

Discussion

Summary

This scoping review protocol presents a systematic approach to examining the use of transactional data in the early detection of diseases. It highlights key areas of interest, including the types of diseases studied, the nature of the data utilised, the methodologies employed, and the challenges encountered in this emerging field. The significance of this review lies in its potential to clarify the utility of transactional data—such as loyalty card purchases and OTC medication sales—as a tool for early diagnosis across a range of conditions.

Strengths and limitations

A major strength of this protocol is its comprehensive and methodical approach to literature search and data extraction, ensuring that the review will thoroughly explore the landscape of research in this area. The inclusion of various study types and the systematic categorisation of findings by disease type, data type, and methodology will provide a broad and detailed overview of the current state of the field.

However, several limitations must be acknowledged. First, there is the issue of publication bias, where studies with negative or inconclusive results may not have been published, potentially skewing the literature. Additionally, language bias could limit the scope of this review, as studies published in languages other than English may be excluded. Another challenge arises from the need to synthesise a large and diverse body of literature into a coherent summary, which may overlook some nuances of individual studies.

A significant limitation specific to this area of research is the restricted access to raw transactional data. For example, in studies like the Cancer Loyalty Card Study (CLOCS), the raw data cannot be shared with other researchers due to privacy and legal constraints. This limitation directly impacts our ability to fully answer the research question, particularly concerning the detailed analysis of transactional patterns associated with early disease detection. While we can report on the scope of the literature in terms of publication details, publication type, methods, location, cancer type, and general descriptions of the datasets used, the lack of access to raw data means that our ability to assess the robustness of the findings reported in primary studies is inherently constrained. However, most primary studies are expected to provide sufficient descriptions of the datasets and methodologies used, allowing us to still draw meaningful insights into the use of transactional data for early diagnosis.

Implications

The findings of this review will be instrumental in shaping future research on the use of transactional data in disease detection. By identifying gaps in the existing literature and highlighting under-researched areas, this review will guide researchers towards optimal study designs and methodologies. Moreover, it will provide healthcare professionals with the latest insights into innovative tools for early detection, potentially leading to advancements in screening and diagnosis practices that could improve patient outcomes.

In addition, this review aims to contribute to the ongoing dialogue on the ethical and privacy concerns associated with the use of transactional data in medical research. By addressing these concerns, we hope to influence the development of guidelines that balance the potential health benefits of this research with the need to protect patient privacy and data security.

Conclusion

In conclusion, the proposed scoping review will provide valuable contributions to our understanding of the potential role of transactional data in early diagnosis. By mapping the research landscape, evaluating methodologies, synthesising key findings, and identifying gaps, this review will inform both current practices and future directions in disease detection research. Despite the limitations posed by the inaccessibility of raw data, this review will offer a comprehensive overview of how transactional data is currently being used and its potential to enhance early detection strategies, ultimately aiming to improve patient outcomes.

Reporting guidelines

The PRISMA-P checklist (with certain elements modified and added to reflect the specific PRISMA-ScR guideline, as outlined in the checklist) is available via the Open Science Framework:

https://doi.org/10.17605/OSF.IO/ZRJT5, under a CC-By Attribution 4.0 International license [20].

Ethics and consent

Ethical approval and consent were not required.

Araz K, Jacob BM, Flanagan J and Redmond P. Protocol of a Scoping Review on the Use of Transactional Data for Early Diagnosis (TRADED-ScR) [version 1; peer review: 1 approved with reservations]. HRB Open Res 2025, 8:3 (https://doi.org/10.12688/hrbopenres.14015.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Reviewer Report 30 Jun 2025
Paul A. Townsend, University of Stirling, Stirling, UK 
Approved with Reservations
This protocol describes a scoping review looking at how transactional data (like loyalty card purchases and over-the-counter medicine sales) might help detect diseases early.

The idea is that people's shopping patterns could reveal when they are self-medicating, ...
Townsend PA. Reviewer Report For: Protocol of a Scoping Review on the Use of Transactional Data for Early Diagnosis (TRADED-ScR) [version 1; peer review: 1 approved with reservations]. HRB Open Res 2025, 8:3 (https://doi.org/10.21956/hrbopenres.15383.r47866)