Improving palliative care with machine learning and routine data: a rapid review

Introduction: Improving palliative care is a priority worldwide as this population experiences poor outcomes and accounts disproportionately for costs. In clinical practice, physician judgement is the core method of identifying palliative care needs but has important limitations. Machine learning (ML) is a subset of artificial intelligence advancing capacity to identify patterns and make predictions using large datasets. ML has the potential to improve clinical decision-making and policy design, but there has been no systematic assembly of current evidence. Methods: We conducted a rapid review, searching systematically seven databases from inception to December 31st, 2018: EMBASE, MEDLINE, Cochrane Library, PsycINFO, WOS, SCOPUS and ECONLIT. We included peer-reviewed studies that used ML approaches on routine data to improve palliative care for adults. Our specified outcomes were survival, quality of life (QoL), place of death, costs, and receipt of high-intensity treatment near end of life. We did not search grey literature. Results: The database search identified 426 citations. We discarded 162 duplicates and screened 264 unique title/abstracts, of which 22 were forwarded for full text review. Three papers were included, 18 papers were excluded and one full text was sought but unobtainable. One paper predicted six-month mortality, one paper predicted 12-month mortality and one paper cross-referenced predicted 12-month mortality with healthcare spending. ML-informed models outperformed logistic regression in predicting mortality where data inputs were relatively strong, but those using only basic administrative data had limited benefit from ML. Identifying poor prognosis does not appear effective in tackling high costs associated with serious illness. Conclusion: While ML can in principle help to identify those at risk of adverse outcomes and inappropriate treatment, applications to policy and practice are formative. 
Future research must not only expand scope to other outcomes and longer timeframes, but also engage with individual preferences and ethical challenges.


Introduction
Background Improving care for people with serious and complex medical illness is a health system priority worldwide. Between 2016 and 2060 there will be an estimated 87% increase globally in the number of deaths that occur following serious health-related suffering, with low-income countries experiencing the largest proportional increases 1 . Health systems originally configured to provide acute, episodic treatment often provide poor-value care to complex multimorbid populations 2 .
Palliative care is an approach "that improves the quality of life (QOL) of patients and their families facing the problems associated with life-threatening illness, through the prevention and relief of suffering by means of early identification and impeccable assessment and treatment of pain and other problems, physical, psychosocial and spiritual" 3 . Studies suggest that palliative care improves outcomes and reduces health care costs for people with serious medical illness, although significant gaps in the evidence base persist 4 . Insufficient palliative care capacity is reported even among those nations whose services perform strongly on international rankings 5,6 , and need will only grow given demographic ageing 7 .
In clinical practice, physician judgement remains the de facto method of identifying palliative care needs and predicting adverse outcomes including mortality 8 . However, studies have repeatedly shown that clinicians tend to make imprecise and overly optimistic predictions of survival in metastatic cancer, the major terminal illness for which prognosis is most accurate [9][10][11] . No subgroup of clinicians has been shown to be more accurate than others in late-stage predictions 8 .
For research and policy, primary research studies are rare due to ethical and practical issues, increasing reliance on the use of routine data 4,12 . Evaluations of programmes and interventions employing traditional analytic approaches encounter challenges in missing data and unobserved confounding 13 .
Health care is entering an era of 'big data' in which researchers and providers will have access to unprecedented levels of information on patients 14 . Machine learning (ML) is a subset of artificial intelligence that is rapidly advancing capacity to identify patterns and make predictions using large datasets 15 . In contrast to traditional analytic methods, where the analyst specifies data inputs according to hypotheses and/or conceptual models, ML approaches leverage computing power to identify patterns in available data and can make inferences without explicit user instruction 16 . These approaches have a well-documented potential to improve clinical decision-making by analysing electronic health records and other routinely collected health data that are commonly large and "dirty", i.e. erroneous and incomplete 17 , although significant data missingness may still be a source of bias 18 .
Rationale and aim ML approaches to improving decision-making may be strongly appropriate for palliative care. First, patient care is complex with high illness burden and management challenges including polypharmacy 19 . ML tools may be effective at identifying patterns in data on complex patients, for example denoting risk of functional decline or mortality, that are not obvious to healthcare professionals with limited time and analytic expertise to interpret available data. Second, the volume of projected palliative care need means that only a minority of patients will receive care from palliative care specialists 7,20 . Tools that can aid non-specialists to identify need and provide appropriate care are essential to avoiding population health crises in an era of demographic ageing 21 . While multiple face-to-face clinical assessment tools exist for this purpose, meeting challenges of scale requires improved use of routinely collected data. Third, research and policy in this field rely heavily on routine data 4,12,13 . Since 'big' data are largely routine data, optimising analytic approaches to these may be particularly impactful in palliative care compared to fields where randomised trials and large primary data collection are more feasible and commonplace.
Multiple recent editorials in topic journals reflect the growing interest in principle in using big data and ML to improve palliative care [22][23][24][25] . Less has been written on the empirical evidence of these applications. In this context we conducted a rapid review of research studies using ML to improve palliative care. By identifying and organising this evidence for the first time, we anticipated that our findings could inform ongoing and future efforts in this field.

Eligibility criteria
We included studies meeting the following PICOS (Population, Intervention, Comparison, Outcomes and Study design) criteria.

Types of participants.
Studies that reported on adults (≥ 18 years). We specified this criterion as children are a distinct and functionally different population. Studies that reported adults and children separately were eligible, but only results for adults would be considered; studies that pooled adults and children in one sample were excluded 26 . We sought studies seeking to improve palliative care: care that "improves the quality of life of patients and their families facing the problems associated with a life-threatening illness" 3 . Studies not addressing palliative care, or evaluating a specific discrete treatment (e.g. stent, chemotherapy) were excluded.

Amendments from Version 1
Minor changes were made to the manuscript between versions 1 and 2, in line with reviewer comments (see responses to reviewers for consideration of each specific change). The most substantial changes were (1) amending "palliative and end-of-life care" to "palliative care" throughout, starting with the title, as a reviewer found the original confusing; (2) a more detailed justification for exclusion of decedent cohort study designs in Methods>Eligibility criteria>Types of studies and reports.

Types of outcomes.
We specified three domains of interest: patient outcomes (survival; QoL); caregiver outcomes (survival; QoL); and economic outcomes (costs; receipt of cost-(in)effective treatment, e.g. high-intensity treatment at end of life). We specified these as established measures for quality in palliative care, and so of particular interest to practitioners and policymakers 27 .
Types of studies and reports.
We included the following types of studies: those that used routinely collected data with ML approaches to improve palliative care, connecting our outcomes of interest with patients' characteristics before or at diagnosis. Any prospective study meeting the other criteria was therefore eligible; retrospective studies were eligible provided that the analysis was conducted counting forwards, i.e. defining samples and treatments at baseline. Studies that counted backwards, examining samples defined by characteristics at death or according to outcome, were excluded for three reasons. First, selecting participants at death substantially biases derived results by distorting the timeframe of analysis 28 . Second, mortality is an outcome and many characteristics at death are outcomes; they are not independent of treatment choices but instead endogenous and therefore inappropriate as an eligibility criterion for evaluating the treatment 29 . Third, treatment effect estimates from decedent cohort studies are in practice evidence that "treatment t should be provided to population p", where p is partly defined by imminent death 28,29 . They are therefore only useful under scenarios of very good prognostic accuracy, and this is seldom the case in the real world 8-11 .
This was a rapid review of published literature. Rapid reviews are a well-established methodology for gathering evidence in a limited timeframe, provided that search methods are transparent and reproducible, clear inclusion criteria are applied, and a rigorous if time-limited appraisal is performed 30,31 . Only peer-reviewed papers in academic journals were considered eligible. Selection of relevant papers was restricted to English-language publications only. We excluded conference abstracts and book chapters returned by the database search, and we did not search grey literature. A full search strategy is provided in Table 1.
Search terms for all seven databases are provided as extended data 32 .

Study selection
Screening of titles and abstracts. Two reviewers (any two authors) independently screened each title and abstract of retrieved citations based on the eligibility criteria. Subsequently, conflicts were resolved on the basis of consensus. The online reviewer tool Covidence was used to manage the screening and selection process.

Screening of full-text reports.
The second phase of screening involved downloading and assessing the full-text papers for all of the citations retained from the first phase of screening and applying the eligibility criteria. Two independent reviewers (any two of VS, AOH, SA, RA) screened each full-text paper.
Hand searching. One author (PM) searched the references of all included papers by hand, applying the same criteria.
Data extraction. We extracted study design features to a standardised template: setting (country, year, health care environment), aim, principal methods, data sources and sample size. These are presented in Table 2.
Data synthesis. Narrative synthesis was employed to review the included studies and combine their key findings. Narrative synthesis involves appraisal of all relevant material, grouping the findings into a coherent thematic narrative. We chose this approach ex ante in the context of our broad aims: to understand how ML has been used to improve palliative care to date, and to consider the implications for future research and practice. At no point did we anticipate quantitative results for combination in meta-analysis, since neither available data nor ML methods would be standardised in a way that permitted meaningful pooling.
Table 1 (search strategy), row 9. Limiters on #8: to end of 2018; articles, reviews and articles in press only; not conference proceedings or book chapters.

Database search
The database search identified 426 citations for consideration against the eligibility criteria for this review. Following removal of duplicates (n=162), 264 unique citations were forwarded for title and abstract screening. Of these, 242 citations were excluded based on title and abstract, as they clearly did not meet the review's pre-specified eligibility criteria. A full-text review of the remaining 22 citations was performed, following which a further 18 citations were excluded and one 36 was unobtainable. We corresponded via email with the author of the unobtainable text, who confirmed she did not have access to a copy of the manuscript and was not able to source one.
Full details for each of the 18 exclusions are provided as extended data 32 . Reasons for exclusion at full text review were wrong methods used, e.g. not using artificial intelligence/computer learning in the study design (n=6); wrong intervention, e.g. a named drug, a stent, a surgery or chemotherapy (n=5); wrong study design, i.e. defining the sample by outcome (n=4); wrong population, i.e. under 18 (n=2); and not a peer-reviewed article (n=1). Three studies were therefore judged as eligible and included in narrative synthesis 33-35 ; a hand-search of all references in these three papers identified no further studies of interest.
The review process is displayed in a PRISMA flow chart (Figure 1).
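As a quick consistency check on the counts reported in this section, the flow from identified citations to included studies can be reproduced with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names and the short labels for exclusion reasons are our own shorthand, not taken from the PRISMA chart.

```python
# Counts as reported in the Database search section.
identified = 426
duplicates = 162
screened = identified - duplicates                 # unique titles/abstracts

excluded_title_abstract = 242
full_text = screened - excluded_title_abstract     # forwarded for full-text review

# Reasons for exclusion at full-text review (labels are shorthand).
exclusion_reasons = {
    "wrong methods": 6,
    "wrong intervention": 5,
    "wrong study design": 4,
    "wrong population": 2,
    "not peer-reviewed": 1,
}
excluded_full_text = sum(exclusion_reasons.values())

unobtainable = 1
included = full_text - excluded_full_text - unobtainable

print(screened, full_text, excluded_full_text, included)
```

The four printed values correspond to the unique citations screened, the full texts reviewed, the full-text exclusions and the included studies.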

Included studies
Three studies meeting the criteria are presented in Table 2.
All came from the United States, two 33,34 using data from the national Medicare programme for those aged 65 and over, and one 35 using data on adults admitted to a network of six hospitals in a single urban area. Sampling strategy and sample sizes varied: 59,848 35 hospitalised adults (18+); 80,000 34 Medicare beneficiaries divided equally into four disease cohorts (cancer, chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), dementia); and a representative sample of 5,631,168 33 Medicare beneficiaries.
ML methods were similar: one 35 used random forest (RF) models only, one 34 used six approaches in which RF models performed best, and one 33 used an ensemble method combining RF, LASSO and gradient boosting approaches. All three studies used area-under-the-curve (AUC) as a measure of predictive accuracy. In this context AUC represents the probability that a randomly selected patient who died within the time horizon had a higher risk-of-mortality score than one who did not. Two 34,35 studies compared the performance of their ML models to traditional logistic regression (LR) in predicting mortality.
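To make the probabilistic reading of AUC concrete, the following minimal sketch computes it directly as the share of decedent-survivor pairs in which the decedent has the higher risk score, counting ties as half. The function name `pairwise_auc` and the toy scores are hypothetical and invented for illustration only; they are not drawn from any included study, and the included studies may have computed AUC by other, equivalent means.

```python
from itertools import product


def pairwise_auc(scores, died):
    """AUC as the probability that a randomly chosen decedent has a higher
    risk score than a randomly chosen survivor (ties count as one half)."""
    decedents = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not decedents or not survivors:
        raise ValueError("need at least one decedent and one survivor")
    concordant = 0.0
    for dec, sur in product(decedents, survivors):
        if dec > sur:
            concordant += 1.0
        elif dec == sur:
            concordant += 0.5
    return concordant / (len(decedents) * len(survivors))


# Toy data, invented for illustration: predicted mortality risk and
# whether each patient died within the time horizon (1 = died).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died = [1, 1, 1, 0, 0, 0]
print(pairwise_auc(scores, died))
```

In this toy example eight of the nine decedent-survivor pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect ranking.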

Main findings
In predicting 12-month mortality amongst a cohort of hospitalised adults, Sahni et al. report an AUC of 0.86 with ML methods versus 0.82 with logistic regression 35 .
In predicting six-month mortality amongst disease cohorts of Medicare beneficiaries, Makar et al. report AUC scores of 0.83 (cancer), 0.81 (COPD), 0.76 (CHF) and 0.72 (dementia) 34 . This performance is superior to LR using the authors' own dataset. Sensitivity analyses suggest that unmeasured disease severity accounts for the difference in model performance across disease cohorts.
In analysing the association between predicted 12-month mortality and health care expenditures in a representative sample of Medicare beneficiaries, Einav et al. report a weak relationship 33 . Their AUC for 12-month mortality is 0.87, but in a population with low mortality rates (about 5% of Medicare beneficiaries died in 2008 and this proportion has since fallen), the 1% of beneficiaries with the highest mortality risk (predicted risk above 46%) account for less than 5% of programme expenditures. Nearly half of this "high-risk" group did not die in the 12-month time horizon. High spending is not concentrated among decedents and ex ante identification of mortality does not appear a useful way to identify poor value care.
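The kind of summary described above (the risk threshold defining a top-risk group, the share of that group who died, and the group's share of total spending) can be sketched as follows. This is an illustrative sketch only: the function `top_risk_summary` is hypothetical, and the ten-person toy dataset is invented to show the calculation, not to reproduce the Medicare figures.

```python
def top_risk_summary(risk, died, spend, top_fraction):
    """Summarise the highest-risk group: minimum predicted risk in the group,
    share of the group who died, and the group's share of total spending."""
    n = len(risk)
    k = max(1, round(n * top_fraction))
    # Indices of individuals ordered from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: risk[i], reverse=True)
    top = order[:k]
    return {
        "risk_threshold": min(risk[i] for i in top),
        "share_died": sum(died[i] for i in top) / k,
        "share_of_spending": sum(spend[i] for i in top) / sum(spend),
    }


# Toy data, invented for illustration: predicted 12-month mortality risk,
# observed death (1 = died) and annual spending in arbitrary units.
risk = [0.60, 0.50, 0.20, 0.15, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01]
died = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
spend = [10, 10, 60, 50, 30, 15, 10, 5, 5, 5]

summary = top_risk_summary(risk, died, spend, top_fraction=0.2)
print(summary)
```

In the toy data the top fifth by predicted risk has a risk threshold of 0.50, half of the group survive, and the group accounts for a tenth of total spending. This mirrors the qualitative pattern reported by Einav et al.: a model can rank risk well while the highest-risk group still contains many survivors and little of the overall expenditure.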

Key results
Palliative care is a field of health that may be improved through ML techniques that identify patterns and make predictions using large and complex data. A systematic search of peer-reviewed papers on this topic identifies three relevant studies with important insights.
First, ML approaches are powerful in predicting mortality in older and/or hospitalised adults. All three studies report AUC statistics that compare favourably with prior mortality prediction attempts.
Second, in reported studies these ML methods are superior to traditional LR, but only provided sufficient data are available. Particularly notable were the results of Sahni et al., for whom RF model performance relied on physiologic and biochemical data. When these data were excluded, LR performance was superior to RF and this performance was modest. Many studies have used comorbidity counts derived from ICD codes as predictors of important outcomes for people with serious chronic illness including short-term mortality 37-39 , but the papers in this review suggest these indices may not be as powerful predictors as previously understood.
Makar et al. similarly observe much improved RF model performance with more considered inputs, though it is notable that these were not additional clinical data but simply an inventive approach to handling routine data. The predictive power of their four models varied across disease, reflecting unmeasured underlying severity, further emphasising that input data quality decides model performance.
Third, strong predictive power does not necessarily translate into policy implications. Einav et al. was the only included paper seeking explicitly to apply its mortality predictions to a policy problem (high costs near end of life), but the authors found that a strong mortality prediction model was not useful for that specific purpose.
Corresponding implications for researchers and practitioners seeking to follow these examples are clear. Certainly, ML approaches have significant potential to improve identification of trajectories in seriously-ill populations. However, be mindful of the truism that a model is only as good as its inputs. Where possible, collect granular, individual-level clinical data. Optimise available routine data with innovative approaches where possible. Do not assume that ML approaches are de facto superior to traditional LR, and compare performance of different models. Perhaps most important, recognise that improving clinical decision-making will require more than simply improving the predictive power of mortality models. Expand analyses to other outcomes of interest and to the processes underpinning those outcomes.

Limitations
This review has a number of important limitations. We conducted a rapid review of published peer-reviewed literature and not a full systematic review incorporating grey literature. Rapid reviews are a well-established methodology for gathering evidence in a limited timeframe 30 , and we considered this appropriate for our aims of characterising the landscape of a fast-emerging field 31 . Our results can inform ongoing attempts by researchers and practitioners to harness the power of ML methods in improving palliative care, a policy priority worldwide. A particular strength of our methods was that we retained two independent reviewers at each stage, per systematic review methodology, an approach many rapid reviews eschew 40 . Nevertheless this is a rapidly evolving field 16,41 , and there are likely conference abstracts and other works in progress that would have featured in a systematic review but are excluded here.
Another limitation of the rapid review timeframe is that we did not formally assess quality or risk of bias in included papers. Instead we incorporated considerations of data quality and study usefulness in reporting, e.g. in interpreting differences in available data across studies in the context of their results. While we specified at the outset that we would not conduct a meta-analysis, we do not consider this a serious limitation in the context of our findings: specific data inputs vary across included papers yet are central to the performance of ML methods; pooling results from these papers is therefore of limited relevance.
Our eligibility criteria led to some exclusions of papers applying ML methods to improve palliative care (see extended data 32 for 18 papers excluded at full text review). In particular, four papers that were otherwise eligible and examined patient outcomes, e.g. QoL, a comfortable death and related factors, were excluded for defining their sample by vital status (a sub-sample of decedents was extracted ex post) [42][43][44][45] . While decedent cohort studies have well established value in some research contexts 12 , we established this criterion ex ante given endogeneity and bias concerns 28,29 . This decision was validated by the demonstration of one included study that found analyses of the ex post dead are not particularly useful in analysing ex ante the sick 33 . Additionally we excluded one otherwise eligible paper for pooling children and adults in the sample 46 , but this decision was based on standard research practice in this field 26 and this was reflected in 19 of 21 (90%) full texts using adulthood as an eligibility criterion. Finally, we did not include studies where the endpoints of interest were process measures, e.g. patient-physician interaction, advance care planning and expression of patient/family preferences 47 . While it is rational to assume that improved processes using ML will lead to improved outcomes in the long run 16,41 , we required that this effect on outcomes be evaluated to be eligible for our review.
One paper was excluded as unobtainable within the timeframe of the analysis, although we did subsequently obtain a copy. We did not consider the paper met our eligibility criteria. While we decided at the outset to exclude papers not in the English language, no paper was ruled ineligible on that basis.

Future research
At the outset of this review we identified outcomes of interest in three domains: patient outcomes (survival; QoL); caregiver outcomes (survival; QoL); and economic outcomes (costs, receipt of cost-(in)effective treatment, high-intensity treatment at end of life). We specified these as established measures for quality in palliative care, and so of direct relevance to practitioners and policymakers 27 . Most of these domains were unaddressed and so stand as priority areas for future work.
Our review included three studies with a predominant interest in patient mortality. No included study examined patient outcomes such as QoL. Evaluations of QoL are not straightforward because the outcome of interest is an individual and subjective concept where mortality is an observable binary state. Nevertheless, the reality is that living and dying with serious illness, and caring for those populations, is messy and complex. Studies characterising palliative care need beyond mortality, for example those at risk of pain or unmet need or death anxiety 42-45 , and accurately predicting risk would have the capacity to improve clinical decision-making and treatment pathways.
No included study, or any study rejected at full text, incorporated caregiver perspective. This is perhaps not surprising as the patient is naturally the primary focus of clinical practice and associated research. Nevertheless the role of unpaid carers is well recognised in this field, and evaluations of dyad and family outcomes are increasingly common 48 . Identifying caregiver needs in advance would also have vast potential benefit.
One study examined an economic dimension: the long-established association between end-of-life phase and high costs. The authors conclude that this is not a useful lens by which to improve resource allocation in the care of the seriously ill. Rather, those dying with terminal illness and multimorbidity are a subset of all people living with high illness burden and high associated health care use. Identification of appropriate care and supports for this group using ML must be cognisant of recent work identifying substantive treatment effect heterogeneity by disease profile and burden in palliative care 49 . In turn, this review has important insights for treatment effect heterogeneity work that has been based on routinely collected clinical data such as ICD codes: these data, it appears, are relatively weak predictors of relevant outcomes 50 .
An additional concern for future studies across all outcomes of interest is timeframe. The three included studies here used six- and 12-month survival as their outcomes of interest, but there is increasing recognition that palliative care has benefits across the trajectory of life-limiting illness 3 . For example, American Society of Clinical Oncology guidelines recommend that palliative care is provided across the disease trajectory 51 , and the last 12 months of life account for less than a third of US cancer care costs 52 . Treatment choices from diagnosis have the greatest scope to impact outcomes and costs 53,54 , and so studies that can inform these choices are the most useful.
Ethical issues receive scant attention in the studies read for this review. Questions arise with respect to how systems will use the data collected. For example, demographics may be important drivers of outcomes and treatment effects in the context of different experiences and preferences for care across social groups 55,56 . But offering or providing treatments to some groups and not others based on sociodemographic characteristics is, to say the least, ethically problematic. Also important are the data not collected. Fulfilling the goal of a "good death" involves fixed and modifiable dimensions of patient personality, experience and preferences. Personal resources associated with improved QoL near end of life may include religiosity and beliefs, "acceptance of reality", "life meaning and purpose", "self-worth", "hope", and "caregivers' support and acceptance" 57 .
These limitations may be best addressed in the context of broad conceptual considerations for the optimal applications of big data 58 . In all fields it is critical to ground big data collection and measurement in conceptual frameworks to guard against results that are specious, not generalisable or not actionable 59 . This is never more true than in palliative care, where many areas of data collection and conceptual understanding are formative.
No matter how powerful the artificial intelligence driving models to improve decision-making, it remains paramount that research, policy and practice protect space for the human wisdom of patients, their families and health care professionals in optimising experience of a unique life event.

Conclusion
Improving palliative care is a policy priority worldwide. ML has the potential to support clinicians in improved decision-making by identifying those at heightened risk of inappropriate care, poor outcomes and mortality. To date studies have demonstrated capacity to improve mortality prediction. Other outcomes have not received equivalent attention. Applications of ML approaches to policy and practice remain formative. Derived results depend on available data and must be interpreted in this context. Future research must not only expand scope to consider other outcomes and longer timeframes, but also address individual needs and preferences in the context of prognosis, and engage with the profound ethical challenges of this emerging field.

Data availability
Underlying data All data underlying the results are available as part of the article and no additional source data are required.

Grant information Health Research Board Ireland [ARPP-A-2018-005 to PM].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
I believe the paper should be indexed and my comments are broad to hopefully help the authors strengthen the paper. The subject area is fairly narrow: machine learning of routine data in palliative care. Consequently, few articles are included and the authors comment on the articles not included as a potential limitation. The authors do touch briefly on the reason why only routine data was chosen (i.e. the majority of Big Data is routine data); however, it would be useful to have a little more justification for this choice (in the background) and a little more about how this may limit the broad usefulness of a paper about 'machine learning' (in the limitations), with some discussion about how analysis of other forms of data with machine learning could benefit palliative care (in future research/directions).
Implications to policy and practice: I think a headed section of the discussion would be helpful for readers to determine whether there is evidence to change practice/policy. Also, I think it would be useful if the paper could provide a slightly clearer statement of whether the current evidence should change clinical practice or not. Currently it reads that applications to policy and practice are formative. I feel that a clear sentence stating whether clinical practice should change or not based on the evidence would be useful (e.g. 'not enough evidence to recommend changes in policy/practice as evidence is formative', etc.).
Future research: It would be useful for the authors to detail the research opportunities of other forms of Big Data (e.g. wearables, apps, user generated data, non health related data such as IoT devices) and other forms of artificial intelligence studies (deep learning, natural language processing).

Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes
Are sufficient details of the methods and analysis provided to allow replication by others? Yes

I think it may be more useful if this is a rapid review on current applications of ML (or even artificial intelligence) in palliative and end of life care. This might simplify the search terms and relax the inclusion criteria, thus may result in more studies (rather than 3 studies only) to be included. It is unclear why the review should be based only on those studies with some specific outcomes.

Minor comments
Title may need to be changed so it reflects more accurately the aim.
Type of intervention: care that "improves the quality of life of patients and their families facing the problems associated with a life-threatening illness". Why "intervention" only? How about observational studies in which case "PECOTS" formula should be used? Why not other outcomes (e.g. satisfaction, functional outcomes, caregiver burden)?
Types of studies and reports: "Studies that counted backwards, examining samples defined by characteristics at death or according to outcome, were excluded." More justification is needed for why these studies were excluded.

Are the rationale for, and objectives of, the Systematic Review clearly stated? Partly
Are sufficient details of the methods and analysis provided to allow replication by others? Yes

Is the statistical analysis and its interpretation appropriate? Yes
Are the conclusions drawn adequately supported by the results presented in the review? Partly

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Palliative and end of life care; big data; Statistics and Epidemiology; complex intervention.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 06 Aug 2019
Peter May, Trinity College Dublin, Dublin, Ireland

We thank the reviewer for their comments. These have been pasted below, numbered sequentially, and addressed as appropriate with 'Response to Reviewer' and/or 'Revision to Manuscript'. We hope that these address the concerns around rationale and conclusions as returned in the original reviewer report.

Major comments
2.1 This rapid review provides some interesting findings in a fast-emerging application area in palliative and end of life care. I enjoyed reading it. My major concern is related to the aim of this review which may also affect the choice of the search terms and the study selection.
I think it may be more useful if this is a rapid review on current applications of ML (or even artificial intelligence) in palliative and end of life care. This might simplify the search terms and relax the inclusion criteria, thus may result in more studies (rather than 3 studies only) to be included. It is unclear why the review should be based only on those studies with some specific outcomes.
Response to Reviewer: As noted in the Methods>Eligibility criteria>Type of outcomes, outcomes
Studies that counted backwards, examining samples defined by characteristics at death or according to outcome, were excluded for three reasons. First, participant selection at death biases substantially derived results by distorting timeframe of analysis. Second, mortality is an outcome and many characteristics at death are outcomes; they are not independent of treatment choices but instead endogenous and therefore inappropriate as an eligibility criteria for evaluating the treatment. Third, treatment effect estimates from decedent cohort studies are in practice evidence that "treatment t should be provided to population p", where population p is partly defined by imminent death. They are therefore only useful under scenarios of very good prognostic accuracy, and this is seldom the case in the real world.
Response: As well as the addition to Methods>Eligibility criteria>type of studies and reports, we also note the following lines of justification in the Discussion>Limitations: "While decedent cohort studies have well established value in some EOL research contexts, we established this criterion ex ante given endogeneity and bias concerns. This decision was validated by the demonstration of one included study that found analyses of the ex post dead are not particularly useful in analysing ex ante the sick."

Competing Interests: No competing interests were disclosed.
The final sentence of the first paragraph is a bit harsh - I suggest replacing 'consistently' with 'sometimes'.
Palliative care is defined as an 'interdisciplinary specialism' but the rest of the definition is concerned with the WHO's palliative approach to care rather than the specialist/generalist nature of the workforce delivering it. I suggest focusing just on the WHO definition.
I am by no means an expert, but doesn't ML only sometimes 'make inferences without explicit user instruction' - i.e. when it is of an unsupervised kind?

Methods
I found reference to 'Types of intervention' confusing, as this heading is usually reserved for evaluative reviews (e.g. of RCTs). Given that life-limiting illness was not mentioned under 'Types of participants', I suggest moving this part of the criteria under that heading. QOL is dealt with under 'Types of outcomes'.
Some justification is needed of excluding studies that 'counted backwards'. This may also need further explanation in the context of the predominance of AUC, which seems to focus on people who died and then look back to see what predicted this.
The English language requirement needs to be included under eligibility rather than Information sources.
The Information sources heading needs to also include 'Search'.
'Hand searching' is a more widely used term than 'Snowball sampling' for reviews.
Remove the heading 'Data analysis' and leave in those for extraction and synthesis.

Findings
Unless I have misunderstood, this section seems to refer to other studies that were not included in the review and so should be moved to the Discussion. But what is meant by 'prior mortality prediction efforts' and which authors are referred to in 'the authors' own dataset'? If the authors of the paper in question, then it should be written in the past tense.
I lack expertise to comment on interpretations brought to the different statistical approaches used in the studies.

Discussion
Here, 'prior mortality prediction attempts' is presumably referring to other research that does not use ML; references are needed.
I don't quite follow the statement 'This places a question mark against the assumption in much palliative care literature that comorbidity counts derived from ICD codes are important predictors of short-term mortality', and again references are needed.
The justification that quality appraisal was outside the authors' 'sphere of interest' reads oddly and is not a normally accepted justification. Could it be, instead, that existing quality appraisal tools are not a good fit for studies using routine data?
I recommend removing reference to meta-analysis in the limitations, as this is totally unsuited to the review's aims.
have defined this as an RR ourselves, conducted the whole thing to a tight time frame (circa 6 months) and excised various elements of an SR that I would consider non-negotiable (grey lit, PROSPERO registration). Despite the temptation then, I'm minded to leave as is unless instructed otherwise by editors and reviewers.
Revision to Manuscript: None.

1.2 Regarding terminology, I would use palliative care throughout, as it's confusing for the reader to seemingly distinguish between these without defining the latter.
Revision: 'Palliative and EOL care' as a sphere of interest has been changed throughout to 'palliative care', including the paper title. A small handful of 'end of life' uses remain where appropriate to specific context.

1.3 No need to repeat peer-reviewed as an eligibility criterion.
Revision: Final clause of the Methods deleted.

1.4 Poor prognosis is a weak driver of costs.
Revision: Not clear exactly what this comment implies but perhaps it's a poor explanation of the finding. Revised final two sentences of the Results: ML-informed models outperformed logistic regression in predicting mortality where data inputs were relatively strong, but models using only routine administrative data had limited benefit from ML methods. Identifying poor prognosis does not appear effective in tackling high costs associated with serious illness.

1.5 The final sentence of the first paragraph is a bit harsh - I suggest replacing 'consistently' with 'sometimes'.
Revision: Changed to 'often'.

1.6 Palliative care is defined as an 'interdisciplinary specialism' but the rest of the definition is concerned with the WHO's palliative approach to care rather than the specialist/generalist nature of the workforce delivering it. I suggest focusing just on the WHO definition.
Revision: Reference to interdisciplinary specialism removed.

1.7 I am by no means an expert, but doesn't ML only sometimes 'make inferences without explicit user instruction' - i.e. when it is of an unsupervised kind?
Revision: can make inferences without explicit user instruction.

Methods
1.8 I found reference to 'Types of intervention' confusing, as this heading is usually reserved for evaluative reviews (e.g. of RCTs). Given that life-limiting illness was not mentioned under 'Types of participants', I suggest moving this part of the criteria under that heading. QOL is dealt with under 'Types of outcomes'.
Revision: 'Type of intervention' section merged into 'Type of participants'.

1.9 Some justification is needed of excluding studies that 'counted backwards'. This may also need further explanation in the context of the predominance of AUC, which seems to focus on people who died and then look back to see what predicted this.
Revision: Addition to Methods>Eligibility criteria>type of studies and reports (also in response to Reviewer 2, comment 2.5): Studies that counted backwards, examining samples defined by characteristics at death or according to outcome, were excluded for three reasons. First, participant selection at death biases substantially derived results by distorting timeframe of analysis. Second, mortality is an outcome and many characteristics at death are outcomes; they are not independent of treatment choices but instead endogenous and therefore inappropriate as an eligibility criteria for evaluating the treatment. Third, treatment effect estimates from decedent cohort studies are in practice evidence that "treatment t should be provided to population p", where p is partly defined by imminent death. They are therefore only useful under scenarios of very good prognostic accuracy, and this is seldom the case in the real world.
Response: Studies using AUC can do so either prospectively (enrolling people at baseline, following them until death, and then using baseline data to create and validate a mortality-prediction index) or retrospectively (scooping up all data after the fact, then using baseline data to create and validate a mortality-prediction index), and the included studies do this. No included studies (but a number of excluded full texts in the supplementary materials) defined the sample at death and counted backwards where any baseline is created ex post by the investigator, which is a recipe for bias (https://www.ncbi.nlm.nih.gov/pubmed/15585737).

1.10 The English language requirement needs to be included under eligibility rather than Information sources.
Revision: Moved to end of 'types of studies and reports'.

1.11 The Information sources heading needs to also include 'Search'.
Revision: Added as Table 1.

1.12 'Hand searching' is a more widely used term than 'Snowball sampling' for reviews.
Revision: Changed to 'Hand searching'.

1.13 Remove the heading 'Data analysis' and leave in those for extraction and synthesis.