A review of public health economic modelling in the National Institute for Health and Care Excellence (NICE) [version 1; peer review: 3 approved with reservations]

Background: The National Institute for Health and Care Excellence (NICE) uses economic modelling to inform judgements whenever further insight is required for decision-making. Doing so for public health guidance poses several challenges. The study's objective was to investigate the level of heterogeneity in NICE's public health economic models with regard to economic evaluation techniques, perspectives on outcomes and the measurement of non-health benefits.
Methods: A review of all economic modelling reports published by NICE's Centre for Public Health (CPH) as part of its guidance.
Results: The review identified 56 eligible pieces of public health guidance over the relevant period. Of these, 43 used economic modelling and 13 used no formal economic model. In total, 61 economic models were used. Though the CPH specifies a reference case, in practice there is a large amount of variability from one model to the next. The most common perspective used for evaluations was that of the National Health Service (NHS); the most common economic evaluation approach was cost-utility analysis (CUA). Of the 56 topics, 23 used other combinations of perspective and technique, which allowed them to incorporate non-health effects, such as productivity, the effect on taxes raised and benefits spending, costs to the criminal justice sector, the effect on educational attainment and general wellbeing.
Conclusions: NICE regularly updates its reference case, and non-CUA evaluation techniques have become more prominent in recent years. The results highlight the genuine advantages of having a variety of economic evaluation techniques available, which can be matched with the given topic. While it is always necessary to be wary of the possibility of gamesmanship and cherry picking, there is a surprising …


Background
The National Institute for Health and Care Excellence (NICE) is responsible for producing technology appraisals and evidence-based guidelines for health and social care, including public health, in England. The Wanless Report 1 stated that "to achieve the objective of allocating funding more efficiently between health care and public health, it is vital that... analytic[al] methods are used". NICE's Centre for Public Health Excellence (CPH) was founded in 2005 and was responsible for producing public health guidance relating to "preventing disease, prolonging life and promoting health and efficiency" 2 .
After a NICE internal reorganisation in 2015, standalone directorates were merged, making public health appraisals less clearly demarcated from other topic areas. Reference cases and other methods were harmonised across all of NICE's guidance-producing areas. Many of the unique aspects relating to public health were incorporated into the unified methods used by NICE, and CPH staff were redirected to the broader 'Centre for Guidelines' (CfG). Because of these changes, the review was limited to guidance topics published up to December 2014, when the final guidance topic under the CPH banner was published.
In practice it is challenging to adequately understand, quantify and model the multiple effects of many public health interventions 3 . Certain interventions cannot in practice be investigated directly due to the methodological challenges specific to public health; these include the attribution of effects to interventions, measuring and valuing outcomes, identifying intersectoral costs and consequences, and how best to incorporate equity issues 4,5 . Economic modelling can in principle address many of these issues, providing estimates of effect and quantifying the uncertainty related to them.
CPH guidance took into account the cost-effectiveness of the intervention, the likely impact of its provision on societal equity, and other concerns 6 as part of the relevant committee's deliberations. Final decisions on public health guidance require human judgement, regardless of the reference case and other frameworks used to structure them; such decisions are "informed by science but nevertheless judgements" 6 . Aside from the challenges of public health modelling more generally, NICE's decision-making processes are complex and can be difficult to articulate and understand 7 . The full range of nuances and the complex nature of deliberations cannot necessarily be recorded accurately, potentially leading to a lack of clarity about how final decisions are made, such as whether all relevant criteria have been given appropriate consideration throughout, or how factors have been weighted implicitly. This was exacerbated by the fact that NICE committees were originally closed to public viewing. The economic modelling process was one stage of the directorate's decision-making process at which all issues under analysis were stated explicitly.
For this reason, the objective of this paper was to review all economic modelling conducted in the CPH over the period March 2006 to December 2014, in order to investigate how the reference case was used in public health settings in practice, and in particular to assess:
• the level of heterogeneity in the use of economic evaluation techniques,
• perspectives on outcomes, and
• the measurement of non-health benefits.
The paper describes the variety of issues (and compromises) that have been considered in establishing the cost-effectiveness of various approaches as part of the guidance process. These reflect the broad scope of public health settings and the wide range of costs and benefits at a population level outside of health. Each topic is unique, requiring its own criteria, its own choice of the most appropriate economic appraisal technique(s) and general flexibility, regardless of the official line specified in the reference case (shown in Table 1).
The public health reference case changed gradually after the CPH's foundation in 2005 8 ; it had initially been based heavily upon the reference case of the health technologies directorate. The CPH's discount rate, for example, was reduced to 1.5% in 2012 9 to reflect the long-term nature of many public health interventions (and because other directorates had begun to reduce the rate similarly), before being raised again in 2014. A wider range of economic evaluation techniques is now also permitted than was initially the case. Other elements of the reference case have not changed over this time; for example, QALYs remain the sole recommended measure of health effects and explicit equity weighting is not permitted, though these issues have also been subject to occasional criticism 10 . The CPH had a specific responsibility to consider the equity of outcomes alongside cost-effectiveness 11 , and it was recognised that a purely CUA approach would fail to address equity or distributional issues directly 12 . Equity was not considered explicitly until the 2014 update to the methods manual 13 , and was not incorporated into modelling; it was to be considered only later in the decision process.
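To see why the discount rate matters so much in public health, consider a hypothetical intervention whose costs fall up front while its health gains accrue over decades. The minimal Python sketch below discounts both streams at the same annual rate, as the reference case requires, and compares the resulting cost per QALY at 3.5% and 1.5%. The function names and all numbers are illustrative assumptions, not taken from any NICE model.

```python
def present_value(stream, rate):
    """Discount a list of annual values; index 0 is the current year (undiscounted)."""
    return sum(value / (1 + rate) ** year for year, value in enumerate(stream))


def cost_per_qaly(costs, qalys, rate):
    """Discount costs and QALYs at the same annual rate and return their ratio."""
    return present_value(costs, rate) / present_value(qalys, rate)


# Hypothetical intervention: £100,000 spent now, 1 QALY gained per year for 39 years
HORIZON = 40
costs = [100_000.0] + [0.0] * (HORIZON - 1)
qalys = [0.0] + [1.0] * (HORIZON - 1)

for rate in (0.035, 0.015):
    print(f"discount rate {rate:.1%}: £{cost_per_qaly(costs, qalys, rate):,.0f} per QALY")
```

Because the costs sit undiscounted in year 0 while the QALYs are spread over later years, the lower 1.5% rate yields a noticeably lower cost per QALY; interventions with long-delayed benefits are therefore especially sensitive to this choice.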
When is modelling required?
The approaches used for economic evaluation in the CPH, as elsewhere in NICE, compared the costs of interventions under consideration with their expected benefits, making explicit how effectively they met the directorate's objectives. While approaches to quantifying benefits and costs differ depending on the economic evaluation technique used, the CPH defined its aim broadly as "the promotion of good health and the prevention of ill health", taking an "inclusive" perspective 8 . Over the long term, this quantification should reduce the potential for inconsistent prioritisation and its associated opportunity costs, and lead to the "conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients" 14 . However, evidence of the effectiveness of certain interventions is not always available, and it is not always possible in practice to capture the full range of non-health population impacts in the model, and hence potentially in the broader decision process.

Synthesis of evidence on outcomes: Based on a systematic review
Time horizon: Long enough to reflect all important differences in costs or outcomes between the interventions being compared
Measure of health effects: QALYs (the EQ-5D is the preferred measure of health-related quality of life in adults)
Measure of non-health benefits: Where appropriate, to be decided on a case-by-case basis
Source of data for measurement of health-related quality of life (HRQoL): Reported directly by people using services and/or carers
Source of preference data for valuation of changes in HRQoL: Representative sample of the UK population
Discount rate:
• The same annual rate for both costs and health effects (currently 3.5%)
• Sensitivity analyses using rates of 1.5% for both costs and health effects may be presented alongside the reference-case analysis
• In certain cases, cost-effectiveness analyses are very sensitive to the discount rate used. In this circumstance, analyses that use a non-reference-case discount rate for costs and outcomes may be considered
Equity weighting: A QALY has the same weight regardless of the other characteristics of the people receiving the health benefit

The available evidence in public health tends to come from a broader range of settings and hence is sometimes shallower than in pharmacoeconomic studies, at least in terms of the hierarchy of evidence. These hierarchies follow a standardised ranking from randomised controlled trials down to expert evidence, reflecting the increasing probability of bias at each lower level 15 . But they are not necessarily appropriate for use in public health settings 3 . The length of the causal chains for these interventions requires assumptions and judgements, which must be tested through modelling and deliberation, as the "principle of the accumulation of results of trials does not sit well with model and theory based sciences" 3 .
Similarly, expert judgement, deliberation and experience remain vital to decision-making within NICE. The strength of the evidence varies widely: the initial evidence review carried out for a topic may find overwhelming evidence of effectiveness or cost-effectiveness, or in other cases effectively none. Economic modelling is used in the area between these extremes, where further insight is required to interpret the available evidence and better inform decision-makers. Where it is clear in advance that the intervention would be cost saving, or that its costs are relatively small compared with the expected health gains, modelling is not necessary 9 .
How is modelling used?
Interpreting and deciding which evidence is most relevant to the model requires a pragmatic approach, using simplifying assumptions. As in any scientific setting, the model can therefore offer only an approximation. The model makes explicit the logical implications of the available data, making it easier for decision-makers to draw rational conclusions given the uncertainty associated with each potential course of action. In many cases, ballpark figures will provide a strong indication of whether or not the intervention is likely to prove cost-effective. Many public health interventions are highly cost-effective: about 15% of those investigated were shown to be cost saving and 89% were below the usual NICE incremental cost-effectiveness ratio (ICER) threshold of £30,000 per QALY 16 . Deterministic and probabilistic sensitivity analyses can be used to further indicate the robustness of a model's conclusions.
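An ICER is the incremental cost of an intervention divided by its incremental QALYs relative to the comparator. The Python sketch below shows how a hypothetical result might be classified against a threshold; the function name `evaluate`, the default £30,000 per QALY threshold and the example inputs are illustrative assumptions rather than NICE's own procedure.

```python
def evaluate(d_cost, d_qaly, threshold=30_000.0):
    """Classify an intervention from its incremental cost (£) and incremental QALYs.

    Returns (verdict, icer); icer is None whenever a ratio would be misleading.
    """
    # More (or equal) health at lower (or equal) cost, with some real difference: cost saving
    if d_qaly >= 0 and d_cost <= 0 and (d_qaly > 0 or d_cost < 0):
        return "cost saving", None
    # Health gain at extra cost: compare the ICER with the threshold
    if d_qaly > 0:
        icer = d_cost / d_qaly
        return ("below threshold" if icer <= threshold else "above threshold"), icer
    # No health gain and no saving: the comparator is at least as good
    if d_cost >= 0:
        return "dominated or no gain", None
    # Fewer QALYs but lower cost: a trade-off left to the committee's judgement
    return "fewer QALYs at lower cost", None


print(evaluate(d_cost=12_000.0, d_qaly=0.5))   # ('below threshold', 24000.0)
```

The quadrant is checked before any ratio is reported because a negative ICER is ambiguous on its own: it can describe either a cost-saving intervention or a dominated one.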
The assumptions and decisions involved in modelling are an inevitable part of the process, and "economic modelling requires judgements to be made by both modellers and decision-makers" 6 . There are several stages to this process. NICE usually tenders the evidence review and modelling processes to experts in academia, but CPH staff liaised with them throughout to ensure that the methods used were in line with NICE's requirements. The model must be requisite (parsimonious but good enough to do the job 17 ), which requires balancing the aims, costs and effects of the interventions under review. Modellers, in collaboration with topic appraisal committees, must also decide where insufficient data are available to investigate certain interventions further. The study into Workplace interventions to promote smoking cessation (PH5) 18 , for example, was initially also intended to investigate mass media interventions, but the previously completed literature review found no relevant evidence of effect, so these were not pursued further at the modelling stage. As well as requiring use of the reference case, the 2012 CPH methods manual 9 stated that modellers must ensure that:
• the most important questions or intervention areas are selected for economic analysis
• the overall modelling approach is appropriate
• important health effects and resource costs are all included
• effects and outcomes not related to health are included (if they are material for the sector whose perspective is being used, usually the public sector, local government or the NHS)
• the best available effectiveness, epidemiological and resource evidence is used
• model assumptions are plausible
• uncertainties are fully explored and systematically addressed
• results are interpreted appropriately and any limitations are acknowledged.
In some cases de novo models are required, though on many occasions it is possible to base them upon previous work. Because guidance on specific topics is revised over time, previous models may be reused, reducing the work required and easing comparisons between conclusions. A limited number of themes recur in practice: several tobacco-related topics, for example, used a similar simulation-based model estimating the probability of acquiring a number of smoking-related conditions (or death) as simulated individuals aged (such as PH5 18 , PH10 19 , PH14 20 ). Similarly, guidance on physical activity reused updated versions of a model incorporating the risk of coronary heart disease, stroke and type 2 diabetes mellitus (e.g. PH44 21 , PH54 22 ). This allowed comparisons not only between the effectiveness of interventions and 'doing nothing', but also between topics. Other models are based upon, or incorporate, work previously completed outside NICE. Economic modelling may also focus solely on a specific part of the guidance, or investigate a subpopulation of those to whom the broader guidance will apply.
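A minimal individual-level (microsimulation) sketch of this kind of model is shown below in Python: each simulated person ages one year at a time and, with some annual probability, either dies or acquires a smoking-related condition. All probabilities, the single generic "condition" and the function names are hypothetical assumptions; the sketch is not a reconstruction of the PH5, PH10 or PH14 models.

```python
import random

# Hypothetical annual probabilities, for illustration only (not from any NICE model)
P_CONDITION = {"smoker": 0.020, "quitter": 0.010}  # annual risk of a smoking-related condition
P_DEATH_AT_START = 0.005                           # annual background death risk at start age


def simulate_person(status, start_age, end_age, rng):
    """Age one simulated individual year by year; return (developed_condition, died)."""
    has_condition = False
    for years_elapsed in range(end_age - start_age):
        # Assumed: death risk grows 5% a year and doubles once the condition is present
        p_death = P_DEATH_AT_START * 1.05 ** years_elapsed * (2 if has_condition else 1)
        if rng.random() < p_death:
            return has_condition, True
        if not has_condition and rng.random() < P_CONDITION[status]:
            has_condition = True
    return has_condition, False


def simulate_cohort(status, n=5_000, start_age=40, end_age=70, seed=1):
    """Simulate n individuals; return the shares who developed the condition and who died."""
    rng = random.Random(seed)
    outcomes = [simulate_person(status, start_age, end_age, rng) for _ in range(n)]
    share_condition = sum(cond for cond, _ in outcomes) / n
    share_died = sum(died for _, died in outcomes) / n
    return share_condition, share_died


for status in ("smoker", "quitter"):
    condition, died = simulate_cohort(status)
    print(f"{status}: {condition:.1%} developed the condition, {died:.1%} died by age 70")
```

Seeding both cohorts identically is a simple form of common random numbers, which tends to make the smoker/quitter comparison less noisy than independent runs would.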

Model perspectives and economic evaluation techniques
The CPH's reference case, the set of standard approaches to be used in economic modelling and broader decision-making, was intended to be flexible where necessary on a case-by-case basis. This standard approach recommended a public sector perspective, to take account of the costs and benefits of each intervention 9 . The 2014 methods manual 13 formally allowed for discretion on the perspective used across all NICE directorates for the first time, varying according to the nature of the problem. Previously, a public sector perspective was largely confined to public health settings. An NHS and Personal Social Services (PSS) perspective is used where costs and benefits relate largely to health alone; this had also previously been used in many public health cases and is the standard for drug and technology appraisals in NICE. Where costs and benefits fall largely upon local authorities, a local government perspective can similarly be used. In practice, however, the choice of perspective can be something of a moot point. In many cases, models claiming to use a public sector perspective used only healthcare costs (e.g. PH32 23 ); in others, no perspective was explicitly stated. Other perspectives are also possible, such as an employer perspective for workplace interventions (PH5 18 ) or a societal perspective, for example in cases of domestic violence (PH50 24 ). These were codified in the reference case for the first time in the 2012 guidance 9 , but had nonetheless been used previously. The most common economic evaluation techniques used in the CPH were cost-effectiveness analysis (CEA), cost-utility analysis (CUA), cost-benefit analysis (CBA) and cost-consequence analysis (CCA); for definitions of these techniques see Drummond et al. 25 .
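The four techniques differ mainly in how the consequences of an intervention are expressed. As a rough illustration (a sketch under assumed inputs, not the CPH's method), the Python below summarises one set of hypothetical incremental results in each of the four ways. The class and field names are hypothetical, and the monetised-benefit figure simply stands in for whatever valuation a CBA would require.

```python
from dataclasses import dataclass


@dataclass
class IncrementalResult:
    """Hypothetical incremental results of an intervention versus its comparator."""
    cost: float                 # £, from whichever perspective has been chosen
    natural_units: float        # e.g. additional quitters or cases averted
    qalys: float                # incremental QALYs
    monetised_benefits: float   # £ value of all benefits, needed for CBA
    other_outcomes: dict        # outcomes reported but not aggregated (CCA)


def cea(r: IncrementalResult) -> float:
    """Cost-effectiveness analysis: cost per natural unit of effect."""
    return r.cost / r.natural_units


def cua(r: IncrementalResult) -> float:
    """Cost-utility analysis: cost per QALY gained (the ICER)."""
    return r.cost / r.qalys


def cba(r: IncrementalResult) -> float:
    """Cost-benefit analysis: net benefit in money; positive favours the intervention."""
    return r.monetised_benefits - r.cost


def cca(r: IncrementalResult) -> dict:
    """Cost-consequence analysis: a disaggregated list left for decision-makers to weigh."""
    return {"cost (£)": r.cost, "natural units": r.natural_units,
            "QALYs": r.qalys, **r.other_outcomes}


example = IncrementalResult(cost=50_000.0, natural_units=100.0, qalys=5.0,
                            monetised_benefits=180_000.0,
                            other_outcomes={"days of absence avoided": 400})
print(cea(example), cua(example), cba(example))   # 500.0 10000.0 130000.0
print(cca(example))
```

Only CBA collapses everything into money, which is why it can absorb non-health effects directly; CCA instead leaves the weighting of disparate outcomes to the committee.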
Economic modelling is informed by a prior evidence review, which specifies the range of relevant interventions used in the area in the past, the nature and strength of the studies investigating them, and their effectiveness, costs and applicability to a UK context. Where no previous study has directly measured the outcomes under investigation, economic modelling can be used to estimate these outcomes, using parameters populated from the results found in the literature identified by the evidence review. Where it is not appropriate or possible to model results further, discussion pieces may still be written to explain how an economic model would be used if such a thing were possible, and to indicate the magnitude, 'wheres' and 'hows' of the underlying uncertainty that makes more formal modelling impossible.
Where possible, the costs of an intervention were assessed based on the information available in the literature described in the evidence review. This typically included the costs of any device or pharmaceutical required, staff time, monitoring and maintenance, treating adverse events, rent and so on. Public health also throws up more unusual consequences; the costs found in the review also included, for example: decreased tax revenues (PH41 26 ); the cost of paying for extra years of health care (PH23 27 , though these were more than outweighed by the reduced costs of treating smoking-related illnesses); and injuries occurring as a side effect of the intervention (PH44 21 ). The impact on individuals volunteering their time may be highlighted (PH9 28 ), though this cannot be fully incorporated into CUA, as costs to carers and non-patients have in practice been considered outside the scope of NICE's reference case.
Final decisions rest with the relevant committee for the topic, known as a Public Health Advisory Committee (PHAC). Prior to the 2012 public health methods manual 9 , two different types of committee fulfilled this role. The Public Health Intervention Advisory Committee (PHIAC) was a standing multidisciplinary panel which could look at clear, well-defined public health topics, such as "Prevention of sexually transmitted infections and under 18 conceptions" 29 and "School-based interventions on alcohol" 30 . Programme Development Groups (PDGs) were assembled on a topic-by-topic basis to create guidance in broader, more complicated or less clearly defined areas, such as "Behaviour change" 31 or "Community engagement" 28 .

Equity in economic modelling
Issues relating to equity are also not currently formally considered as part of the economic modelling stage, though they remain a fundamental part of the broader guidance development process. As with the impact of volunteering, this does not mean that such issues are irrelevant to the modelling stage. The Tuberculosis: hard-to-reach groups (PH37 32 ) guidance explained how out-of-pocket expenses (again relevant to private individuals rather than to the public sector perspective used in this model) are likely to have a disproportionate effect on homeless populations. As such, they may be relevant not only to ethical issues around access for these groups, but also, in practical terms, to the likely real-life effectiveness of the intervention (assuming that homeless populations would, as a result, have significantly lower uptake than might otherwise be expected). The committee could then take this into account during the later stages of formalising recommendations. Equally, though the modelling itself may not deal with equity head on, at times the topic itself may obviously relate to improving health outcomes for specific vulnerable groups. And in several topics, public sector (and societal) perspectives may bring into consideration areas outside healthcare where outcomes appear to be heavily influenced by socioeconomic and other factors.
Interventions may also increase inequality in the short term if higher social status groups are likely to benefit from a first-mover advantage. During the development of the guidance on Walking and Cycling (PH41 26 ), on which the authors worked, there was a concern that such an issue might arise (at least temporarily), though it was hoped that in the longer term uptake would increase amongst all groups. A similar concern was expressed in the published guidance on Physical activity and the environment (PH8 33 ). It is tempting to draw a comparison with the Kuznets curve 34 , which posits that as a nation's economy develops, inequality widens for a time before ultimately narrowing once a certain overall level of development is attained; a similar phenomenon relating to health inequalities may exist in certain public health settings.
At times, decisions in the CPH could even hinge on seemingly inconsequential differences in the interpretation of the scope, and such subtleties may prove difficult to represent in models. The scopes may themselves change through gradual, subtle reinterpretation in light of the facts as they emerge. The facts used are not always self-evident and require judgement and an understanding of the causal chains of effect 35 . With a new government, wholesale changes in scope are also possible. PHACs always needed to interpret some issues ad hoc as they arose, using their experience and judgement, though recommendations had to be formulated carefully.
An example framework for developing the economic model from start to finish, taken from PH41 26 , is shown in Figure 1, illustrating the variety of stages required, from defining the population to which the modelling stage should apply through to quantifying cost-effectiveness in monetary units.

Methods
All economic modelling reports on public health topics published by NICE between March 2006 and December 2014 were eligible for inclusion in this review. These are published on NICE's "Guidance and advice list" webpage. Each guidance topic has its own webpage listed there, containing an 'Evidence' section, which lists the reports offering supporting evidence on which the subsequent decisions are based. The reports available vary from topic to topic, but typically comprise an effectiveness/cost-effectiveness literature review, an economic modelling report and (if a sufficiently long period has passed since the guidance was issued) a report reviewing whether to update the guidance. Occasionally, further reports are included, such as fieldwork, qualitative approaches or case studies. Reports do not have a standardised nomenclature, and occasionally the modelling report is incorporated into the literature review.
For this review, all reports described as relating to economic modelling for each piece of guidance were investigated. Where a topic's guidance webpage listed no "economic modelling" report, all reports labelled under evidence, effectiveness and cost-effectiveness were reviewed, to ensure that economic models listed there instead were included in this review. For topics that instead reviewed a series of case studies to illustrate the potential cost-effectiveness of approaches, such as Strategies to prevent unintentional injuries among under-15s (PH29), these case reports were also investigated and described. This process was carried out primarily by BR, with recourse to the co-authors where necessary. There was no simple one-to-one relationship between topics and modelling reports. While some guidance did not require any modelling, other guidance used multiple models in the same report, or across multiple reports. Sometimes this reflected topics where different perspectives are possible, such as initially using an NHS and PSS perspective before investigating the effect of considering further public sector or societal issues. Where a topic is broadly defined, a wide variety of very different interventions may fit under the definition, and in such cases a selection of these may be used as case studies.

Results
In total, 56 public health guideline reports were published by NICE between March 2006 and December 2014, and following screening all were included in the review (see Figure 2). Thirteen of these used no economic modelling. Of the remaining 43, 30 used CUA alone. Others typically used CUA alongside other approaches: 3 used CBA, 3 CCA, 2 net financial benefit (NFB) and 1 arguably used both CBA and CCA. One used an approach that could equally be described as CBA or CUA (described later). Many used more than one model, and in total 61 models were published.
Unsurprisingly, the primary criterion used to quantify the benefits of interventions has been health, generally represented by QALYs, though there have been both exceptions and additions to this. For example, the economic model for "contraceptive services with a focus on young people up to the age of 25" [PH51] uses a series of CEAs (with outcomes such as a reduction in the rate of ectopic pregnancies) which were not subsequently translated into QALY gains, and the "promoting physical activity in the workplace" [PH13] guidance (which was aimed at employers) described the potential reduction in absenteeism rather than health outcomes. ICERs have been calculated using a variety of approaches, particularly Markov modelling techniques such as state transition modelling and cohort simulation. At times, the number of deaths or cases averted is shown (e.g. PH41 26 ) in addition to QALYs; in principle, such an approach could be expanded upon more formally in a CCA.
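A Markov cohort (state transition) model tracks the share of a cohort in each health state over successive cycles and accumulates the discounted costs and QALYs of time spent in those states. The Python sketch below is a deliberately minimal three-state example under hypothetical parameters; it is not any of the NICE models, and it counts state membership at the end of each cycle without the half-cycle correction that a real model would usually apply.

```python
# Hypothetical three-state Markov cohort model: 0 = well, 1 = diseased, 2 = dead.
# Annual cycles; all probabilities, utilities and costs are illustrative only.
UTILITY = [0.85, 0.60, 0.0]         # QALY weight per year spent in each state
ANNUAL_COST = [0.0, 500.0, 0.0]     # £ per year spent in each state


def transition_matrix(p_onset, p_death_well=0.01, p_death_diseased=0.05):
    """Row i gives the probabilities of moving from state i to each state in one cycle."""
    return [
        [1 - p_onset - p_death_well, p_onset, p_death_well],
        [0.0, 1 - p_death_diseased, p_death_diseased],
        [0.0, 0.0, 1.0],
    ]


def run_cohort(p_onset, cycles=30, rate=0.035, upfront_cost=0.0):
    """Trace a cohort starting in 'well'; return discounted (cost, QALYs) per person."""
    matrix = transition_matrix(p_onset)
    dist = [1.0, 0.0, 0.0]
    total_cost, total_qalys = upfront_cost, 0.0
    for cycle in range(1, cycles + 1):
        # Move the cohort one cycle forward: new_j = sum_i old_i * P[i][j]
        dist = [sum(dist[i] * matrix[i][j] for i in range(3)) for j in range(3)]
        discount = 1 / (1 + rate) ** cycle
        total_qalys += discount * sum(d * u for d, u in zip(dist, UTILITY))
        total_cost += discount * sum(d * c for d, c in zip(dist, ANNUAL_COST))
    return total_cost, total_qalys


# Comparator versus a hypothetical intervention costing £3,000 that lowers disease onset
cost_0, qalys_0 = run_cohort(p_onset=0.03)
cost_1, qalys_1 = run_cohort(p_onset=0.02, upfront_cost=3_000.0)
d_cost, d_qalys = cost_1 - cost_0, qalys_1 - qalys_0
print(f"incremental cost £{d_cost:,.0f}, incremental QALYs {d_qalys:.3f}")
if d_cost > 0 and d_qalys > 0:
    print(f"ICER £{d_cost / d_qalys:,.0f} per QALY")
```

The ICER is printed only when the intervention both costs more and gains QALYs, since otherwise the ratio would not be interpretable as a cost per QALY gained.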
For simple cases using CUA, the QALYs lost to mortality and morbidity are calculated, for example from fatal and non-fatal myocardial infarctions (PH6 31 ). Where the data are available and a broader perspective is required, the reduction in quality of life associated with non-health causes may also be incorporated. In many cases, however, such research had not previously been carried out, so these effects could not be reliably quantified and were excluded from the model.
In Interventions to reduce substance misuse among vulnerable young people (PH4 36 ), for example, the reduction in QALYs associated with robbery was included because a previous study had estimated this effect. But the effects of other crimes listed in the initial model were excluded, as no comparable studies had been carried out (even though it is fairly clear that these would similarly have had a negative effect on people's lives and health). The effects of unemployment were excluded from the final model for the same reason. Similarly, guidance on Managing overweight and obesity among children and young people: lifestyle weight management services (PH47 37 ) explained that while related bullying and subsequent mental health issues are clearly relevant, no studies have indicated their impact on health-related quality of life. Because these effects may well be implicitly captured in prior studies, they were excluded, as including them would risk double counting.
Such exclusions may mean that the full range of benefits is not taken into account in the model, potentially increasing the apparent ICER (assuming the full range of costs is included) and therefore making the intervention look less cost-effective than it otherwise would.
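The arithmetic behind this is simple: the ICER divides the incremental cost by the incremental QALYs, so omitting real but unquantified QALY gains shrinks the denominator and raises the ratio. In the hypothetical Python sketch below (all figures invented for illustration), leaving out part of the benefit is enough to move the ICER from below to above a £30,000 per QALY threshold.

```python
# Hypothetical figures: one incremental cost, and QALY gains split into the
# components a model could quantify and those it had to leave out.
d_cost = 20_000.0
quantified_qalys = {"health effects with published valuations": 0.5}
unquantified_qalys = {"effects with no valuation study (excluded)": 0.3}

icer_as_modelled = d_cost / sum(quantified_qalys.values())
icer_if_complete = d_cost / (sum(quantified_qalys.values()) + sum(unquantified_qalys.values()))

print(f"ICER as modelled: £{icer_as_modelled:,.0f} per QALY")        # £40,000
print(f"ICER with all benefits: £{icer_if_complete:,.0f} per QALY")  # £25,000
```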
To a certain extent it may be possible to include effects omitted from a CUA in other ways, such as through a CCA. In Physical activity: brief advice for adults in primary care (PH44 21 ), for example, the CUA model included the QALY impact of increased physical activity on reducing coronary heart disease, stroke and type 2 diabetes, whereas the CCA approach incorporated these factors alongside improved outcomes in mental health and cancer, a broader range of health effects, and further benefits from reduced absenteeism from work.
Smoking cessation in secondary care: acute, maternity and mental health services (PH48 38 ) extended this approach further, initially carrying out a CUA using an NHS and PSS perspective and subsequently building upon it. The original model used a Markov simulation incorporating reduced coronary heart disease, chronic obstructive pulmonary disease, lung cancer, myocardial infarction and stroke. Because the effectiveness of approaches may vary depending on the setting in which they are employed, models were tested for a variety of case-study scenarios, such as maternal and neonatal health, mental health and preoperative settings. The inputs to each case-study-specific model could then be more relevant and reliable, and an initial ICER was calculated. Cost savings from offsetting the future costs of treating smoking-related diseases were then included and a revised "total ICER" calculated. A societal perspective was then used, incorporating the savings attributable to increased employee productivity. Though referred to in the text of the report, the corresponding ICERs using this approach appear to have been removed from the relevant tables, with net financial savings to employers portrayed instead.
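The stepwise widening of perspective described for PH48 can be sketched as a sequence of ICERs in which each step nets further savings off the incremental cost. The Python below uses invented per-person values for a single hypothetical case-study scenario; it illustrates the sequence of calculations only and does not reproduce the PH48 model or its results.

```python
# Hypothetical per-person incremental values for one case-study scenario.
INTERVENTION_COST = 400.0   # £, cost of delivering the cessation intervention
NHS_COST_OFFSET = 150.0     # £, future NHS treatment costs avoided
PRODUCTIVITY_GAIN = 200.0   # £, value of increased employee productivity
QALY_GAIN = 0.02            # incremental QALYs per person


def icer(d_cost, d_qaly):
    """Incremental cost per QALY gained."""
    return d_cost / d_qaly


steps = {
    "initial ICER (intervention cost only)": INTERVENTION_COST,
    "total ICER (net of NHS cost offsets)": INTERVENTION_COST - NHS_COST_OFFSET,
    "societal (also net of productivity gains)":
        INTERVENTION_COST - NHS_COST_OFFSET - PRODUCTIVITY_GAIN,
}
results = {label: icer(d_cost, QALY_GAIN) for label, d_cost in steps.items()}
for label, value in results.items():
    print(f"{label}: £{value:,.0f} per QALY")
```

If the net cost at a later step turned negative, the intervention would be cost saving from that perspective and the ratio should no longer be read as an ICER.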
The economic modelling methodologies used in each of the 56 public health guidance topics carried out up to the end of 2014 are available as extended data 39 . Where a single model is used, even for multiple interventions or across multiple reports, it is listed only once, except in the case of PH12 40 , where two wholly different CUA models were used. Non-CUA approaches are described separately. At times multiple perspectives were used as part of the modelling process, and at others it was difficult to ascertain which perspective was actually used. Where a public sector or societal perspective was claimed but only health-related costs and benefits were included, these have been listed as described in the report. Where no clear perspective was specified and only healthcare criteria were described, these have been listed as having an NHS and PSS perspective.
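The coding rules just described amount to a small decision procedure. A hypothetical Python sketch of it follows; the function name and example records are invented, and the "unclear" fallback for models that state no perspective yet include non-health effects is an assumption of this sketch, since the text does not say how such cases were recorded.

```python
from collections import Counter
from typing import Optional


def coded_perspective(stated: Optional[str], includes_non_health: bool) -> str:
    """Record a model's perspective following the review's coding rules (a sketch)."""
    if stated:
        # A claimed perspective is recorded as described in the report,
        # even when only health-related costs and benefits were included.
        return stated
    if not includes_non_health:
        # No stated perspective and only healthcare criteria: NHS and PSS.
        return "NHS and PSS"
    # Not covered by the rules described in the text; an assumption of this sketch.
    return "unclear"


# Hypothetical extraction records: (topic, stated perspective, non-health effects included?)
models = [
    ("topic A", "public sector", False),
    ("topic B", None, False),
    ("topic C", "societal", True),
    ("topic C", "employer", True),
]
print(Counter(coded_perspective(stated, non_health) for _, stated, non_health in models))
```

Tallying per model rather than per topic, as here, is what allows one topic to contribute several perspectives to the totals.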
In total, an NHS and PSS perspective was the most common, used on 30 occasions (though PSS generally appears to have had little or no impact). A public sector perspective was used on only 15 occasions, despite being nominally prescribed by the reference case. A societal perspective was used on 11 occasions and an employer perspective 4 times. The 13 topics that did not include a model naturally used no perspective.
Of the studies that used CBA or CCA before the 2012 guidance elevated their role, all except one were concerned with the related areas of travel or physical activity. As CBA is widely used in transport planning, this is not surprising. The remaining study, on preventing harmful drinking (PH24 41 ), arguably contains both a pseudo-CCA approach (listing a range of likely outcomes of each potential intervention in their natural units) and a "valuation of harms analysis" which bears a striking resemblance to CBA, as discussed further later.
Workplace interventions have in the past used net financial benefit from an employer's perspective, arising from increased productivity and reduced absenteeism (PH5 18 , PH19 42 ). CBA and CCA as a rule use societal perspectives, further broadening the scope by which to judge interventions and allowing for the inclusion of issues such as environmental effects and reduced traffic congestion (PH41 26 ), reduced travel time and increased comfort (PH8 33 ), estimates of impact upon the economy (PH24 41 ) and woollier concepts around "human costs" (PH31 43 ).
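Net financial benefit from an employer's perspective sets the monetary gains an employer can expect against what the programme costs it. The sketch below is a hypothetical Python illustration; valuing each avoided day of absence at the daily wage cost is an assumption (a common human-capital simplification), and the parameter names are invented.

```python
def net_financial_benefit(absence_days_avoided, daily_wage_cost,
                          productivity_gain, programme_cost):
    """Employer's net financial benefit in £; a positive value means the programme pays for itself.

    Assumes each avoided day of absence is worth the daily wage cost.
    """
    absence_savings = absence_days_avoided * daily_wage_cost
    return absence_savings + productivity_gain - programme_cost


# Hypothetical workplace programme, per employee per year
print(net_financial_benefit(absence_days_avoided=1.5, daily_wage_cost=120.0,
                            productivity_gain=100.0, programme_cost=250.0))   # 30.0
```

Unlike a CUA, this measure contains no health outcome at all, which is why it suits guidance aimed at employers.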
There is a series of judgements to be made about what is relevant in cost-effectiveness modelling, though broadly speaking the NICE protocol is to include all relevant factors 6 , many of which can be subject to considerable uncertainty.
Because the assumptions used in building a model may not hold in practice, modellers need to be upfront about potential weaknesses and to use sensitivity analyses to test whether findings are robust. Given the complex nature of public health interventions, there are unique challenges in applying these approaches to this sector, and judgement will naturally play a key role in interpreting a model's findings. One topic (PH39 44 , which contained the memorable proviso that "even the uncertainty is uncertain") used a Markov model to estimate the prevalence of, and survival from, various conditions arising from the use of smokeless tobacco among South Asian communities. Owing to a lack of data, particularly on the costs of interventions, the predicted cost-effectiveness of the intervention was felt to be extremely uncertain. The authors therefore aimed only to highlight this issue and to encourage decision-makers to exercise their judgement when drawing conclusions from the report's findings.
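To make the idea of a Markov model concrete, the sketch below runs a hypothetical three-state cohort model (well, diseased, dead) with annual cycles, then varies the uncertain size of the intervention's effect in a one-way sensitivity analysis. Every number in it (transition probabilities, utilities, costs, the 3.5% discount rate and the programme cost) is an illustrative assumption rather than a value from PH39, and the function names are hypothetical. It also omits refinements such as a half-cycle correction.

```python
from typing import List, Sequence, Tuple

# Hypothetical three-state cohort model; none of these values come from PH39.
N_STATES = 3  # 0 = well, 1 = diseased, 2 = dead (absorbing)


def transition_matrix(incidence: float) -> List[List[float]]:
    """Annual transition probabilities; the mortality rates are hypothetical."""
    mortality_well, mortality_diseased = 0.01, 0.08
    return [
        [1.0 - incidence - mortality_well, incidence, mortality_well],
        [0.0, 1.0 - mortality_diseased, mortality_diseased],
        [0.0, 0.0, 1.0],
    ]


def run_cohort(
    matrix: List[List[float]],
    utilities: Sequence[float],
    annual_costs: Sequence[float],
    cycles: int,
    discount_rate: float = 0.035,
) -> Tuple[float, float]:
    """Return discounted (QALYs, costs) per person for a cohort starting 'well'.

    Rewards are counted at the end of each annual cycle (no half-cycle correction).
    """
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    cohort = [1.0, 0.0, 0.0]  # share of the cohort in each state
    qalys = costs = 0.0
    for t in range(1, cycles + 1):
        # Move the cohort one cycle: new[j] = sum over i of cohort[i] * P[i][j].
        cohort = [sum(cohort[i] * matrix[i][j] for i in range(N_STATES)) for j in range(N_STATES)]
        discount = (1.0 + discount_rate) ** -t
        qalys += discount * sum(p * u for p, u in zip(cohort, utilities))
        costs += discount * sum(p * c for p, c in zip(cohort, annual_costs))
    return qalys, costs


UTILITIES = (0.85, 0.60, 0.0)  # hypothetical utility weights
COSTS = (0.0, 2_000.0, 0.0)    # hypothetical annual costs (GBP)
BASE_INCIDENCE = 0.02          # hypothetical annual incidence without intervention
PROGRAMME_COST = 150.0         # hypothetical one-off cost per person

if __name__ == "__main__":
    base_q, base_c = run_cohort(transition_matrix(BASE_INCIDENCE), UTILITIES, COSTS, cycles=40)
    # One-way sensitivity analysis on the uncertain size of the intervention's effect.
    for reduction in (0.1, 0.25, 0.5):
        matrix = transition_matrix(BASE_INCIDENCE * (1 - reduction))
        q, c = run_cohort(matrix, UTILITIES, COSTS, cycles=40)
        d_q, d_c = q - base_q, c + PROGRAMME_COST - base_c
        print(f"incidence cut {reduction:.0%}: dQALY={d_q:.3f}, dCost=£{d_c:,.0f}")
```

Watching how far the incremental QALYs and costs move as the assumed effect changes is the kind of evidence a committee would weigh when, as in PH39, the inputs themselves are highly uncertain.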
"Naturally, one has to weigh these figures with one's own assessments of where the base line estimates have been too optimistic or too conservative. The analysis presented here offers a starting point to guide one's assessment. The data limitations are too severe to offer anything else." 44
The PHAC and the modellers can jointly decide whether modelling is required at all; it can be skipped if the evidence review has already clarified the likely effects of the interventions. Such issues can be assessed formally using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system 9,45 , which evaluates the evidence against five criteria: the risk of bias within studies; the directness of evidence; the consistency of evidence; the precision of the estimated effects (relative to decision-making); and publication bias. These criteria must be applied to the evidence for both the intervention and the comparator.
On two occasions, principles of economic modelling were employed to generate whatever insights were possible, although it was known in advance that a full model would be unsuitable owing to a lack of suitable comparators:
• Not only were very few studies available for the guidance on the costs and benefits of community engagement programmes (PH9 28 ), but defining precisely what constitutes such an intervention was deemed effectively impossible. These difficulties were bound up with the question of how to quantify concepts felt to be universal goods, such as democracy, empowerment and social capital.
• The Obesity: working with local communities guidance (PH42 46 ) employed economic modelling to investigate "partnership working to reduce obesity". This attempted to describe the decision problem from a costs perspective, while maintaining that no worthwhile comparator existed by which to judge cost-effectiveness, and that the benefits would be too difficult to measure even if such a comparator could be found.
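The way GRADE turns concerns in the five criteria listed earlier into an overall certainty rating can be sketched as follows. This is a simplified reading of the approach, assuming that randomised evidence starts at high certainty and observational evidence at low, and that each domain can downgrade the rating by one level (serious concern) or two (very serious). The upgrading criteria GRADE allows for observational evidence are deliberately left out, and the function and domain names are hypothetical.

```python
from enum import IntEnum
from typing import Dict

# Simplified GRADE sketch: downgrading only; upgrading for observational
# evidence (e.g. a large effect or a dose-response gradient) is omitted.


class Certainty(IntEnum):
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


DOMAINS = ("risk_of_bias", "indirectness", "inconsistency", "imprecision", "publication_bias")


def grade_certainty(randomised: bool, downgrades: Dict[str, int]) -> Certainty:
    """Start HIGH for randomised trials, LOW otherwise; subtract 0-2 levels per domain."""
    level = int(Certainty.HIGH if randomised else Certainty.LOW)
    for domain, levels in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain}")
        if levels not in (0, 1, 2):  # 1 = serious concern, 2 = very serious
            raise ValueError("each domain can downgrade by 0, 1 or 2 levels")
        level -= levels
    # The rating cannot fall below VERY_LOW.
    return Certainty(max(level, int(Certainty.VERY_LOW)))


# Trial evidence downgraded once for indirectness and once for imprecision.
print(grade_certainty(True, {"indirectness": 1, "imprecision": 1}).name)  # LOW
```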
While CUA models applied in NHS and PSS settings are generally relatively simple to interpret, approaches using a broader perspective or different methodologies require more nuanced and careful consideration (and ultimately trade-offs). These topics and the criteria considered as part of the modelling are available as extended data 39 . Many included non-health benefits (as classified by the authors of this paper), posing further challenges to decision-makers: it is not immediately clear how best to incorporate these. The complex negotiations around inter-sectoral effects are well beyond the scope of this paper, but it is worth reiterating that choosing to ignore such non-health factors is in itself a decision, and a rather nihilistic one at that.
A wide range of non-health benefits was found in the review of CPH guidance. Where a public sector perspective was used, cost savings to government departments other than the Department of Health become relevant to the decision-making process. Lost productivity, normally measured through absenteeism or presenteeism, was included on 12 occasions (e.g. PH44 21 , PH48 38 , PH50 24 ). Criminal justice costs were included on 10 occasions, incorporating the combined costs of arrest, custody, court appearances and prison (PH4 36 ), police costs in implementing laws (PH29 47 ) and the impact of conviction on future wages (PH40 48 ). While costs borne by individuals are outside the reference case, reduced income implies reduced future tax revenue for the government, which becomes relevant under this perspective. A similar approach is employed for educational attainment, which arose on 4 occasions (PH7 30 , PH12 40 , PH20 49 , PH28 50 ). Knock-on effects on spending are also included, such as the reduced costs of providing unemployment benefits (PH24 41 ) and drug treatment (PH4 36 ). Emotional wellbeing, broadly defined, was incorporated into the decision framework on 5 occasions (PH8 33 , PH9 28 , PH31 43 , PH47 37 , PH50 24 ).

Discussion
Comparison to previous studies
Weatherly et al. 4 (also described in an associated report by Drummond et al. 51 ) reviewed 154 economic evaluations of public health interventions worldwide from 2000 to 2005. They found that 32% used a health service perspective and 31% a self-described societal perspective (though this was felt to be an overestimate, 48% of these relating solely to health). 24% had no stated perspective, 3% used multiple perspectives and the remainder took the perspective of a local healthcare provider, the government or the patient. In contrast to the CPH guidance, the Weatherly paper found that CCA was used in a relatively high 37% of studies, while 27% used CUA (based on either QALYs or the related disability-adjusted life years). A further 36% used CEA (excluding CUA), with outcomes such as units of weight lost, alongside their cost information; CUA and CEA are recorded separately. Four reports (3%) claimed to use CBA, but on further investigation these were re-categorised as CCA or CUA. Although the evaluations described in the Weatherly paper were not confined to UK settings (61% were from the US, 15% the UK, 6% Canada and 4% others) and are not directly comparable to NICE economic modelling, the authors felt that issues relating to costs to the voluntary sector and to private citizens merited further attention. Equity considerations, however, were rarely described in the literature and never addressed formally (the authors argued that this implies QALYs were simply summed in the studies using CUA). It is argued that equity considerations should be better highlighted, and the opportunity costs of implementing more equitable interventions made transparent.
McDaid and Needle's report 52 covered 1,700 studies from the mid-1960s to the mid-2000s. 49% of studies were based in the US, 13% in the UK, 5% in Canada, 4% in Australia and 4% in the Netherlands. Intervention settings, rather than perspectives, are reported: 22% took place in workplace settings (overwhelmingly in the US) and 8% in schools or colleges of higher education. Others (though no specific proportions are given) were funded by the state, by social health insurance or by individuals; understandably, these tended to have less direct impact on productivity than the interventions that employers had chosen to provide. 57% of studies used CEA, 21% CCA and 13% CUA; the remainder used CBA (5%), econometric techniques (3%) or cost-minimisation approaches (1%). The authors emphasise the critical importance of context in understanding the influences on the uptake and successful implementation of techniques, and argue that novel policy-level approaches to funding are needed to ensure that the non-health impacts of interventions are given adequate consideration. These will likely require further government investment, given the diffuse nature of the benefits that accrue from interventions of this kind.
Though not strictly public health, a literature review describing the economic evaluation techniques used in social care was published in 2002 53 . Mental health and public health were the two most common topics using such techniques, together accounting for about two thirds of all reports. Many related to multiple issues at the same time, and this complexity presents further similarities with attempts to apply such evaluations in public health. In total, 131 reports were included, covering a 5-year period. The perspective used was not reported (though elsewhere in the review the importance of a societal approach is emphasised). 65% of studies used CCA, 18% CEA, 5% CBA and 6% each cost-minimisation analysis and cost-saving analysis. 72% of studies had taken place in the US, 15% in the UK and 13% elsewhere.
However, a Cochrane review on the use of public health evidence in practice 54 found that studies intended to inform local public health decision-making are rarely published (and that they rarely meet the standards required for inclusion in subsequent systematic reviews anyway). The reviews listed in this section may therefore not reflect the full range of research applying economic evaluation techniques in public health.

Potential implications
Economics is ultimately about the study of decisions, the incentives behind them and their consequences 55 ; health economics relates to these choices in the context of the resources available in health generally. Public health economics can be defined as the "study of the economic role of government in public health, particularly, but not exclusively, in supplying public goods and addressing externalities" 56 . Because such interventions have broader effects, and a range of other factors must be taken into account, decision-making is much more complex than in some other health settings. This paper has already highlighted the range of factors considered in NICE's public health economic modelling. Complicated decisions merit the use of models of some sort to structure the available information and better inform decision-makers, who may otherwise make avoidable mistakes 57 .
Ultimately, nearly all economic models used by the CPH translated the findings from relevant papers into gains in utility in order to facilitate a CUA approach for NICE guidance. However, such translation runs the risk of oversimplification 5 . While it is accepted that potential costs and benefits should be identified and highlighted in the limitations section or elsewhere, even where they cannot be measured, Shiell argued that issues that are not measured may ultimately be cast to one side if they cannot be incorporated into calculations 58 . This could in principle be partly remedied if these issues were listed explicitly for later deliberation, but it is well established that human intuition is subject to systematic and predictable biases when dealing with complex problems of this nature 59 . The use of a model can counter this, though in such a scenario all factors considered relevant would need to be measured, or else incorporated in an innovative way 58 . There may therefore be a case for using CCA and CEA directly in certain circumstances in future, or potentially extending these to multiple criteria decision analysis (MCDA) approaches. MCDA techniques potentially hold a number of advantages for prioritisation in policy-making settings 60 , as well as well-documented possible stumbling blocks 61 .
The use of different methodologies may not lead to as many differences as one might think, as there is at times a large amount of overlap and ambiguity between approaches. CBA and net financial benefit appear to be, for all intents and purposes, equivalent approaches. Both apply monetary valuations to the costs and benefits arising from implementing an intervention in order to estimate its net benefit. The main difference appears to be that net financial benefit was the term of choice when an employer perspective was used. It has therefore been maintained in the tables above as initially described, but this equivalence is worth bearing in mind.
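The point that CBA and net financial benefit share the same arithmetic can be illustrated with a short sketch: both sum monetised costs and benefits, and the label mainly reflects which sectors are counted. The intervention, its items, the sector tags and all values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Item:
    description: str
    value: float  # GBP; positive = benefit (or saving), negative = cost
    sector: str   # who bears the cost or receives the benefit


def net_benefit(items: Iterable[Item], perspective: Set[str]) -> float:
    """Sum the monetised items that fall within the chosen perspective."""
    return sum(item.value for item in items if item.sector in perspective)


# Hypothetical workplace intervention; all figures are illustrative only.
ITEMS = [
    Item("programme delivery", -40_000.0, "employer"),
    Item("reduced absenteeism", 25_000.0, "employer"),
    Item("productivity gain", 30_000.0, "employer"),
    Item("fewer GP consultations", 5_000.0, "nhs"),
    Item("participants' own time", -8_000.0, "individuals"),
]

EMPLOYER = {"employer"}                        # typically reported as "net financial benefit"
SOCIETAL = {"employer", "nhs", "individuals"}  # typically reported as CBA

print(net_benefit(ITEMS, EMPLOYER))  # 15000.0
print(net_benefit(ITEMS, SOCIETAL))  # 12000.0
```

What moves the result here is the set of sectors in scope, not the formula, which is also why the choice of which consequences to include deserves scrutiny.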
CBA also has links to CEA (and ultimately CUA). The guidance on Contraceptive services with a focus on young people up to the age of 25 (PH51 62 ) used a CBA approach built on the cost per negative outcome avoided (i.e. CEA) across a range of criteria, such as the costs of maternity care, miscarriage, ectopic pregnancy and stillbirth, all of which had been estimated in earlier NICE work. The model estimated the probability of each event occurring with and without the intervention. These could be combined formally (including the costs to future governments of benefit payouts) to calculate the cost savings attributable to the intervention. The intervention was found to be dominant. It would therefore also be dominant under CUA, whatever threshold was used (assuming the same costs were taken into account in both cases). A similar approach was used for Domestic violence and abuse: how health services, social care and the organisations they work with can respond effectively (PH50 24 ), in which all conceivable costs and benefits were calculated at a societal level and the interventions modelled were found to be dominant. Although it was described as a CUA, in such a scenario there is no clear line between CBA and CUA.
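The structure described for PH51 (event probabilities with and without the intervention, combined with a cost per event) can be sketched as follows. The event names are taken from the list above, but all probabilities, unit costs and the programme cost are hypothetical, and the benefit payouts mentioned in the text are not modelled.

```python
from typing import Dict

# Hypothetical unit costs per event (GBP); these are NOT PH51's figures.
EVENT_COSTS: Dict[str, float] = {
    "maternity_care": 5_000.0,
    "miscarriage": 1_000.0,
    "ectopic_pregnancy": 3_000.0,
}


def expected_cost(event_probs: Dict[str, float], programme_cost: float = 0.0) -> float:
    """Programme cost plus the probability-weighted cost of each downstream event."""
    return programme_cost + sum(p * EVENT_COSTS[event] for event, p in event_probs.items())


def is_dominant(without: Dict[str, float], with_prog: Dict[str, float], programme_cost: float) -> bool:
    """Dominant if no adverse event becomes more likely and expected cost falls."""
    no_worse = all(with_prog[e] <= without[e] for e in without)
    return no_worse and expected_cost(with_prog, programme_cost) < expected_cost(without)


# Hypothetical per-person event probabilities without and with the programme.
WITHOUT = {"maternity_care": 0.10, "miscarriage": 0.02, "ectopic_pregnancy": 0.002}
WITH_PROG = {"maternity_care": 0.07, "miscarriage": 0.014, "ectopic_pregnancy": 0.0014}
PROGRAMME_COST = 100.0

saving = expected_cost(WITHOUT) - expected_cost(WITH_PROG, PROGRAMME_COST)
print(f"expected saving per person: £{saving:.2f}")    # £57.80
print(is_dominant(WITHOUT, WITH_PROG, PROGRAMME_COST))  # True
```

Because a dominant result means lower cost and no worse outcomes, it holds under any positive threshold, which is the sense in which the CBA and CUA readings coincide.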
Such an approach nonetheless poses challenges. CUA has usually been used with an NHS and PSS perspective, and extending it to a public sector or societal perspective makes it unclear how to set a threshold once non-health expenditure is involved. This can (rightly or wrongly) be ignored in the particular cases of PH50 24 and PH51 62 , where the interventions were found to be dominant. Had they not been dominant, however, it is not clear whether the usual threshold would apply. Would it be acceptable, for example, to approve an intervention with an ICER of £35,000 per QALY where there are broader implications outside of healthcare? Such questions will need revisiting if CBA becomes more common in NICE's modelling. These issues have been discussed elsewhere 4,63,64 . While no consensus has yet emerged, the unambiguous cost-effectiveness of most of NICE's public health interventions 16,65 makes the issue somewhat moot for now.
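The threshold question can be made concrete with the standard incremental calculations. The sketch assumes NICE's usual range of £20,000 to £30,000 per QALY and reuses the £35,000 ICER from the text; the function names are hypothetical. Net monetary benefit values the QALY gain at the threshold and subtracts the extra cost, which is one way of seeing why CUA with a fixed threshold and CBA can converge.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when there is no QALY difference")
    return delta_cost / delta_qaly


def net_monetary_benefit(delta_cost: float, delta_qaly: float, threshold: float) -> float:
    """Value the QALY gain at the threshold, giving a CBA-style net benefit in GBP."""
    return threshold * delta_qaly - delta_cost


def decide(delta_cost: float, delta_qaly: float, threshold: float) -> str:
    nonzero = (delta_cost, delta_qaly) != (0, 0)
    if delta_cost <= 0 and delta_qaly >= 0 and nonzero:
        return "dominant"  # no more costly, no less effective: accepted at any threshold
    if delta_cost >= 0 and delta_qaly <= 0 and nonzero:
        return "dominated"
    nmb = net_monetary_benefit(delta_cost, delta_qaly, threshold)
    return "cost-effective" if nmb > 0 else "not cost-effective"


# Hypothetical: £3,500 extra cost for 0.1 extra QALYs gives an ICER of £35,000.
for threshold in (20_000, 30_000):
    print(threshold, round(icer(3_500, 0.1)), decide(3_500, 0.1, threshold))
# 20000 35000 not cost-effective
# 30000 35000 not cost-effective
print(decide(-200, 0.05, 20_000))  # dominant
```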
There is also the risk that, in giving modellers free rein over which consequences of the intervention to include when using a societal perspective, they may (unconsciously or otherwise) cherry-pick effects that strengthen the case for the guidance. Similar risks exist with a healthcare perspective, but there the boundaries of what is and is not relevant are clearer and better established through experience. In a societal approach, aspects relating to criminal justice, education, or even tourism or the arts could feasibly be included or excluded at the whim of modellers, and if PHAC members fail to query these choices then these factors will influence all future decision-making. Even if the 2014 methods guidance allows other directorates to use such perspectives, if these are not widely adopted outside public health, like-for-like comparisons of the cost-effectiveness of interventions across directorates may become more difficult.
On a more positive note, the ambiguity between CBA, CUA and CEA (and potentially CCA) does offer potential advantages.
Where there is evidence that an intervention has an effect but no evidence of its magnitude, that effect has in the past generally been omitted from modelling entirely. CCA could be used to record the direction of such effects, which could then be checked for dominance, or what-if analyses could investigate whether the effectiveness an intervention would require is plausible. PH51 62 offers what appears to be an equivalent way of structuring the problem. It implicitly assumed that the intervention's outcomes were better than the alternative's (e.g. no ectopic pregnancy is better than an ectopic pregnancy), which makes intuitive sense even though the mortality estimates for these outcomes were zero and utility values were not available for each. Such an approach could be extended further where committees are willing to explain the logic of their assumptions, allowing formal quantitative approaches to be used in settings where they are not currently available.
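Both ideas in this paragraph can be sketched briefly: a dominance check that uses only the direction of each effect (in the spirit of CCA, with no magnitudes), and a what-if calculation of the smallest QALY gain at which an intervention's ICER would reach a given threshold. The outcome names echo the PH51 example, while the cost figures, threshold and function names are hypothetical.

```python
from typing import Dict

BETTER, SAME, WORSE = 1, 0, -1


def dominant_by_direction(directions: Dict[str, int], delta_cost: float) -> bool:
    """CCA-style check using only the direction of each effect (no magnitudes).

    `directions` maps each outcome, in its natural units, to BETTER, SAME or WORSE
    for the intervention relative to the comparator. Returns True when the
    intervention is no worse on any outcome and no more costly.
    """
    if any(d not in (BETTER, SAME, WORSE) for d in directions.values()):
        raise ValueError("directions must be BETTER, SAME or WORSE")
    return delta_cost <= 0 and all(d != WORSE for d in directions.values())


def required_qaly_gain(delta_cost: float, threshold: float) -> float:
    """What-if analysis: smallest QALY gain per person at which the ICER equals the threshold."""
    return delta_cost / threshold


# Hypothetical: fewer adverse events, no change in one outcome, and a lower cost.
outcomes = {"ectopic_pregnancy": BETTER, "miscarriage": BETTER, "stillbirth": SAME}
print(dominant_by_direction(outcomes, -50.0))  # True
# An extra £600 per person would need a gain of about 0.03 QALYs at £20,000/QALY;
# a committee would then judge whether a gain of that size is plausible.
print(required_qaly_gain(600.0, 20_000.0))
```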
The 2012 methods manual 9 stated that CCA had previously been used 'implicitly' for trade-offs between cost-effectiveness and equity (and other concerns) at other stages of the guideline development process, but could now be adopted more formally. Even within the economic modelling stage, a type of pseudo-CCA was used in "Alcohol-use disorders: preventing harmful drinking" (PH24 41 ). CCA was first used more formally in PH44 21 . In the past, equity considerations have been treated as out of scope under the reference case. While other areas of the reference case have been applied flexibly on a case-by-case basis, the change in emphasis under way may permit equity trade-offs to be considered more explicitly as part of CCA modelling. The manual also states that the way in which "the sets of consequences have been implicitly weighted should be recorded as openly, transparently and as accurately as possible" and that "various tools are available to support this part of the process" 9 . Again, for a variety of reasons, MCDA techniques may be well placed to carry this out.
These equity concerns, which are in theory fundamentally important to public health decision-making, could also be given some formal weighting. Where equity factors are seen to influence the results of the guidance, this will likely lead to some controversy; but it is worth bearing in mind that such considerations are already made without the help of a formal model, and that opening such decisions up to some level of scrutiny seems entirely appropriate. If equity concerns are seen to be too influential and too costly in terms of societal health, or vice versa, then the weighting on equity for future appraisals could be adjusted. At present the decisions of PHACs cannot be held to the same level of accountability in such circumstances.
On the other hand, even if a workable definition of equity were specified 66 , it is not immediately clear what weight equity should be given 67 . This might not be necessary, however: MCDA could, for example, be used as part of a what-if analysis. If an intervention would only be worth investing in if equity concerns made up 90% of the total decision, this might give decision-makers a clear justification for refusing to invest. Over time, a de facto threshold level may emerge through precedent and prior experience, though this step may not be necessary to improve decision-making processes.
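The what-if use of MCDA described above can be sketched with a simple additive two-criterion model, solving for the equity weight at which an intervention that sacrifices some health for equity ties with its comparator. The 0 to 1 scores, the additive form and the function names are assumptions made for illustration; real MCDA exercises involve scoring and weighting steps not shown here. The example scores are chosen so the break-even weight is 90%, echoing the case in the text.

```python
def weighted_score(health: float, equity: float, equity_weight: float) -> float:
    """Additive MCDA value: weight (1 - w) on health and w on equity (scores on 0-1)."""
    if not 0.0 <= equity_weight <= 1.0:
        raise ValueError("equity_weight must lie in [0, 1]")
    return (1.0 - equity_weight) * health + equity_weight * equity


def break_even_equity_weight(int_health: float, int_equity: float,
                             comp_health: float, comp_equity: float) -> float:
    """Equity weight at which the intervention and comparator score equally."""
    health_loss = comp_health - int_health
    equity_gain = int_equity - comp_equity
    if health_loss <= 0 or equity_gain <= 0:
        raise ValueError("sketch only covers an intervention worse on health but better on equity")
    # Solving (1 - w) * int_health + w * int_equity == (1 - w) * comp_health + w * comp_equity
    # gives w * (health_loss + equity_gain) == health_loss.
    return health_loss / (health_loss + equity_gain)


# Hypothetical normalised scores for an intervention and its comparator.
w = break_even_equity_weight(int_health=0.1, int_equity=1.0, comp_health=1.0, comp_equity=0.9)
print(f"intervention preferred only if the equity weight exceeds {w:.0%}")  # 90%
```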

Limitations and challenges
Like other reviews of previously published studies, our review has some limitations. These include limitations of sample size, availability of data, type of data extraction and self-reporting. Extracting data on the economic modelling was difficult because there was no standard template for reporting PH studies, so each document had to be searched in full for information about modelling. Although some data may have been missed, we are confident that our review process extracted all relevant information. This review only extracted information at an overview level (i.e. the type of economic evaluation method) and involved qualitative synthesis. It would have been useful to extract more detailed information (e.g. the results, whether the intervention was deemed cost-effective, and other benefits considered beyond QALYs) and to analyse the underlying relationships using quantitative techniques. The extracted data are based on the CPH authors' interpretation of the economic modelling; however, given the established nature of this field, we are confident that our interpretation of the CPH reports is as their authors intended. Regarding the number of studies included, we have a reasonable sample size, as our review included all CPH topics from its inception to the end of 2014. It should be noted that a new methods manual came into effect in 2015 (which changed even how public health topics are coded), so our review only included guidance produced before it. Future research on the differences before and after the introduction of the new methods manual would be useful.

Conclusions
Of the 56 eligible guidance topics, 43 were found to have used economic modelling while 13 used no formal economic model. In total 61 economic models were published over the relevant period. Models were shown to vary considerably from each other, despite the reference case. Both health and non-health issues were regularly taken into account, the economic evaluation methodology used varied from case to case, and the reference case was clearly applied flexibly -most obviously in the fact that its specified public sector perspective is not the most common approach used.
Another interesting finding was that, under some circumstances, different economic approaches are indistinguishable from each other and effectively equivalent. However, in other cases the same threshold is used regardless of the perspective and methodology, which may lead to inconsistent decision-making.
It seems an oversight that equity is not considered at the modelling stage, the one stage of the guidance process that uses formal quantitative decision-analytic techniques. It would be valuable if equity concerns, and the trade-offs and opportunity costs associated with them, could be formally compared. This could lead to more consistency and would greatly improve the transparency of the process. At present there is no way to look retrospectively at these issues, and we can only hope and trust that deliberations dealt with them even-handedly and comprehensively. Publishing, for each topic, a list of the equity issues (or perhaps other factors) taken into account in the deliberations that follow economic modelling could also allow retrospective quantitative analysis of such issues in future.
Upon its foundation, the CPH represented an ambitious attempt to impose economic evaluation methodologies on disparate public health settings. As such, there was no clear roadmap for how best to approach such issues, which instead needed to be worked out by trial and error. This is reflected both in the changing reference case and in the related changes in the types of economic evaluation techniques used. Inconsistencies between guidance topics are therefore understandable. But this combination of being both novel and clearly quite tricky to get right may in future merit further investigation of formal techniques to address such issues. MCDA approaches could offer some benefits, in principle making decision-making more transparent and consistent between appraisals.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.

Alastair Fischer
Office of Health Economics, London, UK
The paper under review is the culmination of what must have been an enormous amount of work. I know, because I was unwilling to undertake a similar job myself, even though, as a NICE insider, I knew a small amount more about where to find things on the NICE website than most of the authors of this piece. Additionally, I knew why certain things were done the way they were, because I either knew the history of the organisation that would allow us to understand why a particular decision was made, or because I had drafted the decision myself. (Such decisions would then have been vetted and endorsed by people higher up the NICE hierarchy and by the essentially independent committee that made the decisions.) I was the lead health economist for about 40% of the Public Health guidelines, and I was the lead author for those parts of the 2012 and 2014 NICE Methods Guides that dealt with Public Health.
Like the first reviewer, I found the long introductory section hard to read and follow. My initial reaction was that the cart was being put before the horse, and that much of the introduction should either be in the Results or the Discussion section. I now tend to think that most if not all of it should be removed, because the points raised do not add a lot to the paper. The focus of the paper could be on summarising the raw data by promoting the online appendix table of all the topics considered, together with their titles and what kind of economic analysis (if any) was presented for each topic. More effort should be put into the presentation of the Table if and when it is promoted to the main paper. The reason for this change is simply that the Table is the most important piece of evidence and should not be put where few people will read it.
The main reason for removing the bulk of the discussion is that to my mind it misses what I think is one of the most important ideas regarding the way that NICE conducts its appraisals. That is the notion of social value judgements (SVJs), and as far as I could see, almost nothing was written in the paper about them. These values create a mosaic that underpins the way that governments rule over their populations. They are fluid and thus are capable of being changed over time. They can rarely if ever be derived by theorems. They can vary dramatically with a subtle change of context. For example, consider a society composed of two equal-sized groups, called the advantaged (A) and the disadvantaged (D), where (on average) the members of A have a life expectancy of 2 years more than those of D. Suppose that, for the same amount of money, the government could increase the life expectancy of D by 2 years without changing that of A (thus giving both groups the same life expectancy); or increase the life expectancy of both A and D by one year (thus maintaining the 2-year gap); or increase A's life expectancy by 2 years and leave that of D unchanged (thus increasing the gap to 4 years). Individuals asked what the government should do will tend to opt to remove the gap if A is "rich" and D is "poor". But if A is "women" and D is "men", the answers given tend to be different, and if A is "non-smokers" and D is "smokers", the answers differ again. Few individuals would wish to favour the smokers group. Yet the scenarios are the same apart from the labels given to A and D. The 3 sets of labels (wealth, gender and smoking status) are quite distinct from each other, but in the real world, even small changes in context will change the answers as to where the money to increase life expectancy should be spent.
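The arithmetic of this example can be laid out explicitly. With hypothetical baseline life expectancies that preserve the stated 2-year gap, every option yields the same mean gain, so an unweighted sum cannot tell them apart; only the resulting gap differs, and how much that gap should count is the social value judgement described here. The baseline values and the function name are illustrative assumptions.

```python
from typing import Dict, Tuple

# Hypothetical baseline life expectancies (years); only the 2-year gap comes from the example.
BASELINE: Dict[str, float] = {"A": 80.0, "D": 78.0}

# The three options in the example, each costing the same amount.
OPTIONS: Dict[str, Dict[str, float]] = {
    "raise D by 2 years": {"A": 0.0, "D": 2.0},
    "raise both by 1 year": {"A": 1.0, "D": 1.0},
    "raise A by 2 years": {"A": 2.0, "D": 0.0},
}


def summarise(gains: Dict[str, float]) -> Tuple[float, float]:
    """Mean gain across the two equal-sized groups, and the resulting A-D gap."""
    mean_gain = (gains["A"] + gains["D"]) / 2
    gap = (BASELINE["A"] + gains["A"]) - (BASELINE["D"] + gains["D"])
    return mean_gain, gap


for name, gains in OPTIONS.items():
    mean_gain, gap = summarise(gains)
    print(f"{name}: mean gain {mean_gain} years, gap {gap} years")
```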
For these reasons, it seems to me, NICE has never accepted equity weights. Equity is still part of the modelling process, broadly defined, but not part of the automated model that measures cost effectiveness. The submission I am reviewing suggests giving (automated?) weights for equity. It references articles by Rawlins whose arguments are against weighting, but the submission does not debate the issue as it should if it wishes to argue for change. Something similar occurs with MCDA in the submission: it argues for weights but gives very little or no evidence, especially in terms of differentiating a set of weights given to an SVJ from that of a parameter that is being estimated.
The submission appears to chide NICE for saying that a perspective is "public sector" but only deriving an estimate from an NHS perspective. However, it does not link the perspective to the statement in the 2012 methods guide that says that if an intervention is estimated to be cost saving or of very low cost, its estimation by a model may not be required. If an intervention is cost saving or of low cost from an NHS perspective, and other parts of the public sector are then considered, the addition of other public sectors will not usually reverse a cost effectiveness decision. Under the tight timelines that NICE sets itself and its contractors, a decision to broaden a perspective must be made early on, and doing so will increase the cost of evidence collection.
Broadening the perspective to a societal level will often require additional sample surveys or other relatively expensive data collection. At the end of that process, the societal threshold ICER will generally be unknown, in that it requires data from hundreds if not thousands of disparate PH and other healthcare interventions to build up a library of societal ICERs.
The submission seems to wonder why a CUA was not employed for a pregnancy topic. Perhaps the methods guide should have stated that, in many aspects of pregnancy including its termination, QALYs are not appropriate, because the answer depends on a hotly contested SVJ. How many QALYs are lost if a foetus miscarries naturally? What if the baby was not wanted by its mother or the family? What if the mother did not abort the foetus (even if unwanted) but then decided to keep the baby after it was born? Does that change from "not wanted" to "wanted" gain a lifetime of QALYs? If so, does an abortion lose a lifetime of QALYs? If QALYs and DALYs are ruled out, then so is CUA. In that case, a form of CEA may be a possible substitute.
What we call CCA is where two or more components of benefit do not have a well-defined rate of exchange. In motorised transport, we are interested in health in terms of the effects of accidents, pollution and climate change, but we are also interested in safety, speed, comfort, reliability in several aspects (Will a car break down? Will a train be on time?), congestion and prestige. Cycling and walking have additional health benefits. Many of these aspects can be converted into money terms, which is why CBA has been used by transport planners. Thus NICE has to allow CBA when dealing with a transport PH guideline in order to summarise the literature. However, CBA flounders when it comes to equity (because it assumes the goal is to maximise profits without considering equity). Since the transport evaluation literature is full of CBA studies, to include equity, the NICE Methods manual for PH had either to say that CBA and equity were two components of an overarching CCA, or NICE would have to look at the CBA and then spend time in committee seeing whether equity considerations changed any of the recommendations based on the CBA. In practice, I don't think that there is any substantive difference between these two options. Very often there would be a number of recommendations that would have been made on the topic under consideration, provided that the model had shown that the main intervention was cost effective, which was almost invariably the case. Local government (LG) took a much broader interest in the welfare of its populations, of which health was only one aspect. In addition, LG had far tighter budgets than the NHS. The emphasis switched from carrying out those projects that would maximise the health of the LG area population in the long run, to carrying out those projects that would generate the highest net benefit (in aggregate) by the time of the next LG elections, held every four years.
The net benefit was normally not simply a health benefit, but involved welfare more generally, and would be closer to a public/local-government perspective. The PH budget had to stretch from the traditional PH set of projects previously carried out by the NHS to include existing LG PH programmes such as the filling of pot-holes and the upkeep of parks and gardens. Further, while there was an imperative for the NHS to provide a service whose ICER NICE had shown to be lower than the threshold ICER, there was no such imperative for LG. To that extent, NICE PH guidelines for LGAs became ways of suggesting to each LGA the most effective way of undertaking a PH programme, to the extent that the LGA was able to undertake it. In other words, the cost effectiveness estimated for PH programmes and interventions carried out by LGAs was aspirational rather than a guide to what really should be done.
I think this background is likely to be a more interesting explanation of what CPH did than the commentary given in your paper, though it really belongs in a separate paper. Rather than a detailed discussion of what I consider to be relatively minor points, I think a better paper would compare what NICE has done in terms of perspective with what the rest of the world has done over the same period. However, that too might better be done as a separate piece, as it could be a lot of work.
Some other points. First, some follow-up remarks along the lines already established above. One very important omission in your submitted paper is the lack of any reference to, or discussion of, the time horizon in each guideline. Mostly the time horizon should be a lifetime horizon, but for infectious diseases that have not been eradicated (and that is almost all infectious diseases except smallpox and, at a pinch, SARS, some African parasitic diseases and, effectively, polio) the time horizon should exceed a lifetime. Only very rarely will the effect of a public health intervention stop short of a lifetime. The effect of reducing the time horizon to within 4 years (as is done for LG interventions) is to render almost no PH intervention cost effective. The other extraordinary problem with cost effectiveness analysis carried out by NICE on behalf of LGAs is that LGAs are expected to pay for the intervention, but the beneficiaries of the interventions are not LGAs but the NHS! Who in LG in their right mind would carry out PH interventions, unless required to do so by central government, without being recompensed for benefits that will occur many years later?
Second, some remarks about the presentation and writing of the submission. I think the paper could do with a good edit. There are a few typos and misplaced words, but more importantly, there are many places where what is being suggested is exceedingly vague, and I often found myself puzzled about what was meant. You are judged by those who read your work, and it is in your interests as authors to make it easy for your readers. Many readers will give up before the end of this paper if it remains in its present state. Normally, as a reviewer, I would point out phrases that do not make sense to me or whose wording could be improved, but if I did that with your paper, there would scarcely be a line untouched.
I suggest that you list the guidelines from PH1, PH2 and onwards to the last one published in 2014 as the first part of your references, numbering them 1, 2, etc. Each should be followed by an https link to the exact part of the guideline you wish the reader to access.
The authors present a useful introduction and background to the guidance for public health evaluation, including the criteria for the reference case. The authors discuss some of the chief differences from pharmacoeconomic modeling for HTA. The methods for the review are then presented, followed by detailed results of the literature review with many specific examples, and a comprehensive discussion of the findings along with some recommendations on how to address issues specific to public health assessments.
Overall, this is a very nice and timely addition to the literature, given that decision-making around the current global pandemic may rely on such public health assessments.
Overall Comments: The introduction and background are very useful and provide a very good review of the issues for readers who are more familiar with, for example, pharmacoeconomic appraisals. However, some orientation for the reader would be useful up front. The introduction could be separated from the comprehensive background.
The methods section was appropriate, but some of the results later presented did not specifically match the methods. Revising the methods so that they are more closely tied to the objectives and results will help the reader understand the results of the review.
The results section was very detailed, with numerous useful examples from the papers. I believe it would benefit from more structure, perhaps using the original study objectives to organize the findings into subsections. I also recommend a table to compare and contrast the attributes of studies within each objective, if possible quantifying the results to some degree for each objective. Figure 2, outlining the literature results, is useful for seeing the landscape of methods among the included studies. Additional exhibits would help organize the results.
Once I read the results, I went back to the methods to see if the extraction criteria were listed. There was a bit of a disconnect between some of the quantitative findings and what I had expected from the methods.
Additionally, some of the results appeared more appropriate for a discussion section, such as the text on Page 10 about what modelers should include and what sensitivity analyses to conduct, and specifically this text: "The complex negotiations around inter-sectoral effects are well beyond the scope of this paper.....is in itself a decision, and a rather nihilistic one at that." These points are well made but, in my opinion, deserve their own place in the Discussion.
Specific comments: In addition to the CPH reference case guidance, are there other resources for assessing the reporting quality of public health models, similar to the CHEERS checklist for CEA studies? The authors may consider citing the guidance in the 2019 book by Edwards and McIntosh on applied health economics for public health.
Page 6: In the paragraph about economic modeling, the following text is vague and could be described more concisely and technically: "In cases where it is not appropriate or possible to further model results, .....if such a thing were possible, and to indicate the magnitude, 'wheres' and 'hows' of the underlying uncertainty ....."
Page 10: In the discussion of Weatherly et al., regarding the comment "(though this was felt to be an overestimate)": was this a conclusion of Weatherly et al. or your own? If the latter, please explain why.
Throughout: Some references are given as hyperlinked citations, but others lack a citation, for example on Page 7: Strategies to prevent unintentional injuries among under-15s (PH29).

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

James O'Mahony
School of Medicine, Trinity College Dublin, Dublin, Ireland

Overview summary
Reddy et al. describe considerations of cost-effectiveness in the context of public health decision making in the UK. They differentiate the public health evidence generation processes from the standard appraisal processes applied by NICE. They describe some of the particular considerations relevant to the appraisal of public health interventions, including the relevance of analytical perspectives other than that of the health system, equity considerations and the challenges of quantifying health effects and other benefits.
They then review all public health guidance reports issued by NICE's public health function between 2006 and 2014. The review assesses the types of economic appraisal approaches used, the analytical perspectives adopted and the measures of benefit assessed. The results also include a number of illustrative examples of appraisals that explore the relevance of unquantified benefits and the role of decision makers in responding to such evidence gaps.
The report complements a detailed background section with a comprehensive discussion of the findings and broader challenges of appraising public health interventions.

Primary comments
The paper does not have a separate introduction section. I think it would benefit from a short section that very briefly outlines public health decision making versus NICE's usual decision processes, explains what the research question is, makes a statement of novelty regarding this research and signposts the structure of the rest of the manuscript. Currently the background section serves as both an introduction to the topic and a rather detailed examination of many of the issues involved. I believe a separate short introduction would make the paper more accessible at the beginning.
The manuscript begins to flow naturally from the results section onward, but I found the background section a little long. The reader is presented with a lot of material before they even arrive at the methods. You may enhance readability if you can find any material that could be cut or safely moved to the discussion. Streamlining the background will reduce reader fatigue and I think will enhance the impact of your work.
Page 5, first paragraph: You may wish to mention a typical public health intervention in order to frame the difficulties of generating evidence for such interventions. For example, I remember an interesting seminar in which the challenges of quantifying the benefits of park facilities were discussed. This helped me distinguish the challenges of appraising the benefits of such amenities from those of appraising pharmaceutical interventions.
The sole significant substantive point that I need to raise in this review concerns the review's methods section. It does not describe what information you extracted from the reports. While this is already implied by parts of the background and by the results themselves, I think it is necessary to state formally what information you sought and, more generally, what critical appraisal techniques you applied to assess the reports. Stating this explicitly will prime the reader for the results section. One example of the lack of a formal description of the methods is that the limitations section states that the analysis conducted a qualitative synthesis, yet this was not stated in the methods section.
Equity considerations receive considerable mention in both the background and discussion sections, but do not appear to be mentioned in the results section. Even if the analysis found no formal consideration of equity in the reports, it would be useful to state this explicitly in the results section.
Page 9: In the paragraph in which inconsistencies with the stated perspectives are noted in several reports, you may wish to cite the reports in order to guide readers to the examples. I do, however, understand that you may not wish to "name names", so to speak, and may prefer to omit such citations.

Page 9
Where you state that a PSS perspective had little or no impact, you may wish to be a little more explicit. I presume it means that adopting a broader perspective did not change the position of an ICER relative to the cost-effectiveness threshold or alter the overall policy recommendation. The finding that the PSS perspective did not change decisions in these cases is interesting and worth stating explicitly.
While the comments above include some suggestions for additions to the manuscript, I think it would be a useful goal to bring the overall word count down if possible in order to enhance accessibility of the manuscript.

Minor comments on the text
Abstract: Is the following sentence missing a word? "The review identified 56 eligible pieces of public health over the relevant period."
Definition of acronyms: I note NICE use CPHE as the acronym for the Centre for Public Health Excellence. Would there be an advantage in consistency with this nomenclature? Alternatively, if the acronym NICE has used has changed over time, clarifying this might be helpful.
The first instance of citation 6 is to the Rawlins et al. paper. From the text I was expecting to see a link to a guidance document such as the following; could you clarify?
Page 3: The meaning of the following sentence was not obvious. You may wish to clarify. "Further economic evaluations are now also permitted than were initially the case."
Page 4: Can you cite a source document for the information in Table 1?
Page 5