Keywords
Network Analysis; Randomised Trial; Generalised Anxiety Disorder; Adult Psychiatry; Mental Health
Research has suggested that network analysis (NA) can be used to identify important pathology symptoms and inform targeted treatment plans that could lead to more efficacious outcomes in clinical trials. However, unless it can be demonstrated that network models are stable, including when accounting for moderating variables, NA-derived treatment plans may not be appropriate to implement.
We aim to assess the stability and invariance properties of two commonly used anxiety outcome measures to determine the suitability of NA methods to inform treatment plan design in clinical settings.
Individual participant data (IPD) for large multi-trial samples will be accessed via Vivli.org. Exploratory graph analysis will be used to model empirical networks pre- (baseline) and post-treatment (outcome) for the two most commonly used outcome measures in anxiolytic clinical trials, namely the Hamilton Rating Scale for Anxiety (HAM-A) and the anxiety subscale of the Hospital Anxiety and Depression Scale (HADS-A). Bootstrapping and permutation techniques will be used to determine the stability and invariance properties of empirical networks in relation to a range of moderating variables, such as age, sex, treatment type and symptom severity. For networks that are unstable or partially invariant, we will examine item redundancy and remove non-performing items to pursue stable/invariant abbreviated models.
This study will determine the suitability of applying NA methods in clinical trials. Findings could inform the way in which clinical trials, and other such research, are conducted. If outcome measures are stable and invariant, then NA methods will have demonstrable utility to inform more efficacious treatment plans. However, if NA is not found to be suitable, its validity as a robust analytical approach will be questionable.
Symptoms caused by psychopathologies such as anxiety or depression can vary from person to person, and there are a number of different ways these can be assessed. The most common approach is the use of clinician-rated scales or patient-reported outcome measures. These measures allow clinicians and researchers to assess the severity of common symptoms, such as low mood, low energy and insomnia. Symptom severity is typically rated using a response scale, for example from 1 to 5, with 1 indicating mild severity and 5 indicating high severity (Vagias, 2006). A wide array of outcome measures is currently available, such as the Hamilton Anxiety Rating Scale (HAM-A; Hamilton, 1959), Hospital Anxiety and Depression Scale (HADS; Zigmond & Snaith, 1983) and Beck Anxiety Inventory (BAI; Beck et al., 1988a), which measure anxiety, or the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960), Montgomery-Åsberg Depression Rating Scale (MADRS; Montgomery & Åsberg, 1979) and Beck Depression Inventory (BDI; Beck et al., 1988b), which measure depression.
While there is some overlap in both diagnostic criteria and symptoms assessed between and among anxiety and depression measures, different measures can consist of very different symptoms. For example, one study examined the convergent validity of 52 symptoms across seven commonly used depression measures, including the HRSD, MADRS and BDI (Fried, 2017). It was found that 40% of the symptoms assessed were unique to a single measure. These findings were supported by a recent systematic review of 388 different outcome measures used across 450 depression randomised trials of various treatment types (Veal et al., 2024), which noted that the most commonly used measure (the HRSD) accounted for only 59% of 80 depression domains that matter to patients. Even if high levels of convergent validity are observed, it is still possible that measures that purport to assess depression may in fact assess completely different constructs. Fried (2017) provided a reproducible example in which 40 disparate items with minimal inter-item correlations (r = 0.1) were distributed into two non-overlapping measures of 20 items each, yet the resulting sum scores correlated highly (r = 0.69). This was argued to demonstrate that high convergent validity can be achieved even between measures that consist of minimally related individual items. Not only does this make it difficult to operationalise or ‘reify’ a unitary latent construct of conditions like anxiety or depression (Jones & Robinaugh, 2021), it also becomes difficult to identify what might be the most important and impactful symptoms.
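Fried's arithmetic can be reproduced directly. The following is a minimal sketch in R; the sample size, seed and object names are our own illustrative choices, not part of the original example:

```r
# Two non-overlapping 20-item "measures" drawn from 40 items with uniform
# inter-item correlations of r = 0.1 still yield highly correlated sum scores.
library(MASS)

set.seed(1)
n_items <- 40
sigma <- matrix(0.1, n_items, n_items)  # every inter-item correlation = 0.1
diag(sigma) <- 1

# Simulate 10,000 respondents from a multivariate normal distribution
scores <- mvrnorm(n = 10000, mu = rep(0, n_items), Sigma = sigma)

# Split the items into two disjoint 20-item scales and compute sum scores
scale_a <- rowSums(scores[, 1:20])
scale_b <- rowSums(scores[, 21:40])

cor(scale_a, scale_b)  # approx. 0.69, despite no shared items
```

Analytically, each 20-item sum score has variance 20 + (20 × 19 × 0.1) = 58, and the two sums share covariance 20 × 20 × 0.1 = 40, giving r = 40/58 ≈ 0.69, as Fried reported.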
Network analysis (NA) is a modern psychometric method that can be used to explore complex patterns and interactions in outcome measure data. Network theory eschews common assumptions of psychopathology that psychological problems are caused by disease entities which are independent of their symptoms (i.e. latent constructs; Borsboom, 2017). Rather, NA methods adhere to assumptions that are more broadly accepted in clinical practice, such as the interaction of thoughts, feelings and behaviours, outside of the influence of latent constructs (Jones & Robinaugh, 2021). This is achieved by plotting inter-relationships, or “edges”, among symptoms, or “nodes”, while appreciating that these symptoms can be empirically related, often for unknown reasons (Borsboom, 2017). This allows clinicians and researchers to move away from the need to reify constructs like depression as latent variables and focus on identifying the key, or “central”, symptoms in patient depression networks, both statistically in terms of measurable treatment outcomes and experientially in terms of patient experience of given symptoms (Borsboom & Cramer, 2013; Borsboom, 2017).
In recent years, network analysis has seen a surge in popularity and has been used to help understand pathology and predict different types of outcomes across a wide range of domains. For example, a recent study (Elliott et al., 2020) used NA methods to analyse data from a clinical trial of people with anorexia nervosa (n=142) and found that high-centrality symptoms, including “Feeling Fat” and “Fear of Weight Gain”, were strongly related to prognostic utility (R²=0.52 and 0.55, respectively). Another study examined a clinical sample (n=58) of people with mood and anxiety disorders with the aim of predicting patient dropout (Lutz et al., 2018). Baseline network models were found to be non-invariant between completers and dropouts, meaning the way in which symptoms related to one another differed significantly between the two groups. Using two-tailed p-value tests, the difference in the dropout model was found to be characterised by low centrality of ‘feeling nervous’ and ‘being active’, suggesting these symptoms were less influential within the dropout group’s network. Analyses based on these models correctly identified 47 of 58 patients who subsequently left the study, outperforming any other single predictor, such as sex, and it was argued that inspection of baseline network models could be used to predict dropout in such trials.
Several studies have examined the potential for NA to inform more efficacious treatment plans in antidepressant clinical trials and other research, with promising results (Bringmann et al., 2015; Maciaszek et al., 2023; Park et al., 2021). In one such example (Maciaszek et al., 2023), treatment efficacy was calculated as the percentage change between pre- and post-treatment outcome scores for 88 patients in a clinical trial of the antidepressant duloxetine, as measured using the HRSD (Hamilton, 1960). This treatment efficacy variable was then included in a pre-treatment (baseline) network model of depression symptom scores. NA identified “depressed mood” among the most central symptoms and found that treatment efficacy was most strongly related to this symptom. This suggested that duloxetine may be most efficacious when depression is characterised by high levels of depressed mood. In addition, the UKU Side Effect Rating Scale (UKU SERS; Lindström et al., 2001) was used to monitor adverse outcomes and this was also modelled in the baseline network. Adverse outcomes were directly related to anxiety (which was the most central symptom) and this relationship was stronger than that between treatment efficacy and depressed mood, suggesting that the efficacy of duloxetine may be attenuated by high levels of anxiety, even when depression networks are characterised by depressed mood. These findings point toward the potentially significant implications NA methods could have in augmenting the design of more efficacious treatment plans in clinical trial settings.
There is a notable absence of research examining the implications of NA in relation to anti-anxiety treatment clinical trials, and the research that does exist often draws on community samples and typically examines anxiety as a comorbidity within broader network models (e.g., Fisher et al., 2017; Jin et al., 2022; Levinson et al., 2017; Yohannes et al., 2022). While it is useful and informative to build an understanding of the relationship between anxiety and different comorbidities, a key limitation here is that it is not possible to determine important underlying network modelling assumptions (such as configural or metric invariance) of different anxiety measures (Christensen & Golino, 2021; Jamison et al., 2022). Configural invariance assesses the stability of a network model and exists when the network structure is consistent and sub-groups of nodes, called “communities”, are stable across moderating variables, such as age or sex. When configural invariance holds, it suggests the basic symptom network structure is comparable across groups. Metric invariance reflects the importance or “strength” of different nodes and exists when node strength remains similar across different moderating variables, indicating the influence of particular symptoms is similar across groups. These properties are important because, unless configural and metric invariance can be demonstrated in network models, clinicians and researchers may not be able to utilise NA to inform treatment plan design, as recommended treatments may not be appropriate for all participants or generalisable to the broader population.
This study aims to evaluate the suitability of NA for use in clinical trials by examining configural and metric invariance of commonly used outcome measures of anxiety. Specific aims are: (a) to specify network models for outcome measures frequently used in clinical trials of pharmacotherapies for anxiety, pre- and post-treatment; (b) to assess stability characteristics (i.e. configural invariance) of resulting models; (c) to assess metric invariance of network models in relation to important moderating variables; and (d) to determine optimum outcome measures for use with NA methods in clinical trial settings by comparing stability and invariance indices of different measures.
This study will conduct secondary analyses of existing data from randomised trials of anxiolytics. Individual participant data (IPD) will be accessed via the data repository Vivli.org (2024). We will specifically target treatment trials of generalised anxiety disorder (GAD), as measured using two commonly adopted outcome measures, namely the Hamilton Anxiety Rating Scale (HAM-A; Hamilton, 1959) and the anxiety subscale of the Hospital Anxiety and Depression Scale (HADS-A; Zigmond & Snaith, 1983). Preliminary searches of the Vivli repository indicate that six GAD treatment trials (n=2,334) used both outcome measures, allowing for direct comparison of performance, and may be eligible for inclusion. Data will be collated according to outcome measure, with potential moderating variables representing age, sex, geographic location, treatment type and symptom severity (as well as other such potential modifiers, as may be available in the data). Network models will be estimated pre- (baseline) and post-treatment (outcome). The post-treatment follow-up interval will be 8 weeks, in line with previous similar research and to maximise sample size (Byrne et al., 2025a; Byrne et al., 2025b; Cipriani et al., 2018; Doyle et al., 2023). Invariance at pre- and post-treatment will be assessed at two levels. Configural invariance will be assessed to ensure the structure of the model and constituent communities are stable across moderating variable groups. Metric invariance will be evaluated to determine whether node centrality for each symptom remains similar across moderating variable groups. If these measures prove to be unstable or non-invariant, further analyses will be undertaken to remove underperforming items in an attempt to specify optimal abbreviated network models. The abbreviated models will then be subject to invariance analyses to determine their suitability for use in clinical trial settings.
Ethical approval for this study was awarded by the RCSI University of Medicine and Health Sciences Ethics Committee (Ref: REC202410010).
A data access request will be submitted to Vivli.org to obtain access to analysis-ready data for six already-completed GAD treatment trials (n=2,334). Inclusion/exclusion criteria are specified in Table 1. Each of the six trials for which data have been requested used both the HADS-A and HAM-A as outcome measures.
We will evaluate and report the stability and invariance properties of the 14-item HAM-A and the anxiety subscale of the HADS (HADS-A; totalling seven items) in relation to each moderating variable.
Hamilton Rating Scale for Anxiety. The HAM-A (Hamilton, 1959) is a unidimensional measure of anxiety commonly used in randomised clinical trials (Amsterdam et al., 2009; Bradley et al., 2018; Llorca et al., 2002), and assesses anxiety symptoms such as anxious mood, tension, insomnia, low mood and somatic symptoms. Severity of each symptom is rated on a five-point scale from 0 to 4, with higher values indicating increased symptom severity (Hamilton, 1959).
Hospital Anxiety and Depression Scale. The HADS measures both anxiety and depression and consists of 14 items split into two subscales of seven items each. Items are measured on a four-point Likert scale, with 0 indicating the lowest severity and 3 indicating the highest (Zigmond & Snaith, 1983). Only the seven items of the anxiety sub-scale will be included in the current analyses. Similar to the HAM-A, these items reflect issues such as mood, tension and somatic symptoms (see Table 2 for a list of symptoms assessed by each measure).
Specifying Empirical Network Models. Analyses will be conducted using R v4.2.2 (2024). NA modelling will be conducted for baseline (pre-treatment) and 8-week outcome (post-treatment) data, and stability and invariance analyses will be conducted using the package ‘EGAnet’ (Golino et al., 2024). Graphical lasso (glasso) estimation will be used to calculate networks, which EGAnet conducts via the qgraph package (Epskamp, 2023). A walktrap algorithm will be used for community detection, to identify clusters of nodes that are more closely connected to each other than to the rest of the network, using the igraph package (Csárdi et al., 2024).
Empirical networks will be modelled for baseline and outcome data using the ‘EGA’ function in EGAnet. The walktrap algorithm will be used to identify communities of nodes by implementing random walks between nodes to identify the strongest relationships among sub-groups of nodes in the network. Network loadings will be calculated to represent the between- and within-community strength of each outcome measure node; these are analogous to the factor loadings calculated during factor analysis (Golino et al., 2024; Hallquist et al., 2021). A minimal sketch of this estimation step is given below.
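The following illustrates the planned estimation step, assuming an item-level data frame named hama_baseline containing HAM-A baseline scores (the object name is hypothetical):

```r
library(EGAnet)

# Estimate the empirical network: glasso regularisation with
# walktrap community detection
ega_baseline <- EGA(
  data      = hama_baseline,  # hypothetical item-level data frame
  model     = "glasso",
  algorithm = "walktrap",
  plot.EGA  = TRUE            # plot the network and detected communities
)

# Network loadings: within- and between-community strength of each node,
# analogous to factor loadings
net.loads(ega_baseline)
```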
Stability analyses will be conducted to determine the stability, and thus configural invariance, of empirical networks (Byrne et al., 2025b; Christensen & Golino, 2021; Golino et al., 2024). To assess network stability, the ‘bootEGA’ function will be used to bootstrap the empirical models for 1,000 iterations using glasso estimation and a walktrap algorithm, as per the empirical networks. The resulting models will be used to form a distribution of simulated samples, against which item and dimension stability characteristics of the empirical model will be assessed. Multivariate normality testing will be conducted to determine whether bootstrapping should use a parametric or resampling technique. When the bootstrapped sample distribution is computed, the ‘dimensionStability’ function will then be used to compute stability indices. Item replication scores, which indicate the proportion (percentage) of times each node replicates with a given community, will be inspected to determine node stability across bootstrapped samples. Community replication scores, which are mean replication scores for constituent nodes within a given community, will be examined. Node and community replication will be assessed in relation to a lower threshold of 0.65, below which they are considered to be unstable (Christensen & Golino, 2021). The stability of the specified communities will also be assessed in relation to the frequency with which a given number of communities is identified in the network model during bootstrapping. These analyses will determine if the configuration of the network can be considered stable, thus establishing configural invariance, and will be performed for baseline and outcome models.
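A sketch of the planned stability analysis, continuing from the hypothetical hama_baseline example above; the choice between parametric and resampling bootstrapping will follow the multivariate normality testing just described:

```r
# Bootstrap the empirical network for 1,000 iterations
boot_baseline <- bootEGA(
  data      = hama_baseline,
  iter      = 1000,
  type      = "resampling",   # or "parametric", depending on normality testing
  model     = "glasso",
  algorithm = "walktrap",
  seed      = 2024            # illustrative seed for reproducibility
)

# Stability indices: item replication proportions per node and
# structural consistency of each community
stab <- dimensionStability(boot_baseline)
stab$item.stability         # flag nodes replicating below the 0.65 threshold
stab$dimension.stability    # community-level stability
```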
If a configurally invariant network model is found, metric invariance analyses will then be conducted using the ‘invariance’ function, as per recommendations by Jamison et al. (2022). Centrality scores will be calculated for each node according to each moderating variable group (e.g. for ‘male’ and ‘female’ participants) and the difference in node centrality between groups will be computed. The resulting values are termed the ‘empirical values’. The configurally invariant model will then be permuted for 1,000 samples independently for each group (e.g. ‘male’ and ‘female’) and respective centrality scores will be computed. The difference in centrality scores between groups will be determined, resulting in a null distribution. The empirical values will then be compared with the null distribution using two-tailed p-values to determine which nodes meet the criteria of metric invariance and which are non-invariant.
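As an illustration, a sketch of the metric invariance step for a single moderating variable (sex), where sex is a hypothetical vector of group labels aligned with the rows of hama_baseline; the exact structure of the returned object may vary across EGAnet versions:

```r
# Permutation-based metric invariance test across participant sex
inv_sex <- invariance(
  data   = hama_baseline,
  groups = sex,     # hypothetical grouping vector ("male"/"female")
  iter   = 1000,    # number of permuted samples
  model  = "glasso"
)

# Per-node differences in strength between groups with two-tailed p-values;
# non-significant differences indicate metric invariance for that symptom
summary(inv_sex)
```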
Exploring Revised Network Models. If empirical network models are found to be configurally non-invariant, additional stability analyses will be undertaken to identify revised invariant models. Nodes found to have a stability score lower than the 0.65 threshold will be removed and the network will be modelled again. This will continue until all nodes demonstrate acceptable stability scores (see the sketch below). If network models consist of multiple communities, the stability of these will be assessed in terms of the number of times bootstrapped models result in a given number of dimensions, as per above. Metric invariance will then be assessed for revised configurally invariant models. If partial invariance is detected, additional analyses may be conducted to remove non-invariant items, with the aim of identifying revised models that are both configurally and metrically invariant. There are currently no agreed-upon guidelines for acceptable levels of partial invariance (Jamison et al., 2022), so a threshold of >70% of nodes demonstrating invariance will be adopted as a partial invariance criterion, reflecting the mean threshold for item stability (Christensen & Golino, 2021). If fewer than 70% of nodes are invariant, additional analyses will not be undertaken.
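The iterative pruning could proceed along the following lines (a sketch only; current_data is the hypothetical item-level data frame, and the accessor path for item stability values may differ across EGAnet versions):

```r
# Iteratively drop nodes with item stability below 0.65, re-estimating
# and re-bootstrapping until all remaining nodes are acceptably stable
current_data <- hama_baseline
repeat {
  boot <- bootEGA(current_data, iter = 1000,
                  model = "glasso", algorithm = "walktrap")
  stab <- dimensionStability(boot)
  # Named vector of item replication proportions (accessor may vary by version)
  item_stab <- stab$item.stability$item.stability$empirical.dimensions
  unstable  <- names(item_stab)[item_stab < 0.65]
  if (length(unstable) == 0) break  # all nodes stable: stop pruning
  current_data <- current_data[, setdiff(colnames(current_data), unstable)]
}
```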
Descriptive comparisons of configural and metric invariance outcomes, as well as edge weight and centrality indices, will be conducted between HAM-A and HADS network models to determine which scale (or sub-scale, in the case of the HADS) may be a more stable measure of anxiety symptoms. If the HADS-A network is unstable, further analysis will be considered whereby the full 14-item scale will be modelled to explore if this provides a stable network.
This study will be the first of its kind to assess network configural and metric invariance of the two most commonly adopted outcome measures in anxiolytic clinical trials using a large multi-trial sample. Findings from this study could significantly impact the way in which clinical trials, and other such research, are conducted and analysed in two ways. First, the potential to identify invariant measurement models would open up the utility of NA methods to the design of more efficacious treatment plans by allowing clinicians to identify and target central symptoms. As demonstrated by Lutz et al. (2018), this utility could also be applied to other important outcomes, such as predicting and redressing patient dropout. Conversely, findings indicating that outcome measures are unstable or non-invariant would have adverse implications for the use of NA methods, as any recommended treatment plans based on such analyses may not be appropriate for all patients or generalisable to a broader population.
A key strength of this study is the potential to analyse a large multi-trial sample, which is less susceptible to Type I errors during invariance analyses. The sample consisting of data from multiple trials also broadens the scope of moderating variables that could be assessed, which would strengthen any potential invariance claims (Christensen & Golino, 2021; Jamison et al., 2022). Another strength is the use of the invariance function in EGAnet. Simulation studies have shown that this can outperform other types of metric invariance analyses, particularly with small or unequal samples (Jamison et al., 2022).
This study will have limitations. Analyses will be conducted within a secure research environment provisioned and hosted by Vivli.org, and will therefore be contingent upon the resources available within that environment. The sample will also be limited to trials that are accessible through the Vivli repository. Results obtained may not be generalisable to other outcome measures, or to the use of the HAM-A or HADS in uncontrolled or observational studies. In addition, some studies accessed may have particular sampling characteristics (e.g. age-, sex- or geolocation-specific sampling), which may impact the generalisability of findings. Finally, the HAM-A is a clinician-rated outcome measure, whereas the HADS is patient-rated. This could influence network indices and invariance analyses in relation to moderating variables, and interpretations of outcomes will need to account for these differing methods and perspectives.
Consent was not a requisite for this study, as no data were collected or analysed for this manuscript. Planned analyses will be conducted using secondary data and participant consent for the use of their data in research subsequent to the original study was obtained by respective study sponsors.