Nature Reviews Drug Discovery – Informing single-arm clinical trials with external controls

Randomized controlled trials are the accepted standard for evaluating investigational therapies, but such trials are sometimes not an option for reasons of ethics or feasibility. Here, we discuss opportunities to address evidence gaps by using historical clinical trial data and real-world data in external control arms for single-arm trials, as well as the associated challenges.
Control arms in randomized clinical trials (RCTs) have a key role in isolating the effects of the medical product from those caused by other factors. However, in some cases, such as phase II trials in oncology where no effective standard of care exists or trials for rare diseases with a very small number of eligible patients, a concurrent control (placebo, active or no treatment) may not be feasible or ethical. In these cases, single-arm trials are often used, but this remains controversial because of the difficulty of distinguishing treatment effects from other factors1. As a result, such trials may provide preliminary evidence of activity, but they are typically inadequate for quantifying treatment effects and for understanding possible biases.

Using a well-characterized and well-balanced external control that is carefully constructed from patient-level data could help address evidence gaps in single-arm trials under certain circumstances. In some cases, such external controls could also be used to augment randomized controls when studying interventions in indications for which RCTs have ethical or operational limitations. New uses of established statistical methodologies (for example, propensity score matching for balancing clinical trial participants)2 and technological advances are providing rigorous approaches to data curation and analysis to facilitate creation of such external controls. Two types of patient-level data are relevant: data from historical trials; and real-world data collected as part of routine care, including electronic health records, claims and billing data, registries, and data from personal devices and health apps. In this article, we highlight characteristics of external controls and selected applications so far, and discuss the opportunities and challenges for the future.

Characteristics of external control arms

In single-arm trials with no concurrent randomized internal control group, external controls — primarily references to static outcomes as documented in published literature — have been used to help benchmark the likely effects of a new treatment in indications for which randomization is not ethical1. For example, in 2006, the FDA approved alglucosidase alfa for the treatment of patients with Pompe disease, a debilitating, progressive and often fatal disorder. The approval was supported in part by a comparison between the number of patients who died or needed invasive ventilator support in a single-arm trial involving 18 patients with infantile-onset Pompe disease and the expected 98% mortality rate estimated from a chart review of 61 untreated patients with infantile-onset Pompe disease. However, signals of efficacy relative to static external benchmarks are typically not as clear as this, and are considered inadequate for regulatory decision-making in many indications, although such data may serve as supporting information to guide clinical practice.
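
The arithmetic behind a comparison with a static benchmark can be sketched as an exact binomial tail probability. This is a minimal illustration and not the analysis that supported the approval: the function name is our own, and the event count passed to it is a hypothetical placeholder. Only the trial size of 18 and the 98% benchmark rate come from the text above.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

N_PATIENTS = 18        # trial size, from the text
BENCHMARK_RATE = 0.98  # expected event rate without treatment, from the text
observed_events = 5    # HYPOTHETICAL placeholder, not the reported trial result

# Probability of seeing this few events if the benchmark rate applied to the trial.
p_value = binomial_cdf(observed_events, N_PATIENTS, BENCHMARK_RATE)
print(f"P(at most {observed_events} events | benchmark rate) = {p_value:.3g}")
```

A very small probability indicates only that the result would be unlikely under the benchmark rate. It says nothing about whether the trial patients resemble those behind the benchmark, which is the gap that patient-level external controls are intended to address.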

FDA guidance published in December 2019 highlights the lack of random assignment of patients as a potential shortcoming of external controls (see Related links). This may result in differences in patient characteristics or concomitant treatments in the trial population compared with the external control, yielding differences in outcomes that are unrelated to the investigational treatment. A strategy to mitigate this shortcoming is to develop external controls based on selection of individual external patients through well-defined eligibility criteria and statistical balancing of baseline characteristics (for example, via propensity scores) using patient-level data generated in one or more previous clinical trials or in clinical practice. A propensity score is the probability of treatment assignment, or enrolment in a group of a clinical trial offering a particular treatment (or treatments), conditional on observed baseline characteristics. By matching, stratifying, weighting or adjusting for the propensity score, the distribution of observed baseline covariates should be similar between the investigational arm and the external control, facilitating fair estimation of the treatment effect, similar to a randomized trial2.
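
The weighting variant of this idea can be sketched in a few lines. The example below is an illustration under stated assumptions rather than a recommended implementation: the data are simulated, there is a single standardized baseline covariate, the propensity model is a plain logistic regression fitted by gradient descent, and all function names are our own. External patients receive odds weights e/(1 − e), where e is their estimated propensity score, so that the weighted external group resembles the trial arm.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p]. Covariates are assumed to be
    roughly standardized so that a fixed learning rate converges.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            err = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row))) - label
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    """Estimated probability of being in the trial arm given baseline covariates."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# SIMULATED data: one standardized baseline covariate (say, age), with the
# trial arm shifted upward relative to the external patients.
rng = random.Random(0)
trial = [[rng.gauss(0.5, 1.0)] for _ in range(200)]
external = [[rng.gauss(-0.5, 1.0)] for _ in range(200)]

# Label 1 = trial participant, 0 = external patient.
beta = fit_logistic(trial + external, [1] * len(trial) + [0] * len(external))

# Odds weights e/(1-e) reweight external patients to resemble the trial arm.
ext_scores = [propensity(beta, row) for row in external]
ext_weights = [e / (1.0 - e) for e in ext_scores]

trial_mean = sum(r[0] for r in trial) / len(trial)
raw_ext_mean = sum(r[0] for r in external) / len(external)
wtd_ext_mean = weighted_mean([r[0] for r in external], ext_weights)
print(f"trial {trial_mean:.2f} | external raw {raw_ext_mean:.2f} | external weighted {wtd_ext_mean:.2f}")
```

Matching or stratifying on the same scores would be alternatives to weighting. Whichever is used, only covariates that enter the propensity model can be balanced this way.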

When created from external clinical trials data, external control arms may have additional rigour owing to the collection of traditionally defined efficacy and safety outcomes, as well as thorough assessments of baseline characteristics, medical history and compliance with study treatment. However, balancing covariates that are unknown or unmeasured in the historical data is nearly impossible. Assessing how sensitive the observed treatment effect is to a possible imbalance in unknown covariates (for example, through tipping point analyses) can help characterize the risk that such covariates are important (see Related links).
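
One simple form of tipping point analysis can be sketched as follows. It is a simplified illustration, not a method attributed to the sources cited here: it assumes a single binary unmeasured confounder and a risk ratio as the effect measure, and it uses the classic external-adjustment formula for the resulting multiplicative bias. All numerical inputs and function names are hypothetical. The code searches for the smallest imbalance in the confounder at which the upper confidence limit would no longer exclude a null effect.

```python
def confounding_bias(rr_cu: float, p_trial: float, p_external: float) -> float:
    """Multiplicative bias in a risk ratio caused by a binary unmeasured confounder.

    rr_cu: risk ratio linking the confounder to the outcome.
    p_trial, p_external: prevalence of the confounder in each arm.
    """
    return (p_trial * (rr_cu - 1) + 1) / (p_external * (rr_cu - 1) + 1)

def adjusted(ratio: float, rr_cu: float, p_trial: float, p_external: float) -> float:
    """Effect estimate after dividing out the assumed confounding bias."""
    return ratio / confounding_bias(rr_cu, p_trial, p_external)

def tipping_point(upper_ci, rr_cu, p_trial, grid):
    """First external-arm prevalence at which the adjusted upper limit reaches 1."""
    for p_external in grid:
        if adjusted(upper_ci, rr_cu, p_trial, p_external) >= 1.0:
            return p_external
    return None  # the conclusion never tips within the grid

# HYPOTHETICAL inputs chosen only for illustration.
UPPER_CI = 0.80  # upper 95% confidence limit of an observed risk ratio
RR_CU = 3.0      # assumed strength of the confounder-outcome association
P_TRIAL = 0.20   # assumed confounder prevalence in the trial arm
grid = [i / 100 for i in range(20, 101, 5)]  # candidate external-arm prevalences

tip = tipping_point(UPPER_CI, RR_CU, P_TRIAL, grid)
print(f"Conclusion tips once external-arm prevalence reaches {tip:.2f}")
```

Whether the prevalence gap at the tipping point is plausible is a clinical judgement. The calculation only makes explicit how large an imbalance would be needed to overturn the result.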

Example applications of external control arms

An example of the external control arm approach is provided by a study on blinatumomab3, which received accelerated approval for the treatment of Philadelphia chromosome-negative relapsed or refractory B cell precursor acute lymphoblastic leukaemia largely based on a phase II single-arm trial. This trial was compared with patient records from European national study groups and large individual sites from Europe and the United States using a weighted analysis and an external control arm built using propensity score methods. Before matching, there were statistically significant imbalances between blinatumomab-treated and historical patients for six of the eight baseline characteristics considered. After matching, there were no significant differences in any covariates except for region (there were more patients from Europe in the historical data set). Both weighted and external control analyses allowed more accurate estimation of the treatment effect than would otherwise have been possible.
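
A before-and-after balance check of this kind can be sketched with the standardized mean difference (SMD), a diagnostic commonly used alongside or instead of significance tests. Note that this swaps in a different metric from the significance tests described above, and that the ages, weights and function names below are hypothetical and not taken from the blinatumomab study.

```python
import math

def weighted_mean(x, w):
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)

def weighted_var(x, w):
    """Weighted variance around the weighted mean (no small-sample correction)."""
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)

def smd(trial, external, w_trial=None, w_external=None):
    """Standardized mean difference of one covariate between two groups."""
    w_trial = w_trial or [1.0] * len(trial)
    w_external = w_external or [1.0] * len(external)
    diff = weighted_mean(trial, w_trial) - weighted_mean(external, w_external)
    pooled_sd = math.sqrt(
        (weighted_var(trial, w_trial) + weighted_var(external, w_external)) / 2
    )
    return diff / pooled_sd

# HYPOTHETICAL ages and propensity-based weights, for illustration only.
trial_age = [52, 58, 61, 64, 70]
external_age = [45, 50, 55, 62, 68]
external_w = [0.4, 0.7, 1.0, 1.6, 2.1]

smd_before = smd(trial_age, external_age)
smd_after = smd(trial_age, external_age, w_external=external_w)
print(f"SMD before weighting: {smd_before:.2f}")
print(f"SMD after weighting:  {smd_after:.2f}")
```

An absolute SMD below about 0.1 is a widely used rule of thumb for acceptable balance. Unlike a significance test, the SMD does not shrink or grow simply because the sample size changes.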

The validity of an external control arm comprising historical clinical trial data has recently been examined by a Friends of Cancer Research working group in pilot studies for non-small cell lung cancer and multiple myeloma (see Related links). Both case studies created an external control arm by combining completed clinical trials in patients with these diseases drawn from the Medidata Enterprise Data Store, a proprietary platform comprising more than 17,000 clinical trials with patient-level data recorded electronically, as well as Project Data Sphere (see Related links). In both case studies, overall survival in the external control arm closely matched that of the target randomized control with regard to survival times, hazard ratio estimates and hypothesis tests.


As the quality of real-world data and the methodology for its collection evolve, real-world data could also be used to construct external control arms, which might be particularly useful in cases where suitable historical clinical trial data are limited. The potential of real-world data and historical clinical trials data is increasingly being considered by regulatory agencies4,5. The Framework for FDA’s Real-World Evidence Program discusses the potential use of single-arm trials with an external control using real-world data to support additional effectiveness claims for marketed products. The framework document also highlights possible limitations, including difficulties in reliably selecting a comparable population owing to potential changes in medical practice, a lack of standardized diagnostic criteria or equivalent outcome measures, and varying follow-up procedures. The challenge of reliably selecting a comparable population despite possible changes in medical practice may be mitigated by careful prespecification of eligibility criteria, use of the most recent external data, and adjustment in the matching or weighting procedure for factors likely to be related to changes in standard of care (for example, stage of disease at diagnosis). Relying on objective end points such as overall survival may help overcome the lack of standardized diagnostic criteria and the variable follow-up procedures associated with more subjective outcome measures.

Owing to the ability to balance baseline characteristics, external controls may be more widely useful than the references to summary trial-level statistics used in the past, for which comparability of patients could not be assured. External controls should not replace randomized controls when randomization is ethical and feasible; however, external controls could enhance trials for promising therapies in difficult-to-study populations.