Data Driven: Mining – Institutions, Funding, and Areas of Focus


This is Data Driven, a new series that explores our nation’s health agencies’ publicly available data. This data, which consists of searchable online databases and files, presents a unique opportunity to analyze information on broad health care topics. We thought it would be interesting to explore data from agencies whose work most closely affects biomedical innovation, namely the National Institutes of Health (NIH), the Food and Drug Administration (FDA), and the Centers for Medicare and Medicaid Services (CMS).


In this first post, we investigate the NIH-sponsored site, which is a government-funded registry of more than 200,000 clinical studies of drugs, biologics, and medical devices. The site is maintained by the NIH and lists studies conducted in all 50 states and 192 countries. Entries for each trial contain a description of the conditions, interventions, and outcomes being studied, as well as the study design, enrollment, and funding sources.

Overview of the data set

The chart below shows the number of trials registered every month between 2007 and 2016. As you can see, there’s a general upward trend. The first spike in 2007 is due to new trial reporting requirements put in place by the Food and Drug Administration Amendments Act, which stipulated that a broader swath of clinical trials had to be listed on the registry.




The data set consists predominantly of interventional drug trials, as shown in the chart below, although a significant number of observational studies are also listed, as well as trials for medical procedures and devices. As for the recruitment status of the trials in the data set, about two-thirds of trials analyzed have been completed or are currently recruiting participants.



Which cancer types have received the most attention?

Not surprisingly, breast cancer was the most commonly studied tumor site. Most other major cancer types are listed in the chart below, including both solid tumors and hematologic malignancies. Many trials are listed as studying several conditions, and thus there is some unavoidable double counting when assessing the relative study of different disease areas. Disease subtypes and alternate spellings were accounted for by bucketing categories by the organ site (with the exception of leukemias and lymphomas). For example, kidney cancer includes renal cell carcinoma, and brain cancer includes glioblastomas.
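The bucketing step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the keyword map `ORGAN_SITE_KEYWORDS` and both function names are hypothetical, and the real rules covered many more subtypes and spellings. It also shows where the double counting comes from, since one trial can land in several buckets.

```python
from collections import Counter

# Hypothetical keyword-to-bucket map; the post does not publish its actual rules.
ORGAN_SITE_KEYWORDS = {
    "breast": "breast cancer",
    "renal cell": "kidney cancer",
    "kidney": "kidney cancer",
    "glioblastoma": "brain cancer",
    "brain": "brain cancer",
    "leukemia": "leukemia",
    "leukaemia": "leukemia",  # alternate spelling
    "lymphoma": "lymphoma",
}


def bucket_conditions(conditions):
    """Return the set of buckets matched by one trial's condition strings."""
    buckets = set()
    for condition in conditions:
        text = condition.lower()
        for keyword, bucket in ORGAN_SITE_KEYWORDS.items():
            if keyword in text:
                buckets.add(bucket)
    return buckets


def count_trials_by_bucket(trials):
    """Count trials per bucket; a trial with several conditions counts once per bucket."""
    counts = Counter()
    for conditions in trials:
        counts.update(bucket_conditions(conditions))
    return counts
```

Because each trial contributes a set of buckets, two spellings of the same disease on one trial count once, while a trial that studies both breast and brain cancer counts toward both totals.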



How has the study of different cancer types changed over time?

The chart below shows that the number of trials conducted per year for most of the common tumor types has increased. Interestingly, the only two cancer types that did not experience an increase in study were blood cancers. Both leukemias and lymphomas were studied slightly more in 2007 than in 2015.



Which anti-cancer drugs have been studied the most?

The chart below demonstrates that chemotherapies have been studied far more often than targeted agents, largely because the same chemotherapies are often used in the control arms of randomized studies, and in many instances, represent the standard of care against which new drugs are evaluated. Additionally, many of the chemotherapies on this list have been around for much longer than the targeted agents.



Which institutions have sponsored the most trials?

We looked at the top 30 trial sponsors that were listed as lead sponsors in the registry, meaning they were the organization or home institution of the person who oversees the clinical study and is responsible for analyzing the study data. We then compared the number of trials for which they were lead sponsor and the number of trials for which they were a collaborator, meaning they were listed as a sponsor, but not the lead sponsor. 
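One way to tally the lead-sponsor and collaborator counts is sketched below. It assumes each trial has already been parsed into a dictionary with a `lead_sponsor` string and a `collaborators` list; those field names and the function name are illustrative assumptions, not the registry's actual schema.

```python
from collections import Counter


def sponsor_role_counts(trials, top_n=30):
    """Return (sponsor, trials_as_lead, trials_as_collaborator) for the top lead sponsors."""
    lead = Counter()
    collab = Counter()
    for trial in trials:
        lead[trial["lead_sponsor"]] += 1
        # set() guards against the same collaborator being listed twice on one trial
        for name in set(trial.get("collaborators", [])):
            if name != trial["lead_sponsor"]:
                collab[name] += 1
    # Rank by lead-sponsor count, then attach each sponsor's collaborator count
    return [(name, n_lead, collab[name]) for name, n_lead in lead.most_common(top_n)]
```

Ranking on the lead count first matches the approach described above: the top sponsors are chosen by trials led, and the collaborator count is reported alongside for comparison.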



Do certain institutions specialize in different areas?

Here we looked at the top sponsors again, this time ranking them by the number of trials for the cancer type each studied most as lead sponsor. Unlike the previous chart, which covered only drug-related trials, this one covers trials for all interventions. Below is a snapshot of the varying areas of focus of different institutions.



How does the source of funding affect what phase of development is sponsored?

The chart below shows the relationship between the source of a trial’s funding and the phase of the trial. Trials are often funded by a single source, such as a pharmaceutical company or the NIH, but many others are funded through public-private partnerships. The most common of these partnerships are also displayed below.

The phases, or stages of clinical development, of trials in the data set are shown as well. The typical clinical development process consists of Phase 1 trials to establish preliminary safety data, then Phase 2 trials to further investigate safety and obtain preliminary efficacy data, and finally Phase 3 trials, which seek to generate enough evidence of effectiveness to support marketing approval. Phase 4 trials are conducted in the post-market setting and evaluate previously approved products. There are also trials with combined phases, such as Phase 1-2 trials, which seek to speed up the development process.

It is important to note that there are likely many more Phase 1 oncology studies than are visible in the chart below. This is most likely due to clinical trial reporting requirements that exclude certain Phase 1 studies, including studies in which investigational drugs are used as research tools to explore biological phenomena or disease processes.
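The funding-by-phase breakdown in this section amounts to a simple cross-tabulation. The sketch below assumes each trial record carries a `funder_types` list (for example `["Industry"]` or `["NIH", "Industry"]`) and a `phase` string; both field names and the labels are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def funding_by_phase(trials):
    """Cross-tabulate funding source (or partnership) against trial phase."""
    table = defaultdict(Counter)
    for trial in trials:
        # Sort so that ["NIH", "Industry"] and ["Industry", "NIH"] share one row
        funder = " + ".join(sorted(set(trial["funder_types"])))
        table[funder][trial["phase"]] += 1
    return table
```

Treating each combination of funders as its own row keeps public-private partnerships separate from single-source trials, which is the distinction the chart draws.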




This exploratory analysis only scratches the surface, and many more insights can be gleaned from the data in the registry. One area that has been explored by others is the number of trials that report results on the registry. Regulations require that many trials post results, but as the Boston Globe’s pharma blog STAT points out, many top institutions have neglected to submit this information. Also of interest for further analyses would be a more in-depth look at trends over time, to see how different policies have affected the conduct of clinical trials.

Notes On Our Method

We searched the registry for all oncology-related trials with start dates from January 2007 to August 2016. We chose this period because trial registration requirements were enhanced by statute in 2007. We searched for the terms “carcinoma,” “neoplasm,” “cancer,” “leukemia,” “lymphoma,” or “oncology,” which returned more than 40,000 trials. We then came up with a few questions we hoped the data could shed light on.
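For readers who want to reproduce a similar selection on a local export of the data, a rough equivalent of the search is sketched below. The search itself was run on the registry, so this is only an approximation: the record fields (`title`, `conditions`, `start_date`) and function names are assumptions, and simple substring matching will not behave exactly like the registry's own search engine.

```python
from datetime import date

# Search terms and date window as stated in the method notes above
SEARCH_TERMS = ("carcinoma", "neoplasm", "cancer", "leukemia", "lymphoma", "oncology")
WINDOW_START = date(2007, 1, 1)
WINDOW_END = date(2016, 8, 31)


def is_oncology_trial(record):
    """True if any search term appears in the title or listed conditions."""
    text = " ".join([record.get("title", "")] + record.get("conditions", [])).lower()
    return any(term in text for term in SEARCH_TERMS)


def in_window(record):
    """True if the trial's start date falls between January 2007 and August 2016."""
    return WINDOW_START <= record["start_date"] <= WINDOW_END


def select_trials(records):
    """Keep records that match a search term and start inside the window."""
    return [r for r in records if is_oncology_trial(r) and in_window(r)]
```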

We also want to note a few limitations. First, not all trials are required to be registered on the site, including many Phase 1 trials, and it is unclear how many have been left out. For more information on which trials must be listed, see the FDAAA trial reporting requirements webpage. Second, the “sponsor” data element fails to adequately capture the source of funding for a particular trial. For example, a trial may be listed as having been sponsored by Yale Cancer Center, but Yale may have received funding for the trial from the NIH or some other source. If the NIH was not listed as a collaborator on the trial, its role in funding the trial is not highlighted in the data. These limitations aside, this is a rich data source that we hope will give both scientists and policymakers helpful insights into how research is being conducted, along with a general overview of the clinical development landscape in oncology.


*Michael Shea is a Policy Research Associate and the resident data scientist at Friends. If there is a particular data set you would like to see analyzed, please reach out to Michael at mshea(AT)focr(DOT)org

