Homologous recombination deficiency (HRD) tests are available from multiple vendors and are used daily to determine whether patients stand to benefit from PARP inhibitors.
In an effort to determine whether these tests produce consistent results, a research project led by Friends of Cancer Research sent identical samples from 90 ovarian cancer patients to 17 independent HRD assay developers, asking each to determine the samples’ HRD status.
The test results, they found, varied substantially from one assay to another.
“No assay for HRD that I’m aware of perfectly predicts who’s going to benefit or who’s going to be sensitive to these therapies,” said Lisa McShane, an NCI biostatistician whose team, including Ming-Chung Li, Yingdong Zhao, and Zhiwei Zhang, planned and conducted the statistical analysis for the Friends-led research group. “With that in mind, we have to realize that even if two assays have a fair amount of disagreement, maybe they’re picking up different parts of the population that are still going to be benefiting or not benefiting from PARP inhibitors or be sensitive to platinum agents.”
A story about the Friends HRD harmonization project appears in this issue.
McShane, associate director of the NCI Division of Cancer Treatment and Diagnosis and head of the Biometric Research Program, is an expert on the use of complex biomarkers in the clinic who played a key role in developing a standardized, prospective approach to trials studying genomic predictors of treatment response. The NCI “omics checklist” was a direct response to three ill-fated Duke University studies that relied on genomic predictors to determine therapy for breast and lung cancer patients (The Cancer Letter, Feb. 7, 2013).
The Friends-led harmonization project demonstrated that the field has a long way to go when it comes to HRD diagnostic testing. With no gold standard and only moderate concordance among available assays, improvements need to be made—especially since HRD testing currently guides treatment decisions in the clinic.
“I think we had some assays where the concordance was in the 30-40% range, and the initial reaction to that might have been, ‘Oh my god,’ but we don’t really know how to understand that concordance,” McShane said. “This exercise has said to us, ‘There’s a space here where we need to be investigating things more.’”
The Friends project is a good first step, McShane said. Even perfect concordance in HRD status would not necessarily translate to improved clinical outcomes. Further studies are needed to establish how well any of these tests perform in selecting the therapy that provides the best outcome for patients.
“We need to understand the implications of the assay discordance in terms of differences in clinical performance,” McShane said. “That is, do these assays perform equally well in distinguishing patients who benefit from PARP inhibitors from those who do not?
“Ideally one would like to compare all assays head-to-head in a prospective clinical trial, but there are a variety of practical limitations in trying to do that.”
The HRD space will only get more complicated.
“As more HRD tests and PARP inhibitors become available, assessments of clinical performance for all those tests will become increasingly challenging,” McShane said.
McShane spoke with Jacquelyn Cobb, reporter with The Cancer Letter.
Jacquelyn Cobb: What brought you to the HRD Harmonization project?
Lisa McShane: I have a longstanding interest in clinical validation of biomarker-based tests and assay analytical validation. I served as a statistical collaborator in several assay reproducibility studies over the years, including studies of immunohistochemical assays for p53 and Ki-67 and, most recently, genomic assays for the complex biomarker Tumor Mutational Burden (TMB).
The TMB project was led by Friends of Cancer Research, so I guess it was natural for them to think of me when they wanted to launch a similar project for the complex biomarker Homologous Recombination Deficiency (HRD).
The goal of both the TMB and HRD projects was to first assess the level of variability across different assays; if variability is found, then we try to understand drivers of that variability so that we might determine whether harmonization of some aspects of the assays might bring them into closer alignment.
When you first mentioned the results of this study to me, I was a bit scandalized—a biomarker that guides clinical decision making isn’t reliably assessed by different assays? Since then, I’ve spoken with Dr. Hillary Andrews at Friends about this and gotten a bit more perspective, but I would love to hear what you think the impact and importance of this study is, if it’s not aligned with my initial surprise.
LM: The study demonstrated two things: a) there is wide variation in the characteristics incorporated into different assays for purposes of calling HRD status, and b) those differences lead to variability in results. Some pairs of assays agree quite well with one another, whereas others do not, agreeing, for example, less than 50% of the time.
Since all of the assays incorporate at least some BRCA1/2 mutations, agreement across assays was generally good for that subset of cases. There was more discordance for cases that showed signs of genomic scarring, a typical consequence of HRD, but no BRCA1/2 mutations. The types of “scars” the assays detected differed, and hence so did the resulting HRD status calls.
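To make the concordance measure concrete, here is a minimal sketch of pairwise percent agreement between assays on a shared sample set; the assay names and calls below are hypothetical, and the study’s actual statistical analysis was more involved.

```python
# A minimal sketch of pairwise percent agreement between HRD assays run on
# the same samples. Assay names and calls are hypothetical illustrations,
# not study data; 1 = HRD-positive, 0 = HRD-negative.
from itertools import combinations

calls = {
    "assay_A": [1, 1, 0, 1, 0, 1, 1, 0],
    "assay_B": [1, 0, 0, 1, 0, 1, 0, 0],
    "assay_C": [1, 1, 1, 1, 0, 0, 1, 1],
}

for a, b in combinations(calls, 2):
    n_agree = sum(x == y for x, y in zip(calls[a], calls[b]))
    print(f"{a} vs {b}: {n_agree / len(calls[a]):.0%} agreement")
```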
The implication is that for certain patients, HRD assay results will depend heavily on which lab analyzed the tumor sample.
Does this issue exist with other complex biomarkers?
LM: Yes, there are many biomarkers that exhibit variability across assays. It is important to realize that even simple one-biomarker immunohistochemical assays can show variability resulting from use of different antibody stains or different approaches for scoring the staining pattern visualized on the slide.
For complex biomarkers, there are more moving parts, so there is a greater chance that different assay methods will introduce variability in results. For the TMB project we saw differences arising from bioinformatic pipelines.
One simple example of a pipeline difference was that some labs counted all mutations while others counted only nonsynonymous mutations, i.e., mutations that alter an amino acid and hence may alter the protein.
As expected, labs that counted all mutations (including synonymous ones) tended to report higher values of TMB. A processing step like this can be harmonized, or one can attempt to calibrate the assays to an agreed-upon reference.
In the TMB project we developed a free software tool that assay developers could use to calibrate their assay to results calculated from TCGA whole exome sequencing (WES) data.
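To illustrate the general idea of calibrating an assay to a reference (a hedged sketch only; the Friends tool’s methodology differs in detail), one simple approach fits a linear map from an assay’s values to reference values measured on shared samples.

```python
# A minimal sketch of calibrating a panel assay's TMB values to a reference
# scale (e.g., TMB computed from WES data), assuming paired measurements on
# a common sample set. All numbers are hypothetical; the actual Friends
# calibration tool is more sophisticated.
import numpy as np

wes_tmb   = np.array([2.0, 5.5, 8.0, 12.0, 20.0, 35.0])  # reference (mut/Mb)
panel_tmb = np.array([3.1, 7.0, 9.8, 14.5, 24.0, 41.0])  # same samples, panel assay

# Fit a simple linear map from panel values to the reference scale.
slope, intercept = np.polyfit(panel_tmb, wes_tmb, deg=1)

def calibrate(panel_value: float) -> float:
    """Map a new panel TMB value onto the WES reference scale."""
    return slope * panel_value + intercept

print(f"panel TMB 10.0 -> calibrated {calibrate(10.0):.1f} mut/Mb")
```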
Can you speak to how this project relates to access to care?
LM: HRD assays are primarily used clinically to guide decisions on whether to treat patients with high-grade serous ovarian cancer with PARP inhibitors, although their use is being investigated in other cancers as well.
Our study included only high-grade serous ovarian cancers. The percentage of cases called HRD “positive” on the same set of patient tumors varied from 23% to 74%, depending on the assay. If the decision to treat with a PARP inhibitor is determined primarily by these assays, then clearly the percentage of patients who would receive that treatment will differ depending on the assay used.
What we don’t know is whether the assays have substantially different accuracy in identifying patients who will benefit from PARP inhibitors. A clinically validated predictive (treatment selection) biomarker test for a targeted drug should be able to substantially enrich for patients who benefit from that drug, but “enrich” rarely means 100% accuracy.
Think of a simple situation with a binary response endpoint, where 50% of tumors are biomarker-positive and the response rate in that subset is 60%, whereas the biomarker-negative tumors (the other 50%) have a response rate of essentially zero.
That would be a very exciting result, but it means that the biomarker test gave an incorrect response prediction 20% of the time: the 50% of patients who are biomarker-positive multiplied by the 40% of them who do not respond.
Suppose a different assay for that same biomarker were 100% accurate in predicting response; then it would disagree with the first assay 20% of the time.
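The arithmetic behind those two figures, as a short sketch using the hypothetical numbers from this example:

```python
# Worked arithmetic for the hypothetical scenario described above.
p_pos = 0.5    # fraction of tumors that are biomarker-positive
rr_pos = 0.6   # response rate among biomarker-positive tumors
rr_neg = 0.0   # response rate among biomarker-negative tumors (essentially zero)

# The test predicts response for positives and non-response for negatives,
# so its errors are non-responding positives plus responding negatives.
error_rate = p_pos * (1 - rr_pos) + (1 - p_pos) * rr_neg
print(f"incorrect response predictions: {error_rate:.0%}")  # 20%

# A second assay that predicted response perfectly would disagree with the
# first assay exactly where the first assay errs: the same 20% of patients.
```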
What are the next steps after this project? It seems like it has uncovered a lot of questions and challenges—are there any tangible steps forward?
LM: As suggested by the last example, we need to understand the implications of the assay discordance in terms of differences in clinical performance. That is, do these assays perform equally well in distinguishing patients who benefit from PARP inhibitors from those who do not?
Ideally one would like to compare all assays head-to-head in a prospective clinical trial, but there are a variety of practical limitations in trying to do that.
An alternative would be to evaluate assays on tumor samples stored from completed clinical trials of PARP inhibitors, although if those trials required HRD positivity by a different assay (or only BRCA mutation) as an eligibility criterion, then full evaluation of the clinical performance of an alternative HRD assay is not possible.
In lieu of these options, one can assess the concordance of a new assay with an existing assay that was shown in a previous study to have acceptable clinical performance. If the concordance is extremely high, then the existing assay’s good clinical performance can reasonably be imputed to the new assay.
If the concordance is only moderate and the performance of the existing assay is good but not perfect, we can’t be sure how the new assay’s clinical performance compares to that of the existing assay. Next-generation PARP inhibitors with broader activity, or PARP inhibitors used in different tumor types, may add even more layers of complexity to the evaluation of HRD assay clinical performance.
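One way to make that uncertainty concrete: if “accuracy” is defined as the probability that an assay’s call matches a patient’s true benefit status, then a new assay’s accuracy can differ from the existing assay’s accuracy by at most their disagreement rate. A minimal sketch with hypothetical numbers:

```python
# A hedged illustration (not from the study): bounding a new assay's accuracy
# given its concordance with an existing assay of known accuracy. "Accuracy"
# here means the probability that an assay's call matches the patient's true
# benefit status. All numbers are hypothetical.
def accuracy_bounds(existing_accuracy: float, concordance: float):
    """Range of new-assay accuracies consistent with the observed concordance."""
    disagreement = 1.0 - concordance
    lo = max(0.0, existing_accuracy - disagreement)
    hi = min(1.0, existing_accuracy + disagreement)
    return lo, hi

lo, hi = accuracy_bounds(0.80, 0.95)
print(f"high concordance (95%):     accuracy between {lo:.2f} and {hi:.2f}")
lo, hi = accuracy_bounds(0.80, 0.70)
print(f"moderate concordance (70%): accuracy between {lo:.2f} and {hi:.2f}")
```

With high concordance the range stays tight around the existing assay’s performance; with only moderate concordance the bound is too wide to be reassuring.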
As more HRD tests and PARP inhibitors become available, assessments of clinical performance for all those tests will become increasingly challenging.
https://cancerletter.com/conversation-with-the-cancer-letter/20240517_2/