The Comparison Nobody Questions
When a second-in-class drug reports Phase 3 results, the first thing every analyst, investor, and brand team does is line up the numbers against the first-in-class agent. Hazard ratios, response rates, median PFS — placed side by side in a table, as if the two trials measured the same thing in the same people.
They did not. And the assumption that they did — the casual equivalence of cross-trial comparison — is one of the most consequential analytical shortcuts in pharmaceutical commercialization.
The FDA does not require head-to-head trials for second-in-class approvals. Regulatory approval demands that a drug demonstrate efficacy against its own control arm, not against the first mover. This is reasonable from a regulatory standpoint. But it creates a commercial problem: the differentiation story that drives prescribing and market share rests on a comparison the trial was never designed to make.
I want to examine a specific vulnerability in this comparison that I think is underappreciated — what I will call the enrollment-era hypothesis — and illustrate it with a case study where the numbers are stark enough to demand scrutiny.
The Enrollment-Era Hypothesis
Here is the core idea: when a first-in-class drug is approved and becomes commercially available, it changes the denominator of eligible patients for subsequent trials in the same disease.
Consider the timeline. A first-in-class drug enrolls its pivotal trial before approval, drawing from the full spectrum of eligible patients — including those with favorable disease biology who would respond to any active therapy, and those with more refractory or indolent disease. The trial randomizes across this full spectrum. The placebo arm reflects the natural history of the unselected population.
Now the drug is approved. It enters commercial use. The patients who are most motivated to seek treatment — and their physicians, who are most attuned to the new option — preferentially access the approved drug through commercial prescriptions, expanded access programs, or clinical trial crossover provisions. These are not random patients. They are, on average, the patients most likely to have progressive disease and the strongest treatment-seeking behavior.
When the second-in-class drug opens enrollment for its pivotal trial, the eligible population has shifted. Patients who would have been randomized to the first trial’s placebo arm — but who now have access to the approved first-in-class drug — are no longer enrolling in a placebo-controlled study. Why would they? An effective therapy is available.
The patients who do enroll in the second trial’s placebo arm are, on average, different. They may be patients whose disease is more indolent (less urgency to seek the approved drug), patients in geographies where the first-in-class drug is not yet available, or patients whose physicians are less connected to the treatment landscape. The placebo arm of the second trial is not the same population as the placebo arm of the first trial.
If the second trial’s placebo arm performs better — because it is enriched with more indolent patients — the treatment effect (expressed as a hazard ratio or response rate delta) will appear larger, even if the active drug’s absolute efficacy is identical to the first-in-class agent.
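This mechanism can be sketched with a toy simulation. Every number below is invented for illustration (hypothetical monthly progression hazards of 0.15 for aggressive and 0.02 for indolent disease, a fixed per-patient hazard ratio of 0.2 in both eras; nothing is fitted to real trial data). The era-1 trial draws from the full patient spectrum; the era-2 trial draws only indolent patients. The per-patient drug effect is identical, yet the pooled event-rate ratio in the heterogeneous era-1 population is pulled toward 1, because aggressive patients dominate its placebo arm's event count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly progression hazards (illustrative, not real data)
AGGRESSIVE, INDOLENT = 0.15, 0.02
TRUE_HR = 0.2     # same per-patient drug effect in both eras
CENSOR = 36.0     # administrative censoring at 36 months

def arm(n_aggressive, n_indolent, hr):
    """Simulate one trial arm; return (events, total person-time)."""
    lam = np.concatenate([np.full(n_aggressive, AGGRESSIVE),
                          np.full(n_indolent, INDOLENT)]) * hr
    t = rng.exponential(1.0 / lam)            # exponential PFS times
    return (t <= CENSOR).sum(), np.minimum(t, CENSOR).sum()

def marginal_hr(n_aggressive, n_indolent):
    """Pooled event-rate ratio, a rough stand-in for the reported HR."""
    e_t, pt_t = arm(n_aggressive, n_indolent, TRUE_HR)  # active arm
    e_c, pt_c = arm(n_aggressive, n_indolent, 1.0)      # placebo arm
    return (e_t / pt_t) / (e_c / pt_c)

hr_era1 = marginal_hr(5000, 5000)   # pre-approval: full spectrum
hr_era2 = marginal_hr(0, 10000)     # post-approval: indolent-enriched
# hr_era2 lands well below hr_era1 (~0.2 vs ~0.3) despite the
# identical per-patient effect, and era 2's placebo PFS is longer.
```

The point of the sketch is directional, not quantitative: with the same per-patient effect, the indolent-enriched era-2 trial reports both a longer placebo PFS and a smaller marginal hazard ratio, which is exactly the pattern a cross-trial table rewards.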
Desmoid Tumors: The Numbers That Make This Concrete
The clearest illustration I have encountered is in desmoid tumors, specifically the gamma-secretase inhibitor (GSI) class.
Nirogacestat (Ogsiveo, SpringWorks Therapeutics) was the first-in-class GSI. Its pivotal DeFi trial (NCT03785964) enrolled 142 patients and randomized them to nirogacestat versus placebo. The trial enrolled from May 2019 through approximately August 2020 (primary enrollment period; primary analysis published 2023), prior to any GSI approval. The results: a PFS hazard ratio of 0.29 (71% risk reduction), an objective response rate of 41%, and a median change in tumor volume of -27% (Gounder et al., NEJM, 2023). The FDA approved Ogsiveo in November 2023.
Varegacestat (Immunome/Ayala) is the second-in-class GSI. Its pivotal RINGSIDE trial (NCT05307614) enrolled 156 patients and randomized them to varegacestat versus placebo. Critical detail: RINGSIDE enrollment began in 2022 and continued through 2024 — overlapping with and extending beyond Ogsiveo’s approval.
The topline RINGSIDE results are striking: a PFS hazard ratio of 0.16 (84% risk reduction), an ORR of 56%, and a median change in tumor volume of -83%. By every cross-trial metric, varegacestat appears meaningfully superior to nirogacestat.
But look at the placebo arms.
In DeFi (pre-Ogsiveo approval), the placebo arm's median PFS has been estimated at approximately 15.1 months. In RINGSIDE (post-Ogsiveo approval), the placebo arm's median PFS, while not yet fully published, appears substantially longer: early analyses suggest a difference of approximately 9 months or more. The RINGSIDE placebo arm performed meaningfully better than the DeFi placebo arm.
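A back-of-envelope way to size that divergence: if each placebo arm's PFS is treated as roughly exponential, the hazard is ln(2) divided by the median, and the two placebo arms can be compared directly. The 24-month RINGSIDE figure below is an assumed value for illustration (the reported ~15.1 months plus the ~9-month difference suggested above), not a published number.

```python
import math

# Assumed-exponential placebo PFS: hazard = ln(2) / median PFS.
defi_placebo_median = 15.1       # months (DeFi, reported estimate)
ringside_placebo_median = 24.0   # months (assumed for illustration)

h_defi = math.log(2) / defi_placebo_median
h_ringside = math.log(2) / ringside_placebo_median

# Placebo-vs-placebo "hazard ratio" across the two trials
placebo_vs_placebo_hr = h_ringside / h_defi   # ~0.63
```

On these assumptions, the RINGSIDE placebo arm progresses roughly 37% more slowly than the DeFi placebo arm, before either active drug enters the picture.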
This is the enrollment-era effect in action. After Ogsiveo became available, the desmoid tumor patients with the most progressive, symptomatic disease had a treatment option. They did not need to enroll in a placebo-controlled trial. The patients who enrolled in RINGSIDE’s placebo arm — particularly those randomized after November 2023 — were, on average, patients with less aggressive disease trajectories. Their tumors were growing more slowly. Their placebo PFS was longer.
When you calculate a hazard ratio, the denominator matters as much as the numerator. A better-performing placebo arm inflates the treatment effect — even if the active drug’s absolute performance is unchanged.
What the Hazard Ratios Actually Tell Us
Let me be precise about what this does and does not mean.
It does not mean varegacestat is not an effective drug. A PFS HR of 0.16 is a robust treatment effect by any standard. An ORR of 56% and a median change in tumor volume of -83% represent genuine, clinically meaningful activity.
What it does mean is that the comparison between varegacestat’s 0.16 HR and nirogacestat’s 0.29 HR is not an apples-to-apples comparison. The denominators — the placebo arms against which these ratios are calculated — are likely different populations measured in different eras of the disease’s treatment landscape.
If you were to hypothetically adjust for the enrollment-era effect — normalizing both placebo arms to the same baseline disease severity — the delta between the two drugs’ efficacy might narrow. It might still favor varegacestat. But the magnitude of apparent superiority would likely be smaller than the raw cross-trial comparison suggests.
This matters enormously for commercial positioning. A brand team building a launch strategy around “84% vs 71% risk reduction” is building on a comparison that a sophisticated competitor — or a rigorous advisory board member — will challenge. And the challenge is legitimate.
The Methodological Responses and Their Limits
The statistical community has developed several methods to address cross-trial comparison bias: matching-adjusted indirect comparison (MAIC), propensity score matching, simulated treatment comparisons (STC), and network meta-analysis (NMA). Each attempts to adjust for differences in baseline patient characteristics between trial populations (Phillippo et al., Statistics in Medicine, 2018).
These methods can adjust for measured confounders — age, sex, disease severity at baseline, prior treatment history. But enrollment-era confounding is, by its nature, partly unmeasurable. You can match on baseline tumor size or symptom severity, but you cannot fully match on the selection effect that determines which patients were still available for enrollment after the first-in-class drug entered the market. The patients who chose to enroll in a placebo-controlled trial when a commercial option existed are behaviorally and possibly biologically different from those who enrolled before that option existed.
MAIC and propensity matching improve the comparison. They do not eliminate the fundamental problem. And they are sufficiently complex that their assumptions can be challenged by anyone with motivation to do so — which, in a competitive commercial context, is everyone.
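For readers who have not seen one, the core of a MAIC is small enough to sketch. Below is a minimal one-covariate version with entirely hypothetical numbers (a simulated baseline tumor-burden variable and an assumed published comparator mean of 6.5): it reweights our trial's individual patient data so the weighted baseline mean matches the comparator trial's published aggregate, and the effective sample size shows what the reweighting costs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical individual patient data (IPD) from "our" trial:
# baseline tumor burden in cm (illustrative, not real trial data).
x = rng.normal(8.0, 2.0, size=300)
target_mean = 6.5   # assumed published baseline mean of the comparator

# MAIC-style weights w_i = exp(a * x_i), with a chosen so the weighted
# mean of x matches the comparator's published mean. Solving the
# centered convex problem min_a sum(exp(a * z_i)) by Newton's method:
z = x - target_mean
a = 0.0
for _ in range(50):
    w = np.exp(a * z)
    grad, hess = (w * z).sum(), (w * z * z).sum()
    a -= grad / hess

w = np.exp(a * z)
weighted_mean = (w * x).sum() / w.sum()   # matches target after convergence
ess = w.sum() ** 2 / (w * w).sum()        # effective sample size < 300
```

The effective sample size is the honest part of the exercise: heavy weights mean the matched comparison rests on far fewer effective patients than the nominal enrollment, which is one of the assumptions a motivated competitor will probe. And no choice of weights can recover the unmeasured enrollment-era selection discussed above.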
The FDA has acknowledged these limitations directly. Its guidance on adaptive and complex trial designs notes that indirect comparisons across trials conducted at different times, in different populations, or with different standards of care should be interpreted with caution (FDA, Guidance for Industry: Adaptive Designs for Clinical Trials of Drugs and Biologics). This is regulatory language for: do not assume these numbers are comparable just because they are measured in the same units.
The Commercial Implication: Transparency as Strategy
Here is where this analysis becomes strategic rather than purely methodological.
The instinct of most brand teams, when faced with a favorable cross-trial comparison, is to push the numbers as hard as regulatory and legal review will allow. The HR is 0.16 versus 0.29. The ORR is 56% versus 41%. The median change in tumor volume is -83% versus -27%. Why would you not lead with these numbers?
Because the audience is not naive. The 20-30 sarcoma KOLs who treat the majority of desmoid tumor patients in the United States are clinical trialists themselves. They understand hazard ratios, they understand cross-trial limitations, and they will identify the placebo arm divergence before your first advisory board concludes. If your positioning leads with a cross-trial comparison that informed prescribers know is confounded, you lose credibility before you gain it.
The alternative is transparency. Acknowledge the cross-trial limitation explicitly. Present the absolute efficacy data — ORR, tumor volume reduction, durability of response — without relying on the HR comparison as the primary differentiator. Commission the MAIC. Present it honestly, including its limitations. Let the data speak for itself without overselling the comparison.
This approach feels counterintuitive. It feels like voluntarily disarming. But in a rare disease with a small, expert prescriber base, credibility compounds. The brand team that acknowledges what every expert in the room already knows builds trust. The team that pretends the cross-trial comparison is definitive erodes it.
In the varegacestat case, the absolute efficacy data — particularly the -83% median change in tumor volume versus -27% for nirogacestat — may well be genuinely superior and less vulnerable to enrollment-era confounding than the HR comparison. Tumor shrinkage is an on-treatment measure of the active drug’s direct effect, less dependent on the placebo arm’s performance. Leading with absolute efficacy and patient-level response data, rather than the HR delta, is a stronger and more defensible commercial strategy.
Beyond Desmoid Tumors
The enrollment-era effect is not unique to desmoid tumors. It applies wherever second-in-class drugs run placebo-controlled trials after the first-in-class agent changes the treatment landscape.
In checkpoint inhibitors, the sequential approvals of pembrolizumab, nivolumab, and subsequent agents created progressively different patient populations in each trial era. In targeted therapies — CDK4/6 inhibitors in breast cancer, BTK inhibitors in CLL — the available patient pool shifted with each approval. In rare diseases, where the eligible population is small and the approval of a first therapy dramatically changes treatment-seeking behavior, the effect is amplified.
Every commercial team launching a second-in-class product should ask three questions:
First, how did the treatment landscape change between the first trial’s enrollment and ours? If a first-in-class drug was approved during or before your enrollment period, the denominator shifted.
Second, do the placebo arms perform differently? If your placebo arm has better outcomes than the first trial’s placebo arm, the enrollment-era effect is a plausible explanation — and your treatment effect comparison is inflated.
Third, what is our differentiation story if the HR comparison is neutralized? If a sophisticated MAIC or a head-to-head trial reduced the apparent treatment effect delta to non-significance, does the product still have a compelling value proposition? If the answer is yes — based on absolute efficacy, safety, dosing convenience, or patient-reported outcomes — the product is well-positioned regardless. If the answer is no, the entire commercial thesis is built on a confounded comparison, and that is a fragile foundation.
The Honest Answer Is the Durable Answer
Cross-trial comparisons are not going away. The FDA will continue to approve drugs based on single-arm or placebo-controlled trials without requiring head-to-head data. Brand teams will continue to place efficacy numbers side by side. Investors will continue to calculate implied market share based on relative hazard ratios.
But the teams that build their commercial strategy on the most defensible interpretation of the data — rather than the most favorable — will outperform in the long run. In a rare disease with 20 KOLs and 85 treatment centers, reputation is the most valuable commercial asset. And reputation is built on intellectual honesty, not promotional spin.
The placebo arm problem is not a reason to avoid cross-trial comparisons entirely. It is a reason to conduct them rigorously, present them transparently, and build a positioning strategy that does not collapse when the comparison is challenged.
Because it will be challenged. The only question is whether your team is prepared for that conversation or surprised by it.
Daniel Tran is a Hematology/Oncology Product Marketing Manager at Pfizer and a UCSD-trained PharmD specializing in launch strategy, competitive intelligence, and market access.