Download the Report Now

Enter your information for instant access to the TrialGPT Benchmarking Report

Original research · 2026

We benchmarked our AI against the published gold standard in clinical trial matching.

Health Universe tested our agentic oncology pipeline head-to-head against TrialGPT — the NIH-developed framework published in Nature Communications. Download the full evaluation report to see what we found.

  • 12-page technical report
  • 15 min read
  • Free download

What's inside

A 12-page evaluation, methodology and reasoning included.

  • Full methodology — how we designed the two-arm evaluation using the original TrialGPT datasets.

  • Head-to-head accuracy data — ROC AUC, precision, recall, and PR AUC across 176 labeled patient-trial pairs.

  • LLM-as-judge results — independent GPT-5 evaluation of clinical reasoning quality in divergent cases.

  • Four real case vignettes — where our system caught critical eligibility errors the original missed.

  • Discussion of tradeoffs — why higher recall isn't always better, and what precision means for your workflow.

  • Honest limitations — what this study does and doesn't prove, and what we're doing next.
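For readers less familiar with the metrics named above, here is a minimal sketch of how ROC AUC, PR AUC (computed here as average precision), precision, and recall can be derived from labeled patient-trial pairs. The labels and scores are invented toy values, not data from the report, and the function names are hypothetical. This is not the evaluation code used in the study.

```python
from typing import List, Tuple

# Toy data (hypothetical): 1 = patient is eligible for the trial, 0 = not eligible.
labels: List[int] = [1, 1, 0, 1, 0, 0, 1, 0]
# Toy matching scores from an imaginary system (higher = more likely eligible).
scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]


def precision_recall(y: List[int], s: List[float], threshold: float) -> Tuple[float, float]:
    """Precision and recall when every pair scoring >= threshold is called eligible."""
    tp = sum(1 for yi, si in zip(y, s) if si >= threshold and yi == 1)
    fp = sum(1 for yi, si in zip(y, s) if si >= threshold and yi == 0)
    fn = sum(1 for yi, si in zip(y, s) if si < threshold and yi == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y: List[int], s: List[float]) -> float:
    """ROC AUC as the probability that a random eligible pair outscores a
    random ineligible pair (ties count as half)."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: List[int], s: List[float]) -> float:
    """Average precision, a common summary of the precision-recall curve.
    Assumes no tied scores, which holds for the toy data above."""
    ranked = sorted(zip(s, y), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, yi) in enumerate(ranked, start=1):
        if yi == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each eligible pair
    return total / hits if hits else 0.0


if __name__ == "__main__":
    print("ROC AUC:", roc_auc(labels, scores))
    print("Average precision:", average_precision(labels, scores))
    print("Threshold 0.50:", precision_recall(labels, scores, 0.50))
    print("Threshold 0.15:", precision_recall(labels, scores, 0.15))
```

On this toy data, a threshold of 0.50 gives precision and recall of 0.75 each. Lowering it to 0.15 raises recall to 1.0 but drops precision to 4/7, which is the kind of tradeoff the discussion section refers to.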

Real errors our system caught that TrialGPT missed.

Case 01

Age mismatch

TrialGPT flagged a 75-year-old male as strongly eligible for a pediatric trial. Our system correctly identified the patient as ineligible.

Case 02

Criteria inversion

TrialGPT misread "women ≥ 40" as an exclusion criterion, wrongly ruling out an eligible patient. Our system got it right.

Case 03

Missing diagnosis

TrialGPT scored a patient as strongly eligible even though the patient lacked a required VTE (venous thromboembolism) diagnosis. Our system caught the gap.

Case 04

Complex comorbidities

For a patient with NHL, SLE, diabetes, and hepatitis C, our system treated missing critical data with appropriate caution.

This evaluation benchmarks against TrialGPT (Jin et al., Nature Communications 2024) — the most rigorous publicly available framework for AI-assisted patient-to-trial matching, developed at NIH and evaluated on 75,000+ trial annotations.

Matching patients to clinical trials with large language models

Jin et al. · Nature Communications · 2024 · DOI: 10.1038/s41467-024-53081-z