3 years ago

A Tutorial for Evaluating Cure Model Appropriateness

Geethanjalee Mudunkotuwa Durbadal Ghosh Subodh Selukar

Abstract

In survival analysis, traditional models assume all individuals will eventually experience the event of interest. However, advances in therapeutics have led to multiple clinical contexts with potentially curative therapies, and in these contexts, certain individuals may never experience the event. Statisticians have developed cure models as a methodology to address this challenge. Nonetheless, despite significant statistical advances in cure models, we have seen more limited uptake in biomedical applications, and we hypothesize that this is caused by limited guidance in the appropriate application of cure models. Cure models require specific identifiability conditions for valid parameter estimation, and previous reports have demonstrated significant issues with the inappropriate application of cure models. Existing tutorials for cure models focus on model implementation and either assume or provide only limited guidance on whether cure modeling is appropriate for the given dataset. This tutorial addresses this gap by describing a systematic procedure that integrates clinical judgment, visual inspection of Kaplan-Meier curves, and quantitative evaluation.

One-sentence Summary

This tutorial addresses the limited biomedical adoption of cure models in survival analysis by presenting a systematic evaluation procedure that integrates clinical judgment, visual inspection of Kaplan-Meier curves, and quantitative evaluation to verify that identifiability conditions hold before parameters are estimated, in contrast to prior implementation-focused guides that leave room for model misapplication.

Key Contributions

  • This tutorial presents a systematic evaluation framework to determine the statistical and clinical appropriateness of cure models for survival analyses involving potentially curative therapies and long-term survivorship.
  • The proposed procedure integrates expert clinical judgment, visual inspection of non-zero Kaplan-Meier curve plateaus, and formal quantitative testing to verify identifiability conditions prior to parameter estimation.
  • Empirical analyses demonstrate that cure model suitability depends on cohort risk profiles rather than follow-up duration alone, while also highlighting data quality degradation when clinical monitoring transitions from active trials to passive follow-up.

Introduction

Modern therapeutic advances have enabled long-term remission in several oncology populations, prompting statisticians to develop cure models that estimate a non-zero fraction of patients who will never experience the event of interest. These models improve long-term survival extrapolations and clinical trial planning, making them essential for accurate biomedical decision-making. However, cure models require strict identifiability conditions, particularly adequate follow-up duration to distinguish cured individuals from those still at risk. Prior methodological tutorials have focused heavily on model implementation while neglecting how to verify whether a dataset actually meets these prerequisites, which frequently leads to misapplication and unreliable estimates. The authors address this gap by introducing a systematic evaluation framework that integrates clinical judgment, visual inspection of Kaplan-Meier curves, and quantitative statistical checks. By validating this workflow with acute myeloid leukemia and hematopoietic cell transplantation data, they provide researchers with a practical guide to ensure cure models are only applied when biologically and statistically justified.

Dataset

Dataset Composition and Sources: The authors combine clinical trial data from the SWOG S1203 study with prospective and retrospective datasets from the Bone Marrow Transplantation & Cellular Therapy program at St. Jude Children's Research Hospital to evaluate cure model applicability in hematologic oncology.

Subset Details:

  • S1203 AML Trial: Focuses on the IA treatment arm for adult patients with previously untreated acute myeloid leukemia. Event-free survival is tracked from randomization, with a maximum follow-up exceeding 7.3 years.
  • St. Jude Prospective Cohorts: HAPNK1 includes 53 patients in complete remission and 19 with active disease from a phase 2 high-risk malignancy study. HAP2HCT includes 48 patients from phase 2 dose levels 3 to 4 of a phase 1/2 study.
  • St. Jude Retrospective Cohorts: HCTRETRO covers 106 patients receiving a second transplant and 13 receiving more than two. The Refractory at HCT cohort includes 129 patients with active disease at the time of transplant.

Data Usage and Modeling Approach: The authors use the cohorts to fit and evaluate cure survival models; the data are not divided into machine-learning-style training splits or mixed in fixed ratios. They apply a standardized assessment framework to test model appropriateness across diverse patient prognoses and follow-up protocols, calculating maximum follow-up duration in years and comparing models using Akaike Information Criterion (AIC) values and RECeUS assessments.

Processing and Metadata Construction: Time-to-event metrics are standardized from the randomization date, and patients without observed events are right censored at their last contact. The authors compile cohort-level metadata that records sample sizes, median survival times, maximum follow-up in years, Kaplan-Meier estimates at the longest observation point, visual cure indicators, and the best-fitting model specification for each subgroup.
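The right-censoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `to_time_and_event`, the 365.25-day year, and the example dates are all assumptions introduced here.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed convention for converting days to years


def to_time_and_event(start, event_date, last_contact):
    """Return (years_from_start, event_indicator) for one patient.

    event_indicator is 1 if the event was observed, and 0 if the patient
    is right censored at the date of last contact.
    """
    if event_date is not None:
        end, observed = event_date, 1
    else:
        end, observed = last_contact, 0
    if end < start:
        raise ValueError("end date precedes the time origin")
    return (end - start).days / DAYS_PER_YEAR, observed


# Hypothetical patients for illustration; these are not study data.
relapsed = to_time_and_event(date(2015, 1, 1), date(2016, 1, 1), date(2016, 1, 1))
censored = to_time_and_event(date(2015, 1, 1), None, date(2020, 1, 1))
```

Each patient is reduced to a (time, indicator) pair, which is the input format that Kaplan-Meier estimation and parametric survival fitting both expect.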

Method

The authors use a structured three-stage framework to assess the appropriateness of cure models for survival data. The process begins with clinical and biological evaluation, where expert judgment determines whether a cure is biologically plausible and whether long-term survival without recurrence is expected. If both conditions are satisfied, the analysis proceeds to visual assessment of the Kaplan-Meier survival curve. This stage examines whether the curve exhibits a horizontal plateau at a level above zero, indicating a non-zero cure fraction, and whether late events are absent or rare, suggesting that susceptible individuals have largely experienced the event.
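To make the visual stage concrete, the following sketch computes a Kaplan-Meier curve in plain Python and summarizes its tail: the survival level after the last observed event, how long the curve stays flat afterwards, and how many censored patients support that flat stretch. The helper names (`kaplan_meier`, `plateau_summary`) and the toy data are assumptions for illustration; the summary is only a numeric aid and does not replace inspecting the plot.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def plateau_summary(times, events):
    """Summarize the tail of the Kaplan-Meier curve."""
    curve = kaplan_meier(times, events)
    last_event_time, tail_level = curve[-1] if curve else (0.0, 1.0)
    max_follow_up = max(times)
    censored_after = sum(
        1 for t, e in zip(times, events) if e == 0 and t > last_event_time
    )
    return {
        "tail_level": tail_level,
        "plateau_length": max_follow_up - last_event_time,
        "censored_after_last_event": censored_after,
    }


# Toy data (years, event indicator); not taken from any study cohort.
toy_times = [0.5, 1.0, 1.5, 2.0, 6.0, 7.0, 8.0, 9.0]
toy_events = [1, 1, 1, 1, 0, 0, 0, 0]
summary = plateau_summary(toy_times, toy_events)
```

In the toy data the curve levels off at roughly 0.5 after year 2 and stays flat for 7 years with 4 censored patients in the tail, which is the kind of pattern the visual stage looks for.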

As shown in the figure below, if the visual evidence supports a cure model, the process advances to a quantitative assessment. This final stage evaluates whether the data provide strong evidence for both a sufficient follow-up duration and a non-negligible cure fraction. The assessment involves fitting both a cure model and a standard non-cure model using the same parametric distribution, such as Weibull, and comparing them via the Akaike Information Criterion (AIC). A cure model is selected if it yields a lower AIC. Subsequently, the estimated cure fraction $\theta$ and the ratio $\hat{r}$ are computed, where $\hat{r}$ is the ratio of the estimated survival of uncured (susceptible) individuals to the estimated overall survival at the maximum follow-up time and so reflects how many uncured individuals remain at risk. The model is deemed appropriate only if $\theta > 0.025$ and $\hat{r} < 0.05$, ensuring both a clinically meaningful cure fraction and sufficient follow-up to observe the tail of the survival distribution. This framework integrates clinical insight, graphical evidence, and statistical inference to justify the use of cure models.
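The threshold check can be sketched for a Weibull mixture cure model, assuming the population survival takes the usual mixture form S(t) = θ + (1 − θ)·S_u(t) with S_u(t) = exp(−(t/scale)^shape). The function names, the parameterization, and the example parameter values below are assumptions for illustration; in practice the parameters would come from a fitted model.

```python
import math


def weibull_survival(t, shape, scale):
    """Survival function of the uncured (susceptible) group."""
    return math.exp(-((t / scale) ** shape))


def receus_style_check(cure_fraction, shape, scale, t_max,
                       cure_threshold=0.025, ratio_threshold=0.05):
    """Apply the two thresholds described in the text at time t_max."""
    s_uncured = weibull_survival(t_max, shape, scale)
    s_population = cure_fraction + (1.0 - cure_fraction) * s_uncured
    if s_population == 0.0:
        # Only possible with no cure fraction and numerical underflow;
        # with no cured patients the ratio is 1 by definition.
        ratio = 1.0
    else:
        ratio = s_uncured / s_population
    return {
        "ratio": ratio,
        "appropriate": cure_fraction > cure_threshold and ratio < ratio_threshold,
    }


# Hypothetical fitted parameters; not estimates from the study.
long_follow_up = receus_style_check(0.4, shape=1.2, scale=1.5, t_max=7.3)
short_follow_up = receus_style_check(0.4, shape=1.2, scale=1.5, t_max=1.0)
```

With identical model parameters, the check passes at 7.3 years of follow-up and fails at 1 year, because at 1 year a large share of the remaining survivors are still susceptible.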

Experiment

The evaluation setup integrates visual inspection of Kaplan-Meier survival curves with quantitative hypothesis testing to assess follow-up adequacy and cure model appropriateness. Visual analysis demonstrates that extended observation periods produce distinct survival plateaus indicative of long-term remission, while restricted timelines obscure this pattern and compromise model validity. Quantitative frameworks, including two-step testing and the RECeUS method, consistently corroborate these visual findings by confirming that prolonged monitoring reliably supports cure modeling. Ultimately, the experiments conclude that retrospective datasets with sufficient follow-up duration can confidently justify cure model application, even when formal protocol-specified monitoring is absent.

The authors evaluate the appropriateness of cure models using a combination of visual and quantitative methods, including model selection via AIC and the RECeUS framework. Results show that a cure model is supported for certain cohorts based on sufficient follow-up and model fit, while other cases are deemed inappropriate due to insufficient data or model selection outcomes. Under RECeUS, a cure model is supported when the estimated cure fraction is above a threshold and the ratio of susceptible survival to population survival is low. For the IA arm, model selection via AIC identifies the Weibull cure model as the best fit, and visual inspection of the Kaplan-Meier curve shows a long plateau, indicating sufficient follow-up and supporting the plausibility of a cure model.

The authors compare parametric cure and non-cure models using AIC to assess the appropriateness of a cure model for the IA arm. The Weibull cure model has the lowest AIC value among all models considered, cure and non-cure alike, indicating it is the best fit for the data. Based on this selection, follow-up adequacy and cure fraction are then evaluated using the RECeUS method, which confirms the appropriateness of a cure model.
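The AIC comparison itself is simple arithmetic: AIC = 2k − 2·log L, where k is the number of estimated parameters and log L is the maximized log-likelihood, and the model with the lowest value is preferred. The sketch below ranks a set of candidate fits. The log-likelihood values are invented placeholders chosen purely to show the mechanics, not results from the paper, and the parameter counts assume two parameters per baseline distribution plus one for the cure fraction.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """fits maps model name -> (log_likelihood, n_params).

    Returns a list of (name, aic) pairs sorted from best to worst.
    """
    scored = ((name, aic(ll, k)) for name, (ll, k) in fits.items())
    return sorted(scored, key=lambda pair: pair[1])


# Placeholder values for illustration only; not taken from the study.
candidate_fits = {
    "weibull": (-420.0, 2),
    "weibull_cure": (-410.0, 3),
    "lognormal": (-418.0, 2),
    "lognormal_cure": (-412.5, 3),
}
ranking = rank_by_aic(candidate_fits)
best_name, best_aic = ranking[0]
```

Because the cure model carries one extra parameter, it must improve the log-likelihood by more than 1 unit over its non-cure counterpart to achieve a lower AIC.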

The authors evaluate the appropriateness of cure models for different datasets using a combination of visual and quantitative methods. Results show that datasets with longer follow-up times and evidence of a survival plateau are more likely to support the use of cure models, as indicated by both visual inspection and the RECeUS method. The RECeUS method, which combines cure fraction and follow-up sufficiency, provides a consistent assessment of model appropriateness across datasets. Visual evidence of a survival plateau is a key indicator for the suitability of cure models, particularly when combined with quantitative results.

The evaluation combines visual inspection of survival curves with quantitative model selection and follow-up adequacy assessments to determine the suitability of cure models across different datasets. Experiments validate that cure models are appropriate primarily when datasets demonstrate sufficient follow-up periods, clear survival plateaus, a high estimated cure fraction, and a low ratio of susceptible to population survival. Quantitative comparisons consistently identify the Weibull cure model as the optimal fit for the IA arm, while the RECeUS framework reliably confirms model appropriateness by integrating cure fraction thresholds with follow-up sufficiency. Overall, the findings establish that adequate longitudinal data and distinct survival plateaus are critical prerequisites for successfully applying cure models in survival analysis.

