3 years ago

Breast Cancer Risk Prediction

One-sentence Summary

BioFusionNet is a deep learning framework for ER+ breast cancer survival risk stratification that integrates histopathological, genetic, and clinical data through self-supervised extractors (DINO and MoCoV3), a variational autoencoder, and a co-dual-cross-attention mechanism, employs a weighted Cox loss to address survival data imbalance, and achieves a mean concordance index of 0.77 and a time-dependent area under the curve of 0.84.

Key Contributions

  • The paper introduces BioFusionNet, a deep learning framework for survival risk stratification in ER+ breast cancer that integrates histopathological patches, genomic profiles, and clinical records. The architecture employs self-supervised DINO and MoCoV3 extractors to capture detailed image features, which are aggregated via a variational autoencoder to generate patient-level representations.
  • A co-dual-cross-attention mechanism combines histopathological and genetic features, while a feed-forward network incorporates clinical data to enable comprehensive multimodal fusion. The training process utilizes a custom weighted Cox loss function to effectively address the inherent imbalance in survival datasets.
  • Empirical evaluations demonstrate that the framework achieves a mean concordance index of 0.77 and a time-dependent area under the curve of 0.84. These metrics establish superior predictive performance compared to existing state-of-the-art survival analysis models.

Introduction

Accurate survival risk stratification is essential for optimizing treatment pathways in estrogen receptor positive breast cancer, where clinicians must decide whether patients require additional chemotherapy or can safely rely on endocrine therapy alone. Traditional prognostic models depend primarily on clinicopathological factors and frequently miss underlying tumor biology, while existing deep learning approaches struggle to reconcile the dimensional variability and heterogeneity inherent in combining histopathology, genomics, and clinical records. To overcome these barriers, the authors leverage self-supervised feature extraction and a co-dual-cross-attention mechanism to build BioFusionNet, a multimodal survival prediction framework that seamlessly integrates diverse data sources. They also implement a weighted Cox loss function to correct for imbalanced survival datasets, ultimately delivering a more accurate and interpretable risk scoring system that streamlines personalized treatment decisions.

Dataset

  • Dataset Composition and Sources: The authors use multi-modal data from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA). Whole-slide images are sourced from the GDC portal, while transcriptomic and clinical records are retrieved from cBioPortal. The dataset combines H&E-stained histopathology slides, mRNA expression profiles, and standardized patient records.

  • Subset Details and Filtering: The study utilizes a curated cohort of 249 cases, comprising 149 Luminal A and 100 Luminal B tumors. The authors preserved the original event distribution by selecting 83 cases with survival outcomes and approximately three times as many censored cases per subtype. Genomic data was filtered from 20,438 initial genes to 138 clinically relevant markers featured in commercial assays like Oncotype DX, Mammaprint, and PAM50. Clinical records were narrowed to four variables: tumor grade, tumor size, patient age, and lymph node status.

  • Processing and Cropping Strategy: An expert pathologist manually annotated tumor boundaries using QuPath, excluding necrosis while retaining stroma and tumor-infiltrating lymphocytes. The annotated regions were downsampled to 0.25 micrometers per pixel and cropped into 224x224 pixel patches, generating roughly 500 non-overlapping patches per slide. Vector-based color normalization was applied to standardize staining inconsistencies. The RNA sequencing data was used post-RSEM processing without additional normalization. Clinical variables were binarized to align with Cox proportional hazards analysis requirements.

  • Model Usage and Workflow: The authors prepared this integrated dataset for multi-modal prognostic modeling. The dataset description does not specify explicit training or validation splits; the curated patches, gene signatures, and binarized clinical features were structured to train survival prediction models and evaluate subtype-specific outcomes. The case selection, which preserves the original event distribution, and the targeted gene filtering ensure the data directly supports the study's focus on Luminal A and B breast cancer prognosis.
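
The cropping step described above can be illustrated with a minimal NumPy sketch. It assumes the annotated region is already available as an H x W x 3 array; the function name `tile_patches` and the random test image are hypothetical, and the authors' actual pipeline (QuPath export, resolution handling, color normalization) is not reproduced here.

```python
import numpy as np

def tile_patches(region: np.ndarray, patch: int = 224) -> np.ndarray:
    """Split an H x W x 3 image into non-overlapping patch x patch tiles.

    Edge pixels that do not fill a complete tile are discarded.
    """
    h, w, c = region.shape
    rows, cols = h // patch, w // patch
    cropped = region[: rows * patch, : cols * patch]
    # Reshape to (rows, patch, cols, patch, c), then bring the tile indices first.
    tiles = cropped.reshape(rows, patch, cols, patch, c).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, c)

# Hypothetical annotated region, filled with random pixels for illustration.
rng = np.random.default_rng(0)
region = rng.integers(0, 256, size=(500, 700, 3), dtype=np.uint8)
patches = tile_patches(region)
print(patches.shape)  # (6, 224, 224, 3)
```

Tiles are returned in row-major order, so tile index 4 in this example is the second tile of the second row.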

Method

The authors leverage a two-stage deep learning framework, BioFusionNet, designed to integrate histopathological, genomic, and clinical data for survival risk stratification in ER+ breast cancer. The overall architecture is structured to first extract and fuse image features, then integrate these with genetic and clinical data to generate a comprehensive patient-level risk score. The framework is divided into distinct stages, each addressing specific aspects of multimodal data processing.

The initial stage focuses on extracting rich morphological features from histopathological images. The model employs two self-supervised Vision Transformer (ViT)-based architectures, DINO and MoCoV3, pretrained on large-scale histology datasets. DINO33M, trained on a diverse collection of 33 million patches from multiple sources, provides a broad understanding of general histopathological patterns, while DINO2M, trained specifically on 2 million TCGA-BRCA patches, captures breast cancer-specific features. MoCoV3, pretrained on 15 million patches from TCGA and PAIP datasets, enhances the model's ability to learn robust representations through momentum contrast. Each of these models processes 224×224×3 image patches to produce a 1×384 feature vector. These individual features are concatenated into a single $1 \times 1152$ vector, $f_{\text{cat}}(x)$, which serves as the input for the next phase of feature integration.
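
As a shape check, the concatenation can be sketched as follows. The three feature arrays are random stand-ins for the outputs of the pretrained extractors, and `concat_features` is a hypothetical helper name, not part of any released code.

```python
import numpy as np

EMBED_DIM = 384  # per-extractor feature size stated in the text

def concat_features(f_dino33m, f_dino2m, f_mocov3):
    """Concatenate three (n_patches, 384) arrays into (n_patches, 1152)."""
    return np.concatenate([f_dino33m, f_dino2m, f_mocov3], axis=-1)

rng = np.random.default_rng(0)
n_patches = 500
# Random stand-ins for the outputs of the three pretrained extractors.
feats = [rng.standard_normal((n_patches, EMBED_DIM)) for _ in range(3)]
f_cat = concat_features(*feats)
print(f_cat.shape)  # (500, 1152)
```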

To consolidate and refine these image features, a Variational Autoencoder (VAE) is employed. The VAE's encoder maps the concatenated feature vector $f_{\text{cat}}(x)$ to a latent space, generating the mean ($\mu$) and standard deviation ($\sigma$) of the latent representation. The latent variable $z$ is sampled using the reparameterization trick, $z = \mu + \sigma \cdot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$. For each patient, who contributes 500 patches, this results in a latent matrix of size $500 \times 256$. The VAE is trained with a total loss combining mean squared error (MSE) for reconstruction accuracy and Kullback-Leibler (KL) divergence for distribution regularisation, which encourages the latent distribution to approximate a standard normal distribution. This process effectively blends features from the different self-supervised models, enhancing the overall representation.
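
The sampling step and the two loss terms can be sketched in NumPy as below. This is a minimal illustration under stated assumptions: the encoder is taken to output a log-variance rather than a standard deviation (a common convention, not confirmed by the text), and the `beta` weighting between the two terms is a hypothetical parameter.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """MSE reconstruction term plus beta-weighted KL(q(z|x) || N(0, I))."""
    mse = np.mean((x - x_recon) ** 2)
    # Closed-form KL for a diagonal Gaussian against a standard normal,
    # summed over latent dimensions and averaged over patches.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1).mean()
    return mse + beta * kl

rng = np.random.default_rng(0)
mu = np.zeros((500, 256))       # stand-in encoder mean
log_var = np.zeros((500, 256))  # stand-in encoder log-variance (sigma = 1)
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (500, 256)
```

When the encoder output already matches a standard normal (zero mean, unit variance), the KL term vanishes and only the reconstruction error remains.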

Following VAE encoding, a self-attention module aggregates the $500 \times 256$ latent patch-level features into a single patient-level representation. This module computes a weighted sum of the value ($V$) vectors, where the attention weights are determined by the relevance between each query ($Q$) and key ($K$) pair. This process allows the model to focus on the most pertinent image features while contextualizing each patch within the broader histopathology of the patient, resulting in a comprehensive patient-level embedding.
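
One way to picture this aggregation is scaled dot-product self-attention followed by pooling over patches. The sketch below assumes mean pooling and random projection matrices; the text does not state how the attended patches are reduced to one vector, so the pooling choice is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(z, w_q, w_k, w_v):
    """Scaled dot-product self-attention over patches, then mean pooling."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (n_patches, n_patches)
    attended = scores @ v                              # weighted sum of values
    return attended.mean(axis=0)                       # (d,) patient-level vector

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 256))  # stand-in for the VAE latent matrix
# Hypothetical projection weights, scaled to keep activations moderate.
w_q, w_k, w_v = (rng.standard_normal((256, 256)) / 16.0 for _ in range(3))
patient_embedding = self_attention_pool(z, w_q, w_k, w_v)
print(patient_embedding.shape)  # (256,)
```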

The second stage of the model integrates the patient-level image features with genetic data. A co-dual-cross-attention mechanism is used to achieve this fusion. This mechanism first applies co-attention, where image embeddings ($I$) and genetic features ($G$) are linearly transformed to generate query ($Q$), key ($K$), and value ($V$) vectors. This generates bidirectional attention scores, $A_{IG}$ (image to genetic) and $A_{GI}$ (genetic to image), which are computed using the softmax function. The dual-cross-attention module further refines this integration in two stages. In the first stage, the co-attended features are used to compute cross-attention outputs, $C_{IG}$ and $C_{GI}$, which enhance each modality by integrating contextually relevant information from the other. In the second stage, these outputs are re-applied to their original features to produce refined representations, $D_{IG}$ and $D_{GI}$. The concatenated output $T_{\text{cat}} = D_{IG} \oplus D_{GI}$ is then fed into a Transformer Encoder.
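
The data flow of this fusion step can be approximated with the following NumPy sketch. It is an interpretation rather than the authors' implementation: both modalities are treated as token sequences sharing one feature dimension, the second stage is modelled as attention with the cross-attended output as query and the original features as keys and values, and the concatenation is taken along the token axis. The weight dictionary `w` and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, key, value):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = softmax(query @ key.T / np.sqrt(key.shape[-1]))
    return scores @ value

def co_dual_cross_attention(img, gen, w):
    """img: (n_i, d) image tokens, gen: (n_g, d) genetic tokens, w: projections."""
    # Stage 1: each modality queries the other (A_IG and A_GI are the softmax maps).
    c_ig = attend(img @ w["q_i"], gen @ w["k_g"], gen @ w["v_g"])  # (n_i, d)
    c_gi = attend(gen @ w["q_g"], img @ w["k_i"], img @ w["v_i"])  # (n_g, d)
    # Stage 2: cross-attended outputs are re-applied to their original features.
    d_ig = attend(c_ig, img, img)  # (n_i, d)
    d_gi = attend(c_gi, gen, gen)  # (n_g, d)
    # T_cat: concatenation along the token axis (an assumption of this sketch).
    return np.concatenate([d_ig, d_gi], axis=0)

rng = np.random.default_rng(0)
d, n_i, n_g = 64, 4, 6  # illustrative sizes only
names = ("q_i", "k_i", "v_i", "q_g", "k_g", "v_g")
w = {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in names}
t_cat = co_dual_cross_attention(
    rng.standard_normal((n_i, d)), rng.standard_normal((n_g, d)), w
)
print(t_cat.shape)  # (10, 64)
```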

The Transformer Encoder, consisting of multiple identical layers, further assimilates the fused image and genetic features. Each layer contains a multihead self-attention mechanism and a position-wise fully connected feed-forward network (FFN). The multihead self-attention allows the model to attend to information from different representation subspaces simultaneously. The FFN applies two linear transformations with a ReLU activation in between. Residual connections and layer normalization are applied around each sublayer to facilitate training. This deep integration produces a holistic representation of the phenotypic and genotypic information.
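
A single encoder layer of the kind described can be sketched in NumPy as follows. The layer normalization here omits the learnable gain and bias, and the number of heads, hidden width, and random weights are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature axis (no learnable gain or bias)."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def multihead_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    n, d = x.shape
    d_head = d // n_heads
    # Project, then split the feature axis into heads: (n_heads, n, d_head).
    split = lambda t: t.reshape(n, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = (scores @ v).transpose(1, 0, 2).reshape(n, d)  # merge heads
    return heads @ w_o

def encoder_layer(x, p, n_heads=4):
    # Sublayer 1: multihead self-attention, residual connection, layer norm.
    attn = multihead_self_attention(x, p["q"], p["k"], p["v"], p["o"], n_heads)
    x = layer_norm(x + attn)
    # Sublayer 2: two linear maps with a ReLU in between, residual, layer norm.
    ffn = np.maximum(x @ p["w1"] + p["b1"], 0.0) @ p["w2"] + p["b2"]
    return layer_norm(x + ffn)

rng = np.random.default_rng(0)
n_tokens, d, d_ff = 10, 64, 128  # illustrative sizes only
p = {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("q", "k", "v", "o")}
p.update(
    w1=rng.standard_normal((d, d_ff)) / np.sqrt(d), b1=np.zeros(d_ff),
    w2=rng.standard_normal((d_ff, d)) / np.sqrt(d_ff), b2=np.zeros(d),
)
out = encoder_layer(rng.standard_normal((n_tokens, d)), p)
print(out.shape)  # (10, 64)
```

Stacking several such layers, each with its own weights, gives the multi-layer encoder described above.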

Clinical data, which consists of only four features, is integrated at a later stage of the network to ensure its impact is not overshadowed by the higher-dimensional imaging and genetic features. This "late fusion" strategy involves concatenating the clinical data with the output of the Transformer Encoder. This concatenated feature vector is then processed by a series of fully-connected layers, with the third layer specifically incorporating the clinical information. The final layer is a linear output layer that predicts a continuous survival risk score.
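
The late-fusion head can be sketched as a small multilayer perceptron in which the four clinical features are concatenated just before the third fully-connected layer. Layer widths, the ReLU activation, and the weight initialization are assumptions made for illustration; only the position of the clinical input and the linear output follow the description.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_layer(rng, n_in, n_out):
    """Hypothetical small random weights and zero bias for one dense layer."""
    return rng.standard_normal((n_in, n_out)) * 0.05, np.zeros(n_out)

def risk_head(fused, clinical, params):
    """fused: (batch, d) encoder output, clinical: (batch, 4) binarized variables."""
    (w1, b1), (w2, b2), (w3, b3), (w_out, b_out) = params
    h = relu(fused @ w1 + b1)
    h = relu(h @ w2 + b2)
    # Late fusion: the four clinical features join at the third layer.
    h = np.concatenate([h, clinical], axis=-1)
    h = relu(h @ w3 + b3)
    return (h @ w_out + b_out).squeeze(-1)  # one continuous risk score per patient

rng = np.random.default_rng(0)
d, hidden = 256, 64  # illustrative sizes only
params = [
    init_layer(rng, d, hidden),
    init_layer(rng, hidden, hidden),
    init_layer(rng, hidden + 4, hidden),
    init_layer(rng, hidden, 1),
]
fused = rng.standard_normal((8, d))
clinical = rng.integers(0, 2, size=(8, 4)).astype(float)
risk = risk_head(fused, clinical, params)
print(risk.shape)  # (8,)
```

Injecting the clinical vector after the hidden width has shrunk means four features are no longer competing against hundreds of image and gene dimensions.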

To address the challenge of imbalanced survival data, the authors propose a weighted Cox loss function. This loss function modifies the traditional Cox proportional hazards loss by incorporating sample weights to mitigate bias towards censored data. The loss is computed by sorting samples by risk, calculating a weighted cumulative hazard, and then computing the log-likelihood, with the final loss normalized by the total weighted events. This ensures the model is sensitive to the minority event class, improving prediction performance on imbalanced datasets. The model is trained in two stages: the first stage optimizes the VAE with AdamW, and the second stage optimizes the entire risk prediction pipeline using the weighted Cox loss with Adam. An early stopping mechanism is employed to prevent overfitting.
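
A weighted Cox partial-likelihood loss along these lines can be sketched in NumPy. This is not the authors' exact formulation: the sketch orders samples by follow-up time, as in the standard Cox partial likelihood, ignores tied times, and uses a hypothetical weight of 2.0 for event cases. Weights must be strictly positive.

```python
import numpy as np

def weighted_cox_loss(risk, time, event, weight):
    """Negative weighted Cox partial log-likelihood, normalized by weighted events.

    risk: predicted scores, time: follow-up times, event: 1 if observed, 0 if
    censored, weight: positive per-sample weights. Tied times are not handled.
    """
    order = np.argsort(-time)  # descending time: the risk set of i is everyone up to i
    risk, event, weight = risk[order], event[order], weight[order]
    # Weighted cumulative hazard over the risk set, as a running log-sum-exp.
    log_cum_hazard = np.logaddexp.accumulate(risk + np.log(weight))
    log_lik = weight * event * (risk - log_cum_hazard)
    return -log_lik.sum() / (weight * event).sum()

time = np.array([5.0, 3.0, 8.0, 2.0, 9.0, 6.0])
event = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 0.0])  # two events, four censored
risk = np.array([0.1, 1.2, -0.3, 2.0, -1.0, 0.0])
weight = np.where(event == 1.0, 2.0, 1.0)  # hypothetical up-weighting of events
loss = weighted_cox_loss(risk, time, event, weight)
print(loss)
```

With all weights set to 1 the function reduces to the usual Cox loss averaged over the number of events, which makes it easy to compare the two variants on the same predictions.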

Experiment

The evaluation framework systematically assessed BioFusionNet through cross-validated survival analysis, comparative benchmarks, ablation studies, and interpretability assessments to validate its multimodal integration capabilities. Each experiment confirms that combining imaging, genetic, and clinical data substantially improves risk prediction over unimodal or traditional approaches, while the proposed weighted Cox loss and attention mechanisms consistently enhance model accuracy. Survival analysis and hazard modeling further demonstrate that the model’s risk stratifications strongly align with actual patient outcomes, outperforming conventional clinical indicators. Finally, interpretability analyses verify that the architecture effectively isolates clinically relevant tissue features and identifies key predictive biomarkers, establishing the system as a highly accurate and transparent tool for oncology despite its computational intensity.

The authors conducted a hazard analysis to evaluate the association between clinical and predictive factors and overall survival in ER+ breast cancer patients. BioFusionNet-predicted risk groups and lymph node status were significantly associated with survival outcomes, while tumour grade and size were not. The analysis also indicates that age and clinical parameters such as lymph node status had a positive impact on the model's risk predictions.

The authors compare two loss functions, the traditional Cox loss and the proposed weighted Cox loss, on BioFusionNet and MoCoV3 using the C-index across five cross-validation folds. The weighted Cox loss consistently improves the C-index for both methods. The improvement is more pronounced for BioFusionNet, which also achieves higher mean C-index values than MoCoV3 when trained with the proposed loss.
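
For readers unfamiliar with the metric, Harrell's concordance index can be computed with the short sketch below. It is a plain pairwise implementation for illustration and is not claimed to be the evaluation code used in the study; the toy arrays are made up.

```python
import numpy as np

def concordance_index(risk, time, event):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            if time[i] < time[j]:  # j outlived i, so i should have the higher risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 1])
perfect = concordance_index(np.array([4.0, 3.0, 2.0, 1.0]), time, event)
reversed_ = concordance_index(np.array([1.0, 2.0, 3.0, 4.0]), time, event)
print(perfect, reversed_)  # 1.0 0.0
```

A value of 0.5 corresponds to random ranking, so the reported mean C-index of 0.77 sits roughly midway between chance and perfect ordering.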

The authors compare BioFusionNet against several state-of-the-art multimodal fusion methods using the C-index and AUC. BioFusionNet achieves the highest mean C-index and AUC values and consistently outperforms the baseline models across all evaluation folds.

The authors compare the computational properties of BioFusionNet with several state-of-the-art multimodal fusion methods, focusing on the number of parameters, memory usage, and floating-point operations (FLOPs). BioFusionNet requires the most FLOPs among the compared models, indicating greater computational demand, yet it is more memory efficient than some alternatives, including PathomicFusion. Its parameter count is moderate relative to the other methods, suggesting a balance between complexity and efficiency.

The authors compare BioFusionNet with other models across different modality configurations. Combining imaging, genetic, and clinical data consistently yields higher predictive performance than any single or dual modality, and the model achieves its highest mean performance when all three data types are integrated. In this setting it outperforms traditional methods such as CoxPH and MLP as well as other multimodal fusion techniques. The results also indicate that the attention mechanisms and the weighted loss function further improve prediction accuracy.

The experiments evaluate BioFusionNet, a multimodal deep learning framework designed to predict overall survival in ER+ breast cancer patients by integrating imaging, genetic, and clinical data. Hazard analysis and modality integration tests indicate that the model's risk stratification is significantly associated with survival, while the combined use of all three data types yields the most robust predictions. Comparative benchmarks show that BioFusionNet consistently outperforms traditional statistical methods and state-of-the-art fusion techniques, with the proposed weighted Cox loss and attention mechanisms further enhancing its accuracy. Finally, computational assessments show that the architecture is more memory efficient than some alternatives, such as PathomicFusion, despite requiring the most FLOPs.
