منذ 3 أعوام

سارتوريوس: تحليل استكشافي عميق + تفسير + نموذج

20 ساعة فقط من موارد حوسبة RTX 5090 $1 (قيمة $7)

جدول المحتويات

الملخص

Please provide the title and abstract you would like me to translate.

One-sentence Summary

The authors propose generalized regularized evidential deep learning models that theoretically characterize and overcome the activation-dependent learning-freeze inherent to Subjective Logic frameworks by introducing a general family of activation functions and corresponding evidential regularizers to ensure consistent evidence updates, with extensive experiments on four benchmark classification problems (MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet), two few-shot classification problems, and a blind face restoration problem empirically validating the approach.

Key Contributions

The work theoretically characterizes an activation-induced learning-freeze phenomenon in evidential deep learning, demonstrating how non-negative evidence constraints and specific activation functions restrict gradient flow in low-evidence regions.
A generalized family of activation functions and corresponding evidential regularizers is introduced to mitigate this stagnation and enable consistent evidence updates across different activation regimes.
Empirical evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, two few-shot classification problems, and a blind face restoration problem validate the theoretical analysis and demonstrate the effectiveness of the proposed generalized regularized evidential models.

Introduction

Deep learning models achieve high performance across diverse applications but frequently generate overconfident predictions, posing risks in safety-critical domains like medical diagnosis where accurate uncertainty quantification is essential. Evidential deep learning provides a computationally efficient mechanism to quantify fine-grained uncertainty without the overhead of sampling-based approaches, yet existing models often suffer from degraded performance on complex datasets due to unstable training dynamics. The authors reveal that the non-negative evidence parameterization required by Subjective Logic, when paired with common activation functions, creates zero-evidence regions where gradients vanish and prevent effective learning. To overcome this stagnation, the authors develop a generalized regularized evidential framework that employs positive evidence regularization and a broad class of activation functions, enabling consistent evidence accumulation and improved robustness across classification and restoration tasks.

Method

The authors leverage a framework that extends standard classification models by incorporating evidential deep learning to systematically quantify uncertainty. The core architecture of the evidential model begins with a neural network $\mathcal{F}_{\Theta}$ that processes input $\mathbf{x}$ to produce logits $\mathbf{o}$ . These logits are then transformed into non-negative evidence values $\mathbf{e} = \mathcal{A}(\mathbf{o})$ using a non-negative activation function $\mathcal{A}$ . The evidence values are used to define Dirichlet parameters $\alpha = \mathbf{e} + \mathbf{1}$ , which parameterize a Dirichlet prior over the predictive multinomial distribution $\text{Mult}(\mathbf{y}|\mathbf{p})$ . The model's prediction $\mathbf{y}$ is derived by marginalizing over the latent parameter $\mathbf{p}$ , enabling the model to output both a prediction and a measure of confidence, characterized by vacuity $\nu = K/S$ , where $S = \sum_{k=1}^{K} (\mathbf{e}_k + 1)$ is the Dirichlet strength.

The model is trained using a combination of evidential losses and regularization terms. The primary evidential loss, such as the Bayes risk with sum of squares or cross-entropy, aims to maximize evidence for the correct class and minimize evidence for incorrect classes. Additionally, an incorrect evidence regularization term is employed to further penalize high evidence for incorrect classes. The authors analyze the gradient dynamics of this training process and identify a critical issue: when samples are mapped to the zero evidence region, where $\mathbf{e} = \mathbf{0}$ , the gradients of the standard evidential losses vanish, leading to a learning-freeze behavior where the model fails to update its parameters from such samples. This occurs because the gradient of the evidence $\frac{\partial e_k}{\partial o_k}$ approaches zero as $e_k \to 0$ for all common activation functions, including ReLU, SoftPlus, and exponential, resulting in a zero gradient signal for the model parameters.

To address this issue, the authors propose a novel Correct Evidence Regularization (CER) term, $\mathcal{L}^{\text{cor}}(\mathbf{x}, \mathbf{y}) = -\lambda_{\text{cor}} o_{\text{gt}}$ , which is activated only when the ground-truth logit $o_{\text{gt}}$ is negative. This term is designed to provide a non-vanishing gradient signal in the zero evidence region, specifically pushing samples away from this region. The regularization strength is modulated by the vacuity $\lambda_{\text{cor}} = \nu$ , which is large (approaching 1) when evidence is low and decreases as evidence increases. This ensures the regularization is most influential in low-evidence regions and fades away for high-evidence samples, allowing the standard evidential losses to dominate learning in those areas.

The authors further introduce a Generalized Regularized Evidential Model (GRED) that combines the standard evidential loss $\mathcal{L}^{\text{evid}}$ , the incorrect evidence regularization $\mathcal{L}^{\text{inc}}$ , and the proposed correct evidence regularization $\mathcal{L}^{\text{cor}}$ into a single objective. This combined loss ensures that the model learns from all training samples, regardless of their evidence level. The framework is illustrated in Figure 2, which shows that while standard evidential training drives samples toward high-evidence regions, samples in the zero evidence region receive no update. In contrast, the GRED regularization (red arrows) actively pushes these samples out of the zero evidence region, ensuring a continuous learning signal across the entire evidence space.

The authors also analyze the impact of different activation functions on learning. They find that the exponential activation function $\mathcal{A}(o_k) = \exp(o_k)$ provides the largest gradient updates for samples near the zero evidence region compared to ReLU and SoftPlus, as its gradient $\frac{\partial e_k}{\partial o_k} = \exp(o_k)$ remains non-zero even for negative logits. However, this can lead to exploding evidence values for large positive logits. To balance this, they propose a novel Shifted Exponential Linear Unit (SELU) activation, which behaves like the exponential function for negative logits (ensuring strong gradients in the low-evidence region) and linearly for positive logits (preventing evidence explosion). The overall training process is designed to increase evidence for the correct class, reduce evidence for incorrect classes, and ensure that samples in low- or zero-evidence regions receive a meaningful learning signal.

Experiment

The evaluation spans standard classification, few-shot learning, adversarial robustness, out-of-distribution detection, and image restoration across diverse architectures to validate the theoretical limitations of conventional evidential models and the efficacy of the proposed correct evidence regularization. Learning dynamics and ablation studies reveal that standard approaches suffer from vanishing gradients in zero-evidence regions, causing learning stagnation and high sensitivity to regularization strength. Generalization and uncertainty analyses demonstrate that the proposed method consistently improves model reliability by restoring gradients for low-evidence samples, aligning predictive accuracy with uncertainty estimates, and enabling effective prediction filtering in data-scarce settings. Overall, the findings confirm that correcting evidence regularization effectively mitigates learning freeze while delivering robust uncertainty quantification across complex vision tasks.

The authors evaluate the impact of regularization strength on model accuracy across different activation functions and their regularized variants. The results show that models with correct evidence regularization maintain stable performance even at high regularization strengths, while baseline models degrade significantly. The exp activation function consistently achieves the highest accuracy, and the proposed regularization improves learning dynamics for all activation types. Models with correct evidence regularization maintain stable performance across varying regularization strengths, unlike baseline models that degrade significantly. The exp activation function consistently outperforms other activations in terms of accuracy across all regularization strengths. Correct evidence regularization improves learning dynamics, enabling models to learn from all training samples, including those with low or zero evidence.

The authors evaluate the proposed generalized regularized evidential model (GRED) across various benchmarks and settings, demonstrating its ability to improve generalization and uncertainty quantification compared to baseline evidential models. Results show that GRED maintains better performance under strong regularization and adversarial conditions, while also improving out-of-distribution detection and few-shot learning. The model's enhanced calibration and uncertainty behavior are consistent across different datasets and architectures. GRED improves generalization and uncertainty quantification by enabling learning from samples in the zero-evidence region. GRED maintains better performance under strong regularization and adversarial conditions compared to baseline models. GRED enhances out-of-distribution detection and few-shot learning by providing more reliable uncertainty estimates.

The authors evaluate the performance of evidential models on a blind face restoration task using the CodeFormer architecture, comparing different activation functions and regularization strategies. Results show that the proposed GRED-Exp model with correct evidence regularization achieves the best performance across all metrics, outperforming baseline models and demonstrating improved generalization and uncertainty quantification. The GRED-Exp model with correct evidence regularization achieves the best performance across all metrics on the blind face restoration task. The proposed GRED framework improves generalization and uncertainty quantification compared to baseline evidential models. Different activation functions and regularization strengths significantly impact model performance, with exponential activations and correct evidence regularization yielding superior results.

The authors evaluate the performance of evidential models in few-shot learning tasks, focusing on the impact of correct evidence regularization. Results show that the proposed GRED framework consistently outperforms baseline models across different activation functions and shot settings, with significant improvements in accuracy, especially in low-data regimes. The GRED variants demonstrate enhanced generalization and more reliable uncertainty estimates compared to standard evidential models. GRED consistently outperforms baseline evidential models across all activation functions and few-shot settings. The proposed correct evidence regularization enables better generalization and more reliable uncertainty estimates in low-data scenarios. GRED achieves significant accuracy improvements, particularly in the 100-way 1-shot setting, with the best performance observed for the Exp activation function.

The authors evaluate the impact of different activation functions and the proposed correct evidence regularization on model performance across multiple datasets and architectures. Results show that the proposed GRED model, which incorporates correct evidence regularization, consistently improves generalization and learning dynamics compared to baseline evidential models, particularly under strong incorrect evidence regularization. The exp activation function demonstrates superior performance, and the GRED model maintains stable training even when other models degrade. The proposed GRED model with correct evidence regularization improves generalization across various activation functions and datasets. The exp activation function outperforms other activations, and GRED maintains stable learning even under strong incorrect evidence regularization. GRED enables learning from all training samples, including those in the zero-evidence region, leading to improved performance and uncertainty reliability.

The experiments evaluate the proposed GRED framework with correct evidence regularization across diverse benchmarks, including blind face restoration and few-shot learning, while systematically varying activation functions and regularization strengths. These studies validate the model's capacity to maintain stable performance and improve learning dynamics under strong regularization and data-scarce conditions by effectively utilizing training samples with minimal evidence. The exponential activation function consistently yields superior accuracy, and the proposed regularization significantly enhances generalization, uncertainty calibration, and robustness compared to baseline evidential models. Overall, the findings demonstrate that GRED provides a more reliable and adaptable framework for evidential deep learning across varied tasks and architectures.

ملف PDF المصدر

جدول المحتويات

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

من الفكرة إلى الإطلاق — سرّع تطوير الذكاء الاصطناعي الخاص بك مع المساعدة البرمجية المجانية بالذكاء الاصطناعي، وبيئة جاهزة للاستخدام، وأفضل أسعار لوحدات معالجة الرسومات.

البرمجة التعاونية باستخدام الذكاء الاصطناعي

وحدات GPU جاهزة للعمل

أفضل الأسعار

ابدأ عرض الأسعار

HyperAI Newsletters

اشترك في آخر تحديثاتنا

سنرسل لك أحدث التحديثات الأسبوعية إلى بريدك الإلكتروني في الساعة التاسعة من صباح كل يوم اثنين

مدعوم بواسطة MailChimp

HyperAI

شغّل هذا الـNotebook ناقش على Discord

منذ 3 أعوام

سارتوريوس: تحليل استكشافي عميق + تفسير + نموذج

20 ساعة فقط من موارد حوسبة RTX 5090 $1 (قيمة $7)

الانتقال إلى دفتر

جدول المحتويات

الملخص

Please provide the title and abstract you would like me to translate.

One-sentence Summary

Key Contributions

The work theoretically characterizes an activation-induced learning-freeze phenomenon in evidential deep learning, demonstrating how non-negative evidence constraints and specific activation functions restrict gradient flow in low-evidence regions.
A generalized family of activation functions and corresponding evidential regularizers is introduced to mitigate this stagnation and enable consistent evidence updates across different activation regimes.
Empirical evaluations on MNIST, CIFAR-10, CIFAR-100, Tiny-ImageNet, two few-shot classification problems, and a blind face restoration problem validate the theoretical analysis and demonstrate the effectiveness of the proposed generalized regularized evidential models.

Introduction

Method

Experiment

ملف PDF المصدر

جدول المحتويات

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

البرمجة التعاونية باستخدام الذكاء الاصطناعي

وحدات GPU جاهزة للعمل

أفضل الأسعار

ابدأ عرض الأسعار

HyperAI Newsletters

اشترك في آخر تحديثاتنا

سنرسل لك أحدث التحديثات الأسبوعية إلى بريدك الإلكتروني في الساعة التاسعة من صباح كل يوم اثنين

مدعوم بواسطة MailChimp

Command Palette

Please provide the title you would like me to translate.

سارتوريوس: تحليل استكشافي عميق + تفسير + نموذج

الملخص

One-sentence Summary

Key Contributions

Introduction

Method

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters

Command Palette

Please provide the title you would like me to translate.

سارتوريوس: تحليل استكشافي عميق + تفسير + نموذج

الملخص

One-sentence Summary

Key Contributions

Introduction

Method

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters

Command Palette

Please provide the title you would like me to translate.

سارتوريوس: تحليل استكشافي عميق + تفسير + نموذج

الملخص

One-sentence Summary

Key Contributions

Introduction

Method

Experiment

بناء الذكاء الاصطناعي بالذكاء الاصطناعي

HyperAI Newsletters