
CAS Software Institute Proposes New Strategy to Enhance Generalization Ability of Self-Supervised Learning

Recently, a research team from the Institute of Software, Chinese Academy of Sciences (CAS) developed a novel small-batch data sampling strategy. The strategy aims to mitigate spurious correlations caused by unobservable semantic variables, thereby enhancing the out-of-distribution generalization of self-supervised learning models.

Out-of-distribution generalization refers to a model's ability to perform well on test data drawn from a distribution different from its training data; essentially, the model should perform on "unseen" distributions roughly as well as it does on the training distribution. However, studies have shown that self-supervised models can be affected during training by unobservable semantic variables, which weakens their out-of-distribution generalization.

The CAS researchers addressed this issue using causal effect estimation techniques, introducing a small-batch sampling strategy designed to eliminate the confounding effects of the unobservable semantic variables. By constructing a latent variable model, they estimated the posterior probability distribution of the unobservable semantic variables given an "anchor" sample. This estimate, termed the balance score, is used to group samples with similar or close balance scores into the same mini-batch. Within each batch, the unobservable semantic variables are then conditionally independent of the anchor sample, which helps the model avoid spurious correlations and improves out-of-distribution generalization.

The effectiveness of this sampling strategy was demonstrated through extensive experiments on benchmark datasets. Only the batch-generation mechanism was altered; the model architecture and hyperparameters were left unchanged. The results were impressive: the sampling strategy improved performance by at least 2% across various evaluation tasks.
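To make the batching idea concrete, here is a minimal sketch of grouping samples by estimated balance score. It is an illustration only: a small k-means fit with a softmax posterior stands in for the paper's latent variable model, and the function name `balance_score_batches` is hypothetical, not from the paper.

```python
import numpy as np

def balance_score_batches(features, batch_size, k=4, iters=10, seed=0):
    """Group samples with similar balance scores into the same mini-batch.

    A plain k-means fit plus a softmax posterior over clusters stands in
    for the paper's latent variable model (an assumption); the posterior
    given each sample plays the role of the "balance score".
    """
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):  # plain Lloyd iterations
        dists = ((features[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = features[assign == j].mean(0)
    # Soft posterior over clusters = softmax of negative squared distance.
    logits = -((features[:, None, :] - centers[None]) ** 2).sum(-1)
    post = np.exp(logits - logits.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)
    # Sort so neighbours share similar balance scores, then cut into batches.
    order = np.lexsort((post.max(1), post.argmax(1)))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Toy usage: 64 random 8-d feature vectors, mini-batches of 16.
batches = balance_score_batches(
    np.random.default_rng(1).normal(size=(64, 8)), batch_size=16)
```

The key design point is that batches are formed from consecutive runs of the score-sorted index list, so samples within a batch have close balance scores, approximating the conditional-independence condition the paper targets.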
Specifically, on the ImageNet-100 and ImageNet classification tasks, both Top-1 and Top-5 accuracies exceeded state-of-the-art (SOTA) self-supervised methods. In semi-supervised classification, Top-1 and Top-5 accuracies improved by over 3% and 2%, respectively. Average precision in object detection and instance segmentation transfer tasks also saw consistent gains, and on few-shot transfer tasks across Omniglot, miniImageNet, and CIFAR-FS the improvements exceeded 5%. These findings indicate that the proposed sampling strategy reduces spurious correlations and strengthens causal learning, thereby improving out-of-distribution generalization.

The research has been accepted at the International Conference on Machine Learning (ICML 2025), a top-tier academic conference in artificial intelligence. The paper is available here: [Paper Link]
