SemiReward: A General Reward Model for Semi-supervised Learning

Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling. The main challenge is how to select high-quality pseudo labels while mitigating confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies specially designed for classification, failing to achieve high-quality labels, fast convergence, and task versatility simultaneously. To these ends, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate and select high-quality pseudo labels, and is pluggable into mainstream SSL methods across a wide range of task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and a subsampling strategy. With classification and regression tasks on 13 standard SSL benchmarks across three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence on top of Pseudo Label, FlexMatch, and FreeMatch/SoftMatch. Code and models are available at https://github.com/Westlake-AI/SemiReward.
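To make the reward-scoring-and-filtering idea in the abstract concrete, below is a minimal PyTorch sketch: a small model scores each (feature, pseudo-label) pair in [0, 1], and only pairs above a threshold are kept for self-training. The `RewardModel` architecture, the `filter_pseudo_labels` helper, and the threshold value are hypothetical illustrations, not the paper's actual implementation (which additionally uses two-stage online training with a generator model and subsampling; see the linked repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Hypothetical scorer: maps a (feature, pseudo-label) pair to a reward in [0, 1]."""

    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.num_classes = num_classes
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # reward score in [0, 1]
        )

    def forward(self, feats: torch.Tensor, pseudo_labels: torch.Tensor) -> torch.Tensor:
        # Encode the candidate pseudo label as one-hot and score it jointly with the feature.
        onehot = F.one_hot(pseudo_labels, self.num_classes).float()
        return self.mlp(torch.cat([feats, onehot], dim=-1)).squeeze(-1)


@torch.no_grad()
def filter_pseudo_labels(reward_model, feats, pseudo_labels, threshold: float = 0.5):
    """Keep only pseudo labels whose predicted reward exceeds the threshold."""
    scores = reward_model(feats, pseudo_labels)
    mask = scores > threshold
    return feats[mask], pseudo_labels[mask], scores[mask]


if __name__ == "__main__":
    # Stand-in data: features from an unlabeled batch and a classifier's hard predictions.
    reward_model = RewardModel(feat_dim=64, num_classes=10)
    feats = torch.randn(32, 64)
    pseudo_labels = torch.randint(0, 10, (32,))
    kept_feats, kept_labels, kept_scores = filter_pseudo_labels(
        reward_model, feats, pseudo_labels, threshold=0.5
    )
    print(f"kept {kept_labels.numel()} / 32 pseudo labels")
```

In this sketch the filtering step replaces fixed confidence thresholds on classifier probabilities with a learned scoring function, which is what lets the same mechanism apply to both classification and regression tasks.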