Energy-Based Transformers (EBTs)
Energy-Based Transformers (EBTs) are a new type of energy-based model proposed by a team from the University of Virginia on July 2, 2025, in the paper "Energy-Based Transformers are Scalable Learners and Thinkers". An EBT assigns a scalar energy value to each input and candidate-prediction pair and produces a prediction by minimizing that energy with gradient descent until convergence.
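To make the prediction procedure concrete, the sketch below illustrates the verify-and-minimize idea under simplifying assumptions: a toy MLP stands in for the actual Transformer-based energy function, and the names `ToyEnergyModel` and `predict_by_energy_minimization` are illustrative rather than taken from the paper. It shows only the inference loop, in which a candidate prediction is refined by following the energy gradient downhill.

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Toy stand-in for an EBT: maps a (context, candidate) pair to a scalar energy."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.SiLU(), nn.Linear(64, 1)
        )

    def forward(self, context, candidate):
        # Lower energy = the candidate is judged more compatible with the context.
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def predict_by_energy_minimization(model, context, steps=20, lr=0.1):
    """Start from a random candidate and descend the energy landscape."""
    candidate = torch.randn_like(context, requires_grad=True)
    for _ in range(steps):
        energy = model(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        with torch.no_grad():
            candidate -= lr * grad  # gradient step on the candidate, not the weights
    return candidate.detach()

model = ToyEnergyModel(dim=16)
context = torch.randn(4, 16)               # batch of 4 "inputs"
prediction = predict_by_energy_minimization(model, context)
print(prediction.shape)                     # torch.Size([4, 16])
```

Note that the gradient is taken with respect to the candidate prediction rather than the model parameters; spending more descent steps corresponds to spending more "thinking" compute at inference time.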
During training, EBTs scale faster than the current mainstream Transformer++ recipe in both discrete and continuous modality tasks, with scaling-rate improvements of up to 35% across data volume, batch size, parameter count, FLOPs, and model depth. Even with comparable or worse pre-training performance, EBTs still outperform existing models on most downstream tasks, indicating stronger generalization than existing methods.
EBTs are therefore a promising new paradigm for scaling both the learning and the thinking capabilities of models.