Decoding-time Realignment

Decoding-time realignment (DeRa) is a method for adjusting the degree of model alignment while a language model generates an answer. It was proposed in 2024 by researchers from the University of Basel in Switzerland, universities in the UK and France, as well as Google DeepMind and Google Research. The paper "Decoding-time Realignment of Language Models" was accepted at ICML 2024 and selected as a spotlight presentation (only 3.5% of total submissions).

The core idea of this technology is to dynamically adjust the alignment of the model during the decoding process without retraining it, thereby saving computing resources and improving research efficiency. Specifically, DeRa adjusts the trade-off between reward and regularization at generation time: by interpolating between the raw outputs (logits) of the supervised fine-tuning (SFT) model and the aligned model, it approximates models trained with different regularization strengths, giving control over the degree of alignment. This method is simple and flexible, can tune the alignment strength to different needs, and avoids the computational overhead of repeatedly retraining the model.
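The interpolation described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: it assumes both models share a vocabulary, and `lam` stands for the realignment ratio (0 recovers the SFT model, 1 recovers the aligned model); the function names are hypothetical.

```python
import math

def dera_logits(sft_logits, aligned_logits, lam):
    # Interpolate per-token logits between the SFT and aligned models.
    # lam = 0 reproduces the SFT model, lam = 1 the aligned model;
    # intermediate (or larger) values approximate models trained with
    # different regularization strengths, without any retraining.
    return [s + lam * (a - s) for s, a in zip(sft_logits, aligned_logits)]

def softmax(logits):
    # Standard numerically stable softmax over a logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example with a 4-token vocabulary: next-token logits from each model.
sft = [2.0, 0.5, -1.0, 0.0]
aligned = [0.5, 2.5, -0.5, 0.2]

# Sampling distribution at a moderate alignment strength (lam = 0.5).
probs = softmax(dera_logits(sft, aligned, 0.5))
```

At each decoding step, one would compute both models' logits for the current prefix, mix them with the chosen `lam`, and sample the next token from the resulting distribution, so a single pair of models covers a whole range of alignment strengths.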

In addition, this technology has shown good results in multiple experiments. For example, experiments on the Zephyr-7b model demonstrate how DeRa adjusts the alignment of a language model during decoding, and experiments on generation length and summarization tasks show that DeRa's outputs closely match those of retrained models and that it has potential for reducing hallucinations.