
UNA Alignment Framework

UNA, short for Unified Alignment Framework, is a new alignment framework proposed by a research team from Salesforce and Xiamen University in the paper "UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function".

The core idea of UNA is to unify different alignment techniques, including RLHF/PPO, DPO, and KTO, through a generalized implicit reward function. Its key innovation is recasting these alignment techniques as a single supervised learning problem: minimizing the difference between the implicit reward and the explicit reward.
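To make this concrete, the following is a minimal sketch of such a supervised objective in PyTorch, assuming per-response log-probabilities from the policy and a frozen reference model, plus an explicit reward signal; the function names and the squared-error loss are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def implicit_reward(policy_logprob, ref_logprob, beta=0.1):
    # DPO/UNA-style implicit reward: scaled log-ratio between the
    # current policy and the frozen reference policy.
    return beta * (policy_logprob - ref_logprob)

def una_style_loss(policy_logprob, ref_logprob, explicit_reward, beta=0.1):
    # Supervised objective: drive the implicit reward toward the
    # explicit reward (from a reward model, a binary label, or a score).
    # The MSE form here is an illustrative choice.
    r_implicit = implicit_reward(policy_logprob, ref_logprob, beta)
    return F.mse_loss(r_implicit, explicit_reward)

# Toy usage with a batch of four responses and scalar feedback
policy_logprob = torch.tensor([-12.3, -8.1, -15.0, -9.7])
ref_logprob = torch.tensor([-13.0, -8.5, -14.2, -10.1])
explicit_reward = torch.tensor([0.8, 0.4, -0.3, 0.1])

print(una_style_loss(policy_logprob, ref_logprob, explicit_reward).item())

Because the loss only compares two scalar signals per response, the same objective can be computed whether the explicit reward comes from a reward model, a binary thumbs-up/down label, or a human-assigned score.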

UNA was proposed to address limitations of existing alignment techniques. RLHF, for example, requires training a reward model and a policy separately, a process that is complex, time-consuming, memory-intensive, and unstable during training. DPO proposes a mapping between the optimal policy and the reward that simplifies RLHF training, but it cannot fully utilize a reward model and is limited to paired preference data. UNA mathematically proves that, given the classic RLHF objective, the optimal policy is induced by a generalized implicit reward function. This new mapping allows UNA to simplify RLHF/PPO while stabilizing, accelerating, and reducing the memory burden of RL fine-tuning, and it accommodates different types of feedback, including pairwise, binary, and scalar feedback.
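For reference, the mapping described above can be sketched in standard notation. The first expression is the classic KL-regularized RLHF objective; the second is the DPO-style implicit reward relation that UNA generalizes, with c(x) standing in for the prompt-dependent term (the exact generalized form is given in the paper).

\max_{\pi_\theta} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)} \big[ r(x, y) \big] \;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\big[ \pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x) \big]

whose optimal policy satisfies

r(x, y) \;=\; \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} \;+\; c(x)

Because the reward is recovered directly from the policy and reference log-probabilities, fitting it to an explicit reward signal becomes the supervised problem described earlier, regardless of whether that signal is pairwise, binary, or scalar.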