Zephyr: Direct Distillation of LM Alignment

We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.
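To make the dDPO step concrete, below is a minimal sketch of the standard DPO objective that this approach applies to the AI-ranked preference pairs. It is illustrative only: it assumes per-sequence log-probabilities under the trained policy and a frozen reference model have already been computed, and the function and argument names are hypothetical, not taken from the released codebase.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss on a batch of preference pairs.

    Each argument is a tensor of per-sequence log-probabilities (summed over
    response tokens) under either the policy being trained or the frozen
    reference (here, the dSFT) model. ``beta`` controls how far the policy
    may drift from the reference.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Encourage a positive margin between preferred and dispreferred responses
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the preference pairs are fixed and ranked offline by the teacher model, this objective needs no reward model and no sampling during fine-tuning, which is what keeps the training cost to a few hours.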