
GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

Gholami, Mohsen ; Akbari, Mohammad ; Hu, Cindy ; Masrani, Vaden ; Wang, Z. Jane ; Zhang, Yong

Abstract

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework, which employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM with average improvements of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.
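The abstract does not spell out the energy function used for OOD evaluation. As a rough illustration only, the sketch below assumes the commonly used free-energy score, E(x) = -T * log(sum_i exp(z_i / T)), computed over a student classifier's logits z, and ranks generated samples by it. The names energy_score and select_ood_samples, the temperature parameter, and the top-k selection rule are hypothetical and are not taken from the paper or its code.

```python
import math


def energy_score(logits, temperature=1.0):
    """Assumed free-energy score: -T * log(sum(exp(z / T))).

    Higher (less negative) values mean the model is less confident,
    which is often read as a sign that the input is out of distribution.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return -temperature * log_sum_exp


def select_ood_samples(samples_with_logits, k):
    """Return the k sample texts with the highest energy (hypothetical helper).

    `samples_with_logits` is a list of (text, logits) pairs.
    """
    ranked = sorted(
        samples_with_logits,
        key=lambda pair: energy_score(pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    batch = [
        ("confident sample", [10.0, 0.0]),
        ("uncertain sample", [0.0, 0.0]),
    ]
    print(select_ood_samples(batch, k=1))
```

In a feedback loop of the kind the abstract describes, samples selected this way could be passed back to the LLM as examples for the next generation round. How GOLD actually selects, filters, and feeds back samples is not stated in the abstract.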