Assemble Foundation Models for Automatic Code Summarization

Automatic code summarization is beneficial to daily software development since it reduces the need for manually written documentation. Currently, artificial intelligence is undergoing a paradigm shift: foundation models pretrained on massive data and finetuned to downstream tasks surpass specially customized models. This trend inspired us to reuse foundation models instead of learning from scratch. We therefore propose a flexible and robust approach to automatic code summarization based on neural models. We assemble available foundation models, such as CodeBERT and GPT-2, into a single neural model named AdaMo. Moreover, we utilize Gaussian noise as a simulation of contextual information to optimize the latent representation. Furthermore, we introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning, and design intermediate-stage tasks for general sequence-to-sequence learning. Finally, we evaluate AdaMo against a benchmark dataset for code summarization by comparing it with state-of-the-art models.
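
The sketch below illustrates, under stated assumptions, the two ideas named in the abstract: assembling a pretrained encoder (CodeBERT) and a pretrained decoder (GPT-2) into one sequence-to-sequence model, and perturbing the encoder's latent representation with Gaussian noise during training. It uses the Hugging Face `EncoderDecoderModel` wrapper as a stand-in for the paper's assembly mechanism, and `noise_std` is an illustrative hyperparameter, not a value taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of assembling CodeBERT and
# GPT-2 into one seq2seq model, with Gaussian noise added to the encoder's
# latent states. The noise scale `noise_std` is an assumed, illustrative value.
import torch
from transformers import EncoderDecoderModel, RobertaTokenizerFast, GPT2TokenizerFast

# Assemble CodeBERT (encoder) and GPT-2 (decoder); cross-attention layers are
# added to the decoder automatically by the wrapper.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base", "gpt2"
)

code_tokenizer = RobertaTokenizerFast.from_pretrained("microsoft/codebert-base")
summary_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
summary_tokenizer.pad_token = summary_tokenizer.eos_token

model.config.decoder_start_token_id = summary_tokenizer.bos_token_id
model.config.pad_token_id = summary_tokenizer.pad_token_id


def forward_with_noise(code: str, summary: str, noise_std: float = 0.1):
    """One training-style forward pass with Gaussian noise on the latent states."""
    inputs = code_tokenizer(code, return_tensors="pt", truncation=True)
    labels = summary_tokenizer(summary, return_tensors="pt", truncation=True).input_ids

    # Encode the source code, then perturb the latent representation with noise
    # standing in for contextual variation.
    encoder_outputs = model.encoder(**inputs)
    encoder_outputs.last_hidden_state = (
        encoder_outputs.last_hidden_state
        + noise_std * torch.randn_like(encoder_outputs.last_hidden_state)
    )

    # Decode against the noised encoder states and compute the seq2seq loss.
    outputs = model(
        encoder_outputs=encoder_outputs,
        attention_mask=inputs.attention_mask,
        labels=labels,
    )
    return outputs.loss
```

Calling `forward_with_noise("def add(a, b): return a + b", "Add two numbers.")` returns a training loss; the adaptive schemes (continuous pretraining and intermediate finetuning) would then be realized as additional training stages on top of such a model, and are not shown here.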