
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Lu, Shuai ; Guo, Daya ; Ren, Shuo ; Huang, Junjie ; Svyatkovskiy, Alexey ; Blanco, Ambrosio ; Clement, Colin ; Drain, Dawn ; Jiang, Daxin ; Tang, Duyu ; Li, Ge ; Zhou, Lidong ; Shou, Linjun ; Zhou, Long ; Tufano, Michele ; Gong, Ming ; Zhou, Ming ; Duan, Nan ; Sundaresan, Neel ; Deng, Shao Kun ; Fu, Shengyu ; Liu, Shujie
Abstract

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
