CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Hao Zhang, Xinyi Dai, Yasheng Wang, Ruiming Tang
Abstract

Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code across domains and tasks. Addressing this gap, we present CoIR (Code Information Retrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. CoIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of CoIR and its diverse dataset composition. We then evaluate nine widely used retrieval models using CoIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, CoIR has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks such as MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through CoIR, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems. Code is available at https://github.com/CoIR-team/coir.
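To illustrate the MTEB/BEIR-style workflow the abstract describes, the sketch below shows how an evaluation run might look with the pip-installable framework. The specific names (the `coir` package and install name, `get_tasks`, the `COIR` evaluation class, the `YourCustomDEModel` wrapper, the `batch_size` and `output_folder` arguments, and the task identifiers) are assumptions made for illustration; consult the repository README at https://github.com/CoIR-team/coir for the authoritative interface.

```python
# Hypothetical usage sketch -- class, function, and task names below are
# assumptions modeled on MTEB/BEIR-style evaluation loops, not a verified API.
from coir.data_loader import get_tasks     # assumed helper for loading benchmark tasks
from coir.evaluation import COIR           # assumed evaluation entry point
from coir.models import YourCustomDEModel  # assumed dense-retriever wrapper exposing encode()

# Wrap any dense retrieval model behind the framework's model interface.
model = YourCustomDEModel(model_name="intfloat/e5-base-v2")

# Select one or more of the benchmark's retrieval tasks by name (names assumed).
tasks = get_tasks(tasks=["codetrans-dl"])

# Run the evaluation and write per-task retrieval metrics (e.g. nDCG@10) to disk.
evaluation = COIR(tasks=tasks, batch_size=128)
results = evaluation.run(model, output_folder="results/e5-base-v2")
print(results)
```

Because the data schema matches MTEB and BEIR, a model wrapper written for those benchmarks should, in principle, be reusable here with little or no change.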