CLIcK Korean Culture and Language Intelligence Dataset
The CLIcK dataset was created by the Korea Advanced Institute of Science and Technology (KAIST) to address the lack of benchmarks for assessing cultural and linguistic knowledge in Korean large language models. The dataset contains 1,995 question-answer pairs drawn from official Korean exams and textbooks. The questions cover two major categories, language and culture, which are further divided into 11 subcategories. Each sample carries fine-grained annotations indicating which cultural or linguistic knowledge is required to answer the question.
With official permission, the research team extracted questions from six Korean exams and one textbook, and used GPT-4 to generate new questions to ensure the originality and cultural relevance of the content. After rigorous manual review and classification, these questions formed CLIcK, a high-quality Korean evaluation benchmark. As a benchmark for evaluating how well language models understand Korean culture and language, the dataset gives researchers in this area a standardized basis for measuring and comparing model performance.