
GigaCheck: Detecting LLM-generated Content

Tolstykh, Irina ; Tsybina, Aleksandra ; Yakubson, Sergey ; Gordeev, Aleksandr ; Dokholyan, Vladimir ; Kuprashevich, Maksim
Abstract

With the increasing quality and spread of LLM-based assistants, the amount of LLM-generated content is growing rapidly. In many cases and tasks, such texts are already indistinguishable from those written by humans, and the quality of generation tends to only increase. At the same time, detection methods are developing more slowly, making it challenging to prevent misuse of generative AI technologies. In this work, we investigate the task of generated text detection by proposing the GigaCheck. Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts. For the first task, our approach utilizes a general-purpose LLM, leveraging its extensive language abilities to fine-tune efficiently for the downstream task of LLM-generated text detection, achieving high performance even with limited data. For the second task, we propose a novel approach that combines computer vision and natural language processing techniques. Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text. We evaluate the GigaCheck on five classification datasets with English texts and three datasets designed for Human-Machine collaborative text analysis. Our results demonstrate that GigaCheck outperforms previous methods, even in out-of-distribution settings, establishing a strong baseline across all datasets.
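
To make the interval-localization idea more concrete, here is a minimal Python sketch of how the output of a DETR-like detector could be decoded into text intervals. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify the output format, so the normalized (center, width) span encoding, the confidence threshold, and the helper names `span_to_interval`, `interval_iou`, and `decode_predictions` are all hypothetical.

```python
from typing import List, Tuple

# Assumed encoding (not stated in the abstract): each predicted span is
# (center, width), both normalized to [0, 1] relative to the text length.
Span = Tuple[float, float]
Interval = Tuple[int, int]  # half-open token interval [start, end)


def span_to_interval(span: Span, num_tokens: int) -> Interval:
    """Convert a normalized (center, width) span to a token interval."""
    center, width = span
    start = max(0.0, center - width / 2)  # clip to the start of the text
    end = min(1.0, center + width / 2)    # clip to the end of the text
    return round(start * num_tokens), round(end * num_tokens)


def interval_iou(a: Interval, b: Interval) -> float:
    """1-D intersection over union of two half-open intervals."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def decode_predictions(
    spans: List[Span],
    scores: List[float],
    num_tokens: int,
    threshold: float = 0.5,  # hypothetical confidence cutoff
) -> List[Interval]:
    """Keep spans whose score passes the threshold and map them to tokens."""
    return [
        span_to_interval(span, num_tokens)
        for span, score in zip(spans, scores)
        if score >= threshold
    ]
```

Under these assumptions, `decode_predictions` turns the detector's raw span queries into candidate AI-generated intervals, and `interval_iou` is the kind of overlap measure that could be used to compare a predicted interval against an annotated one.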
