NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution

Oleksandr Marchenko Breneur Adelaide Danilov Aria Nourbakhsh Salima Lamsiyah

Abstract

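This paper presents NOTAI.AI, an explainable framework for machine-generated text detection. The framework extends Fast-DetectGPT by integrating curvature-based signals with neural and stylometric features in a supervised setting. The system combines 17 interpretable features, including Conditional Probability Curvature, a ModernBERT detector score, readability metrics, and stylometric cues, and feeds them to a gradient-boosted tree (XGBoost) meta-classifier that decides whether a text is human- or AI-generated. In addition, NOTAI.AI applies Shapley Additive Explanations (SHAP) to provide both local and global feature-level attributions. These attributions are translated into structured natural-language rationales through an LLM-based explanation layer, enabling user-facing interpretability. The system is deployed as an interactive web application that supports real-time analysis, visual feature inspection, and structured evidence presentation. Through the web interface, users can enter text and inspect how neural and statistical signals influence the final decision. The source code and a demo video are publicly available to support reproducibility.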

One-sentence Summary

Researchers from the University of Luxembourg present NOTAI.AI, an explainable framework that extends Fast-DetectGPT by fusing curvature-based signals with neural and stylometric features in an XGBoost classifier. The system translates SHAP attributions into natural-language rationales via an LLM layer, offering real-time, interpretable detection of machine-generated text through an interactive web application.

Key Contributions

  • NOTAI.AI addresses the opacity of existing detectors by extending Fast-DetectGPT to integrate curvature-based signals with neural and stylometric features in a supervised framework.
  • The system employs an XGBoost meta-classifier on 17 interpretable features and uses SHAP to generate local and global attributions that are translated into structured natural-language rationales.
  • Evaluated on a balanced subset of the RAID benchmark, the ensemble achieves higher performance than individual component models and is deployed as an interactive web application for real-time analysis.

Introduction

The rapid adoption of large language models has transformed fields such as education and journalism, but it has also created urgent challenges for text authenticity and information integrity. Prior detection methods rely on probability curvature or supervised neural signals; they often degrade under domain shift and return opaque scores that give end users no actionable justification. To address these gaps, the authors present NOTAI.AI, an explainable detection system that fuses curvature-based signals with neural and stylometric features within a supervised framework. They use an XGBoost meta-classifier and SHAP values to generate both local and global feature attributions, which are then converted into natural-language rationales and displayed in an interactive web interface for transparency and practical utility.

Dataset

  • Dataset Composition and Sources: The authors utilize the RAID dataset (Dugan et al., 2024), which originally contains a highly imbalanced non-adversarial subset with approximately 2.86% human-written text and 97.14% AI-generated text.

  • Balancing Strategy: To prevent supervised detectors from overfitting to the majority class, the team constructs a balanced evaluation set with a 1:1 human-to-AI ratio. They retain all human-written instances and downsample the AI-generated portion in a stratified manner to ensure equal representation across different generator models.

  • Processing and Reproducibility: Sampling is performed without replacement using a fixed seed (random_state=42) to guarantee reproducibility. The final balanced dataset is created by concatenating the full human subset with the per-generator sampled AI subsets and resetting the indices.

  • Feature Construction and Usage: For training and evaluation, the authors precompute input features to create a feature-augmented version of the dataset. They employ gpt-neo-1.3B by EleutherAI as a proxy language model to compute CPC values for these features.
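
The balancing procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the row keys `label` and `model`, the helper name `balance_dataset`, and the toy rows are all hypothetical, and the equal per-generator quota (human count divided by the number of generators) is one plausible reading of "stratified" downsampling.

```python
import random
from collections import Counter, defaultdict


def balance_dataset(rows, seed=42):
    """Keep every human row and downsample AI rows equally per generator.

    `rows` is a list of dicts with hypothetical keys 'label' ('human' or 'ai')
    and 'model' (the generator name). Sampling is without replacement.
    """
    human = [r for r in rows if r["label"] == "human"]
    by_model = defaultdict(list)
    for r in rows:
        if r["label"] == "ai":
            by_model[r["model"]].append(r)
    if not by_model:
        return list(human)

    # Assumed quota: split the human count evenly across generators.
    # If the division has a remainder, the ratio is only approximately 1:1.
    per_model = len(human) // len(by_model)
    rng = random.Random(seed)  # fixed seed for reproducibility

    sampled = []
    for model in sorted(by_model):  # sorted so the draw order is deterministic
        pool = by_model[model]
        sampled.extend(rng.sample(pool, min(per_model, len(pool))))

    # Concatenating into a fresh list plays the role of resetting the index.
    return human + sampled


# Toy data (made up): 4 human rows and 2 generators with 50 rows each.
toy_rows = (
    [{"label": "human", "model": "human", "text": f"h{i}"} for i in range(4)]
    + [{"label": "ai", "model": "gen_a", "text": f"a{i}"} for i in range(50)]
    + [{"label": "ai", "model": "gen_b", "text": f"b{i}"} for i in range(50)]
)
balanced = balance_dataset(toy_rows)
print(Counter(r["model"] for r in balanced))
```

With a dataframe library the same steps would typically be a per-generator sample with a fixed seed, followed by concatenation and an index reset.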

Method

The NOTAI.AI framework operates as a hybrid, explainable system designed to detect machine-generated text through a four-stage pipeline: Extract, Decide, Explain, and Present. The core of the architecture relies on aggregating diverse signals (neural, statistical, and stylometric) into a unified meta-classification model.

In the feature extraction stage, the system captures distributional, structural, and stylistic properties of the input text using 17 complementary features. The authors leverage a fine-tuned ModernBERT model to extract neural detection probabilities, which capture contextual likelihood signals and semantic fluency patterns. This neural score is combined with Conditional Probability Curvature (CPC), a statistical signal adapted from Fast-DetectGPT. CPC quantifies the second-order variation in token likelihood under local perturbations, distinguishing between the irregular probability landscapes of human writing and the smoother profiles of machine-generated text.

Beyond neural and statistical signals, the system computes interpretable linguistic indicators. These include readability metrics such as Flesch Reading Ease, lexical diversity measures like Type-Token Ratio and hapax legomena ratio, and surface-level stylometric cues. The latter encompass punctuation counts, stopword ratios, cliché ratios, and maximum repeated n-gram frequencies to model repetition patterns and syntactic pacing.
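
Several of these indicators are simple to compute. The following sketch shows one possible set of definitions. The tokenizer, the tiny stopword list, the vowel-group syllable heuristic, and the choice to normalize the hapax count by the total token count are all assumptions made for illustration, and the source does not specify the n used for repeated n-grams.

```python
import re
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the system's list).
STOPWORDS = {"the", "on", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(tokens):
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens):
    """Share of tokens whose word occurs exactly once in the text."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(tokens)


def stopword_ratio(tokens, stopwords=STOPWORDS):
    return sum(t in stopwords for t in tokens) / len(tokens) if tokens else 0.0


def punctuation_count(text):
    return sum(1 for ch in text if ch in string.punctuation)


def max_repeated_ngram(tokens, n=3):
    """Highest frequency of any n-gram; 0 if the text is shorter than n."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return max(grams.values(), default=0)


def flesch_reading_ease(text):
    """Standard Flesch formula with a crude vowel-group syllable estimate."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not tokens or not sentences:
        return 0.0
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", t))) for t in tokens)
    return (206.835
            - 1.015 * (len(tokens) / len(sentences))
            - 84.6 * (syllables / len(tokens)))


sample = "the cat sat on the mat. the cat sat on the rug!"
sample_tokens = tokenize(sample)
features = {
    "type_token_ratio": type_token_ratio(sample_tokens),
    "hapax_ratio": hapax_ratio(sample_tokens),
    "stopword_ratio": stopword_ratio(sample_tokens),
    "punctuation_count": punctuation_count(sample),
    "max_repeated_3gram": max_repeated_ngram(sample_tokens, n=3),
    "flesch_reading_ease": flesch_reading_ease(sample),
}
print(features)
```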

The system architecture figure (not reproduced here) shows how these individual feature streams converge into a single decision-making unit.

All extracted features are subsequently fed into an Extreme Gradient Boosting (XGBoost) classifier. This meta-classifier learns nonlinear interactions between the neural confidence scores, curvature statistics, and stylometric indicators to determine the final classification. To ensure transparency, the system employs a two-layer explanation mechanism. First, Shapley Additive Explanations (SHAP) are applied to quantify the contribution of each feature to the boosted-tree decision, providing both local and global interpretability. Second, a large language model (specifically Google Gemma-3-27b-it) translates these mathematical attributions into structured, natural-language rationales. This LLM-based explainer receives the raw text, the prediction probability, and the feature importance scores, and generates concise, user-friendly evidence supporting the final decision.
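
To make the attribution step concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a tiny stand-in scoring function, then formats the top attributions as text that could be handed to an explainer. It is not the TreeSHAP algorithm a SHAP library would run on a boosted-tree model, and it is not the authors' pipeline: `toy_model`, the feature names, the baseline values, and `format_evidence` are hypothetical. Absent features are replaced by a single baseline value, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; cost grows exponentially with feature count."""
    names = list(x)
    n = len(names)

    def value(present):
        # Features outside the coalition fall back to their baseline value.
        mixed = {k: (x[k] if k in present else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_model(f):
    """Hypothetical score with one interaction term (not the real classifier)."""
    return (0.5 * f["cpc"]
            + 0.3 * f["modernbert_score"]
            + 0.2 * f["cpc"] * f["type_token_ratio"])


def format_evidence(phi, top_k=2):
    """Render the largest attributions as lines for a downstream explainer."""
    ranked = sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return "\n".join(f"{name}: {value:+.3f}" for name, value in ranked[:top_k])


x = {"cpc": 2.0, "modernbert_score": 0.9, "type_token_ratio": 0.4}
baseline = {"cpc": 0.0, "modernbert_score": 0.5, "type_token_ratio": 0.6}

phi = shapley_values(toy_model, x, baseline)
print(format_evidence(phi))
```

The attributions sum to the difference between the model output for the input and for the baseline, which is the additivity property that lets per-feature contributions be read as a decomposition of a single prediction.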

Experiment

  • Performance evaluation demonstrates that combining complementary feature families yields substantial gains over single-signal baselines, with the full ensemble achieving the highest accuracy and F1 score by integrating distinct aspects of human versus machine-generated text.
  • Curvature features alone prove most effective among individual signals for minimizing false positives, while stylometric features provide strong overall performance and ModernBERT features offer higher recall at the cost of precision.
  • Feature importance analysis identifies Conditional Probability Curvature, Type-Token Ratio, and ModernBERT scores as the primary drivers of the classifier's decisions.
  • The influence of curvature on classification is non-linear, where positive values strongly indicate human authorship, whereas the Type-Token Ratio follows a sigmoid pattern with a specific threshold distinguishing vocabulary-rich samples.
  • ModernBERT scores function as a confirmatory indicator, contributing significantly to the decision only at high confidence levels rather than providing a continuously graded signal.
