HyperAI


NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution

Oleksandr Marchenko Breneur Adelaide Danilov Aria Nourbakhsh Salima Lamsiyah

Abstract

This work proposes NOTAI.AI, an explainable framework for detecting machine-generated text. The framework extends Fast-DetectGPT by integrating curvature-based signals with neural and stylometric features in a supervised setting. The system combines 17 interpretable features, including Conditional Probability Curvature (CPC), ModernBERT detector scores, readability metrics, and stylometric cues, and uses a gradient-boosted tree (XGBoost) meta-classifier to decide whether a text was written by a human or generated by AI. In addition, NOTAI.AI applies Shapley Additive Explanations (SHAP) to provide local and global feature-level attributions. These attributions are converted into structured natural-language rationales through an LLM-based explanation layer, yielding user-facing interpretability. The system is deployed as an interactive web application that supports real-time analysis, visual feature inspection, and structured evidence presentation. Through the web interface, users can enter text and examine how neural and statistical signals contribute to the final verdict. To support reproducibility, the source code and a demo video are publicly available.

One-sentence Summary

Researchers from the University of Luxembourg present NOTAI.AI, an explainable framework that enhances Fast-DetectGPT by fusing curvature-based signals with neural and stylometric features in an XGBoost classifier. This system uniquely translates SHAP attributions into natural language rationales via an LLM layer, offering real-time, interpretable detection of machine-generated text through an interactive web application.

Key Contributions

  • NOTAI.AI addresses the opacity of existing detectors by extending Fast-DetectGPT to integrate curvature-based signals with neural and stylometric features in a supervised framework.
  • The system employs an XGBoost meta-classifier on 17 interpretable features and uses SHAP to generate local and global attributions that are translated into structured natural-language rationales.
  • Evaluated on a balanced subset of the RAID benchmark, the ensemble achieves higher performance than individual component models and is deployed as an interactive web application for real-time analysis.

Introduction

The rapid adoption of large language models has transformed industries like education and journalism but created urgent challenges regarding text authenticity and information integrity. While prior detection methods rely on probability curvature or supervised neural signals, they often suffer from performance degradation under domain shift and provide opaque scores that lack actionable justification for end users. To address these gaps, the authors present NOTAI.AI, an explainable detection system that fuses curvature-based signals with neural and stylometric features within a supervised framework. They leverage an XGBoost meta-classifier and SHAP values to generate both local and global feature attributions, which are then converted into natural language rationales and displayed via an interactive web interface to ensure transparency and practical utility.

Dataset

  • Dataset Composition and Sources: The authors utilize the RAID dataset (Dugan et al., 2024), which originally contains a highly imbalanced non-adversarial subset with approximately 2.86% human-written text and 97.14% AI-generated text.

  • Balancing Strategy: To prevent supervised detectors from overfitting to the majority class, the team constructs a balanced evaluation set with a 1:1 human-to-AI ratio. They retain all human-written instances and downsample the AI-generated portion in a stratified manner to ensure equal representation across different generator models.

  • Processing and Reproducibility: Sampling is performed without replacement using a fixed seed (random_state=42) to guarantee reproducibility. The final balanced dataset is created by concatenating the full human subset with the per-generator sampled AI subsets and resetting the indices.

  • Feature Construction and Usage: For training and evaluation, the authors precompute input features to create a feature-augmented version of the dataset. They employ gpt-neo-1.3B by EleutherAI as a proxy language model to compute CPC values for these features.
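The balancing procedure above (keep all human rows, stratified per-generator downsampling with a fixed seed, then concatenate and reset indices) can be sketched with pandas. The function and column names (`label`, `generator`) are illustrative assumptions, not the authors' code; the `random_state=42` and no-replacement sampling follow the description.

```python
import pandas as pd

def build_balanced_subset(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """Keep every human-written row; downsample AI rows per generator
    (without replacement, fixed seed) so the final set is roughly 1:1."""
    human = df[df["label"] == "human"]
    ai = df[df["label"] == "ai"]
    generators = ai["generator"].unique()
    per_gen = len(human) // len(generators)  # equal share per generator (assumption)
    sampled = [
        ai[ai["generator"] == g].sample(n=per_gen, replace=False, random_state=seed)
        for g in generators
    ]
    # concatenate the full human subset with the per-generator AI samples
    return pd.concat([human] + sampled).reset_index(drop=True)

# toy illustration on a synthetic frame (4 human rows, 36 AI rows, 2 generators)
df = pd.DataFrame({
    "text": [f"t{i}" for i in range(40)],
    "label": ["human"] * 4 + ["ai"] * 36,
    "generator": ["-"] * 4 + ["gpt", "llama"] * 18,
})
balanced = build_balanced_subset(df)
```

With two generators and four human rows, each generator contributes two AI samples, giving an 8-row balanced frame with a clean 0..7 index.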

Method

The NOTAI.AI framework operates as a hybrid, explainable system designed to detect machine-generated text through a four-stage pipeline: Extract, Decide, Explain, and Present. The core of the architecture relies on aggregating diverse signals—neural, statistical, and stylometric—into a unified meta-classification model.

In the feature extraction stage, the system captures distributional, structural, and stylistic properties of the input text using 17 complementary features. The authors leverage a fine-tuned ModernBERT model to extract neural detection probabilities, which capture contextual likelihood signals and semantic fluency patterns. This neural score is combined with Conditional Probability Curvature (CPC), a statistical signal adapted from Fast-DetectGPT. CPC quantifies the second-order variation in token likelihood under local perturbations, distinguishing between the irregular probability landscapes of human writing and the smoother profiles of machine-generated text.
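The CPC statistic can be computed analytically from a proxy model's next-token logits, in the spirit of Fast-DetectGPT: the observed log-likelihood is standardized against the mean and variance of log-likelihood under conditional sampling. This is a minimal sketch assuming logits are already available as a tensor; in the paper they come from gpt-neo-1.3B, which is omitted here for brevity.

```python
import torch

def conditional_probability_curvature(logits: torch.Tensor,
                                      token_ids: torch.Tensor) -> float:
    """CPC-style score from next-token logits.

    logits:    (seq_len, vocab) proxy-model scores at each position
    token_ids: (seq_len,) tokens actually observed at those positions
    """
    log_probs = torch.log_softmax(logits, dim=-1)   # (seq, vocab)
    probs = log_probs.exp()
    # log-likelihood of the observed continuation
    ll = log_probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1).sum()
    # analytical mean/variance of log-likelihood under conditional sampling
    mean = (probs * log_probs).sum(dim=-1)                 # E[log p] per position
    var = (probs * log_probs.pow(2)).sum(dim=-1) - mean.pow(2)
    mu, sigma = mean.sum(), var.sum().clamp_min(1e-8).sqrt()
    return ((ll - mu) / sigma).item()

# when the observed tokens are exactly the model's high-probability choices,
# the observed likelihood sits above the sampling mean and the score is positive
peaked = torch.zeros(4, 6)
peaked[:, 0] = 5.0
score = conditional_probability_curvature(peaked, torch.zeros(4, dtype=torch.long))
```

Under a uniform next-token distribution the observed likelihood matches the sampling mean and the score collapses toward zero, which is the intuition behind using curvature to separate "smooth" machine text from irregular human writing.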

Beyond neural and statistical signals, the system computes interpretable linguistic indicators. These include readability metrics such as Flesch Reading Ease, lexical diversity measures like Type-Token Ratio and hapax legomena ratio, and surface-level stylometric cues. The latter encompass punctuation counts, stopword ratios, cliché ratios, and maximum repeated n-gram frequencies to model repetition patterns and syntactic pacing.
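A few of these stylometric indicators are simple enough to sketch directly; the feature names and tokenization below are illustrative, not the authors' exact implementation.

```python
import re
from collections import Counter

def stylometric_features(text: str, n: int = 3) -> dict:
    """Illustrative surface features: lexical diversity, hapax ratio,
    punctuation count, and maximum repeated n-gram frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(total - n + 1))
    return {
        "type_token_ratio": len(counts) / total if total else 0.0,
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / total if total else 0.0,
        "punctuation_count": sum(ch in ".,;:!?" for ch in text),
        "max_ngram_repeat": max(ngrams.values(), default=0),
    }

feats = stylometric_features("The cat sat. The cat sat. The dog ran!")
```

On this toy input the repeated trigram "the cat sat" drives `max_ngram_repeat` to 2, the kind of repetition signal the system uses to model syntactic pacing.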

The aggregation of these diverse inputs is visualized in the system architecture below, where individual feature streams converge into a single decision-making unit.

All extracted features are subsequently fed into an Extreme Gradient Boosting (XGBoost) classifier. This meta-classifier learns nonlinear interactions between the neural confidence scores, curvature statistics, and stylometric indicators to determine the final classification. To ensure transparency, the system employs a two-layer explanation mechanism. First, Shapley Additive Explanations (SHAP) are applied to quantify the contribution of each feature to the Boosted Tree decision, providing both local and global interpretability. Second, a Large Language Model (specifically Google Gemma-3-27b-it) translates these mathematical attributions into structured, natural-language rationales. This LLM-based explainer receives the raw text, prediction probability, and feature importance scores to generate concise, user-friendly evidence supporting the final decision.
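The hand-off from SHAP attributions to the LLM explainer can be sketched as a prompt-assembly step: the top-magnitude attributions, the prediction probability, and the raw text are formatted into a structured request. The function name, prompt wording, and the convention that positive SHAP values push toward the AI class are assumptions for illustration; the paper's actual prompt to Gemma-3-27b-it is not published here.

```python
def build_rationale_prompt(text: str, p_ai: float,
                           shap_values: dict, top_k: int = 3) -> str:
    """Format raw text, prediction probability, and top SHAP attributions
    into a structured prompt for an LLM explainer (hypothetical format)."""
    top = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    evidence = "\n".join(
        f"- {name}: {'pushes toward AI' if v > 0 else 'pushes toward human'}"
        f" (SHAP={v:+.3f})"
        for name, v in top
    )
    return (
        f"Text under review:\n{text}\n\n"
        f"Classifier P(AI-generated) = {p_ai:.2f}\n"
        f"Top feature attributions:\n{evidence}\n\n"
        "Explain the verdict in plain language, citing only this evidence."
    )

prompt = build_rationale_prompt(
    "Sample input...", 0.91,
    {"cpc": -1.2, "modernbert_score": 0.8, "type_token_ratio": 0.1},
)
```

Restricting the explainer to the listed evidence keeps the natural-language rationale grounded in the classifier's actual attributions rather than the LLM's own judgment of the text.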

Experiment

  • Performance evaluation demonstrates that combining complementary feature families yields substantial gains over single-signal baselines, with the full ensemble achieving the highest accuracy and F1 score by integrating distinct aspects of human versus machine-generated text.
  • Curvature features alone prove most effective among individual signals for minimizing false positives, while stylometric features provide strong overall performance and ModernBERT features offer higher recall at the cost of precision.
  • Feature importance analysis identifies Conditional Probability Curvature, Type-Token Ratio, and ModernBERT scores as the primary drivers of the classifier's decisions.
  • The influence of curvature on classification is non-linear: positive values strongly indicate human authorship. The Type-Token Ratio, by contrast, follows a sigmoid pattern, with a threshold separating vocabulary-rich samples from the rest.
  • ModernBERT scores function as a confirmatory indicator, contributing significantly to the decision only at high confidence levels rather than providing a continuously graded signal.
