
NOTAI.AI: Explainable Detection of Machine-Generated Text via Curvature and Feature Attribution

Oleksandr Marchenko Breneur Adelaide Danilov Aria Nourbakhsh Salima Lamsiyah

Abstract

We present NOTAI.AI, an explainable framework for detecting machine-generated text that extends Fast-DetectGPT by integrating curvature-based signals with neural and stylometric features in a supervised setting. The system combines 17 interpretable features, including Conditional Probability Curvature, the ModernBERT detector score, readability metrics, and stylometric indicators, within a gradient-boosted tree meta-classifier (XGBoost) to determine whether a text was written by a human or generated by AI. In addition, NOTAI.AI applies Shapley Additive Explanations (SHAP) to provide both local and global feature-level attributions. These attributions are translated into structured natural-language rationales by an explanation layer based on a large language model (LLM), enabling user-oriented interpretability. The system is deployed as an interactive web application that supports real-time analysis, visual feature inspection, and the presentation of structured evidence. Through a web interface, users can enter text and trace how neural and statistical signals influence the final decision. The source code and a demonstration video are publicly available to ensure reproducibility.

One-sentence Summary

Researchers from the University of Luxembourg present NOTAI.AI, an explainable framework that enhances Fast-DetectGPT by fusing curvature-based signals with neural and stylometric features in an XGBoost classifier. The system translates SHAP attributions into natural-language rationales via an LLM layer, offering real-time, interpretable detection of machine-generated text through an interactive web application.

Key Contributions

  • NOTAI.AI addresses the opacity of existing detectors by extending Fast-DetectGPT to integrate curvature-based signals with neural and stylometric features in a supervised framework.
  • The system employs an XGBoost meta-classifier on 17 interpretable features and uses SHAP to generate local and global attributions that are translated into structured natural-language rationales.
  • Evaluated on a balanced subset of the RAID benchmark, the ensemble achieves higher performance than individual component models and is deployed as an interactive web application for real-time analysis.

Introduction

The rapid adoption of large language models has transformed industries such as education and journalism but has created urgent challenges regarding text authenticity and information integrity. While prior detection methods rely on probability curvature or supervised neural signals, they often suffer from performance degradation under domain shift and produce opaque scores that lack actionable justification for end users. To address these gaps, the authors present NOTAI.AI, an explainable detection system that fuses curvature-based signals with neural and stylometric features within a supervised framework. They leverage an XGBoost meta-classifier and SHAP values to generate both local and global feature attributions, which are then converted into natural-language rationales and displayed via an interactive web interface to ensure transparency and practical utility.

Dataset

  • Dataset Composition and Sources: The authors utilize the RAID dataset (Dugan et al., 2024), which originally contains a highly imbalanced non-adversarial subset with approximately 2.86% human-written text and 97.14% AI-generated text.

  • Balancing Strategy: To prevent supervised detectors from overfitting to the majority class, the team constructs a balanced evaluation set with a 1:1 human-to-AI ratio. They retain all human-written instances and downsample the AI-generated portion in a stratified manner to ensure equal representation across different generator models.

  • Processing and Reproducibility: Sampling is performed without replacement using a fixed seed (random_state=42) to guarantee reproducibility. The final balanced dataset is created by concatenating the full human subset with the per-generator sampled AI subsets and resetting the indices.

  • Feature Construction and Usage: For training and evaluation, the authors precompute input features to create a feature-augmented version of the dataset. They employ gpt-neo-1.3B by EleutherAI as a proxy language model to compute CPC values for these features.
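The balancing procedure described above can be sketched in pandas. This is a minimal illustration, not the authors' code; the column names `label` and `model` are assumptions, while the fixed seed (`random_state=42`), sampling without replacement, per-generator stratification, and index reset follow the description in the text.

```python
import pandas as pd

def balance_raid(df: pd.DataFrame) -> pd.DataFrame:
    """Build a 1:1 human-to-AI subset: keep all human rows, then
    downsample the AI rows per generator model without replacement."""
    human = df[df["label"] == "human"]
    ai = df[df["label"] == "ai"]
    # give each generator an equal share of the AI half
    per_model = len(human) // ai["model"].nunique()
    sampled = ai.groupby("model", group_keys=False).apply(
        lambda g: g.sample(n=min(per_model, len(g)),
                           replace=False, random_state=42)
    )
    # concatenate the full human subset with the stratified AI sample
    return pd.concat([human, sampled]).reset_index(drop=True)
```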

Method

The NOTAI.AI framework operates as a hybrid, explainable system designed to detect machine-generated text through a four-stage pipeline: Extract, Decide, Explain, and Present. The core of the architecture relies on aggregating diverse signals—neural, statistical, and stylometric—into a unified meta-classification model.

In the feature extraction stage, the system captures distributional, structural, and stylistic properties of the input text using 17 complementary features. The authors leverage a fine-tuned ModernBERT model to extract neural detection probabilities, which capture contextual likelihood signals and semantic fluency patterns. This neural score is combined with Conditional Probability Curvature (CPC), a statistical signal adapted from Fast-DetectGPT. CPC quantifies the second-order variation in token likelihood under local perturbations, distinguishing between the irregular probability landscapes of human writing and the smoother profiles of machine-generated text.
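The curvature signal can be sketched as follows, assuming per-position log-softmax outputs from the proxy LM are already available. This follows the analytic sampling-free estimate popularized by Fast-DetectGPT (observed log-likelihood standardized by its expectation and variance under the model's own conditional distributions); it is an illustration, not the authors' exact implementation.

```python
import numpy as np

def conditional_probability_curvature(log_probs: np.ndarray,
                                      token_ids: np.ndarray) -> float:
    """Analytic curvature estimate in the style of Fast-DetectGPT.

    log_probs: (T, V) log-softmax outputs of a proxy LM at each position.
    token_ids: (T,) ids of the tokens actually observed.
    """
    probs = np.exp(log_probs)
    # log-likelihood of the observed continuation
    ll = log_probs[np.arange(len(token_ids)), token_ids].sum()
    # expected log-likelihood and its variance under the model's own
    # conditional distribution at each position
    mean_t = (probs * log_probs).sum(axis=-1)
    var_t = (probs * log_probs ** 2).sum(axis=-1) - mean_t ** 2
    return float((ll - mean_t.sum()) / np.sqrt(var_t.sum()))
```

Text whose tokens are more likely than the model expects yields a positive score; human writing, with its more irregular probability landscape, tends toward lower values.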

Beyond neural and statistical signals, the system computes interpretable linguistic indicators. These include readability metrics such as Flesch Reading Ease, lexical diversity measures like Type-Token Ratio and hapax legomena ratio, and surface-level stylometric cues. The latter encompasses punctuation counts, stopword ratios, cliché ratios, and maximum repeated n-gram frequencies to model repetition patterns and syntactic pacing.
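Several of these surface-level indicators can be computed with plain string processing. The sketch below is illustrative, with a toy `STOPWORDS` list and a simple word-level tokenizer as assumptions; the paper's exact tokenization and stopword inventory are not specified here.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def stylometric_features(text: str, n: int = 3) -> dict:
    """Surface-level cues: lexical diversity, stopword share,
    punctuation counts, and repeated n-gram frequency."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    counts = Counter(tokens)
    ngrams = Counter(zip(*(tokens[i:] for i in range(n))))
    return {
        "type_token_ratio": len(counts) / max(len(tokens), 1),
        "hapax_ratio": sum(1 for c in counts.values() if c == 1)
                       / max(len(counts), 1),
        "stopword_ratio": sum(counts[w] for w in STOPWORDS)
                          / max(len(tokens), 1),
        "punctuation_count": sum(text.count(p) for p in ".,;:!?"),
        "max_repeated_ngram": max(ngrams.values(), default=0),
    }
```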

The aggregation of these diverse inputs is visualized in the system architecture below, where individual feature streams converge into a single decision-making unit.

All extracted features are subsequently fed into an Extreme Gradient Boosting (XGBoost) classifier. This meta-classifier learns nonlinear interactions between the neural confidence scores, curvature statistics, and stylometric indicators to determine the final classification. To ensure transparency, the system employs a two-layer explanation mechanism. First, Shapley Additive Explanations (SHAP) are applied to quantify the contribution of each feature to the Boosted Tree decision, providing both local and global interpretability. Second, a Large Language Model (specifically Google Gemma-3-27b-it) translates these mathematical attributions into structured, natural-language rationales. This LLM-based explainer receives the raw text, prediction probability, and feature importance scores to generate concise, user-friendly evidence supporting the final decision.
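The hand-off from SHAP to the LLM explainer can be sketched as a prompt-building step: the explainer receives the prediction probability and the highest-magnitude attributions, ranked by absolute SHAP value. The function name, field wording, and sign convention (positive SHAP pushing toward the machine-generated class) are illustrative assumptions, not the authors' exact prompt.

```python
def build_explainer_prompt(text, proba, shap_values, feature_names, top_k=3):
    """Turn SHAP attributions into a structured prompt for the
    LLM explanation layer (e.g. Gemma-3-27b-it)."""
    # rank features by magnitude of contribution
    ranked = sorted(zip(feature_names, shap_values),
                    key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    evidence = "\n".join(
        f"- {name}: {'supports AI' if v > 0 else 'supports human'} "
        f"(SHAP {v:+.3f})"
        for name, v in ranked
    )
    return (
        f"Classifier P(machine-generated) = {proba:.2f}.\n"
        f"Top contributing features:\n{evidence}\n"
        f"Text excerpt: {text[:200]}\n"
        "Write a concise rationale for the decision."
    )
```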

Experiment

  • Performance evaluation demonstrates that combining complementary feature families yields substantial gains over single-signal baselines, with the full ensemble achieving the highest accuracy and F1 score by integrating distinct aspects of human versus machine-generated text.
  • Curvature features alone prove most effective among individual signals for minimizing false positives, while stylometric features provide strong overall performance and ModernBERT features offer higher recall at the cost of precision.
  • Feature importance analysis identifies Conditional Probability Curvature, Type-Token Ratio, and ModernBERT scores as the primary drivers of the classifier's decisions.
  • The influence of curvature on classification is non-linear: positive values strongly indicate human authorship. The Type-Token Ratio, by contrast, follows a sigmoid pattern, with a specific threshold separating vocabulary-rich samples.
  • ModernBERT scores function as a confirmatory indicator, contributing significantly to the decision only at high confidence levels rather than providing a continuously graded signal.
