2 hours ago

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering \citep{rank2026posttrainbench, novikov2025alphaevolve}. We show that an \emph{autoresearch}-style pipeline \citep{karpathy2026autoresearch} powered by Claude Code discovers novel white-box adversarial attack \textit{algorithms} that \textbf{significantly outperform all existing (30+) methods} in jailbreaking and prompt injection evaluations.Starting from existing attack implementations, such as GCG~\citep{zou2023universal}, the agent iterates to produce new algorithms achieving up to 40% attack success rate on CBRN queries against GPT-OSS-Safeguard-20B, compared to ≤10% for existing algorithms (\Cref{fig:teaser}, left).The discovered algorithms generalize: attacks optimized on surrogate models transfer directly to held-out models, achieving \textbf{100% ASR against Meta-SecAlign-70B} \citep{chen2025secalign} versus 56% for the best baseline (\Cref{fig:teaser}, middle). Extending the findings of~\cite{carlini2025autoadvexbench}, our results are an early demonstration that incremental safety and security research can be automated using LLM agents. White-box adversarial red-teaming is particularly well-suited for this: existing methods provide strong starting points, and the optimization objective yields dense, quantitative feedback.

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

2 hours ago

LLM

DeepSeek

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

HyperAI

2 hours ago

LLM

DeepSeek

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

Source PDF

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov Peter Romov Igor Shilov Yves-Alexandre de Montjoye Jonas Geiping Maksym Andriushchenko

Abstract

Build AI with AI

HyperAI Newsletters