2 months ago

Songyang Liu Chaozhuo Li Chenxu Wang Jinyu Hou Zejian Chen Litian Zhang Zheng Liu Qiwei Ye Yiming Hei Xi Zhang

Abstract

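OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities such as tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, turning model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and the execution of malicious third-party skills. The security measures currently available in the OpenClaw ecosystem remain highly fragmented: each targets only an isolated stage of the agent lifecycle, and none provides comprehensive protection. To close this gap, we propose ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) Skill-based protection operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) Plugin-based protection acts as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring across the execution pipeline. (3) Watcher-based protection introduces a novel, decoupled system-level security middleware that continuously verifies how the agent's state evolves. Because it is not coupled to the agent's internal logic, it can intervene in execution in real time, for example by halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm has strong potential as a foundational building block for securing next-generation autonomous agent systems. Comprehensive qualitative and quantitative evaluations across diverse threat scenarios demonstrate the effectiveness and robustness of ClawKeeper. We release the associated source code.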

One-sentence Summary

Researchers from Beijing University of Posts and Telecommunications and the Beijing Academy of Artificial Intelligence propose ClawKeeper, a unified security framework for OpenClaw agents that integrates skills, plugins, and a novel decoupled Watcher to enable real-time, adaptive defense against system-level threats while resolving the safety-utility tradeoff.

Key Contributions

The paper introduces ClawKeeper, a real-time security framework that integrates multi-dimensional protection across three architectural layers to address fragmented safety measures in the OpenClaw ecosystem. This unified approach combines instruction-level policy injection, runtime enforcement, and decoupled system monitoring to provide holistic coverage throughout the agent lifecycle.
A novel Watcher-based protection mechanism is presented as a standalone external middleware that verifies agent state evolution and enables real-time intervention without coupling to internal logic. This design separates safety oversight from task execution, allowing the system to halt high-risk actions or enforce human confirmation while avoiding the traditional safety-utility tradeoff.
Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of the framework across diverse threat scenarios, including sensitive data leakage and malicious skill execution. The work validates that this three-layer architecture outperforms existing point defenses by adapting to emerging threats and providing continuous behavioral monitoring.

Introduction

As autonomous agents such as OpenClaw evolve into operating-system-like environments with direct access to local files and shell commands, they introduce critical security risks: model errors can escalate into system-level threats such as data leakage and privilege abuse. Prior security measures suffer from fragmented coverage that addresses only isolated lifecycle stages, and they also struggle with the safety-utility tradeoff, rely on reactive post-hoc analysis, and use static defenses that cannot adapt to the agent's self-evolving nature. To address these gaps, the authors present ClawKeeper, a unified real-time security framework that integrates multi-dimensional protection across three layers: instruction-level Skill policies, runtime Plugin enforcement, and a novel decoupled Watcher middleware that enables proactive intervention and separates safety oversight from task execution without coupling to the agent's internal logic.

Dataset

The authors construct a benchmark to assess the security capabilities of ClawKeeper, comprising seven safety task categories aligned with the OWASP Agentic Security Initiative and open-source defense taxonomies.
Each of the seven categories contains 20 adversarial instances, split equally into 10 simple and 10 complex examples.
Human annotators independently score every instance to determine whether the defense succeeds, following the evaluation protocol of AgentSafetyBench.
The dataset serves as a systematic evaluation tool rather than a training corpus; no training splits or mixture ratios are specified.
Representative examples and definitions for each category are summarized in Table 4 of the paper.
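The benchmark's shape and its per-category success metric can be made concrete with a small sketch. Category identifiers such as `cat_1` are placeholders (the real names and definitions are in Table 4), and `defense_success_rate` is a hypothetical helper that takes the fraction of instances annotators marked as defended; the paper's exact aggregation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

NUM_CATEGORIES = 7
SIMPLE_PER_CATEGORY = 10
COMPLEX_PER_CATEGORY = 10


@dataclass(frozen=True)
class Instance:
    category: str    # placeholder id such as "cat_1"; real names are in Table 4
    difficulty: str  # "simple" or "complex"
    index: int


def build_benchmark() -> list[Instance]:
    """Enumerate the 7 x (10 simple + 10 complex) = 140 instance slots."""
    instances = []
    for c in range(1, NUM_CATEGORIES + 1):
        for difficulty, count in (("simple", SIMPLE_PER_CATEGORY),
                                  ("complex", COMPLEX_PER_CATEGORY)):
            for i in range(count):
                instances.append(Instance(f"cat_{c}", difficulty, i))
    return instances


def defense_success_rate(labels: dict[Instance, bool]) -> dict[str, float]:
    """Per-category fraction of instances that annotators judged as defended."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for inst, defended in labels.items():
        totals[inst.category] += 1
        successes[inst.category] += int(defended)
    return {cat: successes[cat] / totals[cat] for cat in totals}
```

The labels fed to `defense_success_rate` would come from the human annotation step; any values used to exercise it are toy inputs, not results from the paper.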

Method

The authors propose ClawKeeper, a comprehensive security framework designed to unify three complementary protection paradigms into a multi-layered architecture for the OpenClaw ecosystem. This system integrates skill-based context enforcement, plugin-based runtime hardening, and an independent Watcher for external behavior verification. Refer to the framework diagram for a high-level view of how these three pillars converge into a unified security core.

The first layer, Skill-based Protection, operates at the instruction level where the agent constructs its inference context. Security rules are defined as structured Markdown documents that the agent can directly interpret and enforce. This design allows for low-cost deployment without modifying the underlying framework. The protection mechanism covers two dimensions: system-level constraints for diverse operating systems like Windows and Linux, and software-level constraints for communication platforms such as Telegram or Feishu. To enhance robustness, the framework incorporates inspection scripts that perform scheduled security scanning and interaction summarization. As shown in the figure below, this approach allows policies to be continuously applied throughout the entire interaction lifecycle.
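A minimal sketch of instruction-level policy injection, assuming a hypothetical Markdown layout in which each rule set sits under a `## system: <os>` or `## software: <platform>` heading. `parse_policy` and `build_context` are illustrative names, not ClawKeeper's API; the sketch only shows how the rules for the current OS and messaging platform could be selected and prepended to the agent's context without modifying the runtime.

```python
import re

# Hypothetical policy document; real ClawKeeper skill files may be organized differently.
POLICY_MD = """\
# Security Policy
## system: linux
- Never run `rm -rf` outside the workspace directory.
## system: windows
- Never modify registry keys under HKLM.
## software: telegram
- Do not forward local files to a chat without user confirmation.
"""

# Matches section headings such as "## system: linux" on their own line.
SECTION_RE = re.compile(r"^## (system|software): (\S+)[ \t]*$", re.MULTILINE)


def parse_policy(md: str) -> dict[tuple[str, str], str]:
    """Split the Markdown policy into {(layer, target): rule text} sections."""
    matches = list(SECTION_RE.finditer(md))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next heading or the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(md)
        sections[(m.group(1), m.group(2).lower())] = md[m.end():end].strip()
    return sections


def build_context(md: str, os_name: str, channel: str, user_msg: str) -> str:
    """Prepend only the rules that apply to the current OS and channel."""
    sections = parse_policy(md)
    wanted = (("system", os_name.lower()), ("software", channel.lower()))
    policy = "\n".join(sections[key] for key in wanted if key in sections)
    return f"[SECURITY POLICY]\n{policy}\n[/SECURITY POLICY]\n\n{user_msg}"
```

Keeping the policy as plain Markdown mirrors the low-cost deployment argument: changing a rule means editing a document the agent reads, not rebuilding the framework.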

The second layer, Plugin-based Protection, functions as a hard-coded enforcement layer within the OpenClaw runtime. Unlike prompt-level defenses, this module affords direct control over system behavior to ensure comprehensive security coverage. The plugin serves as a security auditor, scanner, and hardening enforcer. It executes detailed Threat Detection to identify misconfigurations and known vulnerabilities aligned with OWASP Agentic Security guidelines. To maintain integrity, the Configuration Protection module generates cryptographic hash backups of critical operational files. Furthermore, a Behavior Scanning mechanism analyzes historical execution flows to detect latent threat patterns such as prompt injections or dangerous commands. The figure below illustrates the specific modules, including Threat Detection, Configuration Protection, Monitoring, Behavior Scanning, and Hardening.
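The Configuration Protection idea, hash-verified backups of critical files, can be sketched with the standard library alone. This assumes SHA-256 as the hash and a flat backup directory; the summary does not specify ClawKeeper's hash function, file list, or restore policy, so `snapshot` and `verify_and_restore` are hypothetical helpers.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(files: list[Path], backup_dir: Path) -> dict[str, str]:
    """Copy each critical file into backup_dir and record its digest.

    Backups are keyed by file name, so this sketch assumes names are unique.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for f in files:
        shutil.copy2(f, backup_dir / f.name)
        manifest[str(f)] = sha256_of(f)
    return manifest


def verify_and_restore(manifest: dict[str, str], backup_dir: Path) -> list[str]:
    """Restore every file that is missing or whose digest no longer matches."""
    restored = []
    for path_str, expected in manifest.items():
        path = Path(path_str)
        if path.exists() and sha256_of(path) == expected:
            continue  # file is intact
        backup = backup_dir / path.name
        # Refuse to restore from a backup that was itself tampered with.
        if sha256_of(backup) != expected:
            raise RuntimeError(f"backup for {path} is also corrupted")
        shutil.copy2(backup, path)
        restored.append(path_str)
    return restored
```

Checking the backup's digest before restoring matters: otherwise an attacker who edits both the live file and its copy could turn the restore step into a way of persisting the change.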

The third layer, Watcher-based Protection, introduces an independent external agent that functions as a dedicated security auditor. This decoupled architecture addresses the limitations of tightly coupled safety components by separating task execution from safety enforcement. The Watcher is implemented as a separate OpenClaw instance equipped with specialized monitoring skills. It communicates with the task-executing agent via a persistent WebSocket connection to perform real-time safety diagnosis. If a potentially unsafe trajectory is detected, the Watcher signals the agent to pause and seek user confirmation. The framework supports flexible deployment configurations, including Local Deployment for privacy-sensitive scenarios and Cloud Deployment for centralized governance. As shown in the figure below, the Watcher provides observability, trigger awareness, and execution intervention while maintaining a decoupled design.
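The interaction between the task agent and the Watcher can be sketched as a request-and-verdict loop. In this illustrative sketch, two `asyncio.Queue` objects stand in for the persistent WebSocket connection, the JSON fields (`step`, `tool`, `verdict`) and the `HIGH_RISK_TOOLS` set are hypothetical, and the `confirm` callback models the human who approves or declines a paused action. A real deployment would replace the queues with a WebSocket client and server and the set lookup with the Watcher's skill-driven diagnosis.

```python
import asyncio
import json
from typing import Callable, Optional

HIGH_RISK_TOOLS = {"shell", "file_delete"}  # hypothetical risk list


async def watcher(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Diagnose each reported step and reply with a verdict until the channel closes."""
    while True:
        raw: Optional[str] = await inbox.get()
        if raw is None:  # sentinel: the task agent has finished
            return
        event = json.loads(raw)
        verdict = "pause" if event["tool"] in HIGH_RISK_TOOLS else "allow"
        await outbox.put(json.dumps({"step": event["step"], "verdict": verdict}))


async def task_agent(plan: list[str], to_watcher: asyncio.Queue,
                     from_watcher: asyncio.Queue,
                     confirm: Callable[[str], bool]) -> list[str]:
    """Report each planned tool call and wait for the Watcher's verdict before acting."""
    executed = []
    for step, tool in enumerate(plan):
        await to_watcher.put(json.dumps({"step": step, "tool": tool}))
        reply = json.loads(await from_watcher.get())
        if reply["verdict"] == "pause" and not confirm(tool):
            continue  # the user declined, so the high-risk call is skipped
        executed.append(tool)  # stand-in for actually running the tool
    await to_watcher.put(None)
    return executed


async def run(plan: list[str], confirm: Callable[[str], bool]) -> list[str]:
    """Wire both sides together over the in-memory channel."""
    to_w: asyncio.Queue = asyncio.Queue()
    from_w: asyncio.Queue = asyncio.Queue()
    watcher_task = asyncio.create_task(watcher(to_w, from_w))
    executed = await task_agent(plan, to_w, from_w, confirm)
    await watcher_task
    return executed
```

Because the task agent only reports events and waits for verdicts, all safety logic lives in the `watcher` coroutine and can change without touching the agent, which is the kind of decoupling the Watcher design aims for.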

Experiment

Comparative evaluation against seven open-source baselines validates that ClawKeeper's unified three-layer architecture achieves significantly higher defense success rates across all seven safety task categories, whereas existing methods suffer from severe coverage fragmentation and only moderate effectiveness within their limited scopes.
Self-evolution experiments demonstrate that the Watcher component continuously improves its defense capabilities by processing new adversarial cases, increasing success rates through dynamic updates to monitoring skills and risk thresholds, a capability absent in static plugin or skill-based approaches.
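A toy model of this kind of self-evolution, assuming a Watcher that scores actions with weighted substring rules: each missed adversarial case adds a new rule (standing in for a monitoring-skill update) and lowers the blocking threshold by a fixed step (standing in for a risk-threshold update). The class name, integer scores, starting rules, and update rule are all hypothetical; the summary does not describe the paper's actual update procedure.

```python
from dataclasses import dataclass, field


@dataclass
class EvolvingWatcher:
    """Toy Watcher whose rule set and risk threshold evolve with new attack cases.

    Risk scores are integers in [0, 100]; the scoring and update rules are
    illustrative assumptions, not ClawKeeper's mechanism.
    """
    threshold: int = 80
    step: int = 5
    floor: int = 50  # never become so strict that everything is blocked
    patterns: dict[str, int] = field(
        default_factory=lambda: {"rm -rf": 100, "chmod 777": 70}
    )

    def risk(self, action: str) -> int:
        """Highest weight among the known patterns that occur in the action."""
        return max((w for p, w in self.patterns.items() if p in action), default=0)

    def blocks(self, action: str) -> bool:
        return self.risk(action) >= self.threshold

    def learn(self, missed_attack: str, signature: str) -> None:
        """Absorb one adversarial case that slipped past the current rules."""
        if self.blocks(missed_attack):
            return  # already covered; nothing to update
        # Monitoring-skill update: add the attack's signature as a new rule.
        self.patterns[signature] = 100
        # Risk-threshold update: become slightly more conservative.
        self.threshold = max(self.floor, self.threshold - self.step)
```

In this toy setup, two updates lower the threshold from 80 to 70, so the medium-risk `chmod 777` rule starts firing as a side effect; the `floor` bounds how far that drift can go.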
Qualitative case studies confirm that skill-based protection effectively enforces context-aware security protocols at system and software perimeters while enabling autonomous periodic self-auditing without human intervention.
Plugin-based assessments validate that the Hardening module prevents sensitive data exfiltration by injecting risk-aware rules into core configurations, while integrated scanners successfully identify latent vulnerabilities and provide actionable remediation steps.
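Rule injection into a core configuration can be pictured as an idempotent merge. This is a sketch under stated assumptions: the `security_rules` key, the rule wording, and the `harden` helper are hypothetical, since the summary does not give OpenClaw's configuration schema or the Hardening module's actual rules.

```python
import copy

# Hypothetical risk-aware rules aimed at data exfiltration.
HARDENING_RULES = (
    "Do not transmit the contents of credential files (for example .env or SSH keys) to external services.",
    "Ask the user before uploading any local file to a remote endpoint.",
)


def harden(config: dict, rules: tuple[str, ...] = HARDENING_RULES) -> dict:
    """Return a hardened copy of config; applying it twice adds nothing new."""
    hardened = copy.deepcopy(config)  # never mutate the caller's config in place
    existing = hardened.setdefault("security_rules", [])
    for rule in rules:
        if rule not in existing:  # skip rules that are already present
            existing.append(rule)
    return hardened
```

Idempotence is the useful property here: a hardening pass that runs on every start-up or audit should not keep appending duplicate rules.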
Watcher-based scenarios illustrate the system's ability to intercept unsafe behaviors in real time, including blocking dangerous command execution, halting excessive tool chaining, and stopping automated retry loops after upstream failures, thereby enforcing strict human-in-the-loop safety policies.
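The three interventions listed above can be pictured as simple guards evaluated before each tool call. The regular expression, the `MAX_CHAIN` and `MAX_RETRIES` limits, and the helper names are assumptions for illustration only; the paper describes the Watcher as a separate OpenClaw instance driven by monitoring skills, so its real criteria are likely richer than fixed rules.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative patterns and limits, not taken from the paper.
DANGEROUS_CMD = re.compile(r"\brm\s+-rf\s+/|\bmkfs\b|\bdd\s+if=|>\s*/dev/sd")
MAX_CHAIN = 5    # consecutive tool calls allowed without a user turn
MAX_RETRIES = 2  # identical retries allowed after an upstream failure


@dataclass
class GuardState:
    chain: int = 0
    last_failed_call: Optional[tuple[str, str]] = None
    retries: int = 0


def check(state: GuardState, tool: str, args: str) -> Optional[str]:
    """Return a reason to halt and ask the user, or None to let the call proceed."""
    if tool == "shell" and DANGEROUS_CMD.search(args):
        return "dangerous command"
    state.chain += 1
    if state.chain > MAX_CHAIN:
        return "excessive tool chaining"
    if state.last_failed_call == (tool, args):
        state.retries += 1
        if state.retries > MAX_RETRIES:
            return "retry loop after upstream failure"
    return None


def record_result(state: GuardState, tool: str, args: str, ok: bool) -> None:
    """Track failures so that repeated identical retries can be detected."""
    if ok:
        state.last_failed_call, state.retries = None, 0
    elif state.last_failed_call != (tool, args):
        state.last_failed_call, state.retries = (tool, args), 0


def user_turn(state: GuardState) -> None:
    """A human message resets the chaining counter."""
    state.chain = 0
```

Each non-None reason corresponds to a point where the Watcher would pause the agent and hand control back to the user rather than silently fail the task.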

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers