SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Ziyu Ma Shidong Yang Yuxiang Ji Xucong Wang Yong Wang Yiming Hu Tongwen Huang Xiangxiang Chu

Abstract

Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage patterns, and failure modes are rediscovered repeatedly across different users, which prevents the system from improving through accumulated experience. Although interactions from different users provide complementary signals about when a skill succeeds or fails, existing systems lack a mechanism for converting these heterogeneous experiences into reliable skill updates. To address these problems, we present SkillClaw, a framework for collective skill evolution in multi-user agent ecosystems that treats interactions across users and over time as the primary signal for improving skills. SkillClaw continuously aggregates the trajectories generated during use and processes them with an autonomous evolver, which identifies recurring behavioral patterns and translates them into updates to the skill set, either by refining existing skills or by extending them with new capabilities. The resulting skills are maintained in a shared repository and synchronized across users, so improvements discovered in one context propagate system-wide without any additional effort from users. By integrating multi-user experience into continuous skill updates, SkillClaw enables cross-user knowledge transfer and cumulative capability improvement. Experiments on WildClawBench show that, even with limited interaction and feedback, the system significantly improves the performance of Qwen3-Max in real-world agent scenarios.

One-sentence Summary

SkillClaw is a framework for collective skill evolution in multi-user agent ecosystems that utilizes an autonomous evolver to transform heterogeneous interaction trajectories into refined or extended skills, enabling system-wide knowledge transfer and cumulative capability improvements that significantly enhance Qwen3-Max performance on WildClawBench.

Key Contributions

  • The paper introduces SkillClaw, a framework designed for collective skill evolution within multi-user agent ecosystems. This framework enables the continuous transformation of interaction trajectories into shared evidence to facilitate system-wide capability growth.
  • The method utilizes an autonomous evolver that identifies recurring behavioral patterns from aggregated user data to refine existing skills or create new ones. This process allows improvements discovered in a single context to propagate through a shared repository to all users.
  • Experiments conducted on the WildClawBench benchmark demonstrate that SkillClaw significantly improves the performance of the Qwen3-Max model in real-world agent scenarios, even when provided with limited interaction and feedback.

Introduction

Large language model (LLM) agents rely on reusable skills to execute complex, multi-step workflows. While these skills are essential for coordinating tools and reasoning, current skill libraries remain largely static after deployment. Existing approaches often focus on local memory or individual session refinement, which prevents improvements discovered by one user from benefiting others. This lack of a collective mechanism means that similar failures and successful workarounds are repeatedly rediscovered across different users, hindering system-level capability growth.

The authors introduce SkillClaw, a framework for collective skill evolution within multi-user agent ecosystems. SkillClaw continuously aggregates interaction trajectories from many users to build a shared evidence base of successful patterns and recurring failure modes. An autonomous agentic evolver then analyzes this aggregated data to refine existing skills or create new ones through open-ended reasoning. Because these updates are synchronized through a shared repository, improvements discovered in a single context propagate system-wide, turning individual experiences into cumulative intelligence.

Dataset

Dataset overview

The authors utilize a specialized data framework centered around WildClawBench and a structured agent session repository to drive skill evolution. The dataset composition and processing details are as follows:

  • Benchmark Source: The evaluation is based on WildClawBench, a real-world agent benchmark containing 60 complex tasks. These tasks are distributed across six capability domains: productivity workflows, code execution, social interaction, retrieval, creative generation, and safety alignment.
  • Session Data Composition: The dataset includes pre-processed agent session JSON files. Each session contains a unique identifier, the associated task ID, the number of interaction turns, and aggregate statistics such as mean ORM scores, success or failure counts, and stability metrics.
  • Data Processing and Metadata:
    • Trajectory Truncation: To maintain compactness, step-by-step trajectories are truncated to approximately 400 characters per field. These traces include skill usage, tool call arguments, outcomes, and PRM/ORM scores.
    • Analytical Summarization: An LLM-generated summary (8 to 15 sentences) is appended to each session, detailing the agent's strategy, tool usage patterns, and skill effectiveness.
    • Skill History Construction: The authors maintain a versioned history for each skill. This includes snapshots of the skill documentation (SKILL.md) and corresponding evidence files that link specific session feedback to subsequent skill iterations.
  • Usage in Skill Evolution: The data is used within an Agentic Evolve Prompt framework. The system analyzes the session logs, specifically looking at recurring tool failures, representative PRM scores, and relevant task IDs to drive the iterative refinement of the skill library.
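
The session record described in the list above can be sketched as a small data structure. This is a minimal illustration, not the authors' schema: the class names, field names, and the choice to count one turn per recorded step are assumptions; only the roughly 400-character cap and the kinds of fields come from the text.

```python
from dataclasses import dataclass, field
from statistics import mean

MAX_FIELD_CHARS = 400  # approximate per-field cap mentioned in the text


def truncate(value: str, limit: int = MAX_FIELD_CHARS) -> str:
    """Cut a trajectory field down to at most `limit` characters."""
    return value if len(value) <= limit else value[:limit]


@dataclass
class Step:
    skill: str        # skill referenced at this step
    tool_args: str    # serialized tool-call arguments (truncated)
    outcome: str      # tool output or error text (truncated)
    prm_score: float  # step-level PRM score


@dataclass
class Session:
    session_id: str
    task_id: str
    steps: list[Step] = field(default_factory=list)
    orm_scores: list[float] = field(default_factory=list)  # ORM scores for the session
    summary: str = ""  # LLM-written summary (8 to 15 sentences in the source)

    def add_step(self, skill: str, tool_args: str, outcome: str, prm_score: float) -> None:
        # Truncate the free-text fields before storing them.
        self.steps.append(Step(skill, truncate(tool_args), truncate(outcome), prm_score))

    def stats(self) -> dict:
        # Assumption: one interaction turn per recorded step.
        return {
            "turns": len(self.steps),
            "mean_orm": mean(self.orm_scores) if self.orm_scores else None,
        }


session = Session("s1", "task-07")  # hypothetical identifiers
session.add_step("web-search", '{"q": "x"}', "e" * 1000, 0.5)
session.orm_scores.append(1.0)
```

Truncating at write time keeps every stored session compact, at the cost of losing the tail of long tool outputs.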

Method

The authors leverage a multi-stage framework to enable collective skill evolution across independently operating agents, forming a closed-loop system that transforms isolated interaction sessions into a shared, evolving skill repository. At the core of this architecture is the SkillClaw system, which operates through a centralized evolution engine that periodically processes interaction data from all agents. Each agent, upon completing a task, records its full interaction session—comprising the user prompt, the agent's actions (including tool calls), intermediate feedback, and the final response—and uploads it as structured evidence. This evidence is then aggregated and grouped by the skills referenced in each session, enabling cross-user analysis of skill performance under diverse conditions. The system's overall workflow proceeds through four main phases: interaction, evidence collection, evolution, and synchronization, forming a continuous loop where updated skills inform future interactions and generate new evidence.
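
The grouping step in the evidence-collection phase can be sketched as follows. This is a rough illustration under assumptions: sessions are plain dictionaries, and the keys `session_id`, `skills`, and `success` as well as the skill names are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def group_by_skill(sessions):
    """Map each skill name to the sessions that referenced it."""
    groups = defaultdict(list)
    for session in sessions:
        # A session is filed once per distinct skill, even if it used the skill repeatedly.
        for skill in set(session["skills"]):
            groups[skill].append(session)
    return dict(groups)


# Hypothetical sessions uploaded by three different users.
sessions = [
    {"session_id": "u1-a", "skills": ["save-report", "web-search"], "success": False},
    {"session_id": "u2-b", "skills": ["save-report"], "success": True},
    {"session_id": "u3-c", "skills": ["web-search", "web-search"], "success": True},
]
evidence = group_by_skill(sessions)
```

Grouping by skill rather than by user puts successes and failures of the same skill side by side, which is what makes cross-user comparison possible.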

System Architecture Overview

The central component of this framework is the agentic evolver, an LLM-based agent that operates within an agent harness. This harness provides the evolver with structured inputs—grouped session evidence, the current skill definition, and a set of permitted evolution actions—without constraining its reasoning. The evolver analyzes both successful and failed executions of a skill to diagnose root causes, then selects one of three actions: refine, create, or skip. For refinement, the evolver proposes targeted edits to correct identified errors or improve robustness, guided by conservative editing principles that preserve the original skill structure and only modify sections where evidence indicates deficiencies. For creation, the evolver identifies recurring, reusable procedures not covered by existing skills and generates a new skill, ensuring it serves a distinct purpose and compresses environment-specific knowledge. The skip action is taken when evidence is insufficient to justify modification. This joint analysis of success and failure patterns ensures that evolution is cumulative, preserving validated behaviors while correcting failures.
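
One way to picture the harness is as a thin wrapper that packages the three structured inputs and checks that the evolver's reply names a permitted action. The sketch below rests on assumptions: the JSON payload layout, the function names, and `stub_evolver` (a trivial stand-in for the LLM-based evolver) are all hypothetical, and the stub's rule is not how the real evolver reasons.

```python
import json
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    REFINE = "refine"
    CREATE = "create"
    SKIP = "skip"


def run_evolver(skill_name: str, skill_md: str, evidence: list,
                evolver: Callable[[str], str]) -> tuple[Action, Optional[str]]:
    """Send structured inputs to the evolver and validate the chosen action."""
    payload = json.dumps({
        "skill": skill_name,
        "current_definition": skill_md,
        "evidence": evidence,
        "allowed_actions": [a.value for a in Action],
    })
    reply = json.loads(evolver(payload))
    action = Action(reply["action"])  # raises ValueError for an action outside the permitted set
    if action is Action.SKIP:
        return action, None
    return action, reply["content"]


def stub_evolver(payload: str) -> str:
    """Hypothetical stand-in: skip without evidence, otherwise append a note."""
    data = json.loads(payload)
    if not data["evidence"]:
        return json.dumps({"action": "skip"})
    return json.dumps({"action": "refine",
                       "content": data["current_definition"] + "\nNote: check the output path."})


skipped = run_evolver("save-report", "# save-report", [], stub_evolver)
refined = run_evolver("save-report", "# save-report", [{"success": False}], stub_evolver)
```

The harness constrains only the shape of the inputs and outputs; how the evolver reaches its decision is left open, in line with the description above.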

Agentic Evolver Workflow

After the evolver generates candidate updates, a rigorous validation process ensures only improvements are deployed. During the nighttime, candidate skills are evaluated in real deployment environments using the same toolchain and task contexts as the original sessions. Both the original and evolved versions are executed, and their outcomes are compared based on task success and execution stability. Only updates that demonstrably improve performance are accepted and merged into the shared repository. This validation step enforces a monotonic deployment policy, preventing degradation and ensuring users always interact with the best validated skills. The updated repository is then synchronized back to all agents, completing the evolution loop and enabling the system to benefit from collective user experiences without requiring explicit coordination or manual intervention.
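
The monotonic deployment policy can be sketched as an acceptance gate. The source does not give the exact comparison rule, so the criterion used here (no worse on either metric, strictly better on at least one) is an assumption, as are the names `RunResult`, `accept_update`, and `merge_if_better` and the numeric form of the stability metric.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    success_rate: float  # fraction of validation tasks solved
    stability: float     # higher means more stable; placeholder metric


def accept_update(original: RunResult, candidate: RunResult) -> bool:
    """Accept only if the candidate is no worse on both metrics and better on one."""
    no_worse = (candidate.success_rate >= original.success_rate
                and candidate.stability >= original.stability)
    better = (candidate.success_rate > original.success_rate
              or candidate.stability > original.stability)
    return no_worse and better


def merge_if_better(repo: dict, name: str, candidate_md: str,
                    original: RunResult, candidate: RunResult) -> str:
    """Replace the stored skill only when validation shows an improvement."""
    if accept_update(original, candidate):
        repo[name] = candidate_md
    return repo[name]


repo = {"save-report": "v1"}  # hypothetical shared repository
after_good = merge_if_better(repo, "save-report", "v2", RunResult(0.5, 0.8), RunResult(0.7, 0.8))
after_bad = merge_if_better(repo, "save-report", "v3", RunResult(0.7, 0.8), RunResult(0.9, 0.6))
```

Under this rule a candidate that trades stability for success is rejected, so the repository never moves to a version that is worse on either measure.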

Experiment

The experiment employs a continuous day-night closed-loop setup where agents interact with users during the day and undergo skill evolution and validation at night. This process validates whether nightly updates to a shared skill pool can progressively resolve task-specific bottlenecks and improve system stability. The findings demonstrate that skill evolution follows heterogeneous trajectories across different categories, successfully transforming naive execution patterns into structured, reliable, and environment-aware workflows. Overall, the system demonstrates a robust ability to consolidate procedural knowledge, effectively addressing failures related to input reliability, multimodal pipeline organization, and real-world execution constraints.

The experiment shows a consistent improvement in performance across four categories over six days, with each category stabilizing after an initial gain. Results indicate that the system evolves by integrating validated skill updates, leading to enhanced user-facing capabilities in areas such as social interaction, search, creativity, and safety. Performance improves significantly on Day 2 in all categories and stabilizes thereafter. Social Interaction shows early and sharp gains, followed by sustained performance. Creative Synthesis and Safety & Alignment exhibit notable early improvements with subsequent stabilization.

User-side daytime results

The table presents performance gains across three custom queries after a single round of skill evolution. Results show significant improvements over baseline performance, with the largest gains observed in the basic extraction and save report tasks. Improvements are most pronounced in tasks involving procedural knowledge gaps. The average gain across all queries exceeds 40%, indicating consistent effectiveness of skill evolution. Save report achieves a perfect score after evolution, highlighting the resolution of specific environmental failures.

Controlled validation results

The evaluation tracks system performance across various functional categories and specific task queries to validate the effectiveness of skill evolution. The results demonstrate that integrating validated skill updates leads to consistent improvements in social interaction, creativity, and safety, with particularly significant gains in tasks addressing procedural knowledge gaps. Overall, the system demonstrates a capacity for rapid evolution and stabilization, successfully resolving specific environmental failures and enhancing general user-facing capabilities.

