SkillNet: Create, Evaluate, and Connect AI Skills

Abstract

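Current AI agents can flexibly invoke diverse tools and carry out complex tasks. However, the absence of systematic skill accumulation and transfer hinders their long-term progress. Without a unified mechanism for consolidating skills, agents often "reinvent the wheel," rediscovering solutions in isolated contexts without leveraging prior strategies. To overcome this limitation, we introduce SkillNet, an open infrastructure designed to create, evaluate, and organize AI skills at scale. SkillNet structures skills within a unified ontology. This ontology supports creating skills from heterogeneous sources, establishing rich relational connections between them, and evaluating them along multiple dimensions: Safety, Completeness, Executability, Maintainability, and Cost-awareness. The infrastructure integrates a repository of over 200,000 skills, an interactive platform, and a versatile Python toolkit. Experiments on ALFWorld, WebShop, and ScienceWorld show that SkillNet substantially improves agent performance across multiple backbone models. It increases average reward by 40% and reduces execution steps by 30%. By formalizing skills as evolvable, composable assets, SkillNet gives agents a robust foundation for moving from transient experience to durable mastery.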

One-sentence Summary

Researchers from Zhejiang University and major industry partners introduce SkillNet, an open infrastructure that unifies over 200,000 AI skills into a structured ontology. This system enables systematic skill consolidation and multi-dimensional evaluation, significantly boosting agent performance in complex task environments by preventing redundant learning.

Key Contributions

  • Current AI agents struggle to accumulate and transfer skills systematically, often rediscovering solutions in isolated contexts without leveraging prior strategies.
  • SkillNet addresses this by introducing an open infrastructure with a unified ontology that organizes over 200,000 skills and evaluates them across five dimensions including Safety, Completeness, and Executability.
  • Experimental results on ALFWorld, WebShop, and ScienceWorld show that SkillNet improves average agent rewards by 40% and reduces execution steps by 30% across multiple backbone models.
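The headline numbers above are percentage changes relative to a baseline. The summary does not say whether "40%" means a relative gain or absolute percentage points. The sketch below assumes relative change, computed as (treatment - baseline) / baseline. The reward and step values in the usage lines are hypothetical and are not results from the paper.

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Return (treatment - baseline) / baseline as a fraction (0.4 means +40%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative change")
    return (treatment - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the arithmetic.
    reward_gain = relative_change(baseline=0.50, treatment=0.70)
    step_change = relative_change(baseline=20, treatment=14)
    print(f"average reward: {reward_gain:+.0%}, execution steps: {step_change:+.0%}")
```

Under this reading, a drop from 20 to 14 steps is a 30% reduction. Moving from a reward of 0.50 to 0.70 is a 40% relative gain, even though it is only 20 points in absolute terms.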

Introduction

As AI agents evolve to handle complex, long-horizon tasks, their progress is currently stalled by an inability to systematically accumulate and transfer skills, forcing them to repeatedly rediscover solutions in isolated contexts. Prior approaches rely on manual engineering or transient in-context learning, while existing skill repositories suffer from static curation, lack of rigorous quality control, and poor composability that prevents scalable reuse. To address these gaps, the authors introduce SkillNet, an open infrastructure that structures over 200,000 skills into a unified ontology with rich relational connections and a multi-dimensional evaluation framework covering safety, executability, and cost. This system transforms fragmented experience into durable, composable assets, enabling agents to achieve significant performance gains by leveraging a robust foundation for cumulative learning rather than episodic trial and error.

Dataset

  • Dataset Composition and Sources: The authors construct a versatile skill repository by aggregating heterogeneous data from four primary sources: execution trajectories and conversational logs, open-source GitHub repositories, semi-structured documents (PDF, PowerPoint, Word), and direct natural language user prompts.

  • Key Details for Each Subset: The initial pool contains over 200,000 candidate skills derived from open internet resources, automated pipelines, and community contributions. A rigorous multi-stage filtering and evaluation process curates this pool down to a final repository of more than 150,000 high-quality skills, and the repository continues to expand.

  • Data Usage and Processing: The authors employ a fully automated pipeline powered by Large Language Models to transform raw inputs into reusable structured agent skills. Users can customize the underlying models, and the system supports continuous expansion through open resources and community submissions.

  • Quality Assurance and Metadata: To ensure reliability, the team implements automated checks across five dimensions: safety, completeness, executability, maintainability, and cost-awareness. They also conduct periodic manual audits via random sampling and use the data to analyze skill relations, uncovering dependencies, hierarchical compositions, and functional similarities.
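The dataset description above implies one common record shape for candidates from all four sources, plus a periodic manual audit by random sampling. The sketch below is a minimal illustration under that assumption. `SourceType`, `CandidateSkill`, the field names, and the 0-1 score scale are hypothetical and are not SkillNet's actual schema or toolkit API.

```python
import random
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    """The four input sources listed above (enum names are hypothetical)."""
    TRAJECTORY = "trajectory"  # execution trajectories and conversational logs
    GITHUB = "github"          # open-source GitHub repositories
    DOCUMENT = "document"      # semi-structured PDF, PowerPoint, Word files
    PROMPT = "prompt"          # direct natural-language user prompts


# The five quality dimensions named in the text (identifiers are hypothetical).
DIMENSIONS = ("safety", "completeness", "executability",
              "maintainability", "cost_awareness")


@dataclass
class CandidateSkill:
    name: str
    source: SourceType
    description: str
    # Per-dimension scores from the automated checks (hypothetical 0-1 scale).
    scores: dict[str, float] = field(default_factory=dict)


def sample_for_audit(pool: list[CandidateSkill], k: int,
                     seed: int = 0) -> list[CandidateSkill]:
    """Draw a reproducible random sample, without replacement, for manual review."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    sources = list(SourceType)
    pool = [CandidateSkill(f"skill-{i}", sources[i % len(sources)], "placeholder")
            for i in range(100)]
    for skill in sample_for_audit(pool, k=5, seed=42):
        print(skill.name, skill.source.value)
```

A fixed seed makes each audit batch reproducible, so a second reviewer can re-check the same sample.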

Method

The authors propose SkillNet, a comprehensive framework designed to transform fragmented agent experiences and human knowledge into reusable, verifiable skill entities. The system operates through a systematic pipeline that encompasses skill creation, evaluation, and organization to support scalable and reliable capability growth. Refer to the framework diagram for the end-to-end architecture.

The framework begins with Skill Creation, where the system analyzes diverse inputs including user trajectories, documents, GitHub projects, and direct prompts to generate new skills. These generated skills undergo a rigorous Skill Filtering process involving deduplication, categorization, and a multi-dimensional evaluation mechanism. The evaluation dimensions include Safety, Completeness, Executability, Maintainability, and Cost-Awareness. Only high-quality skills that pass these checks are admitted into the repository, ensuring the system functions as a self-evolving ecosystem rather than a static collection.
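The filtering stage above has three steps: deduplication, then categorization, then a five-dimension quality gate. The sketch below shows one way these steps could chain together. The source does not describe the actual deduplication or categorization methods, so several choices here are hypothetical stand-ins:

- Deduplication matches normalized descriptions exactly.
- Categorization uses a keyword map.
- The gate admits a skill only if every dimension meets a 0.6 threshold.

```python
from dataclasses import dataclass, field

DIMENSIONS = ("safety", "completeness", "executability",
              "maintainability", "cost_awareness")

# Hypothetical keyword map; the domain names come from the taxonomy description.
CATEGORY_KEYWORDS = {
    "Development": {"code", "refactor", "api"},
    "AIGC": {"image", "video", "generate"},
    "Science": {"protein", "pathway", "assay"},
}


@dataclass
class Skill:
    name: str
    description: str
    scores: dict[str, float] = field(default_factory=dict)
    category: str = "Uncategorized"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def deduplicate(skills: list[Skill]) -> list[Skill]:
    """Keep the first skill seen for each normalized description."""
    seen: set[str] = set()
    unique = []
    for skill in skills:
        key = normalize(skill.description)
        if key not in seen:
            seen.add(key)
            unique.append(skill)
    return unique


def categorize(skill: Skill) -> str:
    """Return the first domain whose keywords overlap the description."""
    words = set(normalize(skill.description).split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "Other"


def passes_quality_gate(skill: Skill, threshold: float = 0.6) -> bool:
    """Admit only if every dimension meets the (hypothetical) threshold."""
    return all(skill.scores.get(dim, 0.0) >= threshold for dim in DIMENSIONS)


def filter_skills(candidates: list[Skill], threshold: float = 0.6) -> list[Skill]:
    """Deduplicate, categorize (mutating each skill), then apply the quality gate."""
    admitted = []
    for skill in deduplicate(candidates):
        skill.category = categorize(skill)
        if passes_quality_gate(skill, threshold):
            admitted.append(skill)
    return admitted
```

Treating missing scores as 0.0 means an unevaluated dimension blocks admission. This matches the text's rule that only skills passing every check enter the repository.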

To manage the growing repository, SkillNet employs a structured ontology. As shown in the figure below, this ontology is organized into three progressive layers.

The top layer is the Skill Taxonomy, which categorizes skills into broad domains such as Development, AIGC, and Science, further refined by fine-grained tags. The middle layer is the Skill Relation Graph, which models inter-skill dependencies and semantic associations using relations such as similar_to, compose_with, belong_to, and depend_on. The bottom layer is the Skill Package Library, which groups individual skills into modular, task-oriented bundles for deployment.
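The three ontology layers described above map naturally onto three data structures:

- a nested mapping for the taxonomy,
- a set of typed edges for the relation graph,
- a mapping of named bundles for the package library.

The class below is a minimal sketch under that assumption. Only the four relation names and the kegg-database and component-refactoring skill names come from the text. `SkillOntology`, its methods, the tags, and the pathway-enrichment skill are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relation types named in the text.
RELATIONS = frozenset({"similar_to", "compose_with", "belong_to", "depend_on"})


@dataclass
class SkillOntology:
    # Layer 1, Skill Taxonomy: domain -> skill name -> fine-grained tags.
    taxonomy: defaultdict = field(default_factory=lambda: defaultdict(dict))
    # Layer 2, Skill Relation Graph: typed, directed (source, relation, target) edges.
    edges: set = field(default_factory=set)
    # Layer 3, Skill Package Library: package name -> ordered list of skill names.
    packages: dict = field(default_factory=dict)

    def add_skill(self, domain: str, name: str, tags=()) -> None:
        self.taxonomy[domain][name] = set(tags)

    def known_skills(self) -> set:
        return {name for skills in self.taxonomy.values() for name in skills}

    def relate(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation type: {relation!r}")
        self.edges.add((source, relation, target))

    def neighbors(self, name: str, relation: str) -> list:
        return sorted(t for s, r, t in self.edges if s == name and r == relation)

    def bundle(self, package: str, skills) -> None:
        missing = set(skills) - self.known_skills()
        if missing:
            raise KeyError(f"unknown skills: {sorted(missing)}")
        self.packages[package] = list(skills)


if __name__ == "__main__":
    onto = SkillOntology()
    onto.add_skill("Science", "kegg-database", ["pathways"])
    onto.add_skill("Science", "pathway-enrichment", ["statistics"])  # hypothetical skill
    onto.add_skill("Development", "component-refactoring", ["frontend"])
    onto.relate("pathway-enrichment", "depend_on", "kegg-database")
    onto.bundle("bio-analysis-pack", ["kegg-database", "pathway-enrichment"])
    print(onto.neighbors("pathway-enrichment", "depend_on"))
```

Storing edges as triples keeps the graph easy to query by relation type. Checking bundles against the taxonomy stops a package from referencing a skill that was never admitted.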

Beyond isolated skill creation, the system includes a Skill Analysis module that automatically discovers and models structural relations between skills. This enables global reasoning over large skill repositories and supports advanced downstream applications such as skill retrieval and workflow synthesis. In practical scenarios, such as autonomous scientific discovery or coding, the system decomposes user tasks into actionable steps. Refer to the practical use examples for specific instances of skill application and evaluation.
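The source says the Skill Analysis module discovers relations automatically and supports retrieval, but it does not explain how. The sketch below substitutes a deliberately simple technique: Jaccard similarity over description tokens. It uses this score to propose similar_to edges and to rank skills against a query. This is an illustrative stand-in, not SkillNet's method. The skill names, descriptions, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_similar_to(skills: dict[str, str], threshold: float = 0.5):
    """Propose similar_to edges for skill pairs whose descriptions overlap enough."""
    token_sets = {name: tokens(desc) for name, desc in skills.items()}
    edges = []
    for a, b in combinations(sorted(token_sets), 2):
        score = jaccard(token_sets[a], token_sets[b])
        if score >= threshold:
            edges.append((a, "similar_to", b, round(score, 3)))
    return edges


def retrieve(query: str, skills: dict[str, str], k: int = 3) -> list[str]:
    """Rank skills by Jaccard overlap with the query; ties are broken by name."""
    q = tokens(query)
    ranked = sorted(skills, key=lambda name: (-jaccard(q, tokens(skills[name])), name))
    return ranked[:k]


if __name__ == "__main__":
    skills = {
        "csv-cleaner": "clean and normalize csv tables",
        "tsv-cleaner": "clean and normalize tsv tables",
        "image-gen": "generate images from text prompts",
    }
    print(propose_similar_to(skills))
    print(retrieve("clean csv", skills, k=2))
```

At the scale of hundreds of thousands of skills, a real system would more likely use embeddings and an approximate nearest-neighbor index than all-pairs comparison. The all-pairs loop here is quadratic and only suitable for illustration.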

For instance, in a scientific workflow, the agent schedules data processing skills followed by mechanistic analysis and target validation. The system provides detailed skill cards, such as for kegg-database or component-refactoring, which include metadata and quality scores to guide the agent's selection and execution. This structured approach allows agents to bridge high-level user intentions with executable actions by organizing specialized skills into a coherent workflow.
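The scientific workflow above combines two ideas: ordering steps by their dependencies, and picking a skill for each step according to its skill card's quality score. The sketch below shows one way to combine them. It assumes the steps form a depend_on graph, ordered with Python's standard `graphlib.TopologicalSorter`, and that each card carries a single aggregate score. The step names follow the text. Apart from kegg-database, the skill names are hypothetical, and all scores are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical depend_on structure for the workflow in the text:
# each step maps to the set of steps it depends on.
STEP_DEPENDENCIES = {
    "data_processing": set(),
    "mechanistic_analysis": {"data_processing"},
    "target_validation": {"mechanistic_analysis"},
}

# Hypothetical skill cards: step -> list of (skill name, aggregate quality score).
SKILL_CARDS = {
    "data_processing": [("sequence-qc", 0.84), ("csv-cleaner", 0.72)],
    "mechanistic_analysis": [("kegg-database", 0.91)],
    "target_validation": [("binding-assay-check", 0.78)],
}


def synthesize_workflow(dependencies, cards):
    """Order steps by dependency, then pick the highest-scoring skill for each."""
    plan = []
    # static_order() yields each step after all of its dependencies
    # and raises graphlib.CycleError if the graph contains a cycle.
    for step in TopologicalSorter(dependencies).static_order():
        candidates = cards.get(step)
        if not candidates:
            raise LookupError(f"no skill card covers step {step!r}")
        skill, score = max(candidates, key=lambda card: card[1])
        plan.append((step, skill, score))
    return plan


if __name__ == "__main__":
    for step, skill, score in synthesize_workflow(STEP_DEPENDENCIES, SKILL_CARDS):
        print(f"{step:<22} -> {skill} (quality {score:.2f})")
```

A single score keeps the selection rule easy to read. A fuller version might weight the five dimensions differently per task, for example favoring Safety for actions with side effects.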

Experiment

  • A multi-dimensional evaluation framework was established to assess skill reliability across safety, completeness, executability, maintainability, and cost-awareness, confirming that an automated LLM-based evaluator achieves near-perfect alignment with human expert judgments.
  • Experiments in three simulated environments (ALFWorld, WebShop, and ScienceWorld) demonstrate that integrating SkillNet significantly outperforms baseline methods like ReAct and Few-Shot by enabling agents to solve tasks more reliably with fewer interaction steps.
  • Results validate that SkillNet effectively transforms fragmented experiences into reusable procedural abstractions, providing robust performance gains across models of varying sizes and ensuring strong generalization to both seen and unseen tasks.
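The first result above reports near-perfect alignment between the LLM-based evaluator and human experts, but the summary does not name the agreement metric. Cohen's kappa is one common choice for two raters assigning categorical labels. It corrects raw agreement for agreement expected by chance. The sketch below assumes pass/fail labels per skill and dimension. The labels in the usage lines are hypothetical.

```python
from collections import Counter


def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label]
                   for label in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    llm_judge = ["pass", "pass", "fail", "pass", "fail"]
    human = ["pass", "pass", "fail", "pass", "pass"]
    print(round(cohens_kappa(llm_judge, human), 3))
```

In this toy case the raters agree on 4 of 5 items (0.80 raw agreement), but kappa is only about 0.55 because much of that agreement is expected by chance. On the widely used Landis and Koch scale, "almost perfect" agreement corresponds to kappa above 0.8.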
