a day ago

Shanhui Zhao, Jiacheng Liu, Guohong Liu, Jichao Yan, Jialei Ye, Yuhao Yang, Hao Wen, Shizuo Tian, Yizhen Yuan, Yuxuan Chen, and 6 more

Abstract

AI agents are driving a new software paradigm, with the ability to autonomously call tools, extract information, manage memory, and complete tasks that span applications and data sources. Most existing end-user operating systems, however, are designed for application-centric workflows and offer little native support for AI agents. This mismatch limits the wider adoption of agents and leads to execution overhead and safety risks when running agents on conventional systems. While the concept of agent-native operating systems is emerging, the research community lacks an open testbed to explore the architectural primitives desired for agent-mediated interaction. We present AOHP (Android Open Harness Project), an OS-level agent harness built on the Android Open Source Project (AOSP). The core design principle of AOHP is to treat agents as first-class OS actors, enabling adaptive user interfaces and agent-friendly runtime environments. AOHP preserves the mature Android software and hardware ecosystem while introducing three agent-oriented system mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. Based on preliminary experiments on challenging tasks covering key capabilities of OS agents, AOHP shows clear advantages in task completion (+21.12% completion rate), execution cost (-51.55% token cost), and security-policy compliance.

One-sentence Summary

The authors present AOHP (Android Open Harness Project), an open-source OS-level agent harness built on the Android Open Source Project that treats AI agents as first-class actors through personalized service composition, efficient agent interfaces, and secure information flow, raising the task completion rate by 21.12 percentage points (from 54.44% to 75.56%), cutting token cost by 51.55%, and maintaining security-policy compliance compared to stock Android.

Key Contributions

- An agent-native OS architecture that treats services as interface-neutral capabilities and shifts cross-app personalization and sensitive state management to the system, addressing the mismatch between app-centric platforms and agent workflows.
- AOHP, an OS-level harness built on the Android Open Source Project, which introduces personalized service composition for task-level entrances, efficient agent interfaces enabling parallel background execution and structured UI/event streams, and secure information flow that sandboxes sensitive values through trusted tracking.
- Evaluations with OpenClaw agents on self-crafted mobile tasks requiring cross-app interaction, which show that AOHP, compared to stock Android, raises the average completion rate from 54.44% to 75.56%, reduces LLM token consumption by 51.55%, and accelerates task execution by 44.21%; security case studies confirm the design restricts plaintext exposure of private data while preserving legitimate task execution.
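
The completion-rate gain is a difference in percentage points, which is easy to confuse with a relative increase. A short worked calculation, using only the two rates reported above, shows both readings:

```python
# Average completion rates reported in the evaluation (percent of tasks completed).
stock_android = 54.44
aohp = 75.56

# Absolute gain, in percentage points.
absolute_gain = aohp - stock_android

# Relative gain over the stock Android baseline, in percent.
relative_gain = absolute_gain / stock_android * 100

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gain is 21.12 percentage points, which corresponds to a relative improvement of roughly 38.8% over the baseline.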

Introduction

AI agents are moving from simple chat interfaces into the operating system itself, coordinating tools, GUIs, and cross-app workflows to fulfill user intent. Conventional app-centric OS designs hinder this shift: they present fixed pixel-based interfaces optimized for humans, assume one active app at a time, and enforce permissions at app boundaries without tracking sensitive data as it flows through an agent’s context and tool calls. Prior GUI automation work treats the OS as a fixed substrate, leaving execution overhead and security gaps unaddressed. The authors tackle this mismatch by redesigning Android as AOHP, an agent-native harness that introduces personalized service composition to generate task-level entrances, efficient agent interfaces supporting parallel background execution and structured events, and a trusted vault with information flow tracking that keeps private values in placeholder form and out of the agent’s plaintext context. This OS-level rearchitecting raises task completion rates, cuts token consumption and execution time, and restricts sensitive data exposure.

Dataset

The benchmark dataset comprises 30 handcrafted mobile UI tasks that reflect real-world workflows, grouped into five core capability categories and one hybrid category that combines them.

Composition and sources
- 30 tasks total, evenly split across six categories (five tasks per category).
- Tasks are designed by the authors using standard Android apps such as Calendar, Notes, Messages, Markor, Contacts, and Gallery.
- The tasks are presented as natural language instructions and are meant to test an agent’s ability to retrieve, manipulate, and reason over app data.

Category and task details
- Each core capability category targets a specific skill (e.g., information retrieval, memory management). The hybrid category composes multiple skills.
- Memory-management tasks (Category 5) follow a two-stage structure: stage A performs an action (e.g., create a calendar event), and stage B asks memory-related questions about the performed action.
- Task instructions include explicit checks (e.g., file paths, answer formats) and often require compiling information from several apps.
- Example tasks:
  - A retrieval task: “Please find and compile information for the next Development Meeting … write three lines to …/ir_dev_meeting_brief.txt” using Calendar, Notes, and Contacts.
  - A multi-app compilation task involving a Paris trip, using Calendar, Messages (SMS), and Markor.
  - A memory task: create a meeting, then answer “What is the end time of the Development Meeting on the 15th of next month? What is the title of the Task on the 30th of next month?”

How the paper uses the data
- The benchmark is used exclusively for evaluation; there is no training split or mixture ratio because it is not a training dataset.
- Agents are given the task instructions and must interact with the mobile environment to produce the correct output or answer.

Processing and metadata
- Each task comes with a human-readable instruction, a ground-truth answer or expected file content, and file-path constraints.
- For memory tasks, the two-stage format automatically pairs an execution step with a recall question.
- No cropping or image preprocessing is applied to the task descriptions; the benchmark is used as a textual specification for evaluation runs.
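
One way to picture the task metadata described above is as a small record type. The sketch below is illustrative only: the class name `BenchmarkTask`, its field names, the `check_answer` helper, and the sample ground-truth value are all assumptions, not the benchmark's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BenchmarkTask:
    """Hypothetical record for one evaluation task (field names are assumptions)."""
    task_id: str
    category: str                      # one of the six capability categories
    instruction: str                   # natural-language instruction given to the agent
    apps: List[str] = field(default_factory=list)
    expected_answer: Optional[str] = None   # ground-truth answer or file content
    output_path: Optional[str] = None       # file-path constraint, if any
    recall_question: Optional[str] = None   # stage B of a memory task

    def is_two_stage(self) -> bool:
        # Memory tasks pair an execution step with a recall question.
        return self.recall_question is not None


def check_answer(task: BenchmarkTask, answer: str) -> bool:
    """Case-insensitive exact match against the ground truth (an assumed scoring rule)."""
    if task.expected_answer is None:
        return False
    return answer.strip().lower() == task.expected_answer.strip().lower()


memory_task = BenchmarkTask(
    task_id="mem-01",                       # hypothetical identifier
    category="memory management",
    instruction="Create the Development Meeting on the 15th of next month.",
    apps=["Calendar"],
    expected_answer="15:00",                # hypothetical ground truth
    recall_question="What is the end time of the Development Meeting "
                    "on the 15th of next month?",
)
```

Because the benchmark is evaluation-only, a record like this needs no split or weighting fields; a runner would hand `instruction` to the agent and compare its output with `expected_answer`.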

Method

The authors present AOHP (Android Open Harness Project), an OS-level agent harness built on the Android Open Source Project (AOSP). The core design principle treats agents as first-class OS actors, enabling adaptive user interfaces and agent-friendly runtime environments. As shown in the paper's accompanying figure, the system transitions from the traditional app-centric, human-oriented paradigm of stock Android to an agent-native, service-oriented architecture.

In the traditional paradigm, users navigate isolated app silos through sequential interactions. This leads to inefficient interaction, app-isolated memory, coarse-grained permissions, and static predefined workflows. In contrast, AOHP introduces an OS Agent that understands, plans, orchestrates, executes, and monitors tasks. This agent mediates personalized user interaction through generated service entrances such as a Shopping Aggregator or Travel Planner. It invokes underlying apps and services via multi-interface interaction, utilizing APIs, CLI, Structured UI, and Rendered GUI. This shift enables efficient task execution, system-level memory, fine-grained information flow, and flexible service composition.

The detailed architecture of AOHP is organized into vertical layers and horizontal cross-layer mechanisms, as illustrated in the paper's architecture figure.

Personalized Service Composition

At the top layer, the system generates personalized service entrances. These are user-facing shells backed by OS-managed service composition. For instance, a shopping entrance can aggregate product search from multiple providers, normalize attributes, and apply user preferences. Each entrance comprises a task schema, a service graph, and a presentation policy. The OS agent discovers service capabilities across API, CLI, and GUI channels, representing each with input/output schemas and policy labels. Composition is constrained by policy: for example, product search may be parallelized, while purchase submission requires explicit confirmation. System memory allows personalization to survive app boundaries, distinguishing between persistent profile memory, task-local memory, and sensitive memory.
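
To make the entrance structure concrete, the following is a minimal sketch of policy-constrained composition. It is not AOHP's implementation: the names `Capability`, `Entrance`, and `execute`, the two policy labels, and the flattening of the service graph into a list are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Capability:
    name: str
    channel: str                    # "api", "cli", or "gui"
    policy: str                     # assumed labels: "parallel_ok" / "needs_confirmation"
    run: Callable[[dict], dict]


@dataclass
class Entrance:
    task_schema: Dict[str, str]     # required request fields and their types
    service_graph: List[Capability] # simplified here to a flat list
    presentation_policy: str


def execute(entrance: Entrance, request: dict,
            confirm: Callable[[str], bool]) -> Dict[str, dict]:
    # Reject requests that do not satisfy the task schema.
    missing = [key for key in entrance.task_schema if key not in request]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    results: Dict[str, dict] = {}
    parallel = [c for c in entrance.service_graph if c.policy == "parallel_ok"]
    gated = [c for c in entrance.service_graph if c.policy == "needs_confirmation"]

    # Capabilities labeled parallel_ok run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {c.name: pool.submit(c.run, request) for c in parallel}
        for name, future in futures.items():
            results[name] = future.result()

    # Sensitive capabilities run only after explicit confirmation.
    for cap in gated:
        if confirm(cap.name):
            results[cap.name] = cap.run(request)
        else:
            results[cap.name] = {"status": "declined"}
    return results


shopping = Entrance(
    task_schema={"query": "str"},
    service_graph=[
        Capability("shop_a", "api", "parallel_ok",
                   lambda req: {"provider": "shop_a", "query": req["query"]}),
        Capability("shop_b", "gui", "parallel_ok",
                   lambda req: {"provider": "shop_b", "query": req["query"]}),
        Capability("purchase", "api", "needs_confirmation",
                   lambda req: {"status": "submitted"}),
    ],
    presentation_policy="card_list",
)
results = execute(shopping, {"query": "usb-c cable"}, confirm=lambda name: False)
```

With confirmation withheld, both searches complete while the purchase step is recorded as declined, which mirrors the search-versus-purchase distinction described above.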

AOHP Capabilities and Unified Interaction Interface

The central capabilities layer reorganizes services into System Memory, Skills, and UI Utilities. System memory stores preferences and task state outside any single app. Skills package reusable service capabilities. UI utilities support the construction of generated entrances. Below this, the Unified Interaction Interface normalizes both traditional Android interfaces and emerging agent interfaces into four invocation modes: API, CLI, Structured UI, and Rendered GUI. This allows agents to select a compact symbolic path when available and fall back to visual operation when compatibility requires it. The bottom layer remains the Android Ecosystem, preserving existing apps, system services, and the native framework as the compatibility base.
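
The mode-selection idea can be sketched as a simple preference order. The ranking below (API, then CLI, then Structured UI, then Rendered GUI) and the rule that Rendered GUI is always available as a fallback are assumptions for illustration; the source only states that agents prefer a compact symbolic path and fall back to visual operation.

```python
from enum import IntEnum
from typing import Iterable


class Mode(IntEnum):
    # Lower value = more compact, preferred first (assumed ordering).
    API = 0
    CLI = 1
    STRUCTURED_UI = 2
    RENDERED_GUI = 3


def select_mode(available: Iterable[Mode]) -> Mode:
    """Pick the most compact mode a service exposes, else fall back to the GUI."""
    modes = set(available)
    modes.add(Mode.RENDERED_GUI)   # assumption: the rendered GUI is always reachable
    return min(modes)


legacy_app = select_mode([])                               # no symbolic interface
notes_app = select_mode([Mode.STRUCTURED_UI, Mode.CLI])    # two symbolic options
```

Here a legacy app with no symbolic interface resolves to `Mode.RENDERED_GUI`, while an app exposing both a CLI and a structured UI resolves to `Mode.CLI`.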

Efficient Agent Interfaces

To optimize how agents access system resources, the authors introduce efficient agent interfaces, shown on the left side of the architecture.

- Parallel Background Interaction: AOHP decouples execution from the screen through lightweight virtual displays, allowing agents to run workflows in the background without preempting the foreground session.
- Agent-aware UI Enhancement: GUIs are abstracted into structured representations with lower redundancy and richer semantics, while retaining rendered GUI fallback.
- Native Sandbox Runtime: A native Linux sandbox environment provides an execution surface independent of app-facing interfaces for computation and tooling.
- Unified File Shortcut: This bridges GUI and CLI file processing, treating files as first-class task objects. GUI interactions affecting storage are reflected as structured file observations.
- Event Stream Abstraction: This captures dynamic notifications and sensor streaming access, separating event generation from consumption.
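
As an illustration of separating event generation from consumption, here is a minimal publish/subscribe sketch. The `Event` and `EventStream` names, the event kinds, and the sample payloads are hypothetical; the real system sits inside the Android framework rather than in a Python process.

```python
import queue
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    source: str     # e.g. an app or a sensor
    kind: str       # assumed kinds: "notification", "sensor"
    payload: dict


class EventStream:
    """Producers publish without knowing who consumes; consumers read their own queue."""

    def __init__(self) -> None:
        self._subscribers: List[Tuple[Set[str], "queue.Queue[Event]"]] = []

    def subscribe(self, kinds: Iterable[str]) -> "queue.Queue[Event]":
        inbox: "queue.Queue[Event]" = queue.Queue()
        self._subscribers.append((set(kinds), inbox))
        return inbox

    def publish(self, event: Event) -> None:
        # Deliver the event to every subscriber interested in its kind.
        for kinds, inbox in self._subscribers:
            if event.kind in kinds:
                inbox.put(event)


stream = EventStream()
agent_inbox = stream.subscribe({"notification"})

stream.publish(Event("Messages", "notification", {"text": "Meeting moved"}))
stream.publish(Event("accelerometer", "sensor", {"x": 0.0}))

received = agent_inbox.get_nowait()
```

The agent receives the notification as structured data and never sees the sensor event it did not subscribe to, so it does not need to poll or parse the notification shade through the GUI.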

Secure Information Flow

On the right side of the architecture, the secure information flow mechanism treats sensitive data as OS-controlled state.

- Policy Enforcement: The policy layer evaluates runtime data use, considering data source, purpose, destination, and action sensitivity.
- Sensitive Source Sanitization: Before sensitive content enters the agent context, plaintext is replaced with typed placeholders.
- Trusted Vault and Execution: Plaintext and privileged actions are kept inside trusted services. The agent requests operations with a UUID, which the Trusted Execution Environment and Data Vault handle.
- Data Flow Taint Tracking: Taint metadata follows values through copying, transformation, and transfer. At system exits, tainted data is checked before display or transmission, providing an audit trail from source to sink.
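
The placeholder-and-vault pattern can be sketched in a few lines. This is a simplified model under stated assumptions: the `Vault` and `Placeholder` classes, the placeholder text format, the sink names, and the sample card number are hypothetical, and taint propagation through copies and transformations is not modeled; only the source substitution and the check at the sink are shown.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Placeholder:
    kind: str   # type label, e.g. "card_number"
    ref: str    # UUID the agent uses to request operations

    def __str__(self) -> str:
        return f"<{self.kind}:{self.ref}>"


class Vault:
    """Keeps plaintext inside a trusted component; the agent only holds references."""

    def __init__(self, allowed_sinks: Dict[str, Set[str]]) -> None:
        self._secrets: Dict[str, Tuple[str, str]] = {}
        self._allowed_sinks = allowed_sinks
        self.audit_log: List[Tuple[str, str, str]] = []

    def sanitize(self, kind: str, plaintext: str) -> Placeholder:
        # Replace plaintext with a typed placeholder before it reaches the agent.
        ref = str(uuid.uuid4())
        self._secrets[ref] = (kind, plaintext)
        self.audit_log.append(("source", kind, ref))
        return Placeholder(kind, ref)

    def release(self, placeholder: Placeholder, sink: str) -> str:
        # Check the destination at the system exit before revealing plaintext.
        kind, plaintext = self._secrets[placeholder.ref]
        if sink not in self._allowed_sinks.get(kind, set()):
            self.audit_log.append(("blocked", kind, sink))
            raise PermissionError(f"{kind} may not flow to {sink}")
        self.audit_log.append(("released", kind, sink))
        return plaintext


vault = Vault(allowed_sinks={"card_number": {"payment_service"}})
card = vault.sanitize("card_number", "4111 1111 1111 1111")  # hypothetical test value

# The agent's context contains only the placeholder text.
agent_context = f"Pay the invoice with {card}"
```

Releasing `card` to `"payment_service"` returns the plaintext inside the trusted path, while releasing it to any other sink raises `PermissionError`; both outcomes are appended to `audit_log`, giving a simple source-to-sink trail.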

Experiment

The evaluation compares an agent operating with AOHP’s agent-native interfaces against the same agent on stock Android across 30 real-world tasks that exercise GUI and non-GUI operations, event capture, multi-source retrieval, memory management, and hybrid workflows. AOHP markedly improves task completion, especially for notification-driven, cross-app, and memory-dependent scenarios, by providing structured observations, system APIs, and virtual execution that streamline workflows. On tasks both settings solve, AOHP reduces tool calls, duration, and token consumption by 44–52 percent, as agent-oriented shortcuts replace expensive GUI navigation. Supplementary security testing with an annotated payment application confirms that AOHP’s information-flow controls correctly enforce source sanitization, action mediation, taint propagation, and fail-closed behavior.

The authors evaluate the information-flow security of the AOHP system using a purpose-built annotated payment application. The evaluation tests mechanisms for data display, action mediation, and access control to ensure secure handling of financial data. Results indicate that the system successfully enforces all expected security behaviors, protecting sensitive information while allowing standard operations. Sensitive fields such as account and card numbers appear as vault references in the agent interface rather than plaintext. The system differentiates between ordinary actions that proceed automatically and sensitive actions like transfers that require user consent. Access requests outside the policy scope fail closed, and transaction event streams redact sensitive fields while preserving taint metadata.
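
The three behaviors tested here (automatic approval, consent-gated actions, and fail-closed denial) can be summarized as a default-deny decision table. The sketch below is an assumption-laden illustration: the `Decision` enum, the `POLICY` entries, and the action names are invented for the example and do not come from the annotated payment application.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    DENY = "deny"


# Hypothetical policy table keyed by (app, action).
POLICY: Dict[Tuple[str, str], Decision] = {
    ("payment_app", "view_balance"): Decision.ALLOW,
    ("payment_app", "transfer"): Decision.ASK_USER,
}


def decide(app: str, action: str) -> Decision:
    # Fail closed: anything outside the policy scope is denied.
    return POLICY.get((app, action), Decision.DENY)


def mediate(app: str, action: str, user_consents: Callable[[], bool]) -> bool:
    """Return True only if the action may proceed."""
    decision = decide(app, action)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.ASK_USER:
        return bool(user_consents())
    return False


ordinary = mediate("payment_app", "view_balance", lambda: False)
refused_transfer = mediate("payment_app", "transfer", lambda: False)
out_of_scope = mediate("payment_app", "export_history", lambda: True)
```

The ordinary action proceeds without a prompt, the transfer is blocked when consent is withheld, and the unlisted action is denied even if the user would have consented, because the lookup defaults to `DENY`.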

The authors evaluate the execution efficiency of the OpenClaw agent on AOHP compared to stock Android, using only tasks that both settings solve completely. On these tasks, AOHP reduces tool calls, execution duration, and token consumption by 44–52 percent; the reported figures include 51.55% lower LLM token consumption and 44.21% faster task execution. The number of LLM requests also falls, because runs take fewer steps and observations are more compact. The gains come from agent-native interfaces that bypass complex GUI navigation and data-processing workflows.
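
Restricting the comparison to tasks both settings solve avoids crediting one system for runs the other never finished. A minimal sketch of that calculation follows; the per-task numbers are made up for illustration, and summing a metric over the common tasks before taking the ratio is an assumption about the aggregation, which the source does not specify.

```python
from typing import Dict


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100


def compare(stock_runs: Dict[str, dict], aohp_runs: Dict[str, dict],
            metric: str) -> float:
    # Keep only tasks that both settings solved.
    common = [task for task in stock_runs
              if task in aohp_runs
              and stock_runs[task]["solved"] and aohp_runs[task]["solved"]]
    baseline = sum(stock_runs[task][metric] for task in common)
    improved = sum(aohp_runs[task][metric] for task in common)
    return relative_reduction(baseline, improved)


# Hypothetical per-task records, not measurements from the paper.
stock = {
    "t1": {"solved": True, "tokens": 1000},
    "t2": {"solved": False, "tokens": 5000},
    "t3": {"solved": True, "tokens": 3000},
}
aohp = {
    "t1": {"solved": True, "tokens": 400},
    "t2": {"solved": True, "tokens": 900},
    "t3": {"solved": True, "tokens": 1600},
}

token_reduction = compare(stock, aohp, "tokens")
```

In this toy data, task `t2` is excluded because stock Android did not solve it, so the reduction is computed over `t1` and `t3` only.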

Together, these experiments indicate that AOHP securely mediates access to financial data, keeping account and card numbers behind vault references and failing closed on out-of-policy requests, while achieving substantial efficiency gains over stock Android.

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction
