7 months ago

Human-Computer Interaction

Video Understanding

Computer Vision

Luke Rivard Sun Sun Hongyu Guo Wenhu Chen Yuntian Deng

Abstract

We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks computer state, with a diffusion-based neural renderer that generates screen images. The model is trained on a large-scale dataset of Ubuntu XFCE recordings, which include both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions like application launches. Although modeling fine-grained keyboard interactions precisely remains challenging, NeuralOS offers a step toward creating fully adaptive, generative neural interfaces for future human-computer interaction systems.
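To make the two-part design concrete, here is a minimal toy sketch of the data flow the abstract describes: a recurrent cell consumes input events and keeps a state vector, and a renderer turns that state into a frame by iteratively refining noise. This is not the authors' implementation. Every name, size, and encoding below (`ToyStateTracker`, `ToyRenderer`, `STATE_DIM`, the 16-pixel "screen", the event tuple layout) is a hypothetical choice made for illustration, the weights are random rather than trained, and the refinement loop is a fixed pull toward a state-dependent target that only stands in for a learned diffusion denoiser.

```python
import math
import random

STATE_DIM = 8       # hypothetical size of the recurrent state
EVENT_DIM = 4       # hypothetical event features: x, y, click flag, key code
FRAME_PIXELS = 16   # hypothetical tiny "screen" of 16 grayscale pixels


def encode_event(event):
    """Map (x, y, clicked, key) to a feature vector; x and y are assumed in [0, 1]."""
    x, y, clicked, key = event
    return [x, y, 1.0 if clicked else 0.0, (ord(key) / 255.0) if key else 0.0]


class ToyStateTracker:
    """Elman-style recurrent cell: h_t = tanh(W_h h_{t-1} + W_x x_t)."""

    def __init__(self, rng):
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(STATE_DIM)]
                    for _ in range(STATE_DIM)]
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in range(EVENT_DIM)]
                    for _ in range(STATE_DIM)]
        self.h = [0.0] * STATE_DIM

    def step(self, event):
        x = encode_event(event)
        self.h = [
            math.tanh(
                sum(w * h for w, h in zip(self.w_h[i], self.h))
                + sum(w * v for w, v in zip(self.w_x[i], x))
            )
            for i in range(STATE_DIM)
        ]
        return self.h


class ToyRenderer:
    """Stand-in for a diffusion renderer: start from Gaussian noise and
    repeatedly move the frame toward a target computed from the state."""

    def __init__(self, rng, steps=10):
        self.rng = rng
        self.steps = steps
        self.w_out = [[rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
                      for _ in range(FRAME_PIXELS)]

    def render(self, state):
        # State-dependent target frame with pixel values in (0, 1).
        target = [0.5 * (1.0 + math.tanh(sum(w * s for w, s in zip(row, state))))
                  for row in self.w_out]
        frame = [self.rng.gauss(0.0, 1.0) for _ in range(FRAME_PIXELS)]
        for _ in range(self.steps):
            # Each step halves the remaining distance to the target.
            frame = [f + 0.5 * (t - f) for f, t in zip(frame, target)]
        return [min(1.0, max(0.0, f)) for f in frame]


def simulate(events, seed=0):
    """Return one rendered frame per input event."""
    rng = random.Random(seed)
    tracker = ToyStateTracker(rng)
    renderer = ToyRenderer(rng)
    return [renderer.render(tracker.step(e)) for e in events]


if __name__ == "__main__":
    demo_events = [(0.1, 0.2, False, None), (0.5, 0.5, True, None), (0.5, 0.5, False, "a")]
    for frame in simulate(demo_events):
        print(" ".join(f"{p:.2f}" for p in frame))
```

The point of the split is that the recurrent state carries history (for example, which window was opened by an earlier click), while the renderer only needs the current state to draw a frame.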

