HyperAI

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Atomic Task Graph: A Unified Framework for Agentic Planning and Execution

Yue Zhang, Sihan Chen, Ziwen Huang, et al.

LongE2V: Long-Horizon Event-based Video Reconstruction, Prediction, and Frame Interpolation with Video Diffusion Models

Diffusion Model

Video Generation

Cheng-De Fan, Chun-Wei Tuan Mu, Chen-Wei Chang, et al.

UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks

Zhekai Chen, Chengqi Duan, Kaiyue Sun, et al.

Ideas Have Genomes: Benchmarking Scientific Lineage Reasoning and Lineage-Grounded Idea Generation

Yifan Zhou, Qihao Yang, Yan Li, et al.

Why Can’t I Open My Drawer? Mitigating Object-Driven Shortcuts in Zero-Shot Compositional Action Recognition

Video Understanding

Action Recognition

Geo Ahn, Inwoong Lee, Taeoh Kim, et al.

Video-Oasis: Rethinking Evaluation of Video Understanding

Video Understanding

Geuntaek Lim, Sungjune Park, Jaeyun Lee, et al.

Vidu S1: A Real-Time Interactive Video Generation Model

Video Generation

Diffusion Model

Jintao Zhang, Kai Jiang, Jintao Chen, et al.

Measuring the Gap Between Human and LLM Research Ideas

Ziyu Chen, Yilun Zhao, Arman Cohan

The Harness Effect: How Orchestration Design Sets the Token Economics of Enterprise Agentic AI

Muayad Sayed Ali, Aliaksandra Novik, Anji Boddupally, et al.

Infinite Worlds with Versatile Interactions

Video Generation

Zelin Gao, Qiuyu Wang, Jiapeng Zhu, et al.

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

Video Generation

Shuailei Ma, Jiaqi Liao, Xinyang Wang, et al.

LAME M-VLA: Dual Latent Memory in Vision-Language-Action Models for Robotic Manipulation

Multimodal Representation

Hongyu Qu, Jianzhe Gao, Xiaobin Hu, et al.

Accurate, Interdisciplinary and Transparent Structure-property Understanding with Deep Native Structural Reasoning

Chen Tang, Yizhou Wang, Jianyu Wu, et al.

Parallelized Autoregressive Decoding for Omni-Modal Dense Video Captioning

Video Captioning

Wenzheng Zeng, Siyi Jiao, Chen Gao, et al.

Light-Omni: Reflex over Reasoning in Agentic Video Understanding with Long-Term Memory

Video Understanding

Chang Nie, Jiaju Wei, Junlan Feng, et al.

Vision as Unified Multimodal Generation

Xiaoyang Han, Jianhua Li, Kewang Deng, et al.

Hierarchical Sparse Attention Done Right: Toward Infinite Context Modeling

Xiang Hu, Xinyu Wei, Hao Gu, et al.

AlayaWorld: Long-Horizon and Playable Video World Generation

Video Generation

RynnWorld-4D: 4D Embodied World Models for Robotic Manipulation

Diffusion Model

Video Generation

Haoyu Zhao, Xingyue Zhao, Siteng Huang, et al.

Nemotron-Labs-3-Puzzle-75B-A9B: Compressing Hybrid MoE LLMs

Akhiad Bercovich, Talor Abramovich, Daniel Afrimi, et al.

Multi-Turn On-Policy Distillation with Prefix Replay

Reinforcement Learning

Baohao Liao, Hanze Dong, Christof Monz, et al.

Gemma 4 Technical Report

Sherif El Abd, Vaibhav Aggarwal, Robin Algayres, et al.

UI-MOPD: Multi-Platform On-Policy Distillation for Continual GUI Agent Learning

Niu Lian, Alan Chen, Zhehao Yu, et al.

Wan-Streamer v0.2: Higher Resolution, Same Latency

Video Generation

Lianghua Huang, Zhi-Fan Wu, Yupeng Shi, et al.

EVA-Client: A Unified Framework for Deployment, Evaluation, and Data Collection on Real Robots

Heqing Yang, Yang Yi, Liyao Wang, et al.

GigaWorld-1: A Roadmap to Build World Models for Robot Policy Evaluation

Video Generation

Angyuan Ma, Boyuan Wang, Bohan Li, et al.

ResearchStudio-Idea: An Evidence-Grounded Research-Ideation Skill Suite from ML Conference Outcomes

Retrieval-Augmented Generation

Qihao Zhao, Yangyu Huang, Yalun Dai, et al.

ResearchStudio-Reel: Automate the Last Mile of Research from Paper to Poster, Video, and Blog

Document Understanding

Text Generation

Lingao Xiao, Yalun Dai, Yangyu Huang, et al.

FINAL Bench: Measuring Functional Metacognitive Reasoning in Large Language Models

Taebong Kim, Minsik Kim, Sunyoung Choi, et al.

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

3D Machine Vision

Semantic Segmentation

TheoremGraph: Bridging Formal and Informal Mathematics

Retrieval-Augmented Generation

Simon Kurgan, Evan Wang, Eric Leonen, et al.

Always-On Agents: A Survey of Persistent Memory, State, and Governance in LLM Agents

Tianyu Ding, Aditya Nannapaneni, Bingfan Liu, et al.
