HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

AI for Auto-Research: Roadmap & User Guide

Lingdong Kong, Xian Sun, Wei Chow, et al.

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Hongyi Liu, Haoyan Yang, Tao Jiang, et al.

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Image Generation

Fengyi Fu, Mengqi Huang, Shaojin Wu, et al.

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Video Generation

Diffusion Model

Yukang Chen, Luozhou Wang, Wei Huang, et al.

Slicing and Dicing: Configuring Optimal Mixtures of Experts

Margaret Li, Sneha Kudugunta, Danielle Rothermel, et al.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

Alberto Pepe, Chien-Yu Lin, Despoina Magka, et al.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

Supervised Fine-Tuning

Yuchen Cai, Ding Cao, Liang Lin, et al.

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

Hanwen Wang, Weizhi Zhao, Xiangyu Wang, et al.

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization

Video Generation

Quanjian Song, Yefeng Shen, Mengting Chen, et al.

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Document Understanding

Visual Question Answering

Dongsheng Ma, Jiayu Li, Zhengren Wang, et al.

MMSkills: Towards Multimodal Skills for General Visual Agents

Multimodal Representation

Kangning Zhang, Shuai Shao, Qingyao Li, et al.

PhysBrain 1.0 Technical Report

Visual Question Answering

Multimodal Representation

Shijie Lian, Bin Yu, Xiaopeng Lin, et al.

Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

Reinforcement Learning

Zikang Shan, Han Zhong, Liwei Wang, et al.

NEXUS: An Agentic Framework for Time Series Forecasting

Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, et al.

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

Minghao Guo, Qingyue Jiao, Zeru Shi, et al.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

Diffusion Model

Video Generation

Haoyi Zhu, Haozhe Liu, Yuyang Zhao, et al.

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

Visual Question Answering

Xiyu Ren, Zhaowei Wang, Yiming Du, et al.

Self-Distilled Agentic Reinforcement Learning

Reinforcement Learning

Zhengxi Lu, Zhiyuan Yao, Zhuowen Han, et al.

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

Video Generation

Diffusion Model

Min Zhao, Hongzhou Zhu, Kaiwen Zheng, et al.

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Yafu Li, Runzhe Zhan, Haoran Zhang, et al.

RepoZero: Can LLMs Generate a Code Repository from Scratch?

Code Generation

Zhaoxi Zhang, Yiming Xu, Jiahui Liang, et al.

Qwen-Image-VAE-2.0 Technical Report

Diffusion Model

Image Generation

Zekai Zhang, Deqing Li, Kuan Cao, et al.

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

Eilam Shapira, Moshe Tennenholtz, Roi Reichart

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Visual Question Answering

Zhaowei Wang, Lishu Luo, Haodong Duan, et al.

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Diffusion Model

Video Generation

Yuchao Gu, Guian Fang, Yuxin Jiang, et al.

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Mind Lab, Song Cao, Vic Cao, et al.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

Multimodal Representation

Alan Arazi, Eilam Shapira, Shoham Grunblat, et al.

Geometric Context Transformer for Streaming 3D Reconstruction

3D Machine Vision

Video Processing

Lin-Zhuo Chen, Jian Gao, Yihang Chen, et al.

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Supervised Fine-Tuning

Zhuolin Yang, Zihan Liu, Yang Chen, et al.

MOSS-TTS Technical Report

Audio and Speech Processing

SII-OpenMOSS Team

StreakMind: AI detection and analysis of satellite streaks in astronomical images with automated database integration

Object Detection

Computer Vision

Rafael Carrillo, René Duffard, Pablo García-Martín, et al.

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

Keisuke Kamahori, Shihang Li, Simon Peter, et al.