HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Video Generation

Xiangyang Luo, Xiaozhe Xin, Tao Feng, et al.

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Image Generation

Mengting Chen, Zhengrui Chen, Yongchao Du, et al.

Fast NF4 Dequantization Kernels for Large Language Model Inference

Xiangbo Qi, Chaoyi Jiang, Murali Annavaram

EasyVideoR1: Easier RL for Video Understanding

Video Understanding

Chuanyu Qin, Chenxu Yang, Qingyi Si, et al.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Haoyu Wu, Jiwen Yu, Yingtian Zou, et al.

OpenGame: Open Agentic Coding for Games

Code Generation

Yilei Jiang, Jinyuan Hu, Qianyin Xiao, et al.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Guanting Dong, Junting Lu, Junjie Huang, et al.

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Autonomous Driving

Jinghui Lu, Jiayi Guan, Zhijian Huang, et al.

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Image Generation

Chenxi Zhao, Chen Zhu, Xiaokun Feng, et al.

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Image Segmentation

Medical Imaging

Halle E. Wong, Marianne Rakic, John Guttag, et al.

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Video Understanding

Yunhang Shen, Chaoyou Fu, Shaoqi Dong, et al.

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Yujia Qin, Yining Ye, Junjie Fang, et al.

HunyuanVideo: A Systematic Framework for Large Video Generative Models

Video Generation

Hunyuan Foundation Model Team

MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Retrieval-Augmented Generation

Shaden Alshammari, Kevin Wen, Abrar Zainal, et al.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Chenyu Zhou, Huacan Chai, Wenteng Chen, et al.

Active Context Compression: Autonomous Memory Management in LLM Agents

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

Jiaxi Bi, Tongxu Luo, Wenyu Du, et al.

Qwen3.5-Omni Technical Report

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

Retrieval-Augmented Generation

Uday Allu, Sonu Kedia, Tanmay Odapally, et al.

PersonaVLM: Long-Term Personalized Multimodal LLMs

Chang Nie, Chaoyou Fu, Yifan Zhang, et al.

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Ido Galil, Moshe Kimhi, Ran El-Yaniv

Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Diffusion Model

Image Generation

Meng Yu, Lei Sun, Jianhao Zeng, et al.

Multimodal OCR: Parse Anything from Documents

Document Understanding

Handong Zheng, Yumeng Li, Kaile Zhang, et al.

Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities

Audio and Speech Processing

George Saon, Avihu Dekel, Alexander Brooks, et al.

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis

Shijia Liao, Yuxuan Wang, Tianyu Li, et al.

Video Object and Interaction Deletion

Image Inpainting

Video Generation

Saman Motamed, William Harvey, Benjamin Klein, et al.

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Diffusion Model

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

Diffusion Model

Han Zhu, Lingxuan Ye, Wei Kang, et al.

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models

Jonathan Steinberg, Oren Gal

OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets

Document Understanding

Jiyuan Shen, Peiyue Yuan, Atin Ghosh, et al.

dnaHNet: A Scalable and Hierarchical Foundation Model for Genomic Sequence Learning

Arnav Shah, Junzhe Li, Parsa Idehpour, et al.

Neural Computers

Video Generation

Mingchen Zhuge, Changsheng Zhao, Haozhe Liu, et al.
