HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

Video Generation

Karthik Dharmarajan, Wenlong Huang, Jiajun Wu, et al.

On the Role of Discreteness in Diffusion LLMs

Diffusion Model

Ziqi Jin, Bin Wang, Xiang Lin, et al.

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Diffusion Model

Zefeng He, Xiaoye Qu, Yafu Li, et al.

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Natural Language Processing

Xingwei Qu, Shaowen Wang, Zihao Huang, et al.

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Retrieval-Augmented Generation

Chulun Zhou, Chunkang Zhang, Guoxin Yu, et al.

AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

Jiafeng Liang, Hao Li, Chang Li, et al.

Scaling Open-Ended Reasoning to Predict the Future

Retrieval-Augmented Generation

Nikhil Chandak, Shashwat Goel, Ameya Prabhu, et al.

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Diffusion Model

3D Machine Vision

Yi-Chuan Huang, Hao-Jen Chien, Chin-Yang Lin, et al.

mHC: Manifold-Constrained Hyper-Connections

Zhenda Xie, Yixuan Wei, Huanqi Cao, et al.

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Weixun Wang, XiaoXiao Xu, Wanhe An, et al.

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Junru Lu, Jiarui Qin, Lingfeng Qiao, et al.

GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs

Text Generation

Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, et al.

GraphLocator: Graph-guided Causal Reasoning for Issue Localization

Wei Liu, Chao Peng, Pengfei Gao, et al.

Evaluating Parameter Efficient Methods for RLVR

Reinforcement Learning

Supervised Fine-Tuning

Qingyu Yin, Yulun Wu, Zhennan Shen, et al.

End-to-End Test-Time Training for Long Context

Natural Language Processing

Arnuv Tandon, Karan Dalal, Xinhao Li, et al.

DreamOmni3: Scribble-based Editing and Generation

Image Generation

Image Inpainting

Bin Xia, Bohao Peng, Jiyang Liu, et al.

UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

Diffusion Model

Tanghui Jia, Dongyu Yan, Dehao Hao, et al.

mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs

Embodied Intelligence

Jonas Pai, Liam Achenbach, Victoriano Montesinos, et al.

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

Diffusion Model

Yuxin Wen, Qing Shuai, Di Kang, et al.

SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling

Embodied Intelligence

Yufan He, Pengfei Guo, Mengya Xu, et al.

SpotEdit: Selective Region Editing in Diffusion Transformers

Diffusion Model

Image Processing

Zhibin Qin, Zhenxiong Tan, Zeqing Wang, et al.

Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Depth Estimation

Diffusion Model

Shaocong Xu, Songlin Wei, Qizhe Wei, et al.

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Reinforcement Learning

Shaofei Cai, Yulei Qin, Haojia Lin, et al.

Yume-1.5: A Text-Controlled Interactive World Generation Model

Diffusion Model

Xiaofeng Mao, Zhen Li, Chuanhao Li, et al.

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Diffusion Model

Video Generation

Ethan Chern, Zhulin Hu, Bohao Tang, et al.

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Ang Lv, Jin Ma, Yiyuan Ma, et al.

LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration

Video Understanding

Wen Jiang, Li Wang, Kangyao Huang, et al.

Attention Is Not What You Need

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

Text Generation

Human-Computer Interaction

Wenzheng Zeng, Mingyu Ouyang, Langyuan Cui, et al.

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Multimodal Representation

Kaican Li, Lewei Yao, Jiannan Wu, et al.

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Diffusion Model

Video Processing

Hoiyeong Jin, Hyojin Jang, Jeongho Kim, et al.

Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

Retrieval-Augmented Generation

Yuqing Li, Jiangnan Li, Zheng Lin, et al.
