Papers

Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

Supervised Fine-Tuning

Yitong Zhang, Jia Li, Liyi Cai, et al.

WorldGen: From Text to Traversable and Interactive 3D Worlds

Diffusion Model

Dilin Wang, Hyunyoung Jung, Tom Monnier, et al.

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Shalini Maiti, Amar Budhiraja, Bhavul Gauri, et al.

DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching

Zicheng Xu, Guanchu Wang, Yu-Neng Chuang, et al.

Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs

Richard Cornelius Suwandi, Feng Yin, Juntao Wang, et al.

DePass: Unified Feature Attributing by Simple Decomposed Forward Pass

Natural Language Processing

Xiangyu Hong, Che Jiang, Kai Tian, et al.

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Multi-Task Learning

Zefeng Zhang, Xiangzhao Hao, Hengzhu Tang, et al.

From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

Reinforcement Learning

Changpeng Yang, Jinyang Wu, Yuchen Liu, et al.

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

Reinforcement Learning

Bowen Ping, Chengyou Jia, Minnan Luo, et al.

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Multi-Task Learning

Xin He, Longhui Wei, Jianbo Ouyang, et al.

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Image Generation

Hongyu Li, Manyuan Zhang, Dian Zheng, et al.

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Diffusion Model

Zhenglin Cheng, Peng Sun, Jianguo Li, et al.

CARE-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait Assessment

Video Understanding

Vida Adeli, Ivan Klabucar, Javad Rajabi, et al.

WenetSpeech-Chuan: A Large-Scale Sichuanese Corpus with Rich Annotation for Dialectal Speech Processing

Audio and Speech Processing

Yuhang Dai, Ziyu Zhang, Shuai Wang, et al.

PolypSense3D: A Multi-Source Benchmark Dataset for Depth-Aware Polyp Size Measurement in Endoscopy

Depth Estimation

Semantic Segmentation

Ruyu Liu, Lin Wang, Zhou Mingming, et al.

PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring

Computer Vision

Jiyao Wang, Xiao Yang, Qingyong Hu, et al.

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

Liwei Jiang, Yuanjun Chai, Margaret Li, et al.

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Image Generation

Yiying Yang, Wei Cheng, Sijin Chen, et al.

Algorithmic Thinking Theory

MohammadHossein Bateni, Vincent Cohen-Addad, Yuzhou Gu, et al.

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

Reinforcement Learning

Chenhao Li, Andreas Krause, Marco Hutter

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Video Generation

Diffusion Model

Yunhong Lu, Yanhong Zeng, Haobo Li, et al.

Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Diffusion Model

Image Generation

Yueming Pan, Ruoyu Feng, Qi Dai, et al.

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Preference Modeling

Shengyuan Ding, Xinyu Fang, Ziyu Liu, et al.

Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Nex-AGI Team, Yuxuan Cai, Lu Chen, et al.

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Fangyu Lei, Jinxiang Meng, Yiming Huang, et al.

Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Diffusion Model

Yubo Huang, Hailong Guo, Fangtai Wu, et al.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Yushen Chen, Zhikang Niu, Ziyang Ma, et al.

VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Video Understanding

Object Detection

Yash Garg, Saketh Bachu, Arindam Dutta, et al.

Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail

Reinforcement Learning

NVIDIA, Yulong Cao, Tong Che, et al.

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Neural Networks

Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, et al.

Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

Diffusion Model

Subin Kim, Sangwoo Mo, Mamshad Nayeem Rizve, et al.

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Supervised Fine-Tuning

Siyuan Yang, Yang Zhang, Haoran He, et al.
