HyperAI

Main

GPU

Console
Studio
Docs
Pricing

Pulse

News

Resources

Papers
Notebooks
Datasets
Wiki

Benchmarks

SOTA
LLM Models
GPU Leaderboard

Community

Events

Utility

About Terms of Service Privacy Policy
English

HyperAI
Papers

Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Build the Future of Artificial Intelligence

About

About Us Support Dataset Help

Products

News Papers Notebooks Datasets Wiki

Links

© HyperAI

GitHub Discord X (formerly Twitter)

iLRM: An Iterative Large 3D Reconstruction Model

Gyeongjin Kang, Seungtae Nam, Xiangyu Sun, et al.

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Video Understanding

Xiaoyu Chen, Hangxing Wei, Pushi Zhang, et al.

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

Chengqian Ma, Wei Tao, Yiwen Guo

RecGPT Technical Report

Retrieval-Augmented Generation

Chao Yi, Dian Chen, Gaoyang Guo, et al.

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Image Understanding

Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, et al.

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Luoxin Chen, Jinming Gu, Liankai Huang, et al.

The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery

Computer Vision

Omid Ghorbanzadeh, Yonghao Xu, Hengwei Zhao, et al.

Less is More for Synthetic Speech Detection in the Wild

Nicholas Andrews, Matthew Wiesner, Sanjeev Khudanpur, et al.

Solution-aware vs global ReLU selection: partial MILP strikes back for DNN verification

Convolutional Neural Network

Yuke Liao, Blaise Genest, Kuldeep Meel, et al.

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks

Ping Yu, Jack Lanchantin, Tianlu Wang, et al.

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

Video Understanding

Kaining Ying, Henghui Ding, Guanquan Jie, et al.

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision

Object Detection

Xiao Fang, Minhyek Jeon, Zheyang Qin, et al.

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Reinforcement Learning

Ruifeng Yuan, Chenghao Xiao, Sicong Leng, et al.

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, et al.

BANG: Dividing 3D Assets via Generative Exploded Dynamics

Longwen Zhang, Qixuan Zhang, Haoran Jiang, et al.

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Code Generation

Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, et al.

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Preference Modeling

Deep Ganguli, Liane Lovitt, Jackson Kernion, et al.

MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification

Convolutional Neural Network

Dingkun Liu, Zhu Chen, Jingwei Luo, et al.

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

Visual Question Answering

Junying Chen, Ruyi Ouyang, Anningzhe Gao, et al.

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Zihan Zhao, Bo Chen, Ziping Wan, et al.

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Image Generation

Diffusion Model

Zigang Geng, Yibing Wang, Yeyao Ma, et al.

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, et al.

AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data

Christopher F. Brown, Michal R. Kazmierski, Valerie J. Pasquarella, et al.

Toward long-range ENSO prediction with an explainable deep learning model

Convolutional Neural Network

Qi Chen, Yinghao Cui, Guobin Hong, et al.

OmniArch: Building Foundation Model for Scientific Computing

Tianyu Chen, Haoyi Zhou, Ying Li, et al.

VA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting

Hao Chen, Han Tao, Guo Song, et al.

UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

Supervised Fine-Tuning

Shuquan Lian, Yuhang Wu, Jia Ma, et al.

DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework

Natural Language Processing

Kuiye Ding, Fanda Fan, Yao Wang, et al.

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

Computer Vision

Kele Shao, Keda Tao, Kejia Zhang, et al.

SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Yixin Song, Zhenliang Xue, Dongliang Wei, et al.

Reconstructing 4D Spatial Intelligence: A Survey

Computer Vision

Video Understanding

Yukang Cao, Jiahao Lu, Zhisheng Huang, et al.

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Multi-Task Learning

Neural Networks

Zedong Wang, Siyuan Li, Dan Xu
