HyperAI

AI SOTA Benchmarks

Latest AI model performance metrics, GPU benchmarks, and cutting-edge papers

AI Model Performance Benchmarks

Performance metrics of mainstream AI models across various tasks, showcasing the state of the art

Open-Domain Question Answering

30 papers | 15 benchmarks

Handwritten Text Recognition

32 papers | 13 benchmarks

Adversarial Defense

34 papers | 10 benchmarks

Red Teaming

47 papers | 0 benchmarks

Backdoor Attack

36 papers | 0 benchmarks

Audio Classification

44 papers | 26 benchmarks

Bandwidth Extension

45 papers | 6 benchmarks

Target Speaker Extraction

49 papers | 0 benchmarks

Inference Optimization

48 papers | 0 benchmarks

Room Impulse Response (RIR)

46 papers | 0 benchmarks

Type Prediction

44 papers | 3 benchmarks

Chart Question Answering

41 papers | 3 benchmarks

Compiler Optimization

44 papers | 0 benchmarks

Traffic Signal Control

40 papers | 0 benchmarks

Code Classification

37 papers | 0 benchmarks

Action Quality Assessment

50 papers | 8 benchmarks

3D Lane Detection

50 papers | 4 benchmarks

Colorization

50 papers | 2 benchmarks

Color Constancy

50 papers | 1 benchmark

Human Dynamics

50 papers | 0 benchmarks

Node Classification

42 papers | 127 benchmarks

Graph Property Prediction

45 papers | 4 benchmarks

Jet Tagging

44 papers | 1 benchmark

Triple Classification

44 papers | 1 benchmark

Graph Sampling

49 papers | 0 benchmarks

Document Summarization

46 papers | 7 benchmarks

Knowledge Graphs

44 papers | 4 benchmarks

Explainable Artificial Intelligence (XAI)

49 papers | 1 benchmark

Ontology Matching

50 papers | 0 benchmarks

Knowledge Base Construction

44 papers | 0 benchmarks

Multimodal

74 papers | 77 benchmarks

Reasoning

56 papers | 56 benchmarks

Understanding

46 papers | 48 benchmarks

Other

32 papers | 32 benchmarks

Knowledge

27 papers | 29 benchmarks

Skin Lesion Segmentation

48 papers | 3 benchmarks

Diabetic Retinopathy Detection

48 papers | 1 benchmark

Pharmacovigilance

50 papers | 0 benchmarks

SSVEP

50 papers | 0 benchmarks

Metal Artifact Reduction

48 papers | 0 benchmarks

Classification

49 papers | 71 benchmarks

Domain Generalization

48 papers | 20 benchmarks

Bilevel Optimization

50 papers | 3 benchmarks

Computational Efficiency

49 papers | 1 benchmark

Inductive Learning

49 papers | 0 benchmarks

Deep Clustering

50 papers | 5 benchmarks

Multimodal Recommendation

50 papers | 5 benchmarks

Physical Simulations

50 papers | 5 benchmarks

Electrical Engineering

50 papers | 1 benchmark

Music Transcription

40 papers | 6 benchmarks

Voice Conversion

41 papers | 3 benchmarks

Community Question Answering

35 papers | 2 benchmarks

Music Classification

49 papers | 0 benchmarks

Music Information Retrieval

44 papers | 0 benchmarks

Few-Shot Text Classification

49 papers | 8 benchmarks

Word Alignment

50 papers | 7 benchmarks

Semantic Dependency Parsing

50 papers | 3 benchmarks

Lemmatization

49 papers | 0 benchmarks

Offline RL

48 papers | 2 benchmarks

Car Racing

48 papers | 0 benchmarks

Real-Time Strategy Games

46 papers | 0 benchmarks

Game Design

43 papers | 0 benchmarks

Common Sense Reasoning

45 papers | 24 benchmarks

3D Human Reconstruction

48 papers | 10 benchmarks

ARC

50 papers | 0 benchmarks

Discrete Choice Models

50 papers | 0 benchmarks

Causal Identification

46 papers | 0 benchmarks

Gesture Generation

47 papers | 4 benchmarks

Robot Task Planning

46 papers | 3 benchmarks

Trajectory Planning

47 papers | 2 benchmarks

Benchmarking

45 papers | 2 benchmarks

Multimodal Interaction

45 papers | 0 benchmarks

Speech Separation

49 papers | 19 benchmarks

Spoken Language Identification

50 papers | 12 benchmarks

Speech Dereverberation

50 papers | 5 benchmarks

Acoustic Modelling

50 papers | 0 benchmarks

Spoken Dialogue Systems

47 papers | 0 benchmarks

Time Series Forecasting

49 papers | 86 benchmarks

Time Series Prediction

50 papers | 2 benchmarks

Activity Prediction

48 papers | 1 benchmark

Predictive Process Monitoring

48 papers | 0 benchmarks

GPU Benchmarks

Latest GPU hardware and software performance evaluations to help you make informed hardware choices

Software Performance

DeepSeek-R1-Distill-Qwen-7B
Environment: vLLM
DeepSeek-R1-Distill-Llama-8B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-14B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-32B
Environment: vLLM
DeepSeek-R1-Distill-Llama-70B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-7B
Environment: SGLang
DeepSeek-R1-Distill-Llama-8B
Environment: SGLang
DeepSeek-R1-Distill-Qwen-14B
Environment: SGLang
DeepSeek-R1-Distill-Qwen-32B
Environment: SGLang
DeepSeek-R1-Distill-Llama-70B
Environment: SGLang

Latest Research Papers

Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

Sleep-time Compute: Beyond Inference Scaling at Test-time
Kevin Lin, Charlie Snell, Yu Wang, et al.
Release Date: 4/18/2025
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, et al.
Release Date: 4/18/2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya, Po-Yao Huang, Peize Sun, et al.
Release Date: 4/18/2025
[Title missing in source]
Shizhe Diao, Yu Yang, Yonggan Fu, et al.
Release Date: 4/18/2025
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Nandan Thakur, Jimmy Lin, Sam Havens, et al.
Release Date: 4/18/2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang, Haodong Chen, Shengqiong Wu, et al.
Release Date: 4/18/2025
Exploring Expert Failures Improves LLM Agent Tuning
Li-Cheng Lan, Andrew Bai, Minhao Cheng, et al.
Release Date: 4/18/2025
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
Leyang Li, Shilin Lu, Yan Ren, et al.
Release Date: 4/18/2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu, Jinjie Ni, Zijian Wu, et al.
Release Date: 4/18/2025
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng, et al.
Release Date: 4/18/2025
Retrieval-Augmented Generation with Conflicting Evidence
Han Wang, Archiki Prasad, Elias Stengel-Eskin, et al.
Release Date: 4/18/2025
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang, Maneesh Agrawala
Release Date: 4/18/2025
HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks
Stefan Abi-Karam, Cong Hao
Release Date: 4/18/2025
SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld, Zhe Chen, Davide Davoli, et al.
Release Date: 4/18/2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang, Lingen Li, Xuan Ju, et al.
Release Date: 4/17/2025
Towards Learning to Complete Anything in Lidar
Ayca Takmaz, Cristiano Saltori, Neehar Peri, et al.
Release Date: 4/17/2025
Robust and Fine-Grained Detection of AI Generated Texts
Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, et al.
Release Date: 4/17/2025
BitNet b1.58 2B4T Technical Report
Shuming Ma, Hongyu Wang, Shaohan Huang, et al.
Release Date: 4/17/2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei, Jiacong Wang, Haochen Wang, et al.
Release Date: 4/17/2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Fangzhi Xu, Hang Yan, Chang Ma, et al.
Release Date: 4/17/2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Ming Li, Yanhong Li, Ziyue Li, et al.
Release Date: 4/17/2025
Seedream 3.0 Technical Report
Yu Gao, Lixue Gong, Qiushan Guo, et al.
Release Date: 4/17/2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi, Xing Jin
Release Date: 4/17/2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang, Xiangtai Li, Zilong Huang, et al.
Release Date: 4/17/2025
TextArena
Leon Guertler, Bobby Cheng, Simon Yu, et al.
Release Date: 4/17/2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Ding Chen, Qingchen Yu, Pengyuan Wang, et al.
Release Date: 4/17/2025
Uncertainty-Guided Coarse-to-Fine Tumor Segmentation with Anatomy-Aware Post-Processing
Ilkin Sevgi Isler, David Mohaisen, Curtis Lisle, et al.
Release Date: 4/17/2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Zhiwei He, Tian Liang, Jiahao Xu, et al.
Release Date: 4/17/2025
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
Giovanni Mauro, Marco Minici, Luca Pappalardo
Release Date: 4/16/2025
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Junliang Guo, Yang Ye, Tianyu He, et al.
Release Date: 4/16/2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen, Chongjian Ge, Shilong Zhang, et al.
Release Date: 4/16/2025
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
Yang Yan, Yu Lu, Renjun Xu, et al.
Release Date: 4/16/2025
CoRAG: Collaborative Retrieval-Augmented Generation
Aashiq Muhamed, Mona Diab, Virginia Smith
Release Date: 4/16/2025
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona Diab, et al.
Release Date: 4/16/2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang, Haonan Lin, Yanning Zhou, et al.
Release Date: 4/16/2025
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, et al.
Release Date: 4/16/2025
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
Sauradip Nag, Daniel Cohen-Or, Hao Zhang, et al.
Release Date: 4/16/2025
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu, et al.
Release Date: 4/16/2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun, Benoît Sagot, Djamé Seddah
Release Date: 4/16/2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead, Ceyuan Yang, Zhijie Lin, et al.
Release Date: 4/16/2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang, et al.
Release Date: 4/16/2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Songyou Peng, Kyle Genova, et al.
Release Date: 4/16/2025
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Haozhan Shen, Peng Liu, Jingcheng Li, et al.
Release Date: 4/16/2025
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
Yichun Yin, Wenyong Huang, Kaikai Song, et al.
Release Date: 4/16/2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li, Shoubin Yu, Han Lin, et al.
Release Date: 4/16/2025
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
Yongsheng Yu, Haitian Zheng, Zhifei Zhang, et al.
Release Date: 4/16/2025
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, et al.
Release Date: 4/16/2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Xinnong Zhang, Jiayu Lin, Xinyi Mou, et al.
Release Date: 4/16/2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu, Weiyun Wang, Zhe Chen, et al.
Release Date: 4/16/2025
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
Minqian Liu, Zhiyang Xu, Xinyi Zhang, et al.
Release Date: 4/16/2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, et al.
Release Date: 4/16/2025
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, et al.
Release Date: 4/16/2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi, Jiaheng Liu, Yushuo Guan, et al.
Release Date: 4/16/2025
[Title missing in source]
Zheng Liu, Mengjie Liu, Jingzhou Chen, et al.
Release Date: 4/16/2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, et al.
Release Date: 4/16/2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang, Zichen Ding, Chang Ma, et al.
Release Date: 4/16/2025
Towards Automated Safety Requirements Derivation Using Agent-based RAG
Balahari Vignesh Balu, Florian Geissler, Francesco Carella, et al.
Release Date: 4/16/2025
C-SHAP for time series: An approach to high-level temporal explanations
Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen
Release Date: 4/16/2025