AI SOTA Benchmarks
Latest AI model performance metrics, GPU benchmarks, and cutting-edge papers
AI Model Performance Benchmarks
Performance metrics of mainstream AI models across a range of tasks, showcasing state-of-the-art results
Open-Domain Question Answering
30 papers | 15 benchmarks
Handwritten Text Recognition
32 papers | 13 benchmarks
Adversarial Defense
34 papers | 10 benchmarks
Red Teaming
47 papers | 0 benchmarks
Backdoor Attack
36 papers | 0 benchmarks
Audio Classification
44 papers | 26 benchmarks
Bandwidth Extension
45 papers | 6 benchmarks
Target Speaker Extraction
49 papers | 0 benchmarks
Inference Optimization
48 papers | 0 benchmarks
Room Impulse Response (RIR)
46 papers | 0 benchmarks
Type Prediction
44 papers | 3 benchmarks
Chart Question Answering
41 papers | 3 benchmarks
Compiler Optimization
44 papers | 0 benchmarks
Traffic Signal Control
40 papers | 0 benchmarks
Code Classification
37 papers | 0 benchmarks
Action Quality Assessment
50 papers | 8 benchmarks
3D Lane Detection
50 papers | 4 benchmarks
Colorization
50 papers | 2 benchmarks
Color Constancy
50 papers | 1 benchmark
Human Dynamics
50 papers | 0 benchmarks
Node Classification
42 papers | 127 benchmarks
Graph Property Prediction
45 papers | 4 benchmarks
Jet Tagging
44 papers | 1 benchmark
Triple Classification
44 papers | 1 benchmark
Graph Sampling
49 papers | 0 benchmarks
Document Summarization
46 papers | 7 benchmarks
Knowledge Graphs
44 papers | 4 benchmarks
Explainable Artificial Intelligence (XAI)
49 papers | 1 benchmark
Ontology Matching
50 papers | 0 benchmarks
Knowledge Base Construction
44 papers | 0 benchmarks
Multimodal
74 papers | 77 benchmarks
Reasoning
56 papers | 56 benchmarks
Understanding
46 papers | 48 benchmarks
Other
32 papers | 32 benchmarks
Knowledge
27 papers | 29 benchmarks
Skin Lesion Segmentation
48 papers | 3 benchmarks
Diabetic Retinopathy Detection
48 papers | 1 benchmark
Pharmacovigilance
50 papers | 0 benchmarks
SSVEP
50 papers | 0 benchmarks
Metal Artifact Reduction
48 papers | 0 benchmarks
Classification
49 papers | 71 benchmarks
Domain Generalization
48 papers | 20 benchmarks
Bilevel Optimization
50 papers | 3 benchmarks
Computational Efficiency
49 papers | 1 benchmark
Inductive Learning
49 papers | 0 benchmarks
Deep Clustering
50 papers | 5 benchmarks
Multimodal Recommendation
50 papers | 5 benchmarks
Physical Simulations
50 papers | 5 benchmarks
Electrical Engineering
50 papers | 1 benchmark
Computational Efficiency
49 papers | 1 benchmark
Music Transcription
40 papers | 6 benchmarks
Voice Conversion
41 papers | 3 benchmarks
Community Question Answering
35 papers | 2 benchmarks
Music Classification
49 papers | 0 benchmarks
Music Information Retrieval
44 papers | 0 benchmarks
Few-Shot Text Classification
49 papers | 8 benchmarks
Word Alignment
50 papers | 7 benchmarks
Deep Clustering
50 papers | 5 benchmarks
Semantic Dependency Parsing
50 papers | 3 benchmarks
Lemmatization
49 papers | 0 benchmarks
Offline RL
48 papers | 2 benchmarks
Community Question Answering
35 papers | 2 benchmarks
Car Racing
48 papers | 0 benchmarks
Real-Time Strategy Games
46 papers | 0 benchmarks
Game Design
43 papers | 0 benchmarks
Common Sense Reasoning
45 papers | 24 benchmarks
3D Human Reconstruction
48 papers | 10 benchmarks
ARC
50 papers | 0 benchmarks
Discrete Choice Models
50 papers | 0 benchmarks
Causal Identification
46 papers | 0 benchmarks
Gesture Generation
47 papers | 4 benchmarks
Robot Task Planning
46 papers | 3 benchmarks
Trajectory Planning
47 papers | 2 benchmarks
Benchmarking
45 papers | 2 benchmarks
Multimodal Interaction
45 papers | 0 benchmarks
Speech Separation
49 papers | 19 benchmarks
Spoken Language Identification
50 papers | 12 benchmarks
Speech Dereverberation
50 papers | 5 benchmarks
Acoustic Modelling
50 papers | 0 benchmarks
Spoken Dialogue Systems
47 papers | 0 benchmarks
Time Series Forecasting
49 papers | 86 benchmarks
Time Series Prediction
50 papers | 2 benchmarks
Computational Efficiency
49 papers | 1 benchmark
Activity Prediction
48 papers | 1 benchmark
Predictive Process Monitoring
48 papers | 0 benchmarks
GPU Benchmarks
The latest GPU hardware and software performance evaluations, to help you make informed hardware choices
Software Performance
DeepSeek-R1-Distill-Qwen-7B
Environment: vLLM
DeepSeek-R1-Distill-Llama-8B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-14B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-32B
Environment: vLLM
DeepSeek-R1-Distill-Llama-70B
Environment: vLLM
DeepSeek-R1-Distill-Qwen-7B
Environment: SGLang
DeepSeek-R1-Distill-Llama-8B
Environment: SGLang
DeepSeek-R1-Distill-Qwen-14B
Environment: SGLang
DeepSeek-R1-Distill-Qwen-32B
Environment: SGLang
DeepSeek-R1-Distill-Llama-70B
Environment: SGLang
Latest Research Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends
Sleep-time Compute: Beyond Inference Scaling at Test-time
Kevin Lin, Charlie Snell, Yu Wang, et al.
Release Date: 4/18/2025
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han Wu, Heekyung Lee, Jiaxin Ge, et al.
Release Date: 4/18/2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya, Po-Yao Huang, Peize Sun, et al.
Release Date: 4/18/2025
[Title unavailable in source]
Shizhe Diao, Yu Yang, Yonggan Fu, et al.
Release Date: 4/18/2025
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Nandan Thakur, Jimmy Lin, Sam Havens, et al.
Release Date: 4/18/2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang, Haodong Chen, Shengqiong Wu, et al.
Release Date: 4/18/2025
Exploring Expert Failures Improves LLM Agent Tuning
Li-Cheng Lan, Andrew Bai, Minhao Cheng, et al.
Release Date: 4/18/2025
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
Leyang Li, Shilin Lu, Yan Ren, et al.
Release Date: 4/18/2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu, Jinjie Ni, Zijian Wu, et al.
Release Date: 4/18/2025
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng, et al.
Release Date: 4/18/2025
Retrieval-Augmented Generation with Conflicting Evidence
Han Wang, Archiki Prasad, Elias Stengel-Eskin, et al.
Release Date: 4/18/2025
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Lvmin Zhang, Maneesh Agrawala
Release Date: 4/18/2025
HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks
Stefan Abi-Karam, Cong Hao
Release Date: 4/18/2025
SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld, Zhe Chen, Davide Davoli, et al.
Release Date: 4/18/2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang, Lingen Li, Xuan Ju, et al.
Release Date: 4/17/2025
Towards Learning to Complete Anything in Lidar
Ayca Takmaz, Cristiano Saltori, Neehar Peri, et al.
Release Date: 4/17/2025
Robust and Fine-Grained Detection of AI Generated Texts
Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, et al.
Release Date: 4/17/2025
BitNet b1.58 2B4T Technical Report
Shuming Ma, Hongyu Wang, Shaohan Huang, et al.
Release Date: 4/17/2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei, Jiacong Wang, Haochen Wang, et al.
Release Date: 4/17/2025
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Fangzhi Xu, Hang Yan, Chang Ma, et al.
Release Date: 4/17/2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Ming Li, Yanhong Li, Ziyue Li, et al.
Release Date: 4/17/2025
Seedream 3.0 Technical Report
Yu Gao, Lixue Gong, Qiushan Guo, et al.
Release Date: 4/17/2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi, Xing Jin
Release Date: 4/17/2025
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Tao Zhang, Xiangtai Li, Zilong Huang, et al.
Release Date: 4/17/2025
TextArena
Leon Guertler, Bobby Cheng, Simon Yu, et al.
Release Date: 4/17/2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Ding Chen, Qingchen Yu, Pengyuan Wang, et al.
Release Date: 4/17/2025
Uncertainty-Guided Coarse-to-Fine Tumor Segmentation with Anatomy-Aware Post-Processing
Ilkin Sevgi Isler, David Mohaisen, Curtis Lisle, et al.
Release Date: 4/17/2025
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Zhiwei He, Tian Liang, Jiahao Xu, et al.
Release Date: 4/17/2025
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
Giovanni Mauro, Marco Minici, Luca Pappalardo
Release Date: 4/16/2025
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
Junliang Guo, Yang Ye, Tianyu He, et al.
Release Date: 4/16/2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen, Chongjian Ge, Shilong Zhang, et al.
Release Date: 4/16/2025
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
Yang Yan, Yu Lu, Renjun Xu, et al.
Release Date: 4/16/2025
CoRAG: Collaborative Retrieval-Augmented Generation
Aashiq Muhamed, Mona Diab, Virginia Smith
Release Date: 4/16/2025
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed, Jacopo Bonato, Mona Diab, et al.
Release Date: 4/16/2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang, Haonan Lin, Yanning Zhou, et al.
Release Date: 4/16/2025
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, et al.
Release Date: 4/16/2025
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
Sauradip Nag, Daniel Cohen-Or, Hao Zhang, et al.
Release Date: 4/16/2025
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu, et al.
Release Date: 4/16/2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
Wissam Antoun, Benoît Sagot, Djamé Seddah
Release Date: 4/16/2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seaweed, Ceyuan Yang, Zhijie Lin, et al.
Release Date: 4/16/2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong, Jun Hao Liew, Zilong Huang, et al.
Release Date: 4/16/2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Songyou Peng, Kyle Genova, et al.
Release Date: 4/16/2025
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
Haozhan Shen, Peng Liu, Jingcheng Li, et al.
Release Date: 4/16/2025
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
Yichun Yin, Wenyong Huang, Kaikai Song, et al.
Release Date: 4/16/2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li, Shoubin Yu, Han Lin, et al.
Release Date: 4/16/2025
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration
Yongsheng Yu, Haitian Zheng, Zhifei Zhang, et al.
Release Date: 4/16/2025
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging
Gabriele Lozupone, Alessandro Bria, Francesco Fontanella, et al.
Release Date: 4/16/2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
Xinnong Zhang, Jiayu Lin, Xinyi Mou, et al.
Release Date: 4/16/2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu, Weiyun Wang, Zhe Chen, et al.
Release Date: 4/16/2025
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
Minqian Liu, Zhiyang Xu, Xinyi Zhang, et al.
Release Date: 4/16/2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, et al.
Release Date: 4/16/2025
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee, Ngoc-Hieu Nguyen, Kazem Meidani, et al.
Release Date: 4/16/2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi, Jiaheng Liu, Yushuo Guan, et al.
Release Date: 4/16/2025
[Title unavailable in source]
Zheng Liu, Mengjie Liu, Jingzhou Chen, et al.
Release Date: 4/16/2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, et al.
Release Date: 4/16/2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang, Zichen Ding, Chang Ma, et al.
Release Date: 4/16/2025
Towards Automated Safety Requirements Derivation Using Agent-based RAG
Balahari Vignesh Balu, Florian Geissler, Francesco Carella, et al.
Release Date: 4/16/2025
C-SHAP for time series: An approach to high-level temporal explanations
Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen
Release Date: 4/16/2025