HyperAI

Reasoning

Performance metrics for mainstream AI models across a range of tasks, showing the current state of the art

AI Model Performance Benchmarks

ARC

50 papers | 0 benchmarks

Discrete Choice Models

50 papers | 0 benchmarks

3D Human Reconstruction

48 papers | 10 benchmarks

Causal Identification

46 papers | 0 benchmarks

Common Sense Reasoning

45 papers | 24 benchmarks

Task Planning

42 papers | 0 benchmarks

StrategyQA

39 papers | 0 benchmarks

Decision Making Under Uncertainty

38 papers | 0 benchmarks

Temporal Sequences

35 papers | 1 benchmark

Physical Intuition

33 papers | 1 benchmark

Assortment Optimization

32 papers | 0 benchmarks

Natural Language Visual Grounding

32 papers | 1 benchmark

Missing Labels

30 papers | 0 benchmarks

Model-based Reinforcement Learning

30 papers | 0 benchmarks

Abstract Argumentation

25 papers | 0 benchmarks

Zero-Shot Video Question Answer

25 papers | 16 benchmarks

Visual Reasoning

24 papers | 12 benchmarks

Systematic Generalization

22 papers | 0 benchmarks

Decision Making

20 papers | 1 benchmark

Geometry Problem Solving

20 papers | 0 benchmarks

Odd One Out

20 papers | 1 benchmark

Video-based Generative Performance Benchmarking

20 papers | 1 benchmark

Abstract Algebra

18 papers | 1 benchmark

Program Repair

18 papers | 3 benchmarks

Image Paragraph Captioning

17 papers | 1 benchmark

Navigate

16 papers | 0 benchmarks

Video-based Generative Performance Benchmarking (Contextual Understanding)

16 papers | 1 benchmark

Video-based Generative Performance Benchmarking (Correctness of Information)

15 papers | 1 benchmark

Video-based Generative Performance Benchmarking (Detail Orientation)

15 papers | 1 benchmark

Video-based Generative Performance Benchmarking (Temporal Understanding)

15 papers | 1 benchmark

Video-based Generative Performance Benchmarking (Consistency)

15 papers | 1 benchmark

Date Understanding

14 papers | 0 benchmarks

Visual Commonsense Reasoning

14 papers | 7 benchmarks

Formal Logic

13 papers | 1 benchmark

Automated Theorem Proving

11 papers | 9 benchmarks

Arithmetic Reasoning

9 papers | 5 benchmarks

Error Understanding

9 papers | 2 benchmarks

Logical Sequence

9 papers | 0 benchmarks

Mathematical Induction

9 papers | 1 benchmark

Physical Commonsense Reasoning

9 papers | 1 benchmark

Analogical Similarity

7 papers | 1 benchmark

Autonomous Web Navigation

7 papers | 0 benchmarks

Causal Judgment

7 papers | 0 benchmarks

Elementary Mathematics

7 papers | 1 benchmark

Logical Reasoning

7 papers | 10 benchmarks

Theory of Mind Modeling

7 papers | 0 benchmarks

GitHub issue resolution

6 papers | 0 benchmarks

Logical Fallacy Detection

6 papers | 0 benchmarks

Math Word Problem Solving

6 papers | 13 benchmarks

Multimodal Reasoning

6 papers | 3 benchmarks

Visual Entailment

6 papers | 3 benchmarks

Human Judgment Correlation

5 papers | 2 benchmarks

Winowhy

5 papers | 0 benchmarks

Checkmate In One

4 papers | 0 benchmarks

High School Mathematics

4 papers | 1 benchmark

Penguins In A Table

4 papers | 0 benchmarks

Anachronisms

3 papers | 0 benchmarks

College Mathematics

3 papers | 1 benchmark

Conformal Prediction

3 papers | 0 benchmarks

Crass AI

3 papers | 1 benchmark

Reasoning About Colored Objects

3 papers | 0 benchmarks

Analytic Entailment

2 papers | 1 benchmark

Crash Blossom

2 papers | 1 benchmark

Entailed Polarity

2 papers | 1 benchmark

Evaluating Information Essentiality

2 papers | 1 benchmark

Human Judgment Classification

2 papers | 1 benchmark

Identify Odd Metaphor

2 papers | 1 benchmark

Logical Args

2 papers | 1 benchmark

Metaphor Boolean

2 papers | 1 benchmark

Novel Concepts

2 papers | 0 benchmarks

Presuppositions As NLI

2 papers | 1 benchmark

Code Line Descriptions

1 paper | 0 benchmarks

Commonsense Reasoning for RL

1 paper | 1 benchmark

Pre-election ratings estimation

1 paper | 0 benchmarks

Professional Accounting

1 paper | 1 benchmark