HyperAI

Reasoning

Performance metrics for mainstream AI models across a range of tasks, showcasing state-of-the-art technology

AI Model Performance Benchmarks

ARC

50 articles | 0 benchmarks

Discrete Choice Models

50 articles | 0 benchmarks

3D Human Reconstruction

48 articles | 10 benchmarks

Causal Identification

46 articles | 0 benchmarks

Common Sense Reasoning

45 articles | 24 benchmarks

Task Planning

42 articles | 0 benchmarks

StrategyQA

39 articles | 0 benchmarks

Decision Making Under Uncertainty

38 articles | 0 benchmarks

Temporal Sequences

35 articles | 1 benchmark

Physical Intuition

33 articles | 1 benchmark

Assortment Optimization

32 articles | 0 benchmarks

Natural Language Visual Grounding

32 articles | 1 benchmark

Missing Labels

30 articles | 0 benchmarks

Model-based Reinforcement Learning

30 articles | 0 benchmarks

Abstract Argumentation

25 articles | 0 benchmarks

Zero-Shot Video Question Answer

25 articles | 16 benchmarks

Visual Reasoning

24 articles | 12 benchmarks

Systematic Generalization

22 articles | 0 benchmarks

Decision Making

20 articles | 1 benchmark

Geometry Problem Solving

20 articles | 0 benchmarks

Odd One Out

20 articles | 1 benchmark

Video-based Generative Performance Benchmarking

20 articles | 1 benchmark

Abstract Algebra

18 articles | 1 benchmark

Program Repair

18 articles | 3 benchmarks

Image Paragraph Captioning

17 articles | 1 benchmark

Navigate

16 articles | 0 benchmarks

Video-based Generative Performance Benchmarking (Contextual Understanding)

16 articles | 1 benchmark

Video-based Generative Performance Benchmarking (Correctness of Information)

15 articles | 1 benchmark

Video-based Generative Performance Benchmarking (Detail Orientation)

15 articles | 1 benchmark

Video-based Generative Performance Benchmarking (Temporal Understanding)

15 articles | 1 benchmark

Video-based Generative Performance Benchmarking (Consistency)

15 articles | 1 benchmark

Date Understanding

14 articles | 0 benchmarks

Visual Commonsense Reasoning

14 articles | 7 benchmarks

Formal Logic

13 articles | 1 benchmark

Automated Theorem Proving

11 articles | 9 benchmarks

Arithmetic Reasoning

9 articles | 5 benchmarks

Error Understanding

9 articles | 2 benchmarks

Logical Sequence

9 articles | 0 benchmarks

Mathematical Induction

9 articles | 1 benchmark

Physical Commonsense Reasoning

9 articles | 1 benchmark

Analogical Similarity

7 articles | 1 benchmark

Autonomous Web Navigation

7 articles | 0 benchmarks

Causal Judgment

7 articles | 0 benchmarks

Elementary Mathematics

7 articles | 1 benchmark

Logical Reasoning

7 articles | 10 benchmarks

Theory of Mind Modeling

7 articles | 0 benchmarks

GitHub issue resolution

6 articles | 0 benchmarks

Logical Fallacy Detection

6 articles | 0 benchmarks

Math Word Problem Solving

6 articles | 13 benchmarks

Multimodal Reasoning

6 articles | 3 benchmarks

Visual Entailment

6 articles | 3 benchmarks

Human Judgment Correlation

5 articles | 2 benchmarks

Winowhy

5 articles | 0 benchmarks

Checkmate In One

4 articles | 0 benchmarks

High School Mathematics

4 articles | 1 benchmark

Penguins In A Table

4 articles | 0 benchmarks

Anachronisms

3 articles | 0 benchmarks

College Mathematics

3 articles | 1 benchmark

Conformal Prediction

3 articles | 0 benchmarks

Crass AI

3 articles | 1 benchmark

Reasoning About Colored Objects

3 articles | 0 benchmarks

Analytic Entailment

2 articles | 1 benchmark

Crash Blossom

2 articles | 1 benchmark

Entailed Polarity

2 articles | 1 benchmark

Evaluating Information Essentiality

2 articles | 1 benchmark

Human Judgment Classification

2 articles | 1 benchmark

Identify Odd Metaphor

2 articles | 1 benchmark

Logical Args

2 articles | 1 benchmark

Metaphor Boolean

2 articles | 1 benchmark

Novel Concepts

2 articles | 0 benchmarks

Presuppositions As NLI

2 articles | 1 benchmark

Code Line Descriptions

1 article | 0 benchmarks

Commonsense Reasoning for RL

1 article | 1 benchmark

Pre-election ratings estimation

1 article | 0 benchmarks

Professional Accounting

1 article | 1 benchmark