Reasoning
Performance metrics for mainstream AI models across various tasks, highlighting state-of-the-art technology
AI model performance benchmarks
ARC
50 articles | 0 benchmarks
Discrete Choice Models
50 articles | 0 benchmarks
3D Human Reconstruction
48 articles | 10 benchmarks
Causal Identification
46 articles | 0 benchmarks
Common Sense Reasoning
45 articles | 24 benchmarks
Task Planning
42 articles | 0 benchmarks
StrategyQA
39 articles | 0 benchmarks
Decision Making Under Uncertainty
38 articles | 0 benchmarks
Temporal Sequences
35 articles | 1 benchmark
Physical Intuition
33 articles | 1 benchmark
Assortment Optimization
32 articles | 0 benchmarks
Natural Language Visual Grounding
32 articles | 1 benchmark
Missing Labels
30 articles | 0 benchmarks
Model-based Reinforcement Learning
30 articles | 0 benchmarks
Abstract Argumentation
25 articles | 0 benchmarks
Zero-Shot Video Question Answer
25 articles | 16 benchmarks
Visual Reasoning
24 articles | 12 benchmarks
Systematic Generalization
22 articles | 0 benchmarks
Decision Making
20 articles | 1 benchmark
Geometry Problem Solving
20 articles | 0 benchmarks
Odd One Out
20 articles | 1 benchmark
Video-based Generative Performance Benchmarking
20 articles | 1 benchmark
Abstract Algebra
18 articles | 1 benchmark
Program Repair
18 articles | 3 benchmarks
Image Paragraph Captioning
17 articles | 1 benchmark
Navigate
16 articles | 0 benchmarks
Video-based Generative Performance Benchmarking (Contextual Understanding)
16 articles | 1 benchmark
Video-based Generative Performance Benchmarking (Correctness of Information)
15 articles | 1 benchmark
Video-based Generative Performance Benchmarking (Detail Orientation)
15 articles | 1 benchmark
Video-based Generative Performance Benchmarking (Temporal Understanding)
15 articles | 1 benchmark
Video-based Generative Performance Benchmarking (Consistency)
15 articles | 1 benchmark
Date Understanding
14 articles | 0 benchmarks
Visual Commonsense Reasoning
14 articles | 7 benchmarks
Formal Logic
13 articles | 1 benchmark
Automated Theorem Proving
11 articles | 9 benchmarks
Arithmetic Reasoning
9 articles | 5 benchmarks
Error Understanding
9 articles | 2 benchmarks
Logical Sequence
9 articles | 0 benchmarks
Mathematical Induction
9 articles | 1 benchmark
Physical Commonsense Reasoning
9 articles | 1 benchmark
Analogical Similarity
7 articles | 1 benchmark
Autonomous Web Navigation
7 articles | 0 benchmarks
Causal Judgment
7 articles | 0 benchmarks
Elementary Mathematics
7 articles | 1 benchmark
Logical Reasoning
7 articles | 10 benchmarks
Theory of Mind Modeling
7 articles | 0 benchmarks
GitHub issue resolution
6 articles | 0 benchmarks
Logical Fallacy Detection
6 articles | 0 benchmarks
Math Word Problem Solving
6 articles | 13 benchmarks
Multimodal Reasoning
6 articles | 3 benchmarks
Visual Entailment
6 articles | 3 benchmarks
Human Judgment Correlation
5 articles | 2 benchmarks
WinoWhy
5 articles | 0 benchmarks
Checkmate In One
4 articles | 0 benchmarks
High School Mathematics
4 articles | 1 benchmark
Penguins In A Table
4 articles | 0 benchmarks
Anachronisms
3 articles | 0 benchmarks
College Mathematics
3 articles | 1 benchmark
Conformal Prediction
3 articles | 0 benchmarks
Crass AI
3 articles | 1 benchmark
Reasoning About Colored Objects
3 articles | 0 benchmarks
Analytic Entailment
2 articles | 1 benchmark
Crash Blossom
2 articles | 1 benchmark
Entailed Polarity
2 articles | 1 benchmark
Evaluating Information Essentiality
2 articles | 1 benchmark
Human Judgment Classification
2 articles | 1 benchmark
Identify Odd Metaphor
2 articles | 1 benchmark
Logical Args
2 articles | 1 benchmark
Metaphor Boolean
2 articles | 1 benchmark
Novel Concepts
2 articles | 0 benchmarks
Presuppositions As NLI
2 articles | 1 benchmark
Code Line Descriptions
1 article | 0 benchmarks
Commonsense Reasoning for RL
1 article | 1 benchmark
Pre-election ratings estimation
1 article | 0 benchmarks
Professional Accounting
1 article | 1 benchmark