6 months ago

Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu

Abstract

We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in condensed matter physics. CMPhysBench is composed of more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, and strongly correlated systems. To ensure a deep understanding of the problem-solving process, we focus exclusively on calculation problems, requiring LLMs to independently generate comprehensive solutions. Meanwhile, leveraging tree-based representations of expressions, we introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of similarity between prediction and ground truth. Our results show that even the best model, Grok-4, reaches an average SEED score of only 36 and an accuracy of only 28% on CMPhysBench, underscoring a significant capability gap, especially for this practical and frontier domain relative to traditional physics. The code and dataset are publicly available at https://github.com/CMPhysBench/CMPhysBench.
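To make the idea of partial credit from expression trees concrete, here is a minimal Python sketch. It is not the SEED implementation from the paper or its repository: it parses expressions with Python's standard `ast` module, uses a simplified top-down tree edit distance, and normalizes the result to a 0-100 scale. The function names (`to_tree`, `tree_size`, `tree_distance`, `similarity_score`), the unit edit costs, and the normalization are all assumptions chosen for illustration.

```python
import ast


def to_tree(expr: str):
    """Parse a Python-syntax arithmetic expression into (label, children).

    Assumption: expressions use Python operators (e.g. ** for powers).
    """
    def convert(node):
        if isinstance(node, ast.BinOp):
            return (type(node.op).__name__,
                    [convert(node.left), convert(node.right)])
        if isinstance(node, ast.UnaryOp):
            return (type(node.op).__name__, [convert(node.operand)])
        if isinstance(node, ast.Call):
            name = getattr(node.func, "id", "call")
            return (name, [convert(arg) for arg in node.args])
        if isinstance(node, ast.Name):
            return (node.id, [])
        if isinstance(node, ast.Constant):
            return (repr(node.value), [])
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return convert(ast.parse(expr, mode="eval").body)


def tree_size(tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(tree_size(child) for child in tree[1])


def tree_distance(a, b) -> int:
    """Simplified top-down tree edit distance.

    Relabeling a node costs 1; inserting or deleting a whole subtree
    costs its node count. Children are aligned in order with a
    sequence-alignment dynamic program.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + tree_size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + tree_size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + tree_size(xs[i - 1]),   # delete subtree
                dp[i][j - 1] + tree_size(ys[j - 1]),   # insert subtree
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[-1][-1]


def similarity_score(prediction: str, truth: str) -> float:
    """Map the edit distance to a 0-100 score, clamped at zero."""
    p, t = to_tree(prediction), to_tree(truth)
    distance = tree_distance(p, t)
    return 100.0 * max(0.0, 1 - distance / max(tree_size(p), tree_size(t)))


if __name__ == "__main__":
    print(similarity_score("2*a*b", "2*a*b"))  # identical expressions
    print(similarity_score("2*a*b", "2*a*c"))  # one wrong symbol
    print(similarity_score("sin(x)", "a*b + c"))  # unrelated expressions
```

In this sketch a prediction that differs from the reference by a single symbol keeps most of its credit, whereas a binary exact-match metric would score it zero. A production metric would also need to handle algebraically equivalent forms (for example, `a*b` versus `b*a`), which this toy version treats as different.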


CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Weida Wang, Dongchen Huang, Jiatong Li, Tengchao Yang, Ziyang Zheng, Di Zhang, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, and 25 more
