Amber_Benchmark Molecular Dynamics Performance Evaluation Dataset
Amber stands for Assisted Model Building with Energy Refinement.
The Amber Benchmark dataset is a collection of performance benchmark inputs and configuration files designed specifically for high-performance computing (HPC) environments. It is used to test and compare the efficiency and scalability of the Amber Molecular Dynamics program across a variety of hardware and parallel architectures.
Unlike scientific experimental data or simulation results, this dataset contains standardized input and configuration packages used to measure the computational performance (speed, scalability, and efficiency) of a system, rather than simulation outputs for scientific analysis. All benchmarks (such as DHFR, Factor IX, Cellulose, STMV, etc.) come with standardized input files and reference performance results, which can be directly run repeatedly on different GPU or CPU platforms to verify performance.
The relevant paper is "Recent Developments in Amber Biomolecular Simulations". The dataset, titled "...", was released in 2025 by David A. Case et al.; the current version of this dataset is "Amber24: pmemd.cuda performance information".
Dataset structure
Amber offers two complementary benchmark suites:
- Walker benchmark suite
- Created by Dr. Ross C. Walker, it was one of the earliest performance evaluation benchmarks for the Amber GPU module (pmemd.cuda).
- Since 2010, it has covered multiple versions and GPU architectures (Fermi → Ampere → Hopper → Blackwell).
- It includes several representative systems (JAC, Factor IX, Cellulose, STMV, etc.) to compare the simulation speed (ns/day) of different GPUs.
- Cerutti benchmark suite
- Designed by Dr. Dave Cerutti, it employs modern, realistic simulation settings (Amber 18, 20, and 24).
- It includes four periodic systems: DHFR, Factor IX, Cellulose, and STMV (23K–1.1M atoms).
- Supports NVE/NPT ensembles with a time step of 4 fs and a cutoff radius of 9 Å.
- It offers two operating modes: "Default" and "Boost," with the latter improving performance by approximately 10%.
In addition, the dataset also includes implicit solvent (GB) benchmark systems, such as Trp Cage, Myoglobin, and Nucleosome, for non-periodic simulation performance evaluation.
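Each benchmark package is driven by an Amber mdin control file. The fragment below is an illustrative sketch of what a Cerutti-style NVE benchmark input might look like, reconstructed from the settings quoted above (4 fs time step, 9 Å cutoff); it is not one of the official dataset files, and exact parameter values may differ.

```
DHFR-style NVE benchmark (illustrative, not the official input)
 &cntrl
   imin=0, irest=1, ntx=5,      ! continue from an equilibrated restart
   nstlim=10000, dt=0.004,      ! 10,000 steps at 4 fs (assumes an HMR topology)
   ntb=1, ntp=0, ntt=0,         ! constant volume, no barostat/thermostat (NVE)
   ntc=2, ntf=2,                ! SHAKE constraints on bonds to hydrogen
   cut=9.0,                     ! 9 A direct-space cutoff (PME beyond)
   ntpr=1000, ntwx=0,           ! minimal output so I/O does not skew the timing
 /
```

Note that a 4 fs time step in Amber generally relies on hydrogen mass repartitioning (HMR) in the topology; the 2 fs entries in the tables below correspond to conventional (non-HMR) setups.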
Dataset content example
- Walker benchmark suite (traditional GPU benchmark)
Typical systems and single-GPU performance examples
| System | Atoms | Ensemble | Time step | GPU | Performance (ns/day) | Notes |
|---|---|---|---|---|---|---|
| JAC_production | 23,558 | NVE/NPT | 4 fs | RTX 4090 | 1638 / 1618 | Small protein systems offer the highest performance, reaching over 1600 ns/day. |
| Factor IX_production | 90,906 | NVE/NPT | 2 fs | RTX 4090 | 466 / 433 | Large water-box protein system for testing PME communication efficiency |
| Cellulose production | 408,609 | NVE/NPT | 2 fs | RTX 4090 | 129 / 119 | Polymer systems for evaluating long-range interactions and parallel decomposition performance |
| STMV_production | 1,067,095 | NPT | 4 fs | RTX 4090 | 78.9 | Satellite tobacco mosaic virus system for ultra-large-scale parallel load testing |
- On the latest Blackwell B200 GPUs, Amber24 running the Walker suite outperforms A100/H100 results on small systems and maintains its lead on large systems.
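The ns/day figures in these tables follow directly from three measured quantities: the time step, the number of steps, and the wall-clock time of the run. A minimal sketch of the conversion (the function name is mine, not part of the dataset):

```python
def ns_per_day(dt_fs: float, steps: int, wall_seconds: float) -> float:
    """Convert a timed MD run into the ns/day throughput metric.

    dt_fs:        integration time step in femtoseconds
    steps:        number of MD steps completed in the timed window
    wall_seconds: wall-clock time for those steps
    """
    # Simulated femtoseconds per wall-clock day, then convert fs -> ns.
    fs_per_day = dt_fs * steps * (86400.0 / wall_seconds)
    return fs_per_day / 1.0e6  # 1 ns = 1e6 fs

# Example: 10,000 steps at 4 fs finishing in 2.11 s -> ~1638 ns/day,
# on the order of the JAC_production row above.
print(ns_per_day(4.0, 10000, 2.11))
```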
- Cerutti benchmark suite (modern optimized benchmark)
Typical systems and performance examples (V100 GPU, Amber 20)
| System | Atoms | Ensemble | Mode | Performance (ns/day) | Notes |
|---|---|---|---|---|---|
| DHFR (JAC) | 23,558 | NVE/NPT | Default / Boost | 934 / 1059 | Small protein system, standard reference point |
| Factor IX | 90,906 | NVE/NPT | Default / Boost | 365 / 406 | Medium-sized system, communication and scalability balance test |
| Cellulose | 408,609 | NVE/NPT | Default / Boost | 88.9 / 96.2 | Large-scale polysaccharide systems, GPU memory and bandwidth pressure scenarios |
| STMV | 1,067,095 | NVE/NPT | Default / Boost | 30.4 / 33.5 | Million-atom virus system for extreme parallel performance evaluation |
- Amber 20 introduces the "leaky pair list" and "net force correction" optimization algorithms, which reduce the computational burden by approximately 3% while maintaining energy conservation.
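The roughly 10% Boost-mode gain quoted earlier can be checked directly against the Default/Boost columns of the Cerutti table. A small sketch (the dictionary and function names are illustrative, not part of the dataset):

```python
# Default vs Boost throughput (ns/day) from the Cerutti table (V100, Amber 20).
results = {
    "DHFR":      (934.0, 1059.0),
    "Factor IX": (365.0, 406.0),
    "Cellulose": (88.9, 96.2),
    "STMV":      (30.4, 33.5),
}

def boost_gain_pct(default: float, boost: float) -> float:
    """Percentage speedup of Boost mode over Default mode."""
    return 100.0 * (boost / default - 1.0)

gains = {name: round(boost_gain_pct(d, b), 1) for name, (d, b) in results.items()}
print(gains)  # per-system gains range from ~8% to ~13%, averaging near 10%
```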
- Implicit solvent (GB) benchmark suite
Typical systems and performance examples (V100 GPU, Amber 20, 4 fs time step)
| System | Atoms | Model | Performance (ns/day) | Notes |
|---|---|---|---|---|
| Trp Cage | 304 | GB | 2801 | A small protein folding model with peak performance of >2800 ns/day |
| Myoglobin | 2,492 | GB | 1725 | Medium-sized single-chain protein system with stable performance |
| Nucleosome | 25,095 | GB | 48.5 | Large chromatin unit system for testing energy conservation and throughput capacity |
- By removing explicit solvent friction, the GB model can significantly improve sampling rates, making it well suited to rapid exploration of energy landscapes.
Performance Comparison and Scalability Overview
- Small systems (≤ 30 K atoms): Performance is limited mainly by GPU clock speed and memory bandwidth, since there is too little parallel work to saturate the device.
- Medium systems (≈ 100 K atoms): GPU utilization peaks here, representing the optimal performance range for most real-world biological systems.
- Large systems (≥ 400 K atoms): Communication and memory overhead grow, and throughput gradually decreases with system size.
- Million-atom systems: Amber 24 sustains >130 ns/day on a single B200 GPU, demonstrating good parallel scalability.
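These throughput figures translate directly into wall-clock planning for a production run. A minimal sketch (the function name is mine, not part of the dataset):

```python
def days_to_simulate(target_ns: float, throughput_ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns at a measured throughput."""
    return target_ns / throughput_ns_per_day

# 1 microsecond (1000 ns) of a million-atom system at the ~130 ns/day
# single-B200 figure above -> ~7.7 days of wall-clock time.
print(days_to_simulate(1000.0, 130.0))
```

The same arithmetic explains why the small-system rates matter: at 1600+ ns/day, a microsecond of DHFR-scale dynamics fits in under a day.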