Amber_Benchmark Molecular Dynamics Performance Evaluation Dataset
Amber stands for Assisted Model Building with Energy Refinement.
The Amber Benchmark dataset is a collection of performance benchmark inputs and configuration files designed specifically for high-performance computing (HPC) environments. It is used to test and compare the efficiency and scalability of the Amber Molecular Dynamics program across a variety of hardware and parallel architectures.
Unlike scientific experimental data or simulation results, this dataset contains standardized input and configuration packages used to measure the computational performance (speed, scalability, and efficiency) of a system, rather than simulation outputs for scientific analysis. All benchmarks (such as DHFR, Factor IX, Cellulose, STMV, etc.) come with standardized input files and reference performance results, which can be directly run repeatedly on different GPU or CPU platforms to verify performance.
The relevant paper is "Recent Developments in Amber Biomolecular Simulations". The dataset, titled "...", was released in 2025 by David A. Case et al.; the current version of this dataset is "Amber24: pmemd.cuda performance information".
Dataset structure
Amber offers two complementary benchmark suites:
- Walker benchmark suite
- Created by Dr. Ross C. Walker, it was one of the earliest performance evaluation benchmarks for the Amber GPU module (pmemd.cuda).
- Since 2010, it has covered multiple versions and GPU architectures (Fermi → Ampere → Hopper → Blackwell).
- It includes several representative systems (JAC, Factor IX, Cellulose, STMV, etc.) to compare the simulation speed (ns/day) of different GPUs.
- Cerutti benchmark suite
- Designed by Dr. Dave Cerutti, it employs modern, realistic simulation settings (Amber 18, 20, and 24).
- It includes four periodic systems: DHFR, Factor IX, Cellulose, and STMV (23K–1.1M atoms).
- Supports NVE/NPT ensembles with a time step of 4 fs and a cutoff radius of 9 Å.
- It offers two operating modes: "Default" and "Boost," with the latter improving performance by approximately 10%.
In addition, the dataset also includes implicit solvent (GB) benchmark systems, such as Trp Cage, Myoglobin, and Nucleosome, for non-periodic simulation performance evaluation.
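Each benchmark package is driven by an Amber mdin control file. The fragment below is an illustrative sketch of what a Cerutti-style NVE benchmark input might look like, reconstructed from the settings quoted above (4 fs time step, 9 Å cutoff); it is not one of the official dataset files, and exact parameter values may differ.

```
DHFR-style NVE benchmark (illustrative, not the official input)
 &cntrl
   imin=0, irest=1, ntx=5,      ! continue from an equilibrated restart
   nstlim=10000, dt=0.004,      ! 10,000 steps at 4 fs (assumes an HMR topology)
   ntb=1, ntp=0, ntt=0,         ! constant volume, no barostat/thermostat (NVE)
   ntc=2, ntf=2,                ! SHAKE constraints on bonds to hydrogen
   cut=9.0,                     ! 9 A direct-space cutoff (PME beyond)
   ntpr=1000, ntwx=0,           ! minimal output so I/O does not skew the timing
 /
```

Note that a 4 fs time step in Amber generally relies on hydrogen mass repartitioning (HMR) in the topology; the 2 fs entries in the tables below correspond to conventional (non-HMR) setups.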
Dataset content example
- Walker benchmark suite (traditional GPU benchmark)
Typical systems and single-GPU performance examples
| System | Atoms | Ensemble | Time step | GPU | Performance (ns/day) | Notes |
|---|---|---|---|---|---|---|
| JAC_production | 23,558 | NVE/NPT | 4 fs | RTX 4090 | 1638 / 1618 | Small protein systems offer the highest performance, reaching over 1600 ns/day. |
| Factor IX_production | 90,906 | NVE/NPT | 2 fs | RTX 4090 | 466 / 433 | Large water-box protein system for testing PME communication efficiency |
| Cellulose production | 408,609 | NVE/NPT | 2 fs | RTX 4090 | 129 / 119 | Polymer systems for evaluating long-range interactions and parallel decomposition performance |
| STMV_production | 1,067,095 | NPT | 4 fs | RTX 4090 | 78.9 | Satellite tobacco mosaic virus system for ultra-large-scale parallel load testing |
- On the latest Blackwell B200 GPUs, Amber24 running the Walker suite outperforms A100/H100 results on small systems and maintains its lead on large systems.
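The ns/day figures in these tables follow directly from three measured quantities: the time step, the number of steps, and the wall-clock time of the run. A minimal sketch of the conversion (the function name is mine, not part of the dataset):

```python
def ns_per_day(dt_fs: float, steps: int, wall_seconds: float) -> float:
    """Convert a timed MD run into the ns/day throughput metric.

    dt_fs:        integration time step in femtoseconds
    steps:        number of MD steps completed in the timed window
    wall_seconds: wall-clock time for those steps
    """
    # Simulated femtoseconds per wall-clock day, then convert fs -> ns.
    fs_per_day = dt_fs * steps * (86400.0 / wall_seconds)
    return fs_per_day / 1.0e6  # 1 ns = 1e6 fs

# Example: 10,000 steps at 4 fs finishing in 2.11 s -> ~1638 ns/day,
# on the order of the JAC_production row above.
print(ns_per_day(4.0, 10000, 2.11))
```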
- Cerutti benchmark suite (modern optimized benchmark)
Typical systems and performance examples (V100 GPU, Amber 20)
| System | Atoms | Ensemble | Mode | Performance (ns/day) | Notes |
|---|---|---|---|---|---|
| DHFR (JAC) | 23,558 | NVE/NPT | Default / Boost | 934 / 1059 | Small protein system, standard reference point |
| Factor IX | 90,906 | NVE/NPT | Default / Boost | 365 / 406 | Medium-sized system, communication and scalability balance test |
| Cellulose | 408,609 | NVE/NPT | Default / Boost | 88.9 / 96.2 | Large-scale polysaccharide systems, GPU memory and bandwidth pressure scenarios |
| STMV | 1,067,095 | NVE/NPT | Default / Boost | 30.4 / 33.5 | Million-atom virus system for extreme parallel performance evaluation |
- Amber 20 introduces the "leaky pair list" and "net force correction" optimization algorithms, which reduce the computational burden by approximately 3% while maintaining energy conservation.
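The roughly 10% Boost-mode gain quoted earlier can be checked directly against the Default/Boost columns of the Cerutti table. A small sketch (the dictionary and function names are illustrative, not part of the dataset):

```python
# Default vs Boost throughput (ns/day) from the Cerutti table (V100, Amber 20).
results = {
    "DHFR":      (934.0, 1059.0),
    "Factor IX": (365.0, 406.0),
    "Cellulose": (88.9, 96.2),
    "STMV":      (30.4, 33.5),
}

def boost_gain_pct(default: float, boost: float) -> float:
    """Percentage speedup of Boost mode over Default mode."""
    return 100.0 * (boost / default - 1.0)

gains = {name: round(boost_gain_pct(d, b), 1) for name, (d, b) in results.items()}
print(gains)  # per-system gains range from ~8% to ~13%, averaging near 10%
```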
- Implicit solvent (GB) benchmark suite
Typical systems and performance examples (V100 GPU, Amber 20, 4 fs time step)
| System | Atoms | Model | Performance (ns/day) | Notes |
|---|---|---|---|---|
| Trp Cage | 304 | GB | 2801 | A small protein folding model with peak performance of >2800 ns/day |
| Myoglobin | 2,492 | GB | 1725 | Medium-sized single-chain protein system with stable performance |
| Nucleosome | 25,095 | GB | 48.5 | Large chromatin unit system for testing energy conservation and throughput capacity |
- By removing explicit solvent friction, the GB model can significantly improve sampling rates, making it well suited to rapid exploration of energy landscapes.
Performance Comparison and Scalability Overview
- Small systems (≤ 30 K atoms): Performance is limited mainly by GPU clock speed and memory bandwidth, since there is too little parallel work to saturate the device.
- Medium systems (≈ 100 K atoms): GPU utilization peaks here, representing the optimal performance range for most real-world biological systems.
- Large systems (≥ 400 K atoms): Communication and memory overhead grow, and throughput gradually decreases with system size.
- Million-atom systems: Amber 24 sustains >130 ns/day on a single B200 GPU, demonstrating good parallel scalability.
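These throughput figures translate directly into wall-clock planning for a production run. A minimal sketch (the function name is mine, not part of the dataset):

```python
def days_to_simulate(target_ns: float, throughput_ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns at a measured throughput."""
    return target_ns / throughput_ns_per_day

# 1 microsecond (1000 ns) of a million-atom system at the ~130 ns/day
# single-B200 figure above -> ~7.7 days of wall-clock time.
print(days_to_simulate(1000.0, 130.0))
```

The same arithmetic explains why the small-system rates matter: at 1600+ ns/day, a microsecond of DHFR-scale dynamics fits in under a day.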