Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark