HyperAI超神经

* This dataset supports online use. Click here to jump.

There is widespread optimism that frontier large language models (LLMs) and LLM-augmented systems could rapidly accelerate scientific discovery across disciplines. Many benchmarks already measure LLM knowledge and reasoning on textbook-style science problems, but few evaluate language models on the practical tasks that scientific research requires, such as literature search, protocol planning, and data analysis.

As a first step toward such a benchmark, the research team at FutureHouse released the Language Agent Biology Benchmark (LAB-Bench) in 2024. The dataset contains more than 2,400 multiple-choice questions for evaluating AI systems on a range of practical biology research capabilities: recall of and reasoning over the literature, data interpretation, access to and navigation of databases, and comprehension and manipulation of DNA and protein sequences. The accompanying paper is titled "LAB-Bench: Measuring Capabilities of Language Models for Biology Research".

LAB Bench Language Model Biology Benchmark Dataset
