OceanInstruct Ocean Large Model Instruction Dataset
Date
Size
Publish URL
Categories

OceanInstruct is a large-scale language model instruction dataset designed specifically for the field of ocean science. It contains 20,000 instructions and aims to provide training data for large-scale language models in the ocean field. These instructions cover a wide range of ocean science knowledge, ensuring that the model has professional capabilities in ocean science question answering, content generation, and underwater embodied intelligence. This dataset was used to train the OceanGPT model, which performs well in ocean science question answering, content generation, and other aspects. The OceanGPT model outperforms the baseline language model on multiple tasks, showing its advantage in handling ocean tasks that require expertise.
This dataset was open sourced by Zhejiang University in 2024, and the related paper results are "OceanGPT: A Large Language Model for Ocean Science Tasks".
The address of the super neuro report isSelected for ACL 2024! Zhejiang University launches the first ocean language model OceanGPT, making underwater embodied intelligence a reality".
In addition, OceanBench also proposed OceanBench oceanography benchmark evaluation dataset, which is a benchmark evaluation dataset specifically for oceanographic tasks. This dataset includes a total of 15 ocean-related tasks, such as question answering and description tasks, and is designed to comprehensively evaluate the capabilities of large language models (LLMs) in the field of oceanography. The samples in OceanBench are generated from seed datasets in an automated way and manually verified by experts to ensure the professionalism and accuracy of the data.