Finance-Instruct-500k Financial Reasoning Dataset
Date
4 days ago
License
Apache 2.0
Finance-Instruct-500k is a financial reasoning dataset designed for training advanced language models on financial tasks, reasoning, and multi-turn dialogue.
The dataset contains more than 500,000 high-quality entries from the financial domain, covering financial question answering, reasoning, sentiment analysis, topic classification, multilingual named entity recognition, and conversational AI.
Dataset features:
- Multi-turn dialogue: Rich conversational content that emphasizes contextual understanding and reasoning ability.
- Diverse data sources: Contains data from multiple high-quality datasets such as Cinder and Sujet-Finance-Instruct-177k.
- RAG format data: For Retrieval-Augmented Generation (RAG) tasks, external data is prepended to the user field to enhance contextual understanding.
- Deduplication and preprocessing: Overlapping and irregular entries are removed to obtain cleaner, higher-quality data.
- XBRL Tagging: Contains structured financial entity tags from Financial-NER-NLP for advanced extraction tasks.
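
The RAG formatting and deduplication features can be illustrated with a minimal Python sketch. It is not the dataset authors' actual pipeline: the function names, the `assistant` field name, the blank-line separator between context and question, and the sample records are all assumptions made for illustration. Only the idea of placing external data ahead of the question in the user field and dropping overlapping entries comes from the description above.

```python
import hashlib


def build_rag_user_field(context: str, question: str) -> str:
    """Place retrieved context ahead of the question in the user field.

    The blank-line separator is an assumption, not a documented format.
    """
    return f"{context.strip()}\n\n{question.strip()}"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record for each normalized (user, assistant) pair."""
    seen: set[str] = set()
    unique = []
    for record in records:
        # "\x1f" (unit separator) keeps the two fields from running together.
        key_text = normalize(record["user"]) + "\x1f" + normalize(record["assistant"])
        key = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up sample records; the second differs from the first only in whitespace.
records = [
    {
        "user": build_rag_user_field(
            "Q3 revenue rose 12% year over year.",
            "How did revenue change in Q3?",
        ),
        "assistant": "Revenue rose 12% year over year.",
    },
    {
        "user": build_rag_user_field(
            "Q3 revenue rose 12%  year over year.",
            "How did revenue change in Q3? ",
        ),
        "assistant": "Revenue rose 12% year over year.",
    },
]

cleaned = deduplicate(records)
print(len(records), "records before,", len(cleaned), "after deduplication")
```

Hashing a normalized key catches only exact and whitespace- or case-level duplicates. Detecting near-duplicates with different wording would need a fuzzier technique, which this sketch does not attempt.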