
Deepmoney Tutorial Series 2: A Financial Big Model Trained on Deepseek-llm

Tutorial Introduction

This tutorial was produced and released by Deepmoney project manager Xingye Yuanyuan in 2024. It aims to provide in-depth market interpretation and financial analysis, making up for the shortfall of public knowledge in real-world finance. The Deepmoney tutorial series covers three models: Deepmoney-yi-34b, Deepmoney-67b-full, and Deepmoney-miqu-70b. This tutorial uses Deepmoney-67b-full.

This model is trained from Deepseek-llm-67b-base, which is open-sourced by Deepseek-AI. Training is divided into two stages: LoRA pre-training (pt) and LoRA fine-tuning (sft). Like Deepmoney-yi-34b, this model also relies on full-parameter training to ensure its professionalism and accuracy in the financial field.

The other two models in this tutorial series can be found here:

* Financial Big Model Series Tutorial 1: Deepmoney-34b-full

* Financial Big Model Series Tutorial 3: Deepmoney-miqu-70b

1. Research Background

Most so-called financial models today are trained on public knowledge, but in real-world finance this public knowledge is often seriously insufficient to explain the current market. If you are interested, look into the various propositions of Keynes, Friedman, and even contemporary behavioral finance. Moreover, the market changes every moment, and large amounts of news and massive data arrive in real time. Why not use large models to build a pipeline? In the research plan, this model is the base model of that pipeline; models acting as information collectors, target judges, qualitative analysts, quantitative analysts, and data extractors are all parts of the process. It is therefore essential that the model itself masters a large number of qualitative and quantitative methods. That is why this model was built.

2. About Data

pt: The validity of much public knowledge is questionable, but that does not mean it is wrong; the theoretical support behind many research methods in research reports also relies on it. For training, the researchers therefore selected a number of university textbooks and professional books: the quantity is not large, but the quality is good. In addition, they selected a large volume of research reports from December 2019 to 2023, published by a variety of institutions, including traditional brokerages and research institutes. Most of these are paid reports available only to institutions.

If you have read research reports, especially high-quality ones, you will find that they combine subjective judgment with quantitative analysis, and the data supporting the quantitative analysis is crucial to the entire logical chain. To extract this data, the researchers created a process that summarizes the context of the research report as part of the prompt.
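A minimal sketch of what such an extraction step could look like is shown below. It assumes an OpenAI-compatible chat endpoint; the model name, prompts, and helper names (`summarize_context`, `extract_data_points`) are illustrative placeholders, not the pipeline actually used for Deepmoney.

```python
# Hypothetical sketch of prompt-based data extraction from a research report.
# Assumes an OpenAI-compatible client; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_context(report_text: str) -> str:
    """Condense the surrounding report text so it fits into the extraction prompt."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Summarize the key arguments of this research report section."},
            {"role": "user", "content": report_text},
        ],
    )
    return resp.choices[0].message.content

def extract_data_points(section: str, context_summary: str) -> str:
    """Ask the model to list the quantitative data supporting the section's argument."""
    prompt = (
        f"Report context:\n{context_summary}\n\n"
        f"Section:\n{section}\n\n"
        "List every quantitative data point (metric, value, period, source) cited in this section."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```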

Finally, the researchers mixed the data and did not include any general-knowledge data: this model was created for greed, and the knowledge in industry research reports is comprehensive enough.

sft: First, a research report is split into several parts by chapter. Using each part as context, goliath-120b asks questions about the content of the research report (you can experiment with other models here; claude3 gave better results in actual tests). Nous-Capybara-34B then answers the questions from the corresponding research report fragments. The reason for separating the questioner and the answerer is to prevent the model from "asking and answering itself", i.e. ignoring the research report and echoing its own output. In this way, the knowledge and methods in the research report can be extracted. In addition, the researchers used gpt4 to extract the underlying assets (if any) from each research report and placed them in the instructions. The envisioned use is to give the target asset in the instruction together with news crawled in real time, combined with an agent that automatically asks questions, so that the model can reason about current news.
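A schematic version of this data-generation loop might look like the following. The `generate(model, prompt)` function is a stand-in for whatever inference backend serves goliath-120b, Nous-Capybara-34B, and gpt4, and the prompts are simplified illustrations rather than the ones actually used.

```python
# Illustrative sketch of the sft data-generation loop described above.
# `generate(model, prompt)` is a placeholder for any inference backend
# (e.g. a local vLLM server or an HTTP API); prompts are simplified.
def generate(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your own inference call here")

def build_sft_samples(report_chapters: list[str]) -> list[dict]:
    samples = []
    for chapter in report_chapters:
        # 1) A separate "questioner" model writes questions about the chapter.
        questions = generate(
            "goliath-120b",
            f"Read the following research report section and ask three questions about it:\n{chapter}",
        ).splitlines()
        # 2) A different "answerer" model answers strictly from the chapter,
        #    so the questioner never answers its own questions.
        for q in filter(None, questions):
            answer = generate(
                "Nous-Capybara-34B",
                f"Answer the question using only this report section.\nSection:\n{chapter}\nQuestion: {q}",
            )
            # 3) Extract the underlying asset (if any) and place it in the instruction.
            asset = generate(
                "gpt-4",
                f"Name the underlying asset discussed in this section, or reply 'none':\n{chapter}",
            )
            samples.append({"instruction": f"[{asset}] {q}", "input": chapter, "output": answer})
    return samples
```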

3. About training

This model is trained using the LLaMA-Factory training framework. For specific usage, please refer to: hiyouga/LLaMA-Factory: Unify Efficient Fine-tuning of 100+ LLMs (github.com)

This model goes through two stages: pt and sft.
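For orientation, the two stages could be launched roughly as sketched below. This is a minimal sketch under assumptions, not the actual Deepmoney recipe: the script name and flag names follow the LLaMA-Factory command-line examples from around that period, and the dataset names, paths, and hyperparameters are placeholders.

```python
# Rough sketch of launching the two LLaMA-Factory stages (pt, then sft) from Python.
# Flag names follow the LLaMA-Factory README examples of early 2024 (train_bash.py);
# datasets, paths, and hyperparameters are placeholders, not the Deepmoney values.
import subprocess

def run_stage(stage: str, dataset: str, output_dir: str) -> None:
    subprocess.run(
        [
            "python", "src/train_bash.py",
            "--stage", stage,                      # "pt" or "sft"
            "--do_train",
            "--model_name_or_path", "deepseek-ai/deepseek-llm-67b-base",
            "--dataset", dataset,
            "--finetuning_type", "lora",
            "--output_dir", output_dir,
            "--per_device_train_batch_size", "1",
            "--gradient_accumulation_steps", "8",
            "--learning_rate", "1e-4",
            "--num_train_epochs", "1.0",
            "--fp16",
        ],
        check=True,
    )

run_stage("pt", "financial_reports_pt", "saves/deepmoney-pt")    # continued pre-training
# In practice the sft stage would also load the weights produced by the pt stage.
run_stage("sft", "financial_reports_sft", "saves/deepmoney-sft")  # instruction fine-tuning
```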

4. Model Evaluation

Let's sample some recent events, simulate the real-world event-driven securities analysis process, and run comparative tests on DeepMoney and GPT4. Because the impact of an event on the market is relatively hard to observe, it is difficult to evaluate the effect without a rigorous backtesting process, and the output requires many quantitative methods for analysis. So the researchers post the results here, and everyone can make an intuitive evaluation of the outputs. The researchers run a global news crawling system that collects large amounts of news every moment. The pipeline deduplicates this news and makes subjective/objective judgments about it, which can be handled by a traditional BERT model. Then, for DeepMoney, the incoming news is processed in three steps:

1. Which industry sectors or investment targets may be affected by the above news?

2. Please design a quantitative method to study the impact of the above news on the ____ industry. And explain what specific data needs to be used.

3. Based on the following data, please _____design a specific quantitative method to quantitatively analyze the impact of the above news on the ____ industry.

Among them, the first question is a subjective judgment that extracts the targets affected by the news; this relies more on the model's subjective analysis ability. The industry name is then extracted from the first answer (for anyone familiar with large models, it is easy to design an automated process for this) and filled into the second question, whose purpose is to obtain the data needed for quantitative analysis. Asking for the quantitative method first and the data afterwards is the magic of CoT. The answer to the last question is what we really need: its context provides enough information, and it must reply with an exact, specific quantitative method. Combined with a code-writing model and function calling, this is entirely achievable if you have a macro and micro database with a complete data dictionary. Below are the three-step answers of deepmoney and gpt4. The news happened at 9:35 am Beijing time on January 15, 2024.
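A condensed sketch of how the three prompts could be chained is shown below. Here `ask` stands in for a call to the deployed DeepMoney model, `fetch_data` is a stub for the macro/micro database lookup, and the industry-extraction step is shown as a simple follow-up prompt rather than whatever automated process the researchers actually used.

```python
# Sketch of the three-step, CoT-style news analysis chain described above.
# `ask(prompt)` is a placeholder for a call to the deployed DeepMoney model;
# `fetch_data` is a stub for a macro/micro database with a complete data dictionary.
def ask(prompt: str) -> str:
    raise NotImplementedError("call the deployed model here")

def fetch_data(description: str) -> str:
    raise NotImplementedError("query your macro/micro database here")

def analyze_news(news: str) -> str:
    # Step 1: subjective judgment — which sectors or targets are affected?
    affected = ask(
        f"{news}\n\nWhich industry sectors or investment targets may be affected by the above news?"
    )

    # Extract one industry name from the first answer (shown here as another prompt;
    # any automated extraction would do).
    industry = ask(f"From the following analysis, return only one affected industry name:\n{affected}")

    # Step 2: ask for a quantitative method first, then the data it needs (the CoT ordering).
    method = ask(
        f"{news}\n\nPlease design a quantitative method to study the impact of the above news "
        f"on the {industry} industry, and explain what specific data needs to be used."
    )

    # Step 3: supply the data and ask for the exact, specific quantitative method.
    data = fetch_data(method)
    return ask(
        f"{news}\n\nBased on the following data:\n{data}\n\nplease design a specific quantitative "
        f"method to quantitatively analyze the impact of the above news on the {industry} industry."
    )
```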