Weekly Editor's Picks | Run the Deepmoney Financial Large Model Online; AI Preference and Other High-Quality Datasets Now Available

At present, most AI research on financial models relies on training with public knowledge, but in actual financial practice this public knowledge often falls seriously short of explaining the current market. An ideal financial large model should be able to understand news or data events and evaluate them immediately from both subjective and quantitative perspectives.
Deepmoney was created for this purpose. It is a large language model project focused on investment in the financial field. The hyper.ai official website now provides tutorials for running it online. Come and try it!
From March 18 to March 22, the hyper.ai official website was updated with:
* High-quality public datasets: 10
* High-quality tutorials: 3
* Community articles: 3
* Popular encyclopedia entries: 10
Visit the official website: hyper.ai
Selected public datasets
1. OpenHermesPreferences: AI Preferences Dataset
The OpenHermesPreferences dataset was created by Argilla in collaboration with the Hugging Face H4 team and contains about 1 million AI preference records. It can be used to train preference models or to align language models through techniques such as direct preference optimization (DPO).
Direct use:
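As a rough illustration of how preference data such as OpenHermesPreferences feeds direct preference optimization, the sketch below computes the standard DPO loss for a single chosen/rejected pair. It is a minimal sketch, not the dataset authors' training code: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example, and the inputs are assumed to be summed log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper for illustration).

    Each argument is the summed log-probability of a full response.
    beta controls how strongly the policy is kept close to the reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy already prefers the chosen response: the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

In practice the loss is averaged over a batch of pairs and minimized with a deep learning framework; the scalar version here only shows the arithmetic.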
2. LongAlign-10k Large Model Long Context Alignment Dataset
LongAlign-10k is a dataset proposed by Tsinghua University to address the challenges large models face in long-context alignment tasks. It contains 10,000 long instruction samples ranging from 8k to 64k in length. The dataset targets the performance of large models in long contexts and their ability to follow task instructions of 10k to 100k in length.
Direct use:
3. CyberMetric Large Model Cybersecurity Evaluation Dataset
The CyberMetric dataset contains 10,000 questions designed to comprehensively assess the cybersecurity knowledge of large models. The questions were generated with several large models and validated by experts in the cybersecurity field to ensure their relevance and accuracy.
Direct use:
4. 2020 China Ground Photovoltaic Power Station 10m National Scale Map Dataset
China Agricultural University and the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, developed a national-scale mapping method to address the lack of high-resolution, open-source data on the distribution of ground photovoltaic power stations in China, and released a national 10-meter-resolution classification dataset of ground photovoltaic power stations for 2020. With a classification accuracy of 89%, the dataset accurately reveals the spatial distribution of China's photovoltaic power stations and provides valuable data for energy planning, land use, remote sensing monitoring and environmental research. It fills a gap in domestic data in this field and is of great significance to related research.
Direct use:
5. Crop Diseases Classification Crop Disease Classification Image Dataset
This dataset contains crop images labeled with one of five classes: cassava bacterial blight (CBB), cassava brown streak disease (CBSD), cassava green mottle (CGM), cassava mosaic disease (CMD), and healthy. It can be used to train machine learning models to detect plant diseases or to develop automatic plant diagnosis algorithms.
Direct use:
6. Tomato Leaf Diseases Detection Tomato Leaf Disease Image Detection Dataset
This dataset is intended for detecting diseases in images of tomato leaves. The images are divided into the following categories: healthy, bacterial spot, early blight, late blight, leaf mold, target spot, and black spot. The annotations are in YOLO v5 PyTorch format.
Direct use:
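For readers unfamiliar with the YOLO v5 PyTorch annotation format mentioned above: each image has a text file with one line per object, written as `class_id x_center y_center width height`, with the four box values normalized to the range 0 to 1. The sketch below converts one such line into pixel coordinates. The function name `yolo_to_pixel_box` and the sample line and image size are made up for illustration and do not come from the dataset itself.

```python
def yolo_to_pixel_box(label_line, image_width, image_height):
    """Convert one YOLO label line to (class_id, (x_min, y_min, x_max, y_max)).

    Hypothetical helper for illustration. The input line is
    'class_id x_center y_center width height' with normalized box values.
    """
    parts = label_line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    # Scale the normalized center/size box to pixel corner coordinates.
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return class_id, (x_min, y_min, x_max, y_max)


# Made-up example: class 2, box centered in a 640x480 image.
print(yolo_to_pixel_box("2 0.5 0.5 0.25 0.5", 640, 480))
```

Which integer maps to which disease category is defined by the dataset's own configuration file, so the class ID above should not be read as any particular disease.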
7. AMAZON REVIEWS 2023 Large Amazon Review Dataset
AMAZON REVIEWS 2023 is a large-scale Amazon review dataset collected by McAuley Lab in 2023, containing more than 570 million reviews and 48 million products, covering 33 different categories.
Direct use:
8. DiFF Diffusion Model-Generated Facial Forgery Dataset
DiFF is a high-quality, large-scale facial forgery image dataset jointly developed by Shandong University, National University of Singapore and other institutions. Its images are generated with diffusion models, and it contains more than 500,000 images. The dataset is suitable for training models for facial forgery detection, adversarial attack and defense against deepfakes, and other related computer vision tasks.
Direct use:
9. MIntRec2.0 Multimodal Intent Recognition Dialogue Dataset
MIntRec2.0 is a large-scale multimodal, multi-party benchmark dataset proposed by Tsinghua University and others, designed for recognizing intent in conversations and detecting out-of-scope content. Compared with the earlier MIntRec, MIntRec2.0 grows to 15K samples covering 30 intent categories, with about 9.3K in-scope and 5.7K out-of-scope annotated utterances across the text, video and audio modalities.
Direct use:
10. ApolloCorpora Multilingual Medical Dataset
ApolloCorpora is a multilingual medical dataset jointly constructed by the Shenzhen Big Data Research Institute and the Chinese University of Hong Kong research team. The dataset covers six major languages used by 6.1 billion people worldwide, including English, Chinese, Hindi, Spanish, French and Arabic.
Direct use:
For more public datasets, please visit:
Selected Public Tutorials
1. Run Deepmoney-34b-full online
Deepmoney is a large language model project focusing on financial investment. Deepmoney-34b-full is trained on 01-ai's open-source Yi-34B-200K model in two stages: pt (full-parameter training) and sft (LoRA fine-tuning). It can now be cloned and run on the hyper.ai official website.
Run online:
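The LoRA fine-tuning mentioned in these tutorials freezes the original weight matrix W and learns a low-rank update, so the effective weight becomes W + (alpha / r) * B * A, where A and B are small matrices of rank r. The sketch below shows only that arithmetic in plain Python with toy numbers. It is a minimal illustration under stated assumptions, not the Deepmoney training code: the function names and all values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A (hypothetical helper for illustration).

    w:      frozen base weight, shape (out, in)
    lora_a: trainable matrix A, shape (r, in)
    lora_b: trainable matrix B, shape (out, r)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, r) @ (r, in) -> (out, in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy example: a 2x2 base weight with a rank-1 update.
base = [[1, 0], [0, 1]]
a = [[1, 2]]    # shape (1, 2)
b = [[1], [0]]  # shape (2, 1)
print(lora_effective_weight(base, a, b, alpha=2))
```

Only A and B are trained, which is why LoRA needs far fewer trainable parameters than full-parameter training of a model with tens of billions of weights.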
2. Run Deepmoney-miqu-70b online
This model is trained on the miqu-1-70b-sf model from huggingface.co, with only SFT (LoRA fine-tuning) performed. It can now be cloned and run with one click on the hyper.ai official website.
Run online:
3. Run Deepmoney-67b-full online
This model is trained on deepseek-llm-67b-base, open-sourced by deepseek-ai, in two stages: pt (LoRA training) and sft (LoRA training). It can now be cloned and run with one click on the hyper.ai official website.
Run online:
Community Articles
The 2024 GTC AI conference arrived as scheduled. Running from March 18 to March 21, it features more than 900 sessions and more than 20 technical talks. This article summarizes Jensen Huang's keynote speech at GTC.
View the full report:
A research team from Argonne National Laboratory in the United States proposed a generative AI framework, GHP-MOFsassemble, which randomly generates and assembles new MOF structures, screens for highly stable structures through molecular dynamics simulation, and uses a crystal graph convolutional neural network (CGCNN) and grand canonical Monte Carlo (GCMC) simulation to test the carbon dioxide adsorption capacity of the MOFs. The relevant paper has been published in "Nature".
View the full report:
Researchers at Princeton University have developed an AI controller for adaptive prediction and control that can predict the potential risk of plasma tearing 300 milliseconds in advance and intervene in time. The relevant results have been published in "Nature".
View the full report:
Popular Encyclopedia Articles
1. Data Gravity
2. Massive Multi-Task Language Understanding (MMLU)
3. Mixture of Experts (MoE)
4. Quantum Neural Network
5. Neural Radiance Field (NeRF)
Hundreds of AI-related terms are compiled here to help you understand "artificial intelligence":
Bilibili Live Stream Preview
Date | Time | Content |
Monday, March 25 | 10:00 | MIT Deep Learning Course 2020 |
Monday, March 25 | 17:00 | MIT Deep Learning Course 2021 |
Tuesday, March 26 | 10:00 | Python API Development - Comprehensive Course for Beginners |
Wednesday, March 27 | 10:00 | SQL Tutorial - Beginner Course |
Wednesday, March 27 | 14:00 | Generative AI Full Course |
Thursday, March 28 | 21:00 | Flutter Course for Beginners |
Friday, March 29 | 10:00 | Flutter Course for Beginners |
Saturday, March 30 | 10:00 | Harvard CS50 - Python Artificial Intelligence Course |
Sunday, March 31 | 10:00 | Learn PyTorch for Deep Learning in One Day |
HyperAI TV streams live 24/7. Click to get your "electronic pickles" (easy background viewing) for the AI field:
http://live.bilibili.com/26483094
That is all for this week’s editor’s picks. If you have resources you would like to see included on the hyper.ai official website, you are welcome to leave a message or submit an article to let us know!
See you next week!
About HyperAI
HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure in the field of data science in China and providing rich and high-quality public resources for domestic developers. So far, we have:
* Provided domestic accelerated download nodes for 1200+ public datasets
* Included 300+ classic and popular online tutorials
* Interpreted 100+ AI4Science paper cases
* Supported search for 500+ related terms
* Hosted the first complete Apache TVM Chinese documentation in China
Visit the official website to start your learning journey: