Weekly Editor's Picks | Lao Xiang Ji Open-Sources Its "Food Traceability Report"; Run the SUPIR Old Photo Restoration Tool Online

Extra! Extra! Lao Xiang Ji's "confidential documents" have actually been made public!
Recently, Lao Xiang Ji released its 200,000-word "Lao Xiang Ji Food Traceability Report" to the public. It documents everything from ingredient sourcing to cooking details. The report is now available for download on HyperAI, so come and see if any dish catches your eye!
PS: The editor just wants to ask: if a dish fails when I follow the report, can I file an issue?
Updates on the hyper.ai official website from April 15 to April 19:
* High-quality public datasets: 10
* Selected high-quality tutorials: 2
* Selected community articles: 3
* Popular encyclopedia entries: 5
Visit the official website: hyper.ai
Selected Public Datasets
1. Lao Xiang Ji Food Traceability Report
The dataset covers the 226 SKUs, 873 raw materials, and 305 suppliers behind Lao Xiang Ji's 1,218 current restaurants. Lao Xiang Ji has made the full 677-page, 200,000-word Lao Xiang Ji Food Traceability Report available to the public.
Direct use: https://go.hyper.ai/nbESl
2. Open Sora Dataset Project Video Dataset
Open-Sora-Plan is an open-source project that aims to reproduce Sora, OpenAI's text-to-video (T2V) model. This is the video dataset for that project. The research team crawled 40,258 videos from open-source websites under the CC0 license. All videos are high-quality and watermark-free, and about 60% of them are landscape data.
Direct use: https://go.hyper.ai/75Ftc
3. MMVP Multimodal Motion Capture Dataset
This dataset contains many large-range, rapid human movements, such as running, skipping, and the standing long jump. In total, it includes more than 44k frame-synchronized RGBD frames with pressure data, collected from 16 subjects.
Direct use: https://go.hyper.ai/4edeR
4. OpenWebMath Open Web Mathematics Training Dataset
OpenWebMath is a dataset containing most of the high-quality mathematical text on the Internet. It was filtered and extracted from more than 200B HTML files on Common Crawl, yielding 6.3 million documents with a total of 14.7B tokens. OpenWebMath is intended for pre-training and fine-tuning large language models.
Direct use: https://go.hyper.ai/zjytq
5. Proof-Pile-2 Mathematical Dataset
Proof-Pile-2 is a 55-billion-token dataset of mathematical and scientific documents. It is a blend of scientific papers, math-related web content, and mathematical code, with a knowledge cutoff of April 2023 (excluding a specific subset of Lean proof steps). The dataset was created to train the Llemma 7B and Llemma 34B models.
Direct use: https://go.hyper.ai/aant8
6. Mizar Mathematics Dataset
The Mizar Mathematics Library contains formalized mathematical theorems and proofs covering a wide range of mathematical areas, including logic, algebra, analysis, geometry, etc. The goal of this library is to provide a solid mathematical foundation for automated theorem proving and formal reasoning.
Direct use: https://go.hyper.ai/IJeHa
7. Isabelle Parallel Corpus
The Isabelle Parallel Corpus (IPC) is a community-driven initiative to create a parallel corpus of Isabelle documents. IPC pairs formal documents in Isabelle (such as theorems, lemmas, definitions, etc.) with their natural language counterparts.
Direct use: https://go.hyper.ai/BEADY
8. Fruits Dataset Fruit Freshness Classification Dataset
The dataset contains images of three types of fruits: apples, oranges, and bananas. Each image is labeled according to its fruit type and freshness state, enabling supervised learning tasks such as classification or object detection.
Direct use: https://go.hyper.ai/b7TNx
9. DeepFruit Fruit Image Classification Dataset
DeepFruit is a fruit image classification dataset jointly released by Prince Mohammed bin Fahd University and other research institutions. The dataset contains 21,122 fruit images based on 8 different fruit sets. It can be used for research in the field of fruit detection, recognition and classification, as well as other innovative applications such as calorie estimation.
Direct use: https://go.hyper.ai/ut4BA
10. 15 Animals Image Classification Dataset
The dataset contains image folders for 15 animals. All images are 224×224 pixels, which makes them suitable for image classification. The images were downloaded from the Internet and preprocessed (resized and augmented) with the OpenCV library, so the dataset can be used directly for training without further data augmentation.
Direct use: https://go.hyper.ai/tgMtH
For more public datasets, please visit:
Selected Public Tutorials
The image restoration tool SUPIR is built on StableDiffusion-XL (SDXL) and model scaling techniques, and it significantly improves image restoration quality through machine learning and multimodal methods. The environment for this tutorial is already set up, so no complicated preparation is needed and you can restore an image with one click.
Run online: https://go.hyper.ai/3RBMH
2. Deploy large models with Ollama and Open WebUI
This tutorial is a one-click package of Ollama + Open WebUI. Just follow the steps and enter the commands to run a large model. The models currently included are qwen 1.5 14b, qwen 1.5 32b, and llava 1.6 34b, and you can also upload new models yourself.
Run online: https://go.hyper.ai/FwREK
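For readers who would rather script against the running model than use the web UI, here is a minimal sketch that calls Ollama's local REST API from Python. It is not part of the tutorial itself. It assumes Ollama is listening on its default address (`http://localhost:11434`), and the model tag `qwen:14b` in the usage note is a hypothetical placeholder; check `ollama list` for the names actually installed in your environment.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; the tutorial's container may use another host/port.
OLLAMA_HOST = "http://localhost:11434"


def build_generate_request(model: str, prompt: str,
                           host: str = OLLAMA_HOST) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str,
             host: str = OLLAMA_HOST, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the full completion is returned in the "response" field.
    return body["response"]
```

With the server running, a call such as `generate("qwen:14b", "Explain what an epoch is in one sentence.")` should return the model's answer as a string. Large models can take a while to load on first use, hence the generous default timeout.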
Community Articles
The research group led by Assistant Professor Yulian He of the JI at Shanghai Jiao Tong University proposed a new method for identifying the key physical quantities that determine the adsorption energy (Eads): feature deletion experiments based on automated machine learning, which automatically extract knowledge from a high-throughput density functional theory database. This article offers a detailed interpretation of the research.
View the full report: https://go.hyper.ai/LEVS1
The Google team has developed a machine-learning-based river forecasting model. Its forecasts outperform GloFAS, the world's most advanced flood forecasting system, and it can deliver reliable flood forecasts 5 days in advance across more than 80 countries. This article offers an interpretation of the research.
View the full article: https://go.hyper.ai/V4r4i
A research team at Shanghai Jiao Tong University proposed PBCT, a semi-supervised learning method that makes full use of the cheap and abundant unlabeled data generated over the life cycle of lithium batteries. By extracting hidden information, it deepens the understanding of the underlying data patterns and improves the accuracy of lithium battery life prediction by 20%. This article offers an interpretation of the research.
View the full report: https://go.hyper.ai/2EQGa
Popular Encyclopedia Entries
1. Epoch
2. Learning Rate
3. Paired t-Test
4. Diffusion Model
5. Large Language Model
We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":
Bilibili Livestream Preview
Google recently announced that it will hold its 2024 I/O Developer Conference on May 14. To help everyone gain a deeper understanding of Google, the HyperAI livestream channel will broadcast "Google Special" videos 24/7 starting next Monday. The lineup includes Google I/O keynotes over the years, interviews with executives, related documentaries, and more.
The table below previews the content selected by the editor:
| Date | Time | Content |
| --- | --- | --- |
| Monday, April 15 | 18:00 | Google I/O Conferences over the Years |
| Tuesday, April 16 | 18:00 | Google Cloud NEXT Conferences |
| Wednesday, April 17 | 18:00 | TIME100 Interview with Sundar Pichai |
| Thursday, April 18 | 18:00 | Google CEO on the US-China AI race |
| Friday, April 19 | 18:00 | AlphaGo Documentary |
| Saturday, April 20 | 18:00 | The story behind the founder of Google |
| Sunday, April 21 | 18:00 | BBC documentary: A World Without Google |
The HyperAI TV channel streams live 24/7. Click to get your "electronic pickles" (something to watch on the side) from the AI field:
http://live.bilibili.com/26483094
That's all for this week's editor's picks. If you have resources you would like to see included on the hyper.ai official website, feel free to leave a message or send us a submission!
See you next week!
About HyperAI
HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:
* Provided accelerated domestic download nodes for 1200+ public datasets
* Included 300+ classic and popular online tutorials
* Interpreted 100+ AI4Science paper cases
* Supported search for 500+ related terms
* Hosted the first complete Chinese-language Apache TVM documentation in China
Visit the official website to start your learning journey: