HyperAI

Weekly Editor's Picks | Lao Xiang Ji Open-Sources Its "Food Traceability Report"; SUPIR Old-Photo Restoration Tool Available Online


Extra! Extra! Lao Xiang Ji's "confidential documents" have actually been made public!

Lao Xiang Ji recently released the 200,000-word "Lao Xiang Ji Food Traceability Report" to the public. It documents everything from the source of the ingredients to the cooking details. The report is now available for download on HyperAI, so come and see whether any dish catches your eye!

P.S. The editor just wants to ask: if a dish cooked by following the report turns out badly, can I file an issue?

Updates on the hyper.ai official website from April 15 to April 19:

* High-quality public datasets: 10

* Selected high-quality tutorials: 2

* Selected community articles: 3

* Popular encyclopedia entries: 5

Visit the official website: hyper.ai

Selected Public Datasets

1. Lao Xiang Ji Food Traceability Report

The dataset covers the 226 SKUs, 873 raw ingredients, and 305 suppliers behind Lao Xiang Ji's 1,218 current restaurants. Lao Xiang Ji has released the full 677-page, 200,000-word "Lao Xiang Ji Food Traceability Report" to the public.

Direct use: https://go.hyper.ai/nbESl

2. Open Sora Dataset Project Video Dataset

Open-Sora-Plan is an open-source project that aims to reproduce OpenAI's Sora, a text-to-video (T2V) model. This is the video dataset built for that project. The research team crawled 40,258 videos from open-source websites under the CC0 license. All videos are high-quality and watermark-free, and about 60% of them are landscape data.

Direct use: https://go.hyper.ai/75Ftc

3. MMVP Multimodal Motion Capture Dataset

This dataset contains many large-range, rapid human movements, such as running, skipping, and standing long jumps. In total, it provides more than 44k synchronized RGBD frames and pressure readings collected from 16 subjects.

Direct use: https://go.hyper.ai/4edeR

4. OpenWebMath Open Web Mathematics Training Dataset

OpenWebMath is a dataset containing most of the high-quality mathematical text on the Internet. It was filtered and extracted from more than 200B HTML files on Common Crawl, yielding 6.3 million documents with a total of 14.7B tokens. OpenWebMath is intended for pre-training and fine-tuning large language models.

Direct use: https://go.hyper.ai/zjytq

5. Proof-Pile-2 Mathematical Dataset

Proof-Pile-2 is a 55-billion-token dataset of mathematical and scientific documents. It blends scientific papers, math-related web content, and mathematical code, with data current as of April 2023 (excluding a specific subset of Lean proof steps). The dataset was created to train the Llemma 7B and Llemma 34B models.

Direct use: https://go.hyper.ai/aant8

6. Mizar Mathematics Dataset

The Mizar Mathematical Library contains formalized mathematical theorems and proofs spanning many areas of mathematics, including logic, algebra, analysis, and geometry. Its goal is to provide a solid mathematical foundation for automated theorem proving and formal reasoning.

Direct use: https://go.hyper.ai/IJeHa

7. Isabelle Parallel Corpus

The Isabelle Parallel Corpus (IPC) is a community-driven initiative to create a parallel corpus of Isabelle documents. IPC pairs formal documents in Isabelle (such as theorems, lemmas, definitions, etc.) with their natural language counterparts.

Direct use: https://go.hyper.ai/BEADY

8. Fruits Dataset Fruit Freshness Classification Dataset

The dataset contains images of three types of fruits: apples, oranges, and bananas. Each image is labeled according to its fruit type and freshness state, enabling supervised learning tasks such as classification or object detection.

Direct use: https://go.hyper.ai/b7TNx

9. DeepFruit Fruit Image Classification Dataset

DeepFruit is a fruit image classification dataset jointly released by Prince Mohammed bin Fahd University and other research institutions. The dataset contains 21,122 fruit images based on 8 different fruit sets. It can be used for research in the field of fruit detection, recognition and classification, as well as other innovative applications such as calorie estimation.

Direct use: https://go.hyper.ai/ut4BA

10. 15 Animals Image Classification Dataset

The dataset contains image folders for 15 animals. All images are 224×224 pixels, which suits image classification. The images were downloaded from the Internet and preprocessed (resized and augmented) with the OpenCV library, so the dataset can be used directly for training without further data augmentation.

Direct use: https://go.hyper.ai/tgMtH

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. Online tutorial | Low-barrier deployment! SUPIR specializes in fixing all kinds of blurry images and can also follow text descriptions to fine-tune the result

The image restoration tool SUPIR builds on Stable Diffusion XL (SDXL) and model scaling techniques, and uses machine learning and multimodal methods to significantly improve restoration quality. The environment is already set up in this tutorial, so you can restore an image with one click, with no complicated preparation.

Run online: https://go.hyper.ai/3RBMH

2. Deploy large models with Ollama and Open WebUI

This tutorial is a one-click package for running Ollama with Open WebUI. Follow the steps and enter the commands to run a large model. The models currently included are qwen 1.5 14b, qwen 1.5 32b, and llava 1.6 34b, and you can also upload new models yourself.

Run online: https://go.hyper.ai/FwREK
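For readers who prefer scripts to a web page, here is a minimal sketch of sending a prompt to a running Ollama server from Python. It is not part of the tutorial itself. It assumes Ollama is listening on its default local address (`http://localhost:11434`), and the model tag `qwen:14b` in the usage note is a guess at how the bundled qwen 1.5 14b model is named; the helper names `build_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt, url)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the reply is a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

With the server running, `generate("qwen:14b", "Introduce yourself in one sentence.")` should return the model's reply as a string. Replace the tag with whatever name `ollama list` reports in your environment.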

Community Articles

1. Accelerating catalyst design: Yulian He's research group at Shanghai Jiao Tong University automatically extracts knowledge with AutoML

The research group led by Assistant Professor Yulian He of the Joint Institute (JI) at Shanghai Jiao Tong University proposed a new way to identify the key physical quantities that determine the adsorption energy (Eads): feature deletion experiments based on automated machine learning (AutoML), which automatically extract knowledge from a high-throughput density functional theory database. This article gives a detailed walkthrough of the research.

View the full report: https://go.hyper.ai/LEVS1

2. Google's flood prediction model is published in Nature again, beating the world's No. 1 system and covering 80+ countries

The Google team has developed a machine-learning-based river forecasting model. It outperforms GloFAS, the world's most advanced flood forecasting system, delivers reliable flood forecasts 5 days in advance, and covers more than 80 countries. This article summarizes and interprets the research.

View the full article: https://go.hyper.ai/V4r4i

3. Lithium battery life prediction accuracy improved by 20%! A Shanghai Jiao Tong University team releases PBCT, a semi-supervised learning method that extracts hidden information from unlabeled data

A research team at Shanghai Jiao Tong University proposed PBCT, a semi-supervised learning method that makes full use of the cheap and abundant unlabeled data generated over the life cycle of lithium batteries. By extracting hidden information, it deepens the understanding of the underlying data patterns and improves the accuracy of lithium battery life prediction by 20%. This article summarizes and interprets the research.

View the full report: https://go.hyper.ai/2EQGa

Popular Encyclopedia Entries

1. Epoch

2. Learning Rate

3. Paired t-Test

4. Diffusion Model

5. Large Language Model

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://hyper.ai/wiki

Bilibili Livestream Preview

Google recently announced that it will hold the 2024 I/O developer conference on May 14. To help everyone get to know Google better, the HyperAI livestream channel will broadcast a "Google Special" video series 24/7 starting next Monday. The lineup includes past Google I/O keynotes, executive interviews, related documentaries, and more.

Here is a preview of the content selected by the editor:

| Date | Time | Content |
| --- | --- | --- |
| Monday, April 15 | 18:00 | Google I/O Conferences over the Years |
| Tuesday, April 16 | 18:00 | Google Cloud NEXT Conferences |
| Wednesday, April 17 | 18:00 | TIME100 Interview with Sundar Pichai |
| Thursday, April 18 | 18:00 | Google CEO on the US-China AI race |
| Friday, April 19 | 18:00 | AlphaGo Documentary |
| Saturday, April 20 | 18:00 | The story behind the founder of Google |
| Sunday, April 21 | 18:00 | BBC documentary: A World Without Google |

The HyperAI channel streams live 24/7. Click to get your "electronic pickles" (easy background viewing) from the AI field:

http://live.bilibili.com/26483094

That's all for this week's editor's picks. If you have resources you would like to see on the hyper.ai official website, feel free to leave a message or submit an article to let us know!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,200+ public datasets

* Collected 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese-language Apache TVM documentation in China

Visit the official website to start your learning journey:

https://hyper.ai