HyperAI

Worth 999 Yuan! Free Tickets to the Apache CoC Conference; ToT Benchmark Dataset for LLM Temporal Reasoning Newly Released


From July 26 to 28, Apache will hold CommunityOverCode Asia 2024 (CoC) in Hangzhou. The conference will present the latest news and cutting-edge practices in Apache community building and development. HyperAI has been invited to attend as a partner community. We have prepared check-in activities and plenty of gifts on site, so stop by our booth and say hello!

A giveaway is here! We have prepared 5 event tickets worth 999 yuan each. The prizes will be distributed through a lottery. Follow the "HyperAI Super Neural" official account to enter the draw.

Updates on the hyper.ai official website from July 15 to July 19:

* High-quality public datasets: 10

* Selected high-quality tutorials: 2

* Selected community articles: 4

* Popular encyclopedia entries: 5

* Top conferences with deadlines in August: 4

Visit the official website: hyper.ai

Selected Public Datasets

1. Test of Time (ToT) Benchmark Dataset for LLM Temporal Reasoning

The dataset, referred to as ToT, is divided into three subsets: ToT-semantic contains 1,850 examples, ToT-arithmetic contains 2,800 examples, and ToT-semantic-large contains 46,480 examples. ToT evaluates two abilities of LLMs separately: temporal understanding and temporal arithmetic.

Direct use: https://go.hyper.ai/D5089

2. VEGA Scientific Paper Image-Text Dataset

The dataset contains text and image data from more than 50,000 scientific papers and was built specifically for the task of interleaved image-text reading comprehension.

Direct use: https://go.hyper.ai/DMmWq

3. Lemon Quality Control Dataset

The dataset contains 2,690 annotated images (1,056 x 1,056 pixels) and can be used to study fruit quality control tasks.

Direct use: https://go.hyper.ai/03ytr

4. GDHY 1981-2016 Global Major Crops Historical Yield Dataset

This dataset provides historical yield data for major crops worldwide from 1981 to 2016. It is valuable for analyzing the impact of climate change on crop yields, evaluating global gridded crop model simulations, and providing a basis for global and seasonal crop prediction systems.

Direct use: https://go.hyper.ai/xNzH3

5. WHU-OHS Large-Scale Spectral Image Classification Benchmark Dataset

The dataset consists of 42 OHS satellite images of more than 40 different locations in China. There are 4,822, 513, and 2,460 sub-images in the training set, validation set, and test set, respectively.

Direct use: https://go.hyper.ai/OFxxR

6. VISO Large-Scale Satellite Video Moving Target Detection and Tracking Dataset

The dataset consists of high-resolution videos captured by the Jilin-1 satellite platform with a resolution of 12,000×5,000 pixels. It aims to promote technological advances in the field of satellite video analysis and address the challenges it faces, such as small target size, low spatial resolution, and limited appearance and texture information.

Direct use: https://go.hyper.ai/LcMbH

7. SAT-DS Large-Scale 3D Medical Image Segmentation Dataset

This is currently the largest 3D medical image segmentation dataset. It brings together 72 public datasets, comprising 22K+ images across three modalities (CT, MR, and PET) and 302K+ segmentation annotations that cover 497 segmentation targets in 8 major regions of the human body. It is intended to support a universal medical segmentation model for radiological images that is driven by text prompts.

Direct use: https://go.hyper.ai/aANbx

8. GAIA General AI Assistant Benchmark Dataset

GAIA consists of more than 450 complex questions with unambiguous answers that require different levels of tool use and autonomy to solve. The questions are accordingly divided into 3 levels: Level 1 can be solved by very good LLMs, while solving Level 3 would indicate a significant leap in model capability. Each level is split into a fully public development set for validation and a test set whose answers and metadata are kept private.

Direct use: https://go.hyper.ai/VY3cU

9. Helmet Detection Dataset

This dataset contains 764 images of two different categories: "wearing a helmet" and "not wearing a helmet", which can be used for helmet detection tasks.

Direct use: https://go.hyper.ai/QuMyR

10. Soil Moisture Hyperspectral Benchmark Dataset

This dataset is a benchmark dataset for soil moisture assessment based on hyperspectral data. It was obtained through a 5-day field measurement campaign in Karlsruhe, Germany. It aims to study and develop models that can estimate soil moisture content based on hyperspectral data.

Direct use: https://go.hyper.ai/fG77T

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. Tencent HunyuanDiT Text-to-Image Demo

HunyuanDiT is the first Chinese-English bilingual model built on the DiT architecture. It is a text-to-image generation model based on the Diffusion Transformer with fine-grained understanding of both Chinese and English. The research team built a complete data pipeline for updating and evaluating data to support iterative optimization of the model. This tutorial requires no commands: clone it with one click and start generating images right away.

Run online: https://go.hyper.ai/Dwtf7

2. Paints-Undo Demo: Generating the Full Painting Process from a Single Image

PaintsUndo is a model that simulates human painting behavior. It aims to provide a base model of human painting behavior, in the hope that future AI models can better meet the real needs of human artists. The project provides a series of models that take an image as input and output a sequence showing how that image could be painted. This tutorial is a one-click demo of PaintsUndo with the environment and dependencies already installed. Clone it and launch it with one click to try it out.

Run online: https://go.hyper.ai/Nr3DC

We have also set up a Stable Diffusion tutorial discussion group. Scan the QR code and add the note [SD tutorial] to join the group, discuss technical issues, and share your results.

Community Articles

1. Neural network replaces density functional theory! Tsinghua research group releases universal materials model DeepH, achieving ultra-accurate prediction

Researchers from Tsinghua University used their own DeepH method to develop the DeepH universal materials model and demonstrated a feasible approach to building a "large materials model". This breakthrough opens up new opportunities for discovering innovative materials. This article is an interpretation and summary of the paper.

View the full report: https://go.hyper.ai/lxFha

2. Not replacement, but symbiosis! The future of meteorological science requires the organic combination of AI and numerical forecasting

With the rapid development of AI, questions such as "Will traditional numerical forecasting be caught up with, surpassed, or even completely replaced by AI?" and "How can the two coexist?" have drawn growing attention in recent years. Huang Wei, deputy director of the Shanghai Typhoon Research Institute of the China Meteorological Administration, believes that "in the foreseeable future, the organic combination of AI weather forecasting and traditional numerical forecasting is the most effective way to achieve breakthroughs in forecasting technology." This article is HyperAI's interpretation and summary of the relationship between the two.

View the full report: https://go.hyper.ai/ui8Yv

3. Selected for ICML! Renmin University team uses an equivariant graph neural network to predict target protein binding sites, with performance gains of up to 20%

A research team from the Gaoling School of Artificial Intelligence at Renmin University of China applied an E(3)-equivariant graph neural network (GNN) to ligand binding site prediction for the first time. They proposed a framework called EquiPocket that addresses the challenges encountered by CNN-based methods. This article is an interpretation and summary of the research.

View the full report: https://go.hyper.ai/HrzK4

4. 23 institutions including Stanford and Apple release the DCLM benchmark. Can high-quality datasets shake up the Scaling Laws? The baseline model performs on par with Llama3 8B

In response to the ever-growing amount of data required for language model training and the accompanying data quality issues, 23 institutions including Stanford and Apple released the DCLM benchmark, for which they cleaned a corpus of 240 trillion tokens. This article is an interpretation and summary of the experimental process.

View the full report: https://go.hyper.ai/V3gPg

Popular Encyclopedia Articles

1. Scaling Law

2. Masked Language Modeling (MLM)

3. Data Augmentation

4. Long Short-Term Memory (LSTM)

5. Quantum Neural Network

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://go.hyper.ai/wiki

One-stop tracking of top AI academic conferences: https://go.hyper.ai/event

That is all for this week's editor's selection. If you have resources you would like to see included on the hyper.ai official website, feel free to leave a message or submit an article to let us know!

See you next week!

Giveaway

CommunityOverCode Asia 2024 (CoC) will be held in Hangzhou from July 26 to 28. CommunityOverCode is the official global conference series of the Apache Software Foundation (ASF), dedicated to promoting open source technology development and community participation. HyperAI will take part in the event as a partner community and looks forward to meeting you in person!

Follow the "HyperAI Super Neural" official account to enter the lucky draw for a chance to win event tickets worth 999 yuan!