HyperAI

MMLU-Pro Benchmark Dataset Is Now Available, With 12K More Challenging Cross-Disciplinary Questions! DeepSeek Theorem-Proving Model Can Be Deployed With One Click


In the era of large language models (LLMs), benchmarks such as Massive Multitask Language Understanding (MMLU) play a crucial role in pushing the limits of AI’s language understanding and reasoning capabilities across fields.

However, as models have continued to improve, LLM performance on these benchmarks has gradually plateaued, making it increasingly difficult to tell apart the capabilities of different models.

To better evaluate LLM capabilities, researchers from the University of Waterloo, the University of Toronto, and Carnegie Mellon University jointly released the MMLU-Pro dataset, which integrates questions from multiple sources, including the original MMLU dataset, STEM websites, TheoremQA, and SciBench. The dataset is now available for download on hyper.ai. Scroll down for the link.

Updates on the hyper.ai official website from September 9 to September 14:

* High-quality public datasets: 10

* High-quality tutorials: 3

* Community articles: 4

* Popular encyclopedia entries: 5

* Top conferences with deadlines in September: 3

Visit the official website: hyper.ai

Selected public datasets

1. MMLU-Pro Large-Scale Multi-Task Understanding Dataset

The MMLU-Pro dataset is a more robust and challenging large-scale multi-task understanding dataset designed to benchmark the capabilities of large language models more rigorously. It contains 12K complex questions across disciplines.

Direct use: https://go.hyper.ai/PwJDW

2. DeepGlobe18 Road Extraction Dataset

The training data for the Road Challenge contains 6,226 RGB satellite images, each 1024 × 1024 pixels. The images have a resolution of 50 cm per pixel and were collected by DigitalGlobe's satellites.

Direct use: https://go.hyper.ai/VIg0J

3. OpenForensics Face Forgery Detection Dataset

The dataset consists of 115K in-the-wild images and 334K faces, all with rich facial annotations including forgery categories, bounding boxes, segmentation masks, forgery boundaries, and general facial landmarks, covering various backgrounds and multiple people of different ages, genders, poses, positions, and facial occlusions.

Direct use: https://go.hyper.ai/jTTRz

4. DeepfakeTIMIT Deepfake Detection Dataset

This dataset contains videos in which faces were swapped using an open-source Generative Adversarial Network (GAN)-based approach. That approach was developed from the original autoencoder-based Deepfake algorithm.

Direct use: https://go.hyper.ai/me1TI

5. SESYD Synthetic Document Database

The dataset contains document images with ground truth annotations. It consists of 11 sets, including 284k images, 190k symbols and 284k characters. It focuses on two major research problems in the field of document image analysis: (1) symbol recognition and localization in line drawing images (such as floor plans and circuit diagrams); (2) character segmentation and recognition in geographic maps.

Direct use: https://go.hyper.ai/ZqRTQ

6. LAV-DF Multimodal DeepFake Audio-Visual Dataset

LAV-DF is a multimodal (video tampering and audio tampering) dataset derived from the VoxCeleb2 dataset, containing 136,304 videos, including 36,431 real videos and 99,873 fake videos.

Direct use: https://go.hyper.ai/ujock

7. Vibrent Clothes Rental Dataset

The dataset contains 64k transactions, the rental histories of 2.2k anonymous users, and 15.8k unique garments, with the properties and rental history of each item recorded in detail. Garments are listed either as individual items or as groups, where a group refers to a design shared by several individual items. Each garment is accompanied by a set of tags describing some of its properties.

Direct use: https://go.hyper.ai/PFlKA

8. FFIW10K Face Forgery Dataset

The dataset includes 10k high-quality fake videos collected from YouTube, with an average of three faces per frame. Each video contains both real and fake faces, which brings it closer to realistic, complex scenes. The manipulation process is fully automatic and controlled by a domain adversarial quality assessment network, making the dataset highly scalable at low labor cost.

Direct use: https://go.hyper.ai/AHS7y

9. ForgeryNet Face Forgery Dataset

The dataset contains 2.9 million images and 221,247 videos, covering 7 image-level and 8 video-level forgery methods from around the world. This dataset provides researchers with rich resources to support 4 tasks at the image and video levels: image forgery classification, spatial forgery localization, video forgery classification, and temporal forgery localization.

Direct use: https://go.hyper.ai/Yx0mj

10. EEG Eye State Dataset

This dataset contains instances of EEG measurements, where the output is whether the eyes are open or closed. The values in the dataset are arranged in chronological order, where 0 indicates the eyes are open and 1 indicates the eyes are closed. Each instance contains 14 EEG measurements, labeled AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4.

Direct use: https://go.hyper.ai/RTBDy
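
As a rough illustration of the layout described for the EEG dataset, the sketch below splits one record into its 14 channel readings and the eye-state label. It assumes each record is 14 numbers followed by the label, in the channel order listed above, and it writes the sixth channel as P7 (an assumption based on standard electrode naming). The function name `parse_row` and the sample values are hypothetical and are not taken from the dataset files.

```python
# Channel order as listed in the dataset description.
# "P7" for the sixth channel is an assumption based on standard electrode naming.
EEG_CHANNELS = [
    "AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
    "O2", "P8", "T8", "FC6", "F4", "F8", "AF4",
]

# Label encoding stated in the description: 0 = eyes open, 1 = eyes closed.
EYE_STATES = {0: "open", 1: "closed"}


def parse_row(values):
    """Split one record (14 readings + 1 label) into (readings dict, eye state)."""
    if len(values) != len(EEG_CHANNELS) + 1:
        raise ValueError(f"expected {len(EEG_CHANNELS) + 1} values, got {len(values)}")
    *readings, label = values
    state = EYE_STATES[int(label)]
    return dict(zip(EEG_CHANNELS, readings)), state


# Hypothetical record: made-up readings followed by the label 1 (eyes closed).
sample = [float(i) for i in range(14)] + [1]
readings, state = parse_row(sample)
print(state, readings["AF3"], readings["AF4"])
```

Because the values are in chronological order, rows should be kept in their original sequence when splitting the data for training and evaluation.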

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. One-Click Deployment of DeepSeek-Prover-V1.5

This is a mathematical theorem-proving model for Lean 4 that DeepSeek open-sourced in 2024. Through self-iteration and supervision from the Lean prover, the model builds a learning environment similar to that of Go. This tutorial is a step-by-step demo of deploying the model with one click.

Direct use: https://go.hyper.ai/MevMB

2. LLaVA OneVision Multimodal All-Round Vision Model Demo

The model can process images, text, interleaved image and text input, and video. It is the first single model to simultaneously break through the performance bottleneck of open multimodal models in three important computer vision scenarios: single-image, multi-image, and video. Go to the official website, clone and start the container, then copy the API address to try model inference.

Direct use: https://go.hyper.ai/Dcg74

3. Online Tutorials | The text-to-image era has changed again! Core SD members set up their own studio, and their first model, FLUX.1, is a tough rival to SD 3 and Midjourney

Competition in the field of image models is getting more intense! Former core members of Stable Diffusion have set up their own company and released the image model FLUX, whose versions range from commercial use to open-source personal use. The generated images are very close to real-life photos, and the details of the characters are very realistic. hyper.ai has now launched "FLUX ComfyUI (including the Black Myth: Wukong LoRA training version)". Click the link below and follow the tutorial to deploy it.

Direct use: https://go.hyper.ai/trQhv

Community Articles

1. Dataset Summary | DeepFake chaos is rampant, use magic to defeat magic! High-quality datasets help the development of forgery detection technology

Abuses of face recognition and DeepFake technology call for better face recognition and forgery detection technologies that can accurately identify tampered images and videos. HyperAI has compiled 11 commonly used face recognition and DeepFake datasets for you to download with one click.

View the full summary: https://go.hyper.ai/EMKo2

2. Apple Intelligence makes a late-night splash! Apple releases 4 self-developed chips, with major upgrades to iPhone/iWatch/AirPods

At its fall product launch event on September 10, Apple launched new products including iPhone 16, AirPods 4, and Apple Watch Series 10. Built on Apple's self-developed chips, they deliver a major leap in performance and fully integrate Apple Intelligence to bring users an unprecedented smart experience. This article is a comprehensive report on the event.

View the full report: https://go.hyper.ai/H7P8X

3. Sensitivity improved by 56%, CUHK/Fudan/Yale and others jointly proposed a new protein homolog detection method

In protein research, identifying homology between protein sequences is one of the most important tasks. To address the pain points of remote protein homology research, Li Yu from the Chinese University of Hong Kong, together with Sun Siqi, a young researcher from the Intelligent Complex Systems Laboratory of Fudan University and the Shanghai Artificial Intelligence Laboratory, and Mark Gerstein from Yale University, proposed an ultra-fast and highly sensitive homology detection framework, the Dense Homolog Retriever (DHR), based on protein language models and dense retrieval technology. This article is a detailed interpretation of the research paper.

View the full report: https://go.hyper.ai/vLAej

4. Based on 2,500 square kilometers of real-world data, the Beijing Normal University team proposed the StarFusion model to achieve high spatial resolution image prediction

Chen Jin's team from the State Key Laboratory of Earth Surface Processes and Resource Ecology at Beijing Normal University proposed StarFusion, a dual-stream spatiotemporal decoupled fusion architecture model. It overcomes the limitation that most existing deep learning algorithms require high spatial resolution (HSR) time series images for training, and it fully realizes the prediction of high spatial resolution images. This article is a detailed interpretation of the research paper.

View the full report: https://go.hyper.ai/7LmzA

Popular Encyclopedia Articles

1. Sigmoid function

2. Paired t-Test

3. Contrastive Learning

4. Semi-Supervised Learning

5. Data Augmentation

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://go.hyper.ai/wiki

One-stop tracking of top AI academic conferences: https://go.hyper.ai/event

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1,300+ public datasets

* Included 400+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese documentation for Apache TVM in China

Visit the official website to start your learning journey:

https://hyper.ai