HyperAI

Over 110,000 Downloads, OpenThoughts-114k Reasoning Dataset Is Online; SkyReels-V1, the First AI Short Drama Creation Tool, Is Here! Say Goodbye to High Costs and Long Cycles


The "domineering" short drama has been called modern "electronic pickled mustard": in just a few minutes it packs in over-the-top romance and high-energy plot twists that keep countless viewers hooked. However, the traditional production process is time-consuming and labor-intensive, which severely limits the output of short dramas.

The SkyReels-V1-Hunyuan-I2V model launched by Kunlun Wanwei may bring new ideas to short drama creation. It is fine-tuned from HunyuanVideo, which has over 13 billion parameters. After in-depth training on a massive amount of Hollywood-grade film and television data, it can accurately recognize 33 facial expressions and 400 combinations of natural movements, and every frame of the generated video has a cinematic texture.

The hyper.ai official website has now launched the "SkyReels-V1-Hunyuan-I2V First AI Short Drama Creation Model Demo" tutorial. Come and start your short drama creation journey!

Online use: https://go.hyper.ai/45cHH

In addition, we would like to recommend an academic sharing event. The latest Meet AI4S live broadcast will be held at 12:00 noon on March 7 with the theme "Her Power in the AI Era: Transformation under Hard-core Technology". We have invited Professor Huang Hong from Huazhong University of Science and Technology, Zhou Dongzhan, a young researcher from the AI for Science Center of Shanghai Artificial Intelligence Laboratory, and Zhou Bingxin, an assistant researcher from the Institute of Natural Sciences of Shanghai Jiao Tong University, to introduce their work and share their research experience.

From February 24 to February 28, hyper.ai official website updates:

* High-quality public datasets: 10

* High-quality tutorial selection: 7

* Community article selection: 10

* Popular encyclopedia entries: 5

* Top conferences with deadline in March: 6

Visit the official website: hyper.ai

Selected public datasets

1. OpenThoughts-114k Reasoning Dataset

The OpenThoughts-114k reasoning dataset focuses on areas such as mathematics, code, science, and puzzles. It contains 114,000 high-quality samples and aims to train small reasoning models to surpass existing large models (such as DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Qwen-7B) on mathematics and code reasoning tasks.

Direct use: https://go.hyper.ai/JfftY

Dataset generation process

2. Goku-MovieGenBench Movie Video Dataset

This video dataset focuses on movie generation tasks and contains about 1,000 video samples for training and evaluating Goku, a flow-based video generation model. It provides high-quality video material for training models built on the rectified flow Transformer architecture, helping them achieve higher-quality visual generation.

Direct use: https://go.hyper.ai/XeV82

Video Sample Example

3. Flapping Wing System Robotics Dataset

This dataset is specially created for studying the deep inverse mapping model of flapping wing robot wings, aiming to provide a new learning framework for the control of flapping wing robot wings. It contains 548 experiments, 470 time points per experiment, 3 wing rotation angles (pitch, yaw, roll) and 5 features (3 force measurements and 2 torque measurements), and the data sampling rate is 25 Hz.

Direct use: https://go.hyper.ai/ucDdq
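To make the dimensions quoted for the flapping wing dataset concrete, the following is a minimal sketch of how such data could be held in memory. The array layout (experiment, time point, channel), the variable names, and the zero-filled placeholder arrays are assumptions for illustration only; the actual file format of the dataset is not described here.

```python
import numpy as np

# Figures quoted in the dataset description above.
N_EXPERIMENTS = 548      # number of experiments
N_TIMESTEPS = 470        # time points per experiment
N_ANGLES = 3             # pitch, yaw, roll
N_FEATURES = 5           # 3 force measurements + 2 torque measurements
SAMPLING_RATE_HZ = 25    # samples per second


def placeholder_arrays():
    """Return zero-filled arrays with the assumed (experiment, time, channel) layout."""
    angles = np.zeros((N_EXPERIMENTS, N_TIMESTEPS, N_ANGLES), dtype=np.float32)
    loads = np.zeros((N_EXPERIMENTS, N_TIMESTEPS, N_FEATURES), dtype=np.float32)
    return angles, loads


def experiment_duration_s():
    """Length of one experiment in seconds: number of samples / sampling rate."""
    return N_TIMESTEPS / SAMPLING_RATE_HZ


angles, loads = placeholder_arrays()

# Time stamp of each sample within one experiment, starting at t = 0.
time_axis = np.arange(N_TIMESTEPS) / SAMPLING_RATE_HZ

print(angles.shape, loads.shape)   # (548, 470, 3) (548, 470, 5)
print(experiment_duration_s())     # 18.8
```

Under this layout, an inverse mapping model would take a slice of `loads` (the measured forces and torques) as input and predict the corresponding slice of `angles`; at 25 Hz, each experiment covers roughly 18.8 seconds.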

4. LIMO Mathematical Reasoning Benchmark Dataset

LIMO is a mathematical reasoning dataset that aims to train and evaluate the mathematical reasoning ability of large models by carefully selecting high-quality training samples, so as to improve their performance in mathematics exams and competition questions (such as AIME, MATH-500, etc.).

Direct use: https://go.hyper.ai/MSOK1

5. OSC Molecular Dataset

This collection comprises four organic solar cell (OSC) molecular datasets used to evaluate property prediction performance: HOPV (Lopez et al. 2016), PFD (Nagasawa et al. 2018), NFA (Miyake and Saeki 2021) and PD (Miyake and Saeki 2021).

Direct use: https://go.hyper.ai/Ku2VO

6. Dolphin-R1 Reasoning Dataset

This dataset contains about 800,000 samples and is designed to provide high-quality data for training reasoning models like DeepSeek-R1. The data sources include 200,000 samples provided by DeepSeek-R1, Gemini Flash, and Dolphin Chat. These samples are mainly used to improve model performance on reasoning tasks, covering complex tasks such as mathematics, logic, and coding.

Direct use: https://go.hyper.ai/Z6QBU

7. NuminaMath-1.5 Mathematical Reasoning Dataset

This dataset is suited to mathematics education and competition problems. It contains about 900k high-quality competition-level mathematics problems, and the solution to each problem is written in chain-of-thought (CoT) format. The problems are drawn from Chinese high school mathematics exercises and from American and international mathematical olympiad problems.

Direct use: https://go.hyper.ai/72c4t

8. pyMethods2Test Programming Language Processing Dataset

This dataset contains a large number of open-source unit test methods and their corresponding focal methods. It aims to support the generation of effective unit test cases for Python code, filling the gap for the Python language among large test datasets.

Direct use: https://go.hyper.ai/Vqe4c6

9. Bespoke Stratos 17k Reasoning Task Dataset

The dataset was generated by refining Berkeley's Sky-T1 data pipeline and using data distilled from DeepSeek-R1, with the aim of supporting the training of high-performance reasoning models. It contains questions, reasoning traces, and answers, covering fields such as code, mathematics, and scientific puzzles.

Direct use: https://go.hyper.ai/xi1jt

10. s1K Reasoning Problem Dataset

The dataset contains 1,000 questions with detailed reasoning traces and answers. It covers 50 different fields, including probability theory, quantitative interview questions, and Olympiad problems, so that models are exposed to many types of reasoning tasks.

Direct use: https://go.hyper.ai/gvIyv

Selected Public Tutorials

1. SkyReels-V1-Hunyuan-I2V: The First AI Short Drama Creation Model Demo

This is a high-quality video generation model that focuses on human-centered, film-quality video. It is fine-tuned from the HunyuanVideo model and trained on tens of millions of high-quality film and television samples to generate video content with a cinematic texture.

The relevant models and dependencies of this project have been deployed. You only need to upload an image and enter a prompt to start your short drama creation journey.

Run online: https://go.hyper.ai/45cHH

Demo Example

2. One-click deployment of DeepSeek-R1-70B

This model is a reasoning-enhanced model with 70 billion parameters. It is trained from Llama3.3-70B-Instruct and uses reinforcement learning and distillation to improve reasoning performance. It inherits the strengths of the Llama model series and further improves reasoning ability on that basis, especially in mathematics, code, and logical reasoning tasks.

This project provides an interactive front end through Gradio. The relevant models and dependencies have been deployed, and you can start a dialogue with the model with one click.

Run online: https://go.hyper.ai/LlFKB

Demo Example

3. Deploy DeepSeek R1 with Ollama and Open WebUI

DeepSeek-R1 is the first version of the language model series launched by DeepSeek in 2025, focusing on efficient and lightweight natural language processing tasks. It aims to reduce computing resource requirements while maintaining high performance. The design of DeepSeek-R1 focuses on practical application scenarios, supports rapid deployment and integration, and is suitable for a variety of tasks, including text generation, dialogue systems, translation, and summary generation.

On the official website, clone and start the container, then copy the API address to start communicating with the model.

Run online: https://go.hyper.ai/2UJDf

Demo Example

4. LAMMPS Getting Started Tutorial: Estimating the Melting Point of FCC Cu Using npt Temperature Control

LAMMPS can be used to model a variety of materials, including solid-state materials (metals, semiconductors), biomolecules, polymers, etc., and can provide a variety of particle interaction models for different materials.

This tutorial is an introductory tutorial for LAMMPS: estimating the melting point of FCC Cu using npt temperature control. It can be run using the CPU version of LAMMPS to quickly get started with molecular dynamics simulations.

Click to view the full tutorial: Getting Started with LAMMPS: Estimating the Melting Point of FCC Cu Using npt Temperature Control

Run online: https://go.hyper.ai/BajMV

Effect examples

5. LTX-Video ultra-fast video generation

LTX-Video is a video generation model that uses transformer and Video-VAE technology to efficiently generate high-resolution videos. It supports multiple generation modes, including text-to-video and image-to-video. Follow the tutorial steps and simply describe what you want in order to generate a high-resolution video.

Run online: https://go.hyper.ai/EfjvF

Generating examples from text to video

6. MatterGen Inorganic Material Design Model Demo

MatterGen is a generative AI-based inorganic material design model launched by Microsoft, which guides the generation of materials that meet various property constraints through fine-tuning. It aims to directly generate new materials with specific chemical, mechanical, electronic or magnetic properties through diffusion models.

This tutorial will show you how to use this model to generate inorganic materials and train MatterGen yourself.

Click to view the full tutorial: Directly design materials with target properties! Microsoft's MatterGen model is open source, redefining the paradigm of inverse material design with generative AI

Run online: https://go.hyper.ai/arVTV

Deploy MatterGen

7. One-click deployment of the Cosmos World Foundation Model

At CES 2025, NVIDIA introduced the first batch of Cosmos World Foundation Models. These advanced models are trained on millions of hours of driving and robotics video data. They are neural networks that can predict and generate physics-aware videos of the future state of virtual environments, helping developers build the next generation of robots and autonomous vehicles (AVs).

This project provides an interactive front end through Gradio. You can start it with one click and copy the API address to try it out.

Click to view the full tutorial: Physical AI system innovation: a quick start with NVIDIA's world foundation model, which can simulate sunlight and haze

Run online: https://go.hyper.ai/ypcP4

Generate video example

Community Articles

1. The $500 billion "Stargate" is launched, and Oracle's founder touts AI-customized cancer vaccines

At a White House press conference, Trump appeared with the CEO of OpenAI, the CEO of SoftBank, and the founder of Oracle, and announced an artificial intelligence project called the "Stargate Project". The announcement emphasized the major breakthroughs that AI is expected to bring to medicine and healthcare, such as designing a personalized vaccine for each patient to fight cancer. The project has drawn wide discussion online. More details are in the article.

View event recap: https://go.hyper.ai/6YZnN

2. Directly design materials with target properties! Microsoft's MatterGen model is open source, redefining the paradigm of inverse material design with generative AI

Microsoft has open-sourced MatterGen, a generative AI model for inverse material design. In the future, we can expect to design the structure of new materials directly from the required properties. This article systematically reviews the key role of generative models in the inverse design of new materials, covering battery materials, high-entropy alloys, superconducting material development, and more.

View the full report: https://go.hyper.ai/gyQu0

3. Determined to achieve the first AGI in the field of biology! Medical AI company Owkin builds the world's largest cancer spatial omics dataset

Owkin is determined to realize the first AGI in the field of biology. It has solved the patient data privacy issue that the public is most concerned about. By integrating multimodal data from different institutions, it provides a reliable decision-making basis for precision medicine, assists in the diagnosis and drug development of cancers such as breast cancer and colorectal cancer, and has cooperated with pharmaceutical giants such as Sanofi, BMS and AstraZeneca. This article is a detailed report of the company, click to read it quickly.

View the full report: https://go.hyper.ai/cOuX1

4. Selected for AAAI 2025! The Hong Kong Polytechnic University team accurately predicts the optoelectronic properties of organic material molecules based on graph transformers

The RingFormer framework recently published by the Hong Kong Polytechnic University team can accurately predict the optoelectronic properties of molecules. Its performance is 22.77% higher than traditional methods, which is equivalent to shortening the development cycle of new materials from several years to weeks, marking that organic solar cell research has officially entered a new era of "computation-guided experiments". This article is a detailed interpretation and sharing of the research.

View the full report: https://go.hyper.ai/iBwDq

5. Real evaluation of 3 voice cloning models, GPT-SoVITS accurately grasps the characteristics of "Shiji Niangniang"

The box office of the Spring Festival movie "Nezha 2" has been soaring and has now exceeded 12 billion. In the film, the voice actors bring the characters to life with their expressive voices, which has attracted widespread attention online. Voice cloning technology is developing rapidly, and HyperAI has evaluated three current mainstream voice cloning models. Come and have a look.

View the full report: https://go.hyper.ai/JqDwI

6. The first book of 2025! Must-read books in the field of AI strongly recommended by Musk/Sam Altman/Bill Gates, etc.

HyperAI has selected 10 excellent works in the field of AI. These books, which are highly recommended by big names such as Musk, Sam Altman, Bill Gates, and Hawking, will help you further understand artificial intelligence and its development from different aspects such as basic science, application scenarios, and development trends. Click to read quickly.

View the full report: https://go.hyper.ai/Ne3uA

7. Ten "best" events! A review of AI events in 2024, revealing hidden trends and industry challenges

In 2024, the AI wave kept surging forward with no sign of slowing, quietly reshaping the world and producing a series of record-breaking innovations. Tracking the continued rise of the AI boom, market research firm IoT Analytics selected the ten most noteworthy events in the AI field in 2024. Come and have a look.

View the full report: https://go.hyper.ai/xyVnq

8. YOLO series has been updated 11 times in 10 years, and the latest model has reached SOTA in multiple target detection tasks

The classic object detection model YOLO is favored by the industry for its high precision and efficiency, and is widely used in autonomous driving, security monitoring, medical imaging, and other fields. The model has gone through 11 versions, and the HyperAI official website has launched several of the important ones. For more details, see the article.

View the full report: https://go.hyper.ai/9xHRS

9. Open source 176 billion parameter universal medical language model! BUPT/PKU/CTSU proposed MedFound, whose reasoning ability is close to that of expert physicians

An interdisciplinary medicine and engineering team consisting of Professor Wang Guangyu from Beijing University of Posts and Telecommunications, Professor Song Chunli from Peking University Third Hospital, and Professor Yang Jian from China Three Gorges University proposed MedFound (176B), currently the biomedical large language model with the largest number of parameters, and went on to build MedFound-DX-PA, a generalist medical diagnosis large language model. The model has knowledge and reasoning capabilities close to those of expert physicians. For more details, see the article.

View the full report: https://go.hyper.ai/oudKP

10. Superconducting material search efficiency increased by 5 times! University of Florida and others use deep learning to transform material discovery, and the results are published in Nature sub-journal

Researchers from the University of Florida and the University of Tennessee have increased the efficiency of searching for high-Tc superconductors by five times through the deep learning model BETE-NET. This article is a detailed interpretation and sharing of the research.

View the full report: https://go.hyper.ai/hAIXd

Popular Encyclopedia Articles

1. Diffusion Loss

2. Causal Attention

3. Kolmogorov-Arnold Representation Theorem

4. Massive Multitask Language Understanding (MMLU)

5. Contrastive Learning

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://go.hyper.ai/wiki

One-stop tracking of top AI academic conferences: https://go.hyper.ai/event


The above is all the content of this week’s editor’s selection. If you have resources that you want to include on the hyper.ai official website, you are also welcome to leave a message or submit an article to tell us!

See you next week!