HyperAI

Ebook2Audiobook Converts E-books to Audiobooks in One Click; CVPR's First Cross-domain Small Sample Object Detection Challenge Dataset Is Online

特色图像

In this era of information explosion, our eyes are already overwhelmed - staring at mobile phone screens on the way to work, facing computer documents at work, and immersing ourselves in the world of novels before going to bed. If text can be transformed into warm voices, and we can listen to them while jogging in the morning, cooking, or resting our eyes, then the acquisition of information will no longer be limited to vision.

Ebook2Audiobook is an open source tool designed to convert eBooks into audiobooks. The project uses advanced Text-to-Speech (TTS) technology to convert the text content in eBooks into voice files to generate audiobooks that can be listened to.

at present,"Ebook2Audiobook e-book to audiobook" tutorial is now online hyper.ai official website, one-click start can make your e-book library reborn in the sound waves, come and try it~

Online use:https://go.hyper.ai/sgLbN

From March 3rd to March 7th, hyper.ai official website updates:

* High-quality public datasets: 10

* Selection of high-quality tutorials: 3

* Community Article Selection: 6 articles

* Popular encyclopedia entries: 5

* Top conferences with deadline in March: 5

Visit the official website:hyper.ai

Selected public datasets

1. CC-OCR text recognition dataset

The CC-OCR dataset covers four core tasks: multi-scene text reading, multi-language text reading, document parsing, and key information extraction. It contains 39 subsets and 7,058 fully annotated images. The launch of CC-OCR fills the gap in the evaluation of current multimodal models in terms of complex structures and fine-grained visual challenges, and is of great significance to promoting the progress of multimodal models in practical applications.


Direct use:https://go.hyper.ai/rQT2y

Dataset Example

2. MM-RLHF Multimodal Preference Alignment Dataset

This dataset contains 120,000 pairs of fine-grained, manually annotated preference comparison data, covering three areas: image understanding, video analysis, and multimodal security. The amount of data far exceeds existing resources, covering more than 100,000 multimodal task instances. Each piece of data has been carefully scored and interpreted by more than 50 annotators to ensure the high quality and granularity of the data.

Direct use:https://go.hyper.ai/sTfNc

Dataset Example

3. GAIA Visual Language Remote Sensing Image Understanding Dataset

GAIA is a global, multimodal, multiscale vision-language dataset for remote sensing image analysis, aiming to bridge the gap between remote sensing (RS) imagery and natural language understanding. The dataset covers 25 years of Earth observation data (1998-2024), covering a diverse range of geographic areas, satellite missions, and remote sensing modalities.

Direct use:https://go.hyper.ai/JHgSb

Dataset Example

4. OpenR1-Math-220k Mathematical Reasoning Dataset

OpenR1-Math-220k is a large-scale mathematical reasoning dataset that contains 220,000 high-quality mathematical problems and their reasoning traces, which are derived from 800,000 reasoning traces generated by DeepSeek R1.

Direct use:https://go.hyper.ai/VkUMt

5. JuDGE Chinese Legal Judgment Benchmark Dataset

JuDGE is a benchmark dataset for legal document generation designed for Chinese legal systems. This dataset aims to improve the performance of legal document generation models through high-quality annotated data, especially in legal reasoning and document writing. It is suitable for a variety of application scenarios such as legal intelligent systems, automatic generation of legal documents, and legal question-answering systems.

Direct use:https://go.hyper.ai/Fygtg

6. NTIRE2025 CDFSOD small sample object detection dataset

This dataset is used by the first cross-domain small sample object detection challenge of NTIRE 2025, which includes the source dataset COCO and multiple verification datasets, such as ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, UODD, etc. The core research problem of this dataset is how to perform target detection in cross-domain scenarios using only very limited annotated target images.

Direct use:https://go.hyper.ai/kGZhW

Dataset Example

7. Cat Scratch YOLO-format Detection Cat scratch object YOLO format detection dataset

This dataset is a YOLO format dataset for detecting cats scratching objects. It contains about 1,500 images with backgrounds. Each image has a .txt label file compatible with YOLO, which can be used to train object detection models to identify whether a cat is scratching something.

Direct use:https://go.hyper.ai/wkzNJ

Dataset Example

8. Chinese DeepSeek R1 Distill data 110k Chinese based on DeepSeek-R1 distillation dataset

This dataset is a Chinese open source distilled full-blooded R1 dataset. The dataset contains not only math data, but also a large amount of general type data, with a total amount of 110K.

Direct use:https://go.hyper.ai/5zvRt

9. Hand Gesture Gesture Detection Dataset

This dataset is specially built for smart TV gesture control systems and contains about 500 independently collected short video samples. Each video clip lasts 2 to 3 seconds and fully records the dynamic process from the start of the gesture to the full display. These gestures include thumbs up, thumbs up, swipe left, swipe right, and stop, and serve as separate training samples for gesture recognition models. The samples are collaboratively completed by participants of different ages (18-65 years old), genders, and skin colors, covering a variety of interactive postures such as standing and sitting, in order to capture the differences in operating habits that may occur in real users.

Direct use:https://go.hyper.ai/nMdjB

Dataset Example

10. Rich-Human-Feedback Image Dataset

This dataset is designed to provide rich feedback for the training and evaluation of text-to-image generation models and contains 15k images. It collects 1.5 million annotations from over 150,000 people, covering feedback such as image ratings, semantic consistency, and correction suggestions.

Direct use:https://go.hyper.ai/GhD9w

Dataset Example

Selected Public Tutorials

1. One-click deployment of YOLOv12

For a long time, enhancing the network architecture of the YOLO framework has been a core topic in the field of computer vision. Although the attention mechanism has performed well in modeling capabilities, CNN-based improvements are still the mainstream because attention-based models are difficult to match in speed. However, the launch of YOLOv12 has changed this situation. Not only is it comparable to CNN-based frameworks in speed, it also fully utilizes the performance advantages of the attention mechanism and becomes a new benchmark for real-time object detection.

The relevant models and dependencies of this project have been deployed. After starting the container, click the API address to enter the Web interface.

Run online:https://go.hyper.ai/Wy1So

Demo Example

2. Ebook2Audiobook e-book to audiobook

Ebook2Audiobook is an open source tool designed to convert eBooks to audiobooks. The project uses advanced Text-to-Speech (TTS) technology to automatically convert the text content in eBooks into speech, generating audiobooks for users to listen to. Ebook2Audiobook supports multiple eBook formats, such as EPUB, PDF, MOBI, etc., and can retain chapter structure and metadata, making the generated audiobooks easier to navigate and understand.

Go to the official website to clone and start the container, directly copy the API address, and then start the model.

Run online:https://go.hyper.ai/sgLbN

Demo Example

Community Articles

1. The accuracy rate reaches 97%. The Australian team's new achievement is based on deep learning to identify gender by skull CT, surpassing human forensic doctors

The team from the University of Western Australia and other institutions proposed an automated framework based on deep learning. The study used 200 skull CT scans from a hospital in Indonesia to train and test three deep learning-based network configurations. The most accurate deep learning framework was able to combine gender and skull features for judgment, with a classification accuracy of 97%, significantly higher than the 82% of human observers. This article is a detailed interpretation and sharing of the paper.

View the full report:https://go.hyper.ai/0rfjM

2. Taking the 1.7K Shenzhen residential housing price as an example, Zhejiang University GIS Laboratory uses the attention mechanism to mine geographic context features and improve the accuracy of spatial non-stationary regression

Researchers from Zhejiang Provincial GIS Key Laboratory proposed a deep learning model CatGWR based on attention mechanism. The model combines the spatial distance and contextual similarity between samples by introducing the attention mechanism, thereby more accurately estimating spatial non-stationarity. This provides a new perspective for geospatial modeling, especially when dealing with complex geographical phenomena, and can better capture spatial heterogeneity and contextual influences. This article is a detailed interpretation and sharing of the research.

View the full report:https://go.hyper.ai/irDAo

3. Covering mathematics/code/science/puzzles, high-quality reasoning data sets are summarized to help reproduce DeepSeek's powerful reasoning capabilities

HyperAI has carefully compiled the most popular reasoning datasets, covering mathematics, code, science, puzzles and other fields. For practitioners and researchers who hope to effectively improve the reasoning capabilities of large models, these datasets are undoubtedly an excellent starting point. This article is the dataset download address.

View the full report:https://go.hyper.ai/XGIi8

4. Selected for ICLR 2025! Zhejiang University Shen Chunhua et al. proposed Boltzmann alignment technology, protein binding free energy prediction reached SOTA

Zhejiang University and others proposed a technique called Boltzmann alignment, which transferred knowledge from the pre-trained inverse folding model to the prediction of binding free energy. This method showed superior performance and was included in ICLR 2025, the top international academic conference in the field of artificial intelligence. This article is a detailed interpretation and sharing of the paper.

View the full report:https://go.hyper.ai/MsUDj

5. Model parameters exceed RFdiffusion by 5 times! NVIDIA and others release Proteina, which achieves SOTA performance in de novo protein backbone design

NVIDIA, in collaboration with MIT and others, has developed a new type of large-scale streaming protein backbone generator, Proteina. Proteina has five times the number of parameters of the RFdiffusion model, and has expanded the training data to 21 million synthetic protein structures. It has achieved SOTA performance in de novo protein backbone design, and has generated diverse and designable proteins with an unprecedented length of up to 800 residues. The results have been selected for ICLR 2025 Oral. This article is a detailed interpretation and sharing of the research.

View the full report:https://go.hyper.ai/n4fWv

6. The government work report mentioned "artificial intelligence +" again, and the proposals of technology leaders at the two sessions focused on AI + medical care/AI face-changing and voice-changing/large model illusions...

Lei Jun, Zhou Hongyi, Liu Qingfeng and other industry leaders closely followed the pulse of the times and actively proposed proposals and suggestions in many key areas such as new energy vehicles, large model illusions, AI medical care, AI face replacement, and AI education. See below for more details.

View the full report:https://go.hyper.ai/EazuY

Popular Encyclopedia Articles

1. Diffusion Loss

2. Causal Attention

3. Kolmogorov-Arnold Representation Theorem

4. Large-scale Multi-task Language Understanding (MMLU)

5. Contrastive Learning

Here are hundreds of AI-related terms compiled to help you understand "artificial intelligence" here:

https://go.hyper.ai/wiki

One-stop tracking of top AI academic conferences:https://go.hyper.ai/event


The above is all the content of this week’s editor’s selection. If you have resources that you want to include on the hyper.ai official website, you are also welcome to leave a message or submit an article to tell us!

See you next week!