Ebook2Audiobook Converts E-books to Audiobooks in One Click; Dataset for CVPR's First Cross-Domain Few-Shot Object Detection Challenge Is Online

a year ago

In this era of information explosion, our eyes are already overwhelmed: we stare at phone screens on the way to work, face computer documents at work, and immerse ourselves in novels before bed. If text could be turned into a warm voice to listen to while jogging in the morning, cooking, or resting our eyes, acquiring information would no longer be limited to vision.

Ebook2Audiobook is an open source tool designed to convert e-books into audiobooks. The project uses Text-to-Speech (TTS) technology to turn the text of an e-book into audio files that can be listened to as an audiobook.

The "Ebook2Audiobook: e-book to audiobook" tutorial is now available on the hyper.ai official website. With a one-click start, you can give your e-book library a second life as audio. Come and try it!

Online use: https://go.hyper.ai/sgLbN

Updates on the hyper.ai official website from March 3rd to March 7th:

* High-quality public datasets: 10

* High-quality tutorials: 3

* Community articles: 6

* Popular encyclopedia entries: 5

* Top conferences with deadlines in March: 5

Visit the official website: hyper.ai

Selected Public Datasets

1. CC-OCR Text Recognition Dataset

The CC-OCR dataset covers four core tasks: multi-scene text reading, multi-language text reading, document parsing, and key information extraction. It contains 39 subsets and 7,058 fully annotated images. The launch of CC-OCR fills the gap in the evaluation of current multimodal models in terms of complex structures and fine-grained visual challenges, and is of great significance to promoting the progress of multimodal models in practical applications.

Direct use: https://go.hyper.ai/rQT2y

2. MM-RLHF Multimodal Preference Alignment Dataset

This dataset contains 120,000 pairs of fine-grained, manually annotated preference comparisons covering three areas: image understanding, video analysis, and multimodal safety. It is far larger than existing resources and covers more than 100,000 multimodal task instances. The data was scored and explained by a team of more than 50 annotators to ensure high quality and granularity.

Direct use: https://go.hyper.ai/sTfNc

3. GAIA Vision-Language Remote Sensing Image Understanding Dataset

GAIA is a global, multimodal, multiscale vision-language dataset for remote sensing image analysis that aims to bridge the gap between remote sensing (RS) imagery and natural language understanding. The dataset spans 25 years of Earth observation data (1998-2024) and covers a diverse range of geographic areas, satellite missions, and remote sensing modalities.

Direct use: https://go.hyper.ai/JHgSb

4. OpenR1-Math-220k Mathematical Reasoning Dataset

OpenR1-Math-220k is a large-scale mathematical reasoning dataset that contains 220,000 high-quality mathematical problems and their reasoning traces, which are derived from 800,000 reasoning traces generated by DeepSeek R1.

Direct use: https://go.hyper.ai/VkUMt

5. JuDGE Chinese Legal Judgment Benchmark Dataset

JuDGE is a benchmark dataset for legal document generation designed for the Chinese legal system. It aims to improve the performance of legal document generation models through high-quality annotated data, especially in legal reasoning and document writing. It suits application scenarios such as intelligent legal systems, automatic generation of legal documents, and legal question-answering systems.

Direct use: https://go.hyper.ai/Fygtg

6. NTIRE2025 CDFSOD Few-Shot Object Detection Dataset

This dataset is used in the first Cross-Domain Few-Shot Object Detection (CD-FSOD) challenge at NTIRE 2025. It includes the source dataset COCO and multiple validation datasets, such as ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD. The core research problem is how to perform object detection in cross-domain scenarios using only a very small number of annotated target images.

Direct use: https://go.hyper.ai/kGZhW

7. Cat Scratch YOLO-format Detection Dataset

This YOLO-format dataset is for detecting cats scratching objects. It contains about 1,500 images with backgrounds, and each image has a YOLO-compatible .txt label file, so it can be used to train object detection models that identify whether a cat is scratching something.

Direct use: https://go.hyper.ai/wkzNJ
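The YOLO-compatible .txt labels mentioned for the Cat Scratch dataset can be read with a few lines of code. The sketch below assumes the conventional YOLO detection layout, in which each line holds `class_id x_center y_center width height` with the four box values normalized to the image size. The `Box` class and both function names are hypothetical helpers introduced here for illustration, and the dataset's actual class IDs are not stated in this article.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One bounding box in pixel corner coordinates (hypothetical helper)."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one YOLO label line to pixel corner coordinates.

    Assumed fields: class_id x_center y_center width height,
    with the four box values normalized to the range [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:])
    return Box(
        class_id=class_id,
        x_min=(xc - w / 2) * img_w,
        y_min=(yc - h / 2) * img_h,
        x_max=(xc + w / 2) * img_w,
        y_max=(yc + h / 2) * img_h,
    )


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> list[Box]:
    """Parse the full contents of a label file; blank lines are skipped."""
    return [
        parse_yolo_line(ln, img_w, img_h)
        for ln in text.splitlines()
        if ln.strip()
    ]
```

For example, `parse_yolo_line("0 0.5 0.5 0.25 0.5", 640, 480)` yields a box running from (240, 120) to (400, 360) in pixels. An empty label file simply produces an empty list.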

8. Chinese DeepSeek R1 Distill data 110k: Chinese DeepSeek-R1 Distillation Dataset

This is an open source Chinese dataset distilled from the full-size DeepSeek-R1 model. It contains not only math data but also a large amount of general-purpose data, about 110K samples in total.

Direct use: https://go.hyper.ai/5zvRt

9. Hand Gesture Detection Dataset

This dataset is built for smart TV gesture control systems and contains about 500 independently collected short video samples. Each clip lasts 2 to 3 seconds and records the full motion from the start of the gesture to its complete display. The gestures are thumbs up, thumbs down, swipe left, swipe right, and stop, and each clip serves as a separate training sample for gesture recognition models. The samples were recorded by participants of different ages (18-65 years old), genders, and skin colors, in a variety of postures such as standing and sitting, to capture the differences in operating habits that real users may show.

Direct use: https://go.hyper.ai/nMdjB

10. Rich-Human-Feedback Image Dataset

This dataset is designed to provide rich feedback for the training and evaluation of text-to-image generation models and contains 15k images. It collects 1.5 million annotations from over 150,000 people, covering feedback such as image ratings, semantic consistency, and correction suggestions.

Direct use: https://go.hyper.ai/GhD9w

Selected Public Tutorials

1. One-click deployment of YOLOv12

Enhancing the network architecture of the YOLO framework has long been a core topic in computer vision. Although attention mechanisms offer strong modeling capabilities, CNN-based improvements have remained the mainstream because attention-based models struggle to match CNN-based ones in speed. The launch of YOLOv12 changes this: it matches the speed of CNN-based frameworks while exploiting the performance advantages of attention, setting a new benchmark for real-time object detection.

The relevant models and dependencies of this project have been deployed. After starting the container, click the API address to enter the Web interface.

Run online: https://go.hyper.ai/Wy1So

2. Ebook2Audiobook: E-book to Audiobook

Ebook2Audiobook is an open source tool designed to convert eBooks to audiobooks. The project uses advanced Text-to-Speech (TTS) technology to automatically convert the text content in eBooks into speech, generating audiobooks for users to listen to. Ebook2Audiobook supports multiple eBook formats, such as EPUB, PDF, MOBI, etc., and can retain chapter structure and metadata, making the generated audiobooks easier to navigate and understand.

Go to the official website, clone and start the container, then copy the API address to start using the model.

Run online: https://go.hyper.ai/sgLbN
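To give a feel for what an e-book to audiobook pipeline has to do before any speech is synthesized, here is a minimal sketch of one common preprocessing step: grouping whole sentences into short chunks that a TTS engine can process one at a time. This is an illustration only and is not taken from the Ebook2Audiobook code base. The function name, the 200-character default, and the simple punctuation-based sentence splitting are all assumptions.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after ., ! or ? when followed by whitespace (a rough heuristic).
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Either the sentence still fits, or the chunk is empty and
            # must accept the sentence even if it is too long.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Calling `split_into_chunks("One. Two! Three? Four.", max_chars=10)` returns `["One. Two!", "Three?", "Four."]`. In a real pipeline each chunk would then be passed to the TTS engine and the resulting audio segments joined per chapter.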

Community Articles

1. 97% accuracy: an Australian team uses deep learning to determine sex from skull CT scans, surpassing human forensic experts

The team from the University of Western Australia and other institutions proposed an automated framework based on deep learning. The study used 200 skull CT scans from a hospital in Indonesia to train and test three deep learning-based network configurations. The most accurate configuration combined sex and skull features in its judgment and reached a classification accuracy of 97%, significantly higher than the 82% achieved by human observers. This article is a detailed interpretation of the paper.

View the full report: https://go.hyper.ai/0rfjM

2. Using 1.7K Shenzhen residential housing prices as an example, the Zhejiang University GIS Laboratory uses the attention mechanism to mine geographic context features and improve the accuracy of spatial non-stationary regression

Researchers from the Zhejiang Provincial GIS Key Laboratory proposed CatGWR, a deep learning model based on the attention mechanism. By introducing attention, the model combines the spatial distance and the contextual similarity between samples, and so estimates spatial non-stationarity more accurately. This offers a new perspective for geospatial modeling, especially for complex geographical phenomena, where it can better capture spatial heterogeneity and contextual influences. This article is a detailed interpretation of the research.

View the full report: https://go.hyper.ai/irDAo
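The idea of weighting neighboring samples by both spatial distance and contextual similarity can be illustrated with a small, generic sketch. It is not the CatGWR model and does not reproduce the paper's formulation: the scoring rule, the `bandwidth` and `alpha` parameters, and the function names are assumptions chosen to show how a softmax (the normalization used in attention) can turn two signals into one set of weights.

```python
import math


def attention_weights(distances, similarities, bandwidth=1.0, alpha=0.5):
    """Softmax weights over neighbors from distance and context similarity.

    Assumed score: alpha * (-distance / bandwidth) + (1 - alpha) * similarity.
    Closer and more similar neighbors receive larger weights.
    """
    scores = [
        alpha * (-d / bandwidth) + (1.0 - alpha) * s
        for d, s in zip(distances, similarities)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_estimate(values, weights):
    """Weighted average of neighbor values (a stand-in for a local fit)."""
    return sum(v * w for v, w in zip(values, weights))


# Three hypothetical neighboring houses: distance in km, context
# similarity in [0, 1], and an observed price in arbitrary units.
w = attention_weights([0.5, 2.0, 2.0], [0.9, 0.9, 0.1])
price = local_estimate([60.0, 55.0, 40.0], w)
```

With these inputs the nearest house gets the largest weight, and of the two houses at the same distance the one with the more similar context is weighted more heavily. A geographically weighted regression would use such weights to fit local coefficients rather than a simple average.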

3. A roundup of high-quality reasoning datasets covering mathematics, code, science, and puzzles to help reproduce DeepSeek's powerful reasoning capabilities

HyperAI has compiled the most popular reasoning datasets, covering mathematics, code, science, puzzles, and other fields. For practitioners and researchers who want to improve the reasoning capabilities of large models, these datasets are an excellent starting point. This article provides the download addresses for these datasets.

View the full report: https://go.hyper.ai/XGIi8

4. Selected for ICLR 2025! Shen Chunhua et al. of Zhejiang University propose Boltzmann alignment, reaching SOTA in protein binding free energy prediction

Zhejiang University and collaborators proposed a technique called Boltzmann alignment, which transfers knowledge from a pre-trained inverse folding model to the prediction of binding free energy. The method showed superior performance and was accepted to ICLR 2025, a top international academic conference in artificial intelligence. This article is a detailed interpretation of the paper.

View the full report: https://go.hyper.ai/MsUDj

5. Five times the parameters of RFdiffusion! NVIDIA and others release Proteina, which achieves SOTA performance in de novo protein backbone design

NVIDIA, in collaboration with MIT and others, has developed Proteina, a new large-scale flow-based protein backbone generator. Proteina has five times as many parameters as the RFdiffusion model and expands the training data to 21 million synthetic protein structures. It achieves SOTA performance in de novo protein backbone design and generates diverse, designable proteins of unprecedented length, up to 800 residues. The work was selected as an ICLR 2025 Oral. This article is a detailed interpretation of the research.

View the full report: https://go.hyper.ai/n4fWv

6. The government work report mentions "AI+" again, and proposals from technology leaders at the Two Sessions focus on AI in medical care, AI face-swapping and voice-changing, large model hallucinations...

Industry leaders including Lei Jun, Zhou Hongyi, and Liu Qingfeng put forward proposals and suggestions in key areas such as new energy vehicles, large model hallucinations, AI in medical care, AI face-swapping, and AI education. See the full report for details.

View the full report: https://go.hyper.ai/EazuY



