One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

In order to further promote the popularization of AI4S, HyperAI has planned the "Meet AI4S" series of live broadcast programs.The first live broadcast will be launched on time at 19:00 on July 17!We have invited Ding Jiale, a PhD student in Remote Sensing and Geographic Information Systems at Zhejiang University. His topic is "Neural Networks Provide New Explanations for Spatial Heterogeneity of Housing Prices". Come and make an appointment for the live broadcast~

https://www.huodongxing.com/event/2762111401922

From July 8 to July 12, hyper.ai official website updates:

* High-quality public datasets: 10

* Selection of high-quality tutorials: 3

* Community article selection: 5 articles

* Popular encyclopedia entries: 5

* Top conferences with deadline in July: 2

Visit the official website:hyper.ai

Selected public datasets

1. OmniMedVQA Large-Scale Medical VQA Evaluation Dataset

This dataset focuses on large-scale visual question answering evaluation in the medical field. It contains 118,010 different images, covering 12 different modalities and involving more than 20 different organs and parts of the human body. It aims to provide an evaluation benchmark for the development of large medical multimodal models.

Direct use:https://go.hyper.ai/vafuu

2. Evol-character role setting and dialogue dataset

The dataset contains settings and dialogue data of 200 characters, generated by GPT3.5 and GPT4.
Direct use:https://go.hyper.ai/IwTIW

3. HellaSwag Large Model Common Sense Reasoning Dataset

The HellaSwag dataset contains 70,000 questions that are very easy for humans (over 95% accuracy) but difficult for models (about 48% accuracy). This dataset aims to explore the performance of deep pre-trained models in commonsense reasoning by building a dataset that is challenging for existing state-of-the-art models.

Direct use:https://go.hyper.ai/4WJGQ

4. M2Lingual Multi-language Multi-round Instruction Fine-tuning Dataset

The dataset covers 70 different languages, provides more training data for low-resource languages, and contains a total of 182,000 instruction fine-tuning pairs, aiming to improve the performance of large language models in following instructions, especially on diverse languages and tasks.

Direct use:https://go.hyper.ai/1AY34

5. MyAnimeList popular anime information dataset

This dataset contains popular anime information collected from the MyAnimeList website. It includes various attributes that can describe each anime in detail and can be used to analyze and study anime trends, ratings, and other related factors.

Direct use:https://go.hyper.ai/mU04c

6. Magpie-Pro-300K-Filtered High-Quality Alignment Dataset

This dataset is a high-quality instruction dataset synthesized using the Magpie method, which is extracted from Llama-3 70B. This dataset contains about 300k high-quality dialogues and is generated through an automated self-synthesis process that exploits the autoregressive properties of aligned LLMs to generate user queries and corresponding replies.

Direct use:https://go.hyper.ai/YTDxI

7. Vript English Video-Text Dataset

The dataset contains 12k annotated videos with a total of more than 420k clips. Each clip in the Vript dataset is accompanied by a caption of approximately 145 words.

Direct use:https://go.hyper.ai/7o2Ca

8. High-resolution tree detection dataset in the plain and hilly areas of eastern China

The dataset contains 1,920 images for training and 480 images for testing, with a total of 664,487 trees, with an average of 276 trees per image.

Direct use:https://go.hyper.ai/zTo63

9. AdaTreeFormer-London London High Resolution Tree Detection Dataset

The dataset covers a variety of urban and residential environments with high tree density, different tree shapes and sizes. It contains a total of 95,067 trees in the 452-image training set and 161-image test set, with an average of 155 trees per image.

Direct use:https://go.hyper.ai/iVHO1

10. AdaTreeFormer-Yoesmite Yosemite high-resolution tree detection dataset

The dataset mainly covers woody mountainous areas with low tree density and complex terrain. It contains a training set of 1,350 images with a total of 98,949 trees and a test set of 1,350 images. Each image contains an average of 36 trees, providing an important testing environment for the performance of the model in complex terrain.

Direct use:https://go.hyper.ai/ic1bO

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. Online Tutorial | Tsinghua University strongly recommends! YOLOv10 achieves more efficient target detection

YOLOv10 is a real-time target detection method developed by researchers at Tsinghua University based on the Ultralytics Python package, which aims to address the deficiencies of previous YOLO versions in post-processing and model architecture. In this tutorial, you can start target detection immediately by cloning it with one click without entering any commands.

Run online:https://go.hyper.ai/vtjgs

2. img2img-turbo image conversion demo

img2img-turbo is an efficient image-to-image conversion model designed for efficient visual content conversion. It can easily give rich colors to monochrome images or convert simple sketches into realistic images. This tutorial provides an intuitive model demonstration Demo. With just a few strokes, you can experience the fun of becoming a master painter!

Run online:https://go.hyper.ai/Ms5zH

3. ComfyUI StableDiffusion3 workflow online tutorial

Stable Diffusion 3 is a Multimodal Diffusion Transformer (MMDiT) model that is specifically designed to transform text descriptions into images. It excels at generating high-quality images, handling complex layouts, and parsing complex prompts. This tutorial shows you how to deploy and use Stable Diffusion 3 through the ComfyUI workflow. Simply clone the container and you can easily start and run the model through the API interface.

Run online:https://go.hyper.ai/sEQCW

Community Articles

1. Published in Cell sub-journal! Tsinghua University Zhang Qiangfeng's research group developed the SPACE algorithm, which has the leading ability in discovering organizational modules among similar tools

The research group led by Zhang Qiangfeng of Tsinghua University has developed an artificial intelligence algorithm called SPACE based on the deep learning framework of graph autoencoders, which can identify spatial cell types and discover tissue modules from spatial transcriptome data with single-cell resolution. SPACE is significantly superior to other tools in terms of cell type identification and tissue module discovery. This article is a detailed interpretation and sharing of the research.

View the full report:https://go.hyper.ai/IZE5Q

2. Yu Xiang's research group at Shanghai Jiao Tong University released a transferable deep learning model to identify multiple types of RNA modifications and significantly reduce computing costs

The research group led by Yu Xiang, a tenured associate professor at the School of Life Sciences and Technology, Shanghai Jiao Tong University, and the team led by Yang Jun and Wang Hongxia from Shanghai Chenshan Botanical Garden, developed a transferable deep learning model, TandemMod, which enabled the identification of multiple types of RNA modifications in DRS. This article is an interpretation and sharing of the experimental process.

View the full report:https://go.hyper.ai/qkS18

3. Universal Robots Milestone! MIT proposes a strategy combination framework PoCo to solve the problem of heterogeneous data sources and enable flexible multi-task execution of robots

MIT researchers proposed a robot strategy composition framework PoCo, which can solve the problems of data heterogeneity and task diversity in robot tool use tasks. This article is an interpretation and sharing of the research process.

View the full report:https://go.hyper.ai/jrJNV

4. Ding Han, Academician of the Chinese Academy of Sciences: Humanoid Robots - The Breakthrough Point of the Combination of Robots and Artificial Intelligence

Recently, HyperAI had a deep conversation with Academician Ding Han to learn about his profound accumulation in the field of intelligent manufacturing, as well as his unique insights into research fields such as industrial robots and humanoid robots. This article is a detailed interpretation and sharing of the interview content with Academician Ding Han.

View the full report:https://go.hyper.ai/A883w

5. 20 experimental data create AI protein milestone! Shanghai Jiaotong University and Shanghai AI Lab jointly released FSFP to effectively optimize protein pre-training model

The team led by Hong Liang from Shanghai Jiao Tong University and Tan Pan, a young researcher from Shanghai Artificial Intelligence Laboratory, proposed a fine-tuning training method FSFP based on protein pre-training models. It can efficiently train protein pre-training models using only 20 random wet experimental data and can significantly improve the single-point mutation prediction positive rate of the model. This article is an interpretation and sharing of the paper.

View the full report:https://go.hyper.ai/5vKyf

Popular Encyclopedia Articles

1. LlamaIndex

2. Lifelong Learning

3. Rotational Position Encoding RoPE

4. Russian dolls represent learning MRL

5. 3D Gaussian Splatting

Here are hundreds of AI-related terms compiled to help you understand "artificial intelligence" here:

https://go.hyper.ai/wiki

Station B live broadcast preview

The first episode of the "Meet AI4S" live broadcast series will be officially launched at 19:00 on July 17. We are honored to invite Ding Jiale, a doctoral student in remote sensing and geographic information systems at Zhejiang University. He will introduce the design ideas and application scenarios of the model in an easy-to-understand manner under the title "Neural networks provide a new explanation for the spatial heterogeneity of housing prices", and further share the spatial regression analysis method of geographically weighted regression.

Click to schedule a live broadcast:https://www.huodongxing.com/event/2762111401922

Guest Introduction

One-stop tracking of top AI academic conferences:

https://go.hyper.ai/event

The above is all the content of this week’s editor’s selection. If you have resources that you want to include on the hyper.ai official website, you are also welcome to leave a message or submit an article to tell us!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China.We are committed to becoming the infrastructure in the field of data science in China and providing rich and high-quality public resources for domestic developers. So far, we have:

* Provide domestic accelerated download nodes for 1300+ public data sets

* Includes 400+ classic and popular online tutorials

* Interpretation of 100+ AI4Science paper cases

* Support 500+ related terms search

* Hosting the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey:

https://hyper.ai

HyperAI

One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

2 years ago

Information

Artificial Intelligence

Dataset

Machine Learning

Deep Learning

https://www.huodongxing.com/event/2762111401922

From July 8 to July 12, hyper.ai official website updates:

* High-quality public datasets: 10

* Selection of high-quality tutorials: 3

* Community article selection: 5 articles

* Popular encyclopedia entries: 5

* Top conferences with deadline in July: 2

Visit the official website:hyper.ai

Selected public datasets

1. OmniMedVQA Large-Scale Medical VQA Evaluation Dataset

Direct use:https://go.hyper.ai/vafuu

2. Evol-character role setting and dialogue dataset

The dataset contains settings and dialogue data of 200 characters, generated by GPT3.5 and GPT4.
Direct use:https://go.hyper.ai/IwTIW

3. HellaSwag Large Model Common Sense Reasoning Dataset

Direct use:https://go.hyper.ai/4WJGQ

4. M2Lingual Multi-language Multi-round Instruction Fine-tuning Dataset

Direct use:https://go.hyper.ai/1AY34

5. MyAnimeList popular anime information dataset

Direct use:https://go.hyper.ai/mU04c

6. Magpie-Pro-300K-Filtered High-Quality Alignment Dataset

Direct use:https://go.hyper.ai/YTDxI

7. Vript English Video-Text Dataset

The dataset contains 12k annotated videos with a total of more than 420k clips. Each clip in the Vript dataset is accompanied by a caption of approximately 145 words.

Direct use:https://go.hyper.ai/7o2Ca

8. High-resolution tree detection dataset in the plain and hilly areas of eastern China

The dataset contains 1,920 images for training and 480 images for testing, with a total of 664,487 trees, with an average of 276 trees per image.

Direct use:https://go.hyper.ai/zTo63

9. AdaTreeFormer-London London High Resolution Tree Detection Dataset

Direct use:https://go.hyper.ai/iVHO1

10. AdaTreeFormer-Yoesmite Yosemite high-resolution tree detection dataset

Direct use:https://go.hyper.ai/ic1bO

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. Online Tutorial | Tsinghua University strongly recommends! YOLOv10 achieves more efficient target detection

Run online:https://go.hyper.ai/vtjgs

2. img2img-turbo image conversion demo

Run online:https://go.hyper.ai/Ms5zH

3. ComfyUI StableDiffusion3 workflow online tutorial

Run online:https://go.hyper.ai/sEQCW

Community Articles

View the full report:https://go.hyper.ai/IZE5Q

2. Yu Xiang's research group at Shanghai Jiao Tong University released a transferable deep learning model to identify multiple types of RNA modifications and significantly reduce computing costs

View the full report:https://go.hyper.ai/qkS18

3. Universal Robots Milestone! MIT proposes a strategy combination framework PoCo to solve the problem of heterogeneous data sources and enable flexible multi-task execution of robots

View the full report:https://go.hyper.ai/jrJNV

4. Ding Han, Academician of the Chinese Academy of Sciences: Humanoid Robots - The Breakthrough Point of the Combination of Robots and Artificial Intelligence

View the full report:https://go.hyper.ai/A883w

5. 20 experimental data create AI protein milestone! Shanghai Jiaotong University and Shanghai AI Lab jointly released FSFP to effectively optimize protein pre-training model

View the full report:https://go.hyper.ai/5vKyf

Popular Encyclopedia Articles

1. LlamaIndex

2. Lifelong Learning

3. Rotational Position Encoding RoPE

4. Russian dolls represent learning MRL

5. 3D Gaussian Splatting

Here are hundreds of AI-related terms compiled to help you understand "artificial intelligence" here:

https://go.hyper.ai/wiki

Station B live broadcast preview

Click to schedule a live broadcast:https://www.huodongxing.com/event/2762111401922

Guest Introduction

One-stop tracking of top AI academic conferences:

https://go.hyper.ai/event

See you next week!

About HyperAI

* Provide domestic accelerated download nodes for 1300+ public data sets

* Includes 400+ classic and popular online tutorials

* Interpretation of 100+ AI4Science paper cases

* Support 500+ related terms search

* Hosting the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey:

https://hyper.ai

One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

2 years ago

Information

Artificial Intelligence

Dataset

Machine Learning

Deep Learning

https://www.huodongxing.com/event/2762111401922

From July 8 to July 12, hyper.ai official website updates:

* High-quality public datasets: 10

* Selection of high-quality tutorials: 3

* Community article selection: 5 articles

* Popular encyclopedia entries: 5

* Top conferences with deadline in July: 2

Visit the official website:hyper.ai

Selected public datasets

1. OmniMedVQA Large-Scale Medical VQA Evaluation Dataset

Direct use:https://go.hyper.ai/vafuu

2. Evol-character role setting and dialogue dataset

The dataset contains settings and dialogue data of 200 characters, generated by GPT3.5 and GPT4.
Direct use:https://go.hyper.ai/IwTIW

3. HellaSwag Large Model Common Sense Reasoning Dataset

Direct use:https://go.hyper.ai/4WJGQ

4. M2Lingual Multi-language Multi-round Instruction Fine-tuning Dataset

Direct use:https://go.hyper.ai/1AY34

5. MyAnimeList popular anime information dataset

Direct use:https://go.hyper.ai/mU04c

6. Magpie-Pro-300K-Filtered High-Quality Alignment Dataset

Direct use:https://go.hyper.ai/YTDxI

7. Vript English Video-Text Dataset

The dataset contains 12k annotated videos with a total of more than 420k clips. Each clip in the Vript dataset is accompanied by a caption of approximately 145 words.

Direct use:https://go.hyper.ai/7o2Ca

8. High-resolution tree detection dataset in the plain and hilly areas of eastern China

The dataset contains 1,920 images for training and 480 images for testing, with a total of 664,487 trees, with an average of 276 trees per image.

Direct use:https://go.hyper.ai/zTo63

9. AdaTreeFormer-London London High Resolution Tree Detection Dataset

Direct use:https://go.hyper.ai/iVHO1

10. AdaTreeFormer-Yoesmite Yosemite high-resolution tree detection dataset

Direct use:https://go.hyper.ai/ic1bO

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. Online Tutorial | Tsinghua University strongly recommends! YOLOv10 achieves more efficient target detection

Run online:https://go.hyper.ai/vtjgs

2. img2img-turbo image conversion demo

Run online:https://go.hyper.ai/Ms5zH

3. ComfyUI StableDiffusion3 workflow online tutorial

Run online:https://go.hyper.ai/sEQCW

Community Articles

View the full report:https://go.hyper.ai/IZE5Q

2. Yu Xiang's research group at Shanghai Jiao Tong University released a transferable deep learning model to identify multiple types of RNA modifications and significantly reduce computing costs

View the full report:https://go.hyper.ai/qkS18

3. Universal Robots Milestone! MIT proposes a strategy combination framework PoCo to solve the problem of heterogeneous data sources and enable flexible multi-task execution of robots

View the full report:https://go.hyper.ai/jrJNV

4. Ding Han, Academician of the Chinese Academy of Sciences: Humanoid Robots - The Breakthrough Point of the Combination of Robots and Artificial Intelligence

View the full report:https://go.hyper.ai/A883w

5. 20 experimental data create AI protein milestone! Shanghai Jiaotong University and Shanghai AI Lab jointly released FSFP to effectively optimize protein pre-training model

View the full report:https://go.hyper.ai/5vKyf

Popular Encyclopedia Articles

1. LlamaIndex

2. Lifelong Learning

3. Rotational Position Encoding RoPE

4. Russian dolls represent learning MRL

5. 3D Gaussian Splatting

Here are hundreds of AI-related terms compiled to help you understand "artificial intelligence" here:

https://go.hyper.ai/wiki

Station B live broadcast preview

Click to schedule a live broadcast:https://www.huodongxing.com/event/2762111401922

Guest Introduction

One-stop tracking of top AI academic conferences:

https://go.hyper.ai/event

See you next week!

About HyperAI

* Provide domestic accelerated download nodes for 1300+ public data sets

* Includes 400+ classic and popular online tutorials

* Interpretation of 100+ AI4Science paper cases

* Support 500+ related terms search

* Hosting the first complete Apache TVM Chinese documentation in China

Visit the official website to start your learning journey:

https://hyper.ai

Command Palette

One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

Command Palette

One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

Related News

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Supports live-action/animation/animal-driven Video Generation; Meituan's open-source multi-style audio-driven Video Generation Framework LongCat 1.5 Enhances VLM's Chart Reconstruction and Table Extraction Capabilities Using the million-level Chart Understanding Dataset ChartNet.

Command Palette

One-click to Run ComfyUI SD3! A large-scale Medical VQA Evaluation Dataset Is Online, Involving More Than 20 Human Organs and Parts

Related News

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Supports live-action/animation/animal-driven Video Generation; Meituan's open-source multi-style audio-driven Video Generation Framework LongCat 1.5 Enhances VLM's Chart Reconstruction and Table Extraction Capabilities Using the million-level Chart Understanding Dataset ChartNet.

Related News

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Supports live-action/animation/animal-driven Video Generation; Meituan's open-source multi-style audio-driven Video Generation Framework LongCat 1.5 Enhances VLM's Chart Reconstruction and Table Extraction Capabilities Using the million-level Chart Understanding Dataset ChartNet.

Related News

Achieve "voice-over Freedom" With Just 3 Seconds of Audio: Mistral open-source Speech Model Voxtral-4B-TTS-2603; Set a New Benchmark for Data Quality: Sutra 10B Pretraining.

4-step Image output/4K quality/6x Speedup, PiD Uses Pixel Diffusion to Unify Decoding and super-resolution Output; SA-3DAO: a Dataset Containing 1000 Pairs of Real Images Paired With Handcrafted 3D Meshes by artists.

Fast and Accurate! Cohere Releases open-source Transcription Model; Accurate Parsing of Complex Scenarios: Chandra-ocr-2 Visual Language Model Achieves Precise OCR.

Tencent open-sources Hy-MT1.5 Translation Model: 440MB Achieves top-tier Translation Capabilities; MIT Jointly Releases MathNet: a Multimodal Mathematical Inference Benchmark Covering 27,000 Real Olympiad Math problems.

Zero-sampling TTS Breakthrough! A Few Seconds of Reference Audio, OmniVoice Helps You Easily Clone Hundreds of Languages; 17 Languages All in One Go: MDPbench Solves the Major Problem of Parsing low-resource Text systems.

Anima V1, a brand-new Raw Image Model, Has Been Released, Focusing on anime-style Image Generation; the MemLens Multimodal long-range Memory Evaluation Dataset Covers cross-conversation text-to-image Reasoning and Knowledge Update mechanisms.

A Locally Runnable Privacy Detection Model: Privacy Filter Achieves high-quality PII Filtering at Low Cost; Hardcore Open Source! Covering the Transfermarkt Structured Football Dataset With Over 80,000 matches.

Extremely Lightweight, yet With Undiminished Image Quality! ERNIE-Image-Turbo: Say Goodbye to Long Waits, lightning-fast Speed; Introducing dual-dimensional Metrics of Perception and Cognition: Alibaba's Unified Multimodal Parsing and Evaluation Dataset OmniParsingBench Is Now online.

Supports live-action/animation/animal-driven Video Generation; Meituan's open-source multi-style audio-driven Video Generation Framework LongCat 1.5 Enhances VLM's Chart Reconstruction and Table Extraction Capabilities Using the million-level Chart Understanding Dataset ChartNet.