Countdown 3 Days! Book an Appointment for Apple WWDC24 Live Broadcast Now; RLAIF-V large-scale Multimodal Preference Dataset Is Online, Effectively Reducing the Hallucination Phenomenon of Different MLLMs

Updates on the hyper.ai official website from June 3 to June 7:

High-quality public datasets: 10

High-quality tutorial selection: 2

Community Article Selection: 3 articles

Popular encyclopedia entries: 5

Top conferences with deadlines in June and July: 5

Visit the official website: hyper.ai

Selected public datasets

1. ChartQA Chart Question Benchmark Dataset

The dataset contains 9.6K human-written questions and 23.1K questions generated from human-written chart summaries. It is designed to test complex questions that require both visual and logical reasoning.

Direct use: https://go.hyper.ai/5tJE9

2. RS5M Large-scale Image-Text Pairing Remote Sensing Dataset

The RS5M dataset contains 5 million remote sensing images with English descriptions. It was built by using a pre-trained vision-language model (VLM) to filter publicly available image-text paired datasets and to process labeled remote sensing (RS) datasets.

Direct use: https://go.hyper.ai/jbwsV

3. CapsFusion-120M Multimodal Image and Text Dataset

This dataset contains image-text data drawn from the LAION-2B and LAION-COCO datasets. It can be used for large-scale multimodal pre-training or for further study of image-text data quality.

Direct use: https://go.hyper.ai/pEE7u

4. ShareGPT4V Large-scale High-quality Image and Text Dataset

The dataset contains 1.2 million image-text pairs that help align visual and language features, strengthen a model's ability to follow instructions, and cover more academic tasks such as ScienceQA, TextVQA, and SBU.

Direct use: https://go.hyper.ai/9CVao

5. RLAIF-V-Dataset Large-scale Multimodal Preference Dataset

The RLAIF-V dataset is an AI-generated multimodal preference dataset that covers a variety of tasks and domains. It contains 44,757 high-quality comparison pairs for training and evaluating multimodal large language models.

Direct use: https://go.hyper.ai/cG6fp

6. FoodLogoDet-1500 High-quality Food Logo Detection Dataset

The dataset consists of 1,500 categories, 99,768 images, and 145,400 objects. This is the first and largest publicly available food logo detection dataset.

Direct use: https://go.hyper.ai/eco23

7. ZSFooD Food Image Dataset

The dataset contains 20,603 food images collected from 10 restaurant scenes. Each image contains multiple food objects annotated with bounding boxes, for a total of 95,322 bounding boxes across 291 classes.

Direct use: https://go.hyper.ai/6xrrC

8. Food-1K Food Image Dataset

The dataset contains more than 1,000 fine-grained food categories and more than 500,000 images. It was used for the large-scale fine-grained food analysis competition of the LargeFineFoodAI workshop at ICCV 2021.

Direct use: https://go.hyper.ai/sjZJi

9. ISIA Ingredient-201 Food Image Dataset

This dataset has 201 subcategories covering the common types of existing food categories. The food images were collected in 5 food-related scenes, with at least 150 food categories collected in each scene.

Direct use: https://go.hyper.ai/bGe45

10. ISIA Food-500 Food Dishes Dataset

The dataset contains 399,726 food items, including more than 500 dishes. Each item includes the food name and food image.

Direct use: https://go.hyper.ai/yqco5

For more public datasets, please visit:

https://hyper.ai/datasets

Selected Public Tutorials

1. ComfyUI DynamiCrafter Tutorial | Shaking up AI video generation! Convert images to videos in minutes and fine-tune the details

DynamiCrafter, a model released by the Chinese University of Hong Kong, Tencent AI Lab, and others, uses video diffusion to simulate real-world motion patterns and, guided by text instructions, turns still images into dynamic videos. This tutorial provides a ready-built ComfyUI workflow environment, so there is no need to worry about node connection errors. Just upload an image and enter a text prompt to get started!

Run online: https://go.hyper.ai/PWzJR

2. Don’t wait! Come and experience GLM-4-9B-Chat Demo

This week, Zhipu AI released GLM-4-9B, the latest open-source release of its GLM-4 large base model and the first with multimodal capabilities. To let everyone try this open-source model, which claims to "surpass Llama3-8B", as soon as possible, HyperAI has launched the "GLM-4-9B-Chat Demo" tutorial. Just click Clone to start using GLM-4-9B-Chat right away, with no commands to enter.

Run online: https://go.hyper.ai/hc5OK

Community Articles

1. Guiding directed protein evolution without experimental data: a Shanghai Jiao Tong University research group publishes ProtLGN, a microenvironment-aware graph neural network

Hong Liang's research group at Shanghai Jiao Tong University proposed ProtLGN, a microenvironment-aware graph neural network that learns and predicts beneficial amino acid mutation sites from the three-dimensional structure of proteins. It can guide the design of single-site and multi-site mutations for proteins with different functions, and more than 40% of the single-site mutants designed by ProtLGN outperform their wild-type counterparts. The relevant results have been published in "JCM".

View the full report: https://go.hyper.ai/6FkFu

2. Reshaping the performance boundaries of lithium batteries, Kang Jianqiang's team from Wuhan University of Technology proposed a simplified electrochemical model based on ensemble learning

Kang Jianqiang's team from Wuhan University of Technology proposed a simplified electrochemical model that combines an ensemble learning model (ELM) with FIE. The ELM accurately predicts the lithium-ion concentration in the solid electrode and achieves more accurate voltage prediction than a single model, at a computational complexity much lower than that of the P2D model. FIE accurately predicts the lithium-ion concentration in the electrolyte near the positive and negative current collectors.

View the full report: https://go.hyper.ai/CWvce

3. Microelectronics is accelerating towards the post-Moore era! Mei Yongfeng's research group at Fudan University integrates DNN and nanofilm technology to accurately analyze the angle of incident light

Professor Mei Yongfeng's research group at the Department of Materials Science at Fudan University proposed a multi-level quasi-static finite element analysis method. Using it, the group designed and built six types of three-dimensional microstructures assembled from silicon/chromium nanofilms, along with the corresponding three-dimensional photodetectors, demonstrating the versatility and industrial practicality of the technology. The relevant results have been published in "Nature".

View the full report: https://go.hyper.ai/2s73Q

Popular Encyclopedia Articles

1. Nuclear Norm

2. Masked Language Modeling (MLM)

3. Long Short-Term Memory (LSTM)

4. YOLOv10 Real-time End-to-End Object Detection

5. Kolmogorov-Arnold Networks

We have compiled hundreds of AI-related terms to help you understand "artificial intelligence":

https://hyper.ai/wiki

Bilibili Livestream Preview

Apple will hold WWDC24 on June 11 (next Tuesday), Beijing time. HyperAI will stream it live on its video account and on Bilibili. Scan the QR code to reserve a spot for the livestream.

To help you learn more about Apple, HyperAI's Bilibili live room will also run a continuous "Apple Special" video series, including past WWDC conferences, executive interviews, related documentaries, and more.

The table below previews the content selected by the editor:

Date	Time	Content
Monday, June 10	18:00	Steve Jobs
Tuesday, June 11	1:00	Apple WWDC24
Wednesday, June 12	18:00	What makes Apple
Thursday, June 13	18:00	iPhone first release
Friday, June 14	18:00	History of Steve Jobs
Saturday, June 15	18:00	How Apple survived nearly bankruptcy
Sunday, June 16	18:00	Tim Cook's History

HyperAI TV streams live 24/7. Click to get your "electronic pickles" for the AI field:

http://live.bilibili.com/26483094

Top Conferences with Deadlines in June and July

One-stop tracking of top AI academic conferences: https://hyper.ai/events

That's all for this week's editor's selection. If you have resources you would like to see included on the hyper.ai official website, feel free to leave a message or submit an article to let us know!

See you next week!

About HyperAI

HyperAI (hyper.ai) is the leading artificial intelligence and high-performance computing community in China. We are committed to becoming the infrastructure for data science in China and to providing rich, high-quality public resources for domestic developers. So far, we have:

* Provided domestic accelerated download nodes for 1200+ public datasets

* Included 300+ classic and popular online tutorials

* Interpreted 100+ AI4Science paper cases

* Supported search for 500+ related terms

* Hosted the first complete Chinese documentation for Apache TVM in China

Visit the official website to start your learning journey:

https://hyper.ai
