ByteDance Open-Sources Lance, a 3B Model Encompassing Understanding, Generation, and Editing; the National University of Singapore Proposes the ViMU Dataset, Covering 588 Videos and Non-Verbal Question Answering

2 months ago

Information

Artificial Intelligence

Lance, released by ByteDance in 2026, is a native unified multimodal model. With a 3B active-parameter design, it performs image and video understanding, generation, and editing within a single framework. The model shares capabilities across text, image, and video tasks through a unified multimodal representation and joint multi-task training. At its core are a dual-stream mixture-of-experts (MoE) architecture and modality-aware rotary position encoding (MaPE), which together enable unified context learning over shared interleaved multimodal sequences while decoupling the capability paths for understanding and generation. Combined with a staged multi-task training strategy, Lance significantly surpasses existing open-source unified models in image and video generation quality while retaining strong multimodal semantic understanding.

The HyperAI website now features "Lance: Unifying Multimodal Understanding, Generation, and Editing Models," so come and try it out!

Online use: https://go.hyper.ai/Okkmw

Visit our official website for more information:

https://hyper.ai

A quick overview of hyper.ai's official website updates from May 23rd to May 29th:

* High-quality public datasets: 3

* Selected high-quality tutorials: 3

* Community article interpretations: 3

* Popular encyclopedia entries: 5

Visit the official website: hyper.ai

Selected Public Datasets

1. ViMU Video Metaphor Understanding Dataset

ViMU is a benchmark dataset for video metaphor understanding released by the National University of Singapore in 2026. It is designed to evaluate how well multimodal large models understand the deeper semantic meaning of metaphors in video.

Online use: https://go.hyper.ai/0DIpe

2. Rice Leaf Diseases Dataset

Rice Leaf Disease Detection is a rice leaf image dataset designed for object detection tasks in precision agriculture. It is widely used in applications such as YOLO model training, agricultural disease detection, edge vision deployment, and smart rice cultivation management. The dataset contains 8,665 rice leaf images in 9 categories: healthy rice leaves plus 8 common diseases and damage types (bacterial leaf blight, brown spot, rice leaf roller damage, rice blast, leaf scorch, leaf smut, narrow brown spot, and neck blast).

Online use: https://go.hyper.ai/IXOlY
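As a rough illustration of how the nine categories listed above might be wired into a detection pipeline, the sketch below builds a class-index map and parses one annotation line in the common YOLO text format (`class x_center y_center width height`, with normalized coordinates). The identifier spellings, the index order, and the assumption that the dataset ships labels in this format are ours and are not confirmed by the dataset description.

```python
# Class names come from the dataset description above; the snake_case
# spellings and the index order are assumptions made for illustration.
RICE_LEAF_CLASSES = [
    "healthy",
    "bacterial_leaf_blight",
    "brown_spot",
    "rice_leaf_roller_damage",
    "rice_blast",
    "leaf_scorch",
    "leaf_smut",
    "narrow_brown_spot",
    "neck_blast",
]


def build_class_index(names):
    """Map each class name to an integer id, rejecting duplicates."""
    if len(set(names)) != len(names):
        raise ValueError("duplicate class name")
    return {name: idx for idx, name in enumerate(names)}


def parse_label_line(line, num_classes):
    """Parse one 'class x_center y_center width height' annotation line.

    Coordinates are expected to be normalized to the range [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    box = tuple(float(value) for value in parts[1:])
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class id {class_id} out of range")
    if not all(0.0 <= value <= 1.0 for value in box):
        raise ValueError("box coordinates must be normalized to [0, 1]")
    return class_id, box


if __name__ == "__main__":
    index = build_class_index(RICE_LEAF_CLASSES)
    class_id, box = parse_label_line("4 0.5 0.5 0.2 0.3", len(index))
    print(RICE_LEAF_CLASSES[class_id], box)
```

Validating class ids and coordinate ranges at load time is a cheap way to catch mislabeled files before training starts.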

3. MRI Brain Neurodegenerative Diseases Dataset

MRI Brain Neurodegenerative Diseases is an MRI dataset designed for research on, and medical image analysis of, neurodegenerative diseases of the brain. It is widely used in research areas such as disease classification, medical image recognition, and deep learning model training. The dataset contains 2,846 brain MRI images with a resolution of 512 × 512, organized by two imaging weightings and four main categories.

Online use: https://go.hyper.ai/VpFoh

Selected Public Tutorials

1. Lance: A unified model for multimodal understanding, generation, and editing

Lance, released by ByteDance in 2026, is a 3B-scale native unified multimodal model designed for tasks such as image understanding, video understanding, text-to-image generation, text-to-video generation, image editing, and video editing. Its key feature is that understanding, generation, and editing are handled within the same model framework, so text, image, and video tasks share a unified multimodal representation. It can generate images or videos from text, perform visual editing by combining input images or videos with text instructions, and carry out question answering, description, and reasoning on images and videos.

Run online: https://go.hyper.ai/Okkmw

2. HY-World-2.0 World Model

HY-World-2.0 is a multimodal world model framework launched by Tencent in 2026. Unlike world models that only generate pixel videos (such as Genie 3 and Cosmos), HY-World-2.0 directly generates realistic 3D assets (mesh/3DGS) that are editable and persistent and can be imported directly into 3D tools and game engines such as Blender, Unity, and Unreal Engine.

Run online: https://go.hyper.ai/ZQpHM

3. AutoFigure: An LLM-based automatic figure generation system for academic papers

AutoFigure is an intelligent academic illustration generation system developed by the ResearchAI team at Westlake University and published at ICLR 2026. The system uses a large language model (LLM) with an iterative optimization mechanism to automatically generate high-quality, publication-ready scientific illustrations from text descriptions or research papers. It supports two output formats: SVG vector graphics and mxGraph XML (fully compatible with draw.io).

Run online: https://go.hyper.ai/ZrWS4

Community Article Interpretation

1. Argonne National Laboratory proposes CVEvolve, a zero-code framework that autonomously discovers scientific image processing algorithms, with full-stack capabilities including coding, result self-checking, and strategy optimization.

A research team at Argonne National Laboratory (ANL) in the United States has developed a zero-code autonomous agent framework called CVEvolve after systematically analyzing prior work on AI-based automation. The framework is designed to discover the algorithms needed for scientific data processing. It is highly versatile and requires no predefined problem architecture or fixed process templates. It links code, data, evaluation metrics, retrieval records, and visualization results in a closed loop, supporting the development of executable algorithms for computer vision, image processing, and other fields.

View the full report: https://go.hyper.ai/UBS5q

2. In just 30 minutes, the biology multi-agent system Robin integrated 550 research papers, establishing an autonomous research loop and identifying candidate therapies for dry age-related macular degeneration (dAMD).

A joint team from FutureHouse in San Francisco, the University of Oxford, and Fordham University has proposed Robin, a multi-agent system for biology. It is the first biomedical intelligent system to integrate scientific hypothesis generation and experimental data analysis at the same time, achieving a continuous closed-loop workflow.

View the full report: https://go.hyper.ai/KnYpQ

3. A Bayesian optimization framework for the inverse design of gallium-containing materials autonomously generates new candidates, and the optimized results are both unique and novel.

A research team led by Flinders University, in collaboration with Khalifa University in the UAE, has proposed a machine learning-guided Bayesian optimization (BO) framework that enables the inverse design of gallium-based compositions with target electronic properties while preserving chemical plausibility. Post-optimization analysis shows that the generated materials are 100% unique and novel relative to the training data, and that SMACT validity improves significantly within the 1.5–2.5 eV bandgap range.

View the full report: https://go.hyper.ai/kXS7f
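To make the "uniqueness" and "novelty" figures in the gallium item above more concrete, here is a minimal sketch of how such metrics are commonly computed for generative design. It assumes uniqueness is the fraction of generated samples that are distinct and novelty is the fraction of distinct generated samples absent from the training set; the paper may define them differently. The composition strings are toy inputs chosen for illustration, not data from the study, and `in_bandgap_window` is a hypothetical helper for the 1.5–2.5 eV range mentioned in the text.

```python
def uniqueness(generated):
    """Fraction of generated samples that are distinct from one another."""
    if not generated:
        raise ValueError("generated must not be empty")
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated samples that do not appear in training."""
    distinct = set(generated)
    if not distinct:
        raise ValueError("generated must not be empty")
    return len(distinct - set(training)) / len(distinct)


def in_bandgap_window(gap_ev, low=1.5, high=2.5):
    """Check whether a bandgap (in eV) falls inside the target window."""
    return low <= gap_ev <= high


if __name__ == "__main__":
    # Toy composition strings, for illustration only.
    training = ["GaAs", "GaN", "GaP"]
    generated = ["GaSb", "Ga2O3", "GaSb", "GaN"]

    print(f"uniqueness = {uniqueness(generated):.2f}")
    print(f"novelty    = {novelty(generated, training):.2f}")
    print(f"2.0 eV in window: {in_bandgap_window(2.0)}")
```

Under these definitions, a reported value of 100% for both metrics would mean that no generated composition repeats and none coincides with a training example.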
