5 months ago

Ahmed Nassar Andres Marafioti Matteo Omenetti Maksym Lysak Nikolaos Livathinos Christoph Auer Lucas Morin Rafael Teixeira de Lima Yusik Kim A. Said Gurbuz

Abstract

We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike existing approaches that rely on large foundational models, or ensemble solutions that rely on handcrafted pipelines of multiple specialized models, SmolDocling offers an end-to-end conversion for accurately capturing content, structure and spatial location of document elements in a 256M-parameter vision-language model. SmolDocling exhibits robust performance in correctly reproducing document features such as code listings, tables, equations, charts, lists, and more across a diverse range of document types including business documents, academic papers, technical reports, patents, and forms, significantly extending beyond the commonly observed focus on scientific papers. Additionally, we contribute novel publicly sourced datasets for chart, table, equation, and code recognition. Experimental results demonstrate that SmolDocling competes with other Vision Language Models that are up to 27 times larger in size, while reducing computational requirements substantially. The model is currently available; datasets will be publicly available soon.
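To put the size claim in concrete terms, the short sketch below works through the arithmetic implied by the abstract: a 256M-parameter model compared against models "up to 27 times larger". The parameter count and the ratio come from the abstract; the 2 bytes per parameter figure (16-bit weights) and the function names are assumptions made here for illustration only, and the result covers weight storage alone, not activations or other runtime memory.

```python
# Figures taken from the abstract.
SMOLDOCLING_PARAMS = 256_000_000   # "256M parameters"
MAX_SIZE_RATIO = 27                # "up to 27 times larger"

# Assumption (not from the abstract): weights stored in 16 bits each.
ASSUMED_BYTES_PER_PARAM = 2


def largest_compared_params(base_params: int, ratio: int) -> int:
    """Upper bound on the parameter count of the compared models."""
    return base_params * ratio


def weight_memory_bytes(params: int, bytes_per_param: int = ASSUMED_BYTES_PER_PARAM) -> int:
    """Rough storage needed for the weights alone."""
    return params * bytes_per_param


largest = largest_compared_params(SMOLDOCLING_PARAMS, MAX_SIZE_RATIO)

print(f"Largest compared model: about {largest / 1e9:.2f}B parameters")
print(f"SmolDocling weights: about {weight_memory_bytes(SMOLDOCLING_PARAMS) / 1e9:.3f} GB")
print(f"Largest compared weights: about {weight_memory_bytes(largest) / 1e9:.3f} GB")
```

Under these assumptions, the comparison set tops out at roughly 6.9 billion parameters, which is the scale the abstract's "27 times" figure implies.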


Document Understanding

Multimodal

Any-to-Any

Natural Language Processing

Multimodality

Task/Problem


SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
