5 months ago

Henrique Schechter Vera, Sahil Dua, Biao Zhang, Daniel Salz, Ryan Mullins, Sindhu Raghuram Panyam, Sara Smoot, Iftekhar Naim, Joe Zou, Feiyang Chen, and 78 more

Abstract

We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and geometric embedding distillation. We improve model robustness and expressiveness with a spread-out regularizer, and ensure generalizability by merging checkpoints from varied, optimized mixtures. Evaluated on the Massive Text Embedding Benchmark (MTEB) across multilingual, English, and code domains, EmbeddingGemma (300M) achieves state-of-the-art results. Notably, it outperforms prior top models, both proprietary and open, with fewer than 500M parameters, and provides performance comparable to models double its size, offering an exceptional performance-to-cost ratio. Remarkably, this lead persists when quantizing model weights or truncating embedding outputs. This makes EmbeddingGemma particularly well-suited for low-latency and high-throughput use cases such as on-device applications. We provide ablation studies exploring our key design choices. We release EmbeddingGemma to the community to promote further research.
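
The abstract notes that quality holds up when embedding outputs are truncated. As a rough illustration of what truncation means in practice, the sketch below keeps the first few components of a vector, re-normalizes it, and compares cosine similarity before and after. This is a generic sketch, not the authors' implementation: the helper names, the toy 8-dimensional vectors, and the choice of 4 retained dimensions are all assumptions made for illustration, and real EmbeddingGemma outputs are not used here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vectors standing in for model outputs.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.03]
doc = [0.8, 0.2, 0.25, 0.1, 0.01, 0.04, 0.02, 0.01]

full_score = cosine_similarity(query, doc)
short_score = cosine_similarity(
    truncate_embedding(query, 4), truncate_embedding(doc, 4)
)
print(f"full: {full_score:.4f}  truncated to 4 dims: {short_score:.4f}")
```

Re-normalizing after truncation keeps dot products interpretable as cosine similarities. Whether the shorter vectors preserve ranking quality depends on how the model was trained, which is what the paper's truncation results address.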

EmbeddingGemma: Powerful and Lightweight Text Representations
