Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

In this work, we introduce the Qwen3 Embedding series, a significant advancement over its predecessor, the GTE-Qwen series, in text embedding and reranking capabilities, built upon the Qwen3 foundation models. Leveraging the Qwen3 LLMs' robust capabilities in multilingual text understanding and generation, our innovative multi-stage training pipeline combines large-scale unsupervised pre-training with supervised fine-tuning on high-quality datasets. Effective model merging strategies further ensure the robustness and adaptability of the Qwen3 Embedding series. During the training process, the Qwen3 LLMs serve not only as backbone models but also play a crucial role in synthesizing high-quality, rich, and diverse training data across multiple domains and languages, thus enhancing the training pipeline. The Qwen3 Embedding series offers a spectrum of model sizes (0.6B, 4B, 8B) for both embedding and reranking tasks, addressing diverse deployment scenarios where users can optimize for either efficiency or effectiveness. Empirical evaluations demonstrate that the Qwen3 Embedding series achieves state-of-the-art results across diverse benchmarks. Notably, it excels on the multilingual evaluation benchmark MTEB for text embedding, as well as in various retrieval tasks, including code retrieval, cross-lingual retrieval, and multilingual retrieval. To facilitate reproducibility and promote community-driven research and development, the Qwen3 Embedding models are publicly available under the Apache 2.0 license.
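Since the abstract notes that the models are publicly released, the following is a minimal sketch of how one might use the embedding models for retrieval, assuming a Hugging Face model ID such as "Qwen/Qwen3-Embedding-0.6B" and the standard sentence-transformers API; the exact model IDs and any recommended query prompts are assumptions, not details confirmed by the abstract.

```python
# Minimal retrieval sketch with an assumed Qwen3 Embedding checkpoint.
from sentence_transformers import SentenceTransformer, util

# Hypothetical model ID for the smallest (0.6B) embedding model.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

queries = ["What is the capital of France?"]
documents = [
    "Paris is the capital and most populous city of France.",
    "The Great Wall of China stretches across northern China.",
]

# Encode queries and documents into dense vectors.
query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)

# Cosine similarity; with normalized vectors this reduces to a dot product.
scores = util.cos_sim(query_emb, doc_emb)
print(scores)  # higher score = more relevant document
```

A larger checkpoint (4B or 8B) could be swapped in where effectiveness matters more than latency, which is the efficiency/effectiveness trade-off the abstract describes.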