Featured
Press release
September 18, 2024

Jina Embeddings v3: A Frontier Multilingual Embedding Model

jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token context length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
Jina AI
jinaai/jina-embeddings-v3 · Hugging Face
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters, achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.
arXiv.org · Saba Sturua

Today, we are excited to announce jina-embeddings-v3, a frontier text embedding model with 570 million parameters. It achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting inputs of up to 8192 tokens. The model features task-specific Low-Rank Adaptation (LoRA) adapters, enabling it to generate high-quality embeddings for various tasks including query-document retrieval, clustering, classification, and text matching.

In evaluations on the MTEB English, MTEB Multilingual, and LongEmbed benchmarks, jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while also surpassing multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can truncate embeddings to as few as 32 dimensions without sacrificing performance, thanks to the integration of Matryoshka Representation Learning (MRL).

The performance of jina-embeddings-v3 vs other embedding models across all MTEB English tasks. Full evaluation results per task can be found in our arXiv paper.
The performance of jina-embeddings-v3 has been evaluated across a broad selection of multilingual and cross-lingual MTEB tasks. Please note that jina-embeddings-v2-(zh/es/de) refers to our bilingual model suite, which was only tested on Chinese, Spanish, and German monolingual and cross-lingual tasks, excluding all other languages. Additionally, we do not report scores for openai-text-embedding-3-large and cohere-embed-multilingual-v3.0, as these models were not evaluated on the full range of multilingual and cross-lingual MTEB tasks.
The performance of jina-embeddings-v3 on six long-document retrieval tasks from the LongEmbed benchmark shows a significant improvement over other models. Scores are nDCG@10; higher is better. This suggests the effectiveness of our RoPE-based positional embeddings, which outperform both the fixed positional embeddings used by baai-bge-m3 and the ALiBi-based approach used in jina-embeddings-v2.
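For readers unfamiliar with the metric, nDCG@10 can be sketched as follows. This is a minimal illustration, not the LongEmbed evaluation code: it assumes linear gain, a log2 rank discount, and that the input list already contains every relevant document, so the ideal ordering can be derived by sorting it. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (log2 rank discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance of each returned result, in ranked order.

    Simplification: the ideal ranking is obtained by sorting this same list,
    which assumes no relevant document is missing from it.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0, and the score drops as relevant documents slide down the list.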

As of its release on September 18, 2024, jina-embeddings-v3 is the best multilingual model and ranks 2nd on the MTEB English leaderboard for models with fewer than 1 billion parameters. v3 supports 89 languages in total, including 30 languages with the best performance: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.

As of its release on September 18, 2024, jina-embeddings-v3, featuring 570 million parameters and 1024 output dimensions, stands as the most efficient, powerful, and reliable multilingual embedding model with fewer than 1 billion parameters.
Scaling law of embedding models. The average MTEB performance on English tasks is plotted against the number of model parameters. Each dot represents an embedding model. The trendline, representing all models, is highlighted, with multilingual models emphasized in cyan. One can see that jina-embeddings-v3 demonstrates superior performance compared to models of similar size, and a superlinear improvement over its predecessor, jina-embeddings-v2. This graph was created by selecting the top-100 embedding models from the MTEB leaderboard, excluding those without size information, typically closed-source or proprietary models. Submissions identified as obvious trolling were also filtered out.

LLM-based embeddings have recently gained attention. One example, e5-mistral-7b-instruct, has 7.1 billion parameters (12x larger) and an output dimension of 4096 (4x larger), yet offers only a 1% improvement on MTEB English tasks. By comparison, jina-embeddings-v3 is a far more cost-efficient solution, making it more suitable for production and on-edge computing.
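The size comparison can be made concrete with a little arithmetic. The sketch below uses only the parameter counts and output dimensions quoted above; the 10-million-vector index and float32 storage are illustrative assumptions, not figures from this post.

```python
JINA_V3 = {"params": 570e6, "dims": 1024}
E5_MISTRAL = {"params": 7.1e9, "dims": 4096}

BYTES_PER_VALUE = 4  # assumption: vectors stored as float32


def index_size_gb(num_vectors, dims, bytes_per_value=BYTES_PER_VALUE):
    """Raw storage for an uncompressed vector index, in gigabytes (1e9 bytes)."""
    return num_vectors * dims * bytes_per_value / 1e9


param_ratio = E5_MISTRAL["params"] / JINA_V3["params"]  # roughly 12x
dim_ratio = E5_MISTRAL["dims"] / JINA_V3["dims"]        # 4x

NUM_VECTORS = 10_000_000  # hypothetical corpus size
size_v3 = index_size_gb(NUM_VECTORS, JINA_V3["dims"])
size_e5 = index_size_gb(NUM_VECTORS, E5_MISTRAL["dims"])

print(f"parameters: {param_ratio:.1f}x, dimensions: {dim_ratio:.0f}x")
print(f"index size: {size_v3:.2f} GB vs {size_e5:.2f} GB")
```

Under these assumptions the larger output dimension alone quadruples raw index storage, before any difference in inference cost is considered.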

Model Architecture

| Feature | Description |
| --- | --- |
| Base | jina-XLM-RoBERTa |
| Parameters (base) | 559M |
| Parameters (with LoRA) | 572M |
| Max input tokens | 8192 |
| Max output dimensions | 1024 |
| Layers | 24 |
| Vocabulary | 250K |
| Supported languages | 89 |
| Attention | FlashAttention2, also works without it |
| Pooling | Mean pooling |

The architecture of jina-embeddings-v3 is shown in the figure below. To implement the backbone architecture, we adapted the XLM-RoBERTa model with several key modifications: (1) enabling effective encoding of long text sequences, (2) allowing task-specific encoding of embeddings, and (3) improving overall model efficiency with the latest techniques. We continue to use the original XLM-RoBERTa tokenizer. While jina-embeddings-v3, with its 570 million parameters, is larger than jina-embeddings-v2 at 137 million, it is still much smaller than embedding models fine-tuned from LLMs.

The architecture of jina-embeddings-v3 is based on the jina-XLM-RoBERTa model, with five LoRA adapters for four different tasks.

The key innovation in jina-embeddings-v3 is the use of LoRA adapters. Five task-specific LoRA adapters are introduced to optimize embeddings for four tasks. The model’s input consists of two parts: the text (the long document to be embedded) and the task. jina-embeddings-v3 supports four tasks and implements five adapters to choose from: retrieval.query and retrieval.passage for query and passage embeddings in asymmetric retrieval tasks, separation for clustering tasks, classification for classification tasks, and text-matching for tasks involving semantic similarity, such as STS or symmetric retrieval. The LoRA adapters account for less than 3% of the total parameters, adding very minimal overhead to the computation.
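To illustrate the idea of task-selected adapters, here is a toy sketch of a single linear layer with one low-rank update per task. It is not the model's implementation: the class name, layer sizes, rank, and initialization are hypothetical, and the usual LoRA scaling factor is omitted. The parameter share at the end uses the 559M and 572M figures from the table above.

```python
import random

TASKS = ["retrieval.query", "retrieval.passage", "separation",
         "classification", "text-matching"]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Shared frozen weight W plus one low-rank update B @ A per task."""

    def __init__(self, d_in, d_out, rank, tasks, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # A is random and B starts at zero, so each adapter is initially a no-op.
        self.adapters = {
            task: {
                "A": [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(d_out)],
            }
            for task in tasks
        }

    def forward(self, x, task):
        """Compute W x + B_task (A_task x) for the selected task."""
        adapter = self.adapters[task]
        base = matvec(self.W, x)
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        return [b + d for b, d in zip(base, delta)]


# Share of parameters held by the adapters, from the table's 559M and 572M.
adapter_share = (572e6 - 559e6) / 572e6  # about 2.3%
```

Because only the small A and B matrices differ between tasks, switching tasks changes the output without duplicating the backbone weights.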

To further enhance performance and reduce memory consumption, we integrate FlashAttention 2, support activation checkpointing, and use the DeepSpeed framework for efficient distributed training.

Get Started

Via Jina AI Search Foundation API

The easiest way to use jina-embeddings-v3 is to visit the Jina AI homepage and navigate to the Search Foundation API section. Starting today, this model is set as the default for all new users. You can explore different parameters and features directly from there.

curl https://api.jina.ai/v1/embeddings \
	 -H "Content-Type: application/json" \
	 -H "Authorization: Bearer jina_387ced4ff3f04305ac001d5d6577e184hKPgRPGo4yMp_3NIxVsW6XTZZWNL" \
	 -d '{
	"model": "jina-embeddings-v3",
	"task": "text-matching",
	"dimensions": 1024,
	"late_chunking": true,
	"input": [
		"Organic skincare for sensitive skin with aloe vera and chamomile: ...", 
		"Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille: Erleben Sie die wohltuende Wirkung...", 
		"Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla: Descubre el poder ...", 
		"针对敏感肌专门设计的天然有机护肤产品:体验由芦荟和洋甘菊提取物带来的自然呵护。我们的护肤产品特别为敏感肌设计,...", 
		"新しいメイクのトレンドは鮮やかな色と革新的な技術に焦点を当てています: 今シーズンのメイクアップトレンドは、大胆な色彩と革新的な技術に注目しています。..."
    ]}'
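The same request can be sketched in Python using only the standard library. The endpoint, model name, and body fields mirror the curl call above; the `JINA_API_KEY` environment variable, the helper names, and the assumed response shape (a `data` list whose items carry an `embedding` field) are assumptions made for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.jina.ai/v1/embeddings"


def build_payload(texts, task="text-matching", dimensions=1024, late_chunking=False):
    """Assemble the JSON body with the three parameters introduced in v3."""
    return {
        "model": "jina-embeddings-v3",
        "task": task,
        "dimensions": dimensions,
        "late_chunking": late_chunking,
        "input": list(texts),
    }


def embed(texts, api_key=None, **options):
    """POST the texts to the API and return one embedding per input."""
    # Assumption: the key is supplied via a JINA_API_KEY environment variable.
    api_key = api_key or os.environ["JINA_API_KEY"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(texts, **options)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response carries a "data" list with an "embedding" per item.
    return [item["embedding"] for item in body["data"]]
```

For example, `embed(["Organic skincare for sensitive skin"], task="text-matching", dimensions=256)` would send a single text and ask for a 256-dimensional vector.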

Compared to v2, v3 introduces three new parameters in the API: task, dimensions, and late_chunking.

Parameter task

The task parameter is crucial and must be set according to the downstream task. The resulting embeddings will be optimized for that specific task. For more details, refer to the table below.

| task value | Task description |
| --- | --- |
| retrieval.passage | Embedding documents in a query-document retrieval task |
| retrieval.query | Embedding queries in a query-document retrieval task |
| separation | Clustering documents, visualizing a corpus |
| classification | Text classification |
| text-matching (default) | Semantic text similarity, general symmetric retrieval, recommendation, finding similar items, deduplication |

Note that the API does not first generate a generic meta embedding and then adapt it with an additional fine-tuned MLP. Instead, it inserts the task-specific LoRA adapter into every transformer layer (a total of 24 layers) and performs the encoding in one shot. Further details can be found in our arXiv paper.
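For asymmetric retrieval, queries and documents are embedded with different task values and then compared. The sketch below builds the two request bodies and ranks passages by cosine similarity. The task names come from the table above; the helper names are hypothetical, and cosine similarity is assumed here as the scoring function rather than prescribed by the API.

```python
import math


def retrieval_payloads(query, passages, model="jina-embeddings-v3"):
    """Build one request body for the query and one for the passages."""
    query_payload = {"model": model, "task": "retrieval.query", "input": [query]}
    passage_payload = {"model": model, "task": "retrieval.passage",
                       "input": list(passages)}
    return query_payload, passage_payload


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Embedding both sides with the same task value would still return vectors, but they would not be the ones optimized for query-document retrieval.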

Parameter dimensions

The dimensions parameter lets users trade off storage efficiency against performance. Thanks to the MRL technique used in jina-embeddings-v3, you can reduce the dimensions of embeddings as much as you want (even down to a single dimension!). Smaller embeddings are more storage-friendly for vector databases, and their performance cost can be estimated from the figure below.

Figure: "Performance of Different Output Dimensions", a scatter plot of performance metrics across increasing MRL dimensions.
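With MRL-trained embeddings, a shorter vector is simply a prefix of the full one. Passing dimensions to the API is the direct route; the sketch below shows the equivalent client-side operation on a vector you already hold. The function name is hypothetical, and re-normalizing to unit length after truncation is an assumption that suits cosine or dot-product search.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale the result to unit length."""
    if not 1 <= dims <= len(vector):
        raise ValueError(f"dims must be between 1 and {len(vector)}")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    # A zero prefix cannot be normalized; return it unchanged.
    return [x / norm for x in head] if norm > 0 else head


full = [3.0, 4.0, 12.0]            # stand-in for a 1024-dimensional embedding
short = truncate_embedding(full, 2)
```

Storing a full-size vector once and truncating on demand lets you experiment with several index sizes without re-embedding the corpus.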

Parameter late_chunking

Late Chunking in Long-Context Embedding Models
Chunking long documents while preserving contextual information is challenging. We introduce the “Late Chunking” that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval applications.
GitHub

Finally, the late_chunking parameter controls whether to use the new chunking method we introduced last month for encoding a batch of sentences. When set to true, our API will concatenate all sentences in the input field and feed them as a single string to the model. In other words, we treat the sentences in the input as if they originally come from the same section, paragraph, or document. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list. Each embedding in the list is therefore conditioned on the previous embeddings.
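The pooling step of late chunking can be sketched as follows. This is a simplified illustration, assuming the model has already produced one contextualized vector per token of the concatenated input and that the token count of each original sentence is known. The function names are hypothetical, and details such as special tokens are ignored.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def late_chunk(token_embeddings, token_counts):
    """Pool token vectors of the concatenated input back into one vector per sentence.

    token_embeddings: contextualized vectors, one per token of the full input.
    token_counts: number of tokens contributed by each input sentence, in order.
    """
    if any(count <= 0 for count in token_counts):
        raise ValueError("every sentence must contribute at least one token")
    if sum(token_counts) != len(token_embeddings):
        raise ValueError("token counts do not match the number of token embeddings")
    chunks, start = [], 0
    for count in token_counts:
        chunks.append(mean_pool(token_embeddings[start:start + count]))
        start += count
    return chunks
```

The key difference from ordinary chunking is the order of operations: the whole input is encoded first and split afterwards, so each pooled vector reflects context from outside its own span.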

From a user perspective, setting late_chunking does not change the input or output format. You will only notice a change in the embedding values, as they are now computed based on the entire previous context rather than independently. What's important to know when using late_chunking=True is that the total number of tokens (by summing up all tokens in input) per request is restricted to 8192, which is the maximum context length allowed for jina-embeddings-v3. When late_chunking=False, there is no such restriction; the total number of tokens is only subject to the rate limit of the Embedding API.

Late Chunking On vs Off: The input and output format remains the same, with the only difference being the embedding values. When late_chunking is enabled, embeddings are influenced by the entire previous context in input, whereas without it, embeddings are computed independently.
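Because a late-chunking request is limited to 8192 tokens in total, a client may want to group its inputs before sending them. The sketch below packs texts greedily under that budget. The helper names are hypothetical, and the whitespace-based token count is only a crude stand-in: the model's XLM-RoBERTa tokenizer will give different (usually larger) counts, especially for languages written without spaces.

```python
MAX_CONTEXT_TOKENS = 8192  # per-request limit when late_chunking is enabled


def approx_token_count(text):
    """Rough placeholder for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def group_for_late_chunking(texts, count_tokens=approx_token_count,
                            budget=MAX_CONTEXT_TOKENS):
    """Greedily pack consecutive texts into groups that fit the token budget."""
    groups, current, used = [], [], 0
    for text in texts:
        tokens = count_tokens(text)
        if tokens > budget:
            raise ValueError("a single input exceeds the token budget")
        if current and used + tokens > budget:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += tokens
    if current:
        groups.append(current)
    return groups
```

Each group would be sent as its own request, which also means texts in different groups no longer share context, so it is worth splitting at natural document boundaries where possible.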

Via Azure & AWS

jina-embeddings-v3 is now available on AWS SageMaker and Azure Marketplace.

AWS Marketplace: Jina Embeddings v3
Microsoft Azure Marketplace

If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to contact us.

Via Vector Databases & Partners

We closely collaborate with vector database providers such as Pinecone, Qdrant, and Milvus, as well as LLM orchestration frameworks like LlamaIndex, Haystack, and Dify. At the time of release, we are pleased to announce that Pinecone, Qdrant, Milvus and Haystack have already integrated support for jina-embeddings-v3, including the three new parameters: task, dimensions, and late_chunking. Other partners that have already integrated with the v2 API should also support v3 by simply changing the model name to jina-embeddings-v3. However, they may not yet support the new parameters introduced in v3.

Via Pinecone

The vector database to build knowledgeable AI | Pinecone
Search through billions of items for similar matches to any object, in milliseconds. It’s the next generation of search, an API call away.
Pinecone Docs

Via Qdrant

Jina Embeddings - Qdrant
Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector similarity search service with convenient API.

Via Milvus

Integrate Milvus with Jina | Milvus Documentation
This guide demonstrates how to use Jina embeddings and Milvus to conduct similarity search and retrieval tasks. | v2.4.x

Via Haystack

Jina AI | Haystack
Use the latest Jina AI embedding models
Haystack · Authors: deepset

Conclusion

In October 2023, we released jina-embeddings-v2-base-en, the world’s first open-source embedding model with an 8K context length. It was the only text embedding model that supported long context and matched OpenAI's text-embedding-ada-002. Today, after a year of learning, experimentation, and valuable lessons, we are proud to release jina-embeddings-v3, a new frontier in text embedding models and a big milestone for our company.

With this release, we continue to excel in what we are known for: long-context embeddings, while also addressing the most requested feature from both the industry and the community—multilingual embeddings. At the same time, we push performance to a new high. With new features such as Task-specific LoRA, MRL, and late chunking, we believe jina-embeddings-v3 will truly serve as the foundational embedding model for various applications, including RAG, agents, and more. Compared to recent LLM-based embeddings like NV-embed-v1/v2, our model is highly parameter-efficient, making it much more suitable for production and edge devices.

Moving forward, we plan to focus on evaluating and improving jina-embeddings-v3 performance on low-resource languages and further analyzing systematic failures caused by limited data availability. Moreover, the model weights of jina-embeddings-v3, along with its innovative features and hot takes, will serve as the foundation for our upcoming models, including jina-clip-v2, jina-reranker-v3, and reader-lm-v2.
