News
Models
Products
keyboard_arrow_down
Reader
Convert any URL to Markdown for better grounding LLMs.
Embeddings
World-class multimodal multilingual embeddings.
Reranker
World-class reranker for maximizing search relevancy.
DeepSearch
Search, read and reason until best answer found.
More
keyboard_arrow_down
Classifier
Zero-shot and few-shot classification for image and text.
Segmenter
Cut long text into chunks and do tokenization.

MCP Server
Add mcp.jina.ai as your MCP server to access our API in LLMs
open_in_new
API Docs
Auto codegen for your copilot IDE or LLM
open_in_new


Company
keyboard_arrow_down
About us
Contact sales
Intern program
Join us
open_in_new
Download logo
open_in_new
Terms & Conditions



Log in
login
Model Architecture
Getting Started
Conclusion
star
Featured
Press release
October 03, 2025

Jina Reranker v3: 0.6B Listwise Reranker for SOTA Multilingual Retrieval

New 0.6B-parameter listwise reranker that considers the query and all candidate documents in a single context window.
Light blue background with stylized text in the center, composed of small dots or squares, evoking a modern and minimalistic
Jina AI
Jina AI • 7 minutes read
jinaai/jina-reranker-v3 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
jina-reranker-v3: Last but Not Late Interaction for Document Reranking
jina-reranker-v3 is a 0.6B parameter multilingual document reranker that introduces a novel last but not late interaction. Unlike late interaction models such as ColBERT that perform separate encoding followed by multi-vector matching, our approach conducts causal self-attention between query and documents within the same context window, enabling rich cross-document interactions before extracting contextual embeddings from the last token of each document. This compact architecture achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being ten times smaller than generative listwise rerankers.
arXiv.orgFeng Wang

We're excited to release jina-reranker-v3, our latest-generation reranker that delivers state-of-the-art performance across multilingual retrieval benchmarks. This 0.6B-parameter document reranker introduces a novel last but not late interaction that takes a fundamentally different approach from existing methods. jina-reranker-v3 works listwise: it applies causal attention between the query and all candidate documents within a single context window, enabling rich cross-document interactions before extracting contextual embeddings from each document's final token. Our new model achieves 61.94 nDCG@10 on BEIR outperforming Qwen3-Reranker-4B while being 6× smaller in size.

Model Size BEIR MIRACL MKQA CoIR
jina-reranker-v3 0.6B 61.94 66.83 67.92 70.64
jina-reranker-v2 0.3B 57.06 63.65 67.90 56.14
jina-reranker-m0 2.4B 58.95 66.75 68.19 63.55
bge-reranker-v2-m3 0.6B 56.51 69.32 67.88 36.28
mxbai-rerank-base-v2 0.5B 58.40 55.32 64.24 65.71
mxbai-rerank-large-v2 1.5B 61.44 57.94 67.06 70.87
Qwen3-Reranker-0.6B 0.6B 56.28 57.70 65.34 65.18
Qwen3-Reranker-4B 4.0B 61.16 67.52 67.52 73.91
jina-code-embeddings-0.5b 0.5B - - - 73.94
Graph presenting evaluation results for BEIR retrieval tasks, comparing multiple retrieval methods labeled along the x-axis a
English retrieval performance on BEIR, measured by nDCG@10. All scores are our runs based on the top-100 results from jina-embeddings-v3 as the first stage retriever. We evaluate three variants of jina-reranker-v3: documents ordered by descending relevance scores, ascending scores, and random permutation. Evaluation shows that v3 maintains relatively stable performance across different input orderings, suggesting robust self-attention mechanisms that can effectively process documents regardless of their initial arrangement.
Visual chart showing evaluation results for MIRACL retrieval tasks, comparing datasets and metrics like NDCG@10 and NDCG@80 a
MIRACL evaluation across 18 diverse languages demonstrates jina-reranker-v3 cross-lingual consistency despite its compact architecture. The languages we evaluate include English, Chinese, Spanish, Arabic, French, Russian, German, Japanese, Indonesian, Hindi, Bengali, Korean, Swahili, Telugu, Thai, Persian/Farsi, Yoruba, and Finnish.
Graph displaying evaluation metrics for MKQA retrieval tasks, with Recall@10 on the vertical axis and models like jina-rerank
Multilingual retrieval performance on the MKQA, measured by Recall@10. The languages we evaluate include English, Chinese (Simplified), Spanish, Arabic, Portuguese, Russian, Japanese, German, French, Korean, Vietnamese, Italian, Turkish, Polish, Thai, Dutch, Malay, Chinese (Traditional), Swedish, Hebrew, Hungarian, Chinese (Hong Kong), Danish, Norwegian, Finnish, and Khmer.

tagModel Architecture

jina-reranker-v3 is built on the Qwen3-0.6B backbone, a decoder-only transformer model with causal self-attention. The model processes multiple documents and query simultaneously, extracting contextual embeddings at designated token positions for efficient similarity computation.

Diagram illustrating a hierarchical data structure with blocks labeled Query, Document 1, Document 2, and Document 3, showing
Architecture of jina-reranker-v3 showing the transformer backbone with special token positions for embedding extraction. The model processes multiple documents and query in one context window, extracting contextual embeddings at designated token positions for similarity computation.
Parameter Value
Total Parameters 0.6B
Non-Embedding Parameters 0.44B
Hidden Size 1,024
Number of Layers 28
Attention Heads (Q/KV) 16/8 (GQA)
Context Length 131,072
MLP Projector 1024→512→256
Final Embedding Size 256

Given a query and a set of candidate documents, jina-reranker-v3 processes the reranking task with a specialized prompt template that enables cross-document interactions within a single forward pass. The input construction follows a specific format:

<|im_start|>system
You are a search relevance expert who can determine
a ranking of passages based on their relevance to the query.
<|im_end|>

<|im_start|>user
I will provide you with k passages, each indicated by a numerical identifier.
Rank the passages based on their relevance to query: [QUERY]

<passage id="1">
[DOCUMENT_1]<|doc_emb|>
</passage>
<passage id="2">
[DOCUMENT_2]<|doc_emb|>
</passage>
...
<passage id="k">
[DOCUMENT_k]<|doc_emb|>
</passage>

<query>
[QUERY]<|query_emb|>
</query>
<|im_end|>

<|im_start|>assistant
<think></think>

Each document is wrapped in passage tags with sequential IDs, enabling clear document boundaries within the shared context window. The model processes up to 64 documents simultaneously within its 131K token context capacity. For larger document collections, processing occurs in batches while maintaining query consistency across batches.

The query appears twice in the input structure - once at the beginning for task instructions and once at the end for final attention processing. This dual placement enables the final query position to attend to all preceding documents through causal attention. Two critical special tokens mark embedding extraction positions: the <|doc_emb|> token is placed after each document to mark document embedding extraction points, while the <|query_emb|> token is placed after the final query to mark query embedding extraction point. These embeddings capture both local document semantics and global cross-document context through the shared causal self-attention mechanism.

We call this query-document interaction "last but not late." It's "last" because <|doc_emb|> is placed as the final token of each document. It's "not late" because, unlike late interaction models such as ColBERT that separately encode documents before multi-vector matching, we enable query-document and document-document interactions within the same context window during the forward pass.

Finally, a two-layer MLP projector with ReLU activation maps the 1024-dimensional hidden states to a 256-dimensional ranking space. Relevance scoring is computed using cosine similarity between the projected query embedding and each projected document embedding. This produces a relevance score for each document in the input set.

tagGetting Started

tagVia API

The easiest way to use jina-reranker-v3 is via our Search Foundation API.

curl -X POST \
  https://api.jina.ai/v1/rerank \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer JINA_API_KEY" \
  -d '{
  "model": "jina-reranker-v3",
  "query": "slm markdown",
  "documents": [
    ...
  ],
  "return_documents": false
}'
{
  "model":"jina-reranker-v3",
  "usage": {
    "total_tokens":2813
  },
  "results":[
    {
      "index":1,
      "relevance_score":0.9310624287463884
    },
    {
      "index":4,
      "relevance_score":0.8982678574191957
    },
    {
      "index":0,
      "relevance_score":0.890233167219021
    },
    ...
  ]
}

The relevance_score field indicates the relevance of each document to the query, with higher scores indicating greater relevance.

tagVia transformers

from transformers import AutoModel

model = AutoModel.from_pretrained(
    'jinaai/jina-reranker-v3',
    dtype="auto",
    trust_remote_code=True,
)
model.eval()

Now you can use the model's rerank function to compute relevance scores for a query and a list of documents:

query = "What are the health benefits of green tea?"
documents = [
    "Green tea contains antioxidants called catechins that may help reduce inflammation and protect cells from damage.",
    "El precio del café ha aumentado un 20% este año debido a problemas en la cadena de suministro.",
    "Studies show that drinking green tea regularly can improve brain function and boost metabolism.",
    "Basketball is one of the most popular sports in the United States.",
    "绿茶富含儿茶素等抗氧化剂,可以降低心脏病风险,还有助于控制体重。",
    "Le thé vert est riche en antioxydants et peut améliorer la fonction cérébrale.",
]

# Rerank documents
results = model.rerank(query, documents)

# Results are sorted by relevance score (highest first)
for result in results:
    print(f"Score: {result['relevance_score']:.4f}")
    print(f"Document: {result['document'][:100]}...")
    print()

tagConclusion

jina-reranker-v3 is a new 0.6B parameter multilingual listwise reranker that introduces last but not late interaction for efficient document reranking. Documents can attend to each other during encoding, establishing interaction that inform the final ranking.

One of the primary concerns is whether such interaction is resilient to input permutation—that is, if we shuffle the input order, will the rankings stay the same? We tested this with a query against 110 candidate documents using random permutation and plotted the variance at each rank position in the figure below.

Line graph titled "Rank Position Stability Across 1000 Random Permutations (110 documents)" showing stability variance peakin
The rank stability graph visualizes how consistently documents appear at specific rank positions across 1,000 random input permutations. The y-axis represents stability variance as a percentage, where 0% indicates perfect stability (the exact same document always appears at this rank) and 100% indicates maximum variance (nearly all documents appeared at this rank across permutations). The x-axis shows rank positions from 1 to 110.

The critical finding is that top-ranked positions show excellent stability. Ranks 1-10 exhibit minimal variance, with the most relevant documents consistently ranking at the top regardless of input order. This is crucial for nDCG@10 and similar top-k metrics. Irrelevant documents consistently stay at the bottom, creating clear separation between relevant and irrelevant content.

The middle section shows significant position swapping, which is expected and acceptable. The model uses causal self-attention and encodes different contextual information based on what appears before them in the sequence.

In practice, where we care about the top-most results, such behavior is completely acceptable. Our evaluation shows jina-reranker-v3 outperforming our earlier generations, including jina-reranker-v2-base-multilingual and jina-colbert-v2, as well as much larger alternatives such as Qwen3-Reranker-4B and jina-reranker-m0 further confirming this.

Categories:
star
Featured
Press release
rss_feed
Offices
location_on
Sunnyvale, CA
710 Lakeway Dr, Ste 200, Sunnyvale, CA 94085, USA
location_on
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Search Foundation
Reader
Embeddings
Reranker
DeepSearch
Classifier
Segmenter
Get Jina API key
Rate Limit
API Status
Company
About us
Contact sales
Newsroom
Intern program
Join us
open_in_new
Download logo
open_in_new
Terms
Security
Terms & Conditions
Privacy
Manage Cookies
email
Jina AI © 2020-2025.