Overview
jina-reranker-v3 is a 0.6B-parameter multilingual document reranker built around a novel "last but not late" interaction architecture. Unlike ColBERT-style late interaction, which encodes query and documents separately and matches multi-vector representations afterward, this model applies causal self-attention to the query and documents within the same context window, enabling rich cross-document interaction before contextual embeddings are extracted from the last token of each document. Built on Qwen3-0.6B with 28 transformer layers and a lightweight MLP projector (1024→512→256), it processes up to 64 documents simultaneously within a 131K-token context. The model achieves state-of-the-art BEIR performance with 61.94 nDCG@10 while being 10× smaller than generative listwise rerankers.
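The embedding-extraction step described above can be sketched in a few lines of NumPy. This is an illustrative assumption, not the released implementation: random tensors stand in for the hidden states a causal LM would produce, and the ReLU activation and projector weights are placeholders (only the 1024→512→256 shapes come from the description).

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 1024  # Qwen3-0.6B hidden size, per the overview
n_docs = 4

# Stand-ins for the causal LM's final hidden states at the positions
# marked by the doc_emb / query_emb special tokens (one per document,
# one for the query). In the real model these come from the LM itself.
doc_states = rng.standard_normal((n_docs, HIDDEN))
query_state = rng.standard_normal(HIDDEN)

# Lightweight MLP projector 1024 -> 512 -> 256 (random weights here).
W1 = rng.standard_normal((HIDDEN, 512)) / np.sqrt(HIDDEN)
W2 = rng.standard_normal((512, 256)) / np.sqrt(512)

def project(x):
    # ReLU between the two layers is an assumption for this sketch.
    return np.maximum(x @ W1, 0.0) @ W2

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

doc_emb = l2norm(project(doc_states))     # shape (n_docs, 256)
query_emb = l2norm(project(query_state))  # shape (256,)

# Rank documents by cosine similarity of the 256-dim embeddings.
scores = doc_emb @ query_emb
ranking = np.argsort(-scores)  # best document first
```

The key contrast with late interaction is that `doc_states` here are contextualized against the query and all other documents before projection, so the final dot product compares embeddings that have already "seen" the whole candidate set.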
Methods
Employs three-stage progressive training with a multi-objective loss combining InfoNCE, dispersive loss (weight 0.45), dual matching loss (0.85), and similarity loss (0.85). Stage 1 uses LoRA fine-tuning (r=16, α=32) on domain-specific datasets including BGE-M3 and Cornstack, with 16 documents per query. Stage 2 extends the context to 8,192 tokens and mines hard negatives across retrieval systems, with up to 25 negatives at temperature τ=0.05. Stage 3 merges specialized models with weights ranging from 0.25 to 0.65. Special tokens doc_emb and query_emb mark the positions from which embeddings are extracted. Training uses structured prompts with system/user/assistant roles, placing the query at both the beginning and the end so that, under causal masking, documents attend to the query and the query embedding attends to all documents.
Performance
Achieves 61.94 nDCG@10 on BEIR, the highest among all evaluated rerankers and a 4.88% improvement over jina-reranker-v2. Excels in multi-hop retrieval with 78.56 on HotpotQA and in fact verification with 93.95 on FEVER. Multilingual performance reaches 66.50 on MIRACL across 18 languages, with Arabic at 78.69 and Thai at 81.06. Code retrieval achieves 63.28 on CoIR. Outperforms the 1.5B mxbai-rerank-large (61.44) with 2.5× fewer parameters, and shows a 5.43% improvement over the same-scale bge-reranker-v2-m3. Remains relatively stable across document orderings: random (62.54), descending (61.94), ascending (61.52).
Best Practice
Use the structured prompt template with system/user/assistant roles and the special tokens that mark embedding extraction positions. For collections exceeding the 131K-token context, batch documents at up to 64 per forward pass and include the query in every batch so its embedding is computed consistently. Order documents randomly or by descending relevance, which the model handles best. Leverage the cross-document interaction capability for comparative ranking tasks. For multilingual applications, the model provides strong zero-shot transfer across 18 languages. The 256-dimensional output embeddings keep similarity computation efficient. The model is well suited to applications requiring both ranking quality and inference efficiency, particularly multi-hop reasoning and fact verification.
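The batching advice above can be sketched as a thin wrapper. `score_batch` is a hypothetical callable standing in for one forward pass of the reranker; the wrapper itself only handles chunking and merging:

```python
def rerank_large(query, docs, score_batch, batch_size=64):
    """Rerank a document set larger than one forward pass can hold.

    score_batch(query, chunk) -> list of floats, one score per doc.
    The query is re-sent with every chunk, so each chunk's scores are
    computed with the query present in the same context window.
    """
    scores = []
    for i in range(0, len(docs), batch_size):
        scores.extend(score_batch(query, docs[i:i + batch_size]))
    order = sorted(range(len(docs)), key=lambda j: -scores[j])
    return [(docs[j], scores[j]) for j in order]

# Toy scorer for illustration only: word overlap with the query.
def toy_scorer(query, chunk):
    q = set(query.lower().split())
    return [float(len(q & set(d.lower().split()))) for d in chunk]

ranked = rerank_large(
    "jina reranker", ["a jina reranker model", "other text", "a reranker"],
    toy_scorer, batch_size=2,
)
```

One caveat this sketch makes visible: scores from different chunks are compared directly, so cross-document interaction only happens within a chunk; ordering candidates sensibly before chunking (randomly or by descending first-stage relevance, per the guidance above) keeps strong candidates together.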