Overview
jina-embeddings-v5-text-small is a 0.6B-parameter multilingual text embedding model built on the Qwen3-0.6B-Base backbone. It produces 1024-dimensional embeddings via last-token pooling and supports context lengths up to 32K tokens through rotary positional embeddings (RoPE) with adjusted base frequencies. The model includes four task-specific LoRA adapters for retrieval, semantic similarity, clustering, and classification, each trained independently on top of frozen backbone weights. Matryoshka Representation Learning enables embedding truncation to dimensions as low as 32. Training proceeds in two stages: embedding distillation from the larger Qwen3-Embedding-4B teacher model, followed by task-specific adapter training with a specialized loss function for each task category. The model supports asymmetric retrieval with 'Query:' and 'Document:' prefixes.
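The two core embedding mechanics named above, last-token pooling and Matryoshka truncation, can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the model's actual implementation; the toy dimensions (seq_len 4, dim 8) stand in for the real 32K context and 1024-dimensional output.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of the last non-padding token in each sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of the final real token
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

def truncate_matryoshka(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-renormalize (MRL-style truncation)."""
    trunc = emb[:, :dim]
    return trunc / np.linalg.norm(trunc, axis=1, keepdims=True)

# Toy batch: 2 sequences, seq_len 4, full dim 8 (the model uses 1024).
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # first sequence ends in a pad token
pooled = last_token_pool(h, mask)              # (2, 8)
small = truncate_matryoshka(pooled, 4)         # (2, 4), unit-norm rows
```

Because MRL trains the leading components to carry the most information, the truncated vectors remain usable after renormalization.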
Methods
Training proceeds in two stages. In the first stage, embedding distillation transfers knowledge from Qwen3-Embedding-4B (a 4B parameter teacher model) to the Qwen3-0.6B-Base student model using a cosine distance loss between projected student embeddings and teacher embeddings. A linear projection layer maps the student's 1024-dimensional space into the teacher's higher-dimensional space. General-purpose distillation uses over 300 datasets in 30+ languages for 50,000 steps, followed by long-context training on synthetic and natural long documents (1,000-4,096 tokens) with adjusted RoPE parameters. In the second stage, four LoRA adapters are trained on frozen backbone weights: the retrieval adapter combines InfoNCE contrastive loss with hard negatives, continued distillation loss, and a Global Orthogonal Regularizer (GOR) for quantization robustness; the text-matching adapter uses CoSENT ranking loss for graded similarity with distillation on unscored pairs; the clustering adapter uses re-distillation with a clustering-specific teacher instruction; and the classification adapter uses bidirectional InfoNCE loss with relational knowledge distillation regularization. Final retrieval adapter weights are averaged across checkpoints.
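The stage-one objective described above, a cosine distance between linearly projected student embeddings and teacher embeddings, can be sketched as follows. The teacher dimension of 2560 is an assumption for illustration, and the projection here is a fixed random matrix rather than a learned layer.

```python
import numpy as np

def cosine_distance_distill_loss(student: np.ndarray, teacher: np.ndarray,
                                 W: np.ndarray) -> float:
    """Mean cosine distance between projected student and teacher embeddings.

    student: (batch, 1024); teacher: (batch, d_teacher); W: (1024, d_teacher).
    """
    proj = student @ W  # linear projection into the teacher's embedding space
    proj_n = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    teach_n = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_sim = (proj_n * teach_n).sum(axis=1)
    return float((1.0 - cos_sim).mean())  # 0 when directions match exactly

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 1024))       # student embeddings
W = rng.normal(size=(1024, 2560))    # projection (learned in practice)
t_aligned = s @ W                    # teacher perfectly aligned with the projection
loss_zero = cosine_distance_distill_loss(s, t_aligned, W)
loss_rand = cosine_distance_distill_loss(s, rng.normal(size=(4, 2560)), W)
```

In training, gradients of this loss would update both the student backbone and the projection layer; the loss approaches 0 as projected student directions match the teacher's.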
Performance
On MMTEB (multilingual), jina-embeddings-v5-text-small achieves 67.0 average (task-level) and 58.9 average (type-level), the highest among all models under 1B parameters. It scores 71.3 on classification, 53.4 on clustering, 82.9 on pair classification, 65.7 on reranking, 64.9 on retrieval, and 78.9 on STS. On English MTEB, it achieves 71.7 average, outperforming Qwen3-0.6B with instructions (70.5) and jina-embeddings-v3 (65.7). On retrieval-specific benchmarks, it scores 64.88 on MTEB-M retrieval, 66.84 on RTEB, 56.67 on BEIR, and 66.39 on LongEmbed. The model surpasses its teacher Qwen3-Embedding-4B on pair classification (42.0 vs 26.8 on MMTEB) while remaining competitive across all other categories despite having fewer than one-sixth the parameters.
Best Practice
Select the appropriate LoRA adapter for your task: 'retrieval' for asymmetric query-document search (prepend 'Query:' to queries and 'Document:' to passages), 'text-matching' for symmetric similarity tasks such as duplicate detection and paraphrase identification (uses the 'Document:' prefix for both inputs), 'clustering' for grouping related documents, and 'classification' for categorization and sentiment analysis. For retrieval tasks, always use the correct prefix, since the model is trained with asymmetric encoding. Matryoshka truncation allows reducing embeddings from 1024 to as few as 32 dimensions; performance remains strong above 256 dimensions but degrades noticeably below that threshold, consistent with Johnson-Lindenstrauss limits. Binary quantization is supported with minimal performance loss thanks to GOR regularization. The 32K context window handles long documents natively, and the model was additionally trained on long-context data for robust long-document retrieval. Use cosine similarity for embedding comparison. The model is available via the Jina AI API, Hugging Face (with Sentence Transformers and vLLM integration), and as quantized variants for llama.cpp.
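The retrieval workflow above (prefixing, Matryoshka truncation, cosine scoring) can be sketched end to end. The `encode` function below is a deterministic hash-seeded stub standing in for the real model, since the actual inference call (e.g. via Sentence Transformers) depends on adapter-selection arguments not specified here; only the prefixing and scoring logic reflect the documented usage.

```python
import zlib
import numpy as np

def add_prefix(texts: list[str], kind: str) -> list[str]:
    """Prepend the asymmetric-retrieval prefix ('Query' or 'Document')."""
    return [f"{kind}: {t}" for t in texts]

def encode(texts: list[str], dim: int = 1024) -> np.ndarray:
    """Stub embedder (hash-seeded Gaussian) -- NOT the real model."""
    return np.stack([
        np.random.default_rng(zlib.crc32(t.encode("utf-8"))).normal(size=dim)
        for t in texts
    ])

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

queries = add_prefix(["what is Matryoshka truncation?"], "Query")
docs = add_prefix(["MRL lets embeddings be truncated.", "Unrelated text."], "Document")

# Truncate to 256 dims -- the threshold above which performance stays strong.
q_emb = encode(queries)[:, :256]
d_emb = encode(docs)[:, :256]
scores = cosine_sim(q_emb, d_emb)  # shape (1, 2): one row per query
```

With the real model, the same pipeline applies: encode with the retrieval adapter, optionally truncate, then rank documents by cosine similarity.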