Overview
jina-embeddings-v5-text-small is a 0.6B-parameter multilingual text embedding model built on the Qwen3-0.6B-Base backbone. It produces 1024-dimensional embeddings via last-token pooling and supports context lengths up to 32K tokens through rotary positional embeddings (RoPE) with adjusted base frequencies. The model includes four task-specific LoRA adapters for retrieval, semantic similarity, clustering, and classification, each trained independently on top of frozen backbone weights. Matryoshka Representation Learning enables embedding truncation to dimensions as low as 32. Training proceeds in two stages: embedding distillation from the larger Qwen3-Embedding-4B teacher model, followed by task-specific adapter training with a specialized loss function for each task category. The model supports asymmetric retrieval with 'Query:' and 'Document:' prefixes.
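The two core embedding mechanics named above, last-token pooling and Matryoshka truncation, can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the model's actual implementation; the toy dimensions (seq_len 4, dim 8) stand in for the real 32K context and 1024-dimensional output.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of the last non-padding token in each sequence.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    last_idx = attention_mask.sum(axis=1) - 1  # index of the final real token
    return hidden_states[np.arange(hidden_states.shape[0]), last_idx]

def truncate_matryoshka(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-renormalize (MRL-style truncation)."""
    trunc = emb[:, :dim]
    return trunc / np.linalg.norm(trunc, axis=1, keepdims=True)

# Toy batch: 2 sequences, seq_len 4, full dim 8 (the model uses 1024).
rng = np.random.default_rng(0)
h = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # first sequence ends in a pad token
pooled = last_token_pool(h, mask)              # (2, 8)
small = truncate_matryoshka(pooled, 4)         # (2, 4), unit-norm rows
```

Because MRL trains the leading components to carry the most information, the truncated vectors remain usable after renormalization.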
Methods
Training proceeds in two stages. In the first stage, embedding distillation transfers knowledge from Qwen3-Embedding-4B (a 4B parameter teacher model) to the Qwen3-0.6B-Base student model using a cosine distance loss between projected student embeddings and teacher embeddings. A linear projection layer maps the student's 1024-dimensional space into the teacher's higher-dimensional space. General-purpose distillation uses over 300 datasets in 30+ languages for 50,000 steps, followed by long-context training on synthetic and natural long documents (1,000-4,096 tokens) with adjusted RoPE parameters. In the second stage, four LoRA adapters are trained on frozen backbone weights: the retrieval adapter combines InfoNCE contrastive loss with hard negatives, continued distillation loss, and a Global Orthogonal Regularizer (GOR) for quantization robustness; the text-matching adapter uses CoSENT ranking loss for graded similarity with distillation on unscored pairs; the clustering adapter uses re-distillation with a clustering-specific teacher instruction; and the classification adapter uses bidirectional InfoNCE loss with relational knowledge distillation regularization. Final retrieval adapter weights are averaged across checkpoints.
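The stage-one objective described above, a cosine distance between linearly projected student embeddings and teacher embeddings, can be sketched as follows. The teacher dimension of 2560 is an assumption for illustration, and the projection here is a fixed random matrix rather than a learned layer.

```python
import numpy as np

def cosine_distance_distill_loss(student: np.ndarray, teacher: np.ndarray,
                                 W: np.ndarray) -> float:
    """Mean cosine distance between projected student and teacher embeddings.

    student: (batch, 1024); teacher: (batch, d_teacher); W: (1024, d_teacher).
    """
    proj = student @ W  # linear projection into the teacher's embedding space
    proj_n = proj / np.linalg.norm(proj, axis=1, keepdims=True)
    teach_n = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    cos_sim = (proj_n * teach_n).sum(axis=1)
    return float((1.0 - cos_sim).mean())  # 0 when directions match exactly

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 1024))       # student embeddings
W = rng.normal(size=(1024, 2560))    # projection (learned in practice)
t_aligned = s @ W                    # teacher perfectly aligned with the projection
loss_zero = cosine_distance_distill_loss(s, t_aligned, W)
loss_rand = cosine_distance_distill_loss(s, rng.normal(size=(4, 2560)), W)
```

In training, gradients of this loss would update both the student backbone and the projection layer; the loss approaches 0 as projected student directions match the teacher's.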
Performance
On MMTEB (multilingual), jina-embeddings-v5-text-small achieves 67.0 average (task-level) and 58.9 average (type-level), the highest among all models under 1B parameters. It scores 71.3 on classification, 53.4 on clustering, 82.9 on pair classification, 65.7 on reranking, 64.9 on retrieval, and 78.9 on STS. On English MTEB, it achieves 71.7 average, outperforming Qwen3-0.6B with instructions (70.5) and jina-embeddings-v3 (65.7). On retrieval-specific benchmarks, it scores 64.88 on MTEB-M retrieval, 66.84 on RTEB, 56.67 on BEIR, and 66.39 on LongEmbed. The model surpasses its teacher Qwen3-Embedding-4B on pair classification (42.0 vs 26.8 on MMTEB) while remaining competitive across all other categories despite having fewer than one-sixth the parameters.
Best Practice
Select the appropriate LoRA adapter for your task: 'retrieval' for asymmetric query-document search (prepend 'Query:' to queries and 'Document:' to passages), 'text-matching' for symmetric similarity tasks such as duplicate detection and paraphrase identification (uses the 'Document:' prefix for both inputs), 'clustering' for grouping related documents, and 'classification' for categorization and sentiment analysis. For retrieval tasks, always use the correct prefix, since the model is trained with asymmetric encoding. Matryoshka truncation allows reducing embeddings from 1024 to as few as 32 dimensions; performance remains strong above 256 dimensions but degrades noticeably below that threshold, consistent with Johnson-Lindenstrauss limits. Binary quantization is supported with minimal performance loss thanks to GOR regularization. The 32K context window handles long documents natively, and the model was additionally trained on long-context data for robust long-document retrieval. Use cosine similarity for embedding comparison. The model is available via the Jina AI API, Hugging Face (with Sentence Transformers and vLLM integration), and as quantized variants for llama.cpp.
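The retrieval workflow above (prefixing, Matryoshka truncation, cosine scoring) can be sketched end to end. The `encode` function below is a deterministic hash-seeded stub standing in for the real model, since the actual inference call (e.g. via Sentence Transformers) depends on adapter-selection arguments not specified here; only the prefixing and scoring logic reflect the documented usage.

```python
import zlib
import numpy as np

def add_prefix(texts: list[str], kind: str) -> list[str]:
    """Prepend the asymmetric-retrieval prefix ('Query' or 'Document')."""
    return [f"{kind}: {t}" for t in texts]

def encode(texts: list[str], dim: int = 1024) -> np.ndarray:
    """Stub embedder (hash-seeded Gaussian) -- NOT the real model."""
    return np.stack([
        np.random.default_rng(zlib.crc32(t.encode("utf-8"))).normal(size=dim)
        for t in texts
    ])

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

queries = add_prefix(["what is Matryoshka truncation?"], "Query")
docs = add_prefix(["MRL lets embeddings be truncated.", "Unrelated text."], "Document")

# Truncate to 256 dims -- the threshold above which performance stays strong.
q_emb = encode(queries)[:, :256]
d_emb = encode(docs)[:, :256]
scores = cosine_sim(q_emb, d_emb)  # shape (1, 2): one row per query
```

With the real model, the same pipeline applies: encode with the retrieval adapter, optionally truncate, then rank documents by cosine similarity.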