News
Models
Products
keyboard_arrow_down
DeepSearch
Search, read and reason until best answer found.
Reader
Convert any URL to Markdown for better grounding LLMs.
Embeddings
World-class multimodal multilingual embeddings.
Reranker
World-class reranker for maximizing search relevancy.
More
keyboard_arrow_down
Classifier
Zero-shot and few-shot classification for image and text.
Segmenter
Cut long text into chunks and do tokenization.

API Docs
Auto codegen for your copilot IDE or LLM
open_in_new


Company
keyboard_arrow_down
About us
Contact sales
Intern program
Join us
open_in_new
Download logo
open_in_new
Terms & Conditions


Log in
login
Benchmarking Against the Best 8K Model from Open AI
Features and Benefits
A Glimpse into the Future
star
Featured
Press release
October 25, 2023

Jina AI Launches World's First Open-Source 8K Text Embedding, Rivaling OpenAI

Jina AI introduces jina-embeddings-v2, the world's first open-source model boasting an 8K context length. Matching the prowess of OpenAI's proprietary models, this innovation is now publicly accessible on Huggingface, signaling a significant milestone in the landscape of text embeddings.
Neon "V2" inscription amidst blue and pink abstract waves, creating a dynamic, tech-inspired design
Jina AI
Jina AI • 4 minutes read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and 8192 token-length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
Jina AI

jina-embeddings-v3 has been released on Sept. 18, 2024. The best <1B multilingual embedding model.

Berlin, Germany - October 25, 2023 – Jina AI, the Berlin-based artificial intelligence company, is thrilled to announce the launch of its second-generation text embedding model: jina-embeddings-v2. This cutting-edge model is now the only open-source offering that supports an impressive 8K (8192 tokens) context length, putting it on par with OpenAI's proprietary model, text-embedding-ada-002, in terms of both capabilities and performance on the Massive Text Embedding Benchmark (MTEB) leaderboard.

Embedding API
Start with 1M free tokens. Top-performing, 8192 context length bilingual embeddings for your search and RAG systems.

tagBenchmarking Against the Best 8K Model from Open AI

When directly compared with OpenAI's 8K model text-embedding-ada-002, jina-embeddings-v2 showcases its mettle. Below is a performance comparison table, highlighting areas where jina-embeddings-v2 particularly excels:

Rank Model Model Size (GB) Embedding Dimensions Sequence Length Average (56 datasets) Classification Average (12 datasets) Reranking Average (4 datasets) Retrieval Average (15 datasets) Summarization Average (1 dataset)
15 text-embedding-ada-002 Unknown 1536 8191 60.99 70.93 84.89 56.32 30.8
17 jina-embeddings-v2-base-en 0.27 768 8192 60.38 73.45 85.38 56.98 31.6

Notably, jina-embedding-v2 outperforms its OpenAI counterpart in Classification Average, Reranking Average, Retrieval Average, and Summarization Average.

tagFeatures and Benefits

Jina AI’s dedication to innovation is evident in this latest offering:

  • From Scratch to Superiority: The jina-embeddings-v2 was built from the ground up. Over the last three months, the team at Jina AI engaged in intensive R&D, data collection, and tuning. The outcome is a model that marks a significant leap from its predecessor.
  • Unlocking Extended Context Potential with 8K: jina-embeddings-v2 isn’t just a technical feat; its 8K context length opens doors to new industry applications:
    • Legal Document Analysis: Ensure every detail in extensive legal texts is captured and analyzed.
    • Medical Research: Embed scientific papers holistically for advanced analytics and discovery.
    • Literary Analysis: Dive deep into long-form content, capturing nuanced thematic elements.
    • Financial Forecasting: Attain superior insights from detailed financial reports.
    • Conversational AI: Improve chatbot responses to intricate user queries.

Benchmarking shows that in several datasets, this extended context enabled jina-embeddings-v2 to outperform other leading base embedding models, emphasizing the practical advantages of longer context capabilities.

Two color-coded bar charts showing performance comparisons of various models based on NDCG@10% values for retrieval and clustering tasks
  • Availability: Both models are freely available for download on Huggingface:
    • Base Model (0.27G) - Designed for heavy-duty tasks requiring higher accuracy, like academic research or business analytics.
    • Small Model (0.07G) - Crafted for lightweight applications such as mobile apps or devices with limited computing resources.
  • Size Options for Different Needs: Understanding the diverse needs of the AI community, Jina AI offers two versions of the model:
jinaai/jina-embeddings-v2-base-en · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
jinaai/jina-embeddings-v2-small-en · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

In reflecting on the journey and significance of this launch, Dr. Han Xiao, CEO of Jina AI, shared his thoughts:

"In the ever-evolving world of AI, staying ahead and ensuring open access to breakthroughs is paramount. With jina-embeddings-v2, we've achieved a significant milestone. Not only have we developed the world's first open-source 8K context length model, but we have also brought it to a performance level on par with industry giants like OpenAI. Our mission at Jina AI is clear: we aim to democratize AI and empower the community with tools that were once confined to proprietary ecosystems. Today, I am proud to say, we have taken a giant leap towards that vision."

This pioneering spirit is evident in Jina AI's forward-looking plans.

tagA Glimpse into the Future

Jina AI is committed to leading the forefront of innovation in AI. Here’s what’s next on their roadmap:

  • Academic Insights: An academic paper detailing the technical intricacies and benchmarks of jina-embeddings-v2 will soon be published, allowing the AI community to gain deeper insights.
  • API Development: The team is in the advanced stages of developing an OpenAI-like embeddings API platform. This will provide users with the capability to effortlessly scale the embedding model according to their needs.
  • Language Expansion: Venturing into multilingual embeddings, Jina AI is setting its sights on launching German-English models, further expanding its repertoire.

About Jina AI GmbH:
Located at Ohlauer Str. 43 (1st floor), zone A, 10999 Berlin, Germany, Jina AI is at the vanguard of reshaping the landscape of multimodal artificial intelligence. For inquiries, please reach out at [email protected].

Categories:
star
Featured
Press release
rss_feed

Read more
April 08, 2025 • 21 minutes read
jina-reranker-m0: Multilingual Multimodal Document Reranker
Jina AI
Modern dot matrix text display on a dark blue background, conveying a digital feel.
January 15, 2025 • 17 minutes read
ReaderLM v2: Frontier Small Language Model for HTML to Markdown and JSON
Jina AI
Orange text "ReaderLM-u2" on a vibrant dark red digital screen.
December 16, 2024 • 2 minutes read
Re·Search: Order 2024 Yearbook of Search Foundation Advances
Jina AI
Open red publication "ReSearch" volume 24 displayed on a white surface with a distinctive shadow casting over the pages.
Offices
location_on
Sunnyvale, CA
710 Lakeway Dr, Ste 200, Sunnyvale, CA 94085, USA
location_on
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
location_on
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing, China
location_on
Shenzhen, China
402 Floor 4, Fu'an Technology Building, Shenzhen, China
Search Foundation
DeepSearch
Reader
Embeddings
Reranker
Classifier
Segmenter
API Documentation
Get Jina API key
Rate Limit
API Status
Company
About us
Contact sales
Newsroom
Intern program
Join us
open_in_new
Download logo
open_in_new
Terms
Security
Terms & Conditions
Privacy
Manage Cookies
email
Jina AI © 2020-2025.