Newsroom
Crafting AI innovations, one word at a time.
Featured
September 18, 2024 • 10-minute read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token context length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
September 11, 2024 • 13-minute read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
August 30, 2024 • 10-minute read
Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking
Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an 8192-token context length.
Academic Publications
arXiv
September 18, 2024
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
arXiv
September 07, 2024
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
arXiv
August 30, 2024
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
arXiv
June 21, 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
ICML 2024
May 30, 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
arXiv
February 26, 2024
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings
arXiv
October 30, 2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
EMNLP 2023
July 20, 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
8 publications in total.
All
October 09, 2024 • 13-minute read
Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning
Multilingual models often face a "language gap," where similar phrases in different languages don't align. We show how contrastive learning can bridge this gap, enhancing cross-language performance.
October 03, 2024 • 9-minute read
What Late Chunking Really Is & What It’s Not: Part II
Part 2 of our exploration of Late Chunking: a deep dive into why it is the best method for chunk embeddings and for improving search and RAG performance.
September 27, 2024 • 15-minute read
Migration From Jina Embeddings v2 to v3
We collected some tips to help you migrate from Jina Embeddings v2 to v3.
September 18, 2024 • 10-minute read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token context length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
September 11, 2024 • 13-minute read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
August 30, 2024 • 10-minute read
Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking
Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an 8192-token context length.
August 26, 2024 • 13-minute read
The What and Why of Text-Image Modality Gap in CLIP Models
You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?
August 22, 2024 • 8-minute read
Late Chunking in Long-Context Embedding Models
Chunking long documents while preserving contextual information is challenging. We introduce "Late Chunking," a method that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval applications.
August 14, 2024 • 17-minute read
By Hoovering Up the Web, AI Is Poisoning Itself
What does it mean for LLMs when the web has been strip-mined clean, content providers have locked their doors, and there’s barely a trickle of new data to scrape?
August 07, 2024 • 10-minute read
What We Learned at ICML2024 ft. PLaG, XRM, tinyBenchmark, MagicLens, Prompt Sketching etc.
We had a blast at ICML 2024 in Vienna, and we want to share with you everything we said, saw, and learned.
July 31, 2024 • 17-minute read
Rephrased Labels Improve Zero-Shot Text Classification by 30%
When using embedding models for zero-shot classification, rephrasing the class label as "This is seriously about 'LABEL'" yields higher accuracy than using LABEL alone. But how, and why?
July 24, 2024 • 10-minute read
Can Embedding/Reranker Models Compare Numbers?
A lot of LLMs can't figure out that 9.11 is actually smaller than 9.9. Can our embedding and reranker models do any better?
Offices
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Business address: Leipzigerstr. 96, 10117 Berlin, Germany
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing Haidian, China
Shenzhen, China
402, Floor 4, Fu'an Technology Building, Shenzhen Nanshan, China
Jina AI GmbH © 2020-2024.