Newsroom
Accelerate search AI, one word at a time.
Featured
October 22, 2024 • 16 minutes read
Jina Classifier API for High Performance Zero-Shot and Few-Shot Classification
New Classifier API offers zero-shot and few-shot classification for text and images. Start classifying content instantly or train it with your own examples.
September 18, 2024 • 10 minutes read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an input length of 8192 tokens, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
September 11, 2024 • 13 minutes read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
Academic Publications
arXiv
September 18, 2024
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
arXiv
September 07, 2024
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
arXiv
August 30, 2024
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
arXiv
June 21, 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
ICML 2024
May 30, 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
arXiv
February 26, 2024
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings
arXiv
October 30, 2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
EMNLP 2023
July 20, 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
8 publications in total.
All
November 19, 2024 • 9 minutes read
Meta-Prompt for Better Jina API Integration and CodeGen
Is Meta-Prompt the new norm for API specs? Feed it to LLMs to generate code that reliably integrates Jina's APIs, saving you from the usual trial-and-error process.
November 05, 2024 • 2 minutes read
Call for Participants: EMNLP 2024 BoF on Embeddings, Reranker & Small LMs for Better Search
At EMNLP 2024 Miami? Join us for a Birds of a Feather session focusing on embeddings, rerankers, and small LMs for better search.
October 29, 2024 • 11 minutes read
Beyond CLIP: How Jina-CLIP Advances Multimodal Search
Learn how Jina-CLIP enhances OpenAI's CLIP with better retrieval accuracy and more diverse results through unified text-image embeddings.
October 25, 2024 • 19 minutes read
Finding Optimal Breakpoints in Long Documents Using Small Language Models
We trained three small language models to better segment long documents into chunks, and here are the key lessons we learned.
October 15, 2024 • 9 minutes read
Fact-Checking with New Grounding API in Jina Reader
With the new g.jina.ai, you can easily ground statements to reduce LLM hallucinations or improve the integrity of human-written content.
October 09, 2024 • 13 minutes read
Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning
Multilingual models often face a "language gap," where similar phrases in different languages don't align. We show how contrastive learning can bridge this gap, enhancing cross-language performance.
October 03, 2024 • 9 minutes read
What Late Chunking Really Is & What It’s Not: Part II
Part 2 of our exploration of Late Chunking, a deep dive into why it is the best method for chunk embeddings and improving search/RAG performance.
September 27, 2024 • 15 minutes read
Migration From Jina Embeddings v2 to v3
We collected some tips to help you migrate from Jina Embeddings v2 to v3.
August 30, 2024 • 10 minutes read
Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking
Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an input length of 8192 tokens.
Offices
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing Haidian, China
Shenzhen, China
402, Floor 4, Fu'an Technology Building, Shenzhen Nanshan, China
Jina AI GmbH © 2020-2024.