Grow your business with Jina AI.
Enterprise License Configurator

Three Ways to Purchase

Subscribe to our API, purchase through cloud providers, or obtain a commercial license for your organization.
1. With a commercial license for on-prem use
Require 100% control and privacy? Purchase a commercial license to use our models on-premises.

2. With 3 cloud service providers
Using AWS or Azure? You can deploy our models directly on your company's cloud platform and handle billing through the CSP account.
- AWS SageMaker: Embeddings, Reranker
- Microsoft Azure: Embeddings, Reranker
- Google Cloud: Embeddings, Reranker

3. With the Jina Search Foundation API
The easiest way to access all of our products. Top up tokens as you go. Depending on your location, you may be charged in USD, EUR, or other currencies. Taxes may apply.
Understand the rate limit

Rate limits are the maximum number of requests that can be made to an API within a minute per IP address/API key (RPM). Find out more about the rate limits for each product and tier below.

Rate Limit
Rate limits are tracked in two ways: RPM (requests per minute) and TPM (tokens per minute). Limits are enforced per IP address/API key and are triggered when either the RPM or TPM threshold is reached first. When you provide an API key in the request header, we track rate limits by key rather than by IP address. A retry sketch follows the table below.
| Product | API Endpoint | Description | w/o API Key | w/ API Key | w/ Premium API Key | Average Latency | Token Usage Counting | Allowed Request |
|---|---|---|---|---|---|---|---|---|
| Reader API | https://r.jina.ai | Convert URL to LLM-friendly text | 20 RPM | 500 RPM | 5,000 RPM | 7.9s | Tokens in the output response. | GET/POST |
| Reader API | https://s.jina.ai | Search the web and convert results to LLM-friendly text | n/a | 100 RPM | 1,000 RPM | 2.5s | Every request costs a fixed number of tokens, starting from 10,000 tokens. | GET/POST |
| DeepSearch | https://deepsearch.jina.ai/v1/chat/completions | Reason, search and iterate to find the best answer | 0.5 RPM | 50 RPM | 500 RPM | 56.7s | Total tokens used in the whole process. | POST |
| Embedding API | https://api.jina.ai/v1/embeddings | Convert text/images to fixed-length vectors | n/a | 500 RPM & 1,000,000 TPM | 2,000 RPM & 5,000,000 TPM | Varies with input size | Tokens in the input request. | POST |
| Reranker API | https://api.jina.ai/v1/rerank | Rank documents by query | n/a | 500 RPM & 1,000,000 TPM | 2,000 RPM & 5,000,000 TPM | Varies with input size | Tokens in the input request. | POST |
| Classifier API (Train) | https://api.jina.ai/v1/train | Train a classifier using labeled examples | n/a | 20 RPM & 200,000 TPM | 60 RPM & 1,000,000 TPM | Varies with input size | input_tokens × num_iters | POST |
| Classifier API (Few-shot) | https://api.jina.ai/v1/classify | Classify inputs using a trained few-shot classifier | n/a | 20 RPM & 200,000 TPM | 60 RPM & 1,000,000 TPM | Varies with input size | input_tokens | POST |
| Classifier API (Zero-shot) | https://api.jina.ai/v1/classify | Classify inputs using zero-shot classification | n/a | 200 RPM & 500,000 TPM | 1,000 RPM & 3,000,000 TPM | Varies with input size | input_tokens + label_tokens | POST |
| Segmenter API | https://api.jina.ai/v1/segment | Tokenize and segment long text | 20 RPM | 200 RPM | 1,000 RPM | 0.3s | Tokens are not counted as usage. | GET/POST |
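When a limit is hit, further requests fail until the window resets. Below is a minimal retry sketch, not official client code; it assumes the API signals rate-limiting with the standard HTTP 429 status and that a JINA_API_KEY environment variable holds your key:

```python
# Minimal sketch: retry with exponential backoff when the RPM/TPM limit is hit.
# Assumes HTTP 429 on rate-limit (the standard status) and JINA_API_KEY in the env.
import os
import time
import requests

def post_with_backoff(url: str, payload: dict, max_retries: int = 5) -> requests.Response:
    headers = {"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"}
    resp = None
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers)
        if resp.status_code != 429:      # anything but "rate limited": return it
            return resp
        time.sleep(2 ** attempt)         # back off: 1s, 2s, 4s, ...
    return resp

resp = post_with_backoff("https://api.jina.ai/v1/embeddings",
                         {"model": "jina-embeddings-v3", "input": ["hello"]})
print(resp.status_code)
```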

FAQ

How to get my API key?

[Video tutorial]

What's the rate limit?

See the Rate Limit table in the section above.

Do I need a commercial license?

CC BY-NC License Self-Check

- Are you using our official API, or our official model images on AWS or Azure?
  - Yes, the official API. Are you using a paid API key or a free trial key?
    - Paid API key: No restrictions. Use as per your current agreement.
    - Free trial key: The free trial key can only be used for non-commercial purposes. Please purchase a paid package for commercial use.
  - Yes, the official model images on AWS or Azure: No restrictions. Use as per your current agreement.
  - No. Are you using any of these models: jina-reranker-m0, jina-clip-v2, jina-embeddings-v3, jina-reranker-v2-base-multilingual, jina-colbert-v2, reader-lm-1.5b, reader-lm-0.5b, or ReaderLM-v2?
    - No: No restrictions apply.
    - Yes. Is your use commercial?
      - No: You can use the models freely.
      - Yes: Contact our sales team for licensing.
      - Not sure. Are you:
        - Using it for personal or hobby projects? This is non-commercial. You can use the models freely.
        - A for-profit company using it internally? This is commercial. Contact our sales team.
        - An educational institution using it for teaching? This is typically non-commercial. You can use the models freely.
        - A non-profit or NGO using it for your mission? This is typically non-commercial, but check with us if unsure.
        - Using it in a product or service you sell? This is commercial. Contact our sales team.
        - A government entity using it for public services? This may be commercial. Please contact us for clarification.

Other questions

DeepSearch-related common questions
What is DeepSearch?
DeepSearch is an LLM API that performs iterative search, reading, and reasoning until it finds an accurate answer to a query or reaches its token budget limit.

How is DeepSearch different from OpenAI and Gemini's deep research capabilities?
Unlike OpenAI and Gemini, DeepSearch specifically focuses on delivering accurate answers through iteration rather than generating long-form articles. It's optimized for quick, precise answers from deep web search rather than creating comprehensive reports.

What API key do I need to use DeepSearch?
You need a Jina API key. We offer 10 million free tokens for new API keys.

What happens when DeepSearch reaches its token budget? Does it return an incomplete answer?
It generates a final answer based on all accumulated knowledge, rather than just giving up or returning an incomplete response.

Does DeepSearch guarantee accurate answers?
No. While it uses an iterative search process to improve accuracy, the evaluation shows it achieves a 75% pass rate on test questions, significantly better than the 0% baseline (gemini-2.0-flash) but not perfect.

How long does a typical DeepSearch query take?
It varies significantly: queries can take anywhere from 1 to 42 steps, with an average of 4 steps (about 20 seconds) based on evaluation data. Simple queries might resolve quickly, while complex research questions can involve many iterations and take up to 120 seconds.

Can DeepSearch work with any OpenAI-compatible client like Chatwise, CherryStudio or ChatBox?
Yes, the official DeepSearch API at deepsearch.jina.ai/v1/chat/completions is fully compatible with the OpenAI API schema, using 'jina-deepsearch-v1' as the model name. This makes it easy to switch from OpenAI to DeepSearch with local clients or any OpenAI-compatible client. We highly recommend Chatwise for a seamless experience.
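For example, here is a minimal sketch using the openai Python SDK (v1.x); the endpoint and model name are as stated above, and a JINA_API_KEY environment variable is assumed:

```python
# Point the OpenAI SDK at the OpenAI-compatible DeepSearch endpoint.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://deepsearch.jina.ai/v1",
    api_key=os.environ["JINA_API_KEY"],
)

stream = client.chat.completions.create(
    model="jina-deepsearch-v1",
    messages=[{"role": "user", "content": "Who won the 2024 Nobel Prize in Physics?"}],
    stream=True,  # DeepSearch streams its reasoning steps and final answer
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```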
What are the rate limits for the API?
Rate limits vary by API key tier; see the DeepSearch row of the Rate Limit table above. This is important to consider for applications with high query volumes.

What is the content inside the <think> tag?
DeepSearch wraps thinking steps in <think>...</think> XML tags and provides the final answer afterward, following the OpenAI streaming format but with these special markers for the chain of thought.
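A small helper sketch (hypothetical, not part of the API) that separates the chain of thought from the final answer once a streamed response has been accumulated:

```python
# Split an accumulated DeepSearch response into thinking and final answer,
# based on the <think>...</think> markers described above.
import re

def split_think(text: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty if no <think> block exists."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

thinking, answer = split_think("<think>Searching the web...</think>Paris.")
print(answer)  # "Paris."
```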
Does DeepSearch use Jina Reader for web search and reading?
Yes. Jina Reader is used for web search and reading, providing the system with the ability to efficiently access and process web content.

Why does DeepSearch use so many tokens for my queries?
The token usage of DeepSearch on complex queries is arguably high, averaging 70,000 tokens compared to 500 for basic LLM responses. This reflects the depth of research but also has cost implications.

Is there a way to control or limit the number of steps?
The system is primarily controlled by token budget rather than step count. Once the token budget is exceeded, it enters Beast Mode for final answer generation. Check reasoning_effort for more details.

How reliable are the references in the answers?
References are considered so important that if an answer is deemed definitive but lacks references, the system continues searching rather than accepting the answer.

Can DeepSearch handle questions about future events?
Yes, but with extensive research steps. The example of 'who will be president in 2028' shows it can handle speculative questions through multiple research iterations, though accuracy isn't guaranteed for such predictions.
Reader-related common questions

What are the costs associated with using the Reader API?
The Reader API is free of charge and does not require an API key. Simply prepend 'https://r.jina.ai/' to your URL.
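A minimal sketch of that usage, fetching Markdown for an example URL (no API key required, per the answer above):

```python
# Prepend https://r.jina.ai/ to any URL to receive LLM-friendly Markdown.
import urllib.request

reader_url = "https://r.jina.ai/https://example.com"
with urllib.request.urlopen(reader_url) as resp:
    markdown = resp.read().decode("utf-8")
print(markdown[:500])  # first 500 characters of the extracted content
```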
How does the Reader API function?
The Reader API uses a proxy to fetch any URL, rendering its content in a browser to extract high-quality main content.

Is the Reader API open source?
Yes, the Reader API is open source and available on the Jina AI GitHub repository.

What is the typical latency for the Reader API?
The Reader API generally processes URLs and returns content within 2 seconds, although complex or dynamic pages might require more time.

Why should I use the Reader API instead of scraping the page myself?
Scraping can be complicated and unreliable, particularly with complex or dynamic pages. The Reader API provides a streamlined, reliable output of clean, LLM-ready text.

Does the Reader API support multiple languages?
The Reader API returns content in the original language of the URL. It does not provide translation services.

What should I do if a website blocks the Reader API?
If you experience blocking issues, please contact our support team for assistance and resolution.

Can the Reader API extract content from PDF files?
Yes, the Reader API can natively extract content from PDF files.

Can the Reader API process media content from web pages?
Currently, the Reader API does not process media content, but future enhancements will include image captioning and video summarization.

Is it possible to use the Reader API on local HTML files?
No, the Reader API can only process content from publicly accessible URLs.

Does the Reader API cache the content?
If you request the same URL within 5 minutes, the Reader API will return the cached content.

Can I use the Reader API to access content behind a login?
Unfortunately not.

Can I use the Reader API to access PDFs on arXiv?
Yes, you can either use the Reader's native PDF support (https://r.jina.ai/https://arxiv.org/pdf/2310.19923v4) or use the HTML version from arXiv (https://r.jina.ai/https://arxiv.org/html/2310.19923v4).

How does image captioning work in Reader?
Reader captions all images at the specified URL and adds `Image [idx]: [caption]` as an alt tag (if they initially lack one). This enables downstream LLMs to use the images in reasoning, summarizing, and so on.

What is the scalability of the Reader? Can I use it in production?
The Reader API is designed to be highly scalable. It auto-scales based on real-time traffic, and the maximum concurrency is currently around 4,000 requests. We maintain it actively as one of Jina AI's core products, so feel free to use it in production.

What is the rate limit of the Reader API?
Please find the latest rate limit information in the Rate Limit table above. Note that we are actively working on improving the rate limit and performance of the Reader API, and the table will be updated accordingly.
What is Reader-LM? How can I use it?
Reader-LM is a novel small language model (SLM) designed for data extraction and cleaning from the open web. It converts raw, noisy HTML into clean Markdown, drawing inspiration from Jina Reader. With a focus on cost-efficiency and small model size, Reader-LM is both practical and powerful. It is currently available on the AWS SageMaker, Google Cloud, and Microsoft Azure marketplaces. If you have specific requirements, please contact us at sales AT jina.ai.
Reranker-related common questions

How much does the Reranker API cost?
The pricing for the Reranker API is aligned with our Embedding API pricing structure. It begins with 10 million free tokens for each new API key. Beyond the free tokens, different packages are available for purchase. For more details, please visit our pricing section.

What is the difference between the two rerankers?
jina-reranker-v2-base-multilingual excels in multilingual support, outperforming bge-reranker-v2-m3 and offering 15x faster throughput than jina-reranker-v1-base-en. It also supports agentic tasks and code retrieval. jina-colbert-v2 improves upon ColBERTv2, delivering 6.5% better retrieval performance and adding multilingual support for 89 languages. It features user-controlled embedding sizes for optimal efficiency and precision.

Are Jina Rerankers open source?
Yes, both jina-reranker-v2-base-multilingual and jina-colbert-v2 are open source and available under the CC-BY-NC 4.0 license. You are free to use, share, and adapt the models for non-commercial purposes.

Do the rerankers support multiple languages?
Yes, both jina-reranker-v2-base-multilingual and jina-colbert-v2 support 100+ languages, including English, Chinese, and other major global languages. They are optimized for multilingual tasks and outperform previous models.

What is the maximum length for queries and documents?
The maximum query token length is 512. There is no token limit for documents.

What is the maximum number of documents I can rerank per query?
You can rerank up to 2048 documents per query.

What is the batch size and how many query-document tuples can I send in one request?
Unlike our Embedding API, there is no concept of batch size. You can send only one query-document tuple per request, but the tuple can include up to 2048 candidate documents.
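A sketch of such a request; the field names follow the /v1/rerank endpoint listed in the rate limit table, though the exact payload shape should be checked against the API docs:

```python
# One rerank request: a single query plus a list of candidate documents.
import os
import requests

resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
    json={
        "model": "jina-reranker-v2-base-multilingual",
        "query": "What is the capital of France?",   # queries: up to 512 tokens
        "documents": [                               # up to 2048 candidates per request
            "Berlin is the capital of Germany.",
            "Paris is the capital of France.",
        ],
        "top_n": 1,                                  # return only the best match
    },
)
print(resp.json())  # expected: ranked results with relevance scores
```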
What latency can I expect when reranking 100 documents?
Latency varies from 100 milliseconds to 7 seconds, depending largely on the length of the documents and the query. For instance, reranking 100 documents of 256 tokens each with a 64-token query takes about 150 milliseconds. Increasing the document length to 4096 tokens raises the time to 3.5 seconds. If the query length is increased to 512 tokens, the time further increases to 7 seconds.
Below is the time cost of reranking one query and 100 documents, in milliseconds:

| Query tokens | 256 tokens/doc | 512 | 1024 | 2048 | 4096 |
|---|---|---|---|---|---|
| 64 | 156 | 323 | 1366 | 2107 | 3571 |
| 128 | 194 | 369 | 1377 | 2123 | 3598 |
| 256 | 273 | 475 | 1397 | 2155 | 4299 |
| 512 | 468 | 1385 | 2114 | 3536 | 7068 |
Can your endpoints be hosted privately on AWS, Azure, or GCP?
Yes, our services are available on the AWS SageMaker, Google Cloud, and Microsoft Azure marketplaces. If you have specific requirements, please contact us at sales AT jina.ai.

Do you offer a fine-tuned reranker on domain-specific data?
If you are interested in a fine-tuned reranker tailored to specific domain data, please contact our sales team. Our team will respond to your inquiry promptly.

What's the minimum image size for the documents?
The minimum acceptable image size for the jina-reranker-m0 model is 28x28 pixels.
Embeddings-related common questions

How were the jina-embeddings-v3 models trained?
For detailed information on our training processes, data sources, and evaluations, please refer to our technical report available on arXiv.

What are the jina-clip models, and can I use them for text and image search?
jina-clip-v2 is an advanced multimodal embedding model that supports text-text, text-image, image-image, and image-text retrieval tasks. Unlike the original OpenAI CLIP, which struggles with text-text search, Jina CLIP excels as a text retriever. jina-clip-v2 offers a 3% performance improvement over jina-clip-v1 in both text-image and text-text retrieval tasks, supports 89 languages for multilingual image retrieval, processes higher-resolution images (512x512), and reduces storage requirements with Matryoshka representations. You can read more about it in our tech report on arXiv.

Which languages do your models support?
As of its release on September 18, 2024, jina-embeddings-v3 is the best multilingual model and ranks 2nd on the MTEB English leaderboard for models with fewer than 1 billion parameters. v3 supports a total of 89 languages, including the top 30 with the best performance: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu, and Vietnamese. For more details, please refer to the jina-embeddings-v3 tech report on arXiv.

What is the maximum length for a single sentence input?
Our models allow for an input length of up to 8192 tokens, which is significantly higher than most other models. A token can range from a single character, like 'a', to an entire word, such as 'apple'. The total number of characters that can be input depends on the length and complexity of the words used. This extended input capability enables our jina-embeddings-v3 and jina-clip models to perform more comprehensive text analysis and achieve higher accuracy in context understanding, especially for extensive textual data.

What is the maximum number of sentences I can include in a single request?
A single API call can process up to 2048 sentences or texts, facilitating extensive text analysis in one request.

How do I send images to the jina-clip models?
You can use either url or bytes in the input field of the API request. For url, provide the URL of the image you want to process. For bytes, encode the image in base64 format and include it in the request. The model will return the embeddings of the image in the response.
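A sketch of a mixed request; the input shape (objects with a text or image field) follows the answer above, and the model name is one of the jina-clip models discussed here:

```python
# Embed text and images in one /v1/embeddings request with jina-clip-v2.
import base64
import os
import requests

with open("photo.png", "rb") as f:                      # any local image
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
    json={
        "model": "jina-clip-v2",
        "input": [
            {"text": "A scenic mountain lake"},
            {"image": "https://example.com/lake.jpg"},  # image by URL
            {"image": image_b64},                       # image as base64 bytes
        ],
    },
)
for item in resp.json()["data"]:
    print(len(item["embedding"]))                       # embedding dimension per input
```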
How do Jina Embeddings models compare to OpenAI's and Cohere's latest embeddings?
In evaluations on the MTEB English, Multilingual, and LongEmbed benchmarks, jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, and surpasses multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can truncate the embedding dimensions down to 32 without sacrificing performance, thanks to the integration of Matryoshka Representation Learning (MRL).

How seamless is the transition from OpenAI's text-embedding-3-large to your solution?
The transition is streamlined: our API endpoint matches the input and output JSON schemas of OpenAI's text-embedding-3-large model, so users can easily replace the OpenAI model with ours in existing integrations.
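For instance, a minimal sketch of the swap using the openai Python SDK, relying on the schema compatibility stated above (the base_url value is an assumption; check the API docs):

```python
# Drop-in swap: point the OpenAI SDK at Jina's endpoint and change the model name.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.jina.ai/v1",    # instead of OpenAI's endpoint
    api_key=os.environ["JINA_API_KEY"],
)
emb = client.embeddings.create(
    model="jina-embeddings-v3",           # instead of "text-embedding-3-large"
    input=["Hello, world!"],
)
print(len(emb.data[0].embedding))         # 1024 dimensions by default
```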
How are tokens calculated when using jina-clip models?
Tokens are calculated based on the text length and image size. For text in the request, tokens are counted in the standard way. For images, the following steps are conducted:
1. Tile size: Each image is divided into tiles. For jina-clip-v2, tiles are 512x512 pixels; for jina-clip-v1, tiles are 224x224 pixels.
2. Coverage: The number of tiles required to cover the input image is calculated. Even if the image dimensions are not perfectly divisible by the tile size, partial tiles are counted as full tiles.
3. Total tiles: The total number of tiles covering the image determines the cost.
4. Cost calculation: For jina-clip-v2, each tile costs 4000 tokens; for jina-clip-v1, each tile costs 1000 tokens.
Example, for an image of 600x600 pixels:
- With jina-clip-v2: the image is divided into 512x512 pixel tiles, requiring 2 (horizontal) x 2 (vertical) = 4 tiles, so the cost is 4 x 4000 = 16,000 tokens.
- With jina-clip-v1: the image is divided into 224x224 pixel tiles, requiring 3 (horizontal) x 3 (vertical) = 9 tiles, so the cost is 9 x 1000 = 9,000 tokens.
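The same arithmetic as a small sketch (the function name is illustrative, not part of the API):

```python
# Compute the token cost of one image for the jina-clip models, per the rules above.
import math

def clip_image_tokens(width: int, height: int, model: str = "jina-clip-v2") -> int:
    # v2: 512px tiles at 4000 tokens each; v1: 224px tiles at 1000 tokens each
    tile, tile_cost = (512, 4000) if model == "jina-clip-v2" else (224, 1000)
    tiles = math.ceil(width / tile) * math.ceil(height / tile)  # partial tiles count fully
    return tiles * tile_cost

assert clip_image_tokens(600, 600, "jina-clip-v2") == 16000  # 2x2 tiles x 4000
assert clip_image_tokens(600, 600, "jina-clip-v1") == 9000   # 3x3 tiles x 1000
```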
Do you provide models for embedding images or audio?
Yes, jina-clip-v2 and jina-clip-v1 can embed both images and texts. Embedding models for more modalities will be announced soon!

Can Jina Embedding models be fine-tuned with private or company data?
For inquiries about fine-tuning our models with specific data, please contact us to discuss your requirements. We are open to exploring how our models can be adapted to meet your needs.

Can your endpoints be hosted privately on AWS, Azure, or GCP?
Yes, our services are available on the AWS SageMaker, Google Cloud, and Microsoft Azure marketplaces. If you have specific requirements, please contact us at sales AT jina.ai.
Classifier-related common questions

What's different about labels in zero-shot vs few-shot?
Zero-shot requires semantic labels during classification and none during training, while few-shot requires labels during training but not classification. This means zero-shot is better for flexible, immediate classification needs, while few-shot is better for fixed, domain-specific categories that can evolve over time.
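A sketch of a zero-shot call; the payload shape (input plus labels) is an assumption based on the /v1/classify endpoint and the label behavior described above, so verify it against the API docs:

```python
# Zero-shot classification: labels are supplied at classification time.
import os
import requests

resp = requests.post(
    "https://api.jina.ai/v1/classify",
    headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},
    json={
        "model": "jina-embeddings-v3",                  # text classification backbone
        "input": ["The screen cracked after one day."],
        "labels": ["complaint", "praise", "question"],  # semantic labels, no training
    },
)
print(resp.json())  # expected: predicted label(s) with scores
```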
What's num_iters for and how should I use it?
num_iters controls training intensity: higher values reinforce important examples, while lower values minimize the impact of less reliable data. It can be used to implement time-aware learning by giving recent examples higher iteration counts, making it valuable for evolving data patterns.

How does public classifier sharing work?
Public classifiers can be used by anyone with the classifier_id, consuming their own token quota. Users can't access training data or configuration, and can't see others' classification requests, enabling safe classifier sharing.

How much data do I need for few-shot to work well?
Few-shot requires 200-400 training examples to outperform zero-shot classification. While it ultimately achieves higher accuracy, it needs this warm-up period to become effective. Zero-shot provides consistent performance immediately without training data.

Can it handle multiple languages and both text/images?
Yes: the API supports multilingual queries using jina-embeddings-v3 and multimodal (text/image) classification using jina-clip-v1, with support for URL or base64-encoded images in the same request.

What are the hard limits I should know about?
Zero-shot supports 256 classes with no classifier limit, while few-shot is limited to 16 classes and 16 classifiers. Both support 1,024 inputs per request and 8,192 tokens per input.

How do I handle data changes over time?
Few-shot mode allows continuous updating through the /train endpoint for adapting to changing data patterns. You can incrementally add new examples or classes when the data distribution changes, without rebuilding the entire classifier.

What happens to my training data after I send it?
The API uses one-pass online learning: training examples update classifier weights but aren't stored afterward. This means you can't retrieve historical training data, but it ensures privacy and resource efficiency.

Zero-shot vs few-shot: when to use which?
Start with zero-shot for immediate results and when you need flexible classification with semantic labels. Switch to few-shot when you have 200-400 examples, need higher accuracy, or need to handle domain-specific or time-sensitive data.

Can I use different models for different languages/tasks?
Yes, you can choose between jina-embeddings-v3 for text classification (especially good for multilingual use) and jina-clip-v1 for multimodal classification. New models like jina-clip-v2 will be automatically available through the API when released.
Segmenter-related common questions

How much does the Segmenter API cost?
The Segmenter API is free to use. By providing your API key, you can access a higher rate limit, and your key won't be charged.

If I don't provide an API key, what is the rate limit?
Without an API key, you can access the Segmenter API at a rate limit of 20 RPM.

If I provide an API key, what is the rate limit?
With an API key, you can access the Segmenter API at a rate limit of 200 RPM. For premium paid users, the rate limit is 1000 RPM.

Will you charge the tokens from my API key?
No, your API key is only used to access a higher rate limit.

Does the Segmenter API support multiple languages?
Yes, the Segmenter API is multilingual and supports over 100 languages.

What is the difference between GET and POST requests?
GET requests are solely used to count the number of tokens in a text, which lets you easily integrate the API as a counter in your application. POST requests support more parameters and features, such as returning the first/last N tokens.
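A sketch of both access patterns; the parameter names here (content, return_chunks) are assumptions used to illustrate the shape of the calls, so check the API docs for the exact schema:

```python
# /v1/segment: GET for token counting, POST for richer options such as chunking.
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"}  # optional: raises the rate limit
text = "Hello world. How are you today?"

# GET: count tokens only
r = requests.get("https://api.jina.ai/v1/segment",
                 params={"content": text}, headers=headers)
print(r.json())

# POST: more parameters, e.g. return the chunked text
r = requests.post("https://api.jina.ai/v1/segment",
                  json={"content": text, "return_chunks": True}, headers=headers)
print(r.json())
```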
What is the maximum length I can tokenize per request?
You can send up to 64k characters per request.

How does the chunking feature work? Is it semantic chunking?
The chunking feature segments long documents into smaller chunks based on common structural cues, ensuring accurate segmentation of text into meaningful chunks. Essentially it is a (big!) regex pattern that segments text based on syntactical features that often align with semantic boundaries, such as sentence endings, paragraph breaks, punctuation, and certain conjunctions. It is not semantic chunking. This regex is as powerful as it can be within the limitations of regular expressions, balancing complexity and performance. While true semantic understanding isn't possible with regex, it approximates semantic boundaries well using these common structural cues.

How do you handle special tokens such as 'endoftext' in the Segmenter API?
If the input contains special tokens, our Segmenter API will put them in the field 'special_tokens'. This allows you to easily identify and handle them for your downstream tasks, e.g. removing them before feeding the text into an LLM to prevent injection attacks.

Does chunking support other languages than English?
Besides Western languages, chunking also works well with Chinese, Japanese, and Korean.
Auto Fine-Tuning-related common questions

How much does the Fine-tuning API cost?
The feature is currently in beta and costs 1M tokens per fine-tuned model. You can use your existing API key from the Embedding/Reranker API if it has sufficient tokens, or you can create a new API key, which includes 10M free tokens.

What do I need to input? Do I need to provide training data?
You don't need to provide any training data. Simply describe your target domain (the domain for which you want the fine-tuned embeddings to be optimized) in natural language, or use a URL as a reference, and our system will generate synthetic data to train the model.

How long does it take to fine-tune a model?
About 30 minutes.

Where are the fine-tuned models stored?
The fine-tuned models and synthetic data are stored publicly in the Hugging Face model hub.

If I provide a reference URL, how does the system use it?
The system uses the Reader API to fetch the content from the URL. It then analyzes the content to summarize the tone and domain, which it uses as guidelines for generating synthetic data. Therefore, the URL should be publicly accessible and representative of the target domain.

Can I fine-tune a model for a specific language?
Yes, you can fine-tune a model for a non-English language. The system automatically detects the language of your domain instructions and generates synthetic data accordingly. We also recommend choosing the appropriate base model for the target language. For example, if targeting a German domain, you should select 'jina-embeddings-v2-base-de' as the base model.

Can I fine-tune non-Jina embeddings, e.g., bge-M3?
No, our fine-tuning API only supports Jina v2 models.

How do you ensure the quality of the fine-tuned models?
At the end of the fine-tuning process, the system evaluates the model using a held-out test set and reports performance metrics. You will receive an email detailing the before/after performance on this test set. You are also encouraged to evaluate the model on your own test set to ensure its quality.

How do you generate synthetic data?
The system generates synthetic data by integrating the target domain instruction you provide with LLM agents' reasoning. It produces hard negative triplets, which are essential for training high-quality embedding models. For more details, please refer to our upcoming research paper on arXiv.

Can I keep my fine-tuned models and synthetic data private?
Currently, no. Note that this feature is still in beta. Storing the fine-tuned models and synthetic data publicly in the Hugging Face model hub helps us and the community evaluate the quality of the training. In the future, we plan to offer a private storage option.

How can I use the fine-tuned model?
Since all fine-tuned models are uploaded to Hugging Face, you can access them via SentenceTransformers by simply specifying the model name.
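A minimal sketch, using a hypothetical Hugging Face repo name for illustration:

```python
# Load a fine-tuned model from the Hugging Face hub via SentenceTransformers.
from sentence_transformers import SentenceTransformer

# "my-org/jina-embeddings-v2-base-en-my-domain" is a placeholder repo name.
model = SentenceTransformer(
    "my-org/jina-embeddings-v2-base-en-my-domain",
    trust_remote_code=True,  # Jina v2 embedding models ship custom modeling code
)
embeddings = model.encode(["a domain-specific query", "a candidate document"])
print(embeddings.shape)      # (2, embedding_dim)
```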
I never received the email with the evaluation results. What should I do?
Please check your spam folder. If you still can't find it, please contact our support team using the email address you provided.
API-related common questions

Can I use the same API key for reader, embedding, reranking, classifying and fine-tuning APIs?
Yes, the same API key is valid for all search foundation products from Jina AI. This includes the reader, embedding, reranking, classifying and fine-tuning APIs, with tokens shared across all services.

Can I monitor the token usage of my API key?
Yes, token usage can be monitored in the 'API Key & Billing' tab by entering your API key, allowing you to view the recent usage history and remaining tokens. If you have logged in to the API dashboard, these details can also be viewed in the 'Manage API Key' tab.

What should I do if I forget my API key?
If you have misplaced a topped-up key and wish to retrieve it, please contact support AT jina.ai with your registered email for assistance. We recommend logging in to keep your API key securely stored and easily accessible.

Do API keys expire?
No, our API keys do not have an expiration date. However, if you suspect your key has been compromised and wish to retire it, please contact our support team for assistance. You can also revoke your key in the API Key Management dashboard.

Can I transfer tokens between API keys?
Yes, you can transfer tokens from one premium key to another. After logging into your account on the API Key Management dashboard, use the settings of the key you want to transfer out of to move all remaining paid tokens.

Can I revoke my API key?
Yes, you can revoke your API key if you believe it has been compromised. Revoking a key will immediately disable it for all users who have stored it, and all remaining balance and associated properties will be permanently unusable. If the key is a premium key, you have the option to transfer the remaining paid balance to another key before revocation. Note that this action cannot be undone. To revoke a key, go to the key settings in the API Key Management dashboard.

Why is the first request for some models slow?
This is because our serverless architecture offloads certain models during periods of low usage. The initial request activates or 'warms up' the model, which may take a few seconds. After this initial activation, subsequent requests process much more quickly.

Is user input data used for training your models?
We adhere to a strict privacy policy and do not use user input data for training our models. We are also SOC 2 Type I and Type II compliant, ensuring high standards of security and privacy.
Billing-related common questions

Is billing based on the number of sentences or requests?
Our pricing model is based on the total number of tokens processed, allowing users the flexibility to allocate these tokens across any number of sentences, offering a cost-effective solution for diverse text analysis requirements.

Is there a free trial available for new users?
We offer a free trial to new users, which includes ten million tokens for use with any of our models, facilitated by an auto-generated API key. Once the free token limit is reached, users can easily purchase additional tokens for their API keys via the 'Buy tokens' tab.

Are tokens charged for failed requests?
No, tokens are not deducted for failed requests.

What payment methods are accepted?
Payments are processed through Stripe, supporting a variety of payment methods including credit cards, Google Pay, and PayPal for your convenience.

Is invoicing available for token purchases?
Yes, an invoice will be issued to the email address associated with your Stripe account upon the purchase of tokens.