A list of messages between the user and the assistant comprising the conversation so far.
Streaming
If true, the response is a stream of server-sent events emitted while the run executes, terminating with a data: [DONE] message when the run reaches a terminal state.
Reasoning Effort
Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Defaults to medium.
Request
Bash
curl https://deepsearch.jina.ai/v1/chat/completions \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "jina-deepsearch-v1",
  "messages": [
    {
      "role": "user",
      "content": "Hi!"
    },
    {
      "role": "assistant",
      "content": "Hi, how can I help you?"
    },
    {
      "role": "user",
      "content": "what's the latest blog post from jina ai?"
    }
  ],
  "stream": true,
  "reasoning_effort": "medium"
}
EOF
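With "stream": true, the response arrives as server-sent events. The sketch below (plain Python over a simulated input; a real client would iterate over the HTTP response body) shows the general shape: read each data: line, decode its payload as JSON, and stop at the terminal data: [DONE] message.

```python
import json

def read_sse_events(lines):
    """Parse server-sent-event lines from a streaming chat completion.

    Yields decoded JSON chunks and stops at the terminal `data: [DONE]` message.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # terminal state: stop consuming the stream
        yield json.loads(payload)

# Simulated stream, shaped like OpenAI-style chat completion chunks.
sample = [
    'data: {"choices": [{"delta": {"content": "<think>searching"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "</think>Answer"}}]}',
    'data: [DONE]',
]
chunks = list(read_sse_events(sample))
text = "".join(c["choices"][0]["delta"]["content"] for c in chunks)
```

The chunk shapes above are illustrative; consult the actual streamed payloads for the exact fields.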
Vibe check with a simple chat UI. DeepSearch is best for complex questions that require iterative reasoning, world knowledge, or up-to-date information.
For the best experience, we recommend using professional chat clients. DeepSearch is fully compatible with OpenAI's Chat API schema, making it easy to use with any OpenAI-compatible client.
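Because DeepSearch follows the OpenAI Chat API schema, pointing any OpenAI-compatible SDK at https://deepsearch.jina.ai/v1 with model name jina-deepsearch-v1 is usually all that is needed. As a minimal illustration with only the standard library, the sketch below builds the same request the curl example sends (the API key value is a placeholder):

```python
import json

# Placeholder key for illustration; substitute your real Jina API key.
API_KEY = "jina_xxx"

def deepsearch_request(messages, stream=True, reasoning_effort="medium"):
    """Build an OpenAI-style Chat Completions request for DeepSearch.

    Returns the URL, headers, and JSON body; the same payload works with
    any OpenAI-compatible client pointed at https://deepsearch.jina.ai/v1.
    """
    url = "https://deepsearch.jina.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "jina-deepsearch-v1",
        "messages": messages,
        "stream": stream,
        "reasoning_effort": reasoning_effort,
    }
    return url, headers, json.dumps(body)

url, headers, body = deepsearch_request(
    [{"role": "user", "content": "what's the latest blog post from jina ai?"}]
)
```

With the official openai Python SDK, the equivalent is passing base_url="https://deepsearch.jina.ai/v1" and your Jina API key to the client constructor.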
DeepSearch combines web searching, reading, and reasoning for comprehensive investigation. Think of it as an agent you give a research task to: it searches extensively and works through multiple iterations before providing an answer. This process involves continuous research, reasoning, and approaching the problem from various angles. This is fundamentally different from standard LLMs that generate answers directly from pretrained data, and from traditional RAG systems that rely on one-time, surface-level searches.
Standard LLMs
Cost: about 1,000 tokens
Latency: about 1s
Best for: quick answers to general knowledge questions
Limitation: cannot access real-time or post-training information
Answers are generated purely from pretrained knowledge with a fixed cutoff date.

RAG and Grounded LLMs
Cost: about 10,000 tokens
Latency: about 3s
Best for: questions requiring current or domain-specific information
Limitation: struggles with complex questions requiring multi-hop reasoning
Answers are generated by summarizing single-pass search results; can access current information beyond the training cutoff.

DeepSearch
Cost: about 500,000 tokens
Latency: about 50s
Best for: complex questions requiring thorough research and reasoning
Trade-off: takes longer than simple LLM or RAG approaches
An autonomous agent that iteratively searches, reads, and reasons; it dynamically decides next steps based on current findings, self-evaluates answer quality before returning results, and can perform deep dives into topics through multiple search and reasoning cycles.
API pricing is based on the token usage. One API key gives you access to all search foundation products.
With Jina Search Foundation API
The easiest way to access all of our products. Top up tokens as you go.
Depending on your location, you may be charged in USD, EUR, or other currencies. Taxes may apply.
Understand the rate limit
Rate limits are the maximum number of requests that can be made to an API within a minute per IP address/API key (RPM). Find out more about the rate limits for each product and tier below.
Rate Limit
Rate limits are tracked in two ways: RPM (requests per minute) and TPM (tokens per minute), enforced per IP address or per API key. A request is rejected as soon as either threshold (RPM or TPM) is hit. Note that when an API key is provided in the request, rate limits are tracked per key, not per IP address.
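As an illustration of this dual-threshold behavior (our sketch of the described semantics, not Jina's actual implementation), a limiter can track both counters over a sliding one-minute window and reject a request as soon as either budget would be exceeded:

```python
import time
from collections import deque

class DualRateLimiter:
    """Sketch of dual RPM/TPM limiting: a request is rejected as soon as
    EITHER the requests-per-minute or the tokens-per-minute budget, tracked
    over a sliding one-minute window, would be exceeded."""

    def __init__(self, rpm, tpm):
        self.rpm, self.tpm = rpm, tpm
        self.events = deque()  # (timestamp, tokens) per accepted request

    def allow(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        # Drop accepted requests that fell out of the one-minute window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        if len(self.events) + 1 > self.rpm:
            return False  # RPM threshold hit first
        if sum(t for _, t in self.events) + tokens > self.tpm:
            return False  # TPM threshold hit first
        self.events.append((now, tokens))
        return True

# Four 400-token requests against a 3 RPM / 1,000 TPM budget:
limiter = DualRateLimiter(rpm=3, tpm=1000)
results = [limiter.allow(400, now=0.0) for _ in range(4)]
```

Here the third request is refused by the TPM budget even though the RPM budget still has headroom, which is exactly the "whichever threshold is hit first" behavior.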
Embedding API (POST https://api.jina.ai/v1/embeddings)
Convert text/images to fixed-length vectors
Rate limits: not available w/o API key; 500 RPM & 1,000,000 TPM w/ API key; 2,000 RPM & 5,000,000 TPM w/ premium API key
Average latency: depends on the input size
Token usage: count the number of tokens in the input request

Reranker API (POST https://api.jina.ai/v1/rerank)
Rank documents by relevance to a query
Rate limits: not available w/o API key; 500 RPM & 1,000,000 TPM w/ API key; 2,000 RPM & 5,000,000 TPM w/ premium API key
Average latency: depends on the input size
Token usage: count the number of tokens in the input request

Reader API (GET/POST https://r.jina.ai)
Convert URL to LLM-friendly text
Rate limits: 20 RPM w/o API key; 200 RPM w/ API key; 1,000 RPM w/ premium API key
Average latency: 4.6s
Token usage: count the number of tokens in the output response

DeepSearch (POST https://deepsearch.jina.ai/v1/chat/completions)
Reason, search and iterate to find the best answer
Rate limits: not available w/o API key; 10 RPM w/ API key; 30 RPM w/ premium API key
Average latency: 56.7s
Token usage: count the total number of tokens in the whole process

Reader API (GET/POST https://s.jina.ai)
Search the web and convert results to LLM-friendly text
Rate limits: not available w/o API key; 40 RPM w/ API key; 100 RPM w/ premium API key
Average latency: 8.7s
Token usage: count the number of tokens in the output response

Reader API (GET/POST https://g.jina.ai)
Ground a statement with web knowledge
Rate limits: not available w/o API key; 10 RPM w/ API key; 30 RPM w/ premium API key
Average latency: 22.7s
Token usage: count the total number of tokens in the whole process

Classifier API (Zero-shot) (POST https://api.jina.ai/v1/classify)
Classify inputs using zero-shot classification
Rate limits: not available w/o API key; 200 RPM & 500,000 TPM w/ API key; 1,000 RPM & 3,000,000 TPM w/ premium API key
Average latency: depends on the input size
Token usage: input_tokens + label_tokens

Classifier API (Few-shot) (POST https://api.jina.ai/v1/classify)
Classify inputs using a trained few-shot classifier
Rate limits: not available w/o API key; 20 RPM & 200,000 TPM w/ API key; 60 RPM & 1,000,000 TPM w/ premium API key
Average latency: depends on the input size
Token usage: input_tokens

Classifier API (Training) (POST https://api.jina.ai/v1/train)
Train a classifier using labeled examples
Rate limits: not available w/o API key; 20 RPM & 200,000 TPM w/ API key; 60 RPM & 1,000,000 TPM w/ premium API key
Average latency: depends on the input size
Token usage: input_tokens × num_iters

Segmenter API (GET/POST https://api.jina.ai/v1/segment)
Tokenize and segment long text
Rate limits: 20 RPM w/o API key; 200 RPM w/ API key; 1,000 RPM w/ premium API key
Average latency: 0.3s
Token usage: tokens are not counted as usage
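The per-product token-counting rules above can be summarized in a small sketch (the function and mode names are illustrative, not official billing code):

```python
def billed_tokens(mode, input_tokens, label_tokens=0, num_iters=1):
    """Illustrative sketch of the token-usage counting rules listed above."""
    if mode == "classify_zero_shot":
        return input_tokens + label_tokens   # input_tokens + label_tokens
    if mode == "classify_few_shot":
        return input_tokens                  # input_tokens only
    if mode == "train":
        return input_tokens * num_iters      # input_tokens × num_iters
    if mode == "segment":
        return 0                             # Segmenter tokens are not billed
    raise ValueError(f"unknown mode: {mode}")

# e.g. training a classifier on 10,000 input tokens for 5 iterations
cost = billed_tokens("train", 10_000, num_iters=5)
```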
Auto-Recharge for Low Token Balance
Recommended for uninterrupted service in production. When your token balance drops below the set threshold, we automatically charge your saved payment method for the last purchased package, repeating until the balance meets the threshold again.
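The recharge rule can be sketched as follows (assumed semantics based on the description above, not actual billing code):

```python
def maybe_auto_recharge(balance, threshold, last_package_tokens):
    """Sketch of the auto-recharge rule: whenever the balance falls below
    the threshold, recharge the last purchased package, repeating until
    the balance meets the threshold again."""
    recharges = 0
    while balance < threshold:
        balance += last_package_tokens
        recharges += 1
    return balance, recharges

# e.g. 100k tokens left, 1M threshold, last package was 500k tokens
balance, recharges = maybe_auto_recharge(100_000, 1_000_000, 500_000)
```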
DeepSearch is an LLM API that performs iterative search, reading, and reasoning until it finds an accurate answer to a query or reaches its token budget limit.
How is DeepSearch different from OpenAI and Gemini's deep research capabilities?
Unlike OpenAI and Gemini, DeepSearch specifically focuses on delivering accurate answers through iteration rather than generating long-form articles. It's optimized for quick, precise answers from deep web search rather than creating comprehensive reports.
What API key do I need to use DeepSearch?
You need a Jina API key. We offer 1M free tokens for new API keys.
What happens when DeepSearch reaches its token budget? Does it return an incomplete answer?
It generates a final answer based on all accumulated knowledge, rather than just giving up or returning an incomplete response.
Does DeepSearch guarantee accurate answers?
No. While it uses an iterative search process to improve accuracy, the evaluation shows it achieves a 75% pass rate on test questions, significantly better than the 0% baseline (gemini-2.0-flash) but not perfect.
How long does a typical DeepSearch query take?
It varies significantly: queries can take anywhere from 1 to 42 steps, with an average of 4 steps (about 20 seconds) based on evaluation data. Simple queries might resolve quickly, while complex research questions can involve many iterations and take up to 120 seconds.
Can DeepSearch work with any OpenAI-compatible client like Chatwise, CherryStudio or ChatBox?
Yes, the official DeepSearch API at deepsearch.jina.ai/v1/chat/completions is fully compatible with the OpenAI API schema, using 'jina-deepsearch-v1' as the model name. Therefore it is super easy to switch from OpenAI to DeepSearch and use with local clients or any OpenAI-compatible client. We highly recommend Chatwise for a seamless experience.
What are the rate limits for the API?
Rate limits vary by API key tier, ranging from 10 RPM to 30 RPM. This is important to consider for applications with high query volumes.
What is the content inside the <think> tag?
DeepSearch wraps its thinking steps in <think>...</think> tags and provides the final answer afterward, following the OpenAI streaming format but with these special markers for the chain of thought.
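A client can separate the chain of thought from the final answer with a small parser. This sketch assumes the full response text has already been assembled from the stream:

```python
import re

def split_think(answer_text):
    """Separate chain-of-thought wrapped in <think>...</think> from the
    final answer that follows it (a sketch of the format described above)."""
    m = re.match(r"(?s)\s*<think>(.*?)</think>\s*(.*)", answer_text)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", answer_text.strip()  # no thinking block present

thinking, final = split_think("<think>step 1: search jina.ai</think>Here is the answer.")
```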
Does DeepSearch use Jina Reader for web search and reading?
Yes. Jina Reader is used for web search and reading, providing the system with the ability to efficiently access and process web content.
Why does DeepSearch use so many tokens for my queries?
The token usage of DeepSearch on complex queries is admittedly high, averaging 70,000 tokens compared to about 500 for basic LLM responses. This reflects the depth of research but also has cost implications.
Is there a way to control or limit the number of steps?
The system is primarily controlled by token budget rather than step count. Once the token budget is exceeded, it enters Beast Mode for final answer generation. Check reasoning_effort for more details.
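The budget-then-Beast-Mode behavior can be sketched as a simple loop (step costs and names here are hypothetical, for illustration only):

```python
def research_loop(step_costs, token_budget):
    """Sketch of budget-controlled iteration: keep taking research steps
    until the token budget would be exceeded, then switch to a final
    answer-generation pass ("Beast Mode")."""
    used = 0
    for cost in step_costs:
        if used + cost > token_budget:
            return used, "beast_mode"  # budget exhausted: answer now
        used += cost
    return used, "normal_answer"  # finished within budget

# Hypothetical per-step token costs against a 100k budget:
used, mode = research_loop([30_000, 40_000, 50_000], token_budget=100_000)
```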
How reliable are the references in the answers?
References are considered so important that if an answer is deemed definitive but lacks references, the system continues searching rather than accepting the answer.
Can DeepSearch handle questions about future events?
Yes, but with extensive research steps. The example of 'who will be president in 2028' shows it can handle speculative questions through multiple research iterations, though accuracy isn't guaranteed for such predictions.
Can I use the same API key for reader, embedding, reranking, classifying and fine-tuning APIs?
Yes, the same API key is valid for all search foundation products from Jina AI. This includes the reader, embedding, reranking, classifying and fine-tuning APIs, with tokens shared across all services.
Can I monitor the token usage of my API key?
Yes, token usage can be monitored in the 'API Key & Billing' tab by entering your API key, allowing you to view the recent usage history and remaining tokens. If you have logged in to the API dashboard, these details can also be viewed in the 'Manage API Key' tab.
What should I do if I forget my API key?
If you have misplaced a topped-up key and wish to retrieve it, please contact support AT jina.ai with your registered email for assistance. It's recommended to log in to keep your API key securely stored and easily accessible.
Do API keys expire?
No, our API keys do not have an expiration date. However, if you suspect your key has been compromised and wish to retire it, please contact our support team for assistance. You can also revoke your key in the API Key Management dashboard.
Can I transfer tokens from one API key to another?
Yes, you can transfer tokens from a premium key to another key. After logging into your account on the API Key Management dashboard, open the settings of the key you want to transfer from and move all remaining paid tokens.
Can I revoke my API key?
Yes, you can revoke your API key if you believe it has been compromised. Revoking a key will immediately disable it for all users who have stored it, and all remaining balance and associated properties will be permanently unusable. If the key is a premium key, you have the option to transfer the remaining paid balance to another key before revocation. Note that this action cannot be undone. To revoke a key, go to the key settings in the API Key Management dashboard.
Why is the first request for some models slow?
This is because our serverless architecture offloads certain models during periods of low usage. The initial request activates or 'warms up' the model, which may take a few seconds. After this initial activation, subsequent requests process much more quickly.
Is user input data used for training your models?
We adhere to a strict privacy policy and do not use user input data for training our models. We are also SOC 2 Type I and Type II compliant, ensuring high standards of security and privacy.
Billing-related common questions
Is billing based on the number of sentences or requests?
Our pricing model is based on the total number of tokens processed, allowing users the flexibility to allocate these tokens across any number of sentences, offering a cost-effective solution for diverse text analysis requirements.
Is there a free trial available for new users?
We offer new users a free trial that includes one million tokens for use with any of our models, provided via an auto-generated API key. Once the free tokens are used up, you can purchase additional tokens for your API key via the 'Buy tokens' tab.
Are tokens charged for failed requests?
No, tokens are not deducted for failed requests.
What payment methods are accepted?
Payments are processed through Stripe, supporting a variety of payment methods including credit cards, Google Pay, and PayPal for your convenience.
Is invoicing available for token purchases?
Yes, an invoice will be issued to the email address associated with your Stripe account upon the purchase of tokens.
Offices
Sunnyvale, CA
710 Lakeway Dr, Ste 200, Sunnyvale, CA 94085, USA
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing, China
Shenzhen, China
402 Floor 4, Fu'an Technology Building, Shenzhen, China