Reader
Get LLM-friendly input from a URL or a web search by simply adding r.jina.ai in front.
Feeding web information into LLMs is an important step of grounding, yet it can be challenging. The simplest method is to scrape the webpage and feed the raw HTML. However, scraping can be complex and is often blocked, and raw HTML is cluttered with extraneous elements such as markup and scripts. The Reader API addresses these issues by extracting the core content from a URL and converting it into clean, LLM-friendly text, ensuring high-quality input for your agent and RAG systems.
LLMs have a knowledge cut-off, meaning they can't access the latest world knowledge. This leads to problems such as misinformation, outdated responses, hallucinations, and other factuality issues. Grounding is therefore essential for GenAI applications. Reader allows you to ground your LLM with the latest information from the web. Simply prepend https://s.jina.ai/ to your query, and Reader will search the web and return the top five results with their URLs and contents, each in clean, LLM-friendly text. This way, you can keep your LLM up to date, improve its factuality, and reduce hallucinations.
Images on the webpage are automatically captioned by a vision language model in Reader and formatted as image alt tags in the output. This gives your downstream LLM just enough hints to incorporate those images into its reasoning and summarizing processes. You can ask questions about the images, select specific ones, or even forward their URLs to a more powerful VLM for deeper analysis.
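As a rough illustration of how a downstream step could pick up these captions, the sketch below pulls image entries out of Reader output. It assumes the output is Markdown that uses the `![alt](url)` image syntax, which this page does not state explicitly. `extractImages` and the sample text are hypothetical.

```javascript
// Assumption: Reader output is Markdown, with images written as ![alt](url).
// extractImages is a hypothetical helper, not part of the Reader API.
function extractImages(markdown) {
  // Alt text: anything except ']'. URL: anything except ')' or whitespace.
  const imagePattern = /!\[([^\]]*)\]\(([^)\s]+)\)/g;
  const images = [];
  for (const match of markdown.matchAll(imagePattern)) {
    images.push({ alt: match[1], url: match[2] });
  }
  return images;
}

// Made-up sample shaped like the captioned output described above.
const sampleOutput = [
  'Intro paragraph.',
  '![Image 1: A red bicycle leaning on a wall](https://example.com/bike.png)',
  'More text with an inline ![Image 2: Company logo](https://example.com/logo.svg) image.',
].join('\n');

for (const image of extractImages(sampleOutput)) {
  console.log(`${image.alt} -> ${image.url}`);
}
```

The collected URLs could then be passed to a separate VLM for the deeper analysis mentioned above. Captions that themselves contain a `]` character would need a more careful parser than this pattern.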
Reader natively supports PDF reading. It's compatible with most PDFs, including those with many images, and it's lightning fast! Combined with an LLM, you can easily build a ChatPDF or document analysis AI in no time.
The best part? It's free!
The Reader API is available for free and offers flexible rate limits and pricing. Built on a scalable infrastructure, it offers high accessibility, concurrency, and reliability. We strive to be your preferred grounding solution for your LLMs.
Endpoint | Description | Rate limit without API key | Rate limit with API key | Token counting scheme | Average latency |
---|---|---|---|---|---|
r.jina.ai | Reads a URL and returns its content; useful for check grounding | 20 RPM | 200 RPM | Based on the output tokens | 3 seconds |
s.jina.ai | Searches the web and returns the top five results; useful for search grounding | 5 RPM | 40 RPM | Based on the output tokens of all five search results | 10 seconds |
Don't panic! Every new API key contains one million free tokens!
Are you already a paid API user but still want a higher rate limit of up to 1000 RPM? We can support you!
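To stay under these limits, a client can space its requests evenly. The sketch below only does the arithmetic on the RPM figures from the table. Even spacing is an assumption about how to stay within a limit, since this page does not say how the limit window is enforced, and `minIntervalMs` is a hypothetical helper.

```javascript
// Requests-per-minute limits copied from the table above.
const RATE_LIMITS_RPM = {
  'r.jina.ai': { withoutKey: 20, withKey: 200 },
  's.jina.ai': { withoutKey: 5, withKey: 40 },
};

// Hypothetical helper: smallest gap between requests (in milliseconds)
// that keeps an evenly spaced client at or below the listed RPM.
function minIntervalMs(endpoint, hasApiKey) {
  const limits = RATE_LIMITS_RPM[endpoint];
  if (!limits) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  const rpm = hasApiKey ? limits.withKey : limits.withoutKey;
  return 60000 / rpm; // 60,000 ms in one minute
}

for (const endpoint of Object.keys(RATE_LIMITS_RPM)) {
  console.log(
    `${endpoint}: ${minIntervalMs(endpoint, false)} ms without a key, ` +
      `${minIntervalMs(endpoint, true)} ms with a key`
  );
}
```

For example, 20 RPM works out to one request every 3,000 ms, and 200 RPM to one every 300 ms.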
Reader API
Basic Usage
Read a URL
Add https://r.jina.ai/ to the front of any URL in your code or tool where LLM access is expected. This returns the main content of the page in clean, LLM-friendly text.
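A minimal sketch of the Read prefix in code follows. `toReaderUrl` and `readPage` are hypothetical helper names, and the up-front URL validation is an added convenience rather than something the Reader API requires.

```javascript
// Prefix named in the text above.
const READER_PREFIX = 'https://r.jina.ai/';

// Hypothetical helper: build the Reader URL for a target page.
function toReaderUrl(targetUrl) {
  // Validate early so a typo fails here rather than at request time.
  const parsed = new URL(targetUrl); // throws a TypeError on an invalid URL
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return READER_PREFIX + targetUrl;
}

// Hypothetical helper: fetch the page content as text.
// It is defined but not called, so this snippet runs without network access.
async function readPage(targetUrl) {
  const response = await fetch(toReaderUrl(targetUrl));
  if (!response.ok) {
    throw new Error(`Reader request failed with status ${response.status}`);
  }
  return response.text();
}

console.log(toReaderUrl('https://example.com'));
```

Calling `readPage('https://example.com')` would then resolve to the page content returned by the service.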
Search a query
Add https://s.jina.ai/ to the front of your query. This calls the search engine and returns the top five results with their URLs and contents, each in clean, LLM-friendly text.
Advanced Usage
The behavior of the Reader API can be controlled with request headers. Here is a complete list of supported headers.
Target Selector
Provide a CSS selector to focus on a more specific part of the page. Useful when your desired content doesn't show under the default settings.
Wait For Selector
Wait for a specific element to appear before returning. Useful when your desired content doesn't show under the default settings.
Gather All Links At the End
A "Buttons & Links" section will be created at the end. This helps downstream LLMs or web agents navigate the page or take further actions.
Gather All Images At the End
An "Images" section will be created at the end. This gives the downstream LLMs an overview of all visuals on the page, which may improve reasoning.
Use POST Method
Use the POST method instead of GET, with the URL passed in the request body. Useful for SPAs with hash-based routing, where the part of the URL after # is otherwise not sent to the server.
JSON Response
The response will be in JSON format, containing the URL, title, content, and timestamp (if available). In Search mode, it returns a list of five entries, each following the described JSON structure.
Forward Cookie
Our API server can forward your custom cookie settings when accessing the URL, which is useful for pages requiring extra authentication. Note that requests with cookies will not be cached.
Image Caption
Captions all images at the specified URL, adding 'Image [idx]: [caption]' as an alt tag for those without one. This allows downstream LLMs to interact with the images in activities such as reasoning and summarizing.
Use a Proxy Server
Our API server can utilize your proxy to access URLs, which is helpful for pages accessible only through specific proxies.
Bypass the Cache
Our API server caches both Read and Search mode contents for a certain amount of time. To bypass this cache, set this header to true.
Stream Mode
Stream mode is beneficial for large target pages, allowing more time for the page to fully render. If standard mode results in incomplete content, consider using Stream mode.
Level of Details
You can control the level of detail in the response to prevent over-filtering. The default pipeline is optimized for most websites and LLM input.
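The options above can be sketched as a small header builder. This section names the options but not the header strings, so every header name below is an assumption based on the public Reader documentation as best recalled and should be checked before use. The form-encoded `url` body for POST is likewise an assumption, and `buildReaderHeaders` and `buildPostRequest` are hypothetical helpers. Level of Details is left out.

```javascript
// ASSUMPTION: header names are not given in this section; verify before use.
// buildReaderHeaders is a hypothetical helper, not part of the Reader API.
function buildReaderHeaders(options = {}) {
  if (options.json && options.stream) {
    throw new Error('Choose either JSON Response or Stream Mode, not both');
  }
  const headers = {};
  if (options.json) headers['Accept'] = 'application/json'; // JSON Response
  if (options.stream) headers['Accept'] = 'text/event-stream'; // Stream Mode
  if (options.targetSelector) headers['X-Target-Selector'] = options.targetSelector;
  if (options.waitForSelector) headers['X-Wait-For-Selector'] = options.waitForSelector;
  if (options.linksSummary) headers['X-With-Links-Summary'] = 'true';
  if (options.imagesSummary) headers['X-With-Images-Summary'] = 'true';
  if (options.cookie) headers['X-Set-Cookie'] = options.cookie; // Forward Cookie
  if (options.imageCaption) headers['X-With-Generated-Alt'] = 'true'; // Image Caption
  if (options.proxyUrl) headers['X-Proxy-Url'] = options.proxyUrl; // Use a Proxy Server
  if (options.noCache) headers['X-No-Cache'] = 'true'; // Bypass the Cache
  return headers;
}

// "Use POST Method": the target URL travels in the body instead of the path.
// ASSUMPTION: a form-encoded `url` field; the body format is not given above.
function buildPostRequest(targetUrl, options = {}) {
  return {
    endpoint: 'https://r.jina.ai/',
    init: {
      method: 'POST',
      headers: {
        ...buildReaderHeaders(options),
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      body: new URLSearchParams({ url: targetUrl }).toString(),
    },
  };
}

const postRequest = buildPostRequest('https://example.com/#/route', {
  json: true,
  noCache: true,
});
console.log(postRequest.init.headers);
console.log(postRequest.init.body);
// To send it: fetch(postRequest.endpoint, postRequest.init)
```

Keeping the builders free of network calls makes it easy to inspect exactly which headers a request would carry before sending it.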
Default
Request (bash)
curl 'https://r.jina.ai/https://example.com'
Request (javascript)
fetch('https://r.jina.ai/https://example.com', {
  method: 'GET',
})
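The request above covers Read mode. For the Search mode described under Basic Usage, a comparable sketch follows. It assumes the query should be percent-encoded before being appended to the prefix, which this page does not spell out. `toSearchUrl` and `searchWeb` are hypothetical helpers.

```javascript
// Prefix named in the Basic Usage section.
const SEARCH_PREFIX = 'https://s.jina.ai/';

// Hypothetical helper: build the Search URL for a free-text query.
function toSearchUrl(query) {
  const trimmed = query.trim();
  if (trimmed === '') {
    throw new Error('Search query must not be empty');
  }
  // ASSUMPTION: percent-encode so spaces, '?', '#', and '&' stay in the path.
  return SEARCH_PREFIX + encodeURIComponent(trimmed);
}

// Hypothetical helper: fetch the search results as text.
// It is defined but not called, so this snippet runs without network access.
async function searchWeb(query) {
  const response = await fetch(toSearchUrl(query), { method: 'GET' });
  if (!response.ok) {
    throw new Error(`Search request failed with status ${response.status}`);
  }
  return response.text();
}

console.log(toSearchUrl('When was Jina AI founded?'));
```

Because Search mode has a lower rate limit and a higher average latency than Read mode, a caller may want to cache results for repeated queries on its own side.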
API Pricing
Our API pricing is structured around the number of tokens sent in the requests. For the Reader API, it is the number of tokens in the responses. This pricing model applies to all products in Jina AI's search foundation: the Embedding, Reranking, Reader, and Auto Fine-Tuning APIs. With the same API key, you have access to all API services.
Auto-recharge when tokens are low
Recommended for uninterrupted service in production. When your token balance is below the threshold you set, we will automatically recharge your credit card for the same amount as your last top-up. If you purchased multiple packs in the last top-up, we will recharge only one pack.
Depending on your location, you may be charged in USD, EUR, or other currencies. Taxes may apply.
Reader-related common questions
What are the costs associated with using the Reader API?
How does the Reader API function?
Is the Reader API open source?
What is the typical latency for the Reader API?
Why should I use the Reader API instead of scraping the page myself?
Does the Reader API support multiple languages?
What should I do if a website blocks the Reader API?
Can the Reader API extract content from PDF files?
Can the Reader API process media content from web pages?
Is it possible to use the Reader API on local HTML files?
Does Reader API cache the content?
Can I use the Reader API to access content behind a login?
Can I use the Reader API to access PDF on arXiv?
How does image caption work in Reader?
What is the scalability of the Reader? Can I use it in production?
What is the rate limit of the Reader API?
API-related common questions
Can I use the same API key for embedding, reranking, reader, fine-tuning APIs?
Can I monitor the token usage of my API key?
What should I do if I forget my API key?
Do API keys expire?
Why is the first request for some models slow?
Is user input data used for training your models?
Billing-related common questions
Is billing based on the number of sentences or requests?
Is there a free trial available for new users?
Are tokens charged for failed requests?
What payment methods are accepted?
Is invoicing available for token purchases?