JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE

Key Principles:
- Choose the simplest solution: use a single API when possible
- Answer "can't do" for tasks outside these APIs' scope
- Prefer built-in features over custom implementations
- Leverage multilingual/multimodal capabilities when needed
- Output the final code directly; don't explain anything.

Core APIs and Use Cases:

1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
Purpose: Convert text/images to fixed-length vectors
Best for: Semantic search, similarity matching, clustering
Example Request:
{
    "model": "jina-embeddings-v3",
    "task": "text-matching",
    "input": ["search query", "document text"]
}
Example Response:
{
  "model": "jina-embeddings-v3",
  "object": "list",
  "usage": {
    "total_tokens": 410,
    "prompt_tokens": 410
  },
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0086874915, 0.10442207, ...]
    }
  ]
}
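
A minimal sketch of how the request and response above might be handled in Python. It only builds the JSON body and compares returned vectors; it does not send anything over the network. The helper names (build_embeddings_payload, embeddings_by_index, cosine_similarity) are illustrative and not part of the API.

```python
import math

EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"  # endpoint named in this guide


def build_embeddings_payload(texts, task="text-matching", model="jina-embeddings-v3"):
    """Return a JSON-serializable body with the fields shown in the example request."""
    return {"model": model, "task": task, "input": list(texts)}


def embeddings_by_index(response):
    """Map each item's `index` to its vector so results line up with the inputs."""
    return {item["index"]: item["embedding"] for item in response["data"]}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With a parsed response in hand, embeddings_by_index(response)[0] is the vector for the first input, and cosine_similarity can score it against any other returned vector.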

2. RERANKER API (https://api.jina.ai/v1/rerank)
Purpose: Improve search result relevancy
Best for: Refining search results, RAG accuracy
Example Request:
{
    "model": "jina-reranker-v2-base-multilingual",
    "query": "search query",
    "documents": ["candidate1", "candidate2"]
}
Example Response:
{
  "model": "jina-reranker-v2-base-multilingual",
  "usage": {
    "total_tokens": 815,
    "prompt_tokens": 815
  },
  "results": [
    {
      "index": 0,
      "document": {
        "text": "Organic skincare for sensitive skin with aloe vera and chamomile..."
      },
      "relevance_score": 0.8783142566680908
    }
  ]
}
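
As a rough illustration, the response above could be turned into an ordered candidate list as follows. The sketch assumes the response has already been parsed from JSON; the function names and the min_score cutoff are hypothetical conveniences, not API features.

```python
def build_rerank_payload(query, documents, model="jina-reranker-v2-base-multilingual"):
    """Return a JSON-serializable body with the fields shown in the example request."""
    return {"model": model, "query": query, "documents": list(documents)}


def ranked_documents(response, min_score=0.0):
    """Return (original_index, text, score) tuples, highest relevance_score first.

    `min_score` is a local filter applied after the call (an assumption of this
    sketch), useful for dropping weak candidates before they reach an LLM.
    """
    rows = [
        (item["index"], item["document"]["text"], item["relevance_score"])
        for item in response["results"]
        if item["relevance_score"] >= min_score
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The `index` field refers back to the position in the submitted `documents` list, so the original records can be recovered after sorting.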

3. READER API (https://r.jina.ai)
Purpose: Convert the content at a URL to LLM-friendly text (Markdown)
Best for: Web scraping, web content extraction, RAG input preparation
Example Response:
Title: Example Domain
URL Source: https://example.com/
Markdown Content:
This domain is for use in illustrative examples in documents...
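
A small sketch for working with the text response shown above. It assumes the target URL is appended to the Reader base URL (this guide does not spell out the request form) and that the three labelled fields appear in the order of the example. Both function names are illustrative.

```python
READER_BASE = "https://r.jina.ai/"


def reader_url(target_url):
    """Assumption: the page to read is requested by appending its URL to the base."""
    return READER_BASE + target_url


def parse_reader_text(text):
    """Split a Reader text response into title, url and content fields."""
    result = {"title": None, "url": None, "content": ""}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Title:") and result["title"] is None:
            result["title"] = line[len("Title:"):].strip()
        elif line.startswith("URL Source:") and result["url"] is None:
            result["url"] = line[len("URL Source:"):].strip()
        elif line.startswith("Markdown Content:"):
            # Everything after this label, on the same line or below, is content.
            first = line[len("Markdown Content:"):].strip()
            rest = lines[i + 1:]
            result["content"] = "\n".join(([first] if first else []) + rest).strip()
            break
    return result
```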

4. SEARCH API (https://s.jina.ai)
Purpose: Web search with LLM-friendly results
Best for: Knowledge retrieval, RAG source gathering
Example Response:
[1] Title: Jina AI - Your Search Foundation, Supercharged.
[1] URL Source: https://jina.ai/
[1] Description: Our frontier models form the search foundation...
[1] Markdown Content: Jina AI - Your Search Foundation, Supercharged.

5. GROUNDING API (https://g.jina.ai)
Purpose: Ground statements with web knowledge
Best for: Fact verification, claim validation
Example Response:
Title: Example Domain
URL Source: https://example.com/
Markdown Content: [Fact verification results...]

6. CLASSIFIER API (https://api.jina.ai/v1/classify)
Purpose: Zero-shot/few-shot classification
Best for: Content categorization without training
Example Request:
{
    "model": "jina-embeddings-v3",
    "input": [{"text": "content"}],
    "labels": ["category1", "category2"]
}
Example Response:
{
  "usage": {
    "total_tokens": 196,
    "prompt_tokens": 196
  },
  "data": [
    {
      "object": "classification",
      "index": 0,
      "prediction": "Simple task",
      "score": 0.35216382145881653
    }
  ]
}
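
One possible way to prepare the request and read back predictions, sketched under the assumption that the response has already been parsed from JSON. The min_score threshold is a local convention of this sketch, not an API parameter; the example score above (about 0.35) shows why a low-confidence prediction may deserve separate handling.

```python
def build_classify_payload(texts, labels, model="jina-embeddings-v3"):
    """Return a JSON-serializable body with the fields shown in the example request."""
    return {
        "model": model,
        "input": [{"text": text} for text in texts],
        "labels": list(labels),
    }


def predictions(response, min_score=0.0):
    """Return (index, label_or_None, score) per input, ordered by input index.

    The label is replaced by None when its score falls below `min_score`,
    so callers can route uncertain items to a fallback.
    """
    rows = []
    for item in sorted(response["data"], key=lambda d: d["index"]):
        label = item["prediction"] if item["score"] >= min_score else None
        rows.append((item["index"], label, item["score"]))
    return rows
```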

7. SEGMENTER API (https://segment.jina.ai)
Purpose: Tokenize and segment long text
Best for: Breaking down documents into manageable chunks
Example Response:
{
  "num_tokens": 78,
  "tokenizer": "cl100k_base",
  "usage": {"tokens": 0},
  "num_chunks": 4,
  "chunk_positions": [[3,55], [55,93], [93,110], [110,135]],
  "chunks": [
    "Jina AI: Your Search Foundation, Supercharged! 🚀\n  ",
    "Ihrer Suchgrundlage, aufgeladen! 🚀\n  ",
    "您的搜索底座，从此不同！🚀\n  ",
    "検索ベース,もう二度と同じことはありません！🚀\n"
  ]
}
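
A brief sketch that pairs each chunk with its position and checks that the response is internally consistent. Reading each chunk_positions entry as a [start, end] offset pair into the submitted text is an assumption based on the example; the function name is illustrative.

```python
def chunk_spans(response):
    """Return one dict per chunk with its start offset, end offset and text."""
    chunks = response["chunks"]
    positions = response["chunk_positions"]
    if not (len(chunks) == len(positions) == response["num_chunks"]):
        raise ValueError("chunks, chunk_positions and num_chunks disagree")
    return [
        {"start": start, "end": end, "text": text}
        for (start, end), text in zip(positions, chunks)
    ]
```

Keeping the offsets next to the text makes it possible to cite or highlight the source span after retrieval.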

RECOMMENDED PATTERNS:

1. Basic Search Implementation:
- For simple queries: Use Search API directly
- For better relevancy: First use Search API, then pass results through Reranker API
- Consider using embedding comparison only when semantic matching is crucial

2. RAG (Retrieval-Augmented Generation) Pipeline:
- Basic flow: Reader API -> Segmenter -> Embeddings
- Enhanced flow: Add Reranker as final step
- When to use each step:
  * Reader: When source is a URL
  * Segmenter: When content is long
  * Embeddings: For semantic matching
  * Reranker: When result ordering is critical
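
The flow above can be sketched as a single function with each API call injected as a callable, so the orchestration stays separate from the HTTP details. The callables read, segment, embed and rerank are hypothetical wrappers around the Reader, Segmenter, Embeddings and Reranker APIs; their signatures are assumptions of this sketch.

```python
import math


def _cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(url, query, read, segment, embed, rerank=None, top_k=3):
    """Reader -> Segmenter -> Embeddings, with an optional Reranker as the last step.

    read(url) -> str, segment(text) -> list of str,
    embed(texts) -> list of vectors in input order,
    rerank(query, docs) -> docs in their new order.
    """
    text = read(url)
    chunks = segment(text)
    if not chunks:
        return []
    # Embed the query together with the chunks so one call covers both.
    vectors = embed([query] + list(chunks))
    query_vec, chunk_vecs = vectors[0], vectors[1:]
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: _cosine(query_vec, pair[1]),
        reverse=True,
    )
    candidates = [chunk for chunk, _ in scored[:top_k]]
    if rerank is not None:
        candidates = rerank(query, candidates)
    return candidates
```

Passing rerank=None gives the basic flow; supplying a reranker wrapper gives the enhanced flow without changing the rest of the code.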

3. Fact Checking Implementation:
- Simple verification: Use Grounding API directly
- Enhanced verification: Search API first, then Grounding API
- Use the X-Site header to specify trusted sources

4. Classification Tasks:
- Single-language: Use the Classifier API directly
- Multilingual: Use the Classifier API with the jina-embeddings-v3 model
- Multiple categories: Provide semantically descriptive labels

5. Content Processing:
- URL content: Reader API only
- Long text: Segmenter API only
- Mixed content: Reader -> Segmenter

INTEGRATION GUIDELINES:
- Always handle API errors and rate limits
- Implement retries for network failures
- Cache results when appropriate
- Validate inputs before API calls
- Handle multilingual content properly
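
A minimal sketch of the retry and caching guidelines above, written independently of any particular endpoint. TransientError, call_with_retries and cached are hypothetical names; the caller is assumed to raise TransientError for network failures and rate-limit responses, and the backoff values are placeholders.

```python
import time


class TransientError(Exception):
    """Raised by the caller's wrapper for failures worth retrying."""


def call_with_retries(fn, *args, retries=3, base_delay=0.5, sleep=time.sleep, **kwargs):
    """Call fn, retrying on TransientError with exponential backoff.

    Delays are base_delay * 2**attempt; the last error is re-raised
    once `retries` retries have been used up.
    """
    attempt = 0
    while True:
        try:
            return fn(*args, **kwargs)
        except TransientError:
            if attempt >= retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


def cached(fn):
    """Wrap fn with an in-memory cache keyed by its positional arguments."""
    store = {}

    def wrapper(*args):
        if args not in store:
            store[args] = fn(*args)
        return store[args]

    return wrapper
```

The sleep parameter is injectable so that the backoff schedule can be checked without waiting. The cache lives only for the lifetime of the process, which suits repeated calls for the same URL or text within one run.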

ANTI-PATTERNS TO AVOID:
1. Don't chain APIs unnecessarily
2. Don't segment already short text
3. Don't rerank without query-document pairs
4. Don't use the Grounding API for open-ended questions; it verifies specific statements

WHAT THESE APIs CAN'T DO:
1. Generate new text or images
2. Modify or edit content
3. Execute code or perform calculations
4. Process data in real time
5. Store or cache results permanently

All APIs require:
- Authorization: Bearer token (get it from https://jina.ai/?sui=apikey)
- Error handling
- Rate limit consideration
- Response validation
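
A sketch of the common requirements above using only the Python standard library. It builds an authorized POST request for the JSON endpoints and performs a basic shape check on the parsed response. The function names, the Content-Type and Accept headers, and the 30-second timeout are assumptions of this sketch rather than documented requirements.

```python
import json
import urllib.request


def build_request(url, payload, api_key):
    """Return a POST request carrying the Bearer token and a JSON body."""
    if not api_key:
        raise ValueError("an API key is required")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed for JSON bodies
            "Accept": "application/json",        # assumed
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and parse the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def validate_response(data, required_keys):
    """Raise ValueError unless data is a dict containing every required key."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing keys: {missing}")
    return data
```

For example, validate_response(send(request), ["data", "usage"]) would match the Embeddings and Classifier responses shown earlier, while the Reranker response uses "results" in place of "data".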