JINA AI SEARCH FOUNDATION APIs - IMPLEMENTATION GUIDE

Key Principles:
- Choose the simplest solution: use a single API when possible
- Answer "can't do" for tasks outside these APIs' scope
- Prefer built-in features over custom implementations
- Leverage multilingual/multimodal capabilities when needed
- Output the final code directly; don't explain anything.

Core APIs and Use Cases:

1. EMBEDDINGS API (https://api.jina.ai/v1/embeddings)
Purpose: Convert text/images to fixed-length vectors
Best for: Semantic search, similarity matching, clustering
Example Request:
  {
    "model": "jina-embeddings-v3",
    "task": "text-matching",
    "input": ["search query", "document text"]
  }
Example Response:
  {
    "model": "jina-embeddings-v3",
    "object": "list",
    "usage": { "total_tokens": 410, "prompt_tokens": 410 },
    "data": [
      {
        "object": "embedding",
        "index": 0,
        "embedding": [0.0086874915, 0.10442207, ...]
      }
    ]
  }

2. RERANKER API (https://api.jina.ai/v1/rerank)
Purpose: Improve search result relevance
Best for: Refining search results, RAG accuracy
Example Request:
  {
    "model": "jina-reranker-v2-base-multilingual",
    "query": "search query",
    "documents": ["candidate1", "candidate2"]
  }
Example Response:
  {
    "model": "jina-reranker-v2-base-multilingual",
    "usage": { "total_tokens": 815, "prompt_tokens": 815 },
    "results": [
      {
        "index": 0,
        "document": { "text": "Organic skincare for sensitive skin with aloe vera and chamomile..." },
        "relevance_score": 0.8783142566680908
      }
    ]
  }

3. READER API (https://r.jina.ai)
Purpose: Convert URLs to LLM-friendly text (essentially Markdown)
Best for: Web scraping, web content extraction, RAG input preparation
Example Response:
  Title: Example Domain
  URL Source: https://example.com/
  Markdown Content:
  This domain is for use in illustrative examples in documents...

4. SEARCH API (https://s.jina.ai)
Purpose: Web search with LLM-friendly results
Best for: Knowledge retrieval, RAG source gathering
Example Response:
  [1] Title: Jina AI - Your Search Foundation, Supercharged.
  [1] URL Source: https://jina.ai/
  [1] Description: Our frontier models form the search foundation...
  [1] Markdown Content: Jina AI - Your Search Foundation, Supercharged.

5. GROUNDING API (https://g.jina.ai)
Purpose: Ground statements with web knowledge
Best for: Fact verification, claim validation
Example Response:
  Title: Example Domain
  URL Source: https://example.com/
  Markdown Content:
  [Fact verification results...]

6. CLASSIFIER API (https://api.jina.ai/v1/classify)
Purpose: Zero-shot/few-shot classification
Best for: Content categorization without training
Example Request:
  {
    "model": "jina-embeddings-v3",
    "input": [{"text": "content"}],
    "labels": ["category1", "category2"]
  }
Example Response:
  {
    "usage": { "total_tokens": 196, "prompt_tokens": 196 },
    "data": [
      {
        "object": "classification",
        "index": 0,
        "prediction": "Simple task",
        "score": 0.35216382145881653
      }
    ]
  }

7. SEGMENTER API (https://segment.jina.ai)
Purpose: Tokenize and segment long text
Best for: Breaking down documents into manageable chunks
Example Response:
  {
    "num_tokens": 78,
    "tokenizer": "cl100k_base",
    "usage": {"tokens": 0},
    "num_chunks": 4,
    "chunk_positions": [[3,55], [55,93], [93,110], [110,135]],
    "chunks": [
      "Jina AI: Your Search Foundation, Supercharged! 🚀\n ",
      "Ihrer Suchgrundlage, aufgeladen! 🚀\n ",
      "您的搜索底座，从此不同！🚀\n ",
      "検索ベース,もう二度と同じことはありません！🚀\n"
    ]
  }

RECOMMENDED PATTERNS:

1. Basic Search Implementation:
- For simple queries: Use Search API directly
- For better relevance: First use Search API, then pass results through Reranker API
- Consider using embedding comparison only when semantic matching is crucial

2. RAG (Retrieval-Augmented Generation) Pipeline:
- Basic flow: Reader API -> Segmenter -> Embeddings
- Enhanced flow: Add Reranker as final step
- When to use each step:
  * Reader: When source is a URL
  * Segmenter: When content is long
  * Embeddings: For semantic matching
  * Reranker: When result ordering is critical

3. Fact Checking Implementation:
- Simple verification: Use Grounding API directly
- Enhanced verification: Search API first, then Grounding API
- Use X-Site header to specify trusted sources

4. Classification Tasks:
- Single-language: Use Classifier API directly
- Multilingual: Use the jina-embeddings-v3 model
- Multiple categories: Provide semantic labels

5. Content Processing:
- URL content: Reader API only
- Long text: Segmenter API only
- Mixed content: Reader -> Segmenter

INTEGRATION GUIDELINES:
- Always handle API errors and rate limits
- Implement retries for network failures
- Cache results when appropriate
- Validate inputs before API calls
- Handle multilingual content properly

ANTI-PATTERNS TO AVOID:
1. Don't chain APIs unnecessarily
2. Don't segment text that is already short
3. Don't rerank without query-document pairs
4. Don't use grounding for open questions

WHAT THESE APIs CAN'T DO:
1. Generate new text or images
2. Modify or edit content
3. Execute code or perform calculations
4. Process data in real time
5. Store or cache results permanently

All APIs require:
- Authorization: Bearer token (get it from https://jina.ai/?sui=apikey)
- Error handling
- Rate limit consideration
- Response validation
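
The requirements above (Bearer token, error handling, rate limits, response validation) can be sketched for the Reranker API as follows. This is a minimal illustration using only the Python standard library, not an official client. The function names `build_json_request`, `send_with_retries` and `rerank` are hypothetical. The use of HTTP POST with a JSON body, the `Content-Type` and `Accept` headers, and the retry policy (retry on HTTP 429, 5xx and network failures with exponential backoff) are assumptions that the guide does not specify. The endpoint URL, model name and request/response fields are taken from the examples in this guide.

```python
import json
import time
import urllib.error
import urllib.request

RERANK_URL = "https://api.jina.ai/v1/rerank"  # endpoint as listed in this guide


def build_json_request(url, payload, api_key):
    """Build a request with the Bearer token and a JSON body (POST is assumed)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed header
            "Accept": "application/json",  # assumed header
        },
        method="POST",
    )


def send_with_retries(request, opener=urllib.request.urlopen,
                      max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Send the request and parse JSON, retrying on 429/5xx and network errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(request) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Assumed policy: only rate limits and server errors are retried.
            retryable = err.code == 429 or err.code >= 500
            if not retryable or attempt == max_attempts:
                raise
        except urllib.error.URLError:
            # Network-level failure (DNS, refused connection, ...).
            if attempt == max_attempts:
                raise
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))


def rerank(query, documents, api_key, send=send_with_retries):
    """Return (document, relevance_score) pairs, most relevant first."""
    # Input validation: reranking needs query-document pairs.
    if not query or not documents:
        raise ValueError("rerank needs a query and at least one document")
    payload = {
        "model": "jina-reranker-v2-base-multilingual",
        "query": query,
        "documents": documents,
    }
    body = send(build_json_request(RERANK_URL, payload, api_key))
    # Response validation: the guide's example response carries a "results" list.
    results = body.get("results")
    if not isinstance(results, list):
        raise ValueError("unexpected response: missing 'results' list")
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]
```

A call such as `rerank("search query", ["candidate1", "candidate2"], api_key)` would then return the candidates ordered by score. Passing `opener`, `sleep` and `send` as parameters is a design choice that lets the retry and validation logic be exercised without network access.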