While bi-encoder models such as Jina Embeddings can quickly retrieve many matching documents from a database of pre-computed embeddings, reranker models can refine this set by using a slower but more precise approach: cross-encoding users’ queries and retrieved documents. Jina AI has released our first reranker model, jina-reranker-v1-base-en, and in this article we’ll explain in depth why a reranker is essential for optimizing RAG accuracy and how to get started building a state-of-the-art RAG system using Jina Embeddings/Reranker, LlamaIndex, and the Mixtral-8x7B-Instruct-v0.1 language model (hosted on HuggingFace).
You’ll need a HuggingFace access token and a Jina AI API key; both are used in the setup steps below.
Since the Jina Embeddings and Reranker models as well as Mixtral run remotely and are accessed via a RESTful API, you won’t need any special hardware.
What is a reranker?
Before continuing with the tutorial, let’s briefly recap what rerankers are. For a full explanation of what a reranker is and why Jina Reranker V1 is the best choice for you, we encourage you to read our Jina Reranker V1 release post first.
In summary, rerankers are cross-encoder models that take as input a document-query pair, and emit a combined relevance score for that input pair. Using rerankers, users can sort documents from most to least relevant for a given query.
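To make the idea concrete, here is a minimal sketch of reranking as "score each query-document pair, then sort". The `toy_relevance` function is a hypothetical stand-in based on word overlap, not the Jina Reranker model; a real cross-encoder would replace it with a neural network that reads the query and the document together.

```python
from typing import Callable, List, Tuple


def toy_relevance(query: str, document: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the document.

    This is NOT how a neural reranker scores text; it only mimics the
    interface (one query-document pair in, one relevance score out).
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float] = toy_relevance,
) -> List[Tuple[float, str]]:
    """Score every (query, document) pair and sort from most to least relevant."""
    scored = [(score_pair(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "small duffel bag with shoulder strap",
    "football pants with integrated padded protection",
    "padded football pants for kids",
]
ranked = rerank("padded pants for kids", docs)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

Because the scorer is passed in as a parameter, the sorting logic stays the same no matter how the pair score is produced.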
Why use jina-reranker-v1-base-en?
Reranking surfaces much more relevant results than an embedding model alone. In our model release post, we demonstrated that Jina Reranker stands out compared to its open- and closed-source competitors and that it can improve search systems by 8% in hit rate and 33% in mean reciprocal rank.
This has a direct impact on the quality of the responses a RAG solution produces. Having covered the theory, we’ll now walk through a practical example so you can see for yourself what effect Jina Reranker has on a RAG pipeline built with LlamaIndex.
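For readers unfamiliar with the two metrics quoted above, the following sketch computes them under their common definitions, assuming exactly one relevant document per query. The document IDs and rankings are made up for illustration and are not taken from our benchmark.

```python
from typing import Sequence


def hit_rate(
    rankings: Sequence[Sequence[str]], relevant: Sequence[str], k: int
) -> float:
    """Fraction of queries whose relevant document appears in the top k results."""
    hits = sum(
        1 for ranking, target in zip(rankings, relevant) if target in ranking[:k]
    )
    return hits / len(relevant)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevant: Sequence[str]
) -> float:
    """Average of 1/rank of the relevant document (0 when it is missing)."""
    total = 0.0
    for ranking, target in zip(rankings, relevant):
        if target in ranking:
            total += 1.0 / (list(ranking).index(target) + 1)
    return total / len(relevant)


# Two made-up queries, each with a single relevant document.
relevant = ["pants", "jersey"]
before = [["duffel", "socks", "pants"], ["jersey", "cap", "shorts"]]
after = [["pants", "duffel", "socks"], ["jersey", "cap", "shorts"]]

print(hit_rate(before, relevant, k=2), hit_rate(after, relevant, k=2))
print(mean_reciprocal_rank(before, relevant), mean_reciprocal_rank(after, relevant))
```

Hit rate only asks whether the right document made it into the top k, while mean reciprocal rank also rewards placing it closer to the top, which is where a reranker typically helps most.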
Before we start: A Note on LlamaIndex Node-Postprocessors
Node-postprocessors in LlamaIndex are modules that transform or filter nodes after retrieval and before response synthesis within a query engine. As part of this package, LlamaIndex offers both built-in options as well as an API for custom additions.
Jina Reranker has now been integrated into LlamaIndex as a node postprocessor. To increase response accuracy, retrieved nodes are re-ordered based on relevance to the query, and the top N nodes are returned.
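Conceptually, a node postprocessor is a function that takes the retrieved nodes plus the query and returns a (possibly shorter, possibly reordered) list of nodes. The sketch below illustrates that idea in plain Python; `ScoredNode`, `similarity_cutoff`, and `keyword_rerank` are hypothetical names invented for this example and are not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node: a text chunk plus a score."""
    text: str
    score: float


Postprocessor = Callable[[List[ScoredNode], str], List[ScoredNode]]


def similarity_cutoff(min_score: float) -> Postprocessor:
    """Filter step: drop nodes whose retrieval score is below min_score."""
    def apply(nodes: List[ScoredNode], query: str) -> List[ScoredNode]:
        return [n for n in nodes if n.score >= min_score]
    return apply


def keyword_rerank(top_n: int) -> Postprocessor:
    """Reorder step: rescore by word overlap (toy reranker) and keep top_n nodes."""
    def apply(nodes: List[ScoredNode], query: str) -> List[ScoredNode]:
        words = set(query.lower().split())
        rescored = [
            ScoredNode(n.text, float(len(words & set(n.text.lower().split()))))
            for n in nodes
        ]
        rescored.sort(key=lambda n: n.score, reverse=True)
        return rescored[:top_n]
    return apply


def run_postprocessors(
    nodes: List[ScoredNode], query: str, postprocessors: List[Postprocessor]
) -> List[ScoredNode]:
    """Apply each postprocessor in order, between retrieval and synthesis."""
    for postprocess in postprocessors:
        nodes = postprocess(nodes, query)
    return nodes


retrieved = [
    ScoredNode("small duffel bag", 0.81),
    ScoredNode("padded football pants", 0.78),
    ScoredNode("team crew socks", 0.40),
    ScoredNode("football pants without pads", 0.75),
]
final = run_postprocessors(
    retrieved, "padded pants", [similarity_cutoff(0.5), keyword_rerank(top_n=2)]
)
print([n.text for n in final])
```

In the real integration, the reranking step is performed by the Jina Reranker model rather than by word overlap, but it occupies the same position in the pipeline.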
Follow along on Google Colab
This tutorial has an accompanying notebook that you can run on Google Colab or locally.
The dataset: 2024 Nike Kids Product Catalog
To showcase Jina Reranker’s performance increase for RAG applications, we’ve chosen the 2024 Nike Kids Product Catalog as our dataset. The document contains a structured set of kids’ products offered by Nike in 2024. We selected this dataset as it showcases the effect of using a reranker clearly and is relatable to most users.
Install the prerequisites
To install the requirements, run:
pip install llama-index-postprocessor-jinaai-rerank
pip install llama-index-embeddings-jinaai
pip install llama-index
pip install llama-index-llms-huggingface
pip install "huggingface_hub[inference]"
Access Mixtral LLM
To use the Mixtral-8x7B-Instruct-v0.1 LLM, you need a HuggingFace token.
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

hf_inference_api_key = "<your HuggingFace access token here>"

mixtral_llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    token=hf_inference_api_key,
)
Access Jina Embeddings and Jina Reranker
To use Jina Embeddings and Jina Reranker, you need a dedicated API key. Store it in a variable called api_key and call the Jina Embeddings model from LlamaIndex:
from llama_index.embeddings.jinaai import JinaEmbedding
api_key = "<your Jina key here>"
jina_embeddings = JinaEmbedding(api_key=api_key)
Similarly, you can call the Jina Reranker model. By setting the top_n parameter, you can decide how many of the most relevant documents to return in the final output. In this case, we set top_n=2:
from llama_index.postprocessor.jinaai_rerank import JinaRerank
jina_rerank = JinaRerank(api_key=api_key, top_n=2)
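If you are curious what a rerank call looks like without LlamaIndex, the sketch below builds (but does not send) an HTTP request. It assumes the endpoint URL and the JSON field names from Jina AI’s public API documentation; treat both as assumptions and check the current documentation before relying on them. `build_rerank_request` is a hypothetical helper written for this example.

```python
import json
import urllib.request
from typing import List

# Assumed endpoint, taken from Jina AI's public API documentation.
JINA_RERANK_URL = "https://api.jina.ai/v1/rerank"


def build_rerank_request(
    api_key: str, query: str, documents: List[str], top_n: int = 2
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for the top_n documents."""
    payload = {
        "model": "jina-reranker-v1-base-en",
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        JINA_RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_rerank_request(
    api_key="<your Jina key here>",
    query="What are the best padded pants that Nike sells?",
    documents=["Nike Brasilia small duffel", "Nike Recruit pants with integrated pads"],
)

# Sending the request needs a valid key and network access, so it is left
# commented out here:
# with urllib.request.urlopen(request) as reply:
#     print(json.loads(reply.read()))
```

In this tutorial, the JinaRerank postprocessor handles the API communication for you, so you never need to build such a request by hand.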
Download the 2024 Nike Kids Product Catalog
To download the data, run the following code:
from llama_index.core import SimpleDirectoryReader
import requests

url = 'https://niketeam-asset-download.nike.net/catalogs/2024/2024_Nike%20Kids_02_09_24.pdf?cb=09302022'

# Download the catalog and stop early if the request failed
response = requests.get(url)
response.raise_for_status()

with open('Nike_Catalog.pdf', 'wb') as f:
    f.write(response.content)

# Parse the PDF into LlamaIndex documents
reader = SimpleDirectoryReader(
    input_files=["Nike_Catalog.pdf"]
)
documents = reader.load_data()
Generate and index embeddings with Jina Embeddings
Now that the setup is complete, we’ll generate the embedding vectors (nodes) and index them. Jina Embeddings v2 models accept input of up to 8192 tokens, large enough that for a document like this, we don’t need to do any further text segmentation or check if any section has too many tokens. To embed and index the document, run the following code:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(
    documents=documents, embed_model=jina_embeddings
)
Query for results without Jina Reranker
When we query for specific information from this set of texts, the LlamaIndex query_engine does the following:
- With Jina Embeddings V2, it creates an embedding for the query.
- It uses the index to get the top_k = 10 stored embeddings with the highest cosine similarity to the query embedding and returns their positions in the index.
- It looks up the corresponding text in the vector data array.
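The retrieval steps above can be sketched in a few lines of plain Python. This is a simplified illustration with made-up two-dimensional vectors, not LlamaIndex’s actual implementation; `cosine_similarity` and `top_k` are hypothetical helpers, and the vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    stored_vecs: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    # Score every stored embedding and remember its position in the index.
    scored = [
        (cosine_similarity(query_vec, vec), position)
        for position, vec in enumerate(stored_vecs)
    ]
    scored.sort(reverse=True)
    # Look up the text that belongs to each of the best positions.
    return [(score, texts[position]) for score, position in scored[:k]]


texts = ["duffel bag", "padded pants", "crew socks"]
vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vector = [0.5, 0.9]  # pretend this came from the embedding model

results = top_k(query_vector, vectors, texts, k=2)
print([text for _, text in results])
```

Note that this stage compares embeddings that were computed independently of each other, which is what makes it fast and also what leaves room for a reranker to improve the ordering afterwards.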
Let’s ask which padded pants are the best that Nike sells:
query_engine = index.as_query_engine(
    similarity_top_k=10, llm=mixtral_llm
)
response = query_engine.query(
    "What are the best padded pants that Nike sells?",
)
print(response.source_nodes[0].text)
Result:
NIKE KIDS EQUIPMENT87NIKE BRASILIA SMALL DUFFEL 9.5
DM3976 $37.00
SIZES: Misc OFFER DATE: 07/01/22 END DATE: 07/01/25
Tough 600D polyester • Durable 300D polyester • Detachable shoulder
strap • Ventilated shoe or wet/dry storage • Secure zip pocket •
Limited lifetime guarantee • Screened Swoosh design trademark
DIMENSIONS: 20" L x 10" W x 11" H
010 Black/Black/(White) 068 Iron Grey/Black/(White)
...
Query for results with Reranker
We now want to apply the reranker to see if the RAG application yields a different, more relevant result. To do so, we need to add the node_postprocessors parameter to the query_engine:
query_engine = index.as_query_engine(
    similarity_top_k=10, llm=mixtral_llm, node_postprocessors=[jina_rerank]
)
response = query_engine.query(
    "What are the best padded pants that Nike sells?",
)
print(response.source_nodes[0].text)
Note that, compared to the previous case without the reranker, the query_engine now also has the node_postprocessors parameter set to [jina_rerank].
Result:
NIKE KIDS FOOTBALL – STOCK10
DJ5731 $47.00
SIZES: XS, S, M, L, XL, 2XL, 3XL
FABRIC: Body/panels lining: 100% polyester. Pad: 100%
ethylene vinyl acetate.
OFFER DATE: 04/01/23
END DATE: 04/01/27
Take the field ready to give it your all in the Nike Recruit
Pants. They’re made from lightweight, stretchy fabric with
sweat-wicking power to help keep you dry and moving freely
when the game heats up. With integrated pads shaped for a
comfortable fit, you’ll be prepared for a performance you can
be proud of. Choose from 6 different colors to outfit your
team. Nike Dri-FIT technology moves sweat away from your skin
for quicker evaporation, helping you stay dry and comfortable.
Lightweight knit fabric stretches with you to let you move
naturally. Thigh, knee, hip and tailbone pads are shaped for
an optimal fit, without compromising on coverage. A
body-hugging fit is designed to help keep the padding in place
and close to the body. Belt at the waist lets you dial in your
perfect fit to maximize comfort. Elastic at hems.
Hip width: 15", Inseam length: 11.75" (size medium).
010 Black/(White) 060 Team Anthracite/(White) 100 White/(Black)
419 Team Navy/(White) 493 Team Royal/(White) 657 Team Scarlet/(White)
Conclusion
As we can see, the query without the reranker returns the Nike Brasilia Small Duffel as its top result, an equipment bag that has nothing to do with padded pants. In comparison, by using a reranker, we obtain the Nike Recruit Pants as the top result, which come with “integrated pads shaped for a comfortable fit” and list “Thigh, knee, hip and tailbone pads”.
The second result is much more accurate and appropriate for the query we asked. With our last two posts, we showed both from a theoretical and practical perspective that adding Jina Reranker to your RAG pipeline increases your retrieval accuracy and improves the quality of the responses you obtain from it.