
Building Open Domain QA system with Jina

Nan Wang

Do you struggle to find the right answers to your questions across dozens of documents? Do you want to find those answers by asking questions directly in natural language? Are you tired of figuring out the proper keywords for the search box? The solution to all of these problems is an intelligent Question Answering (QA) system. In this blog, we will discuss how to build an open domain QA system from scratch with Jina.

Open Domain Question Answering (ODQA) in 2021

Searching for the right information is an integral part of our daily life.

While reading this post, you are likely wondering “What is Jina?”, “How will it help me?", or maybe “What do I need to know before starting with Jina?”. The answers to these questions can easily be found in our documentation. Given such factoid questions asked in natural language, the task of finding answers based on a collection of documents is formulated in academia as Open Domain Question Answering (ODQA).

A standard ODQA pipeline is a two-stage system consisting of a retriever and a reader. The retriever retrieves candidate contexts via either traditional or neural retrieval methods. The reader then extracts answers from these contexts.

This procedure is the same as how we answer questions in an open-book examination. Suppose we know little about a question but are allowed to refer to the books during the exam. A common strategy is to first find the related chapters or passages and then read through the text to find the exact answer.
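In code, the pipeline boils down to two calls: retrieve candidate contexts, then read them for the answer. Below is a minimal, purely illustrative sketch; retriever and reader are hypothetical placeholders for the components we will build in the rest of this post.

# purely illustrative sketch of the two-stage ODQA pipeline;
# `retriever` and `reader` are hypothetical placeholders, not Jina APIs
def answer(question, retriever, reader, top_k=10):
    contexts = retriever(question, top_k=top_k)  # stage 1: narrow down the search space
    return reader(question, contexts)            # stage 2: extract the answer span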

Let’s now look at different components of this open domain question answering pipeline and how they work in tandem to produce a smooth search interface capable of supporting natural language queries.

Retriever

Given a collection of text documents, the retriever selects a few related passages as contexts based on the question. These contexts are then sent to the reader for answer extraction, so the reader doesn't have to read all the documents and search latency stays low. Let's look at two different retrieval methods and see how they differ in execution.

Term-based vs Dense-Vector retrieval

Traditionally, the retriever is implemented using term-based methods, such as TF-IDF or BM25, which match keywords against an inverted index. This implementation is efficient but suffers from the issue of term mismatch. For example, when you want to know the core concepts in Jina and type “What are the core concepts in Jina?”, you will miss the most important document, because the original text reads “... Document, Executor, and Flow are the three fundamental concepts”. This is because the search system does not know that “core concepts” is semantically related to “fundamental concepts”.
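To make the term-mismatch issue concrete, here is a small sketch using the rank_bm25 package (not part of Jina, used here purely for illustration). The score of the relevant passage is driven only by the overlapping keywords; “core” contributes nothing because it never matches “fundamental”, even though the two are semantically close.

# illustration of term mismatch with BM25 (rank_bm25 is a third-party package,
# used here only for demonstration)
from rank_bm25 import BM25Okapi

corpus = [
    'Document, Executor, and Flow are the three fundamental concepts in Jina.',
    'Jina is backed by Jina AI and licensed under Apache-2.0.',
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = 'What are the core concepts in Jina?'
scores = bm25.get_scores(query.lower().split())
# the keyword 'core' never matches 'fundamental', so it adds nothing to the score
print(scores)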

Another issue with the term-based method is expensive queries. Questions such as “What is Jina?” are usually considered expensive because they return tons of unrelated results. This is rooted in the fact that a keyword-based search system does not understand the question and simply retrieves every result containing the keyword “Jina”.

To address these issues, dense retrieval methods have been proposed as a replacement and have been shown to outperform the traditional term-based methods. Instead of building an inverted index and matching exact keywords, dense retrieval methods encode the questions and the passages into vectors in a high-dimensional space. Related passages are retrieved by comparing the vector of the question with the vectors of the passages.

Encoder

To encode the questions and the passages, one option is to use widely-used pretrained language models. With Jina Hub, you can try out different encoders out of the box.

Here we use the TransformerTorchEncoder, which wraps the Hugging Face transformers library and lets you use its pretrained models directly. Models from the Hugging Face model hub are supported as well. In this example, we use the sentence-transformers/all-mpnet-base-v2 model.

from jina import Executor, Document, DocumentArray

# pull the encoder from Jina Hub
encoder = Executor.from_hub('jinahub://TransformerTorchEncoder')

# wrap the passages into Documents
da = DocumentArray([
    Document(text='Jina is a neural search framework.'),
    Document(text='Jina relies heavily on multiprocessing.'),
    Document(text='Jina is backed by Jina AI.')])

# compute the embeddings and store them in doc.embedding
encoder.encode(docs=da)
for doc in da:
    print(f'{doc.embedding}')

Tips: If you want Jina to automatically install the extra dependencies, create a Python virtual environment following the instructions at virtualenv and change the from_hub call to

- encoder = Executor.from_hub('jinahub://TransformerTorchEncoder')
+ encoder = Executor.from_hub('jinahub://TransformerTorchEncoder', install_requirements=True)

Notice that we encode the question and the context passages with the same encoder. This might not be ideal because they usually have very different semantic meanings. For example, “What is Jina?” has a very different meaning from “Jina is a neural search framework”.

A more reasonable approach would be to encode them differently. One of the SOTA models for this is the encoder from Dense Passage Retrieval (DPR), which trains two BERT models jointly so that the questions and the contexts are encoded differently but still mapped into the same space. To choose between the question model and the context model, you can set the encoder_type argument. To try out the DPR encoder, you just need to change the following line in the above code:

- encoder = Executor.from_hub('jinahub://TransformerTorchEncoder')
+ encoder = Executor.from_hub('jinahub://DPRTextEncoder', uses_with={'encoder_type': 'context'})

Tips: Find more information at Jina Hub about TransformerTorchEncoder and DPRTextEncoder.
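For clarity, here is a minimal sketch, assuming the encoder_type argument works as described above, that encodes contexts with the context model and a question with the question model, so that both end up in the same vector space:

from jina import Executor, Document, DocumentArray

# context encoder for the passages
ctx_encoder = Executor.from_hub('jinahub://DPRTextEncoder',
                                uses_with={'encoder_type': 'context'})
# question encoder for the queries
q_encoder = Executor.from_hub('jinahub://DPRTextEncoder',
                              uses_with={'encoder_type': 'question'})

contexts = DocumentArray([Document(text='Jina is a neural search framework.')])
questions = DocumentArray([Document(text='What is Jina?')])

ctx_encoder.encode(docs=contexts)
q_encoder.encode(docs=questions)
# both embeddings now live in the same space and can be compared directly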

Vector Index

To retrieve the top-K nearest neighbors of the question vector, we can calculate the cosine similarity between the question and every passage and then sort the results, which costs O(n*log(n)) for n passages.
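As a rough sketch of this exhaustive approach, here is what the brute-force comparison looks like in plain NumPy (outside of Jina, with made-up toy vectors):

import numpy as np

# toy data: 1000 passage vectors and one question vector, L2-normalized
passages = np.random.randn(1000, 768).astype(np.float32)
passages /= np.linalg.norm(passages, axis=1, keepdims=True)
question = np.random.randn(768).astype(np.float32)
question /= np.linalg.norm(question)

# on normalized vectors, cosine similarity reduces to a dot product
scores = passages @ question
top_k = np.argsort(-scores)[:5]  # indices of the 5 most similar passages
print(top_k, scores[top_k])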

Alternatively, we can resort to Approximate Nearest Neighbour (ANN) algorithms to get an approximate result. In Jina Hub, you can easily find both implementations:

# index the encoded Documents with the brute-force indexer from Jina Hub
indexer = Executor.from_hub('jinahub://SimpleIndexer')
indexer.index(docs=da)

# encode the question with the same encoder before searching
q_da = DocumentArray([Document(text='What is Jina?')])
encoder.encode(docs=q_da)

indexer.search(docs=q_da)
for m in q_da[0].matches:
    print(f'score: {m.scores["cosine"].value:.4f}, text: {m.text}')

To try out the ANN indexer, you just need to change the from_hub call as in the code below. Under the hood, it uses an indexer based on Hnswlib, which is efficient and flexible in handling large amounts of vectors. The main bottleneck is memory usage, because all the vectors are stored in memory as float32.

- indexer = Executor.from_hub('jinahub://SimpleIndexer')
+ indexer = Executor.from_hub('jinahub://U1MIndexer')

Tips: Find more information at Jina Hub about SimpleIndexer and U1MIndexer.
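If you are curious about what happens under the hood of the ANN indexer, here is a minimal sketch using the hnswlib library directly (outside of Jina, with toy vectors; the parameter values are illustrative only):

import hnswlib
import numpy as np

dim, num_passages = 768, 1000
data = np.random.randn(num_passages, dim).astype(np.float32)

# build an HNSW index over the passage vectors; everything stays in memory
index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=num_passages, ef_construction=200, M=16)
index.add_items(data, np.arange(num_passages))

# approximate top-5 neighbours of a toy question vector
question = np.random.randn(1, dim).astype(np.float32)
labels, distances = index.knn_query(question, k=5)
print(labels, distances)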

Reader

The reader extracts the exact answer from the context. Usually the contexts are long passages containing multiple facts, and therefore we need the reader to extract the right answer based on the question. This is well studied as the machine reading comprehension problem. Given a question and the candidate contexts, the reader returns a score together with the most probable start and end positions of the answer.

For DPR, we use the pretrained reader model released by Facebook Research for the ODQA problem. Under the hood, it uses a BERT model for two purposes. First, the representation of the [CLS] token is used to calculate a relevance score for each context, measuring how relevant it is to the question. This part acts as a reranker with a cross-attention mechanism, which has more capacity than the dual-encoder model. The downside is that this model is more expensive to compute and therefore only feasible to use on a small number of candidates.

The second use of the BERT representation is to calculate, for each token, the probability of being the START or END position of the answer span. Two output heads are appended for this purpose, one for the START and one for the END position, and all tokens share the same head weights.
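The same reader model is also available in the Hugging Face transformers library. Here is a small sketch of what it computes; this is for illustration only, as the Jina Executor below takes care of all of this for you:

from transformers import DPRReader, DPRReaderTokenizer

tokenizer = DPRReaderTokenizer.from_pretrained('facebook/dpr-reader-single-nq-base')
model = DPRReader.from_pretrained('facebook/dpr-reader-single-nq-base')

# each input concatenates the question with a candidate context
encoded = tokenizer(
    questions=['What is Jina?'],
    titles=['Jina'],
    texts=['Jina is a neural search framework.'],
    return_tensors='pt')
outputs = model(**encoded)

# relevance_logits: how relevant the context is to the question ([CLS] head)
# start_logits / end_logits: per-token scores for the answer span boundaries
print(outputs.relevance_logits, outputs.start_logits.shape, outputs.end_logits.shape)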

DPRReaderRanker is available directly at the Jina Hub as well.

# rerank the matches and extract answer spans with the DPR reader
ranker = Executor.from_hub('jinahub://DPRReaderRanker')
ranker.rank(docs=q_da)
for m in q_da[0].matches:
    print(f'score: {m.scores["relevance_score"].value:.4f}, text: {m.text}')

Put them all together in a Flow

Now we have gone through the components needed for building an ODQA system. A Jina Flow can help us chain them together and serve them as a service.

Index

We first create a Jina Flow and then index all the Documents so that the dense-vector retriever can retrieve contexts. The Documents are encoded by the DPRTextEncoder and stored by the SimpleIndexer.

from jina import Document, DocumentArray, Flow

da = DocumentArray([
    Document(text='Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.'),
    Document(text='Document, Executor, and Flow are the three fundamental concepts in Jina.'),
    Document(text='Jina is backed by Jina AI and licensed under Apache-2.0.')])

# encode with the DPR context encoder, then store vectors and texts in the indexer
f = (Flow()
     .add(uses='jinahub+docker://DPRTextEncoder',
          uses_with={'encoder_type': 'context'})
     .add(uses='jinahub+docker://SimpleIndexer'))

with f:
    f.post(on='/index', inputs=da, show_progress=True)

Query

After indexing the Documents, we build a query Flow following the retriever-reader structure. We query with a question in natural language and get back the answers with relevance scores.

from jina import Flow, Document, DocumentArray

# retriever-reader query pipeline: encode the question, retrieve contexts, extract answers
f = (Flow(expose_port=45678)
     .add(uses='jinahub+docker://DPRTextEncoder',
          uses_with={'encoder_type': 'question'})  # use the question encoder at query time
     .add(uses='jinahub+docker://SimpleIndexer')
     .add(uses='jinahub+docker://DPRReaderRanker'))

q_da = DocumentArray([Document(text='What is Jina?')])

with f:
    resp = f.post(on='/search', inputs=q_da, return_results=True)

for doc in resp[0].docs:
    print(f'question: {doc.text}')
    for m in doc.matches:
        print(f'score: {m.scores["relevance_score"].value:.4f}, answer: {m.text}')

Host a RESTful service

To host the service, we need to change the following lines of code:

- with f:
-     resp = f.post(on='/search', inputs=q_da, return_results=True)
+ with f:
+     f.cors = True
+     f.protocol = 'http'
+     f.block()

Now you can query directly via the RESTful API. The Swagger UI is available at http://localhost:45678/docs

curl --request POST \
     -d '{"data": [{"text": "What is Jina?"}]}' \
     -H 'Content-Type: application/json' \
     'http://localhost:45678/search'
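Equivalently, you can query the service from Python. Here is a small sketch using the requests package, assuming the Flow is running locally on port 45678:

import requests

# send the question to the /search endpoint of the running Flow
resp = requests.post(
    'http://localhost:45678/search',
    json={'data': [{'text': 'What is Jina?'}]})
print(resp.json())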

More detailed documentation about using Jina as a client and a server can be found in the Jina Docs.

Summary

In this post, we gave a gentle walkthrough of building a two-stage ODQA system with Jina. Besides the retriever-reader pipeline, another choice is to replace the reader with a generator that generates answers based on the context. Furthermore, with a large language model such as GPT-3, it is also possible to generate answers directly from the question without retrieving any context. However, efficiency and precision are potential issues in practice.

In future posts, we will cover more about building ODQA systems with SOTA models in Jina. Stay tuned and happy searching!
