How to build a production-ready Financial Question Answering system with Jina and BERT

Bithiah Yuan

Learn how to use the neural search framework, Jina, to build a Financial Question Answering (QA) search application using the FiQA dataset, PyTorch, and Hugging Face transformers.

For my master’s thesis, I built a Financial QA system using a fine-tuned BERT model called FinBERT-QA. The financial industry increasingly needs to analyze unstructured and structured data automatically and at scale, and QA systems can give companies a competitive advantage by supporting the decision making of financial advisers.

The goal of my thesis was to search for a ranked list of relevant answer passages given a question. Here is an example of a financial domain-based question and a ground truth answer from the FiQA dataset:

Sample QA from the financial domain

Here is a list of other questions from FiQA:

• What does it mean that stocks are “memoryless”?
• What would a stock be worth if dividends did not exist?
• What are the risks of Dividend-yielding stocks?
• Why do financial institutions charge so much to convert currency?
• Is there a candlestick pattern that guarantees any kind of future profit?
• 15 year mortgage vs 30 year paid off in 15
• Why is it rational to pay out a dividend?
• Why do companies have a fiscal year different from the calendar year?
• What should I look at before investing in a start-up?
• Where do large corporations store their massive amounts of cash?

Financial QA is hard because the vocabulary is context-specific. For example, a machine would have a hard time understanding what an ETF is. Nevertheless, with the power of BERT, I improved the state-of-the-art (SOTA) results by an average of 19% on three evaluation metrics (Precision, MRR, NDCG).

Evaluation results from FinBERT-QA

Even though my thesis was about QA in the financial domain, the approach that I have used can be applied to a general QA dataset or QA in other domains such as insurance.

After finishing my thesis, I realized that just having a model and SOTA results is not good enough because there was a gap between my research and business needs. This was when I discovered Jina, a framework designed to help me bridge this gap. To help people better understand Jina, I prepared this tutorial to demonstrate how I used Jina to easily transform my research into a production-ready system.

What is Jina?

Open-source deep learning frameworks such as TensorFlow and PyTorch provide building blocks for designing and quickly implementing neural network-based applications through a high level programming interface.

Similarly, Jina is an open-source neural search framework that offers the building blocks for designing and implementing neural network-based search applications.

Co-founded by the creator of bert-as-service and Fashion-MNIST, Jina enables developers to build production-ready cloud-native search systems using SOTA pre-trained deep learning models, in which each component of the system is a microservice that can be deployed, scaled, and maintained independently.

If you come from a data science or academic background like me, the terms cloud-native and microservices may sound daunting. That's why we will learn by example in this tutorial and use the NLP task, Financial QA, to familiarize ourselves with Jina's core concepts!

Financial QA with BERT

Before we jump into the tutorial, let's first understand how to build a QA system with BERT. Our goal is to search for the top-k most relevant answer passages when given a question from task 2 of the FiQA dataset.

In 2018, Google's pre-trained BERT models, used for transfer learning, shook the NLP world and achieved the SOTA results on numerous tasks, marking NLP's ImageNet moment.

What is neat about BERT is that we can fine-tune a pre-trained BERT model on our QA task by simply transforming it into a binary classification task, where the input is the concatenation of a question and an answer and the output is a binary label indicating the relevancy score of the QA pair. We can then take the softmax scores of each QA pair to get a probability of relevancy and rank these scores.
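To make the ranking step concrete, here is a minimal sketch in plain Python. The logits below are made-up numbers standing in for the two outputs (not relevant, relevant) that a binary classifier would produce for each QA pair; they are not real model outputs.

```python
import math

def relevance_probability(logits):
    """Softmax over the two class logits; returns P(label = 1), i.e. relevant."""
    not_relevant, relevant = logits
    # Subtract the max before exponentiating for numerical stability.
    m = max(not_relevant, relevant)
    exp_not = math.exp(not_relevant - m)
    exp_rel = math.exp(relevant - m)
    return exp_rel / (exp_not + exp_rel)

# Hypothetical (not_relevant, relevant) logits for three candidate answers.
candidate_logits = {
    "answer_a": (2.0, -1.0),
    "answer_b": (-0.5, 1.5),
    "answer_c": (0.0, 0.0),
}

# One relevancy probability per QA pair, then rank from most to least relevant.
scores = {aid: relevance_probability(l) for aid, l in candidate_logits.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The answer whose "relevant" logit dominates ends up first, and a pair with equal logits gets a probability of exactly 0.5.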

Fine-tuning method for our QA task

The FiQA dataset has roughly 6,000 questions and 57,000 answers. Instead of computing a probability for each question 57,000 times, we can adopt a passage reranking approach. We first use a Retriever to return the top-50 candidate answers for each question, and then use FinBERT-QA, a BERT-based model fine-tuned on the FiQA dataset, as a Reranker to compute the relevancy scores and rerank the top-50 QA pairs to get the top-10 answers.

QA pipeline with reranking
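The two-stage idea can be sketched in a few lines of Python. Both stages here are toy stand-ins chosen for illustration: the retriever ranks by word overlap and the reranker by Jaccard similarity, whereas the thesis pipeline uses a real Retriever and FinBERT-QA. The small numbers (top 3, then top 2) play the role of top-50 and top-10.

```python
def retrieve(question, answers, top_n):
    """Stand-in Retriever: cheap ranking by word overlap with the question."""
    q_words = set(question.lower().split())

    def overlap(item):
        return len(q_words & set(item[1].lower().split()))

    return sorted(answers.items(), key=overlap, reverse=True)[:top_n]

def jaccard(question, answer):
    """Stand-in for the expensive Reranker score (FinBERT-QA in the thesis)."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    return len(q & a) / len(q | a)

def rerank(question, candidates, score_fn, top_k):
    """Rescore only the retrieved candidates and keep the best top_k."""
    scored = [(docid, score_fn(question, text)) for docid, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

answers = {
    1: "an etf is a fund that trades on an exchange",
    2: "dividends are payments to shareholders",
    3: "an index fund tracks a market index",
    4: "mortgage rates depend on the loan term",
}
question = "what is an etf fund"

candidates = retrieve(question, answers, top_n=3)
top = rerank(question, candidates, jaccard, top_k=2)
print(top)
```

The expensive scoring function only ever sees the candidates that survived the cheap first stage, which is the whole point of the approach.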

If you are interested in the details of my thesis, you can learn more here.

Why Jina?

Why is having a SOTA model and results not good enough?

Jina as a bridge between research and industry

The motivation behind my research was to help financial advisors answer questions from large-scale reports. However, the way I implemented the QA pipeline is not reusable and it won't scale to business demands. By industry standards, it is not production-ready.

Jina enables us to build cloud-native systems, which embrace microservices. Instead of wrapping my entire pipeline in a single Docker container, Jina breaks the pipeline down into components (preprocessor, encoder, indexer, etc.). Moreover, each of these components will be a microservice in its own isolated Docker container managed by the Flow API.

Illustration from manypixels

For those of you new to cloud-native concepts, you can think of a microservice as an independent component of your application, for example, using FinBERT-QA to encode our questions and answers. You can then create multiple independent components or microservices to construct an application like a BERT-powered QA system. Since each of the components of the application can be deployed independently, they can also scale individually and respond to rapid changes and business needs.

Illustration from manypixels

Being cloud-native is a modern design that more and more businesses are adopting because it can help them save resources and grow. However, designing such systems is not easy. We need to consider many principles, patterns, and best practices. For example, how will the components communicate with each other? How can they work in parallel? Luckily, instead of starting from scratch, Jina does all the hard work for us by providing the building blocks, so that we can easily construct a cloud-native BERT-powered QA system using a reranking approach that is ready to serve in production!


Now that we have an overview, let's learn how to build a production-ready Financial QA system using the reranking approach and dive deeper into some new Jina terminologies. We will use FinBERT to encode our questions and answer passages into embeddings and FinBERT-QA to rerank the top-50 answer matches.

The final code of this tutorial can be found here.

Set up

Clone the repository that we will be working with:

git clone https://github.com/yuanbit/jina-financial-qa-search-template.git

We will use financial-qa-search/ as our working directory.

Install the requirements

pip install -r requirements.txt

Download data and model

bash get_data.sh

For this tutorial, we won't be searching through all 57,000 answer passages from the FiQA dataset. We will work with a sample dataset called test_answers.csv, containing about 800 answer passages. If you want to experiment with the full dataset, you can use answer_collection.tsv.


In Jina, we will build a Financial QA system with two pipelines, one for indexing our answer passages and the other for querying. These pipelines are called Flows, which also manage the state and context of the microservices and orchestrate them. Let's look at an overview of the Index Flow and the Query Flow:

Index Flow

Query Flow

To understand these Flows, let's start with the Index Flow and look into the individual components one by one.

Index Flow

The main idea behind the Index Flow is to use a pre-trained BERT model to encode all of our answer passages into embeddings and then index these embeddings so that they can be searched in the Query Flow.

Step 1. Define our data

We want to index a subset of the answer passages from the FiQA dataset, dataset/test_answers.csv:

398960	From  http://financial-dictionary.thefreedictionary.com/Business+Fundamentals  The  facts  that  affect  a  company's      underlying  value.  Examples  of  business      fundamentals  include  debt,  cash  flow,      supply  of  and  demand  for  the  company's      products,  and  so  forth.  For  instance,      if  a  company  does  not  have  a      sufficient  supply  of  products,  it  will      fail.  Likewise,  demand  for  the  product      must  remain  at  a  certain  level  in      order  for  it  to  be  successful.  Strong      business  fundamentals  are  considered      essential  for  long-term  success  and      stability.  See  also:  Value  Investing,      Fundamental  Analysis.  For  a  stock  the  basic  fundamentals  are  the  second  column  of  numbers  you  see  on  the  google  finance  summary  page,    P/E  ratio,  div/yeild,  EPS,  shares,  beta.      For  the  company  itself  it's  generally  the  stuff  on  the  'financials'  link    (e.g.  things  in  the  quarterly  and  annual  report,    debt,  liabilities,  assets,  earnings,  profit  etc.
19183	If  your  sole  proprietorship  losses  exceed  all  other  sources  of  taxable  income,  then  you  have  what's  called  a  Net  Operating  Loss  (NOL).  You  will  have  the  option  to  "carry  back"  and  amend  a  return  you  filed  in  the  last  2  years  where  you  owed  tax,  or  you  can  "carry  forward"  the  losses  and  decrease  your  taxes  in  a  future  year,  up  to  20  years  in  the  future.  For  more  information  see  the  IRS  links  for  NOL.  Note:  it's  important  to  make  sure  you  file  the  NOL  correctly  so  I'd  advise  speaking  with  an  accountant.  (Especially  if  the  loss  is  greater  than  the  cost  of  the  accountant...)
327002	To  be  deductible,  a  business  expense  must  be  both  ordinary  and  necessary.  An  ordinary  expense  is  one  that  is  common  and  accepted  in  your  trade  or  business.  A  necessary  expense  is  one  that  is  helpful  and  appropriate  for  your  trade  or  business.  An  expense  does  not  have  to  be  indispensable  to  be  considered  necessary.    (IRS,  Deducting  Business  Expenses)  It  seems  to  me  you'd  have  a  hard  time  convincing  an  auditor  that  this  is  the  case.    Since  business  don't  commonly  own  cars  for  the  sole  purpose  of  housing  $25  computers,  you'd  have  trouble  with  the  "ordinary"  test.    And  since  there  are  lots  of  other  ways  to  house  a  computer  other  than  a  car,  "necessary"  seems  problematic  also.

Our dataset consists of a column of answer id and text, which we will denote as docid and doc respectively in this tutorial. In order to index our data, we need to first define it in a Jina data type called Document.

Index Flow - Step 1

In programming languages there are data types such as int, float, boolean, and more. In NumPy, TensorFlow, and PyTorch, we manipulate and pass around objects such as ndarray and tensor, which are referred to as primitive data types. Similarly, a Document is a Jina-specific data type for representing data.

Defining our data in a Document

In our project directory financial-qa-search/ the app.py file consists of the Financial QA search application that we will build. Notice that we set our data path in the config function as follows:
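As a minimal sketch, such a config function could keep its settings in environment variables. The variable name JINA_DATA_FILE is an assumption made for illustration and may differ from what app.py actually uses; JINA_PARALLEL and JINA_SHARDS are the variables this tutorial exports in the terminal later on.

```python
import os

def config():
    """Set defaults without overriding values already present in the environment."""
    # JINA_DATA_FILE is a hypothetical name used here for illustration.
    os.environ.setdefault("JINA_DATA_FILE", "dataset/test_answers.csv")
    # Number of Peas per Pod and number of shards for the Indexer.
    os.environ.setdefault("JINA_PARALLEL", "1")
    os.environ.setdefault("JINA_SHARDS", "1")

config()
print(os.environ["JINA_DATA_FILE"])
```

Using setdefault means a value exported in the shell takes precedence over the default in the script.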

You can change the path to answer_collection.tsv to index with the full dataset.

Let's first make sure we import Document from jina:

After the config function, let's create a Python generator and define the Document to contain the id and text corresponding to the answer passages:
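A rough sketch of such a generator is shown below. To keep it runnable without Jina installed, a small dataclass stands in for jina's Document and only carries the two fields used in this tutorial (text and tags); the two sample rows are shortened versions of the passages above.

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in for jina's Document, holding only the fields used here."""
    text: str = ""
    tags: dict = field(default_factory=dict)

def index_generator(lines):
    """Yield one Document per tab-separated (docid, doc) row."""
    # QUOTE_NONE keeps quotation marks inside the passages as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        docid, doc = row[0], row[1]
        yield Document(text=doc, tags={"id": int(docid)})

# Two shortened sample rows in the same docid<TAB>doc layout as the dataset.
sample = io.StringIO(
    "19183\tIf your sole proprietorship losses exceed ...\n"
    "327002\tTo be deductible, a business expense must ...\n"
)
docs = list(index_generator(sample))
print(docs[0].tags["id"], docs[0].text[:30])
```

In the real application the generator would read from the data path set in the config function instead of an in-memory string.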

A Document is a high-level way for us to define and view the contents stored in Protobuf, which is what Jina uses to enable the microservices in the Flow to communicate with each other. It is like an envelope containing our data and is used to send messages between the microservices of our Flow. Instead of directly dealing with Protobuf, which serializes our data into bytes, we can simply print our Document and see that a single answer passage will look as follows:

id: "13755c6081bebe1a"
mime_type: "text/plain"
tags {
  fields {
    key: "id"
    value {
      number_value: 398960.0
    }
  }
}
text: "From  http://financial-dictionary.thefreedictionary.com/Business+Fundamentals  The  facts  that  affect  a  
company\'s underlying  value. Examples  of  business fundamentals  include  debt,  cash  flow, supply  of  and  demand  
for  the  company\'s      products,  and  so  forth.  For  instance, if  a  company  does  not  have  a sufficient  
supply  of  products,  it  will      fail.  Likewise,  demand  for  the  product      must  remain  at  a  certain  
level  in      order  for  it  to  be  successful.  Strong      business  fundamentals  are  considered essential  for  
long-term  success  and      stability.  See  also:  Value  Investing, Fundamental  Analysis.  For  a  stock  the  basic
fundamentals  are  the  second  column  of  numbers  you  see  on  the  google  finance  summary  page, P/E  ratio,  
div/yeild,  EPS,  shares,  beta.      For  the  company  itself  it\'s  generally  the  stuff  on  the  \'financials\'  
link    (e.g.  things  in  the  quarterly  and  annual  report,    debt,  liabilities,  assets,  earnings,  profit  etc."

As we move along the Index Flow, the contents of the Document will change. For example, we can see in the Index Flow that the embeddings of the answer passages are added to the Document after the encoding step.

Embeddings of the answer passages are added to the Document after the encoding step

The encoding step uses an Executor, namely the Encoder. Let's understand this more next.

Step 2. Encode Answer Passages

After defining the Document for the Index Flow, the next step is to encode the answer text into embeddings using a pre-trained BERT model. The logic that does the encoding is called an Encoder, which is part of Jina's family of Executors.

We will look at other Executors later and only focus on the Encoder for now. Instead of using TensorFlow or PyTorch in combination with Hugging Face transformers and implementing the Encoder ourselves, we can simply take advantage of Jina Hub, an open registry for hosting Jina Executors via container images.

There are all kinds of Encoders and other types of Executors in Jina Hub for different tasks and data types (e.g. image, video, audio, multimodal), allowing us to ship and exchange reusable components and build various deep learning-based search engines, e.g. text-image, cross-modal, and multi-modal searches. Since our task is a text-to-text search, we will use the TransformerTorchEncoder for this tutorial.

Before we talk about how to use the Encoder in our Index Flow, let's understand three more important Jina concepts in this step:

  • Driver: Recall that Jina uses Protobuf to send messages between the microservices in the Flow, which are in the form of bytes. We would have a problem if we were to pass the Document directly to the Encoder because the Encoder needs the answer text as input instead of bytes. Instead of dealing directly with Protobuf, Jina uses Drivers to translate data for an Executor, so that we only need to work with data types that we are familiar with (e.g. text, image, np.array, etc.). The Driver interprets messages in the Flow and passes the appropriate data to the Executor.

Encoder - the Driver receives the Document in bytes and passes the text to the Encoder. The Encoder outputs the embeddings of the text and the Driver adds them to the Document.

For example, in the encoding step, the Driver receives the Document in bytes, interprets it as a Document, and passes the text in the Document to the Encoder. After the Encoder outputs the embeddings for the corresponding text, the Driver again interprets the embeddings, and adds them to the Document. The Document below shows how it has been transformed by the Driver in the encoding step and will serve as the input for the next indexing step.

The Driver transformed the Document by adding the embeddings in the encoding step.

  • Pea: Since an Executor needs a Driver to be able to process our data, they are both necessary components of a microservice in the Flow. Therefore, we use a Pea to wrap the Executor and Driver together to get our Encoder Microservice. The Pea is, therefore, a microservice that constantly listens for incoming messages from the gateway or other Peas in the Flow and calls the Driver when it receives a message. As a microservice, Peas can also run in Docker, containing all dependencies and context in one place.

  • Pod: To optimize our neural search application, Jina provides parallelization out of the box. Instead of having a single Encoder, we can split it into multiple processes. The visualization of the Encoding step shows the Encoder being split into three processes with each process wrapped by a Pea.

    In order for our multiple Encoder microservices to behave functionally as one Encoder, we wrap the group of homogeneous (identical) Peas in a Pod. The Pod is, therefore, a group of homogeneous microservices that is also responsible for load balancing, further control, and context management. The beauty of this design is that a Pod can either run on the local host or on different computers over a network, making our application distributed, efficient, and scalable.
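To see how the three concepts fit together, here is a toy model in plain Python. None of these classes are Jina's own: JSON stands in for Protobuf, the "encoder" just measures text length, and round-robin dispatch is only one assumed way a Pod could balance load across its Peas.

```python
import json
from itertools import cycle

def toy_encoder(text):
    """Stand-in Executor: maps text to a tiny two-number 'embedding'."""
    return [float(len(text)), float(len(text.split()))]

def encode_driver(message, executor):
    """Stand-in Driver: decode bytes, hand the text to the Executor, re-encode."""
    doc = json.loads(message.decode("utf-8"))
    doc["embedding"] = executor(doc["text"])
    return json.dumps(doc).encode("utf-8")

class Pea:
    """One process-like unit wrapping an Executor together with its Driver."""
    def __init__(self, name, executor):
        self.name = name
        self.executor = executor
        self.handled = 0

    def handle(self, message):
        self.handled += 1
        return encode_driver(message, self.executor)

class Pod:
    """Group of identical Peas that behaves like a single Encoder."""
    def __init__(self, executor, parallel):
        self.peas = [Pea(f"encoder-{i}", executor) for i in range(parallel)]
        self._next_pea = cycle(self.peas)  # assumed round-robin load balancing

    def handle(self, message):
        return next(self._next_pea).handle(message)

pod = Pod(toy_encoder, parallel=3)
questions = ["what is an ETF", "why pay dividends", "fiscal year", "memoryless stocks"]
messages = [json.dumps({"text": q}).encode("utf-8") for q in questions]
results = [json.loads(pod.handle(m)) for m in messages]
print([pea.handled for pea in pod.peas])
```

From the caller's point of view there is only one `pod.handle` entry point, while the work is spread over three Peas behind it.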

Now that we understand these foundational concepts, how do we create a Pod for the Encoder?

It may all sound extremely complicated, but with the building blocks provided by Jina we can (1) design an Index Flow and (2) create an Encoder Pod with two simple YAML files. These YAML files will allow us to customize our neural search application without touching the core of Jina's code.

I. Create a Pod for the Encoder

Let's first create a file encode.yml inside the folder called pods. In encode.yml we first specify the name of the Encoder we want to use from Jina Hub, TransformerTorchEncoder. We can choose the model we want to use; in our case we use FinBERT, which further pre-trained bert-base-uncased on a large financial corpus. Since TransformerTorchEncoder was implemented using Hugging Face transformers, you can also directly use the model by specifying its name if it is available on the Hugging Face Model Hub. We can also include other hyperparameters such as the maximum sequence length or pooling strategy.

Simple as that! 🐣 We just created a deep learning-based Encoder microservice ready to be parallelized! The pods folder will also be the home to other Pods that we will need, which will also be defined using YAML files.

II. Add the Encoder to the Index Flow

Now that we have our Encoder ready, let's put it in our Index Flow. Let's create a file index.yml inside a folder called flows. In index.yml, we specify our first Pod in the Index Flow which is the encoder by giving the path to our pods/encode.yml file. We can specify how many processes we want to split the Encoder into by using the parallel parameter. This will be an environment variable specified in app.py, which we will look at in the end. parallel also determines how many Peas we will have in each Pod.

Well done! 💪 You've just created a pipeline for deep learning-powered microservices! Next, let's finish the design of the Index Flow by adding another Pod containing the Indexer.

Step 3. Indexing

After obtaining the embeddings for the answer passages, we will create another Executor called the Indexer to store our data so that they can be retrieved at query time. Similar to the previous step, the Driver receives the Document and passes the docid, doc, and embeddings to the Indexer.

We will use the Compound Indexer, which acts as a single indexer using both the (1) Vector and (2) Key-Value Indexers from Jina Hub:

  1. Vector Indexer: Stores the answer embeddings and is queried by the question embedding to retrieve the closest answer embeddings using the k-nearest neighbors algorithm.

  2. Key-Value (KV) Indexer: Stores the Document data (text, blob, metadata) and is queried by the Document id (normally extracted from the Vector Indexer) to retrieve the information of the data such as answer id and text.

The Indexer will store our data so that they can be retrieved at query time
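The division of labor between the two indexers can be sketched with a toy class. This is not Jina's CompoundIndexer: it is a brute-force nearest-neighbor search over two dictionaries, and the use of cosine similarity as the distance measure is an assumption made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class ToyCompoundIndexer:
    """Pairs a vector index (docid -> embedding) with a KV index (docid -> text)."""
    def __init__(self):
        self.vectors = {}
        self.documents = {}

    def add(self, docid, embedding, text):
        self.vectors[docid] = embedding
        self.documents[docid] = text

    def query(self, embedding, top_k):
        # Step 1: the vector index finds the ids of the closest embeddings.
        scored = [(cosine(embedding, vec), docid) for docid, vec in self.vectors.items()]
        scored.sort(reverse=True)
        # Step 2: the KV index turns those ids back into answer text.
        return [(docid, self.documents[docid], score) for score, docid in scored[:top_k]]

index = ToyCompoundIndexer()
index.add(398960, [1.0, 0.0], "business fundamentals ...")
index.add(19183, [0.0, 1.0], "net operating loss ...")
index.add(327002, [1.0, 1.0], "deductible business expense ...")

matches = index.query([1.0, 0.1], top_k=2)
print([docid for docid, _, _ in matches])
```

The vector side never needs to know about the text, and the key-value side never needs to know about embeddings; the shared docid ties them together.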

We again wrap the Driver and Indexer in a Pea, group identical Peas in a Pod, and define them using YAML files.

I. Create a Pod for the Indexer

Let's create the file pods/doc.yml and define our compound indexer as !CompoundIndexer with the components !NumpyIndexer which is the Vector Indexer and !BinaryPbIndexer which is the KV Indexer. The indexed data will be stored in vec.gz and doc.gz respectively. The workspace is the directory where the indexes will be stored, which will be inside our working directory.

II. Add the Indexer to the Index Flow

Now let's go back to flows/index.yml and add our Indexer to the Index Flow as doc_indexer. If our data is big, we can also add sharding to our application for optimization. This will also be used as an environment variable in app.py, which we will see later.

Great job! 👏 You have just designed a cloud-native pipeline for indexing financial answer passages! We can also use Jina's Flow API to visualize the Index Flow. First let's set our environment variables in the terminal for parallel and shards:

export JINA_PARALLEL='1'
export JINA_SHARDS='1'

Next, let's open a jupyter notebook in our working directory and do the following:

Index Flow visualization

Here we see our Index Flow with two Pods: the Encoder (encoder) and the Indexer (doc_indexer).

Build an Indexer Application

Let's see how we can use the Index Flow as our application. In app.py, we can change parallel in the config function to indicate how many Peas (processes) each Pod should split its microservice into. We can also change shards to indicate parallelization during the indexing step. We will leave both of them unchanged for now. This means that we will only have one Pea in each Pod.

Let's first import Flow from Jina's Flow API:

After the index_generator function that we added in Step 1. Define our data, let's add the index function which will first load the Index Flow that we have created in flows/index.yml and pass the input Document from index_generator to the flow. We set our batch_size=16 for encoding the answer passages into embeddings.
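Conceptually, a batch size of 16 means the passages are handed on in groups of at most 16 rather than one at a time. The helper below is a plain-Python illustration of that grouping, not Jina's internal implementation, and the 40 dummy items are just placeholders for Documents.

```python
from itertools import islice

def batches(iterable, batch_size):
    """Yield consecutive lists of at most batch_size items from an iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

# 40 placeholder items grouped with the tutorial's batch size of 16.
sizes = [len(batch) for batch in batches(range(40), batch_size=16)]
print(sizes)  # [16, 16, 8]
```

Because the helper works on any iterable, it also works with a generator, so the whole dataset never has to be loaded into memory at once.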

We are now ready to index our data. In our working directory, run:

python app.py index


At the end you will see the following:

✅ done in1 minute and 54 seconds 🐎 7.7/s
    doc_indexer@18903[I]:recv ControlRequest from ctl▸doc_indexer▸⚐
    doc_indexer@18903[I]:Terminating loop requested by terminate signal RequestLoopEnd()
    doc_indexer@18903[I]:#sent: 56 #recv: 56 sent_size: 1.7 MB recv_size: 1.7 MB
    doc_indexer@18903[I]:request loop ended, tearing down ...
    doc_indexer@18903[I]:indexer size: 865 physical size: 3.1 MB
    doc_indexer@18903[S]:artifacts of this executor (vecidx) is persisted to ./workspace/doc_compound_indexer-0/vecidx.bin
    doc_indexer@18903[I]:indexer size: 865 physical size: 3.2 MB
    doc_indexer@18903[S]:artifacts of this executor (docidx) is persisted to ./workspace/doc_compound_indexer-0/docidx.bin

Hooray 🙌 we finished the first part of our application! The embedding indexes and Document data will be stored in a directory called workspace.

Query Flow

After indexing our data, we need to create a Query Flow. The main idea behind the Query Flow is to use the same BERT-based model to encode a given question into an embedding and use the Indexer to search for the most similar answer embeddings. To further improve the search results, we will use the same reranking technique as in my thesis. Therefore, we will need to add a reranking step using FinBERT-QA to recompute the scores of the answer matches returned by Jina.

Query Flow

Let us again walk through the steps one by one.

Step 1. Encode Question

Let's assume that the question text will be a user input. Jina will take this input and define a new Document.

Encoder in the Query Flow

I. Add the Encoder to the Query Flow

Just like the encoding step of the Index Flow, we encode the questions using the same Encoder. Therefore, we can use the same Encoder from pods/encode.yml in our Query Flow. We will create a new query.yml file in the flows folder and add the Encoder Pod to it:

Step 2. Search Indexes

After encoding the questions, the question embeddings will be added to the Document by the Driver. This Document is then sent to the Indexer in the next Pod and the Driver will pass the question embeddings to the Indexer. The Indexer will then search for the answers with the most similar embeddings using the k-nearest neighbors algorithm and pass a list of top-k answer matches to the Driver to be added to the Document.

The Indexer will search for the answers with the most similar embeddings

The matches will contain data such as the docid, doc, and match scores. Since we are also using the same Indexer from the Index Flow, all we need to do again is add the Indexer Pod to flows/query.yml:

Step 3. Reranking

Let's assume that the Indexer returns the top-k answer matches at this point and we want to recompute the match scores to get better results. Jina has a class of Executors called Rankers. In particular, the Match2DocRankers re-score the matches for a query by calculating new scores. For example, among the Rankers on Jina Hub, the Levenshtein Ranker uses the Levenshtein distance to recompute the match scores.
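To illustrate what re-scoring with a distance metric means, here is a small standalone sketch. It is not the Levenshtein Ranker from Jina Hub: it only shows the idea of replacing each match's score with the negative edit distance to the query, so that closer texts rank higher. The match ids and texts are invented for the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def rerank_by_levenshtein(query, matches):
    """Replace each match score with the negative distance and sort best-first."""
    rescored = [(docid, -levenshtein(query, text)) for docid, text in matches]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical matches as (docid, text) pairs in their original order.
matches = [(11, "stock split"), (12, "etf"), (10, "etf fees")]
reranked = rerank_by_levenshtein("etf fee", matches)
print(reranked)
```

A character-level distance like this knows nothing about meaning, which is why the next step swaps it for a fine-tuned model.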

However, instead of using a distance metric to recompute the scores, we want to load our fine-tuned BERT model, FinBERT-QA, in the Ranker and recompute the scores by using the concatenation of the question and the current match answers as inputs into a binary classification task.

In order to do this we need to create our own custom Executor and implement our own logic. In this section we will use PyTorch and Hugging Face transformers to implement our custom Ranker.

The main idea here is to pass our query text and the matches (containing the answer text and match scores) to the Ranker to return a reordered list of matches based on the relevancy scores computed by FinBERT-QA. The Driver will then update the matches in the Document based on this reordered list.

The Ranker recomputes the scores of the matches using FinBERT-QA

Recall that Peas can run in Docker. This means that we can simply build a Docker image with our implementation of the Ranker and use the image in the Query Flow. The Jina Hub API lets us use Cookiecutter to create the templates of all the files we will need to do this. Let's get started by making sure that the Jina Hub extension is installed:

pip install "jina[hub]"

Build a Custom Executor

Let's first create the templates that we will need to build a Docker image for our custom Ranker.

1.) Set up.

In the financial-qa-search/ directory type:

jina hub new

This will pop up a wizard that helps you walk through the process. Let's give our Executor the name FinBertQARanker and make sure to select 4 - Ranker for the Executor type. We will use jinaai/jina as our base image for the Docker image that we will build.

You've downloaded /Users/bithiah/.cookiecutters/cookiecutter-jina-hub before. Is it okay to delete and re-download it? [yes]: yes
executor_name [The class name of the executor (UpperCamelCase)]: FinBertQARanker
Select executor_type:
1 - Encoder
2 - Crafter
3 - Indexer
4 - Ranker
5 - Evaluator
Choose from 1, 2, 3, 4, 5 [1]: 4
description [What does this executor do?]: recomputes match scores using FinBERT-QA                
keywords [keywords to describe the executor, separated by commas]: 
pip_requirements []: 
base_image [jinaai/jina]: 
author_name [Jina AI Dev-Team ([email protected])]: 
author_url [https://jina.ai]: 
author_vendor [Jina AI Limited]: 
docs_url [https://github.com/jina-ai/jina-hub]: 
version [0.0.1]: 
license [apache-2.0]: 

After pressing Enter, you will see a new directory called FinBertQARanker. Your file structure should now look as follows:

Project folder structure

We will implement the logic of the Ranker in __init__.py, write some tests in tests/test_finbertqaranker.py, and change the Dockerfile to contain everything we need to build the image.

The code for the Ranker can be found here.

2.) Fill in the logic for reranking.

We will now implement our logic in __init__.py, which should look like the following:

Jina contains different base classes for the Executors with different functionalities. The base Ranker class that we will use is called Match2DocRanker, which has the functionality to recompute the match scores.

Let's first change the base class from BaseRanker to Match2DocRanker. Let's also import PyTorch using Jina and some other modules that we will need, and define our current directory.

Our logic will be implemented in the FinBertQARanker class which will use TorchDevice and Match2DocRanker from Jina. We will download the models that we need in the Dockerfile later. Let us assume now we have two models in the folder models/: (1) bert-qa/ and (2) 2_finbert-qa-50_512_16_3e6.pt.

(1) bert-qa: bert-base-uncased fine-tuned on the MS MARCO dataset from Passage Re-ranking with BERT

(2) 2_finbert-qa-50_512_16_3e6.pt: FinBERT-QA model - fine-tuned bert-qa on the FiQA dataset.

We first specify bert-qa/ as the pre-trained model that would be used for initialization, 2_finbert-qa-50_512_16_3e6.pt as the model that would be used to compute the QA relevancy scores, and the maximum sequence length for the QA pairs:

Then we add a post_init function to the class to load the models for the binary classification task. Make sure to set the model in evaluation mode.

Now let's implement a private _get_score function to compute each of the relevancy scores of the question and the top-k answer matches. We first concatenate the question and each top-k answer and encode them to get the inputs (input_ids, token_type_ids, att_mask) that the model needs using the tokenizer from transformers. We then feed the inputs into the model and get the prediction scores that the QA pairs are relevant (label = 1). We apply the softmax function to the scores to transform the prediction scores into probabilities between 0 and 1. The output would then be the relevancy score in the form of a probability for the QA pair.
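The shape of the three model inputs can be illustrated without loading any model. The sketch below uses whitespace splitting and a growing toy vocabulary in place of the real WordPiece tokenizer from transformers, so the ids themselves are meaningless; what it shows is how a question and an answer are joined with [CLS] and [SEP], how token_type_ids separate the two segments, and how att_mask marks padding. It assumes the question alone fits within the maximum length.

```python
def build_inputs(question, answer, vocab, max_seq_len):
    """Build (input_ids, token_type_ids, att_mask) for one QA pair."""
    q_tokens = question.lower().split()
    # Truncate the answer so that [CLS] q [SEP] a [SEP] fits in max_seq_len.
    room = max(0, max_seq_len - len(q_tokens) - 3)
    a_tokens = answer.lower().split()[:room]

    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + a_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question, and the first [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(q_tokens) + 2) + [1] * (len(a_tokens) + 1)
    # Unknown tokens are added to the toy vocabulary on first sight.
    input_ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
    att_mask = [1] * len(input_ids)

    # Pad all three lists to the same fixed length.
    pad = max_seq_len - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    token_type_ids += [0] * pad
    att_mask += [0] * pad
    return input_ids, token_type_ids, att_mask

vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2}
input_ids, token_type_ids, att_mask = build_inputs(
    "what is an etf", "a fund traded on an exchange", vocab, max_seq_len=16
)
print(input_ids)
print(token_type_ids)
print(att_mask)
```

In the actual Ranker these three lists would be produced by the tokenizer, converted to tensors, and fed to the model, whose two output scores are then turned into a probability with softmax.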

Lastly, let's fill in the scoring function that takes the question from the user and Jina's match scores as input and uses _get_score to recompute new scores:

3.) Write a Unit Test

In order to create a new Executor and build a Docker image with the Jina Hub API, we need to write a unit test. We can find a template for this in tests/test_finbertqaranker.py. I wrote a simple check to compute the relevance probability for two answer matches given a query and to check to see if FinBertQARanker computes the same score as our expectation:

4.) Add Requirements

Other than Jina we are also using PyTorch and transformers for FinBertQARanker, so let's add them to FinBertQARanker/requirements.txt:


5.) Prepare Dockerfile

Let's change our Dockerfile to the contents below, which will download the models into a folder called models/.

6.) Build Docker image with Jina Hub API

We are finally ready to build FinBertQARanker into a Docker image. In our working directory, let's type:

jina hub build FinBertQARanker/ --pull --test-uses --timeout-ready 60000

--pull downloads our Jina base image if it is not already local.

--test-uses adds an extra test to check if the built image can dry-run successfully via Jina's Flow API.

--timeout-ready gives our post_init function time to load the models.

If the build is successful, you will see this message:

 HubIO@10240[I]:Successfully built ba3fac0f3a46
 HubIO@10240[I]:Successfully tagged jinahub/pod.ranker.finbertqaranker:0.0.1-0.8.13
 HubIO@10240[I]:building FinBertQARanker/ takes 6 minutes and 12 seconds (372.31s)
 HubIO@10240[S]:🎉 built jinahub/pod.ranker.finbertqaranker:0.0.1-0.8.13 (sha256:ba3fac0f3a) uncompressed size: 3.3 GB

Congratulations 🥳, you have successfully built a custom Executor in the form of a Docker image with the tag name jinahub/pod.ranker.finbertqaranker:0.0.1-0.8.13! Let's see how we can use it in the Query Flow next.

I. Create a custom Ranker Pod

To use our custom Ranker, FinBertQARanker, we need to first create a new Pod for the Ranker. Let's create the file rank.yml in the pods folder. Next, let's copy the contents from FinBertQARanker/config.yml to pods/rank.yml and you should have the following:

This tells the Query Flow to use the logic we have implemented in our Executor, FinBertQARanker/__init__.py. Since the code for this implementation is loaded inside the workspace folder in the Docker image, let's add workspace/ before __init__.py.

The Encoder and Indexer Executors that we have used so far all use default Drivers in the Pods. Since we created our custom Executor, we need to tell the Ranker Pod which Driver to use. In this case we will use the Matches2DocRankDriver for the Match2DocRanker base Ranker class. Hence, our rank.yml will look as follows:

Hooray 🎊 we now have a custom Ranker Pod! Let's see next how we can use it in the Query Flow.

II. Use Custom Ranker in the Query Flow

Like the other Executor Pods, we just need to add ranker after the doc_indexer and tell the Query Flow to use the Docker image and Ranker Pod that we have just created by specifying the prefix docker:// in front of the tag name. The final flows/query.yml should look as follows:

Be aware that the tag name of the Docker image might change depending on the current Jina release. Make sure to change the tag name according to your build message.
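If you want to avoid copying the tag by hand, a small helper can pull it out of the build message. This is only a sketch: it assumes the log contains a "Successfully tagged" line in the format shown earlier, and the function name docker_uses_from_build_line is hypothetical.

```python
import re

# "Successfully tagged" line copied from the build output shown earlier.
build_line = (
    "HubIO@10240[I]:Successfully tagged "
    "jinahub/pod.ranker.finbertqaranker:0.0.1-0.8.13"
)


def docker_uses_from_build_line(line):
    """Extract the image tag after 'tagged' and prefix it with docker://."""
    found = re.search(r"tagged (\S+:\S+)", line)
    if found is None:
        raise ValueError("no image tag found in build message")
    return "docker://" + found.group(1)


uses = docker_uses_from_build_line(build_line)
print(uses)
```

The resulting string is the value to use for the Ranker Pod in flows/query.yml.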

We can again visualize the Query Flow using the Flow API in a Jupyter notebook as follows:

Query Flow visualization

Here we see our Query Flow with three Pods: encoder, which contains the Encoder; doc_indexer, which contains the Indexer; and ranker, which contains the Ranker. At the end of the Query Flow, the Driver from the Ranker Pod will have changed the matches in the Document to a reordered list of matches based on the probabilities computed by our custom Ranker, FinBertQARanker. Next, we will see how we can access this list of final matches in our app.py.

Build a Search Application

Get matches and scores stored in the Document

Since our final matches and their relevancy probability are stored in the Document, in app.py, we can write a function to print out the response to a question from the user input. We can loop through the matches in our Document, d.matches, and print out the values of the scores and the matching answer text.
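The loop over the matches could look roughly like the sketch below. It uses stand-in objects instead of a real Jina response so that it runs on its own. The attribute layout (match.score.value and match.text) is an assumption about how the Document exposes these fields, and format_matches is a hypothetical helper name.

```python
from types import SimpleNamespace


def format_matches(doc):
    """Return one printable line per match: rank, score, answer text."""
    lines = []
    for rank, match in enumerate(doc.matches, start=1):
        lines.append(f"[{rank}] score={match.score.value:.4f} | {match.text}")
    return lines


# Stand-in Document with two made-up matches; real ones come from the Flow.
doc = SimpleNamespace(
    matches=[
        SimpleNamespace(text="Dividends return profit to shareholders.",
                        score=SimpleNamespace(value=0.9707)),
        SimpleNamespace(text="An ETF tracks a basket of assets.",
                        score=SimpleNamespace(value=0.1250)),
    ]
)
for line in format_matches(doc):
    print(line)
```

Because the Ranker has already reordered the matches, printing them in order gives the answers from most to least relevant.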

We can then write our search method that uses the Query Flow from flows/query.yml and passes the user inputs into print_resp. In f.search_lines(), we specify the input as our user query, the output as the response to be printed, and the top-k answers we want to retrieve. The cool thing about f.search_lines() is that it automatically creates a Document for the user query, like sugar magic 🍬!

Hooray! 🎉🎉🎉 We have just finished building our Financial QA search engine! We can now run:

python app.py search

and try out different questions! The Ranker might take some time to compute the relevancy scores since it is using a BERT-based model. Here is a list of sample questions:

• What does it mean that stocks are “memoryless”?
• What would a stock be worth if dividends did not exist?
• What are the risks of Dividend-yielding stocks?
• Why do financial institutions charge so much to convert currency?
• Is there a candlestick pattern that guarantees any kind of future profit?
• 15 year mortgage vs 30 year paid off in 15
• Why is it rational to pay out a dividend?
• Why do companies have a fiscal year different from the calendar year?
• What should I look at before investing in a start-up?
• Where do large corporations store their massive amounts of cash?


In this blog post, I introduced core Jina concepts and demonstrated how to build a production-ready Financial QA system using Jina. I also explained how to use the Jina Hub API to create a BERT-powered Ranker Executor. Thanks to the building blocks that Jina provides, we could easily use the powerful SOTA model, FinBERT-QA, in production.

The neural search application we have just built with Jina runs locally on our own machines, but can also be completely distributed and run on multiple machines in a network, making our application highly reusable, scalable, and efficient. On top of that, common cloud-native features such as persistence, scheduling, chaining, grouping, and parallelization all come out of the box.

Moreover, there are variants of pre-trained BERT models for other domains such as biomedical, science, and legal. You can use these models to build a QA search application and experiment with the results!

Next Step: Evaluation

If you made it all the way through this tutorial, you might be wondering, "How do I evaluate the search results?" Great question! Jina has a class of Executors called the Evaluator, with implementations of common evaluation metrics like Precision and Reciprocal Rank. Evaluation is an important step that will allow us to optimize the search results and design the most effective Flows. We will see in the next tutorial how we can add the Evaluator to our Financial QA application.
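As a preview of what such metrics measure, here is a plain-Python sketch of precision at k and reciprocal rank for a single query. It does not use Jina's Evaluator; the function names and the example ids are made up for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant (divides by k)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Made-up example: answer ids returned by the search, and the ground truth.
ranked_answers = [17, 4, 42, 8, 15]
relevant_answers = {4, 15}
print(precision_at_k(ranked_answers, relevant_answers, k=5))
print(reciprocal_rank(ranked_answers, relevant_answers))
```

Averaging the reciprocal rank over many queries gives the MRR metric reported for FinBERT-QA.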

Learn More

To learn more about Jina, I recommend reading the following articles:

and checking out our GitHub page!

If you want to learn Jina by doing, I encourage you to start building your own examples and sharing them with the community to help us grow our open-source ecosystem! 🚀 For example, check out this community project - transformers-for-lawyers built with Jina.

We saw how versatile and extensible Jina is, and how we can create all kinds of search applications using our own logic and models for NLP, Computer Vision, and other ML search applications. Jina Hub is a great place to get started: you can use the available Executors to build other types of search engines (for images, videos, etc.) or create your own Executors using the Jina Hub API! You can always come back to this tutorial and walk through the process again.

As an open-source company, we would also love your help and contributions. We have issues labelled as good first issue to get you started! You can read more about our contributing guidelines here.

If you want to know more about Jina's new features or ask any questions, you are welcome to join our Slack Community and our monthly public Engineering All Hands via Zoom or YouTube live stream.

If you are interested in joining us as a full-time AI / Backend / Frontend developer, please submit your CV to our job portal. Let’s build the next open-source neural search ecosystem together!


  • Slack channel - a communication platform for developers to discuss Jina
  • Community newsletter - subscribe to the latest update, release and event news of Jina
  • LinkedIn - get to know Jina AI as a company and find job opportunities
  • Twitter - follow us and interact with us using hashtag #JinaSearch
  • Company - know more about our company, we are fully committed to open-source!
© Jina AI 2020-2022. All rights reserved.