
Remote Search Flows with JinaD

Susana Guzmán

Hello everyone! Today I'd like to talk to you about JinaD! I worked with it a bit last week, so I thought I'd share what I discovered with you all.

First things first…WHAT?

Yeah so, JinaD stands for Jina Daemon. The core idea to remember is that Jina was born to be distributed, so we should take advantage of that: JinaD aims to spin up Flows remotely.

I think the easiest way to see what's going on is to check one of the JinaD tests, so let's take a look at test_index_query_with_shards.

Again…what?

Let's first define the expected outcome, and then let's see how we'll get there. For this test we want to:

  1. Build our Docker image with Docker Compose
  2. Create remote Flows
  3. Index & Query

Right…let's see the implementation

As you can see, we have several files here, and since our first step is to build our Docker image, let's check the Dockerfile first:

FROM jinaai/jina:test-pip

WORKDIR /

RUN apt-get update && \
    apt-get install --no-install-recommends -y git \
    curl

RUN python -m pip install --no-cache-dir --upgrade pip && \
    git clone https://github.com/jina-ai/jinad.git && \
    pip install $(grep -ivE "jina" jinad/requirements.txt) --ignore-installed && \
    cd jinad && python setup.py install

COPY . /

COPY tests/integration/distributed/test_index_query_with_shards/entrypoint.sh .

RUN chmod +x entrypoint.sh

ENTRYPOINT ["bash", "-c", "./entrypoint.sh"]

Ok, so the first thing here is that we use the test-pip image as a base. If you don't have it yet, you can build it with:

docker build --build-arg PIP_TAG="[devel]" -f tests/integration/jinad/Dockerfiles/Dockerfile -t jinaai/jina:test-pip .

After that, the Dockerfile installs all the requirements JinaD needs.

Now the next step is to spin up our Flow, so let's check test_integration.py:

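import sys
import time
from pathlib import Path

def test_flow():
    # Bail out unless we're in the jinad base directory
    if Path.cwd().name != 'jinad':
        sys.exit(
            'test_index_query_with_shards.py should only be run from the jinad base directory'
        )

    # Build the image and start the containers
    start_docker_compose(compose_yml)

    # Give the containers some time to come up
    time.sleep(10)

    # Create the remote Flow and keep its id
    flow_id = send_flow(flow_yml, pod_dir)['flow_id']

    print(f'Successfully started the flow: {flow_id}. Lets index some data')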

As you can see, after checking we're in the right folder, we run start_docker_compose, which takes care of building our image from the Dockerfile we just saw. We then pause for 10 seconds to give the containers time to start. After that, the next step is to spin up our Flow, and we print the Flow id just for sanity ('cause you can never have enough prints for mental health).

Ok! So we have our Flow spun up remotely already! (dances remotely) But now let's use it, 'cause otherwise that was just a lot of time I could have spent dancing remotely.
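The post doesn't show what start_docker_compose does internally, so here is a minimal sketch of how such a helper could look, assuming it simply shells out to the docker-compose CLI. The names compose_command, compose_up and wait_until are hypothetical and not part of JinaD. The wait_until function is a possible alternative to the fixed time.sleep(10): it polls a readiness check until it passes or a timeout expires.

```python
import subprocess
import time
from typing import Callable, List


def compose_command(compose_yml: str, action: str) -> List[str]:
    """Build a docker-compose command line for the given compose file."""
    if action == 'up':
        # --build rebuilds the image from the Dockerfile, -d runs detached
        return ['docker-compose', '-f', compose_yml, 'up', '--build', '-d']
    if action == 'down':
        return ['docker-compose', '-f', compose_yml, 'down']
    raise ValueError(f'unsupported action: {action}')


def compose_up(compose_yml: str) -> None:
    # Hypothetical stand-in for the start_docker_compose helper used in the test
    subprocess.run(compose_command(compose_yml, 'up'), check=True)


def wait_until(
    is_ready: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Poll is_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval)
    return False
```

With something like this, the test could wait only as long as it actually needs to, instead of always sleeping the full 10 seconds. What counts as "ready" is up to you; the check is passed in as a plain function.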

for x in range(100):
    text = 'text:hey, dude ' + str(x)
    text_indexed = call_api(
        method='post',
        url='http://0.0.0.0:45678/api/index',
        payload={'top_k': 10, 'data': [text]},
    )['index']['docs'][0]['text']

    assert text_indexed == text

In this case, I'm going to index 100 documents, each with the text "hey, dude" plus the document number. Then we check that the text echoed back in the response is exactly the text we sent.

And just like that, we spun up our Flow and indexed some documents on it. If you run this, at this point you'd see the confirmation of what we just did and the Flow ids on your terminal.
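If you want to play with this loop without a running Flow, one option is to pass the HTTP call in as a function. This is only a sketch: index_texts and fake_post are hypothetical names, and fake_post just echoes the payload back in the response shape the test reads (['index']['docs'][0]['text']).

```python
from typing import Callable, Dict, List

INDEX_URL = 'http://0.0.0.0:45678/api/index'  # URL taken from the test above


def index_texts(texts: List[str], post: Callable[[str, Dict], Dict]) -> List[str]:
    """Send each text to the index endpoint and return the texts echoed back."""
    echoed = []
    for text in texts:
        # Same payload as in the test: one document per request
        response = post(INDEX_URL, {'top_k': 10, 'data': [text]})
        echoed.append(response['index']['docs'][0]['text'])
    return echoed


def fake_post(url: str, payload: Dict) -> Dict:
    # Stand-in for a running Flow: echoes the payload in the shape the test reads
    return {'index': {'docs': [{'text': t} for t in payload['data']]}}


texts = [f'text:hey, dude {x}' for x in range(100)]
assert index_texts(texts, fake_post) == texts
```

Against a real Flow you would pass a function that wraps call_api instead of fake_post.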

But that's not really useful until we query the data, so we need to spin up another Flow, this time for querying.

Let's remember that a Flow is a Flow. It doesn't matter if it's for index or query, so we create another one exactly the same way we did for indexing:

flow_id = send_flow(flow_yml, pod_dir)['flow_id']
print(f'Successfully started the flow: {flow_id}. Lets send some query')

And now in this new Flow we get the query results:

texts_matched = get_results(query='anything will match the same')
assert len(texts_matched['search']['docs'][0]['matches']) == 10

Let's look at this a bit closer: what is get_results doing?

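import json

import requests

def call_api(method, url, payload=None, headers=None):
    # Default to JSON; this avoids a mutable default argument
    if headers is None:
        headers = {"Content-Type": "application/json"}
    # Look up requests.post, requests.get, ... by name and decode the JSON reply
    return getattr(requests, method)(
        url, data=json.dumps(payload), headers=headers
    ).json()

def get_results(query, top_k=10):
    # Ask the query Flow for the top_k matches of a single query text
    return call_api(
        method="post",
        url="http://0.0.0.0:45678/api/search",
        payload={"top_k": top_k, "data": [query]},
    )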

As you can see, here we specify the URL, the method, and the payload. We set top_k to 10, which is why the earlier assert verifies that we get exactly 10 matches back.

And that is iiiiit! The last part is just to stop everything Docker-related with:
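To make the response shape a bit more concrete, here is a small sketch of pulling the matches out of a search response. The nesting ['search']['docs'][0]['matches'] comes from the assert above; the helper name extract_matches, the hand-built fake_response, and the 'text' field on each match are assumptions for illustration.

```python
from typing import Any, Dict, List


def extract_matches(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the matches of the first query document, or [] if there are none."""
    docs = response.get('search', {}).get('docs', [])
    if not docs:
        return []
    return docs[0].get('matches', [])


# Hand-built response mimicking the shape the assert reads;
# the 'text' field on each match is an assumption
fake_response = {
    'search': {
        'docs': [
            {'matches': [{'text': f'text:hey, dude {i}'} for i in range(10)]}
        ]
    }
}

matches = extract_matches(fake_response)
assert len(matches) == 10
```

A helper like this fails softly with an empty list when the response has no docs, which can make a broken query easier to spot than a KeyError deep inside a chained lookup.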

stop_docker_compose(compose_yml)
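One thing to keep in mind: if an assert fails halfway through, that last line never runs and the containers stay up. A possible way around it is a small context manager that always stops the services on exit. This is a sketch, not how the JinaD test does it; docker_compose is a hypothetical name, and the start and stop functions are passed in so you can plug in start_docker_compose and stop_docker_compose.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def docker_compose(
    compose_yml: str,
    start: Callable[[str], None],
    stop: Callable[[str], None],
) -> Iterator[None]:
    """Start the services on entry and always stop them on exit."""
    start(compose_yml)
    try:
        yield
    finally:
        # Runs even when an assertion inside the with-block fails
        stop(compose_yml)


# Usage, assuming the helpers from the test are in scope:
# with docker_compose(compose_yml, start_docker_compose, stop_docker_compose):
#     ...  # create the Flows, index, query
```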

So this was fun, and I hope it was good enough for you to see how you can spin up Flows on different machines.

I feel very sad I couldn't talk about cats even once in this post, so I'll come back to fix that sometime soon. In the meantime, keep checking our other examples and don't hesitate to contact us on our Twitter, GitHub, or Slack.

© 2021 Jina AI GmbH. All rights reserved.