May 17, 2023

Do you want to play a game? Let's build Semantle with DocArray and GPT

Exploring DocArray v2 by building a word game using GPT
Alex C-G • 5 minutes read

Se-what now?

Semantle is a game similar to Wordle, but rather than guessing based on the word's letters, you guess based on the word's semantics.

This is just the junior version. As you can see, I'm quite the expert when it comes to competing with 12-year-olds.

We'll build our own simple version using DocArray v2 and OpenAI’s large language models, based on the top 10,000 most common words in English.

Play with it yourself

To test it out yourself, you can install the version from the [repo]. It's a little bit enhanced, but all the relevant stuff is covered in this tutorial.

git clone https://github.com/alexcg1/semantle-docarray

Understanding the game

In our implementation of Semantle:

  • A loop is started where the user has to guess the word.
  • As soon as the guess matches the target word, the loop ends and the game displays the guess and hint counts to the user.
  • If the user asks for a hint, we use a GPT model to generate a hint from the target word.
  • If the user doesn't get it right and doesn't ask for a hint, we fall back to else, where:
      • The guess counter increments.
      • The distance is calculated and presented to the user.
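
The flow above can be sketched in plain Python before any embeddings or GPT calls are involved. This is only a rough outline: play, fake_distance and fake_hint are hypothetical names used for illustration, and the two fake functions stand in for the OpenAI calls we add later.

```python
from typing import Callable, Iterable


def play(
    target: str,
    inputs: Iterable[str],
    distance_fn: Callable[[str, str], float],
    hint_fn: Callable[[str], str],
) -> tuple[int, int, list[str]]:
    """Run the guess loop and return (guess_count, hint_count, messages)."""
    guess_counter = 0
    hint_counter = 0
    messages = []

    for raw in inputs:
        guess = raw.strip().lower()
        if guess == '/hint':
            # Hints are counted separately from guesses
            hint_counter += 1
            messages.append(hint_fn(target))
        else:
            guess_counter += 1
            if guess == target:
                messages.append(
                    f'Congratulations! You guessed in {guess_counter} turn(s) '
                    f'and {hint_counter} hint(s).'
                )
                break
            distance = distance_fn(target, guess)
            messages.append(f'Try again. Your distance is {round(distance, 2)}.')

    return guess_counter, hint_counter, messages


# Hypothetical stand-ins for the embedding distance and the GPT hint
def fake_distance(target: str, guess: str) -> float:
    return 0.0 if guess == target else 0.5


def fake_hint(target: str) -> str:
    return f'It has {len(target)} letters.'


result = play('apple', ['pear', '/hint', 'apple'], fake_distance, fake_hint)
print(result[:2])  # (2, 1)
```

Passing the distance and hint functions in as arguments keeps the loop testable offline; the real game swaps in the OpenAI-backed versions.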

Keep reading to see how we build it all up!

Getting started

Install requirements

To build our game, we'll need a few dependencies:

  • docarray - our data structure, ideal for representing multimodal documents and embeddings.
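  • openai - we'll use their models to generate embeddings for the target word and guess words, and to generate hints.

The code in this post uses the pre-1.0 interface of the openai package (openai.Embedding.create and openai.Completion.create), so pin the version accordingly:

pip install docarray "openai<1.0"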

Get the list of words

wget https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt

Alternatively, for an easier version you can use a list of just fruits:

wget https://raw.githubusercontent.com/alexcg1/semantle-docarray/main/wordlists/fruits.txt

💡
Why not use a random word generator API? Because English has a lot of words, most of which very few people would know. I want to keep this example accessible. (C'mon, who's going to guess "pseudoantidisestablishmentarianism"?)
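
Once the file is downloaded, it can be worth a quick sanity check before handing it to the game. The sketch below is an optional extra, not something the original app does: load_words and its min_length parameter are hypothetical, and dropping very short entries is just one guess at what makes a fair target word.

```python
import tempfile
from pathlib import Path


def load_words(filename: str, min_length: int = 1) -> list[str]:
    """Read one word per line, lowercased, skipping blanks and duplicates."""
    seen = set()
    words = []
    for line in Path(filename).read_text(encoding='utf-8').splitlines():
        word = line.strip().lower()
        # Blank lines have length 0, so min_length=1 already filters them out
        if len(word) >= min_length and word not in seen:
            seen.add(word)
            words.append(word)
    return words


# Small throwaway file so the sketch runs without downloading anything
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('Apple\n\nbanana\napple\n a \n', encoding='utf-8')
    words = load_words(str(sample), min_length=2)

print(words)  # ['apple', 'banana']
```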

Set OpenAI API key

You'll need an OpenAI API key to use the models. Set it as an environment variable from the command line:

export OPENAI_API_KEY=<your_key_here>
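
If the variable is missing, the game only finds out when it first tries to read the key. A small fail-fast check gives a friendlier error. This is a minimal sketch, and require_env is a hypothetical helper rather than part of the original app.py:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, '').strip()
    if not value:
        raise RuntimeError(
            f'{name} is not set. Run: export {name}=<your_key_here>'
        )
    return value
```

In the game you could then write openai.api_key = require_env('OPENAI_API_KEY') instead of reading os.environ directly.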

Let’s write some code

Defining our data structure

Since we’re just working with text and embeddings, we’ll define a data structure with just a text field and an embedding field by subclassing from BaseDoc. Let’s call that data structure WordDoc. (Not that kind of Word doc. Don't worry, no Clippy here.)

Just kidding. Clippy is always there. Watching. Waiting to come back from the dungeon dimensions where Bill Gates banished him so long ago.

We'll create a file called app.py:


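from docarray import BaseDoc, DocList
from docarray.typing import NdArray

class WordDoc(BaseDoc):
    text: str = ''
    # Empty until we fetch the vector from OpenAI (the | syntax needs Python 3.10+)
    embedding: NdArray[1536] | None = None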

Here we've defined our .embedding attribute as an NdArray with a length of 1,536, which is the length of the embeddings we'll get later from OpenAI. We define the .text attribute ourselves, since BaseDoc doesn't provide one. (DocArray's built-in TextDoc already has a text field, if you'd rather start from that.)

That means we have two useful attributes to work with:

  • text: the string of the word.
  • embedding: the NdArray of the vector embedding of the word.

Choosing our target word

We'll randomly choose one of those words from our list and encode it into a WordDoc:

import random

word_list = 'google-10000-english.txt'  # or fruits.txt

def wordlist_to_doclist(filename: str) -> DocList[WordDoc]:
    # Build one WordDoc per line of the word list
    docs = DocList[WordDoc]()
    with open(filename, 'r') as file:
        for line in file:
            doc = WordDoc(text=line.lower().strip())
            docs.append(doc)

    return docs

all_words = wordlist_to_doclist(word_list)
target_word = random.choice(all_words)

Getting the embedding for our target word

We need to generate an embedding for our target word so that we can give the user a score for their guess based on how near or far it is from our target word.

import os

import openai

openai.api_key = os.environ['OPENAI_API_KEY']

def gpt_encode(doc: WordDoc, model_name: str='text-embedding-ada-002'):
    # Legacy embeddings interface (openai<1.0)
    response = openai.Embedding.create(input=doc.text, model=model_name)
    # The API returns one embedding per input; we only sent one string
    doc.embedding = response['data'][0]['embedding']

    return doc

gpt_encode(target_word)

💡
We use the ada-002 model here since we're just generating the embedding for a simple string. Using GPT-4 would be like using a nuclear bomb to crack a nut.

Getting a user's guess

A user has a (currently) infinite number of guesses:

guess = WordDoc(text='')

guess_counter = 0
hint_counter = 0

while guess.text != target_word.text:
    guess = WordDoc(text=input('What is your guess? '))

Responding to a wrong guess

If the user's input doesn't match the target word, we'll encode the input into a vector using our function from above:

gpt_encode(guess)

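Then we work out the distance between the guess vector and the target vector:

# The original doesn't show where cosine comes from; SciPy's cosine
# distance is assumed here (pip install scipy)
from scipy.spatial.distance import cosine

distance = cosine(target_word.embedding, guess.embedding)

if guess.text != target_word.text:
    print(f'Try again. Your distance is {round(distance, 2)}.')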

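The score is the cosine distance between the guess and the target, so an exact match scores 0 and larger numbers mean the guess is further away. In theory cosine distance tops out at 2 (for vectors pointing in opposite directions), but expect the scores you see in the game to be far smaller.

💡
Then the while loop checks if the guess matches the target. If not, the user gets another guess.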
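
To make the scoring concrete, here is what a cosine distance does under the hood, written in plain Python. It is an illustrative sketch with a hypothetical cosine_distance function and toy two-dimensional vectors, not the function the game imports:

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError('vectors must have the same length')
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError('cosine distance is undefined for zero vectors')
    return 1.0 - dot / (norm_u * norm_v)


same = cosine_distance([3.0, 4.0], [3.0, 4.0])        # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # unrelated direction
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction

print(same, orthogonal, opposite)  # 0.0 1.0 2.0
```

Only the direction of the vectors matters, not their length, which is why the measure suits embeddings.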

Responding to a /hint request

We can re-arrange the above into an if block:

while guess.text != target_word.text:
    guess = WordDoc(text=input('What is your guess? '))

    if guess.text.lower() == '/hint':
        hint_counter += 1
        #  give hint
    else:
        #  check if guess is right or wrong
        pass  # placeholder so the skeleton is valid Python

For the hint, we once again use OpenAI, but this time with GPT. This is because generating a hint for a word is way more complex than just generating an embedding, and requires decent prompting:

import openai

def get_hint(doc, model_name='text-davinci-003'):
    # Legacy completions interface (openai<1.0)
    response = openai.Completion.create(
        model=model_name,
        prompt=f'Create a crossword clue for the word "{doc.text}". You will give a brief dictionary definition of the word, but not say the word itself.',
        max_tokens=100,
        temperature=0.7,
    )

    # Take the first completion and trim surrounding whitespace
    hint = response.choices[0].text.strip()

    return hint
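
The prompt asks the model not to say the word itself, but a prompt is a request, not a guarantee. One possible safety net is to mask the target word if it slips into the hint anyway. This is a sketch of an optional extra: redact_target is a hypothetical helper, not part of the original code.

```python
import re


def redact_target(hint: str, target: str, mask: str = '____') -> str:
    """Replace whole-word, case-insensitive occurrences of target in hint."""
    pattern = re.compile(rf'\b{re.escape(target)}\b', flags=re.IGNORECASE)
    return pattern.sub(mask, hint)


print(redact_target('An apple a day: the Apple of your eye.', 'apple'))
# An ____ a day: the ____ of your eye.
```

You could wrap the result of get_hint with it, for example redact_target(get_hint(target_word), target_word.text). It only catches exact whole-word matches, so plurals and other variants of the word would still get through.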

Responding to a correct guess

This is handled by the while loop itself. As soon as the text of the guess is equal to that of the target word, the loop ends and the user is given their score.

while guess.text != target_word.text:
    guess = WordDoc(text=input('What is your guess? '))

    if guess.text.lower() == '/hint':
        hint_counter += 1
        print(get_hint(target_word))
    else:
        guess_counter += 1
        gpt_encode(guess)

        distance = cosine(target_word.embedding, guess.embedding)

        if guess.text != target_word.text:
            print(
                f'Try again. Your distance is {round(distance, 2)}.'
            )

print(
    f'Congratulations! You guessed in {guess_counter} turn(s) and {hint_counter} hint(s).'
)

Next steps

DocArray is way more than a toy just for making games. You can use it to build search engines, chatbots, image generation APIs, music search (à la Shazam) and much more.

How would you improve the game? Or what (actually useful) thing would you build with DocArray? Join our Discord community and let us know, or make your first contribution today!
