DocArray 0.32 Update

DocArray is a library for representing, sending and storing multi-modal data, perfect for Machine Learning applications.

Engineering Group

Release Note (v0.32.0)

This release contains 4 new features, 5 bug fixes and 4 documentation improvements.


🆕 Features

Subindex for document index (#1428)

The subindex feature allows you to index documents that contain another DocList by automatically creating a separate collection/index for each such DocList:

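import numpy as np
from docarray import BaseDoc, DocList
from docarray.index import ElasticDocIndex
from docarray.typing import NdArray


# create nested document schema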
class SimpleDoc(BaseDoc):
    tensor: NdArray[10]
    text: str


class MyDoc(BaseDoc):
    docs: DocList[SimpleDoc]


# create some docs
my_docs = [
    MyDoc(
        docs=DocList[SimpleDoc](
            [
                SimpleDoc(
                    tensor=np.ones(10) * (j + 1),
                    text=f"hello {j}",
                )
                for j in range(10)
            ]
        ),
    )
]

# index them into Elasticsearch
index = ElasticDocIndex[MyDoc](index_name="idx")
index.index(my_docs)  # creates two indices, named 'idx' and 'idx__docs'

# search on the nested level (subindex)
query = np.random.rand(10)
matches_root, matches_nested, scores = index.find_subindex(
    query, search_field="docs__tensor", limit=5
)
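To make the three return values of find_subindex easier to picture, here is a minimal, dependency-free sketch of the idea. It is a conceptual model only, not DocArray's implementation: ToySubindex, NestedEntry and cosine are hypothetical names, and the sketch assumes that each nested entry keeps a reference to its root document and that matches are ranked by cosine similarity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class NestedEntry:
    id: str
    parent_id: str  # points back to the root document that owns this entry
    vector: List[float]


@dataclass
class ToySubindex:
    """Hypothetical stand-in: root ids plus one separate collection of nested entries."""

    roots: List[str] = field(default_factory=list)  # plays the role of 'idx'
    nested: List[NestedEntry] = field(default_factory=list)  # plays the role of 'idx__docs'

    def index(self, root_id: str, vectors: List[List[float]]) -> None:
        self.roots.append(root_id)
        for j, vec in enumerate(vectors):
            self.nested.append(NestedEntry(f"{root_id}-{j}", root_id, vec))

    def find_subindex(
        self, query: List[float], limit: int
    ) -> Tuple[List[str], List[str], List[float]]:
        # Rank every nested entry by similarity, best first, and keep the top `limit`.
        ranked = sorted(
            self.nested, key=lambda e: cosine(query, e.vector), reverse=True
        )[:limit]
        root_ids = [e.parent_id for e in ranked]
        nested_ids = [e.id for e in ranked]
        scores = [cosine(query, e.vector) for e in ranked]
        return root_ids, nested_ids, scores


toy = ToySubindex()
toy.index("a", [[1.0, 0.0], [0.0, 1.0]])
toy.index("b", [[1.0, 1.0]])
root_ids, nested_ids, scores = toy.find_subindex([1.0, 0.1], limit=2)
print(root_ids, nested_ids)  # ['a', 'b'] ['a-0', 'b-0']
```

The search runs over the nested collection, and each hit is reported together with the root document it belongs to, which mirrors the (matches_root, matches_nested, scores) shape of the call above.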

OpenAPI and FastAPI tensor shapes (#1510)

We have enabled shaped tensors to be properly represented in OpenAPI/SwaggerUI, both in examples and the schema.

This means that you can now build web APIs using FastAPI where the SwaggerUI properly communicates tensor shapes to your users:

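from fastapi import FastAPI
from docarray import BaseDoc
from docarray.base_doc import DocArrayResponse
from docarray.typing import TorchTensor


class Doc(BaseDoc):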
    embedding_torch: TorchTensor[3, 4]


app = FastAPI()


@app.post("/foo", response_model=Doc, response_class=DocArrayResponse)
async def foo(doc: Doc) -> Doc:
    return Doc(embedding_torch=doc.embedding_torch)

Generated Swagger UI:

[Image: generated Swagger UI]

Save and load in-memory index (#1534)

We added a persist() method to the InMemoryExactNNIndex class that saves the index to disk. Passing the saved file as index_file_path when constructing a new index loads it again.

# Save your existing index (doc_index, an InMemoryExactNNIndex[MyDoc]) as a binary file
doc_index.persist('docs.bin')
# Initialize a new document index using the saved binary file
new_doc_index = InMemoryExactNNIndex[MyDoc](index_file_path='docs.bin')
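The save-and-reload round trip can be sketched without any dependencies. This is a toy model under stated assumptions, not DocArray's code: ToyIndex is a hypothetical class, documents are plain dicts, and pickle is used only for illustration (the release note does not say which on-disk format persist uses).

```python
import os
import pickle
import tempfile
from typing import List, Optional


class ToyIndex:
    """Hypothetical in-memory index that can be saved to and loaded from a file."""

    def __init__(self, index_file_path: Optional[str] = None) -> None:
        self.docs: List[dict] = []
        if index_file_path is not None:
            # Restore previously persisted documents.
            with open(index_file_path, "rb") as f:
                self.docs = pickle.load(f)

    def index(self, docs: List[dict]) -> None:
        self.docs.extend(docs)

    def persist(self, file: str) -> None:
        # Write the current contents of the index to disk as one binary file.
        with open(file, "wb") as f:
            pickle.dump(self.docs, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.bin")
    original = ToyIndex()
    original.index([{"text": "hello 0"}, {"text": "hello 1"}])
    original.persist(path)
    restored = ToyIndex(index_file_path=path)

restored_texts = [d["text"] for d in restored.docs]
print(restored_texts)  # ['hello 0', 'hello 1']
```

The restored index is a separate object that holds the same documents, which is the behavior the persist and index_file_path pair is meant to provide.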

🐞 Bug Fixes

search_field should be optional in hybrid text search (#1516)

The search_field argument of text_search() is now Optional and has a sensible default.

Check if file path exists for in-memory index (#1537)

We have added an internal check to see if index_file_path exists when passed to InMemoryExactNNIndex.

Add empty judgement to index search (#1533)

Calling find on an empty index no longer fails.

Detach torch tensors (#1526)

Serializing tensors with gradients no longer fails.

DocVec display fixes (#1522)

We have resolved display issues with DocVec.

📗 Documentation Improvements

  • Remove erroneous info (#1531)
  • Fix link to documentation in readme (#1525)
  • Flatten structure (#1520)
  • Fix links (#1518)

🤘 Contributors

We would like to thank all contributors to this release.
