Release Note (v0.32.0)
This release contains 4 new features, 5 bug fixes and 4 documentation improvements.
🆕 Features
Subindex for document index (#1428)
The subindex feature lets you index documents that contain another DocList by automatically creating a separate collection/index for each nested DocList:
import numpy as np
from docarray import BaseDoc, DocList
from docarray.index import ElasticDocIndex
from docarray.typing import NdArray

# create nested document schema
class SimpleDoc(BaseDoc):
    tensor: NdArray[10]
    text: str

class MyDoc(BaseDoc):
    docs: DocList[SimpleDoc]

# create some docs
my_docs = [
    MyDoc(
        docs=DocList[SimpleDoc](
            [
                SimpleDoc(
                    tensor=np.ones(10) * (j + 1),
                    text=f"hello {j}",
                )
                for j in range(10)
            ]
        ),
    )
]

# index them into Elasticsearch
index = ElasticDocIndex[MyDoc](index_name="idx")
index.index(my_docs)  # creates two indices, named 'idx' and 'idx__docs'

# search on the nested level (subindex)
query = np.random.rand(10)
matches_root, matches_nested, scores = index.find_subindex(
    query, search_field="docs__tensor", limit=5
)
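The naming pattern in the comment above ('idx' and 'idx__docs') can be illustrated with a small standalone sketch. It is a conceptual illustration only, not DocArray's implementation: the dictionary-based schema and the `subindex_names` helper are hypothetical, and the double-underscore join is taken from the index names mentioned in the example.

```python
from typing import Dict, List

# Hypothetical, simplified schema description: each key is a field name and
# each value is either a leaf type name (str) or a nested schema (dict) that
# stands in for a DocList of sub-documents.
Schema = Dict[str, object]


def subindex_names(index_name: str, schema: Schema) -> List[str]:
    """Return the root index name followed by one name per nested field."""
    names = [index_name]
    for field, field_type in schema.items():
        if isinstance(field_type, dict):
            # Nested collection: join parent and field with a double
            # underscore, then recurse in case sub-documents nest further.
            names.extend(subindex_names(f"{index_name}__{field}", field_type))
    return names


simple_doc: Schema = {"tensor": "NdArray[10]", "text": "str"}
my_doc: Schema = {"docs": simple_doc}

print(subindex_names("idx", my_doc))  # ['idx', 'idx__docs']
```

Leaf fields such as `tensor` and `text` stay in their parent's index; only nested collections produce an additional name.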
OpenAPI and FastAPI tensor shapes (#1510)
We have enabled shaped tensors to be properly represented in OpenAPI/SwaggerUI, both in examples and the schema.
This means that you can now build web APIs using FastAPI where the SwaggerUI properly communicates tensor shapes to your users:
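from fastapi import FastAPI
from docarray import BaseDoc
from docarray.base_doc import DocArrayResponse
from docarray.typing import TorchTensor

class Doc(BaseDoc):
    embedding_torch: TorchTensor[3, 4]

app = FastAPI()

@app.post("/foo", response_model=Doc, response_class=DocArrayResponse)
async def foo(doc: Doc) -> Doc:
    return Doc(embedding_torch=doc.embedding_torch)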
[Screenshot: generated Swagger UI]
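To make "tensor shapes in the schema and examples" more concrete, here is a minimal sketch of how a fixed shape such as [3, 4] could be expressed as nested JSON Schema arrays. This is an assumption about one reasonable encoding, not necessarily the schema DocArray emits; `shaped_array_schema` and `zeros_example` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Sequence


def shaped_array_schema(shape: Sequence[int]) -> Dict[str, Any]:
    """Build a nested JSON Schema 'array' definition with a fixed length per axis."""
    if not shape:
        # Innermost level: a single numeric entry.
        return {"type": "number"}
    return {
        "type": "array",
        "minItems": shape[0],
        "maxItems": shape[0],
        "items": shaped_array_schema(shape[1:]),
    }


def zeros_example(shape: Sequence[int]) -> Any:
    """Build a nested list of zeros with the given shape, usable as an example value."""
    if not shape:
        return 0.0
    return [zeros_example(shape[1:]) for _ in range(shape[0])]


schema = shaped_array_schema([3, 4])
schema["example"] = zeros_example([3, 4])
print(json.dumps(schema, indent=2))
```

Pinning `minItems` and `maxItems` to the same value on each axis is what lets a schema viewer show the expected shape instead of an unconstrained list.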
Save and load in-memory index (#1534)
We added a persist method to the InMemoryExactNNIndex class that saves the index to disk. To load it again, pass the saved file as index_file_path when creating a new index.
# Save your existing index as a binary file
doc_index.persist('docs.bin')
# Initialize a new document index using the saved binary file
new_doc_index = InMemoryExactNNIndex[MyDoc](index_file_path='docs.bin')
🐞 Bug Fixes
search_field should be optional in hybrid text search (#1516)
The search_field argument of text_search() is now Optional and has a sensible default.
Check if file path exists for in-memory index (#1537)
We have added an internal check to see if index_file_path exists when passed to InMemoryExactNNIndex.
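The persist-and-reload pattern, together with the path check, can be sketched with the standard library alone. `ToyIndex` is a hypothetical stand-in and not the DocArray class. The release note does not say what happens when the file is missing, so falling back to an empty index is an assumption of this sketch, as is the use of pickle for the binary file.

```python
import os
import pickle
import tempfile
from typing import List, Optional


class ToyIndex:
    """Hypothetical stand-in for an in-memory index; not the DocArray class."""

    def __init__(self, index_file_path: Optional[str] = None) -> None:
        self.docs: List[dict] = []
        # Only load when a path was given and the file is actually there;
        # otherwise start empty (an assumption of this sketch).
        if index_file_path is not None and os.path.exists(index_file_path):
            with open(index_file_path, "rb") as f:
                self.docs = pickle.load(f)

    def index(self, docs: List[dict]) -> None:
        """Add documents to the in-memory store."""
        self.docs.extend(docs)

    def persist(self, file_path: str) -> None:
        """Write the stored documents to a binary file."""
        with open(file_path, "wb") as f:
            pickle.dump(self.docs, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "docs.bin")

    original = ToyIndex()
    original.index([{"text": "hello 0"}, {"text": "hello 1"}])
    original.persist(path)

    restored = ToyIndex(index_file_path=path)
    missing = ToyIndex(index_file_path=os.path.join(tmp, "nope.bin"))

print(len(restored.docs), len(missing.docs))  # 2 0
```

Checking the path before opening it means that a first run, where no file has been persisted yet, can share the same constructor call as later runs.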
Add empty judgement to index search (#1533)
Calling find on an empty index no longer fails.
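A minimal sketch of the kind of guard this fix describes, written as a pure-Python exact nearest-neighbour search. The `find` and `cosine` functions here are hypothetical illustrations, not DocArray's code. Array-based implementations may raise when asked to stack or rank zero vectors, which is why the sketch returns early for an empty index.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find(
    query: Vector, vectors: List[Vector], limit: int = 10
) -> Tuple[List[int], List[float]]:
    """Return positions and scores of the `limit` most similar stored vectors."""
    if not vectors:
        # Empty index: return empty results before any scoring or ranking.
        return [], []
    scored = sorted(
        ((cosine(query, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:limit]
    return [i for _, i in scored], [s for s, _ in scored]


print(find([1.0, 0.0], []))  # ([], [])
ids, scores = find([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], limit=2)
print(ids)  # [1, 2]
```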
Detach torch tensors (#1526)
Serializing tensors with gradients no longer fails.
DocVec display fixes (#1522)
We have resolved DocVec display issues.
📗 Documentation Improvements
- Remove erroneous info (#1531)
- Fix link to documentation in readme (#1525)
- Flatten structure (#1520)
- Fix links (#1518)
🤘 Contributors
We would like to thank all contributors to this release:
- Mohammad Kalim Akram (@makram93)
- Johannes Messner (@JohannesMessner)
- Anne Yang (@AnneYang720)
- Zhaofeng Miao (@mapleeit)
- Joan Fontanals (@JoanFM)
- Kacper Łukawski (@kacperlukawski)
- IyadhKhalfallah (@IyadhKhalfallah)
- Saba Sturua (@jupyterjazz)