Release Note (0.35.0)
This release contains 3 new features, 2 bug fixes, and 1 documentation improvement.
🆕 Features
More serialization options for DocVec (#1562)
DocVec now has the same serialization interface as DocList. This means that the following methods are available for it:
- to_protobuf() / from_protobuf()
- to_base64() / from_base64()
- save_binary() / load_binary()
- to_bytes() / from_bytes()
- to_dataframe() / from_dataframe()
For example, you can now perform Base64 (de)serialization like this:
from docarray import BaseDoc, DocVec


class SimpleDoc(BaseDoc):
    text: str


dv = DocVec[SimpleDoc]([SimpleDoc(text=f'doc {i}') for i in range(2)])
base64_repr_dv = dv.to_base64(compress=None, protocol='pickle')
dl_from_base64 = DocVec[SimpleDoc].from_base64(
    base64_repr_dv, compress=None, protocol='pickle'
)
For further guidance, check out the documentation section on serialization.
Validate file formats in URL (#1606) (#1669)
URL types such as AudioURL, TextURL, and ImageURL now validate the file format of the given URL to check that it corresponds to the expected MIME type.
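To illustrate the idea behind this kind of check, here is a minimal sketch that uses only the Python standard library. It is not DocArray's implementation: the helper name has_expected_mime_type is hypothetical, and rejecting URLs whose extension cannot be mapped to a MIME type is a choice made for this sketch only.

```python
import mimetypes
from urllib.parse import urlparse


def has_expected_mime_type(url: str, expected_family: str) -> bool:
    """Hypothetical helper: check a URL's file extension against a MIME family.

    `expected_family` is the top-level MIME type, e.g. 'image', 'audio' or 'text'.
    """
    # Only look at the path so that query strings do not hide the extension.
    path = urlparse(url).path
    guessed_type, _ = mimetypes.guess_type(path)
    if guessed_type is None:
        # Assumption of this sketch: unknown extensions are rejected.
        return False
    return guessed_type.startswith(expected_family + '/')


print(has_expected_mime_type('https://example.com/cat.png', 'image'))
print(has_expected_mime_type('https://example.com/song.mp3', 'image'))
```

With this sketch, a .png URL is accepted for the 'image' family, while an .mp3 URL is rejected for it.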
Add methods to create BaseDoc from schema (#1667)
Sometimes it can be useful to dynamically create a BaseDoc from the schema of an original BaseDoc. Using the methods create_pure_python_type_model and create_base_doc_from_schema, you can reconstruct the BaseDoc.
from typing import Optional

from docarray import BaseDoc, DocList
from docarray.documents import TextDoc
from docarray.typing import AnyTensor
from docarray.utils.create_dynamic_doc_class import (
    create_base_doc_from_schema,
    create_pure_python_type_model,
)


class MyDoc(BaseDoc):
    tensor: Optional[AnyTensor]
    texts: DocList[TextDoc]


# Due to a limitation of DocList as a Pydantic List, the `DocList` fields
# of MyDoc need to be converted to `List` first.
MyDocPurePython = create_pure_python_type_model(MyDoc)

# Rebuild a BaseDoc class from the schema of the pure-Python model.
NewMyDoc = create_base_doc_from_schema(
    MyDocPurePython.schema(), 'MyDoc', {}
)

new_doc = NewMyDoc(tensor=None, texts=[TextDoc(text='text')])
🐞 Bug Fixes
Cap Pydantic version (#1682)
Due to the breaking changes in Pydantic v2, we have capped the Pydantic version to avoid problems when installing DocArray.
Better error message when DocVec is unusable (#1675)
After calling doc_list = doc_vec.to_doc_list(), doc_vec ends up in an unusable state since its data has been transferred to doc_list. This fix gives users a more informative error message when they try to interact with doc_vec after it has been made unusable.
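The general pattern can be sketched with a toy container. This is an illustration only, not DocArray's code: the names ColumnStore, UnusableError and to_list are hypothetical stand-ins for a container that hands its data over and then reports clearly why it can no longer be used.

```python
class UnusableError(RuntimeError):
    """Hypothetical error raised when an emptied container is accessed."""


class ColumnStore:
    """Toy container that transfers ownership of its data on conversion."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._usable = True

    def _check_usable(self):
        # Fail early with an explanatory message instead of an obscure
        # error caused by the missing data.
        if not self._usable:
            raise UnusableError(
                'This ColumnStore is unusable because its data was '
                'transferred by to_list(). Use the returned list instead.'
            )

    def __len__(self):
        self._check_usable()
        return len(self._rows)

    def to_list(self):
        self._check_usable()
        # Hand the data over and mark this object as unusable.
        rows, self._rows = self._rows, None
        self._usable = False
        return rows


store = ColumnStore(['doc 0', 'doc 1'])
rows = store.to_list()
try:
    len(store)
except UnusableError as err:
    print(err)
```

The design choice shown here is to guard every public method with the same check, so that any later interaction produces the same explanatory message.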
📗 Documentation Improvements
- Fix a reference in README (#1674)
🤟 Contributors
We would like to thank all contributors to this release:
- Saba Sturua (@jupyterjazz)
- Joan Fontanals (@JoanFM)
- Han Xiao (@hanxiao)
- Johannes Messner (@JohannesMessner)