CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.
Release Note (0.8.0)
Release time: 2022-10-12 08:11:40
This release contains 3 new features, 1 performance improvement, and 1 documentation improvement.
🆕 Features
Support large ONNX model files (#828)
Before this release, ONNX model files were limited to 2 GB. We now support large ONNX models archived as zip files, in which several smaller ONNX files are stored for subgraphs. As a result, all of the CLIP models can now be served via onnxruntime.
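As a rough illustration of the archive layout described above, the following sketch unpacks a zip file and lists the ONNX files it contains. The helper extract_onnx_archive and the member names (part_0.onnx, part_1.onnx) are hypothetical and use placeholder bytes; this is not the loading code used by clip_server, only a minimal sketch built on the Python standard library.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_onnx_archive(archive, target_dir):
    """Extract a zip archive and return the sorted paths of the .onnx files inside."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return sorted(Path(target_dir).rglob("*.onnx"))


# Build a tiny in-memory archive. The member names are hypothetical and the
# contents are placeholder bytes, not real ONNX subgraphs.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("part_0.onnx", b"placeholder")
    zf.writestr("part_1.onnx", b"placeholder")
    zf.writestr("README.txt", b"not a model file")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    onnx_files = extract_onnx_archive(buffer, tmp)
    onnx_names = [p.name for p in onnx_files]

print(onnx_names)
```

In a real deployment the extracted directory, rather than a single file under 2 GB, would be what the runtime reads from.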
Support ViT-B-32, ViT-L-14, ViT-H-14 and ViT-g-14 trained on laion-2b (#825)
Users can now serve four new CLIP models from OpenCLIP trained on the LAION-2B dataset:
- ViT-B-32::laion2b-s34b-b79k
- ViT-L-14::laion2b-s32b-b82k
- ViT-H-14::laion2b-s32b-b79k
- ViT-g-14::laion2b-s12b-b42k
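The four names above appear to follow an architecture::pretrained-tag pattern. A minimal sketch of splitting such a name is shown here; parse_model_name and ModelName are hypothetical helpers for illustration and are not part of clip_server or clip_client.

```python
from typing import NamedTuple, Optional


class ModelName(NamedTuple):
    architecture: str
    pretrained: Optional[str]


def parse_model_name(name: str) -> ModelName:
    """Split 'ARCH::TAG' into its two parts; the tag is None when no '::' is present."""
    architecture, sep, pretrained = name.partition("::")
    return ModelName(architecture, pretrained if sep else None)


new_models = [
    "ViT-B-32::laion2b-s34b-b79k",
    "ViT-L-14::laion2b-s32b-b82k",
    "ViT-H-14::laion2b-s32b-b79k",
    "ViT-g-14::laion2b-s12b-b42k",
]

parsed = [parse_model_name(m) for m in new_models]
for model in parsed:
    print(model.architecture, model.pretrained)
```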
The ViT-H-14 model achieves 78.0% zero-shot top-1 accuracy on ImageNet and 73.4% zero-shot image retrieval Recall@5 on MS COCO. At the time of this release, it is the best-performing open-source CLIP model. To use one of the new models, specify its name, e.g., ViT-H-14::laion2b-s32b-b79k, in the Flow YAML. For example:
jtype: Flow
version: '1'
with:
  port: 51000
executors:
  - name: clip_t
    uses:
      jtype: CLIPEncoder
      with:
        name: ViT-H-14::laion2b-s32b-b79k
      metas:
        py_modules:
          - clip_server.executors.clip_torch
Please refer to the model support documentation for the full list of supported models.
In-place result in clip_client; preserve output order by uuid (#815)
The clip_client module now supports in-place embedding. This means that the embeddings returned by the CLIP server are stored in the input DocumentArray instead of in a newly created DocumentArray. Consequently, the DocumentArray returned by a call to Client.encode now has the same order as the input DocumentArray.
This is a breaking change for any code that depends on Client.encode returning a new DocumentArray instance.
If you run the following code, you can verify that the input DocumentArray now contains the embeddings and that the order is unchanged.
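from docarray import DocumentArray, Document
from clip_client import Client

c = Client('grpc://0.0.0.0:51000')

# Wrap the documents in a DocumentArray so that `.embeddings` is available
# on the input after encoding (a plain list has no such attribute).
da = DocumentArray(
    [
        Document(text='she smiled, with pain'),
        Document(uri='apple.png'),
        Document(uri='apple.png').load_uri_to_image_tensor(),
        Document(blob=open('apple.png', 'rb').read()),
        Document(uri='https://clip-as-service.jina.ai/_static/favicon.png'),
        Document(
            uri=''
        ),
    ]
)

# Embeddings are written into `da` in place, in the original order.
c.encode(da)

print(da.embeddings)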
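The idea of filling the input in place while preserving order by uuid can be sketched without a running server. In the sketch below, Doc, fake_server and encode_in_place are hypothetical stand-ins (not docarray or clip_client APIs): the fake server returns results in shuffled order, and the client matches each result back to its input document by id.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import uuid4


@dataclass
class Doc:
    """Hypothetical stand-in for a docarray Document."""
    text: str
    id: str = field(default_factory=lambda: uuid4().hex)
    embedding: Optional[List[float]] = None


def fake_server(docs):
    """Pretend to embed docs; return (id, embedding) pairs in shuffled order."""
    results = [(d.id, [float(len(d.text))]) for d in docs]
    random.shuffle(results)
    return results


def encode_in_place(docs):
    """Write embeddings back into the input docs, matching results by id."""
    by_id = {d.id: d for d in docs}
    for doc_id, embedding in fake_server(docs):
        by_id[doc_id].embedding = embedding
    return docs


docs = [Doc("a"), Doc("bb"), Doc("ccc")]
result = encode_in_place(docs)

print(result is docs)                # the same object is returned
print([d.embedding for d in docs])   # input order is preserved
```

Because results are matched by id rather than by arrival position, the order of the input is unaffected by the order in which responses come back. Code that needs the original, unembedded documents afterwards would have to copy them before encoding.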
🚀 Performance
Drop image content to reduce latency (#824)
Calls to Client.encode no longer return the input image along with the embedding. Since embeddings are now inserted into the original DocumentArray instance, sending the image back would be unnecessary network traffic. As a result, the system is now faster and more responsive. The size of the improvement depends on the image size and the network bandwidth.
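A back-of-the-envelope estimate shows how the saving scales with image size and bandwidth. All numbers below are hypothetical (a 500 kB image, a 1024-dimensional float32 embedding, a 100 Mbit/s link), and the calculation ignores serialization and protocol overhead, so it is only a rough sketch, not a measurement.

```python
def transfer_seconds(num_bytes: int, bandwidth_mbps: float) -> float:
    """Time to move num_bytes over a link of bandwidth_mbps megabits per second."""
    return num_bytes * 8 / (bandwidth_mbps * 1_000_000)


# Hypothetical values chosen for illustration only.
image_bytes = 500_000                # a 500 kB image
embedding_dim = 1024                 # assumed embedding size
embedding_bytes = embedding_dim * 4  # float32 takes 4 bytes per value
bandwidth_mbps = 100.0               # assumed link speed

with_image = transfer_seconds(image_bytes + embedding_bytes, bandwidth_mbps)
without_image = transfer_seconds(embedding_bytes, bandwidth_mbps)
saved_ms = (with_image - without_image) * 1000

print(f"{saved_ms:.1f} ms saved per document")
```

Under these assumptions the response shrinks from roughly 504 kB to about 4 kB per document, so larger images or slower links make the saving proportionally bigger.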
📗 Documentation Improvements
CLIP benchmark on zero-shot classification and retrieval tasks (#832)
We now provide benchmark information for CLIP models on zero-shot classification and retrieval tasks. This information should help users choose the best CLIP model for their specific use cases. For more details, please read the Benchmark page in the CLIP-as-service User Guide.
🤟 Contributors
We would like to thank all contributors to this release:
- Felix Wang (@numb3r3)
- Ziniu Yu (@ZiniuYu)
- Jie Fu (@jemmyshin)