Jina is an MLOps framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer.
Release Note (3.12.0)
This release contains 8 new features, 16 bug fixes and 15 documentation improvements.
🆕 Features
Support multiple protocols at the same time in Flow Gateways (#5435 and #5378)
Prior to this release, a Flow only exposed one server in its Gateway with one of the following protocols: HTTP, gRPC or WebSockets.
Now, you can specify multiple protocols and for each one, a separate server is started. Each server is bound to its own port.
For instance, you can do:
from jina import Flow
# Each protocol gets its own server, so each needs a distinct port.
flow = Flow(port=[12345, 12344, 12343], protocol=['http', 'grpc', 'websocket'])
with flow:
    flow.block()
or:
jina flow --uses flow.yml
where flow.yml is:
jtype: Flow
with:
  protocol:
    - 'grpc'
    - 'http'
    - 'websocket'
  port:
    - 12345
    - 12344
    - 12343
The protocol and port parameters still accept a single value instead of a list, so there is no breaking change.
Alias parameters protocols and ports are also defined:
flow = Flow(ports=[12345, 12344, 12343], protocols=['http', 'grpc', 'websocket'])
In Kubernetes, this exposes separate services for each protocol.
Read the docs for more information.
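The pairing of protocols and ports described above can be sketched in plain Python. This is only an illustration: pair_protocols_with_ports is a hypothetical helper, and its checks (one port per protocol, no port reused) are assumptions drawn from the statement that each server is bound to its own port, not Jina's actual validation code.

```python
from typing import Dict, List, Union


def pair_protocols_with_ports(
    protocol: Union[str, List[str]], port: Union[int, List[int]]
) -> Dict[str, int]:
    """Hypothetical helper: accept single values or lists and pair them up."""
    protocols = [protocol] if isinstance(protocol, str) else list(protocol)
    ports = [port] if isinstance(port, int) else list(port)
    if len(protocols) != len(ports):
        raise ValueError('need exactly one port per protocol')
    if len(set(ports)) != len(ports):
        raise ValueError('each server needs its own port')
    return dict(zip(protocols, ports))


# List form: one server per protocol, each on its own port.
servers = pair_protocols_with_ports(
    ['http', 'grpc', 'websocket'], [12345, 12344, 12343]
)
# Single-value form keeps working, mirroring the non-breaking behavior.
single = pair_protocols_with_ports('grpc', 12345)
```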
Add option to return requests in order using the Client (#5404)
If you use replicated Executors, those which finish processing first return their results to the Gateway which then returns them to the client. This is useful if you want results as soon as each replicated Executor finishes processing your Documents.
However, this may be inconvenient if you want the Documents you send to the Flow to return in order. In this release, you can retain the order of sent Documents (when using replicated Executors) by passing the results_in_order parameter in the Client.
For instance, if your Flow looks like this:
from jina import Flow, DocumentArray, Document
f = Flow().add(replicas=2)
You can do the following to keep results in order:
input_da = DocumentArray([Document(text=f'{i}') for i in range(100)])
with f:
    result_da = f.post('/', inputs=input_da, request_size=10, results_in_order=True)
    assert result_da[:, 'text'] == input_da[:, 'text']
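To make the ordering idea concrete, here is a minimal, library-free sketch of a reorder buffer: responses that arrive early are held back until every earlier request has been released. This is a generic technique shown under the assumption that each request carries a sequential id; it is not a description of how the Jina Client implements results_in_order.

```python
from typing import Dict, Iterable, Iterator, Tuple


def in_order(arrivals: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Yield (request_id, result) pairs by ascending id, buffering early arrivals."""
    pending: Dict[int, str] = {}
    next_id = 0
    for request_id, result in arrivals:
        pending[request_id] = result
        # Release every result whose predecessors have all been released.
        while next_id in pending:
            yield next_id, pending.pop(next_id)
            next_id += 1


# Replicas finish at different speeds, so responses arrive out of order.
arrivals = [(1, 'b'), (0, 'a'), (3, 'd'), (2, 'c')]
ordered = list(in_order(arrivals))
```

The trade-off is latency: a fast response waits in the buffer until all slower responses before it have arrived.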
Add docs_map parameter to Executor endpoints (#5366)
Executor endpoint signatures are extended to the following:
from typing import Dict, List, Optional, Union
from jina import DocumentArray, Executor, requests

class MyExecutor(Executor):
    @requests
    async def foo(
        self, docs: DocumentArray, parameters: Dict, docs_matrix: Optional[List[DocumentArray]], docs_map: Optional[Dict[str, DocumentArray]]
    ) -> Union[DocumentArray, Dict, None]:
        pass
The new parameter is docs_map. It's a dictionary that maps previous Executor names to DocumentArrays. This is useful when you have an Executor that combines results from many previous Executors, and you need information about where each resulting DocumentArray comes from.
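As a rough sketch of how a combining Executor might use this mapping, the snippet below flattens a docs_map while tagging each item with its upstream Executor. To stay runnable without Jina, plain lists of strings stand in for DocumentArrays; the Executor names encoder_a and encoder_b and the helper merge_with_provenance are hypothetical.

```python
from typing import Dict, List, Optional


def merge_with_provenance(
    docs_map: Optional[Dict[str, List[str]]]
) -> List[Dict[str, str]]:
    """Flatten docs_map, recording which upstream Executor produced each item."""
    merged = []
    # docs_map is optional in the endpoint signature, so tolerate None.
    for executor_name, docs in (docs_map or {}).items():
        for text in docs:
            merged.append({'text': text, 'source': executor_name})
    return merged


# Lists of strings stand in for the DocumentArrays of two upstream Executors.
docs_map = {'encoder_a': ['cat', 'dog'], 'encoder_b': ['bird']}
merged = merge_with_provenance(docs_map)
```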
Add Gateway API (#5342)
Prior to this release, all Gateway configurations were specified in the Flow API. However, in principle, Flow parameters are inherited by both the Executors and the Gateway. We already gave the Executor its own API for customization (either using the add() method or the executors section in the Flow YAML).
In this release, we have done the same for the Gateway. It defines its own API in both the Python API and the YAML interface. In the Python API, you can configure the Gateway using the config_gateway() method:
flow = Flow().config_gateway(port=12345, protocol='http')
And in the YAML interface, you can configure the Gateway using the gateway section:
!Flow
gateway:
  protocol: http
  port: 12344
executors:
  - name: exec
This is useful when you want to apply parameters just for the Gateway. If you want a parameter to be applied to all Executors, then continue to use the Flow API.
Keep in mind that you can still provide Gateway parameters using the Flow API. This means there are no breaking changes introduced.
Support UUID in CUDA_VISIBLE_DEVICES round-robin assignment (#5360)
You can specify a comma-separated list of GPU UUIDs in the CUDA_VISIBLE_DEVICES environment variable to assign devices to Executor replicas in a round-robin fashion. For instance:
CUDA_VISIBLE_DEVICES=RRGPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,GPU-0ddddddd-74d2-7297-d557-12771b6a79d5
Check CUDA's documentation to see the accepted formats to assign CUDA devices by UUID.
GPU device | Replica ID |
---|---|
GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 0 |
GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5 | 1 |
GPU-0ccccccc-74d2-7297-d557-12771b6a79d5 | 2 |
GPU-0ddddddd-74d2-7297-d557-12771b6a79d5 | 3 |
GPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5 | 4 |
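The assignment in the table follows from taking the replica ID modulo the number of devices. The sketch below reproduces it under the assumption, taken from the example value above, that a leading RR marks round-robin mode; both helper functions are hypothetical and Jina's own parsing may differ.

```python
from typing import List


def parse_round_robin_devices(value: str) -> List[str]:
    """Strip the leading 'RR' marker and split the comma-separated device list."""
    if not value.startswith('RR'):
        raise ValueError("expected a value starting with 'RR'")
    return [device.strip() for device in value[2:].split(',') if device.strip()]


def device_for_replica(devices: List[str], replica_id: int) -> str:
    """Wrap around once every device has been handed out."""
    return devices[replica_id % len(devices)]


value = (
    'RRGPU-0aaaaaaa-74d2-7297-d557-12771b6a79d5,'
    'GPU-0bbbbbbb-74d2-7297-d557-12771b6a79d5,'
    'GPU-0ccccccc-74d2-7297-d557-12771b6a79d5,'
    'GPU-0ddddddd-74d2-7297-d557-12771b6a79d5'
)
devices = parse_round_robin_devices(value)
# Five replicas over four devices: replica 4 wraps back to the first GPU.
assignment = {replica: device_for_replica(devices, replica) for replica in range(5)}
```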
Thanks to our community member @mchaker for submitting this feature request!
Capture shard failures in the head runtime (#5338)
If you use Executor shards, partially failed requests (those that fail on a subset of the shards) no longer raise an error.
Instead, successful results are returned. An error is raised only when all shards fail to process Documents. In other words, the HeadRuntime now fails only when all shards fail.
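The "fail only when all shards fail" rule can be sketched as follows. This is a simplified, synchronous illustration with hypothetical shard functions; the real HeadRuntime works with gRPC requests and is not shown here.

```python
from typing import Callable, List


def gather_from_shards(shards: List[Callable[[str], str]], request: str) -> List[str]:
    """Collect results from every shard, tolerating partial failures."""
    results, errors = [], []
    for shard in shards:
        try:
            results.append(shard(request))
        except Exception as exc:  # a failing shard must not discard the others
            errors.append(exc)
    # Raise only if nothing succeeded at all.
    if not results and errors:
        raise RuntimeError(f'all {len(errors)} shards failed') from errors[-1]
    return results


def healthy_shard(request: str) -> str:
    return f'{request}:ok'


def broken_shard(request: str) -> str:
    raise ConnectionError('shard unavailable')


# One shard fails, one succeeds: the successful result is still returned.
partial = gather_from_shards([healthy_shard, broken_shard], 'query')

# Every shard fails: now an error is raised.
try:
    gather_from_shards([broken_shard, broken_shard], 'query')
    all_failed_raises = False
except RuntimeError:
    all_failed_raises = True
```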
Thanks to our community user @soumil1 for submitting this feature request.
Add successful, pending and failed metrics to HeadRuntime (#5374)
More metrics have been added to the Head Pods:
- jina_number_of_pending_requests: number of pending requests
- jina_successful_requests: number of successful requests
- jina_failed_requests: number of failed requests
- jina_received_request_bytes: the size of received requests in bytes
- jina_sent_response_bytes: the size of sent responses in bytes
See more in the instrumentation docs.
Add deployment label in gRPC stub metrics (#5344)
Executor metrics used to show up aggregated at the Gateway level and users couldn't see separate metrics per Executor. With this release, we have added labels for Executors so that metrics in the Gateway can be generated per Executor or aggregated over all Executors.
🐞 Bug Fixes
Check whether the deployment is in Executor endpoints mapping (#5440)
This release adds an extra check in the Gateway when sending requests to deployments: The Gateway sends requests to the deployment only if it is in the Executor endpoint mapping.
Unblock event loop to allow health service (#5433)
Prior to this release, sync function calls inside Executor endpoints blocked the event loop. This meant that health-checks submitted to Executors failed for long tasks (for instance, inference using a large model).
In this release, such tasks no longer block the event loop. While concurrent requests to the same Executor wait until the sync task finishes, other runtime tasks remain functional, mainly health-checks.
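One common way to keep an asyncio event loop responsive during a synchronous call is to run that call in a worker thread. The sketch below shows the general technique with the standard library only; slow_inference and health_check are hypothetical stand-ins, and this is not necessarily how Jina implements the fix.

```python
import asyncio
import time


def slow_inference() -> str:
    """Stand-in for a long synchronous call such as model inference."""
    time.sleep(0.2)
    return 'done'


async def health_check(ticks: list, stop: asyncio.Event) -> None:
    """Record a tick every 10 ms for as long as the loop stays responsive."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    ticks: list = []
    stop = asyncio.Event()
    checker = asyncio.create_task(health_check(ticks, stop))
    loop = asyncio.get_running_loop()
    # Offload the blocking call to a worker thread; the loop keeps running.
    result = await loop.run_in_executor(None, slow_inference)
    stop.set()
    await checker
    return result, len(ticks)


result, tick_count = asyncio.run(main())
```

Calling slow_inference() directly inside main() would instead freeze the loop for the full 0.2 seconds, and the health check would not get a chance to run during that time.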
Dump environment variables to string for Kubernetes (#5430)
Environment variables are now cast to strings before dumping them to Kubernetes YAML.
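Kubernetes expects the value field of a container environment variable to be a string, so numbers and booleans need casting before the YAML is written. A minimal sketch of that step, with a hypothetical helper and made-up variable names (Jina's exact casting logic may differ):

```python
from typing import Any, Dict, List


def to_k8s_env(env: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build a container `env` list with every value cast to a string."""
    return [{'name': name, 'value': str(value)} for name, value in env.items()]


# Hypothetical variables mixing int, bool, and str values.
env_entries = to_k8s_env({'WORKERS': 4, 'DEBUG': True, 'MODEL': 'small'})
```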
Unpin jina-hubble-sdk version (#5412)
This release frees (unpins) the jina-hubble-sdk version. The latest jina-hubble-sdk is installed with the latest Jina.
Bind servers to host argument instead of __default_host__ (#5405)
This release makes servers at each Jina pod (head, Gateway, worker) bind to the host address specified by the user, instead of always binding to the __default_host__ corresponding to the OS. This lets you, depending on your network interface, restrict or expose your Flow services in your network.
For instance, if you wish to expose all pods to the internet, except for the last Executor, you can do:
flow = Flow(host='0.0.0.0').add().add(host='127.0.0.1')
After this fix, Jina respects this syntax and binds the last Executor only to 127.0.0.1 (accessible only inside the host machine).
Thanks to @wqh17101 for reporting this issue!
Fix backoff_multiplier format when using max_attempts in the Client (#5403)
This release fixes the format of the backoff_multiplier parameter when it is injected into the gRPC request. The issue appeared when using the max_attempts parameter in the Client.
Maintain the correct tracing operations chain (#5391)
Tracing spans for Executors used to show up out of order. This behavior has been fixed by using the method start_as_current_span instead of start_span to maintain the tracing chain in the correct order.
Use Async health servicer for tracing interceptors when tracing is enabled (#5392)
When tracing is enabled, health checks in Docker and Kubernetes deployments used to fail silently until the Flow timed out. This happened because tracing interceptors expected RPC stubs to be coroutines.
This release fixes this issue by using the async aio.HealthServicer instead of grpc_health.HealthServicer. Health checks submitted to runtimes (Gateway, head, worker) no longer fail when tracing is enabled.
Properly update requests count in case of Exception inside the HeadRuntime (#5383)
When an Exception was raised in the HeadRuntime, request counts were not updated properly (pending requests should have been decremented and failed requests incremented). This release fixes that: the Exception is now caught so that the request counts can be updated.
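The bookkeeping described here maps naturally onto try/except/else/finally. The sketch below is an illustration with hypothetical names (RequestCounters, handle). Re-raising the exception after updating the counters is an assumption, since the note only says the Exception is caught.

```python
from typing import Callable


class RequestCounters:
    """Hypothetical stand-in for the pending/successful/failed metrics."""

    def __init__(self) -> None:
        self.pending = 0
        self.successful = 0
        self.failed = 0


def handle(counters: RequestCounters, process: Callable[[str], object], request: str):
    counters.pending += 1
    try:
        result = process(request)
    except Exception:
        counters.failed += 1  # count the failure before propagating it
        raise  # assumption: the error still reaches the caller
    else:
        counters.successful += 1
        return result
    finally:
        counters.pending -= 1  # runs on both success and failure


counters = RequestCounters()
handle(counters, str.upper, 'ok')  # succeeds
try:
    handle(counters, int, 'not a number')  # int() raises ValueError
except ValueError:
    pass
```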
Fix endpoint binding when inheriting Executors (#5380)
When an Executor is inherited, the bound endpoints of the parent Executor used to be overridden by those of the child Executor. This meant, if you inherited from Executors but still chose to use the parent Executor in your Flow, a wrong endpoint could have been called.
This behavior is fixed by making Executor.requests a nested dict that also includes information about the class name. This helps to properly support Executor inheritance.
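To illustrate why keying endpoints by class name avoids the override, here is a self-contained sketch of a nested registry of the form {class name: {endpoint: function}}. The endpoint decorator, REGISTRY, and resolve are hypothetical stand-ins and do not reproduce Jina's requests decorator.

```python
from typing import Callable, Dict

# Nested mapping: {class name: {endpoint path: bound function}}.
REGISTRY: Dict[str, Dict[str, Callable]] = {}


class endpoint:
    """Hypothetical decorator that records the owning class of each endpoint."""

    def __init__(self, on: str):
        self.on = on
        self.fn = None

    def __call__(self, fn: Callable) -> 'endpoint':
        self.fn = fn
        return self

    def __set_name__(self, owner: type, name: str) -> None:
        # Called when the class body finishes, so the owner class is known here.
        REGISTRY.setdefault(owner.__name__, {})[self.on] = self.fn
        setattr(owner, name, self.fn)  # restore the plain method on the class


class Parent:
    @endpoint(on='/index')
    def index(self) -> str:
        return 'parent'


class Child(Parent):
    @endpoint(on='/index')
    def index_v2(self) -> str:
        return 'child'


def resolve(cls: type, on: str) -> Callable:
    """Look up an endpoint for cls, falling back to its base classes."""
    for klass in cls.__mro__:
        fn = REGISTRY.get(klass.__name__, {}).get(on)
        if fn is not None:
            return fn
    raise KeyError(on)


parent_result = resolve(Parent, '/index')(Parent())
child_result = resolve(Child, '/index')(Child())
```

With a flat {endpoint: function} mapping, defining Child would overwrite the '/index' entry, and a Flow using Parent would call the child's method.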
Missing recording logic in connection stub metrics (#5363)
Recording of request and response size in bytes is fixed to track all cases. This makes these metrics more accurate for the Gateway.
Move build configs to pyproject (#5351)
Build requirements have been moved from setup.py to pyproject.toml. This suppresses deprecation warnings that show up when installing Jina.
New timer should keep labels (#5341)
The MetricsTimer in instrumentation previously created new timers without keeping the histogram metric labels. This behavior is fixed and new timers retain the same labels.
Use non-mutable default for MetricsTimer constructor (#5339)
The MetricsTimer constructor now uses None instead of an empty dict as the default value for histogram_metric_labels.
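The reason behind this change is a well-known Python pitfall: a mutable default argument is created once and shared across calls. The two classes below are hypothetical stand-ins (not the real MetricsTimer) that contrast the two styles.

```python
from typing import Dict, Optional


class SharedDefaultTimer:
    """Anti-pattern: the default dict is created once and shared by all instances."""

    def __init__(self, labels: Dict[str, str] = {}):
        self.labels = labels


class SafeTimer:
    """None as the default; a fresh dict is created per instance."""

    def __init__(self, labels: Optional[Dict[str, str]] = None):
        self.labels = {} if labels is None else labels


a, b = SharedDefaultTimer(), SharedDefaultTimer()
a.labels['endpoint'] = '/index'
leaked = b.labels  # b sees the label that was set on a

c, d = SafeTimer(), SafeTimer()
c.labels['endpoint'] = '/index'
isolated = d.labels  # d is unaffected
```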
Catch RpcError and show better error messages in Client (#5325)
In the Client, we catch RpcError and show its details instead of showing a standard error message.
Import OpenTelemetry functions only when tracing is enabled in WorkerRuntime (#5321)
This release ensures OpenTelemetry functions are only imported when tracing is enabled in the worker.
📗 Documentation Improvements
- Remove 3 off-topic articles
- Enable flag to convert resource labels to metric labels (#5409)
- Add reference to Jina Kotlin client from community (#5390)
- Add section about the transition from DocArray to Jina (#5382)
- Add tips about supporting multiprocessing with fork in Jina when using macOS (#5379)
- Fix reference to multiprocessing with spawn section and emphasize the need for entrypoint protection (#5370)
- Update instructions to build protos locally using protogen Docker image (#5335)
- Change mentions of JCloud to Jina AI Cloud (#5329)
- Restructure docs into cloud-native section (#5332)
- Add contributor acknowledgement and spell checking (#5324)
- Create a Kubernetes section (#5315)
- Add reference to Go and PHP clients (#5253)
- Introduce versioning to the documentation (#5310)
- Support redirects for removed documentation pages (#5301)
- Use better Grafana screenshot without random text block (#5306)
🤘 Contributors
We would like to thank all contributors to this release:
- Ziniu Yu (@ZiniuYu)
- Andrei Ungureanu (@Andrei997)
- Alex Cureton-Griffiths (@alexcg1)
- Yanlong Wang (@nomagick)
- Johannes Messner (@JohannesMessner)
- samsja (@samsja)
- Joan Fontanals (@JoanFM)
- Zhaofeng Miao (@mapleeit)
- Girish Chandrashekar (@girishc13)
- Jackmin801 (@Jackmin801)
- Han Xiao (@hanxiao)
- AlaeddineAbdessalem (@alaeddine-13)
- Nan Wang (@nan-wang)