Configuration#
JCloud extends Jina’s Flow YAML specification by introducing the special field `jcloud`. This lets you define resources and scaling policies for each Executor and Gateway.
Here’s a Flow with two Executors that have specific resource needs: `indexer` requires a 10 GB `ebs` disk, whereas `encoder` requires a `C4` instance, which implies two cores and 4 GB RAM. See the sections below for further information about instance types.
```yaml
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker://<username>/Encoder
    jcloud:
      resources:
        instance: C4
  - name: indexer
    uses: jinaai+docker://<username>/Indexer
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
```
Allocate Executor resources#
Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.
In JCloud, you can pass highly customizable, fine-grained resource requests for each Executor using the `jcloud.resources` argument in your Flow YAML.
Instance#
JCloud uses the concept of an “instance” to represent a specific set of hardware specifications. In the above example, the `C4` instance type represents two cores and 4 GB RAM, as defined in the CPU tiers table below.
Note
If you are still using the legacy resource specification interface, we translate the raw numbers into the instance tier that fits most closely, for example:
```yaml
jcloud:
  resources:
    cpu: 8
    memory: 8G
```
Sometimes no instance tier exactly matches the CPU cores and memory you request, as in the above example. In cases like this we “ceil” the request to the lowest tier that satisfies all the specifications. Here `C6` would be chosen, since `C5` offers fewer cores than what’s being requested (4 vs 8).
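For illustration, the legacy request above is therefore treated roughly as if you had asked for the `C6` tier directly:

```yaml
# illustrative equivalent of cpu: 8 / memory: 8G after tier "ceiling"
jcloud:
  resources:
    instance: C6
```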
There are two families of instance tiers: one for CPU instances and one for GPU instances.
Pricing#
Each instance has a fixed `Credits Per Hour` number, indicating how many credits JCloud charges when that instance is used. For example, if an Executor uses `C3`, 10 credits per hour are deducted from the operating user account. Other important facts to note:
- If the Flow is powering other App(s) you create, you are charged for the App(s), not the underlying Flow.
- `Credits Per Hour` is counted per Executor/Gateway; the total `Credits Per Hour` of a Flow is the sum of the credits each component costs.
- If shards/replicas are used in an Executor/Gateway, the same instance type is used for each of them, so `Credits Per Hour` is multiplied accordingly. For example, if an Executor uses `C3` and has two replicas, its `Credits Per Hour` doubles to 20. The only exception is sharding: the shards’ head always uses `C1`, regardless of the instance type chosen for the sharded Executor.
Hint
Please visit Jina AI Cloud Pricing for more information about billing and credits.
CPU tiers#
| Instance | Cores | Memory | Credits per hour |
| --- | --- | --- | --- |
| C1 | 0.1 | 0.2 GB | 1 |
| C2 | 0.5 | 1 GB | 5 |
| C3 | 1 | 2 GB | 10 |
| C4 | 2 | 4 GB | 20 |
| C5 | 4 | 8 GB | 40 |
| C6 | 8 | 16 GB | 80 |
| C7 | 16 | 32 GB | 160 |
| C8 | 32 | 64 GB | 320 |
By default, C1 is allocated to each Executor and Gateway.
JCloud uses general-purpose Intel Xeon processors (Skylake 8175M or Cascade Lake 8259CL) for the CPU instances.
GPU tiers#
JCloud supports GPU workloads with two different usage types: `shared` or `dedicated`.
If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24 GB memory for workloads in both usage types.
Hint
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
| Instance | GPU | Memory | Credits per hour |
| --- | --- | --- | --- |
| G1 | shared | 14 GB | 100 |
| G2 | 1 | 14 GB | 125 |
| G3 | 2 | 24 GB | 250 |
| G4 | 4 | 56 GB | 500 |
Dedicated GPU#
Using a dedicated GPU is the default way to provision a GPU for an Executor. This automatically creates nodes or assigns the Executor to a GPU node. In this case, the Executor owns the whole GPU.
To use a `dedicated` GPU, specify `G2`, `G3` or `G4` as the instance type.
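For illustration, a minimal snippet requesting a dedicated GPU might look like the following (the Executor name and image are placeholders):

```yaml
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker://<username>/Encoder
    jcloud:
      resources:
        instance: G2   # dedicated GPU; G3/G4 work the same way
```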
Storage#
JCloud supports three kinds of storage: ephemeral (default), efs (network file storage) and ebs (block device).
`ephemeral` storage assigns space to an Executor when it is created. Data in `ephemeral` storage is permanently deleted if Executors are restarted or rescheduled.
Hint
By default, we assign `ephemeral` storage to all Executors in a Flow. This lets the storage resize dynamically, so you don’t need to shrink/grow volumes manually.
If your Executor needs to share data with other Executors and retain data persistently, consider using `efs`. Note that:
- IO performance is slower compared to `ebs` or `ephemeral`.
- The disk can be shared with other Executors or Flows.
- Default storage size is 5 GB.
If your Executor needs high IO, you can use `ebs` instead. Note that:
- The disk cannot be shared with other Executors or Flows.
- Default storage size is 5 GB.
JCloud also supports retaining the data that a Flow was using while it was active. You can set the `retain` argument to `true` to enable this feature.
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
          retain: true
  - name: executor2
    uses: jinaai+docker://<username>/Executor2
    jcloud:
      resources:
        storage:
          kind: efs
```
Pricing#
Here are the costs, in credits per GB per month, for the three kinds of storage described above.
| Kind | Credits per GB per month |
| --- | --- |
| Ephemeral | 0 |
| EBS | 30 |
| EFS | 75 |
For example, using 10 GB of EBS storage for a month costs 300 credits.
If shards/replicas are used, credits are further multiplied by the number of storage volumes created. For example, a 10 GB `ebs` disk on an Executor with two replicas creates two volumes, costing 2 × 300 = 600 credits per month.
Scale out Executors#
On JCloud, demand-based autoscaling is offered out of the box thanks to the underlying Kubernetes architecture. This means you can run serverless deployments cost-effectively, without the headache of setting the right number of replicas.
Autoscaling with jinaai+serverless://#
The easiest way to scale out your Executor is to use a Serverless Executor. This can be enabled by using `jinaai+serverless://` instead of `jinaai+docker://` in the Executor’s `uses`, such as:
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+serverless://<username>/Executor1
```
JCloud autoscaling leverages Knative behind the scenes, and `jinaai+serverless` uses a set of Knative configurations as defaults.
Hint
For more information about the Knative autoscaling configurations, please visit Knative autoscaling.
Autoscaling with custom args#
If `jinaai+serverless://` doesn’t meet your requirements, you can further customize autoscaling configurations by using the `autoscale` argument on a per-Executor basis in the Flow YAML, such as:
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
```
Below are the defaults and requirements for the configurations:
| Name | Default | Allowed | Description |
| --- | --- | --- | --- |
| min | 1 | int | Minimum number of replicas (may be set to 0; see the note below) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | `concurrency` / `rps` / `cpu` / `memory` | Metric for scaling |
| scale_down_delay | 30s | str | Time window which must pass at reduced load before scaling down |
| target | 100 | int | Target number the replicas try to maintain |
The unit of `target` depends on the metric specified. Refer to the table below:
| Metric | Target |
| --- | --- |
| `concurrency` | Number of concurrent requests processed at any given time |
| `rps` | Number of requests processed per second per replica |
| `cpu` | Average % CPU utilization of each pod |
| `memory` | Average mebibytes of memory used by each pod |
After you deploy to JCloud with an autoscaling configuration, serving the Flow works just the same. The only difference you may notice is that the initial requests take a few extra seconds to handle, since the deployments need to scale up behind the scenes. Let JCloud handle the scaling from now on, and you can focus on the code!
Note that if `metric` is `cpu` or `memory`, `min` is reset to 1 if the user sets it to 0.
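For illustration, an autoscaling configuration driven by CPU utilization might look like the following (the `target` value here is just an example, not a recommendation):

```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      autoscale:
        min: 1        # with the cpu metric, min is always at least 1
        max: 3
        metric: cpu
        target: 70    # average % CPU utilization per pod
```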
Pricing#
At present, pricing for autoscaled Executors/Gateways largely follows the same JCloud pricing rules as other Jina AI services. We track the minimum number of replicas in the autoscale configuration and use it as the replica multiplier when calculating `Credits Per Hour`.
Restrictions#
- Autoscale cannot currently be combined with the `ebs` storage type. Please use `efs` or `ephemeral` instead.
- Autoscale is not supported for multi-protocol Gateways.
Configure availability tolerance#
If service issues disrupt Executors, JCloud lets you specify a tolerance level for the number of replicas that stay up or go down.
The JCloud parameters `minAvailable` and `maxUnavailable` ensure that Executors stay up even if a certain number of replicas go down.
| Name | Default | Allowed | Description |
| --- | --- | --- | --- |
| minAvailable | N/A | Lower than number of replicas | Minimum number of replicas available during disruption |
| maxUnavailable | N/A | Lower than number of replicas | Maximum number of replicas unavailable during disruption |
```yaml
jtype: Flow
executors:
  - uses: jinaai+docker://<username>/Executor1
    replicas: 5
    jcloud:
      minAvailable: 2
```
In case of disruption, this ensures at least two replicas remain available, while up to three may be down.
```yaml
jtype: Flow
executors:
  - uses: jinaai+docker://<username>/Executor1
    replicas: 5
    jcloud:
      maxUnavailable: 2
```
In case of disruption, this ensures that at most two replicas go down, so at least three remain available.
Configure Gateway#
The Gateway can be customized just like an Executor.
Set timeout#
By default, the Gateway closes connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, change the `timeout` parameter under `gateway.jcloud`.
```yaml
jtype: Flow
gateway:
  jcloud:
    timeout: 800
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
Control Gateway resources#
To customize the Gateway’s CPU or memory, specify the instance type under `gateway.jcloud.resources`:
```yaml
jtype: Flow
gateway:
  jcloud:
    resources:
      instance: C3
executors:
  - name: encoder
    uses: jinaai+docker://<username>/Encoder
```
Expose Executors#
A Flow deployment without a Gateway is often used to provide external Executors, which can be shared between different Flows. You can expose an Executor by setting `expose: true` (and hide the Gateway by setting `expose: false`):
```yaml
jtype: Flow
gateway:
  jcloud:
    expose: false  # don't expose the Gateway
executors:
  - name: custom
    uses: jinaai+docker://<username>/CustomExecutor
    jcloud:
      expose: true  # expose the Executor
```
You can expose the Gateway along with Executors:
```yaml
jtype: Flow
gateway:
  jcloud:
    expose: true
executors:
  - name: custom1
    uses: jinaai+docker://<username>/CustomExecutor1
    jcloud:
      expose: true  # expose the Executor
```
Other deployment options#
Customize Flow name#
You can use the `name` argument to specify the Flow name in the Flow YAML:
```yaml
jtype: Flow
jcloud:
  name: my-name
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
Specify Jina version#
To control Jina’s version while deploying a Flow to `jcloud`, you can pass the `version` argument in the Flow YAML:
```yaml
jtype: Flow
jcloud:
  version: 3.10.0
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
Add Labels#
You can use `labels` (as key-value pairs) to attach metadata to your Flows and Executors.

Flow-level `labels`:
```yaml
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
Executor-level `labels`:
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      labels:
        index: partial
        group: backend
```
Hint
Keys in `labels` have the following restrictions:
- Must be 63 characters or fewer.
- Must begin and end with an alphanumeric character ([a-z0-9A-Z]), with dashes (-), underscores (_), dots (.), and alphanumerics in between.
The following keys are skipped if passed in the Flow YAML:
- `user`
- `jina-version`
Monitoring#
To enable tracing support in Flows, you can pass the `enable: true` argument in the Flow YAML. (Tracing support is not enabled by default in JCloud.)
```yaml
jtype: Flow
jcloud:
  monitor:
    traces:
      enable: true
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
You can pass the `enable: true` argument under `gateway` to enable tracing support only in the Gateway:
```yaml
jtype: Flow
gateway:
  jcloud:
    monitor:
      traces:
        enable: true
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
You can also enable tracing support only in `executor1`:
```yaml
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      monitor:
        traces:
          enable: true
```