A job in Ocypod represents some task created by clients, which will be queued, then fetched and processed by workers.
Each job has a set of metadata associated with it, some of which is managed by Ocypod, and some of which can be created/updated by clients/workers.
Job lifecycle and statuses
When a job is initially created, it's added to a queue and assigned the created status. Clients then poll that queue for new jobs, receiving the job's payload (the contents of its input field) and the job's ID. The job is removed from its queue, and its status is set to running.
If the client completes the job, it sends a message to Ocypod asking it to update the job's status to completed. If there's an error or exception and the client can't finish the job, it marks the job as failed.
If the client neither completes nor fails a job before the job's timeout (or heartbeat timeout) is exceeded, Ocypod marks the job as timed_out.
Ocypod periodically examines all failed and timed-out jobs, checks whether they're eligible for automatic retries, and re-queues those that are.
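The lifecycle above can be read as a small state machine. The following is a minimal sketch of that reading, not Ocypod's implementation: the names `Status`, `ALLOWED`, `can_transition` and `eligible_for_retry` are hypothetical, the status a re-queued job receives is assumed to be created, and retry eligibility is assumed to mean that fewer retries have been attempted than are allowed.

```python
from enum import Enum


class Status(Enum):
    CREATED = "created"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


# Transitions described in the text. Moving from failed/timed_out back to
# created models an automatic re-queue; the exact status a re-queued job
# gets is an assumption of this sketch.
ALLOWED = {
    Status.CREATED: {Status.RUNNING},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED, Status.TIMED_OUT},
    Status.FAILED: {Status.CREATED},
    Status.TIMED_OUT: {Status.CREATED},
    Status.COMPLETED: set(),
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if the lifecycle sketched above permits current -> new."""
    return new in ALLOWED[current]


def eligible_for_retry(status: Status, retries: int, retries_attempted: int) -> bool:
    """Assumed rule: only failed or timed-out jobs with retries left are re-queued."""
    return status in (Status.FAILED, Status.TIMED_OUT) and retries_attempted < retries
```

For example, `can_transition(Status.CREATED, Status.RUNNING)` holds, while a completed job has no outgoing transitions in this model.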
The Ocypod server maintains the following information about a job, some of which is immutable, some of which will be modified by Ocypod throughout a job's lifecycle, and some of which is modifiable by clients.
id - autogenerated ID for the job, generated when a job is first created and queued
queue - name of the queue the job was created in
status - current status of the job
tags - list of tags (if any) assigned to this job at creation time
created_at - date/time this job was first created and queued
started_at - date/time this job was accepted by a client and its status changed to running
ended_at - date/time this job stopped running, whether due to successful completion, timing out, or failure
last_heartbeat - date/time the last heartbeat for this job was sent by the client executing it
input - the job's payload, sent by the client creating this job; this typically contains the data needed for a worker to execute the job
output - any information the client working on this job chooses to store, such as the job's result, progress information, or partial results; it can be set at any time while the job is running
timeout - maximum execution time of the job before it's marked as timed out
heartbeat_timeout - maximum time without receiving a heartbeat before the job is marked as timed out
expires_after - amount of time this job's metadata will persist in Ocypod after the job reaches a final state (i.e. completed, cancelled, or failed/timed_out with no retries remaining)
retries - number of times this job will automatically be requeued on failure
retries_attempted - number of times this job has failed and been requeued
retry_delays - minimum amount of time to wait between each retry attempt
ended - indicates whether the job is in a final state or not (i.e. completed, or failed/timed out with no retries remaining)
A job in Ocypod will always have one of the following statuses:
created - set by the server when a job is first created and added to a queue
running - set by the server when a worker picks up a job
completed - set by the client to mark a job as successfully completed
failed - set by the client to mark a job as having failed
timed_out - set by the server when a job exceeds either its timeout or heartbeat_timeout
cancelled - set by the client to mark that a job has been cancelled
To aid clients that are checking on the status of jobs, each job also has an ended boolean field. This is set to true if the job is in its final state, and false otherwise.
A job is marked as ended in the following circumstances:
- job has completed status
- job has cancelled status
- job has failed status and 0 retries remaining
- job has timed_out status and 0 retries remaining
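The rule for the ended field can be written as a short predicate. This is an illustrative sketch rather than Ocypod's own code: the function name `is_ended` is hypothetical, "retries remaining" is assumed to mean retries minus retries_attempted, and treating cancelled as a final state is an assumption based on the status list.

```python
def is_ended(status: str, retries: int = 0, retries_attempted: int = 0) -> bool:
    """Return True if a job with this status and retry count is in a final state."""
    # Completed jobs are final; cancelled is assumed to be final as well.
    if status in ("completed", "cancelled"):
        return True
    # Failed or timed-out jobs are final only once no retries remain
    # (assumed to mean retries - retries_attempted <= 0).
    if status in ("failed", "timed_out"):
        return retries - retries_attempted <= 0
    # created and running jobs are still in progress.
    return False
```

A client polling for results would stop once this condition holds, which is the convenience the ended field provides without the client having to reason about retries itself.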
Each queue in Ocypod has its own settings, which are used as defaults for jobs created on that queue (though they can be overridden on a per-job basis).
A queue in Ocypod is a FIFO, with new jobs being added to the beginning of the queue, and workers taking jobs from the end of the queue.
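The FIFO behaviour described above can be modelled in memory with a double-ended queue. This is only a sketch of the ordering guarantee; the class `JobQueue` and its method names are hypothetical and say nothing about how Ocypod stores queues internally.

```python
from collections import deque
from typing import Optional


class JobQueue:
    """In-memory model of a FIFO job queue."""

    def __init__(self) -> None:
        self._jobs: deque = deque()

    def add(self, job_id: int) -> None:
        # New jobs are added to the beginning of the queue.
        self._jobs.appendleft(job_id)

    def take(self) -> Optional[int]:
        # Workers take jobs from the end, so the oldest job comes out first.
        return self._jobs.pop() if self._jobs else None
```

Adding jobs 1, 2 and 3 in that order and then calling `take()` three times yields 1, 2 and 3, and a fourth call returns `None` because the queue is empty.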
Each queue has a number of settings, which are defaults applied to new jobs created in that queue. Each can be overridden on a per-job basis; they exist at the queue level only for convenience.
The timeout setting is the maximum amount of time a job can be running for before it's considered to have timed out. It's specified as a human-readable duration string, e.g. "30s", "1h15m5s", "3w2d", etc.
To disable timeouts entirely, this can be set to "0s".
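To make the duration format concrete, here is a simplified sketch of how strings like those above could be converted to seconds. It is not Ocypod's parser: the function `parse_duration` is hypothetical and only the single-letter units w, d, h, m and s are assumed; the real server may accept other units or spellings.

```python
import re

# Seconds per unit; only these single-letter units are assumed here.
_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([wdhms])")


def parse_duration(text: str) -> int:
    """Convert a string such as "1h15m5s" into a number of seconds."""
    total = 0
    pos = 0
    for match in _TOKEN.finditer(text):
        # Each token must start where the previous one ended,
        # otherwise there is unrecognised text in between.
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    # Reject empty input and trailing unrecognised text.
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total
```

Under these assumptions, `parse_duration("1h15m5s")` gives 4505 and `parse_duration("0s")` gives 0, the value that disables the timeout.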
For long-running jobs, it's recommended that workers send regular heartbeats to the Ocypod server to let it know that the job is still being processed. This allows timeouts or failures to be noticed much earlier than if just relying on the timeout. The heartbeat_timeout setting determines how long a job can be running without getting a heartbeat update before it's considered to have timed out. It's specified as a human-readable duration string.
To disable heartbeat timeouts entirely, this can be set to "0s".
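The interaction between the two timeouts can be sketched as a single check. This is an illustration under stated assumptions, not the server's logic: the function `has_timed_out` is hypothetical, a zero duration is taken to disable the corresponding check, and the heartbeat clock is assumed to start at started_at when no heartbeat has been received yet.

```python
from datetime import datetime, timedelta
from typing import Optional


def has_timed_out(
    now: datetime,
    started_at: datetime,
    last_heartbeat: Optional[datetime],
    timeout: timedelta,
    heartbeat_timeout: timedelta,
) -> bool:
    """Return True if a running job has exceeded either of its timeouts."""
    zero = timedelta(0)
    # Overall execution time limit; a zero duration disables it.
    if timeout > zero and now - started_at > timeout:
        return True
    # Heartbeat limit; a zero duration disables it.
    if heartbeat_timeout > zero:
        # Assumption: before the first heartbeat, measure from the start time.
        last_seen = last_heartbeat if last_heartbeat is not None else started_at
        if now - last_seen > heartbeat_timeout:
            return True
    return False
```

With a one-hour timeout and a one-minute heartbeat timeout, a worker that crashes five minutes into a job is detected after one minute of silence rather than after the full hour.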
The expires_after setting determines how long jobs that have ended (either successfully completed, failed, or timed out without any retries) will remain in the system. After this period of time, the job and its metadata will be cleared from Ocypod.
This is specified as a human-readable duration string, and can be set to "0s" to disable expiry entirely. In this case, you'll be responsible for managing and cleaning up old jobs manually.
A tag is a short string that can be attached to a job at creation time. An endpoint for getting all job IDs by tag is provided.
This allows separate jobs to be grouped together. Example use cases include:
- attaching a batch ID tag to a related set of jobs
- using a username tag to track all jobs belonging to a user
- using a source tag to track the client/process that created a job
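Conceptually, looking up jobs by tag amounts to maintaining an index from each tag to the IDs of the jobs that carry it. The sketch below shows that idea in memory; the class `TagIndex`, its methods, and the example tag strings are hypothetical and do not describe Ocypod's storage or its endpoint.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class TagIndex:
    """Maps each tag to the IDs of the jobs created with it."""

    def __init__(self) -> None:
        self._jobs_by_tag: Dict[str, List[int]] = defaultdict(list)

    def add_job(self, job_id: int, tags: Iterable[str]) -> None:
        # Tags are attached once, at job creation time.
        for tag in tags:
            self._jobs_by_tag[tag].append(job_id)

    def job_ids(self, tag: str) -> List[int]:
        # Return a copy so callers cannot modify the index;
        # unknown tags yield an empty list.
        return list(self._jobs_by_tag.get(tag, []))
```

For instance, after `add_job(1, ["batch-42", "alice"])` and `add_job(2, ["batch-42"])`, calling `job_ids("batch-42")` returns both job IDs, which is how a batch of related jobs could be tracked together.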