Core concepts

Job

A job in Ocypod represents some task created by clients, which will be queued, then fetched and processed by workers.

Each job has a set of metadata associated with it, some of which is managed by Ocypod, and some of which can be created/updated by clients/workers.

Job lifecycle and statuses

When a job is initially created, it's added to a queue and assigned the queued status.

Workers (the clients that process jobs) will then poll that queue for new jobs, receiving the job's ID and its payload (the contents of its input field). When a job is handed to a worker, it is removed from its queue and its status is set to running.

If the client completes the job, it will send a message to Ocypod asking it to update the job's status to completed. If there's some error/exception and the client can't finish the job, it will mark the job as failed.

If the client fails to complete/fail a job before the job's timeout (or heartbeat timeout) is exceeded, then Ocypod marks the job as timed_out.

Ocypod will periodically look at all failed and timed out jobs and check if they're eligible for automatic retries, and if so, will re-queue them.
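The lifecycle described above can be summarised as a small transition table. This is only a sketch of the prose in this section, not Ocypod's implementation: the `ALLOWED` table and the `can_transition` helper are hypothetical names. The cancelled status is left out because the text does not say from which states it can be set.

```python
# Hypothetical model of the status transitions described in this section.
ALLOWED = {
    "queued": {"running"},                           # a worker picks the job up
    "running": {"completed", "failed", "timed_out"},
    "failed": {"queued"},                            # re-queued if eligible for retry
    "timed_out": {"queued"},                         # re-queued if eligible for retry
    "completed": set(),                              # final state
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the lifecycle text allows moving from current to new."""
    return new in ALLOWED.get(current, set())
```

For example, `can_transition("running", "timed_out")` is true, while `can_transition("completed", "queued")` is false.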

Job metadata

The Ocypod server maintains the following information about a job, some of which is immutable, some of which will be modified by Ocypod throughout a job's lifecycle, and some of which is modifiable by clients.

id - autogenerated ID for the job, generated when a job is first created and queued
queue - name of the queue the job was created in
status - current status of the job
tags - list of tags (if any) assigned to this job at creation time
created_at - date/time this job was first created and queued
started_at - date/time this job was accepted by a client, and the job's status changed to running
ended_at - date/time this job stopped running, whether due to successful completion, timing out, or failure
last_heartbeat - date/time the last heartbeat for this job was sent by the client executing it
input - the job's payload, sent by the client creating this job - this typically contains the data needed for a worker to execute the job
output - any information the client working on this job decides to store, such as the job's result, progress information, or partial results - it can be set at any time while the job is running
timeout - maximum execution time of the job before it's marked as timed out
heartbeat_timeout - maximum time without receiving a heartbeat before the job is marked as timed out
expires_after - amount of time this job's metadata will persist in Ocypod after the job reaches a final state (i.e. completed or cancelled, or failed/timed_out with no retries remaining)
retries - number of times this job will automatically be requeued on failure
retries_attempted - number of times this job has failed and been requeued
retry_delays - minimum amount of time to wait between each retry attempt
ended - indicates whether the job is in a final state or not (i.e. completed or cancelled, or failed/timed out with no retries remaining)

Job Status

A job in Ocypod will always have one of the following statuses:

queued - set by the server when a job is first created and added to a queue
running - set by the server when a worker picks up a job
completed - set by the client to mark a job as successfully completed
failed - set by the client to mark a job as having failed
timed_out - set by the server when a job exceeds either its timeout or heartbeat_timeout
cancelled - set by the client to mark that a job has been cancelled

To aid clients that are checking on the status of jobs, each job also has an ended boolean field. This is set to true if the job is in its final state, or false otherwise.

A job is marked as ended in the following circumstances:

job has completed status
job has cancelled status
job has failed status and 0 retries remaining
job has timed_out status and 0 retries remaining
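The four rules above can be expressed as a single predicate. This is a minimal sketch, assuming that "retries remaining" equals `retries - retries_attempted`. That formula is an assumption based on the field descriptions, and `is_ended` is a hypothetical helper, not part of Ocypod.

```python
def is_ended(status: str, retries: int, retries_attempted: int) -> bool:
    """Mirror the 'ended' rules listed above for a single job."""
    if status in ("completed", "cancelled"):
        return True
    if status in ("failed", "timed_out"):
        # Assumption: retries remaining = retries - retries_attempted.
        return retries - retries_attempted <= 0
    # queued and running jobs are never in a final state.
    return False
```

A failed job with `retries=3` and `retries_attempted=1` is not ended, since it can still be re-queued; the same job with `retries_attempted=3` is.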

Queue

Each queue in Ocypod has its own settings, which are used as defaults for jobs created on that queue (though they can be overridden on a per-job basis).

A queue in Ocypod is a FIFO, with new jobs being added to the beginning of the queue, and workers taking jobs from the end of the queue.
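To illustrate the ordering, here is a sketch that uses Python's `collections.deque` as a stand-in for a queue. It only models the FIFO behaviour described above and says nothing about how Ocypod stores queues; the job IDs are made up.

```python
from collections import deque

queue = deque()

# New jobs are added to the beginning of the queue.
for job_id in (1, 2, 3):
    queue.appendleft(job_id)

# Workers take jobs from the end, so the oldest job comes out first.
first = queue.pop()
print(first)  # prints 1
```

After the `pop`, jobs 2 and 3 remain, and job 2 will be the next one handed to a worker.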

Queue settings

Each queue has a number of settings, which are defaults that are applied to new jobs created in that queue. Each can be overridden on a per-job basis; they exist at the queue level only for convenience.
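The relationship between queue defaults and per-job overrides can be sketched as a dictionary merge. The setting names come from this page, but the values and the `effective_settings` helper are hypothetical, and this is not a description of how Ocypod resolves settings internally.

```python
# Hypothetical queue-level defaults (values are illustrative only).
queue_settings = {
    "timeout": "10m",
    "heartbeat_timeout": "30s",
    "expires_after": "1d",
    "retries": 2,
    "retry_delays": ["10s", "1m"],
}

# Hypothetical overrides supplied when creating one particular job.
job_overrides = {"timeout": "1h", "retries": 0}


def effective_settings(queue_defaults: dict, overrides: dict) -> dict:
    """Per-job values win; anything not overridden falls back to the queue."""
    return {**queue_defaults, **overrides}


settings = effective_settings(queue_settings, job_overrides)
```

Here the job runs with a one hour timeout and no retries, while still inheriting the queue's `heartbeat_timeout` and `expires_after`.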

`timeout`

This is the maximum amount of time a job can be running for before it's considered to have timed out. It's specified as a human readable duration string, e.g. "30s", "1h15m5s", "3w2d", etc.

To disable timeouts entirely, this can be set to "0s".
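To make the duration format concrete, the following sketch converts such a string to seconds. It is an assumption-laden illustration: it handles only the units that appear in the examples above (w, d, h, m, s), and the full grammar Ocypod accepts may be broader. `parse_duration` is a hypothetical helper.

```python
import re

# Seconds per unit, for the units used in the examples above.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([wdhms])")


def parse_duration(text: str) -> int:
    """Convert a string such as '1h15m5s' to a number of seconds."""
    pos = 0
    total = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            # Unrecognised characters between two tokens.
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        # Empty input, or trailing characters that are not a valid token.
        raise ValueError(f"invalid duration: {text!r}")
    return total


print(parse_duration("30s"))      # prints 30
print(parse_duration("1h15m5s"))  # prints 4505
print(parse_duration("3w2d"))     # prints 1987200
```

Under this sketch, "0s" parses to 0, which is the value that disables the timeout.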

`heartbeat_timeout`

For long-running jobs, it's recommended that workers send regular heartbeats to the Ocypod server to let it know that the job is still being processed. This allows timeouts or failures to be noticed much earlier than when relying on timeout alone.

The heartbeat_timeout setting determines how long a job can be running without getting a heartbeat update before it's considered to have timed out. It's specified as a human readable duration string.

To disable heartbeat timeouts entirely, this can be set to "0s".
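The interaction between timeout and heartbeat_timeout can be sketched as one check over a job's timestamps. This is a model of the rules described on this page, not Ocypod's code. `has_timed_out` is a hypothetical helper, and measuring from `started_at` when no heartbeat has been received yet is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def has_timed_out(
    now: datetime,
    started_at: datetime,
    last_heartbeat: Optional[datetime],
    timeout: timedelta,
    heartbeat_timeout: timedelta,
) -> bool:
    """Return True if a running job exceeds either timeout."""
    # A zero duration ("0s") disables the corresponding check.
    if timeout and now - started_at > timeout:
        return True
    if heartbeat_timeout:
        # Assumption: before the first heartbeat, count from started_at.
        last_seen = last_heartbeat or started_at
        if now - last_seen > heartbeat_timeout:
            return True
    return False
```

With a 10 minute timeout and a 90 second heartbeat_timeout, a job whose last heartbeat arrived 3 minutes ago is considered timed out after only 5 minutes of running, well before the overall timeout would fire.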

`expires_after`

This setting determines how long jobs that have ended (either successfully completed, failed, or timed out without any retries) will remain in the system. After this period of time, the job and its metadata will be cleared from Ocypod.

This is specified as a human readable duration string, and can be set to "0s" to disable expiry entirely. In this case, you'll be responsible for managing and cleaning up old jobs manually.
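The expiry rule, including the special meaning of "0s", can be sketched as follows. `is_expired` is a hypothetical helper that models the description above; whether Ocypod treats the boundary instant as expired is not stated here, so the `>=` comparison is an assumption.

```python
from datetime import datetime, timedelta


def is_expired(now: datetime, ended_at: datetime, expires_after: timedelta) -> bool:
    """Return True if an ended job is due to be cleared."""
    # A zero duration ("0s") disables expiry: the job is kept indefinitely.
    if not expires_after:
        return False
    # Assumption: a job expires once the full period has elapsed.
    return now - ended_at >= expires_after
```

Note that "0s" does not mean "expire immediately": a job with expiry disabled stays in the system until it is removed manually.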

`retries`

This controls the number of times that jobs created in this queue will be automatically retried.

If a job fails or times out and has a number of retries remaining, it will be re-queued.

To disable retries, this can be set to 0.

`retry_delays`

This configures an optional list of delays to apply whenever a job is retried. This allows for different backoff strategies to be configured, depending on the application.

If the number of retries exceeds the number of retry delays specified, then the last value will continue to be used.

E.g. configuring a queue with retries: 4 and retry_delays: ["10s", "1m", "5m"] means that if a job in this queue keeps failing, Ocypod will wait 10 seconds before retrying for the 1st time, 1 minute before retrying a 2nd time, and 5 minutes before retrying for the 3rd and 4th times.

To disable retry delays, this can be omitted, or set to an empty list.
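The rule for picking a delay, including reuse of the last value, can be sketched like this. `delay_for_retry` is a hypothetical helper that follows the description above, with retries numbered from 1; it is not taken from Ocypod's source.

```python
from typing import List, Optional


def delay_for_retry(retry_number: int, retry_delays: List[str]) -> Optional[str]:
    """Return the delay to apply before the given retry (1-based), if any."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if not retry_delays:
        # Omitted or empty list: no delay between retries.
        return None
    # Once the list is exhausted, keep using its last value.
    index = min(retry_number, len(retry_delays)) - 1
    return retry_delays[index]


delays = ["10s", "1m", "5m"]
print([delay_for_retry(n, delays) for n in range(1, 5)])
# prints ['10s', '1m', '5m', '5m']
```

This reproduces the example above: with retries set to 4, the 3rd and 4th retries both wait 5 minutes.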
