Work request scheduling
Lifecycle of a user-submitted work request
When a work request gets submitted by a user, debusine records it with a pending status and with no worker assigned. The list of pending work requests constitutes the queue of work requests that are waiting to be processed.
When the debusine scheduler finds a suitable worker (it must be idle and must match the requirements defined by the planned task), the work request is assigned to the worker and the worker is notified of the availability of a new work request to process. When the worker starts to process the work request, the status is updated to running.
Note
At any point in time, there is at most one (pending or running) work request assigned to a given worker.
When the worker has finished processing the work request, and after having sent back the results and uploaded the generated artifacts, the status is updated to completed.
The aborted status is a special case: it can only be set by the submitter, by an administrator, or by the scheduler when the prerequisites are not met. It is the official way to cancel a work request.
A failed work request can be retried. In that case, a new work request is created that supersedes the old one: its dynamic task data is recomputed with fresh lookups, so that references to artifacts such as build environments are updated, and dependencies that pointed to the previous work request are updated to point to the new one, effectively replacing it. The superseded work request is kept for inspection.
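As a rough illustration (not debusine's actual data model), the statuses described above can be summarised in a small sketch; the status names come from this section, everything else is made up for illustration:

    from enum import Enum

    class WorkRequestStatus(Enum):
        PENDING = "pending"      # submitted, waiting in the queue, no worker assigned
        RUNNING = "running"      # the assigned worker has started processing
        COMPLETED = "completed"  # results sent back and generated artifacts uploaded
        ABORTED = "aborted"      # cancelled by the submitter, an administrator or the scheduler

    # Forward transitions described in this section (illustrative only; a failed
    # work request can also be retried via a new, superseding work request).
    TRANSITIONS = {
        WorkRequestStatus.PENDING: {WorkRequestStatus.RUNNING, WorkRequestStatus.ABORTED},
        WorkRequestStatus.RUNNING: {WorkRequestStatus.COMPLETED, WorkRequestStatus.ABORTED},
    }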
Lifecycle of a work request inside a workflow
When a workflow creates work requests, it will typically create dependencies between them. When a work request has a dependency on a work request that is not yet completed, it is put in the blocked status.
The scheduler will move the work request to the pending status only when all the work requests it depends on have successfully completed. When one of those work requests has failed (and it was not allowed to fail), the blocked work request will be marked as aborted.
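A minimal sketch of that rule, assuming each dependency's outcome is known as "success", "failure" or not finished yet (this is not debusine's code):

    def resolve_blocked_status(dependency_results):
        # dependency_results maps each dependency to "success", "failure" or
        # None (not finished yet); failures of dependencies that were allowed
        # to fail are assumed to have been filtered out already.
        if any(result == "failure" for result in dependency_results.values()):
            return "aborted"   # a required dependency failed
        if any(result is None for result in dependency_results.values()):
            return "blocked"   # still waiting on at least one dependency
        return "pending"       # all dependencies completed successfully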
Priorities
Work requests have a base priority and a priority adjustment. The former is set automatically, and the latter by administrators. The effective priority of a work request is the sum of its base priority and its priority adjustment. The scheduler considers eligible tasks in descending order of effective priority.
The base priority of a work request is normally set by a workflow template or a workflow orchestrator. Failing that, it defaults to the effective priority of the parent work request (computed at creation time). If there is no parent work request, it defaults to 0.
Workflow templates have a priority, which is used as the initial base priority for work requests created from that template. This can be set by administrators, and it is expected that workflows used by automated QA tasks would be given a negative priority.
When workflow orchestrators lay out an execution plan, they may adjust the base priority of each resulting work request relative to the parent work request’s effective priority. For example, an orchestrator planning several different kinds of tasks might choose to give quicker static analysis tasks a slightly higher base priority than slower dynamic analysis tasks.
Separating the priority adjustment from the base priority allows us to tell more easily when effective priorities have been adjusted manually.
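A minimal sketch of how these rules combine, using illustrative names rather than debusine's actual fields:

    def effective_priority(base_priority, priority_adjustment):
        # The effective priority is the sum of the two components.
        return base_priority + priority_adjustment

    def default_base_priority(parent=None, template_priority=None):
        # Work requests created from a workflow template start from its priority.
        if template_priority is not None:
            return template_priority
        # Otherwise inherit the parent's effective priority at creation time.
        if parent is not None:
            return effective_priority(parent["base_priority"],
                                      parent["priority_adjustment"])
        # No parent work request: default to 0.
        return 0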
Users with the db.change_workrequest permission (including superusers) can use debusine manage-work-request --set-priority-adjustment ADJUSTMENT WORK_REQUEST_ID to adjust a work request’s priority.
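For example, to give a hypothetical work request 1234 a priority adjustment of 100 (both values are made up for illustration):

    debusine manage-work-request --set-priority-adjustment 100 1234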
Ordering of work requests
The scheduler handles the queue of pending work requests in descending order of effective priority, breaking ties by chronological order of creation (first in, first out).
Note
This doesn’t mean that all work requests will be processed in that priority order because they can have different requirements for workers. If work request N has no suitable worker available, but N+1 has one worker available, then N+1 will start before N.
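In other words, the queue ordering described above can be thought of as the following sort key (a sketch, not debusine's actual database query):

    def queue_order(pending_work_requests):
        # Highest effective priority first; ties broken by creation time (FIFO).
        return sorted(
            pending_work_requests,
            key=lambda wr: (
                -(wr["base_priority"] + wr["priority_adjustment"]),
                wr["created_at"],
            ),
        )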
Matching of workers and work requests
Every time that a worker completes a work request, the scheduler kicks in and tries to find a suitable next work request for that worker.
The scheduler builds a dictionary-based description of that worker by combining static metadata (set by the administrators) and dynamic metadata (returned by the worker itself). The key/value pairs from the static metadata take precedence over those provided by the dynamic metadata.
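Conceptually the merge gives static metadata the last word; a minimal sketch, assuming both kinds of metadata are plain dictionaries:

    def worker_metadata(static_metadata, dynamic_metadata):
        # Later entries win in a dict literal, so static metadata overrides
        # any key that the worker also reported dynamically.
        return {**dynamic_metadata, **static_metadata}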
A first filter is made by:

- excluding work requests whose task_name is listed in the tasks_denylist metadata
- selecting work requests whose task_name is listed in the tasks_allowlist metadata
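A sketch of this first filter, assuming the two metadata keys are plain lists of task names and that an absent tasks_allowlist means no restriction:

    def first_filter(work_requests, metadata):
        denylist = metadata.get("tasks_denylist", [])
        allowlist = metadata.get("tasks_allowlist")  # assumed: None means no restriction
        selected = []
        for work_request in work_requests:
            if work_request["task_name"] in denylist:
                continue  # explicitly denied for this worker
            if allowlist is not None and work_request["task_name"] not in allowlist:
                continue  # an allowlist is set and this task is not on it
            selected.append(work_request)
        return selected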
Then the scheduler applies a second, work-request-specific filter. For each work request, the scheduler builds the Task object out of the work request and runs task.can_run_on(worker_metadata) to verify whether the work request can run on that worker.
If any work requests remain, they are deemed suitable for that worker, and the first of them in queue order (see Ordering of work requests above) is assigned to that worker.
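Putting both filters together, the matching could be sketched as follows; worker_metadata, first_filter and queue_order are the sketches from earlier on this page, and build_task is a hypothetical helper standing in for however the Task object is constructed:

    def pick_work_request_for(worker, pending_work_requests):
        metadata = worker_metadata(worker["static_metadata"],
                                   worker["dynamic_metadata"])
        candidates = first_filter(pending_work_requests, metadata)
        # Second, work-request specific filter: build the Task and ask it.
        candidates = [wr for wr in candidates
                      if build_task(wr).can_run_on(metadata)]
        if not candidates:
            return None  # nothing suitable for this worker right now
        # Assign the first remaining candidate in queue order (an assumption
        # based on the "Ordering of work requests" section above).
        return queue_order(candidates)[0]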
Management of architecture-specific tasks
Many work requests have to run on worker of a specific architecture (or on
a worker that is compatible with that architecture). This selection is done
as part of the task.can_run_on
method and relies on the
system:architectures
metadata key.
That key is set by default to a list with a single item containing the host architecture (as returned by dpkg --print-architecture).
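For illustration only, the architecture check inside a task's can_run_on method could conceptually look like this (host_architecture is a hypothetical task field naming the required architecture):

    def architecture_matches(task_data, metadata):
        # The worker advertises the architectures it can serve; by default this
        # is a one-item list with the host architecture (dpkg --print-architecture).
        supported = metadata.get("system:architectures", [])
        wanted = task_data.get("host_architecture")  # hypothetical field
        # Tasks that do not name an architecture are assumed to run anywhere.
        return wanted is None or wanted in supported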
See Configure the list of compatible architectures for detailed instructions on how to configure the appropriate metadata.