Workflow orchestration
Each workflow is controlled by a subclass of Workflow, which runs on the server with full database access. It is in charge of “orchestrating” the workflow by doing the following:
- laying out an initial execution plan, as a directed acyclic graph of new WorkRequest instances (via its populate method)
- optionally, being called back later to modify its graph after analysing the current state of its work requests (via its callback method)
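The split between an initial plan and later revisions can be sketched with a small in-memory model. Everything here is hypothetical: the Workflow and WorkRequest classes below, the add_child helper and the BuildAndTestWorkflow example are illustrative stand-ins, not the real server-side classes (which store work requests in the database), and the task names are only examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    """Hypothetical stand-in for one node in the work request graph."""
    task_name: str
    dependencies: list[WorkRequest] = field(default_factory=list)


class Workflow:
    """Hypothetical base class: subclasses lay out, then optionally revise, a DAG."""

    def __init__(self) -> None:
        self.children: list[WorkRequest] = []

    def add_child(self, task_name: str, dependencies=()) -> WorkRequest:
        # Dependencies must already exist, so the graph stays acyclic.
        child = WorkRequest(task_name, list(dependencies))
        self.children.append(child)
        return child

    def populate(self) -> None:
        """Lay out the initial execution plan."""
        raise NotImplementedError

    def callback(self) -> None:
        """Optionally revise the graph after inspecting child states."""


class BuildAndTestWorkflow(Workflow):
    """Hypothetical example: build a package, then test the build output."""

    def populate(self) -> None:
        build = self.add_child("sbuild")
        self.add_child("autopkgtest", dependencies=[build])


wf = BuildAndTestWorkflow()
wf.populate()
print([c.task_name for c in wf.children])  # ['sbuild', 'autopkgtest']
```

Because populate only receives already-created nodes as dependencies, it can never introduce a cycle in this toy model.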
Workflows are themselves special kinds of work requests (with task_type set to "workflow"), and may contain sub-workflows which are in charge of parts of the graph. The initial workflow is referred to as the “root workflow”.
The root workflow has a special collection associated with it, referred to as the “internal collection”, and shared among all its sub-workflows. This is used to coordinate sub-workflows, allowing work requests to declare that they will provide certain kinds of artifacts which may then be required by work requests in other sub-workflows.
Dependencies
Under a root workflow, work requests may depend on each other, including work requests from different sub-workflows; but they may not depend on work requests outside their root workflow.
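The rule above amounts to comparing root workflows: walk each work request's parent chain to the top and allow the dependency only if both walks end at the same root. This is a minimal sketch under assumed names; the WorkRequest model, get_root and can_depend_on are hypothetical, and "worker" is just a placeholder task type for non-workflow nodes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class WorkRequest:
    """Hypothetical minimal model: a node with an optional parent workflow."""
    name: str
    task_type: str = "worker"  # placeholder for non-workflow work requests
    parent: Optional[WorkRequest] = None

    def get_root(self) -> WorkRequest:
        # Walk up the parent chain; a work request with no parent is a root.
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def can_depend_on(dependent: WorkRequest, dependency: WorkRequest) -> bool:
    """Allow a dependency only when both work requests share a root workflow."""
    return dependent.get_root() is dependency.get_root()


root = WorkRequest("root", task_type="workflow")
sub_a = WorkRequest("sub-a", task_type="workflow", parent=root)
sub_b = WorkRequest("sub-b", task_type="workflow", parent=root)
build = WorkRequest("build", parent=sub_a)
test = WorkRequest("test", parent=sub_b)
other_root = WorkRequest("other", task_type="workflow")
stranger = WorkRequest("stranger", parent=other_root)

print(can_depend_on(test, build))     # True: same root, different sub-workflows
print(can_depend_on(test, stranger))  # False: different root workflows
```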
Child work requests are normally created using the work_request_ensure_child method, which helps to ensure that workflow population is idempotent. It creates work requests in the blocked status using the deps unblock strategy, meaning that they will become pending once all their dependencies have completed.
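The two properties described above, idempotent creation and the blocked-to-pending transition, can be modelled in a few lines. This is a simplified, hypothetical sketch: ensure_child stands in for work_request_ensure_child, and matching an existing child on its task name and task data is an assumption made for illustration, not a description of the real method's lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class WorkRequest:
    """Hypothetical minimal model of a child work request."""
    task_name: str
    task_data: dict
    status: str = "blocked"
    unblock_strategy: str = "deps"
    dependencies: list[WorkRequest] = field(default_factory=list)

    def maybe_unblock(self) -> None:
        # With the "deps" strategy, become pending once every
        # dependency has completed.
        if (
            self.status == "blocked"
            and self.unblock_strategy == "deps"
            and all(d.status == "completed" for d in self.dependencies)
        ):
            self.status = "pending"


class Workflow:
    def __init__(self) -> None:
        self.children: list[WorkRequest] = []

    def ensure_child(self, task_name: str, task_data: dict) -> WorkRequest:
        """Return a matching existing child, or create a new blocked one.

        Hypothetical: matching on (task_name, task_data) is what makes
        running the population step twice harmless in this sketch.
        """
        for child in self.children:
            if child.task_name == task_name and child.task_data == task_data:
                return child
        child = WorkRequest(task_name, task_data)
        self.children.append(child)
        return child


wf = Workflow()
build = wf.ensure_child("sbuild", {"arch": "amd64"})
test = wf.ensure_child("autopkgtest", {"arch": "amd64"})
test.dependencies.append(build)

# Populating a second time returns the existing child instead of a duplicate.
assert wf.ensure_child("sbuild", {"arch": "amd64"}) is build
print(len(wf.children))  # 2

build.status = "completed"
test.maybe_unblock()
print(test.status)  # pending
```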
Sub-workflows
Advanced workflows can be created by combining multiple limited-purpose workflows: for example, the reverse_dependencies_autopkgtest workflow figures out which source packages need to have tests run for them, but creates instances of the autopkgtest workflow to schedule tests for each individual source package.
Sub-workflows are integrated in the general graph of their parent workflow as work requests of type workflow. From a user interface perspective, they are typically hidden as a single step in the visual representation of the parent workflow.
Cooperation between workflows is defined at the level of workflows. Individual work requests should not concern themselves with this; they are designed to take inputs using lookups and produce output artifacts that are linked to the work request.
On the providing side, workflows use the update-collection-with-artifacts event reaction to add relevant output artifacts from work requests to the internal collection, and create promises to indicate to other workflows that they have done so. Providing workflows choose item names in the internal collection; it is the responsibility of workflow designers to ensure that they do not clash, and workflows that provide output artifacts have an optional prefix field in their task data to allow multiple instances of the same workflow to cooperate under the same root workflow. The provides_artifact method helps with this.
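The naming side of providing artifacts can be sketched with a dictionary standing in for the internal collection. The InternalCollection and Promise classes and the provides_artifact function below are hypothetical simplifications (they are not the real method's signature), the "|" separator after each prefix is an arbitrary choice, and the artifact category string is only illustrative. The event reaction that later adds the real artifact is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Promise:
    """Hypothetical placeholder recording which work request will provide an item."""
    work_request_id: int
    category: str


class InternalCollection:
    """Hypothetical stand-in for the root workflow's internal collection."""

    def __init__(self) -> None:
        self.items: dict[str, object] = {}

    def add(self, name: str, item: object) -> None:
        # Names must be unique across all cooperating workflows.
        if name in self.items:
            raise ValueError(f"item name clash: {name!r}")
        self.items[name] = item


def provides_artifact(
    collection: InternalCollection,
    work_request_id: int,
    category: str,
    name: str,
    prefix: str = "",
) -> str:
    """Record a promise under the (optionally prefixed) item name."""
    full_name = f"{prefix}{name}"
    collection.add(full_name, Promise(work_request_id, category))
    return full_name


internal = InternalCollection()
# Two instances of the same workflow avoid a clash by using different prefixes.
provides_artifact(internal, 101, "debian:binary-packages", "build-amd64", prefix="main|")
provides_artifact(internal, 202, "debian:binary-packages", "build-amd64", prefix="backport|")
print(sorted(internal.items))  # ['backport|build-amd64', 'main|build-amd64']
```

Without the prefix, the second call would raise ValueError, which is the clash the prefix field exists to prevent.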
On the requiring side, workflows look up the names of artifacts they require in the internal collection; each of those lookups may return nothing, a promise including a work request ID, or an artifact that already exists, and they may use that to determine which child work requests they create. They use lookups in their child work requests to refer to items in the internal collection (e.g. internal@collections/name:build-amd64), and add corresponding dependencies on work requests that promise to provide those items. The requires_artifact method helps with this.
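The three possible lookup outcomes translate naturally into a small decision function. In this hypothetical sketch, plan_test, ChildWorkRequest, Promise and Artifact are illustrative names, not the real requires_artifact API: a missing item means no child is created, a promise adds a dependency on the promising work request, and an existing artifact needs no dependency. In every case the child refers to the item through an internal-collection lookup string.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Promise:
    """Hypothetical: an item that work request `work_request_id` will provide later."""
    work_request_id: int


@dataclass
class Artifact:
    """Hypothetical: an artifact that already exists in the internal collection."""
    artifact_id: int


@dataclass
class ChildWorkRequest:
    """Hypothetical child work request with an input lookup and dependency IDs."""
    task_name: str
    input_lookup: str
    dependency_ids: set[int] = field(default_factory=set)


def plan_test(
    internal: dict[str, Union[Promise, Artifact]], arch: str
) -> Optional[ChildWorkRequest]:
    """Decide whether to create a test child for `arch`, and what it waits on."""
    name = f"build-{arch}"
    item = internal.get(name)
    if item is None:
        # Nothing provides this item: do not create a child for it.
        return None
    # The child refers to the item by lookup rather than by a fixed artifact ID.
    child = ChildWorkRequest("autopkgtest", f"internal@collections/name:{name}")
    if isinstance(item, Promise):
        # Not produced yet: depend on the work request that promised it.
        child.dependency_ids.add(item.work_request_id)
    return child


internal = {
    "build-amd64": Promise(work_request_id=101),
    "build-i386": Artifact(artifact_id=7),
}
print(plan_test(internal, "amd64").dependency_ids)  # {101}
print(plan_test(internal, "i386").dependency_ids)   # set()
print(plan_test(internal, "arm64"))                 # None
```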
Sub-workflows may depend on other steps within the root workflow while still being fully populated in advance of being able to run. A workflow that needs more information before being able to populate child work requests should normally depend on the work requests that will provide the information it needs; failing that, it should use workflow callbacks to run the workflow orchestrator again when it is ready. (For example, a workflow that creates a source package and then builds it may not know which work requests it needs to create until it has created the source package and can look at its Architecture field.)
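The source-package example can be sketched as a workflow whose populate step only schedules the source package creation, while a callback adds builds once the Architecture field is known. All names here (MakeAndBuildWorkflow, SourcePackage, the "sbuild-<arch>" child labels) are hypothetical, and the architecture matching is deliberately simplified: "any" matches every supported architecture and "all" adds one architecture-independent build, but Debian wildcards such as linux-any are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourcePackage:
    """Hypothetical: the parts of a created source package this workflow needs."""
    name: str
    architecture: str  # e.g. "any all", as in a .dsc Architecture field


@dataclass
class MakeAndBuildWorkflow:
    """Hypothetical workflow: create a source package, then build it."""
    supported_arches: list[str]
    source: Optional[SourcePackage] = None
    children: list[str] = field(default_factory=list)

    def populate(self) -> None:
        # Only the source package step is known in advance.
        self.children.append("make-source-package")

    def callback(self) -> None:
        if self.source is None:
            return  # Not ready yet; the orchestrator will be called again.
        declared = self.source.architecture.split()
        for arch in self.supported_arches:
            # "any" matches every architecture; otherwise require an exact name.
            if "any" in declared or arch in declared:
                name = f"sbuild-{arch}"
                if name not in self.children:  # keep the callback idempotent
                    self.children.append(name)
        if "all" in declared and "sbuild-all" not in self.children:
            self.children.append("sbuild-all")


wf = MakeAndBuildWorkflow(supported_arches=["amd64", "arm64", "i386"])
wf.populate()
wf.callback()  # source package not created yet: nothing is added
wf.source = SourcePackage("hello", "amd64 all")
wf.callback()
print(wf.children)  # ['make-source-package', 'sbuild-amd64', 'sbuild-all']
```

Making the callback idempotent matters because it may run more than once as the state of the work requests changes.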
Workflows themselves should not normally have dependencies, since that means that their orchestrators cannot run and populate the work request graph in advance. The exception is where the workflow orchestrator itself needs some information from some of its input artifacts in order to work out which child work requests to create; in such cases the workflow itself should have dependencies that mean the orchestrator does not run until that information is available. Otherwise, it is better for workflows to create child work requests that have whatever dependencies are needed.