Workflow orchestration

Each workflow is controlled by a subclass of Workflow, which runs on the server with full database access. It is in charge of “orchestrating” the workflow by doing the following:

  • laying out an initial execution plan, as a directed acyclic graph of new WorkRequest instances (via its populate method)

  • optionally, being called back later to modify its graph after analysing the current state of its work requests (via its callback method)

Workflows are themselves special kinds of work requests (with task_type set to "workflow"), and may contain sub-workflows which are in charge of parts of the graph. The initial workflow is referred to as the “root workflow”.

The root workflow has a special collection associated with it, referred to as the “internal collection”, and shared among all its sub-workflows. This is used to coordinate sub-workflows, allowing work requests to declare that they will provide certain kinds of artifacts which may then be required by work requests in other sub-workflows.

Dependencies

Under a root workflow, work requests may depend on each other, including work requests from different sub-workflows; but they may not depend on work requests outside their root workflow.

Child work requests are normally created using the work_request_ensure_child method, which helps to ensure that workflow population is idempotent. It creates work requests in the blocked status using the deps unblock strategy, meaning that they will become pending once all their dependencies have completed.

Sub-workflows

Advanced workflows can be created by combining multiple limited-purpose workflows: for example, the reverse_dependencies_autopkgtest workflow figures out which source packages need to have tests run for them, but creates instances of the autopkgtest workflow to schedule tests for each individual source package.

Sub-workflows are integrated in the general graph of their parent workflow as work requests of type workflow. From a user interface perspective, they are typically hidden as a single step in the visual representation of the parent’s workflow.

Cooperation between workflows is defined at the level of workflows. Individual work requests should not concern themselves with this; they are designed to take inputs using lookups and produce output artifacts that are linked to the work request.

On the providing side, workflows use the update-collection-with-artifacts event reaction to add relevant output artifacts from work requests to the internal collection, and create promises to indicate to other workflows that they have done so. Providing workflows choose item names in the internal collection; it is the responsibility of workflow designers to ensure that they do not clash, and workflows that provide output artifacts have a optional prefix field in their task data to allow multiple instances of the same workflow to cooperate under the same root workflow. The provides_artifact method helps with this.

On the requiring side, workflows look up the names of artifacts they require in the internal collection; each of those lookups may return nothing, or a promise including a work request ID, or an artifact that already exists, and they may use that to determine which child work requests they create. They use lookups in their child work requests to refer to items in the internal collection (e.g. internal@collections/name:build-amd64), and add corresponding dependencies on work requests that promise to provide those items. The requires_artifact method helps with this.

Sub-workflows may depend on other steps within the root workflow while still being fully populated in advance of being able to run. A workflow that needs more information before being able to populate child work requests should normally depend on the work requests that will provide the information it needs; failing that, it should use workflow callbacks to run the workflow orchestrator again when it is ready. (For example, a workflow that creates a source package and then builds it may not know which work requests it needs to create until it has created the source package and can look at its Architecture field.)

Workflows themselves should not normally have dependencies, since that means that their orchestrators cannot run and populate the work request graph in advance. The exception is where the workflow orchestrator itself needs some information from some of its input artifacts in order to work out which child work requests to create; in such cases the workflow itself should have dependencies that mean the orchestrator does not run until that information is available. Otherwise, it is better for workflows to create child work requests that have whatever dependencies are needed.