Task configuration

See Task configuration for an introduction to the concept and Manage task configuration for a practical guide on how to create and maintain task configuration.

Task configuration entries

Tweaks and exceptions from normal task processing are recorded as debusine:task-configuration entries in a debusine:task-configuration collection, and then later used to feed the relevant workflows and work requests.

Before a task becomes pending, relevant entries (if any) are looked up, merged in a well-defined order, and finally applied.

Entries can target any type of task, but in practice Worker and Workflow tasks are the most likely targets for configuration.

The result of applying a task configuration entry can be:

Key/value pairs added or replaced in the task_data field of work requests.
Extra task-provided scheduler tags added to the task’s computed set.
Extra task-required scheduler tags added to the task’s computed set.

When task configuration is applied

To apply changes to the submitted task_data, we need to know the subject and the context, which may depend on information not available when the task is created. For example, the subject may be derived from an artifact that is the output of a previous work request in a workflow.

Task configuration can thus be applied only when a task becomes pending, and subject and context are generated at that time using the task’s debusine.db.tasks.DBTask.get_task_configuration_subject_context() method.

This also ensures that at the time a task is pending, the final set of scheduler tags has been computed, including those contributed by task configuration entries, so that the scheduler has all the information needed to look for a suitable worker.

How task configuration is applied

Looking up task configuration entries for a task

When looking up task configuration entries for a task, these task attributes are used for matching:

Task type
Task name
Task subject, that is, a string describing what is being processed by the task
Task configuration context, that is, a string used to apply consistent configuration across all work requests that share the same value
Scheduler tags provided by the task
Scheduler tags required by the task

Those values are used to look up the various snippets of configuration, which are retrieved and processed in a well-defined order (see Ordering matching entries).

The collection can host partial or full configuration data, but it is expected to be mainly useful for storing overrides, i.e. variations compared to the defaults provided by the task or its containing workflow.

For example, for the debian-pipeline workflow, subject would typically be the source package name while context would be the name of the target suite.
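As a rough illustration of the lookup, the sketch below filters entries by task type, task name, subject and context. The `Entry` dataclass, the `matches` helper, and the rule that a field left unset on an entry matches any value are assumptions made for this example, not debusine's implementation; matching on scheduler tags is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for a task-configuration entry."""

    task_type: str
    task_name: str
    subject: Optional[str] = None
    context: Optional[str] = None


def matches(
    entry: Entry,
    task_type: str,
    task_name: str,
    subject: Optional[str],
    context: Optional[str],
) -> bool:
    """Return True if the entry applies to the given task attributes.

    Assumption: a field left unset on the entry matches any value.
    """
    if (entry.task_type, entry.task_name) != (task_type, task_name):
        return False
    if entry.subject is not None and entry.subject != subject:
        return False
    if entry.context is not None and entry.context != context:
        return False
    return True


entries = [
    Entry("Worker", "sbuild", context="stretch"),
    Entry("Worker", "sbuild", subject="hello"),
    Entry("Worker", "sbuild", subject="hello", context="bookworm"),
    Entry("Workflow", "debian-pipeline"),
]

# Look up entries for an sbuild task building "hello" for "stretch".
selected = [
    e for e in entries if matches(e, "Worker", "sbuild", "hello", "stretch")
]
print(len(selected))  # 2
```

Only the first two entries are selected: the third targets a different context and the fourth a different task.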

Ordering matching entries

The final configured task data is generated by merging multiple snippets of configuration, each stored in its own debusine:task-configuration entry. The ordering is well-defined and intended to allow maintaining entries that apply at different levels of granularity.

The entries to be applied to a task are sorted according to:

The value of the path and position fields, if present
Subject and context, in this order:
1. entries with context and no subject
2. entries with subject and no context
3. entries with both subject and context
The database ID of the entry, as a tie-breaker to make ordering deterministic

If an entry uses templates in use_templates, the referenced template entries are placed immediately after the entry in the resulting ordered list.
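The sort criteria above can be expressed as a single sort key. This is a minimal sketch under stated assumptions: the `Entry` dataclass is a hypothetical stand-in, missing `path` and `position` values are assumed to sort first, and entries with neither subject nor context are assumed to come before all others. Template expansion is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for a task-configuration entry."""

    id: int
    subject: Optional[str] = None
    context: Optional[str] = None
    path: str = ""     # assumption: a missing path sorts first
    position: int = 0  # assumption: a missing position sorts as 0


def specificity(entry: Entry) -> int:
    """Rank entries from most generic to most specific."""
    has_subject = entry.subject is not None
    has_context = entry.context is not None
    if has_subject and has_context:
        return 3
    if has_subject:
        return 2
    if has_context:
        return 1
    return 0  # assumption: entries with neither field come first


def sort_key(entry: Entry) -> Tuple[str, int, int, int]:
    """Path and position first, then specificity, then database ID."""
    return (entry.path, entry.position, specificity(entry), entry.id)


entries = [
    Entry(id=4, subject="hello", context="stretch"),
    Entry(id=3, subject="hello"),
    Entry(id=2, context="stretch"),
    Entry(id=1, subject="hello"),
]
ordered_ids = [e.id for e in sorted(entries, key=sort_key)]
print(ordered_ids)  # [2, 1, 3, 4]
```

The two subject-only entries (IDs 1 and 3) tie on every other criterion, so the database ID decides their relative order.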

About templates

Template entries follow the same structure as other entries, except that the task matching parts are replaced by a template name.

They are meant to share common configuration across multiple similar packages, and are only used indirectly: when a normal configuration entry, or another template, refers to them as part of its use_templates field.
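To illustrate indirect use, the sketch below expands an entry into the entry itself followed by the templates it references, including templates referenced by other templates. Entries are modelled as plain dictionaries; the `template` key, the template names, and the loop guard are assumptions made for this example rather than debusine's actual data model.

```python
from typing import Dict, List


def expand_templates(
    entry: dict,
    templates: Dict[str, dict],
    seen: frozenset = frozenset(),
) -> List[dict]:
    """Return the entry followed by the templates it references, recursively.

    Assumption: a template that is already being expanded is skipped,
    so that circular references cannot loop forever.
    """
    result = [entry]
    for name in entry.get("use_templates", []):
        if name in seen:
            continue
        result.extend(
            expand_templates(templates[name], templates, seen | {name})
        )
    return result


# Hypothetical templates: "rust-common" itself refers to "slow-build".
templates = {
    "rust-common": {"template": "rust-common", "use_templates": ["slow-build"]},
    "slow-build": {"template": "slow-build"},
}
entry = {
    "task_name": "sbuild",
    "subject": "rust-foo",
    "use_templates": ["rust-common"],
}

expanded = expand_templates(entry, templates)
names = [e.get("template", e.get("task_name")) for e in expanded]
print(names)  # ['sbuild', 'rust-common', 'slow-build']
```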

Building the set of changes to apply

Once the entries are sorted, they are processed to build a set of default values and override values, with the following operations:

def get_merged_task_configuration(all_items):
    default_values = dict()
    override_values = dict()
    locked_values = set()
    provide_tags = set()
    require_tags = set()

    for config_item in all_items:
        # Drop all the entries referenced in `delete_values` (except
        # locked values); keys that were never set are ignored
        for key in config_item.delete_values:
            if key in locked_values:
                continue
            default_values.pop(key, None)
            override_values.pop(key, None)

        # Merge the default/override values in the response
        # (except locked values)
        for key, value in config_item.default_values.items():
            if key in locked_values:
                continue
            default_values[key] = value
        for key, value in config_item.override_values.items():
            if key in locked_values:
                continue
            override_values[key] = value

        # Update the set of locked values
        locked_values.update(config_item.lock_values)

        # Update the sets of provided and required tags
        provide_tags.update(config_item.provide_tags)
        require_tags.update(config_item.require_tags)

    return (default_values, override_values, provide_tags, require_tags)
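The effect of locking and deleting is easiest to see on a small example. The following is a simplified, runnable variant of the pseudocode above, with tags omitted; the `ConfigItem` dataclass and the `build_arch`, `timeout` and `unshare` names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """Hypothetical stand-in for an entry, with only the merge fields."""

    default_values: dict = field(default_factory=dict)
    override_values: dict = field(default_factory=dict)
    delete_values: list = field(default_factory=list)
    lock_values: list = field(default_factory=list)


def merge(all_items):
    """Merge entries in order; locked keys ignore later changes."""
    default_values, override_values, locked_values = {}, {}, set()
    for item in all_items:
        for key in item.delete_values:
            if key not in locked_values:
                default_values.pop(key, None)
                override_values.pop(key, None)
        for key, value in item.default_values.items():
            if key not in locked_values:
                default_values[key] = value
        for key, value in item.override_values.items():
            if key not in locked_values:
                override_values[key] = value
        # Locks take effect for the entries that follow this one.
        locked_values.update(item.lock_values)
    return default_values, override_values


items = [
    # Generic entry: sets and locks the backend, suggests an architecture.
    ConfigItem(
        default_values={"build_arch": "amd64"},
        override_values={"backend": "incus-lxc"},
        lock_values=["backend"],
    ),
    # More specific entry: its change to the locked "backend" key is
    # ignored, while the delete and the new override are honoured.
    ConfigItem(
        override_values={"backend": "unshare", "timeout": 3600},
        delete_values=["build_arch"],
    ),
]

defaults, overrides = merge(items)
print(defaults)   # {}
print(overrides)  # {'backend': 'incus-lxc', 'timeout': 3600}
```

Because an entry's lock_values are recorded only after its own values are merged, an entry can set a key and lock it in the same step.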

Applying changes

Once we have a set of default_values, override_values, and tags to provide/require, they get applied to the data available in task_data:

new_task_data = task_data.copy()
(
    default_values,
    override_values,
    provide_tags,
    require_tags,
) = get_merged_task_configuration(all_items)

# Apply default values (add missing values, but also replace explicit
# None values)
for k, v in default_values.items():
    if new_task_data.get(k) is None:
        new_task_data[k] = v

# Apply overrides
new_task_data.update(override_values)

The result is stored in WorkRequest.configured_task_data, which will be used from that point on as the task’s data, while WorkRequest.task_data remains untouched as documentation for the initial task input.
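The difference between default and override values can be checked with a short, runnable sketch. The `apply_configuration` helper and the sample keys and values are hypothetical; only the logic mirrors the steps above.

```python
def apply_configuration(task_data, default_values, override_values):
    """Return configured task data without modifying the input."""
    configured = dict(task_data)
    # Defaults fill in missing keys and replace explicit None values.
    for key, value in default_values.items():
        if configured.get(key) is None:
            configured[key] = value
    # Overrides always win over submitted values.
    configured.update(override_values)
    return configured


task_data = {"backend": "unshare", "build_arch": None, "source": "hello"}
configured = apply_configuration(
    task_data,
    default_values={"build_arch": "amd64", "source": "ignored"},
    override_values={"backend": "incus-lxc"},
)
print(configured)
# {'backend': 'incus-lxc', 'build_arch': 'amd64', 'source': 'hello'}
print(task_data["backend"])  # unshare
```

The submitted `source` value is kept because a default never replaces a value that is already set, and the original dictionary is left unchanged.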

Moreover, provided and required tags from task configuration entries are merged with the set of computed scheduler tags, using the WORKSPACE provenance (see Tag provenance):

# Add provided/required tags
tags_provided.add(
    ttags.ProvenanceProvided.WORKSPACE, provide_tags
)
tags_required.add(
    wtags.ProvenanceRequired.WORKSPACE, require_tags
)

Reducing workflow complexity

Having the ability to store overrides at the worker task level saves us from adding too many configuration parameters on the workflows, so that the only required parameters are those that are important to control the orchestration step.

For example, we can have configuration for the sbuild worker task next to the configuration for the debian-pipeline workflow:

- task_type: Workflow
  task_name: debian-pipeline
  default_values:
    ...

- task_type: Worker
  task_name: sbuild
  context: stretch
  override_values:
    backend: incus-lxc

This shows how the sbuild_backend parameter may no longer be needed as an input to the debian-pipeline workflow, even though it is still available.

Task configuration examples

See Example task configuration files for some realistic examples.