Task configuration
When you maintain a complete distribution like Debian or one of its derivatives, you have to deal with special cases and exceptions, for example:

- disable the build/autopkgtest/etc. of a package on a specific architecture because it kills the workers
- restrict the build/autopkgtest/etc. of a package to specific workers where the build is known to succeed
- etc.
As a derivative, you might want to make opinionated choices and change some of the build parameters by using a specific build profile on some packages.
Those tweaks and exceptions need to be recorded and managed somewhere, and then later used to feed the relevant workflows and work requests. We plan to store this information in new collections and improve workflows/work request creation to be able to use those collections to apply the desired overrides.
Collection debusine:task-configuration
This collection is meant to store configuration data for tasks. By configuration data, we mean “key/value pairs” that are going to be fed into the task_data field of work requests. By tasks, we mean any type of task but, for all practical purposes, Worker and Workflow tasks are the most likely targets here.
The actual configuration data is to be generated by merging multiple snippets of configuration (stored in multiple collection items), each applying at different levels of granularity.
To provide fine-grained control of the configuration, we consider that a subject is being processed by a task and that the task can have a configuration context. The configuration context is typically another parameter of the task that can usefully be leveraged to apply some consistent configuration across all work requests sharing the same configuration context.
Todo

Decide whether we want to support multiple configuration contexts. A task could return multiple contexts; typically the sbuild task could usefully have parameters related to the host_architecture and to the target suite, on top of parameters related to the source package. It could be achieved with context values like suite=jessie and architecture=arm64.
Those two values are used to look up the various snippets of configuration. The snippets are retrieved and processed in the following order:

- global (subject=None, context=None)
- context level (subject=None, context != None)
- subject level (subject != None, context=None)
- specific-combination level (subject != None, context != None)
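The four levels above can be written down as a tiny helper returning the (subject, context) pairs to process, in order. This is a minimal sketch for illustration only; lookup_levels is a made-up name, not part of debusine's API.

```python
def lookup_levels(subject, context):
    """Return the (subject, context) pairs to look up, from the most
    generic level to the most specific one, so that later levels can
    refine the configuration built by the earlier ones."""
    return [
        (None, None),        # global
        (None, context),     # context level
        (subject, None),     # subject level
        (subject, context),  # specific-combination level
    ]
```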
The collection can host partial or full configuration data, but it is expected to be mainly useful to store overrides, i.e. variations compared to the defaults provided by the task or its containing workflow.
To correctly represent all those possibilities, the bare data item always has the following fields:

- task_type (required): the task_type of the task for which we want to provide default values and overrides
- task_name (required): the task_name of the task for which we want to provide default values and overrides
- subject (defaults to None): an abstract string value representing the subject of the task (i.e. something passed as input). It is meant to group possible inputs for the tasks into groups that we expect to configure similarly.
- context (defaults to None): an abstract string value representing the configuration context of the task. It is typically another important task parameter (or derived from it).
- template (defaults to None): the name of a template entry
For example, for the debian-pipeline workflow, subject would typically be the source package name while context would be the name of the target suite.

The name of each item is TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT, except for a template item where it is template:TEMPLATE.
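For illustration, the naming scheme can be sketched as a small helper. item_name is a hypothetical function, not part of debusine's API; it assumes a None subject or context is rendered as an empty field.

```python
def item_name(task_type, task_name, subject=None, context=None):
    """Build a collection item name following the
    TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT scheme, leaving the field
    empty when subject or context is None."""
    return ":".join([task_type, task_name, subject or "", context or ""])
```

For instance, an sbuild override specific to the jessie context would be stored under Worker:sbuild::jessie, while a template entry named uefi-sign would simply be named template:uefi-sign.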
Other collection-specific characteristics:

- Data: none
- Valid items: debusine:task-configuration bare data
- Lookup names: none, the name-based lookup is sufficient
- Constraints: when template is set, task_type, task_name, subject and context should be None
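The constraints can be sketched as a small validation function over dict-based items. This is illustrative only, under the assumption that items are plain dicts; the real validation is up to the debusine implementation.

```python
def validate_item(item: dict) -> None:
    """Check the documented constraints on a bare data item:
    template entries must not carry classification fields, while
    regular entries require task_type and task_name."""
    if item.get("template") is not None:
        for field in ("task_type", "task_name", "subject", "context"):
            if item.get(field) is not None:
                raise ValueError(f"{field} must be None when template is set")
    else:
        for field in ("task_type", "task_name"):
            if item.get(field) is None:
                raise ValueError(f"{field} is required")
```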
Bare data debusine:task-configuration
On top of the mandatory classification fields documented above, the following fields are supported and correspond to successive steps performed to build the configuration data:

- use_templates (list): a list of template names whose corresponding entries shall be retrieved and imported as part of the configuration returned for the current entry
- delete_values (list): a list of configuration keys to delete from the values returned by the previous configuration levels
- default_values (dict): values to use as defaults if the user did not provide any value for the given configuration keys
- override_values (dict): values to use even if the user did provide a value for the given configuration keys
- lock_values (list): a list of configuration keys that should be locked (i.e. the next configuration levels can no longer provide or modify the corresponding values)
- comment (string): multiline free-form text used to document the reasons behind the provided configuration. The text can use Markdown syntax.
This mechanism only allows controlling top-level configuration keys in task_data fields. It is not possible to override a single value in a nested dictionary, but you can override the whole dictionary if you wish.
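A short illustration of this limitation, using made-up key names: overriding a key whose value is a nested dictionary replaces that dictionary wholesale.

```python
task_data = {
    "backend": "unshare",
    # A nested dictionary: its sub-keys cannot be overridden one by one
    "environment": {"variant": "buildd", "architecture": "amd64"},
}
override_values = {"environment": {"variant": "minbase"}}

# Overrides only operate on top-level keys...
task_data.update(override_values)
# ...so the whole "environment" value is replaced and the
# "architecture" sub-key from the original dictionary is gone.
```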
When the same configuration key appears in default_values and override_values (either in a single entry, or in the entry created by combining the different levels), the one from override_values takes precedence over the one from default_values.
About templates

Template entries follow the same structure as other entries, but they are only used indirectly, when a normal configuration entry refers to them in its use_templates field. They are meant to share some common configuration across multiple similar packages.
Example:

```yaml
template:uefi-sign:
  default_values:
    enable_make_signed_source: True
    make_signed_source_purpose: uefi

template:uefi-sign-with-fwupd-key:
  use_templates:
    - uefi-sign
  default_values:
    make_signed_source_key: sid@debian:suite-signing-keys/fingerprint:AEC1234

template:uefi-sign-with-grub-key:
  use_templates:
    - uefi-sign
  default_values:
    make_signed_source_key: sid@debian:suite-signing-keys/fingerprint:CBD3214

Workflow:debian-pipeline:fwupd-efi::
  use_templates:
    - uefi-sign-with-fwupd-key

Workflow:debian-pipeline:fwupdate::
  use_templates:
    - uefi-sign-with-fwupd-key

Workflow:debian-pipeline:grub2::
  use_templates:
    - uefi-sign-with-grub-key
```
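The expansion of use_templates references can be sketched as a small recursive helper. expand_templates and its dict-based entries are an assumption made for this example, not the actual implementation.

```python
def expand_templates(entry, templates):
    """Return the list of configuration entries to process for
    `entry`: each template referenced in use_templates is expanded
    (recursively) and inserted just before the entry itself, so that
    the more specific entries are merged last."""
    expanded = []
    for name in entry.get("use_templates", []):
        expanded.extend(expand_templates(templates[name], templates))
    expanded.append(entry)
    return expanded
```

With the example above, expanding the fwupd-efi entry yields the uefi-sign template first, then uefi-sign-with-fwupd-key, then the entry itself.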
Design considerations

Workflow vs task feature

While this initially started as “external instructions for the debian-pipeline workflow”, the comments led us to build this as a general solution to provide configuration for any workflow. But since workflows are just one kind of task, I figured that we could just as well apply this new concept to all kinds of tasks, so that when you provide some configuration to a workflow, it also applies to all the child tasks.
Having the ability to store overrides at the worker task level can save us from adding too many parameters on the workflows. The only required parameters would be those that are important to control the orchestration step.
For example, we could have configuration for the sbuild worker task next to the configuration for the debian-pipeline workflow:

```yaml
Workflow:debian-pipeline:::
  default_values:
    ...

Worker:sbuild::jessie:
  override_values:
    backend: schroot
```
This specific example shows how the sbuild_backend parameter might no longer be needed on the debian-pipeline workflow. We might still want to keep it.
Despite this, I still expect that the bulk of the configuration data stored in debusine:task-configuration will concern workflows, because workflows are designed to have a very flat task_data structure, i.e. with many top-level keys that can thus be individually overridden. This is not always the case for worker tasks.
Integration with dynamic_data

To be able to apply changes to the submitted task_data configuration, we need to know the subject and the context. However, in many cases the subject is not yet known (because it is the output of a previous work request).

In spirit, this is similar to the fact that the various lookups can only be resolved when the work request becomes pending. The conversion of those fields is handled by compute_dynamic_data().

I thus suggest replacing or enhancing that process to also take care of applying the task configuration.
Algorithm to apply the configuration

The logic that we want to see implemented is the following:

First build a single “configuration entry” by combining all the relevant collection items. To achieve this, you need to process all items in the correct order (integrating the items referenced from use_templates just before the corresponding item) by doing the following operations:

```python
default_values = dict()
override_values = dict()
locked_values = set()
for config_item in all_items:
    # Drop all the entries referenced in `delete_values` (except
    # locked values)
    for key in config_item.delete_values:
        if key in locked_values:
            continue
        default_values.pop(key, None)
        override_values.pop(key, None)
    # Merge the default/override values in the response
    # (except locked values)
    for key, value in config_item.default_values.items():
        if key in locked_values:
            continue
        default_values[key] = value
    for key, value in config_item.override_values.items():
        if key in locked_values:
            continue
        override_values[key] = value
    # Update the set of locked values
    locked_values.update(config_item.lock_values)
return (default_values, override_values)
```
Then apply the operations of that single combined entry to the data available in task_data:

```python
new_task_data = task_data.copy()
default_values, override_values = get_merged_task_configuration()
# Apply default values (add missing values, but also replace explicit
# None values)
for k, v in default_values.items():
    if new_task_data.get(k) is None:
        new_task_data[k] = v
# Apply overrides
new_task_data.update(override_values)
```
Implementation plan

1. Extend BaseTaskData with a task_configuration field that is a single lookup and that should return the (optional) debusine:task-configuration collection to use to configure the task.

2. Extend BaseDynamicTaskData with 4 new fields:

   - task_configuration_id: the result of the collection lookup of the task_configuration field
   - subject (str): the subject value defined above
   - configuration_context (str): the context value defined above
   - runtime_context (str): the context value defined for the task-history collection

3. Extend WorkRequest with a new configured_task_data field that is similar to task_data, and a new version integer field used to store the version of the code that has been used to compute the dynamic task data.

4. Create a new configure_on_server() method on Task that builds on compute_dynamic_data() and that computes the configured_task_data field:

   ```python
   def configure_on_server(..., task_database: TaskDatabaseInterface):
       # 1. Compute dynamic data (including the subject / context /
       #    task_configuration_id values)
       dynamic_data = self.compute_dynamic_data(task_database)
       # 2. Apply the configuration from task-configuration (when
       #    possible)
       configured_data = task_database.apply_configuration(
           self.task_data, dynamic_data)
       # 3. Recompute the dynamic data with the configured_data
       self.task_data = configured_data
       dynamic_data = self.compute_dynamic_data(task_database)
       # 4. Store everything in the database
       task_database.set_configured_task_data(configured_data)
       task_database.set_dynamic_data(dynamic_data)
       task_database.set_version(self.TASK_VERSION)
   ```

5. Hook that new method in place of the current compute_dynamic_data() in the scheduler. Make sure the Task class is fed with configured_task_data on the workers.

6. Gradually update the tasks to compute the new BaseDynamicTaskData fields in compute_dynamic_data().