.. _task-configuration:

==================
Task configuration
==================

When you maintain a complete distribution like Debian or one of its
derivatives, you have to deal with special cases and exceptions, for
example:

* disable build/autopkgtest/etc. of a package on a specific
  architecture because it kills the workers
* restrict the build/autopkgtest/etc. of a package to specific workers
  where the build is known to succeed
* etc.

As a derivative, you might want to make opinionated choices and change
some of the build parameters by using a specific build profile on some
packages.

Those tweaks and exceptions need to be recorded and managed somewhere,
and then later used to feed the relevant workflows and work requests.
We plan to store this information in new collections, and to improve
workflow and work request creation so that they can use those
collections to apply the desired overrides.

Collection ``debusine:task-configuration``
==========================================

This collection is meant to store configuration data for tasks. By
configuration data, we mean key/value pairs that are going to be fed
into the ``task_data`` field of :ref:`explanation-work-requests`. By
tasks, we mean any type of task, but for all practical purposes,
``Worker`` and ``Workflow`` tasks are the most likely targets here.

The actual configuration data is generated by merging multiple
snippets of configuration (stored in multiple collection items), each
applying at a different level of granularity.

To provide fine-grained control of the configuration, we consider that
a *subject* is being processed by a task and that the task can have a
*configuration context*. The *configuration context* is typically
another parameter of the task that can usefully be leveraged to apply
some consistent configuration across all work requests sharing the
same *configuration context*.

.. todo::

   Decide whether we want to support multiple configuration contexts.
   A task could return multiple contexts: the sbuild task, typically,
   could usefully have parameters related to the
   ``host_architecture`` and to the target suite, on top of parameters
   related to the source package. It could be achieved with context
   values like ``suite=jessie`` and ``architecture=arm64``.

Those two values are used to look up the various snippets of
configuration. The snippets are retrieved and processed in the
following order:

* global (subject=None, context=None)
* context level (subject=None, context != None)
* subject level (subject != None, context=None)
* specific-combination level (subject != None, context != None)

The collection can host partial or full configuration data, but it is
expected to be mainly useful to store overrides, i.e. variations
compared to the defaults provided by the task or its containing
workflow.

To correctly represent all those possibilities, the bare data item
always has the following fields:

* ``task_type`` (required): the ``task_type`` of the task for which we
  want to provide default values and overrides
* ``task_name`` (required): the ``task_name`` of the task for which we
  want to provide default values and overrides
* ``subject`` (defaults to None): an abstract string value
  representing the *subject* of the task (i.e. something passed as
  input). It is meant to gather the possible inputs of the tasks into
  groups that we expect to configure similarly.
* ``context`` (defaults to None): an abstract string value
  representing the *configuration context* of the task. It is
  typically another important task parameter (or derived from it).
* ``template`` (defaults to None): the name of a template entry

For example, for the ``debian-pipeline`` workflow, ``subject`` would
typically be the source package name while ``context`` would be the
name of the target suite.

The name of each item is ``TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT``,
except for a template item where it is ``template:TEMPLATE``.
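The retrieval order and the naming scheme can be sketched with a few
lines of Python. This is a hypothetical illustration: it assumes that
a None subject or context is rendered as an empty segment in the item
name, as in ``Worker:sbuild::jessie``::

    def config_item_names(task_type, task_name, subject, context):
        """Item names to process, from most generic to most specific."""
        levels = [
            (None, None),        # global
            (None, context),     # context level
            (subject, None),     # subject level
            (subject, context),  # specific-combination level
        ]
        return [
            f"{task_type}:{task_name}:{s or ''}:{c or ''}"
            for s, c in levels
        ]

    # Hypothetical sbuild work request building "hello" for jessie:
    config_item_names("Worker", "sbuild", "hello", "jessie")
    # ["Worker:sbuild::", "Worker:sbuild::jessie",
    #  "Worker:sbuild:hello:", "Worker:sbuild:hello:jessie"]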
Other collection-specific characteristics:

* Data: none
* Valid items:

  * ``debusine:task-configuration`` bare data

* Lookup names:

  * None. The name-based lookup is sufficient.

* Constraints:

  * when ``template`` is set, ``task_type``, ``task_name``,
    ``subject`` and ``context`` must be None

Bare data ``debusine:task-configuration``
------------------------------------------

On top of the mandatory classification fields documented above, the
following fields are supported; they correspond to successive steps
performed to build the configuration data:

* ``use_templates`` (list): a list of template names whose
  corresponding entries shall be retrieved and imported as part of the
  configuration returned for the current entry
* ``delete_values`` (list): a list of configuration keys to delete
  from the values returned by the previous configuration levels
* ``default_values`` (dict): values to use as defaults when the user
  did not provide any value for the given configuration keys
* ``override_values`` (dict): values to use even if the user did
  provide a value for the given configuration keys
* ``lock_values`` (list): a list of configuration keys that should be
  locked (i.e. the next configuration levels can no longer provide or
  modify the corresponding values)
* ``comment`` (string): multiline free-form text used to document the
  reasons behind the provided configuration. The text can use Markdown
  syntax.

This mechanism only allows controlling top-level configuration keys in
``task_data`` fields. It is not possible to override a single value in
a nested dictionary, but you can override the whole dictionary if you
wish.

When the same configuration key appears in ``default_values`` and
``override_values`` (either in a single entry, or in the entry created
by combining the different levels), the one from ``override_values``
takes precedence over the one from ``default_values``.
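That precedence rule can be illustrated with a small Python sketch
(the ``backend`` values are hypothetical, echoing the sbuild example
used later in this document)::

    def resolve(task_data, default_values, override_values):
        """Resolve a single combined entry against submitted task_data."""
        resolved = dict(task_data)
        # Defaults only fill in keys that are missing or explicitly None.
        for key, value in default_values.items():
            if resolved.get(key) is None:
                resolved[key] = value
        # Overrides always win, even over user-provided values.
        resolved.update(override_values)
        return resolved

    # The same key in both dicts: override_values takes precedence.
    resolve({}, {"backend": "schroot"}, {"backend": "incus-lxc"})
    # {"backend": "incus-lxc"}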
About templates
---------------

Template entries follow the same structure as other entries, but they
are only used indirectly, when a normal configuration entry refers to
them in its ``use_templates`` field. They are meant to share some
common configuration across multiple similar packages. Example::

    template:uefi-sign:
        default_values:
            enable_make_signed_source: True
            make_signed_source_purpose: uefi

    template:uefi-sign-with-fwupd-key:
        use_templates:
            - uefi-sign
        default_values:
            make_signed_source_key: AEC1234

    template:uefi-sign-with-grub-key:
        use_templates:
            - uefi-sign
        default_values:
            make_signed_source_key: CBD3214

    Workflow:debian-pipeline:fwupd-efi::
        use_templates:
            - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:fwupdate::
        use_templates:
            - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:grub2::
        use_templates:
            - uefi-sign-with-grub-key

Design considerations
=====================

Workflow vs task feature
------------------------

While this initially started as “external instructions for the
``debian-pipeline`` workflow”, the comments led us to build this as a
general solution to provide configuration for any workflow. But since
workflows are just one kind of task, I figured that we could just as
well apply this new concept to all kinds of tasks, so that when you
provide some configuration to a workflow, it also applies to all the
child tasks.

Having the ability to store overrides at the worker task level can
save us from adding too many parameters on the workflows. The only
required parameters would be those that are important to control the
orchestration step.

For example, we could have configuration for the sbuild worker task
next to the configuration for the debian-pipeline workflow::

    Workflow:debian-pipeline:::
        default_values:
            ...

    Worker:sbuild::jessie:
        override_values:
            backend: incus-lxc

This specific example shows how the ``sbuild_backend`` parameter might
no longer be needed on the ``debian-pipeline`` workflow. We might
still want to keep it.
Despite this, I still expect that the bulk of the configuration data
stored in ``debusine:task-configuration`` will concern workflows,
because workflows are designed to have a very flat ``task_data``
structure, i.e. with many top-level keys that can thus be individually
overridden. This is not always the case for worker tasks.

Integration with dynamic_data
-----------------------------

To be able to apply changes to the submitted ``task_data``
configuration, we need to know the *subject* and the *context*.
However, in many cases the subject is not yet known at submission time
(because it is the output of a previous work request). In spirit, this
is similar to the fact that the various lookups can only be resolved
when the work request becomes pending. The conversion of those fields
is handled by ``compute_dynamic_data()``. I thus suggest replacing or
enhancing that process so that it also takes care of applying the task
configuration.

Algorithm to apply the configuration
------------------------------------

The logic that we want to see implemented is the following:

* First build a single "configuration entry" by combining all the
  relevant collection items.
  To achieve this, you need to process all the items in the correct
  order (integrating the items referenced from ``use_templates`` just
  before the corresponding item) by doing the following operations::

      default_values = dict()
      override_values = dict()
      locked_values = set()
      for config_item in all_items:
          # Drop all the entries referenced in `delete_values` (except
          # locked values)
          for key in config_item.delete_values:
              if key in locked_values:
                  continue
              default_values.pop(key, None)
              override_values.pop(key, None)
          # Merge the default/override values in the response
          # (except locked values)
          for key, value in config_item.default_values.items():
              if key in locked_values:
                  continue
              default_values[key] = value
          for key, value in config_item.override_values.items():
              if key in locked_values:
                  continue
              override_values[key] = value
          # Update the set of locked values
          locked_values.update(config_item.lock_values)
      return (default_values, override_values)

* Then apply the operations of that single combined entry to the data
  available in ``task_data``::

      new_task_data = task_data.copy()
      default_values, override_values = get_merged_task_configuration()
      # Apply default values (add missing values, but also replace
      # explicit None values)
      for key, value in default_values.items():
          if new_task_data.get(key) is None:
              new_task_data[key] = value
      # Apply overrides
      new_task_data.update(override_values)

Implementation plan
===================

* Extend ``BaseTaskData`` with a ``task_configuration`` field that is
  a :ref:`lookup-single` and that should return the (optional)
  ``debusine:task-configuration`` collection to use to configure the
  task.
* Extend ``BaseDynamicTaskData`` with 4 new fields:

  * ``task_configuration_id``: the result of the collection lookup of
    the ``task_configuration`` field
  * ``subject`` (str): the subject value defined above
  * ``configuration_context`` (str): the context value defined above
  * ``runtime_context`` (str): the context value defined for the
    task-history collection

* Extend ``WorkRequest`` with a new ``configured_task_data`` field
  that is similar to ``task_data``, and a new ``version`` integer
  field used to store the version of the code that has been used to
  compute the dynamic task data.

* Create a new ``configure()`` method on ``TaskDatabase`` that builds
  on ``compute_dynamic_data()`` and that computes the
  ``configured_task_data`` field::

      def configure(self, task: BaseTask[Any, Any]):
          # 1. Compute dynamic data (including the subject / context /
          # task_configuration_id values)
          dynamic_data = task.compute_dynamic_data(self)
          # 2. Apply the configuration from task-configuration (when
          # possible)
          configured_data = self.apply_configuration(
              task.data, dynamic_data)
          # 3. Recompute the dynamic data with the configured_data
          task.data = configured_data
          dynamic_data = task.compute_dynamic_data(self)
          # 4. Store everything in the database
          self.set_configured_task_data(configured_data)
          self.set_dynamic_data(dynamic_data)
          self.set_version(self.TASK_VERSION)

* Hook that new method in place of the current
  ``compute_dynamic_data()`` in the scheduler. Make sure the Task
  class is fed ``configured_task_data`` on the workers.

* Gradually update the tasks to compute the new ``BaseDynamicTaskData``
  fields in ``compute_dynamic_data()``.
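As a closing sketch, the ``use_templates`` expansion required by the
merging algorithm (template items integrated just before the entry
that references them, recursively) could look as follows. The
dict-based in-memory representation is an assumption made for
illustration only::

    def expand_templates(item, get_template):
        """Yield the items referenced via ``use_templates``
        (depth-first, recursively), then the item itself, in merge
        order."""
        for name in item.get("use_templates", []):
            yield from expand_templates(get_template(name), get_template)
        yield item

    # With the uefi-sign templates from the example earlier in this
    # document:
    templates = {
        "uefi-sign": {
            "default_values": {"make_signed_source_purpose": "uefi"},
        },
        "uefi-sign-with-fwupd-key": {
            "use_templates": ["uefi-sign"],
            "default_values": {"make_signed_source_key": "AEC1234"},
        },
    }
    entry = {"use_templates": ["uefi-sign-with-fwupd-key"]}
    order = list(expand_templates(entry, templates.get))
    # order is: uefi-sign, uefi-sign-with-fwupd-key, then the entry
    # itself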