.. _task-configuration:

==================
Task configuration
==================

When you maintain a complete distribution like Debian or one of its
derivatives, you have to deal with special cases and exceptions, for
example:

* disable build/autopkgtest/etc. of a package on a specific
  architecture because it kills the workers
* restrict the build/autopkgtest/etc. of a package to specific workers
  where the build is known to succeed
* etc.

As a derivative, you might want to make opinionated choices and change
some of the build parameters by using a specific build profile on some
packages.

Those tweaks and exceptions need to be recorded and managed somewhere,
and then later used to feed the relevant workflows and work requests.
We plan to store this information in new collections, and to improve
workflow and work request creation so that they can use those
collections to apply the desired overrides.

Collection ``debusine:task-configuration``
==========================================

This collection is meant to store configuration data for tasks. By
configuration data, we mean key/value pairs that are going to be fed
into the ``task_data`` field of :ref:`explanation-work-requests`. By
tasks, we mean any type of task, but for all practical purposes,
``Worker`` and ``Workflow`` tasks are the most likely targets here.

The actual configuration data is generated by merging multiple
snippets of configuration (stored in multiple collection items), each
applying at a different level of granularity.

To provide fine-grained control of the configuration, we consider that
a *subject* is being processed by a task and that the task can have a
*configuration context*. The *configuration context* is typically
another parameter of the task that can usefully be leveraged to apply
some consistent configuration across all work requests sharing the
same *configuration context*.

.. todo::

   Decide whether we want to support multiple configuration contexts.
   A task could return multiple contexts: the sbuild task, typically,
   could usefully have parameters related to the
   ``host_architecture`` and to the target suite, on top of parameters
   related to the source package. It could be achieved with context
   values like ``suite=jessie`` and ``architecture=arm64``.

Those two values are used to look up the various snippets of
configuration. The snippets are retrieved and processed in the
following order:

* global (subject=None, context=None)
* context level (subject=None, context != None)
* subject level (subject != None, context=None)
* specific-combination level (subject != None, context != None)

The collection can host partial or full configuration data, but it is
expected to be mainly useful to store overrides, i.e. variations
compared to the defaults provided by the task or its containing
workflow.

To correctly represent all those possibilities, the bare data item
always has the following fields:

* ``task_type`` (required): the ``task_type`` of the task for which we
  want to provide default values and overrides
* ``task_name`` (required): the ``task_name`` of the task for which we
  want to provide default values and overrides
* ``subject`` (defaults to None): an abstract string value
  representing the *subject* of the task (i.e. something passed as
  input). It is meant to gather the possible inputs of the tasks into
  groups that we expect to configure similarly.
* ``context`` (defaults to None): an abstract string value
  representing the *configuration context* of the task. It is
  typically another important task parameter (or derived from it).
* ``template`` (defaults to None): the name of a template entry

For example, for the ``debian-pipeline`` workflow, ``subject`` would
typically be the source package name while ``context`` would be the
name of the target suite.

The name of each item is ``TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT``,
except for a template item where it is ``template:TEMPLATE``.
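The retrieval order and the naming scheme can be sketched with a few
lines of Python. This is a hypothetical illustration: it assumes that
a None subject or context is rendered as an empty segment in the item
name, as in ``Worker:sbuild::jessie``::

    def config_item_names(task_type, task_name, subject, context):
        """Item names to process, from most generic to most specific."""
        levels = [
            (None, None),        # global
            (None, context),     # context level
            (subject, None),     # subject level
            (subject, context),  # specific-combination level
        ]
        return [
            f"{task_type}:{task_name}:{s or ''}:{c or ''}"
            for s, c in levels
        ]

    # Hypothetical sbuild work request building "hello" for jessie:
    config_item_names("Worker", "sbuild", "hello", "jessie")
    # ["Worker:sbuild::", "Worker:sbuild::jessie",
    #  "Worker:sbuild:hello:", "Worker:sbuild:hello:jessie"]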
Other collection-specific characteristics:

* Data: none
* Valid items:

  * ``debusine:task-configuration`` bare data

* Lookup names:

  * None. The name-based lookup is sufficient.

* Constraints:

  * when ``template`` is set, ``task_type``, ``task_name``,
    ``subject`` and ``context`` must be None

Bare data ``debusine:task-configuration``
------------------------------------------

On top of the mandatory classification fields documented above, the
following fields are supported; they correspond to successive steps
performed to build the configuration data:

* ``use_templates`` (list): a list of template names whose
  corresponding entries shall be retrieved and imported as part of the
  configuration returned for the current entry
* ``delete_values`` (list): a list of configuration keys to delete
  from the values returned by the previous configuration levels
* ``default_values`` (dict): values to use as defaults when the user
  did not provide any value for the given configuration keys
* ``override_values`` (dict): values to use even if the user did
  provide a value for the given configuration keys
* ``lock_values`` (list): a list of configuration keys that should be
  locked (i.e. the next configuration levels can no longer provide or
  modify the corresponding values)
* ``comment`` (string): multiline free-form text used to document the
  reasons behind the provided configuration. The text can use Markdown
  syntax.

This mechanism only allows controlling top-level configuration keys in
``task_data`` fields. It is not possible to override a single value in
a nested dictionary, but you can override the whole dictionary if you
wish.

When the same configuration key appears in ``default_values`` and
``override_values`` (either in a single entry, or in the entry created
by combining the different levels), the one from ``override_values``
takes precedence over the one from ``default_values``.
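That precedence rule can be illustrated with a small Python sketch
(the ``backend`` values are hypothetical, echoing the sbuild example
used later in this document)::

    def resolve(task_data, default_values, override_values):
        """Resolve a single combined entry against submitted task_data."""
        resolved = dict(task_data)
        # Defaults only fill in keys that are missing or explicitly None.
        for key, value in default_values.items():
            if resolved.get(key) is None:
                resolved[key] = value
        # Overrides always win, even over user-provided values.
        resolved.update(override_values)
        return resolved

    # The same key in both dicts: override_values takes precedence.
    resolve({}, {"backend": "schroot"}, {"backend": "incus-lxc"})
    # {"backend": "incus-lxc"}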
About templates
---------------

Template entries follow the same structure as other entries, but they
are only used indirectly, when a normal configuration entry refers to
them in its ``use_templates`` field. They are meant to share some
common configuration across multiple similar packages. Example::

    template:uefi-sign:
        default_values:
            enable_make_signed_source: True
            make_signed_source_purpose: uefi

    template:uefi-sign-with-fwupd-key:
        use_templates:
            - uefi-sign
        default_values:
            make_signed_source_key: AEC1234

    template:uefi-sign-with-grub-key:
        use_templates:
            - uefi-sign
        default_values:
            make_signed_source_key: CBD3214

    Workflow:debian-pipeline:fwupd-efi::
        use_templates:
            - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:fwupdate::
        use_templates:
            - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:grub2::
        use_templates:
            - uefi-sign-with-grub-key

Design considerations
=====================

Workflow vs task feature
------------------------

While this initially started as “external instructions for the
``debian-pipeline`` workflow”, the comments led us to build this as a
general solution to provide configuration for any workflow. But since
workflows are just one kind of task, I figured that we could just as
well apply this new concept to all kinds of tasks, so that when you
provide some configuration to a workflow, it also applies to all the
child tasks.

Having the ability to store overrides at the worker task level can
save us from adding too many parameters on the workflows. The only
required parameters would be those that are important to control the
orchestration step.

For example, we could have configuration for the sbuild worker task
next to the configuration for the debian-pipeline workflow::

    Workflow:debian-pipeline:::
        default_values:
            ...

    Worker:sbuild::jessie:
        override_values:
            backend: incus-lxc

This specific example shows how the ``sbuild_backend`` parameter might
no longer be needed on the ``debian-pipeline`` workflow. We might
still want to keep it.
Despite this, I still expect that the bulk of the configuration data
stored in ``debusine:task-configuration`` will concern workflows,
because workflows are designed to have a very flat ``task_data``
structure, i.e. with many top-level keys that can thus be individually
overridden. This is not always the case for worker tasks.

Integration with dynamic_data
-----------------------------

To be able to apply changes to the submitted ``task_data``
configuration, we need to know the *subject* and the *context*.
However, in many cases the subject is not yet known at submission time
(because it is the output of a previous work request). In spirit, this
is similar to the fact that the various lookups can only be resolved
when the work request becomes pending. The conversion of those fields
is handled by ``compute_dynamic_data()``. I thus suggest replacing or
enhancing that process so that it also takes care of applying the task
configuration.

Algorithm to apply the configuration
------------------------------------

The logic that we want to see implemented is the following:

* First build a single "configuration entry" by combining all the
  relevant collection items.
  To achieve this, you need to process all the items in the correct
  order (integrating the items referenced from ``use_templates`` just
  before the corresponding item) by doing the following operations::

      default_values = dict()
      override_values = dict()
      locked_values = set()
      for config_item in all_items:
          # Drop all the entries referenced in `delete_values` (except
          # locked values)
          for key in config_item.delete_values:
              if key in locked_values:
                  continue
              default_values.pop(key, None)
              override_values.pop(key, None)
          # Merge the default/override values in the response
          # (except locked values)
          for key, value in config_item.default_values.items():
              if key in locked_values:
                  continue
              default_values[key] = value
          for key, value in config_item.override_values.items():
              if key in locked_values:
                  continue
              override_values[key] = value
          # Update the set of locked values
          locked_values.update(config_item.lock_values)
      return (default_values, override_values)

* Then apply the operations of that single combined entry to the data
  available in ``task_data``::

      new_task_data = task_data.copy()
      default_values, override_values = get_merged_task_configuration()
      # Apply default values (add missing values, but also replace
      # explicit None values)
      for key, value in default_values.items():
          if new_task_data.get(key) is None:
              new_task_data[key] = value
      # Apply overrides
      new_task_data.update(override_values)

Implementation plan
===================

* Extend ``BaseTaskData`` with a ``task_configuration`` field that is
  a :ref:`lookup-single` and that should return the (optional)
  ``debusine:task-configuration`` collection to use to configure the
  task.
* Extend ``BaseDynamicTaskData`` with 4 new fields:

  * ``task_configuration_id``: the result of the collection lookup of
    the ``task_configuration`` field
  * ``subject`` (str): the subject value defined above
  * ``configuration_context`` (str): the context value defined above
  * ``runtime_context`` (str): the context value defined for the
    task-history collection

* Extend ``WorkRequest`` with a new ``configured_task_data`` field
  that is similar to ``task_data``, and a new ``version`` integer
  field used to store the version of the code that has been used to
  compute the dynamic task data.

* Create a new ``configure()`` method on ``TaskDatabase`` that builds
  on ``compute_dynamic_data()`` and that computes the
  ``configured_task_data`` field::

      def configure(self, task: BaseTask[Any, Any]):
          # 1. Compute dynamic data (including the subject / context /
          # task_configuration_id values)
          dynamic_data = task.compute_dynamic_data(self)
          # 2. Apply the configuration from task-configuration (when
          # possible)
          configured_data = self.apply_configuration(
              task.data, dynamic_data)
          # 3. Recompute the dynamic data with the configured_data
          task.data = configured_data
          dynamic_data = task.compute_dynamic_data(self)
          # 4. Store everything in the database
          self.set_configured_task_data(configured_data)
          self.set_dynamic_data(dynamic_data)
          self.set_version(self.TASK_VERSION)

* Hook that new method in place of the current
  ``compute_dynamic_data()`` in the scheduler. Make sure the Task
  class is fed ``configured_task_data`` on the workers.

* Gradually update the tasks to compute the new ``BaseDynamicTaskData``
  fields in ``compute_dynamic_data()``.
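As a closing sketch, the ``use_templates`` expansion required by the
merging algorithm (template items integrated just before the entry
that references them, recursively) could look as follows. The
dict-based in-memory representation is an assumption made for
illustration only::

    def expand_templates(item, get_template):
        """Yield the items referenced via ``use_templates``
        (depth-first, recursively), then the item itself, in merge
        order."""
        for name in item.get("use_templates", []):
            yield from expand_templates(get_template(name), get_template)
        yield item

    # With the uefi-sign templates from the example earlier in this
    # document:
    templates = {
        "uefi-sign": {
            "default_values": {"make_signed_source_purpose": "uefi"},
        },
        "uefi-sign-with-fwupd-key": {
            "use_templates": ["uefi-sign"],
            "default_values": {"make_signed_source_key": "AEC1234"},
        },
    }
    entry = {"use_templates": ["uefi-sign-with-fwupd-key"]}
    order = list(expand_templates(entry, templates.get))
    # order is: uefi-sign, uefi-sign-with-fwupd-key, then the entry
    # itself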