Task statistics

We want to store metadata about previous runs of various tasks. Runtime data from former runs can be used to:

  • select a powerful worker when the task is resource hungry (or the opposite)

  • decide whether a failure is fatal based on the existence of previous successful runs

  • etc.

Store runtime statistics in WorkRequest

The WorkRequest model gets a new JSON field named output_data. That field is set upon completion of the work request. The values are provided by the worker through the view UpdateWorkRequestAsCompletedView.

Among the keys that can be present in that new field, we have the following standardized values:

  • runtime_statistics: see RuntimeStatistics model below.

  • errors: a list of errors. Each error is a dictionary with the following keys:

    • message: user-friendly error message

    • code: computer-friendly error code

    Note

    Typically used to return validation/configuration errors to the user that resulted in the task not being run at all. Additional keys might be set depending on the error code.

    This errors key is not required for the design that we are doing here, but it explains why I opted to create an output_data field instead of a runtime_statistics field. See #432 or #227 for related issues that we could fix with this new possibility.
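
To make the shape of the new field concrete, here is a hypothetical output_data payload as a worker could submit it through UpdateWorkRequestAsCompletedView. The two top-level keys are the standardized ones described above; all concrete values (error message, error code, statistics) are purely illustrative.

```python
import json

# Hypothetical output_data payload; the top-level keys follow the
# standardized values described above, everything else is made up.
output_data = {
    "runtime_statistics": {
        "duration": 120,  # seconds
        "cpu_time": 95,   # seconds, user + system combined
    },
    "errors": [
        {
            "message": "Invalid value for field 'distribution'",  # user-friendly
            "code": "invalid-task-data",                          # computer-friendly
        },
    ],
}

# The field is stored as JSON, so the payload must serialize cleanly.
serialized = json.dumps(output_data)
assert json.loads(serialized)["errors"][0]["code"] == "invalid-task-data"
```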

RuntimeStatistics model

The model combines runtime data about the task itself:

  • duration (optional, integer): the runtime duration of the task in seconds

  • cpu_time (optional, integer): the amount of CPU time used in seconds (combining user and system CPU time)

  • disk_space (optional, integer): the maximum disk space used during the task’s execution (in bytes)

  • memory (optional, integer): the maximum amount of RAM used during the task’s execution (in bytes)

But also some data about the worker to help analyze the values and/or to provide reference data in the case of missing runtime data:

  • available_disk_space (optional, integer): the available disk space when the task started (in bytes, may be rounded)

  • available_memory (optional, integer): the amount of RAM that was available when the task started (in bytes, may be rounded)

  • cpu_count (optional, integer): the number of CPU cores on the worker that ran the task
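
The fields above can be summarized as a simple data structure. The following is a sketch using a plain dataclass; the actual implementation may use a different modelling library, and only the field names and types come from the description above.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class RuntimeStatistics:
    """Sketch of the RuntimeStatistics model; every field is an optional integer."""

    # Runtime data about the task itself
    duration: Optional[int] = None    # wall-clock runtime in seconds
    cpu_time: Optional[int] = None    # user + system CPU time in seconds
    disk_space: Optional[int] = None  # maximum disk space used, in bytes
    memory: Optional[int] = None      # maximum RAM used, in bytes

    # Reference data about the worker that ran the task
    available_disk_space: Optional[int] = None  # free disk space at start, bytes
    available_memory: Optional[int] = None      # available RAM at start, bytes
    cpu_count: Optional[int] = None             # number of CPU cores

stats = RuntimeStatistics(duration=6230, cpu_time=4300, cpu_count=4)
assert asdict(stats)["duration"] == 6230
assert stats.memory is None  # unset fields simply stay None
```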

Collection debusine:task-history

This singleton collection helps to find previous runs of a given task that used similar input parameters and are thus expected to behave similarly.

To correctly represent the history of a large number of task runs, the bare data item always has the following fields:

  • task_type (required): the task_type of the work request for which we want to keep statistics

  • task_name (required): the task_name of the work request for which we want to keep statistics

  • subject (required): an abstract string value representing the subject of the task. It is meant to group possible inputs into groups that we expect to behave similarly.

  • context (optional): an abstract string value representing the runtime context in which the task is executed. It is meant to represent some of the task parameters that can significantly alter the runtime behaviour of the task.

For example, for the sbuild task, subject would typically be the source package name while context would be the name of the target suite and the target architecture.

The subject and runtime context are computed dynamically by the task’s compute_dynamic_data() method and are thus stored in the work request’s dynamic_data field.

The name of each item is TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT:WORKREQUESTID.
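
The naming scheme is a straightforward colon-separated join; using the sbuild example above (the concrete values are illustrative):

```python
def item_name(task_type: str, task_name: str, subject: str,
              context: str, work_request_id: int) -> str:
    """Build an item name of the form TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT:WORKREQUESTID."""
    return f"{task_type}:{task_name}:{subject}:{context}:{work_request_id}"

name = item_name("Worker", "sbuild", "hello", "trixie:amd64", 12)
assert name == "Worker:sbuild:hello:trixie:amd64:12"
```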

Other collection-specific characteristics:

  • Data:

    • old_items_to_keep: number of old items to keep. Defaults to 5. The collection always keeps the last success and last failure, and a given number of most recent entries for each subject/context combination. The cleanup is automatically done when adding new items.

      Note

      At some point, we may need more advanced logic than this, for instance to clean up statistics about packages that are gone from the corresponding suite.

  • Valid items:

    • debusine:historical-task-run bare data

  • Lookup names:

    • last-entry:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT returns the most recently added entry for the specific combination of task/subject/context.

    • last-success:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT returns the most recently added entry where result is success for the specific combination of task/subject/context.

    • last-failure:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT returns the most recently added entry where result is failure or error for the specific combination of task/subject/context.

  • Multiple lookup filters:

    • same_work_request: given a Multiple lookup, return conditions matching task runs for the same work request as any of the resulting artifacts

    • same_workflow: given a Multiple lookup, return conditions matching task runs for work requests from the same workflow as any of the resulting artifacts

  • Constraints:

    • None.
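
The retention rule behind old_items_to_keep can be sketched as follows. This is illustrative logic only, operating on a chronological list of result strings for one subject/context combination; the real cleanup runs inside the collection manager when items are added.

```python
def indices_to_keep(results: list, old_items_to_keep: int = 5) -> set:
    """Return the indices the cleanup would retain.

    results: chronological (oldest-first) list of result strings for a
    single subject/context combination.
    """
    # Always keep the N most recent entries.
    keep = set(range(max(0, len(results) - old_items_to_keep), len(results)))
    # Additionally keep the last success and the last failure/error.
    for wanted in ({"success"}, {"failure", "error"}):
        for i in range(len(results) - 1, -1, -1):
            if results[i] in wanted:
                keep.add(i)
                break
    return keep

history = ["error", "success", "failure", "failure", "failure",
           "failure", "failure", "failure", "failure"]
# Keeps the 5 most recent entries plus the last success (index 1).
assert indices_to_keep(history) == {1, 4, 5, 6, 7, 8}
```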

Bare data item: debusine:historical-task-run

On top of the mandatory classification fields documented above, the following fields are defined in the data item:

  • timestamp (required): the date and time (as a Unix timestamp — cf date +%s) of the task run

  • work_request_id (required): the ID of the WorkRequest corresponding to the monitored task

  • result (required): duplicates the string value of the result field of the associated WorkRequest

  • runtime_statistics (required): duplicates the value of the runtime_statistics key in the output_data dictionary of the associated WorkRequest

Example data:

timestamp: 1722692645
work_request_id: 12
result: success
runtime_statistics:
    duration: 6230
    cpu_time: 4300
    disk_space: 14780131
    memory: 344891034
    available_disk_space: 12208271360
    available_memory: 32839598080
    cpu_count: 4

New action record-in-task-history

This action is meant to be used as an event reaction to store the current task run in a debusine:task-history collection. The following fields are supported:

  • collection (Single lookup, required): debusine:task-history collection to update

  • subject (optional, defaults to value stored in dynamic_data): the subject string used to record the statistics

  • context (optional, defaults to value stored in dynamic_data): the runtime context string used to record the statistics

When the action is executed, it simply adds a new entry to the collection.
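
In pseudo-Python, the action boils down to the following. The dictionary-based work request, the append() call standing in for the collection manager, and the field lookups are all illustrative; only the item fields match the debusine:historical-task-run definition below.

```python
import time

def run_record_in_task_history(collection: list, work_request: dict,
                               action_data: dict) -> dict:
    """Sketch of executing record-in-task-history for a completed work request."""
    # subject/context come from the action fields, falling back to dynamic_data.
    subject = action_data.get("subject") or work_request["dynamic_data"]["subject"]
    context = action_data.get("context") or work_request["dynamic_data"]["runtime_context"]
    item = {
        "task_type": work_request["task_type"],
        "task_name": work_request["task_name"],
        "subject": subject,
        "context": context,
        "timestamp": int(time.time()),  # Unix timestamp of the run
        "work_request_id": work_request["id"],
        "result": work_request["result"],
        "runtime_statistics": work_request["output_data"].get("runtime_statistics"),
    }
    collection.append(item)  # stand-in for adding a bare data item
    return item
```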

Note

This action is not meant to be manually added on each work request. Instead, it should be automatically executed upon completion of each work request, provided that the target collection has been set in the new task_history task_data field.

Open question: how and where to use the statistics

In theory, the statistics might only become available when the task becomes pending, since that is when we have the final result of compute_dynamic_data() and the guarantee of having values for subject/context.

If we want to use those statistics to tweak the configuration of the work request (e.g. adding new worker requirements), then some careful coordination between the scheduler and the workflow is needed.

In practice, many workflows will know the subject/context values in advance and can possibly configure the work request at creation time.

Implementation plan

  • Add a new optional task_history Single lookup field in BaseTaskData.

  • Add a new get_event_reactions(event_name) method on the BaseTask class that returns a list of actions. By default, that list should contain the new record-in-task-history action configured with the collection passed in the task_data.task_history field. If the collection is not set, then the returned list is empty.

  • Tweak WorkRequest.get_triggered_actions() to combine the event reactions provided by the task implementation with the event reactions explicitly configured when creating the task.
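
The default behaviour described in the second step could look like the sketch below. The class skeleton, the dictionary-based task_data, and the action representation are assumptions for illustration; only the method name, the task_history field, and the empty-list fallback come from the plan above.

```python
class BaseTask:
    """Minimal stand-in for the real BaseTask class (illustrative only)."""

    def __init__(self, task_data: dict):
        self.task_data = task_data

    def get_event_reactions(self, event_name: str) -> list:
        """Return the implicit event reactions for this task.

        The real implementation would likely restrict this to completion
        events; event_name is accepted but unused in this sketch.
        """
        collection = self.task_data.get("task_history")
        if collection is None:
            return []  # no target collection configured: nothing to record
        return [{"action": "record-in-task-history", "collection": collection}]

task = BaseTask({"task_history": "debian/task-history"})
assert task.get_event_reactions("on_completion") == [
    {"action": "record-in-task-history", "collection": "debian/task-history"}
]
assert BaseTask({}).get_event_reactions("on_completion") == []
```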