Task statistics
We want to store metadata about previous runs of various tasks. Runtime data from former runs can be used to:

- select a powerful worker when the task is resource hungry (or the opposite)
- decide whether a failure is fatal based on the existence of previous successful runs
- etc.
Store runtime statistics in WorkRequest
The `WorkRequest` model gets a new JSON field named `output_data`. That field is set upon completion of the work request. The values are provided by the worker through the view `UpdateWorkRequestAsCompletedView`.
Among the keys that can be present in that new field, we have the following standardized values:

- `runtime_statistics`: see the RuntimeStatistics model below.
- `errors`: a list of errors. Each error is a dictionary with the following keys:
  - `message`: user-friendly error message
  - `code`: computer-friendly error code

  Note: Typically used to return validation/configuration errors to the user, i.e. errors that resulted in the task not being run at all. Other additional keys might be set depending on the error code.
This `errors` key is not required for the design that we are doing here, but it explains why I opted to create an `output_data` field instead of a `runtime_statistics` field. See #432 or #227 for related issues that we could fix with this new possibility.
RuntimeStatistics model
The model combines runtime data about the task itself:

- `duration` (optional, integer): the runtime duration of the task in seconds
- `cpu_time` (optional, integer): the amount of CPU time used in seconds (combining user and system CPU time)
- `disk_space` (optional, integer): the maximum disk space used during the task’s execution (in bytes)
- `memory` (optional, integer): the maximum amount of RAM used during the task’s execution (in bytes)
But also some data about the worker to help analyze the values and/or to provide reference data in the case of missing runtime data:

- `available_disk_space` (optional, integer): the available disk space when the task started (in bytes, may be rounded)
- `available_memory` (optional, integer): the amount of RAM that was available when the task started (in bytes, may be rounded)
- `cpu_count` (optional, integer): the number of CPU cores on the worker that ran the task
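A minimal sketch of the model, using a plain dataclass here for self-containedness (the actual implementation would presumably use the project's usual model base class, e.g. a pydantic model):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeStatistics:
    """Runtime statistics of a task run; every field is optional."""

    # Data about the task itself
    duration: Optional[int] = None  # runtime duration, in seconds
    cpu_time: Optional[int] = None  # user + system CPU time, in seconds
    disk_space: Optional[int] = None  # maximum disk space used, in bytes
    memory: Optional[int] = None  # maximum RAM used, in bytes

    # Data about the worker, for reference/analysis
    available_disk_space: Optional[int] = None  # bytes, may be rounded
    available_memory: Optional[int] = None  # bytes, may be rounded
    cpu_count: Optional[int] = None  # number of CPU cores


# All fields default to None, so partial data is representable.
stats = RuntimeStatistics(duration=6230, cpu_time=4300, cpu_count=4)
```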
Collection debusine:task-history
This singleton collection helps to find previous runs of a given task that used similar input parameters and are thus expected to behave similarly.

To correctly represent the history of a large number of task runs, the bare data item always has the following fields:

- `task_type` (required): the `task_type` of the work request for which we want to keep statistics
- `task_name` (required): the `task_name` of the work request for which we want to keep statistics
- `subject` (required): an abstract string value representing the subject of the task. It is meant to group possible inputs into sets that we expect to behave similarly.
- `context` (optional): an abstract string value representing the runtime context in which the task is executed. It is meant to represent some of the task parameters that can significantly alter the runtime behaviour of the task.
For example, for the `sbuild` task, `subject` would typically be the source package name while `context` would be the name of the target suite and the target architecture.
The subject and runtime context are computed dynamically by the task’s `compute_dynamic_data()` method and thus stored in the work request’s `dynamic_data` field.

The name of each item is `TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT:WORKREQUESTID`.
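The item name can be derived mechanically from those fields; a hypothetical helper (the function name and example values are illustrative, not part of this design):

```python
def make_item_name(
    task_type: str,
    task_name: str,
    subject: str,
    context: str,
    work_request_id: int,
) -> str:
    """Build the colon-separated item name described above.

    Hypothetical helper for illustration; the real implementation
    may live elsewhere and handle escaping/validation.
    """
    return ":".join(
        [task_type, task_name, subject, context, str(work_request_id)]
    )


# Example: an sbuild run of "hello" for an assumed suite/arch context.
name = make_item_name("Worker", "sbuild", "hello", "trixie-amd64", 12)
```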
Other collection-specific characteristics:

Data:

- `old_items_to_keep`: number of old items to keep. Defaults to 5. The collection always keeps the last success, the last failure, and the given number of most recent entries for each subject/context combination. The cleanup is automatically done when adding new items.

  Note: At some point, we may need more advanced logic than this, for instance to clean up statistics about packages that are gone from the corresponding suite.
Valid items:

- `debusine:historical-task-run` bare data
Lookup names:

- `last-entry:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT` returns the most recently added entry for the specific combination of task/subject/context.
- `last-success:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT` returns the most recently added entry where `result` is `success` for the specific combination of task/subject/context.
- `last-failure:TASK_TYPE:TASK_NAME:SUBJECT:CONTEXT` returns the most recently added entry where `result` is `failure` or `error` for the specific combination of task/subject/context.
Multiple lookup filters:

- `same_work_request`: given a Multiple lookup, return conditions matching task runs for the same work request as any of the resulting artifacts
- `same_workflow`: given a Multiple lookup, return conditions matching task runs for work requests from the same workflow as any of the resulting artifacts
Constraints:
None.
Bare data item: debusine:historical-task-run
On top of the mandatory classification fields documented above, the following fields are defined in the data item:

- `timestamp` (required): the date and time of the task run, as a Unix timestamp (cf. `date +%s`)
- `work_request_id` (required): the ID of the WorkRequest corresponding to the monitored task
- `result` (required): duplicates the string value of the result field of the associated WorkRequest
- `runtime_statistics` (required): duplicates the value of the `runtime_statistics` key in the `output_data` dictionary of the associated WorkRequest
Example data:

```yaml
timestamp: 1722692645
work_request_id: 12
result: success
runtime_statistics:
  duration: 6230
  cpu_time: 4300
  disk_space: 14780131
  memory: 344891034
  available_disk_space: 12208271360
  available_memory: 32839598080
  cpu_count: 4
```
New action record-in-task-history
This action is meant to be used as an event reaction to store the current task run in a `debusine:task-history` collection. The following fields are supported:

- `collection` (Single lookup, required): the `debusine:task-history` collection to update
- `subject` (optional, defaults to the value stored in `dynamic_data`): the subject string used to record the statistics
- `context` (optional, defaults to the value stored in `dynamic_data`): the runtime context string used to record the statistics
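For illustration, an explicitly configured event reaction using this action could be represented as follows (the `on_success` event name and the collection lookup string are assumptions, not values defined by this document):

```python
# Hypothetical event-reaction configuration for a work request.
# "on_success" and "debian@debusine:task-history" are illustrative values.
event_reactions = {
    "on_success": [
        {
            "action": "record-in-task-history",
            "collection": "debian@debusine:task-history",
            # "subject" and "context" are omitted: they default to the
            # values stored in the work request's dynamic_data.
        }
    ]
}
```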
When the action is executed, it simply adds a new entry to the collection.
Note: This action is not meant to be manually added on each work request. Instead it should be automatically executed upon completion of each work request, provided that the target collection has been set in the new `task_history` task_data field.
Open question: how and where to use the statistics
In theory, the statistics might only be available when the task becomes pending, i.e. once we have the final result of `compute_dynamic_data()` and the guarantee of having values for subject/context.
If we want to use those statistics to tweak the configuration of the work request (i.e. adding new worker requirements), then it needs some careful coordination between the scheduler and the workflow.
In practice, many workflows will know the subject/context values in advance and can possibly configure the work request at creation time.
Implementation plan
1. Add a new optional `task_history` Single lookup field in BaseTaskData.
2. Add a new `get_event_reactions(event_name)` method on the `BaseTask` class that returns a list of actions. By default, that list should contain the new `record-in-task-history` action configured with the collection passed in the `task_data.task_history` field. If the collection is not set, then the returned list is empty.
3. Tweak `WorkRequest.get_triggered_actions()` to combine the event reactions provided by the task implementation with the event reactions explicitly configured when creating the task.
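The default behaviour of `get_event_reactions()` described in step 2 could be sketched as follows. This is a minimal illustration, not the real `BaseTask`: the class body, the `on_completion` event name, and the collection lookup string are all assumptions.

```python
from typing import Any


class BaseTask:
    """Minimal sketch; the real BaseTask class is much richer."""

    def __init__(self, task_data: dict[str, Any]) -> None:
        self.task_data = task_data

    def get_event_reactions(self, event_name: str) -> list[dict[str, Any]]:
        """Return the implicit event reactions for the given event.

        If task_data.task_history is set, record the run in that
        collection upon completion; otherwise return no actions.
        """
        collection = self.task_data.get("task_history")
        # "on_completion" is an assumed event name for illustration.
        if event_name == "on_completion" and collection:
            return [
                {
                    "action": "record-in-task-history",
                    "collection": collection,
                }
            ]
        return []


# With a configured collection, the default action is returned.
task = BaseTask({"task_history": "debian@debusine:task-history"})
reactions = task.get_event_reactions("on_completion")
```

`WorkRequest.get_triggered_actions()` would then concatenate this list with the event reactions that were explicitly configured when the work request was created.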