Collection of tasks.

The debusine.tasks module hierarchy hosts a collection of Task subclasses that are used by workers to fulfill WorkRequests sent by the debusine scheduler.

Creating a new task requires adding a new file containing a class that inherits from the Task base class. The name of the class must be unique among all child classes.
A child class must, at the very least, override the execute() method.
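A minimal sketch of such a child class is shown below. The `Task` stand-in and the `MySleep` task are illustrative assumptions, not part of debusine; the real base class provides much more than this.

```python
# Illustrative sketch: "Task" here is a minimal stand-in for the
# debusine.tasks.Task base class documented below.
import time


class Task:
    """Stand-in base class (see the full description below)."""

    TASK_VERSION = None

    def __init__(self):
        self.data = {}

    def execute(self) -> bool:
        raise NotImplementedError()


class MySleep(Task):
    """Hypothetical task: sleeps for data["delay"] seconds, then succeeds."""

    TASK_VERSION = 1

    def execute(self) -> bool:
        time.sleep(self.data.get("delay", 0))
        return True


task = MySleep()
task.data = {"delay": 0}
assert task.execute() is True
```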
class Task¶
Base class for tasks.
A Task object serves two purposes: encapsulating the logic of what needs to be done to execute the task (cf. execute(), which is run on a worker), and supporting the scheduler in determining whether a task is suitable for a given worker. The latter is done in a two-step process: collating metadata from each worker (with the analyze_worker() method, which is run on the worker) and then, based on this metadata, checking whether a task is suitable (with can_run_on(), which is executed on the scheduler).
TASK_DATA_SCHEMA¶
Can be overridden to enable jsonschema validation of the task_data parameter passed to configure().
TASK_VERSION¶
Must be overridden by child classes to document the current version of the task’s code. A task will only be scheduled on a worker if its task version is the same as the one running on the scheduler.
__init__()¶
Initialise the task.
analyze_worker() → dict¶
Return dynamic metadata about the current worker.
This method is called on the worker to collect information about the worker. The information is stored as a set of key-value pairs in a dictionary.
That information is then reused on the scheduler, where it is fed to can_run_on() to determine whether a task is suitable to be executed on the worker.
Derived objects can extend the behaviour by overriding the method, calling metadata = super().analyze_worker(), and then adding supplementary data to the dictionary.
To avoid conflicts between the key names used by different tasks, you should use task-specific key names (for instance, prefixed with the task name).
- Returns
    a dictionary describing the worker
- Return type
    dict
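The override pattern described above can be sketched as follows. The base-class behaviour and the metadata keys (`system:architecture`, `mytask:has-sbuild`) are illustrative assumptions, not debusine's actual keys:

```python
class Task:
    def analyze_worker(self) -> dict:
        # Stand-in for the real base implementation, which collects
        # worker metadata; the key shown here is a made-up example.
        return {"system:architecture": "amd64"}


class MyTask(Task):
    def analyze_worker(self) -> dict:
        # Start from the base class's metadata...
        metadata = super().analyze_worker()
        # ...then add supplementary, task-prefixed data so the key
        # cannot conflict with keys used by other tasks.
        metadata["mytask:has-sbuild"] = True
        return metadata


metadata = MyTask().analyze_worker()
```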
Return a dictionary with metadata for each task in Task._sub_tasks.
Subclasses of Task are registered in Task._sub_tasks; the returned dictionary contains the metadata of each of these sub-tasks.
This method is executed on the worker when submitting the dynamic metadata.
can_run_on(worker_metadata: dict) → bool¶
Check if the specified worker can run the task.
This method shall take its decision solely based on the supplied worker_metadata and on the configured task data (the data attribute).
The default implementation always returns True, except if there is a mismatch between the TASK_VERSION on the scheduler side and on the worker side.
Derived objects can implement further checks by overriding the method in the following way:
if not super().can_run_on(worker_metadata):
    return False
if ...:
    return False
return True
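A concrete sketch of that pattern is shown below. The base-class version check and the metadata key names (`task_version`, `mytask:has-sbuild`) are illustrative assumptions standing in for debusine's actual implementation:

```python
class Task:
    TASK_VERSION = 1

    def can_run_on(self, worker_metadata: dict) -> bool:
        # Stand-in for the default implementation: only the task
        # version is compared (the key name is an assumption).
        return worker_metadata.get("task_version") == self.TASK_VERSION


class MyTask(Task):
    def can_run_on(self, worker_metadata: dict) -> bool:
        if not super().can_run_on(worker_metadata):
            return False
        # Hypothetical extra requirement, advertised by the worker
        # through analyze_worker().
        if not worker_metadata.get("mytask:has-sbuild"):
            return False
        return True
```

For example, `MyTask().can_run_on({"task_version": 1, "mytask:has-sbuild": True})` is True, while a worker missing either key is rejected.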
class_from_name(sub_task_class_name: str) → Type[debusine.tasks._task.Task]¶
Return the class for sub_task_class_name (case-insensitive).
__init_subclass__() registers Task subclasses into Task._sub_tasks.
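The registration mechanism can be sketched as follows; this is a simplified stand-in for debusine's implementation, not a copy of it:

```python
# Sketch: __init_subclass__ stores every subclass in a class-level
# dictionary, keyed by the lowercased class name, which makes the
# case-insensitive class_from_name() lookup possible.
class Task:
    _sub_tasks: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Task._sub_tasks[cls.__name__.lower()] = cls

    @staticmethod
    def class_from_name(sub_task_class_name: str) -> type:
        # Case-insensitive lookup of a registered subclass.
        return Task._sub_tasks[sub_task_class_name.lower()]

    @staticmethod
    def is_valid_task_name(task_name: str) -> bool:
        return task_name.lower() in Task._sub_tasks


class Sbuild(Task):
    """Hypothetical sub-task, registered automatically on definition."""
```

After the class definitions run, `Task.class_from_name("SBUILD")` returns the Sbuild class and `Task.is_valid_task_name("unknown")` returns False.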
configure(task_data)¶
Configure the task with the supplied task_data.
The supplied data is first validated against the JSON schema defined in the TASK_DATA_SCHEMA class attribute. If validation fails, a TaskConfigError is raised. Otherwise, the supplied task_data is stored in the data attribute.
Derived objects can extend the behaviour by overriding the method and calling super().configure(task_data); however, the extra checks must not access any resource of the worker, as the method can also be executed on the server when it tries to schedule work requests.
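The override pattern can be sketched as follows. To keep the example self-contained, the base class's jsonschema validation is replaced by a simplified required-keys check, and the schema and task names are made up:

```python
class TaskConfigError(Exception):
    """Halt the task due to invalid configuration."""


class Task:
    TASK_DATA_SCHEMA: dict = {}

    def configure(self, task_data: dict) -> None:
        # Simplified stand-in for the real jsonschema validation of
        # task_data against TASK_DATA_SCHEMA.
        for key in self.TASK_DATA_SCHEMA.get("required", []):
            if key not in task_data:
                raise TaskConfigError(f"missing required key: {key}")
        self.data = task_data


class MyTask(Task):
    TASK_DATA_SCHEMA = {"required": ["source"]}

    def configure(self, task_data: dict) -> None:
        super().configure(task_data)
        # Extra checks: pure data validation only, since this may run
        # on the server and must not touch worker-local resources.
        if not isinstance(self.data["source"], str):
            raise TaskConfigError("source must be a string")
```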
execute() → bool¶
Execute the requested task.
The task must first have been configured. It is allowed to take as much time as required. This method will only be run on a worker. It is thus allowed to access resources local to the worker.
It is recommended to fail early by raising a TaskConfigError if the parameters of the task let you anticipate that it has no chance of completing successfully.
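The fail-early advice can be sketched as below; the `MyBuild` class, its `distribution` parameter, and the accepted values are all hypothetical:

```python
class TaskConfigError(Exception):
    """Halt the task due to invalid configuration."""


class MyBuild:  # illustrative stand-in for a Task subclass
    def __init__(self, task_data: dict):
        self.data = task_data

    def execute(self) -> bool:
        # Fail early: detect a doomed configuration before spending
        # time on any long-running, worker-local work.
        if self.data.get("distribution") not in ("bookworm", "trixie"):
            raise TaskConfigError(
                f"unknown distribution: {self.data.get('distribution')!r}"
            )
        # ... the actual long-running work would happen here ...
        return True
```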
is_valid_task_name(task_name) → bool¶
Return True if task_name is registered (i.e. its class has been imported).
A logging.Logger instance that can be used in child classes when you override methods to implement the task.
exception TaskConfigError¶
Halt the task due to invalid configuration.