Tasks

External API: input and output of tasks

Autopkgtest task (NOT IMPLEMENTED YET)

The task_data associated with this task can contain the following keys:

  • input (required): a dictionary describing the input data:

    • source_artifact_id (required): the ID of the artifact representing the source package to be tested with autopkgtest

    • binary_artifacts_ids (required): a list of debian:binary-packages artifact IDs representing the binary packages to be tested with autopkgtest (they are expected to be part of the same source package as the one identified with source_artifact_id)

    • context_artifacts_ids (optional): a list of debian:binary-packages artifact IDs representing a special context for the tests. This is used to trigger autopkgtests of reverse dependencies: context_artifacts_ids is set to the artifacts of the updated package whose reverse dependencies are tested, and the source/binary artifact IDs identify one of the reverse dependencies whose autopkgtests will be executed.

  • architecture (required): the Debian architecture that will be used in the chroot or VM where tests are going to be run. The packages submitted in input:binary_artifacts_ids usually have a matching architecture (but need not, in the case of cross-architecture package testing, e.g. testing i386 packages on an amd64 system).

  • distribution (required): base distribution of the chroot/container/VM that will be used to run the tests. The distribution codename is prefixed by the vendor identifier.

  • backend (optional): the virtualization backend to use; defaults to auto, in which case the task is free to use the most suitable backend. Other valid values are schroot, lxc, qemu and podman.

Note

We do not support all backends because each one comes with an infrastructure setup cost. For schroot, we already have that infrastructure due to the sbuild task. For podman and qemu, all the setup can be done without root rights. lxc is the reference backend used by Debian, and we want to support it too.

  • include_tests (optional): a list of the tests that will be executed. If not provided (or empty), defaults to all tests being executed. Translates into --test-name=TEST command line options.

  • exclude_tests (optional): a list of tests that will be skipped. If not provided (or empty), then no tests are skipped. Translates into --skip-test=TEST command line options.

  • debug_level (optional, defaults to 0): a debug level between 0 and 3. Translates into -d up to -ddd command line options.

  • extra_apt_sources (optional): a list of APT sources. Each APT source is described by a single line (deb http://MIRROR CODENAME COMPONENT) that is copied to a file in /etc/apt/sources.list.d. Translates into --add-apt-source command line options.

  • use_packages_from_base_repository (optional, defaults to False): if True, then we pass --apt-default-release=$DISTRIBUTION with the name of the base distribution given in the distribution key.

  • environment (optional): a dictionary listing environment variables to inject in the build and test environment. Translates into (multiple) --env=VAR=VALUE command line options.

  • needs_internet (optional, defaults to “run”): Translates directly into the --needs-internet command line option. Allowed values are “run”, “try” and “skip”.

  • fail_on (optional): indicates whether the work request must be marked as failed in the different scenarios identified by the following sub-keys:

    • failed_test (optional, defaults to true): at least one test has failed (and the test was not marked as flaky).

    • flaky_test (optional, defaults to false): at least one flaky test has failed.

    • skipped_test (optional, defaults to false): at least one test has been skipped.

  • timeout (optional): a dictionary where each key/value pair maps to the corresponding --timeout-KEY=VALUE command line option with the exception of the global key that maps to --timeout=VALUE. Supported keys are global, factor, short, install, test, copy and build.

Note

At this point, we have voluntarily not added any key for the --pin-packages option because that option is not explicit enough: differences between the mirror used to schedule jobs and the mirror used by the jobs result in tests that are not testing the version that we want. For now, we believe it’s better to submit all modified packages explicitly via input:context_artifacts_ids so that we are sure of the .deb files that we are submitting and testing with. That way we can even test reverse dependencies before the modified package is available in any repository.

This assumes that we can submit arbitrary .deb files on the command line and that they are effectively used as part of the package setup.
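As an illustration, here is what a complete task_data for this task could look like (the artifact IDs, test name and values are made up for the example):

task_data = {
    "input": {
        "source_artifact_id": 42,
        "binary_artifacts_ids": [43, 44],
    },
    "architecture": "amd64",
    "distribution": "debian:bookworm",
    "backend": "auto",
    "exclude_tests": ["flaky-network-test"],  # maps to --skip-test=flaky-network-test
    "debug_level": 1,  # maps to -d
    "extra_apt_sources": [
        "deb http://deb.debian.org/debian bookworm-updates main",
    ],
    "use_packages_from_base_repository": False,
    "environment": {"MY_VAR": "my-value"},  # maps to --env=MY_VAR=my-value
    "needs_internet": "try",
    "fail_on": {"failed_test": True, "skipped_test": False},
    "timeout": {"global": 7200, "test": 600},  # --timeout=7200 --timeout-test=600
}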

autopkgtest is always run with the options --apt-upgrade --output-dir=ARTIFACT-DIR --summary=ARTIFACT-DIR/summary --no-built-binaries. An artifact of category debian:autopkgtest is generated and its content is a copy of what’s available in the ARTIFACT-DIR (except for files in binaries/, which are excluded to save space). The artifact has “relates to” relationships for the artifacts used as input that are part of the source package being tested.

The data field of the artifact has the following structure:

  • results: a dictionary with details about the tests that have been run. Each key is the name of the test (as shown in the summary file) and the value is another dictionary with the following keys:

    • status: one of PASS, FAIL, FLAKY or SKIPPED

    • details: more details when available

  • cmdline: the complete command line that has been used for the run

  • source_package: a dictionary with some information about the source package hosting the tests that have been run. It has the following sub-keys:

    • name: the name of the source package

    • version: the version of the source package

    • url: the URL of the source package

  • architecture: the architecture of the system where tests have been run

  • distribution: the distribution of the system where tests have been run (formatted as VENDOR:CODENAME)
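For instance, the data field of a generated debian:autopkgtest artifact could look like this (all values are illustrative):

data = {
    "results": {
        "command1": {"status": "PASS", "details": None},
        "unit-tests": {"status": "FAIL", "details": "exit status 1"},
    },
    "cmdline": "autopkgtest --apt-upgrade --output-dir=... hello_2.10-3.dsc ...",
    "source_package": {
        "name": "hello",
        "version": "2.10-3",
        "url": "http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-3.dsc",
    },
    "architecture": "amd64",
    "distribution": "debian:bookworm",
}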

Debootstrap task (NOT IMPLEMENTED YET)

The debootstrap task implements the SystemBootstrap interface except that it only supports a single repository in the bootstrap_repositories key.

On top of the keys defined in that interface, it also supports the following additional keys in task_data:

  • bootstrap_options

    • script: last parameter on debootstrap’s command line

The various keys in the first entry of bootstrap_repositories are mapped to the corresponding command line options and parameters:

  • mirror, suite and script map to positional command line parameters

  • components maps to --components

  • check_signature maps to --check-gpg or --no-check-gpg

  • keyring_package maps to an extra package name in --include

  • keyring maps to --keyring

The following keys from bootstrap_options are also mapped:

  • variant maps to --variant

  • extra_packages maps to --include
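Putting it all together, here is a sketch of the resulting invocation (the target directory and option ordering are illustrative):

repository = {
    "mirror": "https://deb.debian.org/debian",
    "suite": "bookworm",
    "components": ["main", "contrib"],
    "check_signature": True,
    "keyring_package": "debian-archive-keyring",
}
bootstrap_options = {"variant": "minbase", "extra_packages": ["ca-certificates"]}

# keyring_package becomes an extra package name in --include
include = bootstrap_options["extra_packages"] + [repository["keyring_package"]]
cmd = [
    "debootstrap",
    f"--variant={bootstrap_options['variant']}",
    f"--include={','.join(include)}",
    f"--components={','.join(repository['components'])}",
    "--check-gpg" if repository["check_signature"] else "--no-check-gpg",
    repository["suite"],    # positional parameters: suite, target, mirror
    "/srv/chroot/target",
    repository["mirror"],
    # a "script" key in bootstrap_options would be appended here, last on the command line
]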

Mmdebstrap task (NOT IMPLEMENTED YET)

The mmdebstrap task fully implements the SystemBootstrap interface.

The keys from bootstrap_options are mapped to command line options:

  • variant maps to --variant (and it supports more values than debootstrap, see its manual page)

  • extra_packages maps to --include

The keys from bootstrap_repositories are used to build a sources.list file that is then fed to mmdebstrap as input.
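A minimal sketch of that generation, assuming repositories with mirror, suite and components keys:

bootstrap_repositories = [
    {
        "mirror": "https://deb.debian.org/debian",
        "suite": "bookworm",
        "components": ["main"],
    },
]
sources_list = "\n".join(
    f"deb {repo['mirror']} {repo['suite']} {' '.join(repo['components'])}"
    for repo in bootstrap_repositories
)
# Result: deb https://deb.debian.org/debian bookworm main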

Lintian task

The task_data associated with this task can contain the following keys:

  • input (required): a dictionary describing the input data; at least one of the sub-keys is required, but both can be given at the same time:

    • source_artifact_id (optional): the ID of the artifact representing the source package to be tested with lintian

    • binary_artifacts_ids (optional): a list of debian:binary-packages artifact IDs representing the binary packages to be tested with lintian (they are expected to be part of the same source package as the one identified with source_artifact_id)

Note

While it’s possible to submit only a source or only a single binary artifact, you should aim to always submit source + arch-all + arch-any related artifacts to have the best test coverage as some tags can only be emitted when lintian has access to all of them at the same time.

  • output (optional): a dictionary of values controlling some aspects of the generated artifacts

    • source_analysis (optional, defaults to True): indicates whether we want to generate the debian:lintian artifact for the source package

    • binary_all_analysis (optional, defaults to True): same as source_analysis but for the debian:lintian artifact related to Architecture: all packages

    • binary_any_analysis (optional, defaults to True): same as source_analysis but for the debian:lintian artifact related to Architecture: any packages

  • target_distribution (optional): the fully qualified name of the distribution that will provide the lintian software to analyze the packages. Defaults to debian:unstable.

  • min_lintian_version (optional): request that the analysis be performed with a lintian version that is higher or equal to the version submitted. If a satisfying version is not pre-installed and cannot be installed with apt-get install lintian, then the work request is aborted.

  • include_tags (optional): a list of the lintian tags that are allowed to be reported. If not provided (or empty), defaults to all. Translates into the --tags or --tags-from-file command line option.

  • exclude_tags (optional): a list of the lintian tags that are not allowed to be reported. If not provided (or empty), then no tags are hidden. Translates into the --suppress-tags or --suppress-tags-from-file command line option.

  • fail_on_severity (optional, defaults to none): if the analysis emits tags of that severity or higher, then the task will return a “failure” instead of a “success”. Valid values are (in decreasing severity) “error”, “warning”, “info”, “pedantic”, “experimental”, “overridden”. “none” is a special value indicating that we should never fail.
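As an illustration, a task_data for this task could look like this (the artifact IDs are made up for the example):

task_data = {
    "input": {
        "source_artifact_id": 42,
        "binary_artifacts_ids": [43, 44],
    },
    "output": {
        "source_analysis": True,
        "binary_all_analysis": True,
        "binary_any_analysis": True,
    },
    "target_distribution": "debian:unstable",
    "fail_on_severity": "error",
}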

The lintian runs will always use the options --display-level ">=classification" (>=pedantic in jessie) --no-cfg --display-experimental --info --show-overrides to collect the full set of data that lintian can provide.

Note

Current lintian can generate “masked” tags (with an M: prefix) when you use --show-overrides. For the purpose of debusine, we entirely ignore those tags on the basis that it’s lintian’s decision to hide them (and not the maintainer’s decision) and, as such, they don’t bring any useful information. Lintian has many built-in exceptions to avoid emitting some tags, and the fact that some of those rely on a modular exception mechanism that can be diverted to generate masked tags is not useful to package maintainers.

For those reasons, we suggested to lintian’s maintainers to entirely stop emitting those tags in https://bugs.debian.org/1053892

Between 1 and 3 artifacts of category debian:lintian will be generated (one for each source/binary package artifact submitted) and they will have a “relates to” relationship with the corresponding artifact that has been analyzed. These artifacts contain a lintian.txt file with the raw (unfiltered) lintian output and an analysis.json file with the details about all the tags discovered (in a top-level tags key), some statistics/summary (in a top-level summary key) and a version key with the value 1.0 if the content follows the (initial) JSON structure described below.

The summary key is also duplicated in the data field of the artifact. It is a dictionary with the following keys:

  • tags_count_by_severity: a dictionary with a sub-key for each of the possible severities documenting the number of tags of the corresponding severity that have been emitted by lintian

  • package_filename: a dictionary mapping the name of the binary or source package to its associated filename (a single-key dictionary in the case of a source package lintian analysis, a multiple-key one in the case of an analysis of binary packages)

  • tags_found: the list of non-overridden tags that have been found during the analysis

  • overridden_tags_found: the list of overridden tags that have been found during the analysis

  • lintian_version: the lintian version used for the analysis

  • distribution: the distribution in which lintian has been run
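An illustrative summary (the tag names and values are made up for the example):

summary = {
    "tags_count_by_severity": {
        "error": 0,
        "warning": 1,
        "info": 0,
        "pedantic": 0,
        "experimental": 0,
        "overridden": 1,
        "classification": 0,
    },
    "package_filename": {"hello": "hello_2.10-3.dsc"},
    "tags_found": ["some-warning-tag"],
    "overridden_tags_found": ["some-overridden-tag"],
    "lintian_version": "2.116.3",
    "distribution": "debian:unstable",
}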

The tags key in analysis.json is a sorted list of tags where each tag is represented with a dictionary. The list is sorted by the following criteria:

  • binary package name in alphabetical order (if relevant)

  • severity (from highest to lowest)

  • tag name (alphabetical order)

  • tag details (alphabetical order)

Each tag is represented with the following fields:

  • tag: the name of the tag

  • severity: one of the possible severities (see below for full list)

  • package: the name of the binary or source package (there is no risk of confusion between a source and a binary of the same name as the artifact with the analysis is dedicated either to a source package or to a set of binary packages, but not to both at the same time)

  • note: the details associated with the tag (those are printed after the tag name in the lintian output)

  • pointer: the optional part shown between angle brackets that gives a specific location for the issue (often a filename and a line number)

  • explanation: the long description shown after a tag with --info, aka the lines prefixed with N: (they always start and end with an empty line)

  • comment: the maintainer’s comment shown on lines prefixed with N: just before a given overridden tag (those lines can be identified by the lack of an empty line between them and the tag)
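Put together, a single entry in tags could look like this (all values are illustrative):

tag = {
    "tag": "some-overridden-tag",
    "severity": "overridden",
    "package": "hello",
    "note": "details printed after the tag name",
    "pointer": "debian/control:12",
    "explanation": "Long description shown with --info.",
    "comment": "Maintainer's comment justifying the override.",
}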

Note

Here’s the ordered list of all the possible severities (from highest to lowest):

  • error

  • warning

  • info

  • pedantic

  • experimental

  • overridden

  • classification

Note that experimental and overridden are not true tag severities, but lintian’s output replaces the usual severity field for those tags with X or O and it is thus not easily possible to capture the original severity.

And while classification is implemented like a low-severity issue, those tags do not represent real issues; they are just a convenient way to export data generated while doing the analysis.

Sbuild task

Regarding inputs, the sbuild task is compatible with the ontology defined for Task PackageBuild even though it implements only a subset of the possible options at this time.

Todo

Document the outputs of the task (artifacts and their relationships)

Piuparts task

A specific task to represent a binary package check using the piuparts utility.

The task_data associated with this task can contain the following keys:

  • input (required): a dictionary describing the input data

    • binary_artifacts_ids (required): a list of debian:binary-packages artifact IDs representing the binary packages to be tested. Multiple artifacts can be provided so as to support e.g. testing binary packages from split indep/arch builds.

  • distribution (required): name of the target distribution.

  • host_architecture (required): the architecture that we want to test on.

The piuparts output will be provided as a new artifact.
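For example (the artifact IDs are made up):

task_data = {
    "input": {
        "binary_artifacts_ids": [43, 44],  # e.g. the arch-all and arch-any builds
    },
    "distribution": "bookworm",
    "host_architecture": "amd64",
}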

Internal API: debusine.tasks

Collection of tasks.

The debusine.tasks module hierarchy hosts a collection of Task subclasses that are used by workers to fulfill the WorkRequests sent by the debusine scheduler.

Creating a new task requires adding a new file containing a class inheriting from the Task base class. The name of the class must be unique among all child classes.

A child class must, at the very least, override the Task.execute() method.
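A minimal sketch of such a child class (the task logic is hypothetical):

from debusine.tasks import Task


class MyCheck(Task):
    """Hypothetical task running a custom check."""

    TASK_VERSION = 1

    def execute(self) -> bool:
        # A real task would act on self.data (populated by configure())
        # and record its results, e.g. by uploading artifacts.
        self.logger.info("running %s", self.name)
        return True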

class debusine.tasks.Task[source]

Base class for tasks.

A Task object serves two purposes: encapsulating the logic of what needs to be done to execute the task (cf configure() and execute() that are run on a worker), and supporting the scheduler by determining if a task is suitable for a given worker. That is done in a two-step process: collating metadata from each worker (with the analyze_worker() method that is run on a worker) and then, based on this metadata, checking if a task is suitable (with can_run_on() that is executed on the scheduler).

TASK_DATA_SCHEMA: dict[str, Any] = {}

Can be overridden to enable jsonschema validation of the task_data parameter passed to configure().

TASK_VERSION: Optional[int] = None

Must be overridden by child classes to document the current version of the task’s code. A task will only be scheduled on a worker if its task version is the same as the one running on the scheduler.

__init__()[source]

Initialize the task.

abort()[source]

Mark the task as not needing to be executed. Once aborted, the status cannot be changed.

property aborted: bool

Return whether the task is aborted.

Tasks cannot transition from aborted -> not-aborted.

analyze_worker() dict[source]

Return dynamic metadata about the current worker.

This method is called on the worker to collect information about the worker. The information is stored as a set of key-value pairs in a dictionary.

That information is then reused on the scheduler to be fed to can_run_on() and determine if a task is suitable to be executed on the worker.

Derived objects can extend the behaviour by overriding the method, calling metadata = super().analyze_worker(), and then adding supplementary data in the dictionary.

To avoid conflicts on the names of the keys used by different tasks you should use key names obtained with self.prefix_with_task_name(...).

Returns:

a dictionary describing the worker.

Return type:

dict.
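For instance, an override could look like this (the metadata key is hypothetical):

def analyze_worker(self) -> dict:
    metadata = super().analyze_worker()
    # Namespace the key with the task name to avoid conflicts between tasks
    metadata[self.prefix_with_task_name("available")] = True
    return metadata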

classmethod analyze_worker_all_tasks()[source]

Return dictionary with metadata for each task in Task._sub_tasks.

Subclasses of Task get registered in Task._sub_tasks. Return a dictionary with the metadata of each of the subtasks.

This method is executed in the worker when submitting the dynamic metadata.

append_to_log_file(filename: str, lines: list[str]) None[source]

Open log file and write contents into it.

Parameters:
  • filename – name of the log file, opened with self.open_debug_log_file(filename)

  • lines – contents written to the log file

can_run_on(worker_metadata: dict) bool[source]

Check if the specified worker can run the task.

This method shall take its decision solely based on the supplied worker_metadata and on the configured task data (self.data).

The default implementation always returns True, except if there’s a mismatch between TASK_VERSION on the scheduler side and on the worker side.

Derived objects can implement further checks by overriding the method in the following way:

if not super().can_run_on(worker_metadata):
    return False

if ...:
    return False

return True
Parameters:

worker_metadata (dict) – The metadata collected from the worker by running analyze_worker() on all the tasks on the worker under consideration.

Returns:

the boolean result of the check.

Return type:

bool.

static class_from_name(sub_task_class_name: str) Type[Task][source]

Return the class for sub_task_class_name (case-insensitive).

__init_subclass__() registers Task subclasses into Task._sub_tasks.

configure(task_data)[source]

Configure the task with the supplied task_data.

The supplied data is first validated against the JSON schema defined in the TASK_DATA_SCHEMA class attribute. If validation fails, a TaskConfigError is raised. Otherwise, the supplied task_data is stored in the data attribute.

Derived objects can extend the behaviour by overriding the method and calling super().configure(task_data); however, the extra checks must not access any resource of the worker, as the method can also be executed on the server when it tries to schedule work requests.

Parameters:

task_data (dict) – The supplied data describing the task.

Raises:

TaskConfigError – if the JSON schema is not respected.
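For instance, a derived class could extend configure() like this (the validated key is hypothetical; the snippet assumes TaskConfigError is imported from debusine.tasks):

def configure(self, task_data):
    super().configure(task_data)
    # Extra checks must not access any resource of the worker
    if not task_data.get("distribution"):
        raise TaskConfigError("distribution must not be empty")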

configure_server_access(debusine: Debusine)[source]

Set the Debusine object used to access the server.

execute() bool[source]

Call the _execute() method, upload debug artifacts.

See _execute() for more information.

Returns:

result of the _execute() method.

execute_logging_exceptions() bool[source]

Execute self.execute(), logging any raised exceptions.

find_file_by_suffix(directory: Path, suffix: str) Optional[Path][source]

Find file in directory with the specified suffix.

If there is no file ending with suffix, or if there is more than one such file, return None and write a log in the directory.

Parameters:
  • directory – directory in which to find the file (not recursive).

  • suffix – suffix to find.

Returns:

file path or None

static is_valid_task_name(task_name) bool[source]

Return True if task_name is registered (its class is imported).

logger

A logging.Logger instance that can be used in child classes when you override methods to implement the task.

name

The name of the task. It is computed by __init__() by converting the class name to lowercase.

open_debug_log_file(filename: str, *, mode='a') Union[TextIO, BinaryIO][source]

Open a temporary file and return it.

The files are always in the same temporary directory; calling it twice with the same file name will open the same file.

The caller must call .close() when finished writing.

prefix_with_task_name(text: str) str[source]
Returns:

the text prefixed with the task name and a colon.

static task_names() list[str][source]

Return list of sub-task names.

exception debusine.tasks.TaskConfigError[source]

Halt the task due to invalid configuration.

Task to build Debian packages with sbuild.

This task module implements the PackageBuild ontology for its task_data: https://freexian-team.pages.debian.net/debusine/design/ontology.html#task-packagebuild

class debusine.tasks.sbuild.Sbuild[source]

Task implementing a Debian package build with sbuild.

TASK_DATA_SCHEMA: dict[str, Any] = {
    'type': 'object',
    'additionalProperties': False,
    'required': ['input', 'distribution', 'host_architecture'],
    'properties': {
        'input': {
            'type': 'object',
            'additionalProperties': False,
            'required': ['source_artifact_id'],
            'properties': {
                'source_artifact_id': {'type': 'integer'},
            },
        },
        'distribution': {'type': 'string'},
        'host_architecture': {'type': 'string'},
        'build_components': {
            'type': 'array',
            'uniqueItems': True,
            'items': {'enum': ['any', 'all', 'source']},
        },
        'sbuild_options': {'type': 'array', 'items': {'type': 'string'}},
    },
}

Can be overridden to enable jsonschema validation of the task_data parameter passed to configure().

TASK_VERSION: Optional[int] = 1

Must be overridden by child classes to document the current version of the task’s code. A task will only be scheduled on a worker if its task version is the same as the one running on the scheduler.

__init__()[source]

Initialize the sbuild task.

analyze_worker()[source]

Report metadata for this task on this worker.

can_run_on(worker_metadata: dict) bool[source]

Check that the specified worker can run the requested task.

property chroot_name: str

Build name of required chroot.

configure(task_data)[source]

Handle sbuild-specific configuration.

configure_for_execution(download_directory: Path) bool[source]

Configure Task: set variables needed for the build() step.

Return True if configuration worked, False if there was a problem.

Note: self.find_file_by_suffix() writes to a log file to be uploaded as an artifact.

execute() bool[source]

Verify that the task can be executed and call super().execute().

Raises:

TaskConfigError.

fetch_input(destination: Path) bool[source]

Download the source artifact.

upload_artifacts(directory: Path, *, execution_success: bool)[source]

Upload the artifacts from directory.

Parameters:
  • directory – directory containing the files that will be uploaded.

  • execution_success – if False, skip uploading the .changes and .deb/.udeb files