Artifacts

The categorization of artifacts does not enforce anything on the structure of the associated files and key-value data. However, some consistency and rules are needed to make meaningful use of the system.

This document presents the various categories that we use to manage a Debian-based distribution. For each category, we explain:

  • what associated files you can find

  • what key-value data you can expect

  • what relationships with other artifacts are likely to exist

Data key names are used in pydantic models, and must therefore be valid Python identifiers.
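The constraint on key names can be checked with the standard library alone. The sketch below is illustrative only: the helper name is_valid_data_key is hypothetical and not part of debusine, and rejecting reserved words is an assumption (a name such as class passes str.isidentifier() but cannot be written as a plain attribute).

```python
import keyword


def is_valid_data_key(name: str) -> bool:
    """Return True if ``name`` can be used as a Python attribute name."""
    # str.isidentifier() accepts reserved words such as "class", so those
    # are excluded separately (an assumption, see the text above).
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_data_key("srcpkg_name"))  # True
print(is_valid_data_key("srcpkg-name"))  # False: hyphens are not allowed
```

This is why the keys listed in this document use underscores (srcpkg_name, dsc_fields) rather than hyphens.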

Relationships

Artifacts may have the following relations with each other:

  • built-using: indicates that the build of the artifact used the target artifact (ex: “binary-packages” artifacts are built using “source-package” artifacts)

  • extends: indicates that the artifact is extending the target artifact in some way (ex: a “source-upload” artifact extends a “source-package” artifact with target distribution information)

  • relates-to: indicates that the artifact relates to another one in some way (ex: a “binary-upload” artifact relates to a “binary-package”, or a “package-build-log” artifact relates to a “binary-package”).
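The three relation types above can be modelled as a small enumeration. This is a minimal sketch for illustration, not the debusine data model: the class names RelationType and Relation, and the string identifiers used for artifacts, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(str, Enum):
    """The relation types listed in this document."""

    BUILT_USING = "built-using"
    EXTENDS = "extends"
    RELATES_TO = "relates-to"


@dataclass(frozen=True)
class Relation:
    """A directed relation from one artifact to a target artifact."""

    artifact: str  # hypothetical identifier of the artifact holding the relation
    target: str  # hypothetical identifier of the target artifact
    type: RelationType


# A binary-packages artifact built from a source package.
relation = Relation("binary-packages/42", "source-package/7", RelationType.BUILT_USING)
print(relation.type.value)  # built-using
```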

Category debian:source-package

This artifact represents a set of files that can be extracted in some way to provide a file hierarchy containing source code that can be built into debian:binary-package artifact(s).

  • Data:

    • name: the name of the source package

    • version: the version of the source package

    • type: the type of the source package

      • dpkg for a source package that can be extracted with dpkg-source -x on the .dsc file

    • dsc_fields: a parsed version of the fields available in the .dsc file

  • Files: for the dpkg type, a .dsc file and all the files referenced in that file

  • Relationships:

Category debian:binary-package

This artifact represents a single binary package (a .deb file or similar) produced during the build of a source package for a given architecture.

If the build of a source-package produces more than one binary for a given architecture, or binaries of more than one architecture, one debian:binary-package artifact is created for each binary and architecture.

  • Data:

    • srcpkg_name: the name of the source package

    • srcpkg_version: the version of the source package

    • deb_fields: a parsed version of the fields available in the .deb’s control file

    • deb_control_files: a list of the files in the .deb’s control part

  • Files: a .deb file

  • Relationships:

    • built-using: the corresponding debian:source-package

    • built-using: other debian:binary-package (for example in the case of signed packages duplicating the content of an unsigned package)

    • built-using: other debian:source-package (general case of Debian’s Built-Using field)

Category debian:binary-packages

This artifact represents the set of binary packages (.deb files and similar) produced during the build of a source package for a given architecture.

If the build of a source-package produces binaries of more than one architecture, one debian:binary-packages artifact is created for each architecture, listing only the binary packages for that architecture.

  • Data:

    • srcpkg_name: the name of the source package

    • srcpkg_version: the version of the source package

    • version: the version used for the build (can be different from the source version in case of binary-only rebuilds; note that individual binary packages may have versions that differ from this if the source package uses dpkg-gencontrol -v)

    • architecture: the architecture that the packages have been built for. Can be any real Debian architecture or all.

    • packages: the list of binary packages that are part of the build for this architecture.

  • Files: one or more .deb files

  • Relationships:

    • built-using: the corresponding debian:source-package

    • built-using: other debian:binary-package (for example in the case of signed packages duplicating the content of an unsigned package)

    • built-using: other debian:source-package (general case of Debian’s Built-Using field)
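The rule of one debian:binary-packages artifact per architecture can be sketched as follows. The function names are hypothetical, the code assumes the conventional <name>_<version>_<arch>.deb file naming, and it assumes that the build version equals the source version (that is, no binary-only rebuild).

```python
from collections import defaultdict


def group_debs_by_architecture(filenames):
    """Map each architecture to the sorted binary package names built for it."""
    packages = defaultdict(list)
    for filename in filenames:
        # Assumes the conventional <name>_<version>_<arch>.deb layout.
        stem = filename.removesuffix(".deb")
        name, _version, arch = stem.split("_")
        packages[arch].append(name)
    return {arch: sorted(names) for arch, names in packages.items()}


def binary_packages_data(srcpkg_name, srcpkg_version, filenames):
    """Build one data dictionary per architecture, sorted by architecture."""
    grouped = group_debs_by_architecture(filenames)
    return [
        {
            "srcpkg_name": srcpkg_name,
            "srcpkg_version": srcpkg_version,
            "version": srcpkg_version,  # assumption: not a binary-only rebuild
            "architecture": arch,
            "packages": names,
        }
        for arch, names in sorted(grouped.items())
    ]
```

For a build producing hello_2.10-3_amd64.deb and hello-doc_2.10-3_all.deb, this yields two dictionaries: one for all and one for amd64.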

Category debian:upload

This artifact represents an upload of source and/or binary packages. Currently uploads are always represented with a .changes file, but the structure of the artifact makes it possible to represent other kinds of uploads in the future (like uploads with signed git tags, or some debusine native internal upload).

  • Data:

    • type: the type of the upload

      • dpkg: for an upload generated out of a .changes file created by dpkg-buildpackage

    • changes_fields: a parsed version of the fields available in the .changes file

  • Files:

    • a .changes file

    • All files mentioned in the .changes file

  • Relationships:

    • extends: (optional) one debian:source-package

    • extends: (optional) one or more debian:binary-package and/or debian:binary-packages

Category debian:package-build-log

This artifact contains a package’s build log and some associated information about the corresponding package build. It is kept around for traceability and for diagnostic purposes.

  • Data:

    • source: name of the source package built

    • version: version of the source package built

    • filename: name of the log file

    • optionally, other information extracted from the build log (build time, disk space used, etc.)

  • Files:

    • a single .build file

  • Relationships:

    • relates-to: one (or more) debian:binary-package and/or debian:binary-packages built

    • relates-to: the corresponding debian:source-package (if built from a source package)

Category debian:system-tarball

This artifact contains a tarball of a Debian system. The tarball is compressed with xz.

  • Data:

    • filename: filename of the tarball inside the artifact (e.g. “system.tar.xz”).

    • vendor: name of the distribution vendor (can be found in ID field in /etc/os-release)

    • codename: name of the distribution used to bootstrap the system

    • mirror: URL of the mirror used to bootstrap the system

    • variant: value of --variant parameter of debootstrap

    • pkglist: a dictionary listing versions of installed packages (cf dpkg-query -W)

    • architecture: the architecture of the Debian system

    • with_dev: boolean value indicating whether /dev has been populated with the most important special files in /dev (null, zero, full, random, urandom, tty, console, ptmx) as well as some usual symlinks (fd, stdin, stdout, stderr).

    • with_init: boolean value indicating whether the system contains an init system in /sbin/init and can thus be “booted” in a container.

  • Files:

    • $filename (e.g. system.tar.xz): tarball of the Debian system

  • Relationships:

    • None.
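The pkglist dictionary can be derived from the output of dpkg-query -W, which by default prints one package per line as a name and a version separated by a tab. The sketch below only parses such text; the function name parse_pkglist is hypothetical, and keeping the package name exactly as printed (including any architecture qualifier) is an assumption.

```python
def parse_pkglist(output: str) -> dict[str, str]:
    """Turn ``dpkg-query -W`` output into a {package: version} dictionary."""
    pkglist = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Default dpkg-query -W format: "<package>\t<version>".
        name, _, version = line.partition("\t")
        pkglist[name] = version
    return pkglist


sample = "base-files\t12.4+deb12u5\nlibc6:amd64\t2.36-9+deb12u4\n"
print(parse_pkglist(sample)["base-files"])  # 12.4+deb12u5
```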

Note

In preparation for supporting different compression schemes, we have decided that the extension of the filename dictates the compression scheme used and that it should be compatible with tar --auto-compress.
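A consumer of the artifact could therefore derive the compression scheme from the filename field alone. The mapping below is a sketch that covers only a subset of the suffixes recognized by GNU tar; the function name compression_for is hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Subset of the suffixes understood by GNU tar --auto-compress.
_COMPRESSION_BY_SUFFIX = {
    ".xz": "xz",
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".zst": "zstd",
}


def compression_for(filename: str) -> Optional[str]:
    """Return the compression scheme implied by the last file extension."""
    # PurePath.suffix only returns the final extension, e.g. ".xz" for
    # "system.tar.xz"; an uncompressed "system.tar" maps to None.
    return _COMPRESSION_BY_SUFFIX.get(PurePath(filename).suffix)


print(compression_for("system.tar.xz"))  # xz
```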

Category debian:system-image

This artifact contains a disk image of a Debian system. The disk has a GPT partition table and is EFI bootable (for architectures that support EFI). The disk image should usually contain an EFI partition and a partition for the root filesystem. The root partition should use a partition type UUID respecting the discoverable partitions specification.

  • Data:

    • Same as debian:system-tarball with some extra fields. The filename field points to the disk image.

    • image_format: indicates the format of the image (e.g. raw, qcow2)

    • filesystem: indicates the filesystem used on the root filesystem (e.g. ext4, btrfs, iso9660)

    • size: indicates the size of the root filesystem (in bytes)

    • boot_mechanism: a list of all the ways that the image can be booted. Valid values are efi and bios.

  • Files:

    • $filename (e.g. image.tar.xz, image.qcow2, image.iso): the nature of the file depends on the image_format field.

  • Relationships:

    • None.

Note

At this point, we expect official Debusine tasks to only generate and use images that are bootable with EFI. But the artifact specification has the boot_mechanism key to be future-proof and for the benefit of custom tasks that would make different choices.

raw image format

The image itself is wrapped in an xz-compressed tarball to be able to properly support sparse filesystem images (i.e. files with holes without any data) and to save some space with compression.

The filename field points to the tarball that should contain a root.img file which is the raw disk image.
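Retrieving the raw disk image from such a tarball could look like the sketch below, which uses Python's tarfile module. The function name is hypothetical, and the code assumes that the member is stored under the exact name root.img. Copying the stream this way writes out the zero bytes and does not recreate holes, so the result is not sparse.

```python
import shutil
import tarfile
from pathlib import Path


def extract_raw_image(tarball: Path, destination: Path) -> Path:
    """Copy root.img out of an xz-compressed tarball and return its path."""
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / "root.img"
    with tarfile.open(tarball, mode="r:xz") as archive:
        # getmember() raises KeyError if the tarball has no root.img.
        member = archive.getmember("root.img")
        source = archive.extractfile(member)
        if source is None:
            raise ValueError("root.img is not a regular file")
        with source, open(target, "wb") as output:
            shutil.copyfileobj(source, output)
    return target
```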

qcow2 image format

The filename field points directly to the qcow2 image.

Category debusine:work-request-debug-logs

  • Data: empty

  • Files:

    • any number of files containing logs and information to help a debusine user understand the WorkRequest output: commands executed, output of the commands, etc.

  • Relationships:

    • relates-to: the corresponding debian:source-package (if built from a source package)

Category debian:blhc

  • Data: empty

  • Files:

    • One blhc output file called blhc.txt for the corresponding input build log.

  • Relationships:

    • relates-to: the corresponding debian:package-build-log input artifact.

Category debian:lintian

  • Files:

    • lintian.txt: the raw (unfiltered) lintian output

    • analysis.json: the details about all the tags discovered (in a top-level tags key), some statistics/summary (in a top-level summary key) and a version key with the value 1.0 if the content follows the (initial) JSON structure described below.

  • Data:

    • summary: a duplicate of the summary key in analysis.json

  • analysis.json structure:

    • version: always 1.0

    • summary: a dictionary with the following keys:

      • tags_count_by_severity: a dictionary with a sub-key for each of the possible severities documenting the number of tags of the corresponding severity that have been emitted by lintian

      • package_filename: a dictionary mapping the name of the binary or source package to its associated filename (it has a single key when a source package is analyzed, and possibly several keys when binary packages are analyzed)

      • tags_found: the list of non-overridden tags that have been found during the analysis

      • overridden_tags_found: the list of overridden tags that have been found during the analysis

      • lintian_version: the lintian version used for the analysis

      • distribution: the distribution in which lintian has been run

    • tags: a sorted list of tags where each tag is represented with a dictionary. The list is sorted by the following criteria:

      • binary package name in alphabetical order (if relevant)

      • severity (from highest to lowest)

      • tag name (alphabetical order)

      • tag details (alphabetical order)

      Each tag is represented with the following fields:

      • tag: the name of the tag

      • severity: one of the possible severities (see below for full list)

      • package: the name of the binary or source package (there is no risk of confusion between a source and a binary of the same name, as the artifact with the analysis is dedicated either to a source package or to a set of binary packages, but not to both at the same time)

      • note: the details associated to the tag (those are printed after the tag name in the lintian output)

      • pointer: the optional part shown between angle brackets that gives a specific location for the issue (often a filename and a line number)

      • explanation: the long description shown after a tag with --info, aka the lines prefixed with N: (they always start and end with an empty line)

      • comment: the maintainer’s comment shown on lines prefixed with N: just before a given overridden tag (those lines can be identified by the lack of an empty line between them and the tag)

Note

Here’s the ordered list of all the possible severities (from highest to lowest):

  • error

  • warning

  • info

  • pedantic

  • experimental

  • overridden

  • classification

Note that experimental and overridden are not true tag severities, but lintian’s output replaces the usual severity field for those tags with X or O and it is thus not easily possible to capture the original severity.

And while classification is implemented like a low-severity issue, those tags do not represent real issues, they are just a convenient way to export data generated while doing the analysis.

  • Relationships:

    • relates-to: the corresponding artifacts that have been analyzed. They can be of type debian:source-package, debian:binary-package, debian:binary-packages, or debian:upload.
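The ordering of the tags list and the tags_count_by_severity summary described above can be sketched as follows. The function names are hypothetical, and the code assumes that the "tag details" sort criterion corresponds to the note field of each tag.

```python
from collections import Counter

# Severities from highest to lowest, as listed in this document.
SEVERITIES = [
    "error",
    "warning",
    "info",
    "pedantic",
    "experimental",
    "overridden",
    "classification",
]
_RANK = {severity: index for index, severity in enumerate(SEVERITIES)}


def sort_tags(tags):
    """Sort by package name, severity (highest first), tag name, then note."""
    return sorted(
        tags,
        key=lambda tag: (
            tag.get("package", ""),
            _RANK[tag["severity"]],
            tag["tag"],
            tag.get("note", ""),  # assumption: "tag details" is the note field
        ),
    )


def count_by_severity(tags):
    """Return a count for every possible severity, including zeros."""
    counts = Counter(tag["severity"] for tag in tags)
    return {severity: counts.get(severity, 0) for severity in SEVERITIES}
```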

Category debian:autopkgtest

  • Data:

    • results: a dictionary with details about the tests that have been run. Each key is the name of the test (as shown in the summary file) and the value is another dictionary with the following keys:

      • status: one of PASS, FAIL, FLAKY or SKIP

      • details: more details when available

    • cmdline: the complete command line that has been used for the run

    • source_package: a dictionary with some information about the source package hosting the tests that have been run. It has the following sub-keys:

      • name: the name of the source package

      • version: the version of the source package

      • url: the URL of the source package

    • architecture: the architecture of the system where tests have been run

    • distribution: the distribution of the system where tests have been run (formatted as VENDOR:CODENAME)

  • Files:

    • Every file found in the autopkgtest output directory, except for files in binaries/ that are excluded to save space.

  • Relationships:

    • relates-to: the artifacts used as input that are part of the source package being tested. They can be of types debian:source-package, debian:binary-packages or debian:upload
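A consumer of the results dictionary may want to validate the status values and tally them. The sketch below is illustrative: the function name count_statuses and the sample test names are hypothetical.

```python
from collections import Counter

VALID_STATUSES = {"PASS", "FAIL", "FLAKY", "SKIP"}


def count_statuses(results):
    """Count tests per status, rejecting statuses not listed in this document."""
    for test_name, result in results.items():
        if result["status"] not in VALID_STATUSES:
            raise ValueError(f"{test_name}: unknown status {result['status']!r}")
    return dict(Counter(result["status"] for result in results.values()))


results = {
    "smoke": {"status": "PASS"},
    "upstream-tests": {"status": "FAIL", "details": "non-zero exit status 1"},
    "gui": {"status": "SKIP", "details": "no display available"},
}
print(count_statuses(results))  # {'PASS': 1, 'FAIL': 1, 'SKIP': 1}
```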

Category debusine:signing-key

This artifact records the existence of a key in the signing service. To avoid expiry, these artifacts should also be added to the collection that they are intended to be used with; they are therefore valid items in debian:suite-signing-keys collections.

  • Data:

    • purpose: the purpose of this key: uefi or openpgp

    • fingerprint: the fingerprint of this key

    • public_key: the base64-encoded public key

  • Files: none

  • Relationships: none

Todo

We’ll eventually need to deal with removing private keys from the signing service when they’re no longer referenced by anything in the debusine database, but it’s not clear exactly what the lifetime rules should be. See https://salsa.debian.org/freexian-team/debusine/-/merge_requests/616#note_497019.

Category debusine:signing-input

This artifact provides input to a Sign task. It will typically be created by the ExtractForSigning task or the Sbuild task.

  • Data:

    • trusted_certs: a list of SHA-256 fingerprints of certificates built into the signed code as roots of trust for verifying additional privileged code (see Describing the trust chain). If present, all the listed fingerprints must be listed in the DEBUSINE_SIGNING_TRUSTED_CERTS Django setting. This is used to avoid accidentally creating trust chains from production to test signing certificates.

    • binary_package_name: the name of the binary package that this artifact was extracted from, if any

  • Files: one or more files to be signed

  • Relationships:

    • relates-to: any other artifacts from which the files to be signed were extracted
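The trusted_certs rule can be expressed as a simple membership check. In this sketch, allowed stands in for the value of the DEBUSINE_SIGNING_TRUSTED_CERTS setting, the function name is hypothetical, and fingerprints are compared as plain strings without any normalization, which is an assumption.

```python
def check_trusted_certs(trusted_certs, allowed):
    """Raise ValueError if any listed fingerprint is not in ``allowed``."""
    if trusted_certs is None:
        return  # the field is optional: nothing to check
    unknown = [fp for fp in trusted_certs if fp not in allowed]
    if unknown:
        raise ValueError(f"untrusted certificate fingerprints: {unknown}")
```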

Category debusine:signing-output

This artifact contains the output of a Sign task.

  • Data:

    • purpose: the purpose of the key used to sign these files: uefi or openpgp.

    • fingerprint: the fingerprint of the key used to sign these files

    • results: a list of dictionaries describing signed files, as follows (exactly one of output_file and error_message must be present):

      • file: name of the file that was signed

      • output_file: name of the file containing the signature

      • error_message: error message resulting from attempting to sign the file

  • Files:

    • zero or more files containing signatures

  • Relationships:
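The constraint on each entry of results (exactly one of output_file and error_message must be present) can be checked as sketched below. The function name is hypothetical, and treating a key set to None as absent is an assumption.

```python
def validate_signing_result(result: dict) -> None:
    """Check one entry of the ``results`` list of a signing-output artifact."""
    if "file" not in result:
        raise ValueError("missing 'file'")
    has_output = result.get("output_file") is not None
    has_error = result.get("error_message") is not None
    # Both present or both absent violates the "exactly one" rule.
    if has_output == has_error:
        raise ValueError(
            "exactly one of 'output_file' and 'error_message' must be present"
        )
```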

Category debusine:promise

This bare data item represents an artifact that will eventually be provided as the output of some existing work request. The bare data item may contain additional properties of said artifact to help differentiate them. These items are created by workflows when they populate their work request graphs.

Workflows add promises to their internal collection using the same name as they use for the expected artifact.

  • Data:

    • promise_work_request_id (integer, required): the ID of the work request that is expected to fulfil this promise

    • promise_workflow_id (integer, required): the ID of the workflow work request that created this promise

    • promise_category (string, required): the category of the artifact that is expected to be provided

Other data items should match the per-item data structure used when adding the promised artifact to the workflow’s internal collection, which is determined by the parent workflow based on its protocol for communicating with other workflows. This allows constructing lookups that match either a promise or the promised artifact. Names starting with promise_ are reserved.

When a work request is retried, it must update the work request ID in any associated promises.
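The structure of a promise and the reserved promise_ prefix can be illustrated with the sketch below. The function names and the sample IDs are hypothetical, and the extra keys shown in the test stand in for whatever per-item data the parent workflow defines.

```python
def make_promise_data(work_request_id, workflow_id, category, **extra):
    """Build the data of a debusine:promise item."""
    reserved = sorted(key for key in extra if key.startswith("promise_"))
    if reserved:
        # Names starting with promise_ are reserved for the keys below.
        raise ValueError(f"reserved key names: {reserved}")
    return {
        "promise_work_request_id": work_request_id,
        "promise_workflow_id": workflow_id,
        "promise_category": category,
        **extra,
    }


def retarget_promise(data, new_work_request_id):
    """Return a copy of the promise pointing at the retried work request."""
    return {**data, "promise_work_request_id": new_work_request_id}
```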

Category debian:debdiff

  • Data:

    • original: The name of the first file passed to debdiff.

    • new: The name of the second file passed to debdiff.

  • Files:

    • One debdiff output file called debdiff.txt for the corresponding two source or binary packages.

  • Relationships:

    • relates-to: two debian:source-package or debian:binary-package