.. _collections:

===========
Collections
===========

Collections are abstract aggregates of artifacts.  To be meaningfully
usable, each collection must be assigned a category, and each category
defines some additional key-value data for the collection.

Some additional key-value data is associated with each item in a
collection.  The structure of that data also depends on the category used
for the collection.

Items in collections may be looked up using various names, depending on the
category.  These names are analogous to URL routing in web applications (and
indeed could be used by debusine's URL routing, as well as when inspecting
the collection directly): a name resolves to at most one item at a time, and
an item may be accessible via more than one name.  The existence of multiple
"lookup names" that resolve to an item does not imply duplicates of that
item or any associated artifacts.

All collections support a generic ``name:NAME`` lookup, which returns the
active item whose ``name`` is equal to ``NAME``.

Data and per-item data key names are used in ``pydantic`` models, and must
therefore be valid Python identifiers.

.. _collection-derived:

Derived collections
===================

To support automated QA at the scale of a distribution, some collections are
derived automatically from other collections.  For example, the collection
of Lintian output for a suite would be derived automatically by running a
Lintian task on each of the packages in the corresponding ``debian:suite``
collection.

Such collections have additional information to allow keeping track of what
work needs to be done to keep them up to date:

* Per-item data:

  * ``derived_from``: a list of the internal collection item IDs from which
    this item was derived

Implementations of the :ref:`update-derived-collection-task` use this
information to keep such derived collections up to date.

.. _collection-archive:

Category ``debian:archive``
===========================

This collection represents a `Debian archive (a.k.a.
repository) `_.

* Variables when adding items: none

* Data:

  * ``may_reuse_versions``: if true, versions of packages in this archive
    may be reused provided that the previous packages with that version have
    been removed; this should be false for typical user-facing archives to
    avoid confusing behaviour from apt, but it may be useful to set it to
    true for experimental archives

* Valid items:

  * ``debian:suite`` collections

* Per-item data: none

* Lookup names:

  * ``name:NAME``: the suite whose ``name`` property is ``NAME``
  * ``source-version:NAME_VERSION``: the source package named ``NAME`` at
    ``VERSION``.
  * ``binary-version:NAME_VERSION_ARCHITECTURE``: the set of binary packages
    on ``ARCHITECTURE`` whose ``srcpkg_name`` property is ``NAME`` and whose
    ``version`` property is ``VERSION``.

* Constraints:

  * there may be at most one package with a given name and version (and
    architecture, in the case of binary packages) active in the collection
    at a given time, although the same package may be in multiple suites
  * each poolified file name resulting from an active artifact may only
    refer to at most one concrete file in the collection at a given time
    (this differs from the above constraint in the case of source packages,
    which contain multiple files that may overlap with other source
    packages)
  * if ``may_reuse_versions`` is false, then each poolified file name in the
    collection may only refer to at most one concrete file, regardless of
    whether conflicting files are active or removed

.. _collection-suite:

Category ``debian:suite``
=========================

This collection represents a single `suite `_ in a Debian archive.  Its
``name`` is the name of the suite.

* Variables when adding items:

  * ``component``: the component (e.g. ``main`` or ``non-free``) in which
    this package is published
  * ``section``: the section (e.g. ``python``) for this package
  * ``priority``: for binary packages, the priority (e.g.
    ``optional``) for this package

* Data:

  * ``release_fields``: dictionary of static fields to set in this suite's
    ``Release`` file
  * ``may_reuse_versions``: if true, versions of packages in this suite may
    be reused provided that the previous packages with that version have
    been removed; this should be false for typical user-facing suites to
    avoid confusing behaviour from apt, but it may be useful to set it to
    true for experimental suites

* Valid items:

  * ``debian:source-package`` artifacts
  * ``debian:binary-package`` artifacts

* Per-item data:

  * ``srcpkg_name``: for binary packages, the name of the corresponding
    source package (copied from underlying artifact for ease of lookup and
    to preserve history)
  * ``srcpkg_version``: for binary packages, the version of the
    corresponding source package (copied from underlying artifact for ease
    of lookup and to preserve history)
  * ``package``: the name from the package's ``Package:`` field (copied from
    underlying artifact for ease of lookup and to preserve history)
  * ``version``: the version of the package (copied from underlying artifact
    for ease of lookup and to preserve history)
  * ``architecture``: for binary packages, the architecture of the package
    (copied from underlying artifact for ease of lookup and to preserve
    history)
  * ``component``: the component (e.g. ``main`` or ``non-free``) in which
    this package is published
  * ``section``: the section (e.g. ``python``) for this package
  * ``priority``: for binary packages, the priority (e.g. ``optional``) for
    this package

* Lookup names:

  * ``source:NAME``: the current version of the source package named
    ``NAME``.
  * ``source-version:NAME_VERSION``: the source package named ``NAME`` at
    ``VERSION``.
  * ``binary:NAME_ARCHITECTURE``: the current version of the binary package
    named ``NAME`` on ``ARCHITECTURE``.
  * ``binary-version:NAME_VERSION_ARCHITECTURE``: the binary package named
    ``NAME`` at ``VERSION`` on ``ARCHITECTURE``.

* Constraints:

  * there may be at most one package with a given name and version (and
    architecture, in the case of binary packages) active in the collection
    at a given time
  * each poolified file name resulting from an active artifact may only
    refer to at most one concrete file in the collection at a given time
    (this differs from the above constraint in the case of source packages,
    which contain multiple files that may overlap with other source
    packages)
  * if ``may_reuse_versions`` is false, then each poolified file name in the
    collection may only refer to at most one concrete file, regardless of
    whether conflicting files are active or removed

.. _collection-environments:

Category ``debian:environments``
================================

.. todo::

   The definition of this category is not yet fully agreed.  We'll revisit
   it when we're closer to being able to try out an implementation so that
   we can see how the lookup mechanisms will work.

This collection represents a group of :ref:`debian:system-tarball ` and/or
:ref:`debian:system-image ` artifacts, such as the tarballs used by build
daemons across each suite and architecture.

In the short term, there will be one ``debian:environments`` collection per
distribution vendor with the collection name set to the name of the vendor
(e.g. "debian"), so that it can be looked up by the vendor's name.  This is
subject to change.
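
The lookup names listed for this category take the form
``tarball:CODENAME:ARCHITECTURE`` or ``image:CODENAME:ARCHITECTURE``,
optionally followed by ``:VARIANT``.  As a rough illustration only, the
following sketch resolves such a name against an in-memory list of active
items.  ``EnvironmentItem`` and ``lookup_environment`` are hypothetical names
invented for this example, not debusine APIs, and the handling of a missing
``VARIANT`` (matching only items without a variant) is an assumption, since
the definition of this category is not yet settled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnvironmentItem:
    """Hypothetical stand-in for an active item of the collection."""

    kind: str  # "tarball" or "image"
    codename: str
    architecture: str
    variant: Optional[str] = None


def lookup_environment(items, lookup):
    """Resolve KIND:CODENAME:ARCHITECTURE[:VARIANT] to at most one item."""
    parts = lookup.split(":")
    if len(parts) not in (3, 4) or parts[0] not in ("tarball", "image"):
        raise ValueError(f"unsupported lookup name: {lookup!r}")
    kind, codename, architecture = parts[:3]
    # Assumption: a lookup without VARIANT only matches items without one.
    variant = parts[3] if len(parts) == 4 else None
    for item in items:
        if (item.kind, item.codename, item.architecture, item.variant) == (
            kind,
            codename,
            architecture,
            variant,
        ):
            # The category's constraint allows at most one such active item.
            return item
    return None
```

With items for a plain ``bookworm``/``amd64`` tarball and an ``autopkgtest``
variant of it, ``tarball:bookworm:amd64`` would return the former and
``tarball:bookworm:amd64:autopkgtest`` the latter under these assumptions.
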

* Variables when adding items:

  * ``codename`` (optional): set the distribution version codename for this
    environment (defaults to the codename that the artifact was built for)
  * ``variant`` (optional): identifier indicating what kind of tarball or
    image this is; for example, an image optimized for use with autopkgtest
    might have its variant set to "autopkgtest"

* Data: none

* Valid items:

  * ``debian:system-tarball`` artifacts
  * ``debian:system-image`` artifacts

* Per-item data:

  * ``codename``: codename of the distribution version (copied from
    underlying artifact for ease of lookup and to preserve history, but may
    be overridden to reuse the same tarball for another distribution
    version)
  * ``architecture``: architecture name (copied from underlying artifact for
    ease of lookup and to preserve history)
  * ``variant``: an optional identifier indicating what kind of tarball or
    image this is; for example, an image optimized for use with autopkgtest
    might have its variant set to "autopkgtest"

* Lookup names:

  * ``tarball:CODENAME:ARCHITECTURE``: the current system tarball for
    codename ``CODENAME`` and architecture ``ARCHITECTURE``
  * ``tarball:CODENAME:ARCHITECTURE:VARIANT``: the current system tarball
    for codename ``CODENAME``, architecture ``ARCHITECTURE``, and variant
    ``VARIANT``
  * ``image:CODENAME:ARCHITECTURE``: the current system image for codename
    ``CODENAME`` and architecture ``ARCHITECTURE``
  * ``image:CODENAME:ARCHITECTURE:VARIANT``: the current system image for
    codename ``CODENAME``, architecture ``ARCHITECTURE``, and variant
    ``VARIANT``

* Constraints:

  * there may be at most one active tarball or image respectively with a
    given vendor, codename, variant and architecture at a given time

.. _collection-suite-lintian:

Category ``debian:suite-lintian``
=================================

This :ref:`derived collection ` represents a group of :ref:`debian:lintian
artifacts ` for packages in a :ref:`debian:suite collection `.

Lintian analysis tasks are performed on combinations of source and binary
packages together, since that provides the best test coverage.  The
resulting ``debian:lintian`` artifacts are related to all the source and
binary artifacts that were used by that task, and each of the items in this
collection is recorded as being derived from all the base
``debian:source-package`` or ``debian:binary-package`` artifacts that were
used in building the associated ``debian:lintian`` artifact.  However, each
item in this collection has exactly one architecture (including ``source``
and ``all``) in its metadata; as a result, source packages and
``Architecture: all`` binary packages may be base items for multiple derived
items at once.

Item names are set to ``{package}_{version}_{architecture}``, substituting
values from the per-item data described below.

* Variables when adding items: none

* Data: none

* Valid items:

  * ``debian:lintian`` artifacts

* Per-item data:

  * ``package``: the name of the source package being analyzed, or the
    source package from which the binary package being analyzed was built
  * ``version``: the version of the source package being analyzed, or the
    source package from which the binary package being analyzed was built
  * ``architecture``: ``source`` for a source analysis, or the appropriate
    architecture name for a binary analysis

* Lookup names:

  * ``latest:PACKAGE_ARCHITECTURE``: the latest analysis for the source
    package named ``PACKAGE`` on ``ARCHITECTURE``.
  * ``version:PACKAGE_VERSION_ARCHITECTURE``: the analysis for the source
    package named ``PACKAGE`` at ``VERSION`` on ``ARCHITECTURE``.

* Constraints:

  * there may be at most one analysis for a given source package name,
    version, and architecture active in the collection at a given time

For example, given ``hello_1.0.dsc``, ``hello-doc_1.0_all.deb``,
``hello_1.0_amd64.deb``, and ``hello_1.0_s390x.deb``, the following items
would exist:

* ``hello_1.0_source``, with ``{"package": "hello", "version": "1.0",
  "architecture": "source"}`` as per-item data, derived from
  ``hello_1.0.dsc`` and some binary packages
* ``hello_1.0_all``, with ``{"package": "hello", "version": "1.0",
  "architecture": "all"}`` as per-item data, derived from ``hello_1.0.dsc``,
  ``hello-doc_1.0_all.deb``, and possibly some other binary packages
* ``hello_1.0_amd64``, with ``{"package": "hello", "version": "1.0",
  "architecture": "amd64"}`` as per-item data, derived from
  ``hello_1.0.dsc``, ``hello-doc_1.0_all.deb``, and ``hello_1.0_amd64.deb``
* ``hello_1.0_s390x``, with ``{"package": "hello", "version": "1.0",
  "architecture": "s390x"}`` as per-item data, derived from
  ``hello_1.0.dsc``, ``hello-doc_1.0_all.deb``, and ``hello_1.0_s390x.deb``

.. _collection-workflow-internal:

Category ``debusine:workflow-internal``
=======================================

This collection stores runtime data of a :ref:`workflow `.  Bare items can
be used to store arbitrary JSON data, while artifact items can help to share
artifacts between all the tasks (and help retain them for long-running
workflows).

Items are normally added to this collection using the
:ref:`action-update-collection-with-artifacts` action.

* Variables when adding items: none; pass an item name instead

* Data: none

* Valid items: artifacts of any category

* Per-item data: none

* Lookup names: only the standard ``name:NAME`` lookup

.. note::

   When a workflow is contained within another workflow, they share the same
   internal collection, so that a sub-workflow can access the artifacts
   produced by its parent workflow.

.. note::

   The artifacts referenced through the internal collection should not
   expire while the workflow is running, but they should be allowed to
   expire once the workflow's expiration delay is over.  This will likely
   require the ability to flag a collection as not retaining the artifacts
   it contains; delete-expired-artifact will then have to be able to remove
   artifacts from collections that do not retain their artifacts.

   Workflow instances can only expire when their internal collection no
   longer contains any artifact.  Otherwise the workflow instance is kept to
   facilitate the analysis of (the origin of) artifacts that were created by
   the workflow.

.. todo::

   The whole expiration point needs some redesign, tracked in issue #346
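
To make the rules for this category more concrete, here is a minimal sketch
of the generic ``name:NAME`` lookup and of the condition under which a
workflow instance may expire, as currently described above (and subject to
the redesign mentioned in the todo).  ``InternalItem``, ``lookup_by_name``
and ``workflow_may_expire`` are hypothetical names for illustration, not
debusine APIs; the sketch also assumes that only active artifact items count
as artifacts still contained in the collection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InternalItem:
    """Hypothetical item: a bare JSON payload or a reference to an artifact."""

    name: str
    data: Optional[dict] = None  # payload of a bare item
    artifact_id: Optional[int] = None  # set only for artifact items
    active: bool = True


def lookup_by_name(items, lookup):
    """Resolve the generic ``name:NAME`` lookup to at most one active item."""
    prefix, _, name = lookup.partition(":")
    if prefix != "name" or not name:
        raise ValueError(f"unsupported lookup name: {lookup!r}")
    for item in items:
        if item.active and item.name == name:
            return item
    return None


def workflow_may_expire(items):
    """Return True once no active artifact item remains in the collection."""
    # Assumption: removed (inactive) items no longer block expiry.
    return not any(
        item.active and item.artifact_id is not None for item in items
    )
```

In this model, a collection holding only bare items never blocks expiry,
while a single remaining active artifact item keeps the workflow instance
alive.
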