Copying artifacts between workspaces

Motivation

It is useful in various situations to be able to copy artifacts between workspaces. For example, once we have repository hosting, we’ll want a way to copy packages between repositories as part of managing transitions, and copying at least source packages between workspaces (and perhaps even between scopes) would be a natural part of maintaining derivative distributions.

To support security workflows, we need to be able to prepare artifacts in a private (“embargoed”) workspace and then copy them to somewhere public once the embargo has expired. Doing this should require some kind of intentional flag: we don’t want to make it too easy to break embargoes by accident.

Permission considerations

Copying artifacts requires both the ability to read from the source and the ability to write to the destination (either directly or via a workflow).

After artifacts have been made public, it’s helpful to be able to see the work request that created them, without having to somehow also copy the work request around. To achieve this, the permission predicate that checks whether a user can see a work request may check whether any of the artifacts produced by the work request are visible to that user, and return True in that case even if the work request itself would not ordinarily be visible.

Note

It may be surprising that this rule uses “any of the artifacts produced by the work request” rather than “all of the artifacts produced by the work request”. However, there isn’t usually anywhere useful to copy debusine:work-request-debug-logs artifacts to, and making only some of the artifacts produced by a work request public seems unlikely to be a realistic unembargoing use case.

While build logs may expose additional information not in the output artifacts (such as build-dependencies where security updates are also being prepared), similar information might easily be exposed by the output artifacts themselves anyway, so the onus is on people who make artifacts public to check that it is safe to do so.
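The visibility rule above can be illustrated with a minimal Python sketch. The Workspace, Artifact and WorkRequest classes and the members-based check are hypothetical stand-ins for debusine's real models and permission predicates. The sketch also assumes that a copy made into another workspace still counts as an artifact produced by the original work request.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workspace:
    name: str
    public: bool
    # Hypothetical stand-in for workspace-level read permissions.
    members: frozenset[str] = frozenset()

    def can_display(self, user: str | None) -> bool:
        return self.public or (user is not None and user in self.members)


@dataclass
class Artifact:
    id: int
    workspace: Workspace


@dataclass
class WorkRequest:
    id: int
    workspace: Workspace
    # Artifacts produced by this work request; assumed to include copies
    # of them made into other workspaces.
    produced: list[Artifact] = field(default_factory=list)


def can_display_artifact(user: str | None, artifact: Artifact) -> bool:
    return artifact.workspace.can_display(user)


def can_display_work_request(user: str | None, work_request: WorkRequest) -> bool:
    # Ordinary rule: the work request's own workspace must be visible.
    if work_request.workspace.can_display(user):
        return True
    # Extra rule: visible if *any* produced artifact is visible to the user.
    return any(can_display_artifact(user, a) for a in work_request.produced)


embargoed = Workspace("embargoed", public=False, members=frozenset({"alice"}))
public = Workspace("public", public=True)
wr = WorkRequest(1, embargoed, [Artifact(10, embargoed)])
print(can_display_work_request("bob", wr))  # False: still embargoed
wr.produced.append(Artifact(11, public))  # an output is copied to a public workspace
print(can_display_work_request("bob", wr))  # True
```

Checking "any" rather than "all" means a single public output is enough, which matches the reasoning in the note about debug-log artifacts.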

Resource accounting considerations

We want to be able to track the resource usage of workspaces and scopes. If artifacts are copied between workspaces (and hence perhaps between scopes), then the same files may exist in multiple workspaces, complicating this kind of analysis. The question is likely to be something along the lines of “how much data does debusine need to store on behalf of this workspace or scope that it would not otherwise need to store?”.

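A reasonable first cut would be to track the origin of copies, and to account an artifact’s files to a workspace (and its containing scope) if the artifact is in that workspace and either is itself an original or is a copy whose original artifact no longer exists. We therefore add a nullable Artifact.original_artifact foreign key, with on_delete=SET_NULL, so that deleting an original clears this field on its copies and makes them accountable to their own workspaces.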
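A minimal sketch of this origin-based accounting, using in-memory objects rather than database models: an artifact's files are charged to its own workspace only when original_artifact is null (it is an original, or its original has been deleted and the key set to NULL). The helper names are hypothetical. Files are identified by content hash so that shared files are counted once, and scope totals would follow by summing over a scope's workspaces.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, like distinct database rows
class Artifact:
    workspace: str
    # Files in the artifact, keyed by content hash, with sizes in bytes.
    files: dict[str, int]
    original_artifact: Artifact | None = None


def delete_artifact(artifacts: list[Artifact], victim: Artifact) -> None:
    artifacts.remove(victim)
    # Emulate on_delete=SET_NULL for copies that pointed at the victim.
    for artifact in artifacts:
        if artifact.original_artifact is victim:
            artifact.original_artifact = None


def accounted_bytes(artifacts: list[Artifact], workspace: str) -> int:
    """Bytes stored on behalf of `workspace` under the origin-based rule."""
    files: dict[str, int] = {}
    for artifact in artifacts:
        # Copies whose original still exists are charged to the original.
        if artifact.workspace == workspace and artifact.original_artifact is None:
            files.update(artifact.files)
    # Count each stored file once, even if several artifacts share it.
    return sum(files.values())


orig = Artifact("embargoed", {"aaa": 100, "bbb": 50})
copy = Artifact("public", {"aaa": 100, "bbb": 50}, original_artifact=orig)
artifacts = [orig, copy]
print(accounted_bytes(artifacts, "embargoed"), accounted_bytes(artifacts, "public"))  # 150 0
delete_artifact(artifacts, orig)
print(accounted_bytes(artifacts, "embargoed"), accounted_bytes(artifacts, "public"))  # 0 150
```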

Some other variations are possible, and are not made more difficult by this design. For example, we may wish to account for each workspace’s usage without considering whether files have been copied from or to other workspaces (in which case the total file store size may be less than the sum of the sizes of all workspaces); or to calculate the “unique” size of a workspace as the total size of all files that appear only in that workspace.
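Both variations can be sketched in the same style. Here each workspace is reduced to a hypothetical mapping from file content hash to size in bytes, which also shows why the file store can be smaller than the sum of the per-workspace figures.

```python
from __future__ import annotations

from collections import Counter


def naive_sizes(workspaces: dict[str, dict[str, int]]) -> dict[str, int]:
    """Each workspace's usage, ignoring copies to or from other workspaces."""
    return {name: sum(files.values()) for name, files in workspaces.items()}


def unique_sizes(workspaces: dict[str, dict[str, int]]) -> dict[str, int]:
    """Total size of the files that appear only in that workspace."""
    # Number of workspaces each file hash appears in.
    seen_in = Counter(h for files in workspaces.values() for h in files)
    return {
        name: sum(size for h, size in files.items() if seen_in[h] == 1)
        for name, files in workspaces.items()
    }


def file_store_size(workspaces: dict[str, dict[str, int]]) -> int:
    """Size of the file store, where each distinct file is stored once."""
    distinct: dict[str, int] = {}
    for files in workspaces.values():
        distinct.update(files)
    return sum(distinct.values())


workspaces = {
    "embargoed": {"aaa": 100, "bbb": 50},
    "public": {"aaa": 100},
}
print(naive_sizes(workspaces))  # {'embargoed': 150, 'public': 100}
print(unique_sizes(workspaces))  # {'embargoed': 50, 'public': 0}
print(file_store_size(workspaces))  # 150, less than 150 + 100
```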

CopyCollectionItems task

This server task copies items into given target collections, which may or may not be in the same workspace as the original items. It returns an error if:

  • the user/workflow that created the task does not have permission to read the items or to write to the target collection

  • any of the items is a collection

  • unembargo is False, any of the items are in a private workspace, and the target collection is in a public workspace

  • the collection manager fails to add the items (e.g. because they are incompatible with the collection)
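The error conditions above could be checked along these lines before any copying starts. This is only a sketch: Item, validate_copy and the can_read / can_write_target inputs are hypothetical, and the last condition (the collection manager rejecting an item) is left to the manager itself and is not modelled.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    workspace_public: bool
    is_collection: bool = False


def validate_copy(
    items: list[Item],
    *,
    target_public: bool,
    unembargo: bool,
    can_read: Callable[[Item], bool],
    can_write_target: bool,
) -> str | None:
    """Return an error message, or None if the copy may proceed."""
    if not can_write_target:
        return "cannot write to the target collection"
    for item in items:
        if not can_read(item):
            return f"cannot read {item.name}"
        if item.is_collection:
            return f"{item.name} is a collection"
        # Private -> public needs an explicit opt-in.
        if not unembargo and not item.workspace_public and target_public:
            return f"{item.name} is embargoed; set unembargo to copy it"
    return None
```

Keeping the embargo check as a separate, explicit condition mirrors the requirement that breaking an embargo needs an intentional flag.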

The task_data for this task may contain the following keys:

  • copies: a list of dictionaries as follows:

    • source_items (Multiple lookup, required): a list of items to copy (as usual for lookups, these may be collection items or they may be artifacts looked up directly by ID)

    • target_collection (Single lookup, required): the collection to copy items into

    • unembargo (boolean, defaults to False): if True, allow copying from private to public workspaces

    • replace (boolean, defaults to False): if True, replace existing similar items

    • name_template (string, optional): template used to generate the name for the target collection item, using the str.format templating syntax (with variables inside curly braces)

    • variables (dictionary, optional): pass these variables when adding items to the target collection; if a given source item came from a collection, then this is merged into the per-item data from the corresponding source collection item, with the values given here taking priority in cases of conflict
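The following sketch shows one plausible reading of how variables and name_template combine for a single source item. It assumes that the template is expanded with the merged per-item data. The helper name, the variable names (package, version, component) and the placeholder values in the example entry are purely illustrative.

```python
from __future__ import annotations

from typing import Any


def target_name_and_data(
    name_template: str | None,
    variables: dict[str, Any] | None,
    source_item_data: dict[str, Any] | None = None,
) -> tuple[str | None, dict[str, Any]]:
    # Start from the source collection item's per-item data, if any.
    data = dict(source_item_data or {})
    # Values given in `variables` win on conflicting keys.
    data.update(variables or {})
    # Assumption: the template is expanded with the merged data.
    name = name_template.format(**data) if name_template is not None else None
    return name, data


# One hypothetical entry of `copies`; the lookup values are placeholders.
entry = {
    "source_items": [123],
    "target_collection": "<target collection lookup>",
    "name_template": "{package}_{version}",
    "variables": {"component": "updates/main"},
}
name, data = target_name_and_data(
    entry["name_template"],
    entry["variables"],
    {"package": "hello", "version": "2.10-3", "component": "main"},
)
print(name)  # hello_2.10-3
print(data["component"])  # updates/main
```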

For each of the entries in copies, the task copies the source items to the target collection’s workspace; when copying artifacts, if the contained files are already in one of that workspace’s file stores, then it copies references to them, and otherwise it copies the file contents. For each source item, it then adds a collection item to the target collection, using name_template and variables in the same way as in update-collection-with-artifacts.
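A minimal sketch of the per-file decision, assuming content-addressed file stores keyed by SHA-256 digest. FileStore, Workspace and copy_artifact_files are hypothetical, as is the choice to write new contents into the workspace's first file store.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class FileStore:
    name: str
    # Stored file contents, keyed by SHA-256 hex digest.
    contents: dict[str, bytes] = field(default_factory=dict)

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.contents[digest] = data
        return digest


@dataclass
class Workspace:
    name: str
    file_stores: list[FileStore]


def copy_artifact_files(
    digests: list[str], source: FileStore, target: Workspace
) -> dict[str, str]:
    """Make each file available to `target`; report what was done per file."""
    actions: dict[str, str] = {}
    for digest in digests:
        if any(digest in store.contents for store in target.file_stores):
            # Already stored for the target workspace: copy a reference only.
            actions[digest] = "reference"
        else:
            # Otherwise copy the contents (assumption: into the first store).
            target.file_stores[0].contents[digest] = source.contents[digest]
            actions[digest] = "content"
    return actions


private = FileStore("private")
shared = FileStore("shared")
dsc = private.add(b"Format: 3.0 (quilt)\n")
orig_tar = private.add(b"upstream tarball")
shared.add(b"upstream tarball")  # e.g. already present from an earlier upload
public = Workspace("public", [shared])
actions = copy_artifact_files([dsc, orig_tar], private, public)
print(actions[dsc], actions[orig_tar])  # content reference
```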

All the requested copies happen in a single database transaction; if one of them fails then they are all rolled back.
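The all-or-nothing behaviour can be illustrated with Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on an exception; in a Django application, django.db.transaction.atomic() plays the same role. The collection_item table and the apply_copies function are hypothetical.

```python
import sqlite3


def apply_copies(
    conn: sqlite3.Connection, copies: list[tuple[str, list[str]]]
) -> None:
    """Add every (target collection, item names) entry, or none of them."""
    # The connection context manager commits if the block succeeds and
    # rolls back (then re-raises) if any statement fails.
    with conn:
        for target_collection, names in copies:
            for name in names:
                conn.execute(
                    "INSERT INTO collection_item (collection, name) VALUES (?, ?)",
                    (target_collection, name),
                )


conn = sqlite3.connect(":memory:")
# Hypothetical schema: an item name may appear only once per collection.
conn.execute(
    "CREATE TABLE collection_item (collection TEXT, name TEXT,"
    " UNIQUE (collection, name))"
)
apply_copies(conn, [("suite", ["hello_1.0"])])
try:
    # The second entry clashes with an existing item, so the first is undone too.
    apply_copies(conn, [("build-logs", ["hello_1.0"]), ("suite", ["hello_1.0"])])
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT collection, name FROM collection_item").fetchall())
# [('suite', 'hello_1.0')]
```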

Workflow package_publish

This workflow publishes source and/or binary packages to a given target suite. It is normally expected to be used as a sub-workflow.

  • task_data:

    • source_artifact (Single lookup, optional): a debian:source-package or debian:upload artifact representing the source package (the former is used when the workflow is started based on a .dsc rather than a .changes)

    • binary_artifacts (Multiple lookup, optional): a list of debian:upload artifacts representing the binary packages

    • target_suite (Single lookup, required): the debian:suite collection to publish packages to

    • unembargo (boolean, defaults to False): if True, allow publishing artifacts from private workspaces to public suites

    • replace (boolean, defaults to False): if True, replace existing similar items

    • suite_variables (dictionary, optional): pass these variables when adding items to the target suite collection; if a given source or binary artifact came from a collection, then this is merged into the per-item data from the corresponding collection item, with the values given here taking priority in cases of conflict; see debian:suite for the available variable names

At least one of source_artifact and binary_artifacts must be set.

The workflow creates a CopyCollectionItems task. The copies field in its task data is as follows:

  • source_items: the union of whichever of {source_artifact} and {binary_artifacts} are set

  • target_collection: {target_suite}

  • unembargo: {unembargo}

  • replace: {replace}

  • variables: {suite_variables}

Any of the lookups in source_items may result in promises, and in that case the workflow adds corresponding dependencies.
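A sketch of how the workflow might assemble the main copies entry from its own task data. build_publish_copies is a hypothetical helper. Lookups are treated as opaque values, the union is taken by concatenation (source first), and the promise dependencies mentioned above are not modelled.

```python
from __future__ import annotations

from typing import Any


def build_publish_copies(task_data: dict[str, Any]) -> list[dict[str, Any]]:
    source = task_data.get("source_artifact")
    binaries = list(task_data.get("binary_artifacts") or [])
    if source is None and not binaries:
        raise ValueError(
            "at least one of source_artifact and binary_artifacts must be set"
        )
    # Union of whichever lookups are set (here: source first, then binaries).
    source_items = ([source] if source is not None else []) + binaries
    return [
        {
            "source_items": source_items,
            "target_collection": task_data["target_suite"],
            "unembargo": task_data.get("unembargo", False),
            "replace": task_data.get("replace", False),
            "variables": task_data.get("suite_variables", {}),
        }
    ]


copies = build_publish_copies(
    {"source_artifact": 101, "binary_artifacts": [102, 103], "target_suite": "<suite lookup>"}
)
print(copies[0]["source_items"])  # [101, 102, 103]
```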

If the source and target workspaces have different instances of the debian:package-build-logs collection, then the workflow also adds an entry to copies as follows:

  • source_items:

    collection: {source build logs collection}
    lookup__same_work_request: {binary_artifacts}
    
  • target_collection: {target build logs collection}

  • unembargo: {unembargo}

  • replace: {replace}

If the source and target workspaces have different instances of the debusine:task-history collection, then the workflow also adds an entry to copies as follows:

  • source_items:

    collection: {source task history collection}
    lookup__same_workflow: {binary_artifacts}
    
  • target_collection: {target task history collection}

  • unembargo: {unembargo}

  • replace: {replace}
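The two conditional entries follow the same pattern and could be generated together, as in this sketch. It assumes that each workspace's collections are given as a hypothetical mapping from collection category to an identifier for that workspace's instance, and that no entry is added if either workspace lacks the collection. related_copies and RELATED_COLLECTIONS are illustrative names.

```python
from __future__ import annotations

from typing import Any

# Collection category -> lookup key used to select the related items.
RELATED_COLLECTIONS = (
    ("debian:package-build-logs", "lookup__same_work_request"),
    ("debusine:task-history", "lookup__same_workflow"),
)


def related_copies(
    task_data: dict[str, Any],
    source_collections: dict[str, str],
    target_collections: dict[str, str],
) -> list[dict[str, Any]]:
    """Extra `copies` entries for build logs and task history."""
    binaries = list(task_data.get("binary_artifacts") or [])
    entries = []
    for category, lookup_key in RELATED_COLLECTIONS:
        source = source_collections.get(category)
        target = target_collections.get(category)
        # Only needed when both workspaces have one and they are different.
        if source is None or target is None or source == target:
            continue
        entries.append(
            {
                "source_items": {"collection": source, lookup_key: binaries},
                "target_collection": target,
                "unembargo": task_data.get("unembargo", False),
                "replace": task_data.get("replace", False),
            }
        )
    return entries


entries = related_copies(
    {"binary_artifacts": [102, 103], "unembargo": True},
    {"debian:package-build-logs": "logs-embargoed", "debusine:task-history": "history"},
    {"debian:package-build-logs": "logs-public", "debusine:task-history": "history"},
)
print(len(entries))  # 1: only the build logs collections differ
```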