Copying artifacts between workspaces
Motivation
It is useful in various situations to be able to copy artifacts between workspaces. For example, once we have repository hosting, we’ll want a way to copy packages between repositories as part of managing transitions, and copying at least source packages between workspaces (and perhaps even between scopes) would be a natural part of maintaining derivative distributions.
To support security workflows, we need to be able to prepare artifacts in a private (“embargoed”) workspace and then copy them to somewhere public once the embargo has expired. Doing this should require some kind of intentional flag: we don’t want to make it too easy to break embargoes by accident.
Permission considerations
Copying artifacts requires both the ability to read from the source and the ability to write to the destination (either directly or via a workflow).
After artifacts have been made public, it’s helpful to be able to see the work request that created them, without having to somehow also copy the work request around. To achieve this, the permission predicate that checks whether a user can see a work request may check whether any of the artifacts produced by the work request are visible to that user, and return True in that case even if the work request itself would not ordinarily be visible.
Note
It may be surprising that this rule is “any of the artifacts produced by the work request” rather than “all of the artifacts produced by the work request”; but there isn’t usually anywhere useful to copy debusine:work-request-debug-logs artifacts to, and making only some of the artifacts produced by a work request public seems unlikely to be a realistic unembargoing use case.
While build logs may expose additional information not in the output artifacts (such as build-dependencies where security updates are also being prepared), similar information might easily be exposed by the output artifacts themselves anyway, so the onus is on people who make artifacts public to check that it is safe to do so.
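The visibility rule described above can be sketched as follows. This is a minimal illustration, not debusine’s implementation: the Artifact and WorkRequest classes, the can_see_work_request function, and the idea of passing in the set of workspaces a user can see are all assumptions made for the example. It also assumes that a copied artifact still counts as produced by the original work request.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in: only the containing workspace matters here."""

    workspace: str


@dataclass(frozen=True)
class WorkRequest:
    """Hypothetical stand-in for a work request and the artifacts it produced."""

    workspace: str
    produced: Tuple[Artifact, ...] = ()


def can_see_work_request(visible_workspaces: Set[str], work_request: WorkRequest) -> bool:
    """Return True if the work request, or ANY artifact it produced, is visible."""
    if work_request.workspace in visible_workspaces:
        return True
    # "Any", not "all": one visible artifact is enough to expose the work request.
    return any(a.workspace in visible_workspaces for a in work_request.produced)
```

With this rule, a work request in an embargoed workspace becomes visible to a user as soon as one of its artifacts has been copied somewhere that user can read, even though its debug logs stay private.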
Resource accounting considerations
We want to be able to track the resource usage of workspaces and scopes. If artifacts are copied between workspaces (and hence perhaps between scopes), then the same files may exist in multiple workspaces, complicating this kind of analysis. The question is likely to be something along the lines of “how much data does debusine need to store on behalf of this workspace or scope that it would not otherwise need to store?”.
A reasonable first cut would be to track the origin of copies, and to account an artifact’s files to a workspace (and its containing scope) if the artifact is in that workspace and is no longer in its origin workspace. We therefore add a nullable Artifact.original_artifact foreign key, with on_delete=SET_NULL.
Some other variations are possible, and are not made more difficult by this design. For example, we may wish to account for each workspace’s usage without considering whether files have been copied from or to other workspaces (in which case the total file store size may be less than the sum of the sizes of all workspaces); or to calculate the “unique” size of a workspace as the total size of all files that appear only in that workspace.
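The first-cut accounting rule can be sketched in plain Python. This is an illustration under stated assumptions, not the real model: the Artifact dataclass, its size field, and the accounted_usage and delete_artifact helpers are hypothetical, and delete_artifact merely emulates what on_delete=SET_NULL would do in the database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class Artifact:
    """Hypothetical stand-in for an artifact with a total file size in bytes."""

    workspace: str
    size: int
    original_artifact: Optional["Artifact"] = None


def accounted_usage(artifacts: List[Artifact]) -> Dict[str, int]:
    """Account an artifact to its workspace unless its origin still exists."""
    usage: Dict[str, int] = defaultdict(int)
    for artifact in artifacts:
        if artifact.original_artifact is None:
            usage[artifact.workspace] += artifact.size
    return dict(usage)


def delete_artifact(artifacts: List[Artifact], victim: Artifact) -> List[Artifact]:
    """Remove an artifact, clearing references to it (emulates SET_NULL)."""
    remaining = [a for a in artifacts if a is not victim]
    for artifact in remaining:
        if artifact.original_artifact is victim:
            artifact.original_artifact = None
    return remaining
```

While the original exists, only the origin workspace is charged for the files; once the original is deleted, the copy’s reference becomes null and the files are charged to the copy’s workspace instead.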
CopyCollectionItems task
This server task copies items into given target collections, which may or may not be in the same workspace as the original items. It returns an error if:
the user/workflow that created the task does not have permission to read the items or to write to the target collection
any of the items is a collection
unembargo is False, any of the items are in a private workspace, and the target collection is in a public workspace
the collection manager fails to add the items (e.g. because they are incompatible with the collection)
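Two of the error conditions listed above (items that are collections, and the embargo guard) can be sketched as a pre-flight check. The Item dataclass and validate_copy function are hypothetical names for this example; permission checks and collection manager failures are left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for an item that has been looked up for copying."""

    name: str
    is_collection: bool = False
    in_public_workspace: bool = True


def validate_copy(items: Iterable[Item], target_is_public: bool, unembargo: bool = False) -> None:
    """Raise ValueError if the copy would hit one of the documented errors."""
    for item in items:
        if item.is_collection:
            raise ValueError(f"{item.name} is a collection and cannot be copied")
        # Copying private items to a public target needs the explicit flag,
        # so that embargoes are not broken by accident.
        if target_is_public and not item.in_public_workspace and not unembargo:
            raise ValueError(f"{item.name} is embargoed; set unembargo to copy it")
```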
The task_data for this task may contain the following keys:
copies: a list of dictionaries as follows:
    source_items (Multiple lookup, required): a list of items to copy (as usual for lookups, these may be collection items or they may be artifacts looked up directly by ID)
    target_collection (Single lookup, required): the collection to copy items into
    unembargo (boolean, defaults to False): if True, allow copying from private to public workspaces
    replace (boolean, defaults to False): if True, replace existing similar items
    name_template (string, optional): template used to generate the name for the target collection item, using the str.format templating syntax (with variables inside curly braces)
    variables (dictionary, optional): pass these variables when adding items to the target collection; if a given source item came from a collection, then this is merged into the per-item data from the corresponding source collection item, with the values given here taking priority in cases of conflict
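As an illustration of the keys described above, here is a task_data value written as a Python dictionary, with a small helper that applies the documented defaults. The artifact IDs, the target collection lookup string, the template, and the variable names are placeholders invented for the example, and normalize_copy is a hypothetical helper, not part of debusine.

```python
# Defaults documented for each entry in "copies".
COPY_DEFAULTS = {"unembargo": False, "replace": False}
REQUIRED_KEYS = {"source_items", "target_collection"}


def normalize_copy(entry: dict) -> dict:
    """Check the required keys and fill in the documented defaults."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return {**COPY_DEFAULTS, **entry}


task_data = {
    "copies": [
        {
            "source_items": [101, 102],  # placeholder artifact IDs
            "target_collection": "trixie-security@debian:suite",  # placeholder lookup
            "unembargo": True,
            "name_template": "{package}_{version}",  # placeholder template
            "variables": {"component": "main"},  # placeholder variables
        }
    ]
}

normalized = [normalize_copy(entry) for entry in task_data["copies"]]
```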
For each of the entries in copies, the task copies the source items to the target collection’s workspace; when copying artifacts, if the contained files are already in one of that workspace’s file stores, then it copies references to them, and otherwise it copies the file contents. For each source item, it then adds a collection item to the target collection, using name_template and variables in the same way as in update-collection-with-artifacts.
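The merging of variables and the use of name_template can be sketched as below. This assumes that the template is expanded with the merged variables; the exact set of values available to the template is defined by update-collection-with-artifacts and is not reproduced here. Both function names are hypothetical.

```python
from typing import Optional


def merge_item_variables(source_item_data: Optional[dict], variables: Optional[dict]) -> dict:
    """Merge per-item data from the source collection item with task variables.

    Values given in the task's variables take priority in cases of conflict.
    """
    merged = dict(source_item_data or {})
    merged.update(variables or {})
    return merged


def render_item_name(name_template: str, values: dict) -> str:
    """Expand a str.format-style template such as "{package}_{version}"."""
    return name_template.format(**values)
```

For example, merging source data {"component": "main", "section": "devel"} with task variables {"component": "contrib"} keeps the section from the source item but takes the component from the task.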
All the requested copies happen in a single database transaction; if one of them fails then they are all rolled back.
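The all-or-nothing behaviour can be demonstrated with the standard library’s sqlite3 module, used here purely as a stand-in for the real database layer. The collection_item table and the add_items function are made up for this sketch.

```python
import sqlite3


def add_items(conn: sqlite3.Connection, items) -> None:
    """Insert every (collection, name) pair, or none of them."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        for collection, name in items:
            conn.execute(
                "INSERT INTO collection_item (collection, name) VALUES (?, ?)",
                (collection, name),
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection_item (collection TEXT, name TEXT, UNIQUE (collection, name))"
)
conn.commit()
```

If a batch contains a duplicate name, the second insert raises sqlite3.IntegrityError and the first insert from the same batch is rolled back as well, so the table is left unchanged.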
Workflow package_publish
This workflow publishes source and/or binary packages to a given target suite. It is normally expected to be used as a sub-workflow.
task_data:
source_artifact (Single lookup, optional): a debian:source-package or debian:upload artifact representing the source package (the former is used when the workflow is started based on a .dsc rather than a .changes)
binary_artifacts (Multiple lookup, optional): a list of debian:upload artifacts representing the binary packages
target_suite (Single lookup, optional): the debian:suite collection to publish packages to
unembargo (boolean, defaults to False): if True, allow publishing artifacts from private workspaces to public suites
replace (boolean, defaults to False): if True, replace existing similar items
suite_variables (dictionary, optional): pass these variables when adding items to the target suite collection; if a given source or binary artifact came from a collection, then this is merged into the per-item data from the corresponding collection item, with the values given here taking priority in cases of conflict; see debian:suite for the available variable names
At least one of source_artifact and binary_artifacts must be set.
The workflow creates a CopyCollectionItems task. The copies field in its task data is as follows:
source_items: the union of whichever of {source_artifact} and {binary_artifacts} are set
target_collection: {target_suite}
unembargo: {unembargo}
replace: {replace}
variables: {suite_variables}
Any of the lookups in source_items may result in promises, and in that case the workflow adds corresponding dependencies.
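The mapping from the workflow’s task data to the main copies entry can be sketched as follows. This is a simplified illustration: build_publish_copy is a hypothetical function, binary_artifacts is assumed to be given in list form, and the handling of promises and dependencies is omitted.

```python
def build_publish_copy(task_data: dict) -> dict:
    """Build the main "copies" entry for the CopyCollectionItems task."""
    source_items = []
    if task_data.get("source_artifact") is not None:
        source_items.append(task_data["source_artifact"])
    # Assumption: binary_artifacts is a plain list of lookups.
    source_items.extend(task_data.get("binary_artifacts") or [])
    if not source_items:
        raise ValueError(
            "at least one of source_artifact and binary_artifacts must be set"
        )
    return {
        "source_items": source_items,
        "target_collection": task_data.get("target_suite"),
        "unembargo": task_data.get("unembargo", False),
        "replace": task_data.get("replace", False),
        "variables": task_data.get("suite_variables"),
    }
```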
If the source and target workspaces have different instances of the debian:package-build-logs collection, then the workflow also adds an entry to copies as follows:
source_items:
    collection: {source build logs collection}
    lookup__same_work_request: {binary_artifacts}
target_collection: target build logs collection
unembargo: {unembargo}
replace: {replace}
If the source and target workspaces have different instances of the debusine:task-history collection, then the workflow also adds an entry to copies as follows:
source_items:
    collection: {source task history collection}
    lookup__same_workflow: {binary_artifacts}
target_collection: target task history collection
unembargo: {unembargo}
replace: {replace}
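The two conditional entries for build logs and task history follow the same pattern, which can be sketched generically. The extra_copies function and the way collections are passed in (a mapping from category to some collection identifier) are assumptions for this example; the sketch also assumes that nothing is added when either workspace lacks the collection, which the text above does not specify.

```python
# Lookup filter used for each collection category, as described above.
LOOKUP_FILTERS = {
    "debian:package-build-logs": "lookup__same_work_request",
    "debusine:task-history": "lookup__same_workflow",
}


def extra_copies(source, target, binary_artifacts, unembargo=False, replace=False):
    """Return extra "copies" entries for collections that differ between workspaces.

    source and target map a collection category to a collection identifier.
    """
    entries = []
    for category, filter_key in LOOKUP_FILTERS.items():
        src, dst = source.get(category), target.get(category)
        # Assumption: skip if either side is missing or both are the same instance.
        if src is None or dst is None or src == dst:
            continue
        entries.append(
            {
                "source_items": {"collection": src, filter_key: binary_artifacts},
                "target_collection": dst,
                "unembargo": unembargo,
                "replace": replace,
            }
        )
    return entries
```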