Contribute a new workflow
Workflows are Debusine’s most powerful feature, and we expect other people to have ideas for interesting workflows to try out. Eventually we want to provide some kind of domain-specific language for creating custom workflows directly in the system, but in the meantime you are welcome to contribute your own workflows and we will help you as needed.
Design
All new workflows need a blueprint. This should at minimum detail what task data the workflow accepts and what child work requests it creates; see existing workflows for examples of how to lay this out. If your workflow involves any new tasks, then the blueprint should also include detailed descriptions of those in the same way.
It will save you time to discuss your plans with the development team so that we can advise on what approaches are likely to work most smoothly. You can do this either before or after writing a draft blueprint.
Task breakdown
Start by thinking about how your workflow breaks down into tasks. Some good questions to ask yourself are:
Can they be offloaded to external workers (most tasks should be in this category), or do they need direct access to the Debusine database (perhaps because they involve analysing a collection)?
How finely does the work need to be divided into individual tasks?
Are there any existing tasks that can help you solve parts of your problem?
What artifacts will need to be passed between tasks, and what dependencies will you need to have between them?
Input to the workflow
You should define the task data that your workflow will accept. This is usually related to the task data accepted by the individual tasks that it will create, but it may be higher-level and more opinionated. A good rule of thumb is that tasks express mechanism, while workflows express policy.
Workflow task data is constructed by combining workflow templates defined in a workspace with user input, so it should be laid out in a way that will allow workspace owners to use workflow templates to set useful rules for how contributors may start workflows. In particular, this means that while workflow task data can be arbitrary data structures, you probably want to lean towards a flat layout; top-level items not set in a workflow template may be provided by the user who starts the workflow.
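As a purely illustrative sketch, assuming invented field names and a base class along the lines of those used by existing workflow models, a flat layout could look like this; it lets a workflow template pin target_suite while leaving source_artifact to the user who starts the workflow:

class FooWorkflowData(BaseWorkflowData):
    """Task data for the hypothetical foo workflow."""

    # Lookup of the artifact to operate on; typically supplied by the
    # user who starts the workflow.  (Field names here are invented.)
    source_artifact: str
    # Policy knobs that a workflow template is likely to pin.
    target_suite: str
    enable_extra_checks: bool = False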
Implementation
Each workflow has a Pydantic model in debusine/server/workflows/models.py for its task data, and a corresponding orchestrator class under debusine/server/workflows/. At minimum, that class should look like this:
class FooWorkflow(Workflow[FooWorkflowData, BaseDynamicTaskData]):
    """Do stuff."""

    TASK_NAME = "foo"

    def build_dynamic_data(
        self, task_database: TaskDatabaseInterface
    ) -> BaseDynamicTaskData:
        """Compute dynamic data for this workflow."""
        return BaseDynamicTaskData(subject=...)

    def populate(self) -> None:
        """Create work requests."""

    def get_label(self) -> str:
        """Return the task label."""
        return "..."
The build_dynamic_data method is typically less complex for workflows than for tasks. In the workflow case, it usually just needs to compute a reasonable subject for a workflow instance based on self.data (see Task configuration) and perhaps also parameter_summary describing the most important parameters to the workflow for display in the web UI.
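As a rough sketch only, assuming hypothetical source_name and target_suite fields on the workflow's task data model, such a method might look like this:

def build_dynamic_data(
    self, task_database: TaskDatabaseInterface
) -> BaseDynamicTaskData:
    """Compute dynamic data for this workflow."""
    # source_name and target_suite are invented fields for this example;
    # use whatever in your task data best identifies the workflow's input.
    return BaseDynamicTaskData(
        subject=self.data.source_name,
        parameter_summary=(
            f"{self.data.source_name} into {self.data.target_suite}"
        ),
    )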
The populate method does the bulk of the orchestrator’s work. Usually, it looks up any artifacts or collections needed from self.data, decides which child work requests it needs to create, and calls self.work_request_ensure_child to create them. It may also call self.provides_artifact and/or self.requires_artifact to indicate dependencies between work requests in the workflow; take care that the name passed to self.provides_artifact is unique across all internal collection items under the same root workflow.
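For illustration only, a populate method that creates two child work requests, where the second depends on an artifact provided by the first, might be structured roughly like this (the arguments elided with ... depend on your particular tasks):

def populate(self) -> None:
    """Create work requests."""
    # First child task; record the artifact it will provide.  The name
    # passed to provides_artifact must be unique under the root workflow.
    build = self.work_request_ensure_child(...)
    self.provides_artifact(build, ...)

    # Second child task; declare that it needs that artifact, which adds
    # a dependency between the two work requests.
    check = self.work_request_ensure_child(...)
    self.requires_artifact(check, ...)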
If your workflow creates sub-workflows, then it is your workflow’s responsibility to run their orchestrators in turn. That normally looks like this:
def populate(self) -> None:
    wr = self.work_request_ensure_child(...)
    wr.mark_running()
    orchestrate_workflow(wr)
In some exceptional cases the sub-workflow’s orchestrator may not be able to work out which child work requests to create until a dependency has completed. In such cases, the populate method should instead have this sort of structure:
def populate(self) -> None:
    wr = self.work_request_ensure_child(task_type=TaskTypes.WORKFLOW, ...)
    self.requires_artifact(wr, ...)
    if wr.status == WorkRequest.Statuses.PENDING:
        wr.mark_running()
        orchestrate_workflow(wr)
The get_label method returns a string used as a label for a workflow instance in the web UI.
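For example (again using an invented source_name field purely for illustration), it might simply be:

def get_label(self) -> str:
    """Return the task label."""
    # source_name is a hypothetical task data field used for illustration.
    return f"run foo for {self.data.source_name}"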
You may find it helpful to consult some existing implementations for inspiration. The lintian workflow (debusine/server/workflows/lintian.py) is a relatively simple example that demonstrates some of the points here.
Documentation
Once you have implemented your workflow, make sure to move the corresponding documentation from the blueprint into the main documentation, usually under docs/reference/workflows/specs/.