.. _contribute-workflow:

Contribute a new workflow
=========================

:ref:`explanation-workflows` are Debusine's most powerful feature, and we
expect other people to have ideas for interesting workflows to try out.
Eventually we want to provide some kind of domain-specific language to
create custom workflows directly in the system, but in the meantime you are
welcome to :ref:`contribute ` your own workflows and we will help you as
needed.

Design
------

All new workflows need a :ref:`blueprint `. This should at minimum detail
what task data the workflow accepts and what child work requests it
creates; see :ref:`existing workflows ` for examples of how to lay this
out. If your workflow involves any new tasks, then the blueprint should
also include detailed descriptions of those in the same way.

It will save you time to :ref:`discuss ` your plans with the development
team so that we can advise on what approaches are likely to work most
smoothly. You can do this either before or after writing a draft blueprint.

Task breakdown
~~~~~~~~~~~~~~

Start by thinking about how your workflow breaks down into :ref:`tasks `.
Some good questions to ask yourself are:

* Can the tasks be offloaded to external workers (most tasks should be in
  this category), or do they need direct access to the Debusine database
  (perhaps because they involve analysing a :ref:`collection `)?
* How finely does the work need to be divided into individual tasks?
* Are there any :ref:`existing tasks ` that can help you solve parts of
  your problem?
* What artifacts will need to be passed between tasks, and what
  dependencies will you need to have between them?

Input to the workflow
~~~~~~~~~~~~~~~~~~~~~

You should define the task data that your workflow will accept. This is
usually related to the task data accepted by the individual tasks that it
will create, but it may be higher-level and more opinionated. A good rule
of thumb is that tasks express mechanism, while workflows express policy.

Workflow task data is constructed by combining workflow templates defined
in a workspace with user input, so it should be laid out in a way that
allows workspace owners to use workflow templates to set useful rules for
how contributors may start workflows. In particular, this means that while
workflow task data can be arbitrary data structures, you probably want to
lean towards a flat layout; top-level items not set in a workflow template
may be provided by the user who starts the workflow.

Implementation
--------------

Each workflow has a Pydantic model in
``debusine/server/workflows/models.py`` for its task data, and a
corresponding orchestrator class under ``debusine/server/workflows/``. At
minimum, that class should look like this:

.. code-block:: python

    class FooWorkflow(Workflow[FooWorkflowData, BaseDynamicTaskData]):
        """Do stuff."""

        TASK_NAME = "foo"

        def build_dynamic_data(
            self, task_database: TaskDatabaseInterface
        ) -> BaseDynamicTaskData:
            """Compute dynamic data for this workflow."""
            return BaseDynamicTaskData(subject=...)

        def populate(self) -> None:
            """Create work requests."""

        def get_label(self) -> str:
            """Return the task label."""
            return "..."

The ``build_dynamic_data`` method is typically less complex for workflows
than for tasks. In the workflow case, it usually just needs to compute a
reasonable ``subject`` for a workflow instance based on ``self.data`` (see
:ref:`task-configuration`), and perhaps also a ``parameter_summary``
describing the most important parameters to the workflow for display in
the web UI.
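For illustration, here is a hypothetical sketch of a workflow whose task
data names a source package and a target suite. The field names
(``source_name``, ``suite``) are invented for this example, and the
``BaseWorkflowData`` base class for the task data model is an assumption;
the rest follows the skeleton above:

.. code-block:: python

    class FooWorkflowData(BaseWorkflowData):
        """Task data accepted by the foo workflow."""

        # Hypothetical fields, invented for this example.
        source_name: str
        suite: str


    class FooWorkflow(Workflow[FooWorkflowData, BaseDynamicTaskData]):
        """Do stuff."""

        TASK_NAME = "foo"

        def build_dynamic_data(
            self, task_database: TaskDatabaseInterface
        ) -> BaseDynamicTaskData:
            """Compute dynamic data for this workflow."""
            # subject groups related workflow runs together;
            # parameter_summary is displayed in the web UI.
            return BaseDynamicTaskData(
                subject=self.data.source_name,
                parameter_summary=(
                    f"{self.data.source_name} in {self.data.suite}"
                ),
            )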
The ``populate`` method does the bulk of the orchestrator's work. Usually,
it looks up any artifacts or collections needed from ``self.data``, decides
which child work requests it needs to create, and calls
:py:meth:`self.work_request_ensure_child ` to create them. It may also call
:py:meth:`self.provides_artifact ` and/or
:py:meth:`self.requires_artifact ` to indicate dependencies between work
requests in the workflow; take care that the ``name`` passed to
``self.provides_artifact`` is unique across all internal collection items
under the same root workflow. (A sketch of a ``populate`` method along
these lines appears at the end of this page.)

If your workflow creates sub-workflows, then it is your workflow's
responsibility to run their orchestrators in turn. That normally looks like
this:

.. code-block:: python

    def populate(self) -> None:
        wr = self.work_request_ensure_child(...)
        wr.mark_running()
        orchestrate_workflow(wr)

In some exceptional cases the sub-workflow's orchestrator may not be able
to work out which child work requests to create until a dependency has
completed. In such cases, the ``populate`` method should instead have this
sort of structure:

.. code-block:: python

    def populate(self) -> None:
        wr = self.work_request_ensure_child(task_type=TaskTypes.WORKFLOW, ...)
        self.requires_artifact(wr, ...)
        if wr.status == WorkRequest.Statuses.PENDING:
            wr.mark_running()
            orchestrate_workflow(wr)

The ``get_label`` method returns a string used as a label for a workflow
instance in the web UI.

You may find it helpful to consult some existing implementations for
inspiration. The :ref:`lintian ` workflow
(``debusine/server/workflows/lintian.py``) is a relatively simple example
that demonstrates some of the points here.

Documentation
-------------

Once you have implemented your workflow, make sure to move the
corresponding documentation from the blueprint into the main
documentation, usually under ``docs/reference/workflows/specs/``.
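To make this more concrete, here is the sketch referred to above: a
hypothetical ``populate`` method that creates one child work request per
architecture and wires up artifact dependencies. The task data fields and
keyword arguments shown here (``architectures``, ``source_artifact``,
``FooData``, ``WorkRequestWorkflowData``, and the arguments to
``provides_artifact``) are assumptions for illustration; consult the
``Workflow`` base class and existing workflows for the real signatures:

.. code-block:: python

    def populate(self) -> None:
        """Create one child work request per architecture."""
        # Hypothetical: assumes FooWorkflowData has architectures and
        # source_artifact fields.
        for arch in self.data.architectures:
            wr = self.work_request_ensure_child(
                task_name="foo",
                task_data=FooData(
                    source_artifact=self.data.source_artifact,
                    host_architecture=arch,
                ),
                workflow_data=WorkRequestWorkflowData(
                    display_name=f"foo {arch}",
                    step=f"foo-{arch}",
                ),
            )
            # Block this child until its input artifact is available,
            # e.g. if it is produced earlier in the same workflow.
            self.requires_artifact(wr, self.data.source_artifact)
            # Expose the child's output under a name that is unique
            # within the root workflow's internal collection.
            self.provides_artifact(
                wr, ArtifactCategory.BINARY_PACKAGE, f"foo-{arch}"
            )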