Deployment architecture
Debusine has the following major components that should be considered as part of a deployment:

- PostgreSQL database
- Redis database
- File stores
- Server
- Scheduler
- Workers
PostgreSQL database
Debusine stores most of its persistent data in PostgreSQL, including artifact metadata, assets, collections, and work requests (see Debusine concepts). Debusine also uses it as a Celery result backend.
PostgreSQL may be deployed separately from the other components, optionally in a primary/replica arrangement for higher availability (although Debusine does not yet support routing read-only requests to different members of the cluster, so multiple instances do not yet provide any performance advantage). The server, the server’s Celery workers, and the scheduler all need read-write access to the database.
The database schema is maintained as part of Debusine, and is upgraded automatically by regular package upgrades using Django migrations.
The PostgreSQL database must be backed up, preferably using continuous archiving to allow for point-in-time recovery. The details are outside the scope of this document, but typical approaches include WAL-G and pgBackRest.
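As a minimal sketch of the continuous-archiving approach, the relevant postgresql.conf settings when using pgBackRest might look as follows (the stanza name "debusine" is a placeholder, not something defined by Debusine):

```ini
# postgresql.conf — continuous WAL archiving via pgBackRest
# ("debusine" is a hypothetical pgBackRest stanza name)
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=debusine archive-push %p'
```

pgBackRest itself still needs its own repository configuration and a scheduled base backup; see its documentation for details.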
Redis database
Debusine uses Redis as a Celery broker: this transports messages from the server or the scheduler to the Celery workers, in order to execute server-side asynchronous tasks.
Redis also acts as a channel layer, which is part of how Debusine communicates with its workers. For example, when a work request is assigned to a worker or when a worker completes a work request, Debusine sends notifications to other parts of itself via Redis.
Redis may be deployed separately from other components. The server, the server’s Celery workers, and the scheduler all need access to it.
The Redis database does not need to be backed up. The worst case is that some ephemeral messages are lost and that some tasks may need to be retried.
File stores
Files in artifacts may be large, and so they are stored in separate file stores rather than in PostgreSQL. These files are content-addressed, and the stores may be local or remote. File stores are configured by the instance administrator at the scope level.
Local storage is the default and is suitable for small installations, but it must be backed up manually. It must be on the same machine as the server, and so can only be used if the server is running on a single machine.
Remote storage such as S3 is typically better for larger installations, and is required if the server needs to be scaled across multiple machines. An installation might rely on the provider’s redundancy, or might run separate backups using tools such as rclone.
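If an installation chooses to run its own backups of a remote store with rclone, a periodic sync from the provider to independent backup storage is one approach. A sketch, where the remote names and bucket paths are placeholders for your own rclone configuration:

```shell
# Mirror the remote file store to independent backup storage.
# "s3remote" and "backupremote" are hypothetical rclone remotes;
# the bucket names are placeholders.
rclone sync s3remote:debusine-artifacts backupremote:debusine-artifacts-backup
```

Since files are content-addressed, existing objects never change, so incremental syncs like this are cheap.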
A scope may have multiple file stores with various policies, which may be used to improve redundancy by storing the same files in multiple independent places.
Each file has a corresponding row in PostgreSQL, which may be referred to by
artifacts. In the event of disaster recovery, it is possible that the state
of the overall system may have changed between the most recent PostgreSQL
backup and the most recent file store backup, and it may not be possible to
achieve perfect consistency. However, this should only affect files
uploaded and database changes made since the most recent backup. In such a
situation, an administrator can minimize data loss by restoring the most
recent backup of each of PostgreSQL and the file stores, and relying on the
regular debusine-admin delete_expired and debusine-admin
vacuum_storage jobs to clean up or report on inconsistencies.
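During disaster recovery, the two cleanup jobs named above can also be invoked by hand once both backups have been restored; this sketch assumes no extra options are needed:

```shell
# After restoring the PostgreSQL and file store backups, reconcile
# any inconsistencies between database rows and stored files.
debusine-admin delete_expired
debusine-admin vacuum_storage
```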
Server
The server can be installed using the debusine-server package. It is a
WSGI/ASGI server deployed using Gunicorn and
Uvicorn. By default, the server runs (2 × CPU
cores) + 1 workers; this may need to be tuned on individual systems, especially
those with many cores.
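The default follows the common Gunicorn rule of thumb. Assuming the default formula is unchanged, the worker count for the current machine can be computed as:

```shell
# Default Gunicorn worker count: (2 × CPU cores) + 1
workers=$(( 2 * $(nproc) + 1 ))
echo "$workers"
```

How the count is overridden depends on how Gunicorn is invoked on your system (for example, via its --workers option); on machines with many cores, a fixed, lower value is often more appropriate.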
The server includes configuration for running behind an nginx instance on the same machine, which is recommended. On larger installations, it is normally appropriate to also place a load balancer such as haproxy in front of the server.
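For orientation only, a reverse proxy in front of the server looks roughly like the following nginx sketch; the upstream address and server name are placeholders, TLS is omitted, and the configuration shipped with the debusine-server package should be used in practice:

```nginx
# Illustrative reverse-proxy sketch only — prefer the configuration
# shipped with the debusine-server package. TLS setup is omitted.
server {
    listen 80;
    server_name debusine.example.org;  # placeholder

    location / {
        proxy_pass http://127.0.0.1:8000;  # placeholder upstream address
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```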
The server may in principle be deployed on multiple machines with a load balancer to distribute traffic between them, as long as a local file store is not in use. This is not yet well-tested in production.
Unless a local file store is in use or PostgreSQL is being run on the same machine, the server does not store persistent data, although its configuration should be backed up.
systemd units:
- Main server
- Run database schema migrations on upgrade
- Dynamic worker provisioner
- Regular cleanup of expired or unused objects (timer and service)
- Regular storage maintenance (timer and service)
Scheduler
The scheduler is responsible for distributing tasks to workers. It runs as a Celery service, using Redis as its broker and PostgreSQL as its result backend.
While in theory the scheduler could be deployed separately from the server,
in practice it is part of the debusine-server package and it is normally
simplest to run it on the same machine as the server. Only one scheduler
instance may run at a time; if the server is running on multiple machines,
then systemctl mask debusine-server-scheduler.service should be run on
all but one of them.
The scheduler does not store persistent data. It shares configuration with the server, and so that configuration should be backed up.
systemd units:
- Scheduler
Workers
External workers
Each external worker runs on a separate machine, installed using the
debusine-worker package. The various Recommends of that package are
optional in the sense that the worker can still function without them, but
it will not be able to execute all tasks, so typical worker installations
should not use APT options such as --no-install-recommends.
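Installing the package normally is sufficient, since APT installs Recommends by default:

```shell
# Install an external worker; Recommends are installed by default,
# which typical workers need in order to execute all task types.
apt-get install debusine-worker
```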
External workers must be able to communicate with the server over HTTPS, and must be able to download whatever resources tasks need. In typical installations this will include broad outbound internet access.
Installations may use static workers, or may use debusine-admin
worker_pool to manage dynamic worker pools
on cloud providers. A “provisioner” service on the server is responsible
for auto-scaling dynamic workers; as with the scheduler, if the server is
running on multiple machines, then systemctl mask
debusine-server-provisioner.service should be run on all but one of them.
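On a multi-machine server deployment, both singleton services named above can be masked together on each secondary machine:

```shell
# On all but one server machine, prevent the singleton services
# from running in duplicate.
systemctl mask debusine-server-scheduler.service
systemctl mask debusine-server-provisioner.service
```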
If possible, it is usually best to maintain at least a small number of
static workers to handle base load, and reserve dynamic workers for bursts.
External workers do not store persistent data, although their configuration should be backed up.
systemd units:
- External worker
Celery workers
Celery workers handle trusted server-side tasks that require direct access
to the Debusine database. While in theory a Celery worker could be deployed
separately from the server, in practice it is part of the
debusine-server package and it is normally simplest to run it on the
same machine(s) as the server.
Celery workers do not store persistent data. They share configuration with the server, and so that configuration should be backed up.
systemd units:
- Celery worker
- Run periodic Celery tasks
Signing workers
Signing workers are responsible for handling sensitive private key material. They require a separate PostgreSQL database. Private keys may be stored in that database directly, encrypted at rest using a key known to the signing workers; alternatively, there is some support for using private keys in a hardware security module (currently only for UEFI Secure Boot).
Signing workers must be able to communicate with the server over HTTPS, and with their dedicated PostgreSQL database; they should not have general internet access. The PostgreSQL database used by signing workers must be firewalled off from everything other than the signing workers themselves.
If a hardware security module is in use, it is currently best to use only a single signing worker, as the scheduler does not yet know how to route tasks to the correct worker; we expect to remove this limitation in future.
The contents of the PostgreSQL database used by signing workers should be backed up in the same way as the main database, although it changes much less often. Signing workers do not otherwise store persistent data, although their configuration should be backed up.
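Because the signing database changes rarely, a periodic logical dump can usefully complement continuous archiving. A sketch, where the database name is a placeholder:

```shell
# Logical backup of the signing workers' database.
# "debusine-signing" is a placeholder for the actual database name.
pg_dump --format=custom --file=debusine-signing.dump debusine-signing
```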
systemd units:
- Signing worker
- Run signing database schema migrations on upgrade
Monitoring
Debusine supports Prometheus monitoring using
the /api/1.0/open-metrics/ endpoint. Freexian uses this to maintain
various Grafana graphs such as request latency,
queue lengths, and worker pool sizes, although we do not currently provide
reusable graphs.
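A Prometheus scrape job for the endpoint above might look like this; the target host is a placeholder, and the scheme assumes the server is behind HTTPS:

```yaml
# Prometheus scrape job for Debusine's OpenMetrics endpoint.
# "debusine.example.org" is a placeholder for the actual server.
scrape_configs:
  - job_name: debusine
    scheme: https
    metrics_path: /api/1.0/open-metrics/
    static_configs:
      - targets:
          - debusine.example.org
```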
Ansible configuration
Freexian provides several Ansible roles that may be useful to teams deploying Debusine, either directly or as a reference.