Deployment architecture
Debusine has the following major components that should be considered as part of a deployment:

- PostgreSQL database
- Redis database
- File stores
- Server
- Scheduler
- Workers
PostgreSQL database
Debusine stores most of its persistent data in PostgreSQL, including artifact metadata, assets, collections, and work requests (see Debusine concepts). Debusine also uses it as a Celery result backend.
PostgreSQL may be deployed separately from the other components, optionally in a primary/replica arrangement for higher availability (although Debusine does not yet support routing read-only requests to different members of the cluster, so multiple instances do not yet provide any performance advantage). The server, the server’s Celery workers, and the scheduler all need read-write access to the database.
The database schema is maintained as part of Debusine, and is upgraded automatically by regular package upgrades using Django migrations.
The PostgreSQL database must be backed up, preferably using continuous archiving to allow for point-in-time recovery. The details are outside the scope of this document, but typical approaches include WAL-G and pgBackRest.
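As a minimal sketch of the continuous-archiving approach, the relevant postgresql.conf settings when using pgBackRest might look as follows (the stanza name "debusine" is a placeholder, not something defined by Debusine):

```ini
# postgresql.conf — continuous WAL archiving via pgBackRest
# ("debusine" is a hypothetical pgBackRest stanza name)
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=debusine archive-push %p'
```

pgBackRest itself still needs its own repository configuration and a scheduled base backup; see its documentation for details.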
Redis database
Debusine uses Redis as a Celery broker: this transports messages from the server or the scheduler to the Celery workers, in order to execute server-side asynchronous tasks.
Redis also acts as a channel layer, which is part of how Debusine communicates with its workers. For example, when a work request is assigned to a worker or when a worker completes a work request, Debusine sends notifications to other parts of itself via Redis.
Redis may be deployed separately from other components. The server, the server’s Celery workers, and the scheduler all need access to it.
The Redis database does not need to be backed up. The worst case is that some ephemeral messages are lost and that some tasks may need to be retried.
File stores
Files in artifacts may be large, and so they are stored in separate file stores rather than in PostgreSQL. These files are content-addressed, and the stores may be local or remote. File stores are configured by the instance administrator at the scope level.
Local storage is the default and is suitable for small installations, but it must be backed up manually. It must be on the same machine as the server, and so can only be used if the server is running on a single machine.
Remote storage such as S3 is typically better for larger installations, and is required if the server needs to be scaled across multiple machines. An installation might rely on the provider’s redundancy, or might run separate backups using tools such as rclone.
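If an installation chooses to run its own backups of a remote store with rclone, a periodic sync from the provider to independent backup storage is one approach. A sketch, where the remote names and bucket paths are placeholders for your own rclone configuration:

```shell
# Mirror the remote file store to independent backup storage.
# "s3remote" and "backupremote" are hypothetical rclone remotes;
# the bucket names are placeholders.
rclone sync s3remote:debusine-artifacts backupremote:debusine-artifacts-backup
```

Since files are content-addressed, existing objects never change, so incremental syncs like this are cheap.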
A scope may have multiple file stores with various policies, which may be used to improve redundancy by storing the same files in multiple independent places.
Each file has a corresponding row in PostgreSQL, which may be referred to by
artifacts. In the event of disaster recovery, it is possible that the state
of the overall system may have changed between the most recent PostgreSQL
backup and the most recent file store backup, and it may not be possible to
achieve perfect consistency. However, this should only affect files
uploaded and database changes made since the most recent backup. In such a
situation, an administrator can minimize data loss by restoring the most
recent backup of each of PostgreSQL and the file stores, and relying on the
regular debusine-admin delete_expired and debusine-admin
vacuum_storage jobs to clean up or report on inconsistencies.
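During disaster recovery, the two cleanup jobs named above can also be invoked by hand once both backups have been restored; this sketch assumes no extra options are needed:

```shell
# After restoring the PostgreSQL and file store backups, reconcile
# any inconsistencies between database rows and stored files.
debusine-admin delete_expired
debusine-admin vacuum_storage
```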
Server
The server can be installed using the debusine-server package. It is a
WSGI/ASGI server deployed using Gunicorn and
Uvicorn. By default, the server runs (2 × CPU
cores) + 1 workers; this may need to be tuned on individual systems, especially
those with many cores.
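The default follows the common Gunicorn rule of thumb. Assuming the default formula is unchanged, the worker count for the current machine can be computed as:

```shell
# Default Gunicorn worker count: (2 × CPU cores) + 1
workers=$(( 2 * $(nproc) + 1 ))
echo "$workers"
```

How the count is overridden depends on how Gunicorn is invoked on your system (for example, via its --workers option); on machines with many cores, a fixed, lower value is often more appropriate.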
The server includes configuration for running behind an nginx instance on the same machine, which is recommended. On larger installations, it is normally appropriate to also place a load balancer such as haproxy in front of the server.
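For orientation only, a reverse proxy in front of the server looks roughly like the following nginx sketch; the upstream address and server name are placeholders, TLS is omitted, and the configuration shipped with the debusine-server package should be used in practice:

```nginx
# Illustrative reverse-proxy sketch only — prefer the configuration
# shipped with the debusine-server package. TLS setup is omitted.
server {
    listen 80;
    server_name debusine.example.org;  # placeholder

    location / {
        proxy_pass http://127.0.0.1:8000;  # placeholder upstream address
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```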
The server may in principle be deployed on multiple machines with a load balancer to distribute traffic between them, as long as a local file store is not in use. This is not yet well-tested in production.
Unless a local file store is in use or PostgreSQL is being run on the same machine, the server does not store persistent data, although its configuration should be backed up.
systemd units:
- Main server
- Run database schema migrations on upgrade
- Dynamic worker provisioner
- Regular cleanup of expired or unused objects (timer and service)
- Regular storage maintenance (timer and service)
Scheduler
The scheduler is responsible for distributing tasks to workers. It runs as a Celery service, using Redis as its broker and PostgreSQL as its result backend.
While in theory the scheduler could be deployed separately from the server,
in practice it is part of the debusine-server package and it is normally
simplest to run it on the same machine as the server. Only one scheduler
instance may run at a time; if the server is running on multiple machines,
then systemctl mask debusine-server-scheduler.service should be run on
all but one of them.
The scheduler does not store persistent data. It shares configuration with the server, and so that configuration should be backed up.
systemd units:
- Scheduler
Workers
External workers
Each external worker runs on a separate machine, installed using the
debusine-worker package. The various Recommends of that package are
optional in the sense that the worker can still function without them, but
it will not be able to execute all tasks, so typical worker installations
should not use APT options such as --no-install-recommends.
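Installing the package normally is sufficient, since APT installs Recommends by default:

```shell
# Install an external worker; Recommends are installed by default,
# which typical workers need in order to execute all task types.
apt-get install debusine-worker
```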
External workers must be able to communicate with the server over HTTPS, and must be able to download whatever resources tasks need. In typical installations this will include broad outbound internet access.
Installations may use static workers, or may use debusine-admin
worker_pool to manage dynamic worker pools
on cloud providers. A “provisioner” service on the server is responsible
for auto-scaling dynamic workers; as with the scheduler, if the server is
running on multiple machines, then systemctl mask
debusine-server-provisioner.service should be run on all but one of them.
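On a multi-machine server deployment, both singleton services named above can be masked together on each secondary machine:

```shell
# On all but one server machine, prevent the singleton services
# from running in duplicate.
systemctl mask debusine-server-scheduler.service
systemctl mask debusine-server-provisioner.service
```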
If possible, it is usually best to maintain at least a small number of
static workers to handle base load, and reserve dynamic workers for bursts.
External workers do not store persistent data, although their configuration should be backed up.
systemd units:
- External worker
Celery workers
Celery workers handle trusted server-side tasks that require direct access
to the Debusine database. While in theory a Celery worker could be deployed
separately from the server, in practice it is part of the
debusine-server package and it is normally simplest to run it on the
same machine(s) as the server.
Celery workers do not store persistent data. They share configuration with the server, and so that configuration should be backed up.
systemd units:
- Celery worker
- Run periodic Celery tasks
Signing workers
Signing workers are responsible for handling sensitive private key material. They require a separate PostgreSQL database. Private keys may be stored in that database directly, encrypted at rest using a key known to the signing workers; alternatively, there is some support for using private keys in a hardware security module (currently only for UEFI Secure Boot).
Signing workers must be able to communicate with the server over HTTPS, and with their dedicated PostgreSQL database; they should not have general internet access. The PostgreSQL database used by signing workers must be firewalled off from everything other than the signing workers themselves.
If a hardware security module is in use, it is currently best to use only a single signing worker, as the scheduler does not yet know how to route tasks to the correct worker; we expect to remove this limitation in future.
The contents of the PostgreSQL database used by signing workers should be backed up in the same way as the main database, although it changes much less often. Signing workers do not otherwise store persistent data, although their configuration should be backed up.
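Because the signing database changes rarely, a periodic logical dump can usefully complement continuous archiving. A sketch, where the database name is a placeholder:

```shell
# Logical backup of the signing workers' database.
# "debusine-signing" is a placeholder for the actual database name.
pg_dump --format=custom --file=debusine-signing.dump debusine-signing
```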
systemd units:
- Signing worker
- Run signing database schema migrations on upgrade
Monitoring
Debusine supports Prometheus monitoring using
the /api/1.0/open-metrics/ endpoint. Freexian uses this to maintain
various Grafana graphs such as request latency,
queue lengths, and worker pool sizes, although we do not currently provide
reusable graphs.
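A Prometheus scrape job for the endpoint above might look like this; the target host is a placeholder, and the scheme assumes the server is behind HTTPS:

```yaml
# Prometheus scrape job for Debusine's OpenMetrics endpoint.
# "debusine.example.org" is a placeholder for the actual server.
scrape_configs:
  - job_name: debusine
    scheme: https
    metrics_path: /api/1.0/open-metrics/
    static_configs:
      - targets:
          - debusine.example.org
```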
Ansible configuration
Freexian provides several Ansible roles that may be useful to teams deploying Debusine, either directly or as a reference.