Dynamic cloud storage scaling

Requirements

To support larger volumes of data, debusine needs to be able to dynamically scale its storage beyond a local file system on a single server. Local storage is still useful as the initial destination for uploads to debusine, and potentially for hot data (since public clouds typically charge for outbound data transfer), but it is too limiting to be the only option available.

Workspaces currently have “default” and “other” file stores, but this system is only partially implemented: in most cases debusine only consults the default file store. Incoming uploads should still go to the default file store, but it should be possible to serve a file from any store, with some appropriate policies for which one to select. Additionally, the current system was designed before scopes existed, and it seems unnecessarily fiddly to configure file stores at the workspace level; this should move to the scope level instead.

Administrators should be able to set policies for which file stores to use at the scope level, with tools for copying and/or moving data between local and remote stores. In particular, they need a tool to ensure that all data from a remote store also exists in a local store, in order to be able to shut down a cloud account.

Uploading files to remote stores should be handled asynchronously, using a periodic job.

Expected changes

  • Move Workspace.default_file_store and Workspace.other_file_stores to Scope.file_stores, and Workspace.file_stores to Scope.upload_file_stores and Scope.download_file_stores (ordered according to the corresponding priorities; see below).

    • The default file store is given an upload priority of 100. Other file stores are left with unset priorities.

    • The data migration fails if there are workspaces in a scope with different file stores; the administrator will have to resolve that manually.

    • Move the --default-file-store option from debusine-admin manage_workspace to a new debusine-admin scope manage command.

  • Add the following extra data on the relationship between Scope and FileStore, and extend debusine-admin scope manage to be able to change it:

    • upload_priority (integer, optional): The priority of this store for the purpose of storing new files. When adding a new file, debusine tries stores whose policies allow adding new files in descending order of upload priority, counting null as the lowest.

    • download_priority (integer, optional): The priority of this store for the purpose of serving files to clients. When downloading a file, debusine tries stores in descending order of download priority, counting null as the lowest; it breaks ties in descending order of upload priority, again counting null as the lowest. If there is still a tie, it picks one of the possibilities arbitrarily.

    • policy (JSON): A Pydantic-modelled object as follows:

      • populate (boolean, defaults to False): If True, the storage maintenance job ensures that this store has a copy of all files in the scope.

      • drain (boolean or string, defaults to False): If True, the storage maintenance job moves all files in this scope to some other store in the same scope, following the same rules for finding a target store as for uploads of new files. If a string, the storage maintenance job moves all files in this scope into the store with that name in this scope. In either case, it does not move into a store if that would take its total size over soft_max_size (either for the scope or the file store), and it logs an error if it cannot find any eligible target store.

      • read_only (boolean, defaults to False): If True, debusine will not add new files to this store. Use this in combination with drain: True to prepare for removing the file store.

      • write_only (boolean, defaults to False): If True, debusine will not read files from this store. This is suitable for provider storage classes that are designed for long-term archival rather than routine retrieval, such as S3 Glacier Deep Archive.

      • soft_max_size (integer, optional): The maximum number of bytes that the file store can hold for this scope (counting files that are in multiple scopes towards each of those scopes). This limit may be exceeded temporarily during uploads; the storage maintenance job will move the least-recently-used files to another file store to get back below the limit.

  • In non-test code that reads file contents (debusine.server.tar.TarArtifact, debusine.server.tasks.package_upload.PackageUpload, debusine.web.views.files.FileDownloadMixin, debusine.web.views.files.FileWidget, debusine.web.views.lintian.LintianView), use Scope.download_file_stores(file).first() or equivalent rather than Scope.default_file_store.

  • Add a new instance_wide boolean field to FileStore. If True, this store can be used by any scope on this debusine instance. If False, it may only be used by a single scope (i.e. there is a unique constraint on Scope/FileStore relations where FileStore.instance_wide is False).

  • Add new soft_max_size and max_size integer fields to FileStore, specifying soft and hard limits respectively in bytes for the total capacity of the store. The soft limit may be exceeded temporarily during uploads; the storage maintenance job will move the least-recently-used files to another file store to get back below the limit. The hard limit may not be exceeded even temporarily during uploads.

  • Add options to debusine-admin scope manage to allow modifying the set of file stores for a scope. (Note that this potentially includes enabling or disabling the shared local storage, which is a store.)

  • In debusine-admin monthly_cleanup, handle files that do not have a local path.

  • Rename debusine-admin monthly_cleanup to debusine-admin vacuum_file_store, in preparation for it becoming a more general storage maintenance job. Run it daily rather than monthly.

  • Update debusine-admin vacuum_file_store to copy or move files in a scope to other stores as needed to satisfy the populate, drain, and soft_max_size policies.

  • Add a debusine-admin scope show command, showing data including a brief tabular representation of the contents of the scope’s file stores. A file store with no files either has never been populated or has been drained, and can safely be removed.

  • Add a debusine-admin delete_file_store command, mirroring the existing debusine-admin create_file_store. It must refuse to delete a file store that still contains files (unless --force is used), and suggest that the store be drained first.

  • Implement an Amazon S3 file store, with configuration holding the necessary API keys. get_url should be implemented using a presigned URL with a short expiration time, e.g. using boto3.

Storage policy recommendations

The shared local storage should normally have the highest upload_priority of any store, in order not to block uploads of new files on slow data transfers. Its store-level soft_max_size field should be set somewhat below the available file system size, with clearance for at least a week’s worth of uploads if possible. That will give the daily storage maintenance job time to move least-recently-used files to other file stores.
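The priority ordering described under Expected changes (descending, counting null as the lowest, with ties in download priority broken by upload priority) can be sketched with plain sort keys; ScopeFileStore here is a hypothetical stand-in for the Scope/FileStore relation:

```python
from typing import List, NamedTuple, Optional, Tuple


class ScopeFileStore(NamedTuple):
    # Hypothetical stand-in for the Scope/FileStore relation.
    name: str
    upload_priority: Optional[int] = None
    download_priority: Optional[int] = None


def _desc_null_lowest(value: Optional[int]) -> Tuple[bool, int]:
    # Ascending sort key implementing "descending order of priority,
    # counting null as the lowest".
    return (value is None, -(value or 0))


def upload_order(stores: List[ScopeFileStore]) -> List[ScopeFileStore]:
    return sorted(stores, key=lambda s: _desc_null_lowest(s.upload_priority))


def download_order(stores: List[ScopeFileStore]) -> List[ScopeFileStore]:
    # Ties in download priority are broken by upload priority.
    return sorted(
        stores,
        key=lambda s: _desc_null_lowest(s.download_priority)
        + _desc_null_lowest(s.upload_priority),
    )
```

With the recommended setup, the shared local store (upload priority 100) sorts first for uploads, and a store with both priorities unset sorts last everywhere.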

To guard against data loss, files may be in multiple stores: for example, a backup store might use the populate policy to ensure that it has a copy of all files, and perhaps write_only to ensure that debusine does not try to serve files directly from it. Alternatively, an administrator might use lower-level tools such as rclone to handle backups.
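The soft_max_size behaviour described above (move least-recently-used files to another store until the total is back under the limit) can be sketched as follows; FileEntry is a hypothetical stand-in for debusine’s file records, and the selection logic is an assumption:

```python
from typing import List, NamedTuple


class FileEntry(NamedTuple):
    # Hypothetical stand-in for a file record in a store.
    sha256: str
    size: int          # bytes
    last_used: float   # timestamp; smaller means used longer ago


def files_to_move(files: List[FileEntry], soft_max_size: int) -> List[FileEntry]:
    """Pick least-recently-used files until the store fits under the limit."""
    total = sum(f.size for f in files)
    to_move: List[FileEntry] = []
    # Consider the least-recently-used files first.
    for entry in sorted(files, key=lambda f: f.last_used):
        if total <= soft_max_size:
            break
        to_move.append(entry)
        total -= entry.size
    return to_move
```

The storage maintenance job would then copy each selected file to an eligible target store (subject to the same soft_max_size checks there) before removing it from the over-full store.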