.. _blueprint-debuginfod-server:

=================
debuginfod server
=================

Related issue: :issue:`Provide debuginfod server <957>`

Debusine builds and hosts Debian repositories but currently has no way to
serve debug symbols to developers. When a crash occurs, a developer must
manually find and install the correct ``-dbgsym`` package for the exact
binary they are debugging.

This blueprint proposes bringing `debuginfod `_ server functionality to
Debusine, allowing :manpage:`gdb(1)` to automatically fetch debug symbols
from a Debusine archive without any manual steps.

Goals
=====

- Extract ``.debug`` ELF files from ``-dbgsym`` packages during the
  :task:`Sbuild` task and store them as discrete artifacts in Debusine.
- Serve those artifacts through debuginfod-compatible HTTP endpoints
  (``/buildid//debuginfo``), scoped per archive.
- Document how to configure :manpage:`gdb(1)` to use a Debusine archive as
  a debuginfod source, so developers can get automatic symbol resolution
  with a single ``DEBUGINFOD_URLS`` environment variable.

Requirements
============

- The extraction pipeline must run inside the isolated sbuild worker so
  that a malformed or malicious ELF file cannot affect the web server or
  other builds.
- Each extracted ``.debug`` file must be stored as a Debusine artifact
  indexed by its build-ID for fast HTTP lookups.
- Debug symbol artifacts must be published into a
  :collection:`debian:suite` alongside their parent
  :artifact:`debian:binary-package` artifacts, via the existing
  ``RELATES_TO`` relation and :task:`CopyCollectionItems` mechanism.
- debuginfod URLs must be resolvable from the root of each archive, so
  that a single ``DEBUGINFOD_URLS`` entry covers all suites in the archive.
- The HTTP response must include the headers required by the debuginfod
  protocol specification: ``X-DEBUGINFOD-FILE`` and ``X-DEBUGINFOD-SIZE``.
- Responses must support HTTP 206 Partial Content so that
  :manpage:`gdb(1)` can fetch individual ELF sections without downloading
  the full file.
- Two :artifact:`debian:debug-symbols` artifacts in the same archive that
  share a build-ID must have identical file contents, analogous to the
  existing pool-file uniqueness constraints.
- The :collection:`debian:archive` collection must expose a per-archive
  configuration option (similar to Launchpad's ``build_debug_symbols``)
  controlling whether debug symbols are extracted. When disabled,
  ``DEB_BUILD_OPTIONS=noautodbgsym`` is passed to sbuild to suppress
  ``-dbgsym`` package generation entirely.

Out of scope
============

The following parts of the official debuginfod specification are excluded:

- **Source file serving** (``/buildid//source``): the
  :manpage:`debuginfod(8)` man page explicitly notes that, due to Debian
  and Ubuntu packaging policies, debuginfod cannot resolve source files for
  ``.deb`` and ``.ddeb`` packages. `debuginfod.debian.net has the same limitation `_.
- **Executable serving** (``/buildid//executable``):
  :artifact:`debian:binary-package` stores the whole ``.deb`` as a single
  file rather than broken-out binaries. Extracting individual executables
  on-the-fly inside a request handler is not feasible; pre-extraction at
  build time would require significant additional storage and pipeline
  changes that are out of scope for the initial implementation.
- **Metrics endpoint** (``/metrics``): this Prometheus statistics endpoint
  is designed for a standalone C++ daemon; Debusine's existing
  application-level logging and system-wide monitoring tools are more
  appropriate.
- **Metadata search** (``/metadata``): implementing a searchable JSON index
  for build-ID metadata requires a separate database design that is out of
  scope for the initial implementation.
- **IMA signatures** (``X-DEBUGINFOD-IMA-SIGNATURE``): this response header
  carries per-file Integrity Measurement Architecture signatures used
  primarily for RPM packages and has no standard applicability to Debian
  ``.deb`` packages.
- **Upstream federation**: automatically forwarding unresolved build-ID
  requests to upstream servers such as ``debuginfod.debian.net`` is
  excluded. Debusine's primary use case is serving self-hosted, private, or
  localised archives where the operator controls the source.
- **DWZ supplement ingestion**: ``-dbgsym.deb`` packages built with DWZ
  compression ship a shared supplement ELF file at
  ``./usr/lib/debug/.dwz/.debug`` that several regular ``.debug`` files
  reference via their ``.gnu_debugaltlink`` section. Ingesting these
  supplements is excluded from the initial implementation to keep the first
  iteration small. Until DWZ support is added, :manpage:`gdb(1)` requests
  for the supplement's build-ID return ``404``; the practical effect is
  that DWZ-using debug info renders with the alternate strings table
  missing, not that debugging fails entirely.

Background: dbgsym packages and ELF build-IDs
==============================================

When Debian builds a binary package it strips debug information to keep the
shipped binary small. For example, ``util-linux_2.40.2-1_amd64.deb``
contains the stripped binary while ``util-linux-dbgsym_2.40.2-1_amd64.deb``
contains the DWARF debug symbols for the exact same binary.

Inside the ``data.tar`` of every ``-dbgsym.deb`` the debug files follow a
fixed path convention::

    ./usr/lib/debug/.build-id/XX/YYYY.debug

where ``XX`` is the first two hex characters of the 40-character build-ID
and ``YYYY`` is the remaining 38 hex characters. The build-ID is assigned
at link time and uniquely identifies a specific binary.

ELF classification uses three sections:

``.note.gnu.build-id``
    Contains the raw build-ID bytes. Reading them sequentially and
    converting to lowercase hex yields the familiar 40-character string.

``.debug_info`` / ``.gnu_debugdata``
    Presence of either section confirms that DWARF debug information is
    embedded, marking the file as a ``.debug`` artifact to be extracted.

``.gnu_debugaltlink``
    Present when a package was built with DWZ compression. Contains the
    build-ID of a shared DWZ supplement file living at
    ``./usr/lib/debug/.dwz/.debug`` inside the same ``-dbgsym.deb``. DWZ
    supplement ingestion is excluded from the initial implementation (see
    "Out of scope" above); only files under
    ``./usr/lib/debug/.build-id/`` are ingested.

Implementation plan
===================

.. artifact:: debian:debug-symbols

New artifact category: ``debian:debug-symbols``
------------------------------------------------

A new artifact category ``debian:debug-symbols`` will be added as an enum
value in ``debusine/artifacts/models.py``. Each instance represents all
``.debug`` ELF files extracted from a single ``-dbgsym`` package.

Files are stored as multiple entries within one artifact; each entry's
``FileInArtifact.path`` is set to ``usr/lib/debug/.build-id//.debug``,
where ```` is the first two hex characters of the build-ID and ```` is the
remaining 38 characters. This is the same path the file occupies inside the
``-dbgsym.deb`` with the leading ``./`` stripped.

Storing all debug files for one ``-dbgsym`` package in a single artifact
avoids creating one artifact per debug file, which would place excessive
load on the database during publishing and expiry.

The artifact data carries the following field:

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Field
     - Type
     - Purpose
   * - ``build_ids``
     - list of 40-char hex strings
     - Index of all build-IDs contained in this artifact

``X-DEBUGINFOD-SIZE`` is derived from the stored file size at serve time
and does not need to be persisted.
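As an illustration of the path convention described above, the following
minimal Python sketch converts raw build-ID bytes to lowercase hex and
derives the relative path used for an artifact entry. The function names are
hypothetical and not part of Debusine, and the sketch assumes 20-byte
(40 hex character) build-IDs, as the background section does.

```python
import re

# Assumption: build-IDs are 20 raw bytes, i.e. 40 lowercase hex characters.
_BUILD_ID_RE = re.compile(r"[0-9a-f]{40}")


def raw_build_id_to_hex(raw: bytes) -> str:
    """Convert the raw bytes of a build-ID note to lowercase hex."""
    if len(raw) != 20:
        raise ValueError(f"expected 20 build-ID bytes, got {len(raw)}")
    return raw.hex()  # bytes.hex() always yields lowercase digits


def debug_file_path(build_id: str) -> str:
    """Return the relative .debug path for a 40-character build-ID.

    The first two hex characters name the directory and the remaining
    38 name the file, with no leading "./".
    """
    if not _BUILD_ID_RE.fullmatch(build_id):
        raise ValueError(f"not a 40-character hex build-ID: {build_id!r}")
    return f"usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```

Validating the build-ID before building the path keeps malformed input from
producing a path that falls outside the expected directory layout.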
Extraction pipeline in the :task:`Sbuild` task
-----------------------------------------------

Debusine already opens ``.deb`` files in ``upload_artifact()`` to read
control data and create :artifact:`debian:binary-package` artifacts. The
debug-symbol extraction follows the same pattern, added as two new helpers
called from ``_upload_binary_packages()``:

``_upload_debug_symbols(dbgsym_deb)``
    For each ``-dbgsym.deb`` in the build output:

    1. Open the ``data.tar`` archive.
    2. Iterate over every file whose path matches
       ``./usr/lib/debug/.build-id/**/*.debug``. Each such file becomes one
       entry in the resulting :artifact:`debian:debug-symbols` artifact,
       with its build-ID joining the artifact's ``build_ids`` list and its
       ``FileInArtifact.path`` set to ``usr/lib/debug/.build-id//.debug``
       (the same in-tar path with the leading ``./`` stripped). Files under
       ``./usr/lib/debug/.dwz/`` (DWZ supplements) are not ingested in the
       initial implementation; see "Out of scope" above.
    3. For each file, parse the ELF structure with ``pyelftools`` to locate
       the ``.note.gnu.build-id`` section and extract the build-ID. Also
       check for ``.debug_info`` / ``.gnu_debugdata`` to confirm it is a
       debug file.
    4. Verify that the in-tar path matches the convention
       ``./usr/lib/debug/.build-id//.debug``, where ```` and ```` are
       derived from the build-ID extracted in step 3. If the path does not
       match, the task fails with an explanatory error. ``dh_strip``'s
       ``make_debug`` function constructs this path deterministically from
       the build-ID and we are not aware of any tooling in Debian that
       constructs ``-dbgsym.deb`` packages by hand, so a mismatch indicates
       a malformed package and must not be silently ingested.
    5. Accumulate all such files into a single
       :artifact:`debian:debug-symbols` artifact and upload it.
``_create_debug_symbol_relations(debug_artifact, binary_artifact)``
    Records a ``RELATES_TO`` relation from each
    :artifact:`debian:debug-symbols` artifact to its parent
    :artifact:`debian:binary-package` artifact.

Running extraction inside the sbuild worker confines the blast radius of any
malformed or malicious ELF input to the isolated worker process and avoids
re-fetching files from artifact storage, since all build output is already
present on disk.

Collection specification changes
---------------------------------

The specs in ``docs/reference/collections/specs/`` must be updated to
reflect the new artifact type:

- ``debian:suite`` and ``debian:archive``: a uniqueness constraint is added
  in both collection specifications, in the same shape as the existing
  ``pool-file`` constraints. Within either a single
  :collection:`debian:suite` or a single :collection:`debian:archive`, two
  :artifact:`debian:debug-symbols` collection items that share a build-ID
  must refer to files with identical contents. The suite-level constraint
  allows the archive-level constraint to be relaxed when an obsolete suite
  is removed; the archive-level constraint prevents two suites in the same
  archive from disagreeing about the file for a given build-ID. The
  constraint text in the two specifications is essentially identical.
- ``debian:suite``: :artifact:`debian:debug-symbols` becomes a valid item
  type. See the per-item data table under "Publishing into a suite" below.
- ``debian:archive``: a new boolean data field ``build_debug_symbols``
  (default ``true``) controls whether debug symbols are extracted for this
  archive.
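The uniqueness constraint in the first item above can be sketched in Python
as follows. This is a minimal illustration under stated assumptions, not
Debusine's implementation: the ``BuildIdConflict`` exception, the
``register_build_id`` function, and the in-memory dictionary standing in for
a collection's items are all hypothetical, and file contents are compared by
a hex digest string.

```python
from typing import Dict


class BuildIdConflict(Exception):
    """Hypothetical error: one build-ID maps to two different contents."""


def register_build_id(
    known: Dict[str, str], build_id: str, content_hash: str
) -> bool:
    """Record a build-ID in one collection (a suite or an archive).

    `known` maps build-IDs already present in the collection to the hash
    of their file contents. Returns True when the build-ID is new, False
    when an identical file is already present, and raises BuildIdConflict
    when the same build-ID arrives with different contents.
    """
    existing = known.get(build_id)
    if existing is None:
        known[build_id] = content_hash
        return True
    if existing == content_hash:
        return False  # identical duplicate: allowed by the constraint
    raise BuildIdConflict(
        f"build-ID {build_id} already present with different contents"
    )
```

Applying the same check once against a suite's items and once against the
archive's items would mirror the two constraints described above.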
Database index
--------------

A new Django migration adds a partial B-tree index on
``CollectionItem.data->>'build_id'`` conditioned on:

- ``category = 'debian:debug-symbols'``
- ``child_type = 'a'`` (artifact item in Debusine's ``CollectionItem``
  model)
- ``parent_category = 'debian:suite'``

This pattern is taken directly from ``migration 0005``, which adds a similar
partial index for repository index path lookups. The partial condition keeps
the index small by covering only debug-symbol rows in suite collections,
excluding the much larger set of binary and source package rows that carry
no ``build_id`` field.

Because URLs are anchored at the archive level, build-IDs must be unique
across the entire archive (not just within a single suite). This is enforced
by the uniqueness constraints described in the collection specification
changes above (one each in :collection:`debian:suite` and
:collection:`debian:archive`), rather than by the index itself.

Publishing into a suite
-----------------------

When a binary package is published into a suite,
:workflow:`package_publish` must pull in the matching debug symbols
alongside it. A new helper ``_add_debug_symbols()`` follows the
``RELATES_TO`` relation recorded during the :task:`Sbuild` task to obtain
the :artifact:`debian:debug-symbols` artifact, then queues it for copying
into the suite via :task:`CopyCollectionItems`.

Inside ``DebianSuiteManager.do_add_artifact()``, a new ``elif`` branch
handles the ``DEBUG_SYMBOLS`` category and creates one ``CollectionItem``
row per build-ID contained in the artifact:

.. list-table:: Per-item data for ``debian:debug-symbols`` items in a suite
   :header-rows: 1
   :widths: 25 25 50

   * - Field
     - Type
     - Source
   * - ``build_id``
     - 40-char hex string
     - The build-ID this collection item represents
   * - ``srcpkg_name``
     - string
     - Mirrored from the parent :artifact:`debian:binary-package` item
   * - ``srcpkg_version``
     - string
     - Mirrored from the parent :artifact:`debian:binary-package` item
   * - ``package``
     - string
     - Mirrored from the parent :artifact:`debian:binary-package` item
   * - ``version``
     - string
     - Mirrored from the parent :artifact:`debian:binary-package` item
   * - ``architecture``
     - string
     - Mirrored from the parent :artifact:`debian:binary-package` item

The collection item is named ``debugsym:``. The "parent" binary-package item
is the one reached via the ``RELATES_TO`` relation recorded by
``_create_debug_symbol_relations()`` during the :task:`Sbuild` task.

If the suite already contains a ``CollectionItem`` with the same
``(name, parent_collection)`` (which occurs when a source package is built
reproducibly more than once), the existing item's file hash is compared
against the incoming file. If the hashes match, the collision is logged and
ignored; if they differ, an error is raised, as this would indicate a
non-reproducible build with the same build-ID, which is a toolchain problem.

HTTP serving
------------

``DebugInfoView``
~~~~~~~~~~~~~~~~~

The view is scoped to an archive (inherits from ``ArchiveFileView``) rather
than a suite, so that a single ``DEBUGINFOD_URLS`` entry resolves build-IDs
across all suites in the archive.

URL pattern (appended automatically by :manpage:`gdb(1)`)::

    https://///buildid//debuginfo

The view queries ``CollectionItem`` filtered by ``build_id`` and archive,
hitting the partial B-tree index for a fast lookup. It then calls the
existing ``stream_file()`` helper and appends the mandatory response
headers:

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Header
     - Source
     - Description
   * - ``X-DEBUGINFOD-FILE``
     - computed from ``build_id`` as ``/usr/lib/debug/.build-id//.debug``
     - Path of the ``.debug`` file within the binary package. The leading
       ``./`` of the in-tar path is an implementation detail of the ``.deb``
       format and is stripped before the value is emitted.
   * - ``X-DEBUGINFOD-SIZE``
     - ``file_in_artifact.file.size``
     - File size in bytes

:manpage:`gdb(1)` sends a ``HEAD`` request before ``GET`` to check
availability. Django's ``django.views.generic.base.View.setup`` already
aliases ``head`` to ``get`` when ``get`` is defined and ``head`` is not, so
``DebugInfoView`` needs no explicit handling. ``HEAD`` is covered by unit
tests alongside ``GET`` and ``Range:`` requests.
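The response headers and ``Range:`` handling described above can be sketched
in plain Python as follows. This is a minimal illustration, not the actual
``DebugInfoView`` or ``stream_file()`` code: all function names are
hypothetical, no Django APIs are used, and only the single-range form of the
``Range:`` header is handled.

```python
import re
from typing import Dict, Optional, Tuple

_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def debuginfod_headers(build_id: str, size: int) -> Dict[str, str]:
    """Build the two debuginfod response headers for one debug file."""
    path = f"/usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
    return {"X-DEBUGINFOD-FILE": path, "X-DEBUGINFOD-SIZE": str(size)}


def parse_single_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets for a 206 response.

    Returns None when the header is not a single satisfiable byte range,
    in which case a caller would fall back to another response.
    """
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None or size <= 0:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the final N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return max(size - length, 0), size - 1
    start = int(first)
    if start >= size:
        return None
    # An open-ended or oversized range is clamped to the last byte.
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return start, end


def content_range(start: int, end: int, size: int) -> str:
    """Format the Content-Range value that accompanies a 206 response."""
    return f"bytes {start}-{end}/{size}"
```

For example, ``parse_single_range("bytes=-100", 1000)`` yields ``(900, 999)``,
which ``content_range`` formats as ``bytes 900-999/1000``.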