fix(kernel-bootstrap): unify external raw container flow and remove default second disk

vxtls 2026-03-03 08:33:19 -05:00
parent 919200478b
commit 85c389044d
4 changed files with 120 additions and 91 deletions


@@ -2,10 +2,10 @@
This repository uses [`README.rst`](./README.rst) as the canonical main documentation. This repository uses [`README.rst`](./README.rst) as the canonical main documentation.
## Kernel-bootstrap `payload.img` ## Kernel-bootstrap raw `external.img`
`payload.img` is a raw container disk used in kernel-bootstrap offline mode `external.img` is a raw container disk used in kernel-bootstrap mode when
(`--repo` and `--external-sources` are both unset). `--external-sources` is set and `--repo` is unset.
### Why not put everything in the initial image? ### Why not put everything in the initial image?
@@ -20,7 +20,7 @@ around the Fiwix transition in QEMU or on bare metal).
So the design is intentionally split: So the design is intentionally split:
- Initial image: only what is required to reach `improve: import_payload` - Initial image: only what is required to reach `improve: import_payload`
- `payload.img`: the rest of offline distfiles - `external.img`: the rest of distfiles
This is not a patch-style workaround. It is a two-phase transport design that This is not a patch-style workaround. It is a two-phase transport design that
keeps early boot deterministic and moves bulk data import to a stage where the keeps early boot deterministic and moves bulk data import to a stage where the
@@ -29,14 +29,14 @@ runtime is robust enough to process it safely.
### Why import from an external image and copy into main filesystem? ### Why import from an external image and copy into main filesystem?
Because the bootstrap still expects distfiles to end up under the normal local Because the bootstrap still expects distfiles to end up under the normal local
path (`/external/distfiles`) for later steps. `payload.img` is used as a path (`/external/distfiles`) for later steps. `external.img` is used as a
transport medium only. transport medium only.
The flow is: The flow is:
1. Boot minimal initial image. 1. Boot minimal initial image.
2. Reach `improve: import_payload`. 2. Reach `improve: import_payload`.
3. Detect the payload disk by magic (`LBPAYLD1`) across detected block devices. 3. Detect the external container disk by magic (`LBPAYLD1`) across detected block devices.
4. Copy payload files into `/external/distfiles`. 4. Copy payload files into `/external/distfiles`.
5. Continue the build exactly as if files had been present locally all along. 5. Continue the build exactly as if files had been present locally all along.
@@ -54,7 +54,7 @@ The importer probes detected block devices and selects the one with magic `LBPAY
### Manual creation without Python ### Manual creation without Python
Prepare `payload.list` as: Prepare `external.list` as:
```text ```text
<archive-name> <absolute-path-to-archive> <archive-name> <absolute-path-to-archive>
@@ -66,8 +66,8 @@ Then:
cat > make-payload.sh <<'SH' cat > make-payload.sh <<'SH'
#!/bin/sh #!/bin/sh
set -e set -e
out="${1:-payload.img}" out="${1:-external.img}"
list="${2:-payload.list}" list="${2:-external.list}"
write_u32le() { write_u32le() {
v="$1" v="$1"
@@ -89,14 +89,17 @@ while read -r name path; do
done < "${list}" done < "${list}"
SH SH
chmod +x make-payload.sh chmod +x make-payload.sh
./make-payload.sh payload.img payload.list ./make-payload.sh external.img external.list
``` ```
Attach `payload.img` as an extra raw disk in QEMU, or as the second disk on bare metal. Attach `external.img` as an extra raw disk in QEMU, or as the second disk on bare metal.
### When it is used ### When it is used
- Used in kernel-bootstrap offline mode. - Used in kernel-bootstrap with `--external-sources` and without `--repo`.
- Not used when `--repo` or `--external-sources` is provided. - Not used with `--repo` (that path still uses an ext filesystem disk).
- `--build-guix-also` increases payload contents (includes post-early `steps-guix` - Without `--external-sources` and without `--repo`, there is no second disk:
the initial image only includes distfiles needed before `improve: get_network`,
and later distfiles are downloaded from mirrors.
- `--build-guix-also` increases container contents (includes post-early `steps-guix`
sources), but does not change the mechanism. sources), but does not change the mechanism.
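
The three cases listed under "When it is used" can be summarized as a small decision function. This is a minimal sketch, not code from this commit: `select_transport` is a hypothetical name, while the returned mode strings mirror the ones assigned in `_select_kernel_bootstrap_mode`.

```python
def select_transport(repo: bool, external_sources: bool) -> str:
    """Return how a kernel-bootstrap run transports distfiles.

    Hypothetical helper; the mode strings follow the commit's naming.
    """
    if repo:
        # --repo wins: the second disk stays an ext3 filesystem disk.
        return "repo"
    if external_sources:
        # --external-sources without --repo: raw external.img container.
        return "raw_external"
    # Neither flag: no second disk, later distfiles come from mirrors.
    return "network_only"
```

Checking `repo` first reflects the rule that the raw container is only used when `--repo` is unset.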


@@ -63,34 +63,36 @@ Without using Python:
* *Only* copy distfiles listed in ``sources`` files for ``build:`` steps * *Only* copy distfiles listed in ``sources`` files for ``build:`` steps
manifested before ``improve: get_network`` into this disk. manifested before ``improve: get_network`` into this disk.
* In kernel-bootstrap offline mode (no ``--repo`` and no * In kernel-bootstrap mode with ``--external-sources`` (and no ``--repo``),
``--external-sources``), use the second image as ``payload.img``. use the second image as ``external.img``.
``payload.img`` is a raw container (not a filesystem) used to carry the ``external.img`` is a raw container (not a filesystem) used to carry the
distfiles that are not needed before ``improve: import_payload``. distfiles that are not needed before ``improve: import_payload``.
In other words, the first image only carries the minimal set needed to In other words, the first image only carries the minimal set needed to
reach the importer; the rest of the offline distfiles live in payload. reach the importer; the rest of the distfiles live in ``external.img``.
* Header magic: ``LBPAYLD1`` (8 bytes). * Header magic: ``LBPAYLD1`` (8 bytes).
* Then: little-endian ``u32`` file count. * Then: little-endian ``u32`` file count.
* Repeated for each file: little-endian ``u32`` name length, * Repeated for each file: little-endian ``u32`` name length,
little-endian ``u32`` file size, raw file name bytes, raw file bytes. little-endian ``u32`` file size, raw file name bytes, raw file bytes.
* If you are not in that mode, the second disk can still be used as an * With ``--repo``, the second disk remains an ext3 distfiles/repo disk.
optional ext3 distfiles disk, as before. * Without ``--external-sources`` and without ``--repo``, no second disk is
used: the initial image includes only pre-network distfiles, and later
distfiles are downloaded from configured mirrors after networking starts.
* Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (main * Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (main
builder image plus payload/ext3 image), a NIC with model E1000 builder image plus external image, when a second image is used), a NIC with model E1000
(``-nic user,model=e1000``), and ``-machine kernel-irqchip=split``. (``-nic user,model=e1000``), and ``-machine kernel-irqchip=split``.
c. **Bare metal:** Follow the same steps as QEMU, but the disks need to be c. **Bare metal:** Follow the same steps as QEMU, but the disks need to be
two different *physical* disks, and boot from the first disk. two different *physical* disks, and boot from the first disk.
Manual ``payload.img`` preparation Manual raw ``external.img`` preparation
---------------------------------- ---------------------------------------
The following script creates a raw ``payload.img`` from a manually prepared The following script creates a raw ``external.img`` from a manually prepared
file list. This is equivalent to what ``rootfs.py`` does for kernel-bootstrap file list. This is equivalent to what ``rootfs.py`` does for kernel-bootstrap
offline mode. with ``--external-sources`` (and no ``--repo``).
1. Prepare a ``payload.list`` with one file per line, formatted as: 1. Prepare an ``external.list`` with one file per line, formatted as:
``<archive-name> <absolute-path-to-archive>``. ``<archive-name> <absolute-path-to-archive>``.
2. Run: 2. Run:
@@ -99,8 +101,8 @@ offline mode.
cat > make-payload.sh <<'EOF' cat > make-payload.sh <<'EOF'
#!/bin/sh #!/bin/sh
set -e set -e
out="${1:-payload.img}" out="${1:-external.img}"
list="${2:-payload.list}" list="${2:-external.list}"
write_u32le() { write_u32le() {
v="$1" v="$1"
@@ -122,16 +124,19 @@ offline mode.
done < "${list}" done < "${list}"
EOF EOF
chmod +x make-payload.sh chmod +x make-payload.sh
./make-payload.sh payload.img payload.list ./make-payload.sh external.img external.list
3. Attach ``payload.img`` as an additional raw disk when booting in QEMU, or 3. Attach ``external.img`` as an additional raw disk when booting in QEMU, or
as the second physical disk on bare metal. as the second physical disk on bare metal.
Notes: Notes:
* ``payload.img`` is used in kernel-bootstrap offline mode regardless of * ``external.img`` raw container mode is used with ``--external-sources`` (and
``--build-guix-also``. With ``--build-guix-also``, the payload content is no ``--repo``). With ``--build-guix-also``, the container content is larger
larger because it also includes post-early sources from ``steps-guix``. because it also includes post-early sources from ``steps-guix``.
* Without ``--external-sources`` and without ``--repo``, there is no second
image. The initial image only includes distfiles needed before
``improve: get_network``; later distfiles are downloaded from mirrors.
* The runtime importer identifies the correct disk by checking the magic * The runtime importer identifies the correct disk by checking the magic
``LBPAYLD1`` on each detected block device, not by assuming a device name. ``LBPAYLD1`` on each detected block device, not by assuming a device name.
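
The container layout described above (8-byte magic, little-endian ``u32`` file count, then a name length, file size, name bytes and file bytes per entry) can be illustrated with an in-memory round trip. This is a sketch under stated assumptions: `pack_container` and `unpack_container` are hypothetical helpers written for illustration, and the reader is not the runtime importer used during the bootstrap.

```python
import io
import struct

MAGIC = b"LBPAYLD1"  # 8-byte header magic from the format description


def pack_container(files):
    """Serialize {name: bytes} into the raw container layout.

    Entries are written in sorted name order, as the generator does.
    """
    out = io.BytesIO()
    out.write(MAGIC)
    out.write(struct.pack("<I", len(files)))  # little-endian u32 file count
    for name in sorted(files):
        name_bytes = name.encode("utf-8")
        data = files[name]
        # u32 name length, then u32 file size, both little-endian
        out.write(struct.pack("<II", len(name_bytes), len(data)))
        out.write(name_bytes)
        out.write(data)
    return out.getvalue()


def unpack_container(blob):
    """Parse a raw container back into {name: bytes}."""
    stream = io.BytesIO(blob)
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a raw container: bad magic")
    (count,) = struct.unpack("<I", stream.read(4))
    files = {}
    for _ in range(count):
        name_len, size = struct.unpack("<II", stream.read(8))
        name = stream.read(name_len).decode("utf-8")
        files[name] = stream.read(size)
    return files
```

Because the format has no filesystem structure, a reader only needs sequential reads. The same magic check is what lets the second disk be identified without assuming a device name.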


@@ -26,7 +26,7 @@ class Generator():
git_dir = os.path.join(os.path.dirname(os.path.join(__file__)), '..') git_dir = os.path.join(os.path.dirname(os.path.join(__file__)), '..')
distfiles_dir = os.path.join(git_dir, 'distfiles') distfiles_dir = os.path.join(git_dir, 'distfiles')
payload_magic = b'LBPAYLD1' raw_container_magic = b'LBPAYLD1'
# pylint: disable=too-many-arguments,too-many-positional-arguments # pylint: disable=too-many-arguments,too-many-positional-arguments
def __init__(self, arch, external_sources, early_preseed, repo_path, mirrors, def __init__(self, arch, external_sources, early_preseed, repo_path, mirrors,
@@ -46,8 +46,9 @@ class Generator():
build_guix_also=self.build_guix_also build_guix_also=self.build_guix_also
) )
self.bootstrap_source_manifest = self.source_manifest self.bootstrap_source_manifest = self.source_manifest
self.payload_source_manifest = [] self.external_source_manifest = []
self.payload_image = None self.external_image = None
self.kernel_bootstrap_mode = None
self.target_dir = None self.target_dir = None
self.external_dir = None self.external_dir = None
@@ -59,13 +60,31 @@ class Generator():
self.external_dir = os.path.join(self.target_dir, 'external') self.external_dir = os.path.join(self.target_dir, 'external')
self.distfiles() self.distfiles()
def _prepare_kernel_bootstrap_payload_manifests(self): def _select_kernel_bootstrap_mode(self):
""" """
Split early source payload from full offline payload. Select how kernel-bootstrap should transport distfiles.
""" """
# Keep the early builder payload small: include only sources needed if self.repo_path:
# before improve: import_payload runs, so payload.img is the primary self.kernel_bootstrap_mode = "repo"
# carrier for the rest of the offline distfiles. self.external_source_manifest = []
return
if self.external_sources:
self.kernel_bootstrap_mode = "raw_external"
self._prepare_kernel_bootstrap_external_manifests()
return
self.kernel_bootstrap_mode = "network_only"
self.bootstrap_source_manifest = self.early_source_manifest
self.external_source_manifest = []
def _prepare_kernel_bootstrap_external_manifests(self):
"""
Split distfiles between init image and external raw container.
"""
# Keep the early builder image small: include only sources needed
# before improve: import_payload runs, so external.img is the primary
# carrier for the remaining distfiles.
self.bootstrap_source_manifest = self.get_source_manifest( self.bootstrap_source_manifest = self.get_source_manifest(
stop_before_improve="import_payload", stop_before_improve="import_payload",
build_guix_also=False build_guix_also=False
@@ -75,7 +94,7 @@ class Generator():
if self.bootstrap_source_manifest == full_manifest: if self.bootstrap_source_manifest == full_manifest:
raise ValueError("steps/manifest must include `improve: import_payload` in kernel-bootstrap mode.") raise ValueError("steps/manifest must include `improve: import_payload` in kernel-bootstrap mode.")
bootstrap_set = set(self.bootstrap_source_manifest) bootstrap_set = set(self.bootstrap_source_manifest)
self.payload_source_manifest = [entry for entry in full_manifest if entry not in bootstrap_set] self.external_source_manifest = [entry for entry in full_manifest if entry not in bootstrap_set]
def _copy_manifest_distfiles(self, out_dir, manifest): def _copy_manifest_distfiles(self, out_dir, manifest):
os.makedirs(out_dir, exist_ok=True) os.makedirs(out_dir, exist_ok=True)
@@ -92,7 +111,7 @@ class Generator():
self.download_file(url, directory, file_name) self.download_file(url, directory, file_name)
self.check_file(distfile_path, checksum) self.check_file(distfile_path, checksum)
def _create_raw_payload_image(self, target_path, manifest): def _create_raw_container_image(self, target_path, manifest, image_name="external.img"):
if manifest is None: if manifest is None:
manifest = [] manifest = []
@@ -103,31 +122,33 @@ class Generator():
files_by_name = {} files_by_name = {}
for checksum, _, _, file_name in manifest: for checksum, _, _, file_name in manifest:
if file_name in files_by_name and files_by_name[file_name] != checksum: if file_name in files_by_name and files_by_name[file_name] != checksum:
raise ValueError(f"Conflicting payload file with same name but different hash: {file_name}") raise ValueError(
f"Conflicting container file with same name but different hash: {file_name}"
)
files_by_name[file_name] = checksum files_by_name[file_name] = checksum
payload_path = os.path.join(target_path, "payload.img") container_path = os.path.join(target_path, image_name)
ordered_names = sorted(files_by_name.keys()) ordered_names = sorted(files_by_name.keys())
with open(payload_path, "wb") as payload: with open(container_path, "wb") as container:
payload.write(self.payload_magic) container.write(self.raw_container_magic)
payload.write(struct.pack("<I", len(ordered_names))) container.write(struct.pack("<I", len(ordered_names)))
for file_name in ordered_names: for file_name in ordered_names:
file_name_bytes = file_name.encode("utf_8") file_name_bytes = file_name.encode("utf_8")
if len(file_name_bytes) > 0xFFFFFFFF: if len(file_name_bytes) > 0xFFFFFFFF:
raise ValueError(f"Payload file name too long: {file_name}") raise ValueError(f"Container file name too long: {file_name}")
src_path = os.path.join(self.distfiles_dir, file_name) src_path = os.path.join(self.distfiles_dir, file_name)
file_size = os.path.getsize(src_path) file_size = os.path.getsize(src_path)
if file_size > 0xFFFFFFFF: if file_size > 0xFFFFFFFF:
raise ValueError(f"Payload file too large for raw container format: {file_name}") raise ValueError(f"Container file too large for raw container format: {file_name}")
payload.write(struct.pack("<II", len(file_name_bytes), file_size)) container.write(struct.pack("<II", len(file_name_bytes), file_size))
payload.write(file_name_bytes) container.write(file_name_bytes)
with open(src_path, "rb") as src_file: with open(src_path, "rb") as src_file:
shutil.copyfileobj(src_file, payload, 1024 * 1024) shutil.copyfileobj(src_file, container, 1024 * 1024)
return payload_path return container_path
def prepare(self, target, using_kernel=False, kernel_bootstrap=False, target_size=0): def prepare(self, target, using_kernel=False, kernel_bootstrap=False, target_size=0):
""" """
@@ -138,9 +159,10 @@ class Generator():
""" """
self.target_dir = target.path self.target_dir = target.path
self.external_dir = os.path.join(self.target_dir, 'external') self.external_dir = os.path.join(self.target_dir, 'external')
self.payload_image = None self.external_image = None
self.payload_source_manifest = [] self.external_source_manifest = []
self.bootstrap_source_manifest = self.source_manifest self.bootstrap_source_manifest = self.source_manifest
self.kernel_bootstrap_mode = None
# We use ext3 here; ext4 actually has a variety of extensions that # We use ext3 here; ext4 actually has a variety of extensions that
# have been added with varying levels of recency # have been added with varying levels of recency
@@ -153,10 +175,7 @@ class Generator():
if kernel_bootstrap: if kernel_bootstrap:
self.target_dir = os.path.join(self.target_dir, 'init') self.target_dir = os.path.join(self.target_dir, 'init')
os.mkdir(self.target_dir) os.mkdir(self.target_dir)
self._select_kernel_bootstrap_mode()
if not self.repo_path and not self.external_sources:
self.external_dir = os.path.join(self.target_dir, 'external')
self._prepare_kernel_bootstrap_payload_manifests()
elif using_kernel: elif using_kernel:
self.target_dir = os.path.join(self.target_dir, 'disk') self.target_dir = os.path.join(self.target_dir, 'disk')
self.external_dir = os.path.join(self.target_dir, 'external') self.external_dir = os.path.join(self.target_dir, 'external')
@@ -188,14 +207,17 @@ class Generator():
if kernel_bootstrap: if kernel_bootstrap:
self.create_builder_hex0_disk_image(self.target_dir + '.img', target_size) self.create_builder_hex0_disk_image(self.target_dir + '.img', target_size)
if self.repo_path or self.external_sources: if self.kernel_bootstrap_mode == "repo":
mkfs_args = ['-d', os.path.join(target.path, 'external')] mkfs_args = ['-d', os.path.join(target.path, 'external')]
target.add_disk("external", filesystem="ext3", mkfs_args=mkfs_args) target.add_disk("external", filesystem="ext3", mkfs_args=mkfs_args)
else: elif self.kernel_bootstrap_mode == "raw_external":
# Offline kernel-bootstrap mode keeps the early image small and # external.img is a raw container, imported at improve: import_payload.
# puts remaining distfiles in payload.img. self.external_image = self._create_raw_container_image(
self.payload_image = self._create_raw_payload_image(target.path, self.payload_source_manifest) target.path,
target.add_existing_disk("payload", self.payload_image) self.external_source_manifest,
image_name="external.img",
)
target.add_existing_disk("external", self.external_image)
elif using_kernel: elif using_kernel:
mkfs_args = ['-F', '-d', os.path.join(target.path, 'disk')] mkfs_args = ['-F', '-d', os.path.join(target.path, 'disk')]
target.add_disk("disk", target.add_disk("disk",
@@ -246,16 +268,20 @@ class Generator():
def distfiles(self): def distfiles(self):
"""Copy in distfiles""" """Copy in distfiles"""
early_distfile_dir = os.path.join(self.target_dir, 'external', 'distfiles') distfile_dir = os.path.join(self.external_dir, 'distfiles')
main_distfile_dir = os.path.join(self.external_dir, 'distfiles')
if early_distfile_dir != main_distfile_dir: if self.kernel_bootstrap_mode in ("raw_external", "repo"):
self._copy_manifest_distfiles(early_distfile_dir, self.early_source_manifest) self._copy_manifest_distfiles(distfile_dir, self.bootstrap_source_manifest)
return
if self.kernel_bootstrap_mode == "network_only":
self._copy_manifest_distfiles(distfile_dir, self.early_source_manifest)
return
if self.external_sources: if self.external_sources:
shutil.copytree(self.distfiles_dir, main_distfile_dir, dirs_exist_ok=True) shutil.copytree(self.distfiles_dir, distfile_dir, dirs_exist_ok=True)
else: else:
self._copy_manifest_distfiles(main_distfile_dir, self.bootstrap_source_manifest) self._copy_manifest_distfiles(distfile_dir, self.bootstrap_source_manifest)
@staticmethod @staticmethod
def output_dir(srcfs_file, dirpath): def output_dir(srcfs_file, dirpath):
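
The manifest handling in `_prepare_kernel_bootstrap_external_manifests` and `_create_raw_container_image` comes down to two operations: an order-preserving difference between the full and the early manifest, and a check that no file name appears with two different checksums. The sketch below restates both as standalone functions. The function names are hypothetical, and the only assumption about manifest entries is that they are 4-tuples with the checksum first and the file name last, as the loop in the diff unpacks them.

```python
def split_manifest(full_manifest, bootstrap_manifest):
    """Return entries of full_manifest absent from bootstrap_manifest.

    The order of full_manifest is preserved; membership is tested on
    whole entries, so entries must be hashable (tuples here).
    """
    bootstrap_set = set(bootstrap_manifest)
    return [entry for entry in full_manifest if entry not in bootstrap_set]


def check_name_conflicts(manifest):
    """Map file name to checksum, rejecting one name with two checksums."""
    files_by_name = {}
    for checksum, _, _, file_name in manifest:
        if files_by_name.get(file_name, checksum) != checksum:
            raise ValueError(
                f"Conflicting container file with same name but different hash: {file_name}"
            )
        files_by_name[file_name] = checksum
    return files_by_name
```

Repeated entries with the same name and checksum collapse into one container file, which is why the name-to-checksum map, not the manifest list, drives what gets written.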


@@ -129,10 +129,8 @@ def create_configuration_file(args):
""" """
config_path = os.path.join('steps', 'bootstrap.cfg') config_path = os.path.join('steps', 'bootstrap.cfg')
with open(config_path, "w", encoding="utf_8") as config: with open(config_path, "w", encoding="utf_8") as config:
payload_required = ((args.bare_metal or args.qemu) kernel_bootstrap = ((args.bare_metal or args.qemu) and not args.kernel)
and not args.kernel payload_required = kernel_bootstrap and args.external_sources and not args.repo
and not args.repo
and not args.external_sources)
config.write(f"ARCH={args.arch}\n") config.write(f"ARCH={args.arch}\n")
config.write(f"ARCH_DIR={stage0_arch_map.get(args.arch, args.arch)}\n") config.write(f"ARCH_DIR={stage0_arch_map.get(args.arch, args.arch)}\n")
config.write(f"FORCE_TIMESTAMPS={args.force_timestamps}\n") config.write(f"FORCE_TIMESTAMPS={args.force_timestamps}\n")
@@ -147,11 +145,8 @@ def create_configuration_file(args):
config.write(f"QEMU={args.qemu}\n") config.write(f"QEMU={args.qemu}\n")
config.write(f"BARE_METAL={args.bare_metal or (args.qemu and args.interactive)}\n") config.write(f"BARE_METAL={args.bare_metal or (args.qemu and args.interactive)}\n")
config.write(f"BUILD_GUIX_ALSO={args.build_guix_also}\n") config.write(f"BUILD_GUIX_ALSO={args.build_guix_also}\n")
if (args.bare_metal or args.qemu) and not args.kernel: if kernel_bootstrap:
if args.repo or args.external_sources: config.write("DISK=sdb1\n" if args.repo else "DISK=sda\n")
config.write("DISK=sdb1\n")
else:
config.write("DISK=sda\n")
config.write("KERNEL_BOOTSTRAP=True\n") config.write("KERNEL_BOOTSTRAP=True\n")
else: else:
config.write("DISK=sda1\n") config.write("DISK=sda1\n")
@@ -414,11 +409,16 @@ print(shutil.which('chroot'))
path = os.path.join(args.target, os.path.relpath(generator.target_dir, args.target)) path = os.path.join(args.target, os.path.relpath(generator.target_dir, args.target))
print("Please:") print("Please:")
print(f" 1. Take {path}.img and write it to a boot drive and then boot it.") print(f" 1. Take {path}.img and write it to a boot drive and then boot it.")
payload_disk = target.get_disk("payload") external_disk = target.get_disk("external")
if payload_disk is not None: if external_disk is not None:
payload_path = os.path.join(args.target, os.path.relpath(payload_disk, args.target)) external_path = os.path.join(args.target, os.path.relpath(external_disk, args.target))
print(" 2. Take " + if args.repo:
f"{payload_path} and attach it as a second raw disk (/dev/sdb preferred).") print(" 2. Take " +
f"{external_path} and attach it as a second disk (/dev/sdb preferred).")
else:
print(" 2. Take " +
f"{external_path} and attach it as a second raw container disk "
"(/dev/sdb preferred).")
else: else:
if args.stage0_image: if args.stage0_image:
@@ -472,11 +472,6 @@ print(shutil.which('chroot'))
arg_list += [ arg_list += [
'-drive', 'file=' + target.get_disk("external") + ',format=raw', '-drive', 'file=' + target.get_disk("external") + ',format=raw',
] ]
payload_disk = target.get_disk("payload")
if payload_disk is not None:
arg_list += [
'-drive', 'file=' + payload_disk + ',format=raw',
]
arg_list += [ arg_list += [
'-machine', 'kernel-irqchip=split', '-machine', 'kernel-irqchip=split',
'-nic', 'user,ipv6=off,model=e1000' '-nic', 'user,ipv6=off,model=e1000'
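
For reference, the `DISK=` value that `create_configuration_file` writes after this change depends on two inputs only: whether the run is a kernel-bootstrap run, and whether `--repo` is set. The following sketch restates that rule as a pure function. `disk_setting` is a hypothetical name and the function is not part of the commit.

```python
def disk_setting(kernel_bootstrap: bool, repo: bool) -> str:
    """Return the DISK= value written to bootstrap.cfg.

    Hypothetical restatement of the branch in create_configuration_file.
    """
    if kernel_bootstrap:
        # Only the ext3 repo disk is addressed as a partition on sdb;
        # raw external.img and network-only runs both use sda.
        return "sdb1" if repo else "sda"
    return "sda1"
```

Before this commit `--external-sources` also selected `sdb1`. It now selects `sda`, since the raw container is found by its magic and not by device name.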