fix(kernel-bootstrap): unify external raw container flow and remove default second disk

vxtls 2026-03-03 08:33:19 -05:00
parent 919200478b
commit 85c389044d
4 changed files with 120 additions and 91 deletions


@@ -2,10 +2,10 @@
This repository uses [`README.rst`](./README.rst) as the canonical main documentation.
## Kernel-bootstrap `payload.img`
## Kernel-bootstrap raw `external.img`
`payload.img` is a raw container disk used in kernel-bootstrap offline mode
(`--repo` and `--external-sources` are both unset).
`external.img` is a raw container disk used in kernel-bootstrap mode when
`--external-sources` is set and `--repo` is unset.
### Why not put everything in the initial image?
@@ -20,7 +20,7 @@ around the Fiwix transition in QEMU or on bare metal).
So the design is intentionally split:
- Initial image: only what is required to reach `improve: import_payload`
- `payload.img`: the rest of offline distfiles
- `external.img`: the rest of distfiles
This is not a patch-style workaround. It is a two-phase transport design that
keeps early boot deterministic and moves bulk data import to a stage where the
@@ -29,14 +29,14 @@ runtime is robust enough to process it safely.
### Why import from an external image and copy into main filesystem?
Because the bootstrap still expects distfiles to end up under the normal local
path (`/external/distfiles`) for later steps. `payload.img` is used as a
path (`/external/distfiles`) for later steps. `external.img` is used as a
transport medium only.
The flow is:
1. Boot minimal initial image.
2. Reach `improve: import_payload`.
3. Detect the payload disk by magic (`LBPAYLD1`) across detected block devices.
3. Detect the external container disk by magic (`LBPAYLD1`) across detected block devices.
4. Copy payload files into `/external/distfiles`.
5. Continue the build exactly as if files had been present locally all along.
@@ -54,7 +54,7 @@ The importer probes detected block devices and selects the one with magic `LBPAY
### Manual creation without Python
Prepare `payload.list` as:
Prepare `external.list` as:
```text
<archive-name> <absolute-path-to-archive>
@@ -66,8 +66,8 @@ Then:
cat > make-payload.sh <<'SH'
#!/bin/sh
set -e
out="${1:-payload.img}"
list="${2:-payload.list}"
out="${1:-external.img}"
list="${2:-external.list}"
write_u32le() {
v="$1"
@@ -89,14 +89,17 @@ while read -r name path; do
done < "${list}"
SH
chmod +x make-payload.sh
./make-payload.sh payload.img payload.list
./make-payload.sh external.img external.list
```
Attach `payload.img` as an extra raw disk in QEMU, or as the second disk on bare metal.
Attach `external.img` as an extra raw disk in QEMU, or as the second disk on bare metal.
### When it is used
- Used in kernel-bootstrap offline mode.
- Not used when `--repo` or `--external-sources` is provided.
- `--build-guix-also` increases payload contents (includes post-early `steps-guix`
- Used in kernel-bootstrap with `--external-sources` and without `--repo`.
- Not used with `--repo` (that path still uses an ext filesystem disk).
- Without `--external-sources` and without `--repo`, there is no second disk:
the initial image only includes distfiles needed before `improve: get_network`,
and later distfiles are downloaded from mirrors.
- `--build-guix-also` increases container contents (includes post-early `steps-guix`
sources), but does not change the mechanism.
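
The rules in the list above can be summarized in a minimal Python sketch. This is an illustration only, not the project's code: the function name `second_disk_plan` and its return shape are hypothetical, while the mode names (`repo`, `raw_external`, `network_only`) follow the ones this commit introduces.

```python
def second_disk_plan(repo=None, external_sources=False):
    """Return (mode, second_disk) for a kernel-bootstrap run.

    Hypothetical helper: it only mirrors the documented rules and is not
    part of the repository.
    """
    if repo:
        # --repo always wins: the second disk is an ext3 filesystem disk.
        return ("repo", "ext3")
    if external_sources:
        # --external-sources without --repo: raw container disk.
        return ("raw_external", "external.img")
    # Neither flag: no second disk; later distfiles come from mirrors.
    return ("network_only", None)


for flags in ({"repo": "/srv/repo"}, {"external_sources": True}, {}):
    print(flags, "->", second_disk_plan(**flags))
```

Checking `--repo` first matters: when both flags are given, the ext3 path is used and no raw container is built.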


@@ -63,34 +63,36 @@ Without using Python:
* *Only* copy distfiles listed in ``sources`` files for ``build:`` steps
manifested before ``improve: get_network`` into this disk.
* In kernel-bootstrap offline mode (no ``--repo`` and no
``--external-sources``), use the second image as ``payload.img``.
``payload.img`` is a raw container (not a filesystem) used to carry the
* In kernel-bootstrap mode with ``--external-sources`` (and no ``--repo``),
use the second image as ``external.img``.
``external.img`` is a raw container (not a filesystem) used to carry the
distfiles that are not needed before ``improve: import_payload``.
In other words, the first image only carries the minimal set needed to
reach the importer; the rest of the offline distfiles live in payload.
reach the importer; the rest of the distfiles live in ``external.img``.
* Header magic: ``LBPAYLD1`` (8 bytes).
* Then: little-endian ``u32`` file count.
* Repeated for each file: little-endian ``u32`` name length,
little-endian ``u32`` file size, raw file name bytes, raw file bytes.
* If you are not in that mode, the second disk can still be used as an
optional ext3 distfiles disk, as before.
* With ``--repo``, the second disk remains an ext3 distfiles/repo disk.
* Without ``--external-sources`` and without ``--repo``, no second disk is
used: the initial image includes only pre-network distfiles, and later
distfiles are downloaded from configured mirrors after networking starts.
* Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (main
builder image plus payload/ext3 image), a NIC with model E1000
builder image plus external image, when a second image is used), a NIC with model E1000
(``-nic user,model=e1000``), and ``-machine kernel-irqchip=split``.
c. **Bare metal:** Follow the same steps as QEMU, but the disks need to be
two different *physical* disks, and boot from the first disk.
Manual ``payload.img`` preparation
----------------------------------
Manual raw ``external.img`` preparation
---------------------------------------
The following script creates a raw ``payload.img`` from a manually prepared
The following script creates a raw ``external.img`` from a manually prepared
file list. This is equivalent to what ``rootfs.py`` does for kernel-bootstrap
offline mode.
with ``--external-sources`` (and no ``--repo``).
1. Prepare a ``payload.list`` with one file per line, formatted as:
1. Prepare an ``external.list`` with one file per line, formatted as:
``<archive-name> <absolute-path-to-archive>``.
2. Run:
@@ -99,8 +101,8 @@ offline mode.
cat > make-payload.sh <<'EOF'
#!/bin/sh
set -e
out="${1:-payload.img}"
list="${2:-payload.list}"
out="${1:-external.img}"
list="${2:-external.list}"
write_u32le() {
v="$1"
@@ -122,16 +124,19 @@ offline mode.
done < "${list}"
EOF
chmod +x make-payload.sh
./make-payload.sh payload.img payload.list
./make-payload.sh external.img external.list
3. Attach ``payload.img`` as an additional raw disk when booting in QEMU, or
3. Attach ``external.img`` as an additional raw disk when booting in QEMU, or
as the second physical disk on bare metal.
Notes:
* ``payload.img`` is used in kernel-bootstrap offline mode regardless of
``--build-guix-also``. With ``--build-guix-also``, the payload content is
larger because it also includes post-early sources from ``steps-guix``.
* ``external.img`` raw container mode is used with ``--external-sources`` (and
no ``--repo``). With ``--build-guix-also``, the container content is larger
because it also includes post-early sources from ``steps-guix``.
* Without ``--external-sources`` and without ``--repo``, there is no second
image. The initial image only includes distfiles needed before
``improve: get_network``; later distfiles are downloaded from mirrors.
* The runtime importer identifies the correct disk by checking the magic
``LBPAYLD1`` on each detected block device, not by assuming a device name.
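
To make the container layout described above concrete (8-byte magic, little-endian `u32` file count, then per file a `u32` name length, a `u32` file size, the name bytes, and the file bytes), here is a minimal Python reader sketch. It is an illustration under stated assumptions, not the runtime importer: the helper names `has_container_magic`, `read_container`, and `_read_exact` are hypothetical, and it loads each file fully into memory, which a real importer handling large distfiles would likely avoid.

```python
import io
import struct

MAGIC = b"LBPAYLD1"  # header magic stated in the documentation above


def _read_exact(stream, size):
    """Read exactly `size` bytes or fail loudly on a truncated image."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated container")
    return data


def has_container_magic(stream):
    """Return True if the stream starts with the container magic."""
    return stream.read(len(MAGIC)) == MAGIC


def read_container(stream):
    """Yield (name, data) for every entry in a raw container stream."""
    if not has_container_magic(stream):
        raise ValueError("not a raw container: bad magic")
    (count,) = struct.unpack("<I", _read_exact(stream, 4))
    for _ in range(count):
        # Both lengths come first, then the name bytes, then the file bytes.
        name_len, file_size = struct.unpack("<II", _read_exact(stream, 8))
        name = _read_exact(stream, name_len).decode("utf-8")
        yield name, _read_exact(stream, file_size)


# Build a one-entry container in memory and read it back.
sample = MAGIC + struct.pack("<I", 1) + struct.pack("<II", 5, 3) + b"a.tar" + b"xyz"
entries = list(read_container(io.BytesIO(sample)))
print(entries)
```

Probing several candidate devices by magic, rather than by device name, amounts to calling `has_container_magic` on each opened device and picking the one that returns `True`.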


@@ -26,7 +26,7 @@ class Generator():
git_dir = os.path.join(os.path.dirname(os.path.join(__file__)), '..')
distfiles_dir = os.path.join(git_dir, 'distfiles')
payload_magic = b'LBPAYLD1'
raw_container_magic = b'LBPAYLD1'
# pylint: disable=too-many-arguments,too-many-positional-arguments
def __init__(self, arch, external_sources, early_preseed, repo_path, mirrors,
@@ -46,8 +46,9 @@ class Generator():
build_guix_also=self.build_guix_also
)
self.bootstrap_source_manifest = self.source_manifest
self.payload_source_manifest = []
self.payload_image = None
self.external_source_manifest = []
self.external_image = None
self.kernel_bootstrap_mode = None
self.target_dir = None
self.external_dir = None
@@ -59,13 +60,31 @@ class Generator():
self.external_dir = os.path.join(self.target_dir, 'external')
self.distfiles()
def _prepare_kernel_bootstrap_payload_manifests(self):
def _select_kernel_bootstrap_mode(self):
"""
Split early source payload from full offline payload.
Select how kernel-bootstrap should transport distfiles.
"""
# Keep the early builder payload small: include only sources needed
# before improve: import_payload runs, so payload.img is the primary
# carrier for the rest of the offline distfiles.
if self.repo_path:
self.kernel_bootstrap_mode = "repo"
self.external_source_manifest = []
return
if self.external_sources:
self.kernel_bootstrap_mode = "raw_external"
self._prepare_kernel_bootstrap_external_manifests()
return
self.kernel_bootstrap_mode = "network_only"
self.bootstrap_source_manifest = self.early_source_manifest
self.external_source_manifest = []
def _prepare_kernel_bootstrap_external_manifests(self):
"""
Split distfiles between init image and external raw container.
"""
# Keep the early builder image small: include only sources needed
# before improve: import_payload runs, so external.img is the primary
# carrier for the remaining distfiles.
self.bootstrap_source_manifest = self.get_source_manifest(
stop_before_improve="import_payload",
build_guix_also=False
@@ -75,7 +94,7 @@ class Generator():
if self.bootstrap_source_manifest == full_manifest:
raise ValueError("steps/manifest must include `improve: import_payload` in kernel-bootstrap mode.")
bootstrap_set = set(self.bootstrap_source_manifest)
self.payload_source_manifest = [entry for entry in full_manifest if entry not in bootstrap_set]
self.external_source_manifest = [entry for entry in full_manifest if entry not in bootstrap_set]
def _copy_manifest_distfiles(self, out_dir, manifest):
os.makedirs(out_dir, exist_ok=True)
@@ -92,7 +111,7 @@ class Generator():
self.download_file(url, directory, file_name)
self.check_file(distfile_path, checksum)
def _create_raw_payload_image(self, target_path, manifest):
def _create_raw_container_image(self, target_path, manifest, image_name="external.img"):
if manifest is None:
manifest = []
@@ -103,31 +122,33 @@ class Generator():
files_by_name = {}
for checksum, _, _, file_name in manifest:
if file_name in files_by_name and files_by_name[file_name] != checksum:
raise ValueError(f"Conflicting payload file with same name but different hash: {file_name}")
raise ValueError(
f"Conflicting container file with same name but different hash: {file_name}"
)
files_by_name[file_name] = checksum
payload_path = os.path.join(target_path, "payload.img")
container_path = os.path.join(target_path, image_name)
ordered_names = sorted(files_by_name.keys())
with open(payload_path, "wb") as payload:
payload.write(self.payload_magic)
payload.write(struct.pack("<I", len(ordered_names)))
with open(container_path, "wb") as container:
container.write(self.raw_container_magic)
container.write(struct.pack("<I", len(ordered_names)))
for file_name in ordered_names:
file_name_bytes = file_name.encode("utf_8")
if len(file_name_bytes) > 0xFFFFFFFF:
raise ValueError(f"Payload file name too long: {file_name}")
raise ValueError(f"Container file name too long: {file_name}")
src_path = os.path.join(self.distfiles_dir, file_name)
file_size = os.path.getsize(src_path)
if file_size > 0xFFFFFFFF:
raise ValueError(f"Payload file too large for raw container format: {file_name}")
raise ValueError(f"Container file too large for raw container format: {file_name}")
payload.write(struct.pack("<II", len(file_name_bytes), file_size))
payload.write(file_name_bytes)
container.write(struct.pack("<II", len(file_name_bytes), file_size))
container.write(file_name_bytes)
with open(src_path, "rb") as src_file:
shutil.copyfileobj(src_file, payload, 1024 * 1024)
shutil.copyfileobj(src_file, container, 1024 * 1024)
return payload_path
return container_path
def prepare(self, target, using_kernel=False, kernel_bootstrap=False, target_size=0):
"""
@@ -138,9 +159,10 @@ class Generator():
"""
self.target_dir = target.path
self.external_dir = os.path.join(self.target_dir, 'external')
self.payload_image = None
self.payload_source_manifest = []
self.external_image = None
self.external_source_manifest = []
self.bootstrap_source_manifest = self.source_manifest
self.kernel_bootstrap_mode = None
# We use ext3 here; ext4 actually has a variety of extensions that
# have been added with varying levels of recency
@@ -153,10 +175,7 @@ class Generator():
if kernel_bootstrap:
self.target_dir = os.path.join(self.target_dir, 'init')
os.mkdir(self.target_dir)
if not self.repo_path and not self.external_sources:
self.external_dir = os.path.join(self.target_dir, 'external')
self._prepare_kernel_bootstrap_payload_manifests()
self._select_kernel_bootstrap_mode()
elif using_kernel:
self.target_dir = os.path.join(self.target_dir, 'disk')
self.external_dir = os.path.join(self.target_dir, 'external')
@@ -188,14 +207,17 @@ class Generator():
if kernel_bootstrap:
self.create_builder_hex0_disk_image(self.target_dir + '.img', target_size)
if self.repo_path or self.external_sources:
if self.kernel_bootstrap_mode == "repo":
mkfs_args = ['-d', os.path.join(target.path, 'external')]
target.add_disk("external", filesystem="ext3", mkfs_args=mkfs_args)
else:
# Offline kernel-bootstrap mode keeps the early image small and
# puts remaining distfiles in payload.img.
self.payload_image = self._create_raw_payload_image(target.path, self.payload_source_manifest)
target.add_existing_disk("payload", self.payload_image)
elif self.kernel_bootstrap_mode == "raw_external":
# external.img is a raw container, imported at improve: import_payload.
self.external_image = self._create_raw_container_image(
target.path,
self.external_source_manifest,
image_name="external.img",
)
target.add_existing_disk("external", self.external_image)
elif using_kernel:
mkfs_args = ['-F', '-d', os.path.join(target.path, 'disk')]
target.add_disk("disk",
@@ -246,16 +268,20 @@ class Generator():
def distfiles(self):
"""Copy in distfiles"""
early_distfile_dir = os.path.join(self.target_dir, 'external', 'distfiles')
main_distfile_dir = os.path.join(self.external_dir, 'distfiles')
distfile_dir = os.path.join(self.external_dir, 'distfiles')
if early_distfile_dir != main_distfile_dir:
self._copy_manifest_distfiles(early_distfile_dir, self.early_source_manifest)
if self.kernel_bootstrap_mode in ("raw_external", "repo"):
self._copy_manifest_distfiles(distfile_dir, self.bootstrap_source_manifest)
return
if self.kernel_bootstrap_mode == "network_only":
self._copy_manifest_distfiles(distfile_dir, self.early_source_manifest)
return
if self.external_sources:
shutil.copytree(self.distfiles_dir, main_distfile_dir, dirs_exist_ok=True)
shutil.copytree(self.distfiles_dir, distfile_dir, dirs_exist_ok=True)
else:
self._copy_manifest_distfiles(main_distfile_dir, self.bootstrap_source_manifest)
self._copy_manifest_distfiles(distfile_dir, self.bootstrap_source_manifest)
@staticmethod
def output_dir(srcfs_file, dirpath):


@@ -129,10 +129,8 @@ def create_configuration_file(args):
"""
config_path = os.path.join('steps', 'bootstrap.cfg')
with open(config_path, "w", encoding="utf_8") as config:
payload_required = ((args.bare_metal or args.qemu)
and not args.kernel
and not args.repo
and not args.external_sources)
kernel_bootstrap = ((args.bare_metal or args.qemu) and not args.kernel)
payload_required = kernel_bootstrap and args.external_sources and not args.repo
config.write(f"ARCH={args.arch}\n")
config.write(f"ARCH_DIR={stage0_arch_map.get(args.arch, args.arch)}\n")
config.write(f"FORCE_TIMESTAMPS={args.force_timestamps}\n")
@@ -147,11 +145,8 @@ def create_configuration_file(args):
config.write(f"QEMU={args.qemu}\n")
config.write(f"BARE_METAL={args.bare_metal or (args.qemu and args.interactive)}\n")
config.write(f"BUILD_GUIX_ALSO={args.build_guix_also}\n")
if (args.bare_metal or args.qemu) and not args.kernel:
if args.repo or args.external_sources:
config.write("DISK=sdb1\n")
else:
config.write("DISK=sda\n")
if kernel_bootstrap:
config.write("DISK=sdb1\n" if args.repo else "DISK=sda\n")
config.write("KERNEL_BOOTSTRAP=True\n")
else:
config.write("DISK=sda1\n")
@@ -414,11 +409,16 @@ print(shutil.which('chroot'))
path = os.path.join(args.target, os.path.relpath(generator.target_dir, args.target))
print("Please:")
print(f" 1. Take {path}.img and write it to a boot drive and then boot it.")
payload_disk = target.get_disk("payload")
if payload_disk is not None:
payload_path = os.path.join(args.target, os.path.relpath(payload_disk, args.target))
print(" 2. Take " +
f"{payload_path} and attach it as a second raw disk (/dev/sdb preferred).")
external_disk = target.get_disk("external")
if external_disk is not None:
external_path = os.path.join(args.target, os.path.relpath(external_disk, args.target))
if args.repo:
print(" 2. Take " +
f"{external_path} and attach it as a second disk (/dev/sdb preferred).")
else:
print(" 2. Take " +
f"{external_path} and attach it as a second raw container disk "
"(/dev/sdb preferred).")
else:
if args.stage0_image:
@@ -472,11 +472,6 @@ print(shutil.which('chroot'))
arg_list += [
'-drive', 'file=' + target.get_disk("external") + ',format=raw',
]
payload_disk = target.get_disk("payload")
if payload_disk is not None:
arg_list += [
'-drive', 'file=' + payload_disk + ',format=raw',
]
arg_list += [
'-machine', 'kernel-irqchip=split',
'-nic', 'user,ipv6=off,model=e1000'