diff --git a/Payload_img_design.md b/Payload_img_design.md new file mode 100644 index 00000000..cc5eda04 --- /dev/null +++ b/Payload_img_design.md @@ -0,0 +1,106 @@ +# live-bootstrap + +This repository uses [`README.rst`](./README.rst) as the canonical main documentation. + +## Kernel-bootstrap raw `external.img` + +`external.img` is a raw container disk used in kernel-bootstrap mode when +`--external-sources` is set and `--repo` is unset. + +### Why not put everything in the initial image? + +In kernel-bootstrap mode, the first boot image is consumed by very early +runtime code before the system reaches the normal bash-based build stage. +That early stage has tight assumptions about memory layout and file table usage. + +When too many distfiles are packed into the initial image, those assumptions can +be exceeded, which leads to unstable handoff behavior (for example, failures +around the Fiwix transition in QEMU or on bare metal). + +So the design is intentionally split: + +- Initial image: only what is required to reach `improve: import_payload` +- `external.img`: the rest of distfiles + +This is not a patch-style workaround. It is a two-phase transport design that +keeps early boot deterministic and moves bulk data import to a stage where the +runtime is robust enough to process it safely. + +### Why import from an external image and copy into main filesystem? + +Because the bootstrap still expects distfiles to end up under the normal local +path (`/external/distfiles`) for later steps. `external.img` is used as a +transport medium only. + +The flow is: + +1. Boot minimal initial image. +2. Reach `improve: import_payload`. +3. Detect the external container disk by magic (`LBPAYLD1`) across detected block devices. +4. Copy payload files into `/external/distfiles`. +5. Continue the build exactly as if files had been present locally all along. 
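Steps 3 and 4 of the flow above can be sketched in Python as follows. This is only an illustration, not the importer itself: the real importer in this change is the C program `payload-import`, driven by a shell loop over `/proc/partitions`. The helper names `has_magic`, `find_payload`, `read_exact` and `import_payload` are hypothetical, and the byte layout is the one given in the Format section of this document. Unlike the C importer, the sketch reads each file fully into memory.

```python
import os
import struct

MAGIC = b"LBPAYLD1"   # 8-byte magic from the Format section
MAX_NAME_LEN = 1024   # same limit as MAX_NAME_LEN in payload-import.c


def has_magic(path):
    """Return True if the file or device at `path` starts with the magic."""
    try:
        with open(path, "rb") as dev:
            return dev.read(len(MAGIC)) == MAGIC
    except OSError:
        return False


def find_payload(candidates):
    """Return the first candidate path carrying the magic, or None."""
    for path in candidates:
        if has_magic(path):
            return path
    return None


def read_exact(stream, length):
    """Read exactly `length` bytes or fail on a truncated container."""
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("truncated payload")
    return data


def import_payload(device, dest_dir):
    """Copy every container entry into dest_dir and return the file names."""
    os.makedirs(dest_dir, exist_ok=True)
    names = []
    with open(device, "rb") as dev:
        if read_exact(dev, len(MAGIC)) != MAGIC:
            raise ValueError(f"{device} is not a payload image")
        (count,) = struct.unpack("<Q", read_exact(dev, 8))
        for _ in range(count):
            # Entry header: u64 name length, then u64 file size.
            name_len, size = struct.unpack("<QQ", read_exact(dev, 16))
            if name_len == 0 or name_len > MAX_NAME_LEN:
                raise ValueError(f"invalid name length {name_len}")
            name = read_exact(dev, name_len).decode("utf-8")
            # Reject path separators so entries cannot escape dest_dir.
            if "/" in name or "\\" in name:
                raise ValueError(f"invalid payload file name: {name!r}")
            with open(os.path.join(dest_dir, name), "wb") as out:
                out.write(read_exact(dev, size))
            names.append(name)
    return names
```

A caller would pass the candidate device paths to `find_payload` and, if one is returned, hand it to `import_payload` together with `/external/distfiles`.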
+ +### Format + +- Magic: `LBPAYLD1` (8 bytes) +- Then: little-endian `u64` file count +- Repeated entries: + - little-endian `u64` name length + - little-endian `u64` file size + - file name string, encoded as UTF-8 bytes (no terminator) + - file bytes + +`name length` is the number of bytes in the UTF-8 encoded file name (not the number of Unicode code points). + +The importer probes detected block devices and selects the one with magic `LBPAYLD1`. + +### Manual creation without Python + +Prepare `external.list` with one entry per line: the file name to store, whitespace, then the path of the source file. +Do not leave blank lines in it: the script takes the file count from `wc -l`, so a blank line makes the header count larger than the number of entries written. + +```text +<name> <path> +``` + +Then: + +```sh +cat > make-payload.sh <<'SH' +#!/bin/sh +set -e +out="${1:-external.img}" +list="${2:-external.list}" + +write_u64le() { + v="$1" + printf '%016x' "$v" | sed -E 's/(..)(..)(..)(..)(..)(..)(..)(..)/\8\7\6\5\4\3\2\1/' | xxd -r -p +} + +count="$(wc -l < "${list}" | tr -d ' ')" +: > "${out}" +printf 'LBPAYLD1' >> "${out}" +write_u64le "${count}" >> "${out}" + +while read -r name path; do + [ -n "${name}" ] || continue + size="$(wc -c < "${path}" | tr -d ' ')" + name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')" + write_u64le "${name_len}" >> "${out}" + write_u64le "${size}" >> "${out}" + printf '%s' "${name}" >> "${out}" + cat "${path}" >> "${out}" +done < "${list}" +SH +chmod +x make-payload.sh +./make-payload.sh external.img external.list +``` + +Attach `external.img` as an extra raw disk in QEMU, or as the second disk on bare metal. + +### When it is used + +- Used in kernel-bootstrap with `--external-sources` and without `--repo`. +- Not used with `--repo` (that path still uses an ext filesystem disk). +- Without `--external-sources` and without `--repo`, there is no second disk: + the initial image only includes distfiles needed before `improve: get_network`, + and later distfiles are downloaded from mirrors. 
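The three cases above amount to a small decision rule, sketched below. The mode names `repo`, `raw_external` and `network_only` are the ones used by `lib/generator.py` in this change; the function names `select_transport_mode` and `needs_second_disk` are hypothetical and exist only for this illustration.

```python
def select_transport_mode(external_sources, repo_path):
    """Return the distfile transport that kernel-bootstrap would use."""
    if repo_path:
        # --repo wins: the second disk stays an ext filesystem disk.
        return "repo"
    if external_sources:
        # --external-sources without --repo: raw external.img container.
        return "raw_external"
    # Neither flag: no second disk, later distfiles come from mirrors.
    return "network_only"


def needs_second_disk(mode):
    """Only the repo and raw_external modes attach a second disk."""
    return mode in ("repo", "raw_external")
```

Checking `repo_path` first mirrors the rule that `external.img` is never used together with `--repo`, even when `--external-sources` is also set.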
diff --git a/README.rst b/README.rst index 154c860c..7afef338 100644 --- a/README.rst +++ b/README.rst @@ -63,17 +63,85 @@ Without using Python: * *Only* copy distfiles listed in ``sources`` files for ``build:`` steps manifested before ``improve: get_network`` into this disk. - * Optionally (if you don't do this, distfiles will be network downloaded): + * In kernel-bootstrap mode with ``--external-sources`` (and no ``--repo``), + use the second image as ``external.img``. + ``external.img`` is a raw container (not a filesystem) used to carry the + distfiles that are not needed before ``improve: import_payload``. + In other words, the first image only carries the minimal set needed to + reach the importer; the rest of the distfiles live in ``external.img``. - * On the second image, create an MSDOS partition table and one ext3 - partition. - * Copy ``distfiles/`` into this disk. - * Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (in the - order introduced above), a NIC with model E1000 + * Header magic: ``LBPAYLD1`` (8 bytes). + * Then: little-endian ``u64`` file count. + * Repeated for each file: little-endian ``u64`` name length, + little-endian ``u64`` file size, UTF-8 encoded file name bytes + (no terminator), raw file bytes. + * ``name length`` is the number of UTF-8 bytes (not Unicode code points). + + * With ``--repo``, the second disk remains an ext3 distfiles/repo disk. + * Without ``--external-sources`` and without ``--repo``, no second disk is + used: the initial image includes only pre-network distfiles, and later + distfiles are downloaded from configured mirrors after networking starts. + * Run QEMU, with 4+G RAM, optionally SMP (multicore), both drives (main + builder image plus external image, when a second image is used), a NIC with model E1000 (``-nic user,model=e1000``), and ``-machine kernel-irqchip=split``. c. 
**Bare metal:** Follow the same steps as QEMU, but the disks need to be two different *physical* disks, and boot from the first disk. +Manual raw ``external.img`` preparation +--------------------------------------- + +The following script creates a raw ``external.img`` from a manually prepared +file list. This is equivalent to what ``rootfs.py`` does for kernel-bootstrap +with ``--external-sources`` (and no ``--repo``). + +1. Prepare an ``external.list`` with one file per line, formatted as: + ``<name> <path>`` (the file name to store, whitespace, then the path of the source file). +2. Run: + + :: + + cat > make-payload.sh <<'EOF' + #!/bin/sh + set -e + out="${1:-external.img}" + list="${2:-external.list}" + + write_u64le() { + v="$1" + printf '%016x' "$v" | sed -E 's/(..)(..)(..)(..)(..)(..)(..)(..)/\8\7\6\5\4\3\2\1/' | xxd -r -p + } + + count="$(wc -l < "${list}" | tr -d ' ')" + : > "${out}" + printf 'LBPAYLD1' >> "${out}" + write_u64le "${count}" >> "${out}" + + while read -r name path; do + [ -n "${name}" ] || continue + size="$(wc -c < "${path}" | tr -d ' ')" + name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')" + write_u64le "${name_len}" >> "${out}" + write_u64le "${size}" >> "${out}" + printf '%s' "${name}" >> "${out}" + cat "${path}" >> "${out}" + done < "${list}" + EOF + chmod +x make-payload.sh + ./make-payload.sh external.img external.list + +3. Attach ``external.img`` as an additional raw disk when booting in QEMU, or + as the second physical disk on bare metal. + +Notes: + +* ``external.img`` raw container mode is used with ``--external-sources`` (and + no ``--repo``). +* Without ``--external-sources`` and without ``--repo``, there is no second + image. The initial image only includes distfiles needed before + ``improve: get_network``; later distfiles are downloaded from mirrors. +* The runtime importer identifies the correct disk by checking the magic + ``LBPAYLD1`` on each detected block device, not by assuming a device name. 
+ Mirrors ------- diff --git a/lib/generator.py b/lib/generator.py index 9b1f8bce..585677d6 100755 --- a/lib/generator.py +++ b/lib/generator.py @@ -12,6 +12,7 @@ import hashlib import os import random import shutil +import struct import tarfile import traceback @@ -25,6 +26,7 @@ class Generator(): git_dir = os.path.join(os.path.dirname(os.path.join(__file__)), '..') distfiles_dir = os.path.join(git_dir, 'distfiles') + raw_container_magic = b'LBPAYLD1' # pylint: disable=too-many-arguments,too-many-positional-arguments def __init__(self, arch, external_sources, early_preseed, repo_path, mirrors): @@ -33,8 +35,21 @@ class Generator(): self.external_sources = external_sources self.repo_path = repo_path self.mirrors = mirrors - self.source_manifest = self.get_source_manifest(not self.external_sources) - self.early_source_manifest = self.get_source_manifest(True) + self.pre_network_source_manifest = self.get_source_manifest( + stop_before_improve="get_network", + ) + self.pre_import_source_manifest = self.get_source_manifest( + stop_before_improve="import_payload", + ) + # Only raw-external mode needs full upfront availability for container generation. + if self.external_sources and not self.repo_path: + self.source_manifest = self.get_source_manifest() + else: + self.source_manifest = self.pre_network_source_manifest + self.bootstrap_source_manifest = self.source_manifest + self.external_source_manifest = [] + self.external_image = None + self.kernel_bootstrap_mode = None self.target_dir = None self.external_dir = None @@ -46,6 +61,115 @@ class Generator(): self.external_dir = os.path.join(self.target_dir, 'external') self.distfiles() + def _select_kernel_bootstrap_mode(self): + """ + Select how kernel-bootstrap should transport distfiles. + """ + if self.repo_path: + # Keep second-disk staging outside init image for ext3 repo mode. 
+ self.external_dir = os.path.join(os.path.dirname(self.target_dir), 'external') + self.kernel_bootstrap_mode = "repo" + self.external_source_manifest = [] + return + + if self.external_sources: + # Raw external container mode keeps early distfiles inside init image. + self.external_dir = os.path.join(self.target_dir, 'external') + self.kernel_bootstrap_mode = "raw_external" + self._prepare_kernel_bootstrap_external_manifests() + return + + # Network-only mode keeps pre-network distfiles inside init image. + self.external_dir = os.path.join(self.target_dir, 'external') + self.kernel_bootstrap_mode = "network_only" + self.bootstrap_source_manifest = self.pre_network_source_manifest + self.external_source_manifest = [] + + def _prepare_kernel_bootstrap_external_manifests(self): + """ + Split distfiles between init image and external raw container. + """ + # Keep the early builder image small: include only sources needed + # before improve: import_payload runs, so external.img is the primary + # carrier for the remaining distfiles. + self.bootstrap_source_manifest = self.pre_import_source_manifest + + full_manifest = self.get_source_manifest() + if self.bootstrap_source_manifest == full_manifest: + raise ValueError("steps/manifest must include `improve: import_payload` in kernel-bootstrap mode.") + bootstrap_set = set(self.bootstrap_source_manifest) + self.external_source_manifest = [entry for entry in full_manifest if entry not in bootstrap_set] + + def _kernel_bootstrap_init_manifest(self): + """ + Return the exact manifest that is allowed inside init image. 
+ """ + mode_to_manifest = { + "network_only": self.pre_network_source_manifest, # up to get_network + "raw_external": self.bootstrap_source_manifest, # up to import_payload + "repo": self.pre_network_source_manifest, # up to get_network + } + manifest = mode_to_manifest.get(self.kernel_bootstrap_mode) + if manifest is None: + raise ValueError(f"Unexpected kernel bootstrap mode: {self.kernel_bootstrap_mode}") + return manifest + + def _copy_manifest_distfiles(self, out_dir, manifest): + os.makedirs(out_dir, exist_ok=True) + for entry in manifest: + file_name = entry[3].strip() + shutil.copy2(os.path.join(self.distfiles_dir, file_name), + os.path.join(out_dir, file_name)) + + def _ensure_manifest_distfiles(self, manifest): + for entry in manifest: + checksum, directory, url, file_name = entry + distfile_path = os.path.join(directory, file_name) + if not os.path.isfile(distfile_path): + self.download_file(url, directory, file_name) + self.check_file(distfile_path, checksum) + + def _create_raw_container_image(self, target_path, manifest, image_name="external.img"): + if manifest is None: + manifest = [] + + if manifest: + # Guarantee all payload distfiles exist and match checksums. 
+ self._ensure_manifest_distfiles(manifest) + + files_by_name = {} + for checksum, _, _, file_name in manifest: + if file_name in files_by_name and files_by_name[file_name] != checksum: + raise ValueError( + f"Conflicting container file with same name but different hash: {file_name}" + ) + files_by_name[file_name] = checksum + + container_path = os.path.join(target_path, image_name) + ordered_names = sorted(files_by_name.keys()) + with open(container_path, "wb") as container: + container.write(self.raw_container_magic) + if len(ordered_names) > 0xFFFFFFFFFFFFFFFF: + raise ValueError("Too many files for raw container format.") + container.write(struct.pack(" 0xFFFFFFFFFFFFFFFF: + raise ValueError(f"Container file name too long: {file_name}") + + src_path = os.path.join(self.distfiles_dir, file_name) + file_size = os.path.getsize(src_path) + if file_size > 0xFFFFFFFFFFFFFFFF: + raise ValueError(f"Container file too large for raw container format: {file_name}") + + container.write(struct.pack("/dev/null 2>&1 || : + fi + + if [ ! -r /proc/partitions ]; then + echo "payload-import failed: /proc/partitions is unavailable." >&2 + exit 1 + fi + + while read -r major minor blocks name; do + case "${major}" in + ""|major|*[!0-9]*) + continue + ;; + esac + case "${minor}" in + ""|minor|*[!0-9]*) + continue + ;; + esac + + dev_path="/dev/lbpayload-${major}-${minor}" + [ -b "${dev_path}" ] || mknod -m 600 "${dev_path}" b "${major}" "${minor}" >/dev/null 2>&1 || : + + if payload-import --probe "${dev_path}" >/dev/null 2>&1; then + payload-import --device "${dev_path}" /external/distfiles + found_payload=1 + break + fi + done < /proc/partitions + + if [ "${found_payload}" != 1 ]; then + echo "payload-import failed: no payload image found in /proc/partitions devices." 
>&2 + exit 1 + fi +fi diff --git a/steps/manifest b/steps/manifest index 7ee6e7cb..b5060e63 100644 --- a/steps/manifest +++ b/steps/manifest @@ -115,6 +115,7 @@ build: findutils-4.2.33 build: musl-1.2.5 build: linux-headers-4.14.341-openela build: gcc-4.0.4 +build: payload-import-1.0 ( KERNEL_BOOTSTRAP == True ) build: util-linux-2.19.1 build: e2fsprogs-1.45.7 build: dhcpcd-10.0.1 @@ -132,6 +133,7 @@ jump: break ( INTERNAL_CI == pass1 ) improve: populate_device_nodes jump: linux ( CHROOT == False ) jump: move_disk ( KERNEL_BOOTSTRAP == True ) +improve: import_payload ( KERNEL_BOOTSTRAP == True ) improve: finalize_job_count improve: finalize_fhs improve: open_console ( CONSOLES == True ) diff --git a/steps/payload-import-1.0/pass1.sh b/steps/payload-import-1.0/pass1.sh new file mode 100644 index 00000000..ebe73540 --- /dev/null +++ b/steps/payload-import-1.0/pass1.sh @@ -0,0 +1,21 @@ +#!/bin/bash +# +# SPDX-FileCopyrightText: 2026 live-bootstrap contributors +# SPDX-License-Identifier: MIT + +src_get() { + : +} + +src_unpack() { + dirname=. + cp -r ../src . 
+} + +src_compile() { + gcc -m32 -march=i386 -std=c89 -static -o payload-import src/payload-import.c +} + +src_install() { + install -D payload-import "${DESTDIR}${BINDIR}/payload-import" +} diff --git a/steps/payload-import-1.0/src/payload-import.c b/steps/payload-import-1.0/src/payload-import.c new file mode 100644 index 00000000..2e9a59a7 --- /dev/null +++ b/steps/payload-import-1.0/src/payload-import.c @@ -0,0 +1,353 @@ +/* SPDX-FileCopyrightText: 2026 live-bootstrap contributors */ +/* SPDX-License-Identifier: MIT */ + +#include +#include +#include +#include +#include +#include + +#define MAGIC "LBPAYLD1" +#define MAGIC_LEN 8 +#define MAX_NAME_LEN 1024 +#define COPY_BUFSZ 65536 +#define SYS_MOUNT 21 + +static unsigned long long read_u64le(const unsigned char *buf) +{ + return (unsigned long long)buf[0] + | ((unsigned long long)buf[1] << 8) + | ((unsigned long long)buf[2] << 16) + | ((unsigned long long)buf[3] << 24) + | ((unsigned long long)buf[4] << 32) + | ((unsigned long long)buf[5] << 40) + | ((unsigned long long)buf[6] << 48) + | ((unsigned long long)buf[7] << 56); +} + +static int read_exact(FILE *in, void *buf, size_t len) +{ + size_t got = 0; + unsigned char *out = (unsigned char *)buf; + + while (got < len) { + size_t n = fread(out + got, 1, len - got, in); + if (n == 0) { + return -1; + } + got += n; + } + return 0; +} + +static int copy_exact(FILE *in, FILE *out, unsigned long long len, + const char *device, const char *name, const char *out_path) +{ + unsigned char *buf; + unsigned long long remaining = len; + + buf = (unsigned char *)malloc(COPY_BUFSZ); + if (buf == NULL) { + fputs("payload-import: out of memory\n", stderr); + return 1; + } + + while (remaining > 0) { + size_t chunk = (size_t)COPY_BUFSZ; + unsigned long long copied = len - remaining; + size_t nread; + size_t written; + if (remaining < (unsigned long long)COPY_BUFSZ) { + chunk = (size_t)remaining; + } + nread = fread(buf, 1, chunk, in); + if (nread != chunk) { + if (feof(in)) { + 
fprintf(stderr, + "payload-import: truncated payload while reading %s from %s " + "(offset=%llu wanted=%llu got=%llu)\n", + name, device, + copied, + (unsigned long long)chunk, + (unsigned long long)nread); + } else { + fprintf(stderr, + "payload-import: read error while reading %s from %s " + "(offset=%llu): %s\n", + name, device, copied, strerror(errno)); + } + free(buf); + return 1; + } + written = fwrite(buf, 1, chunk, out); + if (written != chunk) { + fprintf(stderr, + "payload-import: write error while writing %s to %s " + "(offset=%llu wanted=%llu wrote=%llu): %s\n", + name, out_path, + copied, + (unsigned long long)chunk, + (unsigned long long)written, + strerror(errno)); + free(buf); + return 1; + } + remaining -= (unsigned long long)chunk; + } + + free(buf); + return 0; +} + +static int is_valid_name(const char *name) +{ + const unsigned char *s = (const unsigned char *)name; + + if (*s == 0) { + return 0; + } + + while (*s != 0) { + if (*s == '/' || *s == '\\') { + return 0; + } + s += 1; + } + return 1; +} + +static int has_payload_magic(const char *path) +{ + FILE *in; + char magic[MAGIC_LEN]; + + in = fopen(path, "rb"); + if (in == NULL) { + return 1; + } + if (read_exact(in, magic, MAGIC_LEN) != 0) { + fclose(in); + return 1; + } + fclose(in); + if (memcmp(magic, MAGIC, MAGIC_LEN) != 0) { + return 1; + } + return 0; +} + +#ifndef __i386__ +#error "This is only for x86 i386 fiwix/linux" +#endif +static int sys_mount(const char *source, const char *target, + const char *fstype, unsigned int flags, const void *data) +{ + int ret; + /* Only for x86 fiwix/linux */ + __asm__ __volatile__( + "int $0x80" + : "=a"(ret) + : "0"(SYS_MOUNT), + "b"(source), + "c"(target), + "d"(fstype), + "S"(flags), + "D"(data) + : "memory" + ); + + return ret; +} + +static int ensure_proc_partitions(void) +{ + struct stat st; + int ret; + + if (stat("/proc/partitions", &st) == 0) { + return 0; + } + + if (mkdir("/proc", 0755) != 0 && errno != EEXIST) { + return 1; + } + + ret = 
sys_mount("proc", "/proc", "proc", 0U, (const void *)0); + if (ret < 0) { + return 1; + } + + if (stat("/proc/partitions", &st) != 0) { + return 1; + } + + return 0; +} + +static int extract_payload(const char *device, const char *dest_dir) +{ + FILE *in; + char magic[MAGIC_LEN]; + unsigned char u64buf[8]; + unsigned long long file_count; + unsigned long long i; + + in = fopen(device, "rb"); + if (in == NULL) { + fprintf(stderr, "payload-import: cannot open %s: %s\n", device, strerror(errno)); + return 1; + } + + if (read_exact(in, magic, MAGIC_LEN) != 0 || memcmp(magic, MAGIC, MAGIC_LEN) != 0) { + fclose(in); + fprintf(stderr, "payload-import: %s is not a payload image\n", device); + return 1; + } + + if (read_exact(in, u64buf, 8) != 0) { + fclose(in); + fputs("payload-import: malformed payload header\n", stderr); + return 1; + } + file_count = read_u64le(u64buf); + if (file_count > 200000ULL) { + fclose(in); + fprintf(stderr, "payload-import: unreasonable file count: %llu\n", file_count); + return 1; + } + + if (mkdir(dest_dir, 0755) != 0 && errno != EEXIST) { + fclose(in); + fprintf(stderr, "payload-import: cannot create %s: %s\n", dest_dir, strerror(errno)); + return 1; + } + + printf("payload-import: reading %llu files from %s\n", file_count, device); + for (i = 0; i < file_count; ++i) { + unsigned long long name_len; + unsigned long long data_len; + char *name; + char out_path[4096]; + FILE *out; + + if (read_exact(in, u64buf, 8) != 0) { + fclose(in); + fputs("payload-import: truncated entry header\n", stderr); + return 1; + } + name_len = read_u64le(u64buf); + if (read_exact(in, u64buf, 8) != 0) { + fclose(in); + fputs("payload-import: truncated entry size\n", stderr); + return 1; + } + data_len = read_u64le(u64buf); + + if (name_len == 0ULL || name_len > (unsigned long long)MAX_NAME_LEN) { + fclose(in); + fprintf(stderr, "payload-import: invalid name length %llu\n", name_len); + return 1; + } + + name = (char *)malloc((size_t)name_len + 1U); + if (name == 
NULL) { + fclose(in); + fputs("payload-import: out of memory\n", stderr); + return 1; + } + + if (read_exact(in, name, (size_t)name_len) != 0) { + free(name); + fclose(in); + fputs("payload-import: truncated file name\n", stderr); + return 1; + } + name[(size_t)name_len] = 0; + + if (!is_valid_name(name)) { + fclose(in); + fprintf(stderr, "payload-import: invalid payload file name: %s\n", name); + free(name); + return 1; + } + + if (snprintf(out_path, sizeof(out_path), "%s/%s", dest_dir, name) >= (int)sizeof(out_path)) { + free(name); + fclose(in); + fputs("payload-import: output path too long\n", stderr); + return 1; + } + + out = fopen(out_path, "wb"); + if (out == NULL) { + fprintf(stderr, "payload-import: cannot write %s: %s\n", out_path, strerror(errno)); + free(name); + fclose(in); + return 1; + } + + if (copy_exact(in, out, data_len, device, name, out_path) != 0) { + free(name); + fclose(out); + unlink(out_path); + fclose(in); + return 1; + } + + fclose(out); + printf("payload-import: %s\n", name); + free(name); + } + + fclose(in); + return 0; +} + +static void usage(const char *name) +{ + fprintf(stderr, + "Usage:\n" + " %s --mount-proc\n" + " %s --probe \n" + " %s --device \n", + name, name, name); +} + +int main(int argc, char **argv) +{ + const char *device = NULL; + const char *dest_dir = NULL; + int i; + + if (argc == 2 && strcmp(argv[1], "--mount-proc") == 0) { + return ensure_proc_partitions(); + } + + if (argc == 3 && strcmp(argv[1], "--probe") == 0) { + return has_payload_magic(argv[2]); + } + + for (i = 1; i < argc; ++i) { + if (strcmp(argv[i], "--device") == 0) { + i += 1; + if (i >= argc) { + usage(argv[0]); + return 1; + } + device = argv[i]; + } else if (dest_dir == NULL) { + dest_dir = argv[i]; + } else { + usage(argv[0]); + return 1; + } + } + + if (device == NULL || dest_dir == NULL) { + usage(argv[0]); + return 1; + } + + return extract_payload(device, dest_dir); +}