From c0e84ad46254cfccac83bf7869fed03c145816fe Mon Sep 17 00:00:00 2001
From: vxtls <187420201+vxtls@users.noreply.github.com>
Date: Sun, 15 Mar 2026 13:39:58 -0400
Subject: [PATCH] docs(payload): specify UTF-8 filename encoding and define name_len as UTF-8 byte length

---
 Payload_img_design.md | 7 +++++--
 README.rst            | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/Payload_img_design.md b/Payload_img_design.md
index 4f223ecc..6bfc34a9 100644
--- a/Payload_img_design.md
+++ b/Payload_img_design.md
@@ -47,9 +47,11 @@ The flow is:
 - Repeated entries:
   - little-endian `u64` name length
   - little-endian `u64` file size
-  - file name bytes (no terminator)
+  - file name string, encoded as UTF-8 bytes (no terminator)
   - file bytes
 
+`name length` is the number of bytes in the UTF-8 encoded file name (not the number of Unicode code points).
+
 The importer probes detected block devices and selects the one with magic `LBPAYLD1`.
 
 ### Manual creation without Python
@@ -82,7 +84,8 @@ write_u64le "${count}" >> "${out}"
 while read -r name path; do
   [ -n "${name}" ] || continue
   size="$(wc -c < "${path}" | tr -d ' ')"
-  write_u64le "${#name}" >> "${out}"
+  name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')"
+  write_u64le "${name_len}" >> "${out}"
   write_u64le "${size}" >> "${out}"
   printf '%s' "${name}" >> "${out}"
   cat "${path}" >> "${out}"
diff --git a/README.rst b/README.rst
index 210506db..e1380d39 100644
--- a/README.rst
+++ b/README.rst
@@ -73,7 +73,9 @@ Without using Python:
 * Header magic: ``LBPAYLD1`` (8 bytes).
 * Then: little-endian ``u64`` file count.
 * Repeated for each file: little-endian ``u64`` name length,
-  little-endian ``u64`` file size, raw file name bytes, raw file bytes.
+  little-endian ``u64`` file size, UTF-8 encoded file name bytes
+  (no terminator), raw file bytes.
+* ``name length`` is the number of UTF-8 bytes (not Unicode code points).
 * With ``--repo``, the second disk remains an ext3 distfiles/repo disk.
 
 * Without ``--external-sources`` and without ``--repo``, no second disk is
@@ -117,7 +119,8 @@ with ``--external-sources`` (and no ``--repo``).
     while read -r name path; do
       [ -n "${name}" ] || continue
       size="$(wc -c < "${path}" | tr -d ' ')"
-      write_u64le "${#name}" >> "${out}"
+      name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')"
+      write_u64le "${name_len}" >> "${out}"
       write_u64le "${size}" >> "${out}"
       printf '%s' "${name}" >> "${out}"
       cat "${path}" >> "${out}"
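
Note on the change: the entry layout this patch documents (little-endian `u64` name length, little-endian `u64` file size, UTF-8 name bytes, file bytes) can be illustrated with a short Python sketch. This is not the project's own writer. The helper names `pack_entry` and `build_payload` are hypothetical, chosen only to show why the name length must be taken from the encoded bytes rather than from the string's character count.

```python
import struct
from typing import Mapping

MAGIC = b"LBPAYLD1"  # 8-byte header magic, as stated in the patched docs


def pack_entry(name: str, data: bytes) -> bytes:
    """Serialize one entry: u64 name length, u64 file size, name bytes, file bytes.

    Hypothetical helper for illustration; the layout follows the patched docs.
    """
    name_bytes = name.encode("utf-8")
    # The name length field counts UTF-8 bytes, not Unicode code points.
    header = struct.pack("<QQ", len(name_bytes), len(data))
    return header + name_bytes + data  # no terminator after the name


def build_payload(files: Mapping[str, bytes]) -> bytes:
    """Magic, then u64 file count, then the repeated entries (hypothetical helper)."""
    parts = [MAGIC, struct.pack("<Q", len(files))]
    for name, data in files.items():
        parts.append(pack_entry(name, data))
    return b"".join(parts)


if __name__ == "__main__":
    name = "na\u00efve.txt"  # 9 code points; U+00EF takes 2 bytes in UTF-8
    entry = pack_entry(name, b"hi")
    name_len, size = struct.unpack_from("<QQ", entry)
    print(len(name), name_len, size)  # prints: 9 10 2
```

For a non-ASCII name the two counts diverge, which is the case the switch from `${#name}` to `printf '%s' "${name}" | wc -c` addresses: `${#name}` may report characters in a multibyte locale, while `wc -c` always reports bytes.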