docs(payload): specify UTF-8 filename encoding and define name_len as UTF-8 byte length

vxtls 2026-03-15 13:39:58 -04:00
parent c742563d3b
commit c0e84ad462
2 changed files with 10 additions and 4 deletions


@@ -47,9 +47,11 @@ The flow is:
 - Repeated entries:
 - little-endian `u64` name length
 - little-endian `u64` file size
-- file name bytes (no terminator)
+- file name string, encoded as UTF-8 bytes (no terminator)
 - file bytes
+`name length` is the number of bytes in the UTF-8 encoded file name (not the number of Unicode code points).
 The importer probes detected block devices and selects the one with magic `LBPAYLD1`.
 ### Manual creation without Python
@@ -82,7 +84,8 @@ write_u64le "${count}" >> "${out}"
 while read -r name path; do
 [ -n "${name}" ] || continue
 size="$(wc -c < "${path}" | tr -d ' ')"
-write_u64le "${#name}" >> "${out}"
+name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')"
+write_u64le "${name_len}" >> "${out}"
 write_u64le "${size}" >> "${out}"
 printf '%s' "${name}" >> "${out}"
 cat "${path}" >> "${out}"


@@ -73,7 +73,9 @@ Without using Python:
 * Header magic: ``LBPAYLD1`` (8 bytes).
 * Then: little-endian ``u64`` file count.
 * Repeated for each file: little-endian ``u64`` name length,
-little-endian ``u64`` file size, raw file name bytes, raw file bytes.
+little-endian ``u64`` file size, UTF-8 encoded file name bytes
+(no terminator), raw file bytes.
+* ``name length`` is the number of UTF-8 bytes (not Unicode code points).
 * With ``--repo``, the second disk remains an ext3 distfiles/repo disk.
 * Without ``--external-sources`` and without ``--repo``, no second disk is
@@ -117,7 +119,8 @@ with ``--external-sources`` (and no ``--repo``).
 while read -r name path; do
 [ -n "${name}" ] || continue
 size="$(wc -c < "${path}" | tr -d ' ')"
-write_u64le "${#name}" >> "${out}"
+name_len="$(printf '%s' "${name}" | wc -c | tr -d ' ')"
+write_u64le "${name_len}" >> "${out}"
 write_u64le "${size}" >> "${out}"
 printf '%s' "${name}" >> "${out}"
 cat "${path}" >> "${out}"
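
Note on the `${#name}` replacement: a minimal sketch of why the change matters, assuming a POSIX-style shell. The helper name `utf8_byte_len` is hypothetical and not part of the commit; it simply wraps the same `printf | wc -c | tr` pipeline the updated scripts use. Depending on the shell and locale, `${#name}` may report characters rather than bytes, so for a non-ASCII name it can disagree with the number of bytes that `printf '%s' "${name}"` actually writes to the payload.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): byte length of a string,
# computed the same way the updated scripts compute name_len.
utf8_byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Build the name from explicit octal escapes so the example does not depend
# on how this script file is encoded: \303\251 is U+00E9 encoded as UTF-8.
name="$(printf '\303\251.txt')"

byte_len="$(utf8_byte_len "${name}")"
# Shell- and locale-dependent: may be a character count or a byte count.
param_len="${#name}"

printf 'byte length: %s\n' "${byte_len}"
printf 'parameter length: %s\n' "${param_len}"
```

The name above is six bytes long (two for U+00E9, four for `.txt`), which is the value the format expects in the `name length` field. The parameter-expansion length is printed only for comparison and should not be relied on.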