Buffer Printing

Producing a type's string form is a frequent operation (display, serialisation). Writing to an IOBuffer and extracting a String works, but carries overhead from IO dispatch and copying. PackedParselets replaces this with direct writes to a Memory{UInt8} buffer.

Rewriting pass

The print expressions from the pattern walk use standard print(io, ...) calls. A post-processing pass rewrites these for the tobytes method:

Original	Rewritten
`print(io, "literal")`	Word-sized stores into `buf`
`print(io, string(var, base=b, pad=p))`	`bufprint(buf, pos, var, b, p)`
`printchars(io, packed, n, ranges)`	Reverse-SWAR stores or `unpackchars!` call
`__tobytes_print(io, embedded)`	`tobytes` + `unsafe_copyto!`

Control flow (if/else for optionals) is preserved; only the leaf print calls are replaced. The IO-based print method continues to use the original expressions unchanged.
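To illustrate the shape of such a pass, here is a minimal sketch of a leaf rewriter over Julia expressions. It is not the package's actual pass: the function name `rewrite_prints` and the replacement call `store_literal!` are hypothetical, and only the literal-string row of the table is handled.

```julia
# Hypothetical leaf rewriter: replaces print(io, "literal") calls and
# leaves every other node, including if/else, structurally unchanged.
function rewrite_prints(ex)
    ex isa Expr || return ex          # symbols, literals, line nodes pass through
    if ex.head === :call && length(ex.args) == 3 &&
       ex.args[1] === :print && ex.args[2] === :io && ex.args[3] isa String
        # `store_literal!` is a made-up stand-in for the word-sized stores
        return :(pos = store_literal!(buf, pos, $(ex.args[3])))
    end
    # Recurse into children so control flow keeps its shape
    Expr(ex.head, map(rewrite_prints, ex.args)...)
end

original = quote
    if hasminor
        print(io, ".")
    else
        print(io, "-")
    end
end

rewritten = rewrite_prints(original)
```

Because the rewriter only matches call nodes, the `if` condition and branch structure of `original` carry over to `rewritten` untouched.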

Decimal integer output (jeaiii)

Integer fields use the jeaiii multiply-shift algorithm (Jeon, 2022) for base-10 output, eliminating both ndigits() and divrem(). A single multiply by a magic constant produces a fixed-point representation; the integer part gives the first digit pair, and multiplying the fractional part by 100 advances to the next pair. A radix-100 lookup table converts each pair to two ASCII bytes via a UInt16 store.

The digit count is determined implicitly: range checks (num < 100, < 10000, etc.) select the correct magic constant, and the first pair's value (< 10 or ≥ 10) resolves odd vs even digit counts without a separate ndigits call. This covers all UInt32 values (up to 10 digits). Larger values and non-decimal bases fall back to a divrem loop with the same radix-100 table.
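The multiply-shift idea can be sketched for small values as follows. This is a simplified illustration, not the package's code: it only covers values below 10^6, uses a `Vector{UInt8}` in place of `Memory{UInt8}`, writes each pair as two byte stores rather than one UInt16 store, and the names `DIGIT_PAIRS`, `putpair!`, `print_decimal!`, and `decimal_string` are hypothetical. The magic constants are ⌊2^32 / 10^k⌋ + 1 for k = 2 and k = 4.

```julia
# Radix-100 table: the two ASCII digits of every value 0:99, back to back
const DIGIT_PAIRS = Vector{UInt8}(join(lpad(n, 2, '0') for n in 0:99))

function putpair!(buf, pos, pair)
    buf[pos]     = DIGIT_PAIRS[2pair + 1]
    buf[pos + 1] = DIGIT_PAIRS[2pair + 2]
    pos + 2
end

# Writes the decimal digits of n starting at buf[pos]; returns the next position.
function print_decimal!(buf::Vector{UInt8}, pos::Int, n::UInt32)
    n < 1_000_000 || throw(ArgumentError("sketch only covers values below 10^6"))
    # Range checks select the magic constant and the number of digit pairs
    magic, npairs = n < 100    ? (UInt64(1) << 32, 1) :
                    n < 10_000 ? (UInt64(42_949_673), 2) :
                                 (UInt64(429_497), 3)
    t = magic * n                       # 32.32 fixed-point value
    lead = t >> 32                      # integer part: the leading pair
    if lead < 10
        buf[pos] = UInt8('0') + lead % UInt8   # odd digit count: one digit only
        pos += 1
    else
        pos = putpair!(buf, pos, lead)
    end
    for _ in 2:npairs
        t = (t & 0xffffffff) * 100      # fractional part times 100: next pair
        pos = putpair!(buf, pos, t >> 32)
    end
    pos
end

function decimal_string(n)
    buf = zeros(UInt8, 6)
    stop = print_decimal!(buf, 1, UInt32(n))
    String(buf[1:stop-1])
end
```

For example, `decimal_string(12345)` takes the three-pair path: the leading pair is 1 (a single digit), followed by the pairs 23 and 45.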

Zero-padding, when needed, is applied after the digits are written: the digits are shifted right and the gap is filled with leading zeros. The padding cannot be written first, because jeaiii produces digits left-to-right and the digit count isn't known upfront.
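A minimal sketch of that shift-then-fill step, assuming a `Vector{UInt8}` buffer and a hypothetical helper name `pad_digits!`:

```julia
# Right-align `written` digit bytes starting at buf[start] in a field of
# `width` bytes, filling the gap with '0'. Returns the position after the field.
function pad_digits!(buf::Vector{UInt8}, start::Int, written::Int, width::Int)
    written >= width && return start + written
    shift = width - written
    # Copy back to front so overlapping source and destination are safe
    for i in written-1:-1:0
        buf[start + shift + i] = buf[start + i]
    end
    for i in 0:shift-1
        buf[start + i] = UInt8('0')
    end
    start + width
end

buf = Vector{UInt8}("42......")
next = pad_digits!(buf, 1, 2, 5)   # pad "42" to width 5
padded = String(buf[1:next-1])
```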

Character sequence output (reverse-SWAR)

Character sequences (letters, hex, alphnum, charset) are stored as packed bpc-bit indices. Unpacking to ASCII bytes for output reverses the SWAR reduction: a binary cascade spreads bpc-bit fields into one-per-byte, then adds the ASCII base offset and byte-swaps for memory-order storage.

The cascade constants (masks and gaps) are precomputed at type-generation time. At each step, the lower half of each field group shifts right to open a gap equal to the target byte spacing. After ⌈log₂(n)⌉ steps, each field sits in its own byte. A final right-shift, mask, and add produces the ASCII bytes, and hton converts to memory order for a single register-sized store.
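As a concrete illustration, the sketch below unpacks eight 4-bit indices from a UInt32 into eight ASCII bytes for a single-range charset. The masks are written out by hand for this one case (bpc = 4, n = 8, so three steps); the function names `unpack8` and `unpacked_string` and the left-aligned starting layout are assumptions made for the example, not the package's generated code.

```julia
function unpack8(packed::UInt32, base::UInt8)
    x = UInt64(packed) << 32          # fields left-aligned in the register
    # Each step keeps the upper half of every group in place and shifts
    # the lower half right, opening a gap of the target spacing.
    x = (x & 0xffff000000000000) | ((x & 0x0000ffff00000000) >> 16)
    x = (x & 0xff000000ff000000) | ((x & 0x00ff000000ff0000) >> 8)
    x = (x & 0xf000f000f000f000) | ((x & 0x0f000f000f000f00) >> 4)
    # Final right-shift and mask, then add the ASCII base to every byte
    x = ((x >> 4) & 0x0f0f0f0f0f0f0f0f) + base * 0x0101010101010101
    hton(x)                           # memory order: first character first
end

# Reinterpreting the word as bytes stands in for the register-sized store
unpacked_string(packed, base) =
    String(collect(reinterpret(UInt8, [unpack8(packed, UInt8(base))])))
```

With indices 0 through 7 packed as `0x01234567` and base `'A'`, `unpacked_string(0x01234567, 'A')` yields `"ABCDEFGH"`.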

For single-range charsets (e.g. uppercase letters, digits 0–9), each byte is just base + index. For two-range charsets (e.g. hex: 0–9 + A–F), a branchless SWAR comparison adds a conditional delta to bytes in the second range. The comparison uses borrow-safe per-byte subtraction: `(((var | 0x80…) - threshold) ⊻ var) & 0x80…` sets bit 7 for bytes where the index exceeds the first range's length.
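For the hex case, the comparison and conditional delta might look like the sketch below. It assumes the indices have already been spread to one per byte (each below 0x80), and the names `HIGH_BITS`, `THRESHOLD`, and `hex_ascii` are hypothetical.

```julia
const HIGH_BITS = 0x8080808080808080   # bit 7 of every byte
const THRESHOLD = 0x0a0a0a0a0a0a0a0a   # length of the first range (10 digits)

function hex_ascii(idx::UInt64)
    # Setting bit 7 before subtracting keeps borrows inside each byte.
    # Bit 7 survives exactly where index >= 10. The XOR with idx does not
    # touch bit 7 here, because every index byte is below 0x80.
    second = (((idx | HIGH_BITS) - THRESHOLD) ⊻ idx) & HIGH_BITS
    # One flag bit per byte, times the gap between '9' + 1 and 'A'
    delta = (second >> 7) * UInt64('A' - '0' - 10)
    idx + 0x3030303030303030 + delta
end

# Indices 0, 9, 10, 15, 1, 5, 11, 12 from most to least significant byte
ascii_word = hex_ascii(0x00090a0f01050b0c)
```

Read from the most significant byte down, `ascii_word` holds the characters `09AF15BC`.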

Chunks cascade through register sizes (8 → 4 → 2 bytes), with a scalar tail for any remainder. This handles all fixed-width charseq fields regardless of length.

Literal strings

Literal strings use the same register decomposition as matching: they're split into 8/4/2/1-byte chunks, each written with a single unsafe_store! or byte assignment.
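The decomposition can be sketched as below. In generated code the chunk values would be compile-time constants; here they are assembled at run time for illustration, a `Vector{UInt8}` stands in for `Memory{UInt8}`, and the name `store_literal!` is hypothetical.

```julia
# Write `lit` at buf[pos] using 8/4/2/1-byte stores; returns the next position.
function store_literal!(buf::Vector{UInt8}, pos::Int, lit::String)
    bytes = codeunits(lit)
    checkbounds(buf, pos:pos + length(bytes) - 1)
    i = 1
    GC.@preserve buf begin
        p = pointer(buf, pos)
        for T in (UInt64, UInt32, UInt16, UInt8)
            w = sizeof(T)
            while length(bytes) - i + 1 >= w
                # Assemble the chunk with byte k at bits 8k (little-endian value)
                word = zero(T)
                for k in 0:w-1
                    word |= T(bytes[i + k]) << (8k)
                end
                # htol makes the in-memory byte order match on any host
                unsafe_store!(Ptr{T}(p + i - 1), htol(word))
                i += w
            end
        end
    end
    pos + length(bytes)
end

buf = zeros(UInt8, 32)
next = store_literal!(buf, 3, "Hello, world!")   # 13 bytes: 8 + 4 + 1
```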

The `tobytes` / `string` pipeline

tobytes(buf, val) writes directly into a caller-provided Memory{UInt8}, returning the byte count. This is the zero-allocation inner method.

tobytes(val) allocates a buffer at the maximum output length, calls the 2-arg method, and returns (buf, len). string(val) calls tobytes and then either uses unsafe_takestring (zero-copy) for fixed-length output, or copies to a correctly-sized StringMemory for variable-length output.
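The layering of these three methods can be sketched with a toy type. Everything here is illustrative: `Tag` and `TAG_MAXLEN` are hypothetical, a `Vector{UInt8}` stands in for `Memory{UInt8}`, and `String(resize!(buf, len))` stands in for the fixed-length and variable-length string construction paths described above.

```julia
struct Tag            # hypothetical: one ASCII letter plus a number below 100
    letter::UInt8
    num::UInt8
end
const TAG_MAXLEN = 3

# Inner method: writes into a caller-provided buffer, returns the byte count
function tobytes(buf::Vector{UInt8}, t::Tag)
    pos = 1
    buf[pos] = t.letter
    pos += 1
    if t.num >= 10
        buf[pos] = UInt8('0') + t.num ÷ 0x0a
        pos += 1
    end
    buf[pos] = UInt8('0') + t.num % 0x0a
    pos - 1
end

# Allocating wrapper: buffer sized for the longest possible output
function tobytes(t::Tag)
    buf = Vector{UInt8}(undef, TAG_MAXLEN)
    buf, tobytes(buf, t)
end

function Base.string(t::Tag)
    buf, len = tobytes(t)
    String(resize!(buf, len))   # trim to the bytes actually written
end
```

A caller that formats many values can reuse one buffer with the two-argument method and avoid the per-call allocation of the wrapper.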