Code Generation Overview
maketype takes a pattern and compiles it into a bit-packed primitive type with efficient parsing and serialisation, along with printing, property access, comparison, and introspection, all generated from the same specification.
Pipeline
Code generation proceeds in five phases:
- Pattern walking: we traverse the pattern tree, dispatching each node to a segment handler that appends expressions to the parse, print, and property accumulators. Handlers select data loading strategies and emit optimised code for digit parsing, string matching, and choice resolution.
- Sentinel emission: handlers emit placeholder expressions for quantities that depend on the complete pattern (the final type size, total byte counts, etc.). These cannot be resolved until the whole pattern has been walked.
- Length-check insertion: we analyse segment boundaries and insert the minimum number of runtime length checks, using greedy forward batching.
- Sentinel resolution: placeholders are replaced with their final values. Statically-satisfied checks are folded away, potentially eliminating entire branches. This is also where wide/narrow load path selection is finalised.
- Assembly: resolved expressions are wrapped into function definitions and combined into a single :toplevel block.
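To make the length-check insertion phase more concrete, here is a minimal sketch of one possible reading of greedy forward batching. Everything in it is an assumption made for illustration: the segment model (a named tuple with hypothetical minbytes and fixed fields), the function name batch_length_checks, and the rule that a batch keeps growing while the segments before the next one are fixed-width. The real analysis may use different criteria.

```julia
# Hypothetical segment model: minbytes is the minimum number of bytes the
# segment consumes; fixed is true when it always consumes exactly that many.
function batch_length_checks(segments)
    checks = Pair{Int,Int}[]          # segment index => bytes checked there
    i = 1
    while i <= length(segments)
        total = segments[i].minbytes
        j = i
        # Greedily extend the batch forward: while the current segment is
        # fixed-width, the offset of the next one is known, so its minimum
        # requirement can be folded into the same check.
        while segments[j].fixed && j < length(segments)
            j += 1
            total += segments[j].minbytes
        end
        push!(checks, i => total)
        i = j + 1                     # next batch starts after this one
    end
    return checks
end

segments = [
    (minbytes = 4, fixed = true),
    (minbytes = 1, fixed = true),
    (minbytes = 1, fixed = false),    # variable-length: ends the batch
    (minbytes = 2, fixed = true),
    (minbytes = 2, fixed = true),
]
checks = batch_length_checks(segments)
```

In this toy model five segments need only two runtime checks: one before segment 1 covering 6 bytes, and one before segment 4 covering 4 bytes.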
Expression accumulation
Each pattern node appends to three vectors:
- Parse: advance pos through data, validate, extract, and pack into the parsed bit-accumulator.
- Print: extract values from val and write to io, reconstructing the canonical string.
- Properties: per-field extraction code, keyed by the :name(...) field name.
These expressions all reference shared variables (pos, data, nbytes, parsed, val, io) that are bound by the enclosing function templates.
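The accumulate-then-wrap idea can be sketched in a few lines. This is a toy under stated assumptions, not maketype's actual code: the Accumulators struct, the emit_byte_field! handler, and the toy_parse / toy_print templates are all hypothetical names. Only the shared variables (pos, data, nbytes, parsed, val, io) and the final :toplevel block come from the description above.

```julia
# Hypothetical container for the three accumulators.
struct Accumulators
    parse::Vector{Any}
    print::Vector{Any}
    properties::Vector{Pair{Symbol,Any}}
end
Accumulators() = Accumulators(Any[], Any[], Pair{Symbol,Any}[])

# Toy handler: a one-byte field stored in the low 8 bits of the accumulator.
# The emitted expressions refer to variables that do not exist yet; the
# function templates below bind them.
function emit_byte_field!(acc::Accumulators, name::Symbol)
    push!(acc.parse, :(parsed |= UInt64(data[pos])))
    push!(acc.parse, :(pos += 1))
    push!(acc.print, :(write(io, val % UInt8)))
    push!(acc.properties, name => :(val % UInt8))
    return acc
end

acc = emit_byte_field!(Accumulators(), :tag)

# Function templates binding the shared variables, with the accumulated
# expressions spliced into the bodies.
parse_def = :(function toy_parse(data::AbstractVector{UInt8})
    pos = 1
    nbytes = length(data)
    parsed = zero(UInt64)
    $(acc.parse...)
    return parsed, pos
end)

print_def = :(function toy_print(io::IO, val::UInt64)
    $(acc.print...)
    return nothing
end)

# Combine the definitions into a single :toplevel block and evaluate it.
generated = Expr(:toplevel, parse_def, print_def)
eval(generated)

# invokelatest sidesteps world-age issues if this runs inside a function.
parsed_value, next_pos = Base.invokelatest(toy_parse, UInt8[0x2a])
```

Because the handler only pushes expressions, the same walk can feed parsing, printing, and property access without knowing how the enclosing functions are shaped.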
Context threading
State is threaded via an ImmutableDict. Each handler may extend the context for its children without affecting siblings, modelling lexical scoping. For example, an optional block pushes its tracking variable so that inner nodes can decide whether a failure means "return an error" or "set the flag to false and jump to cleanup."
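The scoping behaviour follows directly from how Base.ImmutableDict works: extending a dictionary returns a new value and leaves the original untouched. The sketch below illustrates this with hypothetical handler names (walk_optional, walk_leaf) and a hypothetical :optional_flag key; the real handlers and context keys may differ.

```julia
const Ctx = Base.ImmutableDict{Symbol,Any}

# Hypothetical leaf handler: decides how a failure should be handled by
# looking for an enclosing optional block in the context.
function walk_leaf(ctx::Ctx, name::Symbol)
    flag = get(ctx, :optional_flag, nothing)
    return flag === nothing ? (name => :return_error) :
                              (name => (:clear_flag, flag))
end

# Hypothetical optional-block handler: pushes its tracking variable for its
# children only. The caller's ctx is not modified.
function walk_optional(ctx::Ctx, children, flag::Symbol)
    inner = Base.ImmutableDict(ctx, :optional_flag => flag)
    return [walk_leaf(inner, child) for child in children]
end

root = Ctx()
inside = walk_optional(root, [:year], :has_suffix)   # child of the optional
outside = walk_leaf(root, :month)                    # sibling of the optional
```

Here the child sees the flag and would jump to cleanup on failure, while the sibling, walked with the unmodified root context, would return an error.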
Sentinels and error handling
Several quantities can't be determined until the full pattern has been processed. Rather than requiring a two-pass approach, handlers emit placeholder calls that are resolved after the walk completes.
Each length sentinel captures (branch_id, emission_max, ...) where emission_max is the branch's parsed_max at the time of emission:
- __length_check(branch_id, emission_max, n_min, n_max, n_expr): resolved to true when parsed_min - emission_max >= n_max, or to a runtime check otherwise.
- __length_bound(branch_id, emission_max, n): resolved to constant n when guaranteed, or min(n, nbytes - pos + 1).
- __static_length_check(branch_id, emission_max, n): compile-time branch selection for choosing between wide and narrow load paths.
- __branch_check: root upfront guard or optional entry guard.
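A resolution pass over such placeholders can be sketched as a recursive rewrite of the expression tree. The sketch below is an illustration under assumptions, not the actual resolver: resolve_sentinels and the parsed_min dictionary are hypothetical, the runtime form of the check (nbytes - pos + 1 >= n_expr) is assumed by analogy with the __length_bound fallback, and "guaranteed" for __length_bound is assumed to use the same inequality as __length_check. The load4 call in the example is a stand-in name.

```julia
# parsed_min: hypothetical map from branch id to the branch's guaranteed
# minimum byte count, known only after the walk has finished.
function resolve_sentinels(ex, parsed_min::Dict{Int,Int})
    ex isa Expr || return ex
    if ex.head === :call && ex.args[1] === :__length_check
        branch_id, emission_max, n_min, n_max, n_expr = ex.args[2:end]
        # Statically satisfied: enough bytes are guaranteed to remain.
        parsed_min[branch_id] - emission_max >= n_max && return true
        return :(nbytes - pos + 1 >= $n_expr)       # assumed runtime form
    elseif ex.head === :call && ex.args[1] === :__length_bound
        branch_id, emission_max, n = ex.args[2:end]
        guaranteed = parsed_min[branch_id] - emission_max >= n
        return guaranteed ? n : :(min($n, nbytes - pos + 1))
    end
    args = Any[resolve_sentinels(a, parsed_min) for a in ex.args]
    # Fold away branches whose condition resolved to a literal true.
    if ex.head === :if && args[1] === true
        return args[2]
    end
    return Expr(ex.head, args...)
end

code = :(if __length_check(1, 2, 4, 4, 4)
    load4(data, pos)
else
    return (1, pos)
end)

folded = resolve_sentinels(code, Dict(1 => 8))   # 8 - 2 >= 4: check removed
guarded = resolve_sentinels(code, Dict(1 => 4))  # 4 - 2 <  4: runtime check
```

With a guaranteed minimum of 8 bytes the whole conditional collapses to its success branch; with a minimum of 4 the placeholder becomes an explicit comparison against the remaining input.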
Casting sentinels (__cast_to_packed / __cast_from_packed) are replaced with Core.Intrinsics operations (zext_int, trunc_int, or bitcast) once the final type size is known.
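The choice between the three intrinsics depends only on the relative sizes of the source and target types. The following sketch shows that selection at runtime for a hypothetical 16-bit primitive type Packed16; the helper resize_bits and the two wrapper functions are illustrative names, and the generated code presumably emits the chosen intrinsic directly rather than branching.

```julia
# Hypothetical packed type; the real one is sized from the pattern.
primitive type Packed16 16 end

# Pick the intrinsic from the relative sizes of target and source.
function resize_bits(::Type{T}, x::S) where {T,S}
    if sizeof(T) > sizeof(S)
        return Core.Intrinsics.zext_int(T, x)    # widen, zero-filling
    elseif sizeof(T) < sizeof(S)
        return Core.Intrinsics.trunc_int(T, x)   # narrow, dropping high bits
    else
        return Core.Intrinsics.bitcast(T, x)     # same size, reinterpret
    end
end

cast_to_packed(x::Unsigned) = resize_bits(Packed16, x)
cast_from_packed(::Type{U}, p::Packed16) where {U<:Unsigned} = resize_bits(U, p)

packed = cast_to_packed(0x00001234)              # UInt32 -> 16 bits (trunc)
roundtrip = cast_from_packed(UInt32, packed)     # 16 bits -> UInt32 (zext)
same_size = cast_from_packed(UInt16, packed)     # 16 bits -> UInt16 (bitcast)
```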
Parse errors are integer indices into a compile-time message tuple. parsebytes returns either a valid instance or (error_index, pos). No strings or exception objects are allocated on the hot path.
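As an illustration of this error convention, here is a toy parser for exactly two ASCII digits. It is a sketch only: toy_parsebytes, ERROR_MESSAGES, and describe are hypothetical names, and the message texts are invented. What it mirrors from the description is the return shape (a value, or an (error_index, pos) tuple) and the fact that the message is looked up only when someone asks for it.

```julia
# Hypothetical compile-time message tuple; errors are indices into it.
const ERROR_MESSAGES = ("expected a digit", "unexpected end of input")

# Returns a UInt8 on success, or (error_index, pos) on failure.
function toy_parsebytes(data::AbstractVector{UInt8})
    length(data) >= 2 || return (2, length(data) + 1)
    value = 0x00
    for pos in 1:2
        d = data[pos] - UInt8('0')        # wraps above 9 for non-digits
        d <= 0x09 || return (1, pos)
        value = value * 0x0a + d
    end
    return value
end

# The string is only built off the hot path, when reporting the error.
describe(err::Tuple{Int,Int}) =
    string(ERROR_MESSAGES[err[1]], " at byte ", err[2])

ok = toy_parsebytes(codeunits("42"))
bad = toy_parsebytes(codeunits("4x"))
short = toy_parsebytes(codeunits("4"))
```

A caller can branch on the result type, for example with `bad isa Tuple`, and call describe only on the failure path.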
Further reading
- Data loading: register decomposition and load strategies
- Bit-packing: field encoding into the primitive type
- SWAR digit parsing: parallel digit validation and conversion
- Perfect hashing: choice matching
- String matching: literal and prefix matching
- Buffer printing: direct-to-memory output (jeaiii, reverse-SWAR)