Segments
Segments are the building blocks of patterns. Each segment describes how to parse, pack, extract, and print a portion of the input.
Structural constructs
Tuples (sequencing)
A tuple sequences its elements left to right:
("prefix-", :id(digits(4)), "-suffix")The explicit seq(args...) form is equivalent.
Field capture
:name(...) wraps a segment (or sequence) and exposes it as a named property on the generated type. Single-segment fields return the extracted value directly; multi-segment fields concatenate their printed forms into a string.
Optional
optional(args...) wraps content that may be absent. Internally this is equivalent to choice(seq(args...), ""): a choice between the content and nothing. On parse failure within the optional, pos is rewound and the optional is marked absent. Properties return nothing when absent.
Absence is encoded via sentinels in the bit representation: typically by reserving a zero value or adding a presence bit. See Bit-packing for encoding strategies.
Optionals nest: each level forks its own branch for independent byte-count tracking.
Value segments
digits
Numeric fields parsed from digit characters.
digits(n) # exactly n digits
digits(lo:hi) # variable width, lo to hi digits
digits(max=M) # 1 to ceil(log_base(M)) digits, value <= M
digits(n, base=16) # n hex digits
digits(n, min=A, max=B, pad=P)| Keyword | Default | Description |
|---|---|---|
base | 10 | Numeric base (2–62) |
min | 0 | Minimum value |
max | –- | Maximum value |
pad | 0 | Zero-pad printed output to width |
skip | –- | Characters to skip during parse (e.g. "-") |
groups | –- | Print in groups: N (uniform) or (A, B, C) |
exclude | –- | Excluded values: N, lo:hi, or (N, lo:hi, ...) |
When skip is set, the specified characters are silently consumed during parsing. The groups keyword (requires skip) prints the value with the first skip character as a separator. groups=4 splits into groups of 4; groups=(3, 3, 4) gives explicit sizes.
The exclude keyword rejects values or ranges at parse time. A single value (exclude=13), a range (exclude=8:90), or a tuple of mixed values and ranges (exclude=(0, 8:90)) are accepted. Useful for non-contiguous valid ranges like ArXiv old-format years (00–07 and 91–99): digits(2, pad=2, exclude=8:90).
Fixed-width fields with base ≤ 16 use SWAR parsing; variable-width fields up to 8 digits also use SWAR with runtime digit counting. Wider or higher-base fields fall back to scalar parseint.
letters
Alphabetic characters (A–Z, a–z).
letters(n) # exactly n letters
letters(lo:hi) # variable width
letters(n, upper=true) # store as uppercase
letters(n, lower=true) # store as lowercase
letters(n, casefold=true) # case-insensitive parse, uppercase storeEach character is packed into ⌈{}log<sub>2</sub>(alphabet size)⌉{} bits.
All character sequence segments (letters, alphnum, hex, charset) support the skip and groups keywords with the same semantics as digits: see above.
alphnum
Alphanumeric characters (0–9, A–Z, a–z). Same interface as letters.
hex
Hexadecimal characters (0–9, A–F). Same interface as letters. Default store is uppercase.
When case-specific output is not needed, digits(n, base=16) may be preferable — it uses SWAR parsing for significantly faster parsing, and stores the numeric value directly rather than character-packed data.
charset
Custom character set from explicit ranges.
charset(n, 'A':'Z', '0':'9') # fixed width
charset(lo:hi, 'a':'f', '-') # variable width, ranges + single chars
charset(n, 'A':'Z', casefold=true) # case-insensitive
charset(6, '0':'9', 'a':'z', numeric=true) # numeric: extract returns integerThe first argument is the length (integer or range). Subsequent arguments are character ranges ('a':'z') or single characters.
With numeric=true, the extracted property value is the packed integer (treating each character as a digit in base N where N is the alphabet size) rather than a string. This enables checkdigit validation on charset fields and is useful for base-32 or similar encodings where the character sequence represents a number.
choice
String choice
Match one of several literal strings.
choice("alpha", "beta", "gamma") # value-carrying: stores index
choice("http://", "https://", is="http") # fixed: stores bool (matched canonical?)| Keyword | Description |
|---|---|
casefold | Case-insensitive matching (default from parent) |
is | Canonical form: store only whether it matched |
When captured as a field, the property returns a Symbol (value form) or is unavailable (fixed form). Uses perfect hashing when possible, with linear scan as fallback.
Empty strings in the option list make the choice optional (zero-length match allowed).
Structured choice
When any argument to choice is a compound pattern (tuple or call expression) rather than a plain string, it becomes a structured choice between alternative arm patterns.
choice(
seq(choice("O", "P", "Q"), digits(1), alphnum(3), digits(1)),
seq(charset(1, 'A':'N', 'R':'Z'), digits(1), alphnum(8)))Each arm is tried in order. A discriminant records which arm matched. Fields from non-active arms return nothing. An empty string "" as an arm matches zero bytes (equivalent to the absent case in optional).
choice(("A", :x(digits(2))), ("B", :y(digits(3))), "")When the arms have a byte position where their possible values are disjoint (e.g. the first byte is "A" in one arm and "B" in another), the compiler emits a dispatch table for O(1) arm selection instead of sequential try-and-rewind.
Arms that share the same field name produce a single property that dispatches on the discriminant:
choice(seq("VCV", :id(digits(9))),
seq("RCV", :id(digits(8))))
# .id returns the active arm's valueTagged choice
A tagged choice exposes the discriminant as a named property with user-specified tag values:
choice(:kind,
:alpha => seq("A", :val(digits(2))),
:beta => seq("B", :val(digits(3))))The first argument is a QuoteNode naming the tag property. Remaining arguments are tag => arm pairs. Accessing .kind returns :alpha or :beta depending on which arm matched. Tags may be any type (symbols, strings, integers) so long as all arms use the same type.
Tagged choices nest naturally: an inner tagged choice inside an outer choice arm returns nothing for its tag when the outer arm is inactive.
embed
Embed another packed primitive type inline.
embed(InnerType)Delegates parsing to the inner type's parsebytes and printing to its tobytes. The inner type must be a primitive subtype of the outer type's supertype.
Non-value segments
Literals / bare strings
Required literal text. Bare strings in the pattern are shorthand:
"PFX-" # equivalent to literal("PFX-")Uses word-sized comparisons.
skip
Strip optional prefixes on parse. Does not store any value.
Each argument is a sequential step: tried in order, each optionally stripping one prefix. A bare string is a step with one alternative; choice(...) provides multiple alternatives at the same position.
skip("-") # strip "-" if present
skip(choice("http://", "https://")) # strip one of these
skip(choice("http://", "https://"), "www.") # strip scheme, then "www."
skip(choice(" ", "/"), print="-") # strip " " or "/", print "-"
skip(print="-") # strip "-" if present, print "-"Within each step, alternatives are tried longest-first. The print keyword specifies what to emit on output (nothing if omitted).