Ignition Compiler Pipeline (HIR → Bytecode)
This document specifies how runmat-ignition lowers the high-level IR (HIR) emitted by the parser into VM bytecode and how names, functions, objects, and indexing semantics are realized at compile-time.
The VM is a stack machine: each instruction pops values from the top of the stack and pushes results back. Heavyweight numeric and object semantics are delegated to runmat-runtime builtins; the compiler focuses on control flow, name resolution, structural lowering, and correct stack discipline.
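To make the stack discipline concrete, here is a minimal sketch of a stack machine. The variant names LoadConst, Add, ElemMul, and Pop come from this document, but their payloads, the f64-only value model, and the `run` function are assumptions made for illustration and are not the actual VM.

```rust
// Hypothetical, heavily reduced instruction set; the real Instr enum is larger.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LoadConst(f64),
    Add,
    ElemMul,
    Pop,
}

// Toy interpreter. Values are plain f64 here, which is an assumption.
fn run(program: &[Instr]) -> Result<Vec<f64>, String> {
    let mut stack: Vec<f64> = Vec::new();
    for instr in program {
        match instr {
            Instr::LoadConst(v) => stack.push(*v),
            Instr::Add | Instr::ElemMul => {
                // Operands are pushed left then right, so the right one pops first.
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                stack.push(match instr {
                    Instr::Add => lhs + rhs,
                    _ => lhs * rhs,
                });
            }
            Instr::Pop => {
                stack.pop().ok_or("stack underflow")?;
            }
        }
    }
    // Whatever remains on the stack is returned for inspection.
    Ok(stack)
}
```

Under these assumptions, the expression (1 + 2) .* 3 would be emitted as LoadConst(1), LoadConst(2), Add, LoadConst(3), ElemMul, and `run` leaves a single value on the stack.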
Inputs and Outputs
- Input: runmat_hir::HirProgram
- Output: Bytecode (in functions.rs):
  - instructions: Vec<Instr>: linear VM program
  - var_count: usize: number of global/local slots required by this unit
  - functions: HashMap<String, UserFunction>: user-defined functions discovered in this unit (including synthesized closures)
Compiler::new pre-scans the HIR to compute var_count by visiting variable IDs found in statements and expressions.
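The pre-scan can be pictured with the following sketch. The `Expr` and `Stmt` types are toy stand-ins for the HIR, and the rule "slot count equals highest variable ID plus one" is an assumption; the source only states that variable IDs are visited.

```rust
// Toy stand-ins for HIR nodes; the real runmat_hir types differ.
enum Expr {
    Number(f64),
    Var(usize),
    Binary(Box<Expr>, Box<Expr>),
}

enum Stmt {
    Assign(usize, Expr),
    ExprStmt(Expr),
}

// Highest variable ID referenced inside an expression, if any.
fn max_var_in_expr(e: &Expr) -> Option<usize> {
    match e {
        Expr::Number(_) => None,
        Expr::Var(id) => Some(*id),
        Expr::Binary(l, r) => max_var_in_expr(l).max(max_var_in_expr(r)),
    }
}

// Assumption: slots are indexed by ID, so the count is the highest ID plus one.
fn var_count(body: &[Stmt]) -> usize {
    body.iter()
        .map(|s| match s {
            Stmt::Assign(id, rhs) => Some(*id).max(max_var_in_expr(rhs)),
            Stmt::ExprStmt(e) => max_var_in_expr(e),
        })
        .max()
        .flatten()
        .map_or(0, |m| m + 1)
}
```

A program that assigns variable 0 and later reads variable 3 would need four slots under this rule, even though IDs 1 and 2 never appear.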
High-level Pass Structure
Compilation is single-pass over statements with small local analyses. A preliminary sweep records imports and global/persistent declarations so name resolution and storage bindings are stable.
- Validate program invariants:
  - validate_imports(prog) for duplicate/ambiguous imports
  - validate_classdefs(prog) for class attribute/name conflicts
- Pre-collect declarations and emit them:
  - Import: RegisterImport { path, wildcard }
  - Global/Persistent: named forms are emitted so the VM can bind thread locals at runtime
- Compile all other statements sequentially.
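The ordering of these steps can be sketched as follows. The `Stmt` enum, `emission_order`, and the exact duplicate rule are hypothetical; only the function name validate_imports and the "declarations first, then everything else" ordering come from the text above, and this version covers duplicate detection only.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Import { path: String, wildcard: bool },
    Global(String),
    Other(String), // placeholder for every non-declaration statement
}

// Rejects duplicate imports (assumed rule; ambiguity checks are omitted here).
fn validate_imports(prog: &[Stmt]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for s in prog {
        if let Stmt::Import { path, wildcard } = s {
            if !seen.insert((path.clone(), *wildcard)) {
                return Err(format!("duplicate import: {}", path));
            }
        }
    }
    Ok(())
}

// Declarations first (in source order), then the remaining statements in order.
fn emission_order(prog: &[Stmt]) -> Result<Vec<Stmt>, String> {
    validate_imports(prog)?;
    let (decls, rest): (Vec<Stmt>, Vec<Stmt>) = prog
        .iter()
        .cloned()
        .partition(|s| !matches!(s, Stmt::Other(_)));
    Ok(decls.into_iter().chain(rest).collect())
}
```

Hoisting declarations this way means that a statement appearing before an import in source order still sees that import during name resolution.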
Name Resolution and Imports
Unqualified names at call-sites are resolved at compile-time using the following precedence:
- Local variables / constants handled directly by HIR kinds (Var, Constant)
- User functions defined in the current compilation unit
- Specific imports: import pkg.foo resolves foo → pkg.foo
- Wildcard imports: import pkg.* resolves foo → pkg.foo
- Static class methods: import MyClass.* may resolve foo → CallStaticMethod("MyClass", "foo", ...) if unambiguous
Builtins are looked up first by unqualified name, then via specific imports, then via wildcard imports. Static properties can also be compiled under a Class.* wildcard when unambiguous. Ambiguities produce compile-time errors.
Constants (HirExprKind::Constant) are resolved against runmat_builtins::constants(). If not found, the compiler attempts unqualified static property lookup via Class.* imports.
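The call-site precedence described in this section can be sketched as a small resolver. Everything here is hypothetical: `Resolved`, `pick`, `resolve_call`, and the `known` set (a stand-in for whatever registry tells the compiler that a qualified name exists). The sketch covers user functions, specific imports, and wildcard imports only; static class methods and builtins are left out.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Resolved {
    UserFunction(String),
    Qualified(String),
}

// Zero candidates: keep looking. One: resolved. More: compile-time ambiguity error.
fn pick(mut candidates: Vec<String>, name: &str) -> Result<Option<Resolved>, String> {
    match candidates.len() {
        0 => Ok(None),
        1 => Ok(candidates.pop().map(Resolved::Qualified)),
        _ => Err(format!("ambiguous name '{}': {}", name, candidates.join(", "))),
    }
}

fn resolve_call(
    name: &str,
    user_functions: &HashSet<String>,
    specific_imports: &[&str], // full paths, e.g. "pkg.foo"
    wildcard_imports: &[&str], // package prefixes, e.g. "pkg" for `import pkg.*`
    known: &HashSet<String>,   // hypothetical set of existing qualified names
) -> Result<Option<Resolved>, String> {
    // User functions in the current unit win over any import.
    if user_functions.contains(name) {
        return Ok(Some(Resolved::UserFunction(name.to_string())));
    }
    // Specific imports: match on the last path segment.
    let specific: Vec<String> = specific_imports
        .iter()
        .filter(|p| p.rsplit('.').next() == Some(name))
        .map(|p| p.to_string())
        .collect();
    if let Some(hit) = pick(specific, name)? {
        return Ok(Some(hit));
    }
    // Wildcard imports: only candidates that actually exist count.
    let wildcard: Vec<String> = wildcard_imports
        .iter()
        .map(|pkg| format!("{}.{}", pkg, name))
        .filter(|q| known.contains(q))
        .collect();
    pick(wildcard, name)
}
```

Returning Ok(None) lets a caller continue with the later steps (static class methods, builtins) instead of treating an unresolved name as an error at this point.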
Expressions
The compiler is responsible for stack shaping and choosing appropriate instructions. Representative cases:
- Numbers/strings/chars:
  - Number → LoadConst
  - String('...') → LoadCharRow (char row vector)
  - String("...") → LoadString (scalar string)
  - Constant → constant lookup or LoadStaticProperty via Class.* imports
- Unary and binary ops:
  - Emit left then right then the operator instruction (e.g., Add, ElemMul, Pow)
  - Logical ! lowers to x == 0
  - Short-circuit && / || use conditional jumps emitting only the necessary side (see “Short-circuit lowering” below)
- Ranges:
  - start[:step]:end → push components then CreateRange(has_step)
- Indexing and slicing:
  - Pure numeric: Index(n)
  - Mixed :, end, vector/range/logical: IndexSlice(dims, numeric_count, colon_mask, end_mask)
  - end - k in numeric positions: IndexSliceEx(..., end_offsets)
  - Range endpoints using end per-dimension: IndexRangeEnd or 1-D fast-path Index1DRangeEnd
- Literals (tensor/cell):
  - Pure numeric rectangular matrices: CreateMatrix(rows, cols)
  - Mixed/dynamic: push all elements row-major + row lengths then CreateMatrixDynamic(rows)
  - Cells: CreateCell2D(rows, cols) (rectangular) or ragged fallback using the same opcode with (1,total)
  - Special case: [C{:}] lowers to CallBuiltinExpandMulti("cat", specs) with first arg fixed 2 and second argument expand-all from the cell
- Function calls:
  - User function: CallFunction(name, argc)
  - Builtin: CallBuiltin(name, argc)
  - Comma-list expansion from C{...} arguments uses Call*ExpandMulti with ArgSpec per argument
  - feval(f, ...) uses CallFeval / CallFevalExpandMulti for dynamic dispatch (closures, handles, strings)
  - If an argument is a user function call, the compiler can “inline-expand” its multiple outputs into the caller’s argument list via CallFunctionMulti + packing
- Anonymous functions and closures:
  - Free variables are discovered (collect_free_vars) and captured by value in the order of first appearance
  - A synthetic UserFunction is created with captures prepended to parameters
  - CreateClosure(synth_name, capture_count) is emitted with capture values on the stack
- Member access and methods:
  - Instance field: LoadMember(field) / StoreMember(field)
  - Dynamic field: LoadMemberDynamic / StoreMemberDynamic
  - Instance method call: CallMethod(name, argc)
  - Static property/method via classref('T') or metaclass literal: LoadStaticProperty / CallStaticMethod
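The capture rule for anonymous functions listed above (free variables in order of first appearance, prepended to the parameters) can be sketched as follows. This is a toy version: the `Expr` type uses string names where the HIR uses variable IDs, the `collect_free_vars` shown here only shares its name with the compiler's helper, and `synthesized_params` is hypothetical.

```rust
// Toy expression tree; names are strings here for readability.
enum Expr {
    Num(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

// Collects variables used in `body` that are not parameters,
// in order of first appearance and without duplicates.
fn collect_free_vars(body: &Expr, params: &[&str], out: &mut Vec<String>) {
    match body {
        Expr::Num(_) => {}
        Expr::Var(name) => {
            if !params.contains(&name.as_str()) && !out.contains(name) {
                out.push(name.clone());
            }
        }
        Expr::Add(l, r) => {
            collect_free_vars(l, params, out);
            collect_free_vars(r, params, out);
        }
    }
}

// Parameter list of the synthesized function: captures first, then the
// declared parameters. Also returns the capture count.
fn synthesized_params(body: &Expr, params: &[&str]) -> (Vec<String>, usize) {
    let mut captures = Vec::new();
    collect_free_vars(body, params, &mut captures);
    let capture_count = captures.len();
    let mut all = captures;
    all.extend(params.iter().map(|p| p.to_string()));
    (all, capture_count)
}
```

For @(x) a + x + b + a this yields the parameter list a, b, x with a capture count of 2, which corresponds to the count passed to CreateClosure.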
Statements
- Expression statement: compile expr then Pop
- Assignment: compile RHS then StoreVar(var_id)
- If / ElseIf / Else: emit JumpIfFalse guards per branch and backpatch end jumps
- While: loop header guard with a single JumpIfFalse to loop end; keep break / continue stacks to backpatch
- For-range: requires a Range expression; the compiler emits:
  - Initialize var, end_var, step_var (with default step 1)
  - If step == 0 jump to loop end
  - Conditional form depends on step sign: var <= end for non-negative, var >= end otherwise
  - Body, handle continue, then increment var += step
- Switch: compare scrutinee against each case, chain JumpIfFalse, collect end jumps
- Try/Catch: EnterTry(catch_pc, catch_var?), compile try body, PopTry, jump over catch; then compile catch and backpatch
- Global/Persistent/Import: named variants emitted up-front and repeated if they occur in function bodies for binding in VM
- Function definitions: materialize UserFunction entries (not executed inline)
- Class definitions: lowered to a single RegisterClass carrying static metadata for runtime registration
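Backpatching, as used for the while loop and break above, can be sketched like this. JumpIfFalse is named in this document; LoadVar, Jump, Nop, absolute jump targets, and the `BodyItem` and `compile_while` helpers are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Instr {
    LoadVar(usize),     // hypothetical: pushes the loop condition variable
    JumpIfFalse(usize), // assumed to take an absolute instruction index
    Jump(usize),
    Nop,                // stands in for ordinary loop-body work
}

// Minimal loop body model: ordinary work or a `break`.
enum BodyItem {
    Work,
    Break,
}

fn compile_while(cond_var: usize, body: &[BodyItem]) -> Vec<Instr> {
    let mut code = Vec::new();
    let loop_start = code.len();
    code.push(Instr::LoadVar(cond_var));
    // The loop end is not known yet: emit a placeholder and remember its position.
    let guard = code.len();
    code.push(Instr::JumpIfFalse(usize::MAX));
    let mut break_jumps = Vec::new();
    for item in body {
        match item {
            BodyItem::Work => code.push(Instr::Nop),
            BodyItem::Break => {
                break_jumps.push(code.len());
                code.push(Instr::Jump(usize::MAX));
            }
        }
    }
    code.push(Instr::Jump(loop_start));
    // Now the end is known: patch the guard and every pending break.
    let loop_end = code.len();
    code[guard] = Instr::JumpIfFalse(loop_end);
    for pc in break_jumps {
        code[pc] = Instr::Jump(loop_end);
    }
    code
}
```

The same placeholder-then-patch pattern would apply to the end jumps of if/elseif chains and switch cases; continue targets are omitted from this sketch.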
Multi-assign
[a,b,c] = rhs lowers as follows:
- If rhs is a user function: emit CallFunctionMulti(name, argc, outc) where outc == len([a,b,c]), then StoreVar / Pop right-to-left
- If rhs is builtin/unknown: CallBuiltinMulti(name, argc, outc) and distribute results similarly
- If rhs is C{...} (cell indexing): IndexCellExpand(num_indices, outc)
- Otherwise: first real variable gets expr, others receive 0 (MATLAB-compatible defaulting in many test paths)
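The user-function case above can be sketched as follows. The instruction names come from this document; the payload shapes, the `lower_multi_assign` helper, the mapping of ignored outputs to Pop, and the assumption that outputs are pushed in order (so the last output ends up on top) are illustrative guesses.

```rust
#[derive(Debug, PartialEq)]
enum Instr {
    CallFunctionMulti(String, usize, usize), // (name, argc, outc), shape assumed
    StoreVar(usize),
    Pop,
}

// `targets[i]` is Some(var_id) for a named output and None for an ignored one.
fn lower_multi_assign(name: &str, argc: usize, targets: &[Option<usize>]) -> Vec<Instr> {
    let mut code = vec![Instr::CallFunctionMulti(
        name.to_string(),
        argc,
        targets.len(),
    )];
    // Assumed stack layout: outputs pushed in order, last output on top,
    // so the stores run right-to-left.
    for t in targets.iter().rev() {
        code.push(match t {
            Some(id) => Instr::StoreVar(*id),
            None => Instr::Pop,
        });
    }
    code
}
```

With targets for variable 0, an ignored output, and variable 2, the emitted tail would be StoreVar(2), Pop, StoreVar(0).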
L-values (Index and Member assignment)
- Numeric-only indexing: push base and indices, compile RHS (with user-function packing optimization for 1-D), StoreIndex(n) then write-back to variable/member
- Slices (:, end, ranges, vectors, logical): compute masks and numeric-count, compile RHS (attempting vector packing for function returns or cell expand), then StoreSlice / StoreSliceEx / StoreRangeEnd / StoreSlice1DRangeEnd as appropriate, finally store back to the base variable or member
- Cell assignment: StoreIndexCell(n)
- Member and dynamic-member assignment re-evaluate the base when necessary to perform StoreMember / StoreMemberDynamic and then store updated object back to the root variable if applicable
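The "compute masks and numeric-count" step for slices can be pictured with the sketch below. The bit layout (bit i of each mask corresponds to index position i), the u32 mask width, and the `IndexKind` and `index_masks` names are all assumptions; the actual encoding is specified in INDEXING_AND_SLICING.md.

```rust
// Hypothetical classification of one index position.
enum IndexKind {
    Colon,   // `:`
    End,     // bare `end`
    Numeric, // any other index expression, pushed on the stack
}

// Returns (colon_mask, end_mask, numeric_count).
// Assumes fewer than 32 index positions, since the masks are u32 here.
fn index_masks(kinds: &[IndexKind]) -> (u32, u32, usize) {
    let mut colon_mask = 0u32;
    let mut end_mask = 0u32;
    let mut numeric_count = 0usize;
    for (i, k) in kinds.iter().enumerate() {
        match k {
            IndexKind::Colon => colon_mask |= 1u32 << i,
            IndexKind::End => end_mask |= 1u32 << i,
            IndexKind::Numeric => numeric_count += 1,
        }
    }
    (colon_mask, end_mask, numeric_count)
}
```

Under this layout, A(:, end, 3) = rhs would produce a colon mask with bit 0 set, an end mask with bit 1 set, and one numeric index on the stack.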
Short-circuit lowering
a && b:
- compile a, compare to 0 (NotEqual), JumpIfFalse over RHS path
- compile b, compare to 0, unconditional jump to end
- false path pushes 0
a || b:
- compile a, compare to 0, JumpIfFalse to RHS path
- true path pushes 1, jump to end
- RHS path compiles b != 0
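The a && b sequence above can be sketched and checked with a toy evaluator. NotEqual, JumpIfFalse, and LoadConst are named in this document; the Jump variant, absolute jump targets, the reading of "compare to 0" as pushing a literal 0 followed by NotEqual, and the `lower_and` and `run` helpers are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LoadConst(f64),
    NotEqual,
    JumpIfFalse(usize), // assumed to pop the condition and take an absolute target
    Jump(usize),
}

// Lowers `a && b` given the already-compiled code for each operand.
fn lower_and(a: &[Instr], b: &[Instr]) -> Vec<Instr> {
    let mut code = a.to_vec();
    code.push(Instr::LoadConst(0.0));
    code.push(Instr::NotEqual); // a != 0
    let guard = code.len();
    code.push(Instr::JumpIfFalse(usize::MAX)); // patched below
    code.extend_from_slice(b);
    code.push(Instr::LoadConst(0.0));
    code.push(Instr::NotEqual); // b != 0
    let skip = code.len();
    code.push(Instr::Jump(usize::MAX)); // patched below
    let false_path = code.len();
    code.push(Instr::LoadConst(0.0)); // false path pushes 0
    let end = code.len();
    code[guard] = Instr::JumpIfFalse(false_path);
    code[skip] = Instr::Jump(end);
    code
}

// Toy evaluator used only to exercise the lowering; panics on stack underflow.
fn run(code: &[Instr]) -> f64 {
    let mut stack: Vec<f64> = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        match &code[pc] {
            Instr::LoadConst(v) => stack.push(*v),
            Instr::NotEqual => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push(if l != r { 1.0 } else { 0.0 });
            }
            Instr::JumpIfFalse(t) => {
                if stack.pop().unwrap() == 0.0 {
                    pc = *t;
                    continue;
                }
            }
            Instr::Jump(t) => {
                pc = *t;
                continue;
            }
        }
        pc += 1;
    }
    stack.pop().unwrap()
}
```

Because the guard jumps straight to the false path, the code for b is never executed when a is zero; a || b would follow the mirrored shape, with the true path pushing 1.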
Objects and Classes
Objects are mediated by runtime registries:
- Instance access: LoadMember / StoreMember check access and dependent-property behavior; when absent, fall back to subsref / subsasgn if provided by the class
- Methods: CallMethod dispatches to ClassName.method or generic name builtins; LoadMethod returns a closure bound to the receiver
- Static members: LoadStaticProperty / CallStaticMethod with class names from classref('T') or metaclass literal; RegisterClass installs definitions at runtime
Diagnostics
Compiler errors are returned as Err(String). Runtime errors are normalized by the VM via the mex error model (see ERROR_MODEL.md).
For instruction-by-instruction semantics see INSTR_SET.md. For gather/scatter details see INDEXING_AND_SLICING.md.