RunMat HIR
High-level Intermediate Representation for MATLAB code. HIR is the semantic hub between parsing and execution (interpreter/JIT). It resolves identifiers to VarIds, attaches static types, normalizes constructs, and runs early semantic validations so downstream components can be simpler and faster.
Goals
- Provide a typed, SSA-friendly structure for the engine
- Preserve MATLAB semantics (indexing, cells, classes, methods, metaclass)
- Enable flow-sensitive inference and optimizations (constant folding, dispatch)
- Catch structural and attribute errors early (classdef attributes, imports)
Core data structures
VarId(usize): stable variable identifiers after name bindingType(fromrunmat-builtins):Int,Num,Bool,StringTensor { shape: Option<Vec<Option<usize>>> }(column-major semantics)Cell { element_type: Option<Box<Type>>, length: Option<usize> }Function { params: Vec<Type>, returns: Box<Type> }Struct { known_fields: Option<Vec<String>> }(inference-only)Void,Unknown,Union(Vec<Type>)
HirExpr { kind, ty }(selected variants):- Literals and names:
Number,String,Var(VarId),Constant - Ops:
Unary,Binary - Aggregates:
Tensor,Cell,Range,Colon,End - Indexing:
Index,IndexCell - Calls and members:
FuncCall,FuncHandle,AnonFunc,Member,MemberDynamic,MethodCall - Metaclass:
MetaClass("pkg.Class")
- Literals and names:
HirStmt(selected variants):ExprStmt(expr, suppressed)(semicolon suppression)Assign(VarId, expr, suppressed)MultiAssign(Vec<Option<VarId>>, expr, suppressed)with~asNoneAssignLValue(HirLValue, expr, suppressed)whereHirLValue∈ IndexCell- Control flow:
If,While,For,Switch,TryCatch - Declarations:
Function { name, params, outputs, body, has_varargin, has_varargout },Global,Persistent - Flow control:
Break,Continue,Return - Class:
ClassDef { name, super_class, members } - Imports:
Import { path: Vec<String>, wildcard: bool }
HirClassMember:Properties,Methods,Events,Enumeration,Arguments(carryparser::Attrattributes)HirProgram { body }
Lowering (AST → HIR)
Ctxmanages scopes, binds names toVarId, and maintainsvar_typesfor flow typing.- Variables shadow constants; bare identifiers that are known functions lower to
FuncCall(name, []). - Indexing vs calls is already disambiguated by the parser; HIR keeps
Index/IndexCellandFuncCalldistinct. - L-values lower to
HirLValuefor dot/paren/brace writes. PlainA(…) = visAssignLValue. Functionstatements recordhas_varargin/has_varargoutflags.ClassDeflowers structurally intoHirClassMemberblocks with attributes preserved.Importlowers to a dedicatedHirStmt::Import(no runtime effect; used by name resolution/validation).- Metaclass
?Qualified.Namelowers toHirExprKind::MetaClass("Qualified.Name"); postfix is handled in the compiler. - Function-level
arguments ... endblocks (when present) are parsed; names are accepted and exposed to later validation. Constraint checking (types/defaults/ranges) is enforced at HIR/VM time rather than parsing time.
Early validations and helpers
validate_classdefs(&HirProgram)runs duringlower():- Detects duplicate properties/methods and name conflicts between them
- Enforces attribute constraints (e.g., Methods:
Abstract∧Sealedinvalid; Properties:Static∧Dependentinvalid;Access/GetAccess/SetAccessvalues limited topublic|private) - Performs basic sanity checks for
Events,Enumeration, andArguments(unique names; no conflicts with props/methods)
- Imports:
collect_imports(&HirProgram)normalize_imports(&HirProgram) -> Vec<NormalizedImport { path, wildcard, unqualified }]validate_imports(&HirProgram)checks duplicates and ambiguity among specifics with the same unqualified name
- Multi-LHS structural validation: lowering rejects invalid LHS shapes early (e.g., empty LHS vectors, unsupported mixed forms); shape/size rules are enforced by the interpreter at assignment.
- Globals/Persistents: a per-program symbol set is collected across units to model lifetimes and name binding consistently.
Type inference (expressions)
- Numbers/strings/booleans map to
Num/String/Bool. - Arithmetic/elementwise ops: if any operand is
Tensor, result isTensor(shape may unify when known). - Range/colon produce
Tensor. - Indexing computes output type conservatively. For tensors with known rank, scalar indices drop dimensions.
- Cells compute a unified element type across literals when possible.
- Member/Method calls are
Unknownby default (value-dependent at runtime). - Metaclass expression has
Stringtype.
Flow-sensitive inference
Two complementary passes exist:
- Inter-procedural return summaries
infer_function_output_types(&HirProgram) -> HashMap<String, Vec<Type>>- Gathers all function names (top-level and class methods)
- Seeds summaries from each function's own exits/fallthrough, then iterates to a small fixed point (cap at 3 iters)
- Merges types at joins; Unknown ⊔ T = T; otherwise unify
- Uses an internal
analyze_stmts(outputs, …, func_returns)whose env joins propagate return types
- Per-function variable environments
infer_function_variable_types(&HirProgram) -> HashMap<String, HashMap<VarId, Type>- Similar dataflow that produces a final environment for each function
- Uses return summaries from (1) to type
FuncCall - Includes a simple callsite fallback for direct callees: when a callee's summary is missing/Unknown, a single-pass analysis of the callee body (seeding parameter types conservatively) infers direct output assignments. This stabilizes per-position types for
[a,b]=f(...)at callers.
Struct-field flow inference
- HIR uses
Type::Struct { known_fields: Option<Vec<String>> }to conservatively track observed fields on variables. - The analysis refines struct knowledge in two ways:
- Writes:
s.field = exprmarkssas Struct and adds"field"toknown_fields. - Conditions (then-branch refinement): detect any of the following and add asserted fields:
isfield(s, 'x')ismember('x', fieldnames(s))orismember(fieldnames(s), 'x')strcmp(fieldnames(s), 'x')/strcmpi(…), includingany(strcmp(…))orall(strcmp(…))- Conjunctions using
&&or&are traversed; negations are ignored (no refinement)
- Writes:
- Refinements are applied to the then-branch env only and merged back at joins using
Type::unifyfor Structs.
Multi-assign typing
[a,b] = f(...)is typed per-position using the callee's return summary when available.- If a summary is incomplete or missing, a simple fallback (single-pass over the callee) infers direct assignments to outputs and fills Unknowns conservatively.
- Mixed forms like
[~,b] = f(...)are handled by storingNonein the LHS vector and skipping the slot.
Function call typing
- Builtins: signatures come from the registry (
runmat-builtins). - User functions: return summaries and the per-position logic above are used for accurate call result typing in both expression and
MultiAssigncontexts.
Remapping utilities
remapping::create_function_var_map,create_complete_function_var_mapremapping::remap_function_body/remap_stmt/remap_exprto rewriteVarIds for local execution framesremapping::collect_function_variablesscans bodies to compute complete maps
Public entry points
lower(&AstProgram) -> Result<HirProgram, String>: lowers AST, runs return-summary inference (for seeding), then validates classeslower_with_context/lower_with_full_context: lowering for REPL with preexisting variables/functions- Validation helpers:
validate_classdefs,collect_imports,normalize_imports,validate_imports - Inference helpers:
infer_function_output_types,infer_function_variable_types
Testing
- Mirrors parser coverage for syntax constructs; adds HIR-specific tests:
- L-value lowering (member/paren/brace), multi-assign and
~placeholder - Control-flow joins across if/elseif/else, switch/otherwise, while/for loops, try/catch
- Class attribute validation (invalid combos, duplicates, conflicts)
- Import normalization/ambiguity checks
- Fuzz seeds for lowering edge cases
- L-value lowering (member/paren/brace), multi-assign and
Notes and differences from MATLAB
- MATLAB is dynamically typed; HIR attaches conservative static types for optimization only. Programs acceptable to MATLAB remain acceptable; Unknown is used when insufficient info.
- Column-major Tensor semantics are preserved throughout indexing/slicing/shape operations.
- Class blocks are carried structurally; access/attribute validations run during lowering; advanced OOP attributes may have future passes.
- Metaclass expressions are represented explicitly; postfix static member/method usage is compiled appropriately downstream.
Roadmap / future enhancements
- Inter-procedural propagation of struct field knowledge across calls
- Deeper OOP attribute validations (Hidden/Constant/Transient interplay; static/instance access rules)
- Richer import resolution summaries for static method/property lookup in the HIR stage
- Shape reasoning improvements for Tensor broadcasting and indexing
Remaining edges
- Arguments metadata: carry
arguments ... enddeclared names/constraints (when available from parser) and surface to runtime validation. Current parser accepts names; HIR will add optional metadata structs without breaking format. - Multi-LHS validation: parser structurally restricts to identifiers/
~; HIR enforces shape semantics at runtime. Additional unit tests exist; no further work is blocking. - Globals/Persistents: cross-unit name binding is wired; additional tests around nested functions/closures will be added.
Minimal example
MATLAB:
function y = f(s)
if isfield(s, 'x') && any(strcmp(fieldnames(s), 'y'))
s.y = 1;
end
y = g(s.x);
end
HIR sketch:
Function { name: "f", params: [s], outputs: [y], ... }
If { cond: FuncCall("isfield", [Var(s), String('x')]) && any(strcmp(fieldnames(s),'y')), then: [ AssignLValue(Member(Var(s),'y'), Number(1)) ] }
Assign(Var(y), FuncCall("g", [Member(Var(s), "x")]))
Return summaries infer type of g's first output if available; variable analysis refines s as a Struct with fields {x,y} along the then-branch.