Power-Step Normalisation Fusion
Power-step fusion targets iterative solvers that repeatedly multiply by a matrix and renormalise the result (e.g. block power iteration, Krylov updates, orthogonal projection loops).
What Qualifies
- Matmul numerator. The numerator must be a builtin
mtimes(lhs, rhs). - Denominator structure. The denominator must follow
sqrt(sum((numerator.^2)) + epsilon)orsqrt(sum((numerator.^2)))with an optional scalar epsilon add. - Shared input. The
.^2node must receive the same value ID as the matmul output; extra reshapes/transposes are not recognised. - Constant epsilon. If an epsilon is present it must be a scalar literal so the planner can embed it.
- Exclusive ownership. Nodes in the pattern (matmul, pow, sum, add, sqrt, div) cannot be part of another fusion group.
Why Power-Step Normalisation?
Power-step normalisation is a common pattern in iterative solvers. It shows up in block power iteration, Krylov updates, orthogonal projection loops, and other iterative solvers. By fusing this pattern, we can keep the normalised result resident on the GPU, avoiding the overhead of uploading and downloading it to the CPU, and allows us to compute it without launching a second kernel.
Benefits
- Fewer GPU launches. We can consolidate multiple power-step normalisation computations into a single kernel, reducing the number of GPU launches.
Limitations & Tips
- The detector expects the epsilon add (if any) to appear either inside or outside the
sqrtexactly once. Reordering the scalar add will break fusion. - Different normalisation schemes (e.g. per-column scaling or norms other than 2-norm) are not yet fused.
- Very large
lhs/rhsshapes may trip the provider’s matmul guardrails and force a fallback.
If a workload should fuse but does not, enable RUNMAT_DEBUG_FUSION=1 to have the planner print why a node was rejected, then compare against the criteria above.