Fusion Reference

Fusion Guide

RunMat's acceleration layer recognises multiple flavours of fusible graphs and hands them to your GPU provider as single kernels.

RunMat fuses common patterns that show up across linear algebra, signal processing, imaging, and solver workloads into single GPU programs. Keeping them fused prevents redundant memory traffic and lets us re-use provider kernels to ship quickly. The sketch after the list below shows the typical shape of each group.

  • Elementwise & reductions: Collapse dozens of scalar operations into one dispatch and prevent repeated reads/writes of the same tensor.
  • Matmul epilogues: Fusing scale, bias, and activation work into the matrix multiply avoids launching a second kernel that touches the full output again, and is how RunMat meets its matmul + activation parity goals.
  • Covariance / Gram / power-step / explained-variance chains: Iterative factorisations spend most of their time in repeated "multiply, renormalise, measure" loops. Treating each stage as a fusion kind keeps eigensolvers and Krylov methods resident on the GPU.
  • Image normalisation: Imaging and sensor pipelines often start with per-frame whitening plus gain/bias adjustments. Folding statistics and affine transforms into one kernel removes several launches per frame.
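
To make these groups concrete, the sketch below shows the kind of script-level code each one targets. These are illustrative MATLAB-style snippets, not the planner's exact matching rules: the variable names (x, mu, sigma, A, B, bias, Xc, v, frame, gain, offset) are hypothetical, and the precise instruction patterns the planner accepts are listed on each pattern's page.

```matlab
% Elementwise & reductions: a chain of scalar ops ending in a reduction can
% run as one dispatch instead of one launch per operator.
y = sum(exp(-((x - mu).^2) ./ (2 * sigma.^2)), 2);

% Matmul epilogue: scale, bias, and activation folded onto the matrix
% multiply so the full output is written only once.
z = max(alpha * (A * B) + bias, 0);

% Covariance / Gram / power-step / explained-variance chain: the
% "multiply, renormalise, measure" loop behind iterative eigensolvers.
C = (Xc' * Xc) / (n - 1);    % covariance of centred data Xc
for k = 1:iters
    v = C * v;               % power step
    v = v / norm(v);         % renormalise
    lambda = v' * C * v;     % variance explained along v
end

% Image normalisation: per-frame whitening plus gain/bias in a single pass.
frame = (frame - mean(frame(:))) ./ std(frame(:));
frame = gain .* frame + offset;
```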

We prioritised these groups because they appear across many domains, keep chatty host/device traffic off PCIe, and gain the most from staying fused. We'll add more fusion groups over time to cover additional workloads.

Have a new fusion flavour in mind? Open an issue or submit a pull request so we can explore it together.

RunMat Currently Fuses the Following Patterns

How to Use These Docs

  1. Looking for coverage: Start with the link that matches your math. Each page lists the exact instruction patterns the fusion planner looks for and the operations that stay on device.
  2. Investigating surprises: If a workload isn't fusing, cross-check the prerequisites section (e.g. single-consumer chains for elementwise groups or constant epsilon for power steps); the sketch after this list illustrates the single-consumer rule.
  3. Extending RunMat: Combine these docs with docs/HOW_RUNMAT_FUSION_WORKS.md to see where to add new detection logic or builtin metadata.
  4. Telemetry correlation: Provider telemetry reports fusion_kind labels. Match those labels to the filenames above to understand what the GPU executed.
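
As a rough illustration of the single-consumer prerequisite mentioned in step 2: the variable names below are hypothetical, and whether a given chain actually fuses depends on the planner rules described in docs/HOW_RUNMAT_FUSION_WORKS.md.

```matlab
% Likely to fuse into one elementwise group: every intermediate feeds
% exactly one consumer, so the whole chain can become a single kernel.
t = (x - mu) ./ sigma;
y = tanh(t) .^ 2;

% May break the chain: t now has two consumers (y and z), so the planner
% can be forced to materialise t and split the work across kernels.
t = (x - mu) ./ sigma;
y = tanh(t);
z = exp(t);
```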