
Elementwise Math Benchmark

11/25/2025
By RunMat Team

This benchmark exercises a chain of elementwise operations representative of typical image or signal pre-processing. For each input vector `x` we compute:

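```matlab
% The single(...) constants make each result single precision (float32).
y0 = sin(x) .* exp(-x / single(10));                 % damped sinusoid
y1 = y0 .* cos(x / 4) + single(0.25) .* (y0 .^ 2);   % modulation plus a scaled square term
y2 = tanh(y1) + single(0.1) .* y1;                   % tanh saturation plus a small linear term
```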

The scripts set the number of samples via `ELM_POINTS` (default 5,000,001), and every implementation prints `RESULT_ok`.
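For readers comparing against the NumPy and PyTorch versions, here is a minimal scalar reference of the same three stages in plain Python. It is a sketch, not one of the benchmark scripts. The function name `elementwise_pipeline` is hypothetical, and the benchmark's own input generation is not shown here, so the sample inputs are made up. Inputs and outputs are stored as float32 via `array('f')` to mirror the `single(...)` constants. Intermediate arithmetic runs in Python's double precision, so results can differ from a pure single-precision run in the last bits.

```python
import math
from array import array


def elementwise_pipeline(x):
    """Scalar reference for the y0 -> y1 -> y2 chain (hypothetical helper, not a benchmark script)."""
    out = array("f")  # typecode 'f' stores C floats, i.e. float32 on mainstream platforms
    for xi in array("f", x):  # round the inputs to float32, like a single-precision x
        y0 = math.sin(xi) * math.exp(-xi / 10.0)        # y0 = sin(x) .* exp(-x / 10)
        y1 = y0 * math.cos(xi / 4.0) + 0.25 * y0 ** 2    # y1 = y0 .* cos(x / 4) + 0.25 .* y0.^2
        out.append(math.tanh(y1) + 0.1 * y1)             # y2 = tanh(y1) + 0.1 .* y1
    return out


if __name__ == "__main__":
    print(list(elementwise_pipeline([0.0, 1.0, 2.0])))
```

At `x = 0` every stage is zero, so the first output is exactly `0.0`. The benchmark applies the same chain to all `ELM_POINTS` elements at once instead of looping element by element.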


Results

RunMat is up to 144× faster than PyTorch and up to 83× faster than NumPy (at 1B points)

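Elementwise Math Perf Sweep (by number of points)

| Points | RunMat (ms) | PyTorch (ms) | NumPy (ms) | NumPy ÷ RunMat | PyTorch ÷ RunMat |
|---|---|---|---|---|---|
| 1M | 145.15 | 856.41 | 72.39 | 0.50× | 5.90× |
| 2M | 149.75 | 901.05 | 79.49 | 0.53× | 6.02× |
| 5M | 145.14 | 1,111.16 | 119.45 | 0.82× | 7.66× |
| 10M | 143.39 | 1,377.43 | 154.38 | 1.08× | 9.61× |
| 100M | 144.81 | 16,404.22 | 1,073.09 | 7.41× | 113.28× |
| 200M | 156.94 | 16,558.98 | 2,114.66 | 13.47× | 105.51× |
| 500M | 137.58 | 17,882.11 | 5,026.94 | 36.54× | 129.97× |
| 1B | 144.40 | 20,841.42 | 11,931.93 | 82.63× | 144.34× |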

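M = 10⁶ elements, B = 10⁹ elements. Each ratio is the baseline time divided by RunMat's time, so values above 1× mean RunMat is faster. RunMat's time stays roughly flat (about 138 to 157 ms) across the whole sweep, while NumPy and PyTorch times grow with problem size. At 1M to 5M points, NumPy is still faster than RunMat.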
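The ratio columns are simple quotients of the timing columns. As a sanity check, the sketch below recomputes them from the rounded millisecond values in the sweep table and derives RunMat's throughput in elements per second. The dictionary layout and the function names are illustrative and are not part of the benchmark scripts. Because the displayed times are themselves rounded, a recomputed ratio can differ from the table in the last digit (for example, 144.33 rather than 144.34 at 1B points).

```python
# Times in milliseconds, copied from the sweep table: points -> (RunMat, PyTorch, NumPy)
TIMINGS_MS = {
    1_000_000: (145.15, 856.41, 72.39),
    2_000_000: (149.75, 901.05, 79.49),
    5_000_000: (145.14, 1111.16, 119.45),
    10_000_000: (143.39, 1377.43, 154.38),
    100_000_000: (144.81, 16404.22, 1073.09),
    200_000_000: (156.94, 16558.98, 2114.66),
    500_000_000: (137.58, 17882.11, 5026.94),
    1_000_000_000: (144.40, 20841.42, 11931.93),
}


def speedups(timings):
    """Map points -> (NumPy / RunMat, PyTorch / RunMat); values above 1 mean RunMat is faster."""
    return {n: (numpy_ms / runmat_ms, torch_ms / runmat_ms)
            for n, (runmat_ms, torch_ms, numpy_ms) in timings.items()}


def runmat_throughput(timings):
    """Map points -> elements processed per second by RunMat (points / seconds)."""
    return {n: n / (runmat_ms / 1000.0) for n, (runmat_ms, _, _) in timings.items()}


if __name__ == "__main__":
    ratios = speedups(TIMINGS_MS)
    tput = runmat_throughput(TIMINGS_MS)
    for n, (vs_numpy, vs_torch) in ratios.items():
        print(f"{n:>13,} pts  NumPy/RunMat {vs_numpy:7.2f}x  "
              f"PyTorch/RunMat {vs_torch:7.2f}x  RunMat {tput[n] / 1e9:5.2f} G elem/s")
```

At 1B points, this works out to roughly 6.9 billion elements per second for RunMat.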




Why RunMat is fast (accelerate + fusion)

RunMat fuses the elementwise stages and keeps tensors resident on the device between steps, so the computation executes in large, coalesced kernels, a strong fit for GPUs. For the big picture on fusion and residency, see the Introduction to RunMat on the GPU document.
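To make the fusion idea concrete without a GPU, the sketch below contrasts two approaches. Eager evaluation makes a full pass for each primitive operation and materializes a temporary array each time. A fused loop makes a single pass. This is a conceptual illustration in plain Python, not RunMat's fusion engine. The helper names (`PassCounter`, `eager`, `fused`) are hypothetical, and treating every primitive as its own pass is a simplifying assumption about how an eager array library executes the expression.

```python
import math


class PassCounter:
    """Counts full passes over the data made by the eager (unfused) version."""

    def __init__(self):
        self.passes = 0

    def map1(self, f, a):
        self.passes += 1
        return [f(v) for v in a]  # materializes a new temporary list

    def map2(self, f, a, b):
        self.passes += 1
        return [f(u, v) for u, v in zip(a, b)]


def eager(x, pc):
    """Assumed model: one pass and one temporary per primitive op, like op-by-op evaluation."""
    neg = pc.map1(lambda v: -v, x)
    scaled = pc.map1(lambda v: v / 10.0, neg)
    decay = pc.map1(math.exp, scaled)
    s = pc.map1(math.sin, x)
    y0 = pc.map2(lambda a, b: a * b, s, decay)
    quarter = pc.map1(lambda v: v / 4.0, x)
    c = pc.map1(math.cos, quarter)
    mod = pc.map2(lambda a, b: a * b, y0, c)
    sq = pc.map1(lambda v: v ** 2, y0)
    sq_scaled = pc.map1(lambda v: 0.25 * v, sq)
    y1 = pc.map2(lambda a, b: a + b, mod, sq_scaled)
    th = pc.map1(math.tanh, y1)
    lin = pc.map1(lambda v: 0.1 * v, y1)
    return pc.map2(lambda a, b: a + b, th, lin)


def fused(x):
    """A single pass: every intermediate lives in a local variable, never in an array."""
    out = []
    for v in x:
        y0 = math.sin(v) * math.exp(-v / 10.0)
        y1 = y0 * math.cos(v / 4.0) + 0.25 * y0 ** 2
        out.append(math.tanh(y1) + 0.1 * y1)
    return out


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(1000)]
    pc = PassCounter()
    assert eager(xs, pc) == fused(xs)  # same operations in the same order, so identical floats
    print("eager passes:", pc.passes, "fused passes: 1")
```

Under that assumption, the eager version makes 14 passes and creates 13 intermediate arrays. The fused loop reads `x` once, writes `y2` once, and produces identical results. On a GPU, each eliminated pass is a round trip through device memory that no longer happens, which is the kind of saving fusion targets.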


Reproduce the benchmarks

See the benchmarks directory in the RunMat repo on GitHub for the full source code and instructions to reproduce the benchmarks: runmat-org/runmat/benchmarks.
