# RunMat Accelerate API

## Purpose

`runmat-accelerate-api` defines the minimal, dependency-light interface between the language runtime and GPU acceleration providers. It exists so that:
- The runtime (`runmat-runtime`) can expose user-facing builtins like `gpuArray` and `gather` without taking a hard dependency on any particular backend implementation.
- Acceleration crates (e.g., `runmat-accelerate`) can implement providers and register themselves at process startup, enabling device-resident execution when available and falling back to CPU when not.
This crate intentionally avoids depending on other workspace crates to prevent dependency cycles.
## What it contains
- `GpuTensorHandle`: a small, opaque handle describing a device-resident tensor (shape, device id, and a backend-managed buffer id). The actual device memory is fully managed by the backend.
- `AccelProvider`: the provider trait backends implement to upload/download/free GPU buffers and to report device info.
- Global provider registry: `register_provider` and `provider()` allow a single global provider to be installed by the host (e.g., `runmat-accelerate` or an application) and used by the runtime.
- Host tensor types: `HostTensorView` and `HostTensorOwned` provide simple host-side tensor representations (data + shape) without introducing a dependency on `runmat-builtins`; both are sketched below.
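For illustration, a minimal sketch of constructing both host tensor types (the buffer contents and shapes here are arbitrary; the API itself carries only data plus shape):

```rust
use runmat_accelerate_api::{HostTensorOwned, HostTensorView};

fn main() {
    // Borrow an existing host buffer as a 2x3 tensor; the view copies nothing.
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let shape = [2usize, 3];
    let view = HostTensorView { data: &data, shape: &shape };

    // An owned tensor, e.g., what a provider download returns.
    let owned = HostTensorOwned { data: data.clone(), shape: shape.to_vec() };
    assert_eq!(view.data.len(), owned.shape.iter().product::<usize>());
}
```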
## How it fits with the rest of the system
`runmat-runtime` implements the MATLAB-facing builtins `gpuArray(x)` and `gather(x)`. These builtins call `runmat_accelerate_api::provider()` and, if a provider is registered, forward to it; otherwise they return CPU fallbacks, keeping semantics predictable even when no GPU is present.

`runmat-accelerate` implements one or more concrete providers and calls `register_provider(...)` during initialization (e.g., from REPL/CLI startup or a host application), so the runtime automatically benefits from GPU acceleration where appropriate.
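The dispatch inside such a builtin follows the pattern sketched below; the function name and the `Option` return are illustrative, not the actual `runmat-runtime` code:

```rust
use runmat_accelerate_api::{provider, GpuTensorHandle, HostTensorView};

// Illustrative gpuArray-style dispatch: upload through the registered
// provider if one exists; otherwise tell the caller to keep the CPU data.
fn gpu_array_like(data: &[f64], shape: &[usize]) -> anyhow::Result<Option<GpuTensorHandle>> {
    match provider() {
        Some(p) => {
            let view = HostTensorView { data, shape };
            Ok(Some(p.upload(&view)?)) // device-resident from here on
        }
        None => Ok(None), // CPU fallback: caller retains the host tensor
    }
}
```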
## Autograd & optimization (planned)
This API is intentionally minimal; reverse-mode autograd and higher-level optimizations live above this layer, in `runmat-accelerate` and the runtime/JIT. The expected integration is:
- The runtime will provide a tape/graph for Tensor/Matrix ops (on CPU or device). Gradients are computed by composing known primitive gradients, so providers can remain unaware of autograd (sketched after this list).
- When a provider offers fused kernels, the planner/JIT can map differentiated op chains to fewer device launches (elementwise fusion, simple BLAS fusions), reducing temporaries and transfers by default.
- No changes to this API are required for basic autograd; future extensions (optional) may add hooks for fused gradient kernels and streams/queues.
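As a purely hypothetical illustration of that layering (none of these types exist in this crate or elsewhere in the workspace today), a scalar tape above the provider layer could record forward primitives and replay their known gradients in reverse, with no provider involvement:

```rust
// Hypothetical sketch only: a tiny scalar tape. In the planned design the
// runtime owns a structure like this for Tensor/Matrix ops; providers only
// ever execute forward primitives and never see adjoints.
#[derive(Clone, Copy)]
enum Prim { Add(usize, usize), Mul(usize, usize) }

struct Tape { vals: Vec<f64>, ops: Vec<(Prim, usize)> } // (op, output index)

impl Tape {
    fn push(&mut self, op: Prim) -> usize {
        let v = match op {
            Prim::Add(a, b) => self.vals[a] + self.vals[b],
            Prim::Mul(a, b) => self.vals[a] * self.vals[b],
        };
        self.vals.push(v);
        self.ops.push((op, self.vals.len() - 1));
        self.vals.len() - 1
    }

    // Reverse pass: compose the known gradients of each primitive.
    fn backward(&self, seed: usize) -> Vec<f64> {
        let mut adj = vec![0.0; self.vals.len()];
        adj[seed] = 1.0;
        for &(op, out) in self.ops.iter().rev() {
            match op {
                Prim::Add(a, b) => { adj[a] += adj[out]; adj[b] += adj[out]; }
                Prim::Mul(a, b) => {
                    adj[a] += adj[out] * self.vals[b];
                    adj[b] += adj[out] * self.vals[a];
                }
            }
        }
        adj
    }
}
```

Pushing `Mul(x, x)` and calling `backward` on its output, for instance, leaves the adjoint `2 * vals[x]` at index `x`.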
## Safety and lifetime
`register_provider` stores a `'static` reference, so the caller must ensure the provider outlives the program (typically a singleton created at startup). The handle (`GpuTensorHandle`) contains only POD metadata (no GC pointers) and is safe to move and copy.
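One common way to obtain such a `'static` reference is to leak a boxed provider once at startup; a minimal sketch (a `static` item or `OnceLock` works equally well):

```rust
use runmat_accelerate_api::{register_provider, AccelProvider};

// Sketch: leak a heap-allocated provider to get a `'static` reference.
// The provider then lives for the rest of the process, which is exactly
// the lifetime contract `register_provider` requires.
fn install(p: Box<dyn AccelProvider>) {
    let leaked: &'static dyn AccelProvider = Box::leak(p);
    unsafe { register_provider(leaked) };
}
```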
## API reference (concise)
```rust
pub struct GpuTensorHandle { pub shape: Vec<usize>, pub device_id: u32, pub buffer_id: u64 }

pub trait AccelProvider {
    fn upload(&self, host: &HostTensorView) -> anyhow::Result<GpuTensorHandle>;
    fn download(&self, h: &GpuTensorHandle) -> anyhow::Result<HostTensorOwned>;
    fn free(&self, h: &GpuTensorHandle) -> anyhow::Result<()>;
    fn device_info(&self) -> String;
}

pub struct HostTensorView<'a> { pub data: &'a [f64], pub shape: &'a [usize] }
pub struct HostTensorOwned { pub data: Vec<f64>, pub shape: Vec<usize> }

pub unsafe fn register_provider(provider: &'static dyn AccelProvider);
pub fn provider() -> Option<&'static dyn AccelProvider>;
```
## Example (provider skeleton)
```rust
use runmat_accelerate_api::{
    register_provider, AccelProvider, GpuTensorHandle, HostTensorOwned, HostTensorView,
};

struct MyProvider;

impl AccelProvider for MyProvider {
    fn upload(&self, host: &HostTensorView) -> anyhow::Result<GpuTensorHandle> {
        // Allocate a device buffer, copy `host.data` into it, and return a
        // handle whose `buffer_id` names that buffer in the backend's tables.
        Ok(GpuTensorHandle { shape: host.shape.to_vec(), device_id: 0, buffer_id: 1 })
    }

    fn download(&self, h: &GpuTensorHandle) -> anyhow::Result<HostTensorOwned> {
        // Copy device memory back to the host and return an owned tensor.
        Ok(HostTensorOwned { data: vec![0.0; h.shape.iter().product()], shape: h.shape.clone() })
    }

    fn free(&self, _h: &GpuTensorHandle) -> anyhow::Result<()> { Ok(()) }

    fn device_info(&self) -> String { "MyDevice 0".to_string() }
}

// At startup (the unit-struct reference is promoted to `'static`):
unsafe { register_provider(&MyProvider); }
```
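With a provider registered, host code can round-trip data through it; a minimal sketch using the skeleton above (its zero-filled `download` means only the shape survives the trip):

```rust
use runmat_accelerate_api::{provider, HostTensorView};

fn round_trip() -> anyhow::Result<()> {
    let p = provider().expect("no acceleration provider registered");
    let data = vec![1.0, 2.0, 3.0, 4.0];
    let shape = [2usize, 2];

    // Upload to the device, gather back to the host, then release the buffer.
    let handle = p.upload(&HostTensorView { data: &data, shape: &shape })?;
    let back = p.download(&handle)?;
    p.free(&handle)?;

    assert_eq!(back.shape, vec![2, 2]);
    println!("device: {}", p.device_info());
    Ok(())
}
```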
## Current state
- Stable trait and handle definitions.
- Used by `runmat-runtime` builtins (`gpuArray`, `gather`).
- Ready for concrete provider implementations in `runmat-accelerate`.
## Roadmap
- Add optional fields (dtype, strides) without breaking existing providers (via feature flags or additive fields).
- Extend provider API for streams/queues, memory pools, unified/pinned memory, and device events.
- Multi-device provider model and selection heuristics.
- Document autograd integration points and optional provider hooks for fused gradient kernels.