GPUXRuntime¶
Main runtime class for ML inference with universal GPU compatibility.
Overview¶
GPUXRuntime is the core class for loading ONNX models and running inference with automatic GPU provider selection.
from gpux import GPUXRuntime
runtime = GPUXRuntime(model_path="model.onnx")
result = runtime.infer({"input": data})
Class: GPUXRuntime¶
Constructor¶
Parameters:
model_path (str | Path, optional): Path to ONNX model file
provider (str, optional): Preferred execution provider (cuda, coreml, rocm, etc.)
**kwargs: Additional runtime configuration:
    memory_limit (str): GPU memory limit (default: "2GB")
    batch_size (int): Batch size (default: 1)
    timeout (int): Timeout in seconds (default: 30)
    enable_profiling (bool): Enable profiling (default: False)
Example:
runtime = GPUXRuntime(
model_path="sentiment.onnx",
provider="cuda",
memory_limit="4GB",
batch_size=8
)
Methods¶
load_model()¶
Load an ONNX model for inference.
Parameters:
model_path (str | Path): Path to ONNX model file
provider (str, optional): Preferred execution provider
Raises:
FileNotFoundError: If the model file doesn't exist
RuntimeError: If the model cannot be loaded
Example:
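A minimal sketch based on the parameters above (the model filename and provider are illustrative):
runtime = GPUXRuntime()
runtime.load_model("sentiment.onnx", provider="cuda")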
infer()¶
Run inference on input data.
Parameters:
inputs (dict): Input data dictionary mapping input names to NumPy arrays or values
Returns:
dict[str, np.ndarray]: Output dictionary mapping output names to NumPy arrays
Raises:
ValueError: If no model is loaded or the inputs are invalid
RuntimeError: If inference fails
Example:
result = runtime.infer({
"input_ids": np.array([[1, 2, 3]]),
"attention_mask": np.array([[1, 1, 1]])
})
print(result["logits"])
benchmark()¶
Run performance benchmark.
Parameters:
inputs (dict): Input data for benchmarking
num_runs (int): Number of benchmark iterations (default: 100)
warmup_runs (int): Number of warmup iterations (default: 10)
Returns:
dict[str, float]: Performance metrics with the following keys:
    mean_time: Mean inference time (ms)
    min_time: Minimum time (ms)
    max_time: Maximum time (ms)
    std_dev: Standard deviation (ms)
    throughput_fps: Throughput in FPS
Example:
metrics = runtime.benchmark(
inputs={"input": np.random.rand(1, 10).astype(np.float32)},
num_runs=1000,
warmup_runs=50
)
print(f"Mean time: {metrics['mean_time']:.2f} ms")
print(f"Throughput: {metrics['throughput_fps']:.1f} FPS")
get_model_info()¶
Get model information.
Returns:
ModelInfo | None: Model information or None if not loaded
Example:
info = runtime.get_model_info()
print(f"Model: {info.name} v{info.version}")
print(f"Size: {info.size_mb:.1f} MB")
get_provider_info()¶
Get selected provider information.
Returns:
dict: Provider information
Example:
provider = runtime.get_provider_info()
print(f"Provider: {provider['name']}")
print(f"Platform: {provider['platform']}")
get_available_providers()¶
Get list of available providers.
Returns:
list[str]: List of available provider names
Example:
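A short sketch using only the return value described above:
providers = runtime.get_available_providers()
print(f"Available providers: {providers}")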
cleanup()¶
Clean up resources.
Example:
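cleanup() takes no arguments; call it once you are done with the runtime:
# Release resources held by the runtime
runtime.cleanup()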
Complete Example¶
import numpy as np
from gpux import GPUXRuntime
# Initialize runtime
runtime = GPUXRuntime(
model_path="sentiment.onnx",
provider="cuda",
memory_limit="2GB"
)
# Get model info
info = runtime.get_model_info()
print(f"Model: {info.name}")
print(f"Inputs: {[inp.name for inp in info.inputs]}")
# Run inference
result = runtime.infer({
"input_ids": np.array([[101, 2054, 2003, ...]]),
"attention_mask": np.array([[1, 1, 1, ...]])
})
print(f"Sentiment: {result['logits']}")
# Benchmark
metrics = runtime.benchmark(
inputs={"input_ids": np.array([[101, 2054]]), "attention_mask": np.array([[1, 1]])},
num_runs=1000
)
print(f"Mean time: {metrics['mean_time']:.2f} ms")
# Cleanup
runtime.cleanup()