GPUXRuntime

Main runtime class for ML inference with universal GPU compatibility.


Overview

GPUXRuntime is the core class for loading ONNX models and running inference with automatic GPU provider selection.

from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx")
result = runtime.infer({"input": data})

Class: GPUXRuntime

Constructor

GPUXRuntime(
    model_path: str | Path | None = None,
    provider: str | None = None,
    **kwargs
)

Parameters:

  • model_path (str | Path, optional): Path to the ONNX model file
  • provider (str, optional): Preferred execution provider (cuda, coreml, rocm, etc.)
  • **kwargs: Additional runtime configuration:
      • memory_limit (str): GPU memory limit (default: "2GB")
      • batch_size (int): Batch size (default: 1)
      • timeout (int): Timeout in seconds (default: 30)
      • enable_profiling (bool): Enable profiling (default: False)

Example:

runtime = GPUXRuntime(
    model_path="sentiment.onnx",
    provider="cuda",
    memory_limit="4GB",
    batch_size=8
)
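
When provider is omitted, the runtime selects an execution provider automatically (see Overview). A minimal sketch relying entirely on the defaults:

from gpux import GPUXRuntime

# No provider given: selection is left to the runtime
runtime = GPUXRuntime(model_path="sentiment.onnx")
print(runtime.get_provider_info())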


Methods

load_model()

Load an ONNX model for inference.

load_model(
    model_path: str | Path,
    provider: str | None = None
) -> None

Parameters:

  • model_path (str | Path): Path to ONNX model file
  • provider (str, optional): Preferred execution provider

Raises:

  • FileNotFoundError: If model file doesn't exist
  • RuntimeError: If model cannot be loaded

Example:

runtime = GPUXRuntime()
runtime.load_model("model.onnx", provider="cuda")
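
load_model() raises the exceptions listed above, so callers may want to guard the call. A minimal sketch (the messages printed here are illustrative only):

from pathlib import Path
from gpux import GPUXRuntime

runtime = GPUXRuntime()
try:
    runtime.load_model(Path("model.onnx"), provider="cuda")
except FileNotFoundError:
    # The model file does not exist at the given path
    print("model.onnx not found")
except RuntimeError as exc:
    # The file exists but could not be loaded by the runtime
    print(f"Failed to load model: {exc}")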

infer()

Run inference on input data.

infer(inputs: dict[str, Any]) -> dict[str, np.ndarray]

Parameters:

  • inputs (dict): Input data dictionary mapping input names to NumPy arrays or values

Returns:

  • dict[str, np.ndarray]: Output dictionary mapping output names to NumPy arrays

Raises:

  • ValueError: If model not loaded or invalid inputs
  • RuntimeError: If inference fails

Example:

result = runtime.infer({
    "input_ids": np.array([[1, 2, 3]]),
    "attention_mask": np.array([[1, 1, 1]])
})
print(result["logits"])
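
A slightly fuller sketch for a text-classification model. The int64 dtype and the "logits" output name are assumptions about the model, not guarantees of the API:

import numpy as np

input_ids = np.array([[101, 2054, 2003]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

try:
    result = runtime.infer({
        "input_ids": input_ids,
        "attention_mask": attention_mask,
    })
except ValueError as exc:
    # Raised when no model is loaded or the inputs do not match the model
    raise SystemExit(f"Invalid inputs: {exc}")

# Pick the highest-scoring class from the raw logits
prediction = int(np.argmax(result["logits"], axis=-1)[0])
print(f"Predicted class: {prediction}")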

benchmark()

Run a performance benchmark.

benchmark(
    inputs: dict[str, Any],
    num_runs: int = 100,
    warmup_runs: int = 10
) -> dict[str, float]

Parameters:

  • inputs (dict): Input data for benchmarking
  • num_runs (int): Number of benchmark iterations (default: 100)
  • warmup_runs (int): Number of warmup iterations (default: 10)

Returns:

  • dict[str, float]: Performance metrics:
      • mean_time: Mean inference time (ms)
      • min_time: Minimum inference time (ms)
      • max_time: Maximum inference time (ms)
      • std_dev: Standard deviation of inference time (ms)
      • throughput_fps: Throughput in FPS

Example:

metrics = runtime.benchmark(
    inputs={"input": np.random.rand(1, 10).astype(np.float32)},
    num_runs=1000,
    warmup_runs=50
)
print(f"Mean time: {metrics['mean_time']:.2f} ms")
print(f"Throughput: {metrics['throughput_fps']:.1f} FPS")

get_model_info()

Get model information.

get_model_info() -> ModelInfo | None

Returns:

  • ModelInfo | None: Model information or None if not loaded

Example:

info = runtime.get_model_info()
print(f"Model: {info.name} v{info.version}")
print(f"Size: {info.size_mb:.1f} MB")

get_provider_info()

Get information about the selected execution provider.

get_provider_info() -> dict[str, Any]

Returns:

  • dict: Provider information

Example:

provider = runtime.get_provider_info()
print(f"Provider: {provider['name']}")
print(f"Platform: {provider['platform']}")

get_available_providers()

Get the list of available execution providers.

get_available_providers() -> list[str]

Returns:

  • list[str]: List of available provider names

Example:

providers = runtime.get_available_providers()
print(f"Available: {', '.join(providers)}")

cleanup()

Clean up resources.

cleanup() -> None

Example:

runtime.cleanup()
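
Since cleanup() releases the runtime's resources, it is typically placed in a finally block so it runs even when inference raises. A minimal sketch (GPUXRuntime is not documented here as a context manager, so try/finally is used):

import numpy as np
from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx")
try:
    result = runtime.infer({"input": np.random.rand(1, 10).astype(np.float32)})
    print(result)
finally:
    # Always release resources, even if inference failed
    runtime.cleanup()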


Complete Example

import numpy as np
from gpux import GPUXRuntime

# Initialize runtime
runtime = GPUXRuntime(
    model_path="sentiment.onnx",
    provider="cuda",
    memory_limit="2GB"
)

# Get model info
info = runtime.get_model_info()
print(f"Model: {info.name}")
print(f"Inputs: {[inp.name for inp in info.inputs]}")

# Run inference
result = runtime.infer({
    "input_ids": np.array([[101, 2054, 2003, ...]]),
    "attention_mask": np.array([[1, 1, 1, ...]])
})
print(f"Sentiment: {result['logits']}")

# Benchmark
metrics = runtime.benchmark(
    inputs={"input_ids": np.array([[101, 2054]]), "attention_mask": np.array([[1, 1]])},
    num_runs=1000
)
print(f"Mean time: {metrics['mean_time']:.2f} ms")

# Cleanup
runtime.cleanup()

See Also