GPU Providers

Understanding execution providers and how GPUX selects the best backend for your hardware.


🎯 What You'll Learn

  • ✅ What execution providers are
  • ✅ Available providers and platforms
  • ✅ Provider selection logic
  • ✅ Platform-specific optimization
  • ✅ Troubleshooting provider issues

🧠 What are Execution Providers?

Execution providers are backends that execute ONNX models on specific hardware:

graph LR
    A[ONNX Model] --> B[ONNX Runtime]
    B --> C{Provider}
    C -->|NVIDIA| D[TensorRT/CUDA]
    C -->|Apple| E[CoreML]
    C -->|AMD| F[ROCm]
    C -->|Intel| G[OpenVINO]
    C -->|Windows| H[DirectML]
    C -->|Fallback| I[CPU]
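
At the ONNX Runtime level, a provider is simply an entry in the providers list passed when a session is created; the model file itself never changes. A minimal sketch, assuming a local model.onnx (placeholder path) and the onnxruntime package:

import onnxruntime as ort

# The same ONNX model runs on any backend; the providers list
# decides which hardware executes it, in order of preference.
session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)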

📋 Available Providers

Priority Order

GPUX selects providers in this order (see the sketch after this list):

  1. TensorrtExecutionProvider - NVIDIA TensorRT (best performance)
  2. CUDAExecutionProvider - NVIDIA CUDA
  3. ROCmExecutionProvider - AMD ROCm
  4. CoreMLExecutionProvider - Apple Silicon
  5. DmlExecutionProvider - Windows DirectML
  6. OpenVINOExecutionProvider - Intel OpenVINO
  7. CPUExecutionProvider - CPU fallback
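
A minimal sketch of this priority logic, using only the standard onnxruntime API (GPUX's actual ProviderManager may differ in detail):

import onnxruntime as ort

# Preference order mirrors the priority list above.
# "DmlExecutionProvider" is ONNX Runtime's registered name for DirectML.
PRIORITY = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "ROCmExecutionProvider",
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]

def select_provider() -> str:
    """Return the highest-priority provider available on this machine."""
    available = set(ort.get_available_providers())
    for provider in PRIORITY:
        if provider in available:
            return provider
    return "CPUExecutionProvider"  # always present as a last resort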

Provider Details

| Provider | Hardware      | OS             | Performance |
|----------|---------------|----------------|-------------|
| TensorRT | NVIDIA GPU    | Linux, Windows | ⭐⭐⭐⭐⭐  |
| CUDA     | NVIDIA GPU    | Linux, Windows | ⭐⭐⭐⭐    |
| ROCm     | AMD GPU       | Linux          | ⭐⭐⭐⭐    |
| CoreML   | Apple Silicon | macOS          | ⭐⭐⭐⭐    |
| DirectML | Any GPU       | Windows        | ⭐⭐⭐      |
| OpenVINO | Intel GPU/CPU | All            | ⭐⭐⭐      |
| CPU      | Any           | All            | ⭐⭐        |

🔍 Provider Selection

Automatic Selection

GPUX automatically selects the best provider:

runtime:
  gpu:
    backend: auto  # Automatic selection

Manual Selection

Force a specific provider:

# Build with specific provider
gpux build . --provider cuda

# Or in gpux.yml
runtime:
  gpu:
    backend: cuda

Check Selected Provider

gpux inspect model-name
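
To confirm at the ONNX Runtime level which providers a session actually initialized, session.get_providers() can be used; this is standard onnxruntime API, shown here with a placeholder model path:

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# Prints the providers the session actually initialized, in priority order.
print(session.get_providers())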

🖥️ Platform-Specific Guides

NVIDIA GPUs

Requirements:
  • CUDA 11.8+ or 12.x
  • cuDNN 8.x
  • NVIDIA drivers 520+

Install CUDA Runtime:

# Check CUDA version
nvidia-smi

# Install ONNX Runtime GPU
pip install onnxruntime-gpu

Configuration:

runtime:
  gpu:
    backend: cuda  # or tensorrt for best performance
    memory: 4GB

TensorRT Optimization:

# TensorRT can provide a 2-10x speedup
gpux build . --provider tensorrt

Apple Silicon (M1/M2/M3)

Requirements:
  • macOS 12.0+
  • Apple Silicon Mac

CoreML support is built into macOS, so no separate runtime install is needed:

runtime:
  gpu:
    backend: coreml
    memory: 2GB

Performance:
  • ✅ Excellent for small and medium models
  • ✅ Low power consumption
  • ✅ Unified memory architecture

AMD GPUs

Requirements:
  • ROCm 5.4+
  • Supported AMD GPU

Install ROCm Runtime:

# Install ONNX Runtime with ROCm
pip install onnxruntime-rocm

# Verify ROCm
rocm-smi

Configuration:

runtime:
  gpu:
    backend: rocm
    memory: 4GB

Intel GPUs

Requirements:
  • Intel GPU (integrated or Arc)
  • OpenVINO toolkit

Install OpenVINO:

pip install onnxruntime-openvino

Configuration:

runtime:
  gpu:
    backend: openvino
    memory: 2GB

Windows (DirectML)

Requirements:
  • Windows 10/11
  • DirectX 12 compatible GPU

DirectML works with:
  • NVIDIA GPUs
  • AMD GPUs
  • Intel GPUs

Configuration:

runtime:
  gpu:
    backend: directml
    memory: 2GB

⚙️ Provider Configuration

CUDA Configuration

from gpux.core.providers import ProviderManager

manager = ProviderManager()

# These options map to ONNX Runtime's CUDAExecutionProvider settings.
cuda_config = {
    'device_id': 0,                          # which GPU to use
    'cudnn_conv_algo_search': 'EXHAUSTIVE',  # search all cuDNN conv algorithms
    'do_copy_in_default_stream': True        # copy tensors in the default CUDA stream
}

TensorRT Configuration

tensorrt_config = {
    'trt_max_workspace_size': 1 << 30,  # 1GB
    'trt_fp16_enable': True,  # FP16 optimization
    'trt_engine_cache_enable': True  # Cache compiled engines
}

CoreML Configuration

coreml_config = {
    'coreml_flags': 0,  # Default settings
}
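
These dictionaries correspond to ONNX Runtime's per-provider options, which are passed as (name, options) tuples when a session is created. A hedged sketch reusing the cuda_config from above (the model path is a placeholder):

import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        ("CUDAExecutionProvider", cuda_config),  # provider plus its options
        ("CPUExecutionProvider", {}),            # plain CPU fallback
    ],
)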

🔄 Fallback Behavior

If the preferred provider fails to initialize, GPUX falls back through the priority chain until a working provider is found:

TensorRT → CUDA → CPU
(each step is tried only if the previous one failed)

Example:

# Try CUDA; fall back to CPU if it is not available
gpux build . --provider cuda


🐛 Troubleshooting

Provider Not Available

Check available providers:

import onnxruntime as ort
print(ort.get_available_providers())

Install missing provider:

# NVIDIA
pip install onnxruntime-gpu

# AMD
pip install onnxruntime-rocm

# Intel
pip install onnxruntime-openvino

Provider Selection Failed

Error: No execution providers available

Solution:
  1. Verify GPU drivers are installed
  2. Check your ONNX Runtime version
  3. Try the CPU fallback:

gpux build . --provider cpu

Performance Issues

Compare providers:

# Benchmark each provider
gpux build . --provider cuda
gpux run model --benchmark --runs 1000

gpux build . --provider cpu
gpux run model --benchmark --runs 1000
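
The same comparison can be sketched directly in Python for a quick sanity check outside the CLI; the model path and input handling here are illustrative assumptions (a single float32 input):

import time
import numpy as np
import onnxruntime as ort

def bench(providers: list[str], runs: int = 1000) -> float:
    """Average inference time in milliseconds for a provider list (sketch)."""
    session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder path
    inp = session.get_inputs()[0]
    # Replace symbolic dimensions (e.g. batch size) with 1.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.random.rand(*shape).astype(np.float32)  # assumes one float32 input
    session.run(None, {inp.name: x})  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, {inp.name: x})
    return (time.perf_counter() - start) / runs * 1000.0

print("CUDA:", bench(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print("CPU: ", bench(["CPUExecutionProvider"]))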

📊 Performance Comparison

Example inference times for ResNet-50:

| Provider | Time (ms) | Speedup |
|----------|-----------|---------|
| TensorRT | 2.1       | 47x     |
| CUDA     | 4.5       | 22x     |
| CoreML   | 8.3       | 12x     |
| DirectML | 15.2      | 6.5x    |
| OpenVINO | 18.7      | 5.3x    |
| CPU      | 98.5      | 1x      |

Results vary by hardware and model.


💡 Key Takeaways

What You Learned

  • ✅ Execution providers explained
  • ✅ Provider priority and selection
  • ✅ Platform-specific setup
  • ✅ Configuration options
  • ✅ Troubleshooting provider issues
  • ✅ Performance comparison


Previous: Models | Next: Inputs & Outputs →