# Platform Guides
Platform-specific optimization guides for GPUX.
## Overview

GPUX provides GPU acceleration across all major platforms. Each platform has a dedicated execution provider and its own set of optimizations for best performance.
## Supported Platforms

### 🟢 NVIDIA GPUs

**Execution Providers:** TensorRT, CUDA

- Best performance with TensorRT optimization
- Wide range of GPU support (GeForce, Quadro, Tesla)
- Excellent for production deployments

### 🔴 AMD GPUs

**Execution Provider:** ROCm

- Native AMD GPU acceleration
- Support for Radeon and Instinct series
- Linux-focused deployment

### 🍎 Apple Silicon

**Execution Provider:** CoreML

- Optimized for M1/M2/M3/M4 chips
- Neural Engine acceleration
- Excellent power efficiency

### 🔷 Intel GPUs

**Execution Provider:** OpenVINO

- Support for Intel Iris and Arc GPUs
- CPU + iGPU optimization
- Cross-platform support

### 🪟 Windows DirectML

**Execution Provider:** DirectML

- Universal Windows GPU support
- Works with any DirectX 12 compatible GPU
- Fallback for non-NVIDIA Windows systems

### 💻 CPU-Only

**Execution Provider:** CPU

- Universal fallback
- No GPU required
- Multi-threaded optimization
## Platform Comparison

| Platform | Provider | Performance | Power Efficiency | Ease of Setup |
|---|---|---|---|---|
| NVIDIA (TensorRT) | TensorrtExecutionProvider | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| NVIDIA (CUDA) | CUDAExecutionProvider | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| AMD (ROCm) | ROCmExecutionProvider | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Apple Silicon | CoreMLExecutionProvider | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Intel (OpenVINO) | OpenVINOExecutionProvider | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Windows (DirectML) | DmlExecutionProvider | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| CPU | CPUExecutionProvider | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Note: the DirectML provider registers in ONNX Runtime under the name `DmlExecutionProvider`.
## Quick Start by Platform

### NVIDIA GPU

```bash
# Install CUDA dependencies
uv add onnxruntime-gpu
```

```yaml
# Configure for CUDA
runtime:
  gpu:
    backend: cuda
    memory: 4GB
```

### Apple Silicon

```bash
# CoreML is built-in with onnxruntime
uv add onnxruntime
```

```yaml
# Configure for CoreML
runtime:
  gpu:
    backend: coreml
    memory: 2GB
```

### AMD GPU

```bash
# Install ROCm dependencies
uv add onnxruntime-rocm
```

```yaml
# Configure for ROCm
runtime:
  gpu:
    backend: rocm
    memory: 4GB
```

### Windows

```bash
# Install DirectML
uv add onnxruntime-directml
```

```yaml
# Configure for DirectML
runtime:
  gpu:
    backend: directml
    memory: 2GB
```
## Platform Detection

GPUX automatically detects your platform and selects the best provider:

```python
from gpux import GPUXRuntime

# Auto-detect best provider
runtime = GPUXRuntime(model_path="model.onnx")

# Check selected provider
info = runtime.get_provider_info()
print(f"Using: {info['name']} on {info['platform']}")
```
## Performance Tips by Platform

### NVIDIA

- Use TensorRT for best performance (up to 4x faster than CUDA)
- Enable FP16 precision for RTX GPUs (see the sketch below)
- Optimize batch size based on GPU memory
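If you need to enable FP16 at the onnxruntime level, the TensorRT provider exposes a `trt_fp16_enable` option; a minimal sketch, assuming a TensorRT-enabled onnxruntime-gpu build:

```python
import onnxruntime as ort

# trt_fp16_enable is a TensorRT execution provider option in onnxruntime;
# FP16 roughly halves memory traffic on RTX-class GPUs.
providers = [
    ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),
    ("CUDAExecutionProvider", {}),
    ("CPUExecutionProvider", {}),
]
session = ort.InferenceSession("model.onnx", providers=providers)
```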
### Apple Silicon

- Use CoreML for Neural Engine acceleration
- Models under 1 GB work best on M-series chips
- Enable ANE (Apple Neural Engine) optimizations (see the sketch below)
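At the onnxruntime level, the CoreML provider can be steered toward the Neural Engine. A sketch, with the caveat that the `MLComputeUnits` option name applies to recent onnxruntime releases and may differ in older ones:

```python
import onnxruntime as ort

providers = [
    # Ask CoreML to schedule work on the CPU and Neural Engine;
    # option names vary across onnxruntime versions.
    ("CoreMLExecutionProvider", {"MLComputeUnits": "CPUAndNeuralEngine"}),
    ("CPUExecutionProvider", {}),
]
session = ort.InferenceSession("model.onnx", providers=providers)
```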
### AMD

- Use ROCm with the latest drivers
- Monitor GPU utilization with `rocm-smi` (example below)
- Linux is recommended over Windows
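For example, to watch utilization while a workload runs:

```bash
# Refresh GPU busy and memory-use percentages every second
watch -n 1 rocm-smi --showuse --showmemuse
```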
### Intel

- Use OpenVINO for iGPU + CPU optimization
- Works well with INT8 quantized models (see the quantization sketch below)
- Good for edge deployments
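Dynamic INT8 quantization is available through onnxruntime's quantization tools; a minimal sketch (the output filename is arbitrary):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to INT8; activations stay FP32 and are
# quantized on the fly at inference time.
quantize_dynamic(
    "model.onnx",
    "model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```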
### Windows

- DirectML works with any DX12 GPU
- Good fallback for non-NVIDIA systems
- Supports NVIDIA, AMD, and Intel GPUs (provider sketch below)
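One wrinkle when working with onnxruntime directly: the DirectML provider registers under the name `DmlExecutionProvider`. A sketch that also selects a specific adapter:

```python
import onnxruntime as ort

# DirectML uses the default DX12 adapter; device_id selects another GPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=[("DmlExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"],
)
```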
## Environment Setup

### Linux (NVIDIA)

```bash
# Install the CUDA toolkit (requires NVIDIA's CUDA apt repository)
sudo apt-get install cuda-toolkit-12-0

# Install GPUX with GPU support
uv add onnxruntime-gpu
```
### macOS (Apple Silicon)
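CoreML support ships inside the standard onnxruntime wheel, so no extra GPU package is needed:

```bash
# CoreML is built into onnxruntime
uv add onnxruntime
```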
### Linux (AMD)
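Install the ROCm driver stack through your distribution's packages (see AMD's install guide for your distro), then add the ROCm build:

```bash
# Install GPUX with ROCm support
uv add onnxruntime-rocm
```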
### Windows

```bash
# DirectML runtime is usually included with Windows 10/11
# Install GPUX with DirectML support
uv add onnxruntime-directml
```
## Troubleshooting

### Provider Not Available

Check which providers your onnxruntime build exposes:
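```python
import onnxruntime as ort

# Lists every execution provider compiled into this onnxruntime build
print(ort.get_available_providers())
```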
### GPU Not Detected

Verify the GPU is visible to the driver (the tool depends on your platform):
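```bash
# NVIDIA
nvidia-smi

# AMD
rocm-smi

# Apple Silicon
system_profiler SPDisplaysDataType
```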
### Performance Issues

- Check GPU utilization
- Verify the correct provider is selected (profiling sketch below)
- Optimize batch size
- Consider model quantization
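To confirm which provider each operator actually ran on, onnxruntime's built-in profiler writes a JSON trace; a minimal sketch (the dummy feed assumes a single float32 input):

```python
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.enable_profiling = True  # write a JSON trace of every operator

session = ort.InferenceSession("model.onnx", sess_options=so)

# Build a dummy feed; name and shape depend on your model,
# and dynamic dimensions are pinned to 1 here.
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})

print(session.end_profiling())  # path to the trace file
```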