Runtime Configuration¶
Runtime settings for GPU, timeout, and batch processing.
Overview¶
The runtime section controls execution settings. All keys are optional:
```yaml
runtime:
  gpu:
    memory: string        # GPU memory limit (default: "2GB")
    backend: string       # GPU backend (default: "auto")
  timeout: int            # Timeout in seconds (default: 30)
  batch_size: int         # Batch size (default: 1)
  enable_profiling: bool  # Enable profiling (default: false)
```
GPU Configuration¶
gpu.memory¶
GPU memory limit.
- Type: string
- Required: No
- Default: 2GB
- Format: number + unit (GB, MB, KB)
```yaml
runtime:
  gpu:
    memory: 2GB       # 2 gigabytes
    # memory: 512MB   # 512 megabytes
    # memory: 1024KB  # 1024 kilobytes
```
gpu.backend¶
Preferred GPU backend.
- Type: string
- Required: No
- Default: auto
- Values: auto, cuda, coreml, rocm, directml, openvino, tensorrt
```yaml
runtime:
  gpu:
    backend: auto      # auto-detect the best provider
    # backend: cuda    # force CUDA
    # backend: coreml  # force CoreML (Apple Silicon)
```
Execution Settings¶
timeout¶
Inference timeout in seconds.
- Type: integer
- Required: No
- Default: 30
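For example, allowing up to a minute per inference (the value is illustrative):

```yaml
runtime:
  timeout: 60  # abort any inference that runs longer than 60 seconds
```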
batch_size¶
Default batch size for inference.
- Type: integer
- Required: No
- Default: 1
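For example, grouping eight inputs into each inference call:

```yaml
runtime:
  batch_size: 8  # inputs processed per inference call
```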
enable_profiling¶
Enable performance profiling.
- Type: boolean
- Required: No
- Default: false
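For example:

```yaml
runtime:
  enable_profiling: true  # collect performance data for each run
```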
Examples¶
Minimal¶
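Every key is optional, so a minimal configuration overrides only what it needs; here, just the memory limit:

```yaml
runtime:
  gpu:
    memory: 2GB  # all other keys keep their defaults
```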
CUDA Configuration¶
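A sketch of a CUDA setup; the memory, timeout, and batch values are illustrative and should be sized to your model:

```yaml
runtime:
  gpu:
    backend: cuda  # force the CUDA provider
    memory: 4GB
  timeout: 60
  batch_size: 8
```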
CoreML (Apple Silicon)¶
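A sketch for Apple Silicon using the CoreML provider:

```yaml
runtime:
  gpu:
    backend: coreml  # CoreML on Apple Silicon
    memory: 2GB
```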
CPU-Only¶
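All documented backend values are GPU providers, so this sketch assumes the runtime falls back to the CPU when the gpu block is omitted:

```yaml
runtime:
  timeout: 60    # CPU inference is slower; allow extra time
  batch_size: 1
```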
Platform-Specific Examples¶
NVIDIA GPU¶
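On NVIDIA hardware, cuda is the usual choice; tensorrt is also listed among the supported backends:

```yaml
runtime:
  gpu:
    backend: cuda  # or tensorrt, where available
    memory: 4GB    # illustrative
```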
AMD GPU¶
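AMD GPUs use the ROCm backend:

```yaml
runtime:
  gpu:
    backend: rocm
    memory: 4GB  # illustrative
```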
Apple Silicon¶
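Apple Silicon machines use the CoreML backend:

```yaml
runtime:
  gpu:
    backend: coreml
```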
Windows DirectML¶
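On Windows, DirectML works across GPU vendors:

```yaml
runtime:
  gpu:
    backend: directml
    memory: 2GB
```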
Best Practices¶
Set Appropriate Memory Limits
Set GPU memory based on model size, as in the example after this list:
- Small models (<100MB): 1GB
- Medium models (100-500MB): 2GB
- Large models (>500MB): 4GB+
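Following these guidelines, a medium model (100-500MB) might use:

```yaml
runtime:
  gpu:
    memory: 2GB  # medium model: 100-500MB
```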
Specify Backend in Production
Auto-detection is convenient during development, but pin an explicit backend in production so the provider is reproducible across hosts:
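```yaml
runtime:
  gpu:
    backend: cuda  # pinned; auto may resolve differently on each host
```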
Adjust Timeout for Large Models
Large models take longer per inference, so raise the timeout above the 30-second default:
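```yaml
runtime:
  timeout: 120  # illustrative; the default is 30 seconds
```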