Runtime Configuration

Runtime settings for GPU, timeout, and batch processing.


Overview

The runtime section controls how models are executed: GPU memory and backend selection, inference timeout, batch size, and profiling.

runtime:
  gpu:
    memory: string        # GPU memory limit (default: "2GB")
    backend: string       # GPU backend (default: "auto")
  timeout: int            # Timeout in seconds (default: 30)
  batch_size: int         # Batch size (default: 1)
  enable_profiling: bool  # Enable profiling (default: false)

GPU Configuration

gpu.memory

GPU memory limit.

  • Type: string
  • Required: No
  • Default: 2GB
  • Format: Number + Unit (GB, MB, KB)
runtime:
  gpu:
    memory: 2GB      # 2 gigabytes
    # memory: 512MB  # 512 megabytes
    # memory: 1024KB # 1024 kilobytes
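
The string is parsed into a byte count when the configuration is loaded. Below is a minimal sketch of such a parser, assuming binary multiples (1 KB = 1024 bytes); the parse_memory helper is hypothetical, not GPUX's actual code:

import re

# Assumed binary multiples; GPUX may use decimal units instead.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_memory(value: str) -> int:
    """Convert a size string such as "2GB" into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(KB|MB|GB)", value.strip().upper())
    if match is None:
        raise ValueError(f"invalid memory limit: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]

parse_memory("2GB")    # 2147483648
parse_memory("512MB")  # 536870912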

gpu.backend

Preferred GPU backend.

  • Type: string
  • Required: No
  • Default: auto
  • Values: auto, cpu, cuda, coreml, rocm, directml, openvino, tensorrt
runtime:
  gpu:
    backend: auto      # Auto-detect best provider
    # backend: cuda    # Force CUDA
    # backend: coreml  # Force CoreML (Apple Silicon)
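
The backend values mirror ONNX Runtime's execution provider names. As an illustration only (the PROVIDERS table and pick_backend helper below are assumptions, not GPUX's implementation), auto-detection could be built on onnxruntime like this:

import onnxruntime as ort

# Assumed mapping from config values to ONNX Runtime execution providers.
PROVIDERS = {
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "rocm": "ROCMExecutionProvider",
    "directml": "DmlExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
    "tensorrt": "TensorrtExecutionProvider",
}

def pick_backend(backend: str) -> str:
    """Resolve a configured backend to an available provider (hypothetical)."""
    available = ort.get_available_providers()
    if backend == "auto":
        # Preference order here is illustrative, not GPUX's actual order.
        for provider in PROVIDERS.values():
            if provider in available:
                return provider
        return "CPUExecutionProvider"
    return PROVIDERS.get(backend, "CPUExecutionProvider")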

Execution Settings

timeout

Inference timeout in seconds.

  • Type: integer
  • Required: No
  • Default: 30
runtime:
  timeout: 30    # 30-second timeout
  # timeout: 60  # 1-minute timeout
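
A timeout like this is typically enforced around the inference call itself. Here is a sketch using only the standard library; infer_with_timeout and run_inference are hypothetical names:

import concurrent.futures

def infer_with_timeout(run_inference, inputs, timeout: int = 30):
    """Run inference in a worker thread; raise TimeoutError after `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_inference, inputs)
    try:
        return future.result(timeout=timeout)
    finally:
        # Note: the worker thread itself is not killed, only abandoned.
        pool.shutdown(wait=False)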

batch_size

Default batch size for inference.

  • Type: integer
  • Required: No
  • Default: 1
runtime:
  batch_size: 1     # Single inference
  # batch_size: 32  # Batch of 32
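
When more inputs are queued than batch_size, they are processed in chunks. A minimal sketch of that chunking (the batches helper is hypothetical):

def batches(items, batch_size: int = 1):
    """Yield successive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

for batch in batches(list(range(100)), batch_size=32):
    ...  # one inference call per chunk: sizes 32, 32, 32, 4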

enable_profiling

Enable performance profiling.

  • Type: boolean
  • Required: No
  • Default: false
runtime:
  enable_profiling: true
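
If the runtime delegates to ONNX Runtime (an assumption here, suggested by the backend names above), the flag would map onto session-level profiling roughly as follows; "model.onnx" is a placeholder path:

import onnxruntime as ort

options = ort.SessionOptions()
options.enable_profiling = True  # what enable_profiling: true would switch on (assumed)

session = ort.InferenceSession("model.onnx", sess_options=options)
# ... run inferences ...
trace_file = session.end_profiling()  # path to a Chrome-trace JSON report
print(trace_file)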

Examples

Minimal

runtime:
  gpu:
    memory: 2GB

CUDA Configuration

runtime:
  gpu:
    memory: 4GB
    backend: cuda
  timeout: 60
  batch_size: 8

CoreML (Apple Silicon)

runtime:
  gpu:
    memory: 1GB
    backend: coreml
  timeout: 30
  batch_size: 1

CPU-Only

runtime:
  gpu:
    backend: cpu
  timeout: 120
  batch_size: 4

Platform-Specific Examples

NVIDIA GPU

runtime:
  gpu:
    memory: 8GB
    backend: cuda  # or tensorrt for optimization
  batch_size: 16

AMD GPU

runtime:
  gpu:
    memory: 4GB
    backend: rocm
  batch_size: 8

Apple Silicon

runtime:
  gpu:
    memory: 2GB
    backend: coreml
  batch_size: 1

Windows DirectML

runtime:
  gpu:
    memory: 4GB
    backend: directml
  batch_size: 8

Best Practices

Set Appropriate Memory Limits

Set GPU memory based on model size (a sketch applying this rule follows the list):

  • Small models (<100MB): 1GB
  • Medium models (100-500MB): 2GB
  • Large models (>500MB): 4GB+
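
The rule of thumb can be applied mechanically from the model file's size on disk; suggested_gpu_memory is a hypothetical helper:

import os

def suggested_gpu_memory(model_path: str) -> str:
    """Pick a gpu.memory value from the model's file size (rule of thumb above)."""
    size_mb = os.path.getsize(model_path) / (1024 * 1024)
    if size_mb < 100:
        return "1GB"
    if size_mb <= 500:
        return "2GB"
    return "4GB"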

Use Auto Backend in Development

Let GPUX choose the best provider:

runtime:
  gpu:
    backend: auto

Specify Backend in Production

Pin an explicit backend when the deployment hardware is known:

runtime:
  gpu:
    backend: cuda  # Known hardware

Adjust Timeout for Large Models

Large models can need far more than the 30-second default per inference:

runtime:
  timeout: 120  # 2 minutes for LLMs

