
Benchmarking

Measure and optimize model performance with built-in benchmarking tools.


🎯 What You'll Learn

  • ✅ Running performance benchmarks
  • ✅ Understanding metrics
  • ✅ Comparing providers
  • ✅ Optimization strategies

🚀 Quick Benchmark

Run a quick benchmark:

gpux run model-name --benchmark --runs 100

Output:

╭─ Benchmark Results ─────────────────────╮
│ Mean Time     │ 2.45 ms                 │
│ Std Time      │ 0.12 ms                 │
│ Min Time      │ 2.30 ms                 │
│ Max Time      │ 2.85 ms                 │
│ Median Time   │ 2.43 ms                 │
│ P95 Time      │ 2.68 ms                 │
│ P99 Time      │ 2.78 ms                 │
│ Throughput    │ 408.2 fps               │
╰─────────────────────────────────────────╯


📊 Benchmark Options

Number of Runs

# Quick test (10 runs)
gpux run model --benchmark --runs 10

# Standard (100 runs)
gpux run model --benchmark --runs 100

# Thorough (1000 runs)
gpux run model --benchmark --runs 1000

Warmup Runs

Warmup runs let the runtime allocate memory and initialize the execution provider before timing starts, so the slow first iterations don't skew the results:

gpux run model --benchmark --runs 1000 --warmup 50
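
To see what warmup buys you, compare a benchmark without warmup against one with it. A minimal sketch using the Python API shown later in this guide, assuming the same model.onnx path and input layout as that example:

from gpux import GPUXRuntime
import numpy as np

runtime = GPUXRuntime(model_path="model.onnx")
test_data = {"input": np.random.rand(1, 10).astype(np.float32)}

# Without warmup, the measured runs include provider initialization
# and kernel compilation overhead
cold = runtime.benchmark(input_data=test_data, num_runs=100, warmup_runs=0)

# Warmup iterations absorb that one-time cost before timing starts
warm = runtime.benchmark(input_data=test_data, num_runs=100, warmup_runs=50)

print(f"No warmup:   {cold['mean_time_ms']:.2f} ms mean")
print(f"With warmup: {warm['mean_time_ms']:.2f} ms mean")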

📈 Understanding Metrics

Metric        Description
Mean Time     Average inference time
Std Time      Standard deviation (consistency)
Min Time      Fastest inference
Max Time      Slowest inference
Median Time   Middle value (50th percentile)
P95 Time      95th percentile (tail latency)
P99 Time      99th percentile (worst case)
Throughput    Inferences per second (fps)
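
These statistics are standard summaries of the per-run latencies. As an illustration of how they relate to each other, here is a short NumPy sketch over a made-up list of timings:

import numpy as np

# Illustrative per-run latencies in milliseconds
latencies_ms = np.array([2.41, 2.39, 2.52, 2.44, 2.47, 2.38, 2.60, 2.43])

print(f"Mean:       {latencies_ms.mean():.2f} ms")
print(f"Std:        {latencies_ms.std():.2f} ms")
print(f"Min / Max:  {latencies_ms.min():.2f} / {latencies_ms.max():.2f} ms")
print(f"Median:     {np.percentile(latencies_ms, 50):.2f} ms")
print(f"P95:        {np.percentile(latencies_ms, 95):.2f} ms")
print(f"P99:        {np.percentile(latencies_ms, 99):.2f} ms")
print(f"Throughput: {1000.0 / latencies_ms.mean():.1f} fps")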

🔄 Comparing Providers

Test different providers:

# Benchmark with auto-selected provider
gpux run model --benchmark --runs 1000

# Benchmark with CUDA
gpux build . --provider cuda
gpux run model --benchmark --runs 1000

# Benchmark with CPU
gpux build . --provider cpu
gpux run model --benchmark --runs 1000
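
To automate the comparison, you can wrap the same commands in a small script. A minimal Python sketch, assuming the commands above plus the --output flag described under Saving Results below; the bench_<provider>.json filenames are illustrative:

import json
import subprocess

for provider in ("cuda", "cpu"):
    # Rebuild for the target provider, then benchmark and save the results
    subprocess.run(["gpux", "build", ".", "--provider", provider], check=True)
    subprocess.run(
        ["gpux", "run", "model", "--benchmark", "--runs", "1000",
         "--output", f"bench_{provider}.json"],
        check=True,
    )
    with open(f"bench_{provider}.json") as f:
        results = json.load(f)
    print(f"{provider}: {results['mean_time_ms']:.2f} ms mean, "
          f"{results['throughput_fps']:.1f} fps")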

🐍 Python API Benchmarking

from gpux import GPUXRuntime
import numpy as np

runtime = GPUXRuntime(model_path="model.onnx")

# Prepare test data
test_data = {"input": np.random.rand(1, 10).astype(np.float32)}

# Run benchmark
metrics = runtime.benchmark(
    input_data=test_data,
    num_runs=1000,
    warmup_runs=100
)

print(f"Mean time: {metrics['mean_time_ms']:.2f} ms")
print(f"Throughput: {metrics['throughput_fps']:.1f} fps")

📊 Saving Results

Save benchmark results to a file:

gpux run model --benchmark --runs 1000 --output benchmark.json

Results in JSON:

{
  "mean_time_ms": 2.45,
  "std_time_ms": 0.12,
  "min_time_ms": 2.30,
  "max_time_ms": 2.85,
  "median_time_ms": 2.43,
  "p95_time_ms": 2.68,
  "p99_time_ms": 2.78,
  "throughput_fps": 408.2
}
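
A saved result can also feed a simple regression check, for example in CI. A minimal sketch, assuming the benchmark.json file from the command above; the 5 ms budget is purely illustrative:

import json

# Load the saved benchmark and enforce a latency budget on tail latency
with open("benchmark.json") as f:
    results = json.load(f)

assert results["p95_time_ms"] < 5.0, f"p95 regression: {results['p95_time_ms']:.2f} ms"
print(f"OK: p95 {results['p95_time_ms']:.2f} ms, {results['throughput_fps']:.1f} fps")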


🚀 Optimization Tips

1. Use GPU Acceleration

runtime:
  gpu:
    backend: auto  # Let GPUX choose the best provider

2. Optimize Batch Size

runtime:
  batch_size: 8  # Process 8 samples at once

Benchmark different batch sizes to find the optimal value.
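
One way to run that sweep is with the Python API's benchmark() method. A minimal sketch, assuming the model accepts a dynamic batch dimension and the (batch, 10) input layout from the Python example above:

from gpux import GPUXRuntime
import numpy as np

runtime = GPUXRuntime(model_path="model.onnx")

for batch_size in (1, 2, 4, 8, 16):
    test_data = {"input": np.random.rand(batch_size, 10).astype(np.float32)}
    metrics = runtime.benchmark(input_data=test_data, num_runs=200, warmup_runs=20)
    # Compare samples per second, not runs per second:
    # larger batches do more work per run
    samples_per_sec = batch_size * 1000.0 / metrics["mean_time_ms"]
    print(f"batch {batch_size:>2}: {metrics['mean_time_ms']:.2f} ms/run, "
          f"{samples_per_sec:.1f} samples/s")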

3. Enable TensorRT (NVIDIA)

For NVIDIA GPUs, TensorRT typically provides the best performance:

gpux build . --provider tensorrt

4. Profile Your Model

Enable profiling to identify bottlenecks:

runtime:
  enable_profiling: true

💡 Key Takeaways

What You Learned

  • ✅ How to run benchmarks
  • ✅ Understanding performance metrics
  • ✅ Comparing different providers
  • ✅ Optimization strategies
  • ✅ Saving and analyzing results

