Batch Inference

Process multiple inputs efficiently with batching.


🎯 Overview

Learn batch processing for higher throughput.


📦 Batch Processing

Input Format

```json
[
  {"text": "First input"},
  {"text": "Second input"},
  {"text": "Third input"}
]
```

Run Batch

```bash
gpux run model --file batch_input.json
```
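If your inputs come from somewhere else (a CSV file, a database, user logs), a short script can write them into this format. A minimal sketch; the sample texts and the `batch_input.json` filename simply follow the examples above:

```python
import json

# Write inputs in the batch format expected by `gpux run --file`.
samples = ["First input", "Second input", "Third input"]

with open("batch_input.json", "w") as f:
    json.dump([{"text": s} for s in samples], f, indent=2)
```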

⚡ Performance

Batch Size Optimization

```yaml
runtime:
  batch_size: 8  # Process 8 inputs at once
```

Guidelines:

- Start with `batch_size: 1`
- Increase it until you hit the memory limit
- Benchmark to find the optimal value (see the sketch below)
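One way to benchmark is to time a few candidate sizes with the Python API shown later on this page (`GPUXRuntime` and `batch_infer`). A minimal sketch, reusing the example input from that section; warm-up runs and averaging over repeated runs are left out for brevity:

```python
import time

import numpy as np
from gpux import GPUXRuntime

runtime = GPUXRuntime("model.onnx")
sample = {"input": np.array([[1, 2, 3]])}

# Time each candidate batch size and report measured throughput.
for batch_size in (1, 2, 4, 8, 16, 32):
    batch = [sample] * batch_size
    start = time.perf_counter()
    runtime.batch_infer(batch)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {batch_size / elapsed:.0f} samples/sec")
```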

Throughput Comparison

| Batch Size | Time per batch (ms) | Throughput (samples/sec) |
|---|---|---|
| 1 | 10 | 100 |
| 8 | 35 | 229 |
| 32 | 120 | 267 |
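Throughput is simply batch size divided by per-batch latency: for example, 8 samples / 35 ms ≈ 229 samples/sec. Larger batches amortize per-inference overhead, but the gains taper off as batch size grows.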

🐍 Python API

```python
import numpy as np

from gpux import GPUXRuntime

runtime = GPUXRuntime("model.onnx")

# Batch inference: a list of input dicts, one per sample
batch = [
    {"input": np.array([[1, 2, 3]])},
    {"input": np.array([[4, 5, 6]])},
]

results = runtime.batch_infer(batch)
```
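`batch_infer` returns one result per input, in order. A quick way to inspect them; the exact structure of each result depends on your model's outputs, so treat this as a sketch:

```python
# Illustrative: each result is assumed to map output names to arrays.
for i, result in enumerate(results):
    print(f"sample {i}: {result}")
```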

💡 Key Takeaways

Success

✅ Batch input format
✅ Batch size optimization
✅ Performance gains
✅ Python API usage
