# API Reference

Complete API reference for GPUX.
## Command-Line Interface

Documentation for all CLI commands:

- `gpux build` - Build and optimize models
- `gpux run` - Run inference on models
- `gpux serve` - Start HTTP server
- `gpux inspect` - Inspect models and runtime
## Configuration Reference

Complete `gpux.yml` configuration reference:

- Schema Overview - Complete schema reference
- Model - Model source and format
- Inputs - Input specifications
- Outputs - Output specifications
- Runtime - GPU and runtime settings
- Serving - HTTP server configuration
- Preprocessing - Data preprocessing
## Python API

Python API reference for programmatic usage:

- `GPUXRuntime` - Main runtime class
- `ProviderManager` - Execution provider management
- `ModelInspector` - Model introspection
- Configuration - Configuration parsing
## HTTP API

REST API reference for serving:

- Endpoints Overview - All endpoints
- `POST /predict` - Run inference (see the example below)
- `GET /health` - Health check
- `GET /info` - Model information
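Once a server is running (for example via `gpux serve model-name --port 8080`, as in Common Commands below), the endpoints can be exercised with `curl`. This is a minimal sketch: the JSON payload for `/predict` depends on the input names and shapes declared in `gpux.yml`, so the body shown here is illustrative only.

```bash
# Check that the server is up
curl http://localhost:8080/health

# Fetch model information
curl http://localhost:8080/info

# Run inference (the payload key is assumed to match the model's
# declared input name; adjust for your model)
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"input": [[1.0, 2.0, 3.0]]}'
```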
## Quick Reference

### Common Commands
```bash
# Build model
gpux build .

# Run inference
gpux run model-name --input '{"data": [1,2,3]}'

# Start server
gpux serve model-name --port 8080

# Inspect model
gpux inspect model-name
```
### Configuration Template
```yaml
name: model-name
version: 1.0.0

model:
  source: ./model.onnx

inputs:
  - name: input
    type: float32
    shape: [1, 10]

outputs:
  - name: output
    type: float32
    shape: [1, 2]

runtime:
  gpu:
    backend: auto
    memory: 2GB
```
### Python API Example
```python
from gpux import GPUXRuntime
import numpy as np

# Example input matching the template above (float32, shape [1, 10])
data = np.random.rand(1, 10).astype(np.float32)

runtime = GPUXRuntime(model_path="model.onnx")
result = runtime.infer({"input": data})
print(result["output"])
```