# API Reference

Complete API reference for GPUX.
## Command-Line Interface

Documentation for all CLI commands:

- `gpux build` - Build and optimize models
- `gpux run` - Run inference on models
- `gpux serve` - Start HTTP server
- `gpux inspect` - Inspect models and runtime
## Configuration Reference

Complete `gpux.yml` configuration reference:

- Schema Overview - Complete schema reference
- Model - Model source and format
- Inputs - Input specifications
- Outputs - Output specifications
- Runtime - GPU and runtime settings
- Serving - HTTP server configuration
- Preprocessing - Data preprocessing
## Python API

Python API reference for programmatic usage:

- `GPUXRuntime` - Main runtime class
- `ProviderManager` - Execution provider management
- `ModelInspector` - Model introspection
- Configuration - Configuration parsing
## HTTP API

REST API reference for serving:

- Endpoints Overview - All endpoints
- `POST /predict` - Run inference (see the example below)
- `GET /health` - Health check
- `GET /info` - Model information
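Once a server is running (for example via `gpux serve model-name --port 8080`, as in Common Commands below), the endpoints can be exercised with `curl`. This is a minimal sketch: the JSON payload for `/predict` depends on the input names and shapes declared in `gpux.yml`, so the body shown here is illustrative only.

```bash
# Check that the server is up
curl http://localhost:8080/health

# Fetch model information
curl http://localhost:8080/info

# Run inference (the payload key is assumed to match the model's
# declared input name; adjust for your model)
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"input": [[1.0, 2.0, 3.0]]}'
```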
## Quick Reference

### Common Commands
```bash
# Build model
gpux build .

# Run inference
gpux run model-name --input '{"data": [1,2,3]}'

# Start server
gpux serve model-name --port 8080

# Inspect model
gpux inspect model-name
```
### Configuration Template
```yaml
name: model-name
version: 1.0.0

model:
  source: ./model.onnx

inputs:
  - name: input
    type: float32
    shape: [1, 10]

outputs:
  - name: output
    type: float32
    shape: [1, 2]

runtime:
  gpu:
    backend: auto
    memory: 2GB
```
### Python API Example
```python
from gpux import GPUXRuntime
import numpy as np

# Example input matching the template above (float32, shape [1, 10])
data = np.random.rand(1, 10).astype(np.float32)

runtime = GPUXRuntime(model_path="model.onnx")
result = runtime.infer({"input": data})
print(result["output"])
```