⚙️ Advanced Configuration¶
Master the gpux.yml configuration file and customize your models for optimal performance.
Advanced Feature
This guide covers manual configuration using gpux.yml files. Most users don't need this - you can use gpux pull to download models from Hugging Face without any configuration. This section is for:
- Users with custom ONNX models
- Advanced configuration needs
- Fine-tuning runtime settings
New to GPUX? Start with First Steps and Pulling Models instead.
🎯 What You'll Learn¶
- ✅ Complete `gpux.yml` structure
- ✅ Model configuration options
- ✅ Input and output specifications
- ✅ Runtime settings and GPU configuration
- ✅ Serving configuration
- ✅ Best practices and common patterns
📝 Configuration File Structure¶
The gpux.yml file is the heart of your GPUX project. It defines everything about your model:
```yaml
name: model-name               # Required: Model identifier
version: 1.0.0                 # Required: Semantic version
description: "Description"     # Optional: Model description

model:                         # Required: Model configuration
  source: ./model.onnx
  format: onnx

inputs:                        # Required: Input specifications
  input_name:
    type: float32
    shape: [1, 10]
    required: true

outputs:                       # Required: Output specifications
  output_name:
    type: float32
    shape: [1, 2]

runtime:                       # Optional: Runtime settings
  gpu:
    memory: 2GB
    backend: auto
  batch_size: 1
  timeout: 30

serving:                       # Optional: HTTP server config
  port: 8080
  host: 0.0.0.0
  batch_size: 1
  timeout: 5

preprocessing:                 # Optional: Preprocessing config
  tokenizer: bert-base-uncased
  max_length: 512
```
📦 Model Configuration¶
The model section defines your ONNX model file:
Basic Example¶
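A minimal `model` section only needs the file path and format, matching the structure overview above:

```yaml
model:
  source: ./model.onnx   # Path to the ONNX file, relative to gpux.yml
  format: onnx           # Model format
```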
Path Options¶
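The `source` field takes a path to the model file. This guide uses relative paths resolved against the directory containing `gpux.yml`; the commented variants below are illustrative assumptions, so verify them against your GPUX version:

```yaml
model:
  source: ./model.onnx               # File next to gpux.yml
  # source: ./models/model.onnx      # File in a subdirectory (assumed to work the same way)
  # source: /opt/models/model.onnx   # Absolute path (assumption; confirm before relying on it)
```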
Model Version¶
Optionally specify the model version separately from the project version:
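A sketch of what this might look like; the nested `version` field name is an assumption, so check the schema reference if your build rejects it:

```yaml
model:
  source: ./model.onnx
  format: onnx
  version: 1.2.0   # Assumed field: model version, tracked separately from the top-level project version
```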
📥 Input Configuration¶
Define your model's input specifications:
Single Input¶
```yaml
inputs:
  text:
    type: string
    required: true
    max_length: 512
    description: "Input text for sentiment analysis"
```
Multiple Inputs¶
```yaml
inputs:
  image:
    type: float32
    shape: [1, 3, 224, 224]
    required: true
    description: "RGB image tensor"
  mask:
    type: float32
    shape: [1, 1, 224, 224]
    required: false
    description: "Optional attention mask"
```
Input Types¶
Supported data types:
| Type | Description | Example |
|---|---|---|
| `float32` | 32-bit floating point | `[1.0, 2.5, 3.7]` |
| `float64` | 64-bit floating point | `[1.0, 2.5, 3.7]` |
| `int32` | 32-bit integer | `[1, 2, 3]` |
| `int64` | 64-bit integer | `[1, 2, 3]` |
| `bool` | Boolean | `[true, false]` |
| `string` | String | `"hello world"` |
Shape Specification¶
Define tensor shapes:
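For example, a fixed-size image input (values taken from the examples elsewhere in this guide):

```yaml
inputs:
  image:
    type: float32
    shape: [1, 3, 224, 224]   # batch, channels, height, width
```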
Dynamic Shapes¶
Use -1 or omit dimensions for dynamic shapes:
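For example, a variable batch dimension, using the `-1` convention described above:

```yaml
inputs:
  text_embeddings:
    type: float32
    shape: [-1, 768]   # -1 marks a dynamic (variable-size) dimension
```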
Input Options¶
| Field | Required | Description |
|---|---|---|
| `type` | ✅ Yes | Data type |
| `shape` | No | Tensor shape |
| `required` | No | Whether the input is required (default: `true`) |
| `max_length` | No | Maximum length for string inputs |
| `description` | No | Human-readable description |
📤 Output Configuration¶
Define your model's output specifications:
Single Output¶
```yaml
outputs:
  sentiment:
    type: float32
    shape: [1, 2]
    labels: [negative, positive]
    description: "Sentiment probabilities"
```
Multiple Outputs¶
```yaml
outputs:
  logits:
    type: float32
    shape: [1, 1000]
    description: "Raw model outputs"
  probabilities:
    type: float32
    shape: [1, 1000]
    labels: [class1, class2, ...]  # 1000 classes
    description: "Softmax probabilities"
```
Output Labels¶
Add human-readable labels for classification:
```yaml
outputs:
  emotion:
    type: float32
    shape: [1, 6]
    labels:
      - happy
      - sad
      - angry
      - surprised
      - neutral
      - fearful
    description: "Emotion classification"
```
Output Options¶
| Field | Required | Description |
|---|---|---|
| `type` | ✅ Yes | Data type |
| `shape` | No | Tensor shape |
| `labels` | No | Class labels (for classification) |
| `description` | No | Human-readable description |
⚙️ Runtime Configuration¶
Configure GPU and performance settings:
Complete Example¶
```yaml
runtime:
  gpu:
    memory: 4GB              # GPU memory limit
    backend: auto            # Provider selection
  batch_size: 1              # Default batch size
  timeout: 30                # Timeout in seconds
  enable_profiling: false    # Enable performance profiling
```
GPU Configuration¶
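The `gpu` block nests under `runtime` and controls memory allocation and backend selection:

```yaml
runtime:
  gpu:
    memory: 4GB     # GPU memory limit
    backend: auto   # Execution provider (see table below)
```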
Backend Options¶
| Backend | Description | Use Case |
|---|---|---|
| `auto` | Automatic selection | Default, recommended |
| `cuda` | NVIDIA CUDA | NVIDIA GPUs |
| `coreml` | Apple CoreML | Apple Silicon (M1/M2/M3) |
| `rocm` | AMD ROCm | AMD GPUs |
| `directml` | DirectML | Windows GPUs |
| `openvino` | Intel OpenVINO | Intel GPUs |
| `cpu` | CPU only | No GPU / debugging |
Memory Configuration¶
Specify GPU memory allocation:
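Memory is given as a size string such as `2GB`, `4GB`, or `8GB`, matching the examples in this guide:

```yaml
runtime:
  gpu:
    memory: 2GB   # Upper bound on GPU memory the model may use
```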
Batch Size¶
Set default batch size for inference:
```yaml
runtime:
  batch_size: 1      # Process one sample at a time
  # batch_size: 32   # Or: process 32 samples together
```
Batch Size Optimization
Larger batch sizes improve throughput but require more memory.
Start with batch_size: 1 and increase gradually.
Timeout¶
Set maximum inference time:
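The timeout is expressed in seconds:

```yaml
runtime:
  timeout: 30   # Maximum time (seconds) allowed per inference call
```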
Performance Profiling¶
Enable detailed performance profiling:
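Flip the `enable_profiling` flag shown in the runtime example above:

```yaml
runtime:
  enable_profiling: true   # Turn on detailed timing collection
```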
This generates detailed timing information for debugging performance issues.
🌐 Serving Configuration¶
Configure HTTP server for production deployment:
Basic Example¶
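A minimal serving block only sets the port and bind address:

```yaml
serving:
  port: 8080
  host: 0.0.0.0
```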
Complete Example¶
```yaml
serving:
  port: 8080        # HTTP port
  host: 0.0.0.0     # Bind address (0.0.0.0 = all interfaces)
  batch_size: 1     # Server batch size
  timeout: 5        # Request timeout (seconds)
  max_workers: 4    # Number of worker processes
```
Serving Options¶
| Field | Default | Description |
|---|---|---|
| `port` | `8080` | HTTP server port |
| `host` | `0.0.0.0` | Bind address |
| `batch_size` | `1` | Batch size for requests |
| `timeout` | `5` | Request timeout (seconds) |
| `max_workers` | `4` | Worker processes |
Production Deployment
For production, use a reverse proxy (nginx, Caddy) in front of GPUX.
🔧 Preprocessing Configuration¶
Define preprocessing pipelines (advanced feature):
Text Preprocessing¶
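Based on the structure overview above, a text pipeline declares a tokenizer and a maximum sequence length (keep the status note below in mind before relying on this):

```yaml
preprocessing:
  tokenizer: bert-base-uncased   # Hugging Face tokenizer name
  max_length: 512                # Truncate/pad inputs to this many tokens
```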
Image Preprocessing¶
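There is no image example in the current schema overview, so the field names below (`resize`, `normalize`, `mean`, `std`) are purely hypothetical and only convey the idea:

```yaml
preprocessing:
  resize: [224, 224]            # Hypothetical: target height/width
  normalize: true               # Hypothetical: scale pixels to [0, 1]
  mean: [0.485, 0.456, 0.406]   # Hypothetical: per-channel mean
  std: [0.229, 0.224, 0.225]    # Hypothetical: per-channel std
```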
Custom Preprocessing¶
Preprocessing Status
Preprocessing features are planned but not fully implemented yet. For now, preprocess data before sending to GPUX.
📋 Complete Examples¶
Example 1: Text Classification¶
```yaml
name: sentiment-analysis
version: 1.0.0
description: "BERT-based sentiment analysis"

model:
  source: ./bert-sentiment.onnx
  format: onnx

inputs:
  text:
    type: string
    required: true
    max_length: 512
    description: "Input text to classify"

outputs:
  sentiment:
    type: float32
    shape: [1, 2]
    labels: [negative, positive]
    description: "Sentiment probabilities"

runtime:
  gpu:
    memory: 2GB
    backend: auto
  batch_size: 1
  timeout: 30

serving:
  port: 8080
  host: 0.0.0.0
  timeout: 5
```
Example 2: Image Classification¶
```yaml
name: image-classifier
version: 2.0.0
description: "ResNet-50 ImageNet classifier"

model:
  source: ./resnet50.onnx
  format: onnx

inputs:
  image:
    type: float32
    shape: [1, 3, 224, 224]
    required: true
    description: "RGB image tensor (normalized)"

outputs:
  probabilities:
    type: float32
    shape: [1, 1000]
    description: "ImageNet class probabilities"

runtime:
  gpu:
    memory: 4GB
    backend: auto
  batch_size: 8
  timeout: 10

serving:
  port: 9000
  host: 127.0.0.1
  batch_size: 16
  timeout: 10
  max_workers: 2
```
Example 3: Multi-Input Model¶
```yaml
name: multi-modal-model
version: 1.0.0
description: "Image + text multi-modal model"

model:
  source: ./clip-model.onnx
  format: onnx

inputs:
  image:
    type: float32
    shape: [1, 3, 224, 224]
    required: true
    description: "Image tensor"
  text:
    type: string
    required: true
    max_length: 77
    description: "Text description"

outputs:
  similarity:
    type: float32
    shape: [1, 1]
    description: "Image-text similarity score"

runtime:
  gpu:
    memory: 8GB
    backend: auto
  batch_size: 1
  timeout: 15
```
✅ Validation¶
Validate your configuration file:
```bash
# Build validates configuration
gpux build .

# Or use Python
python -c "from gpux.config.parser import GPUXConfigParser; GPUXConfigParser().parse_file('gpux.yml')"
```
🎓 Best Practices¶
1. Use Descriptive Names¶
❌ Bad:
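An illustrative name that says nothing about what the model does:

```yaml
name: model1
version: 1.0.0
```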
✅ Good:
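A name that describes the task, like the sentiment example earlier in this guide:

```yaml
name: sentiment-analysis
version: 1.0.0
```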
2. Document Inputs/Outputs¶
❌ Bad:
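An illustrative input with a generic name and no description:

```yaml
inputs:
  input1:
    type: float32
    shape: [1, 768]
```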
✅ Good:
```yaml
inputs:
  text_embeddings:
    type: float32
    shape: [1, 768]
    description: "BERT embeddings for input text"
```
3. Start Conservative¶
Start with conservative settings and optimize later:
```yaml
runtime:
  gpu:
    memory: 2GB     # Start small
    backend: auto   # Let GPUX choose
  batch_size: 1     # Start with 1
  timeout: 30       # Generous timeout
```
4. Use Semantic Versioning¶
- Major: Breaking changes
- Minor: New features (backward compatible)
- Patch: Bug fixes
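For example (illustrative version bumps):

```yaml
version: 1.0.0    # Initial release
# version: 1.0.1  # Patch: fixed output labels
# version: 1.1.0  # Minor: added an optional input (backward compatible)
# version: 2.0.0  # Major: renamed an output (breaking change)
```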
5. Environment-Specific Configs¶
Create separate configs for different environments:
```
project/
├── gpux.yml        # Default/development
├── gpux.prod.yml   # Production
└── gpux.test.yml   # Testing
```
Use the file that matches your target environment when building and serving the project (see your GPUX version's CLI help for how to point at a specific configuration file).
🐛 Common Issues¶
Invalid YAML Syntax¶
Error: Invalid YAML in configuration file
Solution: Check indentation and syntax:
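YAML is whitespace-sensitive: nested keys must be indented consistently (two spaces per level, as used throughout this guide) and never with tabs:

```yaml
model:
  source: ./model.onnx   # two-space indent under model
  format: onnx
```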
Missing Required Fields¶
Error: At least one input must be specified
Solution: Ensure you have all required sections:
- ✅ name
- ✅ version
- ✅ model
- ✅ inputs
- ✅ outputs
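A minimal configuration containing every required section (values are placeholders):

```yaml
name: my-model
version: 1.0.0

model:
  source: ./model.onnx
  format: onnx

inputs:
  input_name:
    type: float32

outputs:
  output_name:
    type: float32
```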
Type Mismatch¶
Error: Type mismatch for input: expected float32, got int32
Solution: Ensure input types in gpux.yml match your ONNX model:
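Inspect the ONNX graph to see which dtype it declares for each input, then match it in `gpux.yml`, for example:

```yaml
inputs:
  image:
    type: float32   # Must match the dtype declared by the ONNX model (here float32, not int32)
```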
📚 What's Next?¶
Now that you understand configuration, learn how to run inference:
- Running Inference → Master the `gpux run` command
- Benchmarking → Measure performance
- API Reference → Complete schema reference
💡 Key Takeaways¶
What You Learned
✅ Complete gpux.yml structure
✅ How to configure inputs and outputs
✅ Runtime and GPU settings
✅ Serving configuration for production
✅ Best practices for configuration
✅ Common issues and solutions
Previous: First Steps | Next: Running Inference