Running Inference¶
Master the gpux run command with registry models and different input formats.
🎯 What You'll Learn¶
- ✅ Running inference on registry models
- ✅ Different ways to provide input data
- ✅ Using JSON files and inline data
- ✅ Saving inference results
- ✅ Batch processing
- ✅ Using the Python API
🚀 Basic Usage¶
Registry Models¶
Run inference on models pulled from registries:
# Sentiment analysis
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'
# Text generation
gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'
# Embeddings
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'
Local Models¶
Run inference on local models with gpux.yml:
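A minimal sketch, assuming a local project named sentiment-analysis whose gpux.yml declares a text input (the same example used under Inline JSON below):
# Run a local model by the name declared in its gpux.yml
gpux run sentiment-analysis --input '{"text": "I love this product!"}'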
📥 Input Methods¶
1. Inline JSON¶
Provide input directly on the command line:
# Registry model
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'
# Local model
gpux run sentiment-analysis --input '{"text": "I love this product!"}'
2. JSON File¶
Save the input to a JSON file (the name input.json used below is illustrative). The contents mirror the inline examples above:
For sentiment analysis:
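{"inputs": "I love this product!"}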
For text generation:
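{"inputs": "The future of AI is"}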
For embeddings:
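{"inputs": "Hello world"}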
Run with the file using the @ file-reference syntax described in the next section.
3. File Reference (@ prefix)¶
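Prefix a file path with @ to read the input payload from that file instead of pasting JSON inline. A minimal sketch using the input.json file from the previous section:
# Registry model
gpux run distilbert-base-uncased-finetuned-sst-2-english --input @input.json
# Local model
gpux run sentiment-analysis --input @input.json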
📤 Output Options¶
Print to Console (Default)¶
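By default, the result is printed to the console:
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'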
Output: the inference result is printed to the console; the exact fields depend on the model's declared outputs (a label and score for this sentiment model).
Save to File¶
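A sketch, assuming an --output flag for writing results to a file instead of stdout (the flag name is an assumption; check gpux run --help for the exact option):
# Write the inference result to results.json
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}' --output results.json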
🔁 Batch Inference¶
Process multiple inputs:
For sentiment analysis:
[
{"inputs": "I love this product!"},
{"inputs": "This is terrible."},
{"inputs": "It's okay, nothing special."}
]
For text generation:
[
{"inputs": "The future of AI is"},
{"inputs": "Machine learning will"},
{"inputs": "Deep learning models"}
]
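Save the array to a file (batch.json here is an illustrative name) and pass it with the @ file-reference syntax; whether elements are processed one by one or as a single batch depends on your gpux version:
gpux run distilbert-base-uncased-finetuned-sst-2-english --input @batch.json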
🐍 Python API¶
Use GPUX programmatically with registry models:
from gpux import GPUXRuntime
# Initialize runtime with registry model
runtime = GPUXRuntime(model_id="distilbert-base-uncased-finetuned-sst-2-english")
# Prepare input
input_data = {"inputs": "I love this product!"}
# Run inference
result = runtime.infer(input_data)
print(result)
# Cleanup
runtime.cleanup()
Context Manager¶
from gpux import GPUXRuntime
with GPUXRuntime(model_id="distilbert-base-uncased-finetuned-sst-2-english") as runtime:
result = runtime.infer({"inputs": "This is amazing!"})
print(result)
Local Models¶
from gpux import GPUXRuntime
import numpy as np
# Initialize runtime with local model
runtime = GPUXRuntime(model_path="model.onnx")
# Prepare input (shape and dtype must match the model's declared input)
input_data = {"data": np.array([[1, 2, 3, 4, 5]])}
# Run inference
result = runtime.infer(input_data)
print(result)
# Cleanup
runtime.cleanup()
⚙️ Advanced Options¶
Specify Provider¶
# Force CPU provider
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --provider cpu
# Force specific GPU provider
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --provider cuda
Verbose Output¶
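A sketch, assuming a --verbose flag (the flag name is an assumption; check gpux run --help):
# Print detailed logs such as provider selection and timings
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --verbose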
Custom Model Path¶
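For local models you can point at a model file directly instead of relying on the gpux.yml lookup. The --model-path flag below mirrors the Python API's model_path argument and is an assumption about the CLI:
gpux run sentiment-analysis --model-path ./model.onnx --input '{"text": "I love this product!"}'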
💡 Key Takeaways¶
What You Learned
- ✅ Running inference on registry models (Hugging Face)
- ✅ Multiple input methods (inline, file, @ prefix)
- ✅ Different input formats for different model types
- ✅ Saving output to files
- ✅ Batch processing with arrays
- ✅ Using the Python API with both registry and local models
- ✅ Advanced command-line options
Previous: Pulling Models | Next: Benchmarking