gpux run¶
Run inference on models from registries or local projects.
Overview¶
The gpux run command loads a model and runs inference on provided input data. It supports both registry models (pulled from Hugging Face) and local models with gpux.yml configuration.
Arguments¶
MODEL_NAME (required)¶
Name of the model to run. Can be:
- Registry model: `distilbert-base-uncased-finetuned-sst-2-english`
- Local model: `sentiment-analysis` (requires `gpux.yml`)
- Model path: `./models/bert` or `/path/to/model`
Examples:
# Registry models
gpux run distilbert-base-uncased-finetuned-sst-2-english
gpux run facebook/opt-125m
gpux run sentence-transformers/all-MiniLM-L6-v2
# Local models
gpux run sentiment-analysis
gpux run image-classifier
gpux run ./models/bert
The command searches for models in:
1. Registry cache (~/.gpux/models/)
2. Current directory (if gpux.yml exists)
3. Directory specified by model name
4. .gpux/ directory (for built models)
Options¶
Input Options¶
--input, -i¶
Input data as JSON string or file path (with @ prefix).
- Type: `string`
# JSON string
gpux run sentiment --input '{"text": "I love this!"}'
# File path with @ prefix
gpux run sentiment --input @input.json
--file, -f¶
Input file path (alternative to --input).
- Type: `string`
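For example, reading the input payload from a JSON file:

```shell
gpux run sentiment-analysis --file input.json
```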
Output Options¶
--output, -o¶
Save output to file.
- Type: `string`
Configuration Options¶
--config, -c¶
Configuration file name.
- Type: `string`
- Default: `gpux.yml`
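For example, to load a configuration file other than the default `gpux.yml` (the filename here is illustrative):

```shell
gpux run sentiment-analysis --config custom.yml
```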
--provider, -p¶
Preferred execution provider.
- Type: `string`
- Choices: `cuda`, `coreml`, `rocm`, `directml`, `openvino`, `tensorrt`, `cpu`
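For example, requesting the CUDA execution provider:

```shell
gpux run sentiment-analysis --input '{"text": "Fast?"}' --provider cuda
```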
Benchmark Options¶
--benchmark¶
Run benchmark instead of single inference.
- Type: `boolean`
- Default: `false`
--runs¶
Number of benchmark runs.
- Type: `integer`
- Default: `100`
--warmup¶
Number of warmup runs before benchmarking.
- Type: `integer`
- Default: `10`
Other Options¶
--verbose¶
Enable verbose output.
- Type: `boolean`
- Default: `false`
Input Formats¶
Registry Models¶
Registry models typically use standardized input formats:
Text Classification¶
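A typical invocation, assuming the model expects an `inputs` field (as the registry sentiment examples later on this page use):

```shell
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this!"}'
```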
Text Generation¶
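Assuming the same `inputs` convention, a text-generation call might look like:

```shell
gpux run facebook/opt-125m --input '{"inputs": "Once upon a time"}'
```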
Embeddings¶
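Similarly, for an embedding model (again assuming an `inputs` field):

```shell
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Embed this sentence"}'
```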
Question Answering¶
gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
Local Models¶
Local models use custom input formats defined in gpux.yml:
JSON String¶
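Pass the JSON inline, using the field names declared in your `gpux.yml`:

```shell
gpux run sentiment-analysis --input '{"text": "I love this!"}'
```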
JSON File¶
input.json:
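For example, a minimal `input.json` (field names depend on the inputs declared in your `gpux.yml`) can be created and sanity-checked like this:

```shell
# Write the input payload to a file
cat > input.json <<'EOF'
{"text": "This product exceeded my expectations!"}
EOF

# Verify the file is valid JSON before passing it to gpux
python3 -m json.tool input.json
```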
File Path with @ Prefix¶
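The `@` prefix tells `--input` to read the JSON from a file instead of parsing it inline:

```shell
gpux run sentiment-analysis --input @input.json
```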
Multiple Inputs¶
For models with multiple inputs:
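Supply one key per declared input. For instance, a question-answering model with `question` and `context` inputs (the model name here is a placeholder):

```shell
# "qa-model" is a placeholder for your local model's name
gpux run qa-model --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
```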
Output Formats¶
Console Output (Default)¶
Results are displayed as formatted JSON in the console.
File Output¶
Save results to a file:
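For example:

```shell
gpux run sentiment-analysis --input '{"text": "Great!"}' --output result.json
```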
Inference Mode¶
Single Inference¶
Run a single inference and display results:
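For example:

```shell
gpux run sentiment-analysis --input '{"text": "I love this product!"}'
```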
Batch Inference¶
For batch inference, pass arrays in input:
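Assuming the model's `gpux.yml` declares a batchable input, an array can be passed in place of a single value:

```shell
gpux run sentiment-analysis --input '{"text": ["Great!", "Terrible.", "Just okay."]}'
```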
Benchmark Mode¶
Basic Benchmark¶
Run performance benchmark with default settings (100 runs, 10 warmup):
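For example:

```shell
gpux run sentiment-analysis --input '{"text": "Test"}' --benchmark
```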
Output:
| Metric | Value |
|---|---|
| Mean Time | 0.42 ms |
| Min Time | 0.38 ms |
| Max Time | 1.25 ms |
| Std Dev | 0.08 ms |
| Throughput (FPS) | 2380.9 |
Custom Benchmark¶
Specify number of runs and warmup iterations:
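For example:

```shell
gpux run sentiment-analysis --input '{"text": "Test"}' --benchmark --runs 500 --warmup 50
```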
Save Benchmark Results¶
Save benchmark metrics to a file:
gpux run sentiment \
--input '{"text": "Test"}' \
--benchmark \
--runs 1000 \
--output benchmark.json
benchmark.json:
{
"mean_time": 0.42,
"min_time": 0.38,
"max_time": 1.25,
"std_dev": 0.08,
"throughput_fps": 2380.9,
"num_runs": 1000,
"warmup_runs": 50
}
Examples¶
Registry Models¶
Sentiment Analysis¶
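For example, using the `inputs` field this registry model accepts:

```shell
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this!"}'
```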
Text Generation¶
Embeddings¶
Question Answering¶
gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
Local Models¶
Sentiment Analysis¶
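For example, using the input fields defined in the model's `gpux.yml`:

```shell
gpux run sentiment-analysis --input '{"text": "This is amazing!"}'
```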
Image Classification¶
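The exact input format depends on the model's `gpux.yml`; a file-based invocation might look like this (filename illustrative):

```shell
gpux run image-classifier --file image_input.json
```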
File Input¶
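For example:

```shell
gpux run sentiment-analysis --file input.json
```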
With Specific Provider¶
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "Test"}' \
--provider cuda
Benchmark on Apple Silicon¶
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "Performance test"}' \
--provider coreml \
--benchmark \
--runs 1000
Error Handling¶
Model Not Found¶
Registry Model¶
Solution:
- Check model name spelling
- Verify model exists on Hugging Face Hub
- Try pulling the model first: gpux pull model-name
Local Model¶
Solution: Ensure the model exists and gpux.yml is configured correctly.
No Input Data Provided¶
Solution: Provide input using --input or --file.
Invalid JSON¶
Solution: Ensure your JSON is valid:
# ❌ Wrong (single quotes)
gpux run sentiment --input "{'text': 'test'}"
# ✅ Correct (double quotes)
gpux run sentiment --input '{"text": "test"}'
Missing Input Fields¶
Solution: Provide all required inputs as specified in model configuration.
Registry Model Not Cached¶
Solution: Pull the model first:
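For example:

```shell
gpux pull distilbert-base-uncased-finetuned-sst-2-english
```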
Best Practices¶
Use File Input for Large Data
For large inputs, use file input instead of command-line JSON:
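For example (filename illustrative):

```shell
gpux run sentiment-analysis --input @large_input.json
```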
Benchmark Before Production
Always benchmark your model to understand its performance before deploying.
Save Benchmark Results
Save benchmark results to a file for performance tracking over time.
Warmup is Important
Always include warmup runs for accurate benchmarks. The default is 10, but increase it (e.g. `--warmup 50`) for more stable results.
Use Verbose Mode for Debugging
Enable verbose output to see detailed execution logs:
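For example:

```shell
gpux run sentiment-analysis --input '{"text": "test"}' --verbose
```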
Performance Tips¶
- Provider Selection: Use the fastest provider for your hardware:
    - NVIDIA: `--provider tensorrt` or `--provider cuda`
    - Apple Silicon: `--provider coreml`
    - AMD: `--provider rocm`
    - CPU: `--provider cpu`
- Batch Processing: Process multiple items at once for better throughput
- Warmup Runs: Use adequate warmup (50-100 runs) for stable benchmarks
- Repeated Benchmarks: Run benchmarks multiple times and average results
Related Commands¶
- `gpux pull` - Pull models from registries
- `gpux build` - Build and validate local models
- `gpux serve` - Start HTTP server for inference
- `gpux inspect` - Inspect model details