gpux run¶
Run inference on models from registries or local projects.
Overview¶
The gpux run command loads a model and runs inference on provided input data. It supports both registry models (pulled from Hugging Face) and local models with gpux.yml configuration.
Arguments¶
MODEL_NAME (required)¶
Name of the model to run. Can be:
- Registry model: `distilbert-base-uncased-finetuned-sst-2-english`
- Local model: `sentiment-analysis` (requires `gpux.yml`)
- Model path: `./models/bert` or `/path/to/model`
Examples:
# Registry models
gpux run distilbert-base-uncased-finetuned-sst-2-english
gpux run facebook/opt-125m
gpux run sentence-transformers/all-MiniLM-L6-v2
# Local models
gpux run sentiment-analysis
gpux run image-classifier
gpux run ./models/bert
The command searches for models in:
1. Registry cache (~/.gpux/models/)
2. Current directory (if gpux.yml exists)
3. Directory specified by model name
4. .gpux/ directory (for built models)
Options¶
Input Options¶
--input, -i¶
Input data as JSON string or file path (with @ prefix).
- Type: `string`
# JSON string
gpux run sentiment --input '{"text": "I love this!"}'
# File path with @ prefix
gpux run sentiment --input @input.json
--file, -f¶
Input file path (alternative to --input).
- Type: `string`
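For example, reading the input payload from a JSON file:

```shell
gpux run sentiment-analysis --file input.json
```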
Output Options¶
--output, -o¶
Save output to file.
- Type: `string`
Configuration Options¶
--config, -c¶
Configuration file name.
- Type: `string`
- Default: `gpux.yml`
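For example, to load a configuration file other than the default `gpux.yml` (the filename here is illustrative):

```shell
gpux run sentiment-analysis --config custom.yml
```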
--provider, -p¶
Preferred execution provider.
- Type: `string`
- Choices: `cuda`, `coreml`, `rocm`, `directml`, `openvino`, `tensorrt`, `cpu`
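For example, requesting the CUDA execution provider:

```shell
gpux run sentiment-analysis --input '{"text": "Fast?"}' --provider cuda
```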
Benchmark Options¶
--benchmark¶
Run benchmark instead of single inference.
- Type: `boolean`
- Default: `false`
--runs¶
Number of benchmark runs.
- Type: `integer`
- Default: `100`
--warmup¶
Number of warmup runs before benchmarking.
- Type: `integer`
- Default: `10`
Other Options¶
--verbose¶
Enable verbose output.
- Type: `boolean`
- Default: `false`
Input Formats¶
Registry Models¶
Registry models typically use standardized input formats:
Text Classification¶
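A typical invocation, assuming the model expects an `inputs` field (as the registry sentiment examples later on this page use):

```shell
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this!"}'
```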
Text Generation¶
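Assuming the same `inputs` convention, a text-generation call might look like:

```shell
gpux run facebook/opt-125m --input '{"inputs": "Once upon a time"}'
```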
Embeddings¶
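Similarly, for an embedding model (again assuming an `inputs` field):

```shell
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Embed this sentence"}'
```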
Question Answering¶
gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
Local Models¶
Local models use custom input formats defined in gpux.yml:
JSON String¶
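Pass the JSON inline, using the field names declared in your `gpux.yml`:

```shell
gpux run sentiment-analysis --input '{"text": "I love this!"}'
```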
JSON File¶
input.json:
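For example, a minimal `input.json` (field names depend on the inputs declared in your `gpux.yml`) can be created and sanity-checked like this:

```shell
# Write the input payload to a file
cat > input.json <<'EOF'
{"text": "This product exceeded my expectations!"}
EOF

# Verify the file is valid JSON before passing it to gpux
python3 -m json.tool input.json
```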
File Path with @ Prefix¶
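The `@` prefix tells `--input` to read the JSON from a file instead of parsing it inline:

```shell
gpux run sentiment-analysis --input @input.json
```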
Multiple Inputs¶
For models with multiple inputs:
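Supply one key per declared input. For instance, a question-answering model with `question` and `context` inputs (the model name here is a placeholder):

```shell
# "qa-model" is a placeholder for your local model's name
gpux run qa-model --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
```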
Output Formats¶
Console Output (Default)¶
Results are displayed as formatted JSON in the console.
File Output¶
Save results to a file:
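For example:

```shell
gpux run sentiment-analysis --input '{"text": "Great!"}' --output result.json
```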
Inference Mode¶
Single Inference¶
Run a single inference and display results:
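For example:

```shell
gpux run sentiment-analysis --input '{"text": "I love this product!"}'
```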
Batch Inference¶
For batch inference, pass arrays in input:
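Assuming the model's `gpux.yml` declares a batchable input, an array can be passed in place of a single value:

```shell
gpux run sentiment-analysis --input '{"text": ["Great!", "Terrible.", "Just okay."]}'
```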
Benchmark Mode¶
Basic Benchmark¶
Run performance benchmark with default settings (100 runs, 10 warmup):
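For example:

```shell
gpux run sentiment-analysis --input '{"text": "Test"}' --benchmark
```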
Output:
| Metric | Value |
|---|---|
| Mean Time | 0.42 ms |
| Min Time | 0.38 ms |
| Max Time | 1.25 ms |
| Std Dev | 0.08 ms |
| Throughput (FPS) | 2380.9 |
Custom Benchmark¶
Specify number of runs and warmup iterations:
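For example:

```shell
gpux run sentiment-analysis --input '{"text": "Test"}' --benchmark --runs 500 --warmup 50
```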
Save Benchmark Results¶
Save benchmark metrics to a file:
gpux run sentiment \
--input '{"text": "Test"}' \
--benchmark \
--runs 1000 \
--output benchmark.json
benchmark.json:
{
"mean_time": 0.42,
"min_time": 0.38,
"max_time": 1.25,
"std_dev": 0.08,
"throughput_fps": 2380.9,
"num_runs": 1000,
"warmup_runs": 50
}
Examples¶
Registry Models¶
Sentiment Analysis¶
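For example, using the `inputs` field this registry model accepts:

```shell
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this!"}'
```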
Text Generation¶
Embeddings¶
Question Answering¶
gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'
Local Models¶
Sentiment Analysis¶
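For example, using the input fields defined in the model's `gpux.yml`:

```shell
gpux run sentiment-analysis --input '{"text": "This is amazing!"}'
```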
Image Classification¶
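The exact input format depends on the model's `gpux.yml`; a file-based invocation might look like this (filename illustrative):

```shell
gpux run image-classifier --file image_input.json
```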
File Input¶
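For example:

```shell
gpux run sentiment-analysis --file input.json
```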
With Specific Provider¶
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "Test"}' \
--provider cuda
Benchmark on Apple Silicon¶
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "Performance test"}' \
--provider coreml \
--benchmark \
--runs 1000
Error Handling¶
Model Not Found¶
Registry Model¶
Solution:
- Check model name spelling
- Verify model exists on Hugging Face Hub
- Try pulling the model first: gpux pull model-name
Local Model¶
Solution: Ensure the model exists and gpux.yml is configured correctly.
No Input Data Provided¶
Solution: Provide input using --input or --file.
Invalid JSON¶
Solution: Ensure your JSON is valid:
# ❌ Wrong (single quotes)
gpux run sentiment --input "{'text': 'test'}"
# ✅ Correct (double quotes)
gpux run sentiment --input '{"text": "test"}'
Missing Input Fields¶
Solution: Provide all required inputs as specified in model configuration.
Registry Model Not Cached¶
Solution: Pull the model first:
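For example:

```shell
gpux pull distilbert-base-uncased-finetuned-sst-2-english
```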
Best Practices¶
Use File Input for Large Data
For large inputs, use file input instead of command-line JSON:
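For example (filename illustrative):

```shell
gpux run sentiment-analysis --input @large_input.json
```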
Benchmark Before Production
Always benchmark your model to understand its performance before deploying.
Save Benchmark Results
Save benchmark results to a file for performance tracking over time.
Warmup is Important
Always include warmup runs for accurate benchmarks. The default is 10, but increase it (e.g. `--warmup 50`) for more stable results.
Use Verbose Mode for Debugging
Enable verbose output to see detailed execution logs:
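For example:

```shell
gpux run sentiment-analysis --input '{"text": "test"}' --verbose
```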
Performance Tips¶
- Provider Selection: Use the fastest provider for your hardware:
    - NVIDIA: `--provider tensorrt` or `--provider cuda`
    - Apple Silicon: `--provider coreml`
    - AMD: `--provider rocm`
    - CPU: `--provider cpu`
- Batch Processing: Process multiple items at once for better throughput
- Warmup Runs: Use adequate warmup (50-100 runs) for stable benchmarks
- Repeated Benchmarks: Run benchmarks multiple times and average results
Related Commands¶
- `gpux pull` - Pull models from registries
- `gpux build` - Build and validate local models
- `gpux serve` - Start HTTP server for inference
- `gpux inspect` - Inspect model details