
gpux run

Run inference on models from registries or local projects.


Overview

The gpux run command loads a model and runs inference on provided input data. It supports both registry models (pulled from Hugging Face) and local models with gpux.yml configuration.

gpux run MODEL_NAME [OPTIONS]

Arguments

MODEL_NAME (required)

Name of the model to run. Can be:

  • Registry model: distilbert-base-uncased-finetuned-sst-2-english
  • Local model: sentiment-analysis (requires gpux.yml)
  • Model path: ./models/bert or /path/to/model

Examples:

# Registry models
gpux run distilbert-base-uncased-finetuned-sst-2-english
gpux run facebook/opt-125m
gpux run sentence-transformers/all-MiniLM-L6-v2

# Local models
gpux run sentiment-analysis
gpux run image-classifier
gpux run ./models/bert

The command searches for models in the following order:

  1. Registry cache (~/.gpux/models/)
  2. Current directory (if gpux.yml exists)
  3. Directory specified by the model name
  4. .gpux/ directory (for built models)
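The search order above can be sketched as a simple lookup function. This is an illustrative helper, not the actual gpux implementation; the function name `resolve_model` and its signature are assumptions for the example.

```python
from pathlib import Path

def resolve_model(name: str, cwd: Path, cache: Path):
    """Return the first existing location for `name`, following the
    documented search order (hypothetical helper, not gpux internals)."""
    candidates = [
        cache / name,                                   # 1. registry cache (~/.gpux/models/)
        cwd if (cwd / "gpux.yml").exists() else None,   # 2. current directory with gpux.yml
        Path(name),                                     # 3. path given as the model name
        cwd / ".gpux" / name,                           # 4. built models under .gpux/
    ]
    for candidate in candidates:
        if candidate is not None and candidate.exists():
            return candidate
    return None
```

The first match wins, so a cached registry model shadows a local directory of the same name.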


Options

Input Options

--input, -i

Input data as JSON string or file path (with @ prefix).

  • Type: string
# JSON string
gpux run sentiment --input '{"text": "I love this!"}'

# File path with @ prefix
gpux run sentiment --input @input.json
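The two `--input` forms can be modeled with a small parser: a value starting with `@` is treated as a file path, anything else as inline JSON. A minimal sketch of this behavior (the helper name `parse_input` is an assumption, not part of gpux):

```python
import json
from pathlib import Path

def parse_input(value: str):
    """Parse an --input value: '@path' reads JSON from a file,
    otherwise the value itself is parsed as a JSON string.
    (Hypothetical helper mirroring the documented behavior.)"""
    if value.startswith("@"):
        return json.loads(Path(value[1:]).read_text())
    return json.loads(value)
```

Under this scheme, `parse_input('{"text": "hi"}')` and `parse_input("@input.json")` both yield a Python dict ready to feed to the model.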

--file, -f

Input file path (alternative to --input).

  • Type: string
gpux run sentiment --file input.json

Output Options

--output, -o

Save output to file.

  • Type: string
gpux run sentiment --input '{"text": "Great!"}' --output result.json

Configuration Options

--config, -c

Configuration file name.

  • Type: string
  • Default: gpux.yml
gpux run sentiment --config custom.yml

--provider, -p

Preferred execution provider.

  • Type: string
  • Choices: cuda, coreml, rocm, directml, openvino, tensorrt, cpu
gpux run sentiment --provider cuda

Benchmark Options

--benchmark

Run benchmark instead of single inference.

  • Type: boolean
  • Default: false
gpux run sentiment --input '{"text": "Test"}' --benchmark

--runs

Number of benchmark runs.

  • Type: integer
  • Default: 100
gpux run sentiment --input '{"text": "Test"}' --benchmark --runs 1000

--warmup

Number of warmup runs before benchmarking.

  • Type: integer
  • Default: 10
gpux run sentiment --input '{"text": "Test"}' --benchmark --warmup 20

Other Options

--verbose

Enable verbose output.

  • Type: boolean
  • Default: false
gpux run sentiment --input '{"text": "Test"}' --verbose

Input Formats

Registry Models

Registry models typically use standardized input formats:

Text Classification

gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love GPUX!"}'

Text Generation

gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'

Embeddings

gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'

Question Answering

gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'

Local Models

Local models use custom input formats defined in gpux.yml:

JSON String

gpux run sentiment --input '{"text": "I love GPUX!"}'

JSON File

gpux run sentiment --file input.json

input.json:

{
  "text": "I love GPUX!"
}

File Path with @ Prefix

gpux run sentiment --input @input.json

Multiple Inputs

For models with multiple inputs:

{
  "input_ids": [1, 2, 3, 4, 5],
  "attention_mask": [1, 1, 1, 1, 1]
}

Output Formats

Console Output (Default)

Results are displayed as formatted JSON:

{
  "sentiment": [0.1, 0.9]
}

File Output

Save results to a file:

gpux run sentiment --input '{"text": "Great!"}' --output result.json

result.json:

{
  "sentiment": [0.1, 0.9],
  "labels": ["negative", "positive"]
}


Inference Mode

Single Inference

Run a single inference and display results:

gpux run sentiment-analysis --input '{"text": "I love this product!"}'

Output:

{
  "sentiment": [0.1, 0.9]
}

Batch Inference

For batch inference, pass arrays in input:

gpux run sentiment --input '{
  "text": ["I love this!", "This is terrible", "Pretty good"]
}'

Benchmark Mode

Basic Benchmark

Run performance benchmark with default settings (100 runs, 10 warmup):

gpux run sentiment --input '{"text": "Test"}' --benchmark

Output:

Metric            Value
Mean Time         0.42 ms
Min Time          0.38 ms
Max Time          1.25 ms
Std Dev           0.08 ms
Throughput (FPS)  2380.9

Custom Benchmark

Specify number of runs and warmup iterations:

gpux run sentiment \
  --input '{"text": "Test"}' \
  --benchmark \
  --runs 1000 \
  --warmup 50

Save Benchmark Results

Save benchmark metrics to a file:

gpux run sentiment \
  --input '{"text": "Test"}' \
  --benchmark \
  --runs 1000 \
  --output benchmark.json

benchmark.json:

{
  "mean_time": 0.42,
  "min_time": 0.38,
  "max_time": 1.25,
  "std_dev": 0.08,
  "throughput_fps": 2380.9,
  "num_runs": 1000,
  "warmup_runs": 50
}
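The metrics in benchmark.json can be reproduced with standard timing code. The sketch below is illustrative only (gpux computes these internally); it assumes times are reported in milliseconds and throughput as 1000 / mean_time, consistent with the sample output above.

```python
import statistics
import time

def benchmark(fn, runs=100, warmup=10):
    """Time fn over `runs` iterations after `warmup` untimed runs,
    returning metrics in the same shape as benchmark.json above.
    Times are in milliseconds; throughput is inferences per second."""
    for _ in range(warmup):
        fn()  # warmup runs are executed but not timed
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    mean = statistics.fmean(times_ms)
    return {
        "mean_time": mean,
        "min_time": min(times_ms),
        "max_time": max(times_ms),
        "std_dev": statistics.stdev(times_ms) if runs > 1 else 0.0,
        "throughput_fps": 1000.0 / mean if mean > 0 else float("inf"),
        "num_runs": runs,
        "warmup_runs": warmup,
    }
```

This also shows why warmup matters: the first few calls often pay one-time costs (JIT compilation, cache fills) that would otherwise inflate the mean.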


Examples

Registry Models

Sentiment Analysis

gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love GPUX!"}'

Output:

{
  "logits": [[-3.2, 3.8]],
  "predicted_class": "POSITIVE"
}

Text Generation

gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'

Embeddings

gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'

Question Answering

gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'

Local Models

Sentiment Analysis

gpux run sentiment-analysis --input '{"text": "I love GPUX!"}'

Output:

{
  "sentiment": [0.05, 0.95],
  "labels": ["negative", "positive"]
}

Image Classification

gpux run image-classifier --input '{
  "image": [/* pixel values */]
}'

File Input

gpux run distilbert-base-uncased-finetuned-sst-2-english --file input.json --output result.json

With Specific Provider

gpux run distilbert-base-uncased-finetuned-sst-2-english \
  --input '{"inputs": "Test"}' \
  --provider cuda

Benchmark on Apple Silicon

gpux run distilbert-base-uncased-finetuned-sst-2-english \
  --input '{"inputs": "Performance test"}' \
  --provider coreml \
  --benchmark \
  --runs 1000

Error Handling

Model Not Found

Registry Model

Error: Model 'invalid-model-name' not found in registry

Solution:

  • Check model name spelling
  • Verify the model exists on Hugging Face Hub
  • Try pulling the model first: gpux pull model-name

Local Model

Error: Model 'sentiment-analysis' not found

Solution: Ensure the model exists and gpux.yml is configured correctly.

No Input Data Provided

Error: No input data provided

Solution: Provide input using --input or --file.

Invalid JSON

Error parsing input JSON: Expecting property name enclosed in double quotes

Solution: Ensure your JSON is valid:

# ❌ Wrong (single quotes)
gpux run sentiment --input "{'text': 'test'}"

# ✅ Correct (double quotes)
gpux run sentiment --input '{"text": "test"}'

Missing Input Fields

Run failed: Missing required input: 'inputs'

Solution: Provide all required inputs as specified in model configuration.
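The check behind this error is straightforward input validation against the model's declared inputs. A minimal sketch (the helper name `check_inputs` is hypothetical, not gpux API):

```python
def check_inputs(data: dict, required: list) -> None:
    """Raise an error like the one shown above when a required
    input field is absent from the payload (illustrative only)."""
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"Missing required input: {missing[0]!r}")
```

For a registry text-classification model, `required` would typically be `["inputs"]`; for a local model, it comes from the inputs declared in gpux.yml.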

Registry Model Not Cached

Error: Model 'distilbert-base-uncased-finetuned-sst-2-english' not found in cache

Solution: Pull the model first:

gpux pull distilbert-base-uncased-finetuned-sst-2-english


Best Practices

Use File Input for Large Data

For large inputs, use file input instead of command-line JSON:

gpux run model --file input.json

Benchmark Before Production

Always benchmark your model to understand performance:

gpux run model --input @test.json --benchmark --runs 1000

Save Benchmark Results

Save benchmark results for performance tracking:

gpux run model --input @test.json --benchmark --output metrics.json

Warmup is Important

Always include warmup runs for accurate benchmarks. The default is 10, but increase for more stable results:

gpux run model --input @test.json --benchmark --warmup 50

Use Verbose Mode for Debugging

Enable verbose output to see detailed execution logs:

gpux run model --input @test.json --verbose


Performance Tips

  1. Provider Selection: use the fastest provider for your hardware:
     • NVIDIA: --provider tensorrt or --provider cuda
     • Apple Silicon: --provider coreml
     • AMD: --provider rocm
     • CPU: --provider cpu

  2. Batch Processing: process multiple items at once for better throughput

  3. Warmup Runs: use adequate warmup (50-100 runs) for stable benchmarks

  4. Repeated Benchmarks: run benchmarks multiple times and average results



See Also