
Running Inference

Master the gpux run command with registry models and different input formats.


🎯 What You'll Learn

  • ✅ Running inference on registry models
  • ✅ Different ways to provide input data
  • ✅ Using JSON files and inline data
  • ✅ Saving inference results
  • ✅ Batch processing
  • ✅ Using the Python API

🚀 Basic Usage

Registry Models

Run inference on models pulled from registries:

# Sentiment analysis
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'

# Text generation
gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'

# Embeddings
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'

Local Models

Run inference on a local model that has a gpux.yml configuration:

gpux run model-name --input '{"data": [1,2,3,4,5]}'
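For reference, a gpux.yml for such a model might look like the sketch below. The field names (model.source, inputs, outputs) are illustrative assumptions, not the authoritative schema; see the configuration docs for the exact format.

name: model-name
model:
  source: ./model.onnx   # assumed: path to the ONNX model file
inputs:
  data:
    type: float32        # illustrative: matches the {"data": [...]} input above
    shape: [1, 5]
outputs:
  result:
    type: float32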

📥 Input Methods

1. Inline JSON

Provide input directly on the command line:

# Registry model
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'

# Local model
gpux run sentiment-analysis --input '{"text": "I love this product!"}'

2. JSON File

Save the input to a JSON file (for example, input.json):

For sentiment analysis:

{
  "inputs": "This is amazing!"
}

For text generation:

{
  "inputs": "The future of AI is",
  "max_length": 50
}

For embeddings:

{
  "inputs": "Hello world"
}

Then run with the --file option:

gpux run distilbert-base-uncased-finetuned-sst-2-english --file input.json

3. File Reference (@ prefix)

Alternatively, pass the file path to --input by prefixing it with @:

gpux run distilbert-base-uncased-finetuned-sst-2-english --input @input.json

📤 Output Options

Print to Console

By default, results are printed to the console:

gpux run model-name --file input.json

Output:

{
  "result": [0.2, 0.8]
}

Save to File

gpux run model-name --file input.json --output result.json
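The saved file is plain JSON, so any JSON parser can read it back. A minimal sketch in Python (the "result" key mirrors the example output above):

import json

# Load the result written by --output
with open("result.json") as f:
    result = json.load(f)

print(result["result"])  # e.g., [0.2, 0.8]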

🔁 Batch Inference

Process multiple inputs by passing a JSON array. Save the records to a file (e.g., batch_input.json):

For sentiment analysis:

[
  {"inputs": "I love this product!"},
  {"inputs": "This is terrible."},
  {"inputs": "It's okay, nothing special."}
]

For text generation:

[
  {"inputs": "The future of AI is"},
  {"inputs": "Machine learning will"},
  {"inputs": "Deep learning models"}
]

Run the batch with the --file option:

gpux run distilbert-base-uncased-finetuned-sst-2-english --file batch_input.json
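The same batch can also be driven from Python by looping over the records with the runtime from the Python API section below. This sketch assumes infer() takes one input dict at a time, as in the single-input examples:

from gpux import GPUXRuntime

batch = [
    {"inputs": "I love this product!"},
    {"inputs": "This is terrible."},
    {"inputs": "It's okay, nothing special."},
]

with GPUXRuntime(model_id="distilbert-base-uncased-finetuned-sst-2-english") as runtime:
    # infer() is called once per record; collect the results in order
    results = [runtime.infer(record) for record in batch]

for record, result in zip(batch, results):
    print(record["inputs"], "->", result)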

🐍 Python API

Use GPUX programmatically with registry models:

from gpux import GPUXRuntime

# Initialize runtime with registry model
runtime = GPUXRuntime(model_id="distilbert-base-uncased-finetuned-sst-2-english")

# Prepare input
input_data = {"inputs": "I love this product!"}

# Run inference
result = runtime.infer(input_data)
print(result)

# Cleanup
runtime.cleanup()

Context Manager

Using the runtime as a context manager releases resources automatically when the block exits, so no explicit cleanup() call is needed:

from gpux import GPUXRuntime

with GPUXRuntime(model_id="distilbert-base-uncased-finetuned-sst-2-english") as runtime:
    result = runtime.infer({"inputs": "This is amazing!"})
    print(result)

Local Models

from gpux import GPUXRuntime
import numpy as np

# Initialize runtime with local model
runtime = GPUXRuntime(model_path="model.onnx")

# Prepare input
input_data = {"data": np.array([[1, 2, 3, 4, 5]])}

# Run inference
result = runtime.infer(input_data)
print(result)

# Cleanup
runtime.cleanup()

⚙️ Advanced Options

Specify Provider

# Force CPU provider
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --provider cpu

# Force specific GPU provider
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --provider cuda
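If you need the same control from Python, the sketch below assumes the GPUXRuntime constructor accepts a provider keyword mirroring the CLI flag; verify this against the actual API before relying on it:

from gpux import GPUXRuntime

# provider= is an assumed parameter mirroring the CLI --provider flag
runtime = GPUXRuntime(
    model_id="distilbert-base-uncased-finetuned-sst-2-english",
    provider="cpu",
)
result = runtime.infer({"inputs": "test"})
print(result)
runtime.cleanup()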

Verbose Output

gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --verbose

Custom Model Path

# Use a specific model path
gpux run /path/to/model --input '{"inputs": "test"}'

💡 Key Takeaways

What You Learned

✅ Running inference on registry models (Hugging Face)
✅ Multiple input methods (inline, file, @-prefix)
✅ Different input formats for different model types
✅ Saving output to files
✅ Batch processing with arrays
✅ Using the Python API with both registry and local models
✅ Advanced command-line options


Previous: Pulling Models | Next: Benchmarking