First Steps¶
Get started with GPUX in under 2 minutes by pulling a model from Hugging Face!
🎯 What You'll Build¶
By the end of this guide, you'll have:
- ✅ Pulled a model from the Hugging Face registry
- ✅ Run inference on a real model
- ✅ Served your model via HTTP API
🚀 Quick Start: Pull from Hugging Face¶
The fastest way to get started is to pull a pre-trained model from Hugging Face:
# Pull a modern sentiment analysis model (RoBERTa-based)
gpux pull cardiffnlp/twitter-roberta-base-sentiment-latest
Expected output:
╭─ Pulling Model ─────────────────────────────────────────╮
│ Registry: huggingface                                   │
│ Model: cardiffnlp/twitter-roberta-base-sentiment-latest │
│ Size: ~500 MB                                           │
╰─────────────────────────────────────────────────────────╯
📥 Downloading model files...
✅ Model downloaded successfully!
🔄 Converting to ONNX...
✅ Conversion completed!
📝 Generating configuration...
✅ Configuration saved to: ~/.gpux/models/cardiffnlp/twitter-roberta-base-sentiment-latest/gpux.yml
Run Inference¶
# Run sentiment analysis
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest \
--input '{"inputs": "I love this product!"}'
Expected output: the predicted sentiment label (negative, neutral, or positive) together with a confidence score.
Serve Your Model¶
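Start an HTTP server for the model with gpux serve. A minimal invocation is sketched below; it assumes the server listens on port 8080 by default, matching the curl example that follows:
gpux serve cardiffnlp/twitter-roberta-base-sentiment-latest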
Test the API:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"inputs": "This is amazing!"}'
Congratulations! 🎉
You just pulled, ran, and served a real ML model in under 2 minutes!
🛠️ Advanced: Create Your Own Model¶
Advanced Feature
This section is for users who want to create their own ONNX models and configure them manually.
Most users will use gpux pull instead. Skip this section if you're just getting started.
For this tutorial, we'll create a simple linear regression model. Don't worry if you're not familiar with machine learning - this is just for demonstration!
Option 1: Using PyTorch (Recommended)¶
Create a file named create_model.py:
"""Create a simple ONNX model for GPUX tutorial."""
import torch
import torch.nn as nn
# Define a simple linear model
class SimpleModel(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(10, 2) # 10 inputs, 2 outputs
def forward(self, x):
return self.linear(x)
# Create model instance
model = SimpleModel()
model.eval()
# Create dummy input
dummy_input = torch.randn(1, 10)
# Export to ONNX
torch.onnx.export(
model,
dummy_input,
"model.onnx",
input_names=["input"],
output_names=["output"],
dynamic_axes={
"input": {0: "batch_size"},
"output": {0: "batch_size"}
}
)
print("โ
Model exported to model.onnx")
Run the script:
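python create_model.py
(This assumes PyTorch is installed, e.g. via pip install torch.)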
Option 2: Download Example Model¶
Alternatively, download a pre-made example model:
# Download an example model (MobileNetV2 image classification)
curl -o model.onnx https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Using Your Own Model
If you already have an ONNX model, just copy it to this directory and rename it to model.onnx.
📋 Create Configuration File¶
Now create a gpux.yml file to configure your model:
# gpux.yml - Configuration for GPUX
name: my-first-model
version: 1.0.0
description: "My first GPUX model"

model:
  source: ./model.onnx
  format: onnx

inputs:
  input:
    type: float32
    shape: [1, 10]
    required: true
    description: "10-dimensional input vector"

outputs:
  output:
    type: float32
    shape: [1, 2]
    description: "2-dimensional output vector"

runtime:
  gpu:
    memory: 2GB
    backend: auto  # Automatically select best GPU
  batch_size: 1
  timeout: 30
Configuration Explained
- name: Your model's name (used in CLI commands)
- model.source: Path to your ONNX model file
- inputs: Define input tensor specifications
- outputs: Define output tensor specifications
- runtime: GPU and performance settings
🏗️ Build Your Model¶
Validate and build your GPUX project:
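From the directory that contains gpux.yml and model.onnx, run:
gpux build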
Expected output:
╭─ Model Information ────────────╮
│ Name    │ my-first-model       │
│ Version │ 1.0.0                │
│ Format  │ onnx                 │
│ Size    │ 0.1 MB               │
│ Inputs  │ 1                    │
│ Outputs │ 1                    │
╰────────────────────────────────╯
╭─ Execution Provider ─────────────────────────╮
│ Provider    │ CoreMLExecutionProvider        │
│ Platform    │ Apple Silicon                  │
│ Available   │ ✅ Yes                         │
│ Description │ Optimized for Apple devices    │
╰──────────────────────────────────────────────╯
✅ Build completed successfully!
Build artifacts saved to: .gpux
What Just Happened?
GPUX:
1. ✅ Validated your gpux.yml configuration
2. ✅ Inspected your ONNX model
3. ✅ Detected the best GPU provider (or CPU)
4. ✅ Saved build artifacts to .gpux/ directory
🚀 Run Your First Inference¶
Now let's run inference on your model!
Create Input Data¶
Create a file named input.json:
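The gpux.yml above declares a single float32 input named input with shape [1, 10], so a minimal input file can look like this (the values are arbitrary placeholders):
{
  "input": [[0.5, 1.2, -0.3, 0.8, 1.5, -0.7, 0.2, 0.9, -1.1, 0.4]]
}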
Run Inference¶
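Pass the model name and the input file to gpux run. This sketch assumes the --input flag also accepts a path to a JSON file; check gpux run --help if your version uses a different flag:
gpux run my-first-model --input input.json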
Expected output: a 2-element output vector (the exact values depend on the randomly initialized weights).
Congratulations! 🎉
You just ran your first inference with GPUX!
Alternative: Inline Input¶
You can also provide input directly via the command line:
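Using the same placeholder values as input.json:
gpux run my-first-model --input '{"input": [[0.5, 1.2, -0.3, 0.8, 1.5, -0.7, 0.2, 0.9, -1.1, 0.4]]}'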
🔍 Inspect Your Model¶
Get detailed information about your model:
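gpux inspect my-first-model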
Expected output:
╭─ Model Information ────────────╮
│ Name    │ my-first-model       │
│ Version │ 1.0.0                │
│ Path    │ ./model.onnx         │
│ Size    │ 0.1 MB               │
╰────────────────────────────────╯
╭─ Input Specifications ───────────────╮
│ Name  │ Type    │ Shape   │ Required │
│ input │ float32 │ [1, 10] │ ✅       │
╰──────────────────────────────────────╯
╭─ Output Specifications ─────────╮
│ Name   │ Type    │ Shape       │
│ output │ float32 │ [1, 2]      │
╰─────────────────────────────────╯
╭─ Runtime Information ────────────────╮
│ Provider   │ CoreMLExecutionProvider │
│ Backend    │ auto                    │
│ GPU Memory │ 2GB                     │
╰──────────────────────────────────────╯
📁 Your Project Structure¶
After completing these steps, your project should look like this:
my-first-model/
├── model.onnx          # Your ONNX model
├── gpux.yml            # GPUX configuration
├── input.json          # Sample input data
├── create_model.py     # Model creation script (optional)
└── .gpux/              # Build artifacts (auto-generated)
    ├── model_info.json
    └── provider_info.json
🔄 Understanding the Workflow¶
Here's what happens when you use GPUX with registry models:
graph LR
A[gpux pull model-id] --> B[Download Model]
B --> C[Convert to ONNX]
C --> D[Generate Config]
D --> E[Cache Model]
E --> F[gpux run model-id]
G[input.json] --> F
F --> H[Load from Cache]
H --> I[Run Inference]
I --> J[Return Results]
E --> K[gpux serve model-id]
K --> L[Start HTTP Server]
L --> M[Handle API Requests]
style A fill:#6366f1,stroke:#4f46e5,color:#fff
style G fill:#6366f1,stroke:#4f46e5,color:#fff
style J fill:#10b981,stroke:#059669,color:#fff
style M fill:#10b981,stroke:#059669,color:#fff
Local Project Workflow¶
For local projects with gpux.yml:
graph LR
A[gpux.yml] --> B[gpux build]
C[model.onnx] --> B
B --> D[Validate Config]
D --> E[Inspect Model]
E --> F[Select Provider]
F --> G[Save Build Info]
G --> H[gpux run]
I[input.json] --> H
H --> J[Load Model]
J --> K[Run Inference]
K --> L[Return Results]
style A fill:#6366f1,stroke:#4f46e5,color:#fff
style C fill:#6366f1,stroke:#4f46e5,color:#fff
style I fill:#6366f1,stroke:#4f46e5,color:#fff
style L fill:#10b981,stroke:#059669,color:#fff
✨ Try Different Model Types¶
Now that you have a working GPUX setup, try these different model types:
📝 Text Models¶
Emotion Analysis¶
# 6 emotion categories (joy, sadness, anger, fear, surprise, neutral)
gpux pull j-hartmann/emotion-english-distilroberta-base
gpux run j-hartmann/emotion-english-distilroberta-base \
--input '{"inputs": "I feel great today!"}'
Text Generation¶
# Small, efficient language model
gpux pull microsoft/phi-2
gpux run microsoft/phi-2 \
--input '{"inputs": "The future of AI is"}'
Embeddings¶
# Modern embedding model
gpux pull BAAI/bge-small-en-v1.5
gpux run BAAI/bge-small-en-v1.5 \
--input '{"inputs": "Hello world"}'
🎤 Audio Models¶
Speech Recognition¶
# Whisper for speech-to-text transcription
gpux pull openai/whisper-base
gpux run openai/whisper-base \
--input '{"audio": "path/to/audio.wav"}'
Audio Classification¶
# Emotion recognition in audio
gpux pull superb/hubert-base-superb-er
gpux run superb/hubert-base-superb-er \
--input '{"audio": "path/to/audio.wav"}'
🖼️ Image Models¶
Image Classification¶
# Vision Transformer for image classification
gpux pull google/vit-base-patch16-224
gpux run google/vit-base-patch16-224 \
--input '{"image": "path/to/image.jpg"}'
Object Detection¶
# DETR for object detection
gpux pull facebook/detr-resnet-50
gpux run facebook/detr-resnet-50 \
--input '{"image": "path/to/image.jpg"}'
🔍 Inspect Models¶
Get detailed information about any model:
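gpux inspect cardiffnlp/twitter-roberta-base-sentiment-latest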
⚙️ Use Different Providers¶
# Force CPU provider
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest \
--input '{"inputs": "test"}' \
--provider cpu
# Force specific GPU provider
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest \
--input '{"inputs": "test"}' \
--provider cuda
🛠️ Advanced: Using Your Own Models¶
If you have your own ONNX model, you can create a gpux.yml configuration file.
This is an advanced feature - most users will use gpux pull instead.
See Configuration Guide for details.
🐛 Troubleshooting¶
Model file not found¶
Error: Model file not found: ./model.onnx
Solution: Make sure model.onnx exists in your project directory:
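# Check that the model file is present
ls -la model.onnx
# If it is missing, re-run the export script from the section above
python create_model.py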
Input validation failed¶
Error: Input mismatch. Missing: {'input'}
Solution: Check your input data matches the expected format:
# Verify input specification
gpux inspect my-first-model
# Ensure input.json has the correct key names
cat input.json
Invalid YAML¶
Error: Invalid YAML in configuration file
Solution: Validate your gpux.yml syntax:
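One quick check is to load the file with Python (this assumes PyYAML is installed):
python -c "import yaml; yaml.safe_load(open('gpux.yml')); print('YAML OK')"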
📚 What's Next?¶
Great job! You've successfully pulled and run your first GPUX model. 🎉
Continue learning:
- Pulling Models → Learn more about Hugging Face integration
- Running Inference → Advanced inference techniques
- Serving Models → Production deployment
- Benchmarking → Measure model performance
- Configuration → Advanced: Create custom gpux.yml files
💡 Key Takeaways¶
What You Learned
✅ How to pull models from Hugging Face with gpux pull
✅ How to run inference with gpux run
✅ How to serve models with gpux serve
✅ How to inspect model information with gpux inspect
Previous: Installation | Next: Pulling Models