gpux pull¶
Pull models from registries and convert them to ONNX format.
Synopsis¶
gpux pull [OPTIONS] <model-id>
Description¶
The gpux pull command downloads models from supported registries (currently Hugging Face Hub) and automatically converts them to ONNX format for use with GPUX. Models are cached locally for fast access.
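A typical workflow is to pull a model once and then run it from the local cache (model name is illustrative; see gpux run for run-time arguments):
# Download the model and convert it to ONNX (cached afterwards)
gpux pull distilbert-base-uncased-finetuned-sst-2-english
# Run inference against the cached ONNX model
gpux run distilbert-base-uncased-finetuned-sst-2-english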
Arguments¶
<model-id>¶
The model identifier. Can be specified in several formats:
- Simple format: distilbert-base-uncased-finetuned-sst-2-english
- Registry format: huggingface:microsoft/DialoGPT-medium
- Short alias: hf:microsoft/DialoGPT-medium
Options¶
--registry, -r¶
Specify the registry to pull from.
Default: huggingface
Supported values:
- huggingface - Hugging Face Hub
- hf - Short alias for Hugging Face Hub
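For example, since hf is a short alias for Hugging Face Hub, these two commands are interchangeable:
gpux pull microsoft/DialoGPT-medium --registry huggingface
gpux pull microsoft/DialoGPT-medium -r hf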
--revision, --rev¶
Pull a specific revision or tag of the model.
gpux pull microsoft/DialoGPT-medium --revision v1.0
gpux pull microsoft/DialoGPT-medium --revision abc123def456
Default: main (latest)
--cache-dir¶
Specify a custom cache directory.
Default: ~/.gpux/models/ (macOS/Linux) or %USERPROFILE%\.gpux\models\ (Windows)
--token¶
Authentication token for private models.
Note: You can also set the HUGGINGFACE_HUB_TOKEN environment variable.
--force, -f¶
Force re-download and conversion, even if the model is already cached.
--provider¶
Specify the execution provider for conversion.
gpux pull microsoft/DialoGPT-medium --provider cpu
gpux pull microsoft/DialoGPT-medium --provider cuda
Default: auto (automatically select best available)
Supported values:
- auto - Automatically select best provider
- cpu - CPU only
- cuda - NVIDIA CUDA
- coreml - Apple CoreML
- rocm - AMD ROCm
- directml - Windows DirectML
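For example, to pin the conversion to a specific backend instead of relying on auto-detection (CoreML shown here; pick the value that matches your hardware):
gpux pull microsoft/DialoGPT-medium --provider coreml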
--verbose, -v¶
Enable verbose output showing detailed progress.
--help, -h¶
Show help message and exit.
Examples¶
Basic Usage¶
# Pull a sentiment analysis model
gpux pull distilbert-base-uncased-finetuned-sst-2-english
# Pull a text generation model
gpux pull facebook/opt-125m
# Pull an embedding model
gpux pull sentence-transformers/all-MiniLM-L6-v2
Registry Specification¶
# Explicitly specify Hugging Face registry
gpux pull huggingface:microsoft/DialoGPT-medium
# Use short alias
gpux pull hf:microsoft/DialoGPT-medium
Version Control¶
# Pull specific revision
gpux pull microsoft/DialoGPT-medium --revision v1.0
# Pull specific commit
gpux pull microsoft/DialoGPT-medium --revision abc123def456
Authentication¶
# Pull private model with token
gpux pull your-org/private-model --token "hf_your_token_here"
# Using environment variable
export HUGGINGFACE_HUB_TOKEN="hf_your_token_here"
gpux pull your-org/private-model
Advanced Options¶
# Force re-download
gpux pull microsoft/DialoGPT-medium --force
# Use custom cache directory
gpux pull microsoft/DialoGPT-medium --cache-dir /path/to/cache
# Verbose output
gpux pull microsoft/DialoGPT-medium --verbose
# CPU-only conversion
gpux pull microsoft/DialoGPT-medium --provider cpu
Output¶
Success Output¶
╭─ Pulling Model ────────────────────────────────────────────────╮
│ Registry: huggingface │
│ Model: microsoft/DialoGPT-medium │
│ Revision: main │
│ Size: 1.2 GB │
╰─────────────────────────────────────────────────────────────────╯
📥 Downloading model files...
✅ Model downloaded successfully!
🔄 Converting to ONNX...
✅ Conversion completed!
📝 Generating configuration...
✅ Configuration saved to: ~/.gpux/models/microsoft-DialoGPT-medium/gpux.yml
🎉 Model ready! Use: gpux run microsoft/DialoGPT-medium
Verbose Output¶
╭─ Pulling Model ────────────────────────────────────────────────╮
│ Registry: huggingface │
│ Model: microsoft/DialoGPT-medium │
│ Revision: main │
│ Size: 1.2 GB │
╰─────────────────────────────────────────────────────────────────╯
📥 Downloading model files...
└─ Downloading config.json... ✅
└─ Downloading pytorch_model.bin... ✅
└─ Downloading tokenizer.json... ✅
└─ Downloading tokenizer_config.json... ✅
✅ Model downloaded successfully!
🔄 Converting to ONNX...
└─ Loading PyTorch model... ✅
└─ Exporting to ONNX... ✅
└─ Validating ONNX model... ✅
✅ Conversion completed!
📝 Generating configuration...
└─ Analyzing model inputs... ✅
└─ Analyzing model outputs... ✅
└─ Generating gpux.yml... ✅
✅ Configuration saved to: ~/.gpux/models/microsoft-DialoGPT-medium/gpux.yml
🎉 Model ready! Use: gpux run microsoft/DialoGPT-medium
Exit Codes¶
- 0 - Success
- 1 - General error
- 2 - Model not found
- 3 - Network error
- 4 - Authentication error
- 5 - Conversion error
Environment Variables¶
HUGGINGFACE_HUB_TOKEN¶
Authentication token for Hugging Face Hub.
GPUX_CACHE_DIR¶
Default cache directory for models.
GPUX_LOG_LEVEL¶
Logging level for debugging.
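For example, to configure all three for a shell session (the DEBUG value is illustrative; use whichever levels your GPUX version accepts):
export HUGGINGFACE_HUB_TOKEN="hf_your_token_here"
export GPUX_CACHE_DIR="/path/to/cache"
export GPUX_LOG_LEVEL="DEBUG"
gpux pull microsoft/DialoGPT-medium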
Cache Management¶
Cache Location¶
Models are cached in:
- macOS/Linux: ~/.gpux/models/
- Windows: %USERPROFILE%\.gpux\models\
Cache Structure¶
~/.gpux/models/
├── microsoft-DialoGPT-medium/
│ ├── model.onnx # Converted ONNX model
│ ├── gpux.yml # Auto-generated config
│ ├── tokenizer.json # Tokenizer files
│ ├── config.json # Model configuration
│ └── metadata.json # GPUX metadata
└── distilbert-base-uncased-finetuned-sst-2-english/
├── model.onnx
├── gpux.yml
└── ...
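To inspect the auto-generated configuration for a cached model (path assumes the default cache location):
cat ~/.gpux/models/microsoft-DialoGPT-medium/gpux.yml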
Cache Operations¶
# Check cache size
du -sh ~/.gpux/models/
# List cached models
ls ~/.gpux/models/
# Clear specific model cache
rm -rf ~/.gpux/models/model-name
# Clear all cache
rm -rf ~/.gpux/models/
Troubleshooting¶
Common Issues¶
Model Not Found¶
Error: Model not found: invalid-model-name
Solutions:
- Check model name spelling
- Verify model exists on Hugging Face Hub
- Try with full organization name: org/model-name
Download Failed¶
Error: Network error: Failed to download model
Solutions:
- Check internet connection
- Verify Hugging Face Hub is accessible
- Retry with the --force flag (see the example below)
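For example, to retry a failed download with full progress detail:
gpux pull microsoft/DialoGPT-medium --force --verbose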
Conversion Failed¶
Error: Conversion failed: Unsupported model architecture
Solutions:
- Try a different model
- Check if model supports ONNX conversion
- Use --verbose for detailed error information
Authentication Failed¶
Error: Authentication failed: Invalid token
Solutions:
- Verify token is correct
- Check token permissions
- Ensure token starts with hf_
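If the huggingface_hub CLI is installed (pip install huggingface_hub), you can also verify the token independently of GPUX:
export HUGGINGFACE_HUB_TOKEN="hf_your_token_here"
huggingface-cli whoami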
Memory Issues¶
Error: Out of memory during conversion
Solutions:
- Try a smaller model
- Close other applications
- Use CPU-only conversion: --provider cpu
Related Commands¶
- gpux run - Run inference on pulled models
- gpux inspect - Inspect model information
- gpux serve - Serve models via HTTP API