
Pulling Models from Registries

Learn how to pull and use models from Hugging Face and other registries with GPUX.


🎯 What You'll Learn

  • ✅ Pulling models from Hugging Face Hub
  • ✅ Understanding model caching and versioning
  • ✅ Working with different model types
  • ✅ Troubleshooting common issues

🚀 Basic Usage

Pull a Model

# Pull a sentiment analysis model
gpux pull distilbert-base-uncased-finetuned-sst-2-english

Specify Registry

# Explicitly specify Hugging Face registry
gpux pull huggingface:microsoft/DialoGPT-medium

# Use short alias
gpux pull hf:microsoft/DialoGPT-medium

Pull Specific Version

# Pull a specific revision/tag
gpux pull microsoft/DialoGPT-medium --revision v1.0
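
Revision values follow Hugging Face Hub conventions, so a branch name, tag, or commit hash should all work, assuming GPUX passes the value straight through to the Hub:

# Pin to an exact commit for reproducible pulls (hash is a placeholder)
gpux pull microsoft/DialoGPT-medium --revision 8bada3b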

📦 Model Types

Text Classification

# Sentiment analysis
gpux pull distilbert-base-uncased-finetuned-sst-2-english

# Run inference
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this!"}'

Text Generation

# GPT-style model
gpux pull facebook/opt-125m

# Run inference
gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'

Embeddings

# Sentence embeddings
gpux pull sentence-transformers/all-MiniLM-L6-v2

# Run inference
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'

Question Answering

# QA model
gpux pull distilbert-base-cased-distilled-squad

# Run inference
gpux run distilbert-base-cased-distilled-squad --input '{"question": "What is AI?", "context": "AI is artificial intelligence"}'

💾 Model Caching

Cache Location

Models are cached in:

  • macOS/Linux: ~/.gpux/models/
  • Windows: %USERPROFILE%\.gpux\models\

Cache Structure

~/.gpux/models/
├── distilbert-base-uncased-finetuned-sst-2-english/
│   ├── model.onnx
│   ├── gpux.yml
│   ├── tokenizer.json
│   └── config.json
└── facebook-opt-125m/
    ├── model.onnx
    ├── gpux.yml
    └── ...
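
Because the cache is a plain directory tree, standard shell tools work on it; for example, to list each cached model with its size:

# Per-model disk usage (sizes shown will vary by model)
du -sh ~/.gpux/models/*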

Force Re-download

# Force re-download and conversion
gpux pull distilbert-base-uncased-finetuned-sst-2-english --force

🔍 Model Information

Inspect Pulled Model

# Get detailed information
gpux inspect distilbert-base-uncased-finetuned-sst-2-english

Expected output:

╭─ Model Information ─────────────────────────────────────╮
│ Name      │ distilbert-base-uncased-finetuned-sst-2-english │
│ Registry  │ huggingface                                     │
│ Size      │ 268 MB                                          │
│ Format    │ onnx                                            │
│ Cached    │ ✅ Yes                                          │
╰─────────────────────────────────────────────────────────╯

╭─ Input Specifications ──────────────────────────────────╮
│ Name   │ Type    │ Shape     │ Required │
│ inputs │ string  │ variable  │ ✅       │
╰─────────────────────────────────────────────────────────╯

╭─ Output Specifications ─────────────────────────────────╮
│ Name   │ Type    │ Shape    │
│ logits │ float32 │ [1, 2]   │
╰─────────────────────────────────────────────────────────╯


⚙️ Advanced Options

Custom Cache Directory

# Use custom cache location
gpux pull microsoft/DialoGPT-medium --cache-dir /path/to/custom/cache

Verbose Output

# Show detailed progress
gpux pull microsoft/DialoGPT-medium --verbose

Authentication

For private models, set your Hugging Face token:

# Set environment variable
export HUGGINGFACE_HUB_TOKEN="your_token_here"

# Or use --token parameter
gpux pull your-org/private-model --token "your_token_here"

🐛 Troubleshooting

Model Not Found

Error: Model not found: invalid-model-name

Solutions:

  • Check the model name spelling
  • Verify the model exists on Hugging Face Hub (see the check below)
  • Try the full organization name: org/model-name
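
A quick way to verify a model exists is to query the Hub API directly; a 200 status means the name resolves. This uses the public Hugging Face API, independent of GPUX:

# Returns 200 if the model exists, 404 otherwise
curl -s -o /dev/null -w "%{http_code}\n" https://huggingface.co/api/models/microsoft/DialoGPT-medium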

Download Failed

Error: Network error: Failed to download model

Solutions:

  • Check your internet connection
  • Verify Hugging Face Hub is accessible (see the check below)
  • Try again with the --force flag
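
To rule out connectivity problems before retrying, check that the Hub responds at all (plain curl, independent of GPUX):

# Should print an HTTP status line such as HTTP/2 200
curl -sI https://huggingface.co | head -n 1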

Conversion Failed

Error: Conversion failed: Unsupported model architecture

Solutions:

  • Try a different model
  • Check if the model supports ONNX conversion
  • Use --verbose for detailed error information

Memory Issues

Error: Out of memory during conversion

Solutions:

  • Try a smaller model
  • Close other applications
  • Use CPU-only conversion: --provider cpu


📚 Popular Models

A few well-tested models to try, grouped by task:

Text Classification

# Sentiment analysis
gpux pull distilbert-base-uncased-finetuned-sst-2-english
gpux pull cardiffnlp/twitter-roberta-base-sentiment-latest

# Topic classification
gpux pull facebook/bart-large-mnli

Text Generation

# Small models
gpux pull facebook/opt-125m
gpux pull microsoft/DialoGPT-small

# Medium models
gpux pull microsoft/DialoGPT-medium
gpux pull facebook/opt-350m

Embeddings

# General purpose
gpux pull sentence-transformers/all-MiniLM-L6-v2
gpux pull sentence-transformers/all-mpnet-base-v2

# Multilingual
gpux pull sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Question Answering

# SQuAD models
gpux pull distilbert-base-cased-distilled-squad
gpux pull deepset/roberta-base-squad2

💡 Best Practices

1. Start Small

Begin with smaller models to test your setup:

# Start with lightweight models
gpux pull distilbert-base-uncased-finetuned-sst-2-english
gpux pull sentence-transformers/all-MiniLM-L6-v2

2. Check Model Size

Before pulling large models, check how big they are. The Hub API exposes model metadata, including the total parameter count:

# Total parameter count from the Hub API
curl -s "https://huggingface.co/api/models/microsoft/DialoGPT-medium" | jq '.safetensors.total'
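
Note that .safetensors.total reports a parameter count rather than bytes, and is absent for models without safetensors weights. To see the actual download size, you can inspect the weight file's headers directly; the filename below is an assumption and varies by model:

# Follow redirects and print the weight file size in bytes
curl -sIL "https://huggingface.co/microsoft/DialoGPT-medium/resolve/main/pytorch_model.bin" | grep -i content-length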

3. Use Specific Versions

For production, pin to specific model versions:

# Use specific revision
gpux pull microsoft/DialoGPT-medium --revision v1.0

4. Monitor Cache Usage

Keep track of your cache size:

# Check cache size
du -sh ~/.gpux/models/
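
If the cache grows too large, you can prune it manually. Assuming each cached model is a self-contained directory (as shown in Cache Structure above), deleting its folder reclaims the space:

# Remove a single cached model to free disk space
rm -rf ~/.gpux/models/facebook-opt-125m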

💡 Key Takeaways

What You Learned

✅ How to pull models from Hugging Face Hub
✅ Understanding model caching and versioning
✅ Working with different model types (classification, generation, embeddings)
✅ Troubleshooting common pull issues
✅ Best practices for model management


Previous: First Steps | Next: Running Inference