Hugging Face Models Examples¶
Real-world examples of using popular Hugging Face models with GPUX.
What You'll Learn¶
- Pulling and running popular Hugging Face models
- Different model types and their use cases
- Input/output formats for each model type
- Performance characteristics and optimization tips
- Common troubleshooting scenarios
Text Classification¶
Sentiment Analysis¶
DistilBERT Sentiment Analysis¶
# Pull the model
gpux pull distilbert-base-uncased-finetuned-sst-2-english
# Run inference
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "I love this product!"}'
Batch Processing:
# Create batch input file
cat > batch_sentiment.json << EOF
[
{"inputs": "I love this product!"},
{"inputs": "This is terrible."},
{"inputs": "It's okay, nothing special."}
]
EOF
# Run batch inference
gpux run distilbert-base-uncased-finetuned-sst-2-english --file batch_sentiment.json
Twitter Sentiment Analysis¶
# Pull Twitter-specific model
gpux pull cardiffnlp/twitter-roberta-base-sentiment-latest
# Run inference
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest --input '{"inputs": "Just had the best coffee ever!"}'
Topic Classification¶
BART MNLI (Zero-shot Classification)¶
# Pull BART MNLI model
gpux pull facebook/bart-large-mnli
# Run zero-shot classification
gpux run facebook/bart-large-mnli --input '{
"inputs": "I love pizza",
"candidate_labels": ["food", "travel", "sports"]
}'
Expected Output:
{
"sequence": "I love pizza",
"labels": ["food", "travel", "sports"],
"scores": [0.95, 0.03, 0.02]
}
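Given a zero-shot result like the one above, the predicted topic is whichever candidate label has the highest score. A small sketch (the `result` dict simply mirrors the example output; check the actual response shape before indexing):

```python
# Pick the top label from a zero-shot classification result.
result = {
    "sequence": "I love pizza",
    "labels": ["food", "travel", "sports"],
    "scores": [0.95, 0.03, 0.02],
}

# labels and scores are parallel lists; pair and sort explicitly
# rather than assuming the first entry is the highest-scoring one.
ranked = sorted(zip(result["labels"], result["scores"]),
                key=lambda pair: pair[1], reverse=True)
top_label, top_score = ranked[0]
print(top_label, top_score)  # food 0.95
```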
Text Generation¶
GPT-style Models¶
OPT-125M (Small GPT Model)¶
# Pull OPT model
gpux pull facebook/opt-125m
# Run text generation
gpux run facebook/opt-125m --input '{"inputs": "The future of AI is"}'
Expected Output:
{
"generated_text": "The future of AI is bright and full of possibilities. As technology continues to advance..."
}
DialoGPT (Dialog Generation)¶
# Pull DialoGPT model
gpux pull microsoft/DialoGPT-medium
# Run dialog generation
gpux run microsoft/DialoGPT-medium --input '{"inputs": "Hello, how are you?"}'
Advanced Text Generation¶
With Parameters¶
# Text generation with parameters
gpux run facebook/opt-125m --input '{
"inputs": "The future of AI is",
"max_length": 50,
"temperature": 0.7,
"do_sample": true
}'
Question Answering¶
SQuAD Models¶
DistilBERT SQuAD¶
# Pull DistilBERT SQuAD model
gpux pull distilbert-base-cased-distilled-squad
# Run question answering
gpux run distilbert-base-cased-distilled-squad --input '{
"question": "What is artificial intelligence?",
"context": "Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals."
}'
RoBERTa SQuAD (Higher Accuracy)¶
# Pull RoBERTa SQuAD model
gpux pull deepset/roberta-base-squad2
# Run question answering
gpux run deepset/roberta-base-squad2 --input '{
"question": "When was GPUX created?",
"context": "GPUX is a Docker-like runtime for ML inference that was created in 2025 to solve GPU compatibility issues."
}'
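Extractive QA models like these score each context token as a possible answer start and end; the answer is the highest-scoring valid span. A toy sketch of that span selection (the token list and scores are made-up illustration, not real model output):

```python
def best_span(start_scores, end_scores, max_len=15):
    """Pick the (start, end) token pair with the highest combined
    score, requiring end >= start and a bounded span length."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            if s + end_scores[j] > best_score:
                best_score = s + end_scores[j]
                best = (i, j)
    return best

tokens = ["artificial", "intelligence", "is", "intelligence",
          "demonstrated", "by", "machines"]
start = [0.8, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]  # toy start logits
end   = [0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]  # toy end logits
i, j = best_span(start, end)
print(" ".join(tokens[i:j + 1]))  # artificial intelligence
```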
Embeddings¶
Sentence Embeddings¶
All-MiniLM-L6-v2 (General Purpose)¶
# Pull embedding model
gpux pull sentence-transformers/all-MiniLM-L6-v2
# Generate embeddings
gpux run sentence-transformers/all-MiniLM-L6-v2 --input '{"inputs": "Hello world"}'
All-mpnet-base-v2 (Higher Quality)¶
# Pull higher quality embedding model
gpux pull sentence-transformers/all-mpnet-base-v2
# Generate embeddings
gpux run sentence-transformers/all-mpnet-base-v2 --input '{"inputs": "Hello world"}'
Multilingual Embeddings¶
Multilingual MiniLM¶
# Pull multilingual model
gpux pull sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
# Generate embeddings in different languages
gpux run sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --input '{"inputs": "Bonjour le monde"}'
gpux run sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --input '{"inputs": "Hola mundo"}'
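The usual way to use these embeddings is semantic similarity: two sentences with similar meaning ("Hello world" / "Bonjour le monde") should produce vectors with high cosine similarity. A self-contained sketch using toy 4-dimensional vectors in place of the real 384-dimensional model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding outputs.
hello_en  = [0.12, 0.80, 0.31, 0.05]
hello_fr  = [0.10, 0.78, 0.35, 0.07]
unrelated = [0.90, 0.02, 0.01, 0.40]

print(cosine_similarity(hello_en, hello_fr))   # close to 1.0
print(cosine_similarity(hello_en, unrelated))  # much lower
```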
Serving Models¶
HTTP API Server¶
Start Server¶
# Start server with sentiment model
gpux serve distilbert-base-uncased-finetuned-sst-2-english --port 8080
Test API¶
# Test sentiment analysis API
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"inputs": "I love this product!"}'
Batch API Requests¶
# Test batch processing
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '[
{"inputs": "I love this!"},
{"inputs": "This is terrible."},
{"inputs": "It'\''s okay."}
]'
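When calling the server from code rather than curl, a thin Python client can wrap the same endpoint. This is an illustrative sketch based only on the payload shapes and `/predict` route shown in the curl examples above; `API_URL`, `build_payload`, and `predict` are hypothetical helper names, not part of GPUX:

```python
import json
import urllib.request

API_URL = "http://localhost:8080/predict"  # server from `gpux serve`

def build_payload(texts):
    """Single string -> one JSON object; list of strings -> batch
    array, matching the curl examples above."""
    if isinstance(texts, str):
        return {"inputs": texts}
    return [{"inputs": t} for t in texts]

def predict(texts):
    """POST a payload to the running server and return parsed JSON.
    Requires the server started above to be listening on port 8080."""
    data = json.dumps(build_payload(texts)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Payload construction is independent of the server being up:
print(json.dumps(build_payload(["I love this!", "This is terrible."])))
```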
Health Check¶
# Check server health
curl http://localhost:8080/health
Expected Response:
{
"status": "healthy",
"model": "distilbert-base-uncased-finetuned-sst-2-english",
"uptime": "00:05:23"
}
Model Inspection¶
Inspect Model Details¶
# Inspect the cached model
gpux inspect distilbert-base-uncased-finetuned-sst-2-english
Expected Output:
╭─ Model Information ─────────────────────────────────────────╮
│ Name     │ distilbert-base-uncased-finetuned-sst-2-english  │
│ Registry │ huggingface                                      │
│ Size     │ 268 MB                                           │
│ Format   │ onnx                                             │
│ Cached   │ Yes                                              │
╰─────────────────────────────────────────────────────────────╯
╭─ Input Specifications ──────────────────────────────────────╮
│ Name   │ Type   │ Shape    │ Required │
│ inputs │ string │ variable │ Yes      │
╰─────────────────────────────────────────────────────────────╯
╭─ Output Specifications ─────────────────────────────────────╮
│ Name   │ Type    │ Shape  │
│ logits │ float32 │ [1, 2] │
╰─────────────────────────────────────────────────────────────╯
╭─ Runtime Information ───────────────────────────────────────╮
│ Provider   │ CoreMLExecutionProvider                        │
│ Backend    │ auto                                           │
│ GPU Memory │ 2GB                                            │
╰─────────────────────────────────────────────────────────────╯
Compare Models¶
# Inspect different models
gpux inspect distilbert-base-uncased-finetuned-sst-2-english
gpux inspect facebook/opt-125m
gpux inspect sentence-transformers/all-MiniLM-L6-v2
Performance Optimization¶
Provider Selection¶
Apple Silicon (M1/M2/M3)¶
# Use CoreML for best performance
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--provider coreml
NVIDIA GPUs¶
# Use CUDA for NVIDIA GPUs
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--provider cuda
# Use TensorRT for maximum performance
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--provider tensorrt
AMD GPUs¶
# Use ROCm for AMD GPUs
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--provider rocm
Benchmarking¶
Basic Benchmark¶
# Run benchmark
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--benchmark
Expected Output:
╭─ Benchmark Results ──────────────╮
│ Metric           │ Value        │
│ Mean Time        │ 0.42 ms      │
│ Min Time         │ 0.38 ms      │
│ Max Time         │ 1.25 ms      │
│ Std Dev          │ 0.08 ms      │
│ Throughput (FPS) │ 2380.9       │
╰──────────────────────────────────╯
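The throughput figure is simply the reciprocal of the mean latency, which is an easy sanity check on any benchmark run:

```python
# Throughput (inferences/second) from mean latency in milliseconds.
mean_latency_ms = 0.42                      # Mean Time from the table
throughput_fps = 1000.0 / mean_latency_ms   # ms -> inferences/second
print(f"{throughput_fps:.1f}")  # ~2381, matching the reported 2380.9
```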
Custom Benchmark¶
# Custom benchmark with more runs
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--benchmark \
--runs 1000 \
--warmup 50
Save Benchmark Results¶
# Save benchmark to file
gpux run distilbert-base-uncased-finetuned-sst-2-english \
--input '{"inputs": "test"}' \
--benchmark \
--runs 1000 \
--output benchmark_results.json
Troubleshooting¶
Common Issues¶
Model Not Found¶
Error: Model not found: invalid-model-name
Solution:
# Check model name on Hugging Face Hub
# Pull the correct model
gpux pull distilbert-base-uncased-finetuned-sst-2-english
Conversion Failed¶
Error: Conversion failed: Unsupported model architecture
Solution:
# Try a different model
gpux pull facebook/opt-125m
# Use verbose mode for details
gpux pull microsoft/DialoGPT-medium --verbose
Memory Issues¶
Error: Out of memory during conversion
Solution:
# Use CPU-only conversion
gpux pull microsoft/DialoGPT-medium --provider cpu
# Try a smaller model
gpux pull facebook/opt-125m
Input Format Errors¶
Error: Missing required input: 'inputs'
Solution:
# Check correct input format
gpux inspect distilbert-base-uncased-finetuned-sst-2-english
# Use correct input format
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}'
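To catch this error class before invoking the model, a payload can be checked against the required input names reported by `gpux inspect`. A minimal sketch; `validate_payload` is a hypothetical helper, not a GPUX API:

```python
def validate_payload(payload, required=("inputs",)):
    """Raise ValueError if any required input key is missing.
    The required-name list comes from `gpux inspect` output."""
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"Missing required input(s): {missing}")
    return payload

validate_payload({"inputs": "test"})       # passes
try:
    validate_payload({"text": "test"})     # wrong key name
except ValueError as err:
    print(err)
```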
Debug Mode¶
# Enable debug logging
export GPUX_LOG_LEVEL=DEBUG
gpux run distilbert-base-uncased-finetuned-sst-2-english --input '{"inputs": "test"}' --verbose
Performance Comparison¶
Model Size vs Performance¶
| Model | Size | Type | Use Case | Performance |
|---|---|---|---|---|
| distilbert-base-uncased-finetuned-sst-2-english | 268 MB | Classification | Sentiment | Fast |
| facebook/opt-125m | 500 MB | Generation | Text | Medium |
| microsoft/DialoGPT-medium | 1.2 GB | Generation | Dialog | Slower |
| sentence-transformers/all-MiniLM-L6-v2 | 90 MB | Embeddings | Similarity | Very Fast |
| facebook/bart-large-mnli | 1.6 GB | Classification | Zero-shot | Slower |
Provider Performance (Apple Silicon)¶
| Provider | Speed | Memory | Compatibility |
|---|---|---|---|
| CoreML | Fastest | Low | Apple Silicon only |
| CPU | Slowest | Lowest | Universal |
| Auto | Optimal | Medium | Universal |
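The selection logic implied by these tables can be written down as a simple preference order. This sketch is illustrative only; `choose_provider` is a hypothetical function, and `--provider auto` performs the equivalent selection inside GPUX:

```python
def choose_provider(platform, has_nvidia=False, has_amd=False):
    """Pick an execution provider in the preference order suggested
    by the provider tables above."""
    if platform == "darwin":   # Apple Silicon -> CoreML
        return "coreml"
    if has_nvidia:
        return "cuda"          # or "tensorrt" for maximum throughput
    if has_amd:
        return "rocm"
    return "cpu"               # universal fallback

print(choose_provider("darwin"))                   # coreml
print(choose_provider("linux", has_nvidia=True))   # cuda
```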
Best Practices¶
1. Model Selection¶
- Start Small: Begin with smaller models for testing
- Check Compatibility: Verify ONNX conversion support
- Consider Use Case: Choose models optimized for your task
2. Input Formatting¶
- Use Standard Formats: Follow Hugging Face conventions
- Validate Inputs: Check input format before running
- Batch Processing: Use arrays for multiple inputs
3. Performance Optimization¶
- Provider Selection: Use optimal provider for your hardware
- Benchmarking: Always benchmark before production
- Caching: Models are cached locally for fast access
4. Error Handling¶
- Graceful Degradation: Handle conversion failures
- Fallback Options: Have alternative models ready
- Logging: Enable verbose logging for debugging
Related Resources¶
- Pulling Models Tutorial
- Working with Registries
- Running Inference Tutorial
- Hugging Face Hub
- Hugging Face Documentation
Key Takeaways¶
What You Learned
- How to pull and run popular Hugging Face models
- Different model types and their specific use cases
- Input/output formats for each model type
- Performance optimization techniques
- Troubleshooting common issues
- Best practices for production usage
Previous: Multi-modal | Next: Speech Recognition