GPUX¶
Docker-like GPU Runtime for ML Inference¶
GPUX provides universal GPU compatibility for ML inference workloads: run the same model on any GPU without per-vendor setup or compatibility headaches.
⚡ Why GPUX?¶
🌍 Universal GPU Support¶
Works on NVIDIA, AMD, Apple Silicon, and Intel GPUs, plus DirectML-capable GPUs on Windows. No more "works on my GPU" problems.
🐳 Docker-like UX¶
Familiar commands and configuration. If you know Docker, you know GPUX.
⚙️ Zero Configuration¶
Automatically selects the best GPU provider. Works out of the box.
🚀 High Performance¶
Leverages optimized ONNX Runtime backends with TensorRT, CUDA, CoreML, and more.
🔧 Production Ready¶
Built on mature, battle-tested technologies. Ready for production workloads.
🐍 Python First¶
Simple Python API for seamless integration into your ML pipelines.
🎯 Quick Example¶
Pull models from Hugging Face and run inference - no configuration needed!
```bash
# Pull a modern sentiment analysis model
gpux pull cardiffnlp/twitter-roberta-base-sentiment-latest

# Run inference
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest \
  --input '{"inputs": "I love this product!"}'

# Start HTTP server
gpux serve cardiffnlp/twitter-roberta-base-sentiment-latest --port 8080
```
Zero Configuration
GPUX automatically:

- Downloads and converts models to ONNX
- Generates configuration
- Selects the best GPU provider
- Handles input preprocessing
Advanced Configuration
For custom models or advanced settings, see Configuration Guide.
🖥️ Supported Platforms¶
| Platform | Compute API | Execution Provider | Status |
|---|---|---|---|
| NVIDIA | CUDA | TensorRT, CUDA | ✅ Supported |
| AMD | ROCm | ROCm | ✅ Supported |
| Apple | Metal | CoreML | ✅ Supported |
| Intel | OpenVINO | OpenVINO | ✅ Supported |
| Windows | DirectML | DirectML | ✅ Supported |
| Universal | CPU | CPU | ✅ Supported |
📦 Installation¶
Install GPUX using uv (recommended) or pip:
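A minimal sketch of the install commands; the PyPI package name `gpux` is an assumption here, so see the Installation Guide for the published name:

```bash
# Add GPUX to a uv-managed project (assumes the package is published as "gpux")
uv add gpux

# Or install with pip
pip install gpux
```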
Why uv?
We recommend using uv for faster, more reliable dependency management.
🚀 Key Features¶
Automatic Provider Selection¶
GPUX automatically selects the best execution provider for your hardware:
```python
from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx")

# Automatically uses:
# - TensorRT/CUDA on NVIDIA GPUs
# - CoreML on Apple Silicon
# - ROCm on AMD GPUs
# - CPU as fallback
```
Benchmarking Built-in¶
Measure performance with ease:
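A run from the CLI might look like the following sketch; the `benchmark` subcommand name is an assumption rather than a confirmed CLI option:

```bash
# Hypothetical benchmark invocation (subcommand name is an assumption)
gpux benchmark cardiffnlp/twitter-roberta-base-sentiment-latest \
  --input '{"inputs": "I love this product!"}'
```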
```
╭─ Benchmark Results ────╮
│ Mean Time  │ 0.42 ms   │
│ Std Time   │ 0.05 ms   │
│ Min Time   │ 0.38 ms   │
│ Max Time   │ 0.55 ms   │
│ Throughput │ 2,380 fps │
╰────────────────────────╯
```
HTTP Server¶
Serve models with a single command:
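The `gpux serve` command from the quick example starts the server; in the request below, the `/predict` path and payload shape are assumptions, so check the generated docs at `/docs` for the actual routes:

```bash
# Start the HTTP server (same command as in the quick example)
gpux serve cardiffnlp/twitter-roberta-base-sentiment-latest --port 8080

# Send a request (the /predict path and payload shape are assumptions)
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'
```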
Automatic OpenAPI/Swagger documentation at /docs.
📚 Learn More¶
User Guide¶
In-depth documentation of core concepts and features.
API Reference¶
Complete CLI, configuration, and Python API reference.
Deployment¶
Deploy to Docker, Kubernetes, AWS, GCP, Azure, and edge devices.
🌟 Show Your Support¶
If you find GPUX useful, please consider:
- ⭐ Star us on GitHub
- 🐛 Report bugs or request features
- 💬 Join our Discord community
- 📢 Share on Twitter
🤝 Contributing¶
We welcome contributions! See our Contributing Guide to get started.
📄 License¶
GPUX is licensed under the MIT License.
Ready to get started? Check out our Installation Guide!