
GPUX

Docker-like GPU Runtime for ML Inference

GPUX provides universal GPU compatibility for ML inference workloads. Run the same model on any GPU without compatibility issues.



⚡ Why GPUX?

🌍 Universal GPU Support

Works on NVIDIA, AMD, Apple Silicon, Intel, and Windows GPUs. No more "works on my GPU" problems.

🐳 Docker-like UX

Familiar commands and configuration. If you know Docker, you know GPUX.

gpux build .
gpux run model-name
gpux serve model-name

⚙️ Zero Configuration

Automatically selects the best GPU provider. Works out of the box.

🚀 High Performance

Leverages optimized ONNX Runtime backends with TensorRT, CUDA, CoreML, and more.

🔧 Production Ready

Built on mature, battle-tested technologies. Ready for production workloads.

🐍 Python First

Simple Python API for seamless integration into your ML pipelines.


🎯 Quick Example

Pull models from Hugging Face and run inference - no configuration needed!

# Pull a modern sentiment analysis model
gpux pull cardiffnlp/twitter-roberta-base-sentiment-latest

# Run inference
gpux run cardiffnlp/twitter-roberta-base-sentiment-latest \
  --input '{"inputs": "I love this product!"}'

# Start HTTP server
gpux serve cardiffnlp/twitter-roberta-base-sentiment-latest --port 8080

# Pull Whisper for speech-to-text
gpux pull openai/whisper-base

# Run inference on audio file
gpux run openai/whisper-base \
  --input '{"audio": "path/to/audio.wav"}'
# Pull Vision Transformer for image classification
gpux pull google/vit-base-patch16-224

# Run inference on image
gpux run google/vit-base-patch16-224 \
  --input '{"image": "path/to/image.jpg"}'

Zero Configuration

GPUX automatically:

- Downloads and converts models to ONNX
- Generates configuration
- Selects the best GPU provider
- Handles input preprocessing

Advanced Configuration

For custom models or advanced settings, see Configuration Guide.


🖥️ Supported Platforms

| Platform | GPU Provider | Status |
| --- | --- | --- |
| NVIDIA CUDA | TensorRT, CUDA | ✅ Supported |
| AMD ROCm | ROCm | ✅ Supported |
| Apple Metal | CoreML | ✅ Supported |
| Intel OpenVINO | OpenVINO | ✅ Supported |
| Windows DirectML | DirectML | ✅ Supported |
| Universal CPU | CPU | ✅ Supported |

📦 Installation

Install GPUX using uv (recommended) or pip:

# Using uv (recommended)
uv add gpux

# Or using pip
pip install gpux

Why uv?

We recommend using uv for faster, more reliable dependency management.
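
After installing, a quick sanity check (assuming the CLI exposes the standard --help flag):

gpux --help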


🚀 Key Features

Automatic Provider Selection

GPUX automatically selects the best execution provider for your hardware:

from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx")
# Automatically uses:
# - TensorRT/CUDA on NVIDIA GPUs
# - CoreML on Apple Silicon
# - ROCm on AMD GPUs
# - CPU as fallback
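
To connect this to the Quick Example above, here is a minimal sketch of running an inference from Python. The infer() method name and the input dictionary shape are assumptions that mirror the CLI's --input JSON, not a confirmed API; see the API Reference for the exact interface.

from gpux import GPUXRuntime

# Load a local ONNX model; provider selection happens automatically as above.
runtime = GPUXRuntime(model_path="model.onnx")

# Hypothetical inference call mirroring the CLI's --input JSON.
result = runtime.infer({"inputs": "I love this product!"})
print(result)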

Benchmarking Built-in

Measure performance with ease:

gpux run model-name --benchmark --runs 1000

╭─ Benchmark Results ─────────────────────╮
│ Mean Time     │ 0.42 ms                 │
│ Std Time      │ 0.05 ms                 │
│ Min Time      │ 0.38 ms                 │
│ Max Time      │ 0.55 ms                 │
│ Throughput    │ 2,380 fps               │
╰─────────────────────────────────────────╯
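
The reported throughput is simply the inverse of the mean latency: 1 / 0.42 ms ≈ 2,380 inferences per second.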

HTTP Server

Serve models with a single command:

gpux serve model-name --port 8080

Automatic OpenAPI/Swagger documentation at /docs.
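
Once the server is running, any HTTP client can send requests. A sketch with curl is shown below; the /predict endpoint path and JSON body are assumptions based on the Quick Example, so check the generated /docs page for the exact schema.

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": "I love this product!"}'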


📚 Learn More

Tutorial

Step-by-step guide from installation to production deployment.

User Guide

In-depth documentation of core concepts and features.

Examples

Real-world examples: sentiment analysis, image classification, LLM inference, and more.

API Reference

Complete CLI, configuration, and Python API reference.

Deployment

Deploy to Docker, Kubernetes, AWS, GCP, Azure, and edge devices.

Advanced

Performance optimization, custom providers, production best practices.


🌟 Show Your Support

If you find GPUX useful, please consider giving the project a star on GitHub.


🤝 Contributing

We welcome contributions! See our Contributing Guide to get started.


📄 License

GPUX is licensed under the MIT License.


Ready to get started? Check out our Installation Guide!