Serving Models¶
Deploy models behind an HTTP API for production use.
🎯 What You'll Learn¶
- ✅ Starting HTTP server
- ✅ Making API requests
- ✅ API endpoints
- ✅ Production deployment
- ✅ Scaling strategies
🚀 Quick Start¶
Start the HTTP server (substitute your model's name for `model-name`):
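```bash
gpux serve model-name --port 8080
```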
Output:
```
INFO: Started server on http://0.0.0.0:8080
INFO: Using provider: CoreMLExecutionProvider
INFO: Model loaded: model-name v1.0.0
```
📡 API Endpoints¶
Health Check¶
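Assuming the server exposes a conventional `/health` route (the path is an assumption; the generated OpenAPI docs described below list the actual routes):

```bash
curl http://localhost:8080/health   # /health path is an assumption
```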
An illustrative response (the exact fields may differ):
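```json
{"status": "healthy"}
```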
Model Info¶
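Likewise assuming an `/info`-style route (again an assumption; verify against the OpenAPI spec):

```bash
curl http://localhost:8080/info   # /info path is an assumption
```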
An illustrative response, echoing the model name and version from the startup log:
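```json
{"name": "model-name", "version": "1.0.0"}
```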
Prediction¶
```bash
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I love this!"}'
```
An illustrative response; the actual keys depend on your model's declared outputs:
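```json
{"sentiment": "positive", "confidence": 0.98}
```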
🔧 Server Configuration¶
Server settings can also be set in `gpux.yml`.
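A minimal sketch; the `serving` section and its key names are assumptions about the schema, so check the GPUX configuration reference for the exact format:

```yaml
# NOTE: section and key names are assumptions about the gpux.yml schema
serving:
  host: "0.0.0.0"   # bind address
  port: 8080        # same default shown in the Quick Start output
  workers: 1        # number of server workers
```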
Command-Line Options¶
```bash
# Custom port
gpux serve model --port 9000

# Bind to localhost only
gpux serve model --host 127.0.0.1

# Multiple workers
gpux serve model --workers 4
```
📊 OpenAPI Documentation¶
Automatic API documentation:
- Swagger UI: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc
- OpenAPI spec: http://localhost:8080/openapi.json
🐍 Python Client¶
Make requests programmatically:
```python
import requests

url = "http://localhost:8080/predict"
data = {"text": "This is great!"}

# POST the input as JSON and decode the JSON prediction
response = requests.post(url, json=data)
response.raise_for_status()  # raise on HTTP errors instead of parsing them
result = response.json()
print(result)
```
🚀 Production Deployment¶
Docker¶
Create Dockerfile:
```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install GPUX
RUN pip install gpux

# Copy model and config
COPY model.onnx .
COPY gpux.yml .

# Expose port
EXPOSE 8080

# Start server
CMD ["gpux", "serve", "model-name", "--port", "8080"]
```
Build and run:
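For example (the `gpux-model` image tag is arbitrary):

```bash
docker build -t gpux-model .          # gpux-model tag is arbitrary
docker run -p 8080:8080 gpux-model
```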
Reverse Proxy (nginx)¶
```nginx
server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
Load Balancing¶
Use multiple workers:
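```bash
gpux serve model --workers 4
```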
Or put an external load balancer (nginx, HAProxy) in front of multiple server instances.
📈 Monitoring¶
Metrics¶
Track performance:

- Request latency
- Throughput (requests/sec)
- Error rates
- Memory usage
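Latency and throughput can be sampled client-side as a starting point. A minimal sketch, assuming the server from the Quick Start is running and reusing the `/predict` payload from earlier:

```python
import time

import requests

URL = "http://localhost:8080/predict"
PAYLOAD = {"text": "I love this!"}
N = 100

# Send N requests, recording per-request latency in milliseconds.
latencies = []
start = time.perf_counter()
for _ in range(N):
    t0 = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of silently parsing them
    latencies.append((time.perf_counter() - t0) * 1000)
elapsed = time.perf_counter() - start

latencies.sort()
print(f"p50 latency: {latencies[N // 2]:.1f} ms")
print(f"p95 latency: {latencies[int(N * 0.95)]:.1f} ms")
print(f"throughput: {N / elapsed:.1f} req/s")
```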
Logging¶
Enable verbose logging:
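The exact flag is not shown here; a `--verbose`-style option is a plausible guess, so treat it as hypothetical and confirm with `gpux serve --help`:

```bash
gpux serve model --verbose   # hypothetical flag; verify with gpux serve --help
```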
💡 Key Takeaways¶
What You Learned
- ✅ Starting HTTP server
- ✅ API endpoints and usage
- ✅ Configuration options
- ✅ Production deployment with Docker
- ✅ Scaling and monitoring
🎉 Congratulations!¶
You've completed the GPUX tutorial! You now know how to:
- ✅ Install and configure GPUX
- ✅ Build and run models
- ✅ Optimize performance
- ✅ Deploy to production
Next steps:

- User Guide - Deep dive into concepts
- Examples - Real-world use cases
- Deployment - Production guides