Kubernetes Deployment

Deploy GPUX models on Kubernetes with universal GPU compatibility.


🎯 Overview

This guide covers deploying GPUX on Kubernetes with GPU resource management and automatic scaling.

In Development

A detailed Kubernetes deployment guide is in development. The basic deployment patterns below are functional today.


📚 What Will Be Covered

  • ✅ Basic Deployment: Simple Kubernetes deployment
  • 🔄 GPU Resource Management: NVIDIA, AMD, Apple Silicon GPU support
  • 🔄 Auto-scaling: Horizontal Pod Autoscaler (HPA) configuration
  • 🔄 Service Mesh: Istio integration for traffic management
  • 🔄 Monitoring: Prometheus and Grafana integration
  • 🔄 Cost Optimization: Resource limits and node affinity
  • 🔄 Best Practices: Production-ready configurations

🚀 Quick Start

Basic Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpux-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpux-model
  template:
    metadata:
      labels:
        app: gpux-model
    spec:
      containers:
      - name: gpux
        image: my-gpux-model:latest
        ports:
        - containerPort: 8080
        env:
        - name: GPUX_LOG_LEVEL
          value: "info"
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
---
apiVersion: v1
kind: Service
metadata:
  name: gpux-model-service
spec:
  selector:
    app: gpux-model
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
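
Apply the manifests with kubectl apply -f gpux-deployment.yaml and confirm the pods are running with kubectl get pods -l app=gpux-model. For production traffic you will usually also want readiness and liveness probes, so the Service only routes to pods whose model has finished loading. A minimal sketch, assuming the GPUX server exposes an HTTP health endpoint at /health (the path is an assumption; verify it for your image):

      containers:
      - name: gpux
        image: my-gpux-model:latest
        ports:
        - containerPort: 8080
        # /health is assumed; adjust to your server's actual health endpoint
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10   # allow time for model loading
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 15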

GPU Support (NVIDIA)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpux-model-gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpux-model-gpu
  template:
    metadata:
      labels:
        app: gpux-model-gpu
    spec:
      containers:
      - name: gpux
        image: my-gpux-model:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
            nvidia.com/gpu: 1
          limits:
            memory: "8Gi"
            cpu: "4"
            nvidia.com/gpu: 1
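
Requesting nvidia.com/gpu only schedules once the NVIDIA device plugin DaemonSet is installed in the cluster, and GPU nodes are often tainted to keep ordinary workloads off them. A sketch of the pod-level scheduling fields, assuming an example node label and the commonly used nvidia.com/gpu taint (names vary by cluster and cloud provider):

    spec:
      # Example label; check your cluster's actual GPU node labels
      nodeSelector:
        gpu-type: nvidia
      # Tolerate the taint commonly applied to GPU nodes
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: gpux
        image: my-gpux-model:latest

On AMD hardware, the AMD device plugin advertises the amd.com/gpu resource instead, so the same manifest applies with nvidia.com/gpu swapped for amd.com/gpu.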

🔧 Advanced Configuration

Model Registry Integration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpux-huggingface-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpux-huggingface-model
  template:
    metadata:
      labels:
        app: gpux-huggingface-model
    spec:
      initContainers:
      - name: pull-model
        image: my-gpux-model:latest
        command: ["gpux", "pull", "microsoft/DialoGPT-medium"]
        volumeMounts:
        - name: model-cache
          mountPath: /root/.gpux
      containers:
      - name: gpux
        image: my-gpux-model:latest
        command: ["gpux", "serve", "microsoft/DialoGPT-medium", "--port", "8080"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: model-cache
          mountPath: /root/.gpux
      volumes:
      - name: model-cache
        emptyDir: {}
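
The emptyDir volume above is deleted with the pod, so every replacement pod re-downloads the model. To persist the cache across restarts, back it with a PersistentVolumeClaim; a sketch, with the claim name and storage size as illustrative examples:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gpux-model-cache
spec:
  accessModes:
  - ReadWriteOnce   # single-node access; use ReadWriteMany storage for multi-node scaling
  resources:
    requests:
      storage: 10Gi   # size to fit your cached models

Then replace the emptyDir volume in the deployment with:

      volumes:
      - name: model-cache
        persistentVolumeClaim:
          claimName: gpux-model-cache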

Auto-scaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gpux-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gpux-model
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
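
Resource-based HPA metrics require the metrics-server add-on to be running in the cluster. For model servers it also helps to slow scale-down so long-running inference requests are not cut off by pod churn; a sketch of an optional behavior block appended to the HPA spec above (values are illustrative):

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low load before scaling down
      policies:
      - type: Pods
        value: 1                        # remove at most one pod per minute
        periodSeconds: 60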

💡 Key Takeaways

Success

  • ✅ Universal GPU Support: Works with NVIDIA, AMD, and Apple Silicon GPUs
  • ✅ Model Registry Integration: Pull models from Hugging Face Hub
  • ✅ Auto-scaling: Horizontal Pod Autoscaler support
  • ✅ Resource Management: Proper GPU resource allocation
  • ✅ Service Discovery: Kubernetes Service integration

