AWS Deployment

Deploy GPUX models on Amazon Web Services with GPU support.


🎯 Overview

This guide covers deploying GPUX on AWS, from EC2 GPU instances to managed container services.

In Development

The detailed AWS deployment guide is still being developed. Basic functionality is available today.


📚 What Will Be Covered

  • 🔄 EC2 GPU Instances: P3, P4, G4, G5 instance types
  • 🔄 ECS with GPU: Container orchestration with GPU support
  • 🔄 EKS with GPU: Kubernetes on AWS with GPU nodes
  • 🔄 Lambda Integration: Serverless inference patterns
  • 🔄 S3 Model Storage: Model artifact storage and retrieval (see the sketch after this list)
  • 🔄 CloudWatch Monitoring: Metrics and logging
  • 🔄 Cost Optimization: Spot instances and auto-scaling
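
As a preview of the S3 workflow, here is a minimal sketch for staging model artifacts in S3 and pulling them onto an instance before serving. The bucket name and local path are hypothetical placeholders.

# Upload model artifacts to S3 (bucket and path are placeholders)
aws s3 sync ./models/DialoGPT-medium s3://my-gpux-models/DialoGPT-medium

# On the instance, pull the artifacts back down before serving
aws s3 sync s3://my-gpux-models/DialoGPT-medium ./models/DialoGPT-medium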

🚀 Quick Start

EC2 GPU Instance

# Launch an EC2 GPU instance (this example AMI is Amazon Linux 2 in us-east-1;
# a Deep Learning AMI is a good alternative since it ships with NVIDIA drivers)
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --instance-type p3.2xlarge \
  --key-name my-key \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678

# Connect and install GPUX
ssh -i my-key.pem ec2-user@<instance-ip>
sudo yum update -y
sudo yum install -y python3 python3-pip
pip3 install gpux

# Pull and serve model
gpux pull microsoft/DialoGPT-medium
gpux serve microsoft/DialoGPT-medium --port 8080
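
Setup can also be automated at launch with EC2 user data. A minimal sketch, assuming the script below is saved as install-gpux.sh (an illustrative name); nohup keeps the server running after the boot script finishes:

#!/bin/bash
# EC2 user-data script: runs as root on first boot
yum update -y
yum install -y python3 python3-pip
pip3 install gpux
gpux pull microsoft/DialoGPT-medium
nohup gpux serve microsoft/DialoGPT-medium --port 8080 > /var/log/gpux.log 2>&1 &

# Pass the script at launch time
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --instance-type p3.2xlarge \
  --key-name my-key \
  --security-group-ids sg-12345678 \
  --subnet-id subnet-12345678 \
  --user-data file://install-gpux.sh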

Docker on EC2

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
RUN pip install gpux

# Pull the model from the Hugging Face Hub at build time
# so the image is self-contained
RUN gpux pull microsoft/DialoGPT-medium

EXPOSE 8080
CMD ["gpux", "serve", "microsoft/DialoGPT-medium", "--port", "8080"]

# Build and run on EC2
# (--gpus all requires NVIDIA drivers and the NVIDIA Container Toolkit on the host)
docker build -t gpux-model .
docker run -p 8080:8080 --gpus all gpux-model
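
For ECS or EKS the image must live in a registry such as Amazon ECR. A minimal sketch, assuming a repository named gpux-model already exists (the account ID and region are placeholders matching the examples below):

# Authenticate Docker to ECR, then tag and push the image
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
docker tag gpux-model:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/gpux-model:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/gpux-model:latest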

🔧 Advanced Configuration

ECS with GPU Support

{
  "family": "gpux-model",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["EC2"],
  "cpu": "2048",
  "memory": "4096",
  "containerDefinitions": [
    {
      "name": "gpux-model",
      "image": "my-gpux-model:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "resourceRequirements": [
        {
          "type": "GPU",
          "value": "1"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/gpux-model",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
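
To run this, register the task definition and start a service. A sketch, assuming the JSON above is saved as task-def.json and an ECS cluster named gpux-cluster with GPU container instances already exists:

# Register the task definition and launch it as a service
aws ecs register-task-definition --cli-input-json file://task-def.json
aws ecs create-service \
  --cluster gpux-cluster \
  --service-name gpux-model \
  --task-definition gpux-model \
  --desired-count 1 \
  --launch-type EC2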

EKS with GPU Nodes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpux-model-aws
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpux-model-aws
  template:
    metadata:
      labels:
        app: gpux-model-aws
    spec:
      containers:
      - name: gpux
        image: my-gpux-model:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
            nvidia.com/gpu: 1
          limits:
            memory: "8Gi"
            cpu: "4"
            nvidia.com/gpu: 1
      nodeSelector:
        node.kubernetes.io/instance-type: p3.2xlarge
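
GPU nodes must also run the NVIDIA device plugin DaemonSet so the nvidia.com/gpu resource is schedulable; see the NVIDIA k8s-device-plugin repository for the current manifest. To reach the model from outside the cluster, a LoadBalancer Service is a common choice; a sketch matching the Deployment above:

apiVersion: v1
kind: Service
metadata:
  name: gpux-model-aws
spec:
  type: LoadBalancer
  selector:
    app: gpux-model-aws
  ports:
  - port: 80
    targetPort: 8080

Apply both manifests with kubectl apply -f and verify scheduling with kubectl get pods -o wide.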

💡 Key Takeaways

Success

✅ GPU Instance Support: P3, P4, G4, G5 instances
✅ Container Orchestration: ECS and EKS with GPU support
✅ Model Registry Integration: Pull models from Hugging Face Hub
✅ Auto-scaling: EC2 Auto Scaling Groups
✅ Monitoring: CloudWatch integration


← Previous: Kubernetes | Next: Google Cloud →