HTTP API Endpoints¶

Complete REST API reference for GPUX serving.

Overview¶

GPUX provides a FastAPI-based HTTP server with REST endpoints for inference, health checks, and model information.

Base URL: http://localhost:8080 (default)

Endpoints Summary¶

Method	Endpoint	Description
POST	`/predict`	Run inference
GET	`/health`	Health check
GET	`/info`	Model information
GET	`/metrics`	Performance metrics
GET	`/docs`	Swagger UI documentation
GET	`/redoc`	ReDoc documentation

POST /predict¶

Run inference on input data.

Request¶

Content-Type: application/json

Body:

{
  "input_name": "value",
  ...
}

Example:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -D '{"text": "I love this product!"}'

Response¶

Status: 200 OK

Content-Type: application/json

Body:

{
  "output_name": [values],
  ...
}

Example:

{
  "sentiment": [0.1, 0.9]
}

Error Responses¶

400 Bad Request - Invalid input

{
  "detail": "Missing required input: 'text'"
}

500 Internal Server Error - Inference failed

{
  "detail": "Inference failed: ..."
}

GET /health¶

Health check endpoint for monitoring.

Request¶

curl http://localhost:8080/health

Response¶

Status: 200 OK

Body:

{
  "status": "healthy",
  "model": "sentiment-analysis"
}

GET /info¶

Get model information and specifications.

Request¶

curl http://localhost:8080/info

Response¶

Status: 200 OK

Body:

{
  "name": "sentiment-analysis",
  "version": "1.0.0",
  "format": "onnx",
  "size_mb": 256.0,
  "inputs": [
    {
      "name": "input_ids",
      "type": "int64",
      "shape": [1, 128],
      "required": true,
      "description": "Tokenized input IDs"
    }
  ],
  "outputs": [
    {
      "name": "logits",
      "type": "float32",
      "shape": [1, 2],
      "labels": ["negative", "positive"]
    }
  ],
  "metadata": {}
}

GET /metrics¶

Get performance metrics and provider information.

Request¶

curl http://localhost:8080/metrics

Response¶

Status: 200 OK

Body:

{
  "provider": {
    "name": "CUDAExecutionProvider",
    "available": true,
    "platform": "NVIDIA CUDA",
    "description": "NVIDIA CUDA GPU acceleration"
  },
  "available_providers": [
    "CUDAExecutionProvider",
    "CPUExecutionProvider"
  ]
}

Interactive Documentation¶

Swagger UI¶

Interactive API documentation at /docs:

http://localhost:8080/docs

Features: - Try out endpoints - View request/response schemas - Authentication testing - Download OpenAPI spec

ReDoc¶

Alternative documentation at /redoc:

http://localhost:8080/redoc

Features: - Clean, readable format - API structure overview - Code samples - Search functionality

Example Requests¶

Python¶

import requests

# Predict
response = requests.post(
    "http://localhost:8080/predict",
    json={"text": "I love GPUX!"}
)
result = response.json()
print(result["sentiment"])

# Health check
health = requests.get("http://localhost:8080/health")
print(health.json())

# Model info
info = requests.get("http://localhost:8080/info")
print(info.json()["name"])

JavaScript¶

// Predict
const response = await fetch('http://localhost:8080/predict', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'I love GPUX!' })
});
const result = await response.json();
console.log(result.sentiment);

// Health check
const health = await fetch('http://localhost:8080/health');
const healthData = await health.json();
console.log(healthData.status);

cURL¶

# Predict
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I love GPUX!"}'

# Health check
curl http://localhost:8080/health

# Model info
curl http://localhost:8080/info

# Metrics
curl http://localhost:8080/metrics

Error Handling¶

HTTP Status Codes¶

Code	Description
200	Success
400	Bad Request (invalid input)
404	Not Found
500	Internal Server Error

Error Response Format¶

{
  "detail": "Error message here"
}

Common Errors¶

Missing Input:

{
  "detail": "Missing required input: 'text'"
}

Invalid Input Type:

{
  "detail": "Invalid input type for 'image': expected float32"
}

Inference Timeout:

{
  "detail": "Inference timeout after 30 seconds"
}

HTTP API Endpoints¶

Overview¶

Endpoints Summary¶

POST /predict¶

Request¶

Response¶

Error Responses¶

GET /health¶

Request¶

Response¶

GET /info¶

Request¶

Response¶

GET /metrics¶

Request¶

Response¶

Interactive Documentation¶

Swagger UI¶

ReDoc¶

Example Requests¶

Python¶

JavaScript¶

cURL¶

Error Handling¶

HTTP Status Codes¶

Error Response Format¶

Common Errors¶

See Also¶