# Practical Examples: Audio Models with GPUX CLI

Step-by-step examples for using modern HuggingFace audio models with the GPUX CLI.
## 🎯 Recommended Models to Get Started

### Whisper Tiny (Fast and Lightweight)

```bash
# 1. Download the model
gpux pull openai/whisper-tiny

# 2. Run inference
gpux run openai/whisper-tiny \
    --input '{"audio": "path/to/audio.wav"}'
```
### Whisper Base (Good Balance)

```bash
# 1. Download
gpux pull openai/whisper-base

# 2. Run with a JSON file
echo '{"audio": "audio.wav"}' > input.json
gpux run openai/whisper-base --file input.json

# 3. Save results
gpux run openai/whisper-base \
    --file input.json \
    --output results.json
```
## 📝 Complete Examples by Model

### 1. Whisper Small (Better Accuracy)

```bash
# Step 1: Download the model
gpux pull openai/whisper-small

# Step 2: Create the input file
cat > whisper_input.json << EOF
{
  "audio": "/path/to/your/audio.wav"
}
EOF

# Step 3: Run inference
gpux run openai/whisper-small \
    --file whisper_input.json \
    --output whisper_results.json

# Step 4: View results
cat whisper_results.json
```
### 2. Wav2Vec2 Base

```bash
# Download
gpux pull facebook/wav2vec2-base-960h

# Run
gpux run facebook/wav2vec2-base-960h \
    --input '{"audio": "audio.wav"}' \
    --output wav2vec2_results.json
```
### 3. HuBERT Base

```bash
# Download
gpux pull facebook/hubert-base-ls960

# Run with a specific provider
gpux run facebook/hubert-base-ls960 \
    --input '{"audio": "audio.wav"}' \
    --provider coreml \
    --output hubert_results.json
```
### 4. SpeechT5 (Text-to-Speech)

```bash
# Download
gpux pull microsoft/speecht5_tts

# Run (note: SpeechT5 takes text as input, not audio)
gpux run microsoft/speecht5_tts \
    --input '{"text": "Hello world"}' \
    --output speech_output.json
```
## 🔄 Complete Workflows

### Workflow 1: Compare Multiple Models

```bash
#!/bin/bash
# Script to compare multiple audio models

AUDIO_FILE="path/to/audio.wav"

# Models to test
MODELS=(
    "openai/whisper-tiny"
    "openai/whisper-base"
    "openai/whisper-small"
    "facebook/wav2vec2-base-960h"
)

# Download all models
for model in "${MODELS[@]}"; do
    echo "Downloading $model..."
    gpux pull "$model"
done

# Test each model
for model in "${MODELS[@]}"; do
    echo "Testing $model..."
    # Replace "/" with "--" to build a filesystem-safe name
    # (tr can't do multi-character replacements, so use parameter expansion)
    MODEL_NAME="${model//\//--}"
    gpux run "$model" \
        --input "{\"audio\": \"$AUDIO_FILE\"}" \
        --output "${MODEL_NAME}_results.json"
done

echo "All tests completed! Results saved in *_results.json files"
```
### Workflow 2: Performance Benchmark

```bash
#!/bin/bash
# Script to benchmark audio models

MODEL="openai/whisper-tiny"
AUDIO_FILE="audio.wav"

# Download the model if it isn't cached yet
gpux pull "$MODEL"

# Run the benchmark: 100 timed runs after 10 warmup runs
gpux run "$MODEL" \
    --input "{\"audio\": \"$AUDIO_FILE\"}" \
    --benchmark \
    --runs 100 \
    --warmup 10 \
    --output benchmark_results.json
```
### Workflow 3: Batch Processing

```bash
#!/bin/bash
# Script to process multiple audio files

MODEL="openai/whisper-tiny"
AUDIO_FILES=("audio1.wav" "audio2.wav" "audio3.wav")

# Download the model
gpux pull "$MODEL"

# Process each file
for audio_file in "${AUDIO_FILES[@]}"; do
    output_file="${audio_file%.wav}_results.json"
    echo "Processing $audio_file..."
    gpux run "$MODEL" \
        --input "{\"audio\": \"$audio_file\"}" \
        --output "$output_file"
done
```
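To process every `.wav` file in a directory instead of a fixed list, a glob variant of the same loop works. This is a sketch that uses only the flags shown above:

```bash
# Process all .wav files in the current directory
for audio_file in *.wav; do
    output_file="${audio_file%.wav}_results.json"
    echo "Processing $audio_file..."
    gpux run "$MODEL" \
        --input "{\"audio\": \"$audio_file\"}" \
        --output "$output_file"
done
```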
## 🎵 Input Formats

### Local File
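The simplest case is a path on the local filesystem, passed through the same `--input` JSON used throughout this page:

```bash
# Absolute paths are safest (see Tips below)
gpux run openai/whisper-tiny \
    --input '{"audio": "/path/to/audio.wav"}'
```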
### URL
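Assuming GPUX resolves a remote URL in the `audio` field the same way it resolves a local path (this page does not show the exact syntax, so treat it as an assumption), the call would look like:

```bash
# Assumption: the audio field accepts an HTTP(S) URL
gpux run openai/whisper-tiny \
    --input '{"audio": "https://example.com/audio.wav"}'
```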
### Base64 (for small files)

```bash
# Encode the file to base64
# (-i is the macOS flag; on Linux use: base64 -w0 audio.wav)
AUDIO_B64=$(base64 -i audio.wav)

# Use it as a data URI in GPUX
gpux run openai/whisper-tiny \
    --input "{\"audio\": \"data:audio/wav;base64,$AUDIO_B64\"}"
```
### JSON File
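For larger or reusable inputs, write the JSON to a file and pass it with `--file`, exactly as the Whisper Base and Whisper Small examples above do:

```bash
# Write the input once, reuse it across runs
cat > input.json << EOF
{
  "audio": "/path/to/audio.wav"
}
EOF

gpux run openai/whisper-tiny --file input.json
```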
## ⚡ Performance Optimization

### Use a Specific GPU Provider

```bash
# Apple Silicon (CoreML)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider coreml

# NVIDIA (CUDA)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider cuda

# AMD (ROCm)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider rocm
```
### Benchmark with Multiple Configurations

```bash
# Test with different providers
for provider in coreml cuda cpu; do
    echo "Testing with $provider..."
    gpux run openai/whisper-tiny \
        --input '{"audio": "audio.wav"}' \
        --provider "$provider" \
        --benchmark \
        --runs 50 \
        --output "benchmark_${provider}.json"
done
```
## 🔍 Inspect Models

### View Model Information
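A sketch of the call, assuming the CLI exposes an `inspect` subcommand matching this section's name (the exact subcommand is an assumption; check `gpux --help` if it differs):

```bash
# Assumption: `gpux inspect` prints the model's metadata
gpux inspect openai/whisper-tiny
```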
This will show:

- Model inputs and outputs
- Shapes and data types
- Preprocessing configuration
- GPU provider information
## 📊 Results Analysis

### View Results in a Readable Format
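`jq` pretty-prints the saved JSON; this assumes only that the file written by `--output` is valid JSON:

```bash
# Pretty-print saved results
jq . whisper_results.json

# Without jq, Python's standard library works too
python3 -m json.tool whisper_results.json
```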
### Compare Results from Multiple Models
```bash
#!/bin/bash
# Script to compare results across models

MODELS=("openai/whisper-tiny" "openai/whisper-base")
AUDIO="audio.wav"

for model in "${MODELS[@]}"; do
    # Replace "/" with "--" to build a filesystem-safe name
    MODEL_NAME="${model//\//--}"
    gpux run "$model" \
        --input "{\"audio\": \"$AUDIO\"}" \
        --output "${MODEL_NAME}_results.json"
done

# Compare results
echo "Comparing results..."
for result_file in *_results.json; do
    echo "=== $result_file ==="
    jq . "$result_file"
done
```
## 🛠️ Troubleshooting

### Verify a Model Is Downloaded

```bash
# List downloaded models
ls -la ~/.gpux/models/

# Verify a specific model
ls -la ~/.gpux/models/openai--whisper-tiny/
```
### Re-download a Model
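This page does not show a dedicated re-download flag, so a safe approach is to delete the cached copy (using the cache layout shown above) and pull again:

```bash
# Remove the cached model, then pull a fresh copy
rm -rf ~/.gpux/models/openai--whisper-tiny/
gpux pull openai/whisper-tiny
```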
### View Detailed Logs

```bash
# Pull with verbose output
gpux pull openai/whisper-tiny --verbose

# Run with verbose output
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --verbose
```
## 💡 Tips and Best Practices

- **Start with small models**: Use `whisper-tiny` first to verify everything works
- **Use absolute paths**: To avoid issues with relative paths
- **Save results**: Use `--output` to save results and compare them later
- **Verify the audio format**: Make sure the audio is WAV, MP3, or FLAC
- **Use benchmark**: To compare performance between models and providers
- **Check logs**: Use `--verbose` if something doesn't work as expected