
Practical Examples: Audio Models with GPUX CLI

Step-by-step examples for using modern HuggingFace audio models with the GPUX CLI.


Whisper Tiny (Fast and Lightweight)

# 1. Download the model
gpux pull openai/whisper-tiny

# 2. Run inference
gpux run openai/whisper-tiny \
  --input '{"audio": "path/to/audio.wav"}'

Whisper Base (Good Balance)

# 1. Download
gpux pull openai/whisper-base

# 2. Run with JSON file
echo '{"audio": "audio.wav"}' > input.json
gpux run openai/whisper-base --file input.json

# 3. Save results
gpux run openai/whisper-base \
  --file input.json \
  --output results.json

📝 Complete Examples by Model

1. Whisper Small (Better Precision)

# Step 1: Download the model
gpux pull openai/whisper-small

# Step 2: Create input file
cat > whisper_input.json << EOF
{
  "audio": "/path/to/your/audio.wav"
}
EOF

# Step 3: Run inference
gpux run openai/whisper-small \
  --file whisper_input.json \
  --output whisper_results.json

# Step 4: View results
cat whisper_results.json
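If you edit the input file by hand, a quick `jq` check (jq is already used elsewhere on this page) catches malformed JSON before you spend inference time on it. This sketch recreates the file from Step 2 so it stands alone:

```shell
# Recreate the input file, then verify it parses as JSON.
cat > whisper_input.json << 'EOF'
{
  "audio": "/path/to/your/audio.wav"
}
EOF

# "jq empty" produces no output and exits non-zero on invalid JSON.
jq empty whisper_input.json && echo "whisper_input.json is valid JSON"
```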

2. Wav2Vec2 Base

# Download
gpux pull facebook/wav2vec2-base-960h

# Run
gpux run facebook/wav2vec2-base-960h \
  --input '{"audio": "audio.wav"}' \
  --output wav2vec2_results.json

3. HuBERT Base

# Download
gpux pull facebook/hubert-base-ls960

# Run with specific provider
gpux run facebook/hubert-base-ls960 \
  --input '{"audio": "audio.wav"}' \
  --provider coreml \
  --output hubert_results.json

4. SpeechT5 (Text-to-Speech)

# Download
gpux pull microsoft/speecht5_tts

# Run (note: SpeechT5 requires text as input)
gpux run microsoft/speecht5_tts \
  --input '{"text": "Hello world"}' \
  --output speech_output.json

🔄 Complete Workflows

Workflow 1: Compare Multiple Models

#!/bin/bash
# Script to compare multiple audio models

AUDIO_FILE="path/to/audio.wav"

# Models to test
MODELS=(
  "openai/whisper-tiny"
  "openai/whisper-base"
  "openai/whisper-small"
  "facebook/wav2vec2-base-960h"
)

# Download all models
for model in "${MODELS[@]}"; do
  echo "Downloading $model..."
  gpux pull "$model"
done

# Test each model
for model in "${MODELS[@]}"; do
  echo "Testing $model..."
  MODEL_NAME=${model//\//--}  # "/" -> "--"; tr cannot map one character to two
  gpux run "$model" \
    --input "{\"audio\": \"$AUDIO_FILE\"}" \
    --output "${MODEL_NAME}_results.json"
done

echo "All tests completed! Results saved in *_results.json files"

Workflow 2: Performance Benchmark

#!/bin/bash
# Script to benchmark audio models

MODEL="openai/whisper-tiny"
AUDIO_FILE="audio.wav"

# Download model if it doesn't exist
gpux pull "$MODEL"

# Run benchmark
gpux run "$MODEL" \
  --input "{\"audio\": \"$AUDIO_FILE\"}" \
  --benchmark \
  --runs 100 \
  --warmup 10 \
  --output benchmark_results.json

Workflow 3: Batch Processing

#!/bin/bash
# Script to process multiple audio files

MODEL="openai/whisper-tiny"
AUDIO_FILES=("audio1.wav" "audio2.wav" "audio3.wav")

# Download model
gpux pull "$MODEL"

# Process each file
for audio_file in "${AUDIO_FILES[@]}"; do
  output_file="${audio_file%.wav}_results.json"

  echo "Processing $audio_file..."
  gpux run "$MODEL" \
    --input "{\"audio\": \"$audio_file\"}" \
    --output "$output_file"
done
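Instead of hardcoding the file list, the same loop can walk every `.wav` in a directory. This is a sketch with the `gpux run` call replaced by a `printf` so the naming logic is visible on its own (bash-specific, because of `shopt`):

```shell
# Print the result path each .wav in a directory would map to.
# nullglob makes the loop a no-op when nothing matches, instead of
# iterating once over the literal "*.wav" pattern.
batch_result_paths() {
  local dir="$1" f
  shopt -s nullglob
  for f in "$dir"/*.wav; do
    printf '%s\n' "${f%.wav}_results.json"
  done
  shopt -u nullglob
}

batch_result_paths .
```

Swap the `printf` for the `gpux run` invocation from the script above to process the files for real.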

🎵 Input Formats

Local File

gpux run openai/whisper-tiny \
  --input '{"audio": "/absolute/path/to/audio.wav"}'

URL

gpux run openai/whisper-tiny \
  --input '{"audio": "https://example.com/audio.mp3"}'

Base64 (for small files)

# Encode file to base64 (-i is macOS/BSD syntax; on Linux use: base64 -w0 audio.wav)
AUDIO_B64=$(base64 -i audio.wav)

# Use in GPUX
gpux run openai/whisper-tiny \
  --input "{\"audio\": \"data:audio/wav;base64,$AUDIO_B64\"}"
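Inlining base64 only makes sense for short clips. This sketch wraps the encoding with a size guard (the 1 MiB cutoff is an arbitrary example, not a documented GPUX limit) and strips newlines so the `data:` URI stays on one line with both GNU and BSD `base64`:

```shell
# Encode a file as a data: URI, refusing files above a size cutoff.
encode_audio() {
  local file="$1"
  local max_bytes=$((1024 * 1024))  # example cutoff, not a GPUX limit
  local size
  size=$(wc -c < "$file")
  if [ "$size" -gt "$max_bytes" ]; then
    echo "refusing to inline $file ($size bytes)" >&2
    return 1
  fi
  # tr -d '\n' removes the line wraps some base64 implementations add
  printf 'data:audio/wav;base64,%s\n' "$(base64 < "$file" | tr -d '\n')"
}
```

Usage: `gpux run openai/whisper-tiny --input "{\"audio\": \"$(encode_audio audio.wav)\"}"`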

JSON File

{
  "audio": "/path/to/audio.wav"
}
gpux run openai/whisper-tiny --file input.json
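The scripts on this page build the `--input` string by interpolating shell variables directly into JSON, which breaks if a path contains quotes or backslashes. `jq` (already used on this page for reading output) can construct the JSON with proper escaping:

```shell
# Build the {"audio": ...} input JSON safely, whatever the path contains.
make_input() {
  jq -cn --arg audio "$1" '{audio: $audio}'
}

make_input "/path/with \"quotes\".wav"
```

Usage: `gpux run openai/whisper-tiny --input "$(make_input "$AUDIO_FILE")"`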

⚡ Performance Optimization

Use Specific GPU Provider

# Apple Silicon (CoreML)
gpux run openai/whisper-tiny \
  --input '{"audio": "audio.wav"}' \
  --provider coreml

# NVIDIA (CUDA)
gpux run openai/whisper-tiny \
  --input '{"audio": "audio.wav"}' \
  --provider cuda

# AMD (ROCm)
gpux run openai/whisper-tiny \
  --input '{"audio": "audio.wav"}' \
  --provider rocm

Benchmark with Multiple Configurations

# Test with different providers
for provider in coreml cuda cpu; do
  echo "Testing with $provider..."
  gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider "$provider" \
    --benchmark \
    --runs 50 \
    --output "benchmark_${provider}.json"
done

🔍 Inspect Models

View Model Information

# Inspect downloaded model
gpux inspect openai/whisper-tiny

This will show:

- Model inputs and outputs
- Shapes and data types
- Preprocessing configuration
- GPU provider information


📊 Results Analysis

View Results in Readable Format

# Run and view results
gpux run openai/whisper-tiny \
  --input '{"audio": "audio.wav"}' | jq .
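To pull a single field out of saved results, `jq` works the same way. Note that the `.text` key below is only an assumption about the output schema — run `gpux inspect` on your model to see the real output names. The sketch writes a small sample file so it stands alone:

```shell
# Sample results file; the real schema depends on the model
# (check it with: gpux inspect openai/whisper-tiny).
cat > sample_results.json << 'EOF'
{"text": "hello world", "language": "en"}
EOF

# -r prints the raw string instead of a JSON-quoted one.
jq -r '.text' sample_results.json
```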

Compare Results from Multiple Models

#!/bin/bash
# Script to compare results

MODELS=("openai/whisper-tiny" "openai/whisper-base")
AUDIO="audio.wav"

for model in "${MODELS[@]}"; do
  MODEL_NAME=${model//\//--}  # "/" -> "--"; tr cannot map one character to two
  gpux run "$model" \
    --input "{\"audio\": \"$AUDIO\"}" \
    --output "${MODEL_NAME}_results.json"
done

# Compare results
echo "Comparing results..."
for result_file in *_results.json; do
  echo "=== $result_file ==="
  jq . "$result_file"
done

🛠️ Troubleshooting

Verify Model is Downloaded

# List downloaded models
ls -la ~/.gpux/models/

# Verify specific model
ls -la ~/.gpux/models/openai--whisper-tiny/
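A downloaded-model check can reuse the directory layout shown above, where slashes in the model id become `--` on disk:

```shell
# Path where a model is cached, following the ~/.gpux/models layout.
model_dir_for() {
  printf '%s/.gpux/models/%s\n' "$HOME" "${1//\//--}"
}

model="openai/whisper-tiny"
if [ -d "$(model_dir_for "$model")" ]; then
  echo "$model is already downloaded"
else
  echo "$model not found locally; run: gpux pull $model"
fi
```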

Re-download Model

# Force re-download
gpux pull openai/whisper-tiny --force

View Detailed Logs

# Pull with verbose
gpux pull openai/whisper-tiny --verbose

# Run with verbose
gpux run openai/whisper-tiny \
  --input '{"audio": "audio.wav"}' \
  --verbose

💡 Tips and Best Practices

  1. Start with small models: use whisper-tiny first to verify that everything works
  2. Use absolute paths: relative paths are resolved against the current directory and can silently point at the wrong file
  3. Save results: use --output so you can compare results later
  4. Verify the audio format: make sure the audio is WAV, MP3, or FLAC
  5. Benchmark: use --benchmark to compare performance across models and providers
  6. Check logs: use --verbose when something doesn't work as expected
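Tip 4 can be automated with a quick filename check before submitting a job. Note this only inspects the extension, not the actual container contents (use the `file` command for that):

```shell
# Accept only the extensions listed in tip 4, case-insensitively.
is_supported_audio() {
  local ext="${1##*.}"
  case "$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')" in
    wav|mp3|flac) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_audio "audio.WAV" && echo "audio.WAV looks OK"
```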
