# Practical Examples: Audio Models with GPUX CLI

Step-by-step examples for using modern HuggingFace audio models with the GPUX CLI.
## 🎯 Recommended Models to Get Started

### Whisper Tiny (Fast and Lightweight)

```bash
# 1. Download the model
gpux pull openai/whisper-tiny

# 2. Run inference
gpux run openai/whisper-tiny \
    --input '{"audio": "path/to/audio.wav"}'
```
### Whisper Base (Good Balance)

```bash
# 1. Download
gpux pull openai/whisper-base

# 2. Run with a JSON file
echo '{"audio": "audio.wav"}' > input.json
gpux run openai/whisper-base --file input.json

# 3. Save results
gpux run openai/whisper-base \
    --file input.json \
    --output results.json
```
## 📝 Complete Examples by Model

### 1. Whisper Small (Better Accuracy)

```bash
# Step 1: Download the model
gpux pull openai/whisper-small

# Step 2: Create the input file
cat > whisper_input.json << EOF
{
  "audio": "/path/to/your/audio.wav"
}
EOF

# Step 3: Run inference
gpux run openai/whisper-small \
    --file whisper_input.json \
    --output whisper_results.json

# Step 4: View results
cat whisper_results.json
```
### 2. Wav2Vec2 Base

```bash
# Download
gpux pull facebook/wav2vec2-base-960h

# Run
gpux run facebook/wav2vec2-base-960h \
    --input '{"audio": "audio.wav"}' \
    --output wav2vec2_results.json
```
### 3. HuBERT Base

```bash
# Download
gpux pull facebook/hubert-base-ls960

# Run with a specific provider
gpux run facebook/hubert-base-ls960 \
    --input '{"audio": "audio.wav"}' \
    --provider coreml \
    --output hubert_results.json
```
### 4. SpeechT5 (Text-to-Speech)

```bash
# Download
gpux pull microsoft/speecht5_tts

# Run (note: SpeechT5 takes text as input, not audio)
gpux run microsoft/speecht5_tts \
    --input '{"text": "Hello world"}' \
    --output speech_output.json
```
## 🔄 Complete Workflows

### Workflow 1: Compare Multiple Models

```bash
#!/bin/bash
# Script to compare multiple audio models

AUDIO_FILE="path/to/audio.wav"

# Models to test
MODELS=(
    "openai/whisper-tiny"
    "openai/whisper-base"
    "openai/whisper-small"
    "facebook/wav2vec2-base-960h"
)

# Download all models
for model in "${MODELS[@]}"; do
    echo "Downloading $model..."
    gpux pull "$model"
done

# Test each model
for model in "${MODELS[@]}"; do
    echo "Testing $model..."
    # Replace "/" with "--" to build a filesystem-safe name
    # (tr can't do multi-character replacements, so use parameter expansion)
    MODEL_NAME="${model//\//--}"
    gpux run "$model" \
        --input "{\"audio\": \"$AUDIO_FILE\"}" \
        --output "${MODEL_NAME}_results.json"
done

echo "All tests completed! Results saved in *_results.json files"
```
### Workflow 2: Performance Benchmark

```bash
#!/bin/bash
# Script to benchmark audio models

MODEL="openai/whisper-tiny"
AUDIO_FILE="audio.wav"

# Download the model if it isn't cached yet
gpux pull "$MODEL"

# Run the benchmark: 100 timed runs after 10 warmup runs
gpux run "$MODEL" \
    --input "{\"audio\": \"$AUDIO_FILE\"}" \
    --benchmark \
    --runs 100 \
    --warmup 10 \
    --output benchmark_results.json
```
### Workflow 3: Batch Processing

```bash
#!/bin/bash
# Script to process multiple audio files

MODEL="openai/whisper-tiny"
AUDIO_FILES=("audio1.wav" "audio2.wav" "audio3.wav")

# Download the model
gpux pull "$MODEL"

# Process each file
for audio_file in "${AUDIO_FILES[@]}"; do
    output_file="${audio_file%.wav}_results.json"
    echo "Processing $audio_file..."
    gpux run "$MODEL" \
        --input "{\"audio\": \"$audio_file\"}" \
        --output "$output_file"
done
```
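To process every `.wav` file in a directory instead of a fixed list, a glob variant of the same loop works. This is a sketch that uses only the flags shown above:

```bash
# Process all .wav files in the current directory
for audio_file in *.wav; do
    output_file="${audio_file%.wav}_results.json"
    echo "Processing $audio_file..."
    gpux run "$MODEL" \
        --input "{\"audio\": \"$audio_file\"}" \
        --output "$output_file"
done
```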
## 🎵 Input Formats

### Local File
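The simplest case is a path on the local filesystem, passed through the same `--input` JSON used throughout this page:

```bash
# Absolute paths are safest (see Tips below)
gpux run openai/whisper-tiny \
    --input '{"audio": "/path/to/audio.wav"}'
```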
### URL
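Assuming GPUX resolves a remote URL in the `audio` field the same way it resolves a local path (this page does not show the exact syntax, so treat it as an assumption), the call would look like:

```bash
# Assumption: the audio field accepts an HTTP(S) URL
gpux run openai/whisper-tiny \
    --input '{"audio": "https://example.com/audio.wav"}'
```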
### Base64 (for small files)

```bash
# Encode the file to base64
# (-i is the macOS flag; on Linux use: base64 -w0 audio.wav)
AUDIO_B64=$(base64 -i audio.wav)

# Use it as a data URI in GPUX
gpux run openai/whisper-tiny \
    --input "{\"audio\": \"data:audio/wav;base64,$AUDIO_B64\"}"
```
### JSON File
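For larger or reusable inputs, write the JSON to a file and pass it with `--file`, exactly as the Whisper Base and Whisper Small examples above do:

```bash
# Write the input once, reuse it across runs
cat > input.json << EOF
{
  "audio": "/path/to/audio.wav"
}
EOF

gpux run openai/whisper-tiny --file input.json
```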
## ⚡ Performance Optimization

### Use a Specific GPU Provider

```bash
# Apple Silicon (CoreML)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider coreml

# NVIDIA (CUDA)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider cuda

# AMD (ROCm)
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --provider rocm
```
### Benchmark with Multiple Configurations

```bash
# Test with different providers
for provider in coreml cuda cpu; do
    echo "Testing with $provider..."
    gpux run openai/whisper-tiny \
        --input '{"audio": "audio.wav"}' \
        --provider "$provider" \
        --benchmark \
        --runs 50 \
        --output "benchmark_${provider}.json"
done
```
## 🔍 Inspect Models

### View Model Information
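A sketch of the call, assuming the CLI exposes an `inspect` subcommand matching this section's name (the exact subcommand is an assumption; check `gpux --help` if it differs):

```bash
# Assumption: `gpux inspect` prints the model's metadata
gpux inspect openai/whisper-tiny
```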
This will show:

- Model inputs and outputs
- Shapes and data types
- Preprocessing configuration
- GPU provider information
## 📊 Results Analysis

### View Results in a Readable Format
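`jq` pretty-prints the saved JSON; this assumes only that the file written by `--output` is valid JSON:

```bash
# Pretty-print saved results
jq . whisper_results.json

# Without jq, Python's standard library works too
python3 -m json.tool whisper_results.json
```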
### Compare Results from Multiple Models
```bash
#!/bin/bash
# Script to compare results across models

MODELS=("openai/whisper-tiny" "openai/whisper-base")
AUDIO="audio.wav"

for model in "${MODELS[@]}"; do
    # Replace "/" with "--" to build a filesystem-safe name
    MODEL_NAME="${model//\//--}"
    gpux run "$model" \
        --input "{\"audio\": \"$AUDIO\"}" \
        --output "${MODEL_NAME}_results.json"
done

# Compare results
echo "Comparing results..."
for result_file in *_results.json; do
    echo "=== $result_file ==="
    jq . "$result_file"
done
```
## 🛠️ Troubleshooting

### Verify a Model Is Downloaded

```bash
# List downloaded models
ls -la ~/.gpux/models/

# Verify a specific model
ls -la ~/.gpux/models/openai--whisper-tiny/
```
### Re-download a Model
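This page does not show a dedicated re-download flag, so a safe approach is to delete the cached copy (using the cache layout shown above) and pull again:

```bash
# Remove the cached model, then pull a fresh copy
rm -rf ~/.gpux/models/openai--whisper-tiny/
gpux pull openai/whisper-tiny
```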
### View Detailed Logs

```bash
# Pull with verbose output
gpux pull openai/whisper-tiny --verbose

# Run with verbose output
gpux run openai/whisper-tiny \
    --input '{"audio": "audio.wav"}' \
    --verbose
```
## 💡 Tips and Best Practices

- **Start with small models**: Use `whisper-tiny` first to verify everything works
- **Use absolute paths**: To avoid issues with relative paths
- **Save results**: Use `--output` to save results and compare them later
- **Verify the audio format**: Make sure the audio is WAV, MP3, or FLAC
- **Use benchmark**: To compare performance between models and providers
- **Check logs**: Use `--verbose` if something doesn't work as expected