
Testing Audio Preprocessing

Complete guide for testing audio preprocessing in GPUX.


🚀 Ways to Test

1. Automated Test Script

Run the test script that includes several test cases:

```bash
uv run python scripts/test_audio_preprocessing.py
```

This script tests:

  • ✅ Raw audio preprocessing
  • ✅ Mel spectrogram extraction (Whisper)
  • ✅ Pipeline integration
  • ✅ Loading from base64


2. From Command Line (CLI)

Option A: With a local model

  1. Create a configuration file gpux.yml:

```yaml
name: audio-model
version: 1.0.0

model:
  source: ./model.onnx
  format: onnx

inputs:
  - name: audio
    type: float32
    shape: [1, 16000]

outputs:
  - name: output
    type: float32
    shape: [1, 1000]

preprocessing:
  audio_sample_rate: 16000
  audio_feature_extraction: raw
```

  2. Create an input file input.json:

```json
{
  "audio": "/path/to/your/audio.wav"
}
```

  3. Run the model:

```bash
gpux run audio-model --input @input.json
```

Option B: With audio file directly

```bash
# With local file
gpux run audio-model --input '{"audio": "/path/to/audio.wav"}'

# With URL
gpux run audio-model --input '{"audio": "https://example.com/audio.mp3"}'

# With base64 (for small tests)
gpux run audio-model --input '{"audio": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10..."}'
```
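For the base64 variant, you can build the data URI yourself with the standard library. A minimal sketch (the file name is illustrative; it writes a short silent WAV just so there is something to encode):

```python
import base64
import struct
import wave

# Write a tiny 0.1 s silent WAV (16-bit mono, 16 kHz) to encode.
with wave.open("tiny.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(16000)
    f.writeframes(struct.pack("<1600h", *([0] * 1600)))

# Encode it in the data-URI form shown above.
with open("tiny.wav", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")
data_uri = f"data:audio/wav;base64,{b64}"

print(data_uri[:40])
```

The resulting `data_uri` string can be passed as the `"audio"` value in `--input`.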

Option C: For Whisper models (mel spectrogram)

```yaml
# gpux.yml
preprocessing:
  audio_sample_rate: 16000
  audio_feature_extraction: mel_spectrogram
  audio_n_mels: 80
  audio_n_fft: 400
  audio_hop_length: 160
```

```bash
gpux run whisper-model --input '{"audio": "/path/to/audio.wav"}'
```

3. Programmatically (Python)

Basic Example

```python
from pathlib import Path
from gpux.config.parser import PreprocessingConfig
from gpux.core.models import InputSpec, ModelInfo, OutputSpec
from gpux.core.preprocessing.audio import AudioPreprocessor

# Create preprocessor
preprocessor = AudioPreprocessor()

# Configuration
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="raw",
)

# Model information
model_info = ModelInfo(
    name="test-model",
    version="1.0.0",
    format="onnx",
    path=Path("model.onnx"),
    inputs=[
        InputSpec(name="audio", type="float32", shape=[1, 16000]),
    ],
    outputs=[OutputSpec(name="output", type="float32", shape=[1, 1000])],
    metadata={},
)

# Preprocess
input_data = {"audio": "/path/to/audio.wav"}
result = preprocessor.preprocess(input_data, config, model_info)

print(f"Shape: {result['audio'].shape}")
print(f"Dtype: {result['audio'].dtype}")
```

Example with Pipeline

```python
from gpux.core.preprocessing.pipeline import PreprocessingPipeline
from gpux.config.parser import PreprocessingConfig

# Create pipeline
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="mel_spectrogram",
    audio_n_mels=80,
)
pipeline = PreprocessingPipeline(config=config)

# Process (model_info as defined in the basic example above)
input_data = {"audio": "/path/to/audio.wav"}
result = pipeline.process(input_data, model_info)
```

Complete Example with Runtime

```python
from pathlib import Path

from gpux.core.runtime import GPUXRuntime
from gpux.config.parser import GPUXConfigParser

# Load configuration
parser = GPUXConfigParser()
config = parser.parse_file("gpux.yml")

# Create runtime with preprocessing
runtime = GPUXRuntime(
    model_path=Path("model.onnx"),
    preprocessing_config=config.preprocessing,
)

# Run inference
input_data = {"audio": "/path/to/audio.wav"}
output = runtime.infer(input_data)

print(output)
```

4. Create Test Audio File

If you need to create an audio file for testing:

```python
import numpy as np
import soundfile as sf

# Generate test signal (1 second, 440 Hz)
duration = 1.0
sample_rate = 16000
t = np.linspace(0, duration, int(sample_rate * duration))
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Save
sf.write("test_audio.wav", audio, sample_rate)
print("✓ File created: test_audio.wav")
```

Or using the script:

```bash
uv run python -c "
import numpy as np
import soundfile as sf
duration = 1.0
sample_rate = 16000
t = np.linspace(0, duration, int(sample_rate * duration))
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)
sf.write('test_audio.wav', audio, sample_rate)
print('✓ Created: test_audio.wav')
"
```

📋 Supported Audio Formats

  • WAV (recommended)
  • MP3
  • FLAC
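For WAV files (the recommended format), you can sanity-check the header with Python's stdlib `wave` module before handing the file to GPUX; MP3 and FLAC require a decoder such as soundfile or librosa. A minimal sketch (it first writes a small WAV so the check has something to read; the file name is illustrative):

```python
import struct
import wave

# Create a 0.5 s silent 16 kHz mono WAV to inspect.
with wave.open("check.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(16000)
    f.writeframes(struct.pack("<8000h", *([0] * 8000)))

# Inspect the header: sample rate, channel count, and duration.
with wave.open("check.wav", "rb") as f:
    rate = f.getframerate()
    channels = f.getnchannels()
    duration = f.getnframes() / rate

print(rate, channels, duration)  # 16000 1 0.5
```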

🔍 Verify It Works

Verify librosa is installed:

```bash
uv run python -c "import librosa; print(f'librosa version: {librosa.__version__}')"
```

Verify preprocessor is registered:

```python
from gpux.core.preprocessing.registry import get_registry

registry = get_registry()
preprocessors = registry.get_all_preprocessors()
print([p.__class__.__name__ for p in preprocessors])
# Should include: ['TextPreprocessor', 'ImagePreprocessor', 'AudioPreprocessor']
```

Verify it can handle audio:

```python
from gpux.core.preprocessing.audio import AudioPreprocessor

preprocessor = AudioPreprocessor()
input_data = {"audio": "/path/to/audio.wav"}

can_handle = preprocessor.can_handle(input_data)
print(f"Can handle audio: {can_handle}")  # Should be True
```

🐛 Troubleshooting

Error: "librosa library not available"

Solution: Install dependencies:

```bash
uv sync
```

Error: "Audio file not found"

Solution: Verify the file path is correct:

```bash
ls -la /path/to/audio.wav
```

Error: "Failed to load audio"

Solution: Verify the audio format is compatible (WAV, MP3, FLAC).

Preprocessor not selected

Solution: Verify the input has the "audio" key:

```python
input_data = {"audio": "/path/to/audio.wav"}  # ✓ Correct
input_data = {"sound": "/path/to/audio.wav"}  # ✗ Incorrect
```

📚 Additional Examples

Example with Resampling

```python
config = PreprocessingConfig(
    audio_sample_rate=8000,  # Resample to 8kHz
    audio_feature_extraction="raw",
)
```

Example with Custom Mel Spectrogram

```python
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="mel_spectrogram",
    audio_n_mels=128,  # More mel bands
    audio_n_fft=512,   # Larger FFT window
    audio_hop_length=256,
)
```

✅ Testing Checklist

  • Test script runs without errors
  • Raw audio preprocessing works
  • Mel spectrogram extraction works
  • Loading from local file works
  • Loading from URL works (if you have access)
  • Loading from base64 works
  • Resampling works correctly
  • Integrated pipeline works
  • Runtime with preprocessing works

🎯 Next Steps

Once you've verified that preprocessing works:

  1. Integrate with a real model: Use a real ONNX audio model (Whisper, Wav2Vec, etc.)
  2. Test with real audio: Use real audio files instead of synthetic signals
  3. Optimize configuration: Adjust parameters according to your specific model
  4. Test in production: Run with real data in your production environment