Testing Audio Preprocessing¶
Complete guide for testing audio preprocessing in GPUX.
🚀 Ways to Test¶
1. Automated Test Script¶
Run the test script that includes several test cases:
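For example (the script path below is an assumption; adjust it to wherever the test script lives in your checkout):

```bash
# Hypothetical location of the test script -- adjust to your repository layout
uv run python scripts/test_audio_preprocessing.py
```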
This script tests:

- ✅ Raw audio preprocessing
- ✅ Mel spectrogram extraction (Whisper)
- ✅ Pipeline integration
- ✅ Loading from base64
2. From Command Line (CLI)¶
Option A: With a local model¶
- Create a configuration file gpux.yml:
name: audio-model
version: 1.0.0

model:
  source: ./model.onnx
  format: onnx

inputs:
  - name: audio
    type: float32
    shape: [1, 16000]

outputs:
  - name: output
    type: float32
    shape: [1, 1000]

preprocessing:
  audio_sample_rate: 16000
  audio_feature_extraction: raw
- Create an input file input.json:
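For example, pointing at a local WAV file (the path is a placeholder; the JSON mirrors the inline form the CLI accepts in Option B):

```json
{
  "audio": "/path/to/audio.wav"
}
```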
- Run the model:
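A minimal sketch, assuming `--input` also accepts a path to a JSON file; if it does not, pass the JSON inline as shown in Option B:

```bash
gpux run audio-model --input input.json
```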
Option B: With audio file directly¶
# With local file
gpux run audio-model --input '{"audio": "/path/to/audio.wav"}'
# With URL
gpux run audio-model --input '{"audio": "https://example.com/audio.mp3"}'
# With base64 (for small tests)
gpux run audio-model --input '{"audio": "data:audio/wav;base64,UklGRiQAAABXQVZFZm10..."}'
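To produce a base64 data URI for the last form, you can encode a small WAV file yourself. A minimal sketch (the file names are illustrative; `test_audio.wav` is created in section 4 below):

```python
import base64
import json
from pathlib import Path

# Read a small WAV file and wrap it in a data URI
wav_bytes = Path("test_audio.wav").read_bytes()
data_uri = "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")

# Write CLI-ready JSON, e.g. gpux run audio-model --input "$(cat input_base64.json)"
Path("input_base64.json").write_text(json.dumps({"audio": data_uri}))
```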
Option C: For Whisper models (mel spectrogram)¶
# gpux.yml
preprocessing:
  audio_sample_rate: 16000
  audio_feature_extraction: mel_spectrogram
  audio_n_mels: 80
  audio_n_fft: 400
  audio_hop_length: 160
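GPUX loads audio via librosa (see Troubleshooting below), so the expected feature shape can be sanity-checked with librosa directly. This is a standalone sketch under that assumption; GPUX's own mel extraction (e.g. any log scaling applied for Whisper) may differ in detail:

```python
import librosa
import numpy as np

# One second of 16 kHz test audio (440 Hz tone)
sr = 16000
t = np.linspace(0, 1.0, sr)
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Same parameters as the gpux.yml above
mel = librosa.feature.melspectrogram(
    y=audio, sr=sr, n_mels=80, n_fft=400, hop_length=160
)
print(mel.shape)  # (80, 101): n_mels x frames
```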
3. Programmatically (Python)¶
Basic Example¶
from pathlib import Path
from gpux.config.parser import PreprocessingConfig
from gpux.core.models import InputSpec, ModelInfo, OutputSpec
from gpux.core.preprocessing.audio import AudioPreprocessor
# Create preprocessor
preprocessor = AudioPreprocessor()
# Configuration
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="raw",
)

# Model information
model_info = ModelInfo(
    name="test-model",
    version="1.0.0",
    format="onnx",
    path=Path("model.onnx"),
    inputs=[
        InputSpec(name="audio", type="float32", shape=[1, 16000]),
    ],
    outputs=[OutputSpec(name="output", type="float32", shape=[1, 1000])],
    metadata={},
)
# Preprocess
input_data = {"audio": "/path/to/audio.wav"}
result = preprocessor.preprocess(input_data, config, model_info)
print(f"Shape: {result['audio'].shape}")
print(f"Dtype: {result['audio'].dtype}")
Example with Pipeline¶
from gpux.core.preprocessing.pipeline import PreprocessingPipeline
from gpux.config.parser import PreprocessingConfig
# Create pipeline
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="mel_spectrogram",
    audio_n_mels=80,
)
pipeline = PreprocessingPipeline(config=config)
# Process
input_data = {"audio": "/path/to/audio.wav"}
result = pipeline.process(input_data, model_info)
Complete Example with Runtime¶
from pathlib import Path

from gpux.config.parser import PreprocessingConfig, GPUXConfigParser
from gpux.core.runtime import GPUXRuntime

# Load configuration
parser = GPUXConfigParser()
config = parser.parse_file("gpux.yml")

# Create runtime with preprocessing
runtime = GPUXRuntime(
    model_path=Path("model.onnx"),
    preprocessing_config=config.preprocessing,
)
# Run inference
input_data = {"audio": "/path/to/audio.wav"}
output = runtime.infer(input_data)
print(output)
4. Create Test Audio File¶
If you need to create an audio file for testing:
import numpy as np
import soundfile as sf
# Generate test signal (1 second, 440 Hz)
duration = 1.0
sample_rate = 16000
t = np.linspace(0, duration, int(sample_rate * duration))
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)
# Save
sf.write("test_audio.wav", audio, sample_rate)
print("✓ File created: test_audio.wav")
Or using the script:
uv run python -c "
import numpy as np
import soundfile as sf
duration = 1.0
sample_rate = 16000
t = np.linspace(0, duration, int(sample_rate * duration))
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)
sf.write('test_audio.wav', audio, sample_rate)
print('✓ Created: test_audio.wav')
"
📋 Supported Audio Formats¶
- WAV (recommended)
- MP3
- FLAC
🔍 Verify It Works¶
Verify librosa is installed:¶
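For example:

```bash
uv run python -c "import librosa; print(librosa.__version__)"
```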
Verify preprocessor is registered:¶
from gpux.core.preprocessing.registry import get_registry
registry = get_registry()
preprocessors = registry.get_all_preprocessors()
print([p.__class__.__name__ for p in preprocessors])
# Should include: ['TextPreprocessor', 'ImagePreprocessor', 'AudioPreprocessor']
Verify it can handle audio:¶
from gpux.core.preprocessing.audio import AudioPreprocessor
preprocessor = AudioPreprocessor()
input_data = {"audio": "/path/to/audio.wav"}
can_handle = preprocessor.can_handle(input_data)
print(f"Can handle audio: {can_handle}") # Should be True
🐛 Troubleshooting¶
Error: "librosa library not available"¶
Solution: Install dependencies:
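A likely command, assuming `librosa` and `soundfile` are the packages needed (the project may instead expose an optional audio extra):

```bash
uv add librosa soundfile
# or, outside a uv-managed project:
pip install librosa soundfile
```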
Error: "Audio file not found"¶
Solution: Verify the file path is correct:
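For example:

```python
from pathlib import Path

print(Path("/path/to/audio.wav").exists())  # should print True
```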
Error: "Failed to load audio"¶
Solution: Verify the audio format is compatible (WAV, MP3, FLAC).
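A quick way to inspect the file header (assuming the soundfile package, which this guide already uses for writing test audio):

```python
import soundfile as sf

# Raises or reports an error if the container/codec is unsupported
print(sf.info("/path/to/audio.wav"))
```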
Preprocessor not selected¶
Solution: Verify the input has the "audio" key:
input_data = {"audio": "/path/to/audio.wav"} # ✓ Correct
input_data = {"sound": "/path/to/audio.wav"} # ✗ Incorrect
📚 Additional Examples¶
Example with Resampling¶
config = PreprocessingConfig(
    audio_sample_rate=8000,  # Resample to 8kHz
    audio_feature_extraction="raw",
)
Example with Custom Mel Spectrogram¶
config = PreprocessingConfig(
    audio_sample_rate=16000,
    audio_feature_extraction="mel_spectrogram",
    audio_n_mels=128,  # More mel bands
    audio_n_fft=512,  # Larger FFT window
    audio_hop_length=256,
)
✅ Testing Checklist¶
- Test script runs without errors
- Raw audio preprocessing works
- Mel spectrogram extraction works
- Loading from local file works
- Loading from URL works (if you have access)
- Loading from base64 works
- Resampling works correctly
- Integrated pipeline works
- Runtime with preprocessing works
🎯 Next Steps¶
Once you've verified that preprocessing works:
- Integrate with a real model: Use a real ONNX audio model (Whisper, Wav2Vec, etc.)
- Test with real audio: Use real audio files instead of synthetic signals
- Optimize configuration: Adjust parameters according to your specific model
- Test in production: Run with real data in your production environment