# Apple Silicon

Optimize GPUX for Apple M-series chips with CoreML.

## Overview

Apple Silicon (M1/M2/M3/M4) provides excellent performance and power efficiency through CoreML and the Neural Engine.

Execution Provider: `CoreMLExecutionProvider`

## Supported Chips

### M-Series
- M4 (2024) - 16-core Neural Engine
- M3 (2023) - 16-core Neural Engine
- M2 (2022) - 16-core Neural Engine
- M1 (2020) - 16-core Neural Engine
### Variants
- M4 Pro/Max/Ultra
- M3 Pro/Max
- M2 Pro/Max/Ultra
- M1 Pro/Max/Ultra
## Installation

### 1. Install GPUX
No additional dependencies needed!
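CoreML ships with macOS itself, so a standard GPUX install (e.g. `pip install gpux`; the package name here is an assumption, see the main installation guide) is all it takes.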
### 2. Verify Installation
The provider name above matches ONNX Runtime's `CoreMLExecutionProvider`, so one way to verify availability is to query ONNX Runtime directly (this sketch assumes GPUX is built on the `onnxruntime` package; GPUX may also ship its own check):
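```python
import onnxruntime as ort

# On a supported Mac the list should include "CoreMLExecutionProvider".
print(ort.get_available_providers())
```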
## Configuration

```yaml
name: my-model

model:
  source: ./model.onnx

runtime:
  gpu:
    backend: coreml  # Use CoreML
    memory: 2GB
    batch_size: 1    # CoreML works best with batch_size=1

inputs:
  - name: input
    type: float32

outputs:
  - name: output
    type: float32
```
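Once configured, inference feeds the input name declared above. A minimal sketch: only the `GPUXRuntime` constructor appears in this guide, so the `infer` method name and the input shape are assumptions:

```python
import numpy as np
from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="./model.onnx", provider="coreml")

# Feed the `input` tensor declared in the config; dtype must match float32.
# `infer` is a hypothetical method name; shape (1, 3, 224, 224) is illustrative.
result = runtime.infer({"input": np.zeros((1, 3, 224, 224), dtype=np.float32)})
print(result["output"].shape)
```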
## Performance

### Benchmarks (M2 Pro)
| Model | CoreML | CPU | Speedup |
|---|---|---|---|
| BERT-base | 450 FPS | 50 FPS | 9x |
| ResNet-50 | 600 FPS | 80 FPS | 7.5x |
| MobileNet | 1,200 FPS | 200 FPS | 6x |
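To get comparable numbers on your own machine, time repeated single-sample runs. A sketch reusing the hypothetical `infer` call from the Configuration section (method name and input shape are assumptions):

```python
import time
import numpy as np
from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx", provider="coreml")
sample = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

runtime.infer(sample)  # warm-up: the first call pays model compilation cost
n = 100
start = time.perf_counter()
for _ in range(n):
    runtime.infer(sample)
print(f"{n / (time.perf_counter() - start):.1f} inferences/sec")
```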
### Power Efficiency
- 5-10x better power efficiency than discrete GPUs
- Neural Engine uses minimal power
- Excellent for battery-powered devices
## Optimization Tips

### 1. Use Batch Size 1
CoreML performs best with single inference:
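For example, loop over samples one at a time instead of stacking them into a batch (same hypothetical `infer` call as above):

```python
import numpy as np
from gpux import GPUXRuntime

runtime = GPUXRuntime(model_path="model.onnx", provider="coreml")
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(8)]

# One call per sample (batch_size=1) rather than one call on a stacked batch.
results = [runtime.infer({"input": s}) for s in samples]  # hypothetical method
```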
### 2. Model Size

Keep models under 1GB for best performance (a size-check sketch follows this list):
- ✅ BERT-base (110M parameters)
- ✅ ResNet-50 (25M parameters)
- ⚠️ GPT-2 XL (1.5B parameters; large models may be slower)
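To check where a model falls, inspect the ONNX file with the standard `onnx` package (independent of GPUX):

```python
import os
import numpy as np
import onnx

model = onnx.load("model.onnx")

# Total parameters = sum of elements across all stored weight tensors.
params = sum(int(np.prod(init.dims)) for init in model.graph.initializer)
print(f"{params / 1e6:.0f}M parameters, "
      f"{os.path.getsize('model.onnx') / 1e6:.0f} MB on disk")
```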
### 3. Data Types

CoreML supports:

- FP32 (full precision)
- FP16 (half precision, faster)
- INT8 (quantized, fastest)
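For INT8, ONNX Runtime's dynamic quantization produces a smaller model you can hand to GPUX unchanged (standard `onnxruntime` API, independent of GPUX; whether CoreML accelerates every quantized op varies by model):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Rewrite weights as INT8; the output is still a valid ONNX file.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)
```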
## Neural Engine

### What is the Neural Engine?

- Dedicated ML hardware on Apple Silicon
- 16-core design (11 TOPS on M1, 15.8 on M2, 18 on M3, 38 on M4)
- Optimized for matrix operations

### Enabling the Neural Engine
CoreML automatically uses the Neural Engine when available:
```python
from gpux import GPUXRuntime

# CoreML will use the Neural Engine automatically
runtime = GPUXRuntime(
    model_path="model.onnx",
    provider="coreml"
)
```
## Troubleshooting

### CoreML Not Available

Cause: Running on a non-macOS system, or on a runtime build compiled without CoreML support

Solution: CoreML requires macOS. Intel Macs can run CoreML on the CPU and GPU, but the Neural Engine (and the performance figures in this guide) requires an Apple Silicon Mac
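A quick way to confirm what you are running on:

```python
import platform

# Expect "Darwin arm64" on Apple Silicon; "Darwin x86_64" on an Intel Mac.
print(platform.system(), platform.machine())
```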
### Slow Performance
- Reduce model size: Keep under 1GB
- Use batch_size=1: CoreML optimized for single inference
- Check model compatibility: Some ops not supported
### Model Conversion Errors

Some ONNX operations are not supported by CoreML. When conversion fails, fall back to the CPU provider:
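A sketch, assuming `provider="cpu"` selects the CPU path the same way `provider="coreml"` selects CoreML:

```python
from gpux import GPUXRuntime

# Force the CPU provider for models containing ops CoreML can't run.
runtime = GPUXRuntime(
    model_path="model.onnx",
    provider="cpu",  # assumption: "cpu" is the CPU provider's name
)
```

Note that ONNX Runtime's CoreML provider also partitions graphs on its own, running unsupported nodes on the CPU, so a full switch is usually only needed when conversion fails outright.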
## Best Practices

**Optimize for the Neural Engine**
- Keep models small (<1GB)
- Use FP16 or INT8 precision
- Batch size = 1
**Power Efficiency**

CoreML is perfect for:

- Battery-powered inference
- Edge deployments
- Always-on ML services
**Zero Setup Required**
No drivers, no CUDA toolkit - CoreML just works!
## Comparison: CoreML vs CPU
| Metric | CoreML (M2) | CPU (M2) |
|---|---|---|
| BERT Throughput | 450 FPS | 50 FPS |
| Power Usage | 5W | 15W |
| Temperature | Low | Medium |