Model arbitrage: orchestrating multi-LLM pipelines in production
Choosing between public Tier-1 models, custom fine-tunes, and private on-premise open-source models under strict regulatory constraints, with lessons learned on latency versus cost.
FIG. 01 Hudson Group architectural analysis and deployment trajectory.
Relying on a single AI model for all tasks is a design anti-pattern. Enterprise workloads have diverse requirements: some demand high reasoning ability, others require low latency, and many are bound by strict data residency laws.
The solution is Model Arbitrage: an orchestration layer that dynamically routes each request to the most cost-effective model that also meets its compliance requirements.
LESSON 04 — SYSTEM INFRASTRUCTURE
Orchestrating the routing middleware
In a recent utility deployment, we routed SCADA data parsing to an on-premise model. Because the data never left the site, this preserved data privacy and avoided external network costs. When the agent encountered complex anomaly detection cases, the routing middleware escalated the task to a private instance of a Tier-1 reasoning model. This strategy reduced average latency by 68% and token costs by 52%.
This middleware depends on a dynamic classification gate. Every incoming payload is first analyzed by a lightweight local classifier (often a fine-tuned small model such as Llama-3-8B), which estimates the task's complexity and token count. If the task is a simple database lookup or formatting exercise, it is executed locally. If it requires multi-step planning or mathematics, it is escalated to a Tier-1 model.
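The gate described above can be sketched in a few lines of Python. This is a minimal illustration, not the deployed middleware: the target labels, the token limit, the hint words, and the threshold are all hypothetical, and the keyword heuristic in estimate_complexity only stands in for the fine-tuned local classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical labels and limits; none of these values come from the deployment.
LOCAL = "on_prem_small_model"
TIER1 = "private_tier1_model"
MAX_LOCAL_TOKENS = 2000
COMPLEX_HINTS = {"plan", "prove", "calculate", "anomaly", "derive"}


@dataclass
class GateDecision:
    target: str
    complexity: float
    est_tokens: int


def estimate_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. A real gate would use
    # the target model's own tokenizer.
    return len(text.split())


def estimate_complexity(text: str) -> float:
    # Stand-in for the local classifier: the fraction of hint words
    # that appear as whole words in the payload.
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & COMPLEX_HINTS) / len(COMPLEX_HINTS)


def route(text: str, complexity_threshold: float = 0.2) -> GateDecision:
    tokens = estimate_tokens(text)
    complexity = estimate_complexity(text)
    # Escalate when the task looks complex or is too large for the local model.
    if complexity >= complexity_threshold or tokens > MAX_LOCAL_TOKENS:
        target = TIER1
    else:
        target = LOCAL
    return GateDecision(target, complexity, tokens)


if __name__ == "__main__":
    print(route("Format this record as JSON").target)
    print(route("Plan a multi-step investigation of this anomaly").target)
```

In this sketch the gate escalates on either signal, complexity or size, so a long but simple payload still goes to the larger model. Whether that is the right trade-off depends on the local model's context window and on the cost targets of the deployment.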
Model arbitrage also protects against vendor lock-in. If an API provider changes its pricing structure or suffers an outage, the routing engine automatically updates its routing weights and shifts traffic to the remaining compliant models. By designing for model independence, enterprise architectures remain resilient, cost-optimized, and compliant under changing market conditions.
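One way to picture the weight update is the sketch below. It assumes a simple scheme that the article does not specify: each healthy provider is weighted by the inverse of its price, and a provider marked unhealthy gets a weight of zero. The Provider type, the provider names, and the prices are hypothetical, and prices are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, assumed > 0
    healthy: bool = True


def routing_weights(providers):
    """Weight healthy providers by inverse cost; unhealthy ones get 0."""
    raw = {
        p.name: (1.0 / p.cost_per_1k_tokens if p.healthy else 0.0)
        for p in providers
    }
    total = sum(raw.values())
    if total == 0:
        raise RuntimeError("no healthy provider available")
    # Normalize so the weights sum to 1.
    return {name: weight / total for name, weight in raw.items()}


def pick(providers):
    """Return the name of the provider with the highest current weight."""
    weights = routing_weights(providers)
    return max(weights, key=weights.get)


if __name__ == "__main__":
    a = Provider("vendor_a", cost_per_1k_tokens=1.0)
    b = Provider("vendor_b", cost_per_1k_tokens=2.0)
    print(pick([a, b]))   # cheaper provider is preferred
    a.healthy = False     # simulate an outage at vendor_a
    print(pick([a, b]))   # traffic shifts to vendor_b
```

Because the weights are recomputed from current price and health on every call, a pricing change or outage takes effect on the next request without any change to the calling code. A production router would also need to filter by compliance before weighting, which this sketch leaves out.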
Marcus leads enterprise assessment and roadmap engagements at Hudson Group, with a focus on regulated TMT organizations moving from pilot to production. He has overseen deployments across Switzerland, Poland, and the wider EU.
