Select the best LLM to respond to a given prompt in real time!
What is Model Router in Azure AI Foundry?
Model router for Azure AI Foundry is a deployable AI chat model that is trained to select the best large language model to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes each request to the most suitable model. This way it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.
Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks.
Deploying the Model Router
To use the model router, we first need to deploy it. To do that, go to Azure AI Foundry, open the model deployments page, and choose the '+ Deploy model' option.
Next, select the model-router model and click on 'Confirm'.
Finally, specify the deployment name and click on 'Deploy'.
After the deployment is completed, you can use the model router in your applications.
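If you prefer scripting over the portal, the same deployment can also be created with the Azure CLI. A rough sketch, with the resource group, resource name, and model version left as placeholders for you to fill in:

az cognitiveservices account deployment create \
    --resource-group <resource-group> \
    --name <ai-foundry-resource> \
    --deployment-name model-router-itt \
    --model-name model-router \
    --model-version <model-version> \
    --model-format OpenAI \
    --sku-name GlobalStandard \
    --sku-capacity 1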
Using the Model Router
I have created a simple Console application to demonstrate how to use the model router.
We can use the model router in the same way we'd use any other Azure OpenAI chat model. Let's set the deploymentName parameter to the name of our model router deployment.
builder.Services.AddAzureOpenAIChatClient(
    deploymentName: "model-router-itt",
    endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
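For completeness, here's a minimal sketch of the surrounding setup, assuming the standard Semantic Kernel builder pattern; the kernel and executionSettings variables used below come from here (using OpenAIPromptExecutionSettings with defaults is an assumption, any PromptExecutionSettings would do):

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Build the kernel; the AddAzureOpenAIChatClient registration shown above goes here.
var builder = Kernel.CreateBuilder();
// builder.Services.AddAzureOpenAIChatClient(...);
var kernel = builder.Build();

// Settings passed along with each prompt; the defaults are fine for this demo.
var executionSettings = new OpenAIPromptExecutionSettings();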
Then let's define two prompts and see what models our router will choose for the responses.
// A simple question; the router should pick a small, cheap model.
var prompt = "What is the capital city of France?";
var result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");

// A more demanding task; the router should escalate to a stronger model.
prompt = "Write a detailed blog post comparing the benefits and trade-offs of using vector search versus keyword-based search in enterprise AI applications, including practical Azure AI Search configuration examples.";
result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");
The result for the first prompt is:
We can see that the chosen model here was gpt-4.1-nano-2025-04-14, which seems reasonable given how simple the prompt was.
Let's see the result for the second prompt:
Because the second prompt was more complex than the first, we can see that the router chose the o4-mini-2025-04-16 model.
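If you want to check the routing decision in code rather than in the portal, the underlying model's id is also surfaced on the response. A minimal sketch, assuming the function result wraps a Semantic Kernel ChatMessageContent (whose ModelId property carries the id of the model that produced the answer):

// Inspect which underlying model the router picked for this response.
var message = result.GetValue<ChatMessageContent>();
Console.WriteLine($"Routed to: {message?.ModelId}");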
And that's it. This is how simple it is to use the model router in your applications.
One important limitation to note is that the model router's advertised context window limit is that of the smallest underlying model. The other underlying models support larger context windows, which means an API call with a larger context will succeed only if the prompt happens to be routed to the right model; otherwise, the call will fail. To keep prompts within the smallest model's context window, you can shorten them: summarize the prompt before passing it to the model, truncate it to the most relevant parts, or use retrieval (for example, document embeddings with Azure AI Search) so that only the relevant sections are sent.
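As a crude illustration of the truncation option, here's a sketch that caps a prompt at a rough character budget before invoking the router (the budget value is an arbitrary placeholder; a real implementation should count tokens with a tokenizer instead of characters):

// Naive guard: trim the prompt to a fixed character budget before sending.
const int MaxPromptChars = 8_000; // hypothetical budget, not an official limit
if (prompt.Length > MaxPromptChars)
{
    prompt = prompt[..MaxPromptChars];
}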
Thanks for sticking around to the end of another article from 'Iliev Talks Tech'.