Select the best LLM to respond to a given prompt in real time!

What is Model Router in Azure AI Foundry?

Model router for Azure AI Foundry is a deployable AI chat model trained to select the best large language model to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. As a result, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.

Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks.


Deploying the Model Router

To use the model router, we first need to deploy it. Go to Azure AI Foundry, open the model deployments page, and choose the '+ Deploy model' option.

Next, select the model-router model and click on 'Confirm'.

Finally, specify the deployment name and click on 'Deploy'.

Once the deployment completes, you can use the model router in your applications.


Using the Model Router

I have created a simple Console application to demonstrate how to use the model router.

We can use the model router in the same way we'd use any other Azure OpenAI chat model. Let's set the deployment name parameter to the name of our model router deployment.

// Register the model router deployment as the application's chat client.
// The deployment name must match the name chosen during deployment.
builder.Services.AddAzureOpenAIChatClient(
    deploymentName: "model-router-itt",
    endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
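
The snippet above assumes a typical Semantic Kernel host. For completeness, a minimal sketch of that surrounding setup could look like this (the AddKernel registration and the default OpenAIPromptExecutionSettings are my assumptions, not taken from the original project):

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Host builder that provides the 'builder.Services' collection used above.
var builder = Host.CreateApplicationBuilder(args);

// Register the kernel; the AddAzureOpenAIChatClient call above plugs the
// model router deployment in as its chat service.
builder.Services.AddKernel();

var host = builder.Build();

// Resolve the kernel and create default execution settings for the prompts below.
var kernel = host.Services.GetRequiredService<Kernel>();
var executionSettings = new OpenAIPromptExecutionSettings();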

Then let's define two prompts and see what models our router will choose for the responses.

// A simple prompt: the router should pick a small, inexpensive model.
var prompt = "What is the capital city of France?";
var result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");

// A more complex prompt: the router should escalate to a stronger model.
prompt = "Write a detailed blog post comparing the benefits and trade-offs of using vector search versus keyword-based search in enterprise AI applications, including practical Azure AI Search configuration examples.";
result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");
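
To confirm which model actually handled each response, the chat message content carries the model id. A minimal sketch, assuming the prompt returns a ChatMessageContent (as the chat connectors do):

// The router reports the underlying model it picked on the response content.
var message = result.GetValue<ChatMessageContent>();
Console.WriteLine($"Routed to: {message?.ModelId}");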

The result for the first prompt shows that the chosen model was GPT-4.1-nano-2025-04-14. This seems reasonable, as the prompt was very simple.

As for the second prompt, because it was more complex than the first, the router chose the o4-mini-2025-04-16 model.

And that's it. This is how simple it is to use the model router in your applications.

One important limitation to note is that the model router's context window limit is the limit of the smallest underlying model. The other underlying models support larger context windows, which means an API call with a large context will succeed only if the prompt happens to be routed to the right model; otherwise, the call will fail. To shorten the context window, you can do one of the following:

  • Summarize the prompt before passing it to the model
  • Truncate the prompt, keeping only the most relevant parts (a minimal sketch follows this list)
  • Use document embeddings and have the chat model retrieve relevant sections
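
For illustration, here is a rough truncation helper. The four-characters-per-token ratio is a heuristic for English text, not an exact tokenizer, and the helper name is mine:

// Rough, illustrative truncation: ~4 characters per token is a common
// heuristic for English text; use a real tokenizer for precise limits.
static string TruncateToApproxTokens(string prompt, int maxTokens)
{
    int maxChars = maxTokens * 4;
    return prompt.Length <= maxChars ? prompt : prompt[..maxChars];
}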

Thanks for sticking with me to the end of another article from 'Iliev Talks Tech'.

