Select the best LLM to respond to a given prompt in real time!

What is Model Router in Azure AI Foundry?

Model router for Azure AI Foundry is a deployable AI chat model trained to select the best large language model to respond to a given prompt in real time. By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model. As a result, it delivers high performance while saving on compute costs where possible, all packaged as a single model deployment.

Smaller and cheaper models are used when they're sufficient for the task, but larger and more expensive models are available for more complex tasks.


Deploying the Model Router

To use the model router, we first need to deploy it. Go to Azure AI Foundry, open the model deployments page, and choose the '+ Deploy model' option.

Next, select the model-router model and click on 'Confirm'.

Finally, specify the deployment name and click on 'Deploy'.

Once the deployment completes, you can use the model router in your applications.


Using the Model Router

I have created a simple Console application to demonstrate how to use the model router.

We can use the model router in the same way we'd use any other Azure OpenAI chat model. Let's set the deployment name parameter to the name of our model router deployment.

// Register the model router deployment as the application's chat client.
// The deployment name must match the name chosen during deployment.
builder.Services.AddAzureOpenAIChatClient(
    deploymentName: "model-router-itt",
    endpoint: Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!,
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!);
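
The snippet above assumes a typical Semantic Kernel host. For completeness, a minimal sketch of that surrounding setup could look like this (the AddKernel registration and the default OpenAIPromptExecutionSettings are my assumptions, not taken from the original project):

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Host builder that provides the 'builder.Services' collection used above.
var builder = Host.CreateApplicationBuilder(args);

// Register the kernel; the AddAzureOpenAIChatClient call above plugs the
// model router deployment in as its chat service.
builder.Services.AddKernel();

var host = builder.Build();

// Resolve the kernel and create default execution settings for the prompts below.
var kernel = host.Services.GetRequiredService<Kernel>();
var executionSettings = new OpenAIPromptExecutionSettings();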

Then let's define two prompts and see what models our router will choose for the responses.

// A simple prompt: the router should pick a small, inexpensive model.
var prompt = "What is the capital city of France?";
var result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");

// A more complex prompt: the router should escalate to a stronger model.
prompt = "Write a detailed blog post comparing the benefits and trade-offs of using vector search versus keyword-based search in enterprise AI applications, including practical Azure AI Search configuration examples.";
result = await kernel.InvokePromptAsync(prompt, new(executionSettings)).ConfigureAwait(false);
Console.WriteLine($"\n\n{prompt}\n{result}");
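
To confirm which model actually handled each response, the chat message content carries the model id. A minimal sketch, assuming the prompt returns a ChatMessageContent (as the chat connectors do):

// The router reports the underlying model it picked on the response content.
var message = result.GetValue<ChatMessageContent>();
Console.WriteLine($"Routed to: {message?.ModelId}");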

The result for the first prompt shows that the chosen model was GPT-4.1-nano-2025-04-14. This seems reasonable, as the prompt was very simple.

As for the second prompt, because it was more complex than the first, the router chose the o4-mini-2025-04-16 model.

And that's it. This is how simple it is to use the model router in your applications.

One important limitation to note is that the model router's context window limit is the limit of the smallest underlying model. The other underlying models support larger context windows, which means an API call with a large context will succeed only if the prompt happens to be routed to the right model; otherwise, the call will fail. To shorten the context window, you can do one of the following:

  • Summarize the prompt before passing it to the model
  • Truncate the prompt, keeping only the most relevant parts (a minimal sketch follows this list)
  • Use document embeddings and have the chat model retrieve relevant sections
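
For illustration, here is a rough truncation helper. The four-characters-per-token ratio is a heuristic for English text, not an exact tokenizer, and the helper name is mine:

// Rough, illustrative truncation: ~4 characters per token is a common
// heuristic for English text; use a real tokenizer for precise limits.
static string TruncateToApproxTokens(string prompt, int maxTokens)
{
    int maxChars = maxTokens * 4;
    return prompt.Length <= maxChars ? prompt : prompt[..maxChars];
}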

Thanks for sticking with me to the end of another article from 'Iliev Talks Tech'.

