Divyam.AI's Performance vis-à-vis Microsoft and NVIDIA Routers

Divyam.AI

Today, you can choose from a crowd of models, each flexing its intelligence and capability. You have intelligence on tap, but which tap should you turn? Getting the right balance of power and proportionality when choosing your AI toolset plays a huge role in the success of your AI deployments. In the context of LLMs, this challenge is crucial because model usage often accounts for the bulk of your AI expenditure. Divyam.AI addresses this exact challenge by helping you optimize the cost-performance balance of your GenAI deployments.

In this article, we present a comparative study of the capabilities of Divyam's Router (the DAI Router in the diagram) vis-à-vis two industry titans: the Microsoft Model Router and the NVIDIA LLM Router.

To understand the comparison, let us dig into the principle on which Divyam’s Router works. 

Suppose you want to assess the mental abilities and knowledge-based skills of thousands of students. You would design a test with a questionnaire, make the students take the test, and rank them on their performance. Institutions have been doing this for decades using a psychometric framework called Item Response Theory (IRT), which has been around since 1968. IRT is a family of psychometrically grounded statistical models that takes the response matrix (i.e., how each student answered every question in the questionnaire) as input and estimates the "skill" possessed by each student, along with the "skill" needed to solve each question and its "difficulty".

To draw a parallel, now consider each LLM as a student and the evaluation benchmarks as the test. Divyam extends the IRT family to estimate the skill required by, and the difficulty of, a hitherto unseen prompt, and uses the estimated skill of each LLM to produce an ex-ante estimate of its performance on that prompt.
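
To make the idea concrete, here is a minimal sketch (in Python) of the classic two-parameter logistic (2PL) item response function that this family of models is built around. The skill and difficulty values are purely illustrative; Divyam's actual model is proprietary and richer than this.

```python
import math

def p_correct(skill: float, difficulty: float, discrimination: float = 1.0) -> float:
    """2PL item response function: the probability that a responder with the
    given skill answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (skill - difficulty)))

# Illustrative numbers only: a model with a higher fitted skill gets a higher
# predicted chance of answering a hard prompt correctly.
print(round(p_correct(skill=1.8, difficulty=1.2), 2))  # 0.65
print(round(p_correct(skill=0.4, difficulty=1.2), 2))  # 0.31
```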

Comparison of Routers

Routers are models trained to select the best large language model (LLM) to respond to a given prompt in real time. A router uses a combination of pre-existing models to deliver high performance while saving on compute costs where possible, all packaged as a single model deployment.

The Divyam LLM Router employs a proprietary algorithm that assesses the skill required by (and the difficulty of) each prompt and, based on that, routes it to the most suitable of the available models.
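
A minimal sketch of what such a routing decision could look like, assuming per-model skill estimates and a per-prompt difficulty estimate are available. The model names, prices, threshold, and numbers below are hypothetical, not Divyam's actual configuration.

```python
import math
from dataclasses import dataclass

def p_correct(skill: float, difficulty: float) -> float:
    # IRT-style item response function (discrimination fixed at 1 for brevity)
    return 1.0 / (1.0 + math.exp(-(skill - difficulty)))

@dataclass
class Candidate:
    name: str
    skill: float        # fitted IRT skill estimate for this model
    cost_per_1k: float  # illustrative price per 1K tokens

def route(prompt_difficulty: float, candidates: list[Candidate],
          min_p_correct: float = 0.8) -> Candidate:
    """Pick the cheapest model whose estimated probability of answering the
    prompt correctly clears the threshold; otherwise fall back to the most
    capable model."""
    viable = [c for c in candidates
              if p_correct(c.skill, prompt_difficulty) >= min_p_correct]
    if viable:
        return min(viable, key=lambda c: c.cost_per_1k)
    return max(candidates, key=lambda c: c.skill)

pool = [Candidate("small-model", skill=1.7, cost_per_1k=0.10),
        Candidate("large-model", skill=2.5, cost_per_1k=2.00)]
print(route(0.2, pool).name)  # small-model: an easy prompt goes to the cheap model
print(route(1.9, pool).name)  # large-model: a hard prompt needs the stronger model
```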

Dataset

For our comparative study, the benchmark we have chosen is MMLU-Pro, which tests the reasoning and knowledge capabilities of an LLM via a set of multiple-choice questions spanning 14 subjects, such as Math, Physics, Computer Science, and Philosophy. Each question is presented with 10 possible answers, and an LLM, upon receiving a 5-shot prompt, must choose the sole correct answer. A randomly chosen 20% sample (2,406 out of 12,032 questions) serves as the test dataset, on which we report performance.
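
For readers who want to set up a similar evaluation, the sketch below shows one way to draw a 20% held-out split from MMLU-Pro using the Hugging Face datasets library. It assumes the publicly published TIGER-Lab/MMLU-Pro dataset; the seed and exact split are our own illustrative choices, not necessarily the ones behind the numbers reported here.

```python
from datasets import load_dataset

# MMLU-Pro as published on the Hugging Face Hub: ~12K multiple-choice questions
# with 10 options each, spanning 14 subjects.
mmlu_pro = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

# Hold out a random 20% of the questions as the evaluation set.
split = mmlu_pro.train_test_split(test_size=0.2, seed=42)
eval_set = split["test"]
print(len(eval_set), "questions in the held-out evaluation set")
```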

LLM Performance

In the table below, we present the performance of a set of contemporary LLMs on MMLU-Pro. From the table, we can see that o4-mini has the best accuracy on this benchmark. Our subsequent tests will take o4-mini as the baseline for relative comparisons.

Results with Microsoft Model Router

The Microsoft Model Router (MS Router) is packaged as a single Azure AI Foundry model that you deploy. Notably, Model Router cannot be fine-tuned on your data.

The LLMs are chosen from a pre-existing set of models, namely gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, and o4-mini. Notably, one cannot add models to or remove them from this list.

Unlike the Microsoft Model Router, Divyam routes your queries to the right LLM based on your preference for (a) cost optimization or (b) performance optimization.

The graph of Cost Savings vs. Accuracy below presents router performance where the selection is limited to the MS Router's set of LLMs. Divyam's Quality Optimization parameters have been tuned to match the MS Router's accuracy for a fair comparison. This tuning is unique to Divyam and is not possible with the MS Router.

You can see that for the same relative accuracy, Divyam's cost savings (59.92%) are roughly 1.7 times those of the MS Router (35.52%).

Whereas the MS Router is stuck with its choice of LLMs, nothing prevents Divyam from adding the right set of LLMs for our customer. After the three Gemini models presented in the graph above are added to the Divyam Router (alongside the ones the Microsoft Model Router was already routing to), we notice a clear improvement in the cost-performance Pareto frontier.

You can see from the graph above that Divyam does even better in terms of cost savings and accuracy compared to the MS Router. For the same relative accuracy, Divyam's cost savings (84.46%) are about 2.4 times those of the MS Router (35.52%).

Results with NVIDIA Router


The NVIDIA LLM Router can be configured with one of three router models – 1) task-router, 2) complexity-router, 3) intent-router – each of which is powered by a (pre-LLM-era) language model, Microsoft's DeBERTa-v3-base, which contains 86M parameters.

Of these, we consider the task-router and the intent-router unsuitable for our purpose and focus only on the complexity-router. The complexity-router classifies each prompt into one of six pre-defined classes (e.g., “Domain”) and routes all prompts in a class to a single, configurable LLM. In our specific setup, all queries belonging to “Domain” are routed to the (larger) LLM, whereas everything else is routed to the SLM (small language model).
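
To make the contrast concrete, the snippet below is a rough illustration of this class-to-model behaviour. It is not NVIDIA's actual configuration format, and all class and model names except “Domain” are placeholders.

```python
# Illustrative only: a complexity-router-style setup maps each prompt class to
# exactly one model, so the routing decision depends only on the prompt's class.
CLASS_TO_MODEL = {
    "Domain": "large-llm",       # as in our experiment: "Domain" prompts go to the LLM
    "OtherClass1": "small-slm",  # placeholder names; the real router has six classes
    "OtherClass2": "small-slm",
}

def route_by_class(prompt_class: str) -> str:
    return CLASS_TO_MODEL.get(prompt_class, "small-slm")

print(route_by_class("Domain"))       # large-llm
print(route_by_class("OtherClass1"))  # small-slm
```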

We ran Divyam's Quality Optimizer in two settings: “Priority Cost Saving” and “Priority Accuracy”.

From the graph above you can see that for the same range of cost saving, Divyam's relative accuracy drop (-0.16%) is roughly 18 percentage points smaller than NVIDIA's (-18.1%) when tuned for cost saving. Also, when tuned for accuracy, Divyam's accuracy (1.31) surpasses that of GPT-4.1.

The table below goes a level deeper into the results above. It shows how Divyam uses multiple dimensions of LLM ability to select, for each prompt, the LLM with the best estimated probability of answering correctly. It also lists the distribution of LLMs chosen, as a percentage of prompts in the test set.

Divyam's MMLU-Pro Router Performance

For the same test, the corresponding insights for the NVIDIA LLM Router are depicted in the graphs below.


In conclusion, we see that Divyam's Router yields a better cost-performance Pareto frontier than both the Microsoft Router and the NVIDIA Router, even though the philosophies of LLM choice differ in the two comparisons. Divyam's ability to prioritize cost and accuracy, separately or in combination, is unique and yields better, more desirable results in both cases. Moreover, Divyam spans vendor boundaries in the industry and can easily incorporate LLMs from all segments.

Stay tuned for more experimental results on cost-performance trade-offs and deeper tests confirming Divyam's low running costs.

Explore More AI Insights

Stay ahead of the curve with our latest blogs on AI innovations, industry trends, and business transformations. Discover how intelligent systems are shaping the future—one breakthrough at a time.


AI Strategy focused on maximizing returns on your GenAI investments

June 24, 2025

As industries across the spectrum continue to be sculpted by GenAI, embracing it with strategic, ethical, and operational foresight will determine the extent to which businesses can truly harness its transformative potential to craft a future of success, sustainability, and societal contribution. However, GenAI faces its fair share of adoption hurdles. Organizations committed to leveraging generative AI must navigate myriad challenges, ensuring both solution efficacy and ethical application.

Let us take two of these challenges:

Ensuring Adaptability and Scalability – Given the scores of GenAI products in today's ever-evolving market, an organization is continually wondering whether it has chosen the right product. The problem of vendor lock-in looms large, as the costs of adapting and scaling are formidable. Your choice of LLM for your applications depends on the crucial balance of cost vs. benefit. But this factor is not static – it needs continuous evaluation and evolution given the fluidity of GenAI advancements.

Accuracy and hallucinations – Your organization has GenAI-based solutions, but you are continually concerned about the quality of the output you get. There are techniques to mitigate the tendency of AI models to hallucinate, such as cross-checking the results of one AI model against another (sketched below), which can bring the rate to under 1%. The lengths one goes to in order to mitigate hallucinations depend largely on the actual use case, but it is something that you need AI developers to work on continually.
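
A minimal sketch of that cross-checking idea, in which a second model verifies the first model's answer before it is accepted. The client setup, model names, and prompts are illustrative; production verification pipelines are considerably more involved.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; model names are illustrative

def answer_with_cross_check(question: str) -> str:
    """Draft an answer with one model, then ask a second model to verify it."""
    draft = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    verdict = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": (
            f"Question: {question}\nProposed answer: {draft}\n"
            "Is the proposed answer factually correct? Reply YES or NO."
        )}],
    ).choices[0].message.content

    return draft if verdict.strip().upper().startswith("YES") else "NEEDS HUMAN REVIEW"
```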

The above two challenges are compounded by the fact that your organization’s AI skill set not only needs to be continually upgraded, but your skilled workforce must also be involved in experimenting and training your application with new entrants in the Gen AI market to ensure that you are not losing out on the latest benefits that they have to offer. This affects your own time to market and cost of production.

What if there were a solution that chose the best LLMs for you, fine-tuned them for your application, continuously evaluated the quality of your output, and provided excellent observability across all your GenAI use cases, all while always ensuring the best-optimized cost-benefit ratio?


Divyam.AI is the solution you are looking for.

If you are an organization that has established its inference pipelines using GenAI, Divyam can help you upgrade to the model best suited for your application. It is a solution where the best model for your use case is made accessible to you through a universal API that follows the OpenAI API standards. Divyam's model selector takes each prompt of your application through the API, works its magic, and directs it to the best LLM for that prompt.
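
Because the API follows the OpenAI standard, integration typically amounts to pointing an OpenAI-compatible client at Divyam's gateway. The endpoint URL, API key, and model alias below are hypothetical placeholders rather than documented values.

```python
from openai import OpenAI

# Hypothetical endpoint and credentials, for illustration only.
client = OpenAI(
    base_url="https://your-divyam-gateway.example.com/v1",
    api_key="YOUR_DIVYAM_API_KEY",
)

# The router inspects each prompt and forwards it to the LLM it judges best;
# "auto" is a placeholder alias for "let the router decide".
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets."}],
)
print(response.choices[0].message.content)
```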

It is unique because you have a leaderboard of models to choose from at per-prompt granularity. In the absence of Divyam, you would have needed to employ data scientists to experiment and choose the best model for your application. Moreover, choosing the best model at per-prompt granularity is a hard problem to solve; you would rather have it solved for you by a plug-and-play solution like Divyam. The LLM chosen by Divyam's fabric could be an API-based model like ChatGPT or Gemini, or a model like Llama that is self-hosted on your inference server.

If you have been running your application through Divyam, you also need not worry about fine-tuning your inference server. Divyam's fine-tuner takes that headache off you. The fine-tuner has built-in intelligence that chooses the right parameter values to tune, suited to your application patterns, and uploads the fine-tuned model back to your inference server. This continuously gives your users an evolving experience and the best performance from your inference-server model.

In cases where Divyam has chosen API-based LLMs for your application, and you are wondering whether you are still at your peak cost-benefit ratio, Divyam's evaluation engine has you covered. The evaluator runs in the background and continuously performs A/B testing against earlier, cheaper versions of the LLMs, so that your application always maintains an equal or better performance index.
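
Conceptually, this background evaluation can be pictured as shadow A/B testing on a small sample of live traffic. The sketch below is a simplified illustration under that assumption; the function names and quality metric are placeholders, not Divyam's actual evaluator.

```python
import random

def shadow_ab_eval(prompts, current_model, cheaper_model, judge, sample_rate=0.05):
    """Illustrative shadow A/B test: score a small sample of traffic with both
    the current model and a cheaper candidate, and report the quality gap."""
    sample = [p for p in prompts if random.random() < sample_rate]
    if not sample:
        return None
    current_avg = sum(judge(p, current_model(p)) for p in sample) / len(sample)
    cheaper_avg = sum(judge(p, cheaper_model(p)) for p in sample) / len(sample)
    # A small or negative gap suggests the cheaper model is good enough to promote.
    return current_avg - cheaper_avg
```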

The cost-quality trade-offs of LLM usage are application-specific. Different applications have different tolerances, and often an organization wants to make a choice between reach and quality. Divyam.AI provides you with a slider per application, which you can configure to strike the desired balance. You can also observe the cost benefits and quality-metric improvements on our rich dashboards and compare performance. This can make all the difference between a positive and a negative RoI for the application.
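
One simple way to think about such a slider is as a single weight that blends an estimated quality score against a normalized cost when ranking candidate models. The formula and numbers below illustrate the concept only and are not Divyam's internal scoring.

```python
def blended_score(quality: float, relative_cost: float, quality_weight: float) -> float:
    """quality_weight = 1.0 means accuracy is all that matters; 0.0 means cost is.
    Both quality and relative_cost are assumed to be normalized to [0, 1]."""
    return quality_weight * quality - (1.0 - quality_weight) * relative_cost

# With the slider at 0.90, the cheaper model wins; at 0.99, the stronger one does.
print(blended_score(quality=0.92, relative_cost=1.00, quality_weight=0.90))  # ~0.728
print(blended_score(quality=0.85, relative_cost=0.10, quality_weight=0.90))  # ~0.755
print(blended_score(quality=0.92, relative_cost=1.00, quality_weight=0.99))  # ~0.901
print(blended_score(quality=0.85, relative_cost=0.10, quality_weight=0.99))  # ~0.840
```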


Let us come to the inevitable question of data privacy. Your organization’s data remains safe within your VPCs. Divyam is deployed within your enterprise boundaries, so your data stays put. Divyam only has control plane access to push in the latest and greatest intelligence and monitor quality so that you are always at peak performance.

Divyam.AI is also available as a hosted solution in case you want to get started with a single line of code change. 

In conclusion, Divyam.AI learns from both its global expertise and your own historical data to build a private, use-case-specific knowledge base that trims away hundreds of irrelevant model choices, automatically selects the best one, and continuously monitors live quality. If performance ever dips, it intervenes instantly to protect production, and it reruns the whole process on schedule or the moment a new model emerges. All of this happens without manual effort, so your team can stay focused on delivering core value instead of chasing model upgrades or cost savings.


Enterprise GenAI: Turning Hype into Tangible Value

April 24, 2025

Generative AI (GenAI) has moved beyond the realm of science fiction and into the strategic considerations of enterprises across industries. While the ability of GenAI to generate creative content like poems and images creates a lot of excitement, its true potential lies in its capacity to revolutionize business operations, enhance customer experiences, and drive innovation at scale. However, realizing this potential requires a pragmatic approach that moves beyond the hype and focuses on tangible value creation.   

Laying the Foundation for Successful GenAI Adoption

Implementing GenAI in an enterprise setting is not merely an IT project; it's a team sport that demands collaboration across different business functions. Business leaders must take the lead in defining the specific problems they aim to solve with GenAI, while data scientists play a crucial role in addressing data-related challenges such as accuracy, governance, and security. IT professionals are essential for implementing and maintaining the underlying infrastructure and ensuring the technology functions correctly.  

A successful GenAI strategy requires a holistic approach that considers several key factors:

Accuracy and Reliability

Enterprise GenAI deployments for mission-critical tasks demand 100% accuracy. Ensuring the reliability of GenAI outputs through techniques like grounding the models with relevant data is paramount.

Governance and Traceability

Enterprises must establish clear governance frameworks for their GenAI initiatives, including mechanisms for tracking data lineage, model development, and deployment processes. This ensures compliance with regulatory requirements and facilitates auditing.

Scalability and Performance

Enterprise-grade GenAI solutions must be able to handle large volumes of data and user requests while maintaining optimal performance. Building a scalable AI infrastructure that can adapt to evolving business needs is crucial.

Security and Privacy

Protecting sensitive enterprise data is paramount. GenAI systems must be designed with robust security measures to prevent data breaches and ensure compliance with privacy regulations.

Cost-Effectiveness and Sustainability

The economic and environmental impact of GenAI deployments must be carefully considered. Enterprises should strive for cost-effective solutions with a focus on minimizing their carbon footprint.

Fostering a Culture of Innovation for GenAI

Beyond the technological considerations, cultivating a culture of innovation is essential for unlocking the full potential of GenAI within an organization. This involves creating an environment where employees feel empowered to experiment, share bold ideas, and learn from their mistakes.

Use Cases and the Future of Enterprise GenAI

The applications of GenAI in the enterprise are rapidly expanding across various domains, including:

Customer Experience

GenAI-powered virtual assistants and chatbots can provide personalized and efficient customer support, resolve queries quickly, and even proactively identify potential issues.

Content Creation

GenAI can automate the creation of marketing materials, technical documentation, and other forms of content, freeing up human employees for more strategic tasks.

Data Analysis and Insights

GenAI can analyze large datasets to identify hidden patterns, generate actionable insights, and support better decision-making.

Software Development

GenAI tools can assist developers with code generation, testing, and debugging, accelerating the software development lifecycle.

Risk Management and Compliance

GenAI can be used to detect fraudulent activities, automate compliance processes, and improve risk assessment.

While GenAI has tremendous potential to transform enterprises, a robust and intelligent infrastructure is the bedrock upon which successful adoption and tangible ROI will be built.

The journey from GenAI's current early stage to realizing its true business value will undoubtedly depend on how well organizations can establish that kind of underlying foundation.

At Divyam.ai, we are solving this complex problem for our customers by creating a fully resilient, optimized, and autonomous infrastructure so that businesses can focus on innovation in their business domains!

Cut costs. Boost accuracy. Stay secure.

Smarter enterprise workflows start with Divyam.ai.

Book a Demo