Surfing the LLM Waves: Continuously Benefitting from the Ecosystem Advances

Divyam.AI
February 24, 2025


Generated with Imagen 3, with inspiration from "The Great Wave off Kanagawa".

On September 17, 2024, OpenAI announced o1-preview, which heralded the era of reasoning Large Language Models – models that not only generate output tokens auto-regressively, but also ponder over them at inference time (via intermediate thinking tokens) to ensure quality. The model enjoys a good performance rating (Artificial Analysis Performance Index: 86), but comes at a high cost (input: $15.00/mt; output: $60.00/mt, where the output tokens also include thinking tokens, and mt abbreviates million tokens). On January 20, 2025, DeepSeek R1 was announced. It delivers an even more impressive performance (Artificial Analysis Performance Index: 89) at a ~20-25x lower price (input: $0.75/mt; output: $2.40/mt on DeepInfra). Shortly thereafter, on January 31, 2025, OpenAI followed suit with o3-mini, which matches the DeepSeek R1 quality (Artificial Analysis Quality Index: 89) at an intermediate price point (input: $1.10/mt; output: $4.40/mt).

If you are an application developer who benefits from the reasoning capability, should you migrate from o1-preview to DeepSeek R1, and then again to o3-mini? In an intensely competitive field such as frontier LLMs, such potentially disruptive events tend to occur frequently – e.g., when a new frontier LLM arrives, or when a provider such as Groq slashes the cost of access. In the future, as fine-tuning becomes commoditised, we surmise such events will occur even more frequently. Irrespective of these events that cause a step-jump, the quality and price of every provider or LLM change with time (a phenomenon christened LLMflation by a16z: for the same performance, LLM inference cost gets 10x cheaper every year). This begs the question: must we migrate continuously?

The question of migration is even more nuanced. Two frontier LLMs with equivalent overall performance may perform differently on different tasks: e.g., while both o3-mini and DeepSeek R1 enjoy an equal Artificial Analysis Quality Index of 89, on the quantitative reasoning benchmark MATH-500, DeepSeek R1 scores 97% whereas o3-mini scores 92%. This makes the migration decision further contingent on the nature of the application.

An application developer thus needs a mechanism for continuous migration – one that lets her decouple the choice of provider/LLM from the application logic.

As an aside, in the world of finance, a trader would need to re-allocate her portfolio in response to (predicted) movements in the asset prices. A quantitative trader offloads this task to an algorithm.

At Divyam, we believe that an algorithmic, continuous, fractional migration is feasible – where the migration decision is offloaded to a router at a per prompt granularity.

To study the efficacy of routers, we conducted an experiment at Divyam. Specifically, we took the MT-Bench dataset, which contains 80 two-turn conversations between a human expert and an LLM. With Divyam’s evaluation harness, we replayed these conversations to both o1-preview and DeepSeek-R1-Distill-Llama-70B (input cost: $0.23/mt; output cost: $0.69/mt on DeepInfra; nearly equal performance to o1-preview on MATH-500 and HumanEval), and used o1-preview to judge the resulting responses. The prompt template for the judge follows the best practices listed in the landmark Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena paper (note that we allow the application developer to plug in her own eval instead). The result shows that 48 of the 80 conversations (60%) elicit an equivalent or better response if we choose the cheaper alternative in place of o1-preview – which amounts to slashing a $100 bill to $42.4 (~2.4x reduction) – at the expense of a slight reduction in quality on half the conversations (note that this is a function of willingness-to-pay, a knob the application developer can tune to suit her appetite for tradeoff). We present a visual summary below:

Model comparison across traffic, cost, and quality.
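To make the arithmetic concrete, here is a minimal sketch of how such a blended bill can be computed. The per-million-token prices are the ones quoted above; the per-conversation token counts and the 48/32 split are illustrative placeholders, so the printed figure approximates, rather than reproduces, the $42.4 reported above.

```python
# Blended-cost sketch: prices from the text, token counts are illustrative.

PRICES = {  # USD per million tokens: (input, output)
    "o1-preview": (15.00, 60.00),
    "deepseek-r1-distill-llama-70b": (0.23, 0.69),
}

def conversation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single conversation on the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def total_cost(routing: list[str], input_tokens: int = 500, output_tokens: int = 2_000) -> float:
    """Total USD cost when each conversation is routed per the `routing` list."""
    return sum(conversation_cost(m, input_tokens, output_tokens) for m in routing)

# 48 of the 80 MT-Bench conversations go to the cheaper model, 32 stay on o1-preview.
routed = ["deepseek-r1-distill-llama-70b"] * 48 + ["o1-preview"] * 32
baseline = total_cost(["o1-preview"] * 80)
print(f"${baseline:.2f} -> ${total_cost(routed):.2f} ({baseline / total_cost(routed):.1f}x cheaper)")
```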

This saving, however, is only an upper bound. To operationalise the insight, one needs to actually build a router. While a detailed discussion of routing algorithms is deferred to a later blogpost, we illustrate the intuition behind a conceptually simple routing algorithm: k-Nearest Neighbours (kNN). kNN builds an “atlas” of all the prompts in the application log, and remembers the quality that each LLM yielded on them. Presented with a new prompt, kNN simply places it on that atlas, looks up the k nearest neighbours around it, and routes to the LLM that yielded the highest average quality in this neighbourhood. The following figure (left panel) visualises the atlas of MT-Bench. The atlas was obtained by first embedding each prompt into a 384-dimensional space with the “all-MiniLM-L12-v2” Sentence Transformer, then projecting the embeddings onto the plane with t-SNE – a dimension-reduction algorithm – and, lastly, colouring each conversation according to the most performant LLM for it. The right panel segments the atlas as per the routing decision: if a prompt maps to a red region, the kNN router (with k=3) routes it to DeepSeek; if it falls into a green region, it goes to o1-preview.

Model performance distribution on MT-Bench.
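As an illustration, the following is a minimal sketch of such a kNN router. It assumes an application log of prompts annotated with per-LLM quality scores; the logged examples, scores, and model labels below are purely illustrative, while the embedding model matches the one used for the atlas.

```python
# kNN router sketch: embed logged prompts, then route a new prompt to the LLM with
# the best average judged quality among its k nearest logged neighbours.
# The log entries and scores below are illustrative placeholders.

import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L12-v2")  # same embedder as the atlas

history = [  # (logged prompt, judged quality per LLM)
    ("Prove that the square root of 2 is irrational.", {"deepseek-r1": 9.0, "o1-preview": 9.0}),
    ("Write a polite follow-up email to a late supplier.", {"deepseek-r1": 7.5, "o1-preview": 8.5}),
    ("Summarise the plot of Hamlet in three sentences.", {"deepseek-r1": 8.0, "o1-preview": 8.0}),
]

log_vecs = encoder.encode([p for p, _ in history], normalize_embeddings=True)

def route(prompt: str, k: int = 3) -> str:
    """Pick the LLM with the highest average quality among the k nearest logged prompts."""
    query = encoder.encode([prompt], normalize_embeddings=True)[0]
    nearest = np.argsort(log_vecs @ query)[-k:]  # cosine similarity = dot product on unit vectors
    scores = {
        model: np.mean([history[i][1][model] for i in nearest])
        for model in history[0][1]
    }
    return max(scores, key=scores.get)

print(route("Show that the sum of two even integers is even."))
```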

At Divyam, we built an agentic workflow in which agents specialising in evaluation, routing, etc. collaborate to facilitate continuous and fractional (i.e., per prompt) migration – allowing the application developer to direct 100% of her focus to application development, devoid of any distraction posed by migration. This workflow requires only a low-touch integration with the application, and can be deployed on the client’s infrastructure.


AI Strategy focused on maximizing returns on your GenAI investments

June 24, 2025

As industries across the spectrum continue to be reshaped by GenAI, embracing it with strategic, ethical, and operational foresight will determine the extent to which businesses can truly harness its transformative potential. However, GenAI faces its fair share of adoption hurdles. Organizations committed to leveraging generative AI must navigate myriad challenges while ensuring both solution efficacy and ethical application.

Let us take two of these challenges:

Ensuring Adaptability and Scalability – Given the scores of GenAI products in today's ever-evolving market, an organization is continually wondering whether it has chosen the right product. The problem of vendor lock-in looms large, as the costs of adapting and scaling are formidable. Your choice of LLM for your applications depends on a crucial balance of cost versus benefit. But this balance is not static – it needs continuous evaluation and evolution, given the fluidity of GenAI advancements.

Accuracy and hallucinations – Your organization has GenAI-based solutions, but you are continually concerned about the quality of the output you get. There are techniques to mitigate the tendency of AI models to hallucinate, such as cross-checking the results of one AI model against another, which can bring the hallucination rate to under 1%. The lengths one goes to in mitigating hallucinations depend largely on the actual use case, but it is something your AI developers need to work on continually.

These two challenges are compounded by the fact that your organization's AI skill set not only needs continual upgrading, but your skilled workforce must also keep experimenting with, and training your application against, new entrants in the GenAI market to ensure you are not losing out on the latest benefits they offer. This affects your own time to market and cost of production.

What if there were a solution that chose the best LLMs for you, fine-tuned them for your application, continuously evaluated the quality of your output, and provided excellent observability across all your GenAI use cases, all while always ensuring the best cost-benefit ratio?


Divyam.AI is the solution you are looking for.

If you are an organization that has established its inference pipelines using GenAI, Divyam can help you upgrade to the best model suited for your application. It is a solution where the best model for your use case is made accessible to you through a universal API that follows the OpenAI API standard. Divyam's model selector takes each prompt of your application through the API, works its magic, and directs it to the best LLM for that prompt.
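For a sense of how low-touch this is on the application side, here is a minimal sketch of calling an OpenAI-compatible endpoint through the standard OpenAI Python client. The base URL, API key, and the "auto" model alias below are hypothetical placeholders, not Divyam's actual endpoint or model names.

```python
# Calling a routed, OpenAI-compatible endpoint with the standard OpenAI client.
# The base URL, API key, and "auto" model alias are hypothetical placeholders.

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.com/v1",  # hypothetical router endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias: let the router choose the best LLM per prompt
    messages=[{"role": "user", "content": "Summarise the Q3 incident report in three bullets."}],
)
print(response.choices[0].message.content)
```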

It is unique because you have a leaderboard of models to choose from at a per-prompt granularity. In the absence of Divyam, you would have needed to employ data scientists to experiment and choose the best model for your application. Moreover, choosing the best model at a per-prompt granularity is a hard problem to solve; you would rather have it solved for you by a plug-and-play solution like Divyam. The LLM chosen by Divyam's fabric could be an API-based model like ChatGPT or Gemini, or a model like Llama that is self-hosted on your inference server.

If you have been running your application through Divyam, you also need not worry about fine-tuning your inference-server model. Divyam's fine tuner takes that headache off you. The fine tuner has built-in intelligence that chooses the right parameter values to tune, suited to your application patterns, and uploads the fine-tuned model back to your inference server. This continuously gives your users an evolving experience and the best performance from your inference-server model.

In cases where Divyam has chosen API-based LLMs for your application and you are wondering whether you are still at your peak cost-benefit ratio, Divyam's evaluation engine has you covered. The evaluator runs in the background and continuously A/B tests earlier, cheaper versions of the LLMs so that your application always maintains a nearly equal or greater Artificial Analysis performance index.
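As a rough illustration of such background evaluation, the sketch below replays a logged prompt against a cheaper candidate and asks a judge model to compare the two answers. The model names and the one-word verdict format are illustrative choices, not the production setup.

```python
# Background A/B evaluation sketch: compare a cheaper candidate against the incumbent
# on a logged prompt, using an LLM as the judge. Model names are illustrative.

from openai import OpenAI

client = OpenAI()

def answer(model: str, prompt: str) -> str:
    """Get a candidate answer from the given model."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge model to return exactly 'A', 'B', or 'tie'."""
    verdict = client.chat.completions.create(
        model="o1-preview",  # the judge used in the MT-Bench experiment above
        messages=[{
            "role": "user",
            "content": (
                "Compare the two answers to the question and reply with one word: A, B, or tie.\n\n"
                f"Question: {prompt}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
            ),
        }],
    )
    return verdict.choices[0].message.content.strip()

prompt = "Explain eventual consistency to a new engineer in two sentences."
print(judge(prompt, answer("gpt-4o", prompt), answer("gpt-4o-mini", prompt)))
```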

The cost-quality trade-offs of LLM usage are application specific. Different applications have different tolerances, and an organization often wants to make a choice between reach and quality. Divyam.AI provides you with a slider per application, which you can configure to strike the desired balance. You can also observe the cost savings and quality-metric improvements on our rich dashboards and compare performance. This can make all the difference between a positive and a negative RoI for the application.
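Here is a minimal sketch of what such a slider could look like under the hood: a willingness-to-pay threshold that admits a cheaper model only if its predicted quality stays within the allowed drop. The threshold semantics and the numbers are illustrative, not Divyam's actual configuration schema.

```python
# Willingness-to-pay sketch: accept a cheaper model only if its predicted quality is
# within `max_quality_drop` of the best available model. All values are illustrative.

def choose_model(quality: dict[str, float], cost: dict[str, float], max_quality_drop: float) -> str:
    """Return the cheapest model whose quality is within the allowed drop from the best."""
    best = max(quality.values())
    acceptable = [m for m, q in quality.items() if best - q <= max_quality_drop]
    return min(acceptable, key=lambda m: cost[m])

quality = {"o1-preview": 9.0, "deepseek-r1": 8.4}   # predicted quality per model
cost = {"o1-preview": 60.0, "deepseek-r1": 2.4}     # output price, USD per million tokens
print(choose_model(quality, cost, max_quality_drop=0.0))  # slider at 0: always the best model
print(choose_model(quality, cost, max_quality_drop=1.0))  # slider at 1: tolerate a small quality drop
```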


Let us come to the inevitable question of data privacy. Your organization’s data remains safe within your VPCs. Divyam is deployed within your enterprise boundaries, so your data stays put. Divyam only has control plane access to push in the latest and greatest intelligence and monitor quality so that you are always at peak performance.

Divyam.AI is also available as a hosted solution in case you want to get started with a single line of code change. 

In conclusion, Divyam.AI learns from both its global expertise and your own historical data to build a private, use-case-specific knowledge base that trims away hundreds of irrelevant model choices, automatically selects the best one, and continuously monitors live quality. If performance ever dips, it intervenes instantly to protect production, and it reruns the whole process on schedule or the moment a new model emerges. All of this happens without manual effort, so your team can stay focused on delivering core value instead of chasing model upgrades or cost savings.


Enterprise GenAI: Turning Hype into Tangible Value

April 24, 2025

Generative AI (GenAI) has moved beyond the realm of science fiction and into the strategic considerations of enterprises across industries. While the ability of GenAI to generate creative content like poems and images creates a lot of excitement, its true potential lies in its capacity to revolutionize business operations, enhance customer experiences, and drive innovation at scale. However, realizing this potential requires a pragmatic approach that moves beyond the hype and focuses on tangible value creation.   

Laying the Foundation for Successful GenAI Adoption

Implementing GenAI in an enterprise setting is not merely an IT project; it's a team sport that demands collaboration across different business functions. Business leaders must take the lead in defining the specific problems they aim to solve with GenAI, while data scientists play a crucial role in addressing data-related challenges such as accuracy, governance, and security. IT professionals are essential for implementing and maintaining the underlying infrastructure and ensuring the technology functions correctly.  

A successful GenAI strategy requires a holistic approach that considers several key factors:

Accuracy and Reliability

Enterprise GenAI deployments for mission-critical tasks demand 100% accuracy. Ensuring the reliability of GenAI outputs through techniques like grounding the models with relevant data is paramount.

Governance and Traceability

Enterprises must establish clear governance frameworks for their GenAI initiatives, including mechanisms for tracking data lineage, model development, and deployment processes. This ensures compliance with regulatory requirements and facilitates auditing.

Scalability and Performance

Enterprise-grade GenAI solutions must be able to handle large volumes of data and user requests while maintaining optimal performance. Building a scalable AI infrastructure that can adapt to evolving business needs is crucial.

Security and Privacy

Protecting sensitive enterprise data is paramount. GenAI systems must be designed with robust security measures to prevent data breaches and ensure compliance with privacy regulations.

Cost-Effectiveness and Sustainability

The economic and environmental impact of GenAI deployments must be carefully considered. Enterprises should strive for cost-effective solutions with a focus on minimizing their carbon footprint.

Fostering a Culture of Innovation for GenAI

Beyond the technological considerations, cultivating a culture of innovation is essential for unlocking the full potential of GenAI within an organization. This involves creating an environment where employees feel empowered to experiment, share bold ideas, and learn from their mistakes.

Use Cases and the Future of Enterprise GenAI

The applications of GenAI in the enterprise are rapidly expanding across various domains, including:

Customer Experience

GenAI-powered virtual assistants and chatbots can provide personalized and efficient customer support, resolve queries quickly, and even proactively identify potential issues.

Content Creation

GenAI can automate the creation of marketing materials, technical documentation, and other forms of content, freeing up human employees for more strategic tasks.

Data Analysis and Insights

GenAI can analyze large datasets to identify hidden patterns, generate actionable insights, and support better decision-making.

Software Development

GenAI tools can assist developers with code generation, testing, and debugging, accelerating the software development lifecycle.

Risk Management and Compliance

GenAI can be used to detect fraudulent activities, automate compliance processes, and improve risk assessment.

While GenAI has tremendous potential to transform enterprises, a robust and intelligent infrastructure is the bedrock upon which successful adoption and tangible ROI will be built.

The journey from GenAI's current, early stage to realizing its true business value will undoubtedly depend on how well organizations can establish that kind of underlying foundation.

At Divyam.ai, we are solving this complex problem for our customers by creating a fully resilient, optimized, and autonomous infrastructure so that businesses can focus on innovation in their business domains!

Cut costs. Boost accuracy. Stay secure.

Smarter enterprise workflows start with Divyam.ai.

Book a Demo