Microsoft Enhances Bing with LLMs and SLMs
Microsoft has announced major updates to Bing’s search infrastructure, incorporating Large Language Models (LLMs), Small Language Models (SLMs), and advanced optimization techniques.
The update aims to deliver faster, more relevant search results while lowering the cost of serving them.
Microsoft’s Vision for Search
In an official announcement, Microsoft stated:
“At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities. While transformer models have served us well, the growing complexity of search queries necessitated more powerful models.”
Performance Gains with SLMs
Speed and Efficiency Improvements
Serving LLMs at search scale is slow and expensive. To address this, Bing has trained SLMs that Microsoft says deliver roughly 100 times the throughput of its LLMs.
Microsoft explains:
“LLMs can be expensive to serve and slow. To improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely.”
Integration with NVIDIA TensorRT-LLM
Bing also incorporates NVIDIA's TensorRT-LLM, an open-source library that optimizes model inference on NVIDIA GPUs, reducing computational costs and improving response times for its SLMs.
Impact on “Deep Search”
Enhancing Search with TensorRT-LLM
Microsoft reports that integrating TensorRT-LLM has significantly improved Bing’s “Deep Search” feature, which uses SLMs in real time to generate more relevant results.
Before optimization:
- Latency: 4.76 seconds per batch (20 queries)
- Throughput: 4.2 queries per second per instance
After optimization with TensorRT-LLM:
- Latency reduced by 36% to 3.03 seconds per batch
- Throughput increased by 57% to 6.6 queries per second per instance
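The reported figures are internally consistent: per-instance throughput is simply the batch size divided by the batch latency. A quick sketch, assuming the 20-query batch stated above (the variable names are illustrative, not Microsoft's):

```python
# Sanity-check the reported Deep Search numbers: with a batch of
# 20 queries, per-instance throughput = batch_size / batch latency.
BATCH_SIZE = 20

def throughput(batch_latency_s: float) -> float:
    """Queries per second per instance for one batch."""
    return BATCH_SIZE / batch_latency_s

before_latency, after_latency = 4.76, 3.03  # seconds per batch

latency_drop = (before_latency - after_latency) / before_latency
throughput_gain = throughput(after_latency) / throughput(before_latency) - 1

print(f"throughput before: {throughput(before_latency):.1f} q/s")  # ~4.2
print(f"throughput after:  {throughput(after_latency):.1f} q/s")   # ~6.6
print(f"latency reduced by {latency_drop:.0%}")                    # ~36%
print(f"throughput up by   {throughput_gain:.0%}")                 # ~57%
```

Plugging in the published latencies reproduces both headline percentages, so the 36% latency cut and 57% throughput gain are two views of the same optimization.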
Microsoft emphasizes:
“Our product is built on the foundation of providing the best results, and we will not compromise on quality for speed. This is where TensorRT-LLM comes into play, reducing model inference time and, consequently, the end-to-end experience latency without sacrificing result quality.”
Benefits for Bing Users
This update offers several key benefits to users:
✅ Faster search results – Optimized processing leads to quicker response times.
✅ Improved accuracy – SLMs provide more contextualized and relevant results.
✅ Cost efficiency – Lower operational costs allow for further innovation.
Why Bing’s AI Advancements Matter
Bing’s adoption of LLMs, SLMs, and TensorRT-LLM optimizations signals a shift in search technology.
As users ask more complex questions, search engines must quickly deliver precise results. Bing’s move towards smaller, faster models aims to achieve this.
While the full impact remains to be seen, these advancements mark the beginning of a new era in search.


