Bing Search Update: Faster, More Precise Results

Microsoft Enhances Bing with LLMs and SLMs

Microsoft has announced major updates to Bing’s search infrastructure, incorporating Large Language Models (LLMs), Small Language Models (SLMs), and advanced optimization techniques.

This update aims to enhance performance while reducing costs in search result delivery.

Microsoft’s Vision for Search

In an official announcement, Microsoft stated:

“At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities. While transformer models have served us well, the growing complexity of search queries necessitated more powerful models.”

Performance Gains with SLMs

Speed and Efficiency Improvements

Using LLMs in search can lead to slow processing speeds and high costs. To address this, Bing has trained SLMs, which it says deliver roughly 100 times the throughput of LLMs.

Microsoft explains:

“LLMs can be expensive to serve and slow. To improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely.”

Integration with NVIDIA TensorRT-LLM

Bing also incorporates NVIDIA’s TensorRT-LLM, a tool designed to enhance the efficiency of SLMs by reducing computational costs and improving response times.

Impact on “Deep Search”

Enhancing Search with TensorRT-LLM

Microsoft reports that integrating TensorRT-LLM technology has significantly improved Bing’s “Deep Search” feature. Deep Search uses SLMs in real-time to generate more relevant results.

Before optimization:

  • Latency: 4.76 seconds per batch (20 queries)
  • Throughput: 4.2 queries per second per instance

After optimization with TensorRT-LLM:

  • Latency reduced by 36% to 3.03 seconds per batch
  • Throughput increased by 57% to 6.6 queries per second per instance
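As a quick sanity check, the reported percentages follow directly from the quoted batch figures (a minimal calculation, using only the numbers above):

```python
# Figures reported by Microsoft for Deep Search, per 20-query batch.
latency_before = 4.76   # seconds per batch, before TensorRT-LLM
latency_after = 3.03    # seconds per batch, after TensorRT-LLM
throughput_before = 4.2  # queries per second per instance
throughput_after = 6.6

# Percentage improvements implied by the raw numbers.
latency_reduction = (latency_before - latency_after) / latency_before * 100
throughput_gain = (throughput_after - throughput_before) / throughput_before * 100

print(f"Latency reduced by {latency_reduction:.0f}%")     # ~36%
print(f"Throughput increased by {throughput_gain:.0f}%")  # ~57%
```

Both results match the figures Microsoft cites, so the two claims are consistent with each other rather than independently measured gains.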

Microsoft emphasizes:

“Our product is built on the foundation of providing the best results, and we will not compromise on quality for speed. This is where TensorRT-LLM comes into play, reducing model inference time and, consequently, the end-to-end experience latency without sacrificing result quality.”

Benefits for Bing Users

This update offers several key benefits to users:

  • Faster search results – optimized processing leads to quicker response times.
  • Improved accuracy – SLMs provide more contextualized, relevant results.
  • Cost efficiency – lower operational costs leave room for further innovation.

Why Bing’s AI Advancements Matter

Bing’s adoption of LLMs, SLMs, and TensorRT-LLM optimizations signals a shift in search technology.

As users ask more complex questions, search engines must quickly deliver precise results. Bing’s move towards smaller, faster models aims to achieve this.

While the full impact remains to be seen, these advancements mark the beginning of a new era in search.
