May 24, 2024

RAG is Dead, Again?

RAG is just one algorithmic pattern you can use. But if you make it *the* algorithm and idolize it, then you are living in a bubble you created, and the bubble will burst.
[Image: cartoon of four characters in a cemetery with graves marked "RAG"]
Han Xiao • 4 minutes read

It is hard to tell if people hate to love RAG or love to hate RAG.

According to recent discussions on X and HN, RAG should be dead, again. This time, critics are focusing on the over-engineering of most RAG frameworks, which, as @jeremyphoward @HamelHusain @Yampeleg demonstrated, could be accomplished with 20 lines of Python code.
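The "20 lines of Python" claim is roughly right for a bare pipeline. Below is an illustrative sketch using a toy bag-of-words retriever; `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, and none of this is any particular framework's code:

```python
# Minimal RAG sketch, stdlib only. The "embedding" is a toy bag-of-words
# counter; real systems would use a proper embedding model.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag(query: str, docs: list[str], call_llm) -> str:
    """Feed-forward only: retrieve, stuff into a prompt, generate."""
    context = "\n".join(retrieve(query, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
    return call_llm(prompt)
```

Note that this is exactly the "feed-forward only" shape criticized below: nothing in it learns from whether the answer was any good.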

The last time we had this vibe was shortly after the release of Claude/Gemini with a super long context window. What makes this time worse is that even Google's RAG generates funny results, as @icreatelife @mark_riedl showed, which is ironic because back in April, at Google Next in Las Vegas, Google presented RAG as the grounding solution.

Two problems of RAG

I see two problems with the RAG frameworks and solutions we have today.

Feed-forward only

First, nearly all RAG frameworks implement only a "feed-forward" path and lack a "back-propagation" path, making them incomplete systems. I remember @swyx, in one episode of @latentspacepod, arguing that RAG will not be killed by the long context windows of LLMs since:

  1. long context is expensive for devs and
  2. long context is hard to debug and lacks decomposability.

But if all RAG frameworks focus only on the forward path, how is it any easier to debug than a long-context LLM? It is also interesting how many people get overexcited by the auto-magical results of RAG on some random POCs and completely forget that adding more forward layers without backward tuning is a terrible idea. We all know that adding one more layer to a neural network expands its parametric space and hence its representational capacity, enabling it to do potentially more things; but without training, this means nothing. There are quite a few startups in the Bay Area working on evaluation, essentially trying to evaluate the loss of a feed-forward system. Is it useful? Yes. But does it help close the loop of RAG? No.

So who is working on the back-propagation of RAG? As far as I know, not many. I am mostly familiar with DSPy, a library from @stanfordnlp @lateinteraction that has set its mission on exactly that.

GitHub - stanfordnlp/dspy: DSPy: The framework for programming—not prompting—foundation models
But even for DSPy, the main focus is on optimizing few-shot demonstrations, not the full system (or at least, that is how the community uses it). But why is this problem difficult? Because the signal is very sparse, and optimizing a non-differentiable pipeline system is essentially a combinatorial problem; in other words, extremely hard. I learned some submodular optimization during my PhD, and I have a feeling that this technique will be put to good use in RAG optimization.
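To make the combinatorial framing concrete, here is a hypothetical sketch: "tuning" a non-differentiable RAG pipeline reduces to a discrete search over configurations, scored by an end-to-end metric on a small eval set. The knobs below (chunk size, top-k, reranking) are made up for illustration:

```python
# Illustrative only: with no gradients, pipeline "training" degenerates into
# searching a discrete configuration space against an end-to-end score.
from itertools import product

SEARCH_SPACE = {
    "chunk_size": [128, 256, 512],
    "top_k": [1, 3, 5],
    "rerank": [False, True],
}

def grid_search(score_fn):
    """Exhaustive search over all configs; real pipelines need smarter
    strategies (e.g. submodular or Bayesian optimization) because the
    space explodes combinatorially and each evaluation is expensive."""
    keys = list(SEARCH_SPACE)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(SEARCH_SPACE[k] for k in keys)):
        cfg = dict(zip(keys, values))
        s = score_fn(cfg)  # run the pipeline on an eval set, return a metric
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score
```

Even this toy space has 3 × 3 × 2 = 18 configurations; a realistic pipeline with prompts, models, and chunking strategies quickly reaches millions, which is why sparse signal plus combinatorial structure makes the problem hard.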

Grounding in the wild is hard

I do agree that RAG is for grounding, despite the funny search results from Google. There are two types of grounding: search grounding, which uses search engines to extend the world knowledge of LLMs, and check grounding, which uses private knowledge (e.g. proprietary data) to do fact-checking.

In both cases, RAG cites external knowledge to improve the factuality of the result, provided that these external resources are trustworthy. Google's funny search results make it easy to see that not everything on the web is trustworthy (yeah, big surprise, who would have thought!), which makes search grounding look bad. But I believe you can only laugh at it for now. There are implicit feedback mechanisms behind the Google Search UI that collect users' reactions to those results and weight the credibility of each website for better grounding. In general, this should be temporary: the system just needs to get past the cold start, and results will improve over time.
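A hypothetical sketch of such an implicit-feedback loop (the class name and update rule are mine, not Google's): per-source credibility scores are nudged by user reactions and then multiplied into the retrieval relevance:

```python
# Hypothetical illustration of credibility weighting from implicit feedback.
class CredibilityTracker:
    def __init__(self, lr: float = 0.1):
        self.scores: dict[str, float] = {}  # domain -> credibility in [0, 1]
        self.lr = lr

    def feedback(self, domain: str, positive: bool) -> None:
        """Move the score toward 1 on positive signals, toward 0 on negative."""
        s = self.scores.get(domain, 0.5)  # unknown sources start neutral
        target = 1.0 if positive else 0.0
        self.scores[domain] = s + self.lr * (target - s)

    def weight(self, domain: str, relevance: float) -> float:
        """Final ranking score = retrieval relevance x source credibility."""
        return relevance * self.scores.get(domain, 0.5)
```

The point is not the exact update rule but the loop: user reactions flow back into retrieval, which is exactly the "back-propagation" path most RAG frameworks lack.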

[Image: diagram of the search process with "Search Grounding," "Private Knowledge," and "Check Grounding" blocks]
Two types of grounding that inspire Jina Reader

RAG was presented as a grounding solution at the Google Next conference.

My Take

RAG is neither dead nor alive; so stop arguing about it. RAG is just one algorithmic pattern you can use. But if you make it *the* algorithm and idolize it, then you are living in a bubble you created, and the bubble will burst.

Categories:
Opinion