When implementing DeepResearch, there are at least two places where you need to generate diverse queries. First, you must generate web search queries based on user input (directly throwing user input into the search engine is not a good idea). Second, many DeepResearch systems include a "research planner" that breaks down the original problem into subproblems, calls agents to solve them independently, and then merges their results. Whether generating search queries or subproblems, our expectations remain the same: they must be relevant to the original input and diverse enough to provide unique perspectives on it. Often, we need to limit the number of queries to save money on unnecessary search engine requests or LLM tokens.
While understanding the importance of diverse query generation, most open-source DeepResearch implementations don't take this optimization seriously. They just hardcode these constraints directly into the prompt. Some might ask the LLM for an additional turn to evaluate and diversify the queries. Here's an example of how most implementations basically approach this:

gemini-2.5-flash
as the LLM and the original query is "embeddings and rerankers"
.In this article, I want to demonstrate a more rigorous approach to diverse query generation using sentence embeddings and submodular optimization. Back in my Ph.D. days, submodular optimization was one of my favorites along with L-BFGS. I'll show how to apply it for generating an optimal set of diverse queries under a cardinality constraint, which can significantly improve the overall quality of DeepResearch systems.
tagQuery Generation via Prompting
First, we want to check if prompting is an effective approach for generating diverse queries. We also want to understand if a sophisticated prompt is more effective than a simple prompt. Let's run an experiment comparing the two prompts below to find out:
You are an expert at generating diverse search queries. Given any input topic, generate {num_queries} different search queries that explore various angles and aspects of the topic.
Simple prompt
You are an expert research strategist. Generate an optimal set of diverse search queries that maximizes information coverage while minimizing redundancy.
Task: Create exactly {num_queries} search queries from any given input that satisfy:
- Relevance: Each query must be semantically related to the original input
- Diversity: Each query should explore a unique facet with minimal overlap
- Coverage: Together, the queries should comprehensively address the topic
Process:
1. Decomposition: Break down the input into core concepts and dimensions
2. Perspective Mapping: Identify distinct angles (theoretical, practical, historical, comparative, etc.)
3. Query Formulation: Craft specific, searchable queries for each perspective
4. Diversity Check: Ensure minimal semantic overlap between queries
Structured prompt
We use gemini-2.5-flash
as the LLM with the original query "embeddings and rerankers"
and test both the simple and structured prompts to iteratively generate from one to 20 queries. We then use jina-embeddings-v3 with the text-matching
task to measure sentence similarity between the original query and the generated queries, as well as similarity within the generated queries themselves. Here are the visualizations.

Look at the two plots on the right side, one can see that both simple and structured prompt exhibit large variance in cosine similarity scores, with many reaching 0.7-0.8 similarity, suggesting that some generated queries are nearly identical. Besides, both methods struggle to maintain diversity as more queries are generated. Instead of seeing a clear downward trend in similarity with increasing query count, we observe relatively stable (and high) similarity levels, indicating that additional queries often duplicate existing perspectives.
One explanation is what Wang et al. (2025) found that LLMs often reflect opinions of dominant groups disproportionately, even with prompt steering, indicating a bias towards common perspectives. This is because the LLM’s training data may over-represent certain viewpoints, causing the model to generate variations that align with these prevalent perspectives. Abe et al. (2025) also found that LLM-based query expansion favors popular interpretations while overlooking others. For instance, "What are the benefits of AI?" might yield common benefits like automation, efficiency, ethicalness but miss less obvious ones like drug discovery.


tagProblem Formulation
One might think our previous experiment is inconclusive and that we should improve the prompt and try again. While prompting can certainly change the results to some degree, what's more important is that we've learned something: simply increasing the number of generated queries makes us more likely to get diverse queries. The bad news is that we're also getting bunch of duplicated queries as side-product.
But since it's cheap to generate a large number of queries, which eventually yields some good ones, why don't we treat this as a subset selection problem?
In math, here is how we can formulate this problem: given an original input , a set of candidate queries generated by an LLM using prompt engineering. Select a subset of queries that maximizes coverage while minimizing redundancy.
Unfortunately, finding the optimal subset of queries from candidates requires checking combinations - exponential complexity. For just 20 candidates and , that's 15,504 combinations.
tagSubmodular Function
Before trying to brutally solve the subset selection problem, let me introduce the term submodularity and submodular function to the readers. They may sound unfamiliar to many, but you may well have heard of the idea of "diminishing returns" - well, submodularity is the mathematical representation of that.
Consider placing Wi-Fi routers to provide internet coverage in a large building. The first router you install gives tremendous value - it covers a significant area that previously had no coverage. The second router adds substantial value too, but some of its coverage area overlaps with the first router, so the marginal benefit is less than the first. As you continue adding routers, each additional router covers less and less new area because most spaces are already covered by existing routers. Eventually, the 10th router might provide very little additional coverage since the building is already well-covered.
This intuition captures the essence of submodularity. Mathematically, a set function is submodular if for all and any element :
In plain English: adding an element to a smaller set gives at least as much benefit as adding the same element to a larger set that contains the smaller set.
Now let's apply this concept to our query generation problem. One may immediately realize that query selection exhibits natural diminishing returns:
- The first query we select covers entirely new semantic space
- The second query should cover different aspects, but some overlap is inevitable
- As we add more queries, each additional query covers less and less new ground

tagEmbedding-Based Submodular Function Design
Let be the embedding vector for query , obtained using a sentence embedding model (e.g., jina-embeddings-v3). There are two main approaches to design our objective function:
tagApproach 1: Facility Location (Coverage-Based)
This function measures how well the selected set "covers" all candidate queries, where:
- is the cosine similarity
- ensures relevance to the original query
- measures coverage of candidate by selected set
One caveat is this function only implicitly encourages diversity. It doesn't explicitly penalize similarity within the selected set . Diversity emerges because selecting similar queries provides diminishing coverage returns.
tagApproach 2: Explicit Coverage + Diversity
For more direct control over diversity, we can combine coverage and an explicit diversity term:
where the diversity component can be formulated as:
This diversity term measures the total similarity between selected queries and unselected queries - it's maximized when we select queries that are different from the remaining candidates (a form of graph cut function).
tagDifference Between Two Approaches
Both formulations maintain submodularity.
Facility location function is a well-known submodular function. It exhibits submodularity because of the max operation: when we add a new query to our selected set, each candidate query gets covered by the "best" query in our set (the one with highest similarity). Adding to a smaller set is more likely to improve the coverage of various candidates than adding it to a larger set where many candidates are already well-covered.
In graph cut diversity function, the diversity term is submodular because it measures the "cut" between selected and unselected sets. Adding a new query to a smaller selected set creates more new connections to unselected queries than adding it to a larger selected set.
The facility location approach relies on implicit diversity through coverage competition, while the explicit approach directly measures and optimizes for diversity. So both are valid, but the explicit approach gives you more direct control over the relevance-diversity tradeoff.
tagImplementations
The full implementation can be found here on Github.
Since our function is submodular, we can use the greedy algorithm which provides a approximation guarantee:
Here's the code for optimizing Facility Location (coverage-based) - the one with implicit diversity.
def greedy_query_selection(candidates, embeddings, original_embedding, k, alpha=0.3):
"""
Greedy algorithm for submodular query selection
Args:
candidates: List of candidate query strings
embeddings: Matrix of query embeddings (n x d)
original_embedding: Embedding of original query (d,)
k: Number of queries to select
alpha: Relevance weight parameter
"""
n = len(candidates)
selected = []
remaining = set(range(n))
# Precompute relevance scores
relevance_scores = cosine_similarity(original_embedding, embeddings)
for _ in range(k):
best_gain = -float('inf')
best_query = None
for i in remaining:
# Calculate marginal gain of adding query i
gain = compute_marginal_gain(i, selected, embeddings,
relevance_scores, alpha)
if gain > best_gain:
best_gain = gain
best_query = i
if best_query is not None:
selected.append(best_query)
remaining.remove(best_query)
return [candidates[i] for i in selected]
def compute_marginal_gain(new_idx, selected, embeddings, relevance_scores, alpha):
"""Compute marginal gain of adding new_idx to selected set"""
if not selected:
# First query: gain is sum of all relevance scores
return sum(max(alpha * relevance_scores[j],
cosine_similarity(embeddings[new_idx], embeddings[j]))
for j in range(len(embeddings)))
# Compute current coverage
current_coverage = [
max([alpha * relevance_scores[j]] +
[cosine_similarity(embeddings[s], embeddings[j]) for s in selected])
for j in range(len(embeddings))
]
# Compute new coverage with additional query
new_coverage = [
max(current_coverage[j],
cosine_similarity(embeddings[new_idx], embeddings[j]))
for j in range(len(embeddings))
]
return sum(new_coverage) - sum(current_coverage)
The balance parameter alpha
controls the tradeoff between relevance and diversity:
- High (e.g., 0.8): Prioritizes relevance to original query, may sacrifice diversity
- Low (e.g., 0.2): Prioritizes diversity among selected queries, may drift from original intent
- Moderate (e.g., 0.4-0.6): Balanced approach, often works well in practice
tagLazy Greedy Algorithm
One may notice in the above code:
for i in remaining:
# Calculate marginal gain of adding query i
gain = compute_marginal_gain(i, selected, embeddings,
relevance_scores, alpha)
We are computing the marginal gain for all remaining candidates at each iteration. We can do better than this.
The lazy greedy algorithm is a clever optimization that exploits submodularity to avoid unnecessary computations. The key insight is: if element A had higher marginal gain than element B in iteration , then A will still have higher marginal gain than B in iteration (due to submodularity property).
import heapq
def lazy_greedy_query_selection(candidates, embeddings, original_embedding, k, alpha=0.3):
"""
Lazy greedy algorithm for submodular query selection
More efficient than standard greedy by avoiding unnecessary marginal gain computations
"""
n = len(candidates)
selected = []
# Precompute relevance scores
relevance_scores = cosine_similarity(original_embedding, embeddings)
# Initialize priority queue: (-marginal_gain, last_updated, query_index)
# Use negative gain because heapq is a min-heap
pq = []
for i in range(n):
gain = compute_marginal_gain(i, [], embeddings, relevance_scores, alpha)
heapq.heappush(pq, (-gain, 0, i))
for iteration in range(k):
while True:
neg_gain, last_updated, best_idx = heapq.heappop(pq)
# If this gain was computed in current iteration, it's definitely the best
if last_updated == iteration:
selected.append(best_idx)
break
# Otherwise, recompute the marginal gain
current_gain = compute_marginal_gain(best_idx, selected, embeddings,
relevance_scores, alpha)
heapq.heappush(pq, (-current_gain, iteration, best_idx))
return [candidates[i] for i in selected]
Lazy greedy works like this:
- Maintain a priority queue of elements sorted by their marginal gains
- Only recompute the marginal gain of the top element
- If it's still the highest after recomputation, select it
- Otherwise, reinsert it in correct position and check the next top element
This can provide significant speedups because we avoid recomputing marginal gains for elements that clearly won't be selected.
tagResults
Let's run the experiment again. We use the same simple prompt to generate one to 20 diverse queries and perform the same cosine similarity measurements as before. For submodular optimization, we select queries from the 20 generated candidates using different values of k and measure similarity as before. The results show that queries selected through submodular optimization are more diverse and show lower in-set similarity.

"embeddings and rerankers"

"generative ai"

"geopolitics USA and China"

"google 2025 revenue breakdown"
tagFinal Question: Why Submodular Formulation Matters
You might wonder: why go through the trouble of formulating this as a submodular optimization problem? Why not just use heuristics or other optimization approaches?
In short, submodular formulation transforms an ad-hoc "select diverse queries" heuristic into a rigorous optimization problem with provable guarantees, efficient algorithms, and measurable objectives.
tagGuaranteed Efficiency
Once we prove our objective function is submodular, we get powerful theoretical guarantees and an efficient algorithm. The greedy algorithm that runs in time comparing to checking combinations achieves a approximation to the optimal solution. This means our greedy solution is always at least 63% as good as the best possible solution. No heuristic can promise this.
Moreover, the lazy greedy algorithm is dramatically faster in practice due to the mathematical structure of submodular functions. The speedup comes from diminishing returns: elements that were poor choices in earlier iterations are unlikely to become good choices later. So instead of checking all candidates, lazy greedy typically only needs to recompute gains for the top few candidates.
tagNo Need for Hand-Crafted Heuristics
Without a principled framework, you might resort to ad-hoc rules like "ensure queries have cosine similarity < 0.7" or "balance different keyword categories." These rules are hard to tune and don't generalize. Submodular optimization gives you a principled, mathematically grounded approach. You can tune hyperparameters systematically using validation sets, and monitor solution quality in production systems. When the system produces poor results, you have clear metrics to debug what went wrong.
Finally, submodular optimization is a well-studied field with decades of research, allowing you to leverage advanced algorithms beyond greedy (like accelerated greedy or local search), theoretical insights about when certain formulations work best, and extensions to handle additional constraints such as budget limits or fairness requirements.

For those who are interested in submodular optimization, I recommend this site to learn more.