January 23, 2024

Does Subspace Cosine Similarity Imply High-Dimensional Cosine Similarity?

Does high similarity in a subspace assure high overall similarity between vectors? This post examines the theory and practical implications of subspace similarity.
Han Xiao • 9 minutes read
💡
On Jan. 25, 2024, OpenAI released a new embedding model with a new feature called "shortening", which allows developers to trim embeddings—essentially cutting numbers from the sequence's end—without compromising the embedding's ability to represent concepts effectively. Dive into this post for a solid theoretical foundation on the viability and rationale behind this innovation.
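The idea behind shortening is easy to sketch in plain NumPy, independently of any particular provider's API: keep only the first k dimensions of an embedding and re-normalize before computing cosine similarity. The dimensions below (768 and 256) and the random vectors are purely illustrative.

import numpy as np

def shorten(embedding: np.ndarray, k: int) -> np.ndarray:
    # Keep the first k dimensions and re-normalize to unit length
    truncated = embedding[:k]
    return truncated / np.linalg.norm(truncated)

# Random stand-ins for real embeddings, for illustration only
rng = np.random.default_rng(0)
a, b = rng.normal(size=768), rng.normal(size=768)

full_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
short_sim = np.dot(shorten(a, 256), shorten(b, 256))
print(f'cosine in R^768: {full_sim:.4f}, cosine in first 256 dims: {short_sim:.4f}')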

Consider this: when measuring the cosine similarity of embedding vectors in high-dimensional spaces, what does their similarity in lower-dimensional subspaces tell us about their overall similarity? Is there a direct, proportional relationship, or is the reality more complex with high-dimensional data?

More concretely, does high similarity between vectors in their first 256 dimensions assure a high similarity in their full 768 dimensions? Conversely, if vectors significantly differ in some dimensions, does this spell a low overall similarity? These aren't mere theoretical musings; they are crucial considerations for efficient vector retrieval, database indexing, and the performance of RAG systems.

Developers often rely on heuristics, assuming that high subspace similarity equates to high overall similarity or that notable differences in one dimension significantly affect the overall similarity. The question is: are these heuristic methods built on firm theoretical ground, or are they simply assumptions of convenience?

This post delves into these questions, examining the theory and practical implications of subspace similarity in relation to overall vector similarity.

Bounding the Cosine Similarity

Given vectors $\mathbf{A}, \mathbf{B} \in \mathbb{R}^d$, we decompose them as $\mathbf{A}=[\mathbf{A}_1, \mathbf{A}_2]$ and $\mathbf{B}=[\mathbf{B}_1, \mathbf{B}_2]$, where $\mathbf{A}_1, \mathbf{B}_1 \in \mathbb{R}^m$ and $\mathbf{A}_2, \mathbf{B}_2 \in \mathbb{R}^n$, with $m + n = d$.

The cosine similarity in the subspace $\mathbb{R}^m$ is given by $\cos(\mathbf{A}_1, \mathbf{B}_1)=\frac{\mathbf{A}_1\cdot\mathbf{B}_1}{\|\mathbf{A}_1\|\|\mathbf{B}_1\|}$; similarly, the similarity in the subspace $\mathbb{R}^n$ is $\cos(\mathbf{A}_2, \mathbf{B}_2)=\frac{\mathbf{A}_2\cdot\mathbf{B}_2}{\|\mathbf{A}_2\|\|\mathbf{B}_2\|}$.

In the original space $\mathbb{R}^d$, the cosine similarity is defined as:

$$\begin{align*}
\cos(\mathbf{A},\mathbf{B})&=\frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\|\mathbf{B}\|}\\
&=\frac{\mathbf{A}_1\cdot\mathbf{B}_1+\mathbf{A}_2\cdot\mathbf{B}_2}{\sqrt{\|\mathbf{A}_1\|^2+\|\mathbf{A}_2\|^2}\sqrt{\|\mathbf{B}_1\|^2+\|\mathbf{B}_2\|^2}}\\
&=\frac{\cos(\mathbf{A}_1, \mathbf{B}_1)\|\mathbf{A}_1\|\|\mathbf{B}_1\|+\cos(\mathbf{A}_2, \mathbf{B}_2)\|\mathbf{A}_2\|\|\mathbf{B}_2\|}{\sqrt{\|\mathbf{A}_1\|^2+\|\mathbf{A}_2\|^2}\sqrt{\|\mathbf{B}_1\|^2+\|\mathbf{B}_2\|^2}}
\end{align*}$$

Now, let $s := \max(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2))$. Then, we have:

$$\begin{align*}
\cos(\mathbf{A},\mathbf{B})&\leq\frac{s\|\mathbf{A}_1\|\|\mathbf{B}_1\|+s\|\mathbf{A}_2\|\|\mathbf{B}_2\|}{\sqrt{\|\mathbf{A}_1\|^2+\|\mathbf{A}_2\|^2}\sqrt{\|\mathbf{B}_1\|^2+\|\mathbf{B}_2\|^2}}\\
&=\frac{\|\mathbf{A}_1\|\|\mathbf{B}_1\|+\|\mathbf{A}_2\|\|\mathbf{B}_2\|}{\sqrt{\|\mathbf{A}_1\|^2+\|\mathbf{A}_2\|^2}\sqrt{\|\mathbf{B}_1\|^2+\|\mathbf{B}_2\|^2}}\cdot s\\
&=\cos(\underbrace{[\|\mathbf{A}_1\|, \|\mathbf{A}_2\|]}_{\mathbb{R}^2}, \underbrace{[\|\mathbf{B}_1\|, \|\mathbf{B}_2\|]}_{\mathbb{R}^2})\cdot s\\
&\leq 1\cdot s \\
&= \max(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2))
\end{align*}$$

End of proof.

Note that in the final step of the proof, we use the fact that cosine similarity is always less than or equal to 1. This forms our upper bound. Similarly, we can show that the lower bound of $\cos(\mathbf{A},\mathbf{B})$ is given by:

$$\cos(\mathbf{A},\mathbf{B}) \geq t \cdot \cos([\|\mathbf{A}_1\|, \|\mathbf{A}_2\|], [\|\mathbf{B}_1\|, \|\mathbf{B}_2\|]),$$ where $t := \min(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2))$.

Note that for the lower bound, we cannot hastily conclude that $\cos(\mathbf{A},\mathbf{B}) \geq t$. The scaling factor $\cos([\|\mathbf{A}_1\|, \|\mathbf{A}_2\|], [\|\mathbf{B}_1\|, \|\mathbf{B}_2\|])$ lies in $[0, 1]$ because the norm vectors are nonnegative, so multiplying a positive $t$ by it can only pull the value down toward zero. Without this scaling factor, the only lower bound we can state is the trivial $-1$ from the range of the cosine function, $[-1, 1]$.

So in conclusion, we have the following loose bound:

$$-1 \leq \cos(\mathbf{A},\mathbf{B}) \leq \max(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2)),$$

and a tighter bound $\gamma \cdot t \leq \cos(\mathbf{A}, \mathbf{B}) \leq \gamma \cdot s$, i.e.

$$\gamma \cdot \min(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2)) \leq \cos(\mathbf{A}, \mathbf{B}) \leq \gamma \cdot \max(\cos(\mathbf{A}_1, \mathbf{B}_1), \cos(\mathbf{A}_2, \mathbf{B}_2)),$$

where $\gamma = \cos([\|\mathbf{A}_1\|, \|\mathbf{A}_2\|], [\|\mathbf{B}_1\|, \|\mathbf{B}_2\|])$.
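Before moving on, a quick sanity check with concrete numbers can make these bounds tangible. The following minimal sketch uses an arbitrary pair of 4-dimensional vectors, split into two 2-dimensional subspaces, and verifies $\gamma \cdot t \leq \cos(\mathbf{A}, \mathbf{B}) \leq \gamma \cdot s$ for that pair:

import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Arbitrary illustrative vectors in R^4, split into two R^2 subspaces
A = np.array([1.0, 2.0, 3.0, 4.0])
B = np.array([2.0, 1.0, 1.0, 1.0])
A1, A2, B1, B2 = A[:2], A[2:], B[:2], B[2:]

s = max(cos(A1, B1), cos(A2, B2))
t = min(cos(A1, B1), cos(A2, B2))
gamma = cos(np.array([np.linalg.norm(A1), np.linalg.norm(A2)]),
            np.array([np.linalg.norm(B1), np.linalg.norm(B2)]))

print(gamma * t <= cos(A, B) <= gamma * s)  # True for this pair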

Connection to Johnson–Lindenstrauss Lemma

The JL lemma asserts that for any $0 < \epsilon < 1$ and any finite set of points $S$ in $\mathbb{R}^d$, there exists a mapping $f: \mathbb{R}^d \rightarrow \mathbb{R}^k$ (with $k = O(\epsilon^{-2} \log |S|)$) such that for all $\mathbf{u}, \mathbf{v} \in S$, the Euclidean distances are approximately preserved:

$$(1 - \epsilon) \|\mathbf{u} - \mathbf{v}\|^2 \leq \|f(\mathbf{u}) - f(\mathbf{v})\|^2 \leq (1 + \epsilon) \|\mathbf{u} - \mathbf{v}\|^2$$
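For intuition about what the lemma promises, here is a small numerical sketch, with arbitrary choices of $d$, $k$, and the number of points: it projects random points through a Gaussian matrix and inspects how much the pairwise squared distances get distorted.

import numpy as np

rng = np.random.default_rng(42)
d, k, n = 1024, 128, 50
X = rng.normal(size=(n, d))

# Random Gaussian projection, scaled so squared distances are preserved in expectation
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P

def pairwise_sq_dists(M):
    diff = M[:, None, :] - M[None, :, :]
    return np.sum(diff ** 2, axis=-1)

orig = pairwise_sq_dists(X)
proj = pairwise_sq_dists(Y)
mask = ~np.eye(n, dtype=bool)
ratio = proj[mask] / orig[mask]
print(f'distortion of squared distances: min {ratio.min():.3f}, max {ratio.max():.3f}')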

Johnson–Lindenstrauss lemma - Wikipedia

To make $f$ work like a subspace selection, we can use a selection matrix (a partial identity) for the projection, such as a $5 \times 3$ matrix $f$, albeit not a random one (note that the typical formulation of the JL lemma involves linear transformations that often use random matrices drawn from a Gaussian distribution). For instance, if we aim to retain the 1st, 3rd, and 5th dimensions of a 5-dimensional vector space, the matrix $f$ could be designed as follows:

$$f = \begin{bmatrix}1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1\end{bmatrix}$$
However, by restricting $f$ to such a selection matrix, we limit the class of functions that can be used for the projection. The JL lemma guarantees the existence of a suitable $f$ within the broader class of linear transformations, but when we restrict $f$ in this way, a suitable $f$ may not exist within the restricted class, so the JL lemma's bounds no longer apply.
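To make it concrete that this selection matrix is nothing more than slicing out dimensions, here is a tiny sketch (the values of x are made-up illustrations):

import numpy as np

# The 5x3 selection matrix from above
f = np.array([[1, 0, 0],
              [0, 0, 0],
              [0, 1, 0],
              [0, 0, 0],
              [0, 0, 1]], dtype=float)

x = np.array([0.5, -1.2, 3.0, 0.7, -0.4])
print(x @ f)          # [ 0.5  3.  -0.4]
print(x[[0, 2, 4]])   # the same: projecting with f is just slicing dimensions 1, 3, 5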

Validating the Bounds

To empirically explore the theoretical bounds on cosine similarity in high-dimensional vector spaces, we can employ a Monte Carlo simulation. This method allows us to generate a large number of random vector pairs, compute their similarities in both the original space and subspaces, and then assess how well the theoretical upper and lower bounds hold in practice.

The following Python code snippet implements this concept. It randomly generates pairs of vectors in a high-dimensional space and computes their cosine similarity. Then, it divides each vector into two subspaces, calculates the cosine similarity within each subspace, and evaluates the upper and lower bounds of the full-dimensional cosine similarity based on the subspace similarities.

import numpy as np


def compute_cosine_similarity(U, V):
    # Normalize the rows to unit vectors
    U_norm = U / np.linalg.norm(U, axis=1, keepdims=True)
    V_norm = V / np.linalg.norm(V, axis=1, keepdims=True)
    # Compute cosine similarity between corresponding rows of U and V
    return np.sum(U_norm * V_norm, axis=1)


# Generate random data
num_points = 5000
d = 1024
A = np.random.random([num_points, d])
B = np.random.random([num_points, d])

# Compute cosine similarity between A and B
cos_sim = compute_cosine_similarity(A, B)

# randomly divide A and B into subspaces
m = np.random.randint(1, d)
A1 = A[:, :m]
A2 = A[:, m:]
B1 = B[:, :m]
B2 = B[:, m:]

# Compute cosine similarity in subspaces
cos_sim1 = compute_cosine_similarity(A1, B1)
cos_sim2 = compute_cosine_similarity(A2, B2)

# Find the element-wise maximum and minimum of cos_sim1 and cos_sim2
s = np.maximum(cos_sim1, cos_sim2)
t = np.minimum(cos_sim1, cos_sim2)

norm_A1 = np.linalg.norm(A1, axis=1)
norm_A2 = np.linalg.norm(A2, axis=1)
norm_B1 = np.linalg.norm(B1, axis=1)
norm_B2 = np.linalg.norm(B2, axis=1)

# Form new vectors in R^2 from the norms
norm_A_vectors = np.stack((norm_A1, norm_A2), axis=1)
norm_B_vectors = np.stack((norm_B1, norm_B2), axis=1)

# Compute cosine similarity in R^2
gamma = compute_cosine_similarity(norm_A_vectors, norm_B_vectors)

# print some info and validate the lower bound and upper bound
print('d: %d\n'
      'm: %d\n'
      'n: %d\n'
      'avg. cosine(A,B): %f\n'
      'avg. upper bound: %f\n'
      'avg. lower bound: %f\n'
      'lower bound satisfied: %s\n'
      'upper bound satisfied: %s' % (
          d, m, (d - m), np.mean(cos_sim), np.mean(s), np.mean(gamma * t), np.all(s >= cos_sim),
          np.all(gamma * t <= cos_sim)))

A Monte Carlo validator for the cosine similarity bounds

d: 1024
m: 743
n: 281
avg. cosine(A,B): 0.750096
avg. upper bound: 0.759080
avg. lower bound: 0.741200
lower bound satisfied: True
upper bound satisfied: True

A sample output from our Monte Carlo validator. It's important to note that the lower/upper bound satisfied condition is checked for every vector pair individually. Meanwhile, the avg. lower/upper bound provides a more intuitive overview of the statistics related to these bounds but doesn't directly influence the validation process.

Understanding the Bounds

In a nutshell, when comparing two high-dimensional vectors, the overall similarity lies between the best and worst similarities of their subspaces, adjusted for how large or important those subspaces are in the overall scheme. This is what the bounds for cosine similarity in higher dimensions intuitively represent: the balance between the most and least similar parts, weighted by their relative sizes or importance.

Each pen has two main components: the body and the cap.

Imagine you're trying to compare two multi-part objects (let's say, two fancy pens) based on their overall similarity. Each pen has two main components: the body and the cap. The similarity of the whole pen (both body and cap) is what we're trying to determine:

Upper Bound ($\gamma \cdot s$)

Think of $s$ as the best match between corresponding parts of the pens. If the caps are very similar but the bodies aren't, $s$ is the similarity of the caps.

Now, $\gamma$ is like a scaling factor based on the size (or importance) of each part. If one pen has a very long body and a short cap, while the other has a short body and a long cap, $\gamma$ adjusts the overall similarity to account for these differences in proportions.

The upper bound tells us that no matter how similar some parts are, the overall similarity can't exceed this "best part similarity" scaled by the proportion factor.

Lower Bound ($\gamma \cdot t$)

Here, $t$ is the similarity of the least matching parts. If the bodies of the pens are quite different but the caps are similar, $t$ reflects the bodies' similarity.

Again, $\gamma$ scales this based on the proportion of each part.

The lower bound means that the overall similarity can't be worse than this "worst part similarity" after accounting for the proportion of each part.

Implications of the Bounds

For software engineers working with embeddings, vector search, retrieval, or databases, understanding these bounds has practical implications, particularly when dealing with high-dimensional data. Vector search often involves finding the closest (most similar) vectors in a database to a given query vector, typically using cosine similarity as a measure of closeness. The bounds we discussed can provide insights into the effectiveness and limitations of using subspace similarities for such tasks.

Using Subspace Similarity for Ranking

Safety and Accuracy: Using subspace similarity for ranking and retrieving top-k results can be effective, but with caution. The upper bound indicates that the overall similarity can't exceed the maximum similarity of the subspaces. Thus, if a pair of vectors is highly similar in a particular subspace, it's a strong candidate for being similar in the high-dimensional space.

Potential Pitfalls: However, the lower bound suggests that two vectors with low similarity in one subspace could still be quite similar overall. Therefore, relying solely on subspace similarity might miss some relevant results.
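One practical pattern that follows from both observations is a two-stage search: rank cheaply in a prefix subspace first, then re-rank a generously sized shortlist with the full-dimensional similarity. The sketch below is just one way to structure this; the dimensions, shortlist size, and random data are placeholders, not a benchmarked recipe.

import numpy as np

def normalize(M):
    return M / np.linalg.norm(M, axis=-1, keepdims=True)

def two_stage_search(query, corpus, m=256, shortlist=100, k=10):
    # Stage 1: cheap ranking in the m-dimensional prefix subspace
    sub_scores = normalize(corpus[:, :m]) @ normalize(query[:m])
    candidates = np.argsort(-sub_scores)[:shortlist]
    # Stage 2: exact full-dimensional cosine similarity on the shortlist only
    full_scores = normalize(corpus[candidates]) @ normalize(query)
    return candidates[np.argsort(-full_scores)[:k]]

# Random stand-ins for real embeddings
rng = np.random.default_rng(1)
corpus = rng.normal(size=(10_000, 768))
query = rng.normal(size=768)
print(two_stage_search(query, corpus))

Keeping the shortlist generous mitigates the pitfall noted above: a pair with low prefix similarity can still be similar overall, so an overly aggressive first-stage cut may drop relevant results.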

Misconceptions and Cautions

Overestimating Subspace Importance: A common misconception is overestimating the importance of a particular subspace. While high similarity in one subspace is a good indicator, it doesn't guarantee high overall similarity due to the influence of other subspaces.

Ignoring Negative Similarities: In cases where the cosine similarity in a subspace is negative, it indicates an opposing relationship in that dimension. Engineers should be wary of how these negative similarities impact the overall similarity.
