Correlations: Vibe-Testing Embeddings in GUI

One of the interesting questions people ask us is: "How do you guys vibe-check your embeddings?" Sure, there's MTEB for serious and quantitative evaluation on public benchmarks, but what do you do for open-domain or new problems? Today we want to share a small internal tool we use for debugging and visualization. You can call it our vibe-testing toolkit. We call it Correlations, and it's open source on GitHub.

Design

Correlations generates interactive heatmaps where each cell shows the cosine similarity between two pieces of content, whether they are chunks from the same or different document collections, modalities, hyperparameters, or models. It supports several interactions:

Hover inspection: Original text/image and similarity scores for individual cell pairs
Region selection: Interactive area selection for focused analysis of similarity patterns
Threshold filtering: Similarity score and text length filters to reduce noise
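
To make the heatmap and the threshold filter concrete, here is a minimal sketch of the underlying computation. It is not the tool's own code: the function names, the default thresholds, and the toy 2-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors, chosen for this sketch
    return dot / (norm_u * norm_v)

def similarity_matrix(rows, cols):
    """Matrix whose cell [i][j] is the cosine similarity of rows[i] and cols[j]."""
    return [[cosine(r, c) for c in cols] for r in rows]

def filter_cells(matrix, row_texts, col_texts, min_score=0.5, min_length=10):
    """Keep (i, j, score) for cells passing both the score and text-length filters."""
    kept = []
    for i, row in enumerate(matrix):
        for j, score in enumerate(row):
            long_enough = (len(row_texts[i]) >= min_length
                           and len(col_texts[j]) >= min_length)
            if score >= min_score and long_enough:
                kept.append((i, j, score))
    return kept

# Toy texts and embeddings, invented for illustration only.
texts_a = ["LoRA adapters for retrieval", "ok"]
texts_b = ["Task-specific LoRA adapters", "Multilingual training data"]
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0]]

matrix = similarity_matrix(emb_a, emb_b)
cells = filter_cells(matrix, texts_a, texts_b, min_score=0.5, min_length=10)
for i, j, score in cells:
    print(f"{texts_a[i]!r} vs {texts_b[j]!r}: {score:.3f}")
```

In this toy case the second row is dropped entirely because its text ("ok") is shorter than the length filter, which is the kind of noise the filter is meant to remove.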

The tool operates via a two-stage pipeline:

npm run embed: Chunks the input with a configurable strategy (newline, punctuation, character-based, or regex patterns) and embeds the chunks using the Jina Embeddings API
npm run corr: Serves a browser-based UI that renders the correlation heatmaps with real-time interactivity
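
The four chunking strategies can be sketched roughly as follows. This is an illustration of the idea, not the tool's implementation: the strategy names, the `size` and `pattern` parameters, and the sentence-splitting regex are assumptions.

```python
import re

def chunk_text(text, strategy="newline", size=200, pattern=None):
    """Split text into non-empty, stripped chunks using one of four strategies."""
    if strategy == "newline":
        parts = text.split("\n")
    elif strategy == "punctuation":
        # Split on whitespace that follows sentence-ending punctuation.
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif strategy == "chars":
        # Fixed-size character windows, no overlap.
        parts = [text[i:i + size] for i in range(0, len(text), size)]
    elif strategy == "regex":
        if pattern is None:
            raise ValueError("the regex strategy needs a pattern")
        parts = re.split(pattern, text)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Drop empty pieces (and None entries from capture groups in custom patterns).
    return [p.strip() for p in parts if p and p.strip()]

sample = "First sentence. Second one!\nThird line"
for name in ("newline", "punctuation"):
    print(name, chunk_text(sample, name))
print("chars", chunk_text(sample, "chars", size=10))
print("regex", chunk_text(sample, "regex", pattern=r"\s*\n\s*"))
```

Smaller chunks give a finer-grained heatmap but more cells to inspect, so the strategy is worth varying when a pattern is hard to see.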

To get started:

npm install
export JINA_API_KEY=your_jina_key_here
npm run embed -- https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model -o v3-blog.jsonl -t retrieval.query
npm run embed -- https://arxiv.org/pdf/2409.10173 -o v3-arxiv.jsonl -t retrieval.passage
npm run corr -- v3-blog.jsonl v3-arxiv.jsonl

JINA_API_KEY is used for embedding and, when necessary, for reading content from a URL. Reading from a local text file is also supported. You can also bring your own embeddings and run npm run corr for visualization only, in which case you don't need JINA_API_KEY. The tool supports both self-correlation analysis (within a single collection) and cross-correlation analysis (between two collections).

Use Cases

Content Deduplication and Alignment Analysis

We demonstrate the tool's utility by analyzing our jina-embeddings-v3 publications. Comparing the academic paper with the release note produced distinct diagonal patterns in the correlation heatmap, indicating strong chunk-to-chunk alignment between the two documents. A closer look showed systematic content reuse, particularly in the technical sections describing LoRA task types.
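
A diagonal pattern of this kind can also be detected programmatically. The sketch below is one possible heuristic, not something the tool ships: it takes each row's best-matching column and groups matches into runs where row and column both advance by one. The function names, the 0.8 cutoff, and the toy matrix are assumptions.

```python
def best_matches(matrix, min_score=0.8):
    """For each row, return (row, best_col, score) if the best score clears min_score."""
    matches = []
    for i, row in enumerate(matrix):
        j = max(range(len(row)), key=lambda k: row[k])
        if row[j] >= min_score:
            matches.append((i, j, row[j]))
    return matches

def diagonal_runs(matches, min_length=2):
    """Group matches into runs where row and column both advance by exactly one."""
    runs, current = [], []
    for match in matches:
        continues = (current
                     and match[0] == current[-1][0] + 1
                     and match[1] == current[-1][1] + 1)
        if continues:
            current.append(match)
        else:
            if len(current) >= min_length:
                runs.append(current)
            current = [match]
    if len(current) >= min_length:
        runs.append(current)
    return runs

# Toy cross-correlation matrix (rows: chunks of doc A, columns: chunks of doc B).
toy = [
    [0.20, 0.90, 0.10, 0.30, 0.20],
    [0.10, 0.30, 0.95, 0.20, 0.10],
    [0.20, 0.10, 0.30, 0.85, 0.20],
    [0.50, 0.40, 0.30, 0.20, 0.60],
]
runs = diagonal_runs(best_matches(toy))
for run in runs:
    print("aligned span:", [(i, j) for i, j, _ in run])
```

Here rows 0 to 2 align with columns 1 to 3, which is what a short diagonal streak in the heatmap corresponds to.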

Citation and Reference Validation

The tool is valuable for validating citation accuracy in retrieval-augmented generation systems, where it is critical to verify that retrieved passages genuinely support the generated claims. More generally, similarity-based analysis is an intuitive way to explore large datasets, for instance by grouping items by similarity to reveal patterns.
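
As a rough illustration of this check, the sketch below assumes a cross-correlation matrix with generated claims as rows and source passages as columns, plus the passage index each claim cites. The function name, the status labels, and the 0.7 cutoff are hypothetical. A high cosine similarity only suggests support, so flagged rows are candidates for manual review rather than verdicts.

```python
def validate_citations(matrix, cited, min_score=0.7):
    """Compare each claim's cited passage against its best-matching passage.

    matrix[i][j] is the similarity of claim i to passage j;
    cited[i] is the passage index that claim i cites.
    """
    results = []
    for i, row in enumerate(matrix):
        top = max(range(len(row)), key=lambda k: row[k])
        score = row[cited[i]]
        if score < min_score:
            status = "weak"                  # cited passage barely resembles the claim
        elif top != cited[i]:
            status = "better_match_exists"   # another passage is closer
        else:
            status = "ok"
        results.append({"claim": i, "cited": cited[i], "top": top,
                        "score": score, "status": status})
    return results

# Toy similarities, invented for illustration.
claims_vs_passages = [
    [0.91, 0.40, 0.35],
    [0.30, 0.75, 0.88],
    [0.20, 0.45, 0.30],
]
report = validate_citations(claims_vs_passages, cited=[0, 1, 1])
for entry in report:
    print(entry)
```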

Chunking Strategy Exploration

Late chunking and other segmentation strategies can be evaluated by examining how different approaches affect semantic coherence within and between text segments. The visualization helps identify the late chunking effect and the optimal chunk boundaries by revealing similarity patterns that align with semantic structure.
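
One simple way to put a number on what the heatmap shows is to average the similarity between each chunk and the next one, which is the first off-diagonal of a self-correlation matrix. The sketch below is an assumed metric, not part of the tool, and both matrices are invented toy values rather than measurements.

```python
def adjacent_similarity(matrix):
    """Mean of the first off-diagonal: similarity of each chunk to the next one."""
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two chunks")
    return sum(matrix[i][i + 1] for i in range(n - 1)) / (n - 1)

# Toy self-correlation matrices for the same three chunks under two strategies.
strategy_a = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
strategy_b = [
    [1.0, 0.7, 0.5],
    [0.7, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
score_a = adjacent_similarity(strategy_a)
score_b = adjacent_similarity(strategy_b)
print(f"strategy A: {score_a:.2f}, strategy B: {score_b:.2f}")
```

A higher value means neighbouring chunks share more context. Whether that is desirable depends on the task, so the number is best read next to the heatmap rather than in place of it.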

Cross-Modal Analysis

The tool extends beyond text to support image embeddings via jina-clip-v2, enabling analysis of text-image correlation patterns for multimodal applications.

Related Work in Embedding Visualization

The interpretability challenge is particularly acute when working with high-dimensional embeddings. The landscape of embedding visualization techniques has evolved significantly, and the different approaches can be categorized as:

Dimensionality Reduction-Based: Traditional approaches using PCA, t-SNE, UMAP that project high-dimensional spaces to 2D/3D
Interactive Exploration-Based: Tools like Parallax and TextEssence that enable direct manipulation and exploration
Domain-Specific Solutions: Specialized tools like Clustergrammer for biological data
Direct Similarity Visualization: Our approach and similar heatmap-based methods that preserve full relational information

Method	Approach	Use Cases
Correlations	Direct pairwise similarity heatmaps	Text similarity debugging, alignment analysis
Embedding Projector	PCA, t-SNE and custom linear projections	Interactive visualization and interpretation
Parallax	Algebraic formulae for semantic exploration	Understanding semantic relations
TextEssence	Comparative corpus analysis	Diachronic analysis, corpus comparison
Nomic Atlas	Cloud-based scalable visualization	Large-scale datasets, collaboration
Clustergrammer	Interactive heatmap with clustering	High-dimensional biological data
t-SNE	Non-linear cluster visualization	Model debugging, confusion identification
UMAP	Local and global structure preservation	Medium-large datasets, general analysis
PCA	Linear dimensionality reduction	Initial exploration, baseline comparison

Limitations of Point-Wise Approaches

Existing visualization tools primarily focus on point-wise representations in 2D spaces, which can lose critical information about pairwise relationships. Moreover, most tools are designed for single embedding space analysis rather than comparative evaluation between different sources, modalities or embedding strategies (e.g. late chunking on vs. off).

For example, we recently encountered two use cases at Jina. The first involves cross-checking citations in DeepSearch, where we need to match the generated report to the original excerpts from the reference material. The second is multimodal retrieval, where we need to verify image-text and image-image alignment on new unlabeled data. In both cases, we need to explore the relationships between two collections of embeddings. So we use Correlations to get a sense of how well the matches align and to validate whether the highest correlations consistently correspond to the correct matches.

Conclusion

Beyond vibe checking, correlations can provide deeper insights into semantic relationships. As a starting point, several key statistics can be extracted from the correlation matrix:

Matrix Density: The proportion of correlations above specified thresholds, indicating overall semantic cohesion
Eigenvalue Distribution: Principal components analysis reveals the dominant patterns in similarity structure
Matrix Rank: Indicates the effective dimensionality of the similarity relationships
Condition Number: Measures numerical stability and potential multicollinearity issues
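
The four statistics above can be computed in a few lines with NumPy (a third-party dependency that this sketch assumes is installed). It also assumes a square, symmetric self-correlation matrix; for a rectangular cross-correlation matrix, singular values would take the place of eigenvalues. Excluding the diagonal from the density and the 0.5 threshold are choices made for this example.

```python
import numpy as np

def matrix_stats(sim, threshold=0.5):
    """Summary statistics for a square, symmetric self-correlation matrix."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Ignore the diagonal: every chunk is trivially identical to itself.
    off_diagonal = ~np.eye(n, dtype=bool)
    density = float(np.mean(sim[off_diagonal] >= threshold))
    # eigvalsh is for symmetric matrices and returns ascending eigenvalues.
    eigenvalues = np.linalg.eigvalsh(sim)[::-1]
    rank = int(np.linalg.matrix_rank(sim))
    condition = float(np.linalg.cond(sim))  # 2-norm condition number
    return {"density": density, "eigenvalues": eigenvalues,
            "rank": rank, "condition": condition}

# Toy self-correlation matrix, invented for illustration.
toy_sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
stats = matrix_stats(toy_sim)
print(f"density:     {stats['density']:.3f}")
print(f"eigenvalues: {np.round(stats['eigenvalues'], 3)}")
print(f"rank:        {stats['rank']}")
print(f"condition:   {stats['condition']:.2f}")
```

A few dominant eigenvalues suggest that a small number of themes explain most of the similarity structure, while a large condition number hints that some chunks are close to redundant.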

Advanced analysis may also involve extracting meaningful submatrices that represent coherent semantic regions. Extracting the k-order maximal-sum principal submatrix from an n-order real matrix is a typical combinatorial optimization problem, and solving it can identify the most highly correlated segments.
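
For small matrices the problem can be solved exactly by brute force, as in the sketch below. A principal submatrix uses the same index set for rows and columns, so the search is over all k-element subsets of chunk indices. The function name and the toy matrix are assumptions, and the number of subsets grows combinatorially, so larger inputs would need a heuristic instead.

```python
from itertools import combinations

def max_sum_principal_submatrix(matrix, k):
    """Exhaustively find the k indices whose principal submatrix has the largest sum."""
    n = len(matrix)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the matrix order")
    best_indices, best_sum = None, float("-inf")
    for indices in combinations(range(n), k):
        # Sum every entry whose row and column both belong to the subset.
        total = sum(matrix[i][j] for i in indices for j in indices)
        if total > best_sum:
            best_indices, best_sum = indices, total
    return best_indices, best_sum

# Toy self-correlation matrix, invented for illustration.
toy_matrix = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.8],
    [0.1, 0.2, 1.0, 0.3],
    [0.2, 0.8, 0.3, 1.0],
]
indices, total = max_sum_principal_submatrix(toy_matrix, k=3)
print(f"most correlated chunks: {indices}, submatrix sum: {total:.1f}")
```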
