Granite Embedding R2 Models
- Granite Embedding R2 Models integrate bi-encoder and cross-encoder architectures to deliver state-of-the-art retrieval across diverse data types.
- The training pipeline combines large-scale pretraining, context extension, contrastive fine-tuning, and distillation, and supports context lengths of up to 8192 tokens.
- Released under the Apache 2.0 license and trained on governance-cleared data, the models are suited to enterprise deployment and academic research.
Granite Embedding R2 Models are a family of high-performance encoder-based embedding models specialized for enterprise-scale dense retrieval. Designed as the direct evolution of the original Granite Embedding Models, Granite R2 applies architectural, training, and governance improvements to achieve state-of-the-art accuracy, efficiency, and scalability in real-world information retrieval scenarios, including text, code, long-document, multi-turn conversational, and tabular search. All models are released under the Apache 2.0 license and are explicitly engineered for compliance, unrestricted research, and commercial deployment.
1. Model Architectures and Design Principles
Granite Embedding R2 comprises both bi-encoder and cross-encoder architectures:
- Bi-encoder Design: Each input (e.g., query or document) is independently encoded into a fixed-dimension vector, using the [CLS] token embedding produced by a ModernBERT-style transformer. The similarity between a query embedding e_q and a document embedding e_d is computed as temperature-scaled cosine similarity, sim(q, d) = cos(e_q, e_d) / τ, where τ is a temperature hyperparameter.
This architecture supports rapid approximate nearest neighbor search and is foundational for large-scale retrieval.
- Cross-encoder (Reranker) Design: The query and document are concatenated and passed through the transformer, producing a joint representation at the [CLS] token. This final hidden state is fed into a lightweight classification head to output a single relevance score, score(q, d) = f(h_[CLS]).
While computationally costlier, cross-encoders yield finer-grained relevance estimation, making them well suited to re-ranking candidate sets after retrieval.
- Model Sizes and Context Support:
  - Base Retriever (22 layers, 149M params, 768-dimensional embeddings): Highest accuracy across domains; handles contexts of up to 8192 tokens.
  - Small Retriever (12 layers, 47M params, 384-dimensional embeddings): Retains the 8192-token context window with faster inference.
- ModernBERT Enhancements: Alternating global attention (every third layer, 3x+1 pattern), rotary positional embeddings, and flash attention for scalable long-context computation.
This dual-architecture approach accommodates both large-scale candidate filtering (bi-encoder) and high-precision re-ranking (cross-encoder), underpinned by an 8192-token context window (a 16× expansion over the 512-token limit of the prior Granite embedding models), which directly improves performance on long-document workloads; a minimal retrieve-then-rerank sketch follows below.
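To make the two-stage design concrete, the sketch below wires a Granite R2 bi-encoder and reranker together with the sentence-transformers library. The model identifiers, example texts, and top-k cutoff are illustrative assumptions rather than confirmed release details, and batching, ANN indexing, and error handling are omitted.

```python
# Minimal retrieve-then-rerank sketch (illustrative; model IDs are assumptions).
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("ibm-granite/granite-embedding-english-r2")    # assumed checkpoint name
reranker = CrossEncoder("ibm-granite/granite-embedding-reranker-english-r2")    # assumed checkpoint name

query = "How do I rotate credentials for the billing service?"
docs = [
    "Credential rotation for internal services is handled by the vault operator.",
    "The billing service exposes a REST endpoint for invoice retrieval.",
    "Quarterly financial reports are archived under the compliance share.",
]

# Stage 1: bi-encoder retrieval. Embeddings are L2-normalized, so the dot
# product equals cosine similarity (the training-time temperature only scales
# scores and does not change the ranking).
q_emb = bi_encoder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
d_emb = bi_encoder.encode(docs, convert_to_tensor=True, normalize_embeddings=True)
candidates = (q_emb @ d_emb.T).argsort(descending=True)[:2].tolist()

# Stage 2: cross-encoder re-ranking of the shortlisted candidates.
pairs = [(query, docs[i]) for i in candidates]
rerank_scores = reranker.predict(pairs)
best = candidates[int(rerank_scores.argmax())]
print(docs[best])
```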
2. Training Methodologies and Data Governance
Granite R2 employs a multi-stage training pipeline:
- Large-Scale Pretraining: The encoder backbone is pretrained on over 2 trillion tokens from high-quality corpora, including GneissWeb, Wikipedia, BookCorpus, StackExchange, PubMed, and internal IBM documents. Early training uses a 1024-token context window.
- Context Extension: The context length is then increased to 8192 tokens by raising the RoPE theta (rotary base) parameter and training on an additional 250B tokens, which keeps long-context handling stable (see the first sketch after this list).
- Contrastive Finetuning: Models are trained on paired data (weakly supervised pairs plus high-quality annotated pairs) with a contrastive, InfoNCE-style objective of the form L = -log [ exp(sim(q, d+)/τ) / Σ_d exp(sim(q, d)/τ) ], where d+ is a relevant document and the sum runs over the positive and the negative documents.
This pulls related pairs together in embedding space, pushes negatives apart, and yields robust retrieval representations.
- Distillation & Reranking: Embeddings and similarity scores are distilled from a larger teacher model (Mistral-7B-Instruct), further refining representation fidelity (a sketch of one possible distillation loss appears at the end of this section).
- Tabular Pretraining: For table retrieval, a modified RetroMAE objective aligns table structure with natural-language summaries, markedly improving performance on table-centric datasets.
- Governance & Clearance: Every corpus undergoes technical, business, and governance review (content description, intended use, licensing, sensitivity assessment). Only enterprise-appropriate data is incorporated, producing models that align with organizational standards and mitigate the risk associated with sensitive content.
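For background on the RoPE-based context extension above, the following sketch shows how raising the rotary base theta stretches positional wavelengths, which is the standard mechanism behind this style of context extension. The theta values and head dimension are illustrative assumptions, not the published Granite R2 configuration.

```python
# Illustrative only: how a larger RoPE base (theta) lengthens rotary wavelengths.
import torch

def rope_inv_freq(dim: int, theta: float) -> torch.Tensor:
    """Inverse rotary frequencies for a head dimension `dim`."""
    return 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

short_ctx = rope_inv_freq(64, theta=10_000.0)   # typical base for ~1k-token training
long_ctx = rope_inv_freq(64, theta=160_000.0)   # larger base -> longer wavelengths (assumed value)

# The slowest-rotating dimension sets how far apart two positions can be before
# their rotary phases wrap; a larger theta pushes that limit outward.
print(2 * torch.pi / short_ctx[-1], 2 * torch.pi / long_ctx[-1])
```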
This pipeline combines architectural innovation, careful data curation, and compliance review to produce models suited for mission-critical enterprise use.
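The exact distillation objective is not reproduced here; as one plausible formulation (an assumption, not the confirmed Granite R2 recipe), teacher relevance scores over in-batch documents can be matched with a KL-divergence loss:

```python
# Hypothetical similarity-score distillation loss (one common formulation).
import torch
import torch.nn.functional as F

def distillation_loss(student_q, student_d, teacher_scores, temperature=0.05):
    """student_q, student_d: (batch, dim) embeddings; teacher_scores: (batch, batch) teacher relevance scores."""
    student_scores = (
        F.normalize(student_q, dim=-1) @ F.normalize(student_d, dim=-1).T
    ) / temperature
    teacher_probs = F.softmax(teacher_scores, dim=-1)
    student_log_probs = F.log_softmax(student_scores, dim=-1)
    # KL(teacher || student), averaged over the queries in the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```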
3. Performance Metrics and Comparative Evaluation
Granite R2 establishes new standards in both speed and accuracy:
- Contextual Capabilities: Supports up to 8192 tokens, a 16× improvement over many open-source alternatives, which translates directly into stronger long-document performance (MLDR, LongEmbed).
- Speed Advantages:
  - R2 models offer 19–44% faster encoding/inference relative to comparable open-source embedding models.
  - The small variant processes 199 documents per second, substantially reducing operational latency.
- Benchmark Results:
  - Text Retrieval (BEIR, MTEB): The base model surpasses competing base models in average NDCG@10.
  - Code Retrieval (COIR): Strong performance driven by curated code data and targeted finetuning.
  - Tabular and Conversational Retrieval: Enhanced through specialized pretraining, outperforming alternatives on OpenWikiTables, NQTables, OTT-QA, and MT-RAG.
The combination of context, speed, and accuracy results in models that are state-of-the-art across a spectrum of benchmarks and real-world evaluation suites.
4. Application Domains
Granite R2 models are purpose-built for wide applicability:
- Text Retrieval: Zero-shot and supervised settings, spanning technical documentation, web, and enterprise documents.
- Code Search: Bi-encoder matching of natural language queries to code, built on robust, governed code data.
- Long-Document Retrieval: Effective on tasks that demand extended contextual comprehension.
- Multi-Turn Conversational Retrieval: Adapted for retrieval in dialog systems and multi-passage RAG contexts.
- Tabular Data Retrieval: Enabled by the tabular pretraining methodology, supporting table-structured information needs.
- Enterprise Information Retrieval: Integration into closed-domain, high-volume data pipelines with clear data provenance.
This versatility stems from architectural and training choices, in particular long-context handling, specialized pretraining recipes, and adherence to licensing and governance protocols; a minimal indexing sketch for such pipelines follows.
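As a concrete illustration of plugging the bi-encoder into a high-volume pipeline, the sketch below builds a FAISS inner-product index over normalized Granite embeddings. FAISS is one common choice rather than anything prescribed here, and the model identifier is an assumed checkpoint name.

```python
# Minimal dense-index sketch with FAISS (illustrative; model ID is an assumption).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-small-english-r2")  # assumed checkpoint name

corpus = [
    "Reset procedures for the staging database are documented in the runbook.",
    "Expense reports must be filed within thirty days of travel.",
    "The ingestion service batches documents before indexing.",
]

# Normalized embeddings make inner product equivalent to cosine similarity.
doc_vecs = model.encode(corpus, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query_vec = model.encode(["How soon do I need to submit my expense report?"],
                         normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 2)
print([(corpus[i], float(s)) for i, s in zip(ids[0], scores[0])])
```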
5. Licensing and Accessibility
All Granite R2 models are available under the Apache 2.0 license:
- Academic and Commercial Use: Unrestricted deployment under permissive terms, in contrast to models trained on data carrying restrictive or non-commercial licenses.
- Enterprise Integration: Documented data provenance and governance-approved datasets enable adoption in mission-critical systems.
- Transparent Data Pipeline: Full traceability and clearance of training material promote trust and broad organizational acceptance.
This approach positions Granite R2 as especially suitable for organizations requiring clear licensing, compliance, and scalability.
6. Prospective Enhancements
Future work is anticipated in:
- Encoder Architecture and Training Optimization: Targeting further performance improvements through architectural research and expanded training data.
- Domain Expansion: Potential adaptations to additional languages or technical domains.
- Long-Context Efficiency: Continued work on scalable attention and on reducing computational overhead for long inputs.
- Enhanced Distillation & Data Integration: Adoption of newer teacher models and additional licensed data sources to keep pace with advances in retrieval research.
These directions point toward further gains in retrieval speed and precision, underpinned by scalable, governed deployment.
Summary Table: Granite Embedding R2 Model Variants
| Model Variant | Layers / Params | Context Length (tokens) | Embedding Dim. | Primary Use |
|---|---|---|---|---|
| Base Retriever | 22 / 149M | 8192 | 768 | Broad, high-accuracy retrieval |
| Small Retriever | 12 / 47M | 8192 | 384 | Efficiency-focused retrieval |
| Cross-Encoder Reranker | 22 / 149M | 8192 | n/a | Fine-grained reranking |
Granite Embedding R2 Models consolidate modern transformer engineering, contrastive and distillation-based training, specialized pretraining for complex domains, and strict data governance, resulting in models that are benchmark-leading with verified enterprise suitability and transparent licensing.