
Rerankers

Rerankers improve search result quality by using specialized models to reorder documents by their relevance to a query. RedisVL for Java provides reranking through local HuggingFace cross-encoder models and through the cloud-based Cohere and VoyageAI APIs.

What are Rerankers?

Rerankers provide a relevance boost to search results generated by traditional (lexical) or semantic search strategies. While initial search retrieves a broad set of potentially relevant documents, rerankers apply more sophisticated relevance scoring to produce a refined, higher-quality ranking.

Cross-Encoders vs Bi-Encoders

RedisVL uses cross-encoder models for reranking:

  • Bi-encoders (used for embeddings): Encode query and document separately, then compare vectors

  • Cross-encoders (used for reranking): Encode query+document pairs together, producing direct relevance scores

Cross-encoders are slower but more accurate because they can model the interaction between query and document directly.
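
To make the contrast concrete, here is a toy bi-encoder comparison with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions): the two texts are reduced to vectors independently and only the vectors are compared, so the model never sees the texts together; a cross-encoder avoids exactly that limitation.

```java
// Toy bi-encoder comparison: query and document are encoded
// independently into vectors, then compared with cosine similarity.
// A cross-encoder skips this step and scores the raw text pair directly.
double[] queryVec = {0.2, 0.8, 0.1};   // made-up query embedding
double[] docVec   = {0.1, 0.9, 0.0};   // made-up document embedding

double dot = 0, qNorm = 0, dNorm = 0;
for (int i = 0; i < queryVec.length; i++) {
    dot   += queryVec[i] * docVec[i];
    qNorm += queryVec[i] * queryVec[i];
    dNorm += docVec[i] * docVec[i];
}
double similarity = dot / (Math.sqrt(qNorm) * Math.sqrt(dNorm));
```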

Why Use Rerankers?

Reranking addresses common search quality issues:

  • Improve Precision - Surface the most relevant results at the top

  • Better Ranking - Cross-encoders understand query-document relationships better than vector similarity alone

  • Flexible Integration - Apply reranking to any search results (vector, lexical, or hybrid)

  • Cost-Effective - Rerank only the top K results rather than scoring all documents

Typical workflow:

  1. Perform a fast initial search (vector or hybrid) to retrieve the top 100 candidates

  2. Apply the reranker to those candidates to produce a precise ranking

  3. Return only the top 10 highest-quality results to users
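
The reorder-and-truncate step at the heart of this workflow can be sketched in plain Java (the scores here are invented; in practice the reranker computes them for you):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Invented candidates with relevance scores already attached
// (in practice the reranker computes the scores)
List<Map.Entry<String, Double>> scored = List.of(
    Map.entry("doc-a", 0.31),
    Map.entry("doc-b", 0.92),
    Map.entry("doc-c", 0.77)
);

// Sort by score, highest first, and keep only the top 2
List<String> topDocs = scored.stream()
    .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
    .limit(2)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());
```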

HFCrossEncoderReranker

The HFCrossEncoderReranker class runs HuggingFace cross-encoder models locally via ONNX Runtime. Models are automatically downloaded and cached on first use.

Setup

Add the ONNX Runtime, Gson, and HuggingFace tokenizer dependencies to your project:

Maven
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime</artifactId>
    <version>1.20.0</version>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>ai.djl.huggingface</groupId>
    <artifactId>tokenizers</artifactId>
    <version>0.30.0</version>
</dependency>
Gradle
implementation 'com.microsoft.onnxruntime:onnxruntime:1.20.0'
implementation 'com.google.code.gson:gson:2.11.0'
implementation 'ai.djl.huggingface:tokenizers:0.30.0'

Basic Usage

import com.redis.vl.utils.rerank.HFCrossEncoderReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;

// Create reranker with default model
HFCrossEncoderReranker reranker = new HFCrossEncoderReranker();

// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
    "Carson City is the capital city of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Charlotte Amalie is the capital of the US Virgin Islands.",
    "Capital punishment exists in the United States."
);

// Rerank documents
RerankResult result = reranker.rank(query, docs);

// Access reranked documents
List<?> rerankedDocs = result.getDocuments();
System.out.println("Top result: " + rerankedDocs.get(0));

// Access relevance scores
if (result.hasScores()) {
    List<Double> scores = result.getScores();
    for (int i = 0; i < rerankedDocs.size(); i++) {
        System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
    }
}

Understanding Relevance Scores

Relevance scores are probability values in the range [0, 1]:

  • 1.0: Perfect relevance (exact match)

  • 0.5: Moderate relevance

  • 0.0: No relevance

The scores are calculated by applying sigmoid activation to the model’s raw outputs, matching the behavior of the Python sentence-transformers library. Higher scores indicate stronger query-document relevance.

Scores are also relative to the input documents: a score of 0.9 doesn't guarantee absolute relevance; it means the document is highly relevant compared to the others in the batch.
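
The sigmoid mapping is easy to reproduce yourself, for example to filter reranked results by a confidence threshold. A minimal sketch with made-up logit values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Made-up raw model outputs (logits) for four documents
List<Double> logits = List.of(4.2, 1.1, -0.3, -2.7);

// Sigmoid maps each logit into the [0, 1] range, as the reranker
// does internally before returning scores
List<Double> scores = new ArrayList<>();
for (double logit : logits) {
    scores.add(1.0 / (1.0 + Math.exp(-logit)));
}

// A simple confidence cutoff: keep only scores >= 0.5
// (equivalently, documents whose raw logit was non-negative)
List<Double> confident = scores.stream()
    .filter(s -> s >= 0.5)
    .collect(Collectors.toList());
```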

Builder Pattern

Configure the reranker with the builder:

HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
    .model("cross-encoder/ms-marco-MiniLM-L-6-v2")  // Model name
    .limit(5)                                        // Return top 5 results
    .returnScore(true)                               // Include relevance scores
    .cacheDir("/path/to/model/cache")               // Custom cache directory
    .build();

Supported Models

HFCrossEncoderReranker works with any HuggingFace cross-encoder that has ONNX exports. The implementation automatically detects the model architecture (BERT, XLMRoberta, RoBERTa) and handles tokenization accordingly.

Popular models include:

Model | Use Case | Size
cross-encoder/ms-marco-MiniLM-L-6-v2 | General-purpose reranking (default) | ~80MB
cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy general reranking | ~130MB
cross-encoder/stsb-distilroberta-base | Semantic similarity scoring | ~250MB
BAAI/bge-reranker-base | Multilingual reranking (XLMRoberta) | ~280MB
BAAI/bge-reranker-large | Highest accuracy (slower) | ~560MB

Models are automatically downloaded from HuggingFace and cached in ~/.cache/redisvl4j/ by default.

Both BERT-based models (e.g., ms-marco-MiniLM) and XLMRoberta-based models (e.g., BAAI/bge-reranker) are fully supported with automatic architecture detection.

Working with String Documents

The simplest form accepts a list of strings:

List<String> docs = Arrays.asList(
    "Redis is an in-memory database",
    "PostgreSQL is a relational database",
    "MongoDB is a document database"
);

RerankResult result = reranker.rank("What is Redis?", docs);

// Returns List<String> when input was List<String>
List<String> rerankedDocs = (List<String>) result.getDocuments();

Working with Map Documents

For structured documents, use maps with a content field:

import java.util.Map;

List<Map<String, Object>> docs = Arrays.asList(
    Map.of("id", "doc1", "content", "Redis is an in-memory database", "source", "wiki"),
    Map.of("id", "doc2", "content", "PostgreSQL is a relational database", "source", "docs"),
    Map.of("id", "doc3", "content", "MongoDB is a document database", "source", "wiki")
);

RerankResult result = reranker.rank("What is Redis?", docs);

// Returns List<Map<String, Object>> with all fields preserved
List<Map<String, Object>> rerankedDocs =
    (List<Map<String, Object>>) result.getDocuments();

// Access full document with metadata
Map<String, Object> topDoc = rerankedDocs.get(0);
System.out.println("ID: " + topDoc.get("id"));
System.out.println("Content: " + topDoc.get("content"));
System.out.println("Source: " + topDoc.get("source"));
Only documents with a content field are ranked. Documents missing this field are skipped.
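
If silently skipped documents would be surprising in your application, you can pre-filter the batch yourself. A plain-Java sketch of the same rule, using hypothetical documents:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// 'doc2' is missing the 'content' field and would be skipped
List<Map<String, Object>> docs = List.of(
    Map.of("id", "doc1", "content", "Redis is an in-memory database"),
    Map.of("id", "doc2", "title", "No content field here"),
    Map.of("id", "doc3", "content", "MongoDB is a document database")
);

// Keep only the documents the reranker will actually score
List<Map<String, Object>> rankable = docs.stream()
    .filter(d -> d.containsKey("content"))
    .collect(Collectors.toList());
```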

Configuration Options

HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
    // Model selection
    .model("cross-encoder/ms-marco-MiniLM-L-6-v2")

    // Limit: Maximum number of results to return
    // Useful for reducing response size and computation
    .limit(10)

    // Return scores: Include relevance scores in results
    // Scores help you filter by confidence threshold
    .returnScore(true)

    // Cache directory: Where to store downloaded models
    // Default: ~/.cache/redisvl4j/
    .cacheDir(System.getProperty("user.home") + "/.cache/redisvl4j")

    .build();

Model Caching

Models are automatically cached after first download:

  1. First run: Downloads model from HuggingFace (~80MB for default model)

  2. Subsequent runs: Loads from local cache (fast)

  3. Cache location: ~/.cache/redisvl4j/models/<model-name>/

// First time: Downloads model (one-time ~5-10 seconds)
HFCrossEncoderReranker reranker = new HFCrossEncoderReranker();

// Subsequent times: Loads from cache (instant)
HFCrossEncoderReranker reranker2 = new HFCrossEncoderReranker();

To use a custom cache directory:

String customCache = "/data/ml-models/cache";
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
    .cacheDir(customCache)
    .build();

Resource Management

Rerankers hold ONNX Runtime sessions that should be cleaned up:

HFCrossEncoderReranker reranker = new HFCrossEncoderReranker();
try {
    // Use reranker
    RerankResult result = reranker.rank(query, docs);
} finally {
    // Clean up resources
    reranker.close();
}

CohereReranker

The CohereReranker class provides cloud-based reranking using Cohere’s powerful Rerank API. Cohere offers state-of-the-art reranking models with excellent multilingual support and advanced features like structured document ranking.

Setup

Add the Cohere Java SDK dependency to your project:

Maven
<dependency>
    <groupId>com.cohere</groupId>
    <artifactId>cohere-java</artifactId>
    <version>1.8.1</version>
    <scope>runtime</scope>
</dependency>
Gradle
runtimeOnly 'com.cohere:cohere-java:1.8.1'
The Cohere SDK is loaded dynamically via reflection, so it can be a runtime dependency only.

API Key Setup

Obtain an API key from Cohere Dashboard and provide it through configuration:

import com.redis.vl.utils.rerank.CohereReranker;
import java.util.Map;

// Option 1: Provide API key directly
Map<String, String> apiConfig = Map.of("api_key", "your-cohere-api-key");
CohereReranker reranker = CohereReranker.builder()
    .apiConfig(apiConfig)
    .build();

// Option 2: Set the COHERE_API_KEY environment variable;
// the reranker will pick it up automatically
CohereReranker envReranker = CohereReranker.builder().build();

Basic Usage

import com.redis.vl.utils.rerank.CohereReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Create reranker
Map<String, String> apiConfig = Map.of("api_key", "your-api-key");
CohereReranker reranker = CohereReranker.builder()
    .model("rerank-english-v3.0")
    .limit(3)
    .apiConfig(apiConfig)
    .build();

// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
    "Carson City is the capital city of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Charlotte Amalie is the capital of the US Virgin Islands."
);

// Rerank documents
RerankResult result = reranker.rank(query, docs);

// Access results
List<?> rerankedDocs = result.getDocuments();
List<Double> scores = result.getScores();

for (int i = 0; i < rerankedDocs.size(); i++) {
    System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
}

Understanding Cohere Relevance Scores

Cohere relevance scores indicate query-document relevance:

  • Higher scores: Stronger relevance (typically 0.5 to 1.0 for relevant docs)

  • Lower scores: Weaker relevance (typically below 0.5 for less relevant docs)

  • Score range: Unbounded, but most relevant documents score between 0.8-1.0

Cohere’s reranking models are specifically trained for relevance scoring, providing highly accurate rankings especially for English and multilingual content.

Supported Models

Cohere provides several reranking models:

Model | Use Case | Languages
rerank-english-v3.0 (default) | High-quality English reranking | English
rerank-multilingual-v3.0 | Multilingual reranking | 100+ languages
rerank-english-v2.0 | Previous generation (legacy) | English

// Use multilingual model
CohereReranker reranker = CohereReranker.builder()
    .model("rerank-multilingual-v3.0")
    .apiConfig(apiConfig)
    .build();

Working with String Documents

Simple string document reranking:

List<String> docs = Arrays.asList(
    "Redis is an in-memory database",
    "PostgreSQL is a relational database",
    "MongoDB is a document database"
);

RerankResult result = reranker.rank("What is Redis?", docs);
List<String> rerankedDocs = (List<String>) result.getDocuments();

Working with Structured Documents

Cohere supports reranking structured documents using the rank_by parameter to specify which fields to consider:

import java.util.Map;

// Create structured documents
List<Map<String, Object>> docs = Arrays.asList(
    Map.of(
        "source", "wiki",
        "passage", "Redis is an in-memory database",
        "timestamp", "2024-01-15"
    ),
    Map.of(
        "source", "docs",
        "passage", "PostgreSQL is a relational database",
        "timestamp", "2024-01-20"
    )
);

// Rerank using the 'passage' field
CohereReranker reranker = CohereReranker.builder()
    .rankBy(List.of("passage"))  // Specify field(s) to rank by
    .apiConfig(apiConfig)
    .build();

RerankResult result = reranker.rank("What is Redis?", docs);

// All fields are preserved in reranked results
List<Map<String, Object>> reranked = (List<Map<String, Object>>) result.getDocuments();
Map<String, Object> topDoc = reranked.get(0);
System.out.println("Source: " + topDoc.get("source"));
System.out.println("Passage: " + topDoc.get("passage"));

You can rank by multiple fields:

CohereReranker reranker = CohereReranker.builder()
    .rankBy(List.of("title", "content", "summary"))
    .apiConfig(apiConfig)
    .build();

Runtime Parameter Overrides

Override configuration parameters at runtime without creating a new reranker:

CohereReranker reranker = CohereReranker.builder()
    .limit(5)  // Default limit
    .apiConfig(apiConfig)
    .build();

// Override limit to 2 for this specific query
RerankResult result = reranker.rank(query, docs, Map.of("limit", 2));

// Override multiple parameters
RerankResult result2 = reranker.rank(query, docs, Map.of(
    "limit", 10,
    "return_score", false,
    "rank_by", List.of("title", "content"),
    "max_chunks_per_doc", 10
));

Supported runtime parameters:

  • limit (Integer): Maximum number of results to return

  • return_score (Boolean): Whether to include relevance scores

  • rank_by (List<String> or String): Field(s) to rank by for structured documents

  • max_chunks_per_doc (Integer): Maximum chunks per document for long documents

Configuration Options

CohereReranker reranker = CohereReranker.builder()
    // Model selection
    .model("rerank-english-v3.0")

    // Limit: Maximum number of results to return (default: 5)
    .limit(10)

    // Return scores: Include relevance scores (default: true)
    .returnScore(true)

    // Rank by: Fields to use for ranking structured documents
    .rankBy(List.of("passage", "title"))

    // API configuration with API key
    .apiConfig(Map.of("api_key", "your-api-key"))

    .build();

Error Handling

try {
    RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
    // Missing API key or invalid arguments
    System.err.println("Configuration error: " + e.getMessage());
} catch (RuntimeException e) {
    // API call failure or network error
    System.err.println("Reranking failed: " + e.getMessage());
    if (e.getCause() != null) {
        System.err.println("Cause: " + e.getCause().getMessage());
    }
}

Common errors:

  • Missing API key: Set COHERE_API_KEY environment variable or provide in apiConfig

  • Invalid model: Check model name spelling and availability

  • API quota exceeded: Check your Cohere account limits

  • Network error: Verify internet connectivity

VoyageAIReranker

The VoyageAIReranker class provides cloud-based reranking using VoyageAI’s Rerank API. VoyageAI offers efficient and accurate reranking models optimized for production use.

Setup

VoyageAI reranker uses HTTP REST API calls via OkHttp. Add the OkHttp and Jackson dependencies:

Maven
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.12.0</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.18.2</version>
</dependency>
Gradle
implementation 'com.squareup.okhttp3:okhttp:4.12.0'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.18.2'
VoyageAI does not provide an official Java SDK. RedisVL4J uses direct REST API integration via OkHttp.

API Key Setup

Obtain an API key from VoyageAI Dashboard and provide it through configuration:

import com.redis.vl.utils.rerank.VoyageAIReranker;
import java.util.Map;

// Option 1: Provide API key directly
Map<String, String> apiConfig = Map.of("api_key", "your-voyage-api-key");
VoyageAIReranker reranker = VoyageAIReranker.builder()
    .apiConfig(apiConfig)
    .build();

// Option 2: Set the VOYAGE_API_KEY environment variable;
// the reranker will pick it up automatically
VoyageAIReranker envReranker = VoyageAIReranker.builder().build();

Basic Usage

import com.redis.vl.utils.rerank.VoyageAIReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Create reranker
Map<String, String> apiConfig = Map.of("api_key", "your-api-key");
VoyageAIReranker reranker = VoyageAIReranker.builder()
    .model("rerank-lite-1")
    .limit(3)
    .apiConfig(apiConfig)
    .build();

// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
    "Carson City is the capital city of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Charlotte Amalie is the capital of the US Virgin Islands."
);

// Rerank documents
RerankResult result = reranker.rank(query, docs);

// Access results
List<?> rerankedDocs = result.getDocuments();
List<Double> scores = result.getScores();

for (int i = 0; i < rerankedDocs.size(); i++) {
    System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
}

Understanding VoyageAI Relevance Scores

VoyageAI relevance scores indicate query-document relevance:

  • Score range: Typically 0.0 to 1.0

  • High relevance: Scores above 0.7

  • Medium relevance: Scores between 0.4 and 0.7

  • Low relevance: Scores below 0.4

VoyageAI’s reranking models provide fast, production-ready relevance scoring suitable for real-time applications.

Supported Models

VoyageAI provides several reranking models:

Model | Use Case | Speed
rerank-lite-1 (default) | Fast, efficient reranking | Fastest
rerank-2 | Balanced accuracy and speed | Medium
rerank-2.5 | Highest accuracy | Slower

// Use higher accuracy model
VoyageAIReranker reranker = VoyageAIReranker.builder()
    .model("rerank-2.5")
    .apiConfig(apiConfig)
    .build();

Working with String Documents

Simple string document reranking:

List<String> docs = Arrays.asList(
    "Redis is an in-memory database",
    "PostgreSQL is a relational database",
    "MongoDB is a document database"
);

RerankResult result = reranker.rank("What is Redis?", docs);
List<String> rerankedDocs = (List<String>) result.getDocuments();

Working with Structured Documents

VoyageAI requires structured documents to have a content field:

import java.util.Map;

// Create structured documents with 'content' field
List<Map<String, Object>> docs = Arrays.asList(
    Map.of(
        "source", "wiki",
        "content", "Redis is an in-memory database",  // Must be 'content'
        "timestamp", "2024-01-15"
    ),
    Map.of(
        "source", "docs",
        "content", "PostgreSQL is a relational database",
        "timestamp", "2024-01-20"
    )
);

RerankResult result = reranker.rank("What is Redis?", docs);

// All fields are preserved in reranked results
List<Map<String, Object>> reranked = (List<Map<String, Object>>) result.getDocuments();
Map<String, Object> topDoc = reranked.get(0);
System.out.println("Source: " + topDoc.get("source"));
System.out.println("Content: " + topDoc.get("content"));
Unlike Cohere, VoyageAI does not support custom field selection. Structured documents must use the content field for the text to be ranked.

Runtime Parameter Overrides

Override configuration parameters at runtime:

VoyageAIReranker reranker = VoyageAIReranker.builder()
    .limit(5)  // Default limit
    .apiConfig(apiConfig)
    .build();

// Override limit to 2 for this specific query
RerankResult result = reranker.rank(query, docs, Map.of("limit", 2));

// Override multiple parameters including truncation
RerankResult result2 = reranker.rank(query, docs, Map.of(
    "limit", 10,
    "return_score", false,
    "truncation", true  // Automatically truncate long documents
));

Supported runtime parameters:

  • limit (Integer): Maximum number of results to return

  • return_score (Boolean): Whether to include relevance scores

  • truncation (Boolean): Automatically truncate documents exceeding token limits

Configuration Options

VoyageAIReranker reranker = VoyageAIReranker.builder()
    // Model selection
    .model("rerank-lite-1")

    // Limit: Maximum number of results to return (default: 5)
    .limit(10)

    // Return scores: Include relevance scores (default: true)
    .returnScore(true)

    // API configuration with API key
    .apiConfig(Map.of("api_key", "your-api-key"))

    .build();

Error Handling

try {
    RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
    // Missing API key, invalid arguments, or missing 'content' field
    System.err.println("Configuration error: " + e.getMessage());
} catch (RuntimeException e) {
    // API call failure or network error
    System.err.println("Reranking failed: " + e.getMessage());
    if (e.getCause() != null) {
        System.err.println("Cause: " + e.getCause().getMessage());
    }
}

Common errors:

  • Missing API key: Set VOYAGE_API_KEY environment variable or provide in apiConfig

  • Missing 'content' field: Structured documents must have a content field

  • Invalid model: Check model name and availability

  • API quota exceeded: Check your VoyageAI account limits

  • Network error: Verify internet connectivity

Integration with SearchIndex

Rerankers work seamlessly with RedisVL search results:

import com.redis.vl.index.SearchIndex;
import com.redis.vl.query.VectorQuery;

// Perform initial vector search (get top 100 candidates)
VectorQuery query = VectorQuery.builder()
    .vector(queryEmbedding)
    .field("embedding")
    .numResults(100)  // Broad initial retrieval
    .build();

List<Map<String, Object>> searchResults = index.query(query);

// Rerank to get best 10 results
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
    .limit(10)
    .build();

RerankResult reranked = reranker.rank("user query text", searchResults);

// Present top 10 highest-quality results to user
List<Map<String, Object>> topResults =
    (List<Map<String, Object>>) reranked.getDocuments();

Performance Considerations

Speed vs Accuracy

Cross-encoders are more accurate but slower than vector similarity:

  • Vector similarity: ~1ms for 1000 documents (compare embeddings)

  • Cross-encoder reranking: ~10-100ms for 100 documents (model inference)

Best practice: Use vector search for broad retrieval, then rerank top candidates.

Model Selection Trade-offs

Model | Speed | Accuracy | Size
ms-marco-MiniLM-L-6-v2 | Fast | Good | Small
ms-marco-MiniLM-L-12-v2 | Medium | Better | Medium
bge-reranker-large | Slow | Best | Large

Batch Size Recommendations

For optimal performance:

  • Interactive queries: Rerank top 10-20 candidates

  • Batch processing: Rerank top 50-100 candidates

  • Maximum practical: ~200 documents per query

Memory Usage

  • Model loaded once per JVM: ~200-600MB RAM depending on model

  • Inference per query: ~10-50MB temporary memory

  • Models are cached on disk, not in memory between runs

Error Handling

try {
    RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
    // Invalid arguments (null query, empty docs, etc.)
    System.err.println("Invalid input: " + e.getMessage());
} catch (RuntimeException e) {
    // Model loading or inference failure
    System.err.println("Reranking failed: " + e.getMessage());
}

Common errors:

  • Model not found: Check model name and network connectivity

  • Out of memory: Use smaller model or increase JVM heap

  • Invalid documents: Ensure documents have content field for Map inputs

Next Steps