Rerankers
Rerankers improve search result quality by using specialized models to reorder documents by their relevance to a query. RedisVL for Java supports reranking with local HuggingFace cross-encoder models and with cloud-based services such as Cohere and VoyageAI.
What are Rerankers?
Rerankers provide a relevance boost to search results generated by traditional (lexical) or semantic search strategies. While initial search retrieves a broad set of potentially relevant documents, rerankers apply more sophisticated relevance scoring to produce a refined, higher-quality ranking.
Cross-Encoders vs Bi-Encoders
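RedisVL uses cross-encoder models for reranking. The two model families differ in how they process text:
- Bi-encoders (used for embeddings): Encode the query and each document separately, then compare the resulting vectors
- Cross-encoders (used for reranking): Encode each query+document pair together, producing a direct relevance score
Cross-encoders are more accurate because they can model the interaction between query and document directly. They are also slower: every query-document pair must go through the model at query time, whereas bi-encoder document vectors can be computed once and stored.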
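To make the difference concrete, here is a minimal Java sketch of the two scoring shapes. Both functions are hypothetical stand-ins, not RedisVL APIs: `cosine` compares two precomputed vectors, as a bi-encoder pipeline does, while `pairScore` receives the query and the document text together, which is the call shape of a cross-encoder. The toy token-overlap logic inside `pairScore` only illustrates that shape; a real cross-encoder runs a transformer over the joined pair.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class EncoderContrast {
    // Bi-encoder style: vectors are produced independently (documents can be
    // embedded ahead of time); scoring is only a vector comparison.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cross-encoder style: the scorer sees query and document at the same time.
    // Hypothetical toy logic: fraction of query tokens that appear in the document.
    static double pairScore(String query, String doc) {
        Set<String> docTokens =
            new HashSet<>(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\W+")));
        String[] queryTokens = query.toLowerCase(Locale.ROOT).split("\\W+");
        int hits = 0;
        for (String token : queryTokens) {
            if (docTokens.contains(token)) {
                hits++;
            }
        }
        return queryTokens.length == 0 ? 0.0 : (double) hits / queryTokens.length;
    }
}
```

The practical consequence is visible in the signatures: `cosine` can reuse stored document vectors, while `pairScore` has to be called again for every new query.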
Why Use Rerankers?
Reranking addresses common search quality issues:
- Improve Precision - Surface the most relevant results at the top
- Better Ranking - Cross-encoders understand query-document relationships better than vector similarity alone
- Flexible Integration - Apply reranking to any search results (vector, lexical, or hybrid)
- Cost-Effective - Rerank only the top K results rather than scoring all documents
Typical workflow:
- Perform a fast initial search (vector or hybrid) to get the top 100 candidates
- Use a reranker to precisely rank those candidates and keep the top 10
- Return the highest-quality results to users
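The second stage of this workflow can be sketched as a generic function. `TwoStageSearch.rerank` and its `pairScorer` argument are hypothetical: in RedisVL the scoring and sorting happen inside a reranker's `rank` call, and the candidates would come from a vector or hybrid query. The sketch makes the cost explicit. The pair scorer runs once per candidate, so the reranking work is bounded by the candidate count (100 in the workflow above), not by the size of the corpus.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class TwoStageSearch {
    // Hypothetical second stage: score every first-stage candidate with an
    // expensive pair scorer (a cross-encoder in practice) and keep the best `limit`.
    static List<String> rerank(String query, List<String> candidates,
                               ToDoubleBiFunction<String, String> pairScorer, int limit) {
        double[] scores = new double[candidates.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.size(); i++) {
            scores[i] = pairScorer.applyAsDouble(query, candidates.get(i)); // one call per pair
            order.add(i);
        }
        // Highest score first; List.sort is stable, so ties keep first-stage order.
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        List<String> top = new ArrayList<>();
        for (int k = 0; k < Math.min(limit, order.size()); k++) {
            top.add(candidates.get(order.get(k)));
        }
        return top;
    }
}
```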
HFCrossEncoderReranker
The HFCrossEncoderReranker class runs HuggingFace cross-encoder models locally via ONNX Runtime. Models are downloaded automatically and cached locally.
Setup
Add the ONNX Runtime, Gson, and HuggingFace tokenizers dependencies to your project:
Maven:
<dependency>
  <groupId>com.microsoft.onnxruntime</groupId>
  <artifactId>onnxruntime</artifactId>
  <version>1.20.0</version>
</dependency>
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.11.0</version>
</dependency>
<dependency>
  <groupId>ai.djl.huggingface</groupId>
  <artifactId>tokenizers</artifactId>
  <version>0.30.0</version>
</dependency>
Gradle:
implementation 'com.microsoft.onnxruntime:onnxruntime:1.20.0'
implementation 'com.google.code.gson:gson:2.11.0'
implementation 'ai.djl.huggingface:tokenizers:0.30.0'
Basic Usage
import com.redis.vl.utils.rerank.HFCrossEncoderReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;
// Create reranker with default model
HFCrossEncoderReranker reranker = new HFCrossEncoderReranker();
// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
"Carson City is the capital city of Nevada.",
"Washington, D.C. is the capital of the United States.",
"Charlotte Amalie is the capital of the US Virgin Islands.",
"Capital punishment exists in the United States."
);
// Rerank documents
RerankResult result = reranker.rank(query, docs);
// Access reranked documents
List<?> rerankedDocs = result.getDocuments();
System.out.println("Top result: " + rerankedDocs.get(0));
// Access relevance scores
if (result.hasScores()) {
List<Double> scores = result.getScores();
for (int i = 0; i < rerankedDocs.size(); i++) {
System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
}
}
Understanding Relevance Scores
Relevance scores are sigmoid outputs in the range [0, 1]:
- Close to 1.0: Strong relevance
- Around 0.5: Moderate relevance
- Close to 0.0: Little or no relevance
The scores are calculated by applying a sigmoid activation to the model’s raw outputs, matching the behavior of the Python sentence-transformers library. Higher scores indicate stronger query-document relevance.
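The sigmoid step is small enough to show directly. This sketch assumes the model emits a single raw logit per query-document pair; the class name `SigmoidScores` is hypothetical. Because the sigmoid is strictly increasing, it changes the scale of the scores but never the order of the documents.

```java
class SigmoidScores {
    // Logistic sigmoid: maps any real-valued logit into the open interval (0, 1).
    static double sigmoid(double logit) {
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    // Converts one raw logit per query-document pair into a relevance score.
    static double[] toScores(double[] logits) {
        double[] scores = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            scores[i] = sigmoid(logits[i]);
        }
        return scores;
    }
}
```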
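Treat scores as a ranking signal, not a guarantee of relevance. Each query-document pair is scored on its own, so a score of 0.9 means the model rates that pair as highly relevant; it does not prove the document answers the query, and score scales differ between models.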
Builder Pattern
Configure the reranker with the builder:
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
.model("cross-encoder/ms-marco-MiniLM-L-6-v2") // Model name
.limit(5) // Return top 5 results
.returnScore(true) // Include relevance scores
.cacheDir("/path/to/model/cache") // Custom cache directory
.build();
Supported Models
HFCrossEncoderReranker works with any HuggingFace cross-encoder that has ONNX exports. The implementation automatically detects the model architecture (BERT, XLMRoberta, RoBERTa) and handles tokenization accordingly.
Popular models include:
Model | Use Case | Size |
---|---|---|
| | General-purpose reranking (default) | ~80MB |
| | Higher accuracy general reranking | ~130MB |
| | Semantic similarity scoring | ~250MB |
| | Multilingual reranking (XLMRoberta) | ~280MB |
| | Highest accuracy (slower) | ~560MB |
Models are automatically downloaded from HuggingFace and cached in ~/.cache/redisvl4j/ by default.
Both BERT-based models (e.g., ms-marco-MiniLM) and XLMRoberta-based models (e.g., BAAI/bge-reranker) are fully supported with automatic architecture detection.
Working with String Documents
The simplest form accepts a list of strings:
List<String> docs = Arrays.asList(
"Redis is an in-memory database",
"PostgreSQL is a relational database",
"MongoDB is a document database"
);
RerankResult result = reranker.rank("What is Redis?", docs);
// Returns List<String> when input was List<String>
List<String> rerankedDocs = (List<String>) result.getDocuments();
Working with Map Documents
For structured documents, use maps with a content field:
import java.util.Map;
List<Map<String, Object>> docs = Arrays.asList(
Map.of("id", "doc1", "content", "Redis is an in-memory database", "source", "wiki"),
Map.of("id", "doc2", "content", "PostgreSQL is a relational database", "source", "docs"),
Map.of("id", "doc3", "content", "MongoDB is a document database", "source", "wiki")
);
RerankResult result = reranker.rank("What is Redis?", docs);
// Returns List<Map<String, Object>> with all fields preserved
List<Map<String, Object>> rerankedDocs =
(List<Map<String, Object>>) result.getDocuments();
// Access full document with metadata
Map<String, Object> topDoc = rerankedDocs.get(0);
System.out.println("ID: " + topDoc.get("id"));
System.out.println("Content: " + topDoc.get("content"));
System.out.println("Source: " + topDoc.get("source"));
Only documents with a content field are ranked. Documents missing this field are skipped.
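If you want to know in advance which map documents will be considered, a small pre-check helps. The helper below is hypothetical and only mirrors the rule just stated (documents without a `content` field are skipped); it is not the library's own code. Each kept map is passed through whole, so metadata such as `id` or `source` stays attached to the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ContentFieldFilter {
    // Hypothetical helper: keep only map documents that carry a "content" entry.
    static List<Map<String, Object>> rankable(List<Map<String, Object>> docs) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (doc.get("content") != null) {
                out.add(doc); // the whole map is kept, metadata included
            }
        }
        return out;
    }
}
```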
Configuration Options
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
// Model selection
.model("cross-encoder/ms-marco-MiniLM-L-6-v2")
// Limit: Maximum number of results to return
// Useful for reducing response size; every input document is still scored
.limit(10)
// Return scores: Include relevance scores in results
// Scores help you filter by confidence threshold
.returnScore(true)
// Cache directory: Where to store downloaded models
// Default: ~/.cache/redisvl4j/
.cacheDir(System.getProperty("user.home") + "/.cache/redisvl4j")
.build();
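The comment on returnScore mentions filtering by a confidence threshold. A minimal sketch, assuming the lists from `getDocuments()` and `getScores()` are parallel and ordered best-first (the Basic Usage loop indexes them the same way); `ScoreThreshold.aboveThreshold` is a hypothetical helper, not part of RedisVL.

```java
import java.util.ArrayList;
import java.util.List;

class ScoreThreshold {
    // Hypothetical helper: keep documents whose parallel score is >= minScore.
    static <T> List<T> aboveThreshold(List<T> docs, List<Double> scores, double minScore) {
        if (docs.size() != scores.size()) {
            throw new IllegalArgumentException("docs and scores must have the same length");
        }
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (scores.get(i) >= minScore) {
                kept.add(docs.get(i));
            }
        }
        return kept;
    }
}
```

With a RerankResult, a call such as `ScoreThreshold.aboveThreshold(result.getDocuments(), result.getScores(), 0.5)` would keep only documents scoring at least 0.5. The 0.5 cut-off is illustrative; pick a threshold per model, since score scales differ between rerankers.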
Model Caching
Models are automatically cached after the first download:
- First run: Downloads the model from HuggingFace (~80MB for the default model)
- Subsequent runs: Loads the model from the local cache (fast)
- Cache location: ~/.cache/redisvl4j/models/<model-name>/
// First time: downloads the model (one-time, roughly 5-10 seconds depending on network speed)
HFCrossEncoderReranker reranker = new HFCrossEncoderReranker();
// Subsequent times: loads from the cache (no download, so much faster)
HFCrossEncoderReranker reranker2 = new HFCrossEncoderReranker();
To use a custom cache directory:
String customCache = "/data/ml-models/cache";
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
.cacheDir(customCache)
.build();
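When you pre-download models into a container image or a shared volume, it can help to compute where a model is expected to live. This sketch follows the documented layout `~/.cache/redisvl4j/models/<model-name>/`. The class `ModelCachePaths` is hypothetical, and it rests on two assumptions worth checking against a real cache directory first: that the model id maps verbatim onto the directory name, and that a custom `cacheDir` replaces the `~/.cache/redisvl4j` root.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ModelCachePaths {
    // Default cache root documented above: ~/.cache/redisvl4j
    static Path defaultCacheRoot() {
        return Paths.get(System.getProperty("user.home"), ".cache", "redisvl4j");
    }

    // <cacheRoot>/models/<model-name>. Assumption: the HuggingFace model id is used
    // verbatim, so "cross-encoder/ms-marco-MiniLM-L-6-v2" becomes two nested directories.
    static Path modelDir(Path cacheRoot, String modelName) {
        return cacheRoot.resolve("models").resolve(modelName);
    }

    // True if a directory already exists there, e.g. from an earlier download.
    static boolean looksCached(Path modelDir) {
        return Files.isDirectory(modelDir);
    }
}
```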
CohereReranker
The CohereReranker class provides cloud-based reranking using Cohere’s Rerank API. Cohere offers reranking models with multilingual support and can rank structured documents by selected fields.
Setup
Add the Cohere Java SDK dependency to your project:
Maven:
<dependency>
  <groupId>com.cohere</groupId>
  <artifactId>cohere-java</artifactId>
  <version>1.8.1</version>
  <scope>runtime</scope>
</dependency>
Gradle:
runtimeOnly 'com.cohere:cohere-java:1.8.1'
The Cohere SDK is loaded dynamically via reflection, so it can be declared as a runtime-only dependency.
API Key Setup
Obtain an API key from the Cohere dashboard and provide it through configuration or an environment variable:
import com.redis.vl.utils.rerank.CohereReranker;
import java.util.Map;
// Option 1: Provide API key directly
Map<String, String> apiConfig = Map.of("api_key", "your-cohere-api-key");
CohereReranker reranker = CohereReranker.builder()
.apiConfig(apiConfig)
.build();
// Option 2: Set the COHERE_API_KEY environment variable;
// the reranker uses it automatically
CohereReranker envReranker = CohereReranker.builder().build();
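The two options amount to a lookup with a fallback. The sketch below is hypothetical: it assumes an `api_key` entry in `apiConfig` takes precedence over the environment variable (this page does not state the precedence), and it reports a missing key with `IllegalArgumentException`, the exception type the Error Handling section associates with a missing key. The environment lookup is passed in so the logic can be tested without real environment variables.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class ApiKeyResolver {
    // Hypothetical lookup order (an assumption, not documented behavior):
    // 1) "api_key" in apiConfig, 2) the named environment variable.
    static String resolve(Map<String, String> apiConfig, String envVar, UnaryOperator<String> env) {
        String key = apiConfig == null ? null : apiConfig.get("api_key");
        if (isMissing(key)) {
            key = env.apply(envVar); // e.g. System::getenv in production code
        }
        if (isMissing(key)) {
            throw new IllegalArgumentException(
                "Missing API key: set " + envVar + " or provide \"api_key\" in apiConfig");
        }
        return key;
    }

    private static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

In application code you would pass `System::getenv` as the lookup, for example `ApiKeyResolver.resolve(Map.of(), "COHERE_API_KEY", System::getenv)`.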
Basic Usage
import com.redis.vl.utils.rerank.CohereReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Create reranker
Map<String, String> apiConfig = Map.of("api_key", "your-api-key");
CohereReranker reranker = CohereReranker.builder()
.model("rerank-english-v3.0")
.limit(3)
.apiConfig(apiConfig)
.build();
// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
"Carson City is the capital city of Nevada.",
"Washington, D.C. is the capital of the United States.",
"Charlotte Amalie is the capital of the US Virgin Islands."
);
// Rerank documents
RerankResult result = reranker.rank(query, docs);
// Access results
List<?> rerankedDocs = result.getDocuments();
List<Double> scores = result.getScores();
for (int i = 0; i < rerankedDocs.size(); i++) {
System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
}
Understanding Cohere Relevance Scores
Cohere relevance scores indicate query-document relevance:
- Score range: 0 to 1 (Cohere normalizes relevance scores to this range)
- Higher scores: Stronger relevance (typically 0.5 to 1.0 for relevant docs)
- Lower scores: Weaker relevance (typically below 0.5 for less relevant docs)
Cohere’s reranking models are trained specifically for relevance scoring. For non-English or mixed-language content, use the multilingual model.
Supported Models
Cohere provides several reranking models:
Model | Use Case | Languages |
---|---|---|
| | High-quality English reranking | English |
| | Multilingual reranking | 100+ languages |
| | Previous generation (legacy) | English |
// Use multilingual model
CohereReranker reranker = CohereReranker.builder()
.model("rerank-multilingual-v3.0")
.apiConfig(apiConfig)
.build();
Working with String Documents
Simple string document reranking:
List<String> docs = Arrays.asList(
"Redis is an in-memory database",
"PostgreSQL is a relational database",
"MongoDB is a document database"
);
RerankResult result = reranker.rank("What is Redis?", docs);
List<String> rerankedDocs = (List<String>) result.getDocuments();
Working with Structured Documents
Cohere supports reranking structured documents using the rank_by parameter to specify which fields to consider:
import java.util.Map;
// Create structured documents
List<Map<String, Object>> docs = Arrays.asList(
Map.of(
"source", "wiki",
"passage", "Redis is an in-memory database",
"timestamp", "2024-01-15"
),
Map.of(
"source", "docs",
"passage", "PostgreSQL is a relational database",
"timestamp", "2024-01-20"
)
);
// Rerank using the 'passage' field
CohereReranker reranker = CohereReranker.builder()
.rankBy(List.of("passage")) // Specify field(s) to rank by
.apiConfig(apiConfig)
.build();
RerankResult result = reranker.rank("What is Redis?", docs);
// All fields are preserved in reranked results
List<Map<String, Object>> reranked = (List<Map<String, Object>>) result.getDocuments();
Map<String, Object> topDoc = reranked.get(0);
System.out.println("Source: " + topDoc.get("source"));
System.out.println("Passage: " + topDoc.get("passage"));
You can rank by multiple fields:
CohereReranker reranker = CohereReranker.builder()
.rankBy(List.of("title", "content", "summary"))
.apiConfig(apiConfig)
.build();
Runtime Parameter Overrides
Override configuration parameters at runtime without creating a new reranker:
CohereReranker reranker = CohereReranker.builder()
.limit(5) // Default limit
.apiConfig(apiConfig)
.build();
// Override limit to 2 for this specific query
RerankResult result = reranker.rank(query, docs, Map.of("limit", 2));
// Override multiple parameters
RerankResult result2 = reranker.rank(query, docs, Map.of(
"limit", 10,
"return_score", false,
"rank_by", List.of("title", "content"),
"max_chunks_per_doc", 10
));
Supported runtime parameters:
- limit (Integer): Maximum number of results to return
- return_score (Boolean): Whether to include relevance scores
- rank_by (List<String> or String): Field(s) to rank by for structured documents
- max_chunks_per_doc (Integer): Maximum chunks per document for long documents
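The override semantics can be pictured as a map merge: values passed to `rank(query, docs, overrides)` win for that call, and everything else falls back to the builder settings. This is a hypothetical illustration of those semantics, not the reranker's internal code; it assumes the per-call map leaves the reranker's stored configuration untouched, which is what "without creating a new reranker" suggests.

```java
import java.util.HashMap;
import java.util.Map;

class RuntimeOverrides {
    // Hypothetical: effective settings for one call. `defaults` is copied, so the
    // reranker-level configuration is never mutated by a per-call override.
    static Map<String, Object> effective(Map<String, Object> defaults, Map<String, Object> overrides) {
        Map<String, Object> merged = new HashMap<>(defaults);
        merged.putAll(overrides); // per-call values replace defaults key by key
        return merged;
    }
}
```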
Configuration Options
CohereReranker reranker = CohereReranker.builder()
// Model selection
.model("rerank-english-v3.0")
// Limit: Maximum number of results to return (default: 5)
.limit(10)
// Return scores: Include relevance scores (default: true)
.returnScore(true)
// Rank by: Fields to use for ranking structured documents
.rankBy(List.of("passage", "title"))
// API configuration with API key
.apiConfig(Map.of("api_key", "your-api-key"))
.build();
Error Handling
try {
RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
// Missing API key or invalid arguments
System.err.println("Configuration error: " + e.getMessage());
} catch (RuntimeException e) {
// API call failure or network error
System.err.println("Reranking failed: " + e.getMessage());
if (e.getCause() != null) {
System.err.println("Cause: " + e.getCause().getMessage());
}
}
Common errors:
- Missing API key: Set the COHERE_API_KEY environment variable or provide api_key in apiConfig
- Invalid model: Check the model name spelling and availability
- API quota exceeded: Check your Cohere account limits
- Network error: Verify internet connectivity
VoyageAIReranker
The VoyageAIReranker class provides cloud-based reranking using VoyageAI’s Rerank API. VoyageAI offers reranking models ranging from fast, lightweight options to higher-accuracy ones, suited to production use.
Setup
The VoyageAI reranker calls the VoyageAI REST API directly over HTTP using OkHttp, with Jackson for JSON processing. Add both dependencies:
Maven:
<dependency>
  <groupId>com.squareup.okhttp3</groupId>
  <artifactId>okhttp</artifactId>
  <version>4.12.0</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.18.2</version>
</dependency>
Gradle:
implementation 'com.squareup.okhttp3:okhttp:4.12.0'
implementation 'com.fasterxml.jackson.core:jackson-databind:2.18.2'
VoyageAI does not provide an official Java SDK, so RedisVL4J integrates with the REST API directly via OkHttp.
API Key Setup
Obtain an API key from the VoyageAI dashboard and provide it through configuration or an environment variable:
import com.redis.vl.utils.rerank.VoyageAIReranker;
import java.util.Map;
// Option 1: Provide API key directly
Map<String, String> apiConfig = Map.of("api_key", "your-voyage-api-key");
VoyageAIReranker reranker = VoyageAIReranker.builder()
.apiConfig(apiConfig)
.build();
// Option 2: Set the VOYAGE_API_KEY environment variable
VoyageAIReranker envReranker = VoyageAIReranker.builder().build();
Basic Usage
import com.redis.vl.utils.rerank.VoyageAIReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
// Create reranker
Map<String, String> apiConfig = Map.of("api_key", "your-api-key");
VoyageAIReranker reranker = VoyageAIReranker.builder()
.model("rerank-lite-1")
.limit(3)
.apiConfig(apiConfig)
.build();
// Define query and documents
String query = "What is the capital of the United States?";
List<String> docs = Arrays.asList(
"Carson City is the capital city of Nevada.",
"Washington, D.C. is the capital of the United States.",
"Charlotte Amalie is the capital of the US Virgin Islands."
);
// Rerank documents
RerankResult result = reranker.rank(query, docs);
// Access results
List<?> rerankedDocs = result.getDocuments();
List<Double> scores = result.getScores();
for (int i = 0; i < rerankedDocs.size(); i++) {
System.out.println("Score: " + scores.get(i) + " - " + rerankedDocs.get(i));
}
Understanding VoyageAI Relevance Scores
VoyageAI relevance scores indicate query-document relevance:
- Score range: Typically 0.0 to 1.0
- High relevance: Scores above 0.7
- Medium relevance: Scores between 0.4 and 0.7
- Low relevance: Scores below 0.4
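To act on these bands, for example to hide low-relevance results, a small classifier is enough. The thresholds come straight from the list above; treat them as rules of thumb rather than guarantees from VoyageAI. The sketch counts exactly 0.7 and exactly 0.4 as medium, which is its own choice, and the class name `VoyageScoreBands` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VoyageScoreBands {
    // Bands from this page: above 0.7 high, 0.4 to 0.7 medium, below 0.4 low.
    static String band(double score) {
        if (score > 0.7) return "high";
        if (score >= 0.4) return "medium";
        return "low";
    }

    // Counts how many reranked results fall into each band.
    static Map<String, Integer> histogram(List<Double> scores) {
        Map<String, Integer> counts = new TreeMap<>();
        for (double s : scores) {
            counts.merge(band(s), 1, Integer::sum);
        }
        return counts;
    }
}
```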
VoyageAI’s reranking models provide fast, production-ready relevance scoring suitable for real-time applications.
Supported Models
VoyageAI provides several reranking models:
Model | Use Case | Speed |
---|---|---|
| | Fast, efficient reranking | Fastest |
| | Balanced accuracy and speed | Medium |
| | Highest accuracy | Slower |
// Use higher accuracy model
VoyageAIReranker reranker = VoyageAIReranker.builder()
.model("rerank-2.5")
.apiConfig(apiConfig)
.build();
Working with String Documents
Simple string document reranking:
List<String> docs = Arrays.asList(
"Redis is an in-memory database",
"PostgreSQL is a relational database",
"MongoDB is a document database"
);
RerankResult result = reranker.rank("What is Redis?", docs);
List<String> rerankedDocs = (List<String>) result.getDocuments();
Working with Structured Documents
VoyageAI requires structured documents to have a content field:
import java.util.Map;
// Create structured documents with 'content' field
List<Map<String, Object>> docs = Arrays.asList(
Map.of(
"source", "wiki",
"content", "Redis is an in-memory database", // Must be 'content'
"timestamp", "2024-01-15"
),
Map.of(
"source", "docs",
"content", "PostgreSQL is a relational database",
"timestamp", "2024-01-20"
)
);
RerankResult result = reranker.rank("What is Redis?", docs);
// All fields are preserved in reranked results
List<Map<String, Object>> reranked = (List<Map<String, Object>>) result.getDocuments();
Map<String, Object> topDoc = reranked.get(0);
System.out.println("Source: " + topDoc.get("source"));
System.out.println("Content: " + topDoc.get("content"));
Unlike Cohere, VoyageAI does not support custom field selection. Structured documents must use the content field for the text to be ranked.
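A missing `content` field surfaces as an `IllegalArgumentException` (see Error Handling below). In batch jobs it can be handy to validate documents yourself first, so the error names the offending document. The check below is a hypothetical pre-flight helper, not the library's validation, and it additionally assumes the `content` value should be a String.

```java
import java.util.List;
import java.util.Map;

class VoyageDocCheck {
    // Hypothetical pre-flight check: fail fast, before any API call, if a structured
    // document lacks a String "content" field (the String type is an assumption).
    static void requireContent(List<Map<String, Object>> docs) {
        for (int i = 0; i < docs.size(); i++) {
            Object content = docs.get(i).get("content");
            if (!(content instanceof String)) {
                throw new IllegalArgumentException(
                    "Document at index " + i + " has no String 'content' field");
            }
        }
    }
}
```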
Runtime Parameter Overrides
Override configuration parameters at runtime:
VoyageAIReranker reranker = VoyageAIReranker.builder()
.limit(5) // Default limit
.apiConfig(apiConfig)
.build();
// Override limit to 2 for this specific query
RerankResult result = reranker.rank(query, docs, Map.of("limit", 2));
// Override multiple parameters including truncation
RerankResult result2 = reranker.rank(query, docs, Map.of(
"limit", 10,
"return_score", false,
"truncation", true // Automatically truncate long documents
));
Supported runtime parameters:
- limit (Integer): Maximum number of results to return
- return_score (Boolean): Whether to include relevance scores
- truncation (Boolean): Automatically truncate documents exceeding token limits
Configuration Options
VoyageAIReranker reranker = VoyageAIReranker.builder()
// Model selection
.model("rerank-lite-1")
// Limit: Maximum number of results to return (default: 5)
.limit(10)
// Return scores: Include relevance scores (default: true)
.returnScore(true)
// API configuration with API key
.apiConfig(Map.of("api_key", "your-api-key"))
.build();
Error Handling
try {
RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
// Missing API key, invalid arguments, or missing 'content' field
System.err.println("Configuration error: " + e.getMessage());
} catch (RuntimeException e) {
// API call failure or network error
System.err.println("Reranking failed: " + e.getMessage());
if (e.getCause() != null) {
System.err.println("Cause: " + e.getCause().getMessage());
}
}
Common errors:
- Missing API key: Set the VOYAGE_API_KEY environment variable or provide api_key in apiConfig
- Missing 'content' field: Structured documents must have a content field
- Invalid model: Check the model name and availability
- API quota exceeded: Check your VoyageAI account limits
- Network error: Verify internet connectivity
Integration with SearchIndex
Rerankers accept RedisVL search results, which are returned as a list of maps, as input:
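import com.redis.vl.index.SearchIndex;
import com.redis.vl.query.VectorQuery;
import com.redis.vl.utils.rerank.HFCrossEncoderReranker;
import com.redis.vl.utils.rerank.RerankResult;
import java.util.List;
import java.util.Map;
// Assumes an existing SearchIndex `index` and a query embedding `queryEmbedding`
// created with the same vectorizer that produced the stored "embedding" vectors.
// Perform initial vector search (get top 100 candidates)
VectorQuery query = VectorQuery.builder()
.vector(queryEmbedding)
.field("embedding")
.numResults(100) // Broad initial retrieval
.build();
List<Map<String, Object>> searchResults = index.query(query);
// Rerank to get the best 10 results.
// Only results that contain a "content" field are ranked.
HFCrossEncoderReranker reranker = HFCrossEncoderReranker.builder()
.limit(10)
.build();
RerankResult reranked = reranker.rank("user query text", searchResults);
// Present the top 10 highest-quality results to the user
List<Map<String, Object>> topResults =
(List<Map<String, Object>>) reranked.getDocuments();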
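Because the HuggingFace reranker only ranks maps that have a `content` field, search results whose text is stored under a different field name would be skipped. A hypothetical adapter like the one below copies that field into `content` before calling `rank`; the field name `text` used in the test and usage note is an assumption about your schema. Each map is copied, so the original search results stay unchanged and every other field is preserved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchResultAdapter {
    // Hypothetical adapter: exposes `textField` under "content" on a copy of each hit,
    // so a reranker that reads "content" can score it; other fields are kept as-is.
    static List<Map<String, Object>> withContent(List<Map<String, Object>> hits, String textField) {
        List<Map<String, Object>> out = new ArrayList<>(hits.size());
        for (Map<String, Object> hit : hits) {
            Map<String, Object> copy = new HashMap<>(hit);
            Object text = hit.get(textField);
            if (text != null) {
                copy.put("content", text);
            }
            out.add(copy);
        }
        return out;
    }
}
```

For example: `reranker.rank("user query text", SearchResultAdapter.withContent(searchResults, "text"))`.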
Performance Considerations
Speed vs Accuracy
Cross-encoders are more accurate but slower than vector similarity. Approximate, hardware-dependent figures:
- Vector similarity: ~1ms for 1000 documents (comparing precomputed embeddings)
- Cross-encoder reranking: ~10-100ms for 100 documents (one model inference per query-document pair)
Best practice: Use vector search for broad retrieval, then rerank top candidates.
Model Selection Trade-offs
Model | Speed | Accuracy | Size |
---|---|---|---|
| | Fast | Good | Small |
| | Medium | Better | Medium |
| | Slow | Best | Large |
Error Handling
try {
RerankResult result = reranker.rank(query, docs);
} catch (IllegalArgumentException e) {
// Invalid arguments (null query, empty docs, etc.)
System.err.println("Invalid input: " + e.getMessage());
} catch (RuntimeException e) {
// Model loading or inference failure
System.err.println("Reranking failed: " + e.getMessage());
}
Common errors:
- Model not found: Check the model name and network connectivity
- Out of memory: Use a smaller model or increase the JVM heap
- Invalid documents: Ensure Map documents have a content field
Next Steps
- Getting Started Guide - Basic vector search setup
- Hybrid Queries - Combine vector and metadata filtering
- Vectorizers - Create embeddings for initial search
- API Reference - Complete Javadoc documentation