Class ExtractiveSelector

java.lang.Object
com.redis.vl.extensions.summarization.ExtractiveSelector

public class ExtractiveSelector extends Object
BERT-based extractive summarization using sentence clustering.

This class selects the most representative sentences from a document by embedding sentences with BERT, clustering them with k-means, and selecting the sentence closest to each cluster centroid.

Key Feature: Selected sentences are returned verbatim from the input. This is critical for SubEM (Substring Exact Match) evaluation, where a paraphrased summary can fail to contain the expected answer string.
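
To illustrate why verbatim text matters for SubEM, here is a minimal sketch that is not part of this library. The names `SubEmDemo` and `substringExactMatch` are hypothetical, and the check is simplified to a case-sensitive `String.contains` with no normalization, which is an assumption rather than a definition of the metric.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of com.redis.vl.
final class SubEmDemo {

    /** Returns true if the gold answer appears verbatim in at least one summary sentence. */
    static boolean substringExactMatch(List<String> summary, String goldAnswer) {
        for (String sentence : summary) {
            if (sentence.contains(goldAnswer)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String gold = "14 March 2019";

        // An extractive summary keeps the original sentence untouched.
        List<String> extractive =
            List.of("The contract was signed on 14 March 2019 in Oslo.");

        // A paraphrase keeps the meaning but loses the exact substring.
        List<String> paraphrased =
            List.of("The parties signed the agreement in Oslo in mid-March 2019.");

        System.out.println(substringExactMatch(extractive, gold));  // true
        System.out.println(substringExactMatch(paraphrased, gold)); // false
    }
}
```

The paraphrased sentence is arguably a fine summary, yet it scores zero under this check because the answer span no longer appears character for character.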

Example Usage:


 SentenceTransformersVectorizer vectorizer = SentenceTransformersVectorizer.builder()
     .modelName("all-MiniLM-L6-v2")
     .build();

 ExtractiveSelector selector = new ExtractiveSelector(vectorizer);
 SentenceSplitter splitter = new SentenceSplitter();

 String document = "Long document text...";
 List<String> sentences = splitter.split(document);
 List<String> keySentences = selector.selectKeySentences(sentences, 10);

 // keySentences contains the 10 most representative sentences
 // in their original order, with exact original text preserved
 
  • Constructor Details

    • ExtractiveSelector

      public ExtractiveSelector(SentenceTransformersVectorizer embedder)
      Create an extractive selector with default settings.
      Parameters:
      embedder - The sentence transformer vectorizer for embeddings
    • ExtractiveSelector

      public ExtractiveSelector(SentenceTransformersVectorizer embedder, int defaultNumSentences)
      Create an extractive selector with a custom default number of sentences.
      Parameters:
      embedder - The sentence transformer vectorizer for embeddings
      defaultNumSentences - Default number of sentences to select
    • ExtractiveSelector

      public ExtractiveSelector(SentenceTransformersVectorizer embedder, int defaultNumSentences, int maxIterations)
      Create an extractive selector with full configuration.
      Parameters:
      embedder - The sentence transformer vectorizer for embeddings
      defaultNumSentences - Default number of sentences to select
      maxIterations - Maximum k-means iterations
  • Method Details

    • selectKeySentences

      public List<String> selectKeySentences(List<String> sentences)
      Select the most representative sentences using the default count.
      Parameters:
      sentences - List of sentences to select from
      Returns:
      Selected sentences in original order
    • selectKeySentences

      public List<String> selectKeySentences(List<String> sentences, int k)
      Select the k most representative sentences from the input.

      Algorithm:

      1. Embed all sentences using BERT
      2. Cluster embeddings using k-means with k-means++ initialization
      3. For each cluster, select the sentence closest to the centroid
      4. Return sentences in their original order
      Parameters:
      sentences - List of sentences to select from
      k - Number of sentences to select
      Returns:
      Selected sentences in original order (preserves exact text)
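
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not this class's actual implementation: `ClusterSelectSketch` and all of its methods are hypothetical, embeddings are passed in as precomputed `double[][]` rows instead of being produced by the vectorizer, distance is squared Euclidean, and the iteration cap and random seed in `select` are arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

// Hypothetical sketch; not the ExtractiveSelector implementation.
final class ClusterSelectSketch {

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Index of the candidate vector closest to the target (first one wins on ties). */
    static int nearest(double[] target, double[][] candidates) {
        int best = 0;
        for (int i = 1; i < candidates.length; i++) {
            if (sqDist(target, candidates[i]) < sqDist(target, candidates[best])) best = i;
        }
        return best;
    }

    /** k-means++ seeding: each new centroid is drawn with probability proportional to D^2. */
    static double[][] seed(double[][] x, int k, Random rnd) {
        double[][] c = new double[k][];
        c[0] = x[rnd.nextInt(x.length)].clone();
        double[] d2 = new double[x.length];
        Arrays.fill(d2, Double.MAX_VALUE);
        for (int j = 1; j < k; j++) {
            double total = 0;
            for (int i = 0; i < x.length; i++) {
                d2[i] = Math.min(d2[i], sqDist(x[i], c[j - 1]));
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = x.length - 1; // fallback if rounding leaves r >= 0
            for (int i = 0; i < x.length; i++) {
                r -= d2[i];
                if (r < 0) { pick = i; break; }
            }
            c[j] = x[pick].clone();
        }
        return c;
    }

    /** Returns ascending indices of the rows closest to each k-means centroid. */
    static int[] selectIndices(double[][] x, int k, int maxIterations, long seedValue) {
        int n = x.length;
        if (n == 0 || k <= 0) return new int[0];
        if (k >= n) { // nothing to cluster: keep every row
            int[] all = new int[n];
            for (int i = 0; i < n; i++) all[i] = i;
            return all;
        }
        int dim = x[0].length;
        double[][] c = seed(x, k, new Random(seedValue));
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int it = 0; it < maxIterations; it++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) { // assignment step
                int best = nearest(x[i], c);
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) { // update step
                count[assign[i]]++;
                for (int d = 0; d < dim; d++) sum[assign[i]][d] += x[i][d];
            }
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) continue; // empty cluster keeps its previous centroid
                for (int d = 0; d < dim; d++) c[j][d] = sum[j][d] / count[j];
            }
        }
        // A sorted set restores original order; duplicate picks collapse,
        // so fewer than k indices can come back.
        TreeSet<Integer> picked = new TreeSet<>();
        for (double[] centroid : c) picked.add(nearest(centroid, x));
        return picked.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Maps the chosen indices back to the untouched input strings. */
    static List<String> select(List<String> sentences, double[][] embeddings, int k) {
        List<String> out = new ArrayList<>();
        for (int i : selectIndices(embeddings, k, 100, 42L)) out.add(sentences.get(i));
        return out;
    }
}
```

Because only indices are carried through clustering, the returned strings are the same objects that were passed in, which is what keeps the text exact. A call such as `ClusterSelectSketch.select(sentences, embeddings, 3)` expects one embedding row per sentence, in the same order.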
    • builder

      Builder for ExtractiveSelector.