LLM Message History
Large Language Models are inherently stateless and have no knowledge of previous interactions with a user, or even of previous parts of the current conversation. While this may not be noticeable when asking simple questions, it becomes a hindrance in long-running conversations that rely on conversational context.
The solution to this problem is to append the previous conversation history to each subsequent call to the LLM.
This notebook shows how to use Redis to structure, store, and retrieve this conversational message history.
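Before turning to Redis, it may help to see what "appending the history" means for a single request. The following is a minimal plain-Java sketch with no Redis involved. The class and method names are hypothetical, and the role names simply follow the ones used on this page (real LLM APIs may name roles differently).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of RedisVL: shows why the history must be re-sent on every call.
class StatelessRequestSketch {

    // Builds the message list for one LLM request: every earlier turn, then the new prompt.
    static List<Map<String, String>> buildRequest(List<Map<String, String>> history, String newPrompt) {
        List<Map<String, String>> request = new ArrayList<>(history); // copy so the stored history is untouched
        request.add(Map.of("role", "user", "content", newPrompt));
        return request;
    }

    public static void main(String[] args) {
        List<Map<String, String>> history = List.of(
                Map.of("role", "user", "content", "What is the capital of France?"),
                Map.of("role", "llm", "content", "The capital is Paris."));
        // Without the earlier turns, the model cannot know what "its" refers to.
        for (Map<String, String> m : buildRequest(history, "What is its population?")) {
            System.out.println(m.get("role") + ": " + m.get("content"));
        }
    }
}
```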
import com.redis.vl.extensions.messagehistory.MessageHistory;
import redis.clients.jedis.UnifiedJedis;
import java.util.*;
UnifiedJedis jedis = new UnifiedJedis("redis://localhost:6379");
MessageHistory chatHistory = new MessageHistory("student tutor", jedis);
To align with common LLM APIs, Redis stores messages with role and content fields.
The supported roles are "system", "user" and "llm".
You can store messages one at a time or all at once.
chatHistory.addMessage(Map.of(
"role", "system",
"content", "You are a helpful geography tutor, giving simple and short answers to questions about European countries."
));
chatHistory.addMessages(List.of(
Map.of("role", "user", "content", "What is the capital of France?"),
Map.of("role", "llm", "content", "The capital is Paris."),
Map.of("role", "user", "content", "And what is the capital of Spain?"),
Map.of("role", "llm", "content", "The capital is Madrid."),
Map.of("role", "user", "content", "What is the population of Great Britain?"),
Map.of("role", "llm", "content", "As of 2023 the population of Great Britain is approximately 67 million people.")
));
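The two storage styles can be mimicked with a small in-memory class. This is a hypothetical stand-in for MessageHistory, not its implementation: it only checks that each message has one of the three roles listed above and some content. In the batch case it validates every message before storing any, which is a design choice of this sketch rather than documented RedisVL behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for MessageHistory (illustration only, no Redis).
class InMemoryHistory {
    // The roles named on this page; the real library may accept more.
    static final Set<String> ROLES = Set.of("system", "user", "llm");

    private final List<Map<String, String>> messages = new ArrayList<>();

    // Rejects messages with a missing or unknown role, or with no content.
    private static void validate(Map<String, String> message) {
        String role = message.get("role");
        if (role == null || !ROLES.contains(role)) {
            throw new IllegalArgumentException("Unsupported role: " + role);
        }
        if (message.get("content") == null) {
            throw new IllegalArgumentException("Message has no content");
        }
    }

    // Store a single message.
    void addMessage(Map<String, String> message) {
        validate(message);
        messages.add(message);
    }

    // Store a batch: validate all first so a bad entry does not leave a half-stored batch.
    void addMessages(List<Map<String, String>> batch) {
        for (Map<String, String> message : batch) {
            validate(message);
        }
        messages.addAll(batch);
    }

    int size() {
        return messages.size();
    }
}
```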
At any point we can retrieve the recent history of the conversation. It will be ordered by entry time.
// arguments: topK, asText, raw, sessionTag (null uses the default session)
List<Map<String, Object>> context = chatHistory.getRecent(5, false, false, null);
for (Map<String, Object> message : context) {
System.out.println(message);
}
Output:
{role=llm, content=The capital is Paris.}
{role=user, content=And what is the capital of Spain?}
{role=llm, content=The capital is Madrid.}
{role=user, content=What is the population of Great Britain?}
{role=llm, content=As of 2023 the population of Great Britain is approximately 67 million people.}
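Conceptually, retrieving the recent history means sorting by entry time, keeping the newest N messages, and returning them oldest first, which is why the system message does not appear in the output above. Here is a minimal in-memory sketch of that rule; the counter standing in for a timestamp and all names are illustrative assumptions, not how RedisVL stores entries in Redis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of getRecent-style retrieval over an in-memory list.
class RecentSketch {
    static final class Entry {
        final long seq;       // stand-in for the entry timestamp
        final String role;
        final String content;

        Entry(long seq, String role, String content) {
            this.seq = seq;
            this.role = role;
            this.content = content;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long clock = 0; // increases with every stored message

    void add(String role, String content) {
        entries.add(new Entry(clock++, role, content));
    }

    // Returns up to topK of the newest entries, oldest first.
    List<Entry> getRecent(int topK) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingLong(e -> e.seq));
        int from = Math.max(0, sorted.size() - topK);
        return new ArrayList<>(sorted.subList(from, sorted.size()));
    }
}
```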
In many LLM flows the conversation progresses as a series of prompt and response pairs. MessageHistory offers a convenience method, store(), to add each pair in a single call.
String prompt = "what is the size of England compared to Portugal?";
String response = "England is larger in land area than Portugal by about 15000 square miles.";
chatHistory.store(prompt, response);
List<Map<String, Object>> context = chatHistory.getRecent(6, false, false, null);
for (Map<String, Object> message : context) {
System.out.println(message);
}
Output:
{role=user, content=And what is the capital of Spain?}
{role=llm, content=The capital is Madrid.}
{role=user, content=What is the population of Great Britain?}
{role=llm, content=As of 2023 the population of Great Britain is approximately 67 million people.}
{role=user, content=what is the size of England compared to Portugal?}
{role=llm, content=England is larger in land area than Portugal by about 15000 square miles.}
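As the output shows, each call to store() adds two messages: the prompt under the "user" role and the response under the "llm" role. A minimal in-memory sketch of that equivalence (a hypothetical class, not RedisVL's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a prompt/response pair becomes two ordered messages.
class PairStoreSketch {
    private final List<Map<String, String>> messages = new ArrayList<>();

    void store(String prompt, String response) {
        messages.add(Map.of("role", "user", "content", prompt));   // the question comes first
        messages.add(Map.of("role", "llm", "content", response));  // followed by the model's answer
    }

    List<Map<String, String>> messages() {
        return List.copyOf(messages); // read-only snapshot
    }
}
```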
Managing multiple users and conversations
For applications that need to handle multiple conversations concurrently, MessageHistory can tag messages with a session tag (the last argument in the calls below) to keep conversations separate.
chatHistory.addMessage(
Map.of(
"role", "system",
"content", "You are a helpful algebra tutor, giving simple answers to math problems."
),
"student two"
);
chatHistory.addMessages(
List.of(
Map.of("role", "user", "content", "What is the value of x in the equation 2x + 3 = 7?"),
Map.of("role", "llm", "content", "The value of x is 2."),
Map.of("role", "user", "content", "What is the value of y in the equation 3y - 5 = 7?"),
Map.of("role", "llm", "content", "The value of y is 4.")
),
"student two"
);
List<Map<String, Object>> mathMessages = chatHistory.getRecent(10, false, false, "student two");
for (Map<String, Object> mathMessage : mathMessages) {
System.out.println(mathMessage);
}
Output:
{role=system, content=You are a helpful algebra tutor, giving simple answers to math problems.}
{role=user, content=What is the value of x in the equation 2x + 3 = 7?}
{role=llm, content=The value of x is 2.}
{role=user, content=What is the value of y in the equation 3y - 5 = 7?}
{role=llm, content=The value of y is 4.}
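One way to picture session tags is a separate message list per tag, with a null tag meaning the history's own default session (consistent with the earlier getRecent(..., null) calls returning the geography conversation). The sketch below assumes exactly that; the class, the default tag value, and the in-memory map are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of session-tagged message history.
class TaggedHistorySketch {
    private final String defaultTag;
    private final Map<String, List<Map<String, String>>> bySession = new HashMap<>();

    TaggedHistorySketch(String defaultTag) {
        this.defaultTag = defaultTag;
    }

    // A null tag falls back to this history's default session.
    private String resolve(String tag) {
        return tag == null ? defaultTag : tag;
    }

    void addMessage(Map<String, String> message, String tag) {
        bySession.computeIfAbsent(resolve(tag), k -> new ArrayList<>()).add(message);
    }

    // Only messages from the requested session are considered.
    List<Map<String, String>> getRecent(int topK, String tag) {
        List<Map<String, String>> all = bySession.getOrDefault(resolve(tag), List.of());
        int from = Math.max(0, all.size() - topK);
        return new ArrayList<>(all.subList(from, all.size()));
    }
}
```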
Semantic message history
In longer conversations, the list of messages keeps growing. Because LLMs are stateless, we have to keep passing this conversation history on each subsequent call so that the LLM has the correct context.
A typical flow looks like this:
// getUserInput() and llmApiCall() are placeholders for your own input handling and LLM client code
while (true) {
    String prompt = getUserInput();
    List<Map<String, Object>> context = chatHistory.getRecent(5, false, false, null);
    String response = llmApiCall(prompt, context);
    chatHistory.store(prompt, response);
}
This works, but as the context grows, so does the number of tokens sent to the LLM, which increases latency and cost.
Conversation histories can be truncated, but that can lead to losing relevant information that appeared early on.
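To see how truncation can lose early context, here is a minimal sketch that keeps only the newest messages fitting within a size budget. It counts whitespace-separated words as a rough, assumed stand-in for tokens (real tokenizers count differently), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of budget-based truncation (hypothetical names, not RedisVL code).
class TruncationSketch {

    // Rough stand-in for a tokenizer: counts whitespace-separated words.
    static int approxTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Walks backwards from the newest message and keeps messages while they fit in the budget.
    // Stops at the first message that does not fit, so the kept messages stay contiguous.
    static List<Map<String, String>> truncate(List<Map<String, String>> history, int budget) {
        List<Map<String, String>> kept = new ArrayList<>();
        int used = 0;
        for (int i = history.size() - 1; i >= 0; i--) {
            int cost = approxTokens(history.get(i).get("content"));
            if (used + cost > budget) {
                break;
            }
            used += cost;
            kept.add(0, history.get(i)); // prepend to preserve chronological order
        }
        return kept;
    }
}
```

For example, with a budget of 10 words, a conversation made of a six-word system prompt followed by two question-and-answer pairs (6 + 4 and 4 + 4 words) keeps only the last pair. The system prompt and the first exchange are the first things lost, even though the system prompt is what sets the tutor's behavior.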
A better solution is to pass only the relevant conversational context on each subsequent call.
For this, RedisVL provides SemanticMessageHistory, which uses vector similarity search to return only the semantically relevant parts of the conversation.
import com.redis.vl.extensions.messagehistory.SemanticMessageHistory;
import com.redis.vl.utils.vectorize.SentenceTransformersVectorizer;
import redis.clients.jedis.UnifiedJedis;
UnifiedJedis jedis = new UnifiedJedis("redis://localhost:6379");
// Create a vectorizer for embedding messages
SentenceTransformersVectorizer vectorizer =
new SentenceTransformersVectorizer("sentence-transformers/all-MiniLM-L6-v2");
// Create semantic message history with vectorizer
SemanticMessageHistory semanticHistory =
new SemanticMessageHistory("geography_tutor", vectorizer, jedis);
Storing messages with embeddings
When you store messages in SemanticMessageHistory, each message is automatically embedded using the configured vectorizer:
// Store a system message
semanticHistory.addMessage(Map.of(
"role", "system",
"content", "You are a helpful geography tutor."
));
// Store prompt-response pairs
semanticHistory.store("What is the capital of France?", "The capital is Paris.");
semanticHistory.store("What is the population of Japan?", "Japan has about 125 million people.");
semanticHistory.store("What are the main exports of Brazil?", "Brazil exports coffee, soybeans, and iron ore.");
semanticHistory.store("What is the tallest mountain in Europe?", "Mont Blanc at 4,808 meters.");
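The effect of automatic embedding is that every stored message carries a vector alongside its text, computed once at write time. The sketch below shows that shape in memory. Its embed() method is a deliberately crude, hypothetical vectorizer (lowercase word counts) used only so the example runs without a model; it does not capture meaning the way a real embedding model such as the SentenceTransformersVectorizer configured above does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: each stored message keeps its text plus an embedding.
class EmbeddedStoreSketch {
    static final class StoredMessage {
        final String role;
        final String content;
        final Map<String, Integer> vector; // sparse toy "embedding": word -> count

        StoredMessage(String role, String content, Map<String, Integer> vector) {
            this.role = role;
            this.content = content;
            this.vector = vector;
        }
    }

    private final List<StoredMessage> messages = new ArrayList<>();

    // Toy vectorizer: counts lowercase alphanumeric words. A real setup would call an embedding model.
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Embedding happens at write time, so a later search only needs to embed the query.
    void addMessage(String role, String content) {
        messages.add(new StoredMessage(role, content, embed(content)));
    }

    void store(String prompt, String response) {
        addMessage("user", prompt);
        addMessage("llm", response);
    }

    List<StoredMessage> messages() {
        return new ArrayList<>(messages);
    }
}
```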
Retrieving semantically relevant messages
Instead of retrieving the most recent messages, you can retrieve the most semantically relevant messages for a given prompt:
// Find messages relevant to a question about France
List<Map<String, Object>> relevantContext = semanticHistory.getRelevant(
"Tell me about French geography", // search prompt
false, // asText - return as maps, not strings
3, // topK - return up to 3 results
false, // fallback - don't fall back to recent if no matches
null, // sessionTag - use default session
0.5, // distanceThreshold - max semantic distance
null // role filter - search all roles
);
for (Map<String, Object> message : relevantContext) {
System.out.println(message);
}
Output:
{role=user, content=What is the capital of France?}
{role=llm, content=The capital is Paris.}
The semantic search finds messages about France even though the search query uses different words.
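At its core, this kind of retrieval ranks stored messages by the distance between their vectors and the prompt's vector and keeps the closest topK. Below is a minimal brute-force sketch using cosine distance (1 minus cosine similarity), assuming the vectors are already computed and non-zero. The class names and the made-up three-dimensional vectors in main() are illustrative only; a Redis vector index does not scan every message like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Brute-force top-K retrieval by cosine distance (illustrative sketch only).
class TopKSketch {
    static final class Msg {
        final String role;
        final String content;
        final double[] vec; // precomputed embedding, assumed non-zero

        Msg(String role, String content, double[] vec) {
            this.role = role;
            this.content = content;
            this.vec = vec;
        }
    }

    // Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns up to k stored messages, nearest to the query first.
    static List<Msg> topK(List<Msg> stored, double[] query, int k) {
        List<Msg> ranked = new ArrayList<>(stored);
        ranked.sort(Comparator.comparingDouble(m -> cosineDistance(m.vec, query)));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        // Made-up 3-D vectors standing in for real embeddings.
        List<Msg> stored = List.of(
                new Msg("user", "What is the capital of France?", new double[] {1.0, 0.0, 0.0}),
                new Msg("llm", "The capital is Paris.", new double[] {0.9, 0.1, 0.0}),
                new Msg("user", "What is the population of Japan?", new double[] {0.0, 1.0, 0.0}));
        double[] query = {1.0, 0.05, 0.0}; // pretend embedding of "Tell me about French geography"
        for (Msg m : topK(stored, query, 2)) {
            System.out.println(m.role + ": " + m.content);
        }
    }
}
```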
Distance threshold
The distanceThreshold parameter sets the maximum vector distance a stored message may have from the search prompt and still be returned. Lower values mean stricter matching:
// Get or set the default distance threshold
semanticHistory.setDistanceThreshold(0.3);
double threshold = semanticHistory.getDistanceThreshold();
// Override threshold for a specific query
List<Map<String, Object>> strictResults = semanticHistory.getRelevant(
"French cities", false, 5, false, null, 0.2, null); // stricter threshold
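To make "lower values mean stricter matching" concrete, suppose the distances are cosine distances (an assumption of this sketch) and take a query vector (1, 0). A message vector (1, 1) has cosine similarity 1/√2 ≈ 0.707, so its distance is about 0.293: it passes a threshold of 0.5 but not 0.2. The small filter below reproduces that with hypothetical names and made-up 2-D vectors; treating a distance exactly equal to the threshold as a match is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative threshold filter over cosine distances (not the RedisVL implementation).
class ThresholdSketch {
    // Cosine distance = 1 - cosine similarity.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keeps the names of vectors whose distance to the query is at most the threshold.
    static List<String> withinThreshold(Map<String, double[]> items, double[] query, double threshold) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> item : items.entrySet()) {
            if (cosineDistance(item.getValue(), query) <= threshold) {
                hits.add(item.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, double[]> items = new LinkedHashMap<>(); // keeps insertion order for printing
        items.put("same direction", new double[] {1, 0});   // distance 0
        items.put("45 degrees", new double[] {1, 1});       // distance about 0.293
        items.put("orthogonal", new double[] {0, 1});       // distance 1
        double[] query = {1, 0};
        System.out.println("threshold 0.5: " + withinThreshold(items, query, 0.5)); // prints "threshold 0.5: [same direction, 45 degrees]"
        System.out.println("threshold 0.2: " + withinThreshold(items, query, 0.2)); // prints "threshold 0.2: [same direction]"
    }
}
```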
Fallback to recent messages
If no semantically relevant messages are found, you can fall back to returning the most recent messages:
List<Map<String, Object>> context = semanticHistory.getRelevant(
"completely unrelated topic xyz",
false,
5,
true, // fallback=true - return recent messages if no semantic matches
null,
0.3,
null
);
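The fallback rule amounts to a simple branch: use the semantic matches if there are any, otherwise return the most recent messages. A generic sketch under that reading (the class, method, and parameter names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "fall back to recent messages when nothing is relevant".
class FallbackSketch {
    // semanticMatches: results of the similarity search (possibly empty)
    // history: all messages of the session, oldest first
    static <T> List<T> relevantOrRecent(List<T> semanticMatches, List<T> history, int topK, boolean fallback) {
        if (!semanticMatches.isEmpty() || !fallback) {
            return semanticMatches; // matches were found, or fallback is disabled
        }
        int from = Math.max(0, history.size() - topK);
        return new ArrayList<>(history.subList(from, history.size())); // newest topK, oldest first
    }
}
```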
Role filtering in semantic search
You can filter semantic search results by message role:
// Search only in user messages
List<Map<String, Object>> userMessages = semanticHistory.getRelevant(
"What about France?", false, 5, false, null, 0.5, "user");
// Search in multiple roles
List<Map<String, Object>> userAndLlm = semanticHistory.getRelevant(
"What about France?", false, 5, false, null, 0.5, List.of("user", "llm"));
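Role filtering can be pictured as narrowing the candidate set before ranking. The sketch below treats a null filter as "all roles", as the examples above do, and accepts any collection of role names; the class and method names are hypothetical, and a single role is simply a one-element collection here rather than the separate String argument used above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of filtering candidate messages by role.
class RoleFilterSketch {
    static final class Msg {
        final String role;
        final String content;

        Msg(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    // Keeps only candidates whose role is in the filter; a null filter means no filtering.
    static List<Msg> filterByRoles(List<Msg> candidates, Collection<String> roles) {
        if (roles == null) {
            return new ArrayList<>(candidates);
        }
        Set<String> allowed = new HashSet<>(roles);
        List<Msg> kept = new ArrayList<>();
        for (Msg m : candidates) {
            if (allowed.contains(m.role)) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```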
Improved LLM conversation flow
With semantic message history, the conversation flow becomes more efficient:
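while (true) {
// getUserInput() and llmApiCall() are placeholders for your own input handling and LLM client code
String prompt = getUserInput();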
// Get only semantically relevant context instead of recent messages
List<Map<String, Object>> context = semanticHistory.getRelevant(
prompt, // use the current prompt to find relevant history
false, // return as maps
5, // up to 5 relevant messages
true, // fall back to recent if no semantic matches
null, // default session
null, // use default threshold
null // no role filter
);
String response = llmApiCall(prompt, context);
semanticHistory.store(prompt, response);
}
Conversation control
LLMs occasionally hallucinate. When this happens, it can be useful to prune the incorrect information from the conversation history so that it is not passed along as context in later calls.
chatHistory.store(
"what is the smallest country in Europe?",
"Monaco is the smallest country in Europe at 0.78 square miles." // Incorrect. Vatican City is the smallest country in Europe
);
// get the key of the incorrect message
List<Map<String, Object>> context = chatHistory.getRecent(1, false, true, null);
String badKey = (String) context.get(0).get("entry_id");
chatHistory.drop(badKey);
List<Map<String, Object>> correctedContext = chatHistory.getRecent(5, false, false, null);
for (Map<String, Object> message : correctedContext) {
System.out.println(message);
}
Output:
{role=user, content=What is the population of Great Britain?}
{role=llm, content=As of 2023 the population of Great Britain is approximately 67 million people.}
{role=user, content=what is the size of England compared to Portugal?}
{role=llm, content=England is larger in land area than Portugal by about 15000 square miles.}
{role=user, content=what is the smallest country in Europe?}
// delete all stored messages from this history when you are done
chatHistory.clear();
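The prune-then-verify pattern above relies on every stored message having its own ID. Here is a minimal in-memory sketch of that bookkeeping, with hypothetical ID strings and class names; RedisVL generates and stores its own entry IDs in Redis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: messages keyed by ID so a single bad entry can be removed.
class DropSketch {
    private final Map<String, String[]> entries = new LinkedHashMap<>(); // id -> {role, content}, insertion order
    private int nextId = 0;

    // Stores a message and returns its (hypothetical) entry ID.
    String add(String role, String content) {
        String id = "entry-" + nextId++;
        entries.put(id, new String[] {role, content});
        return id;
    }

    // Removes one message; returns false if the ID is unknown.
    boolean drop(String id) {
        return entries.remove(id) != null;
    }

    // Removes everything, like clearing the whole history.
    void clear() {
        entries.clear();
    }

    // Messages as "role: content", oldest first.
    List<String> messages() {
        List<String> out = new ArrayList<>();
        for (String[] e : entries.values()) {
            out.add(e[0] + ": " + e[1]);
        }
        return out;
    }
}
```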