Working Memory¶
Working memory is session-scoped, durable storage designed for active conversation state and session data. It's the "scratch pad" where an AI agent keeps track of the current conversation context for a particular session.
Overview¶
Working memory provides durable storage for a single conversation session. It's optimized for storing conversation messages, session-specific data, and structured memories that may later be promoted to long-term storage. By default, working memory persists to maintain conversation history, but you can set TTL expiration if your application doesn't need persistent conversation history.
Feature | Details |
---|---|
Scope | Session-scoped |
Lifespan | Durable by default, optional TTL |
Storage | Redis key-value with JSON |
Search | Simple text matching |
Capacity | Limited by window size |
Use Case | Active conversation state |
Indexing | None |
Deduplication | None |
Characteristics¶
- Session Scoped: Each session has its own isolated working memory
- Durable by Default: Persists conversation history unless TTL is explicitly set
- Optional TTL: Can be configured to expire if conversation history isn't needed
- Window Management: Automatically summarizes when message count exceeds limits
- Mixed Content: Stores both conversation messages and structured memory records
- No Indexing: Simple JSON storage in Redis
- Promotion: Structured memories can be promoted to long-term storage
Data Structure¶
Working memory contains:
- Messages: Conversation history (role/content pairs)
- Memories: Structured memory records awaiting promotion
- Context: Summary of past conversation when truncated
- Data: Arbitrary JSON key-value storage
- Metadata: User ID, timestamps, TTL settings
When to Use Working Memory¶
1. Conversation Messages¶
The primary use of working memory is storing conversation messages to maintain context across turns:
import ulid
# Store conversation messages for context continuity
working_memory = WorkingMemory(
session_id="chat_123",
messages=[
MemoryMessage(role="user", content="I'm planning a trip to Paris next month", id=ulid.ULID()),
MemoryMessage(role="assistant", content="That sounds exciting! What type of activities are you interested in?", id=ulid.ULID()),
MemoryMessage(role="user", content="I love museums and good food", id=ulid.ULID())
]
)
# On the next turn, the assistant can access this context:
# - User is planning a Paris trip
# - Trip is next month
# - User likes museums and food
# This enables coherent, context-aware responses
2. Session-Specific Data¶
Use the data
field for temporary session information that doesn't need to persist across conversations:
# Store session-specific facts and configuration
working_memory = WorkingMemory(
session_id="chat_123",
data={
"temp_trip_info": {
"destination": "Paris",
"travel_month": "next month",
"planning_stage": "initial"
},
"user_preferences": {"temperature_unit": "celsius"},
"conversation_mode": "casual"
}
)
3. Structured Memories for Long-Term Storage¶
Use the memories
field for important facts that should be remembered across all future conversations:
# Important facts that should persist beyond this session
working_memory = WorkingMemory(
session_id="chat_123",
memories=[
MemoryRecord(
text="User is planning a trip to Paris next month",
id="trip_planning_paris",
memory_type="episodic",
topics=["travel", "planning"],
entities=["Paris"]
)
]
)
# This memory will be automatically promoted to long-term storage
🔑 Key Distinction: - Use
data
field for session-specific facts that stay only in the session - Usememories
field for important facts that should be promoted to long-term storage - Anything in thememories
field will automatically become persistent and searchable across all future sessions
Producing Long-Term Memories from Working Memory¶
Working memory can automatically extract and promote memories to long-term storage using different strategies. This is one of the most powerful features of the memory server - it can intelligently analyze conversation content and create persistent memories without manual intervention.
Memory Server Extracts in the Background¶
By default, the memory server automatically analyzes working memory content and extracts meaningful memories in the background. This is ideal when you want the memory server to handle all LLM operations internally.
# Configure automatic extraction strategy
working_memory = WorkingMemory(
session_id="chat_123",
long_term_memory_strategy=MemoryStrategyConfig(
extraction_strategy="thread_aware", # Analyzes conversation threads
custom_prompt="Extract key facts about user preferences and important events",
enable_topic_extraction=True,
enable_entity_extraction=True
),
messages=[
MemoryMessage(role="user", content="I'm a software engineer at TechCorp"),
MemoryMessage(role="assistant", content="That's great! What technologies do you work with?"),
MemoryMessage(role="user", content="Mainly Python and React for web applications")
]
)
# The server will automatically extract memories like:
# - "User is a software engineer at TechCorp"
# - "User works with Python and React for web applications"
Your LLM Extracts (Client-Side)¶
If you prefer to manage all LLM activity in your application, you can have your LLM extract memories client-side and add them to working memory. This gives you full control over the extraction process and LLM usage.
# Your LLM can use tools to lazily add memories to working memory
# These will be promoted to long-term storage when the session is processed
# Using the add_memory_to_working_memory tool (lazy approach)
tools = [client.get_add_memory_tool_schema()]
# Your LLM can call this tool to add memories:
# add_memory_to_working_memory(
# session_id="chat_123",
# memory={
# "text": "User prefers Python for backend development",
# "memory_type": "semantic",
# "topics": ["programming", "preferences"]
# }
# )
The Python SDK includes tools that allow your LLM to create memories either lazily (added to working memory for later promotion) or eagerly (created directly in long-term storage):
- Lazy approach:
add_memory_to_working_memory
- adds memories to working memory for batch promotion - Eager approach:
create_long_term_memory
- creates memories directly in long-term storage
See the Long-Term Memory documentation for details on eager creation.
API Endpoints¶
# Get working memory for a session
GET /v1/working-memory/{session_id}?namespace=demo&model_name=gpt-4o
# Set working memory (replaces existing, with optional TTL)
PUT /v1/working-memory/{session_id}?ttl_seconds=3600
# Delete working memory
DELETE /v1/working-memory/{session_id}?namespace=demo
Automatic Promotion¶
When structured memories in working memory are stored, they are automatically promoted to long-term storage in the background:
- Memories with
persisted_at=null
are identified - Server assigns unique IDs and timestamps
- Memories are indexed in long-term storage with vector embeddings
- Working memory is updated with
persisted_at
timestamps
Memory Lifecycle¶
1. Creation in Working Memory¶
# Client creates structured memory
memory = MemoryRecord(
text="User likes Italian food",
id="client_generated_id",
memory_type="semantic"
)
# Add to working memory
working_memory = WorkingMemory(
session_id="current_session",
memories=[memory]
)
2. Automatic Promotion¶
# Server promotes to long-term storage (background)
# - Assigns persisted_at timestamp
# - Generates vector embeddings
# - Indexes for search
# - Updates working memory with timestamps
Best Practices¶
Working Memory Usage¶
- Keep conversation state and session-specific data
- Use for session-specific configuration and context
- Store structured memories that should become long-term
- Set TTL only if conversation history doesn't need to persist
- Let automatic promotion handle long-term memory persistence
Memory Design¶
- Use
data
field for session-specific facts that stay only in the session - Use
memories
field for important facts that should be promoted to long-term storage - Design memory text for LLM consumption
- Include relevant topics and entities for better search
TTL and Persistence¶
Working memory is durable by default to preserve conversation history. However, you can configure TTL (time-to-live) expiration if your application doesn't need persistent conversation history:
# Durable working memory (default behavior)
working_memory = WorkingMemory(
session_id="chat_123",
messages=[...],
# No TTL - memory persists until explicitly deleted
)
# Working memory with TTL expiration
working_memory = WorkingMemory(
session_id="chat_123",
messages=[...],
ttl_seconds=3600 # Expires after 1 hour
)
When to use TTL: - Temporary chat sessions that don't need history - Privacy-sensitive applications requiring automatic cleanup - Resource-constrained environments
When to keep durable (default): - Applications that need conversation history - Multi-turn conversations that reference past context - Customer support or assistant applications
Transparent Reconstruction from Long-Term Memory¶
When index_all_messages_in_long_term_memory
is enabled, working memory can be transparently reconstructed from long-term storage. This allows you to use TTL expiration while still maintaining conversation continuity.
How it works: 1. Set index_all_messages_in_long_term_memory=true
in configuration 2. Messages are automatically indexed in long-term memory as they flow through working memory 3. When working memory expires (TTL), the messages remain in long-term storage 4. If you request a session that doesn't exist in working memory, the system automatically searches long-term memory for messages from that session and reconstructs the working memory
Example workflow:
# 1. Store working memory with TTL (expires after 1 hour)
working_memory = WorkingMemory(
session_id="chat_123",
messages=[
MemoryMessage(role="user", content="Hello"),
MemoryMessage(role="assistant", content="Hi there!"),
],
ttl_seconds=3600 # 1 hour expiration
)
# 2. Messages are automatically indexed in long-term memory
# 3. After 1 hour, working memory expires and is deleted
# 4. Later, when you request the session:
# GET /v1/working-memory/chat_123
# System automatically reconstructs from long-term memory
# Returns working memory with original messages
This feature is perfect for applications that want to: - Reduce Redis memory usage with TTL expiration - Maintain conversation continuity across sessions - Automatically handle session restoration without manual intervention
Configuration¶
Working memory behavior can be configured through environment variables:
# Working memory settings
WINDOW_SIZE=50 # Message window before summarization
LONG_TERM_MEMORY=true # Enable long-term memory features
# Long-term memory settings
ENABLE_DISCRETE_MEMORY_EXTRACTION=true # Extract memories from messages
GENERATION_MODEL=gpt-4o-mini # Model for summarization/extraction
For complete configuration options, see the Configuration Guide.
Related Documentation¶
- Long-term Memory - Persistent, cross-session memory storage
- Memory Integration Patterns - How to integrate memory with your applications
- Memory Extraction Strategies - Different approaches to memory extraction and storage