Memory Types¶

The Redis Agent Memory Server provides two distinct types of memory storage, each optimized for different use cases and access patterns: Working Memory and Long-Term Memory.

Overview¶

Feature	Working Memory	Long-Term Memory
Scope	Session-scoped	Cross-session, persistent
Lifespan	TTL-based (1 hour default)	Permanent until manually deleted
Storage	Redis key-value with JSON	Redis with vector indexing
Search	Simple text matching	Semantic vector search
Capacity	Limited by window size	Unlimited (with compaction)
Use Case	Active conversation state	Knowledge base, user preferences
Indexing	None	Vector embeddings + metadata
Deduplication	None	Hash-based and semantic

Working Memory¶

Working memory is session-scoped, ephemeral storage designed for active conversation state and temporary data. It's the "scratch pad" where an AI agent keeps track of the current conversation context.

Characteristics¶

Session Scoped: Each session has its own isolated working memory
TTL-Based: Automatically expires (default: 1 hour)
Window Management: Automatically summarizes older messages when the message count exceeds the configured window size
Mixed Content: Stores both conversation messages and structured memory records
No Indexing: Simple JSON storage in Redis
Promotion: Structured memories can be promoted to long-term storage

Data Structure¶

Working memory contains:

Messages: Conversation history (role/content pairs)
Memories: Structured memory records awaiting promotion
Context: Summary of earlier conversation, generated when the message history is truncated
Data: Arbitrary JSON key-value storage
Metadata: User ID, timestamps, TTL settings
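
The relationship between Messages and Context can be illustrated with a minimal, stdlib-only sketch. This is not the server's implementation: `WorkingMemorySketch`, `apply_window`, and `summarize` are hypothetical names, and the placeholder `summarize` simply joins messages where the real server would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemorySketch:
    """Hypothetical stand-in for a working memory record (not the real model)."""
    session_id: str
    messages: list[dict] = field(default_factory=list)
    context: str = ""


def summarize(messages: list[dict]) -> str:
    # Placeholder only: the real server would ask an LLM for a summary.
    return " | ".join(f"{m['role']}: {m['content']}" for m in messages)


def apply_window(wm: WorkingMemorySketch, window_size: int) -> WorkingMemorySketch:
    """Fold messages that exceed the window into the context summary."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(wm.messages) <= window_size:
        return wm  # nothing to truncate
    overflow = wm.messages[:-window_size]  # oldest messages
    kept = wm.messages[-window_size:]      # most recent messages
    summary = summarize(overflow)
    context = f"{wm.context} | {summary}" if wm.context else summary
    return WorkingMemorySketch(wm.session_id, kept, context)
```

In this sketch the newest messages stay verbatim while older ones survive only through the context field, which mirrors the description of Context above.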

When to Use Working Memory¶

Active Conversation State

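import ulid
# WorkingMemory and MemoryMessage are the memory server's models;
# import them from the client or server package you are using.

# Store current conversation messages
working_memory = WorkingMemory(
    session_id="chat_123",
    messages=[
        # Convert the ULID to a string so it can be stored as a message ID
        MemoryMessage(role="user", content="What's the weather like?", id=str(ulid.ULID())),
        MemoryMessage(role="assistant", content="I'll check that for you...", id=str(ulid.ULID()))
    ]
)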

Temporary Structured Data

# Store temporary facts during conversation (using data field)
working_memory = WorkingMemory(
    session_id="chat_123",
    data={
        "temp_trip_info": {
            "destination": "Paris",
            "travel_month": "next month",
            "planning_stage": "initial"
        },
        "conversation_context": "travel planning"
    }
)

Session-Specific Settings

# Store ephemeral configuration
working_memory = WorkingMemory(
    session_id="chat_123",
    data={
        "user_preferences": {"temperature_unit": "celsius"},
        "conversation_mode": "casual",
        "current_task": "trip_planning"
    }
)

Promoting Memories to Long-Term Storage

# Memories in working memory are automatically promoted to long-term storage
working_memory = WorkingMemory(
    session_id="chat_123",
    memories=[
        MemoryRecord(
            text="User is planning a trip to Paris next month",
            id="trip_planning_paris",
            memory_type="episodic",
            topics=["travel", "planning"],
            entities=["Paris"]
        )
    ]
)
# This memory will become permanent in long-term storage

🔑 Key Distinction:

Use data field for temporary facts that stay only in the session
Use memories field for permanent facts that should be promoted to long-term storage
Anything in the memories field will automatically become persistent and searchable across all future sessions

API Endpoints¶

# Get working memory for a session
GET /v1/working-memory/{session_id}?namespace=demo&model_name=gpt-4o

# Set working memory (replaces existing)
PUT /v1/working-memory/{session_id}

# Delete working memory
DELETE /v1/working-memory/{session_id}?namespace=demo
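
As a rough illustration of how a client might assemble these URLs, here is a small stdlib-only helper. The base URL `http://localhost:8000` and the function name `working_memory_url` are assumptions made for the example; only the path and query parameters come from the endpoints listed above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def working_memory_url(session_id: str, **params: str) -> str:
    """Build a /v1/working-memory/{session_id} URL with optional query params."""
    path = f"/v1/working-memory/{quote(session_id, safe='')}"
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


print(working_memory_url("chat_123", namespace="demo", model_name="gpt-4o"))
```

Quoting the session ID keeps IDs that contain slashes or spaces from changing the request path.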

Automatic Promotion¶

When working memory is saved, any structured memories it contains are automatically promoted to long-term storage in the background:

Memories with persisted_at=null are identified
Server assigns unique IDs and timestamps
Memories are indexed in long-term storage with vector embeddings
Working memory is updated with persisted_at timestamps
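
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's code: memories are plain dicts, `promote_pending` and `long_term_index` are hypothetical names, a random UUID stands in for whatever ID scheme the server uses, and the embedding step is reduced to a comment.

```python
import uuid
from datetime import datetime, timezone


def promote_pending(memories: list[dict], long_term_index: dict[str, dict]) -> list[dict]:
    """Copy unpersisted memories into `long_term_index` and stamp persisted_at."""
    now = datetime.now(timezone.utc).isoformat()
    updated = []
    for memory in memories:
        if memory.get("persisted_at") is not None:
            updated.append(memory)  # already promoted; leave untouched
            continue
        promoted = {
            **memory,
            "id": memory.get("id") or uuid.uuid4().hex,  # assign an ID if missing
            "created_at": memory.get("created_at") or now,
            "persisted_at": now,
        }
        # The real server would also generate a vector embedding here.
        long_term_index[promoted["id"]] = promoted
        updated.append(promoted)
    return updated
```

The returned list plays the role of the updated working memory: every entry now carries a persisted_at timestamp, so a second pass promotes nothing.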

Three Ways to Create Long-Term Memories¶

Long-term memories are typically created by LLMs (either yours or the memory server's) based on conversations. There are three pathways:

1. 🤖 Automatic Extraction from Conversations¶

The server automatically extracts memories from conversation messages using an LLM in the background:

# Server analyzes messages and creates memories automatically
working_memory = WorkingMemory(
    session_id="chat_123",
    messages=[
        {"role": "user", "content": "I love Italian food, especially carbonara"},
        {"role": "assistant", "content": "Great! I'll remember your preference for Italian cuisine."}
    ]
    # Server will extract: "User enjoys Italian food, particularly carbonara pasta"
)

2. ⚡ LLM-Identified Memories via Working Memory (Performance Optimization)¶

Your LLM can pre-identify memories and add them to working memory for batch storage:

# LLM identifies important facts and adds to memories field
working_memory = WorkingMemory(
    session_id="chat_123",
    memories=[
        MemoryRecord(
            text="User prefers morning meetings and dislikes calls after 4 PM",
            memory_type="semantic",
            topics=["preferences", "scheduling"],
            entities=["morning meetings", "4 PM"]
        )
    ]
    # Automatically promoted to long-term storage when saving working memory
)

3. 🎯 Direct Long-Term Memory Creation¶

Create memories directly via API or LLM tool calls:

# Direct API call or LLM using create_long_term_memory tool
await client.create_long_term_memories([
    {
        "text": "User works as a software engineer at TechCorp",
        "memory_type": "semantic",
        "topics": ["career", "work"],
        "entities": ["software engineer", "TechCorp"]
    }
])

💡 LLM-Driven Design: The system is designed for LLMs to make memory decisions. Your LLM can use memory tools to search existing memories, decide what's important to remember, and choose the most efficient storage method.

Long-Term Memory¶

Long-term memory is persistent, cross-session storage designed for knowledge that should be retained and searchable across all interactions. It's the "knowledge base" where important facts, preferences, and experiences are stored.

Characteristics¶

Cross-Session: Accessible from any session
Persistent: Survives server restarts and session expiration
Vector Indexed: Semantic search with OpenAI embeddings
Deduplication: Automatic hash-based and semantic deduplication
Rich Metadata: Topics, entities, timestamps, memory types
Compaction: Automatic cleanup and merging of duplicates

Memory Types¶

Long-term memory supports three types of memories:

Semantic: Facts, preferences, general knowledge

{
  "text": "User prefers dark mode interfaces",
  "memory_type": "semantic",
  "topics": ["preferences", "ui"],
  "entities": ["dark mode"]
}

Episodic: Events with temporal context

{
  "text": "User visited Paris in March 2024",
  "memory_type": "episodic",
  "event_date": "2024-03-15T10:00:00Z",
  "topics": ["travel"],
  "entities": ["Paris"]
}

Message: Conversation records (auto-generated)

{
  "text": "user: What's the weather like?",
  "memory_type": "message",
  "session_id": "chat_123"
}
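
A client may want to sanity-check records before sending them. The sketch below shows one way to do that with the standard library; `validate_memory` and its rules (text required, event_date in ISO 8601) are assumptions for illustration and may be stricter or looser than what the server enforces.

```python
from datetime import datetime

MEMORY_TYPES = {"semantic", "episodic", "message"}


def validate_memory(memory: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    if not memory.get("text"):
        problems.append("text is required")
    if memory.get("memory_type") not in MEMORY_TYPES:
        problems.append(f"unknown memory_type: {memory.get('memory_type')!r}")
    event_date = memory.get("event_date")
    if event_date is not None:
        try:
            # fromisoformat() rejects a trailing "Z" before Python 3.11
            datetime.fromisoformat(event_date.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"event_date is not ISO 8601: {event_date!r}")
    return problems
```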

When to Use Long-Term Memory¶

User Preferences and Profile

# Store lasting user preferences
memories = [
    MemoryRecord(
        text="User prefers metric units for temperature",
        id="pref_metric_temp",
        memory_type="semantic",
        topics=["preferences", "units"],
        user_id="user_123"
    )
]

Important Facts and Knowledge

from datetime import datetime

# Store domain knowledge
memories = [
    MemoryRecord(
        text="Customer's subscription expires on 2024-06-15",
        id="sub_expiry_customer_456",
        memory_type="episodic",
        event_date=datetime(2024, 6, 15),
        entities=["customer_456", "subscription"],
        user_id="user_123"
    )
]

Cross-Session Context

# Store context that spans conversations
memories = [
    MemoryRecord(
        text="User is working on a Python machine learning project",
        id="context_ml_project",
        memory_type="semantic",
        topics=["programming", "machine-learning", "python"],
        namespace="work_context"
    )
]

API Endpoints¶

# Create long-term memories
POST /v1/long-term-memory/

# Search long-term memories
POST /v1/long-term-memory/search

Search Capabilities¶

Long-term memory supports semantic search, metadata filtering, and queries that combine the two:

Semantic Vector Search¶

{
  "text": "python programming help",
  "limit": 10,
  "distance_threshold": 0.8
}
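
To make the roles of limit and distance_threshold concrete, here is a toy search over hand-written vectors. It assumes the threshold is a cosine distance where smaller means closer; the real server delegates this to Redis vector indexing with model-generated embeddings, and `cosine_distance` and `search` are hypothetical helper names.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(query_vec, records, limit=10, distance_threshold=0.8):
    """Return (distance, text) pairs within the threshold, nearest first."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in records]
    hits = [pair for pair in scored if pair[0] <= distance_threshold]
    return sorted(hits)[:limit]
```

Under this assumption, lowering distance_threshold returns fewer but closer matches, while limit only caps how many of those matches come back.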

Advanced Filtering¶

{
  "text": "user preferences",
  "filters": {
    "user_id": {"eq": "user_123"},
    "memory_type": {"eq": "semantic"},
    "topics": {"any": ["preferences", "settings"]},
    "created_at": {"gte": "2024-01-01T00:00:00Z"}
  }
}
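
The filter operators in this example (eq, any, gte) can be read as simple predicates. The following sketch evaluates them against an in-memory record; it is an illustration only, `matches` is a hypothetical name, and timestamps are compared as ISO 8601 strings in the same UTC format, which the server may handle differently.

```python
def matches(record: dict, filters: dict) -> bool:
    """Check one record against eq / any / gte filter clauses."""
    for field_name, clause in filters.items():
        value = record.get(field_name)
        for op, expected in clause.items():
            if op == "eq":
                ok = value == expected
            elif op == "any":
                # True if the record shares at least one value with the filter
                ok = bool(set(value or []) & set(expected))
            elif op == "gte":
                ok = value is not None and value >= expected
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True
```

All clauses must hold for a record to match, so each added filter narrows the result set.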

Hybrid Search¶

{
  "text": "travel plans",
  "filters": {
    "namespace": {"eq": "personal"},
    "event_date": {"gte": "2024-03-01T00:00:00Z"}
  },
  "include_working_memory": true,
  "include_long_term_memory": true
}

Memory Lifecycle¶

1. Creation in Working Memory¶

# Client creates structured memory
memory = MemoryRecord(
    text="User likes Italian food",
    id="client_generated_id",
    memory_type="semantic"
)

# Add to working memory
working_memory = WorkingMemory(
    session_id="current_session",
    memories=[memory]
)

2. Automatic Promotion¶

# Server promotes to long-term storage (background)
# - Assigns persisted_at timestamp
# - Generates vector embeddings
# - Indexes for search
# - Updates working memory with timestamps

3. Deduplication and Compaction¶

# Server automatically:
# - Identifies hash-based duplicates
# - Finds semantically similar memories
# - Merges related memories using LLM
# - Removes obsolete duplicates
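
The hash-based part of this step can be sketched with the standard library. The fields included in the hash and the lowercase/strip normalization are assumptions chosen for the example, and `memory_hash` and `drop_exact_duplicates` are hypothetical names; semantic similarity and LLM-based merging are not covered here.

```python
import hashlib


def memory_hash(memory: dict) -> str:
    """Hash the fields that define a duplicate in this sketch."""
    key = "|".join([
        memory.get("text", "").strip().lower(),
        memory.get("user_id") or "",
        memory.get("session_id") or "",
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def drop_exact_duplicates(memories: list[dict]) -> list[dict]:
    """Keep the first memory for each hash, preserving order."""
    seen: set[str] = set()
    unique = []
    for memory in memories:
        digest = memory_hash(memory)
        if digest not in seen:
            seen.add(digest)
            unique.append(memory)
    return unique
```

Hashing catches only exact (or trivially normalized) repeats, which is why a semantic pass is still needed for paraphrased duplicates.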

4. Retrieval and Search¶

# Client searches across all memory
results = await search_memories(
    text="food preferences",
    filters={"user_id": {"eq": "user_123"}}
)

Memory Prompt Integration¶

The memory system integrates with AI prompts through the /v1/memory/prompt endpoint:

# Get memory-enriched prompt
response = await memory_prompt({
    "query": "Help me plan dinner",
    "session": {
        "session_id": "current_chat",
        "model_name": "gpt-4o",
        "context_window_max": 4000
    },
    "long_term_search": {
        "text": "food preferences dietary restrictions",
        "filters": {"user_id": {"eq": "user_123"}},
        "limit": 5
    }
})

# Returns ready-to-use messages with:
# - Conversation context from working memory
# - Relevant memories from long-term storage
# - User's query as final message
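
Conceptually, the returned message list combines the three pieces listed above. The sketch below shows one plausible assembly; the ordering, the use of system messages, and the wording are assumptions for illustration, and `build_prompt_messages` is a hypothetical helper rather than the server's logic.

```python
def build_prompt_messages(
    context: str,
    recent: list[dict],
    memories: list[str],
    query: str,
) -> list[dict]:
    """Combine long-term memories, session context, and the query into messages."""
    messages = []
    if memories:
        bullet_list = "\n".join(f"- {m}" for m in memories)
        messages.append(
            {"role": "system", "content": f"Relevant long-term memories:\n{bullet_list}"}
        )
    if context:
        messages.append({"role": "system", "content": f"Conversation summary: {context}"})
    messages.extend(recent)  # recent working-memory messages, oldest first
    messages.append({"role": "user", "content": query})  # query always comes last
    return messages
```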

Best Practices¶

Working Memory¶

Keep conversation state and temporary data
Use for session-specific configuration
Store structured memories that might become long-term
Let automatic promotion handle persistence

Long-Term Memory¶

Store user preferences and lasting facts
Include rich metadata (topics, entities, timestamps)
Use meaningful IDs for easier retrieval
Leverage semantic search for discovery

Memory Design¶

Use semantic memory for timeless facts
Use episodic memory for time-bound events
Include relevant topics and entities for better search
Design memory text for LLM consumption

Search Strategy¶

Start with semantic search for discovery
Add filters for precision
Use unified search for comprehensive results
Consider both working and long-term contexts

Memory Extraction¶

By default, the system automatically extracts structured memories from conversations as they flow from working memory to long-term storage. This extraction process can be customized using different memory strategies.

Memory Strategies

The system supports multiple extraction strategies (discrete facts, summaries, preferences, custom prompts) that determine how conversations are processed into memories. See Memory Strategies for complete documentation and examples.

Configuration¶

Memory behavior can be configured through environment variables:

# Working memory settings
WINDOW_SIZE=50                    # Message window before summarization
LONG_TERM_MEMORY=true            # Enable long-term memory features

# Long-term memory settings
ENABLE_DISCRETE_MEMORY_EXTRACTION=true  # Extract memories from messages
GENERATION_MODEL=gpt-4o-mini     # Model for summarization/extraction
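
If your own application needs to read the same variables, a small loader along these lines may help. The fallback values simply repeat the example values above and are not confirmed server defaults; `env_bool` and `load_memory_settings` are hypothetical helper names.

```python
import os


def env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable such as 'true' or '0'."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_memory_settings() -> dict:
    """Read the memory-related settings shown above from the environment."""
    return {
        "window_size": int(os.environ.get("WINDOW_SIZE", "50")),
        "long_term_memory": env_bool("LONG_TERM_MEMORY", True),
        "enable_discrete_memory_extraction": env_bool(
            "ENABLE_DISCRETE_MEMORY_EXTRACTION", True
        ),
        "generation_model": os.environ.get("GENERATION_MODEL", "gpt-4o-mini"),
    }
```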

For complete configuration options, see the Configuration Guide.