Memory Patterns

The most common question developers have is: "How do I actually get memories into and out of my LLM?" Redis Agent Memory Server provides three distinct patterns for integrating memory with your AI applications, each optimized for different use cases and levels of control.

Overview of Using Memory

These integration patterns are not mutually exclusive: each excels in different scenarios, and most production systems combine several of them based on the application's needs.

Pattern        | Control              | Best For                         | Memory Flow
🤖 LLM-Driven  | LLM decides          | Conversational agents, chatbots  | LLM ← tools → Memory
📝 Code-Driven | Your code decides    | Applications, workflows          | Code ← SDK → Memory
🔄 Background  | Automatic extraction | Learning systems                 | Conversation → Auto Extract → Memory

Pro tip: Start with Code-Driven for predictable behavior, then add Background extraction for continuous learning, and finally consider LLM tools for conversational control when needed.

Pattern 1: LLM-Driven Memory (Tool-Based)

When to use: When you want the LLM to decide what to remember and when to retrieve memories through natural conversation.

How it works: The LLM has access to memory tools and chooses when to store or search memories based on conversation context.

Basic Setup

import json

import openai
from agent_memory_client import MemoryAPIClient

# Initialize clients
memory_client = MemoryAPIClient(base_url="http://localhost:8000")
openai_client = openai.AsyncOpenAI()

# Get memory tools for the LLM
memory_tools = MemoryAPIClient.get_all_memory_tool_schemas()

# Give LLM access to memory tools
response = await openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with persistent memory. Use the provided tools to remember important information and retrieve relevant context."},
        {"role": "user", "content": "Hi! I'm Alice and I love Italian food, especially pasta carbonara."}
    ],
    tools=memory_tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        result = await memory_client.resolve_function_call(
            function_name=tool_call.function.name,
            args=json.loads(tool_call.function.arguments),
            session_id="chat_alice",
            user_id="alice"
        )
        print(f"LLM stored memory: {result}")

Complete Conversation Loop

import json

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class LLMMemoryAgent:
    def __init__(self, memory_url: str, session_id: str, user_id: str, model_name: str = "gpt-4o"):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()
        self.session_id = session_id
        self.user_id = user_id
        self.model_name = model_name

    async def chat(self, user_message: str) -> str:
        # Get or create working memory session for conversation history
        created, working_memory = await self.memory_client.get_or_create_working_memory(
            session_id=self.session_id,
            model_name=self.model_name,
            user_id=self.user_id
        )

        # Get conversation context that includes relevant long-term memories
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=self.session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": self.user_id}},
                "limit": 5
            }
        )

        # Get memory tools for the LLM
        tools = MemoryAPIClient.get_all_memory_tool_schemas()

        # Generate response with memory tools and context
        response = await self.openai_client.chat.completions.create(
            model=self.model_name,
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ],
            tools=tools
        )

        # Handle any tool calls
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                await self.memory_client.resolve_function_call(
                    function_name=tool_call.function.name,
                    args=json.loads(tool_call.function.arguments),
                    session_id=self.session_id,
                    user_id=self.user_id
                )

        assistant_message = response.choices[0].message.content

        # Store the conversation turn in working memory
        await self.memory_client.set_working_memory(
            session_id=self.session_id,
            working_memory=WorkingMemory(
                session_id=self.session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=self.user_id
            )
        )

        return assistant_message

# Usage
agent = LLMMemoryAgent(
    memory_url="http://localhost:8000",
    session_id="alice_chat",
    user_id="alice",
    model_name="gpt-4o"
)

# First conversation
response1 = await agent.chat("I'm planning a trip to Italy next month")
# LLM might store: "User is planning a trip to Italy next month"

# Later conversation
response2 = await agent.chat("What restaurants should I try?")
# LLM retrieves Italy trip context and suggests Italian restaurants

Advantages

  • Natural conversation flow: Memory operations happen organically
  • User control: Users can explicitly ask to remember or forget things
  • Contextual decisions: LLM understands when memory is relevant
  • Flexible: Works with any conversational pattern

Disadvantages

  • Token overhead: Tool schemas consume input tokens
  • Inconsistent behavior: LLM might not always use memory optimally
  • Cost implications: More API calls for tool usage
  • Latency: Additional round trips for tool execution

Best Practices

# 1. Provide clear system instructions
system_prompt = """
You are an AI assistant with persistent memory capabilities.

When to remember:
- User preferences (food, communication style, etc.)
- Important personal information
- Project details and context
- Recurring topics or interests

When to search memory:
- User asks about previous conversations
- Context would help provide better responses
- User references something from the past

Always be transparent about what you're remembering or have remembered.
"""

# 2. Handle tool call errors gracefully
try:
    result = await memory_client.resolve_function_call(
        function_name=tool_call.function.name,
        args=json.loads(tool_call.function.arguments),
        session_id=session_id,
        user_id=user_id
    )
except Exception as e:
    logger.warning(f"Memory operation failed: {e}")
    # Continue conversation without failing

# 3. Limit tool schemas to essential ones
essential_tools = [
    memory_client.get_long_term_memory_tool_schema(),
    memory_client.search_long_term_memory_tool_schema(),
    memory_client.create_long_term_memories_tool_schema()
]

Pattern 2: Code-Driven Memory (Programmatic)

When to use: When your application logic should control memory operations, or when you need predictable memory behavior.

How it works: Your code explicitly manages when to store memories and when to retrieve context, then provides enriched context to the LLM.

Basic Memory Operations

from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryRecord

# Initialize client
client = MemoryAPIClient(base_url="http://localhost:8000")

# Store memories programmatically
user_preferences = [
    MemoryRecord(
        text="User Alice prefers email communication over phone calls",
        memory_type="semantic",
        topics=["communication", "preferences"],
        entities=["email", "phone calls"],
        user_id="alice"
    ),
    MemoryRecord(
        text="User Alice works in marketing at TechCorp",
        memory_type="semantic",
        topics=["work", "job", "company"],
        entities=["marketing", "TechCorp"],
        user_id="alice"
    )
]

await client.create_long_term_memories(user_preferences)

# Retrieve relevant context
search_results = await client.search_long_term_memory(
    text="user work and communication preferences",
    filters={"user_id": {"eq": "alice"}},
    limit=5
)

print(f"Found {len(search_results.memories)} relevant memories")
for memory in search_results.memories:
    print(f"- {memory.text}")

Memory-Enriched Conversations

import openai

class CodeDrivenAgent:
    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def get_contextual_response(
        self,
        user_message: str,
        user_id: str,
        session_id: str
    ) -> str:
        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Search for relevant context using session ID
        context_search = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 5,
                "recency_boost": True
            }
        )

        # 3. Generate response with enriched context
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context_search.messages  # Pre-loaded with relevant memories
        )

        # 4. Optionally store the interaction
        await self.store_interaction(user_message, response.choices[0].message.content, user_id, session_id)

        return response.choices[0].message.content

    async def store_interaction(self, user_msg: str, assistant_msg: str, user_id: str, session_id: str):
        """Store important information from the interaction"""
        # Extract key information (you could use LLM or rules for this)
        if "prefer" in user_msg.lower() or "like" in user_msg.lower():
            # Store user preference
            await self.memory_client.create_long_term_memories([
                MemoryRecord(
                    text=f"User expressed: {user_msg}",
                    memory_type="semantic",
                    topics=["preferences"],
                    user_id=user_id,
                    session_id=session_id
                )
            ])

# Usage
agent = CodeDrivenAgent(memory_url="http://localhost:8000")

response = await agent.get_contextual_response(
    user_message="What's a good project management tool?",
    user_id="alice",
    session_id="work_chat"
)
# Response will include context about Alice working in marketing at TechCorp
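
The keyword check in store_interaction is deliberately simple. A hedged alternative is to let the LLM decide what is worth keeping; the sketch below uses a hypothetical extract_memories_with_llm helper and assumes the model returns a JSON array of short factual statements:

import json

async def extract_memories_with_llm(user_msg: str, assistant_msg: str, user_id: str, session_id: str):
    """Ask the LLM for durable facts, then store them as long-term memories."""
    completion = await agent.openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract durable facts or preferences from this exchange. Respond with a JSON array of short strings; return [] if nothing is worth remembering."},
            {"role": "user", "content": f"User: {user_msg}\nAssistant: {assistant_msg}"}
        ]
    )
    try:
        facts = json.loads(completion.choices[0].message.content or "[]")
    except json.JSONDecodeError:
        facts = []  # the model didn't return valid JSON; skip storage

    if facts:
        await agent.memory_client.create_long_term_memories([
            MemoryRecord(
                text=fact,
                memory_type="semantic",
                topics=["extracted"],
                user_id=user_id,
                session_id=session_id
            )
            for fact in facts
        ])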

Batch Operations

import asyncio

# Efficient batch memory storage
batch_memories = []

# Process user data (get_user_profile is an application-specific helper)
user_profile = get_user_profile("alice")
for preference in user_profile.preferences:
    batch_memories.append(MemoryRecord(
        text=f"User prefers {preference.value} for {preference.category}",
        memory_type="semantic",
        topics=[preference.category, "preferences"],
        entities=[preference.value],
        user_id="alice"
    ))

# Store all at once
await client.create_long_term_memories(batch_memories)

# Batch search with different queries
search_queries = [
    "user food preferences",
    "user work schedule",
    "user communication style"
]

search_tasks = [
    client.search_long_term_memory(
        text=query,
        filters={"user_id": {"eq": "alice"}},
        limit=3
    )
    for query in search_queries
]

results = await asyncio.gather(*search_tasks)
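
The gathered results come back in the same order as the queries. A small sketch for merging them into one de-duplicated list (assuming each result exposes a .memories list, as in the search example above):

# Merge batched search results, dropping duplicate memory texts
seen_texts = set()
combined_memories = []
for result in results:
    for memory in result.memories:
        if memory.text not in seen_texts:
            seen_texts.add(memory.text)
            combined_memories.append(memory)

print(f"{len(combined_memories)} unique memories across {len(search_queries)} queries")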

Advantages

  • Predictable behavior: You control exactly when memory operations happen
  • Efficient: No token overhead for tools, fewer API calls
  • Reliable: No dependency on LLM decision-making
  • Optimizable: You can optimize memory storage and retrieval patterns

Disadvantages

  • More coding required: You need to implement memory logic
  • Less natural: Memory operations don't happen organically in conversation
  • Maintenance overhead: Need to maintain memory extraction/retrieval logic

Best Practices

# 1. Use memory_prompt for enriched context
async def get_enriched_context(user_query: str, user_id: str, session_id: str):
    """Get context that includes both working memory and relevant long-term memories"""
    # First, get the working memory session (creates if doesn't exist)
    created, working_memory = await client.get_or_create_working_memory(session_id)

    # Then use memory_prompt with session ID
    return await client.memory_prompt(
        query=user_query,
        session_id=session_id,
        long_term_search={
            "text": user_query,
            "filters": {
                "user_id": {"eq": user_id},
                "namespace": {"eq": "personal"}  # Filter by domain
            },
            "limit": 5,
            "recency_boost": True  # Prefer recent relevant memories
        }
    )

# 2. Structure memories for searchability
good_memory = MemoryRecord(
    text="User Alice prefers Italian restaurants, especially ones with outdoor seating and vegetarian options",
    memory_type="semantic",
    topics=["food", "restaurants", "preferences", "dietary"],
    entities=["Italian", "outdoor seating", "vegetarian"],
    user_id="alice",
    namespace="dining"
)

# 3. Handle memory errors gracefully
from agent_memory_client.models import MemoryRecordResults

async def safe_memory_search(query: str, **kwargs):
    try:
        return await client.search_long_term_memory(text=query, **kwargs)
    except Exception as e:
        logger.warning(f"Memory search failed: {e}")
        return MemoryRecordResults(memories=[], total=0)  # Empty results

Pattern 3: Background Extraction (Automatic)

When to use: When you want the system to automatically learn from conversations without manual intervention.

How it works: Store conversations in working memory, and the system automatically extracts important information to long-term memory in the background.

Basic Automatic Extraction

from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import WorkingMemory, MemoryMessage

client = MemoryAPIClient(base_url="http://localhost:8000")

async def store_conversation_with_auto_extraction(
    session_id: str,
    user_message: str,
    assistant_message: str,
    user_id: str
):
    """Store conversation - system will automatically extract memories"""

    # Create working memory with the conversation
    working_memory = WorkingMemory(
        session_id=session_id,
        messages=[
            MemoryMessage(role="user", content=user_message),
            MemoryMessage(role="assistant", content=assistant_message)
        ],
        user_id=user_id
    )

    # Store in working memory - background extraction will happen automatically
    await client.set_working_memory(session_id, working_memory)

    # The system will:
    # 1. Analyze the conversation for important information
    # 2. Extract structured memories (preferences, facts, events)
    # 3. Apply contextual grounding (resolve pronouns, references)
    # 4. Store extracted memories in long-term storage
    # 5. Deduplicate similar memories

# Example conversation that triggers extraction
await store_conversation_with_auto_extraction(
    session_id="alice_onboarding",
    user_message="I'm Alice, I work as a Product Manager at StartupCorp. I prefer morning meetings and I'm vegetarian.",
    assistant_message="Nice to meet you Alice! I'll remember your role at StartupCorp and your preferences for meetings and dietary needs.",
    user_id="alice"
)

# System automatically extracts:
# - "User Alice works as Product Manager at StartupCorp" (semantic)
# - "User prefers morning meetings" (semantic)
# - "User is vegetarian" (semantic)

Structured Memory Addition

async def add_structured_memories_for_extraction(
    session_id: str,
    structured_memories: list[dict],
    user_id: str
):
    """Add structured memories that will be promoted to long-term storage"""

    # Convert to MemoryRecord objects
    memory_records = [
        MemoryRecord(**memory_data, user_id=user_id)
        for memory_data in structured_memories
    ]

    # Add to working memory for automatic promotion
    working_memory = WorkingMemory(
        session_id=session_id,
        memories=memory_records,
        user_id=user_id
    )

    await client.set_working_memory(session_id, working_memory)

# Usage
await add_structured_memories_for_extraction(
    session_id="alice_profile_setup",
    structured_memories=[
        {
            "text": "User has 5 years experience in product management",
            "memory_type": "semantic",
            "topics": ["experience", "career", "product_management"],
            "entities": ["5 years", "product management"]
        },
        {
            "text": "User completed MBA at Stanford in 2019",
            "memory_type": "episodic",
            "event_date": "2019-06-15T00:00:00Z",
            "topics": ["education", "mba", "stanford"],
            "entities": ["MBA", "Stanford", "2019"]
        }
    ],
    user_id="alice"
)

Long-Running Learning System

import openai

class AutoLearningAgent:
    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def process_conversation(
        self,
        user_message: str,
        session_id: str,
        user_id: str
    ) -> str:
        """Process conversation with automatic learning"""

        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Get existing context for better responses
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 3
            }
        )

        # 3. Generate response with context
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ]
        )

        assistant_message = response.choices[0].message.content

        # 4. Store conversation for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=user_id
            )
        )

        return assistant_message

    async def get_learned_information(self, user_id: str, topic: str = None):
        """See what the system has learned about a user"""
        search_query = f"user {topic}" if topic else "user information preferences"

        results = await self.memory_client.search_long_term_memory(
            text=search_query,
            filters={"user_id": {"eq": user_id}},
            limit=10
        )

        return [memory.text for memory in results.memories]

# Usage - system learns over multiple conversations
agent = AutoLearningAgent(memory_url="http://localhost:8000")

# Conversation 1
await agent.process_conversation(
    user_message="I'm working on a React project with TypeScript",
    session_id="coding_help_1",
    user_id="dev_alice"
)

# Conversation 2
await agent.process_conversation(
    user_message="I prefer using functional components over class components",
    session_id="coding_help_2",
    user_id="dev_alice"
)

# Check what system learned
learned_info = await agent.get_learned_information(
    user_id="dev_alice",
    topic="coding preferences"
)
print("System learned:", learned_info)
# Might include: "User prefers functional components over class components"

Advantages

  • Zero overhead: No manual memory management required
  • Learns continuously: System improves understanding over time
  • Contextual grounding: Automatically resolves references and pronouns
  • Deduplication: Prevents duplicate memories
  • Scales naturally: Works with any conversation volume

Disadvantages

  • Less control: Can't control exactly what gets remembered
  • Delayed availability: Extraction happens in background, not immediately
  • Potential noise: Might extract irrelevant information
  • Requires conversation: Needs conversational context to work well

Best Practices

# 1. Provide rich conversation context
working_memory = WorkingMemory(
    session_id=session_id,
    messages=[
        MemoryMessage(role="system", content="User is setting up their profile"),
        MemoryMessage(role="user", content="I'm a senior developer at Google"),
        MemoryMessage(role="assistant", content="I'll note your role as senior developer at Google")
    ],
    context="User onboarding conversation",
    user_id=user_id,
    namespace="profile_setup"  # Organize by domain
)

# 2. Monitor extraction quality
async def check_extracted_memories(user_id: str, session_id: str):
    """Review what was extracted from a session"""
    memories = await client.search_long_term_memory(
        text="",  # Get all memories
        filters={
            "user_id": {"eq": user_id},
            "session_id": {"eq": session_id}
        },
        limit=20
    )

    for memory in memories.memories:
        print(f"Extracted: {memory.text}")
        print(f"Topics: {memory.topics}")
        print(f"Created: {memory.created_at}")

# 3. Combine with manual memory editing when needed
if extracted_memory_needs_correction:
    await client.edit_long_term_memory(
        memory_id=memory.id,
        updates={
            "text": "Corrected version of the memory",
            "topics": ["updated", "topics"]
        }
    )

Hybrid Patterns

Most production systems benefit from combining multiple patterns:

Pattern Combination: Code + Background

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class HybridMemoryAgent:
    """Combines code-driven retrieval with background extraction"""

    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def chat(self, user_message: str, user_id: str, session_id: str) -> str:
        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Code-driven: Get relevant context
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 5
            }
        )

        # 3. Generate response
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ]
        )

        assistant_message = response.choices[0].message.content

        # 4. Background: Store for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=user_id
            )
        )

        return assistant_message

Pattern Combination: LLM Tools + Background

import json

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class SmartChatAgent:
    """LLM can use tools, plus automatic background learning"""

    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def chat(self, user_message: str, user_id: str, session_id: str) -> str:
        # Get memory tools
        tools = MemoryAPIClient.get_all_memory_tool_schemas()

        # LLM-driven: Let LLM use memory tools
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You have memory tools. Use them when relevant."},
                {"role": "user", "content": user_message}
            ],
            tools=tools
        )

        # Handle tool calls
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                await self.memory_client.resolve_function_call(
                    function_name=tool_call.function.name,
                    args=json.loads(tool_call.function.arguments),
                    session_id=session_id,
                    user_id=user_id
                )

        # Background: Also store conversation for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=response.choices[0].message.content)
                ],
                user_id=user_id
            )
        )

        return response.choices[0].message.content

Decision Framework

Choose your integration pattern based on your requirements:

🤖 Use LLM-Driven When:

  • Building conversational agents or chatbots
  • Users should control what gets remembered
  • Natural conversation flow is important
  • You can handle token overhead and variable costs

📝 Use Code-Driven When:

  • Building applications with specific workflows
  • You need predictable memory behavior
  • Memory operations should be optimized for performance
  • You want full control over what gets stored and retrieved

🔄 Use Background Extraction When:

  • Building learning systems that improve over time
  • You want zero-overhead memory management
  • Conversations provide rich context for extraction
  • Long-term learning is more important than immediate control

🔗 Use Hybrid Patterns When:

  • You want benefits of multiple approaches
  • Different parts of your system have different needs
  • You're building sophisticated AI applications
  • You can handle the additional complexity

Getting Started

  1. Start Simple: Begin with Code-Driven pattern for predictable results
  2. Add Background: Enable automatic extraction for continuous learning
  3. Consider LLM Tools: Add when conversational control becomes important
  4. Optimize: Monitor performance and adjust patterns based on usage

Each pattern can be implemented incrementally, allowing you to start simple and add complexity as your application grows.