Memory Patterns

The most common question developers have is: "How do I actually get memories into and out of my LLM?" Redis Agent Memory Server provides three distinct patterns for integrating memory with your AI applications, each optimized for different use cases and levels of control.

Overview of Using Memory

These integration patterns are not mutually exclusive: each excels in different scenarios, and most production systems combine several of them based on the application's needs.

Pattern        | Control              | Best For                         | Memory Flow
🤖 LLM-Driven  | LLM decides          | Conversational agents, chatbots  | LLM ← tools → Memory
📝 Code-Driven | Your code decides    | Applications, workflows          | Code ← SDK → Memory
🔄 Background  | Automatic extraction | Learning systems                 | Conversation → Auto Extract → Memory

Pro tip: Start with Code-Driven for predictable behavior, then add Background extraction for continuous learning, and finally consider LLM tools for conversational control when needed.

Pattern 1: LLM-Driven Memory (Tool-Based)

When to use: When you want the LLM to decide what to remember and when to retrieve memories through natural conversation.

How it works: The LLM has access to memory tools and chooses when to store or search memories based on conversation context.

Basic Setup

import json

import openai
from agent_memory_client import MemoryAPIClient

# Initialize clients
memory_client = MemoryAPIClient(base_url="http://localhost:8000")
openai_client = openai.AsyncOpenAI()

# Get memory tools for the LLM
memory_tools = MemoryAPIClient.get_all_memory_tool_schemas()

# Give LLM access to memory tools
response = await openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with persistent memory. Use the provided tools to remember important information and retrieve relevant context."},
        {"role": "user", "content": "Hi! I'm Alice and I love Italian food, especially pasta carbonara."}
    ],
    tools=memory_tools
)

# Handle tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        result = await memory_client.resolve_function_call(
            function_name=tool_call.function.name,
            args=json.loads(tool_call.function.arguments),
            session_id="chat_alice",
            user_id="alice"
        )
        print(f"LLM stored memory: {result}")

Complete Conversation Loop

import json

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class LLMMemoryAgent:
    def __init__(self, memory_url: str, session_id: str, user_id: str, model_name: str = "gpt-4o"):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()
        self.session_id = session_id
        self.user_id = user_id
        self.model_name = model_name

    async def chat(self, user_message: str) -> str:
        # Get or create working memory session for conversation history
        created, working_memory = await self.memory_client.get_or_create_working_memory(
            session_id=self.session_id,
            model_name=self.model_name,
            user_id=self.user_id
        )

        # Get conversation context that includes relevant long-term memories
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=self.session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": self.user_id}},
                "limit": 5
            }
        )

        # Get memory tools for the LLM
        tools = MemoryAPIClient.get_all_memory_tool_schemas()

        # Generate response with memory tools and context
        response = await self.openai_client.chat.completions.create(
            model=self.model_name,
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ],
            tools=tools
        )

        # Handle any tool calls
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                await self.memory_client.resolve_function_call(
                    function_name=tool_call.function.name,
                    args=json.loads(tool_call.function.arguments),
                    session_id=self.session_id,
                    user_id=self.user_id
                )

        assistant_message = response.choices[0].message.content

        # Store the conversation turn in working memory
        await self.memory_client.set_working_memory(
            session_id=self.session_id,
            working_memory=WorkingMemory(
                session_id=self.session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=self.user_id
            )
        )

        return assistant_message

# Usage
agent = LLMMemoryAgent(
    memory_url="http://localhost:8000",
    session_id="alice_chat",
    user_id="alice",
    model_name="gpt-4o"
)

# First conversation
response1 = await agent.chat("I'm planning a trip to Italy next month")
# LLM might store: "User is planning a trip to Italy next month"

# Later conversation
response2 = await agent.chat("What restaurants should I try?")
# LLM retrieves Italy trip context and suggests Italian restaurants

Advantages

  • Natural conversation flow: Memory operations happen organically
  • User control: Users can explicitly ask to remember or forget things
  • Contextual decisions: LLM understands when memory is relevant
  • Flexible: Works with any conversational pattern

Disadvantages

  • Token overhead: Tool schemas consume input tokens
  • Inconsistent behavior: LLM might not always use memory optimally
  • Cost implications: More API calls for tool usage
  • Latency: Additional round trips for tool execution

Best Practices

# 1. Provide clear system instructions
system_prompt = """
You are an AI assistant with persistent memory capabilities.

When to remember:
- User preferences (food, communication style, etc.)
- Important personal information
- Project details and context
- Recurring topics or interests

When to search memory:
- User asks about previous conversations
- Context would help provide better responses
- User references something from the past

Always be transparent about what you're remembering or have remembered.
"""

# 2. Handle tool call errors gracefully
try:
    result = await memory_client.resolve_function_call(
        function_name=tool_call.function.name,
        args=json.loads(tool_call.function.arguments),
        session_id=session_id,
        user_id=user_id
    )
except Exception as e:
    logger.warning(f"Memory operation failed: {e}")
    # Continue conversation without failing

# 3. Limit tool schemas to essential ones
essential_tools = [
    memory_client.get_long_term_memory_tool_schema(),
    memory_client.search_long_term_memory_tool_schema(),
    memory_client.create_long_term_memories_tool_schema()
]

Pattern 2: Code-Driven Memory (Programmatic)

When to use: When your application logic should control memory operations, or when you need predictable memory behavior.

How it works: Your code explicitly manages when to store memories and when to retrieve context, then provides enriched context to the LLM.

Basic Memory Operations

from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryRecord

# Initialize client
client = MemoryAPIClient(base_url="http://localhost:8000")

# Store memories programmatically
user_preferences = [
    MemoryRecord(
        text="User Alice prefers email communication over phone calls",
        memory_type="semantic",
        topics=["communication", "preferences"],
        entities=["email", "phone calls"],
        user_id="alice"
    ),
    MemoryRecord(
        text="User Alice works in marketing at TechCorp",
        memory_type="semantic",
        topics=["work", "job", "company"],
        entities=["marketing", "TechCorp"],
        user_id="alice"
    )
]

await client.create_long_term_memories(user_preferences)

# Retrieve relevant context
search_results = await client.search_long_term_memory(
    text="user work and communication preferences",
    filters={"user_id": {"eq": "alice"}},
    limit=5
)

print(f"Found {len(search_results.memories)} relevant memories")
for memory in search_results.memories:
    print(f"- {memory.text}")

Memory-Enriched Conversations

import openai

class CodeDrivenAgent:
    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def get_contextual_response(
        self,
        user_message: str,
        user_id: str,
        session_id: str
    ) -> str:
        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Search for relevant context using session ID
        context_search = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 5,
                "recency_boost": True
            }
        )

        # 3. Generate response with enriched context
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context_search.messages  # Pre-loaded with relevant memories
        )

        # 4. Optionally store the interaction
        await self.store_interaction(user_message, response.choices[0].message.content, user_id, session_id)

        return response.choices[0].message.content

    async def store_interaction(self, user_msg: str, assistant_msg: str, user_id: str, session_id: str):
        """Store important information from the interaction"""
        # Extract key information (you could use LLM or rules for this)
        if "prefer" in user_msg.lower() or "like" in user_msg.lower():
            # Store user preference
            await self.memory_client.create_long_term_memories([
                MemoryRecord(
                    text=f"User expressed: {user_msg}",
                    memory_type="semantic",
                    topics=["preferences"],
                    user_id=user_id,
                    session_id=session_id
                )
            ])

# Usage
agent = CodeDrivenAgent(memory_url="http://localhost:8000")

response = await agent.get_contextual_response(
    user_message="What's a good project management tool?",
    user_id="alice",
    session_id="work_chat"
)
# Response will include context about Alice working in marketing at TechCorp
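
The keyword check in store_interaction is deliberately simple. A hedged alternative is to let the LLM decide what is worth keeping; the sketch below uses a hypothetical extract_memories_with_llm helper and assumes the model returns a JSON array of short factual statements:

import json

async def extract_memories_with_llm(user_msg: str, assistant_msg: str, user_id: str, session_id: str):
    """Ask the LLM for durable facts, then store them as long-term memories."""
    completion = await agent.openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract durable facts or preferences from this exchange. Respond with a JSON array of short strings; return [] if nothing is worth remembering."},
            {"role": "user", "content": f"User: {user_msg}\nAssistant: {assistant_msg}"}
        ]
    )
    try:
        facts = json.loads(completion.choices[0].message.content or "[]")
    except json.JSONDecodeError:
        facts = []  # the model didn't return valid JSON; skip storage

    if facts:
        await agent.memory_client.create_long_term_memories([
            MemoryRecord(
                text=fact,
                memory_type="semantic",
                topics=["extracted"],
                user_id=user_id,
                session_id=session_id
            )
            for fact in facts
        ])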

Batch Operations

import asyncio

# Efficient batch memory storage
batch_memories = []

# Process user data (get_user_profile is an application-specific helper)
user_profile = get_user_profile("alice")
for preference in user_profile.preferences:
    batch_memories.append(MemoryRecord(
        text=f"User prefers {preference.value} for {preference.category}",
        memory_type="semantic",
        topics=[preference.category, "preferences"],
        entities=[preference.value],
        user_id="alice"
    ))

# Store all at once
await client.create_long_term_memories(batch_memories)

# Batch search with different queries
search_queries = [
    "user food preferences",
    "user work schedule",
    "user communication style"
]

search_tasks = [
    client.search_long_term_memory(
        text=query,
        filters={"user_id": {"eq": "alice"}},
        limit=3
    )
    for query in search_queries
]

results = await asyncio.gather(*search_tasks)
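
The gathered results come back in the same order as the queries. A small sketch for merging them into one de-duplicated list (assuming each result exposes a .memories list, as in the search example above):

# Merge batched search results, dropping duplicate memory texts
seen_texts = set()
combined_memories = []
for result in results:
    for memory in result.memories:
        if memory.text not in seen_texts:
            seen_texts.add(memory.text)
            combined_memories.append(memory)

print(f"{len(combined_memories)} unique memories across {len(search_queries)} queries")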

Advantages

  • Predictable behavior: You control exactly when memory operations happen
  • Efficient: No token overhead for tools, fewer API calls
  • Reliable: No dependency on LLM decision-making
  • Optimizable: You can optimize memory storage and retrieval patterns

Disadvantages

  • More coding required: You need to implement memory logic
  • Less natural: Memory operations don't happen organically in conversation
  • Maintenance overhead: Need to maintain memory extraction/retrieval logic

Best Practices

# 1. Use memory_prompt for enriched context
async def get_enriched_context(user_query: str, user_id: str, session_id: str):
    """Get context that includes both working memory and relevant long-term memories"""
    # First, get the working memory session (creates if doesn't exist)
    created, working_memory = await client.get_or_create_working_memory(session_id)

    # Then use memory_prompt with session ID
    return await client.memory_prompt(
        query=user_query,
        session_id=session_id,
        long_term_search={
            "text": user_query,
            "filters": {
                "user_id": {"eq": user_id},
                "namespace": {"eq": "personal"}  # Filter by domain
            },
            "limit": 5,
            "recency_boost": True  # Prefer recent relevant memories
        }
    )

# 2. Structure memories for searchability
good_memory = MemoryRecord(
    text="User Alice prefers Italian restaurants, especially ones with outdoor seating and vegetarian options",
    memory_type="semantic",
    topics=["food", "restaurants", "preferences", "dietary"],
    entities=["Italian", "outdoor seating", "vegetarian"],
    user_id="alice",
    namespace="dining"
)

# 3. Handle memory errors gracefully
from agent_memory_client.models import MemoryRecordResults

async def safe_memory_search(query: str, **kwargs):
    try:
        return await client.search_long_term_memory(text=query, **kwargs)
    except Exception as e:
        logger.warning(f"Memory search failed: {e}")
        return MemoryRecordResults(memories=[], total=0)  # Empty results

Pattern 3: Background Extraction (Automatic)

When to use: When you want the system to automatically learn from conversations without manual intervention.

How it works: Store conversations in working memory, and the system automatically extracts important information to long-term memory in the background.

Basic Automatic Extraction

from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import WorkingMemory, MemoryMessage

client = MemoryAPIClient(base_url="http://localhost:8000")

async def store_conversation_with_auto_extraction(
    session_id: str,
    user_message: str,
    assistant_message: str,
    user_id: str
):
    """Store conversation - system will automatically extract memories"""

    # Create working memory with the conversation
    working_memory = WorkingMemory(
        session_id=session_id,
        messages=[
            MemoryMessage(role="user", content=user_message),
            MemoryMessage(role="assistant", content=assistant_message)
        ],
        user_id=user_id
    )

    # Store in working memory - background extraction will happen automatically
    await client.set_working_memory(session_id, working_memory)

    # The system will:
    # 1. Analyze the conversation for important information
    # 2. Extract structured memories (preferences, facts, events)
    # 3. Apply contextual grounding (resolve pronouns, references)
    # 4. Store extracted memories in long-term storage
    # 5. Deduplicate similar memories

# Example conversation that triggers extraction
await store_conversation_with_auto_extraction(
    session_id="alice_onboarding",
    user_message="I'm Alice, I work as a Product Manager at StartupCorp. I prefer morning meetings and I'm vegetarian.",
    assistant_message="Nice to meet you Alice! I'll remember your role at StartupCorp and your preferences for meetings and dietary needs.",
    user_id="alice"
)

# System automatically extracts:
# - "User Alice works as Product Manager at StartupCorp" (semantic)
# - "User prefers morning meetings" (semantic)
# - "User is vegetarian" (semantic)

Structured Memory Addition

async def add_structured_memories_for_extraction(
    session_id: str,
    structured_memories: list[dict],
    user_id: str
):
    """Add structured memories that will be promoted to long-term storage"""

    # Convert to MemoryRecord objects
    memory_records = [
        MemoryRecord(**memory_data, user_id=user_id)
        for memory_data in structured_memories
    ]

    # Add to working memory for automatic promotion
    working_memory = WorkingMemory(
        session_id=session_id,
        memories=memory_records,
        user_id=user_id
    )

    await client.set_working_memory(session_id, working_memory)

# Usage
await add_structured_memories_for_extraction(
    session_id="alice_profile_setup",
    structured_memories=[
        {
            "text": "User has 5 years experience in product management",
            "memory_type": "semantic",
            "topics": ["experience", "career", "product_management"],
            "entities": ["5 years", "product management"]
        },
        {
            "text": "User completed MBA at Stanford in 2019",
            "memory_type": "episodic",
            "event_date": "2019-06-15T00:00:00Z",
            "topics": ["education", "mba", "stanford"],
            "entities": ["MBA", "Stanford", "2019"]
        }
    ],
    user_id="alice"
)

Long-Running Learning System

import openai

class AutoLearningAgent:
    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def process_conversation(
        self,
        user_message: str,
        session_id: str,
        user_id: str
    ) -> str:
        """Process conversation with automatic learning"""

        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Get existing context for better responses
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 3
            }
        )

        # 3. Generate response with context
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ]
        )

        assistant_message = response.choices[0].message.content

        # 4. Store conversation for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=user_id
            )
        )

        return assistant_message

    async def get_learned_information(self, user_id: str, topic: str = None):
        """See what the system has learned about a user"""
        search_query = f"user {topic}" if topic else "user information preferences"

        results = await self.memory_client.search_long_term_memory(
            text=search_query,
            filters={"user_id": {"eq": user_id}},
            limit=10
        )

        return [memory.text for memory in results.memories]

# Usage - system learns over multiple conversations
agent = AutoLearningAgent(memory_url="http://localhost:8000")

# Conversation 1
await agent.process_conversation(
    user_message="I'm working on a React project with TypeScript",
    session_id="coding_help_1",
    user_id="dev_alice"
)

# Conversation 2
await agent.process_conversation(
    user_message="I prefer using functional components over class components",
    session_id="coding_help_2",
    user_id="dev_alice"
)

# Check what system learned
learned_info = await agent.get_learned_information(
    user_id="dev_alice",
    topic="coding preferences"
)
print("System learned:", learned_info)
# Might include: "User prefers functional components over class components"

Advantages

  • Zero overhead: No manual memory management required
  • Learns continuously: System improves understanding over time
  • Contextual grounding: Automatically resolves references and pronouns
  • Deduplication: Prevents duplicate memories
  • Scales naturally: Works with any conversation volume

Disadvantages

  • Less control: Can't control exactly what gets remembered
  • Delayed availability: Extraction happens in background, not immediately
  • Potential noise: Might extract irrelevant information
  • Requires conversation: Needs conversational context to work well

Best Practices

# 1. Provide rich conversation context
working_memory = WorkingMemory(
    session_id=session_id,
    messages=[
        MemoryMessage(role="system", content="User is setting up their profile"),
        MemoryMessage(role="user", content="I'm a senior developer at Google"),
        MemoryMessage(role="assistant", content="I'll note your role as senior developer at Google")
    ],
    context="User onboarding conversation",
    user_id=user_id,
    namespace="profile_setup"  # Organize by domain
)

# 2. Monitor extraction quality
async def check_extracted_memories(user_id: str, session_id: str):
    """Review what was extracted from a session"""
    memories = await client.search_long_term_memory(
        text="",  # Get all memories
        filters={
            "user_id": {"eq": user_id},
            "session_id": {"eq": session_id}
        },
        limit=20
    )

    for memory in memories.memories:
        print(f"Extracted: {memory.text}")
        print(f"Topics: {memory.topics}")
        print(f"Created: {memory.created_at}")

# 3. Combine with manual memory editing when needed
if extracted_memory_needs_correction:
    await client.edit_long_term_memory(
        memory_id=memory.id,
        updates={
            "text": "Corrected version of the memory",
            "topics": ["updated", "topics"]
        }
    )

Hybrid Patterns

Most production systems benefit from combining multiple patterns:

Pattern Combination: Code + Background

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class HybridMemoryAgent:
    """Combines code-driven retrieval with background extraction"""

    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def chat(self, user_message: str, user_id: str, session_id: str) -> str:
        # 1. Get working memory session (creates if doesn't exist)
        created, working_memory = await self.memory_client.get_or_create_working_memory(session_id)

        # 2. Code-driven: Get relevant context
        context = await self.memory_client.memory_prompt(
            query=user_message,
            session_id=session_id,
            long_term_search={
                "text": user_message,
                "filters": {"user_id": {"eq": user_id}},
                "limit": 5
            }
        )

        # 3. Generate response
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=context.messages + [
                {"role": "user", "content": user_message}
            ]
        )

        assistant_message = response.choices[0].message.content

        # 4. Background: Store for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=assistant_message)
                ],
                user_id=user_id
            )
        )

        return assistant_message

Pattern Combination: LLM Tools + Background

import json

import openai
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import MemoryMessage, WorkingMemory

class SmartChatAgent:
    """LLM can use tools, plus automatic background learning"""

    def __init__(self, memory_url: str):
        self.memory_client = MemoryAPIClient(base_url=memory_url)
        self.openai_client = openai.AsyncOpenAI()

    async def chat(self, user_message: str, user_id: str, session_id: str) -> str:
        # Get memory tools
        tools = MemoryAPIClient.get_all_memory_tool_schemas()

        # LLM-driven: Let LLM use memory tools
        response = await self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You have memory tools. Use them when relevant."},
                {"role": "user", "content": user_message}
            ],
            tools=tools
        )

        # Handle tool calls
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                await self.memory_client.resolve_function_call(
                    function_name=tool_call.function.name,
                    args=json.loads(tool_call.function.arguments),
                    session_id=session_id,
                    user_id=user_id
                )

        # Background: Also store conversation for automatic extraction
        await self.memory_client.set_working_memory(
            session_id,
            WorkingMemory(
                session_id=session_id,
                messages=[
                    MemoryMessage(role="user", content=user_message),
                    MemoryMessage(role="assistant", content=response.choices[0].message.content)
                ],
                user_id=user_id
            )
        )

        return response.choices[0].message.content

Decision Framework

Choose your integration pattern based on your requirements:

🤖 Use LLM-Driven When:

  • Building conversational agents or chatbots
  • Users should control what gets remembered
  • Natural conversation flow is important
  • You can handle token overhead and variable costs

📝 Use Code-Driven When:

  • Building applications with specific workflows
  • You need predictable memory behavior
  • Memory operations should be optimized for performance
  • You want full control over what gets stored and retrieved

🔄 Use Background Extraction When:

  • Building learning systems that improve over time
  • You want zero-overhead memory management
  • Conversations provide rich context for extraction
  • Long-term learning is more important than immediate control

🔗 Use Hybrid Patterns When:

  • You want benefits of multiple approaches
  • Different parts of your system have different needs
  • You're building sophisticated AI applications
  • You can handle the additional complexity

Getting Started

  1. Start Simple: Begin with Code-Driven pattern for predictable results
  2. Add Background: Enable automatic extraction for continuous learning
  3. Consider LLM Tools: Add when conversational control becomes important
  4. Optimize: Monitor performance and adjust patterns based on usage

Each pattern can be implemented incrementally, allowing you to start simple and add complexity as your application grows.