No backward compatibility is required (initial development). This is a clean-slate design doc covering requirements, evaluation criteria, library analysis, and the selected stack.

Context

Hiro League agents (Personal Assistant, Life Coach / Therapist, Research Agent, Home & Family Agent, Social & Media Agent) need memory that spans conversations, sessions, and time. Users share personal information, family details, emotional state, preferences, opinions, schedules, and history. The system must remember across sessions, reason about relationships, and track how facts change over time. The agents already use LangGraph AsyncSqliteSaver for short-term conversation state (checkpointed per thread). This document covers the long-term memory layer that sits alongside that.

Memory categories

These categories reflect the kinds of information Hiro League agents must retain:
| Category | Examples | Volatility | Sensitivity | Graph needed? |
| --- | --- | --- | --- | --- |
| Identity and profile | Name, family members, home setup | Very low | Medium | Yes (family tree) |
| Preferences | "Prefers dark roast coffee" | Low | Low | No |
| Opinions | "Thinks remote work is better" | Medium | Low–Medium | No |
| Facts and knowledge | "Daughter's school is Lincoln Elementary" | Low | Medium | Yes (relationships) |
| Emotional state | "Feeling anxious about job" | High | High | No |
| Situations | "Going through divorce" | Medium | High | Yes (people involved) |
| Episodes | "We discussed meditation on March 15" | Immutable (decays in relevance) | Medium | No |
| Relationships | "Wife Sarah, kids Tom (8) and Emma (5)" | Low | Medium | Core use case |
| Commitments | "Bot promised to check in about interview" | Medium (has deadline) | Low | No |
| Interaction patterns | "User prefers morning check-ins" | Low | Low | No |

Memory lifecycle

Write policy

  • Not every utterance becomes a memory. Importance filtering is required — “I had coffee” vs. “I got diagnosed with diabetes” carry different signal.
  • Explicit memories (“remember this”) and implicit memories (system-extracted from conversation) are both needed.
  • Confidence scoring: distinguish facts from passing comments.
  • Source tracking: which conversation, which agent, when.
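A minimal sketch of the write policy above, assuming a keyword-based importance heuristic (a real pipeline would use an LLM judge); the `Candidate` record, keyword list, and threshold are all illustrative, not part of any library:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    explicit: bool = False  # user said "remember this"

# Illustrative high-signal markers; a production system would score with an LLM.
HIGH_SIGNAL = ("diagnosed", "married", "divorced", "new job", "moved")

def importance(c: Candidate) -> float:
    """Crude importance score in [0, 1]. Explicit requests always win."""
    if c.explicit:
        return 1.0
    return 0.9 if any(k in c.text.lower() for k in HIGH_SIGNAL) else 0.2

def should_write(c: Candidate, threshold: float = 0.5) -> bool:
    """Write policy gate: only candidates above the threshold become memories."""
    return importance(c) >= threshold
```

With this gate, `should_write(Candidate("I had coffee"))` is rejected while `should_write(Candidate("I got diagnosed with diabetes"))` passes, matching the signal distinction above.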

Read policy

  • Different agents need different memory views. The Therapist needs emotional history; the Home Agent needs family schedules.
  • Retrieval must be context-aware: mentioning “my daughter” should surface daughter-related memories, not everything.
  • Recency bias for some queries, completeness for others.
  • Fast enough to not break conversational flow.
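The recency-vs-completeness trade-off can be expressed as a tunable blend; this is an illustrative scoring sketch (the half-life and blend factor are assumptions, not library defaults):

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a memory's weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def blended_score(similarity: float, age_days: float,
                  recency_bias: float = 0.3) -> float:
    """recency_bias=0 -> pure similarity (completeness); 1 -> pure recency."""
    return (1 - recency_bias) * similarity + recency_bias * recency_weight(age_days)
```

Queries that need completeness (e.g. "tell me everything about my family") set `recency_bias` near 0; "what did we talk about recently" pushes it toward 1.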

Update and mutation

  • Facts change: “I work at Google” becomes “I left Google and joined Meta.”
  • The old fact should be superseded with temporal markers, not deleted.
  • Opinions shift. Contradictions need a resolution strategy — prefer the most recent statement.
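The supersede-don't-delete rule can be sketched with a simple versioned fact record; the `Fact` shape and function names here are illustrative, not Mem0's data model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    subject: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = currently valid

def supersede(history: list[Fact], subject: str, new_value: str,
              now: datetime) -> None:
    """Close the current fact with a temporal marker (never delete), append new."""
    for f in history:
        if f.subject == subject and f.valid_to is None:
            f.valid_to = now  # old fact kept with its validity window
    history.append(Fact(subject, new_value, valid_from=now))

def current(history: list[Fact], subject: str) -> Optional[str]:
    """Most-recent-statement resolution: the only fact with an open window."""
    for f in history:
        if f.subject == subject and f.valid_to is None:
            return f.value
    return None
```

Replaying "I work at Google" then "I joined Meta" leaves both records in history, with `current()` resolving to Meta, which also enables the point-in-time recall described later.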

Expiration and forgetting

  • Some memories should decay (mood on a specific day).
  • Some should never expire (family members, medical conditions).
  • Users must be able to say “forget this” — right to be forgotten is critical for trust.
  • Stale situational context needs cleanup (“I’m preparing for my interview” — interview was months ago).
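A cleanup pass over these rules might look like the following sketch; the category names and retention periods are assumptions for illustration, not recommended values:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-category retention; None = never expires.
RETENTION: dict[str, Optional[timedelta]] = {
    "emotional_state": timedelta(days=30),   # mood on a specific day decays
    "situation": timedelta(days=180),        # stale situational context
    "identity": None,                        # family members never expire
    "medical": None,
}

def is_expired(category: str, written_at: datetime, now: datetime) -> bool:
    ttl = RETENTION.get(category)
    return ttl is not None and now - written_at > ttl

def sweep(memories: list[dict], now: datetime) -> list[dict]:
    """Drop expired memories and anything the user asked to forget."""
    return [m for m in memories
            if not m.get("forgotten")
            and not is_expired(m["category"], m["written_at"], now)]
```

"Forget this" is modeled as a hard flag checked before any retention logic, so user deletion always wins over TTL rules.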

Multi-agent memory architecture

Hiro League has multiple agents with different roles serving the same user. This drives specific requirements:

Shared memory layer

All agents should know core identity/profile facts and active situations. No agent should contradict another about basic facts.

Agent-private memory

Therapist conversations may be more sensitive. The user might share something with the Therapist that they do not want the Personal Assistant to reference. Memory scoping and access control between agents is needed.

Cross-agent memory signals

If the Home Agent detects the user hasn’t been sleeping well (IoT data), the Therapist should know. If the Personal Assistant knows about a stressful meeting tomorrow, the Life Coach can check in. This is selective inter-agent memory sharing.

Agent-specific observation memory

Each agent’s own interaction history with the user — what worked, what didn’t, what the user liked. Personalization per agent role, not just per user.
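The scoping rules above can be sketched as scope keys; the tuple layout and the `"shared"` pool name are assumptions for illustration (Mem0 would express this via `user_id`/`agent_id` filters):

```python
from typing import Optional

SHARED = "shared"  # hypothetical name for the cross-agent pool

def scope_key(user_id: str, agent_id: Optional[str] = None) -> tuple[str, str]:
    """agent_id=None addresses the shared pool every agent can read."""
    return (user_id, agent_id or SHARED)

def readable_scopes(user_id: str, agent_id: str) -> set[tuple[str, str]]:
    """An agent reads its own private scope plus the shared pool,
    never a peer agent's private scope (e.g. Therapist confidences)."""
    return {scope_key(user_id, agent_id), scope_key(user_id)}
```

Cross-agent signals (the sleep example above) then become explicit writes into the shared pool rather than direct reads of another agent's private memory.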

Temporal requirements

  • Point-in-time recall: “What did I tell you about X last month?”
  • Trend detection: mood over time, habit compliance, preference drift.
  • Temporal validity: “I’m on vacation until Friday” has an expiration.
  • Event sequencing: cause-and-effect in the user’s life narrative.
  • Cyclical awareness: birthdays, recurring appointments, seasonal patterns.

Privacy and trust

  • Encryption at rest — memory stores must be encrypted.
  • Tiered sensitivity: medical > emotional > preferences > general facts.
  • User visibility: user should be able to see what has been remembered (memory dashboard).
  • User control: edit, delete, correct memories.
  • No leakage: memories from one user must never bleed into another (multi-user household).
  • Audit trail: who wrote what memory, when, from which conversation.
  • Local-first: aligns with Hiro League’s “private by design” principle.

Scale and performance

  • Memory grows continuously over months and years of daily use.
  • Retrieval must stay fast even with thousands of memories.
  • Consolidation (turning verbose conversations into compact facts) should happen asynchronously, not blocking chat.
  • Memory should be searchable by the user, not just by agents.
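The "consolidation off the hot path" requirement can be sketched with an asyncio queue; `summarize` is a stand-in for the LLM call that compacts a conversation, and all names here are illustrative:

```python
import asyncio

async def summarize(conversation: str) -> str:
    """Stand-in for an LLM call that turns a verbose conversation
    into a compact fact."""
    await asyncio.sleep(0)  # placeholder for LLM latency
    return conversation[:40]

async def consolidator(queue: asyncio.Queue, store: list[str]) -> None:
    """Background worker: chat handlers enqueue and return immediately."""
    while True:
        conv = await queue.get()
        if conv is None:  # shutdown sentinel
            break
        store.append(await summarize(conv))
        queue.task_done()

async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    store: list[str] = []
    worker = asyncio.create_task(consolidator(queue, store))
    await queue.put("User: long conversation about sleep habits ...")
    await queue.put(None)
    await worker
    return store
```

The chat handler's only memory cost is `queue.put()`; summarization latency never blocks a response.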

Library evaluation criteria

These criteria are ordered by importance for Hiro League’s desktop server deployment.

Tier 1 — deal-breakers

| # | Criterion | Why it matters |
| --- | --- | --- |
| 1 | Server-side, Python-native | The server is a Python desktop app. The library must be pip install-able. |
| 2 | Works with LangChain/LangGraph | Already invested in this stack. Must integrate natively or wire in with minimal glue. |
| 3 | Embedded / lightweight storage | Users install on a desktop. No Docker, no external services. SQLite, local files, or embedded DBs only. |
| 4 | Active maintenance | Must have active releases in 2025–2026. Abandoned projects are excluded. |
| 5 | Windows support | Hiro League’s primary target is Windows desktop. |

Tier 2 — strong preferences

| # | Criterion | Why it matters |
| --- | --- | --- |
| 6 | Conversation analysis / extraction built-in | We do not want to build a custom memory manager. The library should extract facts from conversations. |
| 7 | Multi-layer memory | Short-term (conversation buffer) + long-term (facts, preferences, knowledge) in one library or tightly integrated pair. |
| 8 | Scoping / namespacing | Multiple agents, multiple users. Per-user, per-agent, per-shared-pool scoping required. |
| 9 | Temporal awareness | Facts change. Newer facts should supersede older ones without losing history. |

Tier 3 — future-proofing

| # | Criterion | Why it matters |
| --- | --- | --- |
| 10 | Graph memory path | Relationships (family, social) will eventually need graph memory. Support it natively or do not lock it out. |
| 11 | Swappable storage backend | Today embedded, tomorrow maybe Postgres. Must not hardcode one vector DB. |
| 12 | User-facing memory (inspect/edit/delete) | Users want to see what the bot remembers, correct it, delete it. Data must be inspectable. |
| 13 | Async support | Server is async. Synchronous blocking in the memory hot path hurts responsiveness. |

Library categories

There are two distinct layers that are often conflated.

Memory managers analyze conversations, extract memories, manage lifecycle, and handle retrieval. You talk to them; they figure out what to remember.

Storage backends store and retrieve vectors/data. They do not decide what to store — they store what you give them efficiently.

A memory manager uses a storage backend underneath. These are not either/or choices; they are different layers.

Memory managers evaluated

Mem0

Purpose-built memory layer. Feed it conversations, it extracts facts/preferences/entities, stores them, retrieves relevant ones later.
  • License: Apache-2.0
  • Install: pip install mem0ai
  • Architecture: Vector memory (semantic search) + graph memory (entity/relationship extraction and traversal). Dual retrieval: vector narrows candidates, graph expands connected context.
  • Vector backends: Qdrant (default), ChromaDB, FAISS, PGVector, Milvus, Pinecone, Weaviate, Elasticsearch, Redis, and others.
  • Graph backends: Neo4j, Memgraph, Kuzu (embedded, in-process), Apache AGE, FalkorDB.
  • LLM support: OpenAI, Ollama (fully offline), and others.
  • Windows: Yes — all embedded backends (Qdrant local, ChromaDB, FAISS, Kuzu) work on Windows.
Recent features (2026):
  • Temporal search filtering (“what happened last week”) — February 2026
  • Hybrid search (semantic + keyword) — January 2026
  • Graph memory status tracking
  • Improved extraction quality
Strengths:
  • Flexible backend choices — pick what fits the deployment.
  • Graph memory with Kuzu = embedded graph, zero infrastructure.
  • Active community, well-documented.
  • Scoping by user_id, agent_id built in.
  • You control the extraction pipeline.
Weaknesses:
  • You still have to wire the “when to extract” logic yourself (call m.add() with conversation text).
  • Not framework-native to LangGraph — integrated alongside.
  • LongMemEval benchmark: 49% (significantly lower than Hindsight).

Hindsight

Newer memory system that models Retain / Recall / Reflect as first-class operations. Treats memory as a reasoning substrate.
  • License: MIT
  • Install: pip install hindsight-embed (embedded daemon) or pip install hindsight-client (SDK)
  • Architecture: Four memory networks — World Facts, Experiences, Observations (synthesized), Mental Models (curated summaries). Multi-strategy retrieval: temporal reasoning, graph traversal, BM25, semantic search, cross-encoder reranking.
  • Storage: Embedded PostgreSQL (pg0), bundled inside the daemon.
  • LLM support: OpenAI, Groq, Google, Ollama (fully offline).
Strengths:
  • Best benchmark performance: 91.4% on LongMemEval (state of the art).
  • “Reflect” synthesizes across memories to generate answers, not just retrieve.
  • Mental Models (pre-computed reflections that auto-refresh) — useful for “what do I know about this user’s family.”
  • Graph + temporal + keyword + semantic search fused together.
  • Bank-based scoping (each agent gets a bank).
Weaknesses:
  • Windows support unconfirmed — embedded PostgreSQL (pg0) is the risk. Historically poor Windows support for embedded Postgres tooling.
  • Daemon architecture adds a separate process (auto-starts, auto-shuts after 5 min idle). Not an in-process library call — there is a localhost HTTP hop.
  • Heavier footprint (Postgres + ML models, 1–3 min cold start on first run).
  • Newer, smaller community than Mem0.
  • Storage is built-in Postgres only — no swappable vector backend.

LangMem (LangChain ecosystem)

Memory primitives (extract, recall, distill) that integrate with LangGraph storage.
  • License: MIT
  • Install: pip install langmem (already in the LangGraph ecosystem)
  • Architecture: Works with LangGraph’s long-term store. Extraction via LLM prompts.
  • Windows: Yes.
Strengths:
  • Already in the LangGraph stack — zero new infrastructure.
  • Namespacing is native (per-user, per-agent).
  • Short-term (checkpointer) and long-term (store) in one stack.
Weaknesses:
  • No graph memory.
  • Extraction quality depends on prompt configuration.
  • Newer, less battle-tested for “memory as a product.”
  • Vector-only retrieval, no knowledge graph, no multi-hop reasoning, no temporal queries.

Storage backends evaluated

These are vector databases that sit underneath a memory manager (or standalone for RAG).

Qdrant (local mode)

  • License: Apache-2.0
  • Install: pip install qdrant-client
  • Local mode: QdrantClient(path="./db") — persistent disk, no server. Also supports :memory: for ephemeral use.
  • Windows: Yes.
  • Upgrade path: Same API local vs. server — switch to clustered deployment later without code changes.
  • Mem0 integration: Default backend, best-tested code path.

LanceDB

  • License: Apache-2.0
  • Install: pip install lancedb
  • Local mode: Embedded in-process (like SQLite). lancedb.connect("./data")
  • Windows: Yes (since v0.4.0).
  • Features: Hybrid search (vector + BM25), ACID, versioning, columnar (Apache Arrow).
  • Mem0 integration: Not supported. A PR was submitted but never merged. LanceDB is not listed in Mem0’s supported vector stores. Usable for standalone RAG, not as a Mem0 backend.

Milvus Lite

  • License: Apache-2.0
  • Install: pip install pymilvus
  • Local mode: Embedded via local file path.
  • Windows: Not supported. Linux and macOS only.
  • Verdict: Eliminated for Hiro League.

ChromaDB

  • License: Apache-2.0
  • Install: pip install chromadb
  • Local mode: Local directory path, no server.
  • Windows: Yes.
  • Mem0 integration: Supported.

FAISS

  • License: MIT
  • Install: pip install faiss-cpu
  • Local mode: In-process, file-backed.
  • Windows: Yes.
  • Mem0 integration: Supported.

Kuzu (graph database)

  • License: MIT
  • Install: pip install kuzu
  • Local mode: Embedded in-process. No server, no Docker. Persist to a directory or run fully in-memory.
  • Windows: Yes.
  • Mem0 integration: Supported as graph backend since August 2025.

Scoring

Scoring each memory manager against the evaluation criteria (1–5 scale, 5 = best):
| Criterion | Weight | Mem0 | Hindsight | LangMem |
| --- | --- | --- | --- | --- |
| Python, pip install | Deal-breaker | 5 | 5 | 5 |
| LangChain/LangGraph integration | Deal-breaker | 4 (alongside) | 4 (alongside) | 5 (native) |
| Embedded / lightweight | Deal-breaker | 5 (Qdrant local + Kuzu) | 3 (daemon + embedded Postgres) | 5 (LangGraph store) |
| Active maintenance | Deal-breaker | 5 | 5 | 4 |
| Windows support | Deal-breaker | 5 | 2 (unconfirmed) | 5 |
| Conversation extraction built-in | Strong | 5 | 5 | 4 |
| Multi-layer memory | Strong | 4 (long-term; keep LangGraph for short) | 5 (all layers) | 3 (primitives only) |
| Scoping / namespacing | Strong | 5 (user_id, agent_id) | 4 (bank-based) | 5 (namespace-native) |
| Temporal awareness | Strong | 4 (added Feb 2026) | 5 (core design) | 2 |
| Graph memory path | Future | 5 (Kuzu embedded) | 4 (built-in) | 1 (none) |
| Swappable backend | Future | 5 (many options) | 1 (Postgres only) | 3 (store-agnostic) |
| User-inspectable data | Future | 4 (API access) | 4 (API access) | 3 (store queryable) |
| Async support | Future | 4 | 5 | 4 |
| Benchmark (LongMemEval) | Reference | 49% | 91.4% | N/A |

Hindsight elimination rationale

Hindsight has the best benchmark performance and the most ambitious architecture (Reflect, Mental Models, fused retrieval). However, it fails on a deal-breaker criterion: Windows support is unconfirmed, and the embedded PostgreSQL daemon architecture carries real risk on Windows desktops. The daemon model (separate process, localhost HTTP, 1–3 min cold start) also conflicts with the lightweight embedded requirement. Hindsight is worth monitoring. If Windows support is confirmed and the daemon overhead becomes acceptable, it could supplement or replace the memory layer in a future phase.

LangMem consideration

LangMem scores well on integration (native to LangGraph) but lacks graph memory and temporal awareness. It is a viable Phase 1 option but does not grow into the full requirement set. Choosing it would mean replacing it later when graph and temporal needs arise.

Selected stack

Memory manager: Mem0

Mem0 meets all deal-breaker criteria and scores highest across the full weighted evaluation. It provides conversation extraction, vector + graph memory, temporal search, scoping, and flexible backend choices — all installable via pip with no external services.

Vector backend: Qdrant (local mode)

Qdrant is Mem0’s default and most-tested vector backend. Local mode (path="./db") requires no server, works on Windows, and uses the same API as the server deployment — providing a clean upgrade path if needed later.

Graph backend: Kuzu (embedded)

Kuzu is an embedded in-process graph database. pip install kuzu, point at a directory, done. No Neo4j, no Docker. Supported by Mem0 since August 2025. Enables the relationship memory layer (family tree, social graph, entity connections) without infrastructure overhead.
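A Mem0 configuration for this stack might look like the following sketch. Treat the provider names and config keys as assumptions to verify against Mem0’s documentation — the Kuzu graph_store keys in particular are guessed here, and the paths are placeholders:

```python
# Hypothetical Mem0 config fragment for the selected stack.
# Verify provider names and config keys against Mem0's docs before use;
# the Kuzu graph_store keys especially are assumptions.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "path": "./memory/qdrant",          # local mode, no server
            "collection_name": "hiro_memories",
        },
    },
    "graph_store": {
        "provider": "kuzu",
        "config": {"db": "./memory/kuzu"},      # embedded, in-process
    },
}

# from mem0 import Memory
# m = Memory.from_config(config)
```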

Stack summary

| Layer | Tool | Install | Storage location |
| --- | --- | --- | --- |
| Short-term (conversation) | LangGraph AsyncSqliteSaver | Already in place | workspace.db |
| Long-term (facts, preferences, knowledge) | Mem0 + Qdrant local | pip install mem0ai qdrant-client | Workspace directory |
| Graph (relationships, entities) | Mem0 graph + Kuzu | pip install kuzu | Workspace directory |
All components are pip install-able, run embedded on Windows, and store data locally alongside the existing workspace.db.

Implementation phases

Phase 1 — foundation

  • Keep LangGraph AsyncSqliteSaver checkpointer for short-term conversation memory (already in place).
  • Add Mem0 with Qdrant local for long-term memory extraction and retrieval.
  • Wire m.add() calls after agent conversations to extract and store memories.
  • Inject relevant retrieved memories into agent prompts before each response.
  • Scope memories per user and per agent using Mem0’s built-in user_id and agent_id.
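The Phase 1 wiring pattern can be sketched as follows; `FakeMemory` is an in-memory stand-in for Mem0 with a similar add/search shape (substring matching instead of semantic search), so everything here is illustrative rather than Mem0's actual API surface:

```python
class FakeMemory:
    """In-memory stand-in for Mem0: same add/search wiring, no extraction."""

    def __init__(self) -> None:
        self._items: list[dict] = []

    def add(self, text: str, *, user_id: str, agent_id: str) -> None:
        # Real Mem0 would extract facts from the conversation here.
        self._items.append({"memory": text, "user_id": user_id,
                            "agent_id": agent_id})

    def search(self, query: str, *, user_id: str) -> list[dict]:
        # Real Mem0 does semantic search; this is naive substring matching.
        return [m for m in self._items
                if m["user_id"] == user_id
                and query.lower() in m["memory"].lower()]

def build_prompt(memory: FakeMemory, user_id: str, user_msg: str) -> str:
    """Inject retrieved memories into the agent prompt before each response."""
    hits = memory.search(user_msg, user_id=user_id)
    context = "\n".join(f"- {m['memory']}" for m in hits)
    return f"Relevant memories:\n{context}\n\nUser: {user_msg}"

m = FakeMemory()
m.add("Daughter Emma attends Lincoln Elementary", user_id="u1", agent_id="home")
prompt = build_prompt(m, "u1", "daughter")
```

The same two call sites (add after a turn, search before the next) are where Mem0's real `add()`/`search()` would slot in, scoped by user_id and agent_id.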

Phase 2 — richer memory

  • Add structured memory types (preferences, facts, opinions, situations) with metadata and categories.
  • Add memory consolidation: summarize old conversations asynchronously.
  • Add user-facing memory view in admin UI (inspect, edit, delete).
  • Tune extraction quality: importance filtering, confidence scoring, deduplication.

Phase 3 — graph memory

  • Enable Mem0 graph memory with Kuzu as the embedded graph backend.
  • Extract entities and relationships from conversations (family members, places, people, connections).
  • Use graph traversal for relationship queries (“who is the user’s daughter’s teacher?”).
  • Feed graph context alongside vector results into agent prompts.

Phase 4 — advanced

  • Cross-agent memory sharing policies (Therapist private vs. shared pool).
  • Sentiment and emotion tracking over time.
  • Memory-driven proactive behavior (“you mentioned your interview is today”).
  • Re-evaluate Hindsight if Windows support is confirmed.