No backward compatibility is required (initial development). This is a clean-slate design doc covering requirements, evaluation criteria, library analysis, and the selected stack.
Context
Hiro League agents (Personal Assistant, Life Coach / Therapist, Research Agent, Home & Family Agent, Social & Media Agent) need memory that spans conversations, sessions, and time. Users share personal information, family details, emotional state, preferences, opinions, schedules, and history. The system must remember across sessions, reason about relationships, and track how facts change over time.
The agents already use LangGraph AsyncSqliteSaver for short-term conversation state (checkpointed per thread). This document covers the long-term memory layer that sits alongside that.
Memory categories
These categories reflect the kinds of information Hiro League agents must retain:
| Category | Examples | Volatility | Sensitivity | Graph needed? |
|---|---|---|---|---|
| Identity and profile | Name, family members, home setup | Very low | Medium | Yes (family tree) |
| Preferences | "Prefers dark roast coffee" | Low | Low | No |
| Opinions | "Thinks remote work is better" | Medium | Low–Medium | No |
| Facts and knowledge | "Daughter's school is Lincoln Elementary" | Low | Medium | Yes (relationships) |
| Emotional state | "Feeling anxious about job" | High | High | No |
| Situations | "Going through divorce" | Medium | High | Yes (people involved) |
| Episodes | "We discussed meditation on March 15" | Immutable (decays in relevance) | Medium | No |
| Relationships | "Wife Sarah, kids Tom (8) and Emma (5)" | Low | Medium | Core use case |
| Commitments | "Bot promised to check in about interview" | Medium (has deadline) | Low | No |
| Interaction patterns | "User prefers morning check-ins" | Low | Low | No |
Memory lifecycle
Write policy
- Not every utterance becomes a memory. Importance filtering is required — “I had coffee” vs. “I got diagnosed with diabetes” carry different signal.
- Explicit memories (“remember this”) and implicit memories (system-extracted from conversation) are both needed.
- Confidence scoring: distinguish facts from passing comments.
- Source tracking: which conversation, which agent, when.
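The write-policy requirements above can be sketched as a small record type plus a gate. This is an illustrative shape, not any library's API; the field names (`importance`, `confidence`, `explicit`) and thresholds are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryCandidate:
    text: str
    source_conversation: str      # which conversation produced it
    source_agent: str             # which agent heard it
    explicit: bool = False        # user said "remember this"
    importance: float = 0.0       # 0..1, scored by an LLM or heuristic
    confidence: float = 0.0       # fact vs. passing comment
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def should_store(c: MemoryCandidate,
                 importance_floor: float = 0.5,
                 confidence_floor: float = 0.6) -> bool:
    """Explicit memories always pass; implicit ones must clear both floors."""
    if c.explicit:
        return True
    return c.importance >= importance_floor and c.confidence >= confidence_floor
```

Under this gate, "I had coffee" (low importance) is dropped while "I got diagnosed with diabetes" (high importance, high confidence) is stored, matching the example above.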
Read policy
- Different agents need different memory views. The Therapist needs emotional history; the Home Agent needs family schedules.
- Retrieval must be context-aware: mentioning “my daughter” should surface daughter-related memories, not everything.
- Recency bias for some queries, completeness for others.
- Fast enough to not break conversational flow.
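"Recency bias for some queries, completeness for others" can be expressed as one scoring knob. The blend below is an illustrative formula (exponential half-life decay multiplied into similarity), not any library's ranking function:

```python
from datetime import datetime, timedelta, timezone

def retrieval_score(similarity: float, created_at: datetime,
                    now: datetime, half_life_days: float = 30.0) -> float:
    """Blend semantic similarity with recency.

    A small half_life_days biases heavily toward recent memories; a very
    large one approximates a completeness-oriented query, where recency
    barely matters and raw similarity dominates.
    """
    age_days = (now - created_at).total_seconds() / 86400.0
    recency = 0.5 ** (age_days / half_life_days)
    return similarity * recency
```

A per-query (or per-agent) half-life lets the Therapist favor recent emotional state while the Home Agent retrieves stable family facts regardless of age.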
Update and mutation
- Facts change: “I work at Google” becomes “I left Google and joined Meta.”
- The old fact should be superseded with temporal markers, not deleted.
- Opinions shift. Contradictions need a resolution strategy — prefer the most recent statement.
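Supersession with temporal markers can be sketched as closing the current fact's validity interval instead of deleting it. The `Fact` shape and `supersede` helper are illustrative, not a library API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None = still current

    @property
    def current(self) -> bool:
        return self.valid_to is None

def supersede(history: list, new_text: str,
              now: Optional[datetime] = None) -> list:
    """Close the current fact with a temporal marker instead of deleting it."""
    now = now or datetime.now(timezone.utc)
    for fact in history:
        if fact.current:
            fact.valid_to = now     # superseded, but history is preserved
    history.append(Fact(text=new_text, valid_from=now))
    return history
```

Because the old interval survives, point-in-time questions ("where did I work in 2021?") remain answerable, and "prefer the most recent statement" falls out of simply reading the fact with `valid_to is None`.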
Expiration and forgetting
- Some memories should decay (mood on a specific day).
- Some should never expire (family members, medical conditions).
- Users must be able to say “forget this” — right to be forgotten is critical for trust.
- Stale situational context needs cleanup (“I’m preparing for my interview” — interview was months ago).
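The forgetting rules above combine three mechanisms: a user-issued "forget this" that always wins, pinned memories that never expire, and a TTL for situational context. A minimal liveness check, with illustrative field names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class MemoryItem:
    text: str
    created_at: datetime
    pinned: bool = False                    # never expires (family, medical)
    expires_at: Optional[datetime] = None   # hard TTL for situational context
    forgotten: bool = False                 # user said "forget this"

def is_live(m: MemoryItem, now: datetime) -> bool:
    if m.forgotten:
        return False          # right to be forgotten wins over everything
    if m.pinned:
        return True
    if m.expires_at is not None and now >= m.expires_at:
        return False
    return True
```

Stale situations like "preparing for my interview" get an `expires_at` at write time (or during async consolidation) so cleanup is a filter, not a scan-and-judge pass.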
Multi-agent memory architecture
Hiro League runs multiple agents with different roles that all serve the same user. This drives specific requirements:
Shared memory layer
All agents should know core identity/profile facts and active situations. No agent should contradict another about basic facts.
Agent-private memory
Therapist conversations may be more sensitive. The user might share something with the Therapist that they do not want the Personal Assistant to reference. Memory scoping and access control between agents is needed.
Cross-agent memory signals
If the Home Agent detects the user hasn’t been sleeping well (IoT data), the Therapist should know. If the Personal Assistant knows about a stressful meeting tomorrow, the Life Coach can check in. This is selective inter-agent memory sharing.
Agent-specific observation memory
Each agent’s own interaction history with the user — what worked, what didn’t, what the user liked. Personalization per agent role, not just per user.
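The four layers above reduce to a read-scope question: which memory pools may a given agent see? A minimal sketch, where the pool and agent names are hypothetical and the real mapping would live in configuration:

```python
# Illustrative scoping table. "shared" holds core identity/profile facts and
# active situations that every agent may read; each agent additionally reads
# its own private pool (e.g. the Therapist's sensitive conversations).
READ_SCOPES = {
    "personal_assistant": {"shared", "personal_assistant"},
    "therapist":          {"shared", "therapist"},
    "home_agent":         {"shared", "home_agent"},
}

def visible_memories(agent: str, memories: list) -> list:
    """Filter a flat memory list down to the pools this agent may read."""
    allowed = READ_SCOPES.get(agent, {"shared"})
    return [m for m in memories if m["pool"] in allowed]
```

Cross-agent signals (sleep data reaching the Therapist) then become explicit writes into the shared pool rather than one agent reading another's private store.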
Temporal requirements
- Point-in-time recall: “What did I tell you about X last month?”
- Trend detection: mood over time, habit compliance, preference drift.
- Temporal validity: “I’m on vacation until Friday” has an expiration.
- Event sequencing: cause-and-effect in the user’s life narrative.
- Cyclical awareness: birthdays, recurring appointments, seasonal patterns.
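Point-in-time recall, the first requirement above, is a time-window filter layered onto topical retrieval. In this sketch, naive substring matching stands in for semantic search; the function name and memory shape are illustrative:

```python
from datetime import datetime, timezone

def recall_window(memories: list, topic: str,
                  start: datetime, end: datetime) -> list:
    """Point-in-time recall: memories about `topic` created inside [start, end).

    A real implementation would combine this temporal filter with vector
    retrieval rather than substring matching on the raw text.
    """
    return [
        m for m in memories
        if start <= m["created_at"] < end and topic.lower() in m["text"].lower()
    ]
```

Trend detection and cyclical awareness build on the same primitive: run the windowed query per period and compare the slices.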
Privacy and trust
- Encryption at rest — memory stores must be encrypted.
- Tiered sensitivity: medical > emotional > preferences > general facts.
- User visibility: user should be able to see what has been remembered (memory dashboard).
- User control: edit, delete, correct memories.
- No leakage: memories from one user must never bleed into another (multi-user household).
- Audit trail: who wrote what memory, when, from which conversation.
- Local-first: aligns with Hiro League’s “private by design” principle.
Scale and performance
- Memory grows continuously over months and years of daily use.
- Retrieval must stay fast even with thousands of memories.
- Consolidation (turning verbose conversations into compact facts) should happen asynchronously, not blocking chat.
- Memory should be searchable by the user, not just by agents.
Library evaluation criteria
These criteria are ordered by importance for Hiro League’s desktop server deployment.
Tier 1 — deal-breakers
| # | Criterion | Why it matters |
|---|---|---|
| 1 | Server-side, Python-native | The server is a Python desktop app. The library must be pip-installable. |
| 2 | Works with LangChain/LangGraph | Already invested in this stack. Must integrate natively or wire in with minimal glue. |
| 3 | Embedded / lightweight storage | Users install on a desktop. No Docker, no external services. SQLite, local files, or embedded DBs only. |
| 4 | Active maintenance | Must have active releases in 2025–2026. Abandoned projects are excluded. |
| 5 | Windows support | Hiro League’s primary target is Windows desktop. |
Tier 2 — strong preferences
| # | Criterion | Why it matters |
|---|---|---|
| 6 | Conversation analysis / extraction built-in | We do not want to build a custom memory manager. The library should extract facts from conversations. |
| 7 | Multi-layer memory | Short-term (conversation buffer) + long-term (facts, preferences, knowledge) in one library or tightly integrated pair. |
| 8 | Scoping / namespacing | Multiple agents, multiple users. Per-user, per-agent, per-shared-pool scoping required. |
| 9 | Temporal awareness | Facts change. Newer facts should supersede older ones without losing history. |
Tier 3 — future-proofing
| # | Criterion | Why it matters |
|---|---|---|
| 10 | Graph memory path | Relationships (family, social) will eventually need graph memory. Support it natively or do not lock it out. |
| 11 | Swappable storage backend | Today embedded, tomorrow maybe Postgres. Must not hardcode one vector DB. |
| 12 | User-facing memory (inspect/edit/delete) | Users want to see what the bot remembers, correct it, delete it. Data must be inspectable. |
| 13 | Async support | Server is async. Synchronous blocking in the memory hot path hurts responsiveness. |
Library categories
There are two distinct layers that are often conflated:
Memory managers analyze conversations, extract memories, manage lifecycle, handle retrieval. You talk to them, they figure out what to remember.
Storage backends store and retrieve vectors/data. They do not decide what to store — they store what you give them efficiently.
A memory manager uses a storage backend underneath. These are not either/or choices; they are different layers.
Memory managers evaluated
Mem0
Purpose-built memory layer. Feed it conversations, it extracts facts/preferences/entities, stores them, retrieves relevant ones later.
- License: Apache-2.0
- Install: `pip install mem0ai`
- Architecture: Vector memory (semantic search) + graph memory (entity/relationship extraction and traversal). Dual retrieval: vector narrows candidates, graph expands connected context.
- Vector backends: Qdrant (default), ChromaDB, FAISS, PGVector, Milvus, Pinecone, Weaviate, Elasticsearch, Redis, and others.
- Graph backends: Neo4j, Memgraph, Kuzu (embedded, in-process), Apache AGE, FalkorDB.
- LLM support: OpenAI, Ollama (fully offline), and others.
- Windows: Yes — all embedded backends (Qdrant local, ChromaDB, FAISS, Kuzu) work on Windows.
Recent features (2026):
- Temporal search filtering (“what happened last week”) — February 2026
- Hybrid search (semantic + keyword) — January 2026
- Graph memory status tracking
- Improved extraction quality
Strengths:
- Flexible backend choices — pick what fits the deployment.
- Graph memory with Kuzu = embedded graph, zero infrastructure.
- Active community, well-documented.
- Scoping by `user_id` and `agent_id` built in.
- You control the extraction pipeline.
Weaknesses:
- You still wire the “when to extract” logic (call `m.add()` with conversation text).
- Not framework-native to LangGraph — integrated alongside.
- LongMemEval benchmark: 49% (significantly lower than Hindsight).
Hindsight
Newer memory system that models Retain / Recall / Reflect as first-class operations. Treats memory as a reasoning substrate.
- License: MIT
- Install: `pip install hindsight-embed` (embedded daemon) or `pip install hindsight-client` (SDK)
- Architecture: Four memory networks — World Facts, Experiences, Observations (synthesized), Mental Models (curated summaries). Multi-strategy retrieval: temporal reasoning, graph traversal, BM25, semantic search, cross-encoder reranking.
- Storage: Embedded PostgreSQL (pg0), bundled inside the daemon.
- LLM support: OpenAI, Groq, Google, Ollama (fully offline).
Strengths:
- Best benchmark performance: 91.4% on LongMemEval (state of the art).
- “Reflect” synthesizes across memories to generate answers, not just retrieve.
- Mental Models (pre-computed reflections that auto-refresh) — useful for “what do I know about this user’s family.”
- Graph + temporal + keyword + semantic search fused together.
- Bank-based scoping (each agent gets a bank).
Weaknesses:
- Windows support unconfirmed — embedded PostgreSQL (pg0) is the risk. Historically poor Windows support for embedded Postgres tooling.
- Daemon architecture adds a separate process (auto-starts, auto-shuts after 5 min idle). Not an in-process library call — there is a localhost HTTP hop.
- Heavier footprint (Postgres + ML models, 1–3 min cold start on first run).
- Newer, smaller community than Mem0.
- Storage is built-in Postgres only — no swappable vector backend.
LangMem (LangChain ecosystem)
Memory primitives (extract, recall, distill) that integrate with LangGraph storage.
- License: MIT
- Install: `pip install langmem` (already in the LangGraph ecosystem)
- Architecture: Works with LangGraph’s long-term store. Extraction via LLM prompts.
- Windows: Yes.
Strengths:
- Already in the LangGraph stack — zero new infrastructure.
- Namespacing is native (per-user, per-agent).
- Short-term (checkpointer) and long-term (store) in one stack.
Weaknesses:
- No graph memory.
- Extraction quality depends on prompt configuration.
- Newer, less battle-tested for “memory as a product.”
- Vector-only retrieval, no knowledge graph, no multi-hop reasoning, no temporal queries.
Storage backends evaluated
These are vector databases that sit underneath a memory manager (or standalone for RAG).
Qdrant (local mode)
- License: Apache-2.0
- Install: `pip install qdrant-client`
- Local mode: `QdrantClient(path="./db")` — persistent disk, no server. Also supports `:memory:` for ephemeral use.
- Windows: Yes.
- Upgrade path: Same API local vs. server — switch to clustered deployment later without code changes.
- Mem0 integration: Default backend, best-tested code path.
LanceDB
- License: Apache-2.0
- Install: `pip install lancedb`
- Local mode: Embedded in-process (like SQLite): `lancedb.connect("./data")`
- Windows: Yes (since v0.4.0).
- Features: Hybrid search (vector + BM25), ACID, versioning, columnar (Apache Arrow).
- Mem0 integration: Not supported. A PR was submitted but never merged. LanceDB is not listed in Mem0’s supported vector stores. Usable for standalone RAG, not as a Mem0 backend.
Milvus Lite
- License: Apache-2.0
- Install: `pip install pymilvus`
- Local mode: Embedded via local file path.
- Windows: Not supported. Linux and macOS only.
- Verdict: Eliminated for Hiro League.
ChromaDB
- License: Apache-2.0
- Install: `pip install chromadb`
- Local mode: Local directory path, no server.
- Windows: Yes.
- Mem0 integration: Supported.
FAISS
- License: MIT
- Install: `pip install faiss-cpu`
- Local mode: In-process, file-backed.
- Windows: Yes.
- Mem0 integration: Supported.
Kuzu (graph database)
- License: MIT
- Install: `pip install kuzu`
- Local mode: Embedded in-process. No server, no Docker. Persist to a directory or run fully in-memory.
- Windows: Yes.
- Mem0 integration: Supported as graph backend since August 2025.
Scoring
Scoring each memory manager against the evaluation criteria (1–5 scale, 5 = best):
| Criterion | Weight | Mem0 | Hindsight | LangMem |
|---|---|---|---|---|
| Python, pip install | Deal-breaker | 5 | 5 | 5 |
| LangChain/LangGraph integration | Deal-breaker | 4 (alongside) | 4 (alongside) | 5 (native) |
| Embedded / lightweight | Deal-breaker | 5 (Qdrant local + Kuzu) | 3 (daemon + embedded Postgres) | 5 (LangGraph store) |
| Active maintenance | Deal-breaker | 5 | 5 | 4 |
| Windows support | Deal-breaker | 5 | 2 (unconfirmed) | 5 |
| Conversation extraction built-in | Strong | 5 | 5 | 4 |
| Multi-layer memory | Strong | 4 (long-term; keep LangGraph for short) | 5 (all layers) | 3 (primitives only) |
| Scoping / namespacing | Strong | 5 (user_id, agent_id) | 4 (bank-based) | 5 (namespace-native) |
| Temporal awareness | Strong | 4 (added Feb 2026) | 5 (core design) | 2 |
| Graph memory path | Future | 5 (Kuzu embedded) | 4 (built-in) | 1 (none) |
| Swappable backend | Future | 5 (many options) | 1 (Postgres only) | 3 (store-agnostic) |
| User-inspectable data | Future | 4 (API access) | 4 (API access) | 3 (store queryable) |
| Async support | Future | 4 | 5 | 4 |
| Benchmark (LongMemEval) | Reference | 49% | 91.4% | N/A |
Hindsight elimination rationale
Hindsight has the best benchmark performance and the most ambitious architecture (Reflect, Mental Models, fused retrieval). However, it fails on a deal-breaker criterion: Windows support is unconfirmed, and the embedded PostgreSQL daemon architecture carries real risk on Windows desktops. The daemon model (separate process, localhost HTTP, 1–3 min cold start) also conflicts with the lightweight embedded requirement.
Hindsight is worth monitoring. If Windows support is confirmed and the daemon overhead becomes acceptable, it could supplement or replace the memory layer in a future phase.
LangMem consideration
LangMem scores well on integration (native to LangGraph) but lacks graph memory and temporal awareness. It is a viable Phase 1 option but does not grow into the full requirement set. Choosing it would mean replacing it later when graph and temporal needs arise.
Selected stack
Memory manager: Mem0
Mem0 meets all deal-breaker criteria and scores highest across the full weighted evaluation. It provides conversation extraction, vector + graph memory, temporal search, scoping, and flexible backend choices — all installable via pip with no external services.
Vector backend: Qdrant (local mode)
Qdrant is Mem0’s default and most-tested vector backend. Local mode (path="./db") requires no server, works on Windows, and uses the same API as the server deployment — providing a clean upgrade path if needed later.
Graph backend: Kuzu (embedded)
Kuzu is an embedded in-process graph database. `pip install kuzu`, point at a directory, done. No Neo4j, no Docker. Supported by Mem0 since August 2025. Enables the relationship memory layer (family tree, social graph, entity connections) without infrastructure overhead.
Stack summary
| Layer | Tool | Install | Storage location |
|---|---|---|---|
| Short-term (conversation) | LangGraph AsyncSqliteSaver | Already in place | workspace.db |
| Long-term (facts, preferences, knowledge) | Mem0 + Qdrant local | `pip install mem0ai qdrant-client` | Workspace directory |
| Graph (relationships, entities) | Mem0 graph + Kuzu | `pip install kuzu` | Workspace directory |
All components are pip-installable, run embedded on Windows, and store data locally alongside the existing `workspace.db`.
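The selected stack wires together through Mem0's configuration dict. This is a hedged sketch following the provider/config nesting shape from Mem0's docs; the paths are placeholders, and the exact option names for each provider should be checked against the docs for the installed Mem0 version:

```python
# Sketch of a Mem0 config for the selected stack. Provider names follow
# Mem0's documented backends; paths are placeholders, and per-provider
# option keys (e.g. the Kuzu path key) should be verified against the docs.
MEM0_CONFIG = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "./workspace/qdrant"},   # local mode, no server
    },
    "graph_store": {
        "provider": "kuzu",
        "config": {"db": "./workspace/kuzu"},       # embedded graph database
    },
}
```

A config like this would typically be passed to `Memory.from_config(...)`; both stores land inside the workspace directory next to `workspace.db`.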
Implementation phases
Phase 1 — foundation
- Keep the LangGraph `AsyncSqliteSaver` checkpointer for short-term conversation memory (already in place).
- Add Mem0 with Qdrant local for long-term memory extraction and retrieval.
- Wire `m.add()` calls after agent conversations to extract and store memories.
- Inject relevant retrieved memories into agent prompts before each response.
- Scope memories per user and per agent using Mem0’s built-in `user_id` and `agent_id`.
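The Phase 1 loop above can be sketched as one function per conversational turn. The `memory.search(...)` / `memory.add(...)` calls follow Mem0's documented signatures; everything else (`run_turn`, the prompt format, the assumed `{"results": [...]}` return shape of `search`) is illustrative:

```python
def run_turn(memory, agent, user_id: str, agent_id: str, user_message: str) -> str:
    """One conversational turn wired through a Mem0-style memory object.

    `agent` is any callable mapping a prompt string to a reply. This is a
    sketch of the Phase 1 retrieve -> inject -> respond -> extract loop,
    not production code.
    """
    # 1. Retrieve relevant memories, scoped per user and per agent.
    hits = memory.search(user_message, user_id=user_id, agent_id=agent_id)
    context = "\n".join(h["memory"] for h in hits.get("results", []))
    # 2. Inject retrieved memories into the agent prompt.
    reply = agent(f"Known about the user:\n{context}\n\nUser: {user_message}")
    # 3. After the turn, hand the exchange back for extraction and storage.
    memory.add(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": reply}],
        user_id=user_id, agent_id=agent_id,
    )
    return reply
```

Because `memory` is duck-typed, the loop can be unit-tested with a stub before a real `Memory` instance (and its LLM dependency) is configured.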
Phase 2 — richer memory
- Add structured memory types (preferences, facts, opinions, situations) with metadata and categories.
- Add memory consolidation: summarize old conversations asynchronously.
- Add user-facing memory view in admin UI (inspect, edit, delete).
- Tune extraction quality: importance filtering, confidence scoring, deduplication.
Phase 3 — graph memory
- Enable Mem0 graph memory with Kuzu as the embedded graph backend.
- Extract entities and relationships from conversations (family members, places, people, connections).
- Use graph traversal for relationship queries (“who is the user’s daughter’s teacher?”).
- Feed graph context alongside vector results into agent prompts.
Phase 4 — advanced
- Cross-agent memory sharing policies (Therapist private vs. shared pool).
- Sentiment and emotion tracking over time.
- Memory-driven proactive behavior (“you mentioned your interview is today”).
- Re-evaluate Hindsight if Windows support is confirmed.