No backward compatibility is required (initial development). This is a clean-slate design doc covering requirements, evaluation criteria, library analysis, and the selected stack.
Context
Hiro League agents (Personal Assistant, Life Coach / Therapist, Research Agent, Home & Family Agent, Social & Media Agent) need memory that spans conversations, sessions, and time. Users share personal information, family details, emotional state, preferences, opinions, schedules, and history. The system must remember across sessions, reason about relationships, and track how facts change over time.
The agents already use LangGraph AsyncSqliteSaver for short-term conversation state (checkpointed per thread). This document covers the long-term memory layer that sits alongside that.
Memory categories
These categories reflect the kinds of information Hiro League agents must retain:
| Category | Examples | Volatility | Sensitivity | Graph needed? |
|---|---|---|---|---|
| Identity and profile | Name, family members, home setup | Very low | Medium | Yes (family tree) |
| Preferences | "Prefers dark roast coffee" | Low | Low | No |
| Opinions | "Thinks remote work is better" | Medium | Low–Medium | No |
| Facts and knowledge | "Daughter's school is Lincoln Elementary" | Low | Medium | Yes (relationships) |
| Emotional state | "Feeling anxious about job" | High | High | No |
| Situations | "Going through divorce" | Medium | High | Yes (people involved) |
| Episodes | "We discussed meditation on March 15" | Immutable (decays in relevance) | Medium | No |
| Relationships | "Wife Sarah, kids Tom (8) and Emma (5)" | Low | Medium | Core use case |
| Commitments | "Bot promised to check in about interview" | Medium (has deadline) | Low | No |
| Interaction patterns | "User prefers morning check-ins" | Low | Low | No |
Memory lifecycle
Write policy
- Not every utterance becomes a memory. Importance filtering is required — “I had coffee” vs. “I got diagnosed with diabetes” carry different signal.
- Explicit memories (“remember this”) and implicit memories (system-extracted from conversation) are both needed.
- Confidence scoring: distinguish facts from passing comments.
- Source tracking: which conversation, which agent, when.
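The write-policy requirements above can be sketched as a small record type plus a gate. This is an illustrative shape, not any library's API; the field names (`importance`, `confidence`, `explicit`) and thresholds are assumptions:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryCandidate:
    text: str
    source_conversation: str      # which conversation produced it
    source_agent: str             # which agent heard it
    explicit: bool = False        # user said "remember this"
    importance: float = 0.0       # 0..1, scored by an LLM or heuristic
    confidence: float = 0.0       # fact vs. passing comment
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def should_store(c: MemoryCandidate,
                 importance_floor: float = 0.5,
                 confidence_floor: float = 0.6) -> bool:
    """Explicit memories always pass; implicit ones must clear both floors."""
    if c.explicit:
        return True
    return c.importance >= importance_floor and c.confidence >= confidence_floor
```

Under this gate, "I had coffee" (low importance) is dropped while "I got diagnosed with diabetes" (high importance, high confidence) is stored, matching the example above.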
Read policy
- Different agents need different memory views. The Therapist needs emotional history; the Home Agent needs family schedules.
- Retrieval must be context-aware: mentioning “my daughter” should surface daughter-related memories, not everything.
- Recency bias for some queries, completeness for others.
- Fast enough to not break conversational flow.
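"Recency bias for some queries, completeness for others" can be expressed as one scoring knob. The blend below is an illustrative formula (exponential half-life decay multiplied into similarity), not any library's ranking function:

```python
from datetime import datetime, timedelta, timezone

def retrieval_score(similarity: float, created_at: datetime,
                    now: datetime, half_life_days: float = 30.0) -> float:
    """Blend semantic similarity with recency.

    A small half_life_days biases heavily toward recent memories; a very
    large one approximates a completeness-oriented query, where recency
    barely matters and raw similarity dominates.
    """
    age_days = (now - created_at).total_seconds() / 86400.0
    recency = 0.5 ** (age_days / half_life_days)
    return similarity * recency
```

A per-query (or per-agent) half-life lets the Therapist favor recent emotional state while the Home Agent retrieves stable family facts regardless of age.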
Update and mutation
- Facts change: “I work at Google” becomes “I left Google and joined Meta.”
- The old fact should be superseded with temporal markers, not deleted.
- Opinions shift. Contradictions need a resolution strategy — prefer the most recent statement.
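Supersession with temporal markers can be sketched as closing the current fact's validity interval instead of deleting it. The `Fact` shape and `supersede` helper are illustrative, not a library API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None = still current

    @property
    def current(self) -> bool:
        return self.valid_to is None

def supersede(history: list, new_text: str,
              now: Optional[datetime] = None) -> list:
    """Close the current fact with a temporal marker instead of deleting it."""
    now = now or datetime.now(timezone.utc)
    for fact in history:
        if fact.current:
            fact.valid_to = now     # superseded, but history is preserved
    history.append(Fact(text=new_text, valid_from=now))
    return history
```

Because the old interval survives, point-in-time questions ("where did I work in 2021?") remain answerable, and "prefer the most recent statement" falls out of simply reading the fact with `valid_to is None`.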
Expiration and forgetting
- Some memories should decay (mood on a specific day).
- Some should never expire (family members, medical conditions).
- Users must be able to say “forget this” — right to be forgotten is critical for trust.
- Stale situational context needs cleanup (“I’m preparing for my interview” — interview was months ago).
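The forgetting rules above combine three mechanisms: a user-issued "forget this" that always wins, pinned memories that never expire, and a TTL for situational context. A minimal liveness check, with illustrative field names:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class MemoryItem:
    text: str
    created_at: datetime
    pinned: bool = False                    # never expires (family, medical)
    expires_at: Optional[datetime] = None   # hard TTL for situational context
    forgotten: bool = False                 # user said "forget this"

def is_live(m: MemoryItem, now: datetime) -> bool:
    if m.forgotten:
        return False          # right to be forgotten wins over everything
    if m.pinned:
        return True
    if m.expires_at is not None and now >= m.expires_at:
        return False
    return True
```

Stale situations like "preparing for my interview" get an `expires_at` at write time (or during async consolidation) so cleanup is a filter, not a scan-and-judge pass.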
Multi-agent memory architecture
Hiro League runs multiple agents with different roles that all serve the same user. This drives specific requirements:
Shared memory layer
All agents should know core identity/profile facts and active situations. No agent should contradict another about basic facts.
Agent-private memory
Therapist conversations may be more sensitive. The user might share something with the Therapist that they do not want the Personal Assistant to reference. Memory scoping and access control between agents is needed.
Cross-agent memory signals
If the Home Agent detects the user hasn’t been sleeping well (IoT data), the Therapist should know. If the Personal Assistant knows about a stressful meeting tomorrow, the Life Coach can check in. This is selective inter-agent memory sharing.
Agent-specific observation memory
Each agent’s own interaction history with the user — what worked, what didn’t, what the user liked. Personalization per agent role, not just per user.
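The four layers above reduce to a read-scope question: which memory pools may a given agent see? A minimal sketch, where the pool and agent names are hypothetical and the real mapping would live in configuration:

```python
# Illustrative scoping table. "shared" holds core identity/profile facts and
# active situations that every agent may read; each agent additionally reads
# its own private pool (e.g. the Therapist's sensitive conversations).
READ_SCOPES = {
    "personal_assistant": {"shared", "personal_assistant"},
    "therapist":          {"shared", "therapist"},
    "home_agent":         {"shared", "home_agent"},
}

def visible_memories(agent: str, memories: list) -> list:
    """Filter a flat memory list down to the pools this agent may read."""
    allowed = READ_SCOPES.get(agent, {"shared"})
    return [m for m in memories if m["pool"] in allowed]
```

Cross-agent signals (sleep data reaching the Therapist) then become explicit writes into the shared pool rather than one agent reading another's private store.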
Temporal requirements
- Point-in-time recall: “What did I tell you about X last month?”
- Trend detection: mood over time, habit compliance, preference drift.
- Temporal validity: “I’m on vacation until Friday” has an expiration.
- Event sequencing: cause-and-effect in the user’s life narrative.
- Cyclical awareness: birthdays, recurring appointments, seasonal patterns.
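Point-in-time recall, the first requirement above, is a time-window filter layered onto topical retrieval. In this sketch, naive substring matching stands in for semantic search; the function name and memory shape are illustrative:

```python
from datetime import datetime, timezone

def recall_window(memories: list, topic: str,
                  start: datetime, end: datetime) -> list:
    """Point-in-time recall: memories about `topic` created inside [start, end).

    A real implementation would combine this temporal filter with vector
    retrieval rather than substring matching on the raw text.
    """
    return [
        m for m in memories
        if start <= m["created_at"] < end and topic.lower() in m["text"].lower()
    ]
```

Trend detection and cyclical awareness build on the same primitive: run the windowed query per period and compare the slices.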
Privacy and trust
- Encryption at rest — memory stores must be encrypted.
- Tiered sensitivity: medical > emotional > preferences > general facts.
- User visibility: user should be able to see what has been remembered (memory dashboard).
- User control: edit, delete, correct memories.
- No leakage: memories from one user must never bleed into another (multi-user household).
- Audit trail: who wrote what memory, when, from which conversation.
- Local-first: aligns with Hiro League’s “private by design” principle.
Scale and performance
- Memory grows continuously over months and years of daily use.
- Retrieval must stay fast even with thousands of memories.
- Consolidation (turning verbose conversations into compact facts) should happen asynchronously, not blocking chat.
- Memory should be searchable by the user, not just by agents.
Library evaluation criteria
These criteria are ordered by importance for Hiro League’s desktop server deployment.
Tier 1 — deal-breakers
| # | Criterion | Why it matters |
|---|---|---|
| 1 | Server-side, Python-native | The server is a Python desktop app. The library must be pip-installable. |
| 2 | Works with LangChain/LangGraph | Already invested in this stack. Must integrate natively or wire in with minimal glue. |
| 3 | Embedded / lightweight storage | Users install on a desktop. No Docker, no external services. SQLite, local files, or embedded DBs only. |
| 4 | Active maintenance | Must have active releases in 2025–2026. Abandoned projects are excluded. |
| 5 | Windows support | Hiro League’s primary target is Windows desktop. |
Tier 2 — strong preferences
| # | Criterion | Why it matters |
|---|---|---|
| 6 | Conversation analysis / extraction built-in | We do not want to build a custom memory manager. The library should extract facts from conversations. |
| 7 | Multi-layer memory | Short-term (conversation buffer) + long-term (facts, preferences, knowledge) in one library or tightly integrated pair. |
| 8 | Scoping / namespacing | Multiple agents, multiple users. Per-user, per-agent, per-shared-pool scoping required. |
| 9 | Temporal awareness | Facts change. Newer facts should supersede older ones without losing history. |
Tier 3 — future-proofing
| # | Criterion | Why it matters |
|---|---|---|
| 10 | Graph memory path | Relationships (family, social) will eventually need graph memory. Support it natively or do not lock it out. |
| 11 | Swappable storage backend | Today embedded, tomorrow maybe Postgres. Must not hardcode one vector DB. |
| 12 | User-facing memory (inspect/edit/delete) | Users want to see what the bot remembers, correct it, delete it. Data must be inspectable. |
| 13 | Async support | Server is async. Synchronous blocking in the memory hot path hurts responsiveness. |
Library categories
There are two distinct layers that are often conflated:
Memory managers analyze conversations, extract memories, manage lifecycle, handle retrieval. You talk to them, they figure out what to remember.
Storage backends store and retrieve vectors/data. They do not decide what to store — they store what you give them efficiently.
A memory manager uses a storage backend underneath. These are not either/or choices; they are different layers.
Memory managers evaluated
Mem0
Purpose-built memory layer. Feed it conversations, it extracts facts/preferences/entities, stores them, retrieves relevant ones later.
- License: Apache-2.0
- Install: `pip install mem0ai`
- Architecture: Vector memory (semantic search) + graph memory (entity/relationship extraction and traversal). Dual retrieval: vector narrows candidates, graph expands connected context.
- Vector backends: Qdrant (default), ChromaDB, FAISS, PGVector, Milvus, Pinecone, Weaviate, Elasticsearch, Redis, and others.
- Graph backends: Neo4j, Memgraph, Kuzu (embedded, in-process), Apache AGE, FalkorDB.
- LLM support: OpenAI, Ollama (fully offline), and others.
- Windows: Yes — all embedded backends (Qdrant local, ChromaDB, FAISS, Kuzu) work on Windows.
Recent features (2026):
- Temporal search filtering (“what happened last week”) — February 2026
- Hybrid search (semantic + keyword) — January 2026
- Graph memory status tracking
- Improved extraction quality
Strengths:
- Flexible backend choices — pick what fits the deployment.
- Graph memory with Kuzu = embedded graph, zero infrastructure.
- Active community, well-documented.
- Scoping by `user_id` and `agent_id` built in.
- You control the extraction pipeline.
Weaknesses:
- You still wire the “when to extract” logic (call `m.add()` with conversation text).
- Not framework-native to LangGraph — integrated alongside.
- LongMemEval benchmark: 49% (significantly lower than Hindsight).
Hindsight
Newer memory system that models Retain / Recall / Reflect as first-class operations. Treats memory as a reasoning substrate.
- License: MIT
- Install: `pip install hindsight-embed` (embedded daemon) or `pip install hindsight-client` (SDK)
- Architecture: Four memory networks — World Facts, Experiences, Observations (synthesized), Mental Models (curated summaries). Multi-strategy retrieval: temporal reasoning, graph traversal, BM25, semantic search, cross-encoder reranking.
- Storage: Embedded PostgreSQL (pg0), bundled inside the daemon.
- LLM support: OpenAI, Groq, Google, Ollama (fully offline).
Strengths:
- Best benchmark performance: 91.4% on LongMemEval (state of the art).
- “Reflect” synthesizes across memories to generate answers, not just retrieve.
- Mental Models (pre-computed reflections that auto-refresh) — useful for “what do I know about this user’s family.”
- Graph + temporal + keyword + semantic search fused together.
- Bank-based scoping (each agent gets a bank).
Weaknesses:
- Windows support unconfirmed — embedded PostgreSQL (pg0) is the risk. Historically poor Windows support for embedded Postgres tooling.
- Daemon architecture adds a separate process (auto-starts, auto-shuts after 5 min idle). Not an in-process library call — there is a localhost HTTP hop.
- Heavier footprint (Postgres + ML models, 1–3 min cold start on first run).
- Newer, smaller community than Mem0.
- Storage is built-in Postgres only — no swappable vector backend.
LangMem (LangChain ecosystem)
Memory primitives (extract, recall, distill) that integrate with LangGraph storage.
- License: MIT
- Install: `pip install langmem` (already in the LangGraph ecosystem)
- Architecture: Works with LangGraph’s long-term store. Extraction via LLM prompts.
- Windows: Yes.
Strengths:
- Already in the LangGraph stack — zero new infrastructure.
- Namespacing is native (per-user, per-agent).
- Short-term (checkpointer) and long-term (store) in one stack.
Weaknesses:
- No graph memory.
- Extraction quality depends on prompt configuration.
- Newer, less battle-tested for “memory as a product.”
- Vector-only retrieval, no knowledge graph, no multi-hop reasoning, no temporal queries.
Storage backends evaluated
These are vector databases that sit underneath a memory manager (or standalone for RAG).
Qdrant (local mode)
- License: Apache-2.0
- Install: `pip install qdrant-client`
- Local mode: `QdrantClient(path="./db")` — persistent disk, no server. Also supports `:memory:` for ephemeral use.
- Windows: Yes.
- Upgrade path: Same API local vs. server — switch to clustered deployment later without code changes.
- Mem0 integration: Default backend, best-tested code path.
LanceDB
- License: Apache-2.0
- Install: `pip install lancedb`
- Local mode: Embedded in-process (like SQLite): `lancedb.connect("./data")`
- Windows: Yes (since v0.4.0).
- Features: Hybrid search (vector + BM25), ACID, versioning, columnar (Apache Arrow).
- Mem0 integration: Not supported. A PR was submitted but never merged. LanceDB is not listed in Mem0’s supported vector stores. Usable for standalone RAG, not as a Mem0 backend.
Milvus Lite
- License: Apache-2.0
- Install: `pip install pymilvus`
- Local mode: Embedded via local file path.
- Windows: Not supported. Linux and macOS only.
- Verdict: Eliminated for Hiro League.
ChromaDB
- License: Apache-2.0
- Install: `pip install chromadb`
- Local mode: Local directory path, no server.
- Windows: Yes.
- Mem0 integration: Supported.
FAISS
- License: MIT
- Install: `pip install faiss-cpu`
- Local mode: In-process, file-backed.
- Windows: Yes.
- Mem0 integration: Supported.
Kuzu (graph database)
- License: MIT
- Install: `pip install kuzu`
- Local mode: Embedded in-process. No server, no Docker. Persist to a directory or run fully in-memory.
- Windows: Yes.
- Mem0 integration: Supported as graph backend since August 2025.
Scoring
Scoring each memory manager against the evaluation criteria (1–5 scale, 5 = best):
| Criterion | Weight | Mem0 | Hindsight | LangMem |
|---|---|---|---|---|
| Python, pip install | Deal-breaker | 5 | 5 | 5 |
| LangChain/LangGraph integration | Deal-breaker | 4 (alongside) | 4 (alongside) | 5 (native) |
| Embedded / lightweight | Deal-breaker | 5 (Qdrant local + Kuzu) | 3 (daemon + embedded Postgres) | 5 (LangGraph store) |
| Active maintenance | Deal-breaker | 5 | 5 | 4 |
| Windows support | Deal-breaker | 5 | 2 (unconfirmed) | 5 |
| Conversation extraction built-in | Strong | 5 | 5 | 4 |
| Multi-layer memory | Strong | 4 (long-term; keep LangGraph for short) | 5 (all layers) | 3 (primitives only) |
| Scoping / namespacing | Strong | 5 (user_id, agent_id) | 4 (bank-based) | 5 (namespace-native) |
| Temporal awareness | Strong | 4 (added Feb 2026) | 5 (core design) | 2 |
| Graph memory path | Future | 5 (Kuzu embedded) | 4 (built-in) | 1 (none) |
| Swappable backend | Future | 5 (many options) | 1 (Postgres only) | 3 (store-agnostic) |
| User-inspectable data | Future | 4 (API access) | 4 (API access) | 3 (store queryable) |
| Async support | Future | 4 | 5 | 4 |
| Benchmark (LongMemEval) | Reference | 49% | 91.4% | N/A |
Hindsight elimination rationale
Hindsight has the best benchmark performance and the most ambitious architecture (Reflect, Mental Models, fused retrieval). However, it fails on a deal-breaker criterion: Windows support is unconfirmed, and the embedded PostgreSQL daemon architecture carries real risk on Windows desktops. The daemon model (separate process, localhost HTTP, 1–3 min cold start) also conflicts with the lightweight embedded requirement.
Hindsight is worth monitoring. If Windows support is confirmed and the daemon overhead becomes acceptable, it could supplement or replace the memory layer in a future phase.
LangMem consideration
LangMem scores well on integration (native to LangGraph) but lacks graph memory and temporal awareness. It is a viable Phase 1 option but does not grow into the full requirement set. Choosing it would mean replacing it later when graph and temporal needs arise.
Selected stack
Memory manager: Mem0
Mem0 meets all deal-breaker criteria and scores highest across the full weighted evaluation. It provides conversation extraction, vector + graph memory, temporal search, scoping, and flexible backend choices — all installable via pip with no external services.
Vector backend: Qdrant (local mode)
Qdrant is Mem0’s default and most-tested vector backend. Local mode (path="./db") requires no server, works on Windows, and uses the same API as the server deployment — providing a clean upgrade path if needed later.
Graph backend: Kuzu (embedded)
Kuzu is an embedded in-process graph database. `pip install kuzu`, point at a directory, done. No Neo4j, no Docker. Supported by Mem0 since August 2025. Enables the relationship memory layer (family tree, social graph, entity connections) without infrastructure overhead.
Stack summary
| Layer | Tool | Install | Storage location |
|---|---|---|---|
| Short-term (conversation) | LangGraph AsyncSqliteSaver | Already in place | workspace.db |
| Long-term (facts, preferences, knowledge) | Mem0 + Qdrant local | `pip install mem0ai qdrant-client` | Workspace directory |
| Graph (relationships, entities) | Mem0 graph + Kuzu | `pip install kuzu` | Workspace directory |
All components are pip-installable, run embedded on Windows, and store data locally alongside the existing `workspace.db`.
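The selected stack wires together through Mem0's configuration dict. This is a hedged sketch following the provider/config nesting shape from Mem0's docs; the paths are placeholders, and the exact option names for each provider should be checked against the docs for the installed Mem0 version:

```python
# Sketch of a Mem0 config for the selected stack. Provider names follow
# Mem0's documented backends; paths are placeholders, and per-provider
# option keys (e.g. the Kuzu path key) should be verified against the docs.
MEM0_CONFIG = {
    "vector_store": {
        "provider": "qdrant",
        "config": {"path": "./workspace/qdrant"},   # local mode, no server
    },
    "graph_store": {
        "provider": "kuzu",
        "config": {"db": "./workspace/kuzu"},       # embedded graph database
    },
}
```

A config like this would typically be passed to `Memory.from_config(...)`; both stores land inside the workspace directory next to `workspace.db`.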
Implementation phases
Phase 1 — foundation
- Keep the LangGraph `AsyncSqliteSaver` checkpointer for short-term conversation memory (already in place).
- Add Mem0 with Qdrant local for long-term memory extraction and retrieval.
- Wire `m.add()` calls after agent conversations to extract and store memories.
- Inject relevant retrieved memories into agent prompts before each response.
- Scope memories per user and per agent using Mem0’s built-in `user_id` and `agent_id`.
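The Phase 1 loop above can be sketched as one function per conversational turn. The `memory.search(...)` / `memory.add(...)` calls follow Mem0's documented signatures; everything else (`run_turn`, the prompt format, the assumed `{"results": [...]}` return shape of `search`) is illustrative:

```python
def run_turn(memory, agent, user_id: str, agent_id: str, user_message: str) -> str:
    """One conversational turn wired through a Mem0-style memory object.

    `agent` is any callable mapping a prompt string to a reply. This is a
    sketch of the Phase 1 retrieve -> inject -> respond -> extract loop,
    not production code.
    """
    # 1. Retrieve relevant memories, scoped per user and per agent.
    hits = memory.search(user_message, user_id=user_id, agent_id=agent_id)
    context = "\n".join(h["memory"] for h in hits.get("results", []))
    # 2. Inject retrieved memories into the agent prompt.
    reply = agent(f"Known about the user:\n{context}\n\nUser: {user_message}")
    # 3. After the turn, hand the exchange back for extraction and storage.
    memory.add(
        [{"role": "user", "content": user_message},
         {"role": "assistant", "content": reply}],
        user_id=user_id, agent_id=agent_id,
    )
    return reply
```

Because `memory` is duck-typed, the loop can be unit-tested with a stub before a real `Memory` instance (and its LLM dependency) is configured.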
Phase 2 — richer memory
- Add structured memory types (preferences, facts, opinions, situations) with metadata and categories.
- Add memory consolidation: summarize old conversations asynchronously.
- Add user-facing memory view in admin UI (inspect, edit, delete).
- Tune extraction quality: importance filtering, confidence scoring, deduplication.
Phase 3 — graph memory
- Enable Mem0 graph memory with Kuzu as the embedded graph backend.
- Extract entities and relationships from conversations (family members, places, people, connections).
- Use graph traversal for relationship queries (“who is the user’s daughter’s teacher?”).
- Feed graph context alongside vector results into agent prompts.
Phase 4 — advanced
- Cross-agent memory sharing policies (Therapist private vs. shared pool).
- Sentiment and emotion tracking over time.
- Memory-driven proactive behavior (“you mentioned your interview is today”).
- Re-evaluate Hindsight if Windows support is confirmed.