In the OpenClaw architecture, we solve this problem with a hybrid memory system that swaps between volatile working memory (RAM/context) and persistent long-term memory (Markdown/vectors).
1. The Anatomy of Memory: Short-Term vs. Long-Term
For OpenClaw to act autonomously, it must prioritize information. Technically, we divide its memory into three layers:
| Layer | Medium | Function | Durability |
|---|---|---|---|
| Active Context | LLM context window | Current chat history & tool outputs | Minutes (fleeting) |
| Short-Term Memory | SOUL.md / SESSION.log | Current sub-goals and intermediate statuses | Hours (session-based) |
| Long-Term Memory | MEMORY.md + vector DB | Historical facts, user preferences | Permanent |
2. The "Summarization Trigger": Preventing Overflow
If an agent works in the shell for hours, the context window (e.g. 200k tokens in Claude 3.5 Sonnet) fills up rapidly. Before the window "overflows" and the model begins to forget its first instructions, OpenClaw initiates a recursive compression loop:
- Token monitoring: The Node.js core constantly monitors the token count of the payload.
- Self-summarization: When the load reaches a threshold (e.g. 80%), the orchestrator sends an internal prompt: "Summarize the previous findings and the current system status."
- The swap: The detailed log data is removed from the active prompt and replaced by the compact summary. Simultaneously, the details are written to SESSION.log on disk.
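The loop above can be sketched in a few lines of Node.js. This is a minimal illustration, not OpenClaw's actual API: `estimateTokens`, `maybeCompress`, and the `summarize`/`appendToSessionLog` callbacks are hypothetical names, and the character-based token estimate is a rough heuristic.

```javascript
const CONTEXT_LIMIT = 128_000; // illustrative context budget (tokens)
const THRESHOLD = 0.8;         // trigger compression at 80% utilization

function estimateTokens(text) {
  // Rough heuristic: ~4 characters per token for English text.
  return Math.ceil(text.length / 4);
}

async function maybeCompress(messages, summarize, appendToSessionLog) {
  // Token monitoring: sum the estimated token count of the payload.
  const used = messages.reduce((n, m) => n + estimateTokens(m.content), 0);
  if (used < CONTEXT_LIMIT * THRESHOLD) return messages;

  // Self-summarization: compress everything except the most recent turns.
  const head = messages.slice(0, -4);
  const tail = messages.slice(-4);
  const summary = await summarize(
    'Summarize the previous findings and the current system status.',
    head
  );

  // The swap: persist the verbose log to disk, keep only the summary in context.
  await appendToSessionLog(head);
  return [{ role: 'system', content: summary }, ...tail];
}
```

Keeping the last few turns verbatim while summarizing the rest preserves immediate conversational coherence while freeing most of the window.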
3. Vector Pruning: Relevance Through Math

For long-term memory, OpenClaw uses RAG (Retrieval-Augmented Generation). If the user asks: "How did we solve the problem with the Docker container three weeks ago?", the agent does not search linearly through walls of text but mathematically in vector space.

Technically, each text section is converted into a high-dimensional vector (embedding). The similarity between the query ($q$) and a stored document ($d$) is calculated using cosine similarity: $\text{sim}(q, d) = \frac{q \cdot d}{\|q\| \, \|d\|}$.
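The cosine-similarity retrieval step can be computed directly on raw embedding arrays. A minimal sketch, assuming embeddings are plain number arrays of equal length (the `topK` helper and document shape are illustrative, not OpenClaw internals):

```javascript
// sim(q, d) = (q · d) / (||q|| * ||d||)
function cosineSimilarity(q, d) {
  let dot = 0, normQ = 0, normD = 0;
  for (let i = 0; i < q.length; i++) {
    dot += q[i] * d[i];
    normQ += q[i] * q[i];
    normD += d[i] * d[i];
  }
  return dot / (Math.sqrt(normQ) * Math.sqrt(normD));
}

// Retrieval: rank stored memory chunks by similarity to the query embedding.
function topK(queryVec, docs, k = 3) {
  return docs
    .map(doc => ({ ...doc, score: cosineSimilarity(queryVec, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Because cosine similarity normalizes by vector length, it compares direction (semantic meaning) rather than magnitude, which is why it is the standard choice for embedding search.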
The problem:
Knowledge becomes outdated over time (stale data).
The OpenClaw solution (Vector-Pruning):
The agent performs periodic "memory audits". If information in MEMORY.md is contradictory (e.g. old API documentation vs. new), the LLM recognizes the conflict during indexing and actively "prunes" (deletes or archives) the stale vector entry to maintain the precision of search results.
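The audit's pruning rule can be illustrated as "newest fact per topic wins". This is a simplified sketch: the entry shape (`{ topic, content, updatedAt }`) and the `pruneStaleEntries` name are assumptions for illustration, not OpenClaw's actual schema, and real conflict detection would rely on the LLM rather than an exact topic match.

```javascript
// Memory audit sketch: entries on the same topic that disagree are pruned
// in favor of the most recently updated one.
function pruneStaleEntries(entries) {
  const latest = new Map();
  for (const entry of entries) {
    const current = latest.get(entry.topic);
    if (!current || entry.updatedAt > current.updatedAt) {
      latest.set(entry.topic, entry); // newer fact wins; older entry is dropped
    }
  }
  return [...latest.values()];
}
```

In practice the pruned entries would be archived rather than destroyed, so an incorrect audit decision remains reversible.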
4. Write-Back: The Art of Self-Updating
The centerpiece is the update_memory tool call. OpenClaw constantly updates its own "personality" and knowledge base.
Example of an automated workflow:
- Agent solves a complex bug in a Python script.
- The ReAct loop decides: "This is important for the future."
- Call: fs.writeFile('MEMORY.md', new_insight, { flag: 'a' }) (Node's fs/promises API; the 'a' flag appends).
- The next time OpenClaw is started, this Markdown file is read in, vectorized and is available as "experience".
This process makes OpenClaw local-first and human-readable. You can open MEMORY.md at any time and see what your agent has learned about you and its tasks, and correct it manually if necessary.
5. Conclusion: Consistency Through Abstraction
By intelligently swapping between the ephemeral token context and permanent Markdown/vector storage, OpenClaw bypasses the physical limitations of today's LLMs. The agent retains its "focus" (short-term) without losing its "identity" (long-term).