agentmemory (by Rohit G, @rohitg00) is a persistent memory layer for AI coding agents that frames itself, deliberately, around academic benchmarks rather than around feature lists. The README claims ‘#1 persistent memory for AI coding agents based on real-world benchmarks’ — and unlike most skill-pack READMEs, the benchmark table is actually in the repository, derived from LongMemEval (an ICLR 2025 long-term-memory evaluation suite).
The project sits at the leading edge of what we’re calling the validator wave: the early-May 2026 shift from “ship more skills” to “ship measurable skills.” It hit 3,882 stars at +754/day GitHub trending velocity on 2026-05-10. The relevant comparison is not other memory libraries — it’s the validator-class projects emerging in the same week (react-doctor, claude-doctor) that share the same posture: claim + benchmark + integration.
agentmemory replaces the practical 200-line capacity ceiling of CLAUDE.md and .cursorrules with a persistent memory store that:

- captures every tool use automatically, via 12 auto-capture hooks
- injects relevant context at session start
- requires zero manual memory.add() calls in user code
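The repo's actual hook surface isn't reproduced here, so the following is a minimal sketch of the capture-and-inject pattern those bullets describe, assuming a SQLite-backed store. `MemoryStore`, `capture`, and `session_context` are illustrative names, not agentmemory's API.

```python
import json
import sqlite3
import time
from pathlib import Path

# Hypothetical default location; agentmemory's real layout may differ.
DB_PATH = Path.home() / ".agentmemory" / "memory.db"


class MemoryStore:
    """Illustrative persistent store. Class and method names are
    assumptions, not agentmemory's actual API."""

    def __init__(self, db_path: Path = DB_PATH) -> None:
        db_path.parent.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events (ts REAL, tool TEXT, payload TEXT)"
        )

    def capture(self, tool: str, payload: dict) -> None:
        # Invoked by a post-tool-use hook after every tool call,
        # so user code never calls memory.add() by hand.
        self.conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (time.time(), tool, json.dumps(payload)),
        )
        self.conn.commit()

    def session_context(self, k: int = 10) -> list[str]:
        # Invoked by a session-start hook to inject prior context.
        # Recency stands in here for the BM25 + vector hybrid
        # retrieval discussed with the benchmark table below.
        rows = self.conn.execute(
            "SELECT payload FROM events ORDER BY ts DESC LIMIT ?", (k,)
        ).fetchall()
        return [payload for (payload,) in rows]
```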
The benchmark suite is the most useful artifact in the repo for evaluating agentmemory against alternatives:
| Retrieval strategy | LongMemEval-S accuracy |
|---|---|
| BM25 alone | 86.2% |
| BM25 + Vector hybrid (default) | 95.2% |
| Pure vector | 96.6% |
The 1.4pp gap between hybrid and pure-vector is the operator-grade detail — hybrid runs at meaningfully lower token cost, and the developer can decide which trade-off matters for their workload.
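The material above doesn't specify agentmemory's fusion formula, so here is one common way to build a BM25 + vector hybrid: score with each strategy independently, then merge the two rankings with reciprocal rank fusion (RRF). The bag-of-words cosine below is a stand-in for real dense embeddings, and `hybrid_rrf` is an illustrative name, not the project's.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1=1.5, b=0.75) -> list[float]:
    """Classic BM25 over whitespace tokens."""
    toks = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for w in set(query.lower().split()):
            df = sum(1 for d in toks if w in d)
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            f = tf[w]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def cosine_scores(query: str, docs: list[str]) -> list[float]:
    """Stand-in for the vector side: bag-of-words cosine.
    agentmemory would use dense model embeddings here."""
    def vec(text: str) -> Counter:
        return Counter(text.lower().split())

    q = vec(query)
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    out = []
    for d in docs:
        v = vec(d)
        dot = sum(q[w] * v[w] for w in q)
        norm = q_norm * math.sqrt(sum(c * c for c in v.values()))
        out.append(dot / norm if norm else 0.0)
    return out


def hybrid_rrf(query: str, docs: list[str], k: int = 60) -> list[int]:
    """Reciprocal rank fusion of the BM25 and vector rankings.
    One common hybrid; the repo may weight the two sides differently."""
    fused = Counter()
    for scores in (bm25_scores(query, docs), cosine_scores(query, docs)):
        ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
        for rank, i in enumerate(ranking):
            fused[i] += 1.0 / (k + rank + 1)
    return sorted(range(len(docs)), key=lambda i: -fused[i])
```

Swapping cosine_scores for a real embedding model is where a hybrid recovers most of pure vector's accuracy while keeping BM25's cheap lexical recall, which is exactly the trade-off the table quantifies.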
The benchmark-in-the-README posture is the new floor for skill primitives. Until early May 2026, “we have memory” was a sufficient claim. The agentmemory contract — claim + benchmark + integration folder — is what we expect the next wave of skill primitives to converge on, because once a benchmark exists in one corner of the ecosystem, the next-most-rational user demands one for the rest.
agentmemory ships dedicated integration folders for Claude Code, OpenClaw, and several other harnesses inside /integrations/. The cross-harness posture matters strategically: a memory layer that’s only on one harness is structurally narrower than a memory layer that travels.
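The contents of /integrations/ aren't reproduced here, but Claude Code's hook mechanism (shell commands registered in .claude/settings.json that receive a JSON event on stdin) suggests the rough shape of one: a PostToolUse hook script feeding the store sketched earlier. The field names and the `memory_store` import are assumptions.

```python
#!/usr/bin/env python3
# Hypothetical PostToolUse hook: Claude Code pipes a JSON event to the
# hook command's stdin; this script persists it to the memory store.
import json
import sys

from memory_store import MemoryStore  # the illustrative store sketched above


def main() -> None:
    event = json.load(sys.stdin)
    MemoryStore().capture(
        # "tool_name" / "tool_input" follow Claude Code's hook payloads;
        # agentmemory's real integration may read different fields.
        tool=event.get("tool_name", "unknown"),
        payload=event.get("tool_input", {}),
    )


if __name__ == "__main__":
    main()
```

Each per-harness folder presumably adapts the same capture call to that harness's own event surface, which is what lets the memory layer travel.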