headroom is an open-source context compression tool that sits between your agent's orchestrator and the LLM API. It intercepts outbound context — tool results, file contents, conversation history — and compresses it using specialized strategies (SmartCrusher for JSON, CodeCompressor for code ASTs, Kompress for prose). Available as a Python/Node library, a drop-in proxy, or an MCP server. Built by Tejas Chopra (Netflix). CCR mode stores originals locally for reversible compression.
headroom compresses everything your AI agent reads — tool outputs, logs, files, RAG chunks, and conversation history — before it reaches the LLM. The tool achieves 60–95% fewer tokens while maintaining answer quality, as validated on GSM8K (±0.000 delta), TruthfulQA (+0.030), and BFCL tool-use (97% accuracy at 32% compression).
The project hit #1 on GitHub trending with +3,139 stars/day (12.8k total) in June 2026, driven by the AI cost reckoning — Uber’s $1,500/month cap on coding tools and similar enterprise budget blowouts.
headroom routes content through a ContentRouter that detects the type and applies the right compression strategy. SmartCrusher handles JSON arrays and nested objects with deterministic, schema-preserving compression. CodeCompressor parses syntax trees for six languages. Kompress-base is a custom HuggingFace model trained on agentic traces for prose and documentation.
Real-world benchmarks show 92% token reduction on code search (100 results: 17,765 → 1,408 tokens), 92% on SRE incident debugging (65,694 → 5,118 tokens), and 73% on GitHub issue triage (54,174 → 14,761 tokens).
The proxy mode requires zero code changes — start it on port 8787 and point your agent’s base URL at it. The MCP server exposes headroom_compress, headroom_retrieve, and headroom_stats tools. The library mode provides inline compress(messages) for custom agent code. Python 3.10+ and Node.js supported. Apache 2.0 licensed.
Persistent memory layer for AI coding agents — benchmark-backed (95.2% on LongMemEval-S), 92% fewer tokens per session vs full-context pasting, zero manual memory.add() calls.
AWS's AI-powered coding assistant that helps developers build, deploy, and optimize applications on AWS with code generation and transformation.
Open-source AI pair programming tool that works in your terminal to edit code across your entire repository.