About headroom

headroom is an open-source context compression tool that sits between your agent's orchestrator and the LLM API. It intercepts outbound context — tool results, file contents, conversation history — and compresses it using specialized strategies (SmartCrusher for JSON, CodeCompressor for code ASTs, Kompress for prose). Available as a Python/Node library, a drop-in proxy, or an MCP server. Built by Tejas Chopra (Netflix). CCR mode stores originals locally for reversible compression.

Key Features

60–95% token reduction on tool outputs, logs, and RAG chunks

Three install modes: library, proxy (port 8787), MCP server

AST-aware code compression for Python, JS, Go, Rust, Java, C++

CacheAligner stabilizes prefixes for Anthropic/OpenAI KV cache hits

CCR reversible compression — LLM can retrieve originals on demand

Compatible with Claude Code, Codex, Cursor, Aider, Copilot CLI

Overview

headroom compresses everything your AI agent reads — tool outputs, logs, files, RAG chunks, and conversation history — before it reaches the LLM. The tool achieves 60–95% fewer tokens while maintaining answer quality, as validated on GSM8K (±0.000 delta), TruthfulQA (+0.030), and BFCL tool-use (97% accuracy at 32% compression).

The project hit #1 on GitHub trending with +3,139 stars/day (12.8k total) in June 2026, driven by the AI cost reckoning — Uber’s $1,500/month cap on coding tools and similar enterprise budget blowouts.

Key Capabilities

headroom routes content through a ContentRouter that detects the type and applies the right compression strategy. SmartCrusher handles JSON arrays and nested objects with deterministic, schema-preserving compression. CodeCompressor parses syntax trees for six languages. Kompress-base is a custom HuggingFace model trained on agentic traces for prose and documentation.

Real-world benchmarks show 92% token reduction on code search (100 results: 17,765 → 1,408 tokens), 92% on SRE incident debugging (65,694 → 5,118 tokens), and 73% on GitHub issue triage (54,174 → 14,761 tokens).

Integration

The proxy mode requires zero code changes — start it on port 8787 and point your agent’s base URL at it. The MCP server exposes headroom_compress, headroom_retrieve, and headroom_stats tools. The library mode provides inline compress(messages) for custom agent code. Python 3.10+ and Node.js supported. Apache 2.0 licensed.

headroom

About headroom

Key Features

Overview

Key Capabilities

Integration

Similar Agents

agentmemory

Amazon Q Developer

Aider