CocoIndex is a data transformation and indexing framework purpose-built for AI workloads. Its incremental engine keeps target indexes fresh and explainable: when a source changes, CocoIndex identifies the affected records, propagates the change across joins and lookups, updates the target, and retires stale rows, without touching anything that hasn't changed. The core is written in Rust for production-grade performance. CocoIndex powers RAG pipelines, semantic search backends, and continuously updated agent memory systems. The CocoIndex ecosystem also includes purpose-built tools such as cocoindex-code (a tree-sitter-based code search engine that cuts token use by roughly 70% for coding agents) and a HackerNews trending-topics detector that uses an LLM to extract entities from real-time content. CocoIndex appeared on GitHub Trending in the Rust category in May 2026, as developers look for ways to keep AI agents connected to continuously changing data.
CocoIndex is an incremental data framework engineered for AI agents whose underlying data has to stay fresh under continuous change. Most indexing pipelines today are batch-oriented: they either rebuild the whole index when something changes or accept staleness. CocoIndex takes the opposite approach, with an incremental engine that detects exactly which records changed, propagates the impact through joins and lookups, updates the target, and retires stale rows. The result is an always-fresh index without the cost of full reprocessing.
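The detect, update, and retire cycle described above can be illustrated with a minimal Python sketch. This is not CocoIndex's API or its actual algorithm: `fingerprint`, `incremental_sync`, and the dictionary-based source, target, and `seen` state are hypothetical names chosen for illustration, and content hashing is assumed as the change-detection mechanism.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash of a record; key order does not matter."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_sync(source: dict, target: dict, seen: dict, transform) -> dict:
    """Bring `target` in line with `source`, touching only what changed.

    source: record id -> source record
    target: record id -> transformed row (mutated in place)
    seen:   record id -> fingerprint from the previous run (mutated in place)
    """
    stats = {"upserted": 0, "retired": 0, "skipped": 0}

    for record_id, record in source.items():
        fp = fingerprint(record)
        if seen.get(record_id) == fp:
            stats["skipped"] += 1  # unchanged: no transform, no write
            continue
        target[record_id] = transform(record)  # new or changed record
        seen[record_id] = fp
        stats["upserted"] += 1

    # Retire rows whose source record no longer exists.
    for record_id in list(seen):
        if record_id not in source:
            del seen[record_id]
            target.pop(record_id, None)
            stats["retired"] += 1

    return stats
```

On a second run over a source where one record was edited and another deleted, only the edited record is transformed again and only the deleted record's row is removed; every other row is left alone.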
The architectural standout is the incremental engine, written in Rust for production-grade throughput. Application developers write transformation logic in Python, a familiar surface, while the core handles change tracking, dependency propagation, and consistency guarantees. CocoIndex’s custom-source API lets you ingest from anywhere: HTTP APIs, file systems, databases, and queues. The companion projects round out a complete agent-data stack: cocoindex-code provides AST-based code search that reduces token use by roughly 70% compared with full-file context, and the HackerNews trending-topics example shows the framework wired up with an LLM that extracts entities from a real-time feed.
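Dependency propagation through a lookup is the least obvious part of that list, so here is a small Python sketch of one way it can work. It assumes a reverse dependency map from lookup keys to the derived rows that used them. The `LookupJoinIndex` class, its methods, and the documents-and-authors example are hypothetical and do not reflect CocoIndex's internals or API.

```python
from collections import defaultdict


class LookupJoinIndex:
    """Derived rows that join documents with an author lookup table."""

    def __init__(self):
        self.docs = {}      # doc_id -> {"title": ..., "author_id": ...}
        self.authors = {}   # author_id -> display name
        self.target = {}    # doc_id -> joined row
        self.docs_by_author = defaultdict(set)  # reverse dependency map

    def _rebuild(self, doc_id):
        # Recompute a single derived row from its current inputs.
        doc = self.docs[doc_id]
        name = self.authors.get(doc["author_id"], "unknown")
        self.target[doc_id] = f'{doc["title"]} by {name}'

    def upsert_doc(self, doc_id, title, author_id):
        old = self.docs.get(doc_id)
        if old is not None:
            # Drop the stale dependency edge before recording the new one.
            self.docs_by_author[old["author_id"]].discard(doc_id)
        self.docs[doc_id] = {"title": title, "author_id": author_id}
        self.docs_by_author[author_id].add(doc_id)
        self._rebuild(doc_id)
        return {doc_id}

    def upsert_author(self, author_id, name):
        # A lookup-side change fans out only to rows that depend on it.
        self.authors[author_id] = name
        affected = set(self.docs_by_author.get(author_id, ()))
        for doc_id in affected:
            self._rebuild(doc_id)
        return affected

    def delete_doc(self, doc_id):
        doc = self.docs.pop(doc_id, None)
        if doc is not None:
            self.docs_by_author[doc["author_id"]].discard(doc_id)
            self.target.pop(doc_id, None)  # retire the stale row
```

Each method returns or touches only the affected row ids, which also gives a simple form of explainability: for any rebuilt row, the triggering source change is known.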
CocoIndex shines anywhere the source data churns and the agent needs current information: RAG pipelines over docs that update daily, semantic search over evolving knowledge bases, code-aware coding agents that can’t afford to re-embed the entire repo on every commit, and customer-context indexes wired to CRM updates. The realtime-codebase-indexing project specifically targets large repositories with near-real-time freshness, which is the missing piece for production-grade coding agents. The HackerNews demo project serves as a reference implementation to learn from for anyone building an “agent that watches a feed and extracts structure.”
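To make the "re-embed the entire repo on every commit" cost concrete, the sketch below memoizes an expensive embedding call by chunk content hash, so an edit to one part of a file triggers new embedding calls only for the chunks that changed. It is an illustrative assumption rather than how CocoIndex or cocoindex-code is implemented: `split_chunks`, `ChunkEmbeddingCache`, the fixed line-count chunking, and the `embed` callable are all hypothetical.

```python
import hashlib


def split_chunks(text: str, lines_per_chunk: int = 20) -> list[str]:
    """Naive fixed-size chunking by line count (an assumption for this sketch)."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


class ChunkEmbeddingCache:
    """Calls `embed` only for chunk contents it has not seen before."""

    def __init__(self, embed):
        self.embed = embed      # any callable: chunk text -> vector
        self.cache = {}         # chunk hash -> embedding
        self.embed_calls = 0    # how many times the expensive call ran

    def embed_file(self, text: str, lines_per_chunk: int = 20) -> list:
        vectors = []
        for chunk in split_chunks(text, lines_per_chunk):
            key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if key not in self.cache:
                self.cache[key] = self.embed(chunk)
                self.embed_calls += 1
            vectors.append(self.cache[key])
        return vectors
```

Fixed line-count chunking is fragile: inserting a single line shifts every later chunk boundary and invalidates their hashes. A splitter that follows syntactic units such as functions or classes would likely keep boundaries more stable across edits.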
CocoIndex is open source under Apache 2.0 — free to use and self-host. As a framework rather than a turnkey product, it requires engineering investment to wire into your agent stack, write transformation logic, and operate the underlying storage targets. For teams that just want vector search over a static document set, simpler products like Chroma or Pinecone may suffice. CocoIndex’s ROI is highest when source data changes frequently and full reindexing would be expensive.
CocoIndex is built for engineering teams shipping production agent systems where data freshness matters and full-rebuild indexing is too slow or too expensive. It pairs especially well with code-aware coding agents (via cocoindex-code), with research and monitoring agents that watch streaming sources, and with any RAG pipeline whose source documents update on a meaningful cadence. If your agent needs to know what’s true right now, not what was true at the last batch run, CocoIndex is the substrate.
Open-source multi-agent framework for AI-driven financial analysis — simulates a hedge fund with specialized analyst agents and a portfolio manager agent.
TypeScript deep financial research agent that produces investor-grade equity research reports — autonomously gathering data, modeling, and writing analyst-quality output.
An AI-powered data analysis agent that lets you chat with your data to generate insights, visualizations, and reports instantly.