Incremental Indexing for Live Documents

February 28, 2026 | 12 minutes

Once you add real retrieval to an AI system (something I covered in the previous post on hybrid search), the next problem shows up immediately: how do you keep the index up to date while the user is actively editing?

Re-building the entire index on every change is wasteful. Debouncing the whole document is sloppy. Running it nightly means your agent reasons over stale text. In collaborative systems built on tools like Yjs, documents aren't static blobs. They're living trees, and the index has to keep pace with them.

The key idea is this: don't debounce time, debounce structure. This post walks through how to build incremental indexing for live documents without melting your CPU, and why this matters for agent reliability in the kind of retrieval pipelines I've been writing about.

The Naive Approaches and Why They Fail

Before I landed on structural dirty-tracking, I tried (or at least considered) every obvious shortcut. None of them hold up once you're dealing with a real editing session.

Re-index the Entire Manuscript on Every Change

This is the simplest approach to reason about: something changed, so re-build everything. The problem is that it's completely impractical at scale. Every keystroke triggers a full re-index, which burns CPU, increases latency, and scales with the total document size every single time. For a short note, it's fine. For a 60,000-word document, it's a non-starter.

Debounce the Whole Document

The next idea I reached for is a time-based debounce: wait two seconds after typing stops, then re-build the entire index. This is better than re-indexing on every keystroke, but you're still re-building everything. The cost is proportional to the full document, not the change. And it feels arbitrary — you're guessing at the right delay rather than reacting to what actually changed.

Nightly Re-build

A scheduled nightly re-build is safe and predictable, but the index drifts throughout the editing day. If someone spends four hours revising a document, the search index is four hours stale by the time they ask the agent a question. In my experience, search results lying to your agent is worse than slow search. Stale evidence leads to stale reasoning, and that's a harder bug to track down than latency. It also erodes user trust in your agent.

The Structural Insight

Here's the pivot that made everything else fall into place: documents are trees.

You don't edit "a document." You edit a scene, a paragraph, a node. The document has structure, and that structure gives you a natural unit for tracking what changed. Instead of asking "did the document change?", you ask "which nodes changed?" and re-index only those.

The tradeoff is that you need stable IDs on your structural nodes. Think node IDs, chapter IDs, whatever maps to meaningful boundaries in your document. But if you're working with a structured editor (Slate, ProseMirror), you likely have those already.

The Incremental Indexing Pipeline

At a high level, the pipeline looks like this: a Yjs update comes in, you detect which node IDs were affected, add them to a dirty set, and let a background worker pick them up. The worker re-builds the text projection for each dirty node, updates the BM25 and vector indices, and clears the dirty flag.

Each step is simple on its own. The interesting part is getting the boundaries right: what counts as a "node," how you detect changes, and when the worker runs.

Detecting Changed Nodes in Yjs

Yjs gives you document updates as opaque byte arrays. You can't inspect individual characters directly, but you can track which subtrees changed by giving your structural nodes stable IDs (nodeId, paragraphId, etc.) and mapping updates to those IDs.

doc.on("update", (update, origin) => {
  const changedNodeIds = extractChangedNodeIds(update);

  for (const nodeId of changedNodeIds) {
    markNodeDirty(docId, nodeId);
  }
});

The extractChangedNodeIds function depends on your document schema, but the pattern is always the same: walk the update, figure out which structural nodes it touches, and mark them dirty. Section-level granularity is usually enough. You don't need paragraph-level precision to keep the index honest; you just need to know which scenes changed.
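As one illustration of that pattern, here is a minimal sketch. It assumes each structural node carries a stable `id` and that change events expose a path of nodes from the document root to the edited position — simplified stand-ins for illustration, not the real Yjs event API.

```typescript
// Hypothetical simplified shapes; real Yjs events look different.
interface TreeNode {
  id?: string;   // stable node ID, present on structural nodes
  type?: string; // e.g. "scene", "paragraph"
}

interface ChangeEvent {
  // Nodes from the document root down to the edited position.
  path: TreeNode[];
}

// Map a batch of change events to the set of affected structural node IDs:
// for each event, take the nearest ancestor that has a stable ID.
function extractChangedNodeIds(events: ChangeEvent[]): Set<string> {
  const changed = new Set<string>();
  for (const event of events) {
    // Walk from the edit site upward until we find an identified node.
    for (let i = event.path.length - 1; i >= 0; i--) {
      const node = event.path[i];
      if (node.id !== undefined) {
        changed.add(node.id);
        break;
      }
    }
  }
  return changed;
}
```

Because the walk stops at the nearest identified ancestor, the granularity of your dirty tracking is decided entirely by which nodes you give IDs to.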

The Postgres Schema

The data model is straightforward. You need four tables: one for the structural nodes themselves, one for their flattened text content, one for dirty tracking, and one for the search index chunks.

Node metadata stores the tree structure:

CREATE TABLE manuscript_nodes (
  id TEXT PRIMARY KEY,
  doc_id TEXT NOT NULL,
  parent_id TEXT,
  type TEXT,
  ord_path TEXT,
  updated_at TIMESTAMP
);

Node content projections hold the flattened text for each node. This is what the background worker re-builds when a node is marked dirty:

CREATE TABLE node_content (
  node_id TEXT PRIMARY KEY,
  text_projection TEXT,
  token_count INT,
  updated_at TIMESTAMP
);

Dirty tracking is intentionally simple, with just the document ID, node ID, and a timestamp:

CREATE TABLE dirty_nodes (
  doc_id TEXT,
  node_id TEXT,
  last_dirty_at TIMESTAMP,
  PRIMARY KEY (doc_id, node_id)
);

And the search index table holds the chunked text that BM25 and vector search actually query against:

CREATE TABLE node_chunks (
  node_id TEXT,
  chunk_ord INT,
  text TEXT,
  PRIMARY KEY (node_id, chunk_ord)
);

Your vector index would store embeddings per chunk alongside this. The key idea is that when a node gets re-indexed, you only replace the chunks for that node; everything else stays untouched.
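The replace-only-this-node's-chunks step can be sketched like this, using an in-memory map in place of the node_chunks table (the shapes and function name are assumptions, not a real database client):

```typescript
interface Chunk {
  nodeId: string;
  chunkOrd: number;
  text: string;
}

// In-memory stand-in for the node_chunks table, keyed by node ID.
const chunkStore = new Map<string, Chunk[]>();

// Replace all chunks for one node; chunks belonging to other nodes are
// never touched. Against Postgres this would be a DELETE plus an INSERT
// inside a single transaction so readers never see a half-indexed node.
function replaceNodeChunks(nodeId: string, texts: string[]): void {
  chunkStore.set(
    nodeId,
    texts.map((text, chunkOrd) => ({ nodeId, chunkOrd, text })),
  );
}
```

The transactional delete-and-insert is what keeps incremental updates safe: a query that runs mid-flush sees either the old chunks or the new ones, never a mix.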

One thing to watch for: re-chunking a node after an edit can shift chunk boundaries. If you're splitting by character count, adding a paragraph in the middle of a scene can cause every subsequent chunk in that scene to contain different text, even if those paragraphs didn't change. That means embeddings get recomputed for chunks that are only slightly different, and any citations pointing at a specific chunk ordinal (the chunk_ord column from the schema above) might now reference shifted content. Scene-level granularity helps contain the blast radius since you're only re-chunking one scene at a time, not the whole document. But the chunking strategy itself matters too.

Choosing a Chunking Strategy

How you split a node's text into chunks affects retrieval quality, index stability, and how much work you redo on each edit. There are several approaches, and the right one depends on what you're optimizing for.

Fixed-Size Character Chunking

The simplest option: split text into chunks of roughly 1,200 to 1,500 characters, breaking at the nearest whitespace. This is easy to implement and gives you predictable chunk sizes, which is useful when you need to stay within embedding model token limits. The downside is that edits in the middle of a scene shift every subsequent chunk boundary, so even unchanged paragraphs get re-embedded. Use this when you're prototyping or when chunk stability doesn't matter much, like if you re-build the full node on every dirty flush anyway.

Fixed-size character chunking splits at fixed intervals regardless of content structure
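A minimal sketch of this strategy — the 1,400-character default is an assumed midpoint of the range above:

```typescript
// Split text into chunks of roughly `target` characters, backing up to the
// nearest whitespace before the limit so words are never cut in half.
function chunkFixedSize(text: string, target = 1400): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + target, text.length);
    if (end < text.length) {
      // Back up to the last whitespace inside this window, if any.
      const lastSpace = text.lastIndexOf(" ", end);
      if (lastSpace > start) end = lastSpace;
    }
    chunks.push(text.slice(start, end).trim());
    start = end;
  }
  return chunks.filter((c) => c.length > 0);
}
```

Note the instability the section describes: insert a sentence near the start and every later `start`/`end` pair moves, so downstream chunks change even though their words did not.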

Paragraph-Boundary Chunking

Split on double newlines or paragraph breaks, then merge adjacent short paragraphs until you hit a target size. This is more stable than fixed-size chunking because adding a sentence to one paragraph doesn't affect the boundaries of the next. It also produces chunks that align with how people write, which tends to improve retrieval relevance since a paragraph usually contains one coherent idea. The tradeoff is that paragraph lengths vary widely. A one-line paragraph produces a tiny chunk with a weak embedding, and a long block of unbroken prose produces an oversized chunk. You need a minimum and maximum size to keep things reasonable.

Paragraph-boundary chunking splits at natural paragraph breaks and merges short paragraphs
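A sketch of the merge-short-paragraphs variant; the min/max defaults are assumptions you would tune for your embedding model:

```typescript
// Split on blank lines, then greedily merge consecutive paragraphs until a
// chunk reaches `minSize`, capping merged chunks at `maxSize`. A single
// paragraph longer than maxSize passes through oversized (the tradeoff
// noted above).
function chunkByParagraph(
  text: string,
  minSize = 500,
  maxSize = 2000,
): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if (current && current.length + para.length + 2 > maxSize) {
      chunks.push(current); // would overflow: emit and start fresh
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
    if (current.length >= minSize) {
      chunks.push(current); // big enough: emit
      current = "";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because boundaries land on blank lines the writer already typed, adding a sentence to one paragraph leaves every other chunk byte-identical.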

Sentence-Boundary Chunking

Split on sentence boundaries (periods, question marks, exclamation points followed by whitespace), then group sentences into chunks up to your target size. This gives you the most deterministic split points because editing one sentence almost never affects neighboring chunks. It works well for technical or legal documents where sentences are self-contained units of meaning. The tradeoff is that it's harder to implement correctly. Abbreviations, decimal numbers, and quoted dialogue all contain periods that aren't sentence boundaries. In my experiements, a regex-based sentence splitter gets you 90% of the way there, but edge cases will bite you if your content is messy.

Sentence-boundary chunking splits at sentence ends so edits don't cascade to neighboring chunks
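Here is the naive 90% version as a sketch — the regex deliberately has the weaknesses described above (it will mis-split on abbreviations like "Dr." and on decimals):

```typescript
// Split on sentence-ending punctuation followed by whitespace, then group
// sentences into chunks up to `target` characters.
function chunkBySentence(text: string, target = 1200): string[] {
  // Lookbehind keeps the punctuation attached to its sentence.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > target) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

For messy content you would swap the regex for a real sentence segmenter, but the grouping loop stays the same.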

Structural Chunking

If your document has explicit structure below the scene level such as labeled sections, headings within a scene, or numbered steps, you can chunk on those boundaries. This produces the most semantically meaningful chunks because each one maps to a structural unit the writer created intentionally. It's also the most stable, since structure changes less often than prose. The tradeoff is that it only works when the structure exists. Freeform narrative prose doesn't have sub-scene headings, so you'd fall back to paragraph or sentence chunking for those nodes.

Structural chunking maps each chunk to an author-defined section in the document tree
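A sketch of the split-on-structure-with-fallback shape. The "## " heading marker is an assumption; substitute whatever marks sub-scene boundaries in your schema:

```typescript
// Split a node's text on heading lines (assumed here to start with "## "),
// keeping each heading with the prose that follows it. Nodes with no
// headings have no usable structure and delegate to a fallback strategy
// (e.g. paragraph chunking).
function chunkByStructure(
  text: string,
  fallback: (text: string) => string[],
): string[] {
  const lines = text.split("\n");
  if (!lines.some((l) => l.startsWith("## "))) return fallback(text);

  const sections: string[] = [];
  let current: string[] = [];
  for (const line of lines) {
    if (line.startsWith("## ") && current.length > 0) {
      sections.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  sections.push(current.join("\n").trim());
  return sections.filter((s) => s.length > 0);
}
```

Any preamble before the first heading becomes its own chunk, which is usually what you want: it is a structural unit the writer created, even if unlabeled.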

Which One to Use

For most incremental indexing systems, paragraph-boundary chunking with min/max size limits is the sweet spot. It's stable enough that routine edits don't cascade, it aligns with natural writing units, and it's straightforward to implement. If your agent cites specific chunks in its responses and those citations need to stay valid across edits, sentence-boundary chunking gives you the most stability. If you're just getting started and want something working fast, fixed-size chunking is fine. You can always swap the strategy later since the chunking logic is isolated inside rebuildProjection.

The Background Worker Strategy

Don't re-index immediately on every keystroke. That would defeat the purpose. Instead, run a background worker on a schedule that balances freshness against editing performance.

A good default is to flush dirty nodes every 20 to 30 seconds while the user is actively editing. You also want to flush immediately on blur or tab-hidden events (the user just switched away, so there's no typing to interfere with). And you run a full re-build occasionally as a safety net, in case something slips through the cracks.
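That policy can be sketched as a small scheduler. The event wiring (setInterval, blur, visibilitychange) is left to the host environment, and `flush` is whatever persists the dirty set — in this system, the flushDirtyNodes worker:

```typescript
// Minimal flush scheduler: periodic ticks while the user is editing, plus
// an immediate flush when the editor reports blur or tab-hidden.
class FlushScheduler {
  private pending = false;

  constructor(
    private docId: string,
    private flush: (docId: string) => Promise<void>,
  ) {}

  // Called on a timer (e.g. every 20-30 seconds) during active editing.
  async tick(): Promise<void> {
    await this.runFlush();
  }

  // Called on blur / tab-hidden: flush now, nothing is typing.
  async onBlur(): Promise<void> {
    await this.runFlush();
  }

  private async runFlush(): Promise<void> {
    if (this.pending) return; // never overlap two flushes
    this.pending = true;
    try {
      await this.flush(this.docId);
    } finally {
      this.pending = false;
    }
  }
}
```

The `pending` guard matters more than it looks: without it, a slow flush overlapping the next tick could index the same node twice concurrently.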

async function flushDirtyNodes(docId: string) {
  const dirty = await getDirtyNodes(docId);

  for (const nodeId of dirty) {
    const text = rebuildProjection(nodeId);
    await updateSearchIndex(nodeId, text);
    await clearDirty(nodeId);
  }
}

The worker loop itself is simple. The complexity lives in rebuildProjection, which walks the Yjs subtree for that node and flattens it into plain text for indexing. Everything else is just plumbing.
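The flattening half of rebuildProjection can be sketched with plain objects standing in for Yjs types (the node shape here is an assumption):

```typescript
// Simplified stand-in for a Yjs subtree: text leaves plus nested containers.
interface ContentNode {
  text?: string;            // leaf text, if any
  children?: ContentNode[]; // nested structure (paragraphs, etc.)
}

// Depth-first walk that flattens a node's subtree into the plain-text
// projection stored in node_content.text_projection.
function flattenNode(node: ContentNode): string {
  const parts: string[] = [];
  if (node.text) parts.push(node.text);
  for (const child of node.children ?? []) {
    parts.push(flattenNode(child));
  }
  return parts.join("\n").trim();
}
```

The real version would also need to decide what structural elements become in the projection (headings, dialogue markers, and so on), but the walk itself stays this simple.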

In a browser environment, this should run in a Web Worker so the main thread stays free for editing. The indexer is doing CPU work (walking subtrees, flattening text, potentially computing embeddings) and none of that should compete with keystroke handling. Post a message to the worker with the list of dirty node IDs, let it do the re-building off-thread, and have it post back when it's done.

Why This Matters for Agents

This is the part that connects back to the retrieval pipeline from the previous post. Agents reason over retrieved evidence. If your index is stale, the retrieval layer returns outdated text, the evidence is wrong, the reasoning fails, and verification layers misfire because they're checking against bad data.

Incremental indexing doesn't just save CPU. It stabilizes agent cognition. When the index tracks changes at the structural level, the evidence stays fresh and the agent's reasoning stays grounded. A stale index is a silent source of hallucination: the agent isn't making things up, it's working with bad inputs.

Real-Time vs. Eventual Consistency

You don't need millisecond-perfect indexing. Users won't notice if search is 15 seconds behind while they're typing. They will notice if typing lags because the indexer is hogging the event loop.

The bias should be toward fast editing and eventual index freshness. Perfect real-time sync is an over-investment for this kind of system.

That said, there's one moment where freshness really matters: right before the agent responds. If a user edits a scene and immediately asks a question about it, the index might still have the old version queued for re-indexing. The agent would retrieve stale text and reason over content that no longer exists.

The fix is to give the agent a way to check for pending flushes and force one before it runs retrieval. Expose a simple API like hasPendingFlush(docId) and awaitFlush(docId) that the agent calls at the start of any search-dependent workflow. If dirty nodes exist, flush them inline before retrieving; if not, proceed immediately. This adds a few hundred milliseconds at most, and only when it matters.

async function ensureFreshIndex(docId: string) {
  const dirty = await getDirtyNodes(docId);
  if (dirty.length > 0) {
    await flushDirtyNodes(docId);
  }
}

// In the agent's search workflow:
await ensureFreshIndex(docId);
const results = await hybridSearch(docId, query);

This turns eventual consistency into "consistent at the moment of query," which is the guarantee that actually matters. The index can drift while the user is typing, but it snaps up to date the instant the agent needs it.

A Full Agent Workflow with Incremental Indexing

Here's how all the pieces fit together in a real editing session. The user edits a scene, Yjs marks it dirty, the background worker re-builds the projection and updates both the lexical and vector indices. When the user later runs a search, BM25 retrieves candidate scenes, vector search re-ranks them semantically, and Gemini reasons over the top evidence.

Each layer has a single responsibility. The indexer keeps the data fresh. The retrieval layer finds candidates. The LLM reasons over bounded evidence. Nothing is trying to do two jobs at once.

The Bigger Pattern

The hardest part of AI search isn't retrieval. It's change.

If you design your system so that structure is explicit, dirty nodes are tracked, retrieval is incremental, and reasoning is bounded, something interesting happens: your agents become calmer. They stop hallucinating over stale data. Your search becomes reliable because the index is honest. And your infrastructure scales because you're only doing work proportional to what changed, not proportional to the entire document.

What's Next

This post is part of a series on building better retrieval for agentic systems. Incremental indexing solves the "keeping the index honest" problem, but there's more to build on top of it. In upcoming posts, I'll be covering contradiction detection over incremental updates and more.