Introducing the Elastic Context Window

A patented technique that stretches any LLM's context window to handle datasets of any size.

Every LLM has a context window — the maximum amount of text it can process in a single call. Today's frontier models top out at around 1 million tokens. That sounds like a lot, until you're working with a large legal archive, a multi-year document repository, or a technical dataset that runs to tens of millions of words.

The standard workaround is RAG: retrieve the relevant chunks, discard the rest. But as we've written about elsewhere, retrieval is lossy — and lossiness means missed information, which means wrong answers when it matters most.

We needed a different approach. That approach is the Elastic Context Window (ECW).

The core idea

ECW treats the context window limit not as a ceiling, but as the size of a single unit of work. Rather than picking which documents to send to the model, ECW sends all of them, batched into parallel passes across the catalog, and asks the model to take structured notes on each pass. This is the map phase.

Those notes are then synthesized in a reduce phase: the model reads all the notes together and produces a final, coherent answer. Because every document was read in the map phase, nothing is dropped. The reduce phase assembles a complete picture.
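The two phases can be sketched in a few lines of Python. This is a minimal illustration of the map-reduce pattern described above, not Awarity's implementation: `call_llm` is a hypothetical stand-in for any chat-completion API, and the chunker uses a crude whitespace token estimate where a real system would use the model's tokenizer.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_TOKENS = 100_000  # assumed per-pass budget, kept under the model's window


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError


def chunk(documents: list[str], budget: int = CHUNK_TOKENS) -> list[list[str]]:
    """Group documents into batches that each fit one context window."""
    batches, current, used = [], [], 0
    for doc in documents:
        tokens = len(doc.split())  # crude estimate; use a real tokenizer in practice
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += tokens
    if current:
        batches.append(current)
    return batches


def map_phase(batch: list[str], question: str) -> str:
    """Read one batch of documents and return structured notes."""
    prompt = (f"Question: {question}\n\nRead the documents below and take "
              "structured notes on anything relevant.\n\n" + "\n---\n".join(batch))
    return call_llm(prompt)


def reduce_phase(notes: list[str], question: str) -> str:
    """Synthesize all notes into one final answer."""
    prompt = (f"Question: {question}\n\nSynthesize a final answer from these "
              "notes:\n\n" + "\n---\n".join(notes))
    return call_llm(prompt)


def answer(documents: list[str], question: str) -> str:
    batches = chunk(documents)
    with ThreadPoolExecutor() as pool:  # map passes run in parallel
        notes = list(pool.map(lambda b: map_phase(b, question), batches))
    return reduce_phase(notes, question)
```

Because `chunk` covers every document exactly once, the map phase reads the entire catalog regardless of its size; only the notes, which are far smaller, need to fit in the final reduce call.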

The result is a system that can reason accurately over datasets of any size — we've tested beyond 400 million tokens — using any underlying LLM. The model doesn't need to be modified. ECW is a layer above the model, not a change to it.

Why it works with any model

Because ECW operates at the orchestration layer, it works with any model that can follow structured note-taking instructions. We've run it with OpenAI models, Anthropic Claude, Google Gemini, and Llama variants — and the accuracy characteristics hold across all of them.

This is a meaningful operational advantage. You're not locked to a single provider. You can run your most sensitive workloads against a locally-deployed Llama model, fully on-prem, with no data leaving your environment.

What it means for accuracy

The benchmark results are clear. On questions where the answer is diffuse — spread across many documents or buried in low-retrieval-scoring passages — ECW consistently outperforms RAG. The accuracy gap widens with dataset size, because RAG's retrieval penalty compounds: the more documents there are, the more chances retrieval has to miss one.
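A back-of-the-envelope calculation shows why the penalty compounds. The numbers below are illustrative assumptions, not the post's benchmark figures: if a question's answer is spread across k passages and retrieval independently recalls each relevant passage with probability r, the chance of surfacing all of them is r**k.

```python
# Illustrative model of compounding retrieval loss (assumed numbers).
r = 0.95  # assumed per-passage recall of the retriever

for k in (1, 5, 20):  # number of passages the answer is spread across
    p_all = r ** k  # probability every relevant passage is retrieved
    print(f"k={k:2d}: P(all retrieved) = {p_all:.2f}")
# k= 1: 0.95 — single-passage answers are mostly fine
# k= 5: 0.77 — diffuse answers already degrade noticeably
# k=20: 0.36 — very diffuse answers fail more often than not
```

Under this toy model, even a strong retriever fails on diffuse questions most of the time once the answer spans enough passages; a system that reads everything pays no such penalty.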

On straightforward queries where the answer is in one obvious place, the results are comparable. ECW doesn't help much when retrieval already works well. But when retrieval would fail — in the cases that matter most in high-stakes domains — ECW delivers.

What we've built around it

ECW is the engine inside Awarity. On top of it we've built a full product: a UI for ingesting and cataloging documents, a CLI for workflow integration, cloud deployment templates for AWS and Azure, and integrations with VS Code and Word.

If you're dealing with large datasets and accuracy is non-negotiable, we'd love to show you what ECW can do against your data.

Reach us at hello@awarity.ai
