
How We Implemented RLMs Before the Paper Dropped

*When academic research validates production intuitions*

Aidan O'Gara

Contributor · Oct 20, 2018


A few weeks ago, the RLM (Recursive Language Model) paper from MIT went viral. "Destroyed context window limits." "10M+ token prompts now possible." Twitter exploded with takes about how this changes everything.

We read the paper with a strange sense of déjà vu. We'd already built something remarkably similar—months before the paper dropped. Not because we're prescient, but because the production problems we were solving forced us in this direction.

Here's the story of how we got here, what we learned, and why academic validation of production intuitions matters.


The Problem: Context Window Limits in Real Workloads

Context windows have grown dramatically. GPT-4 Turbo handles 128k tokens. Claude handles 200k. Gemini claims 1M+. You'd think the context window problem is solved.

It isn't.

Enterprise workloads routinely exceed even the largest context windows. Consider:

  • An M&A deal room with 500 documents totaling 2M tokens
  • A litigation discovery set with thousands of emails and attachments
  • A codebase with hundreds of thousands of lines across multiple repositories
  • An organization's decision traces accumulated over years of operation

You can't stuff this into a context window. And even if you could, attention mechanisms degrade with length. Information gets lost. The model "forgets" things that are technically in context.

The industry answer is RAG: chunk everything, embed it, retrieve relevant pieces, stuff those in the prompt. This works for simple lookup queries but fails for tasks requiring synthesis across many sources.
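
For contrast, here is a minimal sketch of that retrieve-then-stuff loop. The embed and llm callables are placeholders for whatever embedding model and LLM endpoint you use; nothing here refers to a specific library.

import numpy as np

def rag_answer(query, chunks, embed, llm, top_k=5):
    """Classic retrieve-then-read: embed every chunk, pull the top-k by
    cosine similarity to the query, and stuff them into one prompt."""
    chunk_vecs = np.array([embed(c) for c in chunks])
    q_vec = np.array(embed(query))

    # Cosine similarity between the query vector and each chunk vector
    sims = chunk_vecs @ q_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    retrieved = [chunks[i] for i in np.argsort(sims)[::-1][:top_k]]

    prompt = (
        "Answer the question using only the excerpts below.\n\n"
        + "\n---\n".join(retrieved)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)

Each chunk is scored independently, which is exactly why this breaks down when the answer depends on how chunks relate to one another.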

We needed something better.


The Intuition: What If the Model Could Navigate?

Our intuition came from watching how humans handle large information sets.

A skilled analyst dealing with a massive document corpus doesn't read everything sequentially. They:

  1. Formulate a question about what they're looking for
  2. Navigate to likely locations
  3. Skim to assess relevance
  4. Deep dive into relevant sections
  5. Aggregate findings across multiple sources
  6. Iterate if gaps are identified

This is recursive. Each step might spawn sub-steps. The question "What are the key risks in this deal?" might decompose into:

  • "What financial risks are mentioned in the financial statements?"
  • "What legal risks are identified in the contracts?"
  • "What operational risks appear in the management presentations?"

Each of these spawns further navigation. The process is inherently tree-structured.
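
One way to picture it: the decomposition is a small tree of questions, where every leaf is narrow enough to answer directly. A toy sketch, with illustrative wording for the sub-queries:

# Illustrative only: each node is (question, children); leaves get answered directly.
risk_tree = (
    "What are the key risks in this deal?",
    [
        ("What financial risks are mentioned in the financial statements?", []),
        ("What legal risks are identified in the contracts?", []),
        ("What operational risks appear in the management presentations?", []),
    ],
)

def leaf_questions(node):
    """Walk the tree and yield the questions that get asked against documents."""
    question, children = node
    if not children:
        yield question
    for child in children:
        yield from leaf_questions(child)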

Why couldn't a model do this?


The Implementation: Recursive Context Processing

We implemented what we called "recursive context processing" in the Context Engine:

def process(query, context_scope):
    # Assess whether the scope fits in the current context window
    if fits_in_window(context_scope):
        return direct_answer(query, context_scope)

    # Ask the model to decompose into (sub_query, sub_scope) pairs
    sub_tasks = decompose(query, context_scope)

    # Recursively process each sub-task
    sub_results = [process(sub_query, sub_scope)
                   for sub_query, sub_scope in sub_tasks]

    # Aggregate the sub-results into a final answer
    return aggregate(query, sub_results)
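
A top-level call then looks something like the following; load_scope and the path are purely illustrative stand-ins for however the corpus is addressed, not our actual API:

deal_room = load_scope("deal-room/")   # hypothetical helper returning a context scope
answer = process("What are the key risks in this deal?", deal_room)
print(answer)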

The key insight: the model decides how to decompose. We don't hard-code chunking strategies. The model looks at the query and the scope of context available, then determines how to break it down.

For a query about deal risks, it might decide to process by document type (financials, legal, operational). For a query about a specific person's involvement, it might process by time period. For a technical question, it might process by code module.

The decomposition is query-dependent and context-aware.
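
Here is a minimal sketch of what a model-driven decompose step can look like, assuming an llm callable that returns text and a list_sections helper that can outline the scope; both are assumptions, not our production interfaces, and they are passed explicitly here only to keep the sketch self-contained.

import json

def decompose(query, context_scope, llm, list_sections):
    """Ask the model itself how to split the work: it sees the question plus
    an outline of the scope and proposes (sub_query, sections) pairs."""
    outline = "\n".join(list_sections(context_scope))  # e.g. document titles, directories, time ranges
    prompt = (
        "You are planning how to answer a question over a corpus that is too "
        "large to read at once.\n"
        f"Question: {query}\n"
        f"Available sections:\n{outline}\n\n"
        'Return a JSON list of objects with keys "sub_query" and "sections".'
    )
    plan = json.loads(llm(prompt))
    return [(item["sub_query"], item["sections"]) for item in plan]

Nothing about chunk size or document boundaries is hard-coded; the split falls out of what the model sees in the outline.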


What We Learned

Lesson 1: Recursive Processing Preserves Information

Traditional RAG retrieves chunks without understanding their relationship to other chunks. If the answer to your question spans three paragraphs in different documents, RAG might retrieve each paragraph but lose the thread that connects them.

Recursive processing preserves the reasoning chain. The model decomposes into sub-questions, processes each, then synthesizes. The synthesis step explicitly considers how pieces fit together.

In our testing, this produced dramatically better results on synthesis-heavy queries—the kinds of queries that actually matter in enterprise settings.
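
A sketch of the synthesis step described above, assuming an llm callable (passed explicitly to keep the sketch self-contained) and sub_results as a list of sub-answers; the prompt wording is illustrative:

def aggregate(query, sub_results, llm):
    """Synthesis step: the model reconciles the sub-answers against the
    original question, rather than just concatenating retrieved chunks."""
    findings = "\n\n".join(
        f"Finding {i + 1}: {answer}" for i, answer in enumerate(sub_results)
    )
    prompt = (
        f"Original question: {query}\n\n"
        f"Findings from sub-questions:\n{findings}\n\n"
        "Write one answer that explains how these findings fit together, and "
        "flag any contradictions or gaps rather than papering over them."
    )
    return llm(prompt)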

Lesson 2: The Model Knows How to Navigate

Modern language models have been trained extensively on code navigation. They understand directory structures, cross-references, hierarchical organization. When you give them a navigable context (like Context FS provides), they naturally use navigation strategies.

We didn't have to teach the model to navigate. We just had to provide structure that made navigation worthwhile.
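
What "providing structure" means in practice: instead of one flat blob, the model sees something it already knows how to read, such as a directory-style outline it can open piece by piece. The layout below is purely illustrative, not the Context FS format:

# Illustrative outline; file names are made up for the example.
corpus_outline = """\
deal-room/
  financials/
    fy2023_statements.pdf
    fy2022_statements.pdf
  legal/
    master_services_agreement.docx
    change_of_control_amendments.docx
  management/
    q3_board_presentation.pptx
"""

def navigation_prompt(query, outline):
    """Frame the task as navigation: open things, then answer."""
    return (
        f"Question: {query}\n"
        f"You can see the following corpus:\n{outline}\n"
        "Reply with OPEN <path> to read a file, or ANSWER <text> when you are done."
    )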

Lesson 3: Compute Is Cheap; Context Is Expensive

Recursive processing uses more compute than single-pass processing. You might call the model 10, 20, 50 times to answer a single query.

We initially worried about this. Turns out, it doesn't matter much. LLM inference is getting cheaper rapidly. What's expensive is poor answers that require human correction. A query that costs $0.50 in compute but saves 30 minutes of analyst time is an obvious win.

The goal isn't minimizing compute. It's maximizing answer quality given the available context.
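
The back-of-envelope math, with purely illustrative numbers (none of these are our actual prices or rates):

# Illustrative numbers only; adjust for your own inference pricing and labor costs.
calls_per_query = 50      # recursive sub-calls for one hard query
cost_per_call   = 0.01    # dollars of inference per call (assumed)
analyst_rate    = 80.0    # dollars per hour of analyst time (assumed)
minutes_saved   = 30

compute_cost = calls_per_query * cost_per_call       # $0.50
time_value   = analyst_rate * minutes_saved / 60     # $40.00
print(f"compute: ${compute_cost:.2f}  vs.  analyst time saved: ${time_value:.2f}")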

Lesson 4: Trajectories Are Training Data

Every recursive processing chain generates a trajectory: the decomposition decisions, the navigation paths, the aggregation steps. These trajectories capture how to reason about complex context.

We quickly realized: these trajectories are training data. They show how the model successfully navigated complex contexts to answer real queries. They can be used for:

  • Fine-tuning the decomposition policy
  • Training specialized navigation models
  • Distilling long-context handling into smaller models

This aligns exactly with what the RLM paper notes: "RLM trajectories can be used for RL training or distillation."
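
Concretely, a captured trajectory is just structured data. A sketch of one record, with illustrative field names rather than our actual schema:

# Illustrative trajectory record; field names are not our production schema.
trajectory = {
    "query": "What are the key risks in this deal?",
    "decomposition": [
        {"sub_query": "What financial risks are mentioned?", "scope": "financials/"},
        {"sub_query": "What legal risks are identified?", "scope": "legal/"},
    ],
    "navigation": [
        {"action": "open", "target": "financials/fy2023_statements.pdf"},
        {"action": "open", "target": "legal/change_of_control_amendments.docx"},
    ],
    "aggregation": {
        "inputs": 2,
        "answer": "Key risks include ...",
    },
}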


When the Paper Dropped

The RLM paper formalized much of what we'd built intuitively:

  • Recursive decomposition to handle contexts beyond window limits
  • Sub-model calls to process chunks independently
  • Aggregation to synthesize sub-results
  • Trajectory capture for training and improvement

The framing was different: their paper emphasizes the theoretical framework and uses examples like "process a 10M-token database," while our implementation was driven by production requirements for enterprise context.

But the core pattern was the same. And seeing it validated academically was reassuring.


Why This Matters

The RLM paper is significant not just as a technique but as a paradigm shift.

Traditional LLM usage stuffs everything into a fixed context window and hopes attention mechanisms handle it. This is like having a meeting with everyone in the company simultaneously and hoping the relevant people speak up.

RLM-style processing enables a different paradigm: deliberate navigation through information space. The model moves through context purposefully, guided by the query, building understanding incrementally.

This is how humans handle large information sets. It's how effective retrieval systems should work. And it's what we needed to make AI useful for real enterprise workloads.


What's Next

We're continuing to develop these ideas:

Adaptive decomposition: Learning from successful trajectories which decomposition strategies work best for which query types.

Hierarchical context: Building multi-level navigation where the model can zoom in and out of detail levels.

Cross-session learning: Using trajectories from past queries to inform navigation strategies for similar future queries.

The context window problem isn't fully solved—there's still plenty of research to do on attention mechanisms, memory architectures, and reasoning strategies. But the recursive processing paradigm is a significant step forward, and we're excited to see it getting academic attention.

Sometimes production needs drive you to solutions that research validates later. That's a good sign that we're working on the right problems.


This is part of a series on technical innovations at Context. Learn more at context.inc
