6 Jun 2026

Sleep-time compute

The Context team

An agent doing real work produces a lot of exhaust. Every task leaves a raw trail: what it retrieved, what it tried, where it went wrong, what a human corrected. That trail is where the learning is, and it arrives in the worst possible form, raw and voluminous, at the worst possible time, while a user is waiting for an answer.

So we do not learn from it then. We learn from it later, on idle compute, the way a person consolidates the day's work overnight. We call it sleep-time compute.

The hot path is the wrong place to learn

At inference time, two things are scarce: latency and attention. The user is waiting, and every millisecond spent tidying up the record of the last task is a millisecond the current one is slower. Distilling a messy trace into clean, reusable context while the user watches is the wrong trade.

So the platform does almost none of it there. During a task it captures the raw trail cheaply and moves on. The expensive part, turning that trail into something the next agent can use, happens off the hot path entirely, where there is no user clock to run down.
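As a rough illustration of what "captures the raw trail cheaply" could look like, here is a minimal sketch. The names `RawEvent`, `TrailBuffer`, `capture`, and `drain` are hypothetical, not the platform's API, and an in-memory queue stands in for whatever durable log a real system would use. The only point is the split: the hot path appends, and everything else waits for the consolidator.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawEvent:
    """One entry in the raw trail (hypothetical shape)."""
    task_id: str
    kind: str      # e.g. "retrieval", "attempt", "error", "correction"
    payload: str
    ts: float = field(default_factory=time.time)


class TrailBuffer:
    """Append-only buffer: the hot path appends and never processes."""

    def __init__(self):
        self._events = deque()

    def capture(self, task_id, kind, payload):
        # Constant-time append. No parsing, summarising, or store writes here.
        self._events.append(RawEvent(task_id, kind, payload))

    def drain(self):
        # Intended to be called only by the off-hot-path consolidator.
        drained = list(self._events)
        self._events.clear()
        return drained

    def __len__(self):
        return len(self._events)
```

During a task the agent would call only `capture`; `drain` belongs to the idle-time job, so no distillation cost lands on the user's clock.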

There are only two other options, and both are worse. Do the distillation inline, and every task pays a tax to clean up after the last one, which users feel as latency. Or never do it, and the system never learns, throwing away the most valuable thing each task produced. Off-the-hot-path consolidation is how you get the learning without the tax.

Why sleep is the right word

The analogy to sleep is more than cute. Biological memory consolidation does something structurally similar: the day's raw experience is replayed and compressed into durable memory while the body is offline, precisely because doing it during waking hours would interfere with acting.

Shifting consolidation to idle time is the same move, for the same reason. The system acts during the day and learns at night, and neither gets in the other's way. The work stays fast because the learning waited for a quiet moment.

Process it while nothing is happening

Between active tasks, there is compute sitting idle and no user waiting on it. That is when sleep-time compute runs. It takes the raw activity from recent work and consolidates it into structured knowledge: the procedural notes, the corrections, and the recurring patterns. These are written back into the context layer, where future agents will find them.

The work that would have cost latency in the hot path costs users nothing on idle cycles, and it improves every retrieval that comes after. Tomorrow's tasks are faster and better because of processing that ran on time no one was using. Nothing today was slower for it.

Make it concrete. While building an earnings deck, an agent leaves a raw trail: the files it opened, the chart it drew, and an expert's correction that the numbers should be adjusted EBITDA and the waterfall should run quarter over quarter. Raw, that is a few hundred lines of event log, useless to the next agent as it stands. Sleep-time compute reads it and writes one durable note into the account's context: this account uses adjusted EBITDA, and waterfalls run quarter over quarter. The next agent reads a one-sentence note, not a few hundred lines of log.

What the pipeline does

Concretely, the consolidation is a three-stage pipeline. First, raw events are distilled into candidate observations. Next, signal is extracted from the noise: the recurring correction is kept and the one-off is discarded. Finally, the kept signal is written into the context store as structured, retrievable knowledge that the next agent reads at the right moment.

[Figure: the consolidation pipeline. Raw work events (traces, corrections, observations) -> Distill (raw to candidate notes) -> Extract signal (keep recurring, drop one-off) -> Curate into .context (structured, retrievable). Generator proposes, Reflector critiques, Curator writes. Runs on idle compute, in org, team, and individual tiers.]
Sleep-time compute turns the raw exhaust of work into structured context, off the hot path, while no user is waiting.
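
The three stages can be sketched in a few lines. This is a toy under stated assumptions, not the production pipeline: the function names are made up, "distill" is reduced to normalising the text of corrections, and "recurring" is taken to mean "seen at least `min_count` times", where a real system would presumably use a model for both steps.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    account: str
    kind: str   # "trace", "correction", or "observation"
    text: str


def distill(events):
    """Raw events -> candidate notes (assumption: only corrections qualify)."""
    return [
        (e.account, " ".join(e.text.lower().split()))
        for e in events
        if e.kind == "correction"
    ]


def extract_signal(candidates, min_count=2):
    """Keep candidates that recur; drop one-offs. min_count is an assumed knob."""
    counts = Counter(candidates)
    return [cand for cand, n in counts.items() if n >= min_count]


def curate(store, kept):
    """Write kept notes into a per-account store, skipping exact repeats."""
    for account, note in kept:
        notes = store.setdefault(account, [])
        if note not in notes:
            notes.append(note)
    return store


events = [
    RawEvent("acme", "trace", "opened q3.xlsx"),
    RawEvent("acme", "correction", "Use adjusted EBITDA"),
    RawEvent("acme", "correction", "use  adjusted ebitda"),
    RawEvent("acme", "correction", "make the title blue"),
]
store = curate({}, extract_signal(distill(events)))
print(store)
```

In this run the repeated EBITDA correction survives as a single note, while the trace and the one-off styling request never reach the store.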

It runs concurrently in three tiers (org, team, and individual), because what a whole organization should learn and what one person's assistant should learn are different consolidations. A pattern that belongs to everyone is written high in the tree; a preference that belongs to one person stays with them.
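
One way to picture the tiering decision is as a routing rule over who has exhibited a pattern. The rule below is purely an assumption for illustration (the thresholds, the `Sighting` type, and `choose_tier` are all hypothetical); the text above only says that shared patterns go high in the tree and personal preferences stay with the person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One occurrence of a pattern, tagged with who produced it."""
    user: str
    team: str


def choose_tier(sightings, org_min_teams=2, team_min_users=2):
    """Pick the tier a note is written to (assumed rule, assumed thresholds)."""
    teams = {s.team for s in sightings}
    users = {s.user for s in sightings}
    if len(teams) >= org_min_teams:
        return "org"          # seen across teams: belongs to everyone
    if len(users) >= team_min_users:
        return "team"         # seen across people within one team
    return "individual"       # one person's preference stays with them
```

A correction made by two analysts on the same team would land in the team tier under this rule; the same correction from two different teams would be promoted to the org tier.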

Generator, Reflector, Curator

The heart of it is a loop borrowed from recent research. A Generator proposes updates to the context from the new activity. A Reflector critiques them, asking whether a proposed lesson is real, consistent with what is already known, and worth keeping. A Curator decides what actually gets written, incrementally, into the store.
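
To make the division of labor visible, here is a deliberately small sketch of one cycle. The three roles would be model calls in practice; here they are rule-based stand-ins, and every name (`Proposal`, `generate`, `reflect`, `curate`, `run_cycle`, the `min_support` threshold) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    key: str       # what the lesson is about
    value: str     # the lesson itself
    support: int   # how many events back it


def generate(activity):
    """Generator: propose one update per distinct (key, value) in the activity."""
    tally = {}
    for key, value in activity:
        tally[(key, value)] = tally.get((key, value), 0) + 1
    return [Proposal(k, v, n) for (k, v), n in tally.items()]


def reflect(proposal, store, min_support=2):
    """Reflector: is the lesson real, consistent, and worth keeping?"""
    if proposal.support < min_support:
        return "reject: one-off"
    if store.get(proposal.key) == proposal.value:
        return "reject: already known"
    if proposal.key in store:
        return "hold: conflicts with store"
    return "accept"


def curate(store, proposal, verdict):
    """Curator: write incrementally, and only what the Reflector accepted."""
    if verdict == "accept":
        store[proposal.key] = proposal.value


def run_cycle(activity, store):
    log = []
    for proposal in generate(activity):
        verdict = reflect(proposal, store)
        curate(store, proposal, verdict)
        log.append((proposal.key, proposal.value, verdict))
    return log
```

The restraint lives in `reflect`: most proposals are rejected or held, and the store changes only on an explicit accept.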

The point of the loop is restraint. Naively appending everything an agent noticed would bloat the context with noise and contradictions, and a noisy context retrieves worse, not better. The Generator, Reflector, and Curator cycle is what keeps the layer growing in signal rather than in volume. The goal is not a bigger store. It is a sharper one.

Consistency is the Reflector's real job. When a new lesson contradicts one already in the store, appending both would leave the next agent with two answers and no way to choose. The Reflector catches the conflict, and the Curator resolves it, updating the older note or scoping the new one to the case where it applies, so the store stays coherent rather than accumulating contradictions. Writes are deduplicated by meaning, so the same lesson learned ten times is recorded once, and the repetition raises its confidence instead of adding copies.
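
A sketch of those write semantics, under loud assumptions: "same meaning" is approximated by comparing the set of words in a claim, which is far cruder than the semantic matching a real store would need, and the `Note` shape, the `scope` field, and the `write` function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Note:
    topic: str
    claim: str
    scope: str = "*"      # "*" means the note applies everywhere
    confidence: int = 1


def meaning_key(text):
    # Stand-in for semantic matching: ignores case, punctuation, word order.
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def write(store, new):
    """Write one note; returns what happened to it."""
    for note in store:
        if note.topic != new.topic or note.scope != new.scope:
            continue  # different topic, or a scoped note that can coexist
        if meaning_key(note.claim) == meaning_key(new.claim):
            note.confidence += 1          # same lesson again: reinforce
            return "reinforced"
        note.claim = new.claim            # same topic and scope, new claim:
        note.confidence = 1               # update the older note in place
        return "updated"
    store.append(new)
    return "added"


store = []
print(write(store, Note("ebitda", "Use adjusted EBITDA")))
print(write(store, Note("ebitda", "use EBITDA, adjusted.")))
print(write(store, Note("ebitda", "Use reported EBITDA", scope="board deck")))
print(len(store), store[0].confidence)
```

The second write repeats the first lesson in different words, so it bumps the confidence of the existing note. The third contradicts it only within a narrower scope, so it is stored alongside as a scoped note rather than overwriting the general one.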

When it runs

Two triggers fire it. A scheduled cadence, on the order of every few hours, sweeps recent activity on a regular beat. And a capacity threshold fires when unprocessed activity piles up past a configurable limit, so a burst of work gets consolidated promptly instead of waiting for the next scheduled pass.
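
The two triggers reduce to a small decision function. The sketch below is illustrative only: the class and method names are invented, the four-hour interval is one reading of "every few hours", the backlog limit of 500 is an arbitrary placeholder for the configurable limit, and skipping a scheduled pass when there is nothing to process is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class Triggers:
    interval_s: float = 4 * 3600   # assumed value for "every few hours"
    backlog_limit: int = 500       # assumed value for the configurable limit
    last_run: float = 0.0          # timestamp of the last consolidation pass

    def reason_to_run(self, now, backlog):
        """Return which trigger fires, or None if consolidation should wait."""
        if backlog > self.backlog_limit:
            return "capacity"      # burst of work: do not wait for the schedule
        if backlog > 0 and now - self.last_run >= self.interval_s:
            return "schedule"      # regular sweep of recent activity
        return None

    def mark_ran(self, now):
        self.last_run = now
```

A scheduler on the idle pool would poll `reason_to_run` with the current time and the count of unprocessed events, run a pass when it returns a reason, and then call `mark_ran`.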

Steady state runs on the schedule; spikes get caught by the threshold. Neither competes with live inference, because both run on the idle pool, decoupled from the workload serving users. The learning never borrows from the latency budget of the work.

Accumulate now, present later

One design choice underneath all of this is worth pulling out, because it is the own-the-data-rent-the-methods principle applied to the context layer. There is a hard line between accumulating knowledge and presenting it. The store is the durable accumulation: the raw and distilled record of what the organization has learned.

How that store gets compiled into the right context for a given prompt is a separate, swappable layer. The presentation side is built to be replaced: today's prompt-optimization method can give way to tomorrow's without reprocessing the store underneath. The methods for turning context into a prompt will keep improving. The accumulated context does not have to be rebuilt each time one does, which is the point of keeping the two apart.

The payoff is concrete. When a better prompt-optimization method ships, you point the compiler at the existing store and get better prompts immediately, with no reprocessing of a year of distilled context. You rent the newest method and keep the data it runs on. That is the whole thesis of the context layer, in one operational detail.
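
In code, the separation amounts to treating the compiler as a plain function over an unchanged store. The sketch below assumes a store that is just a list of notes and two made-up compilers, `compile_all` and `compile_top_k`; neither is a real prompt-optimization method, they only show that swapping one for the other never touches the data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Note:
    text: str
    confidence: int


# A compiler turns (store, task) into a prompt. It reads the store, never writes it.
Compiler = Callable[[list, str], str]


def compile_all(notes, task):
    """Older method (hypothetical): include every note."""
    body = "\n".join(f"- {n.text}" for n in notes)
    return f"Context:\n{body}\n\nTask: {task}"


def compile_top_k(notes, task, top_k=2):
    """Newer method (hypothetical): include only the most confident notes."""
    ranked = sorted(notes, key=lambda n: n.confidence, reverse=True)[:top_k]
    body = "\n".join(f"- {n.text}" for n in ranked)
    return f"Context:\n{body}\n\nTask: {task}"


def build_prompt(store, task, compiler):
    return compiler(store, task)


store = [
    Note("Use adjusted EBITDA", 5),
    Note("Waterfalls run quarter over quarter", 3),
    Note("Title slide is blue", 1),
]
print(build_prompt(store, "Draft the Q3 earnings deck", compile_top_k))
```

Upgrading from `compile_all` to `compile_top_k` is a one-argument change at the call site. The store is the same object before and after.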

Do the learning in its sleep

The cheap insight is that agents generate the data to improve themselves just by working. The expensive part is turning that raw data into something usable, and the trick is to never pay for it in the hot path. Capture cheaply while the user waits; consolidate on idle compute while no one does; keep the accumulation durable and the presentation swappable. The agent that runs your task tomorrow is better than today's, and not one of your tasks today was slower for it. That is the whole point of doing the learning in its sleep.