7 Jul 2026

Own your intelligence: the state of enterprise AI in Q3 2026

The Context team

The model is commoditizing and value is migrating to the application layer. Most firms are buying the wrong kind of application layer.

Where enterprise AI actually stands

Walk into a large enterprise in mid-2026 and the picture is remarkably consistent. Copilots are deployed across every function, adoption dashboards are green, and spend is climbing. Ask the people doing the work what has changed, and the honest answer is usually: not much. AI is everywhere and has changed very little, because most of what firms bought is pass-through. The same model, accessed through a vendor’s interface, produces the same artifacts the team was already producing, somewhat faster. The productivity gain is real, but it accrues identically to every buyer of the same product, and it leaves when the contract does.

This piece is about why that gap exists and what to do about it. The argument, stated up front: the model layer is commoditizing, and value is migrating to the application layer. The application layer most firms have today was built to bill for usage, not to capture anything. Firms that build a capture-first application layer will compound an asset no competitor can buy. Firms that keep buying pass-through will not. And the path to the thing everyone now wants, a model of the firm’s own, runs through a sequence most firms are executing in the wrong order.

The state of the market

Three forces are moving value away from the model layer, where most executive attention has been spent.

First, the public benchmarks have saturated. Frontier models now cluster within roughly ten percent of one another on MMLU, GSM8K, MATH-500, and GPQA Diamond. A leaderboard position confirms that a model is in the capability band; it no longer tells you which one to buy.

Second, the model itself is commoditizing. Near-identical capability is available across a fifty-fold price range, much of it on open weights. A capable model is now a utility input, and token prices continue to fall.

Benchmark composite versus list price per million output tokens, July 2026. Representative published scores and list prices.

Third, adoption has outrun integration. Enterprise AI spend is up roughly 320 percent since 2023 while token costs have fallen roughly 280x, yet only about five percent of firms report capturing measurable value. The work happens, but the value does not stick, because nothing in the deployment was built to make it stick.

Meanwhile the cost curve is bending against buyers. As labs shift to usage-based pricing and agentic workloads multiply consumption, a seat that costs hundreds of dollars today plausibly costs thousands next year. In early July, Palantir’s Alex Karp described enterprise leaders as “livid” over what he called tokenmaxxing: consumption maximized, value unproven. Weeks earlier, Satya Nadella warned that entire industries risk finding their knowledge “commoditized right out from underneath them.” The benchmarks are imperfect proxies, the five-percent figure is contested, and both executives have commercial positions. But the underlying mechanism is hard to dispute. Every firm now has access to the same models, the same tokens, and the same capability band. There is no durable differentiation at the model layer, and value capture there is depressed and falling. The question is where it goes instead.

Where value accrues, and where it leaks

It goes to the application layer: the surface where the model meets the firm’s actual work, context, and judgment. That is the only place differentiation is possible, because that is where proprietary information lives and where processes actually change. Value accrues to whoever owns the application layer that captures it.

The problem is that the application layer most firms buy does not capture anything. Copilots, coding agents, assistant seats, and vertical tools are all pass-through. Every company runs the same model through a vendor-owned interface and produces the same artifacts its teams already produced. The gain is identical for every buyer. There is genuine demand for AI in the workflow, but nothing about the business changes: the model passes through, the artifact gets produced, and the work goes back to how it was. Value accrues to the application layer in theory and leaks out in practice, because a pass-through interface is not a capture surface.

The subtler version of this catches firms two years in. Fifteen copilots deployed across functions, dashboards showing healthy usage, a procurement narrative about AI everywhere. Underneath it, fifteen rented surfaces feeding fifteen vendor-owned learning loops, none of which the firm controls or can carry forward. Tokenmaxxing is not just overpaying for tokens. It is paying to build other people’s application layers while your own never gets built.

The frontier labs make this worse when they sell the application layer themselves. A router owned by the seller of the most expensive tokens cannot credibly optimize your cost down. The compounding asset accumulates in the lab’s environment rather than yours. And a lab racing to build the best general models cannot simultaneously act as the neutral custodian of the firms it learns from. Use the labs as an input. Do not let them be the custodian.

The right application layer

The answer is not another tool. Every tool on the market shares the same structural failure: it asks the firm to adapt its workflow to the tool, and it produces the same artifacts slightly faster instead of enabling the net-new processes that would actually change how the team works. The face of enterprise AI today is old artifacts with AI inside. The opportunity is new processes the team owns, processes that did not exist before and that improve with use. No tool that requires you to adapt your workflow to it can deliver that, because the workflow is the thing you are trying to reinvent.

What is needed is a workspace: an application layer where every team, not just engineers, can build use cases in an environment the firm owns, on context the firm controls, with proprietary signal captured by default. The capture layer is not a separate engineering project. It is the substrate the team builds on, which means every use case shipped is also an installment of the asset. The team describes the work; the platform handles routing, evaluation, governance, cost telemetry, and memory.

This answers the objection that matters most. Building a capture layer sounds like a lot to ask of a business that adopted AI to accelerate, not to become an AI infrastructure company. On the right platform, capture, evaluation, and governance are properties of the workspace rather than engineering projects. The firm does not need to become an infrastructure company. It needs an application layer that already is one.

What compounds

The durable asset is not the firm’s data. External data commoditizes and proprietary data ages. The durable asset is the tacit judgment in senior practitioners’ heads and in the flow of work: the exception handled a particular way, the draft corrected for a particular reason, the grounds on which an output was accepted or rejected. This signal is created continuously and captured almost nowhere. The capabilities that compound are the ones that capture it at the moment of creation and convert it into assets the firm owns.

Nadella frames this as two kinds of capital: human capital, the knowledge and pattern recognition of the firm’s people, and token capital, the AI capability the firm builds and owns. “You can offload a task, or even a job,” he wrote, “but you can never offload your learning.” The test is simple. Swap the model. Does the capability survive? If yes, the firm built genuine token capital. If no, it built a wrapper.

The model-swap test, shown schematically. A pass-through tool’s capability tracks the model underneath it; a capture-first application layer retains its context, evals, and memories across the swap. Shaded band: variation across task suites.
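One way to make the model-swap test concrete is to score the same task suite twice, once per model, and compare how much of the pass rate survives. The sketch below is illustrative only: the model names, the task identifiers, and the two toy systems are hypothetical stand-ins, not any vendor's API or measured data.

```python
from typing import Callable

# Hypothetical interface: a system takes (model_name, task_id) and returns True on a pass.
System = Callable[[str, str], bool]

def pass_rate(system: System, model: str, tasks: list[str]) -> float:
    """Fraction of tasks the system passes when backed by `model`."""
    return sum(system(model, t) for t in tasks) / len(tasks)

def swap_retention(system: System, old_model: str, new_model: str, tasks: list[str]) -> float:
    """Pass rate after the swap divided by pass rate before it (1.0 means nothing was lost)."""
    before = pass_rate(system, old_model, tasks)
    after = pass_rate(system, new_model, tasks)
    return after / before if before else 0.0

# Toy data (assumptions for illustration): which tasks each model can solve unaided,
# and which tasks the firm-owned context, evals, and memories cover.
MODEL_KNOWLEDGE = {"model-a": {"t1", "t2", "t3", "t4"}, "model-b": {"t1", "t2"}}
FIRM_CONTEXT = {"t2", "t3", "t4", "t5"}

def wrapper(model: str, task: str) -> bool:
    # Pass-through: capability is whatever the underlying model brings.
    return task in MODEL_KNOWLEDGE[model]

def capture_first(model: str, task: str) -> bool:
    # Capture-first: firm-owned context fills gaps regardless of the model.
    return task in MODEL_KNOWLEDGE[model] or task in FIRM_CONTEXT

TASKS = ["t1", "t2", "t3", "t4", "t5"]
print(swap_retention(wrapper, "model-a", "model-b", TASKS))        # 0.5
print(swap_retention(capture_first, "model-a", "model-b", TASKS))  # 1.0
```

A retention near 1.0 suggests the capability lives in the application layer; a retention that tracks the model's own drop suggests a wrapper.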

Qualcomm ran a version of this test in production. The same foundation model went from a 23 percent task pass rate to 98 percent in four months, with no fine-tuning and no model change. Every point of the gain came from the application layer: retrieval over indexed documentation, learned memories, and an eval-gated improvement loop. The capability was never in the weights. It was in what the weights could see.
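In its simplest form, an eval-gated improvement loop keeps a candidate change to the application layer only when the private eval score improves. The following is a minimal sketch under that assumption. The `evaluate` function, the representation of context as a set of strings, and the candidate items are all hypothetical; this is not a description of Qualcomm's implementation.

```python
from typing import Callable, Iterable

def eval_gated_loop(
    baseline: frozenset[str],
    candidates: Iterable[str],
    evaluate: Callable[[frozenset[str]], float],
) -> tuple[frozenset[str], list[float]]:
    """Try each candidate addition; keep it only if the eval score strictly improves."""
    current = baseline
    best = evaluate(current)
    history = [best]
    for item in candidates:
        trial = current | {item}
        score = evaluate(trial)
        if score > best:  # the gate: regressions and no-ops are discarded
            current, best = trial, score
        history.append(best)
    return current, history

# Toy private eval (assumption): a task passes if the context item it needs is present.
NEEDS = {"task1": "doc:api", "task2": "memory:naming", "task3": "doc:api", "task4": "doc:build"}

def evaluate(context: frozenset[str]) -> float:
    return sum(need in context for need in NEEDS.values()) / len(NEEDS)

final, history = eval_gated_loop(
    frozenset(), ["doc:api", "doc:unused", "memory:naming", "doc:build"], evaluate
)
print(sorted(final))  # ['doc:api', 'doc:build', 'memory:naming']
print(history)        # [0.0, 0.5, 0.5, 0.75, 1.0]
```

The model never changes in this loop; only what it can see does, which is the point of the Qualcomm example.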

Five capabilities compound, and together they form one lifecycle: routing to the cheapest sufficient model (one sponsor cut monthly token cost 57 percent while usage roughly doubled); private evaluation scored against the firm’s own rubrics; observability into where AI value accrues, firm-wide and per team; ring-fenced sovereignty with an ejection path that keeps everything; and firm-specific models trained on the firm’s captured work. That last capability is the one everyone now talks about. It is also the one most firms get wrong.
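Routing to the cheapest sufficient model can be sketched as a filter followed by a minimum: keep the models whose private-eval score clears the bar for a task type, then pick the lowest-priced one. The model names, prices, and scores below are invented for illustration, and the fallback rule is an assumption rather than any platform's documented behavior.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical model identifier
    price_per_mtok: float  # list price per million output tokens (illustrative)
    eval_score: float      # score on the firm's private eval for this task type, 0..1

def cheapest_sufficient(options: list[ModelOption], min_score: float) -> ModelOption:
    """Return the lowest-priced option whose private-eval score meets the bar.

    Falls back to the highest-scoring option if none is sufficient.
    """
    sufficient = [o for o in options if o.eval_score >= min_score]
    if sufficient:
        return min(sufficient, key=lambda o: o.price_per_mtok)
    return max(options, key=lambda o: o.eval_score)

OPTIONS = [
    ModelOption("frontier-large", 30.0, 0.97),
    ModelOption("mid-tier", 3.0, 0.93),
    ModelOption("open-small", 0.6, 0.81),
]

print(cheapest_sufficient(OPTIONS, min_score=0.90).name)  # mid-tier
print(cheapest_sufficient(OPTIONS, min_score=0.80).name)  # open-small
print(cheapest_sufficient(OPTIONS, min_score=0.99).name)  # frontier-large (fallback)
```

The design choice that matters is the source of `eval_score`: the router only saves money safely if sufficiency is judged against the firm's own rubric rather than a public leaderboard.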

Measured, from one sponsor deployment: monthly token cost fell 57 percent between months two and three while usage roughly doubled over the period.
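The two figures in that measurement can be combined into a cost per unit of usage. Treating "roughly doubled" as exactly 2x, which is an assumption made only for the arithmetic, a 57 percent drop in total cost implies that the cost of each unit of work fell by roughly 78 percent.

```python
def unit_cost_change(total_cost_change: float, usage_multiplier: float) -> float:
    """Fractional change in cost per unit of usage.

    total_cost_change: e.g. -0.57 for a 57 percent drop in monthly cost.
    usage_multiplier:  e.g. 2.0 if usage doubled over the same period.
    """
    new_total = 1.0 + total_cost_change      # new total cost relative to the old one
    return new_total / usage_multiplier - 1.0

change = unit_cost_change(-0.57, 2.0)
print(f"{change:.1%}")  # -78.5%
```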

Every company wants its own model

A thesis is forming in the more ambitious corners of the market: every company should eventually have its own model, trained on its captured work, reasoning from its first principles, owned outright. It is the logical terminus of the capture strategy. If the application layer is the firm’s proprietary judgment accumulated over time, then at sufficient volume that judgment can be encoded in weights the firm owns. The model becomes the asset.

Getting there requires four things, in order. Capture: the traces, rubrics, and preference data generated in the flow of work. Dataset: that captured signal turned into something a model can learn from, through curation, labeling, and structured annotation. Synthesis: much of the training data will be synthetic, generated from captured patterns; this is a specialized discipline, and it is where model quality is largely determined. Training infrastructure: compute, orchestration, evaluation loops, and validation, a cloud-scale build that no financial services firm, manufacturer, or healthcare system should attempt in-house.

Most firms get the sequence backwards. They stand up infrastructure first and discover they have nothing to train on, because the capture layer was never running. That is building the factory before securing the materials. The right sequence is to start capture now, on a workspace that captures by default; let the dataset accumulate from work the team is already doing; and stand up training infrastructure, with a partner, once the capture is deep enough to justify it. The firm owns the capture and the dataset. The synthesis pipeline and the training infrastructure come from a partner whose business is building them, and the partner is present throughout: the workspace captures from day one, the platform turns capture into training-ready data as it grows, and the training infrastructure is there when the firm is ready. The firm never has to become an AI infrastructure company. It has to become a company that captures its own work.

Schematic. Two firms decide at month zero to pursue a firm-specific model; the capture-first firm reaches the training threshold in month 4, the firm that builds infrastructure for nine months first reaches it in month 13. Infrastructure can be bought and parallelized; capture time cannot be compressed.
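The schematic's arithmetic can be written down directly: training can start only when both enough captured signal and the infrastructure exist, so the ready month is the later of the two. The four-month capture time and the nine-month build come from the schematic; the three-month partner lead time is a hypothetical figure added for illustration.

```python
def training_ready_month(capture_start: int, capture_months: int,
                         infra_start: int, infra_months: int) -> int:
    """First month in which both the captured dataset and the infrastructure are ready."""
    capture_ready = capture_start + capture_months
    infra_ready = infra_start + infra_months
    return max(capture_ready, infra_ready)

CAPTURE_MONTHS = 4  # from the schematic: the capture-first firm hits the threshold in month 4

# Capture-first: capture starts at month 0 while a partner provisions infrastructure
# in parallel (the 3-month lead time is an assumption, not a figure from the article).
capture_first = training_ready_month(0, CAPTURE_MONTHS, infra_start=0, infra_months=3)

# Infrastructure-first: nine months of build, and capture only starts afterwards.
infra_first = training_ready_month(9, CAPTURE_MONTHS, infra_start=0, infra_months=9)

print(capture_first, infra_first)  # 4 13
```

Shortening the infrastructure term does nothing for the second firm until capture starts earlier, which is the sense in which capture time cannot be compressed.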

One caution from the field before treating the model as the finish line. When Qualcomm distilled its captured work into small models it owns, the distilled models scored within two points of the frontier teacher at a fraction of the cost. Cut off from the live context graph, the same distilled models lost eight points. The weights are an artifact of the loop, not a replacement for it. Even when the firm owns the model, the loop is the asset.

Reinforcement learning as a service

The capabilities above are properties of a workspace built to have them. What makes them compound into genuine token capital is reinforcement learning as a service: the loop in which captured traces and expert feedback are fed back into the system so that every run improves the next one. This is not a feature the firm builds. It is a service the platform provides on top of the capture layer. Traces show what happened. Evals measure whether it was good. The reinforcement loop trains the system to produce more of the good and less of the bad, against the firm’s own definition of good. The firm owns the capture and the judgment; the platform runs the loop. This is why the partner matters throughout the lifecycle: the loop runs continuously on the firm’s captured work, and the infrastructure that runs it belongs to the partner.
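To make the inputs to that loop concrete, here is a minimal sketch of what one captured run might look like and how expert verdicts could be turned into preference pairs for a reinforcement step. The field names, the example records, and the pairing rule are all hypothetical; real platforms will differ.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class RunRecord:
    task_id: str
    steps: list[str]   # trace: tool calls and decisions, in order
    output: str
    eval_score: float  # private eval score against the firm's rubric, 0..1
    accepted: bool     # expert verdict
    reason: str = ""   # why the expert accepted or rejected the output

def preference_pairs(records: list[RunRecord]) -> list[tuple[str, str, str]]:
    """Build (task_id, preferred_output, rejected_output) triples within each task."""
    by_task: dict[str, list[RunRecord]] = {}
    for r in records:
        by_task.setdefault(r.task_id, []).append(r)
    pairs = []
    for task_id, runs in by_task.items():
        good = [r for r in runs if r.accepted]
        bad = [r for r in runs if not r.accepted]
        pairs.extend((task_id, g.output, b.output) for g, b in product(good, bad))
    return pairs

# Invented example data.
RECORDS = [
    RunRecord("contract-review", ["retrieve_clauses", "draft"], "draft A", 0.55, False,
              "missed the indemnity carve-out"),
    RunRecord("contract-review", ["retrieve_clauses", "check_playbook", "draft"], "draft B",
              0.92, True, "matches house position"),
    RunRecord("invoice-match", ["lookup_po", "match"], "match 1", 0.88, True),
]

print(preference_pairs(RECORDS))  # [('contract-review', 'draft B', 'draft A')]
```

A task with only accepted runs yields no pair, which is one reason rejected outputs and the stated reasons for rejection are worth capturing rather than discarding.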

Where forward-deployed engineers fit

Every vendor now sells forward-deployed engineers, and most firms end up using them as a permanent dependency. The FDEs land, build the workflows, and stay, and the capability runs only as long as the vendor’s engineers are in the room. That is a managed service dressed up as a partnership.

The right model is on, off, on. In phase one, FDEs embed alongside the first team for enablement and discovery, explicitly time-boxed, with a defined deliverable: a team that can build without them. In phase two, the FDEs step back, the team self-serves on the workspace, and it finds the use cases nobody scoped. In phase three, FDEs return only for the high-leverage moments, the work that compounds. The tell is in the contract. If the FDEs never leave, the firm is renting engineers. If the phases are scoped and named, the firm is buying a capability that compounds.

Deployment takeaways

The question is not which model to standardize on. The model is the cheapest and most fungible input, and the answer matters less every quarter. The questions that matter concern the application layer: who owns it, where the capture accrues, and whether the firm is building an asset or renting one. Five steps, in order.

1. Understand and consolidate your AI use cases. Today there is no governed way to know where the needle is moving inside most firms, only a scatter of invisible use cases and vertical tools with no cross-team visibility. Fix that first: see what is happening across the firm and where AI is creating or leaking value. Then identify the processes most ready to be documented and transformed, the workflows that are high-toil, well understood, and ripe for capture. You cannot prioritize what you cannot see.

2. Decide what to build versus what to externalize. If you needed a cloud, would you build your own, or build on someone else’s platform? The same decision applies here. Own your IT team and your data model. Externalize the data context layer, the training infrastructure, and the synthesis pipeline. The infrastructure is not the asset; building it in-house is building the factory before you have the materials. The asset is the capture. Own that, and externalize the rest.

3. Build on a platform that connects data, people, and context. If data, people, and context are split across siloed platforms or vertical tools, you are back to the fifteen-copilot problem. Once the platform is in place, move fast on capture: get the flow of work into it, and capture the workload by default. Then build the second layer, augmented workflows that create net-new ways of doing the work with AI. The goal is not just to automate the low-hanging fruit, but to build new processes that leverage intelligence and that the team owns. The companies pulling ahead are not plugging into a tool handed to them. They are building net-new systems that turn the business itself into an asset.

4. Capture traces, private evals, and human expert feedback. This is the mechanism that compounds. Every workflow generates execution traces, the full path of tool calls, steps, and decisions; capture those by default. Then build private evals that score every run against the firm’s own rubrics, so you know whether the work is getting better, not whether the model is getting better on public benchmarks. Then capture the human expert feedback: the edits, corrections, overrides, and the reasons an output was accepted or rejected. That feedback is the firm’s judgment encoded as data, and it is the thing no competitor can copy. Traces show what happened, evals measure whether it was good, and feedback teaches the system what good means in your shop. Skip this step and you have a workspace. Do it and you have a capture layer that appreciates over time and eventually feeds the firm’s own model. One diligence question separates the two: ask for the failure half-life on your tasks, the number of eval-gated iterations that removes half of the remaining failures. A vendor running a real loop knows this number. A pass-through vendor has a benchmark slide.

5. Build toward continual enterprise intelligence. This is the end state, and it is an operating model rather than a project. Once the capture layer is running, the evals are scoring, the feedback is flowing, and the reinforcement loop is compounding, the firm has a system that gets better at its work every day without anyone retraining it manually, because the work itself is the training signal. The model is rented. The application layer is owned. The capture compounds. The firms that reach this state first will hold an asset that cannot be bought or copied, because it is built from their own work. The firms that do not will keep renting intelligence and paying for tokens while the learning walks out the door.
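The failure half-life from step 4 can be estimated from two pass rates and an iteration count. The sketch below assumes failures decay geometrically across eval-gated iterations, which is a modeling assumption rather than a claim about any particular deployment. The 23 and 98 percent pass rates are the Qualcomm figures quoted earlier; the count of 16 iterations is hypothetical, since the source does not give one.

```python
import math

def failure_half_life(pass_start: float, pass_end: float, iterations: int) -> float:
    """Iterations needed to halve the remaining failures, assuming geometric decay."""
    fail_start, fail_end = 1.0 - pass_start, 1.0 - pass_end
    if not (0.0 < fail_end < fail_start):
        raise ValueError("failure rate must fall and stay above zero")
    halvings = math.log2(fail_start / fail_end)  # how many times failures were halved
    return iterations / halvings

# Failures fell from 77 percent to 2 percent; 16 iterations is an assumed count.
print(round(failure_half_life(0.23, 0.98, iterations=16), 2))

# Sanity check: going from 100 percent to 25 percent failures is two halvings.
print(failure_half_life(0.0, 0.75, iterations=4))  # 2.0
```

A smaller half-life means each eval-gated iteration removes failures faster. The number is only meaningful when it is measured on the firm's own tasks and rubrics.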