Iterance

Heat


This or that. Chicken or egg. Foundation or flair. Now or later. Build or bury. Conflict is such a natural part of human existence that we almost take it for granted. We’re so prone to overlooking the act of friction that we often forget the elements: Earthquakes? Fault lines. Light? Refraction. Sonar? Obstruction.

Moreover, the argument could go, if we forget the elements, we’re likely to neglect the necessity of an uneasy relationship between opposing forces. There’s building tension in the tech scene, and in society more broadly. Compute or offering. Infrastructure or application. Revenue or optionality.

A scene midway through Michael Mann’s 1995 movie, Heat, captures the theme of uneasy contention perfectly. It is set in a diner off the 105, somewhere in Los Angeles, against a backdrop of people filled with indifference and distraction. Al Pacino’s character, full of wild-eyed, ruthless optimism, opposes Robert De Niro’s ever-present scowl, the two jointly framed in a booth. The pure irony in the scene is the awareness that they are both approaching the same problem, but from two different entry points.

The interesting thing about drag, opposition, and constraints is that the pressure often delivers some byproduct. More often than not, that byproduct is heat. Whether it is the manufacture of silicon chips, the incessant hum of a data center, or the constant switching back and forth of users from Codex to Opus: there is friction. However, on that last point, it is far too easy to label it as simply a battle between the frontier models.

Rather, the real conflict is between the two dependencies shaping the current artificial intelligence landscape: compute and revenue. One pays, the other provides. Every dollar of revenue attributed to AI today requires four or five dollars of capital investment ahead of time. Revenue (or capital) funds the purchase of compute, but generating that revenue requires compute in the first place. Furthermore, fabrication is physically constrained and capacity often has to be spoken for quarters before it arrives.

Yet compute, alone, does not generate users or revenue. The offering does. The application layer is the thing a customer touches, adopts, and pays for. Compute without an offering is an expensive machine running idle. An offering without compute is a product living on borrowed infrastructure. Thus, it looks as if all this heat is turning into something of a pressure cooker. Kudos to Harry, Jason, and Rory on the 20VC pod; they’ve done an excellent job of helping cut through the headlines and posturing on the topic recently. One of the biggest challenges, beneath the surface, is that anyone building at scale with AI is being asked to work within a binary posed as a question. Compute or offering. The trouble is, that framing is a trap.

The trap works as an easy narrative because it presents a resource problem. Choose compute or choose offering. Allocate accordingly. But the actual problem is not allocation: it is sequencing. Revenue buys optionality. Optionality buys compute. Compute runs anything.

We’re all fortunate to be living through this chapter. Notably, the history of transformative technology is really a history of fits, starts, allocations, and sequencing. In 2002, Carlota Perez published Technological Revolutions and Financial Capital, tracing five technological surges across 250 years: canals, railways, steel, mass production, and information technology. In her paper, “The financial crisis and the future of innovation: A view of technical change with the aid of history,” she added some clarifying perspective on the feedback loop and time horizon of innovation. In it, she argues that each transformation followed the same two-phase sequence. Installation comes first: financial capital rushes in, infrastructure gets built, overinvestment becomes obvious, and reconciliation follows. Deployment comes second: production capital takes the helm and the technology restructures the actual economy.

Perez draws a distinction the current market has not internalized: installation revenue and deployment revenue are structurally different. The capital expenditures are real, but they are forward bets. Nobody knows how much compute the market will actually consume. What we do know: the revenue being generated today sits on top of a handful of frontier models and relies on a feature set that changes quarterly. What has been particularly impressive about the feature set, or the application layer, has been the consistent fracturing of services. Less than two years ago, the number of wrapper companies seemed limitless. However, as OpenAI and Anthropic have expanded their core offerings, they’ve also gained valuable insight into usage and adoption channels. Thus, their push into finance, law, and other high-value areas (beyond code-writing) hasn’t been surprising. Rather, the change in token economics (or subsidies) is the real looming headline.

Yet, the “wrappers die when pricing shifts” narrative is a bit, well, obvious and trite. Sure, we can all agree about the lack of defensibility and novelty of another “chatbot for x,” but as time goes on, that framing becomes more localized. If there is value in the application, then it stands to reason that the developers will eventually figure out the fungibility between models. The more pressing question is not what tokens will cost tomorrow. It is whether companies can build consistent, recurring use around specific models matched to specific needs, rather than routing everything through a frontier model and calling it a product.

The age-old argument is that in a world of scarcity, you can only focus on one thing at a time. That may be true; it certainly sounds good on a motivational poster. But the focus question is not simply compute or offering. It is velocity or discipline. Build fast on one model, or build deliberately across many. The companies that chose velocity get some form of revenue (or calculated derivative). The companies that chose discipline get optionality. These are decidedly not the same dependencies nor the same outcomes. The friction between them is what generates heat.

That’s why I’ve found myself spending more and more time on Hugging Face evaluating local and cloud hosting solutions, determining whether n-1 models can perform specific tasks at similar accuracy and lower cost. Pulling inputs and associating data from Garmin, Apple, Peloton, and Final Surge, then cross-referencing against weather and calendars, has been an educational jaunt. Yet, not every bit of that process or pipeline needs Opus. Once loaded, the training plan logic runs well on a smaller model. However, the various combinations of activities matched against recovery windows and travel needed a larger context window and stronger numerical reasoning.
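
For the curious, that split can be sketched in a few lines of Python. This is only a rough illustration, not my actual pipeline: the model names, the token threshold, and the task sizes are all placeholder assumptions, and a real router would lean on measured accuracy and cost rather than two hand-set flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    context_tokens: int            # rough size of the input the step must hold at once
    needs_numeric_reasoning: bool  # whether the step leans on heavier numerical reasoning


# Hypothetical tiers and threshold, chosen only for illustration.
SMALL_MODEL = "small-local-model"
LARGE_MODEL = "large-frontier-model"
SMALL_CONTEXT_LIMIT = 8_000


def pick_model(task: Task) -> str:
    """Send a task to the large model only when the small one is likely to fall short."""
    if task.context_tokens > SMALL_CONTEXT_LIMIT or task.needs_numeric_reasoning:
        return LARGE_MODEL
    return SMALL_MODEL


# Placeholder task sizes; the two task names mirror the steps described above.
tasks = [
    Task("training_plan_logic", context_tokens=2_000, needs_numeric_reasoning=False),
    Task("recovery_and_travel_scheduling", context_tokens=40_000, needs_numeric_reasoning=True),
]

routes = {task.name: pick_model(task) for task in tasks}
for name, model in routes.items():
    print(f"{name} -> {model}")
```

The point of keeping the rule this small is that swapping a model becomes a one-line change, which is the optionality argument in miniature.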

Ultimately, the lesson for all of us: don’t get too dependent on one process, workflow, model, or thought pattern. During that epic diner scene, De Niro recounts a lesson passed on to him by a fellow misfit:

“A guy told me one time, ‘Don’t let yourself get attached to anything you are not willing to walk out on in 30 seconds flat if you feel the heat around the corner.’”

Good advice. At the rate the industry is advancing, 30 seconds feels generous, especially with summer almost here.