AI Inference Is the New World Currency (And Forbes Just Said So)

May 10, 2026 · 13 min read · AnITGuru
ai · economics · geopolitics

2026-05-10 update: This post had been incubating for weeks while I told myself it wasn't quite ready. Yesterday, Forbes ran a piece by Nisha Talagala titled "Are Tokens The New Currency? A Primer For Business". Same thesis, different byline, different audience — and now my draft looks late instead of early. More on that lesson at the end.

The Wife Test

A few weeks ago my wife told me she'd read online that generating silly AI memes was wasting natural resources. I told her she was right.

Then I tried to explain why she was more right than the article she'd read had told her.

The environmental math is real but unspectacular at the scale of one image. Research estimates from Hugging Face and Carnegie Mellon put AI image generation in the same rough neighborhood as charging a phone, with model-dependent measurements that can land around 0.003–0.012 kWh per image. The water footprint is closer to a shot glass than to a swimming pool, despite some fairly wild viral claims that have made the rounds.
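The "same neighborhood as charging a phone" claim is easy to sanity-check. Here's a back-of-envelope sketch using the 0.003–0.012 kWh per-image range cited above; the phone figure is an assumption on my part (a ~4,000 mAh battery at 3.85 V), not something from the cited research.

```python
# Back-of-envelope: energy for one AI-generated image vs. one full phone charge.
# Per-image range: the Hugging Face / Carnegie Mellon estimates cited above.
# Phone battery: an assumed ~4,000 mAh cell at 3.85 V.

IMAGE_KWH_LOW, IMAGE_KWH_HIGH = 0.003, 0.012   # kWh per generated image
PHONE_BATTERY_KWH = 4.0 * 3.85 / 1000          # Ah * V / 1000 ~= 0.0154 kWh

for kwh in (IMAGE_KWH_LOW, IMAGE_KWH_HIGH):
    frac = kwh / PHONE_BATTERY_KWH
    print(f"{kwh} kWh/image ~= {frac:.0%} of a full phone charge")
```

Even at the high end of the range, one image lands under a single phone charge — which is exactly why the interesting question is allocation, not individual guilt.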

At individual scale, a Ghibli filter is not a moral catastrophe.

At civilizational scale, it is something stranger. It's an allocation decision.

That sentence is the hinge.

Once you see it as an allocation decision, you start to notice that there's a thing being allocated, and that thing is starting to behave the way money behaves.

What is currency, anyway?

A cow was a currency, once. A cow was a stored claim on labor — milk, meat, plowing — that you could trade for other stored claims on labor. Then someone in Lydia, around 600 BCE, figured out you could compress the cow into a coin: a small disc of electrum that didn't need feeding and traveled well. Then someone in Tang Dynasty China figured out you could compress the coin into a piece of paper. Bretton Woods (1944) anchored that paper to gold. Nixon cut the anchor in 1971. Kissinger built a new one in 1974: oil priced in dollars, in exchange for U.S. security guarantees. Bitcoin (2009) reintroduced cryptographic scarcity without a central issuer.

Six transitions. Each one expanded what could be stored, exchanged, and trusted at a distance. Each one took something scarce and built an abstraction layer on top of it. Each one made the previous layer feel quaint.

The seventh is happening right now. The unit is the inference token.

This is no longer a hot take. NVIDIA's own corporate copy calls tokens "the currency of AI" produced by "AI factories." Deloitte ran a piece in January 2026 declaring that AI cannot be managed with outdated cost models and that business leaders should "treat AI economics with the same rigor as energy or capital allocation, recognizing tokens as the new currency."

The Center for Strategic and International Studies — a Washington think tank not normally given to metaphor — published a paper in February openly proposing a "compute-dollar system" to replace the petrodollar. Goldman Sachs is now tracking $7.6 trillion in cumulative AI infrastructure capex projected 2026–2031. BloombergNEF tracks the fourteen largest data center operators alone at roughly $750 billion in 2026 capex, up from less than $450 billion in 2025.

That is not one article getting spicy. That is the stack talking to itself.

Forbes ran their primer yesterday.

It's not a hot take. It's a chorus.

Oracle laid off 30,000 people for this

On the morning of March 31, 2026, roughly thirty thousand people opened their email at six in the morning to find their job had been eliminated. The note was four paragraphs long. It was signed "Oracle Leadership." It did not come from their manager. It did not come from their CEO. By the time most of them finished reading it, they were already locked out of the systems they'd spent years building.

Larry Ellison's company had eliminated roughly 18% of its global workforce in a single email — workers in the U.S., India, Canada, Mexico, and Uruguay all received it the same morning — to free up roughly $8–10 billion a year in cash. According to a TD Cowen analysis, Oracle is also openly evaluating the sale of Oracle Health, the electronic health records business it bought from Cerner in 2022 for $28.3 billion. Ellison stood up at the announcement four years ago and said the deal was "about saving lives." Today the lifesaving business is on the table to fund something else.

What is that something else? Buildings full of GPUs.

The Stargate joint venture with OpenAI, SoftBank, and Abu Dhabi's MGX is now reportedly valued at $500 billion. Last month the Financial Times reported Oracle alone will spend $40 billion buying around 400,000 NVIDIA GB200s for a single Texas campus to lease to OpenAI. Oracle has taken on tens of billions in new debt. The stock is down 24% year-to-date. Moody's holds a negative outlook. Multiple banks have stepped back from financing some of the projects.

This is what tens of thousands of layoffs and a possible $28 billion divestiture look like when the underlying explanation is "we need to manufacture more inference."

Pattern, not anomaly. Three weeks later Cloudflare said AI had made 1,100 jobs obsolete — Cloudflare, the company that sits between most of the web and everything else on the internet.

I covered that one in Guru's Tech Bytes, too, because it fits the pattern almost too neatly. The infrastructure layer is reorganizing itself around what compute will be needed for, and the people who built the previous configuration are getting reorganized out of the building first.

The telescope can't keep up

Two weeks ago TechCrunch ran a story called "AI galaxy hunters are adding to the global GPU crunch." Here are the actual numbers, because they're absurd.

NASA's Nancy Grace Roman Space Telescope launches in September 2026, eight months ahead of schedule. Over its five-year primary mission it will produce roughly twenty thousand terabytes — twenty petabytes — of data.

The Vera C. Rubin Observatory in Chile, just doing its first-light observations now, will produce roughly 10–20 terabytes per night, depending on whether you count raw images, processed products, and alert streams.

Either way: absurd.

The bottleneck is no longer whether we can point a big enough eye at the sky. It is whether we can afford to think about what it saw.

Astronomers are now competing for Blackwell chips with Microsoft and OpenAI. UC Santa Cruz astrophysicist Brant Robertson built an NSF-funded GPU cluster that's already obsolete. The Trump administration's FY26 budget proposes cutting NSF by 50%.

Picture that for a second. The most ambitious astronomy program in human history is being throttled by its inability to compete with consumer AI for compute time.

The miners weren't wrong, just early

Here's the cleanest visual metaphor in the whole story. Facilities purpose-built to manufacture cryptocurrency tokens are being retrofitted to manufacture intelligence tokens. Same buildings. Same substations. Same cooling. Same business model: energy in, monetizable digital scarcity out. Different output.

Core Scientific's CEO calls the AI conversion "one of the largest infrastructure shifts of this decade" — 39% of the company's Q4 2025 revenue is now AI colocation. Hut 8 signed a $7 billion, fifteen-year lease deal with Google-backed Fluidstack. HIVE Digital is converting a Tier-1 facility in Boden, Sweden to liquid-cooled HPC — nine months versus three years for greenfield. Bitfarms announced a complete shift to AI by 2027. Marathon's CEO now describes the company as "energy transformation" — converting energy into digital value, deliberately ambiguous about which kind.

As of Q4 2025, roughly 30% of revenue across publicly listed Bitcoin miners came from AI/HPC, on average. Projections put that at 70% by end of 2026 for operators with executed contracts. Cumulative AI/HPC contracts signed by former miners crossed $70 billion in early 2026.

Bitcoin was supposed to be the new currency. It turned out to be the rehearsal. The miners weren't wrong about digital scarcity. They were just early about which kind of digital scarcity would matter.

PetroCompute

Now widen the lens. The Gulf states figured out something fifty years ago: oil-rich nations could convert geology into geopolitical leverage by anchoring a global currency to a single physical commodity. They are now converting that leverage into something that will outlast geology.

Depending on how you count broad national investment pledges versus hard AI data-center commitments, the Gulf AI checkbook now runs from tens of billions in named projects to trillion-dollar state pledges.

The exact aggregate is squishy.

The direction is not.

Microsoft holds a $1.5 billion equity stake in G42 — the UAE's national AI champion, chaired by the country's National Security Advisor — as part of a $15.2 billion total commitment running through 2029.

Saudi Arabia's HUMAIN, formed by the Public Investment Fund in May 2025, has a $77 billion strategy targeting 1.9 GW of AI capacity by 2030, plus a $10 billion Google Cloud partnership and a Groq–Aramco Digital deal aiming to build the world's largest AI inference data center.

Crown Prince Mohammed bin Salman raised the kingdom's overall U.S. investment pledge from $600 billion to $1 trillion during his Washington visit. Qatar's QIA put together a $20 billion Qai-Brookfield AI infrastructure JV last December.

The Stargate UAE 5-gigawatt campus, the largest such facility outside the United States, is being built by G42 and operated by OpenAI and Oracle.

Researcher Abdullah Alzabin gave it a name: PetroCompute.

The structural similarity to the 1974 petrodollar deal is so close it's almost embarrassing. Same desert. Same families. Same Pentagon. Same dollar settlement. Fifty years apart. The only real difference is that you can't blockade an inference token.

And the structure was tested in March.

CNBC reported that Iranian drones damaged AWS facilities in the UAE and Bahrain. The Islamic Revolutionary Guard Corps subsequently published a target list of twenty-nine technology assets across Bahrain, Qatar, and the UAE — facilities associated with AWS, Microsoft, Google, NVIDIA, IBM, Oracle, and Palantir.

For the first time in modern conflict, commercial hyperscale data centers became explicit kinetic military targets. The U.S. response was immediate.

Cause and effect, exactly as the Gulf had designed it. An attack on UAE digital infrastructure triggered direct U.S. military response. The hedge worked. Compute infrastructure has formally joined the list of things modern states defend with militaries.

Too dangerous to release. Yet the NSA has it.

Anthropic announced Claude Mythos Preview in early April 2026. The model's autonomous cybersecurity capabilities are reportedly extreme — it independently discovered a 27-year-old flaw in OpenBSD and a 16-year-old bug in FFmpeg, and can find and chain exploits in every major operating system and browser.

Anthropic explicitly judged it too dangerous to release publicly. Instead they created Project Glasswing, a roughly forty-organization consortium with restricted access. Twelve names are public: Microsoft, Google, Apple, AWS, Cisco, CrowdStrike, Broadcom, JPMorgan Chase, NVIDIA, Palo Alto Networks, The Linux Foundation, and Anthropic itself.

Then TechCrunch, citing Axios, reported on April 20 that the NSA is using Mythos Preview on classified networks — despite the Department of Defense (which oversees the NSA) having declared Anthropic a "supply-chain risk" in February 2026 and currently fighting Anthropic in federal court.

The Pentagon dispute traces to Anthropic refusing to make Claude available for "all lawful purposes." Anthropic drew two firm lines: no autonomous weapons, no domestic mass surveillance.

The paradox is irresistible. The same agency suing Anthropic for being too restrictive is using its most restricted model on classified networks.

This is exactly the dynamic you'd expect with any strategic resource. Things sufficiently powerful become simultaneously most regulated and most coveted. Frontier AI now sits in the same category as fissile material and signals intelligence — and those things, historically, have been currencies of state power.

Your wife was right

Loop it back. Inference is finite. It is contested. It is geopolitically fought over.

PJM Interconnection — the grid operator for thirteen U.S. states and 65 million people — forecasts a six-gigawatt shortfall by 2027. Capacity market clearing prices in PJM jumped from $28.92/MW-day for the 2024-25 delivery year to $329.17/MW-day for 2026-27 — more than eleven times higher in two years.

Sightline Climate reports that 11 GW of announced 2026 data center capacity has no construction underway, with high-voltage transformer lead times stretching from a 24-month pre-2020 norm to five years today. Polymarket traders put 93.5% probability on at least one U.S. AI data center moratorium passing into law by year-end 2026.

Then there is the part that sounds like satire but is apparently a product strategy: NVIDIA and Span announced a partnership where homeowners host liquid-cooled Blackwell GPUs in their utility rooms, because the grid can't keep up with central demand and someone has decided your laundry room can.

On the policy side of the same coin, Belgium is exploring a state takeover of its nuclear reactors to keep them running after years of phase-out politics. I covered that turn in Guru's Tech Bytes, too. The official rationale is energy prices, but the actual shape of the problem is bigger: compute-driven demand is forcing nations to reconsider whether shutting down baseload was a good idea in the first place.

France never stopped building reactors. South Korea is back in. Even Germany's Greens are quieter on the nuclear question than they used to be.

When something is finite, contested, convertible into anything else of value, and now defended by militaries, it stops being a resource and starts being a currency.

So my wife was right. The Ghibli filter is wasteful.

But the waste isn't the water. The waste is what the water could have done instead: Insilico's CDK12/13 cancer inhibitors, Vera Rubin's nightly transient catalog, Mythos finding 27-year-old security holes, or OpenAI's o1 model identifying or coming very close to the right diagnosis in 67% of emergency-room triage cases versus 50–55% for the two attending physicians it was tested against.

I covered that Harvard ER paper in Guru's Tech Bytes, too. It is exactly the kind of inference you want to exist when the pager goes off.

Every actually-useful inference call we displaced to render a dog wearing sunglasses is one we didn't spend on the ER.

Tokens are not just a billing unit. They are the world's emerging reserve currency, and we are spending them.

Choose what you want history to remember about how we spent ours.

The post-imperfect tax

One last thing.

I sat on this draft for weeks. Yesterday Forbes published the same idea. Now you're reading it later than you should have, with my "I told you so" instinct mostly defanged by a real journalist with a much bigger megaphone.

Free advice from someone who just paid this tax: don't sit on a good idea waiting for it to be perfect. Hit publish.

Forbes apparently agreed.

If you liked this thread of thinking, Guru's Tech Bytes — my daily AI-generated tech briefing pulling from Hacker News — has been chasing these stories in near real time.

The Cloudflare cuts are in episode 35. Belgium going back on nuclear is in episode 27. The Harvard ER paper is in episode 31.

Same five-minute morning-coffee format. Subscribe in your podcast app of choice.

FAQ

What does it mean that "tokens are the new currency"?

In AI economics, a token is the smallest unit of work an AI model processes — a chunk of text, an image fragment, or a piece of audio.

AI services are now priced and billed by token consumption. As tokens become the underlying unit of how value is created, exchanged, and accounted for in AI workflows, they're starting to function the way money does: a stored claim on a scarce resource that can be converted into other things of value.
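Token billing is simple enough to sketch in a few lines. The prices below are hypothetical round numbers I chose for illustration — real per-token rates vary widely by provider and model.

```python
# Minimal sketch of token-metered billing. The per-million-token prices
# are assumptions for illustration, not any provider's actual rate card.

PRICE_PER_1M_INPUT = 3.00    # USD per million input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call under simple per-token metering."""
    return (input_tokens * PRICE_PER_1M_INPUT
            + output_tokens * PRICE_PER_1M_OUTPUT) / 1_000_000

# A typical chat turn: ~2,000 tokens of context in, ~500 tokens generated.
print(f"${request_cost(2_000, 500):.4f} per call")
print(f"${request_cost(2_000, 500) * 1_000_000:,.0f} per million calls")
```

One call costs fractions of a cent; a million calls a day is real money — which is the point at which "currency" stops being a metaphor and starts being a line item.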

NVIDIA, Deloitte, and Forbes have all explicitly framed tokens as a currency in 2025–2026.

How is AI inference different from AI training?

Training is the one-time process of teaching an AI model on massive datasets — analogous to building a factory. Inference is the model running in production to answer questions, generate images, write code, or process data — analogous to the factory making products.

Training is capital expenditure; inference is cost of goods sold. As AI moves from research to deployment, inference is becoming the dominant compute cost: Deloitte projects inference will account for roughly two-thirds of AI compute by end of 2026, up from one-third in 2023.

Why are Bitcoin miners pivoting to AI data centers?

Bitcoin mining facilities and AI inference data centers need almost identical physical infrastructure: massive power supply, advanced cooling, and grid connections in remote, low-cost locations.

Mining margins are volatile and halving-cycle dependent; AI compute leasing offers fifteen-year contracts with hyperscaler-grade credit ratings. Conversion is fast — typically nine months versus three years for greenfield buildouts.

As of Q4 2025, roughly 30% of publicly listed Bitcoin miners' revenue came from AI/HPC, with projections of 70% by end of 2026 for operators with executed contracts.

What is "PetroCompute" and the compute-dollar system?

PetroCompute is a term coined by researcher Abdullah Alzabin to describe how Gulf oil-producing nations are converting hydrocarbon wealth into AI compute infrastructure investments to maintain geopolitical leverage in a post-oil world.

The compute-dollar system is a related concept proposed by CSIS in February 2026: a 21st-century replacement for the 1974 petrodollar arrangement, where access to U.S.-made AI chips would be conditioned on dollar settlement of AI-enabled exports.

Both frame AI compute as the strategic commodity that energy was during the Cold War.

Why does AI inference matter for ordinary people?

Three reasons.

Cost: token-based AI pricing means consumer apps like ChatGPT, Cursor, and image generators have unpredictable, demand-based costs that flow back to end users.

Energy bills: U.S. data center demand is projected to roughly double from 80 GW in 2025 to 150 GW by 2028, and grid expansion costs get spread across all ratepayers. In PJM territory, capacity prices are up 10x in two years.

Allocation: when inference is finite and contested, every kilowatt-hour spent generating Ghibli-style memes is a kilowatt-hour not spent on cancer drug discovery, telescope data analysis, or critical security scanning.

The decisions about which uses of AI matter aren't made by anyone in particular, but they get made.
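The doubling figure in the energy-bills point above hides a striking implied growth rate. A rough compound-annual-growth calculation, using only the 80 GW (2025) and 150 GW (2028) projections cited in the answer:

```python
# Implied compound annual growth behind the projected jump from
# 80 GW (2025) to 150 GW (2028) in U.S. data center demand.
# A rough calculation over the figures cited above, nothing more.

start_gw, end_gw, years = 80, 150, 3
cagr = (end_gw / start_gw) ** (1 / years) - 1
print(f"Implied growth: {cagr:.1%} per year")  # roughly 23% per year, every year
```

Roughly 23% compounded annual growth in grid-scale demand is the kind of number utilities plan decades for, arriving in three.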

AnITGuru
Writing about web development, privacy, and open-source tools at anit.guru.