Nvidia Corporation (NASDAQ:NVDA) has built a $4.4 trillion empire selling chips for training AI models, but the AI business, once defined by massive training runs, may soon no longer require chips in the same quantities.
Hyperscalers are still spending heavily on training, but the priority has shifted to inference, the real-time computing that actually delivers AI to end users.
Jensen Huang has been calling 2026 the year inference takes over, and the numbers back him up.
OpenAI and Anthropic are producing thousands of times more inference tokens than a year ago as agentic AI workloads explode.
But Nvidia’s bestselling Grace Blackwell servers may not be the right hardware for the job.
Users say the systems consume too much energy and lack the memory for efficient inference.
‘There Is No Moat In Inference’
Cerebras CEO Andrew Feldman is leading the charge.
He told The …