For most of the past five years, building anything with artificial intelligence meant building for the cloud. You sent text to a data centre, a very large model thought about it, and an answer came back. That assumption is quietly coming apart.
A new class of compact models — small enough to run entirely on a phone, a laptop, or a browser tab — has caught up to where the frontier sat barely a year ago. For a growing number of tasks, the round trip to a server is no longer worth making.
Three things arrived at once. Models got dramatically more efficient per parameter. Consumer chips picked up dedicated neural hardware, turning yesterday’s bottleneck into a background process. And the tooling matured to the point where shipping a model inside an app stopped being a research project.
The interesting frontier isn’t the biggest model anymore. It’s the smallest one that’s still good enough.
Running locally changes the economics and the experience at once: network latency disappears, privacy improves by design because the data never leaves the device, cost flattens because there is no per-token bill, and reliability rises because an app that works offline is simply a better app.
On-device AI trades one set of problems for another. Memory budgets are tight, battery is finite, and the diversity of consumer hardware turns “ship once, run everywhere” into a real test. When the model lives on the device, so does the responsibility for updating and securing it.
The most likely future is hybrid, not either-or. Devices will handle what they can instantly and privately, then reach for the cloud only when a task genuinely needs more. The cloud was the default because there was no alternative. There is now — and it fits in your pocket.