For most of the past five years, building anything with artificial intelligence meant building for the cloud. That assumption is quietly coming apart.
A new class of compact models — small enough to run entirely on a phone, a laptop, or a browser tab — has caught up to where the frontier sat barely a year ago. For a growing number of tasks, the round trip to a server is no longer worth making.
What changed
Three things arrived at once: models got dramatically more efficient per parameter, consumer chips picked up dedicated neural hardware, and the tooling matured to the point where shipping a model inside an app stopped being a research project.
Why developers care
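Network latency disappears because inference runs on the device, privacy improves by design because data never has to leave it, cost flattens because there is no per-request server bill, and reliability rises because features keep working offline. None of this makes the cloud obsolete (the largest tasks still belong in a data centre), but the centre of gravity is shifting for the long tail of ordinary features.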
Where this goes next
The most likely future is hybrid, not either-or. Devices will handle what they can instantly and privately, then reach for the cloud only when a task genuinely needs more.
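To make the hybrid idea concrete, here is a minimal sketch of a device-first router in Python. Everything in it is an illustrative assumption rather than an established API: the Task type, the needs_private flag, the LOCAL_MAX_CHARS budget, and the use of prompt length as a stand-in for "a task that genuinely needs more".

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Task:
    prompt: str
    # Hypothetical flag: the data must never leave the device.
    needs_private: bool = False


# Hypothetical budget: the longest prompt the local model is assumed to handle well.
LOCAL_MAX_CHARS = 2_000


def route(task: Task, online: bool) -> str:
    """Return "device" or "cloud" for a task, preferring the device."""
    fits_locally = len(task.prompt) <= LOCAL_MAX_CHARS
    # Stay local when the task fits, when privacy demands it, or when offline.
    if fits_locally or task.needs_private or not online:
        return "device"
    return "cloud"


def run(
    task: Task,
    local_model: Callable[[str], str],
    cloud_model: Callable[[str], str],
    online: bool = True,
) -> Tuple[str, str]:
    """Dispatch the task to whichever model the router picks."""
    target = route(task, online)
    model = local_model if target == "device" else cloud_model
    return target, model(task.prompt)
```

A real application would likely replace the length check with something richer, such as a confidence score from the local model, but the shape stays the same: try the device first, and escalate only when a rule says the task needs more.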
