The quiet shift to on-device AI is rewriting how software gets built

For most of the past five years, building anything with artificial intelligence meant building for the cloud. That assumption is coming apart.

A new class of compact models — small enough to run entirely on a phone, a laptop, or a browser tab — has caught up to where the frontier sat barely a year ago. For a growing number of tasks, the round trip to a server is no longer worth making.

What changed

Three things arrived at once: models got dramatically more efficient per parameter, consumer chips picked up dedicated neural hardware, and the tooling matured to the point where shipping a model inside an app stopped being a research project.

Why developers care

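Network latency disappears because there is no round trip to wait for. Privacy improves by design because the data never leaves the device. Cost flattens because inference runs on hardware the user already owns rather than on metered servers. Reliability rises because features keep working without a connection. None of this makes the cloud obsolete, since the largest tasks still belong in a data centre, but the centre of gravity is shifting for the long tail of ordinary features.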

Where this goes next

The most likely future is hybrid, not either-or. Devices will handle what they can instantly and privately, then reach for the cloud only when a task genuinely needs more.
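
For developers, that hybrid pattern can be as simple as a routing decision made before any model is called. The sketch below is illustrative only: the Task type, the needs_large_model flag, the LOCAL_PROMPT_LIMIT threshold and the two runner callbacks are all hypothetical names invented for this example, not part of any real library, and a production app would likely use a better signal than prompt length.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    # Hypothetical flag the app sets when it knows a task is too demanding.
    needs_large_model: bool = False


# Assumed cut-off, in characters, for what the local model handles well.
LOCAL_PROMPT_LIMIT = 2000


def route(task: Task) -> str:
    """Pick "device" by default and "cloud" only when the task needs more."""
    if task.needs_large_model or len(task.prompt) > LOCAL_PROMPT_LIMIT:
        return "cloud"
    return "device"


def run(
    task: Task,
    on_device: Callable[[str], str],
    in_cloud: Callable[[str], str],
) -> str:
    """Send the prompt to whichever runner the router selected."""
    if route(task) == "device":
        return on_device(task.prompt)
    return in_cloud(task.prompt)
```

Defaulting to the device and treating the cloud as the exception keeps the fast, private path as the common case, which is the behaviour the hybrid model describes.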