Local inference
Running a model on your own hardware rather than calling a hosted API — private, free at the margin, and offline-capable, at the cost of some quality and speed.
Local inference means running a model on your own machine instead of calling a hosted API — so the data never leaves your device, it's free at the margin, and it works offline. It became practical through two advances: small models that punch above their weight, and quantisation that shrinks them to fit consumer hardware. The trade is some quality and speed for privacy and control.
Local inference became practical through two advances: small models that punch above their size, and quantisation that shrinks them to fit consumer hardware. Tools now hide the setup behind a single command.
It's the democratisation track — anyone can run capable AI privately — and the proliferation worry, since the same accessibility cuts both ways. Both are true at once.