Why AI features belong in the extension, not the backend

Skelf-Research · May 6, 2026

architecture · privacy

There is a default assumption in modern web development that the AI call lives on the backend. The reasons are plausible. You can hide your API key on the server. You can rate-limit centrally. You can switch models without re-shipping a client. You can log everything for evaluation. You can charge per request without trusting the client. So most AI features get built as a backend that calls a model, fronted by a thin client.

For browser extensions, that default is wrong more often than it is right. This post makes the case for calling the model directly from the extension and skipping the backend, and then covers the cases where the rule reverses.

What an extension already has

A browser extension is not a thin client. It is a runtime with access to a real DOM, real network, real storage, and real compute. It can do almost everything a backend can do for a single-user AI feature. It can hold credentials in chrome.storage.local. It can issue authenticated fetches over HTTPS. It can cache responses on disk. It can rate-limit itself per provider with a queue. It can show a settings panel where the user types in their own key.
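As a rough illustration of the credential-holding part, here is a minimal sketch of saving and loading a user-supplied key. The `ProviderSettings` fields, the `providerSettings` storage key, and the `memoryStorage` stand-in are assumptions made for this example, not Anouk's actual schema. The `StorageArea` interface mirrors only the promise-based `get`/`set` calls that `chrome.storage.local` exposes in Manifest V3.

```typescript
// Hypothetical settings shape; the field names are assumptions for illustration.
interface ProviderSettings {
  providerUrl: string;
  apiKey: string;
  model: string;
}

// The small slice of the chrome.storage.local API this sketch relies on.
interface StorageArea {
  get(key: string): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

const SETTINGS_KEY = "providerSettings"; // assumed key name

async function saveSettings(storage: StorageArea, settings: ProviderSettings): Promise<void> {
  await storage.set({ [SETTINGS_KEY]: settings });
}

async function loadSettings(storage: StorageArea): Promise<ProviderSettings | null> {
  const items = await storage.get(SETTINGS_KEY);
  const value = items[SETTINGS_KEY] as Partial<ProviderSettings> | undefined;
  // Treat missing or incomplete settings as "not configured yet".
  if (!value || !value.providerUrl || !value.apiKey || !value.model) return null;
  return { providerUrl: value.providerUrl, apiKey: value.apiKey, model: value.model };
}

// In-memory stand-in so the sketch can run outside a browser.
function memoryStorage(): StorageArea {
  const data: Record<string, unknown> = {};
  return {
    async get(key: string) {
      return key in data ? { [key]: data[key] } : {};
    },
    async set(items: Record<string, unknown>) {
      Object.assign(data, items);
    },
  };
}
```

Inside the extension, `chrome.storage.local` would take the place of `memoryStorage()`. A `null` result is the cue to open the settings panel.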

The only thing it does not have is a place to hide that key from the user. And for the indie-dev, BYO-key case, that does not matter — the user is the key holder.

That single shift in assumption — “the user owns the key” — invalidates most of the reasons people give for a backend. You no longer need to hide a key, because there is no key for you to hide. You no longer need to rate-limit globally, because the rate limit is the user’s own provider quota. You no longer need to charge per request, because the user is paying their provider directly. You no longer need to log usage for billing, because you are not billing.
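Once the only quota in play is the user's own, client-side rate limiting needs very little machinery. The sketch below is one possible shape, not Anouk's implementation: the `RequestQueue` name and the fixed minimum gap between requests are assumptions. It runs one request at a time and keeps going after a failed request.

```typescript
// A serial queue: one request at a time, with a minimum pause between requests.
class RequestQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private readonly minIntervalMs: number;
  private readonly sleep: (ms: number) => Promise<void>;

  // `sleep` is injectable so the queue can be tested without real timers.
  constructor(minIntervalMs: number, sleep?: (ms: number) => Promise<void>) {
    this.minIntervalMs = minIntervalMs;
    this.sleep =
      sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  }

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after everything queued before it has settled.
    const run = this.tail.then(task);
    // Swallow failures on the internal chain so one error does not block later
    // tasks, then wait out the minimum interval before the next task may start.
    this.tail = run.catch(() => undefined).then(() => this.sleep(this.minIntervalMs));
    return run;
  }
}
```

One queue instance per provider is enough to keep a burst of clicks from turning into a burst of requests against the user's quota.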

What you are left with is a very short list of things a backend can do that the extension cannot. And that short list is the case for adding a backend — not the case for assuming one by default.

The shorter path

The first practical benefit is latency. When the AI call lives on a backend, the round trip is: extension → your server → provider → your server → extension. That is two extra hops. Each hop costs network transit time, plus a TLS handshake whenever the connection is not already open. For a feature where the user is actively waiting on the response, which is most AI features, those hops show up as visible latency.

Calling the provider directly from the extension’s content or background script removes both hops. The first byte from the model arrives faster. For a streaming response, the user sees tokens sooner. For a non-streaming response, the spinner finishes sooner. This is not a benchmark cliff; it is a couple hundred milliseconds. But a couple hundred milliseconds at the start of every user interaction adds up.

Anouk’s AIService is designed to be called from inside a content script for this reason. You hand it an instruction, a piece of content, and a stable request id; it talks to the configured provider over fetch; you get the response back as a string. No backend.
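To make the shape of a direct call concrete, here is a minimal sketch. It is not Anouk's AIService: the `askProvider` function, the `ProviderSettings` fields, and the `FetchLike` type are hypothetical, and the request and response bodies assume an OpenAI-style chat completions endpoint. Other providers will need a different payload.

```typescript
// Hypothetical settings shape; the field names are assumptions for illustration.
interface ProviderSettings {
  providerUrl: string;
  apiKey: string;
  model: string;
}

// Minimal slice of the fetch API used here, so the sketch can run against a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function askProvider(
  fetchFn: FetchLike,
  settings: ProviderSettings,
  instruction: string,
  content: string,
): Promise<string> {
  // One request, straight from the extension to the provider the user configured.
  const res = await fetchFn(settings.providerUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${settings.apiKey}`,
    },
    body: JSON.stringify({
      model: settings.model,
      messages: [
        { role: "system", content: instruction },
        { role: "user", content },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Provider returned HTTP ${res.status}`);

  // Assumed OpenAI-style response: choices[0].message.content holds the text.
  const data = (await res.json()) as { choices?: { message?: { content?: string } }[] };
  const text = data.choices?.[0]?.message?.content;
  if (typeof text !== "string") throw new Error("Unexpected response shape");
  return text;
}
```

In the extension you would pass the global `fetch` as `fetchFn`. Nothing in the call path belongs to you: the only hosts involved are the user's browser and the user's provider.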

The privacy story writes itself

The second practical benefit is privacy, and it is the one your users actually care about. When a backend mediates the AI call, the user is sending their content — which might be their email, their PR, their internal Notion page, their Google Doc — to your server, in addition to the provider. That means you can see it. That means your DPA has to cover it. That means a privacy-conscious user has to decide whether they trust you.

When the extension calls the provider directly, the user’s content goes from their browser to the provider they chose. You never touch it. You do not log it, because there is nothing to log. You do not have to write a section of your privacy policy explaining why you store it, because you do not. The story you can put on your store listing is: “Your content is sent only to the provider you configure. We do not see it.”

That is a much better story than the version where you have to explain a server-side path. And it is true by construction, not by promise.

The operating cost is close to zero

The third practical benefit is operating cost. A backend that mediates AI calls is a non-trivial thing to run. You need hosting. You need a database for keys (or a key vault). You need rate-limiting infrastructure. You need monitoring. You need to absorb the bill when usage spikes. You need an on-call rotation when the model provider is down and your users are wondering why your feature is broken.

For a weekend project, that cost is enormous. It is the reason most weekend AI projects either become subscription products (because the cost has to be recouped) or shut down (because the cost was not recouped). Removing the backend collapses the cost to the static hosting bill for your marketing site and the store listing fee.

This is the cost-side argument for “calling from the extension.” It is also why Anouk’s defaults are shaped this way: the framework assumes the user supplies the provider URL and key at runtime through the settings panel, and the extension calls the provider directly. No backend is on the critical path.

Where the rule reverses

This argument is not absolute. There are real cases where the backend belongs.

You are reselling inference. If you are running a paid product where the user does not bring their own key (you are billing them, and you are calling the provider on their behalf), you need a backend. Otherwise your key ships inside their extension, where anyone can extract it, and nothing ties the inference they use to the money they pay you.

You need guardrails the user cannot bypass. If your product has a content policy that must be enforced before the model call (or after it, before the user sees the response), and the user must not be able to disable it, the policy logic cannot live in the extension. The user owns the extension’s code. They can patch it. A backend is the only place where you can enforce a rule the client cannot turn off.
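On the server, an unbypassable guardrail can be as plain as a function that wraps the model call. The sketch below is only illustrative: `guardedCompletion`, the `Policy` and `ModelCall` types, and the choice of status codes are assumptions, and a real policy check would be more involved than a boolean predicate.

```typescript
type Policy = (text: string) => boolean; // returns true when the text is allowed
type ModelCall = (prompt: string) => Promise<string>;

interface GuardedResult {
  status: number;
  body: string;
}

// Runs on your server, so the user cannot patch the checks out.
async function guardedCompletion(
  prompt: string,
  allow: Policy,
  callModel: ModelCall,
): Promise<GuardedResult> {
  // Check before the model call: a blocked prompt never reaches the provider.
  if (!allow(prompt)) {
    return { status: 403, body: "Request blocked by content policy" };
  }
  const answer = await callModel(prompt);
  // Check again after the call, before the user sees the response.
  if (!allow(answer)) {
    return { status: 403, body: "Response blocked by content policy" };
  }
  return { status: 200, body: answer };
}
```

The same function placed in the extension would run the same checks, but only for as long as the user chose to leave it there.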

You need centralized evaluation or logging. If you need to log every request for evaluation, A/B-test prompts in the wild, or compute aggregate metrics across users, you need a backend collecting that data. The extension cannot phone home in a way that survives the user’s adblock list, and it should not.

You are coordinating across users. If your feature shares model outputs across multiple users (collaborative AI features, organizational memory, anything multiplayer), there is shared state. Shared state needs a server. The extension is the wrong shape for that.

You need a provider abstraction the user cannot see. If you are switching models based on cost, region, or quality without telling the user, you need a backend that does the switch. The user’s extension cannot pick a model the user does not know about.

In all of these cases, build a backend. The extension can still be the front-end; it just calls your server instead of the provider.

How Anouk fits

Anouk is shaped for the case where the rule applies — the indie-dev sweet spot, the BYO-key feature, the privacy-friendly browser augmentation. Its defaults are: user configures the provider in the settings panel, extension calls the provider over fetch, response gets cached locally. No backend.
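The local cache in that list is where a stable request id earns its keep. Here is a minimal sketch of the idea, with the caveat that `ResponseCache` is a hypothetical name and that this version lives in memory only. A persistent version would write entries to extension storage instead.

```typescript
// Caches responses by request id and shares in-flight requests between callers.
class ResponseCache {
  private entries = new Map<string, Promise<string>>();

  get(requestId: string, produce: () => Promise<string>): Promise<string> {
    const hit = this.entries.get(requestId);
    if (hit) return hit; // cached, or already in flight

    const pending = produce().catch((err: unknown) => {
      // Do not cache failures: the next call with this id will retry.
      this.entries.delete(requestId);
      throw err;
    });
    this.entries.set(requestId, pending);
    return pending;
  }
}
```

Because the promise itself is stored, two callers asking for the same request id while the first call is still running share one provider request instead of paying for two.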

If your feature is in the other category — reselling, guardrails, evaluation, multiplayer — Anouk’s AIService is not the wrong tool, but it is no longer doing most of the work. You will replace providerUrl with a URL that points at your own server, and the queue, the cache, and the settings panel will sit in front of your backend instead of OpenAI’s.

That is fine. The framework does not stop being useful when you add a backend. It just stops being the whole thing.

The thesis is not “never build a backend.” It is “do not default to one.” For a meaningful slice of AI features — probably most of the ones that look like browser augmentation — the extension is enough. Skipping the backend gives the user a faster, more private feature and gives you a product you can actually afford to operate.