June 21, 2026

The hidden cost of generative AI (and how not to go broke)

In a demo, AI is “free”: you run a few tests and that’s it. In a service open to the public, every generation costs money, and if you don’t control how many are triggered, the service stops being profitable, or simply drains your account.

Where the bleeding comes from

Retries. WhatsApp and message queues redeliver messages. If each redelivery triggers a new generation, you pay two, three, four times for the same reply. (We cover it in another post: the defense is atomic deduplication.)
Abuse. A bored user, or a bot, can request generations non-stop. Without a brake, a single afternoon can run up a serious bill.
Infinite regeneration. If the UX allows “give me more options” without a limit, cost spikes under perfectly normal use, with no malice involved.
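
To make the retry problem concrete, here is a minimal sketch of atomic deduplication. It assumes every incoming message carries a stable ID and uses an in-memory SQLite table as the store. The names make_dedup_db, claim_message and handle_message are hypothetical, chosen for illustration, and are not the code we run in production.

```python
import sqlite3


def make_dedup_db():
    # Hypothetical store: one row per message ID that has already been claimed.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    return db


def claim_message(db, message_id):
    """Return True only for the first attempt to claim this message ID."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when the primary key
    # already existed and the insert was ignored.
    return cur.rowcount == 1


def handle_message(db, message_id, generate):
    """Run the paid generation only if this message was not seen before."""
    if not claim_message(db, message_id):
        return None  # a retry of a message we already paid for
    return generate()
```

The primary key does the work: the check and the write are a single statement, so two retries arriving at once cannot both pass. One trade-off in this sketch is that a message is claimed before generation runs, so a generation that fails after the claim is not retried automatically.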

The defenses (the boring but profitable side)

Rate limits that are generous but firm: a per-user daily cap that stops abuse without bothering real use, checked transactionally before each generation.
Generate only after explicit confirmation, not on every message.
Dedup so you don’t pay twice for retries.
Metrics to tune the limits with real data (rounds per user, failures, latency), not by guessing.
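
As an illustration of the first defense, the sketch below reserves a slot against a per-user daily cap inside a single transaction, before any generation runs. It assumes SQLite as the store; the table daily_usage, the function try_reserve and the cap of 20 are hypothetical values picked for the example, not our real limits.

```python
import sqlite3
from datetime import date

DAILY_CAP = 20  # hypothetical cap; tune it with real usage data


def make_usage_db():
    # isolation_level=None lets us control transactions explicitly.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE daily_usage ("
        " user_id TEXT NOT NULL,"
        " day TEXT NOT NULL,"
        " count INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, day))"
    )
    return db


def try_reserve(db, user_id, day=None, cap=DAILY_CAP):
    """Atomically count one generation; return False if the cap is reached."""
    day = day or date.today().isoformat()
    db.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        db.execute(
            "INSERT OR IGNORE INTO daily_usage (user_id, day, count)"
            " VALUES (?, ?, 0)",
            (user_id, day),
        )
        cur = db.execute(
            "UPDATE daily_usage SET count = count + 1"
            " WHERE user_id = ? AND day = ? AND count < ?",
            (user_id, day, cap),
        )
        allowed = cur.rowcount == 1  # no row updated means the cap was hit
        db.execute("COMMIT")
        return allowed
    except Exception:
        db.execute("ROLLBACK")
        raise
```

The caller generates only when try_reserve returns True. Because the increment and the cap check happen in the same UPDATE, two concurrent requests cannot both take the last slot.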

Metering, as you grow

If you serve several shops, “don’t go broke” isn’t enough: you have to measure consumption per client so you can bill it with a margin. That’s usage accounting: who generated what, how much it cost, and how it maps to their invoice. It’s another layer of the system that the customer never sees, but it decides whether the business model adds up.
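
A rough sketch of what that accounting can look like: each generation is recorded as an event with its provider cost, and invoices are computed by summing per client and applying a markup. The UsageEvent fields, the invoice_totals function and the 30% markup are assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    client_id: str  # the shop being billed
    user_id: str    # who triggered the generation
    cost: Decimal   # what the provider charged us for it


def invoice_totals(events, markup=Decimal("0.30")):
    """Sum provider cost per client and apply a markup (hypothetical 30%)."""
    cost_by_client = defaultdict(Decimal)  # Decimal() starts at 0
    for event in events:
        cost_by_client[event.client_id] += event.cost
    return {
        client: (cost * (1 + markup)).quantize(Decimal("0.01"))
        for client, cost in cost_by_client.items()
    }


events = [
    UsageEvent("shop-a", "u1", Decimal("0.50")),
    UsageEvent("shop-a", "u2", Decimal("0.50")),
    UsageEvent("shop-b", "u3", Decimal("0.20")),
]
totals = invoice_totals(events)
```

Decimal is used instead of float so that small per-generation costs add up without rounding drift on the invoice.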

The takeaway

The expensive part of generative AI isn’t the model: it’s governing how much it gets used. Rate limits, dedup, anti-abuse, metrics and metering are what separate “a cool demo” from “a service that makes money instead of losing it”.

It’s invisible work, yes. But it’s exactly what makes offering “design your product with AI” sustainable. At Taituri it’s already built and measured.

— The Taituri team