Hello curious people, welcome to the nineteenth issue of The Main Thread.
Every time a user sends a message to your AI app, you spend $0.002.
That may seem like nothing, right? It's just a fraction of a cent. Barely worth thinking about.
Until you do the math.
The Napkin Math That Changes Everything
Let's say you are building an AI customer support bot with a modest goal: 10k users, each having 5 conversations a month, with each conversation averaging 8 messages.
10,000 users * 5 conversations/user * 8 messages/conversation = 400K calls/month
Total cost = $0.002/call * 400K calls/month = $800/month
This is quite manageable. But here's where it gets interesting.
Your app grows and you hit 100k users. Conversations get longer as users trust the bot more. Suddenly:
100k users * 8 conversations/user * 12 messages/conversation = 9.6M calls/month
Total cost = $0.002/call * 9.6M calls/month = $19,200/month
Now, let's add the real kicker: that $0.002 was optimistic. Once you include system prompts, conversation history, RAG context, and tool calls, you are looking at $0.01-0.05 per interaction.
9.6M calls/month * $0.03/call = $288,000/month
That's $288k/month in API costs. And you haven't paid for servers, employees, or marketing yet.
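The napkin math above fits in a tiny cost model. The figures are the scenarios from this section, not measured prices:

```python
def monthly_api_cost(users, convos_per_user, msgs_per_convo, cost_per_call):
    """Total monthly API spend for a chat product: one API call per message."""
    calls = users * convos_per_user * msgs_per_convo
    return calls * cost_per_call

# Launch scenario: 10k users, 5 conversations, 8 messages, $0.002/call
print(monthly_api_cost(10_000, 5, 8, 0.002))   # 800.0
# Growth scenario with a realistic $0.03/call all-in cost
print(monthly_api_cost(100_000, 8, 12, 0.03))  # 288000.0
```

Swap in your own numbers; the point is that the multiplication is brutal long before you reach "big" scale.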
This is why AI startups are struggling to find unit economics. The marginal cost of serving a user never drops to zero. Every conversation costs real money.
The Token Tax On Every Feature
Here’s what most founders don’t realize: every product decision is now a pricing decision.
Let’s add conversation history for context
Without history: 500 tokens/request
With 10-message history: 3000 tokens/request
Cost increase: 6x.
Let’s use RAG to ground responses
Without RAG: 500 tokens
With RAG (5 retrieved chunks): 2500 tokens
Cost increase: 5x.
Let’s add a system prompt with personality and rules
Minimal system prompt: 100 tokens
Detailed system prompt: 800 tokens
Cost increase: adds ~$0.007 to every single request (at $0.01 per 1K tokens), forever
Let users upload documents
Processing a 10-page PDF: ~8000 tokens
If users upload 3 documents per session: $0.24/session just for document processing
Every feature that improves quality also increases cost. This is the fundamental tension of AI product development.
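The multipliers above fall straight out of the token counts. A one-liner makes the pattern obvious (500 tokens is the bare-request baseline used in this section):

```python
def cost_multiplier(total_tokens, base_tokens=500):
    """How many times more a request costs once a feature inflates the prompt."""
    return total_tokens / base_tokens

print(cost_multiplier(3000))  # 6.0 -> 10-message history
print(cost_multiplier(2500))  # 5.0 -> RAG with 5 retrieved chunks
```

Run this for every feature on your roadmap before you ship it, not after the invoice arrives.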
The Metrics That Actually Matter
Forget vanity metrics. For AI apps, these are your survival numbers:
1. Cost Per Conversation (CPC)
CPC = Total API Costs / Total Conversations
If your CPC is $0.15 and users pay $20/month for unlimited access, each user needs to have fewer than 133 conversations for you to stay profitable.
2. Token Efficiency Ratio (TER)
TER = Output Value / Tokens Consumed
Are you spending 5,000 tokens answering a question that could be answered in 500? Measure it.
3. Prompt Waste Percentage (PWP)
PWP = (Unnecessary Tokens / Total Tokens) * 100
How much of your token spend is system prompt boilerplate that never changes? How much is conversation history the model doesn't need?
Most apps I have seen run 30-50% prompt waste. That's 30-50% of your API bill doing nothing.
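All three metrics can be computed straight from your billing and logging data. A sketch, using the $0.15 CPC and 40% waste figures from this section as hypothetical inputs:

```python
def cost_per_conversation(total_api_cost, total_conversations):
    """CPC: what one conversation costs you on average."""
    return total_api_cost / total_conversations

def prompt_waste_pct(unnecessary_tokens, total_tokens):
    """PWP: the share of your token spend doing nothing."""
    return unnecessary_tokens / total_tokens * 100

print(round(cost_per_conversation(800.0, 5333), 2))  # 0.15
print(round(20 / 0.15))                              # 133 conversations to break even at $20/month
print(prompt_waste_pct(1200, 3000))                  # 40.0
```

TER is the fuzzy one, since "output value" needs a definition per product; the other two are pure arithmetic and belong on a dashboard.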
The Playbook: How To Fix Your Unit Economics
1. Compress Your System Prompt
Let’s say your system prompt has 800 tokens. It’s included in every single request. Now, multiply it by your monthly API calls.
800 tokens * 9.6M requests/month * $0.00001/token = $76,800/month
Rewrite your system prompt to 300 tokens and you save $48k/month.
Before
You are a helpful, friendly, and knowledgeable customer support assistant for Acme Corp. You should always be polite and professional. You help users with questions about their orders, shipping, returns, and product information...
[800 tokens of instructions]
After
The compressed version keeps the same behaviour with 75% fewer tokens. Forever.
2. Implement Sliding Window Context
You don’t need the entire conversation history. The last 4-6 messages usually suffice.
def get_context_window(messages, max_messages=6):
    # Always keep the system prompt plus the last N messages
    return [messages[0]] + messages[-max_messages:]
At message 20, you are only sending the last 6 messages instead of all 20. That's a 70% cost reduction.
3. Cache Aggressively
If 100 users ask the same question and you make a fresh API call every time, you are burning money. Cache it.
# Semantic cache: if the query is similar enough, return the cached response
cache_key = get_embedding(query)
cached = vector_cache.search(cache_key, threshold=0.95)
if cached:
    return cached.response  # $0.00 instead of $0.03
Even a 20% cache hit rate on common queries saves thousands per month.
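If you don't have a vector store yet, even an exact-match cache keyed on the normalized query captures a lot of repeat traffic. A minimal runnable sketch; `fake_model` stands in for your real API call:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_answer(query: str, call_model) -> str:
    """Return a cached response for repeated queries; the model runs only on a miss."""
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(query)  # the only paid call
    return _cache[key]

calls = 0
def fake_model(q):
    global calls
    calls += 1
    return f"answer to {q}"

print(cached_answer("Where is my order?", fake_model))
print(cached_answer("  where is my order?", fake_model))  # hit: no second model call
print(calls)  # 1
```

Start here, then graduate to the semantic cache above once near-duplicate phrasings show up in your logs.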
4. Right-Size Your Model
Not every request needs GPT-5.

Build a router that classifies queries and sends them to the appropriate model:
def route_query(query):
    complexity = estimate_complexity(query)
    if complexity == "simple":
        return "gpt-3.5-turbo"  # 20x cheaper
    return "gpt-4-turbo"
60-70% of queries can be handled by cheaper models.
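A workable first pass at `estimate_complexity` is a cheap heuristic, before you reach for a classifier model. Everything here (the word-count threshold, the keyword list) is illustrative, not a tuned policy:

```python
def estimate_complexity(query: str) -> str:
    """Crude heuristic: short, keyword-free questions are 'simple'."""
    hard_markers = ("why", "compare", "explain", "debug", "analyze")
    if len(query.split()) <= 12 and not any(m in query.lower() for m in hard_markers):
        return "simple"
    return "complex"

def route_query(query: str) -> str:
    if estimate_complexity(query) == "simple":
        return "gpt-3.5-turbo"  # 20x cheaper
    return "gpt-4-turbo"

print(route_query("What are your opening hours?"))                           # gpt-3.5-turbo
print(route_query("Explain why my invoice total changed after the refund"))  # gpt-4-turbo
```

The heuristic will misroute some queries; log its decisions and replace it with a small classifier once you have labeled traffic.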
5. Charge Per Value, Not Per Seat
The "unlimited usage for $20" plan is a death trap for AI apps.
Options that work:
Per conversation pricing: $0.10 per conversation
Tiered usage: 100 conversations/month, then $0.05 each
Outcome based: Free to ask, pay when AI resolves your issue
Align your revenue with your costs. If a power user costs you $50/month in API fees, they should pay more than a casual user who costs $2.
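A tiered-usage bill from the options above takes a few lines. The 100-included / $0.05-overage numbers are the article's example; the $20 base fee is my assumption:

```python
def monthly_bill(conversations: int, base_fee: float = 20.0,
                 included: int = 100, overage_price: float = 0.05) -> float:
    """Tiered usage: the flat fee covers `included` conversations, then per-unit."""
    overage = max(0, conversations - included)  # only conversations past the allowance
    return base_fee + overage * overage_price

print(monthly_bill(50))   # 20.0 -> casual user pays the base fee
print(monthly_bill(300))  # 30.0 -> power user pays for what they burn
```

The casual user's bill never changes, while the power user's bill now tracks the cost they generate, which is the whole point.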
The Uncomfortable Truth
The truth that nobody wants to admit is that at current API prices, most AI apps cannot support unlimited usage at low price points. The math doesn't work.
Scenario: $15/month subscription, unlimited usage
Average user: 50 conversations/month
Your cost at $0.10/conversation: $5
Gross margin: 67% ✓
Reality: Power users have 300+ conversations/month
Your cost: $30/month
You are paying $15/month for them to use your product
A few power users can destroy your unit economics. This is why usage-based pricing is needed for survival.
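The gap between the average user and the power user is easy to quantify per subscriber, using the scenario's figures ($15/month flat plan, $0.10 per conversation):

```python
def gross_margin(price: float, conversations: int, cost_per_convo: float = 0.10) -> float:
    """Gross margin fraction for one subscriber on a flat-price plan."""
    cost = conversations * cost_per_convo
    return (price - cost) / price

print(round(gross_margin(15, 50), 2))  # 0.67 -> average user, healthy
print(gross_margin(15, 300))           # -1.0 -> power user, you pay to serve them
```

A margin of -1.0 means the user costs you a full subscription on top of the one they paid for; a handful of them erases the margin from dozens of average users.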
Bottom Line
That $0.002 seems insignificant until you multiply it by the scale you are hoping to achieve.
Every token is a tax on growth. Every unnecessary prompt is money burned. Every feature that adds context is a pricing decision in disguise.
The AI apps that survive will be the ones that obsess over token efficiency the way previous generations obsessed over server costs and database queries.
What's the most wasteful token spend you've discovered in your app? I am collecting horror stories.
— Anirudh
