Hello curious people, welcome to the nineteenth issue of The Main Thread.
Every time a user sends a message to your AI app, you spend $0.002.
That may seem like nothing, right? It's just a fraction of a cent. Barely worth thinking about.
Until you do the math.
The Napkin Math That Changes Everything
Let's say you are building an AI customer support bot with a modest goal: 10k users, each having 5 conversations a month, with each conversation averaging 8 messages.
10,000 users * 5 conversations/user * 8 messages/conversation = 400K calls/month
Total cost = $0.002/call * 400K calls/month = $800/month
This is quite manageable. But here's where it gets interesting.
Your app grows and you hit 100k users. Conversations get longer as users trust the bot more. Suddenly:
100k users * 8 conversations/user * 12 messages/conversation = 9.6M calls/month
Total cost = $0.002/call * 9.6M calls/month = $19,200/month
Now, let's add the real kicker: that $0.002 was optimistic. Once you include system prompts, conversation history, RAG context, and tool calls, you are looking at $0.01-0.05 per interaction.
9.6M calls/month * $0.03/call = $288,000/month
That's $288k/month in API costs. And you haven't paid for servers, employees, or marketing yet.
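The napkin math above fits in a tiny cost model. The figures are the scenarios from this section, not measured prices:

```python
def monthly_api_cost(users, convos_per_user, msgs_per_convo, cost_per_call):
    """Total monthly API spend for a chat product: one API call per message."""
    calls = users * convos_per_user * msgs_per_convo
    return calls * cost_per_call

# Launch scenario: 10k users, 5 conversations, 8 messages, $0.002/call
print(monthly_api_cost(10_000, 5, 8, 0.002))   # 800.0
# Growth scenario with a realistic $0.03/call all-in cost
print(monthly_api_cost(100_000, 8, 12, 0.03))  # 288000.0
```

Swap in your own numbers; the point is that the multiplication is brutal long before you reach "big" scale.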
This is why AI startups are struggling to find unit economics. The marginal cost of serving a user never drops to zero. Every conversation costs real money.
The Token Tax On Every Feature
Here’s what most founders don’t realize: every product decision is now a pricing decision.
Let’s add conversation history for context
Without history: 500 tokens/request
With 10-message history: 3000 tokens/request
Cost increase: 6x.
Let’s use RAG to ground responses
Without RAG: 500 tokens
With RAG (5 retrieved chunks): 2500 tokens
Cost increase: 5x.
Let’s add a system prompt with personality and rules
Minimal system prompt: 100 tokens
Detailed system prompt: 800 tokens
Cost increase: adds ~$0.007 to every single request (at $0.01 per 1K tokens), forever
Let users upload documents
Processing a 10-page PDF: ~8000 tokens
If users upload 3 documents per session: $0.24/session just for document processing
Every feature that improves quality also increases cost. This is the fundamental tension of AI product development.
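The multipliers above fall straight out of the token counts. A one-liner makes the pattern obvious (500 tokens is the bare-request baseline used in this section):

```python
def cost_multiplier(total_tokens, base_tokens=500):
    """How many times more a request costs once a feature inflates the prompt."""
    return total_tokens / base_tokens

print(cost_multiplier(3000))  # 6.0 -> 10-message history
print(cost_multiplier(2500))  # 5.0 -> RAG with 5 retrieved chunks
```

Run this for every feature on your roadmap before you ship it, not after the invoice arrives.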
The Metrics That Actually Matter
Forget vanity metrics. For AI apps, these are your survival numbers:
1. Cost Per Conversation (CPC)
CPC = Total API Costs / Total Conversations
If your CPC is $0.15 and users pay $20/month for unlimited access, each user needs to have fewer than 133 conversations for you to stay profitable.
2. Token Efficiency Ratio (TER)
TER = Output Value / Tokens Consumed
Are you spending 5,000 tokens answering a question that could be answered in 500? Measure it.
3. Prompt Waste Percentage (PWP)
PWP = (Unnecessary Tokens / Total Tokens) * 100
How much of your token spend is system prompt boilerplate that never changes? How much is conversation history the model doesn't need?
Most apps I have seen run 30-50% prompt waste. That's 30-50% of your API bill doing nothing.
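All three metrics can be computed straight from your billing and logging data. A sketch, using the $0.15 CPC and 40% waste figures from this section as hypothetical inputs:

```python
def cost_per_conversation(total_api_cost, total_conversations):
    """CPC: what one conversation costs you on average."""
    return total_api_cost / total_conversations

def prompt_waste_pct(unnecessary_tokens, total_tokens):
    """PWP: the share of your token spend doing nothing."""
    return unnecessary_tokens / total_tokens * 100

print(round(cost_per_conversation(800.0, 5333), 2))  # 0.15
print(round(20 / 0.15))                              # 133 conversations to break even at $20/month
print(prompt_waste_pct(1200, 3000))                  # 40.0
```

TER is the fuzzy one, since "output value" needs a definition per product; the other two are pure arithmetic and belong on a dashboard.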
The Playbook: How To Fix Your Unit Economics
1. Compress Your System Prompt
Let’s say your system prompt has 800 tokens. It’s included in every single request. Now, multiply it by your monthly API calls.
800 tokens * 9.6M requests/month * $0.00001/token = $76,800/month
Rewrite your system prompt to 300 tokens and you save $48k/month.
Before
You are a helpful, friendly, and knowledgeable customer support assistant for Acme Corp. You should always be polite and professional. You help users with questions about their orders, shipping, returns, and product information...
[800 tokens of instructions]
After
The compressed version keeps the same behaviour with 75% fewer tokens. Forever.
2. Implement Sliding Window Context
You don’t need the entire conversation history. The last 4-6 messages usually suffice.
def get_context_window(messages, max_messages=6):
    # Always keep the system prompt plus the last N messages
    return [messages[0]] + messages[-max_messages:]
At message 20, you are only sending the last 6 messages instead of all 20. That's a 70% cost reduction.
3. Cache Aggressively
If 100 users ask the same question and you make a fresh API call every time, you are burning money. Cache it.
# Semantic cache: if the query is similar enough, return the cached response
cache_key = get_embedding(query)
cached = vector_cache.search(cache_key, threshold=0.95)
if cached:
    return cached.response  # $0.00 instead of $0.03
Even a 20% cache hit rate on common queries saves thousands per month.
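If you don't have a vector store yet, even an exact-match cache keyed on the normalized query captures a lot of repeat traffic. A minimal runnable sketch; `fake_model` stands in for your real API call:

```python
import hashlib

_cache: dict[str, str] = {}

def cached_answer(query: str, call_model) -> str:
    """Return a cached response for repeated queries; the model runs only on a miss."""
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(query)  # the only paid call
    return _cache[key]

calls = 0
def fake_model(q):
    global calls
    calls += 1
    return f"answer to {q}"

print(cached_answer("Where is my order?", fake_model))
print(cached_answer("  where is my order?", fake_model))  # hit: no second model call
print(calls)  # 1
```

Start here, then graduate to the semantic cache above once near-duplicate phrasings show up in your logs.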
4. Right-Size Your Model
Not every request needs GPT-5.

Build a router that classifies queries and sends them to the appropriate model:
def route_query(query):
    complexity = estimate_complexity(query)
    if complexity == "simple":
        return "gpt-3.5-turbo"  # 20x cheaper
    return "gpt-4-turbo"
60-70% of queries can be handled by cheaper models.
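A workable first pass at `estimate_complexity` is a cheap heuristic, before you reach for a classifier model. Everything here (the word-count threshold, the keyword list) is illustrative, not a tuned policy:

```python
def estimate_complexity(query: str) -> str:
    """Crude heuristic: short, keyword-free questions are 'simple'."""
    hard_markers = ("why", "compare", "explain", "debug", "analyze")
    if len(query.split()) <= 12 and not any(m in query.lower() for m in hard_markers):
        return "simple"
    return "complex"

def route_query(query: str) -> str:
    if estimate_complexity(query) == "simple":
        return "gpt-3.5-turbo"  # 20x cheaper
    return "gpt-4-turbo"

print(route_query("What are your opening hours?"))                           # gpt-3.5-turbo
print(route_query("Explain why my invoice total changed after the refund"))  # gpt-4-turbo
```

The heuristic will misroute some queries; log its decisions and replace it with a small classifier once you have labeled traffic.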
5. Charge Per Value, Not Per Seat
The "unlimited usage for $20" plan is a death trap for AI apps.
Options that work:
Per conversation pricing: $0.10 per conversation
Tiered usage: 100 conversations/month, then $0.05 each
Outcome based: Free to ask, pay when AI resolves your issue
Align your revenue with your costs. If a power user costs you $50/month in API fees, they should pay more than a casual user who costs $2.
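A tiered-usage bill from the options above takes a few lines. The 100-included / $0.05-overage numbers are the article's example; the $20 base fee is my assumption:

```python
def monthly_bill(conversations: int, base_fee: float = 20.0,
                 included: int = 100, overage_price: float = 0.05) -> float:
    """Tiered usage: the flat fee covers `included` conversations, then per-unit."""
    overage = max(0, conversations - included)  # only conversations past the allowance
    return base_fee + overage * overage_price

print(monthly_bill(50))   # 20.0 -> casual user pays the base fee
print(monthly_bill(300))  # 30.0 -> power user pays for what they burn
```

The casual user's bill never changes, while the power user's bill now tracks the cost they generate, which is the whole point.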
The Uncomfortable Truth
The truth that nobody wants to admit is that at current API prices, most AI apps cannot support unlimited usage at low price points. The math doesn't work.
Scenario: $15/month subscription, unlimited usage
Average user: 50 conversations/month
Your cost at $0.10/conversation: $5
Gross margin: 67% ✓
Reality: Power users have 300+ conversations/month
Your cost: $30/month
You are paying $15/month for them to use your product
A few power users can destroy your unit economics. This is why usage-based pricing is needed for survival.
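The gap between the average user and the power user is easy to quantify per subscriber, using the scenario's figures ($15/month flat plan, $0.10 per conversation):

```python
def gross_margin(price: float, conversations: int, cost_per_convo: float = 0.10) -> float:
    """Gross margin fraction for one subscriber on a flat-price plan."""
    cost = conversations * cost_per_convo
    return (price - cost) / price

print(round(gross_margin(15, 50), 2))  # 0.67 -> average user, healthy
print(gross_margin(15, 300))           # -1.0 -> power user, you pay to serve them
```

A margin of -1.0 means the user costs you a full subscription on top of the one they paid for; a handful of them erases the margin from dozens of average users.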
Bottom Line
That $0.002 seems insignificant until you multiply it by the scale you are hoping to achieve.
Every token is a tax on growth. Every unnecessary prompt is money burned. Every feature that adds context is a pricing decision in disguise.
The AI apps that survive will be the ones that obsess over token efficiency the way previous generations obsessed over server costs and database queries.
What's the most wasteful token spend you've discovered in your app? I am collecting horror stories.
— Anirudh
