When building AI agents, the focus is usually on making them work, with little strategic attention paid to cost. Token economics becomes crucial once you consider monetization, because agents can consume a large number of tokens even on simple tasks. Here are some key observations and ideas on this topic for startups.

Observation

Hard to make a profit. For a consumer product, it is hard to make a profit on a $20/month subscription, especially when the product is built on top of the existing premium models (priced in a later section).
Token costs will drop. Based on the current trend, token prices have been dropping every few months as new models are released (compare the model prices in a later section). Several factors drive the drop:
- New, more advanced models and optimizations to existing models
- Self-hosted open-source models that perform better and take on a growing share of tasks
- More self-trained, customized models for specific tasks, which improves efficiency
AI agents consume more tokens. AI agents can consume a lot of tokens even for simple tasks because of function calling (tool use), customized system prompts, additional context, summarization and memory, and trial and error.
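
To make the subscription math concrete, here is a rough sketch in Python. It uses the Claude 3.5 Sonnet prices listed later in this post; the usage numbers (calls per task, tokens per call, tasks per day) are purely hypothetical assumptions, not measurements.

```python
# Prices quoted later in this post for Claude 3.5 Sonnet (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one agent task, given its total token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def monthly_cost(tasks_per_day: int, input_tokens: int, output_tokens: int,
                 days: int = 30) -> float:
    """Cost in USD of one user's usage over a month."""
    return tasks_per_day * days * task_cost(input_tokens, output_tokens)


# Hypothetical agent task: 5 model calls, each re-sending about 8,000 tokens
# of system prompt, tool definitions, and context, and producing 500 tokens.
CALLS_PER_TASK = 5
task_input_tokens = CALLS_PER_TASK * 8_000
task_output_tokens = CALLS_PER_TASK * 500

per_task = task_cost(task_input_tokens, task_output_tokens)
per_month = monthly_cost(10, task_input_tokens, task_output_tokens)
print(f"per task: ${per_task:.4f}, per user per month: ${per_month:.2f}")
```

Under these assumed numbers, a user running 10 tasks a day costs about $47 a month in tokens alone, more than double a $20 subscription. Real usage will differ, so treat the inputs as knobs to replace with your own measurements.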

Ways to optimize

Reserve expensive premium models for business-critical areas.
Combine multiple models in the workflow; some components might not need a premium model.
Self-host models, or run open-source models on lower-cost infrastructure.
- Groq, fast AI inference on open-source models
- Paperspace by DigitalOcean
- Vast.ai, low-cost cloud GPU rental
- OpenLLM by BentoML
Keep the system prompt, additional context, and tool use relevant and concise.
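
One way to combine multiple models in a workflow is to route each step to a tier based on how critical it is. The sketch below is only an illustration: the tier names, the step names, and the budget-tier prices are hypothetical placeholders, and the premium prices simply reuse the $3 / $15 per million tokens figure from the pricing section.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class ModelTier:
    name: str
    input_price_per_mtok: float   # USD per million input tokens
    output_price_per_mtok: float  # USD per million output tokens


# Placeholder tiers: names and budget prices are assumptions for illustration.
PREMIUM = ModelTier("premium-model", 3.00, 15.00)
BUDGET = ModelTier("budget-model", 0.25, 1.25)

# Hypothetical step names that are considered business-critical.
CRITICAL_STEPS = {"plan", "final_answer"}


def pick_model(step: str) -> ModelTier:
    """Send critical steps to the premium tier, everything else to budget."""
    return PREMIUM if step in CRITICAL_STEPS else BUDGET


def workflow_cost(steps: Iterable[Tuple[str, int, int]],
                  router: Callable[[str], ModelTier] = pick_model) -> float:
    """Total USD cost for steps given as (name, input_tokens, output_tokens)."""
    total = 0.0
    for step, input_tokens, output_tokens in steps:
        model = router(step)
        total += (
            input_tokens * model.input_price_per_mtok
            + output_tokens * model.output_price_per_mtok
        ) / 1_000_000
    return total


steps = [
    ("plan", 4_000, 500),
    ("summarize", 8_000, 400),
    ("final_answer", 3_000, 800),
]
routed = workflow_cost(steps)
all_premium = workflow_cost(steps, router=lambda step: PREMIUM)
print(f"routed: ${routed:.4f}, all premium: ${all_premium:.4f}")
```

With these placeholder figures the routed workflow costs about $0.043 per run versus about $0.0705 when every step uses the premium tier. The saving depends entirely on how much of the workload can tolerate the cheaper model.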

Major Premium Models

Claude 3.5 Sonnet

$3 / 1M input tokens
$15 / 1M output tokens

Gemini 1.5 Pro

$5 / 1M input tokens
- Input: $0.00125 / 1k characters
- Approximately 4 characters per token
$15 / 1M output tokens
- Output: $0.00375 / 1k characters
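
The per-token figures above follow from the per-character prices and the rough 4-characters-per-token ratio. A small sketch of that conversion (the function name is just an illustrative choice):

```python
CHARS_PER_TOKEN = 4  # approximation stated above; varies with language and content


def per_1k_chars_to_per_mtok(price_per_1k_chars: float,
                             chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Convert a USD price per 1,000 characters to USD per million tokens."""
    price_per_1k_tokens = price_per_1k_chars * chars_per_token
    return price_per_1k_tokens * 1_000


gemini_input = per_1k_chars_to_per_mtok(0.00125)
gemini_output = per_1k_chars_to_per_mtok(0.00375)
print(f"input: ${gemini_input:.2f} / 1M tokens, output: ${gemini_output:.2f} / 1M tokens")
```

Because the ratio is only approximate, the per-token numbers are best used for comparison with other models rather than for exact billing estimates.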

GPT-4o

$5.00 / 1M input tokens
$15.00 / 1M output tokens

Resources

Model Price Compare by context.ai
What you need to know about Self-Hosting Large Language Models (LLMs)
