When building AI agents, the focus is usually on making them work, with little strategic attention to cost. However, token economics become crucial when considering monetization models, especially for AI agents, because even simple tasks can consume a lot of tokens. Here are some key observations and ideas on this topic for startups.
Observation
- Hard to make a profit. For a consumer product, a $20/month subscription is hard to make profitable, especially when the product is built on top of existing premium models (as described in a later section).
- Token costs will drop. Based on the current trend, token costs have been dropping every few months, whenever a new model is released (see the model prices in a later section). Costs drop for several reasons:
  - New, more advanced models and optimizations to existing models.
  - Self-hosted open-source models keep improving and take on an increasing share of tasks.
  - More self-trained, customized models for specific tasks, which improve efficiency.
- AI agents consume more tokens. AI agents can consume a lot of tokens even for simple tasks because of function calling (tool use), customized system prompts, additional context, summarization and memory, and trial and error.
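
To make the observations above concrete, here is a minimal sketch of how token costs add up for one agent task. It uses the $3 / $15 per MTok prices listed later in this post. Every token count, the step count, and the usage level are hypothetical assumptions chosen for illustration, and the model of "each step resends the system prompt, context, and all earlier outputs" is a simplification, not a description of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def task_cost(pricing, system_tokens, context_tokens, output_tokens_per_step, steps):
    """Estimate the cost of one agent task.

    Assumption: every step resends the system prompt and context,
    plus the outputs of all earlier steps (tool calls, retries, notes).
    """
    total_in = 0
    total_out = 0
    history = 0
    for _ in range(steps):
        total_in += system_tokens + context_tokens + history
        total_out += output_tokens_per_step
        history += output_tokens_per_step  # earlier outputs come back as input
    cost = total_in * pricing.input_per_mtok + total_out * pricing.output_per_mtok
    return cost / 1_000_000


# Prices taken from the premium model list in this post.
premium = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)

# Hypothetical workload: 6 steps per task, 10 tasks per day, 30 days.
per_task = task_cost(premium, system_tokens=1500, context_tokens=2500,
                     output_tokens_per_step=500, steps=6)
per_month = per_task * 10 * 30

print(f"per task:  ${per_task:.4f}")
print(f"per month: ${per_month:.2f}")
```

With these made-up numbers, a single task costs about $0.14 and a month of moderate use costs about $42 per user, which is roughly double a $20/month subscription before any other expenses.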
Ways to optimize
- Reserve expensive premium models for critical business areas.
- Combine multiple models in the workflow; some components might not need a premium model.
- Self-host models. Some options:
  - Groq: fast AI inference on open-source models
  - Paperspace by DigitalOcean
  - Vast.ai: low-cost cloud GPU rental
  - OpenLLM by BentoML
- Keep the system prompt, additional context, and tool use relevant and concise.
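
The idea of combining models in one workflow can be sketched as a small router that sends only critical steps to the premium model. This is an illustration under stated assumptions, not a reference implementation: the tier names, the `critical` flag, the step list, and the budget-tier prices are all hypothetical. Only the $3 / $15 premium prices come from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


@dataclass(frozen=True)
class Step:
    name: str
    critical: bool          # hypothetical flag set by the workflow designer
    input_tokens: int
    output_tokens: int


PREMIUM = ModelTier("premium-model", 3.00, 15.00)  # prices from this post
BUDGET = ModelTier("budget-model", 0.25, 1.25)     # hypothetical prices


def pick_tier(step):
    """Route critical steps to the premium tier, everything else to budget."""
    return PREMIUM if step.critical else BUDGET


def workflow_cost(steps, router=pick_tier):
    """Sum the cost in USD of all steps under the given routing rule."""
    total = 0.0
    for step in steps:
        tier = router(step)
        total += (step.input_tokens * tier.input_per_mtok
                  + step.output_tokens * tier.output_per_mtok) / 1_000_000
    return total


# Hypothetical three-step workflow.
steps = [
    Step("classify request", critical=False, input_tokens=2000, output_tokens=50),
    Step("summarize history", critical=False, input_tokens=6000, output_tokens=400),
    Step("plan and answer", critical=True, input_tokens=4000, output_tokens=800),
]

mixed = workflow_cost(steps)
all_premium = workflow_cost(steps, router=lambda step: PREMIUM)

print(f"mixed routing: ${mixed:.4f}")
print(f"all premium:   ${all_premium:.4f}")
```

In this made-up workflow, mixed routing costs roughly half of running every step on the premium tier. The actual saving depends entirely on how many steps can tolerate a cheaper model.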
Major Premium Models
- $3 / MTok input tokens
- $15 / MTok output tokens
- $5 / M input tokens ($0.00125 / 1k characters, at approximately 4 characters per token)
- $15 / M output tokens ($0.00375 / 1k characters)
- $5.00 / 1M input tokens
- $15.00 / 1M output tokens
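
Some of the prices above are quoted per 1k characters rather than per token. A small helper makes them comparable, assuming the approximate 4 characters per token noted in the list. The function name is hypothetical, and the real ratio varies with language and tokenizer.

```python
def per_1k_chars_to_per_mtok(price_per_1k_chars, chars_per_token=4.0):
    """Convert a USD price per 1k characters to USD per million tokens.

    chars_per_token is an approximation; 4 is the rough figure used in this post.
    """
    price_per_char = price_per_1k_chars / 1000
    price_per_token = price_per_char * chars_per_token
    return price_per_token * 1_000_000


input_price = per_1k_chars_to_per_mtok(0.00125)
output_price = per_1k_chars_to_per_mtok(0.00375)

print(f"input:  ${input_price:.2f} / M tokens")
print(f"output: ${output_price:.2f} / M tokens")
```

Under that assumption, the per-character prices work out to about $5 and $15 per million tokens, in line with the per-token figures in the list.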