Enter your expected token usage. See what it costs across 731 models from 9 providers.
| Provider | Model | Input | Output | Per request | Monthly |
|---|---|---|---|---|---|
| fireworks | SSD-1B | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | japanese-stable-diffusion-xl | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | playground-v2-1024px-aesthetic | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | playground-v2-5-1024px-aesthetic | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | stable-diffusion-xl-1024-v1-0 | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | flux-1-schnell-fp8 | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | flux-1-dev-fp8 | <$0.001 | <$0.001 | <$0.001 | <$0.001 |
| fireworks | flux-1-dev-controlnet-union | <$0.001 | <$0.001 | <$0.001 | $0.0015 |
| groq | llama-3.2-1b-preview | <$0.001 | <$0.001 | <$0.001 | $0.060 |
| fireworks | flux-kontext-pro | <$0.001 | <$0.001 | <$0.001 | $0.060 |
| mistral | ministral-3b | <$0.001 | <$0.001 | <$0.001 | $0.060 |
| groq | llama-3.1-8b-instant | <$0.001 | <$0.001 | <$0.001 | $0.090 |
| groq | llama3-8b-8192 | <$0.001 | <$0.001 | <$0.001 | $0.090 |
| mistral | mistral-small-24b-instruct-2501 | <$0.001 | <$0.001 | <$0.001 | $0.090 |
| groq | llama-3.2-3b-preview | <$0.001 | <$0.001 | <$0.001 | $0.090 |
| groq | gemma-7b-it | <$0.001 | <$0.001 | <$0.001 | $0.105 |
| gemini | gemini-flash-1.5-8b | <$0.001 | <$0.001 | <$0.001 | $0.113 |
| mistral | devstral-small | <$0.001 | <$0.001 | <$0.001 | $0.120 |
| fireworks | flux-kontext-max | <$0.001 | <$0.001 | <$0.001 | $0.120 |
| mistral | mistral-small-3-2-2506 | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | gpt-oss-20b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | Qwen/Qwen1.5-0.5B | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | Qwen/Qwen1.5-1.8B | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | Qwen/Qwen1.5-4B | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | google/gemma-2b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | meta-llama/Meta-Llama-3-8B-Instruct-Lite | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | microsoft/phi-2 | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | togethercomputer/RedPajama-INCITE-Base-3B-v1 | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | togethercomputer/RedPajama-INCITE-Chat-3B-v1 | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| together | together-ai-up-to-4b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | gemma-3-27b-it | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p2-1b-instruct | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p2-3b-instruct | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | codegemma-2b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | cogito-v1-preview-llama-3b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | deepseek-coder-1b-base | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | deepseek-r1-distill-qwen-1p5b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | ernie-4p5-21b-a3b-pt | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | ernie-4p5-300b-a47b-pt | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | flux-1-dev | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | flux-1-schnell | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | gemma-2b-it | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-guard-3-1b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v2-70b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p1-405b-instruct-long | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p1-70b-instruct-1b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p2-1b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | llama-v3p2-3b | <$0.001 | <$0.001 | <$0.001 | $0.150 |
| fireworks | minimax-m1-80k | <$0.001 | <$0.001 | <$0.001 | $0.150 |
Showing the 50 cheapest models out of 731 total across 9 providers. Data from pricing.json, updated weekly. Full table | API
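The per-request and monthly columns can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes prices are quoted in USD per million tokens and that monthly cost is per-request cost times requests per day times 30 days. The `Pricing` class, the function names, and the rates in the usage note are hypothetical and not taken from pricing.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price record, assumed to be USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_request(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request with the given token counts."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Assumed definition: per-request cost times request volume over `days` days."""
    return cost_per_request(pricing, input_tokens, output_tokens) * requests_per_day * days


def format_usd(amount: float) -> str:
    """Show tiny non-zero amounts as '<$0.001', loosely mirroring the table (an assumption)."""
    if 0 < amount < 0.001:
        return "<$0.001"
    return f"${amount:.3f}"
```

For example, with made-up rates of $0.05 (input) and $0.08 (output) per million tokens, a request with 1,000 input and 500 output tokens costs $0.00009, and 100 such requests per day for 30 days cost $0.27.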
LLMKit tracks actual costs per request, per session, per user. Budget limits reject requests before they reach the provider.
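A budget limit that rejects requests before they reach the provider could look roughly like the following. This is a minimal sketch of the general idea, not LLMKit's actual API: `BudgetGuard`, `BudgetExceeded`, and `guarded_call` are hypothetical names, and the `send` callable stands in for whatever function actually calls the provider.

```python
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the limit."""


class BudgetGuard:
    """Hypothetical in-memory budget tracker for one user or session."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def check(self, estimated_cost_usd: float) -> None:
        # Reject up front, before any network call is made.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"estimated ${estimated_cost_usd:.4f} would exceed "
                f"limit ${self.limit_usd:.4f} (spent ${self.spent_usd:.4f})"
            )

    def record(self, actual_cost_usd: float) -> None:
        # Track the actual cost reported after the call completes.
        self.spent_usd += actual_cost_usd


def guarded_call(guard: BudgetGuard, estimated_cost_usd: float,
                 send: Callable[[], Tuple[str, float]]) -> str:
    """Run `send` only if the budget allows it; `send` returns (response, actual_cost)."""
    guard.check(estimated_cost_usd)
    response, actual_cost_usd = send()
    guard.record(actual_cost_usd)
    return response
```

The check runs against an estimate because the real cost is only known after the provider responds; the guard then records the actual figure so later checks use real spending.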
MIT licensed. Built with Claude Code. Source on GitHub