LLM API Pricing Comparison

731 models across 9 providers. Prices per 1M tokens in USD. Updated weekly.

Providers: anthropic, deepseek, fireworks, gemini, groq, mistral, openai, together, xai. Data sourced from official pricing pages. Use the free API for programmatic access.
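Because every price is quoted per 1M tokens, the cost of a single request is a small piece of arithmetic. The sketch below is illustrative only: `request_cost_usd` is a hypothetical helper, not part of any provider SDK or of the API mentioned above, and it assumes the Cache column is the price for cached input tokens (the tables do not state this explicitly). Prices are passed as strings so `Decimal` can represent them exactly.

```python
from decimal import Decimal
from typing import Optional

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: str,
    output_price: str,
    cached_tokens: int = 0,
    cache_price: Optional[str] = None,
) -> Decimal:
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumption: cached_tokens is the part of input_tokens billed at the
    Cache price. When no cache price is listed ("-" in the tables), cached
    tokens are billed at the regular input price.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cache_price is None:
        cache_price = input_price
    fresh_tokens = input_tokens - cached_tokens
    total = (
        Decimal(fresh_tokens) * Decimal(input_price)
        + Decimal(cached_tokens) * Decimal(cache_price)
        + Decimal(output_tokens) * Decimal(output_price)
    )
    return total / TOKENS_PER_UNIT


# claude-sonnet-4-6 is listed at $3 input, $15 output, $0.3 cache.
no_cache = request_cost_usd(12_000, 800, "3", "15")
with_cache = request_cost_usd(12_000, 800, "3", "15",
                              cached_tokens=10_000, cache_price="0.3")
print(no_cache, with_cache)
```

With these listed prices, a 12,000-token prompt and an 800-token reply come to $0.048 without caching and $0.021 when 10,000 of the prompt tokens are billed at the cache rate.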

anthropic

29 models

Model | Input $/1M | Output $/1M | Cache $/1M
claude-3-haiku-20240307 | $0.25 | $1.25 | $0.03
claude-3-haiku | $0.25 | $1.25 | $0.03
claude-3-5-haiku-20241022 | $0.8 | $4 | $0.08
claude-3-5-haiku-latest | $0.8 | $4 | $0.08
claude-haiku-4-5 | $1 | $5 | $0.1
claude-haiku-4-5-20251001 | $1 | $5 | $0.1
claude-sonnet-4-6 | $3 | $15 | $0.3
claude-sonnet-4-5 | $3 | $15 | $0.3
claude-sonnet-4-20250514 | $3 | $15 | $0.3
claude-3-5-sonnet | $3 | $15 | $0.3
claude-3-7-sonnet-latest | $3 | $15 | $0.3
claude-3-sonnet | $3 | $15 | $0.3
claude-sonnet-4-0 | $3 | $15 | $0.3
claude-3-7-sonnet-20250219 | $3 | $15 | $0.3
claude-4-sonnet-20250514 | $3 | $15 | $0.3
claude-sonnet-4-5-20250929 | $3 | $15 | $0.3
claude-opus-4-6 | $5 | $25 | $0.5
claude-opus-4-5 | $5 | $25 | $0.5
claude-opus-4-5-20251101 | $5 | $25 | $0.5
claude-opus-4-6-20260205 | $5 | $25 | $0.5
claude-2 | $8 | $24 | -
claude-v1 | $8 | $24 | -
claude-opus-4-20250514 | $15 | $75 | $1.5
claude-3-opus-latest | $15 | $75 | $1.5
claude-opus-4-0 | $15 | $75 | $1.5
claude-opus-4-1 | $15 | $75 | $1.5
claude-3-opus-20240229 | $15 | $75 | $1.5
claude-4-opus-20250514 | $15 | $75 | $1.5
claude-opus-4-1-20250805 | $15 | $75 | $1.5

deepseek

6 models

Model | Input $/1M | Output $/1M | Cache $/1M
deepseek-coder | $0.14 | $0.28 | -
deepseek-v3 | $0.27 | $1.1 | $0.07
deepseek-chat | $0.28 | $0.42 | $0.028
deepseek-reasoner | $0.28 | $0.42 | $0.028
deepseek-v3.2 | $0.28 | $0.42 | $0.028
deepseek-r1 | $0.55 | $2.19 | -

fireworks

257 models

Model | Input $/1M | Output $/1M | Cache $/1M
SSD-1B | $0.0001 | $0.0001 | -
japanese-stable-diffusion-xl | $0.0001 | $0.0001 | -
playground-v2-1024px-aesthetic | $0.0001 | $0.0001 | -
playground-v2-5-1024px-aesthetic | $0.0001 | $0.0001 | -
stable-diffusion-xl-1024-v1-0 | $0.0001 | $0.0001 | -
flux-1-schnell-fp8 | $0.0003 | $0.0003 | -
flux-1-dev-fp8 | $0.0005 | $0.0005 | -
flux-1-dev-controlnet-union | $0.001 | $0.001 | -
flux-kontext-pro | $0.04 | $0.04 | -
gpt-oss-20b | $0.07 | $0.3 | $0.04
flux-kontext-max | $0.08 | $0.08 | -
gemma-3-27b-it | $0.1 | $0.1 | -
llama-v3p2-1b-instruct | $0.1 | $0.1 | -
llama-v3p2-3b-instruct | $0.1 | $0.1 | -
codegemma-2b | $0.1 | $0.1 | -
cogito-v1-preview-llama-3b | $0.1 | $0.1 | -
deepseek-coder-1b-base | $0.1 | $0.1 | -
deepseek-r1-distill-qwen-1p5b | $0.1 | $0.1 | -
ernie-4p5-21b-a3b-pt | $0.1 | $0.1 | -
ernie-4p5-300b-a47b-pt | $0.1 | $0.1 | -
flux-1-dev | $0.1 | $0.1 | -
flux-1-schnell | $0.1 | $0.1 | -
gemma-2b-it | $0.1 | $0.1 | -
llama-guard-3-1b | $0.1 | $0.1 | -
llama-v2-70b | $0.1 | $0.1 | -
llama-v3p1-405b-instruct-long | $0.1 | $0.1 | -
llama-v3p1-70b-instruct-1b | $0.1 | $0.1 | -
llama-v3p2-1b | $0.1 | $0.1 | -
llama-v3p2-3b | $0.1 | $0.1 | -
minimax-m1-80k | $0.1 | $0.1 | -
ministral-3-3b-instruct-2512 | $0.1 | $0.1 | -
nemotron-nano-v2-12b-vl | $0.1 | $0.1 | -
phi-2-3b | $0.1 | $0.1 | -
phi-3-mini-128k-instruct | $0.1 | $0.1 | -
qwen2-vl-2b-instruct | $0.1 | $0.1 | -
qwen2p5-0p5b-instruct | $0.1 | $0.1 | -
qwen2p5-1p5b-instruct | $0.1 | $0.1 | -
qwen2p5-coder-0p5b | $0.1 | $0.1 | -
qwen2p5-coder-0p5b-instruct | $0.1 | $0.1 | -
qwen2p5-coder-1p5b | $0.1 | $0.1 | -
qwen2p5-coder-1p5b-instruct | $0.1 | $0.1 | -
qwen2p5-coder-3b | $0.1 | $0.1 | -
qwen2p5-coder-3b-instruct | $0.1 | $0.1 | -
qwen3-0p6b | $0.1 | $0.1 | -
qwen3-1p7b | $0.1 | $0.1 | -
qwen3-1p7b-fp8-draft | $0.1 | $0.1 | -
qwen3-1p7b-fp8-draft-131072 | $0.1 | $0.1 | -
qwen3-1p7b-fp8-draft-40960 | $0.1 | $0.1 | -
stablecode-3b | $0.1 | $0.1 | -
starcoder2-3b | $0.1 | $0.1 | -
gpt-oss-120b | $0.15 | $0.6 | $0.07
llama4-scout-instruct-basic | $0.15 | $0.6 | -
qwen3-30b-a3b | $0.15 | $0.6 | -
qwen3-coder-30b-a3b-instruct | $0.15 | $0.6 | -
qwen3-vl-30b-a3b-instruct | $0.15 | $0.6 | -
qwen3-vl-30b-a3b-thinking | $0.15 | $0.6 | -
accounts/fireworks/models/llama-v3p1-8b-instruct | $0.2 | $0.2 | $0.1
llama-v3p1-8b-instruct | $0.2 | $0.2 | -
fireworks-ai-4.1b-to-16b | $0.2 | $0.2 | -
fireworks-ai-up-to-4b | $0.2 | $0.2 | -
llama-v3p2-11b-vision-instruct | $0.2 | $0.2 | -
chronos-hermes-13b-v2 | $0.2 | $0.2 | -
code-llama-13b | $0.2 | $0.2 | -
code-llama-13b-instruct | $0.2 | $0.2 | -
code-llama-13b-python | $0.2 | $0.2 | -
code-llama-7b | $0.2 | $0.2 | -
code-llama-7b-instruct | $0.2 | $0.2 | -
code-llama-7b-python | $0.2 | $0.2 | -
code-qwen-1p5-7b | $0.2 | $0.2 | -
codegemma-7b | $0.2 | $0.2 | -
cogito-v1-preview-llama-8b | $0.2 | $0.2 | -
cogito-v1-preview-qwen-14b | $0.2 | $0.2 | -
deepseek-coder-7b-base | $0.2 | $0.2 | -
deepseek-coder-7b-base-v1p5 | $0.2 | $0.2 | -
deepseek-coder-7b-instruct-v1p5 | $0.2 | $0.2 | -
deepseek-r1-0528-distill-qwen3-8b | $0.2 | $0.2 | -
deepseek-r1-distill-llama-8b | $0.2 | $0.2 | -
deepseek-r1-distill-qwen-14b | $0.2 | $0.2 | -
deepseek-r1-distill-qwen-7b | $0.2 | $0.2 | -
dobby-mini-unhinged-plus-llama-3-1-8b | $0.2 | $0.2 | -
firellava-13b | $0.2 | $0.2 | -
firesearch-ocr-v6 | $0.2 | $0.2 | -
gemma-7b | $0.2 | $0.2 | -
gemma-7b-it | $0.2 | $0.2 | -
gemma2-9b-it | $0.2 | $0.2 | -
hermes-2-pro-mistral-7b | $0.2 | $0.2 | -
internvl3-8b | $0.2 | $0.2 | -
llama-guard-2-8b | $0.2 | $0.2 | -
llama-guard-3-8b | $0.2 | $0.2 | -
llama-v2-13b | $0.2 | $0.2 | -
llama-v2-13b-chat | $0.2 | $0.2 | -
llama-v2-7b | $0.2 | $0.2 | -
llama-v2-7b-chat | $0.2 | $0.2 | -
llama-v3-8b | $0.2 | $0.2 | -
llama-v3-8b-instruct-hf | $0.2 | $0.2 | -
llamaguard-7b | $0.2 | $0.2 | -
ministral-3-14b-instruct-2512 | $0.2 | $0.2 | -
ministral-3-8b-instruct-2512 | $0.2 | $0.2 | -
mistral-7b | $0.2 | $0.2 | -
mistral-7b-instruct-4k | $0.2 | $0.2 | -
mistral-7b-instruct-v0p2 | $0.2 | $0.2 | -
mistral-7b-instruct-v3 | $0.2 | $0.2 | -
mistral-7b-v0p2 | $0.2 | $0.2 | -
mistral-nemo-base-2407 | $0.2 | $0.2 | -
mistral-nemo-instruct-2407 | $0.2 | $0.2 | -
mythomax-l2-13b | $0.2 | $0.2 | -
nous-capybara-7b-v1p9 | $0.2 | $0.2 | -
nous-hermes-llama2-13b | $0.2 | $0.2 | -
nous-hermes-llama2-7b | $0.2 | $0.2 | -
nvidia-nemotron-nano-12b-v2 | $0.2 | $0.2 | -
nvidia-nemotron-nano-9b-v2 | $0.2 | $0.2 | -
openchat-3p5-0106-7b | $0.2 | $0.2 | -
openhermes-2-mistral-7b | $0.2 | $0.2 | -
openhermes-2p5-mistral-7b | $0.2 | $0.2 | -
openorca-7b | $0.2 | $0.2 | -
phi-3-vision-128k-instruct | $0.2 | $0.2 | -
pythia-12b | $0.2 | $0.2 | -
qwen-v2p5-14b-instruct | $0.2 | $0.2 | -
qwen-v2p5-7b | $0.2 | $0.2 | -
qwen2-7b-instruct | $0.2 | $0.2 | -
qwen2-vl-7b-instruct | $0.2 | $0.2 | -
qwen2p5-14b | $0.2 | $0.2 | -
qwen2p5-7b-instruct | $0.2 | $0.2 | -
qwen2p5-coder-14b | $0.2 | $0.2 | -
qwen2p5-coder-14b-instruct | $0.2 | $0.2 | -
qwen2p5-coder-7b | $0.2 | $0.2 | -
qwen2p5-coder-7b-instruct | $0.2 | $0.2 | -
qwen2p5-vl-3b-instruct | $0.2 | $0.2 | -
qwen2p5-vl-7b-instruct | $0.2 | $0.2 | -
qwen3-14b | $0.2 | $0.2 | -
qwen3-4b | $0.2 | $0.2 | -
qwen3-4b-instruct-2507 | $0.2 | $0.2 | -
qwen3-8b | $0.2 | $0.2 | -
qwen3-vl-8b-instruct | $0.2 | $0.2 | -
rolm-ocr | $0.2 | $0.2 | -
snorkel-mistral-7b-pairrm-dpo | $0.2 | $0.2 | -
starcoder-16b | $0.2 | $0.2 | -
starcoder-7b | $0.2 | $0.2 | -
starcoder2-15b | $0.2 | $0.2 | -
starcoder2-7b | $0.2 | $0.2 | -
toppy-m-7b | $0.2 | $0.2 | -
yi-6b | $0.2 | $0.2 | -
zephyr-7b-beta | $0.2 | $0.2 | -
llama4-maverick-instruct-basic | $0.22 | $0.88 | -
qwen3-235b-a22b | $0.22 | $0.88 | -
glm-4p5-air | $0.22 | $0.88 | -
qwen3-235b-a22b-instruct-2507 | $0.22 | $0.88 | -
qwen3-235b-a22b-thinking-2507 | $0.22 | $0.88 | -
qwen3-vl-235b-a22b-instruct | $0.22 | $0.88 | -
qwen3-vl-235b-a22b-thinking | $0.22 | $0.88 | -
minimax-m2p1 | $0.3 | $1.2 | -
minimax-m2 | $0.3 | $1.2 | -
qwen3-coder-480b-a35b-instruct | $0.45 | $1.8 | -
fireworks-ai-moe-up-to-56b | $0.5 | $0.5 | -
deepseek-coder-v2-lite-base | $0.5 | $0.5 | -
deepseek-coder-v2-lite-instruct | $0.5 | $0.5 | -
deepseek-v2-lite-chat | $0.5 | $0.5 | -
dolphin-2p6-mixtral-8x7b | $0.5 | $0.5 | -
firefunction-v1 | $0.5 | $0.5 | -
gpt-oss-safeguard-20b | $0.5 | $0.5 | -
mixtral-8x7b | $0.5 | $0.5 | -
mixtral-8x7b-instruct | $0.5 | $0.5 | -
mixtral-8x7b-instruct-hf | $0.5 | $0.5 | -
nous-hermes-2-mixtral-8x7b-dpo | $0.5 | $0.5 | -
qwen3-30b-a3b-instruct-2507 | $0.5 | $0.5 | -
deepseek-r1-basic | $0.55 | $2.19 | -
glm-4p5 | $0.55 | $2.19 | -
glm-4p6 | $0.55 | $2.19 | -
deepseek-v3p2 | $0.56 | $1.68 | $0.28
deepseek-v3p1 | $0.56 | $1.68 | -
deepseek-v3p1-terminus | $0.56 | $1.68 | -
glm-4p7 | $0.6 | $2.2 | -
kimi-k2p5 | $0.6 | $3 | $0.1
kimi-k2-instruct | $0.6 | $2.5 | -
kimi-k2-instruct-0905 | $0.6 | $2.5 | -
kimi-k2-thinking | $0.6 | $2.5 | -
accounts/fireworks/models/llama-v3p3-70b-instruct | $0.9 | $0.9 | $0.45
deepseek-v3-0324 | $0.9 | $0.9 | -
qwen2p5-vl-72b-instruct | $0.9 | $0.9 | -
fireworks-ai-above-16b | $0.9 | $0.9 | -
deepseek-v3 | $0.9 | $0.9 | -
firefunction-v2 | $0.9 | $0.9 | -
llama-v3p2-90b-vision-instruct | $0.9 | $0.9 | -
qwen2-72b-instruct | $0.9 | $0.9 | -
qwen2p5-coder-32b-instruct | $0.9 | $0.9 | -
code-llama-34b | $0.9 | $0.9 | -
code-llama-34b-instruct | $0.9 | $0.9 | -
code-llama-34b-python | $0.9 | $0.9 | -
code-llama-70b | $0.9 | $0.9 | -
code-llama-70b-instruct | $0.9 | $0.9 | -
code-llama-70b-python | $0.9 | $0.9 | -
cogito-v1-preview-llama-70b | $0.9 | $0.9 | -
cogito-v1-preview-qwen-32b | $0.9 | $0.9 | -
deepseek-coder-33b-instruct | $0.9 | $0.9 | -
deepseek-r1-distill-llama-70b | $0.9 | $0.9 | -
deepseek-r1-distill-qwen-32b | $0.9 | $0.9 | -
devstral-small-2505 | $0.9 | $0.9 | -
dobby-unhinged-llama-3-3-70b-new | $0.9 | $0.9 | -
dolphin-2-9-2-qwen2-72b | $0.9 | $0.9 | -
fare-20b | $0.9 | $0.9 | -
internvl3-38b | $0.9 | $0.9 | -
internvl3-78b | $0.9 | $0.9 | -
kat-coder | $0.9 | $0.9 | -
kat-dev-32b | $0.9 | $0.9 | -
kat-dev-72b-exp | $0.9 | $0.9 | -
llama-v2-70b-chat | $0.9 | $0.9 | -
llama-v3-70b-instruct | $0.9 | $0.9 | -
llama-v3-70b-instruct-hf | $0.9 | $0.9 | -
llama-v3p1-70b-instruct | $0.9 | $0.9 | -
llama-v3p1-nemotron-70b-instruct | $0.9 | $0.9 | -
llama-v3p3-70b-instruct | $0.9 | $0.9 | -
llava-yi-34b | $0.9 | $0.9 | -
mistral-small-24b-instruct-2501 | $0.9 | $0.9 | -
nous-hermes-2-yi-34b | $0.9 | $0.9 | -
nous-hermes-llama2-70b | $0.9 | $0.9 | -
phind-code-llama-34b-python-v1 | $0.9 | $0.9 | -
phind-code-llama-34b-v1 | $0.9 | $0.9 | -
phind-code-llama-34b-v2 | $0.9 | $0.9 | -
qwen-qwq-32b-preview | $0.9 | $0.9 | -
qwen1p5-72b-chat | $0.9 | $0.9 | -
qwen2-vl-72b-instruct | $0.9 | $0.9 | -
qwen2p5-32b | $0.9 | $0.9 | -
qwen2p5-32b-instruct | $0.9 | $0.9 | -
qwen2p5-72b | $0.9 | $0.9 | -
qwen2p5-72b-instruct | $0.9 | $0.9 | -
qwen2p5-coder-32b | $0.9 | $0.9 | -
qwen2p5-coder-32b-instruct-128k | $0.9 | $0.9 | -
qwen2p5-coder-32b-instruct-32k-rope | $0.9 | $0.9 | -
qwen2p5-coder-32b-instruct-64k | $0.9 | $0.9 | -
qwen2p5-math-72b-instruct | $0.9 | $0.9 | -
qwen2p5-vl-32b-instruct | $0.9 | $0.9 | -
qwen3-30b-a3b-thinking-2507 | $0.9 | $0.9 | -
qwen3-32b | $0.9 | $0.9 | -
qwen3-coder-480b-instruct-bf16 | $0.9 | $0.9 | -
qwen3-next-80b-a3b-instruct | $0.9 | $0.9 | -
qwen3-next-80b-a3b-thinking | $0.9 | $0.9 | -
qwen3-vl-32b-instruct | $0.9 | $0.9 | -
qwq-32b | $0.9 | $0.9 | -
yi-34b | $0.9 | $0.9 | -
yi-34b-200k-capybara | $0.9 | $0.9 | -
yi-34b-chat | $0.9 | $0.9 | -
fireworks-ai-56b-to-176b | $1.2 | $1.2 | -
deepseek-coder-v2-instruct | $1.2 | $1.2 | -
mixtral-8x22b-instruct-hf | $1.2 | $1.2 | -
cogito-671b-v2-p1 | $1.2 | $1.2 | -
dbrx-instruct | $1.2 | $1.2 | -
deepseek-prover-v2 | $1.2 | $1.2 | -
deepseek-v2p5 | $1.2 | $1.2 | -
glm-4p5v | $1.2 | $1.2 | -
gpt-oss-safeguard-120b | $1.2 | $1.2 | -
mistral-large-3-fp8 | $1.2 | $1.2 | -
mixtral-8x22b | $1.2 | $1.2 | -
mixtral-8x22b-instruct | $1.2 | $1.2 | -
deepseek-r1-0528 | $3 | $8 | -
deepseek-r1 | $3 | $8 | -
llama-v3p1-405b-instruct | $3 | $3 | -
yi-large | $3 | $3 | -

gemini

50 models

Model | Input $/1M | Output $/1M | Cache $/1M
gemini-flash-1.5-8b | $0.0375 | $0.15 | $0.01
gemini-1.5-flash | $0.075 | $0.3 | $0.01875
gemini-2.0-flash-lite | $0.075 | $0.3 | -
gemini-flash-1.5 | $0.075 | $0.3 | $0.01875
gemini-2.0-flash-lite-001 | $0.075 | $0.3 | $0.0187
gemini-2.0-flash | $0.1 | $0.4 | $0.025
gemini-2.5-flash-lite | $0.1 | $0.4 | $0.01
gemini-2.0-flash-001 | $0.1 | $0.4 | $0.025
gemini-2.5-flash-lite-preview-09-2025 | $0.1 | $0.4 | $0.01
gemini-flash-lite-latest | $0.1 | $0.4 | $0.025
gemini-2.5-flash-lite-preview-06-17 | $0.1 | $0.4 | $0.025
gemini-1.0-pro-vision-001 | $0.125 | $0.375 | -
gemini-pro | $0.125 | $0.375 | -
gemini-2.5-flash-preview | $0.15 | $0.6 | -
claude-3-haiku | $0.25 | $1.25 | $0.03
gemini-3.1-flash-lite-preview | $0.25 | $1.5 | $0.025
gemini-2.5-flash | $0.3 | $2.5 | $0.03
gemini-2.5-flash-image | $0.3 | $30 | -
gemini-live-2.5-flash-preview-native-audio-09-2025 | $0.3 | $2 | $0.075
gemini-robotics-er-1.5-preview | $0.3 | $2.5 | -
gemini-2.5-flash-preview-09-2025 | $0.3 | $2.5 | $0.075
gemini-flash-latest | $0.3 | $2.5 | $0.075
gemini-2.5-flash-preview-tts | $0.3 | $2.5 | -
gemini-2.5-flash-native-audio-latest | $0.3 | $2.5 | -
gemini-2.5-flash-native-audio-preview-09-2025 | $0.3 | $2.5 | -
gemini-2.5-flash-native-audio-preview-12-2025 | $0.3 | $2.5 | -
gemini-exp-1206 | $0.3 | $2.5 | $0.03
gemini-gemma-2-27b-it | $0.35 | $1.05 | -
gemini-gemma-2-9b-it | $0.35 | $1.05 | -
gemini-3-flash-preview | $0.5 | $3 | $0.05
gemini-3.1-flash-image-preview | $0.5 | $60 | -
gemini-live-2.5-flash-preview | $0.5 | $2 | -
claude-3-5-haiku | $0.8 | $4 | $0.08
gemini-2.5-pro | $1.25 | $10 | $0.125
gemini-1.5-pro | $1.25 | $5 | -
gemini-pro-1.5 | $1.25 | $5 | $0.3125
gemini-2.5-computer-use-preview-10-2025 | $1.25 | $10 | -
gemini-2.5-pro-preview-tts | $1.25 | $10 | $0.125
gemini-pro-latest | $1.25 | $10 | $0.125
gemini-3-pro-image-preview | $2 | $120 | -
gemini-3-pro-preview | $2 | $12 | $0.2
gemini-3.1-pro-preview | $2 | $12 | $0.2
deep-research-pro-preview-12-2025 | $2 | $12 | -
gemini-3.1-pro-preview-customtools | $2 | $12 | $0.2
claude-3-5-sonnet | $3 | $15 | $0.3
claude-3-7-sonnet | $3 | $15 | $0.3
claude-4-sonnet | $3 | $15 | $0.3
claude-opus-4-6 | $5 | $25 | $0.5
claude-3-opus | $15 | $75 | $1.5
claude-4-opus | $15 | $75 | $1.5

groq

37 models

Model | Input $/1M | Output $/1M | Cache $/1M
llama-3.2-1b-preview | $0.04 | $0.04 | -
llama-3.1-8b-instant | $0.05 | $0.08 | -
llama3-8b-8192 | $0.05 | $0.08 | -
llama-3.2-3b-preview | $0.06 | $0.06 | -
gemma-7b-it | $0.07 | $0.07 | -
openai/gpt-oss-20b | $0.075 | $0.3 | $0.0375
gpt-oss-20b | $0.075 | $0.3 | $0.0375
gpt-oss-safeguard-20b | $0.075 | $0.3 | $0.037
meta-llama/llama-4-scout-17b-16e-instruct | $0.11 | $0.34 | -
llama-4-scout-17b-16e-instruct | $0.11 | $0.34 | -
openai/gpt-oss-120b | $0.15 | $0.6 | $0.075
gpt-oss-120b | $0.15 | $0.6 | $0.075
llama-3.2-11b-text-preview | $0.18 | $0.18 | -
llama-3.2-11b-vision-preview | $0.18 | $0.18 | -
llama3-groq-8b-8192-tool-use-preview | $0.19 | $0.19 | -
gemma2-9b-it | $0.2 | $0.2 | -
llama-guard-3-8b | $0.2 | $0.2 | -
meta-llama/llama-4-maverick-17b-128e-instruct | $0.2 | $0.6 | -
meta-llama/llama-guard-4-12b | $0.2 | $0.2 | -
llama-guard-4-12b | $0.2 | $0.2 | -
llama-4-maverick-17b-128e-instruct | $0.2 | $0.6 | -
mixtral-8x7b-32768 | $0.24 | $0.24 | -
qwen/qwen3-32b | $0.29 | $0.59 | -
qwen3-32b | $0.29 | $0.59 | -
llama-3.3-70b-versatile | $0.59 | $0.79 | -
llama-3.1-405b-reasoning | $0.59 | $0.79 | -
llama-3.1-70b-versatile | $0.59 | $0.79 | -
llama-3.3-70b-specdec | $0.59 | $0.99 | -
llama3-70b-8192 | $0.59 | $0.79 | -
llama2-70b-4096 | $0.7 | $0.8 | -
deepseek-r1-distill-llama-70b | $0.75 | $0.99 | -
mistral-saba-24b | $0.79 | $0.79 | -
llama3-groq-70b-8192-tool-use-preview | $0.89 | $0.89 | -
llama-3.2-90b-text-preview | $0.9 | $0.9 | -
llama-3.2-90b-vision-preview | $0.9 | $0.9 | -
moonshotai/kimi-k2-instruct | $1 | $3 | $0.5
kimi-k2-instruct-0905 | $1 | $3 | $0.5

mistral

63 models

Model | Input $/1M | Output $/1M | Cache $/1M
ministral-3b | $0.04 | $0.04 | -
mistral-small-24b-instruct-2501 | $0.05 | $0.08 | -
devstral-small | $0.06 | $0.12 | -
mistral-small-3-2-2506 | $0.06 | $0.18 | -
mistral-small-latest | $0.1 | $0.3 | -
ministral-8b | $0.1 | $1 | -
mistral-embed | $0.1 | $0.1 | -
devstral-small-2505 | $0.1 | $0.3 | -
devstral-small-2507 | $0.1 | $0.3 | -
devstral-small-latest | $0.1 | $0.3 | -
labs-devstral-small-2512 | $0.1 | $0.3 | -
mistral-small | $0.1 | $0.3 | -
ministral-3-3b-2512 | $0.1 | $0.1 | -
mistral-nemo | $0.15 | $0.15 | -
pixtral-12b | $0.15 | $0.15 | -
ministral-3-8b-2512 | $0.15 | $0.15 | -
pixtral-12b-2409 | $0.15 | $0.15 | -
mistral-saba | $0.2 | $0.6 | -
ministral-3-14b-2512 | $0.2 | $0.2 | -
mistral-7b | $0.25 | $0.25 | -
mistral-tiny | $0.25 | $0.25 | -
codestral-mamba-latest | $0.25 | $0.25 | -
open-codestral-mamba | $0.25 | $0.25 | -
open-mistral-7b | $0.25 | $0.25 | -
codestral-latest | $0.3 | $0.9 | -
codestral | $0.3 | $0.9 | -
codestral-2508 | $0.3 | $0.9 | -
open-mistral-nemo | $0.3 | $0.3 | -
open-mistral-nemo-2407 | $0.3 | $0.3 | -
mistral-medium-3 | $0.4 | $2 | -
devstral-medium-2507 | $0.4 | $2 | -
devstral-latest | $0.4 | $2 | -
devstral-medium-latest | $0.4 | $2 | -
devstral-2512 | $0.4 | $2 | -
mistral-medium-2505 | $0.4 | $2 | -
mistral-medium-latest | $0.4 | $2 | -
mistral-medium-3-1-2508 | $0.4 | $2 | -
magistral-small | $0.5 | $1.5 | -
magistral-small-2506 | $0.5 | $1.5 | -
magistral-small-latest | $0.5 | $1.5 | -
magistral-small-1-2-2509 | $0.5 | $1.5 | -
mistral-large-3 | $0.5 | $1.5 | -
mistral-large-2512 | $0.5 | $1.5 | -
mixtral-8x7b | $0.7 | $0.7 | -
open-mixtral-8x7b | $0.7 | $0.7 | -
mixtral-8x22b-instruct | $0.9 | $0.9 | -
codestral-2405 | $1 | $3 | -
mistral-large-latest | $2 | $6 | -
magistral-medium | $2 | $5 | -
mistral-large | $2 | $6 | -
pixtral-large | $2 | $6 | -
magistral-medium-2506 | $2 | $5 | -
magistral-medium-2509 | $2 | $5 | -
magistral-medium-1-2-2509 | $2 | $5 | -
magistral-medium-latest | $2 | $5 | -
mistral-large-2411 | $2 | $6 | -
open-mixtral-8x22b | $2 | $6 | -
pixtral-large-2411 | $2 | $6 | -
pixtral-large-latest | $2 | $6 | -
mistral-medium | $2.7 | $8.1 | -
mistral-medium-2312 | $2.7 | $8.1 | -
mistral-large-2407 | $3 | $9 | -
mistral-large-2402 | $4 | $12 | -

openai

145 models

Model | Input $/1M | Output $/1M | Cache $/1M
gpt-5-nano | $0.05 | $0.4 | $0.005
gpt-5-nano-2025-08-07 | $0.05 | $0.4 | $0.005
gpt-4.1-nano | $0.1 | $0.4 | $0.025
gpt-4.1-nano-2025-04-14 | $0.1 | $0.4 | $0.025
gpt-4o-mini | $0.15 | $0.6 | $0.075
gpt-4o-mini-2024-07-18 | $0.15 | $0.6 | $0.075
gpt-4o-mini-audio-preview | $0.15 | $0.6 | -
gpt-4o-mini-audio-preview-2024-12-17 | $0.15 | $0.6 | -
gpt-4o-mini-search-preview | $0.15 | $0.6 | $0.075
gpt-4o-mini-search-preview-2025-03-11 | $0.15 | $0.6 | $0.075
gpt-5.4-nano | $0.2 | $1.25 | $0.02
ft:gpt-4.1-nano-2025-04-14 | $0.2 | $0.8 | $0.05
gpt-5-mini | $0.25 | $2 | $0.025
gpt-5.1-codex-mini | $0.25 | $2 | $0.025
gpt-5-mini-2025-08-07 | $0.25 | $2 | $0.025
ft:gpt-4o-mini | $0.3 | $1.2 | -
gpt-4o-mini-2024-07-18.ft- | $0.3 | $1.2 | -
ft:gpt-4o-mini-2024-07-18 | $0.3 | $1.2 | $0.15
gpt-4.1-mini | $0.4 | $1.6 | $0.1
ada | $0.4 | $0.4 | -
gpt-4.1-mini-2025-04-14 | $0.4 | $1.6 | $0.1
babbage | $0.5 | $0.5 | -
gpt-3.5-turbo | $0.5 | $1.5 | -
gpt-3.5-turbo-0125 | $0.5 | $1.5 | -
gpt-4o-mini-realtime-preview | $0.6 | $2.4 | $0.3
gpt-realtime-mini | $0.6 | $2.4 | $0.06
gpt-audio-mini | $0.6 | $2.4 | -
gpt-audio-mini-2025-10-06 | $0.6 | $2.4 | -
gpt-audio-mini-2025-12-15 | $0.6 | $2.4 | -
gpt-4o-mini-realtime-preview-2024-12-17 | $0.6 | $2.4 | $0.3
gpt-realtime-mini-2025-10-06 | $0.6 | $2.4 | $0.06
gpt-realtime-mini-2025-12-15 | $0.6 | $2.4 | $0.06
gpt-5.4-mini | $0.75 | $4.5 | $0.075
ft:gpt-4.1-mini-2025-04-14 | $0.8 | $3.2 | $0.2
gpt-3.5-turbo-1106 | $1 | $2 | -
o4-mini | $1.1 | $4.4 | $0.275
o3-mini | $1.1 | $4.4 | $0.55
o3-mini-2025-01-31 | $1.1 | $4.4 | $0.55
o4-mini-2025-04-16 | $1.1 | $4.4 | $0.275
gpt-4o-mini-transcribe | $1.25 | $5 | -
gpt-5 | $1.25 | $10 | $0.125
gpt-5.1 | $1.25 | $10 | $0.125
gpt-5.1-2025-11-13 | $1.25 | $10 | $0.125
gpt-5.1-chat-latest | $1.25 | $10 | $0.125
gpt-5-2025-08-07 | $1.25 | $10 | $0.125
gpt-5-chat | $1.25 | $10 | $0.125
gpt-5-chat-latest | $1.25 | $10 | $0.125
gpt-5-codex | $1.25 | $10 | $0.125
gpt-5.1-codex | $1.25 | $10 | $0.125
gpt-5.1-codex-max | $1.25 | $10 | $0.125
gpt-4o-mini-transcribe-2025-03-20 | $1.25 | $5 | -
gpt-4o-mini-transcribe-2025-12-15 | $1.25 | $5 | -
gpt-5-search-api | $1.25 | $10 | $0.125
gpt-5-search-api-2025-10-14 | $1.25 | $10 | $0.125
codex-mini | $1.5 | $6 | $0.375
gpt-3.5-0301 | $1.5 | $2 | -
gpt-3.5-turbo-0613 | $1.5 | $2 | -
gpt-3.5-turbo-instruct | $1.5 | $2 | -
codex-mini-latest | $1.5 | $6 | $0.375
gpt-5.2 | $1.75 | $14 | $0.175
gpt-5.3-codex | $1.75 | $14 | $0.175
gpt-5.2-2025-12-11 | $1.75 | $14 | $0.175
gpt-5.2-chat-latest | $1.75 | $14 | $0.175
gpt-5.3-chat-latest | $1.75 | $14 | $0.175
gpt-5.2-codex | $1.75 | $14 | $0.175
gpt-4.1 | $2 | $8 | $0.5
o3 | $2 | $8 | $0.5
curie | $2 | $2 | -
o4-mini-deep-research | $2 | $8 | $0.5
gpt-4.1-2025-04-14 | $2 | $8 | $0.5
o3-2025-04-16 | $2 | $8 | $0.5
o4-mini-deep-research-2025-06-26 | $2 | $8 | $0.5
gpt-4o | $2.5 | $10 | $1.25
gpt-4o-search-preview | $2.5 | $10 | -
gpt-4o-transcribe | $2.5 | $10 | -
gpt-5-image-mini | $2.5 | $2 | $0.25
gpt-5.4 | $2.5 | $15 | $0.25
gpt-4o-transcribe-diarize | $2.5 | $10 | -
gpt-4o-2024-08-06 | $2.5 | $10 | $1.25
gpt-4o-2024-11-20 | $2.5 | $10 | $1.25
gpt-4o-audio-preview | $2.5 | $10 | -
gpt-4o-audio-preview-2024-12-17 | $2.5 | $10 | -
gpt-4o-audio-preview-2025-06-03 | $2.5 | $10 | -
gpt-audio | $2.5 | $10 | -
gpt-audio-1.5 | $2.5 | $10 | -
gpt-audio-2025-08-28 | $2.5 | $10 | -
gpt-4o-mini-tts | $2.5 | $10 | -
gpt-4o-search-preview-2025-03-11 | $2.5 | $10 | $1.25
gpt-5.4-2026-03-05 | $2.5 | $15 | $0.25
gpt-4o-mini-tts-2025-03-20 | $2.5 | $10 | -
gpt-4o-mini-tts-2025-12-15 | $2.5 | $10 | -
computer-use | $3 | $12 | -
ft:gpt-3.5-turbo- | $3 | $6 | -
gpt-3.5-turbo-16k | $3 | $4 | -
o1-mini | $3 | $12 | $1.5
ft:gpt-3.5-turbo | $3 | $6 | -
ft:gpt-3.5-turbo-0125 | $3 | $6 | -
ft:gpt-3.5-turbo-0613 | $3 | $6 | -
ft:gpt-3.5-turbo-1106 | $3 | $6 | -
ft:gpt-4.1-2025-04-14 | $3 | $12 | $0.75
ft:gpt-4o | $3.75 | $15 | -
ft:gpt-4o-2024-08-06 | $3.75 | $15 | $1.875
ft:gpt-4o-2024-11-20 | $3.75 | $15 | -
gpt-realtime | $4 | $16 | $0.4
ft:o4-mini-2025-04-16 | $4 | $16 | $1
gpt-realtime-1.5 | $4 | $16 | $0.4
gpt-realtime-2025-08-28 | $4 | $16 | $0.4
chatgpt-4o-latest | $5 | $15 | -
gpt-4o-realtime-preview | $5 | $20 | $2.5
gpt-4o-2024-05-13 | $5 | $15 | -
gpt-4o-realtime-preview-2024-12-17 | $5 | $20 | $2.5
gpt-4o-realtime-preview-2025-06-03 | $5 | $20 | $2.5
gpt-image-1.5 | $5 | $10 | $1.25
gpt-image-1.5-2025-12-16 | $5 | $10 | $1.25
gpt-4o:extended | $6 | $18 | -
gpt-4-turbo | $10 | $30 | -
gpt-4-vision-preview | $10 | $30 | -
gpt-5-image | $10 | $10 | $1.25
o3-deep-research | $10 | $40 | $2.5
gpt-4-0125-preview | $10 | $30 | -
gpt-4-1106-preview | $10 | $30 | -
gpt-4-turbo-2024-04-09 | $10 | $30 | -
gpt-4-turbo-preview | $10 | $30 | -
o3-deep-research-2025-06-26 | $10 | $40 | $2.5
gpt-5-pro | $15 | $120 | -
o1 | $15 | $60 | $7.5
gpt-5-pro-2025-10-06 | $15 | $120 | -
o1-2024-12-17 | $15 | $60 | $7.5
davinci | $20 | $20 | -
o3-pro | $20 | $80 | -
text-davinci-002 | $20 | $20 | -
text-davinci-003 | $20 | $20 | -
o3-pro-2025-06-10 | $20 | $80 | -
gpt-5.2-pro | $21 | $168 | -
gpt-5.2-pro-2025-12-11 | $21 | $168 | -
gpt-4 | $30 | $60 | -
gpt-5.4-pro | $30 | $180 | -
ft:gpt-4-0613 | $30 | $60 | -
gpt-4-0314 | $30 | $60 | -
gpt-4-0613 | $30 | $60 | -
gpt-5.4-pro-2026-03-05 | $30 | $180 | $3
gpt-4-32k | $60 | $120 | -
gpt-4.5-preview | $75 | $150 | $37.5
o1-pro | $150 | $600 | -
o1-pro-2025-03-19 | $150 | $600 | -

together

105 models

Model | Input $/1M | Output $/1M | Cache $/1M
gpt-oss-20b | $0.05 | $0.2 | -
Qwen/Qwen1.5-0.5B | $0.1 | $0.1 | -
Qwen/Qwen1.5-1.8B | $0.1 | $0.1 | -
Qwen/Qwen1.5-4B | $0.1 | $0.1 | -
google/gemma-2b | $0.1 | $0.1 | -
meta-llama/Meta-Llama-3-8B-Instruct-Lite | $0.1 | $0.1 | -
microsoft/phi-2 | $0.1 | $0.1 | -
togethercomputer/RedPajama-INCITE-Base-3B-v1 | $0.1 | $0.1 | -
togethercomputer/RedPajama-INCITE-Chat-3B-v1 | $0.1 | $0.1 | -
togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | $0.1 | $0.1 | -
together-ai-up-to-4b | $0.1 | $0.1 | -
gpt-oss-120b | $0.15 | $0.6 | -
Qwen3-Next-80B-A3B-Instruct | $0.15 | $1.5 | -
Qwen3-Next-80B-A3B-Thinking | $0.15 | $1.5 | -
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | -
meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | -
meta-llama/Meta-Llama-3-8B-Instruct-Turbo | $0.18 | $0.18 | -
Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | -
Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | -
NousResearch/Nous-Capybara-7B-V1p9 | $0.2 | $0.2 | -
NousResearch/Nous-Hermes-llama-2-7b | $0.2 | $0.2 | -
Open-Orca/Mistral-7B-OpenOrca | $0.2 | $0.2 | -
Qwen/Qwen1.5-7B | $0.2 | $0.2 | -
Undi95/Toppy-M-7B | $0.2 | $0.2 | -
allenai/OLMo-7B | $0.2 | $0.2 | -
codellama/CodeLlama-7b-Instruct-hf | $0.2 | $0.2 | -
google/gemma-7b | $0.2 | $0.2 | -
lmsys/vicuna-7b-v1.5 | $0.2 | $0.2 | -
meta-llama/Llama-2-7b-chat-hf | $0.2 | $0.2 | -
meta-llama/Llama-3-8b-chat-hf | $0.2 | $0.2 | -
mistralai/Mistral-7B-Instruct-v0.1 | $0.2 | $0.2 | -
mistralai/Mistral-7B-Instruct-v0.2 | $0.2 | $0.2 | -
mistralai/Mistral-7B-v0.1 | $0.2 | $0.2 | -
openchat/openchat-3.5-1210 | $0.2 | $0.2 | -
snorkelai/Snorkel-Mistral-PairRM-DPO | $0.2 | $0.2 | -
teknium/OpenHermes-2-Mistral-7B | $0.2 | $0.2 | -
teknium/OpenHermes-2p5-Mistral-7B | $0.2 | $0.2 | -
togethercomputer/GPT-JT-Moderation-6B | $0.2 | $0.2 | -
togethercomputer/Llama-2-7B-32K-Instruct | $0.2 | $0.2 | -
togethercomputer/RedPajama-INCITE-7B-Base | $0.2 | $0.2 | -
togethercomputer/RedPajama-INCITE-7B-Chat | $0.2 | $0.2 | -
togethercomputer/RedPajama-INCITE-7B-Instruct | $0.2 | $0.2 | -
togethercomputer/StripedHyena-Hessian-7B | $0.2 | $0.2 | -
togethercomputer/StripedHyena-Nous-7B | $0.2 | $0.2 | -
togethercomputer/alpaca-7b | $0.2 | $0.2 | -
zero-one-ai/Yi-6B | $0.2 | $0.2 | -
together-ai-4.1b-8b | $0.2 | $0.2 | -
Qwen3-235B-A22B-Instruct-2507-tput | $0.2 | $6 | -
Qwen3-235B-A22B-fp8-tput | $0.2 | $0.6 | -
GLM-4.5-Air-FP8 | $0.2 | $1.1 | -
NousResearch/Nous-Hermes-Llama2-13b | $0.225 | $0.225 | -
codellama/CodeLlama-13b-Instruct-hf | $0.225 | $0.225 | -
meta-llama/Llama-2-13b-chat-hf | $0.225 | $0.225 | -
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | -
Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | -
Austism/chronos-hermes-13b | $0.3 | $0.3 | -
Gryphe/MythoMax-L2-13b | $0.3 | $0.3 | -
Nexusflow/NexusRaven-V2-13B | $0.3 | $0.3 | -
Qwen/Qwen1.5-14B | $0.3 | $0.3 | -
Undi95/ReMM-SLERP-L2-13B | $0.3 | $0.3 | -
WizardLM/WizardLM-13B-V1.2 | $0.3 | $0.3 | -
lmsys/vicuna-13b-v1.5 | $0.3 | $0.3 | -
upstage/SOLAR-10.7B-Instruct-v1.0 | $0.3 | $0.3 | -
together-ai-8.1b-21b | $0.3 | $0.3 | -
GLM-4.7 | $0.45 | $2 | -
Kimi-K2.5 | $0.5 | $2.8 | -
meta-llama/Meta-Llama-3-70B-Instruct-Lite | $0.54 | $0.54 | -
DeepSeek-R1-0528-tput | $0.55 | $2.19 | -
DeepSeek-V3.1 | $0.6 | $1.7 | -
Mixtral-8x7B-Instruct-v0.1 | $0.6 | $0.6 | -
GLM-4.6 | $0.6 | $2.2 | -
Qwen3.5-397B-A17B | $0.6 | $3.6 | -
Qwen3-235B-A22B-Thinking-2507 | $0.65 | $3 | -
codellama/CodeLlama-34b-Instruct-hf | $0.776 | $0.776 | -
NousResearch/Nous-Hermes-2-Yi-34B | $0.8 | $0.8 | -
deepseek-ai/deepseek-coder-33b-instruct | $0.8 | $0.8 | -
zero-one-ai/Yi-34B | $0.8 | $0.8 | -
together-ai-21.1b-41b | $0.8 | $0.8 | -
meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | -
meta-llama/Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | -
meta-llama/Meta-Llama-3-70B-Instruct-Turbo | $0.88 | $0.88 | -
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | -
Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | -
Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | -
mistralai/Mixtral-8x7B-Instruct-v0.1 | $0.9 | $0.9 | -
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | $0.9 | $0.9 | -
NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | $0.9 | $0.9 | -
Qwen/Qwen1.5-72B | $0.9 | $0.9 | -
codellama/CodeLlama-70b-Instruct-hf | $0.9 | $0.9 | -
garage-bAInd/Platypus2-70B-instruct | $0.9 | $0.9 | -
meta-llama/Llama-2-70b-chat-hf | $0.9 | $0.9 | -
meta-llama/Llama-3-70b-chat-hf | $0.9 | $0.9 | -
mistralai/Mixtral-8x7B-v0.1 | $0.9 | $0.9 | -
together-ai-41.1b-80b | $0.9 | $0.9 | -
Kimi-K2-Instruct | $1 | $3 | -
Kimi-K2-Instruct-0905 | $1 | $3 | -
Qwen/Qwen2.5-72B-Instruct-Turbo | $1.2 | $1.2 | -
microsoft/WizardLM-2-8x22B | $1.2 | $1.2 | -
DeepSeek-V3 | $1.25 | $1.25 | -
together-ai-81.1b-110b | $1.8 | $1.8 | -
Qwen3-Coder-480B-A35B-Instruct-FP8 | $2 | $2 | -
mistralai/Mixtral-8x22B-Instruct-v0.1 | $2.4 | $2.4 | -
DeepSeek-R1 | $3 | $7 | -
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | -
Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | -

xai

39 models

Model | Input $/1M | Output $/1M | Cache $/1M
grok-4-1-fast-reasoning | $0.2 | $0.5 | $0.05
grok-4-1-fast-non-reasoning | $0.2 | $0.5 | $0.05
grok-4-fast-non-reasoning | $0.2 | $0.5 | $0.05
grok-4-fast-reasoning | $0.2 | $0.5 | $0.05
grok-code-fast-1 | $0.2 | $1.5 | $0.02
grok-4-1-fast | $0.2 | $0.5 | $0.05
grok-4-1-fast-reasoning-latest | $0.2 | $0.5 | $0.05
grok-4-1-fast-non-reasoning-latest | $0.2 | $0.5 | $0.05
grok-code-fast | $0.2 | $1.5 | $0.02
grok-code-fast-1-0825 | $0.2 | $1.5 | $0.02
grok-3-mini | $0.3 | $0.5 | $0.075
grok-3-mini-beta | $0.3 | $0.5 | $0.075
grok-3-mini-latest | $0.3 | $0.5 | $0.075
grok-3-mini-fast | $0.6 | $4 | $0.15
grok-3-mini-fast-beta | $0.6 | $4 | $0.15
grok-3-mini-fast-latest | $0.6 | $4 | $0.15
grok-4.20-0309-reasoning | $2 | $6 | $0.2
grok-4.20-0309-non-reasoning | $2 | $6 | $0.2
grok-4.20-multi-agent-0309 | $2 | $6 | $0.2
grok-4 | $2 | $6 | $0.2
grok-2 | $2 | $10 | -
grok-2-1212 | $2 | $10 | -
grok-2-vision-1212 | $2 | $10 | -
grok-2-latest | $2 | $10 | -
grok-2-vision | $2 | $10 | -
grok-2-vision-latest | $2 | $10 | -
grok-4-latest | $2 | $6 | $0.2
grok-4.20-multi-agent-beta-0309 | $2 | $6 | $0.2
grok-4.20-beta-0309-reasoning | $2 | $6 | $0.2
grok-4.20-beta-0309-non-reasoning | $2 | $6 | $0.2
grok-3 | $3 | $15 | $0.75
grok-4-0709 | $3 | $15 | $0.75
grok-3-beta | $3 | $15 | $0.75
grok-3-latest | $3 | $15 | $0.75
grok-3-fast | $5 | $25 | $1.25
grok-3-fast-beta | $5 | $25 | $1.25
grok-3-fast-latest | $5 | $25 | $1.25
grok-beta | $5 | $15 | -
grok-vision-beta | $5 | $15 | -

Track what your AI agents actually cost

LLMKit sits between your app and AI providers. Every request gets logged with token counts and dollar costs. Budget limits reject requests before they reach the provider.
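The budget-limit idea described above can be sketched in a few lines. This is not LLMKit's actual API or implementation: `BudgetGate`, `BudgetExceeded`, `check`, and `record` are hypothetical names chosen for illustration, and the sketch assumes the caller can estimate a request's cost before sending it.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List


class BudgetExceeded(Exception):
    """Raised when a request would push spending past the limit."""


@dataclass
class BudgetGate:
    """Hypothetical in-process gate: check before sending, record after."""

    limit_usd: Decimal
    spent_usd: Decimal = Decimal("0")
    log: List[Dict[str, object]] = field(default_factory=list)

    def check(self, estimated_cost_usd: Decimal) -> None:
        # Reject before the request reaches the provider.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"estimated {estimated_cost_usd} USD would exceed the "
                f"{self.limit_usd} USD limit (spent so far: {self.spent_usd})"
            )

    def record(self, model: str, input_tokens: int,
               output_tokens: int, cost_usd: Decimal) -> None:
        # Log token counts and dollar cost for a completed request.
        self.spent_usd += cost_usd
        self.log.append({
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost_usd": cost_usd,
        })


gate = BudgetGate(limit_usd=Decimal("0.05"))
gate.check(Decimal("0.048"))  # within budget, so the request may proceed
gate.record("claude-sonnet-4-6", 12_000, 800, Decimal("0.048"))
try:
    gate.check(Decimal("0.01"))  # 0.048 + 0.01 exceeds the 0.05 limit
except BudgetExceeded as exc:
    print("rejected:", exc)
```

Checking an estimate up front and recording the actual cost afterwards keeps the two concerns separate; a real proxy would also need persistence and concurrency control, which this sketch leaves out.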

MIT licensed. Built with Claude Code. Source on GitHub.