257 models: LLaMA, Mixtral, and other open models on Fireworks AI. Prices are in USD per 1M tokens.
Cheapest input: $0.0001/1M (SSD-1B)
Most expensive input: $3/1M (yi-large)
Models with cache pricing: 6 of 257
| Model | Input $/1M | Output $/1M | Cache $/1M |
|---|---|---|---|
| SSD-1B | $0.0001 | $0.0001 | - |
| japanese-stable-diffusion-xl | $0.0001 | $0.0001 | - |
| playground-v2-1024px-aesthetic | $0.0001 | $0.0001 | - |
| playground-v2-5-1024px-aesthetic | $0.0001 | $0.0001 | - |
| stable-diffusion-xl-1024-v1-0 | $0.0001 | $0.0001 | - |
| flux-1-schnell-fp8 | $0.0003 | $0.0003 | - |
| flux-1-dev-fp8 | $0.0005 | $0.0005 | - |
| flux-1-dev-controlnet-union | $0.001 | $0.001 | - |
| flux-kontext-pro | $0.04 | $0.04 | - |
| gpt-oss-20b | $0.07 | $0.3 | $0.04 |
| flux-kontext-max | $0.08 | $0.08 | - |
| gemma-3-27b-it | $0.1 | $0.1 | - |
| llama-v3p2-1b-instruct | $0.1 | $0.1 | - |
| llama-v3p2-3b-instruct | $0.1 | $0.1 | - |
| codegemma-2b | $0.1 | $0.1 | - |
| cogito-v1-preview-llama-3b | $0.1 | $0.1 | - |
| deepseek-coder-1b-base | $0.1 | $0.1 | - |
| deepseek-r1-distill-qwen-1p5b | $0.1 | $0.1 | - |
| ernie-4p5-21b-a3b-pt | $0.1 | $0.1 | - |
| ernie-4p5-300b-a47b-pt | $0.1 | $0.1 | - |
| flux-1-dev | $0.1 | $0.1 | - |
| flux-1-schnell | $0.1 | $0.1 | - |
| gemma-2b-it | $0.1 | $0.1 | - |
| llama-guard-3-1b | $0.1 | $0.1 | - |
| llama-v2-70b | $0.1 | $0.1 | - |
| llama-v3p1-405b-instruct-long | $0.1 | $0.1 | - |
| llama-v3p1-70b-instruct-1b | $0.1 | $0.1 | - |
| llama-v3p2-1b | $0.1 | $0.1 | - |
| llama-v3p2-3b | $0.1 | $0.1 | - |
| minimax-m1-80k | $0.1 | $0.1 | - |
| ministral-3-3b-instruct-2512 | $0.1 | $0.1 | - |
| nemotron-nano-v2-12b-vl | $0.1 | $0.1 | - |
| phi-2-3b | $0.1 | $0.1 | - |
| phi-3-mini-128k-instruct | $0.1 | $0.1 | - |
| qwen2-vl-2b-instruct | $0.1 | $0.1 | - |
| qwen2p5-0p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-1p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-0p5b | $0.1 | $0.1 | - |
| qwen2p5-coder-0p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-1p5b | $0.1 | $0.1 | - |
| qwen2p5-coder-1p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-3b | $0.1 | $0.1 | - |
| qwen2p5-coder-3b-instruct | $0.1 | $0.1 | - |
| qwen3-0p6b | $0.1 | $0.1 | - |
| qwen3-1p7b | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft-131072 | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft-40960 | $0.1 | $0.1 | - |
| stablecode-3b | $0.1 | $0.1 | - |
| starcoder2-3b | $0.1 | $0.1 | - |
| gpt-oss-120b | $0.15 | $0.6 | $0.07 |
| llama4-scout-instruct-basic | $0.15 | $0.6 | - |
| qwen3-30b-a3b | $0.15 | $0.6 | - |
| qwen3-coder-30b-a3b-instruct | $0.15 | $0.6 | - |
| qwen3-vl-30b-a3b-instruct | $0.15 | $0.6 | - |
| qwen3-vl-30b-a3b-thinking | $0.15 | $0.6 | - |
| accounts/fireworks/models/llama-v3p1-8b-instruct | $0.2 | $0.2 | $0.1 |
| llama-v3p1-8b-instruct | $0.2 | $0.2 | - |
| fireworks-ai-4.1b-to-16b | $0.2 | $0.2 | - |
| fireworks-ai-up-to-4b | $0.2 | $0.2 | - |
| llama-v3p2-11b-vision-instruct | $0.2 | $0.2 | - |
| chronos-hermes-13b-v2 | $0.2 | $0.2 | - |
| code-llama-13b | $0.2 | $0.2 | - |
| code-llama-13b-instruct | $0.2 | $0.2 | - |
| code-llama-13b-python | $0.2 | $0.2 | - |
| code-llama-7b | $0.2 | $0.2 | - |
| code-llama-7b-instruct | $0.2 | $0.2 | - |
| code-llama-7b-python | $0.2 | $0.2 | - |
| code-qwen-1p5-7b | $0.2 | $0.2 | - |
| codegemma-7b | $0.2 | $0.2 | - |
| cogito-v1-preview-llama-8b | $0.2 | $0.2 | - |
| cogito-v1-preview-qwen-14b | $0.2 | $0.2 | - |
| deepseek-coder-7b-base | $0.2 | $0.2 | - |
| deepseek-coder-7b-base-v1p5 | $0.2 | $0.2 | - |
| deepseek-coder-7b-instruct-v1p5 | $0.2 | $0.2 | - |
| deepseek-r1-0528-distill-qwen3-8b | $0.2 | $0.2 | - |
| deepseek-r1-distill-llama-8b | $0.2 | $0.2 | - |
| deepseek-r1-distill-qwen-14b | $0.2 | $0.2 | - |
| deepseek-r1-distill-qwen-7b | $0.2 | $0.2 | - |
| dobby-mini-unhinged-plus-llama-3-1-8b | $0.2 | $0.2 | - |
| firellava-13b | $0.2 | $0.2 | - |
| firesearch-ocr-v6 | $0.2 | $0.2 | - |
| gemma-7b | $0.2 | $0.2 | - |
| gemma-7b-it | $0.2 | $0.2 | - |
| gemma2-9b-it | $0.2 | $0.2 | - |
| hermes-2-pro-mistral-7b | $0.2 | $0.2 | - |
| internvl3-8b | $0.2 | $0.2 | - |
| llama-guard-2-8b | $0.2 | $0.2 | - |
| llama-guard-3-8b | $0.2 | $0.2 | - |
| llama-v2-13b | $0.2 | $0.2 | - |
| llama-v2-13b-chat | $0.2 | $0.2 | - |
| llama-v2-7b | $0.2 | $0.2 | - |
| llama-v2-7b-chat | $0.2 | $0.2 | - |
| llama-v3-8b | $0.2 | $0.2 | - |
| llama-v3-8b-instruct-hf | $0.2 | $0.2 | - |
| llamaguard-7b | $0.2 | $0.2 | - |
| ministral-3-14b-instruct-2512 | $0.2 | $0.2 | - |
| ministral-3-8b-instruct-2512 | $0.2 | $0.2 | - |
| mistral-7b | $0.2 | $0.2 | - |
| mistral-7b-instruct-4k | $0.2 | $0.2 | - |
| mistral-7b-instruct-v0p2 | $0.2 | $0.2 | - |
| mistral-7b-instruct-v3 | $0.2 | $0.2 | - |
| mistral-7b-v0p2 | $0.2 | $0.2 | - |
| mistral-nemo-base-2407 | $0.2 | $0.2 | - |
| mistral-nemo-instruct-2407 | $0.2 | $0.2 | - |
| mythomax-l2-13b | $0.2 | $0.2 | - |
| nous-capybara-7b-v1p9 | $0.2 | $0.2 | - |
| nous-hermes-llama2-13b | $0.2 | $0.2 | - |
| nous-hermes-llama2-7b | $0.2 | $0.2 | - |
| nvidia-nemotron-nano-12b-v2 | $0.2 | $0.2 | - |
| nvidia-nemotron-nano-9b-v2 | $0.2 | $0.2 | - |
| openchat-3p5-0106-7b | $0.2 | $0.2 | - |
| openhermes-2-mistral-7b | $0.2 | $0.2 | - |
| openhermes-2p5-mistral-7b | $0.2 | $0.2 | - |
| openorca-7b | $0.2 | $0.2 | - |
| phi-3-vision-128k-instruct | $0.2 | $0.2 | - |
| pythia-12b | $0.2 | $0.2 | - |
| qwen-v2p5-14b-instruct | $0.2 | $0.2 | - |
| qwen-v2p5-7b | $0.2 | $0.2 | - |
| qwen2-7b-instruct | $0.2 | $0.2 | - |
| qwen2-vl-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-14b | $0.2 | $0.2 | - |
| qwen2p5-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-coder-14b | $0.2 | $0.2 | - |
| qwen2p5-coder-14b-instruct | $0.2 | $0.2 | - |
| qwen2p5-coder-7b | $0.2 | $0.2 | - |
| qwen2p5-coder-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-vl-3b-instruct | $0.2 | $0.2 | - |
| qwen2p5-vl-7b-instruct | $0.2 | $0.2 | - |
| qwen3-14b | $0.2 | $0.2 | - |
| qwen3-4b | $0.2 | $0.2 | - |
| qwen3-4b-instruct-2507 | $0.2 | $0.2 | - |
| qwen3-8b | $0.2 | $0.2 | - |
| qwen3-vl-8b-instruct | $0.2 | $0.2 | - |
| rolm-ocr | $0.2 | $0.2 | - |
| snorkel-mistral-7b-pairrm-dpo | $0.2 | $0.2 | - |
| starcoder-16b | $0.2 | $0.2 | - |
| starcoder-7b | $0.2 | $0.2 | - |
| starcoder2-15b | $0.2 | $0.2 | - |
| starcoder2-7b | $0.2 | $0.2 | - |
| toppy-m-7b | $0.2 | $0.2 | - |
| yi-6b | $0.2 | $0.2 | - |
| zephyr-7b-beta | $0.2 | $0.2 | - |
| llama4-maverick-instruct-basic | $0.22 | $0.88 | - |
| qwen3-235b-a22b | $0.22 | $0.88 | - |
| glm-4p5-air | $0.22 | $0.88 | - |
| qwen3-235b-a22b-instruct-2507 | $0.22 | $0.88 | - |
| qwen3-235b-a22b-thinking-2507 | $0.22 | $0.88 | - |
| qwen3-vl-235b-a22b-instruct | $0.22 | $0.88 | - |
| qwen3-vl-235b-a22b-thinking | $0.22 | $0.88 | - |
| minimax-m2p1 | $0.3 | $1.2 | - |
| minimax-m2 | $0.3 | $1.2 | - |
| qwen3-coder-480b-a35b-instruct | $0.45 | $1.8 | - |
| fireworks-ai-moe-up-to-56b | $0.5 | $0.5 | - |
| deepseek-coder-v2-lite-base | $0.5 | $0.5 | - |
| deepseek-coder-v2-lite-instruct | $0.5 | $0.5 | - |
| deepseek-v2-lite-chat | $0.5 | $0.5 | - |
| dolphin-2p6-mixtral-8x7b | $0.5 | $0.5 | - |
| firefunction-v1 | $0.5 | $0.5 | - |
| gpt-oss-safeguard-20b | $0.5 | $0.5 | - |
| mixtral-8x7b | $0.5 | $0.5 | - |
| mixtral-8x7b-instruct | $0.5 | $0.5 | - |
| mixtral-8x7b-instruct-hf | $0.5 | $0.5 | - |
| nous-hermes-2-mixtral-8x7b-dpo | $0.5 | $0.5 | - |
| qwen3-30b-a3b-instruct-2507 | $0.5 | $0.5 | - |
| deepseek-r1-basic | $0.55 | $2.19 | - |
| glm-4p5 | $0.55 | $2.19 | - |
| glm-4p6 | $0.55 | $2.19 | - |
| deepseek-v3p2 | $0.56 | $1.68 | $0.28 |
| deepseek-v3p1 | $0.56 | $1.68 | - |
| deepseek-v3p1-terminus | $0.56 | $1.68 | - |
| glm-4p7 | $0.6 | $2.2 | - |
| kimi-k2p5 | $0.6 | $3 | $0.1 |
| kimi-k2-instruct | $0.6 | $2.5 | - |
| kimi-k2-instruct-0905 | $0.6 | $2.5 | - |
| kimi-k2-thinking | $0.6 | $2.5 | - |
| accounts/fireworks/models/llama-v3p3-70b-instruct | $0.9 | $0.9 | $0.45 |
| deepseek-v3-0324 | $0.9 | $0.9 | - |
| qwen2p5-vl-72b-instruct | $0.9 | $0.9 | - |
| fireworks-ai-above-16b | $0.9 | $0.9 | - |
| deepseek-v3 | $0.9 | $0.9 | - |
| firefunction-v2 | $0.9 | $0.9 | - |
| llama-v3p2-90b-vision-instruct | $0.9 | $0.9 | - |
| qwen2-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct | $0.9 | $0.9 | - |
| code-llama-34b | $0.9 | $0.9 | - |
| code-llama-34b-instruct | $0.9 | $0.9 | - |
| code-llama-34b-python | $0.9 | $0.9 | - |
| code-llama-70b | $0.9 | $0.9 | - |
| code-llama-70b-instruct | $0.9 | $0.9 | - |
| code-llama-70b-python | $0.9 | $0.9 | - |
| cogito-v1-preview-llama-70b | $0.9 | $0.9 | - |
| cogito-v1-preview-qwen-32b | $0.9 | $0.9 | - |
| deepseek-coder-33b-instruct | $0.9 | $0.9 | - |
| deepseek-r1-distill-llama-70b | $0.9 | $0.9 | - |
| deepseek-r1-distill-qwen-32b | $0.9 | $0.9 | - |
| devstral-small-2505 | $0.9 | $0.9 | - |
| dobby-unhinged-llama-3-3-70b-new | $0.9 | $0.9 | - |
| dolphin-2-9-2-qwen2-72b | $0.9 | $0.9 | - |
| fare-20b | $0.9 | $0.9 | - |
| internvl3-38b | $0.9 | $0.9 | - |
| internvl3-78b | $0.9 | $0.9 | - |
| kat-coder | $0.9 | $0.9 | - |
| kat-dev-32b | $0.9 | $0.9 | - |
| kat-dev-72b-exp | $0.9 | $0.9 | - |
| llama-v2-70b-chat | $0.9 | $0.9 | - |
| llama-v3-70b-instruct | $0.9 | $0.9 | - |
| llama-v3-70b-instruct-hf | $0.9 | $0.9 | - |
| llama-v3p1-70b-instruct | $0.9 | $0.9 | - |
| llama-v3p1-nemotron-70b-instruct | $0.9 | $0.9 | - |
| llama-v3p3-70b-instruct | $0.9 | $0.9 | - |
| llava-yi-34b | $0.9 | $0.9 | - |
| mistral-small-24b-instruct-2501 | $0.9 | $0.9 | - |
| nous-hermes-2-yi-34b | $0.9 | $0.9 | - |
| nous-hermes-llama2-70b | $0.9 | $0.9 | - |
| phind-code-llama-34b-python-v1 | $0.9 | $0.9 | - |
| phind-code-llama-34b-v1 | $0.9 | $0.9 | - |
| phind-code-llama-34b-v2 | $0.9 | $0.9 | - |
| qwen-qwq-32b-preview | $0.9 | $0.9 | - |
| qwen1p5-72b-chat | $0.9 | $0.9 | - |
| qwen2-vl-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-32b | $0.9 | $0.9 | - |
| qwen2p5-32b-instruct | $0.9 | $0.9 | - |
| qwen2p5-72b | $0.9 | $0.9 | - |
| qwen2p5-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-coder-32b | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-128k | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-32k-rope | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-64k | $0.9 | $0.9 | - |
| qwen2p5-math-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-vl-32b-instruct | $0.9 | $0.9 | - |
| qwen3-30b-a3b-thinking-2507 | $0.9 | $0.9 | - |
| qwen3-32b | $0.9 | $0.9 | - |
| qwen3-coder-480b-instruct-bf16 | $0.9 | $0.9 | - |
| qwen3-next-80b-a3b-instruct | $0.9 | $0.9 | - |
| qwen3-next-80b-a3b-thinking | $0.9 | $0.9 | - |
| qwen3-vl-32b-instruct | $0.9 | $0.9 | - |
| qwq-32b | $0.9 | $0.9 | - |
| yi-34b | $0.9 | $0.9 | - |
| yi-34b-200k-capybara | $0.9 | $0.9 | - |
| yi-34b-chat | $0.9 | $0.9 | - |
| fireworks-ai-56b-to-176b | $1.2 | $1.2 | - |
| deepseek-coder-v2-instruct | $1.2 | $1.2 | - |
| mixtral-8x22b-instruct-hf | $1.2 | $1.2 | - |
| cogito-671b-v2-p1 | $1.2 | $1.2 | - |
| dbrx-instruct | $1.2 | $1.2 | - |
| deepseek-prover-v2 | $1.2 | $1.2 | - |
| deepseek-v2p5 | $1.2 | $1.2 | - |
| glm-4p5v | $1.2 | $1.2 | - |
| gpt-oss-safeguard-120b | $1.2 | $1.2 | - |
| mistral-large-3-fp8 | $1.2 | $1.2 | - |
| mixtral-8x22b | $1.2 | $1.2 | - |
| mixtral-8x22b-instruct | $1.2 | $1.2 | - |
| deepseek-r1-0528 | $3 | $8 | - |
| deepseek-r1 | $3 | $8 | - |
| llama-v3p1-405b-instruct | $3 | $3 | - |
| yi-large | $3 | $3 | - |
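Because every price above is quoted per 1M tokens, the cost of a single request is the token count times the rate, divided by one million. The sketch below shows that arithmetic for a few rows copied from the table. The function name `estimate_cost` and the `PRICES` dictionary are illustrative, not part of any Fireworks AI or LLMKit API. It also assumes that cached input tokens are billed at the cache rate instead of the input rate, which the table does not state explicitly.

```python
# Prices in USD per 1M tokens, copied from three rows of the table above.
# A cache price of None corresponds to "-" in the table.
PRICES = {
    "gpt-oss-20b": {"input": 0.07, "output": 0.30, "cache": 0.04},
    "deepseek-v3p2": {"input": 0.56, "output": 1.68, "cache": 0.28},
    "yi-large": {"input": 3.00, "output": 3.00, "cache": None},
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the portion of input_tokens served from
    cache and is billed at the cache rate. Models without a cache price
    bill all input tokens at the normal input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    price = PRICES[model]
    cache_rate = price["cache"] if price["cache"] is not None else price["input"]

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * price["input"]
        + cached_tokens * cache_rate
        + output_tokens * price["output"]
    )
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    cost = estimate_cost("gpt-oss-20b", input_tokens=10_000,
                         output_tokens=2_000, cached_tokens=4_000)
    print(f"estimated cost: ${cost:.6f}")
```

Floating-point arithmetic is adequate for estimates like this. For billing-grade totals, `decimal.Decimal` would avoid rounding drift.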
Proxy your Fireworks AI requests through LLMKit. Every call gets logged with token counts, dollar costs, and session attribution. Set budget limits that actually reject requests before they hit the provider.
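The idea of rejecting a request before it reaches the provider can be sketched as a pre-flight check against a running total. This is a minimal illustration, not LLMKit's actual implementation or API: `BudgetGuard`, `BudgetExceeded`, `guarded_call`, and the `send` callable are all hypothetical names, and the sketch assumes the caller can estimate a request's cost up front.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push spending past the limit."""


class BudgetGuard:
    """Tracks spend in USD against a fixed limit (hypothetical helper)."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def check(self, estimated_cost_usd):
        # Reject if the estimate would take total spend over the limit.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"limit ${self.limit_usd:.2f} would be exceeded "
                f"(spent ${self.spent_usd:.2f}, "
                f"request ~${estimated_cost_usd:.2f})"
            )

    def record(self, actual_cost_usd):
        # Log the real cost once the provider has responded.
        self.spent_usd += actual_cost_usd


def guarded_call(guard, estimated_cost_usd, send):
    """Run send() only if the budget allows it.

    send is a placeholder for whatever forwards the request upstream.
    In this sketch it returns the actual cost of the call in USD.
    """
    guard.check(estimated_cost_usd)  # raises before send() is invoked
    actual_cost_usd = send()
    guard.record(actual_cost_usd)
    return actual_cost_usd


if __name__ == "__main__":
    guard = BudgetGuard(limit_usd=1.00)
    guarded_call(guard, 0.60, send=lambda: 0.60)
    try:
        guarded_call(guard, 0.60, send=lambda: 0.60)
    except BudgetExceeded as exc:
        print(f"rejected: {exc}")
```

The check runs before `send` is called, so a rejected request never generates provider traffic or cost.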
MIT licensed. Built with Claude Code. Source available on GitHub.