
Fireworks AI API Pricing

257 models: LLaMA, Mixtral, and other open models hosted on Fireworks AI. All prices are per 1M tokens in USD.

- Cheapest input: $0.0001/1M (SSD-1B)
- Most expensive input: $3/1M (yi-large)
- Models with cache pricing: 6 of 257
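Since all prices are quoted per 1M tokens, the dollar cost of a single request is simple arithmetic. A minimal sketch (the token counts are made up for illustration; the prices are gpt-oss-120b's from the table below):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """USD cost of one request, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# gpt-oss-120b: $0.15 input / $0.6 output per 1M tokens.
# A request with 10k input tokens and 2k output tokens:
cost = request_cost(10_000, 2_000, 0.15, 0.60)
print(f"${cost:.4f}")  # → $0.0027
```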

| Model | Input $/1M | Output $/1M | Cache $/1M |
|---|---|---|---|
| SSD-1B | $0.0001 | $0.0001 | - |
| japanese-stable-diffusion-xl | $0.0001 | $0.0001 | - |
| playground-v2-1024px-aesthetic | $0.0001 | $0.0001 | - |
| playground-v2-5-1024px-aesthetic | $0.0001 | $0.0001 | - |
| stable-diffusion-xl-1024-v1-0 | $0.0001 | $0.0001 | - |
| flux-1-schnell-fp8 | $0.0003 | $0.0003 | - |
| flux-1-dev-fp8 | $0.0005 | $0.0005 | - |
| flux-1-dev-controlnet-union | $0.001 | $0.001 | - |
| flux-kontext-pro | $0.04 | $0.04 | - |
| gpt-oss-20b | $0.07 | $0.3 | $0.04 |
| flux-kontext-max | $0.08 | $0.08 | - |
| gemma-3-27b-it | $0.1 | $0.1 | - |
| llama-v3p2-1b-instruct | $0.1 | $0.1 | - |
| llama-v3p2-3b-instruct | $0.1 | $0.1 | - |
| codegemma-2b | $0.1 | $0.1 | - |
| cogito-v1-preview-llama-3b | $0.1 | $0.1 | - |
| deepseek-coder-1b-base | $0.1 | $0.1 | - |
| deepseek-r1-distill-qwen-1p5b | $0.1 | $0.1 | - |
| ernie-4p5-21b-a3b-pt | $0.1 | $0.1 | - |
| ernie-4p5-300b-a47b-pt | $0.1 | $0.1 | - |
| flux-1-dev | $0.1 | $0.1 | - |
| flux-1-schnell | $0.1 | $0.1 | - |
| gemma-2b-it | $0.1 | $0.1 | - |
| llama-guard-3-1b | $0.1 | $0.1 | - |
| llama-v2-70b | $0.1 | $0.1 | - |
| llama-v3p1-405b-instruct-long | $0.1 | $0.1 | - |
| llama-v3p1-70b-instruct-1b | $0.1 | $0.1 | - |
| llama-v3p2-1b | $0.1 | $0.1 | - |
| llama-v3p2-3b | $0.1 | $0.1 | - |
| minimax-m1-80k | $0.1 | $0.1 | - |
| ministral-3-3b-instruct-2512 | $0.1 | $0.1 | - |
| nemotron-nano-v2-12b-vl | $0.1 | $0.1 | - |
| phi-2-3b | $0.1 | $0.1 | - |
| phi-3-mini-128k-instruct | $0.1 | $0.1 | - |
| qwen2-vl-2b-instruct | $0.1 | $0.1 | - |
| qwen2p5-0p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-1p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-0p5b | $0.1 | $0.1 | - |
| qwen2p5-coder-0p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-1p5b | $0.1 | $0.1 | - |
| qwen2p5-coder-1p5b-instruct | $0.1 | $0.1 | - |
| qwen2p5-coder-3b | $0.1 | $0.1 | - |
| qwen2p5-coder-3b-instruct | $0.1 | $0.1 | - |
| qwen3-0p6b | $0.1 | $0.1 | - |
| qwen3-1p7b | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft-131072 | $0.1 | $0.1 | - |
| qwen3-1p7b-fp8-draft-40960 | $0.1 | $0.1 | - |
| stablecode-3b | $0.1 | $0.1 | - |
| starcoder2-3b | $0.1 | $0.1 | - |
| gpt-oss-120b | $0.15 | $0.6 | $0.07 |
| llama4-scout-instruct-basic | $0.15 | $0.6 | - |
| qwen3-30b-a3b | $0.15 | $0.6 | - |
| qwen3-coder-30b-a3b-instruct | $0.15 | $0.6 | - |
| qwen3-vl-30b-a3b-instruct | $0.15 | $0.6 | - |
| qwen3-vl-30b-a3b-thinking | $0.15 | $0.6 | - |
| accounts/fireworks/models/llama-v3p1-8b-instruct | $0.2 | $0.2 | $0.1 |
| llama-v3p1-8b-instruct | $0.2 | $0.2 | - |
| fireworks-ai-4.1b-to-16b | $0.2 | $0.2 | - |
| fireworks-ai-up-to-4b | $0.2 | $0.2 | - |
| llama-v3p2-11b-vision-instruct | $0.2 | $0.2 | - |
| chronos-hermes-13b-v2 | $0.2 | $0.2 | - |
| code-llama-13b | $0.2 | $0.2 | - |
| code-llama-13b-instruct | $0.2 | $0.2 | - |
| code-llama-13b-python | $0.2 | $0.2 | - |
| code-llama-7b | $0.2 | $0.2 | - |
| code-llama-7b-instruct | $0.2 | $0.2 | - |
| code-llama-7b-python | $0.2 | $0.2 | - |
| code-qwen-1p5-7b | $0.2 | $0.2 | - |
| codegemma-7b | $0.2 | $0.2 | - |
| cogito-v1-preview-llama-8b | $0.2 | $0.2 | - |
| cogito-v1-preview-qwen-14b | $0.2 | $0.2 | - |
| deepseek-coder-7b-base | $0.2 | $0.2 | - |
| deepseek-coder-7b-base-v1p5 | $0.2 | $0.2 | - |
| deepseek-coder-7b-instruct-v1p5 | $0.2 | $0.2 | - |
| deepseek-r1-0528-distill-qwen3-8b | $0.2 | $0.2 | - |
| deepseek-r1-distill-llama-8b | $0.2 | $0.2 | - |
| deepseek-r1-distill-qwen-14b | $0.2 | $0.2 | - |
| deepseek-r1-distill-qwen-7b | $0.2 | $0.2 | - |
| dobby-mini-unhinged-plus-llama-3-1-8b | $0.2 | $0.2 | - |
| firellava-13b | $0.2 | $0.2 | - |
| firesearch-ocr-v6 | $0.2 | $0.2 | - |
| gemma-7b | $0.2 | $0.2 | - |
| gemma-7b-it | $0.2 | $0.2 | - |
| gemma2-9b-it | $0.2 | $0.2 | - |
| hermes-2-pro-mistral-7b | $0.2 | $0.2 | - |
| internvl3-8b | $0.2 | $0.2 | - |
| llama-guard-2-8b | $0.2 | $0.2 | - |
| llama-guard-3-8b | $0.2 | $0.2 | - |
| llama-v2-13b | $0.2 | $0.2 | - |
| llama-v2-13b-chat | $0.2 | $0.2 | - |
| llama-v2-7b | $0.2 | $0.2 | - |
| llama-v2-7b-chat | $0.2 | $0.2 | - |
| llama-v3-8b | $0.2 | $0.2 | - |
| llama-v3-8b-instruct-hf | $0.2 | $0.2 | - |
| llamaguard-7b | $0.2 | $0.2 | - |
| ministral-3-14b-instruct-2512 | $0.2 | $0.2 | - |
| ministral-3-8b-instruct-2512 | $0.2 | $0.2 | - |
| mistral-7b | $0.2 | $0.2 | - |
| mistral-7b-instruct-4k | $0.2 | $0.2 | - |
| mistral-7b-instruct-v0p2 | $0.2 | $0.2 | - |
| mistral-7b-instruct-v3 | $0.2 | $0.2 | - |
| mistral-7b-v0p2 | $0.2 | $0.2 | - |
| mistral-nemo-base-2407 | $0.2 | $0.2 | - |
| mistral-nemo-instruct-2407 | $0.2 | $0.2 | - |
| mythomax-l2-13b | $0.2 | $0.2 | - |
| nous-capybara-7b-v1p9 | $0.2 | $0.2 | - |
| nous-hermes-llama2-13b | $0.2 | $0.2 | - |
| nous-hermes-llama2-7b | $0.2 | $0.2 | - |
| nvidia-nemotron-nano-12b-v2 | $0.2 | $0.2 | - |
| nvidia-nemotron-nano-9b-v2 | $0.2 | $0.2 | - |
| openchat-3p5-0106-7b | $0.2 | $0.2 | - |
| openhermes-2-mistral-7b | $0.2 | $0.2 | - |
| openhermes-2p5-mistral-7b | $0.2 | $0.2 | - |
| openorca-7b | $0.2 | $0.2 | - |
| phi-3-vision-128k-instruct | $0.2 | $0.2 | - |
| pythia-12b | $0.2 | $0.2 | - |
| qwen-v2p5-14b-instruct | $0.2 | $0.2 | - |
| qwen-v2p5-7b | $0.2 | $0.2 | - |
| qwen2-7b-instruct | $0.2 | $0.2 | - |
| qwen2-vl-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-14b | $0.2 | $0.2 | - |
| qwen2p5-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-coder-14b | $0.2 | $0.2 | - |
| qwen2p5-coder-14b-instruct | $0.2 | $0.2 | - |
| qwen2p5-coder-7b | $0.2 | $0.2 | - |
| qwen2p5-coder-7b-instruct | $0.2 | $0.2 | - |
| qwen2p5-vl-3b-instruct | $0.2 | $0.2 | - |
| qwen2p5-vl-7b-instruct | $0.2 | $0.2 | - |
| qwen3-14b | $0.2 | $0.2 | - |
| qwen3-4b | $0.2 | $0.2 | - |
| qwen3-4b-instruct-2507 | $0.2 | $0.2 | - |
| qwen3-8b | $0.2 | $0.2 | - |
| qwen3-vl-8b-instruct | $0.2 | $0.2 | - |
| rolm-ocr | $0.2 | $0.2 | - |
| snorkel-mistral-7b-pairrm-dpo | $0.2 | $0.2 | - |
| starcoder-16b | $0.2 | $0.2 | - |
| starcoder-7b | $0.2 | $0.2 | - |
| starcoder2-15b | $0.2 | $0.2 | - |
| starcoder2-7b | $0.2 | $0.2 | - |
| toppy-m-7b | $0.2 | $0.2 | - |
| yi-6b | $0.2 | $0.2 | - |
| zephyr-7b-beta | $0.2 | $0.2 | - |
| llama4-maverick-instruct-basic | $0.22 | $0.88 | - |
| qwen3-235b-a22b | $0.22 | $0.88 | - |
| glm-4p5-air | $0.22 | $0.88 | - |
| qwen3-235b-a22b-instruct-2507 | $0.22 | $0.88 | - |
| qwen3-235b-a22b-thinking-2507 | $0.22 | $0.88 | - |
| qwen3-vl-235b-a22b-instruct | $0.22 | $0.88 | - |
| qwen3-vl-235b-a22b-thinking | $0.22 | $0.88 | - |
| minimax-m2p1 | $0.3 | $1.2 | - |
| minimax-m2 | $0.3 | $1.2 | - |
| qwen3-coder-480b-a35b-instruct | $0.45 | $1.8 | - |
| fireworks-ai-moe-up-to-56b | $0.5 | $0.5 | - |
| deepseek-coder-v2-lite-base | $0.5 | $0.5 | - |
| deepseek-coder-v2-lite-instruct | $0.5 | $0.5 | - |
| deepseek-v2-lite-chat | $0.5 | $0.5 | - |
| dolphin-2p6-mixtral-8x7b | $0.5 | $0.5 | - |
| firefunction-v1 | $0.5 | $0.5 | - |
| gpt-oss-safeguard-20b | $0.5 | $0.5 | - |
| mixtral-8x7b | $0.5 | $0.5 | - |
| mixtral-8x7b-instruct | $0.5 | $0.5 | - |
| mixtral-8x7b-instruct-hf | $0.5 | $0.5 | - |
| nous-hermes-2-mixtral-8x7b-dpo | $0.5 | $0.5 | - |
| qwen3-30b-a3b-instruct-2507 | $0.5 | $0.5 | - |
| deepseek-r1-basic | $0.55 | $2.19 | - |
| glm-4p5 | $0.55 | $2.19 | - |
| glm-4p6 | $0.55 | $2.19 | - |
| deepseek-v3p2 | $0.56 | $1.68 | $0.28 |
| deepseek-v3p1 | $0.56 | $1.68 | - |
| deepseek-v3p1-terminus | $0.56 | $1.68 | - |
| glm-4p7 | $0.6 | $2.2 | - |
| kimi-k2p5 | $0.6 | $3 | $0.1 |
| kimi-k2-instruct | $0.6 | $2.5 | - |
| kimi-k2-instruct-0905 | $0.6 | $2.5 | - |
| kimi-k2-thinking | $0.6 | $2.5 | - |
| accounts/fireworks/models/llama-v3p3-70b-instruct | $0.9 | $0.9 | $0.45 |
| deepseek-v3-0324 | $0.9 | $0.9 | - |
| qwen2p5-vl-72b-instruct | $0.9 | $0.9 | - |
| fireworks-ai-above-16b | $0.9 | $0.9 | - |
| deepseek-v3 | $0.9 | $0.9 | - |
| firefunction-v2 | $0.9 | $0.9 | - |
| llama-v3p2-90b-vision-instruct | $0.9 | $0.9 | - |
| qwen2-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct | $0.9 | $0.9 | - |
| code-llama-34b | $0.9 | $0.9 | - |
| code-llama-34b-instruct | $0.9 | $0.9 | - |
| code-llama-34b-python | $0.9 | $0.9 | - |
| code-llama-70b | $0.9 | $0.9 | - |
| code-llama-70b-instruct | $0.9 | $0.9 | - |
| code-llama-70b-python | $0.9 | $0.9 | - |
| cogito-v1-preview-llama-70b | $0.9 | $0.9 | - |
| cogito-v1-preview-qwen-32b | $0.9 | $0.9 | - |
| deepseek-coder-33b-instruct | $0.9 | $0.9 | - |
| deepseek-r1-distill-llama-70b | $0.9 | $0.9 | - |
| deepseek-r1-distill-qwen-32b | $0.9 | $0.9 | - |
| devstral-small-2505 | $0.9 | $0.9 | - |
| dobby-unhinged-llama-3-3-70b-new | $0.9 | $0.9 | - |
| dolphin-2-9-2-qwen2-72b | $0.9 | $0.9 | - |
| fare-20b | $0.9 | $0.9 | - |
| internvl3-38b | $0.9 | $0.9 | - |
| internvl3-78b | $0.9 | $0.9 | - |
| kat-coder | $0.9 | $0.9 | - |
| kat-dev-32b | $0.9 | $0.9 | - |
| kat-dev-72b-exp | $0.9 | $0.9 | - |
| llama-v2-70b-chat | $0.9 | $0.9 | - |
| llama-v3-70b-instruct | $0.9 | $0.9 | - |
| llama-v3-70b-instruct-hf | $0.9 | $0.9 | - |
| llama-v3p1-70b-instruct | $0.9 | $0.9 | - |
| llama-v3p1-nemotron-70b-instruct | $0.9 | $0.9 | - |
| llama-v3p3-70b-instruct | $0.9 | $0.9 | - |
| llava-yi-34b | $0.9 | $0.9 | - |
| mistral-small-24b-instruct-2501 | $0.9 | $0.9 | - |
| nous-hermes-2-yi-34b | $0.9 | $0.9 | - |
| nous-hermes-llama2-70b | $0.9 | $0.9 | - |
| phind-code-llama-34b-python-v1 | $0.9 | $0.9 | - |
| phind-code-llama-34b-v1 | $0.9 | $0.9 | - |
| phind-code-llama-34b-v2 | $0.9 | $0.9 | - |
| qwen-qwq-32b-preview | $0.9 | $0.9 | - |
| qwen1p5-72b-chat | $0.9 | $0.9 | - |
| qwen2-vl-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-32b | $0.9 | $0.9 | - |
| qwen2p5-32b-instruct | $0.9 | $0.9 | - |
| qwen2p5-72b | $0.9 | $0.9 | - |
| qwen2p5-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-coder-32b | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-128k | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-32k-rope | $0.9 | $0.9 | - |
| qwen2p5-coder-32b-instruct-64k | $0.9 | $0.9 | - |
| qwen2p5-math-72b-instruct | $0.9 | $0.9 | - |
| qwen2p5-vl-32b-instruct | $0.9 | $0.9 | - |
| qwen3-30b-a3b-thinking-2507 | $0.9 | $0.9 | - |
| qwen3-32b | $0.9 | $0.9 | - |
| qwen3-coder-480b-instruct-bf16 | $0.9 | $0.9 | - |
| qwen3-next-80b-a3b-instruct | $0.9 | $0.9 | - |
| qwen3-next-80b-a3b-thinking | $0.9 | $0.9 | - |
| qwen3-vl-32b-instruct | $0.9 | $0.9 | - |
| qwq-32b | $0.9 | $0.9 | - |
| yi-34b | $0.9 | $0.9 | - |
| yi-34b-200k-capybara | $0.9 | $0.9 | - |
| yi-34b-chat | $0.9 | $0.9 | - |
| fireworks-ai-56b-to-176b | $1.2 | $1.2 | - |
| deepseek-coder-v2-instruct | $1.2 | $1.2 | - |
| mixtral-8x22b-instruct-hf | $1.2 | $1.2 | - |
| cogito-671b-v2-p1 | $1.2 | $1.2 | - |
| dbrx-instruct | $1.2 | $1.2 | - |
| deepseek-prover-v2 | $1.2 | $1.2 | - |
| deepseek-v2p5 | $1.2 | $1.2 | - |
| glm-4p5v | $1.2 | $1.2 | - |
| gpt-oss-safeguard-120b | $1.2 | $1.2 | - |
| mistral-large-3-fp8 | $1.2 | $1.2 | - |
| mixtral-8x22b | $1.2 | $1.2 | - |
| mixtral-8x22b-instruct | $1.2 | $1.2 | - |
| deepseek-r1-0528 | $3 | $8 | - |
| deepseek-r1 | $3 | $8 | - |
| llama-v3p1-405b-instruct | $3 | $3 | - |
| yi-large | $3 | $3 | - |
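For the six models with a cache price, input tokens served from the prompt cache bill at the discounted cache rate instead of the full input rate. A sketch of the adjusted calculation, using deepseek-v3p2's prices from the table above (the token counts are illustrative):

```python
def cached_request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                        input_price: float, cache_price: float,
                        output_price: float) -> float:
    """USD cost when `cached_tokens` of the input hit the prompt cache.

    All prices are per 1M tokens; cached tokens bill at cache_price,
    the rest of the input at input_price."""
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_price
            + cached_tokens * cache_price
            + output_tokens * output_price) / 1_000_000

# deepseek-v3p2: $0.56 input, $1.68 output, $0.28 cache per 1M tokens.
# A 100k-token prompt with 80k cached, producing 10k output tokens:
cost = cached_request_cost(100_000, 80_000, 10_000, 0.56, 0.28, 1.68)
print(f"${cost:.4f}")  # → $0.0504
```

With an 80% cache hit rate here, the input portion drops from $0.0560 to $0.0336, a 40% saving on input cost at half-price cache tokens.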

Track Fireworks AI costs with LLMKit

Proxy your Fireworks AI requests through LLMKit. Every call gets logged with token counts, dollar costs, and session attribution. Set budget limits that actually reject requests before they hit the provider.
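LLMKit's own enforcement logic isn't reproduced here, but the pre-flight idea — reject a request before it reaches the provider, rather than discovering the overage on the invoice — can be sketched as follows (class and method names are illustrative, not LLMKit's actual API):

```python
class BudgetGuard:
    """Reject requests once estimated spend would exceed a hard USD limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def preflight(self, estimated_cost_usd: float) -> None:
        """Raise before the request is sent if it would blow the budget."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise RuntimeError(
                f"budget exceeded: ${self.spent_usd:.2f} already spent "
                f"of ${self.limit_usd:.2f} limit"
            )

    def record(self, actual_cost_usd: float) -> None:
        """Accumulate the real cost after the provider responds."""
        self.spent_usd += actual_cost_usd
```

The key design point is estimating cost from the prompt's token count and the model's table price before dispatch, so an over-budget request fails fast and costs nothing.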

MIT licensed. Built with Claude Code. Source on GitHub.