Groq API Pricing

37 models, including LLaMA, Mixtral, and Gemma, served on Groq inference hardware. All prices are per 1M tokens in USD.

Cheapest input: $0.04/1M (llama-3.2-1b-preview)

Most expensive input: $1/1M (kimi-k2-instruct-0905)

Models with cache pricing: 7 of 37

| Model | Input $/1M | Output $/1M | Cache $/1M |
|---|---|---|---|
| llama-3.2-1b-preview | $0.04 | $0.04 | - |
| llama-3.1-8b-instant | $0.05 | $0.08 | - |
| llama3-8b-8192 | $0.05 | $0.08 | - |
| llama-3.2-3b-preview | $0.06 | $0.06 | - |
| gemma-7b-it | $0.07 | $0.07 | - |
| openai/gpt-oss-20b | $0.075 | $0.30 | $0.0375 |
| gpt-oss-20b | $0.075 | $0.30 | $0.0375 |
| gpt-oss-safeguard-20b | $0.075 | $0.30 | $0.037 |
| meta-llama/llama-4-scout-17b-16e-instruct | $0.11 | $0.34 | - |
| llama-4-scout-17b-16e-instruct | $0.11 | $0.34 | - |
| openai/gpt-oss-120b | $0.15 | $0.60 | $0.075 |
| gpt-oss-120b | $0.15 | $0.60 | $0.075 |
| llama-3.2-11b-text-preview | $0.18 | $0.18 | - |
| llama-3.2-11b-vision-preview | $0.18 | $0.18 | - |
| llama3-groq-8b-8192-tool-use-preview | $0.19 | $0.19 | - |
| gemma2-9b-it | $0.20 | $0.20 | - |
| llama-guard-3-8b | $0.20 | $0.20 | - |
| meta-llama/llama-4-maverick-17b-128e-instruct | $0.20 | $0.60 | - |
| meta-llama/llama-guard-4-12b | $0.20 | $0.20 | - |
| llama-guard-4-12b | $0.20 | $0.20 | - |
| llama-4-maverick-17b-128e-instruct | $0.20 | $0.60 | - |
| mixtral-8x7b-32768 | $0.24 | $0.24 | - |
| qwen/qwen3-32b | $0.29 | $0.59 | - |
| qwen3-32b | $0.29 | $0.59 | - |
| llama-3.3-70b-versatile | $0.59 | $0.79 | - |
| llama-3.1-405b-reasoning | $0.59 | $0.79 | - |
| llama-3.1-70b-versatile | $0.59 | $0.79 | - |
| llama-3.3-70b-specdec | $0.59 | $0.99 | - |
| llama3-70b-8192 | $0.59 | $0.79 | - |
| llama2-70b-4096 | $0.70 | $0.80 | - |
| deepseek-r1-distill-llama-70b | $0.75 | $0.99 | - |
| mistral-saba-24b | $0.79 | $0.79 | - |
| llama3-groq-70b-8192-tool-use-preview | $0.89 | $0.89 | - |
| llama-3.2-90b-text-preview | $0.90 | $0.90 | - |
| llama-3.2-90b-vision-preview | $0.90 | $0.90 | - |
| moonshotai/kimi-k2-instruct | $1.00 | $3.00 | $0.50 |
| kimi-k2-instruct-0905 | $1.00 | $3.00 | $0.50 |
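The per-1M-token prices above translate to per-request dollar costs with simple arithmetic. A minimal sketch, using a few prices copied from the table (the `PRICES` dict and `request_cost` helper are illustrative, not part of any Groq SDK):

```python
# Estimate the dollar cost of a single request from per-1M-token prices.
PRICES = {  # model: (input $/1M, output $/1M), taken from the table above
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "kimi-k2-instruct-0905": (1.00, 3.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1M * price, summed for both directions."""
    price_in, price_out = PRICES[model]
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out

# A 2,000-in / 500-out call on the 8B model costs $0.00014.
print(f"${request_cost('llama-3.1-8b-instant', 2_000, 500):.6f}")  # $0.000140
```

At these rates, cost only becomes material at high volume: even a million such requests on the 8B model totals about $140.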

Track Groq costs with LLMKit

Proxy your Groq requests through LLMKit. Every call gets logged with token counts, dollar costs, and session attribution. Set budget limits that actually reject requests before they hit the provider.
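The budget-limit idea reduces to a pre-flight check: reject a request when the session's accumulated spend plus the projected cost would exceed a cap. A minimal sketch of that logic; the `BudgetGuard` class and its method names are illustrative assumptions, not LLMKit's actual API:

```python
# Sketch of a pre-flight budget check: a request is rejected *before* it
# reaches the provider if it would push session spend past the cap.
class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd  # hard cap for the session
        self.spent_usd = 0.0        # spend recorded so far

    def check(self, projected_cost_usd: float) -> bool:
        """Return True if the request may proceed, False to reject it."""
        return self.spent_usd + projected_cost_usd <= self.limit_usd

    def record(self, actual_cost_usd: float) -> None:
        """Record the real cost after a request completes."""
        self.spent_usd += actual_cost_usd

guard = BudgetGuard(limit_usd=0.01)  # one-cent session cap
print(guard.check(0.0001))           # True: well under budget
guard.record(0.0099)
print(guard.check(0.0002))           # False: would exceed the cap
```

Checking before the call, rather than reconciling after, is what makes the limit a hard stop instead of an alert.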

MIT licensed. Built with Claude Code. Source on GitHub.