
Together AI API Pricing

Pricing for 105 open-source models served on Together AI infrastructure. All prices are per 1M tokens in USD.

Cheapest input: $0.05/1M (gpt-oss-20b)

Most expensive input: $3.50/1M (Meta-Llama-3.1-405B-Instruct-Turbo)

Models with cache pricing: 0 of 105

| Model | Input $/1M | Output $/1M | Cache $/1M |
|---|---|---|---|
| gpt-oss-20b | $0.05 | $0.2 | - |
| Qwen/Qwen1.5-0.5B | $0.1 | $0.1 | - |
| Qwen/Qwen1.5-1.8B | $0.1 | $0.1 | - |
| Qwen/Qwen1.5-4B | $0.1 | $0.1 | - |
| google/gemma-2b | $0.1 | $0.1 | - |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | $0.1 | $0.1 | - |
| microsoft/phi-2 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Base-3B-v1 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Chat-3B-v1 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | $0.1 | $0.1 | - |
| together-ai-up-to-4b | $0.1 | $0.1 | - |
| gpt-oss-120b | $0.15 | $0.6 | - |
| Qwen3-Next-80B-A3B-Instruct | $0.15 | $1.5 | - |
| Qwen3-Next-80B-A3B-Thinking | $0.15 | $1.5 | - |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | - |
| meta-llama/Meta-Llama-3-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | - |
| Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| NousResearch/Nous-Capybara-7B-V1p9 | $0.2 | $0.2 | - |
| NousResearch/Nous-Hermes-llama-2-7b | $0.2 | $0.2 | - |
| Open-Orca/Mistral-7B-OpenOrca | $0.2 | $0.2 | - |
| Qwen/Qwen1.5-7B | $0.2 | $0.2 | - |
| Undi95/Toppy-M-7B | $0.2 | $0.2 | - |
| allenai/OLMo-7B | $0.2 | $0.2 | - |
| codellama/CodeLlama-7b-Instruct-hf | $0.2 | $0.2 | - |
| google/gemma-7b | $0.2 | $0.2 | - |
| lmsys/vicuna-7b-v1.5 | $0.2 | $0.2 | - |
| meta-llama/Llama-2-7b-chat-hf | $0.2 | $0.2 | - |
| meta-llama/Llama-3-8b-chat-hf | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-Instruct-v0.1 | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-Instruct-v0.2 | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-v0.1 | $0.2 | $0.2 | - |
| openchat/openchat-3.5-1210 | $0.2 | $0.2 | - |
| snorkelai/Snorkel-Mistral-PairRM-DPO | $0.2 | $0.2 | - |
| teknium/OpenHermes-2-Mistral-7B | $0.2 | $0.2 | - |
| teknium/OpenHermes-2p5-Mistral-7B | $0.2 | $0.2 | - |
| togethercomputer/GPT-JT-Moderation-6B | $0.2 | $0.2 | - |
| togethercomputer/Llama-2-7B-32K-Instruct | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Base | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Chat | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Instruct | $0.2 | $0.2 | - |
| togethercomputer/StripedHyena-Hessian-7B | $0.2 | $0.2 | - |
| togethercomputer/StripedHyena-Nous-7B | $0.2 | $0.2 | - |
| togethercomputer/alpaca-7b | $0.2 | $0.2 | - |
| zero-one-ai/Yi-6B | $0.2 | $0.2 | - |
| together-ai-4.1b-8b | $0.2 | $0.2 | - |
| Qwen3-235B-A22B-Instruct-2507-tput | $0.2 | $6 | - |
| Qwen3-235B-A22B-fp8-tput | $0.2 | $0.6 | - |
| GLM-4.5-Air-FP8 | $0.2 | $1.1 | - |
| NousResearch/Nous-Hermes-Llama2-13b | $0.225 | $0.225 | - |
| codellama/CodeLlama-13b-Instruct-hf | $0.225 | $0.225 | - |
| meta-llama/Llama-2-13b-chat-hf | $0.225 | $0.225 | - |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | - |
| Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | - |
| Austism/chronos-hermes-13b | $0.3 | $0.3 | - |
| Gryphe/MythoMax-L2-13b | $0.3 | $0.3 | - |
| Nexusflow/NexusRaven-V2-13B | $0.3 | $0.3 | - |
| Qwen/Qwen1.5-14B | $0.3 | $0.3 | - |
| Undi95/ReMM-SLERP-L2-13B | $0.3 | $0.3 | - |
| WizardLM/WizardLM-13B-V1.2 | $0.3 | $0.3 | - |
| lmsys/vicuna-13b-v1.5 | $0.3 | $0.3 | - |
| upstage/SOLAR-10.7B-Instruct-v1.0 | $0.3 | $0.3 | - |
| together-ai-8.1b-21b | $0.3 | $0.3 | - |
| GLM-4.7 | $0.45 | $2 | - |
| Kimi-K2.5 | $0.5 | $2.8 | - |
| meta-llama/Meta-Llama-3-70B-Instruct-Lite | $0.54 | $0.54 | - |
| DeepSeek-R1-0528-tput | $0.55 | $2.19 | - |
| DeepSeek-V3.1 | $0.6 | $1.7 | - |
| Mixtral-8x7B-Instruct-v0.1 | $0.6 | $0.6 | - |
| GLM-4.6 | $0.6 | $2.2 | - |
| Qwen3.5-397B-A17B | $0.6 | $3.6 | - |
| Qwen3-235B-A22B-Thinking-2507 | $0.65 | $3 | - |
| codellama/CodeLlama-34b-Instruct-hf | $0.776 | $0.776 | - |
| NousResearch/Nous-Hermes-2-Yi-34B | $0.8 | $0.8 | - |
| deepseek-ai/deepseek-coder-33b-instruct | $0.8 | $0.8 | - |
| zero-one-ai/Yi-34B | $0.8 | $0.8 | - |
| together-ai-21.1b-41b | $0.8 | $0.8 | - |
| meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | $0.9 | $0.9 | - |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | $0.9 | $0.9 | - |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | $0.9 | $0.9 | - |
| Qwen/Qwen1.5-72B | $0.9 | $0.9 | - |
| codellama/CodeLlama-70b-Instruct-hf | $0.9 | $0.9 | - |
| garage-bAInd/Platypus2-70B-instruct | $0.9 | $0.9 | - |
| meta-llama/Llama-2-70b-chat-hf | $0.9 | $0.9 | - |
| meta-llama/Llama-3-70b-chat-hf | $0.9 | $0.9 | - |
| mistralai/Mixtral-8x7B-v0.1 | $0.9 | $0.9 | - |
| together-ai-41.1b-80b | $0.9 | $0.9 | - |
| Kimi-K2-Instruct | $1 | $3 | - |
| Kimi-K2-Instruct-0905 | $1 | $3 | - |
| Qwen/Qwen2.5-72B-Instruct-Turbo | $1.2 | $1.2 | - |
| microsoft/WizardLM-2-8x22B | $1.2 | $1.2 | - |
| DeepSeek-V3 | $1.25 | $1.25 | - |
| together-ai-81.1b-110b | $1.8 | $1.8 | - |
| Qwen3-Coder-480B-A35B-Instruct-FP8 | $2 | $2 | - |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | $2.4 | $2.4 | - |
| DeepSeek-R1 | $3 | $7 | - |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | - |
| Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | - |
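Since every price above is quoted per 1M tokens, the cost of a single request is just a weighted sum of its input and output token counts. A minimal sketch (the token counts and model choice in the example are illustrative, not from the table):

```python
# Estimate the USD cost of one request from per-1M-token prices.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on
# Meta-Llama-3.1-405B-Instruct-Turbo ($3.5 in / $3.5 out per 1M):
cost = request_cost(2_000, 500, 3.5, 3.5)
print(f"${cost:.6f}")  # (2000 + 500) * 3.5 / 1e6 = $0.008750
```

The same request on gpt-oss-20b ($0.05 in / $0.2 out) would cost `request_cost(2_000, 500, 0.05, 0.2)` = $0.0002, a 40x+ difference for the identical workload.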

Track Together AI costs with LLMKit

Proxy your Together AI requests through LLMKit. Every call gets logged with token counts, dollar costs, and session attribution. Set budget limits that actually reject requests before they hit the provider.
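In practice, proxying usually means swapping the request URL from the provider's endpoint to the proxy's while keeping the payload unchanged. A hedged sketch of that idea, using only the standard library; the LLMKit proxy address and port below are placeholder assumptions, not documented LLMKit endpoints (Together AI's real chat-completions endpoint is `https://api.together.xyz/v1/chat/completions`):

```python
# Sketch: build (but do not send) an OpenAI-compatible chat-completion
# request, routed via a local proxy instead of Together AI directly.
import json
import urllib.request

TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"
PROXY_URL = "http://localhost:8080/v1/chat/completions"  # assumed proxy address

def build_request(api_key: str, model: str, prompt: str,
                  url: str = PROXY_URL) -> urllib.request.Request:
    """Construct the POST request; only the URL differs from a direct call."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("YOUR_KEY", "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
                    "Hello")
print(req.full_url)  # the proxy sees the request before Together AI does
```

Because the proxy sits in the request path, it can count tokens, attach costs from a price table like the one above, and reject the request outright if a budget is exhausted.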

MIT licensed. Built with Claude Code. Source on GitHub.