105 models. Open-source models on Together AI infrastructure. Prices per 1M tokens in USD.

- Cheapest input: gpt-oss-20b at $0.05/1M
- Most expensive input: Meta-Llama-3.1-405B-Instruct-Turbo at $3.50/1M
- Models with cache pricing: 0 of 105
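Per-1M pricing means a request's dollar cost is just tokens divided by one million, times the rate, summed over input and output. A minimal sketch using the gpt-oss-20b rates from the table below (the function name is illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_1m: float, output_per_1m: float) -> float:
    """Cost in USD: tokens billed pro rata against the per-1M rate."""
    return (input_tokens / 1_000_000) * input_per_1m \
         + (output_tokens / 1_000_000) * output_per_1m

# gpt-oss-20b: $0.05 input / $0.20 output per 1M tokens
cost = request_cost(12_000, 3_000, 0.05, 0.20)
print(f"${cost:.6f}")  # → $0.001200
```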
| Model | Input $/1M | Output $/1M | Cache $/1M |
|---|---|---|---|
| gpt-oss-20b | $0.05 | $0.2 | - |
| Qwen/Qwen1.5-0.5B | $0.1 | $0.1 | - |
| Qwen/Qwen1.5-1.8B | $0.1 | $0.1 | - |
| Qwen/Qwen1.5-4B | $0.1 | $0.1 | - |
| google/gemma-2b | $0.1 | $0.1 | - |
| meta-llama/Meta-Llama-3-8B-Instruct-Lite | $0.1 | $0.1 | - |
| microsoft/phi-2 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Base-3B-v1 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Chat-3B-v1 | $0.1 | $0.1 | - |
| togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | $0.1 | $0.1 | - |
| together-ai-up-to-4b | $0.1 | $0.1 | - |
| gpt-oss-120b | $0.15 | $0.6 | - |
| Qwen3-Next-80B-A3B-Instruct | $0.15 | $1.5 | - |
| Qwen3-Next-80B-A3B-Thinking | $0.15 | $1.5 | - |
| meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | - |
| meta-llama/Meta-Llama-3-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| Llama-4-Scout-17B-16E-Instruct | $0.18 | $0.59 | - |
| Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.18 | - |
| NousResearch/Nous-Capybara-7B-V1p9 | $0.2 | $0.2 | - |
| NousResearch/Nous-Hermes-llama-2-7b | $0.2 | $0.2 | - |
| Open-Orca/Mistral-7B-OpenOrca | $0.2 | $0.2 | - |
| Qwen/Qwen1.5-7B | $0.2 | $0.2 | - |
| Undi95/Toppy-M-7B | $0.2 | $0.2 | - |
| allenai/OLMo-7B | $0.2 | $0.2 | - |
| codellama/CodeLlama-7b-Instruct-hf | $0.2 | $0.2 | - |
| google/gemma-7b | $0.2 | $0.2 | - |
| lmsys/vicuna-7b-v1.5 | $0.2 | $0.2 | - |
| meta-llama/Llama-2-7b-chat-hf | $0.2 | $0.2 | - |
| meta-llama/Llama-3-8b-chat-hf | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-Instruct-v0.1 | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-Instruct-v0.2 | $0.2 | $0.2 | - |
| mistralai/Mistral-7B-v0.1 | $0.2 | $0.2 | - |
| openchat/openchat-3.5-1210 | $0.2 | $0.2 | - |
| snorkelai/Snorkel-Mistral-PairRM-DPO | $0.2 | $0.2 | - |
| teknium/OpenHermes-2-Mistral-7B | $0.2 | $0.2 | - |
| teknium/OpenHermes-2p5-Mistral-7B | $0.2 | $0.2 | - |
| togethercomputer/GPT-JT-Moderation-6B | $0.2 | $0.2 | - |
| togethercomputer/Llama-2-7B-32K-Instruct | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Base | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Chat | $0.2 | $0.2 | - |
| togethercomputer/RedPajama-INCITE-7B-Instruct | $0.2 | $0.2 | - |
| togethercomputer/StripedHyena-Hessian-7B | $0.2 | $0.2 | - |
| togethercomputer/StripedHyena-Nous-7B | $0.2 | $0.2 | - |
| togethercomputer/alpaca-7b | $0.2 | $0.2 | - |
| zero-one-ai/Yi-6B | $0.2 | $0.2 | - |
| together-ai-4.1b-8b | $0.2 | $0.2 | - |
| Qwen3-235B-A22B-Instruct-2507-tput | $0.2 | $6 | - |
| Qwen3-235B-A22B-fp8-tput | $0.2 | $0.6 | - |
| GLM-4.5-Air-FP8 | $0.2 | $1.1 | - |
| NousResearch/Nous-Hermes-Llama2-13b | $0.225 | $0.225 | - |
| codellama/CodeLlama-13b-Instruct-hf | $0.225 | $0.225 | - |
| meta-llama/Llama-2-13b-chat-hf | $0.225 | $0.225 | - |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | - |
| Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.27 | $0.85 | - |
| Austism/chronos-hermes-13b | $0.3 | $0.3 | - |
| Gryphe/MythoMax-L2-13b | $0.3 | $0.3 | - |
| Nexusflow/NexusRaven-V2-13B | $0.3 | $0.3 | - |
| Qwen/Qwen1.5-14B | $0.3 | $0.3 | - |
| Undi95/ReMM-SLERP-L2-13B | $0.3 | $0.3 | - |
| WizardLM/WizardLM-13B-V1.2 | $0.3 | $0.3 | - |
| lmsys/vicuna-13b-v1.5 | $0.3 | $0.3 | - |
| upstage/SOLAR-10.7B-Instruct-v1.0 | $0.3 | $0.3 | - |
| together-ai-8.1b-21b | $0.3 | $0.3 | - |
| GLM-4.7 | $0.45 | $2 | - |
| Kimi-K2.5 | $0.5 | $2.8 | - |
| meta-llama/Meta-Llama-3-70B-Instruct-Lite | $0.54 | $0.54 | - |
| DeepSeek-R1-0528-tput | $0.55 | $2.19 | - |
| DeepSeek-V3.1 | $0.6 | $1.7 | - |
| Mixtral-8x7B-Instruct-v0.1 | $0.6 | $0.6 | - |
| GLM-4.6 | $0.6 | $2.2 | - |
| Qwen3.5-397B-A17B | $0.6 | $3.6 | - |
| Qwen3-235B-A22B-Thinking-2507 | $0.65 | $3 | - |
| codellama/CodeLlama-34b-Instruct-hf | $0.776 | $0.776 | - |
| NousResearch/Nous-Hermes-2-Yi-34B | $0.8 | $0.8 | - |
| deepseek-ai/deepseek-coder-33b-instruct | $0.8 | $0.8 | - |
| zero-one-ai/Yi-34B | $0.8 | $0.8 | - |
| together-ai-21.1b-41b | $0.8 | $0.8 | - |
| meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Meta-Llama-3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| Llama-3.3-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| Meta-Llama-3.1-70B-Instruct-Turbo | $0.88 | $0.88 | - |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | $0.9 | $0.9 | - |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | $0.9 | $0.9 | - |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | $0.9 | $0.9 | - |
| Qwen/Qwen1.5-72B | $0.9 | $0.9 | - |
| codellama/CodeLlama-70b-Instruct-hf | $0.9 | $0.9 | - |
| garage-bAInd/Platypus2-70B-instruct | $0.9 | $0.9 | - |
| meta-llama/Llama-2-70b-chat-hf | $0.9 | $0.9 | - |
| meta-llama/Llama-3-70b-chat-hf | $0.9 | $0.9 | - |
| mistralai/Mixtral-8x7B-v0.1 | $0.9 | $0.9 | - |
| together-ai-41.1b-80b | $0.9 | $0.9 | - |
| Kimi-K2-Instruct | $1 | $3 | - |
| Kimi-K2-Instruct-0905 | $1 | $3 | - |
| Qwen/Qwen2.5-72B-Instruct-Turbo | $1.2 | $1.2 | - |
| microsoft/WizardLM-2-8x22B | $1.2 | $1.2 | - |
| DeepSeek-V3 | $1.25 | $1.25 | - |
| together-ai-81.1b-110b | $1.8 | $1.8 | - |
| Qwen3-Coder-480B-A35B-Instruct-FP8 | $2 | $2 | - |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | $2.4 | $2.4 | - |
| DeepSeek-R1 | $3 | $7 | - |
| meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | - |
| Meta-Llama-3.1-405B-Instruct-Turbo | $3.5 | $3.5 | - |
Proxy your Together AI requests through LLMKit. Every call gets logged with token counts, dollar costs, and session attribution. Set budget limits that actually reject requests before they hit the provider.
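The budget-limit idea can be sketched as a pre-flight check: estimate the worst case this call could cost (max output tokens at the model's output rate) and reject before forwarding if it would push the session over its limit. The names here (`BudgetExceeded`, `check_budget`) are illustrative, not LLMKit's actual API:

```python
class BudgetExceeded(Exception):
    pass

def check_budget(spent_usd: float, limit_usd: float,
                 max_tokens: int, output_per_1m: float) -> None:
    """Raise before forwarding if the worst case would blow the budget."""
    worst_case = (max_tokens / 1_000_000) * output_per_1m
    if spent_usd + worst_case > limit_usd:
        raise BudgetExceeded(
            f"would spend up to ${spent_usd + worst_case:.4f} "
            f"against a ${limit_usd:.2f} limit")

# A session at $4.99 of a $5.00 limit can't afford a worst case of
# 4k tokens of DeepSeek-R1 output ($7/1M → $0.028).
check_budget(4.00, 5.00, 4000, 7.0)      # fine: $4.028 within budget
try:
    check_budget(4.99, 5.00, 4000, 7.0)  # rejected before the provider
except BudgetExceeded as e:
    print(e)
```

Checking against the worst case rather than the actual cost is the conservative choice: the actual output length isn't known until after the provider has been billed.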
MIT licensed. Built with Claude Code. Source on GitHub.