NVIDIA: Nemotron 3 Ultra
Nemotron 3 Ultra is a text-in, text-out model from NVIDIA with a 1-million-token context window and a per-request output cap of 16,384 tokens. It supports tool use and reasoning, which makes it a candidate for agentic workflows and multi-step tasks. The listing marks structured output as supported, but teams with strict schema requirements should still verify compatibility before committing. It is priced at $0.50 per million input tokens and $2.20 per million output tokens, a competitive range for reasoning-capable models, but no independent benchmarks are currently available to validate its performance. Buyers who need a long-context reasoning model and are open to evaluating an unproven option might shortlist it, particularly if they already operate within the NVIDIA ecosystem. Anyone who prioritizes head-to-head benchmark comparisons should wait for third-party results before making it a primary choice.
- Model ID: nvidia/nemotron-3-ultra-550b-a55b
- Vendor: nvidia
- Tokenizer: Other
- Input Modalities: text
- Output Modalities: text
- Max Output: 16,384 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: text only
- Audio: no
- Moderated: no
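
To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the listed rates ($0.50 per million input tokens, $2.20 per million output tokens) and the listed limits. The function name `estimate_cost` and the constant names are hypothetical, chosen for illustration. The sketch also assumes that the 1-million-token context window bounds the input alone and that billing is strictly linear in token count; neither is confirmed by the listing.

```python
# Rates and limits copied from the listing above.
INPUT_PRICE_PER_MTOK = 0.50    # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_MTOK = 2.20   # USD per 1,000,000 output tokens
CONTEXT_WINDOW = 1_000_000     # tokens; assumed here to bound the input only
MAX_OUTPUT_TOKENS = 16_384     # per-request output cap


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the 1,000,000-token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 16,384-token per-request cap")
    # Assumption: cost is linear in tokens, with no minimum charge or discounts.
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A 200,000-token prompt with a 4,000-token response.
    print(f"${estimate_cost(200_000, 4_000):.4f}")  # prints $0.1088
```

Under these assumptions, even a request that fills the whole context window and uses the full output cap costs roughly $0.54, so input volume dominates the bill for long-context workloads.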