nvidia

NVIDIA: Nemotron 3 Ultra

Nemotron 3 Ultra is a text-in, text-out model from NVIDIA with a 1-million-token context window and a per-request output cap of 16,384 tokens. It supports tool use and reasoning, which makes it a candidate for agentic workflows and multi-step tasks. Structured output support is unconfirmed, so teams with strict schema requirements should verify compatibility before committing. At $0.50 per million input tokens and $2.20 per million output tokens, the pricing sits in a competitive range for reasoning-capable models, but there is currently no independent benchmark coverage to validate its actual performance. Buyers who need a long-context reasoning model and are open to evaluating an unproven option might shortlist it, particularly if they already operate within the NVIDIA ecosystem. Anyone prioritizing head-to-head benchmark comparisons should wait for third-party results before making it a primary choice.

Quality Score
98/100
price + capability + benchmarks
Input Price
$0.50
per 1M tokens
Output Price
$2.20
per 1M tokens
Context Window
1,000,000
tokens
Model ID
nvidia/nemotron-3-ultra-550b-a55b
Vendor
nvidia
Tokenizer
Other
Input Modalities
text
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
text only
Audio
no
Moderated
no

Similar models