nvidia

NVIDIA: Nemotron 3 Ultra

Nemotron 3 Ultra is a text-in, text-out model from NVIDIA with a 1-million-token context window and a per-request output cap of 16,384 tokens. It supports tool use and reasoning, which makes it a candidate for agentic workflows and multi-step tasks. Structured output support is unconfirmed, so teams with strict schema requirements should verify compatibility before committing. At $0.50 per million input tokens and $2.20 per million output tokens, the pricing sits in a competitive range for reasoning-capable models, but there is currently no independent benchmark coverage to validate its actual performance. Buyers who need a long-context reasoning model and are open to evaluating an unproven option might shortlist it, particularly if they already operate within the NVIDIA ecosystem. Anyone prioritizing head-to-head benchmark comparisons should wait for third-party results before making it a primary choice.

Query via API → View on nvidia → Estimate cost

Quality Score

98/100

price + capability + benchmarks

Input Price

$0.50

per 1M tokens

Output Price

$2.20

per 1M tokens

Context Window

1,000,000

tokens

Model ID: nvidia/nemotron-3-ultra-550b-a55b
Vendor: nvidia
Tokenizer: Other
Input Modalities: text
Output Modalities: text
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: text only
Audio: no
Moderated: no

Similar models

nvidia

NVIDIA: Nemotron 3 Ultra

Similar models

NVIDIA: Nemotron 3 Super

NVIDIA: Nemotron 3 Nano 30B A3B

NVIDIA: Nemotron 3 Nano Omni (free)

NVIDIA: Nemotron 3 Super (free)

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

NVIDIA: Nemotron Nano 12B 2 VL (free)