openai

OpenAI: GPT Audio

GPT Audio is OpenAI's model built for workflows that combine text and audio input, accepting both modalities within a 128,000-token context window. It supports tool use, which makes it usable in agentic pipelines, but it does not offer reasoning mode or confirmed structured output support. Completions are capped at 16,384 tokens per response. At $2.50 per million input tokens and $10.00 per million output tokens, this sits in a mid-to-upper price range, and there is currently no independent benchmark coverage to validate where it stands against alternatives. Buyers who need native audio comprehension alongside text in a single API call have limited options, so GPT Audio may be worth shortlisting on capability fit alone. That said, the absence of benchmark data means performance claims are unverified, and teams with tight budgets or quality thresholds should treat this as an early-stage choice until independent evaluations are available.

Query via API → View on openai → Estimate cost

Quality Score

84/100

price + capability + benchmarks

Input Price

$2.50

per 1M tokens

Output Price

$10.00

per 1M tokens

Context Window

128,000

tokens

Model ID: openai/gpt-audio
Vendor: openai
Tokenizer: GPT
Input Modalities: text, audio
Output Modalities: text, audio
Max Output: 16,384 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: text only
Audio: ✓ accepts audio
Moderated: yes

Category rankings

Where OpenAI: GPT Audio places across the 3 categories it ranks in. How we rank →

#	Category	Score
#18	Audio SummarizationVoice · of 19 ranked	104
#18	TTS ReplacementVoice · of 19 ranked	99
#19	TranscriptionVoice · of 19 ranked	99

Similar models

openai

OpenAI: GPT Audio

Category rankings

Similar models

OpenAI: GPT-5.5 Pro

OpenAI: GPT-5.4 Pro

OpenAI: GPT-5.2 Pro

OpenAI: o3 Deep Research

OpenAI: GPT-5 Pro

OpenAI: o3 Pro