Parameter | Type | Description | Default |
---|---|---|---|
model | String | The ID of the model to use. | Required |
prompt | String | Prompt provided to influence transcription style or vocabulary. Example: Please transcribe carefully, including pauses and hesitations. “ | Optional |
temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit is 25MB. | Required |
response_format | String | Output format: JSON or text. | json |
language | String | The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency. | Required |
Parameter | Type | Description | Default |
---|---|---|---|
model | String | The ID of the model to use. | Required |
messages | Message | A list of messages containing role (user/system/assistant), type (text/audio_content), and audio_content (base64 audio content). | Required |
response_format | String | The output format is either json or text. | json |
temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
max_tokens | Number | The maximum number of tokens to generate. | 1000 |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. Each single file must not exceed 30 seconds in duration. | Required |
language | String | The target language for transcription or translation. | Optional |
stream | Boolean | Enables streaming responses. | false |
stream_options | Object | Additional streaming configuration (e.g., {“include_usage”: true}). | Optional |