Transcription

Converts audio to text in the specified language.

Endpoint

POST https://api.sambanova.ai/v1/audio/transcriptions

Request parameters

The following tables list the parameters for a transcription request for each supported model, along with each parameter's type, description, and default value.

Whisper Large v3

Parameter	Type	Description	Default
`model`	String	The ID of the model to use.	Required
`prompt`	String	Prompt provided to influence transcription style or vocabulary. Example: "Please transcribe carefully, including pauses and hesitations."	Optional
`temperature`	Number	Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused.	0
`file`	File	Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit is 25MB.	Required
`response_format`	String	The output format is either json or text.	`json`
`language`	String	The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.	Required
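
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the helper name `build_whisper_form` and the model ID string `Whisper-Large-v3` are assumptions that do not come from this page, and the size check assumes that "25MB" means 25 × 1024 × 1024 bytes. The accepted formats, the temperature range, and the output formats are taken from the table.

```python
import os

# Formats and size limit taken from the parameter table above.
ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" is read as 25 MiB


def build_whisper_form(file_path, file_size, language, prompt=None,
                       temperature=0.0, response_format="json",
                       model="Whisper-Large-v3"):  # model ID is an assumption
    """Validate the inputs and return the form fields for a transcription request."""
    extension = os.path.splitext(file_path)[1].lstrip(".").lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension!r}")
    if file_size > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if response_format not in ("json", "text"):
        raise ValueError("response_format must be 'json' or 'text'")

    form = {
        "model": model,
        "language": language,  # ISO-639-1 code, e.g. "en"
        "temperature": str(temperature),
        "response_format": response_format,
    }
    # The prompt is optional, so it is only sent when provided.
    if prompt is not None:
        form["prompt"] = prompt
    return form
```

The returned dictionary can be passed as the `data` argument of `requests.post`, with the opened audio file supplied separately through `files`.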

Qwen2-Audio-7B-Instruct

Parameter	Type	Description	Default
`model`	String	The ID of the model to use.	Required
`messages`	Message	A list of messages containing role (user/system/assistant), type (text/audio_content), and audio_content (base64 audio content).	Required
`response_format`	String	The output format is either json or text.	`json`
`temperature`	Number	Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused.	0
`max_tokens`	Number	The maximum number of tokens to generate.	1000
`file`	File	Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. Each single file must not exceed 30 seconds in duration.	Required
`language`	String	The target language for transcription or translation.	Optional
`stream`	Boolean	Enables streaming responses.	false
`stream_options`	Object	Additional streaming configuration (e.g., {"include_usage": true}).	Optional
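
Because each file sent to Qwen2-Audio-7B-Instruct must not exceed 30 seconds, it can help to check the duration locally before uploading. The sketch below covers WAV input only, using Python's standard `wave` module; the other listed formats would need a third-party decoder. The function names are illustrative and not part of the API.

```python
import wave

MAX_SECONDS = 30.0  # per-file duration limit from the table above


def wav_duration_seconds(source):
    """Return the duration of a WAV file, given a path string or a binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_within_limit(source, max_seconds=MAX_SECONDS):
    """Return True if the WAV file is no longer than max_seconds."""
    return wav_duration_seconds(source) <= max_seconds
```

A caller might run `is_within_limit("/path/to/audio/file.wav")` and split or trim the audio when it returns `False`.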

Request format

This section provides examples of how to send a request using cURL and Python.

CURL

curl --location 'https://api.sambanova.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'model="Qwen2-Audio-7B-Instruct"' \
--form 'language="spanish"' \
--form 'response_format="json"' \
--form 'temperature="0.01"' \
--form 'file=@"/path/to/audio/file.mp3"' \
--form 'stream="true"'

Python

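import requests

def transcribe_audio(audio_file_path, api_key, language="english"):
  """Send an audio file for transcription and return the parsed JSON response."""
  headers = {"Authorization": f"Bearer {api_key}"}

  # Form fields for the multipart request, mirroring the cURL example.
  data = {
      "model": "Qwen2-Audio-7B-Instruct",
      "language": language,
      "response_format": "json",
      "temperature": 0.01,
      "stream": "false",  # Optional; response.json() below assumes a non-streamed body
  }

  # Open the file in a context manager so it is closed after the upload.
  with open(audio_file_path, "rb") as audio_file:
    response = requests.post(
        "https://api.sambanova.ai/v1/audio/transcriptions",
        headers=headers,
        files={"file": audio_file},
        data=data,
    )

  # Raise on HTTP errors instead of trying to parse an error body.
  response.raise_for_status()
  return response.json()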

Response format

The API returns a transcription of the input audio in the selected format.

JSON

{
    "text": "It's a sound effect of a bell chiming, specifically a church bell."
}

Text

It's a sound effect of a bell chiming, specifically a church bell.
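
Client code that supports both output formats can normalize them to a single string. The following is a minimal sketch that assumes a non-streamed response and that the JSON body carries the transcript under the `text` key, as in the example above; `extract_transcript` is a hypothetical helper name.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript string from a raw response body in either format."""
    if response_format == "json":
        return json.loads(body)["text"]
    if response_format == "text":
        return body.strip()
    raise ValueError(f"unknown response_format: {response_format!r}")
```

With the `requests` library, the raw body is available as `response.text`, so a call would look like `extract_transcript(response.text, "json")`.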
