# Speech-to-Text

> Transcribe audio files using Whisper, GPT-4o Transcribe, or ElevenLabs Scribe

<Note>
  STT uses a dedicated multipart endpoint `POST /api/v2/stt`, not the standard `/generate` endpoint.
</Note>

## Request

```bash theme={null}
curl -X POST https://elumenta.ru/api/v2/stt \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -F "audio=@recording.mp3" \
  -F "model=whisper" \
  -F "language=en"
```

<ParamField header="Authorization" type="string" required>
  `Authorization: Bearer nb_YOUR_API_KEY`
</ParamField>

<ParamField body="audio" type="file" required>
  Audio file. Supported formats: `mp3`, `mp4`, `wav`, `m4a`, `ogg`, `flac`, `webm`. Max size: 25 MB.
</ParamField>

<ParamField body="model" type="string" default="whisper">
  STT model slug; see the Models table below.
</ParamField>

<ParamField body="language" type="string">
  Language code (e.g. `en`, `ru`, `es`). Optional; if omitted, the language is auto-detected.
</ParamField>

<ParamField body="diarize" type="boolean" default="false">
  Speaker diarization. Only supported by `elevenlabs-scribe`.
</ParamField>
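
A client can check the documented constraints locally before uploading and fail fast instead of waiting for a server error. The sketch below is a hypothetical helper (`build_stt_form` is not part of any Elumenta SDK). It assumes that the 25 MB limit means 25 × 1024 × 1024 bytes and that the file extension reflects the audio format. It sends `diarize` as the string `"true"`, as in the curl examples on this page.

```python
from pathlib import Path

# Constraints taken from the parameter descriptions above.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "m4a", "ogg", "flac", "webm"}
# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024
DIARIZE_MODELS = {"elevenlabs-scribe"}


def build_stt_form(path, model="whisper", language=None, diarize=False):
    """Validate an upload and return the non-file form fields (hypothetical helper)."""
    p = Path(path)
    ext = p.suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {ext or '<none>'}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB limit")
    if diarize and model not in DIARIZE_MODELS:
        raise ValueError(f"diarize is only supported by elevenlabs-scribe, not {model}")

    form = {"model": model}
    if language:  # omit to let the service auto-detect the language
        form["language"] = language
    if diarize:
        form["diarize"] = "true"  # multipart form fields are sent as strings
    return form
```

The returned dict is meant to be passed as `data=` alongside `files={"audio": f}` in a `requests.post` call like the Python example at the end of this page.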

## Models

| Slug                | Provider   | Tier    | Cost  | Notes                                   |
| ------------------- | ---------- | ------- | ----- | --------------------------------------- |
| `whisper`           | OpenAI     | Starter | 2 tkn | Fast, 99 languages                      |
| `gpt-4o-transcribe` | OpenAI     | Basic+  | 2 tkn | Highest accuracy                        |
| `elevenlabs-scribe` | ElevenLabs | Basic+  | 2 tkn | Best for meetings, supports diarization |
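
The table can also live on the client as data, so that model choice follows directly from the Notes column. This is an illustrative sketch, not part of the API. `MODELS` mirrors the table above, and the selection rule is an assumption drawn from the notes: diarization needs `elevenlabs-scribe`, accuracy favors `gpt-4o-transcribe`, and everything else uses the default `whisper`. The helper does not check which tier your plan is on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttModel:
    slug: str
    provider: str
    tier: str
    cost_tokens: int
    diarization: bool


# Mirrors the Models table above.
MODELS = {
    m.slug: m
    for m in (
        SttModel("whisper", "OpenAI", "Starter", 2, False),
        SttModel("gpt-4o-transcribe", "OpenAI", "Basic+", 2, False),
        SttModel("elevenlabs-scribe", "ElevenLabs", "Basic+", 2, True),
    )
}


def pick_model(need_diarization=False, prefer_accuracy=False):
    """Pick a slug based on the table's notes (illustrative rule, not an API)."""
    if need_diarization:
        # Only one model in the table supports diarization.
        return next(m.slug for m in MODELS.values() if m.diarization)
    if prefer_accuracy:
        return "gpt-4o-transcribe"  # noted as "Highest accuracy"
    return "whisper"  # the documented default, available from Starter
```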

## Response

```json theme={null}
{
  "id": 18510,
  "status": "completed",
  "model_slug": "whisper",
  "result_text": "Hello and welcome to today's episode...",
  "tokens_spent": 2,
  "processing_ms": 3420
}
```
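
Clients usually want typed access to these fields. The following minimal sketch assumes that the response always carries the six fields shown above. It also assumes that any `status` other than `"completed"` should be treated as an error, because this page does not list the other status values. `SttResult` and `parse_stt_response` are illustrative names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SttResult:
    id: int
    status: str
    model_slug: str
    result_text: str
    tokens_spent: int
    processing_ms: int

    @property
    def processing_seconds(self) -> float:
        return self.processing_ms / 1000


def parse_stt_response(raw: str) -> SttResult:
    data = json.loads(raw)
    # Assumption: only "completed" means a usable transcript.
    if data.get("status") != "completed":
        raise RuntimeError(f"transcription not completed: {data.get('status')!r}")
    # Keep only the documented fields; raises KeyError if one is missing.
    return SttResult(**{f.name: data[f.name] for f in fields(SttResult)})
```

With the Python example at the end of this page, `parse_stt_response(response.text)` would replace the direct `response.json()["result_text"]` lookup.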

## Diarization (who said what)

Available only with `elevenlabs-scribe`:

```bash theme={null}
curl -X POST https://elumenta.ru/api/v2/stt \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -F "audio=@meeting.mp3" \
  -F "model=elevenlabs-scribe" \
  -F "diarize=true"
```

The response includes speaker labels in `result_text`:

```text theme={null}
[Speaker 1]: Hello, let's start the meeting.
[Speaker 2]: Sure, I have three points to discuss.
```
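
The labels are embedded in plain text, so splitting the transcript into turns is up to the client. The following minimal sketch assumes that every turn starts on its own line in the `[Speaker N]: text` form shown above. It appends unlabeled lines, such as wrapped continuation text, to the previous turn. The exact label format may vary, so treat the regular expression as an assumption.

```python
import re

# Matches lines like "[Speaker 1]: Hello, let's start the meeting."
TURN_RE = re.compile(r"^\[Speaker (\d+)\]:\s*(.*)$")


def split_speakers(result_text: str) -> list[tuple[int, str]]:
    """Return (speaker_number, text) turns from a diarized transcript."""
    turns: list[tuple[int, str]] = []
    for line in result_text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = TURN_RE.match(line)
        if m:
            turns.append((int(m.group(1)), m.group(2)))
        elif turns:
            # Assumption: an unlabeled line continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
        # Unlabeled text before the first speaker label is dropped.
    return turns
```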

## Code Examples

<CodeGroup>
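  ```python Python theme={null}
  import requests

  with open("audio.mp3", "rb") as f:
      response = requests.post(
          "https://elumenta.ru/api/v2/stt",
          headers={"Authorization": "Bearer nb_YOUR_API_KEY"},
          # requests builds the multipart body; "audio" is the file field
          files={"audio": f},
          data={"model": "whisper", "language": "en"},
          timeout=120,  # generous client-side timeout for uploads of up to 25 MB
      )

  response.raise_for_status()  # surface HTTP errors instead of a KeyError below
  print(response.json()["result_text"])
  ```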

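  ```javascript JavaScript theme={null}
  // audioBlob is a Blob or File, e.g. taken from an <input type="file"> element.
  // Top-level await requires an ES module; otherwise wrap this in an async function.
  const form = new FormData();
  form.append("audio", audioBlob, "recording.mp3");
  form.append("model", "gpt-4o-transcribe");

  // Don't set Content-Type yourself: fetch adds the multipart boundary
  const res = await fetch("https://elumenta.ru/api/v2/stt", {
    method: "POST",
    headers: { "Authorization": "Bearer nb_YOUR_API_KEY" },
    body: form
  });

  if (!res.ok) {
    throw new Error(`STT request failed with HTTP ${res.status}`);
  }

  const data = await res.json();
  console.log(data.result_text);
  ```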
</CodeGroup>
