Transcribe audio files into timestamped text segments with optional speaker diarization. Ideal for podcasts, interviews, and recorded calls that need to be searchable or summarized.

Request body

  • sourceUrl — HTTPS or signed URL to the audio file (MP3, WAV, FLAC, etc.). Required if file is not provided.
  • file — Optional uploaded audio file using multipart/form-data.
  • sourceName — Optional label saved with the transcript.
  • options — Optional object. Supported keys:
    • transcriptionModel — Preferred model (e.g., whisper-large-v3); defaults to Horizon’s balanced model.
    • speakerLabels — Boolean; enable speaker diarization (default false).
    • segmentLength — Target characters per chunk (default 800).
    • language — ISO language hint to improve transcription accuracy.
  • webhookUrl — Optional HTTPS URL Horizon should call when the transcription finishes.
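
The field rules above can be checked on the client before sending anything. The following is a minimal sketch, not Horizon's own client: the function name build_transcription_request and its error messages are hypothetical, and it only covers the sourceUrl variant of the request.

```python
import json

# Option keys listed in the request body reference above.
ALLOWED_OPTIONS = {"transcriptionModel", "speakerLabels", "segmentLength", "language"}


def build_transcription_request(source_url, source_name=None, options=None, webhook_url=None):
    """Assemble the JSON body for a sourceUrl-based request (hypothetical helper)."""
    if not source_url:
        # sourceUrl is required whenever no file is uploaded.
        raise ValueError("sourceUrl is required when no file is provided")
    body = {"sourceUrl": source_url}
    if source_name is not None:
        body["sourceName"] = source_name
    if options:
        unknown = set(options) - ALLOWED_OPTIONS
        if unknown:
            raise ValueError(f"unsupported option keys: {sorted(unknown)}")
        body["options"] = dict(options)
    if webhook_url is not None:
        # The reference describes webhookUrl as an HTTPS URL.
        if not webhook_url.startswith("https://"):
            raise ValueError("webhookUrl must use HTTPS")
        body["webhookUrl"] = webhook_url
    return body


if __name__ == "__main__":
    example = build_transcription_request(
        "https://cdn.example.com/audio/office-hours-42.mp3",
        source_name="Office Hours 42",
        options={"speakerLabels": True, "segmentLength": 700, "language": "en"},
    )
    print(json.dumps(example, indent=2))
```

Rejecting unknown option keys locally is a design choice of this sketch; the API's behavior for unrecognized keys is not documented here.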

Sample request

curl https://api.worklet.cloud/v1/extract/audio \
  -H "Content-Type: application/json" \
  -d '{
    "sourceUrl": "https://cdn.example.com/audio/office-hours-42.mp3",
    "sourceName": "Office Hours 42",
    "options": {
      "transcriptionModel": "whisper-large-v3",
      "speakerLabels": true,
      "segmentLength": 700,
      "language": "en"
    }
  }'

# Or send the audio inline as a Base64 data URI in the file field of a JSON body

curl https://api.worklet.cloud/v1/extract/audio \
  -H "Content-Type: application/json" \
  -d '{
    "file": "data:audio/mpeg;base64,//uQZAAAAAAAAAAAAAAAA...",
    "sourceName": "Office Hours 42",
    "options": {
      "transcriptionModel": "whisper-large-v3",
      "speakerLabels": true,
      "segmentLength": 700,
      "language": "en"
    }
  }'

Response

Returns 202 Accepted with jobId, status, statusUrl, and an optional etaSeconds. Once processing finishes, result holds the transcript chunks with timestamps and, when speakerLabels is enabled, speaker labels.
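
Because the job is accepted before it finishes, a client usually polls statusUrl until the status settles. The sketch below shows one way to do that. The terminal status values "completed" and "failed" are assumptions (the possible values of status are not listed here), and wait_for_job and fetch_json are hypothetical helper names.

```python
import json
import time
import urllib.request

# ASSUMPTION: these terminal status strings are illustrative, not documented values.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until the job reports a terminal status.

    fetch_status is any zero-argument callable returning the job as a dict,
    for example: lambda: fetch_json(status_url).
    """
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_attempts} polls")
```

Passing the fetcher and the sleep function as arguments keeps the loop easy to exercise without network access. If etaSeconds is present in the 202 response, it could be used to pick the first polling delay.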

Notes

  • Audio longer than 30 minutes always runs asynchronously; use statusUrl or webhookUrl to learn when it completes.
  • Provide language hints for multilingual content to reduce latency.
  • Speaker diarization adds processing time but produces cleaner dialogue segmentation.
  • Poll GET /jobs/{jobId} (equivalent to statusUrl) to check progress or to retrieve the transcript later.
  • To upload the audio directly, send multipart/form-data with a file field instead of sourceUrl.
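
For the multipart upload mentioned in the last note, the following sketch builds the request with the Python standard library only. It assumes sourceName can be sent as an ordinary form field next to file; how options are encoded in a multipart request is not described here, so they are left out. The helper name build_multipart_upload is hypothetical.

```python
import urllib.request
import uuid


def build_multipart_upload(url, filename, audio_bytes, content_type="audio/mpeg", source_name=None):
    """Build (but do not send) a multipart/form-data POST with a file field."""
    boundary = uuid.uuid4().hex
    parts = []
    if source_name is not None:
        # ASSUMPTION: sourceName is accepted as a plain form field.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="sourceName"\r\n\r\n'
                f"{source_name}\r\n"
            ).encode("utf-8")
        )
    # The audio itself goes in the "file" field, as the note above describes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

The returned request can be sent with urllib.request.urlopen(request) once any payment header it needs has been attached.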

x402 flow

Audio extraction is billed per minute via Coinbase’s x402 protocol. A request sent without a payment proof receives:
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "x402Version": 1,
  "accepts": [
    {
      "scheme": "exact",
      "network": "base-sepolia",
      "maxAmountRequired": "220000",
      "resource": "POST /extract/audio",
      "description": "Horizon audio transcription",
      "mimeType": "application/json",
      "payTo": "0xYourReceivingWallet",
      "maxTimeoutSeconds": 600,
      "asset": "0xYourUSDCContract",
      "extra": {
        "name": "USDC",
        "version": "1"
      }
    }
  ],
  "error": null
}
Pass the challenge to your facilitator, call its /verify and /settle endpoints, then replay the original POST with the Base64-encoded payment payload in the X-PAYMENT header. Horizon then runs the job and returns X-PAYMENT-RESPONSE on success.
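
The client side of this exchange can be sketched as three small steps: pick a payment requirement from the 402 body, work out the price, and encode the payment payload for the retry. This is an illustration under stated assumptions, not a full x402 client. The helper names are hypothetical, the signed payment payload is treated as an opaque dictionary produced by your wallet or facilitator tooling, and the conversion assumes the asset uses 6 decimal places, as USDC does.

```python
import base64
import json
from decimal import Decimal


def select_requirement(challenge, scheme="exact", network="base-sepolia"):
    """Return the first entry in accepts that matches the scheme and network."""
    for requirement in challenge.get("accepts", []):
        if requirement.get("scheme") == scheme and requirement.get("network") == network:
            return requirement
    raise LookupError(f"no payment option for {scheme} on {network}")


def atomic_to_token_units(amount, decimals=6):
    """Convert an atomic amount string to token units (ASSUMPTION: 6 decimals, as for USDC)."""
    return Decimal(amount) / (Decimal(10) ** decimals)


def encode_payment_header(payment_payload):
    """Base64-encode the JSON payment payload for the X-PAYMENT header."""
    raw = json.dumps(payment_payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

With the challenge shown above, select_requirement returns the single base-sepolia entry, and its maxAmountRequired of "220000" corresponds to 0.22 USDC under the 6-decimal assumption. The encoded string is what goes in the X-PAYMENT header when the POST is replayed.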