Request body
- sourceUrl: HTTPS or signed URL to the audio file (MP3, WAV, FLAC, etc.). Required if file is not provided.
- file: Optional uploaded audio file using multipart/form-data.
- sourceName: Optional label saved with the transcript.
- options: Optional object. Supported keys:
  - transcriptionModel: Preferred model (e.g., whisper-large-v3); defaults to Horizon’s balanced model.
  - speakerLabels: Boolean; enable speaker diarization (default false).
  - segmentLength: Target characters per chunk (default 800).
  - language: ISO language hint to improve transcription accuracy.
- webhookUrl: Optional HTTPS URL Horizon should call when the transcription finishes.
Sample request
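As a minimal sketch of how a JSON request body could be assembled from the fields listed under Request body, the Python below builds the payload without sending it. The helper name build_transcription_request and the example.com URLs are illustrative placeholders, not part of Horizon's API; the endpoint path and any authentication are not shown here and are left out.

```python
import json
from typing import Any, Optional


def build_transcription_request(
    source_url: str,
    source_name: Optional[str] = None,
    webhook_url: Optional[str] = None,
    **options: Any,
) -> dict:
    """Assemble the JSON body, omitting optional fields that were not set."""
    body: dict = {"sourceUrl": source_url}
    if source_name is not None:
        body["sourceName"] = source_name
    if options:
        # Keys are passed through unchanged, e.g. speakerLabels, language.
        body["options"] = dict(options)
    if webhook_url is not None:
        body["webhookUrl"] = webhook_url
    return body


payload = build_transcription_request(
    "https://example.com/audio/interview.mp3",  # placeholder audio URL
    source_name="Interview",
    webhook_url="https://example.com/hooks/horizon",  # placeholder webhook
    speakerLabels=True,
    segmentLength=800,
    language="en",
)
print(json.dumps(payload, indent=2))
```

Leaving unset optional fields out of the body, rather than sending nulls, lets the documented defaults apply.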
Response
Returns 202 Accepted with jobId, status, statusUrl, and optional etaSeconds. When processing finishes, result contains transcript chunks with timestamps and, when enabled, speaker labels.
Notes
- Audio longer than 30 minutes always runs asynchronously; use statusUrl or webhooks for completion.
- Provide language hints for multilingual content to reduce latency.
- Speaker diarization adds processing time but produces cleaner dialogue segmentation.
- Poll GET /jobs/{jobId} (equivalent to statusUrl) when you want to check progress or download transcripts later.
- To upload the audio directly, send multipart/form-data with a file field instead of sourceUrl.
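The polling approach mentioned in the notes might look roughly like the sketch below. It assumes the job object carries a status field with the values queued, processing, completed, or failed; the function wait_for_job, its parameters, and the scripted responses are hypothetical and stand in for real calls to GET /jobs/{jobId}.

```python
import time
from typing import Callable

# Statuses after which polling can stop.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job reports a terminal status."""
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} polls")


# Stand-in for real HTTP calls: replays a scripted sequence of responses.
_scripted = iter([
    {"jobId": "job_01hx9q9", "status": "queued"},
    {"jobId": "job_01hx9q9", "status": "processing"},
    {"jobId": "job_01hx9q9", "status": "completed"},
])
final = wait_for_job(lambda: next(_scripted), sleep=lambda _seconds: None)
print(final["status"])  # prints "completed"
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP client and easy to exercise offline. For long audio, a webhook avoids polling altogether.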
x402 flow
Audio extraction is billed per minute via Coinbase’s x402 protocol. A missing proof yields: […] /verify and /settle, then replay the POST with the Base64 payload in X-PAYMENT. Horizon restarts the job and returns X-PAYMENT-RESPONSE on success.
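The replay step can be outlined as below. This is a sketch under stated assumptions, not Horizon's client: it assumes a missing proof is signalled with HTTP 402 (the status code x402 is named after) and that the payment payload is JSON before Base64 encoding. The names send, build_payment, and fake_send are hypothetical, the placeholder values carry no real payment data, and the facilitator step is left to the caller.

```python
import base64
import json
from typing import Callable, Tuple

# Hypothetical transport: takes extra request headers, performs the POST,
# and returns (status_code, response_headers, parsed_json_body).
Send = Callable[[dict], Tuple[int, dict, dict]]


def encode_payment(payment: dict) -> str:
    """Base64-encode a JSON payment payload for the X-PAYMENT header."""
    raw = json.dumps(payment).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def post_with_payment(send: Send, build_payment: Callable[[dict], dict]):
    """POST once; if payment is demanded, build a proof and replay the POST."""
    status, headers, body = send({})
    if status != 402:  # assumption: a missing proof is signalled with HTTP 402
        return status, headers, body
    # build_payment stands in for the payment step, which this sketch
    # does not implement.
    payment = build_payment(body)
    return send({"X-PAYMENT": encode_payment(payment)})


def fake_send(extra_headers: dict):
    """Offline stand-in for the real endpoint, used only to exercise the flow."""
    if "X-PAYMENT" not in extra_headers:
        return 402, {}, {"requirements": "placeholder"}
    return (
        202,
        {"X-PAYMENT-RESPONSE": "placeholder"},
        {"jobId": "job_01hx9q9", "status": "queued"},
    )


status, headers, body = post_with_payment(
    fake_send, lambda requirements: {"proof": "placeholder"}
)
print(status, body["status"])  # prints "202 queued"
```

Keeping the transport and the payment builder as injected callables means the same control flow can wrap whichever HTTP client and wallet tooling is actually in use.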
Body (application/json)
- sourceUrl: Provide either sourceUrl or file.
- options: Extraction hints such as language, segmentLength, transcriptionModel, or sheet preferences, depending on the endpoint.
- webhookUrl: Webhook to call when the extraction completes.
- file: Upload the raw file instead of providing sourceUrl.
Response
Extraction job accepted
- jobId: Example: "job_01hx9q9"
- status: Available options: queued, processing, completed, failed
- statusUrl: Canonical link to GET /jobs/{jobId} for this job.
Example:
"extract/pdf"
- result: Present when the job completes synchronously.
- etaSeconds: Estimated seconds until completion.