.doc, .docx) or similar word-processing files into clean text blocks. Styles, tables, and lists are normalized while preserving heading hierarchy for downstream indexing.
Request body
- sourceUrl: HTTPS or signed URL pointing to the document. Required if file is not provided.
- file: Optional uploaded document (.doc, .docx, etc.) using multipart/form-data.
- sourceName: Optional label for the extracted dataset (e.g., Pricing Policy).
- options: Optional object. Supported keys:
  - segmentLength: Target characters per chunk (default 1000).
  - language: ISO language hint to improve sentence segmentation.
  - includeComments: Boolean; include tracked changes and comments in output (default false).
- webhookUrl: Optional HTTPS URL Horizon should call when the extraction finishes.
Sample request
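The request body fields can be assembled as in the following minimal sketch. The base URL, the endpoint path, and the document URL are placeholders (assumptions, not taken from this page), and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: placeholder URL and path; substitute the real extraction endpoint.
API_URL = "https://api.example.com/extract/word"


def build_extraction_request(source_url, source_name=None, segment_length=1000,
                             language=None, include_comments=False, webhook_url=None):
    """Build (but do not send) a JSON POST request for a document extraction job."""
    options = {"segmentLength": segment_length, "includeComments": include_comments}
    if language is not None:
        options["language"] = language

    body = {"sourceUrl": source_url, "options": options}
    if source_name is not None:
        body["sourceName"] = source_name
    if webhook_url is not None:
        body["webhookUrl"] = webhook_url

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_extraction_request(
    "https://files.example.com/pricing-policy.docx",  # hypothetical document URL
    source_name="Pricing Policy",
    language="en",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```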
Response
Returns 202 Accepted with jobId, status, and statusUrl. When the document is small, extracted chunks appear immediately in result.
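A client might branch on whether result is already present, as in this sketch. The sample payloads are illustrative only: the statusUrl value and the shape of result are assumptions, since this page does not show a full response body.

```python
def summarize_accepted_job(status_code, payload):
    """Summarize a parsed 202 response body from the extraction endpoint."""
    if status_code != 202:
        raise ValueError(f"expected 202 Accepted, got {status_code}")
    return {
        "job_id": payload["jobId"],
        "status": payload["status"],
        "status_url": payload["statusUrl"],
        # result is only present when the job completed synchronously
        "chunks": payload.get("result"),
        "needs_polling": "result" not in payload,
    }


# Illustrative payloads; the field values are made up for the example.
queued = summarize_accepted_job(202, {
    "jobId": "job_01hx9q9",
    "status": "queued",
    "statusUrl": "https://api.example.com/jobs/job_01hx9q9",
})
finished = summarize_accepted_job(202, {
    "jobId": "job_01hx9q9",
    "status": "completed",
    "statusUrl": "https://api.example.com/jobs/job_01hx9q9",
    "result": ["First chunk of text", "Second chunk of text"],
})
print(queued["needs_polling"], finished["needs_polling"])
```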
Notes
- Tracked changes are removed by default; set includeComments: true to retain reviewer notes.
- Embedded images are ignored; captions are extracted where available.
- Use segmentLength to tune chunk size for language models or vector storage.
- Poll GET /jobs/{jobId} (matches the statusUrl) to check progress or fetch the final output later on demand.
- To upload the document directly, send multipart/form-data with a file field instead of sourceUrl.
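The polling note above could be implemented roughly as follows. This is a sketch, not a reference client: fetch_status is a hypothetical callable that performs GET on the statusUrl and returns the parsed JSON, and the interval and attempt limit are arbitrary defaults.

```python
import time

# Statuses after which polling should stop.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_status, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait between polls, not after the last one
    raise TimeoutError(f"job still not finished after {max_attempts} polls")


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"jobId": "job_01hx9q9", "status": "queued"},
    {"jobId": "job_01hx9q9", "status": "processing"},
    {"jobId": "job_01hx9q9", "status": "completed", "result": ["..."]},
])
final = wait_for_job(lambda: next(responses), sleep=lambda _: None)
print(final["status"])  # prints: completed
```

Passing the fetcher and the sleep function as arguments keeps the loop testable without network access or real delays.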
x402 flow
Word document extraction is priced via Coinbase’s x402 protocol. A missing proof yields a challenge. Send the challenge's accepts payload to your facilitator, complete /verify and /settle, then replay the request with the Base64 token in X-PAYMENT. Horizon validates the proof, resumes extraction, and includes X-PAYMENT-RESPONSE on success. Refer to the Coinbase quickstart if you need a reference implementation.
Body
application/json
Provide either sourceUrl or file.
Extraction hints such as language, segmentLength, transcriptionModel, or sheet preferences depending on the endpoint.
Webhook to call when the extraction completes.
Upload the raw file instead of providing sourceUrl.
Response
Extraction job accepted
Example: "job_01hx9q9"
Available options: queued, processing, completed, failed
Canonical link to GET /jobs/{jobId} for this job.
Example: "extract/pdf"
Present when the job completes synchronously.
Estimated seconds until completion.
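The x402 flow described earlier (challenge, payment, replay with X-PAYMENT) can be sketched as below. It assumes the challenge arrives as an HTTP 402 response whose body carries an accepts field, as is usual for x402. Both send and obtain_payment_token are hypothetical callables: the second stands in for the facilitator /verify and /settle steps, which are not implemented here, and the challenge contents are placeholders.

```python
def request_with_x402(send, obtain_payment_token, headers=None):
    """Send a request; on a 402 challenge, pay and replay once with X-PAYMENT.

    send(headers) -> (status_code, response_headers, body) performs the HTTP call.
    obtain_payment_token(accepts) -> Base64 token from the facilitator (assumed).
    """
    headers = dict(headers or {})
    status, response_headers, body = send(headers)
    if status != 402:
        return status, response_headers, body

    # Hand the challenge's accepts payload to the payment step, then replay.
    token = obtain_payment_token(body["accepts"])
    return send({**headers, "X-PAYMENT": token})


def fake_send(headers):
    """Stand-in server: challenges unpaid requests, accepts paid ones."""
    if "X-PAYMENT" not in headers:
        return 402, {}, {"accepts": [{"scheme": "exact"}]}  # placeholder challenge
    return 202, {"X-PAYMENT-RESPONSE": "settled"}, {"jobId": "job_01hx9q9", "status": "queued"}


status, response_headers, body = request_with_x402(fake_send, lambda accepts: "base64-token")
print(status, response_headers.get("X-PAYMENT-RESPONSE"), body["jobId"])
```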