Request body
- sourceUrl: Absolute URL to the page. Required if file is not provided.
- file: Optional HTML snapshot (text/html) uploaded via multipart/form-data.
- sourceName: Optional label stored with the extracted record.
- options: Optional object. Supported keys:
  - selector: CSS selector that scopes extraction to a specific container.
  - stripSelectors: Array of selectors to remove (ads, nav, etc.).
  - segmentLength: Target characters per chunk (default 1200).
  - language: ISO language hint for improved segmentation.
- webhookUrl: Optional HTTPS URL Horizon should call when the extraction finishes.
Sample request
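A minimal sketch of how a JSON request body could be assembled from the fields listed under Request body. The helper name build_extraction_request and the example URL and selectors are hypothetical. The endpoint path and authentication headers are not shown on this page, so the sketch stops at building the payload rather than sending it.

```python
import json


def build_extraction_request(source_url, *, source_name=None, selector=None,
                             strip_selectors=None, segment_length=None,
                             language=None, webhook_url=None):
    """Assemble a JSON-serializable body from the documented fields."""
    if not source_url.startswith(("http://", "https://")):
        raise ValueError("sourceUrl must be an absolute URL")
    if webhook_url is not None and not webhook_url.startswith("https://"):
        raise ValueError("webhookUrl must use HTTPS")

    body = {"sourceUrl": source_url}
    if source_name is not None:
        body["sourceName"] = source_name

    # Only send option keys the caller actually set, so server-side
    # defaults (for example segmentLength = 1200) still apply.
    options = {
        "selector": selector,
        "stripSelectors": strip_selectors,
        "segmentLength": segment_length,
        "language": language,
    }
    options = {key: value for key, value in options.items() if value is not None}
    if options:
        body["options"] = options

    if webhook_url is not None:
        body["webhookUrl"] = webhook_url
    return body


# Hypothetical page and selectors, for illustration only.
payload = build_extraction_request(
    "https://example.com/docs/getting-started",
    source_name="Getting started",
    selector="main article",
    strip_selectors=["nav", "footer", ".cookie-banner"],
    language="en",
)
print(json.dumps(payload, indent=2))
```

Leaving unset options out of the payload keeps the documented defaults in effect instead of overriding them with nulls.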
Response
Returns 202 Accepted with jobId, status, and statusUrl. If the page is short, the request may finish synchronously and include the normalized chunks under result.
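Because the same 202 response can describe either a queued job or one that already finished, a client has to branch on it. The sketch below shows one way to do that. The function name next_step, the example statusUrl value, and the shape of result (a dict with a chunks list) are assumptions for illustration.

```python
def next_step(response_body):
    """Decide what to do with the parsed body of a 202 Accepted response."""
    status = response_body["status"]
    if status == "completed" and "result" in response_body:
        # Synchronous completion: the chunks are already in the response.
        return ("done", response_body["result"])
    if status == "failed":
        return ("failed", response_body["jobId"])
    # queued or processing: follow statusUrl until the job finishes.
    return ("poll", response_body["statusUrl"])


# Hypothetical response bodies; only the field names come from this page.
queued = {
    "jobId": "job_01hx9q9",
    "status": "queued",
    "statusUrl": "/jobs/job_01hx9q9",
}
finished = {
    "jobId": "job_01hx9q9",
    "status": "completed",
    "statusUrl": "/jobs/job_01hx9q9",
    "result": {"chunks": ["First chunk of text", "Second chunk of text"]},
}

print(next_step(queued))
print(next_step(finished)[0])
```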
Notes
- The extractor renders the page with a headless browser to execute light client-side JavaScript. Heavier SPAs may require exporting content or using the crawl endpoint.
- Use stripSelectors to remove headers/footers, cookie banners, or social widgets before chunking.
- Authentication-gated URLs are not supported; provide publicly accessible pages or host signed snapshots.
- Poll GET /jobs/{jobId} (same as the statusUrl) to monitor progress or retrieve the extracted chunks later.
- To upload a static HTML snapshot instead of crawling, send multipart/form-data with a file field.
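The polling note above can be sketched as a small loop. The HTTP call is injected as a function so the sketch stays independent of any client library. The names poll_job and fetch_status, the interval, and the attempt limit are assumptions; only the status values come from this page.

```python
import time

# Status values listed in the response reference on this page.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_status, job_id, *, interval_seconds=2.0, max_attempts=30,
             sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal status.

    fetch_status stands in for GET /jobs/{jobId} and must return the
    parsed JSON body as a dict.
    """
    for attempt in range(max_attempts):
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for the real HTTP call: the job finishes on the third poll.
_statuses = iter(["queued", "processing", "completed"])


def fake_fetch_status(job_id):
    return {"jobId": job_id, "status": next(_statuses)}


final = poll_job(fake_fetch_status, "job_01hx9q9", sleep=lambda seconds: None)
print(final["status"])
```

If a webhookUrl was supplied with the request, the webhook can replace this loop entirely, and polling is then only a fallback.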
x402 flow
Website extraction is priced per page via Coinbase’s x402 protocol. A request without a payment proof is answered with HTTP 402 Payment Required, as the x402 protocol defines. Complete the facilitator's /verify and /settle steps, then replay the request with the facilitator-issued Base64 payload in X-PAYMENT. Horizon resumes processing and returns settlement details via X-PAYMENT-RESPONSE.
Body
application/json
Provide either sourceUrl or file.
- options: Extraction hints such as language, segmentLength, transcriptionModel, or sheet preferences, depending on the endpoint.
- webhookUrl: Webhook to call when the extraction completes.
- file: Upload the raw file instead of providing sourceUrl.
Response
Extraction job accepted
- jobId (example: "job_01hx9q9")
- status: One of queued, processing, completed, failed.
- statusUrl: Canonical link to GET /jobs/{jobId} for this job.
Example:
"extract/pdf"
Present when the job completes synchronously.
Estimated seconds until completion.
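The x402 flow described earlier (send the request, receive a payment challenge, replay with X-PAYMENT) can be sketched as a retry wrapper. This is an illustration under stated assumptions, not Horizon's client. The names request_with_payment, send, and obtain_payment_payload are hypothetical, the shape of the 402 body is invented for the demo, and decoding X-PAYMENT-RESPONSE as Base64-encoded JSON is an assumption about its format.

```python
import base64
import json


def request_with_payment(send, obtain_payment_payload):
    """Send a request and replay it once with X-PAYMENT if payment is required.

    send(extra_headers) performs the HTTP call and returns
    (status_code, response_headers, parsed_body).
    obtain_payment_payload(challenge_body) stands in for the facilitator
    steps and returns the Base64 payload for the X-PAYMENT header.
    """
    status, headers, body = send({})
    if status != 402:
        return status, body, None

    payment = obtain_payment_payload(body)
    status, headers, body = send({"X-PAYMENT": payment})

    settlement = None
    # Plain dict lookup; a real client should match header names
    # case-insensitively.
    raw = headers.get("X-PAYMENT-RESPONSE")
    if raw:
        # Assumption: the header carries Base64-encoded JSON.
        settlement = json.loads(base64.b64decode(raw))
    return status, body, settlement


# Stand-ins for the network and the facilitator, for illustration only.
def fake_send(extra_headers):
    if "X-PAYMENT" not in extra_headers:
        return 402, {}, {"error": "payment required"}
    receipt = base64.b64encode(json.dumps({"success": True}).encode()).decode()
    return 202, {"X-PAYMENT-RESPONSE": receipt}, {"jobId": "job_01hx9q9", "status": "queued"}


def fake_obtain_payment_payload(challenge_body):
    return base64.b64encode(b"demo-payment-proof").decode()


status, body, settlement = request_with_payment(fake_send, fake_obtain_payment_payload)
print(status, body["jobId"], settlement)
```

Replaying only once keeps the wrapper from looping if the second attempt is also rejected; the caller then sees the 402 status and can decide what to do.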