POST /crawl
Request content crawl
curl --request POST \
  --url https://api.horizon.new/v1/crawl \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "<string>",
  "depth": 1,
  "priority": "normal",
  "notify": {
    "url": "<string>",
    "events": [
      "completed"
    ]
  },
  "webhookUrl": "<string>"
}'
{
  "crawlId": "crawl_01hx9ra",
  "status": "queued",
  "statusUrl": "<string>",
  "etaSeconds": 123
}
Submit a URL for the Horizon crawler to ingest as part of the extraction pipeline. Crawled pages supply training data for search relevance, contextual prompting, and guardrails. The crawler runs asynchronously and respects directives in robots.txt.

Request body

  • url — Required absolute URL to the page or sitemap.
  • depth — Optional integer for how many link levels to follow (0–3); defaults to 1.
  • priority — Optional value (low, normal, high) that influences queue time.
  • notify — Optional webhook configuration ({ "url": "...", "events": ["completed"] }).
  • webhookUrl — Optional shorthand to register a completion webhook for this crawl job.

Sample request

curl https://api.horizon.new/v1/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.horizon.new",
    "depth": 2,
    "priority": "high",
    "notify": {
      "url": "https://example.com/hooks/crawl",
      "events": ["completed", "failed"]
    }
  }'

Response

Returns 202 Accepted with a crawlId, a status of queued, a statusUrl for polling, and an optional etaSeconds estimate. Monitor progress via the statusUrl in the response or subscribe to webhook notifications.
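For example, a minimal TypeScript sketch (assuming a runtime with the global fetch API, such as Node 18+) that submits a crawl and reads the documented 202 fields:

// Minimal sketch: submit a crawl and capture the statusUrl from the 202 response.
async function submitCrawl(targetUrl: string) {
  const res = await fetch("https://api.horizon.new/v1/crawl", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: targetUrl, depth: 2, priority: "high" }),
  });
  if (res.status !== 202) {
    throw new Error(`Crawl not accepted: HTTP ${res.status}`);
  }
  // crawlId, status, statusUrl, and etaSeconds are the documented response fields.
  const { crawlId, statusUrl } = (await res.json()) as {
    crawlId: string;
    statusUrl: string;
  };
  return { crawlId, statusUrl };
}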

Behavior

  • The crawler deduplicates submissions; repeated requests extend expiration but reuse the same crawlId.
  • Scripts, tracking pixels, and assets over 10 MB are skipped to keep the dataset lean.
  • Poll GET /jobs/{jobId} (the statusUrl targets this endpoint) to observe progress until the crawl completes.
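A hedged polling sketch in TypeScript: only the in-flight states queued and processing are documented in the response schema, so the terminal status names are assumptions drawn from the webhook event names ("completed", "failed").

// Poll the statusUrl returned by POST /crawl until the job leaves the queue.
// Terminal status names are assumptions; adjust to what GET /jobs/{jobId} actually returns.
async function waitForCrawl(statusUrl: string, intervalMs = 5000): Promise<string> {
  for (;;) {
    const res = await fetch(statusUrl);
    const job = (await res.json()) as { status: string };
    if (job.status !== "queued" && job.status !== "processing") {
      return job.status; // e.g. "completed" or "failed"
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}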

x402 flow

Ingestion work is priced per crawl via Coinbase’s x402 protocol. When the account needs funding, Horizon returns a 402 challenge:
HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "x402Version": 1,
  "accepts": [
    {
      "scheme": "exact",
      "network": "base-sepolia",
      "maxAmountRequired": "300000",               // $0.30 in 6‑decimal USDC
      "resource": "POST /crawl",
      "description": "Horizon crawl job",
      "mimeType": "application/json",
      "payTo": "0xYourReceivingWallet",
      "maxTimeoutSeconds": 300,
      "asset": "0xYourUSDCContract",
      "extra": {
        "name": "USDC",
        "version": "1"
      }
    }
  ],
  "error": null
}
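For reference, maxAmountRequired is denominated in the asset's atomic units, so with USDC's six decimals the figure above works out to 300000 / 10^6 = 0.30 USDC:

// Convert the atomic maxAmountRequired into a human-readable USDC amount (6 decimals).
const maxAmountRequired = "300000";
const usdcAmount = Number(maxAmountRequired) / 10 ** 6; // 0.30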
Handle the challenge with the Coinbase workflow (a client-side sketch follows the steps):
  1. Provide the accepts object to your facilitator as described in Client & server responsibilities and call /verify + /settle.
  2. Replay the crawl request with the facilitator-issued Base64 payload in the X-PAYMENT header:
    curl https://api.horizon.new/v1/crawl \
      -H "Content-Type: application/json" \
      -H "X-PAYMENT: eyJ4NDAyVmVyc2lvbiI6MSwic2NoZW1lIjoiZXhhY3QiLC4uLn0=" \
      -d '{ ...original body... }'
    
  3. Horizon validates the payment, queues the crawl, and returns the usual 202 Accepted. Successful responses may include an X-PAYMENT-RESPONSE header for bookkeeping.
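Putting the steps together, a hedged TypeScript sketch of the client side. buildPaymentHeader is a placeholder for your facilitator integration (the /verify + /settle round trip); it is not a Horizon or Coinbase API, and the rest uses only the fields shown in the challenge above.

// Placeholder for your facilitator integration: settles the selected "accepts" entry
// and returns the Base64 payload expected in the X-PAYMENT header.
declare function buildPaymentHeader(accepted: unknown): Promise<string>;

// Submit the crawl; on a 402 challenge, settle payment and replay with X-PAYMENT.
async function crawlWithPayment(body: object): Promise<Response> {
  const send = (extraHeaders: Record<string, string> = {}) =>
    fetch("https://api.horizon.new/v1/crawl", {
      method: "POST",
      headers: { "Content-Type": "application/json", ...extraHeaders },
      body: JSON.stringify(body),
    });

  let res = await send();
  if (res.status === 402) {
    const challenge = await res.json();                          // the x402 challenge shown above
    const payment = await buildPaymentHeader(challenge.accepts[0]);
    res = await send({ "X-PAYMENT": payment });                  // replay with the facilitator payload
  }
  return res; // 202 Accepted on success; check X-PAYMENT-RESPONSE for bookkeeping.
}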
Need a facilitator or wallet? The Quickstart for sellers walks through setup on Base and Solana.

Body

application/json

  • url — string<uri>, required.
  • depth — integer, default 1. Required range: 0 <= x <= 3.
  • priority — enum<string>, default normal. Available options: low, normal, high.
  • notify — object.
  • webhookUrl — string<uri>. Webhook to call when the crawl completes.
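Expressed as a TypeScript shape for reference (a sketch inferred from the fields above; the notify object's shape follows the sample request, and the event names are the ones that appear in the examples):

// Request body for POST /crawl, inferred from the schema above.
interface CrawlRequest {
  url: string;                           // required absolute URL or sitemap
  depth?: number;                        // 0-3, defaults to 1
  priority?: "low" | "normal" | "high";  // defaults to "normal"
  notify?: {
    url: string;
    events: string[];                    // "completed" and "failed" appear in the examples
  };
  webhookUrl?: string;                   // shorthand completion webhook
}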

Response

Crawl accepted

  • crawlId — string, required. Example: "crawl_01hx9ra".
  • status — enum<string>, required. Available options: queued, processing.
  • statusUrl — string<uri>, required.
  • etaSeconds — integer | null.
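Or, as a TypeScript shape (a sketch inferred from the fields above):

// 202 Accepted payload for POST /crawl, inferred from the schema above.
interface CrawlAccepted {
  crawlId: string;                   // e.g. "crawl_01hx9ra"
  status: "queued" | "processing";
  statusUrl: string;                 // points at GET /jobs/{jobId}
  etaSeconds: number | null;
}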