Extract a website

The calls below assume fetchWithPayment already wraps your HTTP requests and settles x402 invoices.

1. Submit the website extraction

const baseUrl = process.env.HORIZON_BASE_URL ?? 'https://api.worklet.cloud/v1';

const siteResponse = await fetchWithPayment(`${baseUrl}/extract/website`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    sourceUrl: 'https://blog.horizon.new/product-update',
    sourceName: 'Product Update',
    options: {
      selector: 'article',
      stripSelectors: ['.share-buttons', '.newsletter-cta'],
    },
    webhookUrl: 'https://example.com/webhooks/horizon/extraction',
  }),
});

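// Fail fast on a non-2xx response instead of reading fields from an error body.
if (!siteResponse.ok) {
  throw new Error(`Website extraction request failed with HTTP ${siteResponse.status}`);
}

const siteJob = await siteResponse.json();
console.log('website job', siteJob.jobId);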

2. Handle synchronous completion

if (siteJob.status === 'completed' && siteJob.result) {
  console.log('Inline chunks', siteJob.result.chunks?.length ?? 0);
}

3. Pull normalized content

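// Fetch the job status, then keep polling while the job is still processing.
const fetchStatus = () => fetchWithPayment(siteJob.statusUrl).then((res) => res.json());

let status = await fetchStatus();

while (status.state === 'processing') {
  // Wait 3 seconds between polls, then re-read the status.
  await new Promise((resolve) => setTimeout(resolve, 3000));
  status = await fetchStatus();
}

if (status.state !== 'succeeded') {
  throw new Error(`Website extraction failed: ${status.error?.code ?? 'unknown'}`);
}

const { chunks } = status.result;
console.log('First chunk preview', chunks[0]?.content?.slice(0, 120));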
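
If a job can stay in processing for a long time, it may help to cap the number of polls. The following is a minimal sketch of a bounded polling helper for the status check above. The names pollUntilSettled, fetchJson, intervalMs, and maxAttempts are hypothetical and not part of the API; the only assumption about the API is the state field used in this step.

```javascript
// Hypothetical helper: polls a status URL until the job leaves 'processing'.
// `fetchJson` is any function that takes a URL and resolves to parsed JSON,
// e.g. (url) => fetchWithPayment(url).then((res) => res.json()).
async function pollUntilSettled(
  fetchJson,
  statusUrl,
  { intervalMs = 3000, maxAttempts = 20 } = {},
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const status = await fetchJson(statusUrl);

    // Any state other than 'processing' is returned as-is;
    // the caller decides how to treat success or failure.
    if (status.state !== 'processing') {
      return status;
    }

    // Sleep between attempts, but not after the final one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }

  throw new Error(`Job still processing after ${maxAttempts} attempts`);
}
```

Passing the fetch function in as an argument keeps the helper independent of how payment is handled and makes it easy to exercise with canned responses.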

4. Keep content fresh

Re-run the extraction on a schedule and compare status.result.hash with the value from the previous run to detect drift.
Record the canonical URL so /search can surface back-links to the original page.
Combine with /examples/discovery/crawl-website when you need breadth plus targeted selectors.
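
The hash comparison can be sketched as follows. This is an illustration only: hasDrifted and lastSeenHashes are hypothetical names, the in-memory Map stands in for whatever storage you use, and the sketch assumes status.result.hash is a string that stays the same while the page content is unchanged.

```javascript
// Hypothetical in-memory store keyed by source URL.
// In practice this would likely be a database or key-value store.
const lastSeenHashes = new Map();

// Records the latest hash for a source and reports whether it differs
// from the hash recorded on the previous run. The first run for a
// source never counts as drift because there is nothing to compare to.
function hasDrifted(sourceUrl, status, store = lastSeenHashes) {
  const nextHash = status.result.hash;
  const previousHash = store.get(sourceUrl);
  store.set(sourceUrl, nextHash);
  return previousHash !== undefined && previousHash !== nextHash;
}
```

A scheduled job could call hasDrifted after each successful extraction and re-index the chunks only when it returns true.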
