Turn any web page into LLM-ready content

Enter a URL. Siphon renders the page in a headless browser, executes JavaScript, scrolls to trigger lazy-loaded content, and extracts clean Markdown, JSON, or plain text.

SPA Rendering

Executes JavaScript via headless Chromium, waits for async data to load, and extracts the real content from React / Vue / Angular single-page apps.
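
To illustrate why rendering matters, here is a minimal sketch using only Python's standard library. The two HTML snippets are made-up examples, not output from Siphon: one stands in for what a plain HTTP fetch of a single-page app typically returns, the other for the same page after its JavaScript has run.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader (or an LLM) would actually get from this HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical markup: the empty shell a plain HTTP fetch of an SPA often returns.
SPA_SHELL = """<!doctype html>
<html><head><title>Shop</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body></html>"""

# Hypothetical markup: the same page after the bundle has run in a browser.
RENDERED = """<!doctype html>
<html><head><title>Shop</title></head>
<body><div id="root"><h1>Blue Kettle</h1><p>In stock</p></div>
<script src="/bundle.js"></script></body></html>"""

print(repr(visible_text(SPA_SHELL)))
print(repr(visible_text(RENDERED)))
```

The shell yields only the page title, while the rendered markup carries the product text. Closing that gap is the job of a headless browser.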

Scroll & Lazy Load

Automatically scrolls the page to trigger lazy-loaded images, infinite scroll feeds, comment sections, and other dynamically loaded content.

Click to Expand

Clicks "Load More" buttons, expands collapsed sections, and switches tabs to capture content hidden behind user interactions.

Precision Extraction

Use CSS selectors to target specific content and exclude noise like navigation bars, ads, and footers. Keep only what you need.

AI Enhanced

Optionally use an LLM to restructure extracted content, handle complex tables and multilingual pages, and extract custom fields.

Structured Data

Automatically extracts embedded JSON-LD, Open Graph, and Microdata metadata from the page with zero configuration.
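
As a rough idea of what this kind of metadata looks like in a page, the sketch below pulls JSON-LD blocks and Open Graph tags out of raw HTML with Python's standard library. It is an illustration of the general technique, not Siphon's implementation. The sample page and the `collect_metadata` helper are hypothetical, and Microdata is left out for brevity.

```python
import json
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Gathers JSON-LD script blocks and Open Graph <meta> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.json_ld = []
        self.open_graph = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content") or ""

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD rather than fail the whole page


def collect_metadata(html: str) -> dict:
    """Hypothetical helper: return embedded metadata found in the HTML."""
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return {"json_ld": collector.json_ld, "open_graph": collector.open_graph}


# Made-up sample page used only for this illustration.
SAMPLE_PAGE = """<html><head>
<meta property="og:title" content="Hello">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}
</script>
</head><body><p>Body text</p></body></html>"""

metadata = collect_metadata(SAMPLE_PAGE)
print(metadata["open_graph"])
print(metadata["json_ld"][0]["headline"])
```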

API Usage

    # Simple call: get Markdown directly
    curl "http://siphon.world/extract?url=https://example.com"

    # SPA page: wait for the content, then extract only the target
    curl -X POST http://siphon.world/extract \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://vuejs.org/guide/introduction.html",
        "wait_for_selector": ".content",
        "target_selector": ".content",
        "exclude_selectors": [".VPSidebarNav"]
      }'

    # Lazy-load page: auto scroll
    curl -X POST http://siphon.world/extract \
      -H "Content-Type: application/json" \
      -d '{"url": "https://news.ycombinator.com", "scroll_to_bottom": true, "max_scrolls": 5}'
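
The same calls can be sketched in Python with only the standard library. The endpoint and the JSON field names are taken from the curl examples above. The helper names (`build_get_request`, `build_post_request`, `extract`) are hypothetical, and treating the response body as UTF-8 text is an assumption, since the response format is not documented here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as it appears in the curl examples.
EXTRACT_ENDPOINT = "http://siphon.world/extract"


def build_get_request(page_url: str) -> urllib.request.Request:
    """Simple call: pass the page URL as a URL-encoded query parameter."""
    query = urllib.parse.urlencode({"url": page_url})
    return urllib.request.Request(f"{EXTRACT_ENDPOINT}?{query}", method="GET")


def build_post_request(page_url: str, **options) -> urllib.request.Request:
    """POST call: send the page URL plus any extra options as a JSON body.

    Options seen in the curl examples: wait_for_selector, target_selector,
    exclude_selectors, scroll_to_bottom, max_scrolls.
    """
    payload = {"url": page_url, **options}
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the body as text (assumed to be UTF-8)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `extract(build_post_request("https://news.ycombinator.com", scroll_to_bottom=True, max_scrolls=5))` would mirror the auto-scroll curl call. Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.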
