Control real browsers through a simple REST API. Get structured page data, stable element refs, and change diffs instead of raw HTML.
Six steps show the full lifecycle: create a session, observe the page, fill a form, extract data, scroll, and take a screenshot.
{
"url": "https://app.example.com/login",
"viewport": {
"width": 1280,
"height": 720
},
"auto_dismiss_blockers": true
}
{
"session_id": "ses_abc123def456",
"page": {
"url": "https://app.example.com/login",
"title": "Log In",
"stable": true,
"markdown": {
"content": "# Log In\n\nWelcome back..."
},
"interactive_elements": [
{ "ref": "e1", "tag": "input",
"label": "Email" },
{ "ref": "e2", "tag": "input",
"label": "Password" },
{ "ref": "e3", "tag": "button",
"label": "Sign In" }
],
"forms": [
{ "id": "login", "fields": 2 }
]
},
"blockers_dismissed": ["cookie_consent"]
}
Launch Puppeteer, set the viewport, navigate, wait for the network to go idle, detect and dismiss the cookie banner (2-3 extra actions), call page.content(), parse 15,000+ characters of HTML with cheerio, and manually extract the form fields.
~25 lines of code. ~3,500 tokens for the raw HTML alone.
One POST request. Navigate, auto-dismiss the cookie banner, return markdown content, element refs, and form structures. The page is ready for your agent to read and act on.
1 API call. Markdown + refs + forms. ~150 tokens.
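As a rough sketch, the single POST described above could be prepared in Python as follows. The host `https://api.browserbeam.example` and the bearer-token header are placeholders, because this page states neither the API host nor the authentication scheme. The `/v1/sessions` path and the body fields are taken from the request samples on this page.

```python
import json
import urllib.request

# Hypothetical values: the API host and auth scheme are not given on this page.
BASE_URL = "https://api.browserbeam.example"
API_KEY = "YOUR_API_KEY"


def build_create_session_request(url, width=1280, height=720,
                                 auto_dismiss_blockers=True):
    """Build (but do not send) the POST /v1/sessions request."""
    body = {
        "url": url,
        "viewport": {"width": width, "height": height},
        "auto_dismiss_blockers": auto_dismiss_blockers,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_create_session_request("https://app.example.com/login")
# To send it, pass `request` to urllib.request.urlopen() and json.load() the reply.
```

Building the request separately from sending it keeps the payload easy to inspect before any session time is used.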
{
"steps": [
{
"observe": {
"scope": "#main-content",
"format": "markdown",
"include_links": true
}
}
]
}
{
"completed": 1,
"page": {
"url": "https://app.example.com/login",
"stable": true,
"markdown": {
"content": "# Log In\n\nWelcome back.\nEnter your credentials.",
"length": 48
},
"interactive_elements": [
{ "ref": "e1", "tag": "input",
"label": "Email" },
{ "ref": "e2", "tag": "input",
"label": "Password" },
{ "ref": "e3", "tag": "button",
"label": "Sign In" }
],
"changes": null
},
"error": null
}
Call page.content() for the full DOM (15,000+ characters), then use cheerio or regex to extract just the section you need. Convert HTML to markdown yourself. No way to detect what changed since your last read.
~4,000 tokens for a single page read. No scoping. No diff.
Scope to a CSS selector, get back clean markdown and element refs for just that section. The changes field shows what shifted since the last observation, so your agent never re-reads stale content.
~200 tokens scoped. Markdown built in. Diff tracking automatic.
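One way an agent might use the `changes` field is to re-read the markdown only when something is reported as changed. The sketch below assumes that `"changes": null` means nothing changed since the previous observation; that reading is inferred from the samples here, not confirmed. The class and function names are illustrative.

```python
def observe_request(scope, fmt="markdown", include_links=True):
    """Build the body for a scoped observe step, mirroring the sample request."""
    return {"steps": [{"observe": {
        "scope": scope,
        "format": fmt,
        "include_links": include_links,
    }}]}


class ObservationCache:
    """Keeps the last markdown read so the agent re-reads only when needed."""

    def __init__(self):
        self.markdown = None

    def update(self, page):
        """Store the page's markdown; return True if the agent should re-read it."""
        changed = page.get("changes") is not None  # assumption: null means no diff
        first_read = self.markdown is None
        if first_read or changed:
            self.markdown = (page.get("markdown") or {}).get("content", self.markdown)
            return True
        return False


first_page = {
    "url": "https://app.example.com/login",
    "stable": True,
    "markdown": {"content": "# Log In\n\nWelcome back.\nEnter your credentials."},
    "changes": None,
}
# Same page, but with a diff attached (flag names copied from the fill_form sample).
later_page = dict(first_page, changes={"url_changed": True, "title_changed": True})

cache = ObservationCache()
results = [cache.update(first_page), cache.update(first_page), cache.update(later_page)]
```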
{
"steps": [
{
"fill_form": {
"fields": {
"Email": "[email protected]",
"Password": "secret123"
},
"submit": true
}
}
]
}
{
"completed": 1,
"page": {
"url": "https://app.example.com/dashboard",
"title": "Dashboard",
"stable": true,
"changes": {
"url_changed": true,
"title_changed": true
},
"interactive_elements": [
{ "ref": "e1", "tag": "a",
"label": "Settings" },
{ "ref": "e2", "tag": "a",
"label": "Logout" }
]
},
"error": null
}
Find email input by CSS selector (breaks if markup changes), page.type() the value, find password input, type again, find submit button by selector, page.click(), waitForNavigation(), then re-scrape the entire page to see results.
~15 lines. 3 fragile selectors. ~3,500 tokens to re-read the new page.
One step matches fields by label, fills both inputs, finds the submit button, clicks it, and waits for navigation. The response includes the new page state with a diff showing what changed.
1 API call. 0 selectors. Change diff included automatically.
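A minimal sketch of how a client might build the `fill_form` step and read the diff that comes back. The helper names are hypothetical and the email address is a placeholder. The payload and response shapes follow the samples above.

```python
def fill_form_request(fields, submit=True):
    """Build a request body with a single fill_form step keyed by field label."""
    return {"steps": [{"fill_form": {"fields": dict(fields), "submit": submit}}]}


def changed_flags(response):
    """Return the names of the change flags set to true, sorted alphabetically."""
    page = response.get("page") or {}
    changes = page.get("changes") or {}
    return sorted(name for name, flag in changes.items() if flag is True)


payload = fill_form_request({"Email": "user@example.com", "Password": "secret123"})

# Response shape copied from the sample above (trimmed).
sample_response = {
    "completed": 1,
    "page": {
        "url": "https://app.example.com/dashboard",
        "title": "Dashboard",
        "stable": True,
        "changes": {"url_changed": True, "title_changed": True},
    },
    "error": None,
}

flags = changed_flags(sample_response)
logged_in = sample_response["error"] is None and "url_changed" in flags
```

Checking `url_changed` is only one possible success signal. A real agent would likely also compare the new URL or title against what it expects.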
{
"steps": [
{
"extract": {
"products": [
{
"_parent": ".product-card",
"name": "h2 >> text",
"price": ".price >> text",
"url": "a >> href"
}
]
}
}
]
}
{
"completed": 1,
"extraction": {
"products": [
{
"name": "Wireless Headphones",
"price": "$79.99",
"url": "/products/wireless-headphones"
},
{
"name": "USB-C Hub",
"price": "$34.99",
"url": "/products/usb-c-hub"
}
]
},
"error": null
}
Write page.evaluate() with querySelectorAll, map each element to an object, handle null checks for missing fields, JSON.stringify the result, parse it back in Node.js. Selectors break when the site redesigns.
~20 lines of in-page JavaScript. Fragile. No schema validation.
Declare the shape of the data you want. The extract step handles the DOM traversal and returns clean, structured JSON. It parses tables automatically and supports both CSS selectors and XPath.
1 declarative schema. Structured JSON response.
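Once the structured JSON arrives, typing it on the client side is straightforward. The sketch below assumes prices always look like `$79.99` and that relative URLs resolve against the store's base URL. The `Product` class and `parse_products` helper are illustrative, not part of the API.

```python
from dataclasses import dataclass
from decimal import Decimal
from urllib.parse import urljoin


@dataclass
class Product:
    name: str
    price: Decimal
    url: str


def parse_products(response, base_url):
    """Convert the `extraction.products` list into typed Product records."""
    items = (response.get("extraction") or {}).get("products", [])
    return [
        Product(
            name=item["name"],
            # Assumes a leading "$" and optional thousands separators.
            price=Decimal(item["price"].lstrip("$").replace(",", "")),
            url=urljoin(base_url, item["url"]),
        )
        for item in items
    ]


# Response shape copied from the sample above.
sample_response = {
    "completed": 1,
    "extraction": {"products": [
        {"name": "Wireless Headphones", "price": "$79.99",
         "url": "/products/wireless-headphones"},
        {"name": "USB-C Hub", "price": "$34.99", "url": "/products/usb-c-hub"},
    ]},
    "error": None,
}

products = parse_products(sample_response, "https://store.example.com")
total = sum(p.price for p in products)
```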
{
"steps": [
{
"scroll_collect": {
"max_text_length": 50000,
"max_scrolls": 30
}
}
]
}
{
"completed": 1,
"page": {
"url": "https://news.example.com/feed",
"title": "Latest News",
"stable": true,
"markdown": {
"content": "# Latest News\n\n## Story 1...",
"length": 32840
},
"interactive_elements": [
{ "ref": "e1", "label": "Load More" }
],
"scroll": {
"y": 14200,
"height": 14200,
"percent": 100
}
},
"error": null
}
Write a scroll loop: scroll down, wait for lazy content to load, check if you've reached the bottom, repeat. Handle race conditions with loading spinners and infinite scroll triggers. Collect content at each position, deduplicate.
~35 lines. Fragile timing. Content deduplication is your problem.
One step. Scrolls through the entire page, waits for lazy-loaded content at each position, deduplicates, and returns a single unified markdown observation. Handles infinite scroll, loading spinners, and content gates.
1 step. Full page content in one response. Up to 50,000 characters.
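A client may want to know whether the collected text covers the whole page or was cut off at the limit. The check below is a sketch under two assumptions that this page does not confirm: that `scroll.percent` reaching 100 means the bottom was reached, and that a `markdown.length` equal to `max_text_length` signals truncation.

```python
def scroll_collect_request(max_text_length=50000, max_scrolls=30):
    """Build a request body with a single scroll_collect step."""
    return {"steps": [{"scroll_collect": {
        "max_text_length": max_text_length,
        "max_scrolls": max_scrolls,
    }}]}


def collection_is_complete(page, max_text_length=50000):
    """Heuristic: bottom reached and the text stayed under the length cap."""
    scroll = page.get("scroll") or {}
    length = (page.get("markdown") or {}).get("length", 0)
    return scroll.get("percent") == 100 and length < max_text_length


# Page shape copied from the sample response above (trimmed).
sample_page = {
    "url": "https://news.example.com/feed",
    "markdown": {"content": "# Latest News\n\n## Story 1...", "length": 32840},
    "scroll": {"y": 14200, "height": 14200, "percent": 100},
}

complete = collection_is_complete(sample_page)
```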
{
"steps": [
{
"screenshot": {
"full_page": true,
"format": "png"
}
},
{
"close": {}
}
]
}
{
"completed": 2,
"screenshots": [
{
"format": "png",
"width": 1280,
"height": 3200,
"data": "iVBORw0KGgo..."
}
],
"page": null,
"error": null
}
Call page.screenshot() to write a temp file, read the file into a buffer, base64-encode it, call browser.close(), and handle cleanup errors if the process crashed. You manage the Chrome process lifecycle yourself.
~12 lines. Must manage file I/O and process cleanup.
Screenshot and close in a single call. Base64 image data returned inline. The close step destroys the session and stops the billing clock. No cleanup code needed.
1 API call, 2 steps. Session cleaned up automatically.
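Because the image comes back inline as base64, saving it takes only a few lines on the client. In the sketch below the `save_screenshots` helper and file naming are illustrative. The demo payload contains only the 8-byte PNG signature as a stand-in, since the sample `data` value above is truncated.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_screenshots(response, directory):
    """Decode each inline screenshot and write it to `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, shot in enumerate(response.get("screenshots") or []):
        image = base64.b64decode(shot["data"])
        fmt = shot.get("format", "png")
        # Basic sanity check so corrupted data is not silently written as PNG.
        if fmt == "png" and not image.startswith(PNG_SIGNATURE):
            raise ValueError(f"screenshot {index} is not valid PNG data")
        path = directory / f"screenshot_{index}.{fmt}"
        path.write_bytes(image)
        paths.append(path)
    return paths


# Stand-in payload: "iVBORw0KGgo=" is the base64 of the PNG signature only.
sample_response = {
    "completed": 2,
    "screenshots": [
        {"format": "png", "width": 1280, "height": 3200, "data": "iVBORw0KGgo="}
    ],
    "page": None,
    "error": None,
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_screenshots(sample_response, tmp)
    saved_names = [p.name for p in saved]
    saved_bytes = saved[0].read_bytes()
```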
Six capabilities that sit between your agent and the page, so the LLM spends tokens on the task, not on browser overhead.
Every response includes a stability signal that tells your agent when the page is fully loaded and ready. No more guessing wait times or burning tokens on premature reads.
Interactive elements get short, stable refs like e1, e2, e3. Your agent clicks by ref instead of constructing fragile CSS selectors.
After each action, the API returns only what changed: elements added, removed, or modified. Your agent reads a 30-token diff instead of re-parsing the entire page.
Cookie banners, newsletter popups, and chat widgets are detected and dismissed automatically. Your agent never wastes actions on interruptions irrelevant to the task.
Pages are compressed into a structured, token-efficient representation: interactive elements, headings, and visible text. Thousands of DOM nodes become a compact JSON object.
When an action fails, you get context, not just "element not found." The API tells you if an overlay is blocking the target, if a CAPTCHA appeared, and what to do next.
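Two of the capabilities above, the stability signal and element refs, can be consumed with very little client code. The helpers below are a sketch with hypothetical names. They rely only on the `stable` and `interactive_elements` fields shown in the sample responses on this page.

```python
def is_ready(page):
    """True when a page object is present and reports itself as stable."""
    return page is not None and page.get("stable") is True


def ref_for_label(page, label):
    """Return the ref of the first interactive element whose label matches."""
    wanted = label.strip().lower()
    for element in page.get("interactive_elements", []):
        if element.get("label", "").strip().lower() == wanted:
            return element["ref"]
    return None


# Page shape copied from the login sample on this page.
login_page = {
    "url": "https://app.example.com/login",
    "title": "Log In",
    "stable": True,
    "interactive_elements": [
        {"ref": "e1", "tag": "input", "label": "Email"},
        {"ref": "e2", "tag": "input", "label": "Password"},
        {"ref": "e3", "tag": "button", "label": "Sign In"},
    ],
}

sign_in_ref = ref_for_label(login_page, "sign in") if is_ready(login_page) else None
```

Matching labels case-insensitively is a design choice for this sketch. An agent that needs exact matches could drop the normalization.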
One API, many possibilities. From autonomous agents to data pipelines, Browserbeam gives your code a browser it can read and control.
Give your AI agent a real browser it can see and control. Structured page data, interactive element refs, and markdown content your model can read.
POST /v1/sessions
{
"url": "https://app.example.com/login",
"viewport": {
"width": 1280,
"height": 720
},
"auto_dismiss_blockers": true
}
{
"session_id": "ses_abc123def456",
"page": {
"url": "https://app.example.com/login",
"title": "Log In",
"stable": true,
"markdown": {
"content": "# Log In\n\nWelcome back..."
},
"interactive_elements": [
{ "ref": "e1", "tag": "input", "label": "Email" },
{ "ref": "e2", "tag": "input", "label": "Password" },
{ "ref": "e3", "tag": "button", "label": "Sign In" }
],
"forms": [
{ "id": "login", "fields": 2 }
]
},
"blockers_dismissed": ["cookie_consent"]
}
POST /v1/sessions
{
"url": "https://store.example.com",
"steps": [{
"extract": {
"products": [{
"_parent": ".product-card",
"name": "h2 >> text",
"price": ".price >> text",
"url": "a >> href"
}]
}
}, {
"close": {}
}]
}
{
"session_id": "ses_def789",
"completed": 2,
"extraction": {
"products": [
{
"name": "Wireless Headphones",
"price": "$79.99",
"url": "/products/wireless-headphones"
},
{
"name": "USB-C Hub",
"price": "$34.99",
"url": "/products/usb-c-hub"
},
{
"name": "Mechanical Keyboard",
"price": "$129.00",
"url": "/products/mech-keyboard"
}
]
},
"page": null,
"error": null
}
POST /v1/sessions/:id/act
{
"steps": [{
"observe": {
"scope": "#checkout-form",
"format": "markdown",
"include_links": true
}
}]
}
{
"completed": 1,
"page": {
"url": ".../checkout",
"stable": true,
"markdown": {
"content": "## Checkout\n\nShipping...",
"length": 284
},
"interactive_elements": [
{ "ref": "e1", "tag": "input", "label": "Address" },
{ "ref": "e2", "tag": "select", "label": "Country" },
{ "ref": "e3", "tag": "button", "label": "Pay Now" }
],
"changes": null
},
"error": null
}
POST /v1/sessions
{
"url": "https://app.example.com/login",
"steps": [
{ "fill_form": {
"fields": {
"Email": "[email protected]",
"Password": "secret123"
},
"submit": true
}},
{ "wait": { "ms": 2000 }},
{ "screenshot": {
"full_page": true
}},
{ "close": {} }
]
}
{
"session_id": "ses_ghi012",
"completed": 4,
"page": {
"url": ".../dashboard",
"title": "Dashboard",
"stable": true,
"changes": {
"url_changed": true,
"title_changed": true
},
"interactive_elements": [
{ "ref": "e1", "label": "Settings" },
{ "ref": "e2", "label": "Logout" }
]
},
"screenshots": [{
"format": "png",
"width": 1280,
"height": 3200,
"data": "iVBORw0KGgo..."
}],
"error": null
}
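When several steps are chained in one call, a client may want to know whether all of them ran. The sketch below assumes that `completed` counts steps in the order they were sent, which matches the samples here (4 steps sent, `"completed": 4`) but is not stated explicitly. The helper names are hypothetical and the email address is a placeholder.

```python
def one_shot_payload(url, steps):
    """Build a session body with steps, appending a close step if it is missing."""
    steps = list(steps)
    if not steps or "close" not in steps[-1]:
        steps.append({"close": {}})
    return {"url": url, "steps": steps}


def first_unfinished_step(payload, response):
    """Return the name of the first step that did not complete, or None."""
    done = response.get("completed", 0)  # assumption: counts steps in order
    steps = payload["steps"]
    if done >= len(steps):
        return None
    return next(iter(steps[done]))


payload = one_shot_payload(
    "https://app.example.com/login",
    [
        {"fill_form": {"fields": {"Email": "user@example.com",
                                  "Password": "secret123"}, "submit": True}},
        {"wait": {"ms": 2000}},
        {"screenshot": {"full_page": True}},
    ],
)

all_done = first_unfinished_step(payload, {"completed": 4, "error": None})
stalled_at = first_unfinished_step(payload, {"completed": 2, "error": None})
```

Appending `close` automatically reflects the point made earlier on this page that closing the session stops the billing clock.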
Pay for the time your sessions are open. Start with a 1-hour free trial. No credit card needed.
For individuals and side projects
For teams and production use
For agencies and high-volume use
Structured page data, stable element refs, and change diffs. One REST API.