Browser Actions

Browser actions are the building blocks for web automation.

Overview

Available actions:

  • Navigation
  • Click interactions
  • Text input
  • Content extraction
  • Screenshots
  • Scrolling
  • Waiting

Action Types

Navigate

Navigate to a URL:

json
{
  "action": "navigate",
  "url": "https://example.com"
}

Click

Click an element:

json
{
  "action": "click",
  "selector": "#submit-button"
}

With options (here, a double-click with the left button):

json
{
  "action": "click",
  "selector": ".menu-item",
  "button": "left",
  "click_count": 2,
  "delay": 100
}
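
When click payloads are generated from code, it helps to include only the options that were actually set. The following is a minimal Python sketch under that assumption; `click_action` is a hypothetical helper, not part of the product, and the unit of `delay` is not stated on this page.

```python
from typing import Any, Dict, Optional


def click_action(
    selector: str,
    button: Optional[str] = None,
    click_count: Optional[int] = None,
    delay: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a click payload, omitting any option left unset (hypothetical helper)."""
    if click_count is not None and click_count < 1:
        raise ValueError("click_count must be at least 1")
    payload: Dict[str, Any] = {"action": "click", "selector": selector}
    options = {"button": button, "click_count": click_count, "delay": delay}
    # Only forward options the caller provided.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

Calling `click_action(".menu-item", button="left", click_count=2, delay=100)` reproduces the payload shown above.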

Type

Enter text:

json
{
  "action": "type",
  "selector": "#email-input",
  "text": "user@example.com"
}

With options:

json
{
  "action": "type",
  "selector": "#search",
  "text": "search query",
  "delay": 50,
  "clear": true
}

Press

Press keyboard keys:

json
{
  "action": "press",
  "key": "Enter"
}

Common keys:

  • Enter, Tab, Escape
  • ArrowUp, ArrowDown, ArrowLeft, ArrowRight
  • Backspace, Delete
  • Modifiers: Shift+Enter, Control+a
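
The modifier examples in the list above join key names with `+`. A small Python sketch of building such a `press` payload follows; `press_action` is a hypothetical helper, and the separator is inferred only from the `Shift+Enter` and `Control+a` examples.

```python
from typing import Any, Dict


def press_action(*keys: str) -> Dict[str, Any]:
    """Build a press payload; several keys become one '+'-joined combination."""
    if not keys or any(not key for key in keys):
        raise ValueError("at least one non-empty key name is required")
    return {"action": "press", "key": "+".join(keys)}
```

For example, `press_action("Control", "a")` yields a payload with `"key": "Control+a"`.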

Extract

Extract content from the page:

json
{
  "action": "extract",
  "selector": ".product-price",
  "attribute": "text"
}

Extract an attribute value:

json
{
  "action": "extract",
  "selector": "img.hero",
  "attribute": "src"
}

Multiple elements:

json
{
  "action": "extract",
  "selector": ".item-title",
  "multiple": true
}

Screenshot

Capture a screenshot:

json
{
  "action": "screenshot"
}

With options:

json
{
  "action": "screenshot",
  "selector": "#main-content",
  "full_page": false,
  "format": "png"
}

Scroll

Scroll the page:

json
{
  "action": "scroll",
  "direction": "down",
  "amount": 500
}

Scroll to element:

json
{
  "action": "scroll",
  "selector": "#footer",
  "behavior": "smooth"
}

Wait

Wait for conditions:

json
{
  "action": "wait",
  "selector": ".loading",
  "state": "hidden"
}

Wait types:

json
// Wait for element
{
  "action": "wait",
  "selector": "#content",
  "state": "visible",
  "timeout": 10000
}

// Wait for time
{
  "action": "wait",
  "time": 2000
}

// Wait for navigation
{
  "action": "wait",
  "type": "navigation"
}
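
The three wait forms above are told apart by which fields are present. The Python sketch below classifies a wait payload on that basis; `classify_wait` is a hypothetical helper, and the precedence it applies when several fields are present is an assumption.

```python
from typing import Any, Dict


def classify_wait(action: Dict[str, Any]) -> str:
    """Return which documented wait form a payload uses: element, time, or navigation."""
    if action.get("action") != "wait":
        raise ValueError("not a wait action")
    if "selector" in action:
        return "element"
    if "time" in action:
        return "time"
    if action.get("type") == "navigation":
        return "navigation"
    raise ValueError("wait action matches none of the documented forms")
```

A check like this can catch malformed wait steps before they are sent.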

Executing Actions

Via API

bash
curl -X POST https://your-domain.com/api/agents/{id}/browser/{sessionId}/execute \
  -H "Content-Type: application/json" \
  -d '{
    "action": "click",
    "selector": "#login-button"
  }'

Response:

json
{
  "success": true,
  "result": null,
  "duration_ms": 150
}
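
The same call can be made from Python. The sketch below uses only the standard library and mirrors the curl example; `build_execute_request` and `parse_execute_response` are hypothetical helper names, and treating `"success": false` as an exception is an assumption about how callers may want to handle failures.

```python
import json
import urllib.request
from typing import Any, Dict


def build_execute_request(
    base_url: str, agent_id: str, session_id: str, action: Dict[str, Any]
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the execute endpoint shown above."""
    url = f"{base_url}/api/agents/{agent_id}/browser/{session_id}/execute"
    body = json.dumps(action).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_execute_response(raw: bytes) -> Dict[str, Any]:
    """Decode the JSON response and raise if the action did not succeed."""
    data = json.loads(raw)
    if not data.get("success"):
        raise RuntimeError(f"action failed: {data}")
    return data
```

To send the request, pass it to `urllib.request.urlopen` and hand the response body to `parse_execute_response`.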

Via Workflow

json
{
  "nodes": [
    {
      "id": "nav-1",
      "type": "navigate",
      "data": { "url": "https://example.com" }
    },
    {
      "id": "click-1",
      "type": "click",
      "data": { "selector": "#login" }
    },
    {
      "id": "type-1",
      "type": "type",
      "data": {
        "selector": "#email",
        "text": "{{credentials.email}}"
      }
    }
  ]
}
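
In the workflow above, each node's `type` appears to match an action name and its `data` appears to hold that action's parameters. Assuming that mapping holds (this page does not confirm it), a workflow can be flattened into an action list with a sketch like this; `nodes_to_actions` is a hypothetical helper.

```python
from typing import Any, Dict, List


def nodes_to_actions(workflow: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten workflow nodes into action payloads (assumes type == action name)."""
    actions = []
    for node in workflow["nodes"]:
        action = {"action": node["type"]}
        # Assumption: "data" carries the same fields the action payload expects.
        action.update(node.get("data", {}))
        actions.append(action)
    return actions
```

Node `id` values are dropped here because the action payloads on this page have no matching field.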

Selectors

CSS Selectors

json
// By ID
{ "selector": "#my-element" }

// By class
{ "selector": ".my-class" }

// By attribute
{ "selector": "[data-testid='submit']" }

// By tag
{ "selector": "button" }

// Combined
{ "selector": "form#login input[type='email']" }

XPath Selectors

json
{
  "selector": "//button[contains(text(), 'Submit')]",
  "selector_type": "xpath"
}

Text Selectors

json
{
  "selector": "text=Click here",
  "selector_type": "text"
}
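
The CSS examples omit `selector_type`, while the XPath and text examples set it explicitly. The Python sketch below picks the fields from the selector's prefix; `selector_fields` is a hypothetical helper, and treating CSS as the default when `selector_type` is absent is an assumption drawn from those examples.

```python
from typing import Dict


def selector_fields(selector: str) -> Dict[str, str]:
    """Return selector fields, adding selector_type for XPath and text selectors."""
    if selector.startswith("//") or selector.startswith("(//"):
        return {"selector": selector, "selector_type": "xpath"}
    if selector.startswith("text="):
        return {"selector": selector, "selector_type": "text"}
    # Assumption: CSS selectors need no selector_type field.
    return {"selector": selector}
```

The result can be merged into any action payload, for example `{"action": "click", **selector_fields("text=Click here")}`.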

Action Chaining

Execute multiple actions:

bash
curl -X POST .../browser/{sessionId}/execute \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      { "action": "navigate", "url": "https://example.com" },
      { "action": "wait", "selector": "#content" },
      { "action": "click", "selector": "#menu" },
      { "action": "screenshot" }
    ]
  }'
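
When the chain is assembled in code, a quick check that every step names an action can catch mistakes before the request is sent. This is a minimal Python sketch; `chain_body` is a hypothetical helper, and the response format for chained requests is not described on this page.

```python
import json
from typing import Any, Dict, List


def chain_body(actions: List[Dict[str, Any]]) -> str:
    """Serialize a list of actions into the {"actions": [...]} request body."""
    for index, step in enumerate(actions):
        if "action" not in step:
            raise ValueError(f"step {index} has no 'action' field")
    return json.dumps({"actions": actions})
```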

Error Handling

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| ElementNotFound | Selector doesn't match | Verify selector, wait for element |
| Timeout | Action took too long | Increase timeout |
| NavigationFailed | Page failed to load | Check URL, retry |
| ClickIntercepted | Element blocked | Wait, scroll into view |
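
The errors listed above can be turned into a lookup for log messages or automated hints. In this Python sketch the names and solutions are copied from the table; `suggest_fix` is a hypothetical helper, and how the error name is reported in a response is not specified on this page.

```python
SOLUTIONS = {
    "ElementNotFound": "Verify selector, wait for element",
    "Timeout": "Increase timeout",
    "NavigationFailed": "Check URL, retry",
    "ClickIntercepted": "Wait, scroll into view",
}


def suggest_fix(error_name: str) -> str:
    """Return the documented solution for an error name, or a generic fallback."""
    return SOLUTIONS.get(error_name, "Unrecognized error; check the Browser API reference")
```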

Retry Logic

json
{
  "action": "click",
  "selector": "#button",
  "retry": {
    "attempts": 3,
    "delay": 1000
  }
}
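
To make the intended behavior of `retry` concrete, here is a client-side Python sketch of the same idea. It assumes `attempts` counts total tries and `delay` is in milliseconds, neither of which is stated on this page; `run_with_retry` and the `execute` callable are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def run_with_retry(
    execute: Callable[[Dict[str, Any]], Any],
    action: Dict[str, Any],
    attempts: int = 3,
    delay_ms: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call execute(action) up to `attempts` times, pausing between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return execute(action)
        except Exception:
            # Re-raise on the final attempt; otherwise wait and try again.
            if attempt == attempts:
                raise
            sleep(delay_ms / 1000)
```

The `sleep` parameter is injectable so the pause can be skipped or recorded in tests.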

Advanced Actions

Hover

Hover over an element:

json
{
  "action": "hover",
  "selector": ".dropdown-trigger"
}

Select

Select a dropdown option:

json
{
  "action": "select",
  "selector": "#country",
  "value": "US"
}

Upload

Upload a file:

json
{
  "action": "upload",
  "selector": "input[type='file']",
  "file_path": "/path/to/file.pdf"
}

Evaluate

Execute JavaScript:

json
{
  "action": "evaluate",
  "script": "document.querySelector('#count').textContent"
}
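
When the script is built from a variable selector, quoting the selector by hand is error-prone. The Python sketch below uses `json.dumps` to produce a valid JavaScript string literal; `evaluate_text_action` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def evaluate_text_action(selector: str) -> Dict[str, Any]:
    """Build an evaluate payload that reads an element's textContent."""
    # json.dumps escapes quotes and backslashes, giving a safe JS string literal.
    script = f"document.querySelector({json.dumps(selector)}).textContent"
    return {"action": "evaluate", "script": script}
```

This keeps selectors that contain quotes, such as `[data-testid='submit']`, from breaking the script.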

Best Practices

1. Use Stable Selectors

Prefer:

  • IDs and data attributes
  • Semantic selectors

Avoid:

  • Dynamic classes
  • Position-based selectors
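
The guidance above can be partly automated with a rough lint pass. The Python sketch below is a heuristic only: the patterns for "dynamic" classes and positional pseudo-classes are assumptions and will miss or misflag some selectors; `selector_warnings` is a hypothetical helper.

```python
import re
from typing import List

# Positional pseudo-classes such as :nth-child(3) or :first-child.
_POSITIONAL = re.compile(r":nth-(?:child|of-type)\(|:first-child|:last-child")
# Class names that contain a digit followed by 4+ more characters, e.g. .css-1x2y3z.
_HASHED_CLASS = re.compile(r"\.[A-Za-z_-]*[0-9][A-Za-z0-9_-]{4,}")


def selector_warnings(selector: str) -> List[str]:
    """Return heuristic warnings for selectors that are likely to be brittle."""
    warnings = []
    if _POSITIONAL.search(selector):
        warnings.append("position-based")
    if _HASHED_CLASS.search(selector):
        warnings.append("dynamic-looking class")
    return warnings
```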

2. Wait for Elements

Always wait before interacting:

json
[
  { "action": "wait", "selector": "#form" },
  { "action": "type", "selector": "#email", "text": "..." }
]
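
The wait-then-act pattern can be applied mechanically to an existing action list. This Python sketch inserts a wait on the same selector before every selector-based step, which differs slightly from the example above where the wait targets the enclosing form; `with_waits` is a hypothetical helper.

```python
from typing import Any, Dict, List


def with_waits(actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Insert a wait on each step's selector before the step itself."""
    result = []
    for step in actions:
        selector = step.get("selector")
        if selector and step.get("action") != "wait":
            wait = {"action": "wait", "selector": selector}
            # Carry over the selector type so XPath and text selectors still resolve.
            if "selector_type" in step:
                wait["selector_type"] = step["selector_type"]
            result.append(wait)
        result.append(step)
    return result
```

Steps without a selector, such as `navigate` or `screenshot`, pass through unchanged.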

3. Handle Dynamic Content

Wait for content to load:

json
{
  "action": "wait",
  "selector": ".loading",
  "state": "hidden"
}

4. Set Appropriate Timeouts

Adjust for slow pages:

json
{
  "action": "wait",
  "selector": "#data-table",
  "timeout": 30000
}

5. Validate Results

Check action results:

json
{
  "action": "extract",
  "selector": ".success-message",
  "validate": true
}

API Reference

See Browser API for complete endpoint documentation.

Released under the MIT License.