Scriptable Browsers: From Selenium to Playwright-Powered Monitoring

Web browsers were built for humans. For a long time that was fine — a person clicked links, filled forms, and eyeballed the results. But as web applications grew more complex (AJAX, SPAs, client-side rendering), the industry needed a way to drive browsers programmatically. First for testing. Then for scraping. And now, for monitoring.

This post traces the evolution of scriptable browsers and explains why the same technology that powers your E2E test suite is now the best tool for knowing whether your site actually works for real users.

A brief history

Selenium (2004)

Selenium appeared when Google Chrome didn’t exist yet and Firefox was the browser to automate. Browsers offered no API for external control, so Selenium worked around the gap by injecting a JavaScript library into the page, loading custom extensions, and exposing an HTTP interface to accept commands.

Over time, this approach was standardized as the JSON Wire Protocol. Support for other engines followed: ChromeDriver, OperaDriver, GeckoDriver. Then the W3C WebDriver standard formalized the idea, and browsers began shipping native support. Selenium 4 (2018, stable in 2021) adopted WebDriver natively.

Today Selenium is at v4.41 with 12 releases shipped in 2025 alone. It’s far from dead — but its architecture carries 20 years of history.

Puppeteer (2017)

Google’s Chrome DevTools team introduced Puppeteer in May 2017, built on a different protocol: CDP (Chrome DevTools Protocol). Instead of the HTTP-based WebDriver, CDP uses WebSocket for real-time bidirectional communication with the browser.

This was a significant upgrade. CDP gives you access to everything the DevTools panel sees: network waterfall, console messages, performance traces, coverage data. For scraping and testing, it meant you could intercept network requests, inject JavaScript, and capture HAR files without an external MITM proxy.
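To make the protocol difference concrete, here is a minimal sketch of the two message shapes. A W3C WebDriver navigation is one HTTP POST to `/session/{session id}/url`, while a CDP command is a JSON frame with `id`, `method`, and `params` sent over an already-open WebSocket. The helper names and the session ID are made up for illustration, and nothing here talks to a real browser.

```typescript
// Sketch only: builds the messages, does not send them.

interface WebDriverRequest {
  method: "POST";
  path: string;
  body: string;
}

interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// W3C WebDriver "Navigate To": one HTTP request per command.
function webDriverNavigate(sessionId: string, url: string): WebDriverRequest {
  return {
    method: "POST",
    path: `/session/${sessionId}/url`,
    body: JSON.stringify({ url }),
  };
}

// CDP: each command is a JSON frame on a long-lived WebSocket.
// The client picks `id` and the browser echoes it in the reply. The browser
// can also push events (frames without an id) over the same socket.
let nextId = 0;
function cdpCommand(method: string, params: Record<string, unknown> = {}): CdpCommand {
  nextId += 1;
  return { id: nextId, method, params };
}

const enableNetwork = cdpCommand("Network.enable");
const navigate = cdpCommand("Page.navigate", { url: "https://example.com" });
console.log(JSON.stringify(enableNetwork));
console.log(JSON.stringify(navigate));
```

The `Network.enable` call is what turns on the stream of network events that makes a proxy unnecessary: once it is sent, the browser reports each request and response on the same socket.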

The trade-off: Puppeteer was JavaScript-only and Chromium-first. Firefox support came later (via patched binaries), and the API was async-only.

Currently at v24.40, Puppeteer remains widely used — especially in data extraction and AI agent tooling, where its lean CDP usage is 15-20% faster than alternatives for certain workloads.

Playwright (2019)

Playwright was created by the same engineers who built Puppeteer, after they moved from Google to Microsoft. It launched in November 2019 and addressed Puppeteer’s limitations head-on:

Multi-browser from day one. Chromium, Firefox, and WebKit are all supported: Chromium is driven over CDP, Firefox and WebKit over Playwright’s own protocol variants.
Multi-language. JavaScript/TypeScript first, but also Python, Java, and C#.
Built-in test runner. npx playwright test ships with parallel execution, auto-retries, HTML reports, and trace recording.
Sync and async APIs. In Python, you choose. Selenium only offered sync; Puppeteer only offered async.
Auto-waiting. Elements are automatically waited for before interaction — no more sleep(2) or manual wait loops.
Built-in network introspection. Intercept, mock, or record all network traffic without external tools.

The numbers speak for themselves: 78,600+ GitHub stars, 13.5 million weekly npm downloads, and 235% year-over-year growth. In 2026, Playwright surpassed Selenium as the top automation testing tool in industry surveys. Retention sits at 94%, so once teams adopt it, they tend to stay.

The comparison matrix

	Selenium	Puppeteer	Playwright
Protocol	WebDriver (HTTP)	CDP (WebSocket)	CDP / custom (WebSocket)
Browsers	Chrome, Firefox, Safari, Edge, IE	Chrome, Firefox	Chromium, Firefox*, WebKit*
Languages	Java, Python, C#, JS, Ruby	JS/TS only	JS/TS, Python, Java, C#
Network introspection	Requires MITM proxy	Built-in	Built-in
Auto-wait	No	No	Yes
Test runner	External (TestNG, pytest)	External	Built-in (`@playwright/test`)
Binary management	Built-in Selenium Manager (since 4.6)	Built-in `npx puppeteer browsers install`	Built-in `npx playwright install`
Headless	Via browser flags + Xvfb	Native	Native

* Playwright’s Firefox and WebKit support uses patched browser builds.

If you’re starting fresh in 2026 — no legacy codebase, no deep investment in another framework — Playwright is the clear default for its simplicity, speed, multi-browser support, and built-in tooling.

What headless actually means

Headless mode runs the browser without a GUI. No window renders on screen, but the full engine runs: DOM parsing, JavaScript execution, layout computation, network requests. The advantages are practical:

Faster. Some rendering steps (compositing, painting) can be skipped.
Less resource-hungry. No GPU process, no display server dependency.
Runs anywhere. Linux containers, CI runners, server VMs — no Xvfb needed.

As of Chrome 132 (January 2025), --headless defaults to the “new” headless mode — a full browser instance without a window, not the old stripped-down shell. The previous --headless=old mode was removed from the main binary and lives on as a separate chrome-headless-shell binary for lightweight use cases.

What’s happening now: BiDi and AI agents

Two shifts are reshaping the browser automation landscape:

WebDriver BiDi is a new W3C standard that brings bidirectional communication to WebDriver, combining CDP’s real-time capabilities with WebDriver’s cross-browser standardization. Firefox deprecated CDP and turned it off by default in Firefox 129, making BiDi its automation path going forward. Chrome still defaults to CDP, with BiDi as an opt-in. Playwright tracks BiDi progress but hasn’t adopted it yet due to missing features.

AI agent browsers are a new category. Tools like Browser Use, Steel Browser, and Hyperbrowser give AI agents the ability to navigate the web autonomously. Notably, Browser Use moved from Playwright to raw CDP for performance. And Lightpanda — a headless browser written in Zig from scratch — claims 11x faster execution and 9x less memory than headless Chrome, with a CDP-compatible API so existing scripts work as drop-in replacements.

From testing to monitoring

Here’s the connection that matters for operations teams: if you can drive a browser to test a user flow, you can drive a browser to monitor that flow continuously.

A traditional HTTP monitor sends a GET request, checks the status code, and measures response time. This tells you the endpoint is alive. But a 200 OK can hide:

Broken JavaScript that prevents the page from rendering
Failed API calls that leave the UI in an error state
Missing assets (CSS, images, fonts) that break the layout
Degraded Core Web Vitals that make the page unusable

Browser monitoring loads your page in a real Chromium instance and captures what a real user would experience. It is the same Playwright engine used for E2E testing, now running on a schedule from multiple global locations.

Oack Browser Monitoring: Pageload mode

Oack’s Pageload monitor opens your URL in headless Chromium (via Playwright) and captures everything:

Web Vitals — the metrics Google uses to measure user experience:

Metric	What it measures	Good threshold
TTFB	Server response time	< 200 ms
FCP	First visible content	< 1.8 s
LCP	Main content ready	< 2.5 s
CLS	Layout stability	< 0.1
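As a rough illustration of how these thresholds can be applied, the sketch below rates one measurement against the "good" limits from the table. The type and function names are hypothetical, and the two-level rating is a simplification for this example rather than Oack’s actual scoring.

```typescript
type Rating = "good" | "needs attention";

// "Good" limits copied from the table above: TTFB, FCP, LCP in ms, CLS unitless.
const GOOD_LIMITS = { ttfb: 200, fcp: 1800, lcp: 2500, cls: 0.1 } as const;

type VitalName = keyof typeof GOOD_LIMITS;
type Vitals = Record<VitalName, number>;

// Rates each metric independently: strictly below the limit counts as good.
function rateVitals(sample: Vitals): Record<VitalName, Rating> {
  const result = {} as Record<VitalName, Rating>;
  for (const name of Object.keys(GOOD_LIMITS) as VitalName[]) {
    result[name] = sample[name] < GOOD_LIMITS[name] ? "good" : "needs attention";
  }
  return result;
}

// Example: fast server and first paint, but the main content arrives late.
const ratings = rateVitals({ ttfb: 120, fcp: 1500, lcp: 3100, cls: 0.02 });
console.log(ratings);
```

Rating each metric separately matters because a page can pass three of the four and still feel slow, as in the example where only LCP misses its limit.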

Page timing — DOM Interactive, DOMContentLoaded, and window Load events, tracked as time series so you can spot regressions.

HAR waterfall — the full HTTP Archive of every network request the page makes. Every resource URL, status code, size, and timing bar. Filter by type (JS, CSS, images, XHR). Resources that returned 4xx/5xx are highlighted.
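For readers who have not worked with HAR files directly, the following sketch shows how failed resources can be picked out of one. The interfaces cover only the handful of HAR 1.2 fields used here (`log.entries`, `request.url`, `response.status`, `time`), and the sample data and function name are invented for the example.

```typescript
// Minimal subset of the HAR 1.2 structure needed for this example.
interface HarEntry {
  request: { method: string; url: string };
  response: { status: number };
  time: number; // total elapsed time of the request in ms
}

interface Har {
  log: { entries: HarEntry[] };
}

// Returns every request whose response status is 4xx or 5xx.
function failedResources(har: Har): HarEntry[] {
  return har.log.entries.filter((entry) => entry.response.status >= 400);
}

const sampleHar: Har = {
  log: {
    entries: [
      { request: { method: "GET", url: "https://example.com/" }, response: { status: 200 }, time: 180 },
      { request: { method: "GET", url: "https://example.com/app.js" }, response: { status: 404 }, time: 35 },
      { request: { method: "GET", url: "https://example.com/api/cart" }, response: { status: 503 }, time: 910 },
    ],
  },
};

const failed = failedResources(sampleHar);
for (const entry of failed) {
  console.log(`${entry.response.status} ${entry.request.url} (${entry.time} ms)`);
}
```

Note that the main document in this sample returns 200, which is exactly the case an HTTP-only probe would report as healthy.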

Screenshots — viewport or full-page snapshots captured on every check, not just failures. Visual proof of what the page looked like at that moment.

Console logs — JavaScript errors and warnings from the DevTools console. Set a threshold: if console.error() fires more than N times, the check fails.

Health evaluation is configurable: main document status code + page load timeout + console error threshold + resource error threshold. Any condition breach triggers your alert channels — Slack, Telegram, PagerDuty, email, webhooks.
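One way to picture this evaluation is as a pure function over the captured numbers. The sketch below is an assumption about the shape of such a check, not Oack’s implementation: every field and function name is hypothetical, and it only encodes the four conditions listed above.

```typescript
interface HealthPolicy {
  expectedStatus: number; // main document status code, for example 200
  loadTimeoutMs: number;
  maxConsoleErrors: number;
  maxResourceErrors: number;
}

interface PageloadSample {
  documentStatus: number;
  loadTimeMs: number;
  consoleErrors: number;
  resourceErrors: number;
}

// Returns the list of breached conditions. An empty list means healthy.
function evaluateHealth(sample: PageloadSample, policy: HealthPolicy): string[] {
  const breaches: string[] = [];
  if (sample.documentStatus !== policy.expectedStatus) {
    breaches.push(`document status ${sample.documentStatus}`);
  }
  if (sample.loadTimeMs > policy.loadTimeoutMs) breaches.push("page load timeout");
  if (sample.consoleErrors > policy.maxConsoleErrors) breaches.push("console error threshold");
  if (sample.resourceErrors > policy.maxResourceErrors) breaches.push("resource error threshold");
  return breaches;
}

const policy: HealthPolicy = {
  expectedStatus: 200,
  loadTimeoutMs: 10_000,
  maxConsoleErrors: 0,
  maxResourceErrors: 2,
};

// A 200 OK page that still fails the check because of a console error.
const breaches = evaluateHealth(
  { documentStatus: 200, loadTimeMs: 2300, consoleErrors: 1, resourceErrors: 0 },
  policy,
);
console.log(breaches.length === 0 ? "UP" : `DOWN: ${breaches.join(", ")}`);
```

Returning the breached conditions rather than a bare boolean keeps the alert message specific about which threshold tripped.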

Each check runs in a fresh browser context — no cookies, no cache — simulating a first-time visitor. The default interval is 5 minutes (minimum: 60 seconds for higher tiers). This is heavier than an HTTP probe, but the signal quality is categorically different.

Oack Browser Monitoring: Test Suite mode

Pageload answers “does my page load correctly?” Test Suite answers “can a user actually do things?”

The key design decision: zero rewrites. You write standard Playwright tests with test() and expect(). The same .spec.ts file runs locally with npx playwright test and on Oack as a scheduled monitor. No custom API, no proprietary format, no vendor lock-in.

Here’s a real example — an E2E test for a login flow:

import { test, expect } from '@playwright/test';

// BASE_URL, LOGIN_EMAIL, and LOGIN_PASSWORD are read from environment variables.
test('user can log in and see the store', async ({ page }) => {
  await page.goto(process.env.BASE_URL + '/login');
  // Fill in the credentials and submit the form.
  await page.getByTestId('email-input').fill(process.env.LOGIN_EMAIL!);
  await page.getByTestId('password-input').fill(process.env.LOGIN_PASSWORD!);
  await page.getByTestId('login-submit').click();
  // A successful login lands on the store page.
  await expect(page).toHaveURL(/\/store/);
});

That’s it. Standard Playwright. Credentials come from environment variables — managed securely via Oack’s team-level secrets (AES-256-GCM encrypted at rest, write-only after creation, never exposed in API responses).
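For context on what "AES-256-GCM encrypted at rest" involves, here is a minimal sketch using Node’s built-in `crypto` module. It is not Oack’s code: the `seal` and `unseal` names, the base64 storage format, and the in-memory key are assumptions made for the example, and a real system would load the key from a key management service.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// What would be stored at rest: nonce, auth tag, and ciphertext, all base64.
interface SealedSecret {
  iv: string;
  tag: string;
  data: string;
}

function seal(key: Buffer, plaintext: string): SealedSecret {
  const iv = randomBytes(12); // 96-bit nonce, must be unique per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Throws if the key is wrong or the stored record was modified.
function unseal(key: Buffer, sealed: SealedSecret): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(sealed.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}

const demoKey = randomBytes(32); // 256-bit key, generated here only for the demo
const stored = seal(demoKey, "correct horse battery staple");
console.log(Object.keys(stored).join(", "));
```

GCM is an authenticated mode, so a tampered record fails to decrypt instead of silently yielding garbage. That property pairs naturally with the write-only behavior described above: decryption only needs to happen at the moment a check runs.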

Deploy with two commands

Test it against the platform:

oackctl test --team $TEAM --monitor $MONITOR --dir ./tests

This uploads your project, runs it on a remote browser-checker inside Docker, and returns the Playwright HTML report — complete with screenshots, error details, and timing. The report auto-opens in your browser.

Once you’re happy, deploy for continuous monitoring:

oackctl deploy --team $TEAM --monitor $MONITOR --dir ./tests

The platform runs npm install once, caches your node_modules, and executes npx playwright test on every scheduled check. Any test failure = monitor goes DOWN, alerts fire.

Filtering

You don’t have to run your entire test suite on every check. Filter by test name, project, or tag:

# Only login tests, every 5 minutes
oackctl deploy --team $T --monitor $M --dir ./tests --pw-grep "login"

# Only tests tagged @critical
oackctl deploy --team $T --monitor $M --dir ./tests --pw-tag "critical"

# Only the Chromium project
oackctl deploy --team $T --monitor $M --dir ./tests --pw-project "chromium"

What the probe captures

Every test run produces:

Playwright HTML report — the same report you see locally, with full step details, screenshots, and traces
Pass/fail counts — 5 passed, 1 failed, 0 skipped
Total duration — how long the suite took
Git metadata — commit SHA, branch, who deployed

The report is served via signed URLs (HMAC + 1-hour expiry) and retained for 3-7 days depending on your plan.
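The signed-URL idea can be sketched in a few lines. The source only states "HMAC + 1-hour expiry", so the rest is assumed for illustration: HMAC-SHA256, the `expires` and `sig` query parameter names, the signed string layout, and all function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const ONE_HOUR_S = 3600;

// Signs the path together with its expiry so neither can be changed alone.
function sign(secret: string, path: string, expires: number): string {
  return createHmac("sha256", secret).update(`${path}\n${expires}`).digest("hex");
}

// nowS is the current Unix time in seconds, passed in to keep this testable.
function signReportUrl(secret: string, path: string, nowS: number): string {
  const expires = nowS + ONE_HOUR_S;
  return `${path}?expires=${expires}&sig=${sign(secret, path, expires)}`;
}

function verifyReportUrl(secret: string, url: string, nowS: number): boolean {
  const q = url.indexOf("?");
  const path = q === -1 ? url : url.slice(0, q);
  const params = new URLSearchParams(q === -1 ? "" : url.slice(q + 1));
  const expires = Number(params.get("expires"));
  if (!Number.isFinite(expires) || nowS > expires) return false; // missing or expired
  const expected = Buffer.from(sign(secret, path, expires), "hex");
  const given = Buffer.from(params.get("sig") ?? "", "hex");
  // Constant-time comparison; lengths must match before calling it.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const signedUrl = signReportUrl("demo-secret", "/reports/run-1/index.html", 1_700_000_000);
console.log(verifyReportUrl("demo-secret", signedUrl, 1_700_000_000 + 60));
```

Because the expiry is part of the signed string, a client cannot extend a link’s lifetime by editing the `expires` parameter; the signature would no longer match.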

The infrastructure under the hood

Each browser check runs in an ephemeral Docker container — created, executed, and destroyed per probe. This is important for two reasons:

Isolation. A frozen Chromium tab (infinite JS loop, OOM) can’t block other monitors. The container is killed after the timeout + 30-second grace period.
Security. User test suites run in containers with --read-only filesystem, --memory=512m, --cpus=1.5, --pids-limit=256. No access to host credentials or other monitors’ data.
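To show how those limits translate into an actual invocation, here is a sketch that assembles a `docker run` argument list and the kill deadline. The resource flags and the 30-second grace period come from the text above. The `--rm` flag, the `/artifacts` mount point, the image name, and the function names are assumptions for the example, and the real checker is a Go binary rather than TypeScript.

```typescript
interface SandboxOptions {
  image: string;        // container image with Node.js and browsers preinstalled
  artifactsDir: string; // host temp directory that receives the report
  command: string[];    // what to run inside the container
}

const GRACE_PERIOD_MS = 30_000;

// Builds the argument list for `docker run` with the limits described above.
function dockerRunArgs(opts: SandboxOptions): string[] {
  return [
    "run",
    "--rm",               // assumed: remove the container once it exits
    "--read-only",        // root filesystem cannot be modified
    "--memory=512m",
    "--cpus=1.5",
    "--pids-limit=256",   // caps process count, which limits fork bombs
    "--volume", `${opts.artifactsDir}:/artifacts`,
    opts.image,
    ...opts.command,
  ];
}

// The container is force-killed once the check timeout plus grace period passes.
function killDeadlineMs(checkTimeoutMs: number): number {
  return checkTimeoutMs + GRACE_PERIOD_MS;
}

const args = dockerRunArgs({
  image: "example/playwright-runner:latest", // hypothetical image name
  artifactsDir: "/tmp/probe-1234",
  command: ["npx", "playwright", "test"],
});
console.log(["docker", ...args].join(" "));
console.log(killDeadlineMs(60_000));
```

The only writable path in this sketch is the bind-mounted artifacts directory, which is how a report can leave the container without storage credentials ever entering it.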

The browser-checker binary (Go) manages the lifecycle: downloads the test suite, extracts cached dependencies, spawns the Docker container, reads artifacts from a bind-mounted temp directory, uploads the Playwright report to storage, and sends the probe result via WebSocket. Credentials for artifact storage and the API never enter the container.

Cold start is ~2 seconds (container creation + Node.js init + Chromium launch) — acceptable for checks running every 60-300 seconds.

When to use which

Scenario	HTTP Monitor	Browser Pageload	Browser Test Suite
Is the endpoint alive?	Yes	Overkill	Overkill
Does the page render?	No	Yes	Overkill
Are Web Vitals healthy?	No	Yes	No (use Pageload)
Can users log in?	No	No	Yes
Does checkout work?	No	No	Yes
CI/CD integration?	N/A	N/A	Yes (`oackctl test` in pipeline)
Resource cost	Minimal	Medium	Higher

Start with HTTP monitors for availability. Add Pageload monitors for your most important pages — homepage, landing pages, pricing. Add Test Suite monitors for critical user flows that, if broken, directly cost revenue.

From scraping to testing to monitoring

Scriptable browsers started as a scraping and testing tool. Twenty years later, the same technology — now mature, fast, and well-supported — is the foundation for answering the question that matters most in production: does my site work for a real user, right now?

Playwright won the framework war. The infrastructure to run it reliably at scale (ephemeral containers, artifact storage, scheduled execution) is the hard part. That’s what we built.