Documentation

Oack is an uptime and performance monitoring platform with deep TCP-level telemetry, multi-channel alerting, and AI-assisted troubleshooting via MCP. Run checkers on your own infrastructure, get notified in seconds, and diagnose issues all the way down to the network layer.

Quick Start

Get monitoring in three steps:

  1. Create an account — Sign up at app.oack.io and create your first team.
  2. Add a monitor — Enter the URL you want to watch, pick an HTTP method, set the check interval, and choose a checker region.
  3. Configure alerts — Create alert channels (Slack, Discord, Telegram, PagerDuty, Email, or Webhook) and link them to your monitor. You'll be notified within seconds when something goes wrong.

HTTP Monitoring

Each monitor performs an HTTP/HTTPS request to your endpoint at a configurable interval. Every probe captures:

  • Full timing breakdown — DNS lookup, TCP connect, TLS handshake, send, wait (TTFB), and receive phases.
  • HTTP headers & body — Request and response headers plus truncated body (1 KB) captured on every probe.
  • TCP metrics — Kernel-level TCP_INFO data including RTT, retransmits, congestion window, and segment counters.
  • Packet capture — Optional per-probe pcap of the full HTTP exchange (SYN to FIN) for deep post-mortem analysis.

Monitors support custom HTTP methods (GET, POST, HEAD, etc.), custom headers, request body, and configurable timeouts. Check intervals range from 30 seconds (Business) to 5 minutes (Free).

Health Rules

Health rules define when a monitor is considered up or down. Each monitor has:

  • Success criteria — Expected HTTP status codes (e.g. 200-299) and maximum latency threshold.
  • Failure threshold — Number of consecutive failing probes required before the monitor transitions from UP to DOWN.
  • Recovery threshold — Number of consecutive passing probes required before the monitor transitions from DOWN back to UP.
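
The two thresholds behave like a small state machine. The sketch below is one way to picture it, not Oack's implementation: the names HealthRule, HealthTracker, and applyProbe are hypothetical, and it assumes that any probe agreeing with the current state resets the streak.

```typescript
type HealthState = "UP" | "DOWN";

interface HealthRule {
  failureThreshold: number;  // consecutive failing probes before UP -> DOWN
  recoveryThreshold: number; // consecutive passing probes before DOWN -> UP
}

interface HealthTracker {
  state: HealthState;
  streak: number; // consecutive probes that contradict the current state
}

// Feed one probe result into the tracker and return the next tracker value.
function applyProbe(tracker: HealthTracker, passed: boolean, rule: HealthRule): HealthTracker {
  const contradicts = tracker.state === "UP" ? !passed : passed;
  if (!contradicts) {
    // Assumption: a probe that agrees with the current state resets the streak.
    return { state: tracker.state, streak: 0 };
  }
  const streak = tracker.streak + 1;
  const needed = tracker.state === "UP" ? rule.failureThreshold : rule.recoveryThreshold;
  if (streak >= needed) {
    return { state: tracker.state === "UP" ? "DOWN" : "UP", streak: 0 };
  }
  return { state: tracker.state, streak };
}

// Example: failure threshold 3, recovery threshold 2.
const exampleRule: HealthRule = { failureThreshold: 3, recoveryThreshold: 2 };
const probeResults = [false, false, true, false, false, false, true, true];
let healthTracker: HealthTracker = { state: "UP", streak: 0 };
const stateHistory: HealthState[] = [];
for (const passed of probeResults) {
  healthTracker = applyProbe(healthTracker, passed, exampleRule);
  stateHistory.push(healthTracker.state);
}
console.log(stateHistory.join(" ")); // UP UP UP UP UP DOWN DOWN UP
```

Two failures followed by a success never trip the monitor here, which is the point of a consecutive-probe threshold: a single flaky probe does not page anyone.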

SSL & Domain Expiration

Oack automatically monitors SSL certificate and domain registration expiration for every active HTTP monitor. A daily sweep checks:

  • SSL certificates — TLS handshake reads the leaf certificate's expiration date.
  • Domain registration — RDAP protocol (the ICANN standard replacement for WHOIS) checks registration expiry.

Notifications are sent through your linked alert channels at configurable thresholds (default: 30, 14, 7, and 1 days before expiration). Available on Pro and Business plans.
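
As a rough illustration of the default thresholds, the sketch below works out which threshold a certificate or domain has reached on a given day. The function names are hypothetical, and how Oack deduplicates repeat notifications is not covered here.

```typescript
const DEFAULT_THRESHOLD_DAYS = [30, 14, 7, 1];
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days left until expiry, rounded down; negative once expired.
function daysUntilExpiry(expiresAt: Date, now: Date): number {
  return Math.floor((expiresAt.getTime() - now.getTime()) / MS_PER_DAY);
}

// Tightest threshold that has been reached, or null if none has.
function reachedThreshold(daysLeft: number, thresholds: number[] = DEFAULT_THRESHOLD_DAYS): number | null {
  const reached = thresholds.filter((t) => daysLeft <= t);
  return reached.length > 0 ? Math.min(...reached) : null;
}

const daysLeft = daysUntilExpiry(new Date("2025-03-15T00:00:00Z"), new Date("2025-03-05T00:00:00Z"));
console.log(daysLeft);                   // 10
console.log(reachedThreshold(daysLeft)); // 14
console.log(reachedThreshold(45));       // null
```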

Web Checker — Pageload

Pageload monitors launch a real Chromium browser (via Playwright) to load your page exactly as a visitor would. They are designed for performance monitoring: measure how fast your page loads, track Web Vitals over time, and get alerted when performance degrades. No scripting is required; just enter a URL.

Web Vitals & timing metrics

| Metric | Name | What it tells you |
| --- | --- | --- |
| TTFB | Time to First Byte | How long until the browser receives the first byte from the server. High TTFB points to slow server processing, DNS issues, or network latency. Under 200 ms is good; above 600 ms needs investigation. |
| FCP | First Contentful Paint | When the browser renders the first piece of visible content (text, image, or canvas). This is the moment your page stops being blank. Under 1.8 s is good; above 3 s feels slow to users. |
| LCP | Largest Contentful Paint | When the largest visible element (hero image, heading block, video poster) finishes rendering. This is the best proxy for "the page looks ready." Under 2.5 s is good; above 4 s means users are waiting too long for the main content. |
| CLS | Cumulative Layout Shift | How much the page layout shifts unexpectedly while loading (ads popping in, images resizing, fonts swapping). It's a score, not a time: under 0.1 is good; above 0.25 means things are jumping around and annoying your visitors. |
| DOM Interactive | DOM Interactive | When the HTML document has been fully parsed and the DOM is ready for JavaScript to manipulate. Render-blocking scripts and large HTML payloads push this number up. |
| DOMContentLoaded | DOMContentLoaded Event | When the HTML has been parsed and all deferred scripts have finished executing. A big gap between DOM Interactive and DOMContentLoaded usually means heavy synchronous JavaScript. |
| Load Event | Window Load | When the entire page, including images, stylesheets, iframes, and fonts, has finished loading. This is the "everything done" marker. |
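
The good and slow bounds quoted in the table can be turned into a simple rating function. This is a sketch only: the names are hypothetical, and the middle label "needs improvement" is an assumption, since the table names only the two ends of each range.

```typescript
type VitalName = "ttfb" | "fcp" | "lcp" | "cls";
type VitalRating = "good" | "needs improvement" | "poor";

// Thresholds taken from the table: times in milliseconds, CLS is a unitless score.
const VITAL_THRESHOLDS: Record<VitalName, { good: number; poor: number }> = {
  ttfb: { good: 200, poor: 600 },
  fcp: { good: 1800, poor: 3000 },
  lcp: { good: 2500, poor: 4000 },
  cls: { good: 0.1, poor: 0.25 },
};

function rateVital(metric: VitalName, value: number): VitalRating {
  const limits = VITAL_THRESHOLDS[metric];
  if (value < limits.good) return "good";
  if (value > limits.poor) return "poor";
  return "needs improvement";
}

console.log(rateVital("lcp", 2100)); // good
console.log(rateVital("ttfb", 450)); // needs improvement
console.log(rateVital("cls", 0.31)); // poor
```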

What each probe captures

  • Web Vitals — TTFB, FCP, LCP, and CLS measured from the real browser rendering pipeline.
  • Page timing — DOM Interactive, DOMContentLoaded, and Load Event timestamps.
  • HAR waterfall — Full HTTP Archive of every network request the page made, with timing, size, and status. Download and inspect in any HAR viewer.
  • Screenshots — Optional viewport or full-page screenshot captured after the page loads.
  • Console log — All console messages (errors, warnings, info) emitted during page load, with counts for each severity.
  • Resource summary — Total resource count, error count, and total bytes transferred.

Web Checker — Test Suite

Test Suite monitors run standard Playwright Test files to verify functional user flows such as login, search, checkout, and multi-page navigation. They are designed for scenario testing, not page speed: the platform runs npx playwright test on a schedule and alerts you when tests fail.

Write your tests with test() and expect(), run them locally with npx playwright test, then deploy the same directory to Oack. No custom API, no rewrites — the same tests run everywhere.

Example: PokéStore e2e tests

poke-store.oack.io is a demo Pokémon store with login, search, cart, and checkout flows. The test suite lives alongside the frontend code:

tests/e2e/store.spec.ts
import { test, expect, type Page } from '@playwright/test';

// Shared helper: log in as the demo user and wait for the store page.
async function loginAsAsh(page: Page) {
  await page.goto('/login');
  await page.getByTestId('email-input').fill('[email protected]');
  await page.getByTestId('password-input').fill('pikachu123');
  await page.getByTestId('login-submit').click();
  await page.waitForURL(/\/store/);
}

test.describe('PokéStore', () => {
  test('should log in and see store', async ({ page }) => {
    await loginAsAsh(page);
    await expect(page.getByTestId('user-name')).toHaveText('Ash Ketchum');
  });

  test('should search Pokémon', async ({ page }) => {
    await loginAsAsh(page);
    await page.getByTestId('search-input').fill('pikachu');
    await expect(page.getByTestId('pokemon-name')).toContainText('Pikachu');
  });
});

Run locally

Terminal
cd web
npx playwright test

# 13 passed (24.1s)

Skip repetitive flags with .oackctl.env

Create a .oackctl.env file in your project root to avoid passing --team and --monitor on every command. oackctl auto-loads it from the current directory.

.oackctl.env
OACKCTL_TEAM=a98957b0-a129-4032-a2c4-d18ac8dd2287
OACKCTL_MONITOR=f190f477-48f7-46d7-a533-25ca3b1541e1

Now you can run commands without the flags:

Terminal
oackctl test --dir web
oackctl deploy --dir web

Every CLI flag maps to an OACKCTL_ env var: --team → OACKCTL_TEAM, --monitor → OACKCTL_MONITOR, --pw-grep → OACKCTL_PW_GREP, etc. Add .oackctl.env to your .gitignore if it contains team-specific IDs.
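
The naming rule can be written down in a few lines. The helper below is hypothetical and only reproduces the three mappings listed above (strip the leading dashes, turn hyphens into underscores, uppercase, add the OACKCTL_ prefix). Flags not listed here are assumed to follow the same pattern.

```typescript
// Derive the env var name for a long CLI flag such as "--pw-grep".
function flagToEnvVar(flag: string): string {
  const bare = flag.replace(/^--/, "").replace(/-/g, "_").toUpperCase();
  return "OACKCTL_" + bare;
}

console.log(flagToEnvVar("--team"));    // OACKCTL_TEAM
console.log(flagToEnvVar("--monitor")); // OACKCTL_MONITOR
console.log(flagToEnvVar("--pw-grep")); // OACKCTL_PW_GREP
```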

Test on Oack (one-off run)

Upload the same directory for a one-off test run on Oack's browser infrastructure. The result includes a full Playwright HTML report.

Terminal
oackctl test --team <TEAM> --monitor <MONITOR> --dir web

# Packaging web...
#   Files: 74 (112.7 KB)
#
# Running test...
#
# Result: PASSED
# Report: https://api.oack.io/api/v1/artifacts/.../report/index.html

Deploy for continuous monitoring

Deploy the test suite to a browser monitor. It runs on a schedule (e.g. every 5 minutes), and you are alerted when tests fail.

Terminal
oackctl deploy --team <TEAM> --monitor <MONITOR> --dir web

# Packaging web...
#   Files: 74 (112.7 KB)
#
# Uploading suite...
#   Suite: 112.7 KB
#   Tests: tests/e2e/store.spec.ts
#   Git:   8169f4ce (main)
#
# Deployed.
# Monitor: https://app.oack.io/teams/.../monitors/...

What you get

  • Playwright HTML report — full test breakdown with screenshots, error details, and timing. Opens in your browser after each test run.
  • Pass/fail health status — any test failure = monitor DOWN. Alerts fire through your configured channels (email, Slack, PagerDuty, etc.).
  • Git metadata — each deploy records the commit SHA, branch, and who deployed. Visible in the dashboard.
  • Environment variables — pass credentials and config via --env flags or team-level secrets. Tests access them via process.env.
  • Filters — run a subset of tests with --pw-grep, --pw-project, or --pw-tag.

Multi-monitor config

For complex setups, define all check suites in an oack.config.json file:

oack.config.json
{
  "team": "<TEAM_ID>",
  "dir": "web",
  "checks": [
    {
      "name": "PokéStore Login",
      "pw_grep": "login"
    },
    {
      "name": "PokéStore Chromium Only",
      "pw_project": "chromium"
    },
    {
      "name": "PokéStore Critical Flows",
      "pw_tag": "critical"
    },
    {
      "name": "PokéStore Full Suite"
    }
  ]
}

The name field is the unique key — monitors are matched by name within the team. If a monitor with that name already exists, it's updated. If not, a new browser monitor is created automatically. Removing a check from the config does not delete the monitor — use oackctl monitors delete for that.
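
The matching rule amounts to an upsert keyed on name. The sketch below illustrates that behavior under stated assumptions: planDeploy and its types are hypothetical, and the real CLI talks to the API rather than taking a list of names.

```typescript
interface CheckConfig {
  name: string;
  pw_grep?: string;
  pw_project?: string;
  pw_tag?: string;
}

type DeployAction = "created" | "updated";

// Decide what would happen to each check, matching existing monitors by name only.
function planDeploy(checks: CheckConfig[], existingMonitorNames: string[]): Map<string, DeployAction> {
  const existing = new Set(existingMonitorNames);
  const plan = new Map<string, DeployAction>();
  for (const check of checks) {
    plan.set(check.name, existing.has(check.name) ? "updated" : "created");
  }
  // Monitors that exist but are missing from the config are left alone, never deleted.
  return plan;
}

const deployPlan = planDeploy(
  [{ name: "PokéStore Login", pw_grep: "login" }, { name: "PokéStore Full Suite" }],
  ["PokéStore Full Suite", "Old Checkout Monitor"],
);
console.log(deployPlan.get("PokéStore Login"));      // created
console.log(deployPlan.get("PokéStore Full Suite")); // updated
console.log(deployPlan.has("Old Checkout Monitor")); // false
```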

Filters narrow which tests each monitor runs:

  • pw_grep — match test names (--grep flag in Playwright)
  • pw_project — run a specific Playwright project (e.g. chromium, firefox)
  • pw_tag — filter by @tag annotation in test titles

Terminal
oackctl config-deploy --config oack.config.json

# Deploying 4 check suites...
#   PokéStore Login .............. created (monitor abc12345)
#   PokéStore Chromium Only ...... created (monitor bcd23456)
#   PokéStore Critical Flows ..... created (monitor cde34567)
#   PokéStore Full Suite ......... created (monitor def45678)
# Done.
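
To picture how pw_grep and pw_tag narrow a suite, the sketch below filters a list of test titles. It is an approximation: Playwright's --grep matches a regular expression against test titles, while the whole-word rule used for tags here is an assumption. The function names and sample titles are made up for illustration.

```typescript
// Keep titles matching a regular expression, like Playwright's --grep.
function filterByGrep(titles: string[], pattern: string): string[] {
  const re = new RegExp(pattern);
  return titles.filter((title) => re.test(title));
}

// Keep titles that carry the tag as a whole word (assumed matching rule).
function filterByTag(titles: string[], tag: string): string[] {
  const token = tag.charAt(0) === "@" ? tag : "@" + tag;
  return titles.filter((title) => title.split(/\s+/).some((word) => word === token));
}

const sampleTitles = [
  "login shows the store @critical",
  "search finds Pikachu",
  "checkout completes @critical @slow",
];

console.log(filterByGrep(sampleTitles, "login"));   // first title only
console.log(filterByTag(sampleTitles, "critical")); // first and third titles
```

Note that a grep pattern is matched literally against the title text, so a pattern such as login will not match a test titled "should log in and see store".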

Alert Channels

Alert channels define where notifications are sent when a monitor changes state. Supported channel types:

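| Channel | Free | Pro | Business |
| --- | --- | --- | --- |
| Email | Yes | Yes | Yes |
| Slack | - | Yes | Yes |
| Discord | - | Yes | Yes |
| Telegram | - | Yes | Yes |
| Webhooks | - | Yes | Yes |
| PagerDuty | - | Yes | Yes |
| SMS & Calls | - | - | Coming soon |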

Telegram uses a one-click deep-link flow — no bot tokens or chat IDs to configure manually. Webhook payloads include HMAC signatures for verification.

Alert Behavior

Monitors notify only explicitly linked alert channels. If no channels are linked, the monitor stays silent.

Alert events are dispatched on two transitions:

  • DOWN — fired when the failure threshold is reached. Includes monitor URL, status code, and error details.
  • Recovery — fired when the recovery threshold is reached. Includes downtime duration.

Incident Lifecycle

Incidents track the full lifecycle of an outage or degradation — from detection to resolution. They can be created automatically from monitor failures or manually by any team member.

Every incident moves through a defined set of statuses:

| Status | Meaning |
| --- | --- |
| Draft | Incident created but not yet declared. Allows teams to assess before notifying stakeholders. |
| Investigating | Team is actively looking into the issue. Escalation timers begin if an escalation policy is attached. |
| Identified | Root cause found. Responders are working on a fix. |
| Monitoring | Fix deployed. Team is watching to confirm stability before closing. |
| Resolved | Incident closed. Duration is calculated automatically from declared_at to resolved_at. |
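
The duration rule for resolved incidents is a plain timestamp difference. The sketch below assumes ISO 8601 timestamps and lowercase status strings. Only the declared_at and resolved_at field names come from the table; the rest of the record shape is hypothetical.

```typescript
type IncidentStatus = "draft" | "investigating" | "identified" | "monitoring" | "resolved";

interface IncidentRecord {
  status: IncidentStatus;
  declared_at: string | null; // ISO 8601; null while still a draft
  resolved_at: string | null; // ISO 8601; null until resolved
}

// Duration in milliseconds, or null if the incident is not both declared and resolved.
function incidentDurationMs(incident: IncidentRecord): number | null {
  if (incident.declared_at === null || incident.resolved_at === null) return null;
  return Date.parse(incident.resolved_at) - Date.parse(incident.declared_at);
}

const resolvedIncident: IncidentRecord = {
  status: "resolved",
  declared_at: "2025-01-10T14:02:00Z",
  resolved_at: "2025-01-10T14:49:30Z",
};
console.log(incidentDurationMs(resolvedIncident)); // 2850000 (47 minutes 30 seconds)
```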

Each incident carries linked monitors, severity, tags, and a timeline of updates. Status page subscribers are notified automatically when an incident is published.

On-Call Scheduling

On-call schedules define who gets paged when an incident is triggered. Create rotation schedules so the right engineer is automatically notified — no manual routing required.

Key concepts

  • Rotations — Define a recurring schedule (daily, weekly, or custom) that cycles through team members.
  • Overrides — Temporarily replace the scheduled on-call for vacations, swaps, or out-of-band coverage.
  • Handoffs — Automatic transition between shifts with configurable overlap to ensure no gaps in coverage.
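
A rotation with overrides can be resolved to a single person for any point in time. The sketch below shows one way to do that under simplifying assumptions: fixed-length shifts, no handoff overlap, and overrides that always win over the rotation. All type names, field names, and people are hypothetical.

```typescript
interface OnCallRotation {
  members: string[];
  startsAt: number;      // epoch ms of the first shift
  shiftLengthMs: number; // e.g. one day or one week
}

interface OnCallOverride {
  member: string;
  from: number; // epoch ms, inclusive
  to: number;   // epoch ms, exclusive
}

function whoIsOnCall(rotation: OnCallRotation, overrides: OnCallOverride[], at: number): string {
  // An active override replaces whoever the rotation would have picked.
  const override = overrides.find((o) => at >= o.from && at < o.to);
  if (override !== undefined) return override.member;

  const shiftsElapsed = Math.floor((at - rotation.startsAt) / rotation.shiftLengthMs);
  const n = rotation.members.length;
  // Double modulo keeps the index valid for times before startsAt.
  const member = rotation.members[((shiftsElapsed % n) + n) % n];
  if (member === undefined) throw new Error("rotation has no members");
  return member;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
const weeklyRotation: OnCallRotation = {
  members: ["ana", "ben", "chloe"],
  startsAt: Date.UTC(2025, 0, 6),
  shiftLengthMs: WEEK_MS,
};
const vacationCover: OnCallOverride[] = [
  { member: "chloe", from: Date.UTC(2025, 0, 14), to: Date.UTC(2025, 0, 16) },
];

console.log(whoIsOnCall(weeklyRotation, [], Date.UTC(2025, 0, 15)));            // ben
console.log(whoIsOnCall(weeklyRotation, vacationCover, Date.UTC(2025, 0, 15))); // chloe
```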

Escalation Policies

Escalation policies ensure incidents are never missed. If the primary on-call doesn't acknowledge within a configurable timeout, the incident automatically escalates to the next level.

How it works

  1. Incident triggered — The on-call engineer at level 1 of the escalation policy is notified via their preferred channels.
  2. No acknowledgment — If the no-ack timeout expires (e.g., 5 minutes), the incident escalates to level 2.
  3. Acknowledgment — Acknowledging stops the escalation timer. The responder owns the incident.
  4. Further escalation — Additional levels can notify team leads, managers, or entire channels as a last resort.

Escalation events are recorded in the incident timeline, creating a full audit trail of who was notified, when, and whether they responded.
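
The escalation steps reduce to a small calculation: each expired timeout moves the incident up one level until someone acknowledges it. The sketch below assumes the same no-ack timeout at every level and stops at the last level. The names are hypothetical, and real policies may configure per-level timeouts.

```typescript
interface EscalationPolicy {
  levels: string[];       // who is notified at level 1, 2, 3, ...
  noAckTimeoutMs: number; // assumed to be the same for every level
}

// 1-based level that is active `elapsedMs` after the incident was triggered.
function activeLevel(policy: EscalationPolicy, elapsedMs: number, acknowledgedAfterMs: number | null): number {
  // Acknowledgment freezes escalation at whatever level was active at that moment.
  const effectiveMs = acknowledgedAfterMs === null ? elapsedMs : Math.min(elapsedMs, acknowledgedAfterMs);
  const level = Math.floor(effectiveMs / policy.noAckTimeoutMs) + 1;
  return Math.min(level, policy.levels.length);
}

const MINUTE_MS = 60 * 1000;
const examplePolicy: EscalationPolicy = {
  levels: ["primary on-call", "team lead", "engineering manager"],
  noAckTimeoutMs: 5 * MINUTE_MS,
};

console.log(activeLevel(examplePolicy, 3 * MINUTE_MS, null));           // 1
console.log(activeLevel(examplePolicy, 12 * MINUTE_MS, null));          // 3
console.log(activeLevel(examplePolicy, 12 * MINUTE_MS, 7 * MINUTE_MS)); // 2
```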

War Rooms & Post-Mortems

War rooms provide a shared space for incident responders to coordinate in real time. Post updates, link monitors, tag team members, and track status transitions — all in one timeline.

After an incident is resolved, generate a post-mortem report that includes:

  • Timeline — Chronological record of all escalation events, status changes, and team comments.
  • Impact — Duration, affected monitors, and severity.
  • Root cause — Document what went wrong and why.
  • Action items — Track follow-up tasks to prevent recurrence.

Uptime, MTBF & MTTR

Oack computes three reliability metrics from monitor status change history:

  • Uptime % — percentage of time the monitor was in the UP state within the selected window.
  • MTBF — Mean Time Between Failures. Average duration between consecutive DOWN incidents.
  • MTTR — Mean Time To Recovery. Average duration of DOWN incidents before recovery.

Metrics are available over 7-day, 30-day, 90-day, and 365-day windows.
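
The sketch below shows how the three metrics could be derived from a list of status changes. It is not Oack's implementation and rests on several assumptions: the monitor is UP at the start of the window, changes are sorted and fall inside the window, MTBF is measured from the start of one DOWN period to the start of the next, and an outage still open at the end of the window counts toward downtime but not MTTR.

```typescript
interface StatusChange {
  at: number; // epoch ms
  state: "UP" | "DOWN";
}

interface ReliabilityMetrics {
  uptimePct: number;
  mtbfMs: number | null; // null with fewer than two DOWN incidents
  mttrMs: number | null; // null with no recovered incident
}

function mean(values: number[]): number | null {
  if (values.length === 0) return null;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function computeReliability(changes: StatusChange[], windowStart: number, windowEnd: number): ReliabilityMetrics {
  let downMs = 0;
  let downSince: number | null = null;
  let lastDownStart: number | null = null;
  const gapsBetweenFailures: number[] = [];
  const recoveryDurations: number[] = [];

  for (const change of changes) {
    if (change.state === "DOWN" && downSince === null) {
      if (lastDownStart !== null) gapsBetweenFailures.push(change.at - lastDownStart);
      lastDownStart = change.at;
      downSince = change.at;
    } else if (change.state === "UP" && downSince !== null) {
      recoveryDurations.push(change.at - downSince);
      downMs += change.at - downSince;
      downSince = null;
    }
  }
  // An outage still open at the end of the window only adds to downtime.
  if (downSince !== null) downMs += windowEnd - downSince;

  return {
    uptimePct: 100 * (1 - downMs / (windowEnd - windowStart)),
    mtbfMs: mean(gapsBetweenFailures),
    mttrMs: mean(recoveryDurations),
  };
}

// A 24-hour window with a 30-minute outage at hour 2 and a 10-minute outage at hour 14.
const HOUR_MS = 60 * 60 * 1000;
const dayMetrics = computeReliability(
  [
    { at: 2 * HOUR_MS, state: "DOWN" },
    { at: 2.5 * HOUR_MS, state: "UP" },
    { at: 14 * HOUR_MS, state: "DOWN" },
    { at: 14 * HOUR_MS + 10 * 60 * 1000, state: "UP" },
  ],
  0,
  24 * HOUR_MS,
);
console.log(dayMetrics.uptimePct.toFixed(2)); // 97.22
console.log(dayMetrics.mtbfMs);               // 43200000 (12 hours)
console.log(dayMetrics.mttrMs);               // 1200000 (20 minutes)
```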

Probe Aggregation

For time ranges longer than 12 hours, probes are automatically aggregated into time buckets using SQL-level statistical functions.

Available aggregation functions: avg, median, min, max, p75, p90, p95, p99.

Bucket size scales with range: 5m buckets for 12-24h, up to 12h buckets for 90d+. Each bucket includes all six timing phases, probe count, and error count. Maximum 1,000 buckets per query.
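
For a sense of scale, a 24-hour range at 5-minute buckets is 288 buckets, and a 90-day range at 12-hour buckets is 180, both well under the 1,000-bucket cap. The sketch below mimics the bucketing in memory for a single function (p95 of total latency). Oack does this in SQL, so treat the code as an illustration only: the names are hypothetical, and the nearest-rank percentile used here may differ from the database's interpolation.

```typescript
interface ProbeSample {
  at: number;       // probe timestamp, epoch ms
  total_ms: number; // end-to-end latency
}

interface BucketSummary {
  start: number; // bucket start, epoch ms
  count: number;
  p95: number;
}

// Nearest-rank percentile of an ascending-sorted, non-empty array.
function nearestRank(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function aggregateP95(samples: ProbeSample[], bucketMs: number): BucketSummary[] {
  // Group latencies by the start of the bucket each probe falls into.
  const groups = new Map<number, number[]>();
  for (const sample of samples) {
    const start = Math.floor(sample.at / bucketMs) * bucketMs;
    const values = groups.get(start) ?? [];
    values.push(sample.total_ms);
    groups.set(start, values);
  }
  const summaries: BucketSummary[] = [];
  groups.forEach((values, start) => {
    values.sort((a, b) => a - b);
    summaries.push({ start, count: values.length, p95: nearestRank(values, 95) });
  });
  return summaries.sort((a, b) => a.start - b.start);
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const bucketed = aggregateP95(
  [
    { at: 0, total_ms: 100 },
    { at: 60000, total_ms: 120 },
    { at: 240000, total_ms: 400 },
    { at: 360000, total_ms: 90 },
    { at: 540000, total_ms: 110 },
  ],
  FIVE_MINUTES_MS,
);
console.log(bucketed);
// [ { start: 0, count: 3, p95: 400 }, { start: 300000, count: 2, p95: 110 } ]
```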

TCP Telemetry

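Every probe captures kernel-level TCP_INFO metrics. The values are read from the connection state the kernel already maintains, so collecting them adds no extra packets to the exchange: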

  • RTT (Round-Trip Time) — Smoothed RTT and RTT variance as seen by the kernel.
  • Retransmits — Total retransmitted segments during the connection.
  • Congestion window — TCP congestion window size: how much unacknowledged data the sender may have in flight, a rough indicator of available path capacity.
  • Segment counters — Segments sent and received during the exchange.

Performance Percentiles

When you open a probe's detail view, Oack computes a percent rank for each latency fraction — telling you where this probe sits relative to all successful probes for the same monitor.

| Percentile | Interpretation |
| --- | --- |
| 0 – 50 | Faster than average. This fraction performed better than at least half of all probes. |
| 50 – 75 | Normal range. Slightly above median but within typical variance. |
| 75 – 90 | Above average. This fraction was slower than most probes; worth noting but may not indicate a problem. |
| 90 – 100 | Anomalous. This fraction was slower than 90%+ of probes. Likely indicates a real issue. |
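
A percent rank can be computed by counting how many historical probes were faster. The sketch below uses one common definition (the share of values strictly below the probe's value), which may differ slightly from the formula Oack uses. The function names and sample data are hypothetical.

```typescript
// Percentage of historical values that were strictly faster than `value`.
function percentRank(history: number[], value: number): number {
  if (history.length === 0) return 0;
  const faster = history.filter((v) => v < value).length;
  return (100 * faster) / history.length;
}

// Map a percent rank onto the bands from the table.
function interpretRank(rank: number): string {
  if (rank < 50) return "faster than average";
  if (rank < 75) return "normal range";
  if (rank < 90) return "above average";
  return "anomalous";
}

const waitHistoryMs = [80, 90, 95, 100, 105, 110, 120, 130, 150, 400];
const rankOf140 = percentRank(waitHistoryMs, 140);
console.log(rankOf140, interpretRank(rankOf140));      // 80 above average
console.log(interpretRank(percentRank(waitHistoryMs, 500))); // anomalous
```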

Latency fractions

| Fraction | What it measures |
| --- | --- |
| dns_ms | DNS resolution time |
| connect_ms | TCP connection establishment |
| tls_ms | TLS handshake (null for plain HTTP) |
| send_ms | Time to send the request |
| wait_ms | Time to first byte (TTFB): server processing time |
| receive_ms | Time to download the response body |
| total_ms | End-to-end request duration (sum of all fractions) |
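
Since total_ms is the sum of the six phases, it can be rebuilt from them. The sketch below uses the field names from the table. The interface and function names are hypothetical, and a null tls_ms is treated as zero.

```typescript
interface LatencyFractions {
  dns_ms: number;
  connect_ms: number;
  tls_ms: number | null; // null for plain HTTP
  send_ms: number;
  wait_ms: number;
  receive_ms: number;
}

// Sum the phases, counting a missing TLS handshake as 0 ms.
function totalMs(f: LatencyFractions): number {
  return f.dns_ms + f.connect_ms + (f.tls_ms ?? 0) + f.send_ms + f.wait_ms + f.receive_ms;
}

const httpsProbe: LatencyFractions = { dns_ms: 12, connect_ms: 18, tls_ms: 35, send_ms: 1, wait_ms: 140, receive_ms: 9 };
const plainHttpProbe: LatencyFractions = { ...httpsProbe, tls_ms: null };

console.log(totalMs(httpsProbe));     // 215
console.log(totalMs(plainHttpProbe)); // 180
```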

Time windows

Each fraction is ranked across four time windows:

  • 1 day — Detects recent anomalies.
  • 7 days — Weekly baseline.
  • 30 days — Monthly baseline.
  • 90 days — Long-term baseline.

CDN Enrichment (Cloudflare)

When your target sits behind Cloudflare, Oack streams edge logs directly into probe details using Cloudflare Instant Logs. Each probe is enriched with CDN-level context automatically.

Requirement: The Cloudflare zone must be on a Business plan or higher. Instant Logs is not available on Free or Pro zones.

What each probe captures

  • Edge PoP — The Cloudflare data center that served the request.
  • Cache status — HIT, MISS, DYNAMIC, EXPIRED, or other cf-cache-status values.
  • Edge timing — Edge TTFB and origin response time from the Cloudflare perspective.
  • CDN GEO — Geographic location of the edge node that handled the request.

Setup

  1. Go to Account Settings → Integrations.
  2. Add a Cloudflare Zone integration with your zone ID and an API token that has the Logs:Read permission.
  3. Enrichment starts automatically for any monitor whose target matches the configured zone.

Probe Sharing

Share probe data with anyone using permalink share links.

  • Time range — Pick exact start and end timestamps.
  • Expiration — 1 hour, 24 hours, 7 days, 30 days, or 1 year.
  • Access mode — Public or authenticated links.
  • View count — Every share link tracks views.

Redaction Groups

When creating a share link you can hide sensitive data. Redaction is applied server-side.

| Redaction Group | Fields hidden |
| --- | --- |
| Monitor name | Replaced with a generic label |
| Checker IP | Checker public IP address |
| Source ASN | Source AS number and network name |
| HTTP bodies & auth | Request/response bodies and authorization headers |

Network Checker

A network checker is an agent that runs on your infrastructure, connects to Oack, and performs HTTP health checks against your monitors.

  • Shared — available to all accounts. Oack runs shared checkers in multiple regions.
  • Dedicated — private to your account. You deploy and manage the checker binary.

Checker Installation

The checker binary supports Linux (amd64, arm64), macOS (Intel & Apple Silicon), and FreeBSD (amd64, arm64).

| Method | Linux | FreeBSD | macOS |
| --- | --- | --- | --- |
| Binary | amd64 / arm64 | amd64 / arm64 | Intel / Apple Silicon |
| Package (deb/rpm) | amd64 / arm64 | - | - |
| Docker | amd64 / arm64 | - | via Docker Desktop |
| Homebrew | Yes | - | Yes |

Homebrew (macOS / Linux)

Terminal
brew tap oack-io/tap
brew install network-tester

Shell script

Terminal
curl -sSfL "https://raw.githubusercontent.com/oack-io/network-tester/refs/heads/main/install-network-tester.sh" | bash

Docker

Terminal
docker pull oackio/network-tester:latest

mkdir -p "$HOME/.net-checker-data"
docker run --rm \
    --cap-add NET_RAW \
    -v "$HOME/.net-checker-data:/data" \
    oackio/network-tester:latest \
    --token-db /data/tokens.db --mode shared

MCP (AI-Assisted Troubleshooting)

Oack exposes a Model Context Protocol server that lets AI agents read your monitoring data. All MCP tools are read-only.

Claude Code config
{
  "mcpServers": {
    "oack": {
      "type": "http",
      "url": "https://api.oack.io/mcp/"
    }
  }
}

To allow Claude to use all Oack MCP tools without permission prompts:

Terminal
/permissions add mcp__oack__* "allow all Oack MCP tools"

CLI (oackctl)

oackctl is the official command-line interface for the Oack platform.

Install via Homebrew

Terminal
brew tap oack-io/tap
brew install oackctl

Install via shell script

Terminal
curl -sSfL "https://raw.githubusercontent.com/oack-io/oackctl/refs/heads/main/install-oackctl.sh" | bash

Quick start

Terminal
# Authenticate (opens browser for device flow)
oackctl login

# List your teams
oackctl teams list

# List monitors in a team
oackctl monitors list --team <team-id>

# Create a monitor
oackctl monitors create --team <team-id> \
  --name "Production API" \
  --url "https://api.example.com/health" \
  --interval 60

# View probe results
oackctl probes list --team <team-id> --monitor <monitor-id> --limit 10

REST API

All platform functionality is available through the REST API at https://api.oack.io/api/v1/. Browse the full Swagger documentation.

The OpenAPI spec is available at https://api.oack.io/openapi.json. Import it into Postman, Insomnia, or any OpenAPI-compatible tool:

Postman: Import → Link → https://api.oack.io/openapi.json

Terraform Provider

Manage your Oack monitoring infrastructure as code with the official Terraform provider. Create teams, monitors, alert channels, status pages, and PagerDuty integrations — all in version-controlled HCL.

Installation

main.tf
terraform {
  required_providers {
    oack = {
      source  = "oack-io/oack"
      version = "~> 0.1"
    }
  }
}

provider "oack" {
  api_key    = var.oack_api_key    # or OACK_API_KEY env var
  account_id = var.oack_account_id # or OACK_ACCOUNT_ID env var
}

Available resources

| Resource | Description |
| --- | --- |
| oack_team | Teams that own monitors, channels, and API keys |
| oack_monitor | HTTP/HTTPS monitors with SSL/domain expiry, latency thresholds, checker preferences |
| oack_alert_channel | Slack, Email, Webhook, Telegram, Discord, PagerDuty channels |
| oack_monitor_alert_channel_link | Route alerts from monitors to channels |
| oack_status_page | Public or password-protected status pages with custom branding |
| oack_status_page_component | Components and groups on status pages |
| oack_status_page_watchdog | Auto-create/resolve incidents when monitors change health |
| oack_pagerduty_integration | Two-way PagerDuty incident sync |
| oack_external_link | Quick links to Grafana, Datadog, or other dashboards |
| oack_team_api_key | Team-scoped API keys for CI/CD and deploy events |

Example: full-stack setup

main.tf
resource "oack_team" "production" {
  name = "Production"
}

resource "oack_monitor" "api" {
  team_id               = oack_team.production.id
  name                  = "API Health"
  url                   = "https://api.example.com/health"
  check_interval_ms     = 30000
  ssl_expiry_enabled    = true
  domain_expiry_enabled = true
}

resource "oack_alert_channel" "slack" {
  team_id = oack_team.production.id
  name    = "Engineering Slack"
  type    = "slack"
  config  = jsonencode({ webhook_url = var.slack_webhook })
}

resource "oack_monitor_alert_channel_link" "api_slack" {
  team_id    = oack_team.production.id
  monitor_id = oack_monitor.api.id
  channel_id = oack_alert_channel.slack.id
}

See the full GitHub repository for progressive examples and resource documentation.

Account Roles

Every user in an account has one of five roles:

| Role | Description |
| --- | --- |
| Owner | Full control. Manage subscription, transfer ownership, delete account. |
| Admin | Create/manage teams, monitors, alert channels. Invite/remove members. |
| Billing Admin | View/manage subscription and billing. Read-only access to teams. |
| Member | Create teams and monitors, manage alert channels, invite team members. |
| Guest | Read-only access. Default role for newly invited users. |

Team Roles

| Role | Description |
| --- | --- |
| Owner | Full control. Delete team, transfer ownership. |
| Admin | Create/manage monitors and alert channels. Manage members. |
| Member | View monitors/probes/metrics. Create share links. Cannot modify monitors. |

Permissions Summary

| Action | Min Account Role | Min Team Role |
| --- | --- | --- |
| View monitors & probes | Guest | Member |
| Create share links | Member | Member |
| Create/update/delete monitors | Member | Admin |
| Manage alert channels | Member | Admin |
| Invite account members | Admin | - |
| Manage subscription | Owner / Billing | - |
| Delete account | Owner | - |

Plan Comparison

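| Feature | Free | Pro ($29/mo) | Business ($249/mo) |
| --- | --- | --- | --- |
| Monitors | 10 | 100 | 500 |
| Check interval | 5 min | 60 sec | 30 sec |
| Teams | 1 | 5 | 50 |
| Members | 3 | 20 | Unlimited |
| Dedicated checkers | - | 5 | Unlimited |
| Probe retention | 7 days | 90 days | 365 days |
| SSL & domain monitoring | - | Yes | Yes |
| Alert channels | Email | All standard | All + SMS (soon) |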

See Pricing for full details.