Documentation
Oack is an uptime and performance monitoring platform with deep TCP-level telemetry, multi-channel alerting, and AI-assisted troubleshooting via MCP. Run checkers on your own infrastructure, get notified in seconds, and diagnose issues all the way down to the network layer.
Quick Start
Create your account, set up a team, and add your first monitor in minutes.
Network Checker
Deploy a dedicated checker on your infrastructure for private monitoring.
Quick Start
Get monitoring in three steps:
- 1. Create an account — Sign up at app.oack.io and create your first team.
- 2. Add a monitor — Enter the URL you want to watch, pick an HTTP method, set the check interval, and choose a checker region.
- 3. Configure alerts — Create alert channels (Slack, Discord, Telegram, PagerDuty, Email, or Webhook) and link them to your monitor. You'll be notified within seconds when something goes wrong.
HTTP Monitoring
Each monitor performs an HTTP/HTTPS request to your endpoint at a configurable interval. Every probe captures:
- Full timing breakdown — DNS lookup, TCP connect, TLS handshake, send, wait (TTFB), and receive phases.
- HTTP headers & body — Request and response headers plus truncated body (1 KB) captured on every probe.
- TCP metrics — Kernel-level TCP_INFO data including RTT, retransmits, congestion window, and segment counters.
- Packet capture — Optional per-probe pcap of the full HTTP exchange (SYN to FIN) for deep post-mortem analysis.
Monitors support custom HTTP methods (GET, POST, HEAD, etc.), custom headers, request body, and configurable timeouts. Check intervals range from 30 seconds (Business) to 5 minutes (Free).
Health Rules
Health rules define when a monitor is considered up or down. Each monitor has:
- Success criteria — Expected HTTP status codes (e.g. 200-299) and maximum latency threshold.
- Failure threshold — Number of consecutive failing probes required before the monitor transitions from UP to DOWN.
- Recovery threshold — Number of consecutive passing probes required before the monitor transitions from DOWN back to UP.
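Together, the two thresholds form a small state machine. The sketch below is illustrative only — the class name and threshold defaults are ours, not Oack's implementation:

```python
class MonitorHealth:
    """Tracks UP/DOWN state from consecutive probe results.

    Illustrative sketch of the failure/recovery threshold logic
    described above -- not Oack's actual implementation.
    """

    def __init__(self, failure_threshold=3, recovery_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.state = "UP"
        self.streak = 0  # consecutive probes contradicting current state

    def record_probe(self, success: bool) -> str:
        contradicts = (self.state == "UP" and not success) or \
                      (self.state == "DOWN" and success)
        self.streak = self.streak + 1 if contradicts else 0
        if self.state == "UP" and self.streak >= self.failure_threshold:
            self.state, self.streak = "DOWN", 0
        elif self.state == "DOWN" and self.streak >= self.recovery_threshold:
            self.state, self.streak = "UP", 0
        return self.state
```

With a failure threshold of 3, two failed probes in a row leave the monitor UP; the third flips it to DOWN, and a single success during an outage resets the recovery count only if it is followed by enough consecutive passes.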
SSL & Domain Expiration
Oack automatically monitors SSL certificate and domain registration expiration for every active HTTP monitor. A daily sweep checks:
- SSL certificates — TLS handshake reads the leaf certificate's expiration date.
- Domain registration — RDAP protocol (the ICANN standard replacement for WHOIS) checks registration expiry.
Notifications are sent through your linked alert channels at configurable thresholds (default: 30, 14, 7, and 1 day before expiration). Available on Pro and Business plans.
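Conceptually, the SSL side of the sweep reads the leaf certificate's notAfter date over a TLS handshake and compares the remaining days against the alert thresholds. A minimal Python sketch of that logic — the helper names are ours, and Oack's actual sweep may differ:

```python
import socket
import ssl
from datetime import datetime, timezone

DEFAULT_THRESHOLDS = (30, 14, 7, 1)  # days before expiry, as above

def days_until_expiry(not_after: str, now: datetime = None) -> int:
    """Days left given a certificate's notAfter string,
    e.g. 'Jun 26 21:41:46 2030 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def fetch_not_after(host: str, port: int = 443) -> str:
    """Handshake with the host and return the leaf cert's notAfter."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

def crossed_threshold(days_left: int, thresholds=DEFAULT_THRESHOLDS):
    """Return the tightest alert threshold the certificate has crossed,
    or None if expiry is still far out."""
    for t in sorted(thresholds):
        if days_left <= t:
            return t
    return None
```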
Web Checker — Pageload
Pageload monitors launch a real Chromium browser (Playwright) to load your page exactly like a visitor would. Designed for performance monitoring — measure how fast your page loads, track Web Vitals over time, and get alerted when performance degrades. No scripting required — just enter a URL.
Web Vitals & timing metrics
| Metric | Name | What it tells you |
|---|---|---|
| TTFB | Time to First Byte | How long until the browser receives the first byte from the server. High TTFB points to slow server processing, DNS issues, or network latency. Under 200 ms is good; above 600 ms needs investigation. |
| FCP | First Contentful Paint | When the browser renders the first piece of visible content (text, image, or canvas). This is the moment your page stops being blank. Under 1.8 s is good; above 3 s feels slow to users. |
| LCP | Largest Contentful Paint | When the largest visible element (hero image, heading block, video poster) finishes rendering. This is the best proxy for "the page looks ready." Under 2.5 s is good; above 4 s means users are waiting too long for the main content. |
| CLS | Cumulative Layout Shift | How much the page layout shifts unexpectedly while loading (ads popping in, images resizing, fonts swapping). It's a score, not a time — under 0.1 is good; above 0.25 means things are jumping around and annoying your visitors. |
| DOM Interactive | DOM Interactive | When the HTML document has been fully parsed and the DOM is ready for JavaScript to manipulate. Render-blocking scripts and large HTML payloads push this number up. |
| DOMContentLoaded | DOMContentLoaded Event | When the HTML and all deferred scripts have finished executing. A big gap between DOM Interactive and DOMContentLoaded usually means heavy synchronous JavaScript. |
| Load Event | Window Load | When the entire page — including images, stylesheets, iframes, and fonts — has finished loading. This is the "everything done" marker. |
What each probe captures
- Web Vitals — TTFB, FCP, LCP, and CLS measured from the real browser rendering pipeline.
- Page timing — DOM Interactive, DOMContentLoaded, and Load Event timestamps.
- HAR waterfall — Full HTTP Archive of every network request the page made, with timing, size, and status. Download and inspect in any HAR viewer.
- Screenshots — Optional viewport or full-page screenshot captured after the page loads.
- Console log — All console messages (errors, warnings, info) emitted during page load, with counts for each severity.
- Resource summary — Total resource count, error count, and total bytes transferred.
Web Checker — Test Suite
Test Suite monitors run standard Playwright Test files to verify functional user flows — login, search, checkout, multi-page navigation.
Designed for scenario testing, not page speed. The platform runs npx playwright test on schedule and alerts you when tests fail.
Write your tests with test() and expect(), run them locally with npx playwright test, then deploy the same directory to Oack. No custom API, no rewrites — the same tests run everywhere.
Example: PokéStore e2e tests
poke-store.oack.io is a demo Pokémon store with login, search, cart, and checkout flows. Source. The test suite lives alongside the frontend code:
import { test, expect } from '@playwright/test';
async function loginAsAsh(page) {
await page.goto('/login');
await page.getByTestId('email-input').fill('[email protected]');
await page.getByTestId('password-input').fill('pikachu123');
await page.getByTestId('login-submit').click();
await page.waitForURL(/\/store/);
}
test.describe('PokéStore', () => {
test('should log in and see store', async ({ page }) => {
await loginAsAsh(page);
await expect(page.getByTestId('user-name')).toHaveText('Ash Ketchum');
});
test('should search Pokémon', async ({ page }) => {
await loginAsAsh(page);
await page.getByTestId('search-input').fill('pikachu');
await expect(page.getByTestId('pokemon-name')).toContainText('Pikachu');
});
});
Run locally
cd web
npx playwright test
# 13 passed (24.1s)
Skip repetitive flags with .oackctl.env
Create a .oackctl.env file in your project root to avoid passing --team and --monitor on every command.
oackctl auto-loads it from the current directory.
OACKCTL_TEAM=a98957b0-a129-4032-a2c4-d18ac8dd2287
OACKCTL_MONITOR=f190f477-48f7-46d7-a533-25ca3b1541e1
Now you can run commands without the flags:
oackctl test --dir web
oackctl deploy --dir web
Every CLI flag maps to an OACKCTL_ env var:
--team → OACKCTL_TEAM,
--monitor → OACKCTL_MONITOR,
--pw-grep → OACKCTL_PW_GREP, etc.
Add .oackctl.env to your .gitignore if it contains team-specific IDs.
Test on Oack (one-off run)
Upload the same directory for a one-off test run on Oack's browser infrastructure. The result includes a full Playwright HTML report.
oackctl test --team <TEAM> --monitor <MONITOR> --dir web
# Packaging web...
# Files: 74 (112.7 KB)
#
# Running test...
#
# Result: PASSED
# Report: https://api.oack.io/api/v1/artifacts/.../report/index.html
Deploy for continuous monitoring
Deploy the test suite to a browser monitor. It runs on schedule (e.g. every 5 minutes) and you get alerted when tests fail.
oackctl deploy --team <TEAM> --monitor <MONITOR> --dir web
# Packaging web...
# Files: 74 (112.7 KB)
#
# Uploading suite...
# Suite: 112.7 KB
# Tests: tests/e2e/store.spec.ts
# Git: 8169f4ce (main)
#
# Deployed.
# Monitor: https://app.oack.io/teams/.../monitors/...
What you get
- Playwright HTML report — full test breakdown with screenshots, error details, and timing. Opens in your browser after each test run.
- Pass/fail health status — any test failure = monitor DOWN. Alerts fire through your configured channels (email, Slack, PagerDuty, etc.).
- Git metadata — each deploy records the commit SHA, branch, and who deployed. Visible in the dashboard.
- Environment variables — pass credentials and config via --env flags or team-level secrets. Tests access them via process.env.
- Filters — run a subset of tests with --pw-grep, --pw-project, or --pw-tag.
Multi-monitor config
For complex setups, define all check suites in an oack.config.json file:
{
"team": "<TEAM_ID>",
"dir": "web",
"checks": [
{
"name": "PokéStore Login",
"pw_grep": "login"
},
{
"name": "PokéStore Chromium Only",
"pw_project": "chromium"
},
{
"name": "PokéStore Critical Flows",
"pw_tag": "critical"
},
{
"name": "PokéStore Full Suite"
}
]
}
The name field is the unique key — monitors are matched by name within the team.
If a monitor with that name already exists, it's updated. If not, a new browser monitor is created automatically.
Removing a check from the config does not delete the monitor — use oackctl monitors delete for that.
Filters narrow which tests each monitor runs:
- pw_grep — match test names (--grep flag in Playwright)
- pw_project — run a specific Playwright project (e.g. chromium, firefox)
- pw_tag — filter by @tag annotation in test titles
oackctl config-deploy --config oack.config.json
# Deploying 4 check suites...
# PokéStore Login .............. created (monitor abc12345)
# PokéStore Chromium Only ...... created (monitor bcd23456)
# PokéStore Critical Flows ..... created (monitor cde34567)
# PokéStore Full Suite ......... created (monitor def45678)
# Done.
Alert Channels
Alert channels define where notifications are sent when a monitor changes state. Supported channel types:
| Channel | Free | Pro | Business |
|---|---|---|---|
| Email | Yes | Yes | Yes |
| Slack | - | Yes | Yes |
| Discord | - | Yes | Yes |
| Telegram | - | Yes | Yes |
| Webhooks | - | Yes | Yes |
| PagerDuty | - | Yes | Yes |
| SMS & Calls | - | - | Coming soon |
Telegram uses a one-click deep-link flow — no bot tokens or chat IDs to configure manually. Webhook payloads include HMAC signatures for verification.
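If you consume webhook alerts, signature verification on your side typically looks like the sketch below. The digest algorithm (SHA-256 assumed here), signature encoding, and how the secret is delivered are assumptions — confirm them in your webhook channel's settings:

```python
import hashlib
import hmac

def verify_webhook(payload: bytes, signature_hex: str, secret: bytes) -> bool:
    """Recompute the HMAC over the raw request body and compare in
    constant time.

    Assumes HMAC-SHA256 with a hex-encoded signature -- a common
    webhook convention, not confirmed Oack behavior.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Always compare with a constant-time function (hmac.compare_digest) rather than ==, and compute the HMAC over the raw bytes of the body before any JSON parsing.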
Alert Behavior
Monitors notify only explicitly linked alert channels. If no channels are linked, the monitor stays silent.
Alert events are dispatched on two transitions:
- DOWN — fired when the failure threshold is reached. Includes monitor URL, status code, and error details.
- Recovery — fired when the recovery threshold is reached. Includes downtime duration.
Incident Lifecycle
Incidents track the full lifecycle of an outage or degradation — from detection to resolution. They can be created automatically from monitor failures or manually by any team member.
Every incident moves through a defined set of statuses:
| Status | Meaning |
|---|---|
| Draft | Incident created but not yet declared. Allows teams to assess before notifying stakeholders. |
| Investigating | Team is actively looking into the issue. Escalation timers begin if an escalation policy is attached. |
| Identified | Root cause found. Responders are working on a fix. |
| Monitoring | Fix deployed. Team is watching to confirm stability before closing. |
| Resolved | Incident closed. Duration is calculated automatically from declared_at to resolved_at. |
Each incident carries linked monitors, severity, tags, and a timeline of updates. Status page subscribers are notified automatically when an incident is published.
On-Call Scheduling
On-call schedules define who gets paged when an incident is triggered. Create rotation schedules so the right engineer is automatically notified — no manual routing required.
Key concepts
- Rotations — Define a recurring schedule (daily, weekly, or custom) that cycles through team members.
- Overrides — Temporarily replace the scheduled on-call for vacations, swaps, or out-of-band coverage.
- Handoffs — Automatic transition between shifts with configurable overlap to ensure no gaps in coverage.
Escalation Policies
Escalation policies ensure incidents are never missed. If the primary on-call doesn't acknowledge within a configurable timeout, the incident automatically escalates to the next level.
How it works
- 1. Incident triggered — The on-call engineer at level 1 of the escalation policy is notified via their preferred channels.
- 2. No acknowledgment — If the no-ack timeout expires (e.g., 5 minutes), the incident escalates to level 2.
- 3. Acknowledgment — Acknowledging stops the escalation timer. The responder owns the incident.
- 4. Further escalation — Additional levels can notify team leads, managers, or entire channels as a last resort.
Escalation events are recorded in the incident timeline, creating a full audit trail of who was notified, when, and whether they responded.
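The escalation progression above reduces to dividing unacknowledged elapsed time by the no-ack timeout. A toy sketch — function and parameter names are ours, not Oack's scheduler:

```python
def escalation_level(elapsed_s: float, no_ack_timeout_s: float,
                     max_level: int, acknowledged: bool) -> int:
    """Which escalation level should currently be notified.

    Level 1 is paged immediately; each expired no-ack timeout bumps
    the level, capped at max_level. Acknowledgment stops escalation
    (returned here as 0). Illustrative only.
    """
    if acknowledged:
        return 0  # responder owns the incident; timer stopped
    level = 1 + int(elapsed_s // no_ack_timeout_s)
    return min(level, max_level)
```

For example, with a 5-minute (300 s) timeout and three levels, an incident unacknowledged for 10 minutes has already paged levels 1, 2, and 3.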
War Rooms & Post-Mortems
War rooms provide a shared space for incident responders to coordinate in real time. Post updates, link monitors, tag team members, and track status transitions — all in one timeline.
After an incident is resolved, generate a post-mortem report that includes:
- Timeline — Chronological record of all escalation events, status changes, and team comments.
- Impact — Duration, affected monitors, and severity.
- Root cause — Document what went wrong and why.
- Action items — Track follow-up tasks to prevent recurrence.
Uptime, MTBF & MTTR
Oack computes three reliability metrics from monitor status change history:
- Uptime % — percentage of time the monitor was in the UP state within the selected window.
- MTBF — Mean Time Between Failures. Average duration between consecutive DOWN incidents.
- MTTR — Mean Time To Recovery. Average duration of DOWN incidents before recovery.
Metrics are available over 7-day, 30-day, 90-day, and 365-day windows.
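All three metrics fall out of the status-change history. A simplified sketch of the computation — the data shape and function name are ours, not Oack's schema:

```python
def reliability_metrics(changes, window_start, window_end):
    """Compute (uptime %, MTBF, MTTR) from status changes.

    `changes` is a sorted list of (timestamp, state) tuples, state
    being "UP" or "DOWN"; the first entry gives the state at
    window_start. Illustrative sketch, not Oack's implementation.
    """
    down_periods = []
    state, since = changes[0][1], window_start
    for ts, new_state in changes[1:]:
        if state == "DOWN":
            down_periods.append(ts - since)
        state, since = new_state, ts
    if state == "DOWN":  # still down at the end of the window
        down_periods.append(window_end - since)

    total = window_end - window_start
    downtime = sum(down_periods)
    uptime_pct = 100.0 * (total - downtime) / total
    # MTTR: average length of a DOWN period
    mttr = downtime / len(down_periods) if down_periods else None
    # MTBF: average gap between consecutive DOWN transitions
    down_ts = [ts for ts, s in changes if s == "DOWN"]
    mtbf = ((down_ts[-1] - down_ts[0]) / (len(down_ts) - 1)
            if len(down_ts) > 1 else None)
    return uptime_pct, mtbf, mttr
```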
Probe Aggregation
For time ranges longer than 12 hours, probes are automatically aggregated into time buckets using SQL-level statistical functions.
Available aggregation functions: avg, median, min, max, p75, p90, p95, p99.
Bucket size scales with range: 5m buckets for 12-24h, up to 12h buckets for 90d+. Each bucket includes all six timing phases, probe count, and error count. Maximum 1,000 buckets per query.
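A rough sketch of how bucketing and percentile aggregation work. Nearest-rank percentile is used here for simplicity — the actual SQL-level functions may interpolate, and the field names are ours:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list.

    Simplified stand-in for the SQL-level p75/p90/p95/p99 functions.
    """
    ranked = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

def bucketize(probes, bucket_ms):
    """Group (timestamp_ms, total_ms) probes into fixed-width time
    buckets and aggregate each bucket -- a sketch of the rollup."""
    buckets = {}
    for ts, total in probes:
        buckets.setdefault(ts // bucket_ms * bucket_ms, []).append(total)
    return {
        start: {"count": len(v), "avg": sum(v) / len(v),
                "p95": percentile(v, 95)}
        for start, v in sorted(buckets.items())
    }
```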
TCP Telemetry
Every probe captures kernel-level TCP_INFO metrics with zero overhead:
- RTT (Round-Trip Time) — Smoothed RTT and RTT variance as seen by the kernel.
- Retransmits — Total retransmitted segments during the connection.
- Congestion window — TCP congestion window size, indicating bandwidth capacity.
- Segment counters — Segments sent and received during the exchange.
Performance Percentiles
When you open a probe's detail view, Oack computes a percent rank for each latency fraction — telling you where this probe sits relative to all successful probes for the same monitor.
| Percentile | Interpretation |
|---|---|
| 0 – 50 | Faster than average. This fraction performed better than at least half of all probes. |
| 50 – 75 | Normal range. Slightly above median but within typical variance. |
| 75 – 90 | Above average. This fraction was slower than most probes — worth noting but may not indicate a problem. |
| 90 – 100 | Anomalous. This fraction was slower than 90%+ of probes. Likely indicates a real issue. |
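Percent rank itself is simple: the share of historical probes that were strictly faster than this one. A sketch of the idea (names ours, not Oack's SQL):

```python
def percent_rank(value_ms, history_ms):
    """Percentage of historical probe values strictly lower than
    value_ms. A probe ranking in the 90-100 band (per the table
    above) was slower than 90%+ of its peers. Illustrative only.
    """
    if not history_ms:
        return 0.0
    faster = sum(1 for v in history_ms if v < value_ms)
    return 100.0 * faster / len(history_ms)
```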
Latency fractions
| Fraction | What it measures |
|---|---|
| dns_ms | DNS resolution time |
| connect_ms | TCP connection establishment |
| tls_ms | TLS handshake (null for plain HTTP) |
| send_ms | Time to send the request |
| wait_ms | Time to first byte (TTFB) — server processing time |
| receive_ms | Time to download the response body |
| total_ms | End-to-end request duration (sum of all fractions) |
Time windows
Each fraction is ranked across four time windows:
- 1 day — Detects recent anomalies.
- 7 days — Weekly baseline.
- 30 days — Monthly baseline.
- 90 days — Long-term baseline.
CDN Enrichment (Cloudflare)
When your target sits behind Cloudflare, Oack streams edge logs directly into probe details using Cloudflare Instant Logs. Each probe is enriched with CDN-level context automatically.
What each probe captures
- Edge PoP — The Cloudflare data center that served the request.
- Cache status — HIT, MISS, DYNAMIC, EXPIRED, or other cf-cache-status values.
- Edge timing — Edge TTFB and origin response time from the Cloudflare perspective.
- CDN GEO — Geographic location of the edge node that handled the request.
Setup
- Go to Account Settings → Integrations.
- Add a Cloudflare Zone integration with your zone ID and an API token that has the Logs:Read permission.
- Enrichment starts automatically for any monitor whose target matches the configured zone.
Probe Sharing
Share probe data with anyone using permalink share links.
- Time range — Pick exact start and end timestamps.
- Expiration — 1 hour, 24 hours, 7 days, 30 days, or 1 year.
- Access mode — Public or authenticated links.
- View count — Every share link tracks views.
Redaction Groups
When creating a share link you can hide sensitive data. Redaction is applied server-side.
| Redaction Group | Fields hidden |
|---|---|
| Monitor name | Replaced with a generic label |
| Checker IP | Checker public IP address |
| Source ASN | Source AS number and network name |
| HTTP bodies & auth | Request/response bodies and authorization headers |
Network Checker
A network checker is an agent that runs on your infrastructure, connects to Oack, and performs HTTP health checks against your monitors.
- Shared — available to all accounts. Oack runs shared checkers in multiple regions.
- Dedicated — private to your account. You deploy and manage the checker binary.
Checker Installation
The checker binary supports Linux (amd64, arm64), macOS (Intel & Apple Silicon), and FreeBSD (amd64, arm64).
| | Linux | FreeBSD | macOS |
|---|---|---|---|
| Binary | ✓ amd64 / arm64 | ✓ amd64 / arm64 | ✓ Intel / Apple Silicon |
| Package (deb/rpm) | ✓ amd64 / arm64 | - | - |
| Docker | ✓ amd64 / arm64 | - | ✓ via Docker Desktop |
| Homebrew | ✓ | - | ✓ |
Homebrew (macOS / Linux)
brew tap oack-io/tap
brew install network-tester
Shell script
curl -sSfL "https://raw.githubusercontent.com/oack-io/network-tester/refs/heads/main/install-network-tester.sh" | bash
Docker
docker pull oackio/network-tester:latest
mkdir -p $HOME/.net-checker-data
docker run --rm \
  --cap-add NET_RAW \
  -v $HOME/.net-checker-data:/data \
  oackio/network-tester:latest \
  --token-db /data/tokens.db --mode shared
MCP (AI-Assisted Troubleshooting)
Oack exposes a Model Context Protocol server that lets AI agents read your monitoring data. All MCP tools are read-only.
{
"mcpServers": {
"oack": {
"type": "http",
"url": "https://api.oack.io/mcp/"
}
}
}
To allow Claude to use all Oack MCP tools without permission prompts:
/permissions add mcp__oack__* "allow all Oack MCP tools"
CLI (oackctl)
oackctl is the official command-line interface for the Oack platform.
Install via Homebrew
brew tap oack-io/tap
brew install oackctl
Install via shell script
curl -sSfL "https://raw.githubusercontent.com/oack-io/oackctl/refs/heads/main/install-oackctl.sh" | bash
Quick start
# Authenticate (opens browser for device flow)
oackctl login
# List your teams
oackctl teams list
# List monitors in a team
oackctl monitors list --team <team-id>
# Create a monitor
oackctl monitors create --team <team-id> \
  --name "Production API" \
  --url "https://api.example.com/health" \
  --interval 60
# View probe results
oackctl probes list --team <team-id> --monitor <monitor-id> --limit 10
REST API
All platform functionality is available through the REST API at https://api.oack.io/api/v1/.
Browse the full Swagger documentation.
The OpenAPI spec is available at https://api.oack.io/openapi.json. Import it into Postman, Insomnia, or any OpenAPI-compatible tool:
Postman: Import → Link → https://api.oack.io/openapi.json
Terraform Provider
Manage your Oack monitoring infrastructure as code with the official Terraform provider. Create teams, monitors, alert channels, status pages, and PagerDuty integrations — all in version-controlled HCL.
Installation
terraform {
required_providers {
oack = {
source = "oack-io/oack"
version = "~> 0.1"
}
}
}
provider "oack" {
api_key = var.oack_api_key # or OACK_API_KEY env var
account_id = var.oack_account_id # or OACK_ACCOUNT_ID env var
}
Available resources
| Resource | Description |
|---|---|
| oack_team | Teams that own monitors, channels, and API keys |
| oack_monitor | HTTP/HTTPS monitors with SSL/domain expiry, latency thresholds, checker preferences |
| oack_alert_channel | Slack, Email, Webhook, Telegram, Discord, PagerDuty channels |
| oack_monitor_alert_channel_link | Route alerts from monitors to channels |
| oack_status_page | Public or password-protected status pages with custom branding |
| oack_status_page_component | Components and groups on status pages |
| oack_status_page_watchdog | Auto-create/resolve incidents when monitors change health |
| oack_pagerduty_integration | Two-way PagerDuty incident sync |
| oack_external_link | Quick links to Grafana, Datadog, or other dashboards |
| oack_team_api_key | Team-scoped API keys for CI/CD and deploy events |
Example: full-stack setup
resource "oack_team" "production" {
name = "Production"
}
resource "oack_monitor" "api" {
team_id = oack_team.production.id
name = "API Health"
url = "https://api.example.com/health"
check_interval_ms = 30000
ssl_expiry_enabled = true
domain_expiry_enabled = true
}
resource "oack_alert_channel" "slack" {
team_id = oack_team.production.id
name = "Engineering Slack"
type = "slack"
config = jsonencode({ webhook_url = var.slack_webhook })
}
resource "oack_monitor_alert_channel_link" "api_slack" {
team_id = oack_team.production.id
monitor_id = oack_monitor.api.id
channel_id = oack_alert_channel.slack.id
}
See the full GitHub repository for progressive examples and resource documentation.
Account Roles
Every user in an account has one of five roles:
| Role | Description |
|---|---|
| Owner | Full control. Manage subscription, transfer ownership, delete account. |
| Admin | Create/manage teams, monitors, alert channels. Invite/remove members. |
| Billing Admin | View/manage subscription and billing. Read-only access to teams. |
| Member | Create teams and monitors, manage alert channels, invite team members. |
| Guest | Read-only access. Default role for newly invited users. |
Team Roles
| Role | Description |
|---|---|
| Owner | Full control. Delete team, transfer ownership. |
| Admin | Create/manage monitors and alert channels. Manage members. |
| Member | View monitors/probes/metrics. Create share links. Cannot modify monitors. |
Permissions Summary
| Action | Min Account Role | Min Team Role |
|---|---|---|
| View monitors & probes | Guest | Member |
| Create share links | Member | Member |
| Create/update/delete monitors | Member | Admin |
| Manage alert channels | Member | Admin |
| Invite account members | Admin | - |
| Manage subscription | Owner / Billing | - |
| Delete account | Owner | - |
Plan Comparison
| Feature | Free | Pro ($29/mo) | Business ($249/mo) |
|---|---|---|---|
| Monitors | 10 | 100 | 500 |
| Check interval | 5 min | 60 sec | 30 sec |
| Teams | 1 | 5 | 50 |
| Members | 3 | 20 | Unlimited |
| Dedicated checkers | - | 5 | Unlimited |
| Probe retention | 7 days | 90 days | 365 days |
| SSL & domain monitoring | - | Yes | Yes |
| Alert channels | Email only | All standard | All + SMS (soon) |
See Pricing for full details.