PerfCatch

Introducing PerfCatch — eBPF Per-Request Performance Monitoring for Kubernetes Applications

Measure CPU time, memory, network I/O, and duration for every HTTP request — zero code changes required.

The Problem

Traditional monitoring tools give you pod-level CPU and memory averages. But when a single slow request causes a timeout or an OOM kill, you're left guessing — which request consumed all the resources?

APM tools typically require SDK instrumentation, code changes, and often per-host licensing. What if you could get per-request resource measurement without touching a single line of application code?

Introducing PerfCatch

PerfCatch is an open-source eBPF-based monitoring tool that measures resource consumption for every individual HTTP request hitting your Kubernetes pods.

It deploys as a DaemonSet, attaches eBPF programs to kernel TCP functions, and captures detailed per-request metrics automatically.

Zero instrumentation required. Works with any language — Python, Go, Java, Node.js, Rust, C# — because it operates at the kernel level.

What It Measures

| Metric | Source | Description |
| --- | --- | --- |
| duration_ms | accept() → tcp_close() | Total request wall-clock time |
| cpu_time_ms | sched_switch tracepoint | Actual on-CPU execution time |
| memory_rss_bytes | /proc/<pid>/status | Process RSS memory at request time |
| bytes_received | tcp_recvmsg kprobe | Network bytes received per request |
| bytes_sent | tcp_sendmsg kprobe | Network bytes sent per request |
| correlation_id | HTTP header capture | Extracted from X-Correlation-ID, traceparent, etc. |
| http_method, http_path | TCP stream first bytes | GET /compute extracted from request line |
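
To make the memory_rss_bytes row concrete, here is a minimal Python sketch of reading RSS from /proc/<pid>/status. It is an illustration, not PerfCatch's actual collector code; the function names are made up for this example. Note that the value is the RSS of the whole process at the moment it is sampled, not memory attributed to one request.

```python
from typing import Optional


def parse_vm_rss_bytes(status_text: str) -> Optional[int]:
    """Return VmRSS from the text of /proc/<pid>/status, converted to bytes."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Typical line: "VmRSS:\t   52264 kB" (the kernel's kB is 1024 bytes)
            fields = line.split()
            return int(fields[1]) * 1024
    return None  # e.g. kernel threads have no VmRSS line


def read_rss_bytes(pid: int) -> Optional[int]:
    """Read the current RSS of a live process (Linux only)."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vm_rss_bytes(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None  # the process exited before we could read it


# Hypothetical sample content, trimmed to a few fields.
sample = "Name:\tpython3\nVmPeak:\t   60000 kB\nVmRSS:\t   52264 kB\nThreads:\t1\n"
print(parse_vm_rss_bytes(sample))  # 53518336
```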

Key Features

  • Per-request granularity: CPU time, duration, and network bytes for each individual HTTP request, plus the process RSS sampled at request time
  • Real CPU time — actual on-CPU nanoseconds from kernel scheduler, not estimation
  • Correlation ID tracking — auto-extracts from 7 built-in headers + custom ones
  • < 1% CPU overhead — eBPF runs in kernel space with zero-copy event delivery
  • 500+ requests/sec throughput per node
  • Helm chart deployment — single command with bundled Prometheus + Grafana
  • Pre-built Grafana dashboard — 8 panels with request table, time-series charts, and filtering
  • Flexible storage — in-memory ring buffer, Prometheus, VictoriaMetrics, or SQLite
  • Dependency tracking — captures outbound TCP connections during request processing

How It Works

PerfCatch runs 3 eBPF programs in kernel space:

  1. request_tracker.c — tracks TCP accept → send → close lifecycle per connection
  2. dependency_tracker.c — captures outbound calls made during request processing
  3. resource_tracker.c — accumulates real CPU time via sched_switch tracepoint
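
The idea behind accumulating CPU time from sched_switch can be sketched in a few lines of Python. This is a userspace simulation under simplifying assumptions (a flat event list keyed by task ID, no per-CPU handling), not the contents of resource_tracker.c; the event tuple layout is invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed event shape: (timestamp_ns, prev_pid, next_pid)
SchedSwitch = Tuple[int, int, int]


def accumulate_cpu_ns(events: Iterable[SchedSwitch]) -> Dict[int, int]:
    """Sum on-CPU nanoseconds per task from a stream of context switches."""
    on_cpu_since: Dict[int, int] = {}
    total_ns: Dict[int, int] = defaultdict(int)
    for ts, prev_pid, next_pid in events:
        # The outgoing task stops running: close its current time slice.
        started = on_cpu_since.pop(prev_pid, None)
        if started is not None:
            total_ns[prev_pid] += ts - started
        # The incoming task starts running: open a new slice.
        on_cpu_since[next_pid] = ts
    return dict(total_ns)


events = [
    (1_000, 0, 42),  # task 42 scheduled in
    (4_000, 42, 7),  # 42 ran 3000 ns, task 7 scheduled in
    (6_000, 7, 42),  # 7 ran 2000 ns, 42 back on CPU
    (9_500, 42, 0),  # 42 ran another 3500 ns
]
print(accumulate_cpu_ns(events))  # {42: 6500, 7: 2000}
```

Time spent blocked or waiting in the run queue never falls inside an open slice, which is why this differs from wall-clock duration.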

In userspace, a Collector correlates events with K8s pod metadata, stores them in a 50K-entry ring buffer, and exposes both Prometheus histograms and a JSON query API.
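
A fixed-size ring buffer like the one described here could look roughly like the following Python sketch. The class and method names are hypothetical and the real Collector may be structured quite differently; the point is only that once the buffer is full, each new record evicts the oldest one.

```python
from collections import deque
from typing import Deque, Dict, List


class RequestRingBuffer:
    """Keeps the most recent `capacity` request records in memory."""

    def __init__(self, capacity: int = 50_000) -> None:
        # deque with maxlen silently drops the oldest item when full.
        self._records: Deque[Dict] = deque(maxlen=capacity)

    def append(self, record: Dict) -> None:
        self._records.append(record)

    def find(self, correlation_id: str) -> List[Dict]:
        """Linear scan for records carrying the given correlation ID."""
        return [r for r in self._records if r.get("correlation_id") == correlation_id]

    def __len__(self) -> int:
        return len(self._records)


# Tiny capacity so the eviction is visible.
buf = RequestRingBuffer(capacity=3)
for i in range(5):
    buf.append({"correlation_id": f"order-{i}", "duration_ms": float(i)})

print(len(buf))             # 3
print(buf.find("order-0"))  # [] (evicted)
print(buf.find("order-4"))  # [{'correlation_id': 'order-4', 'duration_ms': 4.0}]
```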

Quick Start

Step 1: Deploy

helm install perfcatch charts/perfcatch \
  -n perfcatch --create-namespace \
  --set config.namespace=my-app

Step 2: Verify

kubectl -n perfcatch get pods
# NAME              READY   STATUS    RESTARTS   AGE
# perfcatch-znplr   1/1     Running   0          30s

Step 3: Send requests with correlation IDs

curl -H "X-Correlation-ID: order-12345" http://my-app/compute

Step 4: Query the API

curl "localhost:9090/api/requests?correlation_id=order-12345"
{
  "count": 1,
  "requests": [{
    "correlation_id": "order-12345",
    "pod_name": "sample-app-5879fb87f5-zcfbr",
    "http_method": "GET",
    "http_path": "/compute",
    "duration_ms": 9.24,
    "cpu_time_ms": 8.90,
    "memory_rss_bytes": 53518336,
    "bytes_received": 161,
    "bytes_sent": 170
  }]
}
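
If you would rather consume the API from a script, a small Python client might look like the sketch below. The endpoint path and field names come from the response above; the helper names, the base URL argument, and the "CPU fraction" calculation are assumptions added for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_requests(base_url: str, correlation_id: str) -> dict:
    """Query /api/requests for one correlation ID (needs a reachable agent)."""
    query = urlencode({"correlation_id": correlation_id})
    with urlopen(f"{base_url}/api/requests?{query}", timeout=5) as resp:
        return json.load(resp)


def cpu_fraction(record: dict) -> float:
    """Share of the request's wall-clock time that was spent on-CPU."""
    if record["duration_ms"] <= 0:
        return 0.0
    return record["cpu_time_ms"] / record["duration_ms"]


# Offline example using the response shown above, so no agent is required.
SAMPLE = """
{"count": 1, "requests": [{
  "correlation_id": "order-12345", "http_method": "GET", "http_path": "/compute",
  "duration_ms": 9.24, "cpu_time_ms": 8.90, "memory_rss_bytes": 53518336,
  "bytes_received": 161, "bytes_sent": 170}]}
"""
payload = json.loads(SAMPLE)
summary = [(r["http_path"], round(cpu_fraction(r), 2)) for r in payload["requests"]]
print(summary)  # [('/compute', 0.96)]
```

A fraction close to 1.0, as here, suggests the request was CPU-bound; a low fraction points to time spent waiting on I/O or downstream calls.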

Step 5: View in Grafana

kubectl -n monitoring port-forward svc/grafana 3000:3000
# Open http://localhost:3000 (admin / perfcatch)
# Dashboard: "PerfCatch - eBPF Request Metrics"

Deployment Options

| Mode | What You Get |
| --- | --- |
| Full Stack (default) | Agent + Prometheus (5Gi PVC, 15d retention) + Grafana with pre-built dashboard |
| With VictoriaMetrics | Full stack + long-term per-request storage via Remote Write (30d retention) |
| Existing Prometheus | Agent only, auto-discovered via pod annotations |
| Standalone | No Prometheus. SQLite persistence on hostPath. Query via /api/requests |

Supported Correlation Headers

PerfCatch auto-detects these headers from HTTP requests (configurable):

  • x-correlation-id
  • x-request-id
  • x-trace-id
  • traceparent
  • x-amzn-trace-id
  • request-id
  • correlation-id
  • + any custom header via config
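
To illustrate what header capture involves, here is a Python sketch that pulls the method, path, and a correlation ID out of the first bytes of an HTTP/1.x request. It is not PerfCatch's parser: the function name is invented, and checking the headers in the order listed above is an assumption about priority.

```python
from typing import Optional, Sequence, Tuple

CORRELATION_HEADERS = (
    "x-correlation-id", "x-request-id", "x-trace-id", "traceparent",
    "x-amzn-trace-id", "request-id", "correlation-id",
)


def parse_request_head(
    data: bytes, headers: Sequence[str] = CORRELATION_HEADERS
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (method, path, correlation_id) from raw request bytes."""
    # Only the header section matters; drop any body after the blank line.
    head = data.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")

    # Request line looks like: "GET /compute HTTP/1.1"
    parts = lines[0].split(" ")
    method, path = (parts[0], parts[1]) if len(parts) >= 2 else (None, None)

    # Header names are case-insensitive, so normalise to lowercase.
    found = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            found.setdefault(name.strip().lower(), value.strip())

    # First configured header that is present wins (assumed priority).
    for name in headers:
        if name in found:
            return method, path, found[name]
    return method, path, None


raw = b"GET /compute HTTP/1.1\r\nHost: my-app\r\nX-Correlation-ID: order-12345\r\n\r\n"
print(parse_request_head(raw))  # ('GET', '/compute', 'order-12345')
```

Passing a longer tuple as `headers` is one way a custom header could be supported in this sketch.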

Why PerfCatch?

| | PerfCatch | Traditional APM |
| --- | --- | --- |
| Code changes | None | SDK integration required |
| CPU overhead | < 1% | 5-15% |
| Language support | Any (kernel-level) | Language-specific agents |
| CPU time accuracy | Real on-CPU ns (sched_switch) | Wall-clock estimation |
| Cost | Free / open source | Per-host licensing |
| Deploy time | 1 Helm command | Hours of SDK integration |

