⚡ PerfCatch

Introducing PerfCatch — eBPF Per-Request Performance Monitoring for Kubernetes Applications

Measure CPU time, memory, network I/O, and duration for every HTTP request — zero code changes required.

The Problem

Traditional monitoring tools give you pod-level CPU and memory averages. But when a single slow request causes a timeout or an OOM kill, you're left guessing — which request consumed all the resources?

APM tools typically require SDK instrumentation, code changes, and per-host licensing. What if you could get per-request resource measurement without touching a single line of application code?

Introducing PerfCatch

PerfCatch is an open-source eBPF-based monitoring tool that measures resource consumption for every individual HTTP request hitting your Kubernetes pods.

It deploys as a DaemonSet, attaches eBPF programs to kernel TCP functions, and captures detailed per-request metrics automatically.

Zero instrumentation required. Works with any language — Python, Go, Java, Node.js, Rust, C# — because it operates at the kernel level.

What It Measures

Metric	Source	Description
`duration_ms`	accept() → tcp_close()	Total request wall-clock time
`cpu_time_ms`	sched_switch tracepoint	Actual on-CPU execution time
`memory_rss_bytes`	/proc/<pid>/status	Resident set size (RSS) of the serving process, sampled at request time (process-wide, not attributed to one request)
`bytes_received`	tcp_recvmsg kprobe	Network bytes received per request
`bytes_sent`	tcp_sendmsg kprobe	Network bytes sent per request
`correlation_id`	HTTP header capture	Extracted from X-Correlation-ID, traceparent, etc.
`http_method`, `http_path`	TCP stream first bytes	Method and path parsed from the request line, e.g. GET /compute
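To make two of the table rows concrete, here is a minimal Python sketch of how a request line and an RSS value could be parsed in userspace. It is an illustration only, not PerfCatch's actual code: the function names `parse_request_line` and `parse_vm_rss_bytes` are hypothetical, and it assumes an HTTP/1.x request whose first line has already arrived in full.

```python
from typing import Optional, Tuple


def parse_request_line(first_bytes: bytes) -> Optional[Tuple[str, str]]:
    """Return (method, target) from the start of an HTTP/1.x request, or None.

    Assumption: the request line is terminated by CRLF and has exactly
    three space-separated parts, e.g. b"GET /compute HTTP/1.1".
    """
    line, sep, _ = first_bytes.partition(b"\r\n")
    if not sep:
        return None  # the request line has not fully arrived yet
    parts = line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None  # not an HTTP/1.x request line
    method = parts[0].decode("ascii", errors="replace")
    target = parts[1].decode("ascii", errors="replace")
    return method, target


def parse_vm_rss_bytes(status_text: str) -> Optional[int]:
    """Extract VmRSS from the text of /proc/<pid>/status, converted to bytes."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The kernel reports the value in kB, e.g. "VmRSS:   52264 kB".
            return int(line.split()[1]) * 1024
    return None  # kernel threads have no VmRSS line
```

Note that `VmRSS` describes the whole process, so concurrent requests served by the same process would see the same value.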

Key Features

Per-request granularity — CPU, memory, duration, and network per individual HTTP request
Real CPU time — actual on-CPU nanoseconds from kernel scheduler, not estimation
Correlation ID tracking — auto-extracts from 7 built-in headers + custom ones
< 1% CPU overhead — eBPF runs in kernel space with zero-copy event delivery
500+ requests/sec throughput per node
Helm chart deployment — single command with bundled Prometheus + Grafana
Pre-built Grafana dashboard — 8 panels with request table, time-series charts, and filtering
Flexible storage — in-memory ring buffer, Prometheus, VictoriaMetrics, or SQLite
Dependency tracking — captures outbound TCP connections during request processing

How It Works

PerfCatch runs 3 eBPF programs in kernel space:

request_tracker.c — tracks TCP accept → send → close lifecycle per connection
dependency_tracker.c — captures outbound calls made during request processing
resource_tracker.c — accumulates real CPU time via sched_switch tracepoint
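The CPU accounting idea behind a sched_switch-based tracker can be simulated in userspace. The sketch below is not the code in resource_tracker.c; it assumes a simplified, time-ordered event stream with hypothetical field names (`ts_ns`, `prev_pid`, `next_pid`) and shows how on-CPU time accumulates between a task being switched in and switched out.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class SchedSwitch(NamedTuple):
    ts_ns: int     # event timestamp in nanoseconds
    prev_pid: int  # task being switched out
    next_pid: int  # task being switched in


def accumulate_cpu_ns(events: Iterable[SchedSwitch]) -> Dict[int, int]:
    """Sum on-CPU nanoseconds per pid from time-ordered switch events."""
    on_cpu_since: Dict[int, int] = {}
    total_ns: Dict[int, int] = defaultdict(int)
    for ev in events:
        # The outgoing task stops running: close its open interval, if any.
        started = on_cpu_since.pop(ev.prev_pid, None)
        if started is not None:
            total_ns[ev.prev_pid] += ev.ts_ns - started
        # The incoming task starts running: open a new interval.
        on_cpu_since[ev.next_pid] = ev.ts_ns
    return dict(total_ns)


events = [
    SchedSwitch(1_000, 0, 42),   # pid 42 starts running
    SchedSwitch(4_000, 42, 7),   # pid 42 ran 3000 ns, pid 7 takes over
    SchedSwitch(6_000, 7, 42),   # pid 7 ran 2000 ns, pid 42 resumes
    SchedSwitch(9_000, 42, 0),   # pid 42 ran another 3000 ns
]
print(accumulate_cpu_ns(events))
```

Time spent blocked or waiting in the run queue never enters the total, which is why this figure can be much smaller than wall-clock duration for I/O-bound requests. The idle task (pid 0) exists once per CPU, so a real tracker would need to handle it separately.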

In userspace, a Collector correlates events with K8s pod metadata, stores them in a 50K-entry ring buffer, and exposes both Prometheus histograms and a JSON query API.
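A fixed-size ring buffer like the one described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Collector's implementation: the class name `RequestRingBuffer` and its methods are hypothetical, and only the 50K capacity and the `correlation_id` field come from this post.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class RequestRingBuffer:
    """Keep the most recent request events and drop the oldest when full."""

    def __init__(self, capacity: int = 50_000) -> None:
        # deque with maxlen discards items from the left once it is full.
        self._events: Deque[Dict[str, Any]] = deque(maxlen=capacity)

    def append(self, event: Dict[str, Any]) -> None:
        self._events.append(event)

    def query(self, correlation_id: str) -> List[Dict[str, Any]]:
        """Return stored events with a matching correlation ID (linear scan)."""
        return [e for e in self._events if e.get("correlation_id") == correlation_id]

    def __len__(self) -> int:
        return len(self._events)


buf = RequestRingBuffer(capacity=2)
buf.append({"correlation_id": "a", "duration_ms": 1.0})
buf.append({"correlation_id": "b", "duration_ms": 2.0})
buf.append({"correlation_id": "c", "duration_ms": 3.0})  # evicts "a"
```

The trade-off of this design is bounded memory at the cost of history: once the buffer wraps, older requests are only available through whatever long-term storage is configured.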

Quick Start

Step 1: Deploy

helm install perfcatch charts/perfcatch \
  -n perfcatch --create-namespace \
  --set config.namespace=my-app

Step 2: Verify

kubectl -n perfcatch get pods
# NAME              READY   STATUS    RESTARTS   AGE
# perfcatch-znplr   1/1     Running   0          30s

Step 3: Send requests with correlation IDs

curl -H "X-Correlation-ID: order-12345" http://my-app/compute

Step 4: Query the API

curl "localhost:9090/api/requests?correlation_id=order-12345"

{
  "count": 1,
  "requests": [{
    "correlation_id": "order-12345",
    "pod_name": "sample-app-5879fb87f5-zcfbr",
    "http_method": "GET",
    "http_path": "/compute",
    "duration_ms": 9.24,
    "cpu_time_ms": 8.90,
    "memory_rss_bytes": 53518336,
    "bytes_received": 161,
    "bytes_sent": 170
  }]
}
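A response like the one above is easy to post-process. The following Python sketch parses the JSON text and computes what fraction of each request's wall-clock time was spent on CPU. The helper names `cpu_ratio` and `summarize` are hypothetical; the field names are the ones shown in the sample response.

```python
import json
from typing import Any, Dict, List


def cpu_ratio(request: Dict[str, Any]) -> float:
    """Fraction of wall-clock time spent on CPU (0.0 if duration is zero)."""
    duration = request["duration_ms"]
    return request["cpu_time_ms"] / duration if duration > 0 else 0.0


def summarize(response_text: str) -> List[Dict[str, Any]]:
    """Reduce an /api/requests response to a few fields per request."""
    payload = json.loads(response_text)
    return [
        {
            "correlation_id": r["correlation_id"],
            "path": r["http_path"],
            "cpu_ratio": round(cpu_ratio(r), 3),
        }
        for r in payload["requests"]
    ]


sample = """{"count": 1, "requests": [{
    "correlation_id": "order-12345", "http_method": "GET",
    "http_path": "/compute", "duration_ms": 9.24, "cpu_time_ms": 8.90}]}"""
print(summarize(sample))
```

For the sample request the ratio comes out at roughly 0.96, meaning the request was almost entirely CPU-bound. A ratio near zero would instead point at waiting on I/O, locks, or downstream services.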

Step 5: View in Grafana

kubectl -n monitoring port-forward svc/grafana 3000:3000
# Open http://localhost:3000 (admin / perfcatch)
# Dashboard: "PerfCatch - eBPF Request Metrics"

Deployment Options

Mode	What You Get
Full Stack (default)	Agent + Prometheus (5Gi PVC, 15d retention) + Grafana with pre-built dashboard
With VictoriaMetrics	Full stack + long-term per-request storage via Remote Write (30d retention)
Existing Prometheus	Agent only — auto-discovered via pod annotations
Standalone	No Prometheus. SQLite persistence on hostPath. Query via /api/requests

Supported Correlation Headers

PerfCatch auto-detects these headers from HTTP requests (configurable):

x-correlation-id
x-request-id
x-trace-id
traceparent
x-amzn-trace-id
request-id
correlation-id
+ any custom header via config
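Header-based extraction of this kind can be sketched as follows. This is an illustrative Python version, not PerfCatch's implementation: the function name `extract_correlation_id` is hypothetical, and treating the list order as a priority order when several headers are present is an assumption.

```python
from typing import Dict, Optional, Sequence

# The seven built-in header names listed above, in the order given.
DEFAULT_HEADERS = (
    "x-correlation-id",
    "x-request-id",
    "x-trace-id",
    "traceparent",
    "x-amzn-trace-id",
    "request-id",
    "correlation-id",
)


def extract_correlation_id(
    raw_request: bytes, header_names: Sequence[str] = DEFAULT_HEADERS
) -> Optional[str]:
    """Return the first matching header value from a raw HTTP/1.x request."""
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    found: Dict[str, str] = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # malformed header line
        # Header names are case-insensitive, so compare in lowercase.
        key = name.strip().lower().decode("ascii", errors="replace")
        found.setdefault(key, value.strip().decode("ascii", errors="replace"))
    for wanted in header_names:
        if wanted.lower() in found:
            return found[wanted.lower()]
    return None


raw = (
    b"GET /compute HTTP/1.1\r\n"
    b"Host: my-app\r\n"
    b"X-Request-ID: req-9\r\n"
    b"X-Correlation-ID: order-12345\r\n"
    b"\r\n"
)
print(extract_correlation_id(raw))
```

A custom header can be supported in this sketch by passing a longer `header_names` sequence, for example `DEFAULT_HEADERS + ("x-my-id",)`.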

Why PerfCatch?

Aspect	PerfCatch	Traditional APM
Code changes	None	SDK integration required
CPU overhead	< 1%	5-15%
Language support	Any (kernel-level)	Language-specific agents
CPU time accuracy	Real on-CPU ns (sched_switch)	Wall-clock estimation
Cost	Free / open source	Per-host licensing
Deploy time	1 Helm command	Hours of SDK integration

Links

GitHub: github.com/DevOpsArts/perfcatch
Website: devopsarts.github.io/perfcatch
