Logsnare

A Kubernetes log agent with intelligent error-context capture, rule-based alerting, and nine pluggable storage backends.

The Problem: Finding the Needle in the Log Haystack

Every SRE knows the pain: an alert fires at 3 AM, and you're digging through gigabytes of logs trying to understand what happened before the error. Traditional log solutions either capture everything (expensive) or miss crucial context (frustrating).

What if your log agent was smart enough to capture only what matters—the error AND the context around it—and alert you instantly?

Introducing Logsnare

Logsnare is an open-source Kubernetes log monitoring agent that solves this problem with intelligent error-aware context capture and rule-based alerting. Instead of blindly forwarding all logs, Logsnare:

  • πŸ” Detects errors using configurable regex patterns
  • Captures context – logs BEFORE and AFTER the error
  • 🚨 Alerts intelligently – route different errors to different teams
  • πŸ’Ύ Stores smartly – choose from 9 storage backends
  • Scales efficiently – handles 500+ pods with minimal resources

Key Features

🎯 Smart Error Detection

Logsnare uses regex and string-based pattern matching to detect errors across multiple languages and frameworks:

errorPatterns:
  - "ERROR"
  - "Exception"
  - "FATAL"
  - "panic:"
  - "Traceback"
  - "OOMKilled"
  - "CrashLoopBackOff"

These patterns are fully customizable.
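
To make the detection step concrete, here is a minimal sketch of how pattern matching over incoming log lines could work. The patterns come from the config above; the function name and structure are illustrative, not Logsnare's actual API.

```python
import re

# Patterns taken from the errorPatterns config above.
ERROR_PATTERNS = ["ERROR", "Exception", "FATAL", "panic:",
                  "Traceback", "OOMKilled", "CrashLoopBackOff"]

# Compile once at startup so each log line is checked cheaply.
COMPILED = [re.compile(p) for p in ERROR_PATTERNS]

def is_error(line: str) -> bool:
    """Return True if any configured pattern matches the log line."""
    return any(p.search(line) for p in COMPILED)
```

With this in place, `is_error("panic: runtime error")` matches while ordinary request logs pass through untouched.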

🚨 Rule-Based Alerting (New)

Route different error patterns to different teams with customizable thresholds:

alerting:
  enabled: true
  rules:
    # Critical errors → On-call team immediately
    - name: "critical-errors"
      patterns: ["CRITICAL", "FATAL", "OOMKilled", "panic:"]
      threshold:
        count: 1          # Alert on FIRST occurrence
        windowSeconds: 60
      email:
        enabled: true
        toAddresses: ["oncall@company.com"]

    # Java exceptions → Backend team (after 2 occurrences)
    - name: "java-exceptions"  
      patterns: ["NullPointerException", "OutOfMemoryError"]
      threshold:
        count: 2          # Alert after 2 occurrences
        windowSeconds: 300
      email:
        enabled: true
        toAddresses: ["backend-team@company.com"]

    # Python errors → Data team
    - name: "python-errors"
      patterns: ["Traceback", "TypeError", "ValueError"]
      threshold:
        count: 1
      email:
        enabled: true
        toAddresses: ["data-team@company.com"]
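
The `count` / `windowSeconds` semantics above amount to a sliding-window counter: fire once `count` matches land inside the window. The sketch below shows one way to implement that; the class and method names are illustrative, not Logsnare's internals.

```python
import time
from collections import deque

class ThresholdRule:
    """Fire when `count` matches occur within `window_seconds`."""

    def __init__(self, name, count, window_seconds):
        self.name = name
        self.count = count
        self.window = window_seconds
        self.hits = deque()  # timestamps of recent matches

    def record(self, now=None):
        """Record one matching line; return True if the rule fires."""
        now = time.time() if now is None else now
        self.hits.append(now)
        # Evict matches that have fallen out of the window.
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) >= self.count
```

With `count: 2` and `windowSeconds: 300`, two exceptions five minutes apart stay quiet, while two within the window page the team.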

Why rule-based alerting matters:

  • 🎯 Reduce alert fatigue – only alert relevant teams
  • ⏱️ Smart thresholds – distinguish between flaky tests and real outages
  • πŸ“§ Multiple channels – email and webhooks (Slack, PagerDuty, etc.)
  • πŸ”„ Context included – alerts contain the actual error with surrounding logs

πŸ“¦ 9 Storage Backends

One agent, any storage destination. Supported backends include:

| Backend             | Use Case                            |
| ------------------- | ----------------------------------- |
| PostgreSQL          | Relational queries, SQL analysis    |
| MongoDB             | Flexible document storage           |
| Elasticsearch       | Full-text search, Kibana dashboards |
| Azure Log Analytics | Azure ecosystem, KQL queries        |
| AWS CloudWatch      | AWS ecosystem, CloudWatch Insights  |
| GCP Cloud Logging   | Google Cloud ecosystem              |
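
Pluggable backends typically share one narrow write interface so new destinations can be added without touching the capture pipeline. The sketch below shows what such an interface could look like; the class and method names are assumptions for illustration, not Logsnare's actual plugin API.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Common interface every storage destination implements."""

    @abstractmethod
    def write(self, entries: list) -> None:
        """Persist a batch of captured log entries."""

class StdoutBackend(StorageBackend):
    """Trivial backend for local testing: print each entry."""

    def write(self, entries):
        for e in entries:
            print(e["pod"], e["line"])
```

The engine only ever calls `write()`, so swapping PostgreSQL for CloudWatch is a configuration change, not a code change.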

πŸ”„ Context Capture Window

Configure how much context to capture around errors:

captureWindow:
  bufferDurationMinutes: 2  # Minutes of logs kept BEFORE the error
  captureAfterMinutes: 2    # Minutes of logs captured AFTER the error

This means when an error occurs, you get the full story—not just the error line.
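
A natural way to provide the "before" half of that window is a time-bounded rolling buffer: keep only the last N minutes of lines, and snapshot them the moment an error matches. This is a sketch of the idea under that assumption, not Logsnare's actual implementation.

```python
import time
from collections import deque

class ContextBuffer:
    """Rolling buffer holding the last `buffer_minutes` of log lines."""

    def __init__(self, buffer_minutes=2):
        self.window = buffer_minutes * 60  # seconds
        self.lines = deque()               # (timestamp, line) pairs

    def append(self, line, now=None):
        now = time.time() if now is None else now
        self.lines.append((now, line))
        # Drop lines older than the window.
        while self.lines and now - self.lines[0][0] > self.window:
            self.lines.popleft()

    def snapshot(self):
        """Lines currently in the window: the pre-error context."""
        return [line for _, line in self.lines]
```

When an error fires, `snapshot()` yields the lead-up, and the agent keeps capturing for the configured "after" period to complete the picture.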

Quick Start

Deploy Logsnare in under 2 minutes:

# Clone the repository
git clone https://github.com/DevOpsArts/logsnare.git
cd logsnare

# Install with Helm (PostgreSQL backend + alerting)
helm install logsnare-engine ./charts/logsnare-engine \
  --namespace logsnare \
  --create-namespace \
  --set storage.type=postgresql \
  --set connections.postgresql.host=your-db-host \
  --set connections.postgresql.username=logsnare \
  --set connections.postgresql.password=YOUR_PASSWORD \
  --set alerting.enabled=true \
  --set alerting.email.smtpHost=smtp.company.com
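
For anything beyond a quick trial, the same settings are easier to review and version-control as a values file. The keys below mirror the `--set` flags above; the exact schema is defined by the chart itself.

```yaml
# values.yaml – equivalent of the --set flags above
storage:
  type: postgresql
connections:
  postgresql:
    host: your-db-host
    username: logsnare
    password: YOUR_PASSWORD
alerting:
  enabled: true
  email:
    smtpHost: smtp.company.com
```

Then install with `helm install logsnare-engine ./charts/logsnare-engine --namespace logsnare --create-namespace -f values.yaml`.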

Architecture

┌─────────────────────────────────────────────┐
│           Kubernetes Cluster                │
│                                             │
│  ┌─────┐  ┌─────┐  ┌─────┐                 │
│  │Pod A│  │Pod B│  │Pod C│  ← Monitored    │
│  └──┬──┘  └──┬──┘  └──┬──┘                 │
│     └────────┼────────┘                     │
│              ▼                              │
│     ┌────────────────────┐                  │
│     │   Logsnare-Engine  │                  │
│     │  • Error Detection │                  │
│     │  • Context Capture │                  │
│     │  • Rule-Based Alert│ ──► πŸ“§ Email     │
│     │  • Rolling Buffer  │ ──► πŸ”— Webhook   │
│     └─────────┬──────────┘                  │
└───────────────┼─────────────────────────────┘
                ▼
       ┌────────────────┐
       │ Storage Backend│
       │ (Your Choice)  │
       └────────────────┘
  

Production-Ready Features

  • πŸ”’ Security: Non-root container, read-only filesystem, seccomp profiles
  • πŸ”„ High Availability: Leader election for multi-replica deployments
  • πŸ“Š Scalability: ThreadPoolExecutor handles 500+ pods efficiently
  • πŸ” SSL/TLS: Secure connections to all database backends
  • πŸ›‘️ Network Policies: Built-in network isolation templates
  • 🚨 Smart Alerting: Rule-based routing with thresholds and cooldowns
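
The scalability bullet describes a fan-out pattern: one bounded worker pool tails many pod log streams instead of spawning a thread per pod. A minimal sketch of that pattern, with illustrative function names (the real agent would stream logs via the Kubernetes API):

```python
from concurrent.futures import ThreadPoolExecutor

def tail_pod(pod_name):
    # Placeholder for streaming a pod's logs via the Kubernetes API.
    return f"tailing {pod_name}"

def monitor(pods, max_workers=50):
    """Tail many pods concurrently with a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tail_pod, pods))
```

Capping `max_workers` keeps memory and file-descriptor usage predictable even as the pod count grows into the hundreds.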

Get Started Today

Logsnare is open-source and free to use; the repository is linked in the Quick Start above.
Have questions or feedback? Drop a comment below or open an issue on GitHub!

Tags: Kubernetes, DevOps, SRE, Logging, Monitoring, Alerting, Azure, AWS, GCP, Helm, Open Source
