concept #Observability #Platform #DevOps #Security

Log Management

Log management organizes collection, storage and analysis of application and system logs for troubleshooting, security and compliance.

Established
Medium

Classification

  • Medium
  • Technical
  • Architectural
  • Intermediate

Technical context

  • Tracing systems (e.g., OpenTelemetry)
  • Monitoring and alerting tools (e.g., Prometheus, Grafana)
  • Archive storage or cloud object storage

Principles & goals

  • Centralized collection with consistent formats
  • Separation of hot and cold storage by usage
  • Correlation of logs with traces and metrics

Run
Enterprise, Domain, Team

Use cases & scenarios

Compromises

  • Uncontrolled log growth leads to cost explosion
  • Sensitive data may be stored without proper masking
  • Missing time synchronization hampers correlation
  • Define and use log levels consistently
  • Prefer structured logs (JSON) for easier analysis
  • Mask or filter sensitive data early
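
Structured JSON logs can be produced with Python's standard `logging` module by plugging in a custom formatter. The sketch below is one way to do it, not a prescribed format. The field names (`timestamp`, `level`, `logger`, `message`) and the convention of passing extra fields through `extra={"fields": {...}}` are assumptions made for this example. Timestamps are written in UTC to avoid the time-synchronization problems listed above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 in UTC keeps timestamps comparable across hosts
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured context arrives via extra={"fields": {...}}
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            for key, value in fields.items():
                entry.setdefault(key, value)  # never overwrite the core fields
        return json.dumps(entry)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # DEBUG records are dropped at this level
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    import sys

    log = make_logger("checkout", sys.stdout)
    log.info("order placed", extra={"fields": {"order_id": "A-1001"}})
```

Emitting one JSON object per line keeps the output easy for collectors to split and parse without custom grammars.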

I/O & resources

  • Log emitters (applications, infrastructure, network devices)
  • Schema and field documentation
  • Policies for retention and data protection
  • Searchable log indices
  • Alerting and dashboard visualizations
  • Archived log backups for audits
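
Schema and field documentation is most useful when it can also be checked automatically. The following sketch validates an event against a small field specification. The required fields, their types and the allowed level names are hypothetical examples; a real schema would come from the team's own field documentation.

```python
from typing import Any

# Hypothetical field documentation: field name -> expected Python type
REQUIRED_FIELDS: dict[str, type] = {
    "timestamp": str,
    "level": str,
    "service": str,
    "message": str,
}
# Hypothetical set of agreed log levels
ALLOWED_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the event conforms."""
    problems: list[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    level = event.get("level")
    if isinstance(level, str) and level not in ALLOWED_LEVELS:
        problems.append(f"unknown level: {level}")
    return problems
```

Running such a check in the collector or in CI for emitters surfaces convention drift before it reaches the searchable indices.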

Description

Log management covers collection, transport, storage and analysis of application and system logs. It defines structured processes for ingestion, indexing, retention and search to support troubleshooting, security monitoring and compliance. Effective log management lowers the mean time to repair (MTTR), improves observability and enables forensic investigations.

  • Faster fault detection and reduced MTTR
  • Improved security forensics and auditability
  • Better basis for capacity planning decisions

  • High storage and operational costs with long retention
  • Complexity in standardizing formats across heterogeneous systems
  • Blind spots when logs are incomplete or not instrumented

  • Log ingest rate

    Number of incoming log events per second; indicator for scaling needs.

  • MTTR (Mean Time To Repair)

    Average time to recover after an incident; measures effectiveness of log analysis.

  • Storage cost per GB per month

    Monthly cost for retaining logs; important for retention decisions.
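
The three metrics above reduce to simple arithmetic. The sketch below computes them. Measuring repair time from detection to resolution is an assumption; some teams start the clock at incident onset instead. The `Incident` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def ingest_rate(event_count: int, window: timedelta) -> float:
    """Incoming log events per second over an observation window."""
    seconds = window.total_seconds()
    if seconds <= 0:
        raise ValueError("window must be positive")
    return event_count / seconds


@dataclass
class Incident:
    # Hypothetical incident record; detection is used as the start point
    detected: datetime
    resolved: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to repair: average of (resolved - detected)."""
    if not incidents:
        raise ValueError("no incidents")
    total = sum((i.resolved - i.detected for i in incidents), timedelta())
    return total / len(incidents)


def monthly_storage_cost(stored_gb: float, price_per_gb_month: float) -> float:
    """Monthly retention cost for the currently stored volume."""
    return stored_gb * price_per_gb_month
```

For example, 36,000 events in one minute correspond to an ingest rate of 600 events per second.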

ELK stack in an e-commerce platform

Centralized collection of web, application and security logs to analyze user errors and performance bottlenecks.

OpenTelemetry-based log pipeline

Unstructured logs are normalized via a collector, correlated with traces and forwarded to an observability backend.
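
The normalization and correlation step can be illustrated in plain Python. This is not the OpenTelemetry Collector's configuration or API; it only shows the idea of turning unstructured lines into structured records and grouping them by trace ID. The legacy line format `<timestamp> <LEVEL> [trace=<id>] <message>` is a hypothetical example.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical legacy format: "<ISO timestamp> <LEVEL> [trace=<hex id>] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?:\[trace=(?P<trace_id>[0-9a-f]+)\]\s+)?(?P<message>.*)$"
)


def normalize(line: str) -> Optional[dict]:
    """Parse one unstructured line into a record; None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()  # trace_id is None when no trace context is present


def group_by_trace(records: Iterable[Optional[dict]]) -> dict[str, list[dict]]:
    """Collect records that share a trace ID so they can be viewed alongside the trace."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        if record and record["trace_id"]:
            groups[record["trace_id"]].append(record)
    return dict(groups)
```

Lines that fail to parse return `None`; a production pipeline would typically route them to a dead-letter destination rather than drop them silently.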

Cloud-native logging with centralized retention

Managed log ingest combined with cost-efficient long-term archives for compliance and forensic purposes.
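
Separating hot and cold storage comes down to a retention policy that maps the age of a log to a storage tier. The sketch below shows such a policy. The tier names and the default thresholds (14 days hot, 365 days archived) are illustrative assumptions, not recommendations; actual values follow from compliance requirements and budget.

```python
from datetime import date
from enum import Enum


class Tier(Enum):
    HOT = "hot"          # fast, searchable index storage
    ARCHIVE = "archive"  # cheap object storage kept for compliance and forensics
    DELETE = "delete"    # past retention; eligible for removal


def tier_for(log_date: date, today: date, hot_days: int = 14, archive_days: int = 365) -> Tier:
    """Assign a storage tier based on log age (thresholds are hypothetical defaults)."""
    if hot_days > archive_days:
        raise ValueError("hot_days must not exceed archive_days")
    age = (today - log_date).days
    if age < hot_days:
        return Tier.HOT
    if age < archive_days:
        return Tier.ARCHIVE
    return Tier.DELETE
```

Keeping the policy in one function makes retention decisions reviewable and avoids the blanket retention extensions listed among the pitfalls.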

1. Analyze existing log sources and volumes
2. Define formats, retention and SLAs
3. Introduce a collector, normalization and backend
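
The first step, analyzing sources and volumes, can start with a simple tally over a sample of events. The sketch assumes the sample is available as `(source, line)` pairs, which is a hypothetical input shape, and measures size as UTF-8 bytes.

```python
from collections import Counter
from typing import Iterable, Tuple


def volume_by_source(events: Iterable[Tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count events and UTF-8 bytes per source, ordered by byte volume (largest first)."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for source, line in events:
        counts[source] += 1
        sizes[source] += len(line.encode("utf-8"))
    return {src: {"events": counts[src], "bytes": sizes[src]} for src, _ in sizes.most_common()}
```

The resulting ranking shows which emitters dominate volume and therefore where format and retention decisions in step 2 have the largest cost impact.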

⚠️ Technical debt & bottlenecks

  • Legacy emitters with proprietary formats
  • Neglected retention rules consuming storage
  • Missing automation for index rotation and archiving

Bottlenecks: ingest-rate, indexing-throughput, storage-capacity

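
Automating index rotation usually means finding time-based indices that have fallen out of retention. The sketch assumes a hypothetical daily naming scheme `logs-YYYY.MM.DD` and only computes which indices are expired; the actual deletion or archiving call depends on the backend and is deliberately left out.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List

INDEX_PREFIX = "logs-"  # hypothetical daily index naming: logs-YYYY.MM.DD


def index_name(day: date) -> str:
    """Build the daily index name for a given date."""
    return f"{INDEX_PREFIX}{day:%Y.%m.%d}"


def expired_indices(names: Iterable[str], today: date, retention_days: int) -> List[str]:
    """Return indices older than the retention window, sorted by name (oldest first)."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        if not name.startswith(INDEX_PREFIX):
            continue  # leave indices outside the naming convention untouched
        try:
            day = datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but the date part is malformed
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

Skipping names that do not match the convention is a conservative choice: an automation job should never delete indices it does not recognize.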
  • Storing passwords or sensitive tokens in logs
  • Extending retention across the board instead of selective archiving
  • Triggering alerts on raw log errors without noise filtering
  • Forgetting timezone and NTP issues
  • Insufficient indexing strategy leads to slow searches
  • Undefined field conventions hinder correlation
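
Masking sensitive data early avoids the first pitfall above. The sketch below uses regular expressions to redact `key=value` secrets and bearer tokens before a line leaves the emitter or collector. The key names and token pattern are hypothetical examples; pattern-based masking only catches formats it knows about, so it complements rather than replaces not logging secrets in the first place.

```python
import re

# Hypothetical patterns; adapt them to the formats your systems actually emit.
_KEY_VALUE = re.compile(r"(?i)\b(password|passwd|token|secret|api_key)\b(\s*[=:]\s*)(\S+)")
_BEARER = re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/-]+=*")


def mask(line: str) -> str:
    """Replace secret values with *** while keeping the key for context."""
    line = _KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}***", line)
    return _BEARER.sub("Bearer ***", line)
```

Keeping the key name (for example `password=***`) preserves the information that a credential was present, which can itself be useful for audits.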

  • Knowledge of log formatting and parsing
  • Operating distributed collector and ingest systems
  • Security: masking, access control and auditing

  • Scalability of the ingest pipeline
  • Data retention and compliance requirements
  • Linking with tracing and metrics

  • Network bandwidth between collector and backend
  • Regulatory constraints on data storage
  • Budget for long-term archiving
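
Network bandwidth and archiving budget can be estimated from the ingest rate and the average event size. The sketch below does this arithmetic. The compression ratio and headroom factor are assumptions the reader must supply from their own measurements; the function names are hypothetical.

```python
def required_bandwidth_mbps(
    events_per_sec: float,
    avg_event_bytes: float,
    compression_ratio: float = 1.0,
    headroom: float = 1.0,
) -> float:
    """Megabits per second needed between collector and backend."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    bytes_per_sec = events_per_sec * avg_event_bytes / compression_ratio
    return bytes_per_sec * headroom * 8 / 1_000_000


def daily_volume_gb(events_per_sec: float, avg_event_bytes: float) -> float:
    """Uncompressed log volume per day in decimal gigabytes."""
    return events_per_sec * avg_event_bytes * 86_400 / 1_000_000_000
```

With illustrative inputs of 10,000 events per second at 500 bytes each, this yields 40 Mbit/s uncompressed and 432 GB per day, a figure that feeds directly into storage-cost and archiving decisions.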