Segments

Observability & Monitoring

Observability and monitoring are crucial for understanding and managing complex systems.

Method | Alerting & Incident Link

Alerting

A process for monitoring systems and notifying responders of critical events.

#Observability #Reliability
Method | Alerting & Incident Link

Incident Management

A systematic approach to identifying and resolving incidents in IT environments.

#Observability #Reliability
Concept | Alerting & Incident Link

On-Call

An organized team duty to respond to incidents and operational disruptions outside regular hours. Its purpose is rapid recovery, minimal downtime, and clear escalation paths.

#Reliability #Observability
Concept | Governance & Operational Practice

Error Budget Policy

A policy that defines a service's tolerable error budget and the organizational actions triggered when that budget is exceeded.

#Reliability #Governance
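
The arithmetic behind an error budget can be sketched as follows. This is a minimal illustration, not a prescribed policy: it assumes an event-based SLO (good versus bad requests in one window), and the function names, the 75% warning threshold, and the action labels are all hypothetical.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Summarize error-budget consumption for one SLO window.

    slo_target is the fraction of events that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= bad_events <= total_events:
        raise ValueError("need total_events > 0 and 0 <= bad_events <= total_events")

    # The budget is the number of events that are allowed to fail.
    allowed_bad = (1.0 - slo_target) * total_events
    consumed = bad_events / allowed_bad
    return {
        "allowed_bad_events": allowed_bad,
        "budget_consumed": consumed,          # 1.0 means fully spent
        "budget_exhausted": consumed >= 1.0,
    }


def policy_action(budget_consumed: float, warn_at: float = 0.75) -> str:
    """Map budget consumption to a (hypothetical) organizational action."""
    if budget_consumed >= 1.0:
        return "freeze_releases"
    if budget_consumed >= warn_at:
        return "review_reliability_work"
    return "continue"


# Example: a 99.9% target over 1,000,000 requests tolerates about 1,000 failures.
report = error_budget(0.999, 1_000_000, 500)
action = policy_action(report["budget_consumed"])
```

With 500 failed requests, roughly half of the budget is spent and the sketch returns the "continue" action; the real thresholds and actions are whatever the organization's policy defines.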
Concept | Governance & Operational Practice

Observability Practice

A conceptual guide for systematically capturing, correlating and analysing telemetry (metrics, traces, logs) to enable fast debugging and performance optimisation.

#Observability #Reliability
Concept | Governance & Operational Practice

Service Level Objective (SLO)

A Service Level Objective (SLO) is a measurable target for a service's reliability or performance, such as availability or latency, over a defined time window.

#Observability #Reliability
Concept | Instrumentation & Data Collection

Instrumentation

Strategic collection of telemetry from software and infrastructure to make behavior, performance and operational state measurable.

#Observability #Platform
Concept | Instrumentation & Data Collection

Telemetry Collection

Concept for systematically collecting and forwarding metrics, logs and traces to support observability and operations.

#Observability #Platform
Technology | Instrumentation & Data Collection

OpenTelemetry

Open standard and toolkit for instrumenting and collecting traces, metrics and logs via SDKs, collectors and exporters.

#Observability #Platform
Concept | Signals & Telemetry

Distributed Tracing

Technique for tracking and correlating requests across services to make performance issues and root causes in distributed systems visible.

#Observability #Reliability
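
Correlating a request across services depends on passing a shared trace identifier from caller to callee. The sketch below illustrates that idea using the W3C Trace Context `traceparent` header layout, which is an assumption here since the entry does not name a propagation format. The `SpanContext` class and the helper functions are hypothetical, standard-library-only stand-ins for what a tracing SDK would do.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str          # 32 hex chars, shared by every span in one request
    span_id: str           # 16 hex chars, unique per span
    sampled: bool = True

    def to_traceparent(self) -> str:
        """Serialize as version-traceid-spanid-flags for an outgoing call."""
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"


def start_trace() -> SpanContext:
    """Create the root span context at the edge of the system."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def parse_traceparent(header: str) -> SpanContext:
    """Read the caller's context from an incoming header."""
    parts = header.split("-")
    if len(parts) != 4:
        raise ValueError("malformed traceparent")
    version, trace_id, span_id, flags = parts
    if version != "00" or len(trace_id) != 32 or len(span_id) != 16 or len(flags) != 2:
        raise ValueError("unsupported or malformed traceparent")
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace: keep trace_id, mint a new span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


# Service A starts a trace and calls service B with the header attached.
root = start_trace()
header = root.to_traceparent()

# Service B continues the same trace with its own span.
incoming = parse_traceparent(header)
child = child_of(incoming)
```

Because both spans carry the same `trace_id`, a tracing backend can stitch them into one request timeline.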
Concept | Signals & Telemetry

Logs

Time-ordered records of events and state changes used for debugging, monitoring, and forensic analysis.

#Observability #Reliability
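
As an illustration of a time-ordered event record, the sketch below emits one JSON object per log line using only Python's standard `logging` module. The field names and the `context` key passed through `extra=` are assumptions chosen for this example, not a schema defined by the catalog.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"context": {...}}.
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# Write to an in-memory buffer so the output can be inspected.
buffer = io.StringIO()
log = build_logger("checkout", buffer)
log.info("payment failed", extra={"context": {"order_id": "A-17"}})
record = json.loads(buffer.getvalue())
```

Keeping every entry machine-parseable makes it easier to filter and correlate events later, for example by `order_id`.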
Concept | Signals & Telemetry

Metrics

Metrics are numeric measurements, typically recorded over time, used to measure and analyze the performance and efficiency of processes.

#Data #Analytics
Concept | Tracing & Service Insights

Dependency Mapping

Systematic capture and visualization of dependencies between components, services and teams to support architecture and decision-making processes.

#Architecture #Integration
Concept | Tracing & Service Insights

Distributed Tracing

Technique for tracking and correlating requests across services to make performance issues and root causes in distributed systems visible.

#Observability #Reliability
Concept | Tracing & Service Insights

Service Map

Visual representation of services and their runtime dependencies to analyze communication, impact and failure sources.

#Architecture #Observability
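
To make the impact analysis concrete, a service map can be modelled as a directed graph of caller-to-callee edges. The sketch below is a minimal, assumed representation: the service names and the functions `build_service_map` and `impacted_by` are hypothetical. It finds every service that depends, directly or transitively, on a failed one.

```python
from collections import defaultdict, deque


def build_service_map(calls):
    """Build caller -> set of callees from observed (caller, callee) pairs."""
    depends_on = defaultdict(set)
    for caller, callee in calls:
        depends_on[caller].add(callee)
    return depends_on


def impacted_by(depends_on, failed):
    """Return all services that directly or transitively call `failed`."""
    # Invert the edges so we can walk from a service to its dependents.
    dependents = defaultdict(set)
    for caller, callees in depends_on.items():
        for callee in callees:
            dependents[callee].add(caller)

    impacted = set()
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for dependent in dependents.get(service, ()):
            if dependent != failed and dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical runtime calls, e.g. derived from trace data.
calls = [
    ("web", "checkout"),
    ("checkout", "payments"),
    ("checkout", "inventory"),
    ("web", "search"),
]
service_map = build_service_map(calls)
affected = impacted_by(service_map, "payments")
```

Here a failure in `payments` affects `checkout` and, through it, `web`, while `search` and `inventory` are untouched.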
Concept | Visualization & Dashboards

Data Visualization

Data visualization is the graphical representation of data to make patterns, trends, and insights visible.

#Data #Analytics
Concept | Visualization & Dashboards

Observability Dashboard

Central dashboard for visualizing and analyzing telemetry (metrics, logs, traces) to enable rapid incident diagnosis and performance monitoring.

#Observability #Platform
Tool | Visualization & Dashboards

Grafana

Grafana is an open-source tool for visualizing and analyzing data.

#Data #Platform