concept #Security #Reliability #Observability

Incident Response

Structured process for detecting, analysing and containing security incidents and restoring normal operations.

Established
Medium

Classification

  • Medium
  • Organizational
  • Organizational
  • Intermediate

Technical context

  • Security Information and Event Management (SIEM)
  • Endpoint Detection and Response (EDR)
  • Ticketing and ChatOps systems for coordination

Principles & goals

  • Early preparation reduces response time.
  • Clear roles and communication channels are essential.
  • Lessons learned and continuous improvement close the loop.
Run
Enterprise, Team

Use cases & scenarios

Compromises

  • Misjudgements can lead to incorrect containment and follow-up issues.
  • Sensitive information may be exposed during response.
  • Over-automation can cause contextual analysis to be skipped.

  • Regular tabletop exercises with cross-functional teams.
  • Version and review playbooks after every incident.
  • Separate evidence preservation from recovery activities.
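
Keeping evidence preservation separate from recovery can be sketched as a hashing step that runs before any system is restored. The snippet below is a minimal illustration under stated assumptions, not a forensic tool: the function names `sha256_of` and `build_evidence_manifest` and the manifest layout are hypothetical and chosen only for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_evidence_manifest(evidence_dir: Path) -> list:
    """Record a hash and a collection time for every file below evidence_dir."""
    collected_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "file": str(path.relative_to(evidence_dir)),
            "sha256": sha256_of(path),
            "collected_at": collected_at,
        }
        for path in sorted(evidence_dir.rglob("*"))
        if path.is_file()
    ]
```

Storing the manifest away from the affected systems lets analysts verify later that recovery work did not alter the collected artifacts.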

I/O & resources

  • Telemetry from SIEM, EDR, network logs
  • Contact data and escalation matrix
  • Playbooks, runbooks and verification procedures
  • Containment and recovery actions
  • Forensic artifacts and analysis reports
  • Improvement actions and updated playbooks
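
The escalation matrix named among the inputs can be kept as plain data so that tooling and people read the same source. The following is a rough sketch: the severity levels, roles and deadlines are placeholders, and `EscalationStep`, `ESCALATION_MATRIX` and `due_notifications` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    role: str               # who has to be notified
    notify_within_min: int  # deadline in minutes after the incident is declared


# Hypothetical matrix: severities, roles and deadlines are placeholders.
ESCALATION_MATRIX = {
    "low": [EscalationStep("on-call analyst", 240)],
    "medium": [
        EscalationStep("on-call analyst", 60),
        EscalationStep("incident lead", 120),
    ],
    "high": [
        EscalationStep("on-call analyst", 15),
        EscalationStep("incident lead", 30),
        EscalationStep("security management", 60),
    ],
}


def due_notifications(severity: str, minutes_elapsed: int) -> list:
    """Return the roles whose notification deadline has been reached."""
    steps = ESCALATION_MATRIX[severity]
    return [s.role for s in steps if minutes_elapsed >= s.notify_within_min]
```

For example, `due_notifications("high", 30)` returns the on-call analyst and the incident lead, because both deadlines have passed after 30 minutes.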

Description

Incident response is a structured process for detecting, assessing and containing security incidents and restoring normal operations. It covers preparation, detection, analysis, containment, eradication, recovery and lessons learned. The goal is to minimise damage, enable rapid recovery and continuously strengthen organisational resilience.

  • Faster restoration of services after security incidents.
  • Reduction of damage scope and downtime.
  • Improved transparency and accountability within the organisation.

  • Requires continuous maintenance of playbooks and tools.
  • Depends on quality of underlying telemetry.
  • Response slows down when responsibilities are unclear.

  • Mean Time to Detect (MTTD)

    Average time from incident occurrence to detection.

  • Mean Time to Respond (MTTR)

    Average time from detection to initial response or containment.

  • Number of recurring incidents

    Count of incidents that reoccur after closure.
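
As a worked illustration, the three metrics above could be computed from incident records roughly as follows. The records and field names are invented for this sketch, and it assumes that response time is measured from detection to containment; a team may reasonably choose a different start or end point.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are assumptions for this sketch.
INCIDENTS = [
    {
        "id": "INC-1",
        "occurred": datetime(2024, 3, 1, 8, 0),
        "detected": datetime(2024, 3, 1, 8, 30),
        "contained": datetime(2024, 3, 1, 9, 30),
        "recurred_after_closure": False,
    },
    {
        "id": "INC-2",
        "occurred": datetime(2024, 3, 5, 10, 0),
        "detected": datetime(2024, 3, 5, 11, 30),
        "contained": datetime(2024, 3, 5, 12, 0),
        "recurred_after_closure": True,
    },
]


def mean_minutes(records, start_key, end_key):
    """Average duration in minutes between two timestamps across all records."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 60 for r in records)


mttd = mean_minutes(INCIDENTS, "occurred", "detected")   # occurrence to detection
mttr = mean_minutes(INCIDENTS, "detected", "contained")  # detection to containment (assumed)
recurring = sum(1 for r in INCIDENTS if r["recurred_after_closure"])

print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min, recurring incidents: {recurring}")
```

With the two sample records this gives an MTTD of 60 minutes, an MTTR of 45 minutes and one recurring incident.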

Organisation with dedicated CSIRT

A company operates a dedicated Computer Security Incident Response Team with clear escalation and communication processes.

Cloud service provider with playbooks

A cloud provider uses standardized playbooks for common incidents and automated runbooks to speed up recovery.

Small team with external incident support

A startup relies on external specialists for forensic analysis while focusing internal resources on coordination and communication.

  1. Establish an incident response team and allocate roles.
  2. Create and test playbooks for common incidents.
  3. Integrate telemetry sources and establish alerting.
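
The second step, creating and testing playbooks, can be supported by a small check that flags playbooks which have not been exercised recently. This is only a sketch: the `Playbook` fields, the function `needs_exercise` and the 180-day threshold are assumptions, not an established standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Playbook:
    name: str
    owner_role: str
    steps: List[str] = field(default_factory=list)
    last_tested: Optional[date] = None  # None means the playbook was never exercised


def needs_exercise(playbook: Playbook, today: date, max_age_days: int = 180) -> bool:
    """Flag playbooks that were never tested or whose last test is too old."""
    if playbook.last_tested is None:
        return True
    return today - playbook.last_tested > timedelta(days=max_age_days)


# Hypothetical playbooks used only to show the check.
phishing = Playbook(
    name="phishing",
    owner_role="incident lead",
    steps=["isolate mailbox", "reset credentials", "notify affected users"],
    last_tested=date(2024, 1, 10),
)
ransomware = Playbook(name="ransomware", owner_role="incident lead")

overdue = [p.name for p in (phishing, ransomware) if needs_exercise(p, date(2024, 3, 1))]
```

Here only the ransomware playbook ends up in `overdue`, because it has never been exercised.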

⚠️ Technical debt & bottlenecks

  • Outdated playbooks and missing automation scripts.
  • Fragmented log storage complicates correlation analysis.
  • Insufficient documentation of recovery processes.
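
Where logs live in separate stores, correlation usually starts by merging them into one timeline. The sketch below uses `heapq.merge` from the Python standard library and assumes that each source is already sorted by time and that timestamps share one timezone; the `Event` type and the sample events are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str
    message: str


def merged_timeline(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Hypothetical sample events from two separate log stores.
siem_events = [
    Event(datetime(2024, 3, 1, 9, 0), "siem", "multiple failed logins"),
    Event(datetime(2024, 3, 1, 9, 10), "siem", "login from new location"),
]
edr_events = [
    Event(datetime(2024, 3, 1, 9, 5), "edr", "suspicious process started"),
]

timeline = list(merged_timeline(siem_events, edr_events))
```

Because the merge is lazy, it also works on large exports that are read line by line rather than loaded into memory.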

  • Communication overhead
  • Skill shortage
  • Tool and data integration

  • Immediately restoring production systems without forensics.
  • Publicly communicating sensitive details during an ongoing investigation.
  • Automatically blocking accounts without escalation for legitimate exceptions.
  • Over-optimisation for speed instead of contextual quality.
  • Unclear severity criteria lead to misprioritisation.
  • Untested playbooks fail in real incidents.
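
Unclear severity criteria are easier to avoid when the criteria are written down as explicit, reviewable rules. The following sketch shows one possible shape; the fields, thresholds and severity labels are purely illustrative assumptions and would need to be agreed within each organisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    affected_systems: int
    sensitive_data_involved: bool
    service_outage: bool


def classify_severity(a: Assessment) -> str:
    """Map an initial assessment to a severity label using explicit rules."""
    # Illustrative thresholds only; each organisation defines its own.
    if a.sensitive_data_involved or (a.service_outage and a.affected_systems > 10):
        return "high"
    if a.service_outage or a.affected_systems > 1:
        return "medium"
    return "low"
```

Keeping the rules in one small function makes them easy to review after each incident and to cover with tests, so that changes to the criteria are deliberate rather than accidental.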

  • Fundamentals of IT forensics and log analysis
  • Communication and crisis management skills
  • Knowledge of relevant compliance and reporting obligations

  • Reliable telemetry and log consistency
  • Fast communication and escalation paths
  • Repeatable, tested playbooks and runbooks

  • Limited forensic capacity for parallel incidents.
  • Regulatory requirements for data retention and reporting.
  • Restricted access to historical telemetry data.