A guide to incident response playbooks

Eliott Ardisson

Founder & CEO - Basalt Studio

A practical guide to building incident response playbooks: how to structure them, assign roles, automate workflows, and maintain them over time.

TL;DR

  • An incident response playbook is a documented, step-by-step procedure for handling a specific type of security incident — it replaces improvised coordination with a tested process.
  • Start by mapping your actual threat landscape before writing a single procedure. The playbooks you build first should reflect the incidents your team already handles most often.
  • Clear role assignments matter as much as the technical steps. During an active incident, ambiguity about who decides what can do as much damage as the threat itself.
  • Automation is worth introducing incrementally — start with alert routing and notifications, not full autonomous response.
  • Playbooks decay. A procedure written for last year’s infrastructure and last year’s team is a liability, not an asset. Build review cycles in from the start.

What an Incident Response Playbook Actually Is

An incident response playbook is a structured document that tells your team exactly what to do when a specific security incident occurs. It specifies who acts, in what sequence, with which tools, within what timeframes, and how decisions get escalated.

That definition sounds obvious. In practice, most organizations conflate playbooks with general incident response policies, or produce documents so abstract they provide no actionable guidance under real pressure. A useful playbook is operationally specific: it names tools, assigns actions to roles, and includes the exact communication templates your team will use at 2 AM.

The six phases popularized by the SANS incident handling model and echoed in most frameworks (preparation, identification, containment, eradication, recovery, and lessons learned) are a useful skeleton. But a phase list is not a playbook. A playbook tells your on-call engineer what to do in the first fifteen minutes of a ransomware event, before the incident commander even picks up the phone.


Why This Matters More Than the Theory Suggests

IBM’s annual Cost of a Data Breach report consistently shows a meaningful gap in breach costs between organizations with mature incident response capabilities and those without. The gap isn’t a rounding error — it reflects faster containment, fewer repeat compromises, and stronger regulatory standing.

But the financial argument, while real, undersells the practical one. When an incident hits, your team is operating under stress, often with incomplete information, and sometimes at odd hours. A playbook doesn’t eliminate those conditions. It reduces the cognitive load enough that your team can function effectively despite them.

Without structured procedures, a few things reliably go wrong:

  • Delayed initial response. Manual coordination can add hours to the time between detection and containment. That window is when threats spread.
  • Inconsistent actions. Two engineers responding independently may take conflicting containment steps — one isolates a system, the other keeps it online for monitoring, and the result is neither.
  • Communication drift. Stakeholders receive different status updates from different people. Executives make decisions based on stale information. Customers hear things before legal has reviewed them.
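  • Compliance failures. Breach notification rules impose hard deadlines that vary by framework: GDPR requires notifying the supervisory authority within 72 hours of becoming aware of a breach where feasible, while HIPAA requires notification without unreasonable delay and no later than 60 days after discovery. Poor coordination is a common reason organizations miss them.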

Structured playbooks address all four. They don’t require a large security budget to produce. The constraint is time and discipline, not spend.


Before You Write Anything: Assess What You’re Actually Dealing With

Playbook development fails when organizations skip the assessment phase and start writing procedures from memory or generic templates. The result is documentation that describes how someone imagines incidents unfold, not how they actually do in your environment.

Run a structured review of the past two to three years of security incidents first. You’re looking for patterns: which incident types occur most often, which attack vectors they used, which systems were affected, how long response took, and where the process broke down. This is your threat landscape, and it should drive playbook prioritization.
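
A minimal sketch of that review, assuming your incident history can be exported as simple records. The field names (`type`, `hours_to_contain`) and the sample data are hypothetical and not tied to any particular ticketing tool.

```python
from collections import defaultdict
from statistics import median

# Hypothetical export of past incidents; field names are assumptions.
incidents = [
    {"type": "phishing", "hours_to_contain": 3.0},
    {"type": "phishing", "hours_to_contain": 5.0},
    {"type": "account_compromise", "hours_to_contain": 12.0},
    {"type": "phishing", "hours_to_contain": 4.0},
    {"type": "malware", "hours_to_contain": 20.0},
]


def prioritize(records):
    """Rank incident types by frequency, then by median containment time."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec["type"]].append(rec["hours_to_contain"])
    summary = [
        {"type": t, "count": len(hours), "median_hours": median(hours)}
        for t, hours in by_type.items()
    ]
    # Most frequent first; slower containment breaks ties.
    return sorted(summary, key=lambda s: (-s["count"], -s["median_hours"]))


ranking = prioritize(incidents)
for row in ranking:
    print(row)
```

The types at the top of the ranking are reasonable candidates for your first playbooks. A real review would also look at attack vectors, affected systems, and where the process broke down, which a count alone cannot show.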

Alongside that, audit your current response capabilities honestly:

  • Team coverage. Who participates in incident response? Is there genuine 24/7 availability, or does coverage drop off on evenings and weekends?
  • Tool inventory. List every security tool, monitoring system, and communication platform your team uses during incidents. Note which are well-integrated and which require manual context-switching.
  • Existing documentation. Most organizations have some procedures written down somewhere. Identify what exists, assess how current it is, and flag where the gaps are.
  • Skills. Where does your team have strong technical depth? Where are the gaps that would require external support during a complex incident?

This assessment shapes everything downstream. Without it, you’re writing playbooks for a hypothetical organization, not yours.


Building a Classification Framework

Before playbooks, you need a consistent way to categorize incidents. Classification determines which playbook applies and what response tier activates.

Most organizations work with a combination of incident type and severity level.

Incident types might include: malware infections, phishing campaigns, data breaches, insider threats, distributed denial-of-service attacks, account compromises, and unauthorized system access. These categories should map to your actual threat history, not a generic taxonomy.

Severity levels are typically four-tiered (critical, high, medium, low) and defined by a combination of factors:

  • How many users or systems are affected
  • Whether sensitive data has been exposed or exfiltrated
  • Whether core business operations are impaired
  • What regulatory reporting obligations are triggered
  • What the estimated financial exposure is

The value of a written severity matrix is that it removes subjective judgment from a moment when people are under pressure. An engineer at midnight shouldn’t have to decide unilaterally whether something is a critical incident. The matrix tells them.

Pair your severity levels with defined response timeframes. A critical incident might, for example, warrant a 15-minute initial response and a 30-minute stakeholder notification, while lower-severity incidents can tolerate longer windows. Write those timeframes down and make them known: they create accountability without requiring surveillance.
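
One way to write a severity matrix down is as a small set of explicit rules. The sketch below is illustrative only: the rule ordering, the ten-system threshold, and every response window other than the critical one are placeholder assumptions, and financial exposure is left out for brevity.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Signals:
    systems_affected: int
    sensitive_data_exposed: bool
    core_operations_impaired: bool
    regulatory_reporting_triggered: bool


# (initial response, stakeholder notification) per tier.
# Only the critical row uses the example figures from the text;
# the other rows are placeholders to replace with your own.
RESPONSE_WINDOWS = {
    "critical": (timedelta(minutes=15), timedelta(minutes=30)),
    "high": (timedelta(hours=1), timedelta(hours=2)),
    "medium": (timedelta(hours=4), timedelta(hours=8)),
    "low": (timedelta(hours=24), timedelta(hours=48)),
}


def classify(s: Signals) -> str:
    """Map observed signals to a severity tier using fixed, written-down rules."""
    if s.sensitive_data_exposed and (
        s.core_operations_impaired or s.regulatory_reporting_triggered
    ):
        return "critical"
    if s.sensitive_data_exposed or s.core_operations_impaired:
        return "high"
    if s.systems_affected > 10:  # placeholder threshold
        return "medium"
    return "low"


tier = classify(Signals(3, True, False, True))
initial, notify = RESPONSE_WINDOWS[tier]
print(tier, initial, notify)
```

The point of encoding the rules is not the code itself but that the decision no longer depends on who happens to be on call.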


How to Structure an Individual Playbook

Each playbook should cover one incident type in enough procedural detail that a competent team member who hasn’t handled this exact scenario before can follow it effectively.

The core components:

  • Incident description. What this playbook covers and the conditions that trigger it.
  • Prerequisites. Required access, tools, and personnel before response begins.
  • Phase-by-phase procedures. Step-by-step actions for each response phase, with named roles attached to each step.
  • Decision points. Explicit criteria for escalating severity, involving external resources, or changing approach.
  • Communication templates. Pre-written messages for internal notifications, executive briefings, customer communications, and regulatory notifications. These should be drafted in advance, not improvised during an incident.
  • Documentation requirements. What information to capture, in what format, and where to store it.

To make this concrete, a phishing response playbook at a mid-sized accounting firm might look like this in practice:

  • First fifteen minutes. The security tool quarantines reported emails automatically and the on-call engineer receives an alert.
  • Fifteen to sixty minutes. That engineer analyzes headers, identifies whether credentials were harvested, pulls the list of users who interacted with the message, and assesses scope.
  • First four hours. Any compromised accounts get password resets, malicious URLs are blocked at the proxy level, and a user notification goes out.
  • Following twenty-four hours. Legitimate quarantined emails are reviewed and released, accounts are re-enabled after verification, and the incident is documented for post-incident review.

The specific times and tools will differ by organization. The point is that this level of specificity is what makes a playbook usable under pressure.
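
If you keep playbooks in a structured form, the components above map onto a small data model. This is a sketch under assumptions: the class and field names are hypothetical, and the role attached to each step in the phishing example is an illustrative guess where the scenario above does not name one.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    window: str  # time window in which the step should happen
    role: str    # role that owns the step
    action: str


@dataclass
class Playbook:
    incident_type: str
    trigger: str
    prerequisites: list[str]
    steps: list[Step] = field(default_factory=list)
    templates: dict[str, str] = field(default_factory=dict)

    def steps_for(self, role: str) -> list[Step]:
        """Return the steps a given role owns, in playbook order."""
        return [s for s in self.steps if s.role == role]

    def unowned_steps(self) -> list[Step]:
        """Steps with no role attached; fix these before relying on the playbook."""
        return [s for s in self.steps if not s.role.strip()]


phishing = Playbook(
    incident_type="phishing",
    trigger="User reports a suspicious email",
    prerequisites=["mail gateway admin access", "proxy admin access"],
    steps=[
        Step("0-15 min", "automation", "Quarantine reported emails"),
        Step("15-60 min", "on-call engineer", "Analyze headers and assess scope"),
        Step("0-4 h", "technical lead", "Reset passwords on compromised accounts"),
        Step("0-4 h", "technical lead", "Block malicious URLs at the proxy"),
        Step("0-4 h", "communications lead", "Send user notification"),
        Step("4-24 h", "on-call engineer", "Release legitimate quarantined emails"),
    ],
    templates={"user_notification": "We are investigating a phishing email..."},
)

for step in phishing.steps_for("technical lead"):
    print(step.window, step.action)
```

A structure like this makes it easy to check that every step has an owner and to give each role a view of only its own actions.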


Roles and Responsibilities: The Part Most Teams Get Wrong

The technical procedures matter. The role assignments matter more, because the procedures only execute correctly when the right person owns each step.

Define at minimum: an incident commander who holds overall coordination authority, a technical lead responsible for hands-on response, a communications lead who owns all internal and external messaging, a legal or compliance contact for regulatory questions, and a business stakeholder who can make continuity decisions.

Create a RACI matrix for your most consequential activities — initial assessment, technical containment, stakeholder notification, regulatory filing, and business continuity decisions. Map each activity to who is Responsible, Accountable, Consulted, and Informed. This is especially important for activities that cross organizational boundaries.
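
A RACI matrix can be kept as plain data and checked mechanically. The assignments below are hypothetical, and the check assumes the common convention that each activity has exactly one Accountable role and at least one Responsible role.

```python
# Hypothetical assignments; replace with your own roles and activities.
RACI = {
    "initial assessment": {
        "R": ["technical lead"], "A": ["incident commander"],
        "C": [], "I": ["communications lead"],
    },
    "technical containment": {
        "R": ["technical lead"], "A": ["incident commander"],
        "C": ["business stakeholder"], "I": ["communications lead"],
    },
    "stakeholder notification": {
        "R": ["communications lead"], "A": ["incident commander"],
        "C": ["legal contact"], "I": ["business stakeholder"],
    },
    "regulatory filing": {
        "R": ["legal contact"], "A": ["legal contact"],
        "C": ["incident commander"], "I": ["communications lead"],
    },
}


def validate(raci):
    """Return a list of human-readable problems; an empty list means the matrix passes."""
    problems = []
    for activity, roles in raci.items():
        if len(roles.get("A", [])) != 1:
            problems.append(f"{activity}: needs exactly one Accountable role")
        if not roles.get("R"):
            problems.append(f"{activity}: needs at least one Responsible role")
    return problems


print(validate(RACI))
```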

On-call and escalation procedures deserve their own section. Specify who can initiate a response without waiting for management approval, when to escalate to senior leadership, and who holds decision authority when primary contacts are unavailable. Backup contacts should be current — this is a maintenance issue as much as a design issue.
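
The fallback logic for unavailable contacts can be expressed as an ordered chain per role. This is a sketch only: the contact names are placeholders, and how you determine who is reachable (pager acknowledgement, calendar, status page) is left as an assumption.

```python
# Ordered escalation chains per role; names are placeholders.
ESCALATION = {
    "incident commander": ["primary_ic", "backup_ic", "head_of_engineering"],
    "communications lead": ["primary_comms", "backup_comms"],
}


def resolve_contact(role, available):
    """Return the first reachable person in the role's chain, or None if nobody is."""
    for person in ESCALATION.get(role, []):
        if person in available:
            return person
    return None


# Example: the primary incident commander is unreachable.
reachable = {"backup_ic", "primary_comms"}
print(resolve_contact("incident commander", reachable))
print(resolve_contact("communications lead", reachable))
```

A `None` result is itself a finding: it means the chain for that role has no working backup and needs maintenance.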


Where Automation Fits In

Manual incident response has a ceiling. Humans can’t process alert volumes at machine speed, can’t maintain 24/7 attention, and will introduce variation under stress. Automation addresses all three.

The right approach is incremental, not wholesale. Start with tasks that are high-volume, low-judgment, and well-defined:

  • Routing and triaging alerts based on type and severity
  • Quarantining reported phishing emails
  • Sending initial notifications to the response team
  • Creating incident tickets with pre-populated fields
  • Collecting system logs and initial forensic evidence

These automations reduce the manual burden on your first responders and compress the time between detection and initial containment. They’re achievable with workflow tools and API integrations without requiring a dedicated SOAR platform.
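
As an illustration of the first and fourth items in the list above, here is a minimal routing and ticket-building sketch. The queue names, alert fields, and ticket fields are all hypothetical; in practice the resulting dictionary would be sent to whatever ticketing or paging system you already use.

```python
from datetime import datetime, timezone

# Hypothetical routing table keyed by (incident type, severity).
ROUTES = {
    ("phishing", "high"): "security-on-call",
    ("malware", "high"): "security-on-call",
    ("ddos", "high"): "infrastructure-on-call",
}
CRITICAL_ROUTE = "incident-commander-pager"
DEFAULT_ROUTE = "security-triage-queue"


def route(alert):
    """Pick a destination queue; critical alerts always page the commander."""
    if alert["severity"] == "critical":
        return CRITICAL_ROUTE
    return ROUTES.get((alert["type"], alert["severity"]), DEFAULT_ROUTE)


def build_ticket(alert, now=None):
    """Create a ticket dictionary with fields pre-populated from the alert."""
    now = now or datetime.now(timezone.utc)
    return {
        "title": f"[{alert['severity'].upper()}] {alert['type']} on {alert['source']}",
        "assigned_queue": route(alert),
        "playbook": f"{alert['type']}-playbook",
        "opened_at": now.isoformat(),
        "status": "open",
    }


ticket = build_ticket(
    {"type": "phishing", "severity": "critical", "source": "mail-gateway"}
)
print(ticket["title"], "->", ticket["assigned_queue"])
```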

As your processes mature, you can extend automation to more complex actions — account suspension based on behavioral signals, automated network segmentation, threat intelligence correlation. At each step, maintain human override capability and monitor for false positives. Automation that fires incorrectly in production causes its own incidents.
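
Human override can be built in as an explicit gate in front of every automated action. The sketch below is one possible shape, not a prescribed design: the confidence score, the 0.95 threshold, and the notion of a "disruptive" action are assumptions you would define for your own environment.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    name: str
    target: str
    confidence: float  # detector confidence between 0.0 and 1.0 (assumed input)
    disruptive: bool   # True if the action can interrupt users or services


AUTO_APPROVE_THRESHOLD = 0.95  # placeholder; tune against your false-positive rate


def decide(action, kill_switch_on=False):
    """Return 'execute' or 'hold_for_human' for a proposed automated action."""
    if kill_switch_on:
        # A global override pauses all automation during an investigation.
        return "hold_for_human"
    if action.disruptive:
        # Disruptive actions always need a person to approve them.
        return "hold_for_human"
    if action.confidence >= AUTO_APPROVE_THRESHOLD:
        return "execute"
    return "hold_for_human"


print(decide(ProposedAction("quarantine_email", "msg-123", 0.99, False)))
print(decide(ProposedAction("suspend_account", "j.doe", 0.99, True)))
```

Logging every held and executed decision gives you the data to monitor false positives before widening what runs unattended.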

In our work helping founder-led professional services firms build incident response workflows, the most common failure point isn’t the automation itself — it’s deploying automation before the underlying manual process is well-understood. Automate a broken process and you get broken results faster.


Keeping Playbooks Current

A playbook written once and filed is not an asset. It becomes a liability the moment your environment changes in ways it doesn’t reflect.

Build maintenance into the system from the start:

  • Quarterly reviews to update personnel, tools, and procedures
  • Immediate updates after infrastructure changes, tool replacements, or significant organizational shifts
  • Post-incident reviews that feed findings back into playbook revisions
  • Annual testing that includes tabletop exercises for each major playbook and at least one full simulation

Tabletop exercises are underused. They surface gaps in procedures that are invisible on paper — unclear escalation triggers, ambiguous role boundaries, communication templates that don’t fit the actual scenario. Run them regularly, debrief honestly, and use the findings.
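
The quarterly review cadence is easy to enforce with a small check that flags overdue playbooks. This sketch assumes you record a last-reviewed date per playbook; the playbook names and dates are made up, and 90 days stands in for "quarterly".

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # stand-in for a quarterly cadence

# Hypothetical record of when each playbook was last reviewed.
last_reviewed = {
    "phishing": date(2024, 4, 15),
    "ransomware": date(2024, 1, 10),
    "account-compromise": date(2023, 11, 2),
}


def stale_playbooks(reviews, today):
    """Return the names of playbooks whose last review is older than the interval."""
    return sorted(
        name for name, reviewed in reviews.items()
        if today - reviewed > REVIEW_INTERVAL
    )


overdue = stale_playbooks(last_reviewed, today=date(2024, 6, 1))
print(overdue)
```

Run on a schedule, a check like this turns the review cycle from an intention into a recurring task with a named list of documents to update.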


Common Pitfalls

A few patterns appear consistently in organizations that invest in playbooks but don’t get the expected results:

Over-engineering the first draft. The goal of your first playbook is to cover 80% of a common scenario, not 100% of every possible variant. Complexity that doesn’t get used in real incidents adds cognitive load without adding protection.

Skipping communication planning. Technical teams focus on containment and recovery and treat communication as secondary. Regulators and customers do not share that prioritization. Every playbook should include communication procedures with equal rigor to the technical steps.

Testing only in isolation. Running a tabletop exercise for the security team without involving legal, communications, or business leadership misses half the failure modes. Cross-functional exercises are harder to schedule and more valuable.

Treating tools as the solution. A SOAR platform or workflow automation tool doesn’t create incident response capability. It operationalizes capability you’ve already built through well-designed procedures and trained people. In that order.


Getting Started Without Overcomplicating It

If your organization has no formal playbooks today, the path forward is straightforward. Identify the three incident types your team encounters most frequently. Write one-page procedures for each, focused on the first four hours of response. Define roles, attach timeframes, and include one communication template per playbook. Test each one in a tabletop exercise before treating it as production-ready.

That’s a starting point, not a finished product. But it’s a functional one — and it will surface gaps that you can address in subsequent iterations.

For organizations further along, the priorities shift toward testing rigor, automation expansion, and cross-functional integration. The work is never fully done, which is why maintenance cadence matters as much as initial design.


If you’re working through this for the first time or reassessing a process that’s grown stale, Basalt Studio runs AI strategy sessions to help founder-led businesses map their operational gaps and identify where structured workflows — including incident response — can be automated effectively. You can book a call at https://cal.com/eliott-ardisson-kzq7zs/ai-strategy-call if that’s a useful conversation to have.