Incident Response Procedures
Overview
Incident response for AI systems requires specialized procedures adapted to the unique characteristics of machine learning systems. Unlike traditional software incidents, AI incidents may involve data poisoning, model manipulation, or subtle performance degradation that is harder to detect and diagnose.
This lesson covers practical procedures for detecting, containing, and responding to AI-specific incidents.
Incident Categories and Response Triggers
Category 1: Performance Degradation
Characteristics:
- Model accuracy drops unexpectedly
- System produces incorrect or nonsensical outputs
- Latency increases or timeouts occur
- System becomes unreliable or unusable
Triggers for Incident Response:
Performance Degradation Thresholds:
Accuracy Incident:
- "Drop > 5% from baseline in any 24-hour period"
- "Drop > 10% cumulative over 1 week"
- "Accuracy below published minimum specification"
- "Accuracy disparity between subgroups widens suddenly"
Availability Incident:
- "System downtime > 30 minutes"
- "Error rate exceeds 1% of requests"
- "Response latency > 3x normal"
- "Inability to serve predictions to > 5% of users"
Output Quality Incident:
- "Nonsensical outputs (complete gibberish)"
- "Toxic or harmful content generation"
- "Outputs abruptly contradict previous behavior"
- "Known false outputs detected"
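These thresholds can be encoded directly into monitoring. A minimal sketch, assuming drops are measured relative to a recorded baseline; metric names and values are illustrative:

```python
# Sketch: evaluate the trigger thresholds above against current metrics.
# The `metrics`/`baseline` keys are assumptions, not a real monitoring API.

def check_performance_triggers(metrics: dict, baseline: dict) -> list:
    """Return the incident triggers that fire for the current metrics."""
    triggers = []
    if (baseline["accuracy"] - metrics["accuracy_24h"]) / baseline["accuracy"] > 0.05:
        triggers.append("accuracy drop > 5% in 24 hours")
    if (baseline["accuracy"] - metrics["accuracy_7d"]) / baseline["accuracy"] > 0.10:
        triggers.append("accuracy drop > 10% cumulative over 1 week")
    if metrics["error_rate"] > 0.01:
        triggers.append("error rate exceeds 1% of requests")
    if metrics["latency_ms"] > 3 * baseline["latency_ms"]:
        triggers.append("response latency > 3x normal")
    return triggers

fired = check_performance_triggers(
    {"accuracy_24h": 0.86, "accuracy_7d": 0.90, "error_rate": 0.02, "latency_ms": 450},
    {"accuracy": 0.93, "latency_ms": 120},
)
```

Any non-empty result would open an incident at the severity of the most serious trigger fired.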
Immediate Actions (0-30 minutes):
1. Verification: Confirm the incident with independent testing
   - Run a performance test against a known-good dataset
   - Compare current metrics to the recent baseline
   - Verify across multiple instances/regions
2. Impact assessment: Understand scope and severity
   - How many users/decisions are affected?
   - What is the impact of incorrect decisions?
   - Is there risk of imminent harm?
   - Can the system safely continue operating?
3. Initial containment:
   - Severe cases: pause the system and route requests to human review
   - Moderate cases: increase monitoring frequency
   - Preserve the current model state for forensics
   - Begin capturing detailed logs
Example Response:
# Performance Degradation Incident Response
import logging
from datetime import datetime


class PerformanceDegradationHandler:
    def __init__(self, ai_system):
        self.system = ai_system
        self.logger = logging.getLogger(__name__)

    def handle_accuracy_drop(self, current_accuracy, baseline):
        """Handle detected accuracy degradation."""
        drop_pct = ((baseline - current_accuracy) / baseline) * 100

        if drop_pct > 10:  # Critical
            severity = "CRITICAL"
            action = self.pause_system()
        elif drop_pct > 5:  # High
            severity = "HIGH"
            action = self.increase_human_review()
        else:
            severity = "MEDIUM"
            action = self.increase_monitoring()

        incident = {
            'timestamp': datetime.now(),
            'type': 'Performance Degradation',
            'severity': severity,
            'baseline_accuracy': baseline,
            'current_accuracy': current_accuracy,
            'drop_pct': drop_pct,
            'action_taken': action
        }
        self.logger.error(f"Incident: {incident}")
        self.escalate_to_incident_commander(incident)
        return incident

    def pause_system(self):
        """Pause AI system and route to human review."""
        # Update routing: all new requests → human review
        # Stop new model deployments
        # Alert on-call team
        return "System paused; routing to human review"

    def increase_human_review(self):
        """Increase human oversight."""
        # Review all decisions (vs. random sampling)
        # Lower confidence threshold for human review
        # Alert supervisors of potential issue
        return "Human review threshold increased"

    def increase_monitoring(self):
        """Increase monitoring frequency."""
        # Move to real-time monitoring (vs. hourly)
        # Add additional metrics
        # Alert if condition worsens
        return "Monitoring frequency increased"

    def escalate_to_incident_commander(self, incident):
        """Escalate to incident command."""
        # Page on-call incident commander
        # Create incident ticket
        # Notify affected stakeholders
        pass
Category 2: Bias and Discrimination Incidents
Characteristics:
- Disproportionate impact on protected groups
- Disparate impact ratio drops below regulatory threshold (e.g., 0.80 for fair lending)
- Pattern of unfair or discriminatory decisions
- User complaints about discrimination
Triggers for Incident Response:
Bias and Discrimination Incident Triggers:
Statistical Evidence:
- "Disparate impact ratio < 0.80 (80% rule)"
- "False negative rate differs > 5% between groups"
- "Unexplained performance gap between demographics"
- "Sudden shift in group treatment"
User Complaints:
- "User reports discriminatory treatment"
- "Social media or press coverage of potential bias"
- "Complaint to regulatory authority"
- "Pattern of similar complaints"
Audit Findings:
- "Bias testing reveals discrimination"
- "Data source audit finds discriminatory data"
- "Model interpretation shows proxy discrimination"
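The 80% rule in the first trigger is a simple ratio check: the approval rate of the least-favored group divided by that of the most-favored group. A minimal sketch with illustrative group names and counts:

```python
# Sketch: the 80% (four-fifths) rule for disparate impact.
# Group labels and counts are illustrative.

def disparate_impact_ratio(approvals: dict, applicants: dict) -> float:
    """Ratio of the lowest group approval rate to the highest."""
    rates = {g: approvals[g] / applicants[g] for g in applicants}
    return min(rates.values()) / max(rates.values())

ratio = disparate_impact_ratio(
    approvals={"group_a": 420, "group_b": 290},
    applicants={"group_a": 600, "group_b": 580},
)
incident = ratio < 0.80  # the 80% rule trigger from the list above
```

Here the rates are 70% and 50%, giving a ratio of about 0.71, which fires the trigger.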
Immediate Actions (0-1 hour):
1. Validation: Confirm the bias allegation
   - Run a fair lending or fairness analysis
   - Segment decisions by protected characteristics
   - Calculate disparate impact and fairness metrics
   - Check for known proxies (ZIP code, etc.)
2. Impact determination: Who is affected?
   - Count the individuals/decisions affected
   - Calculate the scope of discriminatory treatment
   - Assess harm to affected individuals
   - Determine whether this is a pattern or an isolated incident
3. Containment: Prevent further harm
   - If a pattern is confirmed: halt new unfair decisions
   - Route affected cases to human review
   - Preserve decision records for affected individuals
   - Prepare for notification/remediation
Example Response Procedure:
Bias Incident Response Workflow:
Step 1 - Verification (< 30 min):
Activities:
- "Pull recent decision data (last 30 days)"
- "Segment by protected characteristics"
- "Calculate approval rates and performance metrics by group"
- "Statistical test for significance"
Success Criteria:
- "Bias confirmed or ruled out with confidence"
Step 2 - Impact Assessment (< 1 hour):
Activities:
- "Identify all affected individuals"
- "Quantify decisions that were unfair"
- "Calculate potential economic harm"
- "Assess regulatory implications"
Success Criteria:
- "Impact clearly understood"
- "Affected population identified"
Step 3 - Root Cause Analysis (< 4 hours):
Activities:
- "Examine training data for bias"
- "Analyze features used by model"
- "Check for proxy discrimination"
- "Review recent changes/updates"
Possible Root Causes:
- "Training data reflects historical discrimination"
- "Feature correlates with protected attribute"
- "Model uses illegal proxy variables"
- "Recent data or model change introduced bias"
Step 4 - Remediation Planning (< 24 hours):
Options:
1. "Remove discriminatory feature"
2. "Retrain with debiasing techniques"
3. "Increase human oversight for affected group"
4. "Revert to previous model version"
5. "Retire system pending redesign"
Step 5 - Affected Individual Notification (varies):
Requirements:
- "Notify individuals affected by discrimination"
- "Explain what happened and why"
- "Offer remediation (reverse decision, offer product)"
- "Provide appeal mechanism"
- "Communicate with regulators as required"
Step 6 - Monitoring and Verification (ongoing):
Activities:
- "Monitor fairness metrics closely"
- "Quarterly bias audits for 6 months"
- "User feedback monitoring"
- "Regulatory correspondence"
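The "statistical test for significance" in Step 1 is commonly a two-proportion z-test on approval rates between groups. A sketch with illustrative counts; the 1.96 cutoff corresponds to roughly 95% confidence:

```python
import math

# Sketch: two-proportion z-test for the Step 1 significance check.
# Counts are illustrative, not real decision data.

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference in approval rates between two groups."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(420, 600, 290, 580)
significant = abs(z) > 1.96  # ~95% confidence
```

A significant result confirms the disparity is unlikely to be sampling noise; it says nothing yet about the root cause, which Step 3 investigates.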
Category 3: Security and Data Incidents
Characteristics:
- Unauthorized access to models or training data
- Model poisoning or manipulation
- Training data exfiltration or breach
- Adversarial attack damaging model
Triggers for Incident Response:
Security Incident Triggers:
Access Control Violations:
- "Unauthorized model download or access"
- "Training data accessed outside normal use"
- "Model weights modified without authorization"
- "Suspicious access patterns detected"
Data Breach:
- "Personal data from training set exfiltrated"
- "Training data dumped publicly"
- "Ransomware targeting model repository"
- "Credential compromise affecting AI system"
Model Attacks:
- "Model behavior changes unexpectedly (potential poisoning)"
- "Adversarial examples causing failures"
- "Prompt injection attacks on language models"
- "Model extraction attack detected"
Supply Chain:
- "Third-party component vulnerability"
- "Malicious dependency in model dependencies"
- "Compromised model from vendor"
Immediate Actions (0-15 minutes):
1. Containment: Stop the bleeding
   - Revoke compromised credentials immediately
   - Isolate affected systems from the network
   - Pause affected models; route requests to human review
   - Secure backup copies of clean models and data
2. Preservation: Gather evidence
   - Copy system logs and monitoring data
   - Preserve access logs for the last 30 days
   - Take snapshots of affected systems
   - Document the current system state
   - Begin forensic imaging
3. Response activation:
   - Page the security incident response team
   - Notify the incident commander
   - Begin the forensic investigation
   - Prepare communication templates
Security Incident Containment:
# AI Security Incident Containment
class AISecurityIncidentHandler:
    def handle_model_compromise(self, detected_at):
        """Handle potential model poisoning/manipulation."""
        # Phase 1: Immediate containment
        actions = {
            'pause_system': self.pause_model_serving(),
            'isolate_systems': self.isolate_compute_nodes(),
            'revoke_access': self.revoke_recent_credentials(),
            'preserve_logs': self.preserve_system_logs(),
            'notify_team': self.page_security_team()
        }

        # Phase 2: Forensic investigation
        forensics = {
            'model_integrity': self.verify_model_integrity(),
            'data_integrity': self.verify_training_data_integrity(),
            'access_logs': self.analyze_access_logs(),
            'change_history': self.audit_recent_changes(),
            'threat_assessment': self.assess_threat_indicators()
        }

        # Phase 3: Recovery decision
        recovery_plan = self.determine_recovery_strategy(forensics)

        return {
            'detected_at': detected_at,
            'containment': actions,
            'forensics': forensics,
            'recovery_plan': recovery_plan
        }

    def pause_model_serving(self):
        """Immediately stop model serving."""
        # Stop inference endpoints
        # Clear model from memory
        # Enable fallback/human review
        return "Model serving paused"

    def isolate_compute_nodes(self):
        """Isolate potentially compromised systems."""
        # Disconnect from network
        # Stop pulling from repositories
        # Enable forensic access only
        return "Systems isolated for forensics"

    def revoke_recent_credentials(self):
        """Revoke credentials that could enable further attack."""
        # Invalidate API keys
        # Reset database passwords
        # Revoke data access tokens
        # Expire SSH keys
        return "Credentials revoked"

    def preserve_system_logs(self):
        """Preserve logs before they rotate or are tampered with."""
        return "System logs preserved"

    def page_security_team(self):
        """Page the on-call security incident response team."""
        return "Security team paged"

    def verify_model_integrity(self):
        """Verify the model hasn't been modified."""
        # Calculate checksum of saved model
        # Compare to last known good checkpoint
        # Check model weights for anomalies
        return {'compromised': False, 'notes': 'Model integrity assessment'}

    def verify_training_data_integrity(self):
        """Verify training data hasn't been poisoned."""
        # Check data file integrity
        # Verify data lineage
        # Look for suspicious additions
        return {'poisoned': False, 'notes': 'Data integrity assessment'}

    def analyze_access_logs(self):
        """Analyze who accessed the system and when."""
        # Timeline of access events
        # Unusual access patterns
        # Privileged operations
        return "Access log analysis"

    def audit_recent_changes(self):
        """Audit recent code, configuration, and model changes."""
        return "Change history audit"

    def assess_threat_indicators(self):
        """Check indicators of compromise against threat intelligence."""
        return "Threat assessment"

    def determine_recovery_strategy(self, forensics):
        """Determine the best recovery approach."""
        if forensics['model_integrity']['compromised']:
            return "Restore from clean backup"
        elif forensics['data_integrity']['poisoned']:
            return "Retrain model from validated data"
        else:
            return "Root cause investigation; monitor closely"
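In its simplest form, the model-integrity check above reduces to comparing a cryptographic hash of the deployed artifact against the hash recorded at deployment time. A minimal sketch (the artifact bytes are illustrative):

```python
import hashlib

# Sketch: detect modification of a model artifact via a recorded SHA-256 hash.

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def model_compromised(model_bytes: bytes, recorded_sha256: str) -> bool:
    """True if the deployed model no longer matches its deployment-time hash."""
    return sha256_of(model_bytes) != recorded_sha256

clean = b"model-weights-v2.3"          # stand-in for the real artifact bytes
recorded = sha256_of(clean)            # recorded at deployment time
intact = not model_compromised(clean, recorded)
tampered = model_compromised(clean + b"-tampered", recorded)
```

A hash mismatch only proves the file changed; distinguishing legitimate redeployment from attack still requires the change-history audit.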
Containment Strategies
Human Oversight Escalation
When AI system behavior becomes suspect, immediately increase human involvement:
Human Escalation Levels:
Level 0 - Normal Operation:
- "Automated decisions with random human sampling"
- "Sampling rate: 1-5% of decisions"
- "Purpose: Ongoing QA and fairness monitoring"
Level 1 - Increased Monitoring:
- "All decisions logged and analyzed"
- "Automated anomaly detection"
- "Sampling rate increased to 10-20%"
- "Purpose: Detect and escalate issues"
Level 2 - High Confidence Required:
- "Human review required if confidence < 80%"
- "All uncertain decisions routed to human"
- "Purpose: Reduce risk of incorrect decisions"
Level 3 - Near-Total Oversight:
- "Human review required for all decisions"
- "AI provides recommendation only"
- "Human makes final decision"
- "Purpose: Maximum control during incident"
Level 4 - System Pause:
- "AI system disabled completely"
- "All decisions made by humans"
- "Purpose: Complete control while investigating"
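A sketch of routing logic for these levels; the 0.80 confidence cutoff comes from Level 2 above, while the severity vocabulary is an assumption:

```python
# Sketch: map incident state and model confidence to an escalation level.
# Severity labels ("critical", "high", "elevated", "none") are illustrative.

def escalation_level(incident_severity: str, model_confidence: float) -> int:
    if incident_severity == "critical":
        return 4  # system pause: all decisions made by humans
    if incident_severity == "high":
        return 3  # AI recommends only; a human makes the final decision
    if model_confidence < 0.80:
        return 2  # uncertain decision routed to human review
    if incident_severity == "elevated":
        return 1  # increased logging and sampling
    return 0      # normal operation with random sampling

level = escalation_level("high", 0.95)
```

In practice the confidence check would apply per decision, while severity applies system-wide.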
Rollback Strategies
For incidents caused by recent changes, rollback may be appropriate:
Rollback Decision Framework:
Pre-Rollback Checks:
- "Verify that version to rollback is clean/trusted"
- "Assess impact of rollback on legitimate users"
- "Calculate time needed for rollback"
- "Prepare reverse rollback procedure"
Rollback Execution:
- "Take snapshot of current (problematic) state"
- "Stop current model serving"
- "Deploy previous stable version"
- "Verify behavior with test dataset"
- "Resume serving"
Post-Rollback:
- "Monitor closely for issues (1 hour)"
- "Compare performance to expectations"
- "Plan investigation of problematic version"
- "Plan remediation before re-deployment"
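The execution steps above can be expressed as an ordered procedure. A sketch in which `DeployAPI` is a placeholder for your serving platform's interface, not a real library:

```python
# Sketch: rollback execution order from the framework above.

class DeployAPI:
    """Stand-in deployment interface; each method would call the real platform."""
    def snapshot(self, version): pass
    def stop_serving(self): pass
    def deploy(self, version): pass
    def smoke_test(self, version): return True  # verify behavior with a test dataset
    def resume_serving(self): pass

def rollback(current_version: str, target_version: str, api) -> list:
    steps = []
    api.snapshot(current_version)            # preserve the problematic state
    steps.append(f"snapshotted {current_version}")
    api.stop_serving()
    steps.append("serving stopped")
    api.deploy(target_version)               # previous stable version
    steps.append(f"deployed {target_version}")
    if not api.smoke_test(target_version):
        raise RuntimeError("rollback verification failed; do not resume serving")
    api.resume_serving()
    steps.append("serving resumed")
    return steps

steps = rollback("v2.3", "v2.2", DeployAPI())
```

Failing the smoke test aborts before traffic resumes, which is exactly why the pre-rollback check verifies the target version is clean and trusted.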
Scenario: Model Retrain Introduced Bias
Rollback: Return to previous model version
Impact: Affects 10,000 loans pending decision
Action: Rollback to previous version immediately
Timeframe: 15 minutes
Then: Investigate retraining procedure; test new version
Scenario: Feature Addition Decreased Accuracy
Rollback: Remove new feature, retrain
Impact: Minor (only affects new feature scoring)
Action: Defer to a proper fix rather than rolling back
Timeframe: Full fix in the next release
Then: Investigate why feature testing missed issue
Communication and Escalation
Incident Command Structure
Establish clear roles during incidents:
Incident Command Structure:
Incident Commander:
Responsibilities:
- "Overall incident coordination"
- "Decision-making authority"
- "Communication with executives"
- "Scope definition and prioritization"
Activation: "Automatically paged on critical/high severity"
Technical Lead:
Responsibilities:
- "Root cause analysis"
- "Technical mitigation decisions"
- "Recovery planning"
- "Post-incident technical review"
Communications Lead:
Responsibilities:
- "Stakeholder communication"
- "Regulatory notifications if required"
- "User/customer communication"
- "Press coordination"
Compliance/Legal:
Responsibilities:
- "Regulatory requirement assessment"
- "Legal risk evaluation"
- "Notification requirements"
- "Documentation for audit"
Subject Matter Experts:
- "AI/ML specialists"
- "Data engineers"
- "Security engineers"
- "Product owners"
Responsibilities:
- "Technical investigation"
- "Mitigation implementation"
- "Testing and validation"
Communication Templates
Prepare templates for different incident scenarios:
Template: Internal Stakeholder Notification (Bias Incident)
Subject: Incident Report - Potential Discrimination in Loan Approval System
---
An incident has been detected in our Loan Approval AI system affecting
credit decisions made over the past 72 hours.
INCIDENT SUMMARY:
- Type: Potential discrimination (disparate impact)
- System Affected: Credit Decision Engine v2.3
- Scope: ~2,500 credit decisions in the past 3 days
- Severity: HIGH
WHAT HAPPENED:
Monitoring detected a significant change in approval rates for applicants
in certain demographic groups. Preliminary analysis shows the approval
rate disparity has widened unexpectedly.
ACTIONS TAKEN:
1. System routing increased to 100% human review
2. Investigation into root cause initiated
3. Historical decisions audited for impact
4. Compliance team notified
NEXT STEPS:
- Root cause analysis complete: 4 hours
- Affected individuals identified: 6 hours
- Remediation plan: 12 hours
- Execution: 24-48 hours
CURRENT STATUS:
All new credit decisions are being reviewed by humans. Normal AI operation
is suspended pending investigation. There is no immediate risk of additional
discriminatory decisions.
---
For questions: [Incident Commander Name], [Contact Info]
Evidence Preservation
AI incidents require careful evidence preservation for forensics and post-mortem:
Evidence Preservation Checklist:
System State:
☐ "Complete model files and weights"
☐ "Model hyperparameters and configuration"
☐ "Training data manifest and metadata"
☐ "Feature engineering code and configuration"
☐ "Preprocessing pipeline documentation"
Logs and Telemetry:
☐ "Application logs (last 30 days)"
☐ "System logs (access, authentication)"
☐ "Model serving logs (predictions, confidence)"
☐ "Performance metrics (accuracy, latency)"
☐ "Monitoring and alerting logs"
Access and Changes:
☐ "User access logs"
☐ "Code repository commit history"
☐ "Model versioning history"
☐ "Configuration change logs"
☐ "Deployment records"
Data:
☐ "Recent prediction data (input/output)"
☐ "Training data snapshot"
☐ "Test data and validation results"
☐ "Human review/override decisions"
Documentation:
☐ "Incident timeline"
☐ "Initial and ongoing assessment notes"
☐ "Forensic analysis notes"
☐ "Communication records"
☐ "Decision logs"
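Much of the logs-and-data portion of this checklist can be automated at incident start. A sketch that bundles evidence files into an archive and records their hashes in a manifest so tampering is detectable later (paths and names are illustrative):

```python
import hashlib
import json
import tarfile
import time
from pathlib import Path

# Sketch: bundle evidence files and record per-file SHA-256 hashes.

def preserve_evidence(paths, out_dir):
    """Archive the given files and write a hash manifest alongside the archive."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = out_dir / f"evidence_{stamp}.tar.gz"
    manifest = {}
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(Path, paths):
            manifest[p.name] = hashlib.sha256(p.read_bytes()).hexdigest()
            tar.add(p, arcname=p.name)
    (out_dir / f"manifest_{stamp}.json").write_text(json.dumps(manifest, indent=2))
    return archive, manifest
```

Storing the manifest separately from the archive (ideally on write-once media) preserves the chain of custody for forensics.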
Key Takeaway
Key Takeaway: Effective incident response requires preparation, clear procedures for different incident types, and a well-organized incident command structure. Quick detection, decisive containment, and careful evidence preservation enable fast recovery while protecting the organization and affected individuals.
Exercise: Develop Incident Response Playbooks
- Identify incident types: What are the top 3 incidents your systems could face?
- Response procedure: For each type, document immediate actions (0-30 min)
- Decision trees: How will you decide between pause, rollback, or live troubleshooting?
- Communication templates: Draft templates for stakeholder notification
- Tabletop exercise: Walk through scenario-based responses
- Continuous improvement: How will you learn from real or simulated incidents?
Next: AI Security Forensics