Incident Response

Purpose

This page documents the incident response process for Maqsafy services.

The purpose is to provide a controlled process for detecting, analyzing, containing, resolving, and documenting operational and security incidents.

Incident Response Scope

This process applies to:

  • Application outages
  • API failures
  • Database issues
  • Queue failures
  • Redis failures
  • Nginx or reverse proxy issues
  • Payment or wallet incidents
  • Credential-related incidents
  • Data access or RBAC incidents
  • Security incidents
  • Backup or restore failures
  • Integration failures

Incident Response References

NIST SP 800-61 Rev. 3 is the current NIST guidance for incident response recommendations and considerations within cybersecurity risk management. NIST’s incident response project states that the goal is to help organizations prepare for incident response, reduce incident impact, and improve detection, response, and recovery activities.

CISA’s Incident Response Plan Basics states that an incident response plan should clarify roles and responsibilities and guide key activities during incidents.


Incident Severity Levels

| Severity | Description | Example |
|---|---|---|
| SEV-1 Critical | Major production outage or high-impact security/financial incident | Platform down, payment/wallet integrity issue, data exposure |
| SEV-2 High | Major feature or service degraded with business impact | Login outage, API failure, queue backlog affecting users |
| SEV-3 Medium | Partial issue with workaround available | Report failure, delayed notifications, non-critical integration issue |
| SEV-4 Low | Minor issue with limited impact | UI bug, single non-critical job failure |

Severity Assignment Rules

  • If financial integrity may be affected, classify as SEV-1 or SEV-2 until confirmed.
  • If student data may be exposed, classify as SEV-1 until confirmed.
  • If tenant isolation may be broken, classify as SEV-1 until confirmed.
  • If credential deactivation/replacement is incorrect, classify as SEV-1 or SEV-2 based on impact.
  • If production is fully unavailable, classify as SEV-1.
  • If the issue is limited to staging or local environment, classify based on release impact.
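The rules above can be expressed as a simple decision order that always classifies at the most severe matching rule. This is an illustrative sketch only; the flag names are hypothetical and should map to whatever the triage form actually captures.

```python
# Illustrative sketch of the severity-assignment rules above. Flag names are
# hypothetical; map them to your actual triage inputs.

def assign_severity(*, production_down=False, student_data_exposure=False,
                    tenant_isolation_broken=False, financial_integrity_risk=False,
                    credential_error=False, credential_impact_high=False):
    """Classify at the most severe matching rule; downgrade only after confirmation."""
    if production_down or student_data_exposure or tenant_isolation_broken:
        return "SEV-1"
    if financial_integrity_risk:
        return "SEV-1"  # may be downgraded to SEV-2 once scope is confirmed
    if credential_error:
        return "SEV-1" if credential_impact_high else "SEV-2"
    return "SEV-3"  # default pending further triage

print(assign_severity(production_down=True))   # SEV-1
print(assign_severity(credential_error=True))  # SEV-2
```

Note that uncertain financial or data-exposure cases start high and are downgraded only after confirmation, matching the "until confirmed" wording of the rules.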

Incident Lifecycle

1. Detection

Incident detection may come from:

  • Monitoring alerts
  • User reports
  • Support tickets
  • Application logs
  • Payment gateway alerts
  • Failed backup alerts
  • Security events
  • Team observation

2. Triage

During triage, confirm:

| Check | Required |
|---|---|
| Environment identified | Yes |
| Affected service identified | Yes |
| User impact estimated | Yes |
| Financial impact estimated | If applicable |
| Data exposure risk assessed | If applicable |
| Tenant isolation risk assessed | If applicable |
| Incident severity assigned | Yes |
| Incident owner assigned | Yes |
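The always-required rows of the checklist can gate the transition out of triage. A minimal sketch, with illustrative shorthand names for the check rows:

```python
# Sketch of gating on the triage checklist above: an incident should not
# leave triage until every always-required check is complete.

REQUIRED_CHECKS = [
    "environment_identified",
    "affected_service_identified",
    "user_impact_estimated",
    "severity_assigned",
    "owner_assigned",
]

def missing_triage_checks(completed):
    """completed: set of finished check names. Returns what still blocks triage."""
    return [c for c in REQUIRED_CHECKS if c not in completed]

print(missing_triage_checks({"environment_identified", "severity_assigned"}))
# ['affected_service_identified', 'user_impact_estimated', 'owner_assigned']
```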

3. Containment

Containment actions may include:

  • Disable affected feature temporarily.
  • Block suspicious access.
  • Stop affected queue job.
  • Isolate compromised account.
  • Disable affected integration.
  • Put application in maintenance mode if required.
  • Prevent duplicate financial processing.
  • Freeze high-risk withdrawal or refund workflow if needed.

4. Eradication

Eradication removes the confirmed root cause.

Examples:

  • Fix code defect.
  • Correct configuration.
  • Patch vulnerable package.
  • Rotate exposed credentials.
  • Fix RBAC or tenant isolation logic.
  • Correct failed migration.
  • Fix Nginx upstream configuration.
  • Repair queue worker configuration.

5. Recovery

Recovery restores normal operation.

Recovery checks:

| Check | Expected Result |
|---|---|
| Application health endpoint | Healthy |
| Login | Working |
| API endpoints | Working |
| Queue workers | Processing jobs |
| Redis | Reachable |
| Database | Reachable |
| Nginx | No critical errors |
| Payment flow | Working, if impacted |
| Wallet ledger | Reconciled, if impacted |
| Credential flow | Correct, if impacted |
| Logs | No new critical exceptions |
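The recovery checks above can be run as one loop over injected probes, so a single command reports everything still failing. This is a sketch; the lambda stubs stand in for real HTTP, Redis, and database checks.

```python
# Illustrative recovery-verification sketch. Each probe is injected as a
# callable so the loop stays environment-agnostic.

def failed_recovery_checks(probes):
    """probes: dict mapping check name -> zero-arg callable returning bool.
    Returns the names of checks that did not pass (empty means recovered)."""
    return [name for name, probe in probes.items() if not probe()]

result = failed_recovery_checks({
    "health_endpoint": lambda: True,
    "login": lambda: True,
    "queue_workers": lambda: True,
    "redis": lambda: False,  # simulate Redis still unreachable
})
print(result)  # ['redis']
```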

6. Post-Incident Review

Post-incident review must document:

  • Timeline
  • Root cause
  • Impact
  • Detection method
  • Resolution
  • Preventive actions
  • Owner of corrective actions
  • Evidence collected
  • Follow-up deadline

Incident Communication Channel

| Channel | Purpose | Status |
|---|---|---|
| Slack | Primary incident communication channel for all severity levels | Confirmed |

Incident Roles

| Role | Responsibility |
|---|---|
| Incident Commander | Owns incident coordination and decisions |
| Technical Lead | Leads technical investigation and fix |
| Operations Owner | Handles infrastructure, deployment, monitoring, and recovery |
| Product Owner | Confirms business impact and user-facing priority |
| Support Owner | Handles user/support communication |
| Security Owner | Leads security assessment and evidence handling |
| Finance Owner | Reviews wallet, payment, refund, and settlement impact |
| Communications Owner | Prepares internal or external communication if required |

Confirmed Incident Leadership

| Name / Role | Incident Leadership Role | Status |
|---|---|---|
| CTO | Incident Commander / Technical Leadership | Confirmed |
| Product Manager | Product Impact and Priority | Confirmed |
| Manager Support | Support and User Communication | Confirmed |
| CEO | Executive Escalation | Confirmed |

Role Assignment

| Incident Type | Required Owners |
|---|---|
| Application outage | Incident Commander, Technical Lead, Operations Owner |
| Payment or wallet incident | Incident Commander, Technical Lead, Finance Owner |
| Credential incident | Incident Commander, Technical Lead, Product Owner |
| Data exposure incident | Incident Commander, Security Owner, Technical Lead |
| Integration failure | Technical Lead, Operations Owner, Integration Owner |
| Backup failure | Operations Owner, Technical Lead |

Communication Rules

Internal Communication

Internal updates should include:

| Field | Description |
|---|---|
| Incident ID | Unique incident reference |
| Severity | SEV-1 / SEV-2 / SEV-3 / SEV-4 |
| Status | Investigating / Identified / Contained / Recovering / Resolved |
| Impact | Affected users, services, or operations |
| Current action | What is being done now |
| Owner | Responsible person/team |
| Next update | Expected update time |
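Keeping every internal update in the same field order makes the Slack channel easy to scan. A minimal sketch of a message formatter for these fields; the incident ID shown is a placeholder, since the real ID format is still an open item below.

```python
# Hypothetical helper that renders the internal-update fields above into a
# single message body for the Slack incident channel.

def format_internal_update(incident_id, severity, status, impact,
                           current_action, owner, next_update):
    return "\n".join([
        f"Incident: {incident_id}",
        f"Severity: {severity}",
        f"Status: {status}",
        f"Impact: {impact}",
        f"Current action: {current_action}",
        f"Owner: {owner}",
        f"Next update: {next_update}",
    ])

update = format_internal_update(
    "INC-EXAMPLE-001", "SEV-2", "Investigating",
    "Login failing for a subset of users",
    "Checking OTP provider status", "Technical Lead", "15:30 UTC")
print(update)
```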

External Communication

External communication is required only when approved by management or required by contract, regulation, or business impact.

External messages must not include:

  • Internal IPs
  • Secrets
  • Private logs
  • Detailed exploit information
  • Customer personal data
  • Unconfirmed assumptions

Incident Types and Runbooks

Application Down

Symptoms

  • Health endpoint fails.
  • Users cannot access dashboard or app.
  • Nginx returns 502, 503, or 504.

First Checks

curl -I https://example.com
sudo nginx -t
sudo tail -f /var/log/nginx/error.log
docker ps
docker logs <container-name>

Containment

  • Confirm affected service.
  • Restart failed service if safe.
  • Roll back recent deployment if confirmed as cause.
  • Notify stakeholders if production impact is high.

Login Failure

Symptoms

  • Users cannot log in.
  • API returns unauthorized or validation errors.
  • OTP delivery fails.

First Checks

tail -f storage/logs/laravel.log
php artisan route:list
php artisan config:clear

Containment

  • Confirm if issue affects all users or a specific role.
  • Check authentication provider or OTP provider.
  • Check rate limits and failed login logs.
  • Avoid disabling security controls without approval.

Queue Backlog

Symptoms

  • Notifications delayed.
  • Reports delayed.
  • Exports not generated.
  • Background jobs not processing.

First Checks

php artisan queue:failed
php artisan queue:work
php artisan queue:restart
docker ps
docker logs <queue-worker-container>

Containment

  • Restart queue workers.
  • Pause non-critical heavy jobs.
  • Prioritize financial or user-impacting jobs.
  • Review failed jobs before retrying sensitive jobs.

Redis Failure

Symptoms

  • Cache, sessions, or queues fail.
  • Logs show Redis connection errors.
  • Queue jobs stop processing.

First Checks

docker ps
docker logs <redis-container>
docker network ls
docker network inspect <network-name>
redis-cli -h <redis-host> -p 6379 ping

Containment

  • Confirm Redis is running.
  • Confirm service name and network.
  • Restart dependent services if required.
  • Avoid clearing production cache without understanding impact.

Database Incident

Symptoms

  • Login fails.
  • Dashboard pages fail to load.
  • API returns database errors.
  • Reports or financial workflows fail.

First Checks

php artisan migrate:status
mysql -h <db-host> -u <db-user> -p
tail -f storage/logs/laravel.log

Containment

  • Stop destructive jobs or migrations.
  • Preserve logs.
  • Confirm backup availability.
  • Avoid manual production data edits without approval and audit record.

Payment or Wallet Incident

Symptoms

  • Wallet balance mismatch.
  • Payment succeeded but wallet not credited.
  • Refund duplicated or failed.
  • Settlement mismatch.

First Checks

  • Check payment gateway reference.
  • Check transaction table.
  • Check wallet ledger.
  • Check failed jobs.
  • Check webhook logs.
  • Check reconciliation reports.

Containment

  • Stop duplicate processing.
  • Freeze affected transaction workflow if needed.
  • Do not manually adjust balances without approval.
  • Use adjustment or reversal records instead of silent edits.
  • Preserve all references and logs.

Recovery

  • Reconcile gateway status with internal ledger.
  • Correct through controlled financial records.
  • Confirm no duplicate wallet effect.
  • Document final financial impact.
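Reconciling gateway status with the internal ledger, as described above, amounts to matching records by transaction reference and flagging anything missing or differing in amount. A minimal sketch, with hypothetical field names:

```python
# Illustrative reconciliation sketch: compare gateway records against the
# internal wallet ledger by transaction reference. Amounts in minor units.

def reconcile(gateway_records, ledger_records):
    """Each record: {"ref": str, "amount": int}.
    Returns (ref, reason) pairs needing a controlled correction record."""
    ledger = {r["ref"]: r["amount"] for r in ledger_records}
    mismatches = []
    for g in gateway_records:
        if g["ref"] not in ledger:
            mismatches.append((g["ref"], "missing_in_ledger"))
        elif ledger[g["ref"]] != g["amount"]:
            mismatches.append((g["ref"], "amount_mismatch"))
    return mismatches

gateway = [{"ref": "PAY-1", "amount": 5000}, {"ref": "PAY-2", "amount": 1200}]
ledger = [{"ref": "PAY-1", "amount": 5000}]
print(reconcile(gateway, ledger))  # [('PAY-2', 'missing_in_ledger')]
```

Each flagged reference is then corrected through an adjustment or reversal record, never a silent balance edit, consistent with the containment rules above.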

Credential Incident

Symptoms

  • Credential assigned to wrong student.
  • Credential delivery status incorrect.
  • Credential deactivated incorrectly.
  • Credential duplication risk.

First Checks

  • Check credential inventory record.
  • Check student and wallet linkage.
  • Check lifecycle events.
  • Check audit logs.
  • Check actor and timestamp.

Containment

  • Disable affected credential if risk is confirmed and approved.
  • Prevent further use if credential integrity is uncertain.
  • Do not delete credential history.
  • Record all corrective actions.

Recovery

  • Correct assignment through approved workflow.
  • Record reason.
  • Confirm lifecycle history.
  • Notify responsible school user if needed.
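Because credential history must never be deleted, a correction is recorded as a new lifecycle event rather than an edit to existing records. A sketch of that append-only pattern, with hypothetical field names:

```python
# Sketch of the "record all corrective actions" rule: history is append-only,
# so reassignment adds an event instead of modifying the original record.

def record_reassignment(history, credential_id, new_student_id, reason, actor):
    history.append({
        "credential_id": credential_id,
        "event": "reassigned",
        "student_id": new_student_id,
        "reason": reason,
        "actor": actor,
    })
    return history

history = [{"credential_id": "CRD-1", "event": "assigned", "student_id": "S-100"}]
record_reassignment(history, "CRD-1", "S-200",
                    "assigned to wrong student", "school-admin")
print(len(history))  # 2: original assignment preserved, correction appended
```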

RBAC or Tenant Isolation Incident

Symptoms

  • User sees data outside assigned scope.
  • School Manager sees another school’s data.
  • Supplier sees another supplier’s order.
  • Operator sees another cafeteria’s records.

First Checks

  • Identify user role and scope.
  • Identify affected endpoint.
  • Review query filters.
  • Review authorization middleware.
  • Check access logs and audit logs.

Containment

  • Disable affected endpoint or feature if needed.
  • Remove excessive permission.
  • Patch backend authorization.
  • Review similar endpoints.
  • Force logout affected sessions if required.

Recovery

  • Deploy fix.
  • Test positive and negative access cases.
  • Confirm no cross-tenant data remains visible.
  • Document data exposure assessment.
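The positive and negative access cases above can be tested directly against the patched, tenant-filtered query. This is a sketch; `scoped_records` is a hypothetical stand-in for the fixed endpoint logic.

```python
# Sketch of post-fix tenant-isolation tests: own data stays visible
# (positive case) and no cross-tenant rows leak (negative case).

def scoped_records(records, caller_school_id):
    """Return only rows belonging to the caller's school."""
    return [r for r in records if r["school_id"] == caller_school_id]

records = [{"id": 1, "school_id": "A"}, {"id": 2, "school_id": "B"}]
visible = scoped_records(records, "A")

assert all(r["school_id"] == "A" for r in visible)      # positive: own data visible
assert not any(r["school_id"] == "B" for r in visible)  # negative: no cross-tenant rows
print([r["id"] for r in visible])  # [1]
```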

Evidence Handling

Preserve evidence for security, financial, and data incidents.

Evidence may include:

  • Application logs
  • Nginx logs
  • Database audit records
  • Payment gateway references
  • Queue failed job records
  • User and role records
  • Credential lifecycle events
  • Screenshots with sensitive data masked
  • Timeline of actions

Evidence Rules

  • Do not edit original logs.
  • Do not share raw logs externally without review.
  • Mask personal data before sharing screenshots.
  • Preserve timestamps and reference IDs.
  • Keep evidence access restricted.
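Masking personal data before sharing can be partly automated. A minimal sketch using stdlib `re`; the two patterns below are examples only and must be extended to cover whatever personal data the logs actually contain.

```python
# Illustrative masking sketch for evidence sharing. Patterns are examples,
# not a complete PII filter.
import re

def mask(text):
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)  # email addresses
    text = re.sub(r"\+?\d[\d\s-]{7,}\d", "[PHONE]", text)       # phone-like numbers
    return text

print(mask("Contact student parent at parent@example.com or +966 50 123 4567"))
# Contact student parent at [EMAIL] or [PHONE]
```

Automated masking supplements, but does not replace, a human review before anything leaves the restricted evidence store.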

Incident Timeline Template

Use this format during the incident.

## Incident Timeline

| Time | Event | Owner | Notes |
|---|---|---|---|
| YYYY-MM-DD HH:mm | Incident detected | TBD | TBD |
| YYYY-MM-DD HH:mm | Severity assigned | TBD | TBD |
| YYYY-MM-DD HH:mm | Containment started | TBD | TBD |
| YYYY-MM-DD HH:mm | Root cause identified | TBD | TBD |
| YYYY-MM-DD HH:mm | Fix deployed | TBD | TBD |
| YYYY-MM-DD HH:mm | Service recovered | TBD | TBD |
| YYYY-MM-DD HH:mm | Incident closed | TBD | TBD |

Incident Report Template

Use this template after resolution.

## Incident Report: INC-YYYYMMDD-001

| Field | Details |
|---|---|
| Incident ID | INC-YYYYMMDD-001 |
| Severity | SEV-1 / SEV-2 / SEV-3 / SEV-4 |
| Environment | Production / Staging / Local |
| Start Time | YYYY-MM-DD HH:mm |
| End Time | YYYY-MM-DD HH:mm |
| Duration | TBD |
| Affected Services | TBD |
| Affected Users | TBD |
| Business Impact | TBD |
| Financial Impact | TBD |
| Data Exposure | Yes / No / Under Review |
| Root Cause | Confirmed cause only |
| Resolution | Fix applied |
| Verification | How recovery was confirmed |
| Owner | Responsible person/team |

Corrective Actions

| Action | Owner | Due Date | Status |
|---|---|---|---|
| TBD | TBD | TBD | Open |

Escalation Criteria

Escalate to management when:

  • Production is down.
  • Payment or wallet integrity may be affected.
  • Student data may be exposed.
  • Tenant isolation may be broken.
  • Credentials may be misassigned or compromised.
  • Backup or restore capability is impaired.
  • Regulatory, contractual, or school communication may be required.

Closure Criteria

An incident can be closed only when:

| Check | Required |
|---|---|
| Service restored | Yes |
| Root cause confirmed | Yes |
| Impact assessed | Yes |
| Logs preserved where needed | Yes |
| Financial reconciliation completed, if applicable | Yes |
| Data exposure assessment completed, if applicable | Yes |
| Corrective actions documented | Yes |
| Owner assigned for follow-up actions | Yes |

Open Items

| Item | Status | Notes |
|---|---|---|
| Confirm incident commander | Confirmed — CTO | See Confirmed Incident Leadership above |
| Confirm escalation contacts | Confirmed — CTO, Product Manager, Manager Support, CEO | See Confirmed Incident Leadership above |
| Confirm communication channels | Confirmed — Slack | Primary incident channel |
| Confirm incident ID format | Needs Technical Verification | Required |
| Confirm evidence storage location | Needs Technical Verification | Required |
| Confirm customer notification policy | Needs Technical Verification | Required |
| Confirm security incident legal review process | Needs Technical Verification | Required |
| Confirm financial reconciliation owner | Needs Technical Verification | Required |

Rules

  • Do not make unconfirmed claims during an incident.
  • Do not delete logs.
  • Do not manually change financial records without approval.
  • Do not expose customer data in incident messages.
  • Do not share secrets or internal infrastructure details externally.
  • Document root cause only after confirmation.
  • Use Under Review when impact is not yet confirmed.