Incident Response & Disaster Recovery Procedure
Last Updated: 2025-05-17
1. Overview
At Cust, the security of customer data and the reliability of our service are foundational to our platform. This document outlines our procedures for responding to security incidents and for recovering the service in the event of a major disruption. Our approach is designed to minimize impact, ensure transparent communication, and maintain the trust of our customers.
2. Incident Response (IR) Procedure
Our IR procedure governs our response to a confirmed security event (e.g., a data breach, unauthorized access). It is aligned with the NIST Cybersecurity Framework to ensure a structured and effective response.
- Incident Commander: Our CTO, Karolis Januškas, serves as the designated Incident Commander and leads the Incident Response Team.
- Incident Response Framework:
- Phase 1: Detection & Analysis
We employ 24/7 automated monitoring of our application and infrastructure, using services including Sentry. These systems are configured to generate real-time alerts upon detection of system errors, anomalous activity, or potential security vulnerabilities. Any alert triggers an immediate analysis by our engineering team to determine the scope and severity of the potential incident.
- Phase 2: Containment & Eradication
Upon confirmation of a security incident, our immediate priority is to contain the threat to prevent further impact. This may include isolating affected systems, revoking credentials, or deploying emergency patches. Once contained, our team works to eradicate the root cause of the incident to ensure it cannot reoccur.
- Phase 3: Notification & Communication
Cust is committed to transparent communication. In the event of a security incident impacting customer data, we will notify affected customers via email without undue delay. Our policy is to provide this notification within 24 hours of confirming the incident. The communication will be delivered to your designated security contact and will provide details about the nature of the incident, the potential impact, and our response actions.
- Phase 4: Post-Mortem & Lessons Learned
After every incident, we conduct a thorough, blameless post-mortem to analyze the root cause, the effectiveness of our response, and opportunities for improvement. The findings are used to implement corrective actions and enhance our security posture.
3. Disaster Recovery (DR) & Business Continuity
Our DR procedure governs our ability to maintain service availability and recover from major outages (e.g., infrastructure failure, regional disruption).
- Hosting & High Availability Architecture:
- Platform: All of our infrastructure is hosted on Heroku, a Salesforce company, which runs on Amazon Web Services (AWS).
- Data Residency: All US customer data resides within the AWS US East region, ensuring that data remains within the United States. International customer data resides within the AWS EU Central region.
- Resilience: Our platform is deployed across multiple AWS Availability Zones. In the event of a single server or zone failure, our system is designed to failover automatically with minimal to no service interruption.
- Recovery Objectives & Backup Strategy:
- Recovery Point Objective (RPO): Our target for the maximum potential data loss. We maintain continuous, point-in-time backups of our production databases, which allows for an RPO of under 5 minutes.
- Recovery Time Objective (RTO): Our target for the maximum time to restore service. In the unlikely event of a full regional outage, our disaster recovery plan targets full service restoration within an RTO of under 4 hours.
- Backup Retention: Backups are retained for 30 days, allowing us to restore the database to any specific minute within that window.
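The retention policy above implies a simple validity check for any point-in-time restore: the requested target must fall within the trailing 30-day window. The sketch below illustrates that check; the function name and signature are assumptions for illustration, not Cust's actual restore tooling.

```python
from datetime import datetime, timedelta, timezone

# Stated policy: backups retained for 30 days, restorable to any
# specific minute within that window.
RETENTION = timedelta(days=30)

def restore_target_is_valid(target: datetime, now: datetime) -> bool:
    """A restore target must lie within the last 30 days and not in the future."""
    return (now - RETENTION) <= target <= now

now = datetime(2025, 5, 17, 12, 0, tzinfo=timezone.utc)
# A target 7 days back is restorable; one 31 days back has aged out.
ok = restore_target_is_valid(now - timedelta(days=7), now)
expired = restore_target_is_valid(now - timedelta(days=31), now)
```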
- DR Testing & Validation:
We test our disaster recovery plan at least annually to validate that our procedures, tools, and team can meet our stated RTO and RPO targets.
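A DR test result can be judged mechanically against the objectives stated above (RTO under 4 hours, RPO under 5 minutes). The sketch below shows one way such a pass/fail check might look; the function and its inputs are hypothetical, not part of Cust's actual test harness.

```python
from datetime import timedelta

# Stated objectives from the Recovery Objectives section.
RTO_TARGET = timedelta(hours=4)    # maximum time to restore service
RPO_TARGET = timedelta(minutes=5)  # maximum data loss window

def dr_test_passed(measured_rto: timedelta, measured_rpo: timedelta) -> bool:
    """A drill passes only if both measured values meet their targets."""
    return measured_rto <= RTO_TARGET and measured_rpo <= RPO_TARGET

# Example drill: service restored in 3h10m with 2 minutes of data loss.
result = dr_test_passed(timedelta(hours=3, minutes=10), timedelta(minutes=2))
```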