Disaster Recovery Overview

This section covers the Disaster Recovery (DR) planning, runbooks, and procedures for the SONAN DIGITAL CRM platform. The goal of DR planning is to minimize downtime, protect client data, and ensure the business can continue operating after any failure — whether that is a Vercel deployment failure, a Supabase outage, accidental data deletion, or a full infrastructure failure requiring recreation from scratch.

⚠️

Always read this section before any production incident

During a live incident, stress and time pressure lead to mistakes. Familiarize yourself with these documents before an incident occurs so that recovery steps are second nature.

Purpose

The SONAN DIGITAL CRM is a multi-tenant SaaS platform handling confidential client data, financial transactions, and business-critical workflows. Failures at any layer of the stack — hosting, database, authentication, email, or payments — directly affect client operations. This DR plan exists to:

Define clear recovery time and recovery point objectives so that response is structured, not ad hoc
Assign ownership and escalation paths for each incident type
Provide step-by-step runbooks that can be followed under pressure without prior context
Create a post-incident review culture so that failures drive system improvements

RTO and RPO Targets

Metric	Definition	Target
RTO (Recovery Time Objective)	Maximum acceptable downtime before service is restored	4 hours for P1
RPO (Recovery Point Objective)	Maximum acceptable data loss measured in time	1 hour for P1

These targets are based on the current infrastructure tier (Supabase Pro PITR, Vercel Pro deployment history). If the project is downgraded to a free tier on any provider, these targets must be revised.

Incident Classification

Classification	Examples	Target RTO	Target RPO
P1 Critical	Full platform outage, complete auth failure, confirmed data loss, security breach	4 hours	1 hour
P2 High	Major feature unavailable (billing, contracts, proposals), partial auth failure, significant performance degradation	8 hours	4 hours
P3 Medium	Non-critical feature down (wiki, notifications, reporting), cosmetic data inconsistency	24 hours	24 hours
P4 Low	Minor bugs, UI glitches, single-user edge cases, cosmetic issues	72 hours	N/A

ℹ️

Classification determines urgency, not effort

A P4 bug may be complex to fix but is low priority. A P1 outage may be resolved by a single rollback click. Classify by business impact, not by engineering complexity.
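
For tooling such as an incident bot or a status dashboard, the classification table can be encoded as a small lookup. The following is a minimal TypeScript sketch, not part of the codebase: the names `SEVERITY_TARGETS` and `isRtoBreached` are hypothetical, and the values are copied from the table above.

```typescript
// Hypothetical helper: encodes the RTO/RPO targets from the classification table.
type Severity = "P1" | "P2" | "P3" | "P4";

interface RecoveryTargets {
  rtoHours: number;
  rpoHours: number | null; // null means N/A (P4 has no RPO target)
}

const SEVERITY_TARGETS: Record<Severity, RecoveryTargets> = {
  P1: { rtoHours: 4, rpoHours: 1 },
  P2: { rtoHours: 8, rpoHours: 4 },
  P3: { rtoHours: 24, rpoHours: 24 },
  P4: { rtoHours: 72, rpoHours: null },
};

// Returns true when the time elapsed since detection exceeds the RTO target.
function isRtoBreached(severity: Severity, detectedAt: Date, now: Date): boolean {
  const elapsedHours = (now.getTime() - detectedAt.getTime()) / (60 * 60 * 1000);
  return elapsedHours > SEVERITY_TARGETS[severity].rtoHours;
}
```

Keeping the targets in one structure means a change to the table only has to be mirrored in one place.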

Stack Overview

The platform is composed of the following services, each with distinct failure modes:

Layer	Service	Provider	Failure Impact
Hosting & Edge Runtime	Next.js 15 on Vercel	Vercel	Full platform unavailable
Database	PostgreSQL via Supabase	Supabase	All data reads/writes fail
Authentication	Supabase Auth	Supabase	All logins fail
File Storage	Supabase Storage	Supabase	Document uploads/downloads fail
Transactional Email	Resend	Resend	Email notifications and invites fail
Payments	Stripe	Stripe	Billing and invoice flows fail
Error Monitoring	Sentry	Sentry	Error visibility lost (not user-facing)

Document Index

Document	Purpose
DR Plan	Full disaster recovery plan — scope, infrastructure map, failure strategies, communication
Production Recovery Runbook	Step-by-step recovery for 5 specific failure scenarios
Database Restore Guide	How to restore the Supabase PostgreSQL database using PITR or manual backups
Storage Restore Guide	How to recover Supabase Storage files and resolve orphaned records
Environment Recreation Guide	Full ground-up recreation of all infrastructure from scratch
Rollback Guide	How to roll back a Vercel deployment and handle schema migration conflicts
Emergency Checklist	Printable, checkbox-driven checklist for use during live incidents

Quick Response Summary

Incident detected
       │
       ▼
Classify: P1 / P2 / P3 / P4
       │
       ▼
Open emergency-checklist.md → follow Steps 1–4
       │
       ▼
Identify failure type:
  ├── Vercel deployment issue    → production-recovery-runbook.md § Scenario 1
  ├── Supabase outage            → production-recovery-runbook.md § Scenario 2
  ├── Data corruption/deletion   → production-recovery-runbook.md § Scenario 3
  ├── Compromised credentials    → production-recovery-runbook.md § Scenario 4
  └── Full infrastructure loss   → production-recovery-runbook.md § Scenario 5
       │
       ▼
Execute runbook → verify recovery → post-incident review
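
The routing step in the flow above is a plain lookup from failure type to runbook scenario. A minimal TypeScript sketch of that mapping follows; the `FailureType` labels and the `runbookReference` function are hypothetical names chosen for illustration, and only the scenario numbers come from this document.

```typescript
// Hypothetical labels for the five failure types in the flow above.
type FailureType =
  | "vercel-deployment"
  | "supabase-outage"
  | "data-corruption"
  | "compromised-credentials"
  | "full-infrastructure-loss";

// Scenario numbers as listed in the Quick Response Summary.
const RUNBOOK_SCENARIO: Record<FailureType, number> = {
  "vercel-deployment": 1,
  "supabase-outage": 2,
  "data-corruption": 3,
  "compromised-credentials": 4,
  "full-infrastructure-loss": 5,
};

// Builds the runbook reference an on-call engineer should open.
function runbookReference(failure: FailureType): string {
  return "production-recovery-runbook.md § Scenario " + RUNBOOK_SCENARIO[failure];
}
```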

💡

Bookmark the Supabase and Vercel status pages

Disaster Recovery Plan

Document version: 1.0
Last reviewed: 2026-06-30
Owner: Engineering Lead
Review cadence: Quarterly or after any P1/P2 incident

1. Objectives

This Disaster Recovery Plan (DRP) defines the strategy, responsibilities, and procedures to recover the SONAN DIGITAL CRM platform following any disruptive event. The plan has three primary objectives:

Minimize downtime — restore service to end users as quickly as possible within defined RTO targets
Protect client data — prevent permanent data loss and ensure RPO targets are met through backup and PITR infrastructure
Maintain audit trail — document every action taken during an incident to support post-incident review, compliance requirements, and future prevention

All engineers with production access are expected to read and be familiar with this document before handling any live incident.

2. Scope

This plan covers all components of the production SONAN DIGITAL CRM environment:

Component	In Scope	Notes
Vercel (hosting)	Yes	Edge runtime, all Next.js routes and API routes
Supabase (database)	Yes	PostgreSQL, Row Level Security, migrations
Supabase Auth	Yes	JWT sessions, email/password, TOTP MFA
Supabase Storage	Yes	Private documents bucket, public avatars bucket
Resend	Yes	Transactional email (invites, notifications, contracts)
Stripe	Yes	Payment processing, webhook event handling
Sentry	Partial	Error monitoring — outage degrades visibility but not user service
GitHub	Partial	Source of truth for code; outage blocks deploys but not running service

Out of scope: Local development environments, staging/preview deployments, client-side analytics.

3. Infrastructure Map

Service	Provider	Region	Account	Criticality
Next.js App Hosting	Vercel	Global Edge (multi-region)	Vercel Pro team account	Critical
PostgreSQL Database	Supabase	`ap-southeast-1` (Singapore)	Supabase organization	Critical
Auth Service	Supabase Auth	Same as DB	Same project	Critical
File Storage	Supabase Storage	Same as DB	Same project	High
Transactional Email	Resend	Global	Resend account	Medium
Payment Processing	Stripe	Global	Stripe account	High
Error Monitoring	Sentry	Global (cloud)	Sentry organization	Low
DNS & CDN	Cloudflare	Global	Cloudflare account	High
Source Control	GitHub	Global	GitHub organization	High

ℹ️

Region note

If the Supabase project region differs from ap-southeast-1, update this table. The region is visible in the Supabase dashboard under Settings → General.

4. Backup Assumptions

The following backup capabilities are assumed to be in place. Verify these assumptions quarterly.

4.1 Supabase Automatic Backups

Daily backups are taken automatically on all Supabase plans
Retained for 7 days on Free tier, 30 days on Pro tier
Accessible via Supabase Dashboard → Settings → Database → Backups

4.2 Supabase PITR (Point-in-Time Recovery)

Available on Pro plan and above
Enables restoration to any point within the last 7 days (Pro) or longer on Enterprise
WAL (Write-Ahead Log) streaming is continuous — RPO at the database level is effectively seconds
Restore initiates a new database instance; connection string changes after restore

🚨

PITR is not available on the Free tier

If the Supabase project is ever downgraded to Free, PITR is lost. The RPO falls back to the previous daily backup (up to 24 hours of data loss). Never downgrade to Free without updating the DR plan and communicating the change to stakeholders.

4.3 Vercel Deployment History

Vercel retains an unlimited number of past deployments on Pro (the last 30 on Hobby)
Each deployment can be individually promoted to production via the dashboard
Deployment history is the primary mechanism for code rollback

4.4 No Application-Level Backup

There is currently no separate application-level database export or S3 offsite backup. The sole database recovery mechanism is Supabase's built-in backup and PITR. This is a known risk — see Section 5.

5. Single Points of Failure

SPOF	Risk Level	Mitigation	Notes
Supabase (all services on one project)	High	Supabase SLA on Pro tier; PITR available	No hot standby; regional outage affects all
Vercel Global Edge	Low	Multi-region edge; Vercel SLA covers uptime	Historically very reliable
Resend email delivery	Medium	Outage degrades notifications; core CRM still functional	Email is non-blocking for CRM flows
Stripe	Low	Stripe SLA; webhooks are idempotent and retried	Payment outage prevents new invoices only
Cloudflare DNS	Low	Global anycast DNS; extremely high availability
Single Supabase region	Medium	No cross-region replication in current tier	Consider Enterprise for cross-region if required
GitHub availability	Low	Running service unaffected; only deploys blocked

⚠️

Supabase is the primary SPOF

The database, authentication, and storage all reside in a single Supabase project. A full Supabase project outage affects the entire platform simultaneously. Supabase does not offer hot standby on Pro — recovery from a project-level failure depends on their SLA and internal recovery procedures. For mission-critical uptime beyond 99.9%, evaluate Supabase Enterprise or a self-hosted PostgreSQL fallback.

6. Recovery Strategies by Failure Type

6.1 Vercel Outage or Deployment Failure

Detection: 5xx errors on all pages, Vercel dashboard shows unhealthy deployment
Strategy:
- If a bad deployment: roll back to the last known-good deployment via Vercel Dashboard → Deployments → Promote
- If a Vercel platform outage: monitor https://www.vercel-status.com and wait for Vercel to resolve it
- If persistent: escalate to Vercel support with incident details

See Rollback Guide for detailed steps.

6.2 Supabase Platform Outage (Managed Service)

Detection: Database connection errors in Vercel logs, auth failures across all users
Strategy:
- Supabase is a managed service, so there are no actions the team can take to restore it
- Monitor https://status.supabase.com
- Communicate status to affected clients
- If the outage exceeds 2 hours with no resolution, escalate to Supabase support (the same threshold used in the Production Recovery Runbook)

See Production Recovery Runbook — Scenario 2.

6.3 Data Corruption or Accidental Deletion

Detection: Missing records reported by users, inconsistent data visible in admin UI
Strategy:
- Immediately identify the scope and timestamp of corruption
- If ongoing corruption: temporarily disable the affected write path (feature flag or route disable)
- Initiate PITR restore to a point before the corruption event
- After restore, verify data integrity before restoring write access

See Database Restore Guide for PITR steps.

6.4 Environment Variable / Credential Compromise

Detection: Unauthorized API activity, strange charges in Stripe, security alert
Strategy:
- Immediately rotate all affected credentials in the provider's dashboard
- Update Vercel environment variables with new credentials
- Trigger a redeploy to pick up new values
- Review provider access logs for unauthorized activity
- File security incident report

See Production Recovery Runbook — Scenario 4.

6.5 Full Infrastructure Failure

Detection: Complete loss of all services; recreating from scratch required
Strategy:
- Follow environment recreation procedure in full
- Prioritize in order: database → auth → hosting → email → payments → monitoring
- Estimated recovery time: 2–4 hours for an experienced engineer

See Environment Recreation Guide.

7. Communication Plan During Incidents

Internal Communication

Severity	Who to Notify	Channel	Within
P1	All engineers + project owner	Direct message + email	15 minutes of detection
P2	Engineering lead + relevant engineer	Direct message	30 minutes of detection
P3/P4	Ticket created in project management tool	Async	Next business day
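
The notification windows in the table translate directly into a deadline calculation. A minimal TypeScript sketch, assuming a hypothetical `notifyDeadline` helper: P1 and P2 get a concrete timestamp, while P3/P4 return `null` because they are handled asynchronously by ticket.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

// Notification windows from the Internal Communication table, in minutes.
const NOTIFY_WITHIN_MINUTES: Record<"P1" | "P2", number> = { P1: 15, P2: 30 };

// Returns the latest time by which the team must be notified,
// or null for P3/P4 (ticket created by the next business day).
function notifyDeadline(severity: Severity, detectedAt: Date): Date | null {
  if (severity === "P1" || severity === "P2") {
    const windowMs = NOTIFY_WITHIN_MINUTES[severity] * 60 * 1000;
    return new Date(detectedAt.getTime() + windowMs);
  }
  return null;
}
```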

External Communication (Client-Facing)

P1: Prepare a brief status message. Do NOT share internal stack details, error messages, or timeline speculation. Example: "We are aware of an issue affecting the platform and are actively working to resolve it. We will provide an update within [X] hours."
P2: Notify only affected clients if the feature outage impacts their active workflows.
P3/P4: No proactive communication unless a client reports the issue.

🚨

Do not share internal details externally

Never include error stack traces, database names, provider names, or root cause hypotheses in client-facing communications during an active incident. Share facts only: what is affected, when it started, and when the next update will come.

8. Post-Incident Review Requirements

Every P1 and P2 incident must trigger a post-incident review (PIR), completed within 48 hours of resolution. The PIR must include:

Timeline — chronological log of when the incident was detected, escalated, and resolved, with timestamps
Root cause — technical explanation of what failed and why
Impact — which clients were affected, how long, and what data or functionality was unavailable
Actions taken — step-by-step recovery actions performed
What went well — processes or tools that helped during recovery
What went wrong — gaps in tooling, documentation, or response
Prevention items — specific, actionable tickets created to prevent recurrence
DR plan update — if this incident revealed a gap in this document, update it now

The PIR is stored in the project's internal wiki and linked from the incident ticket. There is no blame assigned — the goal is systemic improvement.

Production Recovery Runbook

This runbook provides step-by-step recovery instructions for the five most likely production failure scenarios. Each scenario includes symptoms, who to involve, estimated time to recover, numbered steps, and a verification checklist.

⚠️

Before you start any scenario

Open emergency-checklist.md and complete Steps 1–4 (classify, communicate, preserve evidence, triage) before executing any recovery steps.
Do not skip the verification checklist at the end of each scenario — partial recovery is worse than no recovery if it gives a false sense of stability.

Scenario 1: Vercel Deployment Failure

Classification: P1 (if all pages fail) or P2 (if specific routes fail)
Who to involve: Engineer who triggered the deployment + Engineering Lead
Estimated time to recover: 5–30 minutes (rollback) or 30–90 minutes (fix and redeploy)

Symptoms

HTTP 500 errors on all pages or specific routes after a recent deployment
Vercel dashboard shows the deployment as live but functions are failing
Sentry shows a spike of new errors immediately after deploy time
Users report the app is blank, crashing, or showing an error page
Edge function timeouts in Vercel logs

Root Cause Checklist

Before deciding between rollback and fix-and-redeploy, check:

[ ] Did a deployment happen within the last 2 hours?
[ ] Are errors scoped to specific routes or the entire app?
[ ] Do Vercel function logs show import errors, missing env vars, or runtime crashes?
[ ] Is the error a Supabase connection issue rather than a code issue?

Recovery Steps

Option A: Rollback (fastest)

Open the Vercel Dashboard and navigate to the project
Click Deployments in the left sidebar
Identify the last deployment that was working (look for the deployment before the current one, or check the timestamp against when errors started)
Click the ⋯ (three-dot menu) on the target deployment
Select Promote to Production
Confirm the promotion — Vercel will swap traffic within ~30 seconds
Monitor Vercel Functions logs for the next 5 minutes
Check Sentry error rate — it should drop to baseline

Option B: Fix and Redeploy (when rollback is not safe due to migration)

Identify the error in Vercel function logs or Sentry
Fix the code in the GitHub repository
Push to main — Vercel will auto-deploy
Monitor the deployment build log for errors
Once deployed, verify in Vercel that the new deployment is live
Check Sentry and Vercel logs for 10 minutes

🚨

Check for database migrations before rolling back

If the failing deployment included a database migration (new columns, tables, or RLS policies), rolling back the code while leaving the new schema in place may cause the old code to fail against the new schema. See Rollback Guide for how to handle migration conflicts.

Verification Checklist

[ ] Home page (/) loads with HTTP 200
[ ] Admin login at /auth/login works end-to-end
[ ] At least one API route (e.g., /api/admin/clients) returns data
[ ] Sentry error rate has returned to pre-incident baseline
[ ] Vercel deployment marked as Production is the correct one

Scenario 2: Supabase Outage (Managed Service)

Classification: P1 (full outage) or P2 (degraded performance)
Who to involve: Engineering Lead + Project Owner for client communication
Estimated time to recover: Dependent on Supabase SLA — typically 15 minutes to 4 hours

Symptoms

Auth errors: "Failed to fetch" or "JWT expired" errors in browser console
All API routes return 500 with database connection errors in Vercel logs
Supabase dashboard shows project as unhealthy or unreachable
Users cannot log in; existing sessions may also fail

What You Cannot Do

This is a managed service outage. The engineering team cannot restore the database, restart Supabase, or migrate to another provider within a P1 timeframe. The only available actions are monitoring, communication, and post-outage verification.

Recovery Steps

Confirm it is a Supabase outage — check https://status.supabase.com and look for an active incident in the ap-southeast-1 region (or whichever region the project is in)
Confirm it is not a Vercel configuration issue — check recent deployments; if no recent deploy happened, Vercel is likely not the cause
Check Vercel logs for the exact database error message — copy and save it for the PIR
Notify the team using the P1 communication protocol (see DR Plan § 7)
Prepare a client-facing status message — do not include technical details; state that a third-party service is experiencing an outage and you are monitoring for resolution
Subscribe to Supabase status updates — click "Subscribe to updates" on their status page for the active incident
Monitor every 15 minutes — check status page for progress
If outage exceeds 2 hours — open a support ticket with Supabase including your project reference ID (found in Supabase Dashboard → Settings → General)
Once Supabase reports resolution, wait 5 minutes before testing — services may need a few minutes to fully stabilize
Test recovery — follow the verification checklist below
Send a recovery notification to affected clients

Verification Checklist

[ ] Supabase status page shows all systems operational
[ ] Admin login works end-to-end
[ ] At least one Supabase query returns expected data
[ ] File uploads/downloads from Supabase Storage work
[ ] Verify no data was lost during the outage (check row counts on clients, projects, invoices for recent activity)
[ ] Sentry shows no new database-related errors

Scenario 3: Data Corruption or Accidental Deletion

Classification: P1 (data loss confirmed) or P2 (corruption suspected but contained)
Who to involve: Engineering Lead + Project Owner (for client impact assessment)
Estimated time to recover: 1–4 hours (depending on restore scope)

Symptoms

Users report missing records that were present earlier
Admin UI shows unexpected empty states or wrong data
An engineer reports running a destructive query accidentally
A bug is discovered that has been silently corrupting records

Recovery Steps

Phase 1: Contain

Identify the scope — which table(s) are affected? How many rows? Which tenants?
Determine the corruption timestamp — when did valid data last exist? Check Sentry for related errors, Vercel logs for unusual API activity, and ask affected users for the last known-good time
Stop new writes if possible — if the corruption is ongoing (e.g., a bug still running in production), disable the affected feature immediately:
For an API route: add an early return (return NextResponse.json({ error: 'Maintenance' }, { status: 503 })) and deploy
For a cron job: disable it in Vercel
Do not attempt to manually fix data in production before restoring — manual fixes may complicate the PITR restore point selection
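
The "stop new writes" step can be prepared ahead of an incident as a kill switch instead of an ad hoc code change. The sketch below is one possible shape, not the platform's actual implementation: the `DISABLED_FEATURES` variable, the feature names, and both function names are assumptions made for illustration.

```typescript
// Shape of the maintenance response described in the containment step.
interface MaintenanceResponse {
  status: 503;
  body: { error: string };
}

// Hypothetical: parses a comma-separated list such as DISABLED_FEATURES="invoices,contracts".
function parseDisabledFeatures(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Returns a 503 payload when the feature is disabled, otherwise null (proceed normally).
function maintenanceResponseFor(
  feature: string,
  disabledFeatures: string[]
): MaintenanceResponse | null {
  if (disabledFeatures.indexOf(feature) === -1) return null;
  return { status: 503, body: { error: "Maintenance" } };
}
```

In a Next.js route handler, a non-null result would be returned via `NextResponse.json(result.body, { status: result.status })` before any write logic runs. If the list comes from a Vercel environment variable, the change only takes effect after a redeploy.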

Phase 2: Restore

Follow the Database Restore Guide to initiate a PITR restore to the timestamp 5 minutes before the identified corruption event
Note the new connection string from the restored Supabase project
Update NEXT_PUBLIC_SUPABASE_URL and related env vars in Vercel if the connection string changed
Trigger a Vercel redeploy

Phase 3: Verify

Run the SQL verification queries from Database Restore Guide § 4 to confirm data integrity
Have a team member manually verify the affected data in the admin UI
Re-enable any disabled features or cron jobs
Monitor Sentry and Vercel logs for 30 minutes

Phase 4: Communicate

Notify affected clients of the data recovery (do not share the cause unless contractually required)
Begin post-incident review

💡

Capture the corruption timestamp precisely

The more precisely you know when corruption started, the less data you lose in the PITR restore. Every write made after the restore point is discarded, so a restore point chosen one hour too early costs an extra hour of legitimate data. Use Vercel logs, Sentry error timestamps, and Supabase audit logs to narrow this down.
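
The restore point arithmetic is simple but easy to get wrong under pressure. A minimal TypeScript sketch, assuming the 5-minute buffer from this runbook and the 7-day Pro retention window stated in the DR plan; `pitrRestoreTarget` is a hypothetical helper name.

```typescript
const SAFETY_BUFFER_MINUTES = 5; // buffer before the corruption event, per this runbook
const PITR_RETENTION_DAYS = 7; // Pro plan retention as stated in this document

interface RestoreTarget {
  restoreAt: Date; // timestamp to enter in the PITR time picker
  withinRetention: boolean; // false means PITR cannot reach this point
}

// Computes the PITR target from the identified start of corruption.
function pitrRestoreTarget(corruptionStartedAt: Date, now: Date): RestoreTarget {
  const restoreAt = new Date(
    corruptionStartedAt.getTime() - SAFETY_BUFFER_MINUTES * 60 * 1000
  );
  const oldestRestorable = new Date(
    now.getTime() - PITR_RETENTION_DAYS * 24 * 60 * 60 * 1000
  );
  return {
    restoreAt,
    withinRetention: restoreAt.getTime() >= oldestRestorable.getTime(),
  };
}
```

When `withinRetention` is false, the fallback is the daily backup procedure in the Database Restore Guide.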

Verification Checklist

[ ] Affected records are present with correct data
[ ] Latest legitimate records (pre-corruption) are present
[ ] Auth still works after restore (check that users can log in)
[ ] RLS policies are in place (run the RLS verification queries in database-restore.md)
[ ] Stripe webhooks are still pointing to the correct URL
[ ] Resend domain still verified
[ ] No duplicate records introduced by the restore

Scenario 4: Environment Variable / Credential Compromise

Classification: P1 (active exploitation suspected) or P2 (exposure suspected but no exploitation)
Who to involve: Engineering Lead + Project Owner + (if P1) all engineers immediately
Estimated time to recover: 30–90 minutes

Symptoms

Unexpected Stripe charges or API calls not originating from the app
Supabase admin API calls from unknown IPs
Resend sending emails not initiated by the application
GitHub security alert about a leaked secret
CI/CD log or error message exposing a secret value

Recovery Steps

Immediately — within 5 minutes of detection:

Identify which credentials are compromised — review the exposed secret type (Supabase key, Stripe key, Resend key, CRON_SECRET, etc.)
Do not close the source of exposure until you have documented it — screenshot the log, message, or file that exposed the secret
Assume the worst — treat the secret as actively exploited until proven otherwise

Rotate credentials — one at a time, in order of sensitivity:

Supabase Service Role Key (highest risk — full DB access without RLS):
Supabase Dashboard → Settings → API → Rotate service role key
Copy the new key immediately
Stripe Secret Key:
Stripe Dashboard → Developers → API Keys → Roll key
Copy the new key immediately
Stripe Webhook Secret:
Stripe Dashboard → Developers → Webhooks → select endpoint → Reveal signing secret → Roll secret
Copy the new webhook secret immediately
Resend API Key:
Resend Dashboard → API Keys → delete old key → create new key
Copy the new key immediately
CRON_SECRET:
Generate a new secure random string: openssl rand -base64 32
Note the new value

Update Vercel:

Open Vercel Dashboard → Project → Settings → Environment Variables
Update each rotated secret with its new value
Ensure you update for all environments (Production, Preview, Development)
Trigger a Redeploy — without redeployment, running edge functions still use old env var values cached at deploy time

Verify:

Test Stripe webhook — use the Stripe CLI or dashboard to send a test event and confirm it succeeds with the new secret
Test email sending — trigger a test notification or invite to confirm Resend works
Test a protected API route that uses SUPABASE_SERVICE_ROLE_KEY
Review access logs:
- Supabase: Dashboard → Logs → API logs — filter by time of exposure
- Stripe: Dashboard → Developers → Events — look for unexpected API calls
- Resend: Dashboard → Logs — look for emails not initiated by the app

Audit:

Determine how the secret was exposed (committed to Git, visible in logs, etc.)
If committed to Git: remove from history using git filter-repo or BFG, force-push, and notify GitHub
Create a post-incident review

Verification Checklist

[ ] All rotated secrets updated in Vercel env vars
[ ] Vercel redeploy completed successfully
[ ] Stripe test webhook succeeds with new signing secret
[ ] Admin login and authenticated API calls work
[ ] No unexpected API activity in Supabase, Stripe, or Resend logs since rotation
[ ] Old secret confirmed invalid (test with old value — should get 401/403)

Scenario 5: Full Infrastructure Recreation

Classification: P1 — complete loss requiring rebuild from scratch
Who to involve: All engineers + Project Owner
Estimated time to recover: 2–4 hours

Symptoms

Supabase project deleted or corrupted beyond recovery
Vercel project deleted or account access lost
Multiple services simultaneously unavailable with no path to recovery via existing resources
Security incident requiring full environment teardown and rebuild

Recovery Steps

This scenario requires following the Environment Recreation Guide in full. The high-level sequence is:

Verify that recreation is truly necessary — confirm that no restore, rollback, or credential rotation can resolve the issue
Notify all stakeholders immediately — this is a multi-hour outage
Create a new Supabase project and apply all database migrations
Configure Supabase Auth (email provider, redirect URLs, MFA settings)
Create Supabase Storage buckets with correct privacy settings
Create a new Vercel project and configure all environment variables
Restore data from the most recent Supabase backup (see Database Restore Guide)
Configure Stripe webhook pointing to new domain
Configure Resend domain verification
Verify all integrations end-to-end before announcing recovery
Communicate recovery to all clients

See Environment Recreation Guide for the complete step-by-step procedure with specific commands and configuration values.

Verification Checklist

[ ] All migrations applied and verified
[ ] Admin user can log in
[ ] At least one tenant's data is accessible
[ ] Stripe payment flow works end-to-end (use test mode)
[ ] Email notification is received when triggered
[ ] File upload and download works
[ ] Sentry is receiving errors from the new environment
[ ] Custom domain resolves correctly
[ ] All Vercel environment variables set correctly

Database Restore Guide

This guide covers all methods for restoring the Supabase PostgreSQL database — from Point-in-Time Recovery (PITR) for data loss scenarios to manual restores from exported backups. Read the entire guide before initiating any restore.

🚨

Restoring to production has irreversible consequences

A PITR restore creates a new database instance. It does NOT restore in-place — your current database remains temporarily accessible, but the connection string will change. Any writes made to the old database after the restore point will be permanently lost. Ensure you have stopped all writes before restoring.

1. Supabase PITR Overview

Point-in-Time Recovery (PITR) is available on Supabase Pro plan and above. It allows you to restore the database to any second within the retention window by replaying WAL (Write-Ahead Log) segments from the last full backup forward to the target timestamp.

Plan	PITR Availability	Retention Window
Free	Not available	Daily backups, 7-day retention
Pro	Available	7 days
Team	Available	14 days
Enterprise	Available	30+ days (configurable)

⚠️

PITR availability depends on plan

Verify the current Supabase plan before relying on PITR as the recovery mechanism. Go to Supabase Dashboard → Settings → Billing to confirm the active plan.

2. How to Initiate a PITR Restore

Via Supabase Dashboard

Log in to https://supabase.com/dashboard
Select the SONAN DIGITAL project
Navigate to Settings (gear icon in the left sidebar)
Click Database under the Settings menu
Scroll to the Backups section
Select the Point in Time tab
Use the calendar and time picker to select the target restore timestamp
Choose a time 5 minutes before the identified corruption or deletion event
Always err earlier — losing 10 minutes of legitimate data is better than including 1 minute of corrupted data
Click Restore and confirm the dialog
Supabase will begin provisioning a new database. This typically takes 5–20 minutes
Once complete, Supabase will provide a new project URL and connection strings

💡

Record the new connection strings immediately

After a PITR restore, Supabase provides a new project reference ID and connection string. Copy these immediately — they are needed to update Vercel environment variables.

New Environment Variables After PITR Restore

After the restore completes, update these variables in Vercel → Project → Settings → Environment Variables:

Variable	Where to Find New Value
`NEXT_PUBLIC_SUPABASE_URL`	New project URL (Settings → API in the restored project)
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	Settings → API → anon key
`SUPABASE_SERVICE_ROLE_KEY`	Settings → API → service_role key

After updating, trigger a Vercel redeploy for the new values to take effect.
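
Before redeploying, it helps to confirm that none of the three variables was left at its pre-restore value. The following TypeScript sketch compares two snapshots of environment values; the `staleSupabaseVars` helper and the snapshot shape are assumptions for illustration, and it presumes the restore produced new values as this guide describes.

```typescript
// The variables this guide says must be updated after a restore.
const SUPABASE_ENV_VARS: string[] = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];

type EnvSnapshot = { [variable: string]: string | undefined };

// Returns the variables that are missing or still equal to their pre-restore value.
function staleSupabaseVars(before: EnvSnapshot, after: EnvSnapshot): string[] {
  const stale: string[] = [];
  for (let i = 0; i < SUPABASE_ENV_VARS.length; i++) {
    const variable = SUPABASE_ENV_VARS[i];
    const current = after[variable];
    if (!current || current === before[variable]) {
      stale.push(variable);
    }
  }
  return stale;
}
```

An empty result means all three values changed; any name returned should be corrected in Vercel before the redeploy.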

3. Manual Restore from Daily Backup

If PITR is not available (Free plan) or the corruption is older than the PITR retention window, restore from a daily backup.

Step 1: Download the Backup

Supabase Dashboard → Settings → Database → Backups → Scheduled Backups tab
Find the backup closest to (but before) the corruption event
Click Download to get the .sql.gz backup file

Step 2: Apply to a New Project (Recommended)

Rather than overwriting the existing project:

Create a new Supabase project in the same organization and region
In your local terminal, run:

# Decompress
gunzip backup.sql.gz

# Apply to the new project
psql "postgresql://postgres:{{ DB_PASSWORD }}@{{ DB_HOST }}:5432/postgres" < backup.sql

Step 3: Apply to Existing Project (Destructive)

Only if you cannot create a new project:

# WARNING: This drops and recreates the public schema, destroying all application tables and data
psql "postgresql://postgres:{{ DB_PASSWORD }}@{{ DB_HOST }}:5432/postgres" \
  -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"

psql "postgresql://postgres:{{ DB_PASSWORD }}@{{ DB_HOST }}:5432/postgres" < backup.sql

🚨

Applying to existing project drops all current data

Running DROP SCHEMA public CASCADE permanently destroys all current data. Only do this if you are certain the backup is the correct recovery target and the current data is not recoverable.

4. Verification After Restore

After any restore (PITR or manual), run the following verification steps before re-enabling writes or notifying users of recovery.

4.1 Check Row Counts for Key Tables

Connect to the restored database and run:

SELECT
  'clients' AS table_name, COUNT(*) AS row_count FROM clients
UNION ALL SELECT 'projects', COUNT(*) FROM projects
UNION ALL SELECT 'invoices', COUNT(*) FROM invoices
UNION ALL SELECT 'proposals', COUNT(*) FROM proposals
UNION ALL SELECT 'contracts', COUNT(*) FROM contracts
UNION ALL SELECT 'time_logs', COUNT(*) FROM time_logs
UNION ALL SELECT 'notifications', COUNT(*) FROM notifications
UNION ALL SELECT 'users', COUNT(*) FROM auth.users
ORDER BY table_name;

Compare against known row counts before the incident. If counts are significantly lower than expected, the wrong restore point may have been selected.
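
The comparison against known counts can be automated if a baseline was recorded before the incident. A minimal TypeScript sketch under stated assumptions: the rows have the shape returned by the query above with counts already converted to numbers, and the 5% tolerance is an arbitrary illustrative default, not a value defined by this guide.

```typescript
// One row of the row-count query output (count converted to a number).
interface RowCount {
  table_name: string;
  row_count: number;
}

// Returns tables that are missing from the restored database or whose count
// dropped by more than maxDropRatio relative to the baseline.
function suspiciousTables(
  baseline: RowCount[],
  restored: RowCount[],
  maxDropRatio: number = 0.05 // assumption: tolerate up to 5% fewer rows
): string[] {
  const restoredCounts: { [table: string]: number } = {};
  restored.forEach((row) => {
    restoredCounts[row.table_name] = row.row_count;
  });

  const flagged: string[] = [];
  baseline.forEach((row) => {
    const present = Object.prototype.hasOwnProperty.call(restoredCounts, row.table_name);
    const minimumExpected = row.row_count * (1 - maxDropRatio);
    if (!present || restoredCounts[row.table_name] < minimumExpected) {
      flagged.push(row.table_name);
    }
  });
  return flagged;
}
```

A flagged table is a prompt to re-check the restore timestamp, not proof of a bad restore; some drop is expected because writes after the restore point are discarded.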

4.2 Verify Latest Transactions Are Present

-- Check most recent invoice
SELECT id, created_at, subtotal_cents, status
FROM invoices
ORDER BY created_at DESC
LIMIT 5;

-- Check most recent client
SELECT id, name, created_at
FROM clients
ORDER BY created_at DESC
LIMIT 5;

-- Check most recent time log
SELECT id, logged_date, hours, created_at
FROM time_logs
ORDER BY created_at DESC
LIMIT 5;

Verify that the most recent records match what users reported seeing before the incident.

4.3 Test Auth Flow

Navigate to the production URL (or staging after env var update)
Attempt to log in with a known admin user
Verify the session is created successfully and the dashboard loads

4.4 Test API Endpoints

# Test an authenticated API endpoint (replace {{ VALID_JWT }} with a valid JWT)
# -i prints the response status line and headers
curl -i -H "Authorization: Bearer {{ VALID_JWT }}" \
  https://{{ YOUR_DOMAIN }}/api/admin/clients

# Should return 200 with client list, not 500

4.5 Verify RLS Policies Are In Place

⚠️

PITR restores ALL schema objects including RLS policies

Supabase PITR restores the entire database state including Row Level Security policies. However, if you ran a manual restore from a SQL dump, verify that RLS is enabled on all tables.

-- Check RLS is enabled on all user-facing tables
SELECT
  tablename,
  rowsecurity AS rls_enabled
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;

-- All rows should show rls_enabled = true
-- If any show false, re-apply migrations immediately

5. Known Limitations

Limitation	Impact	Workaround
PITR creates a new database instance	Connection string changes; all env vars must be updated	Update Vercel env vars and redeploy immediately after restore
PITR does not restore Supabase Storage files	File bytes in storage buckets are not included in PITR	See Storage Restore Guide
PITR retention window is 7 days on Pro	Corruption older than 7 days cannot be PITR-restored	Use oldest available daily backup as fallback
Manual backup restores have a coarser restore point	Restoration is to the daily backup time, not the minute	May lose up to 24h of data if PITR is unavailable
Auth schema (`auth.*`) is included in backup	User IDs are restored, but sessions and JWTs issued before the restore stop working if the restore target uses different JWT keys	Ask users to sign in again and verify user logins post-restore

6. Data Loss Assessment

After restoring, determine exactly what data was lost so it can be communicated to affected clients.

Step 1: Identify the Restore Gap

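-- Count records created after the restore point.
-- Run this against the pre-restore database (if it is still reachable):
-- the restored database contains nothing newer than the restore point.
-- Replace '{{ RESTORE_TIMESTAMP }}' with the actual restore point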
SELECT
  'invoices' AS table_name,
  COUNT(*) AS records_in_gap
FROM invoices
WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
UNION ALL
SELECT 'clients', COUNT(*) FROM clients WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
UNION ALL
SELECT 'projects', COUNT(*) FROM projects WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
UNION ALL
SELECT 'time_logs', COUNT(*) FROM time_logs WHERE created_at > '{{ RESTORE_TIMESTAMP }}';

Step 2: Document Affected Tenants

-- Find which tenants had activity in the lost window
-- Run against the pre-restore database; the restored copy has no rows in this window
SELECT
  tenant_id,
  COUNT(*) AS lost_records
FROM (
  SELECT tenant_id, created_at FROM invoices WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
  UNION ALL
  SELECT tenant_id, created_at FROM clients WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
  UNION ALL
  SELECT tenant_id, created_at FROM projects WHERE created_at > '{{ RESTORE_TIMESTAMP }}'
) activity
GROUP BY tenant_id
ORDER BY lost_records DESC;
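
When communicating the loss to clients, it helps to state the size of the lost window and whether it stayed within the RPO target (1 hour for P1 in the overview). The following is a minimal sketch; the function name and the idea of measuring from the restore point to the moment writes were stopped are assumptions made for illustration.

```typescript
interface LossWindow {
  minutes: number
  withinRpo: boolean
}

// Measures the gap between the restore point and the moment writes stopped,
// then compares it with the RPO target (all values in minutes).
function assessLossWindow(
  restorePoint: Date,
  writesStoppedAt: Date,
  rpoMinutes: number
): LossWindow {
  const elapsedMs = writesStoppedAt.getTime() - restorePoint.getTime()
  const minutes = Math.max(0, Math.round(elapsedMs / 60000))
  return { minutes, withinRpo: minutes <= rpoMinutes }
}
```

A result with withinRpo set to false is worth recording in the post-incident review, since it means the backup strategy did not meet the stated objective.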

7. SQL Verification Queries

Run these after every restore before declaring recovery complete.

-- 1. Review client counts per tenant (a tenant with zero or unexpectedly few
--    clients suggests missing data; isolation itself is enforced by RLS, see checks 5 and 6)
SELECT tenant_id, COUNT(*) AS client_count
FROM clients
GROUP BY tenant_id
ORDER BY client_count DESC;

-- 2. Verify no orphaned foreign keys (projects without clients)
SELECT p.id, p.name, p.client_id
FROM projects p
LEFT JOIN clients c ON c.id = p.client_id
WHERE c.id IS NULL;
-- Should return 0 rows

-- 3. Verify no orphaned invoices (invoices without projects)
SELECT i.id, i.project_id
FROM invoices i
LEFT JOIN projects p ON p.id = i.project_id
WHERE p.id IS NULL;
-- Should return 0 rows

-- 4. Verify time_logs have valid task references
SELECT tl.id, tl.task_id
FROM time_logs tl
LEFT JOIN tasks t ON t.id = tl.task_id
WHERE t.id IS NULL;
-- Should return 0 rows

-- 5. Verify RLS policies exist (count should be > 0)
SELECT COUNT(*) AS policy_count
FROM pg_policies
WHERE schemaname = 'public';

-- 6. Check for any tables with RLS disabled
SELECT tablename
FROM pg_tables
WHERE schemaname = 'public'
  AND rowsecurity = false;
-- Should return 0 rows for all user-facing tables

-- 7. Verify auth users exist
SELECT COUNT(*) AS user_count FROM auth.users;
-- Should match expected number of registered users

-- 8. Check for recently created records (confirms restore point is correct)
SELECT MAX(created_at) AS latest_record
FROM clients;
-- Should be at or just before the target restore timestamp

Storage Restore Guide

This guide covers recovery procedures for Supabase Storage — the object storage layer that holds uploaded documents, avatars, and other files. Storage recovery is fundamentally different from database recovery and requires special handling.

⚠️

PITR does NOT restore storage files

Supabase Point-in-Time Recovery restores the PostgreSQL database only. The actual file bytes stored in Supabase Storage buckets are not included in PITR. After a database restore, database records pointing to storage objects may be restored, but the underlying files may be missing, stale, or orphaned. Always run a storage audit after any database restore.

1. Understanding Supabase Storage Architecture

Supabase Storage is a two-layer system:

Layer	What it contains	Recovery mechanism
PostgreSQL metadata (`storage.objects` table)	File paths, sizes, mime types, bucket IDs, owner references	Restored by PITR / database backup
Object store (S3-compatible)	Actual file bytes	Separate from PITR — contact Supabase Support

When a database restore occurs, the storage.objects table returns to its state at the restore point. However, the actual files in the object store may be ahead or behind that state — creating two types of inconsistency:

Orphaned files: Files exist in the object store but have no matching record in storage.objects
Orphaned records: Records exist in storage.objects but the actual file bytes have been deleted from the object store
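
The two inconsistency types amount to a set difference in each direction. The sketch below assumes you already have two lists of paths: one for files confirmed to exist in the object store and one taken from storage.objects. How the first list is obtained is left open, and the names used here are hypothetical.

```typescript
interface StorageAudit {
  orphanedFiles: string[] // present in the object store, no storage.objects row
  orphanedRecords: string[] // storage.objects row, no file in the object store
}

// Compares the two path lists and reports what is missing on each side.
function auditStorage(storePaths: string[], recordPaths: string[]): StorageAudit {
  const inStore = new Set(storePaths)
  const inRecords = new Set(recordPaths)
  return {
    orphanedFiles: storePaths.filter((p) => !inRecords.has(p)),
    orphanedRecords: recordPaths.filter((p) => !inStore.has(p)),
  }
}
```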

2. What Can Be Restored

Scenario	Database Records	File Bytes	Recovery Path
Records deleted, files intact	Not in DB	In storage	Re-insert records via PITR restore
Records intact, files deleted	In DB	Not in storage	Contact Supabase Support; re-upload from local copies if available
Both deleted	Not in DB	Not in storage	Requires external backup (re-upload from source)
Database corrupted, storage intact	Corrupt	In storage	PITR restore DB, then audit for orphaned files

Contacting Supabase Support for Storage Recovery

If file bytes are lost and you need Supabase to attempt a storage recovery:

Log in to https://supabase.com/dashboard
Open your project
Navigate to Support → New Ticket
Select Incident type
Provide: project reference ID, bucket name(s), approximate date/time of loss, number of files affected
Note that storage backup availability is not guaranteed and may vary by plan

3. Orphaned Files: Files Without Database Records

After a database restore, files may exist in the storage bucket that have no corresponding row in storage.objects. These are "ghost files" — they consume storage but are inaccessible through the API.

Audit Query: Find Orphaned Storage References

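This query identifies storage.objects records that are not referenced by any application table. It is a proxy check: files that exist only in the object store, with no storage.objects row at all, cannot be found with SQL.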

-- Find storage objects not referenced anywhere in the application
-- Adjust the subquery to include all tables that store file paths/references

SELECT
  o.id AS storage_object_id,
  o.name AS file_path,
  o.bucket_id,
  o.created_at,
  o.metadata->>'size' AS file_size_bytes
FROM storage.objects o
WHERE o.bucket_id = 'documents'
  AND o.name NOT IN (
    -- Replace with actual column(s) storing file paths in your schema
    SELECT file_path FROM contracts WHERE file_path IS NOT NULL
    UNION
    SELECT file_path FROM proposals WHERE file_path IS NOT NULL
    -- Add other tables that reference storage objects
  )
ORDER BY o.created_at DESC;

Handling Orphaned Files

If they are recent and belong to a failed upload: safe to delete
If their origin is unknown: keep for 30 days before deleting — a user may be trying to access them
Deletion: use the Supabase Storage API or dashboard to remove orphaned objects after audit
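
The handling rules above can be written down as a small decision function so that audits are applied consistently. This is a sketch under stated assumptions: the `fromFailedUpload` flag is set manually by whoever audits the file, and the 30-day retention is counted from the day the orphan was discovered, which is one possible reading of the rule.

```typescript
type OrphanAction = 'delete' | 'retain'

interface OrphanedFile {
  path: string
  discoveredAt: Date // when the audit first flagged the file (assumption)
  fromFailedUpload: boolean // hypothetical flag set during manual review
}

const RETENTION_DAYS = 30
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Failed uploads can go immediately; anything of unknown origin is kept
// until the retention period has elapsed.
function decideOrphanAction(file: OrphanedFile, now: Date): OrphanAction {
  if (file.fromFailedUpload) return 'delete'
  const ageDays = (now.getTime() - file.discoveredAt.getTime()) / MS_PER_DAY
  return ageDays >= RETENTION_DAYS ? 'delete' : 'retain'
}
```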

4. Orphaned Records: Database Records Without Files

After a database restore, storage.objects records may exist pointing to files that no longer exist in the object store. API calls attempting to access these files will return 404 errors.

Audit Query: Identify Broken File Links

-- This query surfaces storage records — compare against actual storage object list
-- Use Supabase Storage API or dashboard to list actual files in the bucket

SELECT
  id,
  name AS file_path,
  bucket_id,
  created_at,
  last_accessed_at
FROM storage.objects
WHERE bucket_id = 'documents'
ORDER BY created_at DESC;

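To verify file existence, use the Supabase Storage API. Note that list() reflects object metadata and is not recursive, so a listing alone does not prove the file bytes exist. Attempting to download each recorded path is the more reliable check:

import { createServiceClient } from '@/lib/supabase/server'

const supabase = createServiceClient()

// Paths taken from the storage.objects audit query above
const recordedPaths: string[] = [/* e.g. 'tenant-id/contracts/file.pdf' */]

for (const path of recordedPaths) {
  // A failed download indicates the record points at missing or unreadable bytes
  const { error } = await supabase.storage.from('documents').download(path)
  if (error) console.error(`Missing or unreadable: ${path}`, error.message)
}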

For each record in storage.objects with no corresponding file in the object store, the file is unrecoverable unless a local copy exists or Supabase Support can restore it.

5. Bucket Privacy Verification

After any database restore, re-migration, or infrastructure change, verify that bucket policies are correctly configured. An incorrect policy change that makes a private bucket public is a serious security incident.

🚨

The `documents` bucket must ALWAYS be private

The documents bucket contains confidential client contracts, proposals, and financial documents. It must never be set to public. Verify this after every restore.

Verify via Supabase Dashboard

Supabase Dashboard → Storage → Buckets
For the documents bucket: confirm Public bucket toggle is OFF (private)
For the avatars bucket: confirm Public bucket toggle is ON (intentionally public for display)

Verify via SQL

-- Check bucket configurations
SELECT
  id AS bucket_name,
  public AS is_public,
  allowed_mime_types,
  file_size_limit
FROM storage.buckets
ORDER BY id;

-- Expected output:
-- avatars   | true  | ...
-- documents | false | ...

If the documents bucket shows is_public = true, immediately set it to private:

UPDATE storage.buckets
SET public = false
WHERE id = 'documents';

Then verify RLS policies on storage.objects are still enforced:

SELECT *
FROM pg_policies
WHERE tablename = 'objects'
  AND schemaname = 'storage';
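
The bucket check can also be automated as part of post-restore verification. The sketch below takes bucket rows (for example the output of the storage.buckets query above) and compares them with the expected visibility from this guide; the function and type names are hypothetical.

```typescript
interface BucketConfig {
  id: string
  public: boolean
}

// Expected visibility per this guide: documents private, avatars public.
const EXPECTED_VISIBILITY: Record<string, boolean> = {
  documents: false,
  avatars: true,
}

// Returns one human-readable message per bucket that is missing or
// has the wrong public/private setting.
function findVisibilityViolations(
  buckets: BucketConfig[],
  expected: Record<string, boolean> = EXPECTED_VISIBILITY
): string[] {
  const violations: string[] = []
  for (const [id, shouldBePublic] of Object.entries(expected)) {
    const bucket = buckets.find((b) => b.id === id)
    if (!bucket) {
      violations.push(`${id}: bucket is missing`)
    } else if (bucket.public !== shouldBePublic) {
      violations.push(`${id}: expected public=${shouldBePublic}, found public=${bucket.public}`)
    }
  }
  return violations
}
```

An empty result means both buckets match the expected configuration; any message mentioning documents should be treated as a security incident.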

6. Document-by-Document Recovery Procedure

If file bytes are lost and must be recovered manually (from local engineer copies, client email attachments, or external sources), use the following procedure:

Step 1: Identify Missing Files

Run the orphaned records audit query (Section 4) to get a list of file paths that need recovery.

Step 2: Collect Source Files

Check with the client for original documents
Check engineer email/Slack for any files shared during onboarding
Check local machine downloads folders if the file was ever downloaded during review

Step 3: Re-upload via API

import { createServiceClient } from '@/lib/supabase/server'
import { readFileSync } from 'fs'

const supabase = createServiceClient()
const fileBuffer = readFileSync('/path/to/recovered-file.pdf')

const { data, error } = await supabase.storage
  .from('documents')
  .upload('tenant-id/contracts/original-file-name.pdf', fileBuffer, {
    contentType: 'application/pdf',
    upsert: true, // overwrite if record exists but file is missing
  })

if (error) {
  console.error('Upload failed:', error)
} else {
  console.log('Recovered file uploaded:', data.path)
}

Step 4: Verify

After re-uploading, confirm the file is accessible:

const { data, error } = await supabase.storage
  .from('documents')
  .createSignedUrl('tenant-id/contracts/original-file-name.pdf', 60)

if (error) throw error

// Open data.signedUrl within 60 seconds, then download and verify the file is readable

7. Prevention: Periodic External Backup

Supabase Storage backup availability is not guaranteed at all tiers. To reduce exposure, consider implementing a periodic export of critical documents:

Recommended Approach

Identify critical document types — contracts, signed proposals, client onboarding documents
Create a nightly export script that:
Queries storage.objects for all documents
Downloads each file using the service role key
Uploads to an external storage provider (AWS S3, Cloudflare R2, Backblaze B2)
Run the script as a Vercel cron job or external scheduled task

Example Export Script (Node.js)

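// /scripts/backup-storage.ts
// Run with: npx ts-node scripts/backup-storage.ts

import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)

const BUCKET = 'documents'
const PAGE_SIZE = 1000

// list() returns one folder level at a time, so walk folders recursively
// and page through each level with `offset`.
async function listAllFiles(prefix = ''): Promise<string[]> {
  const paths: string[] = []
  for (let offset = 0; ; offset += PAGE_SIZE) {
    const { data, error } = await supabase.storage
      .from(BUCKET)
      .list(prefix, { limit: PAGE_SIZE, offset })
    if (error) throw error
    const entries = data ?? []
    for (const entry of entries) {
      const path = prefix ? `${prefix}/${entry.name}` : entry.name
      if (!entry.id) {
        // Folder entries come back without an id; descend into them
        paths.push(...(await listAllFiles(path)))
      } else {
        paths.push(path)
      }
    }
    if (entries.length < PAGE_SIZE) break
  }
  return paths
}

async function backupDocuments() {
  for (const path of await listAllFiles()) {
    const { data, error } = await supabase.storage.from(BUCKET).download(path)
    if (error || !data) {
      console.error(`Failed to download ${path}:`, error?.message)
      continue
    }
    // Upload to external backup provider
    // await s3Client.putObject({ Bucket: '...', Key: path, Body: data })
    console.log(`Backed up: ${path}`)
  }
}

backupDocuments().catch((err) => {
  console.error(err)
  process.exit(1)
})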

💡

Store backup credentials separately

The external backup storage credentials should be stored in a separate secrets manager (not in the same Vercel project). If the main Vercel project is compromised, backup credentials should remain unaffected.

Environment Recreation Guide

This guide covers full ground-up recreation of all SONAN DIGITAL CRM infrastructure. Use this only when restoration from existing backups is not possible — for example, after account loss, catastrophic multi-service failure, or a security incident requiring a complete teardown.

🚨

This is a last resort

Environment recreation causes extended downtime and potential data loss. Before proceeding, confirm that none of the following faster options are available: Vercel deployment rollback, Supabase PITR restore, or credential rotation. If any of those paths are open, use them first.

Estimated total time: 2–4 hours for an experienced engineer
Prerequisite knowledge: Supabase, Vercel, Stripe, Resend, Cloudflare, GitHub Actions

Prerequisites

Ensure you have active access to all of the following before starting:

Account	Purpose	Access Needed
GitHub	Source of truth for code and migrations	Repo read + Actions write
Supabase	Database, Auth, Storage	Organization owner or project creator
Vercel	Hosting and edge functions	Project creator + env var access
Stripe	Payment processing	Account owner (to create webhooks)
Resend	Transactional email	API key creation
Cloudflare	DNS management	Zone editor for the domain

Have the following available before starting:

The GitHub repository URL: {{ GITHUB_REPO_URL }}
The custom domain: {{ CUSTOM_DOMAIN }}
The most recent database backup file (from Supabase or external backup)
All migration SQL files from the repository (/supabase/migrations/)
Contact information for each provider's support if something goes wrong

Step 1: Create Supabase Project

1.1 Create the Project

Log in to https://supabase.com/dashboard
Select the correct organization (not personal)
Click New Project
Set:
Name: sonan-digital-crm (or equivalent)
Database Password: generate a strong password — save it immediately in a password manager
Region: Southeast Asia (Singapore) — ap-southeast-1
Plan: Pro (required for PITR and production readiness)
Click Create new project — provisioning takes 1–2 minutes

1.2 Record Credentials

Once the project is ready, note the following from Settings → API:

Variable	Value
`NEXT_PUBLIC_SUPABASE_URL`	`https://{{ PROJECT_REF }}.supabase.co`
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	`{{ ANON_KEY }}`
`SUPABASE_SERVICE_ROLE_KEY`	`{{ SERVICE_ROLE_KEY }}`
Database connection string	`postgresql://postgres:{{ DB_PASSWORD }}@db.{{ PROJECT_REF }}.supabase.co:5432/postgres`

1.3 Apply Database Migrations

Option A: Using Supabase CLI (preferred)

# Run the Supabase CLI through npx (installing it globally with npm is not supported)
# Login
npx supabase login

# Link to new project
npx supabase link --project-ref {{ PROJECT_REF }}

# Push all migrations
npx supabase db push

Option B: Manual SQL execution

In the Supabase Dashboard, go to SQL Editor
Open each migration file from the repository under /supabase/migrations/ in order (sorted by filename/timestamp)
Execute each file in sequence
Verify no errors before proceeding to the next migration
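
For the manual path, getting the execution order right matters more than anything else. As a minimal sketch, assuming migration files follow the `<timestamp>_<name>.sql` naming produced by the Supabase CLI, the order can be computed rather than eyeballed; the function name is hypothetical.

```typescript
// Extracts the numeric version prefix, e.g. "20240101090000" from
// "20240101090000_init.sql".
function migrationVersion(filename: string): string {
  return filename.slice(0, filename.indexOf('_'))
}

// Keeps only files shaped like <digits>_<name>.sql and sorts them by version.
// Shorter prefixes sort first so that numeric order holds even if prefix
// lengths differ.
function orderMigrations(filenames: string[]): string[] {
  return filenames
    .filter((name) => /^\d+_.+\.sql$/.test(name))
    .sort((a, b) => {
      const va = migrationVersion(a)
      const vb = migrationVersion(b)
      if (va.length !== vb.length) return va.length - vb.length
      return va < vb ? -1 : va > vb ? 1 : 0
    })
}
```

Run the files in exactly the returned order and stop at the first error.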

Verify migrations applied:

-- Check migration history table (if using Supabase CLI)
SELECT * FROM supabase_migrations.schema_migrations ORDER BY version;

-- Check key tables exist
SELECT tablename
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;

1.4 Restore Data (if available)

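If a database backup is available, restore it now before configuring Auth (to avoid user ID conflicts). If the backup is a full dump that also contains schema definitions, expect "already exists" errors for objects created by the migrations in 1.3, and review the psql output rather than assuming a clean run: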

psql "postgresql://postgres:{{ DB_PASSWORD }}@db.{{ PROJECT_REF }}.supabase.co:5432/postgres" \
  < /path/to/backup.sql

See Database Restore Guide for full restore steps.

1.5 Configure Supabase Auth

Navigate to Authentication → Providers
Enable Email provider
Set the following:
Confirm email: Enabled
Secure email change: Enabled
Enable email signup: Enabled
Navigate to Authentication → URL Configuration
Set Site URL: https://{{ CUSTOM_DOMAIN }}
Add Redirect URLs:
https://{{ CUSTOM_DOMAIN }}/auth/callback
https://{{ CUSTOM_DOMAIN }}/auth/confirm
http://localhost:3000/auth/callback (for local development)
Navigate to Authentication → MFA
Enable TOTP (Time-Based One-Time Password) for MFA

1.6 Create Storage Buckets

Navigate to Storage → New Bucket and create:

Bucket Name	Public	Purpose
`documents`	No (Private)	Client contracts, proposals, confidential files
`avatars`	Yes (Public)	User profile photos

🚨

The documents bucket must be private

Never create the documents bucket as public. Verify the toggle is OFF before saving.

After creating the buckets, verify Storage RLS policies are in place by checking that the migration SQL included policy definitions for storage.objects.

Step 2: Create Vercel Project

2.1 Import from GitHub

Log in to https://vercel.com
Click Add New → Project
Select Import Git Repository
Connect to GitHub and select {{ GITHUB_REPO_URL }}
Set the Framework Preset to Next.js
Set Root Directory to / (or the app root if monorepo)
Do not deploy yet — configure environment variables first

2.2 Configure Environment Variables

Navigate to Settings → Environment Variables and set all of the following. Apply to Production environment (and Preview/Development as needed):

Variable	Value Source
`NEXT_PUBLIC_SUPABASE_URL`	From Step 1.2
`NEXT_PUBLIC_SUPABASE_ANON_KEY`	From Step 1.2
`SUPABASE_SERVICE_ROLE_KEY`	From Step 1.2
`RESEND_API_KEY`	From Step 4 below
`STRIPE_SECRET_KEY`	From Step 3 below
`STRIPE_WEBHOOK_SECRET`	From Step 3 below
`NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY`	From Stripe Dashboard → Developers → API Keys
`CRON_SECRET`	Generate: `openssl rand -base64 32`
`NEXT_PUBLIC_SENTRY_DSN`	From Sentry project settings
`SENTRY_AUTH_TOKEN`	From Sentry account → Auth Tokens (org:ci scope required)

💡

Set CRON_SECRET before deploying

The cron endpoint at /api/admin/appointments/cron checks for the CRON_SECRET header. If not set, the endpoint will be unprotected. Generate and set this before the first deploy.
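
The reason an unset secret is dangerous is that a naive comparison against an undefined value can end up authorizing requests. The sketch below shows a check that fails closed. It assumes the secret arrives as an `Authorization: Bearer <CRON_SECRET>` header, which is the convention Vercel cron jobs use; the actual endpoint in this codebase may read a different header, and the function names are hypothetical.

```typescript
// Compares two strings without stopping at the first differing character.
// Strings of different length are rejected immediately.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false
  let diff = 0
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  }
  return diff === 0
}

// Fails closed: a missing secret or a missing header never authorizes.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined
): boolean {
  if (!cronSecret || !authorizationHeader) return false
  return safeEqual(authorizationHeader, `Bearer ${cronSecret}`)
}
```

In a route handler this would be called with `request.headers.get('authorization')` and `process.env.CRON_SECRET`, returning 401 when the result is false.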

2.3 Configure Build Settings

In Vercel project settings:

Build Command: next build (default)
Output Directory: .next (default)
Install Command: npm install or pnpm install (match the repo's package manager)
Node.js Version: 20.x (match the repo's .nvmrc or engines field in package.json)

2.4 Configure Custom Domain

Vercel project → Settings → Domains
Add {{ CUSTOM_DOMAIN }}
Vercel will show DNS records to add in Cloudflare
In Cloudflare Dashboard → DNS:
Add the CNAME or A record as shown by Vercel
Set Proxy status to DNS only (grey cloud) initially — switch to proxied after verifying SSL
Wait for DNS propagation (typically 1–5 minutes with Cloudflare)
Vercel will automatically provision an SSL certificate via Let's Encrypt

2.5 Deploy

Trigger the first deploy from Vercel dashboard → Deployments → Redeploy (or push a commit to main)
Monitor the build log for errors
Once deployed, verify the custom domain resolves to the app

Step 3: Configure Stripe

3.1 Locate Stripe Keys

Log in to https://dashboard.stripe.com
Navigate to Developers → API Keys
Copy the Publishable key and Secret key
Set these in Vercel env vars (NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY, STRIPE_SECRET_KEY)

⚠️

Use live mode keys for production

Stripe Dashboard defaults to test mode. Switch to Live mode (toggle in the top-left) before copying keys for production environment variables.

3.2 Create Webhook Endpoint

Stripe Dashboard → Developers → Webhooks
Click Add endpoint
Set Endpoint URL: https://{{ CUSTOM_DOMAIN }}/api/webhooks/stripe
Under Events to send, select:
payment_intent.succeeded
checkout.session.completed
invoice.payment_succeeded
customer.subscription.updated
customer.subscription.deleted
Click Add endpoint
On the endpoint detail page, click Reveal under Signing secret
Copy the signing secret → set as STRIPE_WEBHOOK_SECRET in Vercel
Trigger a Vercel redeploy to pick up the new webhook secret

3.3 Verify Webhook

From the Stripe webhook detail page, click Send test webhook
Select checkout.session.completed event
Click Send test webhook
Verify the endpoint returns 200 OK in the response

Step 4: Configure Resend

4.1 Verify Domain

Log in to https://resend.com
Navigate to Domains → Add Domain
Enter {{ CUSTOM_DOMAIN }} (or the email sending subdomain, e.g., mail.{{ CUSTOM_DOMAIN }})
Resend will provide DNS records (SPF, DKIM, DMARC)
Add these records in Cloudflare DNS

ℹ️

DNS propagation for email records

SPF and DKIM records can take up to 24 hours to propagate fully, though Cloudflare typically propagates within minutes. Do not consider email configuration complete until Resend shows all records as Verified.

4.2 Create API Key

Resend Dashboard → API Keys → Create API Key
Set name: sonan-digital-production
Set permission: Full access (or Sending access if minimal permissions are preferred)
Copy the key → set as RESEND_API_KEY in Vercel
Trigger a Vercel redeploy

4.3 Verify Email Sending

Send a test email via the API:

curl -X POST 'https://api.resend.com/emails' \
  -H 'Authorization: Bearer {{ RESEND_API_KEY }}' \
  -H 'Content-Type: application/json' \
  -d '{
    "from": "no-reply@{{ CUSTOM_DOMAIN }}",
    "to": ["your-email@example.com"],
    "subject": "DR Test — Email Sending Verified",
    "html": "<p>Email sending is working correctly.</p>"
  }'

Verify the email is received.

Step 5: Verify All Integrations End-to-End

Before announcing recovery, complete this full integration verification:

5.1 Auth Flow

[ ] Navigate to https://{{ CUSTOM_DOMAIN }}/auth/login
[ ] Log in with a known admin account
[ ] Verify the admin dashboard loads with data
[ ] Log out and log back in — confirm session persistence works

5.2 Database Connectivity

[ ] Admin dashboard shows existing clients/projects (confirms DB connection and RLS)
[ ] Create a new test record — confirm it persists on page refresh
[ ] Verify multi-tenant isolation: log in as two different tenant admins and confirm each sees only their data

5.3 File Storage

[ ] Upload a file in the documents section
[ ] Verify the file appears in the list
[ ] Download the file — confirm the content is correct
[ ] Verify the Supabase Storage → documents bucket shows the file
[ ] Confirm the signed URL expires (do not test with public access)

5.4 Email Notifications

[ ] Trigger an action that sends a notification email (e.g., invite a new user)
[ ] Verify the email is received within 2 minutes
[ ] Check Resend Dashboard → Logs to confirm delivery

5.5 Stripe (Test Mode First)

[ ] Switch Stripe to Test mode temporarily
[ ] Create a test invoice and attempt payment with Stripe test card 4242 4242 4242 4242
[ ] Verify the webhook fires and the invoice status updates in the CRM
[ ] Switch Stripe back to Live mode

5.6 Error Monitoring

[ ] Verify Sentry is receiving events by triggering a known test error
[ ] Check Sentry Dashboard for the event

5.7 Cron Jobs

[ ] Verify /api/admin/appointments/cron returns 200 when called with the correct CRON_SECRET header
[ ] Verify it returns 401 without the header

Estimated Timeline

Step	Task	Estimated Time
1	Create Supabase project	10 minutes
1.3	Apply migrations	20–40 minutes
1.4	Restore data from backup	15–60 minutes (depends on DB size)
1.5–1.6	Configure Auth + Storage	10 minutes
2	Create Vercel project + env vars	20 minutes
2.4	Configure DNS	10 minutes + propagation
3	Configure Stripe + webhook	15 minutes
4	Configure Resend	10 minutes + DNS propagation
5	End-to-end verification	30–45 minutes
Total		2–4 hours

💡

Work in parallel where possible

DNS propagation for both the custom domain (Step 2.4) and Resend domain (Step 4.1) can be initiated early and verified later. Kick off DNS changes as soon as possible, then continue with the other steps while propagation completes.

Rollback Guide

This guide explains how to roll back a production Vercel deployment, when rollback is and is not safe, and how to handle the most complex rollback scenario: when a database migration was included in the deployment being rolled back.

1. What Vercel Rollback Does (and Does Not Do)

What rollback does:

Instantly swaps the edge function code and static assets served to users
Routes production traffic to the selected previous deployment
Takes effect globally within ~30 seconds

What rollback does NOT do:

Does not revert the database schema. Supabase migrations are independent of Vercel deployments
Does not roll back environment variable changes. Each deployment keeps the values it was built with, but edits made in the project settings persist and apply to any future build
Does not undo any data changes made by the bad deployment while it was live

This distinction is critical. After a code rollback, the database is always in whatever state the bad deployment left it. If the bad deployment ran a destructive migration or corrupted data, a code rollback alone is insufficient — you also need a Database Restore.

2. When to Roll Back

Situation	Roll Back?	Notes
5xx errors on new deployment, no schema changes	Yes — immediately	Safest rollback scenario
New feature causing data display bugs (no corruption)	Yes	Safe if no migration was in the deployment
Performance regression introduced by new code	Yes	Safe
Security issue in new code (XSS, auth bypass)	Yes — immediately	Also audit for exploitation during the window
Deployment included a migration; code is failing	Careful: see § 5	Need to evaluate if old code is compatible with new schema
Deployment included a migration; code is working but migration has a bug	Do not roll back code	Fix the migration issue separately
Data was corrupted by the new deployment	Roll back code + restore DB	See Database Restore Guide

💡

Default to rollback for 5xx errors

If a deployment causes widespread 5xx errors and you don't know why, roll back first and investigate second. The cost of 5 extra minutes of downtime while you investigate is almost always higher than the cost of spending 5 minutes post-rollback understanding the root cause.

3. How to Roll Back via Vercel Dashboard

Open the Vercel Dashboard
Select the SONAN DIGITAL project
Click Deployments in the left sidebar — this shows the full deployment history
Identify the target deployment — look for the deployment immediately before the current (failing) one, or use the timestamp to find the last known-good deployment
Verify the target by checking its Git commit SHA — match it to the last commit you know was working
Click the ⋯ (three-dot menu) on the target deployment row
Select Promote to Production
Confirm the modal — Vercel will begin the promotion immediately
Traffic will switch within ~30 seconds

Identifying the Right Target Deployment

By time: Look for the deployment timestamped just before the incident started
By commit: Cross-reference the deployment's Git SHA with your commit history in GitHub
By environment: Check that the target deployment was previously serving production traffic successfully (it will have a Production badge in its history)

⚠️

Do not promote a preview deployment to production

Preview deployments may have different environment variables or feature flags. Only promote a deployment that was previously serving as the Production deployment.

4. Safe Rollback Checklist

Work through this checklist before and after every rollback:

Pre-Rollback

[ ] Identify the target deployment SHA — note the Git commit SHA of the deployment you will promote
[ ] Verify the target was previously working — confirm from deployment history that it served production successfully before the current deployment
[ ] Check for database migrations since the target deployment. Run:

git log --oneline {{ TARGET_SHA }}..HEAD -- supabase/migrations/

If this returns any commits, migrations have been added since the target. See § 5 below.
[ ] Check for environment variable changes — if new env vars were added in the current deployment, the old code may fail if it tries to read them. Verify the old code doesn't require env vars not present at target time.

Post-Rollback

[ ] Verify the production URL serves the correct version — check the page footer, API version header, or a known UI change to confirm the old code is live
[ ] Monitor Sentry error rates for 15 minutes — error rate should return to pre-incident baseline
[ ] Check Vercel function logs for any new errors in the rolled-back version
[ ] Verify authenticated flows — log in, load data, trigger a key user action
[ ] Open a post-incident ticket documenting what caused the need to roll back

5. Handling Database Migration Conflicts

This is the most complex rollback scenario. It occurs when:

A deployment added a new database migration (new column, renamed column, dropped column, new table, changed RLS policy)
That deployment's code is failing and you want to roll back to the previous code
The previous code was written against the old schema

The risk: rolling back the code while the new schema is still in place may cause the old code to fail in new ways (querying a column that no longer exists, missing a required column, etc.).

Assessment Matrix

Migration type	Old code compatibility	Action
Additive (new table, new nullable column)	Old code usually safe — it ignores new columns	Roll back code; new schema is backward compatible
New non-null column with default	Old code usually safe — DB provides the default	Roll back code; verify no INSERT errors
Renamed column	Old code will fail — it references old name	Must write a DOWN migration before rolling back code
Dropped column	Old code will fail — it tries to SELECT dropped column	Must restore column before rolling back code
Changed RLS policy	Depends on direction — restrictive change may break old code reads	Evaluate and potentially revert RLS policy
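
When a deployment bundles several migrations, the most restrictive row of the matrix decides what to do. The following sketch encodes the matrix as a lookup and returns the strictest verdict; the type names, verdict labels, and function name are illustrative assumptions rather than existing tooling.

```typescript
type MigrationKind =
  | 'additive'
  | 'non_null_with_default'
  | 'renamed_column'
  | 'dropped_column'
  | 'rls_policy_change'

type Verdict = 'roll_back_now' | 'evaluate_first' | 'fix_schema_first'

// One entry per row of the assessment matrix.
const VERDICT_BY_KIND: Record<MigrationKind, Verdict> = {
  additive: 'roll_back_now',
  non_null_with_default: 'roll_back_now',
  rls_policy_change: 'evaluate_first',
  renamed_column: 'fix_schema_first',
  dropped_column: 'fix_schema_first',
}

// Ordered from least to most restrictive.
const SEVERITY: Verdict[] = ['roll_back_now', 'evaluate_first', 'fix_schema_first']

// Returns the most restrictive verdict across all migrations in the deployment.
function assessRollback(kinds: MigrationKind[]): Verdict {
  let worst: Verdict = 'roll_back_now'
  for (const kind of kinds) {
    const verdict = VERDICT_BY_KIND[kind]
    if (SEVERITY.indexOf(verdict) > SEVERITY.indexOf(worst)) worst = verdict
  }
  return worst
}
```

A verdict of fix_schema_first corresponds to the rename and drop cases, where the schema must be reverted before the code is rolled back.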

Writing and Applying a DOWN Migration

A DOWN migration reverses the UP migration SQL. Write it manually and apply via the Supabase SQL editor.

Example: Reversing a column rename

-- UP migration (the bad deployment ran this)
ALTER TABLE clients RENAME COLUMN company TO company_name;

-- DOWN migration (you write this to make old code work)
ALTER TABLE clients RENAME COLUMN company_name TO company;

Applying a DOWN migration:

Open Supabase Dashboard → SQL Editor
Paste the DOWN migration SQL
Click Run
Verify the change took effect:

-- Verify column exists with old name
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'clients'
  AND column_name = 'company';

Now roll back the code in Vercel (the old code is now compatible with the reverted schema)

🚨

DOWN migrations can cause data loss

Reversing a migration that added data (e.g., a migration that ran a backfill) may result in data loss. Always inspect the UP migration carefully and determine if the DOWN migration is safe before applying it.

6. Edge Runtime Rollback Limitation

The SONAN DIGITAL CRM uses export const runtime = 'edge' on all routes. This means:

All routes are deployed as a single Vercel edge deployment
There is no partial rollback — you cannot roll back one route while keeping another
A rollback promotes the entire deployment (all pages, all API routes, all edge functions) to the target state

This is usually the correct behavior — a deployment is an atomic unit. However, it means you cannot surgically roll back a single broken API route. Your only options are:

Roll back the entire deployment
Fix the bug and redeploy
Add a temporary feature flag or early-return to the broken route and redeploy
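As a minimal sketch of the third option, assuming the broken route is a Next.js App Router route handler, an early return could look like the following. The file path `app/api/reports/route.ts`, the `ROUTE_DISABLED` flag, and the response body are hypothetical placeholders, not names from the CRM codebase.

```typescript
// Hypothetical file: app/api/reports/route.ts
export const runtime = 'edge';

// Temporary kill switch (hypothetical). Set to false, or delete it,
// once the real fix has been deployed.
const ROUTE_DISABLED: boolean = true;

export function GET(): Response {
  if (ROUTE_DISABLED) {
    // Fail fast with 503 so clients and monitors see a deliberate,
    // temporary outage rather than an unexplained 500.
    return new Response(
      JSON.stringify({ error: "Temporarily unavailable" }),
      {
        status: 503,
        headers: {
          "Content-Type": "application/json",
          // Hint to clients to retry in five minutes.
          "Retry-After": "300",
        },
      },
    );
  }

  // The original handler logic would continue here.
  return new Response(JSON.stringify({ ok: true }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

Because the whole project is still redeployed as one unit, this contains the damage to one route but does not avoid a redeploy.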

7. Vercel CLI Rollback (Alternative)

If the Vercel dashboard is unavailable, roll back via the Vercel CLI:

# Install Vercel CLI
npm install -g vercel

# Login
vercel login

# List recent deployments
vercel ls --scope {{ VERCEL_TEAM_SLUG }}

# Promote a specific deployment by URL or ID
vercel promote {{ DEPLOYMENT_URL }} --scope {{ VERCEL_TEAM_SLUG }}

The vercel promote command is equivalent to clicking "Promote to Production" in the dashboard.

Emergency Recovery Checklist

Print this page or keep it bookmarked. Use it at the start of every incident.

This checklist is designed to be followed sequentially under stress. Do not skip steps. Each step is short and actionable.

🚨 Step 1: Identify & Classify

[ ] What is broken? (e.g., "all pages 500", "auth failing", "invoices missing")
[ ] When did it start? Note the exact time: ______________
[ ] Is it P1 (full outage / data loss), P2 (major feature), P3 (non-critical feature), or P4 (cosmetic)?
[ ] Is it still ongoing or intermittent?

Classification	Criteria
P1 Critical	Full platform down, confirmed data loss, auth broken for all users
P2 High	Major feature down (billing, contracts, proposals), significant user impact
P3 Medium	Non-critical feature down, limited user impact
P4 Low	Cosmetic issue, isolated to one user

📢 Step 2: Communicate

[ ] Notify the Engineering Lead immediately (P1/P2)
[ ] Notify the Project Owner (P1 only)
[ ] If P1 and clients are affected: prepare a brief, non-technical status message:

"We are aware of an issue affecting the platform and are actively working to resolve it. An update will follow within [X] hours."
[ ] Do NOT share: stack traces, database errors, provider names, root cause guesses

🗂️ Step 3: Preserve Evidence

[ ] Screenshot or copy all error messages visible in the browser
[ ] Open Vercel Dashboard → Logs → filter to the incident timeframe → copy errors
[ ] Open Sentry Dashboard → find the active error spike → copy the error title and first occurrence time
[ ] Note the last deployment SHA and timestamp from Vercel Deployments page
[ ] Note the last database migration applied (check supabase/migrations/ in GitHub for latest file)
[ ] Save all of the above to a shared doc or thread — you will need this for the post-incident review

🔍 Step 4: Triage (External Status Checks)

Check each provider's status page before assuming the problem is in your code:

[ ] Vercel: https://www.vercel-status.com — any active incidents?
[ ] Supabase: https://status.supabase.com — any active incidents in ap-southeast-1?
[ ] Stripe: https://status.stripe.com — any active incidents?
[ ] Resend: https://status.resend.com — any active incidents?
[ ] Was there a recent deployment? Check Vercel Deployments — did a deploy go out in the last 2 hours?
[ ] Was there a recent DB migration? Check GitHub commits to supabase/migrations/ — any new files in the last 24 hours?
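The last two checks are time-window comparisons. The sketch below shows one way to express them, using the 2-hour and 24-hour windows from the checklist. The function and field names are hypothetical, and the timestamps are assumed to be copied by hand from the Vercel Deployments page and the GitHub commit history.

```typescript
const HOUR_MS = 60 * 60 * 1000;

interface ChangeSuspects {
  recentDeploy: boolean;    // a deploy went out within the last 2 hours
  recentMigration: boolean; // a migration was committed within the last 24 hours
}

// `reference` is normally "now" or the time the incident was first noticed.
function changeSuspects(
  reference: Date,
  lastDeployAt: Date,
  lastMigrationAt: Date,
): ChangeSuspects {
  const sinceDeploy = reference.getTime() - lastDeployAt.getTime();
  const sinceMigration = reference.getTime() - lastMigrationAt.getTime();
  return {
    // Negative values mean the change happened after the reference time.
    recentDeploy: sinceDeploy >= 0 && sinceDeploy <= 2 * HOUR_MS,
    recentMigration: sinceMigration >= 0 && sinceMigration <= 24 * HOUR_MS,
  };
}

const suspects = changeSuspects(
  new Date("2024-05-01T12:00:00Z"),
  new Date("2024-05-01T11:00:00Z"), // deploy one hour earlier
  new Date("2024-04-29T12:00:00Z"), // migration two days earlier
);
console.log(suspects); // { recentDeploy: true, recentMigration: false }
```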

🛠️ Step 5: Route to the Correct Runbook

Based on your triage, go to the relevant section of Production Recovery Runbook:

[ ] Vercel deployment failure / code error → Scenario 1: Vercel Deployment Failure
[ ] Supabase status page shows an incident → Scenario 2: Supabase Outage
[ ] Data is missing or incorrect → Scenario 3: Data Corruption
[ ] Credentials may be compromised → Scenario 4: Credential Compromise
[ ] Full infrastructure failure → Scenario 5: Full Infrastructure Recreation

⚡ Step 6: Attempt Fast Recovery First

Before reaching for complex solutions, try these quick wins in order:

[ ] Can you roll back the last deployment? → Rollback Guide — takes ~2 minutes
[ ] Is it a Supabase-managed outage? → Nothing to do but wait and communicate
[ ] Is it a single broken route? → Add an early return 503 to that route and redeploy while you fix it
[ ] Is it a missing environment variable? → Check Vercel env vars, add the missing value, redeploy

🗄️ Step 7: Database Recovery (if data loss or corruption confirmed)

[ ] Identify the exact timestamp corruption began
[ ] Confirm PITR is available (Supabase Pro plan): Supabase Dashboard → Settings → Database → Backups
[ ] Choose a restore timestamp 5 minutes before the identified corruption time
[ ] Initiate PITR restore: Database Restore Guide
[ ] After restore: update Supabase connection strings in Vercel env vars
[ ] Trigger Vercel redeploy
[ ] Run post-restore verification SQL queries
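The restore timestamp in the third item is simple arithmetic, but it is easy to get wrong under pressure. A minimal sketch, assuming the corruption time has been recorded in UTC (the function name `pitrRestoreTarget` is hypothetical):

```typescript
// Safety margin from the checklist: restore to 5 minutes before corruption.
const RESTORE_MARGIN_MS = 5 * 60 * 1000;

function pitrRestoreTarget(corruptionStart: Date): Date {
  return new Date(corruptionStart.getTime() - RESTORE_MARGIN_MS);
}

const target = pitrRestoreTarget(new Date("2024-05-01T10:00:00Z"));
console.log(target.toISOString()); // 2024-05-01T09:55:00.000Z
```

Working in UTC ISO strings throughout avoids local-time ambiguity when the value is written into the incident notes. Any writes made after the chosen timestamp are not part of the restored state and belong in the data loss assessment.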

💾 Step 8: Storage Recovery (if files are missing)

[ ] Confirm whether DB records exist for the missing files (run orphaned records audit query)
[ ] Confirm whether file bytes exist in Supabase Storage bucket (check Storage dashboard)
[ ] If file bytes are lost: contact Supabase Support with project ref, bucket name, and date of loss
[ ] If recoverable from local copies: re-upload via service role client
[ ] Verify documents bucket is still private after any storage operation
[ ] See full procedure: Storage Restore Guide
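The first two checks amount to comparing two lists of paths: the file paths referenced by database records and the object paths actually present in the bucket. The sketch below shows that comparison only. How the two lists are obtained (the audit queries, the Storage dashboard) is covered in the Storage Restore Guide, and the function name and sample paths here are hypothetical.

```typescript
interface StorageAudit {
  missingFiles: string[];  // DB record exists, file bytes do not
  orphanedFiles: string[]; // file bytes exist, DB record does not
}

function auditStoragePaths(dbPaths: string[], storagePaths: string[]): StorageAudit {
  const inStorage = new Set(storagePaths);
  const inDb = new Set(dbPaths);
  return {
    missingFiles: dbPaths.filter((p) => !inStorage.has(p)),
    orphanedFiles: storagePaths.filter((p) => !inDb.has(p)),
  };
}

const audit = auditStoragePaths(
  ["tenant-a/contract.pdf", "tenant-a/invoice.pdf"],
  ["tenant-a/contract.pdf", "tenant-b/old-upload.pdf"],
);
console.log(audit.missingFiles);  // [ 'tenant-a/invoice.pdf' ]
console.log(audit.orphanedFiles); // [ 'tenant-b/old-upload.pdf' ]
```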

🔑 Step 9: Credential Rotation (if secrets compromised)

Rotate in this order — do not skip any:

[ ] Supabase Service Role Key → Supabase Dashboard → Settings → API → Rotate
[ ] Stripe Secret Key → Stripe Dashboard → Developers → API Keys → Roll
[ ] Stripe Webhook Secret → Stripe Dashboard → Developers → Webhooks → Roll signing secret
[ ] Resend API Key → Resend Dashboard → API Keys → Delete old → Create new
[ ] CRON_SECRET → Generate new: openssl rand -base64 32
[ ] Update all rotated values in Vercel → Environment Variables
[ ] Trigger Vercel redeploy
[ ] Verify all integrations work after rotation
[ ] See full procedure: Production Recovery Runbook § Scenario 4

🏗️ Step 10: Full Infrastructure Recreation (last resort)

Only if no other recovery path is available:

[ ] Confirm with Engineering Lead and Project Owner that recreation is the only path
[ ] Estimate downtime and communicate to clients: ___ to ___ hours
[ ] Follow Environment Recreation Guide — estimated 2–4 hours
[ ] Complete every verification step before announcing recovery

✅ Step 11: Post-Recovery Verification

Run these checks before declaring the incident resolved:

[ ] Home page (/) returns HTTP 200
[ ] Admin login works end-to-end
[ ] At least one authenticated API call returns correct data
[ ] File upload and download works
[ ] Sentry error rate has returned to baseline (monitor for 15 minutes)
[ ] Vercel function logs show no new errors
[ ] Stripe: no unexpected events in Stripe Dashboard since recovery
[ ] Resend: no unexpected emails in Resend logs since recovery
[ ] Notify team and project owner that the incident is resolved
[ ] Send client communication if applicable: "The issue affecting the platform has been resolved as of [time]. We apologize for the disruption."
[ ] Open a post-incident review ticket — every P1 and P2 requires a PIR within 48 hours

📋 Post-Incident Review Requirements

A Post-Incident Review (PIR) must be completed within 48 hours of resolution for all P1 and P2 incidents.

The PIR must include:

[ ] Timeline with timestamps: detection → escalation → resolution
[ ] Root cause explanation
[ ] Client impact (who was affected, for how long, what was unavailable)
[ ] Actions taken during recovery
[ ] What went well during the response
[ ] What could be improved in the response or tooling
[ ] Prevention items — specific tickets created to prevent recurrence
[ ] DR plan updated if this incident revealed a documentation gap
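The 48-hour rule can be turned into a concrete due date when the post-incident ticket is opened. A minimal sketch, with hypothetical names (`Priority`, `pirDeadline`) and the resolution time assumed to be recorded in UTC:

```typescript
type Priority = "P1" | "P2" | "P3" | "P4";

const PIR_WINDOW_MS = 48 * 60 * 60 * 1000;

// Returns the PIR due date, or null when no PIR is required for this priority.
function pirDeadline(priority: Priority, resolvedAt: Date): Date | null {
  if (priority !== "P1" && priority !== "P2") {
    return null;
  }
  return new Date(resolvedAt.getTime() + PIR_WINDOW_MS);
}

const due = pirDeadline("P1", new Date("2024-05-01T10:00:00Z"));
console.log(due ? due.toISOString() : "no PIR required"); // 2024-05-03T10:00:00.000Z
```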

ℹ️

Contacts

Fill in your team's contact details below:

| Role | Name | Contact |
|---|---|---|
| Engineering Lead | `{{ NAME }}` | `{{ CONTACT }}` |
| Project Owner | `{{ NAME }}` | `{{ CONTACT }}` |
| Supabase Support | — | [https://supabase.com/dashboard/support](https://supabase.com/dashboard/support) |
| Vercel Support | — | [https://vercel.com/help](https://vercel.com/help) |
| Stripe Support | — | [https://support.stripe.com](https://support.stripe.com) |
| Resend Support | — | [https://resend.com/docs](https://resend.com/docs) |
