Maintenance Schedules
This page contains the recurring maintenance checklists for the SONAN DIGITAL CRM. These tasks keep the platform secure, performant, and up to date. Assign each checklist to a named engineer at the start of each cycle.
Overview
| Frequency | When | Approximate Time |
|---|---|---|
| Monthly | First Monday of each month | 1–2 hours |
| Quarterly | First Monday of Jan, Apr, Jul, Oct | 3–4 hours |
| Annual | January each year | 1 day |
Monthly Checklist
Perform on the first Monday of each month. Document the outcome of each item as a comment or in a shared log.
Error Tracking
- [ ] Review Sentry issues: open Sentry → Issues → filter to "Last 30 days". For each open issue:
    - Assign to an engineer if unowned.
    - Resolve issues where the underlying bug has been fixed.
    - Mark as "Ignored" (with a comment) for known non-actionable errors.
    - Create a bug ticket for any issue that is recurring and not yet addressed.
- [ ] Check Sentry error rate trend: is the 30-day error rate higher than last month? If yes, investigate the cause before closing the review.
Performance
- [ ] Check Vercel function execution times: go to Vercel Dashboard → Analytics → Web Analytics or the Logs tab. Look for any routes where response times have increased compared to last month. Flag routes averaging > 2 seconds for investigation.
- [ ] Check Supabase database size: Supabase Dashboard → Database → Database Settings → Database Size. If the database is growing faster than expected, review which tables are growing and whether old records need archiving.
- [ ] Check Supabase query performance: Supabase Dashboard → Database → Query Performance (`pg_stat_statements`). Look for queries with a high `mean_exec_time` (> 500 ms). Create indexes or optimize queries as needed.
Security
- [ ] Run npm audit in the project root:

    ```bash
    npm audit
    ```

    Review the output for `high` and `critical` severity vulnerabilities. Apply patches:

    ```bash
    npm audit fix
    # For breaking changes, review manually:
    npm audit fix --force
    ```

    Commit and deploy any security patches immediately.
Operations
- [ ] Verify cron jobs ran correctly: check Vercel Dashboard → Settings → Cron Jobs for each job. Confirm all runs in the past 30 days succeeded. For any failures, confirm they were addressed at the time or re-run manually.
- [ ] Review and rotate secrets if schedule is due: check the Secrets Inventory table. If any secret has passed its rotation interval, perform rotation per the rotation schedule.
- [ ] Test backup restore: Supabase Pro includes daily database backups. Once a month, spot-check the restore:
    - Supabase Dashboard → Database → Backups.
    - Note the most recent backup timestamp.
    - Use the Point in Time Recovery feature to restore a single table to a scratch environment (or simply confirm the backup is listed and Supabase reports it as valid).
    - Document: backup date verified, table checked, restore tested successfully.
Access Review
- [ ] Review user access: go to the CRM admin panel → Settings → Team Management. Review the list of active staff accounts:
    - Deactivate any accounts for employees who have left the company.
    - Verify that all active accounts have the correct role (employee vs. admin).
    - Confirm no unexpected admin accounts exist.
- [ ] Review Vercel team access: Vercel Dashboard → Team Settings → Members. Remove any former team members.
- [ ] Review Supabase project access: Supabase Dashboard → Project Settings → Team. Remove any former team members.
Quarterly Checklist
Perform in January, April, July, and October. This is a deeper review than the monthly cadence.
Dependencies
- [ ] Full dependency update: review and update all npm packages:

    ```bash
    npx npm-check-updates -u
    npm install
    npm run build
    npx tsc --noEmit
    ```

    Test thoroughly on the `dev` branch before merging to `main`. Pay particular attention to:
    - Next.js version (read the migration guide for any minor version bumps).
    - Supabase JS client (`@supabase/supabase-js`).
    - Stripe JS (`stripe`, `@stripe/stripe-js`).
    - Sentry (`@sentry/nextjs`).
- [ ] Review breaking changes in updated packages. Check each package's changelog for deprecations or API changes that affect the codebase.
Security
- [ ] Review RLS policies for new tables: for every new database table added since the last quarterly review, verify a row-level security policy exists:

    ```sql
    -- List tables with RLS disabled
    SELECT schemaname, tablename, rowsecurity
    FROM pg_tables
    WHERE schemaname = 'public'
      AND rowsecurity = false;
    ```

    Any table listed here that is not intentionally public must have RLS enabled and appropriate policies applied.
- [ ] Security audit of new API routes: for every new `/api/` route added since the last review:
    - Confirm authentication check is present (`requireAdminWithTenant()` or equivalent).
    - Confirm input validation is applied to all request body fields.
    - Confirm `export const runtime = 'edge'` is present.
    - Confirm no secrets are logged or returned in responses.
- [ ] Review Stripe webhook event handling: check the Stripe changelog for new API versions or deprecated events. Ensure the webhook handler covers all event types the application subscribes to. Update the Stripe API version in `vercel.json` / Stripe client initialization if needed.
Infrastructure
- [ ] Review Supabase SDK version: check the current `@supabase/supabase-js` version in `package.json` against the latest release on npm. Supabase sometimes introduces behavior changes in minor versions, so read the release notes.
- [ ] Performance review: run a full Lighthouse audit:
    - Open the production app in Chrome Incognito.
    - Open DevTools → Lighthouse → run for Desktop and Mobile.
    - Document scores for: Performance, Accessibility, Best Practices, SEO.
    - Address any Performance or Accessibility regressions from the previous quarter.
- [ ] API response time review: using Vercel Analytics or a manual sampling tool, record the p50 and p95 response times for the 5 most-used API routes. Flag any that have degraded since last quarter.
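For the p50/p95 figures, a manual sampling run can be summarized with a nearest-rank percentile. The sketch below is illustrative only: `percentile` is a hypothetical helper, and the sample values are made up to show how a single slow request dominates p95 while leaving p50 almost untouched.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
export function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Made-up response times (ms) for one route, including one slow outlier.
const samples = [120, 80, 95, 300, 110, 2100, 90, 130, 105, 100];
const p50 = percentile(samples, 50);
const p95 = percentile(samples, 95);
```

Recording both values each quarter makes it easier to tell a general slowdown (p50 rises) from occasional slow requests (only p95 rises).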
Documentation
- [ ] Review this documentation portal: verify the docs reflect the current state of the system. Update any pages that describe outdated features, removed routes, or changed processes.
Annual Tasks
Perform in January each year. These tasks require dedicated time, so block a full day.
Security
- [ ] Full security review: conduct a systematic review of the entire application:
    - All API routes: authentication, authorization, input validation, output sanitization.
    - All RLS policies: do they correctly enforce tenant isolation?
    - All file upload paths: are file types validated? Are uploads scoped to the correct tenant?
    - All environment variables: are any secrets inadvertently exposed?
    - Review the OWASP Top 10 and verify the application addresses each category.
- [ ] Consider external penetration test: for high-growth years or when handling particularly sensitive client data, engage an external security firm for a penetration test. Document findings and remediation actions.
Disaster Recovery
- [ ] Review and update the Disaster Recovery (DR) plan. The DR plan should document:
    - RTO (Recovery Time Objective): target time to restore service after a major outage.
    - RPO (Recovery Point Objective): maximum acceptable data loss (time window).
    - Steps to restore from Supabase backup.
    - Steps to redeploy the application from scratch (fork → env vars → deploy).
    - Contact list for all critical vendors.

    Update this plan to reflect any architectural changes made during the year.
- [ ] Run a DR drill: simulate a catastrophic failure in a staging environment:
    - Restore the Supabase database from a backup to a new project.
    - Deploy the application to a new Vercel project pointing at the restored database.
    - Verify the application is functional.
    - Document the time taken and any gaps found in the DR plan.
Documentation and Compliance
- [ ] Update all documentation: review every page in this documentation portal. Update for any architectural changes, new modules, removed features, or changed processes introduced during the year.
- [ ] Review SLA commitments: compare the year's actual uptime data (from UptimeRobot monthly reports) against the 99.5% SLA target. If the SLA was missed in any month, document the cause and ensure the prevention action from the relevant post-mortem has been implemented.
Infrastructure
- [ ] SSL certificate review: Cloudflare handles SSL certificate auto-renewal. Verify auto-renewal is still enabled and that the certificate expiry dates are at least 60 days out: Cloudflare Dashboard → SSL/TLS → Edge Certificates.
- [ ] Cost review: review monthly spend for all services:
    - Vercel Team plan
    - Supabase Pro plan
    - Resend plan
    - Sentry Team plan
    - UptimeRobot

    Are costs in line with usage? Are there unused features in paid plans? Should plans be upgraded or downgraded?
- [ ] Vendor contract review: review terms of service and pricing for all vendors. Note any upcoming price changes or plan discontinuations.
Add the quarterly and annual maintenance tasks to the team calendar at the start of each year. Blocked time is the only reliable way to ensure these tasks don't get skipped during busy periods.
Cron Jobs
This page documents all scheduled background jobs running in production: their schedules, what they do, how they are authenticated, and how to test and troubleshoot them.
Overview
The CRM runs scheduled background jobs via Vercel Cron. These jobs are triggered by Vercel's internal scheduler, which calls a designated API route on the configured schedule.
Key points:
- All cron endpoints require a CRON_SECRET bearer token; requests without it are rejected with 401 Unauthorized.
- Jobs are designed to be idempotent: running them twice produces the same result as running them once.
- Cron jobs are defined in vercel.json at the project root.
Authentication
Every cron endpoint checks for the CRON_SECRET bearer token:
// Pattern used in all cron routes
const authHeader = req.headers.get('authorization')
if (authHeader !== `Bearer ${process.env.CRON_SECRET}`) {
  return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
}
When Vercel calls a cron route, it automatically attaches this header. When triggering manually (for testing), you must include it yourself:
curl -X POST https://yourdomain.com/api/admin/invoices/cron \
-H "Authorization: Bearer {{ CRON_SECRET }}" \
-H "Content-Type: application/json"
The CRON_SECRET is a server-only environment variable. Never log it, include it in client code, or share it outside the engineering team.
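One edge case in the comparison pattern is worth noting: if `CRON_SECRET` is ever unset, the template string evaluates to `Bearer undefined`, and a request carrying exactly that header would pass the `!==` check. The sketch below shows one way to fail closed and compare in constant time. `isAuthorizedCronRequest` is a hypothetical helper, not a function from the codebase, and it takes the header and secret as arguments so it stays runtime-agnostic.

```typescript
// Hypothetical helper; the name and signature are illustrative assumptions.
export function isAuthorizedCronRequest(
  authHeader: string | null,
  cronSecret: string | undefined
): boolean {
  // Fail closed when the secret is not configured or no header was sent.
  if (!cronSecret || !authHeader) return false;

  const expected = `Bearer ${cronSecret}`;
  // Accumulate differences across every character so the time taken does not
  // reveal the position of the first mismatch.
  let diff = authHeader.length ^ expected.length;
  for (let i = 0; i < expected.length; i++) {
    // charCodeAt returns NaN past the end of a string; XOR coerces NaN to 0.
    diff |= authHeader.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}
```

In a route handler this would be called with `req.headers.get('authorization')` and `process.env.CRON_SECRET`, returning the same 401 response as the pattern above when it yields `false`.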
Job 1: Invoice Overdue Check
| Property | Value |
|---|---|
| Schedule | `0 2 * * *` (02:00 UTC daily) |
| Route | POST /api/admin/invoices/cron |
| Action | overdue (passed as query param or body field) |
| Idempotent | Yes |
What It Does
- Queries the `invoices` table for all records where:
    - `status = 'sent'`
    - `due_date < CURRENT_DATE`
- For each matching invoice, updates `status` to `'overdue'`.
- Sends an overdue reminder email to the client via Resend, using the client's primary contact email.
- Creates an in-app notification for the assigned account manager.

Invoices already in 'overdue', 'paid', or 'cancelled' status are not touched: the query specifically filters for status = 'sent', making the job safe to re-run.
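The selection rule and its idempotency can be illustrated with an in-memory sketch. This is not the production handler: the `Invoice` shape and `runOverdueCheck` are assumptions made for illustration, and the email and notification side effects are left out.

```typescript
// Illustrative types; field names loosely follow the columns described above.
type InvoiceStatus = "draft" | "sent" | "overdue" | "paid" | "cancelled";

interface Invoice {
  id: string;
  status: InvoiceStatus;
  dueDate: string; // ISO date (YYYY-MM-DD), so string comparison orders correctly
}

// Flips 'sent' invoices past their due date to 'overdue' and reports the count.
export function runOverdueCheck(
  invoices: Invoice[],
  today: string
): { invoices: Invoice[]; processed: number } {
  const updated: Invoice[] = [];
  let processed = 0;
  for (const inv of invoices) {
    if (inv.status === "sent" && inv.dueDate < today) {
      updated.push({ id: inv.id, status: "overdue", dueDate: inv.dueDate });
      processed += 1;
    } else {
      updated.push(inv); // every other status is left untouched
    }
  }
  return { invoices: updated, processed };
}

const sampleInvoices: Invoice[] = [
  { id: "a", status: "sent", dueDate: "2026-06-01" },
  { id: "b", status: "sent", dueDate: "2026-07-15" },
  { id: "c", status: "paid", dueDate: "2026-05-01" },
  { id: "d", status: "overdue", dueDate: "2026-04-01" },
];
const firstRun = runOverdueCheck(sampleInvoices, "2026-06-30");
const secondRun = runOverdueCheck(firstRun.invoices, "2026-06-30");
```

Because the first run moves invoice `a` out of the `'sent'` state, the second run finds nothing left to process, which is the property that makes a manual re-trigger safe.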
Example Manual Trigger
curl -X POST "https://yourdomain.com/api/admin/invoices/cron?action=overdue" \
-H "Authorization: Bearer {{ CRON_SECRET }}"
Expected Response
{
"success": true,
"processed": 3,
"message": "3 invoices marked as overdue"
}
Job 2: Recurring Invoice Generation
| Property | Value |
|---|---|
| Schedule | `0 3 * * *` (03:00 UTC daily) |
| Route | POST /api/admin/invoices/cron |
| Action | recurring (passed as query param or body field) |
| Idempotent | Yes |
What It Does
- Queries the `invoices` table (or a `recurring_invoice_templates` table) for records where:
    - `is_recurring = true`
    - `next_invoice_date <= CURRENT_DATE`
    - `status` is not `'cancelled'`
- For each match, creates a new invoice as a copy of the template (same client, line items, amounts, payment terms).
- Sets the new invoice's `status` to `'draft'` (requires admin review before sending) or `'sent'`, depending on the template's `auto_send` flag.
- Updates `next_invoice_date` on the template to the next occurrence (e.g., adds 1 month for monthly recurring).

The idempotency guarantee comes from the next_invoice_date update: once updated, the same template will not match the query again until the next cycle.
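Advancing `next_invoice_date` by a month has one subtlety: month lengths differ, so a naive "month + 1" on the 31st can overflow into the following month. The sketch below shows one possible approach that clamps to the last day of the target month. `addMonths` is a hypothetical helper; how the actual handler computes the next date is not documented here.

```typescript
// Adds `months` calendar months to an ISO date (YYYY-MM-DD). If the target
// month is shorter, the day is clamped to that month's last day.
export function addMonths(isoDate: string, months: number): string {
  const parts = isoDate.split("-");
  const year = Number(parts[0]);
  const month = Number(parts[1]); // 1-12
  const day = Number(parts[2]);

  const zeroBased = month - 1 + months;
  const targetYear = year + Math.floor(zeroBased / 12);
  const targetMonth = ((zeroBased % 12) + 12) % 12; // 0-11
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(targetYear, targetMonth + 1, 0)).getUTCDate();
  const targetDay = Math.min(day, lastDay);

  return new Date(Date.UTC(targetYear, targetMonth, targetDay))
    .toISOString()
    .slice(0, 10);
}
```

Note that clamping is lossy: a template anchored on the 31st becomes the 28th after February and stays there unless the original anchor day is stored separately.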
Example Manual Trigger
curl -X POST "https://yourdomain.com/api/admin/invoices/cron?action=recurring" \
-H "Authorization: Bearer {{ CRON_SECRET }}"
Expected Response
{
"success": true,
"generated": 2,
"message": "2 recurring invoices generated"
}
Job 3: Appointments Cron (Reference)
| Property | Value |
|---|---|
| Route | POST /api/admin/appointments/cron |
| Purpose | Sends appointment reminder emails and/or marks past appointments as completed |
| Auth | Authorization: Bearer {{ CRON_SECRET }} |
This job follows the same authentication and response pattern as the invoice cron jobs. Refer to the appointments module source code for its specific schedule and logic.
Vercel Cron Configuration
Cron jobs are declared in vercel.json at the project root:
{
  "crons": [
    {
      "path": "/api/admin/invoices/cron?action=overdue",
      "schedule": "0 2 * * *"
    },
    {
      "path": "/api/admin/invoices/cron?action=recurring",
      "schedule": "0 3 * * *"
    },
    {
      "path": "/api/admin/appointments/cron",
      "schedule": "0 6 * * *"
    }
  ]
}
Vercel Cron invocations go through the standard HTTP request path; the cron route is a normal edge function. The CRON_SECRET authorization check is what distinguishes a scheduled invocation from an unauthorized external request.
Schedule format: Standard 5-field cron (minute hour day-of-month month day-of-week). All times are UTC.
| Field | Values |
|---|---|
| minute | 0–59 |
| hour | 0–23 (UTC) |
| day-of-month | 1–31 |
| month | 1–12 |
| day-of-week | 0–7 (0 and 7 = Sunday) |
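A quick sanity check of a schedule string against the ranges in the table can catch typos before a deploy. The sketch below is deliberately narrow: `validateSimpleCron` is a hypothetical helper that understands only `*` and plain integers, which covers the three schedules in this project but not lists, ranges, or step values.

```typescript
// Field name plus inclusive numeric range, in the order the fields appear.
const FIELD_RANGES: Array<[string, number, number]> = [
  ["minute", 0, 59],
  ["hour", 0, 23],
  ["day-of-month", 1, 31],
  ["month", 1, 12],
  ["day-of-week", 0, 7],
];

// Returns a list of problems; an empty list means the expression passed.
// Only "*" and plain integers are recognized in this sketch.
export function validateSimpleCron(expr: string): string[] {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    return [`expected 5 fields, got ${fields.length}`];
  }
  const errors: string[] = [];
  for (let i = 0; i < fields.length; i++) {
    const value = fields[i];
    const [name, min, max] = FIELD_RANGES[i];
    if (value === "*") continue;
    if (!/^\d+$/.test(value)) {
      errors.push(`${name}: unsupported value "${value}"`);
      continue;
    }
    const n = Number(value);
    if (n < min || n > max) {
      errors.push(`${name}: ${n} is outside ${min}-${max}`);
    }
  }
  return errors;
}
```

For example, `validateSimpleCron("0 2 * * *")` returns an empty list, while an hour of `24` is reported as out of range.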
Testing Crons Locally
To test a cron job during local development:
1. Set CRON_SECRET in .env.local:
CRON_SECRET=local-dev-secret-change-me
2. Start the dev server:
npm run dev
3. Trigger the cron manually via curl:
# Invoice overdue check
curl -X POST "http://localhost:3000/api/admin/invoices/cron?action=overdue" \
-H "Authorization: Bearer local-dev-secret-change-me"
# Recurring invoice generation
curl -X POST "http://localhost:3000/api/admin/invoices/cron?action=recurring" \
-H "Authorization: Bearer local-dev-secret-change-me"
4. Check the terminal output for the function's log output and the returned JSON.
Monitoring Cron Jobs
After each expected run, verify:
- Vercel Dashboard → Project → Settings → Cron Jobs: check the run history. Green = success, red = failure.
- Run the query manually in Supabase SQL editor to confirm the expected DB changes:

    ```sql
    -- For overdue check: how many were marked overdue today?
    SELECT COUNT(*)
    FROM invoices
    WHERE status = 'overdue'
      AND updated_at::date = CURRENT_DATE;
    ```

- Check Resend Dashboard for outbound overdue reminder emails sent around 02:00 UTC.
What to Do If a Cron Fails
1. Check the Vercel Cron logs
Go to Vercel Dashboard → Project → Settings → Cron Jobs → [job name] → View Logs. Look for the HTTP status code and response body.
| Status | Likely Cause |
|---|---|
| 401 Unauthorized | CRON_SECRET env var missing or changed; verify in Vercel environment variables |
| 500 Internal Server Error | Application error in the cron handler; check Sentry and Vercel edge logs |
| 504 Gateway Timeout | The cron job is processing too many records; may need pagination |
| No log at all | Vercel scheduler may have missed the run; check if the app was in a failed deployment state |
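For the 504 case, "pagination" usually means bounding the work done per invocation. One minimal way to do that is to split the matched records into fixed-size batches and process a limited number per run. `toBatches` is a hypothetical helper, and the batch size of 2 is illustrative, not a recommendation.

```typescript
// Splits a list into fixed-size batches; the final batch may be smaller.
export function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Five matched invoice IDs handled two at a time.
const invoiceIdBatches = toBatches(["inv-1", "inv-2", "inv-3", "inv-4", "inv-5"], 2);
```

Because the jobs are idempotent, records left over when a run stops early are picked up by the next scheduled or manual run.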
2. Trigger the job manually
Once the root cause is identified and fixed, trigger the missed run manually via curl (see Testing Crons Locally above, using the production URL and production CRON_SECRET).
When manually triggering a cron on production, be aware of side effects: the overdue cron will send emails to real clients. If the missed run is being re-triggered after a delay of more than a day, verify whether the email sends are still appropriate before triggering.
3. Document the failure
Record in the incident log:
- Which cron job failed
- The scheduled run time
- The actual failure time and error
- Whether data was affected (e.g., invoices not marked overdue)
- Whether the manual re-trigger resolved the data inconsistency
- Root cause and fix applied
4. Verify data integrity after recovery
For the invoice overdue cron, run a reconciliation query in Supabase:
-- Find any invoices that should be overdue but are still 'sent'
SELECT id, client_id, due_date, status
FROM invoices
WHERE status = 'sent'
AND due_date < CURRENT_DATE
ORDER BY due_date;
If rows are returned, the overdue cron missed them. Update them manually or re-trigger the cron after verifying the fix.
Monitoring & Logging
This page covers how to monitor the SONAN DIGITAL CRM in production: error tracking via Sentry, log access via Vercel, uptime monitoring, alert escalation, and the weekly review process.
1. Sentry: Error Tracking
Sentry captures all unhandled exceptions from both server-side (edge functions) and client-side (browser) code.
Dashboard: https://sentry.io/organizations/sonan-digital/
What to Check
| Signal | Meaning | Action |
|---|---|---|
| New issue (unseen) | A new error class appeared since last review | Triage immediately; check if it is user-impacting |
| Issue spike | An existing error is occurring at higher than normal frequency | Investigate root cause; likely triggered by a recent deployment or data edge case |
| Unhandled promise rejection | Async code path without error handling | Low urgency unless frequency is high |
| `TypeError: Cannot read properties of undefined` | Usually a Supabase join typed as a single object; use `[0]` indexing | Fix in next sprint |
| 500 errors on API routes | Edge function throwing; check function logs | P2 if frequent; P3 if rare |
How to Triage an Issue
- Click the issue in Sentry.
- Read the stack trace: check if source maps are loaded (if you see minified code, `SENTRY_AUTH_TOKEN` may be missing or the build did not upload maps).
- Check Breadcrumbs: the sequence of events leading to the error.
- Check Tags (`url`, `method`, `user.id`): these tell you which route and which user was affected.
- Check First seen / Last seen / Times seen: a fresh issue occurring once may be a data anomaly; 500 occurrences in the last hour is a live incident.
- Cross-reference the timestamp with Vercel Deployments: if it started after a deployment, the deployment is the likely cause.
Resolving Issues
- Fix and resolve: Merge the fix, confirm the error stops occurring in Sentry, then click Resolve. Sentry will reopen the issue automatically if the same error recurs.
- Accept / Ignore: For known non-actionable errors (e.g., browser extension interference), click Ignore and document the reason in the Sentry comment field.
- Assign: Assign unresolved issues to the responsible engineer so they are not forgotten.
Configure Sentry Alert Rules to send an email or Slack notification when:
- A new issue is first seen.
- Any issue's frequency exceeds 10 occurrences in 1 hour.

Go to Sentry → Alerts → Create Alert Rule.
2. Vercel Logs
Vercel provides real-time and historical logs for all edge function invocations.
Accessing logs:
- Go to Vercel Dashboard → Project → Logs tab.
- Select Runtime Logs (not Build Logs).
- Filter by Edge (not Functions; all routes use the edge runtime).
Filtering Logs
| Filter | How |
|---|---|
| By route | Type the path in the search box, e.g. /api/admin/invoices |
| By time range | Use the time picker. Logs are retained for 1 hour in real-time view; longer with Vercel Pro log draining |
| By status code | Search for status:500 or status:4 to find errors |
| By request ID | If a user reports an error, ask them for the time, then search around that timestamp |
Downloading Logs
For incidents requiring detailed post-mortem analysis, export logs:
- Vercel Dashboard → Logs → Export button (top right).
- Download as `.ndjson` (newline-delimited JSON).
- Process locally with `jq`:

    ```bash
    jq 'select(.statusCode == 500)' logs.ndjson
    ```
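When `jq` is not available, the same filter is a few lines of TypeScript. This is a sketch under assumptions: `filterByStatus` is a hypothetical helper, and the `statusCode` field name is taken from the `jq` filter above rather than from a documented export schema, so check it against a real export first.

```typescript
// Parses newline-delimited JSON and keeps entries whose statusCode matches.
// All other fields are passed through untouched.
export function filterByStatus(
  ndjson: string,
  statusCode: number
): Array<Record<string, unknown>> {
  const matches: Array<Record<string, unknown>> = [];
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const entry = JSON.parse(trimmed) as Record<string, unknown>;
    if (entry["statusCode"] === statusCode) matches.push(entry);
  }
  return matches;
}

// Two made-up log lines followed by a trailing blank line.
const sampleLog =
  '{"statusCode":200,"path":"/api/health"}\n' +
  '{"statusCode":500,"path":"/api/admin/invoices/cron"}\n\n';
const serverErrors = filterByStatus(sampleLog, 500);
```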
Edge logs in Vercel can have a 10–30 second delay before appearing in the dashboard. If you are investigating a live issue, wait a moment before concluding a log is absent.
3. Uptime Monitoring
Recommended: UptimeRobot
Set up the following monitors at uptimerobot.com:
| Monitor | URL | Check Interval | Alert When |
|---|---|---|---|
| API Health Check | https://yourdomain.com/api/health | Every 1 minute | Response is not 200 |
| Homepage | https://yourdomain.com | Every 5 minutes | Response is not 200 |
| Client Portal Login | https://yourdomain.com/auth/login | Every 5 minutes | Response is not 200 |
| Admin Dashboard | https://yourdomain.com/admin | Every 5 minutes | Response is not 200 or 302 |
Alert notifications: Configure UptimeRobot to send alerts to:
- Primary engineer's email
- A dedicated #crm-alerts Slack channel (via UptimeRobot Slack integration)
- A status page (UptimeRobot provides a hosted status page; share the URL with clients)
Health Check Endpoint
The /api/health route should return:
{
"status": "ok",
"timestamp": "2026-06-30T12:00:00.000Z",
"version": "1.0.0"
}
If this endpoint returns a non-200 status or times out, the application is not serving traffic correctly.
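The payload shape and the monitor-side check can be sketched as two small pure functions. Both names are hypothetical, and how the real route obtains its version string is not documented here, so it is passed in as a parameter.

```typescript
interface HealthPayload {
  status: "ok";
  timestamp: string; // ISO 8601, UTC
  version: string;
}

// Builds the JSON body described above. `now` is injected to keep it testable.
export function buildHealthPayload(version: string, now: Date): HealthPayload {
  return { status: "ok", timestamp: now.toISOString(), version };
}

// Mirrors what an uptime monitor effectively checks: HTTP 200 plus status "ok".
export function isHealthy(httpStatus: number, body: unknown): boolean {
  if (httpStatus !== 200 || typeof body !== "object" || body === null) {
    return false;
  }
  return (body as { status?: unknown }).status === "ok";
}

const examplePayload = buildHealthPayload("1.0.0", new Date("2026-06-30T12:00:00.000Z"));
```

A route handler would serialize the payload as JSON with a 200 status. Keeping the check free of database calls means the endpoint reports on the application layer only, which is a design choice to revisit if deeper dependency checks are wanted.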
4. Alert Escalation
When a monitoring alert fires, follow this escalation path:
UptimeRobot / Sentry Alert
        │
        ▼
1. Engineer on duty checks within 5 minutes
        │
        ├── Resolved in 15 min? → Document in incident log (P3/P4)
        │
        ▼
2. Issue persists > 15 min → Engineer escalates to Tech Lead
        │
        ├── Resolved in 60 min? → Document + post-mortem if P2
        │
        ▼
3. Issue persists > 1 hour (P1) → Tech Lead contacts CTO
   + Operations Manager notified for client communication
        │
        ▼
4. External vendor support engaged if platform-level (Vercel/Supabase/Stripe)
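The time thresholds in the escalation path can be expressed as a small lookup, which is handy if alert tooling needs to decide who to notify. This is a sketch only: `escalationOwner` and its return labels are hypothetical, and step 4 (vendor support) is omitted because it depends on the nature of the incident rather than on elapsed time.

```typescript
type EscalationOwner =
  | "engineer-on-duty"
  | "tech-lead"
  | "cto-and-operations-manager";

// Maps minutes since the alert fired to the role that should own the incident,
// using the 15-minute and 60-minute thresholds from the escalation path.
export function escalationOwner(minutesSinceAlert: number): EscalationOwner {
  if (minutesSinceAlert > 60) return "cto-and-operations-manager";
  if (minutesSinceAlert > 15) return "tech-lead";
  return "engineer-on-duty";
}
```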
Connect both Sentry and UptimeRobot to a dedicated #crm-alerts Slack channel. This ensures the whole team sees alerts without relying on one person checking email.
5. Cron Job Monitoring
Cron jobs run silently: there is no push notification when they succeed. Monitoring must be proactive.
After each expected cron run:
- Go to Vercel Dashboard → Project → Settings → Cron Jobs tab.
- Click the cron job to see its run history: green checkmarks = success, red X = failure.
- Click a specific run to view the response body and status code.
- Cross-check the database: for the overdue invoice cron (runs at 02:00 UTC), check in Supabase that invoices with `due_date < today` and old `status = 'sent'` now have `status = 'overdue'`.
If a cron shows a failure:
- Check the Vercel function logs around the scheduled run time.
- Trigger the cron manually (see Cron Jobs for curl commands).
- Verify the CRON_SECRET environment variable is correctly set if you see 401 responses.
6. Weekly Monitoring Review
Perform this review every Monday morning (or the first business day of each week):
- [ ] Sentry: Open Sentry → Issues → filter to "Last 7 days". Review all new and unresolved issues. Assign owners to any unowned issues.
- [ ] Sentry error rate: Check the Sentry project overview for error rate trends. Is the rate increasing week over week?
- [ ] Vercel deployments: Review the deployments from the past week. Any failed deployments? Any rollbacks?
- [ ] Cron job logs: Confirm cron jobs ran successfully every day this week. Note any failures.
- [ ] Uptime: Check UptimeRobot for any downtime incidents in the past 7 days. What was the total uptime percentage?
- [ ] Stripe: Check Stripe Dashboard → Developers → Events for any failed webhook deliveries.
- [ ] Resend: Check Resend Dashboard → Emails for any bounce or delivery failure spikes.
- [ ] Supabase: Check Supabase Dashboard → Database → Reports for query performance. Any slow queries (> 500 ms)?
Document the review outcome in the team's weekly ops note (Slack, Notion, or equivalent).
7. SLA Targets
| Metric | Target | Measured How |
|---|---|---|
| Production uptime | ≥ 99.5% per month | UptimeRobot monthly report |
| API health check response time | < 500ms (p95) | Vercel Analytics or UptimeRobot response time graph |
| Unhandled error rate | < 0.1% of requests | Sentry error rate vs. Vercel request count |
| Cron job success rate | 100% (zero missed runs per month) | Vercel cron run history |
| P1 incident response time | < 15 minutes | Incident log |
99.5% uptime allows approximately 3.65 hours of downtime per month. Vercel's SLA is 99.99% for the Edge network. Supabase Pro's SLA is 99.9%. The combined practical uptime target is set conservatively at 99.5% to account for application-level incidents.
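The arithmetic behind these figures is short enough to check directly. The 3.65-hour allowance corresponds to an average month of 730 hours (8,760 hours per year divided by 12); a 30-day month gives 3.6 hours. The sketch below also multiplies the two vendor figures quoted above, under the simplifying assumption that both services are required and fail independently. The function names are hypothetical.

```typescript
// Hours of downtime permitted by an uptime target over a given period.
export function allowedDowntimeHours(uptimeTarget: number, hoursInPeriod: number): number {
  return (1 - uptimeTarget) * hoursInPeriod;
}

// Availability of a chain of dependencies that must all be up at once,
// assuming independent failures (a simplification).
export function serialAvailability(availabilities: number[]): number {
  return availabilities.reduce((acc, a) => acc * a, 1);
}

const averageMonthHours = 8760 / 12; // 730
const monthlyAllowance = allowedDowntimeHours(0.995, averageMonthHours); // about 3.65
const vendorCeiling = serialAvailability([0.9999, 0.999]); // about 0.9989
```

The vendor ceiling of roughly 99.89% sits above the 99.5% target, which leaves the remaining margin for application-level incidents.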