Backup & Recovery: Postgres & Redis

Ensuring data durability is critical for production readiness. Framework M relies on Postgres for persistent domain data and Redis for transient but important operational data (cache, rate limits, non-persistent sessions).

1. PostgreSQL Backup Strategy

Postgres is the system of record. We recommend a two-layered backup approach.

1.1 Logical Backups (pg_dump)

Suitable for small to medium-sized datasets.

  • Process: Run pg_dump daily.
  • Automation: Use a CronJob (Kubernetes) or a systemd timer (VMs) to dump the database and upload to secure storage (S3, GCS).
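  • Tool Recommendation: Simple scripts in a dedicated backup container are enough for pg_dump. pgBackRest takes physical backups rather than logical dumps, so consider it for the approach in section 1.2 instead.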
# Example backup command: custom-format dump (-F c) including large objects (-b), verbose (-v)
pg_dump -F c -b -v -f "/backups/db-$(date +%F).dump" "$DATABASE_URL"
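
The command above can be wrapped in a small job script for the CronJob or systemd timer. The following is a minimal bash sketch, not a definitive implementation: BACKUP_DIR, RETENTION_DAYS, and the function names are hypothetical and not part of Framework M, and the upload to S3 or GCS is left out because it depends on your storage tooling.

```shell
# Sketch of a daily logical backup job.
# BACKUP_DIR and RETENTION_DAYS are hypothetical settings (assumptions).
BACKUP_DIR="${BACKUP_DIR:-/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Build the dated dump path, e.g. /backups/db-2024-01-31.dump
backup_path() {
  printf '%s/db-%s.dump\n' "$BACKUP_DIR" "$(date +%F)"
}

# Dump to a temporary name first so that a failed run never leaves a
# truncated file under the final name.
run_backup() {
  local target
  target="$(backup_path)"
  pg_dump -F c -f "${target}.partial" "$DATABASE_URL" &&
    mv "${target}.partial" "$target"
}

# Delete local dumps that are more than RETENTION_DAYS full days old.
prune_old_backups() {
  find "$BACKUP_DIR" -maxdepth 1 -name 'db-*.dump' -type f \
    -mtime +"$RETENTION_DAYS" -delete
}
```

A scheduled run would call run_backup, upload the resulting file, and then call prune_old_backups. Pruning only after a successful upload avoids deleting the last good local copy.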

1.2 Point-In-Time Recovery (PITR)

Essential for large datasets and high-availability requirements.

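  • Process: Take periodic base backups and continuously archive the WAL (write-ahead log). Recovery replays the archived WAL on top of a base backup.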
  • Tool Recommendation: WAL-G or CloudNativePG (if on Kubernetes).
  • Benefit: Lets you restore to any timestamp covered by the archive. Data loss is limited to WAL that had not yet been archived, typically seconds to minutes depending on your archive settings.
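
To make the recovery side concrete, here is a hedged bash sketch of preparing a restored base backup for point-in-time recovery on PostgreSQL 12 or later. The function name configure_pitr is hypothetical, and the restore_command assumes WAL-G is installed and configured for your archive; substitute the command your archiving tool requires.

```shell
# Sketch: configure a restored data directory to recover to a target time.
# Assumptions: PostgreSQL 12+, WAL-G as the archive tool.
#   $1 = path to the restored data directory (PGDATA)
#   $2 = target timestamp, e.g. "2024-01-31 12:00:00+00"
configure_pitr() {
  local pgdata="$1" target_time="$2"
  {
    # %f and %p are placeholders expanded by PostgreSQL, not by the shell.
    printf "restore_command = 'wal-g wal-fetch %%f %%p'\n"
    printf "recovery_target_time = '%s'\n" "$target_time"
    printf "recovery_target_action = 'promote'\n"
  } >> "$pgdata/postgresql.auto.conf"
  # An empty recovery.signal file makes the server start in targeted
  # recovery mode on its next start.
  touch "$pgdata/recovery.signal"
}
```

After running it against the restored directory, start Postgres; it replays WAL up to the target time and then promotes itself to accept writes.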

2. Redis Backup Strategy

While much of the data in Redis is transient (cache), some components (like Rate Limiter counters or certain session data) might be important for operational continuity.

2.1 Persistence Settings

Ensure Redis is configured for persistence in redis.conf:

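# RDB: periodic snapshots
# Snapshot if at least 1 change occurred in 900 seconds
save 900 1
# Snapshot if at least 10 changes occurred in 300 seconds
save 300 10

# AOF: log every write to an append-only file (recommended for durability)
appendonly yes
# fsync once per second, so a crash can lose about one second of writes
appendfsync everysec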
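
A deployment check can confirm that these settings are actually present before a node goes live. The sketch below is one possible bash approach; the function names are hypothetical, and it only reads a config file, so it will not notice values changed at runtime.

```shell
# Print the last value set for a directive in a redis.conf-style file.
redis_conf_value() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }' "$file"
}

# Succeed only if AOF is enabled and appendfsync is explicitly set to
# everysec or always. This is deliberately strict: a missing appendfsync
# line fails the check even though Redis would fall back to its default.
check_redis_durability() {
  local file="$1"
  [ "$(redis_conf_value "$file" appendonly)" = "yes" ] || return 1
  case "$(redis_conf_value "$file" appendfsync)" in
    everysec|always) return 0 ;;
    *) return 1 ;;
  esac
}
```

For a running server, redis-cli CONFIG GET appendonly reports the live value and is the better source of truth.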

2.2 Snapshots Backup

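  • Process: Back up the dump.rdb file or the append-only file on a regular schedule. Redis 7 and later store the AOF as multiple files under the appendonlydir directory, so copy the whole directory.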
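  • Restoration: Stop Redis, replace the file in the Redis data directory, and start Redis. When appendonly is yes, Redis loads the AOF at startup in preference to dump.rdb, so make sure the file you restore matches the persistence mode in use.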
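
The snapshot step can be sketched in bash as follows. It asks Redis for a fresh RDB snapshot, waits for the background save to finish, and copies the file under a timestamped name. The function name, the REDIS_CLI override, and the 60-second timeout are assumptions made for illustration.

```shell
# REDIS_CLI can be overridden, e.g. to add host or auth options (assumption).
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Trigger BGSAVE, wait until no background save is in progress, verify the
# last save succeeded, then copy the RDB file to the destination directory.
#   $1 = path to dump.rdb, $2 = destination directory
backup_redis_rdb() {
  local rdb_path="$1" dest_dir="$2" info tries=0
  $REDIS_CLI BGSAVE >/dev/null || return 1
  while :; do
    # INFO lines end in CRLF, so strip the carriage returns.
    info="$($REDIS_CLI INFO persistence | tr -d '\r')" || return 1
    case "$info" in
      *rdb_bgsave_in_progress:0*) break ;;
    esac
    tries=$((tries + 1))
    [ "$tries" -lt 60 ] || return 1   # give up after roughly 60 seconds
    sleep 1
  done
  case "$info" in
    *rdb_last_bgsave_status:ok*) ;;
    *) return 1 ;;
  esac
  cp "$rdb_path" "$dest_dir/dump-$(date +%F-%H%M%S).rdb"
}
```

A typical call would be backup_redis_rdb /var/lib/redis/dump.rdb /backups/redis, where both paths are examples and depend on the dir setting of your Redis instance.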

3. Recovery Validation

A backup is only as good as its restore.

  1. Automated Restore Tests: Periodically trigger an automated restore of a production backup to a staging environment.
  2. Monitoring: Set up alerts for backup failures (e.g., if a backup file hasn't been updated in 24 hours).
  3. Documentation: Keep a "Disaster Recovery" runbook updated with exact CLI commands for your environment.
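
The monitoring item above can be approximated with a freshness check that an alerting job runs on a schedule. This is a minimal bash sketch; the function name and the db-*.dump naming pattern are assumptions carried over from the pg_dump example in section 1.1.

```shell
# Succeed if at least one db-*.dump file in the directory was modified
# within the last max_age_hours (default 24); fail otherwise.
#   $1 = backup directory, $2 = optional maximum age in hours
backup_is_fresh() {
  local dir="$1" max_age_hours="${2:-24}" recent
  recent="$(find "$dir" -maxdepth 1 -name 'db-*.dump' -type f \
    -mmin -"$((max_age_hours * 60))" | head -n 1)"
  [ -n "$recent" ]
}
```

An alert hook can then be as simple as running backup_is_fresh /backups and raising an alert on a non-zero exit status. Freshness alone does not prove that a dump is usable; pg_restore --list on the file is a cheap extra check that the archive is at least readable, and the automated restore test remains the real validation.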

4. Disaster Recovery

In the event of a total site failure or data corruption:

  1. Provision new infrastructure using the manifests in /docs/deployment/kubernetes.
  2. Restore the most recent database backup that has passed restore validation.
  3. Re-point traffic to the new instances.
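
For the restore step, a runbook entry might look like the bash sketch below. It assumes logical dumps named db-YYYY-MM-DD.dump as in section 1.1; the function names and the target connection URL are hypothetical. With PITR, the restore would instead go through your WAL archiving tool.

```shell
# Print the newest db-YYYY-MM-DD.dump in a directory. ISO dates sort
# lexicographically and globs expand in sorted order, so the last match
# carries the latest date. Fails if no dump is found.
latest_backup() {
  local dir="$1" newest="" f
  for f in "$dir"/db-*.dump; do
    [ -e "$f" ] || continue
    newest="$f"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Restore the newest dump into the database at target_url.
# --clean --if-exists drops existing objects before recreating them;
# --no-owner skips restoring the original object ownership.
restore_latest() {
  local dir="$1" target_url="$2" dump
  dump="$(latest_backup "$dir")" || return 1
  pg_restore --clean --if-exists --no-owner -d "$target_url" "$dump"
}
```

Calling restore_latest /backups "$NEW_DATABASE_URL" restores into the freshly provisioned instance, where NEW_DATABASE_URL is a placeholder for its connection string. Run it before traffic is re-pointed.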