Skip to main content

Zero Downtime Migration (ZDM) Patterns

Framework M provides the tools and patterns necessary to perform database schema updates without application downtime. This is achieved through a Two-Phase Migration strategy and a CI/CD Verification Pipeline.

1. The Two-Phase Migration Strategy

To avoid breaking the application while migrations are running, schema changes must be forward-compatible.

Phase 1: Forward-Compatible Schema Update

  • Goal: Add new columns or tables without modifying existing ones.
  • Action: Run m migrate sync (declarative) or apply safe patches.
  • Rule: All new columns MUST be nullable or have a database-level default.

Phase 2: Data Backfill & Cleanup

  • Goal: Migrate existing data to the new schema and remove legacy fields.
  • Action: Run background jobs or imperative Alembic patches.
  • Final Step: Once backfill is complete, deploy code that only uses the new schema, then run a migration to drop old artifacts.

2. CI/CD Verification Pipeline

Integrate m migrate verify into your CI pipeline to catch breaking changes before they reach production.

"Dirty Schema" Failure States

The verify command will return a non-zero exit code if it detects:

  • Unmanaged Tables: Tables in the DB that don't match any DocType (Security risk).
  • Breaking Changes: Type mismatches or dropped columns that haven't been reviewed.
  • Missing Revisions: Local migration files that haven't been applied to the test DB.

3. High-Scale Postgres Patterns

When working with large datasets, standard DDL can lock tables and cause downtime.

Concurrent Index Creation

For large tables, never create indexes in a standard transaction. Use the CONCURRENTLY keyword:

# In an Alembic migration
def upgrade():
op.execute("COMMIT") # Break the transaction
op.execute("CREATE INDEX CONCURRENTLY idx_name ON table_name (col)")

Transaction-Break Requirements

Operations like VACUUM, CREATE INDEX CONCURRENTLY, and ALTER TYPE ... ADD VALUE (Enum updates) cannot run inside a transaction block. Always use op.execute("COMMIT") before running these in your patches.

4. Troubleshooting & Recovery

State-Reset Patterns

If a declarative sync fails mid-way, the database might be in a "Partial Sync" state.

  • Recovery: Fix the underlying issue (e.g., lock contention) and re-run m migrate all. The engine is idempotent and will resume from the last successful change.

Manual alembic_version Fixing

If your migration history becomes corrupted (e.g., manual DB edits), you can manually align the version table:

-- View current version
SELECT * FROM alembic_version;

-- Manually set to a specific revision
UPDATE alembic_version SET version_num = 'abc123xyz';

5. Best Practices

  1. Always use Nullable: When adding columns in Phase 1, always make them nullable=True.
  2. Avoid DROP in Phase 1: Never drop a column in the same migration that adds a new one.
  3. Run Verify in CI: Make m migrate verify a mandatory check for every Pull Request.