
Deterministic Schema Migrations (Smart Sync)

Deterministic Schema Migration, or "Smart Sync", is a performance optimization in Framework M. It skips database reflection and schema detection whenever your DocType definitions have had no structural changes.

The Problem: The Reflection Bottleneck

In a traditional declarative migration system, every time you run a synchronization command (like m migrate sync), the framework must:

  1. Reflect: Query the database to understand the current physical schema (tables, columns, indexes).
  2. Map: Generate a virtual representation of your current Python DocTypes.
  3. Detect: Compare the two models to identify differences.

For large applications or multi-tenant clusters with hundreds of schemas, the Reflection phase becomes a significant bottleneck, often taking seconds per database even when no changes are needed.

The Solution: Two-Layer Synchronization

Framework M implements a "Smart Sync" strategy that adds a fast-path validation layer before the expensive reflection phase.

Layer 1: Structural Checksum (Fast Path)

Every DocType's database representation (SQLAlchemy Table) is converted into a deterministic dictionary containing all DDL-relevant metadata:

  • Column names, types, and nullability.
  • Unique constraints and primary keys.
  • Indexes and their included fields.

This dictionary is hashed using SHA-256 to create a Schema Signature. This signature is then compared against the last known signature stored in a local system table called __schema_versions__.

  • Match: If the signatures match, the system assumes the schema is in sync and skips all reflection and detection.
  • Mismatch: If the signatures differ (or the signature is missing), the system proceeds to Layer 2.
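A minimal sketch of how such a signature could be computed, using only the standard library. Plain dictionaries stand in for the SQLAlchemy Table, and the names `table_to_dict` and `schema_signature` are hypothetical, not Framework M's API. Sorting columns by name (so declaration order does not affect the hash) is also an assumption.

```python
import hashlib
import json


def table_to_dict(name, columns, indexes=(), unique_constraints=()):
    """Build a deterministic description of the DDL-relevant metadata.

    Hypothetical helper: `columns` and `indexes` are plain dicts standing in
    for the attributes of a SQLAlchemy Table.
    """
    return {
        "name": name,
        # Assumption: sort by column name so declaration order is irrelevant.
        "columns": sorted(
            (
                {
                    "name": c["name"],
                    "type": c["type"],
                    "nullable": c.get("nullable", True),
                    "primary_key": c.get("primary_key", False),
                }
                for c in columns
            ),
            key=lambda c: c["name"],
        ),
        # Field order inside an index is preserved because it affects the DDL.
        "indexes": sorted(
            ({"name": i["name"], "fields": list(i["fields"])} for i in indexes),
            key=lambda i: i["name"],
        ),
        "unique": sorted(sorted(u) for u in unique_constraints),
    }


def schema_signature(table_dict):
    """Hash the canonical JSON form of the dictionary with SHA-256."""
    canonical = json.dumps(table_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


todo = table_to_dict(
    "todo",
    [
        {"name": "id", "type": "INTEGER", "nullable": False, "primary_key": True},
        {"name": "title", "type": "VARCHAR(140)"},
    ],
    indexes=[{"name": "ix_todo_title", "fields": ["title"]}],
)
signature = schema_signature(todo)  # 64-character hex digest
```

Serializing with `sort_keys=True` and fixed separators keeps the byte stream stable across runs, which is what makes the hash deterministic.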

Bypassing Layer 1 (Force Reflection)

Users can manually bypass the checksum fast-path by using the --force flag in the CLI. This is useful for:

  • Troubleshooting suspected state mismatches between the __schema_versions__ table and the physical database.
  • Forcing a complete schema validation and reflection pass.
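The Layer 1 decision, including the forced bypass, can be sketched as a small predicate. The function name and the `force` parameter are hypothetical. They mirror the behavior described above rather than Framework M's internal code.

```python
def needs_reflection(current_signature, stored_signature, force=False):
    """Return True when the sync must fall through to Layer 2.

    Hypothetical helper: `stored_signature` is None when no signature has
    been recorded yet for the DocType.
    """
    if force:
        # Mirrors --force: ignore the stored checksum entirely.
        return True
    if stored_signature is None:
        # Missing signature is treated the same as a mismatch.
        return True
    return current_signature != stored_signature
```

With this shape, a forced run always reflects, even when the stored and current signatures are identical.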

Layer 2: Full Reflection (Deep Sync)

If Layer 1 reports a mismatch or a missing signature, or is bypassed with --force, the framework falls back to the traditional method:

  1. Reflect the physical database.
  2. Perform a deep, field-by-field comparison.
  3. Generate and apply only the specific DDL changes required to reach the target state.
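As a rough illustration of the field-by-field comparison, the sketch below diffs two column maps and returns only the changes needed to reach the target. The data shapes and the operation names (`add_column`, `drop_column`, `alter_column`) are assumptions made for this example. A real engine would also compare indexes and constraints and emit actual DDL.

```python
def diff_columns(reflected, target):
    """Compare two schemas and list the changes needed to reach `target`.

    Both arguments map column name -> {"type": str, "nullable": bool}.
    Hypothetical helper for illustration only.
    """
    changes = []
    # Columns declared in the DocType but absent from the database.
    for name in sorted(target.keys() - reflected.keys()):
        changes.append(("add_column", name))
    # Columns present in the database but no longer declared.
    for name in sorted(reflected.keys() - target.keys()):
        changes.append(("drop_column", name))
    # Columns on both sides: compare each attribute individually.
    for name in sorted(target.keys() & reflected.keys()):
        for attr in ("type", "nullable"):
            if reflected[name][attr] != target[name][attr]:
                changes.append(("alter_column", name, attr))
    return changes


reflected = {
    "id": {"type": "INTEGER", "nullable": False},
    "title": {"type": "VARCHAR(140)", "nullable": True},
}
target = {
    "id": {"type": "INTEGER", "nullable": False},
    "title": {"type": "VARCHAR(140)", "nullable": False},
    "done": {"type": "BOOLEAN", "nullable": False},
}
changes = diff_columns(reflected, target)
```

An empty result means the physical schema already matches the target and no DDL needs to run.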

The __schema_versions__ Table

This is a framework-managed system table that tracks the state of your migrations. It contains two primary columns:

  • doctype_name: The unique identifier for the DocType.
  • checksum: The SHA-256 signature of the last successfully synced schema.

Note: This table is automatically initialized during the first run of m migrate sync. It is ignored by the migration engine's "drop detection" logic, ensuring it is never accidentally deleted.
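To make the table's shape concrete, here is a small sketch using an in-memory SQLite database as a stand-in for the real backend. The column types, the primary key choice, and the helper names are assumptions. Only the table name and the two column names come from the description above.

```python
import sqlite3

# Assumed DDL: the source names the columns but not their types.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS "__schema_versions__" (
    doctype_name TEXT PRIMARY KEY,
    checksum     TEXT NOT NULL
)
"""


def init_store(conn):
    """Create the tracking table if it does not exist yet."""
    conn.execute(CREATE_SQL)


def get_checksum(conn, doctype_name):
    """Return the stored checksum, or None when the DocType is unknown."""
    row = conn.execute(
        'SELECT checksum FROM "__schema_versions__" WHERE doctype_name = ?',
        (doctype_name,),
    ).fetchone()
    return row[0] if row else None


def set_checksum(conn, doctype_name, checksum):
    """Insert or overwrite the checksum for one DocType."""
    conn.execute(
        'INSERT OR REPLACE INTO "__schema_versions__" '
        "(doctype_name, checksum) VALUES (?, ?)",
        (doctype_name, checksum),
    )


conn = sqlite3.connect(":memory:")
init_store(conn)
set_checksum(conn, "Todo", "a" * 64)
```

Keying the table on `doctype_name` keeps one row per DocType, so recording a new signature replaces the old one instead of appending history.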

The "Self-Healing" Handshake

While Smart Sync focuses on declarative state, it is designed to work in tandem with the Imperative Track (Alembic). When an Alembic patch modifies the database schema (e.g., a column rename), the next run of m migrate sync will detect a hash mismatch and trigger a reflection pass.

If the database already matches the new DocType state (because of the patch), the sync engine finds no differences, updates the stored signature in __schema_versions__, and exits without applying any DDL. This keeps versioned patches and declarative sync consistent with each other.
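The handshake can be sketched as a single function that returns which path was taken. Everything here is hypothetical scaffolding: `stored` is a plain dict standing in for __schema_versions__, and `reflect` and `apply_ddl` are callables supplied by the caller.

```python
def sync_doctype(name, signature, target, stored, reflect, apply_ddl):
    """Run one sync pass for a DocType and report the path taken.

    stored:    dict of doctype_name -> checksum (stand-in for the table)
    reflect:   callable(name) -> physical schema, comparable to `target`
    apply_ddl: callable(name, physical, target) that migrates the database
    """
    if stored.get(name) == signature:
        return "skipped"  # Layer 1 fast path
    physical = reflect(name)  # Layer 2: hash mismatch or missing signature
    if physical == target:
        # An imperative patch already brought the database to the target
        # state, so only the stored signature needs refreshing.
        stored[name] = signature
        return "signature_updated"
    apply_ddl(name, physical, target)
    stored[name] = signature
    return "migrated"


target = {"title": "VARCHAR(140)", "subject": "VARCHAR(255)"}
stored = {"Todo": "old-signature"}
ddl_calls = []

first = sync_doctype(
    "Todo", "new-signature", target, stored,
    reflect=lambda name: dict(target),  # patch was already applied
    apply_ddl=lambda *args: ddl_calls.append(args),
)
second = sync_doctype(
    "Todo", "new-signature", target, stored,
    reflect=lambda name: dict(target),
    apply_ddl=lambda *args: ddl_calls.append(args),
)
```

In this run the first call only refreshes the signature and the second call takes the fast path, with no DDL applied in either.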

For a deeper dive into how these two systems interact, see the Dual-Track Migration Philosophy.

Multi-Tenancy and Isolation

In multi-tenant environments using separate schemas or databases, the __schema_versions__ table is maintained per-tenant. This ensures that:

  • Different tenants can safely exist at different schema versions.
  • Synchronization remains isolated and deterministic across the entire cluster.
  • The time saved grows with the number of "clean" tenants, since each of them bypasses the reflection bottleneck.
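A small sketch of the per-tenant check, assuming each tenant's stored signatures have already been loaded into a dictionary. The function name and data shapes are hypothetical. Only tenants with at least one stale DocType are returned, so the rest never reach reflection.

```python
def tenants_needing_reflection(signatures, tenant_stores):
    """Return tenant -> list of DocTypes whose stored checksum is stale.

    signatures:    dict of doctype_name -> current signature
    tenant_stores: dict of tenant -> {doctype_name -> stored checksum},
                   one inner dict per tenant's __schema_versions__ table
    """
    pending = {}
    for tenant, stored in tenant_stores.items():
        stale = sorted(
            doctype
            for doctype, sig in signatures.items()
            if stored.get(doctype) != sig
        )
        if stale:
            pending[tenant] = stale
    return pending


signatures = {"Todo": "sig-2", "Note": "sig-1"}
tenant_stores = {
    "acme": {"Todo": "sig-2", "Note": "sig-1"},    # fully in sync
    "globex": {"Todo": "sig-1", "Note": "sig-1"},  # one version behind
    "initech": {},                                 # never synced
}
pending = tenants_needing_reflection(signatures, tenant_stores)
```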

Fail-Safe Design

The Smart Sync logic is designed to be non-intrusive. If the __schema_versions__ table is unavailable (e.g., due to database permissions or during unit tests using mock engines), the system gracefully falls back to Layer 2 (Deep Sync). This ensures that migration reliability is never sacrificed for performance.
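One way to express this fallback is to treat any error while reading the stored signature as "no signature", which routes the DocType to Layer 2. The sketch uses SQLite as a stand-in backend. The helper names and the exception type caught are assumptions tied to that stand-in.

```python
import sqlite3


def read_stored_checksum(conn, doctype_name):
    """Return the stored checksum, or None if it cannot be read."""
    try:
        row = conn.execute(
            'SELECT checksum FROM "__schema_versions__" WHERE doctype_name = ?',
            (doctype_name,),
        ).fetchone()
    except sqlite3.Error:
        # Table missing or unreadable: behave as if nothing was recorded.
        return None
    return row[0] if row else None


def choose_path(conn, doctype_name, current_signature):
    """Pick the fast path only when a matching checksum was actually read."""
    stored = read_stored_checksum(conn, doctype_name)
    return "fast_path" if stored == current_signature else "deep_sync"


# No __schema_versions__ table exists in this fresh database.
bare_conn = sqlite3.connect(":memory:")
path = choose_path(bare_conn, "Todo", "a" * 64)
```

Because the fast path requires a positive match, a read failure can never cause a sync to be skipped.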