Deterministic Schema Migrations (Smart Sync)
Deterministic Schema Migration, or "Smart Sync", is a performance optimization in Framework M that skips redundant database reflection and schema detection when your DocType definitions have not changed structurally.
The Problem: The Reflection Bottleneck
In a traditional declarative migration system, every time you run a synchronization command (like m migrate sync), the framework must:
- Reflect: Query the database to understand the current physical schema (tables, columns, indexes).
- Map: Generate a virtual representation of your current Python DocTypes.
- Detect: Compare the two models to identify differences.
For large applications or multi-tenant clusters with hundreds of schemas, the Reflection phase becomes a significant bottleneck, often taking seconds per database even when no changes are needed.
The Solution: Two-Layer Synchronization
Framework M implements a "Smart Sync" strategy that adds a fast-path validation layer before the expensive reflection phase.
Layer 1: Structural Checksum (Fast Path)
Every DocType's database representation (SQLAlchemy Table) is converted into a deterministic dictionary containing all DDL-relevant metadata:
- Column names, types, and nullability.
- Unique constraints and primary keys.
- Indexes and their included fields.
This dictionary is hashed using SHA-256 to create a Schema Signature. This signature is then compared against the last known signature stored in a local system table called __schema_versions__.
- Match: If the signatures match, the system assumes the schema is in sync and skips all reflection and detection.
- Mismatch: If the signatures differ (or the signature is missing), the system proceeds to Layer 2.
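The signature computation can be sketched as follows. This is a minimal illustration, not Framework M's actual code: it assumes the table has already been reduced to a plain dictionary (the real input is a SQLAlchemy Table), and the function name `schema_signature` and the dictionary keys are hypothetical. It also assumes column order is not DDL-relevant, so columns are sorted by name before hashing.

```python
import hashlib
import json


def schema_signature(table: dict) -> str:
    """Return a SHA-256 hex digest of the DDL-relevant parts of a table description.

    `table` is a hypothetical plain-dict stand-in for a SQLAlchemy Table.
    """
    canonical = {
        "name": table["name"],
        # Sort columns by name so declaration order does not change the hash
        # (an assumption of this sketch).
        "columns": sorted(
            (
                {
                    "name": col["name"],
                    "type": col["type"],
                    "nullable": col.get("nullable", True),
                }
                for col in table["columns"]
            ),
            key=lambda col: col["name"],
        ),
        "primary_key": sorted(table.get("primary_key", [])),
        "unique": sorted(sorted(group) for group in table.get("unique", [])),
        # Field order inside an index is kept, since it affects the index itself.
        "indexes": sorted(
            (
                {"name": idx["name"], "fields": list(idx["fields"])}
                for idx in table.get("indexes", [])
            ),
            key=lambda idx: idx["name"],
        ),
    }
    # sort_keys and fixed separators make the JSON text, and so the hash, deterministic.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


todo = {
    "name": "todo",
    "columns": [
        {"name": "id", "type": "INTEGER", "nullable": False},
        {"name": "title", "type": "VARCHAR(140)"},
    ],
    "primary_key": ["id"],
}
signature = schema_signature(todo)  # 64 hex characters
```

Canonicalizing before hashing is what makes the result deterministic: two runs over the same definition always serialize to the same bytes, so a matching digest is a reliable signal that nothing DDL-relevant changed.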
Bypassing Layer 1 (Force Reflection)
Users can manually bypass the checksum fast-path by using the --force flag in the CLI. This is useful for:
- Troubleshooting suspected state mismatches between the __schema_versions__ table and the physical database.
- Forcing a complete schema validation and reflection pass.
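The decision between the fast path and a full reflection pass can be sketched as a single predicate. This is an assumption about the logic, not the framework's code: the function name `needs_deep_sync` is hypothetical, and `force` stands in for the --force CLI flag.

```python
from typing import Optional


def needs_deep_sync(current: str, stored: Optional[str], force: bool = False) -> bool:
    """Decide whether to skip the checksum fast path and reflect the database."""
    if force:
        # --force bypasses the checksum comparison entirely.
        return True
    # A missing signature is treated the same as a mismatch.
    return stored is None or stored != current
```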
Layer 2: Full Reflection (Deep Sync)
If Layer 1 reports a mismatch, finds no stored signature, or is bypassed with --force, the framework falls back to the traditional method:
- Reflect the physical database.
- Perform a deep, field-by-field comparison.
- Generate and apply only the specific DDL changes required to reach the target state.
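The comparison step can be illustrated with a small diff over column types. This is only a sketch under stated assumptions: the function name `plan_column_changes` is hypothetical, both schemas are reduced to `{column_name: type}` dictionaries, the generated statements use PostgreSQL-style ALTER syntax, and columns that exist only in the database are ignored rather than dropped.

```python
def plan_column_changes(
    table: str, reflected: dict[str, str], target: dict[str, str]
) -> list[str]:
    """Return only the DDL statements needed to move `reflected` toward `target`."""
    statements = []
    # Columns declared in the DocType but missing from the database.
    for name in sorted(target.keys() - reflected.keys()):
        statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {target[name]}")
    # Columns present on both sides whose type differs.
    for name in sorted(target.keys() & reflected.keys()):
        if reflected[name] != target[name]:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {target[name]}"
            )
    return statements


plan = plan_column_changes(
    "todo",
    reflected={"id": "INTEGER"},
    target={"id": "INTEGER", "title": "TEXT"},
)
```

An empty list means the physical schema already matches the target, which is the case the fast path is designed to avoid re-checking on every run.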
The __schema_versions__ Table
This is a framework-managed system table that tracks the state of your migrations. It contains two primary columns:
- doctype_name: The unique identifier for the DocType.
- checksum: The SHA-256 signature of the last successfully synced schema.
[!NOTE] This table is automatically initialized during the first run of m migrate sync. It is ignored by the migration engine's "drop detection" logic, ensuring it is never accidentally deleted.
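A two-column table like this can be sketched with the standard-library `sqlite3` module. Only the table name and the two column names come from the description above; the column types, the choice of SQLite, and the helper names (`init_schema_versions`, `save_signature`, `load_signature`) are assumptions made for illustration.

```python
import sqlite3
from typing import Optional


def init_schema_versions(conn: sqlite3.Connection) -> None:
    """Create the tracking table if it does not exist yet."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "__schema_versions__" ('
        "doctype_name TEXT PRIMARY KEY, "
        "checksum TEXT NOT NULL)"
    )


def save_signature(conn: sqlite3.Connection, doctype_name: str, checksum: str) -> None:
    """Insert or overwrite the stored signature for one DocType."""
    conn.execute(
        'INSERT OR REPLACE INTO "__schema_versions__" (doctype_name, checksum) '
        "VALUES (?, ?)",
        (doctype_name, checksum),
    )
    conn.commit()


def load_signature(conn: sqlite3.Connection, doctype_name: str) -> Optional[str]:
    """Return the stored signature, or None if the DocType was never synced."""
    row = conn.execute(
        'SELECT checksum FROM "__schema_versions__" WHERE doctype_name = ?',
        (doctype_name,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
init_schema_versions(conn)
save_signature(conn, "Todo", "0" * 64)
```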
The "Self-Healing" Handshake
While Smart Sync focuses on declarative state, it is designed to work in tandem with the Imperative Track (Alembic). When an Alembic patch modifies the database schema (e.g., a column rename) and the DocType definition changes to match, the DocType's new signature no longer equals the stored one, so the next run of m migrate sync detects a hash mismatch and triggers a reflection pass.
If the database already matches the new DocType state (because of the patch), the sync engine finds nothing to apply, updates the stored signature in __schema_versions__, and exits. This keeps versioned patches and declarative sync consistent with each other.
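The handshake can be sketched as a small control-flow function. Everything here is hypothetical scaffolding: `sync_doctype`, the `diff` and `apply` callables (standing in for reflection plus comparison, and for DDL execution), and the in-memory `stored` dictionary that plays the role of __schema_versions__.

```python
from typing import Callable, Dict, List


def sync_doctype(
    name: str,
    current_sig: str,
    stored: Dict[str, str],
    diff: Callable[[], List[str]],
    apply: Callable[[List[str]], None],
) -> str:
    """Run the two-layer sync for one DocType and report what happened."""
    if stored.get(name) == current_sig:
        return "skipped"  # Fast path: signatures match, no reflection.

    changes = diff()  # Deep sync: reflect the database and compare.
    if changes:
        apply(changes)
        outcome = "migrated"
    else:
        # The database already matches, e.g. because a patch did the work.
        outcome = "signature-updated"

    stored[name] = current_sig  # Record the new signature in both cases.
    return outcome
```

In the patch scenario, the first run after the patch returns "signature-updated" without applying any DDL, and later runs return "skipped".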
For a deeper dive into how these two systems interact, see the Dual-Track Migration Philosophy.
Multi-Tenancy and Isolation
In multi-tenant environments using separate schemas or databases, the __schema_versions__ table is maintained per-tenant. This ensures that:
- Different tenants can safely exist at different schema versions.
- Synchronization remains isolated and deterministic across the entire cluster.
- Performance gains scale linearly with the number of tenants that are already in sync, as each "clean" tenant bypasses the reflection bottleneck.
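Per-tenant isolation can be illustrated by checking each tenant's stored signatures independently. This is a sketch, assuming each tenant's __schema_versions__ contents have been loaded into a dictionary; the function name and data shapes are hypothetical.

```python
def tenants_needing_deep_sync(
    current: dict[str, str],
    stored_by_tenant: dict[str, dict[str, str]],
) -> list[str]:
    """Return the tenants where at least one DocType signature is missing or stale."""
    dirty = []
    for tenant, stored in stored_by_tenant.items():
        # Each tenant is compared only against its own stored signatures.
        if any(stored.get(doctype) != sig for doctype, sig in current.items()):
            dirty.append(tenant)
    return dirty


current = {"Todo": "sig-v2", "User": "sig-v1"}
stored_by_tenant = {
    "acme": {"Todo": "sig-v2", "User": "sig-v1"},  # clean: skips reflection
    "globex": {"Todo": "sig-v1", "User": "sig-v1"},  # stale Todo signature
    "initech": {},  # never synced
}
dirty = tenants_needing_deep_sync(current, stored_by_tenant)
```

Only the tenants returned here pay the reflection cost; the rest finish after a signature lookup.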
Fail-Safe Design
The Smart Sync logic is designed to be non-intrusive. If the __schema_versions__ table is unavailable (e.g., due to database permissions or during unit tests using mock engines), the system gracefully falls back to Layer 2 (Deep Sync). This ensures that migration reliability is never sacrificed for performance.
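The fallback can be sketched as a lookup that treats any database error as "no stored signature", which routes the caller to Deep Sync. The use of `sqlite3`, the helper name `load_signature_or_none`, and the column types are assumptions for illustration; only the table and column names come from this page.

```python
import sqlite3
from typing import Optional


def load_signature_or_none(conn: sqlite3.Connection, doctype_name: str) -> Optional[str]:
    """Return the stored signature, or None if it cannot be read for any reason."""
    try:
        row = conn.execute(
            'SELECT checksum FROM "__schema_versions__" WHERE doctype_name = ?',
            (doctype_name,),
        ).fetchone()
    except sqlite3.Error:
        # Table missing or unreadable: behave as if no signature were stored,
        # so the caller falls back to full reflection instead of failing.
        return None
    return row[0] if row else None


# A fresh database has no __schema_versions__ table, so the lookup yields None.
empty_conn = sqlite3.connect(":memory:")
fallback = load_signature_or_none(empty_conn, "Todo")
```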