# Phase 10: Production Readiness & Deployment

**Objective:** Optimize for production, add monitoring, implement security best practices, and create deployment configurations.
## 1. Performance Optimization

### 1.1 Database Optimization

- **Add database indexes:**
  - Index on `id` (primary key, UUID)
  - Index on `name` (unique, human-readable identifier)
  - Index on `owner` (for RLS)
  - Index on `modified` (for sorting)
  - Index on foreign keys
  - Composite indexes for common queries
- **Implement query optimization** (see the sketch below):
  - Use `selectinload` for relationships
  - Avoid N+1 queries
  - Add query logging in dev mode
  - Analyze slow queries
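A minimal sketch of both points, assuming SQLAlchemy 2.0 declarative models; the `Invoice`/`InvoiceItem` tables and index names here are illustrative, not part of the framework:

```python
from datetime import datetime

from sqlalchemy import ForeignKey, Index, String, select
from sqlalchemy.orm import (DeclarativeBase, Mapped, mapped_column,
                            relationship, selectinload)

class Base(DeclarativeBase):
    pass

class Invoice(Base):
    __tablename__ = "invoice"

    id: Mapped[str] = mapped_column(String(36), primary_key=True)   # UUID
    name: Mapped[str] = mapped_column(String(140), unique=True)     # human-readable id
    owner: Mapped[str] = mapped_column(String(140), index=True)     # RLS filter column
    modified: Mapped[datetime] = mapped_column(index=True)          # default sort key
    items: Mapped[list["InvoiceItem"]] = relationship(back_populates="invoice")

    # Composite index for the most common list query: "my documents, newest first"
    __table_args__ = (Index("ix_invoice_owner_modified", "owner", "modified"),)

class InvoiceItem(Base):
    __tablename__ = "invoice_item"

    id: Mapped[str] = mapped_column(String(36), primary_key=True)
    parent_id: Mapped[str] = mapped_column(ForeignKey("invoice.id"), index=True)
    invoice: Mapped[Invoice] = relationship(back_populates="items")

# selectinload fetches the items for a whole page of invoices in ONE extra
# SELECT ... WHERE parent_id IN (...), instead of one query per invoice (N+1).
stmt = (
    select(Invoice)
    .options(selectinload(Invoice.items))
    .order_by(Invoice.modified.desc())
    .limit(20)
)
```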
### 1.2 Caching Strategy

- **Implement multi-level caching:**
  - L1: In-memory cache (per-request)
  - L2: Redis cache (shared)
  - Cache metadata (DocType schemas)
  - Cache permission checks
  - Cache frequently accessed documents
- **Add cache invalidation** (see the sketch below):
  - Invalidate on document update
  - Invalidate on permission change
  - Use cache tags for bulk invalidation
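A minimal sketch of the L2 layer with tag-based invalidation, using `redis.asyncio`; the key naming scheme and the `load_schema_from_db` loader are illustrative assumptions:

```python
import json

import redis.asyncio as redis

r = redis.Redis(decode_responses=True)

async def get_schema(doctype: str) -> dict | None:
    """L2 lookup for DocType metadata; fall through to the DB on a miss."""
    key = f"schema:{doctype}"
    cached = await r.get(key)
    if cached is not None:
        return json.loads(cached)
    schema = await load_schema_from_db(doctype)  # hypothetical loader
    if schema is not None:
        await r.set(key, json.dumps(schema), ex=300)
        # Record the key under a tag so the whole group can be invalidated at once
        await r.sadd("tag:schemas", key)
    return schema

async def invalidate_all_schemas() -> None:
    """Bulk invalidation via a cache tag: delete every key recorded under it."""
    keys = await r.smembers("tag:schemas")
    if keys:
        await r.delete(*keys)
    await r.delete("tag:schemas")
```

The L1 per-request layer can be a plain dict held in a `contextvars.ContextVar`, checked before Redis; it needs no invalidation because it dies with the request.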
### 1.3 Connection Pooling

- **Configure the SQLAlchemy pool** (see the sketch below):
  - Pool size: 20
  - Max overflow: 10
  - Pool timeout: 30s
  - Pool recycle: 3600s
- **Configure the Redis pool:**
  - Max connections: 50
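A sketch of wiring those numbers into SQLAlchemy and redis-py; the connection URLs are placeholders:

```python
import redis.asyncio as redis
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://app:secret@db:5432/app",  # placeholder URL
    pool_size=20,        # steady-state connections held open
    max_overflow=10,     # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=3600,   # recycle connections hourly to avoid stale sockets
)

redis_pool = redis.ConnectionPool.from_url(
    "redis://cache:6379/0", max_connections=50
)
redis_client = redis.Redis(connection_pool=redis_pool)
```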
### 1.4 Async Optimization

- Use `asyncio.gather()` for parallel operations (see the sketch below)
- Avoid blocking calls in async functions
- Use connection pools efficiently
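For example (the `fetch_*` helpers are hypothetical):

```python
import asyncio

async def load_dashboard(user: str) -> dict:
    # The three awaits below are independent I/O, so run them concurrently:
    # total latency becomes the slowest call rather than the sum of all three.
    docs, perms, jobs = await asyncio.gather(
        fetch_recent_docs(user),    # hypothetical async helpers
        fetch_permissions(user),
        fetch_job_counts(),
    )
    return {"docs": docs, "permissions": perms, "jobs": jobs}
```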
## 2. Security Hardening

### 2.1 Input Validation
- Validate all inputs with Pydantic
- Sanitize user-provided data
- Prevent SQL injection (use parameterized queries)
- Prevent XSS (API-only, no HTML rendering)
### 2.2 Authentication Security

- **Support JWT token validation:**
  - Verify signature
  - Check expiration
  - Validate issuer
- **Add rate limiting** (see the sketch below):
  - Limit login attempts
  - Limit API requests per user
  - Use Redis for rate-limit storage
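A minimal fixed-window limiter backed by Redis `INCR`/`EXPIRE`; the key naming and limits are illustrative:

```python
import redis.asyncio as redis

r = redis.Redis()

async def allow_request(user: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` requests per `window` seconds."""
    key = f"ratelimit:{user}"
    count = await r.incr(key)        # atomic increment
    if count == 1:
        await r.expire(key, window)  # first hit starts the window
    return count <= limit
```

Middleware would translate a `False` return into HTTP 429; login attempts would use a much smaller limit keyed by username plus client IP.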
### 2.3 Authorization Security

- Enforce permissions on all endpoints
- Apply RLS to all queries
- Validate object-level permissions
- Log permission denials
### 2.4 Secrets Management

- Never commit secrets to Git
- Use environment variables
- Support secret managers:
  - AWS Secrets Manager
  - Google Secret Manager
  - HashiCorp Vault
### 2.5 HTTPS & CORS

- Enforce HTTPS in production
- Configure CORS properly:
  - Whitelist allowed origins
  - Restrict methods and headers
## 3. Monitoring & Observability

### 3.1 Logging

- **Implement structured logging** (see the sketch below):
  - Use JSON format
  - Include request ID
  - Include user context
  - Log levels: DEBUG, INFO, WARNING, ERROR
- **Log important events:**
  - API requests (with timing)
  - Database queries (in dev)
  - Permission checks
  - Errors and exceptions
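One way to get this with structlog (a common choice, not mandated by this plan); the field names are illustrative:

```python
import logging

import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,   # pull in request-scoped context
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),        # one JSON object per line
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
)

log = structlog.get_logger()

# Bind request-scoped fields once (e.g., in middleware); every log line
# emitted during the request then carries them automatically.
structlog.contextvars.bind_contextvars(request_id="abc-123", user="admin@example.com")
log.info("request_completed", path="/api/v1/invoice", duration_ms=42)
```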
### 3.2 Metrics

- **Add Prometheus metrics** (see the sketch below):
  - Request count by endpoint
  - Request duration histogram
  - Database query duration
  - Cache hit/miss rate
  - Job queue length
- **Create a `/metrics` endpoint**
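A sketch using `prometheus_client`; the metric names and labels are illustrative:

```python
from prometheus_client import Counter, Histogram, make_asgi_app

REQUEST_COUNT = Counter(
    "http_requests_total", "Requests by endpoint and status",
    ["endpoint", "status"],
)
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds", "Request duration", ["endpoint"],
)

# Inside request-handling middleware (surrounding app not shown):
#   with REQUEST_LATENCY.labels(endpoint=path).time():
#       response = await handler(request)
#   REQUEST_COUNT.labels(endpoint=path, status=response.status_code).inc()

# Mount this ASGI app at /metrics to expose the scrape endpoint.
metrics_app = make_asgi_app()
```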
### 3.3 Tracing

- Implement OpenTelemetry:
  - Trace HTTP requests
  - Trace database queries
  - Trace background jobs
  - Export to Jaeger or Zipkin
### 3.4 Health Checks

- **Create a `/health` endpoint** (see the sketch below):
  - Check database connection
  - Check Redis connection
  - Return 200 if healthy, 503 if not
- **Create a `/ready` endpoint:**
  - Check the app is ready to serve traffic
  - Check migrations are up to date
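A minimal sketch assuming a FastAPI app, with `engine` and `redis_client` constructed as in §1.3:

```python
from fastapi import FastAPI, Response
from sqlalchemy import text

app = FastAPI()

@app.get("/health")
async def health(response: Response) -> dict:
    checks: dict[str, str] = {}
    try:
        async with engine.connect() as conn:   # engine from §1.3
            await conn.execute(text("SELECT 1"))
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "down"
    try:
        await redis_client.ping()              # redis client from §1.3
        checks["redis"] = "ok"
    except Exception:
        checks["redis"] = "down"

    healthy = all(v == "ok" for v in checks.values())
    response.status_code = 200 if healthy else 503
    return {"status": "healthy" if healthy else "unhealthy", "checks": checks}
```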
## 4. Error Handling & Resilience

### 4.1 Graceful Degradation

- Handle database failures gracefully
- Handle Redis failures (fall back to running without cache)
- Retry transient errors
### 4.2 Circuit Breaker

- Implement a circuit breaker for external services (see the sketch below):
  - Webhook calls
  - Email sending
  - Print service (Gotenberg)
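A minimal in-process breaker sketch (a maintained circuit-breaker library could be used instead); the thresholds are illustrative:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `reset_after`s."""

    def __init__(self, threshold: int = 5, reset_after: float = 60.0) -> None:
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.reset_after:
            # half-open: let one probe through; a failure reopens immediately
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False  # open: fail fast without calling the service

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

Each external service gets its own breaker: call `allow()` before the request, `record(True/False)` after, and skip (or queue for retry) when the breaker is open.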
### 4.3 Timeouts

- Set timeouts for all operations (see the sketch below):
  - HTTP requests: 30s
  - Database queries: 10s
  - Background jobs: 300s
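Sketches of wiring those budgets in, assuming httpx, asyncpg, and Python 3.11+; the URL is a placeholder:

```python
import asyncio

import httpx
from sqlalchemy.ext.asyncio import create_async_engine

# Outbound HTTP: 30s budget covering connect, read, write, and pool acquisition.
http_client = httpx.AsyncClient(timeout=httpx.Timeout(30.0))

# Database statements: 10s cap, passed through to the asyncpg driver.
engine = create_async_engine(
    "postgresql+asyncpg://app:secret@db/app",  # placeholder URL
    connect_args={"command_timeout": 10},
)

async def run_job(job_fn) -> None:
    # Background jobs: hard 300s ceiling (asyncio.timeout requires Python 3.11+).
    async with asyncio.timeout(300):
        await job_fn()
```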
## 5. Deployment Configurations

### 5.1 Docker

- **Create a production `Dockerfile`:**

  ```dockerfile
  FROM python:3.12-slim
  # uv is not in the base image; copy the static binary from its official image
  COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
  WORKDIR /app
  COPY . .
  RUN uv pip install --system --no-cache .
  CMD ["m", "start", "--host", "0.0.0.0"]
  ```

- Use a multi-stage build for smaller images
- Use a non-root user
- Add a health check
### 5.2 Docker Compose

- **Create `docker-compose.yml`** (see the sketch below) with:
  - API service
  - Worker service
  - PostgreSQL
  - Redis
  - Gotenberg (for PDFs)
- Add volumes for persistence
- Add networks for isolation
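A minimal sketch of such a compose file; the credentials are placeholders and the `m worker` command assumes a worker entrypoint by that name:

```yaml
services:
  api:
    build: .
    command: ["m", "start", "--host", "0.0.0.0"]
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql+asyncpg://app:secret@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on: [db, redis]
  worker:
    build: .
    command: ["m", "worker"]   # hypothetical worker entrypoint
    depends_on: [db, redis]
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7
  gotenberg:
    image: gotenberg/gotenberg:8
volumes:
  pgdata:
```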
### 5.3 Kubernetes

- **Create Kubernetes manifests:**
  - Deployment for the API
  - Deployment for the Worker
  - StatefulSet for PostgreSQL
  - StatefulSet for Redis
  - Service for the API
  - Ingress for routing
- **Add resource limits:**
  - CPU: 500m to 2000m
  - Memory: 512Mi to 2Gi
- **Add liveness and readiness probes** (see the sketch below)
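An illustrative API Deployment fragment showing the limits and probes; the image name and port are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: m-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: m-api
  template:
    metadata:
      labels:
        app: m-api
    spec:
      containers:
        - name: api
          image: registry.example.com/m-api:v1.0.0  # placeholder image
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 2000m
              memory: 2Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            periodSeconds: 5
```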
### 5.4 Cloud Platforms

- Create deployment guides for:
  - Google Cloud Run
  - AWS ECS/Fargate
  - Azure Container Apps
  - Heroku
  - Railway
  - Render
## 6. CI/CD Pipeline

### 6.1 GitLab CI

- Configure `.gitlab-ci.yml` (already exists in the repo):

  ```yaml
  stages:
    - test
    - build
    - deploy

  test:
    stage: test
    script:
      - uv sync
      - uv run pytest
      - uv run mypy --strict .
      - uv run ruff check .
    rules:
      - if: $CI_PIPELINE_SOURCE == "merge_request_event"

  build:
    stage: build
    script:
      - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
      - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    rules:
      - if: $CI_COMMIT_BRANCH == "main"

  deploy-staging:
    stage: deploy
    script:
      - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    environment: staging
    rules:
      - if: $CI_COMMIT_BRANCH == "main"

  deploy-production:
    stage: deploy
    script:
      - kubectl set image deployment/app app=$CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    environment: production
    rules:
      - if: $CI_COMMIT_TAG =~ /^v.*/
  ```
### 6.2 Pre-commit Hooks

- Set up pre-commit (see the sketch below):
  - Run the ruff formatter
  - Run mypy
  - Run tests
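A possible `.pre-commit-config.yaml`; the ruff hook repo is the real upstream, while the `rev` pin and the local-hook wiring are illustrative:

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9   # pin to the current release
    hooks:
      - id: ruff          # lint
      - id: ruff-format   # format
  - repo: local
    hooks:
      - id: mypy
        name: mypy --strict
        entry: uv run mypy --strict .
        language: system
        pass_filenames: false
      - id: pytest
        name: pytest (fast suite)
        entry: uv run pytest -x -q
        language: system
        pass_filenames: false
```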
### 6.3 Release Versioning & Publishing

- **Adopt Semantic Versioning (SemVer):**
  - `MAJOR.MINOR.PATCH` format
  - Document the breaking-change policy
  - Use `0.x.y` for pre-1.0 releases
- **CHANGELOG management:**
  - Create `CHANGELOG.md` following the Keep a Changelog format
  - Sections: Added, Changed, Deprecated, Removed, Fixed, Security
  - Option: use git-cliff for automation
  - Link the CHANGELOG to the documentation site
- **Conventional Commits (optional but recommended):**
  - `feat:` → MINOR bump
  - `fix:` → PATCH bump
  - `feat!:` or `BREAKING CHANGE:` → MAJOR bump
  - Set up commitlint for enforcement
- **Release workflow** (add to `.gitlab-ci.yml`):

  ```yaml
  release:
    stage: deploy
    script:
      - uv build
      - uv publish
    variables:
      UV_PUBLISH_TOKEN: $PYPI_TOKEN
    rules:
      - if: $CI_COMMIT_TAG =~ /^v.*/

  release-notes:
    stage: deploy
    image: registry.gitlab.com/gitlab-org/release-cli:latest
    script:
      - echo "Creating release for $CI_COMMIT_TAG"
    release:
      tag_name: $CI_COMMIT_TAG
      description: ./CHANGELOG.md
    rules:
      - if: $CI_COMMIT_TAG =~ /^v.*/
  ```

- **Version sources:**
  - Single source of truth in `pyproject.toml`
  - Use dynamic versioning from git tags OR `__version__` in the package
  - Expose the version via the `m --version` CLI
- **Pre-release versions:**
  - Support alpha/beta/rc: `1.0.0-alpha.1`, `1.0.0-beta.2`, `1.0.0-rc.1`
  - Publish pre-releases to TestPyPI first
### 6.4 Git Branching Strategy

> [!NOTE]
> Framework M's hexagonal architecture (adapters are swappable, not forked) enables a simpler branching model than Frappe's `version-XX` branches. Breaking changes are isolated to Protocol interfaces, not scattered across the codebase.

- **Trunk-Based Development (recommended for M):**

  ```text
  main (default)
  │
  ├── feature/xyz      # Short-lived feature branches
  ├── fix/issue-123    # Bug-fix branches
  │
  └── Tags: v0.1.0, v0.2.0, v1.0.0, v1.0.1, v1.1.0 ...
  ```

- **Branch naming conventions:**
  - `main` — always deployable, latest development
  - `feature/<name>` — new features (merge to main via PR)
  - `fix/<issue-id>` — bug fixes (merge to main via PR)
  - `release/v<major>.<minor>` — created only when LTS support is needed
- **Release branches (only for LTS):**

  ```text
  main
  │
  ├─── v1.0.0 (tag)
  │
  └─── release/v1   ← Created when v2.0.0 is released
       │
       ├─── v1.0.1 (backport tag)
       └─── v1.0.2 (backport tag)
  ```

- **LTS (Long-Term Support) policy:**
  - Support the last 2 major versions (not 3 like Frappe — M is simpler)
  - Security fixes: backport to all supported versions
  - Bug fixes: backport to the latest minor of each supported major
  - Features: only on `main`, no backports
- **Backporting workflow:**
  - Fix on `main` first (always forward-fix)
  - Cherry-pick to the `release/vX` branch
  - Tag a patch release: `vX.Y.Z`
  - Tool: use `git cherry-pick -x` to track the source commit
- **Why simpler than Frappe?**
  - Hexagonal architecture = adapters can be versioned independently
  - No DB schema in core = migrations are app-specific, not framework-specific
  - Protocol stability = breaking changes are explicit, not hidden
  - Apps can pin the framework version = no forced upgrade cascades
## 7. Database Migration Strategy

> [!IMPORTANT]
> Migrations are generated by Alembic (Phase 02). This section defines when and how to apply them safely.

### 7.1 Change Classification

| Category | Examples | Auto-Apply (Dev)? | Production Strategy |
|---|---|---|---|
| Safe | Add nullable column, add table, add index | ✅ Yes | Apply directly |
| Review | Add non-nullable column, change type (compatible) | ⚠️ Generate only | Review → Staging → Prod |
| Dangerous | Drop column, rename column, change type (incompatible) | ❌ No | Manual migration with data handling |
### 7.2 Safe Changes (Auto-Migration in Dev Mode)

These changes are automatically applied in development (`m start --dev`):

- **Add table:** new DocType → new table
- **Add nullable column:** new optional field → `ALTER TABLE ADD COLUMN ... NULL`
- **Add index:** `Meta.indexes` → `CREATE INDEX`
- **Add foreign key:** Link field → `ALTER TABLE ADD CONSTRAINT`

```bash
# Dev mode: auto-applies safe migrations
m start --dev   # Detects changes, generates & applies migration

# Production: never auto-apply
m start         # Requires explicit `m migrate` first
```
### 7.3 Review-Required Changes

These changes generate migrations but require human review:

- **Add a non-nullable column without a default:**
  - The migration will fail if the table has existing rows
  - Solution: add with a default → backfill → remove the default
- **Change a column type (compatible):**
  - `str` → `Text` (widening): usually safe
  - `int` → `bigint` (widening): usually safe
  - Review for edge cases

```text
# CLI shows a warning for review-required changes
$ m migrate:create add_phone_field
⚠️  Review required: 'phone' is non-nullable without default
    Existing rows will cause migration to fail.
    Recommendation:
      1. Add field with default: phone: str = ""
      2. Run migration
      3. Backfill data
      4. Remove default if desired
```
### 7.4 Dangerous Changes (Manual Handling Required)

These changes require explicit data migration code:

**Drop Column**

```python
# Migration: drop_old_field.py
import sqlalchemy as sa
from alembic import op

def upgrade():
    # Step 1: Ensure no code references the column
    # Step 2: Drop the column
    op.drop_column('invoice', 'old_field')

def downgrade():
    # Re-add the column (data is lost!)
    op.add_column('invoice', sa.Column('old_field', sa.String(255)))
```

**Rename Column**

```python
# Migration: rename_customer_to_party.py
from alembic import op

def upgrade():
    # Rename preserves data
    op.alter_column('invoice', 'customer', new_column_name='party')

def downgrade():
    op.alter_column('invoice', 'party', new_column_name='customer')
```

**Type Change (Incompatible)**

```python
# Migration: age_str_to_int.py
import sqlalchemy as sa
from alembic import op
from sqlalchemy import text

def upgrade():
    # Step 1: Add the new column
    op.add_column('person', sa.Column('age_int', sa.Integer(), nullable=True))

    # Step 2: Migrate data with a transformation
    connection = op.get_bind()
    connection.execute(text("""
        UPDATE person
        SET age_int = CASE
            WHEN age ~ '^[0-9]+$' THEN age::integer
            ELSE NULL
        END
    """))

    # Step 3: Drop the old column
    op.drop_column('person', 'age')

    # Step 4: Rename the new column
    op.alter_column('person', 'age_int', new_column_name='age')

def downgrade():
    # Reverse transformation (lossy: non-numeric originals are gone)
    op.alter_column('person', 'age', type_=sa.String(10),
                    postgresql_using='age::varchar')
```
### 7.5 Data Migration Patterns

**Pattern A: Backfill in Migration (Small Tables)**

```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy import text

def upgrade():
    # For tables under ~100k rows: inline backfill
    op.add_column('user', sa.Column('full_name', sa.String(255)))
    connection = op.get_bind()
    # "user" is a reserved word in PostgreSQL, so quote the table name
    connection.execute(text("""
        UPDATE "user" SET full_name = first_name || ' ' || last_name
    """))
```

**Pattern B: Background Job (Large Tables)**

```python
# Migration: add_computed_field.py
import sqlalchemy as sa
from alembic import op

def upgrade():
    # Step 1: Add a nullable column
    op.add_column('invoice', sa.Column('tax_amount', sa.Numeric(), nullable=True))
    # Step 2: Document: run `m job:run backfill_tax_amounts` after the migration

# Job: jobs/backfill_tax_amounts.py
@job
async def backfill_tax_amounts(batch_size: int = 1000):
    """Backfill tax_amount for existing invoices."""
    async with uow_factory() as uow:
        invoices = await repo.list(
            filters=[Filter("tax_amount", "is", None)],
            limit=batch_size,
        )
        for inv in invoices.items:
            inv.tax_amount = inv.total * inv.tax_rate
            await repo.save(uow.session, inv)
        await uow.commit()
```
**Pattern C: Two-Phase Migration (Zero Downtime)**

- **Phase 1 (deploy v1.1):**
  - Add the new column (nullable)
  - Code writes to BOTH old and new columns
  - A background job backfills the new column
- **Phase 2 (deploy v1.2):**
  - Code reads from the new column only
  - Drop the old column in a migration
### 7.6 Zero-Downtime Migrations

- Never drop columns in the same deploy as code changes
- Always add columns as nullable first
- Always backfill data before making a column non-nullable
- Always deploy code changes before dropping columns

Deploy timeline:

```text
v1.0: old_field exists, code uses old_field
  │
v1.1: new_field added (nullable), code writes to BOTH
  │   ← Run backfill job
v1.2: code reads from new_field only
  │
v1.3: drop old_field migration
```
### 7.7 Migration Workflow (Production)

```bash
# 1. Generate the migration (dev environment)
m migrate:create add_user_phone_field

# 2. Review the generated migration
cat alembic/versions/xxxx_add_user_phone_field.py

# 3. Test on local/staging
m migrate
m migrate:rollback   # Verify rollback works

# 4. Commit the migration file to Git
git add alembic/versions/xxxx_add_user_phone_field.py
git commit -m "feat: add phone field to user"

# 5. Deploy to staging
#    CI/CD runs: m migrate

# 6. Verify staging works

# 7. Deploy to production
#    CI/CD runs: m migrate
```
### 7.8 CLI Commands Reference

| Command | Description |
|---|---|
| `m migrate` | Apply all pending migrations |
| `m migrate:status` | Show current migration state |
| `m migrate:create <name>` | Generate a new migration |
| `m migrate:rollback` | Roll back the last migration |
| `m migrate:history` | Show migration history |
| `m migrate:heads` | Show latest migration(s) |
## 8. Backup & Recovery

### 8.1 Database Backups

- Automated daily backups (see the sketch below)
- Store backups in S3 or equivalent
- Retention policy (30 days)
- Test the restore process
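An illustrative nightly backup script; the bucket name is a placeholder and `DATABASE_URL` is assumed to be set in the environment:

```bash
#!/usr/bin/env bash
# Nightly backup sketch: dump in custom format, ship to S3.
# Schedule via cron or a Kubernetes CronJob.
set -euo pipefail

STAMP=$(date +%F)
DUMP="/tmp/app-${STAMP}.dump"

pg_dump "$DATABASE_URL" --format=custom --file="$DUMP"
aws s3 cp "$DUMP" "s3://example-backups/db/app-${STAMP}.dump"
rm -f "$DUMP"

# 30-day retention is simplest as an S3 lifecycle rule on the bucket.
# Test restores regularly, e.g.: pg_restore --dbname="$RESTORE_URL" app-YYYY-MM-DD.dump
```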
### 8.2 Disaster Recovery

- Document recovery procedures
- Test recovery regularly
- RTO (Recovery Time Objective): 4 hours
- RPO (Recovery Point Objective): 24 hours
## 9. Scaling Strategy

### 9.1 Horizontal Scaling
- API servers: Stateless, scale horizontally
- Workers: Scale based on queue length
- Database: Use read replicas for read-heavy workloads
### 9.2 Load Balancing

- Use load balancer (Nginx, ALB)
- Health check endpoints
- Session affinity not required (stateless)
## 10. Documentation

### 10.1 Deployment Guide
- Write a step-by-step deployment guide
- Include an environment-variables reference
- Include a troubleshooting section
### 10.2 Operations Runbook

- Document common operations:
  - Scaling up/down
  - Running migrations
  - Backup and restore
  - Monitoring and alerts
## 11. Testing

### 11.1 Load Testing

- Use Locust or k6 for load testing (see the sketch below)
- **Test scenarios:**
  - 1,000 concurrent users
  - 10,000 requests/minute
  - Sustained load for 1 hour
- **Measure:**
  - Response times (p50, p95, p99)
  - Error rate
  - Resource usage
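A minimal Locust file matching the 1,000-user scenario; the endpoint paths and host are illustrative:

```python
# locustfile.py
from locust import HttpUser, between, task

class ApiUser(HttpUser):
    wait_time = between(1, 3)  # think time between tasks, in seconds

    @task(3)
    def list_invoices(self) -> None:
        self.client.get("/api/v1/invoice?limit=20")

    @task(1)
    def read_invoice(self) -> None:
        self.client.get("/api/v1/invoice/INV-00001")

# Run the 1,000-user scenario for an hour:
#   locust -f locustfile.py --headless --users 1000 --spawn-rate 50 \
#          --run-time 1h --host https://staging.example.com
```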
### 11.2 Security Testing

- Run security scans:
  - OWASP ZAP
  - Dependency vulnerability scan
  - Container image scan
## Validation Checklist

Final checks before production:
- All tests pass (unit, integration, E2E)
- Load testing shows acceptable performance
- Security scans show no critical issues
- Monitoring and alerts are configured
- Backups are automated and tested
- Documentation is complete
- Deployment pipeline works
- Health checks return 200
- Logs are structured and useful
- Secrets are not in code
## Anti-Patterns to Avoid

❌ **Don't:** Deploy without testing
✅ **Do:** Test thoroughly in staging

❌ **Don't:** Ignore monitoring
✅ **Do:** Set up alerts for critical metrics

❌ **Don't:** Hardcode configuration
✅ **Do:** Use environment variables

❌ **Don't:** Run as root in containers
✅ **Do:** Use a non-root user

❌ **Don't:** Skip backups
✅ **Do:** Automate and test backups regularly