Skip to main content

The Evolution of Pluggable Tree Strategies: From Nested Sets to Recursive CTEs and Beyond

· 5 min read
Revant Nandgaonkar
Maintainer of Framework M

Hierarchical data is everywhere in business applications—from organizational charts and product catalogs to tasks, nested comments, and ledger accounts. Yet, representing tree structures in a relational SQL database is a classic engineering trade-off between write performance, read complexity, and storage overhead.

In this article, we'll walk through the evolution of tree strategies in Framework M: how we moved from a Frappe-inspired Nested Set model to a pluggable Strategy Pattern supporting recursive CTEs, and how this architecture paves the way for specialized document and graph repositories.


The Starting Point: Frappe and the Nested Set Model

When we first designed tree support in Framework M, we drew heavy inspiration from Frappe Framework, which uses the Nested Set Model (via lft and rgt bounds) to represent hierarchies.

Root (1, 10)
/ \
Child A (2, 5) Child B (6, 9)
/ \
Leaf A1 (3, 4) Leaf A2 (5, 5)

The Read Advantage

Nested Sets are incredibly fast for read-heavy hierarchies. Retrieving an entire subtree or finding all descendants of a node requires a single, simple SQL range query without recursion or joins:

SELECT * FROM accounts WHERE lft BETWEEN :parent_lft AND :parent_rgt;

The Write Bottleneck

However, this efficiency comes at a steep cost for write operations. Because nodes are packed sequentially inside the bounds, inserting a new node or moving a subtree requires shifting the lft and rgt values of other rows across the table:

UPDATE accounts SET rgt = rgt + 2 WHERE rgt >= :new_lft;
UPDATE accounts SET lft = lft + 2 WHERE lft >= :new_lft;

In a high-concurrency production database, this gap-opening logic causes table-wide row locking, deadlocks, and severe latency spikes. For write-heavy structures like a task list or comments thread, the Nested Set model is a major bottleneck.


Exploring Alternatives: Odoo's Materialized Path

Next, we looked at Odoo, which takes a different approach called Materialized Path (or Lineage).

Instead of integer bounds, Odoo stores the ancestry line directly in a text column (e.g., parent_path = "1/5/12/").

  • The Pros: Inserting a node is simple: append the parent's ID to the parent's path. Moving a node requires updating the prefix of its descendants, which is faster and cleaner than shifting sequential integer bounds.
  • The Cons: Queries require string prefix matching (LIKE '1/5/%'), which can be slow on large datasets and introduces index fragmentation. It also places a hard limit on tree depth based on column character limits.

The UUID Conundrum in Distributed Systems

In Framework M, we use UUIDs (including deterministic UUIDv5) for primary keys to support high-throughput, lock-free, and distributed record generation. If we were to apply Odoo's Materialized Path strategy to our models, our path strings would look like this: parent_path = "f47ac10b-58cc-4372-a567-0e02b2c3d479/3b4e6d42-ef49-49aa-add4-a2e0e454d062/"

Even for a shallow tree of depth 3, the path string exceeds 110 characters. Storing and indexing such long strings on every row incurs massive index storage overhead, inflates database memory pages, and severely slows down string-based prefix queries.


The Choice: Adjacency List with Recursive CTEs

We wanted the simplicity of standard parent-child pointers (parent_value) but needed to avoid the "N+1 query problem" traditionally associated with fetching Adjacency Lists in SQL.

Fortunately, modern relational databases (PostgreSQL, SQLite, and MySQL 8.0+) natively support Recursive Common Table Expressions (CTEs). This makes it possible to fetch arbitrary subtrees recursively in a single query:

WITH RECURSIVE tree_cte AS (
SELECT id, parent_value, name FROM todo WHERE id = :parent_id
UNION ALL
SELECT t.id, t.parent_value, t.name
FROM todo t
INNER JOIN tree_cte ON t.parent_value = tree_cte.id
)
SELECT * FROM tree_cte;

By coupling the Adjacency List model with Recursive CTEs, we achieved:

  1. O(1) Writes: Inserting or moving a node is a single row update (changing the parent_value column).
  2. Fast Subtree Reads: Subtrees are computed entirely on the database server side and returned in a single query.
  3. Strict Validation: We run ancestor CTEs during updates to immediately catch and block cycle loops (the "Ancestor Trap").

Pluggable Strategy Architecture

Rather than forcing developers into a single model, we established a Strategy Pattern to support both patterns transparently.

Developers define their tree schema by choosing a mixin:

  • Use NestedSetMixin for read-heavy structures (e.g., Chart of Accounts).
  • Use AdjacencyListMixin for write-heavy hierarchies (e.g., Project Tasks).

Behind the scenes, TreeManager resolves the mixin type and delegates operations to the corresponding NestedSetTreeStrategy or AdjacencyListTreeStrategy. The generic database repository, web controller routes, and frontend tree views automatically adapt to the active strategy without code duplication.

Future Proof

Because we decoupled the tree logic into a Strategy Pattern, adding new representations (like Odoo's Materialized Path) in the future requires only implementing a new strategy class, leaving the core repository untouched.


Beyond SQL: Graph and Document Repositories

While recursive CTEs scale well for most business applications, deeply nested networks or graph-like relations can still push relational databases to their limits.

Since Framework M's data persistence layer relies on clean repository interfaces, we can extend this hierarchy beyond relational SQL databases:

1. Document Stores (MongoDB)

If a tree is mostly self-contained, we can represent it using nested subdocuments or a materialized path array in MongoDB. MongoDB's $graphLookup stage handles recursive hierarchical lookups natively, making it a powerful alternative for document-heavy trees.

2. Graph Databases (Neo4j)

For complex networks with heterogeneous nodes (e.g., bill of materials, social networks, or permission paths), we can route operations to a specialized Graph Repository running Cypher queries:

MATCH (p:Task {id: $parent_id})<-[:PARENT*]-(c:Task)
RETURN c;

By decoupling the domain entities from the storage adapter, Framework M ensures that as your hierarchical data grows, you can seamlessly migrate your repositories to graph or document databases without changing your core business logic or UI representations.