Files
rust_browser/_bmad-output/implementation-artifacts/2-3-adoption-agency-and-foster-parenting.md
2026-03-14 22:39:20 -04:00

19 KiB

Story 2.3: Adoption Agency & Foster Parenting

Status: done

Story

As a web user, I want misnested formatting tags and misplaced table content to render correctly, So that real-world pages with imperfect HTML display as expected.

Acceptance Criteria

  1. Misnested formatting elements restructured per WHATWG HTML §13.2.6.4.7: Input like <b>text<i>more</b>rest</i> produces the correct DOM tree where formatting is properly nested — <b> wraps "text" and the first "more", a new <i> wraps "rest" after </b>.

  2. Adoption agency iteration and nesting limits respected: The algorithm enforces the spec's outer loop limit (8 iterations) and inner loop limit (3 iterations) to prevent infinite loops on deeply nested or pathological formatting markup.

  3. Foster parenting works for non-table content inside tables: Non-table content directly inside <table>, <tbody>, <tr> elements is reparented to the correct location before the table per WHATWG HTML §13.2.6.1.

  4. Adoption agency and foster parenting interact correctly: Documents combining both mis-nested formatting elements and misplaced table content produce the spec-defined DOM tree.

  5. WPT tree-construction tests pass for adoption agency and foster parenting cases. docs/HTML5_Implementation_Checklist.md updated. just ci passes.

Tasks / Subtasks

  • Task 1: Add positional DOM insertion (AC: #1, #3)

    • 1.1 Add Document::insert_before(parent: NodeId, new_child: NodeId, ref_child: NodeId) method in crates/dom/src/document.rs — inserts new_child before ref_child in parent.children; if new_child already has a parent, remove it first
    • 1.2 Add Document::insert_at(parent: NodeId, child: NodeId, index: usize) helper — inserts child at a specific index in parent's children vec
    • 1.3 Add unit tests for both methods: insert at beginning, middle, end; move from one parent to another; edge cases (empty children, out-of-range index clamps to end)
  • Task 2: Implement adoption agency algorithm (AC: #1, #2)

    • 2.1 Add method TreeBuilder::run_adoption_agency(&mut self, tag: &str, doc: &mut Document) — the main algorithm entry point, called from in-body mode when an end tag for a formatting element is received
    • 2.2 Implement step 1-3: If the subject (end tag name) is the current node's tag and it's NOT in the active formatting elements list, just pop it and return
    • 2.3 Implement outer loop (max 8 iterations per spec):
      • Find the formatting element in active_formatting_elements (last entry matching tag name)
      • If not found, fall through to "any other end tag" behavior
      • If found but NOT in open_elements, remove from active list and return
      • If found but NOT in scope, parse error and return
      • If found and NOT the current node, parse error (continue anyway)
    • 2.4 Implement finding the furthest block: scan open_elements forward from the formatting element — first element that is NOT a formatting/phrasing element is the furthest block
    • 2.5 If no furthest block: pop open_elements up to and including formatting element, remove from active formatting list, return
    • 2.6 Implement the inner loop (max 3 iterations per spec):
      • Let node = element above furthest block in open_elements
      • Walk up open_elements from formatting element
      • If node is in active formatting list, it becomes the new common ancestor
      • Create a new element cloning the formatting element's tag and attributes
      • Reparent: move furthest block's children into the new element
      • Append new element as child of furthest block
      • Replace formatting element in active formatting list with new element
      • Move formatting element in open_elements
    • 2.7 Final step: insert the new element at the appropriate position — as a child of the common ancestor, in the position after the furthest block
    • 2.8 Handle the <a> tag special case: when opening a new <a>, if an <a> is already in active formatting elements, run adoption agency to close it first (Story 2.2 may have stubbed this — replace the stub)
  • Task 3: Implement formatting element classification (AC: #1)

    • 3.1 Create a function or const set is_formatting_element(tag: &str) -> bool for: a, b, big, code, em, font, i, nobr, s, small, strike, strong, tt, u
    • 3.2 Ensure adoption agency is called ONLY for these elements' end tags in the in-body mode handler
    • 3.3 For all other end tags in-body, use the generic "any other end tag" logic (walk open_elements backward, match and pop)
  • Task 4: Implement foster parenting (AC: #3)

    • 4.1 Implement TreeBuilder::foster_parent_insert(&mut self, node: NodeId, doc: &mut Document) — the foster parent insertion point algorithm per §13.2.6.1:
      • Search open_elements backward for a <table> element
      • If <table> has a parent in the DOM → insert node before <table> in its parent's children (using Document::insert_before)
      • If <table> has no parent (or is the first element) → append to the element one above <table> in open_elements
      • If no <table> found → append to the first element in open_elements (html)
    • 4.2 Wire foster parenting into the tree builder: when self.foster_parenting is true and a node would normally be appended, use foster_parent_insert() instead of append_child()
    • 4.3 Handle foster-parented text nodes: when inserting character tokens with foster parenting active, merge with adjacent text nodes at the foster parent insertion point (not at the table-interior position)
    • 4.4 Implement "in table text" mode foster parenting: when collected table-text contains non-whitespace characters, re-insert all collected characters via foster parenting
  • Task 5: Combined interaction handling (AC: #4)

    • 5.1 Test and handle: formatting elements that span across a table boundary (e.g., <b><table><td>text</td></table></b>)
    • 5.2 Test and handle: foster-parented content that contains misnested formatting (e.g., <table><b>text<i>more</b>rest</i></table>)
    • 5.3 Verify adoption agency algorithm works correctly when the formatting element or furthest block has been foster-parented
  • Task 6: Tests and documentation (AC: #5)

    • 6.1 Unit tests for adoption agency: basic misnesting (<b><i></b></i>), triple nesting (<b><i><u></b></u></i>), iteration limits (8+ nested formatting elements), <a> special case
    • 6.2 Unit tests for foster parenting: text inside <table>, elements inside <table>, elements inside <tbody>/<tr>, foster parenting with <table> inside another <table>
    • 6.3 Unit tests for combined scenarios: formatting + tables, nested tables with formatting
    • 6.4 Golden tests for documents exercising adoption agency and foster parenting
    • 6.5 Update docs/HTML5_Implementation_Checklist.md — check off adoption agency and foster parenting items
    • 6.6 Run just ci and ensure all tests pass

Dev Notes

Dependencies on Story 2.2

Story 2.3 assumes the following are already implemented by Story 2.2:

  1. InsertionMode enum with all 23 modes and process_token() dispatch
  2. active_formatting_elements: Vec<FormattingEntry> list on TreeBuilder with FormattingEntry::Element(NodeId) and FormattingEntry::Marker variants
  3. foster_parenting: bool flag on TreeBuilder, set to true in in-table mode for non-table content
  4. Scope helper: has_element_in_scope(&self, target: &str, scope: ScopeType, doc: &Document) -> bool
  5. Formatting elements pushed to active formatting list when encountered (Task 3.9)
  6. Reconstruction of active formatting elements at insertion points (Task 3.10)
  7. <a> stub: Story 2.2 Task 3.8 may have a stub for the <a> adoption agency call — replace it

If any of these are missing, implement them as part of this story.

Architecture Constraints

  • Layer 1 cratehtml depends on dom, shared, tracing only
  • Adding insert_before to dom crate is allowed — it's a same-layer dependency and a fundamental DOM operation
  • No unsafe — enforced by CI
  • Spec citations// HTML §13.2.6.4.7 for adoption agency, // HTML §13.2.6.1 for foster parenting
  • File size limits — the adoption agency algorithm is complex; consider placing it in a separate file (crates/html/src/adoption_agency.rs or within a tree_builder/ module) if tree_builder.rs is already large from Story 2.2

Adoption Agency Algorithm — Detailed Spec Steps

The algorithm per WHATWG HTML §13.2.6.4.7 (simplified):

1. Let subject = the end tag's name (e.g., "b")
2. If current node is subject AND subject NOT in active formatting: pop and return
3. OUTER LOOP (max 8 iterations):
   a. Let formatting_element = last entry in active_formatting matching subject
   b. If not found → "any other end tag" logic, return
   c. If not in open_elements → remove from active_formatting, return
   d. If not in scope → parse error, return
   e. If not current node → parse error (continue)
   f. Let furthest_block = first non-phrasing element AFTER formatting_element in open_elements
   g. If no furthest block → pop up to and including formatting_element, remove from active, return
   h. Let common_ancestor = element BEFORE formatting_element in open_elements
   i. Let bookmark = position of formatting_element in active_formatting
   j. INNER LOOP (starting from furthest_block, walking backward to formatting_element):
      - node = previous node in open_elements
      - If node is formatting_element: break
      - inner_loop_counter++; if > 3: remove node from open_elements, continue
      - If node NOT in active_formatting: remove from open_elements, continue
      - Create new_element = clone of node (same tag + attributes)
      - Replace node with new_element in active_formatting and open_elements
      - Reparent: all children of node → children of new_element
      - If last_node == furthest_block: set bookmark to after new_element in active_formatting
      - Append last_node as child of new_element
      - last_node = new_element
   k. Insert last_node at appropriate place (foster parent if needed, else as child of common_ancestor)
   l. Create new_element = clone of formatting_element (same tag + attributes)
   m. Reparent: all children of furthest_block → children of new_element
   n. Append new_element as child of furthest_block
   o. Remove formatting_element from active_formatting; insert new_element at bookmark position
   p. Remove formatting_element from open_elements; insert new_element after furthest_block in open_elements

DOM Operations Available vs Needed

Operation Available? Location Notes
append_child(parent, child) Yes document.rs Moves child if already has parent
remove_child(parent, child) Yes document.rs Severs parent/child link
insert_before(parent, new, ref) NO → Task 1.1 Must add to document.rs Critical for foster parenting and adoption agency
insert_at(parent, child, index) NO → Task 1.2 Must add to document.rs Helper for positional insertion
parent(id) Yes document.rs Get parent NodeId
children(id) Yes document.rs Get children slice
tag_name(id) Yes document.rs Get element tag name
create_element(tag) Yes document.rs Create new element
set_attribute(id, name, value) Yes document.rs Copy attributes

Key insight: append_child already handles reparenting (removes from old parent first). The missing piece is insert_before for positional insertion — both foster parenting and adoption agency need to insert at specific positions, not just append to the end.

Node Structure Reminder

pub struct Node {
    pub id: NodeId,
    pub kind: NodeKind,
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
}

Children are stored in a Vec<NodeId> — insertion at an arbitrary index is O(n) but straightforward with Vec::insert(index, value).

What NOT to Change (Scope Boundaries)

  • Do NOT add replaceChild or removeChild as public DOM APIs — that's Story 2.4 (DOM Tree Mutation APIs). The insert_before and insert_at added here are internal helpers for the parser, not public JavaScript-exposed APIs.
  • Do NOT modify insertion mode logic — that's Story 2.2. This story only hooks into the existing in-body mode end tag handler and the foster parenting flag.
  • Do NOT add DOM query APIs — that's Story 2.5.

Files to Modify

  • crates/dom/src/document.rs — add insert_before() and insert_at() methods
  • crates/html/src/tree_builder.rs (or new file adoption_agency.rs) — adoption agency algorithm and foster parenting logic
  • crates/html/src/tests/ — new test file adoption_agency_tests.rs and foster_parenting_tests.rs (or combined)
  • tests/goldens/fixtures/ + tests/goldens/expected/ — new golden tests
  • docs/HTML5_Implementation_Checklist.md — update checked items

Previous Story Intelligence

From Story 2.2:

  • Active formatting elements list and FormattingEntry enum will be available
  • Foster parenting flag will be set but not acted upon
  • Scope helpers (has_element_in_scope) will be available
  • TreeBuilder refactored to state machine with InsertionMode enum
  • <a> tag handling may have a stub for adoption agency

From Story 2.1:

  • Tokenizer refactored to state machine — same pattern (enum + match)
  • File size limits enforced — split large files

From Epic 1:

  • Code review catches real edge-case bugs
  • Golden tests essential for rendering-affecting changes
  • Always update checklists at the end

Testing Strategy

  • Unit tests for adoption agency in crates/html/src/tests/adoption_agency_tests.rs
  • Key test cases:
    • <p><b>bold<i>italic</b>rest</i></p> → b wraps "bold" and first "italic", new i wraps "rest"
    • <b><b><b><b><b><b><b><b><b>x</b></b></b></b></b></b></b></b></b> → outer loop limit (8 iterations)
    • <table><b>text</b></table> → "text" foster-parented before table
    • <table><tr><td>cell</td>text</tr></table> → "text" foster-parented
    • <table>text<b>bold</b></table> → both "text" and bold foster-parented
    • <a href="1">first<a href="2">second</a> → adoption agency closes first <a> before opening second
    • Nested tables with foster parenting
    • Formatting across table boundaries

References

Dev Agent Record

Agent Model Used

Claude Opus 4.6 (1M context)

Debug Log References

  • One test initially failed (foster_parent_element_inside_table) due to incorrect test expectations — per spec, both <b> and its text content are foster-parented separately, so <b> is empty and text is a sibling. Fixed test to match spec behavior.
  • remove_from_active_formatting_elements became unused after adoption agency handled its own list management — removed to satisfy -D warnings clippy.

Completion Notes List

  • Task 1: Added Document::insert_before() and Document::insert_at() to crates/dom/src/document.rs with 12 unit tests covering insertion at beginning/middle/end, reparenting, out-of-range clamping, empty children, and dirty flag.
  • Task 2: Implemented full WHATWG HTML §13.2.6.4.7 adoption agency algorithm in new file crates/html/src/tree_builder/adoption_agency.rs. Outer loop (8 iterations) and inner loop (3 iterations) limits enforced. Handles element cloning, reparenting, bookmark tracking, and active formatting list management. <a> special case replaces Story 2.2 stub.
  • Task 3: is_formatting_element() already existed from Story 2.2. Wired adoption agency into the in-body end tag handler for formatting elements, replacing the simplified pop-and-remove stub.
  • Task 4: Foster parenting implemented with foster_parent_insert() for elements and foster_parent_insert_character() for text. Wired into insert_element(), insert_element_no_push(), and insert_character() when foster_parenting flag is active. Text node merging at foster parent insertion point handles adjacent text consolidation.
  • Task 5: Combined interaction tests verify: formatting across table boundaries, foster-parented content with misnested formatting, whitespace vs non-whitespace table text handling.
  • Task 6: 14 unit tests in adoption_agency_tests.rs, 12 unit tests in insert_position_tests.rs, 2 golden tests (278-adoption-agency, 279-foster-parenting), HTML5 checklist updated. just ci passes.

Change Log

  • 2026-03-14: Implemented adoption agency algorithm (§13.2.6.4.7) and foster parenting (§13.2.6.1) with full test coverage.
  • 2026-03-14: Code review fixes — added missing nobr to is_formatting_element(), strengthened 6 weak test assertions to validate DOM structure, added nobr test, removed stale comment in table_modes.rs, removed misleading comment in adoption_agency.rs inner loop.

File List

  • crates/dom/src/document.rs — added insert_before() and insert_at() methods
  • crates/dom/src/tests/insert_position_tests.rs — new: 12 unit tests for positional insertion
  • crates/dom/src/tests/mod.rs — added insert_position_tests module
  • crates/html/src/tree_builder/adoption_agency.rs — new: adoption agency algorithm and foster parenting logic (~300 lines)
  • crates/html/src/tree_builder/mod.rs — registered adoption_agency module, wired foster parenting into insert_element/insert_character, removed unused remove_from_active_formatting_elements
  • crates/html/src/tree_builder/body_mode.rs — replaced simplified formatting end tag handler with adoption agency call, replaced <a> stub with adoption agency, made handle_any_other_end_tag pub(crate)
  • crates/html/src/tests/adoption_agency_tests.rs — new: 14 unit tests for adoption agency, foster parenting, and combined scenarios
  • crates/html/src/tests/mod.rs — added adoption_agency_tests module
  • tests/goldens/fixtures/278-adoption-agency.html — new golden fixture
  • tests/goldens/fixtures/279-foster-parenting.html — new golden fixture
  • tests/goldens/expected/278-adoption-agency.layout.txt — new golden expected
  • tests/goldens/expected/278-adoption-agency.dl.txt — new golden expected
  • tests/goldens/expected/279-foster-parenting.layout.txt — new golden expected
  • tests/goldens/expected/279-foster-parenting.dl.txt — new golden expected
  • tests/goldens.rs — added golden_278 and golden_279 test entries
  • docs/HTML5_Implementation_Checklist.md — checked off adoption agency and foster parenting items