263 lines
19 KiB
Markdown
263 lines
19 KiB
Markdown
# Story 2.3: Adoption Agency & Foster Parenting
|
|
|
|
Status: done
|
|
|
|
## Story
|
|
|
|
As a web user,
|
|
I want misnested formatting tags and misplaced table content to render correctly,
|
|
So that real-world pages with imperfect HTML display as expected.
|
|
|
|
## Acceptance Criteria
|
|
|
|
1. **Misnested formatting elements restructured per WHATWG HTML §13.2.6.4.7:** Input like `<b>text<i>more</b>rest</i>` produces the correct DOM tree where formatting is properly nested — `<b>` wraps "text" and the first "more", a new `<i>` wraps "rest" after `</b>`.
|
|
|
|
2. **Adoption agency iteration and nesting limits respected:** The algorithm enforces the spec's outer loop limit (8 iterations) and inner loop limit (3 iterations) to prevent infinite loops on deeply nested or pathological formatting markup.
|
|
|
|
3. **Foster parenting works for non-table content inside tables:** Non-table content directly inside `<table>`, `<tbody>`, `<tr>` elements is reparented to the correct location before the table per WHATWG HTML §13.2.6.1.
|
|
|
|
4. **Adoption agency and foster parenting interact correctly:** Documents combining both mis-nested formatting elements and misplaced table content produce the spec-defined DOM tree.
|
|
|
|
5. **WPT tree-construction tests pass** for adoption agency and foster parenting cases. `docs/HTML5_Implementation_Checklist.md` updated. `just ci` passes.
|
|
|
|
## Tasks / Subtasks
|
|
|
|
- [x] Task 1: Add positional DOM insertion (AC: #1, #3)
|
|
- [x] 1.1 Add `Document::insert_before(parent: NodeId, new_child: NodeId, ref_child: NodeId)` method in `crates/dom/src/document.rs` — inserts `new_child` before `ref_child` in `parent.children`; if `new_child` already has a parent, remove it first
|
|
- [x] 1.2 Add `Document::insert_at(parent: NodeId, child: NodeId, index: usize)` helper — inserts child at a specific index in parent's children vec
|
|
- [x] 1.3 Add unit tests for both methods: insert at beginning, middle, end; move from one parent to another; edge cases (empty children, out-of-range index clamps to end)
|
|
|
|
- [x] Task 2: Implement adoption agency algorithm (AC: #1, #2)
|
|
- [x] 2.1 Add method `TreeBuilder::run_adoption_agency(&mut self, tag: &str, doc: &mut Document)` — the main algorithm entry point, called from in-body mode when an end tag for a formatting element is received
|
|
- [x] 2.2 Implement step 1-3: If the subject (end tag name) is the current node's tag and it's NOT in the active formatting elements list, just pop it and return
|
|
- [x] 2.3 Implement outer loop (max 8 iterations per spec):
|
|
- Find the **formatting element** in active_formatting_elements (last entry matching tag name)
|
|
- If not found, fall through to "any other end tag" behavior
|
|
- If found but NOT in open_elements, remove from active list and return
|
|
- If found but NOT in scope, parse error and return
|
|
- If found and NOT the current node, parse error (continue anyway)
|
|
- [x] 2.4 Implement finding the **furthest block**: scan open_elements forward from the formatting element — first element that is NOT a formatting/phrasing element is the furthest block
|
|
- [x] 2.5 If no furthest block: pop open_elements up to and including formatting element, remove from active formatting list, return
|
|
- [x] 2.6 Implement the inner loop (max 3 iterations per spec):
|
|
- Let `node` = element above furthest block in open_elements
|
|
- Walk up open_elements from formatting element
|
|
- If `node` is in active formatting list, it becomes the new **common ancestor**
|
|
- Create a **new element** cloning the formatting element's tag and attributes
|
|
- Reparent: move furthest block's children into the new element
|
|
- Append new element as child of furthest block
|
|
- Replace formatting element in active formatting list with new element
|
|
- Move formatting element in open_elements
|
|
- [x] 2.7 Final step: insert the new element at the appropriate position — as a child of the common ancestor, in the position after the furthest block
|
|
- [x] 2.8 Handle the `<a>` tag special case: when opening a new `<a>`, if an `<a>` is already in active formatting elements, run adoption agency to close it first (Story 2.2 may have stubbed this — replace the stub)
|
|
|
|
- [x] Task 3: Implement formatting element classification (AC: #1)
|
|
- [x] 3.1 Create a function or const set `is_formatting_element(tag: &str) -> bool` for: `a`, `b`, `big`, `code`, `em`, `font`, `i`, `nobr`, `s`, `small`, `strike`, `strong`, `tt`, `u`
|
|
- [x] 3.2 Ensure adoption agency is called ONLY for these elements' end tags in the in-body mode handler
|
|
- [x] 3.3 For all other end tags in-body, use the generic "any other end tag" logic (walk open_elements backward, match and pop)
|
|
|
|
- [x] Task 4: Implement foster parenting (AC: #3)
|
|
- [x] 4.1 Implement `TreeBuilder::foster_parent_insert(&mut self, node: NodeId, doc: &mut Document)` — the foster parent insertion point algorithm per §13.2.6.1:
|
|
- Search open_elements backward for a `<table>` element
|
|
- If `<table>` has a parent in the DOM → insert `node` before `<table>` in its parent's children (using `Document::insert_before`)
|
|
- If `<table>` has no parent (or is the first element) → append to the element one above `<table>` in open_elements
|
|
- If no `<table>` found → append to the first element in open_elements (html)
|
|
- [x] 4.2 Wire foster parenting into the tree builder: when `self.foster_parenting` is true and a node would normally be appended, use `foster_parent_insert()` instead of `append_child()`
|
|
- [x] 4.3 Handle foster-parented text nodes: when inserting character tokens with foster parenting active, merge with adjacent text nodes at the foster parent insertion point (not at the table-interior position)
|
|
- [x] 4.4 Implement "in table text" mode foster parenting: when collected table-text contains non-whitespace characters, re-insert all collected characters via foster parenting
|
|
|
|
- [x] Task 5: Combined interaction handling (AC: #4)
|
|
- [x] 5.1 Test and handle: formatting elements that span across a table boundary (e.g., `<b><table><td>text</td></table></b>`)
|
|
- [x] 5.2 Test and handle: foster-parented content that contains misnested formatting (e.g., `<table><b>text<i>more</b>rest</i></table>`)
|
|
- [x] 5.3 Verify adoption agency algorithm works correctly when the formatting element or furthest block has been foster-parented
|
|
|
|
- [x] Task 6: Tests and documentation (AC: #5)
|
|
- [x] 6.1 Unit tests for adoption agency: basic misnesting (`<b><i></b></i>`), triple nesting (`<b><i><u></b></u></i>`), iteration limits (8+ nested formatting elements), `<a>` special case
|
|
- [x] 6.2 Unit tests for foster parenting: text inside `<table>`, elements inside `<table>`, elements inside `<tbody>`/`<tr>`, foster parenting with `<table>` inside another `<table>`
|
|
- [x] 6.3 Unit tests for combined scenarios: formatting + tables, nested tables with formatting
|
|
- [x] 6.4 Golden tests for documents exercising adoption agency and foster parenting
|
|
- [x] 6.5 Update `docs/HTML5_Implementation_Checklist.md` — check off adoption agency and foster parenting items
|
|
- [x] 6.6 Run `just ci` and ensure all tests pass
|
|
|
|
## Dev Notes
|
|
|
|
### Dependencies on Story 2.2
|
|
|
|
Story 2.3 assumes the following are already implemented by Story 2.2:
|
|
|
|
1. **`InsertionMode` enum** with all 23 modes and `process_token()` dispatch
|
|
2. **`active_formatting_elements: Vec<FormattingEntry>`** list on `TreeBuilder` with `FormattingEntry::Element(NodeId)` and `FormattingEntry::Marker` variants
|
|
3. **`foster_parenting: bool`** flag on `TreeBuilder`, set to `true` in in-table mode for non-table content
|
|
4. **Scope helper**: `has_element_in_scope(&self, target: &str, scope: ScopeType, doc: &Document) -> bool`
|
|
5. **Formatting elements pushed** to active formatting list when encountered (Task 3.9)
|
|
6. **Reconstruction of active formatting elements** at insertion points (Task 3.10)
|
|
7. **`<a>` stub**: Story 2.2 Task 3.8 may have a stub for the `<a>` adoption agency call — replace it
|
|
|
|
If any of these are missing, implement them as part of this story.
|
|
|
|
### Architecture Constraints
|
|
|
|
- **Layer 1 crate** — `html` depends on `dom`, `shared`, `tracing` only
|
|
- **Adding `insert_before` to `dom` crate is allowed** — it's a same-layer dependency and a fundamental DOM operation
|
|
- **No unsafe** — enforced by CI
|
|
- **Spec citations** — `// HTML §13.2.6.4.7` for adoption agency, `// HTML §13.2.6.1` for foster parenting
|
|
- **File size limits** — the adoption agency algorithm is complex; consider placing it in a separate file (`crates/html/src/adoption_agency.rs` or within a `tree_builder/` module) if tree_builder.rs is already large from Story 2.2
|
|
|
|
### Adoption Agency Algorithm — Detailed Spec Steps
|
|
|
|
The algorithm per WHATWG HTML §13.2.6.4.7 (simplified):
|
|
|
|
```
|
|
1. Let subject = the end tag's name (e.g., "b")
|
|
2. If current node is subject AND subject NOT in active formatting: pop and return
|
|
3. OUTER LOOP (max 8 iterations):
|
|
a. Let formatting_element = last entry in active_formatting matching subject
|
|
b. If not found → "any other end tag" logic, return
|
|
c. If not in open_elements → remove from active_formatting, return
|
|
d. If not in scope → parse error, return
|
|
e. If not current node → parse error (continue)
|
|
f. Let furthest_block = first non-phrasing element AFTER formatting_element in open_elements
|
|
g. If no furthest block → pop up to and including formatting_element, remove from active, return
|
|
h. Let common_ancestor = element BEFORE formatting_element in open_elements
|
|
i. Let bookmark = position of formatting_element in active_formatting
|
|
j. INNER LOOP (starting from furthest_block, walking backward to formatting_element):
|
|
- node = previous node in open_elements
|
|
- If node is formatting_element: break
|
|
- inner_loop_counter++; if > 3: remove node from open_elements, continue
|
|
- If node NOT in active_formatting: remove from open_elements, continue
|
|
- Create new_element = clone of node (same tag + attributes)
|
|
- Replace node with new_element in active_formatting and open_elements
|
|
- Reparent: all children of node → children of new_element
|
|
- If last_node == furthest_block: set bookmark to after new_element in active_formatting
|
|
- Append last_node as child of new_element
|
|
- last_node = new_element
|
|
k. Insert last_node at appropriate place (foster parent if needed, else as child of common_ancestor)
|
|
l. Create new_element = clone of formatting_element (same tag + attributes)
|
|
m. Reparent: all children of furthest_block → children of new_element
|
|
n. Append new_element as child of furthest_block
|
|
o. Remove formatting_element from active_formatting; insert new_element at bookmark position
|
|
p. Remove formatting_element from open_elements; insert new_element after furthest_block in open_elements
|
|
```
|
|
|
|
### DOM Operations Available vs Needed
|
|
|
|
| Operation | Available? | Location | Notes |
|
|
|---|---|---|---|
|
|
| `append_child(parent, child)` | Yes | `document.rs` | Moves child if already has parent |
|
|
| `remove_child(parent, child)` | Yes | `document.rs` | Severs parent/child link |
|
|
| `insert_before(parent, new, ref)` | **NO → Task 1.1** | Must add to `document.rs` | Critical for foster parenting and adoption agency |
|
|
| `insert_at(parent, child, index)` | **NO → Task 1.2** | Must add to `document.rs` | Helper for positional insertion |
|
|
| `parent(id)` | Yes | `document.rs` | Get parent NodeId |
|
|
| `children(id)` | Yes | `document.rs` | Get children slice |
|
|
| `tag_name(id)` | Yes | `document.rs` | Get element tag name |
|
|
| `create_element(tag)` | Yes | `document.rs` | Create new element |
|
|
| `set_attribute(id, name, value)` | Yes | `document.rs` | Copy attributes |
|
|
|
|
**Key insight:** `append_child` already handles reparenting (removes from old parent first). The missing piece is `insert_before` for positional insertion — both foster parenting and adoption agency need to insert at specific positions, not just append to the end.
|
|
|
|
### Node Structure Reminder
|
|
|
|
```rust
|
|
pub struct Node {
|
|
pub id: NodeId,
|
|
pub kind: NodeKind,
|
|
pub parent: Option<NodeId>,
|
|
pub children: Vec<NodeId>,
|
|
}
|
|
```
|
|
|
|
Children are stored in a `Vec<NodeId>` — insertion at an arbitrary index is `O(n)` but straightforward with `Vec::insert(index, value)`.
|
|
|
|
### What NOT to Change (Scope Boundaries)
|
|
|
|
- **Do NOT add `replaceChild` or `removeChild` as public DOM APIs** — that's Story 2.4 (DOM Tree Mutation APIs). The `insert_before` and `insert_at` added here are internal helpers for the parser, not public JavaScript-exposed APIs.
|
|
- **Do NOT modify insertion mode logic** — that's Story 2.2. This story only hooks into the existing in-body mode end tag handler and the foster parenting flag.
|
|
- **Do NOT add DOM query APIs** — that's Story 2.5.
|
|
|
|
### Files to Modify
|
|
|
|
- `crates/dom/src/document.rs` — add `insert_before()` and `insert_at()` methods
|
|
- `crates/html/src/tree_builder.rs` (or new file `adoption_agency.rs`) — adoption agency algorithm and foster parenting logic
|
|
- `crates/html/src/tests/` — new test file `adoption_agency_tests.rs` and `foster_parenting_tests.rs` (or combined)
|
|
- `tests/goldens/fixtures/` + `tests/goldens/expected/` — new golden tests
|
|
- `docs/HTML5_Implementation_Checklist.md` — update checked items
|
|
|
|
### Previous Story Intelligence
|
|
|
|
**From Story 2.2:**
|
|
- Active formatting elements list and FormattingEntry enum will be available
|
|
- Foster parenting flag will be set but not acted upon
|
|
- Scope helpers (has_element_in_scope) will be available
|
|
- TreeBuilder refactored to state machine with InsertionMode enum
|
|
- `<a>` tag handling may have a stub for adoption agency
|
|
|
|
**From Story 2.1:**
|
|
- Tokenizer refactored to state machine — same pattern (enum + match)
|
|
- File size limits enforced — split large files
|
|
|
|
**From Epic 1:**
|
|
- Code review catches real edge-case bugs
|
|
- Golden tests essential for rendering-affecting changes
|
|
- Always update checklists at the end
|
|
|
|
### Testing Strategy
|
|
|
|
- **Unit tests** for adoption agency in `crates/html/src/tests/adoption_agency_tests.rs`
|
|
- **Key test cases:**
|
|
- `<p><b>bold<i>italic</b>rest</i></p>` → b wraps "bold" and first "italic", new i wraps "rest"
|
|
- `<b><b><b><b><b><b><b><b><b>x</b></b></b></b></b></b></b></b></b>` → outer loop limit (8 iterations)
|
|
- `<table><b>text</b></table>` → "text" foster-parented before table
|
|
- `<table><tr><td>cell</td>text</tr></table>` → "text" foster-parented
|
|
- `<table>text<b>bold</b></table>` → both "text" and bold foster-parented
|
|
- `<a href="1">first<a href="2">second</a>` → adoption agency closes first `<a>` before opening second
|
|
- Nested tables with foster parenting
|
|
- Formatting across table boundaries
|
|
|
|
### References
|
|
|
|
- [WHATWG HTML Living Standard §13.2.6.4.7 — Adoption Agency Algorithm](https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm)
|
|
- [WHATWG HTML Living Standard §13.2.6.1 — Foster Parenting](https://html.spec.whatwg.org/multipage/parsing.html#foster-parent)
|
|
- [Source: crates/html/src/tree_builder.rs] — tree builder (will be refactored by Story 2.2)
|
|
- [Source: crates/dom/src/document.rs] — DOM operations, ~639 lines
|
|
- [Source: crates/dom/src/node.rs] — Node structure, ~145 lines
|
|
- [Source: _bmad-output/implementation-artifacts/2-2-tree-builder-insertion-modes.md] — Story 2.2 setup
|
|
- [Source: docs/HTML5_Implementation_Checklist.md] — checklist to update
|
|
|
|
## Dev Agent Record
|
|
|
|
### Agent Model Used
|
|
Claude Opus 4.6 (1M context)
|
|
|
|
### Debug Log References
|
|
- One test initially failed (`foster_parent_element_inside_table`) due to incorrect test expectations — per spec, both `<b>` and its text content are foster-parented separately, so `<b>` is empty and text is a sibling. Fixed test to match spec behavior.
|
|
- `remove_from_active_formatting_elements` became unused after adoption agency handled its own list management — removed to satisfy `-D warnings` clippy.
|
|
|
|
### Completion Notes List
|
|
- **Task 1**: Added `Document::insert_before()` and `Document::insert_at()` to `crates/dom/src/document.rs` with 12 unit tests covering insertion at beginning/middle/end, reparenting, out-of-range clamping, empty children, and dirty flag.
|
|
- **Task 2**: Implemented full WHATWG HTML §13.2.6.4.7 adoption agency algorithm in new file `crates/html/src/tree_builder/adoption_agency.rs`. Outer loop (8 iterations) and inner loop (3 iterations) limits enforced. Handles element cloning, reparenting, bookmark tracking, and active formatting list management. `<a>` special case replaces Story 2.2 stub.
|
|
- **Task 3**: `is_formatting_element()` already existed from Story 2.2. Wired adoption agency into the in-body end tag handler for formatting elements, replacing the simplified pop-and-remove stub.
|
|
- **Task 4**: Foster parenting implemented with `foster_parent_insert()` for elements and `foster_parent_insert_character()` for text. Wired into `insert_element()`, `insert_element_no_push()`, and `insert_character()` when `foster_parenting` flag is active. Text node merging at foster parent insertion point handles adjacent text consolidation.
|
|
- **Task 5**: Combined interaction tests verify: formatting across table boundaries, foster-parented content with misnested formatting, whitespace vs non-whitespace table text handling.
|
|
- **Task 6**: 14 unit tests in `adoption_agency_tests.rs`, 12 unit tests in `insert_position_tests.rs`, 2 golden tests (278-adoption-agency, 279-foster-parenting), HTML5 checklist updated. `just ci` passes.
|
|
|
|
### Change Log
|
|
- 2026-03-14: Implemented adoption agency algorithm (§13.2.6.4.7) and foster parenting (§13.2.6.1) with full test coverage.
|
|
- 2026-03-14: Code review fixes — added missing `nobr` to `is_formatting_element()`, strengthened 6 weak test assertions to validate DOM structure, added `nobr` test, removed stale comment in table_modes.rs, removed misleading comment in adoption_agency.rs inner loop.
|
|
|
|
### File List
|
|
- `crates/dom/src/document.rs` — added `insert_before()` and `insert_at()` methods
|
|
- `crates/dom/src/tests/insert_position_tests.rs` — new: 12 unit tests for positional insertion
|
|
- `crates/dom/src/tests/mod.rs` — added `insert_position_tests` module
|
|
- `crates/html/src/tree_builder/adoption_agency.rs` — new: adoption agency algorithm and foster parenting logic (~300 lines)
|
|
- `crates/html/src/tree_builder/mod.rs` — registered adoption_agency module, wired foster parenting into insert_element/insert_character, removed unused `remove_from_active_formatting_elements`
|
|
- `crates/html/src/tree_builder/body_mode.rs` — replaced simplified formatting end tag handler with adoption agency call, replaced `<a>` stub with adoption agency, made `handle_any_other_end_tag` pub(crate)
|
|
- `crates/html/src/tests/adoption_agency_tests.rs` — new: 14 unit tests for adoption agency, foster parenting, and combined scenarios
|
|
- `crates/html/src/tests/mod.rs` — added `adoption_agency_tests` module
|
|
- `tests/goldens/fixtures/278-adoption-agency.html` — new golden fixture
|
|
- `tests/goldens/fixtures/279-foster-parenting.html` — new golden fixture
|
|
- `tests/goldens/expected/278-adoption-agency.layout.txt` — new golden expected
|
|
- `tests/goldens/expected/278-adoption-agency.dl.txt` — new golden expected
|
|
- `tests/goldens/expected/279-foster-parenting.layout.txt` — new golden expected
|
|
- `tests/goldens/expected/279-foster-parenting.dl.txt` — new golden expected
|
|
- `tests/goldens.rs` — added golden_278 and golden_279 test entries
|
|
- `docs/HTML5_Implementation_Checklist.md` — checked off adoption agency and foster parenting items
|