rust_browser/docs/wpt_known_fail_analysis.md
Zachary D. Rowitsch 87c329a959 Update WPT known-fail analysis: 1,287 pass / 1,626 known_fail
Re-evaluate the full WPT test suite breakdown with current numbers
(+128 tests promoted since Feb 14), add detailed per-category feature
gap analysis with sub-feature counts, cross-cutting themes, and
prioritized recommendations for highest-impact fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 23:16:11 -05:00


# WPT Known-Fail Test Analysis
Last updated: 2026-02-25
## Current State
- **1,287 pass**, **1,626 known_fail**, **1 skip** (2,914 total)
- All known_fail tests are reftests that fail pixel comparison: they depend on CSS features, layout modes, or edge cases the engine doesn't yet handle
- Test suite runs in ~12 seconds with parallel execution
## Completed Fixes
### Fix 1: Node ID Normalization in Reftest Comparison (Done)
In reftests, the test HTML and reference HTML have different `<head>` content (different numbers of `<link>`, `<meta>`, `<title>` elements), causing DOM node IDs to differ. The layout trees are structurally identical in content/dimensions, but `node=#23` vs `node=#15` caused string comparison to fail.
**Fix:** `tests/wpt_harness/runner.rs`: `normalize_node_ids()` replaces `node=#N` with sequential IDs based on order of appearance before comparing layout and display list dumps.
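
The normalization described above can be sketched as follows. This is a minimal illustration rather than the code in `runner.rs`: the signature, and the choice to map repeated occurrences of the same original ID to the same normalized ID, are assumptions.

```rust
use std::collections::HashMap;

/// Rewrites every `node=#N` token so IDs are numbered 1, 2, 3, ... in order
/// of first appearance. Repeated occurrences of one original ID share the
/// same normalized ID (an assumption of this sketch).
fn normalize_node_ids(dump: &str) -> String {
    const MARKER: &str = "node=#";
    let mut out = String::with_capacity(dump.len());
    let mut mapping: HashMap<&str, usize> = HashMap::new();
    let mut rest = dump;

    while let Some(pos) = rest.find(MARKER) {
        let after = &rest[pos + MARKER.len()..];
        // Length of the run of ASCII digits that follows the marker.
        let digits_len = after
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(after.len());
        // Copy everything up to and including the marker unchanged.
        out.push_str(&rest[..pos + MARKER.len()]);
        if digits_len > 0 {
            let original = &after[..digits_len];
            let next = mapping.len() + 1;
            let id = *mapping.entry(original).or_insert(next);
            out.push_str(&id.to_string());
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}
```

With this in place, two dumps that differ only in their absolute node numbering normalize to the same string, so a plain string comparison succeeds.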
### Fix 2: Strip CDATA Markers from Style Text (Done)
Many WPT reference HTML files (originally XHTML) wrap CSS in `<![CDATA[...]]>` inside `<style>` tags. The HTML parser extracts raw text including CDATA markers, and the CSS parser received `<![CDATA[ div { color: red; } ]]>` and failed to parse any rules.
**Fix:** `crates/style/src/context.rs` — strip `<![CDATA[` and `]]>` from CSS text in `extract_stylesheets()` before parsing.
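
A minimal sketch of the stripping step, assuming it is a plain substring removal. The helper name `strip_cdata_markers` is hypothetical; the real logic lives inside `extract_stylesheets()`.

```rust
/// Removes XHTML-style CDATA wrappers from the raw text of a `<style>`
/// element so the CSS parser only sees the rules themselves.
/// (Hypothetical helper name; not taken from the codebase.)
fn strip_cdata_markers(css: &str) -> String {
    css.replace("<![CDATA[", "").replace("]]>", "")
}
```

For the example above, `strip_cdata_markers("<![CDATA[ div { color: red; } ]]>")` leaves ` div { color: red; } `, which parses as an ordinary rule.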
### Fix 3: CSS Unit Conversion — cm, mm, in, pt, pc (Done)
The CSS parser only recognized `px`, `em`, `rem`, `%`. Other units (`cm`, `mm`, `in`, `pt`, `pc`) fell through to a default of `Px`, treating `2.54cm` as `2.54px` instead of `96px`.
**Fix:**
1. `crates/css/src/types.rs` — Added `Cm`, `Mm`, `In`, `Pt`, `Pc` variants with `to_px()` conversions
2. `crates/css/src/parser.rs` — Added unit string matching in dimension parsing and grid template parsing
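
The conversions follow from the CSS definition of absolute units (1in = 96px = 2.54cm = 72pt = 6pc). The sketch below illustrates them. The enum shape, the `parse` helper, and the `to_px(value)` signature are assumptions for illustration and may not match `crates/css/src/types.rs`.

```rust
/// Absolute CSS length units (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LengthUnit {
    Px,
    Cm,
    Mm,
    In,
    Pt,
    Pc,
}

impl LengthUnit {
    /// Parses a unit suffix. Unknown units return `None` instead of
    /// silently falling back to `Px`, which was the original bug.
    fn parse(suffix: &str) -> Option<Self> {
        match suffix.to_ascii_lowercase().as_str() {
            "px" => Some(Self::Px),
            "cm" => Some(Self::Cm),
            "mm" => Some(Self::Mm),
            "in" => Some(Self::In),
            "pt" => Some(Self::Pt),
            "pc" => Some(Self::Pc),
            _ => None,
        }
    }

    /// Converts `value` expressed in this unit to CSS pixels.
    fn to_px(self, value: f32) -> f32 {
        match self {
            Self::Px => value,
            Self::In => value * 96.0,
            Self::Cm => value * 96.0 / 2.54,
            Self::Mm => value * 96.0 / 25.4,
            Self::Pt => value * 96.0 / 72.0,
            Self::Pc => value * 16.0,
        }
    }
}
```

Under these definitions `2.54cm` resolves to approximately `96px`, matching the expectation stated above.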
### Results
Fixes 1-3 combined promoted **120 tests** from known_fail to pass (36 → 156 total).
The original estimates (~485 cumulative) were too optimistic — most tests affected by CDATA or unit issues also have other layout differences that prevent passing. The fixes were necessary prerequisites but not sufficient alone for those tests.
### Fix 4: `display: contents` + Implicit `<head>` Insertion (Done)
Two changes combined:
1. **`display: contents`** — Added `Display::Contents` variant. Elements with this value generate no box; their children are promoted into the parent's layout. Implemented via `build_children_into()` in `crates/layout/src/engine/mod.rs`.
2. **Implicit `<head>` insertion** — The HTML parser now creates an implicit `<head>` when encountering head-only elements (`title`, `style`, `link`, `meta`, `script`) without an existing `<head>`. Previously, `<title>` text was rendered visibly in the body for documents lacking explicit `<head>` tags, causing reftest mismatches since test and reference files have different titles. This was the larger win.
**Fix:**
1. `crates/style/src/types.rs` — Added `Display::Contents` variant and keyword parsing
2. `crates/layout/src/engine/mod.rs`: `build_children_into()` flattens `display: contents` children
3. `crates/html/src/lib.rs` — Implicit `<head>` creation and proper head-closing on non-head elements
**Results:** 53 tests promoted, 5 false-pass tests demoted (were only passing because both sides had content hidden in `<head>`). Net: **+48 tests** (156 → 204 total).
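
The `display: contents` flattening can be illustrated with a toy tree. The function name matches the one mentioned above, but the `Node` and `LayoutBox` types and the signature are hypothetical simplifications, not the engine's actual data structures.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Display {
    Block,
    Contents,
}

/// Simplified styled DOM node (hypothetical type).
struct Node {
    name: &'static str,
    display: Display,
    children: Vec<Node>,
}

/// Simplified layout box (hypothetical type).
#[derive(Debug, PartialEq)]
struct LayoutBox {
    name: &'static str,
    children: Vec<LayoutBox>,
}

/// Appends boxes for `node`'s children to `out`. A child with
/// `display: contents` generates no box of its own; its children are
/// built directly into the same `out` list, promoting them one level up.
fn build_children_into(node: &Node, out: &mut Vec<LayoutBox>) {
    for child in &node.children {
        if child.display == Display::Contents {
            build_children_into(child, out);
        } else {
            let mut layout_box = LayoutBox { name: child.name, children: Vec::new() };
            build_children_into(child, &mut layout_box.children);
            out.push(layout_box);
        }
    }
}
```

Because the recursion writes into the caller's list, nested `display: contents` wrappers flatten without special handling.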
### Fix 5: Pixel-Based Reftest Comparison + Parallel Test Execution (Done)
Most WPT reftests reach the same visual result in the test and the reference through different CSS techniques (borders vs backgrounds, etc.), so a textual comparison of layout trees cannot match them. The fix adds pixel-based comparison as a fallback.
**Changes:**
1. `tests/wpt_harness/runner.rs` — Added `rasterize_html()` and `compare_pixels()` functions. Reftests now try layout-tree comparison first (fast path), then fall back to pixel comparison by rasterizing both test and reference HTML to 800x600 pixel buffers and comparing per-pixel with a channel tolerance of 2.
2. `tests/wpt_harness.rs` — Parallelized test execution using `std::thread::scope`, with progress reporting every 100 tests. Skips artifact writing for known_fail tests to reduce I/O.
3. `crates/style/src/context.rs` — Added CDATA stripping to `extract_stylesheet_sources()` (was already in `extract_stylesheets()` but missing from the Pipeline code path).
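
A minimal sketch of the per-channel tolerance check, assuming both rasterized buffers are flat byte arrays with the same layout (for example RGBA8 at 800x600). The signature is an assumption and may differ from the `compare_pixels()` in `runner.rs`.

```rust
/// Returns true when the two buffers have the same length and every
/// channel of every pixel differs by at most `tolerance`.
fn compare_pixels(test: &[u8], reference: &[u8], tolerance: u8) -> bool {
    test.len() == reference.len()
        && test
            .iter()
            .zip(reference)
            .all(|(&a, &b)| a.abs_diff(b) <= tolerance)
}
```

A small tolerance such as 2 absorbs rounding differences between equivalent rendering paths while still rejecting real layout or color mismatches.
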
**Results:** 955 tests promoted from known_fail to pass (204 → 1,159 total). Test suite runs in ~11 seconds with parallel execution.
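
The parallel execution in change 2 can be sketched with `std::thread::scope`. The helper below is hypothetical: the name `run_parallel`, the chunking strategy, and the absence of progress reporting are all simplifications, not a description of `tests/wpt_harness.rs`.

```rust
use std::thread;

/// Runs `run` over `items` on up to `workers` scoped threads and returns
/// the results in the original order. (Hypothetical helper.)
fn run_parallel<T, R, F>(items: &[T], workers: usize, run: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    // Ceiling division, clamped so `chunks` never receives a size of 0.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let run = &run;

    thread::scope(|scope| {
        // Spawn every worker first so they all run concurrently.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(run).collect::<Vec<R>>()))
            .collect();
        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Scoped threads let the workers borrow the test list directly, so no `Arc` or cloning is needed.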
### Fix 6: Incremental Engine Improvements (Done — Feb 14 → Feb 25)
Multiple engine improvements collectively promoted **128 tests** (1,159 → 1,287). Key changes:
- **Canvas background propagation** (CSS 2.1 §14.2) — body background paints at viewport level
- **Border shorthand fix** — omitted sub-properties now properly reset
- **Extended border styles** — `inset`, `outset`, `groove`, `ridge`, `double`, `hidden`
- **Table cell sizing** — height as minimum (CSS 2.1 §17.5.3), column width overflow fix
- **Linear gradients** — `linear-gradient()` through the full rendering pipeline
- **Background-position/repeat** — CSS sprite support
- **Float sizing fixes** — intrinsic aspect ratio, shrink-to-fit width for block children
- **Block-in-inline splitting** (CSS 2.1 §9.2.1.1) — anonymous block generation
- **Flex-column fix** — `min-height` with `flex-grow` distribution
- **CSS cascade ordering** — per CSS Cascading Level 4
## Remaining Known-Fail Analysis (1,626 tests)
### Breakdown by CSS Specification Area
| Category | Count | Key Missing Features |
|----------|------:|------|
| **css-flexbox** | 569 | Shrink/grow algorithm, alignment, wrap, gap, writing-modes, intrinsic sizing |
| **css-text** | 268 | `break-spaces`, `keep-all`, `hyphens`, text-transform tailoring, `tab-size` |
| **css2-margin-padding** | 154 | Full margin collapsing, margin/padding on table-internal elements |
| **css-tables** | 91 | `border-collapse` painting, height distribution, `visibility: collapse` |
| **css2-positioning** | 89 | `top`/`left` on table-internal elements, abspos overflow, relpos edge cases |
| **css-backgrounds** | 75 | `background-clip`, `border-radius`, `border-image`, border keyword widths |
| **css2-normal-flow** | 70 | Inline-table, inline-block baseline, `min-width`/`max-width` edge cases |
| **css2-floats** | 67 | BFC-float exclusion, clearance computation, float+margin collapsing |
| **css-display** | 59 | `display: run-in` (41), `display: contents` edge cases (15), `flow-root` (3) |
| **css-position** | 58 | Relative pos on table elements (27), abspos static position in flex (14) |
| **css-box** | 38 | `margin-trim` (all 38 — unimplemented CSS4 property) |
| **css2-box-display** | 24 | Block-in-inline edge cases, containing block determination |
| **css-inline** | 6 | Phantom line boxes |
| **pseudo-elements** | 1 | `::before` edge case (feature is implemented) |
### Detailed Feature Gap Analysis
#### 1. Flexbox (569 tests) — Largest Gap
The engine has a functional flexbox implementation (~1,764 lines in `crates/layout/src/engine/flex.rs`) with `gap`, `justify-content`, `align-items`, `align-content`, `flex-wrap`, and `flex-direction` support. The failing tests exercise:
| Sub-feature | ~Count | What's Missing |
|---|---:|---|
| Flex sizing algorithm edge cases | 111 | `flex: initial`/`none`/`auto` shorthand resolution, shrink below min-content |
| Alignment (`align-items`, `align-self`, `align-content`) | 60 | `baseline` alignment, `stretch` with cross-axis constraints |
| `justify-content` edge cases | 30 | `space-evenly`, interaction with `margin: auto` |
| Writing modes (`writing-mode`, `direction: rtl`) | 13 | Flex axis mapping with vertical/RTL writing modes |
| Gap with writing modes | 33 | Gap in non-default writing modes |
| Intrinsic sizing | 21 | `min-content`/`max-content` width of flex containers |
| Table as flex item | 19 | Tables inside flex containers |
| Baseline alignment | 12 | First/last baseline computation for flex items |
| Percentage height resolution | 10 | Definite-size propagation through flex items |
| `aspect-ratio` interaction | 5 | Aspect ratio with flex sizing |
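
For the shorthand resolution in the first row, CSS Flexbox Level 1 defines `flex: initial` as `0 1 auto`, `flex: auto` as `1 1 auto`, `flex: none` as `0 0 auto`, and a single non-negative number `N` as `N 1 0`. The sketch below encodes just those forms. The `Flex` and `FlexBasis` types and the function name are hypothetical, and multi-value forms are out of scope.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FlexBasis {
    Auto,
    Px(f32),
}

/// Resolved longhands of the `flex` shorthand (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flex {
    grow: f32,
    shrink: f32,
    basis: FlexBasis,
}

impl Flex {
    const fn new(grow: f32, shrink: f32, basis: FlexBasis) -> Self {
        Flex { grow, shrink, basis }
    }
}

/// Resolves the single-value forms of `flex`; returns `None` for anything
/// this sketch does not cover.
fn resolve_flex_shorthand(value: &str) -> Option<Flex> {
    match value.trim() {
        "initial" => Some(Flex::new(0.0, 1.0, FlexBasis::Auto)),
        "auto" => Some(Flex::new(1.0, 1.0, FlexBasis::Auto)),
        "none" => Some(Flex::new(0.0, 0.0, FlexBasis::Auto)),
        other => other
            .parse::<f32>()
            .ok()
            // Reject NaN, infinities, and negative numbers.
            .filter(|n| n.is_finite() && *n >= 0.0)
            .map(|n| Flex::new(n, 1.0, FlexBasis::Px(0.0))),
    }
}
```

The key difference between `none` and `initial` is the shrink factor: an item with `flex: none` must not shrink even when the container overflows.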
#### 2. CSS Text (268 tests)
The engine has basic text rendering with `letter-spacing` support. Missing:
| Sub-feature | ~Count | Notes |
|---|---:|---|
| `white-space: break-spaces` | 79 | Preserved spaces that wrap |
| `hyphens: auto/manual` | 40 | Language-dependent auto-hyphenation |
| `word-break: keep-all` | 25 | CJK-aware word breaking |
| `word-spacing` | 21 | Word spacing with bidi/writing-modes |
| `text-transform` tailoring | 18 | Language-sensitive capitalization (Dutch IJ, Turkish i) |
| Line breaking (CJK) | 14 | `line-break: strict/loose/anywhere` |
| `letter-spacing` bidi | 11 | Letter-spacing after bidi reordering |
| `text-align-last` | 11 | Last-line alignment, `match-parent`, `justify` |
| `overflow-wrap: anywhere` | 10 | Wrapping anywhere vs break-word |
| `text-autospace` | 8 | CJK↔Latin auto-spacing |
| `tab-size` | 7 | Tab character width |
| `hanging-punctuation` | 3 | Punctuation outside content box |
#### 3. Margin Collapsing & Table-Internal Margins (154 tests)
The engine has basic margin collapsing (`collapse_margins()` in `block.rs`). Missing:
- **Parent-child through-flow** collapsing (margins pass through empty blocks)
- **Negative margin** collapsing rules (most negative + most positive)
- **`min-height` interaction** — doesn't prevent bottom margin adjacency
- **Clearance interaction** — clear changes which margins are adjoining
- **"Does not apply" rules** — margins/padding on `table-row-group`, `table-row`, `table-column`, etc. should be ignored (~48 tests)
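
The negative-margin rule from CSS 2.1 §8.3.1 is small enough to show directly: the collapsed margin is the largest positive adjoining margin plus the most negative one. The function below is a standalone sketch with a hypothetical name and signature, not the existing `collapse_margins()` in `block.rs`, and it does not decide which margins are adjoining.

```rust
/// Collapses a set of adjoining margins into one used margin.
/// Positive margins contribute their maximum, negative margins their
/// minimum (the most negative value), and the two are summed.
fn collapse_adjoining_margins(margins: &[f32]) -> f32 {
    let most_positive = margins
        .iter()
        .copied()
        .filter(|m| *m > 0.0)
        .fold(0.0, f32::max);
    let most_negative = margins
        .iter()
        .copied()
        .filter(|m| *m < 0.0)
        .fold(0.0, f32::min);
    most_positive + most_negative
}
```

For example, adjoining margins of 30px, 10px, -8px, and -12px collapse to 30 + (-12) = 18px. The harder part of the remaining work is determining the adjoining set in the presence of empty blocks, `min-height`, and clearance.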
#### 4. Tables (91 tests)
The engine has basic table layout with a collapsed borders module (~1,081 lines). Missing:
- Collapsed border **paint ordering** (borders paint in background phase)
- **Height distribution** to row groups (extra height allocation)
- Abspos inside table cells
- `visibility: collapse` on rows/columns
- `box-sizing` interaction with `display: table`
#### 5. Positioning (89 css2 + 58 css-position = 147 tests)
The engine has `position: absolute/relative/sticky/fixed`. Missing:
- `top`/`left`/`right`/`bottom` **application rules for table-internal elements** (~51 tests)
- Abspos **containing block** for inline-level ancestors
- Abspos **overflow** handling
- **Relative positioning of table-internal elements** (td, tr, thead, etc.)
- Static position of **inline-level abspos in block-level context** (14 tests)
#### 6. Backgrounds & Borders (75 tests)
The engine has `background-color`, `background-image`, `background-position`, `background-repeat`, `linear-gradient()`, `box-shadow`, and extended border styles. Missing:
- `background-clip: content-box/padding-box/text` (17 tests)
- `border-image` (5 tests)
- `border-radius` and rounded-corner clipping (3 tests)
- Border width keywords `thin`/`medium`/`thick` = 1/3/5px (9 tests — may be partially working)
- Sub-pixel border snapping
- `background-attachment: fixed/local` (3 tests)
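
The border width keyword mapping listed above is a direct lookup. A minimal sketch, with a hypothetical function name:

```rust
/// Maps a `border-width` keyword to its pixel value using the 1/3/5px
/// values listed above. Returns `None` for non-keyword values, which
/// would go through normal length parsing instead.
fn border_width_keyword_px(keyword: &str) -> Option<f32> {
    match keyword.to_ascii_lowercase().as_str() {
        "thin" => Some(1.0),
        "medium" => Some(3.0),
        "thick" => Some(5.0),
        _ => None,
    }
}
```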
#### 7. Normal Flow (70 tests)
Block-in-inline splitting is now implemented. Remaining:
- `display: inline-table` (11 tests)
- Inline-block **baseline** computation (9 tests)
- `min-width`/`max-width`/`min-height`/`max-height` edge cases (17 tests)
- Inline replaced element sizing (3 tests)
#### 8. Floats (67 tests)
The engine has float layout with BFC avoidance. Missing:
- BFC border boxes must not overlap float margin boxes (29 tests)
- Complex clearance computation with margin collapsing (16 tests)
- Float + table BFC interaction
- Float suppression on abspos elements
#### 9. Display (59 tests)
- **`display: run-in`** — 41 tests. Run-in boxes merge into the following block as inline content. This is a rarely-used CSS2 feature; most browsers dropped support. Low priority.
- **`display: contents` edge cases** — 15 tests. Feature is implemented but fails for: `::first-letter`/`::first-line` interaction, `<fieldset>`/`<button>`/`<details>` special behavior, and flex/table-cell containers.
- **`display: flow-root`** — 3 tests. Not yet parsed.
#### 10. CSS Box Model (38 tests)
All 38 tests are for **`margin-trim`**, a CSS Box Model Level 4 property that trims child margins at container edges. Not yet implemented. Low priority (newer spec, limited browser support).
### Cross-Cutting Themes
1. **Writing modes** (`writing-mode: vertical-lr/rl`, `direction: rtl`) — affects flexbox, text, gap, positioning. No writing-mode support exists; ~50+ tests across categories.
2. **Table-internal element rules** — margins, padding, and position offsets on `table-row-group`, `table-row`, `table-column`, etc. should be ignored per spec. ~75+ tests across margin-padding and positioning categories.
3. **Intrinsic sizing** (`min-content`/`max-content`) — affects flexbox intrinsic sizing (21), normal flow `min-width`/`max-width` (17). Partial support exists but edge cases fail.
4. **BFC establishment effects** — BFC blocks avoiding float overlap (29), height computation, margin collapsing with clearance (~18).
## Priority Recommendations
### High-Impact (most tests per effort)
1. **Table-internal "does not apply" rules** (~75 tests) — Relatively straightforward: skip margin/padding/position-offset for elements with `display: table-row-group`, `table-row`, `table-column`, `table-column-group`, `table-header-group`, `table-footer-group`.
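2. **Margin collapsing completeness** (up to ~154 tests, the whole css2-margin-padding category, which includes the ~48 "does not apply" tests from item 1): the full algorithm (CSS 2.1 §8.3.1) handles parent-child, negative margins, `min-height` interaction, and clearance. Complex but high payoff.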
3. **`background-clip: content-box/padding-box`** (17 tests) — Clip background to content or padding area. Moderate implementation effort.
4. **Border width keywords** (9 tests) — Map `thin`→1px, `medium`→3px, `thick`→5px. Trivial fix.
5. **`display: flow-root`** (3 tests) — Parse as a BFC-establishing block. Trivial.
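
Item 1 amounts to a predicate on the computed `display` value plus a reset of the used edges. The sketch below covers only the margin and padding part, using the display list from item 1. The `Display` and `Edges` types and both function names are hypothetical, and position offsets are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Display {
    Block,
    Table,
    TableRowGroup,
    TableHeaderGroup,
    TableFooterGroup,
    TableRow,
    TableColumnGroup,
    TableColumn,
}

/// Per-side lengths in pixels (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Edges {
    top: f32,
    right: f32,
    bottom: f32,
    left: f32,
}

impl Edges {
    const ZERO: Edges = Edges { top: 0.0, right: 0.0, bottom: 0.0, left: 0.0 };
}

/// True for the table-internal display types named in item 1.
fn is_table_internal(display: Display) -> bool {
    matches!(
        display,
        Display::TableRowGroup
            | Display::TableHeaderGroup
            | Display::TableFooterGroup
            | Display::TableRow
            | Display::TableColumnGroup
            | Display::TableColumn
    )
}

/// Used margin or padding: the specified value, or zero where the
/// property does not apply.
fn used_margin_or_padding(display: Display, specified: Edges) -> Edges {
    if is_table_internal(display) {
        Edges::ZERO
    } else {
        specified
    }
}
```

Applying the reset when used values are computed, rather than in the parser, keeps the specified values intact for inheritance.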
### Medium-Impact
6. **Flexbox algorithm refinements** (569 tests) — Incremental: fix `flex: initial`/`none`, stretch alignment, baseline, then writing-modes. Each sub-fix could promote 10-50 tests.
7. **Float/BFC exclusion** (29+ tests) — BFC blocks must not overlap float margins.
8. **Collapsed border paint order** (18 tests) — Borders paint in background phase.
### Low Priority
9. **`display: run-in`** (41 tests) — Dropped by most browsers. Skip.
10. **`margin-trim`** (38 tests) — CSS4, limited browser support.
11. **Writing modes** (50+ tests) — Pervasive impact but massive implementation effort.
## Summary Table
| Fix | Tests Promoted | Status | Cumulative Pass |
|-----|---------------|--------|-----------------|
| 1. Node ID normalization | (combined) | Done | — |
| 2. CDATA stripping | (combined) | Done | — |
| 3. CSS unit conversion | (combined) | Done | 156 pass |
| 4. display:contents + implicit head | +48 net | Done | 204 pass |
| 5. Pixel-based comparison + parallel | +955 | Done | 1,159 pass |
| 6. Incremental engine improvements | +128 | Done | 1,287 pass |
| — | — | Remaining | 1,626 known_fail |