rust_browser/docs/wpt_known_fail_analysis.md
Zachary D. Rowitsch 87c329a959 Update WPT known-fail analysis: 1,287 pass / 1,626 known_fail
Re-evaluate the full WPT test suite breakdown with current numbers
(+128 tests promoted since Feb 14), add detailed per-category feature
gap analysis with sub-feature counts, cross-cutting themes, and
prioritized recommendations for highest-impact fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 23:16:11 -05:00


WPT Known-Fail Test Analysis

Last updated: 2026-02-25

Current State

  • 1,287 pass, 1,626 known_fail, 1 skip (2,914 total)
  • All known_fail tests are reftests that fail pixel comparison — they require CSS features or layout modes the engine doesn't yet support
  • Test suite runs in ~12 seconds with parallel execution

Completed Fixes

Fix 1: Node ID Normalization in Reftest Comparison (Done)

In reftests, the test HTML and reference HTML have different <head> content (different numbers of <link>, <meta>, <title> elements), causing DOM node IDs to differ. The layout trees are structurally identical in content/dimensions, but node=#23 vs node=#15 caused string comparison to fail.

Fix: in tests/wpt_harness/runner.rs, normalize_node_ids() replaces node=#N with sequential IDs based on order of appearance before comparing layout and display list dumps.
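
The renumbering step can be illustrated with a small standalone sketch. This is not the code in runner.rs; the signature and the scanning approach are assumptions made for illustration, and only the node=#N token format comes from the description above.

```rust
use std::collections::HashMap;

/// Illustrative re-creation (assumed signature): rewrites every `node=#N`
/// token so that IDs are numbered 1, 2, 3, ... in order of first appearance.
fn normalize_node_ids(dump: &str) -> String {
    const MARKER: &str = "node=#";
    let mut out = String::with_capacity(dump.len());
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut rest = dump;

    while let Some(pos) = rest.find(MARKER) {
        // Copy everything up to and including the marker unchanged.
        let after_marker = pos + MARKER.len();
        out.push_str(&rest[..after_marker]);
        rest = &rest[after_marker..];

        // The run of ASCII digits that follows is the original ID.
        let digits_len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        let original = &rest[..digits_len];
        rest = &rest[digits_len..];
        if original.is_empty() {
            continue; // A bare "node=#" with no number is left as-is.
        }

        // The same original ID always maps to the same sequential ID.
        let next = seen.len() + 1;
        let id = *seen.entry(original).or_insert(next);
        out.push_str(&id.to_string());
    }
    out.push_str(rest);
    out
}
```

With this mapping, a test dump that uses node=#23 and a reference dump that uses node=#15 for the same box both normalize to node=#1, so the string comparison no longer depends on how many elements precede the body.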

Fix 2: Strip CDATA Markers from Style Text (Done)

Many WPT reference HTML files (originally XHTML) wrap CSS in <![CDATA[...]]> inside <style> tags. The HTML parser extracted the raw text, CDATA markers included, so the CSS parser received <![CDATA[ div { color: red; } ]]> and failed to parse any rules.

Fix: crates/style/src/context.rs — strip <![CDATA[ and ]]> from CSS text in extract_stylesheets() before parsing.
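
A minimal sketch of the stripping step, assuming it is a plain substring removal applied to the text of each style element. The helper name strip_cdata_markers is hypothetical; the real logic lives inside extract_stylesheets().

```rust
/// Hypothetical helper: removes XHTML-style CDATA wrappers from the raw text
/// of a <style> element so the CSS parser only sees the rules.
fn strip_cdata_markers(css: &str) -> String {
    css.replace("<![CDATA[", "").replace("]]>", "")
}
```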

Fix 3: CSS Unit Conversion — cm, mm, in, pt, pc (Done)

The CSS parser only recognized px, em, rem, %. Other units (cm, mm, in, pt, pc) fell through to a default of Px, treating 2.54cm as 2.54px instead of 96px.

Fix:

  1. crates/css/src/types.rs — Added Cm, Mm, In, Pt, Pc variants with to_px() conversions
  2. crates/css/src/parser.rs — Added unit string matching in dimension parsing and grid template parsing
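
The conversions behind to_px() follow from the CSS definitions 1in = 96px = 2.54cm = 72pt = 6pc. The sketch below is an assumption about shape, not the code in types.rs: the enum name AbsoluteUnit and the from_suffix helper are hypothetical, and font-relative units (em, rem, %) are left out because they need layout context.

```rust
/// Hypothetical enum covering only the absolute CSS length units.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AbsoluteUnit {
    Px,
    Cm,
    Mm,
    In,
    Pt,
    Pc,
}

impl AbsoluteUnit {
    /// Maps a unit suffix to a variant. Unknown suffixes return None rather
    /// than silently defaulting to Px, which was the original bug.
    fn from_suffix(suffix: &str) -> Option<Self> {
        match suffix.to_ascii_lowercase().as_str() {
            "px" => Some(Self::Px),
            "cm" => Some(Self::Cm),
            "mm" => Some(Self::Mm),
            "in" => Some(Self::In),
            "pt" => Some(Self::Pt),
            "pc" => Some(Self::Pc),
            _ => None,
        }
    }

    /// Converts a value in this unit to CSS pixels.
    fn to_px(self, value: f64) -> f64 {
        let px_per_unit = match self {
            Self::Px => 1.0,
            Self::In => 96.0,
            Self::Cm => 96.0 / 2.54,
            Self::Mm => 96.0 / 25.4,
            Self::Pt => 96.0 / 72.0,
            Self::Pc => 16.0,
        };
        value * px_per_unit
    }
}
```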

Results

Fixes 1-3 combined promoted 120 tests from known_fail to pass (36 → 156 total).

The original estimates (~485 cumulative) were too optimistic — most tests affected by CDATA or unit issues also have other layout differences that prevent passing. The fixes were necessary prerequisites but not sufficient alone for those tests.

Fix 4: display: contents + Implicit <head> Insertion (Done)

Two changes combined:

  1. display: contents — Added Display::Contents variant. Elements with this value generate no box; their children are promoted into the parent's layout. Implemented via build_children_into() in crates/layout/src/engine/mod.rs.

  2. Implicit <head> insertion — The HTML parser now creates an implicit <head> when encountering head-only elements (title, style, link, meta, script) without an existing <head>. Previously, <title> text was rendered visibly in the body for documents lacking explicit <head> tags, causing reftest mismatches since test and reference files have different titles. This was the larger win.

Fix:

  1. crates/style/src/types.rs — Added Display::Contents variant and keyword parsing
  2. crates/layout/src/engine/mod.rs: build_children_into() flattens display:contents children
  3. crates/html/src/lib.rs — Implicit <head> creation and proper head-closing on non-head elements
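
The flattening behaviour of display: contents can be sketched on a toy tree. The Node and LayoutBox types below are hypothetical stand-ins, and the function only borrows the name build_children_into() from the description above; the real engine types and signature are not shown in this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Display {
    Block,
    Contents,
}

/// Toy DOM node (hypothetical type).
struct Node {
    name: &'static str,
    display: Display,
    children: Vec<Node>,
}

/// Toy layout box (hypothetical type).
#[derive(Debug, PartialEq)]
struct LayoutBox {
    name: &'static str,
    children: Vec<LayoutBox>,
}

/// Appends boxes for `parent`'s children to `out`. A `display: contents`
/// child generates no box of its own; its children are promoted into `out`.
fn build_children_into(parent: &Node, out: &mut Vec<LayoutBox>) {
    for child in &parent.children {
        if child.display == Display::Contents {
            build_children_into(child, out);
        } else {
            let mut kids = Vec::new();
            build_children_into(child, &mut kids);
            out.push(LayoutBox { name: child.name, children: kids });
        }
    }
}
```

Because the recursion passes the same output vector down through a contents element, nested display: contents wrappers flatten to any depth without special handling.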

Results: 53 tests promoted, 5 false-pass tests demoted (were only passing because both sides had content hidden in <head>). Net: +48 tests (156 → 204 total).

Fix 5: Pixel-Based Reftest Comparison + Parallel Test Execution (Done)

Most WPT reftests use different CSS techniques (borders vs backgrounds, etc.) to achieve the same visual result, making layout-tree text comparison fundamentally unable to match. The fix adds pixel-based comparison as a fallback.

Changes:

  1. tests/wpt_harness/runner.rs — Added rasterize_html() and compare_pixels() functions. Reftests now try layout-tree comparison first (fast path), then fall back to pixel comparison by rasterizing both test and reference HTML to 800x600 pixel buffers and comparing per-pixel with a channel tolerance of 2.
  2. tests/wpt_harness.rs — Parallelized test execution using std::thread::scope, with progress reporting every 100 tests. Skips artifact writing for known_fail tests to reduce I/O.
  3. crates/style/src/context.rs — Added CDATA stripping to extract_stylesheet_sources() (was already in extract_stylesheets() but missing from the Pipeline code path).
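
The two-stage comparison in change 1 can be sketched as follows. The function names compare_pixels and the tolerance of 2 come from the description above, but the signatures, the flat RGBA byte-buffer representation, and the reftest_matches wrapper are assumptions for illustration.

```rust
/// Returns true when both buffers have the same length and every channel
/// differs by at most `tolerance`. Buffers are assumed to be flat RGBA8 data.
fn compare_pixels(a: &[u8], b: &[u8], tolerance: u8) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(&x, &y)| x.abs_diff(y) <= tolerance)
}

/// Hypothetical wrapper: identical layout dumps pass immediately (fast path);
/// otherwise the rasterized buffers decide the result.
fn reftest_matches(test_dump: &str, ref_dump: &str, test_px: &[u8], ref_px: &[u8]) -> bool {
    test_dump == ref_dump || compare_pixels(test_px, ref_px, 2)
}
```

In a real harness the pixel buffers would be produced lazily, only when the dumps differ, so that the fast path avoids rasterization entirely.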

Results: 955 tests promoted from known_fail to pass (204 → 1,159 total). Test suite runs in ~11 seconds with parallel execution.
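
The parallel runner can be approximated with std::thread::scope and a shared work index. This is a sketch under assumptions: run_parallel, its parameters, and the work-stealing-by-counter scheme are hypothetical, and only the use of scoped threads and progress reporting every 100 tests comes from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `run_one` over all tests on `workers` scoped threads and returns the
/// number of tests for which it returned true.
fn run_parallel<T, F>(tests: &[T], workers: usize, run_one: F) -> usize
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed test
    let finished = AtomicUsize::new(0);
    let passed = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // Each worker claims the next index until the list is exhausted.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(test) = tests.get(i) else { break };
                if run_one(test) {
                    passed.fetch_add(1, Ordering::Relaxed);
                }
                let done = finished.fetch_add(1, Ordering::Relaxed) + 1;
                if done % 100 == 0 {
                    eprintln!("{done}/{} tests finished", tests.len());
                }
            });
        }
    });
    // All scoped threads have been joined by the time `scope` returns.
    passed.load(Ordering::Relaxed)
}
```

Scoped threads let the workers borrow the test list and the counters directly, without wrapping them in Arc, because the scope guarantees every thread is joined before those borrows end.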

Fix 6: Incremental Engine Improvements (Done — Feb 14 → Feb 25)

Multiple engine improvements collectively promoted 128 tests (1,159 → 1,287). Key changes:

  • Canvas background propagation (CSS 2.1 §14.2) — body background paints at viewport level
  • Border shorthand fix — omitted sub-properties now properly reset
  • Extended border styles: inset, outset, groove, ridge, double, hidden
  • Table cell sizing — height as minimum (CSS 2.1 §17.5.3), column width overflow fix
  • Linear gradients: linear-gradient() through the full rendering pipeline
  • Background-position/repeat — CSS sprite support
  • Float sizing fixes — intrinsic aspect ratio, shrink-to-fit width for block children
  • Block-in-inline splitting (CSS 2.1 §9.2.1.1) — anonymous block generation
  • Flex-column fix: min-height with flex-grow distribution
  • CSS cascade ordering: per CSS Cascading and Inheritance Level 4

Remaining Known-Fail Analysis (1,626 tests)

Breakdown by CSS Specification Area

| Category | Count | Key Missing Features |
|---|---|---|
| css-flexbox | 569 | Shrink/grow algorithm, alignment, wrap, gap, writing-modes, intrinsic sizing |
| css-text | 268 | break-spaces, keep-all, hyphens, text-transform tailoring, tab-size |
| css2-margin-padding | 154 | Full margin collapsing, margin/padding on table-internal elements |
| css-tables | 91 | border-collapse painting, height distribution, visibility: collapse |
| css2-positioning | 89 | top/left on table-internal elements, abspos overflow, relpos edge cases |
| css-backgrounds | 75 | background-clip, border-radius, border-image, border keyword widths |
| css2-normal-flow | 70 | Inline-table, inline-block baseline, min-width/max-width edge cases |
| css2-floats | 67 | BFC-float exclusion, clearance computation, float+margin collapsing |
| css-display | 59 | display: run-in (41), display: contents edge cases (15), flow-root (3) |
| css-position | 58 | Relative pos on table elements (27), abspos static position in flex (14) |
| css-box | 38 | margin-trim (all 38; unimplemented CSS4 property) |
| css2-box-display | 24 | Block-in-inline edge cases, containing block determination |
| css-inline | 6 | Phantom line boxes |
| pseudo-elements | 1 | ::before edge case (feature is implemented) |

Detailed Feature Gap Analysis

1. Flexbox (569 tests) — Largest Gap

The engine has a functional flexbox implementation (~1,764 lines in crates/layout/src/engine/flex.rs) with gap, justify-content, align-items, align-content, flex-wrap, and flex-direction support. The failing tests exercise:

| Sub-feature | ~Count | What's Missing |
|---|---|---|
| Flex sizing algorithm edge cases | 111 | flex: initial/none/auto shorthand resolution, shrink below min-content |
| Alignment (align-items, align-self, align-content) | 60 | baseline alignment, stretch with cross-axis constraints |
| justify-content edge cases | 30 | space-evenly, interaction with margin: auto |
| Writing modes (writing-mode, direction: rtl) | 13 | Flex axis mapping with vertical/RTL writing modes |
| Gap with writing modes | 33 | Gap in non-default writing modes |
| Intrinsic sizing | 21 | min-content/max-content width of flex containers |
| Table as flex item | 19 | Tables inside flex containers |
| Baseline alignment | 12 | First/last baseline computation for flex items |
| Percentage height resolution | 10 | Definite-size propagation through flex items |
| aspect-ratio interaction | 5 | Aspect ratio with flex sizing |
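
For the shorthand resolution listed in the first row, CSS Flexible Box Layout defines flex: initial as 0 1 auto, flex: auto as 1 1 auto, flex: none as 0 0 auto, and a single number N as N 1 0. A minimal sketch of that mapping follows; the Flex and FlexBasis types and the function name are hypothetical, and multi-value shorthands are deliberately out of scope.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FlexBasis {
    Auto,
    Px(f32),
}

/// Expanded form of the `flex` shorthand (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flex {
    grow: f32,
    shrink: f32,
    basis: FlexBasis,
}

/// Resolves the keyword and single-number forms of `flex`.
/// Returns None for anything else (for example "1 1 10px").
fn resolve_flex_keyword(value: &str) -> Option<Flex> {
    let value = value.trim();
    match value {
        "initial" => Some(Flex { grow: 0.0, shrink: 1.0, basis: FlexBasis::Auto }),
        "auto" => Some(Flex { grow: 1.0, shrink: 1.0, basis: FlexBasis::Auto }),
        "none" => Some(Flex { grow: 0.0, shrink: 0.0, basis: FlexBasis::Auto }),
        _ => value
            .parse::<f32>()
            .ok()
            // Reject negative, infinite, and NaN grow factors.
            .filter(|n| n.is_finite() && *n >= 0.0)
            .map(|n| Flex { grow: n, shrink: 1.0, basis: FlexBasis::Px(0.0) }),
    }
}
```

The difference between none and initial is only the shrink factor, which is why tests that shrink items below their content size are sensitive to this resolution.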

2. CSS Text (268 tests)

The engine has basic text rendering with letter-spacing support. Missing:

| Sub-feature | ~Count | Notes |
|---|---|---|
| white-space: break-spaces | 79 | Preserved spaces that wrap |
| hyphens: auto/manual | 40 | Language-dependent auto-hyphenation |
| word-break: keep-all | 25 | CJK-aware word breaking |
| word-spacing | 21 | Word spacing with bidi/writing-modes |
| text-transform tailoring | 18 | Language-sensitive capitalization (Dutch IJ, Turkish i) |
| Line breaking (CJK) | 14 | line-break: strict/loose/anywhere |
| letter-spacing bidi | 11 | Letter-spacing after bidi reordering |
| text-align-last | 11 | Last-line alignment, match-parent, justify |
| overflow-wrap: anywhere | 10 | Wrapping anywhere vs break-word |
| text-autospace | 8 | CJK↔Latin auto-spacing |
| tab-size | 7 | Tab character width |
| hanging-punctuation | 3 | Punctuation outside content box |

3. Margin Collapsing & Table-Internal Margins (154 tests)

The engine has basic margin collapsing (collapse_margins() in block.rs). Missing:

  • Parent-child through-flow collapsing (margins pass through empty blocks)
  • Negative margin collapsing rules (most negative + most positive)
  • min-height interaction — doesn't prevent bottom margin adjacency
  • Clearance interaction — clear changes which margins are adjoining
  • "Does not apply" rules — margins/padding on table-row-group, table-row, table-column, etc. should be ignored (~48 tests)
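
The negative-margin rule in the list above comes from CSS 2.1 §8.3.1: among a set of adjoining margins, the collapsed margin is the largest positive margin plus the most negative margin (which is itself negative). A minimal sketch, with a hypothetical function name to avoid confusion with the engine's collapse_margins():

```rust
/// Collapses a set of adjoining margins per CSS 2.1 §8.3.1.
/// All positive: the maximum. All negative: the minimum (most negative).
/// Mixed: largest positive plus most negative.
fn collapsed_margin(adjoining: &[f32]) -> f32 {
    // Folding from 0.0 yields 0.0 when there is no positive (or no negative)
    // margin, so one expression covers all three cases.
    let max_positive = adjoining.iter().copied().fold(0.0_f32, f32::max);
    let most_negative = adjoining.iter().copied().fold(0.0_f32, f32::min);
    max_positive + most_negative
}
```

Deciding which margins are adjoining in the first place (through empty blocks, across clearance, past min-height) is the harder part of the algorithm and is not covered by this sketch.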

4. Tables (91 tests)

The engine has basic table layout with a collapsed borders module (~1,081 lines). Missing:

  • Collapsed border paint ordering (borders paint in background phase)
  • Height distribution to row groups (extra height allocation)
  • Abspos inside table cells
  • visibility: collapse on rows/columns
  • box-sizing interaction with display: table

5. Positioning (89 css2 + 58 css-position = 147 tests)

The engine has position: absolute/relative/sticky/fixed. Missing:

  • top/left/right/bottom application rules for table-internal elements (~51 tests)
  • Abspos containing block for inline-level ancestors
  • Abspos overflow handling
  • Relative positioning of table-internal elements (td, tr, thead, etc.)
  • Static position of inline-level abspos in block-level context (14 tests)

6. Backgrounds & Borders (75 tests)

The engine has background-color, background-image, background-position, background-repeat, linear-gradient(), box-shadow, and extended border styles. Missing:

  • background-clip: content-box/padding-box/text (17 tests)
  • border-image (5 tests)
  • border-radius and rounded-corner clipping (3 tests)
  • Border width keywords thin/medium/thick = 1/3/5px (9 tests — may be partially working)
  • Sub-pixel border snapping
  • background-attachment: fixed/local (3 tests)
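
The border width keyword mapping listed above is small enough to show in full. The function name is hypothetical; the pixel values are the ones stated in the list.

```rust
/// Maps a border-width keyword to its width in CSS pixels.
/// Returns None for anything that is not one of the three keywords.
fn border_width_keyword_px(keyword: &str) -> Option<f32> {
    match keyword.trim().to_ascii_lowercase().as_str() {
        "thin" => Some(1.0),
        "medium" => Some(3.0),
        "thick" => Some(5.0),
        _ => None,
    }
}
```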

7. Normal Flow (70 tests)

Block-in-inline splitting is now implemented. Remaining:

  • display: inline-table (11 tests)
  • Inline-block baseline computation (9 tests)
  • min-width/max-width/min-height/max-height edge cases (17 tests)
  • Inline replaced element sizing (3 tests)

8. Floats (67 tests)

The engine has float layout with BFC avoidance. Missing:

  • BFC border boxes must not overlap float margin boxes (29 tests)
  • Complex clearance computation with margin collapsing (16 tests)
  • Float + table BFC interaction
  • Float suppression on abspos elements
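
The first item can be pictured as computing, for a given vertical band, the horizontal range left over between float margin boxes; a box that establishes a BFC must fit its border box inside that range. The sketch below is a simplified illustration with hypothetical types, using coordinates relative to the containing block and ignoring the step of moving the box down when it does not fit.

```rust
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

/// Margin box of a float, relative to the containing block (hypothetical type).
struct FloatBox {
    side: Side,
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

/// Returns the (left, right) edges available between floats for the vertical
/// band starting at `band_top` with height `band_height`.
fn available_inline_range(
    floats: &[FloatBox],
    container_width: f32,
    band_top: f32,
    band_height: f32,
) -> (f32, f32) {
    let band_bottom = band_top + band_height;
    let mut left = 0.0_f32;
    let mut right = container_width;
    for f in floats {
        // Only floats that vertically intersect the band narrow the range.
        let overlaps = f.y < band_bottom && f.y + f.height > band_top;
        if !overlaps {
            continue;
        }
        match f.side {
            Side::Left => left = left.max(f.x + f.width),
            Side::Right => right = right.min(f.x),
        }
    }
    (left, right)
}
```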

9. Display (59 tests)

  • display: run-in — 41 tests. Run-in boxes merge into the following block as inline content. This is a rarely-used CSS2 feature; most browsers dropped support. Low priority.
  • display: contents edge cases — 15 tests. Feature is implemented but fails for: ::first-letter/::first-line interaction, <fieldset>/<button>/<details> special behavior, and flex/table-cell containers.
  • display: flow-root — 3 tests. Not yet parsed.

10. CSS Box Model (38 tests)

All 38 tests are for margin-trim, a CSS Box Model Level 4 property that trims child margins at container edges. It is not yet implemented and is low priority (newer spec, limited browser support).

Cross-Cutting Themes

  1. Writing modes (writing-mode: vertical-lr/rl, direction: rtl) — affects flexbox, text, gap, positioning. No writing-mode support exists; ~50+ tests across categories.

  2. Table-internal element rules — margins, padding, and position offsets on table-row-group, table-row, table-column, etc. should be ignored per spec. ~75+ tests across margin-padding and positioning categories.

  3. Intrinsic sizing (min-content/max-content) — affects flexbox intrinsic sizing (21), normal flow min-width/max-width (17). Partial support exists but edge cases fail.

  4. BFC establishment effects — BFC blocks avoiding float overlap (29), height computation, margin collapsing with clearance (~18).

Priority Recommendations

High-Impact (most tests per effort)

  1. Table-internal "does not apply" rules (~75 tests) — Relatively straightforward: skip margin/padding/position-offset for elements with display: table-row-group, table-row, table-column, table-column-group, table-header-group, table-footer-group.

  2. Margin collapsing completeness (~154 tests) — The full algorithm (CSS 2.1 §8.3.1) handles parent-child, negative margins, min-height interaction, and clearance. Complex but high payoff.

  3. background-clip: content-box/padding-box (17 tests) — Clip background to content or padding area. Moderate implementation effort.

  4. Border width keywords (9 tests) — Map thin→1px, medium→3px, thick→5px. Trivial fix.

  5. display: flow-root (3 tests) — Parse as a BFC-establishing block. Trivial.
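
Recommendation 1 could be sketched as a pair of predicates on the display type. The types and function names below are hypothetical. The six excluded display types are the ones listed in recommendation 1; the sketch additionally treats table-cell as not accepting margins, following the "Applies to" line for margin properties in CSS 2.1, while padding still applies to cells. Position offsets are not modeled here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Display {
    Block,
    Table,
    TableRowGroup,
    TableHeaderGroup,
    TableFooterGroup,
    TableRow,
    TableColumnGroup,
    TableColumn,
    TableCell,
}

impl Display {
    /// Row, row-group, column, and column-group display types.
    fn is_row_or_column_like(self) -> bool {
        matches!(
            self,
            Display::TableRowGroup
                | Display::TableHeaderGroup
                | Display::TableFooterGroup
                | Display::TableRow
                | Display::TableColumnGroup
                | Display::TableColumn
        )
    }

    /// Margins apply to tables and ordinary boxes, not to table-internal boxes.
    fn margin_applies(self) -> bool {
        !(self.is_row_or_column_like() || self == Display::TableCell)
    }

    /// Padding applies to everything except row-like and column-like boxes.
    fn padding_applies(self) -> bool {
        !self.is_row_or_column_like()
    }
}

/// Used margin: the specified value, or zero where margins do not apply.
fn used_margin(display: Display, specified: f32) -> f32 {
    if display.margin_applies() { specified } else { 0.0 }
}

/// Used padding: the specified value, or zero where padding does not apply.
fn used_padding(display: Display, specified: f32) -> f32 {
    if display.padding_applies() { specified } else { 0.0 }
}
```

Applying the check at used-value time, rather than dropping the declarations during parsing, keeps the computed style intact for inheritance.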

Medium-Impact

  1. Flexbox algorithm refinements (569 tests) — Incremental: fix flex: initial/none, stretch alignment, baseline, then writing-modes. Each sub-fix could promote 10-50 tests.

  2. Float/BFC exclusion (29+ tests) — BFC blocks must not overlap float margins.

  3. Collapsed border paint order (18 tests) — Borders paint in background phase.

Low Priority

  1. display: run-in (41 tests) — Dropped by most browsers. Skip.
  2. margin-trim (38 tests) — CSS4, limited browser support.
  3. Writing modes (50+ tests) — Pervasive impact but massive implementation effort.

Summary Table

| Fix | Tests Promoted | Status | Cumulative Pass |
|---|---|---|---|
| 1. Node ID normalization | (combined) | Done | |
| 2. CDATA stripping | (combined) | Done | |
| 3. CSS unit conversion | (combined) | Done | 156 pass |
| 4. display:contents + implicit head | +48 net | Done | 204 pass |
| 5. Pixel-based comparison + parallel | +955 | Done | 1,159 pass |
| 6. Incremental engine improvements | +128 | Done | 1,287 pass |
| Remaining | | | 1,626 known_fail |