Reftests now fall back to pixel comparison when layout-tree text comparison fails. Both test and reference HTML are rasterized to 800x600 pixel buffers and compared per-pixel with a channel tolerance of 2. This handles the common WPT pattern where tests use different CSS techniques (borders vs backgrounds) to achieve the same visual result. Test execution is parallelized using std::thread::scope with progress reporting every 100 tests. Suite runs in ~11 seconds across all 2,914 tests. Also fixes missing CDATA stripping in extract_stylesheet_sources() used by the Pipeline code path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WPT Known-Fail Test Analysis
Last updated: 2026-02-14
Current State
- 1,159 pass, 1,754 known_fail, 1 skip
- All known_fail tests are reftests (comparing layout tree or pixel output of test HTML against reference HTML)
Completed Fixes
Fix 1: Node ID Normalization in Reftest Comparison (Done)
In reftests, the test HTML and reference HTML have different <head> content (different numbers of <link>, <meta>, <title> elements), causing DOM node IDs to differ. The layout trees are structurally identical in content/dimensions, but node=#23 vs node=#15 caused string comparison to fail.
Fix: tests/wpt_harness/runner.rs — normalize_node_ids() replaces node=#N with sequential IDs based on order of appearance before comparing layout and display list dumps.
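The renumbering described above can be sketched as follows. This is a minimal illustration, not the harness's actual code: only the function name normalize_node_ids() and the node=#N token come from this document, while the signature, the surrounding dump format, and the choice to start numbering at 1 are assumptions.

```rust
use std::collections::HashMap;

/// Rewrites every `node=#N` token so that IDs are numbered 1, 2, 3, ...
/// in order of first appearance. Two dumps that differ only in their raw
/// node IDs then normalize to the same string.
/// (Hypothetical signature; the real function in runner.rs may differ.)
fn normalize_node_ids(dump: &str) -> String {
    const MARKER: &str = "node=#";
    let mut out = String::with_capacity(dump.len());
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut rest = dump;

    while let Some(pos) = rest.find(MARKER) {
        let after = &rest[pos + MARKER.len()..];
        // The raw ID is the run of ASCII digits that follows the marker.
        let digits_len = after
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(after.len());
        // Copy everything up to and including the marker unchanged.
        out.push_str(&rest[..pos + MARKER.len()]);
        if digits_len > 0 {
            let raw = &after[..digits_len];
            // Reuse the ID already assigned to this raw value, if any.
            let next = seen.len() + 1;
            let id = *seen.entry(raw).or_insert(next);
            out.push_str(&id.to_string());
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}
```

With this approach, a dump containing node=#23 and a dump containing node=#15 in the same position both end up with node=#1, so the string comparison no longer depends on how many elements each <head> contains.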
Fix 2: Strip CDATA Markers from Style Text (Done)
Many WPT reference HTML files (originally XHTML) wrap CSS in <![CDATA[...]]> inside <style> tags. The HTML parser extracted the raw text including the CDATA markers, so the CSS parser received <![CDATA[ div { color: red; } ]]> and failed to parse any rules.
Fix: crates/style/src/context.rs — strip <![CDATA[ and ]]> from CSS text in extract_stylesheets() before parsing.
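A minimal sketch of the stripping step, assuming it is a plain substring removal applied to the raw style text before it reaches the CSS parser. The helper name strip_cdata_markers is hypothetical; the document only says the stripping happens inside extract_stylesheets().

```rust
/// Removes XHTML-style CDATA wrappers from raw `<style>` text so that the
/// CSS parser only sees the rules themselves.
/// (Hypothetical helper; not taken from the actual crate.)
fn strip_cdata_markers(css: &str) -> String {
    css.replace("<![CDATA[", "").replace("]]>", "")
}
```

Applied to the example above, the result is the rule div { color: red; } surrounded by whitespace, which a CSS parser can handle normally.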
Fix 3: CSS Unit Conversion — cm, mm, in, pt, pc (Done)
The CSS parser only recognized px, em, rem, %. Other units (cm, mm, in, pt, pc) fell through to a default of Px, treating 2.54cm as 2.54px instead of 96px.
Fix:
- crates/css/src/types.rs: Added Cm, Mm, In, Pt, Pc variants with to_px() conversions
- crates/css/src/parser.rs: Added unit string matching in dimension parsing and grid template parsing
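The conversions follow the standard CSS absolute-length ratios (1in = 96px = 2.54cm = 72pt = 6pc). The sketch below shows one way the variants and to_px() could fit together. The enum name LengthUnit, the from_suffix helper, and the use of f64 are assumptions for illustration; only the variant names and to_px() appear in this document.

```rust
/// Absolute CSS length units (hypothetical type; the real definitions in
/// crates/css/src/types.rs may be organized differently).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LengthUnit {
    Px,
    Cm,
    Mm,
    In,
    Pt,
    Pc,
}

impl LengthUnit {
    /// Maps a unit suffix to a variant. Unknown suffixes return `None`
    /// instead of silently defaulting to `Px`.
    fn from_suffix(suffix: &str) -> Option<Self> {
        match suffix.to_ascii_lowercase().as_str() {
            "px" => Some(Self::Px),
            "cm" => Some(Self::Cm),
            "mm" => Some(Self::Mm),
            "in" => Some(Self::In),
            "pt" => Some(Self::Pt),
            "pc" => Some(Self::Pc),
            _ => None,
        }
    }

    /// Converts `value` in this unit to CSS pixels.
    fn to_px(self, value: f64) -> f64 {
        match self {
            Self::Px => value,
            Self::In => value * 96.0,
            Self::Cm => value * 96.0 / 2.54,
            Self::Mm => value * 96.0 / 25.4,
            Self::Pt => value * 96.0 / 72.0,
            Self::Pc => value * 16.0,
        }
    }
}
```

Returning None for an unrecognized suffix is a design choice in this sketch: it makes the old failure mode (an unknown unit quietly treated as px) visible to the caller.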
Results
Fixes 1-3 combined promoted 120 tests from known_fail to pass (36 → 156 total).
The original estimates (~485 cumulative) were too optimistic — most tests affected by CDATA or unit issues also have other layout differences that prevent passing. The fixes were necessary prerequisites but not sufficient alone for those tests.
Fix 4: display: contents + Implicit <head> Insertion (Done)
Two changes combined:
- display: contents. Added a Display::Contents variant. Elements with this value generate no box; their children are promoted into the parent's layout. Implemented via build_children_into() in crates/layout/src/engine/mod.rs.
- Implicit <head> insertion. The HTML parser now creates an implicit <head> when encountering head-only elements (title, style, link, meta, script) without an existing <head>. Previously, <title> text was rendered visibly in the body for documents lacking explicit <head> tags, causing reftest mismatches since test and reference files have different titles. This was the larger win.
Fix:
- crates/style/src/types.rs: Added Display::Contents variant and keyword parsing
- crates/layout/src/engine/mod.rs: build_children_into() flattens display:contents children
- crates/html/src/lib.rs: Implicit <head> creation and proper head-closing on non-head elements
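The flattening behaviour of display: contents can be illustrated with a small recursive builder. This is a sketch under simplified assumptions: StyledNode, LayoutBox, build_box, and the node helper are hypothetical stand-ins, and only the names Display::Contents and build_children_into() come from this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Display {
    Block,
    Contents,
}

/// Simplified styled-tree node (hypothetical).
struct StyledNode {
    name: &'static str,
    display: Display,
    children: Vec<StyledNode>,
}

/// Simplified layout box (hypothetical).
#[derive(Debug, PartialEq)]
struct LayoutBox {
    name: &'static str,
    children: Vec<LayoutBox>,
}

/// Convenience constructor for building small trees.
fn node(name: &'static str, display: Display, children: Vec<StyledNode>) -> StyledNode {
    StyledNode { name, display, children }
}

/// Builds the box for `node`, collecting its children via `build_children_into`.
fn build_box(node: &StyledNode) -> LayoutBox {
    let mut children = Vec::new();
    build_children_into(node, &mut children);
    LayoutBox { name: node.name, children }
}

/// Appends boxes for the children of `node` to `out`. A child with
/// `display: contents` generates no box of its own; its children are
/// appended directly to `out`, recursively.
fn build_children_into(node: &StyledNode, out: &mut Vec<LayoutBox>) {
    for child in &node.children {
        if child.display == Display::Contents {
            build_children_into(child, out);
        } else {
            out.push(build_box(child));
        }
    }
}
```

For a parent whose children are a display: contents wrapper (holding a and b) followed by c, build_box produces a parent box with three direct children a, b, c, and no box for the wrapper.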
Results: 53 tests promoted, 5 false-pass tests demoted (were only passing because both sides had content hidden in <head>). Net: +48 tests (156 → 204 total).
Fix 5: Pixel-Based Reftest Comparison + Parallel Test Execution (Done)
Most WPT reftests use different CSS techniques (borders vs backgrounds, etc.) to achieve the same visual result, making layout-tree text comparison fundamentally unable to match. The fix adds pixel-based comparison as a fallback.
Changes:
- tests/wpt_harness/runner.rs: Added rasterize_html() and compare_pixels() functions. Reftests now try layout-tree comparison first (fast path), then fall back to pixel comparison by rasterizing both test and reference HTML to 800x600 pixel buffers and comparing per-pixel with a channel tolerance of 2.
- tests/wpt_harness.rs: Parallelized test execution using std::thread::scope, with progress reporting every 100 tests. Skips artifact writing for known_fail tests to reduce I/O.
- crates/style/src/context.rs: Added CDATA stripping to extract_stylesheet_sources() (was already in extract_stylesheets() but missing from the Pipeline code path).
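The tolerance check can be sketched as below, assuming both buffers are tightly packed RGBA8 (4 bytes per pixel). The buffer layout, the signatures, and the count_mismatched_pixels helper are assumptions; only the compare_pixels() name, the 800x600 size, and the tolerance of 2 come from this document.

```rust
/// Maximum allowed absolute difference per colour channel.
const CHANNEL_TOLERANCE: u8 = 2;

/// Counts pixels where at least one channel differs by more than
/// `tolerance`. Returns `None` if the buffers differ in length or are not
/// a whole number of RGBA pixels.
/// (Hypothetical helper; assumes RGBA8 buffers.)
fn count_mismatched_pixels(test: &[u8], reference: &[u8], tolerance: u8) -> Option<usize> {
    if test.len() != reference.len() || test.len() % 4 != 0 {
        return None;
    }
    let mismatched = test
        .chunks_exact(4)
        .zip(reference.chunks_exact(4))
        .filter(|(p, q)| {
            p.iter()
                .zip(q.iter())
                .any(|(&x, &y)| x.abs_diff(y) > tolerance)
        })
        .count();
    Some(mismatched)
}

/// Returns true when every pixel matches within `CHANNEL_TOLERANCE`.
fn compare_pixels(test: &[u8], reference: &[u8]) -> bool {
    count_mismatched_pixels(test, reference, CHANNEL_TOLERANCE) == Some(0)
}
```

A small per-channel tolerance absorbs rounding differences between two rendering paths (for example a border versus a background of the same colour) while still flagging visibly different output.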
Results: 955 tests promoted from known_fail to pass (204 → 1,159 total). Test suite runs in ~11 seconds with parallel execution.
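The parallel execution with periodic progress reporting might look roughly like the following. It uses std::thread::scope as stated above; everything else (the run_parallel name, the shared atomic work counter, the bool result type, and printing progress to stderr) is an assumption made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `run_one` over all `tests` on `workers` scoped threads and returns
/// one result per test, in input order. Progress is printed to stderr
/// every 100 completed tests.
/// (Hypothetical sketch; not the harness's actual function.)
fn run_parallel<T, F>(tests: &[T], workers: usize, run_one: F) -> Vec<bool>
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed test
    let done = AtomicUsize::new(0); // number of completed tests
    let workers = workers.max(1);

    let per_worker: Vec<Vec<(usize, bool)>> = thread::scope(|s| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(|| {
                    let mut local = Vec::new();
                    loop {
                        // Claim the next test index; stop when none remain.
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= tests.len() {
                            break;
                        }
                        local.push((i, run_one(&tests[i])));
                        let finished = done.fetch_add(1, Ordering::Relaxed) + 1;
                        if finished % 100 == 0 {
                            eprintln!("{}/{} tests done", finished, tests.len());
                        }
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Scatter the per-worker results back into input order.
    let mut results = vec![false; tests.len()];
    for (i, ok) in per_worker.into_iter().flatten() {
        results[i] = ok;
    }
    results
}
```

Scoped threads let the workers borrow the test list and the closure directly, without Arc or cloning, because the scope guarantees that every thread is joined before the borrowed data goes out of scope.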
Remaining Work (Prioritized)
Remaining (1,754 tests)
The remaining known_fail tests fail pixel comparison — they require CSS features or layout modes that the engine doesn't yet support. Key areas:
| Category | Issue |
|---|---|
| css-flexbox | Flex abspos, complex item sizing, writing-modes |
| css-text | BiDi, text shaping, advanced text-align |
| css-backgrounds | background-clip, box-shadow, gradients |
| generated-content | ::before/::after pseudo-elements |
| visibility/opacity | Not yet parsed |
| css-tables | Table layout edge cases |
Summary Table
| Fix | Tests Promoted | Status | Cumulative Pass |
|---|---|---|---|
| 1. Node ID normalization | (combined) | Done | — |
| 2. CDATA stripping | (combined) | Done | — |
| 3. CSS unit conversion | +120 (fixes 1-3 combined) | Done | 156 pass |
| 4. display:contents + implicit head | +48 net | Done | 204 pass |
| 5. Pixel-based comparison + parallel | +955 | Done | 1,159 pass |