
Zachary D. Rowitsch 02cf8fd55e Add pixel-based reftest comparison and parallel WPT execution to promote 955 tests

Reftests now fall back to pixel comparison when layout-tree text comparison
fails. Both test and reference HTML are rasterized to 800x600 pixel buffers
and compared per-pixel with a channel tolerance of 2. This handles the
common WPT pattern where tests use different CSS techniques (borders vs
backgrounds) to achieve the same visual result.

Test execution is parallelized using std::thread::scope with progress
reporting every 100 tests. Suite runs in ~11 seconds across all 2,914 tests.

Also fixes missing CDATA stripping in extract_stylesheet_sources() used by
the Pipeline code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-14 11:22:16 -05:00


WPT Known-Fail Test Analysis

Last updated: 2026-02-14

Current State

1,159 pass, 1,754 known_fail, 1 skip
All known_fail tests are reftests (comparing layout tree or pixel output of test HTML against reference HTML)

Completed Fixes

Fix 1: Node ID Normalization in Reftest Comparison (Done)

In reftests, the test HTML and reference HTML have different <head> content (different numbers of <link>, <meta>, and <title> elements), so their DOM node IDs differ. The layout trees were structurally identical in content and dimensions, but a mismatch such as node=#23 vs node=#15 made the string comparison fail.

Fix: tests/wpt_harness/runner.rs — normalize_node_ids() replaces node=#N with sequential IDs based on order of appearance before comparing layout and display list dumps.
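A minimal sketch of the renumbering idea, assuming the dumps are plain strings containing tokens of the form node=#N. Only the function name comes from the harness; the signature and body here are illustrative, not the actual code in runner.rs.

```rust
use std::collections::HashMap;

/// Rewrites every `node=#N` token so that IDs are numbered 1, 2, 3, ...
/// in order of first appearance. Repeated IDs keep mapping to the same number.
/// (Hypothetical signature; the real harness function may differ.)
fn normalize_node_ids(dump: &str) -> String {
    const MARKER: &str = "node=#";
    let mut out = String::with_capacity(dump.len());
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut rest = dump;

    while let Some(pos) = rest.find(MARKER) {
        let after = &rest[pos + MARKER.len()..];
        // Length of the run of ASCII digits that follows the marker.
        let digits_len = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        // Copy everything up to and including the marker unchanged.
        out.push_str(&rest[..pos + MARKER.len()]);
        if digits_len > 0 {
            let original = &after[..digits_len];
            let next = seen.len() + 1;
            let id = *seen.entry(original).or_insert(next);
            out.push_str(&id.to_string());
        }
        rest = &after[digits_len..];
    }
    out.push_str(rest);
    out
}
```

With this in place, a test dump containing node=#23 and a reference dump containing node=#15 at the same position both normalize to node=#1, so the string comparison only fails on real structural differences.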

Fix 2: Strip CDATA Markers from Style Text (Done)

Many WPT reference HTML files (originally XHTML) wrap CSS in <![CDATA[...]]> inside <style> tags. The HTML parser extracted the raw text including the CDATA markers, so the CSS parser received <![CDATA[ div { color: red; } ]]> and failed to parse any rules.

Fix: crates/style/src/context.rs — strip <![CDATA[ and ]]> from CSS text in extract_stylesheets() before parsing.
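The stripping step can be sketched as a plain string replacement. The helper name strip_cdata_markers is hypothetical; in the source the logic lives inside extract_stylesheets(), whose exact shape is not shown here.

```rust
/// Removes XHTML-style CDATA markers from the raw text of a <style> element
/// so that only CSS reaches the CSS parser.
/// (Hypothetical helper; the name is an assumption for illustration.)
fn strip_cdata_markers(css: &str) -> String {
    css.replace("<![CDATA[", "").replace("]]>", "")
}
```

Applied to the example above, the parser would receive " div { color: red; } " with only harmless surrounding whitespace.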

Fix 3: CSS Unit Conversion — cm, mm, in, pt, pc (Done)

The CSS parser only recognized px, em, rem, and %. Other units (cm, mm, in, pt, pc) fell through to a default of Px, so 2.54cm was treated as 2.54px instead of 96px.

Fix:

crates/css/src/types.rs — Added Cm, Mm, In, Pt, Pc variants with to_px() conversions
crates/css/src/parser.rs — Added unit string matching in dimension parsing and grid template parsing
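The conversions follow from the CSS definition of 1in = 96px = 2.54cm = 72pt = 6pc. The sketch below assumes a standalone enum and an f64-based to_px(); the type name AbsoluteUnit, the from_suffix() helper, and both signatures are illustrative and may not match crates/css/src/types.rs.

```rust
/// Absolute CSS length units (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AbsoluteUnit {
    Px,
    Cm,
    Mm,
    In,
    Pt,
    Pc,
}

impl AbsoluteUnit {
    /// Maps a unit suffix to a variant. Unknown suffixes return None
    /// instead of silently falling back to Px.
    fn from_suffix(suffix: &str) -> Option<Self> {
        match suffix.to_ascii_lowercase().as_str() {
            "px" => Some(Self::Px),
            "cm" => Some(Self::Cm),
            "mm" => Some(Self::Mm),
            "in" => Some(Self::In),
            "pt" => Some(Self::Pt),
            "pc" => Some(Self::Pc),
            _ => None,
        }
    }

    /// Converts a value in this unit to CSS pixels (1in = 96px).
    fn to_px(self, value: f64) -> f64 {
        match self {
            Self::Px => value,
            Self::In => value * 96.0,
            Self::Cm => value * 96.0 / 2.54,
            Self::Mm => value * 96.0 / 25.4,
            Self::Pt => value * 96.0 / 72.0,
            Self::Pc => value * 16.0,
        }
    }
}
```

Returning None for an unrecognized suffix is a deliberate choice in this sketch: it makes the kind of silent fallback described above visible to the caller.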

Results

Fixes 1-3 combined promoted 120 tests from known_fail to pass (36 → 156 total).

The original estimate (~485 tests cumulative) was too optimistic: most tests affected by CDATA or unit issues also have other layout differences that prevent them from passing. The fixes were necessary prerequisites, but not sufficient on their own for those tests.

Fix 4: display: contents + Implicit <head> Insertion (Done)

Two changes combined:

display: contents — Added Display::Contents variant. Elements with this value generate no box; their children are promoted into the parent's layout. Implemented via build_children_into() in crates/layout/src/engine/mod.rs.
Implicit <head> insertion — The HTML parser now creates an implicit <head> when encountering head-only elements (title, style, link, meta, script) without an existing <head>. Previously, <title> text was rendered visibly in the body for documents lacking explicit <head> tags, causing reftest mismatches since test and reference files have different titles. This was the larger win.

Fix:

crates/style/src/types.rs — Added Display::Contents variant and keyword parsing
crates/layout/src/engine/mod.rs — build_children_into() flattens display:contents children
crates/html/src/lib.rs — Implicit <head> creation and proper head-closing on non-head elements

Results: 53 tests promoted, 5 false-pass tests demoted (were only passing because both sides had content hidden in <head>). Net: +48 tests (156 → 204 total).
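The child-promotion behaviour of display: contents can be sketched as a recursive walk that skips box creation for such elements. Only the names Display::Contents and build_children_into() come from the source; the StyledNode and LayoutBox types and the signature are simplified stand-ins, not the engine's real types.

```rust
/// Simplified display values (the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Display {
    Block,
    Contents,
}

/// Hypothetical styled-tree node used only for this sketch.
struct StyledNode {
    name: &'static str,
    display: Display,
    children: Vec<StyledNode>,
}

/// Hypothetical layout box used only for this sketch.
#[derive(Debug, PartialEq)]
struct LayoutBox {
    name: &'static str,
    children: Vec<LayoutBox>,
}

/// Appends boxes for `node`'s children to `out`. A child with
/// `display: contents` generates no box of its own; its children are
/// built directly into the same `out` list, i.e. promoted to the parent.
fn build_children_into(node: &StyledNode, out: &mut Vec<LayoutBox>) {
    for child in &node.children {
        if child.display == Display::Contents {
            build_children_into(child, out);
        } else {
            let mut boxed = LayoutBox { name: child.name, children: Vec::new() };
            build_children_into(child, &mut boxed.children);
            out.push(boxed);
        }
    }
}
```

For a parent whose children are a display: contents wrapper (holding a and b) followed by c, this yields the sibling boxes a, b, c with no box for the wrapper.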

Fix 5: Pixel-Based Reftest Comparison + Parallel Test Execution (Done)

Most WPT reftests build the test and the reference with different CSS techniques (borders vs backgrounds, etc.) that produce the same visual result, so a layout-tree text comparison cannot match them even when the rendering is identical. The fix adds pixel-based comparison as a fallback.

Changes:

tests/wpt_harness/runner.rs — Added rasterize_html() and compare_pixels() functions. Reftests now try layout-tree comparison first (fast path), then fall back to pixel comparison by rasterizing both test and reference HTML to 800x600 pixel buffers and comparing per-pixel with a channel tolerance of 2.
tests/wpt_harness.rs — Parallelized test execution using std::thread::scope, with progress reporting every 100 tests. Skips artifact writing for known_fail tests to reduce I/O.
crates/style/src/context.rs — Added CDATA stripping to extract_stylesheet_sources() (was already in extract_stylesheets() but missing from the Pipeline code path).
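The per-pixel comparison with a channel tolerance of 2 can be sketched as follows. The buffer layout is an assumption: the sketch treats each rasterized page as a flat RGBA8 byte slice (800 × 600 × 4 bytes), which the source does not specify, and the signature of compare_pixels() is illustrative.

```rust
/// Tolerance per colour channel, as described in this document.
const CHANNEL_TOLERANCE: u8 = 2;

/// Returns true when both buffers have the same length and every channel
/// value differs by at most `tolerance`.
/// (Assumes flat byte buffers; the real function may take other types.)
fn compare_pixels(test: &[u8], reference: &[u8], tolerance: u8) -> bool {
    test.len() == reference.len()
        && test
            .iter()
            .zip(reference)
            .all(|(a, b)| a.abs_diff(*b) <= tolerance)
}
```

A small tolerance like this absorbs rounding differences from anti-aliasing or colour conversion while still rejecting any visibly different output.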

Results: 955 tests promoted from known_fail to pass (204 → 1,159 total). Test suite runs in ~11 seconds with parallel execution.
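One way to parallelize a test list with std::thread::scope and report progress every 100 tests is sketched below. The function run_parallel, its worker-count parameter, and the shared-counter work distribution are assumptions for illustration; the harness may split work differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `run_one` over every test on `workers` scoped threads and returns
/// one pass/fail flag per test, in input order.
/// (Hypothetical helper; not the harness's actual API.)
fn run_parallel<T, F>(tests: &[T], workers: usize, run_one: F) -> Vec<bool>
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    let next = AtomicUsize::new(0); // index of the next unclaimed test
    let done = AtomicUsize::new(0); // number of finished tests

    // Scoped threads may borrow `tests`, `run_one`, and the counters directly.
    let finished: Vec<(usize, bool)> = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(s.spawn(|| {
                let mut local = Vec::new();
                loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= tests.len() {
                        break;
                    }
                    local.push((i, run_one(&tests[i])));
                    let count = done.fetch_add(1, Ordering::Relaxed) + 1;
                    if count % 100 == 0 {
                        eprintln!("progress: {}/{}", count, tests.len());
                    }
                }
                local
            }));
        }
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    // Put results back in input order.
    let mut results = vec![false; tests.len()];
    for (i, passed) in finished {
        results[i] = passed;
    }
    results
}
```

A caller could pick the worker count from std::thread::available_parallelism(). Because each worker claims the next index from a shared counter, slow tests do not leave other threads idle.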

Remaining Work (Prioritized)

Remaining (1,754 tests)

The remaining known_fail tests fail pixel comparison — they require CSS features or layout modes that the engine doesn't yet support. Key areas:

Category	Issue
css-flexbox	Flex abspos, complex item sizing, writing-modes
css-text	BiDi, text shaping, advanced text-align
css-backgrounds	background-clip, box-shadow, gradients
generated-content	::before/::after pseudo-elements
visibility/opacity	Not yet parsed
css-tables	Table layout edge cases

Summary Table

Fix	Tests Promoted	Status	Cumulative Pass
1. Node ID normalization	(combined)	Done	—
2. CDATA stripping	(combined)	Done	—
3. CSS unit conversion	(combined)	Done	156 pass
4. display:contents + implicit head	+48 net	Done	204 pass
5. Pixel-based comparison + parallel	+955	Done	1,159 pass
