Full HTML5 Support Checklist (Browser Engine)
Checked items in this document mean end-to-end support in the current tree, not parser-only or DOM-only support. If parsing exists but styling, layout, rendering, runtime behavior, or web-exposed APIs are missing, leave the item unchecked and note the missing stages.
Phase 0: Spec Targets & Test Harness
- Define “HTML5” target: WHATWG HTML Living Standard (recommended) vs W3C HTML5 snapshot
- Add Web Platform Tests (WPT) runner integration for HTML (parse + DOM + rendering)
- Add layout/reftest harness for HTML rendering features (with known-fail list)
- Add conformance reporting: pass rate by area (parsing/DOM/forms/media/etc.)
Phase 1: HTML Parsing (Tokenizer + Tree Builder)
- Implement HTML tokenization (all states, incl. script/rawtext/rcdata/plaintext)
- Implemented: Full WHATWG §13.2.5 state machine with all 80 states, including Data, RCDATA, RAWTEXT, PLAINTEXT, ScriptData (with escape/double-escape sub-states), tag/attribute states, comment/DOCTYPE states, CDATA section states, and character reference states. Full HTML5 named entity table (2,125 entries) with binary search. ParseError tracking with line/column info. Null character handling, EOF recovery, Windows-1252 remapping for numeric references.
- Implement tree builder insertion modes:
- Implemented: All 23 insertion modes as an explicit state machine with mode transitions, implicit html/head/body element creation, active formatting elements list (data structure + reconstruction), scope checking (default, list item, button, table, select scopes), form element pointer tracking, heading auto-close, block splitting, and table element auto-creation.
- initial / before html / before head / in head / in head noscript
- after head / in body / text
- in table / in table text / in caption / in column group / in table body / in row / in cell
- in select / in select in table
- in template / after body / in frameset / after frameset / after after body / after after frameset
- Foster parenting rules for tables
- Implemented: Foster parenting reparents non-table content (text and elements) before the `<table>` element per HTML §13.2.6.1. Wired into `insert_element`, `insert_element_no_push`, and `insert_character` when the `foster_parenting` flag is active. Handles text node merging at the foster parent insertion point.
- Adoption agency algorithm (mis-nested formatting elements)
- Implemented: Full WHATWG HTML §13.2.6.4.7 adoption agency algorithm with outer loop limit (8) and inner loop limit (3). Handles the `<a>` special case (closes an existing `<a>` before opening a new one). Formatting element classification for: a, b, big, code, em, font, i, s, small, strike, strong, tt, u.
- Handling of void elements (no end tags)
- Optional end tags rules
- Implemented: `li`, `dt`/`dd`, `option`/`optgroup`, `h1`–`h6`, `p`, `rb`/`rt`/`rtc`/`rp`, plus existing table tags (`td`/`th`/`tr`/`tbody`/`thead`/`tfoot`).
- Missing pipeline steps: `body`, `html`, and remaining edge-case optional-end-tag elements.
- Parse errors behavior matches spec (recover, don’t crash)
- Implemented: Tokenizer collects ParseError structs with line/col info per WHATWG spec. All states handle unexpected characters and EOF with spec-accurate recovery. Safety limits (10MB input, 500K tokens, 1024 nesting) remain enforced.
- Fragment parsing for:
- innerHTML
- Current status: `Element.innerHTML` getter/setter is wired to fragment parse/serialize. Context-sensitive fragment parsing is implemented (initial mode based on context element).
- Missing pipeline steps: remaining edge cases (adoption agency and foster parenting now integrated).
- Range/createContextualFragment
- template contents
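Context-sensitive fragment parsing (as used by `innerHTML`) also has to choose the tokenizer's starting state from the context element, alongside the initial insertion mode. The following is a minimal sketch of that selection as described in the WHATWG fragment parsing algorithm. The enum and function names are hypothetical and are not taken from the engine's source.

```rust
/// Tokenizer states a fragment parse can start in (a subset of the full state machine).
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitialState {
    Data,
    Rcdata,
    Rawtext,
    ScriptData,
    Plaintext,
}

/// Pick the tokenizer's starting state from the context element's local name.
/// Assumes `context` is an already-lowercased HTML element name.
pub fn initial_state_for_context(context: &str, scripting_enabled: bool) -> InitialState {
    match context {
        "title" | "textarea" => InitialState::Rcdata,
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => InitialState::Rawtext,
        "script" => InitialState::ScriptData,
        // <noscript> content is raw text only when scripting is enabled.
        "noscript" if scripting_enabled => InitialState::Rawtext,
        "plaintext" => InitialState::Plaintext,
        _ => InitialState::Data,
    }
}
```

For example, setting `innerHTML` on a `<textarea>` would start tokenizing in RCDATA, so markup-like text inside it is kept as characters rather than parsed into elements.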
Phase 2: DOM Core + HTML DOM Integration
- DOM tree: Node/Element/Document/DocumentFragment/Text/Comment
- All node types implemented. `DocumentFragment` added in Story 2.4 with transparent container semantics (children transfer on insertion).
- DOM mutation algorithms (append/insert/remove/replace)
- `appendChild`, `removeChild`, `insertBefore`, `replaceChild` all implemented with JS host API exposure and DocumentFragment support (Story 2.4).
- `createDocumentFragment` exposed to JS via `document.createDocumentFragment()`.
- Live NodeLists/HTMLCollections where required
- `getElementsByTagName` and `getElementsByClassName` return live `HTMLCollection` objects that re-evaluate on every access (Story 2.5).
- `querySelectorAll` returns a static NodeList (JS array snapshot).
- HTMLCollection supports `.length`, numeric index access, and the `.item()` method.
- DOMTokenList (classList, relList, etc.)
- Attributes: NamedNodeMap semantics + namespace-aware APIs
- Current status: basic get/set attribute storage exists.
- Missing pipeline steps: `NamedNodeMap`, namespace-aware APIs, and removal/reflection surface.
- Custom element name validation plumbing (even if CE is later)
- Document/Element query APIs:
- getElementById/getElementsByTagName/getElementsByClassName
- All three exposed to JS on both Document and Element (scoped to descendants for Element). `getElementsByTagName("*")` wildcard supported (Story 2.5).
- querySelector/querySelectorAll
- Both exposed on Document and Element. Element-scoped queries iterate descendants only. Invalid selectors return `SyntaxError`. Leverages the existing `selectors` crate engine (Story 2.5).
- matches/closest
- HTML document quirks:
- quirks mode / limited-quirks / standards mode from doctype
- case-insensitive tag/attr behavior in HTML documents
- Current status: parser lowercases HTML tag/attribute names.
- Missing pipeline steps: full HTML DOM case-insensitive attribute/query behavior.
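The difference between the live `HTMLCollection` results and the static `querySelectorAll` snapshot noted in this phase can be illustrated with a toy model. This is a sketch under stated assumptions: `Dom` is a flat list of tag names standing in for a real tree, and every type and method name here is hypothetical rather than the engine's API.

```rust
/// Toy flat "DOM": each entry is an element's tag name, in tree order.
/// Hypothetical stand-in for the engine's real tree.
pub struct Dom {
    tags: Vec<String>,
}

impl Dom {
    pub fn new() -> Self {
        Dom { tags: Vec::new() }
    }

    /// Append an element and return its index.
    pub fn append(&mut self, tag: &str) -> usize {
        self.tags.push(tag.to_ascii_lowercase());
        self.tags.len() - 1
    }

    /// Indices of all elements matching `tag` ("*" matches everything).
    fn matching(&self, tag: &str) -> Vec<usize> {
        self.tags
            .iter()
            .enumerate()
            .filter(|(_, t)| tag == "*" || t.as_str() == tag)
            .map(|(i, _)| i)
            .collect()
    }

    /// Static result: evaluated once, never updated afterwards.
    pub fn snapshot(&self, tag: &str) -> Vec<usize> {
        self.matching(tag)
    }
}

/// Live collection: stores only the query and re-runs it on every access.
pub struct LiveCollection {
    tag: String,
}

impl LiveCollection {
    pub fn by_tag_name(tag: &str) -> Self {
        LiveCollection { tag: tag.to_ascii_lowercase() }
    }

    pub fn length(&self, dom: &Dom) -> usize {
        dom.matching(&self.tag).len()
    }

    pub fn item(&self, dom: &Dom, index: usize) -> Option<usize> {
        dom.matching(&self.tag).get(index).copied()
    }
}
```

Re-running the query on every access is the simplest way to get live semantics. A production implementation would more likely cache results and invalidate them on mutation, which this sketch does not attempt.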
Phase 3: Events (Needed for HTML)
- EventTarget + add/removeEventListener
- Event dispatch (capture/target/bubble)
- Current status: target + bubble are implemented for click dispatch.
- Missing pipeline steps: capture phase.
- Default actions + preventDefault/cancelable rules
- Current status: `preventDefault()` works and app-level default actions are suppressed.
- Missing pipeline steps: broader DOM-native default-action coverage beyond app-browser click handling.
- Keyboard/mouse basic events used by forms/links/focus
- Current status: click is implemented.
- Missing pipeline steps: keyboard, broader mouse, and focus event families.
- Focus management:
- activeElement
- focus/blur events
- tab navigation ordering basics
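The dispatch behavior recorded in this phase (target and bubble phases, no capture phase, `preventDefault()` suppressing the default action) can be sketched roughly as follows. The `Tree` and `Event` types are hypothetical stand-ins, and a single listener closure is called for every node on the path for brevity. A real implementation would look up per-node listener lists.

```rust
/// Toy tree: `parent[i]` is the parent index of node `i`, or `None` for the root.
/// Hypothetical stand-in for the engine's DOM.
pub struct Tree {
    pub parent: Vec<Option<usize>>,
}

/// Flags a listener can set while handling the event.
#[derive(Default)]
pub struct Event {
    pub default_prevented: bool,
    pub propagation_stopped: bool,
}

/// Nodes visited by a bubbling event: the target first, then each ancestor up to the root.
pub fn bubble_path(tree: &Tree, target: usize) -> Vec<usize> {
    let mut path = vec![target];
    let mut current = target;
    while let Some(p) = tree.parent[current] {
        path.push(p);
        current = p;
    }
    path
}

/// Run `listener` at the target and then at each ancestor (no capture phase).
/// Returns true if the default action should still run.
pub fn dispatch(
    tree: &Tree,
    target: usize,
    mut listener: impl FnMut(usize, &mut Event),
) -> bool {
    let mut event = Event::default();
    for node in bubble_path(tree, target) {
        listener(node, &mut event);
        if event.propagation_stopped {
            break;
        }
    }
    !event.default_prevented
}
```

Adding the missing capture phase would amount to walking the same path in reverse (root to target's parent) before the loop shown here.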
Phase 4: Core Document Lifecycle
- Navigation primitives: load URL → fetch → parse → commit Document
- Document readiness:
- DOMContentLoaded
- load event
- readyState transitions
- Resource loading for:
- <script src>
- Base URL resolution:
- document.baseURI
- `<base href>`
- Implemented: `resolve_base_url()` in pipeline, wired before all resource loading. `document.baseURI` exposed via JS API.
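The selection logic behind base URL resolution can be sketched as below. This is not the engine's `resolve_base_url()`. The function name, its parameters, and the `join` callback (which stands in for a real URL parser) are assumptions made for illustration. The sketch also simplifies the fallback to "document URL, else `about:blank`".

```rust
/// Choose the document base URL.
///
/// `base_hrefs` holds the `href` attribute of every `<base>` element in tree
/// order (`None` when the attribute is absent). `join` is a hypothetical
/// stand-in for a URL parser: it resolves `href` against a base URL and
/// returns `None` when parsing fails.
pub fn document_base_url(
    document_url: Option<&str>,
    base_hrefs: &[Option<&str>],
    join: impl Fn(&str, &str) -> Option<String>,
) -> String {
    // Fallback base URL: the document's own URL, or about:blank when there is none.
    let fallback = document_url.unwrap_or("about:blank");

    // Only the first <base> that actually carries an href is considered.
    match base_hrefs.iter().find_map(|href| *href) {
        // An unparsable href falls back rather than failing the load.
        Some(href) => join(fallback, href).unwrap_or_else(|| fallback.to_string()),
        None => fallback.to_string(),
    }
}
```

Later `<base>` elements are ignored even when the first one's `href` fails to parse, which is why the sketch stops at the first match instead of trying each candidate in turn.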
Phase 5: Scripting & Script Loading
- <script> inline execution
- <script src> external execution
- async scripts
- defer scripts
- module scripts (optional if strictly “HTML5-era” only; recommended for modern web)
- document.write() (including parser insertion + blocking semantics)
- noscript behavior when scripting disabled
Phase 6: HTML Elements (Parsing + DOM + Default Behaviors)
6.1 Document metadata / structure
- html, head, body
- title, base, link, meta, style
- Current status: `link` and `style` are integrated into stylesheet loading; `style` is hidden in rendering. `<base href>` is resolved and used for all resource loading.
- Missing pipeline steps: `title` document integration and metadata APIs.
- template (contents DocumentFragment + inert parsing)
6.2 Sectioning / semantics
- header, footer, main, nav, section, article, aside
- h1–h6, hgroup (legacy), address
- div, span
6.3 Text content
- p, pre, blockquote
- ol, ul, li
- dl, dt, dd
- figure, figcaption
- hr
6.4 Inline text semantics
- a (linking + navigation)
- em, strong, small, s
- cite, q
- Current status: generic inline rendering exists; `cite` gets UA italics.
- Missing pipeline steps: `q` quotation/default behavior.
- dfn, abbr
- data, time
- code, var, samp, kbd
- sub, sup
- i, b, u
- mark
- bdi, bdo, ruby, rt, rp
- br, wbr
- Current status: `br` is parsed as a void element.
- Missing pipeline steps: verified line-break behavior for `br` and soft-wrap behavior for `wbr`.
6.5 Edits
- ins, del
6.6 Embedded content
- img (+ srcset/sizes optional but expected today)
- picture (optional but strongly recommended)
- iframe (static rendering: src, srcdoc, dimension attributes, style isolation; no JS/interaction/nesting)
- embed, object, param
- Current status: `object[data]` can render image payloads and otherwise falls back to children.
- Missing pipeline steps: `embed`, `param`, and non-image `object` behavior.
- video, audio, source, track (see Phase 10)
- canvas (see Phase 11)
- svg integration points (optional but expected)
- math integration points (optional)
6.7 Tabular data
- table, caption, colgroup, col, tbody, thead, tfoot, tr, td, th
- Current status: `table`, `caption`, `colgroup`, `col`, `tbody`, `thead`, `tfoot`, `tr`, `td`, and `th` are end-to-end through layout and paint. `colgroup`/`col` support: width hints and span attributes. Deferred: column backgrounds, `visibility: collapse`, column borders in collapsed model.
6.8 Forms content (see Phase 8/9)
- form, label, input, button, select, datalist, optgroup, option, textarea
- Current status: `form`, `label`, `input`, `button`, `select`, `optgroup`, `option`, and `textarea` have partial parse/layout/runtime support.
- Missing pipeline steps: `datalist`, full control behavior, full submission semantics, and validation plumbing.
- fieldset, legend, output, progress, meter
- Current status: `fieldset`/`legend` have UA styling and generic rendering.
- Missing pipeline steps: `output`, `progress`, and `meter` behavior/rendering.
6.9 Interactive
- details, summary
- dialog (optional but common)
- menu (largely obsolete; safe parse/support minimal)
Phase 7: Attributes, Reflecting, and “Content Attributes”
- Global attributes: id, class, style, title, lang, dir, hidden, tabindex, contenteditable, draggable, spellcheck
- Current status: `id`, `class`, and `style` are wired; `lang`/`dir` participate in selector matching; `contenteditable` is selector-only.
- Missing pipeline steps: `hidden`, `tabindex`, `draggable`, `spellcheck`, and web-exposed behavior/reflection.
- URL attributes resolution (href/src/action/poster/etc.)
- Current status: `href`/`src`/`action` resolve against the page URL in navigation/resource-loading paths.
- Missing pipeline steps: `<base>` integration and broader DOM/IDL URL reflection.
- Boolean attribute semantics (present/absent)
- Reflecting IDL attributes for major elements (e.g., HTMLAnchorElement.href)
- dataset support (data-* attributes)
Phase 8: Forms (Structure + Submission)
- Form ownership rules (including form attribute)
- Successful controls rules
- Form submission:
- GET
- POST (application/x-www-form-urlencoded)
- multipart/form-data (file inputs)
- submit event + preventDefault
- Current status: button/input clicks can trigger GET/POST submission, and click `preventDefault()` suppresses default submission.
- Missing pipeline steps: actual `submit` event dispatch and full successful-controls processing.
- Form-associated custom elements plumbing (optional if CE implemented)
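For the GET and `application/x-www-form-urlencoded` POST paths listed in this phase, the serialization step can be sketched as follows. This is a minimal illustration of the urlencoded serializer from the WHATWG URL Standard, not the engine's code. The function names are hypothetical, and the input is assumed to be the already-collected list of successful controls.

```rust
/// Serialize name/value pairs as `application/x-www-form-urlencoded`.
/// Hypothetical helper; `pairs` is assumed to be the form's entry list.
pub fn urlencode_pairs(pairs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (i, (name, value)) in pairs.iter().enumerate() {
        if i > 0 {
            out.push('&');
        }
        encode_component(name, &mut out);
        out.push('=');
        encode_component(value, &mut out);
    }
    out
}

/// Percent-encode one component, byte by byte over its UTF-8 encoding.
fn encode_component(input: &str, out: &mut String) {
    for &byte in input.as_bytes() {
        match byte {
            // ASCII alphanumerics and `*-._` pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(byte as char)
            }
            // Spaces become `+` rather than `%20`.
            b' ' => out.push('+'),
            // Everything else is emitted as an uppercase %XX escape.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
}
```

For a GET submission the resulting string becomes the query component of the action URL. For a POST it is sent as the request body.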
Phase 9: Constraint Validation & Input Types
- constraint validation API (checkValidity/reportValidity/validity)
- required/disabled/readonly handling
- pattern/min/max/step/minlength/maxlength
- Input types:
- text, search, tel, url, email, password
- checkbox, radio
- submit, reset, button
- number, range
- date, time, datetime-local, month, week
- color
- file
- hidden
- select/option default selection rules
- Current status: collapsed `<select>` rendering picks the `selected` option or the first option.
- Missing pipeline steps: full DOM/value/selection behavior.
- textarea value/selection APIs
- datalist suggestions (optional UI, but DOM behavior required)
Phase 10: Media Elements (Audio/Video)
- Media element core (`audio`/`video`)
- Media resource selection algorithm (source elements + type sniffing)
- Playback state machine (paused/seeking/ended)
- Media events (play/pause/timeupdate/canplay/etc.)
- Controls attribute (basic UI optional; API required)
- track element parsing + TextTrack plumbing (can start minimal)
Phase 11: Canvas
- canvas element sizing + fallback content
- 2D context:
- path APIs (beginPath/moveTo/lineTo/arc/rect/etc.)
- fill/stroke styles
- transforms
- text drawing (fillText/strokeText/measureText)
- image drawing (drawImage)
- pixel APIs (getImageData/putImageData)
- toDataURL / toBlob
Phase 12: Navigation, History, and Location
- Location API (href/assign/replace/reload)
- History API:
- back/forward/go
- Current status: internal browsing-context history exists for the app shell.
- Missing pipeline steps: web-exposed `window.history` API.
- pushState/replaceState
- popstate event
- Fragment navigation (#hash) + scroll to element
- target=_blank/window browsing context basics (can be single-window at first, but model should exist)
Phase 13: Loading, Preload Scanner, and Fetch Integration (HTML-facing)
- Resource prioritization basics (parser-blocking vs async)
- Current status: classic scripts are loaded/executed synchronously after parsing.
- Missing pipeline steps: explicit parser-blocking/async/defer scheduling model.
- preload scanner for `<script>` and other fetch-triggering elements (optional early, important later)
- CORS mode plumbing for script/img/media where applicable (even if enforcement is minimal initially)
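The explicit parser-blocking/async/defer scheduling model listed as missing in this phase could start from a classification like the one below. This is a sketch of the HTML Standard's rules for parser-inserted scripts. The type and function names are hypothetical, and it leaves out details such as scripts waiting on pending stylesheets.

```rust
/// How a parser-inserted script is scheduled. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptSchedule {
    /// Parser pauses until the script has been fetched (if external) and run.
    ParserBlocking,
    /// Fetched in parallel, run after parsing finishes, in document order.
    Deferred,
    /// Fetched in parallel, run as soon as it is available.
    Async,
}

/// Attributes of a `<script>` element that affect scheduling.
pub struct ScriptAttrs {
    pub has_src: bool,
    pub is_module: bool,
    pub has_async: bool,
    pub has_defer: bool,
}

pub fn classify(attrs: &ScriptAttrs) -> ScriptSchedule {
    if attrs.is_module {
        // Module scripts never block the parser; `defer` has no effect on them.
        return if attrs.has_async {
            ScriptSchedule::Async
        } else {
            ScriptSchedule::Deferred
        };
    }
    if !attrs.has_src {
        // Inline classic scripts ignore `async` and `defer`.
        return ScriptSchedule::ParserBlocking;
    }
    if attrs.has_async {
        // `async` wins when both attributes are present.
        ScriptSchedule::Async
    } else if attrs.has_defer {
        ScriptSchedule::Deferred
    } else {
        ScriptSchedule::ParserBlocking
    }
}
```

A preload scanner would sit alongside this: it can start fetches early for any of the three classes, while the classification only decides when execution happens.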
Phase 14: Security Model Basics (HTML-level)
- Same-origin policy hooks for:
- iframe access
- window.opener relationships
- sandbox attribute parsing + enforcement hooks (partial ok early, full later)
- Content Security Policy integration hooks (optional but common)
- Referrer policy (optional but common)
- Current status: dangerous URL schemes are blocked for links/forms.
- Missing pipeline steps: same-origin, sandbox, CSP, and referrer-policy model.
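Blocking dangerous URL schemes for links and forms depends on extracting the scheme the same way a URL parser would, so that inputs like `" JaVa\tScRiPt:..."` are not missed. The sketch below illustrates that idea. The list of blocked schemes is an assumption, since this checklist does not say which schemes the engine rejects, and all names are hypothetical.

```rust
/// Schemes treated as dangerous in this sketch. ASSUMPTION: the checklist
/// does not list the schemes the engine actually blocks.
const BLOCKED_SCHEMES: [&str; 3] = ["javascript", "vbscript", "data"];

/// Extract a lowercased scheme the way a URL parser would see it:
/// leading/trailing C0 controls and spaces are trimmed, and ASCII tab and
/// newline characters are dropped wherever they occur.
pub fn scheme_of(url: &str) -> Option<String> {
    let cleaned: String = url
        .trim_matches(|c: char| c <= ' ')
        .chars()
        .filter(|&c| !matches!(c, '\t' | '\n' | '\r'))
        .collect();
    let (scheme, _rest) = cleaned.split_once(':')?;
    let mut chars = scheme.chars();
    let first = chars.next()?;
    // A scheme is an ASCII letter followed by letters, digits, `+`, `-`, or `.`.
    let valid = first.is_ascii_alphabetic()
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    if valid {
        Some(scheme.to_ascii_lowercase())
    } else {
        None
    }
}

/// True when the URL carries one of the blocked schemes.
pub fn is_blocked(url: &str) -> bool {
    match scheme_of(url) {
        Some(scheme) => BLOCKED_SCHEMES.iter().any(|&blocked| blocked == scheme),
        // Relative URL: it has no scheme of its own to block.
        None => false,
    }
}
```

A check like this covers only the scheme filter. The same-origin, sandbox, CSP, and referrer-policy hooks listed as missing would be separate layers.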
Phase 15: Internationalization & Text Direction
- lang/dir propagation
- Current status: `:lang()` and `:dir()` selector matching works via inherited attribute lookup.
- Missing pipeline steps: actual layout/rendering text-direction propagation.
- dir=auto behavior (optional but helpful)
- basic bidi text rendering integration (with CSS direction/unicode-bidi)
Phase 16: Editing APIs (Optional but often expected)
- contenteditable basics (DOM behavior)
- Current status: only selector matching/plumbing exists.
- Missing pipeline steps: editing behavior and DOM interaction model.
- execCommand is legacy (safe ignore), but ensure pages don’t crash
- selection + ranges (see below)
Phase 17: Selection and Ranges (Needed for many pages)
- Range API (createRange/setStart/setEnd/cloneContents/etc.)
- Selection API (window.getSelection, ranges, basic editing interactions)
- caret browsing basics (optional)
Phase 18: Web Components (Not “HTML5 classic”, but modern “full HTML” expectation)
- Custom Elements v1 (define/upgrade/lifecycle)
- Shadow DOM (attachShadow, slots)
- HTML templates + cloning integration
- Scoped event retargeting + composed paths
Phase 19: Storage & Offline (Commonly expected)
- Web Storage: localStorage/sessionStorage
- IndexedDB (bigger; optional but strongly expected for modern web)
- Service Workers (large; optional unless targeting modern “full web”)
Phase 20: Workers & Messaging (Commonly expected)
- postMessage between windows/frames
- Web Workers (DedicatedWorker)
- MessageChannel/MessagePort
Phase 21: Final “Full HTML5” Exit Criteria
- HTML parser passes WPT parsing tests at target threshold
- DOM + Events pass core WPT suites at target threshold
- Forms + validation pass WPT at target threshold
- Media/Canvas pass WPT at target threshold (or documented exclusions)
- No-crash guarantee on malformed HTML/DOM operations
- Publish a conformance report with pass rates + remaining gaps