rust_browser/docs/HTML5_Implementation_Checklist.md
Zachary D. Rowitsch 70ad1244d8 Implement base URL resolution and image rendering completeness with code review fixes (§4.2.3)
Add <base href> support with resolve_base_url() wired before all resource loading,
document.baseURI JS API (falling back to "about:blank" per spec), image format
verification golden tests (PNG/JPEG/GIF/WebP/SVG for both <img> and CSS
background-image), and aspect ratio preservation tests. Code review fixes:
baseURI spec compliance, restyle path no longer overwrites <base href> URL,
added CSS background-image format tests and base URL integration tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 01:36:35 -04:00


Full HTML5 Support Checklist (Browser Engine)

Checked items in this document mean end-to-end support in the current tree, not parser-only or DOM-only support. If parsing exists but styling, layout, rendering, runtime behavior, or web-exposed APIs are missing, leave the item unchecked and note the missing stages.

Phase 0: Spec Targets & Test Harness

  • Define “HTML5” target: WHATWG HTML Living Standard (recommended) vs W3C HTML5 snapshot
  • Add Web Platform Tests (WPT) runner integration for HTML (parse + DOM + rendering)
  • Add layout/reftest harness for HTML rendering features (with known-fail list)
  • Add conformance reporting: pass rate by area (parsing/DOM/forms/media/etc.)
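The conformance-reporting item above can be sketched as a per-area tally that separates known failures from regressions. This is a minimal illustration only: `Outcome`, `AreaStats`, and `summarize` are hypothetical names, and the `(area, test, outcome)` input shape is an assumption, not the project's actual WPT runner output.

```rust
use std::collections::{BTreeMap, HashSet};

/// Result of one test, as a hypothetical runner might report it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Pass,
    Fail,
}

/// Tally for one area (parsing, DOM, forms, ...).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct AreaStats {
    pub passed: usize,
    /// Failures that appear on the known-fail list.
    pub known_failures: usize,
    /// Failures that are NOT on the known-fail list.
    pub regressions: usize,
}

impl AreaStats {
    pub fn total(&self) -> usize {
        self.passed + self.known_failures + self.regressions
    }

    /// Fraction of tests that passed; 0.0 for an empty area.
    pub fn pass_rate(&self) -> f64 {
        if self.total() == 0 {
            0.0
        } else {
            self.passed as f64 / self.total() as f64
        }
    }
}

/// Groups `(area, test_name, outcome)` triples by area.
pub fn summarize<'a>(
    results: &[(&'a str, &'a str, Outcome)],
    known_fail: &HashSet<&str>,
) -> BTreeMap<&'a str, AreaStats> {
    let mut report: BTreeMap<&'a str, AreaStats> = BTreeMap::new();
    for &(area, test, outcome) in results {
        let stats = report.entry(area).or_default();
        match outcome {
            Outcome::Pass => stats.passed += 1,
            Outcome::Fail if known_fail.contains(test) => stats.known_failures += 1,
            Outcome::Fail => stats.regressions += 1,
        }
    }
    report
}
```

Keeping regressions apart from known failures lets a CI job fail only on new breakage while the pass rate is still reported for every area.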

Phase 1: HTML Parsing (Tokenizer + Tree Builder)

  • Implement HTML tokenization (all states, incl. script/rawtext/rcdata/plaintext)
    • Implemented: Full WHATWG §13.2.5 state machine with all 80 states, including Data, RCDATA, RAWTEXT, PLAINTEXT, ScriptData (with escape/double-escape sub-states), tag/attribute states, comment/DOCTYPE states, CDATA section states, and character reference states. Full HTML5 named entity table (2,125 entries) with binary search. ParseError tracking with line/column info. Null character handling, EOF recovery, Windows-1252 remapping for numeric references.
  • Implement tree builder insertion modes:
    • Implemented: All 23 insertion modes as an explicit state machine with mode transitions, implicit html/head/body element creation, active formatting elements list (data structure + reconstruction), scope checking (default, list item, button, table, select scopes), form element pointer tracking, heading auto-close, block splitting, and table element auto-creation.
    • initial / before html / before head / in head / in head noscript
    • after head / in body / text
    • in table / in table text / in caption / in column group / in table body / in row / in cell
    • in select / in select in table
    • in template / after body / in frameset / after frameset / after after body / after after frameset
  • Foster parenting rules for tables
    • Implemented: Foster parenting reparents non-table content (text and elements) before the <table> element per HTML §13.2.6.1. Wired into insert_element, insert_element_no_push, and insert_character when foster_parenting flag is active. Handles text node merging at foster parent insertion point.
  • Adoption agency algorithm (mis-nested formatting elements)
    • Implemented: Full WHATWG HTML §13.2.6.4.7 adoption agency algorithm with outer loop limit (8) and inner loop limit (3). Handles <a> special case (closes existing <a> before opening new one). Formatting element classification for: a, b, big, code, em, font, i, s, small, strike, strong, tt, u.
  • Handling of void elements (no end tags)
  • Optional end tags rules
    • Implemented: li, dt/dd, option/optgroup, h1-h6, p, rb/rt/rtc/rp, plus existing table tags (td/th/tr/tbody/thead/tfoot).
    • Missing pipeline steps: body, html, and remaining edge-case optional-end-tag elements.
  • Parse error behavior matches spec (recover, don't crash)
    • Implemented: Tokenizer collects ParseError structs with line/col info per WHATWG spec. All states handle unexpected characters and EOF with spec-accurate recovery. Safety limits (10MB input, 500K tokens, 1024 nesting) remain enforced.
  • Fragment parsing for:
    • innerHTML
      • Current status: Element.innerHTML getter/setter is wired to fragment parse/serialize. Context-sensitive fragment parsing is implemented (initial mode based on context element).
      • Missing pipeline steps: remaining edge cases (adoption agency and foster parenting now integrated).
    • Range/createContextualFragment
    • template contents
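The tokenizer notes above mention null handling and Windows-1252 remapping for numeric character references. A minimal sketch of that final step follows, based on the WHATWG "numeric character reference end state". The function and constant names are hypothetical and are not taken from the project's tokenizer; parse-error reporting is omitted.

```rust
const REPLACEMENT: char = '\u{FFFD}';

/// C1 code points that the HTML spec remaps to their Windows-1252 meaning.
const WINDOWS_1252_REMAP: &[(u32, char)] = &[
    (0x80, '\u{20AC}'), (0x82, '\u{201A}'), (0x83, '\u{0192}'),
    (0x84, '\u{201E}'), (0x85, '\u{2026}'), (0x86, '\u{2020}'),
    (0x87, '\u{2021}'), (0x88, '\u{02C6}'), (0x89, '\u{2030}'),
    (0x8A, '\u{0160}'), (0x8B, '\u{2039}'), (0x8C, '\u{0152}'),
    (0x8E, '\u{017D}'), (0x91, '\u{2018}'), (0x92, '\u{2019}'),
    (0x93, '\u{201C}'), (0x94, '\u{201D}'), (0x95, '\u{2022}'),
    (0x96, '\u{2013}'), (0x97, '\u{2014}'), (0x98, '\u{02DC}'),
    (0x99, '\u{2122}'), (0x9A, '\u{0161}'), (0x9B, '\u{203A}'),
    (0x9C, '\u{0153}'), (0x9E, '\u{017E}'), (0x9F, '\u{0178}'),
];

/// Maps the accumulated value of `&#N;` / `&#xN;` to the emitted character.
pub fn resolve_numeric_reference(code: u32) -> char {
    // Null, out-of-range, and surrogate values all become U+FFFD.
    if code == 0 || code > 0x10FFFF || (0xD800..=0xDFFF).contains(&code) {
        return REPLACEMENT;
    }
    // Selected C1 controls are reinterpreted as Windows-1252 characters.
    if let Some(&(_, ch)) = WINDOWS_1252_REMAP.iter().find(|&&(from, _)| from == code) {
        return ch;
    }
    // Everything else maps straight to the code point.
    char::from_u32(code).unwrap_or(REPLACEMENT)
}
```

For example, `&#x80;` resolves to the euro sign rather than the C1 control U+0080, while `&#x81;` has no remapping and is passed through unchanged.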

Phase 2: DOM Core + HTML DOM Integration

  • DOM tree: Node/Element/Document/DocumentFragment/Text/Comment
    • All node types implemented. DocumentFragment added in Story 2.4 with transparent container semantics (children transfer on insertion).
  • DOM mutation algorithms (append/insert/remove/replace)
    • appendChild, removeChild, insertBefore, replaceChild all implemented with JS host API exposure and DocumentFragment support (Story 2.4).
    • createDocumentFragment exposed to JS via document.createDocumentFragment().
  • Live NodeLists/HTMLCollections where required
    • getElementsByTagName and getElementsByClassName return live HTMLCollection objects that re-evaluate on every access (Story 2.5).
    • querySelectorAll returns static NodeList (JS array snapshot).
    • HTMLCollection supports .length, numeric index access, and .item() method.
  • DOMTokenList (classList, relList, etc.)
  • Attributes: NamedNodeMap semantics + namespace-aware APIs
    • Current status: basic get/set attribute storage exists.
    • Missing pipeline steps: NamedNodeMap, namespace-aware APIs, and removal/reflection surface.
  • Custom element name validation plumbing (even if CE is later)
  • Document/Element query APIs:
    • getElementById/getElementsByTagName/getElementsByClassName
      • All three exposed to JS on both Document and Element (scoped to descendants for Element). getElementsByTagName("*") wildcard supported (Story 2.5).
    • querySelector/querySelectorAll
      • Both exposed on Document and Element. Element-scoped queries iterate descendants only. Invalid selectors throw a SyntaxError. Leverages the existing selectors crate engine (Story 2.5).
    • matches/closest
  • HTML document quirks:
    • quirks mode / limited-quirks / standards mode from doctype
    • case-insensitive tag/attr behavior in HTML documents
      • Current status: parser lowercases HTML tag/attribute names.
      • Missing pipeline steps: full HTML DOM case-insensitive attribute/query behavior.
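The difference between a live HTMLCollection and a static querySelectorAll snapshot, described in the list above, can be illustrated with a small sketch. It assumes a flat, document-ordered element list standing in for the real DOM tree; `Tree`, `LiveCollection`, and `query_snapshot` are hypothetical names and do not reflect the engine's actual types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
pub struct Element {
    pub tag: String,
}

/// Stand-in for the DOM: elements in document order behind shared ownership.
pub type Tree = Rc<RefCell<Vec<Element>>>;

/// Live view: stores only the query and re-evaluates it on every access.
pub struct LiveCollection {
    tree: Tree,
    tag: String,
}

impl LiveCollection {
    pub fn new(tree: &Tree, tag: &str) -> Self {
        LiveCollection { tree: Rc::clone(tree), tag: tag.to_string() }
    }

    /// Indices of currently matching elements ("*" matches every element).
    fn matches(&self) -> Vec<usize> {
        self.tree
            .borrow()
            .iter()
            .enumerate()
            .filter(|(_, e)| self.tag == "*" || e.tag == self.tag)
            .map(|(i, _)| i)
            .collect()
    }

    pub fn length(&self) -> usize {
        self.matches().len()
    }

    /// Returns the tree index of the `index`-th match, or None if out of range.
    pub fn item(&self, index: usize) -> Option<usize> {
        self.matches().get(index).copied()
    }
}

/// Static snapshot: evaluated once, never updated afterwards.
pub fn query_snapshot(tree: &Tree, tag: &str) -> Vec<usize> {
    LiveCollection::new(tree, tag).matches()
}
```

Re-evaluating on every access is the simplest way to get live semantics; a production engine would more likely cache the result and invalidate it on mutation.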

Phase 3: Events (Needed for HTML)

  • EventTarget + add/removeEventListener
  • Event dispatch (capture/target/bubble)
    • Current status: target + bubble are implemented for click dispatch.
    • Missing pipeline steps: capture phase.
  • Default actions + preventDefault/cancelable rules
    • Current status: preventDefault() works and app-level default actions are suppressed.
    • Missing pipeline steps: broader DOM-native default-action coverage beyond app-browser click handling.
  • Keyboard/mouse basic events used by forms/links/focus
    • Current status: click is implemented.
    • Missing pipeline steps: keyboard, broader mouse, and focus event families.
  • Focus management:
    • activeElement
    • focus/blur events
    • tab navigation ordering basics
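Since the capture phase is listed above as missing, the following sketch shows the order in which listeners would be visited once all three phases exist. It is an illustration of the DOM dispatch order only; node ids are plain integers, `propagation_order` is a hypothetical helper, and stopPropagation handling is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Capture,
    Target,
    Bubble,
}

/// Computes the visit order for an event.
///
/// `path_root_to_target` lists node ids from the root down to the target
/// (target last). Non-bubbling events skip the bubble phase.
pub fn propagation_order(path_root_to_target: &[u32], bubbles: bool) -> Vec<(u32, Phase)> {
    let Some((&target, ancestors)) = path_root_to_target.split_last() else {
        return Vec::new();
    };
    let mut order = Vec::with_capacity(ancestors.len() * 2 + 1);
    // Capture: root first, down to the target's parent.
    for &node in ancestors {
        order.push((node, Phase::Capture));
    }
    // Target: the node the event was dispatched on.
    order.push((target, Phase::Target));
    // Bubble: target's parent first, back up to the root.
    if bubbles {
        for &node in ancestors.iter().rev() {
            order.push((node, Phase::Bubble));
        }
    }
    order
}
```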

Phase 4: Core Document Lifecycle

  • Navigation primitives: load URL → fetch → parse → commit Document
  • Document readiness:
    • DOMContentLoaded
    • load event
    • readyState transitions
  • Resource loading for:
    • <script src>
  • Base URL resolution:
    • document.baseURI
      • Implemented: resolve_base_url() in pipeline, wired before all resource loading. document.baseURI exposed via JS API.

Phase 5: Scripting & Script Loading

  • <script> inline execution
  • <script src> external execution
  • async scripts
  • defer scripts
  • module scripts (optional if strictly “HTML5-era” only; recommended for modern web)
  • document.write() (including parser insertion + blocking semantics)
  • noscript behavior when scripting disabled
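The script-loading items above differ mainly in when a script runs relative to parsing. A minimal sketch of that classification for parser-inserted scripts, following the HTML Standard's rules, is shown below. `ScriptAttrs`, `Scheduling`, and `classify` are hypothetical names; the engine's own scheduling model is listed elsewhere in this checklist as not yet explicit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptKind {
    Classic,
    Module,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scheduling {
    /// Parser pauses until the script has been fetched and executed.
    ParserBlocking,
    /// Runs in document order after parsing finishes.
    Deferred,
    /// Runs as soon as it is available, in no guaranteed order.
    Async,
}

#[derive(Debug, Clone, Copy)]
pub struct ScriptAttrs {
    pub kind: ScriptKind,
    pub has_src: bool,
    pub is_async: bool,
    pub is_defer: bool,
}

/// Classifies a parser-inserted `<script>` element.
pub fn classify(s: &ScriptAttrs) -> Scheduling {
    match (s.kind, s.has_src, s.is_async, s.is_defer) {
        // Module scripts are deferred by default; `async` overrides that.
        (ScriptKind::Module, _, true, _) => Scheduling::Async,
        (ScriptKind::Module, _, false, _) => Scheduling::Deferred,
        // External classic scripts: `async` wins over `defer`.
        (ScriptKind::Classic, true, true, _) => Scheduling::Async,
        (ScriptKind::Classic, true, false, true) => Scheduling::Deferred,
        (ScriptKind::Classic, true, false, false) => Scheduling::ParserBlocking,
        // Inline classic scripts ignore `async`/`defer` and run immediately.
        (ScriptKind::Classic, false, _, _) => Scheduling::ParserBlocking,
    }
}
```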

Phase 6: HTML Elements (Parsing + DOM + Default Behaviors)

6.1 Document metadata / structure

  • html, head, body
  • title, base, link, meta, style
    • Current status: link and style are integrated into stylesheet loading; style is hidden in rendering. <base href> is resolved and used for all resource loading.
    • Missing pipeline steps: title document integration and metadata APIs.
  • template (contents DocumentFragment + inert parsing)

6.2 Sectioning / semantics

  • header, footer, main, nav, section, article, aside
  • h1-h6, hgroup (legacy), address
  • div, span

6.3 Text content

  • p, pre, blockquote
  • ol, ul, li
  • dl, dt, dd
  • figure, figcaption
  • hr

6.4 Inline text semantics

  • a (linking + navigation)
  • em, strong, small, s
  • cite, q
    • Current status: generic inline rendering exists; cite gets UA italics.
    • Missing pipeline steps: q quotation/default behavior.
  • dfn, abbr
  • data, time
  • code, var, samp, kbd
  • sub, sup
  • i, b, u
  • mark
  • bdi, bdo, ruby, rt, rp
  • br, wbr
    • Current status: br is parsed as a void element.
    • Missing pipeline steps: verified line-break behavior for br and soft-wrap behavior for wbr.

6.5 Edits

  • ins, del

6.6 Embedded content

  • img (+ srcset/sizes optional but expected today)
  • picture (optional but strongly recommended)
  • iframe (static rendering: src, srcdoc, dimension attributes, style isolation; no JS/interaction/nesting)
  • embed, object, param
    • Current status: object[data] can render image payloads and otherwise falls back to children.
    • Missing pipeline steps: embed, param, and non-image object behavior.
  • video, audio, source, track (see Phase 10)
  • canvas (see Phase 11)
  • svg integration points (optional but expected)
  • math integration points (optional)

6.7 Tabular data

  • table, caption, colgroup, col, tbody, thead, tfoot, tr, td, th
    • Current status: table, caption, colgroup, col, tbody, thead, tfoot, tr, td, and th are end-to-end through layout and paint.
    • colgroup/col support: width hints and span attributes. Deferred: column backgrounds, visibility: collapse, column borders in collapsed model.

6.8 Forms content (see Phase 8/9)

  • form, label, input, button, select, datalist, optgroup, option, textarea
    • Current status: form, label, input, button, select, optgroup, option, and textarea have partial parse/layout/runtime support.
    • Missing pipeline steps: datalist, full control behavior, full submission semantics, and validation plumbing.
  • fieldset, legend, output, progress, meter
    • Current status: fieldset/legend have UA styling and generic rendering.
    • Missing pipeline steps: output, progress, and meter behavior/rendering.

6.9 Interactive

  • details, summary
  • dialog (optional but common)
  • menu (largely obsolete; safe parse/support minimal)

Phase 7: Attributes, Reflecting, and “Content Attributes”

  • Global attributes: id, class, style, title, lang, dir, hidden, tabindex, contenteditable, draggable, spellcheck
    • Current status: id, class, and style are wired; lang/dir participate in selector matching; contenteditable is selector-only.
    • Missing pipeline steps: hidden, tabindex, draggable, spellcheck, and web-exposed behavior/reflection.
  • URL attributes resolution (href/src/action/poster/etc.)
    • Current status: href/src/action resolve against the page URL in navigation/resource-loading paths; <base href> is applied before resource loading (see Phase 4).
    • Missing pipeline steps: broader DOM/IDL URL reflection.
  • Boolean attribute semantics (present/absent)
  • Reflecting IDL attributes for major elements (e.g., HTMLAnchorElement.href)
  • dataset support (data-* attributes)
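For the dataset item above, the name mapping between `data-*` attributes and `dataset` keys is defined by the HTML Standard and can be sketched independently of the DOM. The two function names below are hypothetical; they show the conversion rules only, not how the engine would expose `dataset` to JS.

```rust
/// Converts a `data-*` attribute name to its `dataset` key,
/// e.g. "data-foo-bar" becomes "fooBar". Returns None for other attributes.
pub fn attr_to_dataset_key(attr: &str) -> Option<String> {
    let rest = attr.strip_prefix("data-")?;
    let mut out = String::with_capacity(rest.len());
    let mut chars = rest.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '-' {
            // A hyphen followed by an ASCII lowercase letter is dropped and
            // the letter is uppercased; any other hyphen is kept as-is.
            if let Some(&next) = chars.peek() {
                if next.is_ascii_lowercase() {
                    out.push(next.to_ascii_uppercase());
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    Some(out)
}

/// Converts a `dataset` key back to an attribute name.
/// Returns None where the DOM would throw a SyntaxError
/// (a hyphen directly followed by an ASCII lowercase letter).
pub fn dataset_key_to_attr(key: &str) -> Option<String> {
    let mut out = String::from("data-");
    let mut prev_hyphen = false;
    for c in key.chars() {
        if prev_hyphen && c.is_ascii_lowercase() {
            return None;
        }
        if c.is_ascii_uppercase() {
            out.push('-');
            out.push(c.to_ascii_lowercase());
        } else {
            out.push(c);
        }
        prev_hyphen = c == '-';
    }
    Some(out)
}
```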

Phase 8: Forms (Structure + Submission)

  • Form ownership rules (including form attribute)
  • Successful controls rules
  • Form submission:
    • GET
    • POST (application/x-www-form-urlencoded)
    • multipart/form-data (file inputs)
    • submit event + preventDefault
      • Current status: button/input clicks can trigger GET/POST submission, and click preventDefault() suppresses default submission.
      • Missing pipeline steps: actual submit event dispatch and full successful-controls processing.
  • Form-associated custom elements plumbing (optional if CE implemented)
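The GET and POST (application/x-www-form-urlencoded) paths above share one serialization step. A minimal sketch of the URL Standard's urlencoded serializer is shown below; it assumes the entry list has already been built from the successful controls, and the function names are hypothetical.

```rust
/// Percent-encodes one name or value using the urlencoded rules:
/// space becomes '+', ASCII alphanumerics and `*-._` are kept,
/// every other byte of the UTF-8 encoding becomes %XX.
pub fn urlencode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input.as_bytes() {
        match b {
            b' ' => out.push('+'),
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Serializes name/value pairs as `name=value` joined by '&'.
pub fn serialize_form(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(name, value)| {
            format!("{}={}", urlencode_component(name), urlencode_component(value))
        })
        .collect::<Vec<_>>()
        .join("&")
}
```

For GET the result replaces the action URL's query; for POST it becomes the request body.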

Phase 9: Constraint Validation & Input Types

  • constraint validation API (checkValidity/reportValidity/validity)
  • required/disabled/readonly handling
  • pattern/min/max/step/minlength/maxlength
  • Input types:
    • text, search, tel, url, email, password
    • checkbox, radio
    • submit, reset, button
    • number, range
    • date, time, datetime-local, month, week
    • color
    • file
    • hidden
  • select/option default selection rules
    • Current status: collapsed <select> rendering picks the selected option or the first option.
    • Missing pipeline steps: full DOM/value/selection behavior.
  • textarea value/selection APIs
  • datalist suggestions (optional UI, but DOM behavior required)
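The select default-selection behavior noted above ("the selected option or the first option") can be sketched as a small lookup. `OptionEntry` and `displayed_option` are hypothetical names. Choosing the last option marked selected when several are marked follows the HTML Standard's rule for single-select controls and is an assumption about the engine, which the checklist does not spell out; disabled options are ignored here.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OptionEntry {
    pub label: String,
    /// True if the `<option>` carries the `selected` attribute.
    pub selected: bool,
}

/// Picks the option a collapsed, single-select `<select>` should display.
pub fn displayed_option(options: &[OptionEntry]) -> Option<&OptionEntry> {
    options
        .iter()
        .rev()
        // With several `selected` options, the last one wins.
        .find(|o| o.selected)
        // With none selected, fall back to the first option.
        .or_else(|| options.first())
}
```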

Phase 10: Media Elements (Audio/Video)

  • audio/video element core
  • Media resource selection algorithm (source elements + type sniffing)
  • Playback state machine (paused/seeking/ended)
  • Media events (play/pause/timeupdate/canplay/etc.)
  • Controls attribute (basic UI optional; API required)
  • track element parsing + TextTrack plumbing (can start minimal)

Phase 11: Canvas

  • canvas element sizing + fallback content
  • 2D context:
    • path APIs (beginPath/moveTo/lineTo/arc/rect/etc.)
    • fill/stroke styles
    • transforms
    • text drawing (fillText/strokeText/measureText)
    • image drawing (drawImage)
    • pixel APIs (getImageData/putImageData)
  • toDataURL / toBlob

Phase 12: Navigation, History, and Location

  • Location API (href/assign/replace/reload)
  • History API:
    • back/forward/go
      • Current status: internal browsing-context history exists for the app shell.
      • Missing pipeline steps: web-exposed window.history API.
    • pushState/replaceState
    • popstate event
  • Fragment navigation (#hash) + scroll to element
  • target=_blank/window browsing context basics (can be single-window at first, but model should exist)
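The History items above (back/forward/go, pushState/replaceState) rest on a session-history list with a current index. A minimal sketch of that model follows; `SessionHistory` is a hypothetical type, entries are plain URL strings, and state objects and popstate dispatch are left out.

```rust
pub struct SessionHistory {
    entries: Vec<String>,
    index: usize,
}

impl SessionHistory {
    pub fn new(initial_url: &str) -> Self {
        SessionHistory { entries: vec![initial_url.to_string()], index: 0 }
    }

    pub fn current(&self) -> &str {
        &self.entries[self.index]
    }

    pub fn length(&self) -> usize {
        self.entries.len()
    }

    /// Navigation or pushState: drops any forward entries, then appends.
    pub fn push(&mut self, url: &str) {
        self.entries.truncate(self.index + 1);
        self.entries.push(url.to_string());
        self.index += 1;
    }

    /// replaceState: overwrites the current entry without changing length.
    pub fn replace(&mut self, url: &str) {
        self.entries[self.index] = url.to_string();
    }

    /// go(delta): returns false (and does nothing) if the target is out of range.
    pub fn go(&mut self, delta: isize) -> bool {
        let target = self.index as isize + delta;
        if target < 0 || target as usize >= self.entries.len() {
            return false;
        }
        self.index = target as usize;
        true
    }
}
```

`back()` and `forward()` are then `go(-1)` and `go(1)`.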

Phase 13: Loading, Preload Scanner, and Fetch Integration (HTML-facing)

  • Resource prioritization basics (parser-blocking vs async)
    • Current status: classic scripts are loaded/executed synchronously after parsing.
    • Missing pipeline steps: explicit parser-blocking/async/defer scheduling model.
  • preload scanner for //<script> (optional early, important later)
  • CORS mode plumbing for script/img/media where applicable (even if enforcement is minimal initially)

Phase 14: Security Model Basics (HTML-level)

  • Same-origin policy hooks for:
    • iframe access
    • window.opener relationships
  • sandbox attribute parsing + enforcement hooks (partial ok early, full later)
  • Content Security Policy integration hooks (optional but common)
  • Referrer policy (optional but common)
    • Current status: dangerous URL schemes are blocked for links/forms.
    • Missing pipeline steps: same-origin, sandbox, CSP, and referrer-policy model.
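The current status above says dangerous URL schemes are blocked for links and forms. A sketch of such a check is shown below. The blocked list (`javascript`, `vbscript`) is an assumption for illustration, since the checklist does not name the schemes, and `is_blocked_url` is a hypothetical helper. The cleanup step mirrors what URL parsers do before reading the scheme, so that obfuscated input does not slip through.

```rust
/// ASSUMPTION: illustrative list only; the engine's actual list is not stated.
const BLOCKED_SCHEMES: &[&str] = &["javascript", "vbscript"];

/// Returns true if `raw` would parse to a URL with a blocked scheme.
pub fn is_blocked_url(raw: &str) -> bool {
    // URL parsing strips leading/trailing C0 controls and spaces and
    // removes ASCII tab/newline anywhere in the input.
    let cleaned: String = raw
        .trim_matches(|c: char| c <= ' ')
        .chars()
        .filter(|c| !matches!(c, '\t' | '\n' | '\r'))
        .collect();
    match cleaned.split_once(':') {
        // Schemes are ASCII case-insensitive.
        Some((scheme, _)) => BLOCKED_SCHEMES.contains(&scheme.to_ascii_lowercase().as_str()),
        // No colon means a relative reference, which has no scheme.
        None => false,
    }
}
```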

Phase 15: Internationalization & Text Direction

  • lang/dir propagation
    • Current status: :lang() and :dir() selector matching works via inherited attribute lookup.
    • Missing pipeline steps: actual layout/rendering text-direction propagation.
  • dir=auto behavior (optional but helpful)
  • basic bidi text rendering integration (with CSS direction/unicode-bidi)
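The inherited attribute lookup behind `:lang()` matching, mentioned above, can be sketched as an ancestor walk plus a prefix comparison. The node layout and both function names are hypothetical. The matcher implements only the basic "equal, or prefix followed by a hyphen" rule, not full extended language-range filtering.

```rust
pub struct Node {
    pub parent: Option<usize>,
    /// Value of the element's own `lang` attribute, if present.
    pub lang: Option<String>,
}

/// Walks from `id` towards the root and returns the nearest `lang` value.
pub fn effective_lang(nodes: &[Node], mut id: usize) -> Option<&str> {
    loop {
        let node = nodes.get(id)?;
        if let Some(lang) = node.lang.as_deref() {
            return Some(lang);
        }
        // No attribute here: continue with the parent, or stop at the root.
        id = node.parent?;
    }
}

/// True if `element_lang` equals `range` or starts with `range` followed
/// by '-', compared ASCII case-insensitively (so "en" matches "en-US").
pub fn lang_matches(element_lang: &str, range: &str) -> bool {
    let lang = element_lang.to_ascii_lowercase();
    let range = range.to_ascii_lowercase();
    lang == range
        || (lang.starts_with(&range) && lang.as_bytes().get(range.len()) == Some(&b'-'))
}
```

The same ancestor walk applies to `dir`, with the difference that `dir=auto` needs content inspection rather than simple inheritance.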

Phase 16: Editing APIs (Optional but often expected)

  • contenteditable basics (DOM behavior)
    • Current status: only selector matching/plumbing exists.
    • Missing pipeline steps: editing behavior and DOM interaction model.
  • execCommand is legacy (safe to ignore), but ensure pages don't crash
  • selection + ranges (see below)

Phase 17: Selection and Ranges (Needed for many pages)

  • Range API (createRange/setStart/setEnd/cloneContents/etc.)
  • Selection API (window.getSelection, ranges, basic editing interactions)
  • caret browsing basics (optional)

Phase 18: Web Components (Not “HTML5 classic”, but modern “full HTML” expectation)

  • Custom Elements v1 (define/upgrade/lifecycle)
  • Shadow DOM (attachShadow, slots)
  • HTML templates + cloning integration
  • Scoped event retargeting + composed paths

Phase 19: Storage & Offline (Commonly expected)

  • Web Storage: localStorage/sessionStorage
  • IndexedDB (bigger; optional but strongly expected for modern web)
  • Service Workers (large; optional unless targeting modern “full web”)

Phase 20: Workers & Messaging (Commonly expected)

  • postMessage between windows/frames
  • Web Workers (DedicatedWorker)
  • MessageChannel/MessagePort

Phase 21: Final “Full HTML5” Exit Criteria

  • HTML parser passes WPT parsing tests at target threshold
  • DOM + Events pass core WPT suites at target threshold
  • Forms + validation pass WPT at target threshold
  • Media/Canvas pass WPT at target threshold (or documented exclusions)
  • No-crash guarantee on malformed HTML/DOM operations
  • Publish a conformance report with pass rates + remaining gaps