rust_browser/tests/goldens/fixtures/275-rcdata-rawtext-states.html at fix/es-modules-fifth-review-3-4 - rust_browser - Code @ MoldyBits.net

moldybits/rust_browser

Zachary D. Rowitsch 2e7642a658 Implement HTML5 tokenizer completeness with code review fixes (§13.2.5)

Refactor tokenizer from 320-line linear scan to full WHATWG §13.2.5 state
machine with all 80 states. Add RCDATA, RAWTEXT, PLAINTEXT, ScriptData
(with escape/double-escape), character reference states, CDATA sections,
and bogus comments. Replace ~100-entity table with full HTML5 set (2,125
unique entries) using sorted array + binary search. Add ParseError tracking
with 33 error kinds including line/column info.
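
The sorted array plus binary search lookup mentioned above could look roughly like the following. This is a minimal sketch, not the repository's code: the `ENTITIES` table, its five entries, and the `lookup_entity` name are all hypothetical stand-ins for the full 2,125-entry table.

```rust
/// Hypothetical miniature entity table: (name, replacement) pairs.
/// The entries MUST stay sorted by name in byte order for binary search to work.
/// The real table is said to hold 2,125 entries; these five are illustrative only.
const ENTITIES: &[(&str, &str)] = &[
    ("amp;", "&"),
    ("gt;", ">"),
    ("lt;", "<"),
    ("nbsp;", "\u{00A0}"),
    ("quot;", "\""),
];

/// Look up a named character reference (without the leading '&').
/// Returns the replacement text, or None if the name is not in the table.
fn lookup_entity(name: &str) -> Option<&'static str> {
    ENTITIES
        // Compare each candidate name against the requested name.
        .binary_search_by(|(candidate, _)| candidate.cmp(&name))
        .ok()
        .map(|index| ENTITIES[index].1)
}
```

A binary search keeps lookups at O(log n) comparisons without needing a hash map or build-time code generation, at the cost of requiring the table to be kept sorted.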

Code review fixes: remove dead Token::RawText variant and tree_builder
branches, add duplicate attribute detection, add noncharacter/control char
reference checks per §13.2.5.80, fix noscript unconditional RAWTEXT
handling, rename golden 093 to 275 to avoid number collision, add script
escape/double-escape tests.
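
For reference, the noncharacter and control character checks from the numeric character reference end state (§13.2.5.80) can be sketched as below. The `RefError` enum and `classify_numeric_reference` function are hypothetical names, not taken from the repository, and the sketch only classifies the error; it assumes the replacement step (U+FFFD substitution and the Windows-1252 remapping table) is handled elsewhere.

```rust
/// Hypothetical error kinds for numeric character references.
#[derive(Debug, PartialEq, Eq)]
enum RefError {
    Null,
    OutsideUnicodeRange,
    Surrogate,
    Noncharacter,
    Control,
}

/// Classify a parsed numeric character reference value.
/// Returns None when the code point raises no parse error.
fn classify_numeric_reference(code: u32) -> Option<RefError> {
    // Noncharacters: U+FDD0..=U+FDEF, plus any code point whose low
    // 16 bits are FFFE or FFFF (U+FFFE, U+FFFF, U+1FFFE, ...).
    let is_noncharacter =
        (0xFDD0..=0xFDEF).contains(&code) || (code & 0xFFFE) == 0xFFFE;
    // ASCII whitespace: tab, LF, FF, CR, space.
    let is_ascii_whitespace = matches!(code, 0x09 | 0x0A | 0x0C | 0x0D | 0x20);
    // Controls: C0 controls and U+007F..=U+009F.
    let is_control = code <= 0x1F || (0x7F..=0x9F).contains(&code);

    if code == 0 {
        Some(RefError::Null)
    } else if code > 0x10FFFF {
        Some(RefError::OutsideUnicodeRange)
    } else if (0xD800..=0xDFFF).contains(&code) {
        Some(RefError::Surrogate)
    } else if is_noncharacter {
        Some(RefError::Noncharacter)
    } else if code == 0x0D || (is_control && !is_ascii_whitespace) {
        // CR is ASCII whitespace but is still reported as a control.
        Some(RefError::Control)
    } else {
        None
    }
}
```

The range check comes before the noncharacter check so that the low-16-bit mask is only ever applied to values inside the Unicode range.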

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-14 14:03:23 -04:00

11 lines

191 B

HTML

 <!DOCTYPE html>
 <html>
 <head>
 <title>Title with &amp; entity</title>
 <style>body { margin: 8px; }</style>
 </head>
 <body>
 <xmp>This is <b>preformatted</b> &amp; raw text</xmp>
 </body>
 </html>
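
The fixture above exercises `<title>` (RCDATA, so `&amp;` is decoded but tags are not recognised) and `<style>` and `<xmp>` (RAWTEXT, so both `<b>` and `&amp;` stay literal). A minimal sketch of how a start tag might select the tokenizer state follows. The `TextState` enum and `state_for_element` function are hypothetical names, not the repository's API; the element lists follow the WHATWG tree construction rules.

```rust
/// Hypothetical subset of tokenizer states that an element's start tag can select.
#[derive(Debug, PartialEq, Eq)]
enum TextState {
    Data,
    Rcdata,
    Rawtext,
    ScriptData,
    Plaintext,
}

/// Choose the tokenizer state to switch to after a start tag with this name.
/// `noscript` only switches to RAWTEXT when scripting is enabled.
fn state_for_element(tag: &str, scripting_enabled: bool) -> TextState {
    match tag {
        "title" | "textarea" => TextState::Rcdata,
        "style" | "xmp" | "iframe" | "noembed" | "noframes" => TextState::Rawtext,
        "noscript" if scripting_enabled => TextState::Rawtext,
        "script" => TextState::ScriptData,
        "plaintext" => TextState::Plaintext,
        _ => TextState::Data,
    }
}
```

With scripting disabled, `noscript` falls through to the data state, so its children are tokenized as ordinary markup.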