Break the 2,194-line monolithic HTML parser file into focused modules:
- entities.rs: HTML entity decoding functions
- tokenizer.rs: Token/Attribute types and Tokenizer
- tree_builder.rs: TreeBuilder and void element handling
- tests/: Split tests into tokenizer, parsing, table, and fragment files

This follows the same pattern as the dom/ and pipeline/ splits. The public
API is unchanged: HtmlParser, Token, Attribute, and TreeBuilder are all
re-exported from the parser module root.

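For reviewers, a minimal sketch of the resulting layout, assuming the module
root simply re-exports with `pub use`. Inline modules stand in for the real
files so the sketch compiles on its own. All function names, fields, and
bodies below (decode_entity, is_void_element, process, open_elements) are
hypothetical illustrations, not the code in this commit, and HtmlParser is
omitted because its new location is not stated here.

```rust
// Hypothetical sketch only: inline modules stand in for entities.rs,
// tokenizer.rs, and tree_builder.rs. Bodies are illustrative.

#[allow(dead_code)]
mod entities {
    /// Decode a small, illustrative subset of named HTML entities.
    pub fn decode_entity(name: &str) -> Option<char> {
        match name {
            "amp" => Some('&'),
            "lt" => Some('<'),
            "gt" => Some('>'),
            "quot" => Some('"'),
            _ => None,
        }
    }
}

#[allow(dead_code)]
mod tokenizer {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Attribute {
        pub name: String,
        pub value: String,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum Token {
        StartTag { name: String, attrs: Vec<Attribute> },
        EndTag { name: String },
        Text(String),
    }
}

#[allow(dead_code)]
mod tree_builder {
    use super::tokenizer::Token;

    /// Void elements never take an end tag, so they are never pushed
    /// onto the stack of open elements.
    pub fn is_void_element(name: &str) -> bool {
        matches!(
            name,
            "area" | "base" | "br" | "col" | "embed" | "hr" | "img"
                | "input" | "link" | "meta" | "source" | "track" | "wbr"
        )
    }

    #[derive(Debug, Default)]
    pub struct TreeBuilder {
        open: Vec<String>,
    }

    impl TreeBuilder {
        pub fn new() -> Self {
            Self::default()
        }

        /// Track open elements: push on a non-void start tag, pop when
        /// an end tag matches the innermost open element.
        pub fn process(&mut self, token: &Token) {
            match token {
                Token::StartTag { name, .. } => {
                    if !is_void_element(name) {
                        self.open.push(name.clone());
                    }
                }
                Token::EndTag { name } => {
                    if self.open.last() == Some(name) {
                        self.open.pop();
                    }
                }
                Token::Text(_) => {}
            }
        }

        pub fn open_elements(&self) -> &[String] {
            &self.open
        }
    }
}

// Re-exports keep existing import paths working after the split.
#[allow(unused_imports)]
pub use tokenizer::{Attribute, Token};
#[allow(unused_imports)]
pub use tree_builder::TreeBuilder;
```

With re-exports like these, callers keep importing Token and TreeBuilder
from the module root and do not need to know which file defines them.
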
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>