rust_browser/docs/architecture.md
Zachary D. Rowitsch 7e318179a6 Mark AST-walking JS interpreter as deprecated in favor of bytecode VM
The bytecode pipeline (§3.1) now handles all program-level execution.
Gate execute_program with #[cfg(test)], add deprecation docs to the
AST-walking statements/expressions modules, and update architecture
docs to reflect the bytecode VM as primary. Also adds story 3-2
(generators) as ready-for-dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 10:47:46 -04:00

Architecture — rust_browser

Generated: 2026-03-05 | Scan Level: Deep

Executive Summary

rust_browser is a from-scratch web browser engine built as a Rust Cargo workspace with 22 crates organized in a strict 4-layer hierarchy. The architecture follows a deterministic, single-threaded rendering pipeline (HTML → DOM → CSS → Style → Layout → Display List → Render) with arena-based data flow between phases. Worker threads handle only network I/O and image decoding. The project prioritizes correctness, testability, and clear separation of concerns over early performance optimization.

Architecture Overview

┌────────────────────────────────────────────────────────────────┐
│ Layer 3: app_browser                                           │
│   Desktop shell, CLI, event loop, pipeline orchestration       │
├────────────────────────────────────────────────────────────────┤
│ Layer 2: browser_runtime                                       │
│   Tab management, navigation lifecycle, history, browsing ctx  │
├────────────────────────────────────────────────────────────────┤
│ Layer 1: Engine Crates                                         │
│                                                                │
│  ┌─── JS Engine ───┐  ┌──── Rendering Pipeline ─────────────┐ │
│  │ js_parser (AST)  │  │ html → dom → css → selectors        │ │
│  │ js_vm (interp)   │  │   → style → layout → display_list   │ │
│  │ js (facade)      │  │   → render → graphics                │ │
│  └──────────────────┘  └─────────────────────────────────────┘ │
│                                                                │
│  ┌─── Infrastructure ──┐  ┌── Web APIs ──┐                    │
│  │ net (HTTP/file)      │  │ web_api      │                    │
│  │ image (decode)       │  │ (JS↔DOM      │                    │
│  │ fonts (rasterize)    │  │  bridge)     │                    │
│  │ storage (placeholder)│  └──────────────┘                    │
│  │ platform (windowing) │                                      │
│  └──────────────────────┘                                      │
├────────────────────────────────────────────────────────────────┤
│ Layer 0: shared                                                │
│   Common types: Point, Size, Rect, Color, NodeId, StyleId,    │
│   LayoutId, BrowserUrl, errors                                 │
└────────────────────────────────────────────────────────────────┘

Rendering Pipeline

The pipeline is intentionally staged and deterministic. Each phase produces a stable intermediate representation:

1. Input          → HTML source string
2. html crate     → DOM Tree (arena-based, NodeId indices)
3. css crate      → Parsed Stylesheets (Rules, Selectors, Declarations)
4. selectors      → Matched Rules per node (with Specificity)
5. style crate    → StyledDocument (NodeId → ComputedStyles)
6. layout crate   → Layout Tree (LayoutNode, box geometry, positioning)
7. display_list   → Display List (Vec<DisplayItem>: rects, text, borders, images)
8. render crate   → Pixel Buffer (RGBA8 pixels)
9. platform       → On-screen presentation (via softbuffer + winit)

In --render mode, steps 2-7 produce artifact dumps (.layout.txt and .dl.txt) for golden regression testing.
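Golden testing depends on the dump being byte-for-byte reproducible for the same input. The following is a minimal sketch of how a display list could be serialized to text. The `DisplayItem` variants, their field names, and the line format are assumptions for illustration, not the actual `.dl.txt` format.

```rust
use std::fmt::Write;

/// Hypothetical display item; the real `DisplayItem` type has more variants.
#[derive(Debug, Clone, PartialEq)]
enum DisplayItem {
    Rect { x: i32, y: i32, w: i32, h: i32, rgba: [u8; 4] },
    Text { x: i32, y: i32, content: String },
}

/// Serialize a display list to one line per item, in paint order.
/// Only integers and escaped strings are written, so the same list
/// always produces the same text.
fn dump_display_list(items: &[DisplayItem]) -> String {
    let mut out = String::new();
    for item in items {
        match item {
            DisplayItem::Rect { x, y, w, h, rgba } => {
                // Writing to a String cannot fail, so unwrap is safe here.
                writeln!(
                    out,
                    "rect {} {} {} {} #{:02x}{:02x}{:02x}{:02x}",
                    x, y, w, h, rgba[0], rgba[1], rgba[2], rgba[3]
                )
                .unwrap();
            }
            DisplayItem::Text { x, y, content } => {
                // `{:?}` quotes and escapes the string, keeping one item per line.
                writeln!(out, "text {} {} {:?}", x, y, content).unwrap();
            }
        }
    }
    out
}
```

A golden test would then compare this string against a checked-in fixture and fail on any difference.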

Crate Dependency Graph

Layer Rules (enforced by scripts/check_deps.sh)

  • No upward dependencies — Layer 1 cannot depend on Layer 2 or 3
  • Lateral dependencies within Layer 1 are allowed where necessary (e.g., style depends on css, dom, selectors)
  • Layer 0 (shared) has no internal dependencies
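The layer rules reduce to one check: no dependency edge may point from a lower layer to a higher one. The real check lives in scripts/check_deps.sh; the sketch below only illustrates the rule in Rust, and the function name and data shapes are hypothetical.

```rust
use std::collections::HashMap;

/// Return every dependency edge (from, to) where `to` sits in a higher
/// layer than `from`. Same-layer (lateral) edges are allowed.
fn find_upward_deps<'a>(
    layers: &HashMap<&'a str, u8>,
    deps: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .copied()
        .filter(|&(from, to)| match (layers.get(from), layers.get(to)) {
            (Some(from_layer), Some(to_layer)) => to_layer > from_layer,
            // Crates missing from the layer map (e.g. external ones) are skipped.
            _ => false,
        })
        .collect()
}
```

Because `shared` is the only Layer 0 crate, the same check also covers the rule that Layer 0 has no internal dependencies.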

Key Dependencies

| Crate | Depends On |
|-------|------------|
| app_browser (L3) | browser_runtime, platform, shared, net, html, dom, css, style, layout, display_list, render, image |
| browser_runtime (L2) | web_api, dom, net, storage, shared |
| web_api (L1) | dom, html, js, js_parser, shared |
| style (L1) | css, dom, selectors, shared |
| layout (L1) | dom, style, image, fonts, shared |
| display_list (L1) | layout, style, shared |
| render (L1) | display_list, fonts, graphics, image, shared |
| selectors (L1) | dom, shared |
| js_vm (L1) | js_parser |
| js (L1) | js_parser, js_vm |

Data Flow Patterns

Arena-Based Storage

Each crate uses arena-based storage with integer IDs for cross-crate references:

  • NodeId — DOM node identity (used by dom, html, style, layout, display_list)
  • StyleId — Style identity
  • LayoutId — Layout node identity
  • ImageId — Decoded image identity (used by image, render)

This avoids lifetime-based references across crate boundaries, simplifying the API surface.
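As an illustration of the pattern, here is a minimal arena sketch. The `Node` fields and the `NodeArena` API are assumptions for the example; only the idea of a `NodeId` integer handle comes from the text above.

```rust
/// Index into a `NodeArena`. A plain integer: `Copy`, hashable, no lifetime.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(u32);

/// Hypothetical node layout; relationships are stored as ids, not references.
#[derive(Debug)]
struct Node {
    tag: String,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

#[derive(Debug, Default)]
struct NodeArena {
    nodes: Vec<Node>,
}

impl NodeArena {
    /// Append a node and return its id. If a parent is given,
    /// the new node is also registered as that parent's child.
    fn alloc(&mut self, tag: &str, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node {
            tag: tag.to_string(),
            parent,
            children: Vec::new(),
        });
        if let Some(p) = parent {
            self.nodes[p.0 as usize].children.push(id);
        }
        id
    }

    /// Look up a node; returns `None` for an id that was never allocated.
    fn get(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(id.0 as usize)
    }
}
```

Other crates can store a `NodeId` in their own structures (for example as a map key) without borrowing from the arena.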

Phase-Based Mutation

The pipeline follows a strict phase model:

  1. Parse phase — HTML parser builds DOM tree (mutable)
  2. Style phase — Style engine reads DOM, produces StyledDocument (DOM is immutable)
  3. Layout phase — Layout engine reads styled DOM, produces layout tree
  4. Paint phase — Display list builder reads layout tree, produces draw commands
  5. Render phase — Rasterizer reads display list, produces pixels

Each phase reads the previous phase's output and produces new, stable output.
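One way to express this contract in Rust is through function signatures: each phase takes shared references to earlier outputs and returns a new owned value, so the compiler rejects mutation of an earlier phase's data. The toy below uses made-up types and rules (whitespace-separated tags, a fixed 20-unit height) purely to show the shape; it does not reflect the real crates' APIs.

```rust
/// Phase 1 output: a flat "DOM" where the vector index acts as the node id.
struct Dom {
    tags: Vec<String>,
}

/// Phase 2 output: one computed `display` value per node.
struct StyledDocument {
    display: Vec<&'static str>,
}

/// Phase 3 output: one box height per node.
struct LayoutTree {
    heights: Vec<u32>,
}

/// Parse phase: the only place the DOM is built.
fn parse(source: &str) -> Dom {
    Dom {
        tags: source.split_whitespace().map(|t| t.to_string()).collect(),
    }
}

/// Style phase: reads the DOM through `&Dom`, so it cannot modify it.
fn style(dom: &Dom) -> StyledDocument {
    let display: Vec<&'static str> = dom
        .tags
        .iter()
        .map(|tag| if tag.as_str() == "script" { "none" } else { "block" })
        .collect();
    StyledDocument { display }
}

/// Layout phase: reads DOM and styles, produces fresh geometry.
fn layout(dom: &Dom, styled: &StyledDocument) -> LayoutTree {
    let heights: Vec<u32> = dom
        .tags
        .iter()
        .zip(&styled.display)
        .map(|(_tag, display)| if *display == "none" { 0 } else { 20 })
        .collect();
    LayoutTree { heights }
}
```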

JavaScript Engine Architecture

┌─────────────────────────────────────────────────┐
│ js crate (facade)                               │
│   JsEngine: combines parser + VM                │
├─────────────────────────────────────────────────┤
│ js_vm crate                                     │
│   JsVm: runtime state machine                   │
│   bytecode::Compiler: AST → bytecode (Chunk)    │
│   bytecode_exec: bytecode execution loop         │
│   Environment: variable scoping                  │
│   JsValue: value representation                  │
│   HostEnvironment: host object binding           │
│   Runtime: builtins, coercion, property access   │
│   [deprecated] AST interpreter: statements.rs,  │
│     expressions/ (see note below)                │
├─────────────────────────────────────────────────┤
│ js_parser crate (parsing)                       │
│   JsParser: source → AST                         │
│   Tokenizer: source → tokens                     │
│   AST types: Program, Statement, Expression      │
└─────────────────────────────────────────────────┘
  • Bytecode pipeline: JsParser produces an AST, bytecode::Compiler compiles it to a Chunk (bytecode + constant pool), and bytecode_exec runs the bytecode. The old AST-walking interpreter (execute_program, execute_stmt, eval_expr) is deprecated. It still exists because call_function (used by the bytecode executor for function invocations) delegates to execute_function_body → execute_stmt. Once function calls are fully compiled to bytecode, the AST interpreter can be removed.
  • Configurable limits: VmConfig with max_statements, max_call_depth
  • Host bindings: HostEnvironment trait for injecting browser APIs
  • Regex: ECMAScript-compatible via the regress crate
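To make the AST → Chunk → execution flow concrete, here is a deliberately tiny sketch for arithmetic expressions. The `Expr`, `Op`, and `VmError` types and the `max_steps` parameter are hypothetical; `max_steps` only mimics the spirit of a limit such as `max_statements` and is not the real `VmConfig` behavior.

```rust
/// Tiny expression AST standing in for the parser's output.
enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Const(usize), // push constants[index]
    Add,          // pop two values, push their sum
    Mul,          // pop two values, push their product
}

/// Bytecode plus constant pool.
#[derive(Debug, Default)]
struct Chunk {
    code: Vec<Op>,
    constants: Vec<f64>,
}

/// Post-order walk: operands are emitted before their operator.
fn compile(expr: &Expr, chunk: &mut Chunk) {
    match expr {
        Expr::Num(n) => {
            chunk.constants.push(*n);
            let index = chunk.constants.len() - 1;
            chunk.code.push(Op::Const(index));
        }
        Expr::Add(lhs, rhs) => {
            compile(lhs, chunk);
            compile(rhs, chunk);
            chunk.code.push(Op::Add);
        }
        Expr::Mul(lhs, rhs) => {
            compile(lhs, chunk);
            compile(rhs, chunk);
            chunk.code.push(Op::Mul);
        }
    }
}

#[derive(Debug, PartialEq)]
enum VmError {
    StepLimitExceeded,
    StackUnderflow,
}

/// Stack-based execution loop with an instruction budget.
fn execute(chunk: &Chunk, max_steps: usize) -> Result<f64, VmError> {
    let mut stack: Vec<f64> = Vec::new();
    for (step, op) in chunk.code.iter().enumerate() {
        if step >= max_steps {
            return Err(VmError::StepLimitExceeded);
        }
        match *op {
            Op::Const(index) => stack.push(chunk.constants[index]),
            Op::Add => {
                let rhs = stack.pop().ok_or(VmError::StackUnderflow)?;
                let lhs = stack.pop().ok_or(VmError::StackUnderflow)?;
                stack.push(lhs + rhs);
            }
            Op::Mul => {
                let rhs = stack.pop().ok_or(VmError::StackUnderflow)?;
                let lhs = stack.pop().ok_or(VmError::StackUnderflow)?;
                stack.push(lhs * rhs);
            }
        }
    }
    stack.pop().ok_or(VmError::StackUnderflow)
}
```

Compiling `1 + 2 * 3` yields three `Const` instructions followed by `Mul` and `Add`, and executing the chunk leaves 7 on the stack.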

Web API Bridge

The web_api crate connects the JS engine to the DOM:

  • DOM Host Objects: window, document exposed as JS host objects
  • Event System: EventTarget, EventListenerRegistry, DispatchResult
  • Scheduling: TaskQueue (setTimeout/setInterval), MicrotaskQueue (Promise callbacks)
  • Promises: PromiseRegistry for async operation tracking
  • Script Execution: WebApiFacade::execute_script() as the unified entry point
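The interaction between the two queues can be sketched as follows, assuming the usual web ordering: run one task, then drain all microtasks before the next task. The `EventLoop` type and its methods are hypothetical and ignore timer delays; they are not the actual `TaskQueue` or `MicrotaskQueue` API.

```rust
use std::collections::VecDeque;

/// A queued callback. It receives the loop so it can enqueue more work.
type Job = Box<dyn FnOnce(&mut EventLoop)>;

struct EventLoop {
    tasks: VecDeque<Job>,
    microtasks: VecDeque<Job>,
    log: Vec<String>,
}

impl EventLoop {
    fn new() -> Self {
        EventLoop {
            tasks: VecDeque::new(),
            microtasks: VecDeque::new(),
            log: Vec::new(),
        }
    }

    fn queue_task(&mut self, job: impl FnOnce(&mut EventLoop) + 'static) {
        self.tasks.push_back(Box::new(job));
    }

    fn queue_microtask(&mut self, job: impl FnOnce(&mut EventLoop) + 'static) {
        self.microtasks.push_back(Box::new(job));
    }

    /// Run the oldest task, then drain microtasks, including any that
    /// are queued while draining. Returns false when no task was left.
    fn run_one_task(&mut self) -> bool {
        let task = match self.tasks.pop_front() {
            Some(task) => task,
            None => return false,
        };
        task(self);
        while let Some(job) = self.microtasks.pop_front() {
            job(self);
        }
        true
    }

    fn run_to_completion(&mut self) {
        while self.run_one_task() {}
    }
}
```

With this ordering, a microtask queued by the first task runs before the second task starts, which matches how Promise callbacks behave relative to timers.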

Threading Model

Main Thread (single-threaded):
  ├── JavaScript execution (bytecode VM)
  ├── DOM manipulation
  ├── Style computation
  ├── Layout
  ├── Display list generation
  ├── CPU rasterization
  └── Event dispatch

Worker Threads (I/O only):
  ├── Network requests (HTTP via ureq)
  └── Image decoding

The single-threaded model ensures determinism and simplifies reasoning about state. Worker threads are used only for operations that would block the main thread.
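A common way to implement this split with the standard library is a channel: workers do the blocking work and send results, while the main thread picks them up at a point of its choosing. The sketch below uses a fake fetch in place of real network I/O; `FetchResult`, `spawn_fetch`, and `drain_completed` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Result handed from a worker back to the main thread.
struct FetchResult {
    url: String,
    body: Vec<u8>,
}

/// Run a (fake) blocking fetch on a worker thread and send the result back.
fn spawn_fetch(url: String, tx: mpsc::Sender<FetchResult>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Stand-in for blocking I/O; a real worker would perform the request here.
        let body = format!("<html>{}</html>", url).into_bytes();
        // Ignore the error case where the main thread has already shut down.
        let _ = tx.send(FetchResult { url, body });
    })
}

/// Called from the main thread: collect finished results without blocking.
fn drain_completed(rx: &mpsc::Receiver<FetchResult>) -> Vec<FetchResult> {
    rx.try_iter().collect()
}
```

All DOM and layout state stays on the main thread; the worker only ever owns the URL and the bytes it produces.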

State Management

Navigation Lifecycle

BrowserRuntime
  └── BrowsingContext
        ├── NavigationState: Loading → ReceivingData → Complete | Failed
        ├── Current URL
        └── WebApiFacade
              ├── JS Engine
              ├── DOM Document
              ├── Event Listeners
              ├── Task Queue
              └── Microtask Queue
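The navigation states can be modeled as a small state machine. In this sketch the `NavEvent` type and the transition rules are assumptions; in particular, it assumes a failure can occur from either `Loading` or `ReceivingData`, which the diagram above does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NavigationState {
    Loading,
    ReceivingData,
    Complete,
    Failed,
}

/// Hypothetical events that drive a navigation forward.
#[derive(Debug, Clone, Copy)]
enum NavEvent {
    FirstByte,
    Finished,
    Error,
}

impl NavigationState {
    /// Return the next state. Events that make no sense in the current
    /// state leave it unchanged; `Complete` and `Failed` are terminal.
    fn on_event(self, event: NavEvent) -> NavigationState {
        match (self, event) {
            (NavigationState::Loading, NavEvent::FirstByte) => NavigationState::ReceivingData,
            (NavigationState::ReceivingData, NavEvent::Finished) => NavigationState::Complete,
            (NavigationState::Loading, NavEvent::Error)
            | (NavigationState::ReceivingData, NavEvent::Error) => NavigationState::Failed,
            (state, _) => state,
        }
    }
}
```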

Application State (app_browser)

  • AppState — mutable browser state (current page, scroll position, input focus)
  • BrowserChrome — UI chrome overlay (URL bar, status)
  • HitTest — Click target resolution from display list
  • FocusOutline — Keyboard focus visualization
  • FormState — Form input handling
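Click target resolution from a paint-ordered list usually means walking it back to front, so the item painted last (topmost) wins. The sketch below shows that idea with hypothetical `HitRegion` and `hit_test` names and a local stand-in for the shared `Rect` and `NodeId` types; the real `HitTest` may work differently.

```rust
/// Local stand-in for the shared rectangle type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: f32,
    y: f32,
    width: f32,
    height: f32,
}

impl Rect {
    /// Half-open containment: the left/top edges are inside, right/bottom are not.
    fn contains(&self, px: f32, py: f32) -> bool {
        px >= self.x && px < self.x + self.width && py >= self.y && py < self.y + self.height
    }
}

/// Local stand-in for the shared node id type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

/// A painted area together with the node that produced it.
struct HitRegion {
    rect: Rect,
    node: NodeId,
}

/// `regions` is in paint order (back to front), so search from the end.
fn hit_test(regions: &[HitRegion], x: f32, y: f32) -> Option<NodeId> {
    regions
        .iter()
        .rev()
        .find(|region| region.rect.contains(x, y))
        .map(|region| region.node)
}
```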

Platform Abstraction

Platform-specific code is isolated in crates/platform/:

  • Windowing: winit for cross-platform window creation
  • Pixel presentation: softbuffer for CPU-rendered pixel blitting
  • Event mapping: OS events → WindowEvent enum
  • Text input: IME state management

The platform and graphics crates are the only ones allowed to use unsafe code.

Testing Architecture

| Layer | Type | Location | Purpose |
|-------|------|----------|---------|
| Unit | #[cfg(test)] mod tests | Inline in crate source | Private API testing |
| Integration | tests/*.rs | Root tests/ directory | Cross-crate behavior |
| Golden | tests/goldens.rs | Golden fixtures | Layout/display list regression |
| JS Conformance | tests/js262_harness.rs | Test262 manifests | ECMAScript spec compliance |
| Web Platform | tests/wpt_harness.rs | WPT fixtures | CSS/HTML spec compliance |

Safety Guarantees

  1. unsafe forbidden globally — workspace lint unsafe_code = "forbid"
  2. Exceptions: platform/ and graphics/ crates only (via per-crate lint override)
  3. Enforced by CI: scripts/check_unsafe.sh audits unsafe usage
  4. Dependency layering: scripts/check_deps.sh prevents architecture violations
  5. License policy: deny.toml prevents license/advisory issues