tcptop/.planning/research/SUMMARY.md
2026-03-21 18:08:55 -04:00


Project Research Summary

Project: tcptop
Domain: Rust eBPF-based real-time network monitoring TUI (Linux + macOS)
Researched: 2026-03-21
Confidence: MEDIUM-HIGH

Executive Summary

tcptop is a top-style terminal monitoring tool for network connections — showing per-connection bytes, packets, process name, TCP state, and latency in a live-updating TUI. The recommended approach builds this as a Rust workspace using the Aya eBPF framework (Linux) for kernel-level connection tracing, Ratatui for the TUI, and a platform abstraction trait that gates the Linux eBPF backend behind cfg(target_os = "linux") while leaving room for a macOS backend via libpcap/PKTAP. The competitive landscape (BCC tcptop, nethogs, bandwhich, iftop) confirms the feature set and validates per-connection process attribution as the core differentiator. Per-connection TCP RTT/latency is the strongest unique feature — no existing CLI tool shows this live.

The recommended execution path is Linux-first, macOS-later. The eBPF data pipeline (kernel programs, ring buffer, userspace state table) is the critical path and the hardest engineering. The TUI and CSV logger are straightforward once the data pipeline exists. The single biggest architectural risk is failing to define the DataSource platform abstraction trait before writing any eBPF code — retrofitting it later is expensive and forces a major rewrite of the data pipeline. All other pitfalls (verifier rejection, toolchain fragility, kernel version compatibility) have known mitigations that must be applied at project setup, not after.

The macOS backend (libpcap/PKTAP) is explicitly a v2 concern: it has medium-confidence paths to implementation but depends on a different kernel model that cannot achieve full parity with eBPF. Design the trait to accommodate graceful degradation rather than requiring feature parity. Accept that macOS will show connection-level data without the same depth of TCP state machine visibility.

Key Findings

The stack is well-established with no real alternatives for the core components. Aya is the uncontested Rust eBPF library; Ratatui is the community standard for Rust TUIs; Tokio is required by Aya's async features. The only architectural complexity is the macOS backend, which requires libpcap + PKTAP (process-attributed packet capture) rather than eBPF, and involves userspace reconstruction of connection state from raw packet data.

The build system follows the Aya convention: a Cargo workspace with separate crates for eBPF kernel programs (tcptop-ebpf, targeting bpfel-unknown-none), shared types (tcptop-common, no_std), the userspace binary (tcptop), and an xtask build tool. Nightly Rust is required only for the eBPF crate; userspace code should be written to compile on stable.
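A minimal sketch of what the workspace root might look like. Crate names come from the summary above; whether tcptop-ebpf lives inside the workspace or is excluded and built separately via xtask varies between Aya templates, so treat the exclusion shown here as one common option, not a prescription.

```toml
# Workspace root Cargo.toml (illustrative layout, following the Aya convention)
[workspace]
resolver = "2"
members = ["tcptop", "tcptop-common", "xtask"]
# tcptop-ebpf targets bpfel-unknown-none on nightly and is often built
# separately (via xtask) so the main workspace stays on stable Rust.
exclude = ["tcptop-ebpf"]
```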

Core technologies:

  • Rust (nightly via rust-toolchain.toml): Language — nightly required for eBPF target; pin exact date immediately
  • Aya 0.13.x + aya-ebpf 0.1.x: eBPF framework — only viable pure-Rust eBPF option; no alternatives
  • Ratatui 0.30.x + Crossterm 0.28.x: TUI framework — community standard since tui-rs deprecation; immediate-mode rendering is ideal for high-frequency data updates
  • Tokio 1.43+ (LTS): Async runtime — required by Aya's async map polling; use tokio::select! to multiplex data events and TUI render ticks
  • clap 4.6.x: CLI argument parsing — de facto standard with derive macros
  • pcap 2.3.x (macOS): Packet capture via PKTAP — macOS has no eBPF; PKTAP embeds PID in packet headers making it the closest equivalent
  • sysinfo / procfs: Process metadata — cross-platform PID-to-name enrichment
  • csv 1.4.x + serde: Structured logging — CSV export for offline analysis

Expected Features

The feature research identified a clear three-tier priority structure. The competitive gap analysis confirms that combining per-connection granularity, process attribution, and live TCP RTT in a single tool is genuinely novel — none of BCC tcptop, nethogs, iftop, or bandwhich provides all three.

Must have (table stakes for v1):

  • Real-time per-connection table with bytes tx/rx, packets, PID, process name, TCP state — core product promise
  • Sort by any column (default: bandwidth descending) — unsortable tables are unusable for finding heavy hitters
  • Filter by port and PID via CLI flags (--port, --pid) — minimum viable filtering
  • Summary header (total connections, aggregate bandwidth) — context for the table
  • Configurable refresh interval (--interval, default 1s)
  • Graceful privilege error handling — must print a helpful message, not a Rust backtrace
  • --help via clap
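The privilege-error requirement above can be sketched as a small mapping from a permission failure to an actionable message. The helper name and the exact hint text are illustrative, not from the source; the point is that a PermissionDenied error from loading eBPF programs should surface as advice, never as a backtrace.

```rust
use std::io;

// Hypothetical helper: converts a permission failure (e.g. from eBPF program
// loading) into a user-facing hint instead of letting the error propagate.
fn privilege_error_message(err: &io::Error) -> Option<String> {
    if err.kind() == io::ErrorKind::PermissionDenied {
        Some(
            "tcptop: insufficient privileges to load eBPF programs.\n\
             Try: sudo tcptop, or grant CAP_BPF and CAP_PERFMON to the binary."
                .to_string(),
        )
    } else {
        None
    }
}

fn main() {
    let err = io::Error::new(io::ErrorKind::PermissionDenied, "EPERM");
    if let Some(msg) = privilege_error_message(&err) {
        eprintln!("{msg}");
        // The real binary would std::process::exit(1) here.
    }
}
```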

Should have (competitive differentiators, add after v1 validation):

  • TCP RTT/latency per connection — primary differentiator; no competing tool shows this live
  • Bandwidth rate display (KB/s rolling window) — more actionable than cumulative bytes
  • CSV logging mode (--log output.csv) — enables SRE scripting use case
  • Runtime filter/search (/ key) — power user productivity
  • UDP flow tracking (timeout-based tuple aggregation) — significant separate design problem
  • Reverse DNS with async caching and toggle (-n)
  • Pause/freeze display (p key)
  • Color-coded bandwidth cells
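The rolling-window bandwidth rate listed above can be computed from the cumulative byte counters the kernel side already produces. The sketch below is illustrative (the type and field names are not from the source): it keeps timestamped samples of a cumulative counter and reports the average rate over the window, assuming the counter is monotonically non-decreasing.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling-window rate tracker: stores (seconds, cumulative_bytes)
/// samples and reports the average bytes/sec over the retained window.
struct RateWindow {
    window_secs: f64,
    samples: VecDeque<(f64, u64)>, // (monotonic seconds, cumulative byte count)
}

impl RateWindow {
    fn new(window_secs: f64) -> Self {
        Self { window_secs, samples: VecDeque::new() }
    }

    /// Record the current cumulative counter; evict samples older than the window.
    fn push(&mut self, now_secs: f64, cumulative_bytes: u64) {
        self.samples.push_back((now_secs, cumulative_bytes));
        while let Some(&(t, _)) = self.samples.front() {
            if now_secs - t > self.window_secs {
                self.samples.pop_front();
            } else {
                break;
            }
        }
    }

    /// Bytes/sec over the window; None until at least two samples exist.
    fn rate(&self) -> Option<f64> {
        let (&(t0, b0), &(t1, b1)) = (self.samples.front()?, self.samples.back()?);
        let dt = t1 - t0;
        if dt <= 0.0 {
            return None;
        }
        Some((b1 - b0) as f64 / dt)
    }
}
```

On each render tick the TUI would call `push` with the latest counter and display `rate()` scaled to KB/s.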

Defer (v2+):

  • macOS backend — high complexity, different kernel model, lower fidelity
  • IPv6 support — BCC tcptop defers this too; implement IPv4-first
  • Interface selection (--interface) — multi-NIC eventually
  • JSON Lines output — if CSV proves insufficient
  • Distribution packaging (Homebrew, .deb/.rpm, cargo-install)

Anti-features to avoid:

  • Layer 7 / application protocol inspection (DPI scope creep)
  • UDP RTT estimation (meaningless without application-layer ACKs)
  • Built-in historical storage (CSV + external tools cover this)
  • Promiscuous mode sniffing (different tool category)

Architecture Approach

The architecture is a three-layer system: a platform abstraction layer with a DataSource trait and platform-specific backends, an application core layer with a connection state table (HashMap behind channel-based ownership) and filter/aggregation logic, and a UI layer with separate Ratatui TUI and CSV logger outputs both reading from the same state table. The entire data flow is async with Tokio: the platform backend runs in its own task and pushes ConnectionEvent structs through an mpsc channel; the main event loop uses tokio::select! to consume data events, keyboard input, and render ticks without coupling data rate to render rate. The eBPF ring buffer (BPF_MAP_TYPE_RINGBUF, kernel 5.8+) is preferred over the older perf buffer for 5x lower overhead.
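The trait boundary and channel-based ownership described above can be sketched as follows. This is a synchronous std analogue for self-containment; the real design uses an async trait, Tokio tasks, and tokio::sync::mpsc. All names (DataSource method signature, event fields, MockSource) are illustrative assumptions, not the project's actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Illustrative event; the real ConnectionEvent carries addresses, PID, state, etc.
struct ConnectionEvent {
    key: (u32, u16), // (IPv4 address, port) stand-in for a full connection key
    bytes_tx: u64,
}

// Platform abstraction: a backend only knows how to push events into a channel.
// eBPF-specific types stay behind this boundary and never reach business logic.
trait DataSource {
    fn run(self, tx: mpsc::Sender<ConnectionEvent>);
}

// Mock backend standing in for the Linux eBPF backend (or the macOS pcap one).
struct MockSource;
impl DataSource for MockSource {
    fn run(self, tx: mpsc::Sender<ConnectionEvent>) {
        for bytes in [100, 250] {
            tx.send(ConnectionEvent { key: (0x7f00_0001, 443), bytes_tx: bytes }).unwrap();
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || MockSource.run(tx)); // backend runs in its own task

    // The main loop owns the state table; backends write only via the channel.
    let mut state: HashMap<(u32, u16), u64> = HashMap::new();
    for ev in rx {
        *state.entry(ev.key).or_insert(0) += ev.bytes_tx;
    }
    println!("total tx for 127.0.0.1:443 = {}", state[&(0x7f00_0001, 443)]);
}
```

Because the TUI and logger only ever see `ConnectionEvent` and the state table, swapping MockSource for the eBPF or PKTAP backend changes nothing downstream.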

Major components:

  1. tcptop-common (shared no_std types): ConnectionEventRaw and protocol enums shared between kernel and userspace — must exist before anything else
  2. tcptop-ebpf (kernel programs): kprobes on tcp_sendmsg, tcp_recvmsg, tcp_v4_connect, tcp_close, udp_sendmsg, udp_recvmsg; writes to ring buffer
  3. platform/ (DataSource trait + backends): Linux eBPF backend and macOS pcap/PKTAP backend behind a single async trait; cfg(target_os) selects implementation at compile time
  4. state/ (connection state table): Single source of truth; HashMap<ConnectionKey, ConnectionStats>; owned by main event loop, written via channel from platform backend
  5. tui/ (Ratatui rendering): Sortable connection table, summary header, keyboard input handler — reads state table on each render tick
  6. logger.rs (CSV output): Independent output sink reading the same state snapshot as TUI on each tick
  7. filter.rs (filter engine): Predicate functions applied at render time, not at collection time (except kernel-side filtering for high-volume cases)
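Component 1 (tcptop-common) hinges on a kernel/userspace shared type with a stable layout. The sketch below shows the shape such a type might take; the field set and 32-byte size are illustrative assumptions, not the project's actual definition. `#[repr(C)]` plus explicit padding is what makes reinterpreting raw ring-buffer bytes on the userspace side safe.

```rust
// Illustrative tcptop-common style event struct. #[repr(C)] pins the layout so
// the no_std kernel crate and the userspace crate agree byte-for-byte.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct ConnectionEventRaw {
    pub saddr: u32,    // IPv4 source address (network byte order)
    pub daddr: u32,    // IPv4 destination address
    pub sport: u16,
    pub dport: u16,
    pub pid: u32,
    pub bytes: u64,
    pub kind: u8,      // send / recv / connect / close discriminant
    pub _pad: [u8; 7], // explicit padding keeps the size unambiguous
}

fn main() {
    // Userspace reads raw ring-buffer bytes and reinterprets them as this
    // struct, so both sides must agree on the size.
    println!("ConnectionEventRaw is {} bytes", core::mem::size_of::<ConnectionEventRaw>());
}
```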

Critical Pitfalls

  1. No platform abstraction from day one — Define the DataSource trait before writing any eBPF code. Retrofitting it costs a near-complete rewrite of the data pipeline. eBPF-specific types must never leak into business logic.

  2. eBPF verifier rejection — Keep eBPF programs minimal (one per hook point, under 200 lines, no iterators or complex match chains). The 512-byte stack limit and branch complexity limits are easy to hit in Rust. Test on the minimum target kernel, not just your dev machine.

  3. Nightly toolchain fragility — Pin the exact nightly date in rust-toolchain.toml immediately. Pin bpf-linker version in CI. Never use nightly channel without a date. A rustup update will break the build without a pin.

  4. Kernel version compatibility — Decide minimum supported kernel (recommend 5.8 for ring buffer support) at project start. Test on that kernel in CI. Check for BTF (/sys/kernel/btf/vmlinux) at startup and emit a specific error if missing.

  5. Missed events under load — Use ring buffer over perf buffer. Run data collection in a dedicated Tokio task separate from TUI rendering. Size ring buffer at 256KB (configurable). Implement periodic reconciliation to remove ghost connections. Expose a dropped-events counter.
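The periodic reconciliation mentioned in Pitfall 5 can be a simple retain-by-idle-time pass over the state table. The sketch below is illustrative (names and the tick-based clock are assumptions): if a close event was dropped under load, the entry's last-seen time stops advancing, and reconciliation evicts it after a timeout.

```rust
use std::collections::HashMap;

// Hypothetical per-connection stats; last_seen is a monotonic tick supplied
// by the caller (the real code would use Instant or a kernel timestamp).
struct ConnStats {
    bytes: u64,
    last_seen: u64,
}

/// Evict ghost connections: anything idle longer than `timeout` ticks is
/// assumed to have closed while its close event was lost.
fn reconcile(table: &mut HashMap<u64, ConnStats>, now: u64, timeout: u64) {
    table.retain(|_, s| now.saturating_sub(s.last_seen) <= timeout);
}

fn main() {
    let mut table = HashMap::new();
    table.insert(1, ConnStats { bytes: 10, last_seen: 100 }); // fresh
    table.insert(2, ConnStats { bytes: 99, last_seen: 10 });  // ghost: close lost
    reconcile(&mut table, 110, 60);
    println!("{} live connections after reconciliation", table.len());
}
```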

Implications for Roadmap

The architecture research explicitly derives a build order from component dependencies. This directly maps to a phase structure.

Phase 1: Foundation and Project Setup

Rationale: Toolchain pinning, workspace structure, and the platform abstraction trait must exist before any implementation code. The cost of skipping these is rewriting the data pipeline. Per Pitfalls research, the DataSource trait and nightly toolchain pin are Phase 1 requirements, not Phase 2 cleanup.

Delivers: Buildable workspace, pinned toolchain, DataSource trait interface, shared types in tcptop-common, CI green on Linux
Addresses: Privilege check UX, root error message, graceful shutdown via signal-hook
Avoids: Platform abstraction retrofitting (Pitfall 1), toolchain fragility (Pitfall 3)
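The toolchain pin called for here is a one-file change at the workspace root. The specific nightly date below is a placeholder, not a recommendation: pin whatever date the eBPF crate is actually verified against, and never use a bare "nightly" channel.

```toml
# rust-toolchain.toml at the workspace root.
# The date is an example only; use the nightly your eBPF crate builds with.
[toolchain]
channel = "nightly-2025-01-15"
components = ["rust-src", "rustfmt", "clippy"]
```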

Phase 2: eBPF Data Pipeline (Linux, TCP)

Rationale: The eBPF kernel programs and ring buffer communication are the highest-risk, most technically novel component. Nothing else can be validated until events flow from kernel to userspace. This is also where verifier rejection and kernel compatibility issues surface — better to face them with a simple program than a complex one.

Delivers: kprobes attached to TCP hooks, ConnectionEvent structs arriving in userspace via ring buffer, connection state table populated with real kernel data
Uses: Aya + aya-ebpf, Tokio async runtime, BPF_MAP_TYPE_RINGBUF
Implements: tcptop-ebpf kernel programs, platform/linux.rs backend, state/ connection table
Avoids: Verifier rejection (Pitfall 2), missed events (Pitfall 5), UDP modeling confusion (Pitfall 6)

Phase 3: TUI and Core UX

Rationale: Once the data pipeline produces real connection events, build the user-facing product. TUI can be developed against mock data in parallel with Phase 2, but integration happens here. This phase delivers the v1 MVP.

Delivers: Ratatui TUI with sortable connection table, summary header, keyboard controls (sort, quit, pause), CLI flags (--port, --pid, --interval), v1 MVP shippable on Linux
Implements: tui/ module, filter.rs, app.rs main event loop with tokio::select!
Avoids: TUI performance traps (uncapped FPS, sort on every frame, per-frame allocations), terminal restore on panic
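The event loop this phase implements multiplexes data events and render ticks so the frame rate never couples to the event rate. Below is a synchronous std analogue using recv_timeout; the real loop would use tokio::select! over an async channel and an interval timer, as described above. Everything here is a sketch under those assumptions.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<u64>();
    // Stand-in for the platform backend task pushing byte counts.
    thread::spawn(move || {
        for b in [10u64, 20, 30] {
            tx.send(b).unwrap();
        }
    });

    let tick = Duration::from_millis(50); // stand-in for --interval
    let mut total_bytes = 0u64;
    loop {
        match rx.recv_timeout(tick) {
            // Data event: update state only; never render per event.
            Ok(bytes) => total_bytes += bytes,
            // Timeout is the render tick: this is where the TUI would redraw.
            Err(mpsc::RecvTimeoutError::Timeout) => { /* draw one frame */ }
            // Backend gone: restore the terminal and exit the loop.
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
        }
    }
    println!("state absorbed {total_bytes} bytes");
}
```

The same shape in tokio::select! adds keyboard input as a third branch, which is why the async form is preferred in the actual design.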

Phase 4: Differentiators and Polish

Rationale: TCP RTT, bandwidth rates, and CSV logging are the features that separate tcptop from BCC tcptop and bandwhich. They are built on top of the working data pipeline from Phase 2. UDP tracking is included here as a separate subsystem with its own flow model.

Delivers: TCP RTT/latency per connection, KB/s rolling window bandwidth rates, CSV logging (--log), UDP flow tracking with timeout-based expiry, runtime search (/), reverse DNS with toggle, color-coded cells, connection age
Uses: Additional eBPF tracepoints for RTT measurement, csv + serde, async DNS resolution
Implements: logger.rs, RTT calculation in eBPF/userspace, UDP flow subsystem in state/
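The CSV logging path can be a pure function over the same state snapshot the TUI reads. The sketch below uses plain string formatting only to stay self-contained; the actual implementation would serialize with the csv + serde crates, which also handle quoting and escaping that this sketch ignores. Row fields are illustrative.

```rust
use std::fmt::Write;

// Illustrative snapshot row; the real row would mirror the connection table.
struct Row {
    ts: u64,
    pid: u32,
    proc_name: String,
    bytes_tx: u64,
    bytes_rx: u64,
}

/// Append one CSV line per connection for this tick, writing the header once.
/// (No quoting/escaping here; csv + serde would handle that in the real code.)
fn write_snapshot(out: &mut String, rows: &[Row]) {
    if out.is_empty() {
        out.push_str("timestamp,pid,process,bytes_tx,bytes_rx\n");
    }
    for r in rows {
        writeln!(out, "{},{},{},{},{}", r.ts, r.pid, r.proc_name, r.bytes_tx, r.bytes_rx)
            .unwrap();
    }
}

fn main() {
    let mut out = String::new();
    let rows = [Row { ts: 1, pid: 42, proc_name: "curl".into(), bytes_tx: 1024, bytes_rx: 4096 }];
    write_snapshot(&mut out, &rows);
    print!("{out}");
}
```

Keeping the logger a consumer of snapshots (rather than of raw events) is what lets it share the TUI's tick without a second data path.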

Phase 5: macOS Backend

Rationale: macOS is a v2 concern because it requires a fundamentally different kernel integration path (libpcap/PKTAP vs eBPF) and will provide lower-fidelity data. The platform abstraction from Phase 1 makes this possible without touching Phase 2-4 code. This phase validates that the trait boundary is truly platform-agnostic.

Delivers: Basic connection monitoring on macOS (process-attributed via PKTAP), cross-platform binary, reduced-feature macOS mode documented
Uses: pcap 2.3.x, PKTAP headers for PID attribution, proc_pidinfo for process names
Implements: platform/macos.rs backend
Avoids: macOS DTrace dead end (SIP restrictions), Network Extension Framework overkill

Phase 6: Distribution and Packaging

Rationale: Distribution is orthogonal to functionality and should come after the core product is stable and validated. Packaging decisions (Homebrew cask, .deb/.rpm, cargo-binstall) are informed by user demand.

Delivers: Homebrew formula (macOS), .deb/.rpm packages (Linux), man page, CONTRIBUTING.md with toolchain setup, release binary signing
Implements: Packaging scripts, GitHub release automation

Phase Ordering Rationale

  • Foundation before eBPF: The platform abstraction trait is cheap to define upfront and catastrophically expensive to retrofit. Toolchain pinning takes 10 minutes and prevents days of debugging.
  • eBPF before TUI: The TUI is straightforward once data flows. Inverting this order (building the TUI first with mock data, then wiring in eBPF) risks discovering too late that the data model doesn't match the display model.
  • TCP before UDP: UDP requires a separate conceptual model (flows with timeouts vs connections with state machines). Getting TCP right first means the UDP design has a clear analogy to work from.
  • Linux before macOS: eBPF on Linux is the core value proposition and the part with the most documentation. macOS is a nice-to-have that depends on the same abstraction layer — implement it only after that layer is proven.
  • Features before packaging: Packaging problems are solved problems; distribute only what users have validated.

Research Flags

Phases likely needing /gsd:research-phase during planning:

  • Phase 2 (eBPF Data Pipeline): eBPF kprobe hook selection and RTT measurement via tracepoints need detailed API research; kernel version compatibility matrix needs pinning to exact target; ring buffer sizing needs load-testing data
  • Phase 5 (macOS Backend): The PKTAP header format and the pcap crate's PKTAP integration are sparsely documented; libntstat private API risk needs assessment; the proc_pidinfo + socket FD iteration pattern needs validation

Phases with standard patterns (skip or minimize research-phase):

  • Phase 1 (Foundation): Aya workspace template is well-documented; toolchain pinning is mechanical
  • Phase 3 (TUI): Ratatui patterns are thoroughly documented; tokio::select! event loop is standard
  • Phase 6 (Distribution): Packaging is solved for Rust projects; follow established Homebrew/cargo-binstall patterns

Confidence Assessment

| Area | Confidence | Notes |
| --- | --- | --- |
| Stack | HIGH | Aya, Ratatui, Tokio, clap, csv are all undisputed choices with no real alternatives. pcap/PKTAP for macOS is MEDIUM due to sparse documentation of the PKTAP-specific API surface. |
| Features | HIGH | Competitor analysis is thorough; BCC tcptop, nethogs, bandwhich, iftop, ss all have public documentation. Feature gaps are clear. |
| Architecture | MEDIUM-HIGH | Linux path (eBPF + ring buffer + channel + Ratatui) is well-documented with reference implementations. macOS private API path (com.apple.network.statistics) is MEDIUM confidence — it is a private framework with risk of breaking across macOS versions. |
| Pitfalls | HIGH | All critical pitfalls are documented from real-world experience (verifier rejection, toolchain fragility, kernel version compat, missed events). Well-sourced. |

Overall confidence: MEDIUM-HIGH

Gaps to Address

  • macOS PKTAP API surface: The pcap crate's PKTAP-specific behavior (accessing the pktap_header struct for PID and process name) is not thoroughly documented in the crate's Rust API. Will require reading pcap crate source and testing against live macOS during Phase 5.
  • TCP RTT measurement via eBPF: The specific tracepoints or kprobes needed for per-connection RTT calculation need validation. tcp_rtt_estimator and tcp_ack hook points exist, but their stability and data availability need confirming during Phase 4.
  • Minimum kernel version decision: Choosing between kernel 5.4 (broad compatibility, perf buffer fallback) and 5.8 (ring buffer, simpler implementation) has product implications. Recommend 5.8 as minimum given it is from 2020 and covers all major LTS distributions in current use, but this needs explicit project decision.
  • bpf-linker LLVM compatibility on macOS: Cross-compiling eBPF on macOS for Linux targets requires LLVM version alignment. This is documented as fragile and needs a CI strategy (Linux runners for eBPF compilation, macOS runners for userspace).
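Whatever hook ultimately supplies raw RTT samples (the tcp_rtt_estimator vs tcp_ack question flagged above), the userspace smoothing is well understood: the classic RFC 6298 style exponentially weighted moving average, srtt = 7/8 · srtt + 1/8 · sample. The sketch below shows only that smoothing step; the function name and microsecond units are illustrative.

```rust
/// Classic SRTT update (RFC 6298 style EWMA): srtt = 7/8 * srtt + 1/8 * sample.
/// The kernel hook that produces `sample_us` is the open research question;
/// this sketch covers only the userspace smoothing.
fn update_srtt(srtt_us: Option<f64>, sample_us: f64) -> f64 {
    match srtt_us {
        None => sample_us,                        // first sample seeds the estimate
        Some(s) => 0.875 * s + 0.125 * sample_us, // 7/8 old + 1/8 new
    }
}

fn main() {
    let mut srtt = None;
    for sample in [40.0, 40.0, 80.0] {
        srtt = Some(update_srtt(srtt, sample));
    }
    // 40 -> 40 -> 0.875 * 40 + 0.125 * 80 = 45
    println!("srtt = {:.1} us", srtt.unwrap());
}
```

Smoothing in userspace keeps the eBPF program small (Pitfall 2): the kernel side only emits raw samples.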

Sources

Primary (HIGH confidence)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Apple Developer Forums - nettop alternatives — macOS private API landscape; needs validation during Phase 5 implementation
  • macOS com.apple.network.statistics private framework behavior — undocumented; may break across macOS versions

Research completed: 2026-03-21
Ready for roadmap: yes