Pitfalls Research
Domain: Rust eBPF network monitoring CLI (tcptop)
Researched: 2026-03-21
Confidence: HIGH (well-documented domain with many real-world projects to learn from)
Critical Pitfalls
Pitfall 1: macOS Has No eBPF -- Platform Abstraction Must Be Day-One Architecture
What goes wrong: Developers start building the eBPF data collection layer for Linux, get it working, then discover macOS support requires a completely different backend (DTrace, nstat, Network Extension Framework, or libpcap). The resulting code is tightly coupled to eBPF concepts, and retrofitting a platform abstraction layer requires rewriting most of the data pipeline.
Why it happens: eBPF is Linux-only. macOS uses DTrace for kernel tracing, but DTrace is severely limited by System Integrity Protection (SIP) -- even with sudo, SIP prevents tracing system binaries and most kernel probes. Apple's Network Extension framework requires Objective-C/Swift and entitlements. There is no clean equivalent to Linux eBPF on macOS.
How to avoid:
Define a DataSource trait from the very first line of code. The trait returns platform-agnostic connection records (source/dest IP, port, bytes, packets, state, PID). Linux implements it with eBPF via Aya. macOS implements it with a fallback: likely nettop/nstat parsing, libpcap, or lsof+netstat polling. Accept that macOS will have lower fidelity data -- this is a known tradeoff. Do NOT attempt to build a DTrace backend that requires users to disable SIP.
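A minimal sketch of that trait boundary, assuming std-only Rust. The fields follow the list above. `ConnectionRecord`, `Protocol`, `poll()`, `MockSource`, and `total_bytes()` are hypothetical names. A Linux eBPF backend and a macOS polling backend would be further implementations, each compiled behind `#[cfg(target_os = "linux")]` or `#[cfg(target_os = "macos")]`. They are not shown because they depend on Aya and platform APIs.

```rust
use std::net::IpAddr;

/// Transport protocol of a record. UDP entries are flows, not connections.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Tcp,
    Udp,
}

/// Platform-agnostic record (hypothetical): nothing here mentions eBPF maps or fds.
#[derive(Debug, Clone, PartialEq)]
pub struct ConnectionRecord {
    pub protocol: Protocol,
    pub src_ip: IpAddr,
    pub src_port: u16,
    pub dst_ip: IpAddr,
    pub dst_port: u16,
    pub bytes: u64,
    pub packets: u64,
    /// State label such as "ESTABLISHED"; None where it does not apply (UDP).
    pub state: Option<String>,
    /// None when the backend cannot attribute the socket to a process.
    pub pid: Option<u32>,
}

/// Every backend (eBPF on Linux, a lower-fidelity poller on macOS) implements this.
pub trait DataSource {
    /// Backend name, shown in the UI so users know which fidelity they are getting.
    fn name(&self) -> &str;
    /// Returns the records observed since the previous call.
    fn poll(&mut self) -> std::io::Result<Vec<ConnectionRecord>>;
}

/// In-memory backend (hypothetical) for tests that must run without root or a kernel.
pub struct MockSource {
    pub batches: Vec<Vec<ConnectionRecord>>,
}

impl DataSource for MockSource {
    fn name(&self) -> &str {
        "mock"
    }

    fn poll(&mut self) -> std::io::Result<Vec<ConnectionRecord>> {
        // Hand out the queued batches in order, then report "no new data".
        if self.batches.is_empty() {
            Ok(Vec::new())
        } else {
            Ok(self.batches.remove(0))
        }
    }
}

/// Core code depends only on the trait, never on a concrete backend.
pub fn total_bytes(source: &mut dyn DataSource) -> std::io::Result<u64> {
    Ok(source.poll()?.iter().map(|r| r.bytes).sum())
}
```

Because core code only sees `&mut dyn DataSource`, the mock backend also lets the aggregation and TUI layers be exercised in CI without root or a BPF-capable kernel.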
Warning signs:
- eBPF-specific types (map handles, program file descriptors) leaking into core data structures
- No `cfg(target_os)` gates in the first prototype
- "We'll add macOS later" appearing in planning discussions
Phase to address: Phase 1 (Architecture/Foundation). The trait boundary must exist before any eBPF code is written.
Pitfall 2: eBPF Verifier Rejection -- Programs That Compile Fine But the Kernel Refuses to Load
What goes wrong: The eBPF program compiles successfully with rustc/bpf-linker but the kernel verifier rejects it at load time. Common rejections: "program too complex" (exceeding instruction complexity limit), unbounded loops, stack overflow (512-byte limit), or accessing memory the verifier cannot prove is safe. This is the single most frustrating part of eBPF development.
Why it happens: The eBPF verifier simulates every possible execution path. A loop bounded by a 16-bit variable means the verifier checks up to 65,535 iterations. Branches inside loops multiply complexity exponentially. Rust's safe abstractions (Option, Result, match) generate more branches than equivalent C, hitting complexity limits faster. The 512-byte stack limit (256 with tail calls) means you cannot have large local structs.
How to avoid:
- Keep eBPF programs minimal: extract only the data needed (src/dst IP, port, bytes, PID) and push everything else to userspace
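- Use the `bpf_loop()` helper (kernel 5.17+) for iteration instead of open-coded for-loops; on older kernels, keep loops small with constant bounds (bounded loops are accepted from kernel 5.3)
- Avoid deep nesting; flatten conditionals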
- Use eBPF maps (HashMap, PerCpuHashMap) to store state instead of stack variables
- Test verifier acceptance on the oldest target kernel version, not just your dev machine
- Write small, focused programs (one per hook point) rather than monolithic programs
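One way to keep the stack budget visible is to put the kernel-to-userspace event in a small `#[repr(C)]` struct shared by both sides and assert its size at compile time. The layout and the 64-byte budget below are illustrative assumptions, not tcptop's actual format. `core::` paths are used so the same code can build in a `no_std` eBPF crate.

```rust
/// Hypothetical event shared by the eBPF program and userspace.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ConnEvent {
    pub saddr: [u8; 16], // hypothetical convention: IPv4 uses the first 4 bytes
    pub daddr: [u8; 16],
    pub sport: u16,
    pub dport: u16,
    pub pid: u32,
    pub bytes: u64,
    pub family: u16, // AF_INET (2) or AF_INET6 (10)
    // Explicit padding so no byte of the struct is left uninitialized.
    pub _pad: [u8; 6],
}

/// Hypothetical per-event budget, well under the 512-byte eBPF stack.
pub const MAX_EVENT_SIZE: usize = 64;

// These fail the build (instead of the verifier failing at load time) if the
// struct grows past the budget or picks up implicit padding.
const _: () = assert!(core::mem::size_of::<ConnEvent>() <= MAX_EVENT_SIZE);
const _: () = assert!(core::mem::size_of::<ConnEvent>() == 16 + 16 + 2 + 2 + 4 + 8 + 2 + 6);
```

The explicit `_pad` field matters because passing a stack struct with implicit, uninitialized padding to a map or output helper can make the verifier reject the program for reading uninitialized stack bytes.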
Warning signs:
- eBPF program source file exceeding 200 lines
- Using Rust iterators or complex pattern matching in eBPF code
- "Works on kernel 6.x but fails on 5.15"
- Stack variables larger than ~100 bytes
Phase to address: Phase 2 (eBPF implementation). Must be understood before writing the first kprobe/tracepoint.
Pitfall 3: Nightly Rust Toolchain and bpf-linker Fragility
What goes wrong: Aya requires Rust nightly for eBPF compilation (bpfel-unknown-none target). The bpf-linker requires a specific LLVM version matching the nightly toolchain. A routine rustup update breaks the build because bpf-linker is pinned to a specific LLVM version that no longer matches the new nightly. CI breaks. Contributors cannot build. Days are lost debugging toolchain issues.
Why it happens: eBPF compilation uses unstable Rust features (build-std=core, BPF target). bpf-linker links against LLVM and must match the LLVM version used by rustc. When rustc bumps its LLVM (e.g., from 20 to 21), bpf-linker may not have a matching release yet.
How to avoid:
- Pin the exact nightly date in `rust-toolchain.toml` (e.g., `channel = "nightly-2025-11-28"`)
- Pin the bpf-linker version in CI and document the exact install command
- Use a workspace with separate `xtask` build orchestration (the Aya template provides this pattern)
- Test nightly updates in a branch before merging to main
- Document the full toolchain setup in CONTRIBUTING.md
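The pinning rule can also be enforced mechanically, for example from an `xtask` or CI step. The sketch below is a hypothetical check that passes only when `rust-toolchain.toml` names a dated nightly. It does a naive line scan rather than real TOML parsing, so a project that already depends on a TOML parser would use that instead.

```rust
/// Returns true if `channel` looks like `nightly-YYYY-MM-DD`.
pub fn is_dated_nightly(channel: &str) -> bool {
    let Some(date) = channel.strip_prefix("nightly-") else {
        return false;
    };
    let parts: Vec<&str> = date.split('-').collect();
    parts.len() == 3
        && parts[0].len() == 4
        && parts[1].len() == 2
        && parts[2].len() == 2
        && parts.iter().all(|p| p.chars().all(|c| c.is_ascii_digit()))
}

/// Naive scan of a rust-toolchain.toml body for its `channel = "..."` line.
/// Not a TOML parser: it ignores sections and assumes a single channel key.
pub fn toolchain_is_pinned(toml_body: &str) -> bool {
    toml_body
        .lines()
        .map(str::trim)
        .find_map(|line| line.strip_prefix("channel"))
        .map(|rest| rest.trim().trim_start_matches('=').trim().trim_matches('"'))
        .map_or(false, is_dated_nightly)
}
```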
Warning signs:
- No `rust-toolchain.toml` in the repository
- `cargo install bpf-linker` without version pinning
- CI using `nightly` instead of `nightly-YYYY-MM-DD`
- "Works on my machine" but fails for other contributors
Phase to address: Phase 1 (Project setup). Pin toolchain before writing any code.
Pitfall 4: Kernel Version Compatibility -- What Works on Your Dev Machine Fails in Production
What goes wrong: eBPF features vary dramatically across kernel versions. Your program uses kfuncs, ring_buffer, or BTF features available on kernel 6.x but your users run Ubuntu 20.04 (kernel 5.4) or RHEL 8 (kernel 4.18). The program either fails to load or silently produces no data.
Why it happens: Key feature availability by kernel version:
- 4.18: Basic kprobes, tracepoints, perf_event_array
- 5.4: Kernel BTF exposed at /sys/kernel/btf/vmlinux (the usual prerequisite for CO-RE)
- 5.8: Ring buffer (BPF_MAP_TYPE_RINGBUF)
- 5.17: bpf_loop() helper, CO-RE improvements
- 6.x: Various kfuncs, improved verifier
Most developers work on recent kernels and never test on older ones.
How to avoid:
- Decide on a minimum kernel version early (recommend 5.4 for broad compatibility, or 5.8 if ring_buffer is needed)
- Use perf_event_array as fallback if ring_buffer is unavailable
- Check `/sys/kernel/btf/vmlinux` at startup; provide a clear error message if BTF is missing
- Use kprobes on stable kernel functions (tcp_v4_connect, tcp_set_state, tcp_sendmsg) rather than tracepoints that may not exist on older kernels
- Test in VMs with minimum supported kernel version as part of CI
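A sketch of the startup check, assuming the 5.4 minimum and the 5.8 ring buffer cutoff recommended above. `parse_release`, `check_kernel`, `check_running_kernel`, and `EventTransport` are hypothetical names. The release string is read from `/proc/sys/kernel/osrelease`, the same string `uname -r` prints. Version numbers are only a heuristic, since distributions such as RHEL backport eBPF features into older-numbered kernels; probing for a feature directly is more reliable where the loader supports it.

```rust
use std::path::Path;

/// Minimum supported kernel and the first kernel with BPF_MAP_TYPE_RINGBUF.
const MIN_KERNEL: (u32, u32) = (5, 4);
const RINGBUF_KERNEL: (u32, u32) = (5, 8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventTransport {
    RingBuf,
    PerfEventArray,
}

/// Parses a release such as "5.15.0-91-generic" into (5, 15).
pub fn parse_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Picks the event transport, or explains precisely why tcptop cannot run.
pub fn check_kernel(release: &str, btf_present: bool) -> Result<EventTransport, String> {
    let version = parse_release(release)
        .ok_or_else(|| format!("could not parse kernel release {release:?}"))?;
    if version < MIN_KERNEL {
        return Err(format!(
            "kernel {}.{} is older than the minimum supported {}.{}",
            version.0, version.1, MIN_KERNEL.0, MIN_KERNEL.1
        ));
    }
    if !btf_present {
        return Err("kernel BTF not found at /sys/kernel/btf/vmlinux; \
                    a kernel built with CONFIG_DEBUG_INFO_BTF=y is required"
            .to_string());
    }
    // Fall back to perf_event_array on 5.4-5.7.
    Ok(if version >= RINGBUF_KERNEL {
        EventTransport::RingBuf
    } else {
        EventTransport::PerfEventArray
    })
}

/// Startup wiring on a live Linux system.
pub fn check_running_kernel() -> Result<EventTransport, String> {
    let release = std::fs::read_to_string("/proc/sys/kernel/osrelease")
        .map_err(|e| format!("cannot read kernel release: {e}"))?;
    check_kernel(&release, Path::new("/sys/kernel/btf/vmlinux").exists())
}
```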
Warning signs:
- No minimum kernel version documented
- Using features without checking kernel availability
- Only testing on your development kernel
- No runtime feature detection at program startup
Phase to address: Phase 2 (eBPF implementation) for initial decisions; Phase 4 (Packaging/Distribution) for runtime detection and error messages.
Pitfall 5: Missed Events and Data Gaps Under High Connection Volume
What goes wrong: Under heavy network load (thousands of connections/second), the eBPF-to-userspace data pipeline drops events silently. The TUI shows stale data, ghost connections that never disappear, or missing connections. Users lose trust in the tool's accuracy.
Why it happens: Multiple failure modes:
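- Perf buffer overflow: Per-CPU buffers fill up and the kernel drops events; the only signal is a lost-event count that userspace must explicitly check
- Ring buffer overflow: The single shared buffer fills; bpf_ringbuf_reserve()/bpf_ringbuf_output() fail and the event is lost unless the eBPF program counts the failure
- kretprobe limit: Kernel limits active kretprobes to ~4,096 (kernel 6.4.5 default); excess invocations are missed silently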
- Race conditions: Connection closes between eBPF probe firing and userspace processing; "ghost" entries persist
- Userspace processing lag: TUI rendering blocks event consumption; events queue up and overflow
How to avoid:
- Use ring_buffer over perf_event_array (5x less overhead on multi-core systems; 7% vs 35%)
- Process events in a dedicated thread, separate from TUI rendering
- Implement periodic reconciliation: sweep connection map and remove entries for connections no longer in kernel state
- Monitor and expose a "dropped events" counter so users know when data is incomplete
- Size ring buffer appropriately (start with 256 KiB, make configurable; the size must be a power of two and a multiple of the page size)
- Use per-CPU hash maps for aggregation in kernel space; send summaries to userspace, not per-packet events
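The reconciliation and dropped-events advice can be sketched as a userspace table owned by the dedicated event thread. `ConnTable` and its methods are hypothetical. Two inputs are assumed rather than shown: the `live` set comes from an authoritative snapshot of kernel state (for example `/proc/net/tcp`, or iterating a kernel-side socket map), and the lost-event count comes from the transport (the perf buffer's lost count, or a counter the eBPF program increments when a ring buffer reservation fails).

```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

/// Hypothetical key: (local, remote) socket addresses.
pub type ConnKey = (SocketAddr, SocketAddr);

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ConnStats {
    pub bytes: u64,
    pub packets: u64,
}

#[derive(Default)]
pub struct ConnTable {
    conns: HashMap<ConnKey, ConnStats>,
    /// Running total of events the transport reported as lost; shown in the TUI.
    pub dropped_events: u64,
}

impl ConnTable {
    pub fn record(&mut self, key: ConnKey, bytes: u64) {
        let stats = self.conns.entry(key).or_default();
        stats.bytes += bytes;
        stats.packets += 1;
    }

    pub fn add_dropped(&mut self, lost: u64) {
        self.dropped_events += lost;
    }

    /// Periodic sweep: removes "ghost" entries whose close event was lost.
    /// Returns how many entries were removed.
    pub fn reconcile(&mut self, live: &HashSet<ConnKey>) -> usize {
        let before = self.conns.len();
        self.conns.retain(|key, _| live.contains(key));
        before - self.conns.len()
    }

    pub fn get(&self, key: &ConnKey) -> Option<&ConnStats> {
        self.conns.get(key)
    }

    pub fn len(&self) -> usize {
        self.conns.len()
    }

    pub fn is_empty(&self) -> bool {
        self.conns.is_empty()
    }
}
```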
Warning signs:
- Connection count in TUI never decreases
- CPU spikes correlate with network activity
- No "events dropped" metric anywhere in the codebase
- Single-threaded event loop handling both eBPF events and TUI rendering
Phase to address: Phase 2 (eBPF data pipeline) for architecture; Phase 3 (TUI) for thread separation.
Pitfall 6: UDP "Connection" Tracking Is a Fundamentally Different Problem
What goes wrong: Developers model UDP tracking the same way as TCP -- expecting connect/accept/close lifecycle events. But UDP is connectionless. There are no state transitions to hook. The tool either shows nothing for UDP or shows every single datagram as a separate "connection."
Why it happens: TCP has explicit state machine hooks (tcp_v4_connect, tcp_close, tcp_set_state). UDP has none -- it's fire-and-forget. The kernel does track UDP sockets, but there's no equivalent to TCP connection lifecycle.
How to avoid:
- Track UDP as "flows" not "connections": aggregate by (src_ip, src_port, dst_ip, dst_port) tuple
- Hook `udp_sendmsg` and `udp_recvmsg` (plus `udpv6_sendmsg`/`udpv6_recvmsg` for IPv6) for byte/packet counting
- Implement flow timeout: if no packets seen for N seconds (configurable, default 30s), mark flow as expired
- Display UDP flows separately or with a distinct state indicator ("ACTIVE" / "IDLE" / "EXPIRED")
- Do NOT promise RTT/latency for UDP -- it's meaningless without application-layer protocol knowledge (the project already notes this in PROJECT.md)
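A sketch of UDP flow tracking with the ACTIVE/IDLE/EXPIRED states suggested above, assuming flows are keyed by the 4-tuple and timestamps are passed in explicitly, which keeps the logic testable. The 30 s expiry follows the suggested default; the idle threshold and all type names are hypothetical.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// (local, remote) UDP 4-tuple.
pub type FlowKey = (SocketAddr, SocketAddr);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlowState {
    Active,
    Idle,
    Expired,
}

struct Flow {
    bytes: u64,
    packets: u64,
    last_seen: Instant,
}

pub struct UdpFlows {
    flows: HashMap<FlowKey, Flow>,
    idle_after: Duration,   // hypothetical threshold for the IDLE label
    expire_after: Duration, // e.g. the suggested 30 s default
}

impl UdpFlows {
    pub fn new(idle_after: Duration, expire_after: Duration) -> Self {
        Self { flows: HashMap::new(), idle_after, expire_after }
    }

    /// Called per udp_sendmsg/udp_recvmsg observation (or per kernel-side summary).
    pub fn observe(&mut self, key: FlowKey, bytes: u64, now: Instant) {
        let flow = self.flows.entry(key).or_insert(Flow { bytes: 0, packets: 0, last_seen: now });
        flow.bytes += bytes;
        flow.packets += 1;
        flow.last_seen = now;
    }

    /// State is derived purely from how long the flow has been quiet.
    pub fn state(&self, key: &FlowKey, now: Instant) -> Option<FlowState> {
        let quiet = now.saturating_duration_since(self.flows.get(key)?.last_seen);
        Some(if quiet >= self.expire_after {
            FlowState::Expired
        } else if quiet >= self.idle_after {
            FlowState::Idle
        } else {
            FlowState::Active
        })
    }

    /// Removes expired flows so the table cannot grow forever; returns the count removed.
    pub fn sweep(&mut self, now: Instant) -> usize {
        let before = self.flows.len();
        let expire_after = self.expire_after;
        self.flows.retain(|_, f| now.saturating_duration_since(f.last_seen) < expire_after);
        before - self.flows.len()
    }

    /// (bytes, packets) for a flow, if it is still tracked.
    pub fn totals(&self, key: &FlowKey) -> Option<(u64, u64)> {
        self.flows.get(key).map(|f| (f.bytes, f.packets))
    }
}
```

There is no state machine here at all, only `last_seen`; that is the practical difference from TCP tracking.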
Warning signs:
- Shared data structures between TCP and UDP tracking with a "state" field that doesn't apply to UDP
- No timeout/expiry mechanism for UDP entries
- Attempting to estimate UDP RTT without acknowledging it's heuristic at best
Phase to address: Phase 2 (eBPF implementation). Design UDP flow tracking as a separate subsystem from TCP state tracking.
Technical Debt Patterns
| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|---|---|---|---|
| Hardcoding eBPF as the only backend | Faster initial development | Cannot support macOS; testing requires root/kernel | Never -- trait abstraction is cheap upfront |
| Polling /proc/net/tcp instead of eBPF | Works without root, no kernel dependency | Misses short-lived connections, high overhead, no per-packet stats | MVP prototype only, replace before v0.1 |
| Single-threaded event loop | Simpler code, no synchronization | TUI blocks event processing; dropped events under load | Never for production; acceptable for proof-of-concept |
| Using perf_event_array when ring_buffer is available | Works on kernel 5.4+ | 5x higher overhead on multi-core; event ordering issues | Only as fallback for kernel < 5.8 |
| Embedding eBPF bytecode at compile time | No runtime compilation needed | Locked to one kernel version's struct layouts without CO-RE | Acceptable if using CO-RE/BTF for portability |
| Skipping process (PID) resolution | Simpler eBPF programs | Major missing feature; users expect to know which process owns a connection | MVP only; add in same phase as eBPF work |
Integration Gotchas
| Integration | Common Mistake | Correct Approach |
|---|---|---|
| Aya eBPF map access | Using HashMap when data is updated from multiple CPUs causing lock contention | Use PerCpuHashMap for counters; aggregate in userspace |
| Kernel kprobes | Hooking internal kernel functions that get renamed/removed across versions | Hook stable exported functions: tcp_v4_connect, tcp_v6_connect, tcp_close, tcp_set_state, tcp_sendmsg, tcp_recvmsg |
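| Process info from eBPF | Calling bpf_get_current_pid_tgid() in network hooks that can run in softirq context, where the "current" task is unrelated to the socket (often the idle task, PID 0) | Accept that some connections will have PID 0; capture the PID in process-context hooks (connect, sendmsg) and store it in a map keyed by socket pointer, falling back to the owning UID (sk->sk_uid) or /proc |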
| Terminal raw mode | Not restoring terminal on panic/crash | Use scopeguard or custom panic handler to restore terminal state before exit |
| CSV logging | Flushing on every write | Buffer writes, flush on interval or signal; use BufWriter with periodic flush |
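Terminal restoration has to cover both normal exits (including `?` errors returned from `main`) and panics. A std-only sketch, assuming a `restore` closure that undoes your terminal setup; with crossterm that would usually mean leaving the alternate screen and disabling raw mode, which is not shown. `install_restore_on_panic` and `TerminalGuard` are hypothetical names. The panic hook still runs when the binary is built with `panic = "abort"`, where `Drop` guards do not, and `restore` may run twice on a panic, so it should be idempotent.

```rust
use std::panic;

/// Chains a terminal-restore step in front of the existing panic hook, so the
/// panic message is printed to a usable (cooked-mode) terminal.
pub fn install_restore_on_panic<F>(restore: F)
where
    F: Fn() + Send + Sync + 'static,
{
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        restore();
        previous(info);
    }));
}

/// RAII guard for the normal exit path: `restore` runs when the guard is dropped,
/// including while unwinding.
pub struct TerminalGuard<F: FnMut()> {
    restore: F,
}

impl<F: FnMut()> TerminalGuard<F> {
    pub fn new(restore: F) -> Self {
        Self { restore }
    }
}

impl<F: FnMut()> Drop for TerminalGuard<F> {
    fn drop(&mut self) {
        (self.restore)();
    }
}
```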
Performance Traps
| Trap | Symptoms | Prevention | When It Breaks |
|---|---|---|---|
| TUI rendering at uncapped frame rate | 50%+ CPU on a single core; fan spin | Cap at 4 FPS for a monitoring tool; use event-driven rendering (only redraw on new data or user input) | Immediately in debug builds; visible in release at 60 FPS |
| Per-packet events from eBPF to userspace | Ring buffer overflow; massive CPU in userspace parsing | Aggregate in-kernel using PerCpuHashMap; send periodic summaries or only state-change events | Above ~10k packets/second |
| Sorting the connection table on every frame | Noticeable lag with 1000+ connections | Sort only when sort column changes or on a timer (every 1s); use incremental insert for new entries | Above ~500 connections |
| String formatting for every connection row | Allocations per frame per row | Pre-allocate row buffers; use write! into reusable String buffers | Above ~200 connections at 4+ FPS |
| DNS reverse lookup for every IP | Blocks rendering; DNS timeout stalls TUI | Cache resolved names; resolve asynchronously; display IP immediately, update with name when ready | Any network with DNS latency > 50ms |
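Two of the traps above, uncapped redraws and per-row allocation, can be handled in a few lines. A sketch assuming a single render loop that polls `should_draw` each tick; `FrameGate` and `format_row` are hypothetical names and the column widths are arbitrary.

```rust
use std::fmt::Write;
use std::time::{Duration, Instant};

/// Event-driven redraw with a frame-rate cap: draw only when something changed
/// and at least `min_interval` has passed since the previous draw.
pub struct FrameGate {
    min_interval: Duration,
    last_draw: Option<Instant>,
    dirty: bool,
}

impl FrameGate {
    /// 4 FPS gives 250 ms between frames.
    pub fn with_fps(fps: u32) -> Self {
        Self {
            min_interval: Duration::from_millis(1000 / u64::from(fps.max(1))),
            last_draw: None,
            dirty: true,
        }
    }

    /// Call on new data, key presses, or resize events.
    pub fn mark_dirty(&mut self) {
        self.dirty = true;
    }

    /// Returns true (and records the draw) if a frame should be rendered now.
    pub fn should_draw(&mut self, now: Instant) -> bool {
        let due = match self.last_draw {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.min_interval,
        };
        if self.dirty && due {
            self.dirty = false;
            self.last_draw = Some(now);
            true
        } else {
            false
        }
    }
}

/// Formats one table row into a caller-owned buffer, reusing its allocation
/// across frames instead of building fresh Strings per cell.
pub fn format_row(buf: &mut String, pid: u32, remote: &str, rx_bytes: u64, tx_bytes: u64) {
    buf.clear();
    // Writing into a String cannot fail, so the fmt::Result is ignored.
    let _ = write!(buf, "{:>7} {:<40} {:>12} {:>12}", pid, remote, rx_bytes, tx_bytes);
}
```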
Security Mistakes
| Mistake | Risk | Prevention |
|---|---|---|
| Not dropping privileges after eBPF program is loaded | Running entire application as root unnecessarily | Load eBPF programs, then drop to the original user (setgid, then setuid); only the loader needs CAP_BPF + CAP_PERFMON for kprobes/tracepoints (CAP_NET_ADMIN only for networking program types such as XDP/TC) |
| Logging sensitive connection data to CSV without warning | User inadvertently captures connection metadata to a world-readable file | Default CSV to mode 0600; warn in --help that CSV contains network metadata |
| Shipping pre-compiled eBPF bytecode without signing | Supply chain risk if bytecode is modified | Use include_bytes! to embed bytecode in the binary; sign release binaries |
| Not validating eBPF map data in userspace | Corrupted/malicious map data could cause crashes | Validate all data read from eBPF maps; handle malformed entries gracefully |
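A sketch of a CSV sink that applies both the security and integration advice: the file is created with mode 0600 and the sink refuses to reuse an existing file, and rows go through a `BufWriter` that is flushed on an interval rather than on every write. This assumes a Unix target (`OpenOptionsExt::mode`), a hypothetical column set, and that addresses never contain commas, so no CSV quoting is done. `CsvLog` is a hypothetical name.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;
use std::time::{Duration, Instant};

/// CSV sink: created owner-only (0600) and flushed on an interval, not per row.
pub struct CsvLog {
    out: BufWriter<File>,
    flush_every: Duration,
    last_flush: Instant,
}

impl CsvLog {
    pub fn create(path: &Path, flush_every: Duration) -> io::Result<Self> {
        let file = OpenOptions::new()
            .write(true)
            .create_new(true) // refuse to reuse an existing, possibly world-readable file
            .mode(0o600) // applied at creation time, further masked by the umask
            .open(path)?;
        let mut out = BufWriter::new(file);
        writeln!(out, "timestamp,pid,src,dst,bytes")?;
        Ok(Self { out, flush_every, last_flush: Instant::now() })
    }

    pub fn write_row(&mut self, ts: u64, pid: u32, src: &str, dst: &str, bytes: u64) -> io::Result<()> {
        writeln!(self.out, "{},{},{},{},{}", ts, pid, src, dst, bytes)?;
        if self.last_flush.elapsed() >= self.flush_every {
            self.flush()?;
        }
        Ok(())
    }

    /// Also call this from the shutdown/signal path so buffered rows are not lost.
    pub fn flush(&mut self) -> io::Result<()> {
        self.last_flush = Instant::now();
        self.out.flush()
    }
}
```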
UX Pitfalls
| Pitfall | User Impact | Better Approach |
|---|---|---|
| Cryptic error when not running as root | User sees "EPERM" or "operation not permitted" | Detect missing capabilities at startup; print: "tcptop requires root privileges. Run with: sudo tcptop" |
| No indication of kernel incompatibility | Program silently shows no data | Check kernel version and BTF availability at startup; print specific guidance |
| Showing raw IP addresses only | Hard to identify connections at a glance | Async DNS resolution with IP shown immediately and hostname replacing it when resolved |
| Overwhelming amount of data with no default filter | Too many connections to read | Default to showing top 50 connections by bandwidth; allow expand with keybinding |
| No visual indication of sort column | User doesn't know how data is ordered | Highlight active sort column header; show sort direction arrow |
| Screen flicker on resize | Jarring visual experience | Handle SIGWINCH; debounce resize events; clear and redraw once |
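The first row can be implemented as a capability check before any eBPF loading, so the user sees a one-line message instead of `EPERM` or a backtrace. This std-only sketch reads the `CapEff` mask from `/proc/self/status`. It assumes kprobe/tracepoint programs need CAP_BPF plus CAP_PERFMON (kernel 5.8+) or CAP_SYS_ADMIN, and that a loader error can be reduced to an `io::Error`. All function names are hypothetical.

```rust
use std::io;

// Linux capability bit numbers.
const CAP_SYS_ADMIN: u32 = 21;
const CAP_PERFMON: u32 = 38;
const CAP_BPF: u32 = 39;

pub const NEED_ROOT: &str = "tcptop requires root privileges. Run with: sudo tcptop";

/// Extracts the effective capability mask from the text of /proc/self/status.
pub fn effective_caps(status: &str) -> Option<u64> {
    let hex = status.lines().find_map(|l| l.strip_prefix("CapEff:"))?.trim();
    u64::from_str_radix(hex, 16).ok()
}

/// True if the mask allows loading and attaching tracing programs:
/// CAP_BPF + CAP_PERFMON on 5.8+, or CAP_SYS_ADMIN (which also covers older kernels).
pub fn can_load_tracing(caps: u64) -> bool {
    let has = |bit: u32| caps & (1u64 << bit) != 0;
    (has(CAP_BPF) && has(CAP_PERFMON)) || has(CAP_SYS_ADMIN)
}

/// Startup check on Linux; returns the user-facing message instead of a backtrace.
pub fn check_privileges() -> Result<(), String> {
    let status = std::fs::read_to_string("/proc/self/status").map_err(|e| e.to_string())?;
    match effective_caps(&status) {
        Some(caps) if can_load_tracing(caps) => Ok(()),
        _ => Err(NEED_ROOT.to_string()),
    }
}

/// Fallback for permission errors that still surface at load time (EPERM/EACCES).
pub fn explain_load_error(err: &io::Error) -> String {
    if err.kind() == io::ErrorKind::PermissionDenied {
        NEED_ROOT.to_string()
    } else {
        format!("failed to load eBPF programs: {err}")
    }
}
```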
"Looks Done But Isn't" Checklist
- eBPF loading: Often missing graceful fallback when BTF is unavailable -- verify behavior on a kernel without `/sys/kernel/btf/vmlinux`
- Connection tracking: Often missing cleanup of terminated connections -- verify connection count returns to 0 after all connections close
- IPv6 support: Often hardcoded to IPv4 only -- verify `tcp_v6_connect` and IPv6 address display work
- Short-lived connections: Often missed entirely -- verify a `curl` request appears and disappears in the TUI
- Process name resolution: Often shows PID only or empty for kernel threads -- verify graceful handling of PID 0 / kernel threads
- Terminal restore: Often broken on Ctrl+C or panic -- verify terminal is usable after abnormal exit
- Privilege error message: Often just prints a Rust backtrace -- verify clean error message when run without sudo
- macOS backend: Often "planned" but never built -- verify basic functionality on macOS even if lower fidelity
- UDP flow expiry: Often accumulates forever -- verify UDP entries disappear after timeout period
Recovery Strategies
| Pitfall | Recovery Cost | Recovery Steps |
|---|---|---|
| No platform abstraction (Pitfall 1) | HIGH | Introduce trait; refactor all eBPF calls behind it; significant rewrite of data pipeline |
| Verifier rejection (Pitfall 2) | MEDIUM | Simplify eBPF program; split into smaller programs; move logic to userspace |
| Toolchain breakage (Pitfall 3) | LOW | Pin toolchain to last known working nightly; update bpf-linker to match |
| Kernel version incompatibility (Pitfall 4) | MEDIUM | Add runtime feature detection; implement fallback paths for older kernels |
| Missed events (Pitfall 5) | MEDIUM | Switch to ring_buffer; add dedicated event thread; implement reconciliation |
| UDP tracking confusion (Pitfall 6) | MEDIUM | Separate UDP into distinct flow-tracking subsystem with timeout |
Pitfall-to-Phase Mapping
| Pitfall | Prevention Phase | Verification |
|---|---|---|
| No platform abstraction | Phase 1 (Foundation) | DataSource trait exists; eBPF is one implementation behind it |
| Verifier rejection | Phase 2 (eBPF) | All eBPF programs load on minimum target kernel version in CI |
| Toolchain fragility | Phase 1 (Foundation) | rust-toolchain.toml pinned; CI builds green; CONTRIBUTING.md documents setup |
| Kernel version compat | Phase 2 (eBPF) + Phase 4 (Distribution) | Tested on kernel 5.4 and latest; startup prints clear error on unsupported kernel |
| Missed events | Phase 2 (Data pipeline) + Phase 3 (TUI) | Load test with 10k connections; dropped event counter stays at 0 |
| UDP flow tracking | Phase 2 (eBPF) | UDP flows appear with activity and expire after timeout |
| TUI performance | Phase 3 (TUI) | CPU usage < 5% at idle with 500 connections displayed |
| Privilege handling | Phase 1 (Foundation) | Running without sudo prints helpful error; running with sudo drops privileges after eBPF load |
| macOS support | Phase 1 (trait) + Phase 4 (macOS backend) | cargo build succeeds on macOS; basic connection listing works |
Sources
- Aya eBPF library - GitHub
- eBPF with Rust using Aya
- Why Does My eBPF Program Work on One Kernel but Fail on Another?
- BCC kernel version compatibility matrix
- eBPF Ring Buffer vs Perf Buffer
- Ratatui CPU usage discussion
- Ratatui high CPU issue #1338
- Pitfalls of relying on eBPF for security monitoring - Trail of Bits
- eBPF verifier documentation
- Groundcover eBPF verifier guide
- Misadventures in DTrace on macOS
- macOS SIP and DTrace limitations
- bpf-linker - GitHub
- The Challenge with Deploying eBPF Into the Wild - Pixie Labs
- Correlating Network Events and Process Context using eBPF Maps
- eBPF TCP connection state tracking tutorial
Pitfalls research for: Rust eBPF network monitoring CLI (tcptop) Researched: 2026-03-21