Zachary D. Rowitsch 38e6dcc34a chore: archive v1.0 phase directories to milestones/v1.0-phases/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:33:15 -04:00

# Phase 1: Data Pipeline - Research
**Researched:** 2026-03-21
**Domain:** Linux eBPF kernel tracing, TCP/UDP connection monitoring, Aya Rust eBPF framework
**Confidence:** HIGH
## Summary
Phase 1 implements the core data pipeline: eBPF programs in the Linux kernel capture TCP and UDP events (connections, bytes, packets, state changes, RTT), push them through a ring buffer to userspace, where a Rust application aggregates per-connection statistics and streams them to stdout. This is a greenfield Aya eBPF project following the standard workspace template (ebpf crate, common crate, userspace crate).
The eBPF hook strategy combines kprobes for byte/packet counting (`tcp_sendmsg`, `tcp_recvmsg`, `udp_sendmsg`, `udp_recvmsg`) with the `sock:inet_sock_set_state` tracepoint for TCP state transitions and reading `srtt_us` from the sock struct for RTT. Pre-existing connections are bootstrapped from `/proc/net/tcp` and `/proc/net/udp` on startup. All kernel-to-userspace communication uses a single `RingBuf` map (requires kernel 5.8+).
The platform abstraction trait (`NetworkCollector`) is defined in this phase to enable the macOS backend in Phase 4, but only the Linux/eBPF implementation is built. The proof-of-life output is streaming lines to stdout -- no TUI, no cursor manipulation.
**Primary recommendation:** Use the Aya template workspace structure with `aya-build` in `build.rs` (not xtask). Attach to 4 kprobes + 1 tracepoint, use a single RingBuf for all events, and keep the common crate's event enum as the single shared type between kernel and userspace.
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- **D-01:** Streaming lines to stdout -- each connection update prints a new line (like `tail -f`). No screen clearing or cursor manipulation.
- **D-02:** Human-readable sizes by default (`1.2 MB`, `340 KB/s`). Show raw value alongside when it fits cleanly without noise (e.g., `1258291 (1.2M)`). If too noisy, drop to human-readable only.
- **D-03:** Full detail per connection -- all fields: local/remote addr+port, PID, process name, TCP state, bytes in/out, packets in/out, RTT, bandwidth rate.
- **D-04:** Output is assumed throwaway scaffolding. Build it simple; decide whether to keep as `--batch` mode when TUI lands in Phase 2.
- **D-05:** Full 4-tuple grouping -- (src IP, src port, dst IP, dst port) = one flow for UDP.
- **D-06:** 5-second idle timeout -- UDP flows disappear 5s after last packet. Tunable via flag deferred.
- **D-07:** No synthesized state for UDP -- show `-` or `UDP` in the state column.
- **D-08:** Flat bidirectional flow tracking -- count bytes/packets in each direction, no request/response inference.
- **D-09:** Minimal error message on missing privileges: `error: tcptop requires root privileges. Run with sudo.`
- **D-10:** If Linux capabilities (`CAP_BPF`, `CAP_PERFMON`) are present, proceed without root. Don't suggest capabilities in error message.
- **D-11:** Exit code 77 on privilege failure.
- **D-12:** Closed TCP connections linger for one display/refresh cycle, then are removed.
- **D-13:** Visual distinction for new/closing connections -- exact styling deferred to Phase 2.
- **D-14:** Connection close events print `[CLOSED] 192.168.1.1:443 -> ...` in streaming output.
- **D-15:** Pre-existing connections sourced from `/proc/net/tcp` on startup, marked as partial.
### Claude's Discretion
- eBPF hook point selection (kprobes vs tracepoints)
- Platform abstraction trait design and boundaries
- Ring buffer vs perf event array for kernel-to-userspace transport
- RTT estimation implementation approach
- Exact format/layout of streaming output lines
- `/proc/net/tcp` parsing strategy for pre-existing connections
- Internal data structures and concurrency model
### Deferred Ideas (OUT OF SCOPE)
- UDP flow idle timeout user-configurable via CLI flag -- Phase 2 or 3
- Protocol hint column for well-known UDP ports -- future enhancement
- Toggle between human-readable and raw byte display via keypress -- Phase 2 TUI
- `--batch` or `--once` mode -- evaluate after Phase 2
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| DATA-01 | Per-connection byte counts (sent/received) in real time | kprobes on `tcp_sendmsg`/`tcp_recvmsg`/`udp_sendmsg`/`udp_recvmsg` capture byte counts from msg length args |
| DATA-02 | Per-connection packet counts (sent/received) in real time | Same kprobes increment packet counters per event |
| DATA-03 | TCP connection state (ESTABLISHED, LISTEN, TIME_WAIT, etc.) | `sock:inet_sock_set_state` tracepoint fires on every state transition with old+new state |
| DATA-04 | Correlate connection to owning process (PID, process name) | `bpf_get_current_pid_tgid()` in kprobes captures PID; process name via `bpf_get_current_comm()` or userspace `sysinfo`/`procfs` |
| DATA-05 | Per-connection TCP RTT estimate | Read `srtt_us` from `struct tcp_sock` in kprobe context (shifted right by 3 for microseconds) |
| DATA-06 | Bandwidth rates (KB/s or MB/s) per connection | Userspace calculation: delta bytes / delta time between refresh cycles |
| DATA-07 | Track TCP and UDP (UDP flows synthesized from 4-tuple with idle timeout) | UDP kprobes + userspace HashMap keyed by 4-tuple with 5s idle expiry |
| PLAT-01 | Works on Linux kernel 5.8+ using eBPF | RingBuf requires 5.8+; all hook points available in 5.8+; Aya handles BTF portability |
| PLAT-03 | Platform abstraction allows different backends | `NetworkCollector` trait with async `Stream` of connection events; Linux impl in this phase, macOS impl in Phase 4 |
| OPS-01 | Detect missing root/elevated privileges with clear error | Check `geteuid() == 0` or `CAP_BPF`+`CAP_PERFMON` capabilities before eBPF load |
| OPS-02 | Low overhead on host (no heavy polling, eBPF ring buffer based) | RingBuf is event-driven (no polling); kprobes fire only on actual network calls |
</phase_requirements>
## Standard Stack
### Core (Phase 1 Dependencies)
| Library | Version | Purpose | Verified |
|---------|---------|---------|----------|
| aya | 0.13.1 | eBPF loader, map access, program attachment | crates.io 2025-11-17 |
| aya-ebpf | 0.1.1 | Kernel-side eBPF macros (`#[kprobe]`, `#[tracepoint]`) | crates.io 2025-11-17 |
| aya-log | 0.2.1 | Userspace eBPF log receiver | crates.io 2024-10-09 |
| aya-log-ebpf | 0.1.0 | Kernel-side eBPF logging (`info!`, `debug!`) | crates.io 2024-10-09 |
| aya-build | 0.1.3 | Build script for eBPF compilation (replaces xtask) | crates.io 2025-11-17 |
| tokio | 1.50.0 | Async runtime (eBPF event loop, timers) | crates.io 2026-03-03 |
| clap | 4.6.0 | CLI argument parsing | crates.io 2026-03-12 |
| anyhow | 1.0.102 | Application error handling | crates.io 2026-02-20 |
| thiserror | 2.0.18 | Typed errors for library/trait code | crates.io 2026-01-18 |
| nix | 0.31.2 | Unix syscalls (privilege check, geteuid) | crates.io 2026-02-28 |
| procfs | 0.18.0 | `/proc/net/tcp` and `/proc/net/udp` parsing | crates.io 2025-08-30 |
| serde | 1.0.228 | Serialization for shared types | crates.io 2025-09-27 |
| log + env_logger | 0.4.x / 0.11.x | Internal diagnostics logging | Stable, widely used |
| signal-hook | 0.4.3 | Graceful SIGINT/SIGTERM shutdown | crates.io 2026-01-24 |
### eBPF Crate Dependencies (tcptop-ebpf)
| Library | Version | Purpose |
|---------|---------|---------|
| aya-ebpf | 0.1.1 | eBPF program macros and helpers |
| aya-log-ebpf | 0.1.0 | Kernel-side logging |
### Build Tools
| Tool | Purpose | Notes |
|------|---------|-------|
| Rust nightly | eBPF compilation target | Pin in `rust-toolchain.toml` with `bpfel-unknown-none` target |
| bpf-linker | Links eBPF object files | `cargo install bpf-linker` (on macOS: `--no-default-features` + LLVM 21+) |
| aya-build | Build.rs integration | Compiles eBPF crate during `cargo build` -- no xtask needed |
### Not Needed in Phase 1
| Library | Why Deferred |
|---------|--------------|
| ratatui / crossterm | Phase 2 (TUI) |
| csv | Phase 3 (CSV logging) |
| pcap | Phase 4 (macOS backend) |
| sysinfo | Not needed if `bpf_get_current_comm()` suffices for process names in kernel; `procfs` covers PID-to-name fallback |
## Architecture Patterns
### Recommended Project Structure
```
tcptop/
├── Cargo.toml                 # Workspace root
├── rust-toolchain.toml        # Pin nightly + bpfel-unknown-none target
├── .cargo/config.toml         # Build flags for eBPF target
├── tcptop/                    # Userspace binary crate
│   ├── Cargo.toml
│   ├── build.rs               # Uses aya-build to compile eBPF crate
│   └── src/
│       ├── main.rs            # Entry point: privilege check, CLI, event loop
│       ├── collector/
│       │   ├── mod.rs         # NetworkCollector trait definition
│       │   └── linux.rs       # Linux/eBPF implementation
│       ├── model.rs           # ConnectionRecord, ConnectionKey, ConnectionStats
│       ├── aggregator.rs      # Connection state aggregation, bandwidth calc, UDP timeout
│       ├── output.rs          # Streaming stdout formatter
│       ├── privilege.rs       # Root/capability check logic
│       └── proc_bootstrap.rs  # /proc/net/tcp+udp parser for pre-existing connections
├── tcptop-ebpf/               # eBPF kernel programs (no_std)
│   ├── Cargo.toml
│   └── src/
│       └── main.rs            # All eBPF programs: kprobes + tracepoint
└── tcptop-common/             # Shared types (no_std compatible)
    ├── Cargo.toml
    └── src/
        └── lib.rs             # Event enum, ConnectionKey, #[repr(C)] structs
```
### Pattern 1: Event-Driven eBPF Pipeline
**What:** Kernel eBPF programs emit events into a RingBuf; userspace consumes them asynchronously via tokio and updates an in-memory connection table.
**When to use:** Always -- this is the core architecture.
**Flow:**
```
Kernel:    kprobe/tracepoint fires
            -> eBPF program extracts fields from sock/skb
            -> writes TcptopEvent to RingBuf
Userspace: tokio task polls RingBuf via AsyncFd
            -> deserializes event, updates ConnectionTable HashMap
            -> periodic tick formats and prints to stdout
```
**eBPF side (kernel):**
```rust
// tcptop-common/src/lib.rs
#![no_std]

#[repr(C)]
#[derive(Clone, Copy)]
pub enum TcptopEvent {
    TcpSend(DataEvent),
    TcpRecv(DataEvent),
    UdpSend(DataEvent),
    UdpRecv(DataEvent),
    TcpStateChange(StateEvent),
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct DataEvent {
    pub pid: u32,
    pub comm: [u8; 16],
    pub saddr: u32, // IPv4 for now; extend to [u8; 16] for IPv6 (see Pitfall 4)
    pub daddr: u32,
    pub sport: u16,
    pub dport: u16,
    pub bytes: u32,
    pub srtt_us: u32, // only meaningful for TCP
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct StateEvent {
    pub pid: u32,
    pub saddr: u32,
    pub daddr: u32,
    pub sport: u16,
    pub dport: u16,
    pub old_state: u32,
    pub new_state: u32,
}
```
**eBPF program (kernel):**
```rust
// tcptop-ebpf/src/main.rs
#![no_std]
#![no_main]
use aya_ebpf::{
    macros::{kprobe, map, tracepoint},
    maps::RingBuf,
    programs::{ProbeContext, TracePointContext},
};
use tcptop_common::{DataEvent, TcptopEvent};

#[map]
static EVENTS: RingBuf = RingBuf::with_byte_size(256 * 1024, 0); // 256 KiB

#[kprobe]
pub fn tcp_sendmsg(ctx: ProbeContext) -> u32 {
    match try_tcp_sendmsg(ctx) {
        Ok(ret) => ret,
        Err(_) => 0,
    }
}

fn try_tcp_sendmsg(ctx: ProbeContext) -> Result<u32, i64> {
    // arg0: struct sock*, arg1: struct msghdr*, arg2: size_t (bytes)
    let _sock: *const core::ffi::c_void = ctx.arg(0).ok_or(1)?;
    let _size: usize = ctx.arg(2).ok_or(1)?;
    // Extract the connection tuple from sock->__sk_common, then write a
    // TcptopEvent::TcpSend into the ring buffer.
    if let Some(mut entry) = EVENTS.reserve::<TcptopEvent>(0) {
        // populate *entry here...
        entry.submit(0);
    }
    Ok(0)
}
```
**Userspace consumer:**
```rust
// Userspace: read from the RingBuf, using AsyncFd for readiness notification
// (aya's RingBuf implements AsRawFd, so it can be wrapped directly).
use aya::maps::RingBuf;
use tokio::io::unix::AsyncFd;

let ring_buf = RingBuf::try_from(bpf.map_mut("EVENTS").unwrap())?;
let mut async_fd = AsyncFd::new(ring_buf)?;
loop {
    let mut guard = async_fd.readable_mut().await?;
    let ring_buf = guard.get_inner_mut();
    while let Some(item) = ring_buf.next() {
        // SAFETY: the eBPF side only ever submits #[repr(C)] TcptopEvent values.
        let event: &TcptopEvent = unsafe { &*(item.as_ptr() as *const TcptopEvent) };
        connection_table.update(event);
    }
    guard.clear_ready();
}
```
### Pattern 2: Platform Abstraction Trait
**What:** A trait that abstracts the data collection backend, enabling Linux (eBPF) and macOS (pcap) implementations behind a common interface.
**Design:**
```rust
// collector/mod.rs
use anyhow::Result;
use tokio::sync::mpsc;

use crate::model::ConnectionRecord;
use tcptop_common::{DataEvent, StateEvent};

pub enum CollectorEvent {
    Data(DataEvent),
    StateChange(StateEvent),
}

// Requires the `async-trait` crate (add to workspace dependencies).
#[async_trait::async_trait]
pub trait NetworkCollector: Send {
    /// Start collecting. Sends events to the provided channel.
    async fn start(&mut self, tx: mpsc::Sender<CollectorEvent>) -> Result<()>;

    /// Stop collecting and clean up kernel resources.
    async fn stop(&mut self) -> Result<()>;

    /// Bootstrap pre-existing connections (e.g., from /proc/net/tcp).
    fn bootstrap_existing(&self) -> Result<Vec<ConnectionRecord>>;
}
```
**Why mpsc channel:** Decouples the collector from the aggregator. The eBPF ring buffer reader pushes events into the channel; the aggregator consumes them on its own schedule. This also naturally supports the macOS backend in Phase 4 pushing pcap-derived events into the same channel.
### Pattern 3: Connection Table with Bandwidth Calculation
**What:** In-memory HashMap keyed by connection tuple, with periodic tick for rate calculation and UDP idle timeout.
```rust
// aggregator.rs
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct ConnectionTable {
    connections: HashMap<ConnectionKey, ConnectionRecord>,
    last_tick: Instant,
}

impl ConnectionTable {
    pub fn tick(&mut self) {
        let now = Instant::now();
        let dt = now.duration_since(self.last_tick);
        for record in self.connections.values_mut() {
            record.rate_in = (record.bytes_in - record.prev_bytes_in) as f64 / dt.as_secs_f64();
            record.rate_out = (record.bytes_out - record.prev_bytes_out) as f64 / dt.as_secs_f64();
            record.prev_bytes_in = record.bytes_in;
            record.prev_bytes_out = record.bytes_out;
        }
        // Expire UDP flows idle for more than 5 seconds (D-06)
        self.connections.retain(|k, r| {
            if k.protocol == Protocol::Udp {
                r.last_seen.elapsed() < Duration::from_secs(5)
            } else {
                true // TCP lifecycle is managed by state-change events
            }
        });
        self.last_tick = now;
    }
}
```
### Anti-Patterns to Avoid
- **Polling /proc/net/tcp in a loop:** Misses short-lived connections, high overhead, racy PID mapping. Use eBPF for event-driven capture; `/proc` only at startup for bootstrapping.
- **PerfEventArray instead of RingBuf for this use case:** PerfEventArray is per-CPU with no ordering guarantees and requires per-CPU buffer management. RingBuf provides global ordering (important for state change events), shared buffer (simpler), and notification via fd. Since kernel 5.8 is the minimum anyway, always prefer RingBuf.
- **Separate RingBuf per program type:** Unnecessary complexity. A single RingBuf with a tagged enum (`TcptopEvent`) handles all event types cleanly.
- **Blocking the eBPF event loop with stdout writes:** The RingBuf consumer and the stdout printer must be decoupled. If stdout blocks (pipe full), it must not cause RingBuf overflow. Use a channel between consumer and printer.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| `/proc/net/tcp` parsing | Custom line parser | `procfs` crate (0.18.0) | Handles IPv4/IPv6, hex parsing, socket states, inode mapping. Battle-tested. |
| CLI argument parsing | Manual `std::env::args` | `clap` 4.6.0 derive macros | Handles `--help`, validation, error messages automatically |
| eBPF program compilation | Custom build scripts | `aya-build` 0.1.3 in `build.rs` | Handles target selection, cross-compilation, artifact embedding |
| Async eBPF event reading | Raw epoll/poll loop | `tokio::io::unix::AsyncFd` wrapping RingBuf fd | Integrates with tokio runtime, handles wakeups correctly |
| Privilege checking | Raw syscall | `nix::unistd::geteuid()` + `caps` crate or manual `/proc/self/status` CapEff parse | Correct cross-distro behavior |
| Human-readable byte formatting | A crate dependency | Hand-rolled utility function (too small for a crate) | Only needs KB/MB/GB and rate formatting; ~20 lines of code |
**Key insight:** The eBPF build pipeline is the most complex "don't hand-roll" item. Aya's template structure and `aya-build` handle the nightmare of cross-compiling no_std Rust to BPF bytecode, linking, and embedding the result in the userspace binary.
## Common Pitfalls
### Pitfall 1: RingBuf Entry Alignment
**What goes wrong:** Shared types between eBPF and userspace have mismatched memory layout, causing corrupted data.
**Why it happens:** eBPF enforces 8-byte max alignment. Rust may add padding differently than C.
**How to avoid:** All shared types in `tcptop-common` MUST use `#[repr(C)]` and have no field with alignment > 8. Stick to primitive types (u8, u16, u32, u64, i32, arrays of primitives), plus `#[repr(C)]` tagged enums whose variants carry only such structs (as `TcptopEvent` does). No `String`, `Vec`, or `Option`.
**Warning signs:** Garbled IP addresses, impossible PID values, random byte counts.
### Pitfall 2: eBPF Verifier Rejection
**What goes wrong:** Kernel verifier rejects the eBPF program at load time with cryptic errors.
**Why it happens:** Unbounded loops, stack size exceeded (512 bytes max), uninitialized memory reads, pointer arithmetic verifier can't prove safe.
**How to avoid:** Keep eBPF functions small and linear. No loops (or use bounded `for i in 0..MAX`). Minimize stack usage -- read kernel structs field-by-field, don't copy whole structs. Use `bpf_probe_read_kernel` for every kernel pointer dereference.
**Warning signs:** "R1 invalid mem access", "back-edge in control flow", "unreachable insn".
### Pitfall 3: Missing Process Context in Tracepoint
**What goes wrong:** `sock:inet_sock_set_state` fires in interrupt/softirq context where `bpf_get_current_pid_tgid()` returns 0 (kernel context, not the process).
**Why it happens:** TCP state changes can be triggered by incoming packets processed in softirq, not in the context of the owning process.
**How to avoid:** Capture PID in the kprobes (tcp_sendmsg/tcp_recvmsg) which DO run in process context, store in a HashMap keyed by sock pointer. When inet_sock_set_state fires, look up the sock pointer in the map to get the PID. If not found, PID is 0/unknown -- enrich from userspace via procfs.
**Warning signs:** Many connections showing PID 0 or "unknown" process name.
### Pitfall 4: IPv6 Support Forgotten
**What goes wrong:** Tool only handles IPv4, crashes or silently drops IPv6 connections.
**Why it happens:** Easy to prototype with `u32` for IP addresses and forget about IPv6.
**How to avoid:** Use `[u8; 16]` for all IP address fields from the start. In the common crate, include an `af_family: u16` field. Format IPv4-mapped-IPv6 addresses correctly. The `sock.__sk_common.skc_family` field tells you AF_INET vs AF_INET6.
**Warning signs:** Missing connections on dual-stack hosts, especially localhost (::1).
### Pitfall 5: srtt_us Field Interpretation
**What goes wrong:** RTT values are 8x too high.
**Why it happens:** The kernel stores `srtt_us` as "smoothed RTT << 3" (shifted left by 3). Must shift right by 3 to get actual microseconds.
**How to avoid:** Always `srtt_us >> 3` when reading from `struct tcp_sock`. Document this in the code comment referencing `include/net/tcp.h`.
**Warning signs:** RTT values of 800us for localhost connections (should be ~100us).
### Pitfall 6: Pre-existing Connection Byte Counts
**What goes wrong:** Connections that existed before tcptop started show misleadingly low byte counts.
**Why it happens:** eBPF hooks only capture bytes from the moment they're attached. Historical byte counts are not available.
**How to avoid:** Per D-15, mark pre-existing connections as partial. The bootstrap from `/proc/net/tcp` provides the connection tuple and state but NOT byte counts. Start byte counters at 0 and indicate to the user these are "since monitoring started."
**Warning signs:** Long-lived connections (like SSH) showing 0 bytes initially.
### Pitfall 7: RingBuf Overflow Under Load
**What goes wrong:** High-traffic hosts generate more events than userspace can consume; events are silently dropped.
**Why it happens:** RingBuf has fixed size. Unlike PerfEventArray, dropped events are reported to the eBPF side (reserve returns None), not to userspace.
**How to avoid:** Size the ring buffer appropriately (start with 256KB, tune up if needed). In eBPF code, handle the `reserve` returning `None` gracefully -- increment a counter in a separate eBPF array map so userspace can detect and report drops. Consider rate-limiting per-connection updates in the eBPF program.
**Warning signs:** Missing events, gaps in connection tracking, drop counter incrementing.
## Code Examples
### Privilege Check (D-09, D-10, D-11)
```rust
// privilege.rs
use nix::unistd::geteuid;
use std::process;

pub fn check_privileges() {
    if geteuid().is_root() {
        return;
    }
    // Check for CAP_BPF and CAP_PERFMON via /proc/self/status (D-10)
    if has_required_capabilities() {
        return;
    }
    eprintln!("error: tcptop requires root privileges. Run with sudo.");
    process::exit(77); // D-11
}

fn has_required_capabilities() -> bool {
    // Parse /proc/self/status for the CapEff line, then check the bits for
    // CAP_PERFMON (38) and CAP_BPF (39).
    if let Ok(status) = std::fs::read_to_string("/proc/self/status") {
        for line in status.lines() {
            if let Some(hex) = line.strip_prefix("CapEff:") {
                if let Ok(caps) = u64::from_str_radix(hex.trim(), 16) {
                    let cap_perfmon = 1u64 << 38;
                    let cap_bpf = 1u64 << 39;
                    return (caps & cap_perfmon != 0) && (caps & cap_bpf != 0);
                }
            }
        }
    }
    false
}
```
### Streaming Output Format (D-01, D-02, D-03, D-14)
```rust
// output.rs -- example line format
// PROTO LOCAL REMOTE PID PROCESS STATE BYTES_IN BYTES_OUT PKTS_IN PKTS_OUT RTT RATE_IN RATE_OUT
// TCP 192.168.1.10:54321 93.184.216.34:443 1234 curl ESTABLISHED 1258291 (1.2M) 340 (340B) 892 12 28.3ms 420.1 KB/s 113B/s
// UDP 0.0.0.0:53 8.8.8.8:53 567 systemd-r UDP 4096 (4.0K) 128 (128B) 32 1 - 1.3 KB/s 42B/s
// [CLOSED] TCP 192.168.1.10:54321 -> 93.184.216.34:443 (curl, PID 1234)
```
### rust-toolchain.toml
```toml
[toolchain]
channel = "nightly-2026-01-15" # Pin a specific nightly
components = ["rust-src", "rustfmt", "clippy"]
# No `targets` entry: bpfel-unknown-none is a tier-3 target with no prebuilt
# rustup std. aya-build compiles the eBPF crate for it via build-std, which
# is why the rust-src component is required above.
```
### Workspace Cargo.toml
```toml
[workspace]
members = ["tcptop", "tcptop-common", "tcptop-ebpf"]
resolver = "2"
[workspace.dependencies]
aya = { version = "0.13.1", features = ["async_tokio"] }
aya-log = "0.2"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
thiserror = "2"
clap = { version = "4.6", features = ["derive"] }
log = "0.4"
env_logger = "0.11"
```
### build.rs (Userspace Crate)
```rust
// tcptop/build.rs
// Sketch based on the aya-template build.rs; verify against a freshly
// generated template, since aya-build's API surface is still settling
// (see Open Questions).
use anyhow::{anyhow, Context as _};
use aya_build::cargo_metadata;

fn main() -> anyhow::Result<()> {
    let cargo_metadata::Metadata { packages, .. } = cargo_metadata::MetadataCommand::new()
        .no_deps()
        .exec()
        .context("MetadataCommand::exec")?;
    let ebpf_package = packages
        .into_iter()
        .find(|cargo_metadata::Package { name, .. }| name.as_str() == "tcptop-ebpf")
        .ok_or_else(|| anyhow!("tcptop-ebpf package not found"))?;
    aya_build::build_ebpf([ebpf_package])
}
```
## eBPF Hook Strategy (Claude's Discretion Recommendation)
### Recommended Hook Points
| Hook | Type | Captures | Why This Over Alternatives |
|------|------|----------|---------------------------|
| `tcp_sendmsg` | kprobe | Bytes/packets out, PID, comm, connection tuple, srtt_us | Process context available; msg size in arg2 |
| `tcp_recvmsg` | kprobe | Bytes/packets in, PID, comm, connection tuple, srtt_us | Process context available; return value has bytes read |
| `udp_sendmsg` | kprobe | UDP bytes/packets out, PID, comm, 4-tuple | Only way to track UDP sends with PID attribution |
| `udp_recvmsg` | kprobe | UDP bytes/packets in, PID, comm, 4-tuple | Only way to track UDP receives with PID attribution |
| `sock:inet_sock_set_state` | tracepoint | TCP state transitions (old_state, new_state, tuple) | Stable kernel API; fires on every TCP state change |
### Why Kprobes Over Tracepoints for Data Hooks
The `tcp:tcp_probe` tracepoint exists but does not carry per-call byte counts suitable for this accounting. The `tcp_sendmsg`/`tcp_recvmsg` kernel functions take the byte size as a parameter, making kprobes the natural choice. The tradeoff is that kprobes are less stable across kernel versions, but the `tcp_sendmsg` signature has been stable for many releases.
### RTT Strategy
Read `srtt_us` from `struct tcp_sock` during `tcp_sendmsg`/`tcp_recvmsg` kprobes. The sock pointer (arg0) can be cast to `tcp_sock*` to access the smoothed RTT field. Shift right by 3 to get microseconds. This piggybacks on existing hooks -- no additional hook needed.
### Ring Buffer vs PerfEventArray
**Use RingBuf.** Rationale:
- Single shared buffer across all CPUs (simpler code)
- Strong event ordering (important for state change events being processed after corresponding data events)
- Precise wakeup notifications (no sampling/watermark tuning)
- Dropped events visible to eBPF side (can count and report)
- Kernel 5.8 is already the minimum requirement
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| xtask build pattern | `aya-build` in build.rs | aya 0.13+ (2025) | Simpler builds, standard `cargo build` works |
| PerfEventArray for all events | RingBuf (kernel 5.8+) | Linux 5.8 (2020) | Better ordering, shared buffer, less userspace code |
| BCC Python + C | Aya pure Rust | 2022+ | Single language, better safety, no runtime dependency on BCC/LLVM |
| Manual eBPF bytecode loading | Aya auto-BTF relocation | aya 0.12+ | Portable across kernel versions without recompilation |
**Deprecated/outdated:**
- RedBPF: Abandoned, superseded by Aya
- tui-rs: Deprecated in favor of ratatui (relevant for Phase 2)
- PerfEventArray for ordered event streams: RingBuf is strictly better when kernel >= 5.8
## Open Questions
1. **tcp_recvmsg Return Value for Byte Count**
- What we know: tcp_sendmsg has byte count as arg2. tcp_recvmsg may need a kretprobe to capture actual bytes received from the return value.
- What's unclear: Whether kretprobe on tcp_recvmsg is reliable for byte counts or if we need to read from the msghdr.
- Recommendation: Start with kretprobe on tcp_recvmsg to capture return value. If unreliable, fall back to reading msg_iter length from the msghdr argument.
2. **IPv6 Struct Layout in eBPF**
- What we know: IPv4 addresses are in `__sk_common.skc_rcv_saddr` and `__sk_common.skc_daddr`. IPv6 uses `skc_v6_rcv_saddr` and `skc_v6_daddr`.
- What's unclear: Whether aya-tool generates correct bindings for these fields across kernel versions, or if manual offset calculation is needed.
- Recommendation: Start with IPv4 only for initial proof-of-life, add IPv6 in a follow-up task within Phase 1. Use `aya-tool generate` to get correct struct offsets.
3. **Exact aya-build Usage**
- What we know: aya-build 0.1.3 exists and replaces xtask. It's used in build.rs.
- What's unclear: Exact API surface -- documentation is sparse.
- Recommendation: Generate a project from aya-template first, then adapt the generated build.rs. The template will show current best practices.
## Sources
### Primary (HIGH confidence)
- [Aya docs.rs - RingBuf](https://docs.rs/aya/latest/aya/maps/ring_buf/struct.RingBuf.html) - RingBuf API, AsyncFd integration
- [Aya docs.rs - aya-ebpf RingBuf](https://docs.rs/aya-ebpf/latest/aya_ebpf/maps/ring_buf/struct.RingBuf.html) - Kernel-side RingBuf reserve/submit API
- [Aya Book - Probes](https://aya-rs.dev/book/programs/probes) - Kprobe definition, attachment, context access
- [Aya Template](https://github.com/aya-rs/aya-template) - Project structure, workspace layout
- crates.io version checks (2026-03-21) - All version numbers verified
### Secondary (MEDIUM confidence)
- [Brendan Gregg - TCP Tracepoints](https://www.brendangregg.com/blog/2018-03-22/tcp-tracepoints.html) - TCP tracepoint list, inet_sock_set_state fields
- [Red Hat - TCP RTT with eBPF](https://developers.redhat.com/articles/2024/02/27/network-observability-using-tcp-handshake-round-trip-time) - srtt_us field reading, fentry approach
- [eunomia eBPF Tutorial 14](https://eunomia.dev/tutorials/14-tcpstates/) - TCP state tracking with inet_sock_set_state
- [Yuki Nakamura - Aya Tracepoint](https://yuki-nakamura.com/2024/07/06/writing-ebpf-tracepoint-program-with-rust-aya-tips-and-example/) - Tracepoint struct generation, context reading
- [eBPF Docs - RingBuf Map Type](https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_RINGBUF/) - Ring buffer kernel semantics
### Tertiary (LOW confidence)
- [OneUptime - eBPF with Rust Aya](https://oneuptime.com/blog/post/2026-01-07-ebpf-rust-aya/view) - General patterns, build pipeline (blog, not official)
- [Deepfence - Aya companion](https://www.deepfence.io/blog/aya-your-trusty-ebpf-companion) - Map sharing patterns (blog)
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH - All versions verified on crates.io; Aya is the only viable Rust eBPF library
- Architecture: HIGH - Follows established Aya patterns (template structure, RingBuf, kprobe+tracepoint combo); validated against multiple sources
- Hook strategy: MEDIUM-HIGH - Kprobe targets well-established; srtt_us reading is proven technique; tcp_recvmsg byte capture needs runtime validation
- Pitfalls: HIGH - Based on documented kernel behaviors and community-known issues
**Research date:** 2026-03-21
**Valid until:** 2026-04-21 (30 days -- Aya ecosystem is stable at 0.13.x)