# Phase 1: Data Pipeline - Research

**Researched:** 2026-03-21
**Domain:** Linux eBPF kernel tracing, TCP/UDP connection monitoring, Aya Rust eBPF framework
**Confidence:** HIGH

## Summary

Phase 1 implements the core data pipeline: eBPF programs in the Linux kernel capture TCP and UDP events (connections, bytes, packets, state changes, RTT), push them through a ring buffer to userspace, where a Rust application aggregates per-connection statistics and streams them to stdout. This is a greenfield Aya eBPF project following the standard workspace template (ebpf crate, common crate, userspace crate).

The eBPF hook strategy combines kprobes for byte/packet counting (`tcp_sendmsg`, `tcp_recvmsg`, `udp_sendmsg`, `udp_recvmsg`) with the `sock:inet_sock_set_state` tracepoint for TCP state transitions and reading `srtt_us` from the sock struct for RTT. Pre-existing connections are bootstrapped from `/proc/net/tcp` and `/proc/net/udp` on startup. All kernel-to-userspace communication uses a single `RingBuf` map (requires kernel 5.8+).

The platform abstraction trait (`NetworkCollector`) is defined in this phase to enable the macOS backend in Phase 4, but only the Linux/eBPF implementation is built. The proof-of-life output is streaming lines to stdout -- no TUI, no cursor manipulation.

**Primary recommendation:** Use the Aya template workspace structure with `aya-build` in `build.rs` (not xtask). Attach to 4 kprobes + 1 tracepoint, use a single RingBuf for all events, and keep the common crate's event enum as the single shared type between kernel and userspace.

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
- **D-01:** Streaming lines to stdout -- each connection update prints a new line (like `tail -f`). No screen clearing or cursor manipulation.
- **D-02:** Human-readable sizes by default (`1.2 MB`, `340 KB/s`). Show raw value alongside when it fits cleanly without noise (e.g., `1258291 (1.2M)`). If too noisy, drop to human-readable only.
- **D-03:** Full detail per connection -- all fields: local/remote addr+port, PID, process name, TCP state, bytes in/out, packets in/out, RTT, bandwidth rate.
- **D-04:** Output is assumed throwaway scaffolding. Build it simple; decide whether to keep as `--batch` mode when TUI lands in Phase 2.
- **D-05:** Full 4-tuple grouping -- (src IP, src port, dst IP, dst port) = one flow for UDP.
- **D-06:** 5-second idle timeout -- UDP flows disappear 5s after last packet. Tunable via flag deferred.
- **D-07:** No synthesized state for UDP -- show `-` or `UDP` in the state column.
- **D-08:** Flat bidirectional flow tracking -- count bytes/packets in each direction, no request/response inference.
- **D-09:** Minimal error message on missing privileges: `error: tcptop requires root privileges. Run with sudo.`
- **D-10:** If Linux capabilities (`CAP_BPF`, `CAP_PERFMON`) are present, proceed without root. Don't suggest capabilities in error message.
- **D-11:** Exit code 77 on privilege failure.
- **D-12:** Closed TCP connections linger for one display/refresh cycle, then are removed.
- **D-13:** Visual distinction for new/closing connections -- exact styling deferred to Phase 2.
- **D-14:** Connection close events print `[CLOSED] 192.168.1.1:443 -> ...` in streaming output.
- **D-15:** Pre-existing connections sourced from `/proc/net/tcp` on startup, marked as partial.

### Claude's Discretion
- eBPF hook point selection (kprobes vs tracepoints)
- Platform abstraction trait design and boundaries
- Ring buffer vs perf event array for kernel-to-userspace transport
- RTT estimation implementation approach
- Exact format/layout of streaming output lines
- `/proc/net/tcp` parsing strategy for pre-existing connections
- Internal data structures and concurrency model

### Deferred Ideas (OUT OF SCOPE)
- UDP flow idle timeout user-configurable via CLI flag -- Phase 2 or 3
- Protocol hint column for well-known UDP ports -- future enhancement
- Toggle between human-readable and raw byte display via keypress -- Phase 2 TUI
- `--batch` or `--once` mode -- evaluate after Phase 2
</user_constraints>

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| DATA-01 | Per-connection byte counts (sent/received) in real time | kprobes on `tcp_sendmsg`/`tcp_recvmsg`/`udp_sendmsg`/`udp_recvmsg` capture byte counts from msg length args |
| DATA-02 | Per-connection packet counts (sent/received) in real time | Same kprobes increment packet counters per event |
| DATA-03 | TCP connection state (ESTABLISHED, LISTEN, TIME_WAIT, etc.) | `sock:inet_sock_set_state` tracepoint fires on every state transition with old+new state |
| DATA-04 | Correlate connection to owning process (PID, process name) | `bpf_get_current_pid_tgid()` in kprobes captures PID; process name via `bpf_get_current_comm()` or userspace `sysinfo`/`procfs` |
| DATA-05 | Per-connection TCP RTT estimate | Read `srtt_us` from `struct tcp_sock` in kprobe context (shifted right by 3 for microseconds) |
| DATA-06 | Bandwidth rates (KB/s or MB/s) per connection | Userspace calculation: delta bytes / delta time between refresh cycles |
| DATA-07 | Track TCP and UDP (UDP flows synthesized from 4-tuple with idle timeout) | UDP kprobes + userspace HashMap keyed by 4-tuple with 5s idle expiry |
| PLAT-01 | Works on Linux kernel 5.8+ using eBPF | RingBuf requires 5.8+; all hook points available in 5.8+; Aya handles BTF portability |
| PLAT-03 | Platform abstraction allows different backends | `NetworkCollector` trait with async `Stream` of connection events; Linux impl in this phase, macOS impl in Phase 4 |
| OPS-01 | Detect missing root/elevated privileges with clear error | Check `geteuid() == 0` or `CAP_BPF`+`CAP_PERFMON` capabilities before eBPF load |
| OPS-02 | Low overhead on host (no heavy polling, eBPF ring buffer based) | RingBuf is event-driven (no polling); kprobes fire only on actual network calls |
</phase_requirements>

## Standard Stack

### Core (Phase 1 Dependencies)

| Library | Version | Purpose | Verified |
|---------|---------|---------|----------|
| aya | 0.13.1 | eBPF loader, map access, program attachment | crates.io 2025-11-17 |
| aya-ebpf | 0.1.1 | Kernel-side eBPF macros (`#[kprobe]`, `#[tracepoint]`) | crates.io 2025-11-17 |
| aya-log | 0.2.1 | Userspace eBPF log receiver | crates.io 2024-10-09 |
| aya-log-ebpf | 0.1.0 | Kernel-side eBPF logging (`info!`, `debug!`) | crates.io 2024-10-09 |
| aya-build | 0.1.3 | Build script for eBPF compilation (replaces xtask) | crates.io 2025-11-17 |
| tokio | 1.50.0 | Async runtime (eBPF event loop, timers) | crates.io 2026-03-03 |
| clap | 4.6.0 | CLI argument parsing | crates.io 2026-03-12 |
| anyhow | 1.0.102 | Application error handling | crates.io 2026-02-20 |
| thiserror | 2.0.18 | Typed errors for library/trait code | crates.io 2026-01-18 |
| nix | 0.31.2 | Unix syscalls (privilege check, geteuid) | crates.io 2026-02-28 |
| procfs | 0.18.0 | `/proc/net/tcp` and `/proc/net/udp` parsing | crates.io 2025-08-30 |
| serde | 1.0.228 | Serialization for shared types | crates.io 2025-09-27 |
| log + env_logger | 0.4.x / 0.11.x | Internal diagnostics logging | Stable, widely used |
| signal-hook | 0.4.3 | Graceful SIGINT/SIGTERM shutdown | crates.io 2026-01-24 |

### eBPF Crate Dependencies (tcptop-ebpf)

| Library | Version | Purpose |
|---------|---------|---------|
| aya-ebpf | 0.1.1 | eBPF program macros and helpers |
| aya-log-ebpf | 0.1.0 | Kernel-side logging |

### Build Tools

| Tool | Purpose | Notes |
|------|---------|-------|
| Rust nightly | eBPF compilation target | Pin in `rust-toolchain.toml` with `bpfel-unknown-none` target |
| bpf-linker | Links eBPF object files | `cargo install bpf-linker` (on macOS: `--no-default-features` + LLVM 21+) |
| aya-build | Build.rs integration | Compiles eBPF crate during `cargo build` -- no xtask needed |

### Not Needed in Phase 1

| Library | Why Deferred |
|---------|--------------|
| ratatui / crossterm | Phase 2 (TUI) |
| csv | Phase 3 (CSV logging) |
| pcap | Phase 4 (macOS backend) |
| sysinfo | Not needed if `bpf_get_current_comm()` suffices for process names in kernel; `procfs` covers PID-to-name fallback |

## Architecture Patterns

### Recommended Project Structure
```
tcptop/
├── Cargo.toml                  # Workspace root
├── rust-toolchain.toml         # Pin nightly + bpfel-unknown-none target
├── .cargo/config.toml          # Build flags for eBPF target
├── tcptop/                     # Userspace binary crate
│   ├── Cargo.toml
│   ├── build.rs                # Uses aya-build to compile eBPF crate
│   └── src/
│       ├── main.rs             # Entry point: privilege check, CLI, event loop
│       ├── collector/
│       │   ├── mod.rs          # NetworkCollector trait definition
│       │   └── linux.rs        # Linux/eBPF implementation
│       ├── model.rs            # ConnectionRecord, ConnectionKey, ConnectionStats
│       ├── aggregator.rs       # Connection state aggregation, bandwidth calc, UDP timeout
│       ├── output.rs           # Streaming stdout formatter
│       ├── privilege.rs        # Root/capability check logic
│       └── proc_bootstrap.rs   # /proc/net/tcp+udp parser for pre-existing connections
├── tcptop-ebpf/                # eBPF kernel programs (no_std)
│   ├── Cargo.toml
│   └── src/
│       └── main.rs             # All eBPF programs: kprobes + tracepoint
└── tcptop-common/              # Shared types (no_std compatible)
    ├── Cargo.toml
    └── src/
        └── lib.rs              # Event enum, ConnectionKey, #[repr(C)] structs
```

### Pattern 1: Event-Driven eBPF Pipeline

**What:** Kernel eBPF programs emit events into a RingBuf; userspace consumes them asynchronously via tokio and updates an in-memory connection table.

**When to use:** Always -- this is the core architecture.

**Flow:**
```
Kernel: kprobe/tracepoint fires
  -> eBPF program extracts fields from sock/skb
  -> Writes TcptopEvent to RingBuf
  -> Userspace tokio task polls RingBuf via AsyncFd
  -> Deserializes event, updates ConnectionTable HashMap
  -> Periodic tick formats and prints to stdout
```

**Shared event types (common crate):**
```rust
// tcptop-common/src/lib.rs
#![no_std]

#[repr(C)]
#[derive(Clone, Copy)]
pub enum TcptopEvent {
    TcpSend(DataEvent),
    TcpRecv(DataEvent),
    UdpSend(DataEvent),
    UdpRecv(DataEvent),
    TcpStateChange(StateEvent),
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct DataEvent {
    pub pid: u32,
    pub comm: [u8; 16],
    pub saddr: u32,   // IPv4 for now; extend to support IPv6
    pub daddr: u32,
    pub sport: u16,
    pub dport: u16,
    pub bytes: u32,
    pub srtt_us: u32, // only meaningful for TCP
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct StateEvent {
    pub pid: u32,
    pub saddr: u32,
    pub daddr: u32,
    pub sport: u16,
    pub dport: u16,
    pub old_state: u32,
    pub new_state: u32,
}
```

**eBPF program (kernel):**
```rust
// tcptop-ebpf/src/main.rs
#![no_std]
#![no_main]

use aya_ebpf::{macros::{kprobe, tracepoint, map}, maps::RingBuf, programs::{ProbeContext, TracePointContext}};
use tcptop_common::{TcptopEvent, DataEvent};

#[map]
static EVENTS: RingBuf = RingBuf::with_byte_size(256 * 1024, 0); // 256 KB

#[kprobe]
pub fn tcp_sendmsg(ctx: ProbeContext) -> u32 {
    match try_tcp_sendmsg(ctx) {
        Ok(ret) => ret,
        Err(_) => 0,
    }
}

fn try_tcp_sendmsg(ctx: ProbeContext) -> Result<u32, i64> {
    // arg0: struct sock*, arg1: struct msghdr*, arg2: size_t (bytes)
    let sock: *const core::ffi::c_void = ctx.arg(0).ok_or(1)?;
    let size: usize = ctx.arg(2).ok_or(1)?;
    // Extract connection tuple from sock->__sk_common
    // Write TcptopEvent::TcpSend to EVENTS ring buffer
    if let Some(mut entry) = EVENTS.reserve::<TcptopEvent>(0) {
        // populate entry...
        entry.submit(0);
    }
    Ok(0)
}
```

**Userspace consumer:**
```rust
// Userspace: read from RingBuf with AsyncFd for async notification
use std::os::fd::AsRawFd; // brings `as_raw_fd()` into scope

use aya::maps::RingBuf;
use tokio::io::unix::AsyncFd;

let mut ring_buf = RingBuf::try_from(bpf.map_mut("EVENTS").unwrap())?;
let async_fd = AsyncFd::new(ring_buf.as_raw_fd())?;

loop {
    // Wait until the kernel signals that new records are available.
    let mut guard = async_fd.readable().await?;
    // Drain everything currently in the buffer before waiting again.
    while let Some(item) = ring_buf.next() {
        let event: &TcptopEvent = unsafe { &*(item.as_ptr() as *const TcptopEvent) };
        connection_table.update(event);
    }
    guard.clear_ready();
}
```

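The pointer cast in the consumer above trusts that every record is long enough and suitably aligned. A more defensive decode step might look like the sketch below. It is illustrative only: `RawDataEvent` and `decode_event` are hypothetical names, and the struct is a reduced stand-in for the shared event type, not the real `TcptopEvent`. It applies to plain integer-only structs; a data-carrying enum would additionally need its tag validated before being interpreted.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical stand-in for a `#[repr(C)]` event shared with the eBPF side.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct RawDataEvent {
    pub pid: u32,
    pub bytes: u32,
    pub sport: u16,
    pub dport: u16,
}

/// Copy a `RawDataEvent` out of a raw ring-buffer record.
/// Returns `None` when the record is too short to hold one.
pub fn decode_event(record: &[u8]) -> Option<RawDataEvent> {
    if record.len() < size_of::<RawDataEvent>() {
        return None;
    }
    // SAFETY: the length check above guarantees enough readable bytes, and
    // `RawDataEvent` contains only integers, so every bit pattern is valid.
    // `read_unaligned` avoids assuming the slice is suitably aligned.
    Some(unsafe { ptr::read_unaligned(record.as_ptr() as *const RawDataEvent) })
}
```

Copying the record out (rather than borrowing it) also means the decoded value can outlive the ring-buffer item it came from.
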
### Pattern 2: Platform Abstraction Trait

**What:** A trait that abstracts the data collection backend, enabling Linux (eBPF) and macOS (pcap) implementations behind a common interface.

**Design:**
```rust
// collector/mod.rs
use anyhow::Result;
use tokio::sync::mpsc;

use crate::model::ConnectionRecord;
use tcptop_common::{DataEvent, StateEvent};

pub enum CollectorEvent {
    Data(DataEvent),
    StateChange(StateEvent),
}

#[async_trait::async_trait]
pub trait NetworkCollector: Send {
    /// Start collecting. Sends events to the provided channel.
    async fn start(&mut self, tx: mpsc::Sender<CollectorEvent>) -> Result<()>;

    /// Stop collecting and clean up kernel resources.
    async fn stop(&mut self) -> Result<()>;

    /// Bootstrap pre-existing connections (e.g., from /proc/net/tcp).
    fn bootstrap_existing(&self) -> Result<Vec<ConnectionRecord>>;
}
```

**Why mpsc channel:** Decouples the collector from the aggregator. The eBPF ring buffer reader pushes events into the channel; the aggregator consumes them on its own schedule. This also naturally supports the macOS backend in Phase 4 pushing pcap-derived events into the same channel.

### Pattern 3: Connection Table with Bandwidth Calculation

**What:** In-memory HashMap keyed by connection tuple, with periodic tick for rate calculation and UDP idle timeout.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct ConnectionTable {
    connections: HashMap<ConnectionKey, ConnectionRecord>,
    last_tick: Instant,
}

impl ConnectionTable {
    pub fn tick(&mut self) {
        let now = Instant::now();
        let dt = now.duration_since(self.last_tick).as_secs_f64();

        // Skip the rate update on a zero-length interval to avoid dividing by zero.
        if dt > 0.0 {
            for record in self.connections.values_mut() {
                // Assumes unsigned counters; saturating_sub guards against underflow.
                record.rate_in = record.bytes_in.saturating_sub(record.prev_bytes_in) as f64 / dt;
                record.rate_out = record.bytes_out.saturating_sub(record.prev_bytes_out) as f64 / dt;
                record.prev_bytes_in = record.bytes_in;
                record.prev_bytes_out = record.bytes_out;
            }
        }

        // Expire UDP flows idle > 5 seconds
        self.connections.retain(|k, r| {
            if k.protocol == Protocol::Udp {
                r.last_seen.elapsed() < Duration::from_secs(5)
            } else {
                true // TCP lifecycle managed by state events
            }
        });

        self.last_tick = now;
    }
}
```

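To make the rate and idle-timeout logic testable without sleeping, one option is to pass the current time in rather than calling `Instant::now()` inside `tick`. The sketch below is a reduced, self-contained variant under that assumption; `FlowKey`, `FlowStats`, `FlowTable`, and `record_in` are hypothetical names used only for illustration, and only the inbound direction is tracked to keep it short.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// UDP flows vanish after this much inactivity (D-06).
pub const UDP_IDLE_TIMEOUT: Duration = Duration::from_secs(5);

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum Protocol { Tcp, Udp }

/// Hypothetical 4-tuple key (IPv4 only for brevity).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct FlowKey {
    pub protocol: Protocol,
    pub local: ([u8; 4], u16),
    pub remote: ([u8; 4], u16),
}

pub struct FlowStats {
    pub bytes_in: u64,
    pub prev_bytes_in: u64,
    pub rate_in: f64, // bytes per second over the last tick interval
    pub last_seen: Instant,
}

pub struct FlowTable {
    pub flows: HashMap<FlowKey, FlowStats>,
    pub last_tick: Instant,
}

impl FlowTable {
    pub fn new(now: Instant) -> Self {
        FlowTable { flows: HashMap::new(), last_tick: now }
    }

    /// Account for `bytes` received on `key` at time `now`.
    pub fn record_in(&mut self, key: FlowKey, bytes: u64, now: Instant) {
        let stats = self.flows.entry(key).or_insert(FlowStats {
            bytes_in: 0, prev_bytes_in: 0, rate_in: 0.0, last_seen: now,
        });
        stats.bytes_in += bytes;
        stats.last_seen = now;
    }

    /// Recompute rates since the previous tick, then expire idle UDP flows.
    pub fn tick(&mut self, now: Instant) {
        let dt = now.duration_since(self.last_tick).as_secs_f64();
        if dt > 0.0 {
            for stats in self.flows.values_mut() {
                let delta = stats.bytes_in.saturating_sub(stats.prev_bytes_in);
                stats.rate_in = delta as f64 / dt;
                stats.prev_bytes_in = stats.bytes_in;
            }
            self.last_tick = now;
        }
        self.flows.retain(|key, stats| {
            key.protocol != Protocol::Udp
                || now.duration_since(stats.last_seen) < UDP_IDLE_TIMEOUT
        });
    }
}
```

Injecting `now` keeps the production call site trivial (`table.tick(Instant::now())`) while letting tests advance time deterministically.
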
### Anti-Patterns to Avoid

- **Polling /proc/net/tcp in a loop:** Misses short-lived connections, high overhead, racy PID mapping. Use eBPF for event-driven capture; `/proc` only at startup for bootstrapping.
- **PerfEventArray instead of RingBuf for this use case:** PerfEventArray is per-CPU with no ordering guarantees and requires per-CPU buffer management. RingBuf provides global ordering (important for state change events), shared buffer (simpler), and notification via fd. Since kernel 5.8 is the minimum anyway, always prefer RingBuf.
- **Separate RingBuf per program type:** Unnecessary complexity. A single RingBuf with a tagged enum (`TcptopEvent`) handles all event types cleanly.
- **Blocking the eBPF event loop with stdout writes:** The RingBuf consumer and the stdout printer must be decoupled. If stdout blocks (pipe full), it must not cause RingBuf overflow. Use a channel between consumer and printer.

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| `/proc/net/tcp` parsing | Custom line parser | `procfs` crate (0.18.0) | Handles IPv4/IPv6, hex parsing, socket states, inode mapping. Battle-tested. |
| CLI argument parsing | Manual `std::env::args` | `clap` 4.6.0 derive macros | Handles `--help`, validation, error messages automatically |
| eBPF program compilation | Custom build scripts | `aya-build` 0.1.3 in `build.rs` | Handles target selection, cross-compilation, artifact embedding |
| Async eBPF event reading | Raw epoll/poll loop | `tokio::io::unix::AsyncFd` wrapping RingBuf fd | Integrates with tokio runtime, handles wakeups correctly |
| Privilege checking | Raw syscall | `nix::unistd::geteuid()` + `caps` crate or manual `/proc/self/status` CapEff parse | Correct cross-distro behavior |
| Human-readable byte formatting | Format function | Simple utility function is fine here (too small for a crate) | Only needs KB/MB/GB and rate formatting; 20 lines of code |

**Key insight:** The eBPF build pipeline is the most complex "don't hand-roll" item. Aya's template structure and `aya-build` handle the nightmare of cross-compiling no_std Rust to BPF bytecode, linking, and embedding the result in the userspace binary.

## Common Pitfalls

### Pitfall 1: RingBuf Entry Alignment
**What goes wrong:** Shared types between eBPF and userspace have mismatched memory layout, causing corrupted data.
**Why it happens:** eBPF enforces 8-byte max alignment. Rust may add padding differently than C.
**How to avoid:** All shared types in `tcptop-common` MUST use `#[repr(C)]` and have no field with alignment > 8. Use only primitive types (u8, u16, u32, u64, i32, arrays of primitives). No `String`, `Vec`, `Option`, or enums with non-trivial discriminants.
**Warning signs:** Garbled IP addresses, impossible PID values, random byte counts.

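One way to catch layout drift early is a compile-time assertion next to the shared type. The sketch below mirrors the IPv4-only `DataEvent` layout from the common-crate example; the expected size of 40 bytes follows from its field list and would need updating if fields change (for example when moving to 16-byte addresses).

```rust
use core::mem::{align_of, size_of};

/// Mirror of the IPv4-only `DataEvent` layout from the common crate sketch.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct DataEvent {
    pub pid: u32,       // offset 0
    pub comm: [u8; 16], // offset 4
    pub saddr: u32,     // offset 20
    pub daddr: u32,     // offset 24
    pub sport: u16,     // offset 28
    pub dport: u16,     // offset 30
    pub bytes: u32,     // offset 32
    pub srtt_us: u32,   // offset 36
}

// Compile-time layout checks: the build fails if a field change alters the
// size or pushes the alignment past the 8-byte limit.
const _: () = assert!(size_of::<DataEvent>() == 40);
const _: () = assert!(align_of::<DataEvent>() <= 8);
```

Because the checks use only `core`, they can live in the `no_std` common crate and run for both the kernel and userspace builds.
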
### Pitfall 2: eBPF Verifier Rejection
**What goes wrong:** Kernel verifier rejects the eBPF program at load time with cryptic errors.
**Why it happens:** Unbounded loops, stack size exceeded (512 bytes max), uninitialized memory reads, pointer arithmetic verifier can't prove safe.
**How to avoid:** Keep eBPF functions small and linear. No loops (or use bounded `for i in 0..MAX`). Minimize stack usage -- read kernel structs field-by-field, don't copy whole structs. Use `bpf_probe_read_kernel` for every kernel pointer dereference.
**Warning signs:** "R1 invalid mem access", "back-edge in control flow", "unreachable insn".

### Pitfall 3: Missing Process Context in Tracepoint
**What goes wrong:** `sock:inet_sock_set_state` often fires in interrupt/softirq context, where `bpf_get_current_pid_tgid()` reports whichever task happened to be running on that CPU (or 0 for the idle task) rather than the process that owns the socket.
**Why it happens:** TCP state changes can be triggered by incoming packets processed in softirq, not in the context of the owning process.
**How to avoid:** Capture PID in the kprobes (tcp_sendmsg/tcp_recvmsg) which DO run in process context, store in a HashMap keyed by sock pointer. When inet_sock_set_state fires, look up the sock pointer in the map to get the PID. If not found, PID is 0/unknown -- enrich from userspace via procfs.
**Warning signs:** Many connections showing PID 0, an unrelated PID, or "unknown" process name.

### Pitfall 4: IPv6 Support Forgotten
**What goes wrong:** Tool only handles IPv4, crashes or silently drops IPv6 connections.
**Why it happens:** Easy to prototype with `u32` for IP addresses and forget about IPv6.
**How to avoid:** Use `[u8; 16]` for all IP address fields from the start. In the common crate, include an `af_family: u16` field. Format IPv4-mapped-IPv6 addresses correctly. The `sock.__sk_common.skc_family` field tells you AF_INET vs AF_INET6.
**Warning signs:** Missing connections on dual-stack hosts, especially localhost (::1).

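On the userspace side, turning a 16-byte address field plus a family tag into a printable address could look like the sketch below. `to_ip_addr` is a hypothetical helper, and the convention that an IPv4 address occupies the first four bytes in network byte order is an assumption of this sketch, not something the event layout has fixed yet.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Linux address-family values.
pub const AF_INET: u16 = 2;
pub const AF_INET6: u16 = 10;

/// Convert a 16-byte address field plus family tag into an `IpAddr`.
/// Assumption: IPv4 addresses occupy the first 4 bytes, network byte order.
pub fn to_ip_addr(family: u16, raw: [u8; 16]) -> Option<IpAddr> {
    match family {
        AF_INET => Some(IpAddr::V4(Ipv4Addr::new(raw[0], raw[1], raw[2], raw[3]))),
        AF_INET6 => {
            let v6 = Ipv6Addr::from(raw);
            // Display IPv4-mapped addresses (::ffff:a.b.c.d) as plain IPv4.
            Some(match v6.to_ipv4_mapped() {
                Some(v4) => IpAddr::V4(v4),
                None => IpAddr::V6(v6),
            })
        }
        // Unknown family: let the caller decide how to render it.
        _ => None,
    }
}
```

Returning `Option` keeps unexpected family values visible instead of silently formatting garbage.
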
### Pitfall 5: srtt_us Field Interpretation
**What goes wrong:** RTT values are 8x too high.
**Why it happens:** The kernel stores `srtt_us` as "smoothed RTT << 3" (shifted left by 3). Must shift right by 3 to get actual microseconds.
**How to avoid:** Always `srtt_us >> 3` when reading from `struct tcp_sock`. Document this in a code comment referencing the field's definition in `include/linux/tcp.h`.
**Warning signs:** RTT values of 800us for localhost connections (should be ~100us).

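A small helper keeps the shift in one place so it cannot be forgotten at individual call sites. This is a minimal sketch; `srtt_to_micros` and `format_rtt` are hypothetical names, and the one-decimal millisecond format is chosen to match the `28.3ms` style used in the example output lines.

```rust
/// Convert the kernel's `srtt_us` value (smoothed RTT scaled by 8)
/// into plain microseconds.
pub fn srtt_to_micros(raw_srtt_us: u32) -> u32 {
    raw_srtt_us >> 3
}

/// Render an RTT given in microseconds as milliseconds with one decimal.
pub fn format_rtt(micros: u32) -> String {
    format!("{:.1}ms", micros as f64 / 1000.0)
}
```

For example, a raw value of 800 corresponds to 100 microseconds, which is the localhost case described in the warning signs.
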
### Pitfall 6: Pre-existing Connection Byte Counts
**What goes wrong:** Connections that existed before tcptop started show misleadingly low byte counts.
**Why it happens:** eBPF hooks only capture bytes from the moment they're attached. Historical byte counts are not available.
**How to avoid:** Per D-15, mark pre-existing connections as partial. The bootstrap from `/proc/net/tcp` provides the connection tuple and state but NOT byte counts. Start byte counters at 0 and indicate to the user these are "since monitoring started."
**Warning signs:** Long-lived connections (like SSH) showing 0 bytes initially.

### Pitfall 7: RingBuf Overflow Under Load
**What goes wrong:** High-traffic hosts generate more events than userspace can consume; events are silently dropped.
**Why it happens:** RingBuf has fixed size. Unlike PerfEventArray, dropped events are reported to the eBPF side (reserve returns None), not to userspace.
**How to avoid:** Size the ring buffer appropriately (start with 256KB, tune up if needed). In eBPF code, handle the `reserve` returning `None` gracefully -- increment a counter in a separate eBPF array map so userspace can detect and report drops. Consider rate-limiting per-connection updates in the eBPF program.
**Warning signs:** Missing events, gaps in connection tracking, drop counter incrementing.

## Code Examples

### Privilege Check (D-09, D-10, D-11)
```rust
// privilege.rs
use nix::unistd::geteuid;
use std::process;

pub fn check_privileges() {
    if geteuid().is_root() {
        return;
    }

    // Check for CAP_BPF and CAP_PERFMON via /proc/self/status
    if has_required_capabilities() {
        return;
    }

    eprintln!("error: tcptop requires root privileges. Run with sudo.");
    process::exit(77);
}

fn has_required_capabilities() -> bool {
    // Parse /proc/self/status for CapEff line
    // Check bits for CAP_BPF (39) and CAP_PERFMON (38)
    let status = std::fs::read_to_string("/proc/self/status").ok();
    if let Some(status) = status {
        for line in status.lines() {
            if line.starts_with("CapEff:") {
                let hex = line.trim_start_matches("CapEff:").trim();
                if let Ok(caps) = u64::from_str_radix(hex, 16) {
                    let cap_perfmon = 1u64 << 38;
                    let cap_bpf = 1u64 << 39;
                    return (caps & cap_perfmon != 0) && (caps & cap_bpf != 0);
                }
            }
        }
    }
    false
}
```

### Streaming Output Format (D-01, D-02, D-03, D-14)
```rust
// output.rs -- example line format
// PROTO  LOCAL               REMOTE             PID   PROCESS    STATE        BYTES_IN        BYTES_OUT   PKTS_IN  PKTS_OUT  RTT     RATE_IN     RATE_OUT
// TCP    192.168.1.10:54321  93.184.216.34:443  1234  curl       ESTABLISHED  1258291 (1.2M)  340 (340B)  892      12        28.3ms  420.1 KB/s  113B/s
// UDP    0.0.0.0:53          8.8.8.8:53         567   systemd-r  UDP          4096 (4.0K)     128 (128B)  32       1         -       1.3 KB/s    42B/s
// [CLOSED] TCP 192.168.1.10:54321 -> 93.184.216.34:443 (curl, PID 1234)
```

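The human-readable sizes in D-02 need only a small formatting utility. The following is a minimal sketch, assuming 1024-based units (consistent with `4096 (4.0K)` in the example lines); `human_bytes` and `human_rate` are hypothetical names, and the exact spacing and unit suffixes are a formatting choice rather than a settled decision.

```rust
/// Format a byte count with 1024-based units, e.g. 1258291 -> "1.2 MB".
pub fn human_bytes(n: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = n as f64;
    let mut unit = 0;
    // Step up one unit at a time until the value drops below 1024.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain bytes are shown without a fractional part.
        format!("{} B", n)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

/// Format a rate in bytes per second, e.g. 430182.4 -> "420.1 KB/s".
pub fn human_rate(bytes_per_sec: f64) -> String {
    format!("{}/s", human_bytes(bytes_per_sec.round() as u64))
}
```

A raw-plus-readable column as in D-02 can then be built with something like `format!("{} ({})", n, human_bytes(n))`.
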
### rust-toolchain.toml
```toml
[toolchain]
channel = "nightly-2026-01-15"  # Pin a specific nightly
# rust-src lets cargo build `core` for the eBPF target (bpfel-unknown-none),
# which ships without a prebuilt standard library. The target is only
# needed for the -ebpf crate.
components = ["rust-src", "rustfmt", "clippy"]
```

### Workspace Cargo.toml
```toml
[workspace]
members = ["tcptop", "tcptop-common", "tcptop-ebpf"]
resolver = "2"

[workspace.dependencies]
aya = { version = "0.13.1", features = ["async_tokio"] }
aya-log = "0.2"
tokio = { version = "1", features = ["full"] }
anyhow = "1"
thiserror = "2"
clap = { version = "4.6", features = ["derive"] }
log = "0.4"
env_logger = "0.11"
```

### build.rs (Userspace Crate)
```rust
// tcptop/build.rs

fn main() {
    // aya-build compiles the eBPF crate and makes the bytecode available
    // via include_bytes_aligned! in the userspace code.
    //
    // NOTE: the call below is illustrative. The exact `build_ebpf` signature
    // has not been verified (see Open Questions); take the real invocation
    // from the build.rs generated by aya-template.
    aya_build::build_ebpf(&["tcptop-ebpf"]).expect("Failed to build eBPF programs");
}
```

## eBPF Hook Strategy (Claude's Discretion Recommendation)

### Recommended Hook Points

| Hook | Type | Captures | Why This Over Alternatives |
|------|------|----------|---------------------------|
| `tcp_sendmsg` | kprobe | Bytes/packets out, PID, comm, connection tuple, srtt_us | Process context available; msg size in arg2 |
| `tcp_recvmsg` | kprobe | Bytes/packets in, PID, comm, connection tuple, srtt_us | Process context available; bytes actually read are only known from the return value, so a kretprobe is likely needed (see Open Questions) |
| `udp_sendmsg` | kprobe | UDP bytes/packets out, PID, comm, 4-tuple | Only way to track UDP sends with PID attribution |
| `udp_recvmsg` | kprobe | UDP bytes/packets in, PID, comm, 4-tuple | Only way to track UDP receives with PID attribution |
| `sock:inet_sock_set_state` | tracepoint | TCP state transitions (old_state, new_state, tuple) | Stable kernel API; fires on every TCP state change |

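The `old_state` / `new_state` values delivered by `sock:inet_sock_set_state` are the kernel's numeric TCP state codes. A userspace mapping to display names could look like the sketch below; `tcp_state_name` and `is_closed` are hypothetical helper names, and the numeric values follow the kernel's `tcp_states.h` enumeration.

```rust
/// Map a kernel TCP state number to its conventional display name.
pub fn tcp_state_name(state: u32) -> &'static str {
    match state {
        1 => "ESTABLISHED",
        2 => "SYN_SENT",
        3 => "SYN_RECV",
        4 => "FIN_WAIT1",
        5 => "FIN_WAIT2",
        6 => "TIME_WAIT",
        7 => "CLOSE",
        8 => "CLOSE_WAIT",
        9 => "LAST_ACK",
        10 => "LISTEN",
        11 => "CLOSING",
        12 => "NEW_SYN_RECV",
        _ => "UNKNOWN",
    }
}

/// True when a transition ends the connection, which is the point where
/// the streaming output would print its `[CLOSED]` line (D-14).
pub fn is_closed(new_state: u32) -> bool {
    new_state == 7 // TCP_CLOSE
}
```

Keeping the mapping in userspace means the eBPF program only forwards the raw `u32` values, which keeps the kernel side small.
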
### Why Kprobes Over Tracepoints for Data Hooks

Tracepoints such as `tcp:tcp_probe` exist but don't carry byte count information per call. The `tcp_sendmsg` / `tcp_recvmsg` kernel functions take the byte size as a parameter, making kprobes the natural choice. The tradeoff is that kprobes are less stable across kernel versions, but the `tcp_sendmsg` signature has been stable for many kernel versions.

### RTT Strategy

Read `srtt_us` from `struct tcp_sock` during `tcp_sendmsg`/`tcp_recvmsg` kprobes. The sock pointer (arg0) can be cast to `tcp_sock*` to access the smoothed RTT field. Shift right by 3 to get microseconds. This piggybacks on existing hooks -- no additional hook needed.

### Ring Buffer vs PerfEventArray

**Use RingBuf.** Rationale:
- Single shared buffer across all CPUs (simpler code)
- Strong event ordering (important so that state change events are processed after their corresponding data events)
- Precise wakeup notifications (no sampling/watermark tuning)
- Dropped events visible to eBPF side (can count and report)
- Kernel 5.8 is already the minimum requirement

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| xtask build pattern | `aya-build` in build.rs | aya 0.13+ (2025) | Simpler builds, standard `cargo build` works |
| PerfEventArray for all events | RingBuf (kernel 5.8+) | Linux 5.8 (2020) | Better ordering, shared buffer, less userspace code |
| BCC Python + C | Aya pure Rust | 2022+ | Single language, better safety, no runtime dependency on BCC/LLVM |
| Manual eBPF bytecode loading | Aya auto-BTF relocation | aya 0.12+ | Portable across kernel versions without recompilation |

**Deprecated/outdated:**
- RedBPF: Abandoned, superseded by Aya
- tui-rs: Deprecated in favor of ratatui (relevant for Phase 2)
- PerfEventArray for ordered event streams: RingBuf is strictly better when kernel >= 5.8

## Open Questions

1. **tcp_recvmsg Return Value for Byte Count**
   - What we know: tcp_sendmsg has byte count as arg2. tcp_recvmsg may need a kretprobe to capture actual bytes received from the return value.
   - What's unclear: Whether kretprobe on tcp_recvmsg is reliable for byte counts or if we need to read from the msghdr.
   - Recommendation: Start with kretprobe on tcp_recvmsg to capture return value. If unreliable, fall back to reading msg_iter length from the msghdr argument.

2. **IPv6 Struct Layout in eBPF**
   - What we know: IPv4 addresses are in `__sk_common.skc_rcv_saddr` and `__sk_common.skc_daddr`. IPv6 uses `skc_v6_rcv_saddr` and `skc_v6_daddr`.
   - What's unclear: Whether aya-tool generates correct bindings for these fields across kernel versions, or if manual offset calculation is needed.
   - Recommendation: Start with IPv4 only for initial proof-of-life, add IPv6 in a follow-up task within Phase 1. Use `aya-tool generate` to get correct struct offsets.

3. **Exact aya-build Usage**
   - What we know: aya-build 0.1.3 exists and replaces xtask. It's used in build.rs.
   - What's unclear: Exact API surface -- documentation is sparse.
   - Recommendation: Generate a project from aya-template first, then adapt the generated build.rs. The template will show current best practices.

## Sources

### Primary (HIGH confidence)
- [Aya docs.rs - RingBuf](https://docs.rs/aya/latest/aya/maps/ring_buf/struct.RingBuf.html) - RingBuf API, AsyncFd integration
- [Aya docs.rs - aya-ebpf RingBuf](https://docs.rs/aya-ebpf/latest/aya_ebpf/maps/ring_buf/struct.RingBuf.html) - Kernel-side RingBuf reserve/submit API
- [Aya Book - Probes](https://aya-rs.dev/book/programs/probes) - Kprobe definition, attachment, context access
- [Aya Template](https://github.com/aya-rs/aya-template) - Project structure, workspace layout
- crates.io version checks (2026-03-21) - All version numbers verified

### Secondary (MEDIUM confidence)
- [Brendan Gregg - TCP Tracepoints](https://www.brendangregg.com/blog/2018-03-22/tcp-tracepoints.html) - TCP tracepoint list, inet_sock_set_state fields
- [Red Hat - TCP RTT with eBPF](https://developers.redhat.com/articles/2024/02/27/network-observability-using-tcp-handshake-round-trip-time) - srtt_us field reading, fentry approach
- [eunomia eBPF Tutorial 14](https://eunomia.dev/tutorials/14-tcpstates/) - TCP state tracking with inet_sock_set_state
- [Yuki Nakamura - Aya Tracepoint](https://yuki-nakamura.com/2024/07/06/writing-ebpf-tracepoint-program-with-rust-aya-tips-and-example/) - Tracepoint struct generation, context reading
- [eBPF Docs - RingBuf Map Type](https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_RINGBUF/) - Ring buffer kernel semantics

### Tertiary (LOW confidence)
- [OneUptime - eBPF with Rust Aya](https://oneuptime.com/blog/post/2026-01-07-ebpf-rust-aya/view) - General patterns, build pipeline (blog, not official)
- [Deepfence - Aya companion](https://www.deepfence.io/blog/aya-your-trusty-ebpf-companion) - Map sharing patterns (blog)

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH - All versions verified on crates.io; Aya is the only viable Rust eBPF library
- Architecture: HIGH - Follows established Aya patterns (template structure, RingBuf, kprobe+tracepoint combo); validated against multiple sources
- Hook strategy: MEDIUM-HIGH - Kprobe targets well-established; srtt_us reading is proven technique; tcp_recvmsg byte capture needs runtime validation
- Pitfalls: HIGH - Based on documented kernel behaviors and community-known issues

**Research date:** 2026-03-21
**Valid until:** 2026-04-21 (30 days -- Aya ecosystem is stable at 0.13.x)