Core Idea
Failure patterns are treated as information, not just noise.
This page is for research work that sits between systems, human understanding, and real-world decision-making. The first featured project is EGESS, a protocol for turning swarm failure, disagreement, and recovery into an interpretable hazard signal.
It uses absence tomography, 2-bit neighbor states, and a conservative `T` score to estimate direction, distance, approach speed, and recovery in a distributed swarm.
EGESS estimates where the hazard is coming from, how close it is, and whether it is spreading or recovering.
The project includes a Python node network, a visual demo, phase-based evaluation, and exportable paper evidence.
EGESS is a distributed sensing protocol built around a simple but unusual idea: the pattern of node failure can become the signal. Instead of only asking sensors to report the environment, each node watches reachability, disagreement, spread, and recovery across its local neighborhood. I describe this as absence tomography: using what disappears at the edges of the network to reconstruct where hazard pressure is moving.
The improved model separates sudden destruction from persistent spread, then combines them into a conservative instability score called `T`. That makes EGESS useful as a swarm-systems prototype, a fault-aware inference tool, and a repeatable evaluation harness for resilient distributed coordination.
Every cycle, a node polls nearby peers with a lightweight status request. If a peer stops responding, EGESS treats that as possible destruction. If a peer is reachable but reports a much higher instability state, EGESS treats that as possible spread. Recovery is also part of the model: when missing nodes return and spread flattens, the system can shift from warning or impact toward contained recovery.
EGESS treats missing neighbors as an edge-mapped signal. The absence pattern becomes a way to infer hazard shape, direction, and pressure.
Destruction lane: confirms sudden missing or unreachable neighbors. This lane is sensitive to fresh breakage and moving impact.
Spread lane: confirms persistent disagreement and instability around a node. This lane captures hazard pressure even before direct loss.
Recovery: tracks returning nodes and stalled spread so recovery is explainable instead of being treated as a silent reset.
`T` is the conservative combined instability score. The node listens to the strongest danger signal rather than averaging risk away.
Each node sends a compact 2-bit alert state to its six neighboring nodes. That gives every node a local hazard view without requiring global knowledge of the whole network.
`00` normal, `01` warning, `10` watch, `11` impact.
E and NE are fine. NW has `T = 3` and slope `+1.5`. W has `T = 1` and slope `+0.5`. The local pattern says the strongest pressure is northwest and getting worse.
In simpler terms: weighted voting becomes direction, direction plus distance becomes movement, and movement gives an ETA. Glassboro can conclude: something hazardous is approaching from the northwest, about four hops away, moving at about 1.2 hops per cycle, and likely to arrive in a little over three cycles.
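The direction vote and ETA in this example can be sketched in a few lines of Python. This is a minimal illustration, not the EGESS implementation: the hex-grid bearings, the weighting rule (`T` plus positive slope), the zero readings for the "fine" E and NE neighbors, and the function names are all assumptions made for the sketch.

```python
import math

# Assumed hex-grid bearings (degrees, counter-clockwise from east) for six neighbors.
BEARINGS = {"E": 0, "NE": 60, "NW": 120, "W": 180, "SW": 240, "SE": 300}

def pressure_bearing(readings):
    """readings maps direction -> (T, slope). Returns a bearing in degrees, or None."""
    x = y = 0.0
    for direction, (t, slope) in readings.items():
        weight = t + max(slope, 0.0)  # assumed weighting: level plus worsening trend
        angle = math.radians(BEARINGS[direction])
        x += weight * math.cos(angle)
        y += weight * math.sin(angle)
    if math.hypot(x, y) < 1e-9:
        return None  # no pressure from any side
    return math.degrees(math.atan2(y, x)) % 360

def nearest_direction(bearing):
    """Snap a bearing to the closest labelled neighbor direction."""
    return min(BEARINGS, key=lambda d: abs((BEARINGS[d] - bearing + 180) % 360 - 180))

def eta_cycles(distance_hops, speed_hops_per_cycle):
    """Cycles until arrival; infinite if the hazard is not approaching."""
    if speed_hops_per_cycle <= 0:
        return math.inf
    return distance_hops / speed_hops_per_cycle

# Values from the example; E and NE are assumed to report zero.
readings = {"E": (0, 0.0), "NE": (0, 0.0), "NW": (3, 1.5), "W": (1, 0.5)}
bearing = pressure_bearing(readings)
direction = nearest_direction(bearing)
eta = eta_cycles(4, 1.2)
print(direction, round(eta, 2))
```

With these inputs the vote lands between NW and W but closer to NW, and four hops at 1.2 hops per cycle gives an ETA of about 3.3 cycles, matching the "little over three cycles" in the text.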
New disappearance: a neighbor that was reachable last cycle and fails this cycle. This is the strongest signal for fresh front movement.
Persistent loss: a neighbor that was already gone and is still gone. This matters for impact and severity, even if the damage is no longer new.
Recovery: a neighbor that was missing and then comes back. Recovery is part of the sensing model, not just a cleanup event.
Disagreement: a neighbor whose instability is already much higher than mine. This acts as a directional warning and confidence signal.
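The four neighbor observations above amount to a small classification over two consecutive cycles. The sketch below shows one way to express that. The enum names, the `classify_neighbor` function, and the `DISAGREEMENT_GAP` threshold for "much higher" are hypothetical and are not taken from the EGESS code.

```python
from enum import Enum

class NeighborEvent(Enum):
    NEW_MISS = "new_miss"                      # reachable last cycle, fails now
    PERSISTENT_MISS = "persistent_miss"        # already gone, still gone
    RECOVERED = "recovered"                    # was missing, now back
    HIGHER_INSTABILITY = "higher_instability"  # reachable, but much worse off than me
    STEADY = "steady"                          # nothing notable

DISAGREEMENT_GAP = 2  # assumed threshold for "much higher" instability

def classify_neighbor(was_reachable, is_reachable, neighbor_t=None, my_t=0):
    """Classify one neighbor from last cycle's and this cycle's reachability."""
    if was_reachable and not is_reachable:
        return NeighborEvent.NEW_MISS
    if not was_reachable and not is_reachable:
        return NeighborEvent.PERSISTENT_MISS
    if not was_reachable and is_reachable:
        return NeighborEvent.RECOVERED
    if neighbor_t is not None and neighbor_t - my_t >= DISAGREEMENT_GAP:
        return NeighborEvent.HIGHER_INSTABILITY
    return NeighborEvent.STEADY

print(classify_neighbor(True, False))                        # NeighborEvent.NEW_MISS
print(classify_neighbor(True, True, neighbor_t=3, my_t=0))   # NeighborEvent.HIGHER_INSTABILITY
```

For simplicity the sketch returns a single label per neighbor. A returning neighbor that also reports high instability would need both signals recorded in a fuller model.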
The front score answers: “Is danger moving toward me?” It emphasizes new disappearances, then reinforces that signal with disagreement, corroboration, persistent loss, and momentum.
The impact score answers: “How bad is the local damage right now?” It counts total missing neighbors and increases when those losses are adjacent, because clustering suggests real local concentration rather than random isolated failure.
The recovery score answers: “Has the event stalled or started recovering?” It rises when no new damage appears for several cycles and neighbors begin returning, then falls if the front keeps spreading sideways.
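The local-damage measure described above (count the missing neighbors, then add weight when losses sit next to each other) can be illustrated with a short sketch. The ring ordering of the six neighbors, the `ADJACENCY_BONUS` value, and the function name are assumptions for illustration. The actual EGESS weights are not given on this page.

```python
# Assumed ring order of the six neighbors, so consecutive entries are adjacent.
RING = ["E", "NE", "NW", "W", "SW", "SE"]
ADJACENCY_BONUS = 0.5  # hypothetical extra weight per adjacent pair of losses

def impact_score(missing):
    """Count missing neighbors and add a bonus for each adjacent missing pair."""
    missing = set(missing)
    count = sum(1 for d in RING if d in missing)
    adjacent_pairs = sum(
        1
        for i, d in enumerate(RING)
        if d in missing and RING[(i + 1) % len(RING)] in missing
    )
    return count + ADJACENCY_BONUS * adjacent_pairs

clustered = impact_score({"NW", "W"})   # two losses side by side
scattered = impact_score({"NW", "SE"})  # two losses on opposite sides
print(clustered, scattered)
```

Two adjacent losses score higher than two scattered ones, which is the clustering effect the text describes.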
The combined score `T` is deliberately conservative: whichever is worse between approaching danger and local damage becomes the node’s working instability value. That makes the signal easier to reason about in both the visualizer and the protocol logs.
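As a minimal sketch of that rule, assuming the two inputs are already on a comparable scale, the combination is a maximum rather than a mean. The function name and the sample values are hypothetical.

```python
def combined_t(front_score: float, impact_score: float) -> float:
    """Conservative combination: the worse of the two signals wins."""
    return max(front_score, impact_score)

front, impact = 3.0, 0.5      # hypothetical scores: strong approach, little local damage
t = combined_t(front, impact)
mean = (front + impact) / 2   # what averaging would report instead
print(t, mean)
```

Here the maximum keeps the node at 3.0, while an average would dilute the same evidence to 1.75.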
EGESS now has a phase-based evaluation runner for exact active windows of `60s` or `120s`. The harness can run steady baselines, moving hazards, fire/bomb spread, and adversarial stress while collecting compact evidence for dashboards, spreadsheets, figures, and paper appendices.
Baseline: measures steady-state reachability, throughput, overhead, and per-node load without injected damage.
Fire/bomb: simulates center ignition, hop-based spread, temporary bomb impact, and recovery trailing the front.
Tornado sweep: moves a hazard band across the grid so local and far watch nodes can show detection over distance.
Adversarial: injects false unavailability, lying sensors, noisy behavior, flapping, and recovery to test robustness.
EGESS runs as a real multi-node simulation with `node.py`, background protocol loops, trigger tooling, and fault injection controls.
The local project includes a full `egess_protocol.html` overview explaining the model, commands, and scoring behavior.
`egess-demo` includes a browser-based visualizer that exposes missing neighbors, disagreement, front score, impact score, and `T`.
The repo supports staged runs like baseline, fire/bomb, tornado sweep, adversarial noise, and recovery.
`demo_proof.sh` and `run_paper_eval.sh` export event logs, compact evidence, TSV summaries, dashboards, and portable bundles.
The toolchain includes a terminal monitor, visual inspector, merged dashboards, figure exports, and Google Sheets-ready CSVs.
EGESS matters because it treats failure patterns as information rather than only as noise. The improved version is stronger because it is no longer only a visual swarm idea: it has a detector layout, a proof runner, exact evaluation windows, storage-safe collection, cross-protocol comparison scaffolding, and exportable evidence. That makes it relevant to hazard inference, distributed systems, resilience engineering, and HCI for interpretable autonomy.