Autonomous Drone Herding System
A real-time multi-agent simulation pipeline for drone herding and flocking control. The goal: process thousands of geospatial updates per frame so you can run large-scale flock scenarios and test control algorithms without waiting. Full report.
The natural question was whether to write it in C or C++ for performance. We stuck with Python because we needed to iterate fast — the control algorithms were changing constantly, and recompiling after every tweak would've killed our feedback loop. The bet was that we could get Python close enough to native speed with Numba JIT without giving up the ability to experiment quickly. That bet paid off.
What I built
Performance. Vectorized core computations with Numba JIT compilation — 50–130x speedup over the baseline. The bottleneck was always going to be neighbor search and movement updates; making those fast enough for real-time was the main challenge. Structuring the code so the compiler could see the hot loops was half the work.
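To make that concrete, here is a minimal sketch of the kind of hot loop Numba compiles well: a brute-force movement update with simple cohesion and separation terms. The function name, the 2-D layout, and the steering weights are illustrative assumptions, not the project's actual kernels.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True, fastmath=True)
def update_positions(pos, vel, dt, neighbor_radius, cohesion_w, separation_w):
    """One movement step: each agent steers toward nearby agents (cohesion)
    and away from very close ones (separation). Illustrative logic only."""
    n = pos.shape[0]
    new_vel = np.empty_like(vel)
    for i in prange(n):
        coh = np.zeros(2)
        sep = np.zeros(2)
        count = 0
        for j in range(n):
            if i == j:
                continue
            dx = pos[j, 0] - pos[i, 0]
            dy = pos[j, 1] - pos[i, 1]
            d2 = dx * dx + dy * dy
            if d2 < neighbor_radius * neighbor_radius:
                # Pull toward the average offset of all neighbors in range.
                coh[0] += dx
                coh[1] += dy
                count += 1
                # Push away from neighbors that are very close.
                if 1e-9 < d2 < (0.25 * neighbor_radius) ** 2:
                    sep[0] -= dx / d2
                    sep[1] -= dy / d2
        if count > 0:
            coh /= count
        new_vel[i, 0] = vel[i, 0] + dt * (cohesion_w * coh[0] + separation_w * sep[0])
        new_vel[i, 1] = vel[i, 1] + dt * (cohesion_w * coh[1] + separation_w * sep[1])
    return pos + dt * new_vel, new_vel
```

Called once per frame on (N, 2) float64 arrays. The first call pays the JIT compilation cost; after that the nested loop runs as parallel native code instead of interpreted Python.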
Neighbor caching. A hybrid strategy with movement thresholds amortizes k-NN search cost. Agents only recompute neighbors when they've moved enough to matter. That yielded 17–30% additional throughput for large flocks. This wasn't just an optimization — it reshaped how the simulation models space.
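The idea is sketched below with SciPy's cKDTree standing in for the k-NN search; the class name, the default threshold, and the per-frame tree rebuild are simplifications for illustration, not the project's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

class NeighborCache:
    """Recompute an agent's k nearest neighbors only after it has moved
    more than `threshold` since its last query. Illustrative sketch."""

    def __init__(self, k=8, threshold=0.5):
        self.k = k
        self.threshold = threshold
        self.cached_pos = None   # each agent's position at its last query
        self.neighbors = None    # cached neighbor indices, shape (n, k)

    def query(self, pos):
        # Simplification: the tree is rebuilt every frame; only the
        # per-agent queries are skipped for agents that barely moved.
        tree = cKDTree(pos)
        if self.cached_pos is None:
            # First frame: every agent gets a fresh query.
            _, idx = tree.query(pos, k=self.k + 1)
            self.neighbors = idx[:, 1:]          # drop self (distance 0)
            self.cached_pos = pos.copy()
            return self.neighbors
        # Agents that crossed the movement threshold pay for a new search.
        moved = np.linalg.norm(pos - self.cached_pos, axis=1) > self.threshold
        if moved.any():
            _, fresh = tree.query(pos[moved], k=self.k + 1)
            self.neighbors[moved] = fresh[:, 1:]
            self.cached_pos[moved] = pos[moved]
        return self.neighbors
```

In a typical frame most agents sit below the threshold, so only a small fraction of them trigger a fresh k-NN query, which is where the extra throughput comes from.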
Real-time sync. Replaced HTTP polling with Server-Sent Events for state synchronization. Sub-second latency for multi-drone coordination — the simulation and the application stay in sync without blocking.
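A minimal sketch of what such an SSE endpoint can look like, written here with Flask; the actual stack may differ, and the snapshot() accessor and stub simulation object are hypothetical stand-ins.

```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)

class _StubSim:
    """Stand-in for the real simulator; the project exposes its own state object."""
    def snapshot(self):
        return {"t": time.time(), "drones": []}

simulation = _StubSim()

def stream_state(sim):
    """Yield simulation state as Server-Sent Events: a 'data:' line per update,
    terminated by a blank line, over one long-lived HTTP response."""
    while True:
        payload = json.dumps(sim.snapshot())
        yield f"data: {payload}\n\n"
        time.sleep(0.05)  # ~20 updates/sec; tune to the simulation frame rate

@app.route("/events")
def events():
    return Response(stream_state(simulation), mimetype="text/event-stream")
```

On the client side a single EventSource connection to /events replaces the poll loop: each state update is pushed over the open connection instead of being fetched with a new request.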
Architecture and tests. Separated simulator and application layers so control logic could evolve independently. Added 70+ tests to enable safe refactoring across the backend.
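One concrete benefit of the split is that flock dynamics can be tested with no web layer in the loop. The sketch below is purely illustrative: the Simulator stub, its step and positions attributes, and the bounds invariant are hypothetical stand-ins for whatever the real test suite checks.

```python
import numpy as np

class Simulator:
    """Minimal stand-in for the simulator layer (illustrative only)."""
    def __init__(self, n_agents, bounds, seed=0):
        rng = np.random.default_rng(seed)
        self.bounds = bounds
        self.positions = rng.uniform(bounds[0], bounds[1], size=(n_agents, 2))
        self.velocities = rng.normal(0.0, 1.0, size=(n_agents, 2))

    def step(self, dt):
        # Move agents and keep them inside the arena.
        self.positions = np.clip(self.positions + dt * self.velocities,
                                 self.bounds[0], self.bounds[1])

def test_agents_stay_within_bounds():
    sim = Simulator(n_agents=100, bounds=(0.0, 100.0), seed=42)
    for _ in range(500):
        sim.step(dt=0.02)
    assert np.all(sim.positions >= 0.0)
    assert np.all(sim.positions <= 100.0)
```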
What I learned
Language choice is a design decision, not a default. Python + Numba gave us 50–130x over baseline while keeping iteration speed high. A C rewrite would've been faster in absolute terms but slower in every other way that mattered for a project where the algorithms were still being figured out.
Caching changes your model. The neighbor-caching strategy forced us to decide what "close enough" meant in the simulation. That's not a performance question — it's a modeling question.
Real-time means different constraints. Polling felt simpler until we hit latency and load. SSE gave us continuous updates without the overhead of opening a new connection every few hundred milliseconds.
Built with