Building a CDC pipeline with Debezium — without breaking the source
Lessons from tailing a multi-tenant Mongo oplog into Kafka without putting the source database under more load than it already had.
// distributed systems
Mongo → Kafka via Debezium
oplog CDC at multi-tenant scale
Stub. Full post body to follow.
The shape of the problem: a multi-tenant Mongo cluster already running at ~70% of its IOPS budget, and a downstream analytics pipeline that needs every change in near-real-time. The naïve thing, polling collections on a timer, adds query load to a cluster that has none to spare. Debezium’s oplog tailing approach is read-only from Mongo’s perspective and, done right, costs the source almost nothing.
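To make the approach concrete, here is a minimal sketch of the registration payload a Debezium MongoDB connector might take through Kafka Connect. It is not the configuration from this pipeline. The property names assume Debezium 2.x, so check them against the version you run. The connector name, host, topic prefix, and collection names are all hypothetical. Recent Debezium releases also read changes through MongoDB change streams, which are backed by the oplog, rather than tailing the oplog collection directly.

```python
import json


def build_connector_config(name, connection_string, topic_prefix, collections):
    """Build a Kafka Connect registration payload for a Debezium MongoDB connector.

    Property names assume Debezium 2.x; older releases use different keys.
    """
    if not collections:
        # An explicit include list keeps the connector from capturing every
        # collection in the cluster.
        raise ValueError("list at least one collection to capture")
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
            "mongodb.connection.string": connection_string,
            # Prefix for the Kafka topics the connector writes to.
            "topic.prefix": topic_prefix,
            # Comma-separated, fully qualified names (database.collection).
            "collection.include.list": ",".join(collections),
            "tasks.max": "1",
        },
    }


# Hypothetical values, for illustration only.
payload = build_connector_config(
    name="tenant-cdc",
    connection_string="mongodb://mongo.example.internal:27017/?replicaSet=rs0",
    topic_prefix="cdc",
    collections=["tenants.orders", "tenants.events"],
)
print(json.dumps(payload, indent=2))
```

The printed JSON is the body you would POST to the Kafka Connect REST endpoint for connectors. Limiting capture to an explicit include list is one way to keep the connector's footprint small, since only the listed collections are captured.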