Context

Commercial environment requiring sustained high-volume event processing.

  • Peak throughput: ~120,000 messages/sec
  • Environment: Multi-cluster Kubernetes deployments
  • Constraints: Cost sensitivity during off-peak load; requirement for horizontal scaling without latency disruption.

Challenge

  • Sustain ~120k msg/sec under peak load.
  • Reduce infrastructure cost by approximately 50% during low-demand periods.
  • Avoid consumer group rebalances during scaling events.
  • Maintain deterministic behavior during scale-up and scale-down transitions.
  • Support environment isolation across multiple client deployments.

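Adding or removing members in a standard Kafka consumer group triggers a partition rebalance. While partitions are being reassigned, affected consumers stop processing, which shows up as latency spikes and temporary stalls. The system needed to scale horizontally without incurring these penalties.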
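
As a rough illustration of why group membership changes are disruptive, the toy model below assigns partitions to consumers round-robin and counts how many partitions change owner when a fourth consumer joins a group of three. This is a simplified sketch, not Kafka's actual assignor logic: the function names, the partition count, and the consumer names are all assumptions made for the example, and Kafka's sticky assignors are designed to move fewer partitions than this.

```python
def round_robin_assign(partitions, members):
    """Toy assignment: partition at index i goes to member i mod len(members).

    Illustrative only; not Kafka's real assignor implementation.
    """
    return {p: members[i % len(members)] for i, p in enumerate(partitions)}


def moved_partitions(before, after):
    """Return the partitions whose owner differs between two assignments."""
    return sorted(p for p in before if before[p] != after[p])


# Hypothetical topic with 12 partitions and a group growing from 3 to 4 members.
partitions = list(range(12))
before = round_robin_assign(partitions, ["c0", "c1", "c2"])
after = round_robin_assign(partitions, ["c0", "c1", "c2", "c3"])

moved = moved_partitions(before, after)
print(f"{len(moved)} of {len(partitions)} partitions changed owner")
```

In this toy model most partitions move even though only one consumer was added. Every moved partition is a point where processing can pause while ownership is handed over.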

Architecture

Platform

  • Streaming: Apache Kafka (self-managed and managed variants).
  • Infrastructure: Kubernetes-based application deployment.
  • Services: Node.js streaming applications.

Control Plane

  • Orchestration: Custom Kubernetes Operator (Python).
  • Delivery: GitOps deployment model using Argo CD.

Scaling Model

Implemented a rebalance-avoidance strategy:

  • Scaling events triggered by observed throughput metrics.
  • New workers launched in independent consumer groups.
  • Traffic shifted to new workers after stabilization.
  • Existing workers drained prior to decommissioning.
  • Scale-down reversed the process, shifting traffic before termination.

Because new workers joined their own consumer groups rather than existing ones, scaling events did not reassign partitions among running consumers, which eliminated rebalance-induced latency spikes.
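
The scaling sequence above can be sketched as a sizing function plus an ordered list of transition steps. This is a minimal sketch under stated assumptions, not the operator's actual code: `ScalingPolicy`, `desired_workers`, `plan_transition`, the per-worker capacity of 5,000 msg/sec, the 20% headroom, and the 60,000 msg/sec off-peak rate are all hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    LAUNCH_NEW_GROUPS = "launch new workers in independent consumer groups"
    AWAIT_STABLE = "wait for new workers to stabilize"
    SHIFT_TRAFFIC = "shift traffic"
    DRAIN = "drain workers being retired"
    DECOMMISSION = "decommission drained workers"


@dataclass(frozen=True)
class ScalingPolicy:
    per_worker_capacity: float  # msg/sec one worker sustains (hypothetical)
    headroom: float = 0.2       # fraction of spare capacity kept in reserve
    min_workers: int = 1


def desired_workers(observed_rate, policy):
    """Workers needed to absorb the observed throughput plus headroom."""
    required = observed_rate * (1 + policy.headroom) / policy.per_worker_capacity
    return max(policy.min_workers, math.ceil(required))


def plan_transition(current, target):
    """Ordered steps for a scaling event; empty if no change is needed."""
    if target > current:
        # Scale-up: new groups come up first, old workers drain last.
        return [Step.LAUNCH_NEW_GROUPS, Step.AWAIT_STABLE,
                Step.SHIFT_TRAFFIC, Step.DRAIN, Step.DECOMMISSION]
    if target < current:
        # Scale-down: traffic moves away before anything is terminated.
        return [Step.SHIFT_TRAFFIC, Step.DRAIN, Step.DECOMMISSION]
    return []


policy = ScalingPolicy(per_worker_capacity=5_000)  # assumed capacity
peak = desired_workers(120_000, policy)            # peak rate from the text
off_peak = desired_workers(60_000, policy)         # hypothetical off-peak rate

for step in plan_transition(current=off_peak, target=peak):
    print(step.value)
```

Keeping the plan as an explicit, ordered list means the same inputs always produce the same sequence of steps, which is one way to get the deterministic transitions the challenge calls for.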

Operational Model

  • Declarative Kubernetes deployments.
  • GitOps-driven environment promotion.
  • Environment isolation for multi-client clusters.
  • Deterministic rollout and rollback through a versioned control plane.
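
One way to picture the declarative, versioned model is a reconcile function that compares a desired specification with the observed state and returns the actions needed to converge. The sketch below is illustrative only: `DesiredState`, `reconcile`, the version labels, and the image names are hypothetical and do not come from the project's operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredState:
    version: str  # label of the versioned spec (hypothetical)
    image: str    # application image to run (hypothetical)
    workers: int  # number of streaming workers


def reconcile(desired, observed):
    """Return the actions that move `observed` to `desired`.

    The result depends only on the two inputs, so the same pair of
    states always yields the same actions. An empty list means converged.
    """
    actions = []
    if observed.image != desired.image:
        actions.append(f"set image to {desired.image}")
    if observed.workers != desired.workers:
        actions.append(f"set workers to {desired.workers}")
    return actions


# Versioned history of specs; rollback re-applies an earlier entry.
v1 = DesiredState(version="v1", image="stream-app:1.0", workers=12)
v2 = DesiredState(version="v2", image="stream-app:1.1", workers=12)

print(reconcile(desired=v2, observed=v1))  # rollout from v1 to v2
print(reconcile(desired=v1, observed=v2))  # rollback from v2 to v1
```

Treating rollback as "reconcile toward an earlier versioned spec" keeps rollout and rollback on the same code path, which is what makes both directions predictable.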

Outcomes

  • Sustained ~120k msg/sec at peak.
  • Achieved ~50% infrastructure cost reduction during off-peak periods.
  • Eliminated consumer group rebalance disruptions during scaling.
  • Delivered a scaling model that ran in production under sustained load.