the hard way
Data Balancing
The Kafka 0.7 cluster has been stable and well-balanced from the beginning.
Kafka 0.8 introduced some new changes:
Partition assignment.
Data replication feature.
Data Balancing: Partition assignment
Cannot use F5 for load balancing.
Load among brokers out of balance.
Monitor disk usage with Bosun (oops).
Occasional maintenance with kafka-reassign-partitions.sh.
Data Balancing: Data replication feature
Switched from RAID-10 to JBOD as recommended on the Kafka website.
Drives were severely out of balance.
A bad drive brings down the whole broker.
Switching back to RAID-10.
Monitor disk usage with Bosun (oops).
Data Balancing: Our own bugs
Log forwarder topic explosion.
Cap on the number of forensic topics per stack.
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?
How do we scale going forward?
Increased Data: Why is there so much more data in the Kafka 0.8 cluster?
Log forwarder EOF bug.
Fixed.
Preventative measures going forward:
Monitor topics with Spark.
quota.producer.default property in Kafka 0.9.
Increased Data: Why is there so much more data in the Kafka 0.8 cluster?
Snappy compression.
Brokers are I/O bound.
Switched back to gzip for forensic data.
Continue to use Snappy for binary Avro data.
Increased Data: Why is there so much more data in the Kafka 0.8 cluster?
Duplicate data.
Forwarder sends logs to both eventdata and stack-specific topics.
Handle multiple topics with Camus or Gobblin.
Increased Data: How do we scale going forward?
Add nodes to Kafka cluster.
Repurpose 0.7 servers.
Separate Kafka clusters for business and forensic data.
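
Adding nodes does not rebalance existing partitions by itself; they have to be moved with kafka-reassign-partitions.sh, the same tool mentioned earlier for occasional maintenance. As a rough sketch of what a reassignment plan could look like, the Python below builds the JSON that the tool reads through --reassignment-json-file. The broker IDs, partition count, and replication factor are hypothetical, and the naive round-robin placement is an assumption made for illustration, not Kafka's own assignment algorithm or what was actually run on this cluster.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Spread a topic's partitions round-robin over broker_ids.

    Hypothetical helper: placement is plain round-robin, chosen only
    to illustrate the shape of the reassignment JSON.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of brokers")

    partitions = []
    for p in range(num_partitions):
        # The first replica is the preferred leader; the remaining
        # replicas go on the next brokers in the list, wrapping around.
        replicas = [
            broker_ids[(p + r) % len(broker_ids)]
            for r in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})

    return {"version": 1, "partitions": partitions}


# Assumed values: 6 partitions of "eventdata" over brokers 1-4
# (broker 4 standing in for a newly added node), replication factor 2.
plan = build_reassignment("eventdata", 6, [1, 2, 3, 4], 2)
print(json.dumps(plan, indent=2))
```

The printed JSON could be saved to a file and passed to kafka-reassign-partitions.sh with --execute. Moving partitions copies data between brokers, so on an I/O-bound cluster it is probably safer to move a few partitions at a time.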