
Kafka lessons that we learned the hard way

Data Balancing
The Kafka 0.7 cluster has been stable and well-balanced from the beginning.
Kafka 0.8 introduced some changes:

Partition assignment.
Data replication feature.
Data Balancing
Partition assignment

Cannot use F5 for load balancing.
Load among brokers out of balance.
Monitor disk usage with Bosun (oops).
Occasional maintenance with kafka-reassign-partitions.sh.
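
A minimal sketch of how a reassignment plan for kafka-reassign-partitions.sh might be generated. The JSON layout (version, partitions, topic, partition, replicas) is the one the tool reads from its --reassignment-json-file option. The partition count, broker ids, and replication factor below are made-up values, and the round-robin placement is only one possible strategy, not necessarily the one used on this cluster.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Spread the replicas of each partition round-robin across the brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds broker count")
    n = len(broker_ids)
    partitions = []
    for p in range(num_partitions):
        # Partition p starts at broker index p mod n, then takes the next ones.
        replicas = [broker_ids[(p + r) % n] for r in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Hypothetical values: 6 partitions, brokers 1-3, 2 replicas each.
plan = build_reassignment("eventdata", 6, [1, 2, 3], 2)
print(json.dumps(plan, indent=2))
```

The printed JSON can be saved to a file and passed to the tool with --reassignment-json-file and --execute.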
Data Balancing
Data replication feature

Switched from RAID-10 to JBOD as recommended on Kafka website.
Drives were severely out-of-balance.
A bad drive brings down the whole broker.
Switching back to RAID-10.
Monitor disk usage with Bosun (oops).
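
A rough sketch of the kind of per-drive balance check that disk monitoring could run on a JBOD broker. This is plain Python, not a Bosun rule. The mount points, byte counts, and the 1.5x alert threshold are assumptions for illustration; real values could come from shutil.disk_usage.

```python
def disk_imbalance(used_bytes_by_drive):
    """Return (fullest drive usage / mean usage, fullest drive)."""
    if not used_bytes_by_drive:
        raise ValueError("no drives given")
    mean = sum(used_bytes_by_drive.values()) / len(used_bytes_by_drive)
    fullest = max(used_bytes_by_drive, key=used_bytes_by_drive.get)
    ratio = used_bytes_by_drive[fullest] / mean if mean else 0.0
    return ratio, fullest


# Made-up usage figures for three hypothetical data drives.
usage = {"/data1": 900, "/data2": 300, "/data3": 300}
ratio, drive = disk_imbalance(usage)
if ratio > 1.5:  # alert threshold is an assumption
    print(f"{drive} holds {ratio:.1f}x the mean usage")
```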
Data Balancing
Our own bugs

Log forwarder topic explosion.
Cap on number of forensic topics per stack.
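
One way a per-stack topic cap could look inside a forwarder, as a sketch only. The class name TopicCap, its method, and the limit value are hypothetical; the slides do not say how the actual cap was implemented.

```python
from collections import defaultdict


class TopicCap:
    """Refuse new topics for a stack once it already has max_topics of them."""

    def __init__(self, max_topics_per_stack):
        self.max_topics = max_topics_per_stack
        self.topics = defaultdict(set)  # stack name -> topics seen so far

    def allow(self, stack, topic):
        known = self.topics[stack]
        if topic in known:
            return True  # existing topics are always allowed
        if len(known) >= self.max_topics:
            return False  # cap reached: do not create another topic
        known.add(topic)
        return True


cap = TopicCap(max_topics_per_stack=2)  # limit chosen for illustration
for name in ["auth", "web", "db"]:
    print(name, cap.allow("stack-a", name))
```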
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?
How do we scale going forward?
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Log forwarder EOF bug.
Fixed.
Preventative measures going forward:
Monitor topics with Spark.
quota.producer.default property in Kafka 0.9.
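
A sketch of the sort of check topic monitoring might perform to catch a runaway producer early. The slides mention Spark for this; the example below is plain Python over already-collected byte counts, and the function name, the 5x factor, and the sample numbers are assumptions.

```python
from statistics import median


def runaway_topics(bytes_per_interval, factor=5.0):
    """Flag topics whose latest interval exceeds factor x their earlier median."""
    flagged = []
    for topic, series in bytes_per_interval.items():
        if len(series) < 2:
            continue  # not enough history to form a baseline
        baseline = median(series[:-1])
        if baseline > 0 and series[-1] > factor * baseline:
            flagged.append(topic)
    return sorted(flagged)


# Made-up byte counts per interval for two hypothetical topics.
history = {"forensic-a": [10, 12, 11, 100], "forensic-b": [10, 11, 12, 13]}
print(runaway_topics(history))
```

Monitoring like this only detects a problem. The quota.producer.default broker property in Kafka 0.9 is a default byte-rate limit per producer client id, enforced by the broker itself.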
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Snappy compression. Brokers are I/O bound.
Switched back to gzip for forensic data.
Continue to use Snappy for binary Avro data.
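
A small sketch for estimating how well gzip compresses text logs, which is the trade-off behind this choice: gzip generally produces smaller output than Snappy on text, at a higher CPU cost. Only gzip is measured here because Snappy is not in the Python standard library. The sample log lines are synthetic, so the ratio is illustrative only.

```python
import gzip


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload, compresslevel=level))


# Synthetic log lines standing in for forensic data (not real traffic).
lines = [
    f"2015-01-01T00:00:{i % 60:02d} INFO request id={i} status=200\n"
    for i in range(2000)
]
sample = "".join(lines).encode("utf-8")
print(f"gzip ratio on sample log text: {gzip_ratio(sample):.1f}x")
```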
Increased Data
Why is there so much more data in the Kafka 0.8 cluster?

Duplicate data. Forwarder sends logs to eventdata and stack-specific topics.
Handle multiple topics with Camus or Gobblin.
Increased Data
How do we scale going forward?

Add nodes to Kafka cluster.
Repurpose 0.7 servers.
Separate Kafka clusters for business and forensic data.
