PayPal runs 1.5K Kafka brokers worldwide and hosts over 20K topics.
A recent post on their engineering blog explains how they did it:
  1. Cluster Management: Instead of having fixed addresses for their brokers, PayPal created a service that dynamically handles these addresses.
  2. Access Control Lists (ACLs): PayPal set up a system to ensure only authorized applications could access specific Kafka clusters and topics, enhancing security.
  3. Monitoring and Alerting: PayPal collects extensive data on the health and performance of its Kafka clusters to keep them reliable. An alert is sent out when something abnormal happens, so the team can deal with the problem immediately.
  4. QA Environment: PayPal built a separate QA environment that mirrors the production setup, where developers can test freely without affecting it.
  5. Topic Onboarding: Teams must submit a request to create a new topic. PayPal's Kafka team reviews each request, confirms that the platform can handle it, and sets up any required security measures before approving it.
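
To make points 2 and 5 more concrete, here is a minimal Python sketch of how a topic request might be checked against capacity and then turned into access rules. It is an illustration only, not PayPal's implementation: the `TopicRequest` and `Cluster` classes, the `max_partitions` limit, and the app names are all hypothetical, and it uses no Kafka client library.

```python
from dataclasses import dataclass, field


@dataclass
class TopicRequest:
    """A team's request for a new topic (hypothetical shape)."""
    topic: str
    partitions: int
    producers: list[str]   # app ids allowed to write
    consumers: list[str]   # app ids allowed to read


@dataclass
class Cluster:
    """Toy model of a cluster with a capacity limit and per-topic ACLs."""
    name: str
    max_partitions: int            # assumed capacity limit, for illustration
    used_partitions: int = 0
    acls: dict = field(default_factory=dict)  # topic -> {"write": set, "read": set}

    def approve(self, req: TopicRequest) -> bool:
        """Approve a request only if the topic is new and capacity allows it."""
        if req.topic in self.acls:
            return False  # topic already exists
        if self.used_partitions + req.partitions > self.max_partitions:
            return False  # the cluster cannot handle the extra partitions
        self.used_partitions += req.partitions
        self.acls[req.topic] = {
            "write": set(req.producers),
            "read": set(req.consumers),
        }
        return True

    def is_allowed(self, app: str, topic: str, operation: str) -> bool:
        """Deny by default: unknown topics, apps, or operations are rejected."""
        return app in self.acls.get(topic, {}).get(operation, set())


cluster = Cluster(name="qa-1", max_partitions=10)
accepted = cluster.approve(TopicRequest("payments.events", 6, ["checkout"], ["risk"]))
rejected = cluster.approve(TopicRequest("logs.raw", 8, ["agent"], ["search"]))
```

In this sketch the first request is accepted and the second is rejected, because it would push the cluster past its assumed partition limit. Access is denied unless an ACL entry grants it, so `risk` can read `payments.events` but cannot write to it.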