This post has been republished via RSS; it originally appeared at: Event Hubs Blog articles.
Event Hubs provides partitions so that consumers can scale out and process events in parallel. Partitions belong to Topics under an Event Hubs namespace. Topics help categorize incoming messages, and each consumer in a group processes the events from one of the Topic's partitions.
When a Topic is created, the number of partitions is specified at creation time. In some special cases, though, you may have to add partitions after the Topic has been created, and your applications must then accommodate the new partitions dynamically. This blog describes what happens when partitions are added to an existing Topic in Event Hubs. Dynamic addition of partitions is available only on Dedicated Event Hubs clusters, not on Standard Event Hubs namespaces.
What is the expected behavior when partitions are added to an existing topic/event hub instance?
Event Hubs Client behavior
When you add a partition to an existing event hub, the event hub client receives a “MessagingException” from the service informing it that the entity metadata has been altered (here the entity is your event hub and the metadata is the partition information). The client then automatically reopens its AMQP links, which pick up the changed metadata. After that, the client operates normally.
- Event Hubs sender/producer client: Event Hubs provides three sender options.
  - Partition sender: you send events directly to a specific partition. Although partitions are identifiable and can be sent to directly, we do not recommend this pattern. Adding partitions does not affect this scenario.
  - Partition key sender: clients send events with a key so that all events belonging to that key end up in the same partition. The service hashes the key and routes the event to the corresponding partition.
  - Round-robin sender (default): the service distributes events across partitions in round-robin order. The Event Hubs service is aware of partition count changes and will start sending to new partitions within seconds of the partition count being altered.
- Event Hubs receiver/consumer client: Event Hubs provides direct receivers and an easy-to-use consumer library called the Event Processor Host.
  - Direct receivers: these listen to specific partitions, and their runtime behavior is not affected when the partitions of an event hub instance are scaled out.
  - Event Processor Host (EPH): this client does not actively and automatically refresh the entity metadata, so it does not pick up a partition count increase on its own. Recreating an EPH instance triggers an entity metadata fetch, which in turn creates new blobs for the newly added partitions. Pre-existing blobs are not affected. Restarting all EPH instances is recommended so that every instance is aware of the newly added partitions and load balancing among consumers is handled correctly.
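The round-robin behavior in the sender list above can be pictured with a small sketch. This is only an illustration: in Event Hubs the routing happens inside the service, and the class name `RoundRobinRouter` and its attributes are hypothetical. It shows that once the partition count grows, the new partition starts receiving events without any change on the sender's side.

```python
class RoundRobinRouter:
    """Toy model of round-robin routing (hypothetical, not the service's code)."""

    def __init__(self, partition_count: int) -> None:
        self.partition_count = partition_count
        self._next = 0  # running counter of routed events

    def route(self) -> int:
        """Return the partition index for the next event."""
        partition = self._next % self.partition_count
        self._next += 1
        return partition


router = RoundRobinRouter(partition_count=2)
before = [router.route() for _ in range(4)]

# Simulate a partition being added to the event hub.
router.partition_count = 3
after = [router.route() for _ in range(3)]
```

In this toy model, `after` includes the new partition index 2, whereas `before` only cycles over the two original partitions.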
Apache Kafka client behavior
Kafka clients that talk to Event Hubs over the Apache Kafka protocol behave differently from event hub clients that use the AMQP protocol. Kafka clients update their metadata once every ‘metadata.max.age.ms’ milliseconds, a value that can be set in the client configuration. The librdkafka libraries use the same configuration setting. Metadata updates inform the clients of service changes, including partition count increases.
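As a rough sketch of where this setting lives, the helper below builds a librdkafka-style configuration dictionary of the kind that Python Kafka clients built on librdkafka accept. The function name `build_kafka_config`, the namespace, the connection string, and the 180000 ms value are all illustrative assumptions, not values from this post; the connection settings follow the usual Event Hubs Kafka endpoint pattern.

```python
def build_kafka_config(namespace: str, connection_string: str,
                       metadata_max_age_ms: int = 180000) -> dict:
    """Build a librdkafka-style client config (hypothetical helper).

    metadata_max_age_ms controls how often the client refreshes its
    metadata, and therefore how soon it notices added partitions.
    The default used here is an illustrative value only.
    """
    return {
        # Event Hubs exposes its Kafka endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
        "metadata.max.age.ms": metadata_max_age_ms,
    }


# Placeholder values; replace with your own namespace and connection string.
config = build_kafka_config("my-namespace", "<event-hubs-connection-string>")
```

A lower value makes clients notice new partitions sooner, at the cost of more frequent metadata requests.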
Kafka sender/producer client – Producers always specify the destination partition in the send request for each set of produced records. Thus, all produce partitioning is done client-side, based on the producer’s view of the broker metadata. Once the newly added partitions appear in the producer’s metadata view, they become available for producer requests.
Kafka consumer/receiver client – When a consumer group member performs a metadata refresh and picks up the newly created partitions, that member initiates a group rebalance. Consumer metadata is then refreshed for all group members, and the new partitions are assigned by the rebalance leader.
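The effect of such a rebalance can be sketched with a simplified assignment function. This is not one of Kafka's actual assignors; `assign_partitions` and the member names are hypothetical, and the sketch only shows that after a rebalance every partition, including the new ones, is owned by some group member.

```python
def assign_partitions(members: list[str], partition_count: int) -> dict[str, list[int]]:
    """Spread partition indexes across members in round-robin order.

    A simplified stand-in for what a rebalance leader computes.
    """
    ordered = sorted(members)
    assignment: dict[str, list[int]] = {m: [] for m in ordered}
    for partition in range(partition_count):
        owner = ordered[partition % len(ordered)]
        assignment[owner].append(partition)
    return assignment


members = ["consumer-a", "consumer-b"]
before = assign_partitions(members, partition_count=4)
# Two partitions are added; the next rebalance redistributes all six.
after = assign_partitions(members, partition_count=6)
```

Here `before` gives each consumer two partitions, and `after` gives each consumer three, with the new partitions 4 and 5 split between them.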
- If you use a partition key in your producer applications and depend on key hashing to ensure ordering within a partition, dynamically adding partitions is not recommended. Existing data preserves its ordering, but the key-to-partition mapping changes for messages hashed after the partition count increases, so events for the same key may land in a different partition than before.
- Adding partitions to an existing topic or event hub instance is recommended in the following cases:
  - When you are using the round-robin (default) method of sending events
  - When you are using Kafka default partitioning strategies, for example, the StickyAssignor strategy
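The key-hashing caveat above can be demonstrated with a small sketch. The hash used here (SHA-256 modulo the partition count) is an assumption chosen for illustration; it is not the hash that Event Hubs or any Kafka partitioner actually uses, and the function names are hypothetical. The point is only that any "hash modulo partition count" scheme remaps many keys when the count changes.

```python
import hashlib


def partition_for_key(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stand-in hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def moved_keys(keys: list[str], old_count: int, new_count: int) -> list[str]:
    """Return the keys whose partition changes when the count changes."""
    return [
        k for k in keys
        if partition_for_key(k, old_count) != partition_for_key(k, new_count)
    ]


keys = [f"device-{i}" for i in range(1000)]
moved = moved_keys(keys, old_count=4, new_count=5)
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Events already stored for a moved key stay in the old partition, while new events for the same key go to a different one, which is why per-key ordering can no longer be relied on across the change.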
How to add partitions to an existing topic or event hub?
- You can scale up the partition count using update calls on the management API
- You can use ARM, PowerShell (PS), or the CLI to update the partition count
- NamespaceManager calls can also update the partition count
- The Kafka AlterTopics API (for example, via the ‘kafka-topics’ CLI tool) can be used to increase the partition count
Partitions can be added only to topics or event hub instances that are in Dedicated Event Hubs clusters. You can only scale the partition count up; once partitions are added, you cannot scale them back down.
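As a minimal sketch of the ‘kafka-topics’ route, the helper below assembles the command-line arguments for increasing a topic's partition count and refuses to build a command that would scale down. The function name, the endpoint, the topic name, and the `client.properties` file are hypothetical placeholders; the executable may be named `kafka-topics.sh` depending on your Kafka distribution. Note that `--partitions` takes the new total count, not the number of partitions to add.

```python
def build_alter_partitions_command(bootstrap_server: str, topic: str,
                                   current_count: int, new_count: int,
                                   command_config: str = "client.properties") -> list[str]:
    """Build kafka-topics arguments that raise a topic's partition count."""
    if new_count <= current_count:
        # Partition counts can only be scaled up, never down.
        raise ValueError("new_count must be greater than current_count")
    return [
        "kafka-topics",
        "--bootstrap-server", bootstrap_server,
        "--command-config", command_config,  # file holding the SASL/SSL client settings
        "--alter",
        "--topic", topic,
        "--partitions", str(new_count),
    ]


command = build_alter_partitions_command(
    "my-namespace.servicebus.windows.net:9093", "my-topic",
    current_count=4, new_count=8,
)
# To run it: subprocess.run(command, check=True)
```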