
CS237 Project Report

Scalable Notification System


Group 1: Balkunje Pritika Shenoy
Reshma Thomas

1 Key Objective and Proposed Work


Notifications are an integral part of most product ecosystems that exist today. They are leveraged for
improving user engagement, analyzing usage statistics, and monitoring application status. These
notifications can be user-facing or be used to notify components within a system. A notification
system is a typical event-driven system where a notification is generated when a certain event occurs.
In most applications today, several event sources generate a significant volume of events. These events arrive in unpredictable patterns, so notification systems must be not only highly scalable but also dynamically scalable. In this project, we focused on building one such dynamically scalable, event-driven notification system using Apache Kafka. We also measure the performance of our system for different event generation and
event consumption rates.

2 Related efforts
Traditionally, event-driven messaging systems were point-to-point and synchronous. This made them tightly coupled, and therefore inflexible and cumbersome to scale. The need for a more decoupled,
flexible, and dynamic communication model gave rise to the publish/subscribe model. In the
publish/subscribe model event producers or publishers are decoupled from the event consumers or
subscribers. A dedicated middleware infrastructure facilitates communication between them. This
model provides the flexibility to scale both the publishers and subscribers independently without
overloading either of the components. This makes the publish/subscribe paradigm suitable for highly
scalable event-driven architectures. Therefore, this model is central to our architectural design.
Publish/Subscribe models can be categorized into three types [4] based on the event patterns that subscribers are interested in. The first is the topic-based model, where topics are keywords that classify events or information. In content-based publish/subscribe models, events are classified based on the actual content of the event rather than on a keyword. The type-based model divides the event space based on the structure of the event rather than its content. These systems can be
built based on three main types of architecture [2] - Hierarchical Client/Server Architecture, Acyclic
Peer-to-Peer Architecture, and General Peer-to-Peer Architecture. To get a better understanding of
how we could leverage this model, we surveyed several event-driven notification systems based on
this model.

Siena is a content-based pub/sub system that uses the client-server model with a hierarchical topology. Its architecture is a network overlay in which routers perform specialized routing and forwarding functions. The
implemented algorithm has a good overall performance, is stable, and is optimized to be used as a
forwarding function in content-based network routers. Siena compared two architectures for its
publish-subscribe model. The hierarchical client-server architecture was found to work well for a
small number of subscribers that join and leave frequently. However, this architecture doesn’t scale
well for an increasing number of notifications, compared to a peer-to-peer model. Another reason for
Siena’s scalability is the restricted use of types to reduce complexity in pattern matching.

Gryphon is a topic-based and content-based publish-subscribe system, developed by the IBM Research Center, that uses the client-server model. Clients may subscribe both by topic, which identifies messages tagged with that topic by a publisher, and by filters that select messages based on their content. It uses the link matching algorithm for multicast messaging. Gryphon is most suitable for distributing large amounts of data to thousands of clients across a large network in real time. It also implements a broker organization protocol for fault-tolerant delivery.

Scribe is a topic-based pub/sub model built upon Pastry, a peer-to-peer overlay network. The subscription management mechanism in Scribe is best suited for topics with a varying number of subscribers. The subscribers are distributed across the multicast tree structure of the network. The randomization property implemented by Pastry ensures that this tree is well balanced and therefore the load is distributed evenly. As a result, Scribe can support a large number of topics and subscribers per topic. However, Scribe does not guarantee delivery or ordering of messages.

Hermes is a distributed type-based publish/subscribe system. It is implemented as a network of event brokers deployed using an overlay network for efficient routing and fault tolerance. The proposed fault-tolerance mechanisms make Hermes extremely scalable and flexible. However, the existing implementation of Hermes does not allow dynamic routing of events from publishers to interested subscribers.

Big Active Data systems combine the capabilities of big data storage, analytics, streaming, and
publish-subscribe to provide meaningful actionable notifications to a huge number of subscribers and
big-data analysis of historical data. However, unlike Kafka, which only stores data on local disks of
brokers, BAD consists of a large cluster of distributed storage and computational nodes in the
backend which enhances its capability to store and process massive amounts of data from
heterogeneous sources. A BAD system supports the pub-sub model and at the same time supports big
data queries. Unlike traditional pub-sub systems, BAD is capable of matching subscriptions against
multiple notifications and previously stored data to provide combined results which are more
meaningful to the subscribers than individual notifications. Such systems are especially useful in the
case of users subscribing to emergency notifications.

JMS is a vendor-agnostic Java API that supports point-to-point queueing as well as the
publish-subscribe model. It provides an interface for Message Oriented Middleware vendors to create,
send, receive, and understand messages. It can support both the client-server and peer-to-peer models. It
also supports durable subscriptions. There are several implementations of JMS like IBM MQSeries,
Progress SonicMQ, Fiorano FioranoMQ, and Sun JMQ.

RabbitMQ is a messaging broker that primarily uses the AMQP protocol and is very flexible in that it
supports a variety of message exchange patterns such as point-to-point synchronous communication
and asynchronous publish-subscribe communication. RabbitMQ supports intelligent routing by
offering flexibility to application developers in deciding how messages should be routed. The
message acknowledgments on the consumer side ensure reliability. However, RabbitMQ is not
designed to persist messages on a long-term basis, and cannot be used to replay messages.
Throughput is typically around 50K messages per second, so RabbitMQ does not scale well for applications that publish events at a very high rate.

Apache Kafka is a distributed scalable queue and publish-subscribe-based messaging system. Kafka
consists of a topic to which publishers can publish messages. These events or messages are stored in
servers called brokers. A topic is divided into several partitions and stored across the brokers to
provide load balancing. Individual partitions are also replicated across brokers for fault tolerance.
Within each partition, messages are stored sequentially thus providing ordering guarantees within a
partition. Kafka has the notion of consumer groups. Each consumer within a consumer group can
consume messages from different partitions. This provides partition-level parallelism. Partitioning of
topics, consumer groups, and decoupling of producers and consumers make Kafka an extremely
scalable system. It uses sequential I/O to provide high read and write throughput. Persistence of data in log files can be used for monitoring, analysis, and replaying of messages. Kafka provides high-speed delivery by grouping messages, which reduces network overhead. The system continues to
operate even if a broker fails because messages are replicated across brokers. However, there is no
auto-scaling done for the consumers, partitions, and brokers when the load increases. Manual
intervention is required to add more consumers and brokers at run time to manage the load.

3 Design and Architecture

3.1 Components:
1. Producer Service: Fetches RSS feeds every 5 seconds for configured topics. For our testing,
we have configured the topics for CNN news and NOAA weather feeds. The producer service
is a microservice deployed on AWS Elastic Container Service (ECS). This service is assigned
to an Auto Scaling target group, which auto-scales the producer instances based on
CPU/memory usage (a minimal producer sketch follows this list).
2. Kafka Cluster: We have deployed a 2-broker Kafka cluster on AWS. This Kafka cluster is
fully managed by AWS MSK. The two broker configuration is required for replication of
partitions. The brokers can be autoscaled based on the load. However, for our configuration
two brokers seem to be sufficient.
3. Consumer Service: The consumer service consumes messages from the Kafka brokers. It
contains a notification handler that sends messages to the appropriate Slack and mail
subscribers, and it implements a mapping functionality that maps subscribers to topics.
This microservice is also deployed on AWS ECS and is assigned to an auto-scaling group.
4. CloudWatch Metrics: To monitor the performance of our configuration, we use the Kafka
metrics integrated with CloudWatch. We measure the performance of our system on several
metrics discussed in the following sections.
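
As a concrete illustration of the producer service, the sketch below shows how the scheduled fetch-and-publish loop could look. The class name, broker endpoints, and the fetchFeed helper are hypothetical placeholders; the actual service is a Spring Boot microservice and the RSS parsing is omitted.

    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FeedProducerService {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder MSK broker endpoints; the real endpoints come from the AWS MSK cluster.
            props.put("bootstrap.servers", "BROKER_1:9092,BROKER_2:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            // Fetch the configured feeds every 5 seconds and publish each entry to its topic.
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
                for (String topic : new String[] {"news", "weather"}) {
                    for (String entry : fetchFeed(topic)) {
                        producer.send(new ProducerRecord<>(topic, entry));
                    }
                }
            }, 0, 5, TimeUnit.SECONDS);
        }

        // Hypothetical stand-in for parsing the CNN/NOAA RSS feeds.
        private static List<String> fetchFeed(String topic) {
            return List.of();
        }
    }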

3.2 Guarantees
Our notification system guarantees scalability by exploiting the parallelism of partitions per topic, by having a corresponding number of consumers in the consumer group for that topic, and by using a microservice architecture that decouples the fetching and consumer services so they can be scaled independently. The system is integrated with AWS Elastic Container Service for auto-scaling the producers, consumers, and brokers based on CPU and memory usage. The system does not guarantee message ordering or failure recovery.
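
To make the consumer-group parallelism concrete, the sketch below shows a minimal consumer joining a per-topic group; running additional copies of this process causes Kafka to rebalance the topic's partitions across them. The class name, group name, and broker endpoints are illustrative placeholders rather than the exact identifiers used in our service.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class NewsConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "BROKER_1:9092,BROKER_2:9092"); // placeholder MSK endpoints
            props.put("group.id", "news-consumer-group");                  // one consumer group per topic
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("news"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // In the real service, the notification handler maps the topic to its
                        // subscribers and sends the Slack/mail notifications here.
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }
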
3.3 Configurations for planning capacity of the system
The producer service fetches CNN news feeds and NOAA weather feeds at a configured interval of 5 seconds and sends them to the topics news and weather, respectively. Since the parallelization capability of the system depends on the number of partitions, we decided that 16 partitions per topic is ideal for our use case. We chose the number of brokers to be two to allow for partition replication. Hence, we configured the number of partitions to be a multiple of two to allow for an even distribution of partitions, given a replication factor of two. Even though data throughput increases with the number of partitions, and a single broker can support up to 4000 partitions, we did not choose the number of partitions to be in the hundreds for the following reasons:
1. The arrival rate of events does not exceed 1000 messages per second for our system.
2. Increasing the number of partitions increases the number of open file descriptors, and a broker
failure can cause leader election for these partitions to take a long time, thus negatively
impacting performance.
3. We set the expected throughput of our system to 100 MB/second and expect the throughput
per partition to be 10 MB/second, based on the experiment results we have seen so far (see the
sizing calculation after this list).
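
One way to reconcile the throughput targets in item 3 with our choice of 16 partitions is the common Kafka sizing rule of thumb; this is our own reading of the numbers rather than a constraint imposed by the system:

    partitions >= expected topic throughput / per-partition throughput
               = 100 MB/s / 10 MB/s
               = 10

Sixteen partitions satisfy this lower bound, divide evenly across the two brokers, and leave headroom for scaling the consumer group.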

We have configured one consumer group per topic. The initial number of consumers configured for each consumer group is two, which is less than the number of partitions. This is because increasing the
number of consumers can result in more consumer group re-balancing time on a consumer failure.
Our system is configured to auto-scale the number of consumers belonging to the consumer group
when the initial number of consumers is not able to keep up with the arrival rate of events. We have
mocked topic-subscriber mapping in the consumer service to test the functionality. The consumer
service sends mail and Slack notifications on a real-time basis to the subscribers. We configured a common connection object for all mail notifications to avoid authentication issues due to the high mailing rate. To allow for more throughput, we also configured the batch size and gzip compression on the producer side. We also set the following configurations for high throughput but did not tune them for our system, to avoid too many dynamic variables while planning the capacity of the system (an illustrative configuration sketch follows this list):
1. Minimizing the impact of consumer group rebalancing by setting max.poll.interval.ms and
max.poll.records
2. Improving throughput by increasing the minimum amount of data fetched in a request -
fetch.max.wait.ms and fetch.min.bytes
3. Lowering latencies by increasing maximum batch sizes - fetch.max.bytes
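
The snippet below is an illustrative sketch of how these producer and consumer settings might be expressed; the values shown are Kafka defaults or round numbers, not the figures tuned for our deployment.

    import java.util.Properties;

    // Illustrative producer/consumer settings; the values are placeholders, not tuned figures.
    public class ThroughputConfig {
        static Properties producerProps() {
            Properties p = new Properties();
            p.put("batch.size", "65536");          // larger batches for higher throughput
            p.put("linger.ms", "10");              // wait briefly so batches can fill
            p.put("compression.type", "gzip");     // gzip compression, as described above
            return p;
        }

        static Properties consumerProps() {
            Properties c = new Properties();
            c.put("max.poll.interval.ms", "300000"); // bound the impact of consumer group rebalancing
            c.put("max.poll.records", "500");        // records returned per poll
            c.put("fetch.min.bytes", "1048576");     // minimum data the broker returns per fetch
            c.put("fetch.max.wait.ms", "500");       // how long the broker may wait to satisfy fetch.min.bytes
            c.put("fetch.max.bytes", "52428800");    // upper bound on data returned per fetch
            return c;
        }
    }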

** The capacity of the system was planned to keep topic subscriptions constant and only allow a
dynamic arrival rate of events.

3.4 Technology stack


Apache Kafka, AWS Elastic Container Service, AWS Elastic Compute Cloud (EC2), Microservices, Docker, Java,
Spring Boot, Prometheus, Grafana, Postgres

3.5 Alternate approaches


We evaluated the following design choices for our system:
1. Since the consumer service is the bottleneck of the system, separate the delivery of notifications
from the consumer service by writing consumed messages into a Postgres DB and implementing a
separate microservice that periodically reads messages from the DB, batches all messages for
the same topic, and sends out the batched mail and Slack notifications. However,
we did not choose this approach because using a shared DB is an anti-pattern that couples the
consumer and delivery microservices. Since our notification delivery functionality is
lightweight, implementing it within our consumer service seemed sufficient.
2. Another approach we considered for separating out consumer service and delivery service
was to use REST calls. However, REST calls at such a high rate between services do not
scale well and negate the benefit of decoupling the services.
3. Use Kafka/Spark streaming for aggregation of messages. We did not choose this approach as
our system is topic-based and does not require extensive processing or matching logic before
delivering to subscribers.

4 Challenges
1. Mail delivery of notifications failed recently due to Gmail's removal of less-secure third-party
access. OAuth now needs to be implemented to get mail delivery working.
2. SMTP Yahoo Mail connections were reset because of spam detection.
3. With the previously working mail delivery services, the SMTP rate limit caused
connections to be reset due to the higher arrival rate of events during scalability testing.
4. The sender mail account got blocked during scalability testing.
5. We initially created 100 subscribers. However, it was not feasible to get 100 temporary mail IDs
with valid domains, and it was also not feasible to send to multiple Slack channels using a single
webhook. Hence, we had to switch to a small number of subscribers and one Slack channel.
6. We intended to auto-scale the number of consumer services based on the Kafka consumer
lag metrics. However, the ECS auto-scaling feature does not allow auto-scaling based on
custom metrics. Since the CPU and memory utilization of consumer instances increase
with consumer load, we found these to be sufficient metrics for auto-scaling the consumer
service instances under increased load.
7. As parallelization is limited by the number of partitions, auto-scaling the number of
consumer services beyond the number of partitions does not add any value to the scalability
of the system.

5 Test results and evaluation

5.1 Fixed partitions - variable consumer clients

#Partitions   #Consumer clients   BytesInPerSec   BytesOutPerSec   MaxOffsetLag
12            2                   1.13M           16K              253K
12            4                   1.10M           20K              513K
12            8                   1.11M           69K              770K

Evaluation: As the number of consumer clients increases, the consumption throughput increases and the max offset lag also increases. We believe that Slack is not able to keep up with the increasing throughput and applies a rate limit, which causes the increasing lag.

5.2 Fixed consumer clients - varying partitions

#Partitions   #Consumer clients   BytesInPerSec   BytesOutPerSec   MaxOffsetLag
12            16                  1.1M            61K              983K
16            16                  1.09M           60K              166K
24            16                  258K            65K              33.8K


Evaluation:
● As the number of partitions increases from 12 to 16, we observed that the throughput at
producer and consumer remains relatively the same. However, the consumer lag decreases
drastically. We believe this is because we now have a one-to-one mapping between a partition
and a consumer client.
● As the number of partitions increases from 16 to 24, we observe that the throughput at the
producer reduces drastically. We believe this is because, as the number of partitions increases,
the number of open file descriptors increases and the broker also needs to replicate that many
more partitions. Consumer throughput remains relatively the same, and we also observe a
decrease in the consumer lag.

** We compared the BytesInPerSec and BytesOutPerSec at the same timestamp, when we observed the max BytesInPerSec.

5.3 Autoscaling based on CPU Utilization

We show above that when the CPU utilization rises above the configured threshold, the consumer instances are auto-scaled.

6 Future work
The notification system can be enhanced to add more capabilities such as:
1. Add priority levels to topics and set delivery time based on the priorities.
2. Batch intelligently such that each subscriber receives a single notification for all the current
feeds they have subscribed for.
3. Set the number of brokers to an odd number, which is the rule of thumb for quorum-based
elections.
4. Since the number of consumers is limited by the number of partitions, create sub-topics and
shard the partitions.
5. Decouple the consumer and delivery services which can also be scaled independently and
implement a load balancer to direct traffic from consumer instances to the delivery service
instances.
6. Implement a user-subscription interface.
7. Guarantee message ordering and failure recovery.

7 Code Repository
https://github.com/pritikashenoy/NotificationService237

8 References
[1] Antonino Virgillito, Publish/Subscribe Communication Systems: From Models to Applications, PhD Thesis, Università degli Studi di Roma "La Sapienza", Italy, 2003.

[2] Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec, "The Many Faces of Publish/Subscribe," ACM Computing Surveys (CSUR), Vol. 35, No. 2, pp. 114-131, June 2003.

[3] A. Carzaniga and A. L. Wolf, "Content-Based Networking: A New Communication Infrastructure," NSF Workshop on an Infrastructure for Mobile and Wireless Systems, in conjunction with the International Conference on Computer Communications and Networks (ICCCN), Scottsdale, AZ, October 2001.

[4] G. Banavar, T. Chandra, B. Mukherjee, et al., "An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems," ICDCS 1999. http://www.research.ibm.com/people/b/banavar/icdcs99.pdf

[5] A. Carzaniga, M. J. Rutherford, and A. L. Wolf, "A Routing Scheme for Content-Based Networking," Proceedings of IEEE INFOCOM 2004, Hong Kong, China, March 2004.

[6] Michael Ahlberg and Mans Rullgard, Peer-to-Peer Routing with Pastry and Multicast Using Scribe, Technical Report, 05/24/2003. http://www.imit.kth.se/courses/2G1126/vt03/paper_reports/2g1126-group7.pdf

[7] R. Baldoni, M. Contenti, and A. Virgillito, "The Evolution of Publish/Subscribe Communication Systems," Springer-Verlag LNCS Vol. 2584, 2003. http://www.dis.uniroma1.it/~virgi/papers/BCV_FUDICO02.pdf

[8] Peter R. Pietzuch and Jean M. Bacon, "Hermes: A Distributed Event-Based Middleware Architecture," submitted to the Workshop on Distributed Event-Based Systems (DEBS), 2002. http://citeseer.ist.psu.edu/pietzuch02hermes.html

[9] Antony I. T. Rowstron, Anne-Marie Kermarrec, Miguel Castro, and Peter Druschel, "SCRIBE: The Design of a Large-Scale Event Notification Infrastructure," in Networked Group Communication, 2001, pp. 30-43. http://citeseer.ist.psu.edu/rowstron01scribe.html

[10] M. K. Aguilera, R. E. Strom, D. C. Sturman, M. Astley, and T. D. Chandra, "Matching Events in a Content-Based Subscription System," Eighteenth ACM Symposium on Principles of Distributed Computing (PODC '99), Atlanta, GA, USA, May 4-6, 1999. http://www.research.ibm.com/gryphon/papers/matching.pdf

[11] G. Banavar, T. Chandra, R. Strom, and D. Sturman, "A Case for Message Oriented Middleware," in: P. Jayanti (ed.), Distributed Computing, DISC 1999, Lecture Notes in Computer Science, Vol. 1693, Springer, Berlin, Heidelberg, 1999. https://doi.org/10.1007/3-540-48169-9_1

[12] Philippe Dobbelaere and Kyumars Sheykh Esmaili. 2017. Industry Paper: Kafka versus
RabbitMQ. In Proceedings of DEBS ’17, Barcelona, Spain, June 19-23, 2017, DOI:
10.1145/3093742.3093908

[13] Kreps, Jay, Neha Narkhede, and Jun Rao. "Kafka: A distributed messaging system for log
processing.", Proceedings of the NetDB 2011

[14] S. Jacobs et al., “A BAD demonstration: Towards big active data,” Proc. VLDB Endow., vol. 10,
no. 12, pp. 1941–1944, Aug. 2017.
