Apache Kafka: What Sets It Apart?

In this edition, Rahul (rahul dot amodkar at indexnine dot com) looks at the advantages that make Apache Kafka, the distributed commit-log messaging system, a leading choice for large-scale streaming data processing applications.


Each day, large amounts of data and metrics are collected from real-time activity streams, performance metrics, application logs, web activity tracking and much more. Modern scalable applications need a messaging bus that can collect this massive continuous stream of data without sacrificing performance or scalability. Apache Kafka is built from the ground up to be such a distributed, scalable, reliable message bus.


Apache Kafka is a distributed messaging system originally built at LinkedIn and now part of the Apache Software Foundation. In the words of its authors: “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log”.

While there are many mainstream messaging systems like RabbitMQ and ActiveMQ available, certain characteristics make Apache Kafka stand out for large-scale message processing applications.


High Throughput

A single Kafka broker (server) can handle hundreds of megabytes of reads and writes per second from thousands of clients all day long on modest hardware. It is a preferred choice when it comes to ordered, durable message delivery, and it has outperformed RabbitMQ on a large set of benchmarks. According to a research paper hosted at Microsoft Research, Apache Kafka published 500,000 messages per second and consumed 22,000 messages per second on a 2-node cluster with 6-disk RAID 10.

Source: research.microsoft.com


Scalability

In Kafka, messages belonging to a topic are distributed among partitions. A topic is divided into partitions based on predefined parameters; for example, all messages related to a particular user could go to one partition. This ability to divide a topic into partitions allows the topic to scale beyond a size that would fit on a single server. Dividing a topic into multiple partitions also lets Kafka provide both ordering guarantees and load balancing over a pool of consumer processes.
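To make the keyed-partitioning idea concrete, here is a minimal sketch using the Kafka Java producer client. The topic name user-activity, the broker address and the payloads are illustrative assumptions rather than details from the original post; what matters is the message key, which Kafka hashes to pick the partition.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserActivityProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Using the user id as the key means every message for the same user
                // hashes to the same partition, so per-user ordering is preserved.
                producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked:home"));
                producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked:pricing"));
            }
        }
    }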

Another aspect that helps Kafka scale is the concept of consumer groups (collections of message subscribers/consumers). The partitions of a topic are assigned to the consumers in a consumer group so that each partition is consumed by exactly one consumer in the group. This mechanism enables parallel consumption of the messages within a topic, and the number of partitions dictates the maximum parallelism of the message consumers.
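Continuing the sketch above, a consumer that joins a consumer group might look like the following; the group.id, topic name and broker address are again assumptions. Starting several copies of this process with the same group.id spreads the topic's partitions across them.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class UserActivityConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("group.id", "activity-processors");       // consumers sharing this id form one group
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("user-activity"));
                while (true) {
                    // Each consumer in the group owns a disjoint subset of partitions,
                    // so the group as a whole reads the topic in parallel.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                          record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }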

Distributed By Default

Kafka is designed very differently from other messaging systems in that it is fully distributed from the ground up. Kafka runs as a cluster of one or more servers, each of which is called a broker, and a Kafka topic is divided into partitions using the built-in partitioning. Its modern, cluster-centric design offers strong durability and fault-tolerance guarantees. A Kafka cluster composed of multiple brokers can be spread over multiple data centers, availability zones or even regions, so the application can stay up and running even in a disaster scenario such as the loss of a data center.


Kafka partitions are replicated across a configurable number of servers, allowing automatic failover to the replicas when a server in the cluster fails. Replication is the default behaviour in Apache Kafka, and it is this replication of partitions that makes Kafka messages resilient: if a server holding certain partitions goes down, other servers in the cluster can take over using the partition replicas.


Messages are persisted on disk for a configurable amount of time or up to a configurable total size per partition. Both parameters can be set on a per-topic basis, and one can even choose to retain messages forever.
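As an illustration of per-topic replication and retention settings, the sketch below creates a topic using the Kafka AdminClient, which is available in newer Kafka releases (older deployments would use the kafka-topics command-line tool instead). The topic name, partition count, replication factor and retention values are assumptions chosen for the example.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // 6 partitions for consumer parallelism, replication factor 3 for fault tolerance.
                NewTopic topic = new NewTopic("user-activity", 6, (short) 3);

                Map<String, String> configs = new HashMap<>();
                configs.put("retention.ms", "604800000");       // keep messages for 7 days
                configs.put("retention.bytes", "1073741824");   // or until a partition reaches ~1 GB
                topic.configs(configs);

                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }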

Guaranteed Ordered Messages

Kafka guarantees stronger ordering of message delivery than a traditional messaging system. Traditional messaging systems hand out messages in order, but deliver them asynchronously to consumers, so messages can arrive out of order at different consumers. Kafka guarantees ordered delivery of messages within a partition. If a system requires a total order over all messages, this can be achieved with a topic that has only one partition, though it comes at the cost of sacrificing consumer parallelism.

Industry Adoption

Apache Kafka has become a popular messaging system in a short period of time with a number of organisations like LinkedIn, Tumblr, PayPal, Cisco, Box, Airbnb, Netflix, Square, Spotify, Pinterest, Uber, Goldman Sachs, Yahoo and Twitter among others using it in production systems.


Apache Kafka is an extremely fast and scalable message bus that supports publish-subscribe semantics. Due to its distributed and scalable nature, it is on its way to becoming a heavyweight in the scalable cloud applications arena. Although Kafka was originally envisioned for log processing, IoT (Internet of Things) applications will also find it a very important part of their architecture due to the performance and distribution guarantees it provides.

Want to learn more about how to build scalable applications? Please take a look at our Building Scalable Applications blog series Part I, Part II and Part III.

Building Scalable Applications: Handling Transient Failures in Cloud Applications

In this edition, Rahul (rahul dot amodkar at indexnine dot com) looks at an approach to handling the transient failures encountered in cloud platform deployments. We will look at a mechanism that can be employed to deal with transient failures without losing data or impacting user experience.


While there are numerous advantages to deploying on the cloud, there is no guarantee that cloud platform services will respond successfully every time. In a cloud environment, periodic transient failures should be expected. The approach should be to minimize the impact of such failures on application execution by anticipating and proactively handling them.

The Problem With Cloud Services

The cloud is made up of hardware components that work together to present a service to the user. Since hardware is subject to failure, cloud services cannot guarantee 100% uptime. Small failures can cascade and affect multiple services, all or some of which could be down for brief periods of time.

When a consumer’s use of a cloud service exceeds the maximum allowed throughput, the system may throttle the consumer’s access to that service. Services deploy throttling as a self-defense mechanism to limit usage, sometimes delaying responses and other times rejecting all or some of an application’s requests. The onus is on the application to retry any requests rejected by the service.
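As a rough illustration of that retry responsibility, here is a minimal, generic sketch of retrying an operation with exponential backoff. The helper name, attempt count and delays are made up for the example and would be tuned to the service being called.

    import java.util.concurrent.Callable;

    public class RetryHelper {
        // Retries an operation, doubling the delay between attempts.
        public static <T> T withRetries(Callable<T> operation, int maxAttempts, long initialDelayMs)
                throws Exception {
            long delay = initialDelayMs;
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return operation.call();
                } catch (Exception e) {           // real code would catch only transient error types
                    last = e;
                    if (attempt < maxAttempts) {
                        Thread.sleep(delay);      // wait before trying again
                        delay *= 2;               // back off exponentially
                    }
                }
            }
            throw last;                           // give up after maxAttempts
        }
    }

A caller could then wrap a throttled or failing call as, say, withRetries(() -> database.save(record), 5, 2000), where database.save stands in for whatever data-access call the application actually makes.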

What Needs To Be Done

Cloud applications need to retry operations when failures occur, but it does not make sense to retry the operation immediately, because a transient failure may persist for several minutes. We will consider a scenario where the database service is unavailable for a short time.

The figure above shows the flow of an operation that encounters a transient failure and a possible approach to recovering from it. The message sent by the web instance is added to the primary message queue, and this message is picked up for processing by the worker instances.

When the worker tries to write data to the database tier, it encounters a failure. In this case, the worker adds the data to be written to a “Deferred Processing Queue”.

A scheduled job runs every 5 minutes and consumes the messages from the “Deferred Processing Queue”. It reinserts each message into the primary message queue, where it is read by the worker instances and processed successfully if the database service is available; if not, the same process is repeated. This allows the cloud application to process the messages at a deferred time and makes it resilient to failures.
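A hedged sketch of the worker-side logic described above is shown below. MessageQueue and Database are hypothetical interfaces standing in for whatever queue and database clients the application actually uses.

    // Sketch of the worker's handling logic for the deferred-processing pattern.
    public class Worker {
        interface MessageQueue { void publish(String message); }               // hypothetical queue client
        interface Database { void write(String record) throws Exception; }     // hypothetical database client

        private final Database database;
        private final MessageQueue deferredQueue;   // the "Deferred Processing Queue"

        Worker(Database database, MessageQueue deferredQueue) {
            this.database = database;
            this.deferredQueue = deferredQueue;
        }

        // Called for every message read from the primary message queue.
        void handle(String message) {
            try {
                database.write(message);            // normal path: write to the database tier
            } catch (Exception transientFailure) {
                // Database unavailable: park the message instead of dropping it.
                // The scheduled job later re-inserts it into the primary queue.
                deferredQueue.publish(message);
            }
        }
    }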


Transient failures are a common phenomenon in cloud applications. A cloud-based application is expected to withstand such transient failures and prevent loss of data even under extreme circumstances.

Building Scalable Applications Part 3: Cloud Features That Help You Scale

In this edition, Rahul (rahul dot amodkar at indexnine dot com) looks at a typical cloud architecture that can support many types of applications. We will examine each tier and explore concepts such as load balancing and auto-scaling.

In previous posts, we covered the benefits of moving to the cloud for scale and the design principles of building scalable applications. In this post, we will look at a typical reference architecture for a cloud based scalable application and cloud features that help your application scale.


The figure below illustrates a reference architecture commonly used by scalable applications. It includes the following usual suspects, which allow it to handle the high traffic and load patterns experienced by typical scalable applications:

  • Load Balancer(s) for the web tier
  • Auto-scaling for web and worker instances
  • Message queue
  • Cache
  • SQL and/or NoSQL database(s)

Figure: Reference architecture for a scalable cloud application

Load Balancer

Load balancers are used to distribute load across multiple instances of an application service. This allows the application to scale horizontally and to add more capacity transparently in response to demand. By fronting application services with a simple proxy, a load balancer gives clients a single internet location to communicate with while fanning out backend processing across multiple service instances. A load balancer can be used in front of any service within the application as needed, but is most commonly used to front web instances.

Load balancers typically use a round-robin algorithm to route traffic to the service layer. In advanced configurations, they can also be configured to control the volume of traffic routed to different instances.

Sticky Sessions

For legacy applications, load balancers also support session stickiness, which means the load balancer routes all requests for a given session to a single instance. This is useful when the tier in question is not stateless. Since load balancer implementations vary, consult the respective documentation to understand how sticky sessions are handled.

Most cloud vendors offer load balancing as a service. On Amazon Web Services, you can configure a load balancer to front a set of EC2 instances; the load balancer periodically checks the health of the instances and routes traffic only to the healthy ones.

Web Tier

A stateless web tier makes it easy to support horizontal scaling. As discussed in Part 2, REST web services are designed to be stateless and are therefore conducive to horizontal scaling. Some modern frameworks that allow rapid creation of web apps are Dropwizard and Spring Boot (Java), and Node.js with Express (JavaScript).
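As a small illustration of a stateless endpoint, here is a minimal Spring Boot sketch; the class name, route and response body are assumptions made up for the example.

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // No session or instance state is kept between requests, so any instance
    // behind the load balancer can serve any request.
    @SpringBootApplication
    @RestController
    public class ProfileService {

        @RequestMapping("/users/{id}/profile")
        public String profile(@PathVariable String id) {
            // Everything needed to answer comes from the request itself (and shared
            // backing stores such as the database or cache), never from local memory.
            return "{\"userId\": \"" + id + "\"}";
        }

        public static void main(String[] args) {
            SpringApplication.run(ProfileService.class, args);
        }
    }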


Since the web tier is consumer facing, it is normally set up to auto-scale. Most cloud vendors support auto-scaling by monitoring certain characteristics of the instances to be scaled. On Amazon Web Services, it is possible to track the CPU usage or network bandwidth usage of instances and respond to extreme events by scaling out (expanding) or scaling in (contracting). To avoid cost escalation, you can also specify an upper limit on the number of instances auto-scaling will spawn.
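For illustration, the sketch below wires up a simple scale-out policy and a CPU alarm with the AWS SDK for Java; it assumes an existing Auto Scaling group named web-asg, and the names, thresholds and periods are placeholder values rather than recommendations.

    import com.amazonaws.services.autoscaling.AmazonAutoScaling;
    import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
    import com.amazonaws.services.autoscaling.model.PutScalingPolicyRequest;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
    import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
    import com.amazonaws.services.cloudwatch.model.ComparisonOperator;
    import com.amazonaws.services.cloudwatch.model.Dimension;
    import com.amazonaws.services.cloudwatch.model.PutMetricAlarmRequest;
    import com.amazonaws.services.cloudwatch.model.Statistic;

    public class ScaleOutOnCpu {
        public static void main(String[] args) {
            AmazonAutoScaling autoScaling = AmazonAutoScalingClientBuilder.defaultClient();
            AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();

            // Simple scaling policy: add one instance to the group when triggered.
            String policyArn = autoScaling.putScalingPolicy(new PutScalingPolicyRequest()
                    .withAutoScalingGroupName("web-asg")
                    .withPolicyName("scale-out-on-cpu")
                    .withAdjustmentType("ChangeInCapacity")
                    .withScalingAdjustment(1)
                    .withCooldown(300))
                .getPolicyARN();

            // Alarm: trigger the policy when average CPU stays above 70%
            // for two consecutive 5-minute periods.
            cloudWatch.putMetricAlarm(new PutMetricAlarmRequest()
                    .withAlarmName("web-asg-high-cpu")
                    .withNamespace("AWS/EC2")
                    .withMetricName("CPUUtilization")
                    .withDimensions(new Dimension()
                            .withName("AutoScalingGroupName").withValue("web-asg"))
                    .withStatistic(Statistic.Average)
                    .withPeriod(300)
                    .withEvaluationPeriods(2)
                    .withThreshold(70.0)
                    .withComparisonOperator(ComparisonOperator.GreaterThanThreshold)
                    .withAlarmActions(policyArn));
        }
    }

A matching scale-in policy (a negative adjustment triggered by a low-CPU alarm) would normally be defined alongside it so the group contracts when load drops.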


Caching

Caching drives down access times for the most frequently used data. Since the data is stored in memory, response times for web apps can improve manifold by using a cache. For example, user information rarely changes during the course of a session, but related information such as user permissions and roles needs to be accessed frequently. The user information and the related roles and permissions are therefore prime targets for caching. This results in faster rendering of pages that need to check permissions before displaying data.

Redis and Memcached are two of the most popular caches available. Amazon Web Services provides both implementations through its ElastiCache offering.
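A minimal cache-aside sketch using the Jedis client for Redis follows; the Redis endpoint, key naming, TTL and the loadPermissionsFromDatabase helper are assumptions made for the example.

    import redis.clients.jedis.Jedis;

    public class PermissionCache {

        private final Jedis redis = new Jedis("localhost", 6379);   // assumed Redis endpoint

        public String getPermissions(String userId) {
            String key = "permissions:" + userId;

            String cached = redis.get(key);
            if (cached != null) {
                return cached;                        // cache hit: no database round trip
            }

            String permissions = loadPermissionsFromDatabase(userId);
            redis.setex(key, 1800, permissions);      // cache for 30 minutes
            return permissions;
        }

        private String loadPermissionsFromDatabase(String userId) {
            // Hypothetical: fetch the user's roles and permissions from the database tier.
            return "ROLE_USER";
        }
    }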


Content Delivery Network (CDN)

Content Delivery Networks (CDNs) deliver static artifacts such as pictures, icons and static HTML pages much faster than a regular “single-source” website. Cloud vendors provide access to CDNs that replicate static resources closer to the customer, thereby reducing latency. This results in excellent load times, especially if there are large images and/or videos on the site or application.

Amazon Web Services CloudFront automatically identifies and serves static content from the edge location closest to the user, irrespective of the region the site is hosted in. This results in a drastic reduction in latency and faster loading of web artifacts.

Message Queue

Asynchronous communication ensures that services/tiers remain independent of each other. This characteristic allows the system to scale much further than if all components were tightly coupled together. Not all calls in your system need to be asynchronous; you can use the following criteria to decide which calls are good candidates for asynchronous communication:

  • Calls to third-party or external APIs.
  • Long-running processes.
  • Methods that are error-prone or change frequently.
  • Any operation that does not need an immediate response.

When you implement message-bus-based asynchronous communication, you need to make sure the message bus itself can scale with rising traffic. RabbitMQ is a popular message-oriented middleware used by many applications; ActiveMQ, Kafka and Amazon SQS are other message queue options worth considering.
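As a small illustration of handing work off to a queue, here is a hedged sketch that publishes a message to RabbitMQ using its Java client; the queue name, payload and broker address are assumptions for the example.

    import java.nio.charset.StandardCharsets;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.MessageProperties;

    public class ReportRequestPublisher {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");                         // assumed broker address

            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            // Durable queue so queued work survives a broker restart.
            channel.queueDeclare("report-requests", true, false, false, null);

            // The web tier returns immediately after enqueueing; a worker instance
            // picks the message up and generates the report asynchronously.
            channel.basicPublish("", "report-requests",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "{\"reportId\": \"1234\"}".getBytes(StandardCharsets.UTF_8));

            channel.close();
            connection.close();
        }
    }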

Worker Tier

Similar to the web tier, the worker tier needs to support horizontal scaling. The worker instances are independent processes that listen to the message queue and process messages as they are received.

Queue depth is a good parameter to monitor for scaling, so that the number of worker instances increases as the number of messages in the queue grows. Amazon AWS Auto Scaling groups and Rackspace Auto Scale are examples of auto-scaling options available on the respective cloud platforms.

Database Tier

A typical database tier of a scalable application makes use of both NoSQL and SQL databases, as each has its own advantages. A NoSQL database can be used in scenarios where:

  • A relational database will not scale to your application’s traffic at an acceptable cost.
  • The data is unstructured or "schemaless": it does not need an explicit schema defined up front, and new fields can be added without any formality.
  • Your application data is expected to become so large that it needs to be massively distributed.
  • The application needs fast key-value access.

You should avoid NoSQL and use an RDBMS in scenarios where:

  • Complex or dynamic queries are needed; these are best served by an RDBMS.
  • Transactions need guarantee of ACID (Atomicity, Consistency, Isolation, Durability) properties.

When it comes to NoSQL databases, there are multiple options available, such as MongoDB, Cassandra, AWS DynamoDB and CouchDB. Google recently announced the availability of Google Cloud Bigtable, a NoSQL database that drives nearly all of Google’s largest applications, including Google Search, Gmail and Analytics.


We discussed the various components of a reference architecture that can be used to run different types of applications on the cloud. Each application will dictate the nature and scale of usage of these different components, but most applications will need all of these components to achieve scale. Most cloud vendors have managed service offerings that provide access to these components.