Streaming in the Cloud: Another Arrow in Amazon’s Quiver


I just finished reading the developer’s guide for Amazon’s recently announced Kinesis service. Kinesis lets customers build applications that process data streams in real time. It has built-in fault tolerance and can scale up and down with data volume.

To a developer, Kinesis’s architecture is pretty straightforward: data producers generate continuous, ordered streams of data records. Developers write Kinesis applications, which access the data using the Kinesis client library and can process the streams in whatever way they want. Streams can be sharded, so there is no hard capacity limit for a stream as a whole; each shard has a 1 MB/second ingress limit, a 2 MB/second egress limit, and a limit of 1,000 PUT transactions per second. Since a stream can have multiple readers and writers, several different types of processing can take place on the same stream. For example, one application might generate a real-time alert, while another might pre-process the data before loading it into a database accessed by a reporting app. A third Kinesis application might load raw or aggregated data into a long-term data warehouse like Amazon Redshift. Pricing is based on an hourly rate per shard plus the volume of PUT transactions.
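To make the shard limits concrete, here is a rough back-of-the-envelope sketch in Python for estimating how many shards a stream would need. The per-shard figures are the ones quoted above; the function name, the example workload, and the assumption that the egress limit is shared by all readers of a shard are mine, not taken from the developer’s guide.

```python
import math

# Per-shard limits as quoted in the post (Limited Preview figures).
SHARD_INGRESS_MB_PER_SEC = 1.0
SHARD_EGRESS_MB_PER_SEC = 2.0
SHARD_PUTS_PER_SEC = 1000


def shards_needed(ingress_mb_per_sec, egress_mb_per_sec, puts_per_sec):
    """Return the smallest shard count that satisfies all three limits.

    Hypothetical helper: egress_mb_per_sec is assumed to be the total
    read rate summed over every application reading the stream.
    """
    by_ingress = math.ceil(ingress_mb_per_sec / SHARD_INGRESS_MB_PER_SEC)
    by_egress = math.ceil(egress_mb_per_sec / SHARD_EGRESS_MB_PER_SEC)
    by_puts = math.ceil(puts_per_sec / SHARD_PUTS_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_ingress, by_egress, by_puts)


if __name__ == "__main__":
    # Made-up workload: 5 MB/s written, three applications each reading
    # the full stream (15 MB/s in total), and 3,500 PUTs per second.
    print(shards_needed(5, 15, 3500))  # prints 8
```

In this made-up workload the read side is the bottleneck: writes alone would need 5 shards and PUTs 4, but three readers push the count to 8. That is worth keeping in mind given that pricing includes an hourly charge per shard.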

Kinesis joins a wave of new tools designed to handle high-volume data streams, alongside Kafka and Storm. In fact, Kinesis combines some of the features of both, adding Kafka’s message persistence to the scalable, high-performance stream processing that Storm provides. The “magic” in Kinesis (the logic that provides fault tolerance, replication, and so on) is all under the hood. This is exactly why Kinesis is likely to find a significant user base: it’s a managed service. Kinesis handles a lot of the details of provisioning and management for you, and it’s tightly integrated with the rest of Amazon’s cloud services. You can use Auto Scaling for Kinesis applications, DynamoDB to store derived data, and Redshift for long-term storage. Right now, Kinesis is only available as a Limited Preview in the U.S. East region, so you may not be able to get started just yet. However, if your data is being generated in Amazon’s cloud and you need to do streaming analytics, it seems likely that Kinesis will be the natural choice. Over time, as open source Kinesis applications start to appear, developers will find it even easier to build complex streaming applications in the Amazon environment. Overall, Kinesis is a valuable addition to Amazon’s arsenal, and one that can only add to its dominance in the IaaS space.


About ckalmanek

CTO at Case Commons
