What is Amazon Kinesis?
Amazon Kinesis is a real-time data streaming service that makes it easy to collect, process, and analyze data so you can get quick insights and react as fast as possible to new information.
With Amazon Kinesis you can ingest real-time data such as application logs, website clickstreams, IoT telemetry data, social media feeds, etc., into your databases, data lakes, and data warehouses. You can also build your own real-time applications using this data.
Looking to immediately get some deep info on Kinesis? Check out Cloud Academy’s Getting Started with Amazon Kinesis Learning Path, which includes courses, labs, and exams to fill your knowledge bucket.
Kinesis lets you process and analyze data as it arrives and respond in real time, instead of waiting until all your data is collected before processing can begin. To remember what Kinesis is designed to do, keep in mind that it can:
- Collect data quickly
- Analyze data quickly
How you collect, store, process, and analyze your data is up to you. The options range from a fully customized system to a turnkey approach built on Kinesis's fully managed services.
Why use Amazon Kinesis? Why use real-time analytics?
People use Amazon Kinesis because it’s an effective, real-time analytics service — but why are we so concerned about real-time data?
The chart below shows the function and value of data over time.
The value of data for preventive or predictive action diminishes (rather quickly) over time. The data still has value; it just becomes useful for a different role: looking carefully at past trends rather than continually acting on what's happening in the present moment.
Amazon Kinesis Speed — Kinesis is good for companies that already have terabytes of data in the cloud
Data shows that 20% of companies have already migrated to the cloud. If your data already lives in robust, backed-up resources, it makes sense to use a service designed to work well with those resources. You can reduce your time to market and get valuable insights more quickly.
Amazon Kinesis Scalability — elasticity is helpful, especially for network data
If you’re analyzing network data or anything that’s dynamic in nature, it’s important to be aware of cost concerns. You don’t want to commit to services that you don’t need, especially when your data can change at a moment’s notice. And you want to be prepared for spikes that will occur. Kinesis’s scalability helps you stay lean with costs and flexible with demand.
Amazon Kinesis Ease of Use — get benefits of a managed service
Use the AWS Management Console to quickly create a Kinesis Data Firehose delivery stream. In just a few clicks, you'll be able to see your data coming in, and you won't have to worry about resource allocation or administration.
Amazon Kinesis services
Kinesis provides four specialized services classified by the type and stage of processing of streaming data, as described below.
Amazon Kinesis Data Streams (KDS)
Note: Originally, this product was called Kinesis Streams — now we have Kinesis Data Streams and Kinesis Video Streams.
Characteristics
- Custom real-time processing.
- Kinesis Data Streams is the fastest of the four services, with processing latency of less than 1 second.
- Customization requires coding overhead from you.
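To give a feel for that coding overhead, here is a minimal producer sketch in Python. It assumes the boto3 Kinesis client, whose `put_records` call takes a `StreamName` and a list of `Records`, each with `Data` and `PartitionKey`, and it batches events to respect the PutRecords limits of 500 records and 5 MB per request. The helper names (`to_entry`, `batch_entries`, `send_events`) and the JSON encoding of events are hypothetical choices for illustration, and failed records are only counted, not retried.

```python
import json
from typing import Any, Iterable, Iterator

# PutRecords service limits: at most 500 records per call and 5 MB per call,
# counting both data blobs and partition keys.
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024


def to_entry(event: dict, key_field: str) -> dict:
    """Turn an event into a PutRecords entry: a data blob plus a partition key."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def batch_entries(entries: Iterable[dict]) -> Iterator[list]:
    """Group entries so that no batch exceeds the record-count or byte limit."""
    batch: list = []
    size = 0
    for entry in entries:
        entry_size = len(entry["Data"]) + len(entry["PartitionKey"].encode("utf-8"))
        full = len(batch) == MAX_RECORDS_PER_CALL
        too_big = size + entry_size > MAX_BYTES_PER_CALL
        if batch and (full or too_big):
            yield batch
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield batch


def send_events(client: Any, stream_name: str, events: Iterable[dict], key_field: str) -> int:
    """Send events through a boto3-style Kinesis client and return the failed-record count.

    `client` is anything with a put_records(StreamName=..., Records=...) method,
    such as boto3.client("kinesis").
    """
    failed = 0
    for batch in batch_entries(to_entry(e, key_field) for e in events):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially fail; this sketch only counts failures, it does not retry.
        failed += response["FailedRecordCount"]
    return failed
```

With a real client you would call something like `send_events(boto3.client("kinesis"), "my-stream", events, key_field="user_id")`, where the stream name and key field are placeholders. Because PutRecords can partially fail, a production producer would resend the entries whose response items carry an `ErrorCode`; the Kinesis Producer Library (KPL) can handle batching and retries for you.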
How is it architected?
Since Amazon Kinesis Data Streams is a fully customizable offering, it's good to know some basics about how it's built. Its components are:
- Record: A record is the unit of data in a Kinesis data stream. Each record is composed of a sequence number, a partition key, and a data blob. The data blob is the payload of interest that your data producer adds to the stream.
- Shard: The data records in a stream are distributed across shards. A shard is the base throughput unit of a Kinesis data stream, so it's how you gauge the bandwidth you have: each shard supports writes of up to 1 MB/s or 1,000 records/s and reads of up to 2 MB/s. The overall capacity of your stream is a function of the number of shards you specify for it, so more shards mean more capacity. You can increase (or decrease) the number of shards by resharding.
- Producers: Producers are the mechanisms that continuously push data into streams. A web service sending log data to a stream is an example of a producer.
- Consumers: Consumers receive records from a Kinesis data stream and process them. Such a consumer is known as a Kinesis Data Streams application; it typically runs on a fleet of EC2 instances, and you build it using either the Kinesis Data Streams API or the Kinesis Client Library (KCL). Consumers can store their output using AWS services such as Amazon DynamoDB, Amazon Redshift, or Amazon S3.
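The shard arithmetic can be sketched in a few lines of Python. This is an illustrative model, not an AWS API: it sizes a provisioned stream from the per-shard write limits (1 MB/s or 1,000 records/s) and shows how a partition key is routed. Kinesis hashes each partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value. Assumptions to flag: the evenly split ranges stand in for the real ones (which you would read from each shard's `HashKeyRange` via `ListShards`), the key is encoded as UTF-8, and the function names are hypothetical.

```python
import hashlib
import math

# Per-shard write limits for a provisioned Kinesis data stream.
SHARD_WRITE_BYTES_PER_SEC = 1024 * 1024  # 1 MB/s
SHARD_WRITE_RECORDS_PER_SEC = 1000       # 1,000 records/s
HASH_KEY_SPACE = 2 ** 128                # MD5 produces a 128-bit value


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Smallest shard count that keeps writes under both per-shard limits."""
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_count = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)


def even_ranges(shard_count: int) -> list:
    """Split the hash key space into contiguous (start, end) ranges of near-equal size.

    Hypothetical stand-in for the HashKeyRange values reported by ListShards.
    """
    bounds = [i * HASH_KEY_SPACE // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for_key(partition_key: str, ranges: list) -> int:
    """Index of the shard whose hash key range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key is not covered by any range")
```

For example, 3,000 records per second averaging 500 bytes is about 1.4 MB/s, so the byte limit alone would allow 2 shards, but the record-count limit requires 3. Records with the same partition key land on the same shard as long as the shard layout doesn't change, which is what preserves their relative order.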
Amazon Kinesis Video Streams (KVS)
Kinesis Video Streams is a purpose-built video streaming analysis service that integrates with AWS Machine Learning (ML)/Artificial Intelligence (AI) offerings.
Characteristics
- Processes video/binary data.
- Makes it easy to capture live video, play it back, and store it for real-time and batch-oriented machine learning analytics with services and frameworks such as Amazon Rekognition or TensorFlow.
- Secures both the input (a secure connection from the camera) and the processed data (encrypted in storage).
Amazon Kinesis Data Firehose (KDF)
This is a managed service that helps you quickly collect streaming data and load it into storage for analysis in near real time.
Characteristics
- Kinesis Data Firehose is slower than Kinesis Data Streams, with data latency of 60 seconds or more.
- Use it when you just need to deliver the data to a destination such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, or supported third-party services.
- No coding necessary; just use the AWS Console to get started quickly.
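Firehose's latency comes largely from buffering: it collects incoming records and delivers a batch once either a buffer size or a buffer interval is reached, whichever comes first. The toy model below (hypothetical names, not the Firehose API) shows that rule; in the real service you set these thresholds as buffering hints (`SizeInMBs` and `IntervalInSeconds`) on the delivery stream's destination.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeliveryBuffer:
    """Toy model of size-or-interval buffering (not the Firehose API)."""
    max_bytes: int
    max_seconds: float
    records: List[bytes] = field(default_factory=list)
    size: int = 0
    opened_at: Optional[float] = None  # time the first record entered the current buffer
    deliveries: List[List[bytes]] = field(default_factory=list)

    def add(self, record: bytes, now: float) -> None:
        """Buffer a record, then deliver if either threshold has been reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self._maybe_deliver(now)

    def tick(self, now: float) -> None:
        """Let time pass with no new data; delivers if the interval has elapsed."""
        self._maybe_deliver(now)

    def _maybe_deliver(self, now: float) -> None:
        if not self.records:
            return
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_seconds:
            self.deliveries.append(self.records)
            self.records, self.size, self.opened_at = [], 0, None
```

With a 60-second interval, a trickle of small records waits up to a minute before it is delivered, while a burst of data reaches the size threshold and is delivered early.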
Amazon Kinesis Data Analytics (KDA)
Amazon Kinesis Data Analytics lets you quickly write SQL code that continuously reads, processes, and stores streaming data in near real time.
Characteristics
- Real-time processing.
- Fully managed service.
- A quick way to get insight into data streams by just writing some SQL.
- Scalable, which helps optimize performance and cost.
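Kinesis Data Analytics expresses this kind of processing in SQL, for example a query that counts page views per URL over fixed one-minute windows. The Python sketch below is only an analogue of what such a tumbling-window `GROUP BY` computes, not KDA code: it emits each window's counts when the window closes, assuming events arrive in timestamp order as `(timestamp_seconds, key)` pairs. The function name and event shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) each time a fixed, non-overlapping window closes.

    Assumes events arrive in timestamp order.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # An event from a later window means the current window is complete.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        # Flush the final window once the input ends.
        yield current_start, dict(counts)
```

Emitting results as each window closes, rather than after all input is read, mirrors the continuous, as-it-arrives processing that distinguishes a streaming query from a batch report.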
What are some Amazon Kinesis use cases?
One way to think about use cases for Kinesis is by considering what some sources of real-time streaming data are, for example:
- Mobile apps, such as those that continually collect your current GPS position
- Open-source tools such as Open Web Analytics (OWA) or Matomo, used to track visitors’ website behavior
- Application logs that continually record the behavior and operating performance of server applications
- IoT data sent from sensors in internet-connected household objects
- Social apps such as Twitter and Facebook, exposing user posts and commentary
As a next step in learning about use cases, check out our course on Working with Amazon Kinesis Analytics. You'll get valuable, actionable insight into the nuts and bolts of a key Kinesis service.
Wrapping it up
Amazon Kinesis acts as the front end of your streaming data and, depending on your business needs, can be as complex and as real-time as you want. Combined with its integration with other AWS services, Kinesis can help you process data, react, and take action so that your business can move swiftly to meet its needs.