5 July 2018
- Roy Schulte and David Luckham
This article represents personal opinion, not necessarily that of an employer or anyone else.
After nearly twenty years of development, tailorable event stream processing (ESP) platforms are no longer limited to niche applications or experiments. They have become essential tools for real-time analytics in many kinds of production systems.
A growing number of mainstream companies are finding that they need ESP platforms to handle their streaming data in real time or near-real time. The motivation is coming from the explosion in the amount of streaming data that needs to be analyzed, especially sensor data from the Internet of Things (IoT); web clickstreams from customer interactions; social media data such as tweets, Facebook posts and LinkedIn updates; market data; weather data; and event streams from transaction processing applications.
Most processing of event streams has always been done in vertically- or horizontally-specialized commercial off-the-shelf (COTS) application packages or SaaS offerings, not in tailorable ESP platforms. That is, companies use purpose-built applications for supply chain visibility, security information and event management (SIEM), fraud detection, real-time customer relationship management (CRM) offers, fleet management, call center monitoring and many other types of systems. These products do event stream processing, but they are hardwired to handle only certain kinds of event data and they can detect only the event patterns that apply to their particular business problem.
Companies don’t need general purpose ESP platforms for most purposes because they can buy or subscribe to these specialized applications to handle event data. However, for a growing number of situations, companies do have to build their own ESP applications because the functions that they need are not available in a COTS application or SaaS offering. In those situations, they can either buy an ESP platform and implement on top of it, or hand code similar event streaming logic themselves (not a good plan in most cases).
Academics began building general purpose, tailorable ESP software platforms that developers can use to build and deploy stream analytics applications (then called complex-event processing (CEP) applications) as early as the 1990s. But only a handful of commercial products were available before 2010. These were used primarily for high speed trading systems for financial exchanges and for intelligence applications by government agencies.
In the past eight years, the number of tailorable commercial and open source ESP platforms has grown from a handful to more than 40. This article summarizes six of the key trends for this category of software.
- Ubiquity – Virtually all major software vendors offer one or more tailorable ESP products (see lists at the end of this article). Vendors realize that streaming data is only going to grow more plentiful, and an increasing number of business applications need to be able to deal with streaming data in real time or near-real time.
- IoT – Several years ago, we projected that the IoT would be a killer application for ESP platforms (actually killer applications, because IoT is hundreds of different kinds of applications, not one kind). This is proving to be the case. IoT applications have to deal with sensor data in real time, and sensor data is generated as real-time event streams. All of the IoT platform suite products that we have seen include a tailorable ESP platform as part of the IoT suite because it doesn’t make sense to design and build ESP logic from scratch for every new IoT application.
- Cloud ESP – Virtually all ESP products can run on public or private cloud infrastructure as a service (IaaS). A growing number of vendors, including Amazon Web Services, Google, IBM, Microsoft, Salesforce, SQLstream and others, offer ESP as a Platform as a Service (PaaS) for companies that don’t want to manage their own cloud ESP service. Moreover, virtually all of the IoT suites with embedded ESP platforms are effectively ESP PaaS providers.
- Open source – Many of the ESP platforms that have emerged in the past four years are open source or a hybrid of open source with commercial value add. The first open source ESP, Esper, is more than ten years old and is still very widely used. Red Hat Drools Fusion, now also available as Red Hat Decision Manager, followed a few years later. But the introduction of Storm as an Apache project in 2014 was the start of an avalanche of open source ESPs that includes Apache Spark Streaming, Apache Flink, Apache Kafka KSQL, Apache Beam, Apache Samza, Apache Apex and Apache Gearpump. The basic Apache versions tend to be low on features and hard to use, although they are fine for experiments, learning and small projects by hardcore developers. However, vendors have brought a plethora of hybrid open source/commercial products to market that are supported and provide value added extensions that make it easier to author, administer and manage new applications. Examples include Confluent (Kafka), data Artisans (Flink), Databricks (Spark Streaming), DataTorrent (Apex), Google (Cloud Dataflow on Beam), Hortonworks (Hortonworks Dataflow on Storm), Impetus (on Storm and Spark Streaming), Informatica (Spark Streaming), Intel (Gearpump), Oracle (Spark Streaming), Radicalbit (Flink), and others (apologies to those we have forgotten to mention).
- Edge processing – The default architecture for many IoT applications is to run the stream analytics on the edge, near the source of the events. The sources of IoT events include sensors in devices, digital control systems (DCSs) and historian databases connected to the DCSs or sensors. In some cases, the ESP is actually running in a gateway, router, on a truck, car or train, or in another endpoint device itself. There are many good reasons to run ESP on or near the edge: lower latency for fast response to changing conditions, less network traffic, and greater availability (you can’t afford to have a factory, vehicle or other machine be inoperable because the network is down or the cloud server is down). This is giving rise to hierarchical configurations where initial processing is done on the edge, and then a subset of the events is forwarded to the cloud or a data center where another layer of stream processing is done.
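The hierarchical edge pattern can be sketched in a few lines of plain Python. This is an illustration only, not tied to any particular ESP product; the window size, threshold and event field names are made-up values for the sketch. The edge tier summarizes raw sensor readings locally and forwards only window summaries plus out-of-range readings upstream, which is what reduces latency and network traffic.

```python
from statistics import mean

# Hypothetical edge-tier filter: aggregate raw sensor readings locally and
# forward only window summaries plus out-of-range readings to the cloud tier.
# WINDOW_SIZE and ALERT_THRESHOLD are illustrative values, not product defaults.
WINDOW_SIZE = 5
ALERT_THRESHOLD = 90.0

def edge_process(readings):
    """Return the (much smaller) stream of events forwarded upstream."""
    forwarded = []
    for i in range(0, len(readings), WINDOW_SIZE):
        window = readings[i:i + WINDOW_SIZE]
        # One summary event per window instead of every raw reading.
        forwarded.append({"type": "summary", "avg": mean(window)})
        # Individual readings are forwarded only when they demand a fast response.
        forwarded.extend({"type": "alert", "value": r}
                         for r in window if r > ALERT_THRESHOLD)
    return forwarded

raw = [70.1, 71.3, 69.8, 95.2, 70.5, 68.9, 70.0, 71.1, 69.5, 70.2]
events = edge_process(raw)
# Ten raw readings collapse to two window summaries plus one alert (95.2).
```

In a real deployment the cloud or data-center tier would then run its own layer of stream processing over the forwarded events, as described above.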
- Stream Data Integration – A type of ESP has emerged that provides special features for stream data integration. To understand this, consider that ESPs are used for two main purposes:
- real-time stream analytics provide situation awareness to people through dashboards, alerts and mobile applications, or trigger automated responses when they detect conditions that require some sort of pre-designed response. This was the primary usage scenario for the early ESPs, and it is still the primary focus for the majority of ESPs (see list of 27 vendors below).
- stream data integration (also called “real-time ETL”) to store the event data in a database or file system for subsequent use by an analytics and business intelligence tool or a data science platform for machine learning. Products that focus on stream data integration provide adapters for various DBMSs, file systems and messaging systems such as Kafka, Kinesis, Pulsar or others (see list of 15 vendors below). Some also have adapters for change data capture (CDC). They are used for real-time, low-latency, high-volume ingestion of event data, or for bulk moves of data from one database or file to another.
Note that these two product categories have considerable overlap. ESP products that focus on real-time stream analytics are often also used to put event data into databases or files. Conversely, some of the products that focus primarily on stream data integration are also quite capable of real-time stream analytics to drive dashboards, send alerts or trigger automated responses.
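The two usage patterns above can be contrasted with a short Python sketch. This is a hypothetical illustration, not the API of any ESP product listed below; the event fields, threshold and record shape are invented for the example. One stream of order events feeds both an analytics path (detect a condition and raise an alert) and an integration path (reshape every event for a database or file sink).

```python
import json

# Hypothetical order-event stream; field names are made up for illustration.
stream = [
    {"order_id": "A1", "amount": 250},
    {"order_id": "A2", "amount": 12500},
    {"order_id": "A3", "amount": 980},
]

def detect_alert(event, threshold=10000):
    """Stream-analytics path: flag conditions needing a pre-designed response."""
    if event["amount"] > threshold:
        return f"ALERT: large order {event['order_id']} ({event['amount']})"
    return None

def to_sink_record(event):
    """Stream-data-integration path: reshape the event for a storage sink."""
    return json.dumps({"id": event["order_id"], "amt": event["amount"]})

# The analytics path emits only the events that match a pattern;
# the integration path lands every event in storage.
alerts = [a for e in stream if (a := detect_alert(e)) is not None]
sink_records = [to_sink_record(e) for e in stream]
# One alert (order A2) versus three stored records.
```

The overlap noted above shows up naturally here: the same event loop can feed both paths, which is why many products handle both even when they focus on one.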
ESPs for Stream Analytics
- Amazon Kinesis Analytics
- Axiros Axtract
- Concord Systems Concord
- Confluent/Apache Kafka KSQL
- data Artisans/Apache Flink
- Databricks/Apache Spark Streaming
- EsperTech Esper, EsperTech NEsper
- EVAM (Event and Action Manager)
- Fujitsu Software Interstage Big Data Complex Event Processing Server
- Hitachi uCosminexus Stream Data Platform
- IBM Streams, Operational Decision Manager (ODM)
- Impetus Technologies StreamAnalytix
- LinkedIn/Apache Samza
- LG CNS EventPro
- Microsoft Azure Stream Analytics, StreamInsight
- Oracle Stream Analytics and Stream Explorer
- Radicalbit (Flink)
- Red Hat Drools Fusion/Decision Manager
- SAP Event Stream Processor
- SAS Event Stream Processing Engine
- SQLstream Blaze
- Software AG Apama Streaming Analytics
- Streamlio Intelligent Platform for Fast Data (Heron)
- TIBCO BusinessEvents, StreamBase CEP
- Twitter/Apache Storm, Apache Heron
- Vitria VIA Analytics Platform
- WSO2 Stream Processor
ESPs for Stream Data Integration
- Alooma Platform
- Amazon Kinesis Firehose with Lambda
- Astronomer Cloud, Enterprise, Open (Apache Airflow)
- Confluent Platform (Apache Kafka Streams)
- Datastreams.io Data Stream Manager
- Equalum LTD Data Beaming platform
- Google Cloud Dataflow (Apache Beam)
- Hortonworks DataFlow (HDF)
- Informatica Big Data Streaming
- Intel (Apache) Gearpump
- Nexla Data Operations platform
- Pivotal Spring Cloud Data Flow
- StreamSets Data Collector
- Striim Platform
- Talend Data Preparation