1 May 2019
W. Roy Schulte
This article represents the author’s personal opinion, not necessarily that of Gartner Inc. or any other company.
After twenty-plus years of research and development, event stream processing (ESP) software platforms are no longer limited to use in niche applications or experiments. They have become essential tools for real-time analytics in many business situations.
The motivation is coming from the explosion in the amount of streaming data that needs to be analyzed, especially:
- sensor data from the Internet of Things (IoT);
- clickstreams from user interactions;
- social media events such as tweets, Instagram posts, Facebook posts and LinkedIn updates;
- market data;
- weather data; and
- event streams from transactions in business applications.
Academics began building general-purpose ESP platforms that developers can use to build and deploy stream analytics applications (then called complex-event processing (CEP)) as early as the 1990s. But only a handful of commercial products were available before 2010. These were used primarily for high-speed trading systems for financial exchanges and intelligence applications for government agencies.
In the past nine years, the number of commercial and open source ESP platforms has grown from a handful to more than 40. This post summarizes eight of the key trends for this software.
- Ubiquity – Virtually all major software vendors offer one or more ESP products (see lists below). Vendors realize that streaming data is only going to grow more plentiful, and that an increasing number of business applications need to be able to deal with that data in real time or near-real time.
- IoT – Several years ago, we projected that the IoT would be a killer application for ESP (actually killer applications because IoT is hundreds of different kinds of applications, not one kind). This is proving to be the case. Most IoT applications deal with sensor data, and sensor data is generated as real-time event streams. All of the IoT platform suites that we have seen include an ESP platform as part of the product. Most vendors of IoT platforms wisely choose to leverage their general-purpose ESP products rather than writing a new ESP platform just to embed in their IoT platform.
- Edge processing – The default architecture for many IoT applications is to run the stream analytics on or near the edge, close to the source of the events. The sources of IoT events include sensors, meters, digital control systems (DCSs), supervisory control and data acquisition (SCADA) systems, and historian databases connected to DCSs or SCADA systems. In some cases, the ESP runs in a gateway or router; on a truck, car or train; or even in an endpoint device. There are many good reasons to run ESP on or near the edge: lower latency for fast response to changing conditions; less network overhead; and greater availability (you can’t afford to have a factory, vehicle or other machine rendered inoperable because the network or the cloud server is down). This is giving rise to hierarchical configurations where initial stream processing is done on the edge, and then a subset of the processed and abstracted events is forwarded to the cloud or a data center where another layer of stream processing is done.
- Cloud ESP – Virtually all ESP products can run on public or private cloud infrastructure as a service (IaaS). A growing number of vendors, including Amazon Web Services, Google, IBM, Microsoft, Salesforce, SQLstream and others, offer ESP as a platform as a service (PaaS) for companies that don’t want to manage their own cloud ESP service. Moreover, virtually all of the IoT suites with embedded ESP platforms are effectively ESP PaaS providers.
- Parallel processing – Many of the ESP platforms that came to market in the past six years can be called distributed stream computing platforms (DSCPs) because they spread the workload across multiple servers. If the specific application allows data-parallel operations, the incoming data is sharded and distributed to multiple workers, enabling higher throughput (more events per second). Other kinds of ESP platforms can also be set up to distribute the work across multiple nodes, but they require more programming to do that.
- Advanced analytics – Many vendors are somewhere on the journey to integrate machine learning (ML) or business rule engines into their ESP platforms. ML libraries, such as scoring services, can be embedded into the event processing flow. Earlier ESP platforms generally were limited to user-defined functions (e.g., written in Java or in the vendor’s proprietary event processing language) without native support for off-the-shelf analytics.
- Open source – The open source movement has had a significant impact on stream processing in the past five years, just as it has impacted other software technologies. Open source comes in two very different flavors:
  - Free, open source, stream processing frameworks, mostly from GitHub/Apache, that enable developers to build and run applications without paying license fees. These lack commercial support, have limited development facilities and administration tools, and offer few connectors to external sources and sinks. However, they are fine for getting started, learning about event processing, and building small or temporary applications. In a few cases, highly proficient development teams have built big, mission-critical applications on these products. Examples of free open source products and their primary contributors include:
    - Apache Flink (Alibaba Ververica)
    - Apache Gearpump (Intel)
    - Apache Heron (Twitter)
    - Apache Kafka SQL (LinkedIn, Confluent)
    - Apache Samza (LinkedIn)
    - Apache Spark Streaming (Databricks)
    - Apache Storm (Twitter)
    - Drools Fusion (Red Hat)
    - Esper, NEsper (EsperTech)
  - Blended “open core” products that use the open source products mentioned above with the addition of proprietary value-added features. These have commercial support, so they appeal to big enterprises that are averse to risk and are willing to pay license, maintenance or subscription fees. They also generally have better development and administration tools, and connectors to more external systems. Many have real-time dashboards; some have security extensions or change-data-capture (CDC) adapters. These can cost as much as the fully proprietary ESP products, and they lock the application in much the same way as a fully proprietary product. Nevertheless, buyers like the aura of (partially) open source, and many of these products have a good set of modern features. Vendors like open core because they don’t have to develop the whole product themselves, so they can focus their resources on the extensions that differentiate their products. Examples include:
    - Alibaba Ververica Platform (formerly data Artisans, on Flink)
    - Amazon Kinesis Data Analytics for Java (on Flink)
    - Cloudera Hortonworks DataFlow (on Kafka, NiFi, Storm)
    - Confluent Platform (on Kafka)
    - Databricks Spark Streaming (on Spark)
    - EsperTech Esper Enterprise Edition
    - Google Cloud Dataflow (with Apache Beam)
    - Impetus StreamAnalytix (on Flink, Spark, Storm)
    - Informatica Big Data Streaming (on Spark)
    - Oracle Stream Analytics (on Spark)
    - Pivotal Spring Cloud Data Flow
    - Radicalbit Natural Analytics (on Flink, Kafka, Spark)
    - Red Hat Decision Manager (on Drools Fusion)
    - Streamlio Intelligent Platform for Fast Data (on BookKeeper, Heron, Pulsar)
    …and apologies to those I may have overlooked
  - Note that other vendors, including Software AG (Apama) and WSO2 (Stream Processor), also provide their ESP products as open source.
- Stream Data Integration (SDI) – A type of ESP has emerged that provides special features for SDI (also called “real-time ETL”). They are used for real-time, low latency, high-volume ingestion of streaming event data, or for bulk data movement from one database or file to another. Products that focus on SDI provide adapters for various DBMSs, file systems and messaging systems such as Kafka, Kinesis, Pulsar or others. Note that the other ESP products (those that focus mostly on real time stream analytics) are also often used to put event data into databases or files (i.e., they can be used for SDI even though they may not have all of the data integration features of the SDI specialists). Conversely, some of the products that focus primarily on SDI are also capable of real-time stream analytics to drive dashboards, send alerts or trigger automated responses. Some of these products are not all that different from the general ESP platforms. Examples of SDI-focused products include:
  - (Google) Alooma Platform
  - Astronomer Cloud, Enterprise, Open/Apache Airflow
  - (Qlik) Attunity Replicate, Compose
  - Equalum LTD Data Beaming
  - HVR Software Real-time Replicator
  - IBM DataStage, BigIntegrate, InfoSphere Information Server
  - Informatica Big Data Streaming
  - InfoWorks Autonomous Data Engine
  - Nexla Data Operations
  - StreamSets Data Collector
  - Syncsort DMX
  - Talend Data Streams
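For readers who prefer code to prose, two of the trends above (edge processing and parallel processing) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not how any of the products named in this post is implemented: the function names (`edge_summarize`, `shard_for`, `distribute`), the event layout, the 10-second window and the sample readings are all hypothetical. The edge tier collapses raw sensor readings into one summary per sensor per time window, and the cloud tier then shards those summaries by sensor so that each worker sees all of the events for the keys it owns.

```python
from collections import defaultdict
from statistics import mean
from zlib import crc32


def edge_summarize(events, window_seconds=10):
    """Edge tier (hypothetical): one summary per sensor per tumbling window."""
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in events:
        # Align each reading to the start of its fixed-size window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(sensor_id, window_start)].append(value)
    return [
        {"sensor": sensor_id, "window_start": start,
         "avg": mean(values), "count": len(values)}
        for (sensor_id, start), values in sorted(buckets.items())
    ]


def shard_for(key, num_workers):
    """Stable hash, so every event for a given key maps to the same worker."""
    return crc32(key.encode("utf-8")) % num_workers


def distribute(summaries, num_workers):
    """Cloud tier (hypothetical): partition the forwarded summaries by sensor."""
    shards = [[] for _ in range(num_workers)]
    for summary in summaries:
        shards[shard_for(summary["sensor"], num_workers)].append(summary)
    return shards


# Made-up readings: (sensor id, timestamp in seconds, measured value).
raw_events = [
    ("pump-1", 0.5, 20.0),
    ("pump-1", 3.0, 22.0),
    ("pump-2", 4.0, 50.0),
    ("pump-1", 12.0, 30.0),
]

summaries = edge_summarize(raw_events)          # four readings become three summaries
shards = distribute(summaries, num_workers=2)   # summaries spread over two workers
```

Only the summaries cross the network, which is where the savings in network overhead come from. Sharding on the sensor identifier is one common choice of key; it works only when the analytics for one sensor do not need data from another, which is the "data-parallel" condition mentioned above.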
To close, here is a list of some other significant ESP platforms not listed above in the open source or SDI sections:
- Amazon Kinesis Data Analytics
- Axiros Axtract
- EVAM (Event and Action Manager)
- Fujitsu Software Interstage Big Data Complex Event Processing Server
- (Thales) Guavus SQLstream Blaze
- Hitachi uCosminexus Stream Data Platform
- IBM Streams and Decision Server Insights (ODM)
- LG CNS EventPro
- MapR Converged Data Platform with Streams
- Microsoft Azure Stream Analytics, StreamInsight
- SAP Event Stream Processor
- SAS Event Stream Processing Engine
- Software AG Apama Streaming Analytics
- Striim Platform
- TIBCO BusinessEvents, Streaming
- Vitria VIA Analytics Platform
- WSO2 Stream Processor