The opinions expressed in this article are my own and may not reflect the position of my employer or any other company.
Every big company has hundreds of event streams constantly flowing through its corporate networks. For many business purposes, it is practical to process the data in these event streams with conventional application programs, business intelligence (BI), or data science tools. In such cases, the application developer or analyst typically doesn’t even think of the input data as an event stream; it is “just data.”
However, sometimes conventional programming, BI, or data science tools lack essential features or can’t calculate the desired results fast enough because data is arriving so quickly and the processing logic is too complicated to execute in the available time. In some of these cases, the application developer or business analyst should use an event stream processing (ESP) platform as the software foundation for an event streaming application. Examples of ESP platform products include Confluent ksqlDB, Flink, IBM Streams, Kafka Streams, Microsoft Azure Stream Analytics, Oracle Stream Analytics, Spark Streaming, TIBCO Streaming, and more than two dozen others.
How do you know if you need an ESP platform? It depends on the nature of the business problem. The purpose of this article is to help architects and developers identify the usage scenarios in which ESP platforms or related specialized software are appropriate.
An event is anything that happens. An event object (also called “event,” event message, or notification) is a software object that represents, encodes, or records an activity, generally for the purpose of computer processing. Event objects usually include data about the type of activity, when the activity happened (e.g., a time and date stamp), and sometimes the location of the activity, its cause, and other information. An event stream is a sequence of event objects, typically in order of arrival time. Event streams may move via a messaging system such as AMQP, DDS, JMS, Kafka, Pulsar, RabbitMQ, or others, or they may go from event sources to event consumers via RESTful APIs or other communication mechanisms.
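To make the idea concrete, here is a minimal sketch of an event object as a Python dataclass. The class name and fields are illustrative only, not the schema of any particular messaging system or product:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class EventObject:
    """A minimal event object: what happened, when, and (optionally) where."""
    event_type: str                 # kind of activity, e.g. "order_placed"
    timestamp: datetime             # when the activity happened
    payload: dict = field(default_factory=dict)  # other details (cause, amounts, ...)
    location: Optional[str] = None  # optional location of the activity

# A hypothetical business-transaction event (the first kind of stream above)
order_event = EventObject(
    event_type="order_placed",
    timestamp=datetime.now(timezone.utc),
    payload={"order_id": "A-1001", "amount": 59.95},
)
```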
Organizations have four kinds of event streams:
- The first is a copy of business transactions, such as customer orders, insurance claims, bank deposits, call data records (in telecommunication companies), airline seat reservations, or invoices. These reflect the operational activities of the company and are mostly generated internally.
- The second are information reports from external sources, such as news feeds, market data feeds, weather reports, or notifications (such as Advanced Shipping Notices) from partners.
- A third kind of event stream documents interactions with people, such as Web clickstreams, contact center phone call logs, emails, tweets, text messages, and Facebook, LinkedIn, or other social media posts.
- The fourth, and fastest growing, kind of event stream contains data coming from physical devices. This is generally characterized as Internet of Things (IoT) data, and it includes location data from vehicles or smart phones, temperature or accelerometer data from sensors, RFID tag readings, heart beats from patient monitors, meter readings, and signals from machine control systems.
What Do You Do with Event Streams?
Organizations sometimes process events immediately as they arrive or soon thereafter for operations intelligence purposes. In other cases, they ingest events and store them in a data lake or data warehouse for use hours, days or weeks later in business intelligence (BI) reports, machine learning (ML), or other analytical applications.
When event streams are used for operations intelligence, their role is to provide context information to improve the situation awareness of the system or person that receives the information. Situation awareness means knowing what is going on at the time so you can decide what to do. A system or person that receives one or more kinds of event streams knows more, and can make better decisions, than a system or person that lacks streaming data.
When people think of event streams, they usually think of push-based continuous intelligence systems. Push-based systems listen to one or more event streams and recalculate as new data arrives without being asked. They may refresh a dashboard every second or minute, send alerts, or trigger automated responses without human involvement (decision automation). They may simply monitor a data source, such as Twitter, or help manage a complicated business operation, such as a customer contact center, supply chain, manufacturing line, water utility, telecommunication network, truck fleet, or payment process.
Less-Demanding “Continuous Intelligence”
Low-to-moderate volume continuous intelligence systems that process event streams can be implemented using conventional programming languages or tools without an ESP platform. The system may be implemented in SaaS, a packaged application, a custom-written application, an application built on an Analytics and BI (ABI) platform (such as PowerBI), or even using Excel, depending on the business requirements.
For example, a customer contact center monitoring system dashboard may show the number of calls and the average call duration in a half-hour moving window, updated every five minutes. It could also send alerts to mobile devices when the average customer on-hold wait time exceeds one minute. The calculations here are relatively lightweight: simple aggregates such as count, sum, average, maximum, or minimum. Aggregate calculations depend on knowing whether an event falls within a particular time window, but they don’t change based on the order or timing of individual events within the window.
When the volume of data is relatively low (e.g., under 50 events per second) and new results need to be shown only every five minutes, conventional software can handle the job. For example, the application can be run as a minibatch application that is triggered every five minutes to read in the recent data and recalculate results. This is “continuous” in the sense that it periodically repeats forever, although it is not smoothly “continuous” in the sense of recalculating every time new data arrives.
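A minibatch recalculation of the contact center aggregates described above can be sketched in a few lines of plain Python. The record layout and function name here are hypothetical, for illustration only:

```python
from datetime import datetime, timedelta

def window_stats(calls, now, window=timedelta(minutes=30)):
    """Recompute simple aggregates over a moving window of call records.

    `calls` is a list of (start_time, duration_seconds, hold_seconds) tuples;
    a scheduler would invoke this function every five minutes.
    """
    recent = [c for c in calls if now - c[0] <= window]
    if not recent:
        return {"count": 0, "avg_duration": 0.0, "avg_hold": 0.0}
    return {
        "count": len(recent),
        "avg_duration": sum(c[1] for c in recent) / len(recent),
        "avg_hold": sum(c[2] for c in recent) / len(recent),
    }

now = datetime(2024, 1, 1, 12, 0)
calls = [
    (now - timedelta(minutes=5), 240, 30),
    (now - timedelta(minutes=50), 600, 90),  # outside the 30-minute window
]
stats = window_stats(calls, now)
assert stats["count"] == 1 and stats["avg_hold"] == 30.0
```

Note that the entire window is rescanned on each run; at under 50 events per second and a five-minute refresh, that brute-force approach is perfectly adequate.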
Using ESP Platforms for More-Demanding Continuous Intelligence
ESP platforms are software subsystems that process data “in motion,” meaning data that is processed before it is written to a separate database or file (becoming data “at rest”). The query is pre-loaded, so the data comes to the query rather than the query coming to the data (as in conventional programming). ESP platforms retain a relatively small working set of stream data in memory for the duration of a limited time window, typically seconds to hours – just long enough to detect patterns or compute queries. ESP platforms are more flexible than hardwired applications because the query can be adjusted to handle different kinds of input data, different time windows (e.g., one minute or one hour instead of ten minutes) and different search terms.
Continuous intelligence applications are best implemented on ESP platforms if:
- the volume of data is high (thousands or millions of events per second), or
- results must be recalculated frequently (every millisecond or every few seconds), or
- multiple simultaneous queries are applied to the same input event stream, or
- the business problem requires detecting patterns that involve relationships between individual events. This logic is more sophisticated than aggregation because the relative order and timing of specific events matter (in temporal patterns), or the relative location matters (geospatial patterns).
For example, Twitter uses Heron, an ESP platform, to monitor the Twitter feed, which averages more than 6,000 tweets per second. A simple query might report the number of tweets that included the word “inflation” in the past ten minutes. However, at any one time, there may be many thousands of simultaneous queries in effect against the Twitter feed, each looking for different key words, different time windows, and different kinds of metrics.
In high-volume scenarios, ESP platform applications scale out vertically (multiple engines working in parallel on the same step in a processing flow) and/or horizontally (splitting the work into a sequential pipeline in which work is handed from one engine to the next while working on the same multistep event processing query).
Sophisticated ESP platforms support incremental computation to reduce processing overhead and latency. Consider an ESP application that reports the moving average price of Amazon’s stock in the most-recent ten-minute window, updated every 10 seconds. Stock exchanges may report thousands of price quotes and trades per second for any given instrument. For queries that involve moving windows on event data, the fastest and most efficient way to re-calculate a metric such as average price is to subtract out event data that has aged out of the window and add in the most recent data. This uses far fewer instructions than using brute force to recompute the entire metric by adding up all of the events in the time window and dividing by the count every time the window advances.
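The subtract-the-old, add-the-new approach can be sketched as follows. This is an illustrative Python fragment, not any vendor’s implementation; the class name and parameters are assumptions:

```python
from collections import deque

class IncrementalMovingAverage:
    """Moving average over a time window, maintained incrementally:
    add new events, subtract events that age out, never rescan the window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, price) pairs currently in the window
        self.total = 0.0

    def add(self, ts, price):
        self.events.append((ts, price))
        self.total += price
        # evict events that have aged out of the window, subtracting them out
        while self.events and ts - self.events[0][0] > self.window:
            _, old_price = self.events.popleft()
            self.total -= old_price

    def average(self):
        return self.total / len(self.events) if self.events else 0.0

ma = IncrementalMovingAverage(window_seconds=600)  # ten-minute window
ma.add(0, 100.0)
ma.add(300, 110.0)
ma.add(700, 120.0)  # the quote at t=0 ages out (700 - 0 > 600)
assert ma.average() == 115.0  # (110 + 120) / 2
```

Each update costs a constant amount of work per arriving or expiring event, instead of work proportional to the number of events in the window.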
ESP platforms also make it easier to build applications that detect instances of temporal patterns. A simple example:
If the price of Amazon stock increases more than 5% within a 30-minute window, then sell 25% of our holdings and notify a trader.
This constantly calculates the difference between individual event records that report the current price and an earlier low price within the 30-minute interval. The relative timing matters (comparisons occur within each 30-minute moving window), the order of events matters (the higher price must occur after the lower price), and the price difference (5%) between the events matters. ESP systems retain event data in memory so relationships between event pairs can be tracked.
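A brute-force sketch of this temporal pattern in Python follows. A real ESP engine would maintain the window low incrementally rather than rescanning it per event; the names and the quotes used here are illustrative:

```python
from collections import deque

def check_breakout(quotes, window_seconds=1800, threshold=0.05):
    """Detect whether any price rise of more than `threshold` occurs within a
    moving window. The low must come at or before the high, so order matters.

    `quotes` is a time-ordered list of (timestamp_seconds, price) pairs.
    """
    window = deque()  # quotes currently inside the moving window
    for ts, price in quotes:
        window.append((ts, price))
        # drop quotes that have aged out of the 30-minute window
        while ts - window[0][0] > window_seconds:
            window.popleft()
        # lowest price in the window; it occurs no later than the current quote
        low = min(q[1] for q in window)
        if low > 0 and (price - low) / low > threshold:
            return True  # e.g., sell 25% of holdings and notify a trader
    return False

quotes = [(0, 100.0), (600, 98.0), (1200, 103.5)]
assert check_breakout(quotes)        # 98 -> 103.5 is a 5.6% rise within 30 min
assert not check_breakout([(0, 100.0), (600, 104.0)])  # only a 4% rise
```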
Business problems that require tracking many different people or things that are moving or otherwise changing also benefit from ESP platforms. For example:
Send a text message to me if anyone from my family and friends affinity group comes within ½ mile of my location.
This is effectively a large number of queries in operation at once. A mobile phone company that wants to provide this service to its subscribers could need to track the locations of hundreds of thousands of people simultaneously. The system would receive tens of thousands of updates (location events) per second. For each update, it must compare the person’s location with the locations of everyone in their affinity group. Although the logic to calculate distance is simple, the number of instructions required to compute all of the possible combinations of people and locations would be prohibitive using conventional application design practices.
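The per-event logic can be sketched as below, assuming a hypothetical in-memory position table and affinity-group map. A production system would also need geospatial indexing and partitioning to keep the pairwise comparisons tractable at scale:

```python
import math

def within_half_mile(lat1, lon1, lat2, lon2):
    """Great-circle distance check (haversine formula) against a half-mile radius."""
    r_miles = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r_miles * math.asin(math.sqrt(a)) <= 0.5

def on_location_event(user, lat, lon, positions, affinity_groups):
    """Handle one location event: update this user's position, then check
    only the members of their affinity group (not all subscribers)."""
    positions[user] = (lat, lon)
    nearby = []
    for friend in affinity_groups.get(user, ()):
        if friend in positions and within_half_mile(lat, lon, *positions[friend]):
            nearby.append(friend)  # e.g., send a text message to `user`
    return nearby

positions = {"bob": (40.7580, -73.9855)}           # Bob is in Times Square
groups = {"alice": ["bob"], "bob": ["alice"]}
# Alice appears one block away; Bob is well within half a mile
assert on_location_event("alice", 40.7590, -73.9845, positions, groups) == ["bob"]
```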
Applications on today’s ESP platforms are implemented as multistep data flows going through a tailored sequence of operators (a “topology” or “event processing network”) to perform the processing logic. Operators may perform filtering, transformations, enrichment (lookups or calculations), including joins, or pattern matching.
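Such a topology can be approximated in plain Python with chained generator transforms. The operators, event fields, and lookup table below are illustrative, not any platform’s API:

```python
def make_pipeline(*operators):
    """Chain operators into a simple linear topology: each operator is a
    generator transform that consumes one event stream and yields another."""
    def run(stream):
        for op in operators:
            stream = op(stream)
        yield from stream
    return run

# Filtering: keep only trade events (drop quotes, heartbeats, ...)
def only_trades(stream):
    return (e for e in stream if e["type"] == "trade")

# Enrichment: look up a venue name for each trade (lookup table is hypothetical)
VENUES = {"N": "NYSE", "Q": "Nasdaq"}
def add_venue(stream):
    return ({**e, "venue": VENUES.get(e["ex"], "other")} for e in stream)

# Transformation: project only the fields downstream consumers need
def project(stream):
    return ({"sym": e["sym"], "px": e["px"], "venue": e["venue"]} for e in stream)

pipeline = make_pipeline(only_trades, add_venue, project)
events = [
    {"type": "quote", "sym": "AMZN", "px": 130.1, "ex": "Q"},
    {"type": "trade", "sym": "AMZN", "px": 130.2, "ex": "Q"},
]
out = list(pipeline(events))
assert out == [{"sym": "AMZN", "px": 130.2, "venue": "Nasdaq"}]
```

Real ESP platforms generalize this linear chain into a directed graph of operators, distribute the operators across engines, and handle state, time windows, and fault tolerance.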
Interactive and Batch Applications on Event Streams
Not all event streaming applications are continuous and push-based as those described above. Some applications that process event streams are pull-based, i.e., they run on demand when triggered by a person or external system. For example, a person may use an ABI platform (e.g., PowerBI) interactively for ad hoc data exploration and visualization of a snapshot of streaming data that has been loaded into the ABI platform. Or an analyst may use a data science and machine learning (DSML) platform to build and train a machine learning (ML) model by running different experiments on the same input data set (a stored copy of the event stream). [We’ll defer discussion of ML systems that re-train models continuously from event streams for another day.] Other pull-based examples:
- engineers use historical sensor event streams to find ML patterns that predict when machines will break
- security analysts study network intrusion patterns
- financial risk managers simulate the effect of interest rates and economic disruptions on portfolios
- marketing departments analyze customer behavior over time.
All of these on-demand (pull-based) applications are “batch” processes in the sense that they operate on a fixed set of events that previously arrived. The data is a snapshot of an event stream that may be minutes, hours, weeks, or years old. The data is typically at rest in a database or file.
Batch processes that are triggered by timers are a combination of push and pull. They are a push in the sense that calculations execute automatically without an immediate request from a person or external system. However, they are a pull in the sense that they operate on a fixed batch of data and the timer is, in effect, an internal requester. They are not triggered by the arrival of streaming data, as in a full push-based system. A timer-based system that generates hourly, daily, or weekly reports feels like a traditional batch process. However, a timer-based system that runs every 100 milliseconds is, for many business problems, like any push-based streaming application in its ability to drive “real-time” dashboards, send alerts, or trigger automated responses. Most Spark Streaming applications and many applications on other ESP platforms use timers as part of the logic in “push”-based applications.
Most on-demand (pull-based) event streaming applications use custom-written programs, ABI platforms, DSML platforms, or other analytical tools. However, ESP platforms are used for some on-demand applications in two ways:
- Sometimes ESP platforms are used for their expressiveness and power in implementing logic that deals with temporal or geospatial relationships between events. Analysts may use ESP platforms to run analytics on historical event streams by re-streaming the old event data from an event log in a file or database through the ESP engine. This is relevant for investigating past situations or for developing models for subsequent use in real-time, continuous intelligence, push-based ESP applications. For example, fraud analysts in financial markets back-test new trader surveillance algorithms by replaying months’ worth of trade data to see how the new algorithms would have performed. One variation of re-streaming event data for analytical purposes has been described as the Kappa architecture (see https://www.oreilly.com/ideas/questioning-the-lambda-architecture).
The field of complex-event processing (CEP) originated to handle this kind of work. The Rapide language, developed at Stanford University in the 1990s, was used to perform on-demand analytics on historical or simulated event streams. Rapide is an event-oriented modeling language that can support very sophisticated models, including concepts such as horizontal and vertical causality. Rapide can support forensic analyses to debug the design of chips or to investigate the cause of power blackouts. A detailed description is in “The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems” by David Luckham (Addison-Wesley Professional, 2002). The Rapide system was later adapted to operate against streaming data in motion for real-time push-based applications, as described in the section above on continuous intelligence.
- Another way that some ESP platforms are used in an on-demand manner occurs in systems that allow external applications to read data maintained within the ESP platform interactively. These systems are fundamentally “push-based” in the sense that they continuously listen to one or more real-time event streams and perform calculations as new data arrives. They can control real-time dashboards, send alerts, trigger automated responses, or generate complex events for downstream use by other applications, just like any push-based ESP application. However, all ESP platforms maintain internal event data caches to enable them to perform stateful queries in time windows. Some ESP platforms allow an external interactive, pull-based application to read data on demand from their caches (state stores) rather than receiving the information as a push.
Stream Analytics with Integrated Data Stores
ESP platforms are not the only kind of software that is optimized for high performance analytics on streaming event data. ESP platforms with externally accessible state stores (as described in the previous paragraph) are part way toward another class of product that we call “stream analytics with integrated data stores.”
Examples of these products include:
- Cogility Cogynt
- Evam Actions and Evam Intelligence
- (Thales) Guavus-IQ
- Hazelcast Platform
- Imply Data Platform
- Joulica Amazon Connect
- (FD Technologies) KX Insights and KX Streaming Analytics
- OneMarketData OneTick
- (Aveva) OSIsoft PI
- Radicalbit Natural Analytics
- Scaleout StreamServer
- Scuba Continuous Intelligence Platform
- Snowplow Insights
- and many others.
These very diverse products provide on-demand, pull-based analytics, but some are also used for push-based continuous intelligence. They ingest and store high-volume event streams very quickly, making the “at rest” event data immediately available for interactive queries, exploration, and visualization. Their high performance is enabled by their respective proprietary data models. Some are optimized for partly structured log data, others for structured sensor or market data. They may store terabytes of data for months or years, depending on the application.
These products can support the high throughput and low latency required in many event streaming applications better than conventional applications, BI tools, and data science platforms. They provide more-extensive event data storage capabilities than ESP platforms but cannot always match the ultra-high throughput, ultra-low latency, and programming flexibility of ESP platforms. Their usage scenarios overlap those of ESP platforms, and in some cases, they have embedded ESP platforms.
In a related article, How to Really See What’s Happening in Real-Time Event Processing Systems, David Luckham explains the use of event hierarchies and gives two examples of their use, one in manufacturing and another in retail. A longer report, Monitoring A Chip Fabrication Facility In Real-Time Using Complex Event Processing, explains the technical aspects of event abstraction and complex-event processing (CEP) for those who want to understand how to implement it.