The Future of Event Stream Analytics and CEP
by David Luckham and Roy Schulte
This article represents personal opinion, not the position of any company or other organization.
In a previous article we discussed two key event processing concepts, event stream processing (ESP) and complex event processing (CEP): their basic terminology and relationships, and the different kinds of problems that commercial event processing tools are being used to solve. In that article we said that the main differences arose because ESP was directed at analyzing linear streams of events arriving in the order in which they were created, whereas CEP was intended for analyzing clouds of events consisting of many event streams, possibly arriving out of order.
ESP is a simple subset of CEP, but it has very fast implementations that can handle many thousands of events per second, which is not as true of implementations of a full set of CEP capabilities. In short, ESP is “simple problems fast” whereas CEP is “complex problems slower”. Moreover, ESP is simple to implement!
First Steps Towards Merging ESP and CEP
What seems to be happening is that modern ESP platform products (also called stream analytics or event stream analytics products) are now supporting a combination of ESP and certain key CEP capabilities “under the hood”.
For example, Flink has a library, FlinkCEP, that helps you detect patterns in event streams, and Flink SQL supports a subset of the SQL:2016 Row Pattern Recognition (MATCH_RECOGNIZE) clause, which is implemented on top of FlinkCEP. Pattern detection is more sophisticated than the aggregate functions (e.g., count, sum, average, maximum, minimum) used in common ESP applications.
Many of the modern ESP platforms, including Confluent ksqlDB, Flink, Google Cloud Dataflow, Spark Streaming, SQLstream and others, also have varying degrees of support for late-arriving and out-of-order events. This capability was previously rare, found only in the most sophisticated early CEP projects such as Stanford’s Rapide.
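To make the difference between an aggregate and a pattern concrete, here is a minimal plain-Python sketch. It does not reproduce the FlinkCEP Pattern API or MATCH_RECOGNIZE syntax; the `Event` type, the event kinds "A" and "B", and the function names are hypothetical. The aggregate only counts, whereas the pattern relates specific events to each other in order and within a time bound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    key: str    # grouping key, e.g. an account id (hypothetical field)
    kind: str   # event type, e.g. "A" or "B"
    ts: float   # event time in seconds

def count_aggregate(events, kind):
    """A typical ESP aggregate: how many events of one kind occurred."""
    return sum(1 for e in events if e.kind == kind)

def detect_followed_by(events, first, second, within):
    """A simple sequence pattern: per key, an event of type `first` followed by
    an event of type `second` no more than `within` seconds later.
    Assumes events arrive in event-time order."""
    pending = {}   # key -> time of the latest unmatched `first` event
    matches = []
    for e in events:
        if e.kind == first:
            pending[e.key] = e.ts
        elif e.kind == second and e.key in pending:
            start = pending.pop(e.key)
            if e.ts - start <= within:
                matches.append((e.key, start, e.ts))
    return matches

events = [Event("u1", "A", 0), Event("u2", "A", 1),
          Event("u1", "B", 5), Event("u2", "B", 30)]
print(count_aggregate(events, "A"))               # 2
print(detect_followed_by(events, "A", "B", 10))   # [('u1', 0, 5)]
```

The count says two "A" events happened; only the pattern can say that just one of them was followed by a "B" quickly enough.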
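The platforms implement this in different ways, but a common idea (used, for example, by Flink and Apache Beam) is a watermark: a moving event-time threshold below which no more events are expected. The sketch below is a simplified, hypothetical version: `ReorderBuffer` and `max_delay` are illustrative names, and the watermark is assumed to trail the largest event time seen by a fixed delay. It distinguishes event time from arrival time by buffering events and releasing them in event-time order.

```python
import heapq

class ReorderBuffer:
    """Simplified watermark-based reordering of out-of-order events.

    The watermark trails the largest event time seen so far by `max_delay`.
    Buffered events at or below the watermark are released in event-time
    order; an event that arrives with a time at or below the current
    watermark is treated as late and set aside instead of being released.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.heap = []                  # entries: (event_time, arrival_seq, payload)
        self.seq = 0                    # tie-breaker so payloads are never compared
        self.watermark = float("-inf")
        self.late = []                  # (event_time, payload) pairs that arrived too late

    def push(self, event_time, payload):
        if event_time <= self.watermark:
            self.late.append((event_time, payload))
            return []
        heapq.heappush(self.heap, (event_time, self.seq, payload))
        self.seq += 1
        self.watermark = max(self.watermark, event_time - self.max_delay)
        return self._release(self.watermark)

    def flush(self):
        """Release everything still buffered, e.g. at end of stream."""
        return self._release(float("inf"))

    def _release(self, upto):
        out = []
        while self.heap and self.heap[0][0] <= upto:
            t, _, payload = heapq.heappop(self.heap)
            out.append((t, payload))
        return out

buf = ReorderBuffer(max_delay=5)
released = []
for t, p in [(1, "a"), (4, "b"), (2, "c"), (9, "d"), (3, "e"), (15, "f")]:
    released += buf.push(t, p)
released += buf.flush()
print(released)   # [(1, 'a'), (2, 'c'), (4, 'b'), (9, 'd'), (15, 'f')]
print(buf.late)   # [(3, 'e')]
```

The design trade-off is visible in `max_delay`: a larger delay tolerates more disorder but holds results back longer. Real platforms also let late events update earlier results rather than only setting them aside.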
On the other hand, at the moment we are not aware of any modern ESP platform that has native support for advanced CEP features such as causality between events or explicit event abstraction hierarchies.
Most stream analytics applications are still solving simple ESP problems like “count the number of tweets on COVID-19 in the past 10 minutes,” so they don’t need CEP features. However, the use of CEP pattern detection is increasing as event streams proliferate and companies automate more-demanding decision-making scenarios. We expect that the most advanced CEP concepts, such as causality and standardized event hierarchies, will begin to be applied as organizations ramp up their analysis capabilities.
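A query like the tweet count above needs little more than a sliding time window. The following is a minimal sketch, assuming timestamps in seconds that arrive in order and tweets that have already been filtered to the topic upstream; `SlidingWindowCounter` is a hypothetical name, not any platform's API.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in the last `window` seconds, i.e. in (now - window, now].
    Assumes timestamps arrive in non-decreasing order, the simple in-order
    case that ESP engines handle very efficiently."""

    def __init__(self, window):
        self.window = window
        self.times = deque()

    def add(self, ts):
        self.times.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self.times)

    def _evict(self, now):
        # Drop timestamps that have slid out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()

# Hypothetical stream: timestamps (seconds) of tweets already filtered to the topic.
counter = SlidingWindowCounter(window=600)   # 10 minutes
for ts in [0, 100, 500, 650]:
    counter.add(ts)
print(counter.count(650))    # 3
print(counter.count(1100))   # 1
```

Each event is appended and evicted once, so the cost per event is constant on average, which is one reason such ESP queries scale to high event rates.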
Why ESP Is Not Enough
Some systems cannot be monitored for correct behavior without using event causality. For example, a call center dealing with a large volume of customer calls, including complaints, may have a contractual requirement that every complaint (event C) must be logged (L), must result in an investigation (I), a refund (R) or a letter of apology (A), and must lead to a resolution (RES). A pattern constraint monitoring the call center contract must track the causality between events. Such a pattern would look like: C -> L -> (I or R or A) -> RES. In effect, causality (->) is used to track the chain of events that must happen in response to each complaint. A simple AND pattern will not work: with large numbers of similar events, an AND of all these event types would produce many irrelevant matches between events that are not related. Suppose the same customer complained three times, leading to three separate event tracks; an AND would jumble them all together.
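The contrast between causal tracking and a plain AND can be sketched in a few lines. This assumes, hypothetically, that every event records the id of the event that caused it (the `cause` field); `Ev`, `CHAIN`, `resolved_complaints` and `and_pattern` are illustrative names. The causal check follows each complaint's own chain, while the AND only asks whether the right event types exist somewhere.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Ev:
    eid: str
    kind: str                    # "C", "L", "I", "R", "A" or "RES"
    customer: str
    cause: Optional[str] = None  # id of the causing event (assumed to be recorded)

# The pattern C -> L -> (I or R or A) -> RES, one set of allowed kinds per step.
CHAIN = [{"C"}, {"L"}, {"I", "R", "A"}, {"RES"}]

def resolved_complaints(events):
    """Ids of complaints whose own causal chain reaches a resolution."""
    children = {}
    for e in events:
        if e.cause is not None:
            children.setdefault(e.cause, []).append(e)

    def reaches_end(ev, step):
        if step == len(CHAIN) - 1:
            return True
        return any(c.kind in CHAIN[step + 1] and reaches_end(c, step + 1)
                   for c in children.get(ev.eid, []))

    return [e.eid for e in events if e.kind in CHAIN[0] and reaches_end(e, 0)]

def and_pattern(events, customer):
    """Naive AND: does the customer have *some* event for every step,
    regardless of which complaint each event belongs to?"""
    kinds = {e.kind for e in events if e.customer == customer}
    return all(kinds & step for step in CHAIN)

# One customer, three complaints; only c1 is carried through to a resolution.
events = [
    Ev("c1", "C", "x"), Ev("c2", "C", "x"), Ev("c3", "C", "x"),
    Ev("l1", "L", "x", "c1"), Ev("l2", "L", "x", "c2"), Ev("l3", "L", "x", "c3"),
    Ev("r1", "R", "x", "l1"), Ev("i3", "I", "x", "l3"),
    Ev("res1", "RES", "x", "r1"),
]
print(and_pattern(events, "x"))     # True: satisfied by mixing the three tracks
print(resolved_complaints(events))  # ['c1']: c2 and c3 violate the contract
```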
Business intelligence required to manage some systems must use event hierarchies. A simple example is the management of large sales-oriented websites such as Amazon.com, where one must make business sense out of low-level events on the website. At a low level are website events created by the activity of customers, such as Logon, Search-Catalogue, Choose-Item, Add-to-Cart, Review-Price, Discard-Item, Purchase, Check-Out, etc. At a higher level are business intelligence events, resulting from sets of the lower-level events, such as Reclassify-Customer, View-Sales-Promotion, Update-Item-Interest, Reclassify-Item, etc. A Reclassify-Customer event would be a higher-level business event resulting from a customer’s activity over a period of time, and would be defined in terms of a set of that customer’s website events over some time window. These higher-level events are derived by abstracting from sets of lower-level events, which is sometimes called vertical causality. There would be even higher-level events resulting from abstracting sets of Reclassify-Customer events and other types of events.
Business intelligence events play an important role in running the website and maximizing sales, and gathering them involves an event hierarchy. Specifying such a hierarchy with mathematical precision is a typical application of event hierarchies in CEP.
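A toy version of such a hierarchy can be written as two abstraction steps, each producing events that keep references to the lower-level events they were built from, so that a business event can be traced back to its causes. Everything here is hypothetical: the "three purchases within an hour" rule for Reclassify-Customer, the Segment-Shift event at level 2, and the function names are illustrative stand-ins, not a precise specification.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str
    customer: str
    ts: float
    level: int = 0
    members: tuple = ()   # the lower-level events this event was abstracted from

def reclassify_customers(events, window, min_purchases=3):
    """Level 0 -> level 1: a Reclassify-Customer event for each customer with at
    least `min_purchases` Purchase events inside some `window`-second span.
    The rule and its threshold are hypothetical."""
    purchases = defaultdict(list)
    for e in events:
        if e.kind == "Purchase":
            purchases[e.customer].append(e)
    out = []
    for cust, ps in purchases.items():
        ps.sort(key=lambda e: e.ts)
        for i in range(len(ps) - min_purchases + 1):
            group = ps[i:i + min_purchases]
            if group[-1].ts - group[0].ts <= window:
                out.append(Event("Reclassify-Customer", cust, group[-1].ts, 1, tuple(group)))
                break   # one reclassification per customer is enough here
    return out

def segment_shift(reclassifications, min_customers=2):
    """Level 1 -> level 2: a hypothetical Segment-Shift event abstracting many
    Reclassify-Customer events."""
    if len(reclassifications) < min_customers:
        return []
    ts = max(e.ts for e in reclassifications)
    return [Event("Segment-Shift", "*", ts, 2, tuple(reclassifications))]

low = ([Event("Purchase", "alice", t) for t in (0, 100, 200)]
       + [Event("Purchase", "bob", t) for t in (0, 5000, 10000)]
       + [Event("Search-Catalogue", "carol", 5)]
       + [Event("Purchase", "carol", t) for t in (10, 20, 30)])
level1 = reclassify_customers(low, window=3600)
level2 = segment_shift(level1)
print([e.customer for e in level1])              # ['alice', 'carol']
print([m.customer for m in level2[0].members])   # ['alice', 'carol']
```

The `members` field is what makes the hierarchy auditable: from the level-2 event one can drill down to the individual Purchase events that caused it.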
New horizons lead to new tools, and more powerful event processing tools are beginning to combine ESP and CEP. The IT community, including data analytics teams, application architects, process modelers, and project leaders, needs to be persuaded that stream analytics has potential in new, commercially attractive application areas. That will involve proof-of-concept work.
The early adopters were people who already knew they needed real-time event processing, and their problems were usually well formulated. But new stream applications are appearing all the time, in areas involving the Internet of Things (IoT), eCommerce, customer engagement and so on. There are many more places where stream analytics could be employed to improve the effectiveness and speed of event processing.
For example, CEP will be needed in autonomic systems. Many enterprises now have huge warehouses of servers that are very expensive to maintain with human labor. There is a big effort under way in autonomic computing to automate the operation, self-diagnosis and repair of large systems of machines. We believe it can easily be demonstrated that CEP must be a basic part of maintaining autonomic systems, even though current solution builders may not realize that CEP is what they are trying to implement!
Further out on the horizon there’s a wealth of new possibilities. There are huge clouds of events from multiple disparate sources in Homeland Security[2], Epidemiology and Pandemic Prediction, Transportation Networks, Smart Cities, Global Warming, and the Environment, to name just six areas.
We have already described how stream analytics and CEP can be applied to challenging problems in these areas. For example, Homeland Security can use telephone surveillance data to establish causal relationships between bank transfer events on SWIFT networks, and thereby detect sets of transfers that may indicate money laundering. In another example, even before the arrival of COVID-19, experiments were undertaken to make early predictions of flu outbreaks by monitoring events generated by cell phone conversations, over-the-counter medication sales, tweets, and other activity on social media. By combining stream analytics with hierarchical event abstraction in CEP, earlier warnings of future epidemic outbreaks are likely to be possible. Indeed, automating contact tracing for COVID-19 will involve a lot of CEP.
For real-time environmental monitoring, the world itself has already become an event-generating globe, with sensors and probes for deep-ocean pressure monitoring for tsunami warning, fault monitoring for earthquake studies, forestry monitoring, etc. All of these events are increasingly available via satellite and other channels to organizations such as NOAA. CEP makes it possible to detect more intricate and subtle patterns, involving multiple kinds of event data and longer time windows, thus improving the accuracy and effectiveness of both early warning and long-term monitoring systems.
Validation and Correctness
The kinds of event processing tools being built now are playing an increasingly important role in our lives, so correctness is going to become a growing concern, especially when we are forced to do what they tell us to do!
At the moment there’s a lot of hype among the event stream processing vendors about event throughput, speed of processing and so on, but not enough attention is paid to whether the tool detects event patterns or answers streaming queries correctly. Modern ESP platforms are getting better at some aspects of correctness, particularly in their handling of out-of-order events and late arriving events by distinguishing between event time and arrival time. However, we are not aware of any work to attack the problem of validating applications to ensure that they do what they are supposed to do. In fact, the semantics of event processing operations are usually not precisely defined.
Of course, bugs are a way of life with software, so why not here too? Well, event processing is on the cutting edge of software complexity since it is essentially real-time distributed computing. Moreover, event data is notoriously poor in quality, particularly when coming from sensors in physical devices or transmitted over noisy and unreliable networks. In fact, POTUS has recently (May 2020) questioned the validity of the COVID-19 monitoring data. In the future, developers must validate their applications and present validations for public scrutiny.
In an emerging field it is important not to cause business losses for customers through incorrect applications. Simple stream analytics applications, such as counting tweets, are generally not problematic, but correctness will become a bigger issue for future event processing applications. A first step towards tackling this is to specify precisely the event processing being done by a tool. And maybe we ought to be seeing that done now!
Another challenge is the management of large sets of event processing rules. Most event processing tools work by applying sets of rules defined for each problem. At present these sets of rules are pretty simple. But they’re written in languages of the moment such as stream SQL, Java, Scala, Python, or domain-specific event processing scripting languages, all of which get pretty incomprehensible after one or two pages of rules.
What is going to happen when we get applications involving many thousands of rules? If you look at credit card fraud detectors as an example of large rule sets, you’ll see redundancies and inconsistencies in the rules every few pages. There are some sophisticated (non-streaming) business rule management systems (BRMS) and decision management suites (DMS) with rule-checking utilities that can detect missing, overlapping or logically conflicting rules. It might be possible to develop analogous utilities to deal with rules in stream analytics and CEP systems.
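As a hint of what such a utility might check, here is a deliberately small sketch for one hypothetical rule shape: each rule fires when a transaction amount falls in a half-open interval and then takes an action. `Rule` and `check_rules` are illustrative names, and real BRMS checkers handle far richer, multi-attribute conditions. The sketch reports overlapping rules with the same action (possibly redundant), overlapping rules with different actions (conflicting), and ranges no rule covers (missing).

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Rule:
    name: str
    low: float     # fires when low <= amount < high (hypothetical rule shape)
    high: float
    action: str

def check_rules(rules, domain_low, domain_high):
    """Return (overlapping pairs with the same action, i.e. possibly redundant;
    overlapping pairs with different actions, i.e. conflicting;
    gaps in [domain_low, domain_high) that no rule covers)."""
    redundant, conflicting = [], []
    for a, b in combinations(rules, 2):
        if a.low < b.high and b.low < a.high:    # half-open intervals intersect
            (redundant if a.action == b.action else conflicting).append((a.name, b.name))
    gaps, cursor = [], domain_low
    for r in sorted(rules, key=lambda r: r.low):
        if cursor >= domain_high:
            break
        if r.low > cursor:
            gaps.append((cursor, min(r.low, domain_high)))
        cursor = max(cursor, r.high)
    if cursor < domain_high:
        gaps.append((cursor, domain_high))
    return redundant, conflicting, gaps

rules = [Rule("small-approve", 0, 1000, "approve"),
         Rule("mid-review", 500, 5000, "review"),
         Rule("large-review", 4000, 6000, "review"),
         Rule("huge-block", 8000, 10000, "block")]
print(check_rules(rules, 0, 10000))
# ([('mid-review', 'large-review')], [('small-approve', 'mid-review')], [(6000, 8000)])
```

The pairwise comparison is quadratic in the number of rules; for rule sets in the thousands, a sweep over sorted interval endpoints would be the natural refinement.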
In event processing we need high-level graphical authoring tools. Rule languages that are easier to understand are a first step in rules management. We also need to understand the impact of adding new rules and how they affect existing rules. Rule management is about:
- writing correct rules that say what you mean,
- organizing rule sets for efficient execution, so the rules engine tries only the rules that might apply at any time,
- ensuring logical consistency and absence of redundancies,
- managing rules by using tools that support versioning, deployment and, when necessary, rollback to a previous version.
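Two items on this list, efficient organization and versioning with rollback, can be illustrated together. The sketch below is hypothetical: it models a rule as a Python predicate over an event dictionary, indexes rules by event type so that only rules that might apply are tried, and keeps every deployment as a separate version that is never modified afterwards, so any earlier version can be reactivated.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]   # hypothetical: a rule is a predicate over an event dict

class RuleRegistry:
    """Rules are indexed by the event type they apply to, so only rules that
    might apply are tried; every deployment creates a new version that is not
    modified afterwards, and any earlier version can be reactivated (rollback)."""

    def __init__(self):
        self.versions: List[Dict[str, Dict[str, Rule]]] = [{}]   # version 0: no rules
        self.active = 0

    def deploy(self, event_type: str, name: str, rule: Rule) -> int:
        """Add or replace a rule in a copy of the active version; return the new version number."""
        new = {et: dict(rs) for et, rs in self.versions[self.active].items()}
        new.setdefault(event_type, {})[name] = rule
        self.versions.append(new)
        self.active = len(self.versions) - 1
        return self.active

    def rollback(self, version: int) -> None:
        """Make an earlier (or later) stored version the active one."""
        if not 0 <= version < len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.active = version

    def evaluate(self, event: dict) -> List[str]:
        """Try only the rules indexed under the event's type; return the names that fire."""
        rules = self.versions[self.active].get(event["type"], {})
        return sorted(name for name, rule in rules.items() if rule(event))

reg = RuleRegistry()
v1 = reg.deploy("Purchase", "big-order", lambda e: e["amount"] > 1000)
v2 = reg.deploy("Purchase", "big-order", lambda e: e["amount"] > 500)   # revised threshold
print(reg.evaluate({"type": "Purchase", "amount": 800}))   # ['big-order']
reg.rollback(v1)
print(reg.evaluate({"type": "Purchase", "amount": 800}))   # []
```

Copying the active version on every deployment is simple but memory-hungry; a production tool would more likely store deltas. The other two list items, correctness and consistency, are not addressed by this sketch.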
Standards for Event Processing
At the moment there are no generally accepted standards for event processing. This will become a more pressing concern as the role these applications play in our lives increases.
Unfortunately, standards efforts usually bog down in committees and eventually come up empty-handed. We don’t see standards happening in the near future. There were attempts at defining a standard SQL-based event processing language for stream analytics rules ten years ago, but they fell by the wayside. Apache Beam is a more recent attempt at standardizing the programming model for event stream processing, but it has limited acceptance and little momentum. Roughly half of the 40 or so ESP products on the market today support some dialect of SQL, but the dialects are not consistent. Proprietary rule languages and general-purpose programming languages (Python, Java, Scala, etc.) remain the dominant tools for building stream analytics applications.
Here are some standards that the authors think would be useful at this point:
- basic terminology. The glossary on this web site (www.complexevents.com), based on the work of the now-terminated Event Processing Technical Society, is a good start for this.
- measures of event processing performance. Early work on this was performed a decade ago but we have not seen anything beyond vendor benchmarks in recent years.
- definitions of event-driven architecture (EDA). These remain an area of debate, with some general agreement on the principles but no consensus on some important details (a topic for another day). Papers are still being written on how microservices and SOA relate to EDA, a perennial debate.
Event processing has come a long way since the academic research in the 1990s and early 2000s. ESP platform products that support ESP and many of the key pattern-detection aspects of CEP are in widespread commercial use. But industrial exploitation of stream analytics is still immature. Today’s products can be leveraged to improve many more aspects of business, and tomorrow’s products will be applied to even more-demanding business problems. As this trend continues, we expect that ESP tools will implement both event timing and causality and continue to trend towards including a full set of CEP capabilities.