June 12th 2019
Some of you may be wondering why there are two flavors of event processing, event stream processing (ESP) and complex event processing (CEP). Well, I wrote the original version of this article about 13 years ago. And of course, the ESP tools changed as time passed.
A Mr. Robert Mitchell phoned me one day in 2006 for an article he was writing on ESP for ComputerWorld. He wanted to ask a few questions for a sidebar to his main article. We spoke for an hour. A week later, one of my correspondents sent me two links to ComputerWorld. What I read was not exactly what I said or intended to say. To be fair to Mr. Mitchell, it is difficult to capture the true content of an hour’s conversation in an article of 400 words and another of 200. However, Mr. Mitchell’s questions were certainly relevant, so I wrote a more complete version of my answers at that time.
I came back to this article in 2019 because it turned out to be one of the most read articles on the CEP website. On re-reading it, in conjunction with my colleague, Roy Schulte, it seemed hardly out of date at all. ESP has progressed as predicted in 2006. So, I decided to update this article and republish it. Finally, once again, my thanks to Mr. Mitchell for provoking me to finish the 2006 article!
What problems was CEP designed to solve?
CEP was developed at Stanford University between 1989 and 1995 to analyze event-driven simulations of distributed system architectures. That’s over twenty-five years ago, now! Here’s how it came about.
We started by designing a new event-driven modeling and simulation language called Rapide, for modeling the event activity in a distributed event-driven system. Events in a distributed system can happen independently of one another; they can happen at the same time or at different times; or they can happen in sequence, in which one causes another. Consequently, Rapide had to model not only the timing of events as they were created but also their causal relationships or their independence. Not only that, but we needed to be able to model multi-layered architectures[1]. So Rapide had to capture levels of events, with timing and membership relations between the events at different levels. Our first target was a hardware design for a new chip that involved three levels: the instruction set level, the register transfer level, and the hardware gate level. This hierarchy is standard in hardware design, and it is well understood how to define relationships between events at the different levels.
As new target architectures arose, we needed to model dynamic architectures such as air traffic control systems, where components (e.g., aircraft) could enter or leave the system, or the event communication between components could change over time. Typical examples of hardware systems were versions of the Sun SPARC CPU designs, while examples of software systems included military command and control systems such as the Aegis cruiser radar control architecture, and on the civilian side, telco protocols, automated robotic manufacturing systems, electronic markets, and air traffic control.
When you simulated a model written in Rapide, what you got as output was not the usual time-ordered stream of events produced by the event-driven simulators of the 1990s such as Verilog or VHDL. You got a cloud of events, partially ordered by time and causality in three dimensions, that is, horizontally within each design level (abstraction level), and also vertically across levels. Such a partially ordered event cloud is called a poset (partially ordered set of events).
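The poset idea can be illustrated with a small sketch (the names and fields are illustrative, not Rapide notation). Each event records its creation time and the set of events that directly caused it; two events are independent when neither is reachable from the other along causal edges:

```python
# A minimal sketch of a partially ordered set (poset) of events.
# Event names and fields are illustrative, not Rapide syntax.

class Event:
    def __init__(self, name, time, causes=()):
        self.name = name           # label of the event
        self.time = time           # creation timestamp
        self.causes = set(causes)  # events that directly caused this one

def causally_precedes(a, b):
    """True if event a is an ancestor of b along causal edges."""
    frontier = list(b.causes)
    while frontier:
        e = frontier.pop()
        if e is a:
            return True
        frontier.extend(e.causes)
    return False

def independent(a, b):
    """Events are independent when neither caused the other."""
    return not causally_precedes(a, b) and not causally_precedes(b, a)

# Two independent causes, one joint effect: A and B both cause C.
A = Event("A", time=1)
B = Event("B", time=1)        # same time as A, but causally unrelated to it
C = Event("C", time=2, causes=[A, B])
```

Note that A and B share a timestamp yet are independent, while C is ordered after both of them by causality, not merely by time; that is the partial order a time-ordered stream cannot express.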
In addition, we developed a set of event processing principles and techniques for analyzing posets to find out what was happening in a simulation. The construction of hierarchies of events is one of the more sophisticated CEP analysis techniques. Errors that need to be caught might include anything from wrong pin connections in low level hardware designs, to incorrect synchronization among high level distributed communicating processes, or violations of critical design constraints. Design constraints were defined in Rapide by event patterns. We called our set of principles and techniques Complex Event Processing (CEP). And we built a set of analysis tools based on CEP to aid in the analysis of posets. This included graphical representation of posets, a pattern matcher for detecting matches of event patterns in real time in a simulation output, and tools for defining hierarchies of events to support high level analysis.
By 1995 there were several published papers on the Rapide language, simulator and the CEP analysis tools. The system was freely available from Stanford and has been downloaded worldwide and used by researchers to analyze various systems, including industry standards in manufacturing and telecommunications. We were ready to go commercial. But getting into the game of selling a new computer language is always tough. I had already experienced that with the Stanford Ada compiler and the founding of Rational Software in 1980. It seemed to me there was another route.
You could apply CEP tools to analyzing events that were created in any kind of event driven system. So we decoupled the analysis tools from the simulator and started applying them to the commercial message-oriented middleware that had grown up by that time. That was how the CEP toolset was developed for analyzing events as they arrived (that is, while they were “in motion”) in any event-based real-time system. CEP lets you define the design constraints of your system as event patterns and monitor your system’s output in real time for violations of those constraints. Furthermore, you could do that at every level in a multi-level system.
How is ESP different from CEP?
The differences arise from the problem you’re dealing with. If your problem involves analyzing a stream of events then you use ESP; if on the other hand you need to analyze a cloud of events then you use CEP. That’s a little simplistic, so let’s go into more detail.
First of all, ESP is evolving, and its roots were different from those of CEP. While CEP was being developed there was a parallel research effort going on in real-time event data analysis. This started in the mid-1990s when the database community realized that databases were too slow to do real-time data analysis. They started researching the idea of running continuous queries on streams of incoming data. They used sliding time windows to speed up the queries. An answer to a query would be valid only over the events in the current time window, but as the window slid forward with time so also the answer was updated to include the new events and exclude the old ones. This research was called Data Streams Management (DSM) and led to the event stream processing world of today. The emphasis was on processing the data in lots of events in real-time. Interestingly, since the 1990s databases have become much faster, but at the same time the volume of events has also increased, so that modern databases still cannot always keep up with current event stream inputs. There are now over 40 commercial and open-source ESP products that provide simple kinds of analytics for real-time event stream processing; see Trends in ESP.
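A continuous query over a sliding time window can be sketched in a few lines (a simplified illustration; real DSM and ESP engines compile such queries and handle far higher volumes). As each event arrives, events that have slid out of the window are evicted and the answer is updated incrementally:

```python
from collections import deque

class SlidingAverage:
    """Continuous average over a sliding time window (simplified sketch)."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def on_event(self, timestamp, value):
        # Admit the new event, then evict everything outside the window.
        self.events.append((timestamp, value))
        self.total += value
        while self.events and self.events[0][0] <= timestamp - self.window:
            _old_ts, old_val = self.events.popleft()
            self.total -= old_val
        return self.total / len(self.events)  # current answer of the query

avg = SlidingAverage(window_seconds=30)
avg.on_event(0, 10.0)    # -> 10.0
avg.on_event(10, 20.0)   # -> 15.0
avg.on_event(35, 30.0)   # -> 25.0; the t=0 event has left the window
```

The answer is always valid only over the events currently in the window, exactly as the continuous-query research described.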
Secondly, there’s a fundamental difference between a stream and a cloud. An event stream is a sequence of events ordered by time, such as a stock market feed. An event cloud is the result of many event generating activities going on at different places in an IT system. A cloud might contain many streams. A stream is a special case of a cloud. But the assumption that you are processing a stream of events in their order of arrival has advantages. It lets you design algorithms for processing the data in the events that use very little memory because they don’t have to remember many events. ESP algorithms can be very fast. They compute on events in the stream as they arrive, pass on the results to the next computation and forget those events. More recently, however, some ESP systems have gained the ability to deal with simple cases of out-of-order events by processing on the basis of event time (when the event occurred, according to the event’s time stamp) rather than processing time (when the ESP software performs the calculation, which may be when the event arrives). This capability began appearing in ESP products around 2015. Such modern ESP systems thus have some similarity with the earliest work on CEP.
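The event-time idea can be sketched as follows (a simplified illustration, not any particular product’s API): events are bucketed by the timestamp they carry, so a late arrival still lands in the window in which it occurred rather than the window in which it happened to arrive:

```python
from collections import defaultdict

def count_per_window(events, window_seconds):
    """Count events per event-time window, tolerating out-of-order arrival.

    Each event is (event_time, payload); arrival order is irrelevant
    because we bucket on the timestamp the event carries.
    """
    counts = defaultdict(int)
    for event_time, _payload in events:
        window_start = (event_time // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# Events arrive out of order: the t=3 event shows up last.
arrived = [(11, "b"), (14, "c"), (3, "a")]
count_per_window(arrived, window_seconds=10)
# -> {10: 2, 0: 1}; the late t=3 event is still counted in window [0, 10)
```

A processing-time system would have credited all three events to the window in which they arrived; the price of event-time correctness is that results for a window cannot be finalized until late events have had a chance to show up, which is the latency trade-off discussed later in this article.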
On the other hand, if you’re processing a cloud with CEP, you cannot assume that events arrive in a nice order. You need to deal with event time not processing time. You may be searching for instances of a pattern of events in which, say, A and B together cause C, but when you run the search, the event C actually arrives at your observation point before either of A or B. Then you must remember C while continuing to search for an A and B that caused it and will complete a match of the pattern. The events A, B, C could be the actions and responses of several processes in a management protocol that are supposed to synchronize and execute a transaction, but sometimes fail. You may have to remember lots of events before you find the ones you’re looking for that signify a completed transaction. In such a case it is critical to know which events caused which others. This takes more memory and more time! It requires a causal reference model for how events are created in the system that is being analyzed. Referring to this model as events arrive to check for the pattern in which A and B cause C takes time. On the plus side, you can deal with a richer set of problems, not only event data processing at the level of the incoming events, but also for example, the correct or incorrect behavior of hierarchically structured sets of processes in business process management.
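A minimal sketch of such a matcher (a hypothetical structure, not a real CEP engine’s API): each event carries the ids of the events that caused it, and an effect that arrives before its causes is remembered until the pattern “A and B cause C” completes:

```python
# Sketch: match the pattern "A and B together cause C", even when C
# arrives at the observation point before A or B. Events are dicts with
# an id, a type, and the ids of their direct causes (illustrative format).

def match_a_b_cause_c(arriving_events):
    seen = {}        # id -> event, remembered until the pattern resolves
    matched = set()  # ids of C events already reported
    matches = []
    for ev in arriving_events:
        seen[ev["id"]] = ev
        # A new arrival may complete any remembered C, including itself.
        for pending in list(seen.values()):
            if pending["type"] != "C" or pending["id"] in matched:
                continue
            causes = [seen[i] for i in pending["causes"] if i in seen]
            if sorted(e["type"] for e in causes) == ["A", "B"]:
                matched.add(pending["id"])
                matches.append((pending["id"], pending["causes"]))
    return matches

# C arrives first; its causes A and B arrive later.
stream = [
    {"id": 3, "type": "C", "causes": [1, 2]},
    {"id": 1, "type": "A", "causes": []},
    {"id": 2, "type": "B", "causes": []},
]
match_a_b_cause_c(stream)   # -> [(3, [1, 2])]
```

The `seen` buffer is exactly the extra memory the text describes: the matcher cannot forget C until its causes arrive, and the causal links on each event are the reference model it consults.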
The use of event hierarchies in the CEP analysis of event driven systems is unique. A CEP analyzer may create new events when patterns match the incoming events. These new events are viewed as being at a higher level than the input. Their purpose is usually to abstract patterns of input events that signify something of importance has occurred in the target system. By applying event pattern analysis to the higher level events a CEP analyzer can create a hierarchy of events, the higher levels of which contain events that are more humanly understandable than the lower level input. For example, in analyzing a simulation of a chip design at the gate level many thousands of gate level events are created. E.g., “an output signal from gate 32 was transmitted to the input of gate 64.” This event, and many others in the simulation, might be abstracted upwards to the register transfer level, and again to the instruction level, leading finally to the creation of a single more understandable event such as “Add instruction completes”. This would tell a human user whether or not the gate level simulation is behaving as intended.
ESP is focused more on high-speed querying of data in streams of events and applying mathematical algorithms to the event data. Some of the first commercial applications, such as algorithmic trading, were related to trading systems in financial markets. CEP is focused more on extracting information from clouds of events created in enterprise IT and business systems. CEP includes event data analysis, but places emphasis on patterns of events, and abstracting and simplifying information in the patterns. The idea is to support as wide an area of enterprise management decision making as possible. The first commercial applications of CEP were in Business Activity Monitoring, for example monitoring conformance to service level agreements.
In summary, ESP and CEP are both approaches to event processing. Both literally generate complex events. At first sight, ESP is a subset of CEP and the difference boils down to special-purpose versus general-purpose event processing, but that distinction is not as true now as it used to be, as we shall see next.
How Do Applications of ESP and CEP Products Differ?
Let’s start with the present. The majority of applications built with ESP products seem to focus on aggregates such as count, sum, average, maximum, minimum, top-k or bottom-k. For example, they may be used to count the number of tweets on a certain subject in a time window; or report the temperature of a machine by averaging many individual sensor readings over a 30 second time window. These aggregation-based event hierarchies are much simpler[2] than those found in the original CEP applications that used pattern detection (often combined with aggregation). With the appropriate programming, ESP products can be used to correlate events from different streams, detect absent events (those that don’t occur within a time window), search for Boolean combinations such as A AND B or A OR B, or even detect more complicated patterns. But they do not use horizontal causality (event A caused event B on the same abstraction layer) or independence (A happened independently of B), perhaps because the current set of commercial applications for ESP is not used for diagnosing complicated scenarios and typically doesn’t require complex event patterns.
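Absent-event detection of the kind mentioned above can be sketched like this (a simplified illustration with an invented request/response schema; product APIs differ): given a request event, raise an alert if no matching response event occurs within the time window:

```python
def find_missing_responses(events, timeout):
    """Report request ids that got no response within `timeout` seconds.

    `events` is a time-ordered list of (timestamp, kind, request_id),
    where kind is "request" or "response" (illustrative schema).
    """
    pending = {}   # request_id -> request timestamp
    missing = []
    for ts, kind, req_id in events:
        # Any pending request older than the timeout is now known absent.
        for rid, started in list(pending.items()):
            if ts - started > timeout:
                missing.append(rid)
                del pending[rid]
        if kind == "request":
            pending[req_id] = ts
        elif kind == "response":
            pending.pop(req_id, None)
    return missing

events = [
    (0, "request", "r1"),
    (1, "request", "r2"),
    (3, "response", "r1"),   # r1 answered in time
    (20, "request", "r3"),   # by now r2 has silently timed out
    (22, "response", "r3"),
]
find_missing_responses(events, timeout=10)   # -> ["r2"]
```

Notice that the alert is triggered by the *non-occurrence* of an event, which is why a time window is essential: absence is only meaningful relative to a deadline.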
There are lots of problem areas where you have to look at more than just the data in the stream of the events. In addition to event timing you need to detect which events caused other events, and which events happened independently. This would be the case in any area where enterprise operations need to work in harmony. For example, when business processes in eCommerce don’t synchronize when they should you get silly things happening. One process puts a set of products on sale while another process applies a promotional discount to the same products, resulting in lowering prices twice. To fix this problem it is not enough to detect that prices have been lowered twice on a product. We need to detect when the processes are not communicating as they should. Another example is a set of trading processes in an electronic auction that keep timing out instead of matching a required percentage of bids and offers. We need to find out where they are deviating from the expected pattern of behavior, and that pattern will be complex. When you get into these issues of keeping enterprise operations on track, you must detect as quickly as possible when events are not related as they should be and make adjustments to the operations. This requires CEP to detect patterns of events that signify the problem.
How will the role of stream processing tools evolve in the future?
ESP tools are now involved in applications beyond the algorithmic trading area where, by the way, their principal competition has been in-house custom coded systems. The underlying engineering of ESP systems makes them much more easily scalable than custom coded systems. So, they are attractive to customers who can’t predict in advance how much their event processing needs will grow and change over time. Moreover, ESP products supply the infrastructure for computing moving time windows that would otherwise have to be manually developed in a custom stream processing application.
The applications of ESP will become more sophisticated. As this happens, ESP will be extended to include more and more elements of the original CEP. I’ve had conversations with ESP technologists about this, and some of them certainly know how to add event causality to their event patterns when applications demand it. Of course, when they do this, some of the blazing event processing throughput numbers that are being quoted will decrease a bit. The emerging ESP platforms that use event time and deal with out-of-order processing necessarily produce results with higher latency than those that assume that events are in order and then generate results from events immediately as they arrive. ESP is (partially) merging with CEP to the benefit of all.
[1] In this context, a layered architecture refers to abstraction layers where events on one abstraction layer are related to events on another layer in an event hierarchy. For example, packets in a network are at a low layer of abstraction. A particular set of packets may together comprise one message. The message is a complex event, and it is at a higher level of abstraction than the packets that are members of the complex event. The relationship between the packets and the message is one of vertical causation because the packets are causal to an event (message) on a higher level of abstraction. Packets relate to other packets as peers and operate in different ways than messages relate to other messages. In this case, the vertical causation is based on a pattern – all of the packets related to a particular message can be correlated because they share a common message identifier. Other kinds of abstraction layers are found in many places in computing and other domains.
[2] Here is an example of an event hierarchy with vertical causality based on the simple notion of aggregation, rather than the more complex notion of patterns. It starts with a set of five events that report five purchases that occur between 2 and 3 PM, constituting the lowest layer in a reporting hierarchy. The five purchase events are summarized as one complex event that records a count (5 purchases) and sum (the total amount of the purchases that took place within the time window between 2 and 3 PM). The purchase events are members of the complex event called hourly sales. The complex event is on a higher layer of abstraction and would be used for different purposes, such as management reporting or alerting, than the member events, which might be kept for subsequent drill down in case someone needed to understand the details for some business purpose. Vertical causation here is based on aggregate calculations rather than pattern detection. There is no horizontal causation in this application (we stipulate that none of the purchase events caused another purchase event).
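This footnote’s example can be sketched directly (field names are illustrative): five purchase events are aggregated into one higher-level “hourly sales” complex event that records the count and the sum, while keeping its member events for drill-down:

```python
def hourly_sales(purchases, hour_start, hour_end):
    """Abstract the purchase events in [hour_start, hour_end) into one
    complex event by aggregation (vertical causality via aggregation).

    Each purchase is a dict with "time" and "amount" (illustrative schema).
    """
    members = [p for p in purchases
               if hour_start <= p["time"] < hour_end]
    return {
        "type": "hourly_sales",          # the higher-level complex event
        "window": (hour_start, hour_end),
        "count": len(members),           # e.g., 5 purchases
        "sum": sum(p["amount"] for p in members),
        "members": members,              # kept for subsequent drill-down
    }

# Five purchases between 2 PM (hour 14) and 3 PM (hour 15).
purchases = [{"time": 14.1, "amount": 20.0},
             {"time": 14.2, "amount": 35.0},
             {"time": 14.4, "amount": 10.0},
             {"time": 14.7, "amount": 50.0},
             {"time": 14.9, "amount": 15.0}]
summary = hourly_sales(purchases, hour_start=14, hour_end=15)
# summary["count"] == 5, summary["sum"] == 130.0
```

The member list makes the vertical relationship explicit: the five purchase events are members of, and causal to, the single hourly-sales event one layer above them.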