by David Luckham
Some of you may be wondering why there are two flavors of event processing, ESP and CEP. Well, I’ve been writing various versions of an article about this for the past eighteen months. And of course, the ESP tools have been changing as I dallied. A Mr. Robert Mitchell phoned me one day not so long ago about an article he was writing on ESP for Computerworld. He wanted to ask a few questions for a sidebar to his main article. We spoke for an hour. Last week one of my correspondents sent me two links to Computerworld. What I read was not exactly what I said or intended to say. To be fair to Mr. Mitchell, it is difficult to capture the true content of an hour’s conversation in an article of 400 words and another of 200. However, Mr. Mitchell’s questions were certainly relevant, so here is a more complete version of my answers. Oh, and my thanks to Mr. Mitchell for provoking me to finish this article!
What problem was CEP designed to solve? CEP was developed between 1989 and 1995 to analyze event-driven simulations of distributed system architectures. Here’s how it came about.
We started by designing a new event-driven modeling and simulation language, called Rapide, for modeling the event activity in a distributed system.1 Events in a distributed system can happen independently, at the same time or at different times, or they can happen in sequence, one causing another. Consequently, Rapide had to model not only the timing of events as they were created but also their causal relationships or their independence. Not only that, but we needed to be able to model multi-layered architectures, so we had to capture levels of events, with timing and membership relations between the events at different levels. We also needed to model dynamic architectures, such as air traffic control, where components could enter or leave the system, or the event communication between components could change over time. Typical target examples of hardware systems were versions of the Sun SPARC CPU designs, while examples of software systems included military command and control systems such as the Aegis cruiser radar control architecture, and on the civilian side, telco protocols, automated robotic manufacturing systems, electronic markets, and air traffic control.
When you simulated a model written in Rapide, what you got as output was not the usual time-ordered stream of events produced by the event-driven simulators of the 1990s such as Verilog or VHDL. You got a cloud of events, partially ordered by time and causality in two dimensions: horizontally within each design level, and vertically across levels. Such a partially ordered event cloud is called a poset (partially ordered set of events).
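To make the poset idea concrete, here is a minimal sketch in Python (the event fields, the tiny example, and the function names are my own illustration, not Rapide syntax): each event carries a creation time, causal edges record which events caused which others, and two events with no causal path between them in either direction are independent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str    # e.g. "Send", "Receive" (illustrative names)
    level: str   # design level, e.g. "protocol" or "gate"
    time: int    # creation time; independent events may share a time

# Causal edges: (cause, effect) pairs observed during a simulation.
# The poset is the set of events plus the transitive closure of these edges.
events = [
    Event("Send", "protocol", 1),
    Event("Receive", "protocol", 3),
    Event("Timeout", "protocol", 3),   # same time as Receive, but unrelated
]
causes = {(events[0], events[1])}      # Send caused Receive

def causally_precedes(a: Event, b: Event) -> bool:
    """True if a chain of causal edges leads from a to b."""
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        if x == b:
            return True
        if x in seen:
            continue
        seen.add(x)
        frontier.extend(effect for cause, effect in causes if cause == x)
    return False

def independent(a: Event, b: Event) -> bool:
    """True if neither event causally precedes the other."""
    return not causally_precedes(a, b) and not causally_precedes(b, a)
```

Note that time ordering alone cannot distinguish the Receive and Timeout events here; only the causal edges tell you that Send produced one and not the other.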
So we developed a set of event processing principles and techniques for analyzing posets to find out what was happening in a simulation. Errors might include anything from wrong pin connections in low level hardware designs, to incorrect synchronization among distributed communicating processes, or violations of critical design constraints. This set of principles and techniques was called complex event processing (CEP). And we built a set of analysis tools based on CEP.
By 1995 there were several published papers on the Rapide language, simulator and analysis tools. The system was freely available from Stanford and has been downloaded worldwide and used by researchers to analyze various systems, including industry standards in manufacturing and telecommunications. We were ready to go commercial.
But getting into the game of selling a new computer language is always tough; I had already experienced that with the Stanford Ada compiler and the founding of Rational Software in 1980. It seemed to me there was another route. You could apply CEP tools to analyzing events that were created in any kind of event-driven system. So we decoupled the tools from the simulator and started applying them to the commercial middleware that had grown up by that time.
How is event stream processing (ESP) different from CEP?
First of all, ESP is evolving, and in the beginning the roots of ESP were different. While CEP was being developed, there was a parallel research effort going on in real-time event data analysis. This started in the mid-1990s, when the database community realized that databases were too slow to do real-time data analysis. They started researching the idea of running continuous queries on streams of incoming data, using sliding time windows to speed up the queries. An answer to a query would be valid only over the events in the current time window, but as the window slid forward with time, the answer was updated to include the new events and exclude the old ones. This research was called Data Streams Management (DSM) and led to the event stream processing world of today. The emphasis was on processing the data in lots of events in real time.
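A minimal sketch of a continuous query over a sliding time window, in Python (the ten-second window and the running-average query are illustrative choices of mine, not taken from any particular DSM system):

```python
from collections import deque

class SlidingWindowAverage:
    """Continuous query: average price over the last `window_seconds`.

    As new events arrive, old events slide out of the window and the
    answer is updated incrementally, never recomputed from scratch.
    """
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()   # (timestamp, price) pairs in arrival order
        self.total = 0.0

    def on_event(self, timestamp: float, price: float) -> float:
        # Evict events that have slid out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_price = self.events.popleft()
            self.total -= old_price
        self.events.append((timestamp, price))
        self.total += price
        return self.total / len(self.events)  # current answer of the query

q = SlidingWindowAverage(window_seconds=10.0)
q.on_event(0.0, 100.0)         # -> 100.0
q.on_event(5.0, 110.0)         # -> 105.0
print(q.on_event(12.0, 90.0))  # first event slid out -> 100.0
```

The answer is maintained incrementally: each arriving event costs a small, roughly constant amount of work, and events outside the window are forgotten, which is why such queries can keep up with high-rate streams.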
There’s a fundamental difference between a stream and a cloud. An event stream is a sequence of events ordered by time, such as a stock-market feed. An event cloud is the result of many event-generating activities going on at different places in an IT system. A cloud might contain many streams; a stream is a special case of a cloud. But the assumption that you are processing a stream of events in their order of arrival has advantages. It lets you design algorithms for processing the data in the events that use very little memory, because they don’t have to remember many events. The algorithms can be very fast. They compute on events in the stream as they arrive, pass on the results to the next computation, and forget those events. On the other hand, if you’re processing a cloud, you can’t assume that events arrive in a nice order. You may be looking for sets of events that have a complex relationship, for example, events that should be causally related but are actually independent because of an error. They could be the actions and responses of several processes in a management protocol that are supposed to synchronize and execute a transaction, but sometimes fail. You may have to remember lots of events before you find the ones you’re looking for. In such a case it is critical to know which events caused which others. This takes more memory and more time! On the plus side, you can deal with a richer set of problems: not only event data processing, but also business process management, for example.
Event stream processing is focused more on high-speed querying of data in streams of events and applying mathematical algorithms to the event data. Some of the first commercial applications were to stock-market feeds in financial systems and algorithmic trading. CEP is focused more on extracting information from clouds of events created in enterprise IT and business systems. CEP includes event data analysis, but places emphasis on patterns of events, and on abstracting and simplifying the information in those patterns. The idea is to support as wide an area of enterprise management decision making as possible. The first commercial applications were in Business Activity Monitoring, for example, monitoring conformance to service-level agreements.
So ESP and CEP are both approaches to event processing. At first sight ESP is a subset of CEP, and the difference boils down to special-purpose event processing versus general-purpose, but that is not as true now as it used to be, as we shall see.
What do stream processing products not do well? Let’s start with the present. At the moment very few ESP products use complex patterns of events. Perhaps they use timing between events, or Boolean combinations such as (A and B) or (A or B). But they don’t use causality (A caused B) or independence (A happened independently of B), probably because the current set of commercial applications for ESP don’t require complex patterns.
There are lots of problem areas where you have to look at more than just the data in the stream of events. In addition to event timing, you need to detect which events caused other events, and which events happened independently. This would be the case in any area where enterprise operations need to work in harmony. For example, when business processes in an eRetailer don’t synchronize when they should, you get silly things happening. One process puts a set of products on sale while another process applies a promotional discount to the same products, resulting in lowering prices twice. To fix this problem it is not enough to detect that prices have been lowered twice on a product. We need to detect when the processes are not communicating as they should. Another example is a set of trading processes in an electronic auction that keep timing out instead of matching a required percentage of bids and offers. We need to find out where they are deviating from the expected pattern of behavior, and that pattern will be complex. When you get into these issues of keeping enterprise operations on track, you have to detect as quickly as possible when events are not related as they should be, and make adjustments to the operations.
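As a toy sketch of the eRetailer example (the event names, the log, and the coordinating "sync" event are my own invention): rather than merely flagging a product whose price dropped twice, we flag two price reductions made by different processes with no synchronization event between them, that is, reductions that happened independently when they should have been causally linked.

```python
def find_uncoordinated_discounts(events):
    """events: time-ordered list of (kind, process, product) tuples.

    Flags a product when two different processes each lower its price
    with no ("sync", ...) event in between: the two reductions happened
    independently when a coordination step should have related them.
    """
    last_cut = {}   # product -> (process, number of syncs seen at that cut)
    syncs = 0       # count of sync events seen so far
    problems = []
    for kind, process, product in events:
        if kind == "sync":
            syncs += 1
        elif kind == "lower_price":
            prev = last_cut.get(product)
            if prev and prev[0] != process and prev[1] == syncs:
                problems.append(product)   # no sync between the two cuts
            last_cut[product] = (process, syncs)
    return problems

log = [
    ("lower_price", "sale_process", "widget"),
    ("lower_price", "promo_process", "widget"),  # no sync in between: flagged
    ("sync", "sale_process", None),
    ("lower_price", "sale_process", "gadget"),
    ("sync", "promo_process", None),
    ("lower_price", "promo_process", "gadget"),  # a sync intervened: fine
]
print(find_uncoordinated_discounts(log))  # -> ['widget']
```

The point of the sketch is that the error condition is a *relationship* between events (two cuts not causally connected through a sync), not a property of any single event's data, which is exactly where plain per-event stream queries fall short.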
How will the role of stream processing tools evolve in the future? ESP tools are now involved in applications beyond the algorithmic trading area where, by the way, their principal competition has been in-house custom-coded systems. However, the underlying engineering of ESP systems makes them much more easily scalable than custom-coded systems. So they are attractive to customers who can’t predict in advance how much their event processing needs will grow. The applications of ESP will become more sophisticated, and as this happens ESP will be extended to include more and more CEP. I’ve had conversations with ESP technologists about this, and some of them certainly know how to add event causality to their event patterns when applications demand it. Of course, when they do this, some of the blazing event-processing throughput numbers that are being quoted will decrease a bit. ESP is merging with CEP, to the benefit of all.
What will be the challenges going forward?
New Horizons: The first challenge is to expand the areas to which event processing is being applied. We have to educate the IT community, and a lot of other communities, about its potential, and that will involve a lot of proof-of-concept work. So far, the early adopters have been people who already know they need real-time event processing, and their problems are usually well formulated. It is true that new applications are appearing all the time, in areas involving RFID, eRetailing and so on. But there are other areas where event processing could be applied. Autonomic computing is an example. Many enterprises now have huge warehouses of servers that are very expensive to maintain with human labor. There’s a big effort going on in autonomic computing to automate the self-diagnosis and repair of large systems of machines. We should be able to demonstrate that complex event processing is a basic part of a solution.
Further out on the horizon there’s a wealth of new possibilities. There are huge clouds of events from multiple disparate sources in Homeland Security2, epidemiology, and global warming and the environment, to mention just three areas. We need to demonstrate that event processing can be applied to challenging problems in these areas. For example, could Homeland Security use telephone surveillance data to enhance monitoring of bank transfer events on SWIFT networks for money laundering? As another example, there are some very imaginative experiments going on in medical epidemiology. It turns out you may be able to predict flu outbreaks earlier by monitoring over-the-counter medication sales than by monitoring doctors’ reports. What about analyzing a lot of event sources for early prediction of epidemic outbreaks? And the world itself is becoming an event-generating globe, with sensors and probes for deep-ocean pressure monitoring for tsunami warning, fault monitoring for earthquake studies, forestry monitoring, etc. All of these events are available via satellite. What are the possibilities for event processing here?
Other challenges are technical worries for the near future.
Validation and Correctness: First of all, the correctness issue. I notice that while there’s a lot of hype among the event processing vendors about event throughput, speed of processing and so on, not much attention is paid to whether a tool detects event patterns or answers streaming queries correctly. It seems nobody has the time to actually validate that a tool does exactly what they say it does. In fact, the semantics of the event processing operations are usually not precisely defined. Of course, bugs are a way of life with software, so why not here too? Well, event processing is on the cutting edge of software complexity, since it is essentially distributed computing. And in a new, emerging field it is important not to incur business losses for customers due to incorrect applications. I don’t know of any validation problems at the moment. If there are none, it is probably because the applications involve fairly simple event processing. But correctness may become an issue for event processing tools in the future, say in three to five years. A first step towards tackling this is to specify precisely the event processing being done by a tool. And maybe we ought to be seeing that done now!
Rules Management: Another challenge is the management of large sets of event processing rules. Most event processing tools work by applying sets of rules defined for each application. At present these sets of rules are pretty simple. But they’re written in languages of the moment such as Java, UML, finite state machines, and streaming SQL, which get pretty incomprehensible after several pages. What is going to happen when we get applications involving many thousands of rules? If you look at credit card fraud detectors as an example of large rule sets, you’ll see redundancies and inconsistencies in the rules every few pages. With any complex set of rules, you have a hard time reading and understanding that rule set. So in event processing we need higher-level rule languages that can express rules succinctly, in a way that makes them understandable, maybe graphics-based languages.
So better rule languages are the first step in rules management, because we need to give ourselves a chance to understand the rules we write and, when we add new rules, to understand how they affect the existing rules. Rules management is about (1) writing correct rules, rules that say what you mean; (2) organizing rule sets for efficient execution, so the rules engine tries only the rules that might apply at any time; (3) making changes correctly, which involves knowing how a new rule will interact with existing rules; and of course (4) ensuring logical consistency and absence of redundancies.
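As a toy illustration of point (4), assuming each rule has been reduced to a set of condition atoms plus an action (a drastic simplification of any real rule language), redundancy and conflict checks can be made mechanical:

```python
# Each rule: (name, frozenset of condition atoms, action).
# These rules and atoms are invented for illustration only.
rules = [
    ("r1", frozenset({"price_lowered", "on_sale"}), "alert"),
    ("r2", frozenset({"price_lowered"}), "alert"),   # fires whenever r1 would
    ("r3", frozenset({"price_lowered"}), "ignore"),  # contradicts r2
]

def check(rules):
    """Flag pairs of rules that are redundant or in direct conflict."""
    problems = []
    for i, (n1, c1, a1) in enumerate(rules):
        for n2, c2, a2 in rules[i + 1:]:
            if c1 == c2 and a1 != a2:
                # Same condition, different actions: direct conflict.
                problems.append(f"conflict: {n1} vs {n2}")
            elif c2 <= c1 and a1 == a2:
                # n2 needs fewer conditions for the same action.
                problems.append(f"redundant: {n1} subsumed by {n2}")
            elif c1 <= c2 and a1 == a2:
                problems.append(f"redundant: {n2} subsumed by {n1}")
    return problems

for problem in check(rules):
    print(problem)
```

Even this crude pairwise check is quadratic in the number of rules; with many thousands of rules, and rules that interact through the events they generate rather than through shared condition atoms, the analysis gets much harder, which is exactly why higher-level, analyzable rule languages matter.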
Standards for Event Processing: Obviously, in the course of developing event processing as a field in information technology we will need standards. But defining them and getting consensus are always difficult and time-consuming efforts. Even something as simple as a few pages of basic terminology has taken me and a few colleagues nearly a year. So there is the question of timing. At what point in the development of event processing will standards be beneficial, and which ones?
Here are some standards that I think will be needed early in the game: (1) basic terminology, (2) measures of event processing performance, and (3) definitions of event-driven architecture (EDA). Without the first two it is hard to have any precise sharing of information about tools and applications. The third has been pushed into prominence by the groundswell in services and SOAs. Papers are now being written on how SOA relates to EDA. So we’d better get our ideas about EDA straight before yet another confusion sets in!
There are other standards, to do with levels of events and hierarchies of events, that will be needed in time. Historically, event hierarchies have been necessary in defining industry standards. There are several industry-standard hierarchies in the protocol world that date from the 1960s, and we all know the ISO 7-layer messaging hierarchy. Hierarchies of business events will be needed in event processing, but not for a while. So I’ll leave discussion of this to another day.
Finally, the answer to the title of this article is: there are some differences now, and there will likely be none in the future. And that’s a good thing!
2. See the recent article “DHS Asset Database Can’t Support Vaunted Infrastructure,” Government Computer News (07/12/06), Wilson P. Dizard III.