A Standard Paradigm for Building Event Processing Hierarchies
by David Luckham and Roy Schulte
We have described in previous articles how abstraction hierarchies of business events can be employed to improve the real-time operations of businesses (see, e.g., https://complexevents.com/2021/01/02/abstraction-hierarchies-in-event-stream-processing/).
However, most event processing in industry today uses various kinds of event abstraction without being conscious of that fact. Developers build applications that generate key performance indicators (KPIs) and other metrics without thinking of the higher-level, derived information as complex events. The notions of levels of abstraction and abstraction hierarchies are not recognized. The events, techniques and terminologies being used differ from company to company and between organizations within the same company. This leads to differences between event processing technologies that make collaboration in information sharing within and between companies difficult and costly to implement.
This article describes a paradigm that is proposed as a standard for designing and constructing systems that implement any kind of event abstraction hierarchy. If such a standard, including its associated terminology, is adopted in the implementation of stream processing applications, it will make it easier for developers in different companies to understand each other’s applications. Companies will be able to exchange event information between their systems, and where appropriate, combine their event processing systems. It will also reduce the costs associated with collaboration between enterprises.
The methodology outlined here uses a network of basic building blocks called event processing agents (EPAs). We will first describe EPAs and then show how they are assembled into event processing networks (EPNs) that implement event abstraction hierarchies.
Event Processing Agents. An EPA is a module that processes input event streams according to a specification. The concept of an EPA is taken from modular programming languages. It consists of an interface that specifies how its input events are transformed into its output events, and a body that implements the transformations.
Figure 1 shows the interface of a typical agent. The In Actions are the types of events it will accept as inputs and the Out Actions are the types of events it will send as outputs. The Specification part contains reactive rules that trigger on patterns of input events and result in creating output events. A rule is triggered whenever its input pattern is matched by an incoming set of events, and the rule then creates its output events.
Three main classes of EPAs are used to build an EPN: filters, maps, and constraints.
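The interface described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the notation of any particular tool: the names Event, EPA, receive, three_failures_rule, LoginFailed and SuspiciousLogin are all hypothetical, and a rule is modeled simply as a function that inspects the events received so far and returns any output events it creates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Event:
    type: str
    attrs: Dict[str, object] = field(default_factory=dict)


# A reactive rule: inspects the events seen so far, returns new output events.
Rule = Callable[[List[Event]], List[Event]]


@dataclass
class EPA:
    in_actions: Set[str]    # event types accepted as input
    out_actions: Set[str]   # event types that may be sent as output
    rules: List[Rule]       # the "Specification" part of the interface
    window: List[Event] = field(default_factory=list)

    def receive(self, event: Event) -> List[Event]:
        if event.type not in self.in_actions:
            return []  # not part of this agent's interface, so ignore it
        self.window.append(event)
        outputs: List[Event] = []
        for rule in self.rules:
            outputs.extend(rule(self.window))
        for out in outputs:
            if out.type not in self.out_actions:
                raise ValueError(f"undeclared out action: {out.type}")
        return outputs


def three_failures_rule(window: List[Event]) -> List[Event]:
    """Hypothetical rule: three LoginFailed events trigger one SuspiciousLogin."""
    failures = [e for e in window if e.type == "LoginFailed"]
    if len(failures) < 3:
        return []
    for e in failures:
        window.remove(e)  # consume the matched events so the rule fires once
    return [Event("SuspiciousLogin", {"count": len(failures)})]
```

An agent built as EPA({"LoginFailed"}, {"SuspiciousLogin"}, [three_failures_rule]) returns an empty list for the first two LoginFailed events and a single SuspiciousLogin event for the third.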
Filters are EPAs that reduce the incoming streams of events to those that are relevant to the particular task. They delete unneeded events to filter out noise, so to speak. For example, a filter in a communication network might delete Timeout events in messages. Filters may also make simple transformations of events such as syntactic or semantic transformation of individual attributes in an event (from XML to JSON, string to numeric, Fahrenheit to Centigrade, etc.), or omitting unneeded attributes. They are usually the first EPA to be executed in an EPN.
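As a minimal sketch of such a filter, assuming events are plain Python dictionaries with a "type" key (the attribute names temp_f, temp_c and debug are hypothetical and chosen only for illustration):

```python
from typing import Dict, Iterable, Iterator


def filter_epa(events: Iterable[Dict]) -> Iterator[Dict]:
    """Drop noise events and apply simple per-event transformations."""
    for ev in events:
        if ev["type"] == "Timeout":
            continue  # delete unneeded events
        # omit an attribute that later stages do not need
        out = {k: v for k, v in ev.items() if k != "debug"}
        if "temp_f" in out:
            # semantic transformation of one attribute: Fahrenheit to Celsius
            out["temp_c"] = round((out.pop("temp_f") - 32) * 5 / 9, 1)
        yield out


raw = [
    {"type": "Timeout"},
    {"type": "Reading", "temp_f": 212.0, "debug": "trace-data"},
]
filtered = list(filter_epa(raw))
# filtered == [{"type": "Reading", "temp_c": 100.0}]
```

Writing the filter as a generator keeps it stateless and lets it sit at the front of a longer chain of EPAs without buffering the stream.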
Maps are EPAs that generate abstractions of input events. The rules in a map’s interface will trigger on patterns of events in the input stream and generate output events that are higher level abstractions of the input events. For example, a map in a trading system might abstract a set of Offer, Bid, Counter and Accept events by a single higher level CompletedTrade event. Maps are the main building blocks of an event hierarchy.
They can also perform any combination of:
- Enrichment – transformation that adds new attributes or modifies input attributes using information from other sources such as lookup tables (e.g., zip code to city and state name) or joins with other event streams.
- Aggregations – transformation that summarizes like attributes from multiple input events using aggregate operations such as rollups (e.g., count, sum, or average), selections (maximum, minimum), reordering, and interpolation (calculating new synthetic attributes from the values of attributes in adjacent events in the stream).
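The trading example, together with enrichment and aggregation, might look roughly like the following. This is a sketch under assumptions: events are dictionaries, the four steps of a trade share a hypothetical trade_id attribute, the rule triggers on the Accept event once all four step types have been seen, and ZIP_TO_CITY stands in for a real lookup table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

TRADE_STEPS = {"Offer", "Bid", "Counter", "Accept"}
ZIP_TO_CITY = {"94305": "Stanford, CA"}  # hypothetical lookup table


def trade_map(events: Iterable[Dict]) -> Iterator[Dict]:
    """Abstract each complete set of trade steps into one CompletedTrade."""
    pending: Dict[str, List[Dict]] = defaultdict(list)
    for ev in events:
        steps = pending[ev["trade_id"]]
        steps.append(ev)
        seen = {s["type"] for s in steps}
        if ev["type"] == "Accept" and seen >= TRADE_STEPS:
            prices = [s["price"] for s in steps]
            yield {
                "type": "CompletedTrade",
                "trade_id": ev["trade_id"],
                "final_price": ev["price"],
                "steps": len(steps),        # aggregation: count
                "max_price": max(prices),   # aggregation: selection
                # enrichment from a lookup table
                "city": ZIP_TO_CITY.get(ev.get("zip"), "unknown"),
            }
            del pending[ev["trade_id"]]


low_level = [
    {"type": "Offer", "trade_id": "t1", "price": 100},
    {"type": "Bid", "trade_id": "t1", "price": 90},
    {"type": "Counter", "trade_id": "t1", "price": 95},
    {"type": "Accept", "trade_id": "t1", "price": 95, "zip": "94305"},
]
completed = list(trade_map(low_level))
```

Four low-level events go in and one higher-level event comes out, which is the sense in which a map moves the stream up one level of the hierarchy.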
Constraints are special purpose maps for monitoring the input event stream to detect either the presence or absence of specified patterns of events (i.e., pattern matching). Their role is not filtering or aggregation, but detection. For example, they may detect violations of business or security policies. They are placed at critical positions in a business event processing hierarchy.
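Detecting the absence of a pattern is the less obvious case, so here is a minimal sketch of it. The policy is hypothetical: every Order event must be followed by a Payment event for the same order_id within a deadline. The sketch assumes events are dictionaries carrying a numeric time attribute and that they arrive in time order.

```python
from typing import Dict, Iterable, Iterator


def payment_constraint(events: Iterable[Dict], deadline: float = 30.0) -> Iterator[Dict]:
    """Emit PaymentMissing when an Order is not paid within `deadline`."""
    open_orders: Dict[str, float] = {}  # order_id -> time the order was seen
    for ev in events:
        now = ev["time"]
        # Absence check: any open order whose deadline has already passed.
        for order_id, started in list(open_orders.items()):
            if now - started > deadline:
                del open_orders[order_id]
                yield {"type": "PaymentMissing", "order_id": order_id, "time": now}
        if ev["type"] == "Order":
            open_orders[ev["order_id"]] = now
        elif ev["type"] == "Payment":
            open_orders.pop(ev["order_id"], None)  # constraint satisfied


stream = [
    {"type": "Order", "order_id": "o1", "time": 0},
    {"type": "Order", "order_id": "o2", "time": 10},
    {"type": "Payment", "order_id": "o2", "time": 20},
    {"type": "Tick", "time": 31},
    {"type": "Tick", "time": 45},
]
violations = list(payment_constraint(stream))
# violations == [{"type": "PaymentMissing", "order_id": "o1", "time": 31}]
```

Because absence can only be noticed when some later event arrives, this sketch relies on the periodic Tick events to advance time; a production system would more likely use timers supplied by its platform.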
EPAs can be connected by feeding output events from one EPA into the input events of another EPA to create an EPN. An EPN implements an abstraction hierarchy of events that accomplishes a specific task.
The physical implementation of EPAs and EPNs varies, depending on the software tool that is used and the developer’s choices. An EPA may involve multiple operations, such as a filter followed by a sequence of several maps or even a constraint, i.e., an EPA may be an EPN. This may repeat recursively. An EPN that consists of EPAs that are themselves EPNs could be quite large if fully expanded. The design artifact for an EPN is sometimes called a “topology” or “streaming dataflow graph” and the modules within it are “operator nodes”.
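One simple way to picture both the chaining and the recursion is to treat every EPA as a function from an event stream to an event stream. The sketch below makes that assumption; connect and the three small EPAs are hypothetical names, not the API of any streaming product.

```python
from functools import reduce
from typing import Callable, Dict, Iterable

Stream = Iterable[Dict]
Agent = Callable[[Stream], Stream]


def connect(*epas: Agent) -> Agent:
    """Feed the output of each EPA into the input of the next one."""
    def epn(events: Stream) -> Stream:
        return reduce(lambda stream, epa: epa(stream), epas, events)
    return epn


def drop_heartbeats(events: Stream) -> Stream:       # filter
    return (e for e in events if e["type"] != "Heartbeat")


def to_page_views(events: Stream) -> Stream:         # map
    for e in events:
        if e["type"] == "HttpGet":
            yield {"type": "PageView", "page": e["url"]}


def count_views(events: Stream) -> Stream:           # aggregating map
    counts: Dict[str, int] = {}
    for e in events:
        counts[e["page"]] = counts.get(e["page"], 0) + 1
    yield {"type": "ViewCounts", "counts": counts}


# An EPN has the same shape as an EPA, so it can be nested inside another EPN.
cleaning = connect(drop_heartbeats, to_page_views)
network = connect(cleaning, count_views)

raw = [
    {"type": "HttpGet", "url": "/a"},
    {"type": "Heartbeat"},
    {"type": "HttpGet", "url": "/a"},
    {"type": "HttpGet", "url": "/b"},
]
summary = list(network(raw))
# summary == [{"type": "ViewCounts", "counts": {"/a": 2, "/b": 1}}]
```

Here cleaning is an EPN of two EPAs that is then used as a single EPA inside network, which is the recursive structure described above.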
EPNs can be horizontally or vertically partitioned and distributed across multiple cores or multiple servers for scalability, load balancing, and availability purposes. An EPN can be distributed across a physical network of computers if that is appropriate, depending on the location of data or business functions.
An EPA that encompasses multiple operations (i.e., it is an EPN) may be implemented as a set of separate software modules with their own separate interfaces and intermediate data movement between the modules. Or it may be implemented as a single software module that encompasses a set of filters, maps, and/or constraints into one module to reduce the code path length and communication overhead.
Examples
Here are two examples of EPNs that each implement an event abstraction hierarchy used in the operations of an enterprise.
Figure 2. Structure of an EPN for multilevel viewing
Figure 2 shows an event processing hierarchy providing a real-time flow of events to different levels of a business enterprise. Incoming events from the IT layers are first adapted to a standard format for processing. They are then fed through a filter. Normally there would be several filters operating concurrently at this first step. The remaining filtered events then flow into a map that abstracts them to the higher level 1 events where two maps operate. One map operates on the level 1 events to provide a humanly understandable view of those events. The second map abstracts the level 1 events to an even higher level 2. At level 2 there are two further maps, one providing a level 2 view of those events and a second map that is abstracting the level 2 flow of events to level 3.
For example, if the IT layer events were from a sales website, then the level 1 view might be events that capture customer activities on the website, such as searching for items, viewing items, comparing item prices, etc. A level 2 view could consist of abstractions of the level 1 events, such as item popularity, sales statistics, product levels and inventory situations.
Note that each map shown in Figure 2 could be implemented by an EPN of multiple EPAs. This might be done for efficiency by distributing the processing over many EPAs. So, there may be many more EPAs than are shown in the figure.
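The shape of Figure 2 can be approximated in code for the sales website example. The sketch is deliberately small and rests on assumptions: raw IT-layer events are hypothetical "user verb item" text lines, the level 1 and level 2 event types (CustomerActivity, ItemPopularity) are invented for illustration, and level 3 is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def adapt(raw_lines: Iterable[str]) -> Iterator[Dict]:
    """Adapter: bring raw IT-layer events into one standard format."""
    for line in raw_lines:
        user, verb, item = line.split()
        yield {"user": user, "verb": verb, "item": item}


def keep_relevant(events: Iterable[Dict]) -> Iterator[Dict]:
    """Filter: keep only customer activity."""
    return (e for e in events if e["verb"] in {"search", "view", "compare"})


def to_level1(events: Iterable[Dict]) -> Iterator[Dict]:
    """Map: abstract filtered events into level 1 CustomerActivity events."""
    for e in events:
        yield {"type": "CustomerActivity", "user": e["user"],
               "activity": e["verb"], "item": e["item"]}


def to_level2(level1: Iterable[Dict]) -> Iterator[Dict]:
    """Map: abstract level 1 events into level 2 ItemPopularity events."""
    counts = Counter(e["item"] for e in level1)
    for item, n in counts.most_common():
        yield {"type": "ItemPopularity", "item": item, "activity_count": n}


def view(events: Iterable[Dict]) -> List[str]:
    """View map: render the events of one level in readable form."""
    return [", ".join(f"{k}={v}" for k, v in e.items()) for e in events]


raw = ["u1 view shoes", "u1 ping -", "u2 search shoes", "u2 view hat"]
# Each level is materialized as a list because two maps read it:
# the view for that level and the map to the next level.
level1 = list(to_level1(keep_relevant(adapt(raw))))
level2 = list(to_level2(level1))
level1_view = view(level1)
level2_view = view(level2)
```

Each level feeds two consumers, a view and the next abstraction step, which is why the intermediate streams are kept rather than consumed once.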
Figure 3. Structure of an EPN for a global view from multiple locations
Figure 3 shows an EPN that is designed to give a global world view of the event activity in an international enterprise. The event hierarchy must abstract and merge the events flowing through the IT layers of four geographically separated monitoring sites. Each site has a local EPN that adapts and abstracts the events flowing into that site. The events flowing out of the local EPNs are merged into a single event stream. The oval EPAs represent maps that simply merge their input events into a single output event stream. The global view might enable detection of a distributed denial-of-service attack in progress, or a statistical view of categories of IT traffic (e-mail, web access, Telnet and so on) in a global enterprise with multiple gateways to the Internet.
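A merging map of this kind can be sketched with the standard library function heapq.merge, assuming each local EPN emits its events in timestamp order. The site names, the (time, category) input format and the function names local_epn, merge_epa and global_view are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def local_epn(site: str, raw: Iterable[Tuple[int, str]]) -> Iterator[Dict]:
    """Stand-in for a site's local EPN: adapts raw events to a common format."""
    for time, category in raw:
        yield {"site": site, "time": time, "category": category}


def merge_epa(*streams: Iterable[Dict]) -> Iterator[Dict]:
    """Merge map: combine time-ordered input streams into one ordered stream."""
    return heapq.merge(*streams, key=lambda e: e["time"])


def global_view(events: Iterable[Dict]) -> Counter:
    """Statistical view: count events per traffic category across all sites."""
    return Counter(e["category"] for e in events)


sites = {
    "london": [(1, "email"), (4, "web")],
    "tokyo": [(2, "web")],
    "new_york": [(3, "telnet")],
    "sydney": [(5, "web")],
}
merged: List[Dict] = list(
    merge_epa(*(local_epn(name, raw) for name, raw in sites.items()))
)
traffic = global_view(merged)
```

After the merge, the global stream is in time order regardless of which site produced each event, and traffic holds one count per category for the whole enterprise.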
Conclusion
When analytics on streaming event data is used to support better business decisions, it is often helpful to organize the information into logical event abstraction hierarchies. To implement event hierarchies, developers should design and build EPAs and EPNs that perform the appropriate sequence of calculations.
The implementation paradigm described in this article can be used as a standard way to design and develop complex event processing systems. This would reduce the costs associated with exchanging information and improve collaboration between developers within an organization and with its partners. To be sure, event hierarchies designed by separate development teams will not interoperate automatically unless maps are developed between the differing event definitions and between the two hierarchies. Issues relating to software compatibility may also arise. However, using a common paradigm makes it easier to build gateways between applications that need to exchange events. Furthermore, skills acquired on one project can be transferred to another project that uses a different event stream processing (ESP) platform to solve a different business problem, provided both projects follow the principles described here.