Causality in Event Stream Analytics

by David Luckham and Roy Schulte

Modern event stream processing (ESP) platforms (also called stream analytics products) support a combination of ESP and certain key complex event processing (CEP) capabilities “under the hood”.  These enable the products to operate on moving time windows of streaming event data to address a variety of business problems.  They can calculate simple aggregates (e.g., count, sum, average, maximum, minimum) or detect meaningful patterns in the data. Pattern detection in today’s stream analytics products is important because it provides a way to understand causality in business processes. The products may become even more valuable when they expand their CEP features to support more explicit and complete notions of causality.

Why We Need Causality

It is very common to talk about events causing other events. For example, “did the car running the red light cause the accident?” Or “did that large sale of XYZ stock cause the price to dip at 2:00 pm?” In fact, “cause” often goes by other names, a common one being “risk factor”: “Is smoking a risk factor in premature death?”

Everyone is familiar with causality in an intuitive, informal sense. People understand how the world works by understanding cause-and-effect relationships. They have implicit causal models in their heads based on experience. When something happens, people try to identify the cause, or if the situation has a complicated causal chain, they seek the “root” causes of what happened. Once they understand the cause and effect, they make decisions about what to do – they take actions that will have a positive effect on their organizations’ revenue, profit, costs, customer satisfaction or some other goal.

Here is a simple example of causality at work in a business: event A is “I send you an email,” event B is “you send me an email,” and B happens to be a reply to A. Then A causes B; that is, A had to happen in order for B to happen. This causal relationship is so universally accepted that it is customary to record it by including email A as part of reply B. Of course, A happening does not mean that B must happen; it means only that B cannot happen unless A does.

“A causes B” is not the same as “A followed by B” in an event stream. The first pattern, A causes B, matches only the cases in which I send you an email and you reply to that particular email. The second pattern, A followed by B, matches any pair of events in which I send you an email and you later send me an email on any subject, whether or not it is related to my email. That produces far more pattern matches, and many of them will be irrelevant to the goal of tracing the cause of the email events.
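To make the difference concrete, here is a minimal Python sketch. It assumes each email record carries a hypothetical `in_reply_to` field naming the email it answers (similar in spirit to the In-Reply-To header that real mail systems use); the functions `followed_by` and `causes` are illustrative names, not the API of any ESP product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Email:
    id: str
    sender: str
    time: int                          # arrival order / timestamp
    in_reply_to: Optional[str] = None  # hypothetical: id of the email this one answers

def followed_by(stream, a_sender, b_sender):
    """'A followed by B': every pair where a comes from a_sender, b from b_sender, and b is later."""
    return [(a, b) for a in stream for b in stream
            if a.sender == a_sender and b.sender == b_sender and a.time < b.time]

def causes(stream, a_sender, b_sender):
    """'A causes B': only the pairs where b is a reply to a, so a had to happen for b to happen."""
    return [(a, b) for a, b in followed_by(stream, a_sender, b_sender)
            if b.in_reply_to == a.id]

stream = [
    Email("e1", "me", 1),
    Email("e2", "you", 2),                    # an unrelated email from you
    Email("e3", "me", 3),
    Email("e4", "you", 4, in_reply_to="e1"),  # your reply to e1
]

print(len(followed_by(stream, "me", "you")))
print([(a.id, b.id) for a, b in causes(stream, "me", "you")])
```

Even on four events, "followed by" yields three matches while the causal pattern yields only the one genuine email-and-reply pair.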

Defining What A Causes B Really Means

So, if you are going to use causality in specifying event patterns, what exactly do you mean by it? In CEP, causality is defined as follows:

“If event A had to happen in order for event B to happen, then A is a cause of B.” We say “A caused B.” This is a very stark definition of cause, but it is what everyday concepts such as “risk factor” really mean.

How Do Businesses Use Causality?

Organizations can leverage the notion of causality in a system at two different times: (1) offline, when the system is being designed or modified, or (2) in real time, as the system runs.

  • Offline

Analysts may study event logs (historical data on past events) to understand how events unfolded in the past. For example, if a bank transaction is found to be fraudulent, an analyst may look for patterns in previous customer behavior that led up to the fraudulent transaction. Were there address changes, password changes, phone calls, small deposits or withdrawals, web site logins from remote IP addresses, or other activities? Are there similarities among multiple fraudulent transactions? What in the bank’s business processes and security practices enabled the fraud attempt to succeed? The analyst discovers previously unknown patterns of events that reflect cause-and-effect relationships among those events. These patterns constitute a causal model, a partial mathematical model of how this part of the bank works. The analyst can use the model to design changes to the bank’s processes and security practices to avoid similar events in the future. This is a forensic use of the concept of causation. It is retrospective, done hours, days, or weeks after the fact, so it is offline from the ongoing activities of the bank.
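A first step in this kind of forensic analysis can be sketched as follows: for each known fraud, collect the kinds of events that occurred on the same account during a look-back window, then count how many fraud cases each kind preceded. The record layout, event kind names, and the 72-hour window are all hypothetical. Note that frequent precursors are only candidate causes; confirming a causal link still takes the analyst's domain knowledge.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class LoggedEvent:
    account: str
    kind: str     # hypothetical kinds, e.g. "address_change", "login_remote_ip", "fraud"
    time: float   # hours since the start of the log

def precursor_counts(log, lookback_hours=72.0):
    """For each fraud event, find the event kinds seen on the same account in the preceding
    look-back window, and count in how many fraud cases each kind appeared."""
    counts = Counter()
    frauds = [e for e in log if e.kind == "fraud"]
    for f in frauds:
        kinds = {e.kind for e in log
                 if e.account == f.account and e.kind != "fraud"
                 and f.time - lookback_hours <= e.time < f.time}
        counts.update(kinds)  # kinds is a set, so each kind counts at most once per fraud case
    return counts, len(frauds)

log = [
    LoggedEvent("acct1", "address_change", 1.0),
    LoggedEvent("acct1", "password_change", 5.0),
    LoggedEvent("acct1", "small_deposit", 10.0),
    LoggedEvent("acct1", "fraud", 20.0),
    LoggedEvent("acct2", "login_remote_ip", 100.0),
    LoggedEvent("acct2", "address_change", 101.0),
    LoggedEvent("acct2", "fraud", 110.0),
    LoggedEvent("acct3", "address_change", 50.0),  # no fraud follows on acct3
]

counts, n = precursor_counts(log)
for kind, c in counts.most_common():
    print(f"{kind}: {c}/{n} fraud cases")
```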

  • Run time

An offline cause-and-effect model of the type described above is sometimes used to implement an application that continuously monitors causal relationships at run time. Analysts and software developers build an ESP application to detect new occurrences of the fraud pattern that was discovered when the model was developed. The application listens at run time to streams of address changes, password changes, phone calls, deposits, withdrawals, logins, and other events. When it sees a set of input (“base”) events that match the first part of the fraud pattern, it generates an “emerging fraud” complex event. This complex event is sent as an (output) notification to a software component that interrupts transaction processing to block the potentially fraudulent transaction and initiate an investigation or other remedial action.
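A run-time monitor of this kind might look roughly like the sketch below. It keeps a per-account moving time window of base events and emits an "emerging fraud" complex event once every event kind in the first part of the pattern has been seen. The pattern prefix, the 7-day window, and the event fields are hypothetical, and the sketch assumes events for each account arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    id: str
    account: str
    kind: str
    time: float  # seconds since some epoch

# Hypothetical "first part" of a fraud pattern and window size, for illustration only.
FRAUD_PREFIX = {"address_change", "password_change", "small_deposit"}
WINDOW_SECONDS = 7 * 24 * 3600.0

class EmergingFraudDetector:
    """Watches a per-account moving window of base events and emits an 'emerging fraud'
    complex event once every kind in FRAUD_PREFIX is present in the window."""

    def __init__(self) -> None:
        self.recent = defaultdict(deque)  # account -> base events still inside the window

    def on_event(self, ev: Event) -> Optional[dict]:
        if ev.kind not in FRAUD_PREFIX:
            return None  # not a base event for this pattern
        window = self.recent[ev.account]
        window.append(ev)
        # Evict events that have fallen out of the moving window.
        while window and ev.time - window[0].time > WINDOW_SECONDS:
            window.popleft()
        if {e.kind for e in window} >= FRAUD_PREFIX:
            alert = {
                "kind": "emerging_fraud",
                "account": ev.account,
                "base_event_ids": [e.id for e in window],
                "time": ev.time,
            }
            window.clear()  # do not fire again on the same base events
            return alert
        return None

detector = EmergingFraudDetector()
stream = [
    Event("b1", "acct1", "address_change", 0.0),
    Event("b2", "acct2", "password_change", 10.0),
    Event("b3", "acct1", "password_change", 3600.0),
    Event("b4", "acct1", "small_deposit", 7200.0),
]
alerts = [a for a in map(detector.on_event, stream) if a is not None]
for alert in alerts:
    print("notify transaction processor:", alert)
```

This detector only looks for co-occurrence within a window; it cannot tell whether the base events are actually causally related, which is the limitation the rest of this article addresses.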

How ESP Products Should Support Causality

ESP platforms should provide features that enable a bank’s application developers to implement run-time applications such as fraud detection directly on the platform, rather than hand-coding the stream handling logic in custom application code.

At the moment, all ESP platforms provide frameworks for managing time windows over streaming data, so the application developer can focus on writing the logic that is specific to pattern detection. Some platforms even provide features, such as SQL’s MATCH_RECOGNIZE clause, that make it easier for developers to build systems that detect patterns.
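The window bookkeeping that these frameworks handle under the hood can be sketched roughly as below: a time-based moving window that expires old events and maintains simple aggregates as new ones arrive. This is a simplified illustration, not any specific platform's implementation; the class name is hypothetical and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Optional, Tuple

class SlidingTimeWindow:
    """Time-based moving window: keeps values whose timestamps fall in (now - width, now]
    and maintains count and sum incrementally as events arrive and expire."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def add(self, ts: float, value: float) -> None:
        """Add an event; assumes timestamps arrive in non-decreasing order."""
        self.items.append((ts, value))
        self.total += value
        # Expire everything at or before the left edge of the window.
        while self.items and self.items[0][0] <= ts - self.width:
            _, old = self.items.popleft()
            self.total -= old

    def count(self) -> int:
        return len(self.items)

    def sum(self) -> float:
        return self.total

    def average(self) -> Optional[float]:
        return self.total / len(self.items) if self.items else None

    def maximum(self) -> Optional[float]:
        return max((v for _, v in self.items), default=None)

    def minimum(self) -> Optional[float]:
        return min((v for _, v in self.items), default=None)

# Usage: a 60-second window over trade sizes.
w = SlidingTimeWindow(60.0)
for ts, size in [(0.0, 10.0), (30.0, 20.0), (70.0, 5.0)]:
    w.add(ts, size)
print(w.count(), w.sum(), w.average(), w.maximum(), w.minimum())
```

Count and sum are updated in constant time per event; maximum and minimum are recomputed by a scan here for brevity.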

But what is really needed is explicit support for the concept of causality, so that developers can actually write and employ patterns that use causality between events. Here’s how it might work.

ESP should be extended so that every event carries the identifiers of all the events that caused it. The identifiers are metadata, a vector added to the event object (generally a message), that enable a recipient to trace back the causes of the event. This, of course, assumes that the events are created by a system whose causal model is well understood.
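A minimal sketch of this metadata and of tracing back through it, assuming an in-memory store of past events keyed by id. The field name `cause_ids` and the helper `trace_causes` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass(frozen=True)
class CausalEvent:
    id: str
    kind: str
    # Hypothetical metadata field: ids of the events that directly caused this one.
    cause_ids: Tuple[str, ...] = ()

def trace_causes(event_id: str, store: Dict[str, CausalEvent]) -> Set[str]:
    """Follow cause_ids backwards to collect every direct and indirect cause of an event."""
    found: Set[str] = set()
    pending = list(store[event_id].cause_ids)
    while pending:
        cid = pending.pop()
        if cid in found:
            continue
        found.add(cid)
        if cid in store:  # causes recorded by other systems may not be stored locally
            pending.extend(store[cid].cause_ids)
    return found

events = [
    CausalEvent("e1", "email"),
    CausalEvent("e2", "reply", ("e1",)),  # reply to e1
    CausalEvent("e3", "reply", ("e2",)),  # reply to the reply
    CausalEvent("e4", "email"),           # unrelated
]
store = {e.id: e for e in events}
print(sorted(trace_causes("e3", store)))
```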

Having done this, any complex event generated by the system will contain the IDs of its causes, a useful feature in its own right. This will also enable a causal operator, “->”, to be added to the ESP event processing language. So “A -> B” expresses a pattern that will match events A and B only when A is a cause of B. In the bank fraud example, if A is an address change, B is a password change, and C is a small deposit, then a potential fraud detection pattern would be

(A and B and C) -> Potential-Fraud
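One way such a pattern could be evaluated is sketched below, assuming every event carries the cause-id metadata described above. `is_cause` plays the role of the "->" operator (direct or indirect causation), and `match_causes` matches "(K1 and K2 and ...) -> Effect". Both names, the event kinds, and the brute-force search over all stored events are illustrative assumptions; a real engine would index events rather than scan them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Ev:
    id: str
    kind: str
    cause_ids: Tuple[str, ...] = ()  # hypothetical cause metadata carried by every event

Store = Dict[str, Ev]

def is_cause(a_id: str, b_id: str, store: Store) -> bool:
    """Sketch of 'A -> B': True if event a_id is a direct or indirect cause of b_id."""
    pending, seen = list(store[b_id].cause_ids), set()
    while pending:
        cid = pending.pop()
        if cid == a_id:
            return True
        if cid in seen or cid not in store:
            continue
        seen.add(cid)
        pending.extend(store[cid].cause_ids)
    return False

def match_causes(kinds: List[str], effect_kind: str, store: Store) -> List[Tuple[str, Dict[str, str]]]:
    """Sketch of '(K1 and K2 and ...) -> Effect': every effect event that has, for each
    kind Ki, at least one event of that kind among its causes, plus one such binding."""
    matches = []
    for b in store.values():
        if b.kind != effect_kind:
            continue
        binding: Dict[str, str] = {}
        for k in kinds:
            hits = [a.id for a in store.values() if a.kind == k and is_cause(a.id, b.id, store)]
            if not hits:
                break
            binding[k] = hits[0]
        else:  # loop finished without break: every kind was found among the causes
            matches.append((b.id, binding))
    return matches

store = {e.id: e for e in [
    Ev("a1", "address_change"),
    Ev("p1", "password_change", ("a1",)),  # hypothetical: prompted by the address change
    Ev("d1", "small_deposit"),
    Ev("pf1", "potential_fraud", ("p1", "d1")),
    Ev("a2", "address_change"),
    Ev("pf2", "potential_fraud", ("a2",)),  # lacks password-change and deposit causes
]}
print(match_causes(["address_change", "password_change", "small_deposit"], "potential_fraud", store))
```

Here "pf1" matches even though the address change is only an indirect cause (through the password change), while "pf2" does not match, however close in time its events might be.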

An analyst can now write emergency actions to be taken when potential frauds occur, such as:

Potential-Fraud -> Warn-Customer and Close-Account
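A reaction rule of this kind can be sketched as a function from a triggering event to the events it generates. In this hypothetical sketch, `reaction_rule` and the id scheme are illustrative assumptions; each generated action event records the Potential-Fraud event's id as its cause, so the causal chain from base events to remedial actions remains traceable.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Ev:
    id: str
    kind: str
    cause_ids: Tuple[str, ...] = ()

def reaction_rule(trigger_kind: str, action_kinds: List[str]) -> Callable[[Ev], List[Ev]]:
    """Sketch of 'Trigger -> Action1 and Action2 ...': for each trigger event, generate one
    action event per kind, each recording the trigger's id as its cause."""
    counter = itertools.count(1)  # hypothetical id scheme for generated events

    def rule(ev: Ev) -> List[Ev]:
        if ev.kind != trigger_kind:
            return []
        return [Ev(f"{kind}-{next(counter)}", kind, (ev.id,)) for kind in action_kinds]

    return rule

on_potential_fraud = reaction_rule("Potential-Fraud", ["Warn-Customer", "Close-Account"])
actions = on_potential_fraud(Ev("pf1", "Potential-Fraud", ("a1", "p1", "d1")))
for action in actions:
    print(action.kind, "caused by", action.cause_ids)
```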

Early CEP systems, such as Stanford’s Rapide project, had these kinds of event pattern capabilities for dealing with causality. Rapide had a sophisticated event processing language (a domain-specific language) with explicit operators for expressing causal relationships within event patterns. This enabled users to build discrete event simulation systems, used in simulating hardware designs, that could detect causal relationships in large, complex sets of events (event clouds). In addition, these features were used successfully in several experiments applying CEP to detect event patterns in commercial event processing systems such as messaging systems and fabrication line control systems.

Many of the newer ESP platforms today do not provide languages that are specific to event processing but instead rely on variations of SQL or general-purpose system languages such as Python, Java, or Scala. We don’t see signs that causality between events will be explicitly supported by these systems.

Some ESP platforms that do provide specialized event processing languages support a “followed by” operator that helps developers efficiently express sequence-based pattern detection logic. However, for the reasons described earlier, it is less powerful than a causal operator, because “followed by” will match sets of events that may not be causally related.

Still, we do expect that some ESP platforms will add limited capabilities to support causality in the near future, probably due to customer pressure. This may be done, as explained above, by adding to each event the identifiers of the events that caused it. Unfortunately, high-level causal operators will probably not appear in the ESP languages of these platforms in the near term. Nevertheless, they may eventually be implemented as stream processing becomes more widespread and is applied to ever more complicated business scenarios in which causality between events is an important consideration. The advantages are obvious:

  • Developers can express causal event patterns in a high-level language that is clear and easily understood.
  • Stream analytics applications that leverage causal relationships for business purposes will become easier to develop and modify.
  • Errors involved in hand-coding causality in SQL or in a system language such as Java, Python, or Scala are avoided.
