Event Stream Processing Helps AI and Vice Versa, Part 1
W. Roy Schulte
This is Part 1 of a two-part series. Part 1 provides context, defines terms, and explains how AI helps Event Stream Processing (ESP). Part 2 explains how ESP helps AI by enabling streaming data pipelines.
ESP and AI software tools are complementary. Sure, ESP is often implemented without using AI tools, and AI and other analytics are often implemented without ESP tools. However, using them together provides the best available solution for an increasing number of business problems because of the high information value and ubiquity of streaming data.
This synergy operates in three ways:
- AI and other analytical software components can be executed within online production ESP applications to make smart operational decisions (described in Part 1, below).
- Generative (Gen) AI-based copilots can be used at build time to help develop and test ESP applications (also in Part 1).
- ESP can be used to ingest and prepare streaming data that helps engineers design, build, and train AI and other analytical solutions (see Part 2).
Key takeaways
- There is an important difference between
- offline activities, when AI and analytics solutions are built and long-term decisions are made using historical data, versus
- online activities, when AI and analytics are executed to make real-time and near-real-time, repeatable, operational decisions using current data.
ESP plays a role in both kinds of activities.
- ESP is a functional capability, not a product. ESP platforms like Flink and Spark Structured Streaming are not the only kind of software that supports ESP. Vendors offer a wide variety of other tools and applications that also implement ESP.
Background
We’ll provide definitions and some context before explaining the synergy between ESP and AI. Skip this part if you regularly work in this field and this background information is obvious.
In this article:
- An event stream is an unbounded, continuous sequence of data records (event objects) that report things that happen (event happenings). Think of clickstreams, sensor data flows, data broker feeds (e.g., containing news, traffic, weather, or market data), machine logs, copies of business transactions, social computing activity streams, and similar data flows. Not all real-time data is streaming data, but a lot of it is. Every medium-size or large enterprise has many thousands of different event streams continuously flowing over its networks. A car manufacturer reported 3 billion events per day running through its 25 Kafka clusters. A financial market data broker receives a peak of 300 billion events per day, up from 25 billion a few years ago. A car ride service ingests multiple petabytes of event data daily, mostly through streaming, to manage its operations. This abundance of data can improve the accuracy and effectiveness of time-sensitive operational decisions if it is made available to decision makers (people or systems) in real time. When processed later in batch, streams can also improve the accuracy and effectiveness of some less-time-sensitive strategic and tactical decisions (see “Event Stream Processing Helps AI and Vice Versa, Part 2”).
- Event streams are typically conveyed from source (producer) to sink (consumer) using Kafka, Kafka-like, or other messaging subsystems, or other asynchronous communication mechanisms such as webhooks or WebSockets. Messaging subsystems are sometimes called “message brokers” or “event brokers” (e.g., ActiveMQ, IBM MQ, many versions of Kafka, Solace PubSub+, Synadia NATS and Jet, Tanzu RabbitMQ, TIBCO Messaging, and many others). Some event streams are retained in messaging subsystems (e.g., in Kafka topic stores) for days, weeks, or longer to be used for retrospective processing (more on this later). Alternatively, event streams can be saved in external object stores, other files, or databases. Stored event streams may be the raw input data or they may be filtered and abstracted derivatives generated from that raw data (i.e., complex events, see Complex Event Processing (CEP) in the Event Processing Glossary).
- ESP software performs incremental computation on event streams as the data arrives. This is (near) real-time stream processing. Sometimes people say that ESP software processes data “in motion” because it operates on data without first storing it in a separate external database or file. But, of course, ESP software does store the data in internal buffers (“state stores”, generally in memory) for a time while waiting to process it. ESP is typically implemented as a multistage data-flow pipeline (a “job graph” or “topology”) that can be diagrammed as a directed acyclic graph (DAG). The pipeline consists of operators that apply a series of calculations on records sorted and grouped into moving time windows (windows are typically minutes or hours in duration but can be shorter or longer depending on application needs). The results of ESP computation are complex events.
- Multiple kinds of software products implement ESP including, of course, ESP platforms such as Flink, Spark Structured Streaming, and similar frameworks (see more examples in Appendix A). But ESP is also performed in other kinds of products including Data Streaming Platforms (DSPs), Unified Real-time Platforms (URPs), Stream Data Integration (SDI) tools, streaming DBMSs, streaming-enabled data platforms (lakehouse platforms), and some SaaS and packaged application solutions. Some of these products embed open-source ESP platforms like Flink or Spark Structured Streaming while others have their own native implementation of ESP capabilities. For more explanation and examples see Appendix B.
- AI refers to computer programs that execute complicated decision logic to accomplish tasks that humans otherwise do. Some kinds of AI leverage machine learning (ML) techniques, such as Gen AI deep neural networks and Transformer models. This article covers such products, but there is more to AI than Gen AI. AI may also leverage symbolic techniques, such as rule processing, optimization, and other computational logic, or hybrid neuro-symbolic approaches that combine multiple techniques. Some definitions of “AI” require that an AI system improves its performance as it runs by learning from ongoing feedback (i.e., run-time re-training). This article applies to such systems, but it is not limited to systems that implement ML at either build time or run time. Also see Neural Networks of Societies of Autonomous Agents and A Future Artificial Intelligence May Attain Concepts Unknown To Mankind.
- This article also applies to statistical data science/machine learning (DS/ML) solutions that use analytical techniques, such as regression, clustering, and decision trees, that are not generally considered to be AI by themselves. (Analytics has been defined as “the discipline that applies logic and mathematics to data to provide insights for making better decisions,” which might make AI a subset of analytics, but that’s a semantic debate for another time). So, in this article we’ll use the term “AI/analytics” to encompass any kind of complicated logic that operates on data to make decisions regardless of which mathematical technique(s) it uses.
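To make the "moving time window" computation described above concrete, here is a minimal Python sketch of a single tumbling-window counting operator. It is only an illustration of the idea: real ESP platforms also handle out-of-order arrival, watermarks, and distributed state stores, all of which are omitted here, and the 60-second window size and sensor event data are assumptions for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling window size (illustrative)

def tumbling_window_counts(events):
    """Group (timestamp, key) events into 60-second tumbling windows and
    count events per key in each window -- a stand-in for one stateful
    operator in an ESP pipeline. Emitted tuples are the 'complex events'."""
    state = defaultdict(int)       # in-memory "state store" for the open window
    results = []
    current_window = None
    for ts, key in sorted(events):  # assume events arrive roughly in order
        window = ts - (ts % WINDOW_SECONDS)
        if current_window is not None and window != current_window:
            # window closed: emit per-key counts and reset state
            results.extend((current_window, k, n) for k, n in state.items())
            state.clear()
        current_window = window
        state[key] += 1
    if current_window is not None:  # flush the final open window
        results.extend((current_window, k, n) for k, n in state.items())
    return results

events = [(3, "sensorA"), (10, "sensorB"), (45, "sensorA"), (70, "sensorA")]
print(tumbling_window_counts(events))
# -> [(0, 'sensorA', 2), (0, 'sensorB', 1), (60, 'sensorA', 1)]
```

A production pipeline would chain several such operators (filter, key-by, window, aggregate) into the DAG described above, with the framework managing the state between them.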
Implementation Considerations
This article assumes that you are probably using commercial or open-source ESP and AI/analytics software tools, although it also applies to comparable systems that are developed from scratch in more-primitive programming languages. For example, a smart programmer can hand code sophisticated AI or other analytical algorithms in a Java, C++, or Python application, although it’s usually wiser to use a pre-existing AI or statistical library, AI or DS/ML platform product, rule engine, decision management suite, decision intelligence platform (DIP), optimization solver, or some other analytical tool or tools to supply code for complicated decisions. Similarly, a programmer can write an application from scratch that processes event streams and generates complex events without using an ESP platform or other ESP-capable tool, but it’s usually easier and faster to build a stream processing system with the help of some kind of off-the-shelf ESP tool unless the calculations are basic and the volume of streaming data is low. Some applications use data from event streams in simple ways (e.g., one record at a time, no aggregation) and therefore just don’t need stateful ESP.
Now, on to our explanation of how AI/analytics helps ESP.
I. Using AI/analytics to Make ESP Applications Smarter
Every company has many kinds of operational business decisions that must be made quickly, in “real-time” (technically “near real time” or “business real time”). This kind of real time means that a decision is taken within several minutes or seconds, or even in less than a second, after something has changed (i.e., after an event or set of events) or a new inquiry or request for a business transaction has been submitted.
Many real-time decisions are fully automated, but some still involve a human decision-maker. Either way, the key to good decisions is situation awareness, i.e., knowing as much as possible about what is going on now by leveraging all available current information (and yes, all relevant historical information too). A lot of real-time information is only found in streaming data. That is why developers use certain kinds of ESP software, particularly ESP platforms, URPs, some streaming DBMSs, or SaaS and packaged solutions, to implement time-sensitive operational applications.
Examples of such applications include:
- E-commerce systems generate product recommendations and “next best” customer offers. Q&A chatbots and other customer service and customer experience applications generate “next best action” decisions. Hyper-personalized customer-facing applications use historical data along with recent streams including customers’ clickstreams (what pages has the customer seen and what data did they enter), transcribed call center phone logs, real-time customer location data, copies of recent sales or payment transactions, and sometimes even weather information.
- Financial transaction processing systems approve or reject credit card transactions and reduce fraud in bank withdrawals based on up-to-the-second balances, payments, customer location, and patterns in other recent transactions.
- Transportation operations and supply chain management systems use streaming RFID and bar code scans to track the progress of packages. They use GPS-based signals to monitor the location of vehicles and workers, and other telemetry data that report the health of engines and vehicles. They use traffic and weather feeds to optimize routing and delivery instructions.
- Trading systems in equity, bond, foreign exchange, energy, commodity, and other markets make buy/sell/hold and pricing decisions based on streaming market data, news feeds, and up-to-the-minute interest rates.
- Manufacturing and other IoT systems increase reliability and optimize machine performance based on machine logs, sensor data (e.g., temperature, pressure, flow rate, rotation speed, current draw, valve open/closed), and other telemetry from machines, medical instruments, and other devices.
All of these (near) real-time applications execute AI/analytics code that is integrated with run-time ESP software (see bottom part of Figure 1). Run-time AI/analytics code is based on analytical models that were developed earlier, at build time (top part of Figure 1), and then deployed into production. An analytical model is a mathematical representation of a system or process. The specific nature of a model depends on the decision technique being employed, such as Gen AI, other neural networks, statistical DS/ML, rules, optimization, or others.
Figure 1. ESP and AI/analytics at Build Time and at Run Time

To reduce complexity in Figure 1, we are not showing all of the things that may be included in the “other business logic” in the lower part of the diagram but you can imagine that it may involve human interactions (user interfaces including chatbots, real-time dashboards, and alerts), database lookups and updates, external API calls, and so forth. In Part 2 of this blog, we’ll also discuss how some less-time-sensitive decisions are made using offline analytics (top part of Figure 1) without deploying executable analytics components into operational applications.
A majority of ESP applications compute fairly simple rules based on correlations or trends such as:
- “If a credit card is presented at two different locations that are 50 miles apart within a 30-minute window, then issue a ‘stop transaction’ instruction;” or
- “If a user’s web session (a variable length time window) ends with unpurchased items in a shopping cart, then trigger an abandoned-shopping-cart follow-up procedure;” or
- “If more than 10 mobile calls are dropped in range of any particular cell tower in a two-minute time interval, then update a performance dashboard and initiate a service maintenance investigation.”
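The first rule above can be sketched in plain Python. The coordinates, field names, and haversine distance check are illustrative assumptions; an ESP platform would typically express this as a pattern over a keyed, windowed stream rather than a standalone function.

```python
from math import radians, sin, cos, asin, sqrt

def miles_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))

def check_card_rule(prev, curr, max_minutes=30, min_miles=50):
    """Two swipes of the same card, at least 50 miles apart, within a
    30-minute window -> issue a 'stop transaction' instruction."""
    minutes_apart = (curr["ts"] - prev["ts"]) / 60
    miles_apart = miles_between(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    if minutes_apart <= max_minutes and miles_apart >= min_miles:
        return "stop transaction"
    return "ok"

prev = {"ts": 0,    "lat": 40.71, "lon": -74.01}   # New York
curr = {"ts": 1200, "lat": 39.95, "lon": -75.17}   # Philadelphia, 20 minutes later
print(check_card_rule(prev, curr))  # -> stop transaction
```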
Simple decision algorithms (plain rules and expressions) may be manually coded directly in ESP software tools using SQL, Python, Java, Scala, a pattern language, or other vendor-provided event processing language (EPL). This level of analytics is too simple to be called “AI” and it doesn’t need a separate AI/analytics tool beyond the ESP framework at run time. However, when the decision algorithm is more sophisticated, the run-time, executable AI/analytics code in the ESP application may be generated by the offline AI/analytics tool (again, top part of Figure 1) and it may involve other software dependencies and databases.
As already noted, run-time AI/analytics logic may execute business rules and/or an ML inferencing, optimization solver, large language model (LLM), or other AI/analytics algorithm. For example:
- A rule engine may be relevant if there are a large number of rules (e.g., more than one screen full). We have seen Flink and other ESP platform applications that embed rule engine code generated by a decision management suite product. Three vendors (IBM, Red Hat, and TIBCO) historically even packaged a rule engine with ESP software within their respective products.
- Most DS/ML platforms will generate a software component to be used for run-time ML inferencing as part of the process of developing a statistical model. It may be based on regression (linear, polynomial, or logistic), classification (decision tree, random forest, nearest neighbor, Naive Bayes, or neural network), K-means clustering, or another technique. Some vendors (e.g., SAS Institute) have done extensive work to integrate their DS/ML products with their ESP platforms to make combined development even easier.
Sometimes the executable AI/analytical code runs in-line within the ESP software. In other cases, the ESP software invokes an external LLM, optimization solver, or other analytic via an API. External modules bring more overhead and are slower at run time but are often easier to implement and update.
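The trade-off can be pictured as two interchangeable scoring functions. This is a hedged sketch: the logistic-regression coefficients, feature names, and model-serving URL are all made up for illustration, not taken from any real product.

```python
import json
import urllib.request
from math import exp

def score_inline(features):
    """In-line analytic: a hand-coded logistic-regression score that runs
    inside the ESP job. Coefficients here are illustrative only."""
    z = -2.0 + 0.8 * features["amount_z"] + 1.5 * features["velocity"]
    return 1.0 / (1.0 + exp(-z))

def score_external(features, url="https://scoring.example.com/v1/score"):
    """External analytic: POST the features to a model-serving API
    (hypothetical endpoint). This path adds network latency per event but
    lets the model be updated without redeploying the ESP job."""
    req = urllib.request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp)["score"]

print(score_inline({"amount_z": 0.0, "velocity": 0.0}))
```

The in-line version avoids per-event network hops; the external version centralizes the model so it can be versioned and swapped independently, which mirrors the overhead-versus-flexibility trade-off described above.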
Run-time AI/analytics logic may test the value of new complex events in conjunction with older data to determine what action to take. That older data can be from a database that was previously populated using a batch or online process. The databases that hold such information could be called operationally-focused feature stores, in contrast to analytical feature stores that were used offline to design and train analytical models. Some operational feature stores with short time-to-live (TTL) data are updated during the day while others with slightly longer lifetimes are updated offline. For example, banks routinely re-compute customers’ credit ratings and retailers re-compute future offer recommendations in nightly batch runs to be used in the next day’s processing when a real-time transaction is triggered.
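One way to picture an operational feature store with short-TTL data is a keyed cache that refuses to serve expired values. The following is a minimal sketch of that idea, not any particular product's API; the key naming and injectable clock are assumptions made for the example.

```python
import time

class TTLFeatureStore:
    """Minimal operational feature store: each feature value expires after
    a time-to-live, so stale intraday features are never served."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable clock for deterministic tests
        self._data = {}

    def put(self, key, value):
        self._data[key] = (value, self.clock())

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, written = entry
        if self.clock() - written > self.ttl:
            del self._data[key]     # expired: evict and report a miss
            return default
        return value

# Usage with a fake clock to show expiry deterministically.
now = [0.0]
store = TTLFeatureStore(ttl_seconds=60, clock=lambda: now[0])
store.put("cust42:txn_count_1h", 7)   # illustrative feature key
assert store.get("cust42:txn_count_1h") == 7
now[0] = 120.0                        # two minutes later: past the TTL
assert store.get("cust42:txn_count_1h") is None
```

A nightly-batch feature (such as the recomputed credit rating mentioned above) would simply use a much longer TTL and be rewritten by the offline run.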
The algorithms, and sometimes even the same code, used to compute run-time operational features are the same as those used to compute the features for developing the models at build time (this is true for features/complex events generated in ESP and also for features computed by non-stream-processing analytics pipelines). In a few cases, the same feature store can be used both for building analytical models and for operational run-time inferencing, but this seems to be quite rare in practice.
Agentic AI
All ESP applications and all agents in general are inherently event-driven and autonomous to a certain extent because they initiate action when they receive events. Production ESP applications may be further designed to use “agentic” AI. Agentic AI refers to ML-based AI applications that have extra flexibility to adjust their behavior at run time based on changing conditions. They are designed to achieve prescribed objectives by understanding the current situation (situation awareness) and context and then determining a course of action that was not pre-designed, typically invoking other agents.
Retrieval Augmented Generation (RAG)
Most ESP applications currently use symbolic logic (particularly rules) rather than LLMs or any other kind of neural net. In the very small, although growing, population of ESP applications that use LLMs at run time, some kind of grounding or retrieval-augmented generation (RAG) architecture plays a major role. ESP applications deal with (near) real-time business problems, and many of the parameters that affect run-time decisions should be grounded in fresh data from event streams rather than relying solely on the old information baked into pre-trained language models. This may mean including retrieved real-time data in the LLM prompt or enabling the model to access data through an API. External, third-party LLMs (e.g., OpenAI and Azure OpenAI) can already be invoked from ESP frameworks through APIs. Furthermore, Confluent will reportedly integrate Gen AI LLMs as Flink models (analogous to Flink Tables) within Confluent Cloud in the foreseeable future.
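The prompt-grounding variant of RAG can be sketched as a function that splices recently retrieved stream data into the prompt before calling the model. The event format, prompt wording, and industrial example below are illustrative assumptions; the actual LLM call is omitted.

```python
def build_grounded_prompt(question, fresh_events, max_events=5):
    """Assemble an LLM prompt grounded in retrieved real-time facts from
    an event stream -- a minimal RAG-style sketch. Only the most recent
    events are included to keep the prompt within context limits."""
    lines = ["Answer using ONLY the real-time facts below.", "", "Facts:"]
    for ev in fresh_events[-max_events:]:   # most recent events only
        lines.append(f"- [{ev['ts']}] {ev['fact']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

# Hypothetical complex events emitted by an upstream ESP pipeline.
events = [
    {"ts": "10:01", "fact": "Pump P-7 vibration 9.2 mm/s (limit 7.0)"},
    {"ts": "10:03", "fact": "Pump P-7 bearing temperature rising"},
]
prompt = build_grounded_prompt("Should pump P-7 be taken offline?", events)
print(prompt)
```

In a real deployment, the resulting string would be sent to an external LLM via its API, as described above.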
Run-time learning
Some AI/analytics models improve as they run by learning (re-training) from feedback and other input (see Figure 2). Examples of ML algorithms that can incrementally improve include neural networks, streaming regression, K-means clustering, support vector machines, and some kinds of optimization.
Figure 2. Some ML Models Learn (Re-train) While Executing Production Workloads

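Streaming regression, one of the incremental techniques listed above, can be sketched as a model that takes one small gradient step per labeled feedback event instead of being re-trained in batch. The learning rate and the toy data (which follow y = 2x + 1) are illustrative assumptions.

```python
class OnlineLinearModel:
    """Streaming (incremental) linear regression via stochastic gradient
    descent: weights are updated on every labeled event as it arrives."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn(self, x, y):
        # One SGD step on squared error for a single (x, y) feedback event.
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

model = OnlineLinearModel(n_features=1, lr=0.05)
for _ in range(200):                          # feedback events arriving over time
    for x, y in [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]:
        model.learn(x, y)
print(model.predict([3.0]))                   # converges toward 7.0
```

The same learn-as-you-go pattern applies to the other algorithms mentioned (streaming K-means, online SVMs, and so on), with a different per-event update rule.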
II. Using AI to Help Build ESP applications
On to our next topic. Gen AI copilots seem likely to become even more important for ESP application development than for developing other kinds of IT systems. Most developers and data analysts have limited or no experience with stream processing concepts, and this lack of skills has long been an impediment to the wider adoption of ESP products. Vendors have tried for two decades to make their ESP platforms easier to program by adding support for SQL, higher level EPLs, and drag-and-drop graphical development tools. Until now, these efforts have had limited success because software developers are just more familiar with request/reply, non-event-driven kinds of systems. The emergence of copilots based on LLMs may help reduce the ESP learning curve.
Multiple vendors are adding copilots that work with ESP platforms into their product lines, which should bring the usual benefits of offloading repetitive aspects of system development. Copilots may be relevant whether the resulting run-time ESP application executes plain rules, Gen AI, or any other decision technique. Copilots can provide a starting point, especially for inexperienced developers, by generating the streaming SQL, Java, or other code needed to program ESP applications. Copilots typically have conversational interfaces that provide context-aware assistance and facilitate collaboration with other developers and stakeholders. Copilots are also useful for generating documentation from application specifications. They could also help reconcile the semantics in different event objects or generate synthetic test data.
Pattern discovery – identifying meaningful sequences of events in one or more event streams to design complex events – is a particular challenge in ESP. The design of complex events in event hierarchies traditionally relied almost entirely on human intuition. Analysts would invent hypotheses about causal relationships among events and then test the hypotheses through trial-and-error experiments to see which combination of events and their timing, sequence, causality, and other relationships was significant. However, we recently came across a Gen AI-based tool from Motif Analytics that uses an LLM to model sequences of events instead of sequences of words. This allows developers to discover patterns in sets of historical event data more easily, potentially leading to faster and better design of complex-event hierarchies. Something like this might even be a step toward auto-ML for ESP application development.
Of course, copilots are not a panacea. Humans are still essential in the development process. AI engineers and other software engineers need to check the generated code and finish the complete application. The problem of hallucinations hasn’t been solved.
Conclusion
Streaming data is valuable for decision-making and is increasingly common. AI is also valuable for decision-making and is increasingly common. They are naturally being used together more often. Streaming data engineers need to understand AI, and AI engineers need to understand event streams and the technologies for ESP, particularly when addressing (near) real-time operational business problems.
Appendix A. Some Examples of ESP platforms (not a comprehensive list)
- Arroyo (from Apache and Arroyo Systems)
- Axual KSML
- Espertech Esper
- Flink (from Aiven, Amazon, Apache, Confluent, Cloudera, IBM, and many other vendors)
- Google Cloud Dataflow
- Heron and Storm (from Apache and X Corp.)
- Hitachi Streaming Data Platform
- Kafka Streams (from Apache, Confluent and others)
- Microsoft Azure Stream Analytics
- SAS Event Stream Processing
- Spark Structured Streaming, Spark Streaming (from Apache, Databricks, and others)
- TIBCO Streaming
- VMware Tanzu Spring Cloud Data Flow
Appendix B. Commercial Products that Support ESP
- ESP platforms (examples listed above in Appendix A). These pureplay stream processing frameworks can be used in many different ways but don’t provide messaging or significant data management capabilities on their own.
- Data Streaming Platforms (DSPs) combine messaging and ESP platforms. For example, Confluent’s DSP encompasses Kafka-based messaging with Flink-based ESP, along with related connectors and management, security, and metadata management features. Aiven, Axual, and other vendors also offer DSPs as part of their cloud (PaaS) services. DSPs can store large amounts of data for long periods of time but don’t provide full DBMS-like interactive query capabilities (although some DSP vendors also offer separate DBMS products or dbPaaS services).
- Unified real-time platforms (URPs) provide ESP, data management, and application enablement facilities but limited or no messaging (see What Exactly is a Unified Real-time Platform). URPs are offered by Cogility, Evam, Gigaspaces, Gridgain, Hazelcast, Joulica, KX, Nstream.io, Pathway, Scaleout, Timeplus, Vantiq, Vitria, Volt Active Data, XMPro, ZineOne, and other vendors.
- Stream Data Integration (SDI) tools (e.g., from Decodable, IBM, Informatica, Striim, and others) and other data integration tools with streaming capabilities offer ESP, transformation, data quality, security, administration, and metadata management features, along with connectivity to a variety of third-party DBMSs and messaging subsystems. For more information see the Gartner report “Critical Capabilities for Data Integration Tools,” G00806728, 3 December 2024.
- Streaming DBMSs (e.g., Materialize Cloud Operational Data Store and RisingWave Cloud) combine ESP and DBMS capabilities, including storage, retrieval, and search. They have limited or no native messaging but connect to Kafka and other messaging products.
- Data Platforms (AKA lakehouse platforms) are ramping up their ESP capabilities to complement their extensive data management, data integration, and analytics capabilities. They emphasize analytical use cases but are incrementally expanding towards operational applications. Examples include Apache Hudi, Cloudera Data Platform, Databricks Data Intelligence Platform, Microsoft Fabric, Oracle Intelligent Data Lake, and Snowflake Data Cloud Platform.
- Some SaaS and packaged solutions implement domain-specific stream processing as part of their application logic. They are used for industrial IoT, real-time customer engagement, transportation management, supply chain management, AIOps, security information and event management (SIEM), and other vertical or horizontal applications. They generate complex events but usually don’t call them that. For example, Aveva, Falkonry, Proemion, Seeq, XMPro, and others offer industrial IoT data analytics tools that process streaming data from power plants, oil refineries, factories, mining equipment, and other machines.
Many URP and SDI products, and some SaaS, packaged solutions, and data platforms, have embedded open-source ESP platforms (mostly Flink, sometimes Spark Structured Streaming) to provide their ESP capabilities. The remaining products (including all streaming DBMSs) implement their own native custom-built ESP.
This does not imply that these ESP-capable products are equivalent. They partially overlap in the sense that they perform calculations on event streams in near-real-time but they vary considerably in their relevant usage scenarios and relative strengths and limitations. A full analysis of these product categories is outside the scope of this article.
If you have comments or corrections for this article, please contact me at urpfeedback@gmail.com.