Big Data vs Event Processing

Database pundit Curt Monash made a brief mention of event processing (or event stream processing) in his discussion on “big data terminology”, presumably as a response to the discussion he started with Forrester’s Brian Hopkins, in which Brian (very reasonably IMHO) defined “big data” as:

techniques and technologies that make handling data at extreme scale economical.

with “extreme scale” being defined mainly by the metrics of volume and “velocity” – with the latter being the obvious area of interest from an event processing perspective, as stated by Curt:

Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.

Ignoring what might constitute high volume / high velocity problems (see later), Curt replaces “velocity” with “structure” in the “big data metrics” chart (with “velocity” being included in his “bigness” metric). But of course the argument over whether “structure” or “velocity” (or neither or both) is a relevant metric for Big Data is entirely perspective-based:

  1. both are characteristics of data / events and
  2. both affect processing and storage techniques,
  3. … along with other metrics like data lifecycles and data value.

From an event perspective, event payloads (real-time data) can be simple values, tuples (such as the equivalent of a database record), or complex explicit data (such as an XML document), to which something like TIBCO BusinessEvents rules, continuous queries or patterns can be applied. For unstructured text you may want to add TIBCO Patterns, and for non-deterministic data something like TIBCO Spotfire S+ (think neural nets and the like).

From a “big data” perspective, event processing use cases can include customer purchase records, credit card transactions, phone voice packets or text messages, inventory updates, operational sensor reports, and so on. But from the event processing perspective (i.e. actually exploiting “big data”) there is another dimension to consider: the scale and velocity of the incoming events versus the scale and velocity (and structure) of the existing data they need to be related to and/or processed against. Some examples might be:

  • large volumes of data at high velocities, compared to large volumes of data
    = national security applications
  • large volumes of data at high velocities, compared to normal volumes of data
    = sensor processing like radar
  • normal volumes of data at high velocities, compared to large volumes of data
    = web search
  • normal volumes of data at high velocities, compared to normal volumes of data
    = automated trading in Capital Markets
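
The four combinations above can be sketched as a simple lookup. This is only an illustration: the Workload fields and both thresholds are hypothetical, picked purely so the example runs (the petabyte cut-off loosely follows the "Terabytes but not Petabytes" remark further down), and real cut-offs would depend on the technology in question.

```python
from enum import Enum
from typing import NamedTuple


class Scale(Enum):
    NORMAL = "normal"
    LARGE = "large"


class Workload(NamedTuple):
    incoming_bytes_per_second: float  # size of the high-velocity event stream
    reference_data_bytes: float       # existing data the events are tested against


# Hypothetical thresholds, assumed only for illustration.
LARGE_INCOMING_BYTES_PER_SECOND = 1e9   # roughly 1 GB/s
LARGE_REFERENCE_BYTES = 1e15            # roughly 1 PB

# Example use cases for each (incoming, reference) combination, as listed above.
EXAMPLES = {
    (Scale.LARGE, Scale.LARGE): "national security applications",
    (Scale.LARGE, Scale.NORMAL): "sensor processing like radar",
    (Scale.NORMAL, Scale.LARGE): "web search",
    (Scale.NORMAL, Scale.NORMAL): "automated trading in Capital Markets",
}


def classify(workload: Workload) -> tuple[Scale, Scale]:
    """Return the (incoming, reference) scale pair for a workload."""
    incoming = (
        Scale.LARGE
        if workload.incoming_bytes_per_second >= LARGE_INCOMING_BYTES_PER_SECOND
        else Scale.NORMAL
    )
    reference = (
        Scale.LARGE
        if workload.reference_data_bytes >= LARGE_REFERENCE_BYTES
        else Scale.NORMAL
    )
    return incoming, reference


def example_for(workload: Workload) -> str:
    """Look up the example use case matching a workload's combination."""
    return EXAMPLES[classify(workload)]


if __name__ == "__main__":
    trading = Workload(incoming_bytes_per_second=1e6, reference_data_bytes=1e12)
    print(classify(trading), "=", example_for(trading))
```

Keeping the two dimensions separate (incoming stream versus existing data) is the point of the sketch; a single "bigness" number would collapse web search and radar processing into the same bucket.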

A breakdown like this might be a useful way of comparing Big Data requirements against the multitude of different IT technologies and solutions out there. Today, CEP is mostly dealing with normal volumes of data at low to high velocities being tested against normal(ish) volumes of data (maybe up to terabytes but not petabytes), with the higher-end values requiring fast data grid solutions such as TIBCO ActiveSpaces. But as always, it would be interesting to have some metrics against the Big Data use cases to see what we are all talking about…