
TIBCO’s own Mark Palmer, SVP of integration and event processing, sits down with technology journalist Ellis Booker to discuss the evolution of live streaming data. Proponents of Big Data in every industry are examining their data for patterns that let them make better, more informed decisions in real time… what we at TIBCO call Fast Data. Eavesdrop on Mark and Ellis’ chat to learn more!
Transcript:
Ellis: Welcome to the TIBCO Podcast. I’m your moderator, Ellis Booker. Our guest today is Mark Palmer, Senior Vice President, General Manager of Integration and Event Processing at TIBCO Software, Inc. Hey Mark!
Mark: Hi. How are you?
Ellis: Great. Thanks for joining us. Let’s start off by just, if you could, explain your role there at TIBCO.
Mark: Yes. So I am responsible for TIBCO’s integration products and event processing. That includes products like BusinessWorks, which is an integration bus; MDM, or Master Data Management; event processing; the Live Datamart; and our Engage, or customer cross-sell/upsell, application infrastructure. I was the StreamBase CEO. We were acquired about 15 months ago, and I’ve been responsible for event processing since then. This role is new in that we’re building more of a platform out of all of these components, and that’s why my role has expanded to include these other products, so our customers get a more consistent interaction and strategy across the product portfolio.
Ellis: Very good. Proponents of big data analytics are now turning their attention to a whole new problem, right? How to effectively use streaming data, which is, of course, very voluminous and moving very quickly, because it’s a stream. Can you describe how TIBCO’s Live Datamart connects directly to this streaming data while it’s still moving, and why that is an important direction for analytics?
Mark: Yeah, sure. The big thing with traditional data warehouses, or datamarts, or databases, really, is that they have traditionally stored information in a database on disk, after the data stops moving. This has traditionally been done by ETL technologies, which batch and load the information that you want to analyze. That’s all great, there are a lot of good applications of that, and that’s traditionally how the Hadoop ecosystem works too. But one of the next phases of that model is to not just store the data and then look at the data lakes, if you will, looking back at what’s already happened, looking in the rear-view mirror. The new approach, or additional approach, is to query the actual streams as they’re coming in. So you look at the data as it’s moving, while it’s still in motion, and that gives you a more real-time, up-to-date view of your systems and operations.
Ellis: Okay. So we’re talking about querying the stream as it’s flowing across. Can you drill into that a little deeper for us and tell us exactly what technology is responsible for doing that?
Mark: Yeah, absolutely. So the Live Datamart, and the event processing underneath it, let you take a query and inject it into the stream and look for patterns as the data flows by. So, for example, if you’re an oil and gas company, and you have sensor information flowing from oil wells, you want to know when the current temperature, pressure, and voltage readings break some important threshold that might be a signal that the well is about to fail, and then you can use that real-time view to do predictive maintenance on that well. So you can actually look and see if there are conditions there that need to be paid attention to.
That’s in contrast to, sort of, the data lake model, where you would store that information, and then maybe the next day or the next week you would run some analytics and find out that an oil well is having problems, but it’s too late. So that’s why you query the streams, the streams in this case being Internet of Things data, readings as they come off the well in real time. It allows you to act in the moment and recognize issues before it’s too late.
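[Editor’s note: to make the “query the stream” idea concrete, here is a minimal sketch in plain Python of a continuous query evaluated against sensor readings as they arrive. The field names and thresholds are invented for illustration; this is not the Live Datamart’s actual API.]

```python
# A hypothetical sketch of a continuous query over a sensor stream:
# evaluate each reading the moment it arrives, instead of loading it
# into a warehouse and analyzing it later.

from dataclasses import dataclass

@dataclass
class WellReading:
    well_id: str
    temperature: float  # degrees C (invented units)
    pressure: float     # kPa
    voltage: float      # volts

# Invented alert thresholds for the predictive-maintenance example.
TEMP_LIMIT = 120.0
PRESSURE_LIMIT = 900.0
VOLTAGE_FLOOR = 11.5

def monitor(readings):
    """Yield an alert the moment any reading breaks a threshold."""
    for r in readings:  # 'readings' is a live, unbounded stream
        if (r.temperature > TEMP_LIMIT
                or r.pressure > PRESSURE_LIMIT
                or r.voltage < VOLTAGE_FLOOR):
            yield f"ALERT {r.well_id}: possible impending failure ({r})"

# Usage: alerts appear while the data is still in motion.
stream = iter([WellReading("well-7", 131.2, 640.0, 12.1)])
for alert in monitor(stream):
    print(alert)
```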
Ellis: Right, and the example you just gave, the oil well, is sort of an industrial application of the Internet of Things. Is there going to be a play here for your retailers, your insurance companies, your credit card processors, for this querying of streams?
Mark: I think, yeah, just about any industry right now is dealing with streaming information. So you’ve got retailers that want to have a sense, based on sensors, of how many customers are currently in which stores and what their loyalty levels are, plus streaming information about what inventory is in the store, and then bring that all together into one consistent view. Introduce social data, tweets from Twitter, updates from any social media source; that information is streaming. Obviously, capital markets trading has been dealing with streaming data for, gosh, 20 years now, and is using this technology a lot. In fact, that’s where some of this Live Datamart technology was born. Telecommunications networks, manufacturing, particularly high-tech manufacturing, healthcare. All these environments have data coming at them. Now, the current state of the art is just to store that in a data lake and query what happened yesterday. But the new technologies are moving to where you can query those streams instead, or actually, generally, in conjunction with looking back at history as well, because history is still very important.
Ellis: Right. So, one question that always comes up with new technologies is, can I use the same metaphors and approaches that I’ve used in the past, for example, going from querying my datamart to querying this Live Datamart, or do I have to learn an entirely new language? So, talk about that.
Mark: No, it’s very much a familiar programming and query language metaphor for the Live Datamart. It looks a lot like SQL in many ways. It’s different in subtle ways in that it’s very temporally oriented. With a traditional database, you’ll do a query that says, hey, what customers bought this kind of sweater in the last five weeks, right? What the Live Datamart lets you do is ask for the customers that match this profile in any, sort of, 10-second window in the future. And that’s actually the key distinction, that element of time. A query can be submitted and then answered continuously as the data streams flow through it, in an effort to anticipate what’s going to happen next rather than report on what’s already happened. So the query language lets you say, here’s a query, enter it into the system, and watch for these temporal predicates, not only in what’s happening right now, but also in what happens in the future.
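[Editor’s note: the temporal distinction Mark draws, a predicate registered once and answered over every future window, might look like the following sketch. The sliding-window mechanics and event shape are hypothetical stand-ins, not Live Datamart syntax.]

```python
# A toy continuous windowed query: register a predicate once, and it
# is re-answered over the trailing 10-second window as events arrive.

from collections import deque

WINDOW_SECONDS = 10.0

def windowed_matches(events, predicate):
    """Continuously yield the events matching `predicate` within the
    trailing 10-second window, re-evaluated on each arrival."""
    window = deque()  # (timestamp, event) pairs currently in the window
    for ts, event in events:  # an unbounded (timestamp, event) stream
        window.append((ts, event))
        # Evict anything older than the window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        hits = [e for t, e in window if predicate(e)]
        if hits:
            yield ts, hits

# Usage: watch for premium customers in any 10-second window, forever.
events = [(0.0, {"customer": "a", "tier": "premium"}),
          (4.0, {"customer": "b", "tier": "basic"}),
          (15.0, {"customer": "c", "tier": "premium"})]
for ts, hits in windowed_matches(iter(events), lambda e: e["tier"] == "premium"):
    print(ts, hits)
```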
Ellis: Right. So, part of this, I guess, would be to kick out alarms or notifications based on events happening in the future, right?
Mark: Yeah, yep, that’s one big use for it. The other is to serve a different class of business user that is often underserved by data warehousing technology, and that’s the user who is operationally focused. As soon as you turn your operations real-time, you need a different type of staff to monitor operations intelligently. So, for example, in the oil and gas example, you need a drilling operations center constantly making sure that the oil wells are running effectively. Marketing organizations are now becoming much more dynamic, measuring the impact of their social media campaigns in real time.
Trading, actually: if you look back 30 years ago, 20 years ago, or even 10 years ago, everybody’s familiar with the trading pits and the traders that were there yelling at each other making trades. Now, the people monitoring and managing trading operations are sitting behind a computer desk, monitoring in real time what’s happening live. So, just about every industry is moving toward these knowledge workers who use computers to manage more automated operations, and those are the types of people that the Live Datamart and this streaming technology serve.
Ellis: So, given the volume of the streams we’re talking about, with any sensor array or the financial trading example you gave earlier, what does that mean for storage? In other words, I take it you’re not storing all of this stuff. Is there a protocol for taking this real-time event information, deciding it’s not worthy of capture, and getting rid of it, and how does that work?
Mark: Yeah, the sort of data stream, data lake analogy, I think, works pretty well here, right? What the streaming technology tends to do is look at data as it’s coming into your organization. Now, you’re not often going to want to save all of that stuff, especially when it comes to, like, IoT sensor readings from oil wells, or from sensors in stores, or from sensors on items that you’re selling in a store, right? Most of that data you don’t want to store, though sometimes you do want to store it all.
Now, the storage of massive amounts of information is a decision that is still usually associated with data warehouses, Hadoop, and especially Spark, which is the new, next generation of that infrastructure. So those decisions and that storage are still really important, and they still happen. They’re just not the design center of the streaming technology. That’s really for, again, operational decisions that are happening in the moment. For the things that happened, you know, sort of today, last week, last month, people could always query history, and they wanted to combine that. But the new element, the new feature that you get, is to also be able to query the stream in concert with what’s in the data store.
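[Editor’s note: one way to read “query the stream in concert with what’s in the data store” is enriching each live event with context looked up from history, roughly as sketched below. The in-memory dict stands in for whatever warehouse or Hadoop/Spark store holds the history; none of the names reflect TIBCO’s actual interfaces.]

```python
# A rough sketch of combining a live stream with stored history:
# each reading is judged relative to its historical baseline.

# Historical context, e.g. loaded from the warehouse at startup.
well_history = {
    "well-7": {"mean_temp": 95.0, "failures_last_year": 2},
}

def enrich(stream):
    """Attach historical context to each live reading as it arrives."""
    for reading in stream:
        history = well_history.get(reading["well_id"], {})
        # A live reading is mainly interesting relative to its history.
        deviation = reading["temperature"] - history.get("mean_temp", 0.0)
        yield {**reading,
               "temp_deviation": deviation,
               "failures_last_year": history.get("failures_last_year")}

for event in enrich(iter([{"well_id": "well-7", "temperature": 128.0}])):
    print(event)
```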
Ellis: Now, if I understand it, TIBCO’s Live Datamart is iteratively recording events based on the past queries that the analyst has done. So what’s the advantage of that and how . . . ?
Mark: Yeah, so the advantage of that is kind of the example from before, right? Which is querying the future. So, let’s say I’m a trader and I want to issue a query that says: tell me any time an order above $50,000 is stuck anywhere within my IT environment for more than 20 seconds. That query can be submitted and, at the beginning of the trading day, obviously it will be empty. But what the Datamart does, and this is where the new iteration comes in, is it just remembers the query, right? So that when trading picks up, the query is constantly refreshed and you can see that live view of what’s going on. That’s the way we process queries: it’s about remembering the queries and querying the future. Now, that’s the first benefit.
The second benefit is that you actually get reuse, right? ’Cause once you have a good set of queries that watch for those really important operational conditions, you can store them and continue to have the query processor answer the questions. So you can build up this bank, this memory of intelligent algorithms that are constantly looking for the conditions that are critical to the business.
Ellis: Right.
Mark: So that’s the benefit of being able to store the queries persistently in the data store.
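[Editor’s note: the stuck-order example above is essentially a standing, “remembered” query. A toy version, with an invented event model, might look like this: register the predicate once, feed every event through it, and the result set fills in as trading picks up.]

```python
# A hypothetical standing query: alert whenever an order above $50,000
# sits unfilled for more than 20 seconds. Empty at the open, it starts
# producing results as events flow in.

pending = {}        # order_id -> (entry_time, amount)
STUCK_AFTER = 20.0  # seconds
MIN_AMOUNT = 50_000

def on_event(now, event):
    """Feed every order event through the remembered query and
    return the current answer set."""
    if event["type"] == "submitted":
        pending[event["order_id"]] = (now, event["amount"])
    elif event["type"] == "filled":
        pending.pop(event["order_id"], None)
    # Re-evaluate the standing query on every event (or on a timer).
    return [oid for oid, (t0, amt) in pending.items()
            if amt > MIN_AMOUNT and now - t0 > STUCK_AFTER]

print(on_event(0.0, {"type": "submitted", "order_id": "A1", "amount": 75_000}))
print(on_event(25.0, {"type": "submitted", "order_id": "A2", "amount": 10_000}))
# The second call reports ["A1"]: above $50k and stuck for 25 seconds.
```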
Ellis: In terms of just the query engine itself, is there any machine learning going on here? In other words, is there any automated mechanism for generating queries based on outliers or interesting correlations or something like that?
Mark: Yeah. So there we’re doing some really interesting work with the TIBCO portfolio and also through partners, but one of the big ones is our work with the Spotfire team, which is our analytics division here at TIBCO. We also have a product that we call TERR, the TIBCO Enterprise Runtime for R. It’s a statistical, predictive analytics package, an R runtime that’s been optimized for real time. And the idea is that a lot of learning happens within these predictive analytics packages, and also in the analytics packages that are looking at those data lakes, right?
So, for example, with Spotfire, you can look at the full historical view of the sensor readings from the oil wells, analyze it, and figure out: okay, here are the conditions, here’s how much the temperature and pressure readings have to change, and here’s the statistical probability that that kind of reading from the sensors is going to lead to a failure. Once I’ve analyzed and learned that pattern, I can put that pattern into R, into the predictive analytics package, and then push-button deploy it into the Live Datamart, and that lets you run full circle from discovery to operationalizing and watching for those patterns. And that’s one way to do it. There are certainly also machine learning packages, third-party integrations, where we could do the same, but that’s one that we find to be very popular within our TIBCO customer base.
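[Editor’s note: the discover-then-operationalize loop Mark describes might be sketched as below, with the Spotfire/TERR discovery step approximated by plain Python statistics. The data and threshold logic are invented for illustration.]

```python
# A simplified discover-then-deploy loop: learn a failure threshold
# from history, then run it as a predicate against the live stream.

import statistics

# 1) Discovery: learn from history what temperature preceded failures.
historical_temps_before_failure = [118.0, 124.5, 121.0, 119.8]
learned_threshold = (statistics.mean(historical_temps_before_failure)
                     - statistics.stdev(historical_temps_before_failure))

# 2) Operationalize: "push-button deploy" the learned pattern as a
#    predicate the streaming engine evaluates on every reading.
def deployed_query(reading):
    return reading["temperature"] >= learned_threshold

for reading in iter([{"well_id": "well-7", "temperature": 122.0}]):
    if deployed_query(reading):
        print(f"predicted failure risk on {reading['well_id']}")
```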
Ellis: Right. And you mentioned a moment ago that this kind of process allows one to see what’s going on in real time and even predict, kind of, future events if it’s set up properly. Of course, comprehending that requires, doesn’t it, some sort of visualization? Does the TIBCO Live Datamart come with a dashboard or a visualization tool, or can you use third-party tools or TIBCO’s own for that?
Mark: Yeah, absolutely. That’s the component of the Datamart that we call Live View, which is the actual user interface that sits on the user’s desk. So if you’re in a drilling operations center, for example, you want a rich, graphical interface that lets you both interactively query the Live Datamart and see the results constantly updating. We like to think of that desktop, or control center environment, as not a static environment, because it also has to help you control actions. So it lets the user visualize, for example, on a map, in a heat map, in line charts, bar charts, tables. So that’s the visualization component.
It is also the design center and the control center for all your alerting and your filtering of the alerts, because these real-time systems tend to generate thousands, or millions, or, in something like the financial markets, even hundreds of millions of events every day. So the big trick is, how do you separate the wheat from the chaff? How do you find the hundred events out of, say, a hundred million that actually matter, that you want to act on? And that’s the goal of the Live View desktop, which is, you know, sort of the face, the single real-time pane of glass into the streaming data and all the queries that you’ve entered into the system.
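[Editor’s note: the wheat-from-chaff idea could be sketched as a severity filter with per-source de-duplication, as below. The severity scores and event shape are invented; a real deployment would use the Live View desktop’s own alert rules.]

```python
# A toy alert filter: of a flood of raw events, surface only those
# above a severity bar, and only once per source.

seen_sources = set()

def surface(events, min_severity=0.99):
    """Pass through only high-severity events, once per source."""
    for e in events:  # potentially hundreds of millions per day
        if e["severity"] >= min_severity and e["source"] not in seen_sources:
            seen_sources.add(e["source"])
            yield e   # the handful a human should actually look at

events = [{"source": "well-7", "severity": 0.995},
          {"source": "well-7", "severity": 0.999},  # duplicate source, suppressed
          {"source": "well-9", "severity": 0.40}]   # below the bar, suppressed
print(list(surface(iter(events))))
```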
Ellis: And, as a final question, just in terms of who’s using this, or who will be using this, is it going to be the existing staff of data analysts that you have, or do you see this kind of capability pushing down into line-of-business people who might have operational requirements or operational responsibility? The oil well guy needs to know what’s happening with the wells as it’s happening, as opposed to getting a report five days after they’ve all failed, right?
Mark: Yeah, exactly. So it’s really the operational business users who are the end users of the platform. They’re the ones, like you said, monitoring drilling operations, monitoring trading operations, controlling the actions that a merchandiser takes based on what’s happening in the store and what the inventory levels are. The manager on the floor, managing the plant at a high-tech manufacturing company, right? In flight operations, we have lots of airlines that use our infrastructure. The flight operations people are the ones making the decisions on which crews get scheduled where, which flights get delayed and which ones don’t. Those are the kinds of business functions that need this real-time operational insight and really didn’t have it before. Sometimes they had it in very tactical applications, but this gives them a general-purpose infrastructure for managing real-time operations more effectively and taking action when it’s appropriate.
Ellis: And that’s all the time we have for these questions. I’d like to thank Mark Palmer, our guest, for a very interesting interview. Thank you so much for joining us, Mark.
Mark: Okay, thanks for having me.