By Michael O’Connell and Mark Palmer
Microsoft’s purchase of Revolution Analytics created a storm of debate this week over the “right way” to embrace the R language for data science and analytics. Forbes, for example, declared the TIBCO approach “the right way” to promote the ultimate business value of R: re-implementing the R engine for enterprise scalability, and making predictive analytics accessible in real time through our Fast Data platform.
Open source purists ask: why not simply extend the existing R code base and keep using the open source implementation of R?
The TIBCO approach is a hybrid: by building a proprietary, scalable, and embeddable version of R, we’re applying our unique ability to engineer software for bet-your-business, mission-critical applications. At the same time, we’re promoting the use of R for business-meaningful applications, both for Big Data at rest and for streaming analytics on Fast Data in motion. This makes R a business tool, rather than keeping it locked away as the sole purview of academics and scientists. We’re promoting the business value of R while fully embracing the awesome analytics methods developed in the open source R community!
For many CIOs, the idea that you can use predictive analytics to automate digital business applications is a big new idea.
Which kinds of applications does this enable? At TIBCO, we’ve seen a recent explosion of applications that employ real-time R. For example, data scientists have used Spotfire for years to mine data lakes to discover patterns such as, “Whenever readings from a submersible pump report a sizeable drop in current at the same time as a dramatic increase in pressure over a given time window relevant to the operation => investigate and open a maintenance case.”
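To make the pump pattern concrete, here is a minimal sketch of that kind of rule evaluated over a sliding time window. It is written in Python purely for illustration; it is not R, and it is not the StreamBase or Spotfire API. The names `Reading` and `PumpRule`, the 60-second window, and the 30% and 50% thresholds are all assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: float  # seconds since some reference point
    current: float    # motor current (units are illustrative)
    pressure: float   # pressure (units are illustrative)


class PumpRule:
    """Flags a reading when current has dropped and pressure has risen
    sharply relative to the oldest reading still inside the window.
    All thresholds are hypothetical values chosen for this sketch."""

    def __init__(self, window_seconds=60.0, current_drop=0.30, pressure_rise=0.50):
        self.window_seconds = window_seconds
        self.current_drop = current_drop    # fractional drop, e.g. 0.30 = 30%
        self.pressure_rise = pressure_rise  # fractional rise, e.g. 0.50 = 50%
        self.readings = deque()

    def update(self, reading):
        """Add a reading; return True if a maintenance case should be opened."""
        self.readings.append(reading)
        # Evict readings that have fallen out of the time window.
        while reading.timestamp - self.readings[0].timestamp > self.window_seconds:
            self.readings.popleft()
        oldest = self.readings[0]
        if oldest.current <= 0 or oldest.pressure <= 0:
            return False  # cannot compute a relative change
        drop = (oldest.current - reading.current) / oldest.current
        rise = (reading.pressure - oldest.pressure) / oldest.pressure
        return drop >= self.current_drop and rise >= self.pressure_rise


rule = PumpRule()
for r in [Reading(0.0, 10.0, 100.0), Reading(30.0, 9.5, 105.0), Reading(50.0, 6.0, 160.0)]:
    if rule.update(r):
        print(f"t={r.timestamp}: investigate and open a maintenance case")
```

In a real deployment the thresholds and the window would come from the data scientist’s analysis rather than being hard-coded as they are here.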
The issue is that, until now, there wasn’t a clean, scalable architecture in which to deploy an appropriate R model. IT would take the discovery from a data scientist and re-write the model in the StreamBase language, BusinessEvents rules, Java, or another environment. That takes too long and relegates R to a sideshow in terms of business value. As a result, R isn’t viewed as a solution that delivers “real business value”; instead, it’s viewed as a tactical tool for the lab.
By creating an enterprise-class R implementation, we put R front and center of the business. Now, data scientists and IT can come together in one architecture that includes R at the core. Data scientists discover patterns with analytics tools such as Spotfire, and then deploy and rapidly iterate the same analysis in the same language inside their streaming analytics platform, such as StreamBase.
And the benefits of this architecture continue. As the R analysis operates in real time, we can gather feedback on how well it’s behaving. That feedback is used to update the relevant model, which is then deployed at the push of a button into the stream-processing platform. The automated analysis runs, and we re-train the model to keep refining its parameters, so the analysis gets better and better as we learn from its behavior on new data. This rapid iteration is possible because the same engine is embedded in both the discovery and streaming environments.
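The feedback loop described here can be sketched in a few lines. Again, this is an illustrative Python sketch under stated assumptions, not R and not any TIBCO API: the “model” is just a single alert threshold on a score, and `ThresholdModel`, `retrain`, and `Deployment` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdModel:
    threshold: float

    def predict(self, score):
        # Raise an alert when the score reaches the threshold.
        return score >= self.threshold


def retrain(feedback):
    """Pick the threshold, among observed scores, that makes the fewest
    mistakes on the recorded (score, was_fault) outcomes."""
    candidates = sorted({score for score, _ in feedback})

    def errors(t):
        return sum((score >= t) != was_fault for score, was_fault in feedback)

    return ThresholdModel(min(candidates, key=errors))


@dataclass
class Deployment:
    """Stands in for the streaming side: it scores events with the current
    model, collects feedback, and swaps in a retrained model on demand."""
    model: ThresholdModel
    feedback: list = field(default_factory=list)

    def score_event(self, score):
        return self.model.predict(score)

    def record_outcome(self, score, was_fault):
        self.feedback.append((score, was_fault))

    def redeploy(self):
        if self.feedback:
            self.model = retrain(self.feedback)


dep = Deployment(ThresholdModel(0.9))
print(dep.score_event(0.7))  # False: the initial threshold is too strict
for score, was_fault in [(0.2, False), (0.4, False), (0.6, True), (0.7, True), (0.95, True)]:
    dep.record_outcome(score, was_fault)
dep.redeploy()
print(dep.model.threshold)   # 0.6
print(dep.score_event(0.7))  # True
```

The point of the sketch is the shape of the loop (score, observe, retrain, redeploy), which stays the same whatever the model is; a production model would be far richer than one threshold.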
This architecture provides a Darwinian progression of the intelligence of the analytics, delivering more and more business value as the analyses are improved by IT and data scientists, collaboratively.
Such Analytic Evolution leads to improved automation for the digital business: ongoing, rapid improvement in business impact, based on intelligent analysis of Big Data in R and applied to Fast Data in motion. That adds more value, and new reasons to use R for impactful business applications.
It’s a win for the business, a win for IT, a win for data scientists, a win for R. Everybody wins.