
Microsoft’s recent acquisition of Revolution Analytics has some data science and analytics professionals wondering what plans the Redmond company has for the R language. Data scientists and statisticians wrote the open-source R language for statistical computing and predictive analytics. While the language is very useful, its architecture has weaknesses that keep it from scaling effectively.
According to Dan Woods, contributor to Forbes.com, to fix these limitations Microsoft should re-implement the R language, much as TIBCO Software did when it created the TIBCO Enterprise Runtime for R (TERR). Otherwise, the GPL license that governs open-source R may limit what Microsoft can do, because it keeps R from being widely embedded in commercial products.
According to Michael O’Connell, Chief Data Scientist at TIBCO, “R is too important to be limited by the GPL license. The data analytics world needs to use R whenever and however they want, and not be hindered by convoluted, non-performant integrations with Open Source R. That’s one reason why we re-implemented the language.”
Dan Woods proposes that Microsoft “adopt a model that reconciles the innovation energy that comes from a successful open-source project with the need to include the innovations of open source in many environments that are not native open-source and would violate the GPL license if you used the open-source version.”
Doing this would allow the R language to be tightly integrated inside Microsoft Excel or Microsoft SQL Server, with the code executing within those environments. This tight integration is not currently possible under R's GPL license, which in effect requires that software connecting too deeply with the open-source code also be open-sourced.
According to Joseph Sirosh, Corporate Vice President of Information Management & Machine Learning at Microsoft, “Moving forward, we will build R and Revolution’s technology into our data platform products so companies, developers, and data scientists can use it across on-premises, hybrid cloud, and Azure public cloud environments.” However, a recent preview at Microsoft Ignite indicated that R would run inside a separate sandbox process on SQL Server. That is an example of the loose integration that might be required to respect the GPL terms and to address concerns over open-source R’s scalability, at the potential cost of performance and convenience for the end user.
Next Steps:
- Try Spotfire and start discovering meaningful insights in your own data.
- Try TIBCO TERR and experience the agility of R for the enterprise.
- Subscribe to our blog to stay up to date on the latest insights and trends in Big Data and Big Data analytics.