
Hadoop is a robust, economical choice for storing large amounts of diverse data on commodity hardware for later analysis. With native support for a number of Hadoop flavors, and a combination of visual data discovery and advanced analytics, Spotfire lets users of all skill levels analyze data from Hadoop and derive value from their big data assets.
There are two categories of Hadoop-Spotfire integration:
1. Spotfire native data connectors
2. Integration of Hadoop with the advanced analytic engine for the R language, TIBCO Enterprise Runtime for R (TERR)
Access and Visualization of Hadoop Data
With Spotfire native data connectors, users can access Hadoop data in a number of ways. An in-database connection can push aggregations into Hadoop, so large datasets are summarized where they are stored. Spotfire can then model and visualize this data without scripting or manual query editing, so less technical users can work with these datasets, while those with advanced data modeling skills can still write their own queries.
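The benefit of pushing an aggregation to the data source can be illustrated with a minimal Python sketch. It makes several assumptions that are not part of Spotfire: an in-memory SQLite database stands in for a Hadoop SQL engine, and the `sales` table, its columns, and both function names are hypothetical. The point is only that the pushed-down query returns one row per group, while the client-side version must transfer every row.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a Hadoop SQL engine;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("north", 4.0)],
)


def aggregate_in_database(connection):
    """Push the GROUP BY to the data source; only one row per region comes back."""
    cursor = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


def aggregate_on_client(connection):
    """Pull every row to the client and aggregate locally (costly at scale)."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


pushed = aggregate_in_database(conn)
pulled = aggregate_on_client(conn)
```

Both functions produce the same totals; with a large dataset, the first moves far less data across the network because the grouping happens where the data lives.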
Less Technical Users and Expert Workflows
From the Hadoop aggregation, users can extract selected slices of data for deeper analysis. Technical users can send this data to TERR for advanced analysis, and their entire workflow can be encapsulated as a best practice in Spotfire. Less technical users can then run the routine, benefit from the advanced visualizations, and make predictions based on large Hadoop datasets. Data scientists can create, analyze, and share results without dealing with the details of the Hadoop architecture.
Information Sharing and Collaboration
Spotfire connections to Hadoop can be quickly configured into analytic workflows, dashboards, and reports, which can then be shared, reused, and consumed across the organization. KPIs based on Hadoop data can be pushed to virtually any user device with Spotfire Metrics. Extensive geo-analytic support within Spotfire makes it easy to generate insights from geographical data.
Production Analysis of Hadoop Data
Once a data connection to Hadoop (or another data source) has been defined, TERR can reuse that connection to extract and analyze data through custom queries, independent of Spotfire. TERR includes a set of R functions for this purpose, which can be called from simple scripts in a TERR session, so routine analysis of large Hadoop datasets can be scheduled.
TERR + Hadoop = Power
TERR can execute native MapReduce jobs directly on a Hadoop cluster via the Hadoop Streaming interface. Faster, more scalable, and more robust than open-source R, TERR enables users to process Hadoop data more quickly and reliably. As performance gains are multiplied across nodes, this approach produces analytic answers much faster and with fewer resources. Individual predictive models can be created in parallel across the Hadoop cluster, and then combined to provide a better, synthesized, shareable result.
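The pattern of building models in parallel and then combining them can be sketched in the style of a Hadoop Streaming job, where a mapper and a reducer exchange tab-separated text lines. This is an illustration under stated assumptions, not TERR's method: it is written in Python rather than R, the function names and the `x,y` record format are hypothetical, the per-partition model is a simple least-squares slope through the origin, and the row-weighted average is just one possible way to combine models. The sort between the two stages is simulated locally with `sorted`.

```python
from itertools import groupby


def fit_partition(lines):
    """Mapper: fit y = slope * x on one partition and emit 'key<TAB>slope<TAB>n'.

    Each input line is 'x,y'. Least squares through the origin gives
    slope = sum(x*y) / sum(x*x).
    """
    sum_xy = sum_xx = 0.0
    count = 0
    for line in lines:
        x_text, y_text = line.strip().split(",")
        x, y = float(x_text), float(y_text)
        sum_xy += x * y
        sum_xx += x * x
        count += 1
    if count and sum_xx:
        yield f"model\t{sum_xy / sum_xx}\t{count}"


def combine_models(sorted_lines):
    """Reducer: average the per-partition slopes, weighted by row count."""
    records = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for key, group in groupby(records, key=lambda fields: fields[0]):
        weighted = total = 0.0
        for _, slope, count in group:
            weighted += float(slope) * int(count)
            total += int(count)
        yield f"{key}\t{weighted / total}"


# Simulate two partitions handled by separate mappers, then the sort and reduce.
partitions = [["1,2", "2,4"], ["1,3", "3,9"]]
mapped = [line for part in partitions for line in fit_partition(part)]
combined = list(combine_models(sorted(mapped)))
```

In an actual streaming job, the mapper and reducer would be separate executables that read from standard input and write to standard output, and the cluster would perform the sort between them.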
With Spotfire, TERR, and Hadoop, analytics can be faster, better, and easily accessible to a very large population of users. Making these capabilities available across the organization supports better decision-making.