Cloudera Impala and TIBCO Spotfire®: Fast Interactive Visual Analytics on Big Data


This post was originally published by TIBCO Spotfire partner Cloudera.

Not a day goes by without an interesting discussion about how TIBCO Spotfire helps organizations derive insights from the powerful tools available in a Cloudera enterprise data hub, including the leading analytic database, Impala. Longtime customers ask how TIBCO can assist with their Proof of Value projects, and newer acquaintances want to learn the details behind what makes Spotfire and Impala such a great combination. The business world is experiencing exponential growth in data, and accelerating expectations of Big Data are pushing organizations to find economical ways to store and manage those volumes. A large share of these organizations appear to be evaluating and choosing Cloudera to do so, which is a major reason TIBCO has partnered with Cloudera and developed a certified connector through the Cloudera Accelerator Program.

TIBCO Spotfire has delivered in-memory visual data discovery and analytics since the 90s, and our goal has always been to deliver a highly interactive, responsive, and fast analytic environment. For analysts and consumers to derive insights instantly, data must be at hand when it matters most. Visualizations, filters, expressions, and functions must all be available in a single view so that you can quickly define visualizations and drill down to the data you need at any given moment, regardless of data volume: 300M rows in memory, half a petabyte in Impala, or your latest business results from Excel. The resulting analytic workflows, dashboards, or reports can then be shared, reused, and consumed across the organization on virtually any device.

Delivering the best user experience requires low-latency query response, whether from Spotfire’s native in-memory data engine or from a fast database. Impala has proven itself to be not only one of the fastest Hadoop query engines but also a standard tool in the Hadoop ecosystem, with broad usage and industry support. Impala can deliver the interactive analytic experience that Spotfire users expect, with the quality, supportability, and compatibility necessary to run in production. Its capabilities include responding to plot-driven aggregation queries across Big Data sets at run time and delivering slices of row-level data through on-demand drill-down scenarios that users create on the fly. This allows users to interact visually with massive amounts of Impala data in real time and to examine it side by side with other data assets.
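To make the two query shapes mentioned above concrete, here is a minimal sketch in Python. It is not Spotfire's actual query generator; the function names and the `sales`, `region`, and `revenue` identifiers are hypothetical, chosen only for illustration. The first function builds the kind of aggregation query a chart might issue, and the second builds a row-level query for the categories a user has marked.

```python
import re

# Hypothetical helper names; this only illustrates the two query shapes.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def _ident(name):
    """Accept only plain identifiers so names can be placed in SQL text."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def _literal(value):
    """Render a number or a simple string as a SQL literal."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    text = str(value)
    if "'" in text or "\\" in text:
        # Real code should bind parameters through the database driver.
        raise ValueError(f"unsupported characters in value: {text!r}")
    return f"'{text}'"


def aggregate_query(table, category, measure, agg="SUM"):
    """Plot-driven query: one aggregated value per category (one bar each)."""
    agg = agg.upper()
    if agg not in _AGGREGATES:
        raise ValueError(f"unsupported aggregate: {agg}")
    return (
        f"SELECT {_ident(category)}, {agg}({_ident(measure)}) AS value "
        f"FROM {_ident(table)} GROUP BY {_ident(category)}"
    )


def detail_query(table, category, marked_values, limit=1000):
    """Drill-down query: row-level data for the marked categories only."""
    if not marked_values:
        raise ValueError("at least one marked value is required")
    values = ", ".join(_literal(v) for v in marked_values)
    return (
        f"SELECT * FROM {_ident(table)} "
        f"WHERE {_ident(category)} IN ({values}) LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(aggregate_query("sales", "region", "revenue"))
    print(detail_query("sales", "region", ["EMEA", "APAC"], limit=500))
```

The point of the split is that the aggregation query returns only a handful of rows no matter how large the table is, while the detail query is issued only when a user marks something, and it is bounded by both the filter and the row limit.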

Figure 1: A typical Master-Detail configuration in Spotfire

A native Impala connector, developed jointly by Cloudera and TIBCO through the Cloudera Accelerator Program, was the only effective way for Spotfire users to achieve these capabilities. The connector is certified on CDH5 and includes support for Single Sign-On with Kerberos for added security. By simply installing the Cloudera driver in the Spotfire analytics software, users can connect to the Impala cluster, and they’re off to a great start. They don’t have to be modelers or coders who write complex queries: Spotfire’s built-in intelligence generates the optimized queries needed to visualize the data in Hadoop and, when the data must be joined with your other data, to bring it out of Hadoop. This lets users concentrate on data discovery and insightful analysis.

Spotfire and Hadoop: Interactive Visual Analysis on Big Data

The Impala and TIBCO Spotfire capabilities discussed above are vast, but interactive analysis and dashboards are only the beginning. Extensive geo-analytic support within Spotfire makes it easy to generate insights from geographical data.

Experienced users can write their own custom Impala SQL and MapReduce queries from within TIBCO Spotfire to extend the capabilities even more. Additionally, the combination of Spotfire, Hadoop, and TIBCO Enterprise Runtime for R (TERR, TIBCO’s advanced analytic engine for the R language) allows less technical users to visualize and make predictions based on large Hadoop data sets. TERR typically runs in the dedicated Spotfire Statistics Server or within the Spotfire client, but you can also choose to bring the engine to the data by running TIBCO’s enterprise-class R engine inside the Hadoop cluster. In addition, with TIBCO Spotfire Content Analytics, unstructured, human-created, text-based data stored in Impala, such as documents, CRM notes, or call center notes, can be analyzed and full-text searched from within the same TIBCO Spotfire interface. This gives business users a more holistic view of their business and helps them make informed decisions.

Together, Spotfire and Impala help organizations deliver analytics to any type of user and empower those users to apply the analytics to any kind of data in many new ways. The fundamental goal is to convert technological advances into superior, competitive value for our customers worldwide. The partnership between Cloudera and TIBCO has opened up many opportunities, and with the fast advancements made by Cloudera, such as allowing more and larger data sets to be queried, TIBCO can continue to deliver the user experience our customers need.