Navigating Big Data with Ad-hoc Visual Data Discovery

Data is at the very heart of analytics. Beyond the four V's of big data (Volume, Velocity, Variety and Veracity), it is equally important to be mindful of where the data is stored, how it is accessed, and how data discovery tools process, visualize and analyze it.

With the emergence of new big data technologies in particular, organizations are challenged with the issue of data access and with determining the best approach for optimal performance.

In this article, we review several approaches to data access, including the commonly practiced in-memory and in-datasource modes. We also discuss how Spotfire offers a flexible architecture that lets users access data through a combination of these traditional modes and the data-on-demand approach, which is unique to Spotfire.

In-Memory

One of the more traditional approaches to accessing data is in-memory. As the name implies, based on the query criteria, the visualization tool loads all the relevant data from the data source into its own internal memory. It then organizes the data into a format that lets it perform the calculations required to visualize the data quickly and efficiently.

This method, of course, offers great performance and speed benefits, provided that the amount of data being loaded into memory is small enough to load quickly, the data set in memory is sufficient to answer the specific questions being asked, and the data does not need frequent refreshing. However, the approach falls short when factors such as data volume, the time it takes to load data from the source into memory, repeated round trips to the source to refresh results, network speed or reliability, and available memory become critical to performance.
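To make the in-memory pattern concrete, here is a minimal Python sketch under stated assumptions: an in-memory SQLite database stands in for the data source, and the `sales` table, its columns, and the helper functions are hypothetical, not Spotfire APIs. All relevant rows are copied to the client once, and every calculation afterwards runs against that local copy.

```python
import sqlite3
from collections import defaultdict


def make_source():
    # Hypothetical stand-in for a remote data source: a small SQLite table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "A", 10.0), ("East", "B", 5.0),
         ("West", "A", 7.0), ("West", "B", 3.0), ("West", "A", 2.0)],
    )
    return conn


def load_into_memory(conn):
    # In-memory mode: copy every relevant row to the client in one pass.
    return conn.execute(
        "SELECT region, product, amount FROM sales ORDER BY region"
    ).fetchall()


def total_by_region(rows):
    # Every later calculation runs against the local copy, not the source.
    totals = defaultdict(float)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


conn = make_source()
rows = load_into_memory(conn)
print(total_by_region(rows))  # {'East': 15.0, 'West': 12.0}
```

Rows added to `sales` after `load_into_memory` runs stay invisible until the load is repeated, which is the refresh cost described above.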

In-Datasource

Another fairly common approach to data access is in-datasource (formerly called in-database, before the advent of Hadoop and other non-database technologies). In this scenario, the data being analyzed stays in the data source, and only the results are brought back and displayed as visualizations for further analysis.

This method seems much more efficient, but it has its own set of limitations, especially when one starts to ask questions such as:

  • How many threads does the database engine have available to fulfill requests?
  • How is performance affected when questions become more complex and require row-level detail?
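For contrast with the in-memory case, here is a sketch of the in-datasource pattern, again using SQLite as a hypothetical stand-in source with an illustrative `sales` table. The aggregation is pushed into the query so only summary rows come back, while a row-level drill-down becomes one more request the source engine has to serve, which is where the questions above come into play. The function names are assumptions for illustration only.

```python
import sqlite3


def make_source():
    # Hypothetical stand-in for a database or Hadoop-backed source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "A", 10.0), ("East", "B", 5.0),
         ("West", "A", 7.0), ("West", "B", 3.0), ("West", "A", 2.0)],
    )
    return conn


def total_by_region_in_source(conn):
    # In-datasource mode: the source engine does the GROUP BY, and only one
    # summary row per region is transferred back for visualization.
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


def row_details(conn, region):
    # A row-level drill-down is yet another query the source must serve.
    cur = conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY rowid",
        (region,),
    )
    return cur.fetchall()


conn = make_source()
print(total_by_region_in_source(conn))  # {'East': 15.0, 'West': 12.0}
print(row_details(conn, "West"))  # [('A', 7.0), ('B', 3.0), ('A', 2.0)]
```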

Another approach within this mode of data retrieval is to precompute answers to a set of queries on a regular basis so that the most common questions can be answered quickly and efficiently. Think cubes: a cube is pre-built with specific dimensions and measures that hold precomputed answers to common questions, and the results can be visualized as long as those dimensions and measures match what you want to visualize.
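A toy illustration of the cube idea, assuming a hypothetical fact list with two fixed dimensions (`region`, `product`) and one measure (`amount`). It is a simplified sketch of precomputed aggregation, not how any particular OLAP engine stores a cube: every cell, including the `ALL` roll-ups, is computed ahead of time, so answering a question is just a lookup.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, amount).
ROWS = [("East", "A", 10.0), ("East", "B", 5.0),
        ("West", "A", 7.0), ("West", "B", 3.0), ("West", "A", 2.0)]


def build_cube(rows):
    # Precompute SUM(amount) for every region/product cell, plus the
    # "ALL" roll-ups, ahead of time rather than at question time.
    cube = defaultdict(float)
    for region, product, amount in rows:
        for r in (region, "ALL"):
            for p in (product, "ALL"):
                cube[(r, p)] += amount
    return dict(cube)


def query(cube, region="ALL", product="ALL"):
    # Answering is a dictionary lookup; nothing is recomputed.
    return cube[(region, product)]


cube = build_cube(ROWS)
print(query(cube, "West", "A"))  # 9.0
print(query(cube, product="B"))  # 8.0
print(query(cube))               # 27.0
```

A question along a dimension the cube was not built with (for example, sales by customer) has no cell to look up, which mirrors the requirement that the cube's dimensions and measures match what you want to visualize.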

Data-On-Demand

A third option for data access is a hybrid of in-memory and in-datasource, in which the data stays in the data source and the relevant data is retrieved from the source system only when it is needed. Because only the data required at that moment is moved, this approach makes efficient use of system resources without degrading performance, essentially offering the best of both worlds.
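One way to picture data-on-demand is a table wrapper that contacts the source only when a view requests a particular slice. The sketch below assumes a hypothetical `sales` table filtered by region and adds a simple local cache; both the filter and the cache are illustrative choices, not a description of how Spotfire implements on-demand loading.

```python
import sqlite3


def make_source():
    # Hypothetical stand-in for the underlying data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "A", 10.0), ("East", "B", 5.0),
         ("West", "A", 7.0), ("West", "B", 3.0), ("West", "A", 2.0)],
    )
    return conn


class OnDemandTable:
    """Hypothetical wrapper: fetch a slice only when a view asks for it."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}        # region -> rows already retrieved
        self.queries_sent = 0  # how many times the source was contacted

    def rows_for(self, region):
        # Nothing is loaded up front. The first request for a region sends a
        # filtered query; repeat requests are served from the local cache.
        if region not in self.cache:
            self.queries_sent += 1
            self.cache[region] = self.conn.execute(
                "SELECT product, amount FROM sales WHERE region = ?", (region,)
            ).fetchall()
        return self.cache[region]


table = OnDemandTable(make_source())
print(table.queries_sent)  # 0
east = table.rows_for("East")
print(sum(amount for _product, amount in east))  # 15.0
table.rows_for("East")  # second request is served from the cache
print(table.queries_sent)  # 1
```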

Spotfire for Big Data

The key to visual data discovery on big data is to access data in a combination of different ways at the same time, from the same analysis or dashboard.

As a visual data discovery platform, TIBCO Spotfire offers an agile and flexible architecture and can analyze data from virtually any data source. Spotfire provides connectors to a wide range of data sources that allow it to work in-datasource. Relational data sources differ in their capabilities, and cubes, with their pre-calculated measures, are very different from relational data sources altogether. Spotfire can accommodate all of these relational and non-relational technologies with ease, providing enhanced connectivity options.

Beyond visualizing big data, Spotfire offers advanced analytics on big data, such as running statistical models. This is enabled by the TIBCO Enterprise Runtime for R (TERR) statistical engine. The key to advanced analytics is, again, the combination of in-datasource and in-memory techniques. For instance, when running predictive analytics in Spotfire, only the relevant data is analyzed and computed in-datasource; the results are then brought into Spotfire and enriched with the powerful in-memory expression capabilities of TERR.
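The in-datasource-then-in-memory pattern for predictive analytics can be sketched as two steps. In this hypothetical example, SQLite plays the data source and reduces raw rows to monthly totals, and plain Python stands in for the in-memory engine by fitting an ordinary least-squares trend line to the small result. It does not use TERR or any Spotfire API; the table and function names are assumptions for illustration.

```python
import sqlite3


def make_source():
    # Hypothetical monthly sales table standing in for a big data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 8.0), (2, 12.0), (3, 25.0)],
    )
    return conn


def monthly_totals_in_source(conn):
    # Step 1 (in-datasource): reduce raw rows to one total per month.
    return conn.execute(
        "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
    ).fetchall()


def fit_line(points):
    # Step 2 (in-memory): ordinary least squares on the small result set.
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


totals = monthly_totals_in_source(make_source())
intercept, slope = fit_line(totals)
print(totals)                 # [(1, 15.0), (2, 20.0), (3, 25.0)]
print(intercept + slope * 4)  # 30.0
```

Only three summary rows cross from the source to the in-memory step, regardless of how many raw rows the `sales` table holds.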