Overview

Big Data Connectivity for High Performance Analytics
Spotfire offers three primary types of native integration with Hadoop and other big data sources:
- Visualizing Data: Native out-of-the-box data connectors that enable fast, interactive data visualizations.
- Performing Calculations:
  - Bring the engine to the data: Integration with in-datasource distributed computing frameworks that enable data calculations of any complexity on big data.
  - Bring the data to the engine: Integration with external statistical engines that get data directly from any data source, including traditional databases.
Together, these modes of integration offer a combination of visual data discovery and advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structures with dashboards and workflows that are powerful and easy to use.

Big Data Connectors
Spotfire Big Data connectors support in-datasource, in-memory, and on-demand data access modes. This flexibility makes fast, interactive visualizations possible: calculations run within the data stores, and data is moved into client memory only if and when it is needed. Spotfire native data connectors include:
- Certified Hadoop data connectors for Apache Hive, Apache Spark SQL, Cloudera Hive, Cloudera Impala, Databricks Cloud, Hortonworks, MapR Drill and Pivotal HAWQ
- Other certified big data connectors include Teradata, Teradata Aster and Netezza
- Connectors for OSI PI historical and real-time sensor data sources
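The difference between in-memory and in-datasource access can be illustrated with a minimal, hypothetical Python sketch. It uses an in-memory SQLite table as a stand-in for a big data store; the table and column names are invented for illustration, and the code does not use any Spotfire API. Both functions compute the same per-region totals, but the in-datasource version pushes the aggregation down as SQL, so only the small result set is moved to the client.

```python
import sqlite3

# Stand-in "data store": an in-memory SQLite table. The table and column
# names (shipments, region, weight_kg) are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (region TEXT, weight_kg REAL)")
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?)",
    [("north", 12.0), ("north", 8.0), ("south", 5.0), ("south", 7.0), ("south", 9.0)],
)


def in_memory_totals(conn):
    """In-memory style: pull every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT region, weight_kg FROM shipments").fetchall()
    totals = {}
    for region, weight in rows:
        totals[region] = totals.get(region, 0.0) + weight
    return totals, len(rows)  # (result, number of rows moved to the client)


def in_datasource_totals(conn):
    """In-datasource style: let the data store aggregate; only the
    per-region results are moved to the client."""
    rows = conn.execute(
        "SELECT region, SUM(weight_kg) FROM shipments GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)  # (result, number of rows moved to the client)


print(in_memory_totals(conn))      # same totals, 5 rows transferred
print(in_datasource_totals(conn))  # same totals, 2 rows transferred
```

With five rows the savings are trivial, but the same pattern is what keeps client memory small when the table holds billions of rows.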

In-Datasource Distributed Computing
In addition to Spotfire's point-and-click SQL operations, which run distributed within the datasource, advanced statistical and machine learning algorithms can be initiated from Spotfire to run in-datasource on very large datasets, returning only the results needed for Spotfire visualizations:
- Users interact with point-and-click dashboards that call scripts executed by the TIBCO Enterprise Runtime for R (TERR) instance embedded in Spotfire.
- The TERR scripts initiate distributed computing jobs via MapReduce, H2O, SparkR, or Fuzzy Logix.
- These jobs drive high-performance engines deployed on the Hadoop or other datasource nodes.
- TERR can be deployed as the advanced analytics engine in Hadoop nodes that are driven by MapReduce or Spark. It can also be called on Teradata nodes.
- Results are visualized in Spotfire.
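The division of labor in the steps above, where each partition is summarized next to the data and only small partial results are combined, can be illustrated with a minimal map-and-combine sketch in plain Python. It is a stand-in under stated assumptions: it does not use TERR, Hadoop, H2O, or SparkR, and the partition data and function names are hypothetical.

```python
from functools import reduce

# Hypothetical data split across three "nodes"; in a real cluster each
# partition would live on a different machine.
partitions = [[1, 2, 3], [4, 5], [6]]


def map_partition(values):
    """Runs next to the data: summarize a partition as (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def combine(a, b):
    """Merge two partial summaries; only these small tuples leave the nodes."""
    return tuple(x + y for x, y in zip(a, b))


n, total, total_sq = reduce(combine, map(map_partition, partitions))
mean = total / n
# Population variance from the combined moments. This textbook formula can
# lose precision on large, badly scaled data; production code would use a
# numerically stable merge (e.g. Chan et al.'s parallel variance update).
variance = total_sq / n - mean ** 2
print(n, mean, variance)
```

Only three numbers per partition cross the network, regardless of how many rows each partition holds, which is the point of running the engine where the data lives.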

Putting it all together
Combining these capabilities means that sophisticated, robust analytic use cases can be encapsulated in easy-to-use interactive workflows. Business users can then visualize, analyze, and share the results without worrying about the details of the underlying data architecture.
Example: A Spotfire interface for configuring, running, and visualizing the results of a model that identifies characteristics of lost shipments. Through this interface, business users can perform calculations using both TERR and the H2O distributed computing framework against shipment transaction data stored in a Hadoop cluster.

Analytical Breadth for Big Data

Advanced and Predictive Analytics for Big Data
Users interact with point-and-click Spotfire dashboards to drive a rich array of advanced capabilities that enable prediction, simulation, and optimization. With big data, analysis can be performed in-datasource, only bringing back the aggregations and results needed to populate Spotfire visualizations.

Content Analytics for Big Data
Spotfire provides visualization and analytics for a largely untapped dimension of big data: unstructured text that is captured but hidden in documents, reports, CRM notes, weblogs, social posts, and other sources. Spotfire lets you visually analyze text-based data in 27 languages and blend it with structured data for added context, detail, and deeper insight.

Location Analytics for Big Data
Multi-layer, high-resolution maps are an excellent way to visualize big data. Spotfire's mapping capabilities let you create maps with as many reference and feature layers as you need, including features calculated with advanced analytics. In addition to geographic maps, Spotfire supports custom maps for visualizing data on warehouses, factory floors, semiconductor wafers, and many other layouts.

Machine Learning for Big Data
A broad range of machine learning methods is available in Spotfire as point-and-click data functions that users can invoke. Data scientists have access to the underlying R code and can extend the collection of data functions, which are shared with the user community for easy reuse.
Machine learning methods for continuous and categorical response variables are available in Spotfire and TERR including:
- Linear and logistic regression
- Decision trees, random forests, and gradient boosting machines (GBM)
- Generalized additive models
- Neural networks

Real-time Event Analytics for Big Data
Insights from visual analytics and modeling in Spotfire can be deployed, at the press of a button, to event processing systems and scored or run on real-time streaming data. This allows you to monitor real-time data and alert end users, such as marketers or engineers, when an anomaly occurs or a new trend begins to emerge. The alerts can combine recent event data with historical data, providing context that helps users gauge an event's importance and quickly decide on any necessary intervention.
TIBCO StreamBase is integrated with Spotfire for such real-time streaming analytics. StreamBase performs real-time calculations on streaming data using rules and models published in Spotfire, applying those insights automatically and pushing notifications to a wide array of channels, including text messages, email, databases, and BPM systems.
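The idea of applying a published rule to streaming events and enriching alerts with historical context can be sketched in a few lines of Python. This is a hypothetical illustration, not the StreamBase API: a simple z-score threshold stands in for "rules and models published in Spotfire", and the readings, threshold, and function names are invented for the example.

```python
import statistics


def build_baseline(history):
    """Summarize historical values; statistics.stdev needs at least two points."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    if std == 0:
        raise ValueError("historical data has zero spread; z-scores are undefined")
    return mean, std


def score_stream(events, baseline, threshold=3.0):
    """Apply a z-score rule to each incoming (event_id, value) pair and yield
    an alert, including historical context, when the value looks anomalous."""
    mean, std = baseline
    for event_id, value in events:
        z = (value - mean) / std
        if abs(z) > threshold:
            yield {
                "event": event_id,
                "value": value,
                "historical_mean": mean,
                "z_score": round(z, 2),
            }


history = [10, 12, 11, 13, 9, 10, 12, 11]  # hypothetical historical readings
incoming = [("e1", 11.5), ("e2", 20.0), ("e3", 8.0)]  # hypothetical live events

for alert in score_stream(incoming, build_baseline(history)):
    # Stand-in for pushing a notification to email, text, a database, etc.
    print("ALERT:", alert)
```

Because `score_stream` is a generator, it processes events one at a time as they arrive rather than waiting for a complete batch, which mirrors the streaming setting.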

Key Features
Scalable data visualizations
Spotfire big data visualizations can scale to represent billions of rows of data within an analysis.
Intuitive user interface
Spotfire dashboards and analytic workflows can encapsulate sophisticated use cases that enable business users to visualize, analyze, run calculations, and share the results.
Flexible data architecture
Spotfire's seamless user experience is made possible by a rich set of options for accessing data of any size, performing calculations of any type, and efficiently visualizing data aggregations or row-level details.
Agile platform
Spotfire's agile platform empowers business analysts to drive advanced analytic workflows and applications for big data and become truly data-driven.