In this post we explore adding R graphics to Spotfire. This is a case study of using dendrograms for the Spotfire Classification Modeling tool.
We will see how to augment the output of Classification Tree modeling in Spotfire with a commonly used graphic that summarizes complex models in an intuitive way. The steps to accomplish this will reveal several very useful ideas. The Spotfire Analysis that we create can be used either on the desktop or through the web. Usage through the web requires TIBCO Statistics Services, configured to run on TERR.
The Spotfire Classification Modeling tool (Tools > Classification Modeling…) gives the user a point-and-click interface for specifying a Logistic Regression or Classification Tree model. Running the model creates a page in the Spotfire Data Analysis with several panels and visualizations that expose split rules, variable importance, and other diagnostics. We describe how to add a dendrogram to this page that automatically updates whenever the model is run.
Many practitioners use a visualization called a dendrogram to summarize the structure of a tree model. The dendrogram commonly depicts the splitting structure of the tree, and has labels that describe the split rules and the composition of the nodes of the tree. See the example below, generated in R.
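For readers who want to draw such a dendrogram themselves, here is a minimal sketch in open-source R. It assumes the rpart package and its bundled copy of the kyphosis data (the same data used later in this post). It is not necessarily the code that produced the figure in this post, and the plotting options are illustrative choices.

```r
library(rpart)  # recommended package that ships with open-source R

# Fit a classification tree to the kyphosis data bundled with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Draw the branching structure, then label it.
plot(fit, uniform = TRUE, margin = 0.1)
# Adds split rules at the splits and class counts at the nodes.
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)
```

In an interactive session this opens a graphics window; when run non-interactively, R writes the plot to its default PDF device instead.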
Specify and Fit a Classification Tree Model
The first step is to create a classification tree model in Spotfire. We have opened a data file that summarizes kyphosis procedures. The Spotfire Data Table shown in the Table visualization has 81 rows, representing data on 81 children who have had corrective spinal surgery. The outcome, Kyphosis, is a binary variable; the other three variables (Age, Number, and Start) are numeric. The underlying kyphosis data set is taken from the TERR package Sdatasets, where it is documented.
We would like to model Kyphosis on Age, Number, and Start. In Spotfire, go to Tools > Classification Modeling, and specify the model as shown below. Click OK and the page shown in figure 1 is created. Notice that the variable Number is not important in this model. Next we observe carefully what the Classification Modeling tool created in Spotfire.
Objects Created by the Classification Modeling Tool
When the Classification Modeling tool ran the model “MyModel”, it launched a Data Function powered by TERR. How do we know this? Looking at the Data Table Properties dialog, we see four new Data Tables in the analysis, all of whose names are prefixed by our model name, “MyModel”. In this dialog, the Source Information tab for MyModel_fitSummaryTable reveals the details of the TERR Data Function that created it. Scrolling down, we see that the model was fit by the treeClassFit function, which comes from the SpotfireStats package.
Near the bottom, we see that the input named “data” comes from our kyphosis Data Table, limited by filtering.
We also see that in addition to the four Data Tables, a representation of the fitted model is returned, in the Document Property named “modelObj.MyModel”. The documentation for the treeClassFit function states that the fitted model is represented as a raw binary object, and tells us how to read this object for further analysis, which is exactly what we’ll do to create a dendrogram later.
By the way, the Source Information tab also reveals that the Data Function is named “MyModel”. You won’t find this Data Function listed in the Data Function Properties dialog; it is hidden, but can be addressed by IronPython scripts if desired.
Creating a Data Function to Render a Dendrogram
In this step, we create a Data Function that uses the model object identified above to render its dendrogram on the training set. The dendrogram will be shown in a Text Area, and will update every time that the model is rerun on the training set.
The Data Function script is shown below. We’ll go through it step by step.
It is useful to develop the Data Function script interactively in RStudio. RStudio is configured to run TERR in the same way as for an R engine, and the procedure is documented in detail. When developing the script, manually populate the inputs that Spotfire will supply, so that you can confirm the script runs as expected on a good variety of cases. Below, we see what this looks like in RStudio.
Now we return to the script.
To convert the raw binary modelObj, we use the BlobToSObject function from the SpotfireUtils TERR package. The SpotfireUtils package is loaded at the top of the script.
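The idea of carrying a fitted model around as raw bytes can be tried out in open-source R. The sketch below uses base R's serialize and unserialize as stand-ins; it is an assumption made for illustration, and it does not claim to reproduce the encoding used by SpotfireUtils. In the TERR script, the conversion step is the call to BlobToSObject on the modelObj input.

```r
library(rpart)

# A fitted tree to play the role of the model held in the Document Property.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Stand-in for the raw binary model object: base R serialization to a raw vector.
model_blob <- serialize(fit, connection = NULL)

# Stand-in for the conversion step (BlobToSObject in the TERR script).
restored <- unserialize(model_blob)

# The restored object carries the same node table as the original fit.
same_frame <- identical(restored$frame, fit$frame)
```

Developing against a locally created blob like this is one way to populate the model input by hand before wiring the script to Spotfire.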
Because TERR does not have direct graphics package support (TERR uses Spotfire as its primary environment for graphics), we use open-source R to generate the dendrogram. TERR calls on R for this via the RinR package, so R must be installed on your system to use this approach. The rpath input is supplied by Spotfire and contains the path to the desired R executable on your system. For use with TERR 3.2, R 3.1.3 is suggested, but other versions from R 3.0.x through R 3.2.x ought to work fine.
[The RinR package provides functions for running code in open-source R from TIBCO Enterprise Runtime for R, or running code in TIBCO Enterprise Runtime for R from open-source R. Using RinR, you can:
Compare results of running the same code in different open-source R and TIBCO Enterprise Runtime for R versions (see RCompare).
Call a function from an open-source R package and return the results to TIBCO Enterprise Runtime for R (see REvaluate).
Use open-source R to create a graphic, and then return the graphic to be displayed in TIBCO Spotfire or a browser (see RGraph).]
The variable msg is initialized to an empty string. If an error is encountered, msg will pass the error message back to Spotfire. Any error message is captured in the last four lines of the script.
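The error-handling pattern can be sketched in plain R as follows. The helper name run_safely and the use of tryCatch are assumptions made for illustration; the original script may capture errors differently (for example with try).

```r
# Hypothetical helper: run a rendering step and return image bytes plus a message.
run_safely <- function(render) {
  msg <- ""      # stays empty when nothing goes wrong
  img <- raw(0)  # empty placeholder for the image bytes

  # Return the condition object instead of stopping when render() fails.
  result <- tryCatch(render(), error = function(e) e)

  if (inherits(result, "error")) {
    msg <- conditionMessage(result)  # text to pass back to Spotfire
  } else {
    img <- result
  }
  list(img = img, msg = msg)
}

# A step that succeeds, and one that fails with a simulated error.
ok  <- run_safely(function() as.raw(1:3))
bad <- run_safely(function() stop("no model object supplied"))
```

Returning both values on every run means the message output is reset to an empty string whenever the rendering step succeeds.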
The RGraph function, from the RinR package, invokes R and runs the script supplied in its first argument. If its display argument is FALSE, as here, RGraph returns the graphic file as raw bytes. By setting display to TRUE, you can view the graph in the RStudio Viewer pane while developing the script.
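To see what "the graphic file as raw bytes" means, the sketch below reproduces the idea in open-source R alone: the plot is written to a temporary PNG file and read back as a raw vector. This is a stand-in for illustration, not RGraph itself, and the image size and plotting options are assumptions.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Render the dendrogram to a temporary PNG file.
png_file <- tempfile(fileext = ".png")
png(png_file, width = 640, height = 480)
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE, cex = 0.8)
dev.off()

# Read the file back as raw bytes, the form in which an image is handed to Spotfire.
img <- readBin(png_file, what = "raw", n = file.info(png_file)$size)
unlink(png_file)  # the bytes are in memory, so the file is no longer needed
```

A raw vector like img is the kind of value that can be mapped to a binary Document Property and shown in a label control.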
Also in the R script, the rpart package is loaded, and the model object is given the S3 class “rpart”. You might recall from the documentation for the treeClassFit function that the fitted model object has class “arbor”. The close connection between the rpart and arbor packages is described in the arbor package DESCRIPTION file. The rpart and arbor packages are part of the base distributions of R and TERR, respectively. The formatg function in the arbor package is required to plot the model object in R; in our script, formatg is assigned to the global environment just before the plotting functions are called.
Once the script is in a satisfactory form, it is copied into the Register Data Functions dialog. The Script, Input Parameters, and Output Parameters tabs must be updated appropriately.
The Input Parameters tab allows you to specify a display name for each input; similarly for outputs.
Click the Run icon in the Register Data Functions dialog to invoke the Edit Parameters dialog. Here, you map the inputs and outputs of the Data Function to objects in the Spotfire Analysis. For example, you might map the “path to R.exe” input to the value D:/Program Files/R/R-3.1.2/bin/R.
The “model object” input should be mapped to the Document Property named “modelObj.MyModel”. As we saw earlier, this Document Property holds the classification tree model that we fit on our training data.
For the outputs, let’s map “output image” to a Document Property named dendrogram.img, and “message” to a Document Property named dendrogram.msg. Notice we have checked the box labeled “Refresh function automatically”. Click OK to dismiss the Edit Parameters dialog, and close the Data Function Properties dialog.
Show the Dendrogram in the Analysis
To display the dendrogram in Spotfire dynamically, dismiss the Data Panel, create Text Areas at left and right, and keep the Model Summary panel and Variable Importance bar chart, as shown below.
Finally, place a range control filter for Age in the left-hand Text Area, and place label controls based on dendrogram.msg and dendrogram.img in the right-hand Text Area.
Change the range control filter setting, click the refresh icon in the Model Summary panel, and see the model summary and dendrogram update.
We have shown how to incorporate a dendrogram into the output of the Classification Modeling tool in a Spotfire Analysis. The approach requires access to an open-source R engine, and uses the RinR package, which allows TERR to call out to R graphics functions.
We found that the Classification Modeling tool in Spotfire is powered by TERR, and returns the model object for the fit in a Document Property that is available for further analysis in TERR.
By creating a TERR Data Function that receives the model object and generates a dendrogram with R via the RinR package, we expose a dendrogram in a Spotfire label control that is linked to the Classification Tree model. The dendrogram automatically updates each time the model is rerun, after filtering by Age on the training set, for example.