In this post, we will look at how to provide a “Customers who bought this also bought” type of recommendation. This familiar recommendation, which we see when shopping on Amazon, presents us with a list of products that are often of interest to us. How can we find products that are relevant to our customers so that we can make suggestions to them?
Affinity analysis allows us to find product categories that are often bought together by customers. The Customer Analytics Affinity Analysis Template for Spotfire in the Customer Analytics Template series lets us perform an affinity analysis and identify high-affinity product categories from historical purchases made by customers. To help illustrate the use of the template, it comes with a sample set of fictitious data that is created such that the data contains strong purchase patterns for certain categories of products. When categories frequently bought together are found, we can use them to drive further propensity analysis for cross-selling and make recommendations to our customers.
In our affinity analysis, the template uses an algorithm to generate an affinity score that measures how often two product categories are purchased together by the same customer. The algorithm uses a measure called the Jaccard index for the affinity score. It measures the similarity between the purchase histories of two product categories. The higher the score, the more similar the purchase histories of the two categories, and thus the more often the categories are purchased together.
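To make the score concrete, here is a minimal sketch of the Jaccard index in Python. It assumes each category's purchase history is reduced to the set of customers who bought from it; the template's internal data layout is not described in this post, so the function name, customer IDs, and sample sets below are purely illustrative.

```python
def jaccard_index(buyers_a, buyers_b):
    """Jaccard index of two collections of customer IDs: |A ∩ B| / |A ∪ B|."""
    a, b = set(buyers_a), set(buyers_b)
    union = a | b
    if not union:
        # No buyers in either category: treat the affinity as 0 (an assumption).
        return 0.0
    return len(a & b) / len(union)


# Hypothetical customer IDs per category, for illustration only.
camping_gear = {"c1", "c2", "c3", "c4"}
mens_outdoor = {"c2", "c3", "c4", "c5"}
womens_dresses = {"c6", "c7"}

high = jaccard_index(camping_gear, mens_outdoor)   # 3 shared buyers out of 5 distinct
low = jaccard_index(camping_gear, womens_dresses)  # no shared buyers
print(high, low)
```

The score ranges from 0 (no customer bought both categories) to 1 (exactly the same customers bought both).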
We are using the template’s sample dataset to go through the examples in this post. When working with your own data, the historical customer purchase data used by the template would first be prepared with the Customer Analytics Data Pre-processing Template.
The Affinity Analysis Template lets us select the product categories we want to analyze and generates an affinity score for every pair of product categories within that selection. The template provides 4 visualizations that work together to help us identify high-affinity products. Let us take a look at how to use these visualizations by analyzing the results generated for the sample dataset in the template.
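The idea of scoring every pair within a selection can be sketched as follows. This is not the template's implementation; it assumes the purchase history is available as (customer, category) records, and the function name and sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_affinity(purchases, selected):
    """Return {(category_a, category_b): Jaccard score} for every pair in `selected`."""
    # Collect the set of customers who bought from each selected category.
    buyers = defaultdict(set)
    for customer, category in purchases:
        if category in selected:
            buyers[category].add(customer)

    scores = {}
    for a, b in combinations(sorted(selected), 2):
        union = buyers[a] | buyers[b]
        scores[(a, b)] = len(buyers[a] & buyers[b]) / len(union) if union else 0.0
    return scores


# Hypothetical purchase records, for illustration only.
purchases = [
    ("c1", "camping gear"), ("c1", "men's outdoor"),
    ("c2", "camping gear"), ("c2", "men's outdoor"),
    ("c3", "camping gear"), ("c3", "dresses"),
    ("c4", "dresses"),
]
selected = {"camping gear", "men's outdoor", "dresses"}

scores = pairwise_affinity(purchases, selected)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

With n selected categories this produces n(n-1)/2 scores, which is the set of values that a category-by-category heat map would display.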
We start analyzing the affinity results with the Product Affinities Principal Components scatter plot, shown in Figure 1. The algorithm converts the affinity scores, which are similarity measures for every pair of product categories, into distance measures between the pairs of categories. It then plots the categories as points in a 2-dimensional scatter plot.
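The post does not say exactly how the template turns scores into distances or which projection it uses. As one plausible sketch, the example below assumes distance = 1 - affinity and uses classical multidimensional scaling (principal coordinates analysis) with NumPy to obtain 2-dimensional coordinates. The function name and the small affinity matrix are made up for illustration.

```python
import numpy as np


def embed_2d(affinity):
    """Classical MDS: place items in 2-D so distances approximate 1 - affinity."""
    affinity = np.asarray(affinity, dtype=float)
    distance = 1.0 - affinity  # assumed similarity-to-distance conversion
    n = distance.shape[0]

    # Double-center the squared distances to get a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distance ** 2) @ centering

    # Keep the two directions with the largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:2]
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


# Hypothetical affinity scores: items 0 and 1 are similar, as are items 2 and 3.
affinity = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
coords = embed_2d(affinity)
print(np.round(coords, 3))
```

Because only two dimensions are kept, some of the original distance information is discarded, so a 2-dimensional layout is an approximation of the pairwise distances rather than an exact picture of them.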
The plot lays out the categories graphically so that the differences between them stand out as much as possible. Categories separated by a large distance measure (i.e. dissimilar categories) are found far apart in the plot. For example, the men’s outdoor, women’s outdoor, and camping gear categories (in the top left corner) are very different from the 7 women’s clothing categories (dresses, shoes, skirts, sweaters, handbags, pants and suits) on the far right. We also find another group of categories at the bottom of the plot. Clusters of points like these will drive our analysis.
We will select the 3 categories (men’s outdoor, women’s outdoor and camping gear) on the top left. We use the Product Affinities heat map in Figure 2 to visualize the affinity scores for these categories. The product categories are represented on the horizontal and vertical axes. The color of each cell on the heat map where 2 product categories cross reflects the affinity score for that pair of categories. The darker the color, the higher the affinity score. We find that the 3 selected categories (the dark cells highlighted on the bottom right) have high affinity scores with each other.
With the same points selected, we use the 3rd and 4th visualizations to inspect the actual customer purchase history. Both of these visualizations, shown in Figure 3, are heat maps, and they use the same variables on both axes. The only difference between them is that one shows all the product categories being analyzed while the other shows only the categories selected in the Product Affinities Principal Components scatter plot.
To interpret these visualizations, look for purple bands that run across columns of product categories. They indicate that customers purchase those categories together. In this example, we confirm that the 3 categories (men’s outdoor, women’s outdoor and camping gear) are bought together by customers.
Let’s walk through another example. We start with the Product Affinities Principal Components scatter plot again, but this time, we select the cluster of points close to the bottom of the plot, as shown in Figure 4.
We explained earlier that points far apart on the plot are dissimilar to each other (and thus not often bought together). We also analyzed a cluster of 3 categories in the previous example and found that they are often bought together (high affinity and thus similar). Before we continue, let’s take note that the inverse is not always true. That is, product categories close to each other on the scatter plot may not always be bought together, as this example will illustrate.
Figure 5 shows the affinity scores highlighted for these categories in the Product Affinities heat map. Unlike the earlier example, we see a block with many relatively lighter-colored cells that reflect lower affinity scores. However, we can still see some darker cells in the highlighted region and we will come back to them later.
Next, we look at the customer purchase history heat maps, as shown in Figure 6. Inspecting the first purchase history heat map, we find purple bands running across the columns for certain pairs of categories, such as men’s dress shirts with men’s pants, and men’s running clothes with sports accessories. Women’s running clothes and sport accessories seem to have purple bands running across the same rows too. The purple bands for the other categories appear to be more randomly scattered. We can inspect the categories more closely using the heat map that has only the categories selected for this example.
For the cluster of points in this example, we conclude that only three pairs of categories are purchased together: men’s dress shirts and men’s pants; men’s running clothes and sports accessories; and women’s running clothes and sport accessories. Recall that earlier on, we found a few darker cells in the highlighted region of the Product Affinities heat map in Figure 5. Those cells correspond to these pairs of categories.
To satisfy ourselves that no pattern exists in the remaining categories (men’s coats, men’s suits, women’s tops, men’s sweaters, men’s polos, and women’s jeans), look at Figure 7, which shows the customer purchase heat map for only these categories. We do not find any purple band running across the categories.
Now that we have seen how the template can help us identify product categories commonly bought together by customers, you may want to try the analysis on your own data. You can do so by downloading the Customer Analytics Affinity Analysis Template for Spotfire from the TIBCO Community site.
This post is the fourth in the Customer Analytics Templates for TIBCO Spotfire series. Read the first three posts on knowing, segmenting, and performing propensity analysis on your customers. Ready to try? Download a free trial of TIBCO’s Spotfire analytics platform.