What is a Scatter Chart?
A scatter chart, also called a scatter plot, is a chart that shows the relationship between two variables. It is an incredibly powerful chart type, allowing viewers to immediately see a relationship or trend that would be difficult to spot in almost any other form.
Their origins are unclear, but modern-day scatter charts are based on René Descartes' Cartesian coordinate system, created in the 17th century. Scatter plots are widely used in science, particularly in scientific journals and publications.
Scatter charts have been said to be one of the most versatile and useful inventions in the history of statistical graphs. While this may be a bold claim, scatter charts take confusing data and make sense of it. They are far more than just a tool for visualization; they are a tool for discovery.
How Does a Scatter Chart Work?
Like most other graph or chart types, a scatter plot has an X and a Y axis. The X axis is the horizontal line and carries the independent variable; the Y axis is the vertical line and carries the dependent variable. An even scale is created on each axis, and then a mark or dot is made at the point where each observation's X and Y values intersect.
There are several patterns to look for within a scatter chart:
- Linear or nonlinear: A linear correlation forms a straight line through the data points, while a nonlinear correlation shows a curved relationship.
- Weak or strong: The stronger the correlation, the more tightly the dots cluster around the trend. A weak correlation leaves the data points more spread out.
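The strength of a linear relationship can also be expressed as a number. The sketch below uses the Pearson correlation coefficient, which is a common choice but is not named in this article, so treat it as an assumption. Values near 1 or -1 suggest a strong linear correlation and values near 0 a weak one; a curved relationship can score low even when the pattern is clear. The function name and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each value from its mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: dots that hug a straight line (strong correlation).
square_feet = [800, 1000, 1200, 1500, 1800, 2200]
price_thousands = [150, 180, 210, 260, 300, 370]
strong = pearson_r(square_feet, price_thousands)

# Hypothetical data: dots that are widely spread (weak correlation).
weak = pearson_r([1, 2, 3, 4, 5, 6], [3, 1, 4, 1, 5, 2])

print(f"strong: {strong:.2f}, weak: {weak:.2f}")
```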
In order to clearly show these relationships and trends, many scatter charts utilize trend lines. A trend line is drawn on the chart to emphasize the direction and strength of the trend.
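As a rough illustration of how software might compute a straight trend line, the sketch below fits one with ordinary least squares. This is one common method, assumed here for illustration; charting tools may use other fitting methods. The function name and data are hypothetical.

```python
def fit_trend_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: home size (square feet) against price (thousands).
square_feet = [800, 1000, 1200, 1500, 1800, 2200]
price_thousands = [150, 180, 210, 260, 300, 370]
slope, intercept = fit_trend_line(square_feet, price_thousands)

# The sign of the slope gives the direction of the trend.
direction = "upward" if slope > 0 else "downward or flat"
print(f"trend is {direction}: price = {slope:.3f} * sqft + {intercept:.1f}")
```

The slope shows the direction of the trend; how tightly the dots sit around the line indicates its strength.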
Best Practices for Scatter Charts
There are a few simple tips to make sure your scatter chart presents information cleanly and without any distortion to the data.
Start the Y axis at zero. While a scale accordion may occasionally be required to present the data more accurately, such instances are few. Be very careful when deciding whether an accordion is needed.
Keep the scale evenly spaced along both axes, so that equal distances on an axis always represent equal differences in value. This avoids distorting the data.
Think carefully about outliers. If there is reason to suspect they are incorrect, or if they do not add value to your story, it could be wise to exclude them.
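Before deciding whether to exclude a point, it can help to flag candidates systematically. The sketch below uses the 1.5 × IQR rule (Tukey's fences) on a single variable. That rule is a common convention rather than something this article prescribes, and the function name and data are hypothetical. It only flags values for review; whether to drop them remains a judgment call.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return a list of booleans marking values outside the IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical home prices in thousands; the last looks like a data-entry error.
prices = [150, 180, 210, 260, 300, 370, 2500]
flags = flag_outliers(prices)

# List flagged values for review instead of deleting them automatically.
suspects = [p for p, flagged in zip(prices, flags) if flagged]
print("values to review:", suspects)
```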
With scatter plots, it is often better to include more data and variables, not fewer. Unlike many other chart types, a well-made scatter chart does not become confusing as data is added. Consider varying the size and color of the dots to include more relevant data in a way that remains easy to understand.
Use trend lines. They are generally plotted by the software, although they can be added manually, and they make trends very clear to the viewer. However, do not use more than two trend lines, as this can be confusing.
When to Use Scatter Charts
Aside from scientific studies, there are a few times when businesses may decide to use a scatter chart:
- To identify anomalies
- To see how one variable affects another
- To see a correlation, pattern, trend, or relationship
A real estate agent may want to see the relationship between square footage and the price paid for homes. While this simple scatter chart will not show every variable, such as location, how recently a home was renovated, or the size of the garden, it will still give buyers and sellers an idea of what the market is doing and where a house may fit on the scale.
A business may want to see if there is a relationship between sales volumes and some other variable. Does the weather influence sales? The day of the week? What about the number of clothes on a rack? Are more clothes sold if there are more on display?
Benefits of Scatter Charts
Scatter charts have multiple benefits and advantages.
Clearly Shows Relationships
This is arguably the best chart for showing the relationship between two variables. Not only does it show how the two values relate for each individual data point, it also shows a whole pattern or trend across the data set.
Easy to Create and Understand
Perhaps because of their popularity, scatter plots are immediately understood. Their purpose is easily recognized, and their data is easy to digest. They are also simple to create.
The Range of Data Can Be Determined
The maximum and minimum values can be seen on a scatter plot, which is important for understanding the entire data set. However, outliers can create confusion.
Disadvantages of Scatter Charts
Can Have Too Much Data
If a scatter plot is over-plotted, patterns are hard to see because the points merge into a giant blob. So while a chart needs enough data to form a visible correlation or pattern, there is a point where more data becomes less helpful.
A heatmap may help by showing the most point-heavy parts of the chart. Consider color coding different data sets as well.
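One way to picture how a heatmap summarizes an over-plotted chart is to divide the plot area into a grid and count the points in each cell. The sketch below does that with fixed-size bins; the function name, bin sizes, and data are hypothetical, and real charting tools may bin differently.

```python
from collections import Counter


def bin_points(points, x_bin, y_bin):
    """Count how many (x, y) points fall into each grid cell."""
    counts = Counter()
    for x, y in points:
        # Floor division maps each coordinate to the index of its cell.
        cell = (int(x // x_bin), int(y // y_bin))
        counts[cell] += 1
    return counts


# Hypothetical points; two of them share the cell nearest the origin.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.1), (2.9, 2.9)]
density = bin_points(points, x_bin=1.0, y_bin=1.0)

# The densest cell is the one a heatmap would color most intensely.
densest_cell, count = density.most_common(1)[0]
print(f"densest cell {densest_cell} holds {count} points")
```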
There are times when data appears to have a pattern or association purely by chance. While height and cat ownership might look related, they probably are not.
Avoid plotting variables that are unlikely to be related.
Correlation Does Not Equal Causation
Always remember that correlation does not equal causation. Just because two variables are correlated does not mean that one causes the other. While it may seem that tall people own more cats, it is unlikely that tallness is the cause of cat ownership. More plausible relationships can fall into the same trap: sales may rise when the weather is cold, but is that due to the weather, or to some third variable such as the free hot chocolate the store offers to customers?
Do not assign causation based on a correlation.
Alternatives to a Scatter Chart
A fishbone diagram resembles a fish skeleton. The “head” is the problem, and the causes of the problem branch off the spine, much like fish bones do. This is the other primary chart that people use to help establish causation. However, it does not use quantitative data as a scatter plot does, and is instead closer to an organic brainstorming session. The two are very different charts, designed for different processes. Both deal with cause and effect, but that is where the similarity ends.