It’s possible that when you were learning algebra in high school, you had no idea that one day you would need to make a scatter plot to show real-world results.
The examples we had to plot in class always seemed contrived: hours spent studying versus test results, a group’s weight versus height, or sales of hot coffee versus the weather outside.
You may, however, need to use that high school math quite frequently as a working adult (or perhaps just as a curious one).
One of those times is undoubtedly when making a scatter plot. A scatter plot has so many practical uses that can assist you or your audience in understanding data and what it means.
Let’s go back to high school math for a moment because it’s possible that you left all your knowledge of what a scatter plot is back at your desk that was covered in scribbles.
What is a Scatter Plot?
A scatter plot is a data visualization that shows the relationship between two variables. Each data point is placed on the graph according to its values along an x-axis and a y-axis.
This kind of data visualization gets its name from the way that each of these data points appears to be “scattered” across the graph.
The purpose of using a scatter plot, also referred to as a scatter diagram or an x-y graph, is to find any patterns or correlations between two variables.
Check out this scatter plot illustration that was taken from one of Visme’s templates.
The two variables are the price of a home and its square footage. To test whether there was a correlation between them, we selected a sample data set from a small number of homes.
A scatter plot can reveal several characteristics of a pattern or correlation:
- Linear or Nonlinear: A linear correlation forms a straight line, whereas a nonlinear correlation may take the form of a curve or another shape within the data points.
- Strong or Weak: A strong correlation has data points that are close together, whereas a weak correlation has data points that are more distant from one another.
- Positive or Negative: A positive correlation points up (i.e., the y-values rise as the x-values rise), while a negative correlation points down (i.e., the y-values fall as the x-values rise).
If none of these characteristics are visible in your graph, your data shows no correlation.
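The direction and strength described above can also be put into a number. The sketch below computes the Pearson correlation coefficient, which measures linear correlation only, so a clear curve can still score close to zero. The sample home data and the 0.7 and 0.3 cutoffs in `describe` are illustrative assumptions, not values from this article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

def describe(r, strong=0.7, weak=0.3):
    """Label direction and strength; the cutoffs are illustrative assumptions."""
    if abs(r) < weak:
        return "little or no linear correlation"
    strength = "strong" if abs(r) >= strong else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

# Made-up sample: home size (square feet) vs. price (thousands of dollars)
square_feet = [850, 1200, 1500, 1800, 2400]
price = [155, 210, 240, 310, 390]

r = pearson_r(square_feet, price)
print(round(r, 3), describe(r))
```

A value near +1 or -1 corresponds to points hugging a rising or falling line, while a value near 0 corresponds to a plot with no visible linear trend.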
Scatter Plots: When to Use Them
There are specific guidelines for when each type of chart or graph will be the most effective data visualization to present your information.
Let’s examine when a scatter plot is the most effective tool for displaying your data.
Determine the relationship or correlation between two variables using a scatter plot.
Are you looking to see if the combination of your two variables might mean something? You can determine whether your data points have a possible relationship by plotting a scattergram with them.
Let’s imagine that you own an ice cream shop and are trying to figure out why recent sales have been so weak.
You could make a scatter plot that tracks sales against other variables, such as the outdoor temperature.
In order to determine correlation, you should always plot your scatter diagram with both the x-axis and the y-axis increasing as they move away from the origin.
As the example above shows, ice cream purchases generally decline when it’s cold outside.
When your dependent variable can have several values for each value of your independent variable, use a scatter plot.
Let’s briefly review what independent and dependent variables mean by going back to math class.
- A variable is the thing you’re trying to track or measure. A scatter plot has two of them: an independent variable, typically plotted on the x-axis, and a dependent variable, typically plotted on the y-axis.
- The independent variable is the controlled variable. It is what either changes naturally or is changed by whoever is running the experiment.
- The dependent variable is the one being investigated or measured. In a scatter plot, it is the variable whose correlation with the independent variable we are trying to ascertain.

If you’re trying to find a correlation between height and weight, the height will be plotted on the x-axis and the weight on the y-axis, as shown in the example below.
Given that weight fluctuates more than height does, your data will likely contain different weights for the same height, giving you more than one value of the dependent variable for a single value of the independent variable.
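To check whether a data set really has several dependent values for the same independent value, you can group the points by their x-value. This is a minimal sketch; the height and weight pairs are made up for illustration.

```python
from collections import defaultdict

def group_by_x(points):
    """Collect every y-value observed for each x-value."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return dict(groups)

# Made-up (height in cm, weight in kg) pairs
measurements = [(160, 52), (160, 61), (170, 64), (170, 70), (170, 75), (180, 82)]

weights_by_height = group_by_x(measurements)

# Heights that appear with more than one weight
repeated = {h: ws for h, ws in weights_by_height.items() if len(ws) > 1}
print(repeated)
```

A line chart would have to pick a single weight for each height, whereas a scatter plot simply draws every pair as its own point.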
When two variables go well together, use a scatter plot.
When two variables pair well, a scatter diagram is a great way to visualize their relationship and determine whether the correlation between them is positive or negative.
Consider the relationship between birth weight and gestational age, or the length of time the baby has been in the womb. It would stand to reason that a baby who had more time to grow inside its mother would be bigger and heavier.
Using a scatter plot, let’s examine this data.
As we might anticipate, a baby tends to weigh more at birth the longer it is able to “cook.”
Other examples of factors that seem to be related include the number of hours worked and the amount of money earned, the amount of time studied and the grade on a test, or the cost and the size of a diamond.
When Not to Use a Scatter Plot
Similar to when it makes sense to use a scatter plot to visualize your data, there are a few situations in which you should avoid using this style of chart.
When your data are not at all related, avoid using a scatter plot.
Some pairs of variables clearly have no correlation, and a scatter plot of them would be useless as a visualization.
For instance, it would not make sense to combine the students’ various heights and the number of pets they keep at home on a scatter plot if you were conducting a random survey on a classroom full of students.
While it can still be entertaining to plot these two variables, they obviously have no relationship at all, so a bar chart (one for each data value) might be a better option in this case.
When you have a lot of data, stay away from scatter plots.
Overplotting happens when you pack so many data points into a scatter plot that they fill the entire graph.
Check out the scatter plot below for another illustration. This type of diagram is challenging to interpret because it is so dense that it essentially merges into one big blob.
However, there are a few ways to deal with an overplotted scatter plot. Consider using a heatmap to identify the areas of your data with the greatest point density.
A heatmap-like effect could also be produced by using transparent data points, or you could color-code different data sets.
If none of these options help and your data still forms one large blob, your best bet is to forgo the scatter plot.
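The heatmap idea comes down to counting how many points land in each cell of a grid. The sketch below shows only that counting step, with made-up points and an arbitrary cell size; a charting tool would then shade each cell according to its count.

```python
from collections import Counter

def bin_points(points, cell_size):
    """Count how many points fall into each square cell of a grid."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Made-up points: four crowd into one corner, two sit elsewhere
points = [(1.0, 1.2), (1.4, 1.9), (1.8, 1.1), (0.5, 1.5), (6.2, 7.7), (9.9, 0.3)]

density = bin_points(points, cell_size=2)
densest_cell, count = density.most_common(1)[0]
print(densest_cell, count)
```

Smaller cells give a finer picture but leave more cells empty, so the cell size is a trade-off to tune for each data set.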
Things to Consider When Using a Scatter Plot
There are a few considerations to make as you examine your data if you choose to use a scatter plot to identify relationships or correlations.
Correlation does not imply causation.
The presence of a strong positive or negative correlation in your data does not prove that the independent variable causes the changes in your dependent variable.
These are only correlations: it appears that your independent variable affects your dependent variable in some way, but the plot alone cannot confirm it.
Let’s return to the ice cream sales illustration.
While it might appear that a drop in sales is directly related to the weather, there may be a number of other factors at play.
A hurricane, for example, might have forced an evacuation, which decreased business. A new ice cream shop might have opened nearby and added competition.
People simply don’t want to purchase ice cream on some days. And while the colder weather may well play a role, a correlation on a scatter plot is not something you should take as gospel.
There may be multiple dependent variables.
Your data set may contain more than one dependent variable, and a scatter plot can still display them all.
You only need to give each dependent variable its own color in order to compare them to one another on the scatter plot.
Let’s revisit the example of height versus weight.
To see whether the relationship also differed by sex, we split the data on that scatter plot into two groups: male and female. In order to distinguish between the two, we gave female points an orange color and male points a brown color.
This is yet another excellent method for avoiding overplotting. Making sure your data is color-coded will help to distinguish it and enable you to see more of your points.
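Color-coding starts with splitting the data into one series per group so that each series can be drawn in its own color. This is a minimal sketch of that split. The colors follow the example above, while the measurements and the `split_into_series` helper are hypothetical.

```python
def split_into_series(rows):
    """Turn (group, x, y) rows into one (xs, ys) pair of lists per group."""
    series = {}
    for group, x, y in rows:
        xs, ys = series.setdefault(group, ([], []))
        xs.append(x)
        ys.append(y)
    return series

# Colors follow the article; the measurements are made up for illustration.
colors = {"female": "orange", "male": "brown"}
rows = [
    ("female", 160, 55), ("male", 175, 78), ("female", 168, 62),
    ("male", 182, 85), ("female", 155, 50),
]

series = split_into_series(rows)
for group, (xs, ys) in series.items():
    print(group, colors[group], list(zip(xs, ys)))
```

Each series can then be handed to your charting tool as a separate data set with its own color.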
Create a dynamic chart title.
Did you know that you could make the title of your chart update by connecting it to a cell in your workbook? Although it’s a little bit of a hack, it’s a cool option that will make you seem smart to your boss, client, or mother.
Dynamic titles are best suited to data that is updated frequently, such as daily numbers entered manually or data retrieved from a database and imported into Excel.
I’m going to show you a PPC revenue report that is updated every day. The running total up to that day for the month will be displayed in the title. The steps you must follow are as follows:
Ensure that your data is formatted as a table, which is Excel’s equivalent of a simple database, and that it uses the correct number formatting. You should format as a table because, if you create a chart from a table, the chart will update as you add new rows to the table.
Additionally, when you simply enter data in a cell that is immediately beneath or to the right of a formatted table, the table automatically expands to accommodate the addition of any new data.
In a cell just below row 31 (to accommodate a full month), enter a SUM formula that includes all 31 rows, even though some may be blank if you are only partway through the month.
If we were using both columns of the table as data series, we could simply click any cell in the table and choose Insert > Charts > Column (Mac: Charts > Column).
However, in the table below, we would select only the header and the cells that have revenue information. The reason for this is that we don’t want the days of the week to turn into a data series. Chart Tools > Design > Chart Styles (Mac: Charts > Chart Styles) provides a wealth of formatting options.
Give your chart a title that introduces the running total. My title was “PPC Revenue for Oct:”. For details, refer to tip #4 above.
Since the chart area’s default fill is white and it is typically displayed on a white sheet (which I advise maintaining), we can change the fill to No Fill without anyone noticing.
To do this, select the chart, press Ctrl/Command-1, and select Fill: No Fill (Mac: Fill > Solid > Color > No Fill). You will need to turn gridlines off to pull this off, but you should do that anyway. This switch is located under View > Show (on a Mac, Layout > View).
Choose a cell above the chart, just to the right of the title, and use it to reference the cell with the total. A cell is referenced by simply placing an = sign in the cell, followed by the cell reference, which can be entered manually or selected using the mouse. Excel will visually indicate the cell you’re referring to by highlighting it in a light blue color. After that, format the cell using the same style that you used for your title.
All that is left to do is move the chart up and line it up with the title. Getting everything to line up perfectly required some maneuvering. Since I only had one data series, I simply removed the legend and presto! A catchy title.
The chart and title now update dynamically whenever a new row is added to the table. Impressive, no?
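The logic behind the dynamic title can be sketched outside Excel as well. The Python below is an illustration of the idea rather than anything Excel runs: `None` stands in for a blank cell (Excel’s SUM likewise skips empty cells), and the revenue figures, the currency format, and the `running_total` and `chart_title` helpers are assumptions made for this example.

```python
def running_total(daily_revenue):
    """Sum the days entered so far; None stands in for a blank cell."""
    return sum(v for v in daily_revenue if v is not None)

def chart_title(prefix, daily_revenue):
    """Join the fixed title text with the current running total."""
    return f"{prefix} ${running_total(daily_revenue):,.2f}"

# 31 slots for a full month; only the first three days are filled in (made up)
october = [120.50, 98.25, 143.00] + [None] * 28
print(chart_title("PPC Revenue for Oct:", october))

# Entering the next day's figure changes the title with no other edits
october[3] = 100.00
print(chart_title("PPC Revenue for Oct:", october))
```

Summing over all 31 slots from the start is what lets the title stay correct as each new day is filled in.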
It is obvious that charts offer dimensions that a table cannot provide. The good news is that, once you get the hang of it, you can use any combination of these methods to make your data more attractive and useful in a matter of minutes.