Scatter plot

A scatter plot (also called a scatter graph, scatter chart, scattergram, or scatter diagram) is a plot that displays the relation between two variables. The data are displayed as a collection of points, with the value of one variable determining each point's position on the horizontal axis (X-axis) and the value of the other variable determining its position on the vertical axis (Y-axis). Typically, the response (outcome or dependent) variable is placed on the Y-axis, and the predictor (explanatory or independent) variable, which we suspect may be related to the response, is placed on the X-axis.

A scatter plot can reveal the form, direction, and strength of the relationship or association between two variables. It helps answer questions such as:

  • Are variables X and Y related?
  • Are variables X and Y linearly related?
  • Are variables X and Y non-linearly related?
  • Are changes in Y related to changes in X?
  • Are there any outliers?
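
To make these questions concrete, here is a minimal sketch using simulated data. The variables x, y_linear, y_curved, and y_none are hypothetical and are not taken from the ACTIVE study; they only illustrate how a linear relation, a non-linear relation, and no relation might look side by side.

```r
# Simulated (hypothetical) data, used only for illustration
set.seed(1)
x        <- runif(100, 0, 10)
y_linear <- 2 + 0.5 * x + rnorm(100, sd = 1)   # linear relation
y_curved <- (x - 5)^2 + rnorm(100, sd = 2)     # non-linear (U-shaped) relation
y_none   <- rnorm(100)                         # generated independently of x

# Draw the three scatter plots in one row, then restore the old settings
op <- par(mfrow = c(1, 3))
plot(x, y_linear, main = "Linear")
plot(x, y_curved, main = "Non-linear")
plot(x, y_none,   main = "Unrelated")
par(op)
```

In the U-shaped case, Y clearly depends on X even though a straight line would describe the relation poorly, which is one reason to look at the plot before summarizing a relation with a single number.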

Some examples of scatter plots are given below.

[Figure: Examples of scatter plots]

Examples

To generate a scatter plot, the function plot() can be used. In the following, we plot the relationship between the age variable (in years) and the hvltt variable (verbal ability) from the ACTIVE study. The relationship between the two variables is not clear, although it tends to be negative.

> usedata('active')
> attach(active)
> 
> plot(age, hvltt)
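
The same idea can be tried without access to the ACTIVE data. The sketch below uses the cars data set that ships with R as a stand-in (an assumption made here for illustration; it is not part of the original example). It adds axis labels to the plot and uses cor() as one way to check the direction of a relationship numerically.

```r
# cars is a built-in R data set: speed (mph) and dist (stopping distance, ft)
with(cars, plot(speed, dist,
                xlab = "Speed (mph)",
                ylab = "Stopping distance (ft)",
                main = "Stopping distance vs. speed"))

# Pearson correlation: the sign gives the direction of the linear association
r <- with(cars, cor(speed, dist))
r
```

Here with() is used instead of attach(), so the variables are looked up in the data frame only for that one call.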

Add a regression line and a smoothing curve

Oftentimes, we are interested in whether two variables are linearly or nonlinearly related. We can better visualize the relationship by adding a straight regression line (linear) or a smoothed curve to the scatter plot. In R, the smoothed curve can be estimated using the loess.smooth() function or we can generate the plot using the scatter.smooth() function directly.
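
As a rough sketch of the first approach, loess.smooth() returns the points of the smoothed curve without drawing anything, so the curve can be added to an existing plot with lines(). The built-in cars data set is used here as a stand-in, which is an assumption made for illustration only.

```r
x <- cars$speed
y <- cars$dist

# Draw the scatter plot first
plot(x, y, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# loess.smooth() returns a list with components x and y for the fitted curve
fit <- loess.smooth(x, y)

# Add the smoothed curve to the existing plot
lines(fit$x, fit$y, col = "blue", lwd = 3, lty = 3)
```

Calling scatter.smooth(x, y) would produce the scatter plot and the curve in a single step.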

In the example below, we add both a regression line and a smoothed curve to the scatter plot of the age and hvltt variables. Note that their relationship appears to be nonlinear. Some comments about the code used:

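  • The lm() function fits a linear regression model.
  • The abline() function adds a line with a given intercept and slope to an existing figure.
  • The lpars option of scatter.smooth() passes a list of graphical parameters to the smoothed curve.
  • The lwd option sets the width of lines.
  • The lty option sets the type of lines (for example, 1 = solid, 2 = dashed, 3 = dotted).
  • The legend() function adds a legend to the existing figure.
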
> usedata('active')
> attach(active)
> 
> scatter.smooth(age, hvltt, lpars = list(col = "blue", lwd = 3, lty = 3))
> abline(lm(hvltt ~ age), col = 'red', lwd = 3)
> legend('topright', c('Linear', 'Smoothing'), lty = c(1, 3), lwd = c(3, 3), col = c('red', 'blue'))
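
To show what abline() does with a fitted model, the sketch below extracts the intercept and slope from lm() and passes them to abline() explicitly. It uses the built-in cars data set as a stand-in (an assumption for illustration; the variable names fit and b are hypothetical).

```r
# Fit a simple linear regression of stopping distance on speed
fit <- lm(dist ~ speed, data = cars)
b   <- coef(fit)   # named vector: "(Intercept)" and "speed"

plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# a = intercept, b = slope; abline(fit) draws the same line
abline(a = b[["(Intercept)"]], b = b[["speed"]], col = "red", lwd = 3)
```

Passing the fitted model directly, as in abline(lm(hvltt ~ age)), is a shortcut for the same operation.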

To cite the book, use: Zhang, Z. & Wang, L. (2017-2022). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.