Correlation Coefficient Guide: Pearson r, Interpretation & Examples
Quick Answer
- The Pearson correlation coefficient (r) ranges from –1 to +1, measuring the strength and direction of a linear relationship.
- r = +1 means perfect positive correlation; r = –1 means perfect negative; r = 0 means no linear relationship.
- r² (r-squared) tells you the proportion of variance explained: an r of 0.7 means ~49% of variance is shared.
- Correlation does not imply causation: confounders, reverse causation, and coincidence are always possible.
What Is a Correlation Coefficient?
A correlation coefficient is a number that quantifies the strength and direction of a relationship between two variables. The most common type, the Pearson product-moment correlation coefficient (denoted r), measures how closely two continuous variables follow a straight-line (linear) pattern.
Karl Pearson developed the formula in 1896, building on earlier work by Francis Galton. Today it is the most widely used statistical measure of association. According to a 2020 analysis in PLOS ONE, the Pearson correlation appears in over 75% of published research papers that report bivariate relationships.
The Pearson Correlation Formula
The formula for Pearson's r is:
r = ∑[(x−x̄)(y−ȳ)] / √[∑(x−x̄)² × ∑(y−ȳ)²]
In plain English: it divides the covariance of x and y by the product of their standard deviations. This normalization constrains r to the range [–1, +1].
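As a minimal sketch of the formula above, the function below computes r in plain Python. The name `pearson_r` and the choice to raise an error for constant input are assumptions made for this illustration, not part of any particular library.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: sum of co-deviations over the root of the product of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of (x - mean_x) * (y - mean_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3], [2, 4, 6])    # perfectly increasing line
r_down = pearson_r([1, 2, 3], [6, 4, 2])  # perfectly decreasing line
print(r_up, r_down)  # 1.0 -1.0
```

The two toy inputs hit the ends of the [–1, +1] range, which is a quick sanity check that the normalization behaves as described.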
Worked Example
Consider five students' hours studied (x) and exam scores (y):
| Student | Hours (x) | Score (y) |
|---|---|---|
| A | 2 | 65 |
| B | 4 | 78 |
| C | 6 | 82 |
| D | 8 | 90 |
| E | 10 | 95 |
Computing the Pearson r for this data gives r ≈ 0.982, indicating a very strong positive linear relationship. The r² of 0.964 means that study hours explain about 96% of the variance in exam scores in this sample.
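The arithmetic for the five students can be reproduced step by step. This is a small sketch that applies the formula directly to the table; the variable names are chosen here for illustration.

```python
import math

hours = [2, 4, 6, 8, 10]
scores = [65, 78, 82, 90, 95]

mean_h = sum(hours) / len(hours)    # 6.0
mean_s = sum(scores) / len(scores)  # 82.0

# Sum of co-deviations and sums of squared deviations
sxy = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))  # 144.0
sxx = sum((h - mean_h) ** 2 for h in hours)                            # 40.0
syy = sum((s - mean_s) ** 2 for s in scores)                           # 538.0

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3), round(r * r, 3))  # 0.982 0.964
```

In other words, r = 144 / √(40 × 538) ≈ 0.982.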
Interpreting Correlation Strength
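Jacob Cohen's widely cited 1988 guidelines for the behavioral sciences set benchmarks of r = 0.10 (small), 0.30 (medium), and 0.50 (large). The table below extends those benchmarks with a negligible band at the bottom and a very large band at the top: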
| Absolute value of r | Strength | Example |
|---|---|---|
| 0.00–0.10 | Negligible | Shoe size and IQ |
| 0.10–0.30 | Small | Income and happiness (r ≈ 0.20) |
| 0.30–0.50 | Medium | SAT scores and college GPA (r ≈ 0.40) |
| 0.50–0.70 | Large | Height and weight (r ≈ 0.60) |
| 0.70–1.00 | Very large | Study hours and test scores in controlled settings |
Context matters enormously. In physics, an r of 0.70 might be disappointing. In psychology, r = 0.30 can represent a meaningful and publishable finding. The American Psychological Association (APA) emphasizes reporting effect sizes alongside p-values rather than relying on arbitrary thresholds.
R-Squared: The Coefficient of Determination
Squaring the correlation coefficient gives r², which represents the proportion of variance in one variable that is predictable from the other. This is often more intuitive than r itself.
| r | r² | Variance Explained |
|---|---|---|
| 0.30 | 0.09 | 9% |
| 0.50 | 0.25 | 25% |
| 0.70 | 0.49 | 49% |
| 0.80 | 0.64 | 64% |
| 0.90 | 0.81 | 81% |
A correlation of 0.50 sounds strong, but it only explains 25% of the variance. The remaining 75% is driven by other factors. This distinction is critical for making predictions — even a “large” correlation leaves substantial unexplained variation.
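One way to see why r² is read as "variance explained" is to fit a least-squares line and measure how much variance it leaves behind. The sketch below does this for the study-hours data from the worked example; for a simple one-predictor linear fit, the explained share should equal r².

```python
import math

hours = [2, 4, 6, 8, 10]
scores = [65, 78, 82, 90, 95]
n = len(hours)
mean_h, mean_s = sum(hours) / n, sum(scores) / n

sxy = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, scores))
sxx = sum((h - mean_h) ** 2 for h in hours)
syy = sum((s - mean_s) ** 2 for s in scores)
r = sxy / math.sqrt(sxx * syy)

# Least-squares line: score = intercept + slope * hours
slope = sxy / sxx
intercept = mean_s - slope * mean_h

# Squared error left over after the fit, relative to total variation in scores
ss_res = sum((s - (intercept + slope * h)) ** 2 for h, s in zip(hours, scores))
explained = 1 - ss_res / syy

# The two printed numbers agree: r squared, and the explained share of variance
print(round(r ** 2, 4), round(explained, 4))
```

The equality holds only for a straight-line fit with a single predictor; with several predictors the analogous quantity is the model's R², not the square of any one pairwise r.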
Correlation vs Causation
This is one of the most important concepts in applied statistics. Correlation tells you that two variables move together. It does not tell you why.
Three explanations exist for any observed correlation:
- Direct causation: X causes Y (or Y causes X).
- Confounding: A third variable Z drives both X and Y. Ice cream sales and drowning deaths are correlated (r ≈ 0.85 seasonally) because both increase in summer heat — not because ice cream causes drowning.
- Coincidence: Spurious correlations exist everywhere. Tyler Vigen's Spurious Correlations project catalogued hundreds, including a correlation of r ≈ 0.998 between US spending on science, space, and technology and suicides by hanging, strangulation, and suffocation (1999–2009). The relationship is clearly meaningless.
A 2015 study in the American Journal of Epidemiology found that over 40% of media health headlines implied causation from correlational studies. Critical readers should always ask: was this a controlled experiment or an observational study?
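The confounding case above can be simulated. In this sketch a hypothetical standardized "temperature" variable drives two outcomes that never influence each other; the data are synthetic, and the partial correlation used to hold temperature fixed is a standard technique brought in here for illustration rather than something the article itself prescribes.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Synthetic data: temperature feeds both outcomes,
# and neither outcome appears in the other's equation.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
drownings = [t + rng.gauss(0, 0.5) for t in temperature]

r_xy = pearson_r(ice_cream, drownings)
r_xz = pearson_r(ice_cream, temperature)
r_yz = pearson_r(drownings, temperature)

# First-order partial correlation of X and Y with Z held fixed
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"raw r = {r_xy:.2f}, partial r given temperature = {r_xy_given_z:.2f}")
```

The raw correlation comes out strong while the partial correlation sits near zero, which is the signature of a relationship produced entirely by the shared driver.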
Pearson vs Spearman vs Kendall
| Method | Measures | Best For |
|---|---|---|
| Pearson (r) | Linear relationship | Continuous, normally distributed data |
| Spearman (ρ) | Monotonic relationship | Ordinal data, outliers present, non-normal distributions |
| Kendall (τ) | Concordance of pairs | Small samples, tied ranks, ordinal data |
Pearson is the default choice for continuous data with roughly normal distributions. Use Spearman when your data has outliers, is ordinal (like satisfaction ratings), or the relationship is monotonic but curved. Kendall's tau is more robust with small sample sizes (<30) and handles ties better.
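To make the difference concrete, here is a rough sketch of all three coefficients in plain Python. Spearman is computed as Pearson's r on average ranks, and Kendall is the simple tau-a variant with no tie correction; the helper names are illustrative, and statistical libraries typically offer more complete versions.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

def kendall_tau_a(xs, ys):
    """Tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but strongly curved

r_pearson = pearson_r(x, y)
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
print(r_pearson, rho, tau)
```

On this curved but strictly increasing data, the two rank-based measures reach 1 while Pearson's r falls short of it, because only Pearson penalizes departure from a straight line.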
Common Mistakes When Using Correlation
Ignoring Outliers
A single outlier can dramatically inflate or deflate Pearson's r. Anscombe's quartet (1973) famously demonstrated four datasets with identical r = 0.816 but completely different patterns — one driven entirely by a single outlier. Always plot your data before interpreting r.
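The outlier-driven case can be checked numerically. The sketch below uses the values of Anscombe's fourth dataset, in which ten points share the same x and a single point sits far to the right. Returning `None` for a zero-variance input is a choice made for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's r, or None when either variable has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# Anscombe's fourth dataset: ten points at x = 8, one point at x = 19
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

r_all = pearson_r(x4, y4)

# Drop the single point at x = 19 and recompute
kept = [(x, y) for x, y in zip(x4, y4) if x != 19]
r_without = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(round(r_all, 4), r_without)  # 0.8165 None
```

With the one extreme point removed, x no longer varies at all, so r is not even defined. The entire correlation rests on a single observation.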
Restricting the Range
Measuring correlation on a subset of data with limited variability suppresses r. For example, the correlation between SAT scores and college GPA appears weak at highly selective universities because all students have similar SAT scores. The true population correlation is higher.
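Range restriction is easy to reproduce with synthetic data. The sketch below assumes a population correlation of 0.6 between a standardized test score and GPA (an arbitrary value picked for the demonstration, not an empirical estimate), then recomputes r using only the top 10% of test scores.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
rho = 0.6  # assumed population correlation for this synthetic example

# Standardized synthetic scores constructed to correlate at about rho
test_score = [rng.gauss(0, 1) for _ in range(n)]
gpa = [rho * t + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for t in test_score]

r_full = pearson_r(test_score, gpa)

# Keep only the top 10% of test scores, as a highly selective school might
pairs = sorted(zip(test_score, gpa), reverse=True)[: n // 10]
r_restricted = pearson_r([t for t, _ in pairs], [g for _, g in pairs])

print(f"full range r = {r_full:.2f}, top 10% only r = {r_restricted:.2f}")
```

The underlying relationship is identical in both calculations; only the spread of the test scores changes, and that alone is enough to pull r down sharply.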
Assuming Linearity
Pearson's r only captures linear relationships. A symmetric inverted-U relationship between arousal and performance (the Yerkes–Dodson curve) would yield r ≈ 0, despite a strong and real association. Use scatterplots and consider non-linear alternatives.
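A quick check of the curved case: in the sketch below, performance is fully determined by the input level through a symmetric parabola, yet Pearson's r comes out as zero. The data are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

level = list(range(-5, 6))                # symmetric around 0
performance = [25 - v ** 2 for v in level]  # peaks in the middle, falls off on both sides

r = pearson_r(level, performance)
print(abs(round(r, 6)))  # 0.0
```

The rising half and the falling half cancel exactly in the numerator of the formula, so a perfectly predictable relationship registers as no linear relationship at all.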
Frequently Asked Questions
What does a correlation coefficient of 0.7 mean?
An r value of 0.7 indicates a strong positive linear relationship between two variables. As one variable increases, the other tends to increase as well. The r-squared value (0.49) means that approximately 49% of the variance in one variable is explained by the other.
Does correlation imply causation?
No. Correlation measures the strength of a linear relationship, but it does not prove that one variable causes changes in the other. The correlation could be due to a confounding variable, reverse causation, or pure coincidence. Establishing causation requires controlled experiments or rigorous causal inference methods.
What is a good correlation coefficient?
It depends on the field. In physics, r values above 0.95 are common. In social sciences, r = 0.30 may be considered moderate and meaningful. Cohen's guidelines classify 0.10 as small, 0.30 as medium, and 0.50 as large for behavioral research. Always interpret r in context.
What is the difference between Pearson and Spearman correlation?
Pearson correlation (r) measures linear relationships and assumes both variables are continuous and normally distributed. Spearman correlation (ρ) measures monotonic relationships using rank-ordered data and makes no distributional assumptions. Use Spearman when data is ordinal, has outliers, or the relationship is monotonic but not linear.
Can correlation be negative?
Yes. A negative correlation (r between –1 and 0) means that as one variable increases, the other tends to decrease. For example, the correlation between outdoor temperature and heating costs is strongly negative — as temperature rises, heating costs fall. An r of –0.8 is just as strong as r = 0.8, only in the opposite direction.