Correlation (vs. Causation): what it is and types of tests | Mixpanel

What are correlation metrics?

Last edited: Aug 22, 2022 Published: Jan 4, 2019
Mixpanel Team

An overview of correlation metrics

Correlation metrics measure whether there is a relationship between two variables: for example, whether rising product supply can be linked to a lull in customer demand. Once identified, statistical relationships help companies forecast sales, target marketing campaigns, and improve their service. Discovering and applying correlation insights requires a careful understanding of probability. Examples of correlation metrics:

  • A product team discovers that higher app usage is associated with more up-sells
  • A marketing team finds a lead’s score is not predictive of its pipeline value
  • An analytics team finds that individuals with highly rated managers take fewer sick days
  • A customer support team finds that customer retention rises around the holidays

Mixpanel's Signal helps automate correlation analysis

Why do correlation metrics matter?

Correlation metrics help companies make more informed decisions. Once teams identify and validate statistical relationships, they can act on them. If customer churn increases whenever new software bugs are discovered, for instance, an engineering team can make a good case for hiring more developers. The energy savings technology firm ETS, which helps building owners save money by conserving energy, uses correlations between building energy usage and historical weather data to advise customers on adjusting their heating and cooling. A correlation can be either positive or negative:

  • Positive correlation: When one variable increases, so does the other. For example, ice cream sales rise when the temperature rises.
  • Negative correlation: When one variable increases, the other declines, and vice versa. For example, as a product’s price rises, the quantity demanded typically falls.

Teams can also measure the degree to which two variables are correlated using Pearson’s correlation coefficient. The coefficient measures the strength and direction of the linear relationship between two variables on a scale of -1 to 1, where 1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates no linear correlation. If, in a music streaming app, song previews and song purchases have a coefficient of 0.94, they rise and fall together very closely. Simply identifying a correlation, however, is not enough. Just because two variables have moved together in the past doesn’t guarantee they’ll continue to do so in the future. For instance, just because the US trading markets fell on every day it rained in Quito, Ecuador over the past year doesn’t mean investing on that basis is safe. It could be a complete coincidence. Correlations are simply flags that a relationship between two variables warrants further investigation, and that the team should run tests.
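As a rough illustration of how the coefficient is computed, here is a minimal Python sketch. The `pearson` helper and the daily preview and purchase counts are made up for this example; they are not data from any real app.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily counts for a music streaming app.
previews = [120, 150, 90, 200, 170, 130, 220]
purchases = [30, 41, 22, 55, 44, 33, 61]

r = pearson(previews, purchases)
print(f"Pearson's r = {r:.2f}")
```

A value near 1 or -1 only says the two series move together in a straight-line fashion in this sample; it says nothing about why.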

Correlation versus causation—what’s the difference?

Just because two events occur at the same time doesn’t necessarily mean they’re related, or that one causes the other. As the University of California, Los Angeles anthropologist Jared Diamond writes in his book The World Until Yesterday, in the absence of scientific testing, humans are uniquely prone to conflate unrelated events. For instance, an indigenous tribe the professor lived among believes that sneezing is an ill omen because a tribesman once sneezed and proceeded to drown in a river. When the professor sneezed, his hosts wouldn’t let him cross a stream. It’s easy to find the idea of sneezing causing drowning absurd, but the same line of reasoning is endemic in the workplace. A Wharton study found that 57 percent of marketers commit basic errors in A/B testing that lead them to believe that, say, turning a call-to-action button orange is the key to landing more sales, when it has no impact. Sales teams are notoriously superstitious, believing lucky items such as business card holders influence deals, and product professionals believe reducing their sleep increases their productivity despite mounting evidence to the contrary. Observations alone can never prove cause and effect; only scientific testing can.

How to find correlations

Pearson’s correlation coefficient also has limitations: its simplicity obscures important details. Take, for example, the following diagram, known as Anscombe’s quartet. The four data sets share nearly identical means, variances, regression lines, and correlation coefficients, yet it’s clear to even a casual observer of the graphs that the data sets are quite different.

Source: Wikipedia
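The quartet's point can be checked numerically. The sketch below uses the commonly published Anscombe values (worth verifying against a trusted source before reuse) and a hand-rolled `pearson` helper written for this example.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Data sets I-III share the same x values; data set IV has its own.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

coefficients = {name: pearson(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in coefficients.items():
    print(f"Data set {name}: r = {r:.3f}")
```

All four coefficients come out almost the same, which is why plotting the data remains a necessary step alongside computing the number.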

To investigate a correlation, teams can:

Conduct tests

Teams can turn an observation, such as discount offers preceding a spike in sales, into a test. They develop a hypothesis such as ‘a modest discount will increase sales by 20 percent,’ launch an A/B test in which part of their user population is presented with the change and the rest is not, and then test the results for statistical significance. If the test results support the hypothesis, the relationship is likely to be reliable. The more often teams test a relationship, the more confident they can be in its utility.
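One common way to check an A/B result for statistical significance is a two-proportion z-test. The sketch below is a minimal version under that assumption; the function name, the conversion counts, and the 0.05 threshold are illustrative choices, not anything prescribed by this article.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: control (A) sees no discount, variant (B) does.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
significant = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant: {significant}")
```

This only tests whether the two rates differ at all. Checking a specific claim such as a 20 percent lift would also call for looking at the size of the observed difference and its confidence interval.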

Create a time scale chart

Time scale charts display two variables over a period of time. This reveals how the variables converge and diverge at various stages. Just because two lines look correlated to the naked eye, however, does not mean they are. It only suggests it. Teams can use time scale data to form a testable hypothesis.

Create a scatter plot

A scatter plot displays one variable on each axis, irrespective of time, to reveal grouping patterns. Grouping suggests relatedness. Groups can also reveal directional relatedness, and whether a correlation is positive or negative. If the charted angles of multiple groups align, it suggests relatedness.

Create a derivative time scale chart

A derivative time scale chart can reduce the variability in noisy data and make relationships clearer. It’s created by dividing one variable by the other, then plotting the data in a time scale chart.
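The division step can be sketched in a few lines of Python. The weekly sign-up and purchase figures and the `ratio_series` helper are hypothetical; the resulting series is what a team would then plot over time.

```python
def ratio_series(numerator, denominator):
    """Divide two aligned time series point by point.

    Periods where the denominator is zero yield None instead of raising.
    """
    return [n / d if d != 0 else None for n, d in zip(numerator, denominator)]

# Hypothetical weekly figures.
signups = [400, 520, 610, 480, 700, 750]
purchases = [40, 50, 63, 47, 72, 74]

purchases_per_signup = ratio_series(purchases, signups)
for week, ratio in enumerate(purchases_per_signup, start=1):
    print(f"Week {week}: {ratio:.3f}")
```

Here both raw series swing widely from week to week while the ratio stays in a narrow band, which is the kind of stable relationship this chart is meant to surface.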

Measure the distance correlation

The distance correlation is an attempt by mathematicians to compensate for the lack of granularity in Pearson’s coefficient. It compares the pairwise distances between observations of one variable with the pairwise distances between the corresponding observations of the other, so it can detect nonlinear as well as linear relationships. It ranges from 0 to 1: a score of zero means the two variables are independent, while a score close to one means they are highly interdependent.
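For readers who want to see the calculation, below is a minimal pure-Python sketch of the standard sample distance correlation for two one-dimensional series. The function names and the example data are illustrative assumptions. In the standard definition the value lies between 0 and 1, with higher values indicating stronger dependence.

```python
import math

def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row/column/grand means."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation of two equal-length numeric sequences."""
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant series carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]   # straight-line relationship
curved = [x * x for x in xs]       # U-shaped; Pearson's r is 0 by symmetry

dcor_linear = distance_correlation(xs, linear)
dcor_curved = distance_correlation(xs, curved)
print(f"linear: {dcor_linear:.3f}, curved: {dcor_curved:.3f}")
```

The U-shaped series is fully determined by `xs`, yet Pearson's coefficient misses it. The distance correlation stays clearly above zero, which is the gap the measure is meant to close.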

Conduct qualitative research

To fully understand whether observed correlations are reliable relationships, teams can interview their users. In the case of a SaaS product, teams could learn that a higher click-rate is actually a bad thing, because users are tapping the app out of frustration while waiting for it to load.

Leading indicators or lagging metrics

Teams hunting for correlations are typically searching for leading indicators: metrics that foretell an event, such as a decline in app downloads. If website visits always drop before app downloads do, website visits are a leading indicator of app downloads. The converse is a lagging indicator, or a metric that reports on something that has already occurred, like the sales data from a company’s quarterly earnings report.
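One simple way to look for a leading indicator is to correlate one series against later values of another at several lags. The sketch below assumes that approach; the helper names and the visit and download figures are hypothetical, and the downloads are constructed to trail visits by two periods so the effect is visible.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlations(leading, target, max_lag):
    """Correlate leading[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]  # drop the last `lag` points
        ys = target[lag:]                   # drop the first `lag` points
        result[lag] = pearson(xs, ys)
    return result

# Hypothetical weekly data; downloads echo visits from two weeks earlier.
visits = [100, 120, 90, 130, 80, 110, 140, 95, 125, 105]
downloads = [20, 22] + [v / 5 for v in visits[:-2]]

correlations = lagged_correlations(visits, downloads, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(f"Strongest correlation at a lag of {best_lag} weeks")
```

A strong correlation at a positive lag is only a candidate leading indicator. As with any correlation, it still needs to be tested before a team relies on it.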

What are correlation metrics tools?

To identify correlation metrics, teams need the ability to merge and manipulate lots of data. For that, they often use:

User analytics

User analytics serve as a single source of truth for customer data. They offer integrations with common systems such as websites, apps, CRMs, and customer support systems, and possess friendly interfaces that allow non-experts to pull reports and deduce insights.

Segmentation tools

Segmentation tools allow teams to segment their user data by characteristics like product type, customer age, usage, and more, to understand those groups in detail. Segmentation allows teams to view how variables influence particular groups or individuals, and is crucial for fully understanding correlations. Most user analytics offer segmentation, but some companies with non-standard datasets and formats need specialized tools, such as Segment, which integrates with Mixpanel user analytics.

Machine learning tools

Machine learning and artificial intelligence tools can help teams make sense of large, complex datasets that they don’t have time to scour. Most user analytics provide machine learning tools to detect anomalies and alert users to interesting relationships.
