Here’s the idea: sometimes you want to optimize for engagement, rather than
something more concrete like revenue or conversion rate. Measuring engagement
is a trickier task (one you should use Mixpanel for),
but that just makes it more interesting.
Let’s use Posterous (the service that hosts this blog) as an example startup
that might want to optimize for engagement. Perhaps they want to increase the
number of comments received on blog posts, so they drum up an A/B test between
their current layout (the control group) and one with a greater emphasis on
comments (the experimental group).
Once it’s been running for a while, we can start to analyze our hypothetical
A/B test. We can approach it in a few ways:
- Average actions per user
- Ratio of active to inactive users
- User activity distribution
Comparing average actions per user
The first and most naive approach is to simply compare the average number of
actions per user from each group in your A/B test. This is really simple to
do, and it looks something like this:
Hypothetical Posterous A/B Test: Average comments per visitor

| Group | Average comments per visitor |
|---|---|
| Control | 0.0222 |
| Experimental | 0.0296 |
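Here is a minimal sketch of this first metric in Python. It assumes you can export one comment count per visitor for each group, with zeros for visitors who never commented. The function names are hypothetical helpers and are not part of Mixpanel or any analytics API.

```python
def average_actions_per_user(comment_counts: list[int]) -> float:
    """Mean comments per visitor; visitors with zero comments are included."""
    if not comment_counts:
        raise ValueError("need at least one visitor")
    return sum(comment_counts) / len(comment_counts)


def relative_lift(experimental: float, control: float) -> float:
    """Relative change of the experimental metric over the control metric."""
    return experimental / control - 1


# Per-visitor averages quoted in this example (experimental vs. control).
print(f"{relative_lift(0.0296, 0.0222):.1%}")  # prints 33.3%
```

Keeping the zero-comment visitors in the denominator matters. Dropping them would turn this into comments per active visitor, which is a different metric.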
If we use this metric, we see that the experimental design is clearly winning,
with about 33% more comments per user (0.0296/0.0222 ≈ 1.33). The trouble
with this technique, though, is that we don’t know anything about the
distribution of users who commented. For all we know, all of the comments in
the experimental group could have been posted by a single user – and that
wouldn’t be optimal. This is an obvious exaggeration, but it leads nicely to
our next option:
Comparing active user ratios
We can avoid the issues with the previous method by ignoring the actual number
of comments and just looking at our visitors. We classify each visitor as
Active or Inactive – those who post at least one comment and those who don’t.
This lets us ignore any outliers, such as a visitor who posts a thousand
times. Now our table looks like this:
Hypothetical Posterous A/B Test: Active visitor ratio

| Group | Unique visitors | Active visitors | Ratio |
|---|---|---|---|
When we look at the proportion of visitors who posted at least one comment, we
can see that the control group is beating the experimental group by around 30%
– the complete reverse of our last conclusion.
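The active-visitor ratio is just as easy to compute. This sketch makes the same assumption of one comment count per visitor. The function name and the counts below are made up for illustration, and the counts show how the two metrics can point in opposite directions.

```python
def active_ratio(comment_counts: list[int], threshold: int = 1) -> float:
    """Fraction of visitors who posted at least `threshold` comments.

    Each visitor counts once, so a visitor who posts a thousand comments
    weighs exactly as much as one who posts a single comment.
    """
    if not comment_counts:
        raise ValueError("need at least one visitor")
    active = sum(1 for c in comment_counts if c >= threshold)
    return active / len(comment_counts)


# Hypothetical per-visitor comment counts for two tiny groups.
control = [0, 0, 0, 1, 1, 2]
experimental = [0, 0, 0, 0, 0, 12]

for name, counts in [("control", control), ("experimental", experimental)]:
    avg = sum(counts) / len(counts)
    print(f"{name}: average={avg:.2f}, active ratio={active_ratio(counts):.2f}")
```

On this toy data the experimental group wins on average comments (2.00 vs. 0.67) but loses on active ratio (0.17 vs. 0.50).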
It’s interesting that the two metrics we’ve used so far to measure user
engagement can give entirely different results – it shows that we really need
to look at the underlying distribution.
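A short worked decomposition shows why the two metrics can disagree. Let $N$ be a group’s unique visitors, $A$ its active visitors, and $C$ its total comments (these symbols are introduced here for illustration). The average from the first method equals the active ratio from the second method multiplied by the number of comments per active visitor:

```latex
\frac{C}{N} = \frac{A}{N} \times \frac{C}{A}
```

Now apply the rough figures in this example. Suppose the experimental group has about 1.33 times the control group’s comments per visitor, while the control group’s active ratio is about 1.3 times the experimental one. Then each active experimental visitor must be posting roughly $1.33 \times 1.3 \approx 1.7$ times as many comments as an active control visitor. The average blends both factors into a single number, and the active ratio sees only the first. That is why the distribution itself is worth examining.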
User activity distribution
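The most likely outcome of an A/B test like this is two distributions with
somewhat different shapes. They will still be broadly similar, and in all
likelihood both will be heavily skewed and power-law shaped, since the
vast majority of visitors don’t post at all. So, without further ado, here
are our distributions:
[Graph: number of visitors (y-axis) against number of comments left (x-axis), control group in green, experimental group in red]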
This graph may be a bit difficult to interpret. It shows the frequency of visitors with different comment counts – for example, there might be 1,000 visitors who left 2 comments, 345 who left 3 comments, and so on. This means that a point (X, Y) on the curve tells you that there were Y visitors who left X comments. Because this is just an example, the specific numbers don’t really matter. The most important part is the overall shape of the curve.
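Here is a minimal sketch of building the data behind this graph for one group. It assumes you have one comment count per visitor, with zeros included, and the function name is hypothetical. To draw the graph, you would plot the keys against the values for each group on the same axes.

```python
from collections import Counter


def activity_distribution(comment_counts: list[int]) -> dict[int, int]:
    """Map each comment count X to Y, the number of visitors who left exactly X comments.

    Each (X, Y) pair is one point on the curve described above.
    """
    histogram = Counter(comment_counts)
    # Sort by comment count so the points can be plotted left to right.
    return dict(sorted(histogram.items()))


# Hypothetical per-visitor comment counts for one group.
counts = [0, 0, 0, 0, 1, 1, 2, 5]
print(activity_distribution(counts))  # {0: 4, 1: 2, 2: 1, 5: 1}
```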
We can see that most users don’t comment at all, and that the two groups
behave very differently. The control group (green line) has more users who
write a small number of comments each, while the experimental group (red
line) has fewer active users who comment frequently.
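One way to make the difference between the two curves concrete is to compare the fraction of visitors in each group who posted at least k comments, for increasing k (the empirical complementary CDF). This is a swapped-in technique and not something the post prescribes. Under the pattern described above, the control group should lead at small k and the experimental group at large k. The helper name and the data below are made up purely for illustration.

```python
def fraction_at_least(comment_counts: list[int], k: int) -> float:
    """Fraction of visitors who posted at least k comments (empirical CCDF at k)."""
    if not comment_counts:
        raise ValueError("need at least one visitor")
    return sum(1 for c in comment_counts if c >= k) / len(comment_counts)


# Hypothetical groups of 100 visitors: many light commenters vs. a few heavy ones.
control = [0] * 90 + [1] * 6 + [2] * 3 + [3]
experimental = [0] * 95 + [1] * 2 + [4] * 2 + [10]

for k in (1, 2, 5, 10):
    ctrl_frac = fraction_at_least(control, k)
    exp_frac = fraction_at_least(experimental, k)
    if ctrl_frac > exp_frac:
        leader = "control"
    elif exp_frac > ctrl_frac:
        leader = "experimental"
    else:
        leader = "tie"
    print(f"k={k:>2}: control={ctrl_frac:.2f} experimental={exp_frac:.2f} -> {leader}")
```

On this made-up data the control group leads at k = 1 and 2, and the experimental group leads at k = 5 and 10. The experimental group also has the higher average (0.20 vs. 0.15 comments per visitor), which mirrors the situation in this example.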
The question becomes ‘do you want a smaller, highly engaged community, or a
larger, less engaged community?’ There’s no easy answer here; it’s more of a
think-long-and-hard sort of situation that depends heavily on the specifics
of your startup.
Ultimately, the distribution method is the most powerful, but it’s also the
most difficult to implement and analyze – especially since your results will
likely be less clear-cut than the contrived examples I’ve given here. If
anyone has some hard data, I’d love to hear about it – it would be great to
have a case study. Please email me at
email@example.com if you’re interested in sharing.
This post is based on a conversation I had with Jesse
Farmer a few weeks ago. If you haven’t read his blog,
you should – there are some real gems.