Why Mixpanel did anomaly detection for product analytics
Engineers use monitoring on their server analytics so that if metrics go awry, they can react quickly and make good decisions. How could Mixpanel make anomaly alerting and detection work for product analytics, too?
Diverse data is one of those things that makes machine learning at Mixpanel both incredibly challenging and fun. Our 20,000 customers send us 250 billion events a month, and no two projects are the same. Figuring out how to build machine learning models that work well for everyone, in every possible scenario, is both an art and a science – and it takes time. For instance, it took us almost a year to develop the high-quality conversion predictions we wanted for our first machine learning-backed product, Predict.
To handle the breadth and depth of our data, we’ve had to think differently about established models to solve hard problems in novel ways. So when we decided next to tackle anomaly detection and alerting for key metrics, we were inspired by some of the engineering tooling that already exists.
Engineering teams have already been using tools like StatsD to collect system metrics, and services like PagerDuty to alert them when there is a problem, though these tools require a lot of manual configuration and threshold tuning. Getting an immediate update when a server goes down or API requests spike is now standard operating procedure for engineering teams, and most engineers cannot imagine life without this kind of tracking and alerting.
But anomaly detection and alerting had never been applied to user and product metrics. Smart Alerts for mobile is changing that.
How Smart Alerts came to be
The original impetus behind Smart Alerts came to us when we realized an important metric for Mixpanel was off: first-time integrations.
First-time integrations are when Mixpanel customers first send us events to track. It’s a very important KPI for us because it’s tied directly to our revenue. So when in March we noticed that first-time integrations were down 10%, we did some digging and discovered that the welcome email open rate had decreased at around the same time.
Digging more, we found that we’d pushed a change to our welcome email at the same time. A seemingly minor edit caused the body of the mail to be clipped in Gmail, leading fewer people to click on the call to action or even see the instructions on how to begin Mixpanel integration.
Noticing what had happened, and piecing together the whole story, took longer than we would have liked. It was honestly kind of embarrassing. Being engineers, we wanted to see if we could solve the problem in an automated way.
Every company has key product metrics that they should be monitoring closely, but sometimes second-tier KPIs slip through the cracks while attention is elsewhere. Even worse, when something does inevitably go wrong, it might take weeks (or longer) to notice. At this point, it’s not only that users have potentially undergone a bad experience. It’s also become harder to figure out what happened and how to fix it. It’s demoralizing.
We thought, “What if we could just tell users that something’s going on, sooner?” We wanted to proactively push the information that Mixpanel users need to see, and to alert users if anything on their dashboards does something that’s unexpected. Could we make something for our customers – especially product teams – that gave them the same kind of alerting that engineering teams had? And even better, could we do it without all the manual configuration? That’s what we set out to do with Smart Alerts for mobile.
Why anomaly detection is so hard for product, and how we solved it
Anomalies are unexplained or unanticipated changes in your product metrics. To build a “warning system” for such a thing seems obvious now, even inevitable.
The main reason anomaly detection hasn’t been done on product is that it’s a lot harder to apply to product metrics. Anomaly detection is difficult in general: there is only limited prior work on backend systems, and no precedent for doing it on product metrics.
Engineers require alerting because it is essential that they know if a server goes down, and typically the only way to get that alerting is manual configuration and tweaking, so they are willing to do it.
Product managers would like this type of alerting, but they don’t generally have the time to figure out exactly what to monitor and what their thresholds should be. Even if they did, they often don’t have access to the proper tooling to connect their metrics system with an alerting system.
Product metrics also move much more randomly and are influenced by a greater number of variables than backend system metrics. A little bit of movement (such as our 10% drop in integrations) can sometimes matter a lot, so it’s much more challenging to find patterns among the noise in product analytics and lift out potential anomalies.
Wonja Fairbrother, an engineer on the machine learning team, presenting on anomaly detection and Smart Alerts at a recent Mixpanel Office Hours event.
In addition, Smart Alerts for mobile needs to work with wildly diverse datasets: high vs. low data volume, presence or absence of different daily, weekly, monthly and seasonal trends, general upward and downward trends, as well as noise in the data, and so on – everyone’s data is different and is almost infinitely nuanced. And yet one of the things I like most about Smart Alerts is its simplicity.
We created a dead-simple interface for Smart Alerts. There is literally no setup on your part, and in the Mixpanel mobile app, we can present you with an anomaly and prompt you to examine any unexpected event – all based on the reports you have already added to your dashboard.
Smart Alerts’ models can handle fairly complicated data movements: We’ll alert you when a surprising number of customers drop out of a two-week funnel by day three, or alert you when we detect a big change in the retention rates of your daily cohorts. We’ll even alert you when we expected a metric to change and it didn’t, such as seeing Sunday-level activity on a Monday.
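To make the "Sunday-level activity on a Monday" case concrete, here is a toy sketch in Python. It is not the model behind Smart Alerts, which this post does not describe. It simply assumes a metric with a weekly rhythm, compares each new value only against past values from the same weekday, and uses a robust z-score (median and median absolute deviation) with a conventional cutoff of 3.5. The function names, the sample numbers, and the cutoff are all made up for illustration.

```python
from statistics import median


def weekday_baselines(history):
    """Group past daily values by weekday (0 = Monday ... 6 = Sunday).

    `history` is a list of (weekday, value) pairs; this shape is an
    assumption made for the example.
    """
    groups = {}
    for weekday, value in history:
        groups.setdefault(weekday, []).append(value)
    return groups


def is_anomalous(groups, weekday, value, threshold=3.5):
    """Flag `value` if it is far from what is typical for that weekday.

    Uses a robust z-score: distance from the weekday median, scaled by
    the median absolute deviation (MAD). The 3.5 cutoff is a common
    rule of thumb, not a tuned value.
    """
    past = groups[weekday]
    center = median(past)
    mad = median([abs(v - center) for v in past])
    if mad == 0:
        # No spread in the history: any different value is unusual.
        return value != center
    robust_z = 0.6745 * (value - center) / mad
    return abs(robust_z) > threshold


# Four weeks of made-up daily active users: busy weekdays, quiet weekends.
weekly_pattern = [1000, 1020, 1010, 990, 950, 500, 400]
offsets = [-10, 0, 10, 20]
history = [
    (day, base + offset)
    for offset in offsets
    for day, base in enumerate(weekly_pattern)
]
groups = weekday_baselines(history)

sunday_level_on_monday = is_anomalous(groups, 0, 410)  # Monday
same_value_on_sunday = is_anomalous(groups, 6, 410)    # Sunday
normal_monday = is_anomalous(groups, 0, 1008)          # Monday
```

In this sketch, 410 users is flagged on a Monday but not on a Sunday, which is the point: a single fixed threshold could not tell those two cases apart. A real system would also have to cope with trends, low volume, and noisier data than this example pretends to have.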
For product leaders, Smart Alerts is exciting because you can determine when things are going wrong much sooner. It’s accessible to everyone through the Mixpanel app. And we’re looking to make things simpler and even more powerful in the future.
The future of machine learning at Mixpanel
Machine learning can take a lot of complicated and monotonous computations out of the hands of engineers, analysts, PMs and others, allowing them to focus on leveling up the products they can build. Machine learning gives engineers a unique opportunity to give anyone a way to make computers work in their best interest.
Without machine learning, there are so many trend lines, so many pieces of data and so many different cases that it can be impossible to know if something’s actually going right or going very wrong, since you can’t look at every single thing every day.
Our ultimate vision for machine learning at Mixpanel is to push insights directly to you. Smart Alerts for mobile is the first step toward that.
Jenny Finkel, PhD is the Engineering Manager and Tech Lead of Mixpanel’s Machine Learning Team. She received her undergraduate degree in Computer Science at Columbia University, followed by an MS and PhD in Computer Science from Stanford. She completed a postdoctoral fellowship at MIT and Columbia.