Alerts on Anomaly Detection
When something is off with your analytics, it warrants your attention. An anomaly can signal marketing campaigns that miss the mark, technical problems that affect usability, and even issues as serious as a security breach that needs to be addressed immediately.
Why Detecting and Addressing Anomalies is Important
An anomaly is anything that’s unexpected based on the established norm. After gathering data for a while, you’ll begin to see patterns and trends emerge. You’ll get a feel for what’s normal and average. Every now and then something will occur that doesn’t conform. That’s an anomaly.
In data analytics, anomaly detection, also referred to as outlier detection, is the act of identifying anomalies. Finding anomalies is just as important as figuring out your conversion rate or the number of monthly active users (MAU). In some cases, it’s more important.
Anomalies tell you something is off, possibly way off. When an anomaly pops up, it usually means something needs fixing. There could be a minor issue that’s throwing things off, or a major issue like a security breach. On the opposite end of the spectrum, anomalies can also signal an opportunity that you don’t want to miss. Either way, the faster you find out what caused the anomaly, the better off you’ll be.
How Anomalies Are Detected in Data
Anomalies can be detected manually by reviewing data reports, but that isn’t very efficient and you won’t catch things quickly. The more practical method is using an analytics platform with anomaly detection algorithms that can analyze vast amounts of data to immediately recognize anomalies.
The anomaly detection algorithm uses historical data and machine learning to establish baselines and forecast ranges for normal activity. Statistical modeling and significance tests are also used to validate the information. Anything outside of the normal range that passes the validation tests is detected as an anomaly.
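As a rough illustration of the baseline and forecast range idea, the sketch below flags any value that falls outside the mean plus or minus three standard deviations of the preceding seven values. This is a deliberately simple statistical stand-in, not the machine learning model a real analytics platform would use. The function name, the window size, the threshold k and the sample signup numbers are all assumptions made for the example.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=7, k=3.0):
    """Flag values outside mean +/- k * stdev of the preceding `window` values.

    Hypothetical helper for illustration only; `window` and `k` are assumed defaults.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]      # the data used to set the baseline
        baseline = mean(history)
        spread = stdev(history)             # sample standard deviation
        lower = baseline - k * spread       # bottom of the expected range
        upper = baseline + k * spread       # top of the expected range
        if not (lower <= values[i] <= upper):
            anomalies.append((i, values[i], (lower, upper)))
    return anomalies


# Made-up daily signup counts with one obvious spike.
daily_signups = [100, 104, 98, 101, 97, 103, 99, 102, 100, 180, 101]

for index, value, (low, high) in detect_anomalies(daily_signups):
    print(f"day {index}: {value} is outside the expected range {low:.1f} to {high:.1f}")
```

With this sample data only the 180 at index 9 is flagged. Note that the spike then becomes part of the history for the next day and widens its expected range, which is one reason production systems rely on more robust baselines than a plain rolling average.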
Anomalies can be detected by:
- Time Stamp – Historical data over time is used to detect when an anomaly occurs. This is a very basic type of anomaly detection that simply looks for anything that doesn’t fit within a period of time.
- Cluster – A time interval isn’t needed to detect an anomaly. If one data point sits far away from a cluster of similar data points, that indicates an anomaly regardless of timeframe.
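The cluster approach can be sketched without any timestamps at all. The example below treats each session as a point, finds a robust centre using the per-coordinate median, and flags points whose distance from that centre is unusually large compared with the rest. The function name, the two features (pages per session and minutes on site), the sample data and the 3.5 cut-off are assumptions chosen for illustration. A real platform would likely use a proper clustering algorithm and would scale the features first.

```python
from math import dist
from statistics import median


def cluster_outliers(points, threshold=3.5):
    """Return points that sit unusually far from the centre of the cluster.

    Hypothetical helper: uses a median-based score, not a full clustering algorithm.
    """
    # Robust centre of the cluster: the median of each coordinate.
    centre = tuple(median(coord) for coord in zip(*points))
    distances = [dist(point, centre) for point in points]

    # Median absolute deviation (MAD) of the distances.
    typical = median(distances)
    mad = median(abs(d - typical) for d in distances)
    if mad == 0:
        return []  # the distances are too uniform to score

    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        point
        for point, d in zip(points, distances)
        if 0.6745 * (d - typical) / mad > threshold
    ]


# Made-up sessions: (pages per session, minutes on site). No timestamps involved.
sessions = [
    (3.0, 2.0), (3.2, 2.2), (2.9, 1.9), (3.1, 2.1),
    (3.0, 2.0), (3.3, 2.4), (12.0, 15.0),
]

print(cluster_outliers(sessions))
```

Here the session with 12 pages and 15 minutes is the only point flagged, because it is far from where the other sessions group together.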
Anomalies tend to fall into one of three categories:
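- Context – A data point that is unusual only in a certain context or scenario, such as traffic that would be normal on a weekday but is far too high for a weekend.
- Collective – A group of related data points that is anomalous when taken together, even if each point looks normal on its own.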
- Point – A single data point that’s way off from everything else.
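The difference between a point anomaly and a context anomaly is easier to see in code. In the sketch below, each observation is compared only with past observations from the same context, so a value that looks ordinary overall can still be flagged. The context labels, the visit counts, the function name and the threshold k are all made up for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def contextual_anomalies(history, new_observations, k=3.0):
    """Flag new values that are unusual for their own context.

    Hypothetical helper: `history` and `new_observations` are (context, value) pairs.
    """
    by_context = defaultdict(list)
    for context, value in history:
        by_context[context].append(value)

    flagged = []
    for context, value in new_observations:
        past = by_context.get(context, [])
        if len(past) < 2:
            continue  # not enough history to judge this context
        centre, spread = mean(past), stdev(past)
        if abs(value - centre) > k * spread:
            flagged.append((context, value))
    return flagged


# Made-up daily visit counts, labelled by context.
history = [
    ("weekday", 1000), ("weekday", 1040), ("weekday", 980), ("weekday", 1010),
    ("weekend", 400), ("weekend", 420), ("weekend", 390), ("weekend", 410),
]
new_observations = [("weekday", 1020), ("weekend", 1000)]

print(contextual_anomalies(history, new_observations))
```

A count of 1,000 visits is unremarkable on a weekday, but on a weekend it is far outside the usual range, so only the weekend observation is flagged.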
Essentially, anything that doesn’t conform with previous data (and, in more sophisticated analytics platforms, with what’s projected) should be flagged as an anomaly. But accurate anomaly detection can be challenging.
Testing the Accuracy of Anomaly Detection
There are a few ways to know how accurate your anomaly detection is. One such method is called a confusion matrix. A confusion matrix compares predicted results with actual data to determine how close the analytics algorithm’s predictions were to what really happened. The matrix will have:
- True Positives – Events that were predicted to be anomalies and really were anomalies. The higher this number is, the better your platform is at detecting real anomalies.
- True Negatives – Events that were correctly predicted to not be anomalies.
- False Positives – A false positive is when something is predicted to be an anomaly but in fact isn’t and results in a false alarm.
- False Negatives – An anomaly that should have been predicted but wasn’t is a false negative.
When the true positives and true negatives are high and the false positives and false negatives are low, the anomaly detection is highly accurate.
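A confusion matrix is straightforward to compute once you have the detector’s predictions and a record of what actually happened. The sketch below counts the four outcomes and derives three common summary figures from them. The function names and the sample predictions are assumptions made for the example, not output from any particular platform.

```python
def confusion_matrix(predicted, actual):
    """Count the four outcomes. Each input is a list of booleans (True = anomaly)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for was_predicted, was_anomaly in zip(predicted, actual):
        if was_predicted and was_anomaly:
            counts["tp"] += 1   # real anomaly, correctly flagged
        elif not was_predicted and not was_anomaly:
            counts["tn"] += 1   # normal event, correctly ignored
        elif was_predicted:
            counts["fp"] += 1   # false alarm
        else:
            counts["fn"] += 1   # missed anomaly
    return counts


def summarize(counts):
    """Derive accuracy, precision and recall from the four counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,  # share of alerts that were real
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,     # share of real anomalies caught
    }


# Made-up results for ten events.
predicted = [True, False, False, True, False, False, True, False, False, False]
actual    = [True, False, False, False, False, True, True, False, False, False]

counts = confusion_matrix(predicted, actual)
print(counts)
print(summarize(counts))
```

For this sample there are 2 true positives, 6 true negatives, 1 false positive and 1 false negative, which gives an accuracy of 0.8. Because anomalies are rare, accuracy alone can look good even when real anomalies are being missed, so it helps to check precision and recall as well.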
Examples of Anomalies That Need Attention
There are many types of anomalies, some good and some not so good. Sometimes it all depends on the metrics that are tracked and selected for anomaly detection. Here are a few example anomalies for a better idea of what to watch out for.
Sudden drop in sales of a particular product.
A sudden drop in sales conversions usually means there is some sort of technical issue preventing users from completing a purchase. However, sales drops in a particular location or region could be caused by an event like a natural disaster.
Dramatic drop-off in website traffic.
This anomaly can signal that, for some reason, your website or a web page went offline. If a data analyst or someone on the product management team is monitoring data daily, they can catch it quickly and correct the technical issue so that revenue loss is minimized. The drop-off could also signal that a website’s rankings have fallen after a Google algorithm change and SEO efforts need to be made to increase visibility.
Unexpectedly high churn rate.
If the churn rate suddenly spikes, you’ll need to analyze what changed shortly before the anomaly. Did you release an update? Did you increase prices? Did your app have a bug that made it crash? Knowing exactly when the anomaly occurred can help you identify and correct the problem.
Significant increase in sign ups or sales.
This is one of the good anomalies that are a welcome sight. It can indicate that a new marketing campaign is performing better than expected, that a change on the sign up or product page is boosting conversion or that recent exposure is having a positive impact on the business.
Higher pageviews week-over-week.
Another anomaly that’s good to see is a growing number of pageviews. It indicates an abnormal traffic pattern that suggests current users are viewing more pages, the total number of visitors is increasing, or both. Over time, the forecast range may change if the higher pageview count is sustained.
Anomaly detection doesn’t just tell you when a significant change happens. It also points you towards the reason behind the change. With the right analytics tools all this can be done automatically so that anomalies are addressed immediately without having to analyze data on a daily basis.