Data Anomaly Detection
Data anomaly detection, also known as outlier analysis, identifies data points that deviate from the expected pattern in a dataset. Once an anomaly is detected, it can be analyzed to determine what pushed the data point outside the norm.
The Importance of Anomalous Data and Knowing What’s Normal
Anomalous data can indicate both issues and opportunities.
After collecting data for a while, you’ll get a baseline range for what’s normal within the metrics that you are tracking. You’ll also notice patterns within the baseline. You can even drill down to find data patterns for certain user cohorts, time of day, etc. It’s possible to make informed projections for the future based on the data baseline and patterns.
Anything that is outside of the baseline range or projections is considered a data anomaly.
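As a rough illustration of the baseline idea, the sketch below builds a normal range from historical values and checks whether a new value falls outside it. The choice of mean plus or minus three standard deviations, the function names and the sample numbers are all assumptions made for this example; analytics platforms may define the baseline very differently.

```python
from statistics import mean, stdev


def baseline_range(history, k=3.0):
    """Return (low, high) bounds: mean +/- k sample standard deviations.

    The multiplier k is an illustrative assumption, not a fixed rule.
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def is_anomaly(value, history, k=3.0):
    """True when value falls outside the baseline range built from history."""
    low, high = baseline_range(history, k)
    return value < low or value > high


# Hypothetical daily sign-up counts collected while establishing the baseline.
daily_signups = [102, 98, 105, 99, 101, 97, 103, 100, 104, 96]

print(is_anomaly(101, daily_signups))  # within the baseline range
print(is_anomaly(150, daily_signups))  # far above the baseline range
```

A wider multiplier produces fewer alerts and a narrower one produces more, so the threshold is usually tuned per metric.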
An anomaly indicates that something isn’t going as expected based on the data patterns. That includes a metric that stays flat when the pattern shows it should have changed.
Think of it this way. Your business is generating a constant flow of data, but the data itself isn’t constant. Metrics ebb and flow, influenced by countless actions and outside forces. What’s most important is knowing the data patterns and anticipating the changes. That will tell you when something is off.
If you were a data manager for an online tutoring service, you’d find that engagement indicators, traffic and conversions drop off in the summer. But if you were to see a spike in service requests in a particular state in July, it would catch your attention because that’s typically when tutoring services are in low demand.
Further investigation could reveal that the state had recently approved the addition of a placement exam at the start of the school year. The anomaly led you to information that can significantly help your business in the coming years. You’ll know to recruit more tutors in the area or virtual tutors that are familiar with the exam curriculum. You can also work on increasing your summer marketing budget and running additional campaigns to grab more of the market share.
Key Performance Metric Anomaly Detection
Since there are countless data points that can be tracked, it helps to focus anomaly detection on key performance indicators (KPIs). These are the metrics that matter most in terms of hitting business goals.
Data anomaly detection can be used for KPIs such as:
- Bounce rate
- Time spent in checkout
- Cost per lead
- Feature use
- App installs
- Average purchase value
The Mixpanel analytics system allows you to go a step further by establishing Events. These are meaningful user actions that are usually part of a conversion funnel and often fit the definition of a KPI.
Primary Uses for Data Anomaly Detection
Data anomaly detection can enhance business operations and boost revenue in many ways. Below are three uses that apply to most businesses:
Product/Service Quality – Anomalies can indicate when a product, feature or service isn’t performing as expected.
User Experience – You’ll know which behavior is normal, which actions are shared by super users and whether a problem is negatively impacting user experience.
Application Performance – An anomaly related to an application that’s used by a business to carry out day-to-day functions can alert you when productivity is off. Anomalies within application performance metrics are also used to monitor apps offered to users by a business.
Types of Data Anomalies
There are five primary types of data anomalies:
When a group of anomalies is present in a data subset, it’s known as a collective anomaly. Collective anomalies are identified by comparing datasets.
Contextual anomalies, also known as conditional anomalies, are when a data point is well outside the norm for a metric within a certain context. It may not be an anomaly in another context.
A global anomaly, often called a point anomaly, is a data point that deviates from the entire dataset regardless of context.
A univariate anomaly applies to just one variable.
Multivariate anomalies involve two or more variables.
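The difference between an anomaly that stands out from the entire dataset and a contextual one can be sketched in a few lines. The example below reuses the tutoring scenario with made-up weekly request counts, and it assumes a simple z-score test with a threshold of 3; the data, the `classify` helper and the threshold are hypothetical and only meant to show the distinction.

```python
from statistics import mean, stdev

# Hypothetical weekly service requests, grouped by season (the context).
history = {
    "summer": [20, 22, 19, 21, 18, 23, 20, 21],
    "school_year": [95, 102, 98, 110, 105, 99, 101, 97],
}


def zscore(value, sample):
    """Distance of value from the sample mean, in sample standard deviations."""
    return (value - mean(sample)) / stdev(sample)


def classify(value, context, threshold=3.0):
    """Label a value as 'global', 'contextual' or 'normal'.

    The threshold of 3 standard deviations is an assumption for illustration.
    """
    all_values = [v for sample in history.values() for v in sample]
    if abs(zscore(value, all_values)) > threshold:
        return "global"      # unusual against the entire dataset
    if abs(zscore(value, history[context])) > threshold:
        return "contextual"  # unusual only within this context
    return "normal"


print(classify(90, "school_year"))  # normal
print(classify(90, "summer"))       # contextual
print(classify(500, "summer"))      # global
```

The same value of 90 requests is unremarkable during the school year but stands out in the summer, which is what makes it a contextual anomaly.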
Methods of Data Anomaly Detection
Manual data anomaly detection is a tedious task. Millions of data points can be collected and reported in numerous ways. Fortunately, automated anomaly detection is now possible. Mixpanel is one of a few data analytics platforms that uses machine learning algorithms to automate anomaly detection and send automatic alerts when something looks off. You can still manually check any anomaly that’s detected.
Time Series Data
Time series data is a major component of anomaly detection for manual and automatic methods. It’s a collection of data values over a period of time. Time series data creates a record that reveals metric baselines and patterns that are used to make projections. These projections are used by analytics systems to identify data anomalies and actionable information.
Once a baseline is established, time series data is also used to identify cyclical patterns and seasonality.
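To show why seasonality matters, here is a minimal sketch that compares each observation only with other observations at the same position in the cycle (for example, Saturdays with Saturdays). The weekly period, the three-standard-deviation threshold, the function name and the traffic numbers are assumptions chosen for the example, not a description of how any particular analytics system works.

```python
from statistics import mean, stdev


def seasonal_anomalies(series, period, k=3.0):
    """Return indices whose values fall outside mean +/- k * stdev of the
    other observations at the same position in the cycle."""
    flagged = []
    for i, value in enumerate(series):
        peers = [v for j, v in enumerate(series)
                 if j % period == i % period and j != i]
        if len(peers) < 2:
            continue  # not enough history for this position in the cycle
        center, spread = mean(peers), stdev(peers)
        if abs(value - center) > k * max(spread, 1e-9):
            flagged.append(i)
    return flagged


# Hypothetical daily visits (in thousands) for four weeks, Monday to Sunday.
# Weekends are normally quiet; the fourth Saturday is unusually busy.
visits = [
    100, 104, 98, 101, 97, 40, 42,
    102, 99, 102, 100, 105, 41, 39,
    99, 101, 100, 99, 102, 43, 40,
    101, 103, 97, 102, 100, 95, 41,
]

print(seasonal_anomalies(visits, period=7))  # [26], the fourth Saturday
```

A value of 95 would look ordinary next to the weekday numbers, so a single baseline for the whole series would miss it. Comparing like with like is what surfaces it.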
Examples of Data Anomaly Detection
Data anomaly detection is only effective if you know how to use the information. It’s fairly complex so we’ve provided a few examples to help highlight the key concepts and how to make use of the findings. You may find similar scenarios in your data once normal ranges are established.
Sudden and Dramatic Drop-Off in Traffic
A decrease in traffic is far from uncommon. It can be a result of seasonality or a slowdown in a successful marketing campaign. But when traffic drops off suddenly and sharply, it’s an anomaly, particularly if the numbers go back up to where they should be without intervention.
If traffic suddenly drops off and doesn’t tick back up, you’re probably dealing with a serious problem that you may or may not be able to rectify in a timely manner. You could discover the cause for this anomaly is a Google algorithm change that reduced your ranking. In that instance, you’ll need to research what Google changed before determining how to regain your site ranking and traffic.
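The two drop-off cases described above, one that recovers on its own and one that doesn’t, can be told apart with a simple check against a trailing average. The sketch below is one possible approach; the seven-day window, the 50% drop threshold, the 90% recovery threshold and the traffic figures are all hypothetical values picked for illustration.

```python
from statistics import mean


def classify_drop(series, window=7, drop_ratio=0.5, recovery_ratio=0.9):
    """Find the first value that falls below drop_ratio * trailing mean and
    report whether any later value climbs back to recovery_ratio * that mean.

    Returns None when no sudden drop is found.
    """
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if series[i] < drop_ratio * baseline:
            recovered = any(v >= recovery_ratio * baseline
                            for v in series[i + 1:])
            return {"index": i, "baseline": baseline, "recovered": recovered}
    return None


# Hypothetical daily visit counts.
transient = [1000] * 7 + [300, 980, 1010]  # dips once, then bounces back
sustained = [1000] * 7 + [300, 310, 290]   # dips and stays down

print(classify_drop(transient)["recovered"])  # True
print(classify_drop(sustained)["recovered"])  # False
```

A drop that never recovers is the case that calls for the deeper investigation described above, such as checking for a search ranking change.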
Sudden Increase in Traffic
This is one of the welcome anomalies, but it still needs investigation so you can fully capitalize on the situation causing the traffic increase. If you’ve just launched a new marketing campaign, the increase may be anticipated. The traffic anomaly can tell you how successful the campaign is and where the traffic is coming from so you can allocate marketing spend accordingly.
However, let’s assume the traffic increase wasn’t anticipated and the business hasn’t made recent marketing changes. You investigate the cause of the anomaly and find that a social media influencer with millions of followers mentioned one of your products, causing an influx of traffic. You now know that influencer would be a great partner and can reach out to start building a relationship.
These are just two common examples of how data anomalies are used to measure and monitor the health of a business. Automated data anomaly detection that catches outliers when they occur allows businesses to intervene quickly so that the impact of issues is minimized and opportunities are optimized.