Data Insights

What is predictive analytics?

Christopher Gillespie

Companies use predictive analytics to forecast future events based on past data. It helps them mitigate risks and identify opportunities, such as estimating the lifetime value of customers (LTV or CLV) or planning for economic downturns. Predictive analytics involves data mining, statistics, and machine learning. Prediction is complex—lots can go wrong—so teams should predict with caution and choose their tools wisely.

The predictive analytics process

1. Define goals

Like all analytical endeavors, prediction begins with planning. Teams scope out the needs of each business unit that’s involved, such as product, marketing, customer support, or analytics. What are their goals? What questions do they hope to answer? How do they prefer to view data? What sources are they most interested in?

Teams that leap into making predictions before scoping often find themselves backpedalling. “Step one is make a tracking plan,” says Joe Corey, a data scientist and founder of the remote engineering software Subspace. “Without it, you’re going to miss something. And the further you get along in the process, the harder it becomes to go back and add a new data source, change the syntax, or alter the information hierarchy.”

Need help launching analytics? Learn about Mixpanel’s professional services.

2. Collect data

Data collection comes second, and it’s the most time-consuming step. Teams identify their data sources, verify that the data is clean and up-to-date, and connect those sources to an analytics tool. Because most data decays with time, and information like customer records can expire every few years, this stage often suffers from mission creep. Solving one data issue sometimes exposes more data issues and the project balloons beyond its scope.

Sometimes, connecting data is easy, as is the case when the analytics tool offers a pre-built connection, API, or webhooks. Other times, it must be done manually, with CSV files, file transfer protocols (FTP), and custom code.

Once the data is integrated, teams are able to view data from multiple sources within the analytics tool interface. A SaaS company, for example, can connect its CRM data, product data, website data, call center notes, and third-party data providers, and begin inferring how pillars of its business affect one another.

3. Analyze

Once the data is prepared, teams query it to test hypotheses. For instance, they check whether product outages help predict support call volumes, whether a user’s email domain can predict that they’ll use a fraudulent credit card, or whether a user who completes their profile will remain a customer for longer than normal.

It’s critical that teams test the statistical significance of each of their findings before accepting them as truth. If the sample population’s behaviors don’t reflect the entire user population, and a company acts on the prediction anyway, it can make inferior, sometimes costly, decisions. For example, if a call center decides to staff more agents on the assumption that call volume is influenced by the weather and it turns out not to be, they can waste budget on payroll and overtime.
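One common significance check is a two-proportion z-test: did users who completed their profile really retain at a higher rate, or could the gap be chance? A minimal sketch in Python, with hypothetical counts:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 180 of 1,000 profile-completers retained vs. 150 of 1,000 who didn't complete
z, p = two_proportion_z_test(180, 1000, 150, 1000)
print(z, p)  # p comes out above 0.05 here, so don't act on the difference yet
```

If p falls below the team’s chosen threshold (commonly 0.05), the gap is unlikely to be chance; otherwise, collect more data before acting.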

Read How and When to Calculate Statistical Significance

4. Data modeling

When teams isolate what they believe are causal relationships, where one factor such as button clicks reliably influences another, such as sign-ups, they can build a model around it. The term ‘model’ is widely misunderstood. It isn’t necessarily a computer program—it’s just an equation that summarizes a relationship and suggests an outcome. For example, saying ‘If X occurs, Y is twice as likely to occur.’ Any algebraic equation is, technically, a model.

Examples of predictive models:

  • SaaS: Leads that visit 10+ web pages are 50 percent more likely to purchase.
  • E-commerce: A shopper that abandons an item is 3x more likely to click an ad.
  • Media & entertainment: Listeners that like heavy metal are 2x more likely to enjoy rock.

Teams often backtest their models to ensure that if they had had the model in the past, it would have accurately predicted outcomes. For example, the team behind a personal investment app could simulate a portfolio’s performance over the past fifty years to ensure that it consistently made money.
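In code, a backtest can be as simple as replaying a candidate rule over historical records with known outcomes and scoring how often it would have been right. A minimal sketch, echoing the SaaS example above (the rule and the records are hypothetical):

```python
# Candidate model (hypothetical rule): leads that visit 10+ pages will purchase.
def model(lead):
    return lead["pages_visited"] >= 10

# Historical records with known outcomes (hypothetical data).
history = [
    {"pages_visited": 14, "purchased": True},
    {"pages_visited": 3,  "purchased": False},
    {"pages_visited": 11, "purchased": False},
    {"pages_visited": 22, "purchased": True},
    {"pages_visited": 5,  "purchased": False},
]

# Backtest: how often would the model's prediction have matched reality?
hits = sum(model(lead) == lead["purchased"] for lead in history)
accuracy = hits / len(history)
print(accuracy)  # 4 of 5 records match -> 0.8
```

A real backtest would run over far more records and also check where the model fails, not just how often it succeeds.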

The most common types of models include:

  • Decision trees use a virtual flowchart to list all possible outcomes for an event. Decision trees are the simplest method for prediction, and are especially useful when a model has missing values or unknown variables.
  • Regression analyses quantify the relatedness of multiple variables and express it as a correlation coefficient. Variables can be loosely correlated, highly correlated, or not correlated at all.
  • Machine learning models often rely on a neural network, an algorithm loosely modeled on the human brain, to discover the relationship between variables. Neural networks are powerful, but they can’t always explain why they reached a particular conclusion.
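To make the regression bullet concrete, here is a Pearson correlation coefficient computed from scratch; the click and sign-up figures are hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 or -1 means a perfect linear
    relationship, 0 means no linear relationship at all."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Weekly button clicks vs. sign-ups (hypothetical data)
clicks = [120, 150, 90, 200, 170]
signups = [12, 16, 8, 22, 18]
r = pearson_r(clicks, signups)
print(r)  # close to 1: clicks and sign-ups move together
```

A coefficient this high suggests a strong relationship, but as the challenges below note, correlation alone never proves causation.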

5. Deploy

If a model works, teams use it. But they should always test its effectiveness with a sample population before rolling it out widely, the same way they A/B test new product designs before launch. Good old-fashioned human judgement is an effective quality check to make sure the model results and recommendations make sense. If they seem particularly far off the mark, it’s an opportunity for teams to double-check their data, model, and underlying assumptions.

6. Monitor

Teams should continue to fine-tune their model as they learn more. What works in a test environment very rarely works in reality without adjustment. But real performance data from real users is the first step toward improving the model so it produces reliably useful results. “I really can’t stress the importance of launching early and collecting data from actual users,” says Joe. “It’s like Mike Tyson said—everyone has a plan until they get punched in the face. Reality is always throwing punches.”

Predictive analytics versus prescriptive analytics

Analytics fall into four categories based on the degree to which they are focused on the future. Predictive analytics and prescriptive analytics are very similar, except that prescriptive analytics go one step further to recommend or trigger an action, as with a self-driving car. Predictive analytics software is used to forecast while prescriptive analytics software is used to advise.

  • Descriptive analytics provide data about the past.
  • Diagnostic analytics display data in real time.
  • Predictive analytics forecast future events using past data.
  • Prescriptive analytics forecast future events and recommend actions.

Mockup: past, present, future, action

Predictive analytics challenges

The only thing growing faster than the use of predictive analytics is its misuse. As big data, fast computing, and user-friendly software have grown cheaper, all manner of companies now attempt to prognosticate the future. In their hurry, many commit age-old errors such as relying on false assumptions.

Take the Greek tale of Croesus, king of Lydia, for example. Croesus had a dream warning him that his son would soon be killed with a spear. Croesus grew paranoid, forbade his son from participating in battles, and sent him on a hunt along with a bodyguard to keep him safe. On the hunt, the bodyguard killed his son with a spear. Croesus’ false assumption that people only die from spears in battle led to his son’s death.

Despite abundant data, most people are no different from Croesus, and base their predictions on assumptions. Companies hire on the assumption that their revenue will continue to grow, invest on the assumption that the market won’t change, and launch products on the assumption that they understand what customers want. Assumptions are like earthquake fault lines that threaten the models that rest upon them. But even the worst assumptions can be neutralized if they’re accounted for.

“All models are wrong but some are useful.” – George E. P. Box

According to the late statistician George E. P. Box, professionals can rely on models if they maintain a healthy sense of skepticism. “All models are wrong but some are useful,” he said in a 1978 talk where he suggested that all models should be interrogated and used only if they’re confirmed to be “illuminating and useful.” To test their models, teams should check whether they’re built on assumptions that are faulty or subject to sudden change, such as:

Assuming the future will be like the past

Sometimes, events occur that have never happened before. These are called black swan events, based on the idea that no amount of observing white swans can confirm the hypothesis that all swans are white. (Unless one views every swan in existence.) It’s only when someone sees a black swan that they know the hypothesis is untrue. But because black swans never occurred in past data, a predictive model could never have foretold them.

This presumption of continuity is the most common false assumption in business. It’s why Microsoft CEO Steve Ballmer famously predicted the iPhone would never succeed, Oracle was slow to recognize the opportunity of cloud-based software, and Blockbuster failed to pivot to streaming video. Their executives weren’t wrong—according to the past data.

Mistaking correlation for causation

Just because two events occur together doesn’t mean one causes the other, or that they’ll continue to be related in the future. “I’m certain there’s a stock out there whose success seems to be perfectly predicted by the temperature in Ulaanbaatar, Mongolia, according to past data,” wrote Nassim Taleb, the author of Fooled by Randomness. “But that in no way means they’re related.”

Yet seductively simple correlations often prove irresistible. Analysts, marketers, product teams, and customer support managers latch onto models that appear sound because they involve data, but are spurious and essentially just superstitions: for example, that the number of cranes visible on a city’s skyline predicts construction cycles, or that social media noise predicts fluctuations in the stock market.
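Taleb’s point is easy to reproduce: screen enough unrelated random series and one will correlate strongly with your target by pure chance. A sketch in which every number is random noise:

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation for two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

random.seed(7)
# A stand-in "temperature in Ulaanbaatar": pure noise.
temps = [random.gauss(0, 1) for _ in range(30)]

# Screen 1,000 equally random "stock returns" for the best match.
best_r = max(
    abs(corr(temps, [random.gauss(0, 1) for _ in range(30)]))
    for _ in range(1000)
)
print(best_r)  # a strikingly high correlation, found in noise by chance alone
```

The more hypotheses a team tests against the same data, the more “significant” relationships it will find by accident, which is why findings need out-of-sample confirmation.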

Overfitting

Predictive models should be relatively simple. When they’re too complex, it can be an indication that the model has become what’s known as overfitted—it matches the past data too perfectly, and thus has no predictive value, according to Nate Silver, author of The Signal and the Noise. For example, a company that predicts its next customer will be an insurance company whose name begins with an “A,” because that was true of the last several customers, has narrowed its focus too much.
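Overfitting is easy to demonstrate: a model that memorizes its training data scores perfectly on the past and badly on anything new, while a simpler model generalizes. A sketch with synthetic data, where the true signal is the line y = 2x and the noise is unlearnable:

```python
import random

random.seed(0)

# True process: y = 2x plus noise the model should NOT try to learn.
def sample(n):
    return [(x, 2 * x + random.gauss(0, 5)) for x in range(n)]

train, test = sample(100), sample(100)

# Overfitted "model": memorize every training point exactly.
memorized = dict(train)
# Simple model: fit a single slope by least squares through the origin.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train_err = mse(lambda x: memorized[x], train)  # exactly 0: a "perfect" fit
overfit_err = mse(lambda x: memorized[x], test)
simple_err = mse(lambda x: slope * x, test)
print(train_err, overfit_err, simple_err)
```

The memorizing model’s flawless training score is the warning sign: it learned the noise along with the signal, so its error on fresh data is roughly double the simple model’s.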

Underfitting, good fit, and overfitting

Insufficient data

Companies often lack enough data, or data of high enough quality, to predict with accuracy. Sometimes, collecting more data would be too expensive. For example, a consumer tech company might have to interview tens of thousands of customers to produce statistically significant results. Other times, data carries inherent biases, such as a survey that was only sent to a particular segment of customers and only includes responses from happy customers who were willing to take the time to fill it out.
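The survey figure above follows from the standard sample-size formula for a surveyed proportion, n = z²·p(1−p)/e², which shows why precision gets expensive: halving the margin of error quadruples the required sample. A quick sketch:

```python
import math

def required_sample_size(margin, confidence_z=1.96, p=0.5):
    """Respondents needed so a surveyed proportion is accurate to +/- margin
    at ~95% confidence (p = 0.5 is the worst case)."""
    return math.ceil(confidence_z ** 2 * p * (1 - p) / margin ** 2)

print(required_sample_size(0.05))  # ~385 responses for a 5-point margin
print(required_sample_size(0.01))  # ~9,604 responses for a 1-point margin
```

And these counts assume an unbiased sample; no sample size fixes a survey that only reached happy customers.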

What can be done about assumptions? Teams must develop what the anthropologist Jared Diamond calls “constructive paranoia”: endlessly second-guess their assumptions and the validity of their models before declaring a relationship true. And even when a relationship appears incontrovertible, continue to test.

Industries that benefit from predictive analytics tools

Predictive analytics platforms are useful in any industry that benefits from foreknowledge. Which is to say, almost every industry:

  • Retail and e-commerce: Recommend products, understand buyer behavior.
  • Consumer tech: Predict which users will convert.
  • Finance: Control risk, measure markets, predict downturns.
  • Insurance: Assess risk, detect fraud, predict a customer’s likelihood to file a claim.
  • Government: Predict when infrastructure will need maintenance.
  • Healthcare: Identify at-risk patients, forecast demand for hospital beds.
  • Manufacturing: Monitor product quality, avert expensive recalls, forecast demand for utilities such as electricity.
  • Media and entertainment: Predict the popularity of movies and shows, recommend content.
  • Telecommunications: Maximize cross-sell and up-sell, measure the value of new markets.

Predictive analytics examples

Travel Republic predicts why users book trips

The travel booking site Travel Republic wanted to understand the journeys customers took to purchase, and how the company might increase bookings. Like many travel sites, the company suffered low conversion rates because many browsers never buy. The team deployed Mixpanel’s predictive analytics solution to gather more granular data on every website visitor event to predict visitors’ reactions to product changes. The team discovered that they could increase the booking fee without harming conversions, and made the site more profitable.

Read the Travel Republic case study.

Vente-Privee tightens its funnel to increase purchases

The French flash-sale retailer Vente-Privee needed to understand how messages and product adjustments could lead customers to purchase more frequently. It deployed Mixpanel’s anomaly detection feature to automatically alert its team to interesting fluctuations in user behavior. The team was able to understand every step customers took and launched behavior-based notifications to bring users back to the app at critical conversion points.

Read the Vente-Privee case study.

Lemonade predicts user preferences and increases purchases 250 percent

The tech-powered insurer Lemonade wanted to analyze its more than 100,000 policyholders to understand what attracted them to the platform, and what prevented them from purchasing additional products. Mixpanel’s anomaly detection feature helped alert the team to unusual dips and spikes in user drop offs so they could build a sales funnel that increased purchase rates 250 percent.

Read the Lemonade case study.

How to predict with accuracy

As the aphorism goes, prediction is difficult—especially about the future. Companies can make their predictions better by investing in the teams, tools, and training that help them benefit from the upsides of forecasts without suffering the downsides. For example, hiring analysts and data scientists who specialize in data and forecasting, partnering with analytics platforms that have successful customers in a similar industry, and creating prediction guidelines that include checklists. For example, teams should always ask:

  • Is the data accurate?
  • Are the tests statistically significant?
  • What assumptions does our predictive analytics model rely on?
  • What would happen if those assumptions changed?
  • What are the risks of deploying the predictive analytics model?

With the right setup and a healthy sense of constructive paranoia, prediction is possible. Teams can begin to peer around the proverbial corner to act faster and with greater prescience than competitors, and put themselves at a significant advantage in an otherwise unknowable future.
