Why a codeless implementation will let you down

Update: After listening to our customers, we’ve changed our stance on event autotracking and launched Autocapture to help you get set up with insights faster. We still believe any analytics implementation method should offer data control, security, and the ability to deepen your analysis by adding on custom tracking if you want it. We’re happy to say Mixpanel’s Autocapture checks all these boxes. Learn more about Mixpanel Autocapture here.

Missing important data is one of the most frustrating things in analytics because it keeps you from quickly getting the answers you need. Among the many reasons you might not have the data you need, two stand out: you couldn’t get developer time to implement the tracking, or you flat-out forgot to track something. In 2016, Mixpanel launched Autotrack in an attempt to solve both of these problems.

Autotrack automatically collected user interactions on your web or mobile applications without the need to add additional code. When you wanted to analyze user actions, you used our point-and-click visual tool to create the events you cared about. The promise of this approach is obvious: you can collect all the data you need without developer resources, and you won’t have to worry about forgetting to track something. This lets a non-technical user shorten the time from question to answer without involving technical teams.

We abandoned the codeless and automatic event collection model and believe you should, too. It is a fundamentally flawed approach that falls far short of its promise. If you use it, you’ll end up with limited and unreliable data, spend even more developer time and money trying to fix problems and inconsistencies, and expose yourself and your customers to major security and privacy risks.

How codeless and automatic event collection works

To understand why codeless and automatic event collection will fail you, you need to understand how it works. When you install the tracking library on your web or mobile app, it listens to all user interactions such as clicks, taps, or keystrokes. For every user action, the library sends data about the type of action and the element that was interacted with.

For example, imagine you have a checkout page with three buttons: “Place Order,” “Continue Shopping,” and “Empty Cart.” If a user clicks on “Place Order,” the library sends some data that says something like “User 12345 clicked a button that contains the text ‘Place Order.’” That data is then stored in a database so you can query it later on.
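As a rough illustration, the record sent for that click might look like the sketch below. The field names (`user_id`, `action`, `element_tag`, `element_text`) and the `capture_interaction` helper are assumptions made for this example, not the actual schema of Autotrack or any other product.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RawInteraction:
    """One automatically captured interaction (hypothetical schema)."""
    user_id: str
    action: str        # e.g. "click", "tap", "keypress"
    element_tag: str   # e.g. "button"
    element_text: str  # visible text of the element


def capture_interaction(user_id: str, action: str, tag: str, text: str) -> dict:
    """Build the payload a track-everything library might send (illustrative only)."""
    return asdict(RawInteraction(user_id, action, tag, text))


payload = capture_interaction("12345", "click", "button", "Place Order")
print(payload)
```

Note that the record describes which element was touched, not what the user was trying to accomplish.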

Now you have data being collected but have not yet defined what you care about analyzing. You’ll need to use your analytics provider’s visual interface to define your events and create a new “Checkout attempted” event. Using the visual interface, you tell it that the “Checkout attempted” event is defined as any click on the button with “Place Order” text. In theory, you can see how many people have attempted to check out. In reality, you will see how many people clicked a button that has the text “Place Order.” The subtle difference here is vitally important, as you’ll see in the following problems.
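Conceptually, an event defined this way is just a filter over the raw interaction records. Here is a minimal sketch of that idea, assuming a hypothetical record layout (`action`, `tag`, `text`) and hypothetical helper names; a real visual tool would store its definitions differently.

```python
from typing import Callable

# Raw interactions as a track-everything library might store them (hypothetical layout).
raw_interactions = [
    {"user_id": "12345", "action": "click", "tag": "button", "text": "Place Order"},
    {"user_id": "12345", "action": "click", "tag": "button", "text": "Continue Shopping"},
    {"user_id": "67890", "action": "click", "tag": "button", "text": "Place Order"},
    {"user_id": "67890", "action": "click", "tag": "button", "text": "Empty Cart"},
]


def button_click_with_text(text: str) -> Callable[[dict], bool]:
    """Return a matcher for clicks on a button whose visible text equals `text`."""
    def matches(record: dict) -> bool:
        return (
            record["action"] == "click"
            and record["tag"] == "button"
            and record["text"] == text
        )
    return matches


# The "event" exists only as a named rule over element text.
event_definitions = {"Checkout attempted": button_click_with_text("Place Order")}


def count_event(name: str, records: list) -> int:
    """Count the raw records that satisfy the named definition."""
    matcher = event_definitions[name]
    return sum(1 for record in records if matcher(record))


checkout_count = count_event("Checkout attempted", raw_interactions)
print(checkout_count)  # 2
```

The count answers "how many clicks hit a button labeled Place Order," which is only a proxy for "how many checkouts were attempted."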

You’re probably going to have to deal with inaccurate data

At this point, you think you know how many users are attempting checkouts, but what if there are other ways to initiate a checkout? Perhaps a user can also check out by pressing “Enter” on their keyboard instead of clicking the checkout button. Perhaps you have a “One-Click Checkout” button on your item pages that skips this step entirely. To ensure your data is accurate, you’ll need to address each of these cases individually when you set up your events. Often, you’ll need to consult a software developer to make sure you’re covering every case properly. Otherwise, the data you see will not match the data you expect, and it will lead to misinformed decisions.

The tool can miss important data

Let’s say you’ve audited your product and you now have all of the various ways a user can initiate a checkout defined as events. Now you’re ready to dig in and do some behavior analysis or message targeting. So you decide you want to see how many users have attempted to purchase item ID 2853 – but that data is not collected or included in your events because the one-size-fits-all event collection doesn’t know where to find your item IDs or that they’re even important to you. It only knows that users are clicking a button with text “Place Order.” Time to loop in some software developers to help.
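For contrast, here is a minimal sketch of what explicit, developer-added tracking could look like for this case. The `track` function is a hypothetical in-memory stand-in for an analytics SDK call, and `place_order` and the property names are likewise invented for the example.

```python
events: list = []


def track(user_id: str, event_name: str, properties: dict) -> None:
    """Hypothetical stand-in for an analytics SDK call; stores the event in memory."""
    events.append(
        {"user_id": user_id, "event": event_name, "properties": dict(properties)}
    )


def place_order(user_id: str, item_ids: list) -> None:
    """Simplified application code for the checkout step."""
    # ... order-handling logic would run here ...
    # The application knows the item IDs, so it can attach them to the event.
    track(
        user_id,
        "Checkout attempted",
        {"item_ids": list(item_ids), "item_count": len(item_ids)},
    )


place_order("12345", [2853, 1040])
place_order("67890", [1040])

# "Which users attempted to purchase item 2853?" is now a simple filter.
buyers_of_2853 = [
    e["user_id"]
    for e in events
    if e["event"] == "Checkout attempted" and 2853 in e["properties"]["item_ids"]
]
print(buyers_of_2853)  # ['12345']
```

Because the event is sent from the code path that handles the order, it carries context that a click on a labeled button cannot.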

Automatic event collection will also cause you to miss important data because only a limited set of interactions gets tracked. Many interactions that end up being very important to you are not tracked at all. Imagine you have item reviews at the bottom of your item pages and you want to know how many users scroll to the reviews section. Most codeless products do not track scroll position. This is just one example of an interaction that is unlikely to be tracked by one-size-fits-all solutions, but there are many others. You will need a software developer to build a custom solution for these.

Changes to your application can secretly break your tracking

Imagine you now have your “Checkout attempted” tracking in place the way you like it, with all of the data you need. One day, someone decides to change the text of the checkout button from “Place Order” to “Finish Order” to see if it increases conversions. But your “Checkout attempted” event is defined by clicks on a button labeled “Place Order.” Now when a user clicks this button, the library will send data that says “User 12345 clicked a button that has the text ‘Finish Order.’”

The one-size-fits-all model doesn’t know it’s the same button, your event count drops to 0, and you’re left wondering what’s going on. The only recourse is to figure out what has changed, update your tracking, and hope all of the work you put into getting accurate results is still valid.
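The failure is easy to reproduce in a few lines. This sketch assumes a hypothetical record layout and a text-based definition like the one described above; the helper names are invented for illustration.

```python
def click(user_id: str, button_text: str) -> dict:
    """Build a raw click record (hypothetical layout)."""
    return {"user_id": user_id, "action": "click", "tag": "button", "text": button_text}


def count_checkout_attempts(records: list) -> int:
    """Count clicks matching the stored definition: a button labeled 'Place Order'."""
    return sum(
        1
        for r in records
        if r["action"] == "click" and r["tag"] == "button" and r["text"] == "Place Order"
    )


# Same users, same button, same intent; only the label changed.
before_change = [click("12345", "Place Order"), click("67890", "Place Order")]
after_change = [click("12345", "Finish Order"), click("67890", "Finish Order")]

print(count_checkout_attempts(before_change))  # 2
print(count_checkout_attempts(after_change))   # 0
```

Nothing errors out and nothing warns you. The definition simply stops matching.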

This is a very basic example of how changes can break your tracking. In reality it becomes much more complex and, again, will require developer resources to sort out.

The data is very difficult to export to other systems

In the solutions currently available on the market that allow for codeless event collection, data is described and collected in a programmatic way. That means when you export the data to a warehouse, or wherever you’d like to send it, you’re going to get a bunch of raw data that isn’t named what you’ve named it in the tool. In our basic example, instead of seeing a list of events that say “Checkout attempted,” you’ll see a dump of data that looks something like “{action: ‘click’, selector: ‘div#el_1235’}.” In reality, it’s a much longer machine-generated string, and it will require software or data engineering resources to make it usable again.
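To give a sense of the cleanup work involved, here is a minimal sketch of relabeling exported rows. It assumes rows shaped like the simplified example above; the second selector and the mapping table are invented for illustration, and real exports would be considerably messier.

```python
# Rows as they might arrive in a warehouse export (simplified, hypothetical).
exported_rows = [
    {"action": "click", "selector": "div#el_1235"},
    {"action": "click", "selector": "div#el_0042"},
    {"action": "click", "selector": "div#el_1235"},
]

# A mapping someone has to reconstruct by hand and keep in sync with the app,
# because the names defined in the visual tool do not travel with the export.
selector_to_event = {
    ("click", "div#el_1235"): "Checkout attempted",
}


def label_rows(rows: list, mapping: dict) -> list:
    """Attach a human-readable event name to each raw row, or 'unlabeled'."""
    return [
        mapping.get((row["action"], row["selector"]), "unlabeled")
        for row in rows
    ]


labels = label_rows(exported_rows, selector_to_event)
print(labels)  # ['Checkout attempted', 'unlabeled', 'Checkout attempted']
```

Every selector missing from the mapping stays unlabeled until an engineer works out what it refers to.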

Security: The nail in the coffin

If you’ve gotten this far and still believe the codeless and automatic event collection model is for you, it’s imperative that you understand the security risks. When we built Autotrack, we spent a lot of time considering the security implications and built in a number of heuristics to prevent accidental collection of private information, but we still failed. We quickly addressed this particular issue but had to admit that this type of security hole is a fundamental flaw of a track-everything model. As pointed out by Steven Englehardt, a Privacy Engineer at Mozilla, “automated scraping of user data from a page is an inherently insecure process. There is no way a heuristic-based blacklist will be able to filter all possible sensitive information leaks.”

There are two options available to address this problem. First, you can manually blacklist every possible place you might collect sensitive information. This will require software developer resources and puts the onus on you to ensure you don’t miss anything. Second, you can further reduce the information that is automatically collected (such as form fields), which exacerbates the missing-important-data problem. Englehardt addresses these options: “The effort spent by a publisher to ensure no sensitive data is collected could just as well be spent explicitly choosing the form fields from which to collect data. The latter whitelist approach is also significantly less likely to lead to unexpected leaks.”
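The difference between the two approaches can be sketched as follows. The form fields, the hint list, and the function names are all made up for this example; the "denylist" corresponds to the heuristic blacklist approach and the "allowlist" to the explicit whitelist approach Englehardt describes.

```python
# Fields on a hypothetical form; all values are fabricated sample data.
form_fields = {
    "email": "ada@example.com",
    "coupon_code": "SPRING10",
    "card_number": "4111 1111 1111 1111",
    "notes_for_doctor": "allergic to penicillin",
}

# Heuristic denylist: skip any field whose name contains a known-sensitive hint.
DENYLIST_HINTS = ("password", "card", "ssn")

# Explicit allowlist: collect only the fields someone deliberately chose.
ALLOWLIST = {"coupon_code"}


def collect_with_denylist(fields: dict) -> dict:
    """Collect everything except fields whose name matches a sensitive hint."""
    return {
        name: value
        for name, value in fields.items()
        if not any(hint in name.lower() for hint in DENYLIST_HINTS)
    }


def collect_with_allowlist(fields: dict) -> dict:
    """Collect only the explicitly approved fields."""
    return {name: value for name, value in fields.items() if name in ALLOWLIST}


print(sorted(collect_with_denylist(form_fields)))
# ['coupon_code', 'email', 'notes_for_doctor']
print(sorted(collect_with_allowlist(form_fields)))
# ['coupon_code']
```

The denylist catches the card number but still lets the email address and the health note through, because nobody anticipated those field names. The allowlist fails closed: a field nobody thought about is simply not collected.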

To drive this point home, consider what one analytics product that currently features codeless and automatic event collection says in its own documentation:

"To ensure we don’t collect any PII (Personally Identifiable Information) or PHI (Personal Health Information), you need to be extremely cautious."

How much is this risk worth to you?

Conclusion

While there are plain benefits to the idea of codeless and automatic data collection, it ultimately falls short of its promise. You’ll find that you’ve traded one set of problems for another, larger set of problems and opened yourself up to a major security risk at the same time. You will still need developer resources for anything beyond some very basic analysis. If you care about accurate data, answering detailed and specific questions, deep-level analysis and targeting, and the privacy of your users, codeless and automatic event collection is not the right solution for you.

Neil Rahilly

Senior VP, Product and Design @ Mixpanel