Save engineering time on your data warehouse pipeline
Last edited: Sep 16, 2021
We’ve released a connector that sends data from Mixpanel to Amazon Redshift Spectrum, Google BigQuery, Snowflake, Google Cloud Storage and Amazon S3. In this post, we’ll outline how this can save you time and engineering resources, and why our customers call it “Mixpanel’s most important product right now”.
In order to understand your users’ behavior, you’ve likely spent lots of time mapping out the data you want to collect from your website and app. Mixpanel provides an intuitive way to analyze that data to identify behavioral trends and their causes. But that data is valuable outside of Mixpanel too. Before, you had to build and maintain a custom data pipeline to your data warehouse to conduct further analysis on your Mixpanel data. Not anymore.
We’re excited to release a pipeline that sends your Mixpanel data to your analytics destination of choice. By letting us do the heavy lifting, your engineers or data scientists can stop spending 80% of their time on data preparation and spend more time surfacing key business insights. We support cloud data warehouses like Google BigQuery, Amazon Redshift, and Snowflake, as well as data storage platforms like Google Cloud Storage and Amazon S3, with more destinations like Microsoft Azure coming soon.
Mixpanel + data warehouse = no question left unanswered
We spoke to one of our customers, Robert Chi, a data engineer at Sega, who explains why he uses this feature:
“Data warehouse export allows us to easily analyze Mixpanel and Appsflyer data side by side with DAUs, installs, and revenue numbers. This is one of Mixpanel’s most important products right now.”
Like Sega, we use Mixpanel on a daily basis, in addition to a data warehouse. In our case, we use BigQuery to join our customer data with Google Cloud Platform’s billing data, which helped us save a million dollars a year in obscure costs that we wouldn’t have been able to categorize otherwise. That’s how we knew our customers would find it valuable.
After doing our research and talking to customers, it became clear that this pipeline had to do three things: save our customers time, be flexible, and provide accurate, up-to-date data. Let’s start with how we’ll save you time.
Spend time and engineering resources more wisely
We automatically clean and transform the data for you, so you can save your valuable engineering resources. By looking at the schema of the data and running it through our transformation pipeline, we make sure that the data doesn’t get lost or cause errors once it’s in your warehouse. We make slight edits to names and move groupings around so you don’t have to, and so the data is easy to query once exported. On top of that, we automatically deduplicate your data.
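To make the idea of renaming and deduplicating concrete, here is a minimal Python sketch. It is not Mixpanel’s transformation pipeline; the naming rules, the `insert_id` key used to spot duplicates, and the sample events are all assumptions made for illustration.

```python
import re


def sanitize_name(name: str) -> str:
    """Turn an arbitrary property name into a warehouse-friendly column name.

    The rules here (lowercase, underscores, no leading digit) are assumed for
    illustration; real warehouses each have their own naming constraints.
    """
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
    # Many warehouses reject column names that start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def transform(events):
    """Rename properties and drop duplicate events, keeping the first copy."""
    seen = set()
    rows = []
    for event in events:
        key = event["insert_id"]  # hypothetical unique identifier per event
        if key in seen:
            continue
        seen.add(key)
        rows.append({sanitize_name(k): v for k, v in event.items()})
    return rows


events = [
    {"insert_id": "a1", "event": "Sign Up", "Plan Type": "free"},
    {"insert_id": "a1", "event": "Sign Up", "Plan Type": "free"},  # duplicate
    {"insert_id": "b2", "event": "Purchase", "$amount (USD)": 20},
]
rows = transform(events)
```

In this sketch, the duplicated sign-up is dropped and property names such as `Plan Type` become `plan_type`, so they can be used directly as column names in a query.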
Robert Chi at Sega went on to explain how this saves his team time: “It saves us a lot of coding time. Often the data team and dev team experiment with custom events, and before you had to wait for a developer or engineer to write the ETL script every time. That made testing cycles much longer. This shortens the test-to-deployment cycle considerably, and is the only way we can support our huge amount of data.”
Control how and where to send your data
Our customers use a variety of data warehouses, so we knew we had to support export to more than just one or two solutions.
On top of that, we provide the flexibility to choose how to send data to these destinations. Some customers prefer a single table for all their events, while others, especially customers with huge amounts of data, like to have a separate table for each type of event. Some only want to send three or four events to their data warehouse. Because we don’t force them to send all their data, their queries return faster and their costs go way down.
Plus, you can set up a scheduled export so your data automatically gets sent every hour, or every day, depending on your preference. That goes for both your event data and your people profile data.
Maria Laura Scuri at Faceit explains how important this is: “Before the BigQuery export feature came along, we used the API. It was quite a lot of data to manipulate, and we weren’t able to export event by event. It would take half a day each week for someone to fix it. But now, the events we need are all divided and easy to query. This is a huge time-saving.”
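The choice between a single events table and one table per event, along with exporting only a handful of events, can be sketched in a few lines of Python. This is an illustrative model only: the function name, its parameters, and the table-naming scheme are hypothetical and do not reflect the pipeline’s actual configuration options.

```python
from collections import defaultdict


def route_events(events, per_event_tables=True, allowed=None):
    """Group events by destination table name.

    per_event_tables: if False, everything lands in one "events" table.
    allowed: optional set of event names to export; None exports everything.
    """
    tables = defaultdict(list)
    for event in events:
        name = event["event"]
        if allowed is not None and name not in allowed:
            continue  # skip events the customer chose not to export
        table = name.lower().replace(" ", "_") if per_event_tables else "events"
        tables[table].append(event)
    return dict(tables)


events = [
    {"event": "Sign Up", "user": "u1"},
    {"event": "Purchase", "user": "u1"},
    {"event": "Page View", "user": "u2"},
    {"event": "Purchase", "user": "u2"},
]

# One table holding every event.
single = route_events(events, per_event_tables=False)

# One table per event type, limited to the two events we care about.
split = route_events(events, allowed={"Sign Up", "Purchase"})
```

Here `single` holds all four events under one `events` key, while `split` contains only `sign_up` and `purchase` tables, which mirrors why a narrower export means less data to scan per query.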
Up-to-date, GDPR-friendly data
Think of this pipeline in terms of a sync, rather than an export. That’s what ensures the data in your warehouse is always up to date, and GDPR compliant. Let’s take a customer, like Faceit, who uses this pipeline to send data to BigQuery. Jane Doe is one of Faceit’s users, so her data would be in both BigQuery and Mixpanel. If she decides to delete her account, her data would first be erased in Mixpanel. The next time data gets synced, we’ll notice that Jane’s data is in BigQuery, but not in Mixpanel, and make the change to delete her data from the warehouse as well.
This is one of the things that sets Mixpanel apart. Almost all ETL platforms export data once and forget about updating it in the future. By not syncing data on a regular basis, companies run the risk of not being GDPR compliant, and having data that’s inaccurate and out-of-date. Mixpanel’s data warehouse export solves this problem, ensuring that your business is GDPR compliant, and runs on accurate data.
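The difference between a sync and a one-off export comes down to comparing both sides on every run. The Python sketch below shows that comparison under simplifying assumptions: both sides are modeled as in-memory dictionaries keyed by a hypothetical user ID, and `plan_sync` is an invented helper, not part of any Mixpanel API.

```python
def plan_sync(source_rows, warehouse_rows):
    """Compare two {user_id: row} mappings and return the changes to apply.

    Returns (to_delete, to_upsert): IDs present only in the warehouse, and
    rows that are new or changed at the source.
    """
    to_delete = sorted(set(warehouse_rows) - set(source_rows))
    to_upsert = {
        uid: row
        for uid, row in source_rows.items()
        if warehouse_rows.get(uid) != row
    }
    return to_delete, to_upsert


# "u2" deleted their account at the source but still exists in the warehouse.
source = {"u1": {"plan": "pro"}, "u3": {"plan": "free"}}
warehouse = {"u1": {"plan": "free"}, "u2": {"plan": "free"}}

to_delete, to_upsert = plan_sync(source, warehouse)
```

In this example `to_delete` contains `u2`, the user who no longer exists at the source, and `to_upsert` contains the changed row for `u1` and the new row for `u3`. A one-time export would never compute the first list, which is how stale personal data ends up lingering in a warehouse.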
Spend less time preparing data, and more time analyzing it
With this data warehouse export pipeline, our customers can get even more value out of their Mixpanel data. Instead of spending valuable time and engineering resources on preparing data for analysis, teams and engineers like Robert can spend more time analyzing the data, and finding ways to apply those insights.
“Now our team has more data than they know what to do with. They don’t need me to prepare an ETL process for them, they can get right into the data,” Robert explained.
Data warehouse export is available to all customers on a paid plan starting today and comes with a 30-day free trial, which includes a daily scheduled export and one day of backfill.