Uncategorized

Community Tip: Guide to Exporting Mixpanel Data

This Community Tip walks developers through the Mixpanel APIs for raw data and tips to set up an ETL (Extract, Transform and Load) configuration for both Event and People data.

Exporting your Mixpanel data can serve a number of purposes whether you simply wish to use the data for raw analysis purposes, store a local copy in your own internal datastore, or pipe the data to another service.

API Endpoints for raw Event and People data

Mixpanel provides powerful processed API endpoints which provide the formatted Event data you see within the various Mixpanel reports like Segmentation, Funnels, and Retention. These endpoints query your raw data for insight analysis, but do not return the raw data you send to Mixpanel.

The raw export API provides a full export of your raw project data so that you can utilize this data for purposes outside of Mixpanel. Instead of returning the number of users who performed an action on a given day like in Mixpanel reports, it will return all instances of that action and all associated properties with each action. This allows you to grab a comprehensive data dump of any data you have sent to Mixpanel.

The engage API provides access to your People data within Mixpanel. Similar to raw export, the engage API will return all of your users and each of their associated people properties.

Depending on the final usage case of the raw data you are exporting, the raw export API (Events), engage API (People), or some combination of both may be more applicable.

Exporting Data

Depending on whether you wish to export Events or People data, the process for exporting the data differs slightly. Requests to either endpoint require the following parameters:

  • api_key (the API key for the current Mixpanel project)
  • expire (an epoch timestamp of when the request is valid until)
  • sig (a hash of all request params with the API signature)

These required parameters are explained in detail here in the context of how an individual request is generated.

Events

For Events, the API endpoint which should be used is https://data.mixpanel.com/api/2.0/export/. In this query you must specify a date range and optionally specific event(s) and property conditions for the export. In general this request can be made either manually by crafting an API request or systematically by using code to do this for you.

The pro move here is to write code to perform and store exports systematically instead of doing this process manually for each API request you wish to make. This can be done through any coding language which can hit the Mixpanel API and record the data. As an example, please see this Gist which contains a sample python script to download raw event data from this API endpoint and write it to a file.

People

For People, the API endpoint which should be used is http://mixpanel.com/api/2.0/engage/. This query requires special handling as this API endpoint is limited to only return 1000 user profiles at a time. After you define a selector which determines the users to download from your project, within the response you will find:

  • session_id (id of the full cached response)
  • page (page number of the entire response currently being viewed)

In order to loop through each and every user within your project, you will need to hold the session_id constant and continue to increase the page number to continue polling for the next page of results. Typically this process is performed programmatically as doing this manually is cumbersome across many pages.

has_results = True
total = 0
with open(output_file_name, 'w') as f:
    while has_results:
        responser = json.loads(response)['results']
        total += len(responser)
        has_results = len(responser) == 1000
        for data in responser:
            f.write(json.dumps(data)+'\n')
            print "%d / %d" % (total,global_total)
            parameters['page'] += 1
            if has_results:
                response = api.request(parameters)

As with the Events export API, this process can be performed in a number of coding languages depending on what you are most comfortable with. As an example, please see this Gist which contains a sample python script to download People data from this API endpoint and write it to a file.

Data is available for export instantly.

ETL with Mixpanel Data

Once you have exported your raw event and/or People data from Mixpanel, you then have the option of using it in a variety of ways. Of special interest to customers is sending this data into another platform to produce a custom integration using Mixpanel data.

Data Format

The return format for Event and People data differs

The format of data returned for events will be line return delimited JSON (aka JSON lines). This format looks like:

{"event":"Homepage A/B Test Start", "properties":{"time":1430495979,"distinct_id":"333342trfgsdggfdsddd","$browser":"Chrome","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$lib_version":"2.4.2","$os":"Mac OS X","$screen_height":1600,"$screen_width":2560,"Tests":"Music Green","User Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36","mp_country_code":"US","mp_lib":"web"}}
{"event":"Page Loaded","properties":{"time":1430495979,"distinct_id":"333342trfgsdggfdsddd","$browser":"Chrome","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$lib_version":"2.4.2","$os":"Mac OS X","$screen_height":1600,"$screen_width":2560,"Page":"Home","Source":"Other","User Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36","mp_country_code":"US","mp_lib":"web"}}
{"event":"Hover","properties":{"time":1430496022,"distinct_id":"sddsddsds42324123","$browser":"Chrome","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$lib_version":"2.4.2","$os":"Mac OS X","$screen_height":1600,"$screen_width":2560,"Image":"Logo","User Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36","mp_country_code":"US","mp_lib":"web"}}

For People, the users are stored in a list of dictionaries under the “results” key of the the API return. If you process each user within the “results” key return on each page, you will be able to parse each user profile. This format looks like:

{"page":0,
"page_size":1000,
"results":
[{"$distinct_id":"2424344243","$properties":{"$browser":"Chrome","$city":"San Jose","$country_code":"US","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$name":"Michael Jordan","$os":"Mac OS X","$region":"California","$timezone":"America/Los_Angeles","test1":["red","blu","yel","red"],"$last_seen":"2015-03-03T09:30:07"}},{"$distinct_id":"brandon@email.com","$properties":{"$browser":"Chrome","$country_code":"US","$email":"brandon@email.com","$first_name":"Brandon","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$last_name":"Smith","$name":"Brandon Smith","$os":"Mac OS X","$username":"brandonsmith5","Item":"Coffee Table","Item Category":"Furniture","User ID":3402,"logins":1,"username":"brandonsmith5","$last_seen":"2015-04-01T12:22:08"}},{"$distinct_id":"14b90aa1e901dd-09b927db2-32677802-13c680-14b90aa1e926a9","$properties":{"$browser":"Chrome","$city":"San Jose","$country_code":"US","$initial_referrer":"$direct","$initial_referring_domain":"$direct","$name":"Mike Jones","$os":"Mac OS X","$region":"California","$timezone":"America/Los_Angeles","UserType":"Production","$last_seen":"2015-02-15T20:35:14"}}],
"session_id":"1433205560-yHmTPq",
"status":"ok",
"total":3}

After you understand the structure of this data, it allows you to perform manipulation on the desired data before sending it on to another platform.

Transform Mixpanel Data

As the data coming back from Mixpanel is JSON, any programming language can contains a tools to interpret and utilize this data. Once you have the data export prepared, the idea here is to grab the specific part of the data you need from the response. For example, if I am exporting People data to MailChimp to sync unsubscribe status, I would likely look for a user’s email address and the value of the “$unsubscribed” key in a People export.

At this point, the goal is to grab the appropriate data from the Mixpanel export response and then format this data to send to other platforms, as their APIs likely require a different data format than Mixpanel returns. Once you grab desired data from the dataset and format the data appropriately, you would then hit another API such as MailChimp or Salesforce to add this Mixpanel data into the other platform in the relevant format for that platform.

Automating the Process

With any process involving exporting, transforming, and loading, this can be time consuming and error prone to perform manually. The best way to automate this process is to first run tests until a working ETL model is built between Mixpanel and the final data location. Once a test model has been achieved, you can automate the process to run at desired intervals (e.g. with a cron or scheduled job).

Your crontab file might contain an entry like:

# start the Mixpanel ETL job daily at 6am server time

0 6 * * * /usr/bin/mixpanel_to_mailchimp_etl.py

As with any ETL process, there are multiple components required to export, translating the response, and then loading it into another platform robustly and consistently.

Whether you’re a developer or an end user, if you have questions about exporting Mixpanel data to a specific platform or are simply interested to hear more about how this can be set up with your Mixpanel data, please email support@mixpanel.com to speak with someone smart, quickly.

Get the latest from Mixpanel
This field is required.