Importing Datasets

A dataset represents a collection of data from a specific source, such as Salesforce or Stripe. A project may have multiple datasets. Once data has been imported into a dataset and a dataset version is live, the dataset is visible in the Insights report.

There are only a few steps required to import a new dataset:

  • Create a dataset by clicking "Add a dataset" within the Data connector tab of your Project settings
  • Create a new dataset version
  • Import events and people to that version
  • Publish the dataset version to make it visible in the UI

To see an example of creating a new dataset and importing data into it, go directly to the Example Workflow. The same example is shown in a complete Python script here.

Authentication

The Datasets API uses Basic access authentication over HTTPS as its authorization method. To make an authorized request, put your project secret or the dataset secret in the "username" field of the Basic access authentication header. If you connect your dataset to a third party, we recommend using the dataset secret. Always use HTTPS and not HTTP: our API rejects requests made over HTTP, because HTTP would send your secret over the internet in plain text.
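As an illustration of the header format, here is a minimal Python sketch that builds a Basic access authentication header with the secret as the username. It assumes an empty password, which matches the trailing colon in the curl examples in this document; the function name basic_auth_header is hypothetical and not part of any Mixpanel library.

```python
import base64


def basic_auth_header(secret: str) -> dict:
    """Build an Authorization header with the secret as username.

    Assumption: the password is left empty, as in `curl -u SECRET:`.
    """
    credentials = f"{secret}:".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# "DATASET_SECRET" is a placeholder; substitute your own secret.
headers = basic_auth_header("DATASET_SECRET")
print(headers["Authorization"].startswith("Basic "))
```

Most HTTP clients can build this header for you when given a username and password, so a helper like this is only needed when you set headers by hand.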

The Dataset Version Object

Each dataset can have one or many dataset versions. Each version represents a full snapshot of the data for that specific dataset. Only the five latest versions of a dataset are retained, as old data is periodically deleted. The versions resource is available under the api-beta.mixpanel.com/datasets/<dataset_id>/versions endpoint.

Data is imported into a specific version of the dataset, identified by its version_id. Before starting an import, you have to create a new dataset version. After you’ve imported all the data, update the state of this version by setting readable to true. Once the is_live attribute of the dataset version becomes true, your dataset is ready to be queried in your Mixpanel project. This is explained in more detail in the sections below.

Requests return a dictionary with a "data" attribute if successful, and an "error" attribute set to a human-readable string otherwise. Dataset version requests take the following arguments:

dataset_id
string
A unique identifier for the dataset. Must be an alphanumeric string of fewer than 256 characters. Required in the resource path.
token
string
The token associated with your project. To find your project token, click your name in the upper right-hand corner of your Mixpanel project, select Project settings from the dropdown, then view the Data connector tab.
version_id
string
This string serves as the identifier for a specific dataset version. Required in the resource path.
state
json
The dataset version state object representing the state of the version. Only read-write fields can be changed.

The dataset version state object

ready
boolean
True if all the data sent to this dataset version is ready for querying. (read-only)
ready_at
time
The time at which this dataset was marked as ready. (read-only)
readable
boolean
Set this field to true once all the import requests are done. This field is required when making any update to state. (read-write)
readable_at
time
The time at which this version was set to readable. (read-only)
writable
boolean
Whether this version is writable. This field is required when making any update to state. (read-write)
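Because both read-write fields are required on every state update, it can help to build the state value in one place. The following is a minimal Python sketch; the function name build_state_param is hypothetical, and the compact separators are only there to match the form used in the curl examples.

```python
import json


def build_state_param(readable: bool, writable: bool) -> str:
    """Serialize a state update; both read-write fields are always included."""
    state = {"readable": readable, "writable": writable}
    # Compact separators produce the same form as the curl examples.
    return json.dumps(state, separators=(",", ":"))


print(build_state_param(readable=True, writable=True))
# prints {"readable":true,"writable":true}
```

The read-only fields (ready, ready_at, readable_at) are deliberately left out, since only read-write fields can be changed.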

Creating a dataset version

# Creates a new version for the dataset named TESTDATA.
# A dataset with the id TESTDATA must already exist in your project for this to succeed.
curl -XPOST 'https://api-beta.mixpanel.com/datasets/TESTDATA/versions' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN'
// Expected Return
{
    "data": {
        "created_at": "2017-06-26T23:00:47.617313Z",
        "is_live": false,
        "state": {
            "readable": false,
            "readable_at": "0001-01-01T00:00:00Z",
            "ready": false,
            "ready_at": "0001-01-01T00:00:00Z",
            "writable": true
        },
        "version_id": "5764640680181760"
    }
}
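A response like the one above can be unwrapped according to the convention described earlier: a "data" attribute on success, an "error" attribute with a human-readable string otherwise. This is a minimal Python sketch; the exception class DatasetApiError and the function unwrap are hypothetical names introduced for illustration.

```python
import json


class DatasetApiError(Exception):
    """Hypothetical exception for responses that carry an "error" attribute."""


def unwrap(body: str):
    """Return the "data" attribute of a response body, or raise on "error"."""
    response = json.loads(body)
    if response.get("error"):
        raise DatasetApiError(response["error"])
    return response.get("data")


# Abbreviated version of the response shown above.
body = '{"data": {"is_live": false, "version_id": "5764640680181760"}}'
version = unwrap(body)
print(version["version_id"])
# prints 5764640680181760
```

Note that update and delete requests return a null "data" attribute on success, so unwrap returns None for them rather than raising.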

Listing all dataset versions

# Returns a list of all versions for the dataset TESTDATA
curl --get 'https://api-beta.mixpanel.com/datasets/TESTDATA/versions' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN' 
// Expected Return
{
    "data": [
        {
            "created_at": "2017-06-26T23:10:49.386664Z",
            "is_live": false,
            "state": {
                "readable": false,
                "readable_at": "0001-01-01T00:00:00Z",
                "ready": false,
                "ready_at": "0001-01-01T00:00:00Z",
                "writable": true
            },
            "version_id": "5631943370604544"
        },
        {
            "created_at": "2017-06-26T23:00:47.617313Z",
            "is_live": false,
            "state": {
                "readable": false,
                "readable_at": "0001-01-01T00:00:00Z",
                "ready": false,
                "ready_at": "0001-01-01T00:00:00Z",
                "writable": true
            },
            "version_id": "5764640680181760"
        }
    ]
}

Updating a dataset version

# Updates a specific version for the dataset named TESTDATA.
# Currently, users can only update the state of a given dataset version. 
# Only the fields marked as read-write can be updated.
curl -XPATCH 'https://api-beta.mixpanel.com/datasets/TESTDATA/versions/5764640680181760' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN' \
-d 'state={"readable":true,"writable":true}'
// Expected Return
{
    "data": null
}

Deleting a dataset version

# Deletes the version 1234567 for the dataset named TESTDATA.
curl -XDELETE --get 'https://api-beta.mixpanel.com/datasets/TESTDATA/versions/1234567' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN'
// Expected Return
{
    "data": null
}
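The four dataset version requests above share the same URL pattern, authentication, and token argument, so they can be sketched as a single request builder. The Python example below uses only the standard library and builds the request without sending it. The function name version_request is hypothetical, and it assumes, following the curl examples, that GET and DELETE pass the token in the query string while POST and PATCH send a form-encoded body.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api-beta.mixpanel.com/datasets"


def version_request(method, dataset_id, secret, token, version_id=None, state=None):
    """Build (but do not send) a dataset version request."""
    url = f"{BASE_URL}/{dataset_id}/versions"
    if version_id is not None:
        url += f"/{version_id}"

    params = {"token": token}
    if state is not None:
        # State updates must include both "readable" and "writable".
        params["state"] = json.dumps(state, separators=(",", ":"))
    encoded = urllib.parse.urlencode(params)

    # Secret as username, empty password, as in `curl -u SECRET:`.
    auth = base64.b64encode(f"{secret}:".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {auth}"}

    if method in ("GET", "DELETE"):
        # Mirrors `curl --get`: arguments go in the query string.
        return urllib.request.Request(f"{url}?{encoded}", headers=headers, method=method)
    # POST and PATCH send the arguments as a form-encoded body.
    return urllib.request.Request(url, data=encoded.encode("ascii"), headers=headers, method=method)


req = version_request(
    "PATCH", "TESTDATA", "API_SECRET", "PROJECT_TOKEN",
    version_id="5764640680181760",
    state={"readable": True, "writable": True},
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen and read the response body. Building and sending are kept separate here so the request can be inspected first.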

Importing Events and People

Use the https://api-beta.mixpanel.com/import-events endpoint to upload events into a dataset and the https://api-beta.mixpanel.com/import-people endpoint to upload people records. Events and people records should be base64 encoded JSON objects that fit Mixpanel's standard event and people profile structure. Note that these APIs do not support the resolution or creation of aliases.

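Requests return an HTTP response with the body "1" if the import call is successful, and "0" otherwise. When verbose=1 is passed, as in the examples in this document, the response is a JSON object with "status" and "error" attributes instead.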

Instead of sending one event or one record per request, you can batch requests. The endpoints will accept up to 1000 records in a single batch. You can read more about batching requests to Mixpanel in our HTTP API documentation.
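To illustrate batching, the Python sketch below splits a list of records into batches of at most 1000 and base64 encodes each batch as a JSON array, ready to be sent as the data argument. The function name encode_batches and the sample events are hypothetical.

```python
import base64
import json

MAX_BATCH_SIZE = 1000  # the endpoints accept up to 1000 records per batch


def encode_batches(records, batch_size=MAX_BATCH_SIZE):
    """Yield one base64 encoded JSON array per batch of records."""
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        payload = json.dumps(batch).encode("utf-8")
        yield base64.b64encode(payload).decode("ascii")


# 2500 placeholder events produce three batches: 1000, 1000, and 500.
events = [
    {"event": "Airline Review",
     "properties": {"token": "PROJECT_TOKEN", "distinct_id": str(i), "time": 1395619200}}
    for i in range(2500)
]
batches = list(encode_batches(events))
print(len(batches))
# prints 3
```

Each yielded string corresponds to one import request, so 2500 events need three API calls instead of 2500.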

All import requests should include the following arguments:

dataset_id
string
The id of the dataset you want to send the events/records to.
dataset_version
string
The version_id of the dataset you want to send the events/records to.

Importing an event into a dataset

# Imports an event into version 1234567 of the dataset with the dataset_id TESTDATA.
curl 'https://api-beta.mixpanel.com/import-events' \
-u API_SECRET or DATASET_SECRET: \
-d data='eyJldmVudCI6ICJBaXJsaW5lIFJldmlldyIsICJwcm9wZXJ0aWVzIjogeyJ0b2tlbiI6ICJjYXNzaWUiLCAiZGlzdGluY3RfaWQiOiAidW5pcXVlIGlkZW50aWZpZXIiLCAiYWlybGluZV9uYW1lIjogImFkcmlhLWFpcndheXMiLCAidGltZSI6IDEzOTU2MTkyMDB9fQ==' \
-d verbose=1 \
-d dataset_id=TESTDATA \
-d dataset_version=1234567
// Expected Return
{
    "error": null,
    "status": 1
}
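An import response can take two forms: a plain "1" or "0" body, or a JSON object like the one above when verbose=1 is passed. The following Python sketch handles both; the function name import_succeeded is hypothetical.

```python
import json


def import_succeeded(body: str) -> bool:
    """Return True if an import response body reports success."""
    text = body.strip()
    # Non-verbose responses are a bare "1" (success) or "0" (failure).
    if text in ("0", "1"):
        return text == "1"
    # Verbose responses are JSON objects with "status" and "error".
    return json.loads(text).get("status") == 1


print(import_succeeded('{"error": null, "status": 1}'))
# prints True
```

With verbose=1 the "error" attribute also carries a description of a failure, which is usually worth logging alongside the boolean result.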

Example workflow

Let’s use the example of a user trying to import public Skytrax airline review data into Mixpanel.

Step 1: Create a dataset for your project

First, create a new dataset by clicking "Add a dataset" within the Data connector tab of your Project settings. You’ll have to keep the name of the dataset the same throughout the workflow, so if you name your dataset SKYTRAX, the dataset version URLs you use should always contain /datasets/SKYTRAX.

Step 2: Create a new version ID for the dataset

The version_id returned in the response identifies the new version. You’ll use it throughout your updates and imports, so make sure you replace the version number in the example commands with your own.

curl -XPOST 'https://api-beta.mixpanel.com/datasets/SKYTRAX/versions' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN'
// Expected Return
{
    "data": {
        "created_at": "2017-06-26T23:00:47.617313Z",
        "is_live": false,
        "state": {
            "readable": false,
            "readable_at": "0001-01-01T00:00:00Z",
            "ready": false,
            "ready_at": "0001-01-01T00:00:00Z",
            "writable": true
        },
        "version_id": "5764640680181760"
    }
}

You need to include the new version_id, 5764640680181760, in the next few API calls.

Step 3: Import events and records to this specific version of the dataset

Events and people records should be base64 encoded JSON objects that fit Mixpanel's standard event and people record structure. All imports require the project token and a distinct_id attribute.

The following lines of Python create a single base64 encoded event. Note that it is possible to encode an array of up to 1000 JSON events and make only one API call to import the whole array as a batch.

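import base64
import json

# A single event in Mixpanel's standard event structure.
data = {
    "event": "Airline Review",
    "properties": {
        "airline_name": "adria-airways",
        "distinct_id": "unique identifier",
        "time": 1395619200,
        "token": "PROJECT_TOKEN"
    }
}
# Serialize to JSON, then base64 encode. In Python 3, b64encode works on
# bytes, so encode the JSON string first and decode the result back to text.
j = json.dumps(data)
e = base64.b64encode(j.encode("utf-8")).decode("ascii")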

Now we can import the base64 encoded event into the newest version of the SKYTRAX dataset.

curl 'https://api-beta.mixpanel.com/import-events' \
-u API_SECRET or DATASET_SECRET: \
-d data=eyJldmVudCI6ICJBaXJsaW5lIFJldmlldyIsICJwcm9wZXJ0aWVzIjogeyJ0b2tlbiI6ICJQUk9KRUNUX1RPS0VOIiwgImRpc3RpbmN0X2lkIjogInVuaXF1ZSBpZGVudGlmaWVyIiwgImFpcmxpbmVfbmFtZSI6ICJhZHJpYS1haXJ3YXlzIiwgInRpbWUiOiAxMzk1NjE5MjAwfX0= \
-d verbose=1 \
-d dataset_id=SKYTRAX \
-d dataset_version=5764640680181760 
// Expected Return
{
    "error": null,
    "status": 1
}

Step 4: Update the dataset state to readable

# Updates to state must include both the "readable" and "writable" fields
curl -XPATCH 'https://api-beta.mixpanel.com/datasets/SKYTRAX/versions/5764640680181760' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN' \
-d 'state={"readable":true,"writable":true}'
// Expected Return
{
    "data": null
}

Step 5: Check that "is_live" is set to true for the dataset version

Wait a few minutes, then check to see if the "is_live" attribute is set to true. This attribute indicates that the dataset version is available to be queried in the Insights report.

curl --get 'https://api-beta.mixpanel.com/datasets/SKYTRAX/versions' \
-u API_SECRET or DATASET_SECRET: \
-d 'token=PROJECT_TOKEN' 
// Expected Return
{
    "data": [
        {
            "created_at": "2017-06-26T23:00:47.617313Z",
            "is_live": true,
            "state": {
                "readable": true,
                "readable_at": "2017-08-22T06:01:51.355759Z",
                "ready": true,
                "ready_at": "2017-08-22T06:05:38.57334Z",
                "writable": true
            },
            "version_id": "5764640680181760"
        }
    ]
}
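The check in this step can be repeated until the version goes live. The Python sketch below polls a caller-supplied function and looks for the version's "is_live" attribute. The names is_version_live and wait_until_live are hypothetical, as is the fetch_versions callable, which is assumed to return the parsed JSON of the versions listing. The retry count and delay are arbitrary choices, not documented limits.

```python
import time


def is_version_live(response, version_id):
    """Return True if the given version is marked live in a versions response."""
    data = response.get("data")
    # Accept either a list of versions or a single version object.
    versions = data if isinstance(data, list) else [data]
    return any(
        v and v.get("version_id") == version_id and v.get("is_live")
        for v in versions
    )


def wait_until_live(fetch_versions, version_id, attempts=10,
                    delay_seconds=60.0, sleep=time.sleep):
    """Poll fetch_versions() until the version is live or attempts run out."""
    for attempt in range(attempts):
        if is_version_live(fetch_versions(), version_id):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Simulated responses: not live on the first check, live on the second.
responses = iter([
    {"data": [{"version_id": "5764640680181760", "is_live": False}]},
    {"data": [{"version_id": "5764640680181760", "is_live": True}]},
])
live = wait_until_live(lambda: next(responses), "5764640680181760",
                       sleep=lambda seconds: None)
print(live)
# prints True
```

In real use, fetch_versions would wrap the listing request shown in this step, and the sleep argument would be left at its default.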