Introducing JQL: A Query Language for Analytics
Although JQL is a general-purpose tool, it is designed to make it easy to express typical analytics queries about customer behavior. It has a functional programming design, centered around streaming primitives like map, groupBy and reduce. By composing these elements, it is easy to write queries that scan over user activity streams, compute aggregates, or slice and dice the dataset on multiple dimensions.
The aim of this post is to explain the purpose of JQL, the motivations behind its design, and our plans for its future.
Why create JQL?
We created JQL so that our customers could query their Mixpanel data with maximum flexibility. We started working on JQL over a year ago after it became increasingly apparent that our customers had a subset of questions that were very complex and could not be answered using our built-in reports. Before JQL, the only solution to this problem was to export the data and run the required analysis in another tool. This was often a laborious process that could take hours, and it lacked the real-time, interactive quality of our built-in reports. Now, you can open the JQL Console app in Mixpanel, compose a query, and get the answer in seconds.
How did we design and create it?
A great deal of time and effort went into the design of JQL. Mixpanel is built on top of our own database, Arb, which has evolved over the past five years to support a set of dedicated query types that power our built-in reports. To achieve the flexibility we wanted, we needed to add a general-purpose query interface to Arb. Our goal at the outset of the project was to choose a query language that was simple, familiar, powerful and fast.
We started with SQL, the default choice used by most databases. However, we soon realized that SQL was very verbose and awkward for engagement analytics queries. For instance, a multi-step funnel, segmented on a couple of columns, would extend to several screenfuls of complex SQL, which is cumbersome to write and difficult to reason about. We wanted something simpler that felt purpose-built for analytics. We experimented with our own declarative language but were reluctant to require users to learn an entirely new language.
Ultimately, we adopted a functional paradigm, which allowed us to express queries as a pipeline of fundamental primitives like map, filter and reduce. These serve as the basic building blocks of the language. This choice also allowed us to make optimizations, such as hooking into the V8 runtime to replace these functions with native code and parallelizing various stages of the pipeline efficiently across an Arb cluster.
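To make the pipeline idea concrete, here is a minimal sketch in plain JavaScript using the standard array methods filter, map and reduce. It is not JQL itself and does not touch Arb; the sample events and their field names (name, user, amount) are assumptions made up for illustration.

```javascript
// Hypothetical in-memory events; field names are illustrative assumptions,
// not Mixpanel's actual event schema.
const events = [
  { name: "signup", user: "a", amount: 0 },
  { name: "purchase", user: "a", amount: 30 },
  { name: "purchase", user: "b", amount: 12 },
  { name: "login", user: "b", amount: 0 },
];

// A three-stage pipeline: keep purchases, project the amount, sum it up.
const totalRevenue = events
  .filter((e) => e.name === "purchase")
  .map((e) => e.amount)
  .reduce((sum, amount) => sum + amount, 0);

console.log(totalRevenue); // prints 42
```

Each stage depends only on the output of the previous one, which is the property that makes a pipeline like this a natural candidate for swapping in native implementations or running stages in parallel.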
Next, we added some features to make common tasks easier. For instance, groupBy allows you to segment your data, and groupByUser allows you to analyze each user’s activity history. Segmentation and per-user activity histories are the building blocks of most engagement analytics, and these functions bake them into JQL as fundamental concepts. We also added built-ins for common aggregations. Returning to our original litmus test of writing queries for our own reports, we found that using JQL resulted in terse, understandable code.
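The two ideas, segmentation and per-user histories, can be sketched in plain JavaScript as follows. The groupBy helper below is a hypothetical stand-in written for this example; it is not JQL's groupBy or groupByUser, and its signature and the sample events are assumptions.

```javascript
// Hypothetical helper: groups items by a key and folds each group with a reducer.
// This only illustrates the concept; it is not Mixpanel's implementation.
function groupBy(items, keyFn, reducer, initial) {
  const groups = new Map();
  for (const item of items) {
    const key = keyFn(item);
    const acc = groups.has(key) ? groups.get(key) : initial;
    groups.set(key, reducer(acc, item));
  }
  return groups;
}

// Hypothetical sample events, already ordered by time.
const events = [
  { name: "signup", user: "a" },
  { name: "purchase", user: "a" },
  { name: "purchase", user: "b" },
  { name: "login", user: "b" },
];

// Segmentation: count events per event name.
const countsByName = groupBy(events, (e) => e.name, (count) => count + 1, 0);

// Per-user history: collect each user's event names in order.
const historyByUser = groupBy(
  events,
  (e) => e.user,
  (names, e) => [...names, e.name],
  []
);

console.log(Object.fromEntries(countsByName)); // { signup: 1, purchase: 2, login: 1 }
console.log(Object.fromEntries(historyByUser)); // { a: [ 'signup', 'purchase' ], b: [ 'purchase', 'login' ] }
```

Grouping by user is just grouping with a particular key, but giving it a dedicated name keeps per-user analysis front and center, which matches the emphasis described above.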
What happens next?
From the beginning, JQL was intended to be used both by customers and our own teams.
Internally, among many use cases, JQL offers exciting possibilities for our engineering efficiency. By having a generic, flexible interface to Arb, we will be able to iterate on new products and features quickly. Our intent is therefore to continue to invest in the language. Current plans include another round of optimizations and the addition of a couple more primitives.
As we add these, we hope to make them composable, orthogonal features that interact predictably. We have laid a powerful, flexible foundation, which we will continue to extend. As we move forward, JQL will serve as the internal platform for all of our standard reports and tools.