This post is an update regarding the data loss incident we had a month ago . The purpose is to describe what we are doing to prevent similar failures in the future.
On February 22, 2016, we had a data loss incident that resulted in 9% of our customers losing their events from February 22 or 23, 2016, depending on the customer’s time zone. For more than half of those affected, we lost less than 1% of events from that day. However, in many cases it was more, and in some cases, it was the entire day’s worth of events. It was our worst instance of data loss in four years and a very stressful, disruptive experience for our affected customers, the Mixpanel engineering team, and the rest of the company.
The incident raised two fundamental questions. First, how did a bug get rolled out to all production data servers without being detected? Second, given that bugs will ha...
March 31, 2016