Maintenance issues and an explanation
Update: We expect to be fully caught up within the next day to day and a half.
Update (Feb 4th, 2:50 AM): Seeing good stability across new infrastructure and we’re crunching data a lot faster.
Update (Feb 4th, 7:35 PM): Processing through the backlog of data. We’re about 1 day behind; however, you should at least see data from before Feb 3rd 17:00:00 UTC.
Update (Feb 6th, 11:56 PM): We are now back to real-time and all systems are stable.
For the past few days many of our customers must have been wondering what’s
been going on with Mixpanel and why they have been seeing a lot of red
percentages and zeroes. The truth is a combination of factors stemming from
simply trying to scale to the large volume of data that we process in
real-time. It’s never been easy, and at each step we have to get innovative
and diagnose problems. The good news is, we’ve been getting better at this as
time goes on.
We diagnosed our problems about 1.5 months ago and had a plan of action to
scale parts of our system out again in the smoothest way possible. The goal
was not just to scale out the system and improve performance by an order of
magnitude, but to do so in a manner in which our customers wouldn’t even
notice. We certainly failed at the latter goal. The problem was that we were
forced to roll out the new system sooner than expected to handle scale, which
didn’t leave enough time to thoroughly vet some of the systems and issues we
would have been able to work through under less stress.
If we’ve lost you as a customer or you’re uneasy about doing your analytics
with us, I would ask that you reconsider. When things like this happen you can
be sure we’re getting the least amount of sleep physically possible (sometimes
none) until we fix the problems that hurt our reputation as a company. Also,
we’ve reached a huge milestone with the new infrastructure we’ve rolled out,
to the point where we can focus on creating a reliable service capable of
handling an extremely large data volume in real-time. In our opinion, things
will only get better from this point, and we’re the kind of startup that tends
to rise to the challenge no matter what.
If we’ve learned anything from this process, it’s to be much more proactive
and to provide lots of status updates about what’s going on, since we know you
rely on us for your business day to day. We’re sorry for not doing that
better. If there’s anything else you wish to add, please leave us a comment or
email us at firstname.lastname@example.org.
To address any of your concerns right away:
- There was no data loss.
- We’re working through a backlog of data that could take an additional 1-1.5 days before we get to-the-second real-time again.
- Retention tables/charts have been reset; this was a planned side effect, not a result of the problems we’ve been having.
- You should see the data that has been missing from the past days or hours steadily appearing in your report pages as it gets crunched.
We sincerely apologize for the problems we’ve had over the past few days, and
we’ll be working on ways to prevent things like this from happening again.