Geolocation Error on 7/9/2015
Incident Summary:
No geolocation was performed on events and people requests from 5:00 PM PST on 7/8/2015 to 1:30 AM PST on 7/9/2015. This affected all customers.
Timeline:
On 7/8/2015 at approximately 5:00 PM PST, a change was deployed to consumers that migrated our geolocation system from a Python module to a C replacement as part of our systems optimization plan. From that point on, every geolocation lookup against the MaxMind DB failed. At around 1:00 AM PST on 7/9/2015, Mixpanel began reverting the change, and by 1:30 AM all consumers were back on the Python MaxMind DB reader, which resolved the issue.
Root Cause:
The C replacement turned out not to be a drop-in replacement for the Python module: it raised a TypeError whenever it was passed a value other than a string. We typically pass IPs along in the format we receive them, which is sometimes a long (the integer representation of the dotted-decimal address).
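One way to make a string-only reader safe for these inputs is to normalize every IP to its canonical string form before the lookup. The sketch below uses Python 3's standard `ipaddress` module (Python 3's `int` covers the old Python 2 `long`); the function name `normalize_ip` is hypothetical and is not taken from our codebase or from the MaxMind library.

```python
import ipaddress
from typing import Union


def normalize_ip(value: Union[str, int]) -> str:
    """Return a canonical IP string suitable for a string-only reader.

    Hypothetical helper: accepts dotted-decimal/IPv6 strings or integers.
    """
    # bool is a subclass of int; reject it so True is not read as 0.0.0.1.
    if isinstance(value, bool):
        raise ValueError(f"not an IP address: {value!r}")
    if isinstance(value, int):
        # ip_address() treats integers below 2**32 as IPv4, larger ones as IPv6.
        return str(ipaddress.ip_address(value))
    if isinstance(value, str):
        # Invalid strings raise ValueError rather than silently passing through.
        return str(ipaddress.ip_address(value.strip()))
    raise ValueError(f"unsupported IP type: {type(value).__name__}")
```

For example, `normalize_ip(3232235777)` returns `"192.168.1.1"`, the same address a customer might have sent as a dotted-decimal string.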
Because of an earlier, unrelated issue with unparsable IP addresses sent by some customers, we had built protection logic that ignored TypeErrors raised by the IP utility subsystem (which handles geolocation, parsing, etc.). That same logic also swallowed the TypeErrors raised by the C MaxMind module, so the failures never surfaced. This unfortunate interaction is why we did not notice the issue for 8 hours.
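A safer variant of such protection logic still tolerates bad input but records every suppressed error, so that a sudden jump in the suppression rate can trigger an alert. The following is a minimal sketch under that assumption; `SuppressedErrorCounter`, `geolocate_or_none`, the `lookup` callable, and the alert threshold are all hypothetical names and values, not our actual implementation.

```python
import logging
from collections import Counter
from typing import Any, Callable, Optional

log = logging.getLogger("iputil")


class SuppressedErrorCounter:
    """Tracks how many lookups succeeded versus had their TypeError suppressed."""

    def __init__(self, alert_threshold: float) -> None:
        self.alert_threshold = alert_threshold  # hypothetical fraction, e.g. 0.01
        self.counts: Counter = Counter()

    def record(self, outcome: str) -> None:
        self.counts[outcome] += 1

    def error_rate(self) -> float:
        total = sum(self.counts.values())
        return self.counts["suppressed"] / total if total else 0.0

    def should_alert(self) -> bool:
        return self.error_rate() > self.alert_threshold


def geolocate_or_none(
    lookup: Callable[[Any], Any], ip: Any, counter: SuppressedErrorCounter
) -> Optional[Any]:
    """Call lookup(ip); on TypeError, log and count instead of silently ignoring."""
    try:
        result = lookup(ip)
    except TypeError:
        counter.record("suppressed")
        log.warning("suppressed TypeError for ip=%r", ip, exc_info=True)
        return None
    counter.record("ok")
    return result
```

The per-request behavior is unchanged (bad input still yields no location), but the suppression rate becomes something a monitoring system can watch.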
Plan to Prevent Recurrence:
We have made, or are making, the following changes to our infrastructure and alerting systems:
- We first rolled back the update so our codebase used the Python module again.
- We have since moved to the C module successfully, and it now accepts all values the Python module accepted.
- We will build a new testing routine to catch errors that are silently dropped by other error-handling logic.
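One form such a testing routine could take is a differential test: feed the old and new readers the same corpus of real-world inputs and report any input where their outcomes differ, including one raising an exception where the other did not. The sketch below assumes both readers can be wrapped as plain callables; `diff_readers` and `_call` are hypothetical names.

```python
from typing import Any, Callable, Iterable, List, Tuple

Outcome = Tuple[str, Any]


def _call(fn: Callable[[Any], Any], value: Any) -> Outcome:
    """Run fn(value), turning any exception into a comparable outcome."""
    try:
        return ("ok", fn(value))
    except Exception as exc:  # deliberately broad: surface errors, do not drop them
        return ("error", type(exc).__name__)


def diff_readers(
    old: Callable[[Any], Any], new: Callable[[Any], Any], inputs: Iterable[Any]
) -> List[Tuple[Any, Outcome, Outcome]]:
    """Return (input, old_outcome, new_outcome) for every input where they differ."""
    mismatches = []
    for value in inputs:
        old_out = _call(old, value)
        new_out = _call(new, value)
        if old_out != new_out:
            mismatches.append((value, old_out, new_out))
    return mismatches
```

Run against a sample of the IP values we actually receive, a non-empty result would have flagged the long-typed inputs before the C module reached production.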