How Fitbit’s data science team scales machine learning

Around the time Raj Bhan injured himself, he decided to part ways with the data team at Netflix. He’d been training for a half marathon and plucked a regimen from the web. Unfortunately, he had grouped himself into the wrong bucket and hurt his leg.

“That struck a chord,” he says. “People can suffer adverse effects from being in the wrong program at the wrong time. Fitness programs really need to be personalized.”

There’s a poetry to Raj’s injury, though. He now specializes in creating these programs (sort of). Raj heads up a data science team at Fitbit, the consumer technology company most famous for its stylish wearables.

Fitbit devices ingest users’ fitness and workout data and report it back through the company’s suite of apps. In the case of personal training tools like Fitstar, Fitbit’s apps can even make fine-tuned fitness recommendations. And that’s no small thing.

“Now that Fitbit trackers are ubiquitous in the market and we’re capturing data from millions of individuals, we are leveraging machine learning to provide smart guidance as part of a personalized experience,” Raj says.

In retrospect, the fitness data that a Fitbit logs probably could’ve helped Raj assess himself better than a standardized regimen. But even then he probably would’ve needed a more personalized service. Now, in a What if…-style twist, the algorithm that Raj’s team has built is the exact thing that could’ve ramped him up for his race.

It’s 2017. Machine learning isn’t a vanity project for data teams. Instead, it’s become one of the biggest providers of value. Teams with the resources available need to identify the points in their product where machine learning can provide powerful leverage and innovate there specifically.

Fitbit’s good problem

Fitbit’s product has a very good problem. The amount of data it tracks can be overwhelming. The devices ingest fitness data and feed it into the applications, where it’s paired with data from the user’s app interactions. From these data sets, the data science team can build a comprehensive user profile. There are occupational hazards, though. What you gain in information, you can lose in clarity.

“The volume of data does make it challenging,” Raj says. “We have to make sure that we’re scaling both our hardware and our ETL processes. Storage is essentially a solved problem, so we focus our efforts towards compute time and processing.”

Theoretically, sampling data sets would be a safe bet for a model. After all, Fitbit takes in a lot of data. But Fitbit’s product is user-centric. So long as it’s trying to build something precise to each user’s profile, there’s danger in recklessly sampling.

“We never know what we’re going to need later,” Raj says. “If we’re sampling 30%, there’s a potential for losing 70% of the things that are happening. In that case, our model can only be as good as what 30% of the data tells us.”

A 30% sample could be directionally correct, but so is a regimen pulled off the web.
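To make the sampling trade-off concrete, here is a minimal sketch in Python. It is an illustration only, not Fitbit’s pipeline: the event log, the user counts, and the helper names `sample_events` and `events_per_user` are all hypothetical. It keeps each logged event with 30% probability and then checks what is left of each user’s history.

```python
import random


def sample_events(events, fraction, seed=0):
    """Keep each event independently with probability `fraction`."""
    rng = random.Random(seed)
    return [event for event in events if rng.random() < fraction]


def events_per_user(events):
    """Count how many events each user has in the given log."""
    counts = {}
    for user_id, _activity in events:
        counts[user_id] = counts.get(user_id, 0) + 1
    return counts


# Hypothetical log: 1,000 users with 10 logged workouts each.
events = [(user_id, "workout") for user_id in range(1000) for _ in range(10)]
sampled = sample_events(events, fraction=0.3)

full_counts = events_per_user(events)
sampled_counts = events_per_user(sampled)

kept_share = len(sampled) / len(events)
users_missing_entirely = len(full_counts) - len(sampled_counts)

print(f"events kept: {kept_share:.0%}")
print(f"users with no events left: {users_missing_entirely}")
```

The aggregate picture survives the sample, but individual histories thin out, and some users can disappear from the data altogether. For a model that is supposed to fit each user, that is the loss Raj is describing.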

“We’re always asking, what more can we do with user data?” Raj says. “Offering more in the way of personalization and guidance is what we’re striving for.”

Easier said than done. The wealth of fitness data adds a challenging new dimension to the algorithms they put in front of users.

But Raj’s team believes device data will provide a noticeably better experience, so they have been leveraging more and more of it in their machine learning experiments. And they’re confident it’ll pay off.

The last mile

Although machine learning should be a core competency of any data team, building models shouldn’t be all anyone ever does. That’s just not strategic. Part of making the most effective use of machine learning is knowing what can be repurposed, where innovation is necessary, and where the greatest amount of value can be extracted.

“The need to have people just cranking away at algorithms in house is being diminished,” Raj says. “Many third party companies out there provide these one-size-fits-all solutions that are good for probably 80% of solutions, right out of the box. If you really want to hone in and get that last 20%—to get that last mile—that’s where you need people in-house to work on those kinds of problems.”

Running that last mile can be significant. That’s why it pays to do it in the right direction.

Few would disagree that personalization is the future of consumer technology. The question is always to what extent.

The original Fitstar algorithm took post-workout feedback and rejiggered the intensity of the next workout accordingly. Users would input their feedback, Goldilocks-style: This was too hard, too easy, just right. The personal training app could, for example, scale the number of push-ups up or down. Incrementally improving the product like this, the data team saw jumps in engagement.
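A feedback loop of this kind can be sketched in a few lines of Python. This is a guess at the general shape, not the actual Fitstar algorithm: the 10% step size, the `ADJUSTMENT` table, and the function name `next_reps` are assumptions made for illustration.

```python
# Hypothetical step sizes: the article does not say how much the
# intensity changes in response to each kind of feedback.
ADJUSTMENT = {"too hard": -0.10, "just right": 0.0, "too easy": 0.10}


def next_reps(current_reps, feedback, minimum=1):
    """Scale the rep count for the next workout based on user feedback."""
    if feedback not in ADJUSTMENT:
        raise ValueError(f"unknown feedback: {feedback!r}")
    scaled = round(current_reps * (1 + ADJUSTMENT[feedback]))
    # Never drop below a floor, so the exercise stays in the workout.
    return max(minimum, scaled)


for feedback in ("too hard", "just right", "too easy"):
    print(feedback, "->", next_reps(20, feedback), "push-ups next time")
```

The only input here is what the user reports after the workout, which is the limitation the next iteration set out to address.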

But was it the best value the team could provide for all the fitness data a user was trading? Raj thought no.

He knew the next big algorithm his team invested in would have to make ever more significant strides in the arena of personalization. If his team was going to run that last mile, users had better feel it, too. So, they turned to the greatest and most difficult resource at their disposal: all that fitness data in user devices.

A model for every user

“What we’ve done with the latest iteration is truly integrated Fitstar data with Fitbit device data,” he says. “So, whether users have a proclivity toward cycling or running or using the elliptical or hiking, Fitbit automatically tracks those preferences and uses them to generate a custom workout for the user.”

Instead of relying solely on what users report to the app, the app makes its calls based on what the Fitbit device itself is saying. If you go cycling a lot, the algorithm will pick up on that signal and create leg-intensive workouts for you. But if you’d just gone cycling the day before, the algorithm knows you might need recovery time and will create an upper-body workout to give your legs some rest. It takes cognitive load off the user while building a more personalized workout.
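The cycling example can be written down as a simple rule, sketched here in Python. The article describes a machine learning model, so treat this as a hand-written stand-in for the behavior rather than the model itself. The `LEG_ACTIVITIES` set, the one-day recovery window, the "full body" fallback, and the function name `choose_focus` are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: these tracked activities are treated as leg-dominant.
LEG_ACTIVITIES = {"cycling", "running", "elliptical", "hiking"}


def choose_focus(activity_log, today):
    """Pick a workout focus from a list of (date, activity) records."""
    yesterday = today - timedelta(days=1)
    # Recovery rule: leg-heavy activity yesterday means rest the legs today.
    if any(day == yesterday and activity in LEG_ACTIVITIES
           for day, activity in activity_log):
        return "upper body"
    counts = Counter(activity for _day, activity in activity_log)
    if not counts:
        return "full body"  # no device data yet, so no preference to follow
    favorite, _count = counts.most_common(1)[0]
    return "legs" if favorite in LEG_ACTIVITIES else "full body"


today = date(2017, 6, 15)
frequent_cyclist = [(today - timedelta(days=d), "cycling") for d in (3, 5, 7)]
rode_yesterday = frequent_cyclist + [(today - timedelta(days=1), "cycling")]

print(choose_focus(frequent_cyclist, today))
print(choose_focus(rode_yesterday, today))
```

The user never has to report any of this. Both decisions come from what the device has already recorded.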

Raj’s early misadventure in an unpersonalized regimen had taught him the dangers of directional correctness. The original Fitstar algorithm solved for this in part, but the newest algorithm represents a truer investment in personalization. It’s not asking you to sort yourself into a bucket, or even recommending buckets with precision. It’s creating one uniquely suited for you.

“At the end of the day, this is where the real value of Fitbit is, in trying to provide this sort of motivation and fitness-changing experience,” Raj says. “We want to provide insights, but as we move in this direction, we want to provide guidance. Between those two things, Fitbit is priming itself to succeed where other fitness trackers and companies like that are just providing the data.”

Even data teams as well-resourced as Fitbit’s have to choose wisely: Innovate everywhere and your customers may never notice. Choose a meaningful place to build every once in a while and customers will feel the difference in your product.

Despite its massive scale, Fitbit is creating a product experience that maps to each individual user, and machine learning is the most effective way of utilizing the data at its disposal.

“Everybody’s different and everybody’s got different goals,” Raj says. “That’s where the machine learning becomes very important and we try to build a model for every specific user.”

From personal experience, he knows how hard and how vital this is. So he’s focused his team’s efforts on the right projects. Although the product is even smarter and more powerful than when Raj joined, his goal is still the same—to get every Fitbit user in the right program at the right time.