No good data goes unpunished - Mixpanel

No good data goes unpunished

Parker Tarun

Every so often, an email will appear in the inbox: “I have a good job. I’m ostensibly successful. But I feel empty inside.”

The recipient of these emails can relate. Eric Liu, Paul Duan, and Everett Wetchler founded Bayes Impact after their high-tech gigs had stopped answering meaningful questions.

Since 2014, the nonprofit has recruited several full-time engineers and data scientists to its cause: solving the world’s most intractable problems through an ambitious mix of data science and software.

“Not to knock anyone who gets a lot of satisfaction from working in a for-profit job,” Everett begins. “I worked for Google. I worked with wonderful, kind, brilliant people on fascinating scientific problems.”

Although he sounds sincere, one can hear the but… to follow. He too came from a good job. He too was ostensibly successful. He too was feeling a tad empty inside.

“I do my things, I come home, I put the pile of money in my bank account, and then I wash up and try to feel good somewhere else. I had gotten jaded working on general, for-profit abstract problems.”

So he got off-campus. He met with co-founder Paul on a Monday, said no to a competing offer on a Tuesday, and joined the nonprofit as CTO on a Wednesday, just as it was entering Y Combinator.

Bayes is housed in the rear warrens of the Zynga building, an ugly spaceship that runaway speculation has crash-landed in the middle of San Francisco’s South of Market district. The leased-out fuselage stands as a monument to overvaluation. In contrast to it, Bayes’ coworking space is crowded, wire-tangled, and booby-trapped with the phone-charging pods developed by its co-tenant, Doblet.

But while their office is tight, their sphere is lofty. To get a sense of Bayes’ ambition, look no further than its breadth of adversaries—fraud, hospital readmission, police brutality, criminal justice. It is more or less Bono’s shortlist of “Bad things I’m tasked with solving.” And as with Bono, there’s real risk of importing a solution that flies over a problem instead of addressing it.

So, we asked Everett: When you quit your sexy tech gig and apply your sexy tech skills to the world’s most intractable problems, how do we know you’re for real?

“Some of these are pretty straightforward,” Everett says. “There are also areas we’ve looked into a lot and decided not to touch. There’s a lot of hairy moral problems that some of these come to be.”

Like anyone building a product, Bayes has to empathize with its end-user. Unlike most PMs, however, the choices they make could save someone’s life, hold a government accountable, or rob years of a person’s freedom. Data for good can be an impossible balance to strike.

Do things get better?

Some time has passed since Statistics 100, but tearing down the cobwebs, I can tell you a little about Bayes’ Theorem. Basically, the theorem lets us revise our estimate of the probability of A once we learn that B is true, based on the relationship we believe exists between A and B.

How likely is a person in a set to have cancer? For simplicity’s sake, let’s assume the probability of this to be 1%, with no other conditions, because 1% of the population has cancer. But if we also include that the person is 63, understanding there to be a relationship between age and incidence of cancer, we can then re-assess the probability as being higher.
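To make the update concrete, here is a minimal sketch of Bayes’ Theorem in Python. The 1% prior comes from the example above; the two age-related likelihoods are hypothetical numbers chosen only for illustration, not real epidemiological figures.

```python
def posterior(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    # Total probability of observing B, with or without A.
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b


PRIOR_CANCER = 0.01  # from the text: 1% of the population has cancer

# Hypothetical likelihoods (assumptions for illustration only):
# the share of people who are 63 among those with and without cancer.
P_AGE_63_GIVEN_CANCER = 0.04
P_AGE_63_GIVEN_NO_CANCER = 0.01

p = posterior(PRIOR_CANCER, P_AGE_63_GIVEN_CANCER, P_AGE_63_GIVEN_NO_CANCER)
print(f"P(cancer | age 63) = {p:.3f}")  # prints 0.039 under these assumptions
```

Under these made-up inputs, knowing the person’s age moves the estimate from 1% to roughly 3.9%. The direction of the change is the point; the size depends entirely on the assumed likelihoods.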

While this theorem might be an axiom for any quant, it has particular relevance to Bayes Impact. When they approach a problem, they want to understand any and all real-world conditions involved. Unlike Bono, they don’t want to provide a flyover solution.

“We read Wikipedia, research articles, white papers, figure out who’s working on this space, what researchers are actually publishing, what technical work has been done,” Everett says. “Are there other nonprofits in this space that are already working on it? What’s the scope of the problem? How big is it? How many people does it affect? How much human suffering does it generate?”

From there, Bayes partners with an institution doing good work around an issue. Take Zidisha, a peer-to-peer microcredit firm, which connects lenders in the United States directly with borrowers in Asia and Africa. Their value prop? Cutting out the intermediary. The tradeoff of being peer-to-peer? Accrediting borrowers and preventing fraud.

So Bayes’ data scientists collected and scrubbed Zidisha’s existing data: repayment behavior, borrower applications, networks between borrowers, and more. They built predictive models and then fleshed out a plot that would clarify Zidisha’s primary trade-off: for X amount of fraud prevented, how many honest borrowers would Zidisha potentially block? To get their findings operational, the engineers then wrote a Python script that helped Zidisha live-score its applicants.
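The trade-off can be sketched in a few lines of Python. This is not Bayes’ actual script, which the article does not show. The scores and fraud labels below are invented, and `tradeoff` is a hypothetical helper that counts what a given rejection threshold would block.

```python
from typing import List, Tuple

# Hypothetical (fraud_score, is_fraud) pairs. A real model would produce the
# scores; the labels would come from historical repayment data.
SCORED: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, False), (0.30, True), (0.20, False),
    (0.10, False), (0.05, False),
]


def tradeoff(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (fraud blocked, honest borrowers blocked) when every
    applicant scoring at or above the threshold is rejected."""
    fraud_blocked = sum(1 for score, fraud in scored if score >= threshold and fraud)
    honest_blocked = sum(1 for score, fraud in scored if score >= threshold and not fraud)
    return fraud_blocked, honest_blocked


for t in (0.9, 0.7, 0.3):
    fraud, honest = tradeoff(SCORED, t)
    print(f"threshold {t:.1f}: blocks {fraud} fraudulent, {honest} honest")
```

On this toy data, lowering the threshold from 0.9 to 0.3 catches all four fraudulent applicants instead of two, but it also turns away three honest borrowers. Plotting those pairs across many thresholds gives the kind of curve the article describes.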

But with shining examples like this, it’s easy to develop misconceptions about the efficacy of data for good. Lest open-source culture lead us into simple fixes, Everett has a warning: “You don’t just release a form to collect data and things get better.”

Data points are people, too

According to Medicare’s data, 15.65% of patients admitted to a hospital in Northern California for some type of medical treatment are readmitted within 30 days of discharge. This number is marginally better than the national average (15.9%). It’s hardly an encouraging picture of avoidable risk.

As the largest nonprofit hospital system in the region, Sutter Health has the largest potential impact for lowering readmittance. Sutter was already categorizing patients based on this kind of risk, but there were weaknesses that machine learning could improve.

Bayes is constructing a predictive model that scores patients and identifies who’s at-risk of having complications after release. As is customary with their projects, the team is concurrently prototyping software for clinical application.

“If the hospital has enough resources to do extra follow-up care to 10% of their patients, for example, maybe we can give them a better way to identify which 10% are good.”
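As a rough illustration of that idea, the sketch below ranks patients by a risk score and keeps as many as the follow-up budget allows. The patient IDs, scores, and the `select_for_follow_up` helper are all hypothetical; the article does not describe how the real model or software works.

```python
from typing import Dict, List


def select_for_follow_up(risk_by_patient: Dict[str, float], capacity_percent: int) -> List[str]:
    """Return the IDs of the highest-risk patients that fit within capacity."""
    n = len(risk_by_patient) * capacity_percent // 100
    # Sort patient IDs by their risk score, highest first.
    ranked = sorted(risk_by_patient, key=risk_by_patient.get, reverse=True)
    return ranked[:n]


# Hypothetical readmission risk scores, as a model might output them.
scores = {
    "p01": 0.12, "p02": 0.81, "p03": 0.33, "p04": 0.07, "p05": 0.64,
    "p06": 0.25, "p07": 0.91, "p08": 0.18, "p09": 0.47, "p10": 0.05,
}

print(select_for_follow_up(scores, 10))  # ['p07']
print(select_for_follow_up(scores, 30))  # ['p07', 'p02', 'p05']
```

The ranking only says who is most likely to be readmitted. It says nothing about whether extra follow-up would actually help those patients.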

But after tagging those patients, then what? Bayes hasn’t just promised a plot or an algorithm. It’s promised an end-to-end solution. Analysis alone isn’t enough, nor is programming.

“If I order everyone by how risky they are, what do you do?” Everett asks. “Do you call some of them more often? Maybe the people who are really high-risk don’t respond well to that treatment.”

The question then becomes: Should someone intervene? Can someone intervene? With which patients? How does this data fuel an actionable product?

“We have to actually think all the way through to ‘At what point do people get better?’ and not just stop at the prediction number being accurate.”

Accuracy remains a top priority. After all, it’s what distinguishes a data-driven approach from a gut check. Increasingly, however, Bayes finds accuracy isn’t an end in itself. Metrics build a case; they don’t close it.

When facing entrenched social problems, product design has an obligation to be more than just accurate. Otherwise, data for good can be inadvertently bad.

More than accuracy

Someone in the California Attorney General’s Office is very excited. Maybe it’s desperation. Maybe the never-ending news cycle of police brutality has driven Sacramento to its wit’s end. Or perhaps, in Bayes, the state has found a real solution.

Recently, California re-upped its reporting standards for use-of-force incidents. While altercations used to only go on file if they resulted in death, new statutes mandate that law enforcement record any incident in which a civilian is seriously injured. And if that’s not enough to ignite conversation from within, someone else will do it from the outside: this data is going to be a matter of public record.

The bureaucrats beheld their Frankensteinian mass of Excel spreadsheets, PDFs, hard copies, and non-standard electronic format documents, and they knew they needed something sleeker. So Bayes stepped in to build a product that will collect incident reports consistently and integrate all of them into a database.

On the reporting side, the tool is both user-friendly and detailed. No department should have trouble with it. On the integrative end, the hope is that analysts will be able to gain insight with similar clarity and detail. But Bayes knows there has to be more. Even the right numbers with the wrong spin will cause more distortion than clarification. There’s tremendous risk here. People’s careers and lives are on the line.

“If I show that two identically-sized police departments each have 100 officers, and one of them has twice as many use-of-force incidents, people’s default reaction is to point to that department and say they’re bad,” Everett says. “The reality is that they may have a very different population with more violent crime, where more force is flatly necessary in the line of duty.”

More attentive readers will observe a certain irony in this. The goal is transparency, and the method is cold hard analytics. But the state would like the data to show one thing, the police something similar, and the activists something else. Meanwhile, Everett and his team are trying to arbitrate the political crossfire. He doesn’t feel comfortable leaving the numbers alone, and he shouldn’t.

“Simple metrics can bait you to make the wrong conclusion effortlessly,” Everett says. “Even though we generally believe releasing more information is good, we want to also release contextual information to help people interpret the numbers shown.”
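Everett’s two-department example can be sketched with invented numbers. In the Python below, every figure is hypothetical, and `violent_crimes` stands in for whatever contextual data a real release might include. It is an assumption for illustration, not a field from the actual tool.

```python
from dataclasses import dataclass


@dataclass
class Department:
    name: str
    officers: int
    force_incidents: int
    violent_crimes: int  # hypothetical context field

    def incidents_per_officer(self) -> float:
        return self.force_incidents / self.officers

    def incidents_per_100_violent_crimes(self) -> float:
        return 100 * self.force_incidents / self.violent_crimes


# Two same-sized departments; B reports twice as many use-of-force incidents.
a = Department("A", officers=100, force_incidents=20, violent_crimes=400)
b = Department("B", officers=100, force_incidents=40, violent_crimes=1600)

for d in (a, b):
    print(
        f"{d.name}: {d.incidents_per_officer():.2f} per officer, "
        f"{d.incidents_per_100_violent_crimes():.1f} per 100 violent crimes"
    )
```

Per officer, department B looks twice as forceful (0.40 versus 0.20). Per 100 violent crimes, it looks half as forceful (2.5 versus 5.0). Neither denominator is the right one by default, and choosing which context to show is itself a product decision.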

It’s taken a minute for technical recruiters to define “data science,” but they’re beginning to agree: Good data science is equal parts statistics, programming, and product management. Why is this so revelatory? Because that third piece allows for context.

Context is a part of Bayes’ products. Even if releasing numbers blindly were right in the name of transparency, it would be shitty product design. Or, as the CTO puts it: “Unless people are better off, we didn’t do our job.”

Every project must lead to conscionable reform. And if they somehow fail to do so, there’s protocol for that, too.

In case of emergency

In theory, the American criminal justice system is supposed to be restorative. With the exception of a life sentence, the intent is to rehabilitate prisoners before eventual release. This is the idea, anyway.

Sentencing, in practice, has failed. The rate of recidivism in the United States is high. Within five years of release, 76.6% of former convicts are rearrested. During Bayes’ research phase, these facts qualified as unacceptable human suffering. It was the kind of thorny, high-impact problem they had sought.

It began with a hypothesis: jail time isn’t closely targeted enough. The information that goes into each convict’s sentencing decision is overly broad.

“You’re convicted of burglary, therefore you get X-to-Y amount of years, and it’s up to the judge to give you a sentence in that range,” Everett explains. “But it’s possible that giving someone five years instead of three won’t deter him anymore, in which case you’re wasting two years of his life and taxpayer resources.”

There could be more precision in prison.

By collecting more information on convicts and plugging this into a predictive model, you might be able to determine who should get three years and who should get five. The way the nonprofit saw it, predictive analytics was the best shot at a truly differential sentencing process. Drawing on a more complex profile of convicts (What level of education does the convict have? What is their parents’ criminal history?), the model would quantify who “benefits” from additional jail time and who doesn’t.

And so the team became deeply invested in making criminal justice less punitive. More restorative. They might even crank that 76.6% to a more palatable number. But, after a considerable stretch of time, Bayes came to a conclusion: they had to do nothing.

It wasn’t something a model told them to do. The models, to the extent that they were built, were successful. But then the humans asked, “Is this solution doing what it’s supposed to do?” From an analytical standpoint, yes, it was predicting. From a programmatic one, yes, it was differentiating. But as far as product design, it failed. Even if analyzing noncriminal behavior somehow lowered Total Human Suffering, it still might violate any number of human rights.

“The idea that one person would get more time in jail than the other based on something other than their own criminal history feels wrong to me,” Everett says. “I don’t want to contribute to whatever that is, even if it objectively decreases some number of crimes committed.”

Number of committed crimes, decreased, seems like a golden metric. For a lot of institutions, moving the needle there would constitute unequivocal success. But for every life improved, there was the risk of disenfranchising another, based on data Bayes had deemed significant. And ultimately, people deserve something data points by necessity don’t get: redemption.

It was too difficult to guarantee that people would be “better off” than they are in the current system. In many cases, they could be worse off.

Still, many startups don’t have the luxury of hitting a killswitch. Their KPIs answer to a KPI more neutral than “better off”: profit. So, they move fast and break things. They ship when something is halfway operational. If you can show a model works, it’s reason enough for it to work.

But product ethos matters. It’s the only thing accounting for humanity in the number crunch. It’s something anybody changing the world must think about.

Product ethos

Software may be eating the world, but it would do well to know what goes in its body.

Not every solution will go toe-to-toe with moral quandaries. For the time being, we are spared the LinkedIn title of “Product Philosopher.” But there will come a time when your product, your software, or even your data, will come into conflict with impact. And yet impact—moral or otherwise—is everything to users.

In keeping with this publication’s reverence for the QBQ, it can be said that the question behind the question at Bayes is, “Are people better off?” This is a slightly different question from more easily measured ones, such as “Did we gather valuable insight about a challenging social problem?” or “Can we make X go up/down?”

The former considers impact, while the latter ones consider functionality. An engineering mindset may be tempted to prioritize highly-functioning solutions over impactful ones—force of habit. But we should always be returning to the question behind our questions. Impact should come before function, lest software colonize the places it wants to disrupt.

And for every hazard left on the cutting room floor, there still might be a highly-functional high-impact product on the way. Bayes Impact is building toward massive victories in unemployment, ambulance dispatch, money laundering, neurodegenerative diseases, opioid abuse, and the list goes on.

But none of the engineers pretend they’re philosophers or that they’ve mastered applied ethics. They are always returning to impact, the question behind their questions.

“There’s no way to write algorithms that affect human lives without creating unease,” Everett says. “The question is not necessarily, ‘Is this perfect?’ but, ‘Is this better?’”

Background imagery by mattbuck, Denis Barthel, and the NASA Goddard Space Flight Center, made available under an Attribution 2.0 Generic license.
