Is Machine Learning the Big Bad Wolf of Media? Whisper doesn’t think so.
“We believe that happiness starts with being your real self,” says Ulas Bardak, CTO of Whisper. “That’s what sets us apart from other platforms. We have a strong focus on authenticity.”
It’s not often you hear a chief technology officer focus on the way he wants his users to feel. CTOs typically stay focused on technical matters, like data and machine learning. But when you’re engineering a social platform for north of 30 million monthly active users, how an algorithm shapes a safe and inclusive network is inextricably tied to Whisper’s purpose.
Whisper, a media company based in Los Angeles’ “Silicon Beach,” has the largest online platform where people share honest feelings and find connection over shared topics – without identities or profiles. On their app, users type out a thought or confession and build dynamic memes with emotive backgrounds and video clips with the help of Whisper’s image and video matching technology.
With well over half a million “Whispers” created daily, Whisper’s in-house editorial team is given the necessary building blocks to package and curate some of the most clickable, vulnerable, and engaging stories on the web.
But as it turns out, Whisper’s intention to help people be their “real self” is a mission with a machine behind it. Machine learning, that is.
From its content creation strategy to its data science, engineering, and product teams, machine learning is foundational to Whisper’s entire operation. Its algorithms and deep learning initiatives have not only helped Whisper compete for eyeballs against Facebook, Snapchat, Tumblr, and other social and news networks, they’ve also helped Whisper scale to millions of diverse users across platforms, and build out expressive, online communities along the way.
To Whisper, machine learning is not The Big Bad Wolf of media, as many may fear, replacing editorial staff with robots. In fact, Whisper sees machine learning as the source of media’s reinvention. For them, machine learning is a tool that can work at the behest of its people-powered editorial teams and build a publication that reflects the communities it serves.
From its content moderation system to bug spotting and the hundreds of metrics that set engineering and product priorities, Ulas shares how Whisper is one of the few media companies experimenting with machine learning not to replace its editorial team, but to help it reach ambitious readership goals and foster close-knit communities.
The promises and pitfalls
In today’s attention economy, publications and online communities are looking to machine learning to help create, automate, and personalize individual reading experiences on a wide scale. For example, The New York Times uses data science to drive its readership strategy in tandem with their editorial staff. However, there’s no greater instance that showcases the promises and pitfalls of machine learning in media than what happened to Facebook in August 2016.
When The Social Network was rumored to be biased against conservative-leaning news, Facebook fired the human editors behind its Trending feature to ensure there was “no evidence of systematic bias” moving forward. When Facebook’s human editors were replaced by an algorithm, it brought to life journalism’s worst fear: that writers and editors will one day be replaced by software.
But as it turns out, Zuck made a bad choice. This algorithm was a shoddy one, and after 72 hours it was promoting fake and low-quality stories. It became clear that despite the advancements software has made in the past few years, some work cannot be done without human reasoning. Today, people are again involved with Trending in a few ways, as Ars Technica reports, doing basic, yet hugely important things like “confirming that a topic is tied to a current news event in the real world.”
Aside from being a political and tech kerfuffle, the Trending fiasco was also a lesson for the media industry at large. Sure, algorithms can help categorize content and provide recommendations at scale, but there are some decisions that cannot be made without human collaboration. This is a key reason why Whisper relies on a blended solution where machine learning collaborates side-by-side with its human moderation and editorial teams.
In order to encourage user-generated content, where the Internet divulges its secrets at massive scale, Whisper needed to develop an automated decision-making system to keep pace with its growing and active community, while simultaneously ensuring quality control over its content.
“When Whisper first started out, we had a human moderation team, culling and curating the Whisper posts as they were created,” Ulas says. However, as Whisper grew more popular and new content was created at a viral pace every day, it became obvious that a human team alone would never be able to keep up with all the content.
“Sure, we could continue to hire more people for the moderation team, but there was a more scalable solution,” Ulas says.
The Arbiter, one of the company’s many machine learning tools, deciphers what content is “safe”, what needs to be filtered out, and what needs to be reviewed by a human moderator. The Arbiter moderates 70% of the more than half a million Whispers created on a daily basis, leaving the remaining content to be reviewed by the moderation team. At that point, Whisper’s editorial team handpicks the most engaging Whispers and produces the thousands of articles and videos readers have come to love, which are then featured on the company’s website and social properties.
“This deep learning system helps us manage and categorize the hundreds of thousands of new Whispers created each day,” Ulas says. In the beginning, however, the Arbiter wasn’t as simple as setting up a few basic rules to follow and automate.
“Human language is a very complicated thing,” says Ulas. “It’s why Natural Language Processing exists, right? You can’t just write down 10 rules and hope to catch all these potentially problematic issues.”
What Ulas’ team learned in developing the Arbiter is that an algorithm for a media company is far more complicated than answering a series of simple yes or no questions. The team knew that if the Arbiter were programmed too aggressively, it would take out a lot of good content, putting at risk the high engagement numbers they needed to succeed. However, if the bar were too lenient, then Whisper ran the risk of letting unfit material be published and alienating users.
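The tension Ulas describes is the classic precision/recall tradeoff for a moderation classifier. Here is a toy, pure-Python illustration (not Whisper’s actual code, and the scores and labels are invented): sweep a removal threshold over a validation set labeled by human moderators and watch how an aggressive threshold protects precision while a lenient one protects recall.

```python
# Each item: (model score = probability the post is unsafe, human label).
# Hypothetical validation data for illustration only.
validation = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.10, False),
]

def filter_stats(threshold):
    """Precision/recall of the rule 'remove if score >= threshold'."""
    flagged = [label for score, label in validation if score >= threshold]
    truly_unsafe = sum(label for _, label in validation)
    if not flagged:
        return 1.0, 0.0
    precision = sum(flagged) / len(flagged)   # flagged posts that were unsafe
    recall = sum(flagged) / truly_unsafe      # unsafe posts that got flagged
    return precision, recall

for t in (0.25, 0.5, 0.75):
    p, r = filter_stats(t)
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}")
```

Lowering the threshold catches more unsafe posts (higher recall) but removes more good content along the way (lower precision), which is exactly the engagement risk the team was weighing.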
“We want to ensure that the content being created is safe in order to create a positive place where people can share their honest thoughts and feelings,” Ulas says. Considering many of Whisper’s submissions may be written with sarcasm or the need for greater cultural context, a human perspective was crucial.
“When moderating nuanced and somewhat confusing content, using a neural network was our best choice,” Ulas says. As he explains, a neural network in a computer is modeled after how the brain works. “In Whisper’s case, this network of connections spans millions of instances of human classified content, which we were then able to train the Arbiter off of,” he continues. “By the time we were programming the algorithm, we had enough training data from all the previous human moderators in order to employ deep learning techniques.”
If the Arbiter determines that no one should see a Whisper, it’s removed from the platform. But if the text is too complicated, and the algorithm can’t determine the right way to categorize it, the human moderation team weighs in.
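The routing described above can be sketched as a confidence band. This is a hypothetical simplification, not the Arbiter’s published logic: posts the model is confident are unsafe get removed, posts it is confident are safe go live, and the ambiguous middle goes to the human moderation team.

```python
def route_whisper(p_unsafe, remove_above=0.9, approve_below=0.1):
    """Three-way moderation decision from a model's unsafe-probability.

    Thresholds are illustrative; the real Arbiter's internals aren't public.
    """
    if p_unsafe >= remove_above:
        return "remove"          # confidently unsafe: no one sees it
    if p_unsafe <= approve_below:
        return "publish"         # confidently safe: published automatically
    return "human_review"        # ambiguous: moderation team weighs in

print(route_whisper(0.97))   # → remove
print(route_whisper(0.03))   # → publish
print(route_whisper(0.55))   # → human_review
```

Widening or narrowing the middle band is how a team would tune what share of content (30% in Whisper’s case) falls to human reviewers.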
“One thing about deep learning is that it works pretty well if you have enough good data. Usually, that’s like hundreds of thousands of instances to millions of instances,” says Ulas. “We had millions of Whispers labeled as good or bad by our human moderators. Based off of this data, we were able to build and experiment with different infrastructures of the Arbiter.”
Still, a rich data set is only the first step. Today, the Arbiter is used for content moderation, but in the future, Whisper envisions its machine learning technology playing a more active role in story and video creation.
While this level of automation is still in the works, machine learning at Whisper already goes well beyond the Arbiter, and that has been key to its growth. Much of Whisper’s success in scaling to more than 30 million monthly active users comes from treating data science as an integral layer of the entire operation. Beyond content moderation, algorithms also serve the product, engineering, and support teams.
Data as fuel
“Traditionally, data science would be its own little team that maybe provides services to the rest of the company,” says Ulas. “But in our case, we have dedicated data scientists focused on every feature of the product. They’re involved in the development of both the engineering infrastructure as well as whatever infrastructure they might need to build to support that.”
By structuring every team around data science resources, Whisper has made machine learning a fuel source for the whole company.
“Every week we have stability meetings where I sit down and look at the hundreds of metrics we track, from user engagement and infrastructure health metrics to user satisfaction data pulled directly from the App Store and Google Play,” Ulas says.
While sticking to the engineering and product priorities determined by their product roadmap, teams also depend on the spiking or dipping trend lines to determine what’s most urgent to work on. “Anomalies and trends have been a good guidance for us to actually focus our work in the coming week in terms of stability around certain products or features,” Ulas explains.
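Whisper doesn’t describe how it flags those spiking or dipping trend lines, but a minimal version of this kind of anomaly detection is a rolling z-score: flag any day that sits far outside the recent baseline. The metric values below are invented for illustration.

```python
from statistics import mean, stdev

def spikes(series, window=7, z_cut=3.0):
    """Flag indices that deviate more than z_cut standard deviations
    from the mean of the preceding `window` observations. A simple
    stand-in for the trend-line monitoring described above."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(series[i] - mu) / sigma > z_cut:
            flagged.append(i)
    return flagged

# Hypothetical daily crash counts: steady for a week, then a spike
# that would bump stability work to the top of next week's priorities.
crashes = [12, 11, 13, 12, 14, 11, 12, 13, 12, 48, 12]
print(spikes(crashes))   # → [9]
```

In practice a team would run something like this over each of the hundreds of tracked metrics and review the flagged ones in the weekly stability meeting.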
From moderating and categorizing content to tracking product and engineering metrics, data is key to all the decisions Whisper makes as an organization. In fact, algorithms even help Ulas in his managerial style: “I’ve programmed an algorithm to help me schedule one-on-one meetings with different engineers on my team on a regular basis.”
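Ulas doesn’t detail his one-on-one scheduler, but the simplest version of the idea, a rotation that guarantees everyone a regular slot, could look like the following sketch (names and cadence are invented):

```python
from itertools import cycle

def one_on_one_rota(engineers, weeks):
    """Assign one engineer per weekly 1:1 slot, cycling through the
    team so the longest-waiting person always comes up next.
    (A hypothetical sketch; Ulas's actual algorithm isn't described.)"""
    rotation = cycle(engineers)
    return [(week, next(rotation)) for week in range(1, weeks + 1)]

for week, name in one_on_one_rota(["Ana", "Ben", "Chao"], 5):
    print(f"Week {week}: 1:1 with {name}")
```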
In taking on several disciplines of machine learning to drive and evolve internal operations, Whisper has begun to explore new ways in which automation can create more engaging online experiences for its users.
Creating communities through content
Where there is content to be read and shared, there is also a community to be fostered. And so, with its massive number of monthly active users and wide distribution, Whisper saw the opportunity to create a more tight-knit social environment for its users, with the help of machine learning.
Whisper’s recently launched Groups feature allows anonymous users to join conversation threads organized by a wide variety of interests.
“Topics could range from something like you’re a new mother, or maybe you’re a soldier and you’re struggling with PTSD,” says Ulas. “But of course, it could also just be something fun, like talking to people who share your taste in music.”
From an engineering perspective, Whisper is making the Groups feature as accessible as possible. “With machine learning, and a recommendation engine, we’re trying to make it simple for people to find the groups that they’re most likely to engage with. These deep learning initiatives are daunting, but Groups is the next iteration on our mission to help those users be their authentic selves,” Ulas continues.
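Whisper hasn’t published how its Groups recommendation engine works, but one common collaborative-filtering approach is to score groups a user hasn’t joined by how much their membership overlaps with the user’s existing peers. A hypothetical sketch, with invented users and groups:

```python
def jaccard(a, b):
    """Overlap between two sets of members (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_groups(user, memberships, top_n=2):
    """Rank unjoined groups by how similar their member sets are to the
    peers in the user's current groups. (Illustrative only; Whisper's
    actual recommendation engine isn't public.)"""
    joined = {g for g, members in memberships.items() if user in members}
    peers = set().union(*(memberships[g] for g in joined)) - {user} if joined else set()
    scores = {
        g: jaccard(peers, members)
        for g, members in memberships.items()
        if g not in joined
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

groups = {
    "new_moms": {"u1", "u2", "u3"},
    "veterans": {"u4", "u5"},
    "indie_pop": {"u1", "u2", "u6"},
    "ptsd_support": {"u4", "u5", "u7"},
}
print(recommend_groups("u3", groups))
```

Here "u3" shares two peers with "indie_pop", so it ranks first, which mirrors the goal Ulas describes: surfacing the groups a user is most likely to engage with.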
And without a doubt, machine learning has been the driver behind Whisper’s continued evolution from a social network to a media company fueled by user-generated content. But Whisper’s use of software highlights the pivotal role technology will play for the media industry at large. As publications continue to rely on advertising as a revenue generator, many companies will continue to strive to expand their audiences.
To compete with Whisper, Facebook, Tumblr, or Snapchat, the next generation of media companies must be driven by content, community, and code.