What regulated industries know about AI in product development that others don’t
Every product team is moving faster right now. AI makes it cheap to generate variants, write code, spin up experiments, and ship features that would have taken weeks a year ago. The question most teams are asking is: how fast can we go?
The product leaders at MXP London were asking a different one:
“How do you make sure you're going fast toward the right thing?”
At the ‘Experiments in the AI Era’ panel, four practitioners from some of the most constrained environments in the industry (regulated finance, high-stakes education, consumer print, and mass-participation events) made a case that the teams best equipped to move fast are the ones who already know what it costs to be wrong.
AI in product development isn't a speed button
Oliver McQuitty, Product Director at Popsa, put the constraint plainly. Popsa is a consumer photo product: once a book leaves the printing facility and arrives at a customer's door, there's no rollback. The product carries people's memories. Getting it wrong isn't a bad metric. It's a broken promise.
That context shapes how Oliver thinks about the speed question. AI lowering the barrier to ship doesn't change the fundamental obligation to ship something worth having. The real opportunity, he argued, is using AI to sharpen your question before you act on it.
Bhavesh Vaghela, CPTO at London Marathon, put speed in a different frame altogether. He's not against it. He called it non-negotiable. But he was equally quick to separate speed from recklessness, and that distinction is where his thinking gets interesting.
“Speed is a strategic imperative now. You have to go at pace. What that pace is depends on the constraints of your organization.”
Robin Raven, Head of Product at Pearson, works in one of the most regulated environments on the panel, a FTSE 100 education company where AI is used in assessment and where marking a high-stakes exam wrong has real consequences for real students. His instinct, maybe counterintuitively, leans toward speed. But he frames the speed-versus-quality debate as a distraction from the harder question: are you building the judgment infrastructure to sustain it?
Kavya Vibhu, Director of Product at CBRE Investment Management, deals with a different kind of constraint: investment management regulators, real estate oversight, and a fiduciary obligation to get it right.
Her take was precise: governance isn't a brake on speed. It's what makes sustained speed possible. "Governance forces us, in a good way, to do testing, so that the outcome of the experimentation is of value."
The pattern across all four was the same. The goal isn't to go slower. It's to build the organizational muscle to make better decisions at pace, and that muscle doesn't come from tools.
Making better decisions at pace: How these teams decide what to test
Bhavesh offered the most concrete reframe on the testing question, and it came from a real example. He walked the room through the London Marathon ballot flow.
His team had a specific user flow, identified the variance, and ran a series of micro-experiments on interactions within that flow. They landed a 5% conversion increase. On ballot volume, that's around half a million pounds in value.
The mechanism wasn't a single big swing. It was a lot of small ones, each informed by the last. AI removed the development constraint that used to make only big bets worth the effort. When building a variant is cheap, the math on marginal gains changes entirely.
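To make the "math on marginal gains" concrete, here is a minimal sketch of the expected-value reasoning. Every number in it (the win probability, the payoff, and both build costs) is a hypothetical chosen for illustration, not a figure from the London Marathon example, and the function name is our own.

```python
def expected_net_value(win_probability: float, value_if_win: float, build_cost: float) -> float:
    """Expected payoff of one experiment, net of the cost of building the variant."""
    return win_probability * value_if_win - build_cost


# Hypothetical micro-experiment: modest upside, modest odds of winning.
WIN_PROBABILITY = 0.3
VALUE_IF_WIN = 20_000  # pounds, illustrative only

# Same idea, two different (assumed) costs to build the variant.
expensive_build = expected_net_value(WIN_PROBABILITY, VALUE_IF_WIN, build_cost=10_000)
cheap_build = expected_net_value(WIN_PROBABILITY, VALUE_IF_WIN, build_cost=1_000)

print(f"Expensive build: {expensive_build:+,.0f}")
print(f"Cheap build:     {cheap_build:+,.0f}")
```

Under these assumed numbers, the same small idea is a losing bet when the variant is expensive to build and a worthwhile one when it is cheap. Nothing about the idea changed; only the cost of trying it did.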
Oliver pushed back, not on the principle but on the culture risk. Running a lot of tests isn't the goal. Running successful tests that drive revenue is. He's wary of organizations that celebrate experiment volume as a proxy for progress.
He also made a small but pointed language choice: he refuses to use the word "learnings." They're lessons. "Celebrating lessons that you have learned and applying them to go on and beat your competition in the market: yes, definitely." Running 50 tests where 25 failed and 25 shipped doesn't make his pulse race.
Kavya brought the sharpest operational frame of the section, and it's the kind of thing that sounds simple until you think about how few teams actually do it: start testing the idea before you test the build.
“We need to test the idea as much as we test the prototype.”
Her team runs a reversibility check and an assumption count before committing to a full test cycle. If the decision is reversible and the assumptions are few, they move. If assumptions exceed five, they hold and run a proper validation cycle.
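The rule described above can be sketched as a small gate function. This is an illustration of the logic as the panel described it, not CBRE's actual tooling: the `Idea` class, the `gate` function, and the example idea are all hypothetical. The panel also didn't say what happens to an irreversible decision with few assumptions, so the sketch assumes it gets a validation cycle too.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ASSUMPTIONS = 5  # from the panel: more than five assumptions means hold


@dataclass
class Idea:
    name: str
    reversible: bool
    assumptions: List[str] = field(default_factory=list)


def gate(idea: Idea) -> str:
    """Return 'move' or 'validate' for an idea before any build work starts."""
    if len(idea.assumptions) > MAX_ASSUMPTIONS:
        return "validate"  # too many untested assumptions: hold and validate
    if idea.reversible:
        return "move"  # reversible and few assumptions: go ahead
    # Assumption (not stated by the panel): irreversible decisions are validated.
    return "validate"


# Hypothetical example idea.
tweak = Idea(
    name="Reorder onboarding steps",
    reversible=True,
    assumptions=["Users read step two", "Drop-off happens at step three"],
)
print(tweak.name, "->", gate(tweak))
```

Writing the assumptions down as a list is the useful part. The count is only a proxy for how much of the idea is still untested.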
Governance as a forcing function, in other words, not as a bureaucratic gate. The most common failure mode in AI-assisted product work is acting on an answer to a question nobody thought through carefully in the first place.
The offload question: What belongs to AI and what doesn't
Kavya had the sharpest take on AI's role in the experimentation workflow, and it came from an investment management mindset. "If the company makes the same decision as the competitor using AI," she said, "it's like having the same McKinsey consultant giving us the same opinion across."
In a business where the edge lives in judgment calls on edge cases (market moves, portfolio strategy, scenarios that have no historical parallel), offloading the thinking to a model trained on the past is a category error. AI earns its place as a review layer, a risk check, a tool for the repetitive and deterministic. Exploratory thinking that depends on human judgment is a different thing entirely.
“If your job or the process that you're trying to automate involves heavily on human judgment, I would say stay away from AI for that matter.”
Oliver's framing was complementary but more operational. The things AI should handle, in his view, are the things teams already know are important but consistently deprioritize: analytics hygiene, experiment setup, the cleanup of events that were added to a codebase and never properly maintained.
He pointed to the Mixpanel demo from earlier in the day (automated identification of analytics gaps in GitHub) as exactly the kind of work that compounds over time. "The teams that will win are the ones who have spent the last five years focusing on the really boring stuff that actually makes the system work."
Bhavesh put a practical test on it: if it's deterministic and codifiable, hand it to AI. The noisy, repetitive, easily-forgotten work is where AI removes drag so humans can focus where they actually add value.
The corollary, though, matters just as much. The moment you're offloading critical thinking, you're producing commodity output. "If everyone's doing that," he said, referring to generating AI-written strategy documents, "you just got a commodity." When the same model answers the same question for every company in your market, the answer stops being a competitive asset.
Critical thinking at scale: The cultural problem no tool solves
The panel's fourth topic was the one with the least obvious answer: how do you scale judgment across an organization when most people would rather move fast and let the AI fill in the gaps?
Robin's team at Pearson has been working on this directly. The efficiency gains from AI have created a new version of an old problem: duplicated effort. But the duplication that matters now isn't code.
When teams work in silos, each group runs its own AI sessions, generates its own strategy drafts, and draws its own conclusions from the same organizational context. The work looks different, but it arrives at the same place. At Pearson, the way they framed it stuck with the room.
“The risk is not duplication of code but duplication of thinking. It's so easy to create code right now, but if people are thinking about the same problems across the organization, that's waste.”
Pearson's response: make experiments visible, push outcomes to shared channels, build lightweight shared infrastructure (like a brand-aligner skill) that reduces the friction of doing things right the first time. The goal isn't a central AI committee. It's raising the signal-to-noise ratio enough that people can actually learn from each other's work instead of re-running it.
Oliver had a structural answer from Popsa: AI councils. Not a governance body, but a regular forum where engineers and product people share what they've been using AI for, what's worked, and what hasn't.
The dynamic that made them work was the social proof, not the format. When someone sees a colleague they respect using an agent to push code or clean up analytics, the abstract permission to try becomes concrete. "It took having those safe spaces for conversation, for information sharing that cuts through the noise of Slack."
Bhavesh framed the whole thing as a cultural shift that has to move in both directions. Language, stories, and rituals are what actually change behavior, not policies.
And it starts with leadership. "If the leadership team are not really engaging with AI, no one else is going to — or they're going to get fooled when Claude gives them a document and they're just going to read it and go, 'That was really good,' but actually there's no critical thinking there."
What happens when AI bypasses the discomfort
Kavya closed on a question none of the other panelists put quite this way, and it's probably the most important one for the next few years. Her concern is capability loss, not job loss.
"Whenever you're outside your comfort zone, that's where the real learning, the real expansion of you, growth of you happens," she said.
AI lets people skip that discomfort. You hit a hard strategic question (roadmap prioritization, a novel user problem, a scenario with no clean precedent) and the temptation is to ask the model. The model gives you something that looks like an answer. You never had to sit with the question long enough to develop your own judgment about it.
Her framing: you need to be Tony Stark, not just someone holding the Jarvis suit. You still have to know the real question before you can use the tool.
Robin echoed it from a process angle. The act of writing forces critical thinking. Teams that sit down and argue through a hypothesis, that use AI as a coach rather than a ghostwriter, produce better testing strategies than teams that outsource the thinking entirely.
"Choosing what to test and choosing what not to test starts from a really, really well-thought-through, deeply researched, critically thought-through hypothesis that you can't outsource."
Mixpanel's own product work reflects this same conviction. Mixpanel AI is built to surface patterns and anomalies in your data. The product instinct that decides what to do with them still has to be yours.
Stay curious
If you want to see experimentation and feature flagging working together in Mixpanel, go here. And if this session left you thinking about how your team builds hypotheses—or how it doesn't—Mixpanel's product experimentation guide is a good place to dig in.


