The business impacts of data quality issues—what they look like and how to think about fixes
For leaders in product analytics or product management, data is the backbone of every decision, strategy, and innovation. But what happens when that data is unreliable? Expensive data quality issues aren’t just inconvenient—they’re silently eroding trust, delaying progress, and undermining your ability to deliver results.
If you’ve ever questioned the accuracy of your metrics or seen your team struggle with inconsistent tracking, you know the stakes are high. The path to better planning and confident decisions begins with addressing these challenges head-on. Consider these familiar scenarios:
- Your first thought after a “successful launch” announcement is, What data did they use?
- The potential of AI excites your CEO, but inconsistent data makes it unattainable.
- Your teams avoid using analytics tools like Mixpanel due to messy event tracking with labels like Signup Completed, Signed Up, or user_signup.
- Debates over KPI discrepancies consume more time than decision-making.
- Your data team spends more energy in reactive problem-solving than driving proactive insights.
These pain points aren’t just operational nuisances—they’re barriers to growth and innovation. But here’s the good news: You can fix them.
Why better data drives better outcomes
High-quality data isn’t just about accuracy; it’s about enabling your teams to move faster, collaborate smarter, and make decisions with confidence. With tools like Mixpanel and Avo, you can standardize tracking, uncover insights in real time, and create a culture of accountability around data quality.
In this blog, we’ll highlight six root causes of data quality issues and three essential dos and don’ts so you can:
- Build your case for fixing data quality
- Start getting better insights and make better decisions, today
- Leverage Mixpanel with high-quality data so your entire team can make better decisions, faster
By confronting your data quality challenges, you’ll unlock the full potential of your analytics, enabling your team to plan smarter, execute better, and lead with confidence. It's time to take charge of your data and drive the impact your organization needs.
Facing the data dilemma: Recognize the problem, build the case, and pave the way to better data
If you can relate to some of the above symptoms, then here’s the diagnosis: You’re suffering from poor data quality, and you need treatment. We’re hoping this blog and our upcoming workshops with Mixpanel can help you in your work to improve your data quality.
We’ve learned it’s not as easy as just jumping in to implement solutions. There’s a hidden first step, which is allowing ourselves to face the severity of the problem. That will be a large focus of this post. We will guide you through finally facing your data quality issues, shed light on the implications, and help you build your case for taking action.
But for now, let’s start at the beginning: defining what great data looks like, and how we can start the journey towards it.
What does great data look like?
“Data quality” is quite abstract. What does amazing data quality even look like? We need a tangible, relatable understanding of what defines “good data quality” to move forward.
We all strive for data that is accurate, consistent, reliable, and fit for its purpose. We need data we can effectively use for analysis, decision-making, and operational tasks. And in today’s world, that includes data we can reliably build AI products on to drive value and revenue.
But let’s make one thing clear: We should never be striving for perfect. Perfect is the enemy of good. You will always have some data quality issues. However, too many of us choose to ignore all of them. We get stuck trying to live with them, paying the cost of dealing with them downstream and risking bad decision-making. But the consequences of data quality issues are too significant to ignore. The most important thing is to understand what the business impact is in your case.
Business impact: Is bad data holding you back? (The answer is yes)
You might be thinking, “Our data isn’t perfect, but what’s the real impact?” Or perhaps you recognize the challenges, but the scale of the problem feels overwhelming, making it easier to accept the status quo.
The reality is your data issues are likely more pervasive—and costly—than you realize. These challenges don’t just impact accuracy; they disrupt decision-making, slow innovation, and ultimately drain resources across the business. From missed opportunities to wasted spend, the cost of subpar data adds up in ways that directly affect your bottom line and long-term strategy.
- No GenAI for you: As highlighted by OpenAI engineer James Betker, “[generative AI] model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else.”
- Lost revenue: Decisions based on incorrect data, and experiments that lead to the wrong conclusion, leave money on the table.
- Wasted time and resources: Whole teams are dedicated to building and maintaining expensive analytics pipelines that produce unreliable outputs. As a result, data consumers waste time analyzing, transforming, and patching broken data.
- Bad decisions from bad data: Without reliable data, you can’t discover the right opportunities to improve your product and go-to-market, and worse still, you may end up heading in the wrong direction.
- Employee burnout: Your data team’s sanity is the last item in this list, but it might be the most critical. Constantly working with broken data, figuring out workarounds, and chasing people to fix issues leads to tension, churn, and ultimately knowledge drain.
While the challenges may seem daunting, there’s a clear path forward. Recognizing the critical importance of data quality is the first step—and once that shift happens, you can begin addressing the root causes with a focused, systematic approach. Let’s dive into some of the most common data quality issues that could be hindering your organization’s performance and explore actionable strategies to resolve them.
What do some common data quality issues look like?
Let’s shed light on some (overlapping) themes of data quality issues, with examples for each, so you can identify which ones you have and start looking for signals:
Over-tracking: This is tracking for product “coverage” rather than for data use cases. Excessive or irrelevant events get collected, which overwhelms your data consumers and your data infrastructure and magnifies all other issues (a short sketch follows these examples):
- Your product triggers events for “every” product interaction, without a use case for them.
- The data is structured as many shallow events representing the same user action, where each event carries very little metadata.
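To make this concrete, here’s a minimal TypeScript sketch using Mixpanel’s browser SDK. The event and property names are hypothetical, and we assume `mixpanel.init` has already been called:

```typescript
import mixpanel from "mixpanel-browser";

// Over-tracking: several shallow events for one user action, none carrying
// the metadata anyone actually needs to answer a question.
mixpanel.track("Checkout Clicked");
mixpanel.track("Checkout Opened");
mixpanel.track("Checkout Started");

// Purposeful tracking: one event modeled around a concrete use case,
// carrying the metadata that use case requires.
mixpanel.track("Checkout Started", {
  cart_value: 89.99,
  item_count: 3,
  entry_point: "product_page",
});
```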
Usability issues: The data is difficult to use or access, causing a lack of data literacy that blocks you from democratizing data via self-serve analytics (a great read on this by our friend Glenn Vanderlinden):
- Event names are not built to help data consumers understand the user action they represent.
- There is no (accessible) documentation for what the event data means or how it should be used.
- There are no standards, or only poor ones, to follow when defining data.
Incomplete data: Something is missing from the data you’re collecting, so it gives an incomplete picture (a guard for this is sketched after these examples):
- An event does not fire every time a specific user action is performed.
- A required property is missing 20% of the time.
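A lightweight guard, sketched below, is to check an event’s required properties at the call site before sending. The event name, property names, and the `requiredProps` registry are hypothetical; in practice the list would come from your tracking plan:

```typescript
import mixpanel from "mixpanel-browser";

// Required properties per event (hypothetical; in practice sourced from
// your tracking plan or schema registry).
const requiredProps: Record<string, string[]> = {
  "Order Completed": ["order_id", "revenue", "currency"],
};

// Warn the moment a required property is missing, instead of discovering
// the gap in a chart weeks later.
function trackChecked(event: string, props: Record<string, unknown>): void {
  const missing = (requiredProps[event] ?? []).filter(
    (key) => props[key] === undefined || props[key] === null
  );
  if (missing.length > 0) {
    console.warn(`"${event}" is missing required properties: ${missing.join(", ")}`);
  }
  mixpanel.track(event, props);
}
```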
Inconsistent data: Your events and properties have inconsistencies in naming, typing, or even the actions they represent, depending on the product, platform, event, or codepath (one mitigation is sketched after these examples):
- Android fires the checkout completed event when the user clicks the checkout button, while iOS and Web fire it when the checkout actually succeeds.
- A single property is sometimes sent as a string, sometimes as an integer, and sometimes as a list of strings.
- Twenty different variations of an event name are fired when a user starts a game, depending on where it’s fired from: game_started, GAME_STARTED, match_started, game_initiated, user started game.
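One way to stop this naming drift, sketched below with hypothetical names, is a single shared event vocabulary that every platform imports, so free-form strings like game_initiated can’t creep in:

```typescript
import mixpanel from "mixpanel-browser";

// The one place an event name may be defined; all platforms import from here.
export const Events = {
  GameStarted: "Game Started",
  CheckoutCompleted: "Checkout Completed",
} as const;

type EventName = (typeof Events)[keyof typeof Events];

// The compiler rejects any name that isn't part of the shared vocabulary.
export function track(name: EventName, props?: Record<string, unknown>): void {
  mixpanel.track(name, props);
}

track(Events.GameStarted, { source: "main_menu" }); // OK
// track("game_initiated");                         // Compile error: unknown event
```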
Duplicate data: Data duplication, or even multiplication, is happening on some level (a guard is sketched after these examples):
- An event is triggered multiple times for the same user action, resulting in unexpected and incorrect volume spikes.
- More than one property on an event represents the same metadata.
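A simple client-side guard, sketched below, drops repeat firings of the same logical action inside a short window. The window length and the names are assumptions to adapt, not a universal recommendation:

```typescript
import mixpanel from "mixpanel-browser";

const lastFired = new Map<string, number>();
const DEDUPE_WINDOW_MS = 1000; // assumed window; tune it for your product

// Ignores duplicates of the same event + action key inside the window,
// e.g. a double-clicked button firing "Checkout Started" twice.
function trackOnce(event: string, actionKey: string, props?: Record<string, unknown>): void {
  const key = `${event}:${actionKey}`;
  const now = Date.now();
  const last = lastFired.get(key);
  if (last !== undefined && now - last < DEDUPE_WINDOW_MS) {
    return; // duplicate of an event we just sent
  }
  lastFired.set(key, now);
  mixpanel.track(event, props);
}
```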
Inaccurate data: The data doesn’t match your expectations, and might even be totally impossible given the constraints of reality (a validation sketch follows these examples):
- A negative property value is sent for the age property.
- The display_name user property is not updated when the user changes it.
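Impossible values are cheapest to catch at the source. Here’s a hedged sketch, assuming a hypothetical numeric age property on the user profile:

```typescript
import mixpanel from "mixpanel-browser";

// Reject values that are impossible given the constraints of reality.
function isValidAge(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0 && value <= 130;
}

function setAge(age: unknown): void {
  if (!isValidAge(age)) {
    console.warn(`Rejected impossible age value: ${String(age)}`);
    return;
  }
  mixpanel.people.set({ age }); // only sane values reach the user profile
}
```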
Sequencing issues: Tracking calls fire in an unexpected order relative to the user and/or system actions they represent (a sketch of one fix follows these examples):
- The Email Address user property is updated and the Email Updated event fired before the user has entered their email.
- The Search Result Received event is mistakenly sent after the Search Result Clicked event, causing unexpected and inaccurate conversion funnel drops.
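Sequencing bugs often come from fire-and-forget calls racing each other. Here’s a minimal sketch of one fix for the email example above, where `saveEmailToBackend` and its endpoint are hypothetical stand-ins for your real API client:

```typescript
import mixpanel from "mixpanel-browser";

// Hypothetical persistence call; stand-in for your real API client.
async function saveEmailToBackend(email: string): Promise<void> {
  await fetch("/api/user/email", { method: "PUT", body: email }); // assumed endpoint
}

// Update the profile and fire the event only after the user has actually
// submitted a new email, preserving the order analysts expect.
async function onEmailSubmitted(newEmail: string): Promise<void> {
  await saveEmailToBackend(newEmail);
  mixpanel.people.set({ $email: newEmail }); // profile update first...
  mixpanel.track("Email Updated"); // ...then the event that describes it
}
```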
Misconfiguration issues: Faulty integrations or unexpected configurations can cause issues at any point in an event’s journey. The further upstream the data issue is, the more magnified the implication is:
- Misconfigured data collection or mapping in your CDP can create ripple effects, leading to widespread issues across connected systems.
- Hiding events in your analytics tool UI causes a mismatch in connected destinations, such as the warehouse.
Anyone who has created a chart has almost certainly experienced at least one of these issues. It’s a gut-wrenching feeling: you share an amazing win or a fantastic opportunity with your CEO, and they immediately doubt the data. You go back to your desk with your tail between your legs, do a little digging, and find out they’re right. The data wasn’t reliable.
If you relate to any of the sentences at the top of this post, or identify with some of the examples above, you know you have data quality issues. And if you don’t relate to any of this, please reach out and tell us your secret (or talk to your most senior data analyst and they will prove you wrong 😏). The next step is to understand the root causes of these issues.
Root causes: Why do I have all these data quality issues?
Failure to acknowledge the importance of data and data quality in your business strategy leads to a lack of ownership, processes, documentation, and tools, which in turn leads to data quality issues.
Measuring product success is essential and should be part of the strategy from the beginning. But collecting the data needed to measure success is too often treated as a side project: a chore to be checked off the list in order to ship the feature, done for some analyst rather than treated as an impactful tool for making good decisions. We end up rushing and, as a result, track irrelevant events, design suboptimal data structures, and make mistakes during implementation.
Ask yourself where you stand on the following:
- Does data matter in the company?
- Does anyone feel accountable for data quality and data governance?
- Do we treat data as a cross-functional team sport between product, engineering and data, or as a chore engineers have to do for some analyst?
- Do we have standards to follow when tracking new features?
- Are our event schemas documented?
- Do engineers have proper tools to implement and validate before releasing?
These are all crucial for fixing the problem at the root, and anything less than fixing at the root will leave us in reactive damage control.
The trick is to break down the challenge into manageable chunks.
Where to start? Three do’s and don’ts
Data quality needs to be addressed at the root. But first, you have to face your issues and build your case to the team on why. Then you build systems to prevent new issues from adding to the mess. Finally, prioritize historical issues to fix, one at a time. Here's what the beginning of that process looks like (full practical steps deserve a blog of their own):
1. Build your case
Don’t: Get stuck in the “data quality for the sake of it” trap: Pick your battles. Focus on issues that have business implications.
Do: Make the problem visible and help everyone understand how much time, money and sanity improving data quality will save. Use the resources from this blog post to identify the business impact and themes of data quality issues of your own data. You’ll start seeing patterns of problems you can tie to business impact.
This will help you get buy-in from management and other teams to pursue the path to better data.
2. Start preventing, today
Don’t: Delay preventative measures. Not being ready to fix existing issues doesn’t mean you can’t start preventing new ones.
Do: Start preventing future issues today, even while you still have messy data. Tackle the root causes of your data quality issues so that fewer new ones surface:
- Data purpose in company culture: Treat data like a product; it shouldn’t exist for its own sake. Figure out what data means to different parts of your business. Identify the people who care about the data and pull them in.
- Ownership of data and data quality: Establish ownership of data and its quality. Make it clear that the data owners are accountable for the quality and reliability of their data, and equip them with systems to make that easy.
- Facilitate cross-functional collaboration: Break down the silos and enable collaboration between data, product and engineering. Establish processes for data specs with peer reviews and frictionless translation of specs to code.
- Data standards: Define standards for data structures and processes for how to suggest changes with new feature releases or data needs.
- Single source of truth data contracts: Make sure event schemas and their meaning are documented and accessible to all stakeholders. Then treat that documentation like a contract with the engineers who implement it: contracts are most effective when they’re clear (see the sketch after this list).
- Streamline implementation workflow: It’s easy to implement analytics events. Getting it consistent and reliable at scale is a whole other story. Establish tools and processes to implement and validate, so product engineers can fulfill their data contracts.
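To illustrate the contract idea: this is not Avo’s generated code, just a hand-rolled TypeScript sketch with a hypothetical event, but it shows how a single reviewed definition can be enforced at build time:

```typescript
import mixpanel from "mixpanel-browser";

// The contract: one reviewed, documented definition per event.
// Changing this type is a spec change and gets a peer review to match.
type SignupCompleted = {
  plan: "free" | "pro" | "enterprise";
  referrer: string | null;
};

// Engineers implement against the contract; TypeScript rejects wrong
// shapes at build time instead of letting them leak into production data.
export function trackSignupCompleted(props: SignupCompleted): void {
  mixpanel.track("Signup Completed", props);
}

trackSignupCompleted({ plan: "pro", referrer: null }); // fulfills the contract
// trackSignupCompleted({ plan: "gold" });             // Compile error: not in contract
```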
As we discuss in our recent series on data mesh for event data quality at scale, the aim is federated governance with domain-oriented ownership: not a central governor of data quality, but a federated team that defines what constitutes quality while the domain teams own the quality of their own data.
3. Prioritize issues to fix
Don’t: Try to fix “all the issues.” Prioritizing all means you’re prioritizing none.
Do: Focus on what’s most impactful for your business, for example, the important events that revenue operations depends on.
Start with an audit. Establish an overview of the current state of your event data, highlighting the data quality issues. By adding some high-level information about each issue, you can measure its impact (a sketch of such an issue record follows this list). Useful information includes, for example:
- When did the issue start?
- How much event volume does it affect?
- Which KPIs does it impact?
- Which sources does it come from?
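One lightweight way to run that audit is a shared issue record you can sort and triage. Here’s a sketch in TypeScript, with fields that are our assumptions to adapt:

```typescript
// One row per known data quality issue; the fields mirror the questions above.
type DataQualityIssue = {
  description: string;         // e.g. "Checkout Completed fires twice on Android"
  startedAt: Date;             // when did the issue start?
  affectedEventVolume: number; // events per day touched by the issue
  impactedKpis: string[];      // which KPIs does it distort?
  sources: string[];           // which platforms or SDKs does it come from?
};

// Triage: surface the issues touching the most KPIs, then the most volume.
function triage(issues: DataQualityIssue[]): DataQualityIssue[] {
  return [...issues].sort(
    (a, b) =>
      b.impactedKpis.length - a.impactedKpis.length ||
      b.affectedEventVolume - a.affectedEventVolume
  );
}
```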
You’ll learn a lot from this, and more importantly, you can prioritize and triage your issues, deciding which to fix and which you will ignore for now and find a way to live with.
Leverage Mixpanel with high quality data to drive better decisions faster
Once data reliability is established, Mixpanel becomes a game-changer for your team. It’s an intuitive, time-saving solution that empowers your entire organization to make informed decisions faster. Without Mixpanel, key players often depend on data experts to find answers, creating bottlenecks. Mixpanel eliminates that dependency by enabling everyone to access meaningful insights directly. As a leader, you’ll equip your team to move faster, collaborate better, and drive successful product outcomes with confidence.
Stay tuned or reach out for more
Want to learn more?
Join our upcoming webinar: Fix Your Data, Empower Your Decisions: Data Quality Series for Product Leaders, where you can connect with like-minded individuals and get actionable advice on how to tackle broken data.
Special thank you to Thora Gudfinnsdottir, Michelle Han, Kalina Bryant, Timo Dechau and Glenn Vanderlinden for their contributions to the ideas in this post.