Taylor Murphy on the funny thing about data
You have a PhD in chemical engineering, you applied that knowledge toward a career in data engineering, and you helped create Meltano at GitLab before spinning it out as a standalone company. So my question: Are you prouder of all that, or of when one of your data tweets gets hundreds of likes?
Haha. Well, I do kind of see all those things as being connected. I have a philosophy that if you’re not having a laugh at some point, then what is the point of doing whatever it is you’re doing? There’s serious work to be done in the data world—money to be made, problems to be solved, and efficiencies to be gained—but you’ve got to have some fun during your work, you’ve got to bring some joy to it.
So, yeah, the jokes on Twitter started in earnest around the beginning of the pandemic, like they did with some of my other Data Twitter friends, kind of as an outlet just to find some joy and to make some laughter around what it is we do. And I’m happy to have found that outlet because I was also always the kind of person in class or in the office to crack some jokes and to have a laugh.
As someone who is also working in the data world, I definitely get some laughs out of Data Twitter content like yours. Have you seen any other greater good coming from your posts? Maybe they’ve sparked some extra interest or engagement in the data community?
Yeah, certainly. I’m often surprised by how well people connect with what I put out.
A challenging part about working in data has been that it’s a role that other people don’t fully understand. So when executives don’t get how or why we do what we do, for example, it’s nice to be able to find relatable ways to communicate that bit of shared pain and then laugh about it. If you’ve been in the space long enough and you know some of the foibles of SQL or running data pipelines, you’re in on the joke. And it’s fun to feel part of that community.
As for how jokes can be productive, one of my personal strong values is truth and honesty. It’s probably what’s attracted me to working with data, given that it’s supposed to be a representation of reality and truth. Since they say there’s always some truth behind a joke, I often find myself making fun of things in data that I want to see improved. Interesting conversations and even some solutions have come out of Twitter threads sparked by silly posts about data tools and things like that.
Speaking of building community, let’s segue to Meltano. As a company, you all have a mission to enable everyone to realize the full potential of their data and, essentially, bring more people into the data world. How are you getting after that?
Yeah, Meltano is open-source infrastructure for building a DataOps platform, or a modern data stack. By being free to use, we’re a first-of-our-kind data tool for building end-to-end data platforms. We hope that removes the barrier to entry for smaller teams and individuals.
As a company that was incubated at and then spun out of GitLab, we still share a lot of their values, namely that strong commitment to open source. And it doesn’t just stop at providing access to our tools. We think open source is important for data when it comes to integration, replication, and scalability when connecting n number of sources to n number of destinations.
This can all result in better functionality and performance, but it’s also just a great way to bring together a community of people. We’ve seen this with dbt and the community that they’ve built around a particular tool. I want that to exist for the entire data stack.
So you’re talking about growing community both on the end-user side and on the data tool company side?
One of our beliefs is that the data market is changing rapidly, and the tools of today may not necessarily be the tools of tomorrow. I think there are new categories being created in the data stack all the time. So we want to build the infrastructure for all data platforms, infrastructure that enables people to swap things in and out where it makes sense and that provides abstractions underneath all of these different tools. What this ultimately does is push the integrations between the tools to get better.
On top of the open-source part, we also want to be able to offer resources and best practices for helping those companies integrate and helping the end users build their data stack with them. For example, if someone wants to add Great Expectations and dbt, we can say, “Oh, somebody’s already configured dbt over here. We can share some of that configuration. You can get up and running with Great Expectations even more easily.”
As more companies lean into data, more teams are ultimately going to hit the “Oh, we have to scale and we didn’t realize we need this tool” or “There’s a new, cool thing we now want” moments we’ve talked about. What baseline tips or guidance do you give to best handle all those growing pains?
Not to lean into an Amazon analogy too much, but think about Day Two. It’s one thing to set up a system to get some data in and then build a single dashboard; it’s another thing to say, “Okay, now the business logic is changing, now the metric is changing.” How do you handle change, and how do you confidently move forward with these changes? Change management is a big thing.
I have an anecdote of working with a data analyst who said, “Okay, I’m not sure why I’m doing things this way, but I’m going to trust you, Taylor, that this is the way to do it.” A year later, we had to do something and bring some reporting back, and we had to go into the git history to find it, and he came to me and said, “I get it. I finally understand why we’ve been doing it this way the whole time.” And I was like, “Yes, that’s awesome. Now I just need to make that happen quicker.”
Individual tools aren’t necessarily incentivized to think about that, so that’s one thing I hope to bring with Meltano: you can add tools on top of it and get a lot of this stuff for free. We need to do a good job of pitching the value of why you should care about this. It’s something that you may not care about until six months down the road, but by then you’re going to really hate the fact that you didn’t implement it six months ago. So change management is one of the biggest things to think about as you’re building out your stack.
OK, we’ve established that you’re a wealth of both information and laughs when it comes to data. The last thing I want to know is who is on your list of people to follow for the best data jokes or helpful advice about working in the field?
I’ll actually give a plug. There’s a website called moderndatastack.xyz, and they have an influencers page. People who come to mind, both from there and elsewhere: he’s my mortal enemy, but Seth Rosen has good data and dad jokes. There’s also Anna Filippova, Jacob Matson, Claire Carroll, Sarah Krasnik, Sarah Catanzaro, and Benn Stancil. Not everybody tells jokes, but they’re all great to have in your community.
I can’t plug the Meltano community enough, and the same goes for the dbt and Locally Optimistic communities. These open-source tools all have fantastic communities, and they’re filled with wonderful people. So check them all out.