How does session replay work?
If you’re a software PM, designer, or engineer, you’re most likely familiar with session replay. It does pretty much what it sounds like: it lets you replay a user session in your product.
Session replay is incredibly valuable when you’re trying to improve your software product. It shows you (almost) exactly what a user did: what path or journey they took, what UI elements they interacted with, and what bugs or issues they may have run into. It gives you a granular, qualitative look in a quantitative world. Pretty much every company I’ve worked at has used it in one form or another.
Where session replay gets really powerful is when you combine it with the quantitative data that you’d get in a product analytics tool. This is what Mixpanel's new Session Replay release unlocks by placing replays right in user behavior funnels. This lets you aggregate journeys your users are taking plus drill in for a qualitative look at any particular session—all in one motion.
But how does this work, exactly? To the naked eye, it might look like the user’s session was quite literally recorded with a screen grabber—a method that would set off alarm bells for privacy and ethical concerns. Thankfully that’s not what’s going on; it’s actually much, much cooler.
How session replay works conceptually
Instead of literally recording your screen during a session, session replay works by reconstructing the changes made to the code that represents the app you’re using. Allow me to explain.
The DOM
When you use an app in your browser (let’s say Google Drive), what you’re really looking at is a bunch of code. This is a Drive home page:
It’s made up of three types of code behind the scenes:
- HTML: the text and structure of the elements on the page (a header, a paragraph, a table)
- CSS: the styling of the words and structures in your HTML
- JS (JavaScript): the interactivity of your HTML
When your browser loads all of this code, it builds a live, tree-shaped model of the page that developers call the DOM, or Document Object Model. For our purposes, the DOM is all of the data that makes up a web page at a given moment.
Any time you interact with a web page, whether by clicking on an element, dragging in a file upload, or scrolling around, what you are doing is making changes to the DOM. If you…
- Click on a Drive file → takes you to a new page, with new HTML/CSS/JS
- Scroll down → updates the DOM’s record of where you are vertically
- Upload a file → brings up a new “file upload” modal with new HTML elements
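If it helps to see that idea as code, here’s a minimal sketch in TypeScript. It doesn’t touch the real browser DOM: `PageNode`, `Mutation`, and `Recorder` are names I made up for illustration, and the timestamps are typed in by hand. The only point is that a scroll or a new modal can be written down as a small piece of data describing a change.

```typescript
// A toy stand-in for the DOM: every node has an id, a tag, optional text, and children.
// (Hypothetical types for illustration; the real browser DOM is far richer.)
interface PageNode {
  id: number;
  tag: string;
  text?: string;
  children: PageNode[];
}

// The kinds of changes this sketch knows how to describe.
type Mutation =
  | { kind: "add"; parentId: number; node: PageNode }
  | { kind: "remove"; nodeId: number }
  | { kind: "scroll"; y: number };

// A recorded change plus when it happened (milliseconds since the session started).
interface RecordedEvent {
  at: number;
  mutation: Mutation;
}

class Recorder {
  readonly events: RecordedEvent[] = [];

  // Store a description of the change rather than a picture of the screen.
  record(at: number, mutation: Mutation): void {
    this.events.push({ at, mutation });
  }
}

// Example session: the user scrolls, then opens a "file upload" modal.
const recorder = new Recorder();
recorder.record(1200, { kind: "scroll", y: 480 });
recorder.record(3400, {
  kind: "add",
  parentId: 1,
  node: { id: 7, tag: "div", text: "Upload a file", children: [] },
});

// The whole session is just data, so it can be serialized and sent to a server.
const payload: string = JSON.stringify(recorder.events);
```

Notice that nothing here is a pixel. The session is a list of timestamped changes, which is much smaller than video and much easier to filter for sensitive content.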
Really, our view of things is backwards. What we see on a web page is a nicely rendered, styled version of the code that a developer wrote to create this. But it’s that code that’s changing and interacting with you—the screen is just a painting of what’s going on inside.
The car engine analogy
The best analogy I’ve found to help understand this is to think of it like a car. When you put your foot down on the gas, the car goes faster. But the pedal isn’t making the car go faster. It’s giving more gas (or electricity) to the engine, which is producing more power, and that is making your car go faster.
Now if you looked back on a road trip and wanted to understand when you were stuck in traffic and when you weren’t, you could just look at your car’s speed and when it was low or high. But you could also look at how much gas your engine was given, which is the mechanical process underlying your car’s speed. And that’s how session replay works.
Reconstructing the view
Knowing what changes have been made to the DOM is only part of the equation. To build session replay, you need to know what those changes—and the DOM itself—actually looked like to the user that was making them in the first place. Otherwise, you’ve just got a timeline of inscrutable code changes that aren’t useful to anyone, let alone a non-engineer.
The way this is done is by re-applying the CSS (the code that styles a web page) to the HTML to rebuild a time capsule of what the app looked like to the user at that particular time. The result is a video that looks like a screen recording of a user but in reality is a code-generated rebuilding of what their HTML, CSS, and JS looked like at the time of the session. It’s pretty neat.
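Here’s a rough sketch of that rebuilding step in TypeScript, under some big simplifying assumptions: the page is a toy tree rather than a real DOM, the "stylesheet" is just a map from class name to inline CSS, and every name (`stateAt`, `render`, `StyleRules`, and so on) is hypothetical. It is not how any real product implements replay. It only shows the shape of the idea: start from an initial snapshot, re-apply the recorded changes up to a moment in time, then style the result.

```typescript
// Hypothetical toy types; a real replay tool works with full DOM snapshots.
interface PageNode {
  id: number;
  tag: string;
  className?: string;
  text?: string;
  children: PageNode[];
}

type Mutation =
  | { kind: "add"; parentId: number; node: PageNode }
  | { kind: "remove"; nodeId: number };

interface RecordedEvent {
  at: number; // milliseconds since the session started
  mutation: Mutation;
}

// Class name -> inline CSS declarations, standing in for a real stylesheet.
type StyleRules = Record<string, string>;

// Deep-copy plain data so replaying never modifies the snapshot or the events.
function clone<T>(value: T): T {
  return JSON.parse(JSON.stringify(value));
}

// Find a node by id anywhere in the tree.
function findNode(root: PageNode, id: number): PageNode | undefined {
  if (root.id === id) return root;
  for (const child of root.children) {
    const hit = findNode(child, id);
    if (hit) return hit;
  }
  return undefined;
}

// Remove a node by id from wherever it sits under root.
function removeNode(root: PageNode, id: number): void {
  root.children = root.children.filter((c) => c.id !== id);
  root.children.forEach((c) => removeNode(c, id));
}

// Rebuild the tree as it was at time `t` by replaying every event up to `t`.
// Events are assumed to be sorted by time.
function stateAt(snapshot: PageNode, events: RecordedEvent[], t: number): PageNode {
  const root = clone(snapshot);
  for (const { at, mutation } of events) {
    if (at > t) break;
    if (mutation.kind === "add") {
      findNode(root, mutation.parentId)?.children.push(clone(mutation.node));
    } else {
      removeNode(root, mutation.nodeId);
    }
  }
  return root;
}

// "Paint" the tree: turn it back into styled markup (text is not escaped here).
function render(node: PageNode, styles: StyleRules): string {
  const css = node.className ? styles[node.className] : undefined;
  const styleAttr = css ? ` style="${css}"` : "";
  const inner = (node.text ?? "") + node.children.map((c) => render(c, styles)).join("");
  return `<${node.tag}${styleAttr}>${inner}</${node.tag}>`;
}

const snapshot: PageNode = {
  id: 1,
  tag: "body",
  children: [{ id: 2, tag: "h1", className: "title", text: "My Drive", children: [] }],
};
const styles: StyleRules = { title: "font-weight:bold", modal: "border:1px solid" };
const events: RecordedEvent[] = [
  { at: 3400, mutation: { kind: "add", parentId: 1, node: { id: 7, tag: "div", className: "modal", text: "Upload a file", children: [] } } },
  { at: 9000, mutation: { kind: "remove", nodeId: 7 } },
];

const before = render(stateAt(snapshot, events, 1000), styles); // modal not open yet
const during = render(stateAt(snapshot, events, 5000), styles); // modal is on screen
```

Calling `stateAt` with a different time is all it takes to scrub backward or forward, which is why a replay can feel like a video without ever having been one.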
Anatomy of a real session replay product
If the above sounds complicated to build, that’s because it is! Many modern session replay products, Mixpanel’s included, owe their existence to a relatively unknown piece of open source software called rrweb.
rrweb: Open source session replay
Teams building session replay products use rrweb to record changes to the DOM, store them in efficient formats for sending to a server, rebuild them into something that looks like your app, and even in some cases provide play, pause, etc. functionality for the final product. The rrweb site has a neat demo where you can see how their session replay works: You interact with a page, click "REPLAY", and it will, well, replay.
Everything else: Surrounding UI
There’s a lot to build outside of rrweb, though. Back to Mixpanel’s Session Replay UI:
Everything outside of the core replay functionality, like mapping replays to the user events they correspond to, sorting and filtering replays, and sharing with teammates, is not a part of rrweb. Each company needs to build those pieces itself.
Privacy controls
Session replay needs to be sensitive to user privacy. Besides the obvious stuff like scrubbing personally identifiable information (PII), modern session replay products like Mixpanel allow you to configure pretty granularly what you do and don’t want to record. By default, Mixpanel disables recording inputs (what a user types in a form), all text on a page, and any images or videos. You can also block recording any individual element on a page by specifying it with code.
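To give a feel for what that kind of masking involves, here’s a minimal TypeScript sketch. To be clear, this is not Mixpanel’s configuration or code: `PrivacyOptions`, `sanitize`, and the option names are all hypothetical, and the page is a toy tree rather than a real DOM. It just illustrates the general approach of scrubbing content before it is ever recorded.

```typescript
// Hypothetical toy node type; not any real product's data model.
interface PageNode {
  tag: string;
  className?: string;
  text?: string;
  value?: string; // what a user typed, for input elements
  children: PageNode[];
}

// Hypothetical option names, chosen for this sketch only.
interface PrivacyOptions {
  maskText: boolean;
  maskInputs: boolean;
  blockMedia: boolean;
  blockedClasses: string[];
}

const MEDIA_TAGS = ["img", "video"];

// Replace every non-whitespace character with "*", keeping the length so the
// layout of the replay stays roughly the same.
function mask(s: string): string {
  return s.replace(/\S/g, "*");
}

// Return a sanitized copy of the tree; the original is left untouched.
function sanitize(node: PageNode, opts: PrivacyOptions): PageNode {
  const blocked =
    (opts.blockMedia && MEDIA_TAGS.indexOf(node.tag) !== -1) ||
    (node.className !== undefined && opts.blockedClasses.indexOf(node.className) !== -1);
  if (blocked) {
    // Keep an empty placeholder so the page structure survives, but drop all content.
    return { tag: node.tag, className: node.className, children: [] };
  }
  return {
    tag: node.tag,
    className: node.className,
    text: node.text !== undefined && opts.maskText ? mask(node.text) : node.text,
    value: node.value !== undefined && opts.maskInputs ? mask(node.value) : node.value,
    children: node.children.map((c) => sanitize(c, opts)),
  };
}

const form: PageNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Card number", children: [] },
    { tag: "input", value: "4242 4242", children: [] },
    { tag: "img", children: [] },
    { tag: "div", className: "secret", text: "Balance: $10", children: [] },
  ],
};

// Mask everything and block one specific element by its class.
const safe = sanitize(form, {
  maskText: true,
  maskInputs: true,
  blockMedia: true,
  blockedClasses: ["secret"],
});
```

The design choice worth noticing is that the scrubbing happens on the way in. If sensitive text never makes it into the recorded data, there’s nothing to leak later.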
Ready to see it work for yourself? Try Mixpanel Session Replay free for 30 days.