Mo Data Mo Problems? When More Information Makes You More Wrong

Last Updated on November 26, 2019 by Alex Birkett

More data isn’t necessarily better, and in fact, sometimes more data leads to much worse decision making.

We’re all trying to make better business decisions (I hope, at least). All decisions include some level of uncertainty. Properly collected and analyzed data can reduce that uncertainty, but never eliminate it.

However, in practice, I’ve noticed many executives and thought leaders leaning on data like the proverbial drunkard on a lamppost (using it for support rather than illumination). This can lead to both over-certainty and inaccurate decision making, a combination that is quite deadly.


I absolutely love data – collecting it, analyzing it, visualizing it. It’s probably my favorite part of my job.

But as marketers and analysts, it would be irresponsible if we didn’t speak realistically about the shortcomings of data, particularly when it comes in very large quantities.

When Is ‘Too Much Data’ a Bad Thing?

There’s generally a tradeoff between accuracy and utility. In other words, collecting more data costs you something, whether in money, time, or dataset complexity, but that tradeoff may be worth it if it buys you a greater degree of precision in your measurement.

Take the easy example of an A/B test: you clearly want to collect enough data to feel confident pulling the switch and making a decision. Even here, though, there’s a lot of nuance: the consequences of some decisions are much greater than others, so the cost of collecting the data can be outweighed by the importance of making the right call.

For example, it’s much more important that you not make the wrong decision when designing a bridge or researching a medicine than it is when you’re trying out new CTA button colors.

So right off the bat, we can establish that the value of data also depends on the utility it provides, given the impact of the decision.
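To make that tradeoff concrete, here’s a rough sketch of the standard two-proportion sample size approximation (the baseline rate and lifts are numbers I made up, not benchmarks). It shows how the price of precision climbs as the effect you care about shrinks, and as the required confidence creeps toward “bridge-design” levels:

```python
# Rough sketch: approximate visitors needed per variant to detect a given
# relative lift with a two-proportion z-test. Baseline and lifts are invented.
from scipy.stats import norm

def n_per_variant(p_base, rel_lift, alpha=0.05, power=0.80):
    """Standard approximation: n = (z_alpha/2 + z_beta)^2 * (var_a + var_b) / delta^2."""
    p_var = p_base * (1 + rel_lift)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return int(z ** 2 * variance / (p_var - p_base) ** 2)

for lift in (0.20, 0.10, 0.05, 0.02):
    n_95 = n_per_variant(0.05, lift)
    n_999 = n_per_variant(0.05, lift, alpha=0.001)
    print(f"{lift:.0%} lift on a 5% baseline: ~{n_95:,} per variant "
          f"(~{n_999:,} at 99.9% confidence)")
```

The exact numbers don’t matter; the point is that each extra unit of certainty costs disproportionately more traffic, and that price is only worth paying when the decision justifies it.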

Beyond that, sometimes collecting more data isn’t just costly; sometimes more data makes it less likely you’ll actually make the right decision.

Here are five situations where that’s the case:

  1. When we’re tracking the wrong thing
  2. When we’re incorrectly tracking the right thing
  3. When you’re able to find spurious correlations because of “swimming in the data”
  4. When the cost of data collection supersedes its utility
  5. When what we’re tracking is unmeasurable and we’re using data to save face

A Confidence Problem: Boldly Walking in the Wrong Direction?

The underlying theme in all of these is that more data leads to greater confidence in decision making, and making a bad call with great confidence and ‘the data on your side’ is more dangerous than acknowledging the uncertainty involved.

Saying “I don’t actually know, but I have a hunch,” gives you the freedom to pivot upon receiving new data, which is a form of optionality. The opposite is when you commit too heavily to a poor decision due to misinterpreting the data. The more data that backs up your wrong decision, the more likely you are to zealously pursue it.

A real benefit to acting without data, or without much data, is we are forced to acknowledge the inherent uncertainty involved in the decision. When we have too much data, we’re often placated by the numbers. We believe the room for error is much smaller than it really is.

I’ll walk through each of these in detail, with stories, quotes from people smarter than I am, and technical explanations where applicable.

1. You’re Measuring the Wrong Things

The first mistake is when you make decisions on data that is actually tracking the wrong things.

Andrea Jones-Rooy, Professor of Data Science at NYU, gave the example of using data to make better hiring decisions. Here’s how she put it:

“Very few pause to ask if their data is measuring what they think it’s measuring. For example, if we are looking for top job candidates, we might prefer those who went to top universities.

But rather than that being a measure of talent, it might just be a measure of membership in a social network that gave someone the “right” sequence of opportunities to get them into a good college in the first place.

A person’s GPA is perhaps a great measure of someone’s ability to select classes they’re guaranteed to ace, and their SAT scores might be a lovely expression of the ability of their parents to pay for a private tutor.”

“What to measure” is a common topic in conversion rate optimization, as your impact will depend on the yardstick by which you measure yourself. If you’re optimizing for increased click through rates, you may not be improving the bottom line, but simply shuffling papers.

Similarly, we often try to quantify the user experience and tend to choose between a few different metrics – NPS, CSAT, CES, etc. – even though each of these measures something distinct, and none of them encompasses the entire user experience.

What you track is highly important and shouldn’t be overlooked. If you’ll use a metric to make a decision in the future, put in the time to make sure it means what you think it means (this, of course, is why the Overall Evaluation Criterion, the North Star Metric, the One Metric That Matters, etc., are all such big points of discussion in our respective fields).

A practical aside: you can ignore bounce rates, time on site, pages per session (unless you sell ads), click-through rate, pageviews, social shares, “brand awareness” (whatever that means), and whatever other vanity metrics you use to tell stories about your work.

Tools and strategies that involve “tracking everything” are misguided for exactly this reason: you introduce so much noise that you can’t distinguish the signal. You’re swimming in so much unimportant data that you can’t see the stuff that matters. Nassim Nicholas Taleb explained this in Antifragile:

“More data – such as paying attention to the eye colors of the people around when crossing the street – can make you miss the big truck. When you cross the street, you remove data, anything but the essential threat.”

Measure enough shit and you’ll find a significant correlation somewhere and miss what matters for your business.

2. You’re Incorrectly Tracking the Right Things

This is one of the most heartbreaking of the big data errors I see, and it’s probably the most common.

You and your team, including executives, hash out the strategy and decide what you’ll measure to judge its performance. You spend time mapping out your data strategy – making sure you can technically implement the tracking and that the end user can access and analyze it.

Everyone has a plan until they get punched in the face, as Mike Tyson said.

Your tracking can break down for a truly unlimited number of reasons.

Tiny variable name changes will ruin R scripts I’ve written. Redirects can strip tracking parameters. During the two-plus years I’ve been at HubSpot, we’ve had numerous tracking bugs on both the product and marketing sides of things. At CXL, same thing. We did our best to remain vigilant and debug things, of course. But shit happens, and to pretend otherwise isn’t just naive, it’s foolish.

Many end-users of an analytics tool will simply put their faith in the tool, assuming what it says it is tracking is what it is actually tracking. A bounce rate means a bounce rate means a bounce rate…

Of course, a sophisticated analyst knows this isn’t the case (rather, anyone who has spun up more than a few UTM parameters has seen how things can break down in practice).

In a more theoretical context, here’s how Andrea Jones-Rooy explained data instrumentation problems:

“This could take the form of hanging a thermometer on a wall to measure the temperature, or using a stethoscope to count heartbeats. If the thermometer is broken, it might not tell you the right number of degrees. The stethoscope might not be broken, but the human doing the counting might space out and miss a beat.

Generally speaking, as long as our equipment isn’t broken and we’re doing our best, we hope these errors are statistically random and thus cancel out over time—though that’s not a great consolation if your medical screening is one of the errors.”

Take, for example, the act of running an A/B test. There are many nodes in the system, from putting the JavaScript tag on your website (assume we’re using a popular client-side testing tool like VWO), to the targeting of your users and pages, to the splitting and randomizing of traffic, the experience delivery, and the logging of events.

Basically, there are a shitload of places for your perfectly planned A/B test to return incomplete or inaccurate data (and it happens all the time).
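To pick just one of those nodes, the traffic split: I have no visibility into how VWO or any other vendor actually implements it, but here’s a minimal sketch of the kind of deterministic, hash-based bucketing these tools generally rely on (the function name and salt are mine), mostly to show how small the surface area for silent breakage is:

```python
# Minimal sketch of deterministic traffic splitting -- not any vendor's real
# implementation, just an illustration of one fragile node in the chain.
import hashlib

def assign_variant(user_id: str, experiment_salt: str, split: float = 0.5) -> str:
    """Hash the user ID into [0, 1] and bucket against the split point."""
    digest = hashlib.md5(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # deterministic pseudo-random value
    return "variant" if bucket < split else "control"

# The same user must always land in the same bucket for the same experiment...
assert assign_variant("user-123", "cta-test-v1") == assign_variant("user-123", "cta-test-v1")

# ...but change the salt mid-test, or log the ID with different casing on one
# page, and the hash (and potentially the assignment) changes under your feet.
for uid, salt in [("user-123", "cta-test-v1"), ("user-123", "cta-test-v2"), ("USER-123", "cta-test-v1")]:
    print(uid, salt, assign_variant(uid, salt))
```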

The flicker effect is commonly talked about, but that’s just the tip of the iceberg. If you want to shake up your faith, just read Andrew Anderson’s spiel on variance studies.

Additionally, with A/B testing, the larger your sample size, the smaller the effect sizes you’re able to detect. If you have greater-than-normal variance, or a bug that introduces a flicker or a load time increase, that effect could turn into a false positive at large samples (more data = worse outcome).
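A quick simulation shows the mechanism (the conversion rates are invented): the variant’s offer is identical to the control’s, but a flicker bug quietly costs it a fraction of a percentage point. A small sample shrugs it off; a huge sample confidently “detects” the bug as if it were a treatment effect:

```python
# Sketch: a tiny systematic bias (say, from flicker) becomes "statistically
# significant" once the sample is large enough. All rates are invented.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
p_control = 0.050
p_variant = 0.048  # identical offer, but flicker quietly costs ~0.2 points

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * norm.sf(abs(z))

for n in (2_000, 50_000, 2_000_000):
    conv_c = rng.binomial(n, p_control)
    conv_v = rng.binomial(n, p_variant)
    print(f"n per arm = {n:>9,}: p-value = "
          f"{two_proportion_p_value(conv_c, n, conv_v, n):.4f}")
```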

Inaccurate tracking and bugs are inevitable in the grand scheme of things. That’s why it’s important to hire as many intelligent, experienced, curious, and, most importantly, diligent humans as you can.

Another great example is using survey data or customer interview data without analyzing the bias you (the survey designer or interviewer) introduce. Your data, then, will of course be faulty and will lead you in a poor direction.

The other side of this is surveying or interviewing the wrong people, even if you ask the right questions. Bad inputs = bad outputs. Formally, this is known as “selection bias,” and it’s a problem with a great many measures. Take, for example, sentiment analysis of tweets. Andrea Jones-Rooy explains:

“Using data from Twitter posts to understand public sentiment about a particular issue is flawed because most of us don’t tweet—and those who do don’t always post their true feelings. Instead, a collection of data from Twitter is just that: a way of understanding what some people who have selected to participate in this particular platform have selected to share with the world, and no more.”

The more flawed interviews you do, tweets you analyze, or responses you collect, the more confident you’ll be to continue in that direction (thus, more data = worse outcome in this case).
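Here’s a toy version of that trap, with every number made up: suppose the true share of people who feel positive about your topic is 50%, but the subset who actually tweet skews to 70%. Collecting a hundred times more tweets doesn’t move you toward the truth; it just shrinks the error bars around the wrong answer:

```python
# Sketch: selection bias doesn't wash out with volume. More biased samples
# just mean more confidence in a skewed estimate. All numbers are made up.
import numpy as np

rng = np.random.default_rng(7)
true_population_positive = 0.50    # what you actually want to know
tweeting_subgroup_positive = 0.70  # sentiment among people who bother to tweet

for n_tweets in (500, 5_000, 50_000):
    sample = rng.binomial(1, tweeting_subgroup_positive, size=n_tweets)
    estimate = sample.mean()
    margin = 1.96 * sample.std(ddof=1) / np.sqrt(n_tweets)
    print(f"{n_tweets:>6,} tweets: estimated {estimate:.1%} positive "
          f"(+/- {margin:.1%}); true population value is {true_population_positive:.0%}")
```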

The solution: Trust but verify, as my friend Mercer says.

3. When You’re Able to Find Spurious Correlations Because of “Swimming in the Data”

When you have a lot of data, it’s easy to find patterns in it that are completely meaningless.


Segmenting after an A/B test? Well, that’s a best practice. However, if you make decisions on those segments without accounting for multiple comparisons, you’re significantly raising the risk of false positives. The same goes for measuring multiple metrics during a test (which is also super common, unfortunately).
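The arithmetic behind that warning: with 20 segments (or metrics) and a 5% false positive rate on each, the chance of at least one spurious “winner” in a test where nothing actually changed is 1 - 0.95^20, roughly 64%. Here’s a quick simulation with invented traffic numbers, plus the simplest possible correction:

```python
# Sketch: post-test segmentation without a multiple-comparisons correction.
# 20 segments, zero true difference anywhere. Traffic numbers are invented.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_segments, n_per_arm, p_true = 20, 5_000, 0.05

def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test for two equal-sized arms."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
    return 2 * norm.sf(abs(conv_a - conv_b) / (n * se))

p_values = np.array([
    p_value(rng.binomial(n_per_arm, p_true), rng.binomial(n_per_arm, p_true), n_per_arm)
    for _ in range(n_segments)
])

print(f"Chance of at least one spurious winner: {1 - 0.95 ** n_segments:.0%}")  # ~64%
print("Uncorrected 'significant' segments:", int((p_values < 0.05).sum()))
print("Bonferroni-corrected significant segments:", int((p_values < 0.05 / n_segments).sum()))
```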


A lot of the problem with too much data is really just the mishandling and misinterpretation of it. You can solve a lot by reading a book like Georgi Georgiev’s “Statistical Methods in Online A/B Testing.”

Outside of simple misunderstandings with statistics, we truly do have a problem of “swimming in the data,” which is another way to say we’re drowning in bullshit insights.

If you’ve got enough time, watch Justin Rondeau’s talk about the mistake of “swimming in the data” in the context of customer surveys.

The more data points you track and the more data you compile, the more meaningless insights will crop up. If you’re not careful, you’ll spend all of your time chasing statistical ghosts (or if you’re a bit more amoral, intentionally cherry-picking patterns to back up your story). Nassim Taleb says it like this in Antifragile:

“Data is now plentiful thanks to connectivity, and the proportion of spuriousness in the data increases as one gets more immersed in it. A very rarely discussed property of data: it is toxic in large quantities – even in moderate quantities.”

Relatedly, the more often you look at data, the more likely you are to find something of interest (meaningful or not). This all roots back to multiple comparisons and alpha error inflation, but to put it in a more concrete and less academic context, here’s another quote from Taleb’s Antifragile:

“The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part, called the signal); hence the higher the noise-to-signal ratio. And there is a confusion which is not psychological at all, but inherent in the data itself.

Say you look at information on a yearly basis, for stock prices, or the fertilizer sales of your father-in-law’s factory, or inflation numbers in Vladivostok. Assume further that for what you are observing, at a yearly frequency, the ratio of signal to noise is about one to one (half noise, half signal) – this means that about half the changes are real improvements or degradations, the other half come from randomness. This ratio is what you get from yearly observations.

But if you look at the very same data on a daily basis, the composition would change to 95% noise, 5% signal. And if you observe data on an hourly basis, as people immersed in the news and market price variations do, the split becomes 99.5% noise to 0.5% signal.

That is 200 times more noise than signal – which is why anyone who listens to the news (except when very, very significant events take place) is one step below a sucker.”
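Taleb’s ratios fall out of simple random-walk arithmetic: shrink the observation window and the real trend shrinks in proportion, but the noise only shrinks with the square root, so checking more often strictly worsens the signal-to-noise ratio. A sketch with invented drift and volatility numbers (I’m not trying to reproduce his exact figures):

```python
# Sketch of the arithmetic: the real trend per observation shrinks linearly
# with sampling frequency, but the noise only shrinks with its square root.
# Drift and volatility values are invented, chosen so the yearly ratio is ~1:1.
import numpy as np

annual_drift = 0.05       # the "real" improvement per year (signal)
annual_volatility = 0.05  # the randomness per year (noise)

for label, obs_per_year in [("yearly", 1), ("monthly", 12), ("daily", 252), ("hourly", 252 * 8)]:
    signal_per_obs = annual_drift / obs_per_year
    noise_per_obs = annual_volatility / np.sqrt(obs_per_year)
    print(f"{label:>7}: signal-to-noise per observation ~ {signal_per_obs / noise_per_obs:.3f}")
```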

What we need is less data, more precisely tuned to answer the questions we actually care about. This is much better than the standard “let’s track everything and see what the data tells us” approach, which, as you can see, can be dangerous even in benevolent hands, let alone those of an intentional and savvy cherry-picker.


4. When the Cost of Data Collection Supersedes Its Utility

Expected value is the probability-weighted value of an action’s possible outcomes. In basically every business case, we’re trying to maximize the ratio of expected benefit to cost so that the action is worth taking. In other words, we want to get as much as we can for as little cost as possible, and we want to know our ROI with a high degree of certainty.
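For what it’s worth, the expected value arithmetic itself is trivial; the hard, honest part is the probabilities and payoffs you feed it. A hypothetical example with entirely made-up numbers:

```python
# Sketch: expected value of a hypothetical campaign. The arithmetic is easy;
# the honesty of these made-up probabilities and payoffs is the hard part.
outcomes = [
    {"scenario": "big win",    "probability": 0.10, "payoff": 200_000},
    {"scenario": "modest win", "probability": 0.40, "payoff": 40_000},
    {"scenario": "flop",       "probability": 0.50, "payoff": 0},
]
cost = 30_000

expected_payoff = sum(o["probability"] * o["payoff"] for o in outcomes)
print(f"Expected payoff: ${expected_payoff:,.0f}")             # $36,000
print(f"Expected ROI: {(expected_payoff - cost) / cost:.0%}")  # 20%
```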

It’s easy to measure costs in monetary terms, so calculating ROAS isn’t complex. It’s a bit harder to measure resource costs, but smart growth teams calculate the “ease” of a given action and use it to prioritize.

Measuring the cost of data is difficult, though, because the returns aren’t linear. In many cases, the marginal utility of collecting more data decreases, even if it is accurately collected.

This is a common phenomenon: the first donut is satisfying, the second a bit less so, and each subsequent one has a lower marginal utility (and past a certain point, perhaps a lower total utility, as you tend to get sick from eating too many).


Two marketing examples:

  1. User testing
  2. A/B testing

In user testing, you don’t need more than 5-7 users. You’ll find ~80% of usability issues with 5 users, and past 7 the curve pretty much flattens.


As Jakob Nielsen puts it:

“As you add more and more users, you learn less and less because you will keep seeing the same things again and again. There is no real need to keep observing the same thing multiple times, and you will be very motivated to go back to the drawing board and redesign the site to eliminate the usability problems.”
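The curve behind that quote comes from Nielsen and Landauer’s model, in which a single test user surfaces roughly 31% of the usability problems, so n users surface about 1 - (1 - 0.31)^n of them. A quick sketch:

```python
# Sketch of the diminishing-returns model behind that curve: each test user
# independently surfaces ~31% of the usability problems (Nielsen & Landauer).
PROBLEM_FIND_RATE = 0.31

def share_of_problems_found(n_users, rate=PROBLEM_FIND_RATE):
    return 1 - (1 - rate) ** n_users

for n in (1, 3, 5, 7, 15):
    print(f"{n:>2} users: ~{share_of_problems_found(n):.0%} of usability problems found")
```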

When you run an A/B test, there are multiple costs incurred, including the cost of setting up the test and the “regret” accumulated during the course of the experiment (either delivering a suboptimal experience to your test group or failing to exploit a better variant sooner).


A/B testing, like other forms of research and data collection, is a trade-off between accuracy and utility. Run the test forever and you (may) get more precision, but you lose almost all of the usefulness. You also introduce powerful opportunity costs, because you could have been running more experiments or doing more impactful work.

There are also problems with running A/B tests for too long that go beyond marginal utility: due to cookie expiration and other external validity threats, your results will likely become less trustworthy the longer you collect data.
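To put rough numbers on the regret side (everything below is invented): the cost of keeping half your traffic on the weaker experience grows linearly with every extra week, while the precision you buy back only improves with the square root of the sample.

```python
# Back-of-the-envelope sketch: regret grows linearly with test length, while
# precision (a narrower confidence interval) improves only with sqrt(n).
# Traffic, conversion rates, and order value are all invented.
import numpy as np

weekly_visitors_per_arm = 20_000
p_control, p_variant = 0.040, 0.050  # suppose the variant really is better
value_per_conversion = 60            # dollars

for weeks in (1, 2, 4, 8, 16):
    n = weeks * weekly_visitors_per_arm
    # Revenue lost by keeping half of traffic on the weaker control arm:
    regret = n * (p_variant - p_control) * value_per_conversion
    # Rough half-width of the 95% CI on the difference in conversion rates:
    ci_half_width = 1.96 * np.sqrt(2 * 0.045 * (1 - 0.045) / n)
    print(f"{weeks:>2} weeks: regret ~ ${regret:>9,.0f}, CI half-width ~ {ci_half_width:.4f}")
```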

When a marketing leader can’t make a decision due to too much data and reflection, we refer to it as “analysis paralysis,” and it’s not a good thing. Sometimes it’s best to just make a decision and move to the next thing.


5. When What We’re Tracking Is Unmeasurable and We’re Using Data to Save Face

We make decisions all day long, and some of them are going to be poor decisions – no one bats a thousand. Someone with a mature understanding of data and decision making realizes this and factors it into their expectations. In reality, your good bets just need to outweigh your bad ones, and the more often you can do that, or the bigger you can win when you’re right, the better.

The opposite is misunderstanding the nature of uncertainty in the business world. In that world, your non-optimal decisions need a fallback, something to point to that “caused” the bad decision (god forbid it was poor judgment).

That’s why, even when it is truly (or very nearly) impossible to measure what they want to measure, leaders will sometimes ask for enough data to back up their decision (often something they’ve already decided on and just need justification for). Then, in the event of failure, their face is saved and they’re absolved of blame. After all, “the data said it would work.”

(Remember: data doesn’t say anything. We use data to reduce uncertainty and make better decisions, but we can never fully reduce uncertainty or make perfect decisions).

Another Nassim Taleb quote fits here:

“What is nonmeasurable and nonpredictable will remain nonmeasurable and nonpredictable, no matter how many PhDs with Russian and Indian names you put on the job – and no matter how much hate mail I get.”

This is a tough point to swallow, because any data we’re using as a proxy for the untrackable is probably a waste of time. Tricia Wang calls this “quantification bias”: the unconscious valuing of the measurable over the immeasurable. Eugen Eşanu summed up the problem on UX Planet:

“People become so fixated on numbers or on quantifying things, that they can’t see anything outside of it. Even when you show them clear evidence. So quantifying is addictive, and when we don’t have something to keep that in check, it’s easy to fall into a trap. Meanwhile, while you are searching for the future you are trying to predict in a haystack, you don’t feel or see the tornado that is coming behind your back.”

Most business decisions we make are in complex domains. We may be able to tease out some causality by running experiments on a signup flow, but strategic decisions, brand marketing, and team structure decisions (and many more examples)…well, they have first-order effects that may be quasi-trackable, but the second- and third-order effects are usually latent and more important by orders of magnitude.

Negative second and third order effects can usually be mitigated by acting from principles first, and only then letting the data drive you.

You’ve undoubtedly come across a website that looked like a Christmas tree with all of its popups, or you’ve dealt with an unethical business leader who forgot that success is derived from compound interest and that life is a series of iterated games.

In these cases, the allure of the first-order effect (a higher CTR, more money made by ripping someone off) overshadowed the long-term loss. So in the absence of measurability, define your principles (personal, team, company) and operate within that sphere. Your career is long, so don’t burn out seeking short-term wins.

Okay, so not everything can be tracked (or tracked easily) – What’s the solution, particularly for data driven marketers and organizations?

My take: track what you can with as much precision as possible, and leave a percentage of your business portfolio open for the “unmeasurable but probably important.”

Dave Gerhardt recently posted something on this topic that I really liked.

That’s an eminently mature way to look at a strategic marketing portfolio.

As an example, I don’t think many companies are accurately measuring “brand awareness,” but clearly branding and brand awareness are important concepts. So just do things that help prospects learn about what you do, and don’t try to tie it to some proxy metric like “social media impressions” – that’s a form of scientism, not to mention it’s gameable to the point where quantification may even backfire.

I like using an 80/20 rule in portfolio development, which I’ve borrowed from Mayur Gupta:

“Do your growth efforts and performance spend benefit from a strong brand (efficiency and/or effectiveness or organic growth)? Are you able to measure and correlate?

Think about the 80–20 rule when it comes to budget distribution — if you can spend 80% of your marketing dollars on everything that is measurable and can be optimized to get to the “OUTCOMEs”, you can spend 20% however you want. Because 100% of marketing will NEVER be measurable (there is no need).”

The exact ratio isn’t important; the point is that not everything can be forecasted, predicted, or chosen with perfect certainty. Alefiya Dhilla, VP of Marketing at A Cloud Guru, once mentioned to me that she thinks in terms of certainty/risk portfolios as well, balancing roughly 70% in tried-and-true, trackable actions, 10-20% in optimizing and improving current systems, and the remainder in unproven or untrackable (but possibly high-reward) bets.

The point is humility, much like the serenity prayer: track accurately what you can, and be okay with what you can’t.

Conclusion

Data is a tool used to reduce the uncertainty in decision making, hopefully allowing us to make better decisions more often. However, in many cases, more data does not equal better outcomes. Data collection comes with a cost, and even if everything is tracked correctly, we need to weigh our decisions by their efficiency and ROI.

Additionally, data often makes us more confident than we should be. As Nassim Taleb put it, “Conversely, when you think you know more than you do, you are fragile (to error).”

This is a big problem when what the data is telling us is poorly calibrated with what we want to know. Whether we’re measuring the wrong thing or measuring the right thing incorrectly, things aren’t always clean and perfect in the world of data.

All that said, I love data and spend most of my time analyzing it. The point is just to be critical (doubt by default, especially if the numbers look too good to be true), be humble (we’ll never know everything or have 100% certainty), and constantly be improving (the best analysts still have a lot of room to grow).