Consider the following scenario:
Your boss asks you to find out why the app is getting low satisfaction scores from its users.
You contact the customer support team, who tell you they have been receiving complaints that the app feels sluggish. This is new, so you check with engineering about the changes released in the last two weeks. Apparently, a new version of the chat backend was rolled out a few days ago. You decide that it’s a lead worth pursuing.
You fire up Mixpanel and put together a graph of the relevant performance measurement: the loading time of chat messages.
The culprit is visible immediately. The graph clearly shows that the daily average loading time has increased significantly over the past few days. From 300ms to over 12 seconds!
You rush to your boss to report the issue, then create a task for the engineers to identify the problem and to propose and implement changes that bring the loading time under 1 second.
The engineers spend the next few days investigating what might be causing the lag, but in all their tests the messages load fine.
You just wasted valuable resources on an issue that was never there.
How come, you ask? The graphs clearly showed…
Well, that was just the daily average. And here’s the rule I want you to remember:
Never trust the average.
So what happened? Turns out, as the new version was being rolled out, each day around midnight the servers went into maintenance mode for a minute or two. And there were just enough users online during that window (possibly working in a different time zone) to create a batch of extremely slow data points that completely skewed the daily average.
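To see how this plays out in numbers, here is a minimal Python sketch. The figures are made up for illustration and are not the app's real measurements: it assumes 900 messages load in the usual 300 ms, while 100 messages sent during the maintenance window hang for two minutes.

```python
from statistics import mean, median

# Hypothetical data for a single day, in milliseconds (not real measurements).
normal_loads = [300] * 900            # messages that loaded at the usual speed
maintenance_loads = [120_000] * 100   # messages stuck during the maintenance window

load_times_ms = normal_loads + maintenance_loads

daily_mean = mean(load_times_ms)      # pulled up by the 100 slow data points
daily_median = median(load_times_ms)  # the middle value, untouched by them

print(f"Mean:   {daily_mean / 1000:.2f} s")
print(f"Median: {daily_median / 1000:.2f} s")
```

Under these assumptions, nine out of ten messages still load in 0.3 seconds, yet the daily mean lands above 12 seconds while the middle value stays at 0.3 seconds.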
The good news is, the above scenario is completely avoidable. Let’s break down the basic stats you can use to catch when the average is lying to you.
The Mean Company
Mean = Average (sort of)
Yup, the Mean is the Average that most of us think of when talking about averages. The good ol’ sum of all values, divided by the number of items. It gives you a basic idea of the typical value across the records in question. However, as we learnt above, the average can be misleading if there are some big outliers in your data set.
The word “mean” comes from the Middle English word “mene”, which means “middle”. Don’t take the etymology too literally, though: the mean is the balance point of all the values, not the value that sits in the middle when they are arranged in order (that one is the median).
While the term “mean” is often used interchangeably with the term “average”, it is important to note that there are other types of averages as well, such as the median and mode that we will discuss next, which are also measures of central tendency but are calculated differently.
Let’s take a look at a quick example. Take an array of 5 people and their monthly income:
[3000, 2000, 3500, 2700, 70000]
The average income for this group would come up to 16,240. Not exactly accurate, is it?
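As a quick check, the calculation can be sketched in Python using the income array from above. The variable names are illustrative, and dropping the largest income is done here only to show how strongly it pulls on the mean, not as advice to delete outliers.

```python
from statistics import mean

incomes = [3000, 2000, 3500, 2700, 70000]

# The mean is the sum of all values divided by the number of values.
manual_mean = sum(incomes) / len(incomes)

# The standard library gives the same answer.
library_mean = mean(incomes)

# For comparison only: the mean of the same group without the 70000 income.
mean_without_outlier = mean(x for x in incomes if x != 70000)

print(manual_mean, library_mean, mean_without_outlier)
```

A single income of 70,000 moves the mean from 2,800 to 16,240, a figure that none of the other four people come close to earning.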
That’s why usually, for better visibility we pair the Mean with a…
Median
The Median marks the middle value of your data set once it is sorted, dividing the data into a lower half and an upper half.
The word “median” comes from the Latin word “mediānus”, which means “in the middle” or “central”. The Latin word “medius” means “middle”, and it is the daddy of many other English words, such as “medium”, “medieval”, and “mediate”.
If we look at the sample array I provided above and sort it (2000, 2700, 3000, 3500, 70000), its median value would be 3,000, painting a much more realistic picture of the data set, not affected so much by the large outliers. It’s a far cry from the Mean of 16,240. That’s why you should never share the Averages with anyone before checking the Median first.
Our initial example has 5 values, which means there is a single middle value. However, if you’re facing an array with an even number of items, then your median equals the average of the two middle values of the sorted array:
[3000, 2000, 3500, 4000, 2700, 70000]
Sorted, this array reads 2000, 2700, 3000, 3500, 4000, 70000, so its median would equal (3000 + 3500) / 2 = 3250.
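The sorting step is easy to forget, so here is a small Python sketch of the median computed the long way, next to the standard library's version. The helper name `median_by_hand` is made up for this example.

```python
from statistics import median


def median_by_hand(values):
    """Hypothetical helper: sort the values, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


five_incomes = [3000, 2000, 3500, 2700, 70000]
six_incomes = [3000, 2000, 3500, 4000, 2700, 70000]

print(median_by_hand(five_incomes), median(five_incomes))
print(median_by_hand(six_incomes), median(six_incomes))
```

Sorted, the five incomes are 2000, 2700, 3000, 3500, 70000, so the middle value is 3000. For the six incomes, the two middle values are 3000 and 3500, which average to 3250.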
Mode
Mode is the value that appears most often in a set of data, and it can be used to describe the “typical” value of the dataset.
For example, if we have an array of values
[15, 75, 50, 75, 34, 75]
then the mode of this array is 75, since that is the value that appears most frequently.
As a product manager, you might use the mode to understand the most popular features of a product. For example, imagine you have a product that offers several different features, and you want to know which features are most popular with your customers.
You could collect data on how often each feature is used by customers over a certain time period, and then calculate the mode of the dataset. The mode would tell you which feature is used most frequently by your customers and could help you identify which features to focus on or prioritize in future development.
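A rough sketch of that idea in Python could look like this. The feature names and the usage log are invented for the example, and it assumes the data arrives as one entry per time a feature was used.

```python
from collections import Counter

# Hypothetical usage log: one entry per time a customer used a feature.
usage_events = [
    "chat", "search", "chat", "export",
    "chat", "search", "share", "chat",
]

# Count how often each feature appears in the log.
usage_counts = Counter(usage_events)

# The mode is the feature with the highest count.
top_feature, top_count = usage_counts.most_common(1)[0]

# Share of all recorded usage that belongs to the most used feature.
top_share = top_count / len(usage_events)

print(f"Most used feature: {top_feature} ({top_count} uses, {top_share:.0%} of all usage)")
```

Keeping the count and the share next to the mode helps you tell a clear favorite apart from a feature that is only marginally ahead of the rest.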
Keep in mind though that the mode is not always the best measure of central tendency to use, especially when the data is continuous and has no clearly defined peaks, or when there are multiple modes in the dataset.
In such cases, make sure to check the count or frequency of the modal value, since a mode that appears only a handful of times says little about the data as a whole. Also consider the other measures of central tendency, such as the mean and median, which might be more appropriate to use for a given case.
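One way to run that check is sketched below in Python, using `statistics.multimode` (available from Python 3.8) to list every value tied for the top spot. The helper `describe_modes` and the second and third sample lists are made up for illustration.

```python
from collections import Counter
from statistics import multimode


def describe_modes(values):
    """Hypothetical helper: return all modes, their count, and their share of the data."""
    if not values:
        raise ValueError("describe_modes needs at least one value")
    modes = multimode(values)             # every value tied for most frequent
    top_count = Counter(values)[modes[0]]  # how many times each mode occurs
    return modes, top_count, top_count / len(values)


clear_peak = [15, 75, 50, 75, 34, 75]        # the array from the mode example
two_peaks = [1, 1, 5, 5, 3]                  # made-up data with two modes
no_repeats = [301, 298, 305, 312, 299]       # made-up continuous-style data

for sample in (clear_peak, two_peaks, no_repeats):
    print(describe_modes(sample))
```

In the first list, 75 accounts for half of the values, so the mode is informative. In the second list there are two modes, and in the third every value is a "mode" that occurs exactly once, which is a sign to fall back on the mean and median.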
You’ll never walk alone…
To paraphrase (poorly) Liverpool’s club song, never let a single stat drive your decisions. Always check for the context and alternative angles to confirm your assumptions.
Relying solely on the mean can be misleading, as demonstrated by the example of our lagging app. Outliers in a data set can skew the results, so it is crucial to consider other measures of central tendency, such as the median and mode, to gain a more accurate understanding of the data. By using a combination of these measures, you’ll be able to avoid inaccurate conclusions and instead get straight to identifying the actual root cause of any issue that may arise.
Of course, these are only the three most basic stats. Stick around to learn what other tools Descriptive Statistics provides to turn you into a Product person with superpowers.