Outliers are a nuisance, outliers are gifts from the sky.
It doesn’t matter how you look at them: outliers demand attention. Statistics and design have opposing views on the matter, and very different ways of dealing with outliers. But let’s start at the beginning: what is an outlier?
An outlier is a score very different from the rest of the data (…) they bias estimates of parameters (such as the mean), and also dramatically affect the sum of squared errors.
According to this quote from Andy Field’s book (which is amazing, by the way), one should be wary of outliers. The general aim in statistics is to draw conclusions about the general population. Your sample should represent the population, and you want to generalize whatever you are testing on the sample. This means that outliers, which skew the data and bias the average, are unwanted visitors that should be removed from your dataset.
And there is something to be said for this method. Outliers can seriously mess up your mean (the median is far more robust) and inflate your standard deviation (and everything related to the SD). If your aim is to make statements about the average person in your sample, getting rid of the outliers is a natural first step.
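To make that concrete, here is a minimal sketch using Python's standard `statistics` module. The scores are made up purely for illustration, and `summarize` is a hypothetical helper, not something from Field's book. It compares the mean, the median and the sample standard deviation of a small sample before and after one extreme score is added.

```python
import statistics


def summarize(values):
    """Return the mean, median and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


# Invented example data: eight ordinary scores, then the same scores plus one extreme value.
scores = [4, 5, 5, 6, 5, 4, 6, 5]
with_outlier = scores + [40]

baseline = summarize(scores)
skewed = summarize(with_outlier)

# Print each statistic before and after the outlier joins the sample.
for name in baseline:
    print(f"{name}: {baseline[name]:.2f} -> {skewed[name]:.2f}")
```

Running it shows how much of the change in each number comes from that single data point, which is exactly why statisticians are tempted to drop it.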
Data science follows statistical practice, unless you’re performing anomaly detection. But if I may quote Wikipedia: typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text.
So again the outlier is treated as something that causes problems. Something unwanted.
Design takes a completely different approach. And I want to take the self-stabilising spoon for people with Parkinson’s as an example.
Parkinson’s is a disease that an estimated 0.9% of the world population has. From a statistical perspective it doesn’t make sense to focus on this data point. After all, 99.1% do not have this disease, and thus we can consider Parkinson’s, however crude it may sound, an anomaly. Yet this spoon has been designed exclusively for this outlier, and with success.
It is one of my favourite designs, because by designing for the extreme you have automatically designed for the rest of the population.
Anyone can use this spoon, despite the fact that it specifically benefits people with Parkinson’s. And instead of looking at outliers as nuisances that skew your data, designers allow them to set the new standard.
I love outliers because focussing on these extraordinary data points just may provide you with something special. And I believe that even in statistics, highlighting this data can give valuable information about the rest of the population. After all: what makes an anomaly an anomaly? Knowing what makes something different means knowing what is considered normal. Finally, these underdogs might be more similar to the rest than you think. Behaviour that stands out from the herd can be trendsetting (but this is the sociologist in me speaking).