Algorithmic bias is the design problem of the 21st century (and here’s what we can do about it)

We have entered the second machine age, or the age of AI. With digital transformation top of mind at every big business, data and artificial intelligence seem to offer endless possibilities. These machines, personalised and free of judgment, will infinitely improve our lives.

Well… No. Because all these technologies are infused with bias.

A few weeks ago my colleague Kim had a conversation with a Dutch start-up. They were discussing speech-to-text techniques: the start-up wanted to transcribe conversations between the company’s experts and their customers.

But they ran into a problem: speech-to-text systems perform sub-par on female voices. How can this be?

This sub-par performance is a consequence of algorithmic bias. And it doesn’t stop at speech recognition: facial recognition, for example, is also less accurate for women. Nor does the problem lie with gender alone: the same systems perform poorly on non-white skin colours.

Algorithms are largely the product of human input: not only in the literal sense of writing the code, but also in training the model.

I always compare artificial intelligence to a bomb-sniffing dog. The dog only knows that when it smells a certain scent and barks, it gets a cookie. In much the same way, you have to teach the algorithm what to do.

During the training of the model we transfer biases, both conscious and unconscious, into the code. For example, if we train an algorithm to find the best candidates for a technical vacancy, it will look at historical data to ‘learn’ what the best candidates looked like. Because far more men than women work in tech, you quickly create a gender-biased algorithm (something that Amazon, among other companies, recently experienced).
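To make that concrete, here is a minimal sketch (in Python, with entirely made-up, simulated data — this is an illustration of the mechanism, not anyone’s real hiring system) of how bias in historical labels ends up in the model:

```python
# Toy illustration: all data below is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000

# Simulated 'historical' hiring data: skill is distributed identically across
# genders, but past hiring decisions favoured men.
is_male = (rng.random(n) < 0.8).astype(float)          # 80% of past applicants are men
skill = rng.normal(0.0, 1.0, n)                        # true ability, gender-independent
hired = skill + 0.8 * is_male + rng.normal(0.0, 0.5, n) > 0.5

# Train a classifier on those historical outcomes, with gender as a feature
X = np.column_stack([skill, is_male])
model = LogisticRegression().fit(X, hired)

# Two equally skilled candidates, one male and one female
probs = model.predict_proba([[1.0, 1.0], [1.0, 0.0]])[:, 1]
print(f"male candidate:   {probs[0]:.2f}")
print(f"female candidate: {probs[1]:.2f}")
# The male candidate gets a clearly higher 'hire' probability, because the
# training labels encoded the old bias -- the model simply reproduces it.
```

Dropping the gender column doesn’t automatically fix this either, since other features often correlate with it; the point is how quickly “learning from history” becomes “learning history’s bias”.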

It starts with the programmers who develop the algorithms. This group is mostly male and Caucasian, with an average age of 28 (there is even an article claiming that the age limit for developers is 45). Because the group is so alike in gender, race and age, the chances increase that its biases are similar too.

The issues continue with the test groups on which the algorithm is trained and validated. The bias gets reinforced because, most probably, the test group is not diverse enough either. What you often see in practice is that, for lack of time and resources, the same programmers who build the algorithm also train and test it on themselves.
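This is also why a single overall accuracy score can look reassuring while a whole group is underserved. A minimal sketch, with invented numbers, of why test results should always be broken down per group:

```python
# Hypothetical outcome of a speech-to-text test run: (group, transcript correct?)
# The numbers are invented; the point is the skew in the test set.
results = (
    [("male", True)] * 90 + [("male", False)] * 10 +    # 100 male samples, 90% correct
    [("female", True)] * 15 + [("female", False)] * 5   # only 20 female samples, 75% correct
)

overall = sum(ok for _, ok in results) / len(results)
print(f"overall accuracy: {overall:.0%}")               # ~88% -- looks fine

for group in ("male", "female"):
    subset = [ok for g, ok in results if g == group]
    print(f"{group:>6}: {sum(subset) / len(subset):.0%} on {len(subset)} samples")
# male:   90% on 100 samples
# female: 75% on  20 samples
# A skewed test set both hides the gap and makes the minority estimate unreliable.
```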

Now let me be clear: I do not believe that this group is consciously building biased systems, or deliberately building in sexism or racism. Bias is often unconscious, and the fact that the tech industry is male-dominated is a problem that has been around for decades.

Example: voice recognition in the car industry.

Let’s make it a bit more concrete.

Voice recognition is booming in the car industry. Collaborations between tech and car companies have been all over the news (such as Microsoft and BMW), and in-car speech recognition is becoming an almost standard feature in new cars.

With your voice you can change the radio station, and call your mom whilst driving. But you can also use speech to call emergency services when you’ve been in an accident.

At the time of writing, even the best-performing voice recognition rates 13% worse for female voices. And that is without taking dialects or race into account.
So it worries me that if I crash my car, my odds of successfully reaching emergency services through the car’s system drop by that much.

We’re placing a lot of trust in artificial intelligence, assuming that, because it’s a machine, it will judge objectively and fairly. But if we don’t train our models objectively and fairly, we will encounter more and more situations in which some groups of people are disadvantaged.

Diversity is not just beneficial for work culture; it also has a positive effect on the test group and the training sample. For example, women speak differently from men (source):

Women tend to be more intelligible (for people without high-frequency hearing loss), and to talk slightly more slowly. In general, women also favor more standard forms and make less use of stigmatized variants. Women’s vowels, in particular, lend themselves to classification: women produce longer vowels which are more distinct from each other than men’s are.

Ignoring differences like these doesn’t just hurt company culture; it can actually make your model perform worse.

The consequences may seem harmless now, but they have a serious effect in the long term. After all, many of the algorithms being built today form the basis for future systems. That means the biases become harder to program out, and in some cases are likely to be reinforced over time.

What can we do about it?

I’m with Teen Vogue (yes, seriously) on this one: “such a complicated problem requires multiple solutions”.

The good news is that algorithms are only as good as their input, and we humans control that input. The difficult part is that we need to create algorithms that are neutral, free of bias. And humans are never without bias.

This new age of AI does offer endless possibilities, but we need to take responsibility for shaping it so that everyone can benefit. Luckily, more and more tech companies are being vocal about their responsibilities in artificial intelligence. But in my opinion everyone needs to take responsibility, individually as well.

This means that we need to increase diversity in the teams that create algorithms.
We DO need more people from different backgrounds: geographical, social and cultural.
We DO need more women in tech, and we need to support them to keep them in tech.
And in my opinion we DO need ethics boards that check the produced algorithms and AI for biases, so we can minimize the number of flawed algorithms we put into the world.
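Such a check does not need to be heavyweight. Here is a minimal sketch, with hypothetical numbers, of one test an ethics board (or even a CI pipeline) could run on a selection model before it ships, based on the well-known “four-fifths rule” for disparate impact:

```python
# Hypothetical shortlisting decisions from a model: 1 = shortlisted, 0 = rejected.
def selection_rate(decisions):
    """Share of candidates in a group that the model selected."""
    return sum(decisions) / len(decisions)

male_decisions   = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # 8 out of 10 shortlisted
female_decisions = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # 3 out of 10 shortlisted

ratio = selection_rate(female_decisions) / selection_rate(male_decisions)
print(f"disparate impact ratio: {ratio:.2f}")

# Common rule of thumb: flag the model when the ratio drops below 0.8.
if ratio < 0.8:
    print("potential adverse impact -- review before shipping")
```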

Being aware of bias is not the whole solution, but it’s a good beginning.
