Dr Krishna Gummadi, of the Max Planck Institute for Software Systems, delivered an excellent 2019 Turing Talk. Fortunately for me, he discussed the mathematical and statistical problems with AI in layman’s terms.
He started with the stories we know quite well - the potential value in big data, the ‘Moneyball’ success story of statistical analysis, and how combining data with analysis can reduce the noise in human decision-making.
Where it starts to get darker as a story is also quite well known - predictive policing. He cited a piece of investigative journalism by Rachel Ehrenberg that compared estimated drug use by ethnicity in Oakland with the ‘perception’ of a predictive policing algorithm. The picture it produced was, of course, hugely racist.
To demonstrate the issues that lead to this, Gummadi used the example of COMPAS, a recidivism risk prediction tool. The system’s designers actively tried to remove bias by excluding information such as race. He demonstrated that simply removing that information does not remove the bias, because the system learns a single linear boundary - optimal for the population as a whole - to separate those likely to reoffend from those who are not.
It turns out that a boundary which is optimal overall is not optimal for every subgroup - in this case, for every racial group - and thus leads to... racist results. The error rates are significantly worse for black people than for white people, both for people wrongly predicted to reoffend (false positives) and for reoffenders wrongly predicted not to (false negatives)!
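To make that concrete, here is a minimal sketch on made-up data (nothing to do with the real COMPAS system or Gummadi’s own experiments): a single logistic regression is fitted across two subgroups whose outcomes follow different patterns, and the one boundary that minimises overall error serves the smaller group far less well.

```python
# Synthetic illustration only: two subgroups, one shared linear boundary.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_a, n_b = 2000, 500                                   # group B is the minority
group = np.concatenate([np.zeros(n_a, int), np.ones(n_b, int)])
X = rng.normal(size=(n_a + n_b, 2))

# Group A's outcome is driven by the first feature, group B's by the second,
# so no single linear boundary suits both groups equally well.
signal = np.where(group == 0, X[:, 0], X[:, 1])
y = (signal + 0.3 * rng.normal(size=len(group)) > 0).astype(int)

clf = LogisticRegression().fit(X, y)                   # one shared boundary

for g in (0, 1):
    pred = clf.predict(X[group == g])
    true = y[group == g]
    fpr = np.mean(pred[true == 0] == 1)                # wrongly flagged as high risk
    fnr = np.mean(pred[true == 1] == 0)                # wrongly cleared as low risk
    print(f"group {'AB'[g]}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```

The fitted boundary tracks the majority group’s pattern, so group B’s false positive and false negative rates come out far worse even though race (the group label) is never given to the model.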
What is the issue? The learning objective of such a system is normally to minimise the overall error rate. But it also needs to reduce the disparity in error rates between groups. So a learning objective would need to constrain that disparity as well - to aim for ‘fair learning’.
The problem here is that this constraint reduces accuracy - so a trade-off has to be made between overall error and the disparity between groups. In other words, the learning constraints embed ethics and values - and the balance is decided before any data is run through the system.
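As a sketch of what that trade-off looks like, here is a generic penalised objective on the same kind of synthetic setup as the previous sketch - not the formulation used in the talk. The overall log-loss is augmented with a penalty on the gap between the two groups’ average losses; sweeping the penalty weight shows the disparity shrinking while the overall error rises.

```python
# Hedged sketch of 'fair learning' as a penalised objective (illustrative only).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_a, n_b = 2000, 500
group = np.concatenate([np.zeros(n_a, int), np.ones(n_b, int)])
X = rng.normal(size=(n_a + n_b, 2))
signal = np.where(group == 0, X[:, 0], X[:, 1])        # groups follow different patterns
y = (signal + 0.3 * rng.normal(size=len(group)) > 0).astype(int)
Xb = np.hstack([X, np.ones((len(y), 1))])              # add a bias column

def per_example_loss(w):
    z = np.clip(Xb @ w, -30, 30)                       # clip to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def objective(w, lam):
    """Overall log-loss plus lam times the squared gap between group losses."""
    losses = per_example_loss(w)
    gap = losses[group == 0].mean() - losses[group == 1].mean()
    return losses.mean() + lam * gap ** 2

for lam in [0.0, 2.0, 20.0]:
    w = minimize(objective, np.zeros(3), args=(lam,)).x
    err = (Xb @ w > 0).astype(int) != y
    disparity = abs(err[group == 0].mean() - err[group == 1].mean())
    print(f"lam={lam:>4}: overall error {err.mean():.3f}, group error gap {disparity:.3f}")
```

The penalty weight `lam` is exactly the ethical dial described above: turning it up buys a smaller gap between the groups at the cost of a higher overall error rate, and someone has to choose its value before the data is run.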
An added complication is that ethnicity is only one of many groupings we care about - and it is provably impossible to optimise for every group, on every fairness measure, at the same time.
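One well-known form of this impossibility - the standard argument from the fairness literature (e.g. Chouldechova, 2017), not necessarily the exact result cited in the talk - can be shown with a few lines of arithmetic: when two groups have different base rates, matching their positive predictive value and false negative rate forces their false positive rates apart.

```python
# If PPV and FNR are equal across groups, the false positive rate is pinned by
# the identity  FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR),  where p is the base rate.
def implied_fpr(base_rate, ppv, fnr):
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

ppv, fnr = 0.7, 0.3                      # identical for both groups by assumption
for name, base_rate in [("group A", 0.5), ("group B", 0.2)]:
    print(f"{name}: base rate {base_rate:.0%} -> implied FPR {implied_fpr(base_rate, ppv, fnr):.2f}")
```

With different base rates, the implied false positive rates cannot match - so at least one fairness measure has to give.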
Some other biases in training data include sampling bias (for example, under-representing some races) and labelling bias (for example, labels that are themselves the product of human judgements in the first place).
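Here is a small sketch of the labelling-bias case (sampling bias can be shown the same way by under-sampling one group). The two groups below behave identically, but a biased labelling process records fewer of group B’s true positives, and the model faithfully reproduces that skew. For simplicity the group membership is visible to the model as a feature; in practice a proxy feature would do the same job. This is an illustration, not an example from the talk.

```python
# Hedged sketch of labelling bias: identical groups, biased recorded labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def make_group(n, flag):
    """Identical data-generating process; 'flag' identifies (or proxies) the group."""
    X = rng.normal(size=(n, 2))
    y_true = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)
    return np.hstack([X, np.full((n, 1), flag)]), y_true

X_a, y_a = make_group(2000, flag=0)
X_b, y_b_true = make_group(2000, flag=1)

# Biased labelling: 40% of group B's true positives are recorded as negatives.
y_b_recorded = y_b_true.copy()
flipped = (y_b_true == 1) & (rng.random(len(y_b_true)) < 0.4)
y_b_recorded[flipped] = 0

clf = LogisticRegression().fit(np.vstack([X_a, X_b]),
                               np.concatenate([y_a, y_b_recorded]))

# Evaluate against the TRUE outcomes on fresh data from the same process.
for name, flag in [("A", 0), ("B", 1)]:
    X_te, y_te = make_group(1000, flag)
    pred = clf.predict(X_te)
    fnr = np.mean(pred[y_te == 1] == 0)
    print(f"group {name}: false negative rate vs true outcomes {fnr:.2f}")
```

Nothing in the learning algorithm is unfair here; the skew is baked into the labels, and the model simply learns it.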
So far, so worrying. But it then got a little worse... if we have fair objectives for a system and unbiased training data to use, it turns out that this still isn’t enough.
Gummadi continued his talk with an explanation of biases in latent representations - essentially huge multi-dimensional maps of words, in which related words sit close together and relationships between words show up as directions (man is to king as woman is to queen, and so on). These interrelations, said Gummadi, are far too complex to be understood by a human. However, they necessarily reflect human society... and he gave the example of interrogating them for analogies, which yields Man : Programmer as Woman : Homemaker, and Father : Doctor as Mother : Nurse.
When looking at extreme associations, the results are sobering. The top five ‘she’ associations are homemaker, nurse, receptionist, librarian and socialite; the top five ‘he’ associations are maestro, skipper, protégé, philosopher and captain.
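These associations are easy to probe with off-the-shelf tools. The sketch below uses gensim and one of its downloadable GloVe vector sets (a sizeable download on first use); any pretrained word vectors would do, and this is not necessarily the embedding or method used in the talk.

```python
# Probe a pretrained word embedding for gendered associations (illustrative only).
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# Analogy arithmetic: "man is to programmer as woman is to ...?"
print(vectors.most_similar(positive=["woman", "programmer"], negative=["man"], topn=3))

# Project occupation words onto the she-he direction: positive scores lean
# towards 'she', negative scores towards 'he'.
she_he = vectors["she"] - vectors["he"]
she_he /= np.linalg.norm(she_he)
for word in ["homemaker", "nurse", "receptionist", "philosopher", "captain", "maestro"]:
    unit = vectors[word] / np.linalg.norm(vectors[word])
    print(f"{word:>13}: {float(unit @ she_he):+.3f}")
```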
He used machine translation to show how these biases surface in the real world, via Turkish-to-English translation apps. Turkish’s third-person pronoun is gender-neutral, but the software adds a gender in translation - ‘she is a cook’, ‘he is an engineer’.
For fair human-machine symbiosis, he therefore listed three requirements:
- Fair learning objectives
- Unbiased training data
- Unbiased latent representations
It is hard.