The importance of understanding your data before using it to train a model
News has recently broken that Apple's credit card gives women lower credit limits than men, even when their credit histories are identical to their husbands'. While we don't know precisely what causes this, anybody who knows the basics of machine learning could tell you that if you train on biassed data, you get a biassed model.
Apple and Goldman Sachs appear to have made one of the most basic data science errors in the book. They've thrown a lot of data at a model (probably a black-box model) without first making sure they understood that data. If they had done a proper exploratory analysis beforehand, they could have identified potential sources of bias in their data and corrected for them.
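As a rough illustration of the kind of check an exploratory analysis might include, the sketch below compares average credit limits for women and men within bands of similar credit score. The records, field layout, band width and function names are all invented for this example; nothing here reflects the data or methods Apple or Goldman Sachs actually use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical applicant records: (gender, credit_score, credit_limit).
# These values are invented purely for illustration.
records = [
    ("F", 720, 5000), ("M", 720, 10000),
    ("F", 650, 3000), ("M", 655, 6000),
    ("F", 790, 8000), ("M", 795, 16000),
]


def mean_limit_by_group(rows, band_width=50):
    """Average credit limit for each (score band, gender) pair."""
    groups = defaultdict(list)
    for gender, score, limit in rows:
        # Round the score down to the start of its band, e.g. 720 -> 700.
        band = (score // band_width) * band_width
        groups[(band, gender)].append(limit)
    return {key: mean(values) for key, values in groups.items()}


def disparity_by_band(rows, band_width=50):
    """Ratio of female to male mean limit in each band that contains both."""
    means = mean_limit_by_group(rows, band_width)
    bands = {band for band, _ in means}
    return {
        band: means[(band, "F")] / means[(band, "M")]
        for band in sorted(bands)
        if (band, "F") in means and (band, "M") in means
    }


ratios = disparity_by_band(records)
for band, ratio in ratios.items():
    print(f"score band {band}: female/male mean limit ratio = {ratio:.2f}")
```

A ratio well below 1 among applicants with comparable scores would be a prompt to investigate further before training anything on the data. It would not by itself prove where the bias comes from.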
An example from one of my previous projects illustrates the importance of understanding your data. Formisimo wanted to predict in real time whether users would complete or abandon web forms. Their existing models could predict, to a certain degree of accuracy, whether a customer had completed or abandoned the form given a complete history of their interactions, but they didn't work in a simulation of real-time behaviour. My investigations showed that a real signal of whether the user would complete the form was present only in the last hundred interactions. Taking this into account enabled me to create much better models for them.
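To make the distinction concrete, here is a minimal sketch of how complete histories differ from what a model sees in real time. It is not the code used on the Formisimo project. The function names, the list-of-events representation and the step size are assumptions made for illustration; only the window of one hundred interactions comes from the finding above.

```python
def last_n(interactions, n=100):
    """Keep only the most recent n interactions of a session."""
    return interactions[-n:]


def realtime_prefixes(interactions, step=1):
    """Yield the history as it would have looked at successive moments.

    A model trained only on complete sessions never sees these truncated
    views, which is what it actually receives when predicting in real time.
    """
    for end in range(step, len(interactions) + 1, step):
        yield interactions[:end]


# Hypothetical session of 250 interactions, represented here as integers.
session = list(range(250))

window = last_n(session)                      # interactions 150..249
snapshots = list(realtime_prefixes(session, step=100))
print(len(window), [len(s) for s in snapshots])
```

Evaluating a model on snapshots like these, rather than only on finished sessions, is one way to simulate real-time behaviour before deploying it.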
Apple now need to go over their training data, work out where the source of bias is, and fix it. If they need a fresh pair of eyes on it, they can contact Playful Technology Limited.