We rely on facial recognition systems for everything from unlocking our phones and home doors to helping law enforcement identify criminals. IBM is preparing a trove of 1 million images sourced from Flickr Creative Commons to train AI-based facial recognition algorithms to be fair and accurate. This should also help AI systems detect faces and minimize errors.
John Smith, a fellow and lead scientist at IBM, says that many prominent datasets are available but that they are too narrow and fall short in balance and coverage. With more than 7 billion people on the planet, faces vary in ways that an AI system may fail to handle, so it could end up matching the wrong person or no one at all. This can lead to AI-based bias, which the team at IBM is fighting against so that its systems can learn to recognize people without such errors.
A paper published by Joy Buolamwini of MIT found that IBM Watson’s visual recognition platform had an error rate of 35% for females with darker skin, compared with less than 1% for lighter-skinned males, a gap that could encourage AI-based racial profiling in the long run.
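To make the idea of a per-group error gap concrete, here is a minimal sketch of how error rates could be compared across demographic subgroups. The function name, the record layout, and the toy data are all assumptions made for illustration; they are not taken from the paper or from IBM's tooling.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """Return the fraction of wrong predictions for each subgroup.

    records: iterable of (group, true_label, predicted_label) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if predicted != truth:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

# Made-up toy records, NOT real measurements.
sample = [
    ("darker-skinned female", "female", "male"),
    ("darker-skinned female", "female", "female"),
    ("lighter-skinned male", "male", "male"),
    ("lighter-skinned male", "male", "male"),
]

rates = error_rate_by_group(sample)
# The spread between the best and worst served groups is one simple bias signal.
gap = max(rates.values()) - min(rates.values())
```

A single overall accuracy figure would hide this kind of disparity, which is why the evaluation is broken down by subgroup.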
CNBC cites a 2016 report by the Center on Privacy and Technology at Georgetown University’s law school, which described how facial recognition systems can disproportionately affect African Americans, who are already targeted for arrests even without facial recognition systems.
As for IBM’s own efforts, the tech giant has curated a set of 1 million images called Diversity in Faces (DiF), which developers can use to train their AI systems to detect and differentiate faces with diversity in mind. IBM’s machine learning systems were used to collect 100 million images from Flickr, which were then cropped and studied. IBM followed a labeling approach in which it annotated individual aspects of each face, such as the distance between the eyes and the forehead size, which together form a “faceprint”.
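As a rough illustration of what measurements such as eye distance and forehead size might look like in code, the sketch below derives two ratios from facial landmark coordinates. The landmark names, the coordinates, and the choice to normalize by face height are assumptions made for this example; the article does not describe how IBM actually computes its annotations.

```python
import math

def facial_measures(landmarks):
    """Compute simple scale-independent facial ratios.

    landmarks: dict mapping hypothetical landmark names to (x, y) pixel
    coordinates. Dividing by face height keeps the values comparable
    across images of different sizes.
    """
    eye_distance = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    forehead_height = math.dist(landmarks["hairline"], landmarks["brow_center"])
    face_height = math.dist(landmarks["hairline"], landmarks["chin"])
    return {
        "eye_distance_ratio": eye_distance / face_height,
        "forehead_ratio": forehead_height / face_height,
    }

# Made-up coordinates for a single face, for illustration only.
example = {
    "left_eye": (40.0, 60.0),
    "right_eye": (100.0, 60.0),
    "hairline": (70.0, 10.0),
    "brow_center": (70.0, 50.0),
    "chin": (70.0, 160.0),
}

measures = facial_measures(example)
```

In practice the landmark positions would come from a face detector rather than being typed in by hand.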
Further, IBM has prepared sets covering contrasting skin colors and types of coloration, with gender, age, and other information labeled across a wide range, so that developers can use this trove of images to train their AI-based facial recognition systems to be fair, accurate, and unbiased.
Finally, even with a dataset that runs into the millions, there is no guarantee that it adequately represents the diversity of faces or that it will prevent bias across different subsets and groups. AI-based facial recognition is still learning and improving, and it has shortcomings that remain to be addressed.