Facial recognition software is becoming the go-to security measure for businesses, but it can be inaccurate and racially biased. Although many companies have proposed adding human intervention to mitigate this, a Georgia Tech researcher says human-subject experiments must be a priority before human intervention is considered a one-size-fits-all solution.
“Humans are biased themselves, so how can you resolve an issue of bias with a human?” School of Computer Science Ph.D. alumna Samira Samadi said. “It might even make it worse.”
The limits of facial recognition
Facial recognition software supposedly automates building security. The software takes photos as people enter a building, which it then cross-references with an employee database. If the software finds a match, a person can enter the building.
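The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular vendor's product works: it assumes each photo has already been converted to a numeric feature vector, that faces are compared by cosine similarity, and that a fixed threshold decides what counts as a match. The names `find_match` and `database`, the 0.8 threshold, and the sample vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_match(probe, database, threshold=0.8):
    """Compare a probe vector with every enrolled employee vector.

    Returns (employee_id, score) for the best-scoring employee if the
    score reaches the threshold, otherwise (None, best_score).
    The threshold value is an assumption made for illustration.
    """
    best_id, best_score = None, -1.0
    for employee_id, enrolled in database.items():
        score = cosine_similarity(probe, enrolled)
        if score > best_score:
            best_id, best_score = employee_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical employee database of precomputed face vectors.
database = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}

# A photo that closely resembles an enrolled employee is admitted.
employee, score = find_match([0.88, 0.15, 0.28], database)
print(employee)

# A photo that resembles no one closely enough is rejected.
stranger, score = find_match([0.1, 0.2, 0.95], database)
print(stranger)
```

In a sketch like this, the similarity score is the number a human evaluator would be shown alongside the two images, and the choice of threshold determines how often the system wrongly admits or wrongly rejects someone.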
Despite many advances in image recognition and artificial intelligence, these systems are often more accurate for men with lighter skin tones and less accurate for women with darker skin tones. Companies have proposed adding a human evaluator to compensate for the software’s limitations.
Yet Samadi, who researches algorithmic fairness, immediately recognized the potential for more bias. She wanted to know whether adding a human evaluator to the process increases fairness or bias.
Designing such a human-subject study is challenging, however, as Samadi and her colleagues at Microsoft Research realized. Working with actual security guards or receptionists would have been ideal but was not feasible in practice.
Samadi turned to recruiting people through Mechanical Turk, as she had done in the past. These workers would give her volume, but they were not trained in recognizing faces. First, she studied how to compare faces. Then she worked out how to teach Mechanical Turk users about facial recognition systems, how to judge the accuracy of the system, and how to be confident in those judgments.
After this research, Samadi developed a user study and ran some trials with friends to ensure the study was clear and understandable. Then she ran the study with 300 users on Mechanical Turk.
Each user was trained to distinguish faces and evaluate the software. Next, the user saw two images along with the score the software had given them. Samadi expected the human evaluators to show bias when judging pairs of lighter-skinned versus darker-skinned people, but the results were much different.
“We really tried to imitate a real world scenario, but that actually made it more complicated for the users,” Samadi said.
The researchers were unsure whether the problem with the study stemmed from users not understanding the study or from biased behavior, and they ultimately decided not to publish the research. However, Samadi did publish a position paper, “A Human in the Loop is Not Enough: The Need for Human-Subject Experiments in Facial Recognition,” with Microsoft Research’s Forough Poursabzi-Sangdeh, Jennifer Wortman Vaughan, and Hanna Wallach. Samadi presented the work at the Conference on Human Factors in Computing Systems (CHI) in April.
The paper argues that studies like these are necessary and also lays out what makes them difficult. It identifies four main challenges to the efficacy and generalizability of a human-subject study like the one they conducted:
-Datasets: Finding an appropriate dataset is difficult for several reasons. Sourcing images ethically is challenging: past research has relied on images of celebrities or politicians, who are easily recognizable and thus bias the study. Many datasets are also already skewed and contain more lighter-skinned faces than darker-skinned ones. In addition, many datasets contain higher-quality images than camera footage would provide, so they are not an effective real-world comparison.
-Participants: Many available participants for studies like these are students or Mechanical Turk workers who are inexperienced in facial recognition.
-Context: Recognizing faces in an experiment is not comparable to on-the-job duties, when an unfamiliar person may be a threat.
-User Interface: Companies do not release their user interfaces for facial recognition software, leaving it up to researchers to design something that may not reflect what is used in real-world software.
“If someone wants to attack this problem in the future, they should know the challenges they have ahead of them,” Samadi said.