Machine learning algorithms dominate society, from helping judges with courtroom decisions to influencing which applicants banks approve for loans. Because decisions big and small can be swayed by these mathematical models, researchers are working to make algorithms more transparent and fair.
Uthaipon (Tao) Tantipongpipat and Samira Samadi, Georgia Tech Ph.D. students in the School of Computer Science, recently published a paper describing a method that reduces the dimension of large data sets used for population analysis while preserving essential traits of the groups being analyzed. Algorithms can handle millions of records, but the process of reducing them can compress information and lose details. This, in turn, can lead to groups of people being unfairly associated with certain behaviors or characteristics.
Samadi and Tantipongpipat’s previous work uses principal component analysis (PCA), a dimension reduction technique that has been the gold standard for analyzing large data sets more efficiently. Their own version, Fair-PCA, uses the strength of PCA and retains more information so that algorithms can, in theory, have better data for decision-making.
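To make the concern concrete, here is a minimal sketch of how standard PCA can represent one group worse than another. It is not the authors' Fair-PCA algorithm. The data are synthetic and invented for illustration, and the helper names `pca_components` and `mean_reconstruction_error` are hypothetical. The sketch fits ordinary PCA to pooled data from a large group and a small group, then measures how much detail each group loses.

```python
import numpy as np

def pca_components(X, k):
    """Return the top-k principal directions (as rows) of centered data X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k]

def mean_reconstruction_error(X, components):
    """Average squared error after projecting rows of X onto the components."""
    X_hat = X @ components.T @ components
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)

# Assumption: synthetic data only. Group A (900 records) varies mostly along
# feature 0; group B (100 records) varies mostly along feature 1.
group_a = rng.normal(size=(900, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
group_b = rng.normal(size=(100, 5)) * np.array([1.0, 5.0, 1.0, 1.0, 1.0])

# Standard PCA on the pooled data, keeping a single dimension.
pooled = np.vstack([group_a, group_b])
mean = pooled.mean(axis=0)
components = pca_components(pooled - mean, k=1)

# Compare how much information each group loses under the shared projection.
err_a = mean_reconstruction_error(group_a - mean, components)
err_b = mean_reconstruction_error(group_b - mean, components)
print(f"group A error: {err_a:.2f}, group B error: {err_b:.2f}")
```

Because the larger group dominates the pooled variance, the single retained direction follows group A, and group B ends up with a noticeably higher reconstruction error. Fairness-aware variants aim to reduce this kind of gap.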
In their latest work, the duo is optimizing fair dimensionality reduction, allowing populations to be represented more accurately not only with PCA but also with a wider class of dimension reduction techniques.
The updated algorithm incorporates multiple equity measurements for populations – that is, with respect to social and economic welfare – and takes into account multiple demographic attributes. For example, gender is usually analyzed as male and female, but this leaves transgender and non-binary people out of an algorithm’s calculations, leading to unfair or biased assessments.
This new work is designed to allow machine learning researchers to analyze complex data sets more accurately, potentially leading to less bias.
"I feel like if fairness and bias are not being taken seriously into account at this point, then our problems are only going to compound. Machine learning algorithms are dominating our lives every day and they learn to behave based on previous outcomes. If we just let this build up and if we don't take care of it now, it will have a huge impact, one that may not be as positive as we had hoped," said Samadi.
The team will present "Multi-Criteria Dimensionality Reduction with Applications to Fairness" in December at the 33rd Annual Conference on Neural Information Processing Systems (NeurIPS) 2019 in Vancouver, British Columbia.