CSE professor and his team published at CIKM 2021, a top-tier data mining conference

A team of researchers from Lehigh, NYU, and the University of San Francisco investigated the fairness of graphical models in machine learning. The team found that state-of-the-art models, called "graph neural networks", can pick up and exacerbate biases in graph data used for person/business credibility evaluation. The research found that unfair predictions by a graphical model can be caused by the number of connections a node has. The finding agrees with everyday intuition. For example, in a social network of job-seekers, someone with more connections appears more credible and can receive more accurate, higher-quality job recommendations, while a newcomer who is equally qualified for a position may not be selected for a job. This is unfair to the majority of users, who have fewer connections while trying to establish themselves. The unfairness in the graph can also reinforce itself. Continuing the example, working in a prominent position can lead to more connections, which in turn bring more resources and opportunities in the future. Such reinforcement has also been found in police surveillance planning: more heavily policed neighborhoods generate more criminal records, which, when used to train a machine learning model for planning future surveillance, can lead to even more surveillance of the already over-surveilled neighborhoods.


Multiple fairness metrics have been advocated, but delivering accurate model predictions while meeting those metrics is usually quite difficult. The team developed an optimization algorithm that finds optimal trade-offs among these competing goals. In a specific context, domain experts can then select the trade-off that is least harmful across all subpopulations. On November 1-5, PhD team member Jiaxin Liu presented the work at CIKM 2021, a renowned conference on information retrieval and data mining.
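The idea of choosing among accuracy/fairness trade-offs can be illustrated with a small sketch. This is not the paper's algorithm; the candidate models, their error and unfairness scores, and the `pareto_front` helper below are hypothetical, standing in for whatever metrics a practitioner actually measures:

```python
# Hedged sketch: a domain expert picks from models that are not dominated
# on (error, unfairness) -- lower is better on both axes.
def pareto_front(candidates):
    """Return candidates for which no other candidate is at least as good
    on both error and unfairness (i.e., the non-dominated set)."""
    front = []
    for c in candidates:
        dominated = any(
            o["error"] <= c["error"] and o["unfair"] <= c["unfair"] and o != c
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical candidate models with (error, unfairness) scores.
models = [
    {"name": "A", "error": 0.10, "unfair": 0.30},
    {"name": "B", "error": 0.12, "unfair": 0.15},
    {"name": "C", "error": 0.20, "unfair": 0.14},
    {"name": "D", "error": 0.15, "unfair": 0.25},  # dominated by B
]
print([m["name"] for m in pareto_front(models)])  # -> ['A', 'B', 'C']
```

A domain expert would then pick one point on this frontier, e.g. model B, as the trade-off least harmful to all subpopulations in their context.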


In another recent work, the team observed that for users to adopt graphical models as a powerful analytic tool, their trust in the models must first be established. Explaining how the models work is an important way to establish that trust. Open questions remain, though: how do humans interpret a prediction made on graphs, and how do those interpretations turn into trust or distrust? Our fourth-year PhD student, Chao Chen, led a project to reveal the relationship between human interpretation and trust. The study shows that there are two perception modes corresponding to two explanation desiderata: simulatability and counterfactual relevance. Intuitively, simulatability measures how easily an explanation can be understood by a lay user, while counterfactual relevance measures how strongly an explanation conveys what causes the prediction. The team found that a prediction is most trusted when the associated explanation is both easy to understand and provides a cause. When the explanation is not simple enough, or when no cause is provided, users can be confused and are less likely to trust the explanations. Motivated by the user study, the team proposed a multi-objective optimization algorithm to search for explanations that balance the two desiderata and maximize human trust in the predictions.
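One simple way to picture balancing the two desiderata is a weighted score over candidate explanations. The scoring function, the weight `alpha`, and the candidate explanations below are all hypothetical illustrations, not the team's actual method:

```python
# Hedged sketch: rank candidate explanations by a weighted balance of
# simulatability (simplicity) and counterfactual relevance (causal strength),
# both assumed to be normalized to [0, 1].
def trust_score(expl, alpha=0.5):
    # alpha trades off simplicity against causal strength.
    return alpha * expl["simulatability"] + (1 - alpha) * expl["counterfactual"]

explanations = [
    {"id": "small subgraph", "simulatability": 0.9, "counterfactual": 0.4},
    {"id": "full neighborhood", "simulatability": 0.2, "counterfactual": 0.9},
    {"id": "key edge set", "simulatability": 0.7, "counterfactual": 0.7},
]
best = max(explanations, key=trust_score)
print(best["id"])  # -> 'key edge set'
```

The balanced candidate wins here, echoing the study's finding that explanations which are both simple and causal are trusted most; a full multi-objective search would explore many such trade-offs rather than fixing one weight.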


Chao also led another project that helps chemistry researchers and biochemists make sense of neural network predictions. For example, new drugs or catalysts must be discovered from millions of potential chemical structures, represented as graphs of atoms. The team investigated how a neural network, called a "Siamese network", can compare two graphs and pinpoint structures similar to those already known to be useful. Before a neural network can be adopted, a domain expert wants to know, beyond accuracy, whether the network sifts the structures the way an experienced chemist does. Without explanations to make sense of the network's predictions, a reasonably cautious researcher may refuse to adopt them. Chao designed a self-learning approach to stably identify the causes of the predictions, which explains to domain experts what patterns the neural network selects to make its predictions. The explanations also provide a way to improve the neural network by incorporating feedback from domain experts. The work is a collaboration between Prof. Xie's group and Prof. Srinivas Rangarajan's group in the CBE Department. In December this year, Chao will present two papers at ICDM 2021, the flagship conference publishing cutting-edge data mining research.
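The defining property of a Siamese architecture is that both inputs pass through the same encoder before their embeddings are compared. The toy "encoder" below (simple structural statistics of an adjacency-list graph) and the cosine comparison are hypothetical stand-ins for the learned graph encoder in the actual work:

```python
# Hedged sketch of a Siamese-style comparison between two small "molecular"
# graphs, each given as an adjacency list {node: [neighbors]}.
import math

def encode(graph):
    # Toy shared encoder: summarize a graph by node count, edge count,
    # and maximum degree. Both graphs use this SAME encoder, which is
    # what makes the setup "Siamese".
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    max_deg = max((len(nbrs) for nbrs in graph.values()), default=0)
    return [n, m, max_deg]

def similarity(g1, g2):
    # Cosine similarity between the two graph embeddings.
    a, b = encode(g1), encode(g2)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(similarity(triangle, triangle))  # identical graphs -> 1.0
print(similarity(triangle, path))      # similar but not identical -> < 1.0
```

In the real setting, a learned encoder replaces these hand-made statistics, and the explanation question becomes which substructures drive the similarity score.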