PROJECT TITLE: Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination

ABSTRACT: As Machine Learning gains popularity, whether the decisions made by Machine Learning models are fair is becoming an increasingly important question. Research is already under way to formalize a machine-learning notion of fairness and to design frameworks that build fair models at some cost in accuracy, but most of these frameworks target either supervised or unsupervised learning. Two observations prompted us to ask whether semi-supervised learning might be an effective way to address discrimination. First, previous research has shown that increasing the size of the training set can lead to a better trade-off between accuracy and fairness. Second, today's most powerful models require an enormous amount of training data, which in practice is likely to be available only as a combination of labeled and unlabeled data. In this paper, we therefore present a framework for fair semi-supervised learning in the pre-processing phase. The framework comprises pseudo labeling to predict labels for unlabeled data, a re-sampling method to obtain multiple fair datasets, and ensemble learning to improve accuracy and decrease discrimination. A theoretical decomposition analysis of bias, variance, and noise highlights the different sources of discrimination in semi-supervised learning and their impact on fairness.
Experiments on real-world and synthetic datasets show that our method is able to use unlabeled data to achieve a better trade-off between accuracy and discrimination.
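The three pre-processing steps named in the abstract (pseudo labeling, re-sampling into multiple fair datasets, and ensemble learning) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base learner, the re-sampling rule, or the discrimination measure, so everything below is an assumption. The sketch uses a nearest-centroid classifier as a stand-in learner, treats a dataset as "fair" when every (protected group, label) cell has the same size, and measures discrimination as the absolute gap in positive-prediction rates between two groups. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(rows):
    """Stand-in base learner (assumption): mean feature vector per label."""
    sums, counts = {}, Counter()
    for x, _s, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def pseudo_label(labeled, unlabeled):
    """Step 1: train on labeled rows, then predict labels for unlabeled rows."""
    model = fit_centroids(labeled)
    return [(x, s, predict(model, x)) for x, s in unlabeled]

def fair_resample(rows, per_cell, rng):
    """Step 2 (assumed rule): draw the same number of rows, with replacement,
    from every (protected group, label) cell, so that the positive rate is
    equal across groups in the resampled dataset."""
    cells = {}
    for row in rows:
        cells.setdefault((row[1], row[2]), []).append(row)
    sample = []
    for members in cells.values():
        sample.extend(rng.choices(members, k=per_cell))
    return sample

def train_ensemble(rows, n_models, per_cell, rng):
    """Step 3a: fit one base learner per resampled fair dataset."""
    return [fit_centroids(fair_resample(rows, per_cell, rng))
            for _ in range(n_models)]

def ensemble_predict(models, x):
    """Step 3b: majority vote over the ensemble members."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

def discrimination(preds, groups):
    """Assumed measure: |P(pred=1 | s=1) - P(pred=1 | s=0)|."""
    def rate(g):
        members = [p for p, s in zip(preds, groups) if s == g]
        return sum(members) / max(1, len(members))
    return abs(rate(1) - rate(0))

def make_toy_data(rng):
    """Hypothetical biased data: group 1 receives positive labels more often."""
    sizes = {(0, 0): 30, (0, 1): 10, (1, 0): 10, (1, 1): 30}
    labeled, unlabeled = [], []
    for (s, y), n in sizes.items():
        for i in range(n):
            x = [rng.gauss(2.0 * y, 1.0), rng.gauss(1.0 * s, 1.0)]
            if i < 5:                      # keep a few labeled rows per cell
                labeled.append((x, s, y))
            else:                          # the rest lose their labels
                unlabeled.append((x, s))
    return labeled, unlabeled

rng = random.Random(0)
labeled, unlabeled = make_toy_data(rng)
full = labeled + pseudo_label(labeled, unlabeled)
models = train_ensemble(full, n_models=5, per_cell=20, rng=rng)
preds = [ensemble_predict(models, x) for x, _s, _y in full]
gap = discrimination(preds, [s for _x, s, _y in full])
print(f"discrimination gap on toy data: {gap:.3f}")
```

An odd number of ensemble members avoids tied votes for binary labels. Sampling with replacement lets small cells be over-sampled, which is one simple way to balance the data; other re-sampling rules would fit the same pipeline.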