Noah Arthurs, AJ Alvero
International Educational Data Mining Society
Word vectors are widely used as input features in natural language processing (NLP) tasks. Researchers have found that word vectors often encode societal biases, and steps have been taken toward debiasing the vectors themselves. However, little has been said about the fairness of the methods used to evaluate the quality of vectors. Analogical and word similarity tasks are commonplace, but both rely on purportedly ground-truth statements about the semantic relationships between words (e.g., "man is to woman as king is to queen"). These analogies look reasonable when one considers only the literal meanings of words, but two issues arise: (1) people do not always use words in a literal sense, and (2) the same word may be used differently by different groups of people. In this paper, we split a dataset of over 800,000 college admissions essays into quartiles based on reported household income (RHI) and train a separate set of word vectors on each quartile. We then test these
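The following is a minimal sketch of the two mechanics named above: splitting essays into RHI quartiles, and scoring analogies of the form "man is to woman as king is to queen." The abstract does not say how quartile boundaries were drawn, which embedding method was trained, or how analogies were scored, so this sketch assumes quartile cut points from Python's `statistics.quantiles` and the common vector-offset (3CosAdd) rule. All function names (`income_quartiles`, `split_by_quartile`, `solve_analogy`, `analogy_accuracy`) and the toy vectors are hypothetical illustrations, not the authors' code or data.

```python
import bisect
import math
import statistics
from collections import defaultdict


def income_quartiles(incomes):
    """Label each reported household income with a quartile, 1 (lowest) to 4 (highest).

    Assumption: cut points come from statistics.quantiles (default "exclusive"
    method); an income equal to a cut point falls into the higher quartile.
    """
    cuts = statistics.quantiles(incomes, n=4)
    return [bisect.bisect_right(cuts, x) + 1 for x in incomes]


def split_by_quartile(essays, incomes):
    """Group essays by the income quartile of their authors (hypothetical helper)."""
    groups = defaultdict(list)
    for essay, quartile in zip(essays, income_quartiles(incomes)):
        groups[quartile].append(essay)
    return dict(groups)


def _unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _cosine(u, v):
    """Cosine similarity as the dot product of the two unit vectors."""
    return sum(a * b for a, b in zip(_unit(u), _unit(v)))


def solve_analogy(vectors, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset (3CosAdd) rule.

    Returns the word, other than a, b and c, whose vector has the highest
    cosine similarity to unit(b) - unit(a) + unit(c).
    """
    target = [
        y - x + z
        for x, y, z in zip(_unit(vectors[a]), _unit(vectors[b]), _unit(vectors[c]))
    ]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: _cosine(vectors[w], target))


def analogy_accuracy(vectors, analogies):
    """Fraction of (a, b, c, d) analogies answered with d; out-of-vocabulary ones are skipped."""
    usable = [t for t in analogies if all(w in vectors for w in t)]
    if not usable:
        return 0.0
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in usable)
    return hits / len(usable)


# Hypothetical 3-d toy vectors (royalty, femaleness, personhood), not learned from data.
TOY_VECTORS = {
    "man": [0.1, 0.0, 1.0],
    "woman": [0.1, 1.0, 1.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "prince": [0.9, 0.0, 1.0],
    "apple": [0.0, 0.0, 0.2],
}

if __name__ == "__main__":
    print(income_quartiles([10, 20, 30, 40, 50, 60, 70, 80]))  # [1, 1, 2, 2, 3, 3, 4, 4]
    print(solve_analogy(TOY_VECTORS, "man", "woman", "king"))  # queen
```

Under this sketch, one would train a separate set of vectors on each group returned by `split_by_quartile` (the training step itself is omitted, since the abstract does not name the method) and compare `analogy_accuracy` on the same analogy list across the four sets. A difference in accuracy between quartiles would then reflect how each group's usage diverges from the analogies' assumed ground truth, not necessarily lower-quality vectors.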