Modular DiffPruning for Bias Mitigation

Abstract

In recent years, large language models have achieved state-of-the-art performance on a wide variety of natural language processing tasks. These capabilities, however, come with negative consequences, namely the presence of various societal biases in these models. Furthermore, due to their size, storage and inference are becoming increasingly challenging and often require industry-grade infrastructure. In this research, we present a novel solution that addresses both of these challenges. We build upon two established approaches: first, adversarial training, a common (in-processing) bias mitigation approach that encourages a model to update its parameters during training such that it can perform a task in a debiased way; second, diff pruning, which enables storing fine-tuned models in a parameter-efficient way. We combine both approaches and propose a novel modular bias mitigation technique, where the information for debiasing a protected attribute is learned and stored in a sparse subnetwork, which can be added to a pre-trained model on demand. We show the effectiveness of our method in a large set of experiments on three protected attributes (gender, age, and race dialect) and on three datasets. The results show that our modular approach has several benefits: it significantly reduces bias while maintaining task performance compared to several strong baselines, it is parameter-efficient, it allows for the creation of modular bias mitigation diff masks, and it enables the integration of these modules as needed.
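To make the combination described above concrete, the sketch below illustrates the two ingredients the abstract names: a sparse diff vector added on top of frozen pre-trained weights (diff pruning, with the hard-concrete gates commonly used for L0-style sparsity), trained jointly with an adversarial attribute classifier through gradient reversal. It is a minimal illustration under assumed conventions, not the thesis' actual implementation; all names (DiffLinear, GradReverse, loss_fn) and hyperparameters are hypothetical.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward
    pass, so the encoder is pushed to *remove* attribute information."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DiffLinear(nn.Module):
    """Frozen pre-trained linear layer plus a gated diff vector.
    Gates use a hard-concrete relaxation so an L0-style penalty can drive
    most diff entries to exactly zero, yielding a sparse, detachable module."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = nn.Parameter(pretrained.bias.detach(), requires_grad=False)
        self.diff = nn.Parameter(torch.zeros_like(self.weight))       # learnable delta
        self.log_alpha = nn.Parameter(torch.ones_like(self.weight))   # gate logits

    def gate(self):
        # Hard-concrete gate (Louizos et al., 2018), stretched to (-0.1, 1.1)
        # and clamped, so gates can be exactly 0 or 1.
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / 0.66)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * 1.2 - 0.1).clamp(0.0, 1.0)

    def expected_l0(self):
        # Differentiable expected number of nonzero gates (sparsity penalty).
        return torch.sigmoid(self.log_alpha - 0.66 * math.log(0.1 / 1.1)).sum()

    def forward(self, x):
        # Pre-trained weights stay frozen; only the sparse diff is applied.
        return F.linear(x, self.weight + self.gate() * self.diff, self.bias)

def loss_fn(layer, task_head, adv_head, x, y_task, y_attr,
            sparsity_lambda=1e-5, adv_lambda=1.0):
    # Task loss + adversarial attribute loss (via gradient reversal)
    # + L0-style sparsity penalty on the diff gates.
    h = layer(x)
    task_loss = F.cross_entropy(task_head(h), y_task)
    adv_loss = F.cross_entropy(adv_head(GradReverse.apply(h, adv_lambda)), y_attr)
    return task_loss + adv_loss + sparsity_lambda * layer.expected_l0()

# Toy usage: only {diff, log_alpha} and the heads are optimized.
pre = nn.Linear(16, 16)
layer, task_head, adv_head = DiffLinear(pre), nn.Linear(16, 2), nn.Linear(16, 2)
opt = torch.optim.Adam([p for p in layer.parameters() if p.requires_grad]
                       + list(task_head.parameters()) + list(adv_head.parameters()), lr=1e-3)
x = torch.randn(8, 16)
y_task, y_attr = torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,))
loss = loss_fn(layer, task_head, adv_head, x, y_task, y_attr)
opt.zero_grad(); loss.backward(); opt.step()

After training, the thresholded sparse diff (gate * diff) can be stored on its own and added to, or removed from, the frozen pre-trained weights on demand, which is what makes the debiasing module modular.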


Citation

Lukas Hauzenberger
Modular DiffPruning for Bias Mitigation
Advisor: Navid Rekab-saz
Johannes Kepler University Linz, Master's Thesis, 2023.

BibTeX

@mastersthesis{Hauzenberger2023master-thesis,
    title = {Modular DiffPruning for Bias Mitigation},
    author = {Hauzenberger, Lukas},
    school = {Johannes Kepler University Linz},
    year = {2023}
}