In-Task Knowledge Transfer: Applying AdapterFusion to Dense Passage Retrieval

Abstract

Pre-training and fine-tuning methods have greatly advanced information retrieval (IR). Traditionally, all parameters of large pre-trained language models were fine-tuned for specific tasks. However, the growing size of these models and the multitude of tasks have made this approach less practical and more resource-intensive. In recent natural language processing (NLP) research, there is growing interest in methods that fine-tune fewer parameters while maintaining good or on-par performance. These approaches are typically called parameter-efficient fine-tuning methods. On top of one of these well-proven methods, called Adapters, a knowledge transfer method called AdapterFusion was introduced. It combines knowledge from multiple Adapters in a two-stage procedure without the risk of catastrophic forgetting. Various works have used this method to realize modern approaches to bias mitigation, cross-lingual NLP tasks, and more. All these applications have in common that they combine knowledge from two separate tasks. In this work, we apply AdapterFusion within the same task of IR to see whether we can improve the performance of dense passage retrieval (DPR). We run various experiments with Adapters and AdapterFusion on the MSMARCO passage retrieval dataset to investigate this question. By replacing the fully fine-tuned encoder in a DPR setup with a parameter-efficient Adapter-based encoder, we show that, in a single-encoder retriever setup, this does not come at a cost in performance. With only 0.597% of the trainable parameters of the fully fine-tuned encoder, we not only maintain the same retrieval performance but also obtain substantially smaller model checkpoints. Further, by training separate Adapters for query encoding and passage encoding in both a parallel and an isolated training setup, we discover that the learning capability of the query encoder is higher in the isolated setup. By then combining these Adapters in various AdapterFusion models, we show that a successful in-task knowledge transfer is possible, allowing the Fusion models to perform better than the individual query and passage Adapters. Comparing the results to the single-Adapter baseline and the fully fine-tuned baseline, however, we cannot find any significant improvement after the in-task knowledge transfer. Additionally, we analyze the Adapter activations in the AdapterFusion layers and see that, even though the passage Adapters are generally activated the strongest in all systems, we cannot achieve a solid result without a well-trained query Adapter.
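
The two-stage setup described in the abstract can be sketched with the AdapterHub "adapters" library; the adapter names ("query", "passage"), the base model, and the training details below are illustrative assumptions, not the exact configuration used in the thesis.

    # Minimal sketch of the two-stage Adapter / AdapterFusion setup with the
    # AdapterHub "adapters" library. Adapter names and base model are
    # illustrative assumptions, not the thesis' exact configuration.
    from adapters import AutoAdapterModel
    from adapters.composition import Fuse

    model = AutoAdapterModel.from_pretrained("bert-base-uncased")

    # Stage 1: add task-specific Adapters; each is trained separately while
    # the pre-trained encoder weights stay frozen.
    model.add_adapter("query")
    model.add_adapter("passage")

    # Stage 2: freeze the trained Adapters and learn an AdapterFusion layer
    # that combines their outputs via an attention mechanism.
    fusion = Fuse("query", "passage")
    model.add_adapter_fusion(fusion)
    model.train_adapter_fusion(fusion)  # freezes Adapters, trains only the Fusion weights
    model.set_active_adapters(fusion)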


Citation

David Obermann
In-Task Knowledge Transfer: Applying AdapterFusion to Dense Passage Retrieval
Advisor(s): Markus Schedl
Johannes Kepler University Linz, Master's Thesis, 2024.

BibTeX

@mastersthesis{Obermann2024master-thesis,
    title = {In-Task Knowledge Transfer: Applying AdapterFusion to Dense Passage Retrieval},
    author = {Obermann, David},
    school = {Johannes Kepler University Linz},
    year = {2024}
}