Efficient Concept Forgetting Via Label Annealing

Published in In Progress, 2024

Recommended citation:

The effectiveness of current machine learning models relies on their ability to grasp diverse concepts present in datasets. However, the presence of biased and noisy data can inadvertently cause these models to capture unintended concepts, thereby undermining their ability to generalize and provide utility. Consequently, addressing the removal of such undesired concepts from trained models becomes imperative for their responsible deployment. The primary aim of this study is to develop methodologies for forgetting specific undesired concepts from pre-trained classification models. We define concept forgetting as a property of the model to disregard the undesired concept during its decision-making process. To achieve this goal, we present a novel algorithm called Label ANnealing (LAN). This iterative algorithm employs a two-stage method for each iteration. In the first stage, pseudo-labels are assigned to the samples by annealing or redistributing the original labels based on the current iteration’s model predictions of all samples in the dataset, and further in the second stage, the model is fine-tuned on the dataset with pseudo-labels. We illustrate the effectiveness of the proposed algorithms across various models and datasets, demonstrating favorable results while maintaining computational efficiency and preserving the model’s generalization capabilities.

Download paper here