πŸ“ f-INE: Influence Estimation using Hypothesis Testing

Published in the ICML 2025 Workshop on DataWorld

Recommended citation: https://openreview.net/pdf?id=26zj5O7dph

👥 Co-authors: Sai Praneeth Karimireddy, Shashwat Sourav, Prathosh A.P.

f-INE

Understanding the influence of individual or groups of training examples on a model's predictions lies at the core of data valuation, attribution, debugging, and privacy. Yet, due to the inherent randomness and complexity of modern ML training pipelines, reliably measuring this influence remains elusive. In fact, we show that existing influence estimation methods fail to account for this randomness, leading to highly inconsistent results: small changes in data ordering can produce drastically different influence estimates. Instead, we introduce f-influence, a new definition of influence grounded in hypothesis testing that explicitly captures training-time randomness. We define the influence of a data subset as the statistical ease of distinguishing between the outputs of models trained with and without that subset. We prove that f-influence satisfies desirable properties such as compositionality and asymptotic normality, analogous to central limit theorems. Leveraging these properties, we design f-INfluence Estimation (f-INE), an algorithm that efficiently estimates the influence of training data in a single training run. Our approach is a theoretically sound, highly scalable, and empirically robust alternative to prior methods, enabling consistent influence estimation.

⏩ Full Paper
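To make the definition concrete, the sketch below is a minimal, brute-force Monte-Carlo illustration of the idea, not the paper's single-run f-INE algorithm: it trains many models with shuffled data ordering, with and without a given subset, and measures how easily their outputs can be told apart. All names (`train_logreg`, `distinguishability`, the probe-point output, the standardized mean difference used as the test statistic) are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only (NOT the paper's f-INE algorithm): estimate how
# distinguishable model outputs are when trained with vs. without a subset,
# treating data-ordering randomness as the source of noise.
import numpy as np

def train_logreg(X, y, rng, epochs=30, lr=0.1):
    """SGD logistic regression; randomness enters through the shuffling order."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            z = np.clip(X[i] @ w, -30, 30)        # avoid overflow in exp
            p = 1.0 / (1.0 + np.exp(-z))
            w -= lr * (p - y[i]) * X[i]
    return w

def output_statistic(w, x_probe):
    """Scalar model output we try to distinguish: the probe point's logit."""
    return x_probe @ w

def distinguishability(X, y, subset_idx, x_probe, runs=20, seed=0):
    """Monte-Carlo proxy for influence: separability of the output
    distributions of models trained with vs. without `subset_idx`."""
    rng = np.random.default_rng(seed)
    mask = np.ones(len(y), dtype=bool)
    mask[subset_idx] = False
    with_s, without_s = [], []
    for _ in range(runs):
        with_s.append(output_statistic(train_logreg(X, y, rng), x_probe))
        without_s.append(output_statistic(train_logreg(X[mask], y[mask], rng), x_probe))
    with_s, without_s = np.array(with_s), np.array(without_s)
    # Standardized mean difference as a crude two-sample test statistic:
    pooled = np.sqrt(0.5 * (with_s.var() + without_s.var()) + 1e-12)
    return abs(with_s.mean() - without_s.mean()) / pooled

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
    # Influence proxy of the first 10 training points on a probe prediction:
    print(distinguishability(X, y, subset_idx=np.arange(10), x_probe=X[0]))
```

A larger statistic means the two output distributions are easier to tell apart, i.e. the subset had more influence on that prediction; the paper's contribution is obtaining such an estimate from a single training run rather than repeated retraining as done here.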