On the Conflict of Robustness and Learning in Collaborative Machine Learning (2402.13700v2)
Abstract: Collaborative Machine Learning (CML) allows participants to jointly train a machine learning model while keeping their training data private. In many scenarios where CML is seen as the solution to privacy issues, such as health-related applications, safety is also a primary concern. To ensure that CML processes produce models that output correct and reliable decisions \emph{even in the presence of potentially untrusted participants}, researchers propose to use \textit{robust aggregators} to filter out malicious contributions that negatively influence the training process. In this work, we formalize the two prevalent forms of robust aggregators in the literature. We then show that neither can provide the intended protection: either they use distance-based metrics that cannot reliably identify malicious inputs to training; or use metrics based on the behavior of the loss function which create a conflict with the ability of CML participants to learn, i.e., they cannot eliminate the risk of compromise without preventing learning.