Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs

Published 5 Nov 2018 in cs.LG and stat.ML | (1811.01900v3)

Abstract: We consider a simple and overarching representation for permutation-invariant functions of sequences (or multiset functions). Our approach, which we call Janossy pooling, expresses a permutation-invariant function as the average of a permutation-sensitive function applied to all reorderings of the input sequence. This allows us to leverage the rich and mature literature on permutation-sensitive functions to construct novel and flexible permutation-invariant functions. If carried out naively, Janossy pooling can be computationally prohibitive. To allow computational tractability, we consider three kinds of approximations: canonical orderings of sequences, functions with $k$-order interactions, and stochastic optimization algorithms with random permutations. Our framework unifies a variety of existing work in the literature, and suggests possible modeling and algorithmic extensions. We explore a few in our experiments, which demonstrate improved performance over current state-of-the-art methods.

Summary

An Evaluation of Permutation-Invariance Through Janossy Pooling

In this paper, the authors present an analytical exploration of permutation invariance using Janossy pooling, unifying several existing approaches and offering new methodological and theoretical advances. This summary concentrates on two of the paper's approximation strategies, $k$-ary interactions and random permutations, each of which makes Janossy pooling tractable in a different way.
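As a concrete illustration of the definition given in the abstract, the following minimal sketch averages a permutation-sensitive function over every reordering of a short input. The names `f_sensitive` and `janossy_pool` are hypothetical, chosen only for this example, and the brute-force loop over all $n!$ orderings is feasible only for very small inputs.

```python
from itertools import permutations
from math import factorial


def f_sensitive(seq):
    """Hypothetical permutation-sensitive function: a position-weighted sum."""
    return sum((i + 1) * x for i, x in enumerate(seq))


def janossy_pool(f, seq):
    """Average f over all len(seq)! orderings of seq (exact Janossy pooling)."""
    total = sum(f(p) for p in permutations(seq))
    return total / factorial(len(seq))


x = [3.0, 1.0, 2.0]
# f itself depends on the order of its input ...
print(f_sensitive(x), f_sensitive(sorted(x)))  # 11.0 14.0
# ... but the pooled value does not.
print(janossy_pool(f_sensitive, x), janossy_pool(f_sensitive, [2.0, 3.0, 1.0]))  # 12.0 12.0
```

The factorial cost of `janossy_pool` is the reason the paper turns to approximations.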

The $k$-ary interaction approach permits exact Janossy pooling, but only for a restricted class of permutation-sensitive functions $\overrightarrow{f}$. Adding an auxiliary neural network $\rho$ lets the model recover the lost capacity and capture additional higher-order interactions. This gain comes at the cost of tractability and identifiability, exposing a trade-off between model complexity and computational feasibility. The paper suggests that constraints on $\rho$, such as convexity or Lipschitz continuity, might allow more precise control over this balance, and it calls for both theoretical and empirical work on the trade-off.
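The restriction can be sketched as follows, assuming the permutation-sensitive function looks only at the first $k$ elements of each ordering. Under that assumption, averaging over all $n!$ orderings gives the same value as averaging over the $n!/(n-k)!$ ordered $k$-tuples, which is much cheaper. The names `f_pair`, `kary_janossy_pool`, and `full_janossy_pool` are hypothetical, and the network $\rho$ is omitted.

```python
from itertools import permutations


def f_pair(t):
    """Hypothetical permutation-sensitive function of an ordered pair."""
    a, b = t
    return a * b + a


def kary_janossy_pool(f_k, seq, k):
    """Average f_k over all ordered k-tuples drawn from seq."""
    if k > len(seq):
        raise ValueError("k must not exceed the sequence length")
    tuples = list(permutations(seq, k))
    return sum(f_k(t) for t in tuples) / len(tuples)


def full_janossy_pool(f_k, seq, k):
    """Reference: average over all orderings, applying f_k to the first k items."""
    perms = list(permutations(seq))
    return sum(f_k(p[:k]) for p in perms) / len(perms)


x = [1.0, 2.0, 3.0, 4.0]
cheap = kary_janossy_pool(f_pair, x, k=2)   # 12 ordered pairs
exact = full_janossy_pool(f_pair, x, k=2)   # 24 full orderings
print(abs(cheap - exact) < 1e-12)           # the two averages agree
```

With `k=1` the pooled value is simply a mean of per-element outputs, so any interaction between elements would have to be modeled by a function applied after pooling.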

In contrast, the random permutation approach shows no explicit trade-off between model capacity and computational cost as $\rho$ becomes more complex. Instead, it changes the relationship between the tractable approximate loss $\overline{\overline{J}}$ and the original Janossy loss $\overline{\overline{L}}$. Although the two losses differ, empirical evaluations indicate superior performance from this approach, and the last row of the paper's accuracy table substantiates these findings. This prompts further questions about the domains where $\pi$-SGD is most effective and the conditions under which it converges.
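The idea behind $\pi$-SGD can be sketched in a few lines: at each step, draw one random reordering of a training sequence and take a gradient step on the loss for that ordering alone, rather than averaging over all orderings. The toy model below (a position-weighted sum with squared loss, no $\rho$, batch size one) and all names in it are assumptions made for illustration, not the paper's experimental setup.

```python
import random


def f_sensitive(w, seq):
    """Hypothetical permutation-sensitive model: position-weighted sum."""
    return sum(wi * xi for wi, xi in zip(w, seq))


def pi_sgd(data, n, steps=2000, lr=0.1, seed=0):
    """SGD where every step sees one uniformly random ordering of the input."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        x, y = rng.choice(data)
        perm = x[:]            # copy so the stored sequence is left untouched
        rng.shuffle(perm)      # sample a random permutation of the input
        err = f_sensitive(w, perm) - y
        # Gradient of 0.5 * err**2 with respect to w[i] is err * perm[i].
        w = [wi - lr * err * xi for wi, xi in zip(w, perm)]
    return w


rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    data.append((x, sum(x)))   # permutation-invariant target

w = pi_sgd(data, n=3)
print(w)  # the weights are expected to end up close to one another
```

In this toy problem equal weights fit every ordering exactly, so training on random orderings pushes the model toward a permutation-invariant solution even though each individual step uses a permutation-sensitive function.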

Moreover, the paper stresses that understanding the relationship between the loss functions $\overline{\overline{L}}$ and $\overline{\overline{J}}$ is needed to clarify the still poorly understood behavior of this procedure. The authors also propose investigating the connection between random-permutation optimization and canonical orderings, and how each might improve the other.
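To make the distinction concrete, the two losses can be written as follows. This is a simplified paraphrase under assumed notation, not a quotation of the paper's equations: $h^{(i)}$ is the $i$-th input sequence with label $y_i$, $\Pi$ is the set of its permutations, $\overrightarrow{f}$ is the permutation-sensitive function, and $\overline{\overline{L}}$ and $\overline{\overline{J}}$ denote the Janossy loss and its tractable surrogate.

$$
\overline{\overline{L}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} L\!\left(y_i,\; \rho\!\left(\frac{1}{|h^{(i)}|!} \sum_{\pi \in \Pi} \overrightarrow{f}\big(h^{(i)}_{\pi}\big)\right)\right)
$$

$$
\overline{\overline{J}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{|h^{(i)}|!} \sum_{\pi \in \Pi} L\!\left(y_i,\; \rho\!\left(\overrightarrow{f}\big(h^{(i)}_{\pi}\big)\right)\right)
$$

The only difference is where the average over permutations sits: inside $\rho$ and $L$ for $\overline{\overline{L}}$, outside them for $\overline{\overline{J}}$. When the composition of $L$ and $\rho$ is convex in its second argument, Jensen's inequality gives $\overline{\overline{L}} \le \overline{\overline{J}}$, so minimizing the surrogate minimizes an upper bound on the original loss.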

In terms of application, the proposed methods hold promise across several domains. In particular, challenging tasks involving graphs and non-Poisson point processes are identified as immediate targets for these permutation-invariance techniques. Both the theory and the applications leave room for further exploration and refinement.

Future work on neural network architectures and optimization techniques could build on the insights presented here.