Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions (2310.07174v2)
Abstract: Sorting is a fundamental operation of all computer systems and has long been a significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal variable, the differentiability of sorting networks needs to be guaranteed. In this paper, we define the softening error introduced by a differentiable swap function, and develop an error-free swap function that satisfies both a non-decreasing condition and differentiability. Furthermore, a permutation-equivariant Transformer network with multi-head attention is adopted to capture dependencies among the given inputs and to leverage the model capacity afforded by self-attention. Experiments on diverse sorting benchmarks show that our methods perform better than or comparably to baseline methods.
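The abstract's central idea, replacing the lossy "soft" swap inside a differentiable sorting network with a swap that is exact in the forward pass yet still differentiable, can be illustrated with a short sketch. The code below is an illustration under stated assumptions, not the paper's construction: the sigmoid-based soft swap follows prior differentiable sorting networks, and a straight-through estimator (Bengio et al., 2013) stands in for the paper's error-free swap function; `tau` and the bubble-sort wiring are illustrative choices.

```python
import torch

def soft_swap(a, b, tau=1.0):
    """Soft conditional swap: blends a and b via sigmoid((b - a) / tau).

    Differentiable everywhere, but it introduces a softening error:
    the outputs are convex combinations of a and b, not the exact
    minimum and maximum.
    """
    s = torch.sigmoid((b - a) / tau)  # ~1 when already ordered (a <= b)
    lo = s * a + (1.0 - s) * b        # soft minimum
    hi = s * b + (1.0 - s) * a        # soft maximum
    return lo, hi

def error_free_swap(a, b, tau=1.0):
    """Illustrative 'error-free' swap via a straight-through trick:
    the forward pass returns the exact min/max (zero softening error),
    while the backward pass uses the gradients of the soft swap.
    """
    soft_lo, soft_hi = soft_swap(a, b, tau)
    hard_lo, hard_hi = torch.minimum(a, b), torch.maximum(a, b)
    # detach() removes the correction term from the autograd graph, so
    # the values equal the hard outputs but gradients flow through the
    # soft relaxation.
    lo = soft_lo + (hard_lo - soft_lo).detach()
    hi = soft_hi + (hard_hi - soft_hi).detach()
    return lo, hi

# Tiny bubble-sort-style network over a vector of predicted scores.
x = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
v = list(x)
for i in range(len(v) - 1):
    for j in range(len(v) - 1 - i):
        v[j], v[j + 1] = error_free_swap(v[j], v[j + 1])
sorted_x = torch.stack(v)
sorted_x.sum().backward()  # gradients reach the unsorted scores
print(sorted_x, x.grad)
```

In the forward pass this network sorts exactly, with no softening error, while gradients still flow through the soft relaxation; that combination of exactness and differentiability is the property the paper's error-free swap function guarantees, together with a non-decreasing (monotonicity) condition on the swap.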