Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions (2310.07174v2)

Published 11 Oct 2023 in cs.LG and stat.ML

Abstract: Sorting is a fundamental operation in all computer systems and has been a long-standing, significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal variable, the differentiability of sorting networks must be guaranteed. In this paper, we define the softening error incurred by a differentiable swap function and develop an error-free swap function that satisfies a non-decreasing condition while remaining differentiable. Furthermore, a permutation-equivariant Transformer network with multi-head attention is adopted to capture dependencies among given inputs and to leverage the model capacity of self-attention. Experiments on diverse sorting benchmarks show that our methods perform better than or comparably to baseline methods.
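
The abstract contrasts conventional differentiable swap functions, which introduce a softening error, with the proposed error-free construction. For context, the sketch below shows a standard sigmoid-relaxed swap comparator and measures the softening error it incurs against an exact comparator; the logistic relaxation and the temperature parameter `tau` are illustrative assumptions, not the paper's error-free swap function.

```python
# Minimal sketch of a sigmoid-relaxed (differentiable) swap comparator and the
# "softening error" it introduces. This illustrates the relaxation used in prior
# differentiable sorting networks, NOT the paper's error-free construction.
import numpy as np

def soft_swap(a: float, b: float, tau: float = 1.0):
    """Differentiable comparator: returns a soft (min, max) of (a, b)."""
    # lam -> 1 when b > a, -> 0 when b < a; smooth in between (assumed relaxation).
    lam = 1.0 / (1.0 + np.exp(-(b - a) / tau))
    soft_min = lam * a + (1.0 - lam) * b
    soft_max = lam * b + (1.0 - lam) * a
    return soft_min, soft_max

def hard_swap(a: float, b: float):
    """Exact, non-differentiable comparator used by a classical sorting network."""
    return min(a, b), max(a, b)

a, b = 0.2, 1.0
s_min, s_max = soft_swap(a, b, tau=1.0)
h_min, h_max = hard_swap(a, b)
# The gap below is the softening error; an error-free swap function would drive
# it to zero while keeping the comparator differentiable.
print("softening error:", abs(s_min - h_min) + abs(s_max - h_max))
```

The printed gap shrinks as `tau` decreases, but the gradients of the comparator then degenerate, which is the trade-off the paper's error-free swap function is designed to avoid.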
