Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics (2402.12535v2)
Abstract: This study introduces a novel transformer model optimized for large-scale point cloud processing in scientific domains such as high-energy physics (HEP) and astrophysics. Addressing the limitations of graph neural networks and standard transformers, our model integrates local inductive bias and achieves near-linear complexity with hardware-friendly regular operations. One contribution of this work is a quantitative analysis of the error-complexity tradeoff of various sparsification techniques for building efficient transformers. Our findings highlight the superiority of locality-sensitive hashing (LSH), especially OR & AND-construction LSH, for kernel approximation on large-scale point cloud data with local inductive bias. Based on this finding, we propose the LSH-based Efficient Point Transformer (HEPT), which combines E$^2$LSH with OR & AND constructions and is built upon regular computations. HEPT demonstrates remarkable performance on two critical yet time-consuming HEP tasks, significantly outperforming existing GNNs and transformers in both accuracy and computational speed, and marks a notable advance in geometric deep learning and large-scale scientific data processing. Our code is available at https://github.com/Graph-COM/HEPT.
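To make the core ingredient concrete, below is a minimal, self-contained sketch of E$^2$LSH with OR & AND constructions for bucketing point-cloud coordinates, as used conceptually for restricting attention to nearby points. This is not the authors' implementation (see their repository for that); the function name, parameters, and hash-combining scheme here are illustrative assumptions.

```python
import numpy as np

def e2lsh_hash(points, num_or, num_and, bucket_width=1.0, seed=0):
    """Illustrative E^2LSH with OR & AND constructions (sketch only).

    points: (n, d) array of point coordinates.
    Returns an (n, num_or) integer array: one combined AND-hash code per
    OR-repetition. Points that share a code in any of the num_or columns
    would be treated as candidate attention neighbors.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    codes = np.empty((n, num_or), dtype=np.int64)
    for i in range(num_or):                      # OR-construction: independent repetitions
        # AND-construction: num_and E^2LSH functions h(x) = floor((a.x + b) / w)
        a = rng.normal(size=(d, num_and))        # Gaussian projections (2-stable distribution)
        b = rng.uniform(0, bucket_width, size=num_and)
        h = np.floor((points @ a + b) / bucket_width).astype(np.int64)
        # Combine the num_and hash values into one bucket id per point,
        # so a collision requires agreement on all num_and hashes
        combined = np.zeros(n, dtype=np.int64)
        for j in range(num_and):
            combined = combined * 1_000_003 + h[:, j]
        codes[:, i] = combined
    return codes

# Usage: bucket 10k random 3-D points; attention would then be restricted
# to points that collide in at least one of the OR-repetitions.
if __name__ == "__main__":
    pts = np.random.randn(10_000, 3)
    buckets = e2lsh_hash(pts, num_or=3, num_and=2, bucket_width=0.5)
    print(buckets.shape)  # (10000, 3)
```

The AND-construction tightens each bucket (fewer false collisions), while the OR-construction recovers recall by giving each point several chances to collide with its true neighbors; the abstract's findings concern exactly this tradeoff.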
- Siqi Miao
- Zhiyuan Lu
- Mia Liu
- Javier Duarte
- Pan Li