DDNAS: Discretized Differentiable Neural Architecture Search for Text Classification (2307.06005v1)
Abstract: Neural Architecture Search (NAS) has shown promise for learning text representations. However, existing NAS methods for text neither learn a fusion of neural operations when optimizing the architecture nor encode the latent hierarchical categorization behind text input. This paper presents a novel NAS method, Discretized Differentiable Neural Architecture Search (DDNAS), for text representation learning and classification. By continuously relaxing the architecture representation, DDNAS can optimize the search with gradient descent. We also propose a novel discretization layer, based on mutual information maximization, that is applied to every search node to model the latent hierarchical categorization in text representation. Extensive experiments on eight diverse real-world datasets show that DDNAS consistently outperforms state-of-the-art NAS methods. Although DDNAS uses only three basic operations (convolution, pooling, and none) as candidate NAS building blocks, it already performs well, and it can be extended with additional operations to obtain further improvement.
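The continuous relaxation described in the abstract follows the DARTS idea: instead of picking one candidate operation per edge, each edge outputs a softmax-weighted mixture of all candidates, so the architecture weights become differentiable. The sketch below is a minimal pure-Python illustration under stated assumptions: a single search edge on a 1-D sequence, the three candidates named in the abstract (convolution, pooling, none), a fixed convolution kernel, a toy squared-error objective, and plain gradient descent on the architecture weights only (DARTS alternates this with updates to the network weights, which are omitted here). The kernel, target, learning rate, and all helper names are hypothetical and not the paper's settings.

```python
import math
from typing import Callable, List, Sequence

Seq = List[float]

def conv1d(x: Sequence[float], kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> Seq:
    """Length-preserving 1-D convolution (cross-correlation) with zero padding.
    The fixed kernel is an illustrative assumption; a real cell learns it."""
    pad = len(kernel) // 2
    return [
        sum(w * x[i + j - pad] for j, w in enumerate(kernel) if 0 <= i + j - pad < len(x))
        for i in range(len(x))
    ]

def max_pool1d(x: Sequence[float], window: int = 3) -> Seq:
    """Length-preserving max pooling with stride 1."""
    pad = window // 2
    return [max(x[max(0, i - pad): i + pad + 1]) for i in range(len(x))]

def none_op(x: Sequence[float]) -> Seq:
    """The 'none' candidate: outputs zeros, i.e. the edge is dropped."""
    return [0.0] * len(x)

OPS: List[Callable[[Sequence[float]], Seq]] = [conv1d, max_pool1d, none_op]
NAMES = ["conv", "pool", "none"]

def softmax(logits: Sequence[float]) -> Seq:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x: Sequence[float], alphas: Sequence[float]):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    outs = [op(x) for op in OPS]
    mixed = [sum(w * o[t] for w, o in zip(weights, outs)) for t in range(len(x))]
    return mixed, weights, outs

def loss_and_grad(x, target, alphas):
    """Squared-error loss and its exact gradient w.r.t. the architecture weights,
    using d(mixed)/d(alpha_k) = w_k * (o_k - mixed) from the softmax Jacobian."""
    mixed, weights, outs = mixed_op(x, alphas)
    resid = [m - y for m, y in zip(mixed, target)]
    loss = 0.5 * sum(r * r for r in resid)
    grad = [
        w_k * sum(r * (o_t - m_t) for r, o_t, m_t in zip(resid, o_k, mixed))
        for w_k, o_k in zip(weights, outs)
    ]
    return loss, grad

# Toy search: the (hypothetical) target is the pooled input, so gradient
# descent on alpha should move weight toward the pooling candidate.
x = [1.0, 3.0, 2.0, 5.0, 4.0]
target = max_pool1d(x)
alphas = [0.0, 0.0, 0.0]
loss_before, _ = loss_and_grad(x, target, alphas)
for _ in range(500):
    _, grad = loss_and_grad(x, target, alphas)
    alphas = [a - 0.05 * g for a, g in zip(alphas, grad)]
loss_after, _ = loss_and_grad(x, target, alphas)

# DARTS-style derivation of the discrete architecture: keep the strongest candidate.
best = NAMES[max(range(len(alphas)), key=lambda k: alphas[k])]
print(best, round(loss_before, 3), round(loss_after, 3))
```

Because every candidate contributes to the output in proportion to its softmax weight, the loss has a well-defined gradient with respect to each architecture weight, which is what allows the search itself to be run by gradient descent rather than by sampling discrete architectures.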
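The abstract states only that the discretization layer is built via mutual information maximization; it does not give the formula. As a hedged illustration, the sketch below assumes a common batch estimator for the mutual information between node representations and a discrete soft assignment over K latent categories, I(H; Y) = H(mean_i p(y|h_i)) - mean_i H(p(y|h_i)). This quantity is high when each representation is assigned confidently and the assignments are balanced across categories. The dot-product-to-centroid assignment, the temperature, and all helper names are hypothetical; the paper's actual layer may be parameterized differently.

```python
import math
from typing import List, Sequence

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [a / temperature for a in logits]
    m = max(scaled)
    exps = [math.exp(a - m) for a in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def discretize(h: Sequence[float], centroids: Sequence[Sequence[float]],
               temperature: float = 0.5) -> List[float]:
    """Hypothetical discretization: soft assignment of a node representation h
    to K latent categories via dot-product logits against K centroids."""
    logits = [sum(a * b for a, b in zip(h, c)) for c in centroids]
    return softmax(logits, temperature)

def mutual_information(assignments: Sequence[Sequence[float]]) -> float:
    """Batch estimate of I(H; Y): entropy of the mean assignment minus the
    mean entropy of the per-example assignments."""
    n, k = len(assignments), len(assignments[0])
    marginal = [sum(p[j] for p in assignments) / n for j in range(k)]
    return entropy(marginal) - sum(entropy(p) for p in assignments) / n

# Confident and balanced: the highest possible value, log K.
balanced = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
# Confident but collapsed onto one category: no information.
collapsed = [[1.0, 0.0]] * 4
# Balanced but maximally uncertain: no information either.
uniform = [[0.5, 0.5]] * 4

centroids = [[1.0, 0.0], [0.0, 1.0]]
hs = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.1], [0.1, 1.5]]
soft = [discretize(h, centroids) for h in hs]
mi_soft = mutual_information(soft)

for name, batch in [("balanced", balanced), ("collapsed", collapsed), ("uniform", uniform)]:
    print(name, round(mutual_information(batch), 4))
print("soft", round(mi_soft, 4))
```

Maximizing this estimate (for example, by adding its negative to the training loss) pushes each node toward assignments that are both decisive and spread over the categories, which is one plausible way a layer could encode latent categorical structure; it is offered here as an assumption, not as the paper's exact objective.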