
NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

Published 15 Feb 2023 in cs.CL, cs.AI, and cs.PF | (2302.07845v3)

Abstract: Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, one of which is based on the other. Both were built by scraping known data sources (platforms such as Stack Overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or the Bash Commands. This paper makes two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Because the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be customized. We evaluate the performance of ChatGPT on this task and discuss its potential as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.
